The code in this repository demonstrates how to efficiently extract large data sets from a MySQL database and export them to an Excel file. It turns out that this is not as easy as it seems. Both of these tasks often lead to terrible performance and out-of-memory crashes with big data sets.
This example uses PDO as the MySQL driver as well as Spout to create XLSX files. [Read why.]
- Clone or download this repository.
- Add data to your MySQL database. This example provides test data in `sql/fixtures.sql`. To create a new database and import the test data into it, run:

  ```
  mysql -u {USERNAME} -p {PASSWORD} < sql/fixtures.sql
  ```

  You may need to comment out the first line of the SQL file for the first import.
- Run the example:

  ```
  php index.php
  ```
These metrics have been obtained by fetching and writing 50,000 rows (200,000 cells) to an XLSX file.
Row fetching mode | Elapsed time | Memory peak |
---|---|---|
One by one | 8.32s | 1MB |
Batch (500) | 8.02s | 2.25MB |
Batch (1000) | 9.72s | 3.75MB |
All at once | 9.24s | 73.25MB |
These metrics have been obtained by fetching and writing 200,000 rows (800,000 cells) to an XLSX file.
Row fetching mode | Elapsed time | Memory peak |
---|---|---|
One by one | 32.95s | 1MB |
Batch (500) | 36.86s | 2.25MB |
Batch (1000) | 38.70s | 3.75MB |
All at once | 40.61s | 289MB |
Even though PHPExcel is a great and popular library, it reaches its limits with large data sets. Spout was designed to support data sets of any size, making it easy to scale without needing to worry about caching, optimizations, etc.
To support the previous claim, the same set of metrics was collected, but using PHPExcel instead of Spout. Here are the results obtained:
Num written rows | Metric | PHPExcel | Spout | Ratio |
---|---|---|---|---|
50,000 | Elapsed time | 33.56s | 8.02s | 4.2x |
50,000 | Memory peak | 186MB | 2.25MB | 83x |
200,000 | Elapsed time | 172.91s | 36.86s | 4.5x |
200,000 | Memory peak | 720.5MB | 2.25MB | 320x |
In this example, several techniques are used to keep memory usage low without compromising performance.
The single row fetching mode uses unbuffered queries. Contrary to buffered queries where the whole result set is immediately sent to the client and stored in memory, the results of unbuffered queries are stored on the MySQL server and can be fetched one row at a time. This greatly reduces the amount of memory needed by the PHP script since only one row is stored in memory at any given time.
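As an illustration, here is a minimal sketch of one-by-one fetching with an unbuffered PDO query. The connection parameters, table name, and `processRow()` callback are placeholders, not code from this repository:

```php
<?php
// Placeholder connection details.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');

// Turn off buffering: the result set stays on the MySQL server,
// and rows are streamed to PHP one at a time as fetch() is called.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$statement = $pdo->query('SELECT * FROM large_table');
while ($row = $statement->fetch(PDO::FETCH_ASSOC)) {
    // Only the current row lives in PHP's memory at this point.
    processRow($row);
}
```

One caveat of unbuffered queries: a connection can only have one unbuffered statement in flight at a time, so the result set must be fully fetched (or the statement closed) before issuing another query on the same connection.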
The batch fetching mode uses the primary key to split the result set into small chunks. Most importantly, it does not use `OFFSET` to fetch the next batches; instead, it remembers the ID of the last fetched row and uses it as the starting point (`WHERE id > $lastFetchedId`). This works great for very large tables and avoids costly full table scans. Combined with `LIMIT`, only a few rows are stored in memory at any given time.
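The batching technique described above can be sketched as follows. Again, the connection details, table name, and `processRow()` are illustrative placeholders:

```php
<?php
// Placeholder connection details.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');

$batchSize = 500;
$lastFetchedId = 0;

// Keyset pagination: filter on the primary key instead of using OFFSET,
// so MySQL never has to scan and discard already-read rows.
$statement = $pdo->prepare(
    'SELECT * FROM large_table WHERE id > :lastId ORDER BY id ASC LIMIT ' . $batchSize
);

do {
    $statement->execute(['lastId' => $lastFetchedId]);
    $rows = $statement->fetchAll(PDO::FETCH_ASSOC);

    foreach ($rows as $row) {
        processRow($row);
        // The last seen ID becomes the lower bound of the next batch.
        $lastFetchedId = $row['id'];
    }
} while (count($rows) === $batchSize);
```

The loop stops when a batch comes back smaller than `$batchSize`, which means the end of the table was reached.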
Finally, Spout uses streams to write the data it is being passed. Streaming means that Spout only acts as a passthrough: once the data is written, it simply forgets about it, freeing the memory.
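For reference, writing rows with Spout looks roughly like this sketch (assuming the Spout 3.x `WriterEntityFactory` API; `$rowsSource` is a placeholder for whatever iterable yields the fetched rows):

```php
<?php
require 'vendor/autoload.php';

use Box\Spout\Writer\Common\Creator\WriterEntityFactory;

// Create a streaming XLSX writer.
$writer = WriterEntityFactory::createXLSXWriter();
$writer->openToFile('export.xlsx');

// Each row is flushed to disk as it is added and then discarded,
// so memory usage stays flat regardless of the number of rows.
foreach ($rowsSource as $values) {
    $writer->addRow(WriterEntityFactory::createRowFromArray($values));
}

$writer->close();
```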
The data set is generated by a script: `scripts/gen-fixtures.php`. It defines the number of rows to generate (50,000 by default). You can update this number and run `php scripts/gen-fixtures.php` to regenerate the `sql/fixtures.sql` file.