This example demonstrates how to efficiently fetch a large data set from MySQL and create an Excel report file out of it, without causing any out-of-memory issues, using Spout and PDO.

Spout & PDO Example

The code in this repository demonstrates how to efficiently extract a large data set from a MySQL database and export it to an Excel file. It turns out that this is not as easy as it seems: with big data sets, both of these tasks often lead to terrible performance and out-of-memory crashes.

This example uses PDO as the MySQL driver and Spout to create XLSX files (the FAQ below explains why).

How to run the example?

  1. Clone or download this repository.
  2. Add data to your MySQL database. This example provides test data in sql/fixtures.sql. To create a new database and import the test data into it, run mysql -u {USERNAME} -p{PASSWORD} < sql/fixtures.sql. You may need to comment out the first line of the SQL file for the first import.
  3. Run the example: php index.php

Benchmarks

These metrics were obtained by fetching and writing 50,000 rows (200,000 cells) to an XLSX file.

| Row fetching mode | Elapsed time | Memory peak |
|-------------------|--------------|-------------|
| One by one        | 8.32s        | 1MB         |
| Batch (500)       | 8.02s        | 2.25MB      |
| Batch (1000)      | 9.72s        | 3.75MB      |
| All at once       | 9.24s        | 73.25MB     |

These metrics were obtained by fetching and writing 200,000 rows (800,000 cells) to an XLSX file.

| Row fetching mode | Elapsed time | Memory peak |
|-------------------|--------------|-------------|
| One by one        | 32.95s       | 1MB         |
| Batch (500)       | 36.86s       | 2.25MB      |
| Batch (1000)      | 38.70s       | 3.75MB      |
| All at once       | 40.61s       | 289MB       |

FAQ

Why Spout? Why not PHPExcel?

Even though PHPExcel is a great and popular library, it reaches its limits with large data sets. Spout was designed to support data sets of any size, making it easy to scale without needing to worry about caching, optimizations, etc.

To back up this claim, the same set of metrics was collected using PHPExcel instead of Spout. Here are the results:

| Num written rows | Metric       | PHPExcel | Spout  | Ratio |
|------------------|--------------|----------|--------|-------|
| 50,000           | Elapsed time | 33.56s   | 8.02s  | 4.2x  |
| 50,000           | Memory peak  | 186MB    | 2.25MB | 83x   |
| 200,000          | Elapsed time | 172.91s  | 36.86s | 4.5x  |
| 200,000          | Memory peak  | 720.5MB  | 2.25MB | 320x  |

How can the example consume so little memory?

In this example, several techniques are used to keep memory usage low without compromising performance.

The single-row fetching mode uses unbuffered queries. Unlike buffered queries, where the whole result set is immediately sent to the client and stored in memory, the results of unbuffered queries stay on the MySQL server and can be fetched one row at a time. This greatly reduces the amount of memory needed by the PHP script, since only one row is stored in memory at any given time.
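A minimal sketch of row-by-row fetching with PDO follows. SQLite in-memory is used here only so the sketch is self-contained; the table and column names are made up, and the MySQL-specific unbuffered attribute is shown in a comment.

```php
<?php
// For MySQL, unbuffered mode is enabled via a driver attribute when connecting:
//   new PDO($dsn, $user, $pass, [PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false]);
// SQLite in-memory is used below only so this sketch runs standalone.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('Ada'), ('Grace'), ('Edsger')");

$stmt = $pdo->query('SELECT id, name FROM users ORDER BY id');
$names = [];
// fetch() pulls a single row per call, so only one row is held in PHP memory.
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $names[] = $row['name']; // in the real script, write the row to the XLSX here
}
```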

The batch fetching mode uses the primary key to split the result set into small chunks. Most importantly, it does not use OFFSET to fetch the next batches; instead, it remembers the ID of the last fetched row and starts from there (WHERE id > $lastFetchedId). This works well for very large tables and avoids expensive full table scans. Thanks to the LIMIT clause, only a few rows are stored in memory at any given time.
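This keyset pagination loop can be sketched as follows (again using SQLite in-memory so the sketch runs standalone; the table name and batch size are illustrative).

```php
<?php
// Keyset pagination: seek past the last fetched ID instead of using OFFSET,
// so the database can jump straight to the next batch via the primary key.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
for ($i = 1; $i <= 5; $i++) {
    $pdo->exec("INSERT INTO users (name) VALUES ('user$i')");
}

$batchSize = 2; // 500-1000 in the real script
$lastFetchedId = 0;
$fetched = [];
do {
    $stmt = $pdo->prepare(
        "SELECT id, name FROM users WHERE id > :lastId ORDER BY id LIMIT $batchSize"
    );
    $stmt->bindValue(':lastId', $lastFetchedId, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        $fetched[] = $row['name'];    // in the real script, write the row out
        $lastFetchedId = $row['id'];
    }
} while (count($rows) === $batchSize); // a short batch means we reached the end
```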

Finally, Spout uses streams to write the data it is given. Acting as a simple passthrough, Spout forgets about each row once it has been written, freeing the memory.
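The write side might look like the following sketch, using a Spout 2.x-style API. This is an assumption about how the wiring looks, not a copy of the repository's code, and it requires the box/spout Composer package, so it is not runnable standalone.

```php
<?php
require_once 'vendor/autoload.php';

use Box\Spout\Writer\WriterFactory;
use Box\Spout\Common\Type;

$writer = WriterFactory::create(Type::XLSX);
$writer->openToFile('report.xlsx');

// Each addRow() call streams the row to disk; Spout keeps no reference to it,
// so the memory footprint stays flat no matter how many rows are written.
foreach ($resultRows as $resultRow) { // $resultRows: any iterable of value arrays
    $writer->addRow($resultRow);
}

$writer->close(); // finalizes the XLSX archive
```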

How to generate a smaller/bigger data set?

The data set is generated by a script: scripts/gen-fixtures.php

This script defines the number of rows to generate (50,000 by default). Update this number and run php scripts/gen-fixtures.php to regenerate the fixtures.sql file.
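For illustration, such a generator could look like the sketch below. The table and column names here are hypothetical; the actual scripts/gen-fixtures.php may differ.

```php
<?php
// Hypothetical sketch of a fixture generator; the real script's column names
// and generated values may differ.
$numRows = 10; // set to 50,000 (the default) or any other size

$lines = [];
for ($i = 1; $i <= $numRows; $i++) {
    $lines[] = "INSERT INTO users (first_name, last_name, email) " .
               "VALUES ('First$i', 'Last$i', 'user$i@example.com');";
}

file_put_contents('fixtures.sql', implode("\n", $lines) . "\n");
```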
