Thursday, March 3. 2011Why go with O(n) if you can have O(n²)?Trackbacks
Trackback specific URI for this entry
No Trackbacks
Comments
Display comments as
(Linear | Threaded)
Could this bennefit from using the HANDLER statement? mysql> SELECT * FROM test001; mysql> HANDLER test001 OPEN; mysql> HANDLER test001 READ `id` NEXT LIMIT 2; mysql> HANDLER test001 READ `id` NEXT LIMIT 2; mysql> HANDLER test001 CLOSE; mysql> SELECT * FROM test001; mysql> HANDLER test001 OPEN; mysql> HANDLER test001 READ `id` NEXT LIMIT 2; mysql> HANDLER test001 READ `id` NEXT LIMIT 2; mysql> HANDLER test001 CLOSE; Status after HANDLER use: Status after select with limit (same rows read): No, as the query is spanning several tables (person, address, country, ...) Thought I'd share a bit since I've done this sort of thing myself. I don't believe you're giving the tradeoffs the proper attention in this case. One being that if you are dealing with a large amount of rows and columns, memory consumption is a major barrier you start running into right away. Mix in how easy it is to introduce memory leaks with php's object model, and you have a recipe for disaster. I've found that people are much more tolerant if they have to wait a bit longer for something than not getting it at all (because you're script died from out of memory). If you use the paginate model, you have more fine-grained control over the entire process and can free up memory every pagination run. The other trade off in this case is that if the server is multi-tasking, you are stealing a lot of i/o (mysql) in one fell swoop, then following up with a huge amount of cpu/memory. If you paginate, you switch from modest i/o (mysql) to modest cpu, then modest i/o, modest cpu (and so on), so you don't have to flat out deny other pending work. I'm not ranting against pagination in a user frontend, that usually makes perfect sense as the typical user would only check out the first hand full of pages anyway. What i'm talking about is an export script that creates the CSV export file in one go, just with multiple queries instead of a single one. So any potential PHP side memory leak issues would hit you anyway while the script chews its way through the ~40K result rows; the only exception being the MySQL result set buffer which would indeed be smaller when using good old ext/mysql or mysqli_store_result() with the newer ext/mysqli (while mysqli_use_result() would only internally store one result row at a time anyway). And regarding your multitasking/resource usage argument: any advantage of that in this "we need to show all data anyway" context would quickly be eaten away due to the O(n^2) nature of the approach, so that twice the amount of data would increase the database load four times while with a single query things would grow on a linear basis (assuming that the query itself scales linearly). In this special case you'd also additionally have to count in that it does a GROUP BY and has "Using temporary; Using filesort" in it's EXPLAIN plan, so the LIMIT would save next to nothing in database resources while the pseudo-pagination causes the query to be run 400 times instead of just once ... and if we ever double the number of contacts in our database to 80K it would be run 1600 times then ... epic fail ... I concur with Dan. Not really being interested in php memory leaks (that should be fixed by the developer), you still have to take into account the memory used for fetching the data from the db. If the developer hand-codes his queries, he can send a single select and fetch little by little. Unfortunately if there is an orm involved, it is most likely that this piecewise fetching is not supported, but sending many paginated queries is... Any 'Solution' that scales by n^2 while an equivalent solution scales linearly with growing n is a bad idea. Period. If you're using an ORM that can not deal with large result sets then the ORM is conceptually broken and adding LIMITs is just a workaround, not a solution. And if you really have to use such a workaround then please use a sensible chunk size. Something in the range of 10K maybe ... but *definitely* not just 100 ... |
Calendar
QuicksearchCategoriesBlog AdministrationChoose LanguageShow tagged entries |
|||||||||||||||||||||||||||||||||||||||||||||||||