Wednesday, May 20, 2009

Optimizing MySQL Queries-Part 1

Indexing Basic MySQL Queries
The database is just too slow. Queries are queuing up, backlogs growing, users being refused connection. Management is ready to spend millions on "upgrading" to some other system, when the problem is really that MySQL is simply not being used properly. Badly defined or non-existent indexes are one of the primary reasons for poor performance, and fixing these can often lead to phenomenal improvements. Indexes increase the performance of your query but one thing you should know that if indexes are not defined properly then it can decrease your performance.

So Let’s see the example:
First we must understand why the query is slow in the mind of MySQL and the only way to do that is to have MySQL explain it to us:
MySQL EXPLAIN
-> SELECT attach_id as total_attachment FROM exp
-> WHERE post_id IN (163246, 164224, 164894, 165146, 167931);
+-------------------+-------+---------------+-------------------+---------+------+-------+--------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------------------+-------+---------------+-------------------+---------+------+-------+--------------------------+
| exp | index | NULL | attach_id_post_id | 6 | NULL | 61834 | Using where; Using index |
+-------------------+-------+---------------+-------------------+---------+------+-------+--------------------------+

It's a thing of beauty to see inside the mind of the machine, but in this case perhaps not. Why is MySQL telling us there's no possible keys but it's using key attach_id_post_id? And if it's using a key then why does it suspect it will have to examine 61,834 rows (by performing a full index scan, denoted by "type: index")? It seems there's a problem with the keys so we must now understand them:
MySQL DESCRIBE exp;
+-------------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-----------------------+------+-----+---------+-------+
| attach_id | mediumint(8) unsigned | | MUL | 0 | |
| post_id | mediumint(8) unsigned | | | 0 | |
| privmsgs_id | mediumint(8) unsigned | | | 0 | |
| user_id_1 | mediumint(8) | | | 0 | |
| user_id_2 | mediumint(8) | | | 0 | |
+-------------+-----------------------+------+-----+---------+-------+

MySQL> SHOW INDEX FROM exp;
+-------------------+------------+-----------------------+--------------+-------------+-----------+-------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality |
+-------------------+------------+-----------------------+--------------+-------------+-----------+-------------+
| exp | 1 | attach_id_post_id | 1 | attach_id | A | 61834 |
| exp | 1 | attach_id_post_id | 2 | post_id | A | 61834 |
| exp | 1 | attach_id_privmsgs_id | 1 | attach_id | A | 61834 |
| exp | 1 | attach_id_privmsgs_id | 2 | privmsgs_id | A | 61834 |

Understanding indexes (or keys if you prefer) is two part: Understanding the structure of the table then understanding the indexes. You can't just slap an index on a table and think everything will be wonderful. In this example it looks like everything should be wonderful with key attach_id_post_id. Given that the SELECT statement is selecting attach_id and post_id and that's just what this key indexes, so why isn't it working? It is working, just not how we're intending; it's working for MySQL which is why in EXPLAIN it says "Using index." When MySQL says this in "Extra" is means "The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row." In other words: It finds and returns matching columns from the index in memory not the table on disk, which is a good thing, unless it's doing this 12 million times for 1 matching column.

How very annoying: MySQL is using the index but still in effect examing every row of the table. The reason why in this example deals with how MySQL uses multiple column indexes. From DESCRIBE we see "MUL" for multi-column index, and from SHOW INDEX we see attach_id_post_id twice, first for attach_id second for post_id. A multiple column index acts like a single column index if the columns were put end-to-end in the order specified by "Seq_in_index" from SHOW INDEX. In this example if attach_id were 100 and post_id were 200 this is indexed as "100 200". Painfully simple correct? Throw this in the mix: MySQL will only use a multi-column index if a value is specified for the first column in the index. In this example the first column in the index is attach_id and we're not specifying a value for this column which is why MySQLwon't use the index like we want it to. What MySQLdoes do, and why it's able to use the index at all, is use any value for attach_id and the values we gave it for post_id. In effect it looks for '* 163246', '* 164224', '* 164894', '* 165146', and '* 167931'. Since attach_id is unique MySQLreally does have to look at every single one, all 61,00+. While doing that if it comes across one with a matching post_id lucky for us. I hope you see the obvious and simple solution: Swap the order of columns in the key, post_id then attach_id. Later we'll do this but first it's good learning to examine another possibility.

No comments:

Post a Comment