Simple query optimization (WHERE + ORDER + LIMIT)

Updated: 2024-10-27 07:20:46

I have this query that runs unbelievably slowly (4 minutes):

SELECT * FROM `ad` WHERE `ad`.`user_id` = USER_ID ORDER BY `ad`.`id` desc LIMIT 20;

The ad table has approximately 10 million rows.

SELECT COUNT(*) FROM `ad` WHERE `ad`.`user_id` = USER_ID;

returns 10k, i.e. this user has about 10k rows.

The table has the following indexes:

PRIMARY KEY (`id`),
KEY `idx_user_id` (`user_id`,`status`,`sorttime`)

EXPLAIN gives this:

id: 1
select_type: SIMPLE
table: ad
type: index
possible_keys: idx_user_id
key: PRIMARY
key_len: 4
ref: NULL
rows: 4249
Extra: Using where

I fail to understand why it takes so long. This query is also generated by an ORM (for pagination), so it would be nice to optimize it from the outside (maybe by adding an extra index).

BTW, this query is fast:

select aa.* from (select id from ad where user_id=USER_ID order by id desc limit 20) as a join ad as aa on a.id = aa.id order by aa.id desc;

Edit: I tried another user with far fewer rows (dozens) than the original one. I wonder why the original query doesn't use idx_user_id:

EXPLAIN SELECT * FROM `ad` WHERE `ad`.`user_id` = ANOTHER_ID ORDER BY `ad`.`id` desc LIMIT 20;

id: 1
select_type: SIMPLE
table: ad
type: ref
possible_keys: idx_user_id
key: idx_user_id
key_len: 3
ref: const
rows: 84
Extra: Using where; Using filesort

Edit 2: With Alexander's help, I decided to try making MySQL use the index I want with an index hint, and the following query is much faster (1 second instead of 4 minutes):

SELECT * FROM `ad` USE INDEX (idx_user_id) WHERE `ad`.`user_id` = 1884774 ORDER BY `ad`.`id` desc LIMIT 20;

Best answer

In the EXPLAIN output you can see that key is PRIMARY. This means the MySQL optimizer decided it would be faster to scan the table records in primary-key order (they are already sorted by id, so ORDER BY id DESC needs no extra sort) and pick the first 20 records with the specific user_id value than to use the idx_user_id key, which the optimizer considered as a possible key (see possible_keys) and then rejected.

In your second query, the optimizer sees that the subquery needs only id values, so it uses the idx_user_id index instead: that index lets it compute the list of required ids without touching the table itself (if the table is InnoDB, every secondary index entry also stores the primary key value, so idx_user_id effectively contains id). Then only 20 records are fetched by direct primary-key lookup, which is a very fast operation for such a small number of records.
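The deferred-join pattern described above can be tried out locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite's planner is not MySQL's, so this only illustrates the idea); the extra `title` column, the data, and the user id 7 are all made up. It checks that the deferred join returns the same rows as the plain query and that the inner `SELECT id` can be answered from idx_user_id alone, which SQLite reports as a "COVERING INDEX". The outer `ORDER BY aa.id DESC` is included because a join does not promise to preserve the subquery's order.

```python
import sqlite3

# Small in-memory stand-in for the `ad` table. This is SQLite, not MySQL, and
# the `title` column and all data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ad (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,"
    " status INTEGER NOT NULL, sorttime INTEGER NOT NULL, title TEXT NOT NULL)"
)
# Same column layout as the existing index in the question.
conn.execute("CREATE INDEX idx_user_id ON ad (user_id, status, sorttime)")
conn.executemany(
    "INSERT INTO ad VALUES (?, ?, ?, ?, ?)",
    [(i, i % 10, i % 3, i, f"ad {i}") for i in range(1, 1001)],
)

PLAIN = "SELECT * FROM ad WHERE user_id = ? ORDER BY id DESC LIMIT 20"

# Deferred join: find the 20 ids via the index first, then fetch full rows by id.
DEFERRED = (
    "SELECT aa.* FROM (SELECT id FROM ad WHERE user_id = ?"
    " ORDER BY id DESC LIMIT 20) AS a"
    " JOIN ad AS aa ON a.id = aa.id"
    " ORDER BY aa.id DESC"  # a join does not guarantee the subquery's order
)

plain_rows = conn.execute(PLAIN, (7,)).fetchall()
deferred_rows = conn.execute(DEFERRED, (7,)).fetchall()
print("same rows, same order:", plain_rows == deferred_rows)

# Print the 'detail' column of each plan step; the inner SELECT id should
# show up as a search on idx_user_id using a covering index.
for row in conn.execute("EXPLAIN QUERY PLAN " + DEFERRED, (7,)):
    print(row[3])
```

The exact wording of the plan lines varies between SQLite versions, but the covering-index step for the subquery is the part that corresponds to "without touching the table itself".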

As your query with ANOTHER_ID shows, MySQL's wrong decision was based on the number of rows for the original USER_ID value. That number was so large that the optimizer guessed it would find the first 20 records with this user_id faster by reading the table records directly and skipping those with other user_id values.

If table rows are accessed through an index, each row fetch is a random access operation. On a typical HDD, random access is roughly 100 times slower than a sequential scan, so, as a rule of thumb, an index pays off only if it narrows the rows down to less than about 1% of the table. If the rows for a specific USER_ID value account for more than 1% of all rows, and we want to retrieve all of them, a full table scan may be more efficient than using the index. Here, however, only 20 rows are needed. The optimizer apparently estimated that a descending scan of the primary key would reach 20 matching rows quickly (EXPLAIN reports rows: 4249, far fewer than the 10 million rows in the table), an estimate that holds only if this user's rows are spread evenly over the id range. If they are concentrated at lower ids, the scan has to read a large part of the table before it finds 20 of them. So it mistakenly decided not to use the index and to do a full table scan instead.
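To put rough numbers on this trade-off, here is a back-of-the-envelope cost model. It is a toy, not MySQL's actual cost formula: one sequential row read costs 1 unit, one random read costs 100 units (the answer's HDD figure), and the table and user sizes are the ones from the question. The two primary-key scan functions are hypothetical bounds: matching rows spread evenly over the id range versus all of them having the lowest ids.

```python
# Toy cost model for the index-vs-scan trade-off (a sketch, not MySQL's real
# cost formulas). Unit: one sequential row read costs 1.
RANDOM_READ_PENALTY = 100  # the ~100x HDD figure quoted in the answer


def full_scan_cost(total_rows: int) -> int:
    """Read every row of the table sequentially."""
    return total_rows


def index_fetch_cost(rows_fetched: int) -> int:
    """Fetch each row through the index, paying one random read per row."""
    return rows_fetched * RANDOM_READ_PENALTY


def break_even_fraction() -> float:
    """Share of matching rows at which index fetches cost as much as a full scan."""
    return 1 / RANDOM_READ_PENALTY


def pk_scan_rows_evenly_spread(total_rows: int, matching: int, limit: int) -> int:
    """Hypothetical: rows read in id DESC order before `limit` matches, if matches are evenly spread."""
    return limit * total_rows // matching


def pk_scan_rows_all_oldest(total_rows: int, matching: int, limit: int) -> int:
    """Hypothetical: rows read in id DESC order if every match has one of the lowest ids."""
    return (total_rows - matching) + limit


TOTAL, MATCHING, LIMIT = 10_000_000, 10_000, 20  # sizes from the question

print(f"break-even share of rows:  {break_even_fraction():.0%}")
print(f"this user's share of rows: {MATCHING / TOTAL:.1%}")
print(f"full scan, all rows:       {full_scan_cost(TOTAL):,}")
print(f"index, all user rows:      {index_fetch_cost(MATCHING):,}")
print(f"index, LIMIT rows only:    {index_fetch_cost(LIMIT):,}")
print(f"PK scan, evenly spread:    {pk_scan_rows_evenly_spread(TOTAL, MATCHING, LIMIT):,}")
print(f"PK scan, rows all oldest:  {pk_scan_rows_all_oldest(TOTAL, MATCHING, LIMIT):,}")
```

In this toy model the break-even point is 1%, this user's share is 0.1%, and fetching only the 20 LIMIT rows through the index costs 2,000 units. The primary-key scan reads about 20,000 rows if the user's rows are evenly spread, but 9,990,020 rows if they all sit at the lowest ids, which is the kind of gap that turns a sub-second query into minutes.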

To make the query fast for any user_id value, you can add one more index that lets MySQL execute it in the fastest way possible:

create index idx_user_id_2 on ad(user_id, id);

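This index lets MySQL do both the filtering and the sorting. For that, the column used for filtering (with an equality condition) must come first in the index and the column used for ordering second. The existing idx_user_id (user_id, status, sorttime) cannot deliver one user's rows in id order because status and sorttime sit between user_id and id, which is why the ANOTHER_ID plan shows Using filesort. MySQL should be smart enough to use the new index, because it can read exactly the needed entries, already in order, without skipping any.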
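The effect of putting the filter column first and the sort column second can be checked on a small scale. The sketch below again uses Python's sqlite3 module as a stand-in for MySQL, so it shows the index shape rather than MySQL's exact behaviour; the `title` column, the data, and user id 7 are hypothetical. The `title` column keeps `SELECT *` from being answerable from idx_user_id alone, as would be the case for a real ad table with more columns. The helper names `make_ad_table` and `plan_details` are illustrative only.

```python
import sqlite3

QUERY = "SELECT * FROM ad WHERE user_id = ? ORDER BY id DESC LIMIT 20"


def make_ad_table(with_idx_user_id_2: bool) -> sqlite3.Connection:
    """Small in-memory stand-in for `ad` (SQLite; `title` column and data are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ad (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,"
        " status INTEGER NOT NULL, sorttime INTEGER NOT NULL, title TEXT NOT NULL)"
    )
    # Existing index from the question.
    conn.execute("CREATE INDEX idx_user_id ON ad (user_id, status, sorttime)")
    if with_idx_user_id_2:
        # Suggested index: filtering column first, ordering column second.
        conn.execute("CREATE INDEX idx_user_id_2 ON ad (user_id, id)")
    conn.executemany(
        "INSERT INTO ad VALUES (?, ?, ?, ?, ?)",
        [(i, i % 10, i % 3, i, f"ad {i}") for i in range(1, 1001)],
    )
    return conn


def plan_details(conn: sqlite3.Connection) -> list:
    """The human-readable 'detail' column of EXPLAIN QUERY PLAN for QUERY."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (7,))]


for flag in (False, True):
    print(f"idx_user_id_2 present: {flag}")
    for line in plan_details(make_ad_table(flag)):
        print("   ", line)
```

With the extra index, the plan should be a single search on idx_user_id_2 with no "USE TEMP B-TREE FOR ORDER BY" step, i.e. SQLite walks the user's index entries backwards in id order and stops after 20. Without it, expect a separate sort step instead. Running EXPLAIN on the real MySQL table after creating the index is still the way to confirm what MySQL itself does.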

Published: 2023-07-30 06:17:00

Article link: https://www.elefans.com/category/jswz/34/1336695.html