How to optimize ORDER BY and LIMIT?


I have a client who wishes for me to make a back-end for his website. He requires a table displaying all files with pagination.

CREATE TABLE `content_files` (
  `id` varchar(16) NOT NULL,
  `owner` varchar(16) DEFAULT NULL,
  `location` varchar(16) NOT NULL,
  `parent` varchar(16) DEFAULT NULL,
  `date` int(11) NOT NULL,
  `filename` varchar(256) NOT NULL,
  `username` varchar(64) NOT NULL,
  `email` varchar(256) NOT NULL,
  `ip` varchar(15) NOT NULL,
  `json` text NOT NULL,
  `bin` blob NOT NULL
);

ALTER TABLE `content_files`
  ADD PRIMARY KEY (`id`),
  ADD UNIQUE KEY `ID` (`id`),
  ADD KEY `id_2` (`id`),
  ADD KEY `date` (`date`),
  ADD KEY `filename` (`filename`(255)),
  ADD KEY `username` (`username`(63)),
  ADD KEY `email` (`email`(255)),
  ADD KEY `ip` (`ip`(14));

The columns which need to be sortable are date, filename, username, email, and ip. There are currently 65,000 records. With a higher offset the query takes longer, as expected, but disproportionately so: about 100 seconds to reach the 60,000th entry.

I was simply using:

SELECT id, date, filename, username, email, ip
FROM content_files
ORDER BY filename
LIMIT 60000, 20;

I have searched this issue, however, none of the tips seemed to improve my queries. Is there some glaring mistake I am missing in his schema? How can I optimize this?
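One way to see what the server is doing with this query (not shown in the original question) is to run it through EXPLAIN; on a schema like this, MySQL will typically report a filesort over a large row set before the LIMIT is applied:

```sql
-- Hypothetical diagnostic, assuming MySQL. "Using filesort" in the
-- Extra column means the server sorts the candidate rows before it
-- discards the first 60,000 and returns 20.
EXPLAIN
SELECT id, date, filename, username, email, ip
FROM content_files
ORDER BY filename
LIMIT 60000, 20;
```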

Accepted answer

You're building up a big dataset and sorting it, only to discard 60K rows and show 20. That work can be cut down by a so-called deferred join. The sorting still has to happen, but it can take less memory, and so be faster.

Edit: turned the subquery into a join.

SELECT a.id, a.date, a.filename, a.username, a.email, a.ip
FROM content_files a
JOIN (
    SELECT id
    FROM content_files
    ORDER BY filename
    LIMIT 60000, 20
) b ON a.id = b.id
ORDER BY a.filename;

This performs the big sort-and-discard operation on a much smaller dataset (just the id column). It then looks up the full row data for only the 20 rows you actually need.

Finally, if you add a compound index on (filename, id) the subquery can be satisfied by scanning the index which will make it even faster. You can delete the index you have on just filename when you create the compound index.
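Concretely, the index change described above might look like the following (the compound index name is an assumption, and the prefix length mirrors the existing `filename`(255) index, since `filename` is varchar(256)):

```sql
-- Add a compound index so the subquery's ORDER BY filename / SELECT id
-- can be served from the index alone, then drop the now-redundant
-- single-column index on filename.
ALTER TABLE content_files
  ADD KEY `filename_id` (`filename`(255), `id`),
  DROP KEY `filename`;
```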

You have a bunch of redundant indexes on your table. (Three alone on id). Clean up your indexes! They slow down updates.
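In this schema the redundant indexes are the extra keys on id: the PRIMARY KEY already enforces uniqueness, so the separate UNIQUE KEY and plain KEY on the same column add nothing. A possible cleanup, using the index names from the DDL above:

```sql
-- id is already the PRIMARY KEY, so UNIQUE KEY `ID` and KEY `id_2`
-- are redundant; they cost write time on every INSERT and UPDATE.
ALTER TABLE content_files
  DROP KEY `ID`,
  DROP KEY `id_2`;
```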

Published: 2023-07-27 17:45:00
Original link: https://www.elefans.com/category/jswz/34/1293555.html