Extremely slow PostgreSQL query with ORDER and LIMIT clauses

Updated: 2024-10-27 00:27:47

Problem description

I have a table, let's call it "foos", with almost 6 million records in it. I am running the following query:

SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC LIMIT 5 OFFSET 0;

This query takes a very long time to run (Rails times out while running it). There is an index on all IDs in question. The curious part is, if I remove either the ORDER BY clause or the LIMIT clause, it runs almost instantaneously.

I'm assuming that the presence of both ORDER BY and LIMIT is making PostgreSQL make some bad choices in query planning. Does anyone have any ideas on how to fix this?

In case it helps, here is the EXPLAIN for all 3 cases:

//////// Both ORDER and LIMIT
SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC LIMIT 5 OFFSET 0;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..16663.44 rows=5 width=663)
   ->  Nested Loop  (cost=0.00..25355084.05 rows=7608 width=663)
         Join Filter: (foos.bar_id = bars.id)
         ->  Index Scan Backward using foos_pkey on foos  (cost=0.00..11804133.33 rows=4963477 width=663)
               Filter: (((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))
         ->  Materialize  (cost=0.00..658.96 rows=182 width=4)
               ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
                     Index Cond: (baz_id = 13266)
(8 rows)

//////// Just LIMIT
SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) LIMIT 5 OFFSET 0;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..22.21 rows=5 width=663)
   ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
         ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
               Index Cond: (baz_id = 13266)
         ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
               Index Cond: (foos.bar_id = bars.id)
               Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
(7 rows)

//////// Just ORDER
SELECT "foos".* FROM "foos" INNER JOIN "bars" ON "foos".bar_id = "bars".id WHERE (("bars".baz_id = 13266)) ORDER BY "foos"."id" DESC;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=36515.17..36534.19 rows=7608 width=663)
   Sort Key: foos.id
   ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
         ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
               Index Cond: (baz_id = 13266)
         ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
               Index Cond: (foos.bar_id = bars.id)
               Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
(8 rows)

Recommended answer

When you have both the LIMIT and the ORDER BY, the optimizer has decided it is faster to walk through the records of foos by primary key descending (the Index Scan Backward using foos_pkey), without first narrowing them down through the join, until it gets five matches for the rest of the criteria. In the other cases, it simply runs the query as a nested loop driven by index_bars_on_baz_id and returns the matching records.
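To see how far the backward scan really has to go, the estimated plan can be compared with the actual execution. The following is a minimal sketch, assuming PostgreSQL 9.0 or later for the parenthesized EXPLAIN options (on older versions, plain EXPLAIN ANALYZE works). The timeout value is an arbitrary assumption, and note that EXPLAIN ANALYZE really executes the query, so it will be as slow as the original and prints nothing if the timeout cancels it.

```sql
-- Optional guard so the session is not tied up indefinitely.
-- The 5 minute value is an assumption; adjust it to your environment.
SET statement_timeout = '5min';

-- ANALYZE executes the query and reports actual row counts and timings
-- next to the planner's estimates; BUFFERS adds block access counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "foos".*
FROM "foos"
INNER JOIN "bars" ON "foos".bar_id = "bars".id
WHERE "bars".baz_id = 13266
ORDER BY "foos"."id" DESC
LIMIT 5 OFFSET 0;

-- Restore the default timeout for the rest of the session.
RESET statement_timeout;
```

If the actual row count on the Index Scan Backward node is far above what the Limit node's low estimated cost implies, the planner expected to find five matches early and did not.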

Offhand, I'd say the problem is that PG doesn't grok the joint distribution of the various ids and that's why the plan is so sub-optimal.
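One way to check this guess is to look at where the matching rows actually sit in the id range of foos. This is only a sketch using the table and column names from the question; the output column aliases are made up for illustration.

```sql
-- How many foos rows belong to baz_id 13266, and where do they fall
-- in the primary key range? The planner estimated 7608 joined rows.
SELECT count(*)    AS matching_rows,
       min(f.id)   AS lowest_matching_id,
       max(f.id)   AS highest_matching_id
FROM foos AS f
INNER JOIN bars AS b ON f.bar_id = b.id
WHERE b.baz_id = 13266;

-- For comparison: the overall id range of the table.
SELECT min(id) AS lowest_id, max(id) AS highest_id
FROM foos;
```

If the highest matching id is far below the highest id in the table, a backward scan on foos_pkey has to pass over many non-matching rows before it finds the first match, which the planner does not anticipate when it assumes matches are spread evenly.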

For possible solutions: I'll assume that you have run ANALYZE recently. If not, do so. Stale statistics may explain why your estimated costs are high even on the versions that return fast. If the problem persists, perhaps run the ORDER BY as a subselect and slap the LIMIT on in an outer query.
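The two suggestions above might look like the following. This is a sketch under the assumption that the query is exactly the one posted in the question; the subquery alias ordered_foos is a hypothetical name. Whether the planner then avoids the backward index scan can depend on the PostgreSQL version, so it is worth re-running EXPLAIN on the rewritten query.

```sql
-- Step 1: refresh planner statistics for both tables.
ANALYZE foos;
ANALYZE bars;

-- Step 2: do the join and ORDER BY in a subselect,
-- and apply the LIMIT only in the outer query.
SELECT ordered_foos.*
FROM (
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE "bars".baz_id = 13266
    ORDER BY "foos"."id" DESC
) AS ordered_foos
LIMIT 5 OFFSET 0;
```

The intent is that the inner query is planned like the "Just ORDER" case, which sorts roughly 7608 estimated rows, and the outer LIMIT then keeps the first five.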

Published: 2023-10-22 22:36:45

Article link: https://www.elefans.com/category/jswz/34/1518937.html