Performance of IN (subquery)

Updated: 2024-10-27 01:26:56
Problem description

I'm using PG 7.4.3 on Mac OS X. I am disappointed with the performance of queries like 'select foo from bar where baz in (subquery)', or updates like 'update bar set foo = 2 where baz in (subquery)'. PG always seems to want to do a sequential scan of the bar table. I wish there were a way of telling PG, "use the index on baz in your plan, because I know that the subquery will return very few results".

Where it really matters, I have been constructing dynamic queries by looping over the values for baz, building a separate query for each one and combining them with a UNION (or just directly updating, in the update case). Depending on the size of the bar table, I can get speedups of hundreds or even more than a thousand times, but it is a big pain to have to do this.

Any tips?

Thanks,
Kevin Murphy

Illustrated:

The query I want to do is very slow:

    select bundle_id from build.elements where elementid in (
        SELECT superlocs_2.element_id
        FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
        WHERE bundle_superlocs_2.protobundle_id = 1);
     bundle_id
    -----------
          7644
          7644
    (2 rows)

    Time: 518.242 ms

The subquery is fast:

    SELECT superlocs_2.element_id
    FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
    WHERE bundle_superlocs_2.protobundle_id = 1;
     element_id
    ------------
          41209
          25047
    (2 rows)

    Time: 3.268 ms

And using indexes on the main table is fast:

    select bundle_id from build.elements where elementid in (41209, 25047);
     bundle_id
    -----------
          7644
          7644
    (2 rows)

    Time: 2.468 ms

The plan for the slow query:

    egenome_test=# explain analyze select bundle_id from build.elements where elementid in (
        SELECT superlocs_2.element_id
        FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
        WHERE bundle_superlocs_2.protobundle_id = 1);
                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Hash Join  (cost=70.33..72.86 rows=25 width=4) (actual time=583.051..583.059 rows=2 loops=1)
       Hash Cond: ("outer".element_id = "inner".elementid)
       ->  HashAggregate  (cost=47.83..47.83 rows=25 width=4) (actual time=0.656..0.658 rows=2 loops=1)
             ->  Hash Join  (cost=22.51..47.76 rows=25 width=4) (actual time=0.615..0.625 rows=2 loops=1)
                   Hash Cond: ("outer".superloc_id = "inner".superloc_id)
                   ->  Seq Scan on superlocs_2  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.004..0.012 rows=9 loops=1)
                   ->  Hash  (cost=22.50..22.50 rows=5 width=4) (actual time=0.076..0.076 rows=0 loops=1)
                         ->  Seq Scan on bundle_superlocs_2  (cost=0.00..22.50 rows=5 width=4) (actual time=0.024..0.033 rows=2 loops=1)
                               Filter: (protobundle_id = 1)
       ->  Hash  (cost=20.00..20.00 rows=1000 width=8) (actual time=581.802..581.802 rows=0 loops=1)
             ->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1)
     Total runtime: 593.843 ms
    (12 rows)
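Kevin's manual workaround can be sketched in plain SQL using the two element_id values his subquery returned (41209 and 25047); in his setup, client code would generate these statements from the subquery's result. This is only an illustration of the rewrite he describes, not a recommended fix, and it needs his database to run:

```sql
-- Step 1: run the fast subquery on its own and collect its results
-- (41209 and 25047 in the example above).
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1;

-- Step 2: one indexable query per collected value, combined into one result.
-- UNION ALL keeps duplicate rows, so bundle_id 7644 comes back twice, as it
-- does from the IN query (assuming the collected values are distinct);
-- plain UNION would collapse the result to a single row.
SELECT bundle_id FROM build.elements WHERE elementid = 41209
UNION ALL
SELECT bundle_id FROM build.elements WHERE elementid = 25047;
```

Since the literal-list form already ran in 2.468 ms in Kevin's test, splicing the collected values into a single IN (...) list achieves the same effect with less client-side code.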

Solution

On Thu, 26 Aug 2004, Kevin Murphy wrote:

> I'm using PG 7.4.3 on Mac OS X. I am disappointed with the performance of queries like 'select foo from bar where baz in (subquery)', or updates like 'update bar set foo = 2 where baz in (subquery)'. [...]
>
> select bundle_id from build.elements where elementid in (
>     SELECT superlocs_2.element_id FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
>     WHERE bundle_superlocs_2.protobundle_id = 1);
> [...]
> Time: 518.242 ms

What field type is protobundle_id? If you typecast the '1' to be the same, does the index get used?

Email: sc*****@hub  Yahoo!: yscrappy  ICQ: 7615664
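Marc's question points at a known limitation of PostgreSQL releases before 8.0: when a column and the constant it is compared with have different types (for example an int8 column and an unadorned integer literal, which is int4), the planner may not consider the column's index. A sketch, assuming protobundle_id is int8 and indexed; the thread never states its type:

```sql
-- Assumption: bundle_superlocs_2.protobundle_id is int8 (bigint) and indexed.
-- Cast the literal so both sides of the comparison have the same type.
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1::int8;

-- Alternatively, quote the literal: an untyped '1' is resolved to the
-- column's type.
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = '1';
```

In the posted plan, though, the scan on bundle_superlocs_2 finished in well under a millisecond; most of the 593.843 ms total is spent in the sequential scan on elements and the hash built from it.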

Kevin Murphy wrote:

> [...]
> ->  Hash  (cost=20.00..20.00 rows=1000 width=8) (actual time=581.802..581.802 rows=0 loops=1)
>       ->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1)

The planner thinks that the sequential scan on elements will return 1000 rows, but it actually returned about 185,000. Did you ANALYZE this table recently?

Afterthought: It would be nice if the database was smart enough to analyze a table of its own accord when a sequential scan returns more than, say, 20 times what it was supposed to.

Paul
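Following Paul's suggestion, a minimal sketch of checking and then refreshing the statistics for the three tables in Kevin's query. It assumes superlocs_2 and bundle_superlocs_2 are reachable through the current search_path, as in the original query; whether the planner then switches to an index scan on elementid is not confirmed in the thread.

```sql
-- What the planner currently assumes about table sizes. On 7.x, a table that
-- has never been vacuumed or analyzed typically still shows the placeholder
-- values relpages = 10, reltuples = 1000.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('elements', 'superlocs_2', 'bundle_superlocs_2');

-- Refresh the planner statistics for each table in the query.
ANALYZE build.elements;
ANALYZE superlocs_2;
ANALYZE bundle_superlocs_2;

-- Re-run the slow query under EXPLAIN ANALYZE and compare the estimated
-- rows= figures against the actual ones.
EXPLAIN ANALYZE
SELECT bundle_id FROM build.elements WHERE elementid IN (
    SELECT superlocs_2.element_id
    FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
    WHERE bundle_superlocs_2.protobundle_id = 1);
```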


> Afterthought: It would be nice if the database was smart enough to
> analyze a table of its own accord when a sequential scan returns more
> than, say, 20 times what it was supposed to.

I've wondered on several occasions if there is any good reason for PG not to automatically perform an analyze concurrently with a seq scan as it's happening. That way, no extra disk IO is needed and the stats could stay up to date for almost free. Are there any hackers around who can say why this might be a bad idea, or is it one of those things that just needs a volunteer? (I'm not; at least not now.)
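For reference, the estimates Paul points to can be reproduced from the planner's default settings. The arithmetic below assumes the pre-8.0 behaviour in which a table that has never been vacuumed or analyzed is recorded in pg_class with placeholder sizes of 10 pages and 1000 rows, together with the default cost parameters (1.0 per sequential page, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025) and the default equality selectivity of 0.005; the thread itself does not spell this out. The last line applies the 20x trigger from Paul's afterthought to the elements scan:

```latex
\begin{align*}
\text{cost(Seq Scan on elements)} &= \underbrace{10}_{\text{relpages}} \times 1.0 + \underbrace{1000}_{\text{reltuples}} \times 0.01 = 20.00 \\
\text{cost(Seq Scan on bundle\_superlocs\_2)} &= 20.00 + 1000 \times 0.0025 = 22.50 \\
\text{rows(Filter: protobundle\_id = 1)} &= 1000 \times 0.005 = 5 \\
\frac{\text{actual rows}}{\text{estimated rows}} &= \frac{185535}{1000} \approx 185.5 \gg 20
\end{align*}
```

The identical cost=0.00..20.00 rows=1000 estimates on both superlocs_2 and elements, even though elements actually returned 185535 rows and superlocs_2 returned 9, are a strong hint that none of these tables had current statistics.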

Published: 2023-10-13 14:22:46
Link: https://www.elefans.com/category/jswz/34/1488200.html