我有2张桌子, transaksi和 buku。 transaksi大约有25万行,而buku大约有17万行。两个表的列均称为 k999a,并且两个表均不使用索引。现在我检查这两个语句。
I have 2 tables, "transaksi" and "buku". "transaksi" has around ~250k rows, and buku has around ~170k rows. Both tables have column called "k999a", and both tables use no indexes. Now I check these 2 statements.
声明1:
explain select k999a from transaksi where k999a not in (select k999a from buku);陈述1输出:
Seq Scan on transaksi (cost=0.00..721109017.46 rows=125426 width=9) Filter: (NOT (SubPlan 1)) SubPlan 1 -> Materialize (cost=0.00..5321.60 rows=171040 width=8) -> Seq Scan on buku (cost=0.00..3797.40 rows=171040 width=8)声明2:
explain select k999a from transaksi where k999a in (select k999a from buku);陈述2输出:
Hash Semi Join (cost=6604.40..22664.82 rows=250853 width=9) Hash Cond: (transaksi.k999a = buku.k999a) -> Seq Scan on transaksi (cost=0.00..6356.53 rows=250853 width=9) -> Hash (cost=3797.40..3797.40 rows=171040 width=8) -> Seq Scan on buku (cost=0.00..3797.40 rows=171040 width=8)为什么在不在查询中,postgresql会循环联接,使查询花费很长时间吗?
Why in the NOT IN query, postgresql does loop join, making the query takes a long time?
PS:Windows 10上的postgresql版本9.6.1
PS: postgresql version 9.6.1 on windows 10
推荐答案这是预料之中的。使用 WHERE NOT EXISTS 可能会获得更好的性能:
This is to be expected. You may get better performance using WHERE NOT EXISTS instead:
SELECT k999a FROM transaksi WHERE NOT EXISTS ( SELECT 1 FROM buku WHERE buku.k999a = transaksi.k999a LIMIT 1 );这里很好地解释了每种方法的原因: explainextended/2009 / 09/16 / not-in-vs-not-exists-vs-left-join-is-null-postgresql /
Here is a good explanation as to why for each of the methods: explainextended/2009/09/16/not-in-vs-not-exists-vs-left-join-is-null-postgresql/
更多推荐
Postgresql使用IN vs NOT IN时的巨大性能差异
发布评论