PostgreSQL 索引不用于 IP 范围的查询

编程入门 行业动态 更新时间:2024-10-27 20:24:39
本文介绍了PostgreSQL 索引不用于 IP 范围的查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

我正在使用 PostgreSQL 9.2 并且有一个 IP 范围表.这是 SQL:

I'm using PostgreSQL 9.2 and have a table of IP ranges. Here's the SQL:

CREATE TABLE ips ( id serial NOT NULL, begin_ip_num bigint, end_ip_num bigint, country_name character varying(255), CONSTRAINT ips_pkey PRIMARY KEY (id ) )

我在 begin_ip_num 和 end_ip_num 上都添加了纯 B-tree 索引:

I've added plain B-tree indices on both begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num ON ips (begin_ip_num); CREATE INDEX index_ips_on_end_ip_num ON ips (end_ip_num );

正在使用的查询是:

SELECT ips.* FROM ips WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;

问题是我的 BETWEEN 查询只使用 begin_ip_num 上的索引.使用索引后,它使用 end_ip_num 过滤结果.这是 EXPLAIN ANALYZE 结果:

The problem is that my BETWEEN query is only using the index on begin_ip_num. After using the index, it filters the result using end_ip_num. Here's the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1) Index Cond: (3065106743::bigint >= begin_ip_num) Filter: (3065106743::bigint <= end_ip_num) Rows Removed by Filter: 47596 Total runtime: 16.425 ms

我已经尝试过各种索引组合,包括在 begin_ip_num 和 end_ip_num 上添加复合索引.

I've already tried various combinations of indices including adding a composite index on both begin_ip_num and end_ip_num.

推荐答案

试试 多列索引,但第二列的顺序相反:

Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

排序与单列索引几乎无关,因为它几乎可以以同样快的速度向后扫描.但这对于多列索引很重要.

Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.

使用我提出的索引,Postgres 可以扫描第一列并找到地址,索引的其余部分满足第一个条件.然后,对于第一列的每个值,它可以返回满足第二个条件的所有行,直到第一个条件失败.然后跳转到第一列的下一个值,以此类推这仍然不是很有效,而且 Postgres 可能会更快,只需扫描第一个索引列并过滤第二个.很大程度上取决于您的数据分布.

With the index I propose, Postgres can scan the first column and find the address, where the rest of the index fulfills the first condition. Then it can, for each value of the first column, return all rows that fulfill the second condition, until the first one fails. Then jump to the next value of the first column, etc. This is still not very efficient and Postgres may be faster just scanning the first index column and filtering for the second. Very much depends on your data distribution.

无论哪种方式,CLUSTER使用上面的多列索引可以提高性能:

Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num

这样,满足您的第一个条件的候选人将被打包到相同或相邻的数据页中.如果第一列的每个值有很多行,则可以大大提高性能.否则几乎没有效果.(也有用于此目的的非阻塞外部工具:pg_repack 或 pg_squeeze.)

This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. Can help performance a lot with if you have lots of rows per value of the first column. Else it is hardly effective. (There are also non-blocking external tools for the purpose: pg_repack or pg_squeeze.)

另外,autovacuum 运行和配置是否正确或有您在桌子上运行 ANALYZE?您需要 Postgres 的当前统计信息来选择合适的查询计划.

Also, is autovacuum running and configured properly or have you run ANALYZE on the table? You need current statistics for Postgres to pick appropriate query plans.

真正有用的是 GiST 索引 用于 int8range 列,自 PostgreSQL 9.2 起可用.

What would really help here is a GiST index for a int8range column, available since PostgreSQL 9.2.

进一步阅读:

  • 针对一系列时间戳(两列)优化查询

如果您的 IP 范围可以被 内置网络类型 inet 或 cidr,考虑替换你的两个 bigint 列.或者,更好的是,看看 Andrew Gierth (不在标准分布中.索引策略会相应更改.

If your IP ranges can be covered with one of the built-in network types inet or cidr, consider to replace your two bigint columns. Or, better yet, look to the additional module ip4r by Andrew Gierth (not in the standard distribution. The indexing strategy changes accordingly.

除此之外,您可以使用带有部分索引的复杂机制在 dba.SE 上查看此相关答案.先进的东西,但它提供了出色的性能:

Barring that, you can check out this related answer on dba.SE with using a sophisticated regime with partial indexes. Advanced stuff, but it delivers great performance:

  • 空间索引能否帮助范围-排序-限制"查询

更多推荐

PostgreSQL 索引不用于 IP 范围的查询

本文发布于:2023-10-17 02:46:32,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1499544.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:索引   PostgreSQL   IP

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!