PostgreSQL是否可以快速搜索带有字符串数组的列？(Does PostgreSQL quickly search for columns with arrays of strings?)

编程入门行业动态更新时间:2024-10-27 01:20:30

根据Can PostgreSQL索引数组列？，PostgreSQL可以索引数组列。

它能否像对非数组类型一样有效地搜索数组列？

例如，假设您有一个问题表中的一行（如SO）：

title: ... content:... tags: [ 'postgresql', 'indexing', 'arrays' ]

并且您想要使用标签'postgresql'找到问题。将关系存储在连接表中会更快搜索吗？

是的，每列都有一个索引。

According to Can PostgreSQL index array columns?, PostgreSQL can index array columns.

Can it do searches on an array column as efficiently as it does for non array types?

For example, suppose you have a row from a questions table (like SO):

title: ... content:... tags: [ 'postgresql', 'indexing', 'arrays' ]

And you want to find questions with the tag 'postgresql'. Would storing the relationship in a join table be faster for searching?

And yes, each column would have an index.

最满意答案

GIN和GiST索引通常比简单的b树更大，扫描时间更长。 GIN比GiST快，代价是非常昂贵的更新。

如果将标记存储在数组列中，则对该行的任何更新通常都需要更新数组上的索引。在某些情况下，HOT会允许跳过这个，但这不是你可以依赖的东西。所以你将有更多的索引更新和更多的索引膨胀。

另一方面，您无需扫描b树以获取所需对象的ID，然后通过连接从主表中获取它们。您还通过使用数组节省了相当多的空间，而不是为连接表中的每个标记支付每行28字节的开销。

如果主表上的插入和更新速率相当低 - 包括对标记的更改 - 那么GIN可能是一个合适的选择。否则，我可能会在连接表上找到一个典型的b-tree，在(tag, object_id)上有一个辅助索引(tag, object_id)这样只能使用索引扫描来查找具有给定标记的对象。

最后，最好的办法是对其进行基准测试，以模拟您的工作量 。

GIN and GiST indexes are generally bigger than a simple b-tree, and take longer to scan. GIN is faster than GiST at the cost of very expensive updates.

If you store your tags in an array column then any update to the row will generally require an update to the index on the array. Under some circumstances HOT will permit this to be skipped, but it's not something you can rely on. So you'll have more index updates and more index bloat.

On the other hand, you're avoiding the need to scan a b-tree for the IDs of the desired objects then fetch them from the main table via a join. You're also saving a fair bit of space by using an array instead of paying the 28 byte per row overhead for each tag in the join table.

If your insert and update rate on the main table is fairly low - including changes to tags - then GIN might be a suitable choice. Otherwise I'd probably go for a typical b-tree on a join table, with a secondary index on (tag, object_id) so that index-only scans can be used to look up the object(s) that have a given tag.

In the end the best thing to do is benchmark it for a simulation of your workload.

更多推荐

本文发布于:2023-07-31 21:27:00，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1348983.html