WHERE子句不同组合的索引策略，包括文字样式

编程入门行业动态更新时间:2024-10-24 16:26:29

本文介绍了WHERE子句不同组合的索引策略，包括文字样式的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

此处继续其他问题：

如何获取date_part查询以命中索引？

在执行以下查询时，它命中了复合索引在datelocal，视图，印象，性别，年龄组字段上创建：

When executing the following query, it hits a compound index I created on the datelocal, views, impressions, gender, agegroup fields:

SELECT date_part('hour', datelocal) AS hour , SUM(views) FILTER (WHERE gender = 'male') AS male , SUM(views) FILTER (WHERE gender = 'female') AS female FROM reportimpression WHERE datelocal >= '2019-02-01' AND datelocal < '2019-03-01' GROUP BY 1 ORDER BY 1;

但是，我也希望能够根据例如，在哪里：

However, I'd like to be able to also filter this query down based on additional clauses in the WHERE, for example:

第二个查询比第一个查询慢很多，尽管它在更少的记录上运行

This second query is MUCH slower than the first, although it should be operating on far fewer records, in addition to the fact that it doesn't hit my index.

表模式：

CREATE TABLE reportimpression ( datelocal timestamp without time zone, devicename text, network text, sitecode text, advertisername text, mediafilename text, gender text, agegroup text, views integer, impressions integer, dwelltime numeric ); -- Indices ------------------------------------------------------- CREATE INDEX reportimpression_datelocal_index ON reportimpression(datelocal timestamp_ops); CREATE INDEX reportimpression_viewership_index ON reportimpression(datelocal timestamp_ops,views int4_ops,impressions int4_ops,gender text_ops,agegroup text_ops); CREATE INDEX reportimpression_test_index ON reportimpression(datelocal timestamp_ops,(date_part('hour'::text, datelocal)) float8_ops);

分析输出：

Finalize GroupAggregate (cost=1005368.37..1005385.70 rows=3151 width=24) (actual time=70615.636..70615.649 rows=24 loops=1) Group Key: (date_part('hour'::text, datelocal)) -> Sort (cost=1005368.37..1005369.94 rows=3151 width=24) (actual time=70615.631..70615.634 rows=48 loops=1) Sort Key: (date_part('hour'::text, datelocal)) Sort Method: quicksort Memory: 28kB -> Gather (cost=1005005.62..1005331.75 rows=3151 width=24) (actual time=70615.456..70641.208 rows=48 loops=1) Workers Planned: 1 Workers Launched: 1 -> Partial HashAggregate (cost=1004005.62..1004016.65 rows=3151 width=24) (actual time=70613.132..70613.152 rows=24 loops=2) Group Key: date_part('hour'::text, datelocal) -> Parallel Seq Scan on reportimpression (cost=0.00..996952.63 rows=2821195 width=17) (actual time=0.803..69876.914 rows=2429159 loops=2) Filter: ((datelocal >= '2019-02-01 00:00:00'::timestamp without time zone) AND (datelocal < '2019-03-01 00:00:00'::timestamp without time zone) AND (network ~~ '%'::text)) Rows Removed by Filter: 6701736 Planning time: 0.195 ms Execution time: 70641.349 ms

我是否需要创建其他索引，调整SELECT或完全其他？

Do I need to create additional indexes, tweak my SELECT, or something else entirely?

推荐答案

您添加的谓词使用 Like 运算符：

Your added predicate uses the LIKE operator:

AND network LIKE '%'

实际查询计划取决于您通过的内容而不是'％'。但是，通常，纯btree索引对此没有用。您将需要一个三字母组合索引或使用文本搜索基础结构或类似工具，具体取决于您要寻找的模式。

The actual query plan depends on what you pass instead of '%'. But, generally, plain btree indexes are useless for this. You'll need a trigram index or use the text search infrastructure or similar, depending on what patterns you might be looking for.

请参阅：