I am aggregating data from a Postgres table. The query takes approximately 2 seconds, and I want to reduce that to under a second.
Please find below the execution details:
Query
select a.search_keyword,
       hll_cardinality(hll_union_agg(a.users))::int    as user_count,
       hll_cardinality(hll_union_agg(a.sessions))::int as session_count,
       sum(a.total)                                    as keyword_count
from rollup_day a
where a.created_date between '2018-09-01' and '2019-09-30'
  and a.tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'
group by a.search_keyword
order by session_count desc
limit 100;
Table metadata
Query plan
Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=1722.685..1722.694 rows=100 loops=1)
  Task Count: 1
  Tasks Shown: All
  ->  Task
        Node: host=localhost port=5454 dbname=postgres
        ->  Limit (cost=64250.24..64250.49 rows=100 width=42) (actual time=1783.087..1783.106 rows=100 loops=1)
              ->  Sort (cost=64250.24..64558.81 rows=123430 width=42) (actual time=1783.085..1783.093 rows=100 loops=1)
                    Sort Key: ((hll_cardinality(hll_union_agg(sessions)))::integer) DESC
                    Sort Method: top-N heapsort  Memory: 33kB
                    ->  GroupAggregate (cost=52933.89..59532.83 rows=123430 width=42) (actual time=905.502..1724.363 rows=212633 loops=1)
                          Group Key: search_keyword
                          ->  Sort (cost=52933.89..53636.53 rows=281055 width=54) (actual time=905.483..1351.212 rows=280981 loops=1)
                                Sort Key: search_keyword
                                Sort Method: external merge  Disk: 18496kB
                                ->  Seq Scan on rollup_day a (cost=0.00..17890.22 rows=281055 width=54) (actual time=29.720..112.161 rows=280981 loops=1)
                                      Filter: ((created_date >= '2018-09-01'::date) AND (created_date <= '2019-09-30'::date) AND (tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'::uuid))
                                      Rows Removed by Filter: 225546
            Planning Time: 0.129 ms
            Execution Time: 1786.222 ms
Planning Time: 0.103 ms
Execution Time: 1722.718 ms
What I've tried
Any help would be really appreciated.
Update
Query plan after setting work_mem to 100MB
Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=1375.926..1375.935 rows=100 loops=1)
  Task Count: 1
  Tasks Shown: All
  ->  Task
        Node: host=localhost port=5454 dbname=postgres
        ->  Limit (cost=48348.85..48349.10 rows=100 width=42) (actual time=1307.072..1307.093 rows=100 loops=1)
              ->  Sort (cost=48348.85..48633.55 rows=113880 width=42) (actual time=1307.071..1307.080 rows=100 loops=1)
                    Sort Key: (sum(total)) DESC
                    Sort Method: top-N heapsort  Memory: 35kB
                    ->  GroupAggregate (cost=38285.79..43996.44 rows=113880 width=42) (actual time=941.504..1261.177 rows=172945 loops=1)
                          Group Key: search_keyword
                          ->  Sort (cost=38285.79..38858.52 rows=229092 width=54) (actual time=941.484..963.061 rows=227261 loops=1)
                                Sort Key: search_keyword
                                Sort Method: quicksort  Memory: 32982kB
                                ->  Seq Scan on rollup_day_104290 a (cost=0.00..17890.22 rows=229092 width=54) (actual time=38.803..104.350 rows=227261 loops=1)
                                      Filter: ((created_date >= '2019-01-01'::date) AND (created_date <= '2019-12-30'::date) AND (tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'::uuid))
                                      Rows Removed by Filter: 279266
            Planning Time: 0.131 ms
            Execution Time: 1308.814 ms
Planning Time: 0.112 ms
Execution Time: 1375.961 ms
Update 2
After creating an index on created_date and increasing work_mem to 120MB
create index date_idx on rollup_day(created_date);
The total number of rows is: 12,124,608
Query Plan is:
Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=2635.530..2635.540 rows=100 loops=1)
  Task Count: 1
  Tasks Shown: All
  ->  Task
        Node: host=localhost port=9702 dbname=postgres
        ->  Limit (cost=73545.19..73545.44 rows=100 width=51) (actual time=2755.849..2755.873 rows=100 loops=1)
              ->  Sort (cost=73545.19..73911.25 rows=146424 width=51) (actual time=2755.847..2755.858 rows=100 loops=1)
                    Sort Key: (sum(total)) DESC
                    Sort Method: top-N heapsort  Memory: 35kB
                    ->  GroupAggregate (cost=59173.97..67948.97 rows=146424 width=51) (actual time=2014.260..2670.732 rows=296537 loops=1)
                          Group Key: search_keyword
                          ->  Sort (cost=59173.97..60196.85 rows=409152 width=55) (actual time=2013.885..2064.775 rows=410618 loops=1)
                                Sort Key: search_keyword
                                Sort Method: quicksort  Memory: 61381kB
                                ->  Index Scan using date_idx_102913 on rollup_day_102913 a (cost=0.42..21036.35 rows=409152 width=55) (actual time=0.026..183.370 rows=410618 loops=1)
                                      Index Cond: ((created_date >= '2018-01-01'::date) AND (created_date <= '2018-12-31'::date))
                                      Filter: (tenant_id = '12850a62-19ac-477d-9cd7-837f3d716885'::uuid)
            Planning Time: 0.135 ms
            Execution Time: 2760.667 ms
Planning Time: 0.090 ms
Execution Time: 2635.568 ms
Solution
You should experiment with higher settings of work_mem until you get an in-memory sort. Of course you can only be generous with memory if your machine has enough of it.
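You don't have to change the setting globally in postgresql.conf to experiment; it can be raised for a single session or for one role. A minimal sketch (the 256MB value is illustrative, and `reporting_user` is a hypothetical role name):

```sql
-- Raise work_mem for the current session only; size it so the sort that
-- spilled to disk ("external merge  Disk: 18496kB") fits in memory.
SET work_mem = '256MB';

-- Or persist the setting for just the role that runs this report,
-- so other connections keep the default:
ALTER ROLE reporting_user SET work_mem = '256MB';
```

Note that work_mem is a per-sort-operation limit, not a per-connection one, so be conservative if many sessions run sorts concurrently.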
What would make your query way faster is if you store pre-aggregated data, either using a materialized view or a second table and a trigger on your original table that keeps the sums in the other table updated. I don't know if that is possible with your data, as I don't know what hll_cardinality and hll_union_agg are.
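If the hll columns do come from the postgresql-hll extension, its sketches can be re-unioned after pre-aggregation, so a coarser rollup is possible. A sketch of the materialized-view variant, assuming that extension and with illustrative names (`rollup_month`, `created_month`):

```sql
-- Hypothetical pre-aggregation: one row per tenant/keyword/month instead of
-- per day, so the query scans far fewer rows. Assumes the users/sessions
-- columns are hll sketches from the postgresql-hll extension, which can be
-- unioned again at query time without losing accuracy.
CREATE MATERIALIZED VIEW rollup_month AS
SELECT tenant_id,
       date_trunc('month', created_date)::date AS created_month,
       search_keyword,
       hll_union_agg(users)    AS users,
       hll_union_agg(sessions) AS sessions,
       sum(total)              AS total
FROM rollup_day
GROUP BY tenant_id, date_trunc('month', created_date), search_keyword;

-- Refresh periodically, e.g. from cron after the daily load:
REFRESH MATERIALIZED VIEW rollup_month;
```

The original query would then run against rollup_month with the same shape, trading some freshness (until the next refresh) for a much smaller input to the sort and aggregate.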