Improving the performance of Postgres jsonb queries combined with relational queries

Problem description

I have a single SELECT that queries both regular postgres tables and a jsonb column. When I SELECT the entire jsonb column, the query is fast (574 ms). However, when I instead select a top-level path of the same jsonb column, the query slows down by 6x (3241 ms). My final query needs to access string array values from 4 of these top-level jsonb paths, which slows the query right down to 5 seconds.
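Stripped to its essentials, the difference is only in the select list. As a minimal sketch (the timings quoted above were measured on the full query shown further below, not on these cut-down statements):

-- Fast: select the whole jsonb column
SELECT cfiles.property_values FROM cfiles WHERE cfiles.tid = 5;

-- Slow: extract a single top-level path as text
SELECT cfiles.property_values #>> '{"Sample Names"}' FROM cfiles WHERE cfiles.tid = 5;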

I have about 50K records in the cfiles table, and the jsonb column cfiles.property_values is structured like this:

{ "Sample Names":["up to 200 short strings..."], "Project IDs": ["up to 10 short strings..."], "Run IDs": ["up to 10 short strings..."], "Data Type": ["up to 10 short strings..."] }

Following this answer, I tried adding the GIN index below, but it had very little effect (run times in the comments in the query below); I assume this is because my query is not pure json using the @> operator but is combined with a relational query.

CREATE INDEX ON cfiles USING GIN (property_values jsonb_path_ops);
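For what it's worth, a jsonb_path_ops GIN index can only serve containment-style operators such as @>; it is never consulted for #>> extraction in a select list. A hypothetical predicate it could serve would look something like this (values made up):

-- Containment filter that the jsonb_path_ops GIN index could use.
-- The actual query extracts values with #>> instead, so the index sits idle.
SELECT id FROM cfiles WHERE property_values @> '{"Data Type": ["some value"]}';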

I'm surprised at the huge difference between fetching the whole column and querying even just the top-level json keys. At this point, it seems more performant to fetch the whole jsonb column as a string, split it on the commas, and chomp the quotes, which is a hack I'd prefer to avoid.

I'm aiming for under 2000 ms. Are there any other ways I can improve the performance? (Query, table, and EXPLAIN output below.) Update: using PostgreSQL version 12.

SELECT
  -- FAST OPTION: getting all of json: no GIN=579ms; with GIN=574ms
  cfiles.property_values as "1907",
  -- == vs ==
  -- SLOW OPTION: getting a json path: no GIN=3273ms; with GIN=3241ms
  cfiles.property_values #>> '{"Sample Names"}' as "1907",
  -- adding another path: with GIN=4028ms
  cfiles.property_values #>> '{"Project IDs"}' as "1908",
  -- adding yet another path: with GIN=4774ms
  cfiles.property_values #>> '{"Run IDs"}' as "1909",
  -- adding yet another path: with GIN=5558ms
  cfiles.property_values #>> '{"Data Type"}' as "1910",
  -- ==== rest of query below I can't change ====
  user_permissions.notified_at::text as "111",
  group_permissions.notified_at::text as "112",
  user_permissions.task_id::text as "113",
  group_permissions.task_id::text as "114",
  datasets.id as "151",
  datasets.name as "154",
  datasets.path as "155",
  datasets.last_modified as "156",
  datasets.file_count as "157",
  datasets.locked as "158",
  datasets.content_types as "159",
  cfiles.name as "105",
  cfiles.last_modified as "107",
  pg_size_pretty(cfiles.size::bigint) as "106",
  cfiles.id as "101",
  cfiles.tid as "102",
  cfiles.uuid as "103",
  cfiles.path as "104",
  cfiles.content_type as "108",
  cfiles.locked as "109",
  cfiles.checksum as "110"
FROM cfiles
JOIN datasets ON datasets.id=cfiles.dataset_id
LEFT JOIN user_permissions ON (user_permissions.cfile_id=cfiles.id OR user_permissions.dataset_id=datasets.id)
LEFT JOIN users on users.id=user_permissions.user_id
LEFT JOIN group_permissions ON (group_permissions.cfile_id=cfiles.id OR group_permissions.dataset_id=datasets.id)
LEFT JOIN groups ON groups.id=group_permissions.group_id
LEFT JOIN user_groups ON groups.id=user_groups.group_id
LEFT JOIN picklist_cfiles ON picklist_cfiles.cfile_id=cfiles.id
WHERE cfiles.tid=5
ORDER BY "107" desc
LIMIT 20 OFFSET 0

Table "public.cfiles" Column | Type | Collation | Nullable | Default -----------------+-----------------------------+-----------+----------+------------------------------------ id | bigint | | not null | nextval('cfiles_id_seq'::regclass) tid | bigint | | not null | uuid | uuid | | not null | gen_random_uuid() dataset_id | bigint | | not null | path | character varying | | not null | name | character varying | | | checksum | character varying | | | size | bigint | | | last_modified | timestamp without time zone | | | content_type | character varying | | | locked | boolean | | not null | false property_values | jsonb | | | created_at | timestamp without time zone | | not null | updated_at | timestamp without time zone | | not null | Indexes: "cfiles_pkey" PRIMARY KEY, btree (id) "cfiles_property_values_idx" gin (property_values jsonb_path_ops) "index_cfiles_dataset_id_path" UNIQUE, btree (dataset_id, path) "index_cfiles_name" btree (name) "index_cfiles_tid" btree (tid) "index_cfiles_uuid_id_path" UNIQUE, btree (uuid) Foreign-key constraints: "cfiles_datasets_fk" FOREIGN KEY (dataset_id) REFERENCES datasets(id) "cfiles_tenants_fk" FOREIGN KEY (tid) REFERENCES tenants(id) Referenced by: TABLE "group_permissions" CONSTRAINT "group_permissions_cfiles_fk" FOREIGN KEY (cfile_id) REFERENCES cfiles(id) TABLE "picklist_cfiles" CONSTRAINT "picklist_cfiles_cfiles_fk" FOREIGN KEY (cfile_id) REFERENCES cfiles(id) TABLE "user_permissions" CONSTRAINT "user_permissions_cfiles_fk" FOREIGN KEY (cfile_id) REFERENCES cfiles(id)

Slow query plan:

Limit  (cost=13700.06..13700.11 rows=20 width=662) (actual time=5702.511..5702.521 rows=20 loops=1)
  Output: ((cfiles.property_values #>> '{"Sample Names"}'::text[])), ((cfiles.property_values #>> '{"Project IDs"}'::text[])), ((cfiles.property_values #>> '{"Run IDs"}'::text[])), ((cfiles.property_values #>> '{"Data Type"}'::text[])), ((user_permissions.notified_at)::text), ((group_permissions.notified_at)::text), ((user_permissions.task_id)::text), ((group_permissions.task_id)::text), datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, (pg_size_pretty(cfiles.size)), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
  ->  Sort  (cost=13700.06..13810.61 rows=44219 width=662) (actual time=5702.508..5702.512 rows=20 loops=1)
        Output: ((cfiles.property_values #>> '{"Sample Names"}'::text[])), ((cfiles.property_values #>> '{"Project IDs"}'::text[])), ((cfiles.property_values #>> '{"Run IDs"}'::text[])), ((cfiles.property_values #>> '{"Data Type"}'::text[])), ((user_permissions.notified_at)::text), ((group_permissions.notified_at)::text), ((user_permissions.task_id)::text), ((group_permissions.task_id)::text), datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, (pg_size_pretty(cfiles.size)), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
        Sort Key: cfiles.last_modified DESC
        Sort Method: top-N heapsort  Memory: 344kB
        ->  Hash Left Join  (cost=39.53..12523.41 rows=44219 width=662) (actual time=2.535..5526.409 rows=44255 loops=1)
              Output: (cfiles.property_values #>> '{"Sample Names"}'::text[]), (cfiles.property_values #>> '{"Project IDs"}'::text[]), (cfiles.property_values #>> '{"Run IDs"}'::text[]), (cfiles.property_values #>> '{"Data Type"}'::text[]), (user_permissions.notified_at)::text, (group_permissions.notified_at)::text, (user_permissions.task_id)::text, (group_permissions.task_id)::text, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, pg_size_pretty(cfiles.size), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
              Hash Cond: (cfiles.id = picklist_cfiles.cfile_id)
              ->  Nested Loop Left Join  (cost=38.19..10918.99 rows=44219 width=867) (actual time=1.639..632.739 rows=44255 loops=1)
                    Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, user_permissions.notified_at, user_permissions.task_id, group_permissions.notified_at, group_permissions.task_id
                    Join Filter: ((user_permissions.cfile_id = cfiles.id) OR (user_permissions.dataset_id = datasets.id))
                    Rows Removed by Join Filter: 177020
                    ->  Nested Loop Left Join  (cost=38.19..7822.61 rows=44219 width=851) (actual time=1.591..464.449 rows=44255 loops=1)
                          Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, group_permissions.notified_at, group_permissions.task_id
                          Join Filter: ((group_permissions.cfile_id = cfiles.id) OR (group_permissions.dataset_id = datasets.id))
                          Rows Removed by Join Filter: 354040
                          ->  Hash Join  (cost=35.75..4723.32 rows=44219 width=835) (actual time=1.301..163.411 rows=44255 loops=1)
                                Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
                                Inner Unique: true
                                Hash Cond: (cfiles.dataset_id = datasets.id)
                                ->  Seq Scan on public.cfiles  (cost=0.00..4570.70 rows=44219 width=644) (actual time=0.044..49.425 rows=44255 loops=1)
                                      Output: cfiles.id, cfiles.tid, cfiles.uuid, cfiles.dataset_id, cfiles.path, cfiles.name, cfiles.checksum, cfiles.size, cfiles.last_modified, cfiles.content_type, cfiles.locked, cfiles.property_values, cfiles.created_at, cfiles.updated_at
                                      Filter: (cfiles.tid = 5)
                                      Rows Removed by Filter: 1561
                                ->  Hash  (cost=28.11..28.11 rows=611 width=199) (actual time=1.234..1.235 rows=611 loops=1)
                                      Output: datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
                                      Buckets: 1024  Batches: 1  Memory Usage: 149kB
                                      ->  Seq Scan on public.datasets  (cost=0.00..28.11 rows=611 width=199) (actual time=0.012..0.571 rows=611 loops=1)
                                            Output: datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
                          ->  Materialize  (cost=2.44..3.97 rows=4 width=32) (actual time=0.000..0.002 rows=8 loops=44255)
                                Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
                                ->  Hash Right Join  (cost=2.44..3.95 rows=4 width=32) (actual time=0.170..0.248 rows=8 loops=1)
                                      Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
                                      Hash Cond: (user_groups.group_id = groups.id)
                                      ->  Seq Scan on public.user_groups  (cost=0.00..1.34 rows=34 width=8) (actual time=0.022..0.056 rows=34 loops=1)
                                            Output: user_groups.id, user_groups.tid, user_groups.user_id, user_groups.group_id, user_groups.created_at, user_groups.updated_at
                                      ->  Hash  (cost=2.39..2.39 rows=4 width=40) (actual time=0.121..0.121 rows=4 loops=1)
                                            Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                            ->  Hash Right Join  (cost=1.09..2.39 rows=4 width=40) (actual time=0.063..0.092 rows=4 loops=1)
                                                  Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
                                                  Hash Cond: (groups.id = group_permissions.group_id)
                                                  ->  Seq Scan on public.groups  (cost=0.00..1.19 rows=19 width=8) (actual time=0.010..0.017 rows=19 loops=1)
                                                        Output: groups.id, groups.tid, groups.name, groups.description, groups.default_uview, groups.created_at, groups.updated_at
                                                  ->  Hash  (cost=1.04..1.04 rows=4 width=40) (actual time=0.032..0.033 rows=4 loops=1)
                                                        Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                        ->  Seq Scan on public.group_permissions  (cost=0.00..1.04 rows=4 width=40) (actual time=0.017..0.022 rows=4 loops=1)
                                                              Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
                    ->  Materialize  (cost=0.00..1.06 rows=4 width=40) (actual time=0.000..0.001 rows=4 loops=44255)
                          Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
                          ->  Seq Scan on public.user_permissions  (cost=0.00..1.04 rows=4 width=40) (actual time=0.021..0.025 rows=4 loops=1)
                                Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
              ->  Hash  (cost=1.15..1.15 rows=15 width=8) (actual time=0.040..0.040 rows=15 loops=1)
                    Output: picklist_cfiles.cfile_id
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on public.picklist_cfiles  (cost=0.00..1.15 rows=15 width=8) (actual time=0.010..0.017 rows=15 loops=1)
                          Output: picklist_cfiles.cfile_id
Planning Time: 3.141 ms
Execution Time: 5702.799 ms

Update: Refactoring to a CTE pattern got me down to 20 ms.

WITH T as (
  select
    cfiles.property_values as prop_vals,
    user_permissions.notified_at::text as "111",
    group_permissions.notified_at::text as "112",
    user_permissions.task_id::text as "113",
    group_permissions.task_id::text as "114",
    datasets.id as "151",
    datasets.name as "154",
    datasets.path as "155",
    datasets.last_modified as "156",
    datasets.file_count as "157",
    datasets.locked as "158",
    datasets.content_types as "159",
    cfiles.name as "105",
    cfiles.last_modified as "107",
    pg_size_pretty(cfiles.size::bigint) as "106",
    cfiles.id as "101",
    cfiles.tid as "102",
    cfiles.uuid as "103",
    cfiles.path as "104",
    cfiles.content_type as "108",
    cfiles.locked as "109",
    cfiles.checksum as "110"
  FROM cfiles
  JOIN datasets ON datasets.id=cfiles.dataset_id
  LEFT JOIN user_permissions ON (user_permissions.cfile_id=cfiles.id OR user_permissions.dataset_id=datasets.id)
  LEFT JOIN users on users.id=user_permissions.user_id
  LEFT JOIN group_permissions ON (group_permissions.cfile_id=cfiles.id OR group_permissions.dataset_id=datasets.id)
  LEFT JOIN groups ON groups.id=group_permissions.group_id
  LEFT JOIN user_groups ON groups.id=user_groups.group_id
  LEFT JOIN picklist_cfiles ON picklist_cfiles.cfile_id=cfiles.id
  WHERE cfiles.tid=5
  LIMIT 20
)
SELECT
  prop_vals ->> 'Sample Names' as "1907",
  prop_vals ->> 'Project IDs' as "1908",
  prop_vals ->> 'Run IDs' as "1909",
  prop_vals ->> 'Data Type' as "1910",
  "111", "112", "113", "114",
  "151", "154", "155", "156", "157", "158", "159",
  "105", "107", "106", "101", "102", "103", "104", "108", "109", "110"
FROM T
ORDER BY "107" desc;

CTE query plan:

Sort  (cost=16.18..16.23 rows=20 width=662) (actual time=18.771..18.779 rows=20 loops=1)
  Output: ((t.prop_vals ->> 'Sample Names'::text)), ((t.prop_vals ->> 'Project IDs'::text)), ((t.prop_vals ->> 'Run IDs'::text)), ((t.prop_vals ->> 'Data Type'::text)), t."111", t."112", t."113", t."114", t."151", t."154", t."155", t."156", t."157", t."158", t."159", t."105", t."107", t."106", t."101", t."102", t."103", t."104", t."108", t."109", t."110"
  Sort Key: t."107" DESC
  Sort Method: quicksort  Memory: 368kB
  ->  Subquery Scan on t  (cost=4.05..15.74 rows=20 width=662) (actual time=1.091..18.412 rows=20 loops=1)
        Output: (t.prop_vals ->> 'Sample Names'::text), (t.prop_vals ->> 'Project IDs'::text), (t.prop_vals ->> 'Run IDs'::text), (t.prop_vals ->> 'Data Type'::text), t."111", t."112", t."113", t."114", t."151", t."154", t."155", t."156", t."157", t."158", t."159", t."105", t."107", t."106", t."101", t."102", t."103", t."104", t."108", t."109", t."110"
        ->  Limit  (cost=4.05..15.34 rows=20 width=987) (actual time=0.320..1.241 rows=20 loops=1)
              Output: cfiles.property_values, ((user_permissions.notified_at)::text), ((group_permissions.notified_at)::text), ((user_permissions.task_id)::text), ((group_permissions.task_id)::text), datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, (pg_size_pretty(cfiles.size)), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
              ->  Nested Loop Left Join  (cost=4.05..24965.23 rows=44219 width=987) (actual time=0.318..1.224 rows=20 loops=1)
                    Output: cfiles.property_values, (user_permissions.notified_at)::text, (group_permissions.notified_at)::text, (user_permissions.task_id)::text, (group_permissions.task_id)::text, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, pg_size_pretty(cfiles.size), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
                    Join Filter: ((user_permissions.cfile_id = cfiles.id) OR (user_permissions.dataset_id = datasets.id))
                    Rows Removed by Join Filter: 80
                    ->  Nested Loop Left Join  (cost=4.05..20873.92 rows=44219 width=851) (actual time=0.273..1.056 rows=20 loops=1)
                          Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, group_permissions.notified_at, group_permissions.task_id
                          Join Filter: ((group_permissions.cfile_id = cfiles.id) OR (group_permissions.dataset_id = datasets.id))
                          Rows Removed by Join Filter: 160
                          ->  Nested Loop  (cost=1.61..17774.63 rows=44219 width=835) (actual time=0.125..0.745 rows=20 loops=1)
                                Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
                                Inner Unique: true
                                ->  Hash Left Join  (cost=1.34..4738.00 rows=44219 width=644) (actual time=0.094..0.475 rows=20 loops=1)
                                      Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, cfiles.dataset_id
                                      Hash Cond: (cfiles.id = picklist_cfiles.cfile_id)
                                      ->  Seq Scan on public.cfiles  (cost=0.00..4570.70 rows=44219 width=644) (actual time=0.046..0.360 rows=20 loops=1)
                                            Output: cfiles.id, cfiles.tid, cfiles.uuid, cfiles.dataset_id, cfiles.path, cfiles.name, cfiles.checksum, cfiles.size, cfiles.last_modified, cfiles.content_type, cfiles.locked, cfiles.property_values, cfiles.created_at, cfiles.updated_at
                                            Filter: (cfiles.tid = 5)
                                            Rows Removed by Filter: 629
                                      ->  Hash  (cost=1.15..1.15 rows=15 width=8) (actual time=0.034..0.035 rows=15 loops=1)
                                            Output: picklist_cfiles.cfile_id
                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                            ->  Seq Scan on public.picklist_cfiles  (cost=0.00..1.15 rows=15 width=8) (actual time=0.010..0.018 rows=15 loops=1)
                                                  Output: picklist_cfiles.cfile_id
                                ->  Index Scan using datasets_pkey on public.datasets  (cost=0.28..0.29 rows=1 width=199) (actual time=0.008..0.008 rows=1 loops=20)
                                      Output: datasets.id, datasets.tid, datasets.bucket_path_id, datasets.path, datasets.name, datasets.last_modified, datasets.file_count, datasets.size, datasets.content_types, datasets.locked, datasets.created_at, datasets.updated_at
                                      Index Cond: (datasets.id = cfiles.dataset_id)
                          ->  Materialize  (cost=2.44..3.97 rows=4 width=32) (actual time=0.005..0.009 rows=8 loops=20)
                                Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
                                ->  Hash Right Join  (cost=2.44..3.95 rows=4 width=32) (actual time=0.088..0.122 rows=8 loops=1)
                                      Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
                                      Hash Cond: (user_groups.group_id = groups.id)
                                      ->  Seq Scan on public.user_groups  (cost=0.00..1.34 rows=34 width=8) (actual time=0.007..0.016 rows=34 loops=1)
                                            Output: user_groups.id, user_groups.tid, user_groups.user_id, user_groups.group_id, user_groups.created_at, user_groups.updated_at
                                      ->  Hash  (cost=2.39..2.39 rows=4 width=40) (actual time=0.069..0.069 rows=4 loops=1)
                                            Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                            ->  Hash Right Join  (cost=1.09..2.39 rows=4 width=40) (actual time=0.043..0.064 rows=4 loops=1)
                                                  Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
                                                  Hash Cond: (groups.id = group_permissions.group_id)
                                                  ->  Seq Scan on public.groups  (cost=0.00..1.19 rows=19 width=8) (actual time=0.006..0.011 rows=19 loops=1)
                                                        Output: groups.id, groups.tid, groups.name, groups.description, groups.default_uview, groups.created_at, groups.updated_at
                                                  ->  Hash  (cost=1.04..1.04 rows=4 width=40) (actual time=0.022..0.022 rows=4 loops=1)
                                                        Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                        ->  Seq Scan on public.group_permissions  (cost=0.00..1.04 rows=4 width=40) (actual time=0.009..0.014 rows=4 loops=1)
                                                              Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
                    ->  Materialize  (cost=0.00..1.06 rows=4 width=40) (actual time=0.001..0.003 rows=4 loops=20)
                          Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
                          ->  Seq Scan on public.user_permissions  (cost=0.00..1.04 rows=4 width=40) (actual time=0.018..0.022 rows=4 loops=1)
                                Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
Planning Time: 4.049 ms
Execution Time: 19.128 ms

Answer

Your slow query is deTOASTing the large jsonb data for all 44255 rows, and then carrying the parsed-out values through the sort just to pick out the top 20 rows. (I don't know why it deTOASTs eagerly like that.) So 44235 jsonb values were deTOASTed just to be thrown away.

Your fast query is (presumably) returning TOAST pointers from the hash join, sorting the rows using those small pointers, and then deTOASTing only the 20 survivors. In the case of EXPLAIN ANALYZE, it doesn't even deTOAST the survivors; it just throws the pointers away.
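If you want to confirm that TOAST is in play, a rough check along these lines (assuming you can run ad-hoc queries) shows the stored size of the jsonb values; anything much over ~2kB is held out of line in the TOAST table and must be fetched and decompressed before a path can be extracted from it:

-- pg_column_size reports the stored (possibly compressed) size per value.
SELECT min(pg_column_size(property_values)) AS min_bytes,
       round(avg(pg_column_size(property_values))) AS avg_bytes,
       max(pg_column_size(property_values)) AS max_bytes
FROM cfiles
WHERE tid = 5;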

That is the "why". As for what to do about it: if you truly can't alter any of the query below the very top part, I doubt there is anything you can do about it on the server side.

If you can modify the query more substantially, then you can improve the run time with a CTE: have the CTE select the entire jsonb, and have the outer SELECT on the CTE pull the values out of it.

WITH T as (
  select cfiles.property_values as "1907", <rest of query>
)
SELECT "1907"->>'name1', "1907"->>'name2', <rest of select list>
from T;
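That way the jsonb column travels through the joins, sort, and LIMIT as a whole value (or a small TOAST pointer), and the path extraction runs only on the rows that survive, which is consistent with the ~20 ms plan in the update above, where the extraction happens above the Limit node over just 20 rows.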
