我尝试搜索解决方案,但没有找到适合我的情况...
I try to search a solution but I didn't find anything for my case...
这是数据库声明(简体):
Here is the database declaration (simplified):
CREATE TABLE documents ( document_id int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY, data_block jsonb NULL );这是插入的示例.
INSERT INTO documents (document_id, data_block) VALUES(878979, {"COMMONS": {"DATE": {"value": "2017-03-11"}}, "PAYABLE_INVOICE_LINES": [ {"AMOUNT": {"value": 52408.53}}, {"AMOUNT": {"value": 654.23}} ]}); INSERT INTO documents (document_id, data_block) VALUES(977656, {"COMMONS": {"DATE": {"value": "2018-03-11"}}, "PAYABLE_INVOICE_LINES": [ {"AMOUNT": {"value": 555.10}} ]});我要搜索PAYABLE_INVOICE_LINES之一所在行的值大于1000.00的所有文档
I want to search all documents where one of the PAYABLE_INVOICE_LINES has a line with a value greater than 1000.00
我的查询是
select * from documents d cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil where (pil->'AMOUNT'->>'value')::decimal >= 1000但是,由于要限制为50个文档,我必须对document_id进行分组,并将结果限制为50个.
But, as I want to limit to 50 documents, I have to group on the document_id and limit the result to 50.
具有数百万个文档,此查询非常昂贵...一百万个内容需要10秒.
With millions of documents, this query is very expensive... 10 seconds with 1 million.
您有一些想法可以提高性能吗?
Do you have some ideas to have better performance ?
谢谢
推荐答案代替cross join lateral使用where exists:
select * from documents d where exists ( select 1 from jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil where (pil->'AMOUNT'->>'value')::decimal >= 1000) limit 50;
更新
还有另一种方法,不仅更复杂,而且效率更高.
And yet another method, more complex but also much more efficient.
创建从您的JSONB数据返回最大值的函数,如下所示:
Create function that returns max value from your JSONB data, like this:
create function fn_get_max_PAYABLE_INVOICE_LINES_value(JSONB) returns decimal language sql as $$ select max((pil->'AMOUNT'->>'value')::decimal) from jsonb_array_elements($1 -> 'PAYABLE_INVOICE_LINES') as pil $$在此函数上创建索引:
create index idx_max_PAYABLE_INVOICE_LINES_value on documents(fn_get_max_PAYABLE_INVOICE_LINES_value(data_block));在查询中使用功能:
select * from documents d where fn_get_max_PAYABLE_INVOICE_LINES_value(data_block) > 1000 limit 50;在这种情况下,将使用索引,并且对大量数据的查询将更快.
In this case the index will be used and query will be much faster on large amount of data.
PS:通常limit与order by配对是有意义的.
PS: Usually limit have sense in pair with order by.
更多推荐
使用更大的运算符在数组中进行Postgres jsonb搜索(使用jsonb
发布评论