ElasticSearch: counting documents grouped by multiple fields

Updated: 2024-10-22 13:40:07

Problem description

I have documents like

{"domain":"US", "zipcode":"11111", "eventType":"click", "id":"1", "time":100}
{"domain":"US", "zipcode":"22222", "eventType":"sell", "id":"2", "time":200}
{"domain":"US", "zipcode":"22222", "eventType":"click", "id":"3", "time":150}
{"domain":"US", "zipcode":"11111", "eventType":"sell", "id":"4", "time":350}
{"domain":"US", "zipcode":"33333", "eventType":"sell", "id":"5", "time":225}
{"domain":"EU", "zipcode":"44444", "eventType":"click", "id":"5", "time":120}

I want to filter these documents by eventType=sell and by time between 125 and 400, group them by domain and then by zipcode, and count the documents in each bucket. The output should look like this (the three click documents are excluded by the filters):

US, 11111, 1

US, 22222, 1

US, 33333, 1

In SQL, this should have been straightforward. But I am not able to get this to work on ElasticSearch. Could someone please help me out here?

How do I write an ElasticSearch query to accomplish the above?

Solution

This query seems to do what you want:

POST /test_index/_search
{
   "size": 0,
   "query": {
      "filtered": {
         "filter": {
            "bool": {
               "must": [
                  { "term": { "eventType": "sell" } },
                  { "range": { "time": { "gte": 125, "lte": 400 } } }
               ]
            }
         }
      }
   },
   "aggs": {
      "zipcode_terms": {
         "terms": { "field": "zipcode" }
      }
   }
}

returning

{
   "took": 8,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "zipcode_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            { "key": "11111", "doc_count": 1 },
            { "key": "22222", "doc_count": 1 },
            { "key": "33333", "doc_count": 1 }
         ]
      }
   }
}

(Note that there is only 1 "sell" at "22222", not 2).
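To see why "22222" ends up with a count of 1, the same two filters can be replayed on the six sample documents in plain Python. This is only a local sanity check, not something that talks to ElasticSearch, and the helper name `count_by_zipcode` is made up for the illustration.

```python
from collections import Counter

# The six sample documents from the question.
docs = [
    {"domain": "US", "zipcode": "11111", "eventType": "click", "id": "1", "time": 100},
    {"domain": "US", "zipcode": "22222", "eventType": "sell", "id": "2", "time": 200},
    {"domain": "US", "zipcode": "22222", "eventType": "click", "id": "3", "time": 150},
    {"domain": "US", "zipcode": "11111", "eventType": "sell", "id": "4", "time": 350},
    {"domain": "US", "zipcode": "33333", "eventType": "sell", "id": "5", "time": 225},
    {"domain": "EU", "zipcode": "44444", "eventType": "click", "id": "5", "time": 120},
]


def count_by_zipcode(documents, event_type="sell", gte=125, lte=400):
    """Mimic the term filter and the inclusive range filter, then count per zipcode."""
    matches = [
        d for d in documents
        if d["eventType"] == event_type and gte <= d["time"] <= lte
    ]
    return Counter(d["zipcode"] for d in matches)


counts = count_by_zipcode(docs)
for zipcode, n in sorted(counts.items()):
    print(zipcode, n)
```

The second document at "22222" is a click, so the term filter drops it before anything is counted.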

Here is some code I used to test it:

sense.qbox.io/gist/1c4cb591ab72a6f3ae681df30fe023ddfca4225b

You might want to take a look at terms aggregations, the bool filter, and range filters.

EDIT: I just realized I left out the domain part, but it should be straightforward to add in a bucket aggregation on that as well if you need to.
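For the domain part, one way to do it is to nest the zipcode terms aggregation inside a terms aggregation on domain. The sketch below builds such a request body as a Python dict and flattens a response into "domain, zipcode, count" rows. Several things here are assumptions rather than part of the answer above: the aggregation name `domain_terms`, the helpers `build_request` and `flatten_buckets`, and the hand-written `sample_response`, which only imitates the shape of a nested terms response and was not captured from a cluster. It also uses a `bool` query with a `filter` clause, since the `filtered` query was removed in ElasticSearch 5.0; on 1.x the `filtered` wrapper from the query above can be kept. The fields are assumed to be indexed so that term filters and terms aggregations see whole values (for example not_analyzed or keyword fields).

```python
import json


def build_request(event_type="sell", gte=125, lte=400):
    """Build a search body: filter first, then bucket by domain, then by zipcode."""
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"eventType": event_type}},
                    {"range": {"time": {"gte": gte, "lte": lte}}},
                ]
            }
        },
        "aggs": {
            "domain_terms": {  # hypothetical name for the outer aggregation
                "terms": {"field": "domain"},
                "aggs": {
                    "zipcode_terms": {"terms": {"field": "zipcode"}}
                },
            }
        },
    }


def flatten_buckets(response):
    """Turn nested domain/zipcode buckets into (domain, zipcode, count) rows."""
    rows = []
    for domain_bucket in response["aggregations"]["domain_terms"]["buckets"]:
        for zip_bucket in domain_bucket["zipcode_terms"]["buckets"]:
            rows.append((domain_bucket["key"], zip_bucket["key"], zip_bucket["doc_count"]))
    return rows


# Hand-written fragment in the shape of a nested terms response (an assumption,
# not real cluster output), using the counts expected for the sample documents.
sample_response = {
    "aggregations": {
        "domain_terms": {
            "buckets": [
                {
                    "key": "US",
                    "doc_count": 3,
                    "zipcode_terms": {
                        "buckets": [
                            {"key": "11111", "doc_count": 1},
                            {"key": "22222", "doc_count": 1},
                            {"key": "33333", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

print(json.dumps(build_request(), indent=2))
for domain, zipcode, count in flatten_buckets(sample_response):
    print(f"{domain}, {zipcode}, {count}")
```

The printed JSON can be sent as the body of POST /test_index/_search. Each outer bucket carries its own zipcode_terms buckets, so flattening them gives one row per (domain, zipcode) pair, which is the shape the question asked for.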

Published: 2023-11-22 13:52:27

Link: https://www.elefans.com/category/jswz/34/1617687.html