How to collect docs in order in the collect() method of a custom aggregator?

Updated: 2024-10-25 00:27:40

I am developing a custom aggregator on ES 1.3.4. It extends the NumericMetricsAggregator.MultiValue class, and its code structure closely resembles that of the Stats aggregator. For my requirements, I need to receive doc IDs in ascending order in the overridden collect() method. For most queries, I do get the doc IDs in ascending order. Interestingly, for bool should queries with multiple clauses, I get the doc IDs in descending order! How can I fix this? Is this a bug?

Best answer

I asked the same question on GitHub and got an answer that worked for me. Here's the solution:

You can call aggregationContext.ensureScoreDocsInOrder() to make sure that docs are going to come in order. Have a look, for instance, at ReverseNestedAggregator, which uses this method.

Queries are indeed allowed to emit documents out-of-order if allowed to do so and if it makes things faster. I believe the only case when it happens today is when you get Lucene's BooleanScorer which is used for top-level disjunctions, so your observation makes sense.

Link to the issue: https://github.com/elasticsearch/elasticsearch/issues/8216
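
To illustrate the idea without depending on the Elasticsearch jars, here is a minimal, self-contained Java sketch. Everything in it is a hypothetical stand-in: FakeAggregationContext, OrderSensitiveAggregator and deliver() are invented for illustration and are not Elasticsearch classes. Only the method name ensureScoreDocsInOrder() is taken from the answer above. The sketch assumes the aggregator's constructor is a reasonable place to make the call, and it adds a defensive check in collect() so that an ordering violation fails loudly rather than producing a wrong result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InOrderCollectSketch {

    /** Hypothetical stand-in for the aggregation context; not the real Elasticsearch class. */
    static class FakeAggregationContext {
        private boolean scoreDocsInOrder = false;

        // Same name as the method quoted in the answer; here it only records the request.
        void ensureScoreDocsInOrder() {
            scoreDocsInOrder = true;
        }

        boolean scoreDocsInOrder() {
            return scoreDocsInOrder;
        }
    }

    /** Hypothetical aggregator whose collect() relies on ascending doc IDs. */
    static class OrderSensitiveAggregator {
        private final List<Integer> seen = new ArrayList<>();
        private int lastDoc = -1;

        OrderSensitiveAggregator(FakeAggregationContext context) {
            // Ask for in-order delivery once, before any document is collected.
            context.ensureScoreDocsInOrder();
        }

        void collect(int doc) {
            // Fail fast instead of silently computing a wrong result.
            if (doc <= lastDoc) {
                throw new IllegalStateException("doc " + doc + " arrived after doc " + lastDoc);
            }
            lastDoc = doc;
            seen.add(doc);
        }

        List<Integer> seen() {
            return seen;
        }
    }

    /**
     * Toy search loop. Sorting is only a stand-in for the real engine
     * choosing a scorer that emits documents in doc ID order.
     */
    static void deliver(FakeAggregationContext context, int[] matches,
                        OrderSensitiveAggregator aggregator) {
        int[] docs = matches.clone();
        if (context.scoreDocsInOrder()) {
            Arrays.sort(docs);
        }
        for (int doc : docs) {
            aggregator.collect(doc);
        }
    }

    public static void main(String[] args) {
        FakeAggregationContext context = new FakeAggregationContext();
        OrderSensitiveAggregator aggregator = new OrderSensitiveAggregator(context);
        // The matches are listed out of order on purpose.
        deliver(context, new int[] {7, 3, 5}, aggregator);
        System.out.println(aggregator.seen());
    }
}
```

In a real ES 1.3.4 aggregator, the equivalent would be a single aggregationContext.ensureScoreDocsInOrder() call on the context passed to your aggregator; check ReverseNestedAggregator in the source tree for the exact placement.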

Published: 2023-07-15 19:03:00

Article link: https://www.elefans.com/category/jswz/34/1117556.html