Lucene filter with docIds

Updated: 2024-10-25 20:28:40

Problem description

I'm trying to do the following: I want to create a set of candidates by querying each field separately and adding the top k matches of each query to this set. Once that is done, I need to run another query on this candidate set. Right now I implement this with a QueryWrapperFilter wrapping a BooleanQuery that matches the unique id field of each candidate document. However, this means I have to call IndexSearcher.doc().get("docId") for each candidate document before I can add it to my BooleanQuery, which is the major bottleneck. I'm only loading the docId field, via MapFieldSelector("docId").

I wanted to create my own Filter class, but I can't use the internal Lucene doc ids directly, because they are specified per segment. Any thoughts on how to approach this?
Solution

Instead of reading the stored docId, index the field (it probably already is) and use the FieldCache to retrieve docIds much faster. Then, instead of using the docIds in a BooleanQuery, try using a TermsFilter or FieldCacheTermsFilter. The documentation for the latter describes the performance trade-offs.
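To make the idea more concrete, here is a minimal, Lucene-free sketch of the mechanism the answer relies on. It is a simplified model, not Lucene's API: the class `DocIdFilterSketch` and its methods are hypothetical names introduced for illustration. It assumes each segment is represented as an array of unique id values, that top-level doc numbers are formed by concatenating segments in order, and that the hits from the per-field queries are already available as top-level doc numbers.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified model of the approach; every name here is hypothetical. */
final class DocIdFilterSketch {

    /**
     * Stand-in for a field cache: one in-memory array holding the unique id
     * value of every document, indexed by top-level doc number.
     */
    static String[] buildIdCache(List<String[]> segments) {
        List<String> cache = new ArrayList<>();
        for (String[] segment : segments) {
            // Appending segments in order mirrors "segment base + local doc number".
            for (String id : segment) {
                cache.add(id);
            }
        }
        return cache.toArray(new String[0]);
    }

    /** Collects candidate id values using array lookups only (no stored-field loads). */
    static Set<String> candidateIds(String[] idCache, int[]... topHitsPerField) {
        Set<String> ids = new HashSet<>();
        for (int[] hits : topHitsPerField) {
            for (int doc : hits) {
                ids.add(idCache[doc]);
            }
        }
        return ids;
    }

    /** Stand-in for a terms filter: sets a bit for every doc whose id is accepted. */
    static BitSet filterBits(String[] idCache, Set<String> accepted) {
        BitSet bits = new BitSet(idCache.length);
        for (int doc = 0; doc < idCache.length; doc++) {
            if (accepted.contains(idCache[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

In this model, `buildIdCache` plays the role the answer assigns to the FieldCache, and `filterBits` plays the role of a TermsFilter or FieldCacheTermsFilter built from the candidate ids. Because candidates are identified by their id values rather than by internal doc numbers, the per-segment numbering stops being an obstacle.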

Published: 2023-10-13 20:34:22

Link: https://www.elefans.com/category/jswz/34/1489005.html
