Multi-Language Solr Search Index

Updated: 2024-10-25 11:25:09

I am setting up a Solr search engine that will index documents in multiple languages. I created a custom UpdateProcessorFactory to figure out which sections of the input text are in which language, and then I copy those sections of the document into language-specific fields. For example, with this text:

"Hello World, Bonjour le Monde, Hallo Welt."

It copies "Hello World" into the en-text field, "Bonjour le Monde" into the fr-text field, and "Hallo Welt" into the de-text field. Each field has the appropriate language analyzers to tokenize and stem the words.
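The routing step described above can be sketched roughly as follows. This is only an illustration in Python, not the actual UpdateProcessorFactory (which would be Java code running inside Solr). It assumes language detection has already happened and that segments arrive as `(language code, text)` pairs; `route_segments`, `SUPPORTED_LANGS`, and the `id` value are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Assumption: only these languages have a matching "<lang>-text" field in the schema.
SUPPORTED_LANGS = {"en", "fr", "de"}


def route_segments(doc_id, segments):
    """Group already-detected (lang, text) segments into language-specific fields.

    Language detection itself is out of scope here; segments whose language
    has no matching field are skipped.
    """
    fields = defaultdict(list)
    for lang, text in segments:
        if lang in SUPPORTED_LANGS:
            fields[f"{lang}-text"].append(text)

    # Build a flat document with one string value per language field.
    doc = {"id": doc_id}
    for name, parts in fields.items():
        doc[name] = " ".join(parts)
    return doc


doc = route_segments(
    "doc-1",
    [("en", "Hello World"), ("fr", "Bonjour le Monde"), ("de", "Hallo Welt")],
)
print(doc)
```

Each resulting field would then be analyzed by whatever tokenizer and stemmer the schema assigns to it.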

In the end I would like to have one box for a user to enter search terms that would search across all languages. The search terms don't need to be translated, but they should be stemmed appropriately. What is the best way to accomplish this? I'm also very concerned about the performance of the searches.

Best answer

The best way is to use the DisMaxRequestHandler. It will appropriately analyze each field for the appropriate language (as defined in schema.xml).

So, if your query looks like /solr/select?qt=dismax&qf=en-text%20fr-text%20de-text&q=hello%20world, Solr will do the right thing.

(assuming you configured dismax as a solr.DisMaxRequestHandler in a requestHandler block in solrconfig.xml)
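As a rough sketch, a client could build that request URL like this. The host, port, and the helper name `build_dismax_url` are assumptions for illustration; only the `qt`, `qf`, and `q` parameters and the field names come from the answer above.

```python
from urllib.parse import quote, urlencode


def build_dismax_url(base_url, user_query, fields):
    """Build a dismax select URL that searches every language field at once."""
    params = {
        "qt": "dismax",           # handler name, as configured in solrconfig.xml
        "qf": " ".join(fields),   # space-separated list of fields to query
        "q": user_query,          # raw user input; Solr analyzes it per field
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base_url}?{urlencode(params, quote_via=quote)}"


# Assumption: Solr is reachable at this address.
url = build_dismax_url(
    "http://localhost:8983/solr/select",
    "hello world",
    ["en-text", "fr-text", "de-text"],
)
print(url)
```

Letting the library do the encoding avoids hand-written escapes, where a space has to be written as `%20`.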

Most analysis is fast. Your performance is bounded mostly by your index size, total term count, and so on. Be sure to tune everything according to the Solr performance guide on their wiki. I'm currently running a 60GB index and still get searches in the sub-100ms range on hardware that isn't all that fancy.

Published: 2023-08-06 23:53:00
Article link: https://www.elefans.com/category/jswz/34/1457630.html