How to configure synonyms in Elasticsearch

Updated: 2024-10-04 19:21:34
Problem description

I'm pretty new to Elasticsearch and I want to use synonyms. I added these lines to the configuration file:

index :
    analysis :
        analyzer :
            synonym :
                type : custom
                tokenizer : whitespace
                filter : [synonym]
        filter :
            synonym :
                type : synonym
                synonyms_path: synonyms.txt

Then I created an index, test, with these mappings:

"mappings" : {
    "test" : {
        "properties" : {
            "text_1" : {
                "type" : "string",
                "analyzer" : "synonym"
            },
            "text_2" : {
                "search_analyzer" : "standard",
                "index_analyzer" : "synonym",
                "type" : "string"
            },
            "text_3" : {
                "type" : "string",
                "analyzer" : "synonym"
            }
        }
    }
}

And I inserted a document of type test with this data:

{ "text_3" : "foo dog cat", "text_2" : "foo dog cat", "text_1" : "foo dog cat" }

synonyms.txt contains "foo,bar,baz". When I search for foo it returns what I expected, but when I search for baz or bar it returns zero results:

{ "query":{ "query_string":{ "query" : "bar", "fields" : [ "text_1"], "use_dis_max" : true, "boost" : 1.0 }}}

Result:

{ "took":1, "timed_out":false, "_shards":{ "total":5, "successful":5, "failed":0 }, "hits":{ "total":0, "max_score":null, "hits":[ ] } }
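For reference, the behavior the question expects can be modeled in a few lines of Python. This is a minimal sketch, assuming the synonym filter's default expand behavior, under which a comma-separated line such as foo,bar,baz makes every term in the group equivalent to every other. It treats analysis as producing a plain set of terms and a match as any shared term, ignoring positions and scoring; parse_equivalent and expand are hypothetical helpers, not Elasticsearch APIs. Under this model, searching for bar or baz on text_1 (synonym analyzer at index and search time) should match as well.

```python
def parse_equivalent(line):
    """Parse a Solr-style equivalence line such as "foo,bar,baz".

    Assumes expand=true: every term in the group maps to the whole group.
    """
    group = [term.strip() for term in line.split(",") if term.strip()]
    return {term: list(group) for term in group}


def expand(text, rules):
    """Whitespace-tokenize text and replace each token by its synonym group."""
    terms = set()
    for token in text.split():
        terms.update(rules.get(token, [token]))
    return terms


rules = parse_equivalent("foo,bar,baz")
indexed = expand("foo dog cat", rules)  # index-time terms for text_1

for query in ("foo", "bar", "baz"):
    # Simplification: the document matches if any query term was indexed.
    print(query, bool(expand(query, rules) & indexed))
```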

Recommended answer

I'm not sure whether your problem is that you defined the synonyms for "bar" incorrectly. Since you said you are pretty new, I'll give a working example similar to yours. I want to show how Elasticsearch deals with synonyms at search time and at index time. Hope it helps.

First, create the synonyms file:

foo => foo bar, baz
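This rule uses the explicit-mapping form of the Solr synonym format: the term on the left of => is replaced by the alternatives on the right, commas separate alternatives, and spaces inside an alternative form a multi-word synonym. So foo maps to the two-word "foo bar" and to "baz", which is consistent with the _analyze output at the end of this answer (bar at position 2). A minimal parsing sketch; parse_rule is a hypothetical helper, not part of Elasticsearch:

```python
def parse_rule(line):
    """Split an explicit Solr-style rule such as "foo => foo bar, baz".

    Returns (left-hand terms, alternatives). Commas separate alternatives;
    whitespace inside an alternative makes it a multi-word synonym.
    """
    left, right = line.split("=>")
    sources = [term.strip() for term in left.split(",") if term.strip()]
    alternatives = [alt.split() for alt in right.split(",") if alt.strip()]
    return sources, alternatives


sources, alternatives = parse_rule("foo => foo bar, baz")
print(sources)       # ['foo']
print(alternatives)  # [['foo', 'bar'], ['baz']]
```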

Now I create the index with the particular settings you are trying to test:

curl -XPUT 'localhost:9200/test/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text_1": { "type": "string", "analyzer": "synonym" },
        "text_2": { "search_analyzer": "standard", "index_analyzer": "standard", "type": "string" },
        "text_3": { "type": "string", "search_analyzer": "synonym", "index_analyzer": "standard" }
      }
    }
  }
}'
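To keep the three fields apart, it helps to spell out which analyzer each one uses when indexing and when searching. The sketch below is a simplified model of the mapping rules used in this answer: analyzer sets both sides, index_analyzer and search_analyzer override one side, and a field with neither falls back to the standard analyzer. MAPPING and resolve_analyzers are hypothetical names; index_analyzer is the old-style parameter this answer uses, and later Elasticsearch versions use analyzer for the index side instead.

```python
# Field mappings copied from the index-creation request above.
MAPPING = {
    "text_1": {"type": "string", "analyzer": "synonym"},
    "text_2": {"search_analyzer": "standard", "index_analyzer": "standard", "type": "string"},
    "text_3": {"type": "string", "search_analyzer": "synonym", "index_analyzer": "standard"},
}


def resolve_analyzers(field):
    """Return (index-time analyzer, search-time analyzer) for one field mapping."""
    default = field.get("analyzer", "standard")      # "analyzer" covers both sides
    index = field.get("index_analyzer", default)     # old-style index-side override
    search = field.get("search_analyzer", default)   # search-side override
    return index, search


for name, field in MAPPING.items():
    print(name, resolve_analyzers(field))
```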

Note that synonyms.txt must be in the same directory as the configuration file, because synonyms_path is resolved relative to the config directory.
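As a minimal illustration of that rule, assuming a hypothetical config directory of /etc/elasticsearch (the real location depends on how Elasticsearch was installed), a relative synonyms_path is simply joined onto the config directory. resolve_synonyms_path is a hypothetical helper:

```python
from pathlib import PurePosixPath


def resolve_synonyms_path(config_dir, synonyms_path):
    """Join a relative synonyms_path onto the (hypothetical) config directory."""
    return PurePosixPath(config_dir) / synonyms_path


print(resolve_synonyms_path("/etc/elasticsearch", "synonyms.txt"))
# /etc/elasticsearch/synonyms.txt
```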

Now index a document:

curl -XPUT 'localhost:9200/test/test/1' -d '{ "text_3": "baz dog cat", "text_2": "foo dog cat", "text_1": "foo dog cat" }'
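Before looking at the searches, here is a simplified model of what each field stores for this document. It assumes the synonym analyzer is the whitespace tokenizer plus the rule foo => foo bar, baz, treats the standard analyzer as lowercase-and-split-on-whitespace (the real standard analyzer uses Unicode word segmentation), and ignores token positions. All names are hypothetical:

```python
SYNONYMS = {"foo": [["foo", "bar"], ["baz"]]}  # foo => foo bar, baz


def standard(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return set(text.lower().split())


def synonym(text):
    """Whitespace tokenizer + synonym filter, modeled as a set of terms."""
    terms = set()
    for token in text.split():
        for alternative in SYNONYMS.get(token, [[token]]):
            terms.update(alternative)
    return terms


ANALYZERS = {"standard": standard, "synonym": synonym}
INDEX_ANALYZER = {"text_1": "synonym", "text_2": "standard", "text_3": "standard"}
DOC = {"text_3": "baz dog cat", "text_2": "foo dog cat", "text_1": "foo dog cat"}

# Terms each field would hold after index-time analysis.
indexed = {field: ANALYZERS[INDEX_ANALYZER[field]](DOC[field]) for field in DOC}
for field in sorted(indexed):
    print(field, sorted(indexed[field]))
```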

Now search.

Search in field text_1:

curl -XGET 'localhost:9200/test/_search?q=text_1:baz'

Result:

{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.15342641, "hits": [ { "_index": "test", "_type": "test", "_id": "1", "_score": 0.15342641, "_source": { "text_3": "baz dog cat", "text_2": "foo dog cat", "text_1": "foo dog cat" } } ] } }

You get the document because baz is a synonym of foo, and at index time foo is expanded into its synonyms.
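The same simplified model (terms as a set, a match as any shared term, hypothetical helper names) shows why text_1 matches: the query baz has no rule of its own and stays baz, while the indexed text already contains baz because foo was expanded at index time.

```python
SYNONYMS = {"foo": [["foo", "bar"], ["baz"]]}  # foo => foo bar, baz


def synonym(text):
    """Whitespace tokenizer + synonym filter, modeled as a set of terms."""
    terms = set()
    for token in text.split():
        for alternative in SYNONYMS.get(token, [[token]]):
            terms.update(alternative)
    return terms


# text_1 uses the synonym analyzer at both index and search time.
indexed = synonym("foo dog cat")  # foo expanded to foo, bar, baz
query = synonym("baz")            # no rule for baz, so just {'baz'}
print(sorted(indexed & query))    # ['baz'] -> the document matches
```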

Search in field text_2:

curl -XGET 'localhost:9200/test/_search?q=text_2:baz'

Result:

{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 0, "max_score": null, "hits": [] } }

I don't get any hits because synonyms were not expanded at index time: text_2 uses the standard analyzer for both indexing and searching. And since I'm searching for baz, which does not appear in the text, there is no result.
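In the same simplified model, text_2 shows no overlap. The second check adds a point the setup implies: because foo => foo bar, baz is a one-way rule, analyzing the query baz with the synonym analyzer still yields only baz, so it cannot reach the foo stored in text_2. Helper names are hypothetical, and standard is a rough stand-in for the real analyzer.

```python
SYNONYMS = {"foo": [["foo", "bar"], ["baz"]]}  # foo => foo bar, baz


def standard(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return set(text.lower().split())


def synonym(text):
    """Whitespace tokenizer + synonym filter, modeled as a set of terms."""
    terms = set()
    for token in text.split():
        for alternative in SYNONYMS.get(token, [[token]]):
            terms.update(alternative)
    return terms


# text_2 uses the standard analyzer on both sides: nothing is expanded.
indexed = standard("foo dog cat")
print(sorted(indexed & standard("baz")))  # [] -> no hit

# A synonym search analyzer would not help either: the rule only rewrites foo,
# so a query for baz is never turned into foo.
print(sorted(indexed & synonym("baz")))   # [] -> still no hit
```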

Search in field text_3:

curl -XGET 'localhost:9200/test/_search?q=text_3:foo'

Result:

{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.15342641, "hits": [ { "_index": "test", "_type": "test", "_id": "1", "_score": 0.15342641, "_source": { "text_3": "baz dog cat", "text_2": "foo dog cat", "text_1": "foo dog cat" } } ] } }

Note: text_3 is "baz dog cat".

text_3 was indexed without expanding synonyms. But its search analyzer is synonym, so when I search for foo the query is expanded to foo's synonyms, which include baz, and I get the result.
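And for text_3, in the same simplified model: the stored terms are unexpanded, but the query foo is expanded at search time and shares baz with them. Treating a hit as any shared term is a simplification of the query that the query string parser actually builds from several tokens; helper names are hypothetical.

```python
SYNONYMS = {"foo": [["foo", "bar"], ["baz"]]}  # foo => foo bar, baz


def standard(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return set(text.lower().split())


def synonym(text):
    """Whitespace tokenizer + synonym filter, modeled as a set of terms."""
    terms = set()
    for token in text.split():
        for alternative in SYNONYMS.get(token, [[token]]):
            terms.update(alternative)
    return terms


# text_3: standard analyzer at index time, synonym analyzer at search time.
indexed = standard("baz dog cat")  # no expansion when indexing
query = synonym("foo")             # expanded to foo, bar, baz when searching
print(sorted(indexed & query))     # ['baz'] -> the document matches
```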

If you want to debug, you can use the _analyze endpoint, for example:

curl -XGET 'localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'

Result:

{ "tokens": [ { "token": "foo", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 1 }, { "token": "baz", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 1 }, { "token": "bar", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 2 } ] }
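The positions above follow from the multi-word alternative: foo and baz both start at the original token's position, and bar, the second word of "foo bar", takes the next one. This is a simplified sketch that reproduces these numbers for a single input token, assuming 1-based positions as in this output; it is not Lucene's actual synonym filter implementation, and expand_token is a hypothetical helper.

```python
SYNONYMS = {"foo": [["foo", "bar"], ["baz"]]}  # foo => foo bar, baz


def expand_token(token, position=1):
    """Return (term, position) pairs for one input token.

    Each alternative starts at the token's position; every further word of a
    multi-word alternative takes the next position.
    """
    tokens = []
    for alternative in SYNONYMS.get(token, [[token]]):
        for offset, term in enumerate(alternative):
            tokens.append((term, position + offset))
    # Stable sort keeps same-position tokens in rule order.
    return sorted(tokens, key=lambda pair: pair[1])


print(expand_token("foo"))  # [('foo', 1), ('baz', 1), ('bar', 2)]
```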

Published: 2023-11-28 03:25:27
Article link: https://www.elefans.com/category/jswz/34/1640770.html