Elasticsearch ngram analyzer/tokenizer not working?

It seems that the ngram tokenizer isn't working, or perhaps my understanding or use of it isn't correct.

My tokenizer uses a min_gram of 3 and a max_gram of 5. I'm looking for the term 'madonna', which is definitely in my documents under artists.name. I can find the term with other techniques (the simple analyzer and similar), but not with the ngram analyzer.
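For reference, here is a minimal Python sketch of the grams such a configuration should produce for this term. It only approximates the nGram tokenizer plus the lowercase filter; the real tokenizer may emit tokens in a different order, and the `ngrams` helper is a hypothetical name used for illustration.

```python
def ngrams(text, min_gram=3, max_gram=5):
    """Return every substring of text with a length from min_gram to max_gram."""
    text = text.lower()  # stands in for the "lowercase" filter in ngramAnalyzer
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


tokens = ngrams("Madonna")
print(tokens)
# ['mad', 'ado', 'don', 'onn', 'nna', 'mado', 'adon', 'donn', 'onna', 'madon', 'adonn', 'donna']
```

If the artists.name.ngram field were populated, these are roughly the terms one would expect to find in the index for this document.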

What I'm trying to accomplish with the ngram analyzer is to find names while accounting for misspellings.
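As a rough illustration of why n-grams help with misspellings, the Python sketch below compares the grams of the correct name with those of a hypothetical misspelling, 'madona'. The `ngrams` helper is an assumption for illustration, not Elasticsearch code. A match query combines its analyzed terms with OR by default, so any shared gram is enough for a hit.

```python
def ngrams(text, min_gram=3, max_gram=5):
    """Return the set of lowercase substrings with a length from min_gram to max_gram."""
    text = text.lower()
    return {
        text[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(text) - size + 1)
    }


indexed = ngrams("Madonna")   # grams stored for the document
searched = ngrams("madona")   # grams produced from a misspelled query
shared = indexed & searched

print(sorted(shared))
# ['ado', 'adon', 'don', 'mad', 'mado', 'madon']
```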

Below is a shortened version of my settings, my mappings, and my query. If you have any ideas, please let me know. It's driving me nuts!

Settings:

    {
      "myindex": {
        "settings": {
          "index": {
            "analysis": {
              "analyzer": {
                "ngramAnalyzer": {
                  "type": "custom",
                  "filter": [ "lowercase" ],
                  "tokenizer": "nGramTokenizer"
                }
              },
              "tokenizer": {
                "nGramTokenizer": {
                  "type": "nGram",
                  "min_gram": "3",
                  "max_gram": "5"
                }
              }
            },
            "number_of_shards": "5",
            "number_of_replicas": "1",
            "version": { "created": "1020199" },
            "uuid": "60ggSr6TREaDTItkaNUagg"
          }
        }
      }
    }

Mappings:

    {
      "myindex": {
        "mappings": {
          "mytype": {
            "properties": {
              "artists.name": {
                "type": "string",
                "analyzer": "simple",
                "fields": {
                  "ngram": { "type": "string", "analyzer": "ngramAnalyzer" },
                  "raw": { "type": "string", "index": "not_analyzed" }
                }
              }
            }
          }
        }
      }
    }

Query:

{"query": {"match": {"artists.name.ngram": "madonna"}}}

Document:

    {
      "_index": "myindex",
      "_type": "mytype",
      "_id": "602537592951",
      "_version": 1,
      "found": true,
      "_source": {
        "artists": [
          { "name": "Madonna", "id": "P 64565" }
        ]
      }
    }

EDIT: Incidentally, this query works (without ngram):

{"query": {"match": {"artists.name": "madonna"}}}

So this seems to have something to do with the nested object. Apparently I'm not applying the ngram analyzer to the nested object properly.

Any ideas?

Accepted answer

OK, I figured it out. I really hope this helps someone, because it drove me crazy.

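Here's what my mapping turned out to look like. Instead of a single property literally named "artists.name", artists is now an object with its own id and name properties, and name uses ngramAnalyzer directly:

    {
      "myindex": {
        "mappings": {
          "mytype": {
            "properties": {
              "artists": {
                "properties": {
                  "id": { "type": "string" },
                  "name": {
                    "type": "string",
                    "analyzer": "ngramAnalyzer",
                    "fields": {
                      "raw": { "type": "string", "index": "not_analyzed" }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }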
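To make the structural difference between the two mappings concrete, here is a toy Python model. It is not how Elasticsearch resolves fields internally; `find_analyzer` is a hypothetical helper that simply walks nested "properties" blocks along a dotted path, the way the document's artists object nests name beneath it.

```python
def find_analyzer(properties, path):
    """Walk nested "properties" blocks along a dotted path; return the analyzer or None."""
    node = {"properties": properties}
    for part in path.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None  # no nested property with this name
    return node.get("analyzer")


# Shape of the original mapping: one property whose key contains a dot.
original = {"artists.name": {"type": "string", "analyzer": "simple"}}

# Shape of the fixed mapping: artists is an object with its own properties.
fixed = {
    "artists": {
        "properties": {
            "id": {"type": "string"},
            "name": {"type": "string", "analyzer": "ngramAnalyzer"},
        }
    }
}

print(find_analyzer(original, "artists.name"))  # None
print(find_analyzer(fixed, "artists.name"))     # ngramAnalyzer
```

With the fixed mapping the ngram analyzer sits on artists.name itself, so under that assumption the query would target artists.name rather than artists.name.ngram.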

And here's how I did it using NEST syntax.

First, I had a sub-type (class) called Person, a POCO with a Name and an Id, which looks like this:

    [Serializable]
    public class Person
    {
        public string Name { get; set; }

        [ElasticProperty(Analyzer = "fullTerm", Index = FieldIndexOption.not_analyzed)]
        public string Id { get; set; }
    }

And then my mapping went something like this:

    .AddMapping<MyIndex>(m => m
        .MapFromAttributes()
        .Properties(props => props
            .Object<Person>(x => x.Name("artists")
                .Properties(pp => pp
                    .MultiField(mf => mf
                        .Name(s => s.Name)
                        .Fields(f => f
                            .String(s => s.Name(o => o.Name).Analyzer("ngramAnalyzer"))
                            .String(s => s.Name(o => o.Name.Suffix("raw")).Index(FieldIndexOption.not_analyzed))
                        )
                    )
                )
            )
        )
    )

Note the Object<Person> call here: it indicates that 'artists' is another object beneath my type.

Thanks, me!!!
