我对自动完成/搜索文本/项目的理解如何在任何可扩展的产品(如亚马逊电子商务/谷歌)中在高级别工作是:-
基于弹性搜索(ES)的方法
Elastic Search(ES) based approach
文档存储在 DB 中.一旦持久化给弹性搜索,它会创建索引并将索引/文档(基于标记器)存储在内存或基于磁盘中配置.
Documents are stored in DB . Once persisted given to Elastic search, It creates the index and store the index/document(based on tokenizer) in memory or disk based configuration.
一旦用户输入3个字符,它就会搜索ES下的所有索引(甚至可以配置为索引ngram),根据权重对它们进行排名并返回给用户
Once user types say 3 characters, it search all index under ES(Can be configured to index even ngram) , Rank them based on weightage and return to user
但是在阅读谷歌上的一些资源后,比如 基于 Trie 的搜索
But after reading couple of resources on google like Trie based search
看起来一些可扩展的产品也使用 Trie 数据结构来进行基于前缀的搜索.
Looks some of the scalable product also uses Trie data stucture to do the prefix based search.
我的问题是基于 trie 的方法可以替代 ES 或 ES 内部使用 Trie 还是我在这里完全遗漏了?
My question Is Can trie based approach be good alternative to ES or ES internally uses Trie or am i missing completely here ?
推荐答案ES自动补全可以通过两种方式实现:
ES autocompletion can be achieved in two ways:
第一个选项是穷人的补全功能.我之所以提到它,是因为它在某些情况下很有用,但如果您有大量文档,则应避免使用它.
The first option is the poor man's completion feature. I'm mentioning it because it can be useful in certain situation but you should avoid it if you have a substantial amount of documents.
第二个选项使用传统的 ES 索引功能,即它将标记文本,所有 (edge-)ngram 将被索引,然后您可以搜索任何已被索引的前缀/中缀/后缀.
The second option uses the conventional ES indexing features, i.e. it will tokenize the text, all (edge-)ngrams will be indexed and then you can search for any prefix/infix/suffix that have been indexed.
第三个选项使用不同的方法并针对速度进行了优化.基本上,当索引 completion 类型的字段时,ES 将创建一个 "finite状态转换器"并将其存储在内存中以实现超快速访问.
The third option uses a different approach and is optimized for speed. Basically, when indexing a field of type completion, ES will create a "finite state transducer" and store it in memory for ultra fast access.
就实现而言,有限状态转换器接近于特里.您可以查看这篇优秀文章,其中展示了trie 与 finite 比较状态传感器
A finite state transducer is close to a trie in terms of implementation. You can check this excellent article which shows how trie compares to finite state transducer
更新(2019 年 6 月 25 日):
ES 7.2 引入了一种名为 search_as_you_type 的新数据类型,它本机允许这种行为.阅读更多信息:www.elastic.co/guide/en/elasticsearch/reference/7.2/search-as-you-type.html
ES 7.2 introduced a new data type called search_as_you_type that allows this kind of behavior natively. Read more at: www.elastic.co/guide/en/elasticsearch/reference/7.2/search-as-you-type.html
更多推荐
用于搜索/自动完成的弹性搜索或 Trie?
发布评论