I've been analyzing the best way to improve the performance of our Solr index, and will likely shard the current index so that searches become distributed.
However, given that our index is over 400 GB and contains roughly 700 million documents, reindexing the data seems burdensome. I've been toying with the idea of duplicating the index and deleting documents as a more efficient way to create the sharded environment.
Unfortunately, it seems that a modulus operation isn't available to query against the document's internal numeric ID. What other partitioning strategies could I use so that I can delete by query rather than doing a full reindex?
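Since the internal Lucene doc ID can't be queried with a modulus, one common workaround is to partition on a stable hash of the document's unique-key field instead. A minimal sketch of the idea (the shard count and field names here are hypothetical, not from the question):

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(doc_id: str) -> int:
    """Map a document's unique key to a shard number via a stable hash.

    md5 is used only for its stable, well-distributed output, not for
    security; any deterministic hash works.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# On each duplicated copy of the index, you would keep only the documents
# whose unique key hashes to that copy's shard number and delete the rest.
ids = ["doc-1", "doc-2", "doc-3", "doc-4", "doc-5"]
assignment = {doc_id: shard_for(doc_id) for doc_id in ids}
print(assignment)
```

The catch is that Solr's standard query parser can't compute a hash at query time, so this only turns into a delete-by-query if the hash value is precomputed into an indexed field; otherwise the deletes have to be driven externally by fetching IDs and issuing delete-by-id batches.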
Accepted answer

A Lucene tool, IndexSplitter, would do the job; see it mentioned here with a link to an article (in Japanese; translate it with Google...).
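IndexSplitter ships in Lucene's misc (contrib) module and splits an existing index at segment granularity. A hedged sketch of what an invocation might look like (jar names, paths, and segment names below are illustrative; verify the class and module against your Lucene version):

```shell
# List the segments in the source index
java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.index.IndexSplitter /path/to/index -l

# Move selected segments into a new index directory
java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.index.IndexSplitter /path/to/index /path/to/shard1 _0 _1
```

Note that because it works along existing segment boundaries, the resulting shards follow segment sizes rather than any document-level partitioning key, so it avoids a reindex but gives you less control over which documents land in which shard.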