Speed up Lucene index building process

Updated: 2024-10-28 14:22:08

I'm working on this class to build a couple of indexes (categories and entries) in Lucene for an old web app (Lucene 2.0, Java 6).

The problem is that building the indexes takes too long: 10 minutes on an Intel i3 for a 50 MB index containing 60,000 entries and a 20 MB index containing 10,000 categories.

I would like to speed up the process so I don't have to wait an eternity every time I add/edit an entry to have it indexed.

The entries and categories are read from a MySQL database using Hibernate and the JDBC driver. Even though the SQL tables are properly indexed, at first I thought this was where the bottleneck was (I'm performing 20000 MySQL queries in total**). But each query takes less than 1 millisecond on average, which adds up to under 20 seconds of the 10 minutes, so I guess that's not the case.

Before I set up and run a profiler, which may just tell me what I already know, I would like to know if anybody has any straightforward suggestions on how to improve the index building performance. Perhaps a more recent version of Lucene or Java 7 would help? Or is it the Analyzer I'm using?

** (2 per category: the first to find sub-categories for a category and the second to find entries for a category)

Best answer

For the sake of having the question closed (credit goes to @Joshua for his comments):

I set up the profiler. I was not wrong. Solution: schedule parallel incremental updates and rebuild the whole index periodically (once every 2-3 days was enough in this particular case).
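
For anyone wanting to try the same approach, here is a minimal sketch of the scheduling side only, using the standard java.util.concurrent classes (available on Java 6). The Indexer interface, its two methods, and the IndexSchedule class are hypothetical names standing in for the existing Lucene code, not part of Lucene, and the intervals are up to you. It assumes "parallel" means running in the background alongside the web app. A single worker thread is used so that an incremental update and a full rebuild never write to the index at the same time.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around the existing Lucene code; not a Lucene API.
interface Indexer {
    void updateChanged(); // re-index only what was added or edited since the last run
    void rebuildAll();    // rebuild both indexes from the database
}

final class IndexSchedule {
    // One worker thread: an update and a rebuild never run at the same time.
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    IndexSchedule(final Indexer indexer, long updateEvery, long rebuildEvery, TimeUnit unit) {
        // Frequent, cheap incremental updates.
        executor.scheduleWithFixedDelay(guarded(new Runnable() {
            public void run() { indexer.updateChanged(); }
        }), updateEvery, updateEvery, unit);
        // Infrequent, expensive full rebuilds.
        executor.scheduleWithFixedDelay(guarded(new Runnable() {
            public void run() { indexer.rebuildAll(); }
        }), rebuildEvery, rebuildEvery, unit);
    }

    // An uncaught exception would cancel the periodic task, so log it and carry on.
    private static Runnable guarded(final Runnable task) {
        return new Runnable() {
            public void run() {
                try {
                    task.run();
                } catch (RuntimeException e) {
                    e.printStackTrace();
                }
            }
        };
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Something like new IndexSchedule(indexer, 5, 3 * 24 * 60, TimeUnit.MINUTES) would then run an incremental update every 5 minutes (an arbitrary choice) and a full rebuild every 3 days.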

Published: 2023-08-07 13:42:00

Article link: https://www.elefans.com/category/jswz/34/1464747.html