I'm working on this class to build a couple of indexes (categories and entries) in Lucene for an old web app (Lucene 2.0, Java 6).
The problem is that building the indexes takes too long: 10 minutes on an Intel i3 for a 50 MB index containing 60,000 entries and a 20 MB index containing 10,000 categories.
I would like to speed up the process so that I don't have to wait an eternity every time I add or edit an entry and need it indexed.
The entries and categories are read from a MySQL database using Hibernate and the JDBC driver. Although the SQL tables are properly indexed, at first I thought this was where the bottleneck was, since I'm performing 20,000 MySQL queries in total**. But each query takes less than 1 millisecond on average, which adds up to under 20 seconds out of the 10 minutes, so I guess that's not the case.
Before I set up and run a profiler, which may just tell me what I already know, I would like to know if anybody has any straightforward suggestions on how to improve the index building performance. Perhaps a more recent version of Lucene or Java 7 would help? Or is it the Analyzer I'm using?
** (2 per category: the first to find sub-categories for a category and the second to find entries for a category)
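To check where the 10 minutes go before reaching for a full profiler, the category walk described in the footnote can be timed per phase. This is a minimal sketch, assuming a hypothetical `CategoryDao` (one method per SQL query) and a hypothetical `IndexSink` standing in for the Lucene document-building and `IndexWriter` work; neither name comes from the original code. Two DAO calls per category means 10,000 categories produce 20,000 queries, and comparing `dbNanos` with `indexNanos` shows which side dominates. It avoids lambdas and the diamond operator so it also compiles on Java 6.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical DAO: each method stands for one SQL query in the real app. */
interface CategoryDao {
    List<Integer> findSubCategoryIds(int categoryId);
    List<String> findEntries(int categoryId);
}

/** Hypothetical sink standing in for building Lucene documents and adding them to the index. */
interface IndexSink {
    void indexCategory(int categoryId, List<String> entries);
}

/** In-memory DAO, only for exercising the walker without a database. */
class MapCategoryDao implements CategoryDao {
    final Map<Integer, List<Integer>> children = new HashMap<Integer, List<Integer>>();
    final Map<Integer, List<String>> entries = new HashMap<Integer, List<String>>();

    public List<Integer> findSubCategoryIds(int categoryId) {
        List<Integer> c = children.get(categoryId);
        return c == null ? Collections.<Integer>emptyList() : c;
    }

    public List<String> findEntries(int categoryId) {
        List<String> e = entries.get(categoryId);
        return e == null ? Collections.<String>emptyList() : e;
    }
}

/** Walks the category tree, issuing two lookups per category and timing each phase. */
class CategoryWalker {
    private final CategoryDao dao;
    private final IndexSink sink;
    long queries;    // DAO calls made (SQL queries in the real app)
    long dbNanos;    // time spent inside the DAO
    long indexNanos; // time spent inside the index sink

    CategoryWalker(CategoryDao dao, IndexSink sink) {
        this.dao = dao;
        this.sink = sink;
    }

    void walk(int rootId) {
        // Explicit stack instead of recursion, so a deep tree cannot overflow the call stack.
        Deque<Integer> pending = new ArrayDeque<Integer>();
        pending.push(rootId);
        while (!pending.isEmpty()) {
            int id = pending.pop();
            long t0 = System.nanoTime();
            List<Integer> subIds = dao.findSubCategoryIds(id); // query 1: sub-categories
            List<String> found = dao.findEntries(id);          // query 2: entries
            long t1 = System.nanoTime();
            sink.indexCategory(id, found);
            long t2 = System.nanoTime();
            queries += 2;
            dbNanos += t1 - t0;
            indexNanos += t2 - t1;
            for (Integer sub : subIds) {
                pending.push(sub);
            }
        }
    }
}
```

With the real Hibernate DAO and indexing code plugged in, an `indexNanos` far larger than `dbNanos` would point at the analysis and indexing side rather than at MySQL.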
Best answer
For the sake of having the question closed (credit goes to @Joshua for his comments):
I set up the profiler and, as I suspected, it only told me what I already knew. Solution: schedule parallel incremental updates, plus a periodic rebuild of the whole index (once every 2-3 days was enough in this particular case).
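One way to read that schedule, sketched under assumptions: edits are queued and flushed to the index in the background, in parallel with the web requests, so saving an entry returns immediately, and a full rebuild runs every couple of days. The `Indexer` interface and its `updateEntries`/`rebuildAll` methods, as well as `IndexMaintainer`, are hypothetical names (the real methods would wrap Lucene's `IndexWriter`). A single-threaded executor serializes all writes, because Lucene allows only one writer to hold an index directory's write lock at a time.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical indexing operations; real implementations would wrap Lucene's IndexWriter. */
interface Indexer {
    void updateEntries(Set<Long> entryIds); // re-index only the changed entries
    void rebuildAll();                      // rebuild the whole index from MySQL
}

class IndexMaintainer {
    private final Indexer indexer;
    // One thread: incremental flushes and full rebuilds never write to the index concurrently.
    private final ScheduledExecutorService writer = Executors.newSingleThreadScheduledExecutor();
    private final Set<Long> dirty = new HashSet<Long>();

    IndexMaintainer(Indexer indexer) {
        this.indexer = indexer;
    }

    /** Called by the web app after an add/edit; only records the id and returns. */
    void entryChanged(long entryId) {
        synchronized (dirty) {
            dirty.add(entryId);
        }
    }

    void start(long flushSeconds, long rebuildSeconds) {
        writer.scheduleWithFixedDelay(new Runnable() {
            public void run() { flush(); }
        }, flushSeconds, flushSeconds, TimeUnit.SECONDS);
        writer.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                try {
                    indexer.rebuildAll();
                } catch (RuntimeException e) {
                    // An exception escaping run() would silently cancel all future rebuilds.
                    System.err.println("Full rebuild failed: " + e);
                }
            }
        }, rebuildSeconds, rebuildSeconds, TimeUnit.SECONDS);
    }

    /** Re-indexes everything queued since the last flush; normally runs on the writer thread. */
    void flush() {
        Set<Long> batch;
        synchronized (dirty) {
            if (dirty.isEmpty()) {
                return;
            }
            batch = new HashSet<Long>(dirty);
            dirty.clear();
        }
        try {
            indexer.updateEntries(batch);
        } catch (RuntimeException e) {
            // Put the ids back so the next flush retries them, and keep the schedule alive.
            synchronized (dirty) {
                dirty.addAll(batch);
            }
            System.err.println("Incremental update failed: " + e);
        }
    }

    void stop() throws InterruptedException {
        writer.submit(new Runnable() {
            public void run() { flush(); } // one last flush of pending edits
        });
        writer.shutdown(); // periodic tasks are cancelled; the queued final flush still runs
        writer.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Usage would be along the lines of `new IndexMaintainer(indexer).start(30, TimeUnit.DAYS.toSeconds(2));`. Because the executor has one thread, flushes that fall due during a rebuild wait for it to finish, and the ids collected meanwhile are re-indexed right after, so edits made during a rebuild are not lost.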