Speed up Lucene index building process

Updated: 2024-10-28 14:22:08

I'm working on this class to build a couple of indexes (categories and entries) in Lucene for an old web app (Lucene 2.0, Java 6).

The problem is that building the indexes takes too long: 10 minutes on an Intel i3 for a 50 MB index containing 60000 entries and a 20 MB index containing 10000 categories.

I would like to speed up the process so I don't have to wait an eternity every time I add/edit an entry to have it indexed.

The entries/categories are read from a MySQL database using Hibernate and the JDBC driver. Although the SQL tables are properly indexed, at first I thought this was where the bottleneck was (I'm performing 20000 MySQL queries in total**). But each query takes less than 1 millisecond on average, which adds up to under 20 seconds of the 10-minute build, so I guess that's not the case.

Before I set up and run a profiler which may just tell me what I already know, I would like to know if anybody has any straightforward suggestions on how to improve the index building performance... Perhaps a more recent version of Lucene or Java 7 would help? Or is it the Analyzer I'm using?

** (2 per category: the first to find sub-categories for a category and the second to find entries for a category)
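For what it's worth, the "queries are under 1 ms each" estimate can be checked without a full profiler by timing the two phases separately. The following is only a rough sketch: `PhaseTimer` is a hypothetical helper, not part of Lucene or Hibernate, and the phase names are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: accumulates wall-clock time per named phase. */
class PhaseTimer {
    // Insertion-ordered so phases are reported in the order they first ran.
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the work and adds its elapsed time to the named phase. */
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total milliseconds spent in a phase so far (0 if it never ran). */
    long millis(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L) / 1_000_000;
    }

    /** All phases and their accumulated nanoseconds. */
    Map<String, Long> phases() {
        return nanosByPhase;
    }
}
```

Wrapping each database call in `timer.time("db", ...)` and each index write in `timer.time("index", ...)` inside the build loop would show which side the 10 minutes actually go to.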

Accepted answer

For the sake of having the question closed (credit goes to @Joshua for his comments):

I set up the profiler. I was not wrong. Solution: schedule parallel incremental updates and rebuild the whole index periodically (once every 2-3 days was enough in this particular case).
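The answer does not include code, so here is a minimal sketch of how that arrangement could look. Everything in it is an assumption for illustration: `SketchIndex` is a hypothetical in-memory stand-in for the real Lucene index, and `IndexMaintenance` is a made-up name. A real version would delegate `upsert` and `rebuild` to the Lucene writer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical stand-in for the search index (not a Lucene class). */
class SketchIndex {
    // Replaced wholesale on rebuild so readers never see a half-built index.
    private volatile Map<Long, String> docs = new ConcurrentHashMap<>();

    /** Incremental update: re-index one entry after it is added or edited. */
    void upsert(long id, String text) { docs.put(id, text); }

    /** Full rebuild: build a fresh map, then swap it in. */
    void rebuild(Map<Long, String> allEntries) { docs = new ConcurrentHashMap<>(allEntries); }

    int size() { return docs.size(); }

    String get(long id) { return docs.get(id); }
}

/** Runs index maintenance on one background thread, off the request path. */
class IndexMaintenance {
    private final SketchIndex index;
    // A single daemon thread serializes updates and rebuilds, so an update
    // queued during a rebuild is applied to the new index, not the old one.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "index-maintenance");
                t.setDaemon(true);
                return t;
            });

    IndexMaintenance(SketchIndex index) { this.index = index; }

    /** Queues a single-entry update; the caller does not wait for indexing. */
    Future<?> submitUpdate(long id, String text) {
        return scheduler.submit(() -> index.upsert(id, text));
    }

    /** Rebuilds the whole index at a fixed interval, e.g. every 2 days. */
    void scheduleFullRebuild(Supplier<Map<Long, String>> loadAll, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> index.rebuild(loadAll.get()), period, period, unit);
    }

    void shutdown() { scheduler.shutdown(); }
}
```

With this shape, saving an entry would call something like `maintenance.submitUpdate(id, text)`, and startup would call `maintenance.scheduleFullRebuild(loader, 2, TimeUnit.DAYS)`, where `loader` is whatever reads all entries from the database. The single worker thread is a design choice made here to keep the sketch free of races; it is not something the answer specifies.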

Published: 2023-08-07 13:42:00
Link: https://www.elefans.com/category/jswz/34/1464747.html