I have a Solr instance in which I index a large number of documents from my client so that users can search them in a web application.
Because there are so many files, and users only need to search the recent ones (the last 90 days or so), we have a scheduled job that removes old documents from the index.
The problem is that disk usage is growing by about 2 GB a day, even with the deletions.
Is this normal behavior, or should we do something more to keep the index at a stable size?
We are using a Java application to add documents to the index and remove them from it.
Recommended answer
Deletions only mark documents as deleted; they are still present in the index. Removing them requires rewriting the index files, so the space is not reclaimed until the affected segments are rewritten, which is what an optimize command forces.
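To see how much of the index is still taken up by documents that are only marked as deleted, you can compare maxDoc (which counts deleted documents) with numDocs (which does not). The following is a minimal sketch in Java, assuming you have already read those two numbers from your core's statistics; the class and method names are hypothetical, and the values in main are illustrative only.

```java
class DeletedDocsRatio {
    /** Number of documents marked as deleted but still stored in the index. */
    static long deletedDocs(long maxDoc, long numDocs) {
        if (numDocs < 0 || maxDoc < numDocs) {
            throw new IllegalArgumentException("expected 0 <= numDocs <= maxDoc");
        }
        return maxDoc - numDocs;
    }

    /** Fraction of the index (by document count) made up of deleted documents. */
    static double deletedFraction(long maxDoc, long numDocs) {
        long deleted = deletedDocs(maxDoc, numDocs);
        return maxDoc == 0 ? 0.0 : (double) deleted / maxDoc;
    }

    public static void main(String[] args) {
        // Example values only; read the real ones from your own core.
        long maxDoc = 1_200_000L;
        long numDocs = 900_000L;
        System.out.println("deleted documents: " + deletedDocs(maxDoc, numDocs));
        System.out.println("deleted fraction:  " + deletedFraction(maxDoc, numDocs));
    }
}
```

A fraction that keeps growing from one day to the next would suggest that the deleted documents are not being merged away.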
There is also an expungeDeletes option you can set when you issue a commit, but as far as I can see, it is better to issue an optimize outside normal operating hours. If you remove documents nightly, you can issue the optimize right after the removal, or less frequently, such as every second or third day.
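Both operations can be triggered through the core's update handler over HTTP. Below is a rough sketch in Java using the JDK's built-in HTTP client rather than SolrJ; the core URL (host, port, and the core name "documents") is an assumption, and the class and method names are hypothetical. Run without arguments it only prints the URLs it would call.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class SolrMaintenance {
    /** Commit that also asks Solr to merge away segments holding deleted documents. */
    static URI expungeDeletesUri(String coreUrl) {
        return URI.create(coreUrl + "/update?commit=true&expungeDeletes=true");
    }

    /** Full optimize: rewrites the index and drops all deleted documents. */
    static URI optimizeUri(String coreUrl) {
        return URI.create(coreUrl + "/update?optimize=true");
    }

    /** Sends the request and returns the HTTP status code. */
    static int send(URI uri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Assumed default core URL; pass your own as the first argument to actually send.
        String coreUrl = args.length > 0 ? args[0] : "http://localhost:8983/solr/documents";
        if (args.length == 0) {
            System.out.println("Dry run, would call: " + optimizeUri(coreUrl));
            System.out.println("Alternative:         " + expungeDeletesUri(coreUrl));
            return;
        }
        System.out.println("optimize returned HTTP " + send(optimizeUri(coreUrl)));
    }
}
```

A nightly scheduler could run the optimize call after the deletion job has finished; an optimize on a large index can take a long time, so the call may block for a while.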
Optimizing requires as much free disk space as the index currently occupies, since in the worst case the whole index is written out again.
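Before scheduling an optimize, it may be worth checking that rule of thumb against the actual disk. The sketch below, in Java, sums the size of the files in the index directory and compares it with the usable space on the same file store; the directory path is an assumption you pass in yourself, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class OptimizeSpaceCheck {
    /** Total size in bytes of all regular files under the given directory. */
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** True if free space is at least the index size (worst case: full rewrite). */
    static boolean enoughSpace(long indexBytes, long usableBytes) {
        return usableBytes >= indexBytes;
    }

    public static void main(String[] args) throws IOException {
        // Pass your core's index directory as the first argument; "." is only a placeholder.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        long indexBytes = directorySize(indexDir);
        long usableBytes = Files.getFileStore(indexDir).getUsableSpace();
        System.out.println(enoughSpace(indexBytes, usableBytes)
                ? "Enough free space to optimize"
                : "Not enough free space to optimize");
    }
}
```

The comparison uses the worst case from the answer (a full copy of the index); leaving extra headroom on top of that is the safer choice.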