I have a problem with the default Solr scoring algorithm that is specific to the domain of my collection. In my domain, documents that contain all or most of the query terms are substantially more relevant than documents that contain only a few of them. I would like to boost the score of documents so that the more terms match, the higher the score. I'm aware that Solr already boosts such documents by multiplying the score by the coordination factor. However, the coordination factor is not significant enough for me, and I wish to raise it to a certain power. I'm also familiar with the ExtendedDismax parser's Minimum-Should-Match feature, but that feature doesn't solve my problem, because I don't want to eliminate the documents that don't match enough terms; I just want to "punish" them.
Is there a way to increase the significance of the coordination factor? I'll also accept other solutions that don't make any use of the coordination factor if they solve the problem.
Best answer
It might be easiest to just write your own similarity. You can override the coord method with whatever you like, and its implementation is really pretty simple. Something like:
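    // Package name as of Lucene/Solr 4.x and 5.x
    import org.apache.lucene.search.similarities.DefaultSimilarity;

    public class MySimilarity extends DefaultSimilarity {
        @Override
        public float coord(int overlap, int maxOverlap) {
            // Java's ^ is bitwise XOR (and does not compile on floats),
            // so use Math.pow to raise the default coord to a power.
            return (float) Math.pow(super.coord(overlap, maxOverlap), 2);
        }
    }

You can bring in your own similarity implementation in the schema: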
<similarity class="this.is.MySimilarity"/>
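To see how strongly squaring the coordination factor punishes partial matches, here is a standalone sketch with no Lucene dependency. It reimplements the default coord (matched query terms divided by total query terms, which is what DefaultSimilarity returns) next to the powered version. The class name CoordBoostDemo and the EXPONENT constant are hypothetical, chosen only for illustration.

```java
import java.util.Locale;

/**
 * Hypothetical standalone demo (no Lucene dependency) comparing the default
 * coordination factor with the same factor raised to a power.
 */
public class CoordBoostDemo {

    // Assumed tuning knob; the answer above uses 2.
    public static final double EXPONENT = 2.0;

    // Mirrors DefaultSimilarity.coord: fraction of query terms the document matched.
    public static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // The overridden behaviour: default coord raised to the given exponent.
    public static float boostedCoord(int overlap, int maxOverlap, double exponent) {
        return (float) Math.pow(defaultCoord(overlap, maxOverlap), exponent);
    }

    public static void main(String[] args) {
        int maxOverlap = 4; // a four-term query
        for (int overlap = 1; overlap <= maxOverlap; overlap++) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
            System.out.println(String.format(Locale.ROOT,
                    "%d/%d terms: default=%.4f boosted=%.4f",
                    overlap, maxOverlap,
                    defaultCoord(overlap, maxOverlap),
                    boostedCoord(overlap, maxOverlap, EXPONENT)));
        }
    }
}
```

With an exponent of 2, a document matching 2 of 4 terms keeps 0.25 of its base score instead of 0.5, while a document matching all terms is unaffected. A larger exponent punishes partial matches harder, but unlike Minimum-Should-Match it never removes them from the results.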