SQL Server, ISABOUT, weighted terms

Updated: 2024-10-22 14:39:19
Problem description

I am trying to figure out exactly how weighted terms work in an ISABOUT query in SQL Server. Here is where I currently am:

Each query returns the following rows:

QUERY 1 (weight 1): Initial ranking

    SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (1))')
    ORDER BY RANK DESC, [KEY]

    KEY     RANK
    306342  249
    272619  156
    221557  114

QUERY 2 (weight 0.8): Ranking increases, initial order is preserved

    SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.8))')
    ORDER BY RANK DESC, [KEY]

    KEY     RANK
    306342  321
    272619  201
    221557  146

QUERY 3 (weight 0.2): Ranking increases, initial order is preserved

    SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.2))')
    ORDER BY RANK DESC, [KEY]

    KEY     RANK
    306342  998
    272619  877
    221557  692

QUERY 4 (weight 0.17): Ranking decreases, best match is now last; the inverted behavior for these terms begins at 0.17

    SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.17))')
    ORDER BY RANK DESC, [KEY]

    KEY     RANK
    272619  960
    221557  958
    306342  802

QUERY 5 (weight 0.16): Ranking increases, best match is now second

    SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.16))')
    ORDER BY RANK DESC, [KEY]

    KEY     RANK
    272619  978
    306342  935
    221557  841

QUERY 6 (weight 0.01): Ranking decreases, best match is last again

    SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.01))')
    ORDER BY RANK DESC, [KEY]

    KEY     RANK
    221557  105
    272619  77
    306342  50

At weight 1 the best match has a rank of 249, and as the weight goes down to 0.2 its rank increases to 998. From 0.2 to 0.17 the rank decreases, and from 0.16 the results are inverted (the weight values that reproduce this behavior depend on the terms, and maybe on the columns searched...).
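To make the pattern in these results easier to see, here is a small Python sketch that uses only the ranks reported above; it does not model SQL Server's internal ranking formula, which is not shown here. It re-applies `ORDER BY RANK DESC, [KEY]` to each result set and lists the weights at which the order differs from the weight-1 order. The names `OBSERVED`, `order_by_rank` and `changed_orders` are hypothetical.

```python
# Ranks reported in the question: weight -> {KEY: RANK}
OBSERVED = {
    1.0:  {306342: 249, 272619: 156, 221557: 114},
    0.8:  {306342: 321, 272619: 201, 221557: 146},
    0.2:  {306342: 998, 272619: 877, 221557: 692},
    0.17: {306342: 802, 272619: 960, 221557: 958},
    0.16: {306342: 935, 272619: 978, 221557: 841},
    0.01: {306342: 50, 272619: 77, 221557: 105},
}


def order_by_rank(ranks):
    """Mimic ORDER BY RANK DESC, [KEY]: highest rank first, ties broken by key."""
    return [key for key, _ in sorted(ranks.items(), key=lambda kv: (-kv[1], kv[0]))]


def changed_orders(observed):
    """Return the weights whose result order differs from the order at weight 1."""
    baseline = order_by_rank(observed[1.0])
    return [w for w, ranks in observed.items() if order_by_rank(ranks) != baseline]


if __name__ == "__main__":
    for weight, ranks in OBSERVED.items():
        print(f"weight {weight:<4}: order {order_by_rank(ranks)}, rank of 306342 = {ranks[306342]}")
    print("order differs from weight 1 at:", changed_orders(OBSERVED))
```

Run against these numbers, the rank of key 306342 peaks at weight 0.2, and the order first departs from the weight-1 order at 0.17.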

It seems there is a point where the weight means the opposite, something like "do not include this term". Is there an explanation for this behavior? Why does the rank increase when the weight decreases? Why does the rank decrease after some point until the results are inverted, and how can this point be predicted? I use a custom "word breaker": when the user searches for something, it creates the following query:

    CONTAINSTABLE(documentParts, title, 'ISABOUT (
        "wordA wordB wordC" weight (0.8),
        "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6),
        "wordA*" weight (0.1),
        "wordB*" weight (0.1),
        "wordC*" weight (0.1)
    )')
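A minimal Python sketch of how a word breaker might assemble this condition from the user's words. The helper `build_isabout` and its quote-stripping step are assumptions for illustration, not the author's code; the weights default to the values in the query above. The resulting string is best passed to `CONTAINSTABLE` as a query parameter rather than concatenated into the SQL text.

```python
def build_isabout(words, phrase_weight=0.8, near_weight=0.6, term_weight=0.1):
    """Build an ISABOUT condition shaped like the one in the question (hypothetical helper)."""
    # Remove double quotes so user input cannot terminate a quoted term early.
    clean = [w.replace('"', "") for w in words]
    clean = [w for w in clean if w]
    if not clean:
        raise ValueError("no searchable words")

    phrase = " ".join(clean)
    parts = [f'"{phrase}" weight ({phrase_weight})']
    # NEAR needs at least two terms, so skip the proximity clause for a single word.
    if len(clean) > 1:
        near = " NEAR ".join(f'"{w}*"' for w in clean)
        parts.append(f"{near} weight ({near_weight})")
    # One prefix term per word, each with the low weight.
    parts.extend(f'"{w}*" weight ({term_weight})' for w in clean)
    return "ISABOUT (" + ", ".join(parts) + ")"


if __name__ == "__main__":
    print(build_isabout(["wordA", "wordB", "wordC"]))
```

For the three example words this yields the same condition as the query above, written on one line.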

Should I expect big ranks for the 0.1 words? Is the following query equivalent to the one above, and should I expect some odd behavior with the 0.1 weights?

    CONTAINSTABLE(documentParts, title, '
        ISABOUT ("wordA wordB wordC" weight (0.8))
        OR ISABOUT ("wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6))
        OR ISABOUT ("wordA*" weight (0.1))
        OR ISABOUT ("wordB*" weight (0.1))
        OR ISABOUT ("wordC*" weight (0.1))
    ')

EDIT: I found this topic: msdn.microsoft/en-us/library/ms142524(v=sql.105).aspx. It answers some of my questions, but raises some new ones!

I am searching two tables, "documents" and "documentParts", and use a UNION ALL to sum the ranks and get my results. According to this article, that is wrong: the number of indexed rows goes into the rank computation, so summing RANK values from the two tables is like adding apples and carrots...

My solution for now is to compute a percentage for each CONTAINSTABLE result, like this:

    Log(RANK) / Log(Sum(RANK) OVER (PARTITION BY 1)) AS [PERCENT]

and then sum these values across the two tables...
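A minimal Python sketch of the same arithmetic, assuming each result set has already been mapped to a common document key (how documentParts rows map to documents is not shown in the question, so that mapping is an assumption). Because the score is a ratio of logarithms it does not depend on the log base, so T-SQL's natural-log `LOG` gives the same values. `LOG(0)` fails, and a result set whose ranks sum to 1 would divide by `LOG(1) = 0`; the sketch rejects both cases. `log_share` and `combine` are hypothetical names.

```python
import math
from collections import defaultdict


def log_share(ranks):
    """LOG(RANK) / LOG(SUM(RANK) OVER (...)) for one result set given as {key: rank}."""
    if not ranks:
        return {}
    total = sum(ranks.values())
    if any(r <= 0 for r in ranks.values()) or total <= 1:
        # log(0) is undefined, and log(1) == 0 would make the divisor zero.
        raise ValueError("ranks must be positive and sum to more than 1")
    return {key: math.log(r) / math.log(total) for key, r in ranks.items()}


def combine(*result_sets):
    """Sum the per-set scores by key, like UNION ALL followed by GROUP BY [KEY]."""
    scores = defaultdict(float)
    for ranks in result_sets:
        for key, share in log_share(ranks).items():
            scores[key] += share
    return dict(scores)


if __name__ == "__main__":
    documents = {1: 100}               # hypothetical ranks from documents
    document_parts = {1: 10, 2: 90}    # hypothetical ranks from documentParts
    print(combine(documents, document_parts))
```

Within one result set these scores do not add up to 1 (here 10 and 90 give 0.5 and roughly 0.98), so they behave as relative scores rather than percentages in the strict sense.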

Solution

In my experience, I have had the best results when the weights add up to 1.

    CONTAINSTABLE(documentParts, content, 'ISABOUT (
        "wordA wordB wordC" weight (0.5),
        "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.2),
        "wordA*" weight (0.1),
        "wordB*" weight (0.1),
        "wordC*" weight (0.1)
    )')
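If the weights are generated in code, one way to follow this advice is to rescale them so they sum to 1; for example, the question's weights (0.8, 0.6, 0.1, 0.1, 0.1) add up to 1.7. A minimal Python sketch; `normalize_weights` and `sums_to_one` are hypothetical helpers, and the check uses a tolerance because decimal weights such as 0.1 are not exact in binary floating point.

```python
import math


def normalize_weights(weights):
    """Rescale non-negative weights so they add up to 1 (hypothetical helper)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def sums_to_one(weights, tol=1e-9):
    """True if the weights add up to 1 within a small floating-point tolerance."""
    return math.isclose(sum(weights), 1.0, abs_tol=tol)


if __name__ == "__main__":
    answer_weights = [0.5, 0.2, 0.1, 0.1, 0.1]     # weights from the query above
    question_weights = [0.8, 0.6, 0.1, 0.1, 0.1]   # weights from the question
    print(sums_to_one(answer_weights), sums_to_one(question_weights))
    print([round(w, 3) for w in normalize_weights(question_weights)])
```

Rescaling keeps the relative order of the weights and every value stays within ISABOUT's 0.0 to 1.0 range.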


Published: 2023-11-22 05:37:31