Levenshtein矩阵仅使用对角线条(Levenshtein Matrix using only a diagonal strip)

编程入门 行业动态 更新时间:2024-10-28 03:21:18
Levenshtein矩阵仅使用对角线条(Levenshtein Matrix using only a diagonal strip)

根据维基百科 ,Wagner-Fischer算法可能会进行修改,可以计算出两个单词的Levenshtein距离是否低于某个阈值,如果这是您想知道的全部内容,则比原始的阈值快得多。

“通过检查对角线而不是行,并通过使用惰性评估,我们可以在O(m(1 + d))时间(其中d是Levenshtein距离)中找到Levenshtein距离,这比常规动态规划算法快得多如果距离很小“。

这个解决方案如何工作? 我真的很难看到它,因为它感觉像任何矩阵单元格的值取决于上面的单元格的值,左侧和对角线左侧,所以我不知道如何遍历该矩阵仅使用对角条。

According to wikipedia there's a possible modification to the Wagner-Fischer-algorithm that can calculate if the Levenshtein distance of two words is lower than a certain threshold, that is much quicker than the original one if that's all you want to know.

"By examining diagonals instead of rows, and by using lazy evaluation, we can find the Levenshtein distance in O(m (1 + d)) time (where d is the Levenshtein distance), which is much faster than the regular dynamic programming algorithm if the distance is small."

How does this solution work? I'm having a real hard time visualizing it since it feels like the value of any matrix cell depends on the values of the cells above, to the left and diagonally up to the left of it, so I'm not sure how to traverse the matrix using only a diagonal strip.

最满意答案

第二次尝试解释:

假设我们找到长度为m的词和长度为n的词之间的距离。 设矩阵条目的索引为[0,m]×[0,n],其中(i,j)条目表示长度为m的单词的长度为i的前缀与长度为j的长度为j的前缀之间的编辑距离长度为n的词。

我们可以将动态程序视为在顶点对应于矩阵条目的有向图中找到从(0,0)到(m,n)的最短路径,长度为1的向右弧和长度为1的向下弧和长度为0或长度为1的对角线弧,这取决于i和j上的字符是否匹配。 简而言之,这个想法是使用具有长度差异启发式H(i,j)= |(m-i) - (n-j)|的A * 。 然后,我们不扩展其A *值大于d的节点。 结果是只有一些对角线需要打开:

o t h e r w o r d t * * * h * * * e * * * w * * * o * * * r * * * d * * *

首先尝试解释:

每个矩阵项(i,j)由| i - j |下界,因为这是达到该状态所需的非对角移动数的下限。 该条是坐标(i,j)满足| i - j |的每个元素 ≤d,看起来像

o t h e r w o r d t * * * h * * * * e * * * * * w * * * * * o * * * * * r * * * * * d * * * * *

对于d = 2。当需要关闭条带上的空白元素值时,只需使用d。 最后,任何条带条目≤d都是有效的,因为空白元素只能贡献d + 1,因为条带元素的左上邻居也在条带上。

对于不同长度的单词,我们实际上可以将参数应用于转置,并将其限制为条状

o t h e r w o r d t * * * h * * * e * * * w * * * o * * * r * * * d * * *

尽管渐近运行时间是相同的。

Second attempt at explanation:

Suppose that we're finding the distance between a length-m word and a length-n word. Let the matrix entries be indexed by [0, m] × [0, n], where the (i, j) entry represents the edit distance between the length-i prefix of the length-m word and the length-j prefix of the length-n word.

We can view the dynamic program as finding the shortest path from (0, 0) to (m, n) in a directed graph whose vertices correspond to matrix entries, with length-1 rightward arcs and length-1 downward arcs and length-0 or length-1 diagonal arcs depending on whether the characters at i and j match. The idea, in a nutshell, is to use A* with the length difference heuristic H(i, j) = |(m - i) - (n - j)|. Then, we don't expand nodes whose A* value is greater than d. The result is that only some of the diagonals need to be opened:

o t h e r w o r d t * * * h * * * e * * * w * * * o * * * r * * * d * * *

First attempt at explanation:

Each matrix entry (i, j) is lowerbounded by |i - j|, because that's a lowerbound on the number of non-diagonal moves needed to reach that state. The strip is every element whose coordinates (i, j) satisfies |i - j| ≤ d, which looks like

o t h e r w o r d t * * * h * * * * e * * * * * w * * * * * o * * * * * r * * * * * d * * * * *

for d = 2. When the values for the blank elements off of the strip are needed, just use d. At the end, any strip entry that's ≤ d is valid, because the blank elements can only contribute d + 1, seeing as the upper left neighbor of a strip element is also on the strip.

For words of different lengths, we can actually apply the argument to the transpose and restrict to the strip like

o t h e r w o r d t * * * h * * * e * * * w * * * o * * * r * * * d * * *

though the asymptotic running time is the same.

更多推荐

本文发布于:2023-08-04 18:08:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1420942.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:矩阵   线条   Levenshtein   strip   diagonal

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!