Levenshtein距离与界限/界限

编程入门 行业动态 更新时间:2024-10-07 14:28:30
本文介绍了Levenshtein距离与界限/界限的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

我发现Python 实现= nofollow noreferrer> Levenshtein距离。

I have found some Python implementations of the Levenshtein distance.

我想知道如何有效地修改这些算法,以便在Levenshtein距离大于 n (例如3)而不是一直运行到最后?

I am wondering though how these algorithms can be efficiently modified so that they break if the Levenshtein distance is greater than n (e.g. 3) instead of running until the end?

所以本质上我不想让算法如果我只是想知道距离是否大于阈值,则运行太长的时间来计算最终距离。

So essentially I do not want to let the algorithm run for too long to calculate the final distance if I simply want to know if the distance is greater than a threshold or not.

我在这里找到了一些相关的文章:

I have found some relevant posts here:

  • Levenstein距离限制
  • 计算Levenshtein距离的最有效方法
  • Modifying Levenshtein Distance algorithm to not calculate all distances
  • Levenstein distance limit
  • Most efficient way to calculate Levenshtein distance
  • Levenshtein Distance Algorithm better than O(n*m)?
  • 但仍然,我看不到任何Python代码能够完成我上面所描述的(或多或少这些文章也描述了)。

    but still, I do not see any Python code which does what I describe above (which is more or less what these posts describe too).

    PS:解决方案下面的@amirouche提供的内容基于我已经通过一些基准测试测试过的最快实现(来自此处: en.wikibooks/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python , h ttps://stackoverflow/a/32558749/9024698 )及其有界版本是我测试中同类产品中最快的一种(不排除可能有更快的测试)。

    P.S.: The solution provided by @amirouche below is based on the fastest implementation that I have tested with some benchmarking (from here: en.wikibooks/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python, stackoverflow/a/32558749/9024698) and its bounded version is the fastest one of its kind from my tests (without excluding that there may be even faster ones).

    推荐答案

    如 Levenstein距离限制,您可以在计算得出要早返回的行上添加测试:

    As described in Levenstein distance limit, you can add a test over the row that is computed to return early:

    def levenshtein(s1, s2, maximum): if len(s1) > len(s2): s1, s2 = s2, s1 distances = range(len(s1) + 1) for i2, c2 in enumerate(s2): distances_ = [i2+1] for i1, c1 in enumerate(s1): if c1 == c2: distances_.append(distances[i1]) else: distances_.append(1 + min((distances[i1], distances[i1 + 1], distances_[-1]))) if all((x >= maximum for x in distances_)): return False distances = distances_ return distances[-1]

    更多推荐

    Levenshtein距离与界限/界限

    本文发布于:2023-11-29 09:23:40,感谢您对本站的认可!
    本文链接:https://www.elefans.com/category/jswz/34/1645994.html
    版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
    本文标签:界限   距离   Levenshtein

    发布评论

    评论列表 (有 0 条评论)
    草根站长

    >www.elefans.com

    编程频道|电子爱好者 - 技术资讯及电子产品介绍!