Algorithm to calculate the percent difference between two blobs of text

Updated: 2024-10-12 16:23:44

Question

I've been researching an efficient solution to this. I've looked into diffing engines (Google's diff-match-patch, Python's diff) and some longest common chain algorithms.

I was hoping to get your suggestions on how to solve this issue. Is there any algorithm or library in particular you would recommend?

Thanks.

Recommended answer

I don't know what "longest common [chain? substring?]" has to do with "percent difference", especially after seeing in a comment that you expect a very small % difference between two strings that differ by one character in the middle (so their longest common substring is only about one half of the strings' length).
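To make the point about the longest common substring concrete, here is a minimal sketch. The helper name `longest_common_substring_length` is hypothetical (it is not part of the original answer), and the two sample strings are the ones used later in this answer.

```python
def longest_common_substring_length(first, second):
    """Length of the longest contiguous run of characters shared by both strings."""
    best = 0
    # previous[j] = length of the common run ending at first[i-2] and second[j-1]
    previous = [0] * (len(second) + 1)
    for i in range(1, len(first) + 1):
        current = [0] * (len(second) + 1)
        for j in range(1, len(second) + 1):
            if first[i - 1] == second[j - 1]:
                # Extend the run that ended one character earlier in both strings.
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


a = "the quick brown fox"
b = "the quick vrown fox"
shared = longest_common_substring_length(a, b)
# One changed character near the middle splits the shared text in two,
# so the longest common substring covers only about half of the string.
print(shared, len(a))  # prints: 10 19
```

A measure built on this value would report the two strings as roughly 47% different, which is far from the "very small" difference the question expects.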

Ignoring the "longest common" strangeness, and defining "percent difference" as the edit distance between the strings divided by the max length (times 100, of course ;-), what about:

    def levenshtein_distance(first, second):
        """Find the Levenshtein distance between two strings."""
        # Make `first` the shorter string.
        if len(first) > len(second):
            first, second = second, first
        if len(second) == 0:
            return len(first)
        first_length = len(first) + 1
        second_length = len(second) + 1
        # distance_matrix[i][j] = distance between first[:i] and second[:j]
        distance_matrix = [[0] * second_length for _ in range(first_length)]
        for i in range(first_length):
            distance_matrix[i][0] = i
        for j in range(second_length):
            distance_matrix[0][j] = j
        for i in range(1, first_length):
            for j in range(1, second_length):
                deletion = distance_matrix[i-1][j] + 1
                insertion = distance_matrix[i][j-1] + 1
                substitution = distance_matrix[i-1][j-1]
                if first[i-1] != second[j-1]:
                    substitution += 1
                distance_matrix[i][j] = min(insertion, deletion, substitution)
        return distance_matrix[first_length-1][second_length-1]

    def percent_diff(first, second):
        longest = max(len(first), len(second))
        if longest == 0:
            return 0.0  # two empty strings: avoid dividing by zero
        return 100 * levenshtein_distance(first, second) / float(longest)

    a = "the quick brown fox"
    b = "the quick vrown fox"
    print('%.2f' % percent_diff(a, b))

The Levenshtein function is from Stavros' blog. The result in this case would be 5.26 (percent difference).
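Since the question asks about efficiency on larger blobs of text, one possible refinement is to keep only two rows of the distance table instead of the full matrix. The sketch below is an assumption on my part rather than part of the original answer, and the names `levenshtein_two_rows` and `percent_diff_two_rows` are hypothetical. It computes the same edit distance while using memory proportional to the shorter string.

```python
def levenshtein_two_rows(first, second):
    """Levenshtein distance that stores only the previous and current table rows."""
    if len(first) < len(second):
        # Make `second` the shorter string so each row is as short as possible.
        first, second = second, first
    # Row 0: turning an empty prefix into second[:j] takes j insertions.
    previous = list(range(len(second) + 1))
    for i, char_first in enumerate(first, start=1):
        current = [i]  # turning first[:i] into an empty prefix takes i deletions
        for j, char_second in enumerate(second, start=1):
            cost = 0 if char_first == char_second else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def percent_diff_two_rows(first, second):
    """Edit distance as a percentage of the longer string's length."""
    longest = max(len(first), len(second))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return 100.0 * levenshtein_two_rows(first, second) / longest


print('%.2f' % percent_diff_two_rows("the quick brown fox", "the quick vrown fox"))
```

The running time is still proportional to the product of the two lengths, so for very large inputs a dedicated library may be a better fit.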

Published: 2023-11-29 11:35:12
Link: https://www.elefans.com/category/jswz/34/1646279.html