我正在寻找一个 Python 库,用于从一组字符串中找到最长的公共子字符串.有两种方法可以解决这个问题:
I'm looking for a Python library for finding the longest common sub-string from a set of strings. There are two ways to solve this problem:
- 使用后缀树
- 使用动态规划.
实现的方法并不重要.重要的是它可以用于一组字符串(不仅仅是两个字符串).
Method implemented is not important. It is important it can be used for a set of strings (not only two strings).
推荐答案这些成对的函数将在任意字符串数组中找到最长的公共字符串:
These paired functions will find the longest common string in any arbitrary array of strings:
def long_substr(data): substr = '' if len(data) > 1 and len(data[0]) > 0: for i in range(len(data[0])): for j in range(len(data[0])-i+1): if j > len(substr) and is_substr(data[0][i:i+j], data): substr = data[0][i:i+j] return substr def is_substr(find, data): if len(data) < 1 and len(find) < 1: return False for i in range(len(data)): if find not in data[i]: return False return True print long_substr(['Oh, hello, my friend.', 'I prefer Jelly Belly beans.', 'When hell freezes over!'])毫无疑问,算法可以改进,而且我对 Python 的接触不多,所以也许它在语法上也可以更有效,但它应该可以完成工作.
No doubt the algorithm could be improved and I've not had a lot of exposure to Python, so maybe it could be more efficient syntactically as well, but it should do the job.
编辑:内联第二个 is_substr 函数,如 J.F. Sebastian 所示.用法保持不变.注意:算法没有变化.
in-lined the second is_substr function as demonstrated by J.F. Sebastian. Usage remains the same. Note: no change to algorithm.
def long_substr(data): substr = '' if len(data) > 1 and len(data[0]) > 0: for i in range(len(data[0])): for j in range(len(data[0])-i+1): if j > len(substr) and all(data[0][i:i+j] in x for x in data): substr = data[0][i:i+j] return substr希望这会有所帮助,
杰森.
更多推荐
来自两个以上字符串的最长公共子字符串
发布评论