I'm looking for a way to find the longest duplicate substring in a group of strings. The longest duplicate substring problem is usually stated for a single string rather than a group of strings. What type of algorithm would be useful for finding the longest duplicate substring across a group of strings?
Finding the longest duplicate substring in a group of files (to remove duplicated code from large software libraries) is the main use case I have in mind, but this algorithm would have many other uses as well.
For example, I'd want to find the longest duplicate substring in this group of strings:
"Hello world, this is the first string."
"Hello to the world, this is the second string."
"Hello world. This is the third string."
"This is the third string."

In this case, "This is the third string." would be the longest repeated string (i.e., the longest string that appears in more than one of these strings).
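To make the definition concrete, here is a small check that counts how many of the example strings contain a given candidate. The helper name `strings_containing` is hypothetical and only illustrates the "appears in more than one of these strings" condition; it does not search for the longest one.

```python
def strings_containing(strings, candidate):
    """Count how many of the input strings contain `candidate` as a substring."""
    return sum(1 for s in strings if candidate in s)


strings = [
    "Hello world, this is the first string.",
    "Hello to the world, this is the second string.",
    "Hello world. This is the third string.",
    "This is the third string.",
]
# The candidate occurs in the third and fourth strings, so it counts as a duplicate.
print(strings_containing(strings, "This is the third string."))  # 2
```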
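Solution:
- Build the set of all distinct substrings of each string; let's call it T(i) for string i
- Create an empty map from substrings to counts; let's call it M
- For each string i, for each substring P in T(i):
  - if key P is already in M, then increment M[P]
  - else, insert M[P] = 1
- Find the pair (P*, C*) in M such that:
  - C* >= 2 (*)
  - length(P*) is maximum among all such pairs

(*) If you wanted to get the longest substring common to at least K of the strings, you would replace the 2 with K.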
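A minimal brute-force sketch of the steps above, assuming each T(i) is the set of distinct substrings of string i (so a substring repeated inside one string is counted once for that string). The function name `longest_shared_substring` and its `k` parameter are hypothetical; `k` plays the role of the K in the footnote.

```python
def longest_shared_substring(strings, k=2):
    """Return the longest substring that occurs in at least k of `strings`.

    Brute-force version of the map-counting steps; returns None if no
    substring is shared by k strings. Ties between equally long substrings
    are broken arbitrarily (set iteration order).
    """
    counts = {}  # M: substring P -> number of input strings containing P
    for s in strings:
        # T(i): the set of distinct substrings of this string
        t_i = {s[a:b] for a in range(len(s)) for b in range(a + 1, len(s) + 1)}
        for p in t_i:
            counts[p] = counts.get(p, 0) + 1  # increment M[P], or insert M[P] = 1
    # Keep the pairs (P, C) with C >= k, then take the P with maximum length
    shared = [p for p, c in counts.items() if c >= k]
    return max(shared, key=len) if shared else None


strings = [
    "Hello world, this is the first string.",
    "Hello to the world, this is the second string.",
    "Hello world. This is the third string.",
    "This is the third string.",
]
print(longest_shared_substring(strings))             # This is the third string.
print(repr(longest_shared_substring(strings, k=4)))  # 'his is the '
```

A string of length L has O(L^2) substrings with O(L^3) total characters, so this sketch only suits short inputs. For whole source files, a generalized suffix tree or suffix array would typically be used to get the same answer without materializing every substring.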