我需要找到一个字符串中最长的非重叠重复子字符串。我有可用的字符串的后缀树和后缀数组。
I need to find the longest non-overlapping repeated substring in a String. I have the suffix tree and suffix array of the string available.
允许重叠时,答案很简单(后缀树中最深的父节点)。
When overlapping is allowed, the answer is trivial (deepest parent node in suffix tree).
例如String = acaca
For example for String = "acaca"
如果允许重叠,则答案为 aca,但不允许重叠允许,答案是 ac或 ca。
If overlapping is allowed, the answer is "aca" but when overlapping is not allowed, the answer is "ac" or "ca".
我只需要算法或高级思路。
I need the algorithm or high level idea only.
PS:我尝试过,但是没有明确的答案可以在网上找到。
P.S.: I tried but there is no clear answer I can find on web.
推荐答案生成后缀数组并按O(nlogn).ps排序:有更有效的算法,例如DC3和Ukkonen算法。 示例:
Generate suffix array and sort in O(nlogn).ps: There is more effective algorithm like DC3 and Ukkonen algorithm. example:
字符串:ababc 后缀数组:子字符串的起始索引| substring 0-ababc 2-abc 1-babc 3-bc 4- c
String : ababc Suffix array: start-index of substring | substring 0 - ababc 2 - abc 1 - babc 3 - bc 4 - c
比较每个连续的两个子字符串,并获得具有以下约束的公共前缀: 说 i1 是子字符串 ababc 的索引: 0 说 i2 是子字符串 abc : 2 的公共前缀为 ab ,公共前缀的长度为 len
compare each two consecutive substring and get common prefix with following constraint: Say i1 is index of substring "ababc": 0 say i2 is index of substring "abc":2 common prefix is "ab" , length of common prefix is len
abs(i1-i2)> len //避免重叠
abs(i1-i2) >len //avoid overlap
通过带有解决方案的后缀数组,您将获得 ababc的结果,即 ab ;
go through suffix array with solution, and you will get the result of "ababc", which is "ab";
整个解决方案将运行O(nlogn)
The whole solution will run O(nlogn)
但是,会有一个特例: aaaaa,该解决方案无法彻底解决。 欢迎讨论并提出O(nlogn)而不是O(n ^ 2)的解决方案
However, there will be a special case: "aaaaa" that this solution can't solve thoroughly. Welcome to discuss and come up to a solution of O(nlogn) but not O(n^2)
更多推荐
使用后缀树/数组的最长非重叠重复子串(仅算法)
发布评论