I'm writing a regular expression to check whether a string contains a substring made of at least two adjacent repeats of some pattern. I compare the regex's full match with the original string: if they are equal, such a pattern exists. An example says it better: 1010 contains the pattern 10, and it occurs twice in a row. On the other hand, 10210 has no such pattern, because the two occurrences of 10 are not adjacent.
What's more, I need to find the longest possible pattern, and its length is at least 1. I have written the expression ^.*?(.+)(\1).*?$ to check for it. To find the longest pattern, I used a non-greedy quantifier to match whatever precedes the pattern; then the pattern is captured in group 1, and the same text that group 1 matched is matched once more. Finally the rest of the string is matched, so the full match equals the original string. The problem is that the regex is eager to return as soon as it finds the first pattern, and it doesn't take into account that I want the substrings before and after the pattern to be as short as possible (leaving the pattern as long as possible). So for the string 01011010 I correctly get a match, but the pattern stored in group 1 is just 01, although I'd expect 101.
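The behaviour described above is not specific to one engine. Here is a minimal sketch in Python, assuming its standard re module backtracks through this pattern the same way; the helper name first_pair is made up for illustration. It runs the expression from the question on some of the sample strings and prints what ends up in group 1.

```python
import re

# The expression from the question, unchanged.
PATTERN = re.compile(r'^.*?(.+)(\1).*?$')

def first_pair(s):
    """Return group 1 of the question's pattern, or None when nothing matches."""
    m = PATTERN.match(s)
    return m.group(1) if m else None

if __name__ == '__main__':
    samples = ['01011010', '56712453289', '22010110100',
               '5555555', '2323191919191919']
    for s in samples:
        print(s, first_pair(s))
```

For 01011010 this reports 01 rather than 101: the lazy prefix starts out empty, and the greedy group settles for the longest pair that begins at the first position where any pair exists.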
As I believe I can't make the pattern "more greedy" or the junk before and after it even "more non-greedy", the only idea I can come up with is to make the regex less eager, but I'm not sure whether that is possible.
More examples:

56712453289 - no pattern - no match with the original string
22010110100 - pattern 101 - match with the original string (the regex matched 22010110100 with 101 in group 1)
5555555 - pattern 555 - match
1919191919 - pattern 1919 - match
191919191919 - pattern 191919 - match
2323191919191919 - pattern 191919 - match

What I get with the current expression (same strings, in the same order):

no pattern - no match
pattern 2 - match
pattern 555 - match
pattern 1919 - match
pattern 191919 - match
pattern 23 - match

Recommended answer
In Perl you can do it with a single expression, with the help of (??{ code }):
    use v5.10;
    $_ = '01011010';
    say /(?=(.+)\1)(?!(??{ '.+?(..{' . length($^N) . ',})\1' }))/;

Output:

    101

What happens here is that after matching a consecutive pair of substrings, we make sure with a negative lookahead that no longer pair follows it.
To build the expression for the longer pair, the postponed subexpression construct (??{ code }) is used. It evaluates the code inside every time it is reached and uses the returned string as a pattern.
The subexpression it constructs has the form .+?(..{N,})\1, where N is the current length of the first capturing group (length($^N); $^N contains the current value of the most recently closed capturing group). Since ..{N,} matches at least N+1 characters, any pair it finds is strictly longer than the one already captured.
Thus the full expression would have the form:

    (?=(.+)\1)(?!.+?(..{N,})\2)

with the magical N (and with the second capturing group not being a "real", proper capturing group of the original expression).
Usage example:

    use v5.10;

    # Returns the captured group(s) of the match in list context.
    sub longest_rep {
        $_[0] =~ /(?=(.+)\1)(?!(??{ '.+?(..{' . length($^N) . ',})\1' }))/;
    }

    say longest_rep '01011010';
    say longest_rep '010110101000110001';
    say longest_rep '2323191919191919';
    say longest_rep '22010110100';

Output:

    101
    10001
    191919
    101
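Engines without (??{ code }) need another route. As a rough sketch only, and not the answer's own method, the same idea can be imitated in Python by building the "longer pair" check by hand at each start position; a plain brute-force search is included as a cross-check. Both function names are hypothetical, and the emulation skips the backtracking into shorter first groups that Perl would attempt, on the assumption that this never changes the result.

```python
import re

# Longest adjacent pair starting exactly at a given position (greedy group).
PAIR = re.compile(r'(.+)\1', re.DOTALL)

def longest_rep_emulated(s):
    """Imitate (?=(.+)\\1)(?!.+?(..{N,})\\1) by rebuilding the lookahead per position."""
    for pos in range(len(s)):
        m = PAIR.match(s, pos)
        if not m:
            continue
        n = len(m.group(1))
        # Same shape as the string built inside (??{ ... }): a strictly
        # longer pair (at least n + 1 characters) starting further right.
        longer = re.compile(r'.+?(..{%d,})\1' % n, re.DOTALL)
        if not longer.match(s, pos):
            return m.group(1)
    return None

def longest_rep_bruteforce(s):
    """Cross-check: try pair lengths from longest to shortest, leftmost hit wins."""
    for n in range(len(s) // 2, 0, -1):
        m = re.search(r'(.{%d})\1' % n, s, re.DOTALL)
        if m:
            return m.group(1)
    return None

if __name__ == '__main__':
    for s in ['01011010', '010110101000110001',
              '2323191919191919', '22010110100', '56712453289']:
        print(s, longest_rep_emulated(s), longest_rep_bruteforce(s))
```

On the four strings from the usage example both functions give 101, 10001, 191919 and 101, matching the Perl output; the brute-force version is the simpler one to reason about when the pattern does not have to be a single regex.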