I am trying to match an optional group that can be preceded and followed by any number of characters. The entire pattern also has a required beginning and ending match, but the middle match is optional.
I started with this, which works when the middle group is required:
    string text = @"blah blah foo This is a test blah. the test does not work. bar";
    string requiredBlah = @"(foo).*?(blah).*?(bar)";
    Match m = Regex.Match(text, requiredBlah);

Results are "foo", "blah", "bar".
However, when the middle group is optional, the regex engine seems to prefer not to match the middle group.

    string optionalBlah = @"(foo).*?(blah)?.*?(bar)";

Results: "foo", "", "bar".
This SO answer says that I can capture the middle optional group if there are delimiters before and after the optional group, but that is not my situation.
I could skip the optional group entirely and use string.Contains("blah"), but I'm wondering if there is a purely regex solution to this kind of problem. My goal is to design regular expressions that match a generic pattern, with multiple optional parts, so that I can determine which parts of the pattern are missing.
Solution

This problem is quite common. The first .*? and (blah)? can both match an empty string right after foo, and the second dot matching pattern then grabs blah on its way to bar. It does not have to yield blah back to (blah)? because that group is optional (see this demo, where I added capture groups to the original regex to show which group matches blah).
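The effect can be reproduced locally with a minimal sketch. It assumes C# 9 or later (top-level statements) and wraps both lazy dot patterns of the original optionalBlah regex in capture groups, so the group numbering here differs from the question's pattern:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "blah blah foo This is a test blah. the test does not work. bar";

// The question's optional pattern, with each lazy .*? wrapped in its own
// capture group so we can see which part of the pattern consumed "blah".
Match m = Regex.Match(text, @"(foo)(.*?)(blah)?(.*?)(bar)");

for (int i = 1; i < m.Groups.Count; i++)
{
    Group g = m.Groups[i];
    // Success is false for an optional group that did not take part in the match.
    Console.WriteLine($"Group {i}: Success={g.Success}, Value=\"{g.Value}\"");
}
```

Group 3, the optional (blah)?, reports that it did not participate, while Group 4, the second lazy dot pattern, holds the text containing blah.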
The simplest solution is to enclose the lazy .*? pattern and the (blah) capturing group into an optional non-capturing group (i.e. (?:.*?(blah))?) to make the regex engine try matching the group pattern at least once (= greedily):
    (foo)(?:.*?(blah))?.*?(bar)

See the regex demo. Here, (foo) captures foo in Group 1, (?:.*?(blah))? matches an optional sequence of 0 or more chars other than line break chars, as few as possible, and then captures blah into Group 2, and then .*?(bar) matches 0 or more chars other than line break chars, as few as possible, and then captures bar into Group 3.
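As a rough sketch of how this pattern could be used from C# to tell which part is missing (assuming C# 9 or later for top-level statements; the withoutBlah input is a hypothetical string added for illustration, not taken from the question):

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"(foo)(?:.*?(blah))?.*?(bar)";

string withBlah = "blah blah foo This is a test blah. the test does not work. bar";
string withoutBlah = "foo This is a test. bar"; // hypothetical input without the middle part

Match hit = Regex.Match(withBlah, pattern);
Match miss = Regex.Match(withoutBlah, pattern);

// Group.Success distinguishes "optional part absent" from "captured an empty string".
Console.WriteLine($"with blah:    middle present = {hit.Groups[2].Success}");
Console.WriteLine($"without blah: middle present = {miss.Groups[2].Success}");
```

Checking Group.Success rather than comparing Value to an empty string keeps the test reliable if an optional part is ever allowed to match zero characters.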
Another solution is to restrict the dot matching with a lookahead (using a so-called tempered greedy token):

    (foo)(?:(?!blah).)*(blah)?.*?(bar)
         ^^^^^^^^^^^^^^

See the regex demo. The (?:(?!blah).)* pattern matches any text up to the first blah. (If it is at the end of the pattern, it may also match up to the end of the string.)
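A similar sketch for the tempered greedy token variant, again assuming C# 9 or later and a hypothetical second input that lacks blah:

```csharp
using System;
using System.Text.RegularExpressions;

// (?:(?!blah).)* consumes one char at a time, but only while "blah" does not start there.
string tempered = @"(foo)(?:(?!blah).)*(blah)?.*?(bar)";

string present = "blah blah foo This is a test blah. the test does not work. bar";
string absent = "foo This is a test. bar"; // hypothetical input without the middle part

Match a = Regex.Match(present, tempered);
Match b = Regex.Match(absent, tempered);

Console.WriteLine($"present: middle = \"{a.Groups[2].Value}\", found = {a.Groups[2].Success}");
Console.WriteLine($"absent:  middle = \"{b.Groups[2].Value}\", found = {b.Groups[2].Success}");
```

When blah is absent, the tempered token first runs to the end of the input and then backtracks until bar can match, so the overall match still succeeds with Group 2 unset.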