Regex to match an optional group surrounded by any-character groupings

Updated: 2024-10-27 18:31:45
Problem description

I am trying to match an optional group that can be preceded and followed by any number of characters. The entire pattern also has a required beginning and ending match, but the middle match is optional.

I started with this, which works when the middle group is required:

string text = @"blah blah foo This is a test blah. the test does not work. bar";
string requiredBlah = @"(foo).*?(blah).*?(bar)";
Match m = Regex.Match(text, requiredBlah);

Results are "foo", "blah", "bar".

However, when the middle group is optional, I guess the mechanisms of the regex engine prefer to not match the middle group.

string optionalBlah = @"(foo).*?(blah)?.*?(bar)";

Results: "foo", "", "bar".

This SO answer says that I can capture the middle optional group if there are delimiters before and after the optional group, but that is not my situation.

I could skip the optional group entirely and use string.Contains("blah"), but I'm wondering if there is a purely regex solution to this kind of problem. My goal is to design regular expressions that match a generic pattern, with multiple optional parts, so that I can determine which parts of the pattern are missing.

Solution

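The problem is quite common. After (foo) matches, the first .*? matches an empty string, and (blah)?, being optional, also matches an empty string because blah does not start right at that position. The second dot matching pattern then expands, grabs the blah along with everything else up to bar, and does not have to yield it back to (blah)? as it is optional (see this demo where I added capture groups to the original regex to show what group matches blah).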
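To see which part of the pattern ends up holding blah, the two lazy dot patterns can themselves be wrapped in capture groups. The following is a minimal C# sketch of that idea, not the linked demo itself; the class name LazyDotTrace and the helper Run are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDotTrace
{
    // The pattern from the question, with both lazy dot patterns wrapped
    // in capture groups (2 and 4) so that their contents become visible.
    // Group 3 is the optional (blah) group.
    public const string Pattern = @"(foo)(.*?)(blah)?(.*?)(bar)";

    // Hypothetical helper: runs the instrumented pattern against the text.
    public static Match Run(string text) => Regex.Match(text, Pattern);

    public static void Main()
    {
        string text = @"blah blah foo This is a test blah. the test does not work. bar";
        Match m = Run(text);

        // Print every group with its Success flag; a group that did not
        // take part in the match has Success == false and an empty Value.
        for (int i = 1; i < m.Groups.Count; i++)
        {
            Group g = m.Groups[i];
            Console.WriteLine($"Group {i}: success={g.Success}, value=[{g.Value}]");
        }
    }
}
```

With the sample text from the question, group 3 (the optional blah) does not participate in the match, while group 4 (the second lazy dot pattern) holds everything between foo and bar, including blah.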

The simplest solution is to enclose the lazy .*? pattern and the (blah) capturing group into an optional non-capturing group (i.e. (?:.*?(blah))?) to make the regex engine try matching the group pattern at least once (= greedily):

(foo)(?:.*?(blah))?.*?(bar)

See the regex demo. Here, (foo) captures foo in Group 1; (?:.*?(blah))? matches an optional sequence of 0 or more chars other than line break chars, as few as possible, and then captures blah into Group 2; finally, .*?(bar) matches 0 or more chars other than line break chars, as few as possible, and then captures bar into Group 3.
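As a rough illustration of how this pattern could serve the goal stated in the question (finding out which optional part is missing), the C# sketch below checks Group.Success on the optional group. The class name OptionalGroupDemo, the helper HasMiddle, and the second sample string are assumptions made for this example and do not come from the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // Pattern from the answer: the lazy dot pattern and (blah) sit together
    // inside one optional non-capturing group.
    public const string Pattern = @"(foo)(?:.*?(blah))?.*?(bar)";

    // Hypothetical helper: true when the whole pattern matched AND the
    // optional middle group (Group 2) took part in the match.
    public static bool HasMiddle(string text)
    {
        Match m = Regex.Match(text, Pattern);
        return m.Success && m.Groups[2].Success;
    }

    public static void Main()
    {
        // Sample text from the question (blah occurs between foo and bar).
        string withBlah = @"blah blah foo This is a test blah. the test does not work. bar";
        // Assumed variant with no blah between foo and bar.
        string withoutBlah = @"foo This is a test. bar";

        Console.WriteLine(HasMiddle(withBlah));
        Console.WriteLine(HasMiddle(withoutBlah));
    }
}
```

Checking Group.Success rather than comparing Value with an empty string separates "the group did not participate" from "the group matched an empty string", which matters once a pattern has several optional parts.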

Another solution is to restrict the dot matching with a lookahead (using a so-called tempered greedy token):

(foo)(?:(?!blah).)*(blah)?.*?(bar)
     ^^^^^^^^^^^^^^

See the regex demo. The (?:(?!blah).)* pattern matches any text up to the first blah. (If it is at the end of the pattern, it may also match up to the end of string.)
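The tempered greedy token variant can be tried out in the same way. The sketch below is a minimal C# example under stated assumptions: the class name TemperedTokenDemo, the helper Capture, the "(missing)" placeholder, and the second sample string are all hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TemperedTokenDemo
{
    // Pattern from the answer: (?:(?!blah).)* consumes one char at a time,
    // but only while blah does not start at the current position.
    public const string Pattern = @"(foo)(?:(?!blah).)*(blah)?.*?(bar)";

    // Hypothetical helper: returns Groups 1 to 3, using "(missing)" for a
    // group that did not take part in the match, or an empty array when
    // the pattern does not match at all.
    public static string[] Capture(string text)
    {
        Match m = Regex.Match(text, Pattern);
        if (!m.Success) return new string[0];

        var result = new string[3];
        for (int i = 1; i <= 3; i++)
        {
            Group g = m.Groups[i];
            result[i - 1] = g.Success ? g.Value : "(missing)";
        }
        return result;
    }

    public static void Main()
    {
        // Sample text from the question.
        Console.WriteLine(string.Join(", ",
            Capture(@"blah blah foo This is a test blah. the test does not work. bar")));
        // Assumed variant with no blah between foo and bar.
        Console.WriteLine(string.Join(", ", Capture(@"foo This is a test. bar")));
    }
}
```

When blah is absent, the tempered token first runs to the end of the string and then backtracks until .*?(bar) can match, so the overall match still succeeds with Group 2 unset.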

Published: 2023-11-06 16:13:48
Article link: https://www.elefans.com/category/jswz/34/1564178.html