Regex does not match because of repeating a capturing group instead of capturing a repeated group

Updated: 2024-10-21 09:35:11

Problem description

I have the following regex:

/(?:[\[\{]*)(?:([A-G\-][^A-G\]\}]*)+)(?:[\]\}]*)/

which I apply to the following input:

{A''BsCb}

I expect 3 matched results:

A'' Bs Cb

But testing at regex101 only gives me the last match, Cb, and tells me that a repeated capturing group will only capture the last iteration, and that I should put a capturing group around the repeated group.

I thought that was what I had done! I thought I had understood the problem as described at www.regular-expressions.info/captureall.html, hence the brackets outside my + with the capturing group inside.

But either it's getting too late, or I need someone whose head doesn't implode at the mention of regexp to show me where I've gone wrong.

Recommended answer

You are trying to match a repeated capturing group and get every capture. That is not possible with PHP PCRE regex: a repeated capturing group keeps only the text of its last iteration.
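To make the behaviour concrete, here is a minimal sketch that runs the regex from the question with preg_match. The variable names are only illustrative. The whole match covers the full string, but group 1 holds only what its final iteration matched.

```php
<?php
// The pattern from the question, unchanged.
$re = '/(?:[\[\{]*)(?:([A-G\-][^A-G\]\}]*)+)(?:[\]\}]*)/';
$str = "{A''BsCb}";

preg_match($re, $str, $m);

echo $m[0], "\n"; // the whole match: {A''BsCb}
echo $m[1], "\n"; // group 1 after its last iteration: Cb
```

The earlier iterations (A'' and Bs) were matched, but each new iteration overwrote the previous capture, so they are not available in the result.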

What you can do is either extract all {...} / [...] substrings, trim the brackets and apply a simple [A-G-][^A-G]* regex to the contents (Solution 2 below), or add a \G operator, which makes the regex harder to maintain but lets it work in a single pass like the original (Solution 1 below).

Solution 1 is

/(?:[[{]*|(?!\A)\G)\K[A-G-][^A-G\]}]*/

Note: this regex does not check for the closing ] or }, but that check can be added with a positive lookahead.

  • (?:[[{]*|(?!\A)\G) - matches zero or more [ or { characters, or the end position of the previous successful match (the (?!\A) lookahead excludes the start of the string)
  • \K - omits the text matched so far from the reported match
  • [A-G-] - a letter from A to G, or a -
  • [^A-G\]}]* - zero or more chars other than A to G and other than ] and }.
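The pattern can be tried with preg_match_all, as in the sketch below. The second pattern, $reClosed, is not part of the answer: it is one possible way to add the positive lookahead mentioned in the note, under the assumption that "closed" simply means a ] or } follows before any other bracket. It does not check that the bracket types pair up.

```php
<?php
$str = "{A''BsCb}";

// Solution 1 pattern, exactly as given above.
$re = '/(?:[[{]*|(?!\A)\G)\K[A-G-][^A-G\]}]*/';
preg_match_all($re, $str, $m);
print_r($m[0]); // the three tokens: A'', Bs, Cb

// Assumed variant: the lookahead requires a closing ] or } later on,
// with no other bracket character in between.
$reClosed = '/(?:[[{]*|(?!\A)\G)\K[A-G-][^A-G\]}]*(?=[^\][{}]*[\]}])/';
preg_match_all($reClosed, $str, $closed);        // same three tokens
preg_match_all($reClosed, "{A''BsCb", $unclosed); // no closing bracket: no matches
print_r($closed[0]);
print_r($unclosed[0]);
```

Because each token is a separate match rather than an iteration of one group, preg_match_all returns all of them in $m[0].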


Solution 2 is

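$re = '/(?|{([^}]*)}|\[([^]]*)])/';
$str = "{A''BsCb}";
$res = array();
// Step 1: capture the contents of every {...} or [...] block into group 1.
preg_match_all($re, $str, $m);
foreach ($m[1] as $match) {
    // Step 2: split the contents into tokens starting with A-G or "-".
    preg_match_all('~[A-G-][^A-G]*~', $match, $tmp);
    // $tmp[0] holds the full matches; append them so the order is preserved.
    $res = array_merge($res, $tmp[0]);
}
print_r($res);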


The (?|{([^}]*)}|\[([^]]*)]) regex only matches strings like {...} or [...] (but not {...] or [...}) and captures the contents between the brackets into Group 1, since the branch reset group (?|...) restarts the group numbering in each branch. After that, all we need is the simpler '~[A-G-][^A-G]*~' regex to pull the tokens out of the captured contents.
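As a usage sketch, the two steps can be wrapped in a small function and run on input with several blocks. The function name extractTokens and the sample input are made up for illustration; the two patterns are the ones from Solution 2.

```php
<?php
// Hypothetical helper (the name is not from the answer).
function extractTokens(string $input): array
{
    $tokens = [];
    // Step 1: contents of each properly paired {...} or [...] block land in group 1.
    preg_match_all('/(?|{([^}]*)}|\[([^]]*)])/', $input, $blocks);
    foreach ($blocks[1] as $content) {
        // Step 2: split the contents into tokens starting with A-G or "-".
        preg_match_all('~[A-G-][^A-G]*~', $content, $parts);
        $tokens = array_merge($tokens, $parts[0]);
    }
    return $tokens;
}

// The mismatched "{F]" block is skipped because neither branch can close it.
$tokens = extractTokens("{A''BsCb} [DxE] {F]");
print_r($tokens);
```

Keeping the bracket check and the token split in separate patterns is what makes this version easier to read than the \G one, at the cost of a second matching pass.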

Published: 2023-11-01 18:04:19
Article link: https://www.elefans.com/category/jswz/34/1550028.html