Match the content inside square brackets, including nested square brackets

Updated: 2024-10-10 05:27:17
Problem description

I am attempting to write a spoiler identification system so that any spoilers in a string are replaced with a specified spoiler character.

I want to match a string surrounded by square brackets, such that the contents within the square brackets are capture group 1 and the whole string, including the surrounding brackets, is the match.
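In java.util.regex, Matcher.group() (group 0) returns the whole match and Matcher.group(1) returns the first capture group. The minimal sketch below makes that distinction concrete for the non-nested case only. The class and method names (GroupDemo, firstMatch) and the simplified pattern \[([^\[\]]*)\] are illustrative assumptions. They do not come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Illustrative pattern, NON-NESTED case only:
    // '[' + a run of non-bracket characters (group 1) + ']'
    private static final Pattern SIMPLE = Pattern.compile("\\[([^\\[\\]]*)\\]");

    // Hypothetical helper: returns {group 0, group 1} for the first bracketed span, or null if there is none
    public static String[] firstMatch(String s) {
        Matcher m = SIMPLE.matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(), m.group(1)};
    }

    public static void main(String[] args) {
        String[] r = firstMatch("Jim ate a [sandwich]");
        System.out.println("match:   " + r[0]); // match:   [sandwich]
        System.out.println("group 1: " + r[1]); // group 1: sandwich
    }
}
```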

I am currently using \[(.*?]*)\], a slight modification of the expression found in this answer, because I also want nested square brackets to be part of capture group 1.

The problem with that expression is that, although it works and matches the following:

  • Jim ate a [sandwich] matches [sandwich] with sandwich as group 1
  • Jim ate a [sandwich with [pickles and onions]] matches [sandwich with [pickles and onions]] with sandwich with [pickles and onions] as group 1
  • [[[[] matches [[[[] with [[[ as group 1
  • []]]] matches []]]] with ]]] as group 1

However, if I want to match the following, it does not work as expected:

  • Jim ate a [sandwich with [pickles] and [onions]] matches both:
    • [sandwich with [pickles] with sandwich with [pickles as group 1
    • [onions]] with onions] as group 1
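The failure can be reproduced with a short sketch. The lazy .*? stops as early as the rest of the pattern allows, so each match ends at the first closing bracket that can satisfy the final \], and not at the bracket that balances the opening one. Any extra closing brackets just before that point are absorbed by ]* into group 1. Only the pattern comes from the question. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyBracketDemo {
    // The question's pattern \[(.*?]*)\]; Java treats an unescaped ']' outside a character class as a literal
    private static final Pattern QUESTION_PATTERN = Pattern.compile("\\[(.*?]*)\\]");

    // Hypothetical helper: collects the given group from every non-overlapping match, left to right
    public static List<String> captures(String s, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = QUESTION_PATTERN.matcher(s);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Jim ate a [sandwich with [pickles] and [onions]]";
        System.out.println(captures(s, 0)); // [[sandwich with [pickles], [onions]]]
        System.out.println(captures(s, 1)); // [sandwich with [pickles, onions]]
    }
}
```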

What expression should I use so that it matches [sandwich with [pickles] and [onions]] with sandwich with [pickles] and [onions] as group 1?
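java.util.regex supports neither recursion nor .NET-style balancing groups, so no single Java pattern matches brackets nested to arbitrary depth. If the depth is known to be bounded, the nesting can be unrolled by hand. The sketch below assumes at most one inner level: each element between the outer brackets is either a non-bracket character or a bracketed run that contains no further brackets. This is a substitute technique, not the question's pattern, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OneLevelNesting {
    // Assumption: at most ONE nested level. Inside the outer brackets, each element is
    // either a non-bracket char or a [..] run with no brackets inside. Group 1 = the contents.
    public static final Pattern ONE_LEVEL =
            Pattern.compile("\\[((?:[^\\[\\]]|\\[[^\\[\\]]*\\])*)\\]");

    // Hypothetical helper: collects the given group from every non-overlapping match
    public static List<String> captures(String s, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = ONE_LEVEL.matcher(s);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Jim ate a [sandwich with [pickles] and [onions]]";
        System.out.println(captures(s, 0)); // [[sandwich with [pickles] and [onions]]]
        System.out.println(captures(s, 1)); // [sandwich with [pickles] and [onions]]
    }
}
```

Deeper nesting, such as [a [b [c]]], is not captured as a single match. Each extra level needs another layer of unrolling, so for arbitrary depth a depth counter is the more general option.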

Edit:

As it seems impossible to achieve this in Java using regex, is there an alternative solution?

Edit 2:

I also want to be able to split the string on each match found, so an alternative to regular expressions would be harder to use, since String.split(regex) is so convenient. Here's an example:

  • Jim ate a [sandwich] with [pickles] and [dried [onions]] matches all:
    • [sandwich] with sandwich as group 1
    • [pickles] with pickles as group 1
    • [dried [onions]] with dried [onions] as group 1

The split string should look like this:

Jim ate a with and
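If spoilers nest at most one level deep, String.split can still be used directly with a non-recursive pattern. Each element inside the outer brackets is either a non-bracket character or a bracketed run with no further brackets. A leading and trailing \s* absorbs the whitespace around each spoiler. The sketch below rests on that one-level assumption, and the class, method and constant names are hypothetical.

```java
import java.util.Arrays;

public class SplitOnSpoilers {
    // Hypothetical pattern (assumes at most ONE nested level): optional whitespace, '[',
    // non-bracket chars or one-level [..] runs, ']', optional whitespace
    private static final String SPOILER_REGEX =
            "\\s*\\[(?:[^\\[\\]]|\\[[^\\[\\]]*\\])*\\]\\s*";

    // Hypothetical helper: splits the text around every top-level spoiler
    public static String[] splitAroundSpoilers(String s) {
        // String.split drops trailing empty strings, so a spoiler at the very end leaves no "" piece
        return s.split(SPOILER_REGEX);
    }

    public static void main(String[] args) {
        String s = "Jim ate a [sandwich] with [pickles] and [dried [onions]]";
        System.out.println(Arrays.toString(splitAroundSpoilers(s))); // [Jim ate a, with, and]
    }
}
```

String.split compiles the pattern on every call. If it runs often, precompile it once with Pattern.compile and call Pattern.split instead.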

Recommended answer

More direct solution

This solution omits empty and whitespace-only substrings.

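    import java.util.ArrayList;
    import java.util.List;

    // Returns the trimmed, non-blank text found between top-level balanced bracket groups
    public static List<String> getStrsBetweenBalancedSubstrings(String s, Character markStart, Character markEnd) {
        List<String> subTreeList = new ArrayList<String>();
        int level = 0;            // current nesting depth
        int lastCloseBracket = 0; // index just past the most recent matched closing marker
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                // Opening a top-level group: keep the text since the last close if it is not blank
                if (level == 1 && i != 0 && i != lastCloseBracket && !s.substring(lastCloseBracket, i).trim().isEmpty()) {
                    subTreeList.add(s.substring(lastCloseBracket, i).trim());
                }
            } else if (c == markEnd) {
                if (level > 0) { // a closing marker at depth 0 is treated as plain text
                    level--;
                    lastCloseBracket = i + 1;
                }
            }
        }
        // Keep any non-blank text after the last closing marker
        if (lastCloseBracket != s.length() && !s.substring(lastCloseBracket).trim().isEmpty()) {
            subTreeList.add(s.substring(lastCloseBracket).trim());
        }
        return subTreeList;
    }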

Then use it as follows:

    String input = "Jim ate a [sandwich][ooh] with [pickles] and [dried [onions]] and ] [an[other] match] and more here";
    List<String> betweenBalanced = getStrsBetweenBalancedSubstrings(input, '[', ']');
    System.out.println("Result: " + betweenBalanced);
    // => Result: [Jim ate a, with, and, and ], and more here]

Original answer (more complex; shows a way to extract nested bracketed substrings)

You can also extract all substrings inside balanced brackets and then split the string on them:

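    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.regex.Pattern;

    String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]] and ] [an[other] match]";
    List<String> balanced = getBalancedSubstrings(input, '[', ']', true);
    System.out.println("Balanced ones: " + balanced);
    // Build an alternation of the balanced substrings: Pattern.quote makes the brackets literal,
    // and the surrounding \s* also removes the whitespace next to each match
    List<String> rx_split = new ArrayList<String>();
    for (String item : balanced) {
        rx_split.add("\\s*" + Pattern.quote(item) + "\\s*");
    }
    String rx = String.join("|", rx_split);
    System.out.println("In-betweens: " + Arrays.toString(input.split(rx)));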

And this function will find all []-balanced substrings:

    // Returns every top-level balanced substring, with or without the enclosing markers
    public static List<String> getBalancedSubstrings(String s, Character markStart, Character markEnd, Boolean includeMarkers) {
        List<String> subTreeList = new ArrayList<String>();
        int level = 0;
        int lastOpenBracket = -1; // start index of the current top-level group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                if (level == 1) {
                    lastOpenBracket = (includeMarkers ? i : i + 1);
                }
            } else if (c == markEnd) {
                if (level == 1) { // this marker closes a top-level group
                    subTreeList.add(s.substring(lastOpenBracket, (includeMarkers ? i + 1 : i)));
                }
                if (level > 0) level--; // stray closing markers at depth 0 are ignored
            }
        }
        return subTreeList;
    }

See the IDEONE demo.

Output:

    Balanced ones: [[sandwich], [pickles], [dried [onions]], [an[other] match]]
    In-betweens: [Jim ate a, with, and, and ]]
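The question's underlying goal was to replace spoilers with a spoiler character. The depth counter that getBalancedSubstrings uses can do that in a single pass. This is a sketch under assumptions the question does not state. The outermost brackets are kept, and every character between them, nested brackets included, is replaced by the mask character. Unbalanced markers are left alone. SpoilerMasker and maskSpoilers are hypothetical names.

```java
public class SpoilerMasker {
    // Hypothetical helper (assumed behavior): keeps the outermost markers and replaces every
    // character between them, nested markers included, with `mask`. Only groups that actually
    // close are masked; a stray close marker at depth 0 is left untouched.
    public static String maskSpoilers(String s, char open, char close, char mask) {
        StringBuilder sb = new StringBuilder(s);
        int level = 0;
        int start = -1; // index of the outermost open marker currently being tracked
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == open) {
                level++;
                if (level == 1) {
                    start = i;
                }
            } else if (c == close && level > 0) {
                level--;
                if (level == 0) {
                    // The top-level group just closed: mask everything strictly between its markers
                    for (int j = start + 1; j < i; j++) {
                        sb.setCharAt(j, mask);
                    }
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "Jim ate a [sandwich] with [dried [onions]] and ]";
        System.out.println(maskSpoilers(s, '[', ']', '*'));
        // Jim ate a [********] with [**************] and ]
    }
}
```

Masking in place keeps the string length unchanged. That matters if other code relies on character offsets.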

Credits: getBalancedSubstrings is based on peter.murray.rust's answer to the post How to split this "Tree-like" string in Java regex?

Published: 2023-11-29 02:31:13
Link: https://www.elefans.com/category/jswz/34/1644947.html