Regex to match a single character or an escaped character

Updated: 2024-06-14 17:02:18

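I am trying to write my own Format code for time. This is a class project, but the Format part is an addition of my own so that I can work more with C# Regex. What I am trying to do is match certain characters:

W w : w = weeks; W = weeks preceded by a leading zero if smaller than 10
D d : d = days; D = days preceded by a leading zero if smaller than 10
G g : g = military hours; G = hours preceded by a leading zero if smaller than 10
H h : h = civilian hours; H = hours preceded by a leading zero...
m : m = minutes
s : s = seconds

The regex I have so far is this:

(w|W)(?=\b)|(d|D)(?=\b)|(h|H|g|G)(?=\b)|(m)(?=\b)|(s)(?=\b)

(w|W)    //match upper or lower W
(?=\b)   //positive lookahead: only match if not a part of a word boundary

With s, it matches every s in the string, so I'm led to believe my regex is wrong. My problem is that I'm not sure how to do lookaheads and lookbehinds correctly. I basically only want the cases of the characters I've supplied, and only if they are by themselves OR escaped; see the examples below.

Format("w Weeks, D days, h:m:s");    //returns 7 Weeks, 04 days, 10:01:05
Format("[w] weeks [d] days H:m:s");  //returns [7] weeks [4] days 10:01:05
Format("w \Weeks D \days, h:m:s");   //returns 7 07eeks 04 4ays, 10:01:05

As you can see, in the last format the escaped W and d are still replaced, which is what I want. Again, I'm not sure how to write the lookaheads and lookbehinds correctly.

I am using regex101 to test on: https://regex101.com/r/sL9cI2/1. You can see it there, along with what is going on. Any suggestions, please?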


Best answer


One thing about word boundaries is that they match an empty string. \b matches a position, not a character: a position that has a word character on one side and no word character on the other. For example, in "This is an example" there are 8 positions matching \b: at the start and at the end of each of the four words.
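To see the zero-width behaviour in C#, here is a minimal sketch (assuming .NET 5 or later, so top-level statements are available) that lists the index of every \b match in the same sentence. Each match is an empty string, so only its position carries information.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \b is zero-width: every match is an empty string, identified only by its index.
string text = "This is an example";
int[] positions = Regex.Matches(text, @"\b")
    .Cast<Match>()
    .Select(m => m.Index)
    .ToArray();

Console.WriteLine(string.Join(", ", positions)); // 0, 4, 5, 7, 8, 10, 11, 18
Console.WriteLine(positions.Length);             // 8
```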

To match a whole word, the regex should check for a word boundary on each side: \bword\b (notice that no lookaheads are needed here).
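Applied to the question's first format string, the difference looks roughly like this (a sketch under the same .NET assumptions). The question's seconds alternative, written here as s(?=\b) without its capture group, only checks the right-hand side, so the s at the end of "Weeks" and "days" qualifies as well; \bs\b also requires a boundary on the left.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string format = "w Weeks, D days, h:m:s";

// Lookahead only: any "s" followed by a boundary, including the last letter of a word.
int[] lookahead = Regex.Matches(format, @"s(?=\b)")
    .Cast<Match>().Select(m => m.Index).ToArray();

// Boundary on both sides: only an "s" that stands on its own.
int[] bothSides = Regex.Matches(format, @"\bs\b")
    .Cast<Match>().Select(m => m.Index).ToArray();

Console.WriteLine(string.Join(", ", lookahead)); // 6, 14, 21
Console.WriteLine(string.Join(", ", bothSides)); // 21
```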

I basically only want the cases of characters I've supplied and only if they are by themselves OR escaped

Then you have 2 options to match:

\bw\b   the letter "w" as a whole word.
\\w     a backslash (you need to escape backslashes in a regex) followed by the letter w.

Regex:

(\bw\b|\\w)
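A quick check of the two options in C# (a sketch; the input string is made up for illustration). Using a verbatim string @"..." keeps the regex's \\ intact, so \\w in the pattern means one literal backslash followed by w.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Either a standalone "w" or a backslash followed by "w".
var pattern = new Regex(@"(\bw\b|\\w)");

// "week" at the end is not matched: its "w" has no boundary on the right.
string input = @"w \weeks, week";
string[] found = pattern.Matches(input)
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", found)); // w | \w
```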

Moreover, looking at your attempts, I think you can use a character class to simplify the pattern.

Regex:

(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])

regex101 Demo
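To connect this back to the question's Format, here is one possible way to plug the pattern into Regex.Replace with a MatchEvaluator. FormatTime and its parameters are hypothetical names, not from the question; the values (7 weeks, 4 days, 10:01:05) are picked to reproduce the question's sample outputs, and zero-padding m and s is an assumption drawn from those outputs rather than from the stated spec.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical values chosen to reproduce the outputs shown in the question.
Console.WriteLine(FormatTime("w Weeks, D days, h:m:s", 7, 4, 10, 1, 5));   // 7 Weeks, 04 days, 10:01:05
Console.WriteLine(FormatTime("[w] weeks [d] days H:m:s", 7, 4, 10, 1, 5)); // [7] weeks [4] days 10:01:05
Console.WriteLine(FormatTime(@"w \Weeks D \days, h:m:s", 7, 4, 10, 1, 5)); // 7 07eeks 04 4ays, 10:01:05

// Hypothetical helper: replaces every code matched by the answer's pattern
// (standalone or backslash-escaped) and copies all other text through unchanged.
static string FormatTime(string format, int weeks, int days, int hours24, int minutes, int seconds)
{
    const string pattern = @"(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])";
    int civilian = hours24 % 12 == 0 ? 12 : hours24 % 12;

    return Regex.Replace(format, pattern, m =>
    {
        // The code letter is the last character of both "w" and "\W".
        char code = m.Value[m.Value.Length - 1];
        return code switch
        {
            'w' => weeks.ToString(),
            'W' => weeks.ToString("00"),
            'd' => days.ToString(),
            'D' => days.ToString("00"),
            'g' => hours24.ToString(),
            'G' => hours24.ToString("00"),
            'h' => civilian.ToString(),
            'H' => civilian.ToString("00"),
            // Assumption: minutes and seconds are zero-padded, as in the question's outputs.
            'm' => minutes.ToString("00"),
            's' => seconds.ToString("00"),
            _ => m.Value,
        };
    });
}
```

Because the escaped form is consumed together with its backslash, the backslash disappears from the output, which matches the question's third example.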

Note that this regex does not handle consecutive backslashes, which means we can't reliably put a literal backslash in front of a format code.

Take \\week as an example: it is interpreted as \ followed by the week format code (\w) and then the literal string eek, instead of a literal \ followed by the literal string week.
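The problem case can be reproduced with a single Regex.Match call (a sketch; the format is written as a C# verbatim literal, so it contains exactly two backslashes before week).

```csharp
using System;
using System.Text.RegularExpressions;

// Two backslashes, then "week".
string format = @"\\week";
Match m = Regex.Match(format, @"(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])");

// The second backslash pairs with "w", so "\w" is reported as a week code at index 1,
// although the intent was an escaped backslash followed by the plain text "week".
Console.WriteLine($"{m.Value} at index {m.Index}"); // \w at index 1
```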

Use the following regex if you want to support such a use case:

\G(?:[^\\]|\\.)*?(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])

regex101 Demo
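A sketch of how the \G version behaves with Regex.Matches in .NET: each match must start where the previous one ended, so any literal text and escaped pairs before a code become part of the match, and the code itself has to be read from group 1. Codes is a hypothetical helper name used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var tokens = new Regex(@"\G(?:[^\\]|\\.)*?(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])");

// Hypothetical helper: collect just the format codes (group 1) from every match.
static string[] Codes(Regex r, string s) =>
    r.Matches(s).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

Console.WriteLine(string.Join(" ", Codes(tokens, "w Weeks, D days, h:m:s"))); // w D h m s
Console.WriteLine(Codes(tokens, @"\\week").Length);                            // 0
Console.WriteLine(string.Join(" ", Codes(tokens, @"\\w")));                    // w
```

In the last two inputs, \\ is consumed as an escaped backslash, so "week" stays literal text and a lone w after \\ is still recognised as a code.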


This article was published on 2023-04-21 18:39:00.

Article link: https://www.elefans.com/category/dzcp/27f2e3633cf4967eb5d6bfc023b97971.html