Regex to match a single character or an escaped character

Updated: 2024-06-14 17:02:18

I am trying to write my own Format code for time. This is a class project, but the Format part is something I added for myself to get more practice with C# Regex. What I am trying to do is match certain characters.

W w : w = weeks. W weeks preceded by a leading zero if smaller than 10
D d : d = days. D days preceded by a leading zero if smaller than 10
G g : g = Military Hours: G hours preceded by a leading zero if smaller than 10
H h : h = Civilian Hours: H hours preceded by a leading zero...
m : m = minutes
s : s = seconds

The regex I have so far is this:

(w|W)(?=\b)|(d|D)(?=\b)|(h|H|g|G)(?=\b)|(m)(?=\b)|(s)(?=\b)

(w|W)   //match upper or lower W
(?=\b)  //positive lookahead only match if not a part of a word boundary

With s, it matches every s in the string, so I'm led to believe my regex is wrong. My problem is that I'm not sure how to use lookaheads and lookbehinds correctly. I basically only want the cases of characters I've supplied, and only if they are by themselves OR escaped; see the examples below.

Format("w Weeks, D days, h:m:s"); //returns 7 Weeks, 04 days, 10:01:05
Format("[w] weeks [d] days H:m:s"); //returns [7] weeks [4] days 10:01:05
Format("w \Weeks D \days, h:m:s"); //returns 7 07eeks 04 4ays, 10:01:05

As you can see, in the last format the escaped w's and d's still get replaced, which is what I want. Again, I'm not sure how to write the lookaheads and lookbehinds correctly.

I am testing on regex101 here: https://regex101.com/r/sL9cI2/1 (you can see the regex and what is going on). Any suggestions, please?

Best answer

One thing about word boundaries is that they match an empty string. \b matches a position, not a character: a position with a word character on one side and no word character on the other. E.g., in "This is an example", there are 8 positions matching \b:

|This| |is| |an| |example|
(| denotes a word boundary)

To match a whole word, the regex should check for a word boundary on each side: \bword\b (notice there's no need for lookaheads here).
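
As a quick check of the points above, here is a minimal C# sketch (top-level statements, so C# 9 or later is assumed) that lists the \b positions in the sample sentence and compares the s(?=\b) fragment from the question with \bs\b. The variable names are only illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \b matches positions (empty strings), so every match here has length 0.
int[] boundaries = Regex.Matches("This is an example", @"\b")
    .Cast<Match>()
    .Select(m => m.Index)
    .ToArray();
Console.WriteLine(string.Join(", ", boundaries)); // 0, 4, 5, 7, 8, 10, 11, 18

string input = "w Weeks, D days, h:m:s";

// A lookahead for \b only checks the right-hand side, so the final "s"
// of "Weeks" and of "days" matches as well.
int lookaheadOnly = Regex.Matches(input, @"s(?=\b)").Count;

// Requiring a boundary on both sides leaves only the standalone "s".
int bothSides = Regex.Matches(input, @"\bs\b").Count;

Console.WriteLine($"{lookaheadOnly} vs {bothSides}"); // 3 vs 1
```

This is why the original pattern picked up every trailing s: nothing in it constrains the character before the letter.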

You wrote: "I basically only want the cases of characters I've supplied and only if they are by themselves OR escaped"

Then you have 2 options to match:

\bw\b  The letter "w" as a word.
\\w    A backslash (you need to escape backslashes in regex) followed by the letter w.

Regex:

(\bw\b|\\w)

Moreover, looking at your attempts, I think you can use a character class to simplify the pattern.


Regex:

(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])
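
To see how this pattern could drive the replacement, here is a rough C# sketch (top-level statements, C# 9 or later assumed). The Format helper, the sample values, and the padding rules for m and s are assumptions inferred from the outputs quoted in the question, not the asker's actual code; the 12-hour versus 24-hour distinction between h and g is left out.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample values, chosen to mirror the outputs in the question.
int weeks = 7, days = 4, hours = 10, minutes = 1, seconds = 5;

// Hypothetical helper: the question does not show its own Format method.
string Format(string format)
{
    return Regex.Replace(format, @"\b[WwDdGgHhms]\b|\\[WwDdGgHhms]", match =>
    {
        // The code letter is the last character, with or without a leading backslash.
        char code = match.Value[match.Value.Length - 1];
        switch (code)
        {
            case 'w': return weeks.ToString();
            case 'W': return weeks.ToString("00");
            case 'd': return days.ToString();
            case 'D': return days.ToString("00");
            // Assumption: g/h and G/H are treated alike; 12/24-hour handling is omitted.
            case 'g': case 'h': return hours.ToString();
            case 'G': case 'H': return hours.ToString("00");
            // Assumption: minutes and seconds are zero-padded, as in the sample outputs.
            case 'm': return minutes.ToString("00");
            default: return seconds.ToString("00"); // 's'
        }
    });
}

Console.WriteLine(Format("w Weeks, D days, h:m:s"));     // 7 Weeks, 04 days, 10:01:05
Console.WriteLine(Format("[w] weeks [d] days H:m:s"));   // [7] weeks [4] days 10:01:05
Console.WriteLine(Format(@"w \Weeks D \days, h:m:s"));   // 7 07eeks 04 4ays, 10:01:05
```

The last call uses a verbatim string (@"...") so that the backslashes reach the regex unchanged.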

Do note that this regex does not account for consecutive backslashes, which means we can't reliably put a literal backslash in front of a format code.

Using \\week as an example, it is interpreted as \ followed by the week format code and then the literal string eek, instead of a literal \ followed by the literal string week.

Use the following regex if you want to support such use cases (the format code is captured in group 1):

\G(?:[^\\]|\\.)*?(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])
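
The difference between the two patterns can be checked with a small C# sketch (top-level statements, C# 9 or later assumed; the variable names are only illustrative). It runs both against the \\week case described above, followed by a standalone w.

```csharp
using System;
using System.Text.RegularExpressions;

string simple = @"\b[WwDdGgHhms]\b|\\[WwDdGgHhms]";
string anchored = @"\G(?:[^\\]|\\.)*?(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])";

// Verbatim string: two backslashes, then "week w".
string input = @"\\week w";

// The simple pattern treats the second backslash plus "w" as an escaped code.
MatchCollection simpleMatches = Regex.Matches(input, simple);
Console.WriteLine(simpleMatches.Count);      // 2
Console.WriteLine(simpleMatches[0].Value);   // \w

// The anchored pattern consumes "\\" as a pair, so only the standalone "w"
// at the end is reported; the format code itself sits in group 1.
MatchCollection anchoredMatches = Regex.Matches(input, anchored);
Console.WriteLine(anchoredMatches.Count);               // 1
Console.WriteLine(anchoredMatches[0].Groups[1].Index);  // 7
```

Keep in mind that each match of the anchored pattern also includes the skipped text before group 1, so a replacement built on it would need to preserve that prefix.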

Published: 2023-04-21 18:39:00
Article link: https://www.elefans.com/category/dzcp/27f2e3633cf4967eb5d6bfc023b97971.html