I am trying to write my own Format code for time. This is a class project, but the Format feature is something I added myself to get more practice with C# Regex. What I am trying to do is match certain characters.
W w : w = weeks. W weeks preceded by a leading zero if smaller than 10
D d : d = days. D days preceded by a leading zero if smaller than 10
G g : g = Military Hours: G hours preceded by a leading zero if smaller than 10
H h : h = Civilian Hours: H hours preceded by a leading zero...
m : m = minutes
s : s = seconds

The regex I have so far is this:
    (w|W)(?=\b)|(d|D)(?=\b)|(h|H|g|G)(?=\b)|(m)(?=\b)|(s)(?=\b)

    (w|W)   // match upper- or lowercase W
    (?=\b)  // positive lookahead: only match if not part of a word

With the s, it matches every s in the string, so I'm led to believe my regex is wrong. My problem is that I'm not sure how to write lookaheads and lookbehinds correctly. I basically only want to match the characters I've supplied, and only when they stand by themselves OR are escaped; see the examples below.
    Format("w Weeks, D days, h:m:s");     // returns 7 Weeks, 04 days, 10:01:05
    Format("[w] weeks [d] days H:m:s");   // returns [7] weeks [4] days 10:01:05
    Format(@"w \Weeks D \days, h:m:s");   // returns 7 07eeks 04 4ays, 10:01:05

As you can see, the last format, with the escaped w's and d's, still replaces them, which is what I want. Again, I'm not sure how to write the lookaheads and lookbehinds correctly.
I am testing on regex101 here: https://regex101.com/r/sL9cI2/1. You can see the pattern and what is going on there. Any suggestions, please.
Best answer
One thing about word boundaries is that they match an empty string. \b matches a position, not a character: a position that has a word character on one side and no word character on the other. E.g., in "This is an example", there are 8 positions matching \b:
    |This| |is| |an| |example|

    (| denotes a word boundary)

To match a word, the regex should check that it has a word boundary on each side: \bword\b (notice there's no need for lookaheads here).
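As a quick check of the count above, here is a minimal C# sketch that lists every position where \b matches. The class and method names (WordBoundaryDemo, BoundaryPositions) are made up for illustration and are not part of the question's code.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class WordBoundaryDemo
{
    // Returns the index of every position in the input where \b matches.
    // Each match is an empty string, so only Match.Index is of interest.
    public static List<int> BoundaryPositions(string input)
    {
        var positions = new List<int>();
        foreach (Match match in Regex.Matches(input, @"\b"))
        {
            positions.Add(match.Index);
        }
        return positions;
    }
}
```

Calling WordBoundaryDemo.BoundaryPositions("This is an example") should give the 8 positions 0, 4, 5, 7, 8, 10, 11 and 18, i.e. the start and end of each of the four words.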
> I basically only want the cases of characters I've supplied and only if they are by themselves OR escaped
Then you have 2 options to match:
    \bw\b   the letter "w" as a word.
    \\w     a backslash (you need to escape backslashes in regex) followed by the letter w.

Regex:

    (\bw\b|\\w)

Moreover, looking at your attempts, I think you can use a character class to simplify the pattern.
Regex:
    (\b[WwDdGgHhms]\b|\\[WwDdGgHhms])
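To show how this pattern could be plugged into the Format function from the question, here is a rough C# sketch using Regex.Replace with a match evaluator. Several things here are assumptions rather than anything the question spells out: the class name TimeFormatSketch, the method signature taking plain integers, minutes and seconds always being padded to two digits (inferred from the sample outputs), "military" meaning a 24-hour clock and "civilian" a 12-hour clock, and the lowercase codes being unpadded.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch; names and signature are assumptions for illustration.
public static class TimeFormatSketch
{
    // A standalone format code (word boundary on both sides)
    // or a backslash-escaped format code.
    private static readonly Regex CodePattern =
        new Regex(@"\b[WwDdGgHhms]\b|\\[WwDdGgHhms]");

    // hours is assumed to be on a 24-hour clock (0-23).
    public static string Format(string format, int weeks, int days,
                                int hours, int minutes, int seconds)
    {
        // Assumption: "civilian" hours are 12-hour clock values (1-12).
        int civilian = hours % 12 == 0 ? 12 : hours % 12;

        return CodePattern.Replace(format, match =>
        {
            // For both alternatives the code letter is the last character;
            // an escaped match also swallows its leading backslash.
            char code = match.Value[match.Value.Length - 1];
            switch (code)
            {
                case 'W': return weeks.ToString("D2");    // leading zero
                case 'w': return weeks.ToString();
                case 'D': return days.ToString("D2");
                case 'd': return days.ToString();
                case 'G': return hours.ToString("D2");    // 24-hour
                case 'g': return hours.ToString();
                case 'H': return civilian.ToString("D2"); // 12-hour
                case 'h': return civilian.ToString();
                case 'm': return minutes.ToString("D2");  // assumed always padded
                case 's': return seconds.ToString("D2");  // assumed always padded
                default:  return match.Value;             // not reachable with this pattern
            }
        });
    }
}
```

With weeks = 7, days = 4, hours = 10, minutes = 1 and seconds = 5, this sketch should reproduce the three outputs listed in the question, including 7 07eeks 04 4ays, 10:01:05 for the escaped format (passed as a verbatim string so that the backslashes survive).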
Note that this regex does not account for consecutive backslashes, which means we can't reliably put a literal backslash in front of a format code.
Using \\week as an example, it is interpreted as \ followed by the escaped week format code (\w) and then the literal string eek, instead of a literal \ followed by the literal string week.
Use the following regex if you want to support such use cases:
    \G(?:[^\\]|\\.)*?(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])
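To compare the two patterns side by side, here is a small C# sketch that only lists which format codes each pattern picks out of a format string. The class and method names (EscapeAwareSketch, FindCodesSimple, FindCodesEscapeAware) are hypothetical; the patterns themselves are the two given above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class EscapeAwareSketch
{
    // The simpler pattern: it can start matching in the middle of "\\".
    private static readonly Regex Simple =
        new Regex(@"(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])");

    // The \G pattern: each match continues where the previous one ended and
    // skips ordinary characters and backslash pairs before capturing a code.
    private static readonly Regex EscapeAware =
        new Regex(@"\G(?:[^\\]|\\.)*?(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])");

    public static List<string> FindCodesSimple(string format)
    {
        return FindCodes(Simple, format);
    }

    public static List<string> FindCodesEscapeAware(string format)
    {
        return FindCodes(EscapeAware, format);
    }

    // Collects capture group 1 (the format code) of every match.
    private static List<string> FindCodes(Regex pattern, string format)
    {
        var codes = new List<string>();
        foreach (Match match in pattern.Matches(format))
        {
            codes.Add(match.Groups[1].Value);
        }
        return codes;
    }
}
```

For the input \\week (two backslashes), the simple pattern should report the escaped code \w, while the \G pattern should report nothing, which is the behavior described above. One thing to keep in mind if the \G pattern is used with Regex.Replace: the skipped prefix is part of each match, so the replacement would have to put it back, for example by wrapping the prefix in its own capture group.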