(Note: The title doesn't seem too clear -- if someone can rephrase it, I'm all for it!)
Given this regex: (.*_e\.txt), which matches some filenames, I need to add some other single-character suffixes in addition to the e. Should I use a character class or an alternation for this? (Or does it really matter?)
That is, which of the following two seems "better", and why:
a) (.*_(e|f|x)\.txt), or
b) (.*_[efx]\.txt)
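A quick way to convince yourself that a) and b) accept exactly the same names is to run both against a few filenames. This is a small sketch using Python's re module; the filenames are hypothetical, chosen only for illustration. It keeps the underscore from the original pattern and escapes the dot as \. because an unescaped . matches any character, which the last line demonstrates.

```python
import re

# Hypothetical filenames, chosen only for illustration.
names = ["data_e.txt", "data_f.txt", "data_x.txt", "data_g.txt", "data_e_txt", "notes.txt"]

# a) alternation and b) character class, with the dot before "txt" escaped
# so that it only matches a literal ".".
alternation = re.compile(r"(.*_(e|f|x)\.txt)")
char_class = re.compile(r"(.*_[efx]\.txt)")

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [name for name in candidates if pattern.fullmatch(name)]

print(matching(alternation, names))  # ['data_e.txt', 'data_f.txt', 'data_x.txt']
print(matching(char_class, names))   # ['data_e.txt', 'data_f.txt', 'data_x.txt']

# Without the backslash, "." also matches the "_" in "data_e_txt".
loose = re.compile(r".*_[efx].txt")
print(bool(loose.fullmatch("data_e_txt")))  # True
```

One practical difference: (e|f|x) creates an extra capturing group, so group numbering shifts; a non-capturing (?:e|f|x) avoids that if you do stay with alternation.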
Answer:
Use [efx] - that's exactly what character classes are designed for: to match one of the included characters. Therefore it's also the most readable and shortest solution.
I don't know if it's faster, but I would be very much surprised if it wasn't. It definitely won't be slower.
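If you want to check the speed question on your own engine rather than guess, a rough timing sketch with Python's timeit is shown below. The generated filenames are hypothetical, and the alternation uses a non-capturing group so both patterns have the same groups. No result is claimed here: as far as I know, CPython's re rewrites an alternation of single literal characters into a character set while parsing the pattern, so on CPython the two timings may come out nearly identical. Other engines may not do this.

```python
import re
import timeit

# Hypothetical input: 11,000 generated filenames, most of which do not match.
names = [f"file{i}_{c}.txt" for i in range(1000) for c in "abcdefghxyz"]

alternation = re.compile(r".*_(?:e|f|x)\.txt")
char_class = re.compile(r".*_[efx]\.txt")

def count_matches(pattern):
    """Count how many names the pattern matches in full."""
    return sum(1 for name in names if pattern.fullmatch(name))

# Both patterns must agree before comparing their speed means anything.
assert count_matches(alternation) == count_matches(char_class)

for label, pattern in [("alternation", alternation), ("char class", char_class)]:
    seconds = timeit.timeit(lambda: count_matches(pattern), number=20)
    print(f"{label}: {seconds:.3f}s")
```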
My reasoning (without ever having written a regex engine, so this is pure conjecture):
The regex token [abc] will be applied in a single step of the regex engine: "Is the next character one of a, b, or c?"
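The single step described above can be modeled as one membership test. This is a toy sketch, not how any particular engine is implemented; match_class is a hypothetical name, and a real engine typically stores the set more compactly (for example as a bitmap or a list of ranges).

```python
from typing import FrozenSet, Optional

def match_class(text: str, pos: int, allowed: FrozenSet[str]) -> Optional[int]:
    """Toy model of a character-class token such as [abc].

    Returns the position after the consumed character, or None on failure.
    """
    # A single step: is the next character one of the allowed characters?
    if pos < len(text) and text[pos] in allowed:
        return pos + 1
    return None

abc = frozenset("abc")
print(match_class("b.txt", 0, abc))  # 1
print(match_class("d.txt", 0, abc))  # None
```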
(a|b|c), however, tells the regex engine to:
- remember the current position in the string for backtracking, if necessary
- check if it's possible to match a. If so, success. If not:
- check if it's possible to match b. If so, success. If not:
- check if it's possible to match c. If so, success. If not:
- give up.
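The steps listed above can be sketched as a toy matcher that saves the start position and tries each branch from it in order. This is a simplified model under stated assumptions, not a real engine: literal and alternation are hypothetical helper names, and the sketch stops at the first successful branch, whereas a real backtracking engine would return to the alternation and try the remaining branches if a later part of the pattern failed.

```python
from typing import Callable, List, Optional

# A "matcher" takes (text, pos) and returns the position after its match, or None.
Matcher = Callable[[str, int], Optional[int]]

def literal(ch: str) -> Matcher:
    """Hypothetical helper: match exactly one literal character."""
    def m(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and text[pos] == ch else None
    return m

def alternation(branches: List[Matcher]) -> Matcher:
    """Toy model of (a|b|c): try each branch in order from the same start."""
    def m(text: str, pos: int) -> Optional[int]:
        saved = pos  # remember the position so every branch restarts here
        for branch in branches:
            end = branch(text, saved)
            if end is not None:
                return end  # the first successful branch wins
        return None  # every branch failed: give up
    return m

abc = alternation([literal("a"), literal("b"), literal("c")])
print(abc("c.txt", 0))  # 1, found only after "a" and "b" were tried and failed
print(abc("d.txt", 0))  # None
```

Compared with the single membership test for [abc], matching "c" here costs three separate attempts.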