In http://llvm.org/svn/llvm-project/libcxx/trunk/test/re/re.alg/re.alg.match/ecma.pass.cpp, the following test exists:
    std::cmatch m;
    const char s[] = "tournament";
    assert(!std::regex_match(s, m, std::regex("tour|to|tournament")));
    assert(m.size() == 0);

Why should this match fail?
With VC++ 2012 and Boost, the match succeeds. In JavaScript on Chrome and Firefox, "tournament".match(/^(?:tour|to|tournament)$/) also succeeds.
Only with libc++ does the match fail.
Best answer
I believe the test is correct. It is instructive to search for "tournament" in all of the libc++ tests under re.alg, and compare how the different engines treat the regex("tour|to|tournament"), and how regex_search differs from regex_match.
Let's start with regex_search:
awk, egrep, extended:
regex_search("tournament", m, regex("tour|to|tournament")) matches the entire input string: "tournament".
ECMAScript:
regex_search("tournament", m, regex("tour|to|tournament")) matches only part of the input string: "tour".
grep, basic:
regex_search("tournament", m, regex("tour|to|tournament")) doesn't match at all. The '|' character is not special.
awk, egrep and extended will match as much as they can with alternation. However, the ECMAScript alternation is "ordered". This is specified in ECMA-262. Once a branch of the alternation leads to a successful match, ECMAScript stops searching and never tries the later branches. The standard includes this example:
/a|ab/.exec("abc") returns the result "a" and not "ab".
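The three cases above can be reproduced with a short sketch. The helper first_match and the function demo_alternation are hypothetical names introduced here for illustration; the assertions encode the results described above, and a standard library that deviates from them would trip the asserts.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the text matched by regex_search under the
// given grammar, or an empty string when nothing matches.
std::string first_match(const std::string& input, const std::string& pattern,
                        std::regex::flag_type grammar) {
    std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(input, m, re) ? m.str(0) : std::string();
}

// Hypothetical demo of the three behaviours described in the answer.
void demo_alternation() {
    const std::string input = "tournament";
    const std::string pattern = "tour|to|tournament";

    // ECMAScript: ordered alternation, the first branch that matches wins.
    assert(first_match(input, pattern, std::regex::ECMAScript) == "tour");

    // extended (POSIX): the longest alternative wins.
    assert(first_match(input, pattern, std::regex::extended) == "tournament");

    // basic: '|' is an ordinary character, so the literal text
    // "tour|to|tournament" is searched for and not found.
    assert(first_match(input, pattern, std::regex::basic).empty());
}
```

Calling demo_alternation() from main is enough to run the comparison; only the grammar flag changes between the three calls.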
<plug>
This is also discussed in depth in Mastering Regular Expressions by Jeffrey E.F. Friedl. I couldn't have implemented <regex> without this book. And I will freely admit that there is still much more that I don't know about regular expressions than I do know.
At the end of the chapter on alternation the author states:
If you understood everything in this chapter the first time you read it, you probably didn't read it in the first place.
Believe it!
</plug>
Anyway, ECMAScript matches only "tour". The regex_match algorithm returns success only if the entire input string is matched. Since only the first 4 characters of the input string are matched, ECMAScript, unlike awk, egrep and extended, returns false with a zero-sized cmatch.
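One practical consequence of ordered alternation, sketched here as an illustration rather than as part of the libc++ test: if the longest alternative is listed first, its first branch already covers the whole input, so regex_match should succeed whether or not an implementation goes on to try later branches. The names matches_whole and demo_reordered are hypothetical.

```cpp
#include <cassert>
#include <regex>

// Hypothetical helper: true when the (default, ECMAScript) pattern matches
// the entire input string.
bool matches_whole(const char* input, const char* pattern) {
    std::cmatch m;
    return std::regex_match(input, m, std::regex(pattern));
}

// Hypothetical demo: cases where the first branch that can match at all
// already consumes the whole input.
void demo_reordered() {
    // Longest alternative first: the first branch matches all 10 characters.
    assert(matches_whole("tournament", "tournament|tour|to"));

    // With the original ordering, inputs equal to a single branch still match.
    assert(matches_whole("tour", "tour|to|tournament"));
    assert(matches_whole("to", "tour|to|tournament"));
}
```

The original ordering, "tour|to|tournament", against "tournament" is deliberately left out of the assertions: as the question notes, implementations disagree on that result.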