On libc++, why does regex_match("tournament", regex("tour|to|tournament")) fail?

Updated: 2024-10-26 20:21:51

In http://llvm.org/svn/llvm-project/libcxx/trunk/test/re/re.alg/re.alg.match/ecma.pass.cpp, the following test exists:

    std::cmatch m;
    const char s[] = "tournament";
    assert(!std::regex_match(s, m, std::regex("tour|to|tournament")));
    assert(m.size() == 0);

Why should this match fail?

With VC++ 2012 and Boost, the match succeeds. In JavaScript on Chrome and Firefox, "tournament".match(/^(?:tour|to|tournament)$/) also succeeds.

The match fails only on libc++.

Accepted answer

I believe the test is correct. It is instructive to search for "tournament" in all of the libc++ tests under re.alg, and compare how the different engines treat the regex("tour|to|tournament"), and how regex_search differs from regex_match.

Let's start with regex_search:

awk, egrep, extended:

regex_search("tournament", m, regex("tour|to|tournament"))

matches the entire input string: "tournament".

ECMAScript:

regex_search("tournament", m, regex("tour|to|tournament"))

matches only part of the input string: "tour".

grep, basic:

regex_search("tournament", m, regex("tour|to|tournament"))

does not match at all, because the '|' character is not special in these grammars.
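The three cases above can be reproduced with a short C++ sketch. The helper name first_match is hypothetical (it is not part of <regex> or of the libc++ tests); it returns the text that regex_search matched under a given grammar, or an empty string when nothing matched.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for illustration only.
// Returns the substring matched by regex_search under the given grammar,
// or an empty string if the search found no match.
std::string first_match(const std::string& input,
                        const std::string& pattern,
                        std::regex_constants::syntax_option_type grammar) {
    std::regex re(pattern, grammar);
    std::smatch m;
    if (std::regex_search(input, m, re)) {
        return m.str();  // text of the overall match (sub-match 0)
    }
    return std::string();
}
```

Calling first_match("tournament", "tour|to|tournament", grammar) with std::regex_constants::extended, ECMAScript and basic in turn should correspond to the three outcomes listed above.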

awk, egrep and extended will match as much as they can with alternation. However, ECMAScript alternation is "ordered". This is specified in ECMA-262. Once ECMAScript matches a branch in the alternation, it quits searching. The standard includes this example:

/a|ab/.exec("abc")

returns the result "a" and not "ab".
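The same example can be sketched with std::regex. The function name search_a_or_ab is made up for this illustration; it runs regex_search on "abc" with a caller-supplied pattern and grammar, which makes it easy to compare the ordered ECMAScript behavior with the "match as much as possible" behavior of extended.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for illustration only.
// Searches the fixed input "abc" and returns the matched text,
// or an empty string if there is no match.
std::string search_a_or_ab(const std::string& pattern,
                           std::regex_constants::syntax_option_type grammar) {
    const std::string input = "abc";
    std::regex re(pattern, grammar);
    std::smatch m;
    if (std::regex_search(input, m, re)) {
        return m.str();
    }
    return std::string();
}
```

With ECMAScript, the pattern "a|ab" yields "a" because the first branch already matches, while reordering the branches to "ab|a" yields "ab". With extended, "a|ab" is expected to yield "ab", the longer alternative.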

<plug>

This is also discussed in depth in Mastering Regular Expressions by Jeffrey E.F. Friedl. I couldn't have implemented <regex> without this book. And I will freely admit that there is still much more that I don't know about regular expressions than I do know.

At the end of the chapter on alternation the author states:

If you understood everything in this chapter the first time you read it, you probably didn't read it in the first place.

Believe it!

</plug>

Anyway, ECMAScript matches only "tour". The regex_match algorithm returns success only if the entire input string is matched. Since only the first 4 characters of the input string are matched, ECMAScript, unlike awk, egrep and extended, returns false with a zero-sized cmatch.
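The difference between regex_search and regex_match described here can be sketched as follows. The helpers whole_match and partial_match are hypothetical names introduced for this illustration. The checks use only cases that do not depend on how alternation is ordered; the ECMAScript alternation case from the question is deliberately left for the reader to run, since that is the result the question reports as differing between libraries.

```cpp
#include <regex>

// Hypothetical helpers for illustration only.

// True only if the pattern matches the entire input (regex_match).
bool whole_match(const char* input, const char* pattern,
                 std::regex_constants::syntax_option_type grammar) {
    std::cmatch m;
    return std::regex_match(input, m, std::regex(pattern, grammar));
}

// True if the pattern matches anywhere in the input (regex_search).
bool partial_match(const char* input, const char* pattern,
                   std::regex_constants::syntax_option_type grammar) {
    std::cmatch m;
    return std::regex_search(input, m, std::regex(pattern, grammar));
}
```

For example, the pattern "tour" is found by partial_match in "tournament" but rejected by whole_match, because it covers only the first 4 characters. Under extended, whole_match("tournament", "tour|to|tournament", ...) is expected to succeed, since that grammar picks the longest alternative.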
