Match a specific number of digits not preceded or followed by digits

Updated: 2024-10-25 08:27:22

I have a string:

string = u'11a2ee22b333c44d5e66e777e8888'

I want to find every chunk of exactly k consecutive digits (a run of digits not touching any other digit), where n <= k <= m.

Using a regular expression only, say with n=2 and m=3, I used (?:\D|^)(\d{2,3})(?:\D|$):

re.findall(r'(?:\D|^)(\d{2,3})(?:\D|$)', u'11a2ee22b333c44d5e66e777e8888')

Gives this output:

['11', '333', '66']

Desired output:

['11', '22', '333', '44', '66', '777']

I know there are alternate solutions like:

list(filter(lambda x: re.match(r'^\d{2,3}$', x), re.split(r'\D', u'11a2ee22b333c44d5e66e777e8888')))

which gives the desired output, but I want to know: what's wrong with the first approach?
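For comparison, the split-and-filter workaround can be written for Python 3, where `filter` returns a lazy iterator rather than a list. This is a minimal sketch, assuming Python 3.4+ (for `fullmatch`). The helper name `digit_runs_by_split` is hypothetical:

```python
import re

def digit_runs_by_split(text, n, m):
    """Split on every non-digit, then keep pieces whose length is in [n, m]."""
    # re.split leaves empty strings where two non-digits are adjacent;
    # fullmatch on \d{n,m} rejects those as well as runs that are too long.
    run = re.compile(r'\d{%d,%d}' % (n, m))
    return [piece for piece in re.split(r'\D', text) if run.fullmatch(piece)]

print(digit_runs_by_split('11a2ee22b333c44d5e66e777e8888', 2, 3))
# ['11', '22', '333', '44', '66', '777']
```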

It seems re.findall goes in sequence and skips the previous part when matched, so what can be done?

Best answer

Note: The result you show in your question is not what I'm getting:

>>> import re
>>> re.findall(u'(?:\D|^)(\d{2,3})(?:\D|$)',u'11a2ee22b333c44d5e66e777e8888')
[u'11', u'22', u'44', u'66']

It's still missing some of the matches you want, but not the same ones.

The problem is that even though non-capturing groups like (?:\D|^) and (?:\D|$) don't capture what they match, they still consume it.

This means that the match which yields '22' has actually consumed:

- e, with (?:\D|^): not captured (but still consumed)
- 22, with (\d{2,3}): captured
- b, with (?:\D|$): not captured (but still consumed)

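… so that b is no longer available to be matched by (?:\D|^) before 333. The same thing happens to 777: the e in front of it was already consumed by the match that yielded '66'.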
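One way to see the consumption is to print the whole match (`group(0)`) next to the captured group. The following is a small Python 3 sketch, and the variable names are illustrative only. The second pattern is a variant that was not in the original answer. It turns only the trailing delimiter into a lookahead (`(?=\D|$)`). For this input, that change is enough, because the consumed trailing non-digit is exactly what the next match's leading `(?:\D|^)` needed.

```python
import re

text = '11a2ee22b333c44d5e66e777e8888'

# Original pattern: both delimiter groups consume a character.
consuming = re.compile(r'(?:\D|^)(\d{2,3})(?:\D|$)')
for match in consuming.finditer(text):
    # group(0) is everything the match consumed; group(1) is the capture.
    print(repr(match.group(0)), match.span(), match.group(1))
# '11a' (0, 3) 11
# 'e22b' (5, 9) 22
# 'c44d' (12, 16) 44
# 'e66e' (17, 21) 66

# Variant: the trailing delimiter is only checked, not consumed, so the
# non-digit after each run stays available to the next match.
trailing_lookahead = re.compile(r'(?:\D|^)(\d{2,3})(?=\D|$)')
print(trailing_lookahead.findall(text))
# ['11', '22', '333', '44', '66', '777']
```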

You can get the result you want with lookbehind and lookahead syntax:

>>> re.findall(u'(?<!\d)\d{2,3}(?!\d)',u'11a2ee22b333c44d5e66e777e8888')
[u'11', u'22', u'333', u'44', u'66', u'777']

Here, (?<!\d) is a negative lookbehind, checking that the match is not preceded by a digit, and (?!\d) is a negative lookahead, checking that the match is not followed by a digit. Crucially, these constructions do not consume any of the string.
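The same lookaround pattern generalizes to any bounds n and m from the question. This is a minimal Python 3 sketch. The function name `digit_runs` is hypothetical, and the code assumes n and m are non-negative integers with n <= m:

```python
import re

def digit_runs(text, n, m):
    """Return every maximal run of digits whose length k satisfies n <= k <= m."""
    # (?<!\d) and (?!\d) only inspect the neighbouring characters; they
    # consume nothing, so adjacent matches can share the non-digit between them.
    pattern = r'(?<!\d)\d{%d,%d}(?!\d)' % (n, m)
    return re.findall(pattern, text)

s = '11a2ee22b333c44d5e66e777e8888'
print(digit_runs(s, 2, 3))  # ['11', '22', '333', '44', '66', '777']
print(digit_runs(s, 4, 4))  # ['8888']
print(digit_runs(s, 1, 1))  # ['2', '5']
```

In Python 3, `\d` in a str pattern also matches non-ASCII Unicode digits. To count only 0-9, pass `re.ASCII` as the flags argument.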

The various lookahead and lookbehind constructions are described in the Regular Expression Syntax section of Python's re documentation.

Published: 2023-07-16 07:29:00
Article link: https://www.elefans.com/category/jswz/34/1125436.html