Match a specific number of digits not preceded or followed by digits

Updated: 2024-10-25 08:27:22

I have a string:

string = u'11a2ee22b333c44d5e66e777e8888'

I want to find all k consecutive chunks of digits where n <= k <= m.

I want to do this with a regular expression only. For example, with n=2 and m=3 I tried (?:\D|^)(\d{2,3})(?:\D|$):

re.findall(r'(?:\D|^)(\d{2,3})(?:\D|$)', u'11a2ee22b333c44d5e66e777e8888')

Gives this output:

['11', '333', '66']

Desired output:

['11', '22', '333', '44', '66', '777']

I know there are alternate solutions like:

list(filter(lambda x: re.match(r'^\d{2,3}$', x), re.split(r'\D', u'11a2ee22b333c44d5e66e777e8888')))

which gives the desired output, but I want to know what is wrong with the first approach.

It seems re.findall scans the string left to right and skips any text that an earlier match has already covered, so what can be done?

Best answer

Note: The result you show in your question is not what I'm getting:

>>> import re
>>> re.findall(r'(?:\D|^)(\d{2,3})(?:\D|$)', u'11a2ee22b333c44d5e66e777e8888')
[u'11', u'22', u'44', u'66']

It is still missing some of the matches you want (333 and 777), but not the same ones you reported.

The problem is that even though non-capturing groups like (?:\D|^) and (?:\D|$) don't capture what they match, they still consume it.

This means that the match which yields '22' has actually consumed:

e with (?:\D|^): not captured (but still consumed)
22 with (\d{2,3}): captured
b with (?:\D|$): not captured (but still consumed)

… so that b is no longer available to be matched before 333. In the same way, the e before 777 has already been consumed by the match that yields '66'.
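To see the consumption directly, here is a minimal sketch (Python 3 syntax; the variable names are illustrative and not from the question) that uses re.finditer to show the whole text of each match of the original pattern, not just the captured group:

```python
import re

text = '11a2ee22b333c44d5e66e777e8888'
pattern = re.compile(r'(?:\D|^)(\d{2,3})(?:\D|$)')

# group(0) is everything the match consumed;
# group(1) is only the captured run of digits.
consumed = [(m.group(0), m.group(1), m.span()) for m in pattern.finditer(text)]

for whole, digits, span in consumed:
    print(repr(whole), repr(digits), span)
```

The whole matches come out as '11a', 'e22b', 'c44d' and 'e66e'. Matches returned by re.finditer and re.findall do not overlap, so the non-digit that ends one match cannot also begin the next.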

You can get the result you want with lookbehind and lookahead syntax:

>>> re.findall(u'(?<!\d)\d{2,3}(?!\d)',u'11a2ee22b333c44d5e66e777e8888') [u'11', u'22', u'333', u'44', u'66', u'777']

Here, (?<!\d) is a negative lookbehind, checking that the match is not preceded by a digit, and (?!\d) is a negative lookahead, checking that the match is not followed by a digit. Crucially, these constructions do not consume any of the string.
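For the general n <= k <= m case from the question, the same lookaround pattern can be built from the two bounds. This is only a sketch in Python 3; find_digit_runs is a hypothetical helper name, not something from the re module:

```python
import re

def find_digit_runs(text, n, m):
    """Return every run of digits whose length k satisfies n <= k <= m.

    Hypothetical helper: the lookbehind and lookahead ensure that the run
    is not part of a longer run of digits, without consuming any text.
    """
    pattern = r'(?<!\d)\d{%d,%d}(?!\d)' % (n, m)
    return re.findall(pattern, text)

text = '11a2ee22b333c44d5e66e777e8888'
print(find_digit_runs(text, 2, 3))
```

With n=2 and m=3 this gives the same list as the lookaround call above. Other bounds pick out other runs, for example n=m=4 returns only '8888'.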

The various lookahead and lookbehind constructions are described in the Regular Expression Syntax section of Python's re documentation.
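A related variation, which is my own illustration and not part of the answer above: for this input it is enough to turn only the trailing group into a lookahead. The leading (?:\D|^) still consumes a character, but because the non-digit after each run is no longer consumed, it remains available as the leading non-digit of the next match.

```python
import re

text = '11a2ee22b333c44d5e66e777e8888'

# Same leading group as in the question, but the trailing check is a
# lookahead, so the non-digit after each run is left for the next match.
trailing_lookahead = re.findall(r'(?:\D|^)(\d{2,3})(?=\D|$)', text)
print(trailing_lookahead)
```

The fully lookaround-based pattern is still the tidier choice, since it needs no capturing group and each match covers only the digits themselves.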

Published: 2023-07-16 07:29:00

Article link: https://www.elefans.com/category/jswz/34/1125436.html
