Using re.sub in Python 2.7, the following example uses a simple backreference:

    re.sub('-{1,2}', r'\g<0> ', 'pro----gram-files')

It outputs the following string, as expected:

    'pro-- -- gram- files'

I would expect the following example to behave identically, but it does not:

    def dashrepl(matchobj):
        return r'\g<0> '

    re.sub('-{1,2}', dashrepl, 'pro----gram-files')

This gives the following unexpected output:

    'pro\\g<0> \\g<0> gram\\g<0> files'

Why do the two examples give different output? Did I miss something in the documentation that explains this? Is there any particular reason that this behavior is preferable to what I expected? Is there a way to use backreferences in a replacement function?
Best answer
As there are simpler ways to achieve your goal, you can use them.
As you have already seen, your replacement function receives a match object as its argument.
Among other things, this object has a group() method, which you can use instead:

    def dashrepl(matchobj):
        return matchobj.group(0) + ' '

This gives exactly the result you want.
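For completeness, here is a minimal runnable sketch of that approach applied to the input from the question. The variable name result is only there for illustration.

```python
import re

def dashrepl(matchobj):
    # group(0) is the entire text matched by the pattern,
    # i.e. what \g<0> refers to in a string replacement.
    return matchobj.group(0) + ' '

result = re.sub('-{1,2}', dashrepl, 'pro----gram-files')
print(result)  # pro-- -- gram- files
```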
But you are completely right: the docs are a bit confusing on this point.
They describe the repl argument as follows:
repl can be a string or a function; if it is a string, any backslash escapes in it are processed.
and
If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
You could read this as implying that the "replacement string" returned by the function also undergoes backslash-escape processing.
But since that processing is described only for the case where repl is a string, the intended meaning is that a function's return value is inserted literally; this is just not obvious at first glance.
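On the remaining question of using backreferences inside a replacement function: one option, not mentioned in the answer above, is the match object's expand() method, which applies the same template processing to a string that re.sub applies to a string repl. The sketch below contrasts it with a function that returns the template unprocessed; the names literal_repl and expanding_repl are made up for this illustration.

```python
import re

def literal_repl(matchobj):
    # The returned string is inserted as-is; \g<0> is NOT interpreted.
    return r'\g<0> '

def expanding_repl(matchobj):
    # expand() substitutes backreferences in the template for this match.
    return matchobj.expand(r'\g<0> ')

text = 'pro----gram-files'
literal_result = re.sub('-{1,2}', literal_repl, text)
expanded_result = re.sub('-{1,2}', expanding_repl, text)

print(literal_result)   # pro\g<0> \g<0> gram\g<0> files
print(expanded_result)  # pro-- -- gram- files
```

The doubled backslashes in the question's output are only how the interactive interpreter displays the string's repr; the string itself contains single backslashes.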