Python "multiple repeat" error

Updated: 2024-10-24 16:22:45
Problem description

I'm trying to determine whether a term appears in a string. The term must be preceded and followed by a space, and a standard suffix is also allowed. Example:

term: google
string: "I love google!!! "
result: found

term: dog
string: "I love dogs "
result: found

I'm trying the following code:

regexPart1 = "\s"
regexPart2 = "(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + term + regexPart2, re.IGNORECASE)

and get the error:

raise error("multiple repeat")
sre_constants.error: multiple repeat

Update: real code that fails:

term = 'lg incite" OR author:"http++www.dealitem" OR "for sale'
regexPart1 = r"\s"
regexPart2 = r"(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + term + regexPart2, re.IGNORECASE)

On the other hand, the following term compiles without any error (+ instead of ++):

term = 'lg incite" OR author:"http+www.dealitem" OR "for sale'

Solution

The problem is that, in a non-raw string, \" is ".

You get lucky with all of your other unescaped backslashes—\s is the same as \\s, not s; \( is the same as \\(, not (, and so on. But you should never rely on getting lucky, or assuming that you know the whole list of Python escape sequences by heart.

Either print out your string and escape the backslashes that get lost (bad), escape all of your backslashes (OK), or just use raw strings in the first place (best).
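To see the difference concretely, here is a minimal sketch comparing how Python reads the same characters in a normal and in a raw string literal. The variable names are only for illustration and are not from the original code.

```python
# In a normal string literal, \" is an escape sequence for a bare quote,
# so the backslash never reaches the regex engine.
plain = "\""
# In a raw string literal, the backslash is kept as part of the text.
raw = r"\""

print(len(plain), repr(plain))
print(len(raw), repr(raw))

# Doubling every backslash in a normal literal produces the same text
# as writing the pattern once in a raw literal.
doubled_matches_raw = ("\\s" == r"\s") and ("\\(" == r"\(")
print(doubled_matches_raw)
```

Raw strings are usually the least error-prone choice for regex patterns, since what you type is what the regex engine receives.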

That being said, your regexp as posted won't match some expressions that it should, but it will never raise that "multiple repeat" error. Clearly, your actual code is different from the code you've shown us, and it's impossible to debug code we can't see.

Now that you've shown a real reproducible test case, that's a separate problem.

You're searching for terms that may have special regexp characters in them, like this:

term = 'lg incite" OR author:"http++www.dealitem" OR "for sale'

That p++ in the middle of a regexp means "1 or more of 1 or more of the letter p" (which amounts to the same thing as "1 or more of the letter p") in some regexp languages, "always fail" in others, and "raise an exception" in others. Python's re falls into the last group (at least on versions before 3.11; from 3.11 on, ++ is accepted as a possessive quantifier). In fact, you can test this in isolation:

>>> re.compile('p++')
error: multiple repeat
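If you'd rather check this from a script than from the interactive prompt, a small helper along these lines could work. The name compiles is made up for this sketch. Note that the answer for 'p++' depends on your Python version (3.11 and later parse ++ as a possessive quantifier), so the sketch only prints it.

```python
import re

def compiles(pattern):
    """Return True if re.compile accepts `pattern`, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# The result for 'p++' varies between Python versions, so it is printed
# here rather than assumed.
print("p+  compiles:", compiles("p+"))
print("p++ compiles:", compiles("p++"))
```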

If you want to put random strings into a regexp, you need to call re.escape on them.
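As a rough illustration, this is what re.escape does to a shortened version of the term above (the sample sentences are invented for the example):

```python
import re

term = 'http++www.dealitem'

# re.escape puts a backslash in front of the characters that are special
# in a regexp, so + and . are matched literally.
escaped = re.escape(term)
print(escaped)

pattern = re.compile(escaped)

# The literal term is found...
found_literal = pattern.search('see http++www.dealitem today') is not None
# ...and the dot no longer matches an arbitrary character.
found_lookalike = pattern.search('see http++wwwXdealitem today') is not None
print(found_literal, found_lookalike)
```

Because the escaping happens at the point where the term is spliced into the pattern, the rest of the regexp can keep using special characters on purpose.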

One more problem (thanks to Ωmega):

. in a regexp means "any character". So ,|.|;|: (I've just extracted a short fragment of your longer alternation chain) means "a comma, or any character, or a semicolon, or a colon", which is the same as "any character". You probably wanted to escape the dot, writing \. instead.
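A quick way to convince yourself, using just that fragment of the alternation (a sketch, with names chosen only for this example):

```python
import re

unescaped = re.compile(r",|.|;|:")   # the dot matches any character
escaped = re.compile(r",|\.|;|:")    # the dot matches only a literal "."

# With the bare dot, an unrelated character such as "x" is accepted.
x_matches_unescaped = unescaped.fullmatch("x") is not None
# With the escaped dot, "x" is rejected but "." is still accepted.
x_matches_escaped = escaped.fullmatch("x") is not None
dot_matches_escaped = escaped.fullmatch(".") is not None

print(x_matches_unescaped, x_matches_escaped, dot_matches_escaped)
```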

Putting all three fixes together:

import re
term = 'lg incite" OR author:"http++www.dealitem" OR "for sale'
regexPart1 = r"\s"
regexPart2 = r"(?:s|'s|!+|,|\.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + re.escape(term) + regexPart2, re.IGNORECASE)

As Ωmega also pointed out in a comment, you don't need to use a chain of alternations if they're all one character long; a character class will do just as well, more concisely and more readably.
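A hedged sketch of what that could look like: the single-character suffixes are folded into one character class, while the multi-character ones (s, 's, !+, ?+) stay as alternatives. The helper name build_pattern is invented for this example and is not part of the original code.

```python
import re

# Optional suffix: s, 's, a run of ! or ?, or one of the single
# punctuation characters in the class; then the required whitespace.
SUFFIX = r"(?:s|'s|!+|\?+|[,.;:()\"])?\s"

def build_pattern(term):
    """Compile a pattern for `term` preceded by whitespace and followed by
    an optional suffix plus whitespace. The term is escaped so that any
    special regexp characters in it are matched literally."""
    return re.compile(r"\s" + re.escape(term) + SUFFIX, re.IGNORECASE)

print(bool(build_pattern("google").search("I love google!!! ")))
print(bool(build_pattern("dog").search("I love dogs ")))
print(bool(build_pattern("dog").search("I love doghouse ")))
```

Inside a character class the dot and the parentheses lose their special meaning, which is why they need no backslash there.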

And I'm sure there are other ways this could be improved.

Published: 2023-11-22 20:57:01
Article link: https://www.elefans.com/category/jswz/34/1618994.html