I am looking into the regex functions in Python's re module. As part of this, I am trying to extract a substring from a string.
For instance, assume I have the string:
```
<place of birth="Stockholm">
```

Is there a way to extract Stockholm with a single regex call?
So far, I have:
```python
import re

location_info = '<place of birth="Stockholm">'
# Remove the text before the name
location_name1 = re.sub(r'<place of birth="', r"", location_info)
# location_name1 --> Stockholm">
# Remove the text after the name
location_name2 = re.sub(r'">', r"", location_name1)
# location_name2 --> Stockholm
```

Any advice on how to extract the string Stockholm without using two "re.sub" calls is highly appreciated.
Best answer
Sure. You can match the beginning up to and including the opening double quote, then match and capture all the characters other than a double quote that follow it:
```python
import re

p = re.compile(r'<place of birth="([^"]*)')
location_info = "<place of birth=\"Stockholm\">"
match = p.search(location_info)
if match:
    print(match.group(1))
```

See IDEONE demo.
The <place of birth=" part is matched as a literal, and ([^"]*) is capturing group 1, which matches zero or more characters other than ". The captured value is accessed with .group(1).
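To illustrate why the answer uses the negated class [^"]* rather than a greedy .*, here is a small sketch. The input string with a second quoted attribute is a hypothetical example, not from the question. The negated class stops at the first closing quote, while the greedy dot backtracks only as far as the last quote on the line:

```python
import re

# Hypothetical input: the tag carries a second quoted attribute
text = '<place of birth="Stockholm" country="Sweden">'

# Negated character class: cannot cross a double quote, so it stops after the name
negated = re.search(r'<place of birth="([^"]*)', text)
print(negated.group(1))  # Stockholm

# Greedy dot: consumes to the end of the line, then backtracks to the LAST quote
greedy = re.search(r'<place of birth="(.*)"', text)
print(greedy.group(1))  # Stockholm" country="Sweden
```

With the question's single-attribute input both patterns happen to give the same result; the difference only shows up once more quoted text follows the name.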
Here is a REGEX demo.
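Since the question started from re.sub, the same capture group can also collapse the two substitutions into one call. The following is a minimal sketch, assuming the input is exactly the tag from the question. The whole tag is matched and replaced with the backreference \1, i.e. the captured name:

```python
import re

location_info = '<place of birth="Stockholm">'

# Match the entire tag, capture the name in group 1,
# and replace the whole match with only the captured text (\1)
location_name = re.sub(r'<place of birth="([^"]*)">', r'\1', location_info)
print(location_name)  # Stockholm
```

Note that re.sub returns the whole input string with the match replaced, so any text outside the tag would be kept. For pure extraction, re.search with .group(1), as in the answer, is the more direct tool.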