Updated: 2024-10-11 21:25:00

Using regex in Python to extract the description of an app from Apple store HTML

I need to extract the description of an app from the Apple store HTML. The description sits between these tags:

<p itemprop="description"> DESCRIPTION HERE </p>

The description contains a mix of symbols, words, spaces, etc.

The HTML clearly contains lots of other text and tags, so the pattern matching needs to be very precise.

Thanks

Best answer

Don't use regular expressions to parse HTML!

Use an HTML parser like BeautifulSoup!

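>>> import bs4
>>> s = '<p itemprop="description"> DESCRIPTION HERE </p>'
>>> soup = bs4.BeautifulSoup(s, "html.parser")
>>> # find() returns the first matching element, or None if there is no match
>>> soup.find("p", {"itemprop": "description"}).text
' DESCRIPTION HERE '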

Or if you want to find all elements:

>>> [item.text for item in soup.find_all("p", {"itemprop": "description"})]
[' DESCRIPTION HERE ']
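
If installing BeautifulSoup is not an option, the same idea (use a real HTML parser rather than a regex) can be sketched with the standard library's html.parser module. This is only a minimal sketch, not part of the original answer: the class name DescriptionParser, the helper extract_descriptions, and the sample page fragment are all made up for illustration, and it assumes the description lives in a <p itemprop="description"> element as in the question.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collect the text of every <p itemprop="description"> element.

    Hypothetical helper class for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.descriptions = []  # finished descriptions, in document order
        self._inside = False    # True while inside a matching <p>
        self._chunks = []       # text pieces of the current description

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "p" and dict(attrs).get("itemprop") == "description":
            self._inside = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self.descriptions.append("".join(self._chunks).strip())
            self._inside = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is collected as well
        if self._inside:
            self._chunks.append(data)


def extract_descriptions(html):
    """Return the stripped text of all description paragraphs in html."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.descriptions


# Made-up page fragment standing in for the real store HTML
sample_html = """
<html><body>
  <h1>Some App</h1>
  <p class="price">Free</p>
  <p itemprop="description"> DESCRIPTION HERE, with <b>symbols</b> &amp; spaces </p>
</body></html>
"""

print(extract_descriptions(sample_html))
```

Unrelated paragraphs (such as the price line in the sample) are skipped because the match is made on the parsed tag name and attribute, not on the surrounding text, which is the precision the question asks for.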

Published: 2023-07-31 10:06:00

Article link: https://www.elefans.com/category/jswz/34/1342490.html