Using regex in Python to extract the description of an app from Apple Store HTML
I need to extract the description of an app from Apple Store HTML. The description sits between these tags:

```html
<p itemprop="description"> DESCRIPTION HERE </p>
```

The description itself contains a mix of symbols, words, spaces, etc.
Clearly, the HTML has lots of other text and tags, so the pattern matching needs to be very precise.
Thanks
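For illustration, a narrowly scoped regex does work on the exact snippet, as long as you assume Apple always writes the opening tag character for character as `<p itemprop="description">`. The sketch below (the function name `extract_descriptions_regex` is hypothetical) uses a non-greedy group with `re.DOTALL` and also shows where that assumption breaks down:

```python
import re

# Matches only the exact opening tag from the question; (.*?) is non-greedy so it
# stops at the first "</p>", and re.DOTALL lets "." match newlines as well.
DESCRIPTION_RE = re.compile(r'<p itemprop="description">(.*?)</p>', re.DOTALL)


def extract_descriptions_regex(html):
    """Return the raw text between each matching opening tag and the next </p>."""
    return DESCRIPTION_RE.findall(html)


print(extract_descriptions_regex('<p itemprop="description"> DESCRIPTION HERE </p>'))
# [' DESCRIPTION HERE ']

# Equivalent HTML with another attribute first is silently missed:
print(extract_descriptions_regex('<p class="x" itemprop="description">Text</p>'))
# []

# Inner markup and entities come back unprocessed:
print(extract_descriptions_regex('<p itemprop="description">Fish &amp; Chips<br>More</p>'))
# ['Fish &amp; Chips<br>More']
```

Attribute order, extra attributes, entities and inline tags are all legal variations in real HTML, and each one needs another special case in the pattern.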
Best answer
Don't use regular expressions to parse HTML!
Use an HTML parser like BeautifulSoup!
```python
>>> import bs4
>>> s = '<p itemprop="description"> DESCRIPTION HERE </p>'
>>> soup = bs4.BeautifulSoup(s, "html.parser")
>>> soup.find("p", {"itemprop": "description"}).text
' DESCRIPTION HERE '
```

Or, if you want to find all matching elements:
```python
>>> [item.text for item in soup.find_all("p", {"itemprop": "description"})]
[' DESCRIPTION HERE ']
```
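If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with the standard library's `html.parser.HTMLParser`. This swaps in a different library from the one recommended here; `DescriptionParser` and `extract_descriptions` are hypothetical names, and the sketch assumes the page still marks the description with `itemprop="description"` (Apple's markup may have changed since). Unlike a regex, it ignores attribute order, decodes entities such as `&amp;`, and turns `<br>` into newlines:

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collects the text of every <p> element with itemprop="description"."""

    def __init__(self):
        # convert_charrefs=True (the default) turns &amp; etc. into characters
        super().__init__(convert_charrefs=True)
        self.descriptions = []  # finished, whitespace-trimmed descriptions
        self._inside = False    # True while inside a matching <p>
        self._chunks = []       # text pieces of the current description

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, so attribute order is irrelevant
        if tag == "p" and dict(attrs).get("itemprop") == "description":
            self._inside = True
            self._chunks = []
        elif self._inside and tag == "br":
            self._chunks.append("\n")  # keep line breaks as newlines

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self.descriptions.append("".join(self._chunks).strip())
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)


def extract_descriptions(html):
    """Return the text of all <p itemprop="description"> elements in html."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.descriptions


page = ('<div><p class="x" itemprop="description">Fish &amp; Chips<br>Line two</p>'
        '<p>Unrelated</p></div>')
print(extract_descriptions(page))  # ['Fish & Chips\nLine two']
```

To use it on a live page you would pass in the downloaded source, for example `urllib.request.urlopen(url).read().decode("utf-8")`, assuming the page is UTF-8 encoded.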