Regular Expression to remove html tags from a string in Python
I am fetching my results from an RSS feed using the following code:

    try:
        desc = item.xpath('description')[0].text
        if date is not None:
            desc = date + "\n" + "\n" + desc
    except:
        desc = None

But sometimes the description in the RSS feed contains HTML tags, as below:

    This is sample text
    <img src="http://imageURL" alt="" />

When displaying the content, I do not want any HTML tags to appear on the page. Is there a regular expression to remove the HTML tags?
Best answer

Try:

    import re

    # Match an opening, closing, or self-closing tag such as <p>, </p>, or <img ... />
    pattern = re.compile(r'</?\w+\s*[^>]*?/?>', re.DOTALL | re.MULTILINE | re.IGNORECASE | re.UNICODE)
    # Replace each tag with a single space
    text = pattern.sub(" ", text)
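For illustration, here is a minimal sketch of how the pattern above could be wrapped in a helper and applied to the sample description from the question. The function name `strip_tags` and the whitespace cleanup step are assumptions added here, not part of the original answer. Note that because the pattern requires a word character right after `<`, it leaves HTML comments (`<!-- ... -->`) in place, and it does not decode entities such as `&amp;`.

```python
import re

# Same pattern as in the answer: opening, closing, or self-closing tags.
TAG_PATTERN = re.compile(r'</?\w+\s*[^>]*?/?>',
                         re.DOTALL | re.MULTILINE | re.IGNORECASE | re.UNICODE)


def strip_tags(text):
    """Hypothetical helper: remove HTML tags and tidy the leftover whitespace."""
    # Replace every tag with a space so adjacent words do not run together.
    without_tags = TAG_PATTERN.sub(" ", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join(without_tags.split())


if __name__ == "__main__":
    sample = 'This is sample text\n<img src="http://imageURL" alt="" />'
    print(strip_tags(sample))
```

In the question's code this could be called as `desc = strip_tags(desc)` once `desc` has been read, provided `desc` is not `None`. If the line breaks in the description need to be preserved, the whitespace-collapsing step should be dropped.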