Regular Expression to remove HTML tags from a string in Python

Last updated: 2024-10-27 10:20:47

I am fetching my result from an RSS feed using the following code:

    try:
        desc = item.xpath('description')[0].text
        if date is not None:
            desc = date + "\n" + "\n" + desc
    except:
        desc = None

But sometimes the description in the RSS feed contains HTML tags, as below:

This is sample text

<img src="http://imageURL" alt="" />

When displaying the content, I do not want any HTML tags to appear on the page. Is there a regular expression that removes HTML tags?

Best answer

Try:

    import re

    # Matches opening, closing, and self-closing tags such as <p>, </p>, <img ... />
    pattern = re.compile(r'</?\w+\s*[^>]*?/?>', re.DOTALL | re.MULTILINE | re.IGNORECASE | re.UNICODE)
    text = pattern.sub(" ", text)
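To see how the pattern behaves on a description like the one in the question, here is a minimal sketch. The helper name `strip_tags` and the extra whitespace-collapsing step are assumptions added for illustration; they are not part of the original answer. Note that this pattern requires a word character right after `<`, so it will not match HTML comments, and it leaves entities such as `&amp;` untouched.

```python
import re

# Same pattern as in the answer, written here as a raw string.
# The flags are omitted because the pattern uses no ".", "^", "$" or literal letters.
TAG_PATTERN = re.compile(r'</?\w+\s*[^>]*?/?>')


def strip_tags(text):
    """Hypothetical helper: replace each tag with a space, then tidy whitespace."""
    without_tags = TAG_PATTERN.sub(" ", text)
    # Collapse the runs of spaces and newlines left behind by removed tags.
    return re.sub(r"\s+", " ", without_tags).strip()


sample = 'This is sample text\n\n<img src="http://imageURL" alt="" />'
print(strip_tags(sample))
```

In the question's code, a call such as `desc = strip_tags(desc)` could go right after `desc` is read from the feed item. Replacing tags with a space rather than an empty string keeps words on either side of a tag from running together.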

Published: 2023-07-21 23:18:00
