BeautifulSoup: just get inside of a tag, no matter how many enclosing tags there are

Updated: 2024-10-23 07:39:18

I'm trying to scrape the contents of all the <p> elements in a web page using BeautifulSoup. There are internal tags, but I don't care about them; I just want to get the internal text.

For example, for:

    <p>Red</p>
    <p><i>Blue</i></p>
    <p>Yellow</p>
    <p>Light <b>green</b></p>

How can I extract:

    Red
    Blue
    Yellow
    Light green

Neither .string nor .contents[0] does what I need. Nor does .extract(), because I don't want to have to specify the internal tags in advance - I want to deal with any that may occur.

Is there a 'just get the visible HTML' type of method in BeautifulSoup?

----UPDATE------

On advice, trying:

    soup = BeautifulSoup(open("test.html"))
    p_tags = soup.findAll('p', text=True)
    for i, p_tag in enumerate(p_tags):
        print str(i) + p_tag

But that doesn't help - it prints out:

    0Red
    1
    2Blue
    3
    4Yellow
    5
    6Light
    7green
    8

Best answer

Short answer: soup.findAll(text=True)

This has already been answered, here on StackOverflow and in the BeautifulSoup documentation.
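
As a rough illustration of what a text search returns, and of why the attempt in the question printed extra numbered entries, here is a sketch written against Beautiful Soup 4. That version is an assumption: the thread itself uses 3.x, where the argument is spelled `text=True` and the method `findAll`. The `html.parser` backend and the variable names are also choices made for this example only.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 (the thread uses 3.x)

html = "<p>Red</p>\n<p><i>Blue</i></p>\n<p>Yellow</p>\n<p>Light <b>green</b></p>\n"
soup = BeautifulSoup(html, "html.parser")

# string=True matches every text node in the document, at any nesting depth.
all_strings = soup.find_all(string=True)

# The newlines between the <p> elements are text nodes as well,
# so filter out the whitespace-only entries to keep the visible text.
visible = [str(s) for s in all_strings if s.strip()]
print(visible)
```

Searching the whole document this way gives a flat list of strings, so "Light " and "green" come back as separate items; grouping by paragraph requires running the search inside each <p> tag instead.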

UPDATE:

To clarify, a working piece of code:

    >>> txt = """\
    <p>Red</p>
    <p><i>Blue</i></p>
    <p>Yellow</p>
    <p>Light <b>green</b></p>
    """
    >>> import BeautifulSoup
    >>> BeautifulSoup.__version__
    '3.0.7a'
    >>> soup = BeautifulSoup.BeautifulSoup(txt)
    >>> for node in soup.findAll('p'):
            print ''.join(node.findAll(text=True))

    Red
    Blue
    Yellow
    Light green
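
The session above uses BeautifulSoup 3 and the Python 2 print statement. For readers on Beautiful Soup 4 and Python 3, a minimal sketch of the same idea might look like the following. It assumes the `bs4` package is installed and uses the built-in `html.parser` backend, neither of which appears in the original answer. `get_text()` is the Beautiful Soup 4 shortcut for joining all the strings found under a tag.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4, not the 3.0.7a shown above

html = """\
<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>
"""
soup = BeautifulSoup(html, "html.parser")

# get_text() concatenates every string inside the tag, however deeply nested,
# so no inner tag names need to be known in advance.
paragraph_texts = [p.get_text() for p in soup.find_all("p")]

for text in paragraph_texts:
    print(text)
```

This should be equivalent to `"".join(p.find_all(string=True))` for each paragraph, which is the direct counterpart of the loop in the answer.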

Published: 2023-08-07 21:01:00
Link: https://www.elefans.com/category/jswz/34/1465989.html