Updated: 2024-10-23 07:39:18

BeautifulSoup: just get inside of a tag, no matter how many enclosing tags there are

I'm trying to scrape all the inner html from the elements in a web page using BeautifulSoup. There are internal tags, but I don't care, I just want to get the internal text.

For example, for:

Red Blue Yellow Light green

How can I extract:

Red Blue Yellow Light green

Neither .string nor .contents[0] does what I need. Nor does .extract(), because I don't want to have to specify the internal tags in advance - I want to deal with any that may occur.

Is there a 'just get the visible HTML' type of method in BeautifulSoup?

----UPDATE------

On advice, trying:

soup = BeautifulSoup(open("test.html"))
p_tags = soup.findAll('p', text=True)
for i, p_tag in enumerate(p_tags):
    print str(i) + p_tag

But that doesn't help - it prints out:

0Red 1 2Blue 3 4Yellow 5 6Light 7green 8

Best answer

Short answer: soup.findAll(text=True)

This has already been answered, here on StackOverflow and in the BeautifulSoup documentation.

UPDATE:

To clarify, a working piece of code:

>>> txt = """\
... Red Blue Yellow Light green
... """
>>> import BeautifulSoup
>>> BeautifulSoup.__version__
'3.0.7a'
>>> soup = BeautifulSoup.BeautifulSoup(txt)
>>> for node in soup.findAll('p'):
...     print ''.join(node.findAll(text=True))
...
Red Blue Yellow Light green
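For readers on Python 3, the same idea can be sketched with BeautifulSoup 4 (the bs4 package), where findAll is spelled find_all and get_text() returns all text nested inside an element. The session above uses BeautifulSoup 3.0.7a, so treat this as an illustration, not the answerer's exact code. The sample markup is an assumption: the tags of the original example are not visible on this page, so the p, i and b elements below are hypothetical, as are the helper names inner_texts and inner_texts_via_strings.

```python
from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)

# Hypothetical markup: the original sample's tags are not shown on this page,
# so these <p>, <i> and <b> elements are assumptions for illustration only.
SAMPLE_HTML = """\
<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>
"""


def inner_texts(html, tag="p"):
    """Return the text of every `tag` element, ignoring any nested tags."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() concatenates every text node found below the element.
    return [node.get_text() for node in soup.find_all(tag)]


def inner_texts_via_strings(html, tag="p"):
    """Same result, written like the answer: join the element's text nodes."""
    soup = BeautifulSoup(html, "html.parser")
    return ["".join(node.find_all(string=True)) for node in soup.find_all(tag)]


if __name__ == "__main__":
    for text in inner_texts(SAMPLE_HTML):
        print(text)
```

With the assumed markup, this prints Red, Blue, Yellow and Light green on separate lines. Neither helper needs to know which inner tags occur, which is the point of the question.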

Published: 2023-08-07 21:01:00

Article link: https://www.elefans.com/category/jswz/34/1465989.html