How to grab alternating child tags in Python BeautifulSoup

Updated: 2024-10-28 16:16:37
I am trying to extract a series of data from alternating tags in an HTML page. The HTML looks like this:

<div>
    <h3>title</h3>
    <div>text</div>
    <h3>title</h3>
    <div>text</div>
    ...
</div>
 

Since I can't iterate over each h3/div pair with something like "for each pair in div", how do I grab them efficiently?

Best answer

Find all headers, and grab the next sibling from there:

for header in soup.select('div h3'):
    next_div = header.find_next_sibling('div')

element.find_next_sibling() returns an element or None if no such sibling can be found.
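
Because of that None return value, a header with no following div would raise an AttributeError as soon as .text is accessed on the result. A minimal sketch of guarding against this, assuming the markup may end with an unpaired header (the sample HTML and the `pairs` list are illustrative, not from the question):

```python
from bs4 import BeautifulSoup

# Illustrative markup: the last <h3> has no <div> after it.
html = """
<div>
    <h3>First header</h3>
    <div>First div</div>
    <h3>Orphan header</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

pairs = []
for header in soup.select("div h3"):
    next_div = header.find_next_sibling("div")
    # find_next_sibling() returns None when no matching sibling exists,
    # so check before reading text from it.
    body = next_div.get_text(strip=True) if next_div is not None else None
    pairs.append((header.get_text(strip=True), body))

print(pairs)
```

Collecting the results as (title, text) tuples keeps each header tied to its own div, or to None when there is nothing to pair it with.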

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <div>
...     <h3>First header</h3>
...     <div>First div to go with a header</div>
...     <h3>Second header</h3>
...     <div>Second div to go with a header</div>
... </div>
... ''')
>>> for header in soup.select('div h3'):
...     next_div = header.find_next_sibling('div')
...     print(header.text, next_div.text)
...
First header First div to go with a header
Second header Second div to go with a header
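
One caveat worth noting: find_next_sibling('div') skips over any siblings that are not div elements, so a header that has no div of its own would be paired with the div belonging to a later header. If the page might contain such gaps, a stricter variant can check only the immediately following tag. The sketch below assumes that stricter pairing is what is wanted; the sample markup and the `loose_pairs` / `strict_pairs` names are hypothetical:

```python
from bs4 import BeautifulSoup

# Illustrative markup: the second <h3> is followed by another <h3>, not a <div>.
html = """
<div>
    <h3>First header</h3>
    <div>First div</div>
    <h3>Header without a div</h3>
    <h3>Third header</h3>
    <div>Third div</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

loose_pairs = []
strict_pairs = []
for header in soup.select("div > h3"):
    title = header.get_text(strip=True)

    # Loose: the next <div> sibling anywhere after the header.
    loose = header.find_next_sibling("div")
    loose_pairs.append((title, loose.get_text(strip=True) if loose else None))

    # Strict: the very next tag (True matches any tag, text nodes are skipped),
    # accepted only if it is a <div>.
    nxt = header.find_next_sibling(True)
    is_div = nxt is not None and nxt.name == "div"
    strict_pairs.append((title, nxt.get_text(strip=True) if is_div else None))

print(loose_pairs)
print(strict_pairs)
```

For strictly alternating markup like the one in the question, both approaches give the same pairs; they differ only when a header is missing its div.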

Published: 2023-08-03 14:44:00
Article link: https://www.elefans.com/category/jswz/34/1392483.html