How to grab alternating child tags in Python BeautifulSoup

Updated: 2024-10-28 16:16:37

I am trying to get a series of data from alternating tags in an HTML page. The HTML looks like this:
<div>
    <h3>title</h3>
    <div>text</div>
    <h3>title</h3>
    <div>text</div>
    ...
</div>
 
Since I can't simply loop over each h3/div pair with something like "for each pair in div", how do I grab them efficiently?
Best answer
Find all headers, then grab the next div sibling of each one:
for header in soup.select('div h3'):
    next_div = header.find_next_sibling('div')
 
element.find_next_sibling() returns an element or None if no such sibling can be found. 
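Because of that None return, a loop that reads next_div.text directly will raise AttributeError on a trailing header that has no div after it. A minimal sketch of a guarded loop, assuming made-up markup where the last header lacks its div (the html string and the pairs name are illustrative only):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second header has no <div> after it.
html = """
<div>
    <h3>First header</h3>
    <div>First div to go with a header</div>
    <h3>Second header</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

pairs = []
for header in soup.select("div h3"):
    next_div = header.find_next_sibling("div")
    # find_next_sibling() gives None when no later sibling <div> exists,
    # so check before reading its text.
    text = next_div.get_text(strip=True) if next_div is not None else None
    pairs.append((header.get_text(strip=True), text))

print(pairs)
```

Here pairs ends up as [('First header', 'First div to go with a header'), ('Second header', None)], so missing text shows up as None instead of crashing the loop.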
Demo: 
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <div>
...     <h3>First header</h3>
...     <div>First div to go with a header</div>
...     <h3>Second header</h3>
...     <div>Second div to go with a header</div>
... </div>
... ''', 'html.parser')
>>> for header in soup.select('div h3'):
...     next_div = header.find_next_sibling('div')
...     print(header.text, next_div.text)
... 
First header First div to go with a header
Second header Second div to go with a header
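One caveat: find_next_sibling('div') skips over non-matching siblings, so a header without a div of its own would be paired with the following header's div. If the pairs are direct children of a single container div (an assumption about the real page), a single pass over the children avoids that. The sketch below uses hypothetical markup and names (html, container, current_title):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "Orphan header" has no <div> of its own.
html = """
<div>
    <h3>First header</h3>
    <div>First div</div>
    <h3>Orphan header</h3>
    <h3>Third header</h3>
    <div>Third div</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.div  # the first (outer) <div> in the document

pairs = []
current_title = None
# Only direct-children <h3> and <div> tags, in document order;
# whitespace strings between them are not returned.
for child in container.find_all(["h3", "div"], recursive=False):
    if child.name == "h3":
        # A new header replaces any header still waiting for its div.
        current_title = child.get_text(strip=True)
    elif current_title is not None:
        pairs.append((current_title, child.get_text(strip=True)))
        current_title = None

print(pairs)
```

This yields [('First header', 'First div'), ('Third header', 'Third div')]. With the find_next_sibling loop, "Orphan header" would instead have been paired with "Third div".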