Find the third occurring `<p>` tag using Beautiful Soup

Updated: 2024-10-28 12:19:41

As the title suggests, I'm trying to understand how to find the third occurring <p> of a website (as an example, I used the following website: http://www.musicmeter.nl/album/31759).

Using the answer to this question, I tried the following code

    from bs4 import BeautifulSoup
    import requests

    # get HTML from http://www.musicmeter.nl/album/31759
    html = requests.get("http://www.musicmeter.nl/album/31759").text
    soup = BeautifulSoup(html, 'html5lib')  # Get data out of HTML

    first_paragraph = soup.find('p')  # or just soup.p
    print "first paragraph:", first_paragraph

    second_paragraph = first_paragraph.find_next_siblings('p')
    print "second paragraph:", second_paragraph

    third_paragraph = second_paragraph.find_next_siblings('p')
    print "third paragraph:", third_paragraph

But this code results in the following error for the third_paragraph:

    Traceback (most recent call last):
      File "page_109.py", line 21, in <module>
        third_paragraph = second_paragraph.find_next_siblings('p')
    AttributeError: 'ResultSet' object has no attribute 'find_next_siblings'

I tried to look up the error, but I couldn't figure out what is wrong.

Best answer

You are using find_next_siblings (plural), so you get back a ResultSet, which is a list of tags. You cannot call .find_next_siblings on that list.

If you want each next paragraph, use find_next_sibling (singular), not find_next_siblings:

    second_paragraph = first_paragraph.find_next_sibling('p')
    print("second paragraph:", second_paragraph)
    third_paragraph = second_paragraph.find_next_sibling('p')

These calls can be chained:

    third_paragraph = soup.find("p").find_next_sibling('p').find_next_sibling("p")
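The difference between the plural and singular methods can be sketched on a tiny document. The SAMPLE_HTML string below is made up for illustration and is not taken from the musicmeter page; it is parsed with the built-in html.parser so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = "<div><p>one</p><p>two</p><p>three</p></div>"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first = soup.find("p")

# Plural: a ResultSet (a list subclass) holding every following sibling <p>.
following = first.find_next_siblings("p")
following_texts = [p.get_text() for p in following]

# Singular: a single Tag (or None), so the call can be chained.
third = first.find_next_sibling("p").find_next_sibling("p")
third_text = third.get_text()

print(type(following).__name__, following_texts)
print(type(third).__name__, third_text)
```

Because the plural form returns a list, the tag-level search methods are only available on its individual items, for example following[0].find_next_sibling("p").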

A much simpler way is to use nth-of-type:

    print(soup.select_one("p:nth-of-type(3)"))

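You should also be aware that finding the third occurring p is not the same as finding the second sibling of the first p on the page: if the first p does not have two following sibling p tags, the sibling-based logic will fail. Note that nth-of-type is also evaluated per parent. Under standard CSS semantics, p:nth-of-type(3) matches a p that is the third p among its parent's children, so it returns the third p in the page only when the first three p tags share the same parent.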

To really get the third occurring p with find-style logic, use find_next:

    third_paragraph = soup.find("p").find_next('p').find_next("p")
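To see how these approaches can diverge, here is a small sketch on made-up markup in which the first p sits alone in its own div. The SAMPLE_HTML string and its ids are hypothetical, and the example assumes a Beautiful Soup release whose select_one follows standard CSS selector semantics.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <p> has no sibling <p> tags.
SAMPLE_HTML = (
    '<div id="intro"><p>first</p></div>'
    '<div id="body"><p>second</p><p>third</p><p>fourth</p></div>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first = soup.find("p")

# Sibling search stays inside the parent of the first <p>.
next_sibling = first.find_next_sibling("p")

# find_next walks the document in order, regardless of nesting.
third_in_document = first.find_next("p").find_next("p")

# nth-of-type counts <p> tags among the children of a single parent.
third_of_type = soup.select_one("p:nth-of-type(3)")

print(next_sibling)
print(third_in_document.get_text())
print(third_of_type.get_text())
```

On this markup the sibling search finds nothing, find_next reaches the paragraph containing "third", and the nth-of-type selector matches the paragraph containing "fourth", because that is the third p inside the second div.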

Or, if you want the first three, use find_all with limit set to 3:

    soup.find_all("p", limit=3)
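Building on find_all with a limit, a small helper can return the n-th tag in document order and fall back to None when the page has fewer matches. The name nth_tag and the sample markup are hypothetical; this is only a sketch of one way to wrap the call.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with <p> tags at different nesting levels.
SAMPLE_HTML = "<p>one</p><div><p>two</p></div><p>three</p><p>four</p>"


def nth_tag(soup, name, n):
    """Return the n-th tag called `name` in document order (1-based), or None."""
    matches = soup.find_all(name, limit=n)
    return matches[n - 1] if len(matches) == n else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
third = nth_tag(soup, "p", 3)     # third <p> in document order
missing = nth_tag(soup, "p", 10)  # fewer than ten <p> tags exist

print(third)
print(missing)
```

Checking the length before indexing avoids an IndexError on short pages, at the cost of having to handle None at the call site.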

Or, using your original logic, to get the next two sibling paragraphs after the first:

    first_paragraph = soup.find('p')  # or just soup.p
    second, third = first_paragraph.find_next_siblings("p", limit=2)

If you only want x tags, then only parse x tags. Just be sure you understand the difference between finding the third occurring <p> tag and finding the second sibling of the first p tag, as they may be different.
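One way to read "only parse x tags" is Beautiful Soup's SoupStrainer, passed through the parse_only argument. The answer does not name this class, so treat it as an assumption about what was meant. The sample markup below is made up, and the sketch uses html.parser because the html5lib parser from the question does not support parse_only.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = (
    "<html><body><h1>Album</h1>"
    "<div><p>first</p></div>"
    "<div><p>second</p><p>third</p><p>fourth</p></div>"
    "</body></html>"
)

# Keep only <p> tags while parsing; everything else is discarded.
only_p = SoupStrainer("p")
soup = BeautifulSoup(SAMPLE_HTML, "html.parser", parse_only=only_p)

first_three = soup.find_all("p", limit=3)
texts = [p.get_text() for p in first_three]
heading = soup.find("h1")  # not kept by the strainer

print(texts)
print(heading)
```

Since the surrounding div tags are dropped during parsing, the kept p tags no longer reflect the original nesting, so sibling-based searches on the strained soup can behave differently than on the full document.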

Published: 2023-08-07 19:21:00
Article link: https://www.elefans.com/category/jswz/34/1465683.html