BeautifulSoup: how to get children of div tag

Updated: 2024-10-27 11:27:56

Here is my code.

    import requests
    from bs4 import BeautifulSoup

    res = requests.get('http://www.snapdeal.com/products/computers-laptops?sort=plrty&')
    soup = BeautifulSoup(res.text)
    price = soup.find_all('div', class_="product-price").children

I want to scrape data from this website, but the div I need doesn't have a class, so I don't know how to select it. Then I found that you can get the children of a div tag, but that isn't working either. I'm trying to get all of these tags.

Best answer

There are multiple ways to get the desired price values.
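
The attempt in the question fails because find_all() returns a ResultSet, a list of every matching tag, and a list has no .children attribute; .children exists only on a single Tag. A minimal sketch of the direct fix, assuming a small hypothetical HTML snippet in place of the live page (not Snapdeal's real markup) and Python's built-in "html.parser": loop over the matches and read .children on each one, or use find() when only the first match is needed.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup standing in for the live page; not Snapdeal's real HTML.
HTML = """
<div class="product-price"><div>Rs 33490</div><div>EMI</div></div>
<div class="product-price"><div>Rs 26799</div></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() returns a ResultSet (list-like), so .children must be read per Tag.
blocks = soup.find_all("div", class_="product-price")

children_text = []
for block in blocks:
    for child in block.children:
        if isinstance(child, Tag):  # skip bare text and whitespace nodes
            children_text.append(child.get_text(strip=True))

# find() returns the first matching Tag (or None), which does have .children.
first = soup.find("div", class_="product-price")
first_children = [child.get_text(strip=True) for child in first.children]

print(children_text)   # ['Rs 33490', 'EMI', 'Rs 26799']
print(first_children)  # ['Rs 33490', 'EMI']
```

The isinstance(child, Tag) filter matters on real pages, where .children also yields the whitespace text between tags.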

You can use a CSS selector to get the first div child of every div with the product-price class:

    for price in soup.select("div.product-price > div:nth-of-type(1)"):
        print(price.get_text(strip=True))

This would print:

    Rs 33490Rs 42990(22%)
    Rs 26799Rs 31500(15%)
    ...
    Rs 41790Rs 44990(7%)
    Rs 48000Rs 50000(4%)

See the documentation for the CSS :nth-of-type() pseudo-class.
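
:nth-of-type(1) counts only siblings with the same tag name, so it still picks the first inner div when some other element comes before it, while :nth-child(1) counts every element. A small sketch on hypothetical markup (the leading span badge and the strike/span elements inside the price are assumptions modeled on the output above, not the site's real HTML), parsed with the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the output above; not Snapdeal's real HTML.
# The first block has a <span> badge before the price <div>.
HTML = """
<div class="product-price">
  <span class="badge">Hot</span>
  <div>Rs 33490<strike>Rs 42990</strike><span>(22%)</span></div>
  <div>EMI available</div>
</div>
<div class="product-price">
  <div>Rs 26799<strike>Rs 31500</strike><span>(15%)</span></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# :nth-of-type(1) counts only <div> siblings, so the leading <span> is ignored.
by_type = [d.get_text(strip=True)
           for d in soup.select("div.product-price > div:nth-of-type(1)")]

# :nth-child(1) counts every element sibling; in the first block the <span>
# is child 1, so no <div> matches there.
by_child = [d.get_text(strip=True)
            for d in soup.select("div.product-price > div:nth-child(1)")]

print(by_type)   # ['Rs 33490Rs 42990(22%)', 'Rs 26799Rs 31500(15%)']
print(by_child)  # ['Rs 26799Rs 31500(15%)']
```

In this sketch, only the :nth-of-type(1) selector returns a price for both blocks.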

Note that, along with the actual price, this text also contains the previous price, which is shown in strikethrough font. To get rid of it, take only the top-level text of the div by calling find() with text=True and recursive=False:

    for price in soup.select("div.product-price > div:nth-of-type(1)"):
        print(price.find(text=True, recursive=False).strip())

Prints:

    Rs 33490
    Rs 26799
    ...
    Rs 41790
    Rs 48000
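
find(text=True, recursive=False) returns the first text node that is a direct child of the div, even when that node is only whitespace, in which case .strip() yields an empty string. A defensive variant, sketched here as a hypothetical own_text() helper on made-up markup, walks .children and skips blank and comment nodes. The sketch spells the argument string=True, the newer name Beautiful Soup uses for text=True.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Hypothetical markup: in the second block an icon tag precedes the price text,
# so the div's first direct text node is just whitespace.
HTML = """
<div class="product-price">
  <div>Rs 33490 <strike>Rs 42990</strike> <span>(22%)</span></div>
</div>
<div class="product-price">
  <div>
    <i class="icon"></i>
    Rs 26799 <strike>Rs 31500</strike> <span>(15%)</span>
  </div>
</div>
"""


def own_text(tag):
    """Hypothetical helper: first non-blank text node directly inside tag, or None."""
    for child in tag.children:  # direct children only, like recursive=False
        if isinstance(child, NavigableString) and not isinstance(child, Comment):
            text = child.strip()
            if text:
                return text
    return None


soup = BeautifulSoup(HTML, "html.parser")
divs = soup.select("div.product-price > div:nth-of-type(1)")

# The answer's approach: the first direct text node, whatever it contains.
first_nodes = [div.find(string=True, recursive=False).strip() for div in divs]
# The defensive variant: the first direct text node that is not blank.
own_texts = [own_text(div) for div in divs]

print(first_nodes)  # ['Rs 33490', '']
print(own_texts)    # ['Rs 33490', 'Rs 26799']
```

Which text node holds the price depends on the live markup, so it is worth checking the page source before relying on either approach.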

You can go further: strip the "Rs " prefix and convert what remains to an int (or float) price value:

    for div in soup.select("div.product-price > div:nth-of-type(1)"):
        price = div.find(text=True, recursive=False).strip()
        price = float(price.replace("Rs ", ""))  # "Rs 33490" -> 33490.0
        print(price)

Prints:

    33490.0
    26799.0
    ...
    41790.0
    48000.0
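
price.replace("Rs ", "") followed by float() works for the values shown above, but it raises ValueError as soon as a price carries thousands separators (for example "Rs 1,23,490") or a different prefix such as "Rs.". A slightly more tolerant sketch, using a hypothetical parse_price() helper and only the standard library, pulls out the first number with a regular expression; the accepted formats are assumptions, not something the site guarantees.

```python
import re

# Assumed format: an optional currency prefix, digits with optional comma
# separators, and an optional decimal part.
PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Hypothetical helper: return the first number in text as a float."""
    match = PRICE_RE.search(text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group().replace(",", ""))


samples = ["Rs 33490", "Rs. 1,23,490", "Rs 499.50"]
print([parse_price(s) for s in samples])  # [33490.0, 123490.0, 499.5]
```

In the loop above, parse_price(div.find(text=True, recursive=False)) would replace the strip/replace/float steps.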

Published: 2023-07-05 08:52:00
Article link: https://www.elefans.com/category/jswz/34/1035442.html