Here is my code:

    import requests
    from bs4 import BeautifulSoup

    res = requests.get('http://www.snapdeal.com/products/computers-laptops?sort=plrty&')
    soup = BeautifulSoup(res.text)
    price = soup.find_all('div', class_="product-price").children

I want to scrape data from this website, but the div I need doesn't have a class, so I don't know how to select it. I found that you can get the children of a div tag, but that isn't working either. I'm trying to get all of those tags.
Best answer
There are multiple ways to get the desired price values.
You can use a CSS selector to get the first child div of every div that has the product-price class:

    for price in soup.select("div.product-price > div:nth-of-type(1)"):
        print(price.get_text(strip=True))

This would print:

    Rs 33490Rs 42990(22%)
    Rs 26799Rs 31500(15%)
    ...
    Rs 41790Rs 44990(7%)
    Rs 48000Rs 50000(4%)

nth-of-type documentation reference.
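
If the CSS selector feels opaque, roughly the same result can be reached with find_all() and find(). The sketch below is only an illustration, not a tested scraper for the live site: SAMPLE_HTML is made-up markup that mimics the structure described here, and the strike and span tags in it are assumptions. It also shows why the attempt in the question fails: find_all() returns a list-like ResultSet, which has no .children attribute, so each matched element has to be handled inside a loop.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from the real page):
# an outer div with the product-price class wrapping an unclassed div
# that holds the current price, the old price and the discount.
SAMPLE_HTML = """
<div class="product-price">
  <div>Rs 33490<strike>Rs 42990</strike><span>(22%)</span></div>
  <div>Free delivery</div>
</div>
<div class="product-price">
  <div>Rs 26799<strike>Rs 31500</strike><span>(15%)</span></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first_children = []
# find_all() returns a ResultSet (a list), so iterate over it instead of
# calling .children on the result directly.
for outer in soup.find_all("div", class_="product-price"):
    # recursive=False restricts the search to direct children of `outer`.
    inner = outer.find("div", recursive=False)
    if inner is not None:
        first_children.append(inner.get_text(strip=True))

print(first_children)
```

As with the selector version, get_text() still glues the old price and the discount onto the current price.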
Note that, along with the actual price, the output contains the previous price (shown in strikethrough on the page) and the discount. To get rid of them, take only the top-level text of the div by calling find() with text=True and recursive=False:

    for price in soup.select("div.product-price > div:nth-of-type(1)"):
        print(price.find(text=True, recursive=False).strip())

Prints:

    Rs 33490
    Rs 26799
    ...
    Rs 41790
    Rs 48000

You can go further and omit the Rs at the beginning to get the int (or float) price values:

    for div in soup.select("div.product-price > div:nth-of-type(1)"):
        price = div.find(text=True, recursive=False).strip()
        price = float(price.replace("Rs ", ""))
        print(price)

Prints:

    33490.0
    26799.0
    ...
    41790.0
    48000.0
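
Putting the steps together, a self-contained version might look like the sketch below. Treat it as a starting point under stated assumptions rather than a finished scraper: SAMPLE_HTML is invented markup standing in for the real page, and extract_prices, parse_price and PRICE_RE are hypothetical helper names. It passes string=True, the newer spelling of text=True in Beautiful Soup, and uses a regular expression instead of replace("Rs ", "") so that a value with thousands separators (an assumed case, not seen in the output above) does not break float().

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<div class="product-price">
  <div>Rs 33490<strike>Rs 42990</strike><span>(22%)</span></div>
</div>
<div class="product-price">
  <div><span>Sold out</span></div>
</div>
<div class="product-price">
  <div>Rs 26,799<strike>Rs 31,500</strike><span>(15%)</span></div>
</div>
"""

# Digits with optional thousands separators and an optional decimal part.
PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Return the first number found in `text` as a float, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def extract_prices(html):
    """Collect the current price from every product-price block."""
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    for div in soup.select("div.product-price > div:nth-of-type(1)"):
        # Only the div's own text, not text nested in child tags,
        # so the struck-through old price is skipped.
        top_level = div.find(string=True, recursive=False)
        if top_level is None:
            continue  # no direct text, e.g. a block without a price
        price = parse_price(top_level)
        if price is not None:
            prices.append(price)
    return prices


print(extract_prices(SAMPLE_HTML))
```

To run it against the live site, pass res.text from the requests.get() call in the question to extract_prices(). Whether the selector still matches depends on the page's current markup.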