Dryscrape: scrape child node data from parent node list using xpath
I was trying to scrape http://quotes.toscrape.com/ using dryscrape and Python, for learning purposes. I was able to get all divs with class="quote". I would like to loop through that list of divs and, using xpath, get multiple pieces of data from each parent element.

    import dryscrape
    from bs4 import BeautifulSoup

    session = dryscrape.Session()
    url = 'http://quotes.toscrape.com/'
    print('Visiting the URL...')
    session.visit(url)
    print('Status: ' + str(session.status_code()))
    for div in session.xpath("//div[@class='quote']"):
        # please help me scrape the author and quote for each div element
        pass

Best answer
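    # Fetch the page with requests and parse it with BeautifulSoup
    import requests
    from bs4 import BeautifulSoup

    url = 'http://quotes.toscrape.com/'
    r = requests.get(url)
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(r.text, 'html.parser')
    for div in soup.find_all("div", {"class": "quote"}):
        # find() returns the first matching descendant of this div
        print('Quote : ' + div.find('span').get_text())
        print('Author : ' + div.find('small').get_text())

With dryscrape, we can loop over the elements returned by session.xpath(). Each one is a node object holding the content of a single matched element, and each node has its own methods for getting data, such as at_xpath() and text().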
    import dryscrape

    session = dryscrape.Session()
    url = 'http://quotes.toscrape.com/'
    print('Visiting the URL...')
    session.visit(url)
    print('Status: ' + str(session.status_code()))
    for div in session.xpath("//div[@class='quote']"):
        # The leading "." makes each xpath relative to the current div
        print('Quote: ' + div.at_xpath(".//span").text())
        print('Author: ' + div.at_xpath(".//small").text())
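To see the parent/child pattern without a browser session, here is a minimal sketch using Python's standard xml.etree.ElementTree instead of dryscrape. The SAMPLE markup, the class names "text" and "author", and the helper extract_quotes are assumptions made up for illustration, not taken from the question. The point is only that an expression starting with "." is evaluated relative to the current div, so each lookup stays inside its own parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled loosely on a page of quotes.
SAMPLE = """
<html><body>
  <div class="quote">
    <span class="text">First quote</span>
    <span>by <small class="author">Author A</small></span>
  </div>
  <div class="quote">
    <span class="text">Second quote</span>
    <span>by <small class="author">Author B</small></span>
  </div>
</body></html>
"""


def extract_quotes(markup):
    """Return a list of (quote, author) tuples, one per div with class 'quote'."""
    root = ET.fromstring(markup.strip())
    results = []
    for div in root.findall(".//div[@class='quote']"):
        # The leading "." keeps each lookup inside the current div.
        quote = div.find(".//span[@class='text']").text
        author = div.find(".//small[@class='author']").text
        results.append((quote, author))
    return results


quotes = extract_quotes(SAMPLE)
for quote, author in quotes:
    print("Quote : " + quote)
    print("Author: " + author)
```

Matching on a class attribute, rather than taking the first span, is a design choice here: each sample div holds two span elements, and the attribute predicate makes it explicit which one is wanted. ElementTree only supports a subset of XPath and needs well-formed markup, so for a real page a browser-backed tool or an HTML parser is still the better fit.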