Feeding a dataframe with webscraping

Problem description

I'm trying to append some scraped values to a dataframe. I have this code:

import time
import requests
import pandas
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import json

# Grab content from URL
url = "www.remax.pt/comprar?searchQueryState={%22regionName%22:%22%22,%22businessType%22:1,%22listingClass%22:1,%22page%22:1,%22sort%22:{%22fieldToSort%22:%22ContractDate%22,%22order%22:1},%22mapIsOpen%22:false,%22listingTypes%22:[],%22prn%22:%22%22}"
PATH = 'C:\DRIVERS\chromedriver.exe'
driver = webdriver.Chrome(PATH)
option = Options()
option.headless = False
#chromedriver =
#driver = webdriver.Chrome(chromedriver)
#driver = webdriver.Firefox() #(options=option)
#driver.get(url)
#driver.implicitly_wait(10) # in seconds
time.sleep(1)
wait = WebDriverWait(driver, 10)
driver.get(url)

rows = driver.find_elements_by_xpath("//div[@class='row results-list ']/div")

data = []
for row in rows:
    price = row.find_element_by_xpath(".//p[@class='listing-price']").text
    print(price)
    address = row.find_element_by_xpath(".//p[@class='listing-address']").text
    print(address)
    Tipo = row.find_element_by_xpath(".//p[@class='listing-type']").text
    print(Tipo)
    Area = row.find_element_by_xpath(".//p[@class='listing-area']").text
    print(Area)
    Quartos = row.find_element_by_xpath(".//p[@class='icon-bedroom-full']").text
    print(Quartos)
    data.append([price],[address],[Tipo],[Area],[Quartos])

#driver.quit()

The problem is that it returns the following error:

NoSuchElementException                    Traceback (most recent call last)
<ipython-input-16-9e4d01985cda> in <module>
     49     price=row.find_element_by_xpath(".//p[@class='listing-price']").text
     50     print(price)
---> 51     address=row.find_element_by_xpath(".//p[@class='listing-address']").text
     52     print(address)
     53     Tipo=row.find_element_by_xpath(".//p[@class='listing-type']").text

~\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in find_element_by_xpath(self, xpath)
    349             element = element.find_element_by_xpath('//div/td[1]')
    350         """
--> 351         return self.find_element(by=By.XPATH, value=xpath)
    352
    353     def find_elements_by_xpath(self, xpath):

~\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in find_element(self, by, value)
    656                 value = '[name="%s"]' % value
    657
--> 658         return self._execute(Command.FIND_CHILD_ELEMENT,
    659                              {"using": by, "value": value})['value']
    660

~\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in _execute(self, command, params)
    631             params = {}
    632         params['id'] = self._id
--> 633         return self._parent.execute(command, params)
    634
    635     def find_element(self, by=By.ID, value=None):

~\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
    319         response = self.command_executor.execute(driver_command, params)
    320         if response:
--> 321             self.error_handler.check_response(response)
    322             response['value'] = self._unwrap_value(
    323                 response.get('value', None))

~\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
    240                 alert_text = value['alert'].get('text')
    241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
    243
    244     def _value_or_default(self, obj, key, default):

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//p[@class='listing-address']"}
  (Session info: chrome=90.0.4430.72)

But when I try only with the first element, it returns a list of prices. What is the difference if I'm giving it the different places in the dataframe and I use the same type of path?

Recommended answer

The main problem you have is the locators.
1. First, compare the locators I use with the ones in your code.
2. Second, add explicit waits: from selenium.webdriver.support import expected_conditions as EC
3. Third, remove unnecessary code.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
url = "www.remax.pt/comprar?searchQueryState={%22regionName%22:%22%22,%22businessType%22:1,%22listingClass%22:1,%22page%22:1,%22sort%22:{%22fieldToSort%22:%22ContractDate%22,%22order%22:1},%22mapIsOpen%22:false,%22listingTypes%22:[],%22prn%22:%22%22}"
driver.get(url)

# Explicitly wait until the result cards are present before reading them
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='row results-list ']/div")))
rows = driver.find_elements_by_xpath("//div[@class='row results-list ']/div")

data = []
for row in rows:
    price = row.find_element_by_xpath(".//p[@class='listing-price']").text
    address = row.find_element_by_xpath(".//h2[@class='listing-address']").text
    listing_type = row.find_element_by_xpath(".//li[@class='listing-type']").text
    area = row.find_element_by_xpath(".//li[@class='listing-area']").text
    quartos = row.find_element_by_xpath(".//li[@class='listing-bedroom']").text
    data.append([price, address, listing_type, area, quartos])

driver.close()
driver.quit()
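Since the question is about feeding a dataframe, the collected rows can then be handed to pandas. A minimal sketch, assuming the hypothetical column labels below (they are not part of the original answer):

import pandas as pd

# Hypothetical column labels; rename them to whatever suits your dataframe
columns = ['price', 'address', 'type', 'area', 'bedrooms']

# 'data' is the list of per-listing lists built in the loop above
df = pd.DataFrame(data, columns=columns)
print(df.head())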

Please note that I did this on Linux, so your Chrome driver location will be different. Also, to print the list, use:

for p in data:
    print(p)

You can modify it as you like. I received the following output:

['240 000 €', 'Lisboa - Lisboa, Carnide', 'Apartamento', '54 m\n2', '1']
['280 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '80 m\n2', '1']
['285 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '83 m\n2', '1']
['290 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '85 m\n2', '1']
['280 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '80 m\n2', '1']
['290 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '85 m\n2', '1']
['285 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '83 m\n2', '1']
['80 000 €', 'Santarém - Cartaxo, Ereira e Lapa', 'Terreno', '12440 m\n2', '1']
['260 000 €', 'Lisboa - Sintra, Queluz e Belas', 'Prédio', '454 m\n2', '1']
['37 500 €', 'Santarém - Torres Novas, Torres Novas (Santa Maria, Salvador e Santiago)', 'Prédio', '92 m\n2', '1']
['505 000 €', 'Lisboa - Sintra, Algueirão-Mem Martins', 'Duplex', '357 m\n2', '1']
['135 700 €', 'Lisboa - Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['132 800 €', 'Lisboa - Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['133 440 €', 'Lisboa - Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['179 000 €', 'Lisboa - Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['75 000 €', 'Lisboa - Vila Franca de Xira, Vila Franca de Xira', 'Apartamento', '52 m\n2', '1']
['575 000 €', 'Porto - Matosinhos, Matosinhos e Leça da Palmeira', 'Apartamento', '140 m\n2', '1']
['35 000 €', 'Setúbal - Almada, Caparica e Trafaria', 'Outros - Habitação', '93 m\n2', '1']
['550 000 €', 'Leiria - Alcobaça, Évora de Alcobaça', 'Moradia', '160 m\n2', '1']
['550 000 €', 'Lisboa - Loures, Santa Iria de Azoia, São João da Talha e Bobadela', 'Moradia', '476 m\n2', '1']
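Note that the area values come back with an embedded newline (e.g. '54 m\n2'), because the superscript 2 is rendered as a separate node on the page. As a small sketch (not part of the original answer), the rows could be normalised before building the dataframe:

# Strip the embedded newline from the area field (index 3 in each row)
data = [row[:3] + [row[3].replace('\n', '')] + row[4:] for row in data]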
