I'm currently working on a research project that needs to analyze reviews of a particular product and get an overall picture of that product.
I heard that Amazon is a good place to get product reviews and comments. Is there any way to retrieve those user reviews from Amazon via an API? I tried several Python scripts, but none of them worked. If there is no API for retrieving the data, do I need to write a spider?
Are there any approaches or sources for retrieving user reviews for a given product?
Recommended answer:
www.Scrapehero has a great tutorial on how to scrape Amazon product details: "How To Scrape Amazon Product Details and Pricing using Python".
The complete plain-text code they use is shown below. Products are identified by their ASIN, so change the values in the list to the ASINs of the products you are interested in.
# Python 3 adaptation of the tutorial script (print() calls, built-in ValueError, bounded retries).
from lxml import html
import csv, os, json
import requests
from time import sleep


def AmzonParser(url):
    # Send a browser-like User-Agent header with every request.
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
    # Try each page up to three times instead of looping forever on failure.
    for attempt in range(3):
        sleep(3)
        try:
            page = requests.get(url, headers=headers)
            # The script treats any non-200 response as a captcha page.
            if page.status_code != 200:
                raise ValueError('captcha')
            doc = html.fromstring(page.content)

            XPATH_NAME = '//h1[@id="title"]//text()'
            XPATH_SALE_PRICE = '//span[contains(@id,"ourprice") or contains(@id,"saleprice")]/text()'
            XPATH_ORIGINAL_PRICE = '//td[contains(text(),"List Price") or contains(text(),"M.R.P") or contains(text(),"Price")]/following-sibling::td/text()'
            XPATH_CATEGORY = '//a[@class="a-link-normal a-color-tertiary"]//text()'
            XPATH_AVAILABILITY = '//div[@id="availability"]//text()'

            RAW_NAME = doc.xpath(XPATH_NAME)
            RAW_SALE_PRICE = doc.xpath(XPATH_SALE_PRICE)
            RAW_CATEGORY = doc.xpath(XPATH_CATEGORY)
            RAW_ORIGINAL_PRICE = doc.xpath(XPATH_ORIGINAL_PRICE)
            RAW_AVAILABILITY = doc.xpath(XPATH_AVAILABILITY)

            # Collapse whitespace and fall back to None when a field is missing.
            NAME = ' '.join(''.join(RAW_NAME).split()) if RAW_NAME else None
            SALE_PRICE = ' '.join(''.join(RAW_SALE_PRICE).split()).strip() if RAW_SALE_PRICE else None
            CATEGORY = ' > '.join([i.strip() for i in RAW_CATEGORY]) if RAW_CATEGORY else None
            ORIGINAL_PRICE = ''.join(RAW_ORIGINAL_PRICE).strip() if RAW_ORIGINAL_PRICE else None
            AVAILABILITY = ''.join(RAW_AVAILABILITY).strip() if RAW_AVAILABILITY else None

            # Without a separate list price, report the sale price for both fields.
            if not ORIGINAL_PRICE:
                ORIGINAL_PRICE = SALE_PRICE

            data = {
                'NAME': NAME,
                'SALE_PRICE': SALE_PRICE,
                'CATEGORY': CATEGORY,
                'ORIGINAL_PRICE': ORIGINAL_PRICE,
                'AVAILABILITY': AVAILABILITY,
                'URL': url,
            }
            return data
        except Exception as e:
            print(e)
    # Every attempt failed.
    return None


def ReadAsin():
    # AsinList = csv.DictReader(open(os.path.join(os.path.dirname(__file__),"Asinfeed.csv")))
    AsinList = ['B0046UR4F4',
                'B00JGTVU5A',
                'B00GJYCIVK',
                'B00EPGK7CQ',
                'B00EPGKA4G',
                'B00YW5DLB4',
                'B00KGD0628',
                'B00O9A48N2',
                'B00O9A4MEW',
                'B00UZKG8QU',]
    extracted_data = []
    for i in AsinList:
        # requests needs a full URL, including the scheme.
        url = "https://www.amazon.com/dp/" + i
        print("Processing: " + url)
        extracted_data.append(AmzonParser(url))
        sleep(5)
    # Write all results to data.json in one go.
    with open('data.json', 'w') as f:
        json.dump(extracted_data, f, indent=4)


if __name__ == "__main__":
    ReadAsin()
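The script above collects product details, not reviews. As a rough sketch of how the same ASIN-based approach might be pointed at reviews, the snippet below builds a review-listing URL and pulls review text out of a page's HTML. Several things here are assumptions rather than anything confirmed by the tutorial: the `/product-reviews/<ASIN>` URL pattern and `pageNumber` parameter, the idea that an ASIN is 10 uppercase letters or digits, and that review text sits inside elements marked `data-hook="review-body"`. The names `review_url`, `extract_reviews` and `ReviewBodyParser` are hypothetical. It uses only the standard library, so the parsing can be tried on saved HTML without any network access.

```python
import re
from html.parser import HTMLParser

# Assumption: an ASIN is 10 uppercase letters or digits.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")

# Tags that have no closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def review_url(asin, page=1):
    """Build a review-listing URL for an ASIN (the URL pattern is an assumption)."""
    if not ASIN_PATTERN.fullmatch(asin):
        raise ValueError("not a valid ASIN: %r" % asin)
    return "https://www.amazon.com/product-reviews/%s?pageNumber=%d" % (asin, page)


class ReviewBodyParser(HTMLParser):
    """Collect the text of every element whose data-hook equals `hook`."""

    def __init__(self, hook="review-body"):  # hook value is an assumption
        super().__init__()
        self.hook = hook
        self.reviews = []
        self._depth = 0    # nesting depth inside the current review element
        self._chunks = []  # text pieces of the current review

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # Treat a line break inside a review as a space.
            if self._depth and tag == "br":
                self._chunks.append(" ")
            return
        if self._depth:
            self._depth += 1
        elif dict(attrs).get("data-hook") == self.hook:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Review element closed: store its text with whitespace collapsed.
            self.reviews.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_reviews(page_html):
    """Return the review texts found in one page of HTML."""
    parser = ReviewBodyParser()
    parser.feed(page_html)
    parser.close()
    return parser.reviews
```

To use it, fetch `review_url(asin)` with `requests.get` and the same `headers` as in the script above, then pass `page.text` to `extract_reviews`. If the list comes back empty, the markup assumption probably does not hold for that page, or the response was a captcha page.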