# Fixing a Scrapy spider whose download step starts too late (for the first few minutes it only issued requests whose responses produced more URLs, never reaching the database write); optimizing request order
The spider file:

Approach: pass `priority=number` to `scrapy.Request` (default 0; the larger the value, the higher the priority).
```python
import json
from urllib.parse import quote

import scrapy

from ..items import LiepinItem  # project-specific item class (import path assumed)


class ZhaopinSpider(scrapy.Spider):
    name = 'zhaopin'  # spider name assumed for illustration

    def parse(self, response):
        res = response.selector.re('<a><span>(.*?)</span></a>')
        for val in res:
            val = quote(val)
            # range(1, 61)
            for i in range(1, 60):
                url = f'https://fe-api.zhaopin.com/c/i/sou?start={60*i}&pageSize=60&cityId=530&workExperience=-1&education=-1&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw={val}&kt=3&lastUrlQuery=%7B%22p%22:{i},%22pageSize%22:%2260%22,%22jl%22:%22530%22,%22kw%22:%22{val}%22,%22kt%22:%223%22%7D&at=54721ddd55fd4f8ca9f2080ab3dfb7ea&_v=0.64103108'
                # A Request's priority defaults to 0; larger values are scheduled
                # earlier, and negative values are allowed
                yield scrapy.Request(url=url, callback=self.parseone)

    def parseone(self, response):
        # The final request stage: its callback downloads the data and stores
        # it in the database
        res = json.loads(response.text)['data']['results']
        for i in res:
            url = 'https://jobs.zhaopin.com/' + i['number'] + '.htm'
            print(url)
            # Raise the priority so these queued requests jump ahead and reach
            # the database-write step sooner
            yield scrapy.Request(url=url, callback=self.parsetwo, priority=10)

    def parsetwo(self, response):
        jobname = response.xpath('/html/body/div[1]/div[3]/div[4]/div/ul/li[1]/h1/text()').extract_first()
        time = response.xpath('/html/body/div[1]/div[3]/div[4]/div/ul/li[2]/div[1]/span/span/text()').extract_first()
        url = 'https://www.zhaopin.com/'
        salary = response.xpath('/html/body/div[1]/div[3]/div[4]/div/ul/li[1]/div[1]/strong/text()').extract_first()
        station = response.xpath('/html/body/div[1]/div[3]/div[4]/div/ul/li[2]/div[2]/span[1]/a/text()').extract_first()
        degree = response.xpath('/html/body/div[1]/div[3]/div[4]/div/ul/li[2]/div[2]/span[3]/text()').extract_first()
        experience = response.xpath('/html/body/div[1]/div[3]/div[4]/div/ul/li[2]/div[2]/span[2]/text()').extract_first()
        desc = response.xpath("//div[@class='responsibility pos-common']//text()").getall()
        desc = ''.join(i.strip() for i in desc)
        item = LiepinItem()
        item['jobname'] = str(jobname)
        item['time'] = time
        item['url'] = url
        item['salary'] = salary
        item['station'] = station
        item['degree'] = degree
        item['experience'] = experience
        item['desc'] = desc
        return item
```
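Why does a larger `priority` run first? Scrapy's scheduler keeps pending requests in a priority queue and internally negates the priority, so a min-first queue pops the largest value first. A minimal stdlib sketch of that behavior (the request names here are made up for illustration):

```python
import queue

# Scrapy-style scheduling sketch: store -priority so that a min-first
# PriorityQueue pops the highest-priority request first.
q = queue.PriorityQueue()
for prio, req in [(0, 'list-page-1'), (10, 'detail-page'), (0, 'list-page-2')]:
    q.put((-prio, req))

first = q.get()[1]
print(first)  # the priority=10 detail request comes out first
```

This is why bumping the detail requests to `priority=10` lets them overtake the backlog of list-page requests sitting at the default priority 0.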
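The `quote(val)` call in `parse` percent-encodes the search keyword (UTF-8) so it can be embedded safely in the query string. For example, with a Chinese keyword:

```python
from urllib.parse import quote

# Chinese keywords must be percent-encoded before going into a URL
kw = quote('数据')
print(kw)  # %E6%95%B0%E6%8D%AE
```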
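`parseone` assumes the search API returns JSON shaped like `{"data": {"results": [...]}}`, with a `number` field per result that forms the detail-page URL. A toy payload (the job number below is invented) shows the extraction in isolation:

```python
import json

# Hypothetical payload mirroring the shape parseone expects
payload = '{"data": {"results": [{"number": "CC000000000"}]}}'
results = json.loads(payload)['data']['results']
urls = ['https://jobs.zhaopin.com/' + r['number'] + '.htm' for r in results]
print(urls[0])
```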