How do I add headers to Scrapy CrawlSpider requests?

Updated: 2024-10-25 02:28:20

Problem description

I'm using the CrawlSpider class to crawl a website, and I would like to modify the headers sent with each request. Specifically, I want to add the Referer header to each request.

Following this question, I checked

response.request.headers.get('Referer', None)

in my response parsing function, and the Referer header is not present. I assume that means the Referer is not being sent with the request (unless the website doesn't return it; I'm not sure about that).

I haven't been able to figure out how to modify the headers of a request. Again, my spider is derived from CrawlSpider. Overriding CrawlSpider's _requests_to_follow or specifying a process_request callback for a rule will not work, because the referer is not in scope at those points.

Does anyone know how to modify request headers dynamically?

Recommended answer

I hate to answer my own question, but I found out how to do it. You have to enable the spider middleware that populates the Referer header on each request, based on the URL of the response that generated it. See the documentation for scrapy.contrib.spidermiddleware.referer.RefererMiddleware (in Scrapy 1.0 and later, the module path is scrapy.spidermiddlewares.referer.RefererMiddleware).

In short, you need to add this middleware to your project's settings file.

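SPIDER_MIDDLEWARES = {
    # The value is the middleware's order (an integer), not a boolean; 700 is Scrapy's default for this middleware.
    # On Scrapy releases before 1.0 the path is 'scrapy.contrib.spidermiddleware.referer.RefererMiddleware'.
    'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
}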

Then, in your response parsing method, you can use response.request.headers.get('Referer', None) to get the referer. Note the spelling: the HTTP header is 'Referer', not 'Referrer'.

If you don't understand these middlewares right away, read the documentation again, take a break, and then read it once more. I found them to be very confusing.

Published: 2023-10-29 10:23:36
Source: https://www.elefans.com/category/jswz/34/1539441.html
Tags: Scrapy, CrawlSpider
