Errors when using proxy IPs with Scrapy!!!

Updated: 2024-10-22 12:28:37

While crawling a website with Scrapy through a proxy, I ran into a few errors that I want to share.

Error 1:

Connection to the other side was lost in a non-clean fashion: Connection lost.

When I searched for this error, the suggested solution was to add a User-Agent in settings.py.
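For reference, the commonly suggested fix looks roughly like the fragment below. USER_AGENT is a real Scrapy setting; the browser string itself is only an example value, not something taken from the original project.

```python
# settings.py (fragment)
# USER_AGENT is the Scrapy setting used as the default User-Agent header.
# The value below is an illustrative browser string; any realistic one works.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/76.0.3809.132 Safari/537.36"
)
```

In my case this setting alone did not make the error go away, which is why I kept digging.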

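But bugs come in all shapes and sizes. Back to the point: I was using a proxy, so if the request headers might be at fault, I decided to look at the code that touches the request data.

I found this:

if 'proxy' not in request.meta or self.current_proxy.is_expiring:
    print(request.meta)
    self.update_proxy()
    request.meta['proxy'] = self.current_proxy.proxy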

Something looked fishy here. Sure enough, when I debugged with print statements, the code got stuck at exactly this point. And then, well...

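OK, a quick introduction to request.meta:

meta is a dictionary whose main job is to pass data along with a request, between middlewares, the spider, and its callbacks. It does not contain the HTTP headers: the User-Agent and cookies live in request.headers, not in meta. Scrapy does treat a few meta keys specially, and 'proxy' is one of them; its value is the proxy URL used for that request. In my case meta had no 'proxy' key, so the branch above was taken, and that is where things went wrong. I changed the assignment to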

request.meta['REMOTE_ADDR'] = self.current_proxy.proxy

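After that, the program started running again. Be aware, though, that Scrapy does not treat 'REMOTE_ADDR' as a special meta key, so with this change the requests are most likely sent without the proxy at all. To actually route traffic through the proxy, the URL has to be stored under the 'proxy' key.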
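Here is a minimal sketch of how the proxy can be kept under the 'proxy' key instead. ProxyModel, fetch_proxy, and the 5-second expiry margin are assumptions made for illustration; my real project fetches proxies from a provider, which is not shown in this post. The request is faked with SimpleNamespace so the sketch runs without Scrapy installed.

```python
import time
from types import SimpleNamespace


class ProxyModel:
    """Hypothetical holder for a single proxy (not shown in the original post)."""

    def __init__(self, proxy_url, lifetime_seconds=60):
        self.proxy = proxy_url  # e.g. "http://1.2.3.4:8080"
        self.expire_at = time.time() + lifetime_seconds
        self.blacked = False

    @property
    def is_expiring(self):
        # Assumed rule: treat the proxy as expiring when under 5 seconds remain.
        return self.expire_at - time.time() < 5


class IPProxyDownloadMiddleware:
    def __init__(self, fetch_proxy):
        # fetch_proxy is an assumed callable that returns a fresh proxy URL.
        self.fetch_proxy = fetch_proxy
        self.current_proxy = None

    def update_proxy(self):
        self.current_proxy = ProxyModel(self.fetch_proxy())

    def process_request(self, request, spider):
        # Only fetch a new proxy when there is none yet or the current one is expiring.
        if self.current_proxy is None or self.current_proxy.is_expiring:
            self.update_proxy()
        # Scrapy reads the proxy URL from the special "proxy" meta key.
        request.meta["proxy"] = self.current_proxy.proxy
        # Returning None lets Scrapy continue handling this request.
        return None


if __name__ == "__main__":
    fake_request = SimpleNamespace(meta={})
    middleware = IPProxyDownloadMiddleware(lambda: "http://127.0.0.1:8888")
    middleware.process_request(fake_request, spider=None)
    print(fake_request.meta)
```

Setting the key on every request, and refreshing the proxy only when needed, avoids calling update_proxy() for each request just because the key is missing.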

Error 2:

Traceback (most recent call last):
  File "g:\python_learn\python_setup\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "g:\python_learn\python_setup\lib\site-packages\scrapy\core\downloader\middleware.py", line 56, in process_response
    (six.get_method_self(method).__class__.__name__, type(response)))
scrapy.exceptions._InvalidOutput: Middleware IPProxyDownloadMiddleware.process_response must return Response or Request, got <class 'NoneType'>
2019-09-10 11:42:58 [scrapy.core.scraper] ERROR: Error downloading <GET />
Traceback (most recent call last):
  File "g:\python_learn\python_setup\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "g:\python_learn\python_setup\lib\site-packages\scrapy\core\downloader\middleware.py", line 44, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
  File "g:\python_learn\python_setup\lib\site-packages\twisted\internet\defer.py", line 1362, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 />

This bug really was just me being silly.

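def process_response(self, request, response, spider):
    print(response.status)
    if response.status != 200 or 'captcha' in response.url:
        print(response.status)
        # Mark the current proxy as blacklisted, then switch to a new one
        if not self.current_proxy.blacked:
            self.current_proxy.blacked = True
        self.update_proxy()
        print('%s proxy is invalid' % self.current_proxy.proxy)
        request.meta['proxy'] = self.current_proxy.proxy
        print(request)
        # Returning the request asks Scrapy to schedule it again
        return request
    # A normal response must be returned so that it reaches the spider
    return response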

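The cause was that my original return was in the wrong place, so process_response returned None on one of its paths. I need to be more careful next time. The code above is the corrected version. Its main purpose is to check whether the proxy IP can actually crawl this site.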
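To make the failure mode concrete, here is a small sketch of how a misplaced return produces the NoneType error. The buggy function is a hypothetical reconstruction, since I did not keep my original broken code; the request and response are faked with SimpleNamespace so the example runs without Scrapy.

```python
from types import SimpleNamespace


def process_response_buggy(request, response):
    # Hypothetical reconstruction of the mistake: only the failure branch returns.
    if response.status != 200:
        return request
    # No return here, so a 200 response falls off the end and Python returns None.


def process_response_fixed(request, response):
    if response.status != 200:
        return request  # retry the request
    return response  # pass the good response on


if __name__ == "__main__":
    req = SimpleNamespace(url="https://example.com")
    ok = SimpleNamespace(status=200)
    print(process_response_buggy(req, ok))
    print(process_response_fixed(req, ok) is ok)
```

Scrapy checks the return value of every process_response call, which is why the missing return surfaces as an _InvalidOutput exception rather than as a silent failure.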

Error 3:

pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')

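This error came up when I inserted string values through pymysql; most likely the values reached MySQL without valid quoting.

This is the statement in question: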

insert into meishi(name,phone,address) values(%s,%s,%s)

After the following change, the strings could be inserted:

insert into meishi(name,phone,address) values('"+name[x]+"','"+phone[x]+"','"+address[x]+"')
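Concatenating the values into the SQL string works for simple data, but it breaks as soon as a value contains a quote, and it is open to SQL injection. A safer sketch, assuming the original %s statement is kept and the values are passed to the cursor separately so that pymysql does the quoting, could look like this. The insert_rows helper and the connection details in the comments are hypothetical.

```python
# Keep the %s placeholders and let the driver quote and escape the values.
INSERT_SQL = "insert into meishi(name,phone,address) values(%s,%s,%s)"


def insert_rows(cursor, names, phones, addresses):
    """Insert one row per (name, phone, address) triple via parameter binding."""
    rows = list(zip(names, phones, addresses))
    # executemany() runs the statement once per tuple in rows.
    cursor.executemany(INSERT_SQL, rows)
    return len(rows)


# Usage sketch with pymysql (connection values are placeholders):
#
#   import pymysql
#   conn = pymysql.connect(host="localhost", user="root",
#                          password="secret", database="test")
#   with conn.cursor() as cursor:
#       insert_rows(cursor, name, phone, address)
#   conn.commit()
```

Note that the placeholders are not wrapped in quotes; the driver adds them when it substitutes the values.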

Day 3, 2019/9/10. Keep going, keep going, keep going.

Published: 2023-07-28 22:06:53

Article link: https://www.elefans.com/category/jswz/34/1336373.html
