CasperJS and PhantomJS trigger a “site is offline” page; a normal browser doesn't

Updated: 2024-10-09 08:34:56

So I'm trying to scrape a site (https://shop.advanceautoparts.com/), and for the past couple of weeks I could access it normally through CasperJS. When I try now (as of about 2 days ago), I get an odd message saying that the website is offline.

When I try it from a normal browser or with PhantomJS, I get the normal site. I've tried running it from different computers, changing my IP and changing the user agent, but nothing works.

EDIT

After trying the same thing with PhantomJS, I got the same message after running the code about 5 times. Is this something the site is doing to prevent scraping?

Best answer

I suspect the site has worked out that you are scraping, based on your user agent, since you are hitting it multiple times.

Maybe try randomising your user agent and see what happens (see list here).

// Replace the placeholder with a real browser user-agent string
var casper = require('casper').create({ pageSettings: { userAgent: "USE SOME OTHER USER AGENT HERE" } });
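A minimal sketch of randomising the user agent, written in the ES5-style JavaScript that CasperJS runs. The helper name pickUserAgent and the three user-agent strings are illustrative assumptions, not a recommended or up-to-date list; the random source is injectable only so the choice can be checked deterministically.

```javascript
// Illustrative user-agent strings (assumption: examples only, not a curated list).
var USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
  "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0"
];

// Hypothetical helper: return one entry of `agents` chosen at random.
// `rand` defaults to Math.random and must return a number in [0, 1).
function pickUserAgent(agents, rand) {
  if (!agents || agents.length === 0) {
    throw new Error("need at least one user-agent string");
  }
  var r = (rand || Math.random)();
  // Because r is in [0, 1), the index is always between 0 and agents.length - 1.
  return agents[Math.floor(r * agents.length)];
}
```

Under CasperJS the result would replace the placeholder in the create() options, e.g. pageSettings: { userAgent: pickUserAgent(USER_AGENTS) }, so each run of the script presents a different browser.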

However, the site might also be blocking by IP address after a number of simultaneous requests. So also try a) slowing down your script or b) navigating to different pages.
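One way to act on "slow down your script" is to wait a randomised interval before each page load and spread the crawl across several pages. This is a sketch under assumptions: politeDelay and crawlSchedule are hypothetical helpers, and whatever base and jitter values you pass in are guesses, since the site's actual rate limit is unknown.

```javascript
// Hypothetical helper: a pause of baseMs plus up to jitterMs of random extra time.
// `rand` defaults to Math.random and must return a number in [0, 1).
function politeDelay(baseMs, jitterMs, rand) {
  var r = (rand || Math.random)();
  return baseMs + Math.floor(r * jitterMs);
}

// Hypothetical helper: pair each URL with the pause to take before opening it,
// so several pages are visited with gaps instead of being requested back to back.
function crawlSchedule(urls, baseMs, jitterMs, rand) {
  var schedule = [];
  for (var i = 0; i < urls.length; i++) {
    schedule.push({ url: urls[i], waitMs: politeDelay(baseMs, jitterMs, rand) });
  }
  return schedule;
}
```

In a CasperJS script, each entry could drive one step after casper.start(): casper.wait(step.waitMs) followed by casper.thenOpen(step.url, ...). The random jitter means the requests are not evenly spaced.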

EDIT

I knocked together a test script and everything works for me. The important bit is waiting for the page header to become visible:

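// Wait until the site's header element is visible before working with the page
casper.waitUntilVisible("#header-top", function() { this.echo(this.getTitle()); });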
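The rest of the answer's test script is not preserved. A minimal sketch of how such a check might look, assuming the goal is to tell the normal page apart from the offline notice: scrapeHomepage, the onDone callback and the 20-second timeout are hypothetical, and the casper instance is passed in (rather than created inside) only so the function can also be exercised with a stand-in object.

```javascript
// Hypothetical helper: open `url`, wait for #header-top, and report the outcome.
// `casper` is expected to be an instance from require('casper').create().
function scrapeHomepage(casper, url, onDone) {
  casper.start(url);
  casper.waitUntilVisible("#header-top", function () {
    // The header rendered, so presumably this is the normal page, not the offline notice.
    onDone(null, this.getTitle());
  }, function () {
    // The header never appeared within the timeout: possibly the "offline" page, or just slow.
    onDone(new Error("#header-top did not become visible"));
  }, 20000); // timeout in milliseconds (assumed value)
  // With no completion callback, CasperJS exits once all steps have run.
  casper.run();
}
```

Under CasperJS you would pass require('casper').create() as the first argument and https://shop.advanceautoparts.com/ as the URL, then run the file with the casperjs command.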

HTH

Published: 2023-08-07 11:50:00
Article link: https://www.elefans.com/category/jswz/34/1464154.html
Tags: offline, browser, site, PhantomJS, CasperJS