Puppeteer being redirected when browser is not

Updated: 2024-10-25 16:27:11
Problem description

Attempting to test the page publicindex.sccourts/anderson/publicindex/. When navigating to the page with a standard browser, navigation ends at the requested page (publicindex.sccourts/anderson/publicindex/), which displays an "accept" button.

However, when testing with puppeteer in headless mode, the request is redirected to publicindex.sccourts.

I have a rough idea of what is occurring, but cannot seem to prevent the redirection to publicindex.sccourts when the page is requested using Puppeteer. Here is what I believe is occurring in the user-controlled browser:

  • A request for the page is sent (assuming a first visit).

  • The response is pure JavaScript.

  • The JS code then does the following:

    copies the initial page request headers,

    adds a specific header and re-requests the same page (XHR),

    copies a URL from one of the response headers and replaces the location

    (or)

    checks the page history,

    adds the URL from the response to the page history,

    opens a new window,

    writes the XHR response to the new page,

    closes the new window,

    adds an event listener for a function in the returned XHR response,

    fires the event.
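The first branch of the steps above can be sketched roughly as follows. This is only a hypothetical reconstruction based on the description, not the site's actual script: the header names `x-example-challenge` and `x-example-target`, the function name, and the injected `send` and `location` parameters are all invented for illustration.

```javascript
// Hypothetical sketch of the first branch described above.
// EXTRA_HEADER and TARGET_HEADER are invented placeholders, not the site's real header names.
const EXTRA_HEADER = 'x-example-challenge';
const TARGET_HEADER = 'x-example-target';

// `send(url, headers)` stands in for the XHR call and must resolve to { headers: {...} }.
// `location` stands in for window.location and only needs a replace(url) method.
async function replayAndRelocate(url, initialHeaders, send, location) {
  // 1. Copy the headers of the initial page request and add one extra header.
  const headers = { ...initialHeaders, [EXTRA_HEADER]: '1' };
  // 2. Re-request the same URL with the extended header set.
  const response = await send(url, headers);
  // 3. Read a URL from a response header and replace the current location with it.
  const target = response.headers[TARGET_HEADER];
  if (target) {
    location.replace(target);
  }
  return target || null;
}
```

Passing `send` and `location` in as parameters keeps the sketch runnable outside a browser; in the real page these would be `XMLHttpRequest` and `window.location`.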

With Puppeteer I have tried tracing the JS, recording a HAR, monitoring cookies, watching the request chain, intercepting page requests and adjusting headers, watching history, and so on. I'm stumped. Here is the most basic version of the Puppeteer script:

    function run () {
        let url = 'publicindex.sccourts/anderson/publicindex/';
        const puppeteer = require('puppeteer');
        const PuppeteerHar = require('puppeteer-har');
        puppeteer.launch({headless: true}).then(async browser => {
            const page = await browser.newPage();
            await page.setJavaScriptEnabled(true);
            await page.setViewport({width: 1920, height: 1280});
            await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
            const har = new PuppeteerHar(page);
            await har.start({path: 'results.har'});
            const response = await page.goto(url);
            await page.waitForNavigation();
            await har.stop();
            let bodyHTML = await page.content();
            console.log(bodyHTML);
        });
    };
    run();

Why can I not get Puppeteer to simply replicate the process that the JS executes when I navigate to the page in Chrome, and end navigation on the "accept" page?

Here is a version with more verbose logging:

    function run () {
        let url = 'publicindex.sccourts/anderson/publicindex/';
        const puppeteer = require('puppeteer');
        const PuppeteerHar = require('puppeteer-har');
        puppeteer.launch().then(async browser => {
            const page = await browser.newPage();
            await page.setJavaScriptEnabled(true);
            await page.setViewport({width:1920,height:1280});
            await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
            await page.setRequestInterception(true);
            page.on('frameattached', frame => { console.log('frame attached '); });
            page.on('framedetached', frame => { console.log('frame detached '); });
            page.on('framenavigated', frame => { console.log('frame navigated '); });
            page.on('requestfailed', req => { console.log('request failed '); });
            page.on('requestfinished', req => {
                console.log('frame finished ');
                console.log(req.url());
            });
            let count = 0;
            let headers = '';
            page.on('request', interceptedRequest => {
                console.log('requesting ' + count + 'times');
                console.log('request for ' + interceptedRequest.url());
                console.log(interceptedRequest);
                if (count>2) {
                    interceptedRequest.abort();
                    return;
                }
                if (interceptedRequest.url() == url) {
                    count++;
                    if (count == 1) {
                        const headers = interceptedRequest.headers();
                        headers['authority'] = 'publicindex.sccourts';
                        headers['sec-fetch-dest'] = 'empty';
                        headers['sec-fetch-mode'] = 'cors';
                        headers['sec-fetch-site'] = 'same-origin';
                        headers['upgrade-insecure-requests'] = '1';
                        interceptedRequest.continue({headers});
                        return;
                    } else {
                        interceptedRequest.continue();
                        return;
                    }
                }
                count++;
                interceptedRequest.continue();
                return;
            });
            const har = new PuppeteerHar(page);
            await har.start({ path: 'results.har' });
            await page.tracing.start({path: 'trace.json'});
            await Promise.all([page.coverage.startJSCoverage({reportAnonymousScripts : true})]);
            const response = await page.goto(url);
            const session = await page.target().createCDPSession();
            await session.send('Page.enable');
            await session.send('Page.setWebLifecycleState', {state: 'active'});
            const jsCoverage = await Promise.all([page.coverage.stopJSCoverage()]);
            console.log(jsCoverage);
            const chain = response.request().redirectChain();
            console.log(chain + "\n\n");
            await page.waitForNavigation();
            await har.stop();
            let bodyHTML = await page.content();
            console.log(bodyHTML);
        });
    };
    run();
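One detail in the verbose script: concatenating the redirect chain with a string prints `[object Object]` entries (or nothing when the chain is empty) rather than URLs. A minimal sketch of a more readable dump follows. The helper name `formatRedirectChain` is made up for illustration; it only relies on the `url()`, `response()`, `status()` and `headers()` methods that Puppeteer's request and response objects provide.

```javascript
// Turn a Puppeteer redirect chain into readable lines.
// Each entry is expected to behave like a Puppeteer HTTPRequest:
// url() returns a string and response() returns a response object or null.
function formatRedirectChain(chain) {
  return chain.map((request, index) => {
    const response = request.response();
    const status = response ? response.status() : 'no response';
    // Puppeteer reports header names in lower case.
    const location = response ? response.headers()['location'] : undefined;
    const suffix = location ? ` (Location: ${location})` : '';
    return `${index + 1}. ${request.url()} -> ${status}${suffix}`;
  });
}

// Possible usage inside the script above:
//   const chain = response.request().redirectChain();
//   console.log(formatRedirectChain(chain).join('\n'));
```

Note that this chain only covers HTTP redirects of the main navigation request. Requests that the page's own script issues afterwards would not appear in it, so an empty chain does not rule out a redirect happening later.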

Solution

I don't have a full resolution, but I know where the redirection is happening.

I tested your script locally with the version below:

    const puppeteer = require('puppeteer');
    const PuppeteerHar = require('puppeteer-har');

    function run () {
        let url = 'publicindex.sccourts/anderson/publicindex/';
        puppeteer.launch({headless: false, devtools: true }).then(async browser => {
            const page = await browser.newPage();
            await page.setRequestInterception(true);
            page.on('request', request => {
                console.log('GOT NEW REQUEST', request.url());
                request.continue();
            });
            page.on('response', response => {
                console.log('GOT NEW RESPONSE', response.status(), response.headers());
            });
            await page.setJavaScriptEnabled(true);
            await page.setViewport({width: 1920, height: 1280});
            await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
            const har = new PuppeteerHar(page);
            await har.start({path: 'results.har'});
            const response = await page.goto(url);
            await page.waitForNavigation();
            await har.stop();
            let bodyHTML = await page.content();
        });
    };
    run();

I edited three parts:

• Disabled headless mode and opened DevTools automatically
• Intercepted all network requests (which I then audited)
• Hoisted the require imports to the top of the file, because nested requires are hard on the eyes and I always see them called at the top level

It turns out that the page publicindex.sccourts/anderson/publicindex/ makes a request to publicindex.sccourts/.

However, this request returns a 302 redirect to www.sccourts/caseSearch/, so the browser follows it.
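To spot such a redirect without reading every logged response, the response listener can be narrowed to 3xx responses that carry a Location header. This is a minimal sketch under that assumption; `describeRedirect` and `logRedirects` are illustrative names, not Puppeteer APIs, and only the `page.on('response', ...)` event and the response methods `url()`, `status()` and `headers()` come from Puppeteer itself.

```javascript
// Return a summary of a redirect response, or null for anything else.
// A response counts as a redirect here only if it has a 3xx status
// and a Location header (so 304 Not Modified is ignored).
function describeRedirect(response) {
  const status = response.status();
  const location = response.headers()['location'];
  if (status < 300 || status >= 400 || !location) {
    return null;
  }
  return { from: response.url(), status, to: location };
}

// Attach the filter to a Puppeteer page; `log` defaults to console.log.
function logRedirects(page, log = console.log) {
  page.on('response', (response) => {
    const redirect = describeRedirect(response);
    if (redirect) {
      log(`${redirect.status} ${redirect.from} -> ${redirect.to}`);
    }
  });
}

// Possible usage, before calling page.goto(url):
//   logRedirects(page);
```

With the behaviour described above, this would be expected to print a single 302 line for the request to publicindex.sccourts/.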

I would investigate whether this odd request is legitimate and why it redirects in Puppeteer's Chromium.

This post might help; Chromium may be treating something here as insecure.

I also tried passing args: ['--disable-web-security', '--allow-running-insecure-content'] in the options object given to launch(), but it made no difference.
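For reference, this is roughly where those flags go. The sketch below combines them with the non-headless settings from the test script; `buildLaunchOptions` and `launchBrowser` are illustrative names, and as noted above the flags did not change the redirect in this case.

```javascript
// Build Puppeteer launch options: a visible browser with DevTools open,
// plus the two Chromium flags mentioned above. Extra flags can be appended.
function buildLaunchOptions(extraArgs = []) {
  return {
    headless: false,
    devtools: true,
    args: ['--disable-web-security', '--allow-running-insecure-content', ...extraArgs],
  };
}

// Puppeteer is required lazily so the options builder can be inspected on its own.
async function launchBrowser(extraArgs) {
  const puppeteer = require('puppeteer');
  return puppeteer.launch(buildLaunchOptions(extraArgs));
}
```

Keeping the options in one function makes it easy to toggle a flag and rerun the script while comparing the logged requests.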

Please let us know how it goes! HAR has been fun to discover!


Published: 2023-10-31 19:12:39
Source: https://www.elefans.com/category/jswz/34/1547019.html