I am attempting to test the page publicindex.sccourts/anderson/publicindex/. When navigating to the page with a standard browser, the navigation ends at the requested page (publicindex.sccourts/anderson/publicindex/), which displays an "accept" button.
However, when testing with puppeteer in headless mode, the request is redirected to publicindex.sccourts.
I have a rough idea of what is occurring, but cannot seem to prevent the redirection to publicindex.sccourts when the page is requested using puppeteer. Here is what I believe is occurring with the user-controlled browser:
- The request for the page is sent (assuming a first visit).
- The response is pure JS.
- The JS code specifies to:
  - copy the initial page request headers
  - add a specific header, and re-request the same page (XHR)
  - copy a URL from one of the response headers and replace the location
  (or)
  - check the page history
  - add the URL from the response to the page history
  - open a new window
  - write the XHR response to the new page
  - close the new window
  - add an event listener for a function in the returned XHR response
  - fire the event
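To make the first branch of that flow concrete, here is a minimal sketch of its header handling. It is only a guess at the shape of the logic: the two header names are hypothetical placeholders (the real ones are not identified above), and the helper names `buildReplayHeaders` and `pickTargetUrl` are invented for illustration.

```javascript
// Sketch only: both header names are hypothetical placeholders,
// not the real headers used by the site.
const ADDED_HEADER = 'x-hypothetical-added';    // assumption
const TARGET_HEADER = 'x-hypothetical-target';  // assumption

// Copy the initial request headers and add one specific header.
function buildReplayHeaders(initialHeaders) {
  return { ...initialHeaders, [ADDED_HEADER]: '1' };
}

// Read a URL from the XHR response headers (case-insensitive); null if absent.
function pickTargetUrl(responseHeaders) {
  const wanted = TARGET_HEADER.toLowerCase();
  for (const [name, value] of Object.entries(responseHeaders)) {
    if (name.toLowerCase() === wanted) return value;
  }
  return null;
}

// In a browser the result would then drive the navigation, e.g.:
// const target = pickTargetUrl(headersFromXhr);
// if (target) window.location.replace(target);
```

If the site behaves this way, the value to compare between Chrome and puppeteer is whatever the second helper returns for each of them.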
With puppeteer I have tried tracing the JS, recording a HAR, monitoring cookies, watching the request chain, intercepting page requests and adjusting headers, watching the history, and so on. I'm stumped. Here is the most basic version of the puppeteer script:
    function run () {
      let url = 'publicindex.sccourts/anderson/publicindex/';
      const puppeteer = require('puppeteer');
      const PuppeteerHar = require('puppeteer-har');
      puppeteer.launch({headless: true}).then(async browser => {
        const page = await browser.newPage();
        await page.setJavaScriptEnabled(true);
        await page.setViewport({width: 1920, height: 1280});
        await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
        const har = new PuppeteerHar(page);
        await har.start({path: 'results.har'});
        const response = await page.goto(url);
        await page.waitForNavigation();
        await har.stop();
        let bodyHTML = await page.content();
        console.log(bodyHTML);
      });
    };
    run();
Why can I not get puppeteer to simply replicate the process that the JS executes when I navigate to the page in Chrome, and end navigation on the "accept" page?
Here is a version with more verbose logging:
    function run () {
      let url = 'publicindex.sccourts/anderson/publicindex/';
      const puppeteer = require('puppeteer');
      const PuppeteerHar = require('puppeteer-har');
      puppeteer.launch().then(async browser => {
        const page = await browser.newPage();
        await page.setJavaScriptEnabled(true);
        await page.setViewport({width:1920,height:1280});
        await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
        await page.setRequestInterception(true);
        page.on('frameattached', frame =>{ console.log('frame attached ');});
        page.on('framedetached', frame =>{ console.log('frame detached ');});
        page.on('framenavigated', frame =>{ console.log('frame navigated '); });
        page.on('requestfailed', req =>{ console.log('request failed ');});
        page.on('requestfinished', req =>{ console.log('frame finished '); console.log(req.url())});
        let count = 0;
        let headers = '';
        page.on('request', interceptedRequest => {
          console.log('requesting ' + count + 'times');
          console.log('request for ' + interceptedRequest.url());
          console.log(interceptedRequest);
          if (count>2) {
            interceptedRequest.abort();
            return;
          }
          if (interceptedRequest.url() == url) {
            count++;
            if (count == 1) {
              const headers = interceptedRequest.headers();
              headers['authority'] = 'publicindex.sccourts';
              headers['sec-fetch-dest'] = 'empty';
              headers['sec-fetch-mode'] = 'cors';
              headers['sec-fetch-site'] = 'same-origin';
              headers['upgrade-insecure-requests'] = '1';
              interceptedRequest.continue({headers});
              return;
            } else {
              interceptedRequest.continue();
              return;
            }
          }
          count++;
          interceptedRequest.continue();
          return;
        });
        const har = new PuppeteerHar(page);
        await har.start({ path: 'results.har' });
        await page.tracing.start({path: 'trace.json'});
        await Promise.all([page.coverage.startJSCoverage({reportAnonymousScripts : true})]);
        const response = await page.goto(url);
        const session = await page.target().createCDPSession();
        await session.send('Page.enable');
        await session.send('Page.setWebLifecycleState', {state: 'active'});
        const jsCoverage = await Promise.all([page.coverage.stopJSCoverage()]);
        console.log(jsCoverage);
        const chain = response.request().redirectChain();
        console.log(chain + "\n\n");
        await page.waitForNavigation();
        await har.stop();
        let bodyHTML = await page.content();
        console.log(bodyHTML);
      });
    };
    run();
Solution
I don't have a full resolution, but I know where the redirection is happening.
I tested your script locally with the version below:
    const puppeteer = require('puppeteer');
    const PuppeteerHar = require('puppeteer-har');
    function run () {
      let url = 'publicindex.sccourts/anderson/publicindex/';
      puppeteer.launch({headless: false, devtools: true }).then(async browser => {
        const page = await browser.newPage();
        await page.setRequestInterception(true);
        page.on('request', request => {
          console.log('GOT NEW REQUEST', request.url());
          request.continue();
        });
        page.on('response', response => {
          console.log('GOT NEW RESPONSE', response.status(), response.headers());
        });
        await page.setJavaScriptEnabled(true);
        await page.setViewport({width: 1920, height: 1280});
        await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
        const har = new PuppeteerHar(page);
        await har.start({path: 'results.har'});
        const response = await page.goto(url);
        await page.waitForNavigation();
        await har.stop();
        let bodyHTML = await page.content();
      });
    };
    run();
I edited three parts:
- Removed headless mode and opened the devtools automatically
- Intercepted all network requests (which I then audited)
- Hoisted the require calls to the top of the file, because nesting them hurts my eyes; I always see them placed at the top level
It turns out that the page publicindex.sccourts/anderson/publicindex/ makes a request to publicindex.sccourts/
However, this request returns a 302 redirect to the www.sccourts/caseSearch/ location, so the browser follows it
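To see this kind of hop more directly than by reading the raw logs, something like the following could be plugged into the handlers above. It is a sketch, not part of the script I ran: `describeRedirect` and `redirectHops` are names made up here, and they only assume that the objects passed in expose the same methods as Puppeteer's response and request objects (`status()`, `url()`, `headers()`, `request()`, `redirectChain()`, `response()`).

```javascript
// Status codes that carry a Location header and cause a navigation hop.
const REDIRECT_CODES = new Set([301, 302, 303, 307, 308]);

// Returns a one-line description for redirect responses, or null otherwise.
// `response` is expected to look like a Puppeteer HTTPResponse.
function describeRedirect(response) {
  const status = response.status();
  if (!REDIRECT_CODES.has(status)) return null;
  // Puppeteer reports header names in lower case.
  const location = response.headers()['location'] || '(no Location header)';
  return `${status} ${response.url()} -> ${location}`;
}

// Lists every hop that led to the final response returned by page.goto().
function redirectHops(finalResponse) {
  return finalResponse.request().redirectChain().map(req => {
    const res = req.response();
    return { url: req.url(), status: res ? res.status() : null };
  });
}

// Usage inside the script above (requires puppeteer; not executed here):
// page.on('response', res => {
//   const line = describeRedirect(res);
//   if (line) console.log(line);
// });
// const response = await page.goto(url);
// console.log(redirectHops(response));
```

Note that `redirectChain()` only covers server-side redirects of that one request; a navigation triggered by the page's own JS would show up as a separate request instead.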
I would investigate whether this odd request is legitimate and why it redirects in Chrome under puppeteer
This post might help; Chromium may be treated as insecure in some way
I also tried passing args: ['--disable-web-security', '--allow-running-insecure-content'] in the options object given to launch(), but it made no difference
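For reference, this is roughly how those flags were passed. The helper name `buildLaunchOptions` is only for illustration; the `headless` and `devtools` values are the ones from my test script, and as said, the flags did not change the redirect.

```javascript
// Builds the options object handed to puppeteer.launch().
// The two flags are the ones tried above; extraArgs is optional.
function buildLaunchOptions(extraArgs = []) {
  return {
    headless: false,
    devtools: true,
    args: [
      '--disable-web-security',
      '--allow-running-insecure-content',
      ...extraArgs,
    ],
  };
}

// Usage (requires puppeteer; not executed here):
// const browser = await puppeteer.launch(buildLaunchOptions());
```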
Please let us know how it goes! HAR has been fun to discover!