Is it possible to pass the page and browser from one route to another while keeping Puppeteer concurrent? If I store them in a global variable, each new request overwrites the page and browser, so multiple sessions cannot run at the same time.
Recommended answer:
One approach is to create a closure that returns promises that will resolve to the same page and browser instances. Since HTTP is stateless, I assume you have some session/authentication management system that associates a user's session with a Puppeteer browser instance.
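To illustrate the closure idea in isolation, here is a minimal sketch of a per-token store that caches one launch promise per token, so concurrent requests with the same token resolve to the same browser and page. The names `createSessionStore`, `get`, and `close` are hypothetical and not part of Puppeteer or Express; `launch` is assumed to be any async function that resolves to `{browser, page}`.

```javascript
// Hypothetical helper (not a Puppeteer or Express API): the closure keeps
// a private Map from token to the promise of that token's session.
function createSessionStore(launch) {
  const sessions = new Map();

  return {
    // Returns the same promise for every call with the same token, so
    // two simultaneous requests trigger only one launch.
    get(token) {
      if (!sessions.has(token)) {
        const pending = Promise.resolve().then(launch);
        // If the launch fails, forget the entry so a later call can retry.
        pending.catch(() => {
          if (sessions.get(token) === pending) sessions.delete(token);
        });
        sessions.set(token, pending);
      }
      return sessions.get(token);
    },

    // Closes the browser for a token. Resolves to false if no session exists.
    async close(token) {
      const pending = sessions.get(token);
      if (!pending) return false;
      sessions.delete(token);
      const {browser} = await pending;
      await browser.close();
      return true;
    },
  };
}
```

With Puppeteer, `launch` could be a function that calls `puppeteer.launch()` and then `browser.newPage()`, and a route handler would do something like `const {page} = await store.get(req.query.token);`. Storing the promise rather than the resolved value is a design choice: it claims the token's slot before the first `await`, so two overlapping requests cannot both launch a browser.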
To make a complete, runnable example, I've simplified your routes a bit and added a naive token management system that associates a user with a session. I don't think you'll have problems adapting it to your use case.
    const express = require("express");
    const puppeteer = require("puppeteer");

    // https://stackoverflow.com/questions/51391080/handling-errors-in-express-async-middleware
    const asyncHandler = fn => (req, res, next) =>
      Promise.resolve(fn(req, res, next)).catch(next)
    ;

    const startPuppeteerSession = async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      return {browser, page};
    };

    // Maps each client token to its {browser, page} pair.
    const sessions = {};

    express()
      // Reject any request that does not carry a token.
      .use((req, res, next) =>
        req.query.token === undefined ? res.sendStatus(401) : next()
      )
      .get("/start", asyncHandler(async (req, res) => {
        sessions[req.query.token] = await startPuppeteerSession();
        res.sendStatus(200);
      }))
      .get("/navigate", asyncHandler(async (req, res) => {
        const page = await sessions[req.query.token].page;
        await page.goto(req.query.to || "https://www.example.com");
        res.sendStatus(200);
      }))
      .get("/content", asyncHandler(async (req, res) => {
        const page = await sessions[req.query.token].page;
        res.send(await page.content());
      }))
      .get("/kill", asyncHandler(async (req, res) => {
        const browser = await sessions[req.query.token].browser;
        await browser.close();
        delete sessions[req.query.token];
        res.sendStatus(200);
      }))
      // Any error, including a lookup for an unknown token, ends up here.
      .use((err, req, res, next) => res.sendStatus(500))
      .listen(8000, () => console.log("listening on port 8000"))
    ;

Example usage from the client's perspective:
    $ curl localhost:8000/start?token=1
    OK
    $ curl 'localhost:8000/navigate?to=https://stackoverflow.com/questions/66935883&token=1'
    OK
    $ curl localhost:8000/content?token=1 | grep 'apsenT'
    <a href="/users/15547056/apsent">apsenT</a><span class="d-none" itemprop="name">apsenT</span>
    <a href="/users/15547056/apsent">apsenT</a> is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    <a href="/users/15547056/apsent">apsenT</a> is a new contributor. Be nice, and check out our <a href="/conduct">Code of Conduct</a>.
    $ curl localhost:8000/kill?token=1
    OK
You can see the client associated with token 1 has persisted a single browser session across multiple routes. Other clients can launch browser sessions and manipulate them simultaneously.
To reiterate, this is only a proof-of-concept of sharing a Puppeteer browser instance across routes. Using the code above, a user can just spam the start route and create browsers until the server crashes, so this is totally unfit for production without real authentication and session management/error handling.
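As one small step toward that hardening, the start route could refuse to launch a browser when the token already has a session or when too many sessions are open. The sketch below is an assumption layered on top of the answer, not part of it: `checkStart` and `MAX_SESSIONS` are hypothetical names, and the limit of 5 is arbitrary.

```javascript
// Hypothetical guard for the start route; MAX_SESSIONS is an assumed limit.
const MAX_SESSIONS = 5;

// Returns the HTTP status the start route should respond with:
// 409 if the token already owns a session, 503 if the server is at
// capacity, and 200 if it is fine to launch a new browser.
function checkStart(sessions, token, max = MAX_SESSIONS) {
  // Use hasOwnProperty so tokens like "toString" do not match
  // properties inherited from Object.prototype.
  if (Object.prototype.hasOwnProperty.call(sessions, token)) return 409;
  if (Object.keys(sessions).length >= max) return 503;
  return 200;
}
```

In the start handler this might be used as `const status = checkStart(sessions, req.query.token); if (status !== 200) return res.sendStatus(status);` before launching. Note that the check is only reliable if the token's slot is claimed before the first `await`, for example by storing the launch promise in `sessions` immediately. Otherwise, overlapping requests can all pass the check before any of them records a session.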
Packages used: express ^4.17.1, puppeteer ^8.0.0.