Saving a page's HTML output after executing the page's JavaScript

Updated: 2024-10-19 12:36:26

Problem description

There is a site I am trying to scrape that first loads an HTML/JS page, modifies the form input fields using JS, and then POSTs the form. How can I get the final HTML output of the POSTed page?

I tried to do this with PhantomJS, but it seems to only have an option to render image files. Googling around suggests it should be possible, but I can't figure out how. My attempt:

    var page = require('webpage').create();
    var fs = require('fs');

    page.open('www.somesite/page.aspx', function () {
        page.evaluate(function () {
        });
        page.render('export.png');
        fs.write('1.html', page.content, 'w');
        phantom.exit();
    });

This code will be used by a client, and I can't expect him to install too many packages (nodejs, casperjs, etc.).

Thanks

Recommended answer

The output code you have is correct, but there is a timing issue: the output lines are executed before the page has finished loading. You can hook into the onLoadFinished callback to find out when that happens. See the full code below.

    var page = new WebPage();
    var fs = require('fs');

    // Save the screenshot and the HTML only once the load has finished.
    page.onLoadFinished = function () {
        console.log("page load finished");
        page.render('export.png');
        fs.write('1.html', page.content, 'w');
        phantom.exit();
    };

    page.open("www.google", function () {
        page.evaluate(function () {
        });
    });
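The page in the question loads once and then POSTs a form, so onLoadFinished would presumably fire more than once, and the HTML worth saving is the one from the last load. The sketch below is one possible way to wait for that: makeLoadHandler is a hypothetical helper (not part of PhantomJS), the count of 2 loads is an assumption about the target site (frames or redirects could change it), and http://example.com/page.aspx is a placeholder URL. The PhantomJS part only runs when the phantom global exists.

```javascript
// Hypothetical helper: builds an onLoadFinished handler that reports
// once after `expectedLoads` successful loads, or once on the first failure.
function makeLoadHandler(expectedLoads, onFinalLoad) {
    var seen = 0;
    var finished = false;
    return function (status) {
        if (finished) {
            return; // ignore any loads after we have already reported
        }
        if (status !== 'success') {
            finished = true;
            onFinalLoad(new Error('page load failed with status: ' + status));
            return;
        }
        seen += 1;
        if (seen === expectedLoads) {
            finished = true;
            onFinalLoad(null);
        }
    };
}

// Usage under PhantomJS only; skipped in any other JavaScript runtime.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var fs = require('fs');

    // Assumption: load 1 is the initial page, load 2 is the POSTed result.
    page.onLoadFinished = makeLoadHandler(2, function (err) {
        if (err) {
            console.log(err.message);
            phantom.exit(1);
            return;
        }
        fs.write('1.html', page.content, 'w');
        phantom.exit();
    });

    page.open('http://example.com/page.aspx'); // placeholder URL
}
```

If the saved file turns out to be the first page rather than the POSTed one, logging the status and page.url inside the handler is a quick way to see how many loads the site really performs.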

Testing against a site like Google can be deceiving: it loads so quickly that you can often grab the screen inline, the way your code does, and still get a result. Timing is a tricky thing in PhantomJS; sometimes I test with setTimeout to see whether timing is the issue.
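The setTimeout test mentioned above could look roughly like the following sketch. saveContentAfterDelay is a hypothetical helper name, the 2000 ms delay is an arbitrary value to experiment with, and http://example.com/ is a placeholder URL. The helper reads page.content only when the timer fires, so any DOM changes made by the page's scripts in the meantime are included in what gets written.

```javascript
// Hypothetical helper: wait `delayMs`, then write the page's current HTML.
// `page` needs a `content` property and `fs` a `write(path, data, mode)` method,
// which matches the PhantomJS webpage and fs modules used in the answer.
function saveContentAfterDelay(page, fs, outputPath, delayMs, done) {
    setTimeout(function () {
        // Read page.content here, not earlier, to capture late DOM changes.
        fs.write(outputPath, page.content, 'w');
        done();
    }, delayMs);
}

// Usage under PhantomJS only; skipped in any other JavaScript runtime.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var fs = require('fs');

    page.open('http://example.com/', function (status) { // placeholder URL
        if (status !== 'success') {
            console.log('page load failed');
            phantom.exit(1);
            return;
        }
        // If the output changes as the delay grows, timing is the issue.
        saveContentAfterDelay(page, fs, '1.html', 2000, function () {
            phantom.exit();
        });
    });
}
```

A fixed delay is a diagnostic rather than a fix: once it confirms that timing is the problem, waiting on a concrete event such as onLoadFinished is usually more reliable than guessing a number of milliseconds.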

Published: 2023-10-08 01:31:56
Article link: https://www.elefans.com/category/jswz/34/1471121.html