Read a webpage programmatically and extract some information

Updated: 2024-10-23 04:52:58

I want to access a webpage programmatically and extract some information from it.

I want to log in to a website through Java code and make the server believe that the request is actually coming from a real browser.

I am almost there, except for one problem: the website requires a parameter, "sessid", to be passed with every request, and its value changes with every request.

For example, when I first access the page, sessid=90334, and on the next page it is something like sessid=78204.

Therefore, the URL I pass must contain the current value of sessid, otherwise authentication fails: www.somesite.com/somepage.php?sessid=75749.

The webpage contains an <input> tag that holds the value of sessid, and I have to retrieve the value of that tag.

How can I do that? The tag looks like this:

<input type="hidden" name="sessid" value="69529">

I am able to read the whole webpage successfully using the following code:

BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
StringBuilder response = new StringBuilder();
String line;
while ((line = rd.readLine()) != null) {
    response.append(line);
}

Best answer

You can use the indexOf method of the StringBuilder class:

String startInputFragment = "<input type=\"hidden\" name=\"sessid\" value=\"";
int startIdx = response.indexOf(startInputFragment);
if (startIdx >= 0) {
    // The value starts right after the fragment and ends at the closing quote.
    int valueStart = startIdx + startInputFragment.length();
    int endIdx = response.indexOf("\"", valueStart);
    if (endIdx >= 0) {
        String val = response.substring(valueStart, endIdx);
        System.out.println("-->" + val + "<--");
    }
} else {
    // Tag not found: throw an exception or handle it some other way.
}
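
The indexOf approach only matches when the attributes appear in exactly that order and with exactly that spacing. If the site ever emits the tag differently, a slightly more tolerant variant may help. The following is a minimal sketch, not part of the original answer: the class name SessidExtractor and the helpers extractSessid and withSessid are hypothetical, the http:// scheme on the example URL is an assumption, and the regular expressions assume quoted attribute values that contain no ">" character. A real HTML parser would be more robust than either approach.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class SessidExtractor {

    // Matches one complete <input ...> tag (assumes no '>' inside attribute values).
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches name="sessid" (single or double quotes) inside a tag.
    private static final Pattern NAME_SESSID =
            Pattern.compile("\\bname\\s*=\\s*[\"']sessid[\"']", Pattern.CASE_INSENSITIVE);

    // Captures the quoted content of the value attribute inside a tag.
    private static final Pattern VALUE_ATTR =
            Pattern.compile("\\bvalue\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the value of the first <input> named "sessid", regardless of attribute order.
    static Optional<String> extractSessid(CharSequence html) {
        Matcher tags = INPUT_TAG.matcher(html);
        while (tags.find()) {
            String tag = tags.group();
            if (NAME_SESSID.matcher(tag).find()) {
                Matcher value = VALUE_ATTR.matcher(tag);
                if (value.find()) {
                    return Optional.of(value.group(1));
                }
            }
        }
        return Optional.empty();
    }

    // Appends the sessid as a query parameter; assumes baseUrl has no query string yet
    // and that the sessid needs no URL encoding (it is numeric in the question).
    static String withSessid(String baseUrl, String sessid) {
        return baseUrl + "?sessid=" + sessid;
    }

    public static void main(String[] args) {
        // Stand-in for the StringBuilder filled by the reading loop in the question.
        StringBuilder response = new StringBuilder(
                "<form><input value=\"69529\" type=\"hidden\" name=\"sessid\"></form>");

        String next = extractSessid(response)
                .map(id -> withSessid("http://www.somesite.com/somepage.php", id))
                .orElseThrow(() -> new IllegalStateException("sessid input not found"));

        System.out.println(next); // prints http://www.somesite.com/somepage.php?sessid=69529
    }
}
```

Because extractSessid accepts any CharSequence, the response StringBuilder from the question can be passed in directly, and the returned Optional makes the "tag not found" case explicit at the call site.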

Published: 2023-04-29 10:09:00