Open a connection with Jsoup, get the status code, and parse the document

Updated: 2024-10-26 04:22:21
Problem description

I'm creating a class using jsoup that will do the following:

  • The constructor opens a connection to the URL.
  • I have a method that checks the status of the page, i.e. 200, 404, etc.
  • I have a method that parses the page and returns a list of URLs.

Below is a rough version of what I am trying to do. Note that it is very rough, as I've been trying a lot of different things:

    public class ParsePage {
        private String path;
        Connection.Response response = null;

        private ParsePage(String langLocale) {
            try {
                response = Jsoup.connect(path)
                        .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                        .timeout(10000)
                        .execute();
            } catch (IOException e) {
                System.out.println("io - " + e);
            }
        }

        public int getSitemapStatus() {
            int statusCode = response.statusCode();
            return statusCode;
        }

        public ArrayList<String> getUrls() {
            ArrayList<String> urls = new ArrayList<String>();
        }
    }

As you can see, I can get the page status, but I don't know how to get the document to parse using the connection already opened in the constructor. I tried using:

    Document doc = connection.get();

But that's a no-go. Any suggestions? Or are there better ways to go about this?

Recommended answer

As stated in the jsoup documentation for the Connection.Response type, there is a parse() method that parses the response's body as a Document and returns it. Once you have that, you can do whatever you want with it.

For example, see getUrls():

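    import java.io.IOException;
    import java.util.ArrayList;

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class ParsePage {
        // NOTE: path must be assigned before Jsoup.connect(path) is called;
        // the original does not show how it is derived from langLocale.
        private String path;
        Connection.Response response = null;

        private ParsePage(String langLocale) {
            try {
                // execute() performs the request and keeps the full response
                response = Jsoup.connect(path)
                        .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                        .timeout(10000)
                        .execute();
            } catch (IOException e) {
                System.out.println("io - " + e);
            }
        }

        public int getSitemapStatus() {
            int statusCode = response.statusCode();
            return statusCode;
        }

        // parse() can throw IOException, so it is declared here
        public ArrayList<String> getUrls() throws IOException {
            ArrayList<String> urls = new ArrayList<String>();
            Document doc = response.parse();
            // do whatever you want, for example retrieving the <url> from the sitemap
            for (Element url : doc.select("url")) {
                urls.add(url.select("loc").text());
            }
            return urls;
        }
    }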
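
One detail worth keeping in mind for the status check: by default, jsoup's execute() throws an IOException for HTTP error responses (4xx and 5xx), so statusCode() would never get the chance to return 404. Calling ignoreHttpErrors(true) on the connection makes such responses come back normally. The following is a minimal sketch of that idea, not the answerer's code: the class name SitemapSketch, the helpers fetch and extractLocs, and the example.com URLs are all made up for illustration. The main method parses an in-memory sitemap with the XML parser so it runs without network access.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

class SitemapSketch {

    // Hypothetical helper: one request whose response serves both
    // the status check and the later parse() call.
    static Connection.Response fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .timeout(10000)
                .ignoreHttpErrors(true) // hand back 4xx/5xx responses instead of throwing
                .execute();
    }

    // Hypothetical helper: collect the text of every <loc> directly under a <url>.
    static List<String> extractLocs(Document doc) {
        List<String> urls = new ArrayList<>();
        for (Element loc : doc.select("url > loc")) {
            urls.add(loc.text());
        }
        return urls;
    }

    public static void main(String[] args) {
        // In-memory sitemap (illustrative data), parsed with jsoup's XML parser.
        String xml = "<urlset>"
                + "<url><loc>https://example.com/a</loc></url>"
                + "<url><loc>https://example.com/b</loc></url>"
                + "</urlset>";
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        System.out.println(extractLocs(doc));
    }
}
```

With a live URL, the same pattern would be: call fetch(url), inspect statusCode() on the returned response, and only call parse() on it when the status is one you want to handle.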

Published: 2023-11-16 12:57:39
Article link: https://www.elefans.com/category/jswz/34/1604487.html