Parsing Information from URL Using Jsoup

Updated: 2024-10-21 11:43:45

I need help with a Java project that uses Jsoup (if you think there is a more efficient way to do this, please let me know). The program should parse certain useful information from different URLs and write it to a text file. I am not an expert in HTML or JavaScript, so it has been difficult for me to express in Java exactly what I want to parse. For the website in the code below, which is one example, the information I want to extract with Jsoup is everything in the table under "Routing" (Route, Location, Vessel/Voyage, Container Arrival Date, Container Departure Date; for example: Origin, Seattle SSA Terminal T18, 26 Jun 15 A, 26 Jun 15 A, and so on). So far, Jsoup only gives us the title of the page; we have not managed to get any of the body. Here is the code I used, which I got from an online source:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class Jsouptest71115 {
        public static void main(String[] args) throws Exception {
            // No whitespace between the path and the query string
            String url = "http://google.com/gentrack/trackingMain.do"
                    + "?trackInput01=999061985";

            // Fetch the page with a GET request and parse it
            Document document = Jsoup.connect(url).get();

            String title = document.title();
            System.out.println("title : " + title);

            // Text content of the <body> element
            String body = document.select("body").text();
            System.out.println("Body: " + body);
        }
    }

Best answer

Working code:

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    import java.io.IOException;
    import java.util.ArrayList;

    public class Sample {
        public static void main(String[] args) {
            String url = "http://homeport8.apl.com/gentrack/blRoutingPopup.do";
            try {
                // Send the tracking number as form data in a POST request
                Connection.Response response = Jsoup.connect(url)
                        .data("blNbr", "999061985") // tracking number
                        .method(Connection.Method.POST)
                        .execute();

                // Pick the nested table that holds the routing rows
                Element tableElement = response.parse().getElementsByTag("table")
                        .get(2).getElementsByTag("table")
                        .get(2);

                Elements trElements = tableElement.getElementsByTag("tr");
                ArrayList<ArrayList<String>> tableArrayList = new ArrayList<>();

                // Copy the first five cells of each row (assumes every row has at least five)
                for (Element trElement : trElements) {
                    ArrayList<String> columnList = new ArrayList<>();
                    for (int i = 0; i < 5; i++) {
                        columnList.add(i, trElement.children().get(i).text());
                    }
                    tableArrayList.add(columnList);
                }

                // Indexes are row number, then column number
                System.out.println("Origin/Location: "
                        + tableArrayList.get(1).get(1));
                System.out.println("Discharge Port/Container Arrival Date: "
                        + tableArrayList.get(5).get(3));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

Output:

Origin/Location: SEATTLE SSA TERMINAL (T18), WA  

Discharge Port/Container Arrival Date: 23 Jul 15  E
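
The question also asks for the parsed values to end up in a text file, which the code above does not cover. The following is a minimal sketch of one way to do that with the standard library only, assuming the rows have already been collected into a list of string lists such as tableArrayList. The class name RoutingTableWriter, the method names, the file name routing.txt, and the tab-separated layout are all hypothetical choices made for illustration, and the rows in main are placeholder values based on the question, not real scraped output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Jsoup or of the original answer.
public class RoutingTableWriter {

    // Joins the cells of each row with tabs, so one table row becomes one line.
    public static List<String> toLines(List<? extends List<String>> rows) {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            lines.add(String.join("\t", row));
        }
        return lines;
    }

    // Writes the rows to the target file as UTF-8, replacing existing content.
    public static void writeTable(List<? extends List<String>> rows, Path target)
            throws IOException {
        Files.write(target, toLines(rows), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder rows standing in for tableArrayList (illustrative values only).
        List<List<String>> rows = new ArrayList<>();
        rows.add(Arrays.asList("Route", "Location", "Vessel/Voyage",
                "Container Arrival Date", "Container Departure Date"));
        rows.add(Arrays.asList("Origin", "SEATTLE SSA TERMINAL (T18), WA", "",
                "26 Jun 15 A", "26 Jun 15 A"));

        // Assumed output file name; adjust to wherever the results should go.
        writeTable(rows, Paths.get("routing.txt"));
    }
}
```

Because writeTable accepts any list of string lists, the tableArrayList built in the working code could be passed to it directly. To collect results from several tracking numbers into one file, one option would be to gather all rows first and write once at the end.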


Published: 2023-08-07 09:44:00
Link: https://www.elefans.com/category/jswz/34/1463773.html