hpple XPath query for complex HTML

Updated: 2024-10-28 06:21:11

Problem description

I have a complex HTML file that I need to parse in Objective-C. The HTML looks like this:

<HTML> <TABLE width="100%" border="0" cellpadding="0" cellspacing="0"> <tr> <td width="10" align="left" valign="top"><img src="www.indianrail.gov.in/main_text_left_top2.gif" alt="" width="8" height="8"></td> <td width="100%" align="left" valign="top" class="text_rail_top"><img src="www.indianrail.gov.in/blank.gif" alt="" width="1" height="8"></td> <td width="10" align="right" valign="top"><img src="www.indianrail.gov.in/main_text_rgt_top2.gif"alt="" width="8" height="8" ></td> </tr> <tr> <td height="400" align="right" valign="top" class="text_rail_left"></td> <td width="100%" align="left" valign="top" class="text_back_color"><table border="0" cellPadding="0" cellSpacing="0" width="100%"><tr> <td align="left" valign="top"><table width="100%" border="0" cellspacing="2" cellpadding="0"><tr> <td align="middle"> <FONT SIZE = "1">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Indian Railways Online Website: <b><a TITLE = "Passenger Reservation System - CONCERT" href="www.indianrail.gov.in/index.html" target="_blank">www.indianrail.gov.in</b></a> designed and hosted by CRIS.</FONT> </td></tr></table></td> </tr><tr> <td align="left" valign="top"><table width="100%" border="0" cellspacing="2" cellpadding="0"> <tr> <td><table border="0" width="100%" /></td> </tr> <tr> <td align="center" valign="top" class="inside_heading_text" colspan="4"><br />Trains Between A Pair of Stations </td> </tr> <td colspan="4"> </td> </tr> <tr> <td colspan="4" align="center" valign="top" class="Enq_heading"> You Queried For <SCRIPT LANGUAGE="JavaScript" SRC= "/js/inet_srcdest.js"> function getCookie(www.indianrail.gov.in/tbisip_400x400.htm)</SCRIPT> <link href="www.indianrail.gov.in/cris_google.css" media="all" rel="Stylesheet" type="text/css" /> <script language ="JavaScript"> var searchQuery ='MUMBAI CENTRAL DELHI ' </script><FORM NAME="Accavl" METHOD="POST" ACTION="www.indianrail.gov.in/cgi_bin/inet_accavl_cgi1.cgi"> <TR> <TD valign="top"><table width="98%" border="0" align="center" cellpadding="3" 
cellspacing="1" class="table_border"> <TR class="heading_table_top"> <TH>Origin</TH> <TH>Destination</TH> </TR> <TR> <TD class="table_border_both">MUMBAI CENTRAL -[BCT ]</TD> <TD class="table_border_both">DELHI -[DLI ]</TD> </TR> </TABLE> </TD></TR> <TR><td> </td></TR> <TR> <td class="main_text">Enter Quota:</td> <td><SELECT NAME="lccp_quota" SIZE="1" > <OPTION VALUE="CK">Tatkal Quota <OPTION VALUE="LD">Ladies Quota <OPTION VALUE="DF">Defence Quota <OPTION VALUE="FT">Foreign Tourist Quota <OPTION VALUE="SS">Lower Berth Quota$ <OPTION VALUE="YU">Yuva Quota <OPTION VALUE="DP">Duty Pass Quota <OPTION VALUE="HP">Handicaped Quota <OPTION VALUE="PH">Parliament House <OPTION selected VALUE="GN">General Quota </SELECT></TD></tr> <tr> <td class="main_text">Journey Date:</td><td><INPUT NAME="lccp_day" SIZE="2" VALUE="11" onchange="return changedate()"><SELECT NAME="lccp_month" SIZE="1" onClick="return changedate()"><OPTION selected VALUE="5">May<OPTION VALUE="6">Jun<OPTION VALUE="7">Jul</SELECT></TD></tr><INPUT TYPE="HIDDEN" NAME="lccp_classopt" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class1" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class2" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class3" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class4" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class5" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class6" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class7" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class8" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class9" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_cls10" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_age" SIZE="2" VALUE="ADULT_AGE"><tr> <td>&nbsp;</td><td><INPUT TYPE="Button" CLASS="btn_style" NAME="lccp_submitacc" ONCLICK="return submitavailability(0)" VALUE="Get Availability">&nbsp;<INPUT TYPE="Button" CLASS="btn_style" NAME="lccp_submitfare" ONCLICK="return submitfare(0)" VALUE="Get Full 
Fare">&nbsp;<INPUT TYPE="Button" CLASS="btn_style" NAME="lccp_submitpath" ONCLICK="return submitroute(0)" VALUE="Get Schedule">&nbsp;<INPUT TYPE="BUTTON" CLASS="btn_style" NAME="lccp_submitrun" ONCLICK="return submitrun(0)" VALUE="Get Running Status"></td></tr></table><br> <TABLE BORDER ALIGN=center><TABLE width="98%" border="1" bordercolor="#993300" align="center" cellpadding="3" cellspacing="1" class="table_border_both_left"><tr class="heading_table_top"> <TH ROWSPAN = 2 width="9%" >Train No.</TH> <TH ROWSPAN = 2 width="20%" >Train Name</TH> <TH ROWSPAN = 2 width="15%" >Origin</TH> <TH ROWSPAN = 2 width="8%" >Dep.Time</TH> <TH ROWSPAN = 2 width="14%" >Destination</TH> <TH ROWSPAN = 2 width="7%" >Arr.Time</TH> <TH COLSPAN = 7 width="7%" >Days Of Run</TH> <TH COLSPAN = 10 width="7%">Classes</TH> </TR> <TR class="heading_table_top"> <TH width="3%">M</TH> <TH width="3%">T</TH> <TH width="3%">W</TH> <TH width="3%">T</TH> <TH width="3%">F</TH> <TH width="3%">S</TH> <TH width="3%">S</TH> <TH width="3%">1A</TH> <TH width="3%">2A</TH> <TH width="3%">FC</TH> <TH width="3%">3A</TH> <TH width="3%">CC</TH> <TH width="3%">SL</TH> <TH width="3%">2S</TH> <TH width="3%">3E</TH> </TR> <TR><TD><INPUT TYPE="RADIO" NAME="lccp_trndtl" VALUE="19019BDTSNZM YYYYYYYY "ONCLICK="return farefill('19019BDTSNZM YYYYYYYY ','19019','BDTS',0,0,1,0,1,0,1,0,0,0,0)" CHECKED>19019</TD> <TD ALIGN =Center TITLE = " Please look the following same trains list also "><A HREF="#SAMETRN">+DEHRADUN EXP </A><A NAME="BACKSAMETRN"></A> <TD ALIGN =Center TITLE="Station CodeBDTS">BANDRA TERMINUS</TD> <TD ALIGN = Center>00:05</TD> <TD ALIGN = Center TITLE="Station Code NZM ">H NIZAMUDDIN </TD> <TD ALIGN = Center>05:25</TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD>-</TD> <TD><INPUT 
TYPE="RADIO" Name="lccp_class2" VALUE="2A" ONCLICK="return deselectclass(1,0,1,0,1,0,1,0,0,0,0,'N','Y','N','N','N','N','N','N','N','N')" CHECKED> <TD>-</TD> <TD><INPUT TYPE="RADIO" Name="lccp_class4" VALUE="3A" ONCLICK="return deselectclass(1,0,1,0,1,0,1,0,0,0,0,'N','N','N','Y','N','N','N','N','N','N')"> <TD>-</TD> <TD><INPUT TYPE="RADIO" Name="lccp_class6" VALUE="SL" ONCLICK="return deselectclass(1,0,1,0,1,0,1,0,0,0,0,'N','N','N','N','N','Y','N','N','N','N')"> <TD>-</TD> <TD>-</TD> </TR></FONT> <TR><TD><INPUT TYPE="RADIO" NAME="lccp_trndtl" VALUE="19023BCT NDLSYYYYYYYY "ONCLICK="return farefill('19023BCT NDLSYYYYYYYY ','19023','BCT ',0,0,0,0,0,0,2,1,0,0,0)">19023</TD> <TD ALIGN =Center TITLE = " Please look the following same trains list also "><A HREF="#SAMETRN">+FZR JANATA EXP </A><A NAME="BACKSAMETRN"></A> <TD ALIGN =Center TITLE="Station CodeBCT ">MUMBAI CENTRAL </TD> <TD ALIGN = Center>07:25</TD> <TD ALIGN = Center TITLE="Station Code NDLS">NEW DELHI </TD> <TD ALIGN = Center>12:45</TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD><FONT COLOR = green><B>Y</B></TD> <TD>-</TD> <TD>-</TD> <TD>-</TD> <TD>-</TD> <TD>-</TD> <TD><INPUT TYPE="RADIO" Name="lccp_class6" VALUE="SL" ONCLICK="return deselectclass(2,0,0,0,0,0,2,1,0,0,0,'N','N','N','N','N','Y','N','N','N','N')"> <TD><INPUT TYPE="RADIO" Name="lccp_class7" VALUE="2S" ONCLICK="return deselectclass(2,0,0,0,0,0,2,1,0,0,0,'N','N','N','N','N','N','Y','N','N','N')"> <TD>-</TD> </TR></FONT> </TABLE> </BODY> </HTML>

I want to parse the HTML with hpple to get the following output:

19019 BANDRA TERMINUS 00:05 H NIZAMUDDIN 05:25 2A 3A SL
19023 MUMBAI CENTRAL 07:25 NEW DELHI 12:45 SL 2S

I started with the following XPath query:

NSString *tutorialsXpathQueryString = @"//table[@class='table_border_both_left']//td";

But it returns far too many results, which are difficult to parse further. Can someone help me with an XPath query that lets me parse this more efficiently?

Thanks!

Recommended answer

You can locate table rows with this:

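// driver is an existing Selenium WebDriver instance; element names are lower-cased to match the query in the question
List<WebElement> tableRows = driver.findElements(By.xpath("//table[@class='table_border_both_left']//tr[not(@class='heading_table_top')]"));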
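
To see what the row expression selects without a browser, here is a minimal sketch that evaluates it with the JDK's built-in XPath engine instead of Selenium or hpple. The SAMPLE string is a hand-simplified, well-formed stand-in for the page above (an assumption, since the real markup is not valid XML), and the class and method names are hypothetical. Element names are lower case because this parser matches names case-sensitively.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RowXPathDemo {
    // The asker's original query: every cell of the results table.
    static final String ALL_CELLS_XPATH =
            "//table[@class='table_border_both_left']//td";

    // The answer's query: only rows that are not header rows.
    static final String ROWS_XPATH =
            "//table[@class='table_border_both_left']"
            + "//tr[not(@class='heading_table_top')]";

    // Simplified, well-formed stand-in for the real page (assumption).
    static final String SAMPLE =
            "<html><body><table class='table_border_both_left'>"
            + "<tr class='heading_table_top'><th>Train No.</th><th>Train Name</th></tr>"
            + "<tr><td>19019</td><td>DEHRADUN EXP</td></tr>"
            + "<tr><td>19023</td><td>FZR JANATA EXP</td></tr>"
            + "</table></body></html>";

    // Parses the markup as XML and returns the nodes matched by the XPath.
    static NodeList select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints "cells: 4" and "data rows: 2" for the sample above.
        System.out.println("cells: " + select(SAMPLE, ALL_CELLS_XPATH).getLength());
        System.out.println("data rows: " + select(SAMPLE, ROWS_XPATH).getLength());
    }
}
```

Selecting rows first keeps the cells of each train grouped together, which is what makes the per-row lookups possible.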

Within each row, find the expected data:

for (WebElement row : tableRows) {
    // Relative XPaths are evaluated against the current row.
    String trainNo = row.findElement(By.xpath("td[1]")).getText();      // or use xpath: td[1]/text()
    String origin = row.findElement(By.xpath("td[3]")).getText();       // or use xpath: td[3]/text()
    String deptTime = row.findElement(By.xpath("td[4]")).getText();     // or use xpath: td[4]/text()
    String destination = row.findElement(By.xpath("td[5]")).getText();  // or use xpath: td[5]/text()
    String arrTime = row.findElement(By.xpath("td[6]")).getText();      // or use xpath: td[6]/text()

    // Class radio buttons: every input except the train-selection radio.
    // or use xpath: //table[@class='table_border_both_left']//tr[not(@class='heading_table_top')]//td//input[not(@name='lccp_trndtl')]//@value
    List<WebElement> radioButtons = row.findElements(By.xpath("td//input[not(@name='lccp_trndtl')]"));
    for (WebElement radio : radioButtons) {
        String value = radio.getAttribute("value");
    }
}
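
The same per-row lookups can be tried end to end without Selenium. The sketch below again uses the JDK's XPath engine on a simplified, well-formed sample (an assumption: the day-of-run cells and most empty class cells are omitted, and attribute names are already lower case). TrainRowExtractor and extract are hypothetical names, and normalize-space() is my addition to trim the padding inside cells.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TrainRowExtractor {
    static final String ROWS_XPATH =
            "//table[@class='table_border_both_left']"
            + "//tr[not(@class='heading_table_top')]";

    // Simplified stand-in for two result rows of the real page (assumption).
    static final String SAMPLE =
            "<table class='table_border_both_left'>"
            + "<tr class='heading_table_top'><th>Train No.</th></tr>"
            + "<tr><td><input type='radio' name='lccp_trndtl' value='19019BDTSNZM'/>19019</td>"
            + "<td>+DEHRADUN EXP </td><td>BANDRA TERMINUS</td><td>00:05</td>"
            + "<td>H NIZAMUDDIN </td><td>05:25</td><td>-</td>"
            + "<td><input type='radio' name='lccp_class2' value='2A'/></td>"
            + "<td><input type='radio' name='lccp_class4' value='3A'/></td>"
            + "<td><input type='radio' name='lccp_class6' value='SL'/></td></tr>"
            + "<tr><td><input type='radio' name='lccp_trndtl' value='19023BCT NDLS'/>19023</td>"
            + "<td>+FZR JANATA EXP </td><td>MUMBAI CENTRAL </td><td>07:25</td>"
            + "<td>NEW DELHI </td><td>12:45</td><td>-</td>"
            + "<td><input type='radio' name='lccp_class6' value='SL'/></td>"
            + "<td><input type='radio' name='lccp_class7' value='2S'/></td></tr>"
            + "</table>";

    // Returns one space-separated line per data row.
    static List<String> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xp.evaluate(ROWS_XPATH, doc, XPathConstants.NODESET);

            List<String> lines = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                StringBuilder line = new StringBuilder();
                // Columns: train no., origin, departure, destination, arrival.
                for (int col : new int[] {1, 3, 4, 5, 6}) {
                    line.append(xp.evaluate("normalize-space(td[" + col + "])", row)).append(' ');
                }
                // Class codes: value of every input except the train-selection radio.
                NodeList values = (NodeList) xp.evaluate(
                        "td//input[not(@name='lccp_trndtl')]/@value", row, XPathConstants.NODESET);
                for (int j = 0; j < values.getLength(); j++) {
                    line.append(values.item(j).getNodeValue()).append(' ');
                }
                lines.add(line.toString().trim());
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        for (String line : extract(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

On this sample the two printed lines match the output requested in the question. The relative expressions (td[1], td[3], and so on) are plain XPath 1.0, so they should carry over to any XPath engine that can evaluate a query against a row element.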

Sorry that my code is Java with Selenium WebDriver rather than Objective-C. I hope the XPath expressions themselves are still useful.

Published: 2023-10-17 02:44:51
Link: https://www.elefans.com/category/jswz/34/1499539.html
Tags: html, hpple, XPath