TreeBuilder: getting embedded nodes

Updated: 2024-10-28 12:19:45
Problem description

Basically, I need to get the names and emails from all of these people in the HTML code.

    <thead>
      <tr>
        <th scope="col" class="rgHeader" style="text-align:center;">Name</th>
        <th scope="col" class="rgHeader" style="text-align:center;">Email Address</th>
        <th scope="col" class="rgHeader" style="text-align:center;">School Phone</th>
      </tr>
    </thead>
    <tbody>
      <tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__0">
        <td> Michael Bowen </td><td>mbowen@cpcisd</td><td>903-488-3671 ext3200</td>
      </tr>
      <tr class="rgAltRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__1">
        <td> Christian Calixto </td><td>calixtoc@cpcisd</td><td>903-488-3671 x 3430</td>
      </tr>
      <tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__2">
        <td> Rachel Claxton </td><td>claxtonr@cpcisd</td><td>903-488-3671 x 3450</td>
      </tr>
    </tbody>
    </table>
    <input id="ctl00_ContentPlaceHolder1_rg_People_ClientState" name="ctl00_ContentPlaceHolder1_rg_People_ClientState" type="hidden" autocomplete="off">
    </div>
    <br>

I know how to use TreeBuilder with nodes and such, and I'm already using this code in some of my scripts.

    my ($file) = @_;
    my $html = path($file)->slurp;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @nodes = $tree->look_down(_tag => 'input');
    my $val;
    foreach my $node (@nodes) {
        $val = $node->look_down('name', qr/\$txt_Website/)->attr('value');
    }
    return $val;

I was going to use the same code for this function, but I realized that I don't have much to search for, since the <td> tag is in so many other places in the script. I'm sure there's a better way to approach this problem, but I can't seem to find it.

LINK TO HTML CODE: pastebin/qLwu80ZW

MY CODE: pastebin/wGb0eXmM

Note: I did search Google as much as I could, but I'm not quite sure what I should be searching for.

Solution

The table element that encloses the data you need has a unique class, rgMasterTable, so you can search for that in look_down.

I've written this to demonstrate. It pulls the HTML directly from your pastebin.

    use strict;
    use warnings 'all';

    use LWP::Simple 'get';
    use HTML::TreeBuilder;

    use constant URL => 'pastebin/raw/qLwu80ZW';

    my $tree = HTML::TreeBuilder->new_from_content(get URL);

    my ($table) = $tree->look_down(_tag => 'table', class => 'rgMasterTable');

    for my $tr ( $table->look_down(_tag => 'tr') ) {
        next unless my @td = $tr->look_down(_tag => 'td');
        my ($name, $email) = map { $_->as_trimmed_text } @td[0,1];
        printf "%-17s %s\n", $name, $email;
    }

Output

    Michael Bowen     mbowen@cpcisd
    Christian Calixto calixtoc@cpcisd
    Rachel Claxton    claxtonr@cpcisd
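
For anyone who wants to try the same idea without a network fetch, here is a minimal offline sketch. It makes several assumptions that are not in the original post: the sample markup is a cut-down stand-in for the pastebin page, the extra class name rgClipCells is made up, and extract_people is a hypothetical helper name. It also swaps the exact-string class test for a regex, because a plain string passed to look_down has to equal the whole attribute value, which would miss a table whose class attribute lists more than one class.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup modelled on the question's table. The second class name
# ("rgClipCells") is invented here to show a multi-class attribute.
my $html = <<'HTML';
<table class="rgMasterTable rgClipCells">
  <thead>
    <tr><th>Name</th><th>Email Address</th><th>School Phone</th></tr>
  </thead>
  <tbody>
    <tr class="rgRow"><td> Michael Bowen </td><td>mbowen@cpcisd</td><td>903-488-3671 ext3200</td></tr>
    <tr class="rgAltRow"><td> Christian Calixto </td><td>calixtoc@cpcisd</td><td>903-488-3671 x 3430</td></tr>
  </tbody>
</table>
HTML

# extract_people is a hypothetical helper: it returns a list of
# [name, email] array references, or an empty list if no table is found.
sub extract_people {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);

    # A regex criterion matches the class token even when the attribute
    # holds other class names as well.
    my $table = $tree->look_down(_tag => 'table', class => qr/\brgMasterTable\b/);

    my @people;
    if ($table) {
        for my $tr ($table->look_down(_tag => 'tr')) {
            my @td = $tr->look_down(_tag => 'td');
            next if @td < 2;    # the header row holds <th> cells, not <td>
            push @people, [ map { $_->as_trimmed_text } @td[0, 1] ];
        }
    }

    $tree->delete;    # release the parse tree
    return @people;
}

my @people = extract_people($html);
printf "%-17s %s\n", @$_ for @people;
```

Checking that the table was found before calling look_down on it avoids a "Can't call method on an undefined value" error when the page layout changes or the fetch returns nothing.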

Published: 2023-11-03 13:30:52