Basically, I need to get the names and emails from all of these people in the HTML code.
    <thead>
      <tr>
        <th scope="col" class="rgHeader" style="text-align:center;">Name</th>
        <th scope="col" class="rgHeader" style="text-align:center;">Email Address</th>
        <th scope="col" class="rgHeader" style="text-align:center;">School Phone</th>
      </tr>
    </thead>
    <tbody>
      <tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__0">
        <td> Michael Bowen </td><td>mbowen@cpcisd</td><td>903-488-3671 ext3200</td>
      </tr>
      <tr class="rgAltRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__1">
        <td> Christian Calixto </td><td>calixtoc@cpcisd</td><td>903-488-3671 x 3430</td>
      </tr>
      <tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__2">
        <td> Rachel Claxton </td><td>claxtonr@cpcisd</td><td>903-488-3671 x 3450</td>
      </tr>
    </tbody>
    </table><input id="ctl00_ContentPlaceHolder1_rg_People_ClientState" name="ctl00_ContentPlaceHolder1_rg_People_ClientState" type="hidden" autocomplete="off">
    </div>
    <br>

I know how to use TreeBuilder with nodes and such, and I'm using this code in some of my scripts.
    my ($file) = @_;
    my $html = path($file)->slurp;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @nodes = $tree->look_down(_tag => 'input');
    my $val;
    foreach my $node (@nodes) {
        $val = $node->look_down('name', qr/\$txt_Website/)->attr('value');
    }
    return $val;

I was going to use the same code for this function, but I realized that I don't have much to search for, since the <td> tag appears in so many other places in the HTML. I'm sure there's a better way to approach this problem, but I can't seem to find it.
LINK TO HTML CODE: pastebin/qLwu80ZW
MY CODE: pastebin/wGb0eXmM
Note: I searched Google as much as I could, but I'm not quite sure what I should be searching for.
Solution

The table element that encloses the data you need has a unique class, rgMasterTable, so you can search for that in look_down.

I've written this to demonstrate. It pulls the HTML directly from your pastebin.
    use strict;
    use warnings 'all';

    use LWP::Simple 'get';
    use HTML::TreeBuilder;

    use constant URL => 'pastebin/raw/qLwu80ZW';

    my $tree = HTML::TreeBuilder->new_from_content(get URL);

    my ($table) = $tree->look_down(_tag => 'table', class => 'rgMasterTable');

    for my $tr ( $table->look_down(_tag => 'tr') ) {

        next unless my @td = $tr->look_down(_tag => 'td');

        my ($name, $email) = map { $_->as_trimmed_text } @td[0,1];

        printf "%-17s %s\n", $name, $email;
    }

output

    Michael Bowen     mbowen@cpcisd
    Christian Calixto calixtoc@cpcisd
    Rachel Claxton    claxtonr@cpcisd
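
If you would rather try the same idea without a network fetch, here is a minimal offline sketch. It assumes a cut-down inline copy of the table (the `$html` string, including the extra `other` table, is made up for illustration), and `extract_people` is a hypothetical helper name, not something HTML::TreeBuilder provides. It also checks that the table was actually found and narrows the row search to the `rgRow` / `rgAltRow` classes seen in the question's HTML.

```perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Hypothetical stand-in for the downloaded page: a shortened copy of the
# table from the question, plus an unrelated table that also contains <td>.
my $html = <<'HTML';
<table class="other"><tr><td>unrelated</td><td>cell</td></tr></table>
<table class="rgMasterTable">
<thead><tr><th>Name</th><th>Email Address</th><th>School Phone</th></tr></thead>
<tbody>
<tr class="rgRow"><td> Michael Bowen </td><td>mbowen@cpcisd</td><td>903-488-3671 ext3200</td></tr>
<tr class="rgAltRow"><td> Christian Calixto </td><td>calixtoc@cpcisd</td><td>903-488-3671 x 3430</td></tr>
</tbody>
</table>
HTML

# extract_people is an illustrative helper: it returns a list of
# { name => ..., email => ... } hashes, or an empty list if the table
# with class rgMasterTable is not present.
sub extract_people {
    my ($content) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($content);
    my ($table) = $tree->look_down(_tag => 'table', class => 'rgMasterTable');

    my @people;
    if ($table) {
        # Only data rows carry these classes; the header row has <th> cells.
        for my $tr ( $table->look_down(_tag => 'tr', class => qr/^rg(?:Alt)?Row$/) ) {
            my @td = $tr->look_down(_tag => 'td');
            next if @td < 2;

            my ($name, $email) = map { $_->as_trimmed_text } @td[0,1];
            push @people, { name => $name, email => $email };
        }
    }

    $tree->delete;    # release the parse tree once the text is copied out
    return @people;
}

my @people = extract_people($html);
printf "%-17s %s\n", $_->{name}, $_->{email} for @people;
```

Returning a list of hashes instead of printing inside the loop keeps the parsing separate from the output, so the same data could be written to a file or checked in a test. The cells in the unrelated table are never visited because the row search starts from `$table` rather than from the whole tree.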