Extracting a value from a web page

Updated: 2024-10-28 21:20:16
Problem description

Hi, I have a website's home page that I am reading in using cURL, and I need to grab the number of pages that the site has.

The information is in a div:

    <div class="pager">
      <span class="page-numbers current">1</span>
      <a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
      <a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
      <a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
      <a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
      <span class="page-numbers dots">&hellip;</span>
      <a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
      <a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
    </div>

The value I need is 15. It could be any number depending on the site, but it will always be in the same position.

How could I read this value easily and assign it to a variable in PHP?

Thanks

Jonathan

Solution

You can use PHP's DOM module for that. Read the page with DOMDocument::loadHTMLFile(), then create a DOMXPath object and query all span elements in the document whose class attribute is "page-numbers".

(edit: oops, that's not what you're looking for, see second code snippet)

    $html = '<html><head><title>:::</title></head><body>
      <div class="pager">
        <span class="page-numbers current">1</span>
        <a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
        <a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
        <a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
        <a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
        <span class="page-numbers dots">&hellip;</span>
        <a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
        <a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
      </div>
    </body></html>';

    $doc = new DOMDocument;
    // since the content "is already here" we use loadHTML(content)
    // instead of loadHTMLFile(url)
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $nodelist = $xpath->query('//span[@class="page-numbers"]');
    echo 'there are ', $nodelist->length, ' span elements having class="page-numbers"';
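One detail of the query above may be worth illustrating: `@class="page-numbers"` compares the whole attribute string, so spans such as `class="page-numbers current"` are not counted. The sketch below uses an abbreviated version of the pager markup (an assumption, not the full page) and contrasts the exact match with a class-token match. The `contains(concat(...))` expression is a common XPath 1.0 idiom and is not part of the original answer.

```php
<?php
// Abbreviated pager markup, assumed to follow the question's structure.
$html = '<html><body><div class="pager">'
      . '<span class="page-numbers current">1</span>'
      . '<a href="/users?page=2"><span class="page-numbers">2</span></a>'
      . '<span class="page-numbers dots">&hellip;</span>'
      . '<a href="/users?page=15"><span class="page-numbers">15</span></a>'
      . '<a href="/users?page=2"><span class="page-numbers next"> next</span></a>'
      . '</div></body></html>';

$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Exact match: only spans whose class attribute is exactly "page-numbers".
$exact = $xpath->query('//span[@class="page-numbers"]');

// Token match: any span whose class list contains the token "page-numbers".
$token = $xpath->query(
    '//span[contains(concat(" ", normalize-space(@class), " "), " page-numbers ")]'
);

echo 'exact: ', $exact->length, ', token: ', $token->length, "\n";
```

With this abbreviated markup the exact match finds the two plain spans (2 and 15), while the token match also picks up the "current", "dots" and "next" spans.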

Edit: does this

<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>

(the second-to-last a element) always point to the last page, i.e. does this link contain the value you're looking for? If so, you can use an XPath expression that selects the second-to-last a element and, from there, its child span element.

    //div[@class="pager"]                               <- select each <div> whose class attribute equals "pager"
    //div[@class="pager"]/a                             <- select each <a> that is a direct child of the pager div
    //div[@class="pager"]/a[position()=last()-1]        <- select the <a> that is second to last
    //div[@class="pager"]/a[position()=last()-1]/span   <- select the direct child <span> of that second-to-last <a> element in the pager <div>
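To see how each step narrows the selection, the expressions can be evaluated one after another against the pager markup. This is only an illustrative sketch: the markup is the question's pager with the title attributes omitted, and the names `$steps`, `$counts` and `$lastText` are made up for the example.

```php
<?php
// Pager markup from the question, title attributes omitted for brevity.
$html = '<html><body><div class="pager">'
      . '<span class="page-numbers current">1</span>'
      . '<a href="/users?page=2"><span class="page-numbers">2</span></a>'
      . '<a href="/users?page=3"><span class="page-numbers">3</span></a>'
      . '<a href="/users?page=4"><span class="page-numbers">4</span></a>'
      . '<a href="/users?page=5"><span class="page-numbers">5</span></a>'
      . '<span class="page-numbers dots">&hellip;</span>'
      . '<a href="/users?page=15"><span class="page-numbers">15</span></a>'
      . '<a href="/users?page=2"><span class="page-numbers next"> next</span></a>'
      . '</div></body></html>';

$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The four expressions, from the broadest to the most specific.
$steps = [
    '//div[@class="pager"]',
    '//div[@class="pager"]/a',
    '//div[@class="pager"]/a[position()=last()-1]',
    '//div[@class="pager"]/a[position()=last()-1]/span',
];

// Record how many nodes each expression selects.
$counts = [];
foreach ($steps as $expr) {
    $nodes = $xpath->query($expr);
    $counts[] = $nodes->length;
    echo $expr, ' => ', $nodes->length, " node(s)\n";
}

// Text content of the span selected by the final expression.
$lastText = trim($xpath->query($steps[3])->item(0)->nodeValue);
echo 'last page: ', $lastText, "\n";
```

For this markup the expressions select 1, 6, 1 and 1 nodes respectively, and the final span contains the text 15.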

(You might want to find a good XPath tutorial ;-) )

    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ( 0 < $nodelist->length ) {
        echo $nodelist->item(0)->nodeValue;
    }
    else {
        echo 'not found';
    }
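Since the question asks for the value in a PHP variable and the page is fetched with cURL, the pieces could be combined roughly as follows. This is a sketch under assumptions, not the answer's own code: `fetchHtml()` and `lastPageNumber()` are hypothetical helper names, the URL in the trailing comment is a placeholder, and the error handling is deliberately minimal.

```php
<?php
/**
 * Hypothetical helper: download a page with cURL and return its body.
 */
function fetchHtml(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

/**
 * Hypothetical helper: return the last page number as an int,
 * or null when the pager or a numeric value cannot be found.
 */
function lastPageNumber(string $html): ?int
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument;
    // Real-world pages often contain invalid markup; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $text = trim($nodes->item(0)->nodeValue);
    return ctype_digit($text) ? (int) $text : null;
}

// Usage with markup that is already in memory (shortened pager, for illustration):
$pages = lastPageNumber(
    '<div class="pager">'
    . '<a href="/users?page=2"><span class="page-numbers">2</span></a>'
    . '<a href="/users?page=15"><span class="page-numbers">15</span></a>'
    . '<a href="/users?page=2"><span class="page-numbers next"> next</span></a>'
    . '</div>'
);
var_dump($pages);

// With a live page (placeholder URL):
// $pages = lastPageNumber(fetchHtml('https://example.com/users'));
```

Returning null instead of echoing 'not found' leaves it to the caller to decide how to handle a page without a pager.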

Published: 2023-10-29 22:53:22
Article link: https://www.elefans.com/category/jswz/34/1541003.html