Hi, I have a website's home page that I am reading in using cURL, and I need to grab the number of pages that the site has.
The information is in a div:

    <div class="pager">
      <span class="page-numbers current">1</span>
      <a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
      <a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
      <a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
      <a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
      <span class="page-numbers dots">…</span>
      <a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
      <a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
    </div>

The value I need is 15, but this could be any number depending on the site. It will always be in the same position, though.
How can I read this value easily and assign it to a variable in PHP?
Thanks
Jonathan
Solution

You can use PHP's DOM module for that. Read the page with DOMDocument::loadHTMLFile(), then create a DOMXPath object and query all span elements within the document that have the attribute class="page-numbers".
(edit: oops, that's not what you're looking for, see second code snippet)
    $html = '<html><head><title>:::</title></head><body>
      <div class="pager">
        <span class="page-numbers current">1</span>
        <a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
        <a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
        <a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
        <a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
        <span class="page-numbers dots">…</span>
        <a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
        <a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
      </div>
    </body></html>';

    $doc = new DOMDocument;
    // since the content "is already here" we use loadHTML(content)
    // instead of loadHTMLFile(url)
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    // matches only spans whose class attribute is exactly "page-numbers"
    $nodelist = $xpath->query('//span[@class="page-numbers"]');
    echo 'there are ', $nodelist->length, ' span elements having class="page-numbers"';

edit: Does this link

    <a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>

(the second-to-last a element) always point to the last page, i.e. does it contain the value you're looking for? If so, you can use an XPath expression that selects the second-to-last a element and, from there, its child span element.
    //div[@class="pager"]
        <- select each <div> where the attribute class equals "pager"
    //div[@class="pager"]/a
        <- select each <a> that is a direct child of the pager div
    //div[@class="pager"]/a[position()=last()-1]
        <- select the <a> that is second-to-last
    //div[@class="pager"]/a[position()=last()-1]/span
        <- select the direct child <span> of that second-to-last <a> element in the pager <div>

(You might want to find a good XPath tutorial ;-) )
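To check what each step of the expression selects, a small sketch like the following can evaluate the four expressions one at a time and print how many nodes each one matches. The markup is a shortened, hypothetical version of the pager from the question, and the variable names ($steps, $counts) are only for illustration.

```php
<?php
// Hypothetical, shortened sample markup modelled on the pager in the question.
$html = '<html><body><div class="pager">'
      . '<span class="page-numbers current">1</span>'
      . '<a href="/users?page=2"><span class="page-numbers">2</span></a>'
      . '<a href="/users?page=15"><span class="page-numbers">15</span></a>'
      . '<a href="/users?page=2"><span class="page-numbers next"> next</span></a>'
      . '</div></body></html>';

$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The four expressions, from least to most specific.
$steps = [
    '//div[@class="pager"]',
    '//div[@class="pager"]/a',
    '//div[@class="pager"]/a[position()=last()-1]',
    '//div[@class="pager"]/a[position()=last()-1]/span',
];

// Count the nodes matched by each expression.
$counts = [];
foreach ($steps as $expr) {
    $counts[$expr] = $xpath->query($expr)->length;
    echo $expr, ' => ', $counts[$expr], " node(s)\n";
}
```

For this shortened sample the loop reports 1, 3, 1 and 1 nodes respectively; with the full pager from the question the second expression would match more links, but the last two would still match exactly one node each.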
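    // $html is the same string as in the first snippet
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    // select the <span> inside the second-to-last <a> of the pager div
    $nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ( 0 < $nodelist->length ) {
      echo $nodelist->item(0)->nodeValue;
    }
    else {
      echo 'not found';
    }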
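Since the question asks for the value in a variable rather than echoed, the same query could be wrapped in a small function. This is only a sketch under assumptions: the name lastPageNumber() is hypothetical, the pager markup is assumed to look like the sample in the question, and the $sample string is a shortened stand-in for the real page.

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): returns the number
 * shown in the second-to-last pager link, or null if the markup does not match.
 */
function lastPageNumber(string $html): ?int
{
    $doc = new DOMDocument;
    // Real pages are often not valid HTML; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Only accept text that consists purely of digits.
    $text = trim($nodes->item(0)->nodeValue);
    return preg_match('/^\d+$/', $text) === 1 ? (int) $text : null;
}

// Shortened stand-in for the downloaded page.
$sample = '<div class="pager">'
        . '<span class="page-numbers current">1</span>'
        . '<a href="/users?page=2"><span class="page-numbers">2</span></a>'
        . '<a href="/users?page=15"><span class="page-numbers">15</span></a>'
        . '<a href="/users?page=2"><span class="page-numbers next"> next</span></a>'
        . '</div>';

$pageCount = lastPageNumber($sample);
var_dump($pageCount);
```

When the page is fetched with cURL, the string returned by curl_exec() (with CURLOPT_RETURNTRANSFER enabled) can be passed in place of $sample. Returning null instead of printing 'not found' leaves it to the caller to decide how to handle a page without a pager.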