Extract a value from a web page

Updated: 2024-10-28 21:20:16

Problem description

Hi, I have a website's home page that I am reading in using cURL, and I need to grab the number of pages the site has.

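The information is in a div:

    <div class="pager">
    1
    <a href="/users?page=2" title="go to page 2">2</a>
    <a href="/users?page=3" title="go to page 3">3</a>
    <a href="/users?page=4" title="go to page 4">4</a>
    <a href="/users?page=5" title="go to page 5">5</a>
    …
    <a href="/users?page=15" title="go to page 15">15</a>
    <a href="/users?page=2" title="go to page 2">next</a>
    </div>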

The value I need is 15, but it could be any number depending on the site. It will always be in the same position, though.

How can I read this value easily and assign it to a variable in PHP?

Thanks

Jonathan

Solution

You can use PHP's DOM module for that. Read the page with DOMDocument::loadhtmlfile(), then create a DOMXPath object and query all span elements within the document having the class="page-numbers" attribute.

(edit: oops, that's not what you're looking for, see second code snippet)

    $html = '<html><head><title>:::</title></head><body>
    <div class="pager">
      1
      <a href="/users?page=2" title="go to page 2">2</a>
      <a href="/users?page=3" title="go to page 3">3</a>
      <a href="/users?page=4" title="go to page 4">4</a>
      <a href="/users?page=5" title="go to page 5">5</a>
      …
      <a href="/users?page=15" title="go to page 15">15</a>
      <a href="/users?page=2" title="go to page 2"> next</a>
    </div>
    </body></html>';

    $doc = new DOMDocument;
    // since the content "is already here" we use loadhtml(content)
    // instead of loadhtmlfile(url)
    $doc->loadhtml($html);
    $xpath = new DOMXPath($doc);
    $nodelist = $xpath->query('//span[@class="page-numbers"]');
    echo 'there are ', $nodelist->length, ' span elements having class="page-numbers"';
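Since the question says the page is already read in with cURL, the "content is already here" route can be wired to cURL directly. This is a minimal sketch, assuming the cURL extension is installed; the helper names fetchHtml() and xpathFor() are hypothetical, the URL in the usage comment is a placeholder, and the libxml_use_internal_errors() call is an addition meant to keep warnings from messy real-world HTML out of the output.

```php
<?php
// Sketch, assuming the cURL extension is available: fetch the page body as a
// string, then parse it with DOMDocument::loadHTML() (the "content is already
// here" case) instead of DOMDocument::loadHTMLFile().

// Hypothetical helper: GET $url and return the response body.
function fetchHtml(string $url): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('curl_init() failed');
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical helper: parse an HTML string and return an XPath object for it.
function xpathFor(string $html): DOMXPath
{
    if (trim($html) === '') {
        throw new InvalidArgumentException('empty HTML'); // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($doc);
}

// With a live site (placeholder URL):
// $xpath = xpathFor(fetchHtml('https://example.com/'));
$xpath = xpathFor('<div class="pager">1 <a href="/users?page=2" title="go to page 2">2</a></div>');
echo $xpath->query('//div[@class="pager"]/a')->length, "\n"; // 1
```

Keeping the fetch and the parse in separate functions means the parsing part can be tried on a saved HTML string without touching the network.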

edit: does the link to page 15 (the second-to-last a element) always point to the last page, i.e. does this link contain the value you're looking for? If so, you can use an XPath expression that selects the second-to-last a element and, from there, its child span element.

    //div[@class="pager"]                              <- select each <div> where the attribute class equals "pager"
    //div[@class="pager"]/a                            <- select each <a> that is a direct child of the pager div
    //div[@class="pager"]/a[position()=last()-1]       <- select the <a> that is second to last
    //div[@class="pager"]/a[position()=last()-1]/span  <- select the <span> that is a direct child of that second-to-last <a> element in the pager <div>
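To see what each step of that breakdown matches, here is a small sketch that runs all four expressions against the pager markup from the first snippet (the … character is left out to keep the sample ASCII-only). In the markup as reproduced here the links contain plain text rather than a <span>, so the last expression matches nothing, while the second-to-last <a> is the page-15 link.

```php
<?php
// Evaluate each XPath step from the breakdown against the pager markup.
$html = '<html><body><div class="pager"> 1 '
    . '<a href="/users?page=2" title="go to page 2">2</a> '
    . '<a href="/users?page=3" title="go to page 3">3</a> '
    . '<a href="/users?page=4" title="go to page 4">4</a> '
    . '<a href="/users?page=5" title="go to page 5">5</a> '
    . '<a href="/users?page=15" title="go to page 15">15</a> '
    . '<a href="/users?page=2" title="go to page 2"> next</a>'
    . '</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$steps = [
    '//div[@class="pager"]',
    '//div[@class="pager"]/a',
    '//div[@class="pager"]/a[position()=last()-1]',
    '//div[@class="pager"]/a[position()=last()-1]/span',
];

// Record and print how many nodes each expression selects.
$counts = [];
foreach ($steps as $expr) {
    $nodes = $xpath->query($expr);
    $counts[$expr] = $nodes->length;
    printf("%-52s matches %d node(s)\n", $expr, $nodes->length);
}

// The second-to-last <a> (position 5 of 6) is the link to page 15.
$secondToLast = $xpath->query($steps[2])->item(0);
echo $secondToLast->getAttribute('href'), ' => ', trim($secondToLast->textContent), "\n";
```

With this markup the counts come out as 1, 6, 1 and 0, and the last line prints `/users?page=15 => 15`.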

(You might want to fetch a good XPath tutorial ;-))

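    // $html as in the first snippet
    $doc = new DOMDocument;
    $doc->loadhtml($html);
    $xpath = new DOMXPath($doc);
    // the <span> inside the second-to-last <a> of the pager div
    $nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ( 0 < $nodelist->length ) {
        echo $nodelist->item(0)->nodeValue;
    }
    else {
        echo 'not found';
    }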
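If the number sits directly inside the <a>, as in the markup quoted in the question, the trailing /span step can be dropped and the link's own text read instead. The sketch below wraps that in a function; lastPageNumber() is a hypothetical name, it returns null when no suitable link is found, and because textContent includes the text of descendant elements it should also work if the site wraps the number in a <span>.

```php
<?php
// Minimal sketch: read the number shown by the second-to-last <a> in the
// pager <div>. Reading the <a>'s textContent (instead of a child <span>)
// works whether or not the link wraps its text in a <span>.
function lastPageNumber(string $html): ?int
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep libxml warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@class="pager"]/a[position()=last()-1]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no pager, or fewer than two links
    }

    $text = trim($nodes->item(0)->textContent);
    return ctype_digit($text) ? (int) $text : null; // only accept a plain number
}

$html = '<html><body><div class="pager"> 1 '
    . '<a href="/users?page=2" title="go to page 2">2</a> '
    . '<a href="/users?page=15" title="go to page 15">15</a> '
    . '<a href="/users?page=2" title="go to page 2"> next</a>'
    . '</div></body></html>';

$pages = lastPageNumber($html);
var_dump($pages); // int(15)
```

Returning null rather than echoing 'not found' lets the caller decide what to do when the pager is missing, for example on a site with only one page.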


Published: 2023-10-29 22:53:22

Article link: https://www.elefans.com/category/jswz/34/1541003.html