Extract HTML content using PHP

Updated: 2024-10-27 12:42:09

I have the following code:

    $html = file_get_contents("http://www.jabong.com/giordano-Dtlm60058-Black-Analog-Watch-267058.html");
    $dom = new DOMDocument();
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//*[@id="price_div"]/div[2]/span[2]'); //this catches all elements with
    var_dump($nodes);

I want to extract the price from the page, but this XPath query does not return any result.

Best answer

Did you ever solve the problem? Here is some working code:

    $html = file_get_contents("http://www.jabong.com/giordano-Dtlm60058-Black-Analog-Watch-267058.html");

    // suppress parse errors (the page in question produces a lot of them)
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // don't preserve whitespace (set on $dom, which must exist first)
    $dom->preserveWhiteSpace = false;
    // as @Larry.Z comments, you forgot to load the $html
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // assuming there can be more than one "price set" on each page
    $prices = array();
    $price_divs = $xpath->query('//div[@id="price_div"]');
    foreach ($price_divs as $price_div) {
        $price = array();
        // collect the non-empty text of each child node
        foreach ($price_div->childNodes as $price_item) {
            $content = trim($price_item->textContent);
            if ($content != '') $price[] = $content;
        }
        $prices[] = $price;
    }

    echo '<pre>';
    print_r($prices);
    echo '</pre>';

Output:

    Array
    (
        [0] => Array
            (
                [0] => Save 66%
                [1] => Rs. 5850
                [2] => Rs. 1999
            )

    )

You can skip the $prices[] part and use only $price if there will never be more than one price set per page.
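As a rough offline sketch of that single-price-set variant, the snippet below parses an inline HTML string instead of fetching the live page. The markup is a hypothetical stand-in (three spans inside a div with id "price_div") and may not match the real page. It also counts matches against a DOMDocument that was never loaded, which is the situation in the question's code.

```php
<?php
// Hypothetical markup standing in for the product page; the real structure may differ.
$html = <<<HTML
<html><body>
  <div id="price_div">
    <span>Save 66%</span>
    <span>Rs. 5850</span>
    <span>Rs. 1999</span>
  </div>
</body></html>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

// As in the question: the query runs against a document that was never loaded.
$empty = new DOMDocument();
$emptyCount = (new DOMXPath($empty))->query('//div[@id="price_div"]')->length;

// With the markup loaded before querying.
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$price_divs = $xpath->query('//div[@id="price_div"]');

// Single price set: take only the first matching div, if there is one.
$price = array();
$price_div = $price_divs->item(0); // null when nothing matched
if ($price_div !== null) {
    foreach ($price_div->childNodes as $price_item) {
        // Whitespace-only text nodes trim down to '' and are skipped.
        $content = trim($price_item->textContent);
        if ($content !== '') {
            $price[] = $content;
        }
    }
}

echo "matches without loadHTML: {$emptyCount}\n";
echo "matches with loadHTML: {$price_divs->length}\n";
print_r($price);
```

Checking the result of item(0) for null before iterating keeps the script from failing if the page layout changes and the div is no longer present.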

Published: 2023-08-05 09:20:00