Extract HTML content using PHP

I have the following code:

    $html = file_get_contents("http://www.jabong.com/giordano-Dtlm60058-Black-Analog-Watch-267058.html");
    $dom = new DOMDocument();
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//*[@id="price_div"]/div[2]/span[2]'); //this catches all elements with
    var_dump($nodes);

I want to extract the price from the page, but this XPath query is not giving me any result.

Best answer

Did you ever solve the problem? Here is some working code:

    $html = file_get_contents("http://www.jabong.com/giordano-Dtlm60058-Black-Analog-Watch-267058.html");

    // suppress parser errors (the page in question produces a lot of them)
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // don't preserve whitespace (set on $dom, after it has been created)
    $dom->preserveWhiteSpace = false;
    // as @Larry.Z comments, you forgot to load the $html
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // assuming there can be more than one "price set" on each page
    $prices = array();
    $price_divs = $xpath->query('//div[@id="price_div"]');
    foreach ($price_divs as $price_div) {
        $price = array();
        // collect the non-empty text of each child node
        foreach ($price_div->childNodes as $price_item) {
            $content = trim($price_item->textContent);
            if ($content != '') $price[] = $content;
        }
        $prices[] = $price;
    }

    echo '<pre>';
    print_r($prices);
    echo '</pre>';

This outputs:

    Array
    (
        [0] => Array
            (
                [0] => Save 66%
                [1] => Rs. 5850
                [2] => Rs. 1999
            )

    )

You can skip the $prices[] part and use only $price if there will never be more than one price set per page.
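
For the single-price-set case, a minimal sketch might look like the following. It runs against an inline HTML string rather than the live page, so the markup here is a hypothetical stand-in and not the real structure of the jabong.com page; only the `price_div` id and the three text values are taken from the question and the output above.

```php
<?php
// Hypothetical stand-in markup (an assumption); the real page is more complex.
$html = '<div id="price_div">'
      . '<span>Save 66%</span>'
      . '<div>Rs. 5850</div>'
      . '<div>Rs. 1999</div>'
      . '</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Only one price set is assumed, so take the first matching container.
$price_div = $xpath->query('//div[@id="price_div"]')->item(0);

$price = array();
if ($price_div !== null) {
    // Keep the trimmed, non-empty text of each direct child node.
    foreach ($price_div->childNodes as $price_item) {
        $content = trim($price_item->textContent);
        if ($content != '') {
            $price[] = $content;
        }
    }
}

print_r($price);
```

With this stand-in markup, $price ends up as a flat array of the three strings. The check against null matters because item(0) returns null when the query matches nothing, which is what happens if the HTML was never loaded into the DOMDocument.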
