PHP DOMDocument loadHTML and getElementsByTagName return nothing
$urlToScrap = "https://play.google.com/store/apps/details?id=flipboard.app#?t=W251bGwsMSwxLDIxMiwiZmxpcGJvYXJkLmFwcCJd";
$pageContentData = file_get_contents($urlToScrap);
$doc = new DOMDocument();
$doc->loadHTML($pageContentData);
$listOfDivs = $doc->getElementsByTagName("div");
foreach ($listOfDivs as $div) {
    if ($div->getAttribute("class") == "doc-banner-icon") {
        $img = $div->getElementsByTagName("img");
        var_dump($img->getAttribute("src"));
    }
}
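returns empty.
I have the following elements in the DOM:
<div class="doc-banner-icon"><img src="somesrc"></div>
I'm trying to get the img src, and since there are many images on the page, I would like to first get the parent div and then extract the image inside it.
The solution is here: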
$urlToScrap = "https://play.google.com/store/apps/details?id=flipboard.app#?t=W251bGwsMSwxLDIxMiwiZmxpcGJvYXJkLmFwcCJd";
$pageContentData = file_get_contents($urlToScrap);
$doc = new DOMDocument();
$doc->loadHTML($pageContentData);
$listOfDivs = $doc->getElementsByTagName("div");
foreach ($listOfDivs as $div) {
    if ($div->getAttribute("class") == "doc-banner-icon") {
        // getElementsByTagName() returns a DOMNodeList, so iterate over it
        $listOfImages = $div->getElementsByTagName("img");
        foreach ($listOfImages as $img) {
            var_dump($img->getAttribute("src"));
        }
    }
}
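For reference, here is the same nested-loop approach as a small self-contained sketch. It runs against an inline HTML string instead of the live page; that string is a hypothetical stand-in built from the div/img snippet in the question, and the function name findBannerIconSrcs is made up for illustration. It also wraps loadHTML() in libxml_use_internal_errors(), since real-world pages often make libxml emit parser warnings.

```php
<?php
// Minimal sketch of the "find the parent div, then its images" approach.
// Assumption: the page markup resembles the question's snippet; the HTML
// below is hypothetical sample data, not the real Play Store page.

/**
 * Returns the src of every <img> inside a <div class="doc-banner-icon">.
 * (Hypothetical helper name, for illustration only.)
 */
function findBannerIconSrcs(string $html): array
{
    // Collect libxml parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $srcs = [];
    foreach ($doc->getElementsByTagName('div') as $div) {
        // Exact comparison, as in the question: matches class="doc-banner-icon" only.
        if ($div->getAttribute('class') === 'doc-banner-icon') {
            // getElementsByTagName() returns a DOMNodeList, so loop over it.
            foreach ($div->getElementsByTagName('img') as $img) {
                $srcs[] = $img->getAttribute('src');
            }
        }
    }
    return $srcs;
}

$html = '<html><body>'
    . '<div class="doc-banner-icon"><img src="somesrc"></div>'
    . '<div class="other"><img src="ignored.png"></div>'
    . '</body></html>';

print_r(findBannerIconSrcs($html));
```

With this sample markup only "somesrc" is returned; the image inside the div with class "other" is skipped.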
Best answer:
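You aren't missing anything: var_dump() doesn't work the way you expect on a DOMNodeList, because it doesn't print the nodes the list contains. Try iterating over the list instead: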
$listOfImages = $doc->getElementsByTagName("img");
foreach ($listOfImages as $img) {
    $imgClass = $img->getAttribute('class');
    echo $imgClass;
}
In your updated question, just change:
$img->getAttribute("src")
to:
$img->item(0)->getAttribute("src")
Given that your selection criteria are fairly complex, you might consider using XPath instead of navigating the DOM manually:
$doc = new DOMDocument();
$doc->loadHTML($pageContentData);
$xpath = new DOMXPath($doc);
$img = $xpath->query("//div[@class = 'doc-banner-icon']/img");
var_dump($img->item(0)->getAttribute('src'));
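The core point of the answer, that getElementsByTagName() hands back a DOMNodeList (a collection) rather than a single element, can be checked directly. This is a small sketch using a hypothetical one-line HTML string modelled on the question's markup; it shows the list's class and length, and that item() returns null for an index that doesn't exist, which is why item(0) is worth checking before calling getAttribute() on it.

```php
<?php
// Sketch: a DOMNodeList vs. the DOMElement inside it.
// The HTML string is hypothetical sample data based on the question.

$html = '<div class="doc-banner-icon"><img src="somesrc"></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$images = $doc->getElementsByTagName('img');

// $images is a collection, not an element, so it has no getAttribute().
echo get_class($images), PHP_EOL;   // DOMNodeList
echo $images->length, PHP_EOL;      // 1

// item() returns the node at that index, or null when the index is out of range.
$first = $images->item(0);
if ($first instanceof DOMElement) {
    echo $first->getAttribute('src'), PHP_EOL;   // somesrc
}

var_dump($images->item(5));   // NULL
```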
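The XPath suggestion can also be wrapped so that it returns null when nothing matches, instead of failing on item(0). This is a sketch, not the answer's exact code: the function name firstBannerIconSrc is hypothetical, and the class test is swapped for the common concat/normalize-space idiom, which matches doc-banner-icon as one token of a space-separated class list (the answer's @class = '...' only matches when it is the sole class).

```php
<?php
// Sketch of the XPath approach with a "no match" guard.
// Assumption: the target div may carry extra classes besides doc-banner-icon.

function firstBannerIconSrc(string $html): ?string
{
    // Collect libxml parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Token-based class match: " doc-banner-icon " inside " <class list> ".
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' doc-banner-icon ')]/img";
    $nodes = $xpath->query($query);

    // query() returns a DOMNodeList, or false for an invalid expression.
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('src');
}

var_dump(firstBannerIconSrc('<div class="doc-banner-icon large"><img src="somesrc"></div>')); // string(7) "somesrc"
var_dump(firstBannerIconSrc('<div class="other"><img src="x.png"></div>'));                  // NULL
```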