Problem description
I want to scrape ads on websites, but many of them are dynamic and are DOM objects. For example, in this snippet
I can get the iframe tag with Selenium, but I cannot go any further. I think the XPath is the problem: the XPath of the <html> element inside the iframe is /html, which is the same as the XPath of the main page's <html> element.
This is the line of code I am using:
element = WebDriverWait(self.driver,20).until(EC.presence_of_all_elements_located((By.XPATH, '/html')))
Any suggestions?
Recommended answer
By default, the selenium.webdriver object works against the top-level page it has loaded, so every lookup is evaluated in that document. That is why /html in the question matches the main page's root element rather than the iframe's. To get the iframe's data, you have to switch to that iframe first.
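from selenium import webdriver
from selenium.webdriver.common.by import By

# Selenium 4.6+ locates chromedriver itself; older releases took executable_path=path_chrome.
driver = webdriver.Chrome()
# Find the frames using id, title, etc.
frames = driver.find_elements(By.XPATH, "//iframe[@title='iframe_to_get']")
# Switch the webdriver object to one of the matched iframes (i is its index in the list).
i = 0
driver.switch_to.frame(frames[i])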
If you are iterating over several iframes, always remember to switch back to the default page after each one. Otherwise you will not be able to switch to the other iframes in the same code.
driver.switch_to.default_content()
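Putting the two steps together, a minimal sketch of the loop might look like the following. The helper name collect_iframe_html is hypothetical and not part of Selenium. It assumes a Selenium 4 style driver, and it passes the locator strategies as plain strings ("tag name" and "xpath", the values behind By.TAG_NAME and By.XPATH) so that the sketch needs no imports.

```python
def collect_iframe_html(driver):
    """Return the outer HTML of the document inside each top-level iframe.

    Hypothetical helper: `driver` is assumed to be an already started
    Selenium WebDriver that has loaded the page of interest.
    """
    results = []
    # "tag name" and "xpath" are the string values of By.TAG_NAME and By.XPATH.
    frames = driver.find_elements("tag name", "iframe")
    for frame in frames:
        driver.switch_to.frame(frame)
        try:
            # Inside the frame, /html resolves to the iframe's own root element.
            root = driver.find_element("xpath", "/html")
            results.append(root.get_attribute("outerHTML"))
        finally:
            # Always return to the top-level page before the next iframe.
            driver.switch_to.default_content()
    return results
```

With a real browser session this would be called as collect_iframe_html(driver) once the page has loaded. The try/finally is a design choice rather than a requirement: it makes sure the driver is back on the default page even if a lookup inside one iframe raises.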
Update
The functions shown below are now deprecated, so I have updated the answer to use driver.switch_to instead.
driver.switch_to_frame('Any frame') #deprecated
driver.switch_to_default_content() #deprecated