Duplicate results with XPath selectors but not with CSS selectors in Scrapy

Updated: 2024-10-28 22:32:52

So I am playing around with Scrapy through the tutorial. I am trying to scrape the text, author and tags of each quote on the companion website. When I use CSS selectors as described there:

for quote in response.css('div.quote'):
    print(quote.css('span.text::text').extract())
    print(quote.css('span small::text').extract())
    print(quote.css('div.tags a.tag::text').extract())

I get the desired result (i.e. the text, author and tags of each quote are printed once). But when I use XPath selectors like this:

for quote in response.xpath("//*[@class='quote']"):
    print(quote.xpath("//*[@class='text']/text()").extract())
    print(quote.xpath("//*[@class='author']/text()").extract())
    print(quote.xpath("//*[@class='tag']/text()").extract())

I get duplicate results!

I still can't figure out why there is such a difference between the two.

Best answer

Try .// instead of // for your relative searches, e.g.

print(quote.xpath(".//*[@class='text']/text()").extract())

When you use //, the expression is absolute even though you call it on quote: it searches from the root of the document, so every iteration returns the matches for the whole page. .// instead starts from . (the current node), so the search is limited to the elements nested under quote.

As a side note, if you want exactly the same results as the CSS version, consider changing * to the tags you used in the CSS search (span or div). In this case it makes no difference, but it's a heads-up for future reference.

Published: 2023-07-08 04:24:00
Link: https://www.elefans.com/category/jswz/34/1072057.html