Duplicate results with XPath but not with CSS selectors in Scrapy

Updated: 2024-10-28 22:32:52

I'm working through the Scrapy tutorial and trying to scrape the text, author and tags of each quote on the companion website. With the CSS selectors mentioned there:

    for quote in response.css('div.quote'):
        print(quote.css('span.text::text').extract())
        print(quote.css('span small::text').extract())
        print(quote.css('div.tags a.tag::text').extract())

I get the desired result (each quote's text, author and tags printed once). But when I use XPath selectors like this:

    for quote in response.xpath("//*[@class='quote']"):
        print(quote.xpath("//*[@class='text']/text()").extract())
        print(quote.xpath("//*[@class='author']/text()").extract())
        print(quote.xpath("//*[@class='tag']/text()").extract())

I get duplicate results!

I still can't work out why the two behave differently.

Best answer

Try .// instead of // for relative searches, e.g.:

    print(quote.xpath(".//*[@class='text']/text()").extract())

When you use //, the search is absolute even though you call it on quote, so its context is still the root of the document. Every quote therefore returns the matches for the whole page, which is where the duplicates come from. .// on the other hand searches from . (the current node), so the search is limited to the elements nested under quote.

As a side note, if you want exactly the same results as the CSS version, consider changing * to the tags you used in the CSS search (span or div). In this case it doesn't make any difference, but it's a heads-up for future reference.

Published: 2023-07-08 04:24:00

Source: https://www.elefans.com/category/jswz/34/1072057.html