HXS filtering with scrapy - python

Updated: 2024-10-28 20:16:49

I'm new to this area and need more information; I couldn't find anything on the Internet. For example, right now I use hxs.select('//div[@id="CategoryBreadcrumb"]//text()').extract(). Inside this div I have a ul whose li elements each contain an anchor, except for one. I need the text from the li that has no tag inside it. I'd also be thankful for any educational links on hxs filtering. Thanks in advance! Here is an example, in case you can't visualize what I need.

Best answer

Try:

hxs.select('//div[@id = "CategoryBreadcrumb"]/ul/li/text()')

Here li/text() selects only the text nodes that are direct children of each li, so the text inside the anchors is skipped, whereas //text() returns every text node under the div.

To learn more about XPath, see w3schools for the basics and w3.org for the full specification.

PS: scrapy uses lxml. You can test your XPaths using code like this:

import lxml.html as LH

# Sample markup: three li elements wrap their text in an anchor, one does not.
text = '''
<div id='CategoryBreadcrumb'>
  <ul>
    <li><a href=#>I dont need</a></li>
    <li><a href=#>I dont need</a></li>
    <li><a href=#>I dont need</a></li>
    <li>Text that i need</li>
  </ul>
</div>
'''
doc = LH.fromstring(text)
# Only text sitting directly inside an li is returned.
print(doc.xpath('//div[@id = "CategoryBreadcrumb"]/ul/li/text()'))
# ['Text that i need']
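On real pages the li elements are often spread over several lines, in which case li/text() also picks up whitespace-only text nodes around the anchors. The following is a minimal sketch of one way to handle that, assuming markup of the same shape as in the question; the sample HTML and the names html, raw and wanted are made up for illustration. It restricts the match to li elements with no child elements and then strips the result in Python.

```python
import lxml.html as LH

# Hypothetical markup: same structure as the breadcrumb in the question,
# but with the line breaks and indentation that real pages usually have.
html = '''
<div id="CategoryBreadcrumb">
  <ul>
    <li>
      <a href="#">Home</a>
    </li>
    <li>
      <a href="#">Books</a>
    </li>
    <li>
      Text that i need
    </li>
  </ul>
</div>
'''

doc = LH.fromstring(html)

# li[not(*)] keeps only the li elements that have no child elements,
# so the whitespace around the anchors in the other li elements is ignored.
raw = doc.xpath('//div[@id="CategoryBreadcrumb"]/ul/li[not(*)]/text()')

# Strip surrounding whitespace and drop strings that were whitespace only.
wanted = [t.strip() for t in raw if t.strip()]
print(wanted)
```

The same XPath string can be passed to hxs.select(...) as in the answer above. In current Scrapy versions the equivalent call would likely be response.xpath(...), since the hxs selector objects belong to older releases.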


Published: 2023-08-04 09:19:00

Link: https://www.elefans.com/category/jswz/34/1415121.html