Does anyone know of a Python-based web crawler that I could use?

Updated: 2024-10-26 11:24:50

Problem description

I'm half-tempted to write my own, but I don't really have enough time right now. I've seen the Wikipedia list of open-source crawlers, but I'd prefer something written in Python. I realize that I could probably just use one of the tools on the Wikipedia page and wrap it in Python. I might end up doing that; if anyone has advice about any of those tools, I'm open to hearing it. I've used Heritrix via its web interface and found it quite cumbersome. I definitely won't be using a browser API for my upcoming project.

Thanks in advance. Also, this is my first SO question!