Web scraping Google search results with Python

Updated: 2024-10-09 22:19:14
Question

I've been learning a lot of Python lately for some projects at work.

Currently I need to do some web scraping of Google search results. I found several sites that demonstrate how to use the Google AJAX Search API, but after trying it, it appears to be no longer supported. Any suggestions?

I've been searching for quite a while but can't find any solution that currently works.

Recommended answer

You can always scrape Google results directly. To do this, request the URL https://www.google.com/search?q=<Query>, which returns the top 10 search results.
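The query has to be URL-encoded before it goes into that address. A minimal sketch of building the request URL follows; the helper name build_search_url and the constant SEARCH_URL are hypothetical, and https://www.google.com/search is assumed as the base address.

```python
from urllib.parse import urlencode

# Assumed base address of the search endpoint.
SEARCH_URL = "https://www.google.com/search"


def build_search_url(query: str) -> str:
    """Return the search URL for `query` (hypothetical helper)."""
    # urlencode escapes spaces and reserved characters in the query.
    return SEARCH_URL + "?" + urlencode({"q": query})


print(build_search_url("web scraping python"))
# prints https://www.google.com/search?q=web+scraping+python
```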

Then you can parse the page, for example with lxml. Depending on the parser you use, you can query the resulting node tree either with a CSS selector (.r a) or with an XPath selector (//h3[@class="r"]/a).
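To see what the XPath selector picks out, here is a small sketch run against made-up markup. It deliberately swaps in the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages; ElementTree only supports a subset of XPath and needs well-formed input, so real result pages still call for lxml, where the equivalent call would be page.xpath('//h3[@class="r"]/a'). The SAMPLE markup is a hypothetical simplification, not Google's actual HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup with the structure the selectors assume.
SAMPLE = """
<div>
  <h3 class="r"><a href="https://example.com/one">First</a></h3>
  <h3 class="r"><a href="/url?q=https://example.com/two">Second</a></h3>
  <h3 class="other"><a href="https://example.com/ignored">Ignored</a></h3>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

# Same idea as //h3[@class="r"]/a, written relative to the root element.
links = [a.get("href") for a in root.findall(".//h3[@class='r']/a")]
print(links)
# prints ['https://example.com/one', '/url?q=https://example.com/two']
```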

In some cases the resulting URL is a redirect through Google. It usually contains a query parameter q, which holds the actual target URL.
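The unwrapping step can be isolated into a small function. This is a sketch under assumptions: the name unwrap_redirect is hypothetical, the redirect is assumed to use the path /url, and the extra sa=U parameter in the usage line is only there for illustration.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_redirect(href: str) -> str:
    """Return the target of a /url?q=... link, or href unchanged otherwise."""
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs maps each parameter name to a list of values.
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    return href


print(unwrap_redirect("/url?q=https://stackoverflow.com/&sa=U"))
print(unwrap_redirect("https://stackoverflow.com/"))
# both lines print https://stackoverflow.com/
```

Returning the input unchanged when there is no q parameter keeps direct links usable without a separate check at the call site.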

Example code using lxml and requests:

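    from urllib.parse import urlparse, parse_qs

    from lxml.html import fromstring
    from requests import get

    # Fetch the result page and parse it into an element tree.
    raw = get("https://www.google.com/search?q=StackOverflow").text
    page = fromstring(raw)

    # page.cssselect needs the cssselect package to be installed.
    for result in page.cssselect(".r a"):
        url = result.get("href")
        # Redirect links look like /url?q=<target>; pull out the target.
        if url.startswith("/url?"):
            url = parse_qs(urlparse(url).query)["q"][0]
        print(url)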

A note on Google banning your IP: in my experience, Google only bans you if you start spamming it with search requests. It responds with a 503 if it thinks you are a bot.
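One way to avoid hammering the server once a 503 shows up is to wait longer before each retry. The following is only a sketch of that idea, not something the answer prescribes: fetch_with_backoff and its parameters are hypothetical, and fetch stands for any callable that returns a (status code, body) pair, for example a thin wrapper around requests.get.

```python
import time
from typing import Callable, Tuple


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch` until it stops returning 503, doubling the wait each time."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 503:
            return body
        # Wait 1x, 2x, 4x ... base_delay, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("still receiving 503 after %d attempts" % max_attempts)


# Demo with a fake fetch that is blocked twice before succeeding.
responses = iter([(503, ""), (503, ""), (200, "<html>ok</html>")])
waits = []
body = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(body, waits)
# prints <html>ok</html> [1.0, 2.0]
```

Passing sleep as a parameter lets the demo record the delays instead of actually waiting.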

Published: 2023-11-28 19:27:08
Source: https://www.elefans.com/category/jswz/34/1643662.html