asyncio web scraping 101: fetching multiple URLs with aiohttp

Updated: 2024-10-28 12:23:43
Problem description

In an earlier question, one of the authors of aiohttp kindly suggested a way to fetch multiple URLs with aiohttp using the new async with syntax from Python 3.5:

    import aiohttp
    import asyncio

    async def fetch(session, url):
        with aiohttp.Timeout(10):
            async with session.get(url) as response:
                return await response.text()

    async def fetch_all(session, urls, loop):
        results = await asyncio.wait([loop.create_task(fetch(session, url))
                                      for url in urls])
        return results

    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        # breaks because of the first url
        urls = ['SDFKHSKHGKLHSKLJHGSDFKSJH',
                'google',
                'twitter']
        with aiohttp.ClientSession(loop=loop) as session:
            the_results = loop.run_until_complete(
                fetch_all(session, urls, loop))
            # do something with the the_results

However, when one of the session.get(url) requests breaks (as above, because of SDFKHSKHGKLHSKLJHGSDFKSJH), the error is not handled and the whole thing breaks.

I looked for ways to insert tests on the result of session.get(url), for instance a place for a try ... except ..., or for an if response.status != 200: check, but I just do not understand how to work with async with, await and the various objects.

Since async with is still very new, there are not many examples. It would be very helpful to many people if an asyncio wizard could show how to do this. After all, one of the first things most people will want to test with asyncio is getting multiple resources concurrently.

Goal

The goal is that we can inspect the_results and quickly see either:

  • this URL failed (and why: status code, maybe exception name), or
  • this URL worked, and here is a useful response object

Recommended answer

I would use gather instead of wait: it can return exceptions as objects, without raising them. Then you can check each result to see whether it is an instance of some exception.
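
To see that behaviour in isolation, here is a minimal sketch that needs only the standard library and no network. The fake_fetch coroutine is a hypothetical stand-in for a real request, and asyncio.run assumes Python 3.7 or later, so this is an illustration of return_exceptions=True rather than the answer's own code.

```python
import asyncio


async def fake_fetch(url):
    """Stand-in for a real HTTP request (hypothetical helper, no network)."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if not url.startswith("http"):
        raise ValueError("bad url: " + url)
    return "body of " + url


async def fetch_all(urls):
    # return_exceptions=True makes gather hand back exceptions as ordinary
    # values, in the same order as the coroutines that were passed in.
    return await asyncio.gather(
        *(fake_fetch(url) for url in urls),
        return_exceptions=True,
    )


def demo():
    urls = ["not-a-url", "http://example.com"]
    results = asyncio.run(fetch_all(urls))
    for url, result in zip(urls, results):
        status = "ERR" if isinstance(result, Exception) else "OK"
        print("{}: {}".format(url, status))
    return results
```

Calling demo() prints ERR for the first entry and OK for the second; without return_exceptions=True the ValueError would propagate out of gather instead. The answer's full aiohttp version follows.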

    import aiohttp
    import asyncio

    async def fetch(session, url):
        with aiohttp.Timeout(10):
            async with session.get(url) as response:
                return await response.text()

    async def fetch_all(session, urls, loop):
        results = await asyncio.gather(
            *[fetch(session, url) for url in urls],
            return_exceptions=True  # default is false, that would raise
        )

        # for testing purposes only
        # gather returns results in the order of coros
        for idx, url in enumerate(urls):
            print('{}: {}'.format(url, 'ERR' if isinstance(results[idx], Exception) else 'OK'))
        return results

    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        # breaks because of the first url
        urls = [
            'SDFKHSKHGKLHSKLJHGSDFKSJH',
            'google',
            'twitter']
        with aiohttp.ClientSession(loop=loop) as session:
            the_results = loop.run_until_complete(
                fetch_all(session, urls, loop))

Tests:

    $ python test.py
    SDFKHSKHGKLHSKLJHGSDFKSJH: ERR
    google: OK
    twitter: OK
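
The code above is written against the aiohttp API of the Python 3.5 era. With more recent aiohttp releases, a sketch along the same lines might look like the following. It assumes aiohttp.ClientTimeout, async with aiohttp.ClientSession(...), and response.raise_for_status() are available in the installed version; describe and main are illustrative names that do not appear in the original answer. The status check the question asked about is done by raise_for_status(), so a bad status ends up in the results as an exception, just like a connection error.

```python
import asyncio


async def fetch(session, url):
    # async with releases the connection even if an error is raised inside.
    async with session.get(url) as response:
        # Turn 4xx/5xx statuses into exceptions so gather can collect them.
        response.raise_for_status()
        return await response.text()


async def fetch_all(session, urls):
    # One result per URL, in input order; failures come back as exceptions.
    return await asyncio.gather(
        *(fetch(session, url) for url in urls),
        return_exceptions=True,
    )


def describe(url, result):
    """Summarise one gather result (hypothetical helper)."""
    if isinstance(result, Exception):
        reason = type(result).__name__
        # Some exceptions carry an HTTP status; include it when present.
        status = getattr(result, "status", None)
        if status is not None:
            reason += " (status {})".format(status)
        return "{}: ERR {}".format(url, reason)
    return "{}: OK, {} characters".format(url, len(result))


async def main(urls):
    # Third-party import kept local so the helpers above need only the stdlib
    # (a choice made for this sketch, not a requirement).
    import aiohttp

    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        results = await fetch_all(session, urls)
    for url, result in zip(urls, results):
        print(describe(url, result))
    return results
```

Running asyncio.run(main(urls)) with a list of full URLs prints one line per URL, naming the exception type for each failure, which is close to the goal stated in the question.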

Published: 2023-11-23 09:42:52
Article link: https://www.elefans.com/category/jswz/34/1620980.html