Maximizing the Number of Parallel Requests (aiohttp)

Problem Description


tl;dr: how do I maximize the number of HTTP requests I can send in parallel?

I am fetching data from multiple urls with the aiohttp library. I'm testing its performance, and I've observed that somewhere in the process there is a bottleneck where running more urls at once just doesn't help.

I am using this code:

import asyncio
import aiohttp

async def fetch(url, session):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
    try:
        async with session.get(
            url,
            headers=headers,
            ssl=False,
            timeout=aiohttp.ClientTimeout(total=None, sock_connect=10, sock_read=10)
        ) as response:
            content = await response.read()
            return (url, 'OK', content)
    except Exception as e:
        print(e)
        return (url, 'ERROR', str(e))

async def run(url_list):
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in url_list:
            task = asyncio.ensure_future(fetch(url, session))
            tasks.append(task)
        responses = asyncio.gather(*tasks)
        await responses
        return responses

loop = asyncio.get_event_loop()
asyncio.set_event_loop(loop)
task = asyncio.ensure_future(run(url_list))
loop.run_until_complete(task)
result = task.result().result()

Running this with url_list of varying length (tests against httpbin/delay/2), I see that adding more urls to be run at once helps only up to ~100 urls; after that, total time starts to grow in proportion to the number of urls (in other words, the time per url does not decrease). This suggests that something fails when trying to process them all at once. In addition, with more urls in one batch I occasionally receive connection timeout errors.
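(For reference, a test list of this kind might be built as below; the host is a hypothetical httpbin-style endpoint, not taken from the original post.)

# Hypothetical reproduction: N identical urls against an httpbin-style /delay/2 endpoint
url_list = ['https://httpbin.org/delay/2'] * 400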

  • Why is it happening? What exactly limits the speed here?
  • How can I check the maximum number of parallel requests I can send on a given computer? (I mean an exact number, not an approximation by 'trial and error' as above.)
  • What can I do to increase the number of requests processed at once?

I am running this on Windows.

EDIT in response to comment:

This is the same data with the limit set to None. There is only a slight improvement at the end, and there are many connection timeout errors when 400 urls are sent at once. I ended up using limit = 200 on my actual data.

Solution

By default aiohttp limits the number of simultaneous connections to 100. It does this by setting a default limit on the TCPConnector object used by ClientSession. You can bypass it by creating a custom connector and passing it to the session:

connector = aiohttp.TCPConnector(limit=None)
async with aiohttp.ClientSession(connector=connector) as session:
    # ...

Note however that you probably don't want to set this number too high: your network capacity, CPU, RAM, and the target server all have their own limits, and trying to open an enormous number of connections can lead to increasing failures.

The optimal number can probably be found only through experiments on a concrete machine.
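One way to run that experiment is to time the same batch of urls under a few candidate limits. Below is a minimal sketch along those lines; it assumes the fetch coroutine and url_list from the question, and the candidate limits are arbitrary:

import asyncio
import time

import aiohttp

async def run_with_limit(url_list, limit):
    # limit caps the number of simultaneous connections for this session
    connector = aiohttp.TCPConnector(limit=limit)
    async with aiohttp.ClientSession(connector=connector) as session:
        return await asyncio.gather(*(fetch(url, session) for url in url_list))

for limit in (50, 100, 200, 400):  # arbitrary candidate values
    start = time.perf_counter()
    results = asyncio.run(run_with_limit(url_list, limit))  # Python 3.7+
    elapsed = time.perf_counter() - start
    errors = sum(1 for _, status, _ in results if status == 'ERROR')
    print(f'limit={limit}: {elapsed:.1f}s total, {errors} errors')

Whichever limit gives the lowest total time with an acceptable error count is a reasonable working value for that machine and target.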

Unrelated:

You don't have to create tasks without a reason: most asyncio APIs accept regular coroutines. For example, your last lines of code can be altered this way:

loop = asyncio.get_event_loop()
loop.run_until_complete(run(url_list))

Or even just asyncio.run(run(url_list)) (docs) if you're using Python 3.7.
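As a related cleanup (a sketch, not part of the original answer): run in the question returns the gather future itself, which is why the caller needs the awkward task.result().result(). Returning the awaited gather instead hands the result list back directly; this assumes the fetch coroutine from the question:

import asyncio

import aiohttp

async def run(url_list):
    async with aiohttp.ClientSession() as session:
        # awaiting gather here yields the list of (url, status, content) tuples
        return await asyncio.gather(*(fetch(url, session) for url in url_list))

results = asyncio.run(run(url_list))  # Python 3.7+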
