Making 1 million requests with aiohttp / asyncio

Updated: 2024-10-27 21:24:29

Problem description

I followed this tutorial: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html and everything works fine when I make around 50,000 requests. But I need to make 1 million API calls, and then I run into a problem with this code:

    url = "some_url/?id={}"
    tasks = set()

    sem = asyncio.Semaphore(MAX_SIM_CONNS)
    for i in range(1, LAST_ID + 1):
        task = asyncio.ensure_future(bound_fetch(sem, url.format(i)))
        tasks.add(task)

    responses = asyncio.gather(*tasks)
    return await responses

Because Python needs to create 1 million tasks, it basically just lags and then prints a Killed message in the terminal. Is there any way to use a generator instead of a pre-made set (or list) of URLs? Thanks.

Recommended answer

Schedule all 1 million tasks at once

This is the code you are talking about. It takes up to 3 GB of RAM, so it can easily be terminated by the operating system if you are low on free memory.

    import asyncio
    from aiohttp import ClientSession

    MAX_SIM_CONNS = 50
    LAST_ID = 10**6

    async def fetch(url, session):
        async with session.get(url) as response:
            return await response.read()

    async def bound_fetch(sem, url, session):
        # The semaphore caps how many requests run at the same time.
        async with sem:
            await fetch(url, session)

    async def fetch_all():
        url = "http://localhost:8080/?id={}"
        tasks = set()
        async with ClientSession() as session:
            sem = asyncio.Semaphore(MAX_SIM_CONNS)
            for i in range(1, LAST_ID + 1):
                # Every task is created up front, so all 1 million exist at once.
                task = asyncio.create_task(bound_fetch(sem, url.format(i), session))
                tasks.add(task)
            return await asyncio.gather(*tasks)

    if __name__ == '__main__':
        asyncio.run(fetch_all())
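To get a feel for where the memory goes, here is a rough, network-free sketch that measures how much Python allocates when tasks are merely scheduled. The names `waiter`, `measure` and `N_TASKS` are made up for this illustration, and the figure it prints will vary with the Python version. It also excludes everything a real request adds (URL strings, response buffers, aiohttp state), so treat it as a lower bound rather than a reproduction of the 3 GB figure.

```python
import asyncio
import tracemalloc

N_TASKS = 10_000  # hypothetical sample size, far below the 1 million in the question


async def waiter(sem):
    # Stand-in for bound_fetch: acquires the semaphore, yields once, releases it.
    # No HTTP request is made.
    async with sem:
        await asyncio.sleep(0)


async def measure():
    sem = asyncio.Semaphore(50)
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    # Tasks are created here but have not started running yet,
    # so only the cost of scheduling them is measured.
    tasks = [asyncio.create_task(waiter(sem)) for _ in range(N_TASKS)]
    after, _ = tracemalloc.get_traced_memory()
    await asyncio.gather(*tasks)
    tracemalloc.stop()
    return (after - before) / N_TASKS


bytes_per_task = asyncio.run(measure())
print(f"roughly {bytes_per_task:.0f} bytes allocated per scheduled task")
```

Multiplying the printed figure by 1 million gives a sense of the baseline cost before any request has even started.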

Use a queue to streamline the work

This is my suggestion for how to use asyncio.Queue to pass URLs to worker tasks. The queue is filled as needed; there is no pre-made list of URLs.

It takes only 30 MB of RAM :)

    import asyncio
    from aiohttp import ClientSession

    MAX_SIM_CONNS = 50
    LAST_ID = 10**6

    async def fetch(url, session):
        async with session.get(url) as response:
            return await response.read()

    async def fetch_worker(url_queue):
        async with ClientSession() as session:
            while True:
                url = await url_queue.get()
                try:
                    if url is None:
                        # all work is done
                        return
                    response = await fetch(url, session)
                    # ...do something with the response
                finally:
                    url_queue.task_done()
                    # calling task_done() is necessary for the url_queue.join() to work correctly

    async def fetch_all():
        url = "http://localhost:8080/?id={}"
        # A bounded queue makes put() wait until a worker frees a slot.
        url_queue = asyncio.Queue(maxsize=100)
        worker_tasks = []
        for i in range(MAX_SIM_CONNS):
            wt = asyncio.create_task(fetch_worker(url_queue))
            worker_tasks.append(wt)
        for i in range(1, LAST_ID + 1):
            await url_queue.put(url.format(i))
        for i in range(MAX_SIM_CONNS):
            # tell the workers that the work is done
            await url_queue.put(None)
        await url_queue.join()
        await asyncio.gather(*worker_tasks)

    if __name__ == '__main__':
        asyncio.run(fetch_all())
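Since the question asks specifically about a generator, here is a different sketch, not the queue approach from the answer: a generator yields URLs lazily and `asyncio.wait` with `FIRST_COMPLETED` keeps at most a fixed number of tasks alive. `url_generator`, `fake_fetch`, `fetch_all_bounded` and `MAX_IN_FLIGHT` are hypothetical names. `fake_fetch` is a placeholder so the example runs without a server, and `LAST_ID` is reduced so it finishes quickly.

```python
import asyncio

MAX_IN_FLIGHT = 50  # plays the role of MAX_SIM_CONNS
LAST_ID = 1000      # small hypothetical value so the demo finishes quickly


def url_generator():
    # Yields URLs one at a time; no list or set of URLs is ever built.
    for i in range(1, LAST_ID + 1):
        yield f"http://localhost:8080/?id={i}"


async def fake_fetch(url):
    # Placeholder for a real request (assumption: replace with an aiohttp call).
    await asyncio.sleep(0)
    return url


async def fetch_all_bounded():
    pending = set()
    done_count = 0
    peak = 0
    for url in url_generator():
        if len(pending) >= MAX_IN_FLIGHT:
            # Block until at least one running task finishes before adding more.
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                task.result()  # re-raises any exception raised inside the task
                done_count += 1
        pending.add(asyncio.create_task(fake_fetch(url)))
        peak = max(peak, len(pending))
    if pending:
        # Drain the tasks that are still running after the generator is exhausted.
        done, _ = await asyncio.wait(pending)
        for task in done:
            task.result()
            done_count += 1
    return done_count, peak


done_count, peak = asyncio.run(fetch_all_bounded())
print(f"completed {done_count} requests, at most {peak} tasks alive at once")
```

Compared with the queue version, this creates one short-lived task per URL instead of a fixed pool of workers, but the number of tasks alive at any moment stays bounded either way.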

Published: 2023-11-23 09:49:20

Article link: https://www.elefans.com/category/jswz/34/1621000.html