aiohttp: set maximum number of requests per second - python

How can I set a maximum number of requests per second (i.e. rate-limit them) on the client side using aiohttp?

Although it's not exactly a limit on the number of requests per second, note that since v2.0, when using a ClientSession, aiohttp automatically limits the number of simultaneous connections to 100.
You can modify the limit by creating your own TCPConnector and passing it into the ClientSession. For instance, to create a client limited to 50 simultaneous requests:
import aiohttp
connector = aiohttp.TCPConnector(limit=50)
client = aiohttp.ClientSession(connector=connector)
In case it's better suited to your use case, there is also a limit_per_host parameter (which is off by default) that you can pass to limit the number of simultaneous connections to the same "endpoint". Per the docs:
limit_per_host (int) – limit for simultaneous connections to the same endpoint. Endpoints are the same if they have an equal (host, port, is_ssl) triple.
Example usage:
import aiohttp
connector = aiohttp.TCPConnector(limit_per_host=50)
client = aiohttp.ClientSession(connector=connector)
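For completeness, here is a minimal end-to-end sketch (not from the original answer; fetch_all and the example URL are made up) showing the connector used inside an async with block so the session is closed properly:

import asyncio
import aiohttp

async def fetch_all(urls):
    connector = aiohttp.TCPConnector(limit=50)  # at most 50 connections in flight
    async with aiohttp.ClientSession(connector=connector) as session:
        async def fetch(url):
            async with session.get(url) as resp:
                return await resp.text()
        return await asyncio.gather(*(fetch(u) for u in urls))

# asyncio.run(fetch_all(["https://example.com"] * 10))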

I found one possible solution here: http://compiletoi.net/fast-scraping-in-python-with-asyncio.html
Doing 3 requests at the same time is cool, doing 5000, however, is not so nice. If you try to do too many requests at the same time, connections might start to get closed, or you might even get banned from the website.
To avoid this, you can use a semaphore. It is a synchronization tool that can be used to limit the number of coroutines that do something at some point. We'll just create the semaphore before creating the loop, passing as an argument the number of simultaneous requests we want to allow:
sem = asyncio.Semaphore(5)
Then, we just replace:
page = yield from get(url, compress=True)
by the same thing, but protected by a semaphore:
with (yield from sem):
    page = yield from get(url, compress=True)
This will ensure that at most 5 requests can be done at the same time.
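On modern Python the same idea is usually written with async with rather than with (yield from sem). A rough, self-contained sketch, where get() is only a stub standing in for the real request coroutine from the linked post:

import asyncio

async def get(url, compress=True):
    # stand-in for the real request coroutine from the linked post
    await asyncio.sleep(0.1)
    return url

async def bounded_get(sem, url):
    async with sem:  # at most 5 coroutines run this block at once
        return await get(url, compress=True)

async def main():
    sem = asyncio.Semaphore(5)
    pages = await asyncio.gather(*(bounded_get(sem, f'url-{i}') for i in range(20)))
    print(len(pages))

asyncio.run(main())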

This is an example without aiohttp, but you can wrap any async method or aiohttp.request using the Limit decorator:
import asyncio
import time


class Limit(object):
    def __init__(self, calls=5, period=1):
        self.calls = calls
        self.period = period
        self.clock = time.monotonic
        self.last_reset = 0
        self.num_calls = 0

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            if self.num_calls >= self.calls:
                await asyncio.sleep(self.__period_remaining())

            period_remaining = self.__period_remaining()

            if period_remaining <= 0:
                self.num_calls = 0
                self.last_reset = self.clock()

            self.num_calls += 1

            return await func(*args, **kwargs)

        return wrapper

    def __period_remaining(self):
        elapsed = self.clock() - self.last_reset
        return self.period - elapsed


@Limit(calls=5, period=2)
async def test_call(x):
    print(x)


async def worker():
    for x in range(100):
        await test_call(x + 1)


asyncio.run(worker())

None of the solutions from the other answers worked for me (I have already tried them), because the API I use counts the rate-limit period from the end of each request. So I'm posting a new one that should work:
import asyncio
import time


class Limiter:
    def __init__(self, calls_limit: int = 5, period: int = 1):
        self.calls_limit = calls_limit
        self.period = period
        self.semaphore = asyncio.Semaphore(calls_limit)
        self.requests_finish_time = []

    async def sleep(self):
        if len(self.requests_finish_time) >= self.calls_limit:
            sleep_before = self.requests_finish_time.pop(0)
            if sleep_before >= time.monotonic():
                await asyncio.sleep(sleep_before - time.monotonic())

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            async with self.semaphore:
                await self.sleep()
                res = await func(*args, **kwargs)
                self.requests_finish_time.append(time.monotonic() + self.period)
                return res

        return wrapper
Usage:
@Limiter(calls_limit=5, period=1)
async def api_call(url):
    ...


async def main():
    tasks = [asyncio.create_task(api_call(url)) for url in urls]  # urls defined elsewhere
    await asyncio.gather(*tasks)


if __name__ == '__main__':
    loop = asyncio.get_event_loop_policy().get_event_loop()
    loop.run_until_complete(main())

Related

Python Faust, how to use take to run on multiple values

I'm trying to implement a faust agent using take to process multiple messages at the same time.
app = faust.App('vectors-stream')
vector_topic = app.topic('vector', value_type=VectorRecord)


@app.agent(vector_topic)
async def process_entities(stream: faust.streams.Stream):
    async for records in stream.take(max_=500, within=timedelta(seconds=5)):
        # yield await update_company_partition(records=records)
        yield print(len(records))
Now I'm trying to write a test, just to see that the behaviour is as I expect.
import asyncio
import random
from unittest import IsolatedAsyncioTestCase

import pytest

from app.data.kafka.consumer import process_entities, VectorRecord, app


class TestKafkaStream(IsolatedAsyncioTestCase):
    async def asyncSetUp(self) -> None:
        app.conf.store = 'memory://'

    def generate_vector(self, dim: int):
        return [random.uniform(0.001, 1) for i in range(dim)]

    @pytest.mark.asyncio()
    async def test_vectors_kafka_stream(self):
        async with process_entities.test_context() as agent:
            companies = ['se', 'spotify']
            for company in companies:
                for i in range(10):
                    _type = random.choice(['JobCascaded', 'UserHistory'])
                    msg = VectorRecord(company_slug=company, operation='upsert',
                                       vector=self.generate_vector(16), vector_type=_type, id=i)
                    await agent.put(msg)
But when I put a breakpoint on the yield print(len(records)) line, it shows that the length of records is just 1.

Asyncio.sleep() seems to sleep forever

I need to write a function which adds an object to a list and deletes it after x seconds. I use asyncio.sleep for the delay. Here is the code:
import asyncio


class Stranger:
    def __init__(self, address):
        self.address = address


class Fortress:
    def __init__(self, time_: int, attempts: int):
        self.time = time_
        self.attempts = attempts
        self.strangers_list: list = []

    async def _handle_task(self, stranger):
        self.strangers_list.append(stranger)
        index = len(self.strangers_list) - 1
        await asyncio.sleep(self.time)
        print('Woke up')
        self.strangers_list.pop(index)

    async def _create_handle_task(self, stranger):
        task = asyncio.create_task(self._handle_task(stranger))
        print('Ran _handle_task')

    def handle(self, stranger):
        asyncio.run(self._create_handle_task(stranger))


async def main(tim):
    await asyncio.sleep(tim)


if __name__ == "__main__":
    f = Fortress(2, 4)
    s = Stranger('Foo street, 32')
    f.handle(s)
    asyncio.run(main(3))
Theoretically, the output should be:
Ran _handle_task
Woke up
But it is just Ran _handle_task.
What is preventing the program from coming out of the sleep?
You've created a task, which is an example of an awaitable in asyncio.
You need to await the task in your _create_handle_task method.
async def _create_handle_task(self, stranger):
    task = asyncio.create_task(self._handle_task(stranger))
    await task
    # ^ blocks until the task is complete
    print('Ran _handle_task')
Source: asyncio docs
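As a side note (my own sketch, not part of the answer above): another way to avoid surprises here is to keep the whole flow under a single event loop, since every asyncio.run() call creates and tears down its own loop. Reusing the Fortress and Stranger classes from the question:

import asyncio

async def main():
    f = Fortress(2, 4)
    s = Stranger('Foo street, 32')
    # Awaiting the handler directly keeps everything on the one loop created by asyncio.run().
    await f._handle_task(s)
    print('strangers left:', f.strangers_list)

asyncio.run(main())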

How to change the time period for selected processes only using asyncio?

I have the following code that executes a list of methods (_tick_watchers) every 10 seconds. While this is fine for most of the methods in the _tick_watchers list, there are some that I need executed only once every 5 minutes. Any ideas for a simple and neat way to do so?
async def on_tick(self):
    while not self._exiting:
        await asyncio.sleep(10)
        now = time.time()
        # call all tick watchers
        for w in self._tick_watchers:
            w(now)
Since asyncio doesn't come with a scheduler for repeating tasks, you can generalise your on_tick() into one:
import time
import asyncio


class App:
    def my_func_1(self, now):
        print('every second\t{}'.format(now))

    def my_func_5(self, now):
        print('every 5 seconds\t{}'.format(now))

    def __init__(self):
        self.exiting = False

    async def scheduler(self, func, interval):
        while not self.exiting:
            now = time.time()
            func(now)
            await asyncio.sleep(interval)
            # to combat drift, you could try something like this:
            # await asyncio.sleep(interval + now - time.time())

    async def run(self):
        asyncio.ensure_future(self.scheduler(self.my_func_5, 5.0))
        asyncio.ensure_future(self.scheduler(self.my_func_1, 1.0))
        await asyncio.sleep(11)
        self.exiting = True


asyncio.get_event_loop().run_until_complete(App().run())  # test run
Note that currently you do not start your task every 10 seconds: you wait for 10 seconds after the task exits. The small difference accumulates over time, which might be important.
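If that drift matters, one option (a sketch of my own, not from the answer above) is to schedule against absolute deadlines taken from the loop clock instead of sleeping a fixed interval after each run:

import asyncio
import time

async def scheduler_no_drift(func, interval):
    """Drift-free variant: schedule against absolute deadlines on the loop clock."""
    loop = asyncio.get_running_loop()
    next_run = loop.time()
    while True:
        func(time.time())
        next_run += interval
        # Sleep until the next deadline; if func overran the interval, don't sleep at all.
        await asyncio.sleep(max(0.0, next_run - loop.time()))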
Do you have a class that handles the periodic execution? If so, you can add a timeout parameter to it.
timeout in __init__:
def __init__(self, timeout=10):
    self.timeout = timeout
and use it in tick handler:
async def on_tick(self):
    while not self._exiting:
        await asyncio.sleep(self.timeout)
        # ...
Then create and run several instances of that class with different timeouts.
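A rough sketch of what "several instances with different timeouts" could look like; Ticker and the lambdas are illustrative names, not code from the question:

import asyncio
import time

class Ticker:
    def __init__(self, watchers, timeout=10):
        self._watchers = watchers
        self.timeout = timeout
        self._exiting = False

    async def on_tick(self):
        while not self._exiting:
            await asyncio.sleep(self.timeout)
            now = time.time()
            for w in self._watchers:
                w(now)

async def main():
    fast = Ticker([lambda now: print('10-second tick', now)], timeout=10)
    slow = Ticker([lambda now: print('5-minute tick', now)], timeout=300)
    await asyncio.gather(fast.on_tick(), slow.on_tick())

# asyncio.run(main())  # runs until interrupted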

aiohttp: rate limiting parallel requests

APIs often have rate limits that users have to follow. As an example let's take 50 requests/second. Sequential requests take 0.5-1 second and thus are too slow to come close to that limit. Parallel requests with aiohttp, however, exceed the rate limit.
To poll the API as fast as allowed, one needs to rate limit parallel calls.
Examples that I found so far decorate session.get, approximately like so:
session.get = rate_limited(max_calls_per_second)(session.get)
This works well for sequential calls. Trying to implement this in parallel calls does not work as intended.
Here's some code as example:
async with aiohttp.ClientSession() as session:
    session.get = rate_limited(max_calls_per_second)(session.get)
    tasks = (asyncio.ensure_future(download_coroutine(
        timeout, session, url)) for url in urls)
    process_responses_function(await asyncio.gather(*tasks))
The problem with this is that it will rate-limit the queueing of the tasks. The execution with gather will still happen more or less at the same time. Worst of both worlds ;-).
Yes, I found a similar question right here, aiohttp: set maximum number of requests per second, but neither reply answers the actual question of limiting the rate of requests. Also, the blog post from Quentin Pradet only rate-limits the queueing.
To wrap it up: How can one limit the number of requests per second for parallel aiohttp requests?
If I understand you correctly, you want to limit the number of simultaneous requests?
There is an object in asyncio named Semaphore; it works like an asynchronous RLock.
semaphore = asyncio.Semaphore(50)
# ...

async def limit_wrap(url):
    async with semaphore:
        # do what you want
        ...

# ...
results = await asyncio.gather(*[limit_wrap(url) for url in urls])
Updated:
Suppose I make 50 concurrent requests and they all finish in 2 seconds. That doesn't hit the limit (only 25 requests per second). That means I should make 100 concurrent requests and have them all finish in 2 seconds as well (50 requests per second). But before you actually make those requests, how could you determine how long they will take to finish?
Or, if what you care about is not requests finished per second but requests made per second, you can:
async def loop_wrap(urls):
    for url in urls:
        asyncio.ensure_future(download(url))
        await asyncio.sleep(1/50)

asyncio.ensure_future(loop_wrap(urls))
loop.run_forever()
The code above will create a Future instance every 1/50 second.
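Putting both ideas together (a concurrency cap plus pacing the creation of tasks) with aiohttp might look roughly like this; fetch(), crawl() and the 50/s figure are placeholders, not code from the answer:

import asyncio
import aiohttp

async def fetch(session, semaphore, url):
    async with semaphore:  # cap on simultaneous requests
        async with session.get(url) as resp:
            return await resp.text()

async def crawl(urls, per_second=50, max_concurrent=50):
    semaphore = asyncio.Semaphore(max_concurrent)
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in urls:
            tasks.append(asyncio.create_task(fetch(session, semaphore, url)))
            await asyncio.sleep(1 / per_second)  # start at most per_second tasks per second
        return await asyncio.gather(*tasks)

# asyncio.run(crawl(["https://example.com"] * 100))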
I approached the problem by creating a subclass of aiohttp.ClientSession() with a ratelimiter based on the leaky-bucket algorithm. I use asyncio.Queue() for ratelimiting instead of Semaphores. I’ve only overridden the _request() method. I find this approach cleaner since you only replace session = aiohttp.ClientSession() with session = ThrottledClientSession(rate_limit=15).
import asyncio
import time
from typing import Optional

import aiohttp


class ThrottledClientSession(aiohttp.ClientSession):
    """
    Rate-throttled client session class inherited from aiohttp.ClientSession.

    USAGE:
        replace `session = aiohttp.ClientSession()`
        with `session = ThrottledClientSession(rate_limit=15)`

    see https://stackoverflow.com/a/60357775/107049
    """

    MIN_SLEEP = 0.1

    def __init__(self, rate_limit: float = None, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.rate_limit = rate_limit
        self._fillerTask = None
        self._queue = None
        self._start_time = time.time()
        if rate_limit is not None:
            if rate_limit <= 0:
                raise ValueError('rate_limit must be positive')
            self._queue = asyncio.Queue(min(2, int(rate_limit) + 1))
            self._fillerTask = asyncio.create_task(self._filler(rate_limit))

    def _get_sleep(self) -> Optional[float]:
        if self.rate_limit is not None:
            return max(1 / self.rate_limit, self.MIN_SLEEP)
        return None

    async def close(self) -> None:
        """Close rate-limiter's "bucket filler" task"""
        if self._fillerTask is not None:
            self._fillerTask.cancel()
            try:
                await asyncio.wait_for(self._fillerTask, timeout=0.5)
            except asyncio.TimeoutError as err:
                print(str(err))
        await super().close()

    async def _filler(self, rate_limit: float = 1):
        """Filler task to fill the leaky bucket algo"""
        try:
            if self._queue is None:
                return
            self.rate_limit = rate_limit
            sleep = self._get_sleep()
            updated_at = time.monotonic()
            fraction = 0
            extra_increment = 0
            for i in range(0, self._queue.maxsize):
                self._queue.put_nowait(i)
            while True:
                if not self._queue.full():
                    now = time.monotonic()
                    increment = rate_limit * (now - updated_at)
                    fraction += increment % 1
                    extra_increment = fraction // 1
                    items_2_add = int(min(self._queue.maxsize - self._queue.qsize(),
                                          int(increment) + extra_increment))
                    fraction = fraction % 1
                    for i in range(0, items_2_add):
                        self._queue.put_nowait(i)
                    updated_at = now
                await asyncio.sleep(sleep)
        except asyncio.CancelledError:
            print('Cancelled')
        except Exception as err:
            print(str(err))

    async def _allow(self) -> None:
        if self._queue is not None:
            # debug
            # if self._start_time == None:
            #     self._start_time = time.time()
            await self._queue.get()
            self._queue.task_done()
        return None

    async def _request(self, *args, **kwargs) -> aiohttp.ClientResponse:
        """Throttled _request()"""
        await self._allow()
        return await super()._request(*args, **kwargs)
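A minimal usage sketch (the URL is a placeholder); as the docstring says, the only change compared to a plain session is the class name and the rate_limit argument:

import asyncio

async def main():
    async with ThrottledClientSession(rate_limit=15) as session:
        async with session.get('https://example.com') as resp:
            print(resp.status)

asyncio.run(main())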
I liked how @sraw approached this with asyncio, but their answer didn't quite cut it for me. Since I don't know whether my calls to download will each be faster or slower than the rate limit, I want the option to run many in parallel when requests are slow and only one at a time when requests are very fast, so that I'm always right at the rate limit.
I do this by using a queue with a producer that produces new tasks at the rate limit, and many consumers that will either all wait for the next job if they're fast, or work through the backlog in the queue as fast as the processor/network allows if they're slow:
import asyncio
from datetime import datetime


async def download(url):
    # download or whatever
    task_time = 1/10
    await asyncio.sleep(task_time)
    result = datetime.now()
    return result, url


async def producer_fn(queue, urls, max_per_second):
    for url in urls:
        await queue.put(url)
        await asyncio.sleep(1/max_per_second)


async def consumer(work_queue, result_queue):
    while True:
        url = await work_queue.get()
        result = await download(url)
        work_queue.task_done()
        await result_queue.put(result)


urls = range(20)


async def main():
    work_queue = asyncio.Queue()
    result_queue = asyncio.Queue()

    num_consumer_tasks = 10
    max_per_second = 5

    consumers = [asyncio.create_task(consumer(work_queue, result_queue))
                 for _ in range(num_consumer_tasks)]
    producer = asyncio.create_task(producer_fn(work_queue, urls, max_per_second))
    await producer

    # wait for the remaining tasks to be processed
    await work_queue.join()

    # cancel the consumers, which are now idle
    for c in consumers:
        c.cancel()

    while not result_queue.empty():
        result, url = await result_queue.get()
        print(f'{url} finished at {result}')


asyncio.run(main())
I developed a library named octopus-api (https://pypi.org/project/octopus-api/) that lets you rate-limit and set the number of parallel connections to the endpoint, using aiohttp under the hood. The goal is to simplify all the aiohttp setup needed.
Here is an example of how to use it, where the get_ethereum is the user-defined request function:
from octopus_api import TentacleSession, OctopusApi
from typing import Dict, List

if __name__ == '__main__':
    async def get_ethereum(session: TentacleSession, request: Dict):
        async with session.get(url=request["url"], params=request["params"]) as response:
            body = await response.json()
            return body

    client = OctopusApi(rate=50, resolution="sec", connections=6)
    result: List = client.execute(requests_list=[{
        "url": "https://api.pro.coinbase.com/products/ETH-EUR/candles?granularity=900&start=2021-12-04T00:00:00Z&end=2021-12-04T00:00:00Z",
        "params": {}}] * 1000, func=get_ethereum)
    print(result)
The TentacleSession works the same as how you write POST, GET, PUT and PATCH for aiohttp.ClientSession.
Let me know if it helps your issue related to rate limits and parallel calls.
As for the part of the question about n requests being sent at the same time when gather() is called, the key is using create_task() with an await asyncio.sleep(1.1) before every call. Any task created with create_task() is scheduled to run immediately:
for i in range(THREADS):
    await asyncio.sleep(1.1)
    tasks.append(
        asyncio.create_task(getData(session, q, ''.join(random.choice(string.ascii_lowercase) for i in range(10))))
    )
await asyncio.gather(*tasks)
The other issue, limiting the number of simultaneous connections, is also solved in the example below by using the ClientSession() context in async_payload_wrapper and setting the connector with a limit.
With this setup I can run 25 coroutines (THREADS=25) that each loop over a queue of URLs and not violate a 25-concurrent-connection rule:
import asyncio
import random
import string
import time
from datetime import datetime

import aiohttp
from aiohttp import ClientSession, ClientTimeout

# THREADS, urls, errors and processData are assumed to be defined elsewhere in the original script.


async def send_request(session, url, routine):
    start_time = time.time()
    print(f"{routine}, sending request: {datetime.now()}")
    params = {
        'api_key': 'nunya',
        'url': '%s' % url,
        'render_js': 'false',
        'premium_proxy': 'false',
        'country_code': 'us'
    }
    try:
        async with session.get(url='http://yourAPI.com', params=params) as response:
            data = await response.content.read()
            print(f"{routine}, done request: {time.time() - start_time} seconds")
            return data
    except asyncio.TimeoutError as e:
        print('timeout---------------------')
        errors.append(url)
    except aiohttp.ClientResponseError as e:
        print('request failed - Server Error')
        errors.append(url)
    except Exception as e:
        errors.append(url)


async def getData(session, q, test):
    while True:
        if not q.empty():
            url = q.get_nowait()
            resp = await send_request(session, url, test)
            if resp is not None:
                processData(resp, test, url)
        else:
            print(f'{test} queue empty')
            break


async def async_payload_wrapper():
    tasks = []
    q = asyncio.Queue()
    for url in urls:
        await q.put(url)

    async with ClientSession(connector=aiohttp.TCPConnector(limit=THREADS),
                             timeout=ClientTimeout(total=61),
                             raise_for_status=True) as session:
        for i in range(THREADS):
            await asyncio.sleep(1.1)
            tasks.append(
                asyncio.create_task(getData(session, q, ''.join(random.choice(string.ascii_lowercase) for i in range(10))))
            )
        await asyncio.gather(*tasks)


if __name__ == '__main__':
    start_time = time.time()
    asyncio.run(async_payload_wrapper())

Python 3.5 async for blocks the ioloop

I have a simple aiohttp-server with two handlers.
The first one does some computations in an async for loop. The second one just returns a text response. not_so_long_operation returns the 30th Fibonacci number using the slowest recursive implementation, which takes about a second.
def not_so_long_operation():
    return fib(30)


class arange:
    def __init__(self, n):
        self.n = n
        self.i = 0

    async def __aiter__(self):
        return self

    async def __anext__(self):
        i = self.i
        self.i += 1
        if self.i <= self.n:
            return i
        else:
            raise StopAsyncIteration
# GET /
async def index(request):
    print('request!')
    l = []
    async for i in arange(20):
        print(i)
        l.append(not_so_long_operation())
    return aiohttp.web.Response(text='%d\n' % l[0])


# GET /lol/
async def lol(request):
    print('request!')
    return aiohttp.web.Response(text='just respond\n')
When I try to fetch / and then /lol/, I only get a response to the second one after the first one has finished.
What am I doing wrong, and how can I make the index handler release the ioloop on each iteration?
Your example has no yield points (await statements) for switching between tasks.
An asynchronous iterator allows you to use await inside __aiter__/__anext__, but it doesn't insert yield points into your code automatically.
Say,
class arange:
    def __init__(self, n):
        self.n = n
        self.i = 0

    async def __aiter__(self):
        return self

    async def __anext__(self):
        i = self.i
        self.i += 1
        if self.i <= self.n:
            await asyncio.sleep(0)  # insert yield point
            return i
        else:
            raise StopAsyncIteration
should work as you expected.
In a real application you most likely won't need the await asyncio.sleep(0) calls, because you will be waiting on database access and similar activities anyway.
Since fib(30) is CPU-bound and shares little data, you should probably use a ProcessPoolExecutor (as opposed to a ThreadPoolExecutor):
async def index(request):
    loop = request.app.loop
    executor = request.app["executor"]
    result = await loop.run_in_executor(executor, fib, 30)
    return web.Response(text="%d" % result)
Set up the executor when you create the app:
app = Application(...)
app["executor"] = ProcessPoolExecutor()
An asynchronous iterator is not really needed here. Instead you can simply give the control back to the event loop inside your loop. In python 3.4, this is done by using a simple yield:
@asyncio.coroutine
def index(self):
    for i in range(20):
        not_so_long_operation()
        yield
In python 3.5, you can define an Empty object that basically does the same thing:
class Empty:
    def __await__(self):
        yield
Then use it with the await syntax:
async def index(request):
    for i in range(20):
        not_so_long_operation()
        await Empty()
Or simply use asyncio.sleep(0) that has been recently optimized:
async def index(request):
    for i in range(20):
        not_so_long_operation()
        await asyncio.sleep(0)
You could also run the not_so_long_operation in a thread using the default executor:
async def index(request, loop):
    for i in range(20):
        await loop.run_in_executor(None, not_so_long_operation)
