I'm refactoring the following functions to use requests_async instead of aiohttp.
Original code:
async def ping_url(session, url):
    logger.debug('Ping: %s', url)
    try:
        response = await session.get(url)
    except asyncio.TimeoutError:
        logger.info('Ping timeout for %s', url)
        return ('TIMEOUT', url)
    else:
        logger.debug('Ping done for %s', url)
        return (response.status, url)

async def ping_urls(urls, headers=None):
    timeout = aiohttp.ClientTimeout(total=2)  # presumably defined like this; the refactored version below uses a 2 s timeout
    async with aiohttp.ClientSession(timeout=timeout, headers=headers) as session:
        tasks = [
            asyncio.create_task(
                ping_url(session, url)
            ) for url in urls
        ]
        ping_results = await asyncio.gather(*tasks)
        return ping_results
After refactoring:
async def ping_url(session, url):
    logger.debug('Ping: %s', url)
    try:
        response = await session.get(url)
    except requests_async.exceptions.Timeout:
        logger.info('Ping timeout for %s', url)
        return ('TIMEOUT', url)
    else:
        logger.debug('Ping done for %s', url)
        return (response.status_code, url)

async def ping_urls(urls, headers=None):
    async with requests_async.Session() as session:
        session.headers.update(headers or {})
        session.timeout = 2
        tasks = [
            asyncio.create_task(
                ping_url(session, url)
            ) for url in urls
        ]
        ping_results = await asyncio.gather(*tasks)
        return ping_results
If I run the ping_urls coroutine on my machine against the same URL pinged by my lambda, there is no significant difference in performance between the two versions.
Inside my AWS Lambda, the requests_async version is 3 times slower than the aiohttp version.
Any idea what could be the cause of the difference, and why it affects the code only when it's executed in the AWS Lambda?
I've checked that the versions of the dependencies installed by my packaging scripts are the same as the ones I use locally.
I have also edited my lambda to execute both versions sequentially in the same invocation, to ensure that the execution context is the same, but the performance difference is unchanged (3:1).
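For reference, a minimal sketch of how that side-by-side timing inside a single invocation might look. The handler below is hypothetical: aiohttp_ping_urls and requests_async_ping_urls stand for the two ping_urls implementations above, imported under distinct aliases.
import asyncio
import time

def lambda_handler(event, context):
    # Hypothetical harness: time both implementations in the same
    # execution context, so cold starts affect them equally.
    urls = event['urls']
    timings = {}
    for name, ping_urls in [('aiohttp', aiohttp_ping_urls),
                            ('requests_async', requests_async_ping_urls)]:
        start = time.perf_counter()
        asyncio.run(ping_urls(urls))
        timings[name] = time.perf_counter() - start
    return timings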
Problem I'm trying to solve:
I'm making many API requests to a server. I'm trying to create delays between the async API calls to comply with the server's rate limit policy.
What I want it to do
I want it to behave like this:
Make api request #1
wait 0.1 seconds
Make api request #2
wait 0.1 seconds
... and so on ...
repeat until all requests are made
gather the responses and return the results in one object (results)
Issue:
When I introduced asyncio.sleep() or time.sleep() into the code, it still made the API requests almost instantaneously. It seemed to delay the execution of print(), but not the API requests. I suspect that I have to create the delays within the loop, not in fetch_one() or fetch_all(), but I couldn't figure out how to do so.
Code block:
async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):
    # time.sleep(delay)
    # asyncio.sleep(delay)
    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.get(url, ssl=SSLContext()) as resp:
            # print("An api call to ", url, " is made at ", time.time())
            # print(resp)
            return await resp

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))
Versions I'm using:
python 3.8.5
aiohttp 3.7.4
asyncio 3.4.3
I would appreciate any tips pointing me in the right direction!
The call to asyncio.gather will launch all requests "simultaneously" - and on the other hand, if you were to simply use a lock or await each task in turn, you would not gain anything from using parallelism at all.
The simplest thing to do, if you know the rate at which you can issue the requests, is simply to increase the asynchronous pause before each request in succession - a simple global variable can do that:
next_delay = 0.1

async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):
    global next_delay
    next_delay += delay
    await asyncio.sleep(next_delay)
    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.get(url, ssl=SSLContext()) as resp:
            # print("An api call to ", url, " is made at ", time.time())
            # print(resp)
            return await resp.text()  # await a body reader; awaiting resp itself raises TypeError

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))
Now, if you want to, say, issue 5 requests and then issue the next 5, you can use a synchronization primitive - the code below uses an asyncio.Event that is cleared and set depending on how many API calls are active (an asyncio.Condition, using its wait_for on an expression that checks the count, would work as well):
active_calls = 0
MAX_CALLS = 5

async def fetch_all(loop, urls, delay):
    event = asyncio.Event()
    event.set()
    results = await asyncio.gather(*[fetch_one(loop, url, delay, event) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay, event):
    global active_calls
    await event.wait()  # wait first, count after, so waiting coroutines aren't counted as active
    active_calls += 1
    if active_calls >= MAX_CALLS:
        event.clear()
    try:
        async with aiohttp.ClientSession(loop=loop) as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                # print("An api call to ", url, " is made at ", time.time())
                # print(resp)
                return await resp.text()
    finally:
        active_calls -= 1
        if active_calls == 0:
            event.set()  # note: when the event is set, all waiters resume together

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))
For both examples, should your design need to avoid global variables (strictly speaking, these are "module" variables), you could either move all the functions to a class and work on an instance, promoting the globals to instance attributes, or use a mutable container, such as a list holding the active_calls value in its first item, and pass that as a parameter. A sketch of the class-based variant follows.
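Here is a minimal sketch of the class-based variant of the first example, keeping the same staggered-delay idea (the class name StaggeredFetcher is made up for illustration):
import asyncio
import aiohttp

class StaggeredFetcher:
    def __init__(self, delay):
        self.delay = delay
        self.next_delay = 0.0  # replaces the module-level next_delay

    async def fetch_one(self, session, url):
        # Each coroutine books a later slot before sleeping, so the
        # requests go out roughly `delay` seconds apart.
        self.next_delay += self.delay
        await asyncio.sleep(self.next_delay)
        async with session.get(url) as resp:
            return await resp.text()

    async def fetch_all(self, urls):
        async with aiohttp.ClientSession() as session:
            return await asyncio.gather(
                *[self.fetch_one(session, url) for url in urls],
                return_exceptions=True)
Note that this sketch also shares one ClientSession across all requests, which avoids creating a new connection pool for every call.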
When you use asyncio.gather you run all fetch_one coroutines concurrently: all of them wait for the delay together, then make their API calls practically simultaneously.
To solve the issue, you should either await the fetch_one calls one by one in fetch_all, or use a Semaphore to signal that the next coroutine shouldn't start before the previous one is done (a sketch of the one-by-one variant follows the Semaphore example below).
Here's the idea:
import asyncio

_sem = asyncio.Semaphore(1)

async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):
    async with _sem:  # the next coroutine(s) will wait here until the previous one is done
        await asyncio.sleep(delay)
        async with aiohttp.ClientSession(loop=loop) as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                # print("An api call to ", url, " is made at ", time.time())
                # print(resp)
                return await resp.text()  # await a body reader; awaiting resp itself raises TypeError

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))
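And a minimal sketch of the one-by-one alternative mentioned above - plain sequential awaits, which respect the rate limit trivially at the cost of all concurrency (fetch_one here is the function defined just above):
async def fetch_all_sequential(loop, urls, delay):
    # The next request only starts after the previous one has
    # finished; fetch_one itself sleeps for `delay` first.
    results = []
    for url in urls:
        results.append(await fetch_one(loop, url, delay))
    return results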
I am trying to achieve asynchronous processing of requests with aiohttp, where the requests are defined in my class as follows:
class Async():
    async def get_service_1(self, zip_code, session):
        url = SERVICE1_ENDPOINT.format(zip_code)
        response = await session.request('GET', url)
        return await response

    async def get_service_2(self, zip_code, session):
        url = SERVICE2_ENDPOINT.format(zip_code)
        response = await session.request('GET', url)
        return await response

    async def gather(self, zip_code):
        async with aiohttp.ClientSession() as session:
            return await asyncio.gather(
                self.get_service_1(zip_code, session),
                self.get_service_2(zip_code, session)
            )

    def get_async_requests(self, zip_code):
        asyncio.set_event_loop(asyncio.SelectorEventLoop())
        loop = asyncio.get_event_loop()
        results = loop.run_until_complete(self.gather(zip_code))
        loop.close()
        return results
When I run get_async_requests to get the results, I get the following error:
TypeError: object ClientResponse can't be used in 'await' expression
Where am I going wrong in the code? Thank you in advance.
When you await something like session.request('GET', url), the I/O starts, but aiohttp returns as soon as it receives the headers; it doesn't wait for the response to finish. (This lets you react to a status code without waiting for the entire body of the response.)
You need to await something that does wait for the body. If you're expecting a response that contains text, that would be response.text(). If you're expecting JSON, that's response.json(). This would look something like:
response = await session.get(url)
return await response.text()
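Applied to the class from the question, the fix might look like this sketch (assuming the services return text; response.json() would be the choice for JSON bodies, and get_service_2 needs the same change):
class Async():
    async def get_service_1(self, zip_code, session):
        url = SERVICE1_ENDPOINT.format(zip_code)
        response = await session.request('GET', url)
        # Await a body reader instead of the ClientResponse object itself.
        return await response.text()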
I need to send over 1 million HTTP requests and so far every option I've tried is just way too slow. I thought I could speed it up with aiohttp but that doesn't seem any faster than requests.
I was trying to do it with python but I'm open to other options as well.
Here is the code using both requests and aiohttp; any tips for speeding up the process?
requests code:
import requests

url = 'https://mysite.mysite:443/login'
users = [line.strip() for line in open("ids.txt", "r")]

try:
    for user in users:
        r = requests.post(url, data={'username': user})
        if 'login.error.invalid.username' not in r.text:
            print(user, " is valid")
        else:
            print(user, " not found")
except Exception as e:
    print(e)
aiohttp code:
import aiohttp
import asyncio

url = 'https://mysite.mysite:443/login'
users = [line.strip() for line in open("ids.txt", "r")]

async def main():
    async with aiohttp.ClientSession() as session:
        try:
            for user in users:
                payload = {"timeZoneOffSet": "240", "useragent": '', "username": user}
                async with session.post(url, data=payload) as resp:
                    if 'login.error.invalid.username' not in await resp.text():
                        print(user, " is valid")
                    else:
                        print(user, " not found")
        except Exception as e:
            print(e)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
You could use asyncio.gather to collect the results from a bunch of requests working in parallel.
Warning: code is just an example and is not tested.
import asyncio
from aiohttp import ClientSession

async def fetch(user, url, session):
    payload = {'username': user}
    async with session.post(url, data=payload) as resp:
        valid = 'login.error.invalid.username' not in await resp.text()
        if valid:
            print(user, " is valid")
        else:
            print(user, " not found")
        return user, valid

async def run(users, url):
    tasks = []
    async with ClientSession() as session:
        for user in users:
            task = asyncio.ensure_future(fetch(user, url, session))
            tasks.append(task)
        responses = await asyncio.gather(*tasks)
        # you now have all (user, valid) results in this variable
        return responses

loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run(users, url))
results = loop.run_until_complete(future)
I have a script that checks the status code for a couple hundred thousand supplied websites, and I was trying to integrate a Semaphore to the flow to speed up processing. The problem is that whenever I integrate a Semaphore, I just get a list populated with None objects, and I'm not entirely sure why.
I have been mostly copying code from other sources, as I don't fully grok asynchronous programming yet, but it seems like I should be getting results out of the function when I debug; something is going wrong when I gather the results. I've tried juggling around my looping, my gathering, ensuring futures, etc., but nothing seems to return a list of things that work.
async def fetch(session, url):
    try:
        async with session.head(url, allow_redirects=True) as resp:
            return url, resp.real_url, resp.status, resp.reason
    except Exception as e:
        return url, None, e, 'Error'

async def bound_fetch(sem, session, url):
    async with sem:
        await fetch(session, url)

async def run(urls):
    timeout = 15
    tasks = []
    sem = asyncio.Semaphore(100)
    conn = aiohttp.TCPConnector(limit=64, ssl=False)
    async with aiohttp.ClientSession(connector=conn) as session:
        for url in urls:
            task = asyncio.wait_for(bound_fetch(sem, session, url), timeout)
            tasks.append(task)
        responses = await asyncio.gather(*tasks)
        # responses = [await f for f in tqdm.tqdm(asyncio.as_completed(tasks), total=len(tasks))]
        return responses

urls = ['https://google.com', 'https://yahoo.com']
loop = asyncio.ProactorEventLoop()
data = loop.run_until_complete(run(urls))
I've commented out the progress bar component, but that implementation returns the desired results when there is no semaphore.
Any help would be greatly appreciated. I am furiously reading up on asynchronous programming, but I can't wrap my mind around it yet.
You should explicitly return the results of the awaited coroutines.
Replace this code...
async def bound_fetch(sem, session, url):
    async with sem:
        await fetch(session, url)
... with this:
async def bound_fetch(sem, session, url):
    async with sem:
        return await fetch(session, url)
The Getting Started docs for aiohttp give the following client example:
import asyncio
import aiohttp

async def fetch_page(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            assert response.status == 200
            return await response.read()

loop = asyncio.get_event_loop()
with aiohttp.ClientSession(loop=loop) as session:
    content = loop.run_until_complete(
        fetch_page(session, 'http://python.org'))
print(content)
And they give the following note for Python 3.4 users:
If you are using Python 3.4, please replace await with yield from and async def with a @coroutine decorator.
If I follow these instructions I get:
import aiohttp
import asyncio

@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return (yield from response.text())

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, 'http://python.org'))
    print(html)
However, this will not run, because async with is not supported in Python 3.4:
$ python3 client.py
File "client.py", line 7
async with session.get(url) as response:
^
SyntaxError: invalid syntax
How can I translate the async with statement to work with Python 3.4?
Just don't use the result of session.get() as a context manager; use it as a coroutine directly instead. The request context manager that session.get() produces would normally release the request on exit, but so does using response.text(), so you could ignore that here:
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        response = yield from session.get(url)
        return (yield from response.text())
The request wrapper returned here doesn't have the required asynchronous methods (__aenter__ and __aexit__); they are omitted entirely when not using Python 3.5 (see the relevant source code).
If you have more statements between the session.get() call and accessing the response.text() awaitable, you probably want to use a try:..finally: anyway to release the connection; the Python 3.5 release context manager also closes the response if an exception occurred. Because a yield from response.release() is needed here, this can't be encapsulated in a context manager on Python 3.4:
import sys

@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        response = yield from session.get(url)
        try:
            # other statements
            return (yield from response.text())
        finally:
            if sys.exc_info()[0] is not None:
                # on exceptions, close the connection altogether
                response.close()
            else:
                yield from response.release()
aiohttp's examples are implemented using the 3.4 syntax. Based on the json client example, your function would be:
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(10):
        resp = yield from session.get(url)
        try:
            return (yield from resp.text())
        finally:
            yield from resp.release()
Update:
Note that Martijn's solution would work for simple cases, but may lead to unwanted behavior in specific cases:
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(5):
        response = yield from session.get(url)
        # Any actions that may lead to an error:
        1/0
        return (yield from response.text())

# exception + warning "Unclosed response"
Besides the exception, you'll also get the warning "Unclosed response". This may lead to a connection leak in a complex app. You will avoid this problem if you manually call resp.release()/resp.close():
@asyncio.coroutine
def fetch(session, url):
    with aiohttp.Timeout(5):
        resp = yield from session.get(url)
        try:
            # Any actions that may lead to an error:
            1/0
            return (yield from resp.text())
        except Exception as e:
            # .close() on exception.
            resp.close()
            raise e
        finally:
            # .release() otherwise, to return the connection into the free connection pool.
            # It's ok to release a closed response:
            # https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/client_reqrep.py#L664
            yield from resp.release()

# exception only
I think it's better to follow the official examples (and the __aexit__ implementation) and call resp.release()/resp.close() explicitly.