I am using aiohttp and asyncio to run multiple requests asynchronously. The problem is that when I try to print the data I receive, I end up getting the data of another request in the task queue. I have tried to debug this and looked through the docs for an answer, but I am unable to solve this problem.
here's my code:
from time import sleep

import aiohttp
import asyncio

async def search(query, session):
    search_params = {
        "query": query
    }
    async with session.get(
        url,
        params=search_params,
    ) as response:
        json_response = await response.json()
        data = json_response["data"]
        print(data)
        """the above line always prints the data from the response of the first task to get executed
        and not the current data from this request with a different query"""

async def main():
    async with aiohttp.ClientSession() as session:
        await init_session(session)
        await enable_search(session)
        while True:
            tasks = [asyncio.create_task(search(query, session=session)) for query in inputs]
            await asyncio.gather(*tasks)
            sleep(5)

if __name__ == "__main__":
    asyncio.run(main())
Problem I'm trying to solve:
I'm making many API requests to a server. I'm trying to create delays between async API calls to comply with the server's rate limit policy.
What I want it to do
I want it to behave like this:
Make API request #1
wait 0.1 seconds
Make API request #2
wait 0.1 seconds
... and so on ...
repeat until all requests are made
gather the responses and return the results in one object (results)
Issue:
When I introduced asyncio.sleep() or time.sleep() into the code, it still made the API requests almost instantaneously. It seemed to delay the execution of print(), but not the API requests. I suspect that I have to create the delays within the loop, not in fetch_one() or fetch_all(), but I couldn't figure out how to do so.
Code block:
async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):
    # time.sleep(delay)
    # asyncio.sleep(delay)
    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.get(url, ssl=SSLContext()) as resp:
            # print("An api call to ", url, " is made at ", time.time())
            # print(resp)
            return await resp

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))
Versions I'm using:
python 3.8.5
aiohttp 3.7.4
asyncio 3.4.3
I would appreciate any tips on guiding me to the right direction!
The call to asyncio.gather will launch all requests "simultaneously" - and on the other hand, if you simply used a lock or awaited each task in turn, you would not gain anything from parallelism at all.
The simplest thing to do, if you know the rate at which you can issue the requests, is simply to increase the asynchronous pause before each request in succession - a simple global variable can do that:
next_delay = 0.1

async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):
    global next_delay
    next_delay += delay
    await asyncio.sleep(next_delay)
    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.get(url, ssl=SSLContext()) as resp:
            # print("An api call to ", url, " is made at ", time.time())
            # print(resp)
            # read the body here: a bare ClientResponse object can't be awaited
            return await resp.text()

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))
Now, if you want to, say, issue 5 requests and then issue the next 5, you could use a synchronization primitive like asyncio.Condition, using its wait_for on an expression which checks how many API calls are active - or, more simply, an asyncio.Event that is cleared once the number of active calls reaches the limit, as in the sketch below:
active_calls = 0
MAX_CALLS = 5

async def fetch_all(loop, urls, delay):
    event = asyncio.Event()
    event.set()
    results = await asyncio.gather(*[fetch_one(loop, url, delay, event) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay, event):
    global active_calls
    # wait until the current batch of MAX_CALLS requests has finished
    while active_calls >= MAX_CALLS:
        event.clear()
        await event.wait()
    active_calls += 1
    try:
        async with aiohttp.ClientSession(loop=loop) as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                # print("An api call to ", url, " is made at ", time.time())
                # print(resp)
                return await resp.text()
    finally:
        active_calls -= 1
        if active_calls == 0:
            event.set()

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))
For both examples, should your design need to avoid global variables (strictly speaking, these are "module" variables), you could either move all functions to a class, work on an instance, and promote the global variables to instance attributes, or use a mutable container such as a list to hold the active_calls value in its first item and pass that as a parameter.
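A minimal sketch of the class-based variant of the first example (the class name here is illustrative, not from the original answer; the globals next_delay and delay simply become instance attributes):
import asyncio
import aiohttp
from ssl import SSLContext

class DelayedFetcher:
    def __init__(self, delay):
        self.delay = delay
        self.next_delay = 0.0  # was a module-level global

    async def fetch_all(self, urls):
        return await asyncio.gather(*[self.fetch_one(url) for url in urls], return_exceptions=True)

    async def fetch_one(self, url):
        # stagger each request by an ever-growing pause, as in the global-variable version
        self.next_delay += self.delay
        await asyncio.sleep(self.next_delay)
        async with aiohttp.ClientSession() as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                return await resp.text()

# usage: results = asyncio.run(DelayedFetcher(0.1).fetch_all(urls))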
When you use asyncio.gather you run all fetch_one coroutines concurrently. All of them wait for the delay together, then make the API calls almost instantaneously, together.
To solve the issue, you should either await fetch_one one by one in fetch_all, or use a Semaphore to ensure the next coroutine doesn't start before the previous one is done.
Here's the idea:
import asyncio
import aiohttp
from ssl import SSLContext

_sem = asyncio.Semaphore(1)

async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):
    async with _sem:  # the next coroutine(s) will wait here until the previous one is done
        await asyncio.sleep(delay)
        async with aiohttp.ClientSession(loop=loop) as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                # print("An api call to ", url, " is made at ", time.time())
                # print(resp)
                return await resp.text()

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))
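For completeness, the first option mentioned above - awaiting fetch_one one by one instead of handing everything to gather - would look roughly like this (a sketch, not part of the original answer):
async def fetch_all(loop, urls, delay):
    results = []
    for url in urls:
        # each request starts only after the previous one (and its delay) has completed
        results.append(await fetch_one(loop, url, delay))
    return results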
I am trying to achieve asynchronous processing of aiohttp requests, which are defined in my class as follows:
class Async():
    async def get_service_1(self, zip_code, session):
        url = SERVICE1_ENDPOINT.format(zip_code)
        response = await session.request('GET', url)
        return await response

    async def get_service_2(self, zip_code, session):
        url = SERVICE2_ENDPOINT.format(zip_code)
        response = await session.request('GET', url)
        return await response

    async def gather(self, zip_code):
        async with aiohttp.ClientSession() as session:
            return await asyncio.gather(
                self.get_service_1(zip_code, session),
                self.get_service_2(zip_code, session)
            )

    def get_async_requests(self, zip_code):
        asyncio.set_event_loop(asyncio.SelectorEventLoop())
        loop = asyncio.get_event_loop()
        results = loop.run_until_complete(self.gather(zip_code))
        loop.close()
        return results
When running to get the results from the get_async_requests function, I am getting the following error:
TypeError: object ClientResponse can't be used in 'await' expression
Where am I going wrong in the code? Thank you in advance.
When you await something like session.request(...), the I/O starts, but aiohttp returns as soon as it has received the headers; it doesn't wait for the rest of the response to finish. (This lets you react to a status code without waiting for the entire body of the response.)
You need to await something that actually reads the body. If you're expecting a response that contains text, that would be response.text(). If you're expecting JSON, that's response.json(). This would look something like
response = await session.get(url)
return await response.text()
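Applied to the class in the question, one of the service methods would then look roughly like this (a sketch; it assumes the endpoint returns text - use response.json() instead if it returns JSON):
async def get_service_1(self, zip_code, session):
    url = SERVICE1_ENDPOINT.format(zip_code)
    response = await session.request('GET', url)
    # read the body instead of awaiting the ClientResponse object itself
    return await response.text()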
I am trying to use aiohttp to send requests one after another like this
import aiohttp
import asyncio
from datetime import datetime

async def main():
    request_url = "https://..."
    async with aiohttp.ClientSession() as session:
        while True:
            print(datetime.now())
            async with session.get(request_url) as response:
                json_data = await response.json()
                print(json_data)
            await asyncio.sleep(0.2)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
So I would expect each datetime print to be 0.2 s apart. However, they are about 0.35 s apart, as I think it takes about 0.15 s to get the data from the response. Why is this happening? I want it to be asynchronous, so shouldn't it just move on to the next request?
How can I fix this?
When you use await, all the code that follows waits for the awaited call to finish.
If you want asyncio code to actually run concurrently, you should use functions like asyncio.gather:
import asyncio
import aiohttp
import datetime

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            print('#', response.status)

async def worker(queue):
    print('START WORKER')
    while True:
        url = await queue.get()
        await fetch(url)
        queue.task_done()

async def control(queue):
    while True:
        print(datetime.datetime.now())
        queue.put_nowait('https://docs.python.org/')
        await asyncio.sleep(0.2)

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(
        control(queue),
        asyncio.gather(*[worker(queue) for _ in range(10)])
    )

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Sending the HTTP request and fetching the response back takes some time. You need to exclude this time from the asyncio.sleep() call:
import aiohttp
import asyncio
import time
from datetime import datetime

async def main():
    request_url = "https://..."
    async with aiohttp.ClientSession() as session:
        while True:
            print(datetime.now())
            t0 = time.monotonic()
            async with session.get(request_url) as response:
                json_data = await response.json()
                print(json_data)
            t1 = time.monotonic()
            await asyncio.sleep(0.2 - (t1 - t0))

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
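Note that if a single request takes longer than 0.2 seconds, the computed delay becomes negative. asyncio.sleep() returns immediately for non-positive delays, so the code still works, but clamping the value makes the intent explicit:
await asyncio.sleep(max(0.0, 0.2 - (t1 - t0)))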
I have the following code
import asyncio
import aiohttp

urls = [
    'http://54.224.27.241',
    'http://54.224.27.241',
    'http://54.224.27.241',
    'http://54.224.27.241',
    'http://54.224.27.241',
]

async def query(urls):
    out = []
    with aiohttp.ClientSession() as session:
        for url in urls:
            try:
                async with session.get(url, timeout=5) as resp:
                    text = await resp.text()
                    out.append(resp.status)
            except:
                print('timeout')
    return out

loop = asyncio.get_event_loop()
out = loop.run_until_complete(query(urls))
loop.close()
print(str(out))
The code is much slower than the version that uses a thread pool, and the gap keeps increasing as you increase the number of URLs (say 20, 50, etc.).
I have a feeling that the initial connection establishment is not done in an async way.
(Note that I am connecting here to a non-existent server to deliberately produce a connection timeout.)
Can someone point out what is wrong here?
Warning: I don't promise this code works, as I can't install aiohttp at the moment, but looking at the example in the docs:
import asyncio
import aiohttp
import async_timeout

async def fetch(session, url):
    async with async_timeout.timeout(10):
        async with session.get(url) as response:
            return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'http://python.org')
        print(html)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Notice how they're entering aiohttp.ClientSession() with the async keyword (async with, not a plain with). Additionally, I was getting an error on your line data = await async with session.get(url) as resp:, so I restructured it like this:
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    out = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            data = await fetch(session, url)
            out.append(data)
    return out

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
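Note that the loop above still issues the requests one after another, so it won't be faster than the original; to match the thread-pool version, the requests have to actually run concurrently. A minimal sketch using asyncio.gather, assuming the same urls list from the question:
import asyncio
import aiohttp

async def fetch(session, url):
    try:
        async with session.get(url, timeout=5) as response:
            await response.text()
            return response.status
    except Exception:
        return 'timeout'

async def query(urls):
    async with aiohttp.ClientSession() as session:
        # all requests (and their timeouts) now run concurrently instead of back to back
        return await asyncio.gather(*[fetch(session, url) for url in urls])

# out = asyncio.get_event_loop().run_until_complete(query(urls))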