asyncio gather yielding results as they come in - python

I want to be able to yield results from a set of tasks run by gather as they come in for further processing.
# Not real code, but an example
async for response in asyncio.gather(*[aiohttp.get(url) for url in ['https://www.google.com', 'https://www.amazon.com']]):
    await process_response(response)
At present, I can use gather to run all the get requests concurrently, but I must wait until they're all complete before processing them. I'm still new to Python async/await, so maybe there's some obvious way of doing this that I'm missing.
# What I can do now
responses = await asyncio.gather(*[aiohttp.get(url) for url in ['https://www.google.com', 'https://www.amazon.com']])
await asyncio.gather(*[process_response(response) for response in responses])
Thanks!

As you already noted, gather will wait until all coroutines are done, so you need to find another way.
For example, you can use the function asyncio.as_completed, which seems to do exactly what you want.
import asyncio

async def echo(t):
    await asyncio.sleep(t)
    return t

async def main():
    coros = [
        echo(3),
        echo(2),
        echo(1),
    ]
    for first_completed in asyncio.as_completed(coros):
        res = await first_completed
        print(f'Done {res}')

asyncio.run(main())
Result:
Done 1
Done 2
Done 3
[Finished in 3 sec]
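Applied to the original question, a minimal sketch might look like the following (a print stands in for process_response; note that modern aiohttp uses a ClientSession rather than a module-level aiohttp.get):

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = ['https://www.google.com', 'https://www.amazon.com']
    async with aiohttp.ClientSession() as session:
        coros = [fetch(session, url) for url in urls]
        # Process each response as soon as it arrives, not after all finish
        for next_done in asyncio.as_completed(coros):
            body = await next_done
            print(len(body))  # stand-in for: await process_response(body)

asyncio.run(main())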

Related

Parallelize requests call with asyncio in Python

I'm trying to create an interface to an API, and I want the option to easily run the requests synchronously or asynchronously. I came up with the following code.
import asyncio
import requests

def async_run(coro_list):
    loop = asyncio.get_event_loop()
    futures = [loop.run_in_executor(None, asyncio.run, coro) for coro in coro_list]
    result = loop.run_until_complete(asyncio.gather(*futures))
    return result

def sync_get(url):
    return requests.get(url)

async def async_get(url):
    return sync_get(url)

coro_list = [async_get("https://google.com"), async_get("https://google.com")]
responses = async_run(coro_list)
print(responses)
For me it's very intuitive to either call sync_get or create a list of async_get and call async_run, and requires no knowledge of async Python to understand how it works.
The only problem is that loop.run_in_executor(None, asyncio.run, coro) doesn't sound too optimal, and I couldn't find anyone else running this code on Github. So I'm wondering, is there a simpler way to accomplish the objective of abstracting these threading and asyncio concepts in some similar way, or is this code already optimal?
asyncio.run() is usually used as the main entry point to run async code from sync code.
loop.run_in_executor(None, asyncio.run, coro) causes a new event loop to be created in an executor thread just to run each coro in coro_list. Why not run sync_get directly in the executor threads?
import asyncio
import requests

async def async_run(url_list):
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(None, sync_get, url) for url in url_list]
    result = await asyncio.gather(*futures)
    return result

def sync_get(url):
    return requests.get(url)

# async def async_get(url):
#     return sync_get(url)

url_list = ["https://google.com", "https://google.com"]
responses = asyncio.run(async_run(url_list))
print(responses)
There are async libraries, e.g. aiohttp and httpx, that accomplish similar work.
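For instance, a minimal sketch of the same two requests with httpx's async client might look like this (the URLs come from the question; the rest is the documented httpx/asyncio API):

import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        # Fire both requests concurrently over one connection pool.
        responses = await asyncio.gather(
            client.get("https://google.com"),
            client.get("https://google.com"),
        )
    print(responses)

asyncio.run(main())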
In the end I chose not to hide asyncio completely under my interface.
Still, with the goal of not having to maintain two "requests" functions, I made the API async-first and run the synchronous one with asyncio, and I ended up with something like this.
def sync_request():
    return asyncio.run(async_request(...))

async def async_request():
    return await aiohttp.request(...)  # pseudo code
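A fleshed-out version of that pattern might look like the following sketch (the url parameter and the use of ClientSession are my additions, since the original is pseudocode):

import asyncio
import aiohttp

async def async_request(url):
    # Async-first implementation: everything goes through aiohttp.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

def sync_request(url):
    # Thin synchronous wrapper: spins up an event loop just for this call.
    return asyncio.run(async_request(url))

print(len(sync_request("https://google.com")))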

python3.6 async/await still runs synchronously with FastAPI

I have a FastAPI app that posts two requests, one of them longer (if it helps, they're Elasticsearch queries and I'm using the AsyncElasticsearch module, which already returns coroutines). This is my attempt:
class my_module:
    search_object = AsyncElasticsearch(url, port)

    async def do_things(self):
        resp1 = await search_object.search()  # the longer one
        print(check_resp1)
        resp2 = await search_object.search()  # the shorter one
        print(check_resp2)
        process(resp2)
        process(resp1)
        do_synchronous_things()
        return thing

app = FastAPI()

@app.post("/")
async def service(user_input):
    result = await my_module.do_things()
    return results
What I observed is that instead of awaiting resp1, by the time it got to check_resp1 it was already a full response, as if I hadn't used async at all.
I'm new to Python async; I knew my code wouldn't work, but I don't know how to fix it. As far as I understand, when the interpreter sees await it starts the function and just moves on, which in this case should immediately post the next request. How do I make it do that?
Yes, that's correct: the coroutine won't proceed until the results are ready. You can use asyncio.gather to run tasks concurrently:
import asyncio

async def task(msg):
    print(f"START {msg}")
    await asyncio.sleep(1)
    print(f"END {msg}")
    return msg

async def main():
    await task("1")
    await task("2")
    results = await asyncio.gather(task("3"), task("4"))
    print(results)

if __name__ == "__main__":
    asyncio.run(main())
Test:
$ python test.py
START 1
END 1
START 2
END 2
START 3
START 4
END 3
END 4
['3', '4']
Alternatively you can use asyncio.as_completed to get the earliest next result:
for coro in asyncio.as_completed((task("5"), task("6"))):
    earliest_result = await coro
    print(earliest_result)
Update Fri 2 Apr 09:25:33 UTC 2021:
asyncio.run is available since Python 3.7; in previous versions you have to create and run the loop manually:
if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
Explanation
The reason your code runs synchronously is that in the do_things function the code is executed as follows:
1. Schedule search_object.search() to execute
2. Wait till search_object.search() is finished and get the result
3. Schedule search_object.search() to execute
4. Wait till search_object.search() is finished and get the result
5. Execute (synchronously) process(resp2)
6. Execute (synchronously) process(resp1)
7. Execute (synchronously) do_synchronous_things()
What you intended is for steps 1 and 3 to execute before 2 and 4. You can do this easily with the unsync library - here is the documentation.
How you can fix this
from unsync import unsync

class my_module:
    search_object = AsyncElasticsearch(url, port)

    @unsync
    async def search1(self):
        return await self.search_object.search()

    @unsync
    async def search2(self):  # not sure if this is any different to search1
        return await self.search_object.search()

    async def do_things(self):
        task1, task2 = self.search1(), self.search2()  # schedule tasks
        resp1, resp2 = task1.result(), task2.result()  # wait till tasks are executed
        # you might also do a similar trick with the process function to run process(resp2) and process(resp1) concurrently
        process(resp2)
        process(resp1)
        do_synchronous_things()  # if this does not rely on resp1 and resp2 it might also be put into a separate task to make the computation quicker. To do this use the @unsync(cpu_bound=True) decorator
        return thing

app = FastAPI()

@app.post("/")
async def service(user_input):
    result = await my_module.do_things()
    return results
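As a sketch of the cpu_bound variant mentioned in the comment above, based on unsync's documented decorator (heavy_computation is a hypothetical stand-in for do_synchronous_things):

from unsync import unsync

@unsync(cpu_bound=True)
def heavy_computation(n):
    # Runs in a separate process, so it never blocks the event loop.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    task = heavy_computation(10_000_000)  # scheduled immediately
    print(task.result())  # block until the process finishes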
More information
If you want to learn more about asyncio and asynchronous programming, I recommend this tutorial. There is also a similar case to the one you presented, with a few possible solutions to make the coroutines run concurrently.
PS. Obviously I could not run this code, so you must debug it on your own.

When to Use the Await Keyword in Python?

I'm currently trying to learn asyncio in Python. I know that the await keyword tells the loop that it can switch coroutines. However, when should I actually use it? Why not put it before everything?
Additionally, why is the await before response.text() and not before session.get(url)?
async def print_preview(url):
    # connect to the server
    async with aiohttp.ClientSession() as session:
        # create get request
        async with session.get(url) as response:
            # wait for response
            response = await response.text()
            # print first 3 not empty lines
            count = 0
            lines = list(filter(lambda x: len(x) > 0, response.split('\n')))
            print('-' * 80)
            for line in lines[:3]:
                print(line)
            print()
You use await with functions that are marked as coroutine in the documentation. For example, ClientResponse.text is marked as coroutine, while ClientResponse.close is not, which means you must await the former and must not await the latter. If you forget to await a coroutine, it simply won't execute and its return value will be a "coroutine object", which is useless (except for use with await).
session.get() returns an async context manager. When passed to async with, the coroutines it implements are awaited behind the scenes.
Also note that awaiting is not the only thing you can do with coroutines, the other is converting them into tasks, which allows them to run in parallel (without additional cost on the OS level). For more information, consult a tutorial on asyncio.
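For instance, a minimal sketch of the task-based approach might look like this (the two example URLs are arbitrary):

import asyncio
import aiohttp

async def get_text(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        # Wrapping coroutines in tasks starts them immediately,
        # so both downloads overlap instead of running one by one.
        t1 = asyncio.create_task(get_text(session, 'https://example.com'))
        t2 = asyncio.create_task(get_text(session, 'https://example.org'))
        print(len(await t1), len(await t2))

asyncio.run(main())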

Retrieving data from python's coroutine object

I am trying to learn async, and now I am trying to get whois information for a batch of domains. I found the lib aiowhois, but it has only a few lines of documentation, not enough for such a newbie as I am.
This code works without errors, but I don't know how to print data from the parsed_whois variable, which is a coroutine object.
resolv = aiowhois.Whois(timeout=10)

async def coro(url, sem):
    parsed_whois = await resolv.query(url)

async def main():
    tasks = []
    sem = asyncio.Semaphore(4)
    for url in domains:
        task = asyncio.Task(coro(url, sem))
        tasks.append(task)
    await asyncio.gather(*tasks)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
You can avoid using tasks. Just apply gather to the coroutines directly.
In case you are confused about the difference, this SO QA might help you (especially the second answer).
You can have each coroutine return its result, without resorting to global variables:
async def coro(url):
    return await resolv.query(url)

async def main():
    domains = ...
    ops = [coro(url) for url in domains]
    rets = await asyncio.gather(*ops)
    print(rets)
Please see the official docs to learn more about how to use gather or wait, and other options.
Note: if you are using the latest python versions, you can also simplify the loop running with just
asyncio.run(main())
Note 2: I have removed the semaphore from my code, as it's unclear why you need it and where.
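For completeness, if the semaphore was meant to cap concurrency at 4 simultaneous queries, a sketch of how it would actually be used (with asyncio.sleep standing in for resolv.query):

import asyncio

async def query(url):
    await asyncio.sleep(1)  # stand-in for resolv.query(url)
    return f"whois for {url}"

async def coro(url, sem):
    # At most 4 queries run at once; the rest wait here for a free slot.
    async with sem:
        return await query(url)

async def main():
    domains = ["example.com", "example.org", "example.net"]
    sem = asyncio.Semaphore(4)
    print(await asyncio.gather(*(coro(url, sem) for url in domains)))

asyncio.run(main())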
all_parsed_whois = []  # make a global

async def coro(url, sem):
    all_parsed_whois.append(await resolv.query(url))
If you want the data as soon as it is available, you could use task.add_done_callback(); see python asyncio add_done_callback with async def.
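A minimal sketch of that callback approach, with asyncio.sleep standing in for the whois query:

import asyncio

async def query(url):
    await asyncio.sleep(1)  # stand-in for resolv.query(url)
    return f"whois data for {url}"

def on_done(task):
    # Called as soon as this particular task finishes.
    print(task.result())

async def main():
    tasks = []
    for url in ["example.com", "example.org"]:
        task = asyncio.create_task(query(url))
        task.add_done_callback(on_done)
        tasks.append(task)
    await asyncio.gather(*tasks)

asyncio.run(main())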

Python 3.5 async/await with real code example

I've read tons of articles and tutorials about Python 3.5's async/await thing. I have to say I'm pretty confused, because some use get_event_loop() and run_until_complete(), some use ensure_future(), some use asyncio.wait(), and some use call_soon().
It seems like I have a lot of choices, but I have no idea if they are completely identical or if there are cases where you use loops and cases where you use wait().
But the thing is, all examples work with asyncio.sleep() as a simulation of a real slow operation which returns an awaitable object. Once I try to swap this line for some real code the whole thing fails. What the heck are the differences between the approaches written above, and how should I run a third-party library which is not ready for async/await? I do use the Quandl service to fetch some stock data.
import asyncio
import quandl

async def slow_operation(n):
    # await asyncio.sleep(1)  # Works because it's await ready.
    await quandl.Dataset(n)  # Doesn't work because it's not await ready.

async def main():
    await asyncio.wait([
        slow_operation("SIX/US9884981013EUR4"),
        slow_operation("SIX/US88160R1014EUR4"),
    ])

# You don't have to use any code for 50 requests/day.
quandl.ApiConfig.api_key = "MY_SECRET_CODE"

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
I hope you get the point of how lost I feel, and what a simple thing I would like to have running in parallel.
If a third-party library is not compatible with async/await then obviously you can't use it easily. There are two cases:
Let's say that the function in the library is asynchronous and it gives you a callback, e.g.
def fn(..., clb):
    ...
So you can do:
def on_result(...):
    ...

fn(..., on_result)
In that case you can wrap such functions into the asyncio protocol like this:
from asyncio import Future

def wrapper(...):
    future = Future()

    def my_clb(...):
        future.set_result(xyz)

    fn(..., my_clb)
    return future
(use future.set_exception(exc) on exception)
Then you can simply call that wrapper in some async function with await:
value = await wrapper(...)
Note that await works with any Future object. You don't have to declare wrapper as async.
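Here is a runnable sketch of that wrapper pattern; fn is a hypothetical callback-based function simulated with threading.Timer, and it uses loop.create_future() plus call_soon_threadsafe() since the callback may fire on another thread:

import asyncio
import threading

def fn(clb):
    # Hypothetical callback-based library function: fires clb later.
    threading.Timer(1.0, clb, args=(42,)).start()

def wrapper():
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def my_clb(result):
        # The callback runs on a timer thread, so hand the result
        # back to the event loop thread-safely.
        loop.call_soon_threadsafe(future.set_result, result)

    fn(my_clb)
    return future

async def main():
    value = await wrapper()
    print(value)  # 42

asyncio.run(main())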
If the function in the library is synchronous then you can run it in a separate thread (probably you would use some thread pool for that). The whole code may look like this:
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Initialize 10 threads
THREAD_POOL = ThreadPoolExecutor(10)

def synchronous_handler(param1, ...):
    # Do something synchronous
    time.sleep(2)
    return "foo"

# Somewhere else
async def main():
    loop = asyncio.get_event_loop()
    futures = [
        loop.run_in_executor(THREAD_POOL, synchronous_handler, param1, ...),
        loop.run_in_executor(THREAD_POOL, synchronous_handler, param1, ...),
        loop.run_in_executor(THREAD_POOL, synchronous_handler, param1, ...),
    ]
    await asyncio.wait(futures)
    for future in futures:
        print(future.result())

with THREAD_POOL:
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
If you can't use threads for whatever reason, then using such a library simply makes the entire asynchronous code pointless.
Note however that using a synchronous library with async is probably a bad idea. You won't gain much, and yet you complicate the code a lot.
You can take a look at the following simple working example from here. By the way it returns a string worth reading :-)
import aiohttp
import asyncio

async def fetch(client):
    async with client.get('https://docs.aiohttp.org/en/stable/client_reference.html') as resp:
        assert resp.status == 200
        return await resp.text()

async def main():
    async with aiohttp.ClientSession() as client:
        html = await fetch(client)
        print(html)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
