I've read tons of articles and tutorials about Python 3.5's async/await. I have to say I'm pretty confused, because some use get_event_loop() and run_until_complete(), some use ensure_future(), some use asyncio.wait(), and some use call_soon().
It seems like I have a lot of choices, but I have no idea whether they are completely identical or whether there are cases where you use loops and cases where you use wait().
The thing is, all the examples use asyncio.sleep() as a simulation of a real slow operation, which returns an awaitable object. Once I try to swap that line for some real code, the whole thing fails. What exactly are the differences between the approaches written above, and how should I run a third-party library which is not ready for async/await? I use the Quandl service to fetch some stock data.
import asyncio
import quandl

async def slow_operation(n):
    # await asyncio.sleep(1)   # Works because it's await ready.
    await quandl.Dataset(n)    # Doesn't work because it's not await ready.

async def main():
    await asyncio.wait([
        slow_operation("SIX/US9884981013EUR4"),
        slow_operation("SIX/US88160R1014EUR4"),
    ])

# You don't need an API key (a "code") for up to 50 requests/day.
quandl.ApiConfig.api_key = "MY_SECRET_CODE"

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
I hope this gives you an idea of how lost I feel and how simple the thing is that I would like to run in parallel.
If a third-party library is not compatible with async/await then obviously you can't use it easily. There are two cases:
Let's say that the function in the library is asynchronous and lets you pass a callback, e.g.

def fn(..., clb):
    ...

So you can do:

def on_result(...):
    ...

fn(..., on_result)
In that case you can wrap such functions into the asyncio protocol like this:
from asyncio import Future

def wrapper(...):
    future = Future()

    def my_clb(...):
        future.set_result(xyz)

    fn(..., my_clb)
    return future
(use future.set_exception(exc) on exception)
Then you can simply call that wrapper in some async function with await:
value = await wrapper(...)
Note that await works with any Future object. You don't have to declare wrapper as async.
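For instance, here is a self-contained sketch of that pattern; start_download and its callback are made-up stand-ins for a callback-based library function, and the callback is assumed to fire on another thread (hence call_soon_threadsafe):

import asyncio
import threading

def start_download(url, on_done):
    # Hypothetical stand-in for a callback-based library function: it does its
    # work on a background thread and calls on_done(result) when finished.
    threading.Timer(0.5, on_done, args=("data from " + url,)).start()

def download(url):
    loop = asyncio.get_event_loop()
    future = loop.create_future()

    def my_clb(result):
        # The callback may fire on another thread, so hand the result back
        # to the event loop thread safely.
        loop.call_soon_threadsafe(future.set_result, result)

    start_download(url, my_clb)
    return future

async def main():
    value = await download("https://example.com")
    print(value)

asyncio.get_event_loop().run_until_complete(main())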
If the function in the library is synchronous then you can run it in a separate thread (probably you would use some thread pool for that). The whole code may look like this:
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Initialize 10 threads
THREAD_POOL = ThreadPoolExecutor(10)

def synchronous_handler(param1, ...):
    # Do something synchronous
    time.sleep(2)
    return "foo"

# Somewhere else
async def main():
    loop = asyncio.get_event_loop()
    futures = [
        loop.run_in_executor(THREAD_POOL, synchronous_handler, param1, ...),
        loop.run_in_executor(THREAD_POOL, synchronous_handler, param1, ...),
        loop.run_in_executor(THREAD_POOL, synchronous_handler, param1, ...),
    ]
    await asyncio.wait(futures)
    for future in futures:
        print(future.result())

with THREAD_POOL:
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
If you can't use threads for whatever reason, then using such a library simply makes the entire asynchronous code pointless.
Note, however, that using a synchronous library with async is probably a bad idea. You don't gain much, and yet you complicate the code a lot.
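Applied to the quandl calls from the question, the thread-pool approach might look roughly like the sketch below; it simply reuses the names from the question's snippet and offloads the blocking quandl.Dataset(n) call to a worker thread:

import asyncio
from concurrent.futures import ThreadPoolExecutor

import quandl

THREAD_POOL = ThreadPoolExecutor(10)

async def slow_operation(n):
    loop = asyncio.get_event_loop()
    # Run the blocking quandl call in a worker thread so the event loop stays free.
    return await loop.run_in_executor(THREAD_POOL, quandl.Dataset, n)

async def main():
    results = await asyncio.gather(
        slow_operation("SIX/US9884981013EUR4"),
        slow_operation("SIX/US88160R1014EUR4"),
    )
    print(results)

quandl.ApiConfig.api_key = "MY_SECRET_CODE"
loop = asyncio.get_event_loop()
loop.run_until_complete(main())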
You can take a look at the following simple working example. By the way, it returns a string worth reading :-)
import aiohttp
import asyncio

async def fetch(client):
    async with client.get('https://docs.aiohttp.org/en/stable/client_reference.html') as resp:
        assert resp.status == 200
        return await resp.text()

async def main():
    async with aiohttp.ClientSession() as client:
        html = await fetch(client)
        print(html)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
I'm trying to create an interface to an API, and I want the option to easily run requests synchronously or asynchronously, so I came up with the following code.
import asyncio
import requests

def async_run(coro_list):
    loop = asyncio.get_event_loop()
    futures = [loop.run_in_executor(None, asyncio.run, coro) for coro in coro_list]
    result = loop.run_until_complete(asyncio.gather(*futures))
    return result

def sync_get(url):
    return requests.get(url)

async def async_get(url):
    return sync_get(url)

coro_list = [async_get("https://google.com"), async_get("https://google.com")]
responses = async_run(coro_list)
print(responses)
For me it's very intuitive to either call sync_get or create a list of async_get calls and pass it to async_run, and it requires no knowledge of async Python to understand how it works.
The only problem is that loop.run_in_executor(None, asyncio.run, coro) doesn't sound too optimal, and I couldn't find anyone else running this code on GitHub. So I'm wondering: is there a simpler way to accomplish the objective of abstracting these threading and asyncio concepts in some similar way, or is this code already optimal?
asyncio.run() is usually used as the main entry point to run async code from sync code.
loop.run_in_executor(None, asyncio.run, coro) causes an event loop to be created in an executor thread just to run each coro in coro_list. Why not run sync_get directly in the executor threads?
import asyncio
import requests

async def async_run(url_list):
    loop = asyncio.get_event_loop()
    futures = [loop.run_in_executor(None, sync_get, url) for url in url_list]
    result = await asyncio.gather(*futures)
    return result

def sync_get(url):
    return requests.get(url)

# async def async_get(url):
#     return sync_get(url)

url_list = ["https://google.com", "https://google.com"]
responses = asyncio.run(async_run(url_list))
print(responses)
There are async libraries, e.g. aiohttp and httpx, that accomplish similar work.
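For example, a minimal sketch of the same fetches with httpx (just illustrative; the URLs are the ones from your snippet):

import asyncio
import httpx

async def async_get(client, url):
    # The request itself is non-blocking, so no thread pool is needed.
    return await client.get(url)

async def main(url_list):
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(async_get(client, url) for url in url_list))

responses = asyncio.run(main(["https://google.com", "https://google.com"]))
print(responses)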
In the end I chose not to completely hide asyncio behind my interface.
Still, with the goal of not having to maintain two "request" functions, I made the API async-first and run the synchronous one with asyncio, and I ended up with something like this.
def sync_request():
    return asyncio.run(async_request(...))

async def async_request():
    return await aiohttp.request(...)  # pseudo code
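Fleshed out, that pattern might look like the following sketch (just illustrative; the URL and the printed value are placeholders):

import asyncio
import aiohttp

async def async_request(url):
    # Single async code path; the sync variant just drives it with asyncio.run().
    async with aiohttp.request("GET", url) as resp:
        return await resp.text()

def sync_request(url):
    return asyncio.run(async_request(url))

print(len(sync_request("https://example.com")))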
I am trying to learn async, and now I am trying to get whois information for a batch of domains. I found the lib aiowhois, but it only comes with a few scraps of documentation, which is not enough for a newbie like me.
This code works without errors, but I don't know how to print data from the parsed_whois variable, which is a coroutine object.
resolv = aiowhois.Whois(timeout=10)

async def coro(url, sem):
    parsed_whois = await resolv.query(url)

async def main():
    tasks = []
    sem = asyncio.Semaphore(4)
    for url in domains:
        task = asyncio.Task(coro(url, sem))
        tasks.append(task)
    await asyncio.gather(*tasks)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
You can avoid using tasks; just apply gather to the coroutines directly.
In case you are confused about the difference, this SO Q&A might help you (especially the second answer).
You can have each coroutine return its result, without resorting to global variables:
async def coro(url):
    return await resolv.query(url)

async def main():
    domains = ...
    ops = [coro(url) for url in domains]
    rets = await asyncio.gather(*ops)
    print(rets)
Please see the official docs to learn more about how to use gather or wait, or even more options.
Note: if you are using a recent Python version, you can also simplify running the loop to just
asyncio.run(main())
Note 2: I have removed the semaphore from my code, as it's unclear why you need it and where.
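If you do want to cap how many queries run at once, a common pattern is to acquire the semaphore inside the coroutine; a sketch, reusing resolv and domains from your snippet:

async def coro(url, sem):
    # Only 4 queries run at the same time; the others wait for a free slot.
    async with sem:
        return await resolv.query(url)

async def main():
    sem = asyncio.Semaphore(4)
    ops = [coro(url, sem) for url in domains]
    return await asyncio.gather(*ops)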
all_parsed_whois = []  # make a global

async def coro(url, sem):
    all_parsed_whois.append(await resolv.query(url))
If you want the data as soon as it is available, you could use task.add_done_callback() (see: python asyncio add_done_callback with async def).
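A minimal sketch of that callback approach (again reusing resolv and domains from the question; print stands in for whatever you want to do with each result):

async def coro(url):
    return await resolv.query(url)

def on_done(task):
    # Runs as soon as this particular query finishes, before the others are done.
    print(task.result())

async def main():
    tasks = []
    for url in domains:
        task = asyncio.ensure_future(coro(url))
        task.add_done_callback(on_done)
        tasks.append(task)
    await asyncio.gather(*tasks)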
Simple example: I need to make two unrelated HTTP requests in parallel. What's the simplest way to do that? I expect it to look something like this:
async def do_the_job():
    with aiohttp.ClientSession() as session:
        coro_1 = session.get('http://httpbin.org/get')
        coro_2 = session.get('http://httpbin.org/ip')
        return combine_responses(await coro_1, await coro_2)
In other words, I want to initiate IO operations and wait for their results so they effectively run in parallel. This can be achieved with asyncio.gather:
async def do_the_job():
    with aiohttp.ClientSession() as session:
        coro_1 = session.get('http://example.com/get')
        coro_2 = session.get('http://example.org/tp')
        return combine_responses(*(await asyncio.gather(coro_1, coro_2)))
Next, I want to have some complex dependency structure. I want to start operations when I have all prerequisites for them and get results when I need the results. Here asyncio.ensure_future helps: it makes a separate task from a coroutine, and that task is managed by the event loop independently:
async def do_the_job():
    with aiohttp.ClientSession() as session:
        fut_1 = asyncio.ensure_future(session.get('http://httpbin.org/ip'))
        coro_2 = session.get('http://httpbin.org/get')
        coro_3 = session.post('http://httpbin.org/post', data=(await coro_2))
        coro_3_result = await coro_3
        return combine_responses(await fut_1, coro_3_result)
Is it true that, to achieve parallel non-blocking IO with coroutines in my logic flow, I have to use either asyncio.ensure_future or asyncio.gather (which actually uses asyncio.ensure_future)? Is there a less "verbose" way?
Is it true that normally developers have to think what coroutines should become separate tasks and use aforementioned functions to gain optimal performance?
Is there a point in using coroutines without multiple tasks in event loop?
How "heavy" are event loop tasks in real life? Surely, they're "lighter" than OS threads or processes. To what extent should I strive for minimal possible number of such tasks?
I need to make two unrelated HTTP requests in parallel. What's the simplest way to do that?
import asyncio
import aiohttp

async def request(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()

async def main():
    results = await asyncio.gather(
        request('http://httpbin.org/delay/1'),
        request('http://httpbin.org/delay/1'),
    )
    print(len(results))

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
    loop.run_until_complete(loop.shutdown_asyncgens())
finally:
    loop.close()
Yes, you may achieve concurrency with asyncio.gather or by creating a task with asyncio.ensure_future.
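For comparison, here is a sketch of the same two requests started explicitly as tasks with asyncio.ensure_future (reusing the request() coroutine above); gather just does this bookkeeping for you:

async def main():
    task_1 = asyncio.ensure_future(request('http://httpbin.org/delay/1'))
    task_2 = asyncio.ensure_future(request('http://httpbin.org/delay/1'))
    # Both requests are already in flight; awaiting them only collects the results.
    results = [await task_1, await task_2]
    print(len(results))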
Next, I want to have some complex dependency structure. I want to start operations when I have all prerequisites for them and get results when I need the results.
While the code you provided will do the job, it would be nicer to split the concurrent flows into different coroutines and again use asyncio.gather:
import asyncio
import aiohttp

async def request(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()

async def get_ip():
    return await request('http://httpbin.org/ip')

async def post_from_get():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://httpbin.org/get') as resp:
            get_res = await resp.text()
        async with session.post('http://httpbin.org/post', data=get_res) as resp:
            return await resp.text()

async def main():
    results = await asyncio.gather(
        get_ip(),
        post_from_get(),
    )
    print(len(results))

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
    loop.run_until_complete(loop.shutdown_asyncgens())
finally:
    loop.close()
Is it true that normally developers have to think what coroutines should become separate tasks and use aforementioned functions to gain optimal performance?
Since you use asyncio, you probably want to run some jobs concurrently to gain performance, right? asyncio.gather is a way of saying: "run these jobs concurrently to get their results faster".
If you don't have to think about which jobs should be run concurrently to gain performance, you may be fine with plain sync code.
Is there a point in using coroutines without multiple tasks in event loop?
In your code you don't have to create tasks manually if you don't want to: both snippets in this answer get by without asyncio.ensure_future. But internally asyncio uses tasks all the time (for example, as you noted, asyncio.gather uses tasks itself).
How "heavy" are event loop tasks in real life? Surely, they're
"lighter" than OS threads or processes. To what extent should I strive
for minimal possible number of such tasks?
The main bottleneck in an async program is (almost always) the network: you shouldn't worry about the number of asyncio coroutines/tasks at all.
Let's say I have a class which uses an asyncio loop internally and doesn't have an async interface:
class Fetcher:
    _loop = None

    def get_result(...):
        """
        After 3 nested sync calls async tasks are finally called with *run_until_complete*
        """
        ...
I use all the advantages of asyncio internally and don't have to care about it in the outer code.
But then I want to call 3 Fetcher instances in one event loop. If I had an async def interface there would be no problem: asyncio.gather could help me. Is there really no other way to do it without supporting both interfaces? Come on! It makes you change your whole project because of one asyncio usage. Tell me this is not true.
Come on! It makes you change your whole project because of one asyncio usage. Tell me this is not true.
It's true.
The whole idea of the await keyword is to execute concurrent jobs in one event loop from different places in the code (which you can't do with regular function calls).
asyncio is not just some utility; it's a whole style of writing asynchronous programs.
On the other hand, Python is very flexible, so you can still try to hide the use of asyncio. If you really want to get the sync result of 3 Fetcher instances, you can, for example, do something like this:
import asyncio

def sync_exec(coro):
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(coro)

class Fetcher:
    async def async_get_result(self):
        # async interface:
        async def async_job():
            await asyncio.sleep(1)
            return id(self)
        return (await async_job())

    def get_result(self):
        # sync interface:
        return sync_exec(self.async_get_result())

    @classmethod
    def get_results(cls, *fetchers):
        # sync interface, multiple fetchers:
        return sync_exec(
            asyncio.gather(*[fetcher.async_get_result() for fetcher in fetchers])
        )

# single sync get_result:
f1 = Fetcher()
print('Result: ', f1.get_result())

# multiple sync get_result:
f2 = Fetcher()
f3 = Fetcher()
print('Results: ', Fetcher.get_results(f1, f2, f3))
Output:
Result: 2504097887120
Results: [2504097887120, 2504104854416, 2504104854136]
But, again, you'll really regret it someday if you continue to write code this way, believe me. If you want to get the full advantage of asynchronous programming, use coroutines and await explicitly.
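For comparison, the fully asynchronous calling code stays much simpler (a sketch reusing the Fetcher class above):

async def main():
    f1, f2, f3 = Fetcher(), Fetcher(), Fetcher()
    results = await asyncio.gather(
        f1.async_get_result(),
        f2.async_get_result(),
        f3.async_get_result(),
    )
    print('Results:', results)

asyncio.get_event_loop().run_until_complete(main())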
I'm trying to consume multiple queues concurrently using Python, asyncio and asynqp.
I don't understand why my asyncio.sleep() function call does not have any effect. The code doesn't pause there. To be fair, I actually don't understand in which context the callback is executed, and whether I can yield control back to the event loop at all (so that the asyncio.sleep() call would make sense).
What if I had to use an aiohttp.ClientSession.get() call in my process_msg callback function? I'm not able to, since it's not a coroutine. There has to be a way which is beyond my current understanding of asyncio.
#!/usr/bin/env python3
import asyncio
import asynqp

USERS = {'betty', 'bob', 'luis', 'tony'}

def process_msg(msg):
    asyncio.sleep(10)
    print('>> {}'.format(msg.body))
    msg.ack()

async def connect():
    connection = await asynqp.connect(host='dev_queue', virtual_host='asynqp_test')
    channel = await connection.open_channel()
    exchange = await channel.declare_exchange('inboxes', 'direct')

    # we have 10 users. Set up a queue for each of them
    # use different channels to avoid any interference
    # during message consumption, just in case.
    for username in USERS:
        user_channel = await connection.open_channel()
        queue = await user_channel.declare_queue('Inbox_{}'.format(username))
        await queue.bind(exchange, routing_key=username)
        await queue.consume(process_msg)

    # deliver 10 messages to each user
    for username in USERS:
        for msg_idx in range(10):
            msg = asynqp.Message('Msg #{} for {}'.format(msg_idx, username))
            exchange.publish(msg, routing_key=username)

loop = asyncio.get_event_loop()
loop.run_until_complete(connect())
loop.run_forever()
I don't understand why my asyncio.sleep() function call does not have any effect.
Because asyncio.sleep() just returns an awaitable; it only pauses when it is awaited and driven by the event loop (i.e. used with async/await semantics). Calling it and discarding the result does nothing.
You can't use await in a plain def declaration, and the callback is called outside of the async/await context that is attached to the event loop under the hood. In other words, mixing callback style with async/await style is quite tricky.
The simple solution, though, is to schedule the work back onto the event loop:
async def process_msg(msg):
    await asyncio.sleep(10)
    print('>> {}'.format(msg.body))
    msg.ack()

def _process_msg(msg):
    loop = asyncio.get_event_loop()
    loop.create_task(process_msg(msg))
    # or, if the loop is always the same one, a single line is enough:
    # asyncio.ensure_future(process_msg(msg))

# some code
await queue.consume(_process_msg)
Note that there is no recursion in the _process_msg function, i.e. the body of process_msg is not executed while inside _process_msg. The inner process_msg coroutine will run once control goes back to the event loop.
This can be generalized with the following code:
def async_to_callback(coro):
    def callback(*args, **kwargs):
        asyncio.ensure_future(coro(*args, **kwargs))
    return callback

async def process_msg(msg):
    # the body
    ...

# some code
await queue.consume(async_to_callback(process_msg))
See Drizzt1991's response on github for a solution.
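To see the mechanics in isolation, here is a self-contained toy version of the same idea; fake_consume and the message strings are made up, with loop.call_soon standing in for whatever sync code ends up invoking the plain callback:

import asyncio

async def process_msg(msg):
    await asyncio.sleep(1)  # really suspends now; other handlers keep running
    print('>> {}'.format(msg))

def async_to_callback(coro):
    def callback(*args, **kwargs):
        asyncio.ensure_future(coro(*args, **kwargs))
    return callback

def fake_consume(handler, loop):
    # Stand-in for queue.consume(): a plain sync API that just fires a callback.
    for i in range(3):
        loop.call_soon(handler, 'Msg #{}'.format(i))

loop = asyncio.get_event_loop()
fake_consume(async_to_callback(process_msg), loop)
loop.run_until_complete(asyncio.sleep(2))  # give the scheduled handlers time to finish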