Share a Python asyncio.Future between multiple coroutines - python

I would like to know if the following is a safe use of futures to bridge
callback-based code to asynchronous code.
I hope the following illustrates what I mean.
I would like to communicate between two coroutines. One coroutine is mainly
running a library (not my code, represented by b method below), which provides
a synchronous callback which is called whenever an event happens. (Note that the library is using asyncio, but this particular API it provides uses a synchronous callback). I would like
to get data from that callback into a second coroutine (my code, represented by
a method below).
I don't care about missing messages (X objects). callback is called
frequently, much more often than we actually need X objects. It is no problem to
just wait for the next if we miss it. And we don't always need the latest X,
there are only certain situations when we need to refresh it (get a new one),
the rest of the time we just ignore them.
Can I share a future across coroutines like this, or do I need some kind of locking around setting/accessing self.fut? I can't actually use an asyncio.Lock in callback (a synchronous function) because I would need to use await lock.acquire() or async with lock: .... As mentioned above, I can't change this API to make callback an async function.
Also, are there any better alternative approaches? I could use an unlimited asyncio.Queue and use put_nowait synchronously from
the callback function. But it seems like overkill to use a Queue for the reasons
above (I only care about the latest, I don't care about missing messages, I
ignore most of them). The queue would (should) only ever have at most 1 item.
import asyncio
from typing import Optional
class C:
fut: Optional[asyncio.Future[X]]
def __init__(self):
self.fut = None
async def a(self):
while True:
...
# Some condition comes up where we need an X
# We are signalling to b() that we want an X object by setting
# self.fut to a Future (making it not None)
fut = asyncio.get_running_loop().create_future()
self.fut = fut
x = await fut
...
# (use the X object)
async def b(self):
while True:
...
# Calls synchronous callback when it has an X object ready
self.callback(x)
def callback(self, x: X):
...
# callback does other things as well as sending X objects over the Future
# But if the future is not None, it will use it send the X object
fut = self.fut
# Reset it to None because we only want one X object
self.fut = None
if fut is not None:
fut.set_result(x)
async def main():
c = C()
tasks = asyncio.create_task(c.a()), asyncio.create_task(c.b())
for coro in asyncio.as_completed(tasks)
await coro
if __name__ == '__main__':
asyncio.run(main())

Related

Implementing a coroutine in python

Lets say I have a C++ function result_type compute(input_type input), which I have made available to python using cython. My python code executes multiple computations like this:
def compute_total_result()
inputs = ...
total_result = ...
for input in inputs:
result = compute_python_wrapper(input)
update_total_result(total_result)
return total_result
Since the computation takes a long time, I have implemented a C++ thread pool (like this) and written a function std::future<result_type> compute_threaded(input_type input), which returns a future that becomes ready as soon as the thread pool is done executing.
What I would like to do is to use this C++ function in python as well. A simple way to do this would be to wrap the std::future<result_type> including its get() function, wait for all results like this:
def compute_total_results_parallel()
inputs = ...
total_result = ...
futures = []
for input in inputs:
futures.append(compute_threaded_python_wrapper(input))
for future in futures:
update_total_result(future.get())
return total_result
I suppose this works well enough in this case, but it becomes very complicated very fast, because I have to pass futures around.
However, I think that conceptually, waiting for these C++ results is no different from waiting for file or network I/O.
To facilitate I/O operations, the python devs introduced the async / await keywords. If my compute_threaded_python_wrapper would be part of asyncio, I could simply rewrite it as
async def compute_total_results_async()
inputs = ...
total_result = ...
for input in inputs:
result = await compute_threaded_python_wrapper(input)
update_total_result(total_result)
return total_result
And I could execute the whole code via result = asyncio.run(compute_total_results_async()).
There are a lot of tutorials regarding async programming in python, but most of them deal with using coroutines where the bedrock seem to be some call into the asyncio package, mostly calling asyncio.sleep(delay) as a proxy for I/O.
My question is: (How) Can I implement coroutines in python, enabling python to await the wrapped future object (There is some mention of a __await__ method returning an iterator)?
First, an inaccuracy in the question needs to be corrected:
If my compute_threaded_python_wrapper would be part of asyncio, I could simply rewrite it as [...]
The rewrite is incorrect: await means "wait until the computation finishes", so the loop as written would execute the code sequentially. A rewrite that actually runs the tasks in parallel would be something like:
# a direct translation of the "parallel" version
def compute_total_results_async()
inputs = ...
total_result = ...
tasks = []
# first spawn all the tasks
for input in inputs:
tasks.append(
asyncio.create_task(compute_threaded_python_wrapper(input))
)
# and then await them
for task in tasks:
update_total_result(await task)
return total_result
This spawn-all-await-all pattern is so uniquitous that asyncio provides a helper function, asyncio.gather(), which makes it much shorter, especially when combined with a list comprehension:
# a more idiomatic version
def compute_total_results_async()
inputs = ...
total_result = ...
results = await asyncio.gather(
*[compute_threaded_python_wrapper(input) for input in inputs]
)
for result in results:
update_total_result(result)
return total_result
With that out of the way, we can proceed to the main question:
My question is: (How) Can I implement coroutines in python, enabling python to await the wrapped future object (There is some mention of a __await__ method returning an iterator)?
Yes, awaitable objects are implemented using iterators that yield to indicate suspension. But that is way too low-level a tool for what you actually need. You don't need just any awaitable, but one that works with the asyncio event loop, which has specific expectations of the underlying iterator. You need a mechanism to resume the awaitable when the result is ready, where you again depend on asyncio.
Asyncio already provides awaitable objects that can be externally assigned a value: futures. An asyncio future represents an async value that will become available at some point in the future. They are related to, but not semantically equivalent to C++ futures, and should not to be confused with multi-threaded futures from the concurrent.futures stdlib module.
To create an awaitable object that is activated by something that happens in another thread, you need to create a future, and then start your off-thread task, instructing it to mark the future as completed when it finishes execution. Since asyncio futures are not thread-safe, this must be done using the call_soon_threadsafe event loop method provided by asyncio for such situations. In Python it would be done like this:
def run_async():
loop = asyncio.get_event_loop()
future = loop.create_future()
def on_done(result):
# when done, notify the future in a thread-safe manner
loop.call_soon_threadsafe(future.set_result, resut)
# start the worker in a thread owned by the pool
pool.submit(_worker, on_done)
# returning a future makes run_async() awaitable, and
# passable to asyncio.gather() etc.
return future
def _worker(on_done):
# this runs in a different thread
... processing goes here ...
result = ...
on_done(result)
In your case, the worker would be presumably implemented in Cython combined with C++.

Using an asynchronous function in __del__

I'd like to define what essentially is an asynchronous __del__ that closes a resource. Here's an example.
import asyncio
class Async:
async def close(self):
print('closing')
return self
def __del__(self):
print('destructing')
asyncio.ensure_future(self.close())
async def amain():
Async()
if __name__ == '__main__':
asyncio.run(amain())
This works, printing destructing and closing as expected. However, if the resource is defined outside an asynchronous function, __del__ is called, but closing is never performed.
def main():
Async()
No warning is raised here, but the prints reveal that closing was not done. The warning is issued if an asynchronous function has been run, but any instance is created outside of it.
def main2():
Async()
asyncio.run(amain())
RuntimeWarning: coroutine 'Async.close' was never awaited
This has been the subject in 1 and 2, but neither quite had what I was looking for, or maybe I didn't know how to look. Particularly the first question was about deleting a resource, and its answer suggested using asyncio.ensure_future, which was tested above. Python documentation suggests using the newer asyncio.create_task, but it straight up raises an error in the non-async case, there being no current loop. My final, desperate attempt was to use asyncio.run, which worked for the non-async case, but not for the asynchronous one, as calling run is prohibited in a thread that already has a running loop. Additionally, the documentation states that it should only be called once in a program.
I'm still new to async things. How could this be achieved?
A word on the use case, since asynchronous context managers were mentioned as the preferred alternative in comments. I agree, using them for short-term resource management would be ideal. However, my use case is different for two reasons.
Users of the class are not necessarily aware of the underlying resources. It is better user experience to hide closing the resource from a user who doesn't fiddle with the resource itself.
The class needs to be instantiated (or for it to be possible to instantiate it) in a synchronous context, and it is often created just once. For example, in a web server context the class would be instantiated in the global scope, after which its async functions would be used in the endpoint definitions.
For example:
asc = Async()
server.route('/', 'GET')
async def root():
return await asc.do_something(), 200
I'm open to other suggestions of implementing such a feature, but at this point even my curiosity for the possibility that this can be done is enough for me to want an answer to this specific question, not just the general problem.
Only thing that comes to mind is to run cleanup after the server shutdown. It'll look something like this:
asc = Async()
try:
asyncio.run(run_server()) # You already do it now somewhere
finally:
asyncio.run(asc.close())
Since asyncio.run creates new event loop each time, you may want to go even deeper and reuse the same event loop:
loop = asyncio.get_event_loop()
asc = Async()
try:
loop.run_until_complete(run_server())
finally:
loop.run_until_complete(asc.close())
It's absolutely ok to call run_until_complete multiple times as long as you know what you're doing.
Full example with your snippet:
import asyncio
class Async:
async def close(self):
print('closing')
return self
async def cleanup(self):
print('destructing')
await self.close()
loop = asyncio.get_event_loop()
asc = Async()
async def amain():
await asyncio.sleep(1) # Do something
if __name__ == '__main__':
try:
loop.run_until_complete(amain())
finally:
loop.run_until_complete(asc.cleanup())
loop.close()

How do I make my list comprehension (and the function it calls) run asynchronously?

class Class1():
def func1():
self.conn.send('something')
data = self.conn.recv()
return data
class Class2():
def func2():
[class1.func1() for class1 in self.classes]
How do I make that last line asynchronously in python? I've been googling but can't understand async/await and don't know which functions I should be putting async in front of. In my case, all the class1.func1 need to send before any of them can receive anything. I was also seeing that __aiter__ and __anext__ need to be implemented, but I don't know how those are used in this context. Thanks!
It is indeed possible to fire off multiple requests and asynchronously
wait for them. Because Python is traditionally a synchronous language,
you have to be very careful about what libraries you use with
asynchronous Python. Any library that blocks the main thread (such as
requests) will break your entire asynchronicity. aiohttp is a common
choice for asynchronously making web API calls in Python. What you
want is to create a bunch of future objects inside a Python list and
await it. A future is an object that represents a value that will
eventually resolve to something.
EDIT: Since the function that actually makes the API call is
synchronous and blocking and you don't have control over it, you will
have to run that function in a separate thread.
Async List Comprehensions in Python
import asyncio
async def main():
loop = asyncio.get_event_loop()
futures = [asyncio.ensure_future(loop.run_in_executor(None, get_data, data)) for data in data_name_list]
await asyncio.gather(*futures) # wait for all the future objects to resolve
# Do something with futures
# ...
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

Python asyncio: function or coroutine, which to use?

I'm wondering if there are any noticeable differences in performance between foo and bar:
class Interface:
def __init__(self, loop):
self.loop = loop
def foo(self, a, b):
return self.loop.run_until_complete(self.bar(a, b))
async def bar(self, a, b):
v1 = await do_async_thing1(a)
for i in range(whatever):
do_some_syncronous_work(i)
v2 = await do_async_thing2(b, v1)
return v2
async def call_foo_bar(loop):
a = 'squid'
b = 'ink'
interface = Interface(loop)
v_foo = interface.foo(a, b)
v_bar = await interface.bar(a, b)
But will the use of run_until_complete cause any practical, noticeable different to running my program?
(The reason I ask is I'm building an interface class which will accommodate decomposable "back-ends" some of which could be asynchronous. I'm hoping to use standard functions for the public (reusable) methods of the interface class so one API can be maintained for all code without messing up the use of asynchronous back-ends where an event-loop is available.)
Update: I didn't check the code properly and the first version was completely invalid, hence rewrite.
loop.run_until_complete() should be used outside of coroutine on very top level. Calling run_until_complete() with active (running) event loop is forbidden.
Also loop.run_until_complete(f) is a blocking call: execution thread is blocked until f coroutine or future becomes done.
async def is a proper way to write asynchronous code by making concurrent coroutines which may communicate with each other.
async def requires running event loop (either loop.run_until_complete() or loop.run_forever() should be called).
There is a world of difference. It is a SyntaxError to use await in a normal function def function.
import asyncio
async def atest():
print("hello")
def test():
await atest()
async def main():
await atest()
test()
--
File "test.py", line 9
await atest()
^
SyntaxError: invalid syntax
I do not believe so. I have used both and have not really seen a difference. I prefer foo just because I have used it more and understand it a bit better, but it is up to you.

When to use and when not to use Python 3.5 `await` ?

I'm getting the flow of using asyncio in Python 3.5 but I haven't seen a description of what things I should be awaiting and things I should not be or where it would be neglible. Do I just have to use my best judgement in terms of "this is an IO operation and thus should be awaited"?
By default all your code is synchronous. You can make it asynchronous defining functions with async def and "calling" these functions with await. A More correct question would be "When should I write asynchronous code instead of synchronous?". Answer is "When you can benefit from it". In cases when you work with I/O operations as you noted you will usually benefit:
# Synchronous way:
download(url1) # takes 5 sec.
download(url2) # takes 5 sec.
# Total time: 10 sec.
# Asynchronous way:
await asyncio.gather(
async_download(url1), # takes 5 sec.
async_download(url2) # takes 5 sec.
)
# Total time: only 5 sec. (+ little overhead for using asyncio)
Of course, if you created a function that uses asynchronous code, this function should be asynchronous too (should be defined as async def). But any asynchronous function can freely use synchronous code. It makes no sense to cast synchronous code to asynchronous without some reason:
# extract_links(url) should be async because it uses async func async_download() inside
async def extract_links(url):
# async_download() was created async to get benefit of I/O
html = await async_download(url)
# parse() doesn't work with I/O, there's no sense to make it async
links = parse(html)
return links
One very important thing is that any long synchronous operation (> 50 ms, for example, it's hard to say exactly) will freeze all your asynchronous operations for that time:
async def extract_links(url):
data = await download(url)
links = parse(data)
# if search_in_very_big_file() takes much time to process,
# all your running async funcs (somewhere else in code) will be frozen
# you need to avoid this situation
links_found = search_in_very_big_file(links)
You can avoid it calling long running synchronous functions in separate process (and awaiting for result):
executor = ProcessPoolExecutor(2)
async def extract_links(url):
data = await download(url)
links = parse(data)
# Now your main process can handle another async functions while separate process running
links_found = await loop.run_in_executor(executor, search_in_very_big_file, links)
One more example: when you need to use requests in asyncio. requests.get is just synchronous long running function, which you shouldn't call inside async code (again, to avoid freezing). But it's running long because of I/O, not because of long calculations. In that case, you can use ThreadPoolExecutor instead of ProcessPoolExecutor to avoid some multiprocessing overhead:
executor = ThreadPoolExecutor(2)
async def download(url):
response = await loop.run_in_executor(executor, requests.get, url)
return response.text
You do not have much freedom. If you need to call a function you need to find out if this is a usual function or a coroutine. You must use the await keyword if and only if the function you are calling is a coroutine.
If async functions are involved there should be an "event loop" which orchestrates these async functions. Strictly speaking it's not necessary, you can "manually" run the async method sending values to it, but probably you don't want to do it. The event loop keeps track of not-yet-finished coroutines and chooses the next one to continue running. asyncio module provides an implementation of event loop, but this is not the only possible implementation.
Consider these two lines of code:
x = get_x()
do_something_else()
and
x = await aget_x()
do_something_else()
Semantic is absolutely the same: call a method which produces some value, when the value is ready assign it to variable x and do something else. In both cases the do_something_else function will be called only after the previous line of code is finished. It doesn't even mean that before or after or during the execution of asynchronous aget_x method the control will be yielded to event loop.
Still there are some differences:
the second snippet can appear only inside another async function
aget_x function is not usual, but coroutine (that is either declared with async keyword or decorated as coroutine)
aget_x is able to "communicate" with the event loop: that is yield some objects to it. The event loop should be able to interpret these objects as requests to do some operations (f.e. to send a network request and wait for response, or just suspend this coroutine for n seconds). Usual get_x function is not able to communicate with event loop.

Categories