Getting Result from Cancelled Task - python

I have a program, roughly like the example below.
A task is gathering a number of values and returning them to a caller.
Sometimes the tasks may get cancelled.
In those cases, I still want to get the results the tasks have gathered so far.
Hence I catch the CancelledError exception, clean up, and return the completed results.
async def f():
results = []
for i in range(100):
try:
res = await slow_call()
results.append(res)
except asyncio.CancelledError:
results.append('Undecided')
return results
def on_done(task):
if task.cancelled():
print('Incomplete result', task.result()
else:
print(task.result())
async def run():
task = asyncio.create_task(f())
task.add_done_callback(on_done)
The problem is that the value returned after a task is cancelled doesn't appear to be available in the task.
Calling task.result() simply rethrows CancelledError. Calling task._result is just None.
Is there a way to get the return value of a cancelled task, assuming it has one?
Edit: I realize now that catching the CancelledError results in the task not being cancelled at all.
This leaves me with another conundrum: How do I signal to the tasks owner that this result is only a "half" result, and the task has really been cancelled.
I suppose I could add an extra return value indicating this, but that seems to go against the whole idea of the task cancellation system.
Any suggestions for a good approach here?

I'm a long way away from understanding the use case, but the following does something sensible for me:
import asyncio
async def fn(results):
for i in range(10):
# your slow_call
await asyncio.sleep(0.1)
results.append(i)
def on_done(task, results):
if task.cancelled():
print('incomplete', results)
else:
print('complete', results)
async def run():
results = []
task = asyncio.create_task(fn(results))
task.add_done_callback(lambda t: on_done(t, results))
# give fn some time to finish, reducing this will cause the task to be cancelled
# you'll see the incomplete message if this is < 1.1
await asyncio.sleep(1.1)
asyncio.run(run())
it's the use of add_done_callback and sleep in run that feels very awkward and makes me think I don't understand what you're doing. maybe posting something to https://codereview.stackexchange.com containing more of the calling code would help get ideas of better ways to structure things. note that there are other libraries like trio that provide much nicer interfaces to Python coroutines than the asyncio builtin library (which was standardised prematurely IMO)

I don't think that is possible, because in my opinion, collides with the meaning of cancellation of a task.
You can implements a similar behavior inside your slow_call, by triggering the CancelledError, catching it inside your function and then returns whatever you want.

Related

What are Python asyncio's cancellation points?

When can an asyncio Task be canceled? Or, more generally, when can an asyncio loop switch to a different Task? It's been really hard for me to use cancellation in my asyncio programs, because I don't know when a CancelledError can get thrown.
I was working on a bit of code earlier, with a context manager kinda like this:
#! /usr/bin/python3
import asyncio
class MyContextManager:
def __init__(self):
self.locked = False
async def __aenter__(self):
self.locked = True
async def __aexit__(self, *_):
self.locked = False
async def main():
async with MyContextManager():
print("Doing something that needs locking")
if __name__ == "__main__":
asyncio.run(main())
What happens if the task is cancelled during __aenter__? I need to make sure that self.locked is false whenever the async with section exits. (I'm using self.locked as a simplification of a more complex acquire/release algorithm here, which includes some steps that are necessarily async.)
The docs regarding async with say:
The following code:
async with EXPRESSION as TARGET:
SUITE
is semantically equivalent to:
manager = (EXPRESSION)
aenter = type(manager).__aenter__
aexit = type(manager).__aexit__
value = await aenter(manager)
hit_except = False
try:
TARGET = value
SUITE
except:
hit_except = True
if not await aexit(manager, *sys.exc_info()):
raise finally:
if not hit_except:
await aexit(manager, None, None, None)
If I'm reading this right, this means that there's a window between when await aenter is called and when the try:finally: block is set up. If a task is canceled at the time that aenter returns, then aexit will not be called.
Can a task be canceled on exit from an async function? Well, let's look at the docs for asyncio.shield:
The statement:
res = await shield(something())
is equivalent to:
res = await something()
except that if the coroutine containing it is cancelled, the Task running in something() is not cancelled. From the point of view of something(), the cancellation did not happen. Although its caller is still cancelled, so the “await” expression still raises a CancelledError.
This seems to imply that an await expression can raise a CancelledError, even if the task is not canceled during the evaluation of the underlying expression.
As an opposing view, when I looked at the source code for asyncio.shield, it looks like the CancelledError is raised within asyncio.shield, rather than at the time that the await expression returns.
The biggest advantage of coroutines over threads is that it's much easier to reason about parallelism: synchronous operations will complete serially, and it's only when you await that anything can change out from under you. I use this reasoning a lot in my code. But it's not clear exactly when that await expression can change something out from under you.
An asyncio task can only be canceled on the await, as that's the only point at which the event loop can be running instead of your code.
The event loop is given control when one of the tasks you await yields to it (this is usually done by something like awaiting a sleep, or a future). Do mind that this is not necessarily done for every await, so an await doesn't guarantee that the event loop will actually run (the simplest example would be a coro that only returns a constant).
With the python equivalent async context manager code, the only point where something can be cancelled before the try block is set up is within __aenter__ itself, as that's the only point that may actually yield control. If something in the __aenter__ is cancelled and propagates up as an exception, it wouldn't make sense to call the __aexit__ either.
For shield, what the docs are saying is that on cancellation, the shield's task will be cancelled, which is shown to the caller by rising CancelledError on await, but the task it's wrapping will continue executing without ever being aware of the shielded cancellation.

Correct way to parallelize work with asyncio

There are many posts on SO asking specific questions about asyncio, but I cannot grasp the right way on what to use for a given situation.
Let's say I want to parse and crawl a number of web pages in parallel. I can do this in at least 3 different ways with asyncio:
with pool.submit:
with ThreadPoolExecutor(max_workers=10) as pool:
result_futures = list(map(lambda x: pool.submit(my_func, x), my_list))
for future in as_completed(result_futures):
results.append(future.result())
return results
With asyncio.gather:
loop = asyncio.get_running_loop()
with ThreadPoolExecutor(max_workers=10) as pool:
futures = [loop.run_in_executor(pool, my_func, x) for x in my_list]
results = await asyncio.gather(*futures)
With just pool.map:
with ThreadPoolExecutor(max_workers=10) as pool:
results = [x for x in pool.map(my_func, arg_list)]
my_func is something like
async def my_func(arg):
async with aiohttp.ClientSession() as session:
async with session.post(...):
...
Could somebody help me understand what would be the differences between those 3 approaches? I understand that I can, for example, handle exceptions independently in the first one, but any other differences?
None of these. ThreadPoolExecutor and run_in_executor will all execute your code in another thread, no matter you use the asyncio loop to watch for their execution. And at that point you might just as well not use asyncio at all: the idea of async is exactly managing to run everything on a single thread - getting some CPU cycles and easing a lot on race-conditions that emerge on multi-threaded code.
If your my_func is using async correctly, all the way (it looks like it is, but the code is incomplete), you have to create an asyncio Task for each call to your "async defined" function. On that, maybe the shortest path is indeed using asyncio.gather:
import asyncio
import aiohttp, ... # things used inside "my_func"
def my_func(x):
...
my_list = ...
results = asyncio.run(asyncio.gather(*(my_func(x) for x in my_list)))
An that is all there is for it.
Now going back to your code, and checking the differences:
your code work almost by chance, as in, you really just passed the async functiona and its parameters to the threadpool executor: on calling any async function in this way, they return imediatelly, with no work done. That means nothing (but some thin boiler plate inner code used to create the co-routines) is executed in your threadpool executors. The values returned by the call that runs in the target threads (i.e. the actual my_func(x) call) are the "co-routines": these are the objects that are to be awaited in the main thread and that will actually performe the network I/O. That is: your "my_func" is a "co-routine function" and when called it retoruns immediately with a "co-routine object". When the co-routine object is awaited the code inside "my_func" is actually executed.
Now, with that out of the way: in your first snippet you call future.result on the concurrent.futures Future: that will jsut give you the co-routine object: that code does not work - if you would write results.append(await future.result()) then, yes, if there are no exceptions in the execution, it would work, but would make all the calls in sequence: "await" stops the execution of the current thread until the awaited object resolves, and since awaiting for the other results would happen in this same code, they will queue and be executed in order, with zero parallelism.
Your pool.map code does the same, and your asyncio.gather code is wrong in a different way: the loop.run_in_executor code will take your call and run it on another thread, and gives you an awaitable object which is suitable to be used with gather. However, awaiting on it will return you the "co-routine object", not the result of the HTTP call.
Your real options regarding getting the exceptions raised in the parallel code are either using asyncio.gather, asyncio.wait or asyncio.as_completed. Check the docs here: https://docs.python.org/3/library/asyncio-task.html

Implementing a coroutine in python

Lets say I have a C++ function result_type compute(input_type input), which I have made available to python using cython. My python code executes multiple computations like this:
def compute_total_result()
inputs = ...
total_result = ...
for input in inputs:
result = compute_python_wrapper(input)
update_total_result(total_result)
return total_result
Since the computation takes a long time, I have implemented a C++ thread pool (like this) and written a function std::future<result_type> compute_threaded(input_type input), which returns a future that becomes ready as soon as the thread pool is done executing.
What I would like to do is to use this C++ function in python as well. A simple way to do this would be to wrap the std::future<result_type> including its get() function, wait for all results like this:
def compute_total_results_parallel()
inputs = ...
total_result = ...
futures = []
for input in inputs:
futures.append(compute_threaded_python_wrapper(input))
for future in futures:
update_total_result(future.get())
return total_result
I suppose this works well enough in this case, but it becomes very complicated very fast, because I have to pass futures around.
However, I think that conceptually, waiting for these C++ results is no different from waiting for file or network I/O.
To facilitate I/O operations, the python devs introduced the async / await keywords. If my compute_threaded_python_wrapper would be part of asyncio, I could simply rewrite it as
async def compute_total_results_async()
inputs = ...
total_result = ...
for input in inputs:
result = await compute_threaded_python_wrapper(input)
update_total_result(total_result)
return total_result
And I could execute the whole code via result = asyncio.run(compute_total_results_async()).
There are a lot of tutorials regarding async programming in python, but most of them deal with using coroutines where the bedrock seem to be some call into the asyncio package, mostly calling asyncio.sleep(delay) as a proxy for I/O.
My question is: (How) Can I implement coroutines in python, enabling python to await the wrapped future object (There is some mention of a __await__ method returning an iterator)?
First, an inaccuracy in the question needs to be corrected:
If my compute_threaded_python_wrapper would be part of asyncio, I could simply rewrite it as [...]
The rewrite is incorrect: await means "wait until the computation finishes", so the loop as written would execute the code sequentially. A rewrite that actually runs the tasks in parallel would be something like:
# a direct translation of the "parallel" version
def compute_total_results_async()
inputs = ...
total_result = ...
tasks = []
# first spawn all the tasks
for input in inputs:
tasks.append(
asyncio.create_task(compute_threaded_python_wrapper(input))
)
# and then await them
for task in tasks:
update_total_result(await task)
return total_result
This spawn-all-await-all pattern is so uniquitous that asyncio provides a helper function, asyncio.gather(), which makes it much shorter, especially when combined with a list comprehension:
# a more idiomatic version
def compute_total_results_async()
inputs = ...
total_result = ...
results = await asyncio.gather(
*[compute_threaded_python_wrapper(input) for input in inputs]
)
for result in results:
update_total_result(result)
return total_result
With that out of the way, we can proceed to the main question:
My question is: (How) Can I implement coroutines in python, enabling python to await the wrapped future object (There is some mention of a __await__ method returning an iterator)?
Yes, awaitable objects are implemented using iterators that yield to indicate suspension. But that is way too low-level a tool for what you actually need. You don't need just any awaitable, but one that works with the asyncio event loop, which has specific expectations of the underlying iterator. You need a mechanism to resume the awaitable when the result is ready, where you again depend on asyncio.
Asyncio already provides awaitable objects that can be externally assigned a value: futures. An asyncio future represents an async value that will become available at some point in the future. They are related to, but not semantically equivalent to C++ futures, and should not to be confused with multi-threaded futures from the concurrent.futures stdlib module.
To create an awaitable object that is activated by something that happens in another thread, you need to create a future, and then start your off-thread task, instructing it to mark the future as completed when it finishes execution. Since asyncio futures are not thread-safe, this must be done using the call_soon_threadsafe event loop method provided by asyncio for such situations. In Python it would be done like this:
def run_async():
loop = asyncio.get_event_loop()
future = loop.create_future()
def on_done(result):
# when done, notify the future in a thread-safe manner
loop.call_soon_threadsafe(future.set_result, resut)
# start the worker in a thread owned by the pool
pool.submit(_worker, on_done)
# returning a future makes run_async() awaitable, and
# passable to asyncio.gather() etc.
return future
def _worker(on_done):
# this runs in a different thread
... processing goes here ...
result = ...
on_done(result)
In your case, the worker would be presumably implemented in Cython combined with C++.

Python coroutines

I have a little bit of experience with promises in Javascript. I am quite experienced with Python, but new to its coroutines, and there is a bit that I just fail to understand: where does the asynchronicity kick in?
Let's consider the following minimal example:
async def gen():
await something
return 42
As I understand it, await something puts execution of our function aside and lets the main program run other bits. At some point something has a new result and gen will have a result soon after.
If gen and something are coroutines, then by all internet wisdom they are generators. And the only way to know when a generator has a new item available, afaik, is by polling it: x=gen(); next(x). But this is blocking! How does the scheduler "know" when x has a result? The answer can't be "when something has a result" because something must be a generator, too (for it is a coroutine). And this argument applies recursively.
I can't get past this idea that at some point the process will just have to sit and wait synchronously.
The secret sauce here is the asyncio module. Your something object has to be an awaitable object itself, and either depend on more awaitable objects, or must yield from a Future object.
For example, the asyncio.sleep() coroutine yields a Future:
#coroutine
def sleep(delay, result=None, *, loop=None):
"""Coroutine that completes after a given time (in seconds)."""
if delay == 0:
yield
return result
if loop is None:
loop = events.get_event_loop()
future = loop.create_future()
h = future._loop.call_later(delay,
futures._set_result_unless_cancelled,
future, result)
try:
return (yield from future)
finally:
h.cancel()
(The syntax here uses the older generator syntax, to remain backwards compatible with older Python 3 releases).
Note that a future doesn't use await or yield from; they simply use yield self until some condition is met. In the above async.sleep() coroutine, that condition is met when a result has been produced (in the async.sleep() code above, via the futures._set_result_unless_cancelled() function called after a delay).
An event loop then keeps pulling in the next 'result' from each pending future it manages (polling them efficiently) until the future signals it is done (by raising a StopIteration exception holding the results; return from a co-routine would do that, for example). At that point the coroutine that yielded the future can be signalled to continue (either by sending the future result, or by throwing an exception if the future raised anything other than StopIteration).
So for your example, the loop will kick off your gen() coroutine, and await something then (directly or indirectly) yields a future. That future is polled until it raises StopIteration (signalling it is done) or raises some other exception. If the future is done, coroutine.send(result) is executed, allowing it to then advance to the return 42 line, triggering a new StopIteration exception with that value, allowing a calling coroutine awaiting on gen() to continue, etc.

Is asyncio.wait order guaranteed?

I'm trying to implement fair queuing in my library that is based on asyncio.
In some function, I have a statement like (assume socketX are tasks):
done, pending = asyncio.wait(
[socket1, socket2, socket3],
return_when=asyncio.FIRST_COMPLETED,
)
Now I read the documentation for asyncio.wait many times but it does not contain the information I'm after. Mainly, I'd like to know if:
socket1, socket2 and socket3 happened to be already ready when I issue the call. Is it guaranteed that done will contain them all or could it be that it returns only one (or two) ?
In the second case, does the order of the tasks passed to wait() matter ?
I'm trying to assert if I can just apply fair-queuing in the set of done tasks (by picking one and leaving the other tasks for later resolution) or if I also need to care about the order I pass the tasks in.
The documentation is kinda silent about this. Any idea ?
This is only taken according to the source code of Python 3.5.
If the future is done before calling wait, they will all be placed in the done set:
import asyncio
async def f(n):
return n
async def main():
(done, pending) = await asyncio.wait([f(1), f(2), f(3)], return_when=asyncio.FIRST_COMPLETED)
print(done) # prints set of 3 futures
print(pending) # prints empty set
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

Categories