Create and handle conditions with asyncio - Python

I have a parent function which should run 2 tests on a data set.
If either of these tests fails, the parent function should return fail. I want to run the two tests asynchronously with asyncio, and as soon as one of them fails, the parent function should return fail and cancel the other test.
I'm new to asyncio and have read some examples with conditions here, but I couldn't figure out how to write this with asyncio conditions.
So far I've handled it by throwing an exception in any test that fails.
Here is my basic code:
async def test1(data):
    # run some test on data; return True on pass and throw an exception on fail
    ...

async def test2(data):
    # run some test on data; return True on pass and throw an exception on fail
    ...

ioloop = asyncio.get_event_loop()
tasks = [ioloop.create_task(test1(data)), ioloop.create_task(test2(data))]
finished, unfinished = ioloop.run_until_complete(
    asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION))
But I don't think this is the proper way to handle it, so I would like a basic example of how to create and handle conditions with asyncio.

as soon as one of the tests failed, parent function should return fail and cancel the other test.
asyncio.gather does that automatically:
loop = asyncio.get_event_loop()
tasks = [loop.create_task(test1(data)), loop.create_task(test2(data))]
try:
    loop.run_until_complete(asyncio.gather(*tasks))
except FailException:  # use the exception raised by the task that fails
    print('failed')
When any task executed in asyncio.gather raises an exception, all other tasks will be cancelled using Task.cancel, and the exception will be propagated to the awaiter of gather. You don't need a Condition at all; cancellation will automatically interrupt whatever blocking operation the tasks were waiting on.
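For illustration, the same idea can be wrapped in a parent coroutine that returns a pass/fail result. This is just a sketch, with FailException standing in for whatever your tests actually raise:

import asyncio

class FailException(Exception):
    """Stand-in for whatever exception your tests raise on failure."""

async def test1(data):
    await asyncio.sleep(0.1)  # pretend to run a check that passes
    return True

async def test2(data):
    await asyncio.sleep(0.2)
    raise FailException("test2 failed")  # pretend this check fails

async def run_tests(data):
    try:
        # gather cancels the still-running test as soon as the other one raises
        await asyncio.gather(test1(data), test2(data))
    except FailException:
        return "fail"
    return "pass"

print(asyncio.run(run_tests({})))  # prints: fail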
Conditions are needed when a task that is otherwise idle (or many such tasks) needs to wait for an event that can happen in some other task. In that case it waits on a condition and is notified of it occurring. If the task is just going about its business, you can cancel it any time you like, or let functions like asyncio.gather or asyncio.wait_for do it for you.
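For completeness, here is a minimal sketch of what an asyncio.Condition is actually for (the names are made up for illustration): one otherwise idle task waits until another task changes some shared state and notifies it.

import asyncio

async def waiter(condition, state):
    async with condition:
        # Sleep until some other task sets the flag and notifies us
        await condition.wait_for(lambda: state["ready"])
        print("waiter: woke up")

async def setter(condition, state):
    await asyncio.sleep(1)  # simulate work happening elsewhere
    async with condition:
        state["ready"] = True
        condition.notify_all()

async def main():
    condition = asyncio.Condition()
    state = {"ready": False}
    await asyncio.gather(waiter(condition, state), setter(condition, state))

asyncio.run(main())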

Related

How to prevent python3.11 TaskGroup from canceling all the tasks

I just discovered the new features of Python 3.11 like ExceptionGroup and TaskGroup, and I'm confused by the following TaskGroup behavior: if one or more tasks inside the group fail, then all the other normal tasks are cancelled, and I have no chance to change that behavior.
Example:
async def f_error():
    raise ValueError()

async def f_normal(arg):
    print('starting', arg)
    await asyncio.sleep(1)
    print('ending', arg)

async with asyncio.TaskGroup() as tg:
    tg.create_task(f_normal(1))
    tg.create_task(f_normal(2))
    tg.create_task(f_error())

# starting 1
# starting 2
# ----------
# < traceback of the error here >
In the example above I cannot make "ending 1" and "ending 2" be printed. Meanwhile, it would be very useful to have something like the asyncio.gather(return_exceptions=True) option, so the remaining tasks are not cancelled when an error occurs.
You could say "just do not use TaskGroup if you do not want this cancellation behavior", but the answer is that I want to use the new exception groups feature, and it is strictly bound to TaskGroup.
So the questions are:
Can I somehow utilize exception groups in asyncio without this all-or-nothing cancellation policy in TaskGroup?
If the answer to the previous question is "no": why did the Python developers eliminate the possibility of disabling cancellation in the TaskGroup API?
BaseExceptionGroups became part of standard Python in version 3.11. They are not bound to asyncio TaskGroup in any way. The documentation is here: https://docs.python.org/3/library/exceptions.html?highlight=exceptiongroup#ExceptionGroup.
Regarding your question 2, within the TaskGroup context you always have the option of creating a task using asyncio.create_task or loop.create_task. Such tasks will not be part of the TaskGroup and will not be cancelled when the TaskGroup closes. An exception in one of these tasks will not cause the group to close, provided the exception does not propagate into the group's __aexit__ method.
You also have the option of handling all errors within a Task. A Task that doesn't propagate an exception won't cancel the TaskGroup.
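A rough sketch of both of those options, assuming Python 3.11+ (the helper names here are invented for illustration): the task created with asyncio.create_task is not owned by the group, and the task that handles its own exception never triggers the group's cancellation.

import asyncio

async def failing():
    raise ValueError("boom")

async def safe_failing():
    try:
        raise ValueError("boom")
    except ValueError:
        pass  # handled inside the task, so the group never sees it

async def survivor():
    await asyncio.sleep(0.5)
    print("survivor finished")

async def main():
    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(safe_failing())              # handled error: group keeps running
            outside = asyncio.create_task(survivor())   # not part of the group
            tg.create_task(failing())                   # this one aborts the group
    except* ValueError:
        print("group failed")
    await outside  # the non-group task is still alive and can be awaited

asyncio.run(main())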
There's a good reason for enforcing Task cancellation when the group exits: the purpose of a group is to act as a self-contained collection of Tasks. It's contradictory to allow an uncancelled Task to continue after the group exits, potentially allowing tasks to leak out of the context.
As answered by Paul Cornelius, the TaskGroup class is carefully engineered to cancel itself and all its tasks at the moment when any task in it (registered with tg.create_task) raises an exception.
My understanding is that a "forgiving" task group, one that would await all of its other tasks upon its context exit (the end of the async with block) regardless of one or more of the tasks created in it raising, would still be useful, and that is the functionality you want.
I tinkered with the source code for TaskGroup, and I think the minimal change to get the forgiving task group is to neuter its internal _abort method. This method is called when handling a task exception, and all it does is loop through all tasks not yet done and cancel them. Tasks that are not cancelled will still be awaited at the end of the async with block - and that is what we get by preventing _abort from running.
Keep in mind that since _abort starts with an underscore, it is an implementation detail, and the mechanism for aborting might change inside TaskGroup even during the Python 3.11 lifetime.
For now, I could get it working like this:
import asyncio

class ForgivingTaskGroup(asyncio.TaskGroup):
    _abort = lambda self: None

async def f_error():
    print("starting error")
    raise RuntimeError("booom")

async def f_normal(arg):
    print('starting', arg)
    await asyncio.sleep(.1)
    print('ending', arg)

async def main():
    async with ForgivingTaskGroup() as tg:
        tg.create_task(f_normal(1))
        tg.create_task(f_normal(2))
        tg.create_task(f_error())
        # await asyncio.sleep(0)

asyncio.run(main())
The stdout I got here is:
starting 1
starting 2
starting error
ending 1
ending 2
And stderr displayed the beautiful ASCII-art exception tree, as by the book, but with a single exception as a child.
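If you want to inspect that exception group yourself instead of letting it bubble up, the new except* syntax works with this just as well; a small sketch reusing the classes above:

async def main():
    try:
        async with ForgivingTaskGroup() as tg:
            tg.create_task(f_normal(1))
            tg.create_task(f_normal(2))
            tg.create_task(f_error())
    except* RuntimeError as eg:
        # eg is an ExceptionGroup; eg.exceptions holds the individual errors
        print("caught:", [repr(exc) for exc in eg.exceptions])

asyncio.run(main())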

asyncio cancel task and related function run via run_in_executor

Just can't wrap my head around solving this issue, so maybe someone here can enlighten me or maybe even tell me that what I want to achieve isn't possible. :)
Problem statement:
I have an asyncio event loop, and on that loop I create a task by supplying my asynchronous coroutine work(). I can then go ahead and cancel the task by invoking its cancel() method - this works.
But in my very special case, the asynchronous task itself spawns another operation, which is an underlying blocking / synchronous function.
What happens now, if I decide to cancel the task, is that my asynchronous work() function will be cancelled appropriately; however, the synchronous function is still going to be executed as if nothing ever happened.
I tried to make an example as simple as possible to illustrate my problem:
import asyncio
import time

def sync_work():
    time.sleep(10)
    print("sync work completed")
    return "sync_work_result"

async def work(loop):
    result = await loop.run_in_executor(None, sync_work)
    print(f"sync_work {result}")
    print("work completed")

async def main(loop):
    t1 = loop.create_task(work(loop))
    await asyncio.sleep(4)
    t1.cancel()

loop = asyncio.get_event_loop()
try:
    asyncio.ensure_future(main(loop))
    loop.run_forever()
except KeyboardInterrupt:
    pass
finally:
    print("loop closing")
    loop.close()
This will print out sync work completed after about 10 seconds.
How would I invoke the synchronous function in a way that would allow me to terminate it once my asynchronous task is cancelled? The tricky part is that I do not have control over sync_work(), as it comes from an external package.
I'm open to other approaches of calling my synchronous function from an asynchronous function that would allow it to be terminated properly in some kind of way.
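One approach worth sketching, under the assumption that sync_work is safe to kill, is to run the blocking call in a separate multiprocessing.Process and terminate it when the task is cancelled. This discards the function's return value (you would need a queue or pipe to get it back), so it is a sketch rather than a drop-in solution:

import asyncio
import multiprocessing
import time

def sync_work():
    time.sleep(10)
    print("sync work completed")

async def work():
    proc = multiprocessing.Process(target=sync_work)
    proc.start()
    try:
        # Poll the worker process without blocking the event loop
        while proc.is_alive():
            await asyncio.sleep(0.1)
    except asyncio.CancelledError:
        proc.terminate()  # actually stop the blocking work
        proc.join()
        raise

async def main():
    t1 = asyncio.create_task(work())
    await asyncio.sleep(4)
    t1.cancel()
    try:
        await t1
    except asyncio.CancelledError:
        print("work cancelled, sync process terminated")

if __name__ == "__main__":
    asyncio.run(main())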

Getting Result from Cancelled Task

I have a program, roughly like the example below.
A task is gathering a number of values and returning them to a caller.
Sometimes the tasks may get cancelled.
In those cases, I still want to get the results the tasks have gathered so far.
Hence I catch the CancelledError exception, clean up, and return the completed results.
async def f():
    results = []
    for i in range(100):
        try:
            res = await slow_call()
            results.append(res)
        except asyncio.CancelledError:
            results.append('Undecided')
    return results

def on_done(task):
    if task.cancelled():
        print('Incomplete result', task.result())
    else:
        print(task.result())

async def run():
    task = asyncio.create_task(f())
    task.add_done_callback(on_done)
The problem is that the value returned after a task is cancelled doesn't appear to be available in the task.
Calling task.result() simply re-raises CancelledError, and task._result is just None.
Is there a way to get the return value of a cancelled task, assuming it has one?
Edit: I realize now that catching the CancelledError results in the task not being cancelled at all.
This leaves me with another conundrum: how do I signal to the task's owner that this result is only a "half" result, and the task has really been cancelled?
I suppose I could add an extra return value indicating this, but that seems to go against the whole idea of the task cancellation system.
Any suggestions for a good approach here?
I'm a long way away from understanding the use case, but the following does something sensible for me:
import asyncio

async def fn(results):
    for i in range(10):
        # your slow_call
        await asyncio.sleep(0.1)
        results.append(i)

def on_done(task, results):
    if task.cancelled():
        print('incomplete', results)
    else:
        print('complete', results)

async def run():
    results = []
    task = asyncio.create_task(fn(results))
    task.add_done_callback(lambda t: on_done(t, results))
    # give fn some time to finish; reducing this will cause the task to be cancelled
    # you'll see the incomplete message if this is < 1.1
    await asyncio.sleep(1.1)

asyncio.run(run())
It's the use of add_done_callback and sleep in run that feels very awkward and makes me think I don't understand what you're doing. Maybe posting something to https://codereview.stackexchange.com containing more of the calling code would help get ideas for better ways to structure things. Note that there are other libraries, like Trio, that provide much nicer interfaces to Python coroutines than the built-in asyncio library (which was standardised prematurely, IMO).
I don't think that is possible, because, in my opinion, it collides with the meaning of cancelling a task.
You can implement similar behavior inside your slow_call by triggering the CancelledError, catching it inside your function, and then returning whatever you want.
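A small sketch of that suggestion, with the cancellation absorbed inside slow_call itself and a sentinel returned instead (whether silently swallowing the cancel is acceptable depends on your design):

import asyncio

async def slow_call(i):
    try:
        await asyncio.sleep(0.1)
        return i
    except asyncio.CancelledError:
        # Absorb the cancellation and hand back a sentinel value instead
        return 'Undecided'

async def f():
    results = []
    for i in range(10):
        results.append(await slow_call(i))
    return results

async def run():
    task = asyncio.create_task(f())
    await asyncio.sleep(0.35)
    task.cancel()
    # The task completes normally because slow_call absorbed the cancel
    print(await task)  # e.g. [0, 1, 2, 'Undecided', 4, 5, 6, 7, 8, 9]

asyncio.run(run())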

Detect failed tasks in concurrent.futures

I've been using concurrent.futures as it has a simple interface and lets the user easily control the maximum number of threads/processes. However, it seems that concurrent.futures hides failed tasks and continues the main thread after all tasks have finished or failed.
import concurrent.futures

def f(i):
    return (i + 's')

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    fs = [executor.submit(f, i) for i in range(10)]
    concurrent.futures.wait(fs)
Calling f on any integer raises a TypeError. However, the whole script runs just fine and exits with code 0. Is there any way to make it raise an exception/error when any thread fails?
Or, is there a better way to limit the number of threads/processes without using concurrent.futures?
concurrent.futures.wait will ensure all the tasks completed, but it doesn't check success (something return-ed) vs. failure (exception raised and not caught in worker function). To do that, you need to call .result() on each Future (which will cause it to either re-raise the exception from the task, or produce the return-ed value). There are other methods to check without actually raising in the main thread (e.g. .exception()), but .result() is the most straightforward method.
If you want to make it re-raise, the simplest approach is just to replace the wait() call with:
for fut in concurrent.futures.as_completed(fs):
    fut.result()
which will process results as Futures complete, and promptly raise an Exception if one occurred. Alternatively, you can continue to use wait so all tasks finish before you check for exceptions on any of them, then iterate over fs directly and call .result() on each.
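A sketch of that second variant, waiting for everything first and only then checking each future:

import concurrent.futures

def f(i):
    return i + 's'  # raises TypeError for any int argument

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    fs = [executor.submit(f, i) for i in range(10)]
    concurrent.futures.wait(fs)      # block until every task has finished or failed
    for fut in fs:
        exc = fut.exception()        # inspect without re-raising in the main thread
        if exc is not None:
            print("task failed:", exc)
        # calling fut.result() here instead would re-raise the stored exception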
There is another way to do the same with multiprocessing.Pool (for processes) or multiprocessing.pool.ThreadPool (for threads). As far as I know, it rethrows any exceptions raised in the worker.
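For example, with multiprocessing.pool.ThreadPool a worker's exception is re-raised in the main thread when you collect the results (a quick sketch):

from multiprocessing.pool import ThreadPool

def f(i):
    return i + 's'  # raises TypeError for any int argument

with ThreadPool(processes=10) as pool:
    try:
        pool.map(f, range(10))  # re-raises the worker's TypeError here
    except TypeError as exc:
        print("a task failed:", exc)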

Does asyncio.wait return only after all done_callbacks were called?

Imagine the following code:
import asyncio

loop = asyncio.get_event_loop()

@asyncio.coroutine
def coro():
    yield from asyncio.sleep(1)

def done_callback(future):
    print("Callback called")

@asyncio.coroutine
def run():
    future = asyncio.async(coro(), loop=loop)
    future.add_done_callback(done_callback)
    yield from asyncio.wait([future])
    print("Wait returned")

loop.run_until_complete(run())
Output is:
$ python3 /tmp/d.py
Callback called
Wait returned
So done_callback was called before wait returned.
Is this guaranteed behavior? I did not find anything in the documentation about it.
Is there a possible situation where done_callback is called after wait returns?
With the current asyncio implementation, as long as add_done_callback is called before the event loop iteration in which coro actually completes, all the callbacks scheduled with add_done_callback will execute before wait unblocks. The reason is that asyncio.wait internally calls add_done_callback on all the Future instances you pass to it, so it's just another callback in the callback chain for the Task. When your Task completes, asyncio calls set_result on it, which looks like this:
def set_result(self, result):
    """Mark the future done and set its result.

    If the future is already done when this method is called, raises
    InvalidStateError.
    """
    if self._state != _PENDING:
        raise InvalidStateError('{}: {!r}'.format(self._state, self))
    self._result = result
    self._state = _FINISHED
    self._schedule_callbacks()
_schedule_callbacks looks like this:
def _schedule_callbacks(self):
    """Internal: Ask the event loop to call all callbacks.

    The callbacks are scheduled to be called as soon as possible. Also
    clears the callback list.
    """
    callbacks = self._callbacks[:]
    if not callbacks:
        return

    self._callbacks[:] = []
    for callback in callbacks:
        self._loop.call_soon(callback, self)
So, once the Task is done, loop.call_soon is used to schedule all the callbacks (which includes your done_callback function, and the callback added by asyncio.wait).
The event loop will process all the callbacks in the internal callback list in one iteration, which means both the asyncio.wait callback and your done_callback will be executed together in a single event loop iteration:
# This is the only place where callbacks are actually *called*.
# All other places just add them to ready.
# Note: We run all currently scheduled callbacks, but not any
# callbacks scheduled by callbacks run this time around --
# they will be run the next time (after another I/O poll).
# Use an idiom that is thread-safe without using locks.
ntodo = len(self._ready)
for i in range(ntodo):
    handle = self._ready.popleft()
    if handle._cancelled:
        continue
    if self._debug:
        t0 = self.time()
    handle._run()
So, as long as your add_done_callback ran prior to the event loop iteration where coro completed, you're guaranteed (at least by the current implementation) that it will run before asyncio.wait unblocks. However, if add_done_callback is executed either after coro completes or on the same event loop iteration that coro completes, it won't run until after asyncio.wait finishes.
I would say that if add_done_callback is called before asyncio.wait, like in your example, you can be confident it will always run before wait unblocks, since your callback will be ahead of the asyncio.wait callback in the callback chain. If you end up calling add_done_callback after asyncio.wait has started, it will still work for now, but theoretically the implementation could change in a way that would make it not work; it could be changed to only process a limited number of callbacks on every event loop iteration, for example. I doubt that change will ever be made, but it's possible.
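For what it's worth, the same experiment rewritten with current asyncio APIs (plain async def plus asyncio.run instead of the removed @asyncio.coroutine and asyncio.async) still prints the callback line before wait returns on current CPython:

import asyncio

async def coro():
    await asyncio.sleep(1)

def done_callback(future):
    print("Callback called")

async def run():
    task = asyncio.ensure_future(coro())
    task.add_done_callback(done_callback)
    await asyncio.wait([task])
    print("Wait returned")

asyncio.run(run())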
