How to handle Exceptions and ordering in AsyncIO - python

Currently I'm trying to process a list of messages in the same order they arrive. To process them, I'm using Python asyncio and executing each message as a coroutine/task: I create the coroutine/task for each message and add it to a run-forever loop. But there are corner cases where an exception can occur inside a coroutine. When that happens, I plan to retry the message or handle it differently, but the next coroutine should not be invoked, so that the order of execution is preserved.
Is there any way to handle this?
from asyncio import get_event_loop, sleep

status = True

async def c(id, sleep_time=2, fail=False):
    global status
    print(f'started the edge side {id}')
    if status:
        print('c', sleep_time, fail)
        await sleep(sleep_time)
        if fail:
            status = False
            raise Exception('fail')

loop = get_event_loop()
loop.create_task(c(1, sleep_time=1, fail=False))
loop.create_task(c(2, sleep_time=1, fail=False))
loop.create_task(c(3, sleep_time=1, fail=True))
loop.create_task(c(4, sleep_time=1, fail=False))
loop.create_task(c(5, sleep_time=1, fail=False))
loop.run_forever()
Above is a small example that I have tried, and it is still not working as expected. Can anyone please suggest a possible way to handle this?
Thanks

but the next coroutine should not be invoked to preserve the order of execution
It sounds like you need a queue: push an item into the queue and only start the next one when the previous one has completed, but then all messages will be processed sequentially.
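A minimal sketch of that idea, assuming a hypothetical handle() coroutine that processes a single message and may raise:

import asyncio

async def consumer(queue: asyncio.Queue):
    while True:
        message = await queue.get()
        while True:              # retry the same message until it succeeds,
            try:                 # so later messages are never started early
                await handle(message)  # hypothetical per-message processing
                break
            except Exception as exc:
                print(f"retrying {message!r} after error: {exc}")
                await asyncio.sleep(1)
        queue.task_done()

Because a failed message is retried in place, the next message never starts before the current one succeeds, which preserves the order of execution.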
The doc for asyncio.create_task says:
Wrap the coro coroutine into a Task and schedule its execution. Return
the Task object.
You can see this in the following example. All tasks have been scheduled, and when task no. 3 fails, the remaining ones are not done, but the ones that take less time have already completed because they were already scheduled:
$ python test.py
START: 1
START: 2
START: 3
START: 4
START: 5
END: 1
END: 2
END: 4
DONE: ['1', '2', '3', '4']
PENDING: ['5']
Task: 3 failed (ERROR), re-scheduling...
test.py:
import asyncio


async def worker(i, sleep):
    print(f"START: {i}")
    await asyncio.sleep(sleep)
    if i == 3:
        raise Exception("ERROR")
    print(f"END: {i}")


async def main():
    tasks = []
    for i in range(1, 6):
        sleep = 4 if i == 3 else i
        tasks.append(asyncio.create_task(worker(i, sleep), name=i))

    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    print(f"DONE: {sorted(t.get_name() for t in done)}")
    print(f"PENDING: {sorted(t.get_name() for t in pending)}")

    for task in done:
        if task.exception():
            print(
                f"Task: {task.get_name()} failed ({task.exception()}), re-scheduling..."
            )


if __name__ == "__main__":
    asyncio.run(main())

Related

asynchronous iteration, how to move to next step of iteration while waiting for a task to complete?

I am trying to write an iterator which moves on to the next step in the iteration while awaiting an IO-bound task. To roughly demonstrate what I'm trying to do in code:
for i in iterable:
    await io_bound_task()  # move on to next step in iteration
    # do more stuff when task is complete
I initially tried running with a simple for loop, with a sleep simulating an IO bound task
import asyncio
import random


async def main() -> None:
    for i in range(3):
        print(f"starting task {i}")
        result = await io_bound_task(i)
        print(f"finished task {result}")


async def io_bound_task(i: int) -> int:
    await asyncio.sleep(random.random())
    return i


asyncio.run(main())
here the code runs synchronously and outputs
starting task 0
finished task 0
starting task 1
finished task 1
starting task 2
finished task 2
which I assume is because the for loop is blocking. So I think an asynchronous for loop is the way to proceed? So I try using an asynchronous iterator:
from __future__ import annotations

import asyncio
import random


class AsyncIterator:
    def __init__(self, max_value: int) -> None:
        self.max_value = max_value
        self.count = 0

    def __aiter__(self) -> AsyncIterator:
        return self

    async def __anext__(self) -> int:
        if self.count == self.max_value:
            raise StopAsyncIteration
        self.count += 1
        return self.count


async def main() -> None:
    async for i in AsyncIterator(3):
        print(f"starting task {i}")
        result = await io_bound_task(i)
        print(f"finished task {result}")


async def io_bound_task(i: int) -> int:
    await asyncio.sleep(random.random())
    return i


asyncio.run(main())
but this also seems to run synchronously and results in the output
starting task 1
finished task 1
starting task 2
finished task 2
starting task 3
finished task 3
every time. So I think the asynchronous iterator is not doing what I assumed it would do? At this point I'm stuck. Is it an issue with my understanding of the asynchronous iterator? Can someone give me some pointers as to how to achieve what I'm trying to do?
I'm new to working with async, so apologies if I'm doing something stupid. Any help is appreciated. Thanks.
I'm on python 3.8.10 if that is a relevant detail.
The thing that you are looking for is called a task, and can be created using the asyncio.create_task function. All the approaches you tried involved awaiting the coroutine io_bound_task(i), and await means something like "wait for this to complete before continuing". If you wrap your coroutine in a task, then it will run in the background rather than you having to wait for it to complete before continuing.
Here is a version of your code using tasks:
import asyncio
import random


async def main() -> None:
    tasks = []
    for i in range(3):
        print(f"starting task {i}")
        tasks.append(asyncio.create_task(io_bound_task(i)))
    for task in tasks:
        result = await task
        print(f"finished task {result}")


async def io_bound_task(i: int) -> int:
    await asyncio.sleep(random.random())
    return i


asyncio.run(main())
Output:
starting task 0
starting task 1
starting task 2
finished task 0
finished task 1
finished task 2
You can also use asyncio.gather (if you need all results before continuing) or asyncio.wait for awaiting multiple tasks, rather than a loop. For example if task 2 completes before task 0 and you don't want to wait for task 0, you could do:
async def main() -> None:
    pending = []
    for i in range(3):
        print(f"starting task {i}")
        pending.append(asyncio.create_task(io_bound_task(i)))
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            result = await task
            print(f"finished task {result}")

Asyncio running same task second time with different input shuts down first task too

I have a script where I have multiple async functions and I am running them in loop. Everything runs okay, except one task which I need to run twice with different input parameters.
def run(self):
    checks_to_run = self.returnChecksBasedOnInputs()
    loop = asyncio.new_event_loop().run_until_complete(self.run_all_checks_async(checks_to_run))
    asyncio.set_event_loop(loop)
    return self.output

async def run_all_checks_async(self, checks_to_run):
    async with aiohttp.ClientSession() as session:
        check_results = []
        for single_check in checks_to_run:
            if single_check == "cvim_check_storage":  # can run parallel in separate thread for each az
                total_number_of_azs = len(Constants.cvim_azs) + 1
                self.log.info(total_number_of_azs)
                for x in range(1, total_number_of_azs):
                    task = asyncio.ensure_future(getattr(self, single_check)(session, x))
            else:
                task = asyncio.ensure_future(getattr(self, single_check)(session))
            check_results.append(task)
        await asyncio.gather(*check_results, return_exceptions=False)
class apiCaller:
    def __init__(self):
        pass

    async def callAndReturnJson(self, method, url, headers, session, payload, log):
        sslcontext = ssl.create_default_context(purpose=ssl.Purpose.CLIENT_AUTH)
        try:
            async with session.request(method, url, data=payload, headers=headers, ssl=sslcontext) as response:
                response = await response.json()
                print(str(response))
                return response
        except Exception as e:
            print("here exception")
            raise Exception(str(e))
The problem is in this function. I am running it twice, but I noticed that when the second instance of the task reaches the return statement, the first task also closes down immediately. How can I avoid that and wait until the other task also finishes?
async def cvim_check_storage(self, session, aznumber):
    response = await apiCaller().callAndReturnJson("POST", f"https://{single_cvim_az}/v1/diskmgmt/check_disks", getattr(Constants, cvim_az_headers), session=session, payload=payload, log=self.log)
    self.log.info(str(response))
    self.log.info(str(response.keys()))
    if "diskmgmt_request" not in response.keys():
        self.output.cvim_checks.cvim_raid_checks.results[az_plus_number].overall_status = "FAILED"
        self.output.cvim_checks.cvim_raid_checks.results[az_plus_number].details = str(response)
        return
    # ...rest of the code if the above if statement is false
The problem is how you track your tasks. You are using the variable task to add new tasks to check_results, but in one of your branches you assign task inside a for loop. You don't append to check_results until after that loop completes, so only the last task gets added, and gather won't wait for any of the other tasks created inside the loop.
The solution is to add task during each iteration of the inner for loop. There are a few different ways to spell that.
One option is to just call check_results.append anywhere you currently assign to task.
if single_check == "cvim_check_storage":  # can run parallel in separate thread for each az
    total_number_of_azs = len(Constants.cvim_azs) + 1
    self.log.info(total_number_of_azs)
    for x in range(1, total_number_of_azs):
        check_results.append(
            asyncio.ensure_future(getattr(self, single_check)(session, x))
        )
else:
    check_results.append(
        asyncio.ensure_future(getattr(self, single_check)(session))
    )
I'd take it one step further and use a list comprehension when creating multiple tasks, though.
if single_check == "cvim_check_storage":  # can run parallel in separate thread for each az
    total_number_of_azs = len(Constants.cvim_azs) + 1
    self.log.info(total_number_of_azs)
    check_results.extend(
        [
            asyncio.ensure_future(getattr(self, single_check)(session, x))
            for x in range(1, total_number_of_azs)
        ]
    )
else:
    task = asyncio.ensure_future(getattr(self, single_check)(session))
    check_results.append(task)
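As a side note beyond the original answer: on Python 3.11+ asyncio.TaskGroup can replace the manual check_results bookkeeping, because every task created in its scope is awaited when the block exits. A rough sketch under that assumption, reusing the names from the question:

async def run_all_checks_async(self, checks_to_run):
    async with aiohttp.ClientSession() as session:
        # TaskGroup (Python 3.11+) awaits all tasks created inside the block
        async with asyncio.TaskGroup() as tg:
            for single_check in checks_to_run:
                if single_check == "cvim_check_storage":
                    for x in range(1, len(Constants.cvim_azs) + 1):
                        tg.create_task(getattr(self, single_check)(session, x))
                else:
                    tg.create_task(getattr(self, single_check)(session))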

asyncio task was destroyed but it is pending

I am working on a sample program that reads from a data source (csv or rdbms) in chunks, makes some transformations and sends it via socket to a server.
But because the csv is very large, for testing purposes I want to stop reading after a few chunks.
Unfortunately something goes wrong and I do not know what or how to fix it. Probably I have to do some cancellation, but I am not sure where and how. I get the following error:
Task was destroyed but it is pending!
task: <Task pending coro=<<async_generator_athrow without __name__>()>>
The sample code is:
import asyncio
import json

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x): [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.001)
        yield chunk

async def send(row):
    j = json.dumps(row)
    print(f"to be sent: {j}")
    await asyncio.sleep(0.001)

async def main():
    i = 0
    async for chunk in readChunks():
        for k, v in chunk.items():
            await asyncio.gather(send({k: v}))
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
Many async resources, such as generators, need to be cleaned up with the help of an event loop. When an async for loop stops iterating an async generator via break, the generator is cleaned up by the garbage collector only. This means the task is pending (waits for the event loop) but gets destroyed (by the garbage collector).
The most straightforward fix is to aclose the generator explicitly:
async def main():
    i = 0
    aiter = readChunks()      # name iterator in order to ...
    try:
        async for chunk in aiter:
            ...
            i += 1
            if i > 5:
                break
    finally:
        await aiter.aclose()  # ... clean it up when done
These patterns can be simplified using asyncstdlib (disclaimer: I maintain this library). asyncstdlib.islice allows taking a fixed number of items before cleanly closing the generator:
import asyncstdlib as a

async def main():
    async for chunk in a.islice(readChunks(), 5):
        ...
If the break condition is dynamic, scoping the iterator guarantees cleanup in any case:
import asyncstdlib as a

async def main():
    async with a.scoped_iter(readChunks()) as aiter:
        async for idx, chunk in a.enumerate(aiter):
            ...
            if idx >= 5:
                break
This works...
import asyncio
import json
import logging

logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s',
                    datefmt='%S')
root = logging.getLogger()
root.setLevel(logging.INFO)

async def readChunks():
    # this is basically a dummy alternative for reading csv in chunks
    df = [{"chunk_" + str(x): [r for r in range(10)]} for x in range(10)]
    for chunk in df:
        await asyncio.sleep(0.002)
        root.info('readChunks: next chunk coming')
        yield chunk

async def send(row):
    j = json.dumps(row)
    root.info(f"to be sent: {j}")
    await asyncio.sleep(0.002)

async def main():
    i = 0
    root.info('main: starting to read chunks')
    async for chunk in readChunks():
        for k, v in chunk.items():
            root.info(f'main: sending an item')
            #await asyncio.gather(send({k:v}))
            stuff = await send({k: v})
        i += 1
        if i > 5:
            break
        #print(f"item in main via async generator is {chunk}")

##loop = asyncio.get_event_loop()
##loop.run_until_complete(main())
##loop.close()

if __name__ == '__main__':
    asyncio.run(main())
... At least it runs and finishes.
The issue with stopping an async generator by breaking out of an async for loop is described in bugs.python.org/issue38013 and looks like it was fixed in 3.7.5.
However, using
loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.run_until_complete(main())
loop.close()
I get a debug error but no Exception in Python 3.8.
Task was destroyed but it is pending!
task: <Task pending name='Task-8' coro=<<async_generator_athrow without __name__>()>>
Using the higher level API asyncio.run(main()) with debugging ON I do not get the debug message. If you are going to try and upgrade to Python 3.7.5-9 you probably should still use asyncio.run().
The problem is simple. You exit the loop early, but the async generator is not exhausted yet (it's pending):
...
if i > 5:
    break
...
Your readChunks is running asynchronously, and so is your loop; by breaking out before it completes, you interrupt it.
That's why it gives "asyncio task was destroyed but it is pending".
In short, the async task was doing its work in the background, but you killed it by breaking out of the loop (stopping the program).
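As an aside, on Python 3.10+ the standard library's contextlib.aclosing gives the same guarantee as the explicit aclose() shown earlier: the generator is closed even when you break out early. A sketch based on the example above:

import contextlib

async def main():
    i = 0
    async with contextlib.aclosing(readChunks()) as chunks:  # aclose() awaited on exit
        async for chunk in chunks:
            for k, v in chunk.items():
                await send({k: v})
            i += 1
            if i > 5:
                break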

AsyncIO. How to call 1000 methods asynchronously and get the asynchronously result as soon as it is ready

I ask for your advice. I want to understand how async works with a simple example. Suppose you need to create 1000 workers that each return some result, but you need each result as soon as it is ready.
Here is an example:
import asyncio

async def worker(number):
    print("worker # %d" % number)
    await asyncio.sleep(0)
    return str(number)

async def print_when_done(tasks):
    for res in asyncio.as_completed(tasks):
        print("Result worker %s" % await res)

coros = [worker(i) for i in range(10)]
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()
The problem is that this example does not give the interleaved result I want: it simply starts all the functions without blocking the main process, and only at the end does it return the responses of all of them:
worker # 2
worker # 3
worker # 4
worker # 1
worker # 0
Result worker 2
Result worker 3
Result worker 4
Result worker 1
Result worker 0
But how do I achieve output similar to this:
worker # 2
worker # 3
worker # 4
Result worker 3
Result worker 2
worker # 1
Result worker 4
worker # 0
Result worker 1
Result worker 0
You could create a ThreadPoolExecutor, of course, or a ProcessPoolExecutor. But then why do you need asyncio at all, since you can create threads without it and work with them?
You're looking for asyncio.wait:
from concurrent.futures import FIRST_COMPLETED

async def print_when_done(pending):
    while True:
        if not pending:
            break
        done, pending = await asyncio.wait(pending, return_when=FIRST_COMPLETED)
        for res in done:
            print("Result worker %s" % res.result())  # res is a finished Task, so read its result
But then why do you need Asyncio, you can create threads without it and work with them.
Sure, threads can be more efficient and you can do more things with them, but single-threaded asynchronous cooperative multi-tasking is simpler to coordinate.
it simply calls the function without blocking the main process, and at
the end it returns the responses of all functions
It starts all workers concurrently, and this is how it should be; it computes their results immediately (since worker doesn't contain anything that actually blocks on I/O) and returns the results at the same time.
If you want to see workers return results at different times, you should make them run for different amounts of time, for example by placing await asyncio.sleep(randint(1, 3)) instead of your 0-second sleep, as sketched below.
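A minimal variation of the worker above under that assumption (randint imported from random):

from random import randint

async def worker(number):
    print("worker # %d" % number)
    await asyncio.sleep(randint(1, 3))  # different durations, so results arrive at different times
    return str(number)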
I'm not sure I understood why you want this:
worker # 2
worker # 3
worker # 4
Result worker 3
Since you have a print at the top of each worker (without any blocking I/O before it) and run all workers concurrently, you will see all their prints immediately, before any result.
My random guess is that you may want to limit the number of workers running in parallel? In that case you can use synchronization primitives like asyncio.Semaphore.
Here's an example that contains all of the above:
import asyncio
from random import randint

sem = asyncio.Semaphore(3)  # don't allow more than 3 workers to run in parallel

async def worker(number):
    async with sem:
        print("started # %d" % number)
        await asyncio.sleep(randint(1, 3))
        return str(number)

async def main():
    coros = [worker(i) for i in range(10)]
    for res in asyncio.as_completed(coros):
        print("finished %s" % await res)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
Output:
started # 0
started # 6
started # 7
started # 2
finished 7
started # 8
finished 0
started # 3
finished 6
started # 9
finished 2
started # 4
finished 8
started # 1
started # 5
finished 3
finished 9
finished 4
finished 1
finished 5

how to add a coroutine to a running asyncio loop?

How can one add a new coroutine to a running asyncio loop? I.e. one that is already executing a set of coroutines.
I guess as a workaround one could wait for existing coroutines to complete and then initialize a new loop (with the additional coroutine). But is there a better way?
You can use create_task for scheduling new coroutines:
import asyncio

async def cor1():
    ...

async def cor2():
    ...

async def main(loop):
    await asyncio.sleep(0)
    t1 = loop.create_task(cor1())
    await cor2()
    await t1

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
loop.close()
To add a function to an already running event loop you can use:
asyncio.ensure_future(my_coro())
In my case I was using multithreading (threading) alongside asyncio and wanted to add a task to the event loop that was already running. For anyone else in the same situation, be sure to explicitly state the event loop (as one doesn't exist inside a Thread), i.e.:
In global scope:
event_loop = asyncio.get_event_loop()
Then later, inside your Thread:
asyncio.ensure_future(my_coro(), loop=event_loop)
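One caveat worth adding (not part of the original answer): asyncio.ensure_future is not thread-safe, so when the loop runs in a different thread the documented thread-safe call is asyncio.run_coroutine_threadsafe. A sketch, reusing the event_loop and my_coro names from above:

future = asyncio.run_coroutine_threadsafe(my_coro(), event_loop)
result = future.result()  # optionally block the calling thread until the coroutine finishes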
Your question is very close to "How to add function call to running program?"
When exactly do you need to add a new coroutine to the event loop?
Let's see some examples. Here's a program that starts an event loop with two coroutines running in parallel:
import asyncio
from random import randint

async def coro1():
    res = randint(0, 3)
    await asyncio.sleep(res)
    print('coro1 finished with output {}'.format(res))
    return res

async def main():
    await asyncio.gather(
        coro1(),
        coro1()
    )  # here we have two coroutines running in parallel

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Output:
coro1 finished with output 1
coro1 finished with output 2
[Finished in 2.2s]
Maybe you need to add some coroutine that takes the result of coro1 and uses it as soon as it's ready? In that case, just create a coroutine that awaits coro1 and uses its return value:
import asyncio
from random import randint

async def coro1():
    res = randint(0, 3)
    await asyncio.sleep(res)
    print('coro1 finished with output {}'.format(res))
    return res

async def coro2():
    res = await coro1()
    res = res * res
    await asyncio.sleep(res)
    print('coro2 finished with output {}'.format(res))
    return res

async def main():
    await asyncio.gather(
        coro2(),
        coro2()
    )  # here we have two coroutines running in parallel

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Output:
coro1 finished with output 1
coro2 finished with output 1
coro1 finished with output 3
coro2 finished with output 9
[Finished in 12.2s]
Think about coroutines as regular functions with specific syntax. You can start a set of functions to execute in parallel (with asyncio.gather), you can start the next function after the first one is done, and you can create new functions that call others.
If the task is to add one or more coroutines to a loop that is already executing some coroutines, then you can use this solution of mine:
import asyncio
import time
from threading import Thread
from random import randint

# first, we need a loop running in a parallel Thread
class AsyncLoopThread(Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self.loop = asyncio.new_event_loop()

    def run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

# example coroutine
async def coroutine(num, sec):
    await asyncio.sleep(sec)
    print(f'Coro {num} has finished')

if __name__ == '__main__':
    # init a loop in another Thread
    loop_handler = AsyncLoopThread()
    loop_handler.start()

    # adding first 5 coros
    for i in range(5):
        print(f'Add Coro {i} to the loop')
        asyncio.run_coroutine_threadsafe(coroutine(i, randint(3, 5)), loop_handler.loop)

    time.sleep(3)
    print('Adding 5 more coros')

    # adding 5 more coros
    for i in range(5, 10):
        print(f'Add Coro {i} to the loop')
        asyncio.run_coroutine_threadsafe(coroutine(i, randint(3, 5)), loop_handler.loop)

    # let them all finish
    time.sleep(60)
After execution of this example we will get this output:
Add Coro 0 to the loop
Add Coro 1 to the loop
Add Coro 2 to the loop
Add Coro 3 to the loop
Add Coro 4 to the loop
Coro 0 has finished
Adding 5 more coros
Add Coro 5 to the loop
Add Coro 6 to the loop
Add Coro 7 to the loop
Add Coro 8 to the loop
Add Coro 9 to the loop
Coro 1 has finished
Coro 3 has finished
Coro 2 has finished
Coro 4 has finished
Coro 9 has finished
Coro 5 has finished
Coro 7 has finished
Coro 6 has finished
Coro 8 has finished
Process finished with exit code 0
None of the answers here seem to exactly answer the question. It is possible to add tasks to a running event loop by having a "parent" task do it for you. I'm not sure what the most Pythonic way is to make sure the parent doesn't end until all of its children have finished (assuming that's the behavior you want), but this does work.
import asyncio
import random

async def add_event(n):
    print('starting ' + str(n))
    await asyncio.sleep(n)
    print('ending ' + str(n))
    return n

async def main(loop):
    added_tasks = []
    delays = list(range(5))
    # shuffle to simulate unknown run times
    random.shuffle(delays)
    for n in delays:
        print('adding ' + str(n))
        task = loop.create_task(add_event(n))
        added_tasks.append(task)
        await asyncio.sleep(0)
    print('done adding tasks')
    results = await asyncio.gather(*added_tasks)
    print('done running tasks')
    return results

loop = asyncio.get_event_loop()
results = loop.run_until_complete(main(loop))
print(results)
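A small aside beyond the original answer: on Python 3.7+ the same example can be driven by asyncio.run, which creates and closes the loop for you; inside main the running loop is then obtained with asyncio.get_running_loop(). A sketch:

async def main():
    loop = asyncio.get_running_loop()  # instead of receiving the loop as a parameter
    ...  # same body as above

results = asyncio.run(main())
print(results)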
