If I run this code, it hangs without throwing ZeroDivisionError.
If I move await asyncio.gather(*tasks, return_exceptions=True)
above await queue.join(), it finally throws ZeroDivisionError and stops.
If I then comment out 1 / 0 and run it, everything executes, but it still hangs at the end.
Now the question is, how can I achieve both:
being able to see unexpected exceptions as in case 2 above, and
actually stopping when all tasks in the queue are done.
import asyncio
import random
import time

async def worker(name, queue):
    while True:
        print('Get a "work item" out of the queue.')
        sleep_for = await queue.get()
        print('Sleep for the "sleep_for" seconds.')
        await asyncio.sleep(sleep_for)

        # Error on purpose
        1 / 0

        print('Notify the queue that the "work item" has been processed.')
        queue.task_done()
        print(f'{name} has slept for {sleep_for:.2f} seconds')

async def main():
    print('Create a queue that we will use to store our "workload".')
    queue = asyncio.Queue()

    print('Generate random timings and put them into the queue.')
    total_sleep_time = 0
    for _ in range(20):
        sleep_for = random.uniform(0.05, 1.0)
        total_sleep_time += sleep_for
        queue.put_nowait(sleep_for)

    print('Create three worker tasks to process the queue concurrently.')
    tasks = []
    for i in range(3):
        task = asyncio.create_task(worker(f'worker-{i}', queue))
        tasks.append(task)

    print('Wait until the queue is fully processed.')
    started_at = time.monotonic()
    print('Joining queue')
    await queue.join()
    total_slept_for = time.monotonic() - started_at

    print('Cancel our worker tasks.')
    for task in tasks:
        task.cancel()

    print('Wait until all worker tasks are cancelled.')
    await asyncio.gather(*tasks, return_exceptions=True)

    print('====')
    print(f'3 workers slept in parallel for {total_slept_for:.2f} seconds')
    print(f'total expected sleep time: {total_sleep_time:.2f} seconds')

asyncio.run(main())
There are several ways to approach this, but the central idea is that in asyncio, unlike in classic threading, it is straightforward to await multiple things at once.
For example, you can await queue.join() and the worker tasks, whichever completes first. Since workers don't complete normally (you cancel them later), a worker completing means that it has raised.
# convert queue.join() to a full-fledged task, so we can test
# whether it's done
queue_complete = asyncio.create_task(queue.join())
# wait for the queue to complete or one of the workers to exit
await asyncio.wait([queue_complete, *tasks], return_when=asyncio.FIRST_COMPLETED)
if not queue_complete.done():
    # If the queue hasn't completed, it means one of the workers has
    # raised - find it and propagate the exception. You can also
    # use t.exception() to get the exception object. Canceling other
    # tasks is another possibility.
    for t in tasks:
        if t.done():
            t.result()  # this will raise
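For reference, here is a minimal sketch of the question's main() rewritten around that snippet. It reuses worker() and the imports from the question unchanged and drops the prints and timing for brevity; nothing beyond asyncio.wait is assumed.

async def main():
    queue = asyncio.Queue()
    for _ in range(20):
        queue.put_nowait(random.uniform(0.05, 1.0))

    tasks = [asyncio.create_task(worker(f'worker-{i}', queue))
             for i in range(3)]

    # Run queue.join() as a task so it can be raced against the workers.
    queue_complete = asyncio.create_task(queue.join())
    await asyncio.wait([queue_complete, *tasks],
                       return_when=asyncio.FIRST_COMPLETED)

    if not queue_complete.done():
        # A worker exited early, which can only mean it raised.
        for t in tasks:
            if t.done():
                t.result()  # re-raises the worker's exception

    # Normal shutdown: the queue was fully processed; cancel the idle workers.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)

With the 1 / 0 in place, main() re-raises ZeroDivisionError and the program stops; with it commented out, queue.join() wins the race and the script exits cleanly, which covers both requirements from the question.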
A workaround (but ugly) solution: add a try/except block inside async def worker(...); it catches any exception in the code and prevents the never-ending hang.
Following the same code as the question:
import asyncio
import random
import time

async def worker(name, queue):
    while True:
        try:
            ...
            1 / 0  # Error code
            ...
        except Exception as e:
            print(e)  # Show error
        finally:
            queue.task_done()  # Make sure to clear the task

async def main():
    ...

asyncio.run(main())
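For clarity, here is the same workaround with the elided parts filled in from the question's worker(); this is just a sketch, and the messages printed on error are made up for the example.

async def worker(name, queue):
    while True:
        sleep_for = await queue.get()
        try:
            await asyncio.sleep(sleep_for)
            1 / 0  # error on purpose
            print(f'{name} has slept for {sleep_for:.2f} seconds')
        except Exception as e:
            print(f'{name} caught: {e!r}')  # show the error instead of dying
        finally:
            queue.task_done()  # always clear the item so queue.join() can finish

Keeping queue.get() outside the try block avoids calling task_done() for an item that was never fetched when the worker is cancelled while waiting on the queue.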
Related
I have a queue example from here (Python+Tornado framework): https://www.tornadoweb.org/en/stable/queues.html
As written it is a sequential queue. How can I make it parallel?
Since I don't fully understand tornado.queues yet, it's not clear to me how the code should be changed to implement a parallel queue.
from tornado import gen
from tornado.ioloop import IOLoop
from tornado.queues import Queue

q = Queue(maxsize=2)

async def consumer():
    async for item in q:
        try:
            print('Doing work on %s' % item)
            await gen.sleep(0.01)
        finally:
            q.task_done()

async def producer():
    for item in range(5):
        await q.put(item)
        print('Put %s' % item)

async def main():
    # Start consumer without waiting (since it never finishes).
    IOLoop.current().spawn_callback(consumer)
    await producer()  # Wait for producer to put all tasks.
    await q.join()    # Wait for consumer to finish all tasks.
    print('Done')

IOLoop.current().run_sync(main)
I expect all the work to start simultaneously and then to finish simultaneously instead of doing tasks one by one.
Thanks a lot!
All you need to do is spawn multiple consumer tasks:
for i in range(num_consumers):
    IOLoop.current().spawn_callback(consumer)
Then each consumer will be able to read from the queue and await things in parallel. (Note that because Tornado is single-threaded, anything that does not use await will block everything)
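Put together with the code from the question, the parallel version might look like this sketch (num_consumers is an arbitrary name introduced here):

from tornado import gen
from tornado.ioloop import IOLoop
from tornado.queues import Queue

q = Queue(maxsize=2)
num_consumers = 3  # arbitrary; pick what suits your workload

async def consumer():
    async for item in q:
        try:
            print('Doing work on %s' % item)
            await gen.sleep(0.01)
        finally:
            q.task_done()

async def producer():
    for item in range(5):
        await q.put(item)
        print('Put %s' % item)

async def main():
    # Start several consumers without waiting (they never finish).
    for i in range(num_consumers):
        IOLoop.current().spawn_callback(consumer)
    await producer()  # Wait for producer to put all tasks.
    await q.join()    # Wait for consumers to finish all tasks.
    print('Done')

IOLoop.current().run_sync(main)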
I have two tasks in a consumer/producer relationship, separated by an asyncio.Queue. If the producer task fails, I'd like the consumer task to also fail as soon as possible, and not wait indefinitely on the queue. The consumer task can be created (spawned) independently from the producer task.
In general terms, I'd like to implement a dependency between two tasks, such that the failure of one is also the failure of the other, while keeping those two tasks concurrent (i.e. one will not await the other directly).
What kind of solutions(e.g. patterns) could be used here?
Basically, I'm thinking of erlang's "links".
I think it may be possible to implement something similar using callbacks, i.e. asyncio.Task.add_done_callback
Thanks!
From the comment:
The behavior I'm trying to avoid is the consumer being oblivious to the producer's death and waiting indefinitely on the queue. I want the consumer to be notified of the producer's death and have a chance to react, or just fail, even while it's also waiting on the queue.
Other than the answer presented by Yigal, another way is to set up a third task that monitors the two and cancels one when the other one finishes. This can be generalized to any two tasks:
async def cancel_when_done(source, target):
    assert isinstance(source, asyncio.Task)
    assert isinstance(target, asyncio.Task)
    try:
        await source
    except:
        # SOURCE is a task which we expect to be awaited by someone else
        pass
    target.cancel()
Now when setting up the producer and the consumer, you can link them with the above function. For example:
import asyncio
import itertools

async def producer(q):
    for i in itertools.count():
        await q.put(i)
        await asyncio.sleep(.2)
        if i == 7:
            1/0

async def consumer(q):
    while True:
        val = await q.get()
        print('got', val)

async def main():
    loop = asyncio.get_event_loop()
    queue = asyncio.Queue()
    p = loop.create_task(producer(queue))
    c = loop.create_task(consumer(queue))
    loop.create_task(cancel_when_done(p, c))
    await asyncio.gather(p, c)

asyncio.get_event_loop().run_until_complete(main())
One way would be to propagate the exception through the queue, combined with delegation of the work handling:
import asyncio

class ValidWorkLoad:
    async def do_work(self, handler):
        await handler(self)

class HellBrokeLoose:
    def __init__(self, exception):
        self._exception = exception

    async def do_work(self, handler):
        raise self._exception

async def worker(name, queue):
    async def handler(work_load):
        print(f'{name} handled')

    while True:
        next_work = await queue.get()
        try:
            await next_work.do_work(handler)
        except Exception as e:
            print(f'{name} caught exception: {type(e)}: {e}')
            break
        finally:
            queue.task_done()

async def producer(name, queue):
    i = 0
    while True:
        try:
            # Produce some work, or fail while trying
            new_work = ValidWorkLoad()
            i += 1
            if i % 3 == 0:
                raise ValueError(i)
            await queue.put(new_work)
            print(f'{name} produced')
            await asyncio.sleep(0)  # Preempt just for the sake of the example
        except Exception as e:
            print('Exception occurred')
            await queue.put(HellBrokeLoose(e))
            break

loop = asyncio.get_event_loop()
queue = asyncio.Queue(loop=loop)
producer_coro = producer('Producer', queue)
consumer_coro = worker('Consumer', queue)
loop.run_until_complete(asyncio.gather(producer_coro, consumer_coro))
loop.close()
Which outputs:
Producer produced
Consumer handled
Producer produced
Consumer handled
Exception occurred
Consumer caught exception: <class 'ValueError'>: 3
Alternatively you could skip the delegation, and designate an item that signals the worker to stop. When catching an exception in the producer you put that designated item in the queue.
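A sketch of that sentinel variant, assuming an arbitrary DONE marker object introduced just for the example:

import asyncio

DONE = object()  # arbitrary sentinel introduced for this sketch

async def worker(name, queue):
    while True:
        item = await queue.get()
        try:
            if item is DONE:
                print(f'{name} told to stop')
                break
            print(f'{name} handled {item}')
        finally:
            queue.task_done()

async def producer(name, queue):
    try:
        for i in range(10):
            if i == 3:
                raise ValueError(i)  # fail while producing
            await queue.put(i)
            await asyncio.sleep(0)
    except Exception:
        print('Exception occurred')
        await queue.put(DONE)  # tell the worker to stop

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(producer('Producer', queue),
                         worker('Consumer', queue))

asyncio.run(main())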
Another possible solution:
import asyncio
import functools
from typing import Union

def link_tasks(t1: Union[asyncio.Task, asyncio.Future], t2: Union[asyncio.Task, asyncio.Future]):
    """
    Link the fate of two asyncio tasks,
    such that the failure or cancellation of one
    triggers the cancellation of the other
    """
    def done_callback(other: asyncio.Task, t: asyncio.Task):
        # TODO: log cancellation due to link propagation
        if t.cancelled():
            other.cancel()
        elif t.exception():
            other.cancel()

    t1.add_done_callback(functools.partial(done_callback, t2))
    t2.add_done_callback(functools.partial(done_callback, t1))
This uses asyncio.Task.add_done_callback to register callbacks that will cancel the other task if either one fails or is cancelled.
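A usage sketch, assuming the link_tasks function above; the producer and consumer bodies are placeholders invented for the example:

import asyncio

async def producer(q):
    for i in range(5):
        await q.put(i)
        await asyncio.sleep(0.1)
    raise RuntimeError('producer failed')

async def consumer(q):
    while True:
        print('got', await q.get())

async def main():
    q = asyncio.Queue()
    p = asyncio.create_task(producer(q))
    c = asyncio.create_task(consumer(q))
    link_tasks(p, c)
    # When the producer dies, the linked consumer is cancelled
    # instead of waiting on the queue forever.
    await asyncio.gather(p, c, return_exceptions=True)

asyncio.run(main())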
I'm confused about how to use asyncio.Queue for a particular producer-consumer pattern in which both the producer and consumer operate concurrently and independently.
First, consider this example, which closely follows that from the docs for asyncio.Queue:
import asyncio
import random
import time

async def worker(name, queue):
    while True:
        sleep_for = await queue.get()
        await asyncio.sleep(sleep_for)
        queue.task_done()
        print(f'{name} has slept for {sleep_for:0.2f} seconds')

async def main(n):
    queue = asyncio.Queue()
    total_sleep_time = 0
    for _ in range(20):
        sleep_for = random.uniform(0.05, 1.0)
        total_sleep_time += sleep_for
        queue.put_nowait(sleep_for)

    tasks = []
    for i in range(n):
        task = asyncio.create_task(worker(f'worker-{i}', queue))
        tasks.append(task)

    started_at = time.monotonic()
    await queue.join()
    total_slept_for = time.monotonic() - started_at

    for task in tasks:
        task.cancel()
    # Wait until all worker tasks are cancelled.
    await asyncio.gather(*tasks, return_exceptions=True)

    print('====')
    print(f'3 workers slept in parallel for {total_slept_for:.2f} seconds')
    print(f'total expected sleep time: {total_sleep_time:.2f} seconds')

if __name__ == '__main__':
    import sys
    n = 3 if len(sys.argv) == 1 else int(sys.argv[1])
    asyncio.run(main(n))
There is one finer detail about this script: the items are put into the queue synchronously, with queue.put_nowait(sleep_for) over a conventional for-loop.
My goal is to create a script that uses async def worker() (or consumer()) and async def producer(). Both should be scheduled to run concurrently. No one consumer coroutine is explicitly tied to or chained from a producer.
How can I modify the program above so that the producer(s) is its own coroutine that can be scheduled concurrently with the consumers/workers?
There is a second example from PYMOTW. It requires the producer to know the number of consumers ahead of time, and uses None as a signal to the consumer that production is done.
How can I modify the program above so that the producer(s) is its own coroutine that can be scheduled concurrently with the consumers/workers?
The example can be generalized without changing its essential logic:
Move the insertion loop to a separate producer coroutine.
Start the consumers in the background, letting them process the items as they are produced.
With the consumers running, start the producers and wait for them to finish producing items, as with await producer() or await gather(*producers), etc.
Once all producers are done, wait for consumers to process the remaining items with await queue.join().
Cancel the consumers, all of which are now idly waiting for the queue to deliver the next item, which will never arrive as we know the producers are done.
Here is an example implementing the above:
import asyncio, random

async def rnd_sleep(t):
    # sleep for T seconds on average
    await asyncio.sleep(t * random.random() * 2)

async def producer(queue):
    while True:
        # produce a token and send it to a consumer
        token = random.random()
        print(f'produced {token}')
        if token < .05:
            break
        await queue.put(token)
        await rnd_sleep(.1)

async def consumer(queue):
    while True:
        token = await queue.get()
        # process the token received from a producer
        await rnd_sleep(.3)
        queue.task_done()
        print(f'consumed {token}')

async def main():
    queue = asyncio.Queue()

    # fire up both the producers and the consumers
    producers = [asyncio.create_task(producer(queue))
                 for _ in range(3)]
    consumers = [asyncio.create_task(consumer(queue))
                 for _ in range(10)]

    # with both producers and consumers running, wait for
    # the producers to finish
    await asyncio.gather(*producers)
    print('---- done producing')

    # wait for the remaining tasks to be processed
    await queue.join()

    # cancel the consumers, which are now idle
    for c in consumers:
        c.cancel()

asyncio.run(main())
Note that in real-life producers and consumers, especially those that involve network access, you probably want to catch IO-related exceptions that occur during processing. If the exception is recoverable, as most network-related exceptions are, you can simply catch it and log the error. You should still invoke task_done() because otherwise queue.join() will hang due to an unprocessed item. If it makes sense to retry processing the item, you can put it back into the queue prior to calling task_done(). For example:
# like the above, but handling exceptions during processing:

async def consumer(queue):
    while True:
        token = await queue.get()
        try:
            # this uses aiohttp or whatever
            await process(token)
        except aiohttp.ClientError as e:
            print(f"Error processing token {token}: {e}")
            # If it makes sense, return the token to the queue to be
            # processed again. (You can use a counter to avoid
            # processing a faulty token infinitely.)
            #await queue.put(token)
        queue.task_done()
        print(f'consumed {token}')
I am working on a project that needs to do some tasks asynchronously, but I am limited to using the fewest extra subprocesses possible, so I decided to have two processes: dispatcher_p and worker_p. Since I am making use of the asyncio library, I have two async tasks: async_worker and async_task. The program works as follows:
worker_p starts the event loop with a single task: async_worker
dispatcher_p waits on a queue for incoming data
dispatcher_p adds data to an async_queue via put_nowait()
async_worker, which is awaiting the async_queue, gets the data, starts a task using async_task, and calls task_done()
For simplicity, async_task just sleeps for 3 seconds and exits.
For simplicity, let's strip out the subtask that async_worker should start, leaving us with the following code:
import multiprocessing as mp
import asyncio

async def task_worker(aq):
    while True:
        task = await aq.get()
        if task is not None:
            await asyncio.sleep(3)
            aq.task_done()
        else:
            break

def dispatcher_p(ev_l, q, async_q):
    asyncio.set_event_loop(ev_l)
    while True:
        task = q.get()
        if task is not None:
            async_q.put_nowait(task)
        else:
            async_q.put(None)
            break

def worker_p(ev_l, aq):
    asyncio.set_event_loop(ev_l)
    ev_l.run_until_complete(asyncio.gather(task_worker(aq)))
    ev_l.close()

q = mp.Queue()

def put_task(data):
    q.put(data)

def init():
    event_loop = asyncio.get_event_loop()
    aq = asyncio.Queue()

    p1 = mp.Process(target=worker_p, args=(event_loop, aq,))
    p2 = mp.Process(target=dispatcher_p, args=(event_loop, q, aq))
    p1.start()
    p2.start()

    # Test
    put_task("hi")
    put_task("bye")
    put_task("test")
    put_task(None)

if __name__ == '__main__':
    init()
The problem is that even though task_worker is running in the event_loop, it freezes at task = await aq.get(). Why does this happen? I still don't understand how asyncio works across several processes.
For the record
The design of this script is flawed!
Things to consider:
The script creates 2 child processes and passes them a reference to the mp Queue and the async Queue.
Even though they report the same object id via Python's id(), the two child processes have separate memory spaces, so their copies of the queues cannot communicate with one another.
The reason await aq.get() gets stuck is that the elements are added to one process's copy of the async queue, while the copy living in the other process never sees them and is never notified.
The solution is to use a multiprocessing queue for inter-process communication and an async queue within the same process; a sketch of that arrangement follows the annotated code below. Using an async queue across processes does not work the way you'd expect, and it's WRONG!
You can see this more clearly by printing the state of the async queue to console:
import multiprocessing as mp
import asyncio

async def task_worker(aq: asyncio.Queue):
    print('ASYNC: Started')
    while True:
        print(f'ASYNC: Get info from async queue. {aq.qsize()}')
        task = await aq.get()
        if task is not None:
            print(f'ASYNC: Work on task asynchronously: {task}')
            await asyncio.sleep(3)
            aq.task_done()
        else:
            print('ASYNC: Sentinel, end async worker')
            break
    print('ASYNC: Task done?')

def dispatcher_p(ev_l, q, async_q: asyncio.Queue):
    print('DISPATCHER: Started')
    asyncio.set_event_loop(ev_l)
    while True:
        print('DISPATCHER: Get task from mp queue')
        task = q.get()
        if task is not None:
            print(f'DISPATCHER: Work on task: {task}')
            async_q.put_nowait(task)
            print(f'DISPATCHER: Async queue size: {async_q.qsize()}')
        else:
            print('DISPATCHER: Sentinel, send None')
            async_q.put_nowait(None)
            break
    print('DISPATCHER: Dispatcher ended')

def worker_p(ev_l, aq):
    print('WORKER: Started')
    asyncio.set_event_loop(ev_l)
    print('WORKER: Wait until loop ends')
    ev_l.run_until_complete(asyncio.gather(task_worker(aq)))
    print('WORKER: Loop ended')
    ev_l.close()
    print('WORKER: Loop closed')
    print('WORKER: Worker ended')

q = mp.Queue()

def put_task(data):
    q.put(data)

def init():
    event_loop = asyncio.get_event_loop()
    aq = asyncio.Queue()

    p1 = mp.Process(target=worker_p, args=(event_loop, aq,))
    p2 = mp.Process(target=dispatcher_p, args=(event_loop, q, aq))
    p1.start()
    p2.start()

    # Test
    put_task("hi")
    put_task("bye")
    put_task("test")
    put_task(None)

if __name__ == '__main__':
    init()
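And here is the promised sketch of the fix. It is one possible arrangement, chosen for this example: the dispatcher becomes a coroutine inside the single worker process, the mp.Queue crosses the process boundary, the asyncio.Queue never leaves the process, and run_in_executor keeps the blocking mp.Queue.get() from stalling the event loop.

import multiprocessing as mp
import asyncio

async def task_worker(aq: asyncio.Queue):
    while True:
        task = await aq.get()
        if task is None:
            break
        print(f'working on {task}')
        await asyncio.sleep(3)
        aq.task_done()

async def dispatcher(q: mp.Queue, aq: asyncio.Queue):
    loop = asyncio.get_running_loop()
    while True:
        # q.get() blocks, so run it in a thread to keep the loop free.
        task = await loop.run_in_executor(None, q.get)
        aq.put_nowait(task)
        if task is None:
            break

async def worker_main(q: mp.Queue):
    aq = asyncio.Queue()  # created and used inside this process only
    await asyncio.gather(task_worker(aq), dispatcher(q, aq))

def worker_p(q: mp.Queue):
    asyncio.run(worker_main(q))

def init():
    q = mp.Queue()
    p = mp.Process(target=worker_p, args=(q,))
    p.start()
    for data in ("hi", "bye", "test", None):
        q.put(data)
    p.join()

if __name__ == '__main__':
    init()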
where
This is on Linux, Python 3.5.1.
what
I'm developing a monitor process with asyncio, whose tasks at various places await on asyncio.sleep calls of various durations.
There are points in time when I would like to be able to interrupt all said asyncio.sleep calls and let all tasks proceed normally, but I can't find how to do that. An example is for graceful shutdown of the monitor process.
how (failed assumption)
I thought that I could send an ALRM signal to that effect, but the process dies. I tried catching the ALRM signal with:
def sigalrm_sent(signum, frame):
    tse.logger.info("got SIGALRM")

signal.signal(signal.SIGALRM, sigalrm_sent)
Then I get the log line about catching SIGALRM, but the asyncio.sleep calls are not interrupted.
how (kludge)
At this point, I replaced all asyncio.sleep calls with calls to this coroutine:
async def interruptible_sleep(seconds):
    while seconds > 0 and not tse.stop_requested:
        duration = min(seconds, tse.TIME_QUANTUM)
        await asyncio.sleep(duration)
        seconds -= duration
So I only have to pick a TIME_QUANTUM that is not too small and not too large either.
but
Is there a way to interrupt all running asyncio.sleep calls and I am missing it?
Interrupting all running calls of asyncio.sleep seems a bit dangerous since it can be used in other parts of the code, for other purposes. Instead I would make a dedicated sleep coroutine that keeps track of its running calls. It is then possible to interrupt them all by canceling the corresponding tasks:
def make_sleep():
    async def sleep(delay, result=None, *, loop=None):
        coro = asyncio.sleep(delay, result=result, loop=loop)
        task = asyncio.ensure_future(coro)
        sleep.tasks.add(task)
        try:
            return await task
        except asyncio.CancelledError:
            return result
        finally:
            sleep.tasks.remove(task)

    sleep.tasks = set()
    sleep.cancel_all = lambda: sum(task.cancel() for task in sleep.tasks)
    return sleep
Example:
async def main(sleep, loop):
    for i in range(10):
        loop.create_task(sleep(i))
    await sleep(3)
    nb_cancelled = sleep.cancel_all()
    await asyncio.wait(sleep.tasks)
    return nb_cancelled

sleep = make_sleep()
loop = asyncio.get_event_loop()
result = loop.run_until_complete(main(sleep, loop))
print(result)  # Print '6'
For debugging purposes, loop.time = lambda: float('inf') also works.
Based on Vincent's answer, I used the following class (every instance of the class can cancel all its running .sleep tasks, allowing better compartmentalization):
import asyncio as aio

class Sleeper:
    "Group sleep calls allowing instant cancellation of all"

    def __init__(self, loop):
        self.loop = loop
        self.tasks = set()

    async def sleep(self, delay, result=None):
        coro = aio.sleep(delay, result=result, loop=self.loop)
        task = aio.ensure_future(coro)
        self.tasks.add(task)
        try:
            return await task
        except aio.CancelledError:
            return result
        finally:
            self.tasks.remove(task)

    def cancel_all_helper(self):
        "Cancel all pending sleep tasks"
        cancelled = set()
        for task in self.tasks:
            if task.cancel():
                cancelled.add(task)
        return cancelled

    async def cancel_all(self):
        "Coroutine cancelling tasks"
        cancelled = self.cancel_all_helper()
        await aio.wait(self.tasks)
        self.tasks -= cancelled
        return len(cancelled)
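A small usage sketch of the class above; monitor_task is a made-up coroutine for the example, and the explicit-loop style matches the class (it relies on the loop argument, so it targets the same, older asyncio versions).

import asyncio as aio

async def monitor_task(sleeper, name):
    # hypothetical task that sleeps via the shared Sleeper instance
    await sleeper.sleep(3600)
    print(f'{name} finished sleeping (early or on time)')

async def main():
    loop = aio.get_event_loop()
    sleeper = Sleeper(loop)
    tasks = [loop.create_task(monitor_task(sleeper, f'task-{i}'))
             for i in range(3)]
    await aio.sleep(1)  # let the tasks start sleeping
    cancelled = await sleeper.cancel_all()
    print(f'interrupted {cancelled} sleeps')
    await aio.gather(*tasks)

aio.get_event_loop().run_until_complete(main())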