I have a queue example from here (Python+Tornado framework): https://www.tornadoweb.org/en/stable/queues.html
Right now the queue is processed sequentially. How can I make it parallel?
Since I don't fully understand tornado.queues yet, it's not clear to me how the code should be changed to process the queue in parallel.
from tornado import gen
from tornado.ioloop import IOLoop
from tornado.queues import Queue

q = Queue(maxsize=2)

async def consumer():
    async for item in q:
        try:
            print('Doing work on %s' % item)
            await gen.sleep(0.01)
        finally:
            q.task_done()

async def producer():
    for item in range(5):
        await q.put(item)
        print('Put %s' % item)

async def main():
    # Start consumer without waiting (since it never finishes).
    IOLoop.current().spawn_callback(consumer)
    await producer()     # Wait for producer to put all tasks.
    await q.join()       # Wait for consumer to finish all tasks.
    print('Done')

IOLoop.current().run_sync(main)
I expect all the work to start simultaneously and then finish simultaneously, instead of the tasks being processed one by one.
Thanks a lot!
All you need to do is spawn multiple consumer tasks:
for i in range(num_consumers):
    IOLoop.current().spawn_callback(consumer)
Then each consumer will be able to read from the queue and await things in parallel. (Note that because Tornado is single-threaded, anything that does not use await will block everything)
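For completeness, here is a minimal sketch of the full program with that change applied (num_consumers is a value I'm assuming here; pick whatever degree of parallelism fits your workload):

from tornado import gen
from tornado.ioloop import IOLoop
from tornado.queues import Queue

q = Queue(maxsize=2)
num_consumers = 3  # assumed value for illustration

async def consumer():
    async for item in q:
        try:
            print('Doing work on %s' % item)
            await gen.sleep(0.01)
        finally:
            q.task_done()

async def producer():
    for item in range(5):
        await q.put(item)
        print('Put %s' % item)

async def main():
    # Start several consumers without waiting (they never finish).
    for _ in range(num_consumers):
        IOLoop.current().spawn_callback(consumer)
    await producer()     # Wait for producer to put all tasks.
    await q.join()       # Wait for the consumers to finish all tasks.
    print('Done')

IOLoop.current().run_sync(main)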
Related
I have a queue which is stored in Redis lists. I'm trying to create an async consumer for this queue, but I couldn't call an async function inside a loop; it works like a sync function when I call it.
import asyncio

async def worker():
    print("starting sleep")
    await asyncio.sleep(2)
    print("slept")

async def main():
    while True:
        await worker()

asyncio.run(main())
Here is a short and simple example of my implementation. I'm expecting to see 'starting sleep' messages until the first 'slept' message appears, i.e. for about 2 seconds.
main is literally awaiting the completion of worker; until worker is done, main won't progress. Async tasks don't run in the background the way threads do in multithreading.
What you want is to keep launching new workers without awaiting each one of them. However, if you just keep doing that in a loop like this:
while True:
    worker()
then you will never see any output of those workers, since this is an endless loop which never gives anything else the chance to run. You'd need to "break" this loop in some way to allow workers to progress. Here's an example of that:
import asyncio

async def worker():
    print("starting sleep")
    await asyncio.sleep(2)
    print("slept")

async def main():
    while True:
        asyncio.ensure_future(worker())
        await asyncio.sleep(0.5)

asyncio.run(main())
This will produce the expected outcome:
starting sleep
starting sleep
starting sleep
starting sleep
slept
starting sleep
slept
...
The await inside main transfers control back to the event loop, which now has the chance to run the piled-up worker tasks. When those worker tasks await, they in turn transfer control back to the event loop, which hands it back to either main or a worker as its awaited sleep completes.
Note that this is only for illustration purposes; if and when you interrupt this program, you'll see notices about unawaited tasks which haven't completed. You should keep track of your tasks and await them all to completion at the end somewhere.
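For illustration, here is one way to do that, a sketch that assumes you only want to launch a fixed number of workers (ten here) rather than looping forever:

import asyncio

async def worker():
    print("starting sleep")
    await asyncio.sleep(2)
    print("slept")

async def main():
    tasks = []
    for _ in range(10):  # assumption: a fixed number of launches instead of an endless loop
        tasks.append(asyncio.ensure_future(worker()))
        await asyncio.sleep(0.5)
    # wait for every launched worker to finish before exiting
    await asyncio.gather(*tasks)

asyncio.run(main())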
Here is an example using asyncio.wait:
import asyncio

async def worker():
    print("starting sleep")
    await asyncio.sleep(2)
    print("slept")

async def main():
    tasks = [asyncio.create_task(worker()) for _ in range(10)]
    await asyncio.wait(tasks)

asyncio.run(main())
It spawns all the workers together.
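(Note that passing bare coroutine objects to asyncio.wait() has been deprecated since Python 3.8 and is rejected in recent versions, which is why the example wraps each worker() call in asyncio.create_task().)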
If I run this code, it hangs without throwing ZeroDivisionError.
If I move await asyncio.gather(*tasks, return_exceptions=True) above await queue.join(), it finally throws ZeroDivisionError and stops.
If I then comment out 1 / 0 and run it, it executes everything but hangs at the end.
Now the question is, how can I achieve both of the following:
Being able to see unexpected exceptions, as in case 2 above, and
Actually stopping when all tasks in the queue are done.
import asyncio
import random
import time

async def worker(name, queue):
    while True:
        print('Get a "work item" out of the queue.')
        sleep_for = await queue.get()
        print('Sleep for the "sleep_for" seconds.')
        await asyncio.sleep(sleep_for)
        # Error on purpose
        1 / 0
        print('Notify the queue that the "work item" has been processed.')
        queue.task_done()
        print(f'{name} has slept for {sleep_for:.2f} seconds')

async def main():
    print('Create a queue that we will use to store our "workload".')
    queue = asyncio.Queue()

    print('Generate random timings and put them into the queue.')
    total_sleep_time = 0
    for _ in range(20):
        sleep_for = random.uniform(0.05, 1.0)
        total_sleep_time += sleep_for
        queue.put_nowait(sleep_for)

    print('Create three worker tasks to process the queue concurrently.')
    tasks = []
    for i in range(3):
        task = asyncio.create_task(worker(f'worker-{i}', queue))
        tasks.append(task)

    print('Wait until the queue is fully processed.')
    started_at = time.monotonic()
    print('Joining queue')
    await queue.join()
    total_slept_for = time.monotonic() - started_at

    print('Cancel our worker tasks.')
    for task in tasks:
        task.cancel()
    print('Wait until all worker tasks are cancelled.')
    await asyncio.gather(*tasks, return_exceptions=True)

    print('====')
    print(f'3 workers slept in parallel for {total_slept_for:.2f} seconds')
    print(f'total expected sleep time: {total_sleep_time:.2f} seconds')

asyncio.run(main())
There are several ways to approach this, but the central idea is that in asyncio, unlike in classic threading, it is straightforward to await multiple things at once.
For example, you can await queue.join() and the worker tasks, whichever completes first. Since workers don't complete normally (you cancel them later), a worker completing means that it has raised.
# convert queue.join() to a full-fledged task, so we can test
# whether it's done
queue_complete = asyncio.create_task(queue.join())

# wait for the queue to complete or one of the workers to exit
await asyncio.wait([queue_complete, *tasks], return_when=asyncio.FIRST_COMPLETED)

if not queue_complete.done():
    # If the queue hasn't completed, it means one of the workers has
    # raised - find it and propagate the exception. You can also
    # use t.exception() to get the exception object. Canceling other
    # tasks is another possibility.
    for t in tasks:
        if t.done():
            t.result()  # this will raise
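If you go with the cancellation the comment mentions, a sketch of that variant (continuing from the wait() call above, and re-raising the worker's exception afterwards) could look like this:

if not queue_complete.done():
    # A worker has raised: cancel everything that is still running,
    # let the cancellations settle, then surface the original exception.
    queue_complete.cancel()
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    for t in tasks:
        if not t.cancelled() and t.exception() is not None:
            raise t.exception()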
A workaround (but ugly) solution: add a try/except block inside async def worker(...); this will catch any exception in the code and prevent a never-ending loop.
Following the same code as in the question:
import asyncio
import random
import time

async def worker(name, queue):
    while True:
        try:
            ...
            1 / 0  # Error code
            ...
        except Exception as e:
            print(e)  # Show error
        finally:
            queue.task_done()  # Make sure to clear the task

async def main():
    ...

asyncio.run(main())
I'm confused about how to use asyncio.Queue for a particular producer-consumer pattern in which both the producer and consumer operate concurrently and independently.
First, consider this example, which closely follows that from the docs for asyncio.Queue:
import asyncio
import random
import time

async def worker(name, queue):
    while True:
        sleep_for = await queue.get()
        await asyncio.sleep(sleep_for)
        queue.task_done()
        print(f'{name} has slept for {sleep_for:0.2f} seconds')

async def main(n):
    queue = asyncio.Queue()

    total_sleep_time = 0
    for _ in range(20):
        sleep_for = random.uniform(0.05, 1.0)
        total_sleep_time += sleep_for
        queue.put_nowait(sleep_for)

    tasks = []
    for i in range(n):
        task = asyncio.create_task(worker(f'worker-{i}', queue))
        tasks.append(task)

    started_at = time.monotonic()
    await queue.join()
    total_slept_for = time.monotonic() - started_at

    for task in tasks:
        task.cancel()
    # Wait until all worker tasks are cancelled.
    await asyncio.gather(*tasks, return_exceptions=True)

    print('====')
    print(f'{n} workers slept in parallel for {total_slept_for:.2f} seconds')
    print(f'total expected sleep time: {total_sleep_time:.2f} seconds')

if __name__ == '__main__':
    import sys
    n = 3 if len(sys.argv) == 1 else int(sys.argv[1])
    asyncio.run(main(n))
There is one finer detail about this script: the items are put into the queue synchronously, with queue.put_nowait(sleep_for) in a conventional for-loop.
My goal is to create a script that uses async def worker() (or consumer()) and async def producer(). Both should be scheduled to run concurrently. No one consumer coroutine is explicitly tied to or chained from a producer.
How can I modify the program above so that the producer(s) is its own coroutine that can be scheduled concurrently with the consumers/workers?
There is a second example from PYMOTW. It requires the producer to know the number of consumers ahead of time, and uses None as a signal to the consumer that production is done.
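That sentinel pattern, roughly (this is my own sketch of the idea, not PYMOTW's actual code; the item values and sleep are stand-ins), looks like this:

import asyncio

async def consumer(queue):
    while True:
        item = await queue.get()
        if item is None:           # sentinel: production is finished
            break
        await asyncio.sleep(item)  # stand-in for real work

async def producer(queue, num_consumers):
    for i in range(5):
        await queue.put(0.1 * i)
    # one sentinel per consumer, so every consumer wakes up and exits
    for _ in range(num_consumers):
        await queue.put(None)

async def main():
    queue = asyncio.Queue()
    num_consumers = 3
    consumers = [asyncio.create_task(consumer(queue))
                 for _ in range(num_consumers)]
    await producer(queue, num_consumers)
    await asyncio.gather(*consumers)

asyncio.run(main())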
How can I modify the program above so that the producer(s) is its own coroutine that can be scheduled concurrently with the consumers/workers?
The example can be generalized without changing its essential logic:
Move the insertion loop to a separate producer coroutine.
Start the consumers in the background, letting them process the items as they are produced.
With the consumers running, start the producers and wait for them to finish producing items, as with await producer() or await gather(*producers), etc.
Once all producers are done, wait for consumers to process the remaining items with await queue.join().
Cancel the consumers, all of which are now idly waiting for the queue to deliver the next item, which will never arrive as we know the producers are done.
Here is an example implementing the above:
import asyncio, random

async def rnd_sleep(t):
    # sleep for T seconds on average
    await asyncio.sleep(t * random.random() * 2)

async def producer(queue):
    while True:
        # produce a token and send it to a consumer
        token = random.random()
        print(f'produced {token}')
        if token < .05:
            break
        await queue.put(token)
        await rnd_sleep(.1)

async def consumer(queue):
    while True:
        token = await queue.get()
        # process the token received from a producer
        await rnd_sleep(.3)
        queue.task_done()
        print(f'consumed {token}')

async def main():
    queue = asyncio.Queue()

    # fire up both producers and consumers
    producers = [asyncio.create_task(producer(queue))
                 for _ in range(3)]
    consumers = [asyncio.create_task(consumer(queue))
                 for _ in range(10)]

    # with both producers and consumers running, wait for
    # the producers to finish
    await asyncio.gather(*producers)
    print('---- done producing')

    # wait for the remaining tasks to be processed
    await queue.join()

    # cancel the consumers, which are now idle
    for c in consumers:
        c.cancel()

asyncio.run(main())
Note that in real-life producers and consumers, especially those that involve network access, you probably want to catch IO-related exceptions that occur during processing. If the exception is recoverable, as most network-related exceptions are, you can simply catch the exception and log the error. You should still invoke task_done() because otherwise queue.join() will hang due to an unprocessed item. If it makes sense to re-try processing the item, you can return it into the queue prior to calling task_done(). For example:
# like the above, but handling exceptions during processing:
async def consumer(queue):
    while True:
        token = await queue.get()
        try:
            # this uses aiohttp or whatever
            await process(token)
        except aiohttp.ClientError as e:
            print(f"Error processing token {token}: {e}")
            # If it makes sense, return the token to the queue to be
            # processed again. (You can use a counter to avoid
            # processing a faulty token infinitely.)
            #await queue.put(token)
        queue.task_done()
        print(f'consumed {token}')
Is it possible to take a blocking function such as work and have it run concurrently in a ProcessPoolExecutor that has more than one worker?
import asyncio
from time import sleep, time
from concurrent.futures import ProcessPoolExecutor

num_jobs = 4
queue = asyncio.Queue()
executor = ProcessPoolExecutor(max_workers=num_jobs)
loop = asyncio.get_event_loop()

def work():
    sleep(1)

async def producer():
    for i in range(num_jobs):
        results = await loop.run_in_executor(executor, work)
        await queue.put(results)

async def consumer():
    completed = 0
    while completed < num_jobs:
        job = await queue.get()
        completed += 1

s = time()
loop.run_until_complete(asyncio.gather(producer(), consumer()))
print("duration", time() - s)
Running the above on a machine with more than 4 cores takes ~4 seconds. How would you write producer such that the above example takes only ~1 second?
await loop.run_in_executor(executor, work) blocks the loop until work completes; as a result, you only have one function running at a time.
To run jobs concurrently, you could use asyncio.as_completed:
async def producer():
    tasks = [loop.run_in_executor(executor, work) for _ in range(num_jobs)]
    for f in asyncio.as_completed(tasks):
        results = await f
        await queue.put(results)
The problem is in the producer. Instead of allowing the jobs to run in the background, it waits for each job to finish, thus serializing them. If you rewrite producer to look like this (and leave consumer unchanged), you get the expected 1s duration:
async def producer():
    for i in range(num_jobs):
        fut = loop.run_in_executor(executor, work)
        fut.add_done_callback(lambda f: queue.put_nowait(f.result()))
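A note on why the callback version works: the future returned by run_in_executor completes via the event loop, so its done-callbacks also run on the loop thread, where calling queue.put_nowait() is safe; and because the queue was created without a maxsize, put_nowait() can never raise QueueFull.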
I am working on a project that needs to do some tasks asynchronously, but I am limited to using as few extra subprocesses as possible. For that reason I decided to have two processes: dispatcher_p and worker_p. Since I am making use of the async library, I have two async tasks: async_worker and async_task. The program works as follows:
worker_p starts the event loop with a single task: async_worker
dispatcher_p waits on a queue for incoming data
dispatcher_p adds data to an async_queue via put_nowait()
async_worker, which is awaiting the async_queue, gets the data, starts a task using async_task, and calls task_done()
For simplicity async_task just sleeps for 3 seconds and exits.
For simplicity, let's strip out the subtask that async_worker should start, leaving us with the following code:
import multiprocessing as mp
import asyncio

async def task_worker(aq):
    while True:
        task = await aq.get()
        if task is not None:
            await asyncio.sleep(3)
            aq.task_done()
        else:
            break

def dispatcher_p(ev_l, q, async_q):
    asyncio.set_event_loop(ev_l)
    while True:
        task = q.get()
        if task is not None:
            async_q.put_nowait(task)
        else:
            async_q.put(None)
            break

def worker_p(ev_l, aq):
    asyncio.set_event_loop(ev_l)
    ev_l.run_until_complete(asyncio.gather(task_worker(aq)))
    ev_l.close()

q = mp.Queue()

def put_task(data):
    q.put(data)

def init():
    event_loop = asyncio.get_event_loop()
    aq = asyncio.Queue()
    p1 = mp.Process(target=worker_p, args=(event_loop, aq,))
    p2 = mp.Process(target=dispatcher_p, args=(event_loop, q, aq))
    p1.start()
    p2.start()
    # Test
    put_task("hi")
    put_task("bye")
    put_task("test")
    put_task(None)

if __name__ == '__main__':
    init()
The problem is that even though task_worker is running in the event_loop, it freezes at task = await aq.get(). Why does this happen? I still don't understand how asyncio works across several processes.
For the record
The design of this script is flawed!
Things to consider:
The script creates 2 child processes and passes them a reference to the mp Queue and the async Queue.
Even though they report the same object id via Python's id(), the queues live in different memory contexts in each process, so they cannot communicate with each other.
The reason await aq.get() gets stuck is that the dispatcher process adds elements to its own copy of the async queue, so the copy in the worker process never sees them and is never notified.
The solution is to use a multiprocessing queue for inter-process communication and an async queue within the same process. Using an async queue across processes does not work the way you'd expect and it's WRONG!
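A sketch of that corrected layout, with the mp.Queue crossing the process boundary and the asyncio.Queue living entirely inside the worker process (the function and variable names here are mine, not the original script's):

import asyncio
import multiprocessing as mp

async def task_worker(aq):
    # Consumes items from the *local* asyncio.Queue.
    while True:
        task = await aq.get()
        if task is None:
            break
        await asyncio.sleep(3)   # stand-in for the real async task

async def dispatcher(q, aq):
    # Bridges the inter-process mp.Queue into the local asyncio.Queue.
    # q.get() blocks, so run it in a thread to keep the loop responsive.
    loop = asyncio.get_running_loop()
    while True:
        task = await loop.run_in_executor(None, q.get)
        await aq.put(task)
        if task is None:
            break

def worker_p(q):
    async def main():
        aq = asyncio.Queue()           # lives entirely in this process
        await asyncio.gather(task_worker(aq), dispatcher(q, aq))
    asyncio.run(main())

def init():
    q = mp.Queue()                     # crosses the process boundary
    p = mp.Process(target=worker_p, args=(q,))
    p.start()
    for data in ("hi", "bye", "test", None):
        q.put(data)
    p.join()

if __name__ == '__main__':
    init()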
You can see the original problem more clearly by printing the state of the async queue to the console:
import multiprocessing as mp
import asyncio

async def task_worker(aq: asyncio.Queue):
    print('ASYNC: Started')
    while True:
        print(f'ASYNC: Get info from async queue. {aq.qsize()}')
        task = await aq.get()
        if task is not None:
            print(f'ASYNC: Work on task asynchronously: {task}')
            await asyncio.sleep(3)
            aq.task_done()
        else:
            print('ASYNC: Sentinel, end async worker')
            break
    print('ASYNC: Task done?')

def dispatcher_p(ev_l, q, async_q: asyncio.Queue):
    print('DISPATCHER: Started')
    asyncio.set_event_loop(ev_l)
    while True:
        print('DISPATCHER: Get task from mp queue')
        task = q.get()
        if task is not None:
            print(f'DISPATCHER: Work on task: {task}')
            async_q.put_nowait(task)
            print(f'DISPATCHER: Async queue size: {async_q.qsize()}')
        else:
            print('DISPATCHER: Sentinel, send None')
            async_q.put_nowait(None)
            break
    print('DISPATCHER: Dispatcher ended')

def worker_p(ev_l, aq):
    print('WORKER: Started')
    asyncio.set_event_loop(ev_l)
    print('WORKER: Wait until loop ends')
    ev_l.run_until_complete(asyncio.gather(task_worker(aq)))
    print('WORKER: Loop ended')
    ev_l.close()
    print('WORKER: Loop closed')
    print('WORKER: Worker ended')

q = mp.Queue()

def put_task(data):
    q.put(data)

def init():
    event_loop = asyncio.get_event_loop()
    aq = asyncio.Queue()
    p1 = mp.Process(target=worker_p, args=(event_loop, aq,))
    p2 = mp.Process(target=dispatcher_p, args=(event_loop, q, aq))
    p1.start()
    p2.start()
    # Test
    put_task("hi")
    put_task("bye")
    put_task("test")
    put_task(None)

if __name__ == '__main__':
    init()