Asynchronously wait for multiprocessing Queue in main process - python

I have the following scenario: multiple worker processes send events about their current status to an event dispatcher. This event dispatcher then needs to process all the events if we are in the main process or signal the event dispatcher of the main process to handle these events if we are in a worker process.
The main crux here is that event handling must also be in the main thread of the main process, so I can't just run a while True loop inside a thread and wait for messages from worker processes there.
So what I have is this:
import asyncio
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import current_process, Process, Queue
from threading import current_thread
from time import sleep

def get_q(q):
    print("Waiting for the queue ({} / {})\n".format(current_thread().name, current_process().name))
    return q.get()

async def message_q(q):
    while True:
        f = loop.run_in_executor(None, get_q, q)
        await f
        if f.result() is None:
            print("Done")
            return
        print("Got the result ({} / {})".format(current_thread().name, current_process().name))
        print("Result is: {}\n".format(f.result()))

async def something_else():
    while True:
        print("Something else\n")
        await asyncio.sleep(2)

def other_process(q):
    for i in range(5):
        print("Putting something in the queue ({})".format(current_process().name))
        q.put(i)
        sleep(1)
    q.put(None)

q = Queue()
Process(target=other_process, args=(q,), daemon=True).start()

loop = asyncio.get_event_loop()
loop.set_default_executor(ThreadPoolExecutor(max_workers=1))
asyncio.ensure_future(message_q(q))
asyncio.ensure_future(something_else())
loop.run_until_complete(asyncio.sleep(6))
other_process() is an example worker process that uses a Queue to signal the main process, which runs an event loop to process things and also waits for any data on the Queue. In the real case, this process would signal its event dispatcher, which would then handle the queue messaging and pass the message on to the main process's event dispatcher, but I simplified it a bit here.
However, I'm not quite satisfied with this. Submitting get_q() again and again to the ThreadPoolExecutor produces overhead and isn't as clean as a single long-running thread. Also, the await f isn't optimal: it blocks as soon as no further data is in the queue, which prevents the event loop from exiting. My workaround is to send None after the workers have finished and to exit message_q() when None appears in the queue.
Is there a better way to implement this? Performance is crucial, and I would like to keep the Queue object local to the event dispatcher rather than passing it to the code that manages the worker processes (or requiring a call to some sort of finalize() method).

I implemented this now as an async context manager. The context manager calls
asyncio.ensure_future(message_q())
in its __aenter__() method and adds None to the queue in its __aexit__() method to shut down the endless loop in message_q().
The context manager can then be used in an async with statement around the process-spawning code section, eliminating the need to call a shutdown method manually. It is, however, advisable to call await asyncio.sleep(0) inside the __aenter__() method after scheduling the message_q() coroutine, to allow the context manager to initialize the queue listener. Otherwise, message_q() will not start immediately. That is not a problem per se (because the queue is filled anyway), but it delays event forwarding until the next await in the code.
The processes should be spawned using a ProcessPoolExecutor together with loop.run_in_executor(), so waiting for the processes doesn't block the event loop.
Instead of using a Queue, you may also want to use a JoinableQueue to make sure all events have been processed before exiting the context manager.
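For reference, here is a minimal sketch of such a context manager. The names QueueListener and handle_event are placeholders, a plain Queue is used instead of a JoinableQueue for brevity, and the worker-spawning code is omitted:

import asyncio
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Queue

class QueueListener:
    """Forwards messages from a multiprocessing Queue to a handler callback."""

    def __init__(self, handle_event):
        self._handle_event = handle_event  # called for every received message
        self._queue = Queue()
        self._task = None

    @property
    def queue(self):
        return self._queue  # handed to the worker processes when they are spawned

    async def __aenter__(self):
        self._task = asyncio.ensure_future(self._message_q())
        await asyncio.sleep(0)  # let the listener start before returning control
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self._queue.put(None)   # sentinel shuts down the listener loop
        await self._task

    async def _message_q(self):
        loop = asyncio.get_event_loop()
        with ThreadPoolExecutor(max_workers=1) as executor:
            while True:
                msg = await loop.run_in_executor(executor, self._queue.get)
                if msg is None:
                    return
                self._handle_event(msg)

The process-spawning section then sits inside async with QueueListener(handle_event) as listener:, and no explicit shutdown call is needed.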

Related

Why does nest_asyncio make asyncio.run not block the main thread where the event loop runs, although it is a blocking function?

Consider the following program:
import asyncio
import signal
import nest_asyncio

nest_asyncio.apply()

async_event_obj = asyncio.Event()
shutdown_command_issued = False

async def async_exit_handler():
    await async_event_obj.wait()

def exit_handler(signal, frame):
    global shutdown_command_issued
    shutdown_command_issued = True
    asyncio.run(async_exit_handler())
    quit()

signal.signal(signal.SIGINT, exit_handler)

async def coroutine_one():
    while True:
        if not shutdown_command_issued:
            print('coroutine one works')
            await asyncio.sleep(1)
        else:
            break
    print('Coroutine one finished.')
    async_event_obj.set()

loop = asyncio.new_event_loop()
loop.create_task(coroutine_one())
loop.run_forever()
What I have done is add a synchronous signal handler (i.e. exit_handler) to gently wait for the running tasks to complete in the only event loop, which runs in the main thread. As everybody knows, asyncio.run is a synchronous blocking function, and because it runs in the main thread, where the signal handlers are handled and where my event loop runs, it has to block the main thread and stop the other coroutines. But magically, when I use the nest_asyncio module, asyncio.run becomes non-blocking and the other coroutines in the event loop (i.e. coroutine_one) continue their execution. What exactly does nest_asyncio do under the hood? I know that it lets multiple event loops run in a single thread, but how does it make a blocking function non-blocking?

Run blocking and non-blocking tasks together with asyncio

I want to run blocking and non-blocking tasks together asynchronously. Obviously, it is necessary to use asyncio's run_in_executor method for the blocking tasks. Here is my sample code:
import asyncio
import concurrent.futures
import datetime
import sys
import time

def blocking():
    print("Enter to blocking()", datetime.datetime.now().time())
    time.sleep(2)
    print("Exited from blocking()", datetime.datetime.now().time())

async def waiter():
    print("Enter to waiter()", datetime.datetime.now().time())
    await asyncio.sleep(3)
    print("Exit from waiter()", datetime.datetime.now().time())

async def asynchronous(loop):
    print("Create tasks", datetime.datetime.now().time())
    task_1 = asyncio.create_task(waiter())
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=3)
    task_2 = loop.run_in_executor(executor, blocking)
    tasks = [task_1, task_2]
    print("Tasks are created", datetime.datetime.now().time())
    await asyncio.wait(tasks)

if __name__ == "__main__":
    try:
        loop = asyncio.get_event_loop()
        loop.run_until_complete(asynchronous(loop))
    except OSError as exc:
        sys.exit('Exception: ' + str(exc))
Should I use the same event loop for the blocking task in run_in_executor, or is it necessary to use another one? What should I change in my code to make it work asynchronously? Thanks.
You must use the same loop. The loop delegates to the executor, which runs tasks in threads separate from the event loop, so you don't have to worry about your blocking tasks blocking the event loop. If you use a separate loop, your async functions from the event loop will not be able to await the results of the blocking functions run in the new loop.
The event loop manages this by creating a future to represent the executor task. It then runs the blocking task in one of the executor's threads, and when the executor task returns, the result of the future is set and control is returned to the awaiting function in the event loop (if any).
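As a rough illustration of that mechanism (a sketch, not the actual implementation of run_in_executor), the bridge amounts to submitting the callable to the executor and wrapping the resulting concurrent.futures.Future in an awaitable asyncio future; blocking_add and run_blocking are placeholder names:

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_add(x, y):
    time.sleep(1)  # simulated blocking work, runs in an executor thread
    return x + y

async def run_blocking(executor, func, *args):
    cf_future = executor.submit(func, *args)     # concurrent.futures.Future
    return await asyncio.wrap_future(cf_future)  # awaitable from the event loop

async def main():
    with ThreadPoolExecutor(max_workers=1) as executor:
        print(await run_blocking(executor, blocking_add, 3, 4))  # prints 7

asyncio.run(main())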

How to terminate an event loop

I have the following code in a django view to create a background task:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_in_executor(None, update_contacts, {
    'email': email,
    'access_token': g.tokens['access_token']
})
Is there anything I need to do at the end to 'kill' the loop? What would be the proper way to close it, etc?
You do not need to start any event loop in the first place. The concurrent.futures package gives direct access to Executors, and threading lets you launch individual Threads:
# raw thread
import threading

background_task = threading.Thread(
    target=update_contacts, kwargs={
        'email': email,
        'access_token': g.tokens['access_token'],
    })
background_task.start()

# executor thread pool
from concurrent.futures import ThreadPoolExecutor

my_executor = ThreadPoolExecutor()
my_executor.submit(update_contacts, email=email, access_token=g.tokens['access_token'])
In general, a Thread is simpler if you just want to launch a task and forget about it. A ThreadPoolExecutor is more efficient if you have many small tasks at the same time; it can also be used to automatically wait for completion of several tasks.
print('start at', time.time())
with ThreadPoolExecutor() as executor:
    executor.submit(time.sleep, 1)
    executor.submit(time.sleep, 1)
    executor.submit(time.sleep, 1)
    executor.submit(time.sleep, 1)
print('done at', time.time())  # triggers after all 4 sleeps have finished
The primary purpose of loop.run_in_executor is not to provide a ThreadPoolExecutor. It is meant to bridge the gap between Executors for blocking code and the event loop for non-blocking code. Without the latter, there is no need to use asyncio at all.
import time
import asyncio

def block(delay: float):
    print("Stop! Blocking Time!")
    time.sleep(delay)  # block the current thread
    print("Done! Blocking Time!")

async def nonblock(delay: float):
    print("Erm.. Non-Blocking Time!")
    await asyncio.sleep(delay)
    print("Done! Non-Blocking Time!")

async def multiblock(delay: float):
    loop = asyncio.get_event_loop()
    await asyncio.gather(  # await async natively and sync via executors
        nonblock(delay),
        loop.run_in_executor(None, block, delay),
        nonblock(delay),
        loop.run_in_executor(None, block, delay),
    )

asyncio.run(multiblock(1))
Asyncio tasks can be canceled by calling the cancel method on the Task object. Tasks that run asynchronous code, such as those using the aiohttp library, will be canceled immediately. Tasks that run blocking code using run_in_executor will not be canceled because they are run in an OS thread behind the scenes.
This is part of the reason why run_in_executor is discouraged in asyncio code and is only intended as a stop-gap measure to include legacy blocking code in an asyncio program. (The other part is that the number of tasks is limited by the number of OS threads allowed by the pool, whereas the limit for the number of true asynchronous tasks is much higher.)
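A small sketch illustrating that point (the names are arbitrary): cancelling the future returned by run_in_executor marks it cancelled on the asyncio side, but the blocking call keeps running in its worker thread until it finishes on its own.

import asyncio
import time

def blocking(delay):
    print("blocking call started")
    time.sleep(delay)  # keeps running even after the awaiting future is cancelled
    print("blocking call finished")

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.run_in_executor(None, blocking, 2)
    await asyncio.sleep(0.5)
    fut.cancel()            # cancels the asyncio future...
    await asyncio.sleep(3)  # ...but "blocking call finished" is still printed

asyncio.run(main())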

Please explain "Task was destroyed but it is pending!" after cancelling tasks

I am learning asyncio with Python 3.4.2 and I use it to continuously listen on an IPC bus, while gbulb listens on the DBus.
I created a function listen_to_ipc_channel_layer that continuously listens for incoming messages on the IPC channel and passes the message to message_handler.
I am also listening to SIGTERM and SIGINT. When I send a SIGTERM to the python process running the code you find at the bottom, the script should terminate gracefully.
The problem I am having is the following warning:
got signal 15: exit
Task was destroyed but it is pending!
task: <Task pending coro=<listen_to_ipc_channel_layer() running at /opt/mainloop-test.py:23> wait_for=<Future cancelled>>
Process finished with exit code 0
…with the following code:
import asyncio
import gbulb
import signal
import asgi_ipc as asgi

def main():
    asyncio.async(listen_to_ipc_channel_layer())
    loop = asyncio.get_event_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, ask_exit)
    # Start listening on the Linux IPC bus for incoming messages
    loop.run_forever()
    loop.close()

@asyncio.coroutine
def listen_to_ipc_channel_layer():
    """Listens to the Linux IPC bus for messages"""
    while True:
        message_handler(message=channel_layer.receive(["my_channel"]))
        try:
            yield from asyncio.sleep(0.1)
        except asyncio.CancelledError:
            break

def ask_exit():
    loop = asyncio.get_event_loop()
    for task in asyncio.Task.all_tasks():
        task.cancel()
    loop.stop()

if __name__ == "__main__":
    gbulb.install()
    # Connect to the IPC bus
    channel_layer = asgi.IPCChannelLayer(prefix="my_channel")
    main()
I still understand only very little of asyncio, but I think I know what is going on. While waiting for yield from asyncio.sleep(0.1), the signal handler catches the SIGTERM and calls task.cancel().
Shouldn't this trigger the CancelledError within the while True: loop? (It does not, but that is how I understand "Calling cancel() will throw a CancelledError to the wrapped coroutine".)
Eventually loop.stop() is called, which stops the loop without waiting for yield from asyncio.sleep(0.1) to return a result or for the whole coroutine listen_to_ipc_channel_layer to finish.
Please correct me if I am wrong.
I think the only thing I need to do is make my program wait for yield from asyncio.sleep(0.1) to return a result and/or for the coroutine to break out of the while loop and finish.
I believe I confuse a lot of things. Please help me get those things straight so that I can figure out how to gracefully close the event loop without warning.
The problem comes from closing the loop immediately after cancelling the tasks. As the cancel() docs state
"This arranges for a CancelledError to be thrown into the wrapped coroutine on the next cycle through the event loop."
Take this snippet of code:
import asyncio
import signal

async def pending_doom():
    await asyncio.sleep(2)
    print(">> Cancelling tasks now")
    for task in asyncio.Task.all_tasks():
        task.cancel()
    print(">> Done cancelling tasks")
    asyncio.get_event_loop().stop()

def ask_exit():
    for task in asyncio.Task.all_tasks():
        task.cancel()

async def looping_coro():
    print("Executing coroutine")
    while True:
        try:
            await asyncio.sleep(0.25)
        except asyncio.CancelledError:
            print("Got CancelledError")
            break
        print("Done waiting")
    print("Done executing coroutine")
    asyncio.get_event_loop().stop()

def main():
    asyncio.async(pending_doom())
    asyncio.async(looping_coro())
    loop = asyncio.get_event_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, ask_exit)
    loop.run_forever()
    # I had to manually remove the handlers to
    # avoid an exception on BaseEventLoop.__del__
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.remove_signal_handler(sig)

if __name__ == '__main__':
    main()
Notice that ask_exit cancels the tasks but does not stop the loop; on the next cycle looping_coro() stops it. The output if you interrupt it (Ctrl+C) is:
Executing coroutine
Done waiting
Done waiting
Done waiting
Done waiting
^CGot CancelledError
Done executing coroutine
Notice how pending_doom cancels the tasks and then stops the loop immediately afterwards. If you let it run until the pending_doom coroutine awakes from its sleep, you can see the same warning you're getting:
Executing coroutine
Done waiting
Done waiting
Done waiting
Done waiting
Done waiting
Done waiting
Done waiting
>> Cancelling tasks now
>> Done cancelling tasks
Task was destroyed but it is pending!
task: <Task pending coro=<looping_coro() running at canceling_coroutines.py:24> wait_for=<Future cancelled>>
The issue is that the loop doesn't have time to finish all the tasks.
This arranges for a CancelledError to be thrown into the wrapped coroutine on the next cycle through the event loop.
There is no chance to do a "next cycle" of the loop in your approach. To do it properly, you should move the stop operation to a separate non-cyclic coroutine to give your loop a chance to finish.
The second significant thing is the raising of CancelledError.
Unlike Future.cancel(), this does not guarantee that the task will be cancelled: the exception might be caught and acted upon, delaying cancellation of the task or preventing cancellation completely. The task may also return a value or raise a different exception.
Immediately after this method is called, cancelled() will not return True (unless the task was already cancelled). A task will be marked as cancelled when the wrapped coroutine terminates with a CancelledError exception (even if cancel() was not called).
So after cleanup, your coroutine must raise CancelledError to be marked as cancelled.
Using an extra coroutine to stop the loop is not an issue, because it is not cyclic and will be done immediately after execution.
def main():
    loop = asyncio.get_event_loop()
    asyncio.ensure_future(listen_to_ipc_channel_layer())

    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, ask_exit)
    loop.run_forever()
    print("Close")
    loop.close()

@asyncio.coroutine
def listen_to_ipc_channel_layer():
    while True:
        try:
            print("Running")
            yield from asyncio.sleep(0.1)
        except asyncio.CancelledError as e:
            print("Break it out")
            raise e  # Raise a proper error

# Stop the loop concurrently
@asyncio.coroutine
def exit():
    loop = asyncio.get_event_loop()
    print("Stop")
    loop.stop()

def ask_exit():
    for task in asyncio.Task.all_tasks():
        task.cancel()
    asyncio.ensure_future(exit())

if __name__ == "__main__":
    main()
I had this message, and I believe it was caused by garbage collection of a pending task. The Python developers were debating whether tasks created in asyncio should keep strong references and decided they shouldn't (after 2 days of looking into this problem, I strongly disagree! ... see the discussion here: https://bugs.python.org/issue21163).
I created this utility for myself to keep strong references to tasks and automatically clean them up (still need to test it thoroughly)...
import asyncio

# Create a strong reference to tasks since asyncio doesn't do this for you
task_references = set()

def register_ensure_future(coro):
    task = asyncio.ensure_future(coro)
    task_references.add(task)

    # Set up cleanup of the strong reference on task completion...
    def _on_completion(f):
        task_references.remove(f)
    task.add_done_callback(_on_completion)
    return task
It seems to me that tasks should have a strong reference for as long as they are active! But asyncio doesn't do that for you, so you can be in for some bad surprises once garbage collection kicks in, and for long hours of debugging.
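For example (a sketch; my_coroutine is just a placeholder), the helper is used as a drop-in replacement for asyncio.ensure_future:

async def my_coroutine():
    await asyncio.sleep(1)
    return "done"

async def main():
    # The task is held in task_references until it completes
    task = register_ensure_future(my_coroutine())
    print(await task)

asyncio.run(main())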
The reason this happens is as explained by @Yeray Diaz Diaz.
In my case, I wanted to cancel all the tasks that were not done after the first one finished, so I ended up cancelling the extra jobs, then using loop._run_once() to run the loop a bit more and allow them to stop:
loop = asyncio.get_event_loop()
job = asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
tasks_finished, tasks_pending = loop.run_until_complete(job)
tasks_done = [t for t in tasks_finished if t.exception() is None]
if len(tasks_done) == 0:
    raise Exception("Failed for all tasks.")
assert len(tasks_done) == 1
data = tasks_done[0].result()
for t in tasks_pending:
    t.cancel()
while not all(t.done() for t in tasks_pending):
    loop._run_once()

Should I use two asyncio event loops in one program?

I want to use the Python 3 asyncio module to create a server application.
I use a main event loop to listen to the network, and when new data is received it will do some compute and send the result to the client. Does the 'do some compute' part need a new event loop, or can it use the main event loop?
You can do the compute work in the main event loop, but the whole event loop will be blocked while that happens - no other requests can be served, and anything else you have running in the event loop will be blocked. If this isn't acceptable, you probably want to run the compute work in a separate process, using BaseEventLoop.run_in_executor. Here's a very simple example demonstrating it:
import time
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_bound_worker(x, y):
    print("in worker")
    time.sleep(3)
    return x + y

@asyncio.coroutine
def some_coroutine():
    yield from asyncio.sleep(1)
    print("done with coro")

@asyncio.coroutine
def main():
    loop = asyncio.get_event_loop()
    loop.set_default_executor(ProcessPoolExecutor())
    asyncio.async(some_coroutine())
    out = yield from loop.run_in_executor(None, cpu_bound_worker, 3, 4)
    print("got {}".format(out))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Output:
in worker
done with coro
got 7
cpu_bound_worker gets executed in a child process, and the event loop will wait for the result like it would any other non-blocking I/O operation, so it doesn't block other coroutines from running.
