Calling asyncio.run_coroutine_threadsafe across different event loops - python

I have a class inside a microservice that looks like this:
import asyncio
import threading

class A:
    def __init__(self):
        self.state = []
        self._flush_thread = self._start_flush()
        self.tasks = set()

    def _start_flush(self):
        thread = threading.Thread(target=self._submit_flush)
        thread.start()
        return thread

    def _submit_flush(self):
        self._thread_loop = asyncio.new_event_loop()
        self._thread_loop.run_until_complete(self.flush_state())

    async def regular_func(self):
        # This function is called on an event loop that is managed by asyncio.run()
        # process self.state, fire and forget next func
        task = asyncio.create_task(B.process_inputs(self.state))  # Should call process_inputs in the main thread event loop
        self.tasks.add(task)
        task.add_done_callback(self.tasks.discard)

    async def flush_state(self):
        # flush out self.state at regular intervals, to next func
        while True:
            # flush state
            asyncio.run_coroutine_threadsafe(B.process_inputs(self.state), self._thread_loop)  # Calls process_inputs in the new thread event loop
            await asyncio.sleep(10)

class B:
    @staticmethod
    async def process_inputs(inputs):
        # process
        ...
On these two threads, I run two separate event loops so that other async functions on the main event loop can't block the flush coroutine from running.
I see that asyncio.run_coroutine_threadsafe is thread-safe when submitting to a given event loop. Is asyncio.run_coroutine_threadsafe(B.process_inputs()) still thread-safe when called between different event loops?
Edit:
process_inputs uploads the state to an object store and calls an external API using the state we passed in.

The answer here is that asyncio.run_coroutine_threadsafe does not protect us from any thread-safety issues across different event loops. We need to implement locks to protect any shared state while it is being modified. Credit to @Paul Cornelius for the reply.

"The run_coroutine_threadsafe() function allows a coroutine to be run in an asyncio program from another thread."
See: Example of Running a Coroutine From Another Thread
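As a hedged sketch of that advice (the structure and names below are illustrative, and the B stub stands in for the real upload logic): a plain threading.Lock can guard self.state across both loops, as long as nothing awaits while the lock is held.
import asyncio
import threading

class B:
    @staticmethod
    async def process_inputs(inputs):
        ...  # upload to the object store, call the external API

class A:
    def __init__(self):
        self.state = []
        self._state_lock = threading.Lock()  # guards self.state across both threads

    async def regular_func(self):
        # Runs on the main thread's event loop.
        with self._state_lock:  # never await while holding the lock
            snapshot = list(self.state)
            self.state.clear()
        await B.process_inputs(snapshot)

    async def flush_state(self):
        # Runs on the second thread's event loop.
        while True:
            with self._state_lock:
                snapshot = list(self.state)
                self.state.clear()
            await B.process_inputs(snapshot)
            await asyncio.sleep(10)
An asyncio.Lock would not help here, since each instance belongs to a single event loop; a threading.Lock is safe in coroutine code as long as the critical section contains no await.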

Related

How can I make a recurring async task (I don't control where asyncio.run() is called)

I'm using a library that itself makes the call to asyncio.run(internal_function) so I can't control that at all. I do however have access to the event loop, it's something that I pass into this library.
Given that, is there some way I can set up a recurring async event that will execute every X seconds while the main library is running?
This doesn't exactly work, but maybe it's close?
import asyncio
from third_party import run

loop = asyncio.new_event_loop()

async def periodic():
    while True:
        print("doing a thing...")
        await asyncio.sleep(30)

loop.create_task(periodic())
run(loop)  # internally this will call asyncio.run() using the given loop
The problem here of course is that the task I've created is never awaited. But I can't just await it, because that would block.
Edit: Here's a working example of what I'm facing. When you run this code you will only ever see "third party code executing" and never see "doing my stuff...".
import asyncio

# I don't know how the loop argument is used
# by the third party's run() function,
def third_party_run(loop):
    async def runner():
        while True:
            print("third party code executing")
            await asyncio.sleep(5)

    # but I do know that this third party eventually runs code
    # that looks **exactly** like this.
    try:
        asyncio.run(runner())
    except KeyboardInterrupt:
        return

loop = asyncio.new_event_loop()

async def periodic():
    while True:
        print("doing my stuff...")
        await asyncio.sleep(1)

loop.create_task(periodic())
third_party_run(loop)
If you run the above code you get:
third party code executing
third party code executing
third party code executing
^CTask was destroyed but it is pending!
task: <Task pending name='Task-1' coro=<periodic() running at example.py:22>>
/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py:674: RuntimeWarning: coroutine 'periodic' was never awaited
You don't need to await a created task.
It will run in the background as long as the event loop is active and isn't stuck in a CPU-bound operation.
According to your comment, you don't have access to the event loop. In that case you don't have many options other than running in a different thread (which will have its own loop), or changing the event loop policy in order to get hold of the loop, which is a very bad idea in most cases.
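A minimal sketch of the separate-thread option, assuming the third_party_run and loop from your example (the helper name is mine):
import asyncio
import threading

async def periodic():
    while True:
        print("doing my stuff...")
        await asyncio.sleep(1)

def start_periodic_in_background():
    # asyncio.run() creates a fresh loop confined to this daemon thread,
    # so it cannot clash with whatever loop the library creates.
    t = threading.Thread(target=lambda: asyncio.run(periodic()), daemon=True)
    t.start()
    return t

# Usage alongside the question's code:
#   start_periodic_in_background()
#   third_party_run(loop)  # blocks in the library as before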
I found a way to make your test program run. However, it's a hack. It could fail, depending on the internal design of your third party library. From the information you provided, the library has been structured to be a black box. You can't interact with the event loop or schedule a callback. It seems like there might be a very good reason for this.
If I were you I would try to contact the library designer and let him know what your problem is. Perhaps there is a better solution. If this is a commercial project, I would make 100% certain that the team understands the issue, before attempting to use my below solution or anything like it.
The script below overrides one method (new_event_loop) in the DefaultEventLoopPolicy. When this method is called, I create a task in this loop to execute your periodic function. I don't know how often, or for what purpose, the library will call this function. Also, if the library internally overrides the EventLoopPolicy then this solution will not work. In both of these cases it may lead to unforeseeable consequences.
OK, enough disclaimers.
The only significant change to your test script was to replace the infinite loop in runner with a one that times out. This allowed me to verify that the program shuts down cleanly.
import asyncio

# I don't know how the loop argument is used
# by the third party's run() function,
def third_party_run():
    async def runner():
        for _ in range(4):
            print("third party code executing")
            await asyncio.sleep(5)

    # but I do know that this third party eventually runs code
    # that looks **exactly** like this.
    try:
        asyncio.run(runner())
    except KeyboardInterrupt:
        return

async def periodic():
    while True:
        print("doing my stuff...")
        await asyncio.sleep(1)

class EventLoopPolicyHack(asyncio.DefaultEventLoopPolicy):
    def __init__(self):
        self.__running = None
        super().__init__()

    def new_event_loop(self):
        # Override to create our periodic task in the new loop.
        # Get a loop from the superclass.
        # This method must return that loop.
        print("New event loop")
        loop = super().new_event_loop()
        if self.__running is not None:
            self.__running.cancel()  # I have no way to test this idea
        self.__running = loop.create_task(periodic())
        return loop

asyncio.set_event_loop_policy(EventLoopPolicyHack())
third_party_run()

How to encapsulate asyncio code in Python?

I'd like to use asyncio to do a lot of simultaneous non-blocking IO in Python. However, I want that use of asyncio to be abstracted away from the user: under the hood there are a lot of asynchronous calls going on simultaneously to speed things up, but to the user there's a single, synchronous call.
Basically something like this:
async def _slow_async_fn(address):
    data = await async_load_data(address)
    return data

def synchronous_blocking_io():
    addresses = ...
    tasks = []
    for address in addresses:
        tasks.append(_slow_async_fn(address))
    all_results = some_fn(asyncio.gather(*tasks))
    return all_results
The problem is, how can I achieve this in a way that's agnostic to the user's running environment? If I use a pattern like asyncio.get_event_loop().run_until_complete(), I run into issues when the code is called inside an environment like Jupyter, where an event loop is already running. Is there a way to robustly gather the results of a set of asynchronous tasks that doesn't require pushing async/await statements all the way up the program?
The restriction on running loops is per thread, so running a new event loop is possible, as long as it is in a new thread.
import asyncio
import concurrent.futures

async def gatherer_of(tasks):
    # It's necessary to wrap asyncio.gather() in a coroutine (reasons beyond scope)
    return await asyncio.gather(*tasks)

def synchronous_blocking_io():
    addresses = ...
    tasks = []
    for address in addresses:
        tasks.append(_slow_async_fn(address))
    loop = asyncio.new_event_loop()
    return loop.run_until_complete(gatherer_of(tasks))

def synchronous_blocking_io_wrapper():
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        fut = executor.submit(synchronous_blocking_io)
        return fut.result()

# Testing
async def async_runner():
    # Simulating execution from a running loop
    return synchronous_blocking_io_wrapper()

# Run from synchronous client
# print(synchronous_blocking_io_wrapper())

# Run from async client
# print(asyncio.run(async_runner()))
The same result can be achieved with a ProcessPoolExecutor, by manually running synchronous_blocking_io in a new thread and joining it, by starting an entirely new process, and so forth; a sketch of the bare-thread variant follows. As long as you are not in the same thread, you won't conflict with any running event loop.
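As a minimal sketch of the bare-thread variant, reusing synchronous_blocking_io from above (the helper name and closure-based result passing are mine):
import threading

def synchronous_blocking_io_in_thread():
    # Run synchronous_blocking_io on a fresh thread, which has no running
    # event loop, then join and hand back its result.
    result = []
    t = threading.Thread(target=lambda: result.append(synchronous_blocking_io()))
    t.start()
    t.join()
    return result[0]
Note that this sketch silently loses exceptions raised in the thread; the ThreadPoolExecutor version propagates them through fut.result(), which is one reason to prefer it.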

Event loop error in asyncio Lock when instantiated multiple times

I'm running into some strange errors with initialising Locks and running asynchronous code. Suppose we had a class to use with some resource protected by a lock.
import asyncio

class C:
    def __init__(self):
        self.lock = asyncio.Lock()

    async def foo(self):
        async with self.lock:
            return 'foo'

def async_foo():
    c = C()
    asyncio.run(c.foo())

if __name__ == '__main__':
    async_foo()
    async_foo()
This throws an error when run. It occurs on lock initialisation in __init__.
RuntimeError: There is no current event loop in thread 'MainThread'.
So duplicating the asyncio.run call inside the function does not have this effect; it seems the object needs to be initialised multiple times. It is also not enough to instantiate multiple locks in a single constructor. So perhaps it has something to do with the state of the event loop after asyncio.run is called.
What is going on? And how could I modify this code to work? Let me also clarify: the instance is created outside asyncio.run and async functions for a reason; I'd like it to be usable elsewhere too, if that makes a difference.
Alternatively, can threading.Lock be used for async things as well? It would have the added benefit of being thread-safe, which asyncio.Lock reportedly is not.
What is going on?
When an async object (asyncio.Lock()) is created, it is attached to the current event loop and can only be used with it.
The main thread has a default current event loop (but other threads you create won't have one).
asyncio.run() internally creates a new event loop, sets it as current, and closes it when finished.
So you're trying to use the lock with an event loop other than the one it was attached to on creation. That leads to errors.
And how could I modify this code to work?
Ideal solution is following:
import asyncio

async def main():
    ...  # all your code is here

if __name__ == "__main__":
    asyncio.run(main())
This guarantees that every async object created is attached to the proper event loop, the one asyncio.run has created.
The running event loop (inside asyncio.run) is meant to be the global "entry point" of your async program.
I'd like for it to be usable elsewhere too.
You're able to create the object outside asyncio.run, but then you should move the creation of async objects out of __init__ so that asyncio.Lock() isn't created until asyncio.run() is called; a lazy-initialisation sketch follows.
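One hedged way to follow that advice is lazy initialisation via a property (this structure is mine, not from the answer):
import asyncio

class C:
    def __init__(self):
        self._lock = None  # deferred: no event loop exists yet at construction time

    @property
    def lock(self):
        # Created on first use, inside a running loop.
        if self._lock is None:
            self._lock = asyncio.Lock()
        return self._lock

    async def foo(self):
        async with self.lock:
            return 'foo'

def async_foo():
    c = C()
    asyncio.run(c.foo())

if __name__ == '__main__':
    async_foo()
    async_foo()  # no longer raises
A single instance still shouldn't be shared across separate asyncio.run calls on these Python versions. (Since Python 3.10, asyncio.Lock no longer binds to a loop at construction, so the original RuntimeError is specific to earlier versions.)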
Alternatively, can threading.Lock be used for async things also?
No, it is for working with threads, while asyncio runs coroutines inside a single thread (usually).
It would have the added benefit of being thread-safe, which asyncio.Lock reportedly is not.
In asyncio you usually don't need threads other than the main one. There are still some reasons to use them, but the thread-unsafety of asyncio.Lock shouldn't be an issue.
Consider reading the following links. They may help you comprehend the situation better:
why we need asyncio/threads at all
When should I write asynchronous code instead of synchronous?

Share queue in event loop

Is it possible to share an asyncio.Queue over different tasks in one event loop?
The usecase:
Two tasks are publishing data on a queue, and one task is grabbing the new items from the Queue. All tasks in an asynchronous way.
main.py
import asyncio
import creator

async def pull_message(queue):
    while True:
        # Here I don't get messages; maybe the queue is always
        # occupied by another task?
        msg = await queue.get()
        print(msg)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    queue = asyncio.Queue(loop=loop)
    future = asyncio.ensure_future(pull_message(queue))
    creators = list()
    for i in range(2):
        creators.append(loop.create_task(creator.populate_msg(queue)))
    # add future to creators for easy handling
    creators.append(future)
    loop.run_until_complete(asyncio.gather(*creators))
creator.py
import asyncio
async def populate_msg(queue):
while True:
msg = "Foo"
await queue.put(msg)
The problem in your code is that populate_msg doesn't yield to the event loop because the queue is unbounded. This is somewhat counter-intuitive because the coroutine clearly contains an await, but that await only suspends the execution of the coroutine if the coroutine would otherwise block. Since put() on an unbounded queue never blocks, populate_msg is the only thing executed by the event loop.
The problem will go away once you change populate_msg to actually do something else (like await a network event). For testing purposes you can add await asyncio.sleep(0) inside the loop, which will force the coroutine to yield control to the event loop at every iteration of the while loop. Note that this will cause the event loop to spend an entire core by continuously spinning the loop.
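A minimal sketch of that testing fix in creator.py (alternatively, constructing the queue with a maxsize makes put() block, and therefore yield, once the queue fills up):
import asyncio

async def populate_msg(queue):
    while True:
        await queue.put("Foo")
        # put() on an unbounded queue never blocks, so force a yield
        # to give pull_message a chance to run.
        await asyncio.sleep(0)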

Python asyncio run_coroutine_threadsafe never running coroutine?

I'm not sure what I'm doing wrong here, I'm trying to have a class which contains a queue and uses a coroutine to consume items on that queue. The wrinkle is that the event loop is being run in a separate thread (in that thread I do loop.run_forever() to get it running).
What I'm seeing though is that the coroutine for consuming items is never fired:
import asyncio
from threading import Thread
import functools

# so print always flushes to stdout
print = functools.partial(print, flush=True)

def start_loop(loop):
    def run_forever(loop):
        print("Setting loop to run forever")
        asyncio.set_event_loop(loop)
        loop.run_forever()
        print("Leaving run forever")

    asyncio.set_event_loop(loop)
    print("Spawning thread")
    thread = Thread(target=run_forever, args=(loop,))
    thread.start()

class Foo:
    def __init__(self, loop):
        print("in foo init")
        self.queue = asyncio.Queue()
        asyncio.run_coroutine_threadsafe(self.consumer(self.queue), loop)

    async def consumer(self, queue):
        print("In consumer")
        while True:
            message = await queue.get()
            print(f"Got message {message}")
            if message == "END OF QUEUE":
                print(f"exiting consumer")
                break
            print(f"Processing {message}...")

def main():
    loop = asyncio.new_event_loop()
    start_loop(loop)
    f = Foo(loop)
    f.queue.put("this is a message")
    f.queue.put("END OF QUEUE")
    loop.call_soon_threadsafe(loop.stop)
    # wait for the stop to propagate and complete
    while loop.is_running():
        pass

if __name__ == "__main__":
    main()
Output:
Spawning thread
Setting loop to run forever
in foo init
Leaving run forever
There are several issues with this code.
First, check the warnings:
test.py:44: RuntimeWarning: coroutine 'Queue.put' was never awaited
f.queue.put("this is a message")
test.py:45: RuntimeWarning: coroutine 'Queue.put' was never awaited
f.queue.put("END OF QUEUE")
That means queue.put is a coroutine, so it has to be run using run_coroutine_threadsafe:
asyncio.run_coroutine_threadsafe(f.queue.put("this is a message"), loop)
asyncio.run_coroutine_threadsafe(f.queue.put("END OF QUEUE"), loop)
You could also use queue.put_nowait which is a synchronous method. However, asyncio objects are generally not threadsafe so every synchronous call has to go through call_soon_threadsafe:
loop.call_soon_threadsafe(f.queue.put_nowait, "this is a message")
loop.call_soon_threadsafe(f.queue.put_nowait, "END OF QUEUE")
Another issue is that the loop gets stopped before the consumer task can start processing items. You could add a join method to the Foo class to wait for the consumer to finish:
class Foo:
    def __init__(self, loop):
        [...]
        self.future = asyncio.run_coroutine_threadsafe(self.consumer(self.queue), loop)

    def join(self):
        self.future.result()
Then make sure to call this method before stopping the loop:
f.join()
loop.call_soon_threadsafe(loop.stop)
This should be enough to get the program to work as you expect. However, this code is still problematic in several respects.
First, the loop should not be set both in the main thread and the extra thread. Asyncio loops are not meant to be shared between threads, so you need to make sure that everything asyncio related happens in the dedicated thread.
Since Foo is responsible for the communication between those two threads, you'll have to be extra careful to make sure every line of code runs in the right thread. For instance, the instantiation of asyncio.Queue has to happen in the asyncio thread.
See this gist for a corrected version of your program.
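As a hedged illustration of that point (this helper is mine, not from the gist), the queue can be constructed on the loop's own thread and handed back to the caller:
import asyncio

def make_queue_in(loop):
    # Build the asyncio.Queue inside the loop's thread; .result() blocks
    # until the running loop has executed the factory coroutine.
    async def factory():
        return asyncio.Queue()
    return asyncio.run_coroutine_threadsafe(factory(), loop).result()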
Also, I'd like to point out that this is not the typical use case for asyncio. You generally want to have an asyncio loop running in the main thread, especially if you need subprocess support:
asyncio supports running subprocesses from different threads, but there are limits:
An event loop must run in the main thread
The child watcher must be instantiated in the main thread, before executing subprocesses from other threads. Call the get_child_watcher() function in the main thread to instantiate the child watcher.
I would suggest designing your application the other way, i.e. running asyncio in the main thread and use run_in_executor for the synchronous blocking code.
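A minimal sketch of that inverted design, with the blocking work moved off the main-thread loop (the function names are illustrative):
import asyncio
import time

def blocking_io():
    # Stand-in for synchronous, blocking work (file I/O, a legacy library, ...).
    time.sleep(1)
    return "done"

async def main():
    loop = asyncio.get_running_loop()
    # Off-load the blocking call to the default ThreadPoolExecutor so the
    # event loop in the main thread stays responsive.
    result = await loop.run_in_executor(None, blocking_io)
    print(result)

asyncio.run(main())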
