I wrote code that seems to do what I want, but I'm not sure if it's a good idea since it mixes threads and event loops to run an infinite loop off the main thread. This is a minimal code snippet that captures the idea of what I'm doing:
import asyncio
import threading
msg = ""
async def infinite_loop():
global msg
while True:
msg += "x"
await asyncio.sleep(0.3)
def worker():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
asyncio.get_event_loop().run_until_complete(infinite_loop())
t = threading.Thread(target=worker, daemon=True)
t.start()
The main idea is that I have an infinite loop manipulating a global variable each 0.3 s. I want this infinite loop to run off the main thread so I can still access the shared variable in the main thread. This is especially useful in jupyter, because if I call run_until_complete in the main thread I can't interact with jupyter anymore. I want the main thread available to interactively access and modify msg. Using async might seem unnecessary in my example, but I'm using a library that has async code to run a server, so it's necessary. I'm new to async and threading in python, but I remember reading / hearing somewhere that using threading with asyncio is asking for trouble... is this a bad idea? Are there any potential concurrency issues with my approach?
I'm new to async and threading in python, but I remember reading / hearing somewhere that using threading with asyncio is asking for trouble...
Mixing asyncio and threading is discouraged for beginners because it leads to unnecessary complications and often stems from a lack of understanding of how to use asyncio correctly. Programmers new to asyncio often reach for threads by habit, using them for tasks for which coroutines would be more suitable.
But if you have a good reason to spawn a thread that runs the asyncio event loop, by all means do so - there is nothing that requires the asyncio event loop to be run in the main thread. Just be careful to interact with the event loop itself (call methods such as call_soon, create_task, stop, etc.) only from the thread that runs the event loop, i.e. from asyncio coroutines and callbacks. To safely interact with the event loop from the other threads, such as in your case the main thread, use loop.call_soon_threadsafe() or asyncio.run_coroutine_threadsafe().
Note that setting global variables and such doesn't count as "interacting" because asyncio doesn't observe those. Of course, it is up to you to take care of inter-thread synchronization issues, such as protecting access to complex mutable structures with locks.
is this a bad idea?
If unsure whether to mix threads and asyncio, you can ask yourself two questions:
Do I even need threads, given that asyncio provides coroutines that run in parallel and run_in_executor to await blocking code?
If I have threads providing parallelism, do I actually need asyncio?
Your question provides good answers to both - you need threads so that the main thread can interact with jupyter, and you need asyncio because you depend on a library that uses it.
Are there any potential concurrency issues with my approach?
The GIL ensures that setting a global variable in one thread and reading it in another is free of data races, so what you've shown should be fine.
If you add explicit synchronization, such as a multi-threaded queue or condition variable, you should keep in mind that the synchronization code must not block the event loop. In other words, you cannot just wait on, say, a threading.Event in an asyncio coroutine because that would block all coroutines. Instead, you can await an asyncio.Event, and set it using something like loop.call_soon_threadsafe(event.set) from the other thread.
Related
I have basically the following code and want to embed it in an async coroutine:
def read_midi():
midi_in = pygame.midi.Input(0)
while True:
if midi_in.poll():
midi_data = midi_in.read(1)[0][0]
# do something with midi_data, e.g. putting it in a Queue..
From my understanding since pygame is not asynchronous I have two options here: put the whole function in an extra thread or turn it into an async coroutine like this:
async def read_midi():
midi_in = pygame.midi.Input(1)
while True:
if not midi_in.poll():
await asyncio.sleep(0.1) # very bad!
continue
midi_data = midi_in.read(1)[0][0]
# do something with midi_data, e.g. putting it in a Queue..
So it looks like I have to either keep the busy loop and put it in a thread and waste lots of cpu time or put it into the (fake) coroutine above and introduce a tradeoff between time lags and wasting CPU time.
Am I wrong?
Is there a way to read MIDI without a busy loop?
Or even a way to await midi.Input.read?
It is true that the pygame library is not asynchronous, so you must either utilize a distinct thread or an asynchronous coroutine to process the MIDI input.
Using a distinct thread will permit the other parts of the program to carry on running concurrently to the MIDI input being read, but it will also necessitate more CPU resources.
Employing an async coroutine with the asyncio.sleep(0.1) call will result in a holdup in the MIDI input, although it will also reduce the CPU utilization. The trade-off here is between responsiveness and resource usage.
Using asyncio.sleep(0.1) will not be optimal as it will cause a considerable lag and it might not be wise to incorporate sleep in the while loop, as this will introduce a lot of holdup and won't be responsive.
Another possible choice is to utilize a library that furnishes an asynchronous interface for MIDI input, such as rtmidi-python or mido. These libraries may offer an approach to wait for MIDI input asynchronously without using a blocking call.
It seems that several asyncio functions, like those showed here, for synchronization primitives are not thread safe...
By being not thread safe, considering for example asyncio.Lock, I assume that this lock won't lock the global variable, when we're running multiple threads in our computer, so race conditions are problem.
So, what's the point of having this Lock that doesn't lock? (not a criticism, but an honest doubt)
What are the case uses for these unsafe primitives?
Asyncio is not made for multithreading or multiprocessing work, it is originally made for Asynchronous IO (network) operations with little overhead, hence a lock that is only running in a single thread (as is the case for tasks running in Asyncio eventloop) doesn't need to be threadsafe or process-safe, and therefore doesn't need to suffer from the extra overhead from using a thread-safe or process-safe lock.
using Thread and process executors is only added to allow mingling threaded futures and multiprocessing futures with applications running an eventloop futures seamlessly such as passing them to as_completed function or awaiting their completion as you would with a normal non-multithreaded asyncio task.
if you want a thread-safe lock you can use a thread.Lock, and if you want a process-safe lock you should use a multiprocessing.Lock and suffer the extra overhead.
keep in mind that those locks can still work in an asyncio eventloop and perform almost the same functionality as an asyncio.Lock, they just suffer from higher overhead and will make your code slower when used so don't use them unless you need your code to be Thread-safe or process-safe.
just to briefly explain the difference, when a thread is halted by a thread-safe lock the thread is halted and rescheduled by the operating system, which has a big overhead compared to Asyncio lock that will return to the eventloop again and continue execution instead of halting the thread.
Edit: a threading.Lock is not a direct replacement for asyncio.Lock, and instead you should use threading.RLock followed by an asyncio.Lock to make a function both thread-safe and asyncio-safe, as this will avoid a thread dead-locking itself.
Edit2: as commented by #dano, you can wait for a thread.Lock indirectly using the answer in this question if you want a function to work both threaded and in asyncio eventloop at the same time, but it is not recommended to run a function in both at the same time anyway How to use threading.Lock in async function while object can be accessed from multiple thread
In this minimal example, I expect the program to print foo and fuu as the tasks are scheduled.
import asyncio
async def coroutine():
while True:
print("foo")
async def main():
asyncio.create_task(coroutine())
#asyncio.run_coroutine_threadsafe(coroutine(),asyncio.get_running_loop())
while True:
print("fuu")
asyncio.run(main())
The program will only write fuu.
My aim is to simply have two tasks executing concurrently, but it seems like the created task is never scheduled.
I also tried using run_coroutine_threadsafe with no success.
If I add await asyncio.sleep(1) to the main. The created task takes the execution and the program will only write foo.
What I am supposed to do to run two tasks simultaneously using asyncio ?
I love this question and the explanation of this question tells how asyncio and python works.
Spoilers - It works similar to Javascript single threaded runtime.
So, let's look at your code.
You only have one main thread running which would be continuously running since python scheduler don't have to switch between threads.
Now, the thing is, your main thread that creates a task actually creates a coroutine(green threads, managed by python scheduler and not OS scheduler) which needs main thread to get executed.
Now the main thread is never free, since you have put while True, it is never free to execute anything else and your task never gets executed because python scheduler never does the switching because it is busy executing while True code.
The moment you put sleep, it detects that the current task is sleeping and it does the context switching and your coroutine kicks in.
My suggestion. if your tasks are I/O heavy, use tasks/coroutines and if they are CPU heavy which is in your case (while True), create either Python Threads or Processes and then the OS scheduler will take care of running your tasks, they will get CPU slice to run while True.
I have a class which processes a buch of work elements asynchronously (mainly due to overlapping HTTP connection requests) using asyncio. A very simplified example to demonstrate the structure of my code:
class Work:
...
def worker(self, item):
# do some work on item...
return
def queue(self):
# generate the work items...
yield from range(100)
async def run(self):
with ThreadPoolExecutor(max_workers=10) as executor:
loop = asyncio.get_event_loop()
tasks = [
loop.run_in_executor(executor, self.worker, item)
for item in self.queue()
]
for result in await asyncio.gather(*tasks):
pass
work = Work()
asyncio.run(work.run())
In practice, the workers need to access a shared container-like object and call its methods which are not async-safe. For example, let's say the worker method calls a function defined like this:
def func(shared_obj, value):
for node in shared_obj.filter(value):
shared_obj.remove(node)
However, calling func from a worker might affect the other asynchronous workers in this or any other function involving the shared object. I know that I need to use some synchronization, such as a global lock, but I don't find its usage easy:
asyncio.Lock can be used only in async functions, so I would have to mark all such function definitions as async
I would also have to await all calls of these functions
await is also usable only in async functions, so eventually all functions between worker and func would be async
if the worker was async, it would not be possible to pass it to loop.run_in_executor (it does not await)
Furthermore, some of the functions where I would have to add async may be generic in the sense that they should be callable from asynchronous as well as "normal" context.
I'm probably missing something serious in the whole concept. With the threading module, I would just create a lock and work with it in a couple of places, without having to further annotate the functions. Also, there is a nice solution to wrap the shared object such that all access is transparently guarded by a lock. I'm wondering if something similar is possible with asyncio...
I'm probably missing something serious in the whole concept. With the threading module, I would just create a lock...
What you are missing is that you're not really using asyncio at all. run_in_executor serves to integrate CPU-bound or legacy sync code into an asyncio application. It works by submitting the function it to a ThreadPoolExecutor and returning an awaitable handle which gets resolved once the function completes. This is "async" in the sense of running in the background, but not in the sense that is central to asyncio. An asyncio program is composed of non-blocking pieces that use async/await to suspend execution when data is unavailable and rely on the event loop to efficiently wait for multiple events at once and resume appropriate async functions.
In other words, as long as you rely on run_in_executor, you are just using threading (more precisely concurrent.futures with a threading executor). You can use a threading.Lock to synchronize between functions, and things will work exactly as if you used threading in the first place.
To get the benefits of asyncio such as scaling to a large number of concurrent tasks or reliable cancellation, you should design your program as async (or mostly async) from the ground up. Then you'll be able to modify shared data atomically simply by doing it between two awaits, or use asyncio.Lock for synchronized modification across awaits.
Hi I am new to asyncio and concept of event loops (Non-blocking IO)
async def subWorker():
...
async def firstWorker():
await subWorker()
async def secondWorker():
await asyncio.sleep(1)
loop = asyncio.get_event_loop()
asyncio.ensure_future(firstWorker())
asyncio.ensure_future(secondWorker())
loop.run_forever()
here, when code starts, firstWorker() is executed and paused until it encounters await subWorker(). While firstWorker() is waiting, secondWorker() gets started.
Question is, when firstWorker() encounters await subWorker() and gets paused, the computer then will execute subWorker() and secondWorker() at the same time. Since the program has only 1 thread now, and I guess the single thread does secondWorker() work. Then who executes subWorker()? If single thread can only do 1 thing at a time, who else does the other jobs?
The assumption that subWorker and secondWorker execute at the same time is false.
The fact that secondWorker simply sleeps means that the available time will be spent in subWorker.
asyncio by definition is single-threaded; see the documentation:
This module provides infrastructure for writing single-threaded concurrent code
The event loop executes a task at a time, switching for example when one task is blocked while waiting for I/O, or, as here, voluntarily sleeping.
This is a little old now, but I've found the visualization from the gevent docs (about 1 screen down, beneath "Synchronous & Asynchronous Execution") to be helpful while teaching asynchronous flow control to coworkers: http://sdiehl.github.io/gevent-tutorial/
The most important point here is that only one coroutine is running at any one time, even though many may be in process.