Python asyncio: how can a single thread handle multiple things simultaneously? - python

Hi, I am new to asyncio and the concept of event loops (non-blocking I/O).
import asyncio

async def subWorker():
    ...

async def firstWorker():
    await subWorker()

async def secondWorker():
    await asyncio.sleep(1)

loop = asyncio.get_event_loop()
asyncio.ensure_future(firstWorker())
asyncio.ensure_future(secondWorker())
loop.run_forever()
Here, when the code starts, firstWorker() is executed and pauses when it reaches await subWorker(). While firstWorker() is waiting, secondWorker() gets started.
My question is: when firstWorker() hits await subWorker() and is paused, the computer will then execute subWorker() and secondWorker() at the same time. Since the program now has only one thread, I guess the single thread does the secondWorker() work. Then who executes subWorker()? If a single thread can only do one thing at a time, who does the other jobs?

The assumption that subWorker and secondWorker execute at the same time is false.
The fact that secondWorker simply sleeps means that the available time will be spent in subWorker.
asyncio by definition is single-threaded; see the documentation:
This module provides infrastructure for writing single-threaded concurrent code
The event loop executes one task at a time, switching, for example, when one task blocks waiting for I/O or, as here, voluntarily sleeps.
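To make the switching visible, here is a minimal sketch (my own illustration, not from the question) in which each task yields control back at its await, so the loop alternates between them:

import asyncio

async def worker(name):
    for i in range(3):
        print(f"{name} step {i}")   # runs uninterrupted until the next await
        await asyncio.sleep(0)      # yield control back to the event loop

async def main():
    await asyncio.gather(worker("A"), worker("B"))

asyncio.run(main())

This prints A step 0, B step 0, A step 1, B step 1, and so on: many tasks may be in flight, but only one is executing at any instant.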

This is a little old now, but I've found the visualization from the gevent docs (about 1 screen down, beneath "Synchronous & Asynchronous Execution") to be helpful while teaching asynchronous flow control to coworkers: http://sdiehl.github.io/gevent-tutorial/
The most important point here is that only one coroutine is running at any one time, even though many may be in process.

replacing asyncio with concurrent.futures

asyncio is causing issues in my Spyder IDE, so I would like to replace it with the concurrent.futures library.
How can I replace the code below relying only on the concurrent.futures library?
asyncio.get_event_loop().run_until_complete(api(message))
The exact function looks as follows:
def async_loop(api, message):
    return asyncio.get_event_loop().run_until_complete(api(message))
As written, you're starting up the event loop only until a particular task completes (which may or may not launch or wait on other tasks), and blocking until it finishes. The only reason it's a task is that it needs to use async functions; those can only run in an event loop, and while running they may launch other tasks or wait on other awaitables, and while they wait, the event loop can run other tasks.
In short, if not for the need to be an async task running in a non-async context, this would just be:
def async_loop(api, message):
    return api(message)
which calls api and waits for it to complete.
Really, that's it. If the things api does or calls need to run some tasks asynchronously, without blocking on them immediately, you'd have some global executor, e.g.
executor = concurrent.futures.ThreadPoolExecutor()
which would be used to launch tasks with:
fut = executor.submit(callable, 'arg1', 'arg2', kwarg1='somevalue')
and, when the result of the task is needed, someone would call:
value = fut.result()
on it (which blocks if the task isn't done yet, returns the result if it completed without an exception, or re-raises the exception it died with).
Whenever you no longer need the executor, you just call .shutdown() on it and it will wait for all outstanding tasks to complete. That's it.
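Putting those pieces together, here is a minimal sketch (my own; api here is a hypothetical plain blocking function standing in for your real one):

import concurrent.futures

def api(message):   # hypothetical stand-in for a plain, blocking function
    return message.upper()

executor = concurrent.futures.ThreadPoolExecutor()

def async_loop(api, message):
    fut = executor.submit(api, message)   # schedule on a worker thread
    return fut.result()                   # block until the result is ready

print(async_loop(api, 'hello'))   # prints: HELLO
executor.shutdown()               # waits for any outstanding tasks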
As a side-note, the error you're experiencing is part of why they've deprecated get_event_loop() in 3.10 (and discouraged it since 3.7). In all likelihood, the simplest solution to your problem (avoiding a switch to threads, because all that means is that you've got new problems) is to use the much simpler high-level API, asyncio.run (introduced in 3.7), which creates an event loop, runs the task in it to completion, does reasonable cleanup, then returns the result:
def async_loop(api, message):
    return asyncio.run(api(message))
There's also the asyncio.get_running_loop function (the exact replacement for get_event_loop), which you use when an event loop already exists. You should typically know whether that's the case; event loops don't pop into existence in a given thread on their own, so you should know if you launched one. In this case you hadn't, so asyncio.run is the correct one to use.
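For completeness, a minimal sketch (my own illustration) of the situation get_running_loop is meant for, i.e. code that is already executing under an event loop:

import asyncio

async def handler():
    loop = asyncio.get_running_loop()   # safe here: a loop is running
    await loop.run_in_executor(None, print, 'ran in the loop')

asyncio.run(handler())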

Python asyncio.create_task() - really need to keep a reference?

The documentation of asyncio.create_task() states the following warning:
Important: Save a reference to the result of this function, to avoid a task disappearing mid execution. (source)
My question is: Is this really true?
I have several IO-bound "fire and forget" tasks which I want to run concurrently using asyncio by submitting them to the event loop using asyncio.create_task(). However, I do not really care about the return value of the coroutines, or even whether they run successfully, only that they do run eventually. One use case is writing data from an "expensive" calculation back to a Redis database. If Redis is available, great. If not, oh well, no harm. This is why I do not want/need to await those tasks.
Here is a generic example:
import asyncio

async def fire_and_forget_coro():
    """Some random coroutine waiting for IO to complete."""
    print('in fire_and_forget_coro()')
    await asyncio.sleep(1.0)
    print('fire_and_forget_coro() done')

async def async_main():
    """Main entry point of asyncio application."""
    print('in async_main()')
    n = 3
    for _ in range(n):
        # create_task() does not block, returns immediately.
        # Note: We do NOT save a reference to the submitted task here!
        asyncio.create_task(fire_and_forget_coro(), name='fire_and_forget_coro')
    print('awaiting sleep in async_main()')
    await asyncio.sleep(2.0)  # <-- note this line
    print('sleeping done in async_main()')
    print('async_main() done.')
    # All references to tasks we *might* have go out of scope when returning from this coroutine!
    return

if __name__ == '__main__':
    asyncio.run(async_main())
Output:
in async_main()
awaiting sleep in async_main()
in fire_and_forget_coro()
in fire_and_forget_coro()
in fire_and_forget_coro()
fire_and_forget_coro() done
fire_and_forget_coro() done
fire_and_forget_coro() done
sleeping done in async_main()
async_main() done.
When commenting out the await asyncio.sleep() line, we never see fire_and_forget_coro() finish. This is to be expected: when the event loop started with asyncio.run() closes, tasks will not be executed anymore. But it appears that as long as the event loop is still running, all tasks will be taken care of, even though I never explicitly created references to them. This seems logical to me, as the event loop itself must have a reference to all scheduled tasks in order to run them. And we can even get them all using asyncio.all_tasks()!
So, I think I can trust Python to hold at least one strong reference to every scheduled task as long as the event loop it was submitted to is still running, and thus I do not have to manage references myself. But I would like a second opinion here. Am I right, or are there pitfalls I have not yet recognized?
If I am right, why the explicit warning in the documentation? It is a usual Python thing that objects are garbage-collected if you do not keep a reference to them. Are there situations where one does not have a running event loop but still has some task objects to reference? Maybe when creating an event loop manually (I never did this)?
There is an open issue about this topic in the CPython bug tracker on GitHub that I just found:
https://github.com/python/cpython/issues/88831
Quote:
asyncio will only keep weak references to alive tasks (in _all_tasks). If a user does not keep a reference to a task and the task is not currently executing or sleeping, the user may get "Task was destroyed but it is pending!".
So the answer to my question is, unfortunately, yes. One has to keep around a reference to the scheduled task.
However, the github issue also describes a relatively simple workaround: Keep all running tasks in a set() and add a callback to the task which removes itself from the set() again.
running_tasks = set()
# [...]
task = asyncio.create_task(some_background_function())
running_tasks.add(task)
task.add_done_callback(lambda t: running_tasks.remove(t))
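As a side note, the asyncio documentation now shows this same pattern, using the set's own discard method as the done-callback instead of the lambda (discard doesn't raise if the task was already removed):

task.add_done_callback(running_tasks.discard)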
In Python 3.11, there is a new API, asyncio.TaskGroup.create_task.
It does the reference-keeping that the other answer describes, so you don't need to do it yourself.
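A minimal sketch (Python 3.11+, my own illustration); note that, unlike a bare create_task, the async with block also waits for all of its tasks to finish before continuing:

import asyncio

async def some_background_function():
    await asyncio.sleep(1.0)

async def main():
    async with asyncio.TaskGroup() as tg:
        for _ in range(3):
            tg.create_task(some_background_function())
    # all three tasks are guaranteed to have completed here

asyncio.run(main())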

Basic asyncio not executing second task

In this minimal example, I expect the program to print foo and fuu as the tasks are scheduled.
import asyncio

async def coroutine():
    while True:
        print("foo")

async def main():
    asyncio.create_task(coroutine())
    # asyncio.run_coroutine_threadsafe(coroutine(), asyncio.get_running_loop())
    while True:
        print("fuu")

asyncio.run(main())
The program will only write fuu.
My aim is to simply have two tasks executing concurrently, but it seems like the created task is never scheduled.
I also tried using run_coroutine_threadsafe with no success.
If I add await asyncio.sleep(1) to main, the created task takes over execution and the program will only write foo.
What am I supposed to do to run two tasks simultaneously using asyncio?
I love this question, and the explanation of it shows how asyncio and Python work.
Spoiler: it works similarly to JavaScript's single-threaded runtime.
So, let's look at your code.
You only have one main thread, and it runs continuously, since the Python scheduler doesn't have to switch between threads.
Now, the thing is, when your main thread creates a task, it actually creates a coroutine (a green thread, managed by the Python scheduler rather than the OS scheduler) which needs the main thread in order to be executed.
But the main thread is never free: because you have put while True in main, it is never free to execute anything else, and your task never gets executed because the scheduler is busy running the while True code.
The moment you put in a sleep, the scheduler detects that the current task is sleeping, does the context switch, and your coroutine kicks in.
My suggestion: if your tasks are I/O heavy, use tasks/coroutines (see the sketch below), and if they are CPU heavy, which is the case here (while True), create Python threads or processes instead; the OS scheduler will then take care of running your tasks, and they will get a CPU slice to run while True.
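To illustrate the coroutine route, here is a minimal sketch (my own, not from the question) of the original code with an await added to each loop, so the event loop gets a chance to switch between the two tasks:

import asyncio

async def coroutine():
    while True:
        print("foo")
        await asyncio.sleep(0)   # yield control back to the event loop

async def main():
    asyncio.create_task(coroutine())
    while True:
        print("fuu")
        await asyncio.sleep(0)   # yield control back to the event loop

asyncio.run(main())

With these awaits in place, the output alternates between fuu and foo.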

Python asyncio wait_for synchronous

Using Python 3.6.8
import asyncio
import time

async def sleeper():
    time.sleep(2)

async def asyncio_sleeper():
    await asyncio.sleep(2)

await asyncio.wait_for(sleeper(), 1)
await asyncio.wait_for(asyncio_sleeper(), 1)
Using time.sleep does NOT time out; asyncio.sleep does.
My intuition was that calling wait_for on a coroutine would base its timeout on how long the coroutine takes, not on the individual async calls within the coroutine.
What is going on behind the scenes that results in this behavior, and is there a way to modify the behavior to match my intuition?
What is going on behind the scenes that results in this behavior
The simplest answer is that asyncio is based on cooperative multitasking, and time.sleep doesn't cooperate. time.sleep(2) blocks the thread for two seconds, the event loop and all, and there is nothing anyone can do about it.
On the other hand, asyncio.sleep is carefully written so that when you await asyncio.sleep(2), it immediately suspends the current task and arranges with the event loop to resume it 2 seconds later. Asyncio's "sleeping" is implicit, which allows the event loop to proceed with other tasks while the coroutine is suspended. The same suspension system allows wait_for to cancel the task, which the event loop accomplishes by "resuming" it in such a way that the await where it was suspended raises an exception.
In general, a coroutine not awaiting anything is a good indication that it's incorrectly written and is a coroutine in name only. Awaits are the reason coroutines exist, and sleeper doesn't contain any.
is there a way to modify behavior to match my intuition?
If you must call legacy blocking code from asyncio, use run_in_executor. You will have to tell asyncio when you do so and allow it to execute the actual blocking call, like this:
async def sleeper():
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, time.sleep, 2)
time.sleep (or other blocking function) will be handed off to a separate thread, and sleeper will get suspended, to be resumed when time.sleep is done. Unlike with asyncio.sleep(), the blocking time.sleep(2) will still get called and block its thread for 2 seconds, but that will not affect the event loop, which will go about its business similar to how it did when await asyncio.sleep() was used.
Note that cancelling a coroutine that awaits run_in_executor will only cancel the waiting for the blocking time.sleep(2) to complete in the other thread. The blocking call will continue executing until completion, which is to be expected since there is no general mechanism to interrupt it.
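For illustration, here is a minimal self-contained sketch (my own, using the pre-3.7 API to match the question's Python 3.6.8) showing that the timeout now matches the intuition, while the worker thread still sleeps the full two seconds in the background:

import asyncio
import time

async def sleeper():
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, time.sleep, 2)

async def main():
    try:
        await asyncio.wait_for(sleeper(), 1)
    except asyncio.TimeoutError:
        print('timed out after 1 second')   # raised at the 1-second mark

asyncio.get_event_loop().run_until_complete(main())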

run infinite loop in asyncio event loop running off the main thread

I wrote code that seems to do what I want, but I'm not sure if it's a good idea since it mixes threads and event loops to run an infinite loop off the main thread. This is a minimal code snippet that captures the idea of what I'm doing:
import asyncio
import threading

msg = ""

async def infinite_loop():
    global msg
    while True:
        msg += "x"
        await asyncio.sleep(0.3)

def worker():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(infinite_loop())

t = threading.Thread(target=worker, daemon=True)
t.start()
The main idea is that I have an infinite loop manipulating a global variable each 0.3 s. I want this infinite loop to run off the main thread so I can still access the shared variable in the main thread. This is especially useful in jupyter, because if I call run_until_complete in the main thread I can't interact with jupyter anymore. I want the main thread available to interactively access and modify msg. Using async might seem unnecessary in my example, but I'm using a library that has async code to run a server, so it's necessary. I'm new to async and threading in python, but I remember reading / hearing somewhere that using threading with asyncio is asking for trouble... is this a bad idea? Are there any potential concurrency issues with my approach?
I'm new to async and threading in python, but I remember reading / hearing somewhere that using threading with asyncio is asking for trouble...
Mixing asyncio and threading is discouraged for beginners because it leads to unnecessary complications and often stems from a lack of understanding of how to use asyncio correctly. Programmers new to asyncio often reach for threads by habit, using them for tasks for which coroutines would be more suitable.
But if you have a good reason to spawn a thread that runs the asyncio event loop, by all means do so - there is nothing that requires the asyncio event loop to be run in the main thread. Just be careful to interact with the event loop itself (call methods such as call_soon, create_task, stop, etc.) only from the thread that runs the event loop, i.e. from asyncio coroutines and callbacks. To safely interact with the event loop from the other threads, such as in your case the main thread, use loop.call_soon_threadsafe() or asyncio.run_coroutine_threadsafe().
Note that setting global variables and such doesn't count as "interacting" because asyncio doesn't observe those. Of course, it is up to you to take care of inter-thread synchronization issues, such as protecting access to complex mutable structures with locks.
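For instance, here is a minimal self-contained sketch (my own; set_msg is a hypothetical coroutine) of safely submitting work to the worker thread's loop from the main thread:

import asyncio
import threading

loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def set_msg(value):   # hypothetical coroutine; runs on the loop's thread
    print("setting msg to", value)

# Safe from the main thread: schedules the coroutine on the other thread's
# loop and returns a concurrent.futures.Future we can block on.
future = asyncio.run_coroutine_threadsafe(set_msg("reset"), loop)
future.result()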
is this a bad idea?
If unsure whether to mix threads and asyncio, you can ask yourself two questions:
Do I even need threads, given that asyncio provides coroutines that run in parallel and run_in_executor to await blocking code?
If I have threads providing parallelism, do I actually need asyncio?
Your question provides good answers to both - you need threads so that the main thread can interact with jupyter, and you need asyncio because you depend on a library that uses it.
Are there any potential concurrency issues with my approach?
The GIL ensures that setting a global variable in one thread and reading it in another is free of data races, so what you've shown should be fine.
If you add explicit synchronization, such as a multi-threaded queue or condition variable, you should keep in mind that the synchronization code must not block the event loop. In other words, you cannot just wait on, say, a threading.Event in an asyncio coroutine because that would block all coroutines. Instead, you can await an asyncio.Event, and set it using something like loop.call_soon_threadsafe(event.set) from the other thread.
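A minimal sketch of that last pattern (my own; the time.sleep is a crude stand-in for a proper handshake):

import asyncio
import threading
import time

stop_event = None   # will hold an asyncio.Event created on the loop's thread

async def wait_for_stop():
    global stop_event
    stop_event = asyncio.Event()
    await stop_event.wait()   # suspends this coroutine, not the whole loop
    print("stop requested")

loop = asyncio.new_event_loop()
t = threading.Thread(target=loop.run_until_complete, args=(wait_for_stop(),))
t.start()

time.sleep(0.5)   # crude handshake: give the loop time to create stop_event
loop.call_soon_threadsafe(stop_event.set)   # safely set the event from here
t.join()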
