asyncio is causing issues in my Spyder IDE, so I would like to replace it with the concurrent.futures library.
How can I replace the code below relying only on the concurrent.futures library?
asyncio.get_event_loop().run_until_complete(api(message))
exact function looks as follows
def async_loop(api, message):
return asyncio.get_event_loop().run_until_complete(api(message))
As written, you're starting up the event loop only until a particular task completes (which may or may not launch or wait on other tasks), and blocking until it completes. The only reason it's a task is that it needs to use async functions; those can only run in an event loop, and while running, they may launch other tasks or wait on other awaitables, and while waiting, the event loop can do other tasks.
In short, if not for the need to be an async task running in a non-async context, this would just be:
def async_loop(api, message):
return api(message)
which calls api and waits for it to complete.
Really, that's it. If the things api does or calls need to run some tasks asynchronously, without blocking on them immediately, you'd have some global executor, e.g.
executor = concurrent.futures.ThreadPoolExecutor()
which would be used to launch tasks with:
fut = executor.submit(callable, 'arg1', 'arg2', kwarg1='somevalue')
and, when the result of the task is needed, someone would call:
value = fut.result()
on it (which would block if it wasn't done yet, return the result if it completed without an exception, or raise the exception it died with if it died with an exception).
Whenever you no longer need the executor, you just call .shutdown() on it and it will wait for all outstanding tasks to complete. That's it.
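Putting those pieces together, a minimal sketch of the whole pattern (work is a made-up stand-in for whatever api actually does) could look like this:
import concurrent.futures
import time

def work(message, delay=0.5):
    # stand-in for whatever api(message) actually does
    time.sleep(delay)
    return f"processed {message!r}"

executor = concurrent.futures.ThreadPoolExecutor()

# launch the task without blocking
fut = executor.submit(work, 'hello', delay=0.2)

# ... do other things here ...

# block until the result is available (re-raises the task's exception, if any)
value = fut.result()
print(value)

# wait for all outstanding tasks to complete, then release the threads
executor.shutdown()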
As a side-note, the error you're experiencing is part of why they've deprecated get_event_loop() in 3.10 (and discouraged it since 3.7). In all likelihood, the simplest solution to your problem (avoiding a switch to threads, because all that means is you've got new problems) is to use the much simpler high-level API, asyncio.run (introduced in 3.7), which creates an event loop, runs the task in it to completion, does reasonable cleanup, then returns the result:
def async_loop(api, message):
return asyncio.run(api(message))
There's also the asyncio.get_running_loop function (the exact replacement for get_event_loop), which you use when an event loop already exists. You should typically know whether one does; event loops don't pop into existence in a given thread on their own, so you should know if you launched one. In this case you hadn't, so asyncio.run is the correct one to use.
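To illustrate the distinction, a small sketch (the api coroutine here is made up):
import asyncio

async def api(message):
    # inside a coroutine a loop is guaranteed to be running,
    # so get_running_loop() is the right call here
    loop = asyncio.get_running_loop()
    await asyncio.sleep(0.1)
    return f"handled {message!r} on {loop!r}"

def async_loop(api, message):
    # at the top level there is no running loop yet; asyncio.run creates one,
    # runs the coroutine to completion, cleans up, and returns the result
    return asyncio.run(api(message))

print(async_loop(api, 'ping'))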
Related
The documentation of asyncio.create_task() states the following warning:
Important: Save a reference to the result of this function, to avoid a task disappearing mid execution. (source)
My question is: Is this really true?
I have several IO bound "fire and forget" tasks which I want to run concurrently using asyncio by submitting them to the event loop using asyncio.create_task(). However, I do not really care for the return value of the coroutine or even if they run successfully, only that they do run eventually. One use case is writing data from an "expensive" calculation back to a Redis data base. If Redis is available, great. If not, oh well, no harm. This is why I do not want/need to await those tasks.
Here a generic example:
import asyncio
async def fire_and_forget_coro():
"""Some random coroutine waiting for IO to complete."""
print('in fire_and_forget_coro()')
await asyncio.sleep(1.0)
print('fire_and_forget_coro() done')
async def async_main():
"""Main entry point of asyncio application."""
print('in async_main()')
n = 3
for _ in range(n):
# create_task() does not block, returns immediately.
# Note: We do NOT save a reference to the submitted task here!
asyncio.create_task(fire_and_forget_coro(), name='fire_and_forget_coro')
print('awaiting sleep in async_main()')
    await asyncio.sleep(2.0) # <-- note this line
print('sleeping done in async_main()')
print('async_main() done.')
# all references of tasks we *might* have go out of scope when returning from this coroutine!
return
if __name__ == '__main__':
asyncio.run(async_main())
Output:
in async_main()
awaiting sleep in async_main()
in fire_and_forget_coro()
in fire_and_forget_coro()
in fire_and_forget_coro()
fire_and_forget_coro() done
fire_and_forget_coro() done
fire_and_forget_coro() done
sleeping done in async_main()
async_main() done.
When commenting out the await asyncio.sleep() line, we never see fire_and_forget_coro() finish. This is to be expected: when the event loop started with asyncio.run() closes, tasks will not be executed anymore. But it appears that as long as the event loop is still running, all tasks will be taken care of, even when I never explicitly created references to them. This seems logical to me, as the event loop itself must have a reference to all scheduled tasks in order to run them. And we can even get them all using asyncio.all_tasks()!
So, I think I can trust Python to have at least one strong reference to every scheduled task as long as the event loop it was submitted to is still running, and thus I do not have to manage references myself. But I would like a second opinion here. Am I right, or are there pitfalls I have not yet recognized?
If I am right, why the explicit warning in the documentation? It is a usual Python thing that stuff is garbage-collected if you do not keep a reference to it. Are there situations where one does not have a running event loop but still some task objects to reference? Maybe when creating an event loop manually (never did this)?
There is an open issue about this topic at the cpython bug tracker on GitHub which I just found:
https://github.com/python/cpython/issues/88831
Quote:
asyncio will only keep weak references to alive tasks (in _all_tasks). If a user does not keep a reference to a task and the task is not currently executing or sleeping, the user may get "Task was destroyed but it is pending!".
So the answer to my question is, unfortunately, yes. One has to keep around a reference to the scheduled task.
However, the github issue also describes a relatively simple workaround: Keep all running tasks in a set() and add a callback to the task which removes itself from the set() again.
running_tasks = set()
# [...]
task = asyncio.create_task(some_background_function())
running_tasks.add(task)
task.add_done_callback(lambda t: running_tasks.remove(t))
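For reference, a self-contained sketch of that pattern (I use set.discard as the done callback, which is equivalent here but slightly more forgiving than remove):
import asyncio

running_tasks = set()

async def fire_and_forget_coro():
    await asyncio.sleep(1.0)
    print('fire_and_forget_coro() done')

async def async_main():
    for _ in range(3):
        task = asyncio.create_task(fire_and_forget_coro())
        # keep a strong reference until the task finishes
        running_tasks.add(task)
        task.add_done_callback(running_tasks.discard)
    await asyncio.sleep(2.0)  # keep the event loop alive long enough for the demo

asyncio.run(async_main())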
In Python 3.11, there is a new API, asyncio.TaskGroup.create_task.
It does the bookkeeping that the other answer describes, so you don't need to do it yourself.
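For example, something along these lines (Python 3.11+; a minimal sketch):
import asyncio

async def fire_and_forget_coro():
    await asyncio.sleep(1.0)
    print('fire_and_forget_coro() done')

async def async_main():
    # The TaskGroup keeps references to its tasks and implicitly
    # awaits all of them before the `async with` block exits.
    async with asyncio.TaskGroup() as tg:
        for _ in range(3):
            tg.create_task(fire_and_forget_coro())
    print('all tasks finished')

asyncio.run(async_main())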
In this minimal example, I expect the program to print foo and fuu as the tasks are scheduled.
import asyncio
async def coroutine():
while True:
print("foo")
async def main():
asyncio.create_task(coroutine())
#asyncio.run_coroutine_threadsafe(coroutine(),asyncio.get_running_loop())
while True:
print("fuu")
asyncio.run(main())
The program will only write fuu.
My aim is to simply have two tasks executing concurrently, but it seems like the created task is never scheduled.
I also tried using run_coroutine_threadsafe with no success.
If I add await asyncio.sleep(1) to main, the created task takes over execution and the program will only write foo.
What am I supposed to do to run two tasks concurrently using asyncio?
I love this question, and the explanation of it shows how asyncio and Python work.
Spoiler: it works similarly to JavaScript's single-threaded runtime.
So, let's look at your code.
You only have one main thread running, and it keeps running continuously since the Python scheduler doesn't have to switch between threads.
Now, the thing is, your main thread that creates a task actually creates a coroutine (a green thread, managed by the Python scheduler and not the OS scheduler) which needs the main thread to get executed.
But the main thread is never free: because you have put while True, it never gets a chance to execute anything else, and your task never gets executed because the Python scheduler never does the switching; it is busy executing the while True code.
The moment you put a sleep in, the event loop detects that the current task is sleeping, does the context switch, and your coroutine kicks in.
My suggestion: if your tasks are I/O heavy, use tasks/coroutines, and if they are CPU heavy, which is the case here (while True), create either Python threads or processes; then the OS scheduler will take care of running your tasks, and each will get a CPU slice to run its while True loop.
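For instance, a rough sketch of that suggestion with plain threads (not the asker's exact code; the sleeps just keep the output readable):
import threading
import time

def forever(s):
    while True:
        print(s)
        time.sleep(0.5)

# the OS scheduler interleaves the two busy loops
t1 = threading.Thread(target=forever, args=('foo',), daemon=True)
t2 = threading.Thread(target=forever, args=('fuu',), daemon=True)
t1.start()
t2.start()
time.sleep(3)  # let the daemon threads run for a while before the program exits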
I see that the asyncio.to_thread() method has been added in Python 3.9+. Its description says it runs blocking code in a separate thread so it can run at the same time as other work. See the example below:
import asyncio
import time

def blocking_io():
print(f"start blocking_io at {time.strftime('%X')}")
# Note that time.sleep() can be replaced with any blocking
# IO-bound operation, such as file operations.
time.sleep(1)
print(f"blocking_io complete at {time.strftime('%X')}")
async def main():
print(f"started main at {time.strftime('%X')}")
await asyncio.gather(
asyncio.to_thread(blocking_io),
asyncio.sleep(1))
print(f"finished main at {time.strftime('%X')}")
asyncio.run(main())
# Expected output:
#
# started main at 19:50:53
# start blocking_io at 19:50:53
# blocking_io complete at 19:50:54
# finished main at 19:50:54
By this explanation, it seems to use a thread mechanism rather than context switching or coroutines. Does this mean it is not actually async after all? Is it the same as traditional multi-threading as in concurrent.futures.ThreadPoolExecutor? What is the benefit of using a thread this way then?
The source code of to_thread is quite simple. It boils down to awaiting run_in_executor with a default executor (executor argument is None), which is a ThreadPoolExecutor.
In fact, yes, this is traditional multithreading: the code intended to run in a separate thread is not asynchronous, but to_thread allows you to await its result asynchronously.
Also note that the function runs in the context of the current task, so its context variable values will be available inside the func.
async def to_thread(func, /, *args, **kwargs):
"""Asynchronously run function *func* in a separate thread.
Any *args and **kwargs supplied for this function are directly passed
    to *func*. Also, the current :class:`contextvars.Context` is propagated,
allowing context variables from the main thread to be accessed in the
separate thread.
Return a coroutine that can be awaited to get the eventual result of *func*.
"""
loop = events.get_running_loop()
ctx = contextvars.copy_context()
func_call = functools.partial(ctx.run, func, *args, **kwargs)
return await loop.run_in_executor(None, func_call)
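A small sketch of that context propagation (the request_id variable is made up):
import asyncio
import contextvars

request_id = contextvars.ContextVar('request_id', default=None)

def blocking_io():
    # runs in a worker thread, but sees the value set in the event-loop thread
    return request_id.get()

async def main():
    request_id.set('abc123')
    print(await asyncio.to_thread(blocking_io))  # prints 'abc123'

asyncio.run(main())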
You would use asyncio.to_thread whenever you need to call a blocking API from a third-party lib that either does not have an asyncio adapter/interface, or where you do not want to create one because you just need to use a limited number of functions from that lib.
A concrete example: I am currently writing an application that will eventually run as a daemon, at which point it will use asyncio for its core event loop. The event loop will involve monitoring a unix socket for notifications, which will trigger the daemon to take an action.
For rapid prototyping it's currently a CLI, but one of the dependencies/external systems the daemon will interact with is called libvirt, an abstraction layer for virtual machine management written in C with a Python wrapper called libvirt-python.
The Python bindings are blocking and communicate with the libvirt daemon over a separate unix socket with a blocking request-response protocol.
You can conceptually think of a call to the libvirt bindings as each function internally making an HTTP request to a server and waiting for the server to complete the action. The exact mechanics of how it does that are not important for this discussion, just that it's a blocking IO operation that depends on an external process that may take some time; i.e. this is not a CPU-bound call and therefore it can be offloaded to a thread and awaited.
If I were to directly call “domains = libvirt.conn.listAllDomains()” in an async function, that would block my asyncio event loop until I got a response from libvirt. So if any events were received on the unix socket my main loop is monitoring, they would not be processed while we are waiting for the libvirt daemon to look up all domains and return the list of them to us.
If I use “domains = await asyncio.to_thread(libvirt.conn.listAllDomains)” instead, the await call will suspend my current coroutine until we get the response, yielding execution back to the asyncio event loop. That means if the daemon receives a notification while we are waiting on libvirt, it can be scheduled to run concurrently instead of being blocked.
In my application I will also need to read and write Linux special files in /sys. Linux has native aio file support which can be used with asyncio via aiofile; however, Linux does not support the aio interface for managing special files, so I would have to use blocking IO.
One way to do that in an async application would be to wrap the function that writes to the special files in asyncio.to_thread.
I could, and might, use a decorator to call run_in_executor directly since I own the write_sysfs function, but if I did not, then to_thread is more polite than monkeypatching someone else's lib and less work than creating my own wrapper API.
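A sketch of what such a decorator could look like (write_sysfs here is a stand-in for my real function, not a library API):
import asyncio
import functools

def unblock(func):
    """Wrap a blocking function so it can be awaited from a coroutine."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        # None -> use the event loop's default ThreadPoolExecutor
        return await loop.run_in_executor(None, call)
    return wrapper

@unblock
def write_sysfs(path, value):
    # stand-in for the real blocking write to a /sys special file
    with open(path, 'w') as f:
        f.write(value)

# usage inside a coroutine:
#     await write_sysfs('/sys/class/leds/.../brightness', '1')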
Hopefully those are useful examples of where you might want to use to_thread. It's really just a convenience function, and you can use run_in_executor to do the same thing with no additional overhead.
If you need to support older Python releases you might also prefer run_in_executor, since it predates the introduction of to_thread, but if you can assume 3.9+ then it's a nice addition to leverage when you need to.
It's my understanding that asyncio.gather is intended to run its arguments concurrently and also that when a coroutine executes an await expression it provides an opportunity for the event loop to schedule other tasks. With that in mind, I was surprised to see that the following snippet ignores one of the inputs to asyncio.gather.
import asyncio
async def aprint(s):
print(s)
async def forever(s):
while True:
await aprint(s)
async def main():
await asyncio.gather(forever('a'), forever('b'))
asyncio.run(main())
As I understand it, the following things happen:
asyncio.run(main()) does any necessary global initialization of the event loop and schedules main() for execution.
main() schedules asyncio.gather(...) for execution and waits for its result
asyncio.gather schedules the executions of forever('a') and forever('b')
whichever of those executes first, they immediately await aprint() and give the scheduler the opportunity to run another coroutine if desired (e.g. if we start with 'a' then we have a chance to start evaluating 'b', which should already be scheduled for execution).
In the output we'll see a stream of lines each containing 'a' or 'b', and the scheduler ought to be fair enough that we see at least one of each over a long enough period of time.
In practice this isn't what I observe. Instead, the entire program is equivalent to while True: print('a'). What I found extremely interesting is that even minor changes to the code seem to reintroduce fairness. E.g., if we instead have the following code then we get a roughly equal mix of 'a' and 'b' in the output.
async def forever(s):
while True:
await aprint(s)
await asyncio.sleep(1.)
To verify that it doesn't have anything to do with how long we spend in vs. out of the infinite loop, I found that the following change also provides fairness.
async def forever(s):
while True:
await aprint(s)
await asyncio.sleep(0.)
Does anyone know why this unfairness might happen and how to avoid it? I suppose when in doubt I could proactively add an empty sleep statement everywhere and hope that suffices, but it's incredibly non-obvious to me why the original code doesn't behave as expected.
In case it matters since asyncio seems to have gone through quite a few API changes, I'm using a vanilla installation of Python 3.8.4 on an Ubuntu box.
whichever of those executes first, they immediately await aprint() and give the scheduler the opportunity to run another coroutine if desired
This part is a common misconception. Python's await doesn't mean "yield control to the event loop"; it means "start executing the awaitable, allowing it to suspend us along with it". So yes, if the awaited object chooses to suspend, the current coroutine will suspend as well, and so will the coroutine that awaits it and so on, all the way up to the event loop. But if the awaited object doesn't choose to suspend, as is the case with aprint, neither will the coroutine that awaits it. This is occasionally a source of bugs.
Does anyone know why this unfairness might happen and how to avoid it?
Fortunately this effect is most pronounced in toy examples that don't really communicate with the outside world. And although you can fix them by adding await asyncio.sleep(0) to strategic places (which is even documented to force a context switch), you probably shouldn't do that in production code.
A real program will depend on input from the outside world, be it data coming from the network, from a local database, or from a work queue populated by another thread or process. Actual data will rarely arrive so fast to starve the rest of the program, and if it does, the starvation will likely be temporary because the program will eventually suspend due to backpressure from its output side. In the rare possibility that the program receives data from one source faster than it can process it, but still needs to observe data coming from another source, you could have a starvation issue, but that can be fixed with forced context switches if it is ever shown to occur. (I haven't heard of anyone encountering it in production.)
Aside from bugs mentioned above, what happens much more often is that a coroutine invokes CPU-heavy or legacy blocking code, and that ends up hogging the event loop. Such situations should be handled by passing the CPU/blocking part to run_in_executor.
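For example, something like this (a sketch; the hashing loop is just a stand-in for CPU-heavy work):
import asyncio
import concurrent.futures
import hashlib

def cpu_heavy(data):
    # stand-in for CPU-bound work that would otherwise hog the event loop
    for _ in range(200_000):
        data = hashlib.sha256(data).digest()
    return data.hex()

async def main():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as pool:
        # the event loop stays responsive while the work runs in another process
        result = await loop.run_in_executor(pool, cpu_heavy, b'payload')
    print(result)

if __name__ == '__main__':
    asyncio.run(main())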
I would like to draw attention to PEP 492, that says:
await, similarly to yield from, suspends execution of [...] coroutine until [...] awaitable completes and returns the result data.
It uses the yield from implementation with an extra step of validating its argument.
Any yield from chain of calls ends with a yield. This is a fundamental mechanism of how Futures are implemented. Since, internally, coroutines are a special kind of generators, every await is suspended by a yield somewhere down the chain of await calls (please refer to PEP 3156 for a detailed explanation).
But in your case async def aprint() does not yield; that is, it does not call any event function like I/O, or just await sleep(0), which, if we look at its source code, just does a yield:
@types.coroutine
def __sleep0():
"""Skip one event loop run cycle.
This is a private helper for 'asyncio.sleep()', used
when the 'delay' is set to 0. It uses a bare 'yield'
expression (which Task.__step knows how to handle)
instead of creating a Future object.
"""
yield
async def sleep(delay, result=None, *, loop=None):
"""Coroutine that completes after a given time (in seconds)."""
if delay <= 0:
await __sleep0()
return result
...
Thus, because of the while True: loop in forever, you could say you make a chain of yield from calls that does not end with a yield.
The two classes (asyncio.Future and concurrent.futures.Future) represent excellent abstractions for concurrent programming, so it's a bit disconcerting that they don't support the same API.
Specifically, according to the docs:
asyncio.Future is almost compatible with concurrent.futures.Future.
Differences:
result() and exception() do not take a timeout argument and raise an exception when the future isn’t done yet.
Callbacks registered with add_done_callback() are always called via the event loop's call_soon_threadsafe().
This class is not compatible with the wait() and as_completed() functions in the concurrent.futures package.
The above list is actually incomplete; there are a couple more differences:
running() method is absent
result() and exception() may raise InvalidStateError if called too early
Are any of these due to the inherent nature of an event loop that makes these operations either useless or too troublesome to implement?
And what is the meaning of the difference related to add_done_callback()? Either way, the callback is guaranteed to happen at some unspecified time after the futures is done, so isn't it perfectly consistent between the two classes?
The core reason for the difference is in how threads (and processes) handle blocks vs how coroutines handle events that block. In threading, the current thread is suspended until whatever condition resolves and the thread can go forward. For example in the case of the futures, if you request the result of a future, it's fine to suspend the current thread until that result is available.
However the concurrency model of an event loop is that rather than suspending code, you return to the event loop and get called again when ready. So it is an error to request the result of an asyncio future that doesn't have a result ready.
You might think that the asyncio future could just wait and while that would be inefficient, would it really be all that bad for your coroutine to block? It turns out though that having the coroutine block is very likely to mean that the future never completes. It is very likely that the future's result will be set by some code associated with the event loop running the code that requests the result. If the thread running that event loop blocks, no code associated with the event loop would run. So blocking on the result would deadlock and prevent the result from being produced.
So, yes, the differences in interface are due to this inherent difference. As an example, you wouldn't want to use an asyncio future with the concurrent.futures waiter abstraction because again that would block the event loop thread.
The add_done_callbacks difference guarantees that callbacks will be run in the event loop. That's desirable because they will get the event loop's thread local data. Also, a lot of coroutine code assumes that it will never be run at the same time as other code from the same event loop. That is, coroutines are only thread safe under the assumption that two coroutines from the same event loop do not run at the same time. Running the callbacks in the event loop avoids a lot of thread safety issues and makes it easier to write correct code.
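A small sketch of that behaviour (just to show when the callback fires, nothing more):
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    fut.add_done_callback(lambda f: print('callback:', f.result()))
    fut.set_result(42)
    # the callback was only scheduled on the event loop, not invoked here
    print('result set, callback not called yet')
    await asyncio.sleep(0)  # yield to the loop so it can run the callback
    print('after yielding to the loop')

asyncio.run(main())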
concurrent.futures.Future provides a way to share results between different threads and processes usually when you use Executor.
asyncio.Future solves the same task, but for coroutines, which are actually a special sort of function, usually running in one process/thread asynchronously. "Asynchronously" in this context means that the event loop manages the execution flow of these coroutines: it may suspend execution inside one coroutine, start executing another coroutine, and later return to executing the first one - everything usually in one thread/process.
These objects (and many other threading/asyncio objects like Lock, Event, Semaphore etc.) look similar because the idea of concurrency in your code with threads/processes and coroutines is similar.
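A rough sketch of where each kind of future typically comes from (illustrative only):
import asyncio
import concurrent.futures

def blocking_work():
    return 'from a worker thread'

async def coro_work():
    return 'from a coroutine'

# concurrent.futures.Future: produced by an Executor, consumed by blocking on .result()
with concurrent.futures.ThreadPoolExecutor() as executor:
    cf_future = executor.submit(blocking_work)
    print(cf_future.result())  # blocks the current thread until the result is ready

# asyncio.Future: lives on an event loop and is awaited instead of blocked on
async def main():
    aio_task = asyncio.ensure_future(coro_work())  # a Task is an asyncio.Future
    print(await aio_task)  # suspends this coroutine, letting the loop run other work

asyncio.run(main())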
I think the main reason the objects are different is historical: asyncio was created much later than threading and concurrent.futures. It's probably impossible to change concurrent.futures.Future to work with asyncio without breaking the class's API.
Should both classes be one in an "ideal world"? This is probably a debatable issue, but I see many disadvantages to that: while asyncio and threading look similar at first glance, they're very different in many ways, including internal implementation and the way of writing asyncio/non-asyncio code (see the async/await keywords).
I think it's probably for the best that the classes are different: we clearly separate ways of concurrency that are different by nature (even if their similarity looks strange at first).