Do I use async all the way down?

Do I use async all the way down? - python

Here's a toy example to illustrate my question: Suppose I have a function get_all_foos() which calls get_one_foo() several times. Each call to get_one_foo() calls do_sql_query(), which actually retrieves one foo by calling await db_conn.fetch(<some sql>).
Which of my three functions should be marked async? Does it add overhead if I mark them all async?
My mental model is that calling a function with await adds some sort of scheduling overhead, and doesn't make sense if that function doesn't do any IO itself, it only calls other functions that eventually do IO. But my mental model still may be left over from threading frameworks, rather than coroutine frameworks.

The answer to the question from the summary is "yes" - you do need async all the way down.
My mental model is that calling a function with await adds some sort of scheduling overhead, and doesn't make sense if that function doesn't do any IO itself, it only calls other functions that eventually do IO.
Fortunately this is not correct. A simple await doesn't (necessarily) do any scheduling - it doesn't even drop into the event loop unless the awaited function chooses to suspend. An await immediately starts executing the awaitee and is no greater performance hit than starting to execute a generator (which is how awaiting is implemented).
This can sometimes even be a problem because a code that awaits something that never suspends actually ends up blocking the event loop.

Related

Is yielding from inside a nursery in an asynchronous generator function bad?

I was told that the following code is not safe, because it is not allowed to have an asynchronous generator that yields from inside a nursery, except if it is an asynchronous context manager.
T = TypeVar('T')
async def delay(interval: float, source: AsyncIterable[T]) -> AsyncIterable[T]:
"""Delays each item in source by an interval.
Received items are temporarily stored in an unbounded queue, along with a timestamp, using
a background task. The foreground task takes items from the queue, and waits until the
item is older than the given interval and then yields it."""
send_channel, receive_channel = trio.open_memory_channel(math.inf)
async def pull_task():
async with aclosing(source) as agen:
async for item in agen:
send_channel.send_nowait((item, trio.current_time() + interval))
async with trio.open_nursery() as nursery:
nursery.start_soon(pull_task)
async with receive_channel:
async for item, timestamp in receive_channel:
now = trio.current_time()
if timestamp > now:
await trio.sleep(timestamp - now)
yield item
I have trouble understanding how this can possibly break. If anyone can provide an example code that uses this exact generator function, which demonstrates the unsafeness, it would be greatly appreciated and rewarded.
The goal of above code, is to delay processing of an asynchronous sequence, without applying any backpressure. If you can demonstrate that this code does not work like I would expect, that would also be appreciated.
Thank you.

Unfortunately, that's correct – yield inside a nursery or cancel scope isn't supported, except in the narrow cases of using #contextlib.asynccontextmanager to create an async context manager or writing an async pytest fixture.
There are several reasons for this. Some of them are technical: Trio has to keep track of which nurseries/cancel scopes are currently "active" on the stack, and when you yield out of one then it breaks the nesting, and Trio has no way to know that you've done this. (There's no way for a library to detect a yield out of a context manager.)
But there's also a fundamental, unsolveable reason, which is that the whole idea of Trio and structured concurrency is that every task "belongs" to a parent task that can receive notification if the child task crashes. But when you yield in a generator, the generator frame gets frozen and detached from the current task – it might resume in another task, or never resume at all. So when you yield, that breaks that link between all the child tasks in the nursery and their parents. There's just no way to reconcile that with the principles of structured concurrency.
Over in the Trio chat, Joshua Oreman gave a specific example that breaks in your case:
if I run the following
async def arange(*args):
for val in range(*args):
yield val
async def break_it():
async with aclosing(delay(0, arange(3))) as aiter:
with trio.move_on_after(1):
async for value in aiter:
await trio.sleep(0.4)
print(value)
trio.run(break_it)
then I get
RuntimeError: Cancel scope stack corrupted: attempted to exit
<trio.CancelScope at 0x7f364621c280, active, cancelled> in <Task
'__main__.break_it' at 0x7f36462152b0> that's still within its child
<trio.CancelScope at 0x7f364621c400, active>
This is probably a bug in your code, that has caused Trio's internal
state to become corrupted. We'll do our best to recover, but from now
on there are no guarantees.
Typically this is caused by one of the following:
- yielding within a generator or async generator that's opened a cancel
scope or nursery (unless the generator is a #contextmanager or
#asynccontextmanager); see https://github.com/python-trio/trio/issues/638 [...]
By changing the timeouts and delay so that the timeout expired while
inside the generator rather than while outside of it, I was able to
get a different error also: trio.MultiError: Cancelled(), GeneratorExit() raised out of aclosing()
There's also a long discussion about all these issues here, which is where we figured out that this just can't be supported: https://github.com/python-trio/trio/issues/264
It's an unfortunate situation, both because it's a shame that we can't support it, and even worse that it looks like it works in simple cases, so folks can end up writing a lot of code that uses this trick before realizing that it doesn't work :-(
Our plan is to make the illegal cases give an obvious error immediately when you try to yield, to at least avoid the second problem. But, this will take a while because it requires adding some extra hooks to the Python interpreter.
It is also possible to create a construct that's almost as easy to write and use as async generators, but that avoids this problem. The idea is that instead of pushing and popping the generator from the stack of the task that's consuming it, you instead run the "generator" code as a second task that feeds the consumer task values. See the thread starting here for more details.

When is asyncio's default scheduler fair?

It's my understanding that asyncio.gather is intended to run its arguments concurrently and also that when a coroutine executes an await expression it provides an opportunity for the event loop to schedule other tasks. With that in mind, I was surprised to see that the following snippet ignores one of the inputs to asyncio.gather.
import asyncio
async def aprint(s):
print(s)
async def forever(s):
while True:
await aprint(s)
async def main():
await asyncio.gather(forever('a'), forever('b'))
asyncio.run(main())
As I understand it, the following things happen:
asyncio.run(main()) does any necessary global initialization of the event loop and schedules main() for execution.
main() schedules asyncio.gather(...) for execution and waits for its result
asyncio.gather schedules the executions of forever('a') and forever('b')
whichever of the those executes first, they immediately await aprint() and give the scheduler the opportunity to run another coroutine if desired (e.g. if we start with 'a' then we have a chance to start trying to evaluate 'b', which should already be scheduled for execution).
In the output we'll see a stream of lines each containing 'a' or 'b', and the scheduler ought to be fair enough that we see at least one of each over a long enough period of time.
In practice this isn't what I observe. Instead, the entire program is equivalent to while True: print('a'). What I found extremely interesting is that even minor changes to the code seem to reintroduce fairness. E.g., if we instead have the following code then we get a roughly equal mix of 'a' and 'b' in the output.
async def forever(s):
while True:
await aprint(s)
await asyncio.sleep(1.)
Verifying that it doesn't seem to have anything to do with how long we spend in vs out of the infinite loop I found that the following change also provides fairness.
async def forever(s):
while True:
await aprint(s)
await asyncio.sleep(0.)
Does anyone know why this unfairness might happen and how to avoid it? I suppose when in doubt I could proactively add an empty sleep statement everywhere and hope that suffices, but it's incredibly non-obvious to me why the original code doesn't behave as expected.
In case it matters since asyncio seems to have gone through quite a few API changes, I'm using a vanilla installation of Python 3.8.4 on an Ubuntu box.

whichever of the those executes first, they immediately await aprint() and give the scheduler the opportunity to run another coroutine if desired
This part is a common misconception. Python's await doesn't mean "yield control to the event loop", it means "start executing the awaitable, allowing it to suspend us along with it". So yes, if the awaited object chooses to suspend, the current coroutine will suspend as well, and so will the coroutine that awaits it and so on, all the way to the event loop. But if the awaited object doesn't choose to suspend, as is the case with aprint, neither will the coroutine that awaits it. This is occasionally a source of bugs, as seen here or here.
Does anyone know why this unfairness might happen and how to avoid it?
Fortunately this effect is most pronounced in toy examples that don't really communicate with the outside world. And although you can fix them by adding await asyncio.sleep(0) to strategic places (which is even documented to force a context switch), you probably shouldn't do that in production code.
A real program will depend on input from the outside world, be it data coming from the network, from a local database, or from a work queue populated by another thread or process. Actual data will rarely arrive so fast to starve the rest of the program, and if it does, the starvation will likely be temporary because the program will eventually suspend due to backpressure from its output side. In the rare possibility that the program receives data from one source faster than it can process it, but still needs to observe data coming from another source, you could have a starvation issue, but that can be fixed with forced context switches if it is ever shown to occur. (I haven't heard of anyone encountering it in production.)
Aside from bugs mentioned above, what happens much more often is that a coroutine invokes CPU-heavy or legacy blocking code, and that ends up hogging the event loop. Such situations should be handled by passing the CPU/blocking part to run_in_executor.

I would like to draw attention to PEP 492, that says:
await, similarly to yield from, suspends execution of [...] coroutine until [...] awaitable completes and returns the result data.
It uses the yield from implementation with an extra step of validating its argument.
Any yield from chain of calls ends with a yield. This is a fundamental mechanism of how Futures are implemented. Since, internally, coroutines are a special kind of generators, every await is suspended by a yield somewhere down the chain of await calls (please refer to PEP 3156 for a detailed explanation).
But in your case async def aprint() does not yield, that is, it does not call any event function like I/O or just await sleep(0), which, if we look at it's source code, just does yield:
#types.coroutine
def __sleep0():
"""Skip one event loop run cycle.
This is a private helper for 'asyncio.sleep()', used
when the 'delay' is set to 0. It uses a bare 'yield'
expression (which Task.__step knows how to handle)
instead of creating a Future object.
"""
yield
async def sleep(delay, result=None, *, loop=None):
"""Coroutine that completes after a given time (in seconds)."""
if delay <= 0:
await __sleep0()
return result
...
Thus, because of the forever while True:, we could say, you make a chain of yield from that does not end with a yield.

How to have something similar to Javascript callbacks in Python

I am trying to get my head around Python asyncio . This is a simple program i wrote . The logic i am trying to simulate is as follows:
I get a list of names from some database. Since we are going to do something with those names after we get them hence i made it a simple function and not an asynchronous one.
After we get the data we again make a call to some external API using the name that we have. Now since this would be an expensive operation from IO standpoint and the API calls for individual names don't depend on each other it makes sense to make them anonymous.
I looked up this thread in Stackoverflow(Cooperative yield in asyncio) which says that to give back control to the event loop to do something else we have to do asyncio.sleep(0).
Here i am comparing the async behaviour of Node.js and Python. If i give back control to the event loop using the above syntax my long running API call would remain suspended right and would not happen in the background as in Node.js?
In Node.js when we make an external API call we get something back called Promises on which we can wait to finish . It essentially means that the database call or API call is happening in the background and we get back something when it is done.
Am i missing something critical concept here about Python asynchronous programming ? Kindly throw some more light on this .
Below is the code and its output:
import asyncio
import time
async def get_message_from_api(name):
# This is supposed to be a long running operation like getting data from external API
print(f"Attempting to yield control to the other tasks....for {name}")
await asyncio.sleep(0)
time.sleep(2)
return f"Creating message for {name}"
async def simulate_long_ops(name):
print(f"Long running operation starting for {name}")
message = await get_message_from_api(name)
print(f"The message returned by the long running operation is {message}")
def get_data_from_database():
return ["John", "Mary", "Sansa", "Tyrion"]
async def main():
names = get_data_from_database()
futures = []
for name in names:
futures.append(loop.create_task(simulate_long_ops(name)))
await asyncio.wait(futures)
if __name__ == '__main__':
try:
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
except Exception as e:
print(e)
finally:
loop.close()
Output:
Long running operation starting for John
Attempting to yield control to the other tasks....for John
Long running operation starting for Mary
Attempting to yield control to the other tasks....for Mary
Long running operation starting for Sansa
Attempting to yield control to the other tasks....for Sansa
Long running operation starting for Tyrion
Attempting to yield control to the other tasks....for Tyrion
The message returned by the long running operation is Creating message for John
The message returned by the long running operation is Creating message for Mary
The message returned by the long running operation is Creating message for Sansa
The message returned by the long running operation is Creating message for Tyrion

The mistake in your code is that you call time.sleep. You should never call that function in asyncio code, it blocks the whole event loop; await asyncio.sleep() instead. In JavaScript terms, calling time.sleep is almost as bad as sleeping like this instead of like this. (I say "almost" because time.sleep at least doesn't burn CPU cycles while waiting.)
Attempts to work around that mistake led to the second problem, the use of asyncio.sleep(0) to give control to the event loop. Although the idiom was added early, the behavior was documented only much later. As Guido hints in the original issue, explicit yielding to the event loop is only appropriate for advanced usage and its use by beginners is most likely an error. If your long-running operation is async ― as is the case in your code, once time.sleep() is replaced with await asyncio.sleep() ― you don't need to drop to the event loop manually. Instead, the async operation will drop as needed on every await, just like it would in JavaScript.
In Node.js when we make an external API call we get something back called Promises on which we can wait to finish.
In Python a future is a close counterpart, and the async models are very similar. One significant difference is that Python's async functions don't return scheduled futures, but lightweight coroutine objects which you must either await or pass to asyncio.create_task() to get them to run. Since your code does the latter, it looks correct.
The return value of create_task is an object that implements the Future interface. Future sports an add_done_callback method with the semantics you'd expect. But it's much better to simply await the future instead - it makes the code more readable and it's clear where the exceptions go.
Also, you probably want to use asyncio.gather() rather than asyncio.wait() to ensure that exceptions do not go unnoticed. If you are using Python 3.7, consider using asyncio.run() to run the async main function.

Why is asyncio.Future incompatible with concurrent.futures.Future?

The two classes represent excellent abstractions for concurrent programming, so it's a bit disconcerting that they don't support the same API.
Specifically, according to the docs:
asyncio.Future is almost compatible with concurrent.futures.Future.
Differences:
result() and exception() do not take a timeout argument and raise an exception when the future isn’t done yet.
Callbacks registered with add_done_callback() are always called via the event loop's call_soon_threadsafe().
This class is not compatible with the wait() and as_completed() functions in the concurrent.futures package.
The above list is actually incomplete, there are a couple more differences:
running() method is absent
result() and exception() may raise InvalidStateError if called too early
Are any of these due to the inherent nature of an event loop that makes these operations either useless or too troublesome to implement?
And what is the meaning of the difference related to add_done_callback()? Either way, the callback is guaranteed to happen at some unspecified time after the futures is done, so isn't it perfectly consistent between the two classes?

The core reason for the difference is in how threads (and processes) handle blocks vs how coroutines handle events that block. In threading, the current thread is suspended until whatever condition resolves and the thread can go forward. For example in the case of the futures, if you request the result of a future, it's fine to suspend the current thread until that result is available.
However the concurrency model of an event loop is that rather than suspending code, you return to the event loop and get called again when ready. So it is an error to request the result of an asyncio future that doesn't have a result ready.
You might think that the asyncio future could just wait and while that would be inefficient, would it really be all that bad for your coroutine to block? It turns out though that having the coroutine block is very likely to mean that the future never completes. It is very likely that the future's result will be set by some code associated with the event loop running the code that requests the result. If the thread running that event loop blocks, no code associated with the event loop would run. So blocking on the result would deadlock and prevent the result from being produced.
So, yes, the differences in interface are due to this inherent difference. As an example, you wouldn't want to use an asyncio future with the concurrent.futures waiter abstraction because again that would block the event loop thread.
The add_done_callbacks difference guarantees that callbacks will be run in the event loop. That's desirable because they will get the event loop's thread local data. Also, a lot of coroutine code assumes that it will never be run at the same time as other code from the same event loop. That is, coroutines are only thread safe under the assumption that two coroutines from the same event loop do not run at the same time. Running the callbacks in the event loop avoids a lot of thread safety issues and makes it easier to write correct code.

concurrent.futures.Future provides a way to share results between different threads and processes usually when you use Executor.
asyncio.Future solves same task but for coroutines, that are actually some special sort of functions running usually in one process/thread asynchronously. "Asynchronously" in current context means that event loop manages code executing flow of this coroutines: it may suspend execution inside one coroutine, start executing another coroutine and later return to executing first one - everything usually in one thread/process.
These objects (and many other threading/asyncio objects like Lock, Event, Semaphore etc.) look similar because the idea of concurrency in your code with threads/processes and coroutines is similar.
I think the main reason objects are different is historical: asyncio was created much later then threading and concurrent.futures. It's probably impossible to change concurrent.futures.Future to work with asyncio without breaking class API.
Should both classes be one in "ideal world"? This is probably debatable issue, but I see many disadvantages of that: while asyncio and threading look similar at first glance, they're very different in many ways, including internal implementation or way of writing asyncio/non-asyncio code (see async/await keywords).
I think it's probably for the best that classes are different: we clearly split different by nature ways of concurrency (even if their similarity looks strange at first).

Is this multi-threaded function asynchronous

I'm afraid I'm still a bit confused (despite checking other threads) whether:
all asynchronous code is multi-threaded
all multi-threaded functions are asynchronous
My initial guess is no to both and that proper asynchronous code should be able to run in one thread - however it can be improved by adding threads for example like so:
So I constructed this toy example:
from threading import *
from queue import Queue
import time
def do_something_with_io_lag(in_work):
out = in_work
# Imagine we do some work that involves sending
# something over the internet and processing the output
# once it arrives
time.sleep(0.5) # simulate IO lag
print("Hello, bee number: ",
str(current_thread().name).replace("Thread-",""))
class WorkerBee(Thread):
def __init__(self, q):
Thread.__init__(self)
self.q = q
def run(self):
while True:
# Get some work from the queue
work_todo = self.q.get()
# This function will simiulate I/O lag
do_something_with_io_lag(work_todo)
# Remove task from the queue
self.q.task_done()
if __name__ == '__main__':
def time_me(nmbr):
number_of_worker_bees = nmbr
worktodo = ['some input for work'] * 50
# Create a queue
q = Queue()
# Fill with work
[q.put(onework) for onework in worktodo]
# Launch processes
for _ in range(number_of_worker_bees):
t = WorkerBee(q)
t.start()
# Block until queue is empty
q.join()
# Run this code in serial mode (just one worker)
%time time_me(nmbr=1)
# Wall time: 25 s
# Basically 50 requests * 0.5 seconds IO lag
# For me everything gets processed by bee number: 59
# Run this code using multi-tasking (launch 50 workers)
%time time_me(nmbr=50)
# Wall time: 507 ms
# Basically the 0.5 second IO lag + 0.07 seconds it took to launch them
# Now everything gets processed by different bees
Is it asynchronous?
To me this code does not seem asynchronous because it is Figure 3 in my example diagram. The I/O call blocks the thread (although we don't feel it because they are blocked in parallel).
However, if this is the case I am confused why requests-futures is considered asynchronous since it is a wrapper around ThreadPoolExecutor:
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
for future in concurrent.futures.as_completed(future_to_url):
url = future_to_url[future]
try:
data = future.result()
Can this function on just one thread?
Especially when compared to asyncio, which means it can run single-threaded
There are only two ways to have a program on a single processor do
“more than one thing at a time.” Multi-threaded programming is the
simplest and most popular way to do it, but there is another very
different technique, that lets you have nearly all the advantages of
multi-threading, without actually using multiple threads. It’s really
only practical if your program is largely I/O bound. If your program
is processor bound, then pre-emptive scheduled threads are probably
what you really need. Network servers are rarely processor bound,
however.

First of all, one note: concurrent.futures.Future is not the same as asyncio.Future. Basically it's just an abstraction - an object, that allows you to refer to job result (or exception, which is also a result) in your program after you assigned a job, but before it is completed. It's similar to assigning common function's result to some variable.
Multithreading: Regarding your example, when using multiple threads you can say that your code is "asynchronous" as several operations are performed in different threads at the same time without waiting for each other to complete, and you can see it in the timing results. And you're right, your function due to sleep is blocking, it blocks the worker thread for the specified amount of time, but when you use several threads those threads are blocked in parallel. So if you would have one job with sleep and the other one without and run multiple threads, the one without sleep would perform calculations while the other would sleep. When you use single thread, the jobs are performed in in a serial manner one after the other, so when one job sleeps the other jobs wait for it, actually they just don't exist until it's their turn. All this is pretty much proven by your time tests. The thing happened with print has to do with "thread safety", i.e. print uses standard output, which is a single shared resource. So when your multiple threads tried to print at the same time the switching happened inside and you got your strange output. (This also show "asynchronicity" of your multithreaded example.) To prevent such errors there are locking mechanisms, e.g. locks, semaphores, etc.
Asyncio: To better understand the purpose note the "IO" part, it's not 'async computation', but 'async input/output'. When talking about asyncio you usually don't think about threads at first. Asyncio is about event loop and generators (coroutines). The event loop is the arbiter, that governs the execution of coroutines (and their callbacks), that were registered to the loop. Coroutines are implemented as generators, i.e. functions that allow to perform some actions iteratively, saving state at each iteration and 'returning', and on the next call continuing with the saved state. So basically the event loop is while True: loop, that calls all coroutines/generators, assigned to it, one after another, and they provide result or no-result on each such call - this provides possibility for "asynchronicity". (A simplification, as there's scheduling mechanisms, that optimize this behavior.) The event loop in this situation can run in single thread and if coroutines are non-blocking it will give you true "asynchronicity", but if they are blocking then it's basically a linear execution.
You can achieve the same thing with explicit multithreading, but threads are costly - they require memory to be assigned, switching them takes time, etc. On the other hand asyncio API allows you to abstract from actual implementation and just consider your jobs to be performed asynchronously. It's implementation may be different, it includes calling the OS API and the OS decides what to do, e.g. DMA, additional threads, some specific microcontroller use, etc. The thing is it works well for IO due to lower level mechanisms, hardware stuff. On the other hand, performing computation will require explicit breaking of computation algorithm into pieces to use as asyncio coroutine, so a separate thread might be a better decision, as you can launch the whole computation as one there. (I'm not talking about algorithms that are special to parallel computing). But asyncio event loop might be explicitly set to use separate threads for coroutines, so this will be asyncio with multithreading.
Regarding your example, if you'll implement your function with sleep as asyncio coroutine, shedule and run 50 of them single threaded, you'll get time similar to the first time test, i.e. around 25s, as it is blocking. If you will change it to something like yield from [asyncio.sleep][3](0.5) (which is a coroutine itself), shedule and run 50 of them single threaded, it will be called asynchronously. So while one coroutine will sleep the other will be started, and so on. The jobs will complete in time similar to your second multithreaded test, i.e. close to 0.5s. If you will add print here you'll get good output as it will be used by single thread in serial manner, but the output might be in different order then the order of coroutine assignment to the loop, as coroutines could be run in different order. If you will use multiple threads, then the result will obviously be close to the last one anyway.
Simplification: The difference in multythreading and asyncio is in blocking/non-blocking, so basicly blocking multithreading will somewhat come close to non-blocking asyncio, but there're a lot of differences.
Multithreading for computations (i.e. CPU bound code)
Asyncio for input/output (i.e. I/O bound code)
Regarding your original statement:
all asynchronous code is multi-threaded
all multi-threaded functions are asynchronous
I hope that I was able to show, that:
asynchronous code might be both single threaded and multi-threaded
all multi-threaded functions could be called "asynchronous"

I think the main confusion comes from the meaning of asynchronous. From the Free Online Dictionary of Computing, "A process [...] whose execution can proceed independently" is asynchronous. Now, apply that to what your bees do:
Retrieve an item from the queue. Only one at a time can do that, while the order in which they get an item is undefined. I wouldn't call that asynchronous.
Sleep. Each bee does so independently of all others, i.e. the sleep duration runs on all, otherwise the time wouldn't go down with multiple bees. I'd call that asynchronous.
Call print(). While the calls are independent, at some point the data is funneled into the same output target, and at that point a sequence is enforced. I wouldn't call that asynchronous. Note however that the two arguments to print() and also the trailing newline are handled independently, which is why they can be interleaved.
Lastly, the call to q.join(). Here of course the calling thread is blocked until the queue is empty, so some kind of synchronization is enforced and wanted. I don't see why this "seems to break" for you.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.