Python: why use AsyncIO if not with asyncio.gather()?

I recently started looking into asynchronous programming in Python. Let's say we want to run a function asynchronously, an example below:
import asyncio

async def print_i_async(no):
    print("Async: Preparing print of " + str(no))
    await asyncio.sleep(1)
    print(str(no))

async def main_async(no):
    await asyncio.gather(*(print_i_async(i) for i in range(no)))

asyncio.run(main_async(no))
This will, as expected, run asynchronously. It's not clear to me, however, why we would use asynchronous functions if not with asyncio.gather(). For example:
import time

def print_i_serial(no):
    print("Serial: Preparing print of " + str(no))
    time.sleep(1)
    print(str(no))

for i in range(5):
    print_i_serial(i)

for i in range(5):
    asyncio.run(print_i_async(i))
These two functions produce the same result. Am I missing something? Is there any reason we would use an async def if we don't use asyncio.gather(), given this is how we actually get asynchronous results?

There are many reasons to use asyncio besides gather.
What you are really asking is: are there more ways to create concurrent executions besides gather?
To that the answer is yes.
gather is one of the simplest and most straightforward ways of creating concurrency with asyncio, but asyncio is not limited to it.
What gather does is take a bunch of awaitables (wrapping them in tasks where needed, e.g. for coroutines), wait until all of them are ready, and return the results (plus a bunch of other things, such as propagating cancellation).
Let's examine just two more examples of ways to achieve concurrency:
as_completed - similarly to gather, you pass in a bunch of awaitables, but instead of waiting for all of them to finish, this method hands you the futures as they become ready, in no particular order (see the sketch after these examples).
Another example is to create tasks yourself, e.g. with event_loop.create_task() (or the asyncio.create_task() shortcut). This allows you to create a task that will run on the event loop and which you can await later. In the meantime (until you await the task) you can continue running other code, and basically achieve concurrency (note that the task will not run straight away, but only when you yield control back to the event loop and it handles the task).
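A minimal sketch of both approaches, with a hypothetical work() coroutine standing in for real async work:

import asyncio

async def work(i):
    await asyncio.sleep(1 - i / 10)        # stand-in for real async work
    return i

async def with_as_completed():
    # futures are handed back as they finish, so results arrive unordered
    for fut in asyncio.as_completed([work(i) for i in range(5)]):
        print(await fut)

async def with_tasks():
    # create_task schedules the coroutine on the running event loop;
    # we can keep doing other things and await the task later
    task = asyncio.create_task(work(0))
    print("doing something else while the task runs...")
    print(await task)

asyncio.run(with_as_completed())
asyncio.run(with_tasks())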
There are many more ways to achieve concurrency. You can start with these examples (the second one is actually a general pattern you can use to create lots of different concurrent "topologies" of execution).
You can start by reading https://docs.python.org/3/library/asyncio-task.html

Related

Is there a way to wrap pygame.midi.Input.read() in an asynchronous task without polling or an extra thread?

I have basically the following code and want to embed it in an async coroutine:
import pygame.midi

def read_midi():
    midi_in = pygame.midi.Input(0)
    while True:
        if midi_in.poll():
            midi_data = midi_in.read(1)[0][0]
            # do something with midi_data, e.g. put it in a queue...
From my understanding, since pygame is not asynchronous, I have two options here: put the whole function in an extra thread, or turn it into an async coroutine like this:
import asyncio
import pygame.midi

async def read_midi():
    midi_in = pygame.midi.Input(1)
    while True:
        if not midi_in.poll():
            await asyncio.sleep(0.1)  # very bad!
            continue
        midi_data = midi_in.read(1)[0][0]
        # do something with midi_data, e.g. put it in a queue...
So it looks like I either have to keep the busy loop and put it in a thread, wasting lots of CPU time, or put it into the (fake) coroutine above and accept a trade-off between latency and CPU usage.
Am I wrong?
Is there a way to read MIDI without a busy loop?
Or even a way to await midi.Input.read?
It is true that pygame is not asynchronous, so you must either use a separate thread or an async coroutine to process the MIDI input.
Using a separate thread lets the rest of the program keep running while the MIDI input is read, but the busy loop still consumes CPU.
Using an async coroutine with the asyncio.sleep(0.1) call reduces CPU utilization but adds latency to the MIDI input. The trade-off here is between responsiveness and resource usage: sleeping inside the while loop will never be fully responsive.
Another option is to use a library that provides a callback-based or asynchronous interface for MIDI input, such as rtmidi-python or mido. These libraries may offer a way to wait for MIDI input without a polling loop.
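For illustration, a rough sketch of the mido route; this assumes mido (with a backend such as python-rtmidi) is installed and a default input port is available, and it bridges the backend-thread callback into asyncio with call_soon_threadsafe:

import asyncio
import mido

async def read_midi():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def on_message(msg):                          # runs in mido's backend thread
        loop.call_soon_threadsafe(queue.put_nowait, msg)

    port = mido.open_input(callback=on_message)   # default input port (assumed to exist)
    try:
        while True:
            msg = await queue.get()               # truly awaitable, no polling
            print(msg)                            # do something with the message
    finally:
        port.close()

asyncio.run(read_midi())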

Is yielding from inside a nursery in an asynchronous generator function bad?

I was told that the following code is not safe, because it is not allowed to have an asynchronous generator that yields from inside a nursery, except if it is an asynchronous context manager.
import math
from contextlib import aclosing   # Python 3.10+; older code used async_generator.aclosing
from typing import AsyncIterable, TypeVar

import trio

T = TypeVar('T')

async def delay(interval: float, source: AsyncIterable[T]) -> AsyncIterable[T]:
    """Delays each item in source by an interval.

    Received items are temporarily stored in an unbounded queue, along with a timestamp, using
    a background task. The foreground task takes items from the queue, waits until the
    item is older than the given interval, and then yields it."""
    send_channel, receive_channel = trio.open_memory_channel(math.inf)

    async def pull_task():
        async with aclosing(source) as agen:
            async for item in agen:
                send_channel.send_nowait((item, trio.current_time() + interval))

    async with trio.open_nursery() as nursery:
        nursery.start_soon(pull_task)
        async with receive_channel:
            async for item, timestamp in receive_channel:
                now = trio.current_time()
                if timestamp > now:
                    await trio.sleep(timestamp - now)
                yield item
I have trouble understanding how this can possibly break. If anyone can provide example code that uses this exact generator function and demonstrates the unsafety, it would be greatly appreciated and rewarded.
The goal of the above code is to delay processing of an asynchronous sequence without applying any backpressure. If you can demonstrate that this code does not work as I would expect, that would also be appreciated.
Thank you.
Unfortunately, that's correct – yield inside a nursery or cancel scope isn't supported, except in the narrow cases of using @contextlib.asynccontextmanager to create an async context manager or writing an async pytest fixture.
There are several reasons for this. Some of them are technical: Trio has to keep track of which nurseries/cancel scopes are currently "active" on the stack, and when you yield out of one then it breaks the nesting, and Trio has no way to know that you've done this. (There's no way for a library to detect a yield out of a context manager.)
But there's also a fundamental, unsolvable reason, which is that the whole idea of Trio and structured concurrency is that every task "belongs" to a parent task that can receive notification if the child task crashes. But when you yield in a generator, the generator frame gets frozen and detached from the current task – it might resume in another task, or never resume at all. So when you yield, that breaks the link between all the child tasks in the nursery and their parents. There's just no way to reconcile that with the principles of structured concurrency.
Over in the Trio chat, Joshua Oreman gave a specific example that breaks in your case:
if I run the following
async def arange(*args):
    for val in range(*args):
        yield val

async def break_it():
    async with aclosing(delay(0, arange(3))) as aiter:
        with trio.move_on_after(1):
            async for value in aiter:
                await trio.sleep(0.4)
                print(value)

trio.run(break_it)
then I get
RuntimeError: Cancel scope stack corrupted: attempted to exit
<trio.CancelScope at 0x7f364621c280, active, cancelled> in <Task
'__main__.break_it' at 0x7f36462152b0> that's still within its child
<trio.CancelScope at 0x7f364621c400, active>
This is probably a bug in your code, that has caused Trio's internal
state to become corrupted. We'll do our best to recover, but from now
on there are no guarantees.
Typically this is caused by one of the following:
- yielding within a generator or async generator that's opened a cancel
scope or nursery (unless the generator is a @contextmanager or
@asynccontextmanager); see https://github.com/python-trio/trio/issues/638 [...]
By changing the timeouts and delay so that the timeout expired while
inside the generator rather than while outside of it, I was able to
get a different error also: trio.MultiError: Cancelled(), GeneratorExit() raised out of aclosing()
There's also a long discussion about all these issues here, which is where we figured out that this just can't be supported: https://github.com/python-trio/trio/issues/264
It's an unfortunate situation, both because it's a shame that we can't support it, and even worse that it looks like it works in simple cases, so folks can end up writing a lot of code that uses this trick before realizing that it doesn't work :-(
Our plan is to make the illegal cases give an obvious error immediately when you try to yield, to at least avoid the second problem. But, this will take a while because it requires adding some extra hooks to the Python interpreter.
It is also possible to create a construct that's almost as easy to write and use as async generators, but that avoids this problem. The idea is that instead of pushing and popping the generator from the stack of the task that's consuming it, you instead run the "generator" code as a second task that feeds the consumer task values. See the thread starting here for more details.
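A rough sketch of that idea applied to the delay example (simplified: items are forwarded after a fixed sleep rather than timestamped); the yield here is fine because the function is an @asynccontextmanager, which is exactly the supported narrow case:

import math
from contextlib import asynccontextmanager

import trio

@asynccontextmanager
async def delayed(interval, source):
    send_channel, receive_channel = trio.open_memory_channel(math.inf)

    async def producer():
        # the "generator" body runs as a second task feeding the consumer
        async with send_channel:
            async for item in source:
                await trio.sleep(interval)      # simplified delay
                await send_channel.send(item)

    async with trio.open_nursery() as nursery:
        nursery.start_soon(producer)
        yield receive_channel                   # allowed: we're an async context manager
        nursery.cancel_scope.cancel()

# usage:
# async with delayed(0.5, some_async_iterable) as items:
#     async for item in items:
#         print(item)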

When is asyncio's default scheduler fair?

It's my understanding that asyncio.gather is intended to run its arguments concurrently and also that when a coroutine executes an await expression it provides an opportunity for the event loop to schedule other tasks. With that in mind, I was surprised to see that the following snippet ignores one of the inputs to asyncio.gather.
import asyncio

async def aprint(s):
    print(s)

async def forever(s):
    while True:
        await aprint(s)

async def main():
    await asyncio.gather(forever('a'), forever('b'))

asyncio.run(main())
As I understand it, the following things happen:
asyncio.run(main()) does any necessary global initialization of the event loop and schedules main() for execution.
main() schedules asyncio.gather(...) for execution and waits for its result
asyncio.gather schedules the executions of forever('a') and forever('b')
whichever of those executes first, they immediately await aprint() and give the scheduler the opportunity to run another coroutine if desired (e.g. if we start with 'a' then we have a chance to start trying to evaluate 'b', which should already be scheduled for execution).
In the output we'll see a stream of lines each containing 'a' or 'b', and the scheduler ought to be fair enough that we see at least one of each over a long enough period of time.
In practice this isn't what I observe. Instead, the entire program is equivalent to while True: print('a'). What I found extremely interesting is that even minor changes to the code seem to reintroduce fairness. E.g., if we instead have the following code then we get a roughly equal mix of 'a' and 'b' in the output.
async def forever(s):
    while True:
        await aprint(s)
        await asyncio.sleep(1.)
To verify that it doesn't have anything to do with how long we spend in vs. out of the infinite loop, I found that the following change also restores fairness.
async def forever(s):
    while True:
        await aprint(s)
        await asyncio.sleep(0.)
Does anyone know why this unfairness might happen and how to avoid it? I suppose when in doubt I could proactively add an empty sleep statement everywhere and hope that suffices, but it's incredibly non-obvious to me why the original code doesn't behave as expected.
In case it matters since asyncio seems to have gone through quite a few API changes, I'm using a vanilla installation of Python 3.8.4 on an Ubuntu box.
whichever of those executes first, they immediately await aprint() and give the scheduler the opportunity to run another coroutine if desired
This part is a common misconception. Python's await doesn't mean "yield control to the event loop", it means "start executing the awaitable, allowing it to suspend us along with it". So yes, if the awaited object chooses to suspend, the current coroutine will suspend as well, and so will the coroutine that awaits it and so on, all the way to the event loop. But if the awaited object doesn't choose to suspend, as is the case with aprint, neither will the coroutine that awaits it. This is occasionally a source of bugs, as seen here or here.
Does anyone know why this unfairness might happen and how to avoid it?
Fortunately this effect is most pronounced in toy examples that don't really communicate with the outside world. And although you can fix them by adding await asyncio.sleep(0) to strategic places (which is even documented to force a context switch), you probably shouldn't do that in production code.
A real program will depend on input from the outside world, be it data coming from the network, from a local database, or from a work queue populated by another thread or process. Actual data will rarely arrive so fast as to starve the rest of the program, and if it does, the starvation will likely be temporary because the program will eventually suspend due to backpressure from its output side. In the unlikely event that the program receives data from one source faster than it can process it, but still needs to observe data coming from another source, you could have a starvation issue, but that can be fixed with forced context switches if it is ever shown to occur. (I haven't heard of anyone encountering it in production.)
Aside from bugs mentioned above, what happens much more often is that a coroutine invokes CPU-heavy or legacy blocking code, and that ends up hogging the event loop. Such situations should be handled by passing the CPU/blocking part to run_in_executor.
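A small sketch of that, with a hypothetical blocking_work function standing in for the CPU-heavy or legacy code:

import asyncio
import time

def blocking_work(n):
    time.sleep(n)          # stand-in for legacy blocking or CPU-bound code
    return n * 2

async def main():
    loop = asyncio.get_running_loop()
    # run_in_executor returns a future; the event loop stays free while the
    # default ThreadPoolExecutor runs blocking_work in a worker thread
    result = await loop.run_in_executor(None, blocking_work, 1)
    print(result)

asyncio.run(main())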
I would like to draw attention to PEP 492, which says:
await, similarly to yield from, suspends execution of [...] coroutine until [...] awaitable completes and returns the result data.
It uses the yield from implementation with an extra step of validating its argument.
Any yield from chain of calls ends with a yield. This is a fundamental mechanism of how Futures are implemented. Since, internally, coroutines are a special kind of generators, every await is suspended by a yield somewhere down the chain of await calls (please refer to PEP 3156 for a detailed explanation).
But in your case async def aprint() does not yield; that is, it does not call anything that reaches the event loop, like I/O or await sleep(0), which, if we look at its source code, just does a yield:
@types.coroutine
def __sleep0():
    """Skip one event loop run cycle.

    This is a private helper for 'asyncio.sleep()', used
    when the 'delay' is set to 0. It uses a bare 'yield'
    expression (which Task.__step knows how to handle)
    instead of creating a Future object.
    """
    yield

async def sleep(delay, result=None, *, loop=None):
    """Coroutine that completes after a given time (in seconds)."""
    if delay <= 0:
        await __sleep0()
        return result
    ...
Thus, because of the while True: in forever, you end up with a chain of yield from calls that never reaches a yield.

Python asyncio: synchronize all access to a shared object

I have a class which processes a bunch of work elements asynchronously (mainly due to overlapping HTTP connection requests) using asyncio. A very simplified example to demonstrate the structure of my code:
import asyncio
from concurrent.futures import ThreadPoolExecutor

class Work:
    ...

    def worker(self, item):
        # do some work on item...
        return

    def queue(self):
        # generate the work items...
        yield from range(100)

    async def run(self):
        with ThreadPoolExecutor(max_workers=10) as executor:
            loop = asyncio.get_event_loop()
            tasks = [
                loop.run_in_executor(executor, self.worker, item)
                for item in self.queue()
            ]
            for result in await asyncio.gather(*tasks):
                pass

work = Work()
asyncio.run(work.run())
In practice, the workers need to access a shared container-like object and call its methods which are not async-safe. For example, let's say the worker method calls a function defined like this:
def func(shared_obj, value):
    for node in shared_obj.filter(value):
        shared_obj.remove(node)
However, calling func from a worker might affect the other asynchronous workers in this or any other function involving the shared object. I know that I need to use some synchronization, such as a global lock, but I don't find its usage easy:
asyncio.Lock can be used only in async functions, so I would have to mark all such function definitions as async
I would also have to await all calls of these functions
await is also usable only in async functions, so eventually all functions between worker and func would be async
if the worker was async, it would not be possible to pass it to loop.run_in_executor (it does not await)
Furthermore, some of the functions where I would have to add async may be generic in the sense that they should be callable from asynchronous as well as "normal" context.
I'm probably missing something serious in the whole concept. With the threading module, I would just create a lock and work with it in a couple of places, without having to further annotate the functions. Also, there is a nice solution to wrap the shared object such that all access is transparently guarded by a lock. I'm wondering if something similar is possible with asyncio...
I'm probably missing something serious in the whole concept. With the threading module, I would just create a lock...
What you are missing is that you're not really using asyncio at all. run_in_executor serves to integrate CPU-bound or legacy sync code into an asyncio application. It works by submitting the function to a ThreadPoolExecutor and returning an awaitable handle which gets resolved once the function completes. This is "async" in the sense of running in the background, but not in the sense that is central to asyncio. An asyncio program is composed of non-blocking pieces that use async/await to suspend execution when data is unavailable and rely on the event loop to efficiently wait for multiple events at once and resume appropriate async functions.
In other words, as long as you rely on run_in_executor, you are just using threading (more precisely concurrent.futures with a threading executor). You can use a threading.Lock to synchronize between functions, and things will work exactly as if you used threading in the first place.
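For example, if you keep the run_in_executor design, the func from the question can be guarded by an ordinary threading.Lock (a sketch, not your actual code):

import threading

shared_lock = threading.Lock()

def func(shared_obj, value):
    # workers run in executor threads, so a plain threading.Lock is sufficient
    with shared_lock:
        for node in shared_obj.filter(value):
            shared_obj.remove(node)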
To get the benefits of asyncio such as scaling to a large number of concurrent tasks or reliable cancellation, you should design your program as async (or mostly async) from the ground up. Then you'll be able to modify shared data atomically simply by doing it between two awaits, or use asyncio.Lock for synchronized modification across awaits.
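A minimal sketch of that async-first shape, with asyncio.sleep standing in for the HTTP request and a plain set standing in for the shared container:

import asyncio

shared = set(range(100))

async def worker(lock, value):
    await asyncio.sleep(0.01)              # stand-in for an async HTTP request
    async with lock:                       # only needed when mutations span an await
        shared.discard(value)

async def main():
    lock = asyncio.Lock()                  # create inside the running loop
    await asyncio.gather(*(worker(lock, i) for i in range(100)))
    print(len(shared))                     # -> 0

asyncio.run(main())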

How to have something similar to Javascript callbacks in Python

I am trying to get my head around Python asyncio. This is a simple program I wrote. The logic I am trying to simulate is as follows:
I get a list of names from some database. Since we are going to do something with those names after we get them, I made it a simple function and not an asynchronous one.
After we get the data, we make a call to some external API using each name. Since this would be an expensive operation from an I/O standpoint, and the API calls for individual names don't depend on each other, it makes sense to make them asynchronous.
I looked up this thread on Stack Overflow (Cooperative yield in asyncio), which says that to give control back to the event loop so it can do something else, we have to do asyncio.sleep(0).
Here I am comparing the async behaviour of Node.js and Python. If I give control back to the event loop using the above syntax, my long-running API call would remain suspended, right, and would not happen in the background as in Node.js?
In Node.js, when we make an external API call, we get back something called a Promise, which we can wait on to finish. It essentially means that the database call or API call is happening in the background and we get something back when it is done.
Am I missing some critical concept here about Python asynchronous programming? Kindly throw some more light on this.
Below is the code and its output:
import asyncio
import time

async def get_message_from_api(name):
    # This is supposed to be a long running operation like getting data from an external API
    print(f"Attempting to yield control to the other tasks....for {name}")
    await asyncio.sleep(0)
    time.sleep(2)
    return f"Creating message for {name}"

async def simulate_long_ops(name):
    print(f"Long running operation starting for {name}")
    message = await get_message_from_api(name)
    print(f"The message returned by the long running operation is {message}")

def get_data_from_database():
    return ["John", "Mary", "Sansa", "Tyrion"]

async def main():
    names = get_data_from_database()
    futures = []
    for name in names:
        futures.append(loop.create_task(simulate_long_ops(name)))
    await asyncio.wait(futures)

if __name__ == '__main__':
    try:
        loop = asyncio.get_event_loop()
        loop.run_until_complete(main())
    except Exception as e:
        print(e)
    finally:
        loop.close()
Output:
Long running operation starting for John
Attempting to yield control to the other tasks....for John
Long running operation starting for Mary
Attempting to yield control to the other tasks....for Mary
Long running operation starting for Sansa
Attempting to yield control to the other tasks....for Sansa
Long running operation starting for Tyrion
Attempting to yield control to the other tasks....for Tyrion
The message returned by the long running operation is Creating message for John
The message returned by the long running operation is Creating message for Mary
The message returned by the long running operation is Creating message for Sansa
The message returned by the long running operation is Creating message for Tyrion
The mistake in your code is that you call time.sleep. You should never call that function in asyncio code: it blocks the whole event loop; use await asyncio.sleep() instead. In JavaScript terms, calling time.sleep is almost as bad as sleeping like this instead of like this. (I say "almost" because time.sleep at least doesn't burn CPU cycles while waiting.)
Attempts to work around that mistake led to the second problem, the use of asyncio.sleep(0) to give control to the event loop. Although the idiom was added early, the behavior was documented only much later. As Guido hints in the original issue, explicit yielding to the event loop is only appropriate for advanced usage and its use by beginners is most likely an error. If your long-running operation is async ― as is the case in your code, once time.sleep() is replaced with await asyncio.sleep() ― you don't need to drop to the event loop manually. Instead, the async operation will drop as needed on every await, just like it would in JavaScript.
In Node.js when we make an external API call we get something back called Promises on which we can wait to finish.
In Python a future is a close counterpart, and the async models are very similar. One significant difference is that Python's async functions don't return scheduled futures, but lightweight coroutine objects which you must either await or pass to asyncio.create_task() to get them to run. Since your code does the latter, it looks correct.
The return value of create_task is an object that implements the Future interface. Future sports an add_done_callback method with the semantics you'd expect. But it's much better to simply await the future instead - it makes the code more readable and it's clear where the exceptions go.
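A tiny sketch of both styles, using a hypothetical work() coroutine:

import asyncio

async def work():
    await asyncio.sleep(0.1)
    return 42

async def main():
    task = asyncio.create_task(work())
    task.add_done_callback(lambda t: print("callback saw:", t.result()))
    print("awaited result:", await task)   # usually the clearer option

asyncio.run(main())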
Also, you probably want to use asyncio.gather() rather than asyncio.wait() to ensure that exceptions do not go unnoticed. If you are using Python 3.7, consider using asyncio.run() to run the async main function.
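Putting these suggestions together, a hedged rewrite of your example might look like this (asyncio.sleep standing in for the real API call):

import asyncio

async def get_message_from_api(name):
    await asyncio.sleep(2)             # stand-in for a real async API call
    return f"Creating message for {name}"

async def simulate_long_ops(name):
    print(f"Long running operation starting for {name}")
    message = await get_message_from_api(name)
    print(f"The message returned by the long running operation is {message}")

async def main():
    names = ["John", "Mary", "Sansa", "Tyrion"]
    await asyncio.gather(*(simulate_long_ops(name) for name in names))

asyncio.run(main())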
