Let's say we have
await async_function_one_with_large_IO_request()
await async_function_two_with_large_IO_request()
versus
asyncio.gather(
    async_function_one_with_large_IO_request(),
    async_function_two_with_large_IO_request())
In the first version, once we hit the 'large IO request' part of function one, it's gonna move on to running function_two; that's the whole point of await, right?
Isn't that what version 2 with gather does too?
What's the performance difference between the two?
In the first version, once we hit the 'large IO request' part of function one, it's gonna move on to running function_two; that's the whole point of await, right?
That's incorrect. In your first version, async_function_two_with_large_IO_request (which I will call function_two) won't run until async_function_one_with_large_IO_request (which I will call function_one) completes.
If function_one happens to await on another function, it will yield control to another running async task, but function_two hasn't been scheduled yet.
When you use asyncio.gather, the tasks are scheduled concurrently, so if function_one awaits on something, function_two has a chance to run (along with other async tasks).
Note that asyncio.gather creates an async task, which generally implies you have to await on it:
await asyncio.gather(...)
The Python documentation covers this in detail in Coroutines and Tasks.
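To make the performance difference concrete, here is a minimal runnable sketch (fetch and the one-second delays are made up for illustration): the sequential version takes about two seconds, while the gather version overlaps the waits and takes about one.

import asyncio
import time

async def fetch(name, delay):
    # stand-in for a coroutine with a large IO request;
    # asyncio.sleep simulates the time spent waiting on IO
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    await fetch('one', 1)
    await fetch('two', 1)  # starts only after the first completes
    print(f"sequential: {time.perf_counter() - start:.1f}s")  # ~2.0s

    start = time.perf_counter()
    await asyncio.gather(fetch('one', 1), fetch('two', 1))  # waits overlap
    print(f"gather: {time.perf_counter() - start:.1f}s")  # ~1.0s

asyncio.run(main())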
Context: I am making a discord bot, and the mute command comes with a duration specifier for when the user should be unmuted again (this is being done via assigning a role to the user that removes their ability to send messages).
Idea 1: I could make a loop that checks every, say, 30 seconds and looks through to see which mutes have expired and cancel them.
Idea 2: Each time a mute is processed, I could await asyncio.sleep(however long) and then cancel it.
I would like to ask - which is more idiomatic, and also more importantly, which is more efficient? The first one has the advantage that it only has one coroutine running whereas the last one spawns a new one for each individual case (I don't really know how many that might be, but let's say around 10 concurrent cases maximum). However, the last one is easier for me to implement and feels like a more direct solution, on top of making the unmute happen exactly on time instead of on a timed loop.
There is a third idea: make a loop that sleeps until the next scheduled task, processes it, and then queues up the next one, but then you can't insert tasks at the front of the queue.
TL;DR - if I need to schedule multiple events, do I run a loop to constantly check the scheduler, or do I open a coroutine for each and just await asyncio.sleep(however long) each of them?
Your second idea is the more idiomatic and straightforward one, as you can simply use asyncio.create_task to run an unmute coroutine after a certain amount of time. asyncio.create_task schedules the coroutine on the event loop immediately, so it runs in the background without you having to gather it later or poll it from a loop at a fixed interval; just keep a reference to the returned task so it isn't garbage-collected before it finishes:
import asyncio

async def unmute(delay, user_id):
    await asyncio.sleep(delay)
    # run the unmute process here

async def mute(user_id):
    # your mute function, returning the mute duration
    mute_for = await do_mute(user_id)
    # schedule unmute to run in the background; keep a reference to the
    # task so it isn't garbage-collected before it completes
    u = asyncio.create_task(unmute(mute_for, user_id))
    return u
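One practical note beyond the original answer (the pending_unmutes registry is a hypothetical addition): the event loop holds only weak references to tasks, so storing the task in a collection both protects it from garbage collection and lets you cancel an unmute early, for example when a moderator lifts the mute by hand:

pending_unmutes = {}  # hypothetical registry mapping user_id to its unmute task

async def mute(user_id):
    mute_for = await do_mute(user_id)
    pending_unmutes[user_id] = asyncio.create_task(unmute(mute_for, user_id))

def cancel_unmute(user_id):
    # cancel the pending unmute if the role is removed manually
    task = pending_unmutes.pop(user_id, None)
    if task is not None:
        task.cancel()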
From the documentation, if we want to implement a non-blocking delay we should use await asyncio.sleep(delay), because time.sleep(delay) is blocking in nature. But from what I understand, it is the await keyword that cedes control of the thread to do something else while the function following the keyword, i.e. the f() in await f(), finishes its work.
So if we need the await keyword in order for asyncio.sleep(delay) to be non-blocking, what is the difference from its counterpart time.sleep(delay) if neither is awaited?
Also, can't we reach the same result by preceding both sleep functions with an await keyword?
From an answer on a somewhat similar topic:
The function asyncio.sleep simply registers a future to be called in x seconds while time.sleep suspends the execution for x seconds.
So the execution of a coroutine that calls await asyncio.sleep() is suspended until the asyncio event loop wakes it after the timer-expired event.
However, time.sleep() literally blocks execution of the current thread until the designated time has passed, preventing the event loop and other tasks from running while waiting; it is that switching between tasks that makes concurrency possible despite being single-threaded.
As I understand it, the difference is that between:
counting X seconds with a stopwatch yourself
letting the clock tick and periodically checking whether X seconds have passed
You (a thread) probably can't do anything else while watching the stopwatch yourself, whereas in the latter case you're free to do other jobs between the periodic checks.
Also, you can't use synchronous functions with await.
From PEP 492 that implements await and async:
await, similarly to yield from, suspends execution of read_data coroutine until db.fetch awaitable completes and returns the result data.
You can't suspend a normal subroutine; Python is an imperative language.
Also, can't we reach the same result by preceding both sleep functions with an await keyword?
No, that's the thing. time.sleep() blocks. You call it, then nothing will happen for a while, then your program will resume when the function returns. You can await at that point all you want, there's nothing to await.
asyncio.sleep() returns immediately, and it returns an awaitable. When you await that awaitable, the event loop will transfer control to another async function (if there is one), and crucially it will be notified when the awaitable is done, and resume the awaiting function at that point.
To illustrate what an awaitable is:
foo = asyncio.sleep(2)  # returns an awaitable immediately; nothing has slept yet
print('Still active')
await foo               # only now is the current coroutine suspended for 2 seconds
print('After sleep')
An awaitable is an object that can be awaited. You aren't awaiting the function call, you're awaiting its return value. time.sleep has no such return value; by the time it returns, the blocking wait has already happened.
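To see the whole difference in one runnable sketch (napper is a made-up name): two tasks awaiting asyncio.sleep(1) run concurrently and finish in about one second, whereas swapping in time.sleep(1) would block the event loop and serialize them to about two.

import asyncio
import time

async def napper(name):
    await asyncio.sleep(1)  # suspends this coroutine; the loop may run others
    # time.sleep(1)         # would block the whole thread, serializing the tasks
    print(f"{name} woke up")

async def main():
    start = time.perf_counter()
    await asyncio.gather(napper('a'), napper('b'))
    print(f"elapsed: {time.perf_counter() - start:.1f}s")  # ~1.0s, not ~2.0s

asyncio.run(main())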
It's my understanding that asyncio.gather is intended to run its arguments concurrently and also that when a coroutine executes an await expression it provides an opportunity for the event loop to schedule other tasks. With that in mind, I was surprised to see that the following snippet ignores one of the inputs to asyncio.gather.
import asyncio

async def aprint(s):
    print(s)

async def forever(s):
    while True:
        await aprint(s)

async def main():
    await asyncio.gather(forever('a'), forever('b'))

asyncio.run(main())
As I understand it, the following things happen:
asyncio.run(main()) does any necessary global initialization of the event loop and schedules main() for execution.
main() schedules asyncio.gather(...) for execution and waits for its result
asyncio.gather schedules the executions of forever('a') and forever('b')
whichever of those executes first, it immediately awaits aprint() and gives the scheduler the opportunity to run another coroutine if desired (e.g. if we start with 'a' then we have a chance to start evaluating 'b', which should already be scheduled for execution).
In the output we'll see a stream of lines each containing 'a' or 'b', and the scheduler ought to be fair enough that we see at least one of each over a long enough period of time.
In practice this isn't what I observe. Instead, the entire program is equivalent to while True: print('a'). What I found extremely interesting is that even minor changes to the code seem to reintroduce fairness. E.g., if we instead have the following code then we get a roughly equal mix of 'a' and 'b' in the output.
async def forever(s):
    while True:
        await aprint(s)
        await asyncio.sleep(1.)
To verify that it doesn't have anything to do with how long we spend inside vs. outside the infinite loop, I found that the following change also provides fairness.
async def forever(s):
    while True:
        await aprint(s)
        await asyncio.sleep(0.)
Does anyone know why this unfairness might happen and how to avoid it? I suppose when in doubt I could proactively add an empty sleep statement everywhere and hope that suffices, but it's incredibly non-obvious to me why the original code doesn't behave as expected.
In case it matters since asyncio seems to have gone through quite a few API changes, I'm using a vanilla installation of Python 3.8.4 on an Ubuntu box.
whichever of those executes first, it immediately awaits aprint() and gives the scheduler the opportunity to run another coroutine if desired
This part is a common misconception. Python's await doesn't mean "yield control to the event loop", it means "start executing the awaitable, allowing it to suspend us along with it". So yes, if the awaited object chooses to suspend, the current coroutine will suspend as well, and so will the coroutine that awaits it and so on, all the way to the event loop. But if the awaited object doesn't choose to suspend, as is the case with aprint, neither will the coroutine that awaits it. This is occasionally a source of bugs, as seen here or here.
Does anyone know why this unfairness might happen and how to avoid it?
Fortunately this effect is most pronounced in toy examples that don't really communicate with the outside world. And although you can fix them by adding await asyncio.sleep(0) to strategic places (which is even documented to force a context switch), you probably shouldn't do that in production code.
A real program will depend on input from the outside world, be it data coming from the network, from a local database, or from a work queue populated by another thread or process. Actual data will rarely arrive so fast to starve the rest of the program, and if it does, the starvation will likely be temporary because the program will eventually suspend due to backpressure from its output side. In the rare possibility that the program receives data from one source faster than it can process it, but still needs to observe data coming from another source, you could have a starvation issue, but that can be fixed with forced context switches if it is ever shown to occur. (I haven't heard of anyone encountering it in production.)
Aside from bugs mentioned above, what happens much more often is that a coroutine invokes CPU-heavy or legacy blocking code, and that ends up hogging the event loop. Such situations should be handled by passing the CPU/blocking part to run_in_executor.
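For instance, a minimal sketch of that last point (blocking_query is a made-up stand-in for legacy blocking code):

import asyncio
import time

def blocking_query():
    # legacy blocking code; calling this directly would freeze the event loop
    time.sleep(2)
    return 'result'

async def main():
    loop = asyncio.get_running_loop()
    # run the blocking call in the default thread pool executor so the
    # event loop stays free to run other tasks in the meantime
    result = await loop.run_in_executor(None, blocking_query)
    print(result)

asyncio.run(main())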
I would like to draw attention to PEP 492, which says:
await, similarly to yield from, suspends execution of [...] coroutine until [...] awaitable completes and returns the result data.
It uses the yield from implementation with an extra step of validating its argument.
Any yield from chain of calls ends with a yield. This is a fundamental mechanism of how Futures are implemented. Since, internally, coroutines are a special kind of generators, every await is suspended by a yield somewhere down the chain of await calls (please refer to PEP 3156 for a detailed explanation).
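To make that concrete with plain generators (a sketch of the mechanism, not asyncio code): a yield from chain only ever suspends if something at the bottom actually yields.

def leaf_that_yields():
    yield  # a real suspension point

def leaf_that_returns():
    return 42
    yield  # unreachable; only serves to make this a generator function

def chain(leaf):
    return (yield from leaf())

list(chain(leaf_that_yields))   # [None]: the chain suspended once
list(chain(leaf_that_returns))  # []: no suspension ever happened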
But in your case async def aprint() does not yield; that is, it does not call any event function like I/O, nor does it await sleep(0), which, if we look at its source code, just does a bare yield:
@types.coroutine
def __sleep0():
    """Skip one event loop run cycle.

    This is a private helper for 'asyncio.sleep()', used
    when the 'delay' is set to 0. It uses a bare 'yield'
    expression (which Task.__step knows how to handle)
    instead of creating a Future object.
    """
    yield

async def sleep(delay, result=None, *, loop=None):
    """Coroutine that completes after a given time (in seconds)."""
    if delay <= 0:
        await __sleep0()
        return result
    ...
Thus, because of the while True: in forever, you could say you build a chain of yield from calls that never ends with a yield.
I am trying to get my head around Python asyncio. This is a simple program I wrote. The logic I am trying to simulate is as follows:
I get a list of names from some database. Since we are going to do something with those names after we get them, I made it a simple function and not an asynchronous one.
After we get the data, we make a call to some external API for each name. Since this would be an expensive operation from an IO standpoint, and the API calls for individual names don't depend on each other, it makes sense to make them asynchronous.
I looked up this thread on Stack Overflow (Cooperative yield in asyncio), which says that to give control back to the event loop to do something else we have to await asyncio.sleep(0).
Here I am comparing the async behaviour of Node.js and Python. If I give back control to the event loop using the above syntax, would my long-running API call remain suspended, rather than happening in the background as in Node.js?
In Node.js, when we make an external API call we get back something called a Promise, on which we can wait to finish. It essentially means that the database call or API call is happening in the background and we get something back when it is done.
Am I missing some critical concept here about Python asynchronous programming? Kindly throw some more light on this.
Below is the code and its output:
import asyncio
import time

async def get_message_from_api(name):
    # This is supposed to be a long running operation like getting data from external API
    print(f"Attempting to yield control to the other tasks....for {name}")
    await asyncio.sleep(0)
    time.sleep(2)
    return f"Creating message for {name}"

async def simulate_long_ops(name):
    print(f"Long running operation starting for {name}")
    message = await get_message_from_api(name)
    print(f"The message returned by the long running operation is {message}")

def get_data_from_database():
    return ["John", "Mary", "Sansa", "Tyrion"]

async def main():
    names = get_data_from_database()
    futures = []
    for name in names:
        futures.append(loop.create_task(simulate_long_ops(name)))
    await asyncio.wait(futures)

if __name__ == '__main__':
    try:
        loop = asyncio.get_event_loop()
        loop.run_until_complete(main())
    except Exception as e:
        print(e)
    finally:
        loop.close()
Output:
Long running operation starting for John
Attempting to yield control to the other tasks....for John
Long running operation starting for Mary
Attempting to yield control to the other tasks....for Mary
Long running operation starting for Sansa
Attempting to yield control to the other tasks....for Sansa
Long running operation starting for Tyrion
Attempting to yield control to the other tasks....for Tyrion
The message returned by the long running operation is Creating message for John
The message returned by the long running operation is Creating message for Mary
The message returned by the long running operation is Creating message for Sansa
The message returned by the long running operation is Creating message for Tyrion
The mistake in your code is that you call time.sleep. You should never call that function in asyncio code, it blocks the whole event loop; await asyncio.sleep() instead. In JavaScript terms, calling time.sleep is almost as bad as sleeping like this instead of like this. (I say "almost" because time.sleep at least doesn't burn CPU cycles while waiting.)
Attempts to work around that mistake led to the second problem, the use of asyncio.sleep(0) to give control to the event loop. Although the idiom was added early, the behavior was documented only much later. As Guido hints in the original issue, explicit yielding to the event loop is only appropriate for advanced usage and its use by beginners is most likely an error. If your long-running operation is async ― as is the case in your code, once time.sleep() is replaced with await asyncio.sleep() ― you don't need to drop to the event loop manually. Instead, the async operation will drop as needed on every await, just like it would in JavaScript.
In Node.js, when we make an external API call we get back something called a Promise, on which we can wait to finish.
In Python a future is a close counterpart, and the async models are very similar. One significant difference is that Python's async functions don't return scheduled futures, but lightweight coroutine objects which you must either await or pass to asyncio.create_task() to get them to run. Since your code does the latter, it looks correct.
The return value of create_task is an object that implements the Future interface. Future sports an add_done_callback method with the semantics you'd expect. But it's much better to simply await the future instead - it makes the code more readable and it's clear where the exceptions go.
Also, you probably want to use asyncio.gather() rather than asyncio.wait() to ensure that exceptions do not go unnoticed. If you are using Python 3.7, consider using asyncio.run() to run the async main function.
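Putting those suggestions together, a corrected sketch of the program might look like this (await asyncio.sleep(2) stands in for the real API call):

import asyncio

async def get_message_from_api(name):
    await asyncio.sleep(2)  # non-blocking stand-in for the external API call
    return f"Creating message for {name}"

async def simulate_long_ops(name):
    print(f"Long running operation starting for {name}")
    message = await get_message_from_api(name)
    print(f"The message returned by the long running operation is {message}")

def get_data_from_database():
    return ["John", "Mary", "Sansa", "Tyrion"]

async def main():
    names = get_data_from_database()
    # gather propagates exceptions instead of letting them go unnoticed
    await asyncio.gather(*(simulate_long_ops(name) for name in names))

asyncio.run(main())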
Hi, I am new to asyncio and the concept of event loops (non-blocking IO).
import asyncio

async def subWorker():
    ...

async def firstWorker():
    await subWorker()

async def secondWorker():
    await asyncio.sleep(1)

loop = asyncio.get_event_loop()
asyncio.ensure_future(firstWorker())
asyncio.ensure_future(secondWorker())
loop.run_forever()
Here, when the code starts, firstWorker() is executed and pauses when it encounters await subWorker(). While firstWorker() is waiting, secondWorker() gets started.
My question is: when firstWorker() encounters await subWorker() and gets paused, will the computer then execute subWorker() and secondWorker() at the same time? The program has only one thread now, and I guess that single thread does the secondWorker() work. Then who executes subWorker()? If a single thread can only do one thing at a time, who else does the other jobs?
The assumption that subWorker and secondWorker execute at the same time is false.
The fact that secondWorker simply sleeps means that the available time will be spent in subWorker.
asyncio by definition is single-threaded; see the documentation:
This module provides infrastructure for writing single-threaded concurrent code
The event loop executes a task at a time, switching for example when one task is blocked while waiting for I/O, or, as here, voluntarily sleeping.
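To sketch who actually runs what (here I assume subWorker awaits a sleep so that it genuinely suspends): the same single thread runs everything, hopping between coroutines at each await that suspends.

import asyncio

async def subWorker():
    await asyncio.sleep(2)  # suspends; the loop is free to run secondWorker
    print('subWorker done')

async def firstWorker():
    await subWorker()
    print('firstWorker done')

async def secondWorker():
    await asyncio.sleep(1)
    print('secondWorker done')

async def main():
    await asyncio.gather(firstWorker(), secondWorker())

asyncio.run(main())
# prints: secondWorker done, then subWorker done, then firstWorker done;
# one thread interleaves all three coroutines at their suspension points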
This is a little old now, but I've found the visualization from the gevent docs (about 1 screen down, beneath "Synchronous & Asynchronous Execution") to be helpful while teaching asynchronous flow control to coworkers: http://sdiehl.github.io/gevent-tutorial/
The most important point here is that only one coroutine is running at any one time, even though many may be in process.