From the documentation, if we want to implement a non-blocking delay we should use await asyncio.sleep(delay), because time.sleep(delay) is blocking in nature. But from what I understand, it is the await keyword that cedes control of the thread so it can do something else while the function following the keyword, i.e. the f() in await f(), finishes its work.
So if we need the await keyword for asyncio.sleep(delay) to be non-blocking, what is the difference from its counterpart time.sleep(delay) when neither is awaited?
Also, can't we reach the same result by preceding both sleep functions with an await keyword?
From an answer to a somewhat similar topic:
The function asyncio.sleep simply registers a future to be called in x seconds while time.sleep suspends the execution for x seconds.
So the execution of the coroutine that calls await asyncio.sleep() is suspended until the asyncio event loop wakes it up after the timer-expired event.
time.sleep(), however, literally blocks the current thread until the designated time has passed, leaving no chance to run the event loop or other tasks while waiting. It is this suspend-and-resume mechanism that makes concurrency possible despite everything running on a single thread.
As I understand it, that's the difference between the following:
counting X seconds with a stopwatch yourself
letting the clock tick and periodically checking whether X seconds have passed
You (a thread) probably can't do anything else while watching the stopwatch yourself, whereas in the latter case you're free to do other jobs between the periodic checks.
Also, you can't use synchronous functions with await.
From PEP 492, which introduced await and async:
await, similarly to yield from, suspends execution of read_data coroutine until db.fetch awaitable completes and returns the result data.
You can't suspend a normal subroutine; Python is an imperative language.
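To make the difference concrete, here is a small sketch (ticker, good_sleep, and bad_sleep are made-up helper names) that runs a second task alongside each kind of sleep:

```python
import asyncio
import time

async def ticker(results):
    # a second task that should keep running while the sleeper "sleeps"
    for _ in range(3):
        results.append('tick')
        await asyncio.sleep(0.05)

async def good_sleep(results):
    await asyncio.sleep(0.2)   # suspends this task; the ticker keeps running
    results.append('woke')

async def bad_sleep(results):
    time.sleep(0.2)            # blocks the whole event loop; no ticks can run
    results.append('woke')

async def run(sleeper):
    results = []
    await asyncio.gather(sleeper(results), ticker(results))
    return results

good = asyncio.run(run(good_sleep))   # ticks happen first, then 'woke'
bad = asyncio.run(run(bad_sleep))     # 'woke' first: nothing ran during the block
```

With asyncio.sleep the ticker gets all three ticks in before the sleeper wakes; with time.sleep the whole loop is frozen until 'woke' is appended.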
Also, can't we reach the same result by preceding both sleep functions with an await keyword?
No, that's the thing. time.sleep() blocks. You call it, nothing happens for a while, and your program resumes when the function returns. You can await at that point all you want; there's nothing to await.
asyncio.sleep() returns immediately, and it returns an awaitable. When you await that awaitable, the event loop will transfer control to another async function (if there is one), and crucially it will be notified when the awaitable is done, and resume the awaiting function at that point.
To illustrate what an awaitable is:
foo = asyncio.sleep(2)
print('Still active')
await foo
print('After sleep')
An awaitable is an object that can be awaited. You aren't awaiting the function call, you're awaiting its return value. There's no such return value with time.sleep.
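A runnable version of that illustration, wrapped in a coroutine so the awaits are legal (the 0.2 second delay is arbitrary):

```python
import asyncio
import time

async def main():
    start = time.monotonic()
    foo = asyncio.sleep(0.2)   # returns an awaitable immediately; no waiting yet
    print('Still active')      # runs right away
    await foo                  # only now does the actual sleeping happen
    print('After sleep')
    return time.monotonic() - start

elapsed = asyncio.run(main())
```

'Still active' prints before any delay occurs; the roughly 0.2 seconds elapse only once the awaitable is actually awaited.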
Related
Let's say we have
await async_function_one_with_large_IO_request()
await async_function_two_with_large_IO_request()
versus
asyncio.gather(
    async_function_one_with_large_IO_request(),
    async_function_two_with_large_IO_request())
In the first version, once we hit the 'large IO request' part of function one, it's gonna move on to running function_two, that's the whole point of await, right?
Isn't that what version 2 with gather does too?
What's the performance difference between the two?
In the first version, once we hit the 'large IO request' part of function one, it's gonna move on to running function_two, that's the whole point of await, right?
That's incorrect. In your first version, async_function_two_with_large_IO_request (which I will call function_two) won't run until async_function_one_with_large_IO_request (which I will call function_one) completes.
If function_one happens to await on another function, it will yield control to another running async task, but function_two hasn't been scheduled yet.
When you use asyncio.gather, the tasks are scheduled concurrently, so if function_one awaits on something, function_two has a chance to run (along with other async tasks).
Note that asyncio.gather creates an async task, which generally implies you have to await on it:
await asyncio.gather(...)
The Python documentation covers this in detail in Coroutines and Tasks.
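The timing difference can be sketched with asyncio.sleep standing in for the large I/O request (io_task and timed are illustrative names, not from the question):

```python
import asyncio
import time

async def io_task(delay):
    # stand-in for an async function doing a large I/O request
    await asyncio.sleep(delay)

async def sequential():
    await io_task(0.2)   # the second request starts only after the first finishes
    await io_task(0.2)

async def concurrent():
    # both requests are scheduled as tasks and wait simultaneously
    await asyncio.gather(io_task(0.2), io_task(0.2))

async def timed(coro):
    start = time.monotonic()
    await coro
    return time.monotonic() - start

seq_elapsed = asyncio.run(timed(sequential()))    # about 0.4 s
conc_elapsed = asyncio.run(timed(concurrent()))   # about 0.2 s
```

The sequential version pays both delays back to back; gather overlaps them, so the total is roughly the longest single delay.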
It's my understanding that asyncio.gather is intended to run its arguments concurrently and also that when a coroutine executes an await expression it provides an opportunity for the event loop to schedule other tasks. With that in mind, I was surprised to see that the following snippet ignores one of the inputs to asyncio.gather.
import asyncio

async def aprint(s):
    print(s)

async def forever(s):
    while True:
        await aprint(s)

async def main():
    await asyncio.gather(forever('a'), forever('b'))

asyncio.run(main())
As I understand it, the following things happen:
asyncio.run(main()) does any necessary global initialization of the event loop and schedules main() for execution.
main() schedules asyncio.gather(...) for execution and waits for its result
asyncio.gather schedules the executions of forever('a') and forever('b')
whichever of those executes first, it immediately awaits aprint() and gives the scheduler the opportunity to run another coroutine if desired (e.g. if we start with 'a' then we have a chance to start evaluating 'b', which should already be scheduled for execution).
In the output we'll see a stream of lines each containing 'a' or 'b', and the scheduler ought to be fair enough that we see at least one of each over a long enough period of time.
In practice this isn't what I observe. Instead, the entire program is equivalent to while True: print('a'). What I found extremely interesting is that even minor changes to the code seem to reintroduce fairness. E.g., if we instead have the following code then we get a roughly equal mix of 'a' and 'b' in the output.
async def forever(s):
    while True:
        await aprint(s)
        await asyncio.sleep(1.)
To verify that this doesn't have anything to do with how long we spend in vs. out of the infinite loop, I found that the following change also provides fairness.
async def forever(s):
    while True:
        await aprint(s)
        await asyncio.sleep(0.)
Does anyone know why this unfairness might happen and how to avoid it? I suppose when in doubt I could proactively add an empty sleep statement everywhere and hope that suffices, but it's incredibly non-obvious to me why the original code doesn't behave as expected.
In case it matters since asyncio seems to have gone through quite a few API changes, I'm using a vanilla installation of Python 3.8.4 on an Ubuntu box.
whichever of those executes first, it immediately awaits aprint() and gives the scheduler the opportunity to run another coroutine if desired
This part is a common misconception. Python's await doesn't mean "yield control to the event loop", it means "start executing the awaitable, allowing it to suspend us along with it". So yes, if the awaited object chooses to suspend, the current coroutine will suspend as well, and so will the coroutine that awaits it and so on, all the way to the event loop. But if the awaited object doesn't choose to suspend, as is the case with aprint, neither will the coroutine that awaits it. This is occasionally a source of bugs, as seen here or here.
Does anyone know why this unfairness might happen and how to avoid it?
Fortunately this effect is most pronounced in toy examples that don't really communicate with the outside world. And although you can fix them by adding await asyncio.sleep(0) to strategic places (which is even documented to force a context switch), you probably shouldn't do that in production code.
A real program will depend on input from the outside world, be it data coming from the network, from a local database, or from a work queue populated by another thread or process. Actual data will rarely arrive so fast to starve the rest of the program, and if it does, the starvation will likely be temporary because the program will eventually suspend due to backpressure from its output side. In the rare possibility that the program receives data from one source faster than it can process it, but still needs to observe data coming from another source, you could have a starvation issue, but that can be fixed with forced context switches if it is ever shown to occur. (I haven't heard of anyone encountering it in production.)
Aside from bugs mentioned above, what happens much more often is that a coroutine invokes CPU-heavy or legacy blocking code, and that ends up hogging the event loop. Such situations should be handled by passing the CPU/blocking part to run_in_executor.
I would like to draw attention to PEP 492, which says:
await, similarly to yield from, suspends execution of [...] coroutine until [...] awaitable completes and returns the result data.
It uses the yield from implementation with an extra step of validating its argument.
Any yield from chain of calls ends with a yield. This is a fundamental mechanism of how Futures are implemented. Since, internally, coroutines are a special kind of generator, every await is suspended by a yield somewhere down the chain of await calls (please refer to PEP 3156 for a detailed explanation).
But in your case async def aprint() does not yield; that is, it does not call any event function like I/O, or even await sleep(0), which, if we look at its source code, just does a yield:
@types.coroutine
def __sleep0():
    """Skip one event loop run cycle.

    This is a private helper for 'asyncio.sleep()', used
    when the 'delay' is set to 0. It uses a bare 'yield'
    expression (which Task.__step knows how to handle)
    instead of creating a Future object.
    """
    yield

async def sleep(delay, result=None, *, loop=None):
    """Coroutine that completes after a given time (in seconds)."""
    if delay <= 0:
        await __sleep0()
        return result
    ...
Thus, because of the while True: in forever, you end up with a chain of yield from calls that never reaches a yield.
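A bounded variant of the forever example (with illustrative names) shows this directly: without a yield anywhere in the chain, the first task runs to completion before the second ever starts, while a bare yield via asyncio.sleep(0) restores the interleaving:

```python
import asyncio

order_no_yield = []
order_with_yield = []

async def noop(tag, order):
    order.append(tag)   # awaiting this never reaches a yield

async def busy(tag, n, order, do_yield):
    for _ in range(n):
        await noop(tag, order)
        if do_yield:
            await asyncio.sleep(0)   # bare yield: hands control to the loop

async def main(do_yield, order):
    await asyncio.gather(busy('a', 3, order, do_yield),
                         busy('b', 3, order, do_yield))

asyncio.run(main(False, order_no_yield))   # ['a', 'a', 'a', 'b', 'b', 'b']
asyncio.run(main(True, order_with_yield))  # the two tasks now alternate
```

In the first run task 'a' monopolizes the loop for its whole lifetime; in the second run every sleep(0) lets the other task take a step.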
Using Python 3.6.8
import asyncio
import time

async def sleeper():
    time.sleep(2)

async def asyncio_sleeper():
    await asyncio.sleep(2)

await asyncio.wait_for(sleeper(), 1)
await asyncio.wait_for(asyncio_sleeper(), 1)
Using time.sleep does NOT time out; asyncio.sleep does.
My intuition was that calling wait_for on a coroutine would base its timeout on how long the coroutine takes, not based on the individual async calls within the coroutine.
What is going on behind the scenes that results in this behavior, and is there a way to modify behavior to match my intuition?
What is going on behind the scenes that results in this behavior
The simplest answer is that asyncio is based on cooperative multitasking, and time.sleep doesn't cooperate. time.sleep(2) blocks the thread for two seconds, the event loop and all, and there is nothing anyone can do about it.
On the other hand, asyncio.sleep is carefully written so that when you await asyncio.sleep(2), it immediately suspends the current task and arranges with the event loop to resume it 2 seconds later. Asyncio's "sleeping" is a suspension, which allows the event loop to proceed with other tasks while the coroutine waits. The same suspension mechanism allows wait_for to cancel the task, which the event loop accomplishes by resuming it in such a way that the await where it was suspended raises an exception.
In general, a coroutine that doesn't await anything is a good indication that it's incorrectly written and is a coroutine in name only. Awaits are the reason coroutines exist, and sleeper doesn't contain any.
is there a way to modify behavior to match my intuition?
If you must call legacy blocking code from asyncio, use run_in_executor. You have to tell asyncio to hand the blocking call to a worker thread and await its completion, like this:
async def sleeper():
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, time.sleep, 2)
time.sleep (or other blocking function) will be handed off to a separate thread, and sleeper will get suspended, to be resumed when time.sleep is done. Unlike with asyncio.sleep(), the blocking time.sleep(2) will still get called and block its thread for 2 seconds, but that will not affect the event loop, which will go about its business similar to how it did when await asyncio.sleep() was used.
Note that cancelling a coroutine that awaits run_in_executor will only cancel the waiting for the blocking time.sleep(2) to complete in the other thread. The blocking call will continue executing until completion, which is to be expected since there is no general mechanism to interrupt it.
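Putting this together, a sketch of the fixed sleeper shows wait_for timing out after 1 second even though the blocking call takes 2 (the timing bounds are deliberately loose):

```python
import asyncio
import time

async def sleeper():
    # hand the blocking call off to a worker thread; the event loop stays free
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, time.sleep, 2)

async def main():
    start = time.monotonic()
    try:
        await asyncio.wait_for(sleeper(), 1)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return timed_out, time.monotonic() - start

timed_out, elapsed = asyncio.run(main())   # times out after about 1 second
```

The coroutine is genuinely suspended while the worker thread sleeps, so wait_for can cancel it at the 1 second mark; the thread itself keeps sleeping to completion, as noted above.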
There are two ways to implement intermediate functions that sit between the creation of an Awaitable and the site where the object is awaited. One implementation makes the intermediate function a coroutine, in which the Awaitable is awaited; the user then awaits that coroutine. The alternative is to write the intermediate step as a plain function that creates the Awaitable and returns it to the site where the await will happen. An example is shown below.
import asyncio
from typing import Awaitable

async def wait1() -> None:
    await asyncio.sleep(1)

def wait2() -> Awaitable[None]:
    return asyncio.sleep(1)

async def main():
    await wait1()
    await wait2()

asyncio.run(main())
What are the differences between these two implementations? The pros and cons? Performance differences? Are there any behavioral differences? I am in the position of making a lot of these functions and want to do the "right" thing.
As currently used, wait1 and wait2 are functionally equivalent. When called, both return a coroutine object, i.e. the result of calling an async (coroutine) function. Differences arise if:
the functions have side effects, and
the awaitable they return is not awaited immediately
In that case, for wait1 the side effect will occur only after it has been awaited, and for wait2 it will occur as soon as it's called.
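A minimal sketch of that timing difference, using a list append as the observable side effect (the names mirror the question's wait1/wait2):

```python
import asyncio

events = []  # records the order in which side effects fire

async def wait1():
    events.append('wait1')   # coroutine: side effect deferred until awaited
    await asyncio.sleep(0)

def wait2():
    events.append('wait2')   # plain function: side effect fires immediately
    return asyncio.sleep(0)

async def main():
    a = wait1()   # coroutine object created; 'wait1' not recorded yet
    b = wait2()   # 'wait2' recorded here, before any await
    await a       # 'wait1' recorded only now
    await b

asyncio.run(main())   # events == ['wait2', 'wait1']
```

Even though wait1() is called first, its side effect lands second, because nothing in a coroutine body runs until the await.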
In most cases this makes no difference, but sometimes it can be observable. For example, run_in_executor couldn't be implemented as a coroutine: the current implementation is a function that first submits the received callable to the executor and then creates and returns an asyncio future that wraps the underlying concurrent future. This allows the API user to write:
# I don't care about the result, just submit it
loop.run_in_executor(None, my_callback)
If run_in_executor were a coroutine, this would be a no-op: nothing would get submitted until you either awaited it or passed it to create_task to get it run.
I am in the position of making a lot of these functions and want to do the "right" thing.
I would use coroutines where possible, simply for clarity. It's fine to rely on the equivalence between wait1 and wait2 in places where you can't use coroutines, such as lambda expressions.
Hi, I am new to asyncio and the concept of event loops (non-blocking I/O).
import asyncio

async def subWorker():
    ...

async def firstWorker():
    await subWorker()

async def secondWorker():
    await asyncio.sleep(1)

loop = asyncio.get_event_loop()
asyncio.ensure_future(firstWorker())
asyncio.ensure_future(secondWorker())
loop.run_forever()
Here, when the code starts, firstWorker() is executed and paused when it encounters await subWorker(). While firstWorker() is waiting, secondWorker() gets started.
My question is: when firstWorker() encounters await subWorker() and gets paused, the computer will then execute subWorker() and secondWorker() at the same time. Since the program has only one thread now, I guess the single thread does secondWorker()'s work. Then who executes subWorker()? If a single thread can only do one thing at a time, who else does the other jobs?
The assumption that subWorker and secondWorker execute at the same time is false.
The fact that secondWorker simply sleeps means that the available time will be spent in subWorker.
asyncio by definition is single-threaded; see the documentation:
This module provides infrastructure for writing single-threaded concurrent code
The event loop executes one task at a time, switching, for example, when one task blocks waiting for I/O or, as here, voluntarily sleeps.
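This can be checked directly by recording the thread id inside each coroutine from the question (threading.get_ident returns the current thread's id):

```python
import asyncio
import threading

thread_ids = set()

async def subWorker():
    thread_ids.add(threading.get_ident())
    await asyncio.sleep(0.01)

async def firstWorker():
    thread_ids.add(threading.get_ident())
    await subWorker()

async def secondWorker():
    thread_ids.add(threading.get_ident())
    await asyncio.sleep(0.01)

async def main():
    await asyncio.gather(firstWorker(), secondWorker())

asyncio.run(main())   # all three coroutines ran on one and the same thread
```

The set ends up containing a single id: the event loop runs every coroutine on its own thread, merely switching between them at await points.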
This is a little old now, but I've found the visualization from the gevent docs (about 1 screen down, beneath "Synchronous & Asynchronous Execution") to be helpful while teaching asynchronous flow control to coworkers: http://sdiehl.github.io/gevent-tutorial/
The most important point here is that only one coroutine is running at any one time, even though many may be in process.