I'm studying Python's asyncio library and i'm having a hard time understanding how some Python functions work.
For example:
import asyncio
async def coroutine():
print('Initialize coroutine')
await asyncio.sleep(1)
print('Done !')
loop = asyncio.get_event_loop()
#asyncio.ensure_future(coroutine())
#loop.create_task(coroutine())
loop.run_forever()
For example in the code if I make use of ensure_future it automatically starts the execution of my task, so does create_task, but in the case of create task I even understand that it uses the loop used in the loop.create_task method call. My question is because using ensure_future it already starts executing the task, and if it does this because in some case I see codes like this:
import asyncio
async def coroutine():
print('Initialize coroutine')
await asyncio.sleep(1)
print('Done !')
loop = asyncio.get_event_loop()
task = asyncio.ensure_future(coroutine())
loop.run_until_complete(asyncio.wait([task]))
My question is because using ensure_future it already starts executing the task, and if it does this
Yes, ensure_future starts executing the coroutine as soon as the event loop is resumed, even if no one awaits the returned future.
This feature of ensure_future follows from the implementation, but it also makes sense conceptually.
On the implementation level, it is a consequence of ensure_future internally calling create_task when given a coroutine object. In other words, for coroutines there is no difference between loop.create_task and asyncio.ensure_future. Otherwise the difference is that ensure_future is more general: it takes any awaitable object, and converts it to a Future. If it receives a coroutine object, create_task performs the appropriate conversion and the new Task (itself a subclass of Future) is returned. If the argument to ensure_future is already a Future or its subclass, it is returned unchanged.
On the conceptual level, ensure_future returns an object that represents the "future" value of a computation, i.e. it encapsulates the result of execution that happens in the background. In case of a corroutine, running in the background means being scheduled with the event loop. Without ensuring the coroutine's execution, returning a Future would be misleading because that future would never arrive.
Related
I'm trying to understand how asyncio works. As for I/O operation i got understand that when await was called, we register Future object in EventLoop, and then calling epoll for get sockets which belongs to Future objects, that ready for give us data. After we run registred callback and resume function execution.
But, the thing that i cant understant, what's happening if we use await not for I/O operation. How eventloop understands that task is complete? Is it create socket for that or use another kind of loop? Is it use epoll? Or doesnt it add to Loop and used it as generator?
There is an example:
import asyncio
async def test():
return 10
async def my_coro(delay):
loop = asyncio.get_running_loop()
end_time = loop.time() + delay
while True:
print("Blocking...")
await test()
if loop.time() > end_time:
print("Done.")
break
async def main():
await my_coro(3.0)
asyncio.run(main())
await doesn't automatically yield to the event loop, that happens only when an async function (anywhere in the chain of awaits) requests suspension, typically due to IO or timeout not being ready.
In your example the event loop is never returned to, which you can easily verify by moving the "Blocking" print before the while loop and changing main to await asyncio.gather(my_coro(3.0), my_coro(3.0)). What you'll observe is that the coroutines are executed in series ("blocking" followed by "done", all repeated twice), not in parallel ("blocking" followed by another "blocking" and then twice "done"). The reason for that was that there was simply no opportunity for a context switch - my_coro executed in one go as if they were an ordinary function because none of its awaits ever chose to suspend.
I'd like to know what guarantees python gives around when a event loop will switch tasks.
As I understand it async / await are significantly different from threads in that the event loop does not switch task based on time slicing, meaning that unless the task yields (await), it will carry on indefinitely. This can actually be useful because it is easier to manage critical sections under asyncio than with threading.
What I'm less clear about is something like the following:
async def caller():
while True:
await callee()
async def callee():
pass
In this example caller is repeatedly await. So technically it is yielding. But I'm not clear on whether this will allow other tasks on the event loop to execute because it only yields to callee and that is never yielding.
That is if I awaited callee inside a "critical section" even though I know it won't block, am I at risk of something else unexpected happening?
You are right to be wary. caller yields from callee, and yields to the event loop. Then the event loop decides which task to resume. Other tasks may (hopefully) be squeezed in between the calls to callee. callee needs to await an actual blocking Awaitable such as asyncio.Future or asyncio.sleep(), not a coroutine, otherwise the control will not be returned to the event loop until caller returns.
For example, the following code will finish the caller2 task before it starts working on the caller1 task. Because callee2 is essentially a sync function without awaiting a blocking I/O operations, therefore, no suspension point is created and caller2 will resume immediately after each call to callee2.
import asyncio
import time
async def caller1():
for i in range(5):
await callee1()
async def callee1():
await asyncio.sleep(1)
print(f"called at {time.strftime('%X')}")
async def caller2():
for i in range(5):
await callee2()
async def callee2():
time.sleep(1)
print(f"sync called at {time.strftime('%X')}")
async def main():
task1 = asyncio.create_task(caller1())
task2 = asyncio.create_task(caller2())
await task1
await task2
asyncio.run(main())
Result:
sync called at 19:23:39
sync called at 19:23:40
sync called at 19:23:41
sync called at 19:23:42
sync called at 19:23:43
called at 19:23:43
called at 19:23:44
called at 19:23:45
called at 19:23:46
called at 19:23:47
But if callee2 awaits as the following, the task switching will happen even if it awaits asyncio.sleep(0), and the tasks will run concurrently.
async def callee2():
await asyncio.sleep(1)
print('sync called')
Result:
called at 19:22:52
sync called at 19:22:52
called at 19:22:53
sync called at 19:22:53
called at 19:22:54
sync called at 19:22:54
called at 19:22:55
sync called at 19:22:55
called at 19:22:56
sync called at 19:22:56
This behavior is not necessarily intuitive, but it makes sense considering that asyncio was made to handle I/O operations and networking concurrently, not the usual synchronous python codes.
Another thing to note is: This still works if the callee awaits a coroutine that, in turn, awaits a asyncio.Future, asyncio.sleep(), or another coroutine that await one of those things down the chain. The flow control will be returned to the event loop when the blocking Awaitable is awaited. So the following also works.
async def callee2():
await inner_callee()
print(f"sync called at {time.strftime('%X')}")
async def inner_callee():
await asyncio.sleep(1)
TLDR: No. Coroutines and their respective keywords (await, async with, async for) only enable suspension. Whether suspension occurs depends on the framework used, if at all.
Third-party async functions / iterators / context managers can act as
checkpoints; if you see await <something> or one of its friends, then
that might be a checkpoint. So to be safe, you should prepare for
scheduling or cancellation happening there.
[Trio documentation]
The await syntax of Python is syntactic sugar around two fundamental mechanisms: yield to temporarily suspend with a value, and return to permanently exit with a value. These are the same that, say, a generator function coroutine can use:
def gencoroutine():
for i in range(5):
yield i # temporarily suspend
return 5 # permanently exit
Notably, return does not imply a suspension. It is possible for a generator coroutine to never yield at all.
The await keyword (and its sibling yield from) interacts with both the yield and return mechanism:
If its target yields, await "passes on" the suspension to its own caller. This allows to suspend an entire stack of coroutines that all await each other.
If its target returnss, await catches the return value and provides it to its own coroutine. This allows to return a value directly to a "caller", without suspension.
This means that await does not guarantee that a suspension occurs. It is up to the target of await to trigger a suspension.
By itself, an async def coroutine can only return without suspension, and await to allow suspension. It cannot suspend by itself (yield does not suspend to the event loop).
async def unyielding():
return 2 # or `pass`
This means that await of just coroutines does never suspend. Only specific awaitables are able to suspend.
Suspension is only possible for awaitables with a custom __await__ method. These can yield directly to the event loop.
class YieldToLoop:
def __await__(self):
yield # to event loop
return # to awaiter
This means that await, directly or indirectly, of a framework's awaitable will suspend.
The exact semantics of suspending depend on the async framework in use. For example, whether a sleep(0) triggers a suspension or not, or which coroutine to run instead, is up to the framework. This also extends to async iterators and context managers -- for example, many async context managers will suspend either on enter or exit but not both.
Trio
If you call an async function provided by Trio (await <something in trio>), and it doesn’t raise an exception, then it always acts as a checkpoint. (If it does raise an exception, it might act as a checkpoint or might not.)
Asyncio
sleep() always suspends the current task, allowing other tasks to run.
I was wondering what exactly happens when we await a coroutine in async Python code, for example:
await send_message(string)
(1) send_message is added to the event loop, and the calling coroutine gives up control to the event loop, or
(2) We jump directly into send_message
Most explanations I read point to (1), as they describe the calling coroutine as exiting. But my own experiments suggest (2) is the case: I tried to have a coroutine run after the caller but before the callee and could not achieve this.
Disclaimer: Open to correction (particularly as to details and correct terminology) since I arrived here looking for the answer to this myself. Nevertheless, the research below points to a pretty decisive "main point" conclusion:
Correct OP answer: No, await (per se) does not yield to the event loop, yield yields to the event loop, hence for the case given: "(2) We jump directly into send_message". In particular, certain yield expressions are the only points, at bottom, where async tasks can actually be switched out (in terms of nailing down the precise spot where Python code execution can be suspended).
To be proven and demonstrated: 1) by theory/documentation, 2) by implementation code, 3) by example.
By theory/documentation
PEP 492: Coroutines with async and await syntax
While the PEP is not tied to any specific Event Loop implementation, it is relevant only to the kind of coroutine that uses yield as a signal to the scheduler, indicating that the coroutine will be waiting until an event (such as IO) is completed. ...
[await] uses the yield from implementation [with an extra step of validating its argument.] ...
Any yield from chain of calls ends with a yield. This is a fundamental mechanism of how Futures are implemented. Since, internally, coroutines are a special kind of generators, every await is suspended by a yield somewhere down the chain of await calls (please refer to PEP 3156 for a detailed explanation). ...
Coroutines are based on generators internally, thus they share the implementation. Similarly to generator objects, coroutines have throw(), send() and close() methods. ...
The vision behind existing generator-based coroutines and this proposal is to make it easy for users to see where the code might be suspended.
In context, "easy for users to see where the code might be suspended" seems to refer to the fact that in synchronous code yield is the place where execution can be "suspended" within a routine allowing other code to run, and that principle now extends perfectly to the async context wherein a yield (if its value is not consumed within the running task but is propagated up to the scheduler) is the "signal to the scheduler" to switch out tasks.
More succinctly: where does a generator yield control? At a yield. Coroutines (including those using async and await syntax) are generators, hence likewise.
And it is not merely an analogy, in implementation (see below) the actual mechanism by which a task gets "into" and "out of" coroutines is not anything new, magical, or unique to the async world, but simply by calling the coro's <generator>.send() method. That was (as I understand the text) part of the "vision" behind PEP 492: async and await would provide no novel mechanism for code suspension but just pour async-sugar on Python's already well-beloved and powerful generators.
And
PEP 3156: The "asyncio" module
The loop.slow_callback_duration attribute controls the maximum execution time allowed between two yield points before a slow callback is reported [emphasis in original].
That is, an uninterrupted segment of code (from the async perspective) is demarcated as that between two successive yield points (whose values reached up to the running Task level (via an await/yield from tunnel) without being consumed within it).
And this:
The scheduler has no public interface. You interact with it by using yield from future and yield from task.
Objection: "That says 'yield from', but you're trying to argue that the task can only switch out at a yield itself! yield from and yield are different things, my friend, and yield from itself doesn't suspend code!"
Ans: Not a contradiction. The PEP is saying you interact with the scheduler by using yield from future/task. But as noted above in PEP 492, any chain of yield from (~aka await) ultimately reaches a yield (the "bottom turtle"). In particular (see below), yield from future does in fact yield that same future after some wrapper work, and that yield is the actual "switch out point" where another task takes over. But it is incorrect for your code to directly yield a Future up to the current Task because you would bypass the necessary wrapper.
The objection having been answered, and its practical coding considerations being noted, the point I wish to make from the above quote remains: that a suitable yield in Python async code is ultimately the one thing which, having suspended code execution in the standard way that any other yield would do, now futher engages the scheduler to bring about a possible task switch.
By implementation code
asyncio/futures.py
class Future:
...
def __await__(self):
if not self.done():
self._asyncio_future_blocking = True
yield self # This tells Task to wait for completion.
if not self.done():
raise RuntimeError("await wasn't used with future")
return self.result() # May raise too.
__iter__ = __await__ # make compatible with 'yield from'.
Paraphrase: The line yield self is what tells the running task to sit out for now and let other tasks run, coming back to this one sometime after self is done.
Almost all of your awaitables in asyncio world are (multiple layers of) wrappers around a Future. The event loop remains utterly blind to all higher level await awaitable expressions until the code execution trickles down to an await future or yield from future and then (as seen here) calls yield self, which yielded self is then "caught" by none other than the Task under which the present coroutine stack is running thereby signaling to the task to take a break.
Possibly the one and only exception to the above "code suspends at yield self within await future" rule, in an asyncio context, is the potential use of a bare yield such as in asyncio.sleep(0). And since the sleep function is a topic of discourse in the comments of this post, let's look at that.
asyncio/tasks.py
#types.coroutine
def __sleep0():
"""Skip one event loop run cycle.
This is a private helper for 'asyncio.sleep()', used
when the 'delay' is set to 0. It uses a bare 'yield'
expression (which Task.__step knows how to handle)
instead of creating a Future object.
"""
yield
async def sleep(delay, result=None, *, loop=None):
"""Coroutine that completes after a given time (in seconds)."""
if delay <= 0:
await __sleep0()
return result
if loop is None:
loop = events.get_running_loop()
else:
warnings.warn("The loop argument is deprecated since Python 3.8, "
"and scheduled for removal in Python 3.10.",
DeprecationWarning, stacklevel=2)
future = loop.create_future()
h = loop.call_later(delay,
futures._set_result_unless_cancelled,
future, result)
try:
return await future
finally:
h.cancel()
Note: We have here the two interesting cases at which control can shift to the scheduler:
(1) The bare yield in __sleep0 (when called via an await).
(2) The yield self immediately within await future.
The crucial line (for our purposes) in asyncio/tasks.py is when Task._step runs its top-level coroutine via result = self._coro.send(None) and recognizes fourish cases:
(1) result = None is generated by the coro (which, again, is a generator): the task "relinquishes control for one event loop iteration".
(2) result = future is generated within the coro, with further magic member field evidence that the future was yielded in a proper manner from out of Future.__iter__ == Future.__await__: the task relinquishes control to the event loop until the future is complete.
(3) A StopIteration is raised by the coro indicating the coroutine completed (i.e. as a generator it exhausted all its yields): the final result of the task (which is itself a Future) is set to the coroutine return value.
(4) Any other Exception occurs: the task's set_exception is set accordingly.
Modulo details, the main point for our concern is that coroutine segments in an asyncio event loop ultimately run via coro.send(). Initial startup and final termination aside, send() proceeds precisely from the last yield value it generated to the next one.
By example
import asyncio
import types
def task_print(s):
print(f"{asyncio.current_task().get_name()}: {s}")
async def other_task(s):
task_print(s)
class AwaitableCls:
def __await__(self):
task_print(" 'Jumped straight into' another `await`; the act of `await awaitable` *itself* doesn't 'pause' anything")
yield
task_print(" We're back to our awaitable object because that other task completed")
asyncio.create_task(other_task("The event loop gets control when `yield` points (from an iterable coroutine) propagate up to the `current_task` through a suitable chain of `await` or `yield from` statements"))
async def coro():
task_print(" 'Jumped straight into' coro; the `await` keyword itself does nothing to 'pause' the current_task")
await AwaitableCls()
task_print(" 'Jumped straight back into' coro; we have another pending task, but leaving an `__await__` doesn't 'pause' the task any more than entering the `__await__` does")
#types.coroutine
def iterable_coro(context):
task_print(f"`{context} iterable_coro`: pre-yield")
yield None # None or a Future object are the only legitimate yields to the task in asyncio
task_print(f"`{context} iterable_coro`: post-yield")
async def original_task():
asyncio.create_task(other_task("Aha, but a (suitably unconsumed) *`yield`* DOES 'pause' the current_task allowing the event scheduler to `_wakeup` another task"))
task_print("Original task")
await coro()
task_print("'Jumped straight out of' coro. Leaving a coro, as with leaving/entering any awaitable, doesn't give control to the event loop")
res = await iterable_coro("await")
assert res is None
asyncio.create_task(other_task("This doesn't run until the very end because the generated None following the creation of this task is consumed by the `for` loop"))
for y in iterable_coro("for y in"):
task_print(f"But 'ordinary' `yield` points (those which are consumed by the `current_task` itself) behave as ordinary without relinquishing control at the async/task-level; `y={y}`")
task_print("Done with original task")
asyncio.get_event_loop().run_until_complete(original_task())
run in python3.8 produces
Task-1: Original task
Task-1: 'Jumped straight into' coro; the await keyword itself does nothing to 'pause' the current_task
Task-1: 'Jumped straight into' another await; the act of await awaitable itself doesn't 'pause' anything
Task-2: Aha, but a (suitably unconsumed) yield DOES 'pause' the current_task allowing the event scheduler to _wakeup another task
Task-1: We're back to our awaitable object because that other task completed
Task-1: 'Jumped straight back into' coro; we have another pending task, but leaving an __await__ doesn't 'pause' the task any more than entering the __await__ does
Task-1: 'Jumped straight out of' coro. Leaving a coro, as with leaving/entering any awaitable, doesn't give control to the event loop
Task-1: await iterable_coro: pre-yield
Task-3: The event loop gets control when yield points (from an iterable coroutine) propagate up to the current_task through a suitable chain of await or yield from statements
Task-1: await iterable_coro: post-yield
Task-1: for y in iterable_coro: pre-yield
Task-1: But 'ordinary' yield points (those which are consumed by the current_task itself) behave as ordinary without relinquishing control at the async/task-level; y=None
Task-1: for y in iterable_coro: post-yield
Task-1: Done with original task
Task-4: This doesn't run until the very end because the generated None following the creation of this task is consumed by the for loop
Indeed, exercises such as the following can help one's mind to decouple the functionality of async/await from notion of "event loops" and such. The former is conducive to nice implementations and usages of the latter, but you can use async and await just as specially syntaxed generator stuff without any "loop" (whether asyncio or otherwise) whatsoever:
import types # no asyncio, nor any other loop framework
async def f1():
print(1)
print(await f2(),'= await f2()')
return 8
#types.coroutine
def f2():
print(2)
print((yield 3),'= yield 3')
return 7
class F3:
def __await__(self):
print(4)
print((yield 5),'= yield 5')
print(10)
return 11
task1 = f1()
task2 = F3().__await__()
""" You could say calls to send() represent our
"manual task management" in this script.
"""
print(task1.send(None), '= task1.send(None)')
print(task2.send(None), '= task2.send(None)')
try:
print(task1.send(6), 'try task1.send(6)')
except StopIteration as e:
print(e.value, '= except task1.send(6)')
try:
print(task2.send(9), 'try task2.send(9)')
except StopIteration as e:
print(e.value, '= except task2.send(9)')
produces
1
2
3 = task1.send(None)
4
5 = task2.send(None)
6 = yield 3
7 = await f2()
8 = except task1.send(6)
9 = yield 5
10
11 = except task2.send(9)
Yes, await passes control back to the asyncio eventloop, and allows it to schedule other async functions.
Another way is
await asyncio.sleep(0)
PEP 0492 adds the async keyword to Python 3.5.
How does Python benefit from the use of this operator? The example that is given for a coroutine is
async def read_data(db):
data = await db.fetch('SELECT ...')
According to the docs this achieves
suspend[ing] execution of read_data coroutine until db.fetch awaitable completes and returns the result data.
Does this async keyword actually involve creation of new threads or perhaps the use of an existing reserved async thread?
In the event that async does use a reserved thread, is it a single shared thread each in their own?
No, co-routines do not involve any kind of threads. Co-routines allow for cooperative multi-tasking in that each co-routine yields control voluntarily. Threads on the other hand switch between units at arbitrary points.
Up to Python 3.4, it was possible to write co-routines using generators; by using yield or yield from expressions in a function body you create a generator object instead, where code is only executed when you iterate over the generator. Together with additional event loop libraries (such as asyncio) you could write co-routines that would signal to an event loop that they were going to be busy (waiting for I/O perhaps) and that another co-routine could be run in the meantime:
import asyncio
import datetime
#asyncio.coroutine
def display_date(loop):
end_time = loop.time() + 5.0
while True:
print(datetime.datetime.now())
if (loop.time() + 1.0) >= end_time:
break
yield from asyncio.sleep(1)
Every time the above code advances to the yield from asyncio.sleep(1) line, the event loop is free to run a different co-routine, because this routine is not going to do anything for the next second anyway.
Because generators can be used for all sorts of tasks, not just co-routines, and because writing a co-routine using generator syntax can be confusing to new-comers, the PEP introduces new syntax that makes it clearer that you are writing a co-routine.
With the PEP implemented, the above sample could be written instead as:
async def display_date(loop):
end_time = loop.time() + 5.0
while True:
print(datetime.datetime.now())
if (loop.time() + 1.0) >= end_time:
break
await asyncio.sleep(1)
The resulting coroutine object still needs an event loop to drive the co-routines; an event loop would await on each co-routine in turn, which would execute those co-routines that are not currently awaiting for something to complete.
The advantages are that with native support, you can also introduce additional syntax to support asynchronous context managers and iterators. Entering and exiting a context manager, or looping over an iterator then can become more points in your co-routine that signal that other code can run instead because something is waiting again.
PEP 0492 adds the async keyword to Python 3.5.
How does Python benefit from the use of this operator? The example that is given for a coroutine is
async def read_data(db):
data = await db.fetch('SELECT ...')
According to the docs this achieves
suspend[ing] execution of read_data coroutine until db.fetch awaitable completes and returns the result data.
Does this async keyword actually involve creation of new threads or perhaps the use of an existing reserved async thread?
In the event that async does use a reserved thread, is it a single shared thread each in their own?
No, co-routines do not involve any kind of threads. Co-routines allow for cooperative multi-tasking in that each co-routine yields control voluntarily. Threads on the other hand switch between units at arbitrary points.
Up to Python 3.4, it was possible to write co-routines using generators; by using yield or yield from expressions in a function body you create a generator object instead, where code is only executed when you iterate over the generator. Together with additional event loop libraries (such as asyncio) you could write co-routines that would signal to an event loop that they were going to be busy (waiting for I/O perhaps) and that another co-routine could be run in the meantime:
import asyncio
import datetime
#asyncio.coroutine
def display_date(loop):
end_time = loop.time() + 5.0
while True:
print(datetime.datetime.now())
if (loop.time() + 1.0) >= end_time:
break
yield from asyncio.sleep(1)
Every time the above code advances to the yield from asyncio.sleep(1) line, the event loop is free to run a different co-routine, because this routine is not going to do anything for the next second anyway.
Because generators can be used for all sorts of tasks, not just co-routines, and because writing a co-routine using generator syntax can be confusing to new-comers, the PEP introduces new syntax that makes it clearer that you are writing a co-routine.
With the PEP implemented, the above sample could be written instead as:
async def display_date(loop):
end_time = loop.time() + 5.0
while True:
print(datetime.datetime.now())
if (loop.time() + 1.0) >= end_time:
break
await asyncio.sleep(1)
The resulting coroutine object still needs an event loop to drive the co-routines; an event loop would await on each co-routine in turn, which would execute those co-routines that are not currently awaiting for something to complete.
The advantages are that with native support, you can also introduce additional syntax to support asynchronous context managers and iterators. Entering and exiting a context manager, or looping over an iterator then can become more points in your co-routine that signal that other code can run instead because something is waiting again.