As in the following example, I encountered an unusual error when using async Generator.
async def demo():
async def get_data():
for i in range(5): # loop: for or while
await asyncio.sleep(1) # some IO code
yield i
datas = get_data()
await asyncio.gather(
anext(datas),
anext(datas),
anext(datas),
anext(datas),
anext(datas),
)
if __name__ == '__main__':
# asyncio.run(main())
asyncio.run(demo())
Console output:
2022-05-11 23:55:24,530 DEBUG asyncio 29180 30600 Using proactor: IocpProactor
Traceback (most recent call last):
File "E:\workspace\develop\python\crawlerstack-proxypool\demo.py", line 77, in <module>
asyncio.run(demo())
File "D:\devtools\Python310\lib\asyncio\runners.py", line 44, in run
return loop.run_until_complete(main)
File "D:\devtools\Python310\lib\asyncio\base_events.py", line 641, in run_until_complete
return future.result()
File "E:\workspace\develop\python\crawlerstack-proxypool\demo.py", line 66, in demo
await asyncio.gather(
RuntimeError: anext(): asynchronous generator is already running
Situation description: I have a loop logic that fetches a batch of data from Redis at a time, and I want to use yield to return the result. But this error occurs when I create a concurrent task.
Is there a good solution to this situation? I don't mean to change the way I'm using it now, but to see if I can tell if it's running or something like a lock and wait for it to run and then execute anext.
Maybe my logic is not reasonable now, but I also want to understand some critical language, let me realize the seriousness of this.
Thank you for your help.
TL;DR: the right way
Async generators suit badly for a parallel consumption. See my explanations below. As a proper workaround, use asyncio.Queue for the communication between producers and consumers:
queue = asyncio.Queue()
async def producer():
for item in range(5):
await asyncio.sleep(random.random()) # imitate async fetching
print('item fetched:', item)
await queue.put(item)
async def consumer():
while True:
item = await queue.get()
await asyncio.sleep(random.random()) # imitate async processing
print('item processed:', item)
await asyncio.gather(producer(), consumer(), consumer())
The above code snippet works well for an infinite stream of items: for example, a web server, which runs forever serving requests from clients. But what if we need to process a finite number of items? How should consumers know when to stop?
This deserves another question on Stack Overflow to cover all alternatives, but the simplest option is a sentinel approach, described below.
Sentinel: finite data streams approach
Introduce a sentinel = object(). When all items from an external data source are fetched and put to the queue, producer must push as many sentinels to the queue as many consumers you have. Once a consumer fetches the sentinel, it knows it should stop: if item is sentinel: break from loop.
sentinel = object()
consumers_count = 2
async def producer():
... # the same code as above
if new_item is None: # if no new data
for _ in range(consumers_count):
await queue.put(sentinel)
async def consumer():
while True:
... # the same code as above
if item is sentinel:
break
await asyncio.gather(
producer(),
*(consumer() for _ in range(consumers_count)),
)
TL;DR [2]: a dirty workaround
Since you require to not change your async generator approach, here is an asyncgen-based alternative. To resolve this issue (in a simple-yet-dirty way), you may wrap the source async generator with a lock:
async def with_lock(agen, lock: asyncio.Lock):
while True:
async with lock: # only one consumer is allowed to read
try:
yield await anext(agen)
except StopAsyncIteration:
break
lock = asyncio.Lock() # a common lock for all consumers
await asyncio.gather(
# every consumer must have its own "wrapped" generator
anext(with_lock(datas, lock)),
anext(with_lock(datas, lock)),
...
)
This will ensure only one consumer awaits for an item from the generator at a time. While this consumer awaits, other consumers are being executed, so parallelization is not lost.
A roughly equivalent code with async for (looks a little smarter):
async def with_lock(agen, lock: asyncio.Lock):
await lock.acquire()
async for item in agen:
lock.release()
yield item
await lock.acquire()
lock.release()
However, this code only handles async generator's anext method. Whereas generators API also includes aclose and athrow methods. See an explanation below.
Though, you may add support for these to the with_lock function too, I would recommend to either subclass a generator and handle the lock support inside, or better use the Queue-based approach from above.
See contextlib.aclosing for some inspiration.
Explanation
Both sync and async generators have a special attribute: .gi_running (for regular generators) and .ag_running (for async ones). You may discover them by executing dir on a generator:
>>> dir((i for i in range(0))
[..., 'gi_running', ...]
They are set to True when a generator's .__next__ or .__anext__ method is executed (next(...) and anext(...) are just a syntactic sugar for those).
This prevents re-executing next(...) on a generator, when another next(...) call on the same generator is already being executed: if the running flag is True, an exception is raised (for a sync generator it raises ValueError: generator already executing).
So, returning to your example, when you run await anext(datas) (via asyncio.gather), the following happens:
datas.ag_running is set to True.
An execution flow steps into the datas.__anext__ method.
Once an inner await statement is reached inside of the __anext__ method (await asyncio.sleep(1) in your case), asyncio's loop switches to another consumer.
Now, another consumer tries to call await anext(datas) too, but since datas.ag_running flag is still set to True, this results in a RuntimeError.
Why is this flag needed?
A generator's execution can be suspended and resumed. But only at yield statements. Thus, if a generator is paused at an inner await statement, it cannot be "resumed", because its state disallows it.
That's why a parallel next/anext call to a generator raises an exception: it is not ready to be resumed, it is already running.
athrow and aclose
Generators' API (both sync and async) includes not only send/asend method for iteration, but also:
close/aclose to release generator-allocated resources (e.g. a database connection) on exit or an exception
and throw/athrow to inform generator that it has to handle an exception.
aclose and athrow are async methods too. Which means that if two consumers try to close/throw an underlying generator in parallel, you will encounter the same issue since a generator will be closing (or handling an exception) while closed (thrown an exception) again.
Sync generators example
Though this is a frequent case for async generators, reproducing it for sync generators is not that naive, since sync next(...) calls are rarely interrupted.
One of the ways to interrupt a sync generator is to run a multithreaded code with multiple consumers (run in parallel threads) reading from a single generator. In that case, when the generator's code is interrupted while executing a next call, all other consumers' parallel attempts to call next will result in an exception.
Another way to achieve this is demonstrated in the generators-related PEP #255 via a self-consuming generator:
>>> def g():
... i = next(me)
... yield i
...
>>> me = g()
>>> next(me)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in g
ValueError: generator already executing
When outer next(me) is called, it sets me.gi_running to True and then executes the generator function code. A subsequent inner next(me) call leads to a ValueError.
Conclusion
Generators (especially async) work the best when consumed by a single reader. Multiple consumers support is hard, since requires patching behaviour of all the generator's methods, and thus discouraged.
Related
I want to return the first element of async generator and handle the remainning values without return like fire and forget. How to make early return of coroutine in python?
After passing the iterator to asyncio.create_task, it doesn't print the remaining values.
import asyncio
import time
async def async_iter(num):
for i in range(num):
await asyncio.sleep(0.5)
yield i
async def handle_remains(it):
async for i in it:
print(i)
async def run() -> None:
it = async_iter(10)
async for i in it:
print(i)
break
# await handle_remains(it)
# want to make this `fire and forget`(no await), expecting just printing the remainning values.
asyncio.create_task(handle_remains(it))
return i
if __name__ == '__main__':
asyncio.run(run())
time.sleep(10)
You’re close with the code, but not quite there yet (see also my comments above). In short, creating the Task isn’t enough: the Task needs to run:
task = asyncio.create_task(handle_remains(it)) # Creates a Task in `pending` state.
await task # Run the task, i.e. execute the wrapped coroutine.
A Task, along with coroutines and Futures, is an “Awaitable”. In fact:
When a coroutine is wrapped into a Task with functions like asyncio.create_task() the coroutine is automatically scheduled to run soon.
Notice the “scheduled to run soon”, now you have to make sure to actually run the task by calling await, a keyword which…
is used to obtain a result of coroutine execution.
I'm trying to understand how asyncio works. As for I/O operation i got understand that when await was called, we register Future object in EventLoop, and then calling epoll for get sockets which belongs to Future objects, that ready for give us data. After we run registred callback and resume function execution.
But, the thing that i cant understant, what's happening if we use await not for I/O operation. How eventloop understands that task is complete? Is it create socket for that or use another kind of loop? Is it use epoll? Or doesnt it add to Loop and used it as generator?
There is an example:
import asyncio
async def test():
return 10
async def my_coro(delay):
loop = asyncio.get_running_loop()
end_time = loop.time() + delay
while True:
print("Blocking...")
await test()
if loop.time() > end_time:
print("Done.")
break
async def main():
await my_coro(3.0)
asyncio.run(main())
await doesn't automatically yield to the event loop, that happens only when an async function (anywhere in the chain of awaits) requests suspension, typically due to IO or timeout not being ready.
In your example the event loop is never returned to, which you can easily verify by moving the "Blocking" print before the while loop and changing main to await asyncio.gather(my_coro(3.0), my_coro(3.0)). What you'll observe is that the coroutines are executed in series ("blocking" followed by "done", all repeated twice), not in parallel ("blocking" followed by another "blocking" and then twice "done"). The reason for that was that there was simply no opportunity for a context switch - my_coro executed in one go as if they were an ordinary function because none of its awaits ever chose to suspend.
I'd like to know what guarantees python gives around when a event loop will switch tasks.
As I understand it async / await are significantly different from threads in that the event loop does not switch task based on time slicing, meaning that unless the task yields (await), it will carry on indefinitely. This can actually be useful because it is easier to manage critical sections under asyncio than with threading.
What I'm less clear about is something like the following:
async def caller():
while True:
await callee()
async def callee():
pass
In this example caller is repeatedly await. So technically it is yielding. But I'm not clear on whether this will allow other tasks on the event loop to execute because it only yields to callee and that is never yielding.
That is if I awaited callee inside a "critical section" even though I know it won't block, am I at risk of something else unexpected happening?
You are right to be wary. caller yields from callee, and yields to the event loop. Then the event loop decides which task to resume. Other tasks may (hopefully) be squeezed in between the calls to callee. callee needs to await an actual blocking Awaitable such as asyncio.Future or asyncio.sleep(), not a coroutine, otherwise the control will not be returned to the event loop until caller returns.
For example, the following code will finish the caller2 task before it starts working on the caller1 task. Because callee2 is essentially a sync function without awaiting a blocking I/O operations, therefore, no suspension point is created and caller2 will resume immediately after each call to callee2.
import asyncio
import time
async def caller1():
for i in range(5):
await callee1()
async def callee1():
await asyncio.sleep(1)
print(f"called at {time.strftime('%X')}")
async def caller2():
for i in range(5):
await callee2()
async def callee2():
time.sleep(1)
print(f"sync called at {time.strftime('%X')}")
async def main():
task1 = asyncio.create_task(caller1())
task2 = asyncio.create_task(caller2())
await task1
await task2
asyncio.run(main())
Result:
sync called at 19:23:39
sync called at 19:23:40
sync called at 19:23:41
sync called at 19:23:42
sync called at 19:23:43
called at 19:23:43
called at 19:23:44
called at 19:23:45
called at 19:23:46
called at 19:23:47
But if callee2 awaits as the following, the task switching will happen even if it awaits asyncio.sleep(0), and the tasks will run concurrently.
async def callee2():
await asyncio.sleep(1)
print('sync called')
Result:
called at 19:22:52
sync called at 19:22:52
called at 19:22:53
sync called at 19:22:53
called at 19:22:54
sync called at 19:22:54
called at 19:22:55
sync called at 19:22:55
called at 19:22:56
sync called at 19:22:56
This behavior is not necessarily intuitive, but it makes sense considering that asyncio was made to handle I/O operations and networking concurrently, not the usual synchronous python codes.
Another thing to note is: This still works if the callee awaits a coroutine that, in turn, awaits a asyncio.Future, asyncio.sleep(), or another coroutine that await one of those things down the chain. The flow control will be returned to the event loop when the blocking Awaitable is awaited. So the following also works.
async def callee2():
await inner_callee()
print(f"sync called at {time.strftime('%X')}")
async def inner_callee():
await asyncio.sleep(1)
TLDR: No. Coroutines and their respective keywords (await, async with, async for) only enable suspension. Whether suspension occurs depends on the framework used, if at all.
Third-party async functions / iterators / context managers can act as
checkpoints; if you see await <something> or one of its friends, then
that might be a checkpoint. So to be safe, you should prepare for
scheduling or cancellation happening there.
[Trio documentation]
The await syntax of Python is syntactic sugar around two fundamental mechanisms: yield to temporarily suspend with a value, and return to permanently exit with a value. These are the same that, say, a generator function coroutine can use:
def gencoroutine():
for i in range(5):
yield i # temporarily suspend
return 5 # permanently exit
Notably, return does not imply a suspension. It is possible for a generator coroutine to never yield at all.
The await keyword (and its sibling yield from) interacts with both the yield and return mechanism:
If its target yields, await "passes on" the suspension to its own caller. This allows to suspend an entire stack of coroutines that all await each other.
If its target returnss, await catches the return value and provides it to its own coroutine. This allows to return a value directly to a "caller", without suspension.
This means that await does not guarantee that a suspension occurs. It is up to the target of await to trigger a suspension.
By itself, an async def coroutine can only return without suspension, and await to allow suspension. It cannot suspend by itself (yield does not suspend to the event loop).
async def unyielding():
return 2 # or `pass`
This means that await of just coroutines does never suspend. Only specific awaitables are able to suspend.
Suspension is only possible for awaitables with a custom __await__ method. These can yield directly to the event loop.
class YieldToLoop:
def __await__(self):
yield # to event loop
return # to awaiter
This means that await, directly or indirectly, of a framework's awaitable will suspend.
The exact semantics of suspending depend on the async framework in use. For example, whether a sleep(0) triggers a suspension or not, or which coroutine to run instead, is up to the framework. This also extends to async iterators and context managers -- for example, many async context managers will suspend either on enter or exit but not both.
Trio
If you call an async function provided by Trio (await <something in trio>), and it doesn’t raise an exception, then it always acts as a checkpoint. (If it does raise an exception, it might act as a checkpoint or might not.)
Asyncio
sleep() always suspends the current task, allowing other tasks to run.
I was wondering what exactly happens when we await a coroutine in async Python code, for example:
await send_message(string)
(1) send_message is added to the event loop, and the calling coroutine gives up control to the event loop, or
(2) We jump directly into send_message
Most explanations I read point to (1), as they describe the calling coroutine as exiting. But my own experiments suggest (2) is the case: I tried to have a coroutine run after the caller but before the callee and could not achieve this.
Disclaimer: Open to correction (particularly as to details and correct terminology) since I arrived here looking for the answer to this myself. Nevertheless, the research below points to a pretty decisive "main point" conclusion:
Correct OP answer: No, await (per se) does not yield to the event loop, yield yields to the event loop, hence for the case given: "(2) We jump directly into send_message". In particular, certain yield expressions are the only points, at bottom, where async tasks can actually be switched out (in terms of nailing down the precise spot where Python code execution can be suspended).
To be proven and demonstrated: 1) by theory/documentation, 2) by implementation code, 3) by example.
By theory/documentation
PEP 492: Coroutines with async and await syntax
While the PEP is not tied to any specific Event Loop implementation, it is relevant only to the kind of coroutine that uses yield as a signal to the scheduler, indicating that the coroutine will be waiting until an event (such as IO) is completed. ...
[await] uses the yield from implementation [with an extra step of validating its argument.] ...
Any yield from chain of calls ends with a yield. This is a fundamental mechanism of how Futures are implemented. Since, internally, coroutines are a special kind of generators, every await is suspended by a yield somewhere down the chain of await calls (please refer to PEP 3156 for a detailed explanation). ...
Coroutines are based on generators internally, thus they share the implementation. Similarly to generator objects, coroutines have throw(), send() and close() methods. ...
The vision behind existing generator-based coroutines and this proposal is to make it easy for users to see where the code might be suspended.
In context, "easy for users to see where the code might be suspended" seems to refer to the fact that in synchronous code yield is the place where execution can be "suspended" within a routine allowing other code to run, and that principle now extends perfectly to the async context wherein a yield (if its value is not consumed within the running task but is propagated up to the scheduler) is the "signal to the scheduler" to switch out tasks.
More succinctly: where does a generator yield control? At a yield. Coroutines (including those using async and await syntax) are generators, hence likewise.
And it is not merely an analogy, in implementation (see below) the actual mechanism by which a task gets "into" and "out of" coroutines is not anything new, magical, or unique to the async world, but simply by calling the coro's <generator>.send() method. That was (as I understand the text) part of the "vision" behind PEP 492: async and await would provide no novel mechanism for code suspension but just pour async-sugar on Python's already well-beloved and powerful generators.
And
PEP 3156: The "asyncio" module
The loop.slow_callback_duration attribute controls the maximum execution time allowed between two yield points before a slow callback is reported [emphasis in original].
That is, an uninterrupted segment of code (from the async perspective) is demarcated as that between two successive yield points (whose values reached up to the running Task level (via an await/yield from tunnel) without being consumed within it).
And this:
The scheduler has no public interface. You interact with it by using yield from future and yield from task.
Objection: "That says 'yield from', but you're trying to argue that the task can only switch out at a yield itself! yield from and yield are different things, my friend, and yield from itself doesn't suspend code!"
Ans: Not a contradiction. The PEP is saying you interact with the scheduler by using yield from future/task. But as noted above in PEP 492, any chain of yield from (~aka await) ultimately reaches a yield (the "bottom turtle"). In particular (see below), yield from future does in fact yield that same future after some wrapper work, and that yield is the actual "switch out point" where another task takes over. But it is incorrect for your code to directly yield a Future up to the current Task because you would bypass the necessary wrapper.
The objection having been answered, and its practical coding considerations being noted, the point I wish to make from the above quote remains: that a suitable yield in Python async code is ultimately the one thing which, having suspended code execution in the standard way that any other yield would do, now futher engages the scheduler to bring about a possible task switch.
By implementation code
asyncio/futures.py
class Future:
...
def __await__(self):
if not self.done():
self._asyncio_future_blocking = True
yield self # This tells Task to wait for completion.
if not self.done():
raise RuntimeError("await wasn't used with future")
return self.result() # May raise too.
__iter__ = __await__ # make compatible with 'yield from'.
Paraphrase: The line yield self is what tells the running task to sit out for now and let other tasks run, coming back to this one sometime after self is done.
Almost all of your awaitables in asyncio world are (multiple layers of) wrappers around a Future. The event loop remains utterly blind to all higher level await awaitable expressions until the code execution trickles down to an await future or yield from future and then (as seen here) calls yield self, which yielded self is then "caught" by none other than the Task under which the present coroutine stack is running thereby signaling to the task to take a break.
Possibly the one and only exception to the above "code suspends at yield self within await future" rule, in an asyncio context, is the potential use of a bare yield such as in asyncio.sleep(0). And since the sleep function is a topic of discourse in the comments of this post, let's look at that.
asyncio/tasks.py
#types.coroutine
def __sleep0():
"""Skip one event loop run cycle.
This is a private helper for 'asyncio.sleep()', used
when the 'delay' is set to 0. It uses a bare 'yield'
expression (which Task.__step knows how to handle)
instead of creating a Future object.
"""
yield
async def sleep(delay, result=None, *, loop=None):
"""Coroutine that completes after a given time (in seconds)."""
if delay <= 0:
await __sleep0()
return result
if loop is None:
loop = events.get_running_loop()
else:
warnings.warn("The loop argument is deprecated since Python 3.8, "
"and scheduled for removal in Python 3.10.",
DeprecationWarning, stacklevel=2)
future = loop.create_future()
h = loop.call_later(delay,
futures._set_result_unless_cancelled,
future, result)
try:
return await future
finally:
h.cancel()
Note: We have here the two interesting cases at which control can shift to the scheduler:
(1) The bare yield in __sleep0 (when called via an await).
(2) The yield self immediately within await future.
The crucial line (for our purposes) in asyncio/tasks.py is when Task._step runs its top-level coroutine via result = self._coro.send(None) and recognizes fourish cases:
(1) result = None is generated by the coro (which, again, is a generator): the task "relinquishes control for one event loop iteration".
(2) result = future is generated within the coro, with further magic member field evidence that the future was yielded in a proper manner from out of Future.__iter__ == Future.__await__: the task relinquishes control to the event loop until the future is complete.
(3) A StopIteration is raised by the coro indicating the coroutine completed (i.e. as a generator it exhausted all its yields): the final result of the task (which is itself a Future) is set to the coroutine return value.
(4) Any other Exception occurs: the task's set_exception is set accordingly.
Modulo details, the main point for our concern is that coroutine segments in an asyncio event loop ultimately run via coro.send(). Initial startup and final termination aside, send() proceeds precisely from the last yield value it generated to the next one.
By example
import asyncio
import types
def task_print(s):
print(f"{asyncio.current_task().get_name()}: {s}")
async def other_task(s):
task_print(s)
class AwaitableCls:
def __await__(self):
task_print(" 'Jumped straight into' another `await`; the act of `await awaitable` *itself* doesn't 'pause' anything")
yield
task_print(" We're back to our awaitable object because that other task completed")
asyncio.create_task(other_task("The event loop gets control when `yield` points (from an iterable coroutine) propagate up to the `current_task` through a suitable chain of `await` or `yield from` statements"))
async def coro():
task_print(" 'Jumped straight into' coro; the `await` keyword itself does nothing to 'pause' the current_task")
await AwaitableCls()
task_print(" 'Jumped straight back into' coro; we have another pending task, but leaving an `__await__` doesn't 'pause' the task any more than entering the `__await__` does")
#types.coroutine
def iterable_coro(context):
task_print(f"`{context} iterable_coro`: pre-yield")
yield None # None or a Future object are the only legitimate yields to the task in asyncio
task_print(f"`{context} iterable_coro`: post-yield")
async def original_task():
asyncio.create_task(other_task("Aha, but a (suitably unconsumed) *`yield`* DOES 'pause' the current_task allowing the event scheduler to `_wakeup` another task"))
task_print("Original task")
await coro()
task_print("'Jumped straight out of' coro. Leaving a coro, as with leaving/entering any awaitable, doesn't give control to the event loop")
res = await iterable_coro("await")
assert res is None
asyncio.create_task(other_task("This doesn't run until the very end because the generated None following the creation of this task is consumed by the `for` loop"))
for y in iterable_coro("for y in"):
task_print(f"But 'ordinary' `yield` points (those which are consumed by the `current_task` itself) behave as ordinary without relinquishing control at the async/task-level; `y={y}`")
task_print("Done with original task")
asyncio.get_event_loop().run_until_complete(original_task())
run in python3.8 produces
Task-1: Original task
Task-1: 'Jumped straight into' coro; the await keyword itself does nothing to 'pause' the current_task
Task-1: 'Jumped straight into' another await; the act of await awaitable itself doesn't 'pause' anything
Task-2: Aha, but a (suitably unconsumed) yield DOES 'pause' the current_task allowing the event scheduler to _wakeup another task
Task-1: We're back to our awaitable object because that other task completed
Task-1: 'Jumped straight back into' coro; we have another pending task, but leaving an __await__ doesn't 'pause' the task any more than entering the __await__ does
Task-1: 'Jumped straight out of' coro. Leaving a coro, as with leaving/entering any awaitable, doesn't give control to the event loop
Task-1: await iterable_coro: pre-yield
Task-3: The event loop gets control when yield points (from an iterable coroutine) propagate up to the current_task through a suitable chain of await or yield from statements
Task-1: await iterable_coro: post-yield
Task-1: for y in iterable_coro: pre-yield
Task-1: But 'ordinary' yield points (those which are consumed by the current_task itself) behave as ordinary without relinquishing control at the async/task-level; y=None
Task-1: for y in iterable_coro: post-yield
Task-1: Done with original task
Task-4: This doesn't run until the very end because the generated None following the creation of this task is consumed by the for loop
Indeed, exercises such as the following can help one's mind to decouple the functionality of async/await from notion of "event loops" and such. The former is conducive to nice implementations and usages of the latter, but you can use async and await just as specially syntaxed generator stuff without any "loop" (whether asyncio or otherwise) whatsoever:
import types # no asyncio, nor any other loop framework
async def f1():
print(1)
print(await f2(),'= await f2()')
return 8
#types.coroutine
def f2():
print(2)
print((yield 3),'= yield 3')
return 7
class F3:
def __await__(self):
print(4)
print((yield 5),'= yield 5')
print(10)
return 11
task1 = f1()
task2 = F3().__await__()
""" You could say calls to send() represent our
"manual task management" in this script.
"""
print(task1.send(None), '= task1.send(None)')
print(task2.send(None), '= task2.send(None)')
try:
print(task1.send(6), 'try task1.send(6)')
except StopIteration as e:
print(e.value, '= except task1.send(6)')
try:
print(task2.send(9), 'try task2.send(9)')
except StopIteration as e:
print(e.value, '= except task2.send(9)')
produces
1
2
3 = task1.send(None)
4
5 = task2.send(None)
6 = yield 3
7 = await f2()
8 = except task1.send(6)
9 = yield 5
10
11 = except task2.send(9)
Yes, await passes control back to the asyncio eventloop, and allows it to schedule other async functions.
Another way is
await asyncio.sleep(0)
I'm getting the flow of using asyncio in Python 3.5 but I haven't seen a description of what things I should be awaiting and things I should not be or where it would be neglible. Do I just have to use my best judgement in terms of "this is an IO operation and thus should be awaited"?
By default all your code is synchronous. You can make it asynchronous defining functions with async def and "calling" these functions with await. A More correct question would be "When should I write asynchronous code instead of synchronous?". Answer is "When you can benefit from it". In cases when you work with I/O operations as you noted you will usually benefit:
# Synchronous way:
download(url1) # takes 5 sec.
download(url2) # takes 5 sec.
# Total time: 10 sec.
# Asynchronous way:
await asyncio.gather(
async_download(url1), # takes 5 sec.
async_download(url2) # takes 5 sec.
)
# Total time: only 5 sec. (+ little overhead for using asyncio)
Of course, if you created a function that uses asynchronous code, this function should be asynchronous too (should be defined as async def). But any asynchronous function can freely use synchronous code. It makes no sense to cast synchronous code to asynchronous without some reason:
# extract_links(url) should be async because it uses async func async_download() inside
async def extract_links(url):
# async_download() was created async to get benefit of I/O
html = await async_download(url)
# parse() doesn't work with I/O, there's no sense to make it async
links = parse(html)
return links
One very important thing is that any long synchronous operation (> 50 ms, for example, it's hard to say exactly) will freeze all your asynchronous operations for that time:
async def extract_links(url):
data = await download(url)
links = parse(data)
# if search_in_very_big_file() takes much time to process,
# all your running async funcs (somewhere else in code) will be frozen
# you need to avoid this situation
links_found = search_in_very_big_file(links)
You can avoid it calling long running synchronous functions in separate process (and awaiting for result):
executor = ProcessPoolExecutor(2)
async def extract_links(url):
data = await download(url)
links = parse(data)
# Now your main process can handle another async functions while separate process running
links_found = await loop.run_in_executor(executor, search_in_very_big_file, links)
One more example: when you need to use requests in asyncio. requests.get is just synchronous long running function, which you shouldn't call inside async code (again, to avoid freezing). But it's running long because of I/O, not because of long calculations. In that case, you can use ThreadPoolExecutor instead of ProcessPoolExecutor to avoid some multiprocessing overhead:
executor = ThreadPoolExecutor(2)
async def download(url):
response = await loop.run_in_executor(executor, requests.get, url)
return response.text
You do not have much freedom. If you need to call a function you need to find out if this is a usual function or a coroutine. You must use the await keyword if and only if the function you are calling is a coroutine.
If async functions are involved there should be an "event loop" which orchestrates these async functions. Strictly speaking it's not necessary, you can "manually" run the async method sending values to it, but probably you don't want to do it. The event loop keeps track of not-yet-finished coroutines and chooses the next one to continue running. asyncio module provides an implementation of event loop, but this is not the only possible implementation.
Consider these two lines of code:
x = get_x()
do_something_else()
and
x = await aget_x()
do_something_else()
Semantic is absolutely the same: call a method which produces some value, when the value is ready assign it to variable x and do something else. In both cases the do_something_else function will be called only after the previous line of code is finished. It doesn't even mean that before or after or during the execution of asynchronous aget_x method the control will be yielded to event loop.
Still there are some differences:
the second snippet can appear only inside another async function
aget_x function is not usual, but coroutine (that is either declared with async keyword or decorated as coroutine)
aget_x is able to "communicate" with the event loop: that is yield some objects to it. The event loop should be able to interpret these objects as requests to do some operations (f.e. to send a network request and wait for response, or just suspend this coroutine for n seconds). Usual get_x function is not able to communicate with event loop.