"async with" within asyncio.Task not working - python

I am learning Python asyncio and testing a lot of code with it.
Below is code where I try to subscribe to multiple WebSocket streams using asyncio and aiohttp.
I do not understand why, when coro(item1, item2) is executed as a task, it does not go into the async with ... block (i.e. "A" is printed but not "B").
Could anyone help me understand the reason for this?
(I already have working code; I simply want to understand the mechanism behind this.)
Code
import aiohttp
import asyncio
import json

async def coro(item1, item2):
    print("A")
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(url='URL') as ws:
            print("B")
            await asyncio.gather(ws.send_json(item1),
                                 ws.send_json(item2))
            print("C")
            async for msg in ws:
                print(msg)

async def ws_connect(item1, item2):
    task = asyncio.create_task(coro(item1, item2))
    return task

async def main():
    item1 = {
        "method": "subscribe",
        "params": {'channel': "..."}
    }
    item2 = {
        "method": "subscribe",
        "params": {'channel': "..."}
    }
    ws_task = await ws_connect(item1, item2)
    print("D")

asyncio.run(main())
Output
D
A

B is never printed because you never await the returned task, only the coroutine which returned it.
The subtle mistake is in return task followed by await ws_connect(item1, item2).
TL;DR: return await task.
The key to understanding the program's output is to know that context switches in the asyncio event loop can only occur at a few places, in particular at await expressions. At these points, the event loop may suspend the current coroutine and continue with another.
First, you create a ws_connect coroutine and immediately await it. This forces the event loop to suspend main and actually run ws_connect, because there is nothing else to run.
Since ws_connect contains none of the points which allow a context switch, the coro() function never actually starts.
The only thing create_task does is bind the coroutine to a task object and add it to the event loop's queue. But you never await the task; you just return it like any ordinary return value. Now ws_connect() finishes, and the event loop can choose to run any of its tasks; it continues with main, probably because main has been waiting on ws_connect().
Okay, main prints D and returns. Now what?
There is some extra await inside asyncio.run which gives coro() a chance to start, hence the printed A (but only after D). Yet nothing forces asyncio.run to wait on coro(), so when coro yields back to the event loop through async with, the run finishes and the program exits, leaving coro() unfinished.
If you add an extra await asyncio.sleep(1) after print('D'), the loop will again suspend main for at least some time and continue with coro(), and that would print B had the URL been correct.
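The fix from the TL;DR can be sketched without any networking. This is a minimal, self-contained variant (the asyncio.sleep(0) is a stand-in for the first real suspension point inside the async with block):

```python
import asyncio

events = []  # records execution order so it can be inspected

async def coro():
    events.append("A")
    await asyncio.sleep(0)  # stands in for the first real suspension point
    events.append("B")

async def ws_connect():
    task = asyncio.create_task(coro())
    return await task  # await the task instead of merely returning it

async def main():
    await ws_connect()
    events.append("D")

asyncio.run(main())
print(events)  # ['A', 'B', 'D'] -- B is reached because the task is awaited
```

Because the task is awaited before main finishes, the event loop gets to resume coro() past its suspension point, so B now appears before D.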
Actually, the context switching is a little more complicated, because an ordinary await on a coroutine usually does not switch unless the execution really needs to block on IO or similar. Only await asyncio.sleep(0) or a yield* guarantees a true context switch without the extra blocking.
*yield from inside an __await__ method.
The lesson here is simple: never return awaitables from async functions, as it leads to exactly this kind of mistake. Use return await by default; at worst you get a runtime error in case the returned object is not actually awaitable (like return await some_string), and that can easily be spotted and fixed.
On the other hand, returning awaitables from ordinary functions is OK and makes the function act as if it were asynchronous, although one should be careful when mixing the two approaches. Personally, I prefer the first approach, as it shifts the responsibility onto the writer of the function rather than the user, who would otherwise be warned only by linters, which usually detect non-awaited coroutine calls but not returned awaitables. So another solution would be to make ws_connect an ordinary function; then the await in await ws_connect(...) would apply to the returned value (the task), not to the function itself.
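The ordinary-function alternative mentioned above might look like this minimal sketch (names illustrative, no networking involved):

```python
import asyncio

async def coro():
    return "connected"

def ws_connect():  # ordinary function: just schedules the coroutine as a task
    return asyncio.create_task(coro())

async def main():
    # the await applies to the returned task, not to ws_connect itself
    result = await ws_connect()
    return result

result = asyncio.run(main())
print(result)  # connected
```

Here a linter cannot mistake ws_connect for a coroutine, and forgetting the await on the returned task is the caller's visible responsibility.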

Related

Why do I need to await twice

I'm trying to run some IO blocking code, so I'm using to_thread to send the function to another thread. I tried several things, but in all cases, I seem to have to await the to_thread which just returns another coroutine (?!) that I then have to await again. For some reason this isn't quite clicking.
import asyncio

async def search(keyword_list):
    coroutines = set()
    for index, kw in enumerate(keyword_list):
        coroutines.add(asyncio.to_thread(do_lookup, kw, index))
    for result in asyncio.as_completed(coroutines):
        outcome, index = await result
        # Do some magic with the outcome and index
        # BUT, this doesn't work because `await result` apparently
        # just returns ANOTHER coroutine!

async def do_lookup(keyword, index):
    # Do long, blocking stuff here
    print(f'running...{keyword} {index}')
    return keyword, index

if __name__ == '__main__':
    asyncio.run(search([1, 2, 3, 4]))
As I was copy/pasting and adapting my code to make a generic example, I discovered the problem here.
do_lookup is supposed to be a synchronous function (because of the use of to_thread), so by defining it with async def do_lookup I instead defined it as an asynchronous function, thereby causing the "double" await issue.
Simply redefining do_lookup without the async keyword did the trick!
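For reference, here is a corrected, runnable sketch of the example above, with do_lookup as a plain synchronous function handed to to_thread (the lookup body is a trivial stand-in for the real blocking work):

```python
import asyncio

def do_lookup(keyword, index):
    # plain sync function: to_thread runs it in a worker thread
    return keyword, index

async def search(keyword_list):
    coroutines = [asyncio.to_thread(do_lookup, kw, i)
                  for i, kw in enumerate(keyword_list)]
    results = []
    for fut in asyncio.as_completed(coroutines):
        outcome, index = await fut  # a single await now yields the result
        results.append((outcome, index))
    return results

results = asyncio.run(search(["a", "b", "c"]))
```

Because do_lookup is synchronous, asyncio.to_thread(...) returns an awaitable whose single await produces the function's return value directly.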

What are Python asyncio's cancellation points?

When can an asyncio Task be canceled? Or, more generally, when can an asyncio loop switch to a different Task? It's been really hard for me to use cancellation in my asyncio programs, because I don't know when a CancelledError can get thrown.
I was working on a bit of code earlier, with a context manager kinda like this:
#!/usr/bin/python3
import asyncio

class MyContextManager:
    def __init__(self):
        self.locked = False

    async def __aenter__(self):
        self.locked = True

    async def __aexit__(self, *_):
        self.locked = False

async def main():
    async with MyContextManager():
        print("Doing something that needs locking")

if __name__ == "__main__":
    asyncio.run(main())
What happens if the task is cancelled during __aenter__? I need to make sure that self.locked is false whenever the async with section exits. (I'm using self.locked as a simplification of a more complex acquire/release algorithm here, which includes some steps that are necessarily async.)
The docs regarding async with say:
The following code:
async with EXPRESSION as TARGET:
    SUITE
is semantically equivalent to:
manager = (EXPRESSION)
aenter = type(manager).__aenter__
aexit = type(manager).__aexit__
value = await aenter(manager)
hit_except = False

try:
    TARGET = value
    SUITE
except:
    hit_except = True
    if not await aexit(manager, *sys.exc_info()):
        raise
finally:
    if not hit_except:
        await aexit(manager, None, None, None)
If I'm reading this right, this means that there's a window between when await aenter is called and when the try:finally: block is set up. If a task is canceled at the time that aenter returns, then aexit will not be called.
Can a task be canceled on exit from an async function? Well, let's look at the docs for asyncio.shield:
The statement:
res = await shield(something())
is equivalent to:
res = await something()
except that if the coroutine containing it is cancelled, the Task running in something() is not cancelled. From the point of view of something(), the cancellation did not happen. Although its caller is still cancelled, so the “await” expression still raises a CancelledError.
This seems to imply that an await expression can raise a CancelledError, even if the task is not canceled during the evaluation of the underlying expression.
As an opposing view, when I looked at the source code for asyncio.shield, it looks like the CancelledError is raised within asyncio.shield, rather than at the time that the await expression returns.
The biggest advantage of coroutines over threads is that it's much easier to reason about parallelism: synchronous operations will complete serially, and it's only when you await that anything can change out from under you. I use this reasoning a lot in my code. But it's not clear exactly when that await expression can change something out from under you.
An asyncio task can only be canceled at an await, as that is the only point at which the event loop can be running instead of your code.
The event loop is given control when something you await yields to it (this is usually done by something like awaiting a sleep, or a future). Mind that this is not necessarily done for every await, so an await does not guarantee that the event loop will actually run (the simplest example being a coroutine that only returns a constant).
With the Python-equivalent async context manager code, the only point where something can be cancelled before the try block is set up is within __aenter__ itself, as that is the only point that may actually yield control. If something in __aenter__ is cancelled and propagates up as an exception, it wouldn't make sense to call __aexit__ either.
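A small experiment (illustrative names, with asyncio.sleep standing in for the necessarily-async acquire steps) shows exactly that: a cancellation landing inside __aenter__ means __aexit__ is never called.

```python
import asyncio

log = []

class CM:
    async def __aenter__(self):
        log.append("enter-start")
        await asyncio.sleep(1)  # suspension point where cancellation can land
        log.append("enter-done")
        return self

    async def __aexit__(self, *exc):
        log.append("exit")
        return False

async def use_cm():
    async with CM():
        log.append("body")

async def main():
    task = asyncio.create_task(use_cm())
    await asyncio.sleep(0)  # let __aenter__ start and suspend
    task.cancel()           # cancel while suspended inside __aenter__
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")

asyncio.run(main())
print(log)  # ['enter-start', 'cancelled'] -- neither the body nor __aexit__ ran
```

So any cleanup that must happen even under cancellation has to be done inside __aenter__'s own exception handling, not deferred to __aexit__.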
For shield, what the docs are saying is that on cancellation the shield's outer future is cancelled, which is shown to the caller by raising CancelledError at the await, but the task it wraps continues executing without ever being aware of the shielded cancellation.
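That shield behavior can be sketched as follows (illustrative names; the awaiting coroutine sees CancelledError while the shielded task runs to completion):

```python
import asyncio

log = []

async def protected():
    try:
        await asyncio.sleep(0.01)
        log.append("protected finished")
    except asyncio.CancelledError:
        log.append("protected cancelled")
        raise

async def outer():
    try:
        await asyncio.shield(protected())
    except asyncio.CancelledError:
        log.append("outer saw CancelledError")

async def main():
    task = asyncio.create_task(outer())
    await asyncio.sleep(0)      # let outer() start the shielded task
    task.cancel()               # cancels the awaiting task, not the shielded one
    await asyncio.sleep(0.05)   # give protected() time to finish

asyncio.run(main())
print(log)
```

The caller's await raises immediately on cancellation, while the inner task never observes a CancelledError and finishes normally.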

How asyncio understands that task is complete for non-blocking operations

I'm trying to understand how asyncio works. For I/O operations, I understand that when await is called, we register a Future object with the event loop, then epoll is used to find the sockets belonging to Future objects that are ready to give us data. After that, the registered callback runs and function execution resumes.
But the thing I can't understand is what happens if we use await for something that is not an I/O operation. How does the event loop understand that the task is complete? Does it create a socket for it, or use another kind of loop? Does it use epoll? Or is it not added to the loop at all and simply driven as a generator?
Here is an example:
import asyncio

async def test():
    return 10

async def my_coro(delay):
    loop = asyncio.get_running_loop()
    end_time = loop.time() + delay
    while True:
        print("Blocking...")
        await test()
        if loop.time() > end_time:
            print("Done.")
            break

async def main():
    await my_coro(3.0)

asyncio.run(main())
await doesn't automatically yield to the event loop; that happens only when an async function (anywhere in the chain of awaits) requests a suspension, typically because IO or a timeout is not ready.
In your example the event loop is never returned to, which you can easily verify by moving the "Blocking" print before the while loop and changing main to await asyncio.gather(my_coro(3.0), my_coro(3.0)). What you'll observe is that the coroutines execute in series ("blocking" followed by "done", all repeated twice), not in parallel ("blocking" followed by another "blocking" and then twice "done"). The reason is that there is simply no opportunity for a context switch: each my_coro executes in one go, as if it were an ordinary function, because none of its awaits ever chooses to suspend.
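That experiment can be made runnable and checkable by recording the execution order instead of timing prints (a minimal sketch; the recorded list is an addition for illustration):

```python
import asyncio

order = []

async def test():
    return 10  # returns immediately, never suspends

async def my_coro(name, iters=3):
    order.append(f"{name} started")
    for _ in range(iters):
        await test()  # awaiting a non-suspending coroutine: no context switch
    order.append(f"{name} done")

async def main():
    await asyncio.gather(my_coro("first"), my_coro("second"))

asyncio.run(main())
print(order)
# ['first started', 'first done', 'second started', 'second done'] -- serial, not interleaved
```

Each task runs start to finish before the other begins, confirming that these awaits never hand control back to the event loop.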

Does await always give other tasks a chance to execute?

I'd like to know what guarantees Python gives around when an event loop will switch tasks.
As I understand it, async/await is significantly different from threads in that the event loop does not switch tasks based on time slicing, meaning that unless a task yields (await), it will carry on indefinitely. This can actually be useful, because it is easier to manage critical sections under asyncio than with threading.
What I'm less clear about is something like the following:
async def caller():
    while True:
        await callee()

async def callee():
    pass
In this example caller repeatedly awaits. So technically it is yielding. But I'm not clear on whether this will allow other tasks on the event loop to execute, because it only yields to callee, and that never yields.
That is if I awaited callee inside a "critical section" even though I know it won't block, am I at risk of something else unexpected happening?
You are right to be wary. caller yields from callee, and yields to the event loop. Then the event loop decides which task to resume. Other tasks may (hopefully) be squeezed in between the calls to callee. callee needs to await an actual blocking Awaitable, such as an asyncio.Future or asyncio.sleep(), not a plain coroutine; otherwise control will not be returned to the event loop until caller returns.
For example, the following code will finish the caller2 task before it starts working on the caller1 task. Because callee2 is essentially a sync function that never awaits a blocking I/O operation, no suspension point is created, and caller2 resumes immediately after each call to callee2.
import asyncio
import time

async def caller1():
    for i in range(5):
        await callee1()

async def callee1():
    await asyncio.sleep(1)
    print(f"called at {time.strftime('%X')}")

async def caller2():
    for i in range(5):
        await callee2()

async def callee2():
    time.sleep(1)
    print(f"sync called at {time.strftime('%X')}")

async def main():
    task1 = asyncio.create_task(caller1())
    task2 = asyncio.create_task(caller2())
    await task1
    await task2

asyncio.run(main())
Result:
sync called at 19:23:39
sync called at 19:23:40
sync called at 19:23:41
sync called at 19:23:42
sync called at 19:23:43
called at 19:23:43
called at 19:23:44
called at 19:23:45
called at 19:23:46
called at 19:23:47
But if callee2 awaits as in the following, task switching will happen (even if it awaits asyncio.sleep(0)), and the tasks will run concurrently.
async def callee2():
    await asyncio.sleep(1)
    print(f"sync called at {time.strftime('%X')}")
Result:
called at 19:22:52
sync called at 19:22:52
called at 19:22:53
sync called at 19:22:53
called at 19:22:54
sync called at 19:22:54
called at 19:22:55
sync called at 19:22:55
called at 19:22:56
sync called at 19:22:56
This behavior is not necessarily intuitive, but it makes sense considering that asyncio was made to handle I/O operations and networking concurrently, not the usual synchronous Python code.
Another thing to note: this still works if the callee awaits a coroutine that, in turn, awaits an asyncio.Future, asyncio.sleep(), or another coroutine that awaits one of those things down the chain. Control is returned to the event loop when the blocking Awaitable is awaited. So the following also works.
async def callee2():
    await inner_callee()
    print(f"sync called at {time.strftime('%X')}")

async def inner_callee():
    await asyncio.sleep(1)
TLDR: No. Coroutines and their respective keywords (await, async with, async for) only enable suspension. Whether suspension occurs depends on the framework used, if at all.
Third-party async functions / iterators / context managers can act as
checkpoints; if you see await <something> or one of its friends, then
that might be a checkpoint. So to be safe, you should prepare for
scheduling or cancellation happening there.
[Trio documentation]
The await syntax of Python is syntactic sugar around two fundamental mechanisms: yield, to temporarily suspend with a value, and return, to permanently exit with a value. These are the same mechanisms that, say, a generator-based coroutine can use:
def gencoroutine():
    for i in range(5):
        yield i  # temporarily suspend
    return 5  # permanently exit
Notably, return does not imply a suspension. It is possible for a generator coroutine to never yield at all.
The await keyword (and its sibling yield from) interacts with both the yield and the return mechanism:
If its target yields, await "passes on" the suspension to its own caller. This allows an entire stack of coroutines that all await each other to be suspended.
If its target returns, await catches the return value and provides it to its own coroutine. This allows a value to be returned directly to a "caller", without suspension.
This means that await does not guarantee that a suspension occurs. It is up to the target of await to trigger a suspension.
By itself, an async def coroutine can only return without suspension, and await to allow suspension. It cannot suspend by itself (a yield inside async def does not suspend to the event loop; it creates an async generator).
async def unyielding():
    return 2  # or `pass`
This means that an await of plain coroutines alone never suspends. Only specific awaitables are able to suspend.
Suspension is only possible for awaitables with a custom __await__ method. These can yield directly to the event loop.
class YieldToLoop:
    def __await__(self):
        yield   # to event loop
        return  # to awaiter
This means that await, directly or indirectly, of a framework's awaitable will suspend.
The exact semantics of suspending depend on the async framework in use. For example, whether a sleep(0) triggers a suspension or not, or which coroutine to run instead, is up to the framework. This also extends to async iterators and context managers -- for example, many async context managers will suspend either on enter or exit but not both.
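Plugging the YieldToLoop class above into two asyncio tasks shows what a true suspension buys you: the tasks interleave, whereas awaiting a plain non-suspending coroutine would run them serially (the worker function and order list are illustrative):

```python
import asyncio

order = []

class YieldToLoop:
    def __await__(self):
        yield   # suspend to the event loop
        return  # resume the awaiter

async def worker(name):
    for _ in range(2):
        order.append(name)
        await YieldToLoop()  # a genuine suspension point

async def main():
    await asyncio.gather(worker("a"), worker("b"))

asyncio.run(main())
print(order)  # ['a', 'b', 'a', 'b'] -- the tasks interleave
```

A bare yield from __await__ hands control to the event loop for one iteration, after which asyncio reschedules the task, so each await lets the other worker run.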
Trio
If you call an async function provided by Trio (await <something in trio>), and it doesn’t raise an exception, then it always acts as a checkpoint. (If it does raise an exception, it might act as a checkpoint or might not.)
Asyncio
sleep() always suspends the current task, allowing other tasks to run.

Is there a difference between 'await future' and 'await asyncio.wait_for(future, None)'?

With python 3.5 or later, is there any difference between directly applying await to a future or task, and wrapping it with asyncio.wait_for? The documentation is unclear on when it is appropriate to use wait_for and I'm wondering if it's a vestige of the old generator-based library. The test program below appears to show no difference but that doesn't really prove anything.
import asyncio

async def task_one():
    await asyncio.sleep(0.1)
    return 1

async def task_two():
    await asyncio.sleep(0.1)
    return 2

async def test(loop):
    t1 = loop.create_task(task_one())
    t2 = loop.create_task(task_two())
    print(repr(await t1))
    print(repr(await asyncio.wait_for(t2, None)))

def main():
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(test(loop))
    finally:
        loop.close()

main()
wait_for gives you two extra pieces of functionality:
it allows you to define a timeout,
it lets you specify the loop.
Your example:
await f1
await asyncio.wait_for(f1, None)
Besides the overhead of calling an additional wrapper (wait_for), they're the same (https://github.com/python/cpython/blob/master/Lib/asyncio/tasks.py#L318).
Both awaits will wait indefinitely for the result (or an exception). In this case a plain await is more appropriate.
On the other hand, if you provide the timeout argument, it will wait for the result under a time constraint, and if it takes longer than the timeout, it will raise TimeoutError and the future will be cancelled.
async def my_func():
    await asyncio.sleep(10)
    return 'OK'

# will wait 10s
await my_func()

# will wait only 5 seconds and then raise TimeoutError
await asyncio.wait_for(my_func(), 5)
Another thing is the loop argument. In most cases you shouldn't be bothered with it; its use cases are limited: injecting a different loop for tests, running another loop, etc. (Note that the loop parameter was deprecated in Python 3.8 and removed in 3.10.)
The problem with this parameter is that all subsequent tasks/functions also need to have that loop passed along...
More info https://github.com/python/asyncio/issues/362
Passing asyncio loop by argument or using default asyncio loop
Why use explicit loop parameter with aiohttp?
Unfortunately the Python documentation is a little unclear here, but if you have a look at the sources it's pretty obvious:
In contrast to a plain await, the coroutine asyncio.wait_for() allows you to wait only for a limited time for the future/task to complete. If it does not complete within this time, an asyncio.TimeoutError is raised (an alias of the built-in TimeoutError since Python 3.11).
This timeout can be specified as the second parameter. In your sample code this timeout parameter is None, which results in exactly the same functionality as directly applying await/yield from.
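A minimal demonstration of the timeout behavior (the sleep stands in for a slow operation, and the timeout is shortened so the example fails fast):

```python
import asyncio

async def slow():
    await asyncio.sleep(10)  # stands in for a slow operation
    return "OK"

async def main():
    try:
        # plain `await slow()` would wait the full 10 seconds
        return await asyncio.wait_for(slow(), timeout=0.01)
    except asyncio.TimeoutError:
        return "timed out"

result = asyncio.run(main())
print(result)  # timed out
```

When the timeout fires, wait_for also cancels the wrapped coroutine, so the slow operation does not keep running in the background.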
