Sorry for the bad English, but I'll try my best.
Consider the following code:
import asyncio

async def main():
    async def printA():
        await asyncio.sleep(1)
        print('a')

    # create a stream
    stream = ...
    async for message in stream:
        pass

asyncio.run(main())
Yes, printA is not yet used.
Now I want to invoke printA when I see some types of messages from the stream.
If I could accept that the stream waits until printA is done before continuing, I could write something like this:
async for message in stream:
    if message == 'printA':
        await printA()
But I can't accept that, so I must write at least:
async def main():
    async def printA():
        await asyncio.sleep(1)
        print('a')

    # create a stream
    stream = ...
    taskSet = set()
    async for message in stream:
        if message == 'printA':
            taskSet.add(asyncio.create_task(printA()))
    await asyncio.gather(*taskSet)
But if the stream is long enough, taskSet would become really big, even though many of the printA tasks are in fact already done.
So I would want them to be removed as soon as they are done.
I don't know how to proceed from here.
Can I remove the task from within printA itself? The execution of printA() can't start before create_task is invoked, but is it guaranteed not to start before create_task returns? The documentation does not seem to guarantee that, although I found some sources saying it is guaranteed by the current implementation.
I can't simply discard the task reference, right? As the doc of create_task says:
Important: Save a reference to the result of this function, to avoid a task disappearing mid execution.
You can find the answer directly in the bug report concerning the same problem of "fire and forget" tasks that led to the documentation update "Important: Save a reference ..."
https://bugs.python.org/issue44665
I'll copy the recipe for an automatic task removal:
running_tasks = set()
# [...]
task = asyncio.create_task(some_background_function())
running_tasks.add(task)
task.add_done_callback(lambda t: running_tasks.remove(t))
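Applied to the example from your question, a minimal sketch (assuming the same stream and printA as above) could look like this; set.discard can serve directly as the done callback, which avoids the lambda and doesn't raise if the task was already removed:
taskSet = set()
async for message in stream:
    if message == 'printA':
        task = asyncio.create_task(printA())
        taskSet.add(task)
        # the done callback receives the finished task and drops the reference
        task.add_done_callback(taskSet.discard)
# await whatever is still running once the stream is exhausted
await asyncio.gather(*taskSet)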
I have the following code running in an event loop where I'm downloading a large number of files using asyncio and restricting the number of files downloaded using asyncio.queue:
download_tasks = asyncio.Queue()
for file in files:
    # download_file() is an async function that downloads a file from Microsoft blob storage
    # that is basically await blob.download_blob()
    download_tasks.put_nowait(asyncio.create_task(download_file(file=file)))
async def worker():
    while not download_tasks.empty():
        return await download_tasks.get_nowait()
worker_limit = 10
# each call to download_file() returns a pandas dataframe
df_list = await asyncio.gather(*[worker() for _ in range(worker_limit)], return_exceptions=True)
df = pd.concat(df_list)
This code seems to run fine, but I originally had the for loop defined as:
for file in files:
    # download_file() is an async function that downloads a file from Microsoft blob storage
    # that is basically await blob.download_blob()
    download_tasks.put_nowait(download_file(file=file))
With this code, the result is the same but I get the following warning:
RuntimeWarning: coroutine 'download_file' was never awaited
Looking at asyncio examples, sometimes I see create_task() used when creating a list or queue of coroutines to be run in gather and sometimes I don't. Why is it needed in my case and what's the best practice for using it?
Edit: As @user2357112supportsMonica discourteously pointed out, the return statement within worker() doesn't really make sense. The point of this code is to limit concurrency, because I may have to download thousands of files at a time and would like to limit it to 10 at a time using the queue. So my actual question is: how can I use gather to return all my results using this queue implementation?
Edit 2: I seemed to have found an easy solution that works using a semaphore instead of a queue with the following code adapted from this answer https://stackoverflow.com/a/61478547/4844593:
download_tasks = []
for file in files:
    download_tasks.append(download_file(file=file))

async def gather_with_concurrency(n, *tasks):
    semaphore = asyncio.Semaphore(n)

    async def sem_task(task):
        async with semaphore:
            return await task

    return await asyncio.gather(*(sem_task(task) for task in tasks))

df_list = await gather_with_concurrency(10, *download_tasks)
return pd.concat(df_list)
As "user2357112 supports Monica" notes, the original issue probably comes from the workers having a return so each worker will download one file then quit, meaning any coroutines after the first 10 will be ignored and never awaited (you can probably see that if you log information about download_tasks after the supposed completion of your processing).
The create_tasks defeats that because it will immediately schedule the downloading at the same time (defeating the attempted rate limiting / workers pool), then the incorrect worker code will just ignore anything after the first 10 items.
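For reference, a worker that actually drains the queue might look like the sketch below (assuming download_tasks holds plain coroutines, as in your second variant, with worker_limit and pd as defined above); it appends results to a shared list instead of returning after the first item:
async def worker(results):
    while not download_tasks.empty():
        # no await between the empty() check and get_nowait(),
        # so this is safe on a single-threaded event loop
        coro = download_tasks.get_nowait()
        results.append(await coro)

df_list = []
await asyncio.gather(*[worker(df_list) for _ in range(worker_limit)])
df = pd.concat(df_list)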
Anyway, the difference between coroutines (i.e. bare async functions) and tasks is that tasks are independently scheduled. That is, once you've created a task it lives its life independently, and you don't have to await it if you don't want its result. That is similar to JavaScript's async functions.
Coroutines, however, don't do anything until they are awaited; they will only progress if they are explicitly polled, and that is only done by awaiting them (directly or indirectly, e.g. gather or wait will await/poll the objects they wrap).
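A tiny demonstration of that difference (a sketch with a hypothetical say helper):
import asyncio

async def say(text, delay):
    await asyncio.sleep(delay)
    print(text)

async def main():
    coro = say("from coroutine", 0.1)                  # nothing is running yet
    task = asyncio.create_task(say("from task", 0.1))  # scheduled immediately
    await asyncio.sleep(0.5)  # the task finishes in the background meanwhile
    await coro                # only now does the coroutine start executing

asyncio.run(main())  # prints "from task", then "from coroutine"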
I want to return the first element of an async generator and handle the remaining values without returning, fire-and-forget style. How can I make an early return from a coroutine in Python?
After passing the iterator to asyncio.create_task, it doesn't print the remaining values.
import asyncio
import time

async def async_iter(num):
    for i in range(num):
        await asyncio.sleep(0.5)
        yield i

async def handle_remains(it):
    async for i in it:
        print(i)

async def run() -> None:
    it = async_iter(10)
    async for i in it:
        print(i)
        break
    # await handle_remains(it)
    # want to make this `fire and forget` (no await), expecting it to just print the remaining values
    asyncio.create_task(handle_remains(it))
    return i

if __name__ == '__main__':
    asyncio.run(run())
    time.sleep(10)
You’re close with the code, but not quite there yet (see also my comments above). In short, creating the Task isn’t enough: the Task needs to run:
task = asyncio.create_task(handle_remains(it)) # Creates a Task in `pending` state.
await task # Run the task, i.e. execute the wrapped coroutine.
A Task, along with coroutines and Futures, is an “Awaitable”. In fact:
When a coroutine is wrapped into a Task with functions like asyncio.create_task() the coroutine is automatically scheduled to run soon.
Notice the "scheduled to run soon": now you have to make sure to actually run the task by calling await, a keyword which…
is used to obtain a result of coroutine execution.
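So one possible fix for run() in your question is simply to keep the task and await it before returning (a sketch; note that this does wait for the remaining values, which is exactly the point: the task must get a chance to run before asyncio.run() closes the loop):
async def run():
    it = async_iter(10)
    async for i in it:
        print(i)
        break
    task = asyncio.create_task(handle_remains(it))
    await task  # without this, the loop is torn down before the task runs
    return i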
I am learning Python asyncio and testing a lot of code with it.
Below is code where I try to subscribe to multiple WebSocket streams using asyncio and aiohttp.
I do not understand why, when coro(item1, item2) is executed as a task, it does not enter the async with ... block (i.e. "A" is printed but not "B").
Could anyone help me understand the reason for this?
(I already have working code, but I simply want to understand the mechanism behind this.)
Code
import aiohttp
import asyncio
import json

async def coro(item1, item2):
    print("A")
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(url='URL') as ws:
            print("B")
            await asyncio.gather(ws.send_json(item1),
                                 ws.send_json(item2))
            print("C")
            async for msg in ws:
                print(msg)

async def ws_connect(item1, item2):
    task = asyncio.create_task(coro(item1, item2))
    return task

async def main():
    item1 = {
        "method": "subscribe",
        "params": {'channel': "..."}
    }
    item2 = {
        "method": "subscribe",
        "params": {'channel': "..."}
    }
    ws_task = await ws_connect(item1, item2)
    print("D")

asyncio.run(main())
Output
D
A
B is never printed because you never await the returned task, only the method which returned it.
The subtle mistake is in return task followed by await ws_connect(item1, item2).
TL;DR: return await task.
The key to understanding the program's output is to know that context switches in the asyncio event loop can only occur at a few places, in particular at await expressions. At these points, the event loop may suspend the current coroutine and continue with another.
First, you create a ws_connect coroutine and immediately await it; this forces the event loop to suspend main and actually run ws_connect, because there is nothing else to run.
Since ws_connect contains none of those points that allow a context switch, the coro() function never actually starts.
The only thing create_task does is bind the coroutine to the task object and add it to the event loop's queue. But you never await the task; you just return it like any ordinary return value. Now ws_connect() finishes and the event loop can choose to run any of the tasks; it chooses to continue with main, probably since it has been waiting on ws_connect().
Okay, main prints D and returns. Now what?
There is some extra await inside asyncio.run which gives coro() a chance to start, hence the printed A (but only after D). Yet nothing forces asyncio.run to wait on coro(), so when the coroutine yields back to the event loop through async with, run finishes and the program exits, leaving coro() unfinished.
If you add an extra await asyncio.sleep(1) after print('D'), the loop will again suspend main for at least some time and continue with coro(), and that would print B had the URL been correct.
Actually, the context switching is a little more complicated, because an ordinary await on a coroutine usually does not switch unless the execution really needs to block on I/O or similar; await asyncio.sleep(0) or yield* guarantees a true context switch without the extra blocking.
*yield from inside an __await__ method.
The lesson here is simple: never return awaitables from async methods, as it leads to exactly this kind of mistake. Always use return await by default; at worst you get a runtime error in case the returned object is not actually awaitable (like return await some_string), and that is easily spotted and fixed.
On the other hand, returning awaitables from ordinary functions is OK and makes the function act as if it were asynchronous, although one should be careful when mixing these two approaches. Personally, I prefer the first approach, as it shifts the responsibility to the writer of the function rather than the user, who would otherwise only be warned by linters, which usually do detect non-awaited coroutine calls but not returned awaitables. So another solution would be to make ws_connect an ordinary function; then the await in await ws_connect(...) would apply to the returned value (= the task), not the function itself.
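Concretely, either of the two fixes sketched below would make B reachable (given a valid URL):
# fix 1: stay async and always `return await`
async def ws_connect(item1, item2):
    task = asyncio.create_task(coro(item1, item2))
    return await task  # main's `await ws_connect(...)` now really waits for coro()

# fix 2: make ws_connect an ordinary function returning the task;
# main's `await ws_connect(item1, item2)` then awaits the task itself
def ws_connect(item1, item2):
    return asyncio.create_task(coro(item1, item2))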
I'm trying to understand how asyncio works. For I/O operations, I understand that when await is called, we register a Future object in the event loop, then call epoll to get the sockets belonging to Future objects that are ready to give us data. After that, we run the registered callback and resume the function's execution.
But the thing I can't understand is what happens when we use await on something that is not an I/O operation. How does the event loop understand that the task is complete? Does it create a socket for it, or use another kind of loop? Does it use epoll? Or is it not added to the loop at all and just used as a generator?
There is an example:
import asyncio

async def test():
    return 10

async def my_coro(delay):
    loop = asyncio.get_running_loop()
    end_time = loop.time() + delay
    while True:
        print("Blocking...")
        await test()
        if loop.time() > end_time:
            print("Done.")
            break

async def main():
    await my_coro(3.0)

asyncio.run(main())
await doesn't automatically yield to the event loop; that happens only when an async function (anywhere in the chain of awaits) requests suspension, typically due to I/O or a timeout not being ready.
In your example the event loop is never returned to, which you can easily verify by moving the "Blocking" print before the while loop and changing main to await asyncio.gather(my_coro(3.0), my_coro(3.0)). What you'll observe is that the coroutines are executed in series ("blocking" followed by "done", all repeated twice), not in parallel ("blocking" followed by another "blocking" and then twice "done"). The reason is that there was simply no opportunity for a context switch: my_coro executed in one go, as if it were an ordinary function, because none of its awaits ever chose to suspend.
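The modified experiment would look roughly like this (a sketch of the change described above, reusing test() from the question):
async def my_coro(delay):
    loop = asyncio.get_running_loop()
    end_time = loop.time() + delay
    print("Blocking...")  # moved out of the loop: printed once per coroutine
    while True:
        await test()      # never suspends, so no other coroutine can run here
        if loop.time() > end_time:
            print("Done.")
            break

async def main():
    # runs in series: Blocking... Done. Blocking... Done.
    await asyncio.gather(my_coro(3.0), my_coro(3.0))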
I was trying to explain an example of async programming in Python, but I failed.
Here is my code.
import asyncio
import time

async def asyncfoo(t):
    time.sleep(t)
    print("asyncFoo")

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncfoo(10))  # I think here is the problem
print("Foo")
loop.close()
My expectation is that I would see:
Foo
asyncFoo
With a wait of 10s before asyncFoo was displayed.
But instead I got nothing for 10s, and then they both displayed.
What am I doing wrong, and how can I explain it?
run_until_complete will block until asyncfoo is done. Instead, you would need two coroutines executed in the loop. Use asyncio.gather to easily start more than one coroutine with run_until_complete.
Here is an example:
import asyncio

async def async_foo():
    print("asyncFoo1")
    await asyncio.sleep(3)
    print("asyncFoo2")

async def async_bar():
    print("asyncBar1")
    await asyncio.sleep(1)
    print("asyncBar2")

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(async_foo(), async_bar()))
loop.close()
Your expectation would work in contexts where you run your coroutine as a Task independent of the flow of the code. Another situation where it would work is if you are running multiple coroutines side-by-side, in which case the event-loop will juggle the code execution from await to await statement.
Within the context of your example, you can achieve your anticipated behaviour by wrapping your coroutine in a Task object, which will continue-on in the background without holding up the remainder of the code in the code-block from whence it is called.
For example:
import asyncio

async def asyncfoo(t):
    await asyncio.sleep(t)
    print("asyncFoo")

async def my_app(t):
    my_task = asyncio.ensure_future(asyncfoo(t))
    print("Foo")
    await asyncio.wait([my_task])

loop = asyncio.get_event_loop()
loop.run_until_complete(my_app(10))
loop.close()
Note that you should use asyncio.sleep() instead of the time module.
run_until_complete is blocking. So even though asyncfoo will complete in 10 seconds, run_until_complete waits for it; only after it completes does the other print occur.
You should launch your loop.run_until_complete(asyncfoo(10)) in a thread or a subprocess if you want "Foo" to be printed first.
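For instance, a thread-based sketch (reusing the names from the question, with asyncio.sleep instead of time.sleep as noted above):
import asyncio
import threading

async def asyncfoo(t):
    await asyncio.sleep(t)
    print("asyncFoo")

# run the event loop in a separate thread so the main thread continues
thread = threading.Thread(target=lambda: asyncio.run(asyncfoo(10)))
thread.start()
print("Foo")  # printed immediately, roughly 10s before asyncFoo
thread.join()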