Awaiting multiple async functions one after another does not actually run them concurrently. For example, I expected the code below to run in ~6 seconds, but it runs like synchronous code and takes ~10 seconds. When I tried asyncio.gather instead, it finished in ~6 seconds. Can someone explain why this is so?
# Not working concurrently
import asyncio
import time

async def async_sleep(n):
    await asyncio.sleep(n + 2)
    await asyncio.sleep(n)

start_time = time.time()
asyncio.run(async_sleep(4))
end_time = time.time()
print(end_time - start_time)
# Working concurrently
async def async_sleep(n):
    await asyncio.gather(asyncio.sleep(n + 2),
                         asyncio.sleep(n))
Can someone explain why [gather is faster than consecutive awaits]?
That is by design: await x means "do not proceed with this coroutine until x is complete." If you place two awaits one after the other, they will naturally execute sequentially. If you want concurrent execution, you need to create tasks and wait for them to finish, or use asyncio.gather, which does that for you.
This question already has answers here: asyncio.sleep() vs time.sleep() (2 answers)
I am confused about the extent to which the following example from the Python documentation differs from time.sleep. If you replace asyncio.sleep with time.sleep below, both versions take 3 seconds; I see no difference! What is the point of this example in the documentation? Shouldn't the async version actually take 2 seconds instead? My understanding was that both calls to say_after practically start at the same time (isn't that the whole point of asyncio.sleep?), so the total delay should just be the longest one. Help me understand what I am getting wrong.
import asyncio
import time

async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)

async def main():
    print(f"started at {time.strftime('%X')}")
    await say_after(1, 'hello')
    await say_after(2, 'world')
    print(f"finished at {time.strftime('%X')}")

asyncio.run(main())
With time.sleep, the program waits for the specified time before continuing to the next line. This means the program is blocked for the entire duration of the sleep and cannot perform any other tasks in the meantime.

asyncio.sleep, on the other hand, is a coroutine function that allows the program to continue executing other tasks while the sleep is in progress. This means the program is not blocked and can keep making progress on other tasks concurrently.

This is important in real-world scenarios where multiple tasks need to be executed at the same time and the program should not be blocked waiting for one task to complete. Using asyncio.sleep allows for more efficient use of resources and time.
asyncio.sleep() vs time.sleep() from the comment above does a good job of explaining this with an example, so you might check that as well.
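To see the difference concretely, here is a small sketch (the ticker/polite/rude names are mine): a background ticker makes progress during asyncio.sleep but is starved by time.sleep, because the latter never gives control back to the event loop:

```python
import asyncio
import time

async def ticker(results):
    # Makes progress only while the other coroutine is suspended.
    for _ in range(3):
        await asyncio.sleep(0.05)
        results.append("tick")

async def polite(results):
    await asyncio.sleep(0.2)   # suspends, so ticker keeps running
    results.append("polite done")

async def rude(results):
    time.sleep(0.2)            # blocks the whole event loop; ticker is starved
    results.append("rude done")

async def demo(worker):
    results = []
    await asyncio.gather(ticker(results), worker(results))
    return results

print(asyncio.run(demo(polite)))  # all ticks appear before "polite done"
print(asyncio.run(demo(rude)))    # "rude done" first; nothing else ran meanwhile
```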
Update:
The comments in the docs to your snippet state:
The following snippet of code will print “hello” after waiting for 1 second, and then print “world” after waiting for another 2 seconds
In this example, you might expect the second say_after call to start immediately after the first one, without waiting for it to finish. However, await asyncio.sleep suspends the execution of the current coroutine until the specified delay has passed. This means the two say_after calls run consecutively and take the sum of the delays, 3 seconds in this case, although they do not block other tasks that might be running concurrently.

You can make the two say_after calls run concurrently, which is not possible with time.sleep:
import asyncio
import time

async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)

async def main():
    print(f"started at {time.strftime('%X')}")
    await asyncio.gather(say_after(1, 'hello'), say_after(2, 'world'))
    print(f"finished at {time.strftime('%X')}")

asyncio.run(main())
To have the program take only 2 seconds, as you were originally expecting, you could also use the next function from the documentation, asyncio.create_task(), like this:
import asyncio
import time

async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)

async def main():
    print(f"started at {time.strftime('%X')}")
    tasks = [asyncio.create_task(say_after(1, 'hello')),
             asyncio.create_task(say_after(2, 'world'))]
    await asyncio.gather(*tasks)
    print(f"finished at {time.strftime('%X')}")

asyncio.run(main())
I'm trying to understand how asyncio.create_task actually works. Suppose I have the following code:
import asyncio
import time

async def delayer():
    await asyncio.sleep(1)

async def messenger():
    await asyncio.sleep(1)
    return "A Message"

async def main():
    message = await messenger()
    await delayer()

start_time = time.time()
asyncio.run(main())
end_time = time.time() - start_time
print(end_time)
This code takes about 2 seconds. But if I change the body of main like this:
import asyncio
import time

async def delayer():
    await asyncio.sleep(1)

async def messenger():
    await asyncio.sleep(1)
    return "A Message"

async def main():
    task1 = asyncio.create_task(delayer())
    task2 = asyncio.create_task(delayer())
    await task1
    await task2

start_time = time.time()
asyncio.run(main())
end_time = time.time() - start_time
print(end_time)
Now the code will take about 1 second.
My understanding from what I have read is that await is blocking, as the first snippet shows: there we wait 1 second for messenger to return, then another second for delayer.

The real question comes from the second snippet. We just learned that await makes us wait for its expression to finish. So even if we use asyncio.create_task, shouldn't the awaits inside each function's body block the process until they finish, and thus give us 2 seconds for the program to end?

If that isn't the case, can you help me understand asyncio.create_task?
What I know:
await is a blocking process
await executes coroutine function and task object
await makes us possible to pause coroutine process (I don't quite understand about this, too)
create_task creates task object and then schedule and execute it as soon as possible
What I am expecting:
I hope I can get a simple but effective answer about how does asyncio.create_task conduct its work using my sample code.
Perhaps it will help to think in the following way.
You cannot understand what await does until you understand what an event loop is. This line:
asyncio.run(main())
creates and executes an event loop, which is basically an infinite loop with some methods for allowing an exit, a "semi-infinite" loop, so to speak. Until that loop exits, it is entirely responsible for executing the program. (Here I am assuming that your program has only a single thread and a single process; I'm not talking about concurrent programming in any form.) Each unit of code that can run within an event loop is called a "Task." The idea of the loop is that it can run multiple Tasks by switching from one to another, thus giving the illusion that the CPU is doing more than one thing at a time.

The asyncio.run() call does a second thing: it creates a Task from main(). At that moment, it's the only Task. The event loop begins to run that Task at its first line. Initially it runs just like any other function:
async def main():
    task1 = asyncio.create_task(delayer())
    task2 = asyncio.create_task(delayer())
    await task1
    await task2
It creates two more tasks, task1 and task2. Now there are 3 Tasks but only one of them can be active. That's still main(). Then you come to this line:
await task1
The await keyword is what allows this whole rigmarole to work. It is an instruction to the event loop to suspend the active task right here, at this point, and possibly allow another Task to become the active one. So to address your first bullet point, await is neither "blocking" nor is it a "process". Its purpose is to mark a point at which the event loop gets control back from the active Task.
There is another thing happening here. The object that follows the await is called, unimaginatively, an "awaitable" object. Its crucial property is whether or not it is "done." The event loop keeps track of this object; as the loop cycles through its Tasks it will keep checking this object. If it's not done, main() doesn't resume. (This isn't exactly how it's implemented because that would be inefficient, but it's conceptually what's happening.) If you want to say that the await is "blocking" main() until task1 is finished, that's sort-of true; but "blocking" has a technical meaning so it's not the best word to use. In any case, the event loop is not "blocked" at all - it can keep running other Tasks until the awaitable task1 is done. After task1 becomes "done" and main() gets its turn to be the active task, execution continues to the next line of code.
Your second bullet point, "await executes coroutine function and task object" is not correct. await doesn't execute anything. As I said, it just marks a point where the Task gets suspended and the event loop gets control back. Its awaitable determines when the Task can be resumed.
You say, "await makes [it] possible to pause coroutine process". Not quite right - it ALWAYS suspends the current Task. Whether or not there is a significant delay in the Task's execution depends on whether there are other Tasks that are ready to take over, and also the state of its awaitable.
"create_task creates task object and then schedule and execute it as soon as possible." Correct. But "as soon as possible" means the next time the current Task hits an await expression. Other Tasks may get a turn to run first, before the new Task gets a chance to start. Those details are up to the implementation of the event loop. But eventually the new Task will get a turn.
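This scheduling behaviour is easy to observe. In the sketch below (names are mine), the created task does not run until the creating coroutine hits its first await:

```python
import asyncio

order = []

async def child():
    order.append("child started")

async def main():
    order.append("before create_task")
    task = asyncio.create_task(child())  # scheduled, but not run yet
    order.append("after create_task")
    await asyncio.sleep(0)               # first await: the loop runs child
    order.append("after await")
    await task

asyncio.run(main())
print(order)
# ['before create_task', 'after create_task', 'child started', 'after await']
```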
In the comments you ask, "Is it safe if I say that plain await, not being involved in any event loop or any kind of it, works in blocking manner?" It's absolutely not safe to say that. First of all, there is no such thing as a "plain await": your Task must wait FOR something, otherwise how would the event loop know when to resume it? An await without an event loop is either a syntax error or a runtime error; it makes no sense, because await is a point where the Task and the event loop interact. The main point is that event loops and await expressions are intimately related: an await without an event loop is an error; an event loop without any await expressions is useless.
The closest you can come to a plain await is this expression:
await asyncio.sleep(0)
which has the effect of suspending the current Task momentarily, giving the event loop a chance to run other tasks, resuming this Task as soon as possible.
One other point is that the code:
await task1
is an expression which has a value, in this case the returned value from task1. Since your task1 doesn't return anything this will be None. But if your delayer function looked like this:
async def delayer():
    await asyncio.sleep(1)
    return "Hello"
then in main() you could write:
print(await task1)
and you would see "Hello" on the console.
I have been learning and exploring Python asyncio for a while. Before starting this journey I read loads of articles to understand the subtle differences between multithreading, multiprocessing, and asyncio. But, as far as I can tell, I missed something fundamental. I'll try to explain what I mean with the pseudocode below.
import asyncio
import time

async def io_bound():
    print("Running io_bound...")
    await asyncio.sleep(3)

async def main():
    start = time.perf_counter()
    result_1 = await io_bound()
    result_2 = await io_bound()
    end = time.perf_counter()
    print(f"Finished in {round(end - start, 0)} second(s).")

asyncio.run(main())
For sure, it takes around 6 seconds, because we awaited the io_bound coroutine twice in a row rather than scheduling both calls as tasks on the event loop, so they did not run concurrently. To run them concurrently I would have to use asyncio.gather(*tasks); run that way, they would take only about 3 seconds.
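For reference, a scaled-down sketch of the gather version (0.1-second sleeps standing in for the 3-second ones, and a delay parameter added for the scaling):

```python
import asyncio
import time

async def io_bound(delay):
    # Stand-in for the 3-second coroutine in the question, scaled to 0.1 s.
    await asyncio.sleep(delay)

async def main():
    start = time.perf_counter()
    # Both coroutines run concurrently, so the total is one delay, not two.
    await asyncio.gather(io_bound(0.1), io_bound(0.1))
    print(f"Finished in {time.perf_counter() - start:.2f} second(s).")

asyncio.run(main())
```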
Let's imagine this io_bound coroutine is a coroutine that queries a database to get back some data. This application could be built with FastAPI roughly as follows.
from fastapi import FastAPI

app = FastAPI()

@app.get("/async-example")
async def async_example():
    result_1 = await get_user()
    result_2 = await get_countries()
    if result_1:
        return {"result": result_2}
    return {"result": None}
Let's say the get_user and get_countries coroutines take 3 seconds each and have their asynchronous queries implemented correctly. My questions are:
Do I need to use asyncio.gather(*tasks) for these two database queries? If necessary, why? If not, why?
What is the difference between io_bound, which I call twice, and get_user and get_countries, which I call back to back, in the above example?
In the io_bound example, if I did the same thing in FastAPI, wouldn't it take only 6 seconds to give a response back? If so, why not 3 seconds?
In the context of FastAPI, when would be the right time to use asyncio.gather(*tasks) in an endpoint?
Do I need to use asyncio.gather(*tasks) for these two database queries? If necessary, why? If not, why?
Do you need to? Nope, what you have done works. The request will take 6 seconds but will not block, so if another request comes in, FastAPI can process the two requests at the same time. I.e., two requests arriving at the same time will still take 6 seconds, rather than 12.

If the two functions get_user() and get_countries() are independent of each other, then you can run them concurrently using asyncio.gather or any of the many other ways asyncio provides, which means the request will take just 3 seconds. For example:
async def main():
    start = time.perf_counter()
    result_1_task = asyncio.create_task(io_bound())
    result_2_task = asyncio.create_task(io_bound())
    result_1 = await result_1_task
    result_2 = await result_2_task
    end = time.perf_counter()
    print(f"Finished in {round(end - start, 0)} second(s).")
or
async def main_2():
    start = time.perf_counter()
    results = await asyncio.gather(io_bound(), io_bound())
    end = time.perf_counter()
    print(f"Finished in {round(end - start, 0)} second(s).")
What is the difference between io_bound, which I call twice, and get_user and get_countries, which I call back to back, in the above example?
Assuming get_user and get_countries just call io_bound, nothing.
In the io_bound example, if I did the same thing in FastAPI, wouldn't it take only 6 seconds to give a response back? If so, why not 3 seconds?
It will take 6 seconds. FastAPI doesn't do magic to change the way your functions work; it just makes it easy to build a server that runs asynchronous functions.
In the context of FastAPI, when would be the right time to use asyncio.gather(*tasks) in an endpoint?
When you want to run two or more asynchronous functions concurrently. This is the same regardless of whether you are using FastAPI or any other asynchronous Python code.
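As a plain-asyncio sketch of that endpoint pattern (get_user and get_countries here are hypothetical stand-ins simulating 0.1-second queries, not real database calls or FastAPI code):

```python
import asyncio
import time

# Hypothetical stand-ins for the independent async database queries.
async def get_user():
    await asyncio.sleep(0.1)
    return {"id": 1}

async def get_countries():
    await asyncio.sleep(0.1)
    return ["NL", "TR"]

async def endpoint():
    # Independent queries: run them concurrently inside the handler.
    user, countries = await asyncio.gather(get_user(), get_countries())
    return {"result": countries} if user else {"result": None}

start = time.perf_counter()
print(asyncio.run(endpoint()), f"{time.perf_counter() - start:.2f}s")  # ~0.1s
```

The handler itself stays a single coroutine; only the independent awaits inside it are gathered.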
I'm trying to understand how asyncio works. For I/O operations, my understanding is that when await is reached, a Future object is registered with the event loop, and epoll is then used to find the sockets belonging to those Future objects that are ready to hand us data. The registered callback then runs and the function's execution resumes.

What I can't understand is what happens when await is used on something that is not an I/O operation. How does the event loop know the task is complete? Does it create a socket for it, or use some other kind of loop? Does it use epoll? Or is it not added to the loop at all and simply driven as a generator?

Here is an example:
import asyncio

async def test():
    return 10

async def my_coro(delay):
    loop = asyncio.get_running_loop()
    end_time = loop.time() + delay
    while True:
        print("Blocking...")
        await test()
        if loop.time() > end_time:
            print("Done.")
            break

async def main():
    await my_coro(3.0)

asyncio.run(main())
await doesn't automatically yield to the event loop; that happens only when an async function (somewhere down the chain of awaits) requests suspension, typically because I/O or a timeout is not yet ready.

In your example the event loop is never returned to, which you can easily verify by moving the "Blocking" print before the while loop and changing main to await asyncio.gather(my_coro(3.0), my_coro(3.0)). What you'll observe is that the coroutines execute in series ("Blocking" followed by "Done", repeated twice), not in parallel ("Blocking" twice followed by "Done" twice). The reason is that there was simply no opportunity for a context switch: each my_coro executed in one go, as if it were an ordinary function, because none of its awaits ever chose to suspend.
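That serial execution can be made visible with a small variation (names are mine): each coroutine awaits only a coroutine that never suspends, so the two tasks run one after the other with no interleaving.

```python
import asyncio

log = []

async def test():
    return 10  # completes without ever suspending

async def busy(name):
    for _ in range(3):
        log.append(name)
        await test()  # no suspension point: the loop never regains control

async def main():
    await asyncio.gather(busy("a"), busy("b"))

asyncio.run(main())
print(log)  # ['a', 'a', 'a', 'b', 'b', 'b']: serial, no interleaving
```

Replacing `await test()` with `await asyncio.sleep(0)` makes the entries interleave, because sleep(0) actually suspends the task.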
I'm trying to use asyncio to handle concurrent network I/O. A very large number of functions are scheduled at a single point, and they vary greatly in how long each takes to complete. The received data is then processed in a separate process for each output.

The order in which the data is processed is not relevant, so given the potentially very long wait for output, I'd like to await whichever future finishes first instead of a predefined order.
import asyncio
import time

def fetch(x):
    time.sleep(1)  # placeholder for a blocking GET request

async def main():
    futures = [loop.run_in_executor(None, fetch, x) for x in range(50)]
    for f in futures:
        await f

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Normally, awaiting in order in which futures were queued is fine:
Blue represents the time each task spends in the executor's queue, i.e. run_in_executor has been called but the function has not yet executed, since the executor runs only 5 tasks simultaneously; green is the time spent executing the function itself; and red is the time spent waiting for all previous futures to be awaited.

In my case, where the functions vary greatly in duration, a lot of time is lost waiting for earlier futures in the queue to be awaited, time I could spend locally processing GET output. This leaves my system idle for a while, only to be overwhelmed when several outputs complete simultaneously, then jump back to idling while waiting for more requests to finish.
Is there a way to await whatever future is first completed in the executor?
Looks like you are looking for asyncio.wait with return_when=asyncio.FIRST_COMPLETED.
import asyncio
import time

def fetch(x):
    time.sleep(1)  # placeholder for a blocking GET request

async def main():
    futures = [loop.run_in_executor(None, fetch, x) for x in range(50)]
    while futures:
        done, futures = await asyncio.wait(futures,
                                           return_when=asyncio.FIRST_COMPLETED)
        for f in done:
            await f

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
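Another option with the same effect is asyncio.as_completed, which yields awaitables in completion order rather than submission order. A sketch, where fetch simulates variable-duration blocking work:

```python
import asyncio
import random
import time

def fetch(x):
    # Simulated blocking call of variable duration (stand-in for the real GET).
    time.sleep(random.uniform(0.01, 0.1))
    return x

async def main():
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(None, fetch, x) for x in range(10)]
    results = []
    # as_completed hands back each future as soon as it finishes,
    # so processing can start without waiting on slower predecessors.
    for fut in asyncio.as_completed(futures):
        results.append(await fut)
    return results

print(asyncio.run(main()))
```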