I decided not to use asyncio.sleep() and tried to create my own coroutine function as shown below. Since time.sleep is an I/O-bound function, I thought this would print 7 seconds. But it prints 11 seconds.
import time
import asyncio

async def my_sleep(delay):
    time.sleep(delay)

async def main():
    start = time.time()
    await asyncio.gather(my_sleep(4), my_sleep(7))
    print("Took", time.time()-start, "seconds")

asyncio.run(main())
# Expected: Took 7 seconds
# Got: Took 11.011508464813232 seconds
Though if I write similar code with threads, it does print 7 seconds. Do the Task objects created by asyncio.gather not recognize time.sleep as an I/O-bound operation, the way threads do? Please explain why this is happening.
time.sleep is a blocking operation for the event loop. It makes no difference that you wrote async in the function's definition, because the function never gives control back to the event loop (there is no await inside it).
These two questions might help you understand more:
Python 3.7 - asyncio.sleep() and time.sleep()
Run blocking and unblocking tasks together with asyncio
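For comparison, a minimal sketch of the fix those answers describe: replacing time.sleep with await asyncio.sleep lets the two coroutines overlap, so the example takes about 7 seconds.
import time
import asyncio

async def my_sleep(delay):
    # asyncio.sleep suspends this coroutine and hands control back to the event loop
    await asyncio.sleep(delay)

async def main():
    start = time.time()
    await asyncio.gather(my_sleep(4), my_sleep(7))
    print("Took", time.time() - start, "seconds")  # ~7, not 11

asyncio.run(main())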
This would not work for you because time.sleep is a synchronous function.
From the 'perspective' of the event loop, my_sleep might as well be doing a heavy computation within an async function, never yielding the execution context while working.
The first telltale sign of this is that you're not using an await statement when calling time.sleep.
Making a synchronous function behave as an async one is not trivial, but the common approach is moving the function call to worker threads and awaiting the results.
I'd recommend looking at anyio's solution: they implemented a run_sync function which does exactly that.
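In the standard library, the same approach is available as loop.run_in_executor (or asyncio.to_thread on Python 3.9+). A minimal sketch of offloading time.sleep to a worker thread:
import time
import asyncio

async def my_sleep(delay):
    # Run the blocking call in a worker thread so the event loop stays free
    await asyncio.to_thread(time.sleep, delay)

async def main():
    start = time.time()
    await asyncio.gather(my_sleep(4), my_sleep(7))
    print("Took", time.time() - start, "seconds")  # ~7 seconds

asyncio.run(main())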
Related
I'm trying to understand how asyncio.create_task actually works. Suppose I have the following code:
import asyncio
import time

async def delayer():
    await asyncio.sleep(1)

async def messenger():
    await asyncio.sleep(1)
    return "A Message"

async def main():
    message = await messenger()
    await delayer()

start_time = time.time()
asyncio.run(main())
end_time = time.time() - start_time
print(end_time)
The code will take about 2 seconds. But if I make some changes to the body of main like this:
import asyncio
import time

async def delayer():
    await asyncio.sleep(1)

async def messenger():
    await asyncio.sleep(1)
    return "A Message"

async def main():
    task1 = asyncio.create_task(delayer())
    task2 = asyncio.create_task(delayer())
    await task1
    await task2

start_time = time.time()
asyncio.run(main())
end_time = time.time() - start_time
print(end_time)
Now the code will take about 1 second.
My understanding from what I read is that await is a blocking process, as we can see from the first code. In that code we need to wait 1 second for the messenger function to return, then another second for the delayer function.
Now the real question comes from the second code. We just learnt that await makes us wait for its expression to return. So even if we use asyncio.create_task, shouldn't the awaits in each function's body block the process and only return when the job is finished, thus giving us 2 seconds for the program to end?
If that isn't the case, can you help me understand asyncio.create_task?
What I know:
await is a blocking process
await executes coroutine function and task object
await makes us possible to pause coroutine process (I don't quite understand about this, too)
create_task creates task object and then schedule and execute it as soon as possible
What I am expecting:
I hope I can get a simple but effective answer about how asyncio.create_task conducts its work, using my sample code.
Perhaps it will help to think in the following way.
You cannot understand what await does until you understand what an event loop is. This line:
asyncio.run(main())
creates and executes an event loop, which is basically an infinite loop with some methods for allowing an exit - a "semi-infinite" loop, so to speak. Until that loop exits, it will be entirely responsible for executing the program. (Here I am assuming that your program has only a single thread and a single Process. I'm not talking about concurrent programming in any form.) Each unit of code that can run within an event loop is called a "Task." The idea of the loop is that it can run multiple Tasks by switching from one to another, thus giving the illusion that the CPU is doing more than one thing at a time.
The asyncio.run() call does a second thing: it creates a Task, main(). At that moment, it's the only Task. The event loop begins to run the Task at its first line. Initially it runs just like any other function:
async def main():
    task1 = asyncio.create_task(delayer())
    task2 = asyncio.create_task(delayer())
    await task1
    await task2
It creates two more tasks, task1 and task2. Now there are 3 Tasks but only one of them can be active. That's still main(). Then you come to this line:
await task1
The await keyword is what allows this whole rigmarole to work. It is an instruction to the event loop to suspend the active task right here, at this point, and possibly allow another Task to become the active one. So to address your first bullet point, await is neither "blocking" nor is it a "process". Its purpose is to mark a point at which the event loop gets control back from the active Task.
There is another thing happening here. The object that follows the await is called, unimaginatively, an "awaitable" object. Its crucial property is whether or not it is "done." The event loop keeps track of this object; as the loop cycles through its Tasks it will keep checking this object. If it's not done, main() doesn't resume. (This isn't exactly how it's implemented because that would be inefficient, but it's conceptually what's happening.) If you want to say that the await is "blocking" main() until task1 is finished, that's sort-of true; but "blocking" has a technical meaning so it's not the best word to use. In any case, the event loop is not "blocked" at all - it can keep running other Tasks until the awaitable task1 is done. After task1 becomes "done" and main() gets its turn to be the active task, execution continues to the next line of code.
Your second bullet point, "await executes coroutine function and task object" is not correct. await doesn't execute anything. As I said, it just marks a point where the Task gets suspended and the event loop gets control back. Its awaitable determines when the Task can be resumed.
You say, "await makes [it] possible to pause coroutine process". Not quite right - it ALWAYS suspends the current Task. Whether or not there is a significant delay in the Task's execution depends on whether there are other Tasks that are ready to take over, and also the state of its awaitable.
"create_task creates task object and then schedule and execute it as soon as possible." Correct. But "as soon as possible" means the next time the current Task hits an await expression. Other Tasks may get a turn to run first, before the new Task gets a chance to start. Those details are up to the implementation of the event loop. But eventually the new Task will get a turn.
In the comments you ask, "Is it safe if I say that plain await, not being involved in any event loop or any kind of it, works in blocking manner?" It's absolutely not safe to say that. First of all, there is no such thing as a "plain await". Your task must wait FOR something, otherwise how would the event loop know when to resume? An await without an event loop is either a syntax error or a runtime error - it makes no sense, because await is a point where the Task and the event loop interact. The main point is that event loops and await expression are intimately related: an await without an event loop is an error; an event loop without any await expressions is useless.
The closest you can come to a plain await is this expression:
await asyncio.sleep(0)
which has the effect of suspending the current Task momentarily, giving the event loop a chance to run other tasks, resuming this Task as soon as possible.
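To make the scheduling concrete, here is a small sketch (the child coroutine and the print statements are additions purely for illustration) showing that a Task created with create_task only starts running once the current Task suspends at an await:
import asyncio

async def child():
    print("child starts")
    await asyncio.sleep(0)      # suspend: hand control back to the loop
    print("child resumes")

async def main():
    task = asyncio.create_task(child())
    print("task created, child has not run yet")
    await asyncio.sleep(0)      # first suspension point: child gets a turn
    print("back in main")
    await task                  # suspend until child is done

asyncio.run(main())
# Output order:
# task created, child has not run yet
# child starts
# back in main
# child resumes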
One other point is that the code:
await task1
is an expression which has a value, in this case the returned value from task1. Since your task1 doesn't return anything this will be None. But if your delayer function looked like this:
async def delayer():
    await asyncio.sleep(1)
    return "Hello"
then in main() you could write:
print(await task1)
and you would see "Hello" on the console.
Just can't wrap my head around solving this issue, so maybe someone here can enlighten me or maybe even tell me that what I want to achieve isn't possible. :)
Problem statement:
I have an asyncio event loop; on that loop I create a task by supplying my asynchronous coroutine work(). I could then go ahead and cancel the task by invoking its cancel() method - this works.
But in my very special case, the asynchronous task itself spawns another operation, which is an underlying blocking / synchronous function.
What happens now, if I decide to cancel the task, is that my asynchronous work() function will be cancelled appropriately, however, the synchronous function is still going to be executed as if nothing ever happened.
I tried to make an example as simple as possible to illustrate my problem:
import asyncio
import time

def sync_work():
    time.sleep(10)
    print("sync work completed")
    return "sync_work_result"

async def work(loop):
    result = await loop.run_in_executor(None, sync_work)
    print(f"sync_work {result}")
    print("work completed")

async def main(loop):
    t1 = loop.create_task(work(loop))
    await asyncio.sleep(4)
    t1.cancel()

loop = asyncio.get_event_loop()
try:
    asyncio.ensure_future(main(loop))
    loop.run_forever()
except KeyboardInterrupt:
    pass
finally:
    print("loop closing")
    loop.close()
This will print out sync work completed after about 10 seconds.
How would I invoke the synchronous function in a way that would allow me to terminate it once my asynchronous task is cancelled? The tricky part is that I would not have control over sync_work(), as it comes from an external package.
I'm open to other approaches for calling my synchronous function from an asynchronous function that would allow it to be terminated properly in some way.
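One workaround that is sometimes suggested, sketched here under the assumption that sync_work can safely run in a child process, is to run the blocking call in its own process and terminate that process when the task is cancelled (note that the return value is lost this way and would need, for example, a multiprocessing queue or pipe to recover):
import asyncio
import multiprocessing
import time

def sync_work():
    time.sleep(10)
    print("sync work completed")
    return "sync_work_result"

async def work():
    # Run the blocking call in its own process so it can actually be killed.
    proc = multiprocessing.Process(target=sync_work)
    proc.start()
    try:
        # Poll the child without blocking the event loop.
        while proc.is_alive():
            await asyncio.sleep(0.1)
        print("work completed")
    except asyncio.CancelledError:
        proc.terminate()   # stops the synchronous work mid-flight
        proc.join()
        raise

async def main():
    t1 = asyncio.create_task(work())
    await asyncio.sleep(4)
    t1.cancel()
    await asyncio.gather(t1, return_exceptions=True)

if __name__ == "__main__":
    asyncio.run(main())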
I'm trying to understand how asyncio works. For I/O operations I understand that when await is called, we register a Future object with the event loop, and then epoll is called to get the sockets belonging to Future objects that are ready to give us data. After that, we run the registered callback and resume the function's execution.
But the thing I can't understand is what happens if we use await for something that is not an I/O operation. How does the event loop understand that the task is complete? Does it create a socket for that, or use another kind of loop? Does it use epoll? Or is it not added to the loop at all and just used as a generator?
Here is an example:
import asyncio

async def test():
    return 10

async def my_coro(delay):
    loop = asyncio.get_running_loop()
    end_time = loop.time() + delay
    while True:
        print("Blocking...")
        await test()
        if loop.time() > end_time:
            print("Done.")
            break

async def main():
    await my_coro(3.0)

asyncio.run(main())
await doesn't automatically yield to the event loop; that happens only when an async function (anywhere in the chain of awaits) requests suspension, typically because an I/O operation or a timeout is not yet ready.
In your example the event loop is never returned to, which you can easily verify by moving the "Blocking" print before the while loop and changing main to await asyncio.gather(my_coro(3.0), my_coro(3.0)). What you'll observe is that the coroutines are executed in series ("blocking" followed by "done", all repeated twice), not in parallel ("blocking" followed by another "blocking" and then twice "done"). The reason is that there was simply no opportunity for a context switch: my_coro executed in one go, as if it were an ordinary function, because none of its awaits ever chose to suspend.
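A sketch of that modified experiment (the only changes to the question's code are the relocated print and the gather call in main):
import asyncio

async def test():
    return 10

async def my_coro(delay):
    loop = asyncio.get_running_loop()
    end_time = loop.time() + delay
    print("Blocking...")
    while True:
        await test()                 # never actually suspends
        if loop.time() > end_time:
            print("Done.")
            break

async def main():
    await asyncio.gather(my_coro(3.0), my_coro(3.0))

asyncio.run(main())
# Prints "Blocking...", "Done.", "Blocking...", "Done." - the coroutines
# run one after the other, not interleaved.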
I'm trying to test whether I'm correctly using an asyncio library (aiofiles), to see if it's blocking.
How can I do this? My idea was to have an infinite loop of asyncio.sleep that checks the delay between each call, while I try to insert a blocking call in between.
This is my current attempt, but blocking is never called; I'm assuming the problem is that the infinite loop is itself also blocking.
from asyncio import sleep, get_event_loop
import time

async def test_blocking():
    while True:
        timea = time.time()
        await sleep(1)
        timeb = time.time() - timea
        delay = timeb - 1.00
        if delay >= 0.01:
            print(f'Function blocked by {delay}s')

async def blocking():
    print('Running blocking function')
    time.sleep(5)

loop = get_event_loop()
loop.run_until_complete(test_blocking())
loop.run_until_complete(blocking())
Your test_blocking coroutine is not actually blocking the event loop - it's implemented correctly. However, run_until_complete is a blocking call. So your script is never actually making it to the run_until_complete(blocking()) line - it's waiting for test_blocking to return, which never happens because it's an infinite loop. Use loop.create_task instead, and you'll see the behavior you expect.
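A sketch of one way to wire that up: the initial sleep(0) lets the watcher take its first timestamp before the blocking call, and the trailing sleep is an addition so the watcher actually gets a chance to resume and report before the program exits.
import asyncio
import time

async def test_blocking():
    while True:
        timea = time.time()
        await asyncio.sleep(1)
        delay = (time.time() - timea) - 1.00
        if delay >= 0.01:
            print(f'Function blocked by {delay}s')

async def blocking():
    print('Running blocking function')
    time.sleep(5)

async def main():
    watcher = asyncio.create_task(test_blocking())
    await asyncio.sleep(0)    # let the watcher start and record its timestamp
    await blocking()          # blocks the whole event loop for ~5 s
    await asyncio.sleep(1.1)  # give the watcher a chance to resume and report
    watcher.cancel()

asyncio.run(main())           # prints "Function blocked by ~4s"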
Also, you may want to consider using asyncio's debug mode to achieve what you're trying to do here. When enabled, it can alert you when a particular callback takes longer than a given amount of time:
Callbacks taking longer than 100ms are logged. The loop.slow_callback_duration attribute can be used to set the minimum execution duration in seconds that is considered "slow".
When enabled on your test script, it prints this:
Executing <Task finished name='Task-2' coro=<blocking() done, defined at a.py:13> result=None created at /home/dan/.pyenv/versions/3.8.3/lib/python3.8/asyncio/base_events.py:595> took 5.005 seconds
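For reference, debug mode itself can be switched on in a few standard ways:
loop.set_debug(True)                   # on an existing event loop
# or: asyncio.run(main(), debug=True)  # when using asyncio.run
# or: run the script with the PYTHONASYNCIODEBUG=1 environment variable set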
I'm getting the flow of using asyncio in Python 3.5, but I haven't seen a description of what things I should be awaiting, what things I should not be, or where it would be negligible. Do I just have to use my best judgement in terms of "this is an IO operation and thus should be awaited"?
By default all your code is synchronous. You can make it asynchronous by defining functions with async def and "calling" these functions with await. A more correct question would be "When should I write asynchronous code instead of synchronous?". The answer is "When you can benefit from it". In cases where you work with I/O operations, as you noted, you will usually benefit:
# Synchronous way:
download(url1)  # takes 5 sec.
download(url2)  # takes 5 sec.
# Total time: 10 sec.

# Asynchronous way:
await asyncio.gather(
    async_download(url1),  # takes 5 sec.
    async_download(url2)   # takes 5 sec.
)
# Total time: only 5 sec. (+ little overhead for using asyncio)
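A runnable toy version of the asynchronous side (async_download is a made-up name here, and asyncio.sleep merely stands in for real network I/O):
import asyncio
import time

async def async_download(url):
    await asyncio.sleep(5)            # stand-in for real network I/O
    return f"contents of {url}"

async def main():
    start = time.time()
    await asyncio.gather(async_download("url1"), async_download("url2"))
    print("Took", time.time() - start, "seconds")  # ~5, not 10

asyncio.run(main())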
Of course, if you created a function that uses asynchronous code, this function should be asynchronous too (should be defined as async def). But any asynchronous function can freely use synchronous code. It makes no sense to cast synchronous code to asynchronous without some reason:
# extract_links(url) should be async because it uses async func async_download() inside
async def extract_links(url):
    # async_download() was created async to get benefit of I/O
    html = await async_download(url)

    # parse() doesn't work with I/O, there's no sense to make it async
    links = parse(html)

    return links
One very important thing is that any long synchronous operation (> 50 ms, for example, it's hard to say exactly) will freeze all your asynchronous operations for that time:
async def extract_links(url):
    data = await download(url)
    links = parse(data)

    # if search_in_very_big_file() takes much time to process,
    # all your running async funcs (somewhere else in code) will be frozen
    # you need to avoid this situation
    links_found = search_in_very_big_file(links)
You can avoid this by calling long-running synchronous functions in a separate process (and awaiting the result):
from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor(2)

async def extract_links(url):
    data = await download(url)
    links = parse(data)

    # Now your main process can handle other async functions
    # while the separate process is running
    links_found = await loop.run_in_executor(executor, search_in_very_big_file, links)
One more example: when you need to use requests in asyncio. requests.get is just a synchronous long-running function, which you shouldn't call inside async code (again, to avoid freezing). But it runs long because of I/O, not because of long calculations. In that case, you can use ThreadPoolExecutor instead of ProcessPoolExecutor to avoid some multiprocessing overhead:
from concurrent.futures import ThreadPoolExecutor
import requests

executor = ThreadPoolExecutor(2)

async def download(url):
    response = await loop.run_in_executor(executor, requests.get, url)
    return response.text
You do not have much freedom. If you need to call a function you need to find out if this is a usual function or a coroutine. You must use the await keyword if and only if the function you are calling is a coroutine.
If async functions are involved, there should be an "event loop" which orchestrates these async functions. Strictly speaking it's not necessary (you can "manually" run an async method by sending values to it), but you probably don't want to do that. The event loop keeps track of not-yet-finished coroutines and chooses the next one to continue running. The asyncio module provides an implementation of an event loop, but this is not the only possible implementation.
Consider these two lines of code:
x = get_x()
do_something_else()
and
x = await aget_x()
do_something_else()
The semantics are absolutely the same: call a method which produces some value; when the value is ready, assign it to the variable x and do something else. In both cases the do_something_else function will be called only after the previous line of code has finished. It doesn't even mean that before, after, or during the execution of the asynchronous aget_x method control will be yielded to the event loop.
Still there are some differences:
the second snippet can appear only inside another async function
aget_x is not a usual function but a coroutine (that is, either declared with the async keyword or decorated as a coroutine)
aget_x is able to "communicate" with the event loop: that is, yield some objects to it. The event loop should be able to interpret these objects as requests to do some operations (e.g. to send a network request and wait for a response, or just suspend this coroutine for n seconds). A usual get_x function is not able to communicate with the event loop.
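As a toy illustration of those last two points (Request and aget_x are made-up names for this sketch), a coroutine can be driven "manually" with send(), and whatever it yields is exactly what an event loop would receive:
class Request:
    """An awaitable that yields one object out to whoever drives the coroutine."""
    def __init__(self, op):
        self.op = op
    def __await__(self):
        result = yield self.op   # hand 'op' to the driver, wait to be resumed
        return result

async def aget_x():
    # "Communicate" with the driver: hand it a request, get a reply back.
    reply = await Request("pretend this asks the loop for x")
    return reply

# A tiny hand-written "event loop": drive the coroutine with send().
coro = aget_x()
op = coro.send(None)              # run until the coroutine yields its request
print("coroutine asked for:", op)
try:
    coro.send(42)                 # resume it with a result
except StopIteration as stop:
    print("coroutine returned:", stop.value)   # 42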