asyncio cancel task and related function run via run_in_executor - python

Just can't wrap my head around solving this issue, so maybe someone here can enlighten me or maybe even tell me that what I want to achieve isn't possible. :)
Problem statement:
I have an asyncio event loop, and on that loop I create a task by supplying my asynchronous coroutine work(). I can then cancel the task by invoking its cancel() method, and this works.
But in my very special case, the asynchronous task itself spawns another operation, which is an underlying blocking / synchronous function.
What happens now, if I decide to cancel the task, is that my asynchronous work() function will be cancelled appropriately, however, the synchronous function is still going to be executed as if nothing ever happened.
I tried to make an example as simple as possible to illustrate my problem:
import asyncio
import time
def sync_work():
    time.sleep(10)
    print("sync work completed")
    return "sync_work_result"
async def work(loop):
    result = await loop.run_in_executor(None, sync_work)
    print(f"sync_work {result}")
    print("work completed")
async def main(loop):
    t1 = loop.create_task(work(loop))
    await asyncio.sleep(4)
    t1.cancel()
loop = asyncio.get_event_loop()
try:
    asyncio.ensure_future(main(loop))
    loop.run_forever()
except KeyboardInterrupt:
    pass
finally:
    print("loop closing")
    loop.close()
This will print out sync work completed after about 10 seconds.
How would I invoke the synchronous function in a way that would allow me to terminate it once my asynchronous task is cancelled? The tricky part is that I do not have control over sync_work(), as it comes from an external package.
I'm open to other approaches of calling my synchronous function from an asynchronous function that would allow it to be terminated properly in some kind of way.
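A thread started by run_in_executor cannot be interrupted from the outside, so one possible direction is to run the blocking call in a child process and kill that process when the task is cancelled. The following is only a minimal sketch under assumptions: run_killable and demo are hypothetical helper names, and the time.sleep one-liner stands in for importing and calling sync_work() from the external package.

```python
import asyncio
import sys


async def run_killable(code: str) -> int:
    """Run a Python snippet in a child process; kill it if we get cancelled."""
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", code)
    try:
        return await proc.wait()
    except asyncio.CancelledError:
        proc.kill()        # the blocking call dies together with its process
        await proc.wait()  # reap the child so it does not linger
        raise


async def demo(sleep_seconds: float = 10.0, cancel_after: float = 1.0) -> bool:
    # Stand-in for: "from external_package import sync_work; sync_work()"
    code = (
        f"import time; time.sleep({sleep_seconds}); "
        "print('sync work completed')"
    )
    task = asyncio.create_task(run_killable(code))
    await asyncio.sleep(cancel_after)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True  # the task was cancelled and the child was killed
    return False


if __name__ == "__main__":
    print("cancelled:", asyncio.run(demo()))
```

The trade-off is that arguments and results have to cross a process boundary (command line, pipes, or files), which may or may not be acceptable for the real sync_work().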

Related

How can I make a recurring async task (I don't control where asyncio.run() is called)

I'm using a library that itself makes the call to asyncio.run(internal_function) so I can't control that at all. I do however have access to the event loop, it's something that I pass into this library.
Given that, is there some way I can set up a recurring async event that will execute every X seconds while the main library is running?
This doesn't exactly work, but maybe it's close?
import asyncio
from third_party import run
loop = asyncio.new_event_loop()
async def periodic():
    while True:
        print("doing a thing...")
        await asyncio.sleep(30)
loop.create_task(periodic())
run(loop)  # internally this will call asyncio.run() using the given loop
The problem here of course is that the task I've created is never awaited. But I can't just await it, because that would block.
Edit: Here's a working example of what I'm facing. When you run this code you will only ever see "third party code executing" and never see "doing my stuff...".
import asyncio
# I don't know how the loop argument is used
# by the third party's run() function,
def third_party_run(loop):
    async def runner():
        while True:
            print("third party code executing")
            await asyncio.sleep(5)
    # but I do know that this third party eventually runs code
    # that looks **exactly** like this.
    try:
        asyncio.run(runner())
    except KeyboardInterrupt:
        return
loop = asyncio.new_event_loop()
async def periodic():
    while True:
        print("doing my stuff...")
        await asyncio.sleep(1)
loop.create_task(periodic())
third_party_run(loop)
If you run the above code you get:
third party code executing
third party code executing
third party code executing
^CTask was destroyed but it is pending!
task: <Task pending name='Task-1' coro=<periodic() running at example.py:22>>
/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py:674: RuntimeWarning: coroutine 'periodic' was never awaited
You don't need to await on a created task.
It will run in the background as long as the event loop is active and is not stuck in a CPU bound operation.
According to your comment, you don't have access to the event loop that the library actually runs. In that case you don't have many options other than running your task in a different thread (which will have its own loop), or changing the event loop policy in order to get hold of the loop, which is a very bad idea in most cases.
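The "different thread with its own loop" option could look roughly like the sketch below. It is only an illustration under assumptions: start_background and the ticks list are hypothetical names, and the asyncio.run() call at the bottom stands in for the third party's run() function.

```python
import asyncio
import threading
import time

ticks = []  # timestamps recorded by the periodic task


async def periodic(interval: float = 1.0):
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


def start_background(coro_factory):
    """Run coro_factory() on a private event loop in a daemon thread.

    Returns a stop() function that cancels the task and joins the thread.
    """
    loop = asyncio.new_event_loop()
    task = loop.create_task(coro_factory())  # scheduled, runs once the loop starts

    def runner():
        try:
            loop.run_until_complete(task)
        except asyncio.CancelledError:
            pass  # expected when stop() cancels the task
        finally:
            loop.close()

    thread = threading.Thread(target=runner, daemon=True)
    thread.start()

    def stop():
        # Loops are not thread-safe, so the cancel request is handed over.
        loop.call_soon_threadsafe(task.cancel)
        thread.join()

    return stop


if __name__ == "__main__":
    stop = start_background(lambda: periodic(1.0))
    asyncio.run(asyncio.sleep(3.5))  # stand-in for the third party's run()
    stop()
    print(f"periodic ran {len(ticks)} times")
```

Because the periodic task lives on another loop and another thread, anything it shares with the library's coroutines needs thread-safe access.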
I found a way to make your test program run. However, it's a hack. It could fail, depending on the internal design of your third party library. From the information you provided, the library has been structured to be a black box. You can't interact with the event loop or schedule a callback. It seems like there might be a very good reason for this.
If I were you I would try to contact the library designer and let them know what your problem is. Perhaps there is a better solution. If this is a commercial project, I would make 100% certain that the team understands the issue before attempting to use the solution below or anything like it.
The script below overrides one method (new_event_loop) in the DefaultEventLoopPolicy. When this method is called, I create a task in this loop to execute your periodic function. I don't know how often, or for what purpose, the library will call this function. Also, if the library internally overrides the EventLoopPolicy then this solution will not work. In both of these cases it may lead to unforeseeable consequences.
OK, enough disclaimers.
The only significant change to your test script was to replace the infinite loop in runner with one that times out. This allowed me to verify that the program shuts down cleanly.
import asyncio
# I don't know how the loop argument is used
# by the third party's run() function,
def third_party_run():
    async def runner():
        for _ in range(4):
            print("third party code executing")
            await asyncio.sleep(5)
    # but I do know that this third party eventually runs code
    # that looks **exactly** like this.
    try:
        asyncio.run(runner())
    except KeyboardInterrupt:
        return
async def periodic():
    while True:
        print("doing my stuff...")
        await asyncio.sleep(1)
class EventLoopPolicyHack(asyncio.DefaultEventLoopPolicy):
    def __init__(self):
        self.__running = None
        super().__init__()
    def new_event_loop(self):
        # Override to create our periodic task in the new loop
        # Get a loop from the superclass.
        # This method must return that loop.
        print("New event loop")
        loop = super().new_event_loop()
        if self.__running is not None:
            self.__running.cancel()  # I have no way to test this idea
        self.__running = loop.create_task(periodic())
        return loop
asyncio.set_event_loop_policy(EventLoopPolicyHack())
third_party_run()

Is this a good alternative of asyncio.sleep

I decided not to use asyncio.sleep() and tried to create my own coroutine function, as shown below. Since time.sleep is an IO-bound function, I thought this would print 7 seconds. But it prints 11 seconds.
import time
import asyncio
async def my_sleep(delay):
    time.sleep(delay)
async def main():
    start = time.time()
    await asyncio.gather(my_sleep(4), my_sleep(7))
    print("Took", time.time()-start, "seconds")
asyncio.run(main())
# Expected: Took 7 seconds
# Got: Took 11.011508464813232 seconds
However, if I write similar code with threads, it does print 7 seconds. Do Task objects created by asyncio.gather not recognize time.sleep as an IO-bound operation, the way threads do? Please explain why this is happening.
time.sleep is a blocking operation for the event loop. Writing async in the function definition makes no difference here, because the function never releases the event loop (there is no await inside it).
These two questions might help you understand more:
Python 3.7 - asyncio.sleep() and time.sleep()
Run blocking and unblocking tasks together with asyncio
This would not work for you because time.sleep is a synchronous function.
From the 'perspective' of the event loop my_sleep might as well be doing a heavy computation within an async function, never yielding the execution context while working.
The first telltale sign of this is that you're not using an await statement when calling time.sleep.
Making a synchronous function behave as an async one is not trivial, but the common approach is moving the function call to worker threads and awaiting the results.
I'd recommend looking at how anyio solves this: it provides a run_sync function (anyio.to_thread.run_sync) which does exactly that.
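With the standard library alone, the worker-thread approach can be sketched with asyncio.to_thread, which is available from Python 3.9. The helper name timed is an assumption made for this illustration; the rest follows the question's code.

```python
import asyncio
import time


async def my_sleep(delay: float) -> None:
    # Hand the blocking call to a worker thread and await its completion,
    # so the event loop stays free to run other tasks in the meantime.
    await asyncio.to_thread(time.sleep, delay)


async def timed(*delays: float) -> float:
    """Run my_sleep for every delay concurrently and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(my_sleep(d) for d in delays))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(timed(4, 7))
    print(f"Took {elapsed:.1f} seconds")
```

With this change the total time should be close to the longest delay rather than the sum of both, because the two sleeps now overlap in separate threads.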

Scheduling periodic function call in Quart/asyncio

I need to schedule a periodic function call in Python (i.e. called every minute) without blocking the event loop (I'm using the Quart framework with asyncio).
Essentially need to submit work onto the event loop, with a timer, so that the webserver keeps serving incoming requests in the meantime and roughly every minute it calls my function.
I tried many ways, for instance:
import asyncio
def do_work():
    print("WORK", flush=True)
async def schedule():
    await asyncio.sleep(0)
    print("scheduling")
    loop = asyncio.get_running_loop()
    t = loop.call_later(2, do_work)
    print("scheduled")
asyncio.run(schedule())
But it either never gets executed (like the code above), or it blocks the webserver main event loop. For instance, with the code above I would expect (since it's done within asyncio.run and schedule awaits timer) that "scheduling" would be printed after (or during) the server setup, but that's not the case, it blocks.
You can use a background task that is started on startup,
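async def schedule():
    while True:
        await asyncio.sleep(1)
        # assumes do_work is a coroutine function;
        # call do_work() without await if it is a plain function
        await do_work()
@app.before_serving
async def startup():
    app.add_background_task(schedule)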
which will run schedule for the lifetime of the app, being cancelled at shutdown.
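For comparison, the same pattern can be sketched in plain asyncio without Quart. It also hints at why the call_later attempt in the question never prints "WORK": asyncio.run(schedule()) returns as soon as schedule() finishes and closes the loop before the 2-second timer fires. The names run_every and serve are hypothetical, and serve only stands in for the web server's lifetime.

```python
import asyncio


async def run_every(interval: float, func) -> None:
    """Call the plain function func roughly every `interval` seconds."""
    while True:
        await asyncio.sleep(interval)
        func()


async def serve(duration: float, interval: float, func) -> None:
    """Hypothetical stand-in for the server: start the task, run, clean up."""
    task = asyncio.create_task(run_every(interval, func))
    try:
        await asyncio.sleep(duration)  # the server would handle requests here
    finally:
        task.cancel()
        # Wait for the cancellation to finish without re-raising it.
        await asyncio.gather(task, return_exceptions=True)


if __name__ == "__main__":
    asyncio.run(serve(5, 1, lambda: print("WORK", flush=True)))
```

The key point is that the loop must stay alive for as long as the periodic task is supposed to run, which is what the framework's background task mechanism takes care of.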

How asyncio understands that task is complete for non-blocking operations

I'm trying to understand how asyncio works. For I/O operations, I understand that when await is called, a Future object is registered with the event loop, which then calls epoll to find the sockets belonging to those Future objects that are ready to give us data. After that, the registered callback runs and function execution resumes.
But what I can't understand is what happens when await is used for something other than an I/O operation. How does the event loop know that the task is complete? Does it create a socket for that, or use another kind of loop? Does it use epoll? Or is the coroutine not added to the loop at all and simply driven like a generator?
There is an example:
import asyncio
async def test():
    return 10
async def my_coro(delay):
    loop = asyncio.get_running_loop()
    end_time = loop.time() + delay
    while True:
        print("Blocking...")
        await test()
        if loop.time() > end_time:
            print("Done.")
            break
async def main():
    await my_coro(3.0)
asyncio.run(main())
await doesn't automatically yield to the event loop. That happens only when an async function (anywhere in the chain of awaits) requests suspension, typically because I/O or a timeout is not ready yet.
In your example control never returns to the event loop, which you can easily verify by moving the "Blocking..." print before the while loop and changing main to await asyncio.gather(my_coro(3.0), my_coro(3.0)). What you'll observe is that the coroutines are executed in series ("Blocking..." followed by "Done.", repeated twice), not in parallel ("Blocking..." followed by another "Blocking..." and then "Done." twice). The reason is that there was simply no opportunity for a context switch: each my_coro executed in one go, as if it were an ordinary function, because none of its awaits ever chose to suspend.
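The difference can be made visible with a small experiment. In this sketch (worker, noop and run are names chosen for the illustration), two workers either await a coroutine that returns immediately or await asyncio.sleep(0), which does suspend and hands control back to the loop.

```python
import asyncio


async def noop():
    return 10


async def worker(name, log, yield_to_loop):
    for step in range(2):
        log.append(f"{name}{step}")
        if yield_to_loop:
            await asyncio.sleep(0)  # suspends, so the loop can run other tasks
        else:
            await noop()            # finishes immediately, never suspends


async def run(yield_to_loop):
    log = []
    await asyncio.gather(
        worker("a", log, yield_to_loop),
        worker("b", log, yield_to_loop),
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(run(False)))  # ['a0', 'a1', 'b0', 'b1']
    print(asyncio.run(run(True)))   # ['a0', 'b0', 'a1', 'b1']
```

Without a real suspension point the first worker runs to completion before the second one starts; with asyncio.sleep(0) the two interleave.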

When to use and when not to use Python 3.5 `await` ?

I'm getting the flow of using asyncio in Python 3.5, but I haven't seen a description of which things I should be awaiting, which I should not, and where it would be negligible. Do I just have to use my best judgement in terms of "this is an IO operation and thus should be awaited"?
By default all your code is synchronous. You can make it asynchronous by defining functions with async def and "calling" these functions with await. A more correct question would be "When should I write asynchronous code instead of synchronous?". The answer is "When you can benefit from it". When you work with I/O operations, as you noted, you will usually benefit:
# Synchronous way:
download(url1) # takes 5 sec.
download(url2) # takes 5 sec.
# Total time: 10 sec.
# Asynchronous way:
await asyncio.gather(
    async_download(url1),  # takes 5 sec.
    async_download(url2)   # takes 5 sec.
)
# Total time: only 5 sec. (+ little overhead for using asyncio)
Of course, if you created a function that uses asynchronous code, this function should be asynchronous too (should be defined as async def). But any asynchronous function can freely use synchronous code. It makes no sense to cast synchronous code to asynchronous without some reason:
# extract_links(url) should be async because it uses async func async_download() inside
async def extract_links(url):
    # async_download() was created async to get benefit of I/O
    html = await async_download(url)
    # parse() doesn't work with I/O, there's no sense to make it async
    links = parse(html)
    return links
One very important thing is that any long synchronous operation (> 50 ms, for example, it's hard to say exactly) will freeze all your asynchronous operations for that time:
async def extract_links(url):
    data = await download(url)
    links = parse(data)
    # if search_in_very_big_file() takes much time to process,
    # all your running async funcs (somewhere else in code) will be frozen
    # you need to avoid this situation
    links_found = search_in_very_big_file(links)
You can avoid this by calling long-running synchronous functions in a separate process (and awaiting the result):
import asyncio
from concurrent.futures import ProcessPoolExecutor
executor = ProcessPoolExecutor(2)
async def extract_links(url):
    data = await download(url)
    links = parse(data)
    loop = asyncio.get_event_loop()  # the loop this coroutine is running on
    # Now your main process can handle other async functions while a separate process is running
    links_found = await loop.run_in_executor(executor, search_in_very_big_file, links)
One more example: when you need to use requests in asyncio. requests.get is just a synchronous long-running function, which you shouldn't call inside async code (again, to avoid freezing). But it runs long because of I/O, not because of long calculations. In that case, you can use ThreadPoolExecutor instead of ProcessPoolExecutor to avoid some multiprocessing overhead:
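import asyncio
from concurrent.futures import ThreadPoolExecutor
import requests
executor = ThreadPoolExecutor(2)
async def download(url):
    loop = asyncio.get_event_loop()  # the loop this coroutine is running on
    response = await loop.run_in_executor(executor, requests.get, url)
    return response.text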
You do not have much freedom. If you need to call a function you need to find out if this is a usual function or a coroutine. You must use the await keyword if and only if the function you are calling is a coroutine.
If async functions are involved there should be an "event loop" which orchestrates these async functions. Strictly speaking it's not necessary, you can "manually" run the async method sending values to it, but probably you don't want to do it. The event loop keeps track of not-yet-finished coroutines and chooses the next one to continue running. asyncio module provides an implementation of event loop, but this is not the only possible implementation.
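To illustrate the remark about running a coroutine "manually": a coroutine that never suspends can be driven to completion with a single send(None), with no event loop involved. This is only a sketch for illustration (run_without_loop, add_one and compute are made-up names), and it deliberately refuses coroutines that try to suspend, since handling those requests is exactly the event loop's job.

```python
async def add_one(x):
    return x + 1


async def compute():
    # Awaiting another coroutine that never suspends.
    return await add_one(41)


def run_without_loop(coro):
    """Drive a coroutine by hand; only works if it never suspends."""
    try:
        coro.send(None)  # start the coroutine and run until it stops
    except StopIteration as stop:
        return stop.value  # the coroutine's return value
    coro.close()
    raise RuntimeError("coroutine suspended; an event loop is needed")


result = run_without_loop(compute())
print(result)  # 42
```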
Consider these two lines of code:
x = get_x()
do_something_else()
and
x = await aget_x()
do_something_else()
The semantics are exactly the same: call a function that produces some value, and when the value is ready, assign it to the variable x and do something else. In both cases do_something_else will be called only after the previous line of code has finished. The await does not even guarantee that control will be yielded to the event loop before, after, or during the execution of the asynchronous aget_x.
Still there are some differences:
the second snippet can appear only inside another async function
aget_x is not a regular function but a coroutine (that is, either declared with the async keyword or decorated as a coroutine)
aget_x is able to "communicate" with the event loop, that is, yield some objects to it. The event loop should be able to interpret these objects as requests to do some operations (e.g. to send a network request and wait for the response, or just suspend this coroutine for n seconds). A regular get_x function is not able to communicate with the event loop.
