I'm trying to use asyncio to handle concurrent network I/O. A very large number of functions are scheduled at a single point, and they vary greatly in how long each takes to complete. Received data is then processed in a separate process for each output.
The order in which the data is processed is not relevant, so given the potentially very long waiting period for output, I'd like to await whatever future finishes first instead of awaiting in a predefined order.
import asyncio
from time import sleep

def fetch(x):
    sleep(x % 5)  # stand-in for a blocking network call of varying duration

async def main():
    futures = [loop.run_in_executor(None, fetch, x) for x in range(50)]
    for f in futures:
        await f

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Normally, awaiting the futures in the order in which they were queued is fine. In the timing chart I made, blue represents the time each task spends in the executor's queue (run_in_executor has been called, but the function has not yet started, since the executor runs only 5 tasks simultaneously); green is the time spent executing the function itself; and red is the time spent waiting for all previous futures to be awaited.
In my case, where the functions vary greatly in duration, a lot of time is lost waiting for earlier futures in the queue to be awaited, time I could spend locally processing GET output. This leaves my system idle for a while, only to be overwhelmed when several outputs complete simultaneously, then it jumps back to idling while waiting for more requests to finish.
Is there a way to await whatever future is first completed in the executor?
Looks like you are looking for asyncio.wait with return_when=asyncio.FIRST_COMPLETED.
import asyncio
from time import sleep

def fetch(x):
    sleep(x % 5)  # stand-in for a blocking network call of varying duration

async def main():
    futures = [loop.run_in_executor(None, fetch, x) for x in range(50)]
    while futures:
        # note: the loop= argument that older examples pass here was
        # deprecated in Python 3.8 and removed in 3.10
        done, futures = await asyncio.wait(futures,
                                           return_when=asyncio.FIRST_COMPLETED)
        for f in done:
            await f  # or process f.result() here

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
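If you just want to consume results as they arrive, asyncio.as_completed expresses the same idea without the explicit while loop. A minimal sketch, assuming the same fetch and loop as above:

async def main():
    futures = [loop.run_in_executor(None, fetch, x) for x in range(50)]
    for next_done in asyncio.as_completed(futures):
        result = await next_done  # yields results in completion order
        # process result here while the remaining fetches keep running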
I need to send HTTP requests and do some CPU intensive task while waiting for the response. I tried to mock the situation with an asyncio.sleep and a CPU task below:
import asyncio

async def main():
    loop = asyncio.get_event_loop()
    start = loop.time()
    task = asyncio.create_task(asyncio.sleep(1))
    # ------Useless CPU-Bound Task------ #
    for n in range(10 ** 7):
        n **= 7
    # ---------------------------------- #
    print(f"CPU-bound process finished in {loop.time()-start:.2f} seconds.")
    await task
    print(f"Finished in {loop.time()-start:.2f} seconds.")

asyncio.run(main())
Output:
CPU-bound process finished in 2.12 seconds.
Finished in 3.12 seconds.
I expected the sleeping task to proceed during the CPU-bound work, but apparently they ran sequentially. This also makes me worry about the actual requests I need to send: the CPU-bound work might start first and block them entirely, so they don't reach the server until it finishes.
So the question is: why does this happen, and how can I prevent it?
I've also read somewhere that asyncio only switches context upon await calls. Does this have disadvantages in a situation like this, if so, how?
Addendum: will using threading have any advantages over asyncio in this scenario? I know it's many questions, but I'm really confused.
Asyncio tasks give you co-operative concurrency rather than true parallelism.
Your sleeper task won't actually start running until you "yield" control to it, which is usually done with an await call. Since that happens after your main (CPU-intensive) code is finished, there will be an extra second after that before everything is actually done.
An await asyncio.sleep(0) between creating the sleeper task and starting the CPU-intensive work will allow the sleeper task to commence. It will then immediately yield back to the main task and they'll run "concurrently".
Of course, a CPU-bound async task sort of defeats the purpose of asyncio since it won't yield to allow other tasks to run in a timely manner. That doesn't really matter for this sleeper but, if it was a task that had to do thirty things, one per second, that would be a problem.
If you need to do anything like that, it's a good idea to either choose one of the other forty-eight ways of doing concurrency in Python :-), or yield enough in the main task so that other tasks can run. In other words, something like:
# inside your CPU-bound coroutine:
yield_cycle = 0.1                                  # Cycle time.
then = time.monotonic()                            # Base time.
for n in range(10 ** 7):
    n **= 7
    if time.monotonic() - then > yield_cycle:      # Check cycle time.
        await asyncio.sleep(0)                     # Yield if exceeded.
        then = time.monotonic()                    # Prep next cycle.
In fact, we have a helper function in our own code base which does exactly this. I can't give you the actual source code but I think it's (hopefully) simple enough to recite from memory:
async def play_nice(secs: float, base: float) -> float:
    """Yield periodically in intensive task.

    Initial call can use negative base to yield immediately.

    Args:
        secs: Minimum run time before yield will happen.
        base: Base monotonic time to use for calculations.

    Returns:
        New base time to use.
    """
    if base < 0:
        base = time.monotonic() - secs
    if time.monotonic() - base >= secs:
        await asyncio.sleep(0)
        return time.monotonic()
    return base
# Your code is then:
then = await play_nice(secs=0.1, base=-1)        # Initial yield.
for n in range(10 ** 7):
    n **= 7
    then = await play_nice(secs=0.1, base=then)  # Subsequent ones.
The reason is that your CPU-intensive task holds control until it yields it. You can force it to yield using sleep:
sleep() always suspends the current task, allowing other tasks to run.
Setting the delay to 0 provides an optimized path to allow other tasks to run. This can be used by long-running functions to avoid blocking the event loop for the full duration of the function call.
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    start = loop.time()
    task = asyncio.create_task(asyncio.sleep(1))
    await asyncio.sleep(0)  # yield once so the sleeper task can start
    # ------Useless CPU-Bound Task------ #
    for n in range(10 ** 7):
        n **= 7
    # ---------------------------------- #
    print(f"CPU-bound process finished in {loop.time()-start:.2f} seconds.")
    await task
    print(f"Finished in {loop.time()-start:.2f} seconds.")

asyncio.run(main())
Will output
CPU-bound process finished in 4.21 seconds.
Finished in 4.21 seconds.
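If you'd rather not sprinkle yields through the CPU-bound code at all, another option on Python 3.9+ is asyncio.to_thread, which moves it to a worker thread so the event loop stays free. A minimal sketch of that idea (for heavy pure-Python loops the GIL still limits throughput, so a process pool may serve better):

import asyncio

def cpu_bound():
    for n in range(10 ** 7):
        n **= 7

async def main():
    # the sleep and the CPU-bound work now overlap: total time is
    # roughly max(cpu_time, 1 second) instead of their sum
    await asyncio.gather(asyncio.sleep(1), asyncio.to_thread(cpu_bound))

asyncio.run(main())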
If I have a coroutine currently sleeping to allow other coroutines to run, is it possible to change the sleep time while it is sleeping? Or would I have to cancel and restart the coroutine? I think I may have just answered my own question there. Looking for help from the more experienced.
The "sleep" coroutine is obviously designed to be simple: it pauses for that amount of time, and it is it.
What you seem to need is a way to synchronize your co-routines, and if no signal gets back in an specified amount of time (the time you are passing to sleep), to move on.
Take a look at the synchronization primitives https://docs.python.org/3.6/library/asyncio-sync.html and asyncio.wait_for
So, you can instead of asyncio.sleep, call a co-routine, with wait_for, where it expects an Event, or a Lock release. The Event or lock-release then is used by whatever part of your code would "cancel sleep" anyway.
I created an example to show both sleeping running to the end, and being canceled.
import asyncio

async def interruptable_sleep(secs, event):
    try:
        await asyncio.wait_for(event.wait(), timeout=secs)
    except asyncio.TimeoutError:
        print("'sleeping' proceeded normally")
    else:
        print("'sleeping' canceled")

async def sleeper(m, n, event):
    await asyncio.sleep(n)
    if n == 3:
        event.set()
    print(f"cycle {m}, step {n}")

async def main():
    event = asyncio.Event()
    tasks = []
    for cycle in range(3):
        event.clear()
        # create a batch of async tasks to run in parallel
        for step in range(6):
            tasks.append(asyncio.create_task(
                sleeper(cycle, step, event), name=f"{cycle}_{step}"))
        await interruptable_sleep(2, event)
    # 'join' remaining tasks
    event.set()
    await asyncio.gather(*tasks)

asyncio.run(main())
This pattern sort of "reverses" the idea of a timeout: if a task finishes early, the waiting is canceled (whereas a timeout means "if a task is too late, cancel it").
But maybe you just need the other pattern there: create a list of all your tasks and call asyncio.gather, rather than calling "sleep" to give the other tasks time to run.
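If all you want is to cut a plain sleep short, you can also wrap it in a task and cancel that task; a minimal sketch (not from the original answer):

import asyncio

async def main():
    sleep_task = asyncio.create_task(asyncio.sleep(10))
    # ... later, something decides 10 seconds is too long:
    sleep_task.cancel()
    try:
        await sleep_task
    except asyncio.CancelledError:
        pass  # the sleep was cut short
    await asyncio.sleep(1)  # restart with the new, shorter delay

asyncio.run(main())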
What can happen if one or more workers call the synchronous function simultaneously?
Could one or more workers become blocked for a while?
async def worker(queue):
    while True:
        queue_out = await queue.get()
        file_name = queue_out.file.name
        # Create path + file_name
        destination_path = create_path(file_name)  # <-- SYNC function
        await download_medical(queue_out, destination_path)

async def main():
    queue_in = asyncio.Queue(1)
    workers = [asyncio.create_task(worker(queue_in)) for _ in range(5)]
    async for result in get_result(building):
        await queue_in.put(result)

def create_path(file_name):
    #....#
    # operations related to file and folder on the hdd
    # creates a folder based on file name
Short answer:
If you call a synchronous (blocking) function from within an async coroutine, all the tasks that are concurrently running in the loop will stall until this function returns.
Use loop.run_in_executor(...) to asynchronously run blocking functions in another thread or subprocess.
async def worker(queue):
    loop = asyncio.get_event_loop()  # get a handle to the current run loop
    while True:
        queue_out = await queue.get()
        file_name = queue_out.file.name
        # run the blocking function in an executor
        create_path_task = loop.run_in_executor(None, create_path, file_name)
        destination_path = await create_path_task  # wait for this task to finish
        await download_medical(queue_out, destination_path)
Background:
Note that async functions (coroutines) do not run tasks in parallel; they run concurrently, which may appear simultaneous. The easiest way to think about this is to realise that every time await is called, i.e. while a result is being waited for, the event loop pauses the currently running coroutine and runs another coroutine until that one awaits something, and so on; hence it is cooperatively concurrent.
Awaits are usually made on IO operations, as they are time-consuming and not cpu-intensive. A cpu-intensive operation will block the loop until it completes. Also note that regular IO operations are blocking in nature; if you want to benefit from concurrency you must use asyncio-compatible libraries like aiofile, aiohttp etc.
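A tiny self-contained illustration of that difference (not from the original answer): a blocking time.sleep stalls every task on the loop, while await asyncio.sleep lets them interleave:

import asyncio
import time

async def ticker():
    for i in range(3):
        print("tick", i)
        await asyncio.sleep(0.5)  # yields, so other tasks can run

async def blocker():
    time.sleep(2)  # blocks the whole loop: no ticks appear for 2 seconds

async def main():
    await asyncio.gather(ticker(), blocker())

asyncio.run(main())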
More about executors:
The easiest way to run regular sync functions without blocking the event loop is to use loop.run_in_executor. The first argument takes an executor like ThreadPoolExecutor or ProcessPoolExecutor from the concurrent.futures module. By passing None, asyncio will automatically run your function in a default ThreadPoolExecutor. If your task is cpu-intensive, use ProcessPoolExecutor so that it can use multiple cpu cores and run truly in parallel.
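For instance, a minimal sketch of passing an explicit executor (heavy_work here is an illustrative stand-in, not from the question):

import asyncio
from concurrent.futures import ProcessPoolExecutor

def heavy_work(n: int) -> int:
    return sum(i * i for i in range(n))  # stand-in for real cpu-bound work

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, heavy_work, 10 ** 6) for _ in range(4)))
    print(results)

if __name__ == "__main__":  # guard required where subprocesses are spawned
    asyncio.run(main())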
Is it possible to share an asyncio.Queue over different tasks in one event loop?
The use case:
Two tasks publish data on the queue, and one task grabs the new items from the queue. All tasks operate asynchronously.
main.py
import asyncio
import creator as cr

async def pull_message(queue):
    while True:
        # Here I don't get messages; maybe the queue is always
        # occupied by another task?
        msg = await queue.get()
        print(msg)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    queue = asyncio.Queue()  # the loop= argument was removed in Python 3.10
    future = asyncio.ensure_future(pull_message(queue))
    creators = list()
    for i in range(2):
        creators.append(loop.create_task(cr.populate_msg(queue)))
    # add future to creators for easy handling
    creators.append(future)
    loop.run_until_complete(asyncio.gather(*creators))
creator.py
import asyncio

async def populate_msg(queue):
    while True:
        msg = "Foo"
        await queue.put(msg)
The problem in your code is that populate_msg doesn't yield to the event loop because the queue is unbounded. This is somewhat counter-intuitive because the coroutine clearly contains an await, but that await only suspends the execution of the coroutine if the coroutine would otherwise block. Since put() on an unbounded queue never blocks, populate_msg is the only thing executed by the event loop.
The problem will go away once you change populate_msg to actually do something else (like await a network event). For testing purposes you can add await asyncio.sleep(0) inside the loop, which will force the coroutine to yield control to the event loop at every iteration of the while loop. Note that this will cause the event loop to spend an entire core by continuously spinning the loop.
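Another way to see this (a sketch under the same setup, not from the original post): give the queue a maxsize, so that put() genuinely suspends once the queue is full and the producers yield to the consumer on their own:

import asyncio

async def populate_msg(queue, n):
    for _ in range(n):
        await queue.put("Foo")  # suspends (and yields) whenever the queue is full

async def pull_message(queue, total):
    for _ in range(total):
        print(await queue.get())

async def main():
    queue = asyncio.Queue(maxsize=10)  # bounded: put() blocks when full
    await asyncio.gather(populate_msg(queue, 50), populate_msg(queue, 50),
                         pull_message(queue, 100))

asyncio.run(main())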
Suppose I have some tasks running asynchronously. They may be totally independent, but I still want to set points where the tasks will pause so they can run concurrently.
What is the correct way to run the tasks concurrently? I am currently using await asyncio.sleep(0), but I feel this is adding a lot of overhead.
import asyncio

async def do(name, amount):
    for i in range(amount):
        # Do some time-expensive work
        print(f'{name}: has done {i}')
        await asyncio.sleep(0)
    return f'{name}: done'

async def main():
    res = await asyncio.gather(do('Task1', 3), do('Task2', 2))
    print(*res, sep='\n')

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Output
Task1: has done 0
Task2: has done 0
Task1: has done 1
Task2: has done 1
Task1: has done 2
Task1: done
Task2: done
If we were using plain generators, an empty yield would pause the flow of a task without any overhead, but an empty await is not valid.
What is the correct way to set such breakpoints without overhead?
As mentioned in the comments, normally asyncio coroutines suspend automatically on calls that would block or sleep in equivalent synchronous code. In your case the coroutine is CPU-bound, so awaiting blocking calls is not enough, it needs to occasionally relinquish control to the event loop to allow the rest of the system to run.
Explicit yields are not uncommon in cooperative multitasking, and while using await asyncio.sleep(0) for that purpose will work as intended, it does carry a risk: sleep too often, and you slow down the computation with unnecessary switches; sleep too seldom, and you hog the event loop by spending too much time in a single coroutine.
The solution provided by asyncio is to offload CPU-bound code to a thread pool using run_in_executor. Awaiting it will automatically suspend the coroutine until the CPU-intensive task is done, without any intermediate polling. For example:
import asyncio

def do(id, amount):
    for i in range(amount):
        # Do some time-expensive work
        print(f'{id}: has done {i}')
    return f'{id}: done'

async def main():
    loop = asyncio.get_event_loop()
    res = await asyncio.gather(
        loop.run_in_executor(None, do, 'Task1', 5),
        loop.run_in_executor(None, do, 'Task2', 3))
    print(*res, sep='\n')

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
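One caveat on the design choice: with the default ThreadPoolExecutor, pure-Python cpu-bound code still contends for the GIL, so the event loop stays responsive but the two tasks gain little true parallelism; passing a concurrent.futures.ProcessPoolExecutor instead gives real multi-core execution, at the cost of pickling arguments and results.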