How to call synchronous function(s) from async functions in safe manner - python

What can happen if one or more workers call the synchronous function simultaneously?
Might one or more workers become blocked for a while?
async def worker(queue):
    while True:
        queue_out = await queue.get()
        file_name = queue_out.file.name
        # Create path + file_name
        destination_path = create_path(file_name)  # <-- SYNC function
        await download_medical(queue_out, destination_path)

async def main():
    queue_in = asyncio.Queue(1)
    workers = [asyncio.create_task(worker(queue_in)) for _ in range(5)]
    async for result in get_result(building):
        await queue_in.put(result)

def create_path(file_name):
    #....#
    # operations related to file and folder on the hdd
    # creates a folder based on file name

Short answer:
If you call a synchronous (blocking) function from within an async coroutine, all the tasks that are concurrently running in the loop will stall until this function returns.
Use loop.run_in_executor(...) to run blocking functions asynchronously in another thread or subprocess.
async def worker(queue):
    loop = asyncio.get_running_loop()  # get a handle to the running event loop
    while True:
        queue_out = await queue.get()
        file_name = queue_out.file.name
        # run blocking function in an executor
        create_path_task = loop.run_in_executor(None, create_path, file_name)
        destination_path = await create_path_task  # wait for this task to finish
        await download_medical(queue_out, destination_path)
Background:
Note that async functions (coroutines) do not run tasks in parallel; they run concurrently, which may appear simultaneous. The easiest way to think about this is to realise that every time await is reached, i.e. while a result is being waited for, the event loop pauses the currently running coroutine and runs another coroutine until that one awaits something, and so on; this is what makes it cooperatively concurrent.
Awaits are usually made on IO operations, as they are time-consuming but not CPU-intensive. A CPU-intensive operation will block the loop until it completes. Also note that regular IO operations are blocking in nature; if you want to benefit from concurrency, you must use asyncio-compatible libraries such as aiofile, aiohttp, etc.
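To make the difference concrete, here is a minimal sketch (not taken from the original code) that counts how often a background task gets to run while another coroutine pauses for 0.2 seconds. The names ticker and ticks_during_pause are made up for this illustration; time.sleep stands in for any blocking call.

```python
import asyncio
import time

async def ticker(ticks):
    # Background task: records a tick roughly every 10 ms, whenever the loop lets it run
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)

async def ticks_during_pause(blocking):
    ticks = []
    task = asyncio.create_task(ticker(ticks))
    await asyncio.sleep(0)            # give the ticker its first turn
    before = len(ticks)
    if blocking:
        time.sleep(0.2)               # blocks the whole event loop
    else:
        await asyncio.sleep(0.2)      # suspends only this coroutine
    during = len(ticks) - before
    task.cancel()
    return during

if __name__ == "__main__":
    print("ticks during blocking pause:", asyncio.run(ticks_during_pause(True)))
    print("ticks during awaited pause: ", asyncio.run(ticks_during_pause(False)))
```

With the blocking pause the ticker records nothing, because the loop never regains control; with the awaited pause it keeps ticking.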
More about executors:
The easiest way to run regular sync functions without blocking the event loop is to use loop.run_in_executor. The first argument takes an executor like ThreadPoolExecutor or ProcessPoolExecutor from the concurrent.futures module. By passing None, Asyncio will automatically run your function in a default ThreadPoolExecutor. If your task is cpu intensive, use ProcessPoolExecutor so that it can use multiple cpu-cores and run truly in parallel.
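As a rough sketch of both variants (default executor and an explicit pool), assuming a made-up blocking function blocking_square that stands in for your own synchronous code:

```python
import asyncio
import concurrent.futures
import time

def blocking_square(n):
    # Hypothetical stand-in for any blocking call (disk access, legacy library, ...)
    time.sleep(0.1)
    return n * n

async def main():
    loop = asyncio.get_running_loop()
    # Passing None uses the loop's default ThreadPoolExecutor
    first = await loop.run_in_executor(None, blocking_square, 3)
    # An explicit pool; a ProcessPoolExecutor could be swapped in for CPU-bound work
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        rest = await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_square, n) for n in (4, 5)))
    return [first, *rest]

if __name__ == "__main__":
    print(asyncio.run(main()))
```

If you switch to ProcessPoolExecutor, keep in mind that the function and its arguments have to be picklable, so the function should live at module level.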

Related

How to encapsulate asyncio code in Python?

I'd like to use asyncio to do a lot of simultaneous non-blocking IO in Python. However, I want that use of asyncio to be abstracted away from the user: under the hood there are a lot of asynchronous calls going on simultaneously to speed things up, but for the user there's a single, synchronous call.
Basically something like this:
async def _slow_async_fn(address):
    data = await async_load_data(address)
    return data

def synchronous_blocking_io():
    addresses = ...
    tasks = []
    for address in addresses:
        tasks.append(_slow_async_fn(address))
    all_results = some_fn(asyncio.gather(*tasks))
    return all_results
The problem is, how can I achieve this in a way that's agnostic to the user's running environment? If I use a pattern like asyncio.get_event_loop().run_until_complete(), I run into issues when the code is called inside an environment like Jupyter where there's already an event loop running. Is there a way to robustly gather the results of a set of asynchronous tasks that doesn't require pushing async/await statements all the way up the program?
The restriction on running loops is per thread, so running a new event loop is possible, as long as it is in a new thread.
import asyncio
import concurrent.futures

async def gatherer_of(tasks):
    # It's necessary to wrap asyncio.gather() in a coroutine (reasons beyond scope)
    return await asyncio.gather(*tasks)

def synchronous_blocking_io():
    addresses = ...
    tasks = []
    for address in addresses:
        tasks.append(_slow_async_fn(address))
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(gatherer_of(tasks))
    finally:
        loop.close()  # release the loop's resources once the results are in

def synchronous_blocking_io_wrapper():
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        fut = executor.submit(synchronous_blocking_io)
        return fut.result()

# Testing
async def async_runner():
    # Simulating execution from a running loop
    return synchronous_blocking_io_wrapper()

# Run from synchronous client
# print(synchronous_blocking_io_wrapper())
# Run from async client
# print(asyncio.run(async_runner()))
The same result can be achieved with the ProcessPoolExecutor, by manually running synchronous_blocking_io in a new thread and joining it, starting an entirely new process and so forth. As long as you are not in the same thread, you won't conflict with any running event loop.
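For instance, the "new thread and join" variant could look roughly like the sketch below. The helpers _fetch, _fetch_all and fetch_all_sync are hypothetical names, and asyncio.sleep stands in for real non-blocking IO.

```python
import asyncio
import threading

async def _fetch(address):
    # Hypothetical stand-in for real non-blocking IO
    await asyncio.sleep(0.01)
    return f"data from {address}"

async def _fetch_all(addresses):
    return await asyncio.gather(*(_fetch(a) for a in addresses))

def fetch_all_sync(addresses):
    """Synchronous facade: runs the coroutines on a private loop in a helper thread."""
    outcome = {}

    def runner():
        try:
            # asyncio.run() creates and closes a fresh loop owned by this thread
            outcome["value"] = asyncio.run(_fetch_all(addresses))
        except BaseException as exc:
            outcome["error"] = exc

    thread = threading.Thread(target=runner)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]

if __name__ == "__main__":
    print(fetch_all_sync(["a", "b"]))
```

Note that when this is called from code that already runs inside an event loop, the calling loop is blocked until the helper thread finishes; the call is synchronous from the caller's point of view.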

Scheduling periodic function call in Quart/asyncio

I need to schedule a periodic function call in python (ie. called every minute), without blocking the event loop (I'm using Quart framework with asyncio).
Essentially need to submit work onto the event loop, with a timer, so that the webserver keeps serving incoming requests in the meantime and roughly every minute it calls my function.
I tried many ways, for instance:
def do_work():
    print("WORK", flush=True)

async def schedule():
    await asyncio.sleep(0)
    print("scheduling")
    loop = asyncio.get_running_loop()
    t = loop.call_later(2, do_work)
    print("scheduled")

asyncio.run(schedule())
But it either never gets executed (like the code above), or it blocks the webserver main event loop. For instance, with the code above I would expect (since it's done within asyncio.run and schedule awaits timer) that "scheduling" would be printed after (or during) the server setup, but that's not the case, it blocks.
You can use a background task that is started on startup,
async def schedule():
    while True:
        await asyncio.sleep(1)
        await do_work()

@app.before_serving
async def startup():
    app.add_background_task(schedule)
which will run schedule for the lifetime of the app, being cancelled at shutdown.
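The same idea can be sketched in plain asyncio, without Quart. The names periodic and demo are invented for this illustration, and the short asyncio.sleep in demo stands in for the server doing its normal work. It also hints at why the attempt in the question never printed "WORK": asyncio.run() closes its loop as soon as schedule() returns, so a callback registered with call_later for two seconds later never gets a chance to fire.

```python
import asyncio

async def periodic(interval, func):
    # Runs until cancelled; the loop is free to serve other work during each sleep
    while True:
        await asyncio.sleep(interval)
        func()

async def demo():
    calls = []
    task = asyncio.create_task(periodic(0.05, lambda: calls.append("tick")))
    await asyncio.sleep(0.18)   # stand-in for the application doing other things
    task.cancel()               # what a framework would do at shutdown
    try:
        await task
    except asyncio.CancelledError:
        pass
    return calls

if __name__ == "__main__":
    print(asyncio.run(demo()))
```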

multiprocessing.Process and asyncio loop communication

import asyncio
from multiprocessing import Queue, Process
import time

task_queue = Queue()

# This is simulating the task
async def do_task(task_number):
    for progress in range(task_number):
        print(f'{progress}/{task_number} doing')
        await asyncio.sleep(10)

# This is the loop that accepts and runs tasks
async def accept_tasks():
    event_loop = asyncio.get_event_loop()
    while True:
        task_number = task_queue.get()  # <-- this blocks event loop from running do_task()
        event_loop.create_task(do_task(task_number))

# This is the starting point of the process,
# the event loop runs here
def worker():
    event_loop = asyncio.get_event_loop()
    event_loop.run_until_complete(accept_tasks())

# Run a new process
Process(target=worker).start()

# Simulate adding tasks every 1 second
for _ in range(1,50):
    task_queue.put(_)
    print('added to queue', _)
    time.sleep(1)
I'm trying to run a separate process that runs an event loop to do I/O operations. Now, from a parent process, I'm trying to "queue-in" tasks. The problem is that do_task() does not run. The only solution that works is polling (i.e. checking if empty, then sleeping X seconds).
After some researching, the problem seems to be that task_queue.get() isn't doing event-loop-friendly IO.
aiopipe provides a solution, but assumes both processes are running in an event loop.
I tried creating this. But the consumer isn't consuming anything...
read_fd, write_fd = os.pipe()
consumer = AioPipeReader(read_fd)
producer = os.fdopen(write_fd, 'w')
A simple workaround for this situation is to change task_number = task_queue.get() to task_number = await event_loop.run_in_executor(None, task_queue.get). That way the blocking Queue.get() function will be off-loaded to a thread pool and the current coroutine suspended, as a good asyncio citizen. Likewise, once the thread pool finishes with the function, the coroutine will resume execution.
This approach is a workaround because it doesn't scale to a large number of concurrent tasks: each blocking call "turned async" that way will take a slot in the thread pool, and those that exceed the pool's maximum number of workers will not even start executing before a thread frees up. For example, rewriting all of asyncio to call blocking functions through run_in_executor would just result in a badly written threaded system. However, if you know that you have a small number of child processes, using run_in_executor is correct and can solve the problem very effectively.
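A minimal sketch of that workaround follows. To keep it self-contained it uses a thread-based queue.Queue and a producer thread as stand-ins (an assumption of this example); the awaiting side would look the same with the blocking get() of a multiprocessing.Queue.

```python
import asyncio
import queue
import threading

async def consume(task_queue, count):
    loop = asyncio.get_running_loop()
    received = []
    for _ in range(count):
        # The blocking get() runs in the default thread pool;
        # this coroutine is suspended meanwhile, so the loop stays responsive
        item = await loop.run_in_executor(None, task_queue.get)
        received.append(item)
    return received

def demo():
    task_queue = queue.Queue()   # stand-in for the queue shared with the parent

    def produce():
        for n in range(3):
            task_queue.put(n)

    producer = threading.Thread(target=produce)
    producer.start()
    result = asyncio.run(consume(task_queue, 3))
    producer.join()
    return result

if __name__ == "__main__":
    print(demo())
```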
I finally figured it out. There is a known way to do this with the aiopipe library, but it's made to run on two event loops in two different processes. In my case, only the child process is running an event loop. To solve that, I changed the writing part into an unbuffered normal write using open(fd, buffering=0).
Here is the code without any library.
import asyncio
from asyncio import StreamReader, StreamReaderProtocol
from multiprocessing import Process
import time
import os

# This is simulating the task
async def do_task(task_number):
    for progress in range(task_number):
        print(f'{progress}/{task_number} doing')
        await asyncio.sleep(1)

# This is the loop that accepts and runs tasks
async def accept_tasks(read_fd):
    loop = asyncio.get_running_loop()
    # Setup asynchronous reading
    reader = StreamReader()
    protocol = StreamReaderProtocol(reader)
    transport, _ = await loop.connect_read_pipe(
        lambda: protocol, os.fdopen(read_fd, 'rb', 0))
    while True:
        task_number = int(await reader.readline())
        await asyncio.sleep(1)
        loop.create_task(do_task(task_number))
    transport.close()

# This is the starting point of the process,
# the event loop runs here
def worker(read_fd):
    loop = asyncio.get_event_loop()
    loop.run_until_complete(accept_tasks(read_fd))

# Create read and write pipe
read_fd, write_fd = os.pipe()
# allow inheritance to child
os.set_inheritable(read_fd, True)
Process(target=worker, args=(read_fd, )).start()
# detach from parent
os.close(read_fd)
writer = os.fdopen(write_fd, 'wb', 0)

# Simulate adding tasks every 1 second
for _ in range(1,50):
    writer.write((f'{_}\n').encode())
    print('added to queue', _)
    time.sleep(1)
Basically, we use asynchronous reading on the child process' end, and do non-buffered synchronous write on the parent process' end. To do the former, you need to connect the event loop as shown in accept_tasks coroutine.
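To try the reading half in isolation, here is a reduced single-process sketch (Unix only, since connect_read_pipe on an os.pipe() descriptor is not available on Windows event loops). The function names read_numbers and demo are made up, and the numbers are written before the loop starts only to keep the example short.

```python
import asyncio
import os

async def read_numbers(read_fd, count):
    loop = asyncio.get_running_loop()
    # Same wiring as in accept_tasks: a StreamReader fed by the pipe's read end
    reader = asyncio.StreamReader()
    protocol = asyncio.StreamReaderProtocol(reader)
    transport, _ = await loop.connect_read_pipe(
        lambda: protocol, os.fdopen(read_fd, 'rb', 0))
    numbers = []
    try:
        for _ in range(count):
            line = await reader.readline()   # suspends instead of blocking the loop
            numbers.append(int(line))
    finally:
        transport.close()
    return numbers

def demo():
    read_fd, write_fd = os.pipe()
    # Unbuffered synchronous write, as on the parent's side
    with os.fdopen(write_fd, 'wb', 0) as writer:
        for n in (1, 2, 3):
            writer.write(f'{n}\n'.encode())
    return asyncio.run(read_numbers(read_fd, 3))

if __name__ == "__main__":
    print(demo())
```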

What is the correct way to switch freely between asynchronous tasks?

Suppose I have some tasks running asynchronously. They may be totally independent, but I still want to set points where the tasks will pause so they can run concurrently.
What is the correct way to run the tasks concurrently? I am currently using await asyncio.sleep(0), but I feel this is adding a lot of overhead.
import asyncio

async def do(name, amount):
    for i in range(amount):
        # Do some time-expensive work
        print(f'{name}: has done {i}')
        await asyncio.sleep(0)
    return f'{name}: done'

async def main():
    res = await asyncio.gather(do('Task1', 3), do('Task2', 2))
    print(*res, sep='\n')

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Output
Task1: has done 0
Task2: has done 0
Task1: has done 1
Task2: has done 1
Task1: has done 2
Task1: done
Task2: done
If we were using simple generators, an empty yield would pause the flow of a task without any overhead, but empty await are not valid.
What is the correct way to set such breakpoints without overhead?
As mentioned in the comments, asyncio coroutines normally suspend automatically on calls that would block or sleep in equivalent synchronous code. In your case the coroutine is CPU-bound, so awaiting blocking calls is not enough: it needs to occasionally relinquish control to the event loop to allow the rest of the system to run.
Explicit yields are not uncommon in cooperative multitasking, and using await asyncio.sleep(0) for that purpose will work as intended. It does carry a risk, though: sleep too often, and you slow down the computation with unnecessary switches; sleep too seldom, and you hog the event loop by spending too much time in a single coroutine.
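One way to balance the two, sketched here with a made-up function sum_squares and an arbitrary yield_every setting that you would have to tune for your own workload, is to yield only once every N iterations:

```python
import asyncio

async def sum_squares(n, yield_every=1000):
    total = 0
    for i in range(n):
        total += i * i                      # the "time-expensive" work
        if i % yield_every == yield_every - 1:
            await asyncio.sleep(0)          # hand control back to the event loop
    return total

async def main():
    # Both computations make progress in turns, switching every 1000 iterations
    return await asyncio.gather(sum_squares(10), sum_squares(5000))

if __name__ == "__main__":
    print(asyncio.run(main()))
```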
The solution provided by asyncio is to offload CPU-bound code to a thread pool using run_in_executor. Awaiting it will automatically suspend the coroutine until the CPU-intensive task is done, without any intermediate polling. For example:
import asyncio

def do(id, amount):
    for i in range(amount):
        # Do some time-expensive work
        print(f'{id}: has done {i}')
    return f'{id}: done'

async def main():
    loop = asyncio.get_event_loop()
    res = await asyncio.gather(
        loop.run_in_executor(None, do, 'Task1', 5),
        loop.run_in_executor(None, do, 'Task2', 3))
    print(*res, sep='\n')

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Asyncio two loops for different I/O tasks?

I am using the Python 3 asyncio module to create a load balancing application. I have two heavy IO tasks:
An SNMP polling module, which determines the best possible server
A "proxy-like" module, which balances the requests to the selected server.
Both processes are going to run forever, are independent from each other and should not be blocked by the other one.
I can't use one event loop because they would block each other. Is there any way to have two event loops, or do I have to use multithreading/multiprocessing?
I tried using asyncio.new_event_loop() but haven't managed to make it work.
The whole point of asyncio is that you can run multiple thousands of I/O-heavy tasks concurrently, so you don't need Threads at all, this is exactly what asyncio is made for. Just run the two coroutines (SNMP and proxy) in the same loop and that's it.
You have to make both of them available to the event loop BEFORE calling loop.run_forever(). Something like this:
import asyncio

async def snmp():
    print("Doing the snmp thing")
    await asyncio.sleep(1)

async def proxy():
    print("Doing the proxy thing")
    await asyncio.sleep(2)

async def main():
    while True:
        await snmp()
        await proxy()

loop = asyncio.get_event_loop()
loop.create_task(main())
loop.run_forever()
I don't know the structure of your code, so the different modules might have their own infinite loop or something, in this case you can run something like this:
import asyncio

async def snmp():
    while True:
        print("Doing the snmp thing")
        await asyncio.sleep(1)

async def proxy():
    while True:
        print("Doing the proxy thing")
        await asyncio.sleep(2)

loop = asyncio.get_event_loop()
loop.create_task(snmp())
loop.create_task(proxy())
loop.run_forever()
Remember, both snmp and proxy need to be coroutines (async def) written in an asyncio-aware manner. asyncio will not make simple blocking Python functions suddenly "async".
In your specific case, I suspect that you are confused a little bit (no offense!), because well-written async modules will never block each other in the same loop. If this is the case, you don't need asyncio at all and just simply run one of them in a separate Thread without dealing with any asyncio stuff.
Answering my own question to post my solution:
What I ended up doing was creating a thread and a new event loop inside the thread for the polling module, so now every module runs in a different loop. It is not a perfect solution, but it is the only one that made sense to me (I wanted to avoid threads, but since it is only one...). Example:
import asyncio
import threading

def worker():
    second_loop = asyncio.new_event_loop()
    execute_polling_coroutines_forever(second_loop)
    return

threads = []
t = threading.Thread(target=worker)
threads.append(t)
t.start()

loop = asyncio.get_event_loop()
execute_proxy_coroutines_forever(loop)
Asyncio requires that every loop runs its coroutines in the same thread. Using this method you have one event loop for each thread, and they are totally independent: every loop executes its coroutines on its own thread, so that is not a problem.
As I said, it's probably not the best solution, but it worked for me.
Though in most cases, you don't need multiple event loops running when using asyncio, people shouldn't assume their assumptions apply to all the cases or just give you what they think are better without directly targeting your original question.
Here's a demo of what you can do to create new event loops in threads. Compared to your own answer, set_event_loop saves you from passing the loop object around every time you do an asyncio-based operation.
import asyncio
import threading

async def print_env_info_async():
    # As you can see each work thread has its own asyncio event loop.
    print(f"Thread: {threading.get_ident()}, event loop: {id(asyncio.get_running_loop())}")

async def work():
    while True:
        await print_env_info_async()
        await asyncio.sleep(1)

def worker():
    new_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(new_loop)
    new_loop.run_until_complete(work())
    return

number_of_threads = 2
for _ in range(number_of_threads):
    threading.Thread(target=worker).start()
Ideally, you'll want to put heavy work in worker threads and keep the asyncio thread as light as possible. Think of the asyncio thread as the GUI thread of a desktop or mobile app: you don't want to block it. Worker threads are usually very busy, which is one reason you don't want to create separate asyncio event loops in them. Here's an example of how to manage heavy worker threads with a single asyncio event loop, which is the most common practice for this kind of use case:
import asyncio
import concurrent.futures
import threading
import time

def print_env_info(source_thread_id):
    # This will be called in the main thread where the default asyncio event loop lives.
    print(f"Thread: {threading.get_ident()}, event loop: {id(asyncio.get_running_loop())}, source thread: {source_thread_id}")

def work(event_loop):
    while True:
        # The following line will fail because there's no asyncio event loop running in this worker thread.
        # print(f"Thread: {threading.get_ident()}, event loop: {id(asyncio.get_running_loop())}")
        event_loop.call_soon_threadsafe(print_env_info, threading.get_ident())
        time.sleep(1)

async def worker():
    print(f"Thread: {threading.get_ident()}, event loop: {id(asyncio.get_running_loop())}")
    loop = asyncio.get_running_loop()
    number_of_threads = 2
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=number_of_threads)
    for _ in range(number_of_threads):
        asyncio.ensure_future(loop.run_in_executor(executor, work, loop))

loop = asyncio.get_event_loop()
loop.create_task(worker())
loop.run_forever()
I know it's an old thread but it might be still helpful for someone.
I'm no asyncio expert, but here is a slightly improved version of @kissgyorgy's answer. Instead of awaiting each coroutine separately, we create a list of tasks and run them together (Python 3.9):
import asyncio

async def snmp():
    while True:
        print("Doing the snmp thing")
        await asyncio.sleep(0.4)

async def proxy():
    while True:
        print("Doing the proxy thing")
        await asyncio.sleep(2)

async def main():
    tasks = []
    tasks.append(asyncio.create_task(snmp()))
    tasks.append(asyncio.create_task(proxy()))
    await asyncio.gather(*tasks)

asyncio.run(main())
Result:
Doing the snmp thing
Doing the proxy thing
Doing the snmp thing
Doing the snmp thing
Doing the snmp thing
Doing the snmp thing
Doing the proxy thing
The asyncio event loop runs in a single thread and will not run anything in parallel; that is how it is designed. The closest thing I can think of is using asyncio.wait.
import asyncio

async def some_work(x, y):
    print("Going to do some heavy work")
    await asyncio.sleep(1.0)
    print(x + y)

async def some_other_work(x, y):
    print("Going to do some other heavy work")
    await asyncio.sleep(3.0)
    print(x * y)

async def main():
    # asyncio.wait() expects tasks or futures, so wrap each coroutine in a task
    await asyncio.wait([asyncio.create_task(some_work(3, 4)),
                        asyncio.create_task(some_other_work(3, 4))])

if __name__ == '__main__':
    asyncio.run(main())
An alternative is to use asyncio.gather(), which returns a future that aggregates the results of the given futures.
async def main():
    tasks = [asyncio.create_task(some_work(3, 4)), asyncio.create_task(some_other_work(3, 4))]
    await asyncio.gather(*tasks)
If the proxy server is running all the time it cannot switch back and forth. The proxy listens for client requests and makes them asynchronous, but the other task cannot execute, because this one is serving forever.
If the proxy is a coroutine and is starving the SNMP poller (never awaits), aren't the client requests being starved as well?
every coroutine will run forever, they will not end
This should be fine, as long as they do await/yield from. The echo server will also run forever; that doesn't mean you can't run several servers (on different ports, though) in the same loop.
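As a quick illustration of several servers sharing one loop, here is a small sketch with two line-based echo servers. The handler echo and the client helper ask are made up for this example, and port 0 is used so the OS picks free ports.

```python
import asyncio

async def echo(reader, writer):
    # Serve one line per connection, then close
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def ask(port, message):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message)
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply

async def main():
    # Two independent servers, both driven by the same event loop
    first = await asyncio.start_server(echo, "127.0.0.1", 0)
    second = await asyncio.start_server(echo, "127.0.0.1", 0)
    ports = [srv.sockets[0].getsockname()[1] for srv in (first, second)]
    try:
        return await asyncio.gather(ask(ports[0], b"one\n"),
                                    ask(ports[1], b"two\n"))
    finally:
        first.close()
        second.close()

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Neither server blocks the other, because each handler only holds the loop between awaits.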
