I'd like to use asyncio to do a lot of simultaneous non-blocking IO in Python. However, I want that use of asyncio to be abstracted away from the user: under the hood there are a lot of asynchronous calls going on simultaneously to speed things up, but the user sees a single, synchronous call.
Basically something like this:
async def _slow_async_fn(address):
    data = await async_load_data(address)
    return data

def synchronous_blocking_io():
    addresses = ...
    tasks = []
    for address in addresses:
        tasks.append(_slow_async_fn(address))
    all_results = some_fn(asyncio.gather(*tasks))
    return all_results
The problem is, how can I achieve this in a way that's agnostic to the user's running environment? If I use a pattern like asyncio.get_event_loop().run_until_complete(), I run into issues when the code is called inside an environment like Jupyter, where there's already an event loop running. Is there a way to robustly gather the results of a set of asynchronous tasks that doesn't require pushing async/await statements all the way up the program?
The restriction on running loops is per thread, so running a new event loop is possible, as long as it is in a new thread.
import asyncio
import concurrent.futures

async def gatherer_of(tasks):
    # It's necessary to wrap asyncio.gather() in a coroutine (reasons beyond scope)
    return await asyncio.gather(*tasks)

def synchronous_blocking_io():
    addresses = ...
    tasks = []
    for address in addresses:
        tasks.append(_slow_async_fn(address))
    loop = asyncio.new_event_loop()
    return loop.run_until_complete(gatherer_of(tasks))

def synchronous_blocking_io_wrapper():
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        fut = executor.submit(synchronous_blocking_io)
        return fut.result()
# Testing
async def async_runner():
    # Simulating execution from a running loop
    return synchronous_blocking_io_wrapper()

# Run from synchronous client
# print(synchronous_blocking_io_wrapper())

# Run from async client
# print(asyncio.run(async_runner()))
The same result can be achieved with a ProcessPoolExecutor, by manually running synchronous_blocking_io in a new thread and joining it (see the sketch below), by starting an entirely new process, and so forth. As long as you are not in the same thread, you won't conflict with any running event loop.
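For illustration, here is a minimal sketch of the manual-thread variant (my illustration, not part of the original answer; it reuses synchronous_blocking_io from above and passes the result back through a closure):

import threading

def synchronous_blocking_io_in_thread():
    # Run the loop-owning function in a fresh thread and join it; the new
    # thread has no running event loop, so creating one there is safe.
    result = {}
    def target():
        result["value"] = synchronous_blocking_io()
    t = threading.Thread(target=target)
    t.start()
    t.join()
    return result["value"]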
Related
Following is my code, which runs a long IO operation from an async method using a thread pool from the concurrent.futures package:
# io_bound/threaded.py
import concurrent.futures as futures
import requests
import threading
import time
import asyncio

data = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

def sleepy(n):
    time.sleep(n//2)
    return n*2

async def ExecuteSleep():
    l = len(data)
    results = []
    # submit() needs an explicit mapping of inputs to outputs;
    # output may not be in the same order
    with futures.ThreadPoolExecutor(max_workers=l) as executor:
        result_futures = {d: executor.submit(sleepy, d) for d in data}
        results = {d: result_futures[d].result() for d in data}
    return results

if __name__ == '__main__':
    print("Starting ...")
    t1 = time.time()
    result = asyncio.run(ExecuteSleep())
    print(result)
    print("Finished ...")
    t2 = time.time()
    print(t2-t1)
Following is my question:
What could be the potential issue if I run the thread pool directly, without using the following asyncio APIs:
loop = asyncio.get_event_loop()
loop.run_in_executor(...)
I have reviewed the docs and run simple test cases; to me this looks perfectly fine, and it will run the IO operation in the background using the custom thread pool, as listed here. I surely can't use pure async/await to receive the output and have to manage calls using the map or submit methods, but besides that I don't see a negative here.
Ideone link of my code https://ideone.com/lDVLFh
What could be the potential issue if I run the thread pool directly
There is no issue if you just submit stuff to your thread pool and never interact with it or wait for results. But your code does wait for results¹.
The issue is that ExecuteSleep is blocking. Although it's defined as async def, it is async in name only because it doesn't await anything. While it runs, no other asyncio coroutines can run, so it defeats the main benefit of asyncio, which is running multiple coroutines concurrently.
¹ Even if you remove the call to `result()`, the `with` statement will wait for the workers to finish their jobs in order to be able to terminate them. If you wanted the sync functions to run completely in the background, you could make the pool global and not use `with` to manage it.
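For contrast, here is a minimal sketch of a version that actually awaits (my illustration, reusing data and sleepy from the question): each blocking call is handed to the default thread pool, and the coroutine yields to the event loop while the workers sleep.

async def ExecuteSleepAsync():
    loop = asyncio.get_running_loop()
    # Offload each blocking sleepy() call to the default ThreadPoolExecutor;
    # awaiting the futures lets other coroutines run in the meantime.
    futures = [loop.run_in_executor(None, sleepy, d) for d in data]
    results = await asyncio.gather(*futures)
    # gather() preserves input order, so the results zip back to the inputs.
    return dict(zip(data, results))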
What can occur if one or more workers call a synchronous function simultaneously?
Might one or more workers become blocked for a while?
async def worker(queue):
    while True:
        queue_out = await queue.get()
        file_name = queue_out.file.name
        # Create path + file_name
        destination_path = create_path(file_name)  # <-- SYNC function
        await download_medical(queue_out, destination_path)

async def main():
    queue_in = asyncio.Queue(1)
    workers = [asyncio.create_task(worker(queue_in)) for _ in range(5)]
    async for result in get_result(building):
        await queue_in.put(result)

def create_path(file_name):
    #....#
    # operations related to file and folder on the hdd
    # creates a folder based on file name
Short answer:
If you call a synchronous (blocking) function from within an async coroutine, all the tasks that are concurrently running in the loop will stall until this function returns.
Use loop.run_in_executor(...) to asynchronously run blocking functions in another thread or subprocess.
async def worker(queue):
    loop = asyncio.get_event_loop()  # get a handle to the current run loop
    while True:
        queue_out = await queue.get()
        file_name = queue_out.file.name
        # run blocking function in an executor
        create_path_task = loop.run_in_executor(None, create_path, file_name)
        destination_path = await create_path_task  # wait for this task to finish
        await download_medical(queue_out, destination_path)
Background:
Note that async functions (coroutines) do not run tasks in parallel; they run concurrently, which may appear simultaneous. The easiest way to think about this is to realise that every time await is reached, i.e., while a result is being waited for, the event loop pauses the currently running coroutine and runs another coroutine until that one awaits something, and so on; hence the model is cooperatively concurrent.
Awaits are usually placed on IO operations, since they are time-consuming rather than CPU-intensive. A CPU-intensive operation will block the loop until it completes. Also note that regular IO operations are blocking in nature; if you want to benefit from concurrency, you must use asyncio-compatible libraries like aiofile, aiohttp, etc.
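To make the difference concrete, here is a small self-contained demo (my sketch, not from the original answer): two coroutines that await asyncio.sleep overlap and finish in about one second total, whereas a time.sleep call in either would hold the whole loop for its full duration.

import asyncio
import time

async def io_task(name):
    await asyncio.sleep(1)  # yields to the loop; other coroutines run meanwhile
    return name

async def demo():
    start = time.monotonic()
    # Both sleeps overlap, so this takes ~1s instead of ~2s.
    print(await asyncio.gather(io_task("a"), io_task("b")))
    print(f"elapsed: {time.monotonic() - start:.1f}s")

asyncio.run(demo())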
More about executors:
The easiest way to run regular sync functions without blocking the event loop is to use loop.run_in_executor. The first argument takes an executor like ThreadPoolExecutor or ProcessPoolExecutor from the concurrent.futures module. By passing None, asyncio will automatically run your function in a default ThreadPoolExecutor. If your task is CPU-intensive, use ProcessPoolExecutor so that it can use multiple CPU cores and run truly in parallel.
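A minimal sketch of passing an explicit ProcessPoolExecutor (cpu_heavy is a hypothetical stand-in for your CPU-bound function):

import asyncio
import concurrent.futures

def cpu_heavy(n):
    # Hypothetical CPU-bound work; runs in a separate process, so it
    # does not block the event loop and can use another CPU core.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(pool, cpu_heavy, 10_000_000)
        print(result)

if __name__ == "__main__":
    # The __main__ guard is required for ProcessPoolExecutor on platforms
    # that spawn worker processes (e.g. Windows and macOS).
    asyncio.run(main())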
We have a rather big project that does a lot of networking (API calls, Websocket messages) and also has a lot of internal jobs running at intervals in threads. Our current architecture involves spawning a lot of threads, and the app does not work well when the system is under heavy load, so we've decided to give asyncio a try.
I know that the best way would be to migrate the whole codebase to async code, but that is not realistic in the very near future because of the size of the codebase and the limited development resources. However, we would like to start migrating parts of our codebase to use asyncio event loop and hopefully, we will be able to convert the whole project at some point.
The problem we have encountered so far is that the whole codebase is sync code, and in order to add non-blocking asyncio code, that code needs to run in a different thread, since you can't really run async and sync code in the same thread.
In order to combine async and sync code, I came up with this approach of running the asyncio code in a separate thread that is created on app start. Other parts of the code add jobs to this loop simply by calling add_asyncio_task.
import threading
import asyncio

_tasks = []

def threaded_loop(loop):
    asyncio.set_event_loop(loop)
    global _tasks
    while True:
        if len(_tasks) > 0:
            # create a copy of needed tasks
            needed_tasks = _tasks.copy()
            # flush current tasks so that next tasks can be easily added
            _tasks = []
            # run tasks
            task_group = asyncio.gather(*needed_tasks)
            loop.run_until_complete(task_group)

def add_asyncio_task(task):
    _tasks.append(task)

def start_asyncio_loop():
    loop = asyncio.get_event_loop()
    t = threading.Thread(target=threaded_loop, args=(loop,))
    t.start()
and somewhere in app.py:
start_asyncio_loop()
and anywhere else in the code:
add_asyncio_task(some_coroutine)
Since I am new to asyncio, I am wondering whether this is a good approach in our situation, or whether it is considered an anti-pattern with problems that will hit us later down the road. Or maybe asyncio already has a solution for this and I'm just reinventing the wheel here?
Thanks for your input!
The approach is fine in general. You have some issues though:
(1) Almost all asyncio objects are not thread safe.
(2) Your code is not thread safe on its own. What if a task appears after needed_tasks = _tasks.copy() but before _tasks = []? You need a lock here. By the way, making a copy is pointless; a simple needed_tasks = _tasks will do.
(3) Some asyncio constructs are thread safe. Use them:
import threading
import asyncio

# asyncio.get_event_loop() creates a new loop per thread. Keep
# a single reference to the main loop. You can even try
# _loop = asyncio.new_event_loop()
_loop = asyncio.get_event_loop()

def get_app_loop():
    return _loop

def asyncio_thread():
    loop = get_app_loop()
    asyncio.set_event_loop(loop)
    loop.run_forever()

def add_asyncio_task(task):
    asyncio.run_coroutine_threadsafe(task, get_app_loop())

def start_asyncio_loop():
    t = threading.Thread(target=asyncio_thread)
    t.start()
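One note on this (my addition): run_coroutine_threadsafe returns a concurrent.futures.Future, so a calling thread can also block on a result when it needs one. A sketch building on the code above:

def run_asyncio_task(coro, timeout=None):
    # Like add_asyncio_task, but blocks the calling thread until the
    # coroutine finishes on the loop thread, then returns its result.
    fut = asyncio.run_coroutine_threadsafe(coro, get_app_loop())
    return fut.result(timeout)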
class Class1():
    def func1(self):
        self.conn.send('something')
        data = self.conn.recv()
        return data

class Class2():
    def func2(self):
        [class1.func1() for class1 in self.classes]
How do I make that last line asynchronous in Python? I've been googling but can't understand async/await and don't know which functions I should be putting async in front of. In my case, all the class1.func1 calls need to send before any of them can receive anything. I was also seeing that __aiter__ and __anext__ need to be implemented, but I don't know how those are used in this context. Thanks!
It is indeed possible to fire off multiple requests and asynchronously wait for them. Because Python is traditionally a synchronous language, you have to be very careful about what libraries you use with asynchronous Python. Any library that blocks the main thread (such as requests) will break your entire asynchronicity. aiohttp is a common choice for asynchronously making web API calls in Python. What you want is to create a bunch of future objects inside a Python list and await them. A future is an object that represents a value that will eventually resolve to something.
EDIT: Since the function that actually makes the API call is synchronous and blocking, and you don't have control over it, you will have to run that function in a separate thread.
Async List Comprehensions in Python
import asyncio

async def main():
    loop = asyncio.get_event_loop()
    futures = [asyncio.ensure_future(loop.run_in_executor(None, get_data, data)) for data in data_name_list]
    await asyncio.gather(*futures)  # wait for all the future objects to resolve
    # Do something with futures
    # ...

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
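On Python 3.7+, the same pattern can be written more compactly with asyncio.run, which creates and closes the loop for you (a sketch; get_data and data_name_list are the names from the code above):

import asyncio

async def main():
    loop = asyncio.get_running_loop()
    # run_in_executor already returns awaitable futures, so no ensure_future is needed.
    futures = [loop.run_in_executor(None, get_data, data) for data in data_name_list]
    results = await asyncio.gather(*futures)
    # Do something with results
    return results

asyncio.run(main())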
I am using the Python 3 asyncio module to create a load-balancing application. I have two heavy IO tasks:
A SNMP polling module, which determines the best possible server
A "proxy-like" module, which balances the petitions to the selected server.
Both processes are going to run forever, are independent from each other, and should not be blocked by the other one.
I can't use one event loop because they would block each other. Is there any way to have two event loops, or do I have to use multithreading/multiprocessing?
I tried using asyncio.new_event_loop() but haven't managed to make it work.
The whole point of asyncio is that you can run many thousands of I/O-heavy tasks concurrently, so you don't need threads at all; this is exactly what asyncio is made for. Just run the two coroutines (SNMP and proxy) in the same loop and that's it.
You have to make both of them available to the event loop BEFORE calling loop.run_forever(). Something like this:
import asyncio

async def snmp():
    print("Doing the snmp thing")
    await asyncio.sleep(1)

async def proxy():
    print("Doing the proxy thing")
    await asyncio.sleep(2)

async def main():
    while True:
        await snmp()
        await proxy()

loop = asyncio.get_event_loop()
loop.create_task(main())
loop.run_forever()
I don't know the structure of your code, so the different modules might have their own infinite loop or something, in this case you can run something like this:
import asyncio

async def snmp():
    while True:
        print("Doing the snmp thing")
        await asyncio.sleep(1)

async def proxy():
    while True:
        print("Doing the proxy thing")
        await asyncio.sleep(2)

loop = asyncio.get_event_loop()
loop.create_task(snmp())
loop.create_task(proxy())
loop.run_forever()
Remember, both snmp and proxy need to be coroutines (async def) written in an asyncio-aware manner. asyncio will not make simple blocking Python functions suddenly "async".
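For example, a blocking call has to be handed off explicitly; on Python 3.9+, asyncio.to_thread is the shortest way to do that (a sketch with a hypothetical blocking_poll function):

import asyncio
import time

def blocking_poll():
    # Hypothetical blocking call; it would stall the loop if called
    # directly from a coroutine.
    time.sleep(1)
    return "polled"

async def poll_forever():
    while True:
        result = await asyncio.to_thread(blocking_poll)  # runs in a worker thread
        print(result)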
In your specific case, I suspect that you are confused a little bit (no offense!), because well-written async modules will never block each other in the same loop. If this is the case, you don't need asyncio at all and can simply run one of them in a separate thread without dealing with any asyncio stuff.
Answering my own question to post my solution:
What I ended up doing was creating a thread and a new event loop inside the thread for the polling module, so now every module runs in a different loop. It is not a perfect solution, but it is the only one that made sense to me (I wanted to avoid threads, but since it is only one...). Example:
import asyncio
import threading

def worker():
    second_loop = asyncio.new_event_loop()
    execute_polling_coroutines_forever(second_loop)
    return

threads = []
t = threading.Thread(target=worker)
threads.append(t)
t.start()

loop = asyncio.get_event_loop()
execute_proxy_coroutines_forever(loop)
asyncio requires that every loop runs its coroutines in the same thread. With this method you have one event loop for each thread, and they are totally independent: every loop will execute its coroutines in its own thread, so that is not a problem.
As I said, it's probably not the best solution, but it worked for me.
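For reference, a function like execute_polling_coroutines_forever could look roughly like this (my sketch, under the assumption that the polling module exposes an ordinary coroutine, here the hypothetical poll_snmp):

def execute_polling_coroutines_forever(loop):
    # Bind the new loop to this thread, schedule the polling coroutine,
    # and block this thread running the loop forever.
    asyncio.set_event_loop(loop)
    loop.create_task(poll_snmp())  # hypothetical polling coroutine
    loop.run_forever()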
Though in most cases you don't need multiple event loops running when using asyncio, people shouldn't assume that their assumptions apply to every case, or just give you what they think is better without directly addressing your original question.
Here's a demo of what you can do to create new event loops in threads. Compared to your own answer, set_event_loop saves you from passing the loop object around every time you do an asyncio-based operation.
import asyncio
import threading

async def print_env_info_async():
    # As you can see, each worker thread has its own asyncio event loop.
    print(f"Thread: {threading.get_ident()}, event loop: {id(asyncio.get_running_loop())}")

async def work():
    while True:
        await print_env_info_async()
        await asyncio.sleep(1)

def worker():
    new_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(new_loop)
    new_loop.run_until_complete(work())
    return

number_of_threads = 2
for _ in range(number_of_threads):
    threading.Thread(target=worker).start()
Ideally, you'll want to put heavy work in worker threads and keep the asyncio thread as light as possible. Think of the asyncio thread as the GUI thread of a desktop or mobile app: you don't want to block it. Worker threads are usually very busy, which is one of the reasons you don't want to create separate asyncio event loops in worker threads. Here's an example of how to manage heavy worker threads with a single asyncio event loop, which is the most common practice in this kind of use case:
import asyncio
import concurrent.futures
import threading
import time

def print_env_info(source_thread_id):
    # This will be called in the main thread where the default asyncio event loop lives.
    print(f"Thread: {threading.get_ident()}, event loop: {id(asyncio.get_running_loop())}, source thread: {source_thread_id}")

def work(event_loop):
    while True:
        # The following line will fail because there's no asyncio event loop running in this worker thread.
        # print(f"Thread: {threading.get_ident()}, event loop: {id(asyncio.get_running_loop())}")
        event_loop.call_soon_threadsafe(print_env_info, threading.get_ident())
        time.sleep(1)

async def worker():
    print(f"Thread: {threading.get_ident()}, event loop: {id(asyncio.get_running_loop())}")
    loop = asyncio.get_running_loop()
    number_of_threads = 2
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=number_of_threads)
    for _ in range(number_of_threads):
        asyncio.ensure_future(loop.run_in_executor(executor, work, loop))

loop = asyncio.get_event_loop()
loop.create_task(worker())
loop.run_forever()
I know it's an old thread, but it might still be helpful for someone.
I'm not great at asyncio, but here is a slightly improved version of @kissgyorgy's answer. Instead of awaiting each coroutine separately, we create a list of tasks and await them together (Python 3.9):
import asyncio

async def snmp():
    while True:
        print("Doing the snmp thing")
        await asyncio.sleep(0.4)

async def proxy():
    while True:
        print("Doing the proxy thing")
        await asyncio.sleep(2)

async def main():
    tasks = []
    tasks.append(asyncio.create_task(snmp()))
    tasks.append(asyncio.create_task(proxy()))
    await asyncio.gather(*tasks)

asyncio.run(main())
Result:
Doing the snmp thing
Doing the proxy thing
Doing the snmp thing
Doing the snmp thing
Doing the snmp thing
Doing the snmp thing
Doing the proxy thing
The asyncio event loop runs in a single thread and will not run anything in parallel; that is how it is designed. The closest thing I can think of is using asyncio.wait.
import asyncio

async def some_work(x, y):
    print("Going to do some heavy work")
    await asyncio.sleep(1.0)
    print(x + y)

async def some_other_work(x, y):
    print("Going to do some other heavy work")
    await asyncio.sleep(3.0)
    print(x * y)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait([asyncio.ensure_future(some_work(3, 4)),
                                          asyncio.ensure_future(some_other_work(3, 4))]))
    loop.close()
An alternative way is to use asyncio.gather(), which returns the future results for the given list of futures.
tasks = [asyncio.Task(some_work(3, 4)), asyncio.Task(some_other_work(3, 4))]
loop.run_until_complete(asyncio.gather(*tasks))
If the proxy server is running all the time, it cannot switch back and forth. The proxy listens for client requests and makes them asynchronous, but the other task cannot execute, because this one is serving forever.
If the proxy is a coroutine and is starving the SNMP poller (never awaits), aren't the client requests being starved as well?
every coroutine will run forever, they will not end
This should be fine, as long as they do await/yield from. The echo server will also run forever; it doesn't mean you can't run several servers (on different ports, though) in the same loop.
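For instance, here is a minimal sketch of two independent servers sharing one loop (my illustration; the echo handler is hypothetical):

import asyncio

async def echo(reader, writer):
    # Trivial handler: echo one line back, then close the connection.
    writer.write(await reader.readline())
    await writer.drain()
    writer.close()

async def main():
    server_a = await asyncio.start_server(echo, "127.0.0.1", 8001)
    server_b = await asyncio.start_server(echo, "127.0.0.1", 8002)
    # Both servers are serviced by the same loop; neither starves the other
    # as long as the handlers await their I/O.
    async with server_a, server_b:
        await asyncio.gather(server_a.serve_forever(), server_b.serve_forever())

asyncio.run(main())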