I tried to combine blocking tasks and non-blocking (I/O-bound) tasks using ProcessPoolExecutor and found its behavior pretty unexpected.
class BlockingQueueListener(BaseBlockingListener):
    def run(self):
        # Continuously listen on a queue
        blocking_listen()

class NonBlockingListener(BaseNonBlocking):
    async def non_blocking_listen(self):
        while True:
            await self.get_message()
def run(blocking):
    blocking.run()

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    executor = ProcessPoolExecutor()
    blocking = BlockingQueueListener()
    non_blocking = NonBlockingListener()
    future = loop.run_in_executor(executor, run(blocking))
    loop.run_until_complete(
        asyncio.gather(
            non_blocking.main(),
            future
        )
    )
I was expecting both tasks to run concurrently, but the blocking task started in the ProcessPoolExecutor blocks and never returns control. How can that happen? What is the proper way to combine normal coroutines with futures started in a multiprocessing executor?
This line:
future = loop.run_in_executor(executor, run(blocking))
will actually call run(blocking) right away and pass its result to run_in_executor; in your case the call blocks forever before the executor is ever involved.
According to the documentation, you need to pass the function explicitly followed by its arguments.
future = loop.run_in_executor(executor, run, blocking)
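To illustrate, here is a minimal runnable sketch of the corrected pattern, with placeholder names (blocking_work, ticker) that are not from the original code: the blocking function runs in the process pool while a coroutine keeps ticking on the event loop.

import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def blocking_work():
    # Runs in a worker process; the event loop never sees this sleep.
    time.sleep(2)
    return "blocking done"

async def ticker():
    for _ in range(4):
        print("event loop is still responsive")
        await asyncio.sleep(0.5)

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as executor:
        # Pass the callable and its arguments separately; do not call it here.
        future = loop.run_in_executor(executor, blocking_work)
        results = await asyncio.gather(ticker(), future)
    print(results)

if __name__ == "__main__":
    asyncio.run(main())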
Related
I want to run blocking and non-blocking tasks together asynchronously. Obviously it is necessary to use asyncio's run_in_executor method for the blocking tasks. Here is my sample code:
import asyncio
import concurrent.futures
import datetime
import sys
import time

def blocking():
    print("Enter to blocking()", datetime.datetime.now().time())
    time.sleep(2)
    print("Exited from blocking()", datetime.datetime.now().time())

async def waiter():
    print("Enter to waiter()", datetime.datetime.now().time())
    await asyncio.sleep(3)
    print("Exit from waiter()", datetime.datetime.now().time())

async def asynchronous(loop):
    print("Create tasks", datetime.datetime.now().time())
    task_1 = asyncio.create_task(waiter())
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=3)
    task_2 = loop.run_in_executor(executor, blocking)
    tasks = [task_1, task_2]
    print("Tasks are created", datetime.datetime.now().time())
    await asyncio.wait(tasks)

if __name__ == "__main__":
    try:
        loop = asyncio.get_event_loop()
        loop.run_until_complete(asynchronous(loop))
    except OSError as exc:
        sys.exit('Exception: ' + str(exc))
Should I use the same event loop for the blocking task in run_in_executor, or is it necessary to use another one? What should I change in my code to make it work asynchronously? Thanks
You must use the same loop. The loop delegates to the executor, which runs the tasks in threads separate from the event loop's thread, so you don't have to worry about your blocking tasks blocking the event loop. If you used a separate loop, your async functions from the event loop would not be able to await the results of the blocking functions run in the new loop.

The event loop manages this by creating a future to represent the executor task. It then runs the blocking task in one of the executor's threads, and when the executor task returns, the result of the future is set and control is returned to the awaiting function in the event loop (if any).
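As a minimal sketch of that mechanism (the names blocking and delay are illustrative): the coroutine awaits the future created by run_in_executor and resumes once the executor thread sets its result.

import asyncio
import time

def blocking(delay):
    time.sleep(delay)  # runs in a thread of the default executor
    return "slept {}s".format(delay)

async def main():
    loop = asyncio.get_running_loop()
    # The future lives on this loop, which is why the same loop
    # must be used to await the blocking work.
    future = loop.run_in_executor(None, blocking, 1)
    result = await future  # suspends here until the thread finishes
    print(result)

asyncio.run(main())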
I am trying to read several remote images into Python as numpy arrays. I tried using async to speed up my workflow, but I get an error like this: TypeError: object numpy.ndarray can't be used in 'await' expression. I wonder if it is because the method ReadAsArray is not async, so that if I want it to be async, I would have to rewrite the method myself. Here is some of my code:
async def taskIO_1():
    in_ds = gdal.Open(a[0])
    data1 = await in_ds.GetRasterBand(1).ReadAsArray()

async def taskIO_2():
    in_ds = gdal.Open(a[1])
    data2 = await in_ds.GetRasterBand(1).ReadAsArray()

async def main():
    tasks = [taskIO_1(), taskIO_2()]
    done, pending = await asyncio.wait(tasks)
    for r in done:
        print(r.result())

if __name__ == '__main__':
    start = time.time()
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.close()
    print(float(time.time() - start))
Your notion is correct: in general, library functions execute in a synchronous (blocking) fashion, unless the library is explicitly written to support asynchronous execution (e.g. by using non-blocking I/O), such as aiofiles or aiohttp.

To run synchronous calls asynchronously, you can use loop.run_in_executor. It does nothing more than offload the computation to a separate thread or process and wrap it so that it behaves like a coroutine. An example is shown here:
import asyncio
import concurrent.futures

def blocking_io():
    # File operations (such as logging) can block the
    # event loop: run them in a thread pool.
    with open('/dev/urandom', 'rb') as f:
        return f.read(100)

def cpu_bound():
    # CPU-bound operations will block the event loop:
    # in general it is preferable to run them in a
    # process pool.
    return sum(i * i for i in range(10 ** 7))

async def main():
    loop = asyncio.get_running_loop()

    ## Options:

    # 1. Run in the default loop's executor:
    result = await loop.run_in_executor(
        None, blocking_io)
    print('default thread pool', result)

    # 2. Run in a custom thread pool:
    with concurrent.futures.ThreadPoolExecutor() as pool:
        result = await loop.run_in_executor(
            pool, blocking_io)
        print('custom thread pool', result)

    # 3. Run in a custom process pool:
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(
            pool, cpu_bound)
        print('custom process pool', result)

asyncio.run(main())
However, if your application is not using any truly asynchronous features, you are probably better off just using a concurrent.futures pool directly and achieving concurrency that way.
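For example, a minimal sketch of the pool-only approach (the worker function here is a stand-in, not from the question): concurrent.futures alone gives the same concurrency without an event loop.

import concurrent.futures

def cpu_bound(n):
    # Stand-in for real CPU-heavy work.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as pool:
        futures = [pool.submit(cpu_bound, 10 ** 6) for _ in range(4)]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())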
I am trying to use run_in_executor and have some questions. Here is the code (basically copy-pasted from the docs):
import asyncio
import concurrent.futures

def cpu_bound(val):
    # CPU-bound operations will block the event loop:
    # in general it is preferable to run them in a
    # process pool.
    print(f'Start task: {val}')
    sum(i * i for i in range(10 ** 7))
    print(f'End task: {val}')

async def async_task(val):
    print(f'Start async task: {val}')
    while True:
        print(f'Tick: {val}')
        await asyncio.sleep(1)

async def main():
    loop = asyncio.get_running_loop()

    ## Options:

    for i in range(5):
        loop.create_task(async_task(i))

    # 1. Run in the default loop's executor:
    # for i in range(10):
    #     loop.run_in_executor(
    #         None, cpu_bound, i)
    # print('default thread pool')

    # 2. Run in a custom thread pool:
    # with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    #     for i in range(10):
    #         loop.run_in_executor(
    #             pool, cpu_bound, i)
    #     print('custom thread pool')

    # 3. Run in a custom process pool:
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as pool:
        for i in range(10):
            loop.run_in_executor(
                pool, cpu_bound, i)
        print('custom process pool')

    while True:
        await asyncio.sleep(1)

asyncio.run(main())
Case 1 (run_in_executor where the executor is None): the async_tasks execute at the same time as the cpu_bound calls.

In the other cases, the async_tasks only start executing after the cpu_bound calls are done.

I thought that tasks submitted to a ProcessPoolExecutor shouldn't block the loop. Where am I wrong?
In the other cases the async_tasks will execute after the cpu_bound calls are done. I thought when we use ProcessPoolExecutor, tasks shouldn't block the loop. Where am I wrong?
The problem is that the with XXXPoolExecutor() block shuts down the pool when it exits. Pool shutdown waits for the pending tasks to finish, which blocks the event loop and is incompatible with asyncio. Since your first variant doesn't involve a with statement, it doesn't have this issue.
The solution is simply to remove the with statement and create the pool once (for example at top-level or in main()), and just use it in the function. If you want to, you can explicitly shut down the pool by calling pool.shutdown() after asyncio.run() has completed.
Also note that you are never awaiting the futures returned by loop.run_in_executor. This is an error and asyncio will probably warn you of it; you should probably collect the returned values in a list and await them with something like results = await asyncio.gather(*tasks). This will not only collect the results, but also make sure that the exceptions that occur in the off-thread functions get correctly propagated to your code rather than dropped.
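Putting both fixes together, a minimal sketch (reusing the names from the question; the finite tick count is an assumption so the demo terminates):

import asyncio
import concurrent.futures

def cpu_bound(val):
    print(f'Start task: {val}')
    sum(i * i for i in range(10 ** 7))
    print(f'End task: {val}')

async def async_task(val):
    for _ in range(5):  # finite here so the demo exits
        print(f'Tick: {val}')
        await asyncio.sleep(1)

async def main():
    loop = asyncio.get_running_loop()
    # Create the pool once, without a with statement.
    pool = concurrent.futures.ProcessPoolExecutor(max_workers=10)
    tickers = [asyncio.ensure_future(async_task(i)) for i in range(5)]
    # Keep the executor futures and await them so exceptions propagate.
    futures = [loop.run_in_executor(pool, cpu_bound, i) for i in range(10)]
    await asyncio.gather(*tickers, *futures)
    pool.shutdown()  # all work is done at this point, so this returns quickly

if __name__ == '__main__':
    asyncio.run(main())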
I have the following code in a Django view to create a background task:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_in_executor(None, update_contacts, {
    'email': email,
    'access_token': g.tokens['access_token']
})
Is there anything I need to do at the end to 'kill' the loop? What would be the proper way to close it, etc?
You do not need to start any event loop in the first place. The concurrent.futures package gives direct access to Executors, and threading lets you launch individual Threads:
# raw thread
import threading

background_task = threading.Thread(
    target=update_contacts, kwargs={
        'email': email,
        'access_token': g.tokens['access_token']
    })
background_task.start()

# executor thread pool
from concurrent.futures import ThreadPoolExecutor

my_executor = ThreadPoolExecutor()
my_executor.submit(update_contacts, email=email, access_token=g.tokens['access_token'])
In general, a Thread is simpler if you just want to launch a task and forget about it. A ThreadPoolExecutor is more efficient if you have many small tasks at the same time; it can also be used to automatically wait for completion of several tasks.
import time
from concurrent.futures import ThreadPoolExecutor

print('start at', time.time())
with ThreadPoolExecutor() as executor:
    executor.submit(time.sleep, 1)
    executor.submit(time.sleep, 1)
    executor.submit(time.sleep, 1)
    executor.submit(time.sleep, 1)
print('done at', time.time())  # triggers after all 4 sleeps have finished
The primary purpose of loop.run_in_executor is not to provide a ThreadPoolExecutor. It is meant to bridge the gap between Executors for blocking code and the event loop for non-blocking code. Without the latter, there is no need to use asyncio at all.
import time
import asyncio

def block(delay: float):
    print("Stop! Blocking Time!")
    time.sleep(delay)  # block the current thread
    print("Done! Blocking Time!")

async def nonblock(delay: float):
    print("Erm.. Non-Blocking Time!")
    await asyncio.sleep(delay)
    print("Done! Non-Blocking Time!")

async def multiblock(delay: float):
    loop = asyncio.get_event_loop()
    await asyncio.gather(  # await async natively and sync via executors
        nonblock(delay),
        loop.run_in_executor(None, block, delay),
        nonblock(delay),
        loop.run_in_executor(None, block, delay),
    )

asyncio.run(multiblock(1))
Asyncio tasks can be canceled by calling the cancel method on the Task object. Tasks that run asynchronous code, such as those using the aiohttp library, will be canceled immediately. Tasks that run blocking code using run_in_executor will not be canceled because they are run in an OS thread behind the scenes.
This is part of the reason why run_in_executor is discouraged in asyncio code and is only intended as a stop-gap measure to include legacy blocking code in an asyncio program. (The other part is that the number of tasks is limited by the number of OS threads allowed by the pool, whereas the limit for the number of true asynchronous tasks is much higher.)
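A minimal sketch of that limitation, using the default thread pool (the function name blocking is illustrative): cancelling the wrapping future does not interrupt the thread that is already running.

import asyncio
import time

def blocking():
    time.sleep(2)  # an OS thread cannot be interrupted by cancel()
    print("blocking() finished despite the cancel")

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.run_in_executor(None, blocking)
    await asyncio.sleep(0.1)  # give the worker thread time to start
    fut.cancel()              # marks the future cancelled...
    await asyncio.sleep(3)    # ...but the thread still runs to completion

asyncio.run(main())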
I was wondering how concurrency works in Python 3.6 with asyncio. My understanding is that when the interpreter executes an await statement, it leaves it there until the awaited operation is complete and then moves on to execute another coroutine task. But what I see in the code below is not like that. The program runs synchronously, executing the tasks one by one.

What is wrong with my understanding and with my implementation code?
import asyncio
import time

async def myWorker(lock, i):
    print("Attempting to attain lock {}".format(i))
    # acquire lock
    with await lock:
        # run critical section of code
        print("Currently Locked")
        time.sleep(10)
    # our worker releases lock at this point
    print("Unlocked Critical Section")

async def main():
    # instantiate our lock
    lock = asyncio.Lock()
    # await the execution of 2 myWorker coroutines
    # each with our same lock instance passed in
    # await asyncio.wait([myWorker(lock), myWorker(lock)])
    tasks = []
    for i in range(0, 100):
        tasks.append(asyncio.ensure_future(myWorker(lock, i)))
    await asyncio.wait(tasks)

# Start up a simple loop and run our main function
# until it is complete
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
print("All Tasks Completed")
loop.close()
Invoking a blocking call such as time.sleep in an asyncio coroutine blocks the whole event loop, defeating the purpose of using asyncio.
Change time.sleep(10) to await asyncio.sleep(10), and the code will behave like you expect.
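A minimal corrected sketch of the worker (also switching to the async with form of acquiring the lock, which replaces the older with await syntax):

import asyncio

async def myWorker(lock, i):
    print("Attempting to attain lock {}".format(i))
    async with lock:
        print("Currently Locked")
        await asyncio.sleep(10)  # yields to the event loop while "working"
    print("Unlocked Critical Section")

The lock still serializes the critical section, but awaiting inside it lets the event loop run other coroutines instead of freezing for 10 seconds at a time.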
asyncio uses a single loop to run everything; await yields control back to the loop so it can schedule the next coroutine to run.