Async multiprocessing python

So I've read this nice article about async threads in Python. However, threads have some trouble with the GIL and are not as effective as they may seem.
Luckily, Python includes multiprocessing, which is designed not to be affected by this problem.
I'd like to understand how to implement a multiprocessing queue (with a Pipe open for each process) in an async manner, so it wouldn't hang a running async web server.
I've read this topic; however, I'm not looking for performance, but rather for boxing out a big calculation that hangs my web server. The calculations require pictures, so they might involve a significant I/O exchange, but in my understanding that is something async handles pretty well.
All the calculations are separate from each other, so they are not meant to be mixed.
I'm trying to build this in front of a WebSocket handler.
If you sense heresy in this, please let me know as well :)

This was re-sourced from an article, after someone nice on #python IRC pointed me to async executors, and from another answer on reddit:
(2) Using ProcessPoolExecutor
“The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.”
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(num):
    print('entering cpu_heavy', num)
    time.sleep(10)
    print('leaving cpu_heavy', num)
    return num

async def main(loop):
    print('entering main')
    executor = ProcessPoolExecutor(max_workers=3)
    data = await asyncio.gather(*(loop.run_in_executor(executor, cpu_heavy, num)
                                  for num in range(3)))
    print('got result', data)
    print('leaving main')

# the __main__ guard matters: worker processes re-import this module
if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(loop))
And this is from another nice guy on reddit ;)

Related

Python asyncio: synchronize all access to a shared object

I have a class which processes a bunch of work elements asynchronously (mainly due to overlapping HTTP connection requests) using asyncio. A very simplified example to demonstrate the structure of my code:
import asyncio
from concurrent.futures import ThreadPoolExecutor

class Work:
    ...

    def worker(self, item):
        # do some work on item...
        return

    def queue(self):
        # generate the work items...
        yield from range(100)

    async def run(self):
        with ThreadPoolExecutor(max_workers=10) as executor:
            loop = asyncio.get_running_loop()
            tasks = [
                loop.run_in_executor(executor, self.worker, item)
                for item in self.queue()
            ]
            for result in await asyncio.gather(*tasks):
                pass

work = Work()
asyncio.run(work.run())
In practice, the workers need to access a shared container-like object and call its methods which are not async-safe. For example, let's say the worker method calls a function defined like this:
def func(shared_obj, value):
    for node in shared_obj.filter(value):
        shared_obj.remove(node)
However, calling func from a worker might affect the other asynchronous workers in this or any other function involving the shared object. I know that I need to use some synchronization, such as a global lock, but I don't find its usage easy:
asyncio.Lock can be used only in async functions, so I would have to mark all such function definitions as async
I would also have to await all calls of these functions
await is also usable only in async functions, so eventually all functions between worker and func would be async
if the worker were async, it could not be passed to loop.run_in_executor (the executor just calls it; the resulting coroutine would never be awaited)
Furthermore, some of the functions where I would have to add async may be generic in the sense that they should be callable from asynchronous as well as "normal" context.
I'm probably missing something serious in the whole concept. With the threading module, I would just create a lock and work with it in a couple of places, without having to further annotate the functions. Also, there is a nice solution to wrap the shared object such that all access is transparently guarded by a lock. I'm wondering if something similar is possible with asyncio...
I'm probably missing something serious in the whole concept. With the threading module, I would just create a lock...
What you are missing is that you're not really using asyncio at all. run_in_executor serves to integrate CPU-bound or legacy sync code into an asyncio application. It works by submitting the function to a ThreadPoolExecutor and returning an awaitable handle which gets resolved once the function completes. This is "async" in the sense of running in the background, but not in the sense that is central to asyncio. An asyncio program is composed of non-blocking pieces that use async/await to suspend execution when data is unavailable and rely on the event loop to efficiently wait for multiple events at once and resume the appropriate async functions.
In other words, as long as you rely on run_in_executor, you are just using threading (more precisely concurrent.futures with a threading executor). You can use a threading.Lock to synchronize between functions, and things will work exactly as if you used threading in the first place.
To get the benefits of asyncio such as scaling to a large number of concurrent tasks or reliable cancellation, you should design your program as async (or mostly async) from the ground up. Then you'll be able to modify shared data atomically simply by doing it between two awaits, or use asyncio.Lock for synchronized modification across awaits.
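For example, here is a minimal sketch of that async-first shape, with asyncio.sleep standing in for the overlapping HTTP requests and an invented remove_multiples as the shared-object operation (both are illustrative, not from the question):

import asyncio

class Work:
    def __init__(self):
        self.shared = set(range(100))  # the shared container, owned by the loop's thread
        self.lock = asyncio.Lock()     # only needed when a modification spans an await

    async def worker(self, item):
        await asyncio.sleep(0.1)       # stand-in for the async HTTP request
        # the event loop is single-threaded: code between two awaits runs
        # atomically, so this one-shot mutation needs no lock at all
        self.shared.discard(item)

    async def remove_multiples(self, value):
        # here the modification itself awaits, so guard it with asyncio.Lock
        async with self.lock:
            for node in [n for n in self.shared if n % value == 0]:
                self.shared.discard(node)
                await asyncio.sleep(0)  # e.g. an async notification per removed node

    async def run(self):
        await asyncio.gather(self.remove_multiples(3),
                             *(self.worker(item) for item in range(100)))

async def main():
    work = Work()  # created inside the running loop so the Lock binds to it
    await work.run()

asyncio.run(main())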

Is it bad to use concurrent.futures for the delayed tasks?

(Preamble: I use python-telegram-bot to run a Telegram bot which registers users' messages in Google Sheets. None of this is relevant to this question, but it may provide some context for the source of the trouble. The thing is that the Google Sheets API does not allow too-frequent access to the sheets, so if many users try to write there, I need to process their requests with some delay.)
I know it is considered very bad practice to use the threading module to process tasks, because of the locking imposed by the GIL. But by the nature of my task, I receive a flow of requests from users, and I would like to process them with some delay (from 1 to 10 seconds later than they were actually received). (Right now I use celery+redis to process the delayed tasks, but that looks like overkill for something as trivial as delayed execution; I may be wrong.)
So I wonder if I can use concurrent.futures.ProcessPoolExecutor (as explained, for example, here: https://idolstarastronomer.com/two-futures.html), or whether it will result in the kind of disaster promised by most of the people who warn against using threading in Python.
Here is purely hypothetical code that runs something with a delay using ProcessPoolExecutor. Will it end in disaster under some conditions (too many delayed requests, for instance)?
import concurrent.futures
import time
import random

def register_with_delay():
    time.sleep(random.randint(0, 10))
    print("I'm in the delayed registration")

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(register_with_delay) for _ in range(10)]
        for i in range(10):
            print("I'm in the main loop")
            time.sleep(random.randint(0, 1))

if __name__ == '__main__':
    main()
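For what it's worth, here is a plain-asyncio version I'm also considering, where each delayed task is just a sleeping coroutine and no pool is involved (names are illustrative):

import asyncio
import random

async def register_with_delay(user_id):
    # the delay is just a suspended coroutine; no thread or process sits idle
    await asyncio.sleep(random.randint(1, 10))
    print(f"registering request from user {user_id}")

async def main():
    # schedule ten delayed registrations; they all wait concurrently
    await asyncio.gather(*(register_with_delay(i) for i in range(10)))

asyncio.run(main())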

How to efficiently use asyncio when calling a method on a BaseProxy?

I'm working on an application that uses LevelDB and that uses multiple long-lived processes for different tasks.
Since LevelDB allows only a single process to maintain a database connection, all our database access is funneled through a special database process.
To access the database from another process, we use a BaseProxy. But since we are using asyncio, our proxy shouldn't block on these APIs that call into the db process, which then eventually reads from the db. Therefore we implement the APIs on the proxy using an executor.
loop = asyncio.get_event_loop()
return await loop.run_in_executor(
    thread_pool_executor,
    self._callmethod,
    method_name,
    args,
)
And while that works just fine, I wonder if there's a better alternative to wrapping the _callmethod call of the BaseProxy in a ThreadPoolExecutor.
The way I understand it, the BaseProxy calling into the DB process is the textbook example of waiting on I/O, so using a thread for this seems unnecessarily wasteful.
In a perfect world, I'd assume an async _acallmethod to exist on the BaseProxy but unfortunately that API does not exist.
So, my question basically boils down to: When working with BaseProxy is there a more efficient alternative to running these cross process calls in a ThreadPoolExecutor?
Unfortunately, the multiprocessing library is not suited to conversion to asyncio; what you have is the best you can do if you must use BaseProxy to handle your IPC (inter-process communication).
While it is true that the library uses blocking I/O here, you can't easily reach in and rework the blocking parts to use non-blocking primitives instead. If you were to insist on going this route, you'd have to patch or rewrite the internal implementation details of that library; but being internal implementation details, these can differ from Python point release to point release, making any patching fragile and prone to break with minor Python upgrades. The _callmethod method is part of a deep hierarchy of abstractions involving threads, socket or pipe connections, and serializers; see multiprocessing/connection.py and multiprocessing/managers.py.
So your options here are to stick with your current approach (using a thread pool executor to shove BaseProxy._callmethod() off to another thread) or to implement your own IPC solution using asyncio primitives. Your central database-access process would act as a server for your other processes to connect to as clients, either using sockets or named pipes, with an agreed-upon serialisation scheme for client requests and server responses. This is what multiprocessing implements for you, but you'd implement your own (simpler) version, using asyncio streams and whatever serialisation scheme best suits your application patterns (e.g. pickle, JSON, protobuf, or something else entirely).
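As a rough sketch of the shape such a server could take (assuming a Unix socket and a length-prefixed pickle protocol; do_db_call is a placeholder for your actual, still-synchronous LevelDB access):

import asyncio
import pickle
import struct

SOCKET_PATH = "/tmp/db.sock"  # illustrative path

async def read_msg(reader):
    # each message is a 4-byte big-endian length followed by a pickle
    (size,) = struct.unpack(">I", await reader.readexactly(4))
    return pickle.loads(await reader.readexactly(size))

def write_msg(writer, obj):
    data = pickle.dumps(obj)
    writer.write(struct.pack(">I", len(data)) + data)

async def handle_client(reader, writer):
    try:
        while True:
            method_name, args = await read_msg(reader)
            result = do_db_call(method_name, args)  # placeholder: your LevelDB access
            write_msg(writer, result)
            await writer.drain()
    except asyncio.IncompleteReadError:
        pass  # client disconnected
    finally:
        writer.close()

async def serve():
    server = await asyncio.start_unix_server(handle_client, path=SOCKET_PATH)
    async with server:
        await server.serve_forever()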
A thread pool is what you want. aioprocessing provides some async functionality for multiprocessing, but it does so using threads, as you have proposed. I suggest filing an issue against Python, if there isn't one already, about exposing true async multiprocessing.
https://github.com/dano/aioprocessing
In most cases, this library makes blocking calls to multiprocessing methods asynchronous by executing the call in a ThreadPoolExecutor
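Going by that README, usage looks roughly like this (the AioQueue/AioProcess names and coro_* methods are as documented there; treat this as an untested sketch):

import asyncio
import aioprocessing

def worker(queue):
    # runs in a separate process; uses the plain synchronous put
    queue.put("hello from the db process")

async def main():
    queue = aioprocessing.AioQueue()
    process = aioprocessing.AioProcess(target=worker, args=(queue,))
    process.start()
    result = await queue.coro_get()   # awaits the queue without blocking the loop
    print(result)
    await process.coro_join()

if __name__ == '__main__':
    asyncio.run(main())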
Assuming you have Python and the database running on the same system (i.e. you are not looking to make any network calls async), you have two options.
What you are already doing (run in executor). It blocks the db thread, but the main thread remains free to do other stuff. This is not purely non-blocking, but it is quite an acceptable solution for I/O-blocking cases, with the small overhead of maintaining a thread.
For a truly non-blocking solution (one that can run in a single thread without blocking) you have to have (1) native support for async (callback-based) fetch calls from the DB, and (2) wrap that in your own custom event loop implementation. Here you subclass the base loop and override methods to integrate your db callbacks. For example, you could create a base loop that implements a pipe server: the db writes to the pipe and Python polls the pipe. See the implementation of the Proactor event loop in the asyncio code base. Note: I have never implemented any custom event loop.
I am not familiar with LevelDB, but for a key-value store it is not clear whether there would be any significant benefit from such a callback-based, purely non-blocking implementation. If you are doing multiple fetches inside an iterator and that is your main problem, you can make the loop async (with each fetch still blocking) and improve your performance. Below is dummy code that illustrates this.
import asyncio
import random
import time

async def talk_to_db(d):
    """
    blocking db iteration. sleep is the fetch function.
    """
    for k, v in d.items():
        time.sleep(1)
        yield f"{k}:{v}"

async def talk_to_db_async(d):
    """
    real non-blocking db iteration. fetch (sleep) is native async here
    """
    for k, v in d.items():
        await asyncio.sleep(1)
        yield f"{k}:{v}"

async def talk_to_db_async_loop(d):
    """
    semi-non-blocking db iteration. fetch is blocking, but the
    loop is not.
    """
    for k, v in d.items():
        time.sleep(1)
        yield f"{k}:{v}"
        await asyncio.sleep(0)

async def db_call_wrapper(db):
    async for row in talk_to_db(db):
        print(row)

async def db_call_wrapper_async(db):
    async for row in talk_to_db_async(db):
        print(row)

async def db_call_wrapper_async_loop(db):
    async for row in talk_to_db_async_loop(db):
        print(row)

async def func(i):
    await asyncio.sleep(5)
    print(f"done with {i}")

database = {i: random.randint(1, 20) for i in range(20)}

async def main():
    db_coro = db_call_wrapper(database)
    coros = [func(i) for i in range(20)]
    coros.append(db_coro)
    await asyncio.gather(*coros)

async def main_async():
    db_coro = db_call_wrapper_async(database)
    coros = [func(i) for i in range(20)]
    coros.append(db_coro)
    await asyncio.gather(*coros)

async def main_async_loop():
    db_coro = db_call_wrapper_async_loop(database)
    coros = [func(i) for i in range(20)]
    coros.append(db_coro)
    await asyncio.gather(*coros)

loop = asyncio.get_event_loop()

# run the blocking db iteration
loop.run_until_complete(main())

# run the non-blocking db iteration
loop.run_until_complete(main_async())

# run the non-blocking (loop only) db iteration
loop.run_until_complete(main_async_loop())
This is something you can try. Otherwise, I would say your current method is quite efficient. I do not think BaseProxy can give you an async acall API; it does not know how to handle the callback from your db.

run infinite loop in asyncio event loop running off the main thread

I wrote code that seems to do what I want, but I'm not sure if it's a good idea since it mixes threads and event loops to run an infinite loop off the main thread. This is a minimal code snippet that captures the idea of what I'm doing:
import asyncio
import threading

msg = ""

async def infinite_loop():
    global msg
    while True:
        msg += "x"
        await asyncio.sleep(0.3)

def worker():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(infinite_loop())

t = threading.Thread(target=worker, daemon=True)
t.start()
The main idea is that I have an infinite loop manipulating a global variable every 0.3 s. I want this infinite loop to run off the main thread, so I can still access the shared variable from the main thread. This is especially useful in jupyter, because if I call run_until_complete in the main thread I can't interact with jupyter anymore. I want the main thread available to interactively access and modify msg. Using async might seem unnecessary in my example, but I'm using a library with async code to run a server, so it's necessary. I'm new to async and threading in Python, but I remember reading / hearing somewhere that using threading with asyncio is asking for trouble... Is this a bad idea? Are there any potential concurrency issues with my approach?
I'm new to async and threading in python, but I remember reading / hearing somewhere that using threading with asyncio is asking for trouble...
Mixing asyncio and threading is discouraged for beginners because it leads to unnecessary complications and often stems from a lack of understanding of how to use asyncio correctly. Programmers new to asyncio often reach for threads by habit, using them for tasks for which coroutines would be more suitable.
But if you have a good reason to spawn a thread that runs the asyncio event loop, by all means do so - there is nothing that requires the asyncio event loop to be run in the main thread. Just be careful to interact with the event loop itself (call methods such as call_soon, create_task, stop, etc.) only from the thread that runs the event loop, i.e. from asyncio coroutines and callbacks. To safely interact with the event loop from the other threads, such as in your case the main thread, use loop.call_soon_threadsafe() or asyncio.run_coroutine_threadsafe().
Note that setting global variables and such doesn't count as "interacting" because asyncio doesn't observe those. Of course, it is up to you to take care of inter-thread synchronization issues, such as protecting access to complex mutable structures with locks.
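For example, a sketch of driving the loop from the main thread this way, reusing the question's msg variable:

import asyncio
import threading

msg = ""

loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def set_msg(text):
    global msg
    msg = text

# from the main thread: hand the coroutine to the loop's thread safely
future = asyncio.run_coroutine_threadsafe(set_msg("hello"), loop)
future.result(timeout=5)  # optionally block until it has run
print(msg)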
is this a bad idea?
If unsure whether to mix threads and asyncio, you can ask yourself two questions:
Do I even need threads, given that asyncio provides coroutines that run in parallel and run_in_executor to await blocking code?
If I have threads providing parallelism, do I actually need asyncio?
Your question provides good answers to both - you need threads so that the main thread can interact with jupyter, and you need asyncio because you depend on a library that uses it.
Are there any potential concurrency issues with my approach?
The GIL ensures that setting a global variable in one thread and reading it in another is free of data races, so what you've shown should be fine.
If you add explicit synchronization, such as a multi-threaded queue or condition variable, you should keep in mind that the synchronization code must not block the event loop. In other words, you cannot just wait on, say, a threading.Event in an asyncio coroutine because that would block all coroutines. Instead, you can await an asyncio.Event, and set it using something like loop.call_soon_threadsafe(event.set) from the other thread.
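A minimal sketch of that last pattern, with a worker thread signalling an asyncio.Event through call_soon_threadsafe:

import asyncio
import threading
import time

async def main():
    event = asyncio.Event()
    loop = asyncio.get_running_loop()

    def worker():
        time.sleep(1)  # stand-in for blocking work in the thread
        # threads must not call event.set() directly; hand it to the loop
        loop.call_soon_threadsafe(event.set)

    threading.Thread(target=worker).start()
    await event.wait()  # only this coroutine suspends; the loop keeps running
    print("signalled from the worker thread")

asyncio.run(main())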

Design of asynchronous request and blocking processing using Tornado

I'm trying to implement a Python app that uses async functions to receive and emit messages with NATS, using a client based on Tornado. Once a message is received, a blocking function must be called, which I'm trying to run on a separate thread, so that the reception and publication of messages can keep putting messages in a Tornado queue for later processing by the blocking function.
I'm very new to Tornado (and to Python multithreading), but after reading the Tornado documentation and other sources several times, I've been able to put together a working version of the code, which looks like this:
import tornado.gen
import tornado.ioloop
from tornado.queues import Queue
from concurrent.futures import ThreadPoolExecutor
from nats.io.client import Client as NATS

messageQueue = Queue()
nc = NATS()
# EXECUTOR was referenced but not defined in the original snippet;
# a ThreadPoolExecutor for the blocking processing is assumed here
EXECUTOR = ThreadPoolExecutor(max_workers=4)

@tornado.gen.coroutine
def consumer():
    def processMessage(currentMessage):
        # process the message ...
        pass

    while True:
        currentMessage = yield messageQueue.get()
        try:
            # execute the call in a separate thread to prevent blocking the queue
            EXECUTOR.submit(processMessage, currentMessage)
        finally:
            messageQueue.task_done()

@tornado.gen.coroutine
def producer():
    @tornado.gen.coroutine
    def enqueueMessage(currentMessage):
        yield messageQueue.put(currentMessage)

    yield nc.subscribe("new_event", "", enqueueMessage)

@tornado.gen.coroutine
def main():
    tornado.ioloop.IOLoop.current().spawn_callback(consumer)
    yield producer()

if __name__ == '__main__':
    main()
    tornado.ioloop.IOLoop.current().start()
My questions are:
1) Is this the correct way of using Tornado to call a blocking function?
2) What's the best practice for implementing a consumer/producer scheme that is always listening? I'm afraid my while True: statement is actually blocking the processor...
3) How can I inspect the Queue to make sure a burst of calls is being enqueued? I've tried using Queue().qSize(), but it always returns zero, which makes me wonder if the enqueuing is done correctly or not.
The general rule (credit to NYKevin) is:
multiprocessing for CPU- and GPU-bound computations.
Event-driven stuff for non-blocking I/O (which should be preferred over blocking I/O where possible, since it scales much more effectively).
Threads for blocking I/O (you can also use multiprocessing, but the per-process overhead probably isn't worth it).
ThreadPoolExecutor for I/O, ProcessPoolExecutor for CPU. Both have an internal queue, and both scale to at most the specified max_workers. More info about concurrent executors is in the docs.
So the answers are:
Reimplementing a pool is overhead. Whether to use a thread or a process depends on what you plan to do.
while True is not blocking as long as it contains some yielded async call (even yield gen.sleep(0.01)); that gives control back to the IOLoop, as in the sketch below.
qsize() is the right call, but since I have not run/debugged this, and since I would take a different approach (an existing pool), it is hard to pinpoint the problem here.
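To illustrate point 2, a minimal Tornado-style sketch of a while True consumer that stays cooperative by yielding to the IOLoop (the poller body is illustrative):

import tornado.gen
import tornado.ioloop

@tornado.gen.coroutine
def poller():
    while True:
        # ... handle one unit of work here ...
        # yielding, even briefly, hands control back to the IOLoop,
        # so other callbacks and coroutines keep running
        yield tornado.gen.sleep(0.01)

if __name__ == '__main__':
    tornado.ioloop.IOLoop.current().spawn_callback(poller)
    tornado.ioloop.IOLoop.current().start()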
