How to use multiple threadpools in twisted with coroutines

How to use multiple threadpools in twisted with coroutines - python

I want to use a separate threadPool in Python Twisted for some special kind of work that is hard on CPU usage.
The functions I want to execute are all defined as inlineCallback and I can not modify that.
from twisted.python.threadpool import ThreadPool
#inflineCallbacks
def myFunc(i):
tmp_result = yield intenseCPUwork(i)
# more work ...
return result
pool = ThreadPool(0, 2)
for i in range(100):
pool.callInThread(myFunc, i)
...
But since my functions return a deferred they return immediately when called like that in the threadpool resulting in all 100 calls to be executed at once even though the threadPool has just size 2.
How can I ensure only two async functions being called at the same time in twisted?

The functions I want to execute are all defined as inlineCallback and I can not modify that.
This means you can't run them in a non-reactor thread. Except for a couple exceptions (mostly having to do with logging or thread management) Twisted APIs are not thread-safe. You can only use them in the same thread as the reactor is running.
If you cannot change the target functions to stop using Twisted APIs, you cannot run them in another thread.

i believe what you need is async/await support of functions for intense works, it would be something like below
import asyncio
async def myFunc(i):
tmp_result = yield intenseCPUwork(i)
# more work ...
return result
loop = asyncio.get_event_loop()
tasks = [
loop.create_task(myFunc(1)),
loop.create_task(myFunc(2))
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

Related

Implementing a coroutine in python

Lets say I have a C++ function result_type compute(input_type input), which I have made available to python using cython. My python code executes multiple computations like this:
def compute_total_result()
inputs = ...
total_result = ...
for input in inputs:
result = compute_python_wrapper(input)
update_total_result(total_result)
return total_result
Since the computation takes a long time, I have implemented a C++ thread pool (like this) and written a function std::future<result_type> compute_threaded(input_type input), which returns a future that becomes ready as soon as the thread pool is done executing.
What I would like to do is to use this C++ function in python as well. A simple way to do this would be to wrap the std::future<result_type> including its get() function, wait for all results like this:
def compute_total_results_parallel()
inputs = ...
total_result = ...
futures = []
for input in inputs:
futures.append(compute_threaded_python_wrapper(input))
for future in futures:
update_total_result(future.get())
return total_result
I suppose this works well enough in this case, but it becomes very complicated very fast, because I have to pass futures around.
However, I think that conceptually, waiting for these C++ results is no different from waiting for file or network I/O.
To facilitate I/O operations, the python devs introduced the async / await keywords. If my compute_threaded_python_wrapper would be part of asyncio, I could simply rewrite it as
async def compute_total_results_async()
inputs = ...
total_result = ...
for input in inputs:
result = await compute_threaded_python_wrapper(input)
update_total_result(total_result)
return total_result
And I could execute the whole code via result = asyncio.run(compute_total_results_async()).
There are a lot of tutorials regarding async programming in python, but most of them deal with using coroutines where the bedrock seem to be some call into the asyncio package, mostly calling asyncio.sleep(delay) as a proxy for I/O.
My question is: (How) Can I implement coroutines in python, enabling python to await the wrapped future object (There is some mention of a __await__ method returning an iterator)?

First, an inaccuracy in the question needs to be corrected:
If my compute_threaded_python_wrapper would be part of asyncio, I could simply rewrite it as [...]
The rewrite is incorrect: await means "wait until the computation finishes", so the loop as written would execute the code sequentially. A rewrite that actually runs the tasks in parallel would be something like:
# a direct translation of the "parallel" version
def compute_total_results_async()
inputs = ...
total_result = ...
tasks = []
# first spawn all the tasks
for input in inputs:
tasks.append(
asyncio.create_task(compute_threaded_python_wrapper(input))
)
# and then await them
for task in tasks:
update_total_result(await task)
return total_result
This spawn-all-await-all pattern is so uniquitous that asyncio provides a helper function, asyncio.gather(), which makes it much shorter, especially when combined with a list comprehension:
# a more idiomatic version
def compute_total_results_async()
inputs = ...
total_result = ...
results = await asyncio.gather(
*[compute_threaded_python_wrapper(input) for input in inputs]
)
for result in results:
update_total_result(result)
return total_result
With that out of the way, we can proceed to the main question:
My question is: (How) Can I implement coroutines in python, enabling python to await the wrapped future object (There is some mention of a __await__ method returning an iterator)?
Yes, awaitable objects are implemented using iterators that yield to indicate suspension. But that is way too low-level a tool for what you actually need. You don't need just any awaitable, but one that works with the asyncio event loop, which has specific expectations of the underlying iterator. You need a mechanism to resume the awaitable when the result is ready, where you again depend on asyncio.
Asyncio already provides awaitable objects that can be externally assigned a value: futures. An asyncio future represents an async value that will become available at some point in the future. They are related to, but not semantically equivalent to C++ futures, and should not to be confused with multi-threaded futures from the concurrent.futures stdlib module.
To create an awaitable object that is activated by something that happens in another thread, you need to create a future, and then start your off-thread task, instructing it to mark the future as completed when it finishes execution. Since asyncio futures are not thread-safe, this must be done using the call_soon_threadsafe event loop method provided by asyncio for such situations. In Python it would be done like this:
def run_async():
loop = asyncio.get_event_loop()
future = loop.create_future()
def on_done(result):
# when done, notify the future in a thread-safe manner
loop.call_soon_threadsafe(future.set_result, resut)
# start the worker in a thread owned by the pool
pool.submit(_worker, on_done)
# returning a future makes run_async() awaitable, and
# passable to asyncio.gather() etc.
return future
def _worker(on_done):
# this runs in a different thread
... processing goes here ...
result = ...
on_done(result)
In your case, the worker would be presumably implemented in Cython combined with C++.

How do I make my list comprehension (and the function it calls) run asynchronously?

class Class1():
def func1():
self.conn.send('something')
data = self.conn.recv()
return data
class Class2():
def func2():
[class1.func1() for class1 in self.classes]
How do I make that last line asynchronously in python? I've been googling but can't understand async/await and don't know which functions I should be putting async in front of. In my case, all the class1.func1 need to send before any of them can receive anything. I was also seeing that __aiter__ and __anext__ need to be implemented, but I don't know how those are used in this context. Thanks!

It is indeed possible to fire off multiple requests and asynchronously
wait for them. Because Python is traditionally a synchronous language,
you have to be very careful about what libraries you use with
asynchronous Python. Any library that blocks the main thread (such as
requests) will break your entire asynchronicity. aiohttp is a common
choice for asynchronously making web API calls in Python. What you
want is to create a bunch of future objects inside a Python list and
await it. A future is an object that represents a value that will
eventually resolve to something.
EDIT: Since the function that actually makes the API call is
synchronous and blocking and you don't have control over it, you will
have to run that function in a separate thread.
Async List Comprehensions in Python
import asyncio
async def main():
loop = asyncio.get_event_loop()
futures = [asyncio.ensure_future(loop.run_in_executor(None, get_data, data)) for data in data_name_list]
await asyncio.gather(*futures) # wait for all the future objects to resolve
# Do something with futures
# ...
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

How to parallelize computation with asyncio?

I have a block of code which takes a long time to execute and is CPU intense. I want to run that block several times and want to use the full power of my CPU for that. Looking at asyncio I understood that it is mainly for asynchronous communication, but is also a general tool for asynchronous tasks.
In the following example the time.sleep(y) is a placeholder for the code I want to run. In this example every co-routine is executed one after the other and the execution takes about 8 seconds.
import asyncio
import logging
import time
async def _do_compute_intense_stuff(x, y, logger):
logger.info('Getting it started...')
for i in range(x):
time.sleep(y)
logger.info('Almost done')
return x * y
logging.basicConfig(format='[%(name)s, %(levelname)s]: %(message)s', level='INFO')
logger = logging.getLogger(__name__)
loop = asyncio.get_event_loop()
co_routines = [
asyncio.ensure_future(_do_compute_intense_stuff(2, 1, logger.getChild(str(i)))) for i in range(4)]
logger.info('Made the co-routines')
responses = loop.run_until_complete(asyncio.gather(*co_routines))
logger.info('Loop is done')
print(responses)
When I replace time.sleep(y) with asyncio.sleep(y) it returns nearly immediately. With await asyncio.sleep(y) it takes about 2 seconds.
Is there a way to parallelize my code using this approach or should I use multiprocessing or threading? Would I need to put the time.sleep(y) into a Thread?

Executors use multithreading to accomplish this (or mulitprocessing, if you prefer). Asyncio is used to optimize code where you wait frequently for input, output operations to run. Sometimes that can be writing to files or loading websites.
However, with cpu heavy operations (that don't just rely on waiting for IO), it's recommended to use something akin to threads, and, in my opinion, concurrent.futures provides a very nice wrapper for that and it is similar to Asyncio's wrapper.
The reason why Asyncio.sleep would make your code run faster because it starts the function and then starts checking coroutines to see if they are ready. This doesn't scale well with CPU-heavy operations, as there is no IO to wait for.
To change the following example from multiprocessing to multi-threading Simply change ProcessPoolExecutor to ThreadPoolExecutor.
Here is a multiprocessing example:
import concurrent.futures
import time
def a(z):
time.sleep(1)
return z*12
if __name__ == '__main__':
with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
futures = {executor.submit(a, i) for i in range(5)}
for future in concurrent.futures.as_completed(futures):
data = future.result()
print(data)
This is a simplified version of the example provided in the documentation for executors.

How to execute a Python function within a function and not stop the execution

I'm trying to accomplish something without using threading
I'd like to execute a function within a function, but I dont want the first function's flow to stop. Its just a procedure and I don't expect any return and I also need this to keep the execution for some reasons.
Here is a snippet code of what I'd like to do:
function foo():
a = 5
dosomething()
# I dont wan't to wait until dosomething finish. Just call and follow it
return a
Is there any way to do this?
Thanks in advance.

You can use https://docs.python.org/3/library/concurrent.futures.html to achieve fire-and-forget behavior.
import concurrent.futures
def foo():
a = 5
with ThreadPoolExecutor(max_workers=1) as executor:
future = executor.submit(dosomething)
future.add_done_callback(on_something_done)
#print(future.result())
#continue without waiting dosomething()
#future.cancel() #To cancel dosomething
#future.done() #return True if done.
return a
def on_something_done(future):
print(future.result())
[updates]
concurrent.futures is built-in since python 3
for Python 2.x you can download futures 2.1.6 here

Python is synchronous, you'll have to use asynchronous processing to accomplish this.
While there are many many ways that you can execute a function asynchronously, one way is to use python-rq. Python-rq allows you to queue jobs for processing in the background with workers. It is backed by Redis and it is designed to have a low barrier to entry. It should be integrated in your web stack easily.
For example:
from rq import Queue, use_connection
def foo():
use_connection()
q = Queue()
# do some things
a = 5
# now process something else asynchronously
q.enqueue(do_something)
# do more here
return a

What kind of problems (if any) would there be combining asyncio with multiprocessing?

As almost everyone is aware when they first look at threading in Python, there is the GIL that makes life miserable for people who actually want to do processing in parallel - or at least give it a chance.
I am currently looking at implementing something like the Reactor pattern. Effectively I want to listen for incoming socket connections on one thread-like, and when someone tries to connect, accept that connection and pass it along to another thread-like for processing.
I'm not (yet) sure what kind of load I might be facing. I know there is currently setup a 2MB cap on incoming messages. Theoretically we could get thousands per second (though I don't know if practically we've seen anything like that). The amount of time spent processing a message isn't terribly important, though obviously quicker would be better.
I was looking into the Reactor pattern, and developed a small example using the multiprocessing library that (at least in testing) seems to work just fine. However, now/soon we'll have the asyncio library available, which would handle the event loop for me.
Is there anything that could bite me by combining asyncio and multiprocessing?

You should be able to safely combine asyncio and multiprocessing without too much trouble, though you shouldn't be using multiprocessing directly. The cardinal sin of asyncio (and any other event-loop based asynchronous framework) is blocking the event loop. If you try to use multiprocessing directly, any time you block to wait for a child process, you're going to block the event loop. Obviously, this is bad.
The simplest way to avoid this is to use BaseEventLoop.run_in_executor to execute a function in a concurrent.futures.ProcessPoolExecutor. ProcessPoolExecutor is a process pool implemented using multiprocessing.Process, but asyncio has built-in support for executing a function in it without blocking the event loop. Here's a simple example:
import time
import asyncio
from concurrent.futures import ProcessPoolExecutor
def blocking_func(x):
time.sleep(x) # Pretend this is expensive calculations
return x * 5
#asyncio.coroutine
def main():
#pool = multiprocessing.Pool()
#out = pool.apply(blocking_func, args=(10,)) # This blocks the event loop.
executor = ProcessPoolExecutor()
out = yield from loop.run_in_executor(executor, blocking_func, 10) # This does not
print(out)
if __name__ == "__main__":
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
For the majority of cases, this is function alone is good enough. If you find yourself needing other constructs from multiprocessing, like Queue, Event, Manager, etc., there is a third-party library called aioprocessing (full disclosure: I wrote it), that provides asyncio-compatible versions of all the multiprocessing data structures. Here's an example demoing that:
import time
import asyncio
import aioprocessing
import multiprocessing
def func(queue, event, lock, items):
with lock:
event.set()
for item in items:
time.sleep(3)
queue.put(item+5)
queue.close()
#asyncio.coroutine
def example(queue, event, lock):
l = [1,2,3,4,5]
p = aioprocessing.AioProcess(target=func, args=(queue, event, lock, l))
p.start()
while True:
result = yield from queue.coro_get()
if result is None:
break
print("Got result {}".format(result))
yield from p.coro_join()
#asyncio.coroutine
def example2(queue, event, lock):
yield from event.coro_wait()
with (yield from lock):
yield from queue.coro_put(78)
yield from queue.coro_put(None) # Shut down the worker
if __name__ == "__main__":
loop = asyncio.get_event_loop()
queue = aioprocessing.AioQueue()
lock = aioprocessing.AioLock()
event = aioprocessing.AioEvent()
tasks = [
asyncio.async(example(queue, event, lock)),
asyncio.async(example2(queue, event, lock)),
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

Yes, there are quite a few bits that may (or may not) bite you.
When you run something like asyncio it expects to run on one thread or process. This does not (by itself) work with parallel processing. You somehow have to distribute the work while leaving the IO operations (specifically those on sockets) in a single thread/process.
While your idea to hand off individual connections to a different handler process is nice, it is hard to implement. The first obstacle is that you need a way to pull the connection out of asyncio without closing it. The next obstacle is that you cannot simply send a file descriptor to a different process unless you use platform-specific (probably Linux) code from a C-extension.
Note that the multiprocessing module is known to create a number of threads for communication. Most of the time when you use communication structures (such as Queues), a thread is spawned. Unfortunately those threads are not completely invisible. For instance they can fail to tear down cleanly (when you intend to terminate your program), but depending on their number the resource usage may be noticeable on its own.
If you really intend to handle individual connections in individual processes, I suggest to examine different approaches. For instance you can put a socket into listen mode and then simultaneously accept connections from multiple worker processes in parallel. Once a worker is finished processing a request, it can go accept the next connection, so you still use less resources than forking a process for each connection. Spamassassin and Apache (mpm prefork) can use this worker model for instance. It might end up easier and more robust depending on your use case. Specifically you can make your workers die after serving a configured number of requests and be respawned by a master process thereby eliminating much of the negative effects of memory leaks.

Based on #dano's answer above I wrote this function to replace places where I used to use multiprocess pool + map.
def asyncio_friendly_multiproc_map(fn: Callable, l: list):
"""
This is designed to replace the use of this pattern:
with multiprocessing.Pool(5) as p:
results = p.map(analyze_day, list_of_days)
By letting caller drop in replace:
asyncio_friendly_multiproc_map(analyze_day, list_of_days)
"""
tasks = []
with ProcessPoolExecutor(5) as executor:
for e in l:
tasks.append(asyncio.get_event_loop().run_in_executor(executor, fn, e))
res = asyncio.get_event_loop().run_until_complete(asyncio.gather(*tasks))
return res

See PEP 3156, in particular the section on Thread interaction:
http://www.python.org/dev/peps/pep-3156/#thread-interaction
This documents clearly the new asyncio methods you might use, including run_in_executor(). Note that the Executor is defined in concurrent.futures, I suggest you also have a look there.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.