Design of asynchronous request and blocking processing using Tornado - python

I'm trying to implement a Python app that uses async functions to receive and emit messages over NATS, using a Tornado-based client. Once a message is received, a blocking function must be called; I'm trying to run it on a separate thread so that message reception and publication can continue, with received messages placed in a Tornado queue for later processing by the blocking function.
I'm very new to Tornado (and to Python multithreading), but after reading the Tornado documentation and other sources several times, I've managed to put together a working version of the code, which looks like this:
import tornado.gen
import tornado.ioloop
from tornado.queues import Queue
from concurrent.futures import ThreadPoolExecutor
from nats.io.client import Client as NATS

messageQueue = Queue()
nc = NATS()
EXECUTOR = ThreadPoolExecutor(max_workers=4)  # restored: referenced below but missing from the snippet

@tornado.gen.coroutine
def consumer():
    def processMessage(currentMessage):
        # process the message ...
        pass

    while True:
        currentMessage = yield messageQueue.get()
        try:
            # execute the call in a separate thread to prevent blocking the queue
            EXECUTOR.submit(processMessage, currentMessage)
        finally:
            messageQueue.task_done()

@tornado.gen.coroutine
def producer():
    @tornado.gen.coroutine
    def enqueueMessage(currentMessage):
        yield messageQueue.put(currentMessage)

    yield nc.subscribe("new_event", "", enqueueMessage)

@tornado.gen.coroutine
def main():
    tornado.ioloop.IOLoop.current().spawn_callback(consumer)
    yield producer()

if __name__ == '__main__':
    main()
    tornado.ioloop.IOLoop.current().start()
My questions are:
1) Is this the correct way of using Tornado to call a blocking function?
2) What's the best practice for implementing a consumer/producer scheme that is always listening? I'm afraid my while True: statement is actually blocking the processor...
3) How can I inspect the queue to make sure a burst of calls is being enqueued? I've tried using Queue().qsize(), but it always returns zero, which makes me wonder whether the enqueuing is done correctly or not.

The general rule (credit to NYKevin) is:
multiprocessing for CPU- and GPU-bound computations.
Event-driven stuff for non-blocking I/O (which should be preferred over blocking I/O where possible, since it scales much more effectively).
Threads for blocking I/O (you can also use multiprocessing, but the per-process overhead probably isn't worth it).
ThreadPoolExecutor for I/O, ProcessPoolExecutor for CPU. Both have an internal queue, and both scale to at most the specified max_workers. More info about concurrent executors is in the docs.
So the answers are:
Reimplementing a pool is overhead. Thread or process depends on what you plan to do.
while True is not blocking if you have, e.g., some yielded async calls (even yield gen.sleep(0.01)); they give control back to the ioloop.
qsize() is the right call, but since I have not run/debugged this, and since I would take a different approach (an existing pool), it is hard to pinpoint the problem here. A sketch of the pool approach follows below.
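For illustration (my sketch, not part of the original answer), here is a minimal version of the existing-pool approach, assuming Tornado 5+ for IOLoop.run_in_executor; the worker count and the blocking_process function are stand-ins:

import tornado.gen
import tornado.ioloop
from tornado.queues import Queue
from concurrent.futures import ThreadPoolExecutor

queue = Queue()
executor = ThreadPoolExecutor(max_workers=4)  # assumed worker count

def blocking_process(message):
    # stand-in for the real blocking work
    print("processed", message)

@tornado.gen.coroutine
def consumer():
    while True:
        message = yield queue.get()
        try:
            # run_in_executor hands the blocking call to the pool and returns
            # a future the coroutine can yield on, so the ioloop stays free
            yield tornado.ioloop.IOLoop.current().run_in_executor(
                executor, blocking_process, message)
        finally:
            queue.task_done()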

Related

How to efficiently use asyncio when calling a method on a BaseProxy?

I'm working on an application that uses LevelDB and multiple long-lived processes for different tasks.
Since LevelDB allows only a single process to maintain a database connection, all our database access is funneled through a special database process.
To access the database from another process we use a BaseProxy. But since we are using asyncio, our proxy shouldn't block on these APIs, which call into the db process and eventually read from the db. Therefore we implement the APIs on the proxy using an executor:
loop = asyncio.get_event_loop()
return await loop.run_in_executor(
    thread_pool_executor,
    self._callmethod,
    method_name,
    args,
)
And while that works just fine, I wonder if there's a better alternative to wrapping the _callmethod call of the BaseProxy in a ThreadPoolExecutor.
The way I understand it, the BaseProxy calling into the DB process is the textbook example of waiting on I/O, so using a thread for this seems unnecessarily wasteful.
In a perfect world, I'd expect an async _acallmethod to exist on the BaseProxy, but unfortunately that API does not exist.
So, my question basically boils down to: When working with BaseProxy is there a more efficient alternative to running these cross process calls in a ThreadPoolExecutor?
Unfortunately, the multiprocessing library is not suited to conversion to asyncio; what you have is the best you can do if you must use BaseProxy to handle your IPC (inter-process communication).
While it is true that the library uses blocking I/O, you can't easily reach in and re-work the blocking parts to use non-blocking primitives instead. If you were to insist on going this route, you'd have to patch or rewrite the internal implementation details of that library, but being internal implementation details, these can differ from Python point release to point release, making any patching fragile and prone to break with minor Python upgrades. The _callmethod method is part of a deep hierarchy of abstractions involving threads, socket or pipe connections, and serializers. See multiprocessing/connection.py and multiprocessing/managers.py.
So your options here are to stick with your current approach (using a thread pool executor to shove BaseProxy._callmethod() to another thread) or to implement your own IPC solution using asyncio primitives. Your central database-access process would act as a server for your other processes to connect to as clients, either using sockets or named pipes, with an agreed-upon serialisation scheme for client requests and server responses. This is what multiprocessing implements for you, but you'd implement your own (simpler) version, using asyncio streams and whatever serialisation scheme best suits your application patterns (e.g. pickle, JSON, protocol buffers, or something else entirely). A rough sketch of that route follows.
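As an illustration (mine, not the answerer's), here is a minimal sketch of that do-it-yourself route using Unix sockets and newline-delimited JSON framing; the socket path and the db_get stand-in are assumptions:

import asyncio
import json

SOCKET_PATH = "/tmp/leveldb.sock"  # hypothetical path

_fake_db = {"answer": "42"}  # stands in for the real LevelDB handle

def db_get(key):
    # only the database process ever touches the store
    return _fake_db.get(key)

async def handle_client(reader, writer):
    # serve one newline-delimited JSON request per line until EOF
    async for line in reader:
        request = json.loads(line)
        result = db_get(request["key"])
        writer.write(json.dumps({"value": result}).encode() + b"\n")
        await writer.drain()
    writer.close()

async def serve():
    server = await asyncio.start_unix_server(handle_client, path=SOCKET_PATH)
    async with server:
        await server.serve_forever()

async def remote_get(key):
    # client side, callable from any other process's event loop
    reader, writer = await asyncio.open_unix_connection(SOCKET_PATH)
    writer.write(json.dumps({"key": key}).encode() + b"\n")
    await writer.drain()
    response = json.loads(await reader.readline())
    writer.close()
    return response["value"]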
A thread pool is what you want. aioprocessing provides some async functionality of multiprocessing, but it does so using threads, as you have proposed; a usage sketch follows below the quoted note. I suggest making an issue against Python, if there isn't one already, for exposing true async multiprocessing.
https://github.com/dano/aioprocessing
In most cases, this library makes blocking calls to multiprocessing methods asynchronous by executing the call in a ThreadPoolExecutor
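Usage looks roughly like this (a sketch based on my reading of the project README; the coro_-prefixed awaitable methods are aioprocessing's convention):

import asyncio
import aioprocessing

def worker(queue):
    # runs in a separate process; blocking puts are fine here
    for i in range(3):
        queue.put(i)
    queue.put(None)  # sentinel: no more items

async def consume():
    queue = aioprocessing.AioQueue()
    process = aioprocessing.AioProcess(target=worker, args=(queue,))
    process.start()
    while True:
        item = await queue.coro_get()  # awaitable wrapper around the blocking get
        if item is None:
            break
        print("got", item)
    await process.coro_join()

asyncio.run(consume())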
Assuming the Python process and the database are running on the same system (i.e. you are not looking to async any network calls), you have two options.
What you are already doing (run in executor). It blocks the db thread, but the main thread remains free to do other stuff. This is not purely non-blocking, but it is quite an acceptable solution for I/O-blocking cases, with the small overhead of maintaining a thread.
For a truly non-blocking solution (one that can run in a single thread without blocking) you need (1) native support for async callbacks from the DB for each fetch call, and (2) to wrap that in your own custom event loop implementation. Here you subclass the base loop and override methods to integrate your db callbacks. For example, you could create a loop that implements a pipe server: the db writes to the pipe and Python polls the pipe. See the implementation of the Proactor event loop in the asyncio code base. Note: I have never implemented any custom event loop.
I am not familiar with LevelDB, but for a key-value store it is not clear whether such a fetch callback and a purely non-blocking implementation would bring any significant benefit. If you are doing multiple fetches inside an iterator and that is your main problem, you can make the loop async (with each fetch still blocking) and improve your performance. Below is dummy code that illustrates this.
import asyncio
import random
import time

async def talk_to_db(d):
    """
    Blocking db iteration. sleep is the fetch function.
    """
    for k, v in d.items():
        time.sleep(1)
        yield f"{k}:{v}"

async def talk_to_db_async(d):
    """
    Real non-blocking db iteration. The fetch (sleep) is natively async here.
    """
    for k, v in d.items():
        await asyncio.sleep(1)
        yield f"{k}:{v}"

async def talk_to_db_async_loop(d):
    """
    Semi-non-blocking db iteration. The fetch is blocking, but the
    loop is not.
    """
    for k, v in d.items():
        time.sleep(1)
        yield f"{k}:{v}"
        await asyncio.sleep(0)

async def db_call_wrapper(db):
    async for row in talk_to_db(db):
        print(row)

async def db_call_wrapper_async(db):
    async for row in talk_to_db_async(db):
        print(row)

async def db_call_wrapper_async_loop(db):
    async for row in talk_to_db_async_loop(db):
        print(row)

async def func(i):
    await asyncio.sleep(5)
    print(f"done with {i}")

database = {i: random.randint(1, 20) for i in range(20)}

async def main():
    db_coro = db_call_wrapper(database)
    coros = [func(i) for i in range(20)]
    coros.append(db_coro)
    await asyncio.gather(*coros)

async def main_async():
    db_coro = db_call_wrapper_async(database)
    coros = [func(i) for i in range(20)]
    coros.append(db_coro)
    await asyncio.gather(*coros)

async def main_async_loop():
    db_coro = db_call_wrapper_async_loop(database)
    coros = [func(i) for i in range(20)]
    coros.append(db_coro)
    await asyncio.gather(*coros)

loop = asyncio.get_event_loop()

# run the blocking db iteration
loop.run_until_complete(main())

# run the non-blocking db iteration
loop.run_until_complete(main_async())

# run the non-blocking (loop only) db iteration
loop.run_until_complete(main_async_loop())
This is something you can try. Otherwise, I would say your current method is quite efficient. I do not think BaseProxy can give you an async acall API; it does not know how to handle the callback from your db.

How to sniff a network interface with Twisted?

I need to receive raw packets from a network interface within Twisted code. The packets will not have the correct IP or MAC address, nor valid headers, so I need the raw thing.
I have tried looking into twisted.pair, but I was not able to figure out how to use it to get at the raw interface.
Normally, I would use scapy.all.sniff. However, that is blocking, so I can't just use it with Twisted. (I also cannot use scapy.all.sniff with a timeout and busy-loop, because I don't want to lose packets.)
A possible solution would be to run scapy.all.sniff in a thread and somehow call back into Twisted when I get a packet. This seems a bit inelegant (and also, I don't know how to do it, because I am a Twisted beginner), but I might settle for that if I don't find anything better.
You could run a distributed system and pass the data through a central queuing system. Take the Unix philosophy: create a single application that does a few tasks and does them well. Create one application that sniffs the packets (you can use scapy here, since it won't really matter if you block anything), then sends them to a queue (RabbitMQ, Redis, SQS, etc.), and have another application process the packets from the queue. This method should give you the least amount of headache. A rough sketch of the split follows.
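Something like this (my own sketch; Redis, the interface name, and the queue key are arbitrary assumptions):

import redis
from scapy.all import sniff

r = redis.Redis()  # assumes a local Redis instance
QUEUE_KEY = "raw_packets"  # hypothetical queue name

def enqueue(packet):
    # push the raw bytes of the captured frame; blocking is fine,
    # this process does nothing but sniff
    r.rpush(QUEUE_KEY, bytes(packet))

def run_sniffer():
    # requires root privileges; the interface name is an assumption
    sniff(iface="eth0", prn=enqueue, store=False)

def run_consumer():
    # separate process: BLPOP blocks until a packet arrives
    while True:
        _key, payload = r.blpop(QUEUE_KEY)
        print("packet of", len(payload), "bytes")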
If you need to run everything in a single application, then threads/multiprocessing is the only option. But there are some design patterns you'll want to follow. You can also break up the following code into separate functions and use a dedicated queuing system.
from threading import Thread
from time import sleep

from twisted.internet import defer, reactor

class Sniffer(Thread):
    def __init__(self, _reactor, shared_queue):
        super().__init__()
        self.reactor = _reactor
        self.shared_queue = shared_queue

    def run(self):
        """
        Sniffer logic here
        """
        while True:
            self.reactor.callFromThread(self.shared_queue.put, 'hello world')
            sleep(5)

@defer.inlineCallbacks
def consume_from_queue(_id, _reactor, shared_queue):
    item = yield shared_queue.get()
    print(str(_id), item)
    _reactor.callLater(0, consume_from_queue, _id, _reactor, shared_queue)

def main():
    shared_queue = defer.DeferredQueue()

    sniffer = Sniffer(reactor, shared_queue)
    sniffer.daemon = True
    sniffer.start()

    workers = 4
    for i in range(workers):
        consume_from_queue(i+1, reactor, shared_queue)

    reactor.run()

main()
The Sniffer class starts outside of Twisted's control. Notice sniffer.daemon = True; this is so that the thread will stop when the main thread has stopped. If it were set to False (the default), the application would exit only once all the threads had ended. Depending on the task at hand, this may or may not be possible. If you can take breaks from sniffing to check a thread event, then you might be able to stop the thread in a safer way.
self.reactor.callFromThread(self.shared_queue.put, 'hello world') is necessary so that the item is put into the queue in the main reactor thread, as opposed to the thread in which the Sniffer executes. The main benefit is some synchronization of the messages coming from the threads (assuming you plan to scale to sniffing multiple interfaces). Also, I wasn't sure whether DeferredQueue objects are thread-safe :) so I treated them as if they were not.
Since Twisted isn't managing the threads in this case, it's vital that the developer does. Notice the worker loop and consume_from_queue(i+1, reactor, shared_queue). This loop ensures only the desired number of workers are handling tasks. Inside the consume_from_queue() function, shared_queue.get() waits (without blocking) until an item is put into the queue, prints the item, then schedules another consume_from_queue().

Async multiprocessing python

So I've read this nice article about async threads in Python. Though, the last one has some trouble with the GIL, and threads are not as effective as they may seem.
Luckily Python incorporates multiprocessing, which is designed not to be affected by this problem.
I'd like to understand how to implement a multiprocessing queue (with a Pipe open for each process) in an async manner, so it wouldn't hang a running async web server.
I've read this topic; however, I'm not looking for performance, but rather for boxing out a big calculation that hangs my web server. Those calculations require pictures, so they might have a significant I/O exchange, but in my understanding this is something that async handles pretty well.
All the calcs are separate from each other, so they are not meant to be mixed.
I'm trying to build this in front of a ws handler.
If you sense heresy in this, please let me know as well :)
This is re-sourced from an article, after someone nice on #python IRC hinted at async executors, and from another answer on Reddit:
(2) Using ProcessPoolExecutor
“The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.”
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(num):
    print('entering cpu_heavy', num)
    time.sleep(10)
    print('leaving cpu_heavy', num)
    return num

async def main(loop):
    print('entering main')
    executor = ProcessPoolExecutor(max_workers=3)
    data = await asyncio.gather(*(loop.run_in_executor(executor, cpu_heavy, num)
                                  for num in range(3)))
    print('got result', data)
    print('leaving main')

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
And this came from another nice guy on Reddit ;)

Celery non blocking client

import proj.tasks
import time
import sys
import socket
import logging
import datetime

lat_to, ts = proj.tasks.timeme(time.time())  # <---- blocking call
lat_from = time.time() - ts
print(lat_to, lat_from)
The Celery task blocks, so I can't take advantage of many workers.
Is it possible to make that a non-blocking call?
NOTE: I've looked at tornado-celery as an option for a non-blocking Celery client, but I am not sure I like that approach, as I'd need to launch the tornado-celery web server.
When you call a Celery task directly, the method executes synchronously. The power of a task queue is putting a task on the queue and letting the workers do their work asynchronously.
You can do this using the task.delay method.
I'm not quite sure what delay does internally, but it returns very quickly, and the work of your method is not actually being done when you call it; your task is just being put on the work queue. A minimal sketch follows.
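For illustration (my sketch; the broker/backend URLs and the trivial task are assumptions):

from celery import Celery

# broker and backend URLs are assumptions; a result backend is needed for .get()
app = Celery('proj', broker='redis://localhost', backend='redis://localhost')

@app.task
def timeme(ts):
    return ts

# client side: .delay() enqueues the task and returns immediately
result = timeme.delay(1234.5)   # an AsyncResult, non-blocking
# ... do other work while a worker handles the task ...
print(result.get(timeout=10))   # blocks only if/when you ask for the value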
tornado-celery works fine on my side, but by default it waits for the task's result before the callback:
class GenAsyncHandler(web.RequestHandler):
    @web.asynchronous
    @gen.coroutine
    def get(self):
        response = yield gen.Task(tasks.sleep.apply_async, args=[3])
        self.write(str(response.result))
        self.finish()
If you want task callback options such as the following, you can try my fork:
After the task is sent
After the task is sent and ack-ed
To fit the original Celery behavior: task.apply_async() to get the AsyncResult first, then AsyncResult.get() to get the actual task result, in Tornado asynchronous fashion

Pattern for a background Twisted server that fills an incoming message queue and empties an outgoing message queue?

I'd like to do something like this:
twistedServer.start()  # This would be a nonblocking call
while True:
    while twistedServer.haveMessage():
        message = twistedServer.getMessage()
        response = handleMessage(message)
        twistedServer.sendResponse(response)
    doSomeOtherLogic()
doSomeOtherLogic()
The key thing I want to do is run the server in a background thread. I'm hoping to do this with a thread instead of through multiprocessing/queues because I already have one layer of messaging for my app and I'd like to avoid two. I'm bringing this up because I can already see how to do this in a separate process, but what I'd like to know is how to do it in a thread, or whether I can. Or perhaps there is some other pattern I can use that accomplishes the same thing, like writing my own reactor.run method. Thanks for any help. :)
The key thing I want to do is run the server in a background thread.
You don't explain why this is key, though. Generally, things like "use threads" are implementation details. Perhaps threads are appropriate, perhaps not, but the actual goal is agnostic on the point. What is your goal? To handle multiple clients concurrently? To handle messages of this sort simultaneously with events from another source (for example, a web server)? Without knowing the ultimate goal, there's no way to know if an implementation strategy I suggest will work or not.
With that in mind, here are two possibilities.
First, you could forget about threads. This would entail defining your event handling logic above as only the event handling parts. The part that tries to get an event would be delegated to another part of the application, probably something ultimately based on one of the reactor APIs (for example, you might set up a TCP server which accepts messages and turns them into the events you're processing, in which case you would start off with a call to reactor.listenTCP of some sort).
So your example might turn into something like this (with some added specificity to try to increase the instructive value):
from twisted.internet import reactor

class MessageReverser(object):
    """
    Accept messages, reverse them, and send them onwards.
    """
    def __init__(self, server):
        self.server = server

    def messageReceived(self, message):
        """
        Callback invoked whenever a message is received. This implementation
        will reverse and re-send the message.
        """
        self.server.sendMessage(message[::-1])
        doSomeOtherLogic()

def main():
    twistedServer = ...
    twistedServer.start(MessageReverser(twistedServer))
    reactor.run()

main()
Several points to note about this example:
I'm not sure how your twistedServer is defined. I'm imagining that it interfaces with the network in some way. Your version of the code would have had it receiving messages and buffering them until they were removed from the buffer by your loop for processing. This version would probably have no buffer, but instead just call the messageReceived method of the object passed to start as soon as a message arrives. You could still add buffering of some sort if you want, by putting it into the messageReceived method.
There is now a call to reactor.run which will block. You might instead write this code as a twistd plugin or a .tac file, in which case you wouldn't be directly responsible for starting the reactor. However, someone must start the reactor, or most APIs from Twisted won't do anything. reactor.run blocks, of course, until someone calls reactor.stop.
There are no threads used by this approach. Twisted's cooperative multitasking approach to concurrency means you can still do multiple things at once, as long as you're mindful to cooperate (which usually means returning to the reactor once in a while).
The exact times at which the doSomeOtherLogic function is called change slightly, because there's no notion of "the buffer is empty for now" separate from "I just handled a message". You could change this so that the function is called once a second, or after every N messages, or whatever is appropriate; one option is sketched below.
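For the once-a-second variant, Twisted's LoopingCall is the usual tool; a minimal sketch:

from twisted.internet import reactor
from twisted.internet.task import LoopingCall

def doSomeOtherLogic():
    print("doing other logic")

# invoke the function every second, cooperatively, from the reactor thread
LoopingCall(doSomeOtherLogic).start(1.0)
reactor.run()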
The second possibility would be to really use threads. This might look very similar to the previous example, but you would call reactor.run in another thread, rather than the main thread. For example,
from queue import Queue
from threading import Thread

from twisted.internet import reactor

class MessageQueuer(object):
    def __init__(self, queue):
        self.queue = queue

    def messageReceived(self, message):
        self.queue.put(message)

def main():
    queue = Queue()
    twistedServer = ...
    twistedServer.start(MessageQueuer(queue))

    Thread(target=reactor.run, args=(False,)).start()

    while True:
        message = queue.get()
        response = handleMessage(message)
        reactor.callFromThread(twistedServer.sendResponse, response)

main()
This version assumes a twistedServer which works similarly, but uses a thread to let you have the while True: loop. Note:
You must invoke reactor.run(False) if you use a thread, to prevent Twisted from trying to install any signal handlers, which Python only allows to be installed in the main thread. This means the Ctrl-C handling will be disabled and reactor.spawnProcess won't work reliably.
MessageQueuer has the same interface as MessageReverser, only its implementation of messageReceived is different. It uses the threadsafe Queue object to communicate between the reactor thread (in which it will be called) and your main thread where the while True: loop is running.
You must use reactor.callFromThread to send the message back to the reactor thread (assuming twistedServer.sendResponse is actually based on Twisted APIs). Twisted APIs are typically not threadsafe and must be called in the reactor thread. This is what reactor.callFromThread does for you.
You'll want to implement some way to stop the loop and the reactor, one supposes. The Python process won't exit cleanly until after you call reactor.stop; one way to arrange that is sketched below.
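For example (my sketch, not part of the original answer), a shutdown sentinel in the queue, reusing the names from the example above:

SHUTDOWN = object()  # sentinel another part of the app puts into the queue

while True:
    message = queue.get()
    if message is SHUTDOWN:
        # reactor.stop must run in the reactor thread, not this one
        reactor.callFromThread(reactor.stop)
        break
    response = handleMessage(message)
    reactor.callFromThread(twistedServer.sendResponse, response)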
Note that while the threaded version gives you the familiar, desired while True loop, it doesn't actually do anything much better than the non-threaded version; it's just more complicated. So, consider whether you actually need threads, or if they're merely an implementation technique that can be exchanged for something else.
