Call to async endpoint gets blocked by another thread - python

I have a tornado webservice which is going to serve something around 500 requests per minute. All these requests are going to hit 1 specific endpoint. There is a C++ program that I have compiled using Cython and use it inside the tornado service as my processor engine. Each request that goes to /check/ will trigger a function call in the C++ program (I will call it handler) and the return value will get sent to user as response.
This is how I wrap the handler class. One important point is that I do not instantiate the handler in __init__. There is another route in my tornado code that I want to start loading the DataStructure after an authroized request hits that route. (e.g. /reload/)
executors = ThreadPoolExecutor(max_workers=4)
class CheckerInstance(object):
def __init__(self, *args, **kwargs):
self.handler = None
self.is_loading = False
self.is_live = False
def init(self):
if not self.handler:
self.handler = pDataStructureHandler()
self.handler.add_words_from_file(self.data_file_name)
self.end_loading()
self.go_live()
def renew(self):
self.handler = None
self.init()
class CheckHandler(tornado.web.RequestHandler):
async def get(self):
query = self.get_argument("q", None).encode('utf-8')
answer = query
if not checker_instance.is_live:
self.write(dict(answer=self.get_argument("q", None), confidence=100))
return
checker_response = await checker_instance.get_response(query)
answer = checker_response[0]
confidence = checker_response[1]
if self.request.connection.stream.closed():
return
self.write(dict(correct=answer, confidence=confidence, is_cache=is_cache))
def on_connection_close(self):
self.wait_future.cancel()
class InstanceReloadHandler(BasicAuthMixin, tornado.web.RequestHandler):
def prepare(self):
self.get_authenticated_user(check_credentials_func=credentials.get, realm='Protected')
def new_file_exists(self):
return True
def can_reload(self):
return not checker_instance.is_loading
def get(self):
error = False
message = None
if not self.can_reload():
error = True
message = 'another job is being processed!'
else:
if not self.new_file_exists():
error = True
message = 'no new file found!'
else:
checker_instance.go_fake()
checker_instance.start_loading()
tornado.ioloop.IOLoop.current().run_in_executor(executors, checker_instance.renew)
message = 'job started!'
if self.request.connection.stream.closed():
return
self.write(dict(
success=not error, message=message
))
def on_connection_close(self):
self.wait_future.cancel()
def main():
app = tornado.web.Application(
[
(r"/", MainHandler),
(r"/check", CheckHandler),
(r"/reload", InstanceReloadHandler),
(r"/health", HealthHandler),
(r"/log-event", SubmitLogHandler),
],
debug=options.debug,
)
checker_instance = CheckerInstance()
I want this service to keep responding after checker_instance.renew starts running in another thread. But this is not what happens. When I hit the /reload/ endpoint and renew function starts working, any request to /check/ halts and waits for the reloading process to finish and then it starts working again. When the DataStructure is being loaded, the service should be in fake mode and respond to people with the same query that they send as input.
I have tested this code in my development environment with an i5 CPU (4 CPU cores) and it works just fine! But in the production environment (3 double-thread CPU cores) the /check/ endpoint halts requests.

It is difficult to fully trace the events being handled because you have clipped out some of the code for brevity. For instance, I don't see a get_response implementation here so I don't know if it is awaiting something itself that could be dependent on the state of checker_instance.
One area I would explore is in the thread-safety (or seeming absence of) in passing the checker_instance.renew to run_in_executor. This feels questionable to me because you are mutating the state of a single instance of CheckerInstance from a separate thread. While it might not break things explicitly, it does seem like this could be introducing odd race conditions or unanticipated copies of memory that might explain the unexpected behavior you are experiencing
If possible, I would make whatever load behavior you have that you want to offload to a thread be completely self-contained and when the data is loaded, return it as the function result which can then be fed back into you checker_instance. If you were to do this with the code as-is, you would want to await the run_in_executor call for its result and then update the checker_instance. This would mean the reload GET request would wait until the data was loaded. Alternatively, in your reload GET request, you could ioloop.spawn_callback to a function that triggers the run_in_executor in this manner, allowing the reload request to complete instead of waiting.

Related

How to make a tornado request atomic in the Database

I have a python app written in the Tornado Asynchronous framework. When an HTTP request comes in, this method gets called:
#classmethod
def my_method(cls, my_arg1):
# Do some Database Transaction #1
x = get_val_from_db_table1(id=1, 'x')
y = get_val_from_db_table2(id=7, 'y')
x += x + (2 * y)
# Do some Database Transaction #2
set_val_in_db_table1(id=1, 'x', x)
return True
The three database operations are interrelated. And this is a concurrent application so multiple such HTTP calls can be happening concurrently and hitting the same DB.
For data-integrity purposes, its important that the three database operations in this method are all called without another processes reading or writing to those database rows in between.
How can I make sure this method has database atomicity? Does Tornado have a decorator for this?
Synchronous database access
You haven't stated how you access your database. If, which is likely, you have synchronous DB access in get_val_from_db_table1 and friends (e.g. with pymysql) and my_method is blocking (doesn't return control to IO loop) then you block your server (which has implications on performance and responsiveness of your server) but effectively serialise your clients and only one can execute my_method at a time. So in terms of data consistency you don't need to do anything, but generally it's a bad design. You can solve both with #xyres's solution in short term (at cost of keeping in mind thread-safely concerns because most of Tornado's functionality isn't thread-safe).
Asynchronous database access
If you have asynchronous DB access in get_val_from_db_table1 and friends (e.g. with tornado-mysql) then you can use tornado.locks.Lock. Here's an example:
from tornado import web, gen, locks, ioloop
_lock = locks.Lock()
def synchronised(coro):
async def wrapper(*args, **kwargs):
async with _lock:
return await coro(*args, **kwargs)
return wrapper
class MainHandler(web.RequestHandler):
async def get(self):
result = await self.my_method('foo')
self.write(result)
#classmethod
#synchronised
async def my_method(cls, arg):
# db access
await gen.sleep(0.5)
return 'data set for {}'.format(arg)
if __name__ == '__main__':
app = web.Application([('/', MainHandler)])
app.listen(8080)
ioloop.IOLoop.current().start()
Note that the above is said about normal single-process Tornado application. If you use tornado.process.fork_processes, then you can only go with multiprocessing.Lock.
Since you want to run those three db operations one right after the other, the function my_method must be non-asynchronous.
But this would also mean that my_method will block the server. You definitely don't want that. One way that I can think of is to run this function in another thread. This won't block the server and will keep accepting new requests while the operations are running. And since, it's going to be non-async, db atomicity is guaranteed.
Here's the relevant code to get you started:
import concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
# Don't set `max_workers` more than 1, because then multiple
# threads will be able to perform db operations
class MyHandler(...):
#gen.coroutine
def get(self):
yield executor.submit(MyHandler.my_method, my_arg1)
# above, `yield` is used to wait for
# db operations to finish
# if you don't want to wait and return
# a response immediately remove the
# `yield` keyword
self.write('Done')
#classmethod
def my_method(cls, my_arg1):
# do db stuff ...
return True

Correct use of coroutine in Tornado web server

I'm trying to convert a simple syncronous server to an asyncronous version, the server receives post requestes and it retrieves the response from an external web service (amazon sqs). Here's the syncronous code
def post(self):
zoom_level = self.get_argument('zoom_level')
neLat = self.get_argument('neLat')
neLon = self.get_argument('neLon')
swLat = self.get_argument('swLat')
swLon = self.get_argument('swLon')
data = self._create_request_message(zoom_level, neLat, neLon, swLat, swLon)
self._send_parking_spots_request(data)
#....other stuff
def _send_parking_spots_request(self, data):
msg = Message()
msg.set_body(json.dumps(data))
self._sqs_send_queue.write(msg)
Reading Tornado documentation and some threads here I ended with this code using coroutines:
def post(self):
zoom_level = self.get_argument('zoom_level')
neLat = self.get_argument('neLat')
neLon = self.get_argument('neLon')
swLat = self.get_argument('swLat')
swLon = self.get_argument('swLon')
data = self._create_request_message(zoom_level, neLat, neLon, swLat, swLon)
self._send_parking_spots_request(data)
self.finish()
#gen.coroutine
def _send_parking_spots_request(self, data):
msg = Message()
msg.set_body(json.dumps(data))
yield gen.Task(write_msg, self._sqs_send_queue, msg)
def write_msg(queue, msg, callback=None):
queue.write(msg)
Comparing the performances using siege I get that the second version is even worse than the original one, so probably there's something about coroutines and Torndado asyncronous programming that I didn't understand at all.
Could you please help me with this?
Edit: self._sqs_send_queue it's a queue object retrieved from boto interface and queue.write(msg) returns the message that has been written on the queue
tornado relies on you converting all your I/O to be non-blocking. Simply sticking the same code you were using before inside of a gen.Task will not improve performance at all, because the I/O itself is still going to block the event loop. Additionally, you need to make your post method a coroutine, and call _send_parking_spots_requests using yield for the code to behave properly. So, a "correct" solution would look something like this:
#gen.coroutine
def post(self):
...
yield self._send_parking_spots_request(data) # wait (without blocking the event loop) until the method is done
self.finish()
#gen.coroutine
def _send_parking_spots_request(self, data):
msg = Message()
msg.set_body(json.dumps(data))
yield gen.Task(write_msg, self._sqs_send_queue, msg)
def write_msg(queue, msg, callback=None):
yield queue.write(msg, callback=callback) # This has to do non-blocking I/O.
In this example, queue.write would need to be some API that sends your request using non-blocking I/O, and executes callback when a response is received. Without knowing exactly what queue in your original example is, I can't specify exactly how that can be implemented in your case.
Edit: Assuming you're using boto, you may want to check out bototornado, which implements the exact same API I described above:
def write(self, message, callback=None):
"""
Add a single message to the queue.
:type message: Message
:param message: The message to be written to the queue
:rtype: :class:`boto.sqs.message.Message`
:return: The :class:`boto.sqs.message.Message` object that was written.

Terminate a hung redis pubsub.listen() thread

Related to this question I have the following code which subscribes to a redis pubsub queue and uses the handler provided in __init__ to feed the messages to the class that processes them:
from threading import Thread
import msgpack
class Subscriber(Thread):
def __init__(self, redis_connection, channel_name, handler):
super(Subscriber, self).__init__(name="Receiver")
self.connection = redis_connection
self.pubsub = self.connection.pubsub()
self.channel_name = channel_name
self.handler = handler
self.should_die = False
def start(self):
self.pubsub.subscribe(self.channel_name)
super(Subscriber, self).start()
def run(self):
for msg in self.pubsub.listen():
if self.should_die:
return
try:
data = msg["data"]
unpacked = msgpack.unpackb(data)
except TypeError:
# stop non-msgpacked, invalid, messages breaking stuff
# other validation happens in handler
continue
self.handler(unpacked)
def die(self):
self.should_die = True
In the linked question above, it is noted that pubsub.listen() never returns if the connection is dropped. Therefore, my die() function, while it can be called, will never actually cause the thread to terminate because it is hanging on the call to listen() inside the thread's run().
The accepted answer on the linked question mentions hacking redis-py's connection pool. I really don't want to do this and have a forked version of redis-py (at least until the fix is hopefully accepted into master), but I've had a look at the redis-py code anyway and don't immediately see where this change would be made.
Does anyone have an idea how to cleanly solve the hanging redis-py listen() call?
What issues will I incur by directly using Thread._Thread__stop?
To close this out after so many years. It ended up being a bug in the redis library. I debugged it and submitted the PR. It shouldn't occur any more.

Flask end response and continue processing

Is there a way in Flask to send the response to the client and then continue doing some processing? I have a few book-keeping tasks which are to be done, but I don't want to keep the client waiting.
Note that these are actually really fast things I wish to do, thus creating a new thread, or using a queue, isn't really appropriate here. (One of these fast things is actually adding something to a job queue.)
QUICK and EASY method.
We will use pythons Thread Library to acheive this.
Your API consumer has sent something to process and which is processed by my_task() function which takes 10 seconds to execute.
But the consumer of the API wants a response as soon as they hit your API which is return_status() function.
You tie the my_task to a thread and then return the quick response to the API consumer, while in the background the big process gets compelete.
Below is a simple POC.
import os
from flask import Flask,jsonify
import time
from threading import Thread
app = Flask(__name__)
#app.route("/")
def main():
return "Welcome!"
#app.route('/add_')
def return_status():
"""Return first the response and tie the my_task to a thread"""
Thread(target = my_task).start()
return jsonify('Response asynchronosly')
def my_task():
"""Big function doing some job here I just put pandas dataframe to csv conversion"""
time.sleep(10)
import pandas as pd
pd.DataFrame(['sameple data']).to_csv('./success.csv')
return print('large function completed')
if __name__ == "__main__":
app.run(host="0.0.0.0", port=8080)
Sadly teardown callbacks do not execute after the response has been returned to the client:
import flask
import time
app = flask.Flask("after_response")
#app.teardown_request
def teardown(request):
time.sleep(2)
print("teardown_request")
#app.route("/")
def home():
return "Success!\n"
if __name__ == "__main__":
app.run()
When curling this you'll note a 2s delay before the response displays, rather than the curl ending immediately and then a log 2s later. This is further confirmed by the logs:
teardown_request
127.0.0.1 - - [25/Jun/2018 15:41:51] "GET / HTTP/1.1" 200 -
The correct way to execute after a response is returned is to use WSGI middleware that adds a hook to the close method of the response iterator. This is not quite as simple as the teardown_request decorator, but it's still pretty straight-forward:
import traceback
from werkzeug.wsgi import ClosingIterator
class AfterResponse:
def __init__(self, app=None):
self.callbacks = []
if app:
self.init_app(app)
def __call__(self, callback):
self.callbacks.append(callback)
return callback
def init_app(self, app):
# install extension
app.after_response = self
# install middleware
app.wsgi_app = AfterResponseMiddleware(app.wsgi_app, self)
def flush(self):
for fn in self.callbacks:
try:
fn()
except Exception:
traceback.print_exc()
class AfterResponseMiddleware:
def __init__(self, application, after_response_ext):
self.application = application
self.after_response_ext = after_response_ext
def __call__(self, environ, start_response):
iterator = self.application(environ, start_response)
try:
return ClosingIterator(iterator, [self.after_response_ext.flush])
except Exception:
traceback.print_exc()
return iterator
Which you can then use like this:
#app.after_response
def after():
time.sleep(2)
print("after_response")
From the shell you will see the response return immediately and then 2 seconds later the after_response will hit the logs:
127.0.0.1 - - [25/Jun/2018 15:41:51] "GET / HTTP/1.1" 200 -
after_response
This is a summary of a previous answer provided here.
I had a similar problem with my blog. I wanted to send notification emails to those subscribed to comments when a new comment was posted, but I did not want to have the person posting the comment waiting for all the emails to be sent before he gets his response.
I used a multiprocessing.Pool for this. I started a pool of one worker (that was enough, low traffic site) and then each time I need to send an email I prepare everything in the Flask view function, but pass the final send_email call to the pool via apply_async.
You can find an example on how to use celery from within Flask
here https://gist.github.com/jzempel/3201722
The gist of the idea (pun intended) is to define the long, book-keeping tasks as #celery.task and use apply_async1 or delay to from within the view to start the task
Sounds like Teardown Callbacks would support what you want. And you might want to combine it with the pattern from Per-Request After-Request Callbacks to help with organizing the code.
You can do this with WSGI's close protocol, exposed from the Werkzeug Response object's call_on_close decorator. Explained in this other answer here: https://stackoverflow.com/a/63080968/78903

Twisted web - write() called after finish()?

I have the following Resource to handle http POST request with twisted web:
class RootResource(Resource):
isLeaf = True
def errback(self, failure):
print "Request finished with error: %s"%str(failure.value)
return failure
def write_response_happy(self, result):
self.request.write('HAPPY!')
self.request.finish()
def write_response_unhappy(self, result):
self.request.write('UNHAPPY!')
self.request.finish()
#defer.inlineCallbacks
def method_1(self):
#IRL I have many more queries to mySQL, cassandra and memcache to get final result, this is why I use inlineCallbacks to keep the code clean.
res = yield dbpool.runQuery('SELECT something FROM table')
#Now I make a decision based on result of the queries:
if res: #Doesn't make much sense but that's only an example
self.d.addCallback(self.write_response_happy) #self.d is already available after yield, so this looks OK?
else:
self.d.addCallback(self.write_response_unhappy)
returnValue(None)
def render_POST(self, request):
self.request = request
self.d = self.method_1()
self.d.addErrback(self.errback)
return server.NOT_DONE_YET
root = RootResource()
site = server.Site(root)
reactor.listenTCP(8002, site)
dbpool = adbapi.ConnectionPool('MySQLdb', host='localhost', db='mydb', user='myuser', passwd='mypass', cp_reconnect=True)
print "Serving on 8002"
reactor.run()
I've used the ab tool (from apache utils) to test 5 POST requests one after another:
ab -n 5 -p sample_post.txt http://127.0.0.1:8002/
Works fine!
Then I tried to run the same 5 POST requests simultaneously:
ab -n 5 -c 5 -p sample_post.txt http://127.0.0.1:8002/
Here I'm getting errors: exceptions.RuntimeError: Request.write called on a request after Request.finish was called. What am I doing wrong?
As Mualig suggested in his comments, you have only one instance of RootResource. When you assign to self.request and self.d in render_POST, you overwrite whatever value those attributes already had. If two requests arrive at around the same time, then this is a problem. The first Request and Deferred are discarded and replaced by new ones associated with the request that arrives second. Later, when your database operation finishes, the second request gets both results and the first one gets none at all.
This is an example of a general mistake in concurrent programming. Your per-request state is kept where it is shared between multiple requests. When multiple requests are handled concurrently, that sharing turns into a fight, and (at least) one request has to lose.
Try keeping your per-request state where it won't be shared between multiple requests. For example, try keeping it on the Deferred:
class RootResource(Resource):
isLeaf = True
def errback(self, failure):
print "Request finished with error: %s"%str(failure.value)
# You just handled the error, don't return the failure.
# Nothing later in the callback chain is doing anything with it.
# return failure
def write_response(self, result, request):
# No "self.request" anymore, just use the argument
request.write(result)
request.finish()
#defer.inlineCallbacks
def method_1(self):
#IRL I have many more queries to mySQL, cassandra and memcache to get final result, this is why I use inlineCallbacks to keep the code clean.
res = yield dbpool.runQuery('SELECT something FROM table')
#Now I make a decision based on result of the queries:
if res: #Doesn't make much sense but that's only an example
# No "self.d" anymore, just produce a result. No shared state to confuse.
returnValue("HAPPY!")
else:
returnValue("UNHAPPY!")
def render_POST(self, request):
# No more attributes on self. Just start the operation.
d = self.method_1()
# Push the request object into the Deferred. It'll be passed to response,
# which is what needs it. Each call to method_1 returns a new Deferred,
# so no shared state here.
d.addCallback(self.write_response, request)
d.addErrback(self.errback)
return server.NOT_DONE_YET

Categories