I'm looking for a way to set request level context in Tornado.
This is useful for logging purpose, to print some request attributes with every log line (like user_id).
I'd like to populate the context in web.RequestHandler and then access it in other coroutines that this request called.
class WebRequestHandler(web.RequestHandler):
#gen.coroutine
def post(self):
RequestContext.test_mode = self.application.settings.get('test_mode', False)
RequestContext.corr_id = self.request.header.get('X-Request-ID')
result = yield some_func()
self.write(result)
#gen.coroutine
def some_func()
if RequestContext.test_mode:
print "In test mode"
do more async calls
Currently I pass context object (dict with values) to every async function call down stream, this way every part of the code can do monitoring and logging with right context.
I'm looking for a cleaner/simpler solution.
Thanks
Alex
The concept of request context doesn't really hold well in async frameworks (especially if you have high volume traffic) for the simple fact that there could potentially be hundreds of concurrent requests and it becomes difficult to determine which "context" to use. This works for sequential frameworks like Flask, Falcon, Django, etc. because requests are handled one by one and it's simple to determine which request you're dealing with.
The preferred method of handling functionality between a request start and end is to override prepare and on_finish respectively.
class WebRequestHandler(web.RequestHandler):
def prepare(self):
print('Logging...prepare')
if self.application.settings.get('test_mode', False):
print("In test mode")
print('X-Request-ID: {0}'.format(self.request.header.get('X-Request-ID')))
#gen.coroutine
def post(self):
result = yield some_func()
self.write(result)
def on_finish(self):
print('Logging...on_finish')
The simple solution would be to create an object that represents the context of your request and pass that into your log function. Example:
class RequestContext(object):
"""
Hold request context
"""
class WebRequestHandler(web.RequestHandler):
#gen.coroutine
def post(self):
# create new context obj and fill w/ necessary parameters
request_context = RequestContext()
request_context.test_mode = self.application.settings.get('test_mode', False)
request_context.corr_id = self.request.header.get('X-Request-ID')
# pass context objects into coroutine
result = yield some_func(request_context)
self.write(result)
#gen.coroutine
def some_func(request_context)
if request_context.test_mode:
print "In test mode"
# do more async calls
Related
I have a python app written in the Tornado Asynchronous framework. When an HTTP request comes in, this method gets called:
#classmethod
def my_method(cls, my_arg1):
# Do some Database Transaction #1
x = get_val_from_db_table1(id=1, 'x')
y = get_val_from_db_table2(id=7, 'y')
x += x + (2 * y)
# Do some Database Transaction #2
set_val_in_db_table1(id=1, 'x', x)
return True
The three database operations are interrelated. And this is a concurrent application so multiple such HTTP calls can be happening concurrently and hitting the same DB.
For data-integrity purposes, its important that the three database operations in this method are all called without another processes reading or writing to those database rows in between.
How can I make sure this method has database atomicity? Does Tornado have a decorator for this?
Synchronous database access
You haven't stated how you access your database. If, which is likely, you have synchronous DB access in get_val_from_db_table1 and friends (e.g. with pymysql) and my_method is blocking (doesn't return control to IO loop) then you block your server (which has implications on performance and responsiveness of your server) but effectively serialise your clients and only one can execute my_method at a time. So in terms of data consistency you don't need to do anything, but generally it's a bad design. You can solve both with #xyres's solution in short term (at cost of keeping in mind thread-safely concerns because most of Tornado's functionality isn't thread-safe).
Asynchronous database access
If you have asynchronous DB access in get_val_from_db_table1 and friends (e.g. with tornado-mysql) then you can use tornado.locks.Lock. Here's an example:
from tornado import web, gen, locks, ioloop
_lock = locks.Lock()
def synchronised(coro):
async def wrapper(*args, **kwargs):
async with _lock:
return await coro(*args, **kwargs)
return wrapper
class MainHandler(web.RequestHandler):
async def get(self):
result = await self.my_method('foo')
self.write(result)
#classmethod
#synchronised
async def my_method(cls, arg):
# db access
await gen.sleep(0.5)
return 'data set for {}'.format(arg)
if __name__ == '__main__':
app = web.Application([('/', MainHandler)])
app.listen(8080)
ioloop.IOLoop.current().start()
Note that the above is said about normal single-process Tornado application. If you use tornado.process.fork_processes, then you can only go with multiprocessing.Lock.
Since you want to run those three db operations one right after the other, the function my_method must be non-asynchronous.
But this would also mean that my_method will block the server. You definitely don't want that. One way that I can think of is to run this function in another thread. This won't block the server and will keep accepting new requests while the operations are running. And since, it's going to be non-async, db atomicity is guaranteed.
Here's the relevant code to get you started:
import concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
# Don't set `max_workers` more than 1, because then multiple
# threads will be able to perform db operations
class MyHandler(...):
#gen.coroutine
def get(self):
yield executor.submit(MyHandler.my_method, my_arg1)
# above, `yield` is used to wait for
# db operations to finish
# if you don't want to wait and return
# a response immediately remove the
# `yield` keyword
self.write('Done')
#classmethod
def my_method(cls, my_arg1):
# do db stuff ...
return True
Let's say I have this object which is a core part of my system, and most of its work is to communicate with a remote agent - send tasks and get responses:
class AgentAPI:
... # Constructor etc.
def do_foo(self, args):
.... # Send data and wait for response
return result
def do_bar(self, args):
.... # Send data and wait for response
return result
The way I would go about unit-testing this object is with a mock, right?
So:
def test_agent_api_foo():
agent_api = AgentAPI(someargs)
agent_api.do_foo = Mock()
.... # test logic
That means the methods are not really executed, of course...
So how do I keep my unit-tests reliable on highly distributed systems?
In our django app, we are calling an API of another app multiple times from different places with the same inputs.
One solution could be to store the response in the request context after it gets called.
However, the request context is not passed into the calling methods. The calling methods are deep inside the control flow.
Therefore, I thought another solution could be store the response in a thread local storage.
However, I learnt from a question here that the thread locals last longer than the request itself. I don't want that. I want this storage to last only as long as the request itself.
I implemented a decorator to wrap the calling function and it looks like the below.
import threading
threadlocal = threading.local()
threadlocal.cache = {}
def cache(func):
def inner(*args, **kwargs):
cache_key = (args, str(kwargs))
if cache_key in threadlocal.cache:
return threadlocal.cache[cache_key]
response = func(*args, **kwargs)
threadlocal.cache[cachekey] = response
return response
return inner
Is there a better storage mechanism here other than threading.local?
just struggling on this. If i have an async request handler that during it's execution calls other functions that do something (for example async db queries) and then they call "finish" on their own, do i have to mark them as async? because if the application is structured like the example, i get errors about multiple calls to "finish". I guess i miss something.
class MainHandler(tornado.web.RequestHandler):
#tornado.web.asynchronous
#gen.engine
def post(self):
#do some stuff even with mongo motor
self.handleRequest(bla)
#gen.engine
def handleRequest(self,bla):
#do things,use motor call other functions
self.finish(result)
Do all functions have to be marked with async?
thanks
Calling finish ends the HTTP request see docs. Other functions should not call 'finish'
I think you want to do something like this. Note that there is a extra param 'callback' which is added into async functions:
#tornado.web.asynchronous
#gen.engine
def post(self):
query =''
response = yield tornado.gen.Task(
self.handleRequest,
query=query
)
result = response[0][0]
errors = response[1]['error']
# Do stuff with result
def handleRequest(self, callback, query):
self.motor['my_collection'].find(query, callback=callback)
See tornado.gen docs for more info
I've got a setup where Tornado is used as kind of a pass-through for workers. Request is received by Tornado, which sends this request to N workers, aggregates results and sends it back to client. Which works fine, except when for some reason timeout occurs — then I've got memory leak.
I've got a setup which similar to this pseudocode:
workers = ["http://worker1.example.com:1234/",
"http://worker2.example.com:1234/",
"http://worker3.example.com:1234/" ...]
class MyHandler(tornado.web.RequestHandler):
#tornado.web.asynchronous
def post(self):
responses = []
def __callback(response):
responses.append(response)
if len(responses) == len(workers):
self._finish_req(responses)
for url in workers:
async_client = tornado.httpclient.AsyncHTTPClient()
request = tornado.httpclient.HTTPRequest(url, method=self.request.method, body=body)
async_client.fetch(request, __callback)
def _finish_req(self, responses):
good_responses = [r for r in responses if not r.error]
if not good_responses:
raise tornado.web.HTTPError(500, "\n".join(str(r.error) for r in responses))
results = aggregate_results(good_responses)
self.set_header("Content-Type", "application/json")
self.write(json.dumps(results))
self.finish()
application = tornado.web.Application([
(r"/", MyHandler),
])
if __name__ == "__main__":
##.. some locking code
application.listen()
tornado.ioloop.IOLoop.instance().start()
What am I doing wrong? Where does the memory leak come from?
I don't know the source of the problem, and it seems gc should be able to take care of it, but there's two things you can try.
The first method would be to simplify some of the references (it looks like there may still be references to responses when the RequestHandler completes):
class MyHandler(tornado.web.RequestHandler):
#tornado.web.asynchronous
def post(self):
self.responses = []
for url in workers:
async_client = tornado.httpclient.AsyncHTTPClient()
request = tornado.httpclient.HTTPRequest(url, method=self.request.method, body=body)
async_client.fetch(request, self._handle_worker_response)
def _handle_worker_response(self, response):
self.responses.append(response)
if len(self.responses) == len(workers):
self._finish_req()
def _finish_req(self):
....
If that doesn't work, you can always invoke garbage collection manually:
import gc
class MyHandler(tornado.web.RequestHandler):
#tornado.web.asynchronous
def post(self):
....
def _finish_req(self):
....
def on_connection_close(self):
gc.collect()
The code looks good. The leak is probably inside Tornado.
I only stumbled over this line:
async_client = tornado.httpclient.AsyncHTTPClient()
Are you aware of the instantiation magic in this constructor?
From the docs:
"""
The constructor for this class is magic in several respects: It actually
creates an instance of an implementation-specific subclass, and instances
are reused as a kind of pseudo-singleton (one per IOLoop). The keyword
argument force_instance=True can be used to suppress this singleton
behavior. Constructor arguments other than io_loop and force_instance
are deprecated. The implementation subclass as well as arguments to
its constructor can be set with the static method configure()
"""
So actually, you don't need to do this inside the loop. (On the other
hand, it should not do any harm.) But which implementation are you
using CurlAsyncHTTPClient or SimpleAsyncHTTPClient?
If it is SimpleAsyncHTTPClient, be aware of this comment in the code:
"""
This class has not been tested extensively in production and
should be considered somewhat experimental as of the release of
tornado 1.2.
"""
You can try switching to CurlAsyncHTTPClient. Or follow
Nikolay Fominyh's suggestion and trace the calls to __callback().