Logging in an asynchronous Tornado (Python) server

I am working on an application in which I may potentially need to log the entire traffic reaching the server. This feature may be turned on or off, or may be used when exceptions are caught.
In any case, I am concerned about the blocking nature of disk I/O operations and their impact on the performance of the server. The business logic that is applied when a request is handled (mostly POST HTTP requests) is asynchronous, in that every network or DB call is executed asynchronously.
On the other hand, I am concerned about the delay to the thread while it waits for the disk I/O operation to complete. The logged messages range from a few bytes to a few KBs, but in some cases a few MBs. There is no real need for the thread to pause while data is written to disk: the HTTP request can definitely complete at that point, and there is no reason for the IOLoop thread not to work on another task while data is written to disk.
So my questions are:
1. Am I over-worried about this issue? Is logging to standard output and later redirecting it to a file "good enough"?
2. What is the common approach, or the one you found most practical, for logging in Tornado-based applications? Even for simple logging and not the (extreme) case I outlined above?
3. Is this basically an ideal case for queuing the logging messages and consuming them from a dedicated thread?
Say I do offload the logging to a different thread (like Homer Simpson's "Can't Someone Else Do It?"): if the thread that performs the disk logging is waiting for the disk I/O operation to complete, does the Linux kernel take that as an opportunity for a context switch?
Any comments or suggestion are much appreciated,
Erez

For "normal" logging (a few lines per request), I've always found logging directly to a file to be good enough. That may not be true if you're logging all the traffic to the server. The one time I've needed to do something like that I just captured the traffic externally with tcpdump instead of modifying my server.
If you want to capture it in the process, start by just writing to a file from the main thread. As always, measure things in your own environment before taking drastic action (IOLoop.set_blocking_log_threshold is useful for determining if your logging is a problem).
If writing from the main thread blocks for too long, you can either write to a queue that is processed by another thread, or write asynchronously to a pipe or socket to another process (syslog?).
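For reference, here is a minimal sketch of the measuring step, assuming Tornado 4 where set_blocking_log_threshold was available on the IOLoop (it was deprecated with the asyncio-based loop in Tornado 5 and later removed):
import logging

from tornado.ioloop import IOLoop

logging.basicConfig(level=logging.WARNING)

loop = IOLoop.current()
loop.set_blocking_log_threshold(1)  # warn with a stack trace if blocked > 1 s
loop.start()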

" write asynchronously to a pipe or socket to another process
(syslog?"
How can it be? log_requestis a normal function - not a coroutine and all default python handlers are not driven by asyncio event loop so they are not truly asynchronous. This is imho one of the factors that make Tornado less performant than ie. aiohttp. Writing to the memory or using udp is fast but it is not async anyway.

Historical reference
I have ops experience from the time of the question, circa 2016, with a Python 3.4 Tornado 4 application running on a decent bare-metal machine. The application interacted with a few third-party HTTP APIs and logged some of the interactions for potential troubleshooting in the future (which is similar to the OP's requirements). The machine had a RAID of HDDs. As far as I can recall, the application wasn't high-traffic.
Tornado 4 had its own IO loop implementation (Tornado 5+ uses asyncio's now), and there was an interesting piece of code instrumentation, controlled by IOLoop.set_blocking_log_threshold. Basically it logged a WARNING record with the stack trace whenever the loop was blocked for longer than the threshold in seconds. I can find a couple of screenshots from that time from the Sentry timeline for that very warning, where the threshold was set to 1 second.
Most of the warnings had stack traces ending in the logging file handler's flush. It was a rotating and gzipping file handler. The latter may explain what could take longer than a second, but in any case it was desirable for the application to keep full responsibility over logging. The solution was the stdlib pair of logging.handlers.QueueHandler and logging.handlers.QueueListener.
Logging queue
The Python logging cookbook has a dedicated section on Dealing with handlers that block. Here's the example from it (where listener.start starts a thread that reads off the queue and delegates the records to the handler):
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

que = queue.Queue(-1)  # no limit on size
queue_handler = QueueHandler(que)
handler = logging.StreamHandler()
listener = QueueListener(que, handler)
root = logging.getLogger()
root.addHandler(queue_handler)
formatter = logging.Formatter('%(threadName)s: %(message)s')
handler.setFormatter(formatter)
listener.start()
# The log output will display the thread which generated
# the event (the main thread) rather than the internal
# thread which monitors the internal queue. This is what
# you want to happen.
root.warning('Look out!')
listener.stop()
For a real-world reference of a QueueHandler implementation that covers the edge cases, chronologer.client.QueueProxyHandler can be used.
asyncio instrumentation
asyncio has a debug mode.
By default asyncio runs in production mode. In order to ease the development asyncio has a debug mode. [...] When the debug mode is enabled:
• asyncio checks for coroutines that were not awaited and logs them; this mitigates the "forgotten await" pitfall.
• Many non-threadsafe asyncio APIs (such as the loop.call_soon() and loop.call_at() methods) raise an exception if they are called from a wrong thread.
• The execution time of the I/O selector is logged if it takes too long to perform an I/O operation.
• Callbacks taking longer than 100 ms are logged. The loop.slow_callback_duration attribute can be used to set the minimum execution duration in seconds that is considered "slow".
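For illustration, here is a minimal sketch of turning on the debug mode and tightening the slow-callback threshold (both documented asyncio APIs); the deliberate time.sleep is only there to trigger the warning:
import asyncio
import time

async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # report anything slower than 50 ms
    time.sleep(0.1)  # deliberately blocks the loop; debug mode logs it

asyncio.run(main(), debug=True)  # debug=True enables the checks listed above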
It may look richer than what Tornado 4 had, but in fact it's not. First, it's not intended for production (and a pretty important metric is missing). Moreover, it's an after-the-fact warning without a stack trace, whereas Tornado's implementation was based on signal.SIGALRM and provided the stack trace at the moment the threshold was hit.
For the curious
Have you noticed that the warnings didn't go away completely? I can assure you that the logging issue was fixed. What caused these rare issues, to my surprise, was uuid.uuid4, which can block on a machine with an empty entropy pool, but that's another story.
Further reading
- Python-tulip group discussion between asyncio maintainers about asynchronous file I/O, logging and the aiofiles library
- A section in the Trio documentation that explains the theory and tradeoffs behind async file I/O

Related

Separate logging for task queues

I'm using a task queue with Python (RQ). Since workers run concurrently, without any configuration the messages from all workers get mixed up.
I want to organize logging such that at any time I can get the exact full log for a given task run by a given worker. Workers run on different machines, so preferably logs would be sent over the network to some central collector, but to get started, I'd also be happy with local logging to file, as long as the messages of each task end up in a separate log file.
My question has two parts:
How to implement this in Python code? I suppose that, for the "log to file" case, I could do something like this at the beginning of each task function:
import logging

logging.basicConfig(filename="some_unique_id_for_this_task_and_worker.log",
                    level=logging.DEBUG, format="whatever")
logging.debug("My message...")
# etc.
but when it comes to logging over the network, I'm struggling to understand how the logger should be configured so that all log messages from the same task are recognizable at the collector. This is purposely vague because I haven't chosen a given technology or protocol to do this collection yet, and I'm looking for suggestions.
Assuming that the previous requirement can be accomplished, when logging over the network, what's a centralized solution that can give me the full log for a given task? I mean really showing me the full text log, not a search interface returning events or lines (as, e.g., IIRC, in Splunk or Elasticsearch).
Thanks
Since you're running multiple processes (the RQ workers), you could probably use one of the recipes in the logging cookbook. If you want to use a SocketHandler and a socket server to receive messages and send them to a file, you should also look at this recipe in the cookbook. It has a part about running a socket listener in production.
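To make records from the same task recognizable at the collector, one option is to attach the task and worker ids to every record on the client side. Here is a minimal sketch, assuming the stdlib SocketHandler and a collector along the lines of the cookbook's socket-server recipe (the task_id/worker_id attribute names are illustrative):
import logging
import logging.handlers

def get_task_logger(task_id, worker_id, host="collector.example.com"):
    logger = logging.getLogger("tasks.%s" % task_id)
    handler = logging.handlers.SocketHandler(
        host, logging.handlers.DEFAULT_TCP_LOGGING_PORT)
    logger.addHandler(handler)
    # SocketHandler pickles the whole LogRecord, so these extra attributes
    # travel with each record and the collector can group or split by them
    return logging.LoggerAdapter(
        logger, {"task_id": task_id, "worker_id": worker_id})

log = get_task_logger("task-42", "worker-1")
log.debug("My message...")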

In order to get a speed boost for my python program, should I spawn a separate thread or a separate process for logging?

In order to get a speed boost for my Python program, should I spawn a separate thread or a separate process for logging? My program uses a lot of logging and I am not sure if threading is suitable, because of the GIL. A lot of resources seem to suggest that it should be fine for I/O. I think that logging is I/O, but I am not sure what "should be fine" means in most of those resources. I just need speed.
Before you start trying to optimize a program, there are some things you should do.
To start with, you should profile your program. You could e.g. use line_profiler.
If it turns out that your software spends a considerable amount of time logging, there are two easy options.
1. Set the log level in production code so that no or fewer messages are logged. There will still be some overhead left, but it should be much reduced.
2. Use mechanical means (like sed or grep) to completely remove the logging calls from the production code. If this doesn't improve the speed/throughput of your program, logging wasn't an issue.
If neither of those is suitable and logging takes a significant fraction of your program's time, you can try to implement thread-based or process-based logging, as sketched below.
If you want to use threading for logging, you will basically need a list and a lock. The function that is called from the main thread to do the logging grabs the lock, appends the text to the list and releases the lock. The second thread waits for the lock, grabs it, pops a couple of items from the list, releases the lock and writes the items to a file. Since the GIL makes sure that only one thread at a time is running Python bytecode, this will reduce the performance of your program somewhat; part of its time is spent running the bytecode of the logging thread.
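A minimal sketch of that list-and-lock scheme (in practice queue.Queue would do the locking for you):
import threading
import time

log_buffer = []
log_lock = threading.Lock()

def log(message):
    # called from the main thread: append under the lock and return
    with log_lock:
        log_buffer.append(message)

def writer(path):
    # logging thread: periodically drain the buffer and write to disk
    with open(path, "a") as f:
        while True:
            with log_lock:
                items = log_buffer[:]
                del log_buffer[:]
            f.writelines(line + "\n" for line in items)
            f.flush()
            time.sleep(0.1)

threading.Thread(target=writer, args=("app.log",), daemon=True).start()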
Using multiprocessing is slightly different, in that you probably want to use e.g. a Queue to send logging messages from the main process to the logging process. The logging process takes items from the Queue and writes them to disk. This means that the time spent writing the logging actions to disk is spent in a different process. But there is some overhead associated with using a Queue as well.
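And a minimal sketch of the multiprocessing variant, where a Queue carries the messages to a logging process that owns the file:
import multiprocessing

def log_writer(q, path):
    # logging process: drain the queue until the None sentinel arrives
    with open(path, "a") as f:
        for message in iter(q.get, None):
            f.write(message + "\n")

if __name__ == "__main__":
    q = multiprocessing.Queue()
    proc = multiprocessing.Process(target=log_writer, args=(q, "app.log"))
    proc.start()
    q.put("hello from the main process")  # q.put is all the main process pays for
    q.put(None)  # shutdown sentinel
    proc.join()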
You would have to measure to see which method uses less time in your program.
I am going by these assumptions:
- You have already determined that it is your logging that is bottlenecking your program.
- You have a good reason why you are logging what you are logging.
The perceived slowness is most likely due to the success or failure acknowledgement from the logging action. To avoid this "command queuing", make the calls to a separate process asynchronously and skip the callback. This may end up consuming more resources, but it will alleviate the backlog in your main program.
Node.js handles this naturally, or you can roll your own Python listener. Since this will be a separate process, you can redirect the logging of your other programs to it. You can even have a separate machine handle this workload.

Is logging using SocketHandler asynchronous?

I am running a basic logger using a SocketHandler; essentially a minor variant of this code: https://docs.python.org/2.4/lib/network-logging.html.
My question is: is the logging from the client asynchronous? If it is not, is there a way to enforce a timeout? I.e., essentially the client should wait up to 't' seconds for the logging to happen and then move on. I have multiple processes logging through the same server.
It's asynchronous in the sense that it can handle input from multiple processes interleaved with each other, but not in the sense of non-blocking I/O: the calls to sockets are blocking. Since each client connection is handled in a new thread, this doesn't matter too much as long as there aren't too many client connections.
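To bound how long the client can stall on the connection, one option (a sketch, assuming a modern CPython where SocketHandler.makeSocket accepts a timeout argument) is to subclass the handler; the timeout set at connect time also applies to the subsequent sends:
import logging.handlers

class TimeoutSocketHandler(logging.handlers.SocketHandler):
    def makeSocket(self, timeout=1):
        # create the connection with a tighter timeout than the default;
        # the socket keeps this timeout for later sendall() calls too
        return super().makeSocket(timeout=0.5)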

ZeroMQ poll thread safety

I have a thread that is polling on a ZMQ Poller:
poller.poll(timeout)
This thread is also the one which receives and sends back messages over the sockets registered in the poller.
Then I have another thread that may, eventually, create a new socket and register it for polling on input events:
socket = context.socket(...)
socket.bind/connect(...)
poller.register(socket, zmq.POLLIN)
Once the socket is registered, the latter thread will not touch it again.
Is this safe?
Update
The answers/comments I got were about how I should not be doing this, or restated The Guide's recommendations (which I already knew). But that does not really answer my question.
To be more specific, I would say that I am working with pyzmq Python bindings for ZeroMQ.
Now, although ZeroMQ sockets are not thread safe, it is indeed possible to transfer them from one thread to another as long as there is a full memory barrier during the transfer.
So the first question would be: do I need to set an explicit memory barrier in there? Note that there is one thread that creates and binds/connects the socket and then registers it, but that thread will not use the socket again. Is there an actual conflict? Could there be a moment at which I should explicitly prevent access to the socket from both threads?
Then the second question would be: is registering a socket in a poller thread-safe? Most of the time the thread that performs the polling is busy doing other stuff, but it could happen that it is polling waiting for a timeout. In that case, do I need to use a lock to prevent concurrent access to the poller? or is it safe to register the new socket in the poller while the other thread is polling it?
Update II
I am using Pyro4 to handle and configure remote processes (i.e. their ZeroMQ connections and their behavior). The initial configuration can be done with the Pyro Proxy very easily. However, when I start the process, I am in fact running the main loop in a dedicated thread (a Pyro oneway call) that keeps running, but if I access the object with the Pyro Proxy again, then this access is from another thread.
So the idea is to avoid modifying the remote object's class but still allow the use of Pyro for configuring the remote objects even when they are running. As long as the creation + binding/connecting + registering of new sockets is safe from another thread, I am good.
Once the socket is registered, the latter thread will not touch it again.
Is this safe?
No.
Industries that not only require safe solutions, but also push the responsibility of actually proving stable and warranted system behaviour onto the vendor (be it due to wise grandfathers, a deep belief in QA/TQM, or due to regulations imposed on MIL/GOV/aerospace/healthcare/pharma/automotive et al. segment vendor management) would simply reject this outright.
Why?
" ... will not touch it again." is just a promise.
Safety cross-validated system design does not settle with less than a proof of a collision avoidance.
Let me cite from a lovely book by Pieter HINTJENS, "Code Connected, Vol. 1" - a must-read piece for ZeroMQ:
Some widely used models, despite being the basis for entire industries, are fundamentally broken, and shared state concurrency is one of them. Code that wants to scale without limit does it like the Internet does, by sending messages and sharing nothing except a common contempt for broken programming models.
You should follow some rules to write happy multithreaded code with ØMQ:
• Isolate data privately within its thread and never share data in multiple threads. The only exception to this are ØMQ contexts, which are threadsafe.
• Stay away from the classic concurrency mechanisms like mutexes, critical sections, semaphores, etc. These are an anti-pattern in ØMQ applications.
• Create one ØMQ context at the start of your process, and pass that to all threads that you want to connect via inproc sockets.
• Use attached threads to create structure within your application, and connect these to their parent threads using PAIR sockets over inproc. The pattern is: bind parent socket, then create child thread which connects its socket.
• Use detached threads to simulate independent tasks, with their own contexts. Connect these over tcp. Later you can move these to stand-alone processes without changing the code significantly.
• All interaction between threads happens as ØMQ messages, which you can define more or less formally.
• Don’t share ØMQ sockets between threads. ØMQ sockets are not threadsafe. Technically it’s possible to migrate a socket from one thread to another but it demands skill. The only place where it’s remotely sane to share sockets between threads are in language bindings that need to do magic like garbage collection on sockets.
If you need to start more than one proxy in an application, for example, you will want to run each in their own thread. It is easy to make the error of creating the proxy frontend and backend sockets in one thread, and then passing the sockets to the proxy in another thread. This may appear to work at first but will fail randomly in real use. Remember: Do not use or close sockets except in the thread that created them.
If you follow these rules, you can quite easily build elegant multithreaded applications, and later split off threads into separate processes as you need to. Application logic can sit in threads, processes, or nodes: whatever your scale needs.
ØMQ uses native OS threads rather than virtual “green” threads. The advantage is that you don’t need to learn any new threading API, and that ØMQ threads map cleanly to your operating system. You can use standard tools like Intel’s ThreadChecker to see what your application is doing. The disadvantages are that native threading APIs are not always portable, and that if you have a huge number of threads (in the thousands), some operating systems will get stressed.
If you’re sharing sockets across threads, don’t. It will lead to random weirdness, and crashes.
We could assume "light" conditions: system not stressed, high-watermark never reached, no big congestions. There is just a single thread running the application (polling and executing tasks on input). So most of the time (99.99%) there is no concurrency. Now, concurrency only occurs when a second thread appears just to add a socket to the pool. There will never be more than 2 threads being executed. And the second thread will be always restricted to adding new sockets to the pool (once added the socket is transferred to the main thread). Is this enough for boundary conditions? – Peque
Given the schematic use-case details added in Update II, a professional solution should not lose time and should avoid any hidden risks by using a thread-clean design:
#T1 a poller-maintainer: - has the Context() instance under its control
                         - has graceful .close() + .term() responsibility
                         - has the POLLER instance under its own control
                         - has PAIR .bind( "inproc://worker2poller" )
                         - has PAIR .recv() <add_socket>-request processing responsibility

#T2 a worker-process:    - has PAIR .connect( "inproc://worker2poller" )
                         - has PAIR .send() privilege to ask T1 to add a socket and include it in the POLLER
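A minimal pyzmq sketch of that structure (the SUB socket type and the handle() function are illustrative; note that T1 must bind the inproc endpoint before T2 connects):
import threading
import time
import zmq

context = zmq.Context.instance()  # contexts are the one threadsafe ZeroMQ object

def handle(message):
    print("got:", message)  # illustrative message handler

def poller_thread():
    # T1 owns the poller and every socket registered with it
    pair = context.socket(zmq.PAIR)
    pair.bind("inproc://worker2poller")
    poller = zmq.Poller()
    poller.register(pair, zmq.POLLIN)
    while True:
        for sock, _ in poller.poll():
            if sock is pair:
                # an <add_socket> request: create and register the new
                # socket here, in the thread that will actually use it
                endpoint = sock.recv_string()
                new_sock = context.socket(zmq.SUB)
                new_sock.connect(endpoint)
                new_sock.setsockopt(zmq.SUBSCRIBE, b"")
                poller.register(new_sock, zmq.POLLIN)
            else:
                handle(sock.recv())

def worker_thread():
    # T2 never touches the poller or its sockets; it only sends a request
    pair = context.socket(zmq.PAIR)
    pair.connect("inproc://worker2poller")
    pair.send_string("tcp://127.0.0.1:5556")

threading.Thread(target=poller_thread, daemon=True).start()
time.sleep(0.1)  # crude: give T1 time to bind before T2 connects
threading.Thread(target=worker_thread, daemon=True).start()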
While the GIL prevents the Python threads from ever running in parallel anyway, the pure OOP design is the motivation to keep the architecture with clean and separated responsibilities, and to keep the formal communication patterns fully scalable.

Running asynchronous python code in a Django web application

Is it OK to run certain pieces of code asynchronously in a Django web app? If so, how?
For example:
I have a search algorithm that returns hundreds or thousands of results. I want to record in the database that these items were the result of the search, so I can see what users are searching for most. I don't want the client to have to wait for hundreds or thousands of extra database inserts. Is there a way I can do this asynchronously? Is there any danger in doing so? Is there a better way to achieve this?
As far as Django is concerned, yes.
The bigger concern is your web server and whether it plays nicely with threading. For instance, the sync workers of gunicorn are single-threaded, but there are other worker types, such as greenlet-based ones; I'm not sure how well those play with threads.
Combining threading and multiprocessing can be an issue if you're forking from threads:
Status of mixing multiprocessing and threading in Python
http://bugs.python.org/issue6721
That being said, I know of popular performance analytics utilities that have been using threads to report on metrics, so seems to be an accepted practice.
In sum, it seems safest to use the threading.Thread object from the standard library, as long as whatever you do in it doesn't fork (Python's multiprocessing library):
https://docs.python.org/2/library/threading.html
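As an illustration, here is a hedged sketch of that approach for the search example (SearchHit, myapp and run_search are hypothetical names; note that the thread gets its own database connection in Django):
import threading

from django.shortcuts import render

def record_search(query, result_ids):
    from myapp.models import SearchHit  # hypothetical model
    SearchHit.objects.bulk_create(
        [SearchHit(query=query, item_id=pk) for pk in result_ids])

def search_view(request):
    query = request.GET.get("q", "")
    results = run_search(query)  # hypothetical search function
    # fire-and-forget: the client does not wait for the inserts
    threading.Thread(
        target=record_search,
        args=(query, [r.pk for r in results]),
        daemon=True,
    ).start()
    return render(request, "results.html", {"results": results})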
Offloading requests from the main thread is a common practice; as the end goal is to return a result to the client (browser) as quickly as possible.
As I am sure you are aware, HTTP is blocking - so until you return a response, the client cannot do anything (it is blocked, in a waiting state).
The de facto way of offloading requests is through celery, which is a task queuing system.
I highly recommend you read the introduction to celery topic, but in summary here is what happens:
- You mark certain pieces of code as "tasks". These are usually functions that you want to run asynchronously.
- Celery manages workers - you can think of them as threads - that will run these tasks.
- To communicate with the workers, a message queue is required. RabbitMQ is the one most often recommended.
Once you have all the components running (it takes but a few minutes), your workflow goes like this:
- In your view, when you want to offload some work, you call the function that does that work with .delay(). This triggers a worker to start executing the method in the background (see the sketch below).
- Your view then returns a response immediately.
- You can then check for the result of the task and take appropriate actions based on what needs to be done. There are ways to track progress as well.
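A minimal sketch of that workflow, assuming a configured Celery app and broker (the record_search task is an illustrative name):
# tasks.py
from celery import shared_task

@shared_task
def record_search(query, result_ids):
    # runs on a worker process, not in the web process
    print("recording", query, len(result_ids), "results")

# views.py (elsewhere):
#   from .tasks import record_search
#   record_search.delay(query, [r.pk for r in results])  # queued, returns immediately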
It is also good practice to include caching - so that you are not executing expensive tasks unnecessarily. For example, you might choose to offload a request to do some analytics on search keywords that will be placed in a report.
Once the report is generated, I would cache the results (if applicable) so that the same report can be displayed if requested later - rather than be generated again.
