How to execute code on SIGTERM in Django request handler thread - python

I'm currently trying to understand the signal handling in Django when receiving a SIGTERM.
Background information
I have an application with potentially long running requests, running in a Docker container. When Docker wants to stop a container, it first sends a SIGTERM signal, waits for a while, and then sends a SIGKILL. Normally, on the SIGTERM, you stop receiving new requests, and hope that the currently running requests finish before Docker decides to send a SIGKILL.
However, in my application I find it more important to record which requests have been attempted than to finish them right now. So I'd prefer the current requests to be interrupted on SIGTERM, so that I can end them gracefully (and save their state), rather than waiting for the SIGKILL.
My attempt
My theory is that you can register a signal handler for SIGTERM that performs a sys.exit(), so that a SystemExit exception is raised. I then want to catch that exception in my request handler and save my state. As a first experiment I've created a mock project for the Django development server.
I registered the signal in the AppConfig.ready() method:
import signal
import sys
from django.apps import AppConfig
import logging

logger = logging.getLogger(__name__)


def signal_handler(signal_num, frame):
    sys.exit()


class TesterConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'tester'

    def ready(self):
        logger.info('starting ready')
        signal.signal(signal.SIGTERM, signal_handler)
and have created a request handler that catches Exceptions and BaseExceptions:
import logging
import time

from django.http import HttpResponse

logger = logging.getLogger(__name__)


def handler(request):
    try:
        logger.info('start')
        while True:
            time.sleep(1)
    except Exception:
        logger.info('exception')
    except BaseException:
        logger.info('baseexception')
    return HttpResponse('hallo')
But when I start the development server using python manage.py runserver and then send a kill signal using kill -n 15 <pid>, no 'baseexception' message gets logged ('start' does get logged).
The full code can be found here.
My question
My hypothesis is that the SIGTERM signal is handled in the main thread, so the sys.exit() call happens in the main thread as well. The exception is therefore not raised in the thread running the request handler, and nothing is caught.
How do I change my code to have the SystemExit raised in the request handler thread? I need some information from that thread to log, so I can't just log something in the signal handler directly.

Ok, I did some investigation and found an answer to my own question. As I somewhat suspected while posing it, this is the kind of question where you probably want a different solution than the one being asked for. Still, I'll post my findings here in case someone finds this question in the future and is in a similar situation.
There were a couple of reasons that the above did not work. The first is that I forgot to register my app in the INSTALLED_APPS, so the code in TesterConfig.ready was not actually executed.
Next, it turns out that Django also registers a handler for the SIGTERM signal, see the Django source code. So if you send a SIGTERM to the process, that is the handler that gets triggered. I temporarily commented out that line in my virtual environment to investigate some more, but of course that can never lead to a real solution.
The sys.exit() function indeed raises a SystemExit exception, but it is raised (and can only be caught) in the thread that calls it. If you want to communicate between threads, you'll probably want to use a threading.Event and check it regularly in the thread that you want to interrupt.
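A minimal sketch of that Event-based approach (my own illustration, not code from the linked project): the signal handler, registered in the main thread, sets a module-level Event that the request-handling thread polls.

import threading
import time

from django.http import HttpResponse

# Module-level event: set by the SIGTERM handler (which runs in the main
# thread) and polled by the request-handling threads.
stop_event = threading.Event()


def signal_handler(signal_num, frame):
    stop_event.set()


def handler(request):
    while not stop_event.is_set():
        time.sleep(1)  # do a bounded chunk of work, then re-check the event
    # The event was set: save the request state here and finish up.
    return HttpResponse('interrupted gracefully')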
If you're looking for suggestions on how to do something like this when running Django through gunicorn: I found that if you use the sync worker, you can register signals in your views.py, because the requests are handled in the main thread.
In the end I registered the signal there, wrote a logging line and raised an Exception in the signal handler. That exception is then handled by the exception handling that was already in place, roughly as sketched below.
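A sketch of what that looks like in views.py (the exception class name and log messages are mine, not the exact project code):

import logging
import signal
import time

from django.http import HttpResponse

logger = logging.getLogger(__name__)


class TerminationRequested(Exception):
    """Raised by the SIGTERM handler to abort the running request."""


def sigterm_handler(signal_num, frame):
    logger.info('SIGTERM received, aborting request')
    raise TerminationRequested()


# With gunicorn's sync worker the view runs in the main thread of the worker
# process, so a handler registered at import time can interrupt the request.
signal.signal(signal.SIGTERM, sigterm_handler)


def handler(request):
    try:
        while True:
            time.sleep(1)
    except TerminationRequested:
        logger.info('saving request state before shutdown')
        # save the state you care about here
    return HttpResponse('hallo')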

Related

How do I prevent python logging module from raising errors stopping execution on downtime, permanently retry and buffer events

I have a Python process whose output is logged, among other handlers, to a syslog server over TCP. The syslog server may incur some downtime from time to time. I need the script to run regardless of the logging mechanism. Are there any options or commonly used (or built-in) libraries for buffering the logs and retrying ad infinitum that I may be missing, or do I need to write a custom wrapper class for my logging that handles the buffering and retrying?
The issue arises when I stop the syslog server, in which case any "logger" statement raises an error and stops the script execution.
import logging
import socket
from logging.handlers import SysLogHandler
...
logger = logging.getLogger()
handler = SysLogHandler(address=syslog_address, socktype=socket.SOCK_STREAM)
logger.addHandler(handler)
...
logger.info("Some statements all over my code I want logged and buffered if possible but I do not want to raise exceptions stopping execution and I don't want to repeat myself wrapping them all in try/except blocks")
The built-in that Python offers for this is QueueHandler. What you do is move the SysLogHandler to a separate thread (or process) behind a QueueListener, and replace it with a QueueHandler in the application. This way you insulate your app from failures caused by syslog, and queued messages are automatically buffered. Implementing infinite retry is also pretty easy with a queue: just put failed records back.
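A sketch of that setup (the syslog address here is a placeholder):

import logging
import queue
import socket
from logging.handlers import QueueHandler, QueueListener, SysLogHandler

log_queue = queue.Queue(-1)  # unbounded, so records are buffered in memory

# The application only talks to the queue; logger calls never block or fail
# because of syslog problems.
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(QueueHandler(log_queue))

# The listener runs in its own thread and forwards records to syslog.
syslog_handler = SysLogHandler(address=('localhost', 514),
                               socktype=socket.SOCK_STREAM)
listener = QueueListener(log_queue, syslog_handler)
listener.start()

logger.info("goes through the queue; syslog downtime no longer stops the app")

# at shutdown:
listener.stop()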

How do I add an errback to deferLater?

Consider the following twisted code, using deferLater:
import random
from twisted.internet.task import deferLater
from twisted.internet import reactor
def random_exception(msg='general'):
if random.random() < 0.5:
raise Exception("Random exception with 50%% likelihood occurred in %s!" % msg)
def dolater():
random_exception('dolater')
print "it's later!"
def whoops(failure):
failure.trap(Exception)
print failure
defer = deferLater(reactor, 10, dolater)
defer.addErrback(whoops)
reactor.run()
An exception is raised during the 10 second sleep (namely a KeyboardInterrupt); however, it seems that the whoops method is never called. My assumption is that since I add the errback after the deferred kicks off, it's never properly registered. Advice appreciated.
EDIT:
Alright, no one likes my use of the signal (not the exception) KeyboardInterrupt to show an error condition outside of the defer. I thought pretty hard about an actual exception that might occur outside of the defer callback, but couldn't think of a particularly good one; most everything would be some kind of signal (or developer error), so signal handling is fine for now, but that wasn't really the heart of the question.
As I understand it, Twisted's callback/errback system handles errors within the callback structure, e.g. if dolater raises an Exception of some kind. To demonstrate this, I added an exception that can occur during dolater, which shows that if the exception occurs in dolater, the errback handles it just fine.
My concern was what happens if something goes wrong while the reactor is just reacting normally, and the only thing I could get to go wrong was a keyboard interrupt; in that case I wanted whoops to fire. It appears that if I put other async events into the reactor and raise exceptions from there, the dolater code wouldn't be affected, and I would have to add errbacks to those other async events. There is no master error handling for an entire Twisted program.
So signals it is, until I can find some way to cause the reactor to fail without a signal.
If by KeyboardInterrupt you mean a signal (ctrl-c, SIGINT, etc), then what you need to do is setup a signal handler with your whoops function as the callback.
By following two previous answers from #jean-paul-calderone ("twisted: catch keyboardinterrupt and shutdown properly" and "twisted - interrupt callback via KeyboardInterrupt"), I tried the following, and I think it matches your need:
import signal

from twisted.internet import reactor, task


def dolater():
    print "it's later!"


def whoops(signal_num, stackframe):
    print "I'm here because of signal number " + str(signal_num)
    reactor.stop()


defer = task.deferLater(reactor, 10, dolater)
signal.signal(signal.SIGINT, whoops)
reactor.run()
That will call whoops on a SIGINT. I put a reactor.stop() in whoops because otherwise the reactor would just keep on running; take that out if you really want it to keep running in the face of a ctrl-c.
Note: I'm not explicitly showing how to fire an errback from the signal handler because (at least to my understanding) that doesn't really map to how Deferreds should be used. I imagine that if you found a way to get the Deferred into the signal handler you could fire its errback, but I think that's outside the expected use case for Twisted and may have crazy consequences.
The problem is with the actual exception you're trying to catch: KeyboardInterrupt is not a subclass of Exception, so it cannot be caught that way. If you just change the line:
failure.trap(Exception)
into:
failure.trap(KeyboardInterrupt)
it surely would catch it. More on Python's exception hierarchy can be found in the official Python docs: https://docs.python.org/2/library/exceptions.html
Twisted is a library for doing many things concurrently. The things are kept as isolated as possible (given that this is still Python, there's still global state, etc).
If you have a TCP server with two clients connected to it and one of them sends you some bad data that triggers a bug in your parser, leading to an exception being raised, that exception isn't going to cause the other client to receive any error. Nor would you want it to, I hope (at least not automatically).
Similarly, if you have a client connected to your server and you start a delayed call with deferLater and the client triggers that bug, you wouldn't want the error to be delivered to the errback on the Deferred returned by deferLater.
The idea here is that separate event sources are generally treated separately (until you write some code that glues them together somehow).
For the ten seconds that pass between when you call deferLater and when Twisted begins to run the function you passed to it, any errors that happen (including you hitting C-c on your keyboard to make Python raise a KeyboardInterrupt) aren't associated with that delayed call, and they won't be delivered to the errback you attach to its Deferred.
Only exceptions raised by your dolater function will cause the errback chain of that Deferred to begin execution.

Proper signal order to safely stop process

I have a long-running Python process that I want to be able to terminate in the event it gets hung up and stops reporting progress. But I want to signal it in a way that allows it to safely clean up, in case it hasn't completely hung up and there's still something running that can respond to signals gracefully. What's the best order of signals to send before outright killing it?
I'm currently doing something like:
import os
import time
from signal import SIGTERM, SIGABRT, SIGINT, SIGKILL


def safe_kill(pid):
    for sig in [SIGTERM, SIGABRT, SIGINT, SIGKILL]:
        os.kill(pid, sig)
        time.sleep(1)
        if not pid_exists(pid):  # pid_exists() is a helper defined elsewhere, e.g. psutil.pid_exists
            return
Is there a better order? I know SIGKILL bypasses the process entirely, but is there any significant difference between SIGTERM/SIGABRT/SIGINT or do they all have the same effect as far as Python is concerned?
I believe the proper way to stop a process is SIGTERM followed by SIGKILL after a small timeout.
I don't think that SIGINT and SIGABRT are necessary if the process handles signals in a standard way. SIGINT is usually handled the same way as SIGTERM, and SIGABRT is usually used by the process itself on abort() (wikipedia).
Anything more complex than a small script usually implements custom SIGTERM handling to shut down gracefully (cleaning up all the resources, etc.).
For example, take a look at Upstart. It is an init daemon: it starts and stops most processes in Ubuntu and some other distributions. The default Upstart behavior for stopping a process is to send SIGTERM, wait 5 seconds and send SIGKILL (source: upstart cookbook).
You probably should do some testing to determine the best timeout for your process.
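Along those lines, a minimal variant of safe_kill that only uses SIGTERM and SIGKILL (the 5 second grace period is just an example, and os.kill(pid, 0) is used to check for existence instead of the pid_exists helper above):

import os
import time
from signal import SIGTERM, SIGKILL


def safe_kill(pid, grace_period=5.0):
    # Ask politely first, giving the process a chance to clean up.
    os.kill(pid, SIGTERM)
    deadline = time.time() + grace_period
    while time.time() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks whether the pid still exists
        except ProcessLookupError:
            return  # the process exited on its own
        time.sleep(0.1)
    # Still alive after the grace period: force it.
    os.kill(pid, SIGKILL)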
You need to register a signal handler, as you would do in C.
import signal
import sys


def clean_termination(signum, frame):
    # perform your cleanup
    sys.exit(1)


# register the signal handler for the signals specified in the question
signal.signal(signal.SIGTERM, clean_termination)
signal.signal(signal.SIGABRT, clean_termination)
Note that Python maps the SIGINT signal to a KeyboardInterrupt exception, which you can catch with a regular except statement.
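For example (a trivial sketch):

import time

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    # Ctrl-C (SIGINT) ends up here instead of going through signal.signal()
    print("interrupted, cleaning up")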

Is Pyro signal safe?

I have been using Pyro 3 for a little while now, with great success, but I have occasionally noticed that when a signal such as SIGHUP or SIGINT arrives while Pyro is doing some remote communication, the process hangs. Hence the question: is Pyro signal safe?
Thanks in advance.
It seems the issue here is that by default Python sets up handlers for SIGINT and SIGTERM which raise exceptions. If you receive a signal while doing some Pyro comms, the exception is raised and off it goes looking for an appropriate except clause, without finishing what it was doing. If you then try to use Pyro again, for example in the except/finally clause, you can get issues. In my case it was sending some messages from finally to a log via a queue which was proxied to another process using Pyro.
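One way to avoid this (a sketch of my own, not something built into Pyro) is to install handlers that merely record the signal, and to check that flag between Pyro calls, so an in-flight remote call is never interrupted:

import signal

shutdown_requested = False


def request_shutdown(signal_num, frame):
    # Only record the request; don't raise in the middle of a Pyro call.
    global shutdown_requested
    shutdown_requested = True


signal.signal(signal.SIGINT, request_shutdown)
signal.signal(signal.SIGHUP, request_shutdown)


def do_some_remote_work():
    pass  # stands in for the actual Pyro proxy calls


while not shutdown_requested:
    do_some_remote_work()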

How to shutdown cherrypy from within?

I am developing on CherryPy, which I start from a Python script.
For better development I wonder what the correct way is to stop CherryPy from within the main process (and not from the outside with ctrl-c or SIGTERM).
I assume I have to register a callback function from the main application to be able to stop the cherrypy main process from a worker thread.
But how do I stop the main process from within?
import sys

import cherrypy


class MyCherryPyApplication(object):

    def default(self):
        sys.exit()
    default.exposed = True


cherrypy.quickstart(MyCherryPyApplication())
Putting a sys.exit() in any request handler exits the whole server.
I would have expected this to only terminate the current thread, but it terminates the whole server, which is what I wanted.
