So I have a Flask application which I've been running on Flask's built-in server, and I am ready to move it to production. This application manages several child processes. Up to this point, I've been handling graceful shutdown using signals. In particular, one shutdown mode I've used is that sending a SIGHUP to the Flask server causes the application to propagate that signal to its children (so they can shut down gracefully), after which the application process itself shuts down.
In production, we're planning on using mod_wsgi. I've read that wsgi applications really shouldn't be handling signals.
So my question is: how should I achieve the following behavior with this setup?
When Apache receives a SIGTERM, it notifies the WSGI daemons before terminating them
The WSGI daemons are given a chance to do some cleanup on their own before shutting down
Send SIGTERM to the Apache parent process, and that is more or less what happens now.
What happens is that when the Apache parent process receives SIGTERM, it in turn sends SIGTERM to all of its child worker processes, as well as to the managed mod_wsgi daemon processes if using daemon mode. Those sub processes will stop accepting new requests and will be given up to 3 seconds to complete existing requests before they are forcibly shut down.
So the default behaviour of SIGTERM is to allow a bit of time for requests to complete, but long-running requests will not be allowed to hold up the complete server shutdown. How long it waits for sub processes to shut down is not configurable and is fixed at 3 seconds.
Instead of SIGTERM, you can send a SIGWINCH signal. This will cause Apache to do a graceful stop, but that has its own issues.
What happens in the case of SIGWINCH is that Apache will again send SIGTERM to its child worker processes, but instead of forcibly killing off the processes after 3 seconds, it will allow them to run at least until any active requests have completed.
A problem with this is that there is no failsafe. If those requests never finish, there is no timeout that I know of which will see the child worker processes forcibly shut down. As a result, your server could end up hanging on shutdown.
A second issue is that Apache will still forcibly kill off the managed mod_wsgi daemon processes after 3 seconds, and there isn't (or wasn't the last time I looked) a way to override how Apache manages those processes to enable a more graceful shutdown of the managed daemon processes. So the graceful stop signal doesn't change anything when using daemon mode.
The closest you can get to a graceful stop is, at a front end routing layer, to divert new traffic away from the Apache instance. Then, through some mechanism, trigger on the host running Apache a script which sends a SIGUSR2 to the mod_wsgi daemon processes. Presuming you have set the graceful-timeout option on the daemon process group to some adequate failsafe value, this will result in the daemon processes exiting once all active requests finish. If the timeout expires first, then each process will go into its normal shutdown sequence of not accepting new requests from the Apache child worker processes, and after the shutdown-timeout (default 5 seconds) fires, if requests are still not complete, the process is forcibly shut down.
In this case, it isn't actually shutting down the processes, but causing them to exit, which will result in them being replaced, as we aren't telling the whole Apache instance to stop, but just telling the mod_wsgi daemon processes to do a graceful restart. In this situation, unless you monitor the set of daemon processes and know when they have all restarted, you don't have a clear indication that they are all done, so that you can then shut down the whole Apache instance.
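For illustration, a rough sketch of that sequence; the process group name myapp, the display-name setting used so the daemons can be found by name, and the pkill pattern are all assumptions on my part, not anything mod_wsgi mandates:

# Apache configuration (sketch): name the daemon processes and give them a
# graceful-timeout as the failsafe described above.
WSGIDaemonProcess myapp processes=2 threads=15 display-name=%{GROUP} graceful-timeout=30 shutdown-timeout=5

# On the host, once the routing layer has stopped sending traffic to this instance,
# send SIGUSR2 to the daemon processes to trigger the graceful restart.
pkill -USR2 -f 'wsgi:myapp'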
So it is a little bit fiddly to do, and it is hard for any server to handle this in a nice generic way, as what is appropriate really depends on the hosted application and its requirements.
The question is whether you really need to go to these lengths. Requests will inevitably fail anyway and users have to deal with that, so often the interruption of a handful of requests on a restart is not a big deal. What is so special about the application that you need to set a higher bar and attempt to ensure that zero requests are interrupted?
Since mod_wsgi 4.8.0 you can also do the following:
import mod_wsgi

def shutdown_handler(event, **kwargs):
    # do whatever you want on shutdown here
    pass

mod_wsgi.subscribe_shutdown(shutdown_handler)
Related
So I have a Flask API running as a systemd service on a piece of battery-powered hardware (used to control other hardware). I have a bunch of state that I need to save, and in case something goes wrong, like a power outage, I need to be able to restore that state.
Right now I save the state as JSON files so I can load them (if they exist) on startup. But I'd also need to be able to remove them again in case it gets the shutdown signal.
I saw somewhere I could set KillSignal to SIGINT and handle the shutdown as a keyboard interrupt. Or something about ExecStop. Would that be enough, or is there a better way to handle such a scenario?
If you look at the shutdown logs of a Linux system you'll see 'sending sigterm to all processes... sending sigkill to all processes'. In a normal shutdown, processes get a few seconds' grace before being killed. So if you trap SIGTERM you can run your shutdown code, but it had better finish before the untrappable SIGKILL comes along. Since SIGTERM is always sent to kill a running process, trapping it is indeed the Right Way (TM) to clean up on exit. But since you are using systemd services, you could also do the cleanup in the service itself.
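As a minimal sketch of the trapping approach (the state file path here is made up, and KillSignal is assumed to be left at the systemd default of SIGTERM):

import os
import signal
import sys

STATE_FILE = "/var/lib/myapp/state.json"  # hypothetical path to the saved state

def handle_sigterm(signum, frame):
    # Clean shutdown was requested: the saved state is no longer needed,
    # so remove it, then exit before systemd follows up with SIGKILL.
    try:
        os.remove(STATE_FILE)
    except FileNotFoundError:
        pass
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

If you would rather keep this out of the application, the same cleanup could instead live in an ExecStopPost= command in the service unit.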
In my setup I am using Gunicorn for my deployment on a single CPU machine, with three worker processes. I came to ask this question from this answer: https://stackoverflow.com/a/53327191/10268003 . I have found that it takes up to one and a half seconds to send mail, so I was trying to send email asynchronously. I am trying to understand what will happen to the worker process started by Gunicorn that starts a new thread to send the mail: will the process get blocked until the mail-sending thread finishes? In that case I believe my application's throughput will decrease. I did not want to use Celery because it seems overkill to set up Celery just for sending emails. I am currently running two containers on the same machine, with three Gunicorn workers each, on my development machine.
Below is the approach in question; the only difference is that I will be using threading for sending mails.
import threading

from django.http import JsonResponse

from .models import Crawl

def startCrawl(request):
    task = Crawl()
    task.save()
    t = threading.Thread(target=doCrawl, args=[task.id])
    t.daemon = True
    t.start()
    return JsonResponse({'id': task.id})

def checkCrawl(request, id):
    task = Crawl.objects.get(pk=id)
    return JsonResponse({'is_done': task.is_done, 'result': task.result})

def doCrawl(id):
    task = Crawl.objects.get(pk=id)
    result = ...  # Do crawling, etc., and collect the result here
    task.result = result
    task.is_done = True
    task.save()
Assuming that you are using gunicorn's Sync (default), Gthread or Async workers, you can indeed spawn threads and gunicorn will take no notice of them or interfere with them. The threads are reused to answer following requests immediately after returning a result, not only after all spawned threads are joined again.
I have used this code to fire an independent event a minute or so after a request:
from threading import Timer

Timer(timeout, function_that_does_something, [arguments_to_function]).start()
You will find some more technical details in this other answer:
In normal operations, these Workers run in a loop until the Master either tells them to graceful shutdown or kills them. Workers will periodically issue a heartbeat to the Master to indicate that they are still alive and working. If a heartbeat timeout occurs, then the Master will kill the Worker and restart it.
Therefore, daemon and non-daemon threads that do not interfere with the Worker's main loop should have no impact. If the thread does interfere with the Worker's main loop, such as a scenario where the thread is performing work and will provide results to the HTTP Response, then consider using an Async Worker. Async Workers allow for the TCP connection to remain alive for a long time while still allowing the Worker to issue heartbeats to the Master.
I have since moved on to asynchronous, event-loop based solutions, like the uvicorn worker for gunicorn with the FastAPI framework, which provide alternatives to waiting on IO in threads.
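For reference, that switch is just a different worker class on the gunicorn command line; the module and application names below (main:app) are placeholders:

gunicorn -w 3 -k uvicorn.workers.UvicornWorker main:app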
I have a Django + mod_wsgi + Apache server. I need to change the default HTTP connection timeout. There is a Timeout directive in the Apache config, but it's not working.
How can I set this up?
I solved this problem with:
python manage.py runserver --http_timeout 120
There are a few timeout options in the mod_wsgi WSGIDaemonProcess directive (check out request-timeout):
https://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIDaemonProcess.html
inactivity-timeout=sss (2.0+)
Defines the maximum number of seconds allowed to pass before the
daemon process is shutdown and restarted when the daemon process has
entered an idle state. For the purposes of this option, being idle
means no new requests being received, or no attempts by current
requests to read request content or generate response content for the
defined period. This option exists to allow infrequently used
applications running in a daemon process to be restarted, thus
allowing memory being used to be reclaimed, with process size dropping
back to the initial startup size before any application had been
loaded or requests processed.
request-timeout=sss
Defines the maximum number of seconds that a request is allowed to run
before the daemon process is restarted. This can be used to recover
from a scenario where a request blocks indefinitely, and where if all
request threads were consumed in this way, would result in the whole
WSGI application process being blocked.
How this option is seen to behave is different depending on whether a
daemon process uses only one thread, or more than one thread for
handling requests, as set by the threads option.
If there is only a single thread, and so the process can only handle
one request at a time, as soon as the timeout has passed, a restart of
the process will be initiated.
If there is more than one thread, the request timeout is applied to
the average running time for any requests, across all threads. This
means that a request can run longer than the request timeout. This is
done to reduce the possibility of interrupting other running requests,
and causing a user to see a failure. So where there is still capacity
to handle more requests, restarting of the process will be delayed if
possible.
deadlock-timeout=sss (2.0+)
Defines the maximum number of seconds allowed to pass before the
daemon process is shutdown and restarted after a potential deadlock on
the Python GIL has been detected. The default is 300 seconds. This
option exists to combat the problem of a daemon process freezing as
the result of a rogue Python C extension module which doesn't properly
release the Python GIL when entering into a blocking or long running
operation.
shutdown-timeout=sss
Defines the maximum number of seconds allowed to pass when waiting for
a daemon process to gracefully shutdown as a result of the maximum
number of requests or inactivity timeout being reached, or when a user
initiated SIGINT signal is sent to a daemon process. When this timeout
has been reached, the daemon process will be forced to exit even if
there are still active requests or it is still running Python exit
functions. If this option is not defined, then the shutdown timeout
will be set to 5 seconds. Note that this option does not change the
shutdown timeout applied to daemon processes when Apache itself is
being stopped or restarted. That timeout value is defined internally
to Apache as 3 seconds and cannot be overridden.
...
Docs about WSGIDaemonProcess:
Using mod_wsgi daemon mode
Defining Process Groups
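Putting that together, a minimal daemon mode configuration that applies a request timeout might look like the following sketch; the process group name, paths and numbers are placeholders to adapt, not values taken from the question:

WSGIDaemonProcess myproject processes=2 threads=15 request-timeout=120 inactivity-timeout=300
WSGIProcessGroup myproject
WSGIScriptAlias / /var/www/myproject/wsgi.py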
I have an application which is stuck in a file.read call. Before it goes into a loop where this call is made it forks a child which starts a gevent WSGI server. The purpose of this setup is that I want to wait for a keystroke, send this keystroke to the child websocket server which spreads the message among other connected websocket-clients. My problem is that I don't know how to stop this thing.
If I Ctrl+C, the child server process gets the SIGINT and stops. But my parent only responds if it can read something out of its file. Isn't there something like an asynchronous handler? I also tried registering for SIGINT via signal.signal and manually sending the signal, but the signal handler was only called once something was written to the file.
BTW: I'm running Linux.
What is a good way to reduce the number of workers on a machine in Python-RQ?
According to the documentation, I need to send a SIGINT or SIGTERM command to one of the worker processes on the machine:
Taking down workers
If, at any time, the worker receives SIGINT (via Ctrl+C) or SIGTERM (via kill), the worker will wait until the currently running task is finished, stop the work loop and gracefully register its own death.
If, during this takedown phase, SIGINT or SIGTERM is received again, the worker will forcefully terminate the child process (sending it SIGKILL), but will still try to register its own death.
This seems to imply a lot of coding overhead:
Would need to keep track of the PID for the worker process
Would need to have a way to send a SIGINT command from a remote machine
Do I really need to custom build this, or is there a way to do this easily using the Python-RQ library or some other existing library?
Get all running workers using rq.Worker.all()
Select the worker you want to kill
Use os.kill(worker.pid, signal.SIGINT)
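A rough sketch of those steps, following the answer literally (this assumes the script runs on the same machine as the workers and that Redis is reachable on localhost):

import os
import signal

from redis import Redis
from rq import Worker

redis_conn = Redis()  # assumption: default local Redis instance

workers = Worker.all(connection=redis_conn)
if workers:
    worker = workers[0]  # pick whichever worker you want to take down
    # Warm shutdown: the worker finishes its current job, then stops its loop.
    os.kill(worker.pid, signal.SIGINT)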