Gunicorn Internal Server Errors - Python

I have a Gunicorn server running a Django application which has a tendency to crash quite frequently. Unfortunately, when it crashes, all the Gunicorn workers go down simultaneously and silently, bypassing Django's and django-sentry's logging. All the workers return "Internal Server Error", but the arbiter does not crash, so supervisord does not register it as a crash and thus does not restart the process.
My question is: is there a way to hook into a Gunicorn worker crash and send an email or write a log entry? Secondly, is there a way to get supervisord to restart a Gunicorn server that is returning nothing but 500s?
Thanks in advance.

I highly recommend using zc.buildout. Here is an example using the Superlance plugin for supervisord with buildout:
[supervisor]
recipe = collective.recipe.supervisor
plugins =
    superlance
...
programs =
    10 zeo ${zeo:location}/bin/runzeo ${zeo:location}
    20 instance1 ${instance1:location}/bin/runzope ${instance1:location} true
...
eventlisteners =
    Memmon TICK_60 ${buildout:bin-directory}/memmon [-p instance1=200MB]
    HttpOk TICK_60 ${buildout:bin-directory}/httpok [-p instance1 -t 20 http://localhost:8080/]
This makes an HTTP request on every TICK_60 event (i.e. once a minute, with a 20-second timeout from -t 20) and restarts the process if the check fails.
http://pypi.python.org/pypi/collective.recipe.supervisor/0.16
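The httpok check handles the restart side from the outside. For the first part of the question (logging or emailing when a worker dies), Gunicorn's server hooks can be used from its config module. Here is a minimal sketch, assuming a local SMTP relay and placeholder addresses; this is not from the answer above, just one way to wire it up:
# gunicorn_conf.py -- sketch; addresses and SMTP host are placeholders
import logging
import smtplib
from email.mime.text import MIMEText

log = logging.getLogger("gunicorn.error")

def _notify(subject):
    # Assumption: a local SMTP relay; swap in whatever alerting you already use.
    msg = MIMEText(subject)
    msg["Subject"] = subject
    msg["From"] = "gunicorn@example.com"
    msg["To"] = "ops@example.com"
    try:
        smtplib.SMTP("localhost").sendmail(msg["From"], [msg["To"]], msg.as_string())
    except Exception:
        log.exception("Could not send crash notification")

def worker_abort(worker):
    # Called when a worker receives SIGABRT, which generally happens on timeout.
    log.error("Gunicorn worker %s aborted", worker.pid)
    _notify("Gunicorn worker %s aborted" % worker.pid)

def worker_exit(server, worker):
    # Called just after a worker has exited.
    log.error("Gunicorn worker %s exited", worker.pid)
    _notify("Gunicorn worker %s exited" % worker.pid)
Point Gunicorn at it with --config gunicorn_conf.py.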

Related

Any way to read number of active workers of gunicorn managed by the arbiter from command line?

As explained here
Gunicorn provides an optional instrumentation of the arbiter and workers using the statsD protocol over UDP.
My question: is there any way to read the number of active (i.e. currently processing a request) Gunicorn workers in real time from the command line, without installing statsd? My server's load average sometimes spikes, and I want to see how many Gunicorn workers are busy at that moment.
Gunicorn's default statsd tracking doesn't actually track active vs nonactive workers. It just tells you the total number of workers, which is not nearly as useful.
But Gunicorn does provide server hooks to let you run code at various points in the process. This is the solution I came up with:
import statsd

# domain, port and statsd_prefix come from your own settings
sc = statsd.StatsClient(domain, port, prefix=statsd_prefix)

def pre_request(worker, req):
    # Increment busy workers count
    sc.incr('busy_workers', 1)

def post_request(worker, req, environ, resp):
    # Decrement busy workers count
    sc.decr('busy_workers', 1)
Put that in your Gunicorn config file and then reference the config file when you start Gunicorn.
gunicorn myapp.wsgi:application --config myapp/gunicorn_config.py
If you don't want to use statsd, you could use those same hooks to feed any other program that can keep a count, and then watch that count on the command line.
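For example, here is a sketch of a file-based variant of that idea (the directory path is an arbitrary choice): each worker drops a marker file while it is handling a request, so counting the files gives the number of busy workers.
# gunicorn_config.py -- sketch of a non-statsd busy-worker counter
import os

BUSY_DIR = "/tmp/gunicorn_busy"

def on_starting(server):
    # Runs once in the master before the workers are forked.
    if not os.path.isdir(BUSY_DIR):
        os.makedirs(BUSY_DIR)
    for name in os.listdir(BUSY_DIR):
        os.remove(os.path.join(BUSY_DIR, name))

def pre_request(worker, req):
    # Mark this worker as busy.
    open(os.path.join(BUSY_DIR, str(worker.pid)), "w").close()

def post_request(worker, req, environ, resp):
    # Mark this worker as idle again.
    try:
        os.remove(os.path.join(BUSY_DIR, str(worker.pid)))
    except OSError:
        pass
Then a simple
watch 'ls /tmp/gunicorn_busy | wc -l'
shows the busy-worker count in real time.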
Not sure if there is any particular command for it.
But generally it can be done with the shell. The number of workers means the number of processes, so you can list all active processes, find the ones that have gunicorn in them, and count the entries. Remember to exclude the grep process itself, because its command line also contains gunicorn. Note that this counts every Gunicorn process, including the arbiter, not just busy workers.
Your command would look something like this.
ps aux | grep gunicorn | grep -v grep | wc -l

Celery worker stops after being idle for a few hours

I have a Flask app served over WSGI. For a few tasks I'm planning to use Celery with RabbitMQ. But, as the title says, I am facing an issue where the Celery tasks run for a few minutes and then, after a long period of inactivity, the worker just dies off.
Celery config:
CELERY_BROKER_URL='amqp://guest:guest@localhost:5672//'
BROKER_HEARTBEAT = 10
BROKER_HEARTBEAT_CHECKRATE = 2.0
BROKER_POOL_LIMIT = None
From this question, I added BROKER_HEARTBEAT and BROKER_HEARTBEAT_CHECKRATE.
I run the worker inside the venv with celery -A acmeapp.celery worker & to run it in the background. When I check the status during the first few minutes, it shows that one node is online and gives an OK response. But after a few hours of the app being idle, when I check the Celery status, it shows Error: No nodes replied within time constraint.
I am new to Celery and I don't know what to do now.
Your Celery worker might be trying to reconnect to the broker until it reaches the retry limit. If that is the case, setting these options in your config file will fix the problem.
BROKER_CONNECTION_RETRY = True
BROKER_CONNECTION_MAX_RETRIES = 0
The first line will make it retry whenever it fails, and the second one will disable the retry limit.
If that alone does not solve it, you can also try a higher broker connection timeout (specified in seconds) using this option:
BROKER_CONNECTION_TIMEOUT = 120
Hope it helps!
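Put together, a sketch of the relevant settings, combining the heartbeat options from the question with the retry options above (old-style uppercase names, as in the question; the broker URL is the question's default guest account):
# celeryconfig sketch
CELERY_BROKER_URL = 'amqp://guest:guest@localhost:5672//'

BROKER_HEARTBEAT = 10
BROKER_HEARTBEAT_CHECKRATE = 2.0
BROKER_POOL_LIMIT = None

BROKER_CONNECTION_RETRY = True       # retry whenever the broker connection drops
BROKER_CONNECTION_MAX_RETRIES = 0    # 0 disables the retry limit
BROKER_CONNECTION_TIMEOUT = 120      # seconds to wait when establishing a connection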

How to remote debug Flask request behind uWSGI in PyCharm

I've read some documentation online about how to do remote debugging with PyCharm - https://www.jetbrains.com/help/pycharm/remote-debugging.html
But there was one key issue with that approach for what I was trying to do with my setup: Nginx connecting to uWSGI, which then connects to my Flask app. I set up something like the following:
import sys
sys.path.append('pycharm-debug.egg')
import pydevd
pydevd.settrace('localhost', port=11211,
                stdoutToServer=True, stderrToServer=True,
                suspend=False)
print 'connected'

from wsgi_configuration_module import app
My wsgi_configuration_module.py file is the uWSGI file used in Production, i.e. no debug.
That settrace setup connects the debugger to the main/master process of uWSGI, which runs only once, at uWSGI startup/reload. But if you try to set a breakpoint in the code that handles your requests, I've found it either skips over it or hangs entirely without ever hitting it, and uWSGI shows a gateway error after the timeout.
The problem, as far as I can see, is exactly that last point: the debugger connects to the uWSGI/application master process, which is not any of the individual request processes.
To solve this, in my situation, two things needed changing, one of which is the uWSGI configuration for my app. Our production file looks something like
[uwsgi]
...
master = true
enable-threads = true
processes = 5
But here, to give the debugger (and us) an easy time to connect to the request process, and stay connected, we change this to
[uwsgi]
...
master = true
enable-threads = false
processes = 1
Make it the master, disable threads, and limit it to only 1 process - http://uwsgi-docs.readthedocs.io/en/latest/Options.html
Then, in the startup Python file, instead of having the debugger connect when the entire Flask app starts, you have it connect in a function decorated with Flask's handy before_first_request decorator (http://flask.pocoo.org/docs/0.12/api/#flask.Flask.before_first_request), so the startup script changes to something like
import sys
import wsgi_configuration_module
sys.path.append('pycharm-debug.egg')
import pydevd

app = wsgi_configuration_module.app

@app.before_first_request
def before_first_request():
    pydevd.settrace('localhost', port=11211,
                    stdoutToServer=True, stderrToServer=True,
                    suspend=False)
    print 'connected'
So now you've limited uWSGI to no threads and only 1 process, to reduce the chance of any mix-up between them and the debugger, and set pydevd to connect only before the very first request. Now the debugger connects (for me) successfully once, at the first request, prints 'connected' only once, and from then on breakpoints are hit in any of your request endpoint functions without issue.

Python daemon exits silently in urlopen

I have a Python daemon started from an init.d script. The daemon optionally reads an array of ids from a server through a REST interface; otherwise it uses an array of pre-defined ids.
logger.info("BehovsBoBoxen control system: bbb_domoticz.py starting up")
if DOMOTICZ_IN or DOMOTICZ_OUT:
#
# build authenticate string to access Domoticz server
#
p = urllib2.HTTPPasswordMgrWithDefaultRealm()
p.add_password(None, DOMOTICZ_URL, USERNAME, PASSWORD)
handler = urllib2.HTTPBasicAuthHandler(p)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
if DOMOTICZ_IN:
#
# Find all temperature sensors in Domoticz and populate sensors array
#
url= "http://"+DOMOTICZ_URL+"/json.htm?type=devices&filter=temp&used=true&order=Name"
logger.debug('Reading from %s',url)
response=urllib2.urlopen(url)
data=json.loads(response.read())
logger.debug('Response is %s',json.dumps(data, indent=4, sort_keys=True))
for i in range(len(data["result"])):
a=data["result"][i]["Description"]
ini=a.find('%room')
if ini != -1:
ini=ini+6
rIndex=int(a[ini:])
logger.info('Configure room id %s with Domoticz sensor idx: %s', rIndex, data["result"][i]["idx"])
sensors[rIndex]=data["result"][i]["idx"]
The daemon is started from an init.d script at boot. Everything works perfectly if I use the option with predefined ids, i.e. I don't use the REST interface. The daemon starts at boot, and I can stop and restart the daemon with the command
sudo service start/stop/restart
However, if I use the other option (read ids from the server), the daemon does not start at boot. In the log file I find one single line ("...bbb_domoticz.py starting up"). Hence, the daemon exits silently right after this, probably in one of the following urllib2 calls: the subsequent logger.debug('Reading...') does not show up in the log file.
But the strange thing is that if I manually start the daemon with a copy of the init.d script in my home directory, the daemon starts. If I run the init.d script from /etc/init.d, the daemon immediately exits, as it does at boot. But if I start the daemon with the script in my home directory, I can continue to start/stop/restart it with the service command.
So my takeaway from this is that something goes wrong in urllib2 unless I have managed to start the daemon once from my home directory. It puzzles me that I don't get any traceback or anything when the daemon exits.
Any idea how to nail down this problem?
Edit: Inspired by the answer to add logging to specific modules, I tried to add logging to urllib2. However, I can't figure out how to let this module use my logging handler. Help on this is appreciated.
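One way to narrow this down (a sketch, not an answer from the thread, reusing p, url and logger from the snippet above): wrap the urllib2 calls so any exception is written to the existing log file, and optionally turn on urllib2's wire-level debug output. Note that the debug output and any unhandled traceback go to stdout/stderr, which an init.d daemon typically has closed or redirected, which is one common reason daemons appear to exit silently.
# Sketch: replace the opener installation and the urlopen call with this
import json
import urllib2

handler = urllib2.HTTPBasicAuthHandler(p)
debug = urllib2.HTTPHandler(debuglevel=1)   # prints the HTTP exchange to stdout
urllib2.install_opener(urllib2.build_opener(handler, debug))

try:
    response = urllib2.urlopen(url, timeout=30)
    data = json.loads(response.read())
except Exception:
    # logger.exception() writes the full traceback through the existing
    # handler, so the reason for the silent exit ends up in the log file.
    logger.exception('urlopen failed for %s', url)
    raise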

subprocess in views.py does not work

I need a function that starts several beanstalk workers before starting to record some videos with different cameras. All of this work runs through beanstalk. As I need to start the workers before the video recording, I want to use a subprocess, but this does not work. The most curious thing is that if I run the subprocess alone in a different Python script outside of this function (in the shell), it works! This is my code (the one which is not working):
os.chdir(path_to_the_manage.py)
subprocess.call("python manage.py beanstalk_worker -w 4", shell=True)
phase = get_object_or_404(Phase, pk=int(phase_id))
cameras = Video.objects.filter(phase=phase)
###########################################################################
## BEANSTALK
###########################################################################
num_workers = 4
time_to_run = 86400
[...]
for camera in cameras:
    arg = phase_id + ' ' + settings.PATH_ORIGIN_VIDEOS + ' ' + camera.name
    beanstalk_client.call('video.startvlcserver', arg=arg, ttr=time_to_run)
I want to use the subprocess because it's annoying to have to start the beanstalk workers manually for every video recording I want to do.
Thanks in advance.
I am not quite sure that subprocess.call is what you are looking for. The issue is that subprocess.call is synchronous: it spawns the command but then blocks until it finishes, all within the context of the web request. Since a worker command like this keeps running, the call never returns; it ties up resources, and if the request times out or the user cancels, weird things could happen.
I have never used beanstalkd, but with celery (another job queue) the celeryd worker process is always running, waiting for jobs. This makes it easy to manage using supervisord (sketched below). If you look at beanstalkd deployment guides, I wouldn't be surprised if they recommend doing the same thing. That means starting your beanstalk workers outside of the context of a view.
From the command line
python manage.py beanstalk_worker -w 4
Once your beanstalkd workers are set up and running, you can send jobs to the queue using an async beanstalk api call, from your view
https://groups.google.com/forum/#!topic/django-users/Vyho8TFew2I
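If you do go the supervisord route, a sketch of a program entry that keeps the workers running outside the web process might look like this (paths and the program name are assumptions):
[program:beanstalk_workers]
command=/path/to/venv/bin/python manage.py beanstalk_worker -w 4
directory=/path/to/project
autostart=true
autorestart=true
With the workers managed that way, the view only needs to enqueue the job (the beanstalk_client.call above), not start workers.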
