Gunicorn workers are unable to restart after timeout - python

I have gunicorn serving a Django application, with Nginx as a reverse proxy and supervisord managing gunicorn.
This is the supervisord config:
command = /opt/backend/envs/backend/bin/gunicorn msd.wsgi:application --name backend --bind 13.134.82.143:8030 --workers 5 --timeout 300 --user backend --group backend --log-level info --log-file /opt/backend/logs/gunicorn.log
directory = /opt/backend/backend
user = backend
group = backend
stdout_logfile = /opt/backend/logs/supervisor.log
redirect_stderr = true
Sometimes gunicorn workers time out. When that happens, I expect gunicorn to restart the dead workers automatically.
However, the strange thing is that under heavy load, some workers cannot come back up, reporting:
Can't connect to ('13.134.82.143', 8030)
I think the workers that timed out are left behind as zombies and are still occupying the port.
What can I do in such cases?
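The hypothesis above (a dead worker still holding the port) is easy to check directly. Below is a minimal sketch, not part of the original setup: it tries to bind the address from the supervisord command and reports whether something is already listening there. The script name is hypothetical.
# check_port.py - sketch: test whether the gunicorn bind address is still held
import errno
import socket

ADDR = ('13.134.82.143', 8030)  # address from the supervisord command above

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # ignore sockets lingering in TIME_WAIT
try:
    s.bind(ADDR)
except socket.error as exc:
    if exc.errno == errno.EADDRINUSE:
        print('address already in use - some process is still bound to it')
    else:
        raise
else:
    print('address is free')
finally:
    s.close()
If the address reports as busy after the workers have supposedly died, ss -ltnp or lsof -i :8030 will show which PID is still holding it.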

Related

Python Django nginx uWsgi getting failed on specific callback endpoints

I'm running a Django web app in a Docker container, using Nginx with uWSGI.
Overall the app works just fine; it fails only on specific callback endpoints during social-app (Google, Facebook) registration.
Below is the command I use to run uWSGI:
uwsgi --socket :8080 --master --strict --enable-threads --vacuum --single-interpreter --need-app --die-on-term --module config.wsgi
Below is the endpoint where it fails (the Django allauth library):
accounts/google/login/callback/?state=..........
Below is the error message:
!! uWSGI process 27 got Segmentation Fault !!!
...
upstream prematurely closed connection while reading response header from upstream, client: ...
...
DAMN ! worker 1 (pid: 27) died :( trying respawn ...
Respawned uWSGI worker 1 (new pid: 28)
Just FYI: this works without any issues in the local Docker container, but it fails in the GCP container. It also used to work fine on GCP, so something probably broke after recent dependency updates.
Environment:
Python: 3.9.16
Django: 3.2.3
allauth: 0.44.0 (Django authentication library)
Nginx: nginx/1.23.3
uWSGI: 2.0.20
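A segfault in a worker is a crash in native code, so the Python traceback is never printed. One way to at least see where the interpreter was (a sketch, not a confirmed fix for this particular crash) is to enable the standard library's faulthandler in the WSGI entry point; when the worker receives SIGSEGV, faulthandler dumps the Python-level traceback to stderr, which ends up in the uWSGI log and usually points at the C extension involved. The settings module name below is an assumption; adjust it to the project's real one.
# config/wsgi.py - sketch: dump the Python traceback when a worker segfaults
import os
import faulthandler

faulthandler.enable()  # installs handlers for SIGSEGV/SIGABRT/SIGBUS/SIGFPE

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')  # assumed name

from django.core.wsgi import get_wsgi_application

application = get_wsgi_application()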

gunicorn with inotify is not noticing changed files with reload

I have a Flask app that I'm running for development in a Docker container, using gunicorn (v19.9.0). I've installed the inotify module to see if it's quicker for gunicorn's --reload option than the default polling method, but it doesn't seem to notice file changes.
If I start gunicorn with this:
gunicorn --reload --reload-engine=poll -b 0.0.0.0:5006 wsgi:app --workers=4 --timeout=600
then when I make a change to a Python file (like a Flask view), the logs show "Worker reloading" four times. Once the workers have rebooted, refreshing the page in my browser picks up the code changes.
But if I specify the inotify reload engine:
gunicorn --reload --reload-engine=inotify -b 0.0.0.0:5006 wsgi:app --workers=4 --timeout=600
and change the contents of files, there's no "Worker reloading" or rebooting in the logs, and when I reload a page the changes aren't present.
I feel like I've missed something obvious...?
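One thing worth ruling out before digging into gunicorn itself: kernel inotify events are often not delivered for files changed on a host-mounted volume (for example with Docker Desktop on macOS or Windows), in which case the polling engine works but the inotify engine silently sees nothing. A small sketch using the same inotify package gunicorn's reloader depends on can confirm whether events reach the container at all; the watched path is an assumption, adjust it to where the code is mounted.
# watch_test.py - sketch: run inside the container, then edit a file under APP_DIR
import inotify.adapters

APP_DIR = '/app'  # assumption: the directory the Flask code is mounted at

watcher = inotify.adapters.InotifyTree(APP_DIR)
for _, type_names, path, filename in watcher.event_gen(yield_nones=False):
    print(path, filename, type_names)
If nothing prints when files are edited from the host, gunicorn's --reload-engine=inotify cannot see the changes either, and polling is the practical option.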

uwsgi broken pipe when running it as systemd service

I am running a uwsgi/flask python app in a conda virtual environment using python 2.7.11.
I am moving from CentOS 6 to CentOS 7 and want to make use of systemd to run my app as a service. Everything (config and code) works fine if I manually call the start script for my app (sh start-foo.sh), but when I try to start it as a systemd service (sudo systemctl start foo), it starts the app and then fails right away with the following error:
WSGI app 0 (mountpoint='') ready in 8 seconds on interpreter 0x14c38d0 pid: 3504 (default app)
mountpoint already configured. skip.
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 3504)
emperor_notify_ready()/write(): Broken pipe [core/emperor.c line 2463]
VACUUM: pidfile removed.
Here is my systemd Unit file:
[Unit]
Description=foo
[Service]
ExecStart=/bin/bash /app/foo/bin/start-foo.sh
ExecStop=/bin/bash /app/foo/bin/stop-foo.sh
[Install]
WantedBy=multi-user.target
Not sure if necessary, but here are my uwsgi emperor and vassal configs:
Emperor
[uwsgi]
emperor = /app/foo/conf/vassals/
daemonize = /var/log/foo/emperor.log
Vassal
[uwsgi]
http-timeout = 500
chdir = /app/foo/scripts
pidfile = /app/foo/scripts/foo.pid
#socket = /app/foo/scripts/foo.soc
http = :8888
wsgi-file = /app/foo/scripts/foo.py
master = 1
processes = %(%k * 2)
threads = 1
module = foo
callable = app
vacuum = True
daemonize = /var/log/foo/uwsgi.log
I have tried Googling this issue but can't seem to find anything related. I suspect this has something to do with running uwsgi in a virtual environment and using systemctl to start it. I'm a systemd n00b, so let me know if I'm doing something wrong in my Unit file.
This is not a blocker because I can still start/stop my app by executing the scripts manually, but I would like to be able to run it as a service and automatically launch it on startup using systemd.
Following the instructions in uWSGI's documentation on setting up a systemd service fixed the problem.
Here is what I changed:
Removed daemonize from both the Emperor and Vassal configs (under systemd the process should stay in the foreground rather than detach itself).
Took the Unit file from the documentation and modified it slightly to work with my app:
[Unit]
Description=uWSGI Emperor
After=syslog.target
[Service]
ExecStart=/app/foo/bin/uwsgi /app/foo/conf/emperor.ini
RuntimeDirectory=uwsgi
Restart=always
KillSignal=SIGQUIT
Type=notify
StandardError=syslog
NotifyAccess=all
[Install]
WantedBy=multi-user.target

fabric kill gunicorn process only if it is running

I am very new to Fabric. In my fabfile I want to restart gunicorn, so I kill the gunicorn process first and then start it.
It looks like:
def restart_gunicorn():
    run('ps ax|grep gunicorn')
    run('pkill gunicorn')
    run('gunicorn -b 0.0.0.0:8080 %(path)s/application/wsgi &' % env)
When I run this, it gives me an error at pkill gunicorn because at the start there is no gunicorn process running. So I want a check like: if gunicorn processes are running, only then kill gunicorn; if no gunicorn processes are running, just start gunicorn.
How can I do this?
Need help. Thank you.
You can just wrap the pkill in settings(warn_only=True); a failed pkill will then only produce a warning, and the execution won't abort:
def restart_gunicorn():
    run('ps ax|grep gunicorn')
    with settings(warn_only=True):
        run('pkill gunicorn')
    run('gunicorn -b 0.0.0.0:8080 %(path)s/application/wsgi &' % env)
More info on settings context manager here: http://docs.fabfile.org/en/1.10/api/core/context_managers.html#fabric.context_managers.settings
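If you would rather skip pkill entirely when nothing is running, a variant of the same idea (a sketch for Fabric 1.x) is to probe with pgrep first; under warn_only, run() returns a string-like result with a .succeeded attribute you can branch on:
from fabric.api import env, run, settings

def restart_gunicorn():
    with settings(warn_only=True):
        result = run('pgrep -f gunicorn')
    if result.succeeded:  # at least one gunicorn process is running
        run('pkill gunicorn')
    run('gunicorn -b 0.0.0.0:8080 %(path)s/application/wsgi &' % env)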

Celery Worker is offline

I am running a Celery worker:
foreman run python manage.py celery worker -E --maxtasksperchild=1000
and celerymon:
foreman run python manage.py celerymon
as well as celerycam:
foreman run python manage.py celerycam
The Django admin shows that my worker is offline, and all tasks remain in the delayed state. I have tried killing and restarting it several times, but it never comes back online.
Here is my configuration:
BROKER_TRANSPORT = 'amqplib'
BROKER_POOL_LIMIT = 0
BROKER_CONNECTION_MAX_RETRIES = 0
BROKER_URL = os.environ.get('AMQP_URL')
CELERY_RESULT_BACKEND = 'database'
CELERY_TASK_RESULT_EXPIRES = 14400
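One way to narrow this down (a sketch, assuming the django-celery / Celery 2.x-3.x setup the commands above imply) is to ping the worker over the broker from a Django shell. If the ping comes back, the worker is alive and connected to the broker, and the "offline" status points at the event/monitoring side (celerycam) rather than the worker itself; if it doesn't, the problem is the broker connection.
# run inside: python manage.py shell
from celery.task.control import inspect  # old-style API used by django-celery

replies = inspect().ping()
print(replies)  # e.g. {'celery@hostname': {'ok': 'pong'}}; None or {} means no worker answered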
