Two Celery Processes Running - python

I am debugging an issue where every scheduled task runs twice. I saw two processes named celery. Is it normal for two celery processes to be running?
$ ps -ef | grep celery
hgarg 303 32764 0 17:24 ? 00:00:00 /home/hgarg/.pythonbrew/venvs/Python-2.7.3/hgarg_env/bin/python /data/hgarg/current/manage.py celeryd -B -s celery -E --scheduler=djcelery.schedulers.DatabaseScheduler -P eventlet -c 1000 -f /var/log/celery/celeryd.log -l INFO --pidfile=/var/run/celery/celeryd.pid --verbosity=1 --settings=settings
hgarg 307 21179 0 17:24 pts/1 00:00:00 grep celery
hgarg 32764 1 4 17:24 ? 00:00:00 /home/hgarg/.pythonbrew/venvs/Python-2.7.3/hgarg_env/bin/python /data/hgarg/current/manage.py celeryd -B -s celery -E --scheduler=djcelery.schedulers.DatabaseScheduler -P eventlet -c 1000 -f /var/log/celery/celeryd.log -l INFO --pidfile=/var/run/celery/celeryd.pid --verbosity=1 --settings=settings

There were two pairs of Celery processes, and the older pair should not have been running. Killing them all and restarting Celery seems to have fixed it. With no other recent changes, it is unlikely that anything else caused the duplicate runs.
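A check like this can be scripted before restarting. The following is only a minimal sketch, not part of the original answer: the helper name find_beat_roots and the matching rule (a command line that mentions celery and carries -B, --beat or a beat subcommand) are assumptions for illustration. It expects `ps -ef` output and ignores forked children, since a child shows the same command line as its parent.

```python
import os

# Arguments that suggest a process is running the beat scheduler (assumed rule).
BEAT_FLAGS = {"-B", "--beat", "beat"}


def find_beat_roots(ps_output):
    """Return PIDs of top-level Celery processes that appear to run beat.

    Expects `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD.
    """
    matches = {}  # pid -> ppid
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # header line or unexpected format
        args = fields[7].split()
        if os.path.basename(args[0]) == "grep":
            continue  # the grep from the pipeline itself
        runs_celery = any("celery" in arg for arg in args)
        runs_beat = any(arg in BEAT_FLAGS for arg in args)
        if runs_celery and runs_beat:
            matches[int(fields[1])] = int(fields[2])
    # Only count processes whose parent is not itself a match.
    return sorted(pid for pid, ppid in matches.items() if ppid not in matches)


# Shortened version of the ps output from the question.
sample = """hgarg 303 32764 0 17:24 ? 00:00:00 python manage.py celeryd -B -s celery -E
hgarg 307 21179 0 17:24 pts/1 00:00:00 grep celery
hgarg 32764 1 4 17:24 ? 00:00:00 python manage.py celeryd -B -s celery -E
"""
print(find_beat_roots(sample))  # -> [32764]
```

If the function returns more than one PID, more than one scheduler is probably running, which would match the symptom of every scheduled task firing twice.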

Related

gunicorn processes won't shut down

I am trying to kill the gunicorn processes on my server.
When I run kill {id}, they seem to shut down for maybe a second and then start back up.
$ ps ax | grep gunicorn
42898 ? S 0:00 /usr/bin/python3 /usr/bin/gunicorn cms_project.wsgi -b 0.0.0.0:8000 -w 1 --timeout 90
42924 ? S 0:00 /usr/bin/python3 /usr/bin/gunicorn cms_project.wsgi -b 0.0.0.0:8000 -w 1 --timeout 90
Then I run
pkill -f gunicorn
The processes go away for about a second and then start back up with new process IDs:
43170 ? S 0:00 /usr/bin/python3 /usr/bin/gunicorn cms_project.wsgi -b 0.0.0.0:8000 -w 1 --timeout 90
43171 ? S 0:00 /usr/bin/python3 /usr/bin/gunicorn cms_project.wsgi -b 0.0.0.0:8000 -w 1 --timeout 90
I have also tried killing them individually with kill.
I have also tried restarting the server, but that does not help: the gunicorn processes start up again as soon as the server is back online.

Running celery worker + beat in the same container

My Flask app consists of four containers: web app, Postgres, RabbitMQ and Celery. Since I have Celery tasks that run periodically, I am using celery beat. I've configured my docker-compose file like this:
version: '2'
services:
  rabbit:
    # ...
  web:
    # ...
  rabbit:
    # ...
  celery:
    build:
      context: .
      dockerfile: Dockerfile.celery
And my Dockerfile.celery looks like this:
# ...code up here...
CMD ["celery", "-A", "app.tasks.celery", "worker", "-B", "-l", "INFO"]
While I read in the docs that I shouldn't go to production with the -B option, I hastily added it anyway (and forgot about changing it) and quickly learned that my scheduled tasks were running multiple times. For those interested, if you do a ps aux | grep celery from within your celery container, you'll see multiple celery + beat processes running (but there should only be one beat process and however many worker processes). I wasn't sure from the docs why you shouldn't run -B in production but now I know.
So then I changed my Dockerfile.celery to:
# ...code up here...
CMD ["celery", "-A", "app.tasks.celery", "worker", "-l", "INFO"]
CMD ["celery", "-A", "app.tasks.celery", "beat", "-l", "INFO"]
Now when I start my app, the worker processes start but beat does not. When I flip those commands around so that beat is called first, beat starts but the worker processes do not. So my question is: how do I run celery worker + beat together in my container? I have combed through many articles and docs, but I'm still unable to figure this out.
EDITED
I changed my Dockerfile.celery to the following:
ENTRYPOINT [ "/bin/sh" ]
CMD [ "./docker.celery.sh" ]
And my docker.celery.sh file looks like this:
#!/bin/sh -ex
celery -A app.tasks.celery beat -l debug &
celery -A app.tasks.celery worker -l info &
However, I'm receiving the error celery_1 exited with code 0
Edit #2
I added the following blocking command to the end of my docker.celery.sh file and all was fixed:
tail -f /dev/null
Docker runs only one CMD per image: if a Dockerfile contains several CMD instructions, only the last one takes effect. The workaround is to create a shell script that starts both the worker and beat, and to use the Docker CMD to execute that script.
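The same launcher idea can be written in Python instead of a shell script. This is only a sketch under assumptions: the function name run_together and the polling interval are made up for illustration, and the two commands are the ones from the question. It keeps the container's main process in the foreground (so no tail -f /dev/null is needed) and stops everything as soon as either process exits, so the container does not keep running with only half of the pair alive.

```python
import subprocess
import time

# The two commands from the question; adjust the app path to your project.
CELERY_COMMANDS = [
    ["celery", "-A", "app.tasks.celery", "beat", "-l", "info"],
    ["celery", "-A", "app.tasks.celery", "worker", "-l", "info"],
]


def run_together(commands, poll_interval=0.5):
    """Start every command, block until one exits, then stop the rest.

    Returns the exit code of the first process that finished.
    """
    procs = []
    try:
        for cmd in commands:
            procs.append(subprocess.Popen(cmd))
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Ask the remaining processes to shut down, then reap them all.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()
```

A container entry script would end with something like sys.exit(run_together(CELERY_COMMANDS)). Forwarding signals from docker stop to the children is not handled in this sketch.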
I got it working by putting the commands in an entrypoint script as explained above, and I added &> to redirect the output to a log file.
My entrypoint.sh:
#!/bin/bash
python3 manage.py migrate
python3 manage.py migrate catalog --database=catalog
python manage.py collectstatic --clear --noinput --verbosity 0
# Start Celery Workers
celery worker --workdir /app --app dri -l info &> /log/celery.log &
# Start Celery Beat
celery worker --workdir /app --app dri -l info --beat &> /log/celery_beat.log &
python3 manage.py runserver 0.0.0.0:8000
Building on the same concept that @shahaf highlighted, I solved it using bash -c in this way:
command: bash -c "celery -A app.tasks.celery beat & celery -A app.tasks.celery worker --loglevel=debug"
You can use celery beatX for beat. It is allowed (and recommended) to have multiple beatX instances. They use locks to synchronize.
I cannot say whether it is production-ready, but it works like a charm for me (with the -B option).

Restart celery beat and worker during Django deployment

I am using celery==4.1.0 and django-celery-beat==1.1.0.
I am running gunicorn + celery + rabbitmq with Django.
These are the commands I use to start beat and the worker:
celery -A myproject beat -l info -f /var/log/celery/celery.log --detach
celery -A myproject worker -l info -f /var/log/celery/celery.log --detach
During Django deployment I do the following:
rm -f celerybeat.pid
rm -f celeryd.pid
celery -A myproject beat -l info -f /var/log/celery/celery.log --detach
celery -A myproject worker -l info -f /var/log/celery/celery.log --detach
service nginx restart
service gunicorn stop
sleep 1
service gunicorn start
I want to restart both celery beat and the worker, and this logic seems to work. But I noticed that Celery uses more and more memory with each deployment, and after several deployments I hit 100% memory use. I tried different server setups, and the problem does not seem to be related to them.
RabbitMQ may be to blame for the high memory usage. Can you safely restart RabbitMQ?
Also, can you confirm that after a restart there is the expected number of workers?
You are starting a new beat and a new worker on every deployment without stopping or killing the previous ones.
During deployment, first stop the existing processes with
kill -9 $PID
or, using the pid file,
kill -9 `cat /var/run/myProcess.pid`
Alternatively, you can just kill all the workers with
pkill -9 celery
Now you can start workers as usual.
celery -A myproject beat -l info -f /var/log/celery/celery.log --detach
celery -A myproject worker -l info -f /var/log/celery/celery.log --detach
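As a gentler alternative to kill -9, the old processes can be sent SIGTERM through their pid files, which gives a Celery worker the chance to finish its current tasks before exiting. This is a minimal sketch, not the answer's method: the helper name signal_from_pidfile is hypothetical, and it assumes the pid files are the celerybeat.pid and celeryd.pid files that the deployment script in the question removes.

```python
import os
import signal


def signal_from_pidfile(pidfile, sig=signal.SIGTERM):
    """Send `sig` to the PID stored in `pidfile`.

    Returns True if the signal was delivered, and False if the file is
    missing, does not contain a valid PID, or points at a process that
    no longer exists.
    """
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False
    if pid <= 0:
        return False  # never signal a whole process group by accident
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        return False  # stale pid file
    return True
```

In a deployment script this would be called for both celerybeat.pid and celeryd.pid before the rm -f lines, followed by a wait for the old processes to exit, and only then by the two celery ... --detach commands.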

Run Django Celery Nohup in Crontab

I want to run some tasks in django. I'm using Celery to do this. Usually, I run this command to execute the tasks:
source myvirtualenvpath/bin/activate
nohup python manage.py celeryd -E -B --loglevel=DEBUG < /dev/null &>/dev/null &
I want to do this each time the machine reboots, using a crontab. How can I do this?
Thanks
I solved it by using full paths, like this:
nohup path_to_virtual_env/bin/python path_to_project/manage.py celeryd -E -B --loglevel=DEBUG < /dev/null &>/dev/null &

Daemonize Celerybeat in Elastic Beanstalk(AWS)

I am trying to run celerybeat as a daemon in Elastic beanstalk. Here is my config file:
files:
  "/opt/python/log/django.log":
    mode: "000666"
    owner: ec2-user
    group: ec2-user
    content: |
      # Log file
    encoding: plain
  "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      # Get django environment variables
      celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
      celeryenv=${celeryenv%?}
      # Create celery configuration script
      celeryconf="[program:celeryd]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery worker -A avtotest --loglevel=INFO
      directory=/opt/python/current/app
      user=nobody
      numprocs=1
      stdout_logfile=/var/log/celery-worker.log
      stderr_logfile=/var/log/celery-worker.log
      autostart=true
      autorestart=true
      startsecs=10
      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 600
      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true
      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998
      environment=$celeryenv"
      # Create celerybeat configuration script
      celerybeatconf="[program:celerybeat]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery beat -A avtotest --loglevel=INFO
      ; remove the -A avtotest argument if you are not using an app instance
      directory=/opt/python/current/app
      user=nobody
      numprocs=1
      stdout_logfile=/var/log/celerybeat.log
      stderr_logfile=/var/log/celerybeat.log
      autostart=true
      autorestart=true
      startsecs=10
      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 600
      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true
      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=999
      environment=$celeryenv"
      # Create the celery and beat supervisord conf script
      echo "$celeryconf" | tee /opt/python/etc/celery.conf
      echo "$celerybeatconf" | tee /opt/python/etc/celerybeat.conf
      # Add configuration script to supervisord conf (if not there already)
      if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
      then
          echo "[include]" | tee -a /opt/python/etc/supervisord.conf
          echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
          echo "files: celerybeat.conf" | tee -a /opt/python/etc/supervisord.conf
      fi
      # Reread the supervisord config
      supervisorctl -c /opt/python/etc/supervisord.conf reread
      # Update supervisord in cache without restarting all services
      supervisorctl -c /opt/python/etc/supervisord.conf update
      # Start/Restart celeryd through supervisord
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
This file daemonizes both celery and celerybeat. Celery is working fine, but celerybeat is not. I don't see a celerybeat.log file being created, which I think suggests that celerybeat is not running.
Any ideas about this?
I will post more code if needed. Thanks for the help.
Your supervisord syntax is a bit off. First of all, you may need to SSH into your instance and edit the supervisord.conf file directly (vim /opt/python/etc/supervisord.conf) to fix these lines:
echo "[include]" | tee -a /opt/python/etc/supervisord.conf
echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
echo "files: celerybeat.conf" | tee -a /opt/python/etc/supervisord.conf
should be
echo "[include]" | tee -a /opt/python/etc/supervisord.conf
echo "files: celery.conf celerybeat.conf" | tee -a /opt/python/etc/supervisord.conf
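Because the deploy script only appends these lines when no [include] section exists yet, an instance that already has the old section keeps it across deployments. The following is a rough sketch of rewriting the section idempotently instead; the function name merge_include is hypothetical, and it assumes that dropping any comments inside the old [include] section is acceptable. It relies on supervisord accepting a space-separated list of files on a single files line.

```python
import re

# Matches both "files: a.conf" and "files = a.conf".
FILES_LINE = re.compile(r"^files\s*[:=]\s*(.*)$")


def merge_include(conf_text, filenames):
    """Return conf_text with one [include] section listing all files once."""
    kept, files, in_include = [], [], False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_include = stripped == "[include]"
            if in_include:
                continue  # the section header is re-emitted at the end
        if in_include:
            match = FILES_LINE.match(stripped)
            if match:
                files.extend(match.group(1).split())
            continue  # drop the old section body
        kept.append(line)
    # Preserve order while removing duplicates.
    ordered = list(dict.fromkeys(files + list(filenames)))
    kept.extend(["[include]", "files = " + " ".join(ordered)])
    return "\n".join(kept) + "\n"


broken = "[supervisord]\n[include]\nfiles: celery.conf\nfiles: celerybeat.conf\n"
print(merge_include(broken, ["celery.conf", "celerybeat.conf"]))
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to call on every deployment.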
EDIT:
To run celerybeat and make sure that it runs only ONCE across all your machines, place these lines in your config file:
04_killotherbeats:
  command: "ps auxww | grep 'celery beat' | awk '{print $2}' | sudo xargs kill -9 || true"
05_restartbeat:
  command: "supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat"
  leader_only: true
