How do I kill the workers when I reboot the server and get the same effect as the following statement:
pkill -9 -f 'celery worker'
From the celery documentation:
If the worker won’t shut down after considerate time, for being stuck in an infinite-loop or similar, you can use the KILL signal to force terminate the worker:
But I am running Celery as a systemd service, started with the following unit file:
[Unit]
Description=Celery Service
After=network.target
[Service]
Type=forking
User=dsangvikar
Group=www-data
EnvironmentFile=-/etc/default/celery
WorkingDirectory=/home/dsangvikar/apps/msbot/
ExecStart=/home/dsangvikar/apps/msbot/msbotenv/bin/celery multi start \
-A microsoftbotframework.runcelery.celery chatbotworker --concurrency=4 \
--workdir=/home/dsangvikar/apps/msbot/ --logfile=/var/log/celery/%n.log \
--pidfile=/var/run/celery/%n.pid
ExecStop=/home/dsangvikar/apps/msbot/msbotenv/bin/celery multi stopwait
RuntimeDirectory=celery
[Install]
WantedBy=multi-user.target
When I run sudo systemctl status celery, I get the status and PID, and I use that PID to kill the service. But the worker processes don't exit. I want to force-kill them. My program logic recreates them on every server restart, but since they are not killed even on a system reboot, all kinds of problems are cropping up.
Celery multi is not supposed to be used for production.
This is the supervisord config I'm using. It starts 10 main processes with 2 pool workers each, so 20 worker processes in total:
[program:celery_worker]
numprocs=10
process_name=%(program_name)s-%(process_num)s
directory=/opt/worker/main
environment=PATH="/opt/worker/main/bin:%(ENV_PATH)s"
command=/opt/worker/main/bin/celery worker -n worker%(process_num)s.%%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E
stdout_logfile=/var/log/celery/%(program_name)s-%(process_num)s.log
user=username
autostart=true
autorestart=true
startretries=99999
startsecs=10
stopsignal=TERM
stopasgroup=false
stopwaitsecs=7200
killasgroup=true
If you have jobs running, you don't want to send the TERM signal to the PoolWorker processes, as that will cause the jobs to abort early. What you really want is to send TERM to the MainProcess, which waits for the running jobs to finish and then shuts down.
So you want to stop only the main processes, and if it comes down to a kill, you want to kill as a group.
Use this command to start the worker shutdown. If the workers fail to exit within the stopwaitsecs time in the supervisor config, a KILL signal is sent, and it kills everything because killasgroup is set to true.
sudo supervisorctl stop celery_worker:*
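To make the TERM-then-KILL sequence concrete, here is a minimal Python sketch of the same idea. It is not how supervisord is implemented; the function names and the stand-in "worker" are hypothetical, and it assumes a POSIX system. It sends TERM to the main process only, waits up to a timeout, and only then sends KILL to the whole process group.

```python
import os
import signal
import subprocess
import sys


def stop_then_kill(proc, stop_wait_secs):
    """Send TERM to the main process only, wait, then KILL its whole group."""
    proc.send_signal(signal.SIGTERM)           # like stopsignal=TERM, stopasgroup=false
    try:
        proc.wait(timeout=stop_wait_secs)      # like stopwaitsecs
        return "exited after TERM"
    except subprocess.TimeoutExpired:
        # like killasgroup=true: the KILL goes to every process in the group
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        proc.wait()
        return "killed as group"


def spawn(ignore_term):
    """Start a stand-in 'worker' in its own process group (hypothetical helper)."""
    body = "import signal, time\n"
    if ignore_term:
        body += "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    body += "print('ready', flush=True)\ntime.sleep(60)\n"
    proc = subprocess.Popen([sys.executable, "-c", body],
                            stdout=subprocess.PIPE, start_new_session=True)
    proc.stdout.readline()  # wait until the child has set up its signal handling
    return proc


if __name__ == "__main__":
    print(stop_then_kill(spawn(ignore_term=False), stop_wait_secs=5))
    print(stop_then_kill(spawn(ignore_term=True), stop_wait_secs=1))
```

The second call stands in for a stuck worker: it ignores TERM, so the timeout expires and the group kill takes over.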
Example of what the supervisord config above starts.
username 1659 1.1 0.2 119796 45632 ? S 10:45 0:06 [celeryd: celery@worker7.hostname:MainProcess] -active- (worker -n worker7.%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E)
username 1662 1.1 0.2 119804 45716 ? S 10:45 0:06 [celeryd: celery@worker6.hostname:MainProcess] -active- (worker -n worker6.%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E)
username 1663 1.2 0.2 119724 45412 ? S 10:45 0:06 [celeryd: celery@worker5.hostname:MainProcess] -active- (worker -n worker5.%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E)
username 1666 1.1 0.2 119732 45524 ? S 10:45 0:05 [celeryd: celery@worker4.hostname:MainProcess] -active- (worker -n worker4.%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E)
username 1671 1.2 0.2 119792 45724 ? S 10:45 0:06 [celeryd: celery@worker3.hostname:MainProcess] -active- (worker -n worker3.%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E)
username 1674 1.2 0.2 119792 45420 ? S 10:45 0:06 [celeryd: celery@worker2.hostname:MainProcess] -active- (worker -n worker2.%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E)
username 1678 1.1 0.2 119712 45708 ? S 10:45 0:05 [celeryd: celery@worker1.hostname:MainProcess] -active- (worker -n worker1.%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E)
username 1679 1.2 0.2 119808 45476 ? S 10:45 0:06 [celeryd: celery@worker0.hostname:MainProcess] -active- (worker -n worker0.%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E)
username 1680 1.1 0.2 119796 45512 ? S 10:45 0:05 [celeryd: celery@worker9.hostname:MainProcess] -active- (worker -n worker9.%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E)
username 1681 1.1 0.2 119720 45736 ? S 10:45 0:06 [celeryd: celery@worker8.hostname:MainProcess] -active- (worker -n worker8.%h --app=python --time-limit=3600 -c 2 -Ofair -l debug --config=celery_config -E)
username 1796 0.0 0.2 118160 39660 ? S 10:45 0:00 [celeryd: celery@worker9.hostname:PoolWorker-1]
username 1797 0.0 0.2 118232 39548 ? S 10:45 0:00 [celeryd: celery@worker8.hostname:PoolWorker-1]
username 1798 0.0 0.2 118152 39532 ? S 10:45 0:00 [celeryd: celery@worker3.hostname:PoolWorker-1]
username 1799 0.0 0.2 118156 39652 ? S 10:45 0:00 [celeryd: celery@worker2.hostname:PoolWorker-1]
username 1800 0.0 0.2 118168 39748 ? S 10:45 0:00 [celeryd: celery@worker7.hostname:PoolWorker-1]
username 1801 0.0 0.2 118164 39608 ? S 10:45 0:00 [celeryd: celery@worker6.hostname:PoolWorker-1]
username 1802 0.0 0.2 118192 39768 ? S 10:45 0:00 [celeryd: celery@worker1.hostname:PoolWorker-1]
username 1803 0.0 0.2 118200 39728 ? S 10:45 0:00 [celeryd: celery@worker5.hostname:PoolWorker-1]
username 1804 0.0 0.2 118168 39756 ? S 10:45 0:00 [celeryd: celery@worker0.hostname:PoolWorker-1]
username 1805 0.0 0.2 118188 39692 ? S 10:45 0:00 [celeryd: celery@worker4.hostname:PoolWorker-1]
username 1806 0.0 0.2 118152 39536 ? S 10:45 0:00 [celeryd: celery@worker3.hostname:PoolWorker-2]
username 1807 0.0 0.2 118232 39544 ? S 10:45 0:00 [celeryd: celery@worker8.hostname:PoolWorker-2]
username 1808 0.0 0.2 118164 39608 ? S 10:45 0:00 [celeryd: celery@worker6.hostname:PoolWorker-2]
username 1809 0.0 0.2 118200 39732 ? S 10:45 0:00 [celeryd: celery@worker5.hostname:PoolWorker-2]
If you want stops to happen instantly then set stopwaitsecs to 1.
lpiner@hostname:~$ sudo supervisorctl status
celery_worker:celery_worker-0 RUNNING pid 2488, uptime 0:00:48
celery_worker:celery_worker-1 RUNNING pid 2487, uptime 0:00:48
celery_worker:celery_worker-2 RUNNING pid 2486, uptime 0:00:48
celery_worker:celery_worker-3 RUNNING pid 2485, uptime 0:00:48
celery_worker:celery_worker-4 RUNNING pid 2484, uptime 0:00:48
celery_worker:celery_worker-5 RUNNING pid 2483, uptime 0:00:48
celery_worker:celery_worker-6 RUNNING pid 2482, uptime 0:00:48
celery_worker:celery_worker-7 RUNNING pid 2481, uptime 0:00:48
celery_worker:celery_worker-8 RUNNING pid 2490, uptime 0:00:48
celery_worker:celery_worker-9 RUNNING pid 2489, uptime 0:00:48
lpiner@hostname:~$ sudo supervisorctl stop celery_worker:*
celery_worker:celery_worker-7: stopped
celery_worker:celery_worker-6: stopped
celery_worker:celery_worker-5: stopped
celery_worker:celery_worker-4: stopped
celery_worker:celery_worker-3: stopped
celery_worker:celery_worker-2: stopped
celery_worker:celery_worker-1: stopped
celery_worker:celery_worker-0: stopped
celery_worker:celery_worker-9: stopped
celery_worker:celery_worker-8: stopped
lpiner@hostname:~$ sudo supervisorctl status
celery_worker:celery_worker-0 STOPPED Aug 02 11:17 AM
celery_worker:celery_worker-1 STOPPED Aug 02 11:17 AM
celery_worker:celery_worker-2 STOPPED Aug 02 11:17 AM
celery_worker:celery_worker-3 STOPPED Aug 02 11:17 AM
celery_worker:celery_worker-4 STOPPED Aug 02 11:17 AM
celery_worker:celery_worker-5 STOPPED Aug 02 11:17 AM
celery_worker:celery_worker-6 STOPPED Aug 02 11:17 AM
celery_worker:celery_worker-7 STOPPED Aug 02 11:17 AM
celery_worker:celery_worker-8 STOPPED Aug 02 11:17 AM
celery_worker:celery_worker-9 STOPPED Aug 02 11:17 AM
More than a week ago, I ran nohup python3 -u script.py on an Ubuntu Beowulf cluster I was connected to via SSH. I've now gone back wanting to kill off these processes (the program uses multiprocessing with a Pool object), but I haven't been able to do so, as I can't find the PIDs. I know that the processes are still running because nohup.out is still being appended to and other data is being generated, but nothing relevant seems to appear when I run commands like ps or top. For example, when I run ps -x -U mkarrmann, I get:
PID TTY STAT TIME COMMAND
1296920 ? Ss 0:00 /lib/systemd/systemd --user
1296929 ? S 0:00 (sd-pam)
1296937 ? Ssl 0:00 /usr/bin/pulseaudio --daemonize=no --log-target=journal
1296939 ? SNsl 0:00 /usr/libexec/tracker-miner-fs
1296944 ? Ss 0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
1296945 ? R 0:00 sshd: mkarrmann@pts/0
1296960 ? Ssl 0:00 /usr/libexec/gvfsd
1296965 ? Sl 0:00 /usr/libexec/gvfsd-fuse /run/user/3016/gvfs -f -o big_writes
1296972 ? Ssl 0:00 /usr/libexec/gvfs-udisks2-volume-monitor
1296979 pts/0 Ss 0:00 -bash
1296980 ? Ssl 0:00 /usr/libexec/gvfs-gphoto2-volume-monitor
1296987 ? Ssl 0:00 /usr/libexec/gvfs-afc-volume-monitor
1296992 ? Ssl 0:00 /usr/libexec/gvfs-mtp-volume-monitor
1297001 ? Ssl 0:00 /usr/libexec/gvfs-goa-volume-monitor
1297005 ? Sl 0:00 /usr/libexec/goa-daemon
1297014 ? Sl 0:00 /usr/libexec/goa-identity-service
1297126 pts/0 R+ 0:00 ps -x -U mkarrmann
Or when I run ps -faux | grep py, I get:
root 975 0.0 0.0 34240 8424 ? Ss Jul28 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 1046 0.0 0.3 476004 245516 ? Ss Jul28 66:45 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
root 1275 0.0 0.0 20612 7732 ? S Jul28 0:00 \_ /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
mkarrma+ 1297143 0.0 0.0 6380 736 pts/0 S+ 14:40 0:00 \_ grep --color=auto py
Do any of these actually correspond to my Python processes and I'm just missing it? Anything else that I should try? I feel like the only thing I haven't tried is manually parsing through /proc, but that obviously shouldn't be necessary so I'm sure I'm missing something else.
I'm happy to provide any additional information that could be helpful. Thanks!
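If it does come down to looking through /proc by hand, a small Python sketch along these lines could do the scan. It assumes the usual Linux /proc layout, where each numeric directory is a PID and cmdline holds the NUL-separated arguments. The function name find_processes is made up for illustration, and the scan only sees processes on the node where it runs, not on other nodes of the cluster.

```python
import os


def find_processes(needle, proc_root="/proc"):
    """Return (pid, command line) for each process whose command line contains needle."""
    matches = []
    if not os.path.isdir(proc_root):
        return matches  # no /proc on this system
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited in the meantime or is not readable
        # Arguments are separated by NUL bytes in /proc/<pid>/cmdline.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if needle in cmdline:
            matches.append((int(entry), cmdline))
    return sorted(matches)


if __name__ == "__main__":
    for pid, cmdline in find_processes("script.py"):
        print(pid, cmdline)
```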
I'm managing Nginx from a custom web app (Python) running under Apache on a Debian 10 system.
It works fine: I can restart, reload, stop, and check the syntax of Nginx without issue.
The problem arises when Nginx is started via the custom web app (Apache): if I then restart or stop Apache via "service apache2 restart" or "/etc/init.d/apache2 restart", Nginx gets stopped too.
I start Nginx with Python's subprocess module:
subprocess.Popen(['sudo', '/opt/waf/nginx/sbin/nginx'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
This works, but if I manually restart or stop Apache, the Nginx service is stopped too.
I tried changing nginx.conf from "user www-data" to "user root" to see whether changing the user would solve this, but the problem persists.
Output of ps aux --forest with "user www-data" in nginx.conf:
root 2637 0.0 0.2 20312 7984 ? Ss 15:20 0:00 /usr/sbin/apache2 -k start
www-data 2638 2.4 1.9 461320 59732 ? Sl 15:20 0:09 \_ /usr/sbin/apache2 -k start
www-data 2639 0.0 0.9 2014312 28572 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
www-data 2640 0.0 0.9 2014328 28528 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
root 3322 0.0 1.9 190476 59156 ? Ss 15:27 0:00 nginx: master process /opt/waf/nginx/sbin/nginx
www-data 3323 0.0 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3324 0.0 2.8 212248 85232 ? S 15:27 0:00 \_ nginx: worker process
www-data 3325 0.0 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3326 0.5 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3327 0.0 2.1 190556 64440 ? S 15:27 0:00 \_ nginx: cache manager process
www-data 3328 0.0 2.1 190556 64440 ? S 15:27 0:00 \_ nginx: cache loader process
Output of ps aux --forest with "user root" in nginx.conf:
root 2637 0.0 0.2 20312 7984 ? Ss 15:20 0:00 /usr/sbin/apache2 -k start
www-data 2638 5.2 1.5 455400 47264 ? Sl 15:20 0:01 \_ /usr/sbin/apache2 -k start
www-data 2639 0.2 0.8 2014236 24608 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
www-data 2640 0.1 0.7 2014328 23160 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
root 2758 0.0 1.9 190476 59156 ? Ss 15:20 0:00 nginx: master process /opt/waf/nginx/sbin/nginx
root 2759 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2760 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2761 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2762 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2763 0.0 2.1 190556 64352 ? S 15:20 0:00 \_ nginx: cache manager process
root 2764 0.0 2.1 190556 64352 ? S 15:20 0:00 \_ nginx: cache loader process
I don't know how to solve this issue. I need Nginx, once started by the web app, to run as a totally independent process.
Any help is really appreciated.
Cheers.
I think this should help with your question:
Launch a completely independent process
I think it gives a great explanation of what you are trying to achieve with PIPEs.
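As a rough sketch of that idea (not a guaranteed fix for the Apache case), the child can be started in its own session and without pipes back to the parent. The helper name launch_detached is hypothetical and the example assumes a POSIX system. If the service manager stops every process that belongs to the Apache service, which I am assuming may be what happens here, detaching the session alone will not be enough.

```python
import os
import subprocess
import sys


def launch_detached(argv):
    """Start argv in its own session with no pipes back to the parent."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,   # no PIPE, so nothing ties the child's output to the parent
        stderr=subprocess.DEVNULL,
        start_new_session=True,      # the child calls setsid() before exec
        close_fds=True,
    )


if __name__ == "__main__":
    # Stand-in command; the question would pass ['sudo', '/opt/waf/nginx/sbin/nginx'].
    child = launch_detached([sys.executable, "-c", "import time; time.sleep(5)"])
    print("child pid:", child.pid, "session id:", os.getsid(child.pid))
```

With start_new_session=True the child becomes the leader of a new session, so its session ID equals its own PID and differs from the parent's.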
I don't really understand yet how gunicorn works. What I currently see is that if I start:
/usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
and then run
ps -aux | grep flaskVideo
I get this response
user 0.0 0.0 13464 1096 pts/14 S+ 10:33 0:00 grep --color=auto flaskVideo
user 13684 0.0 0.4 95624 34796 pts/7 S+ 10:20 0:00 /usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
user 13698 0.4 0.5 199228 45696 pts/7 S+ 10:20 0:03 /usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
so it seems that more than one thread is running.
How should I interpret the two running threads?
I'm deploying (daemonizing) Celery. The following is my celeryd file.
CELERYD_NODES="worker1 worker2 worker3 worker4"
CELERY_BIN="/usr/local/bin/celery"
CELERY_APP="proj:app"
CELERYD_CHDIR="/path/to/proj/"
CELERYD_OPTS="-n %N@%h --config=proj.celeryconfig -l DEBUG --without-heartbeat"
CELERYD_LOG_FILE="/var/log/celery/%N.log"
CELERYD_PID_FILE="/var/run/celery/%N.pid"
CELERYD_USER="celery"
CELERYD_GROUP="celery"
CELERY_CREATE_DIRS=1
When I start celeryd, it spawns 20 processes. Why and how? My server has 4 CPUs.
When I run the command below
ps aux|grep 'celery worker'
I get the following output.
gfile=/var/log/celery/worker1.log --pidfile=/var/run/celery/worker1.pid --hostname=worker1@worker1@worker1@worker1@%h
celery 26652 5.7 0.8 270340 66892 ? S 18:41 0:02 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker2.log --pidfile=/var/run/celery/worker2.pid --hostname=worker2@worker2@worker2@worker2@%h
celery 26662 1.0 0.7 860128 59364 ? Sl 18:41 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker1.log --pidfile=/var/run/celery/worker1.pid --hostname=worker1@worker1@worker1@worker1@%h
celery 26663 1.1 0.7 565456 59584 ? S 18:41 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker1.log --pidfile=/var/run/celery/worker1.pid --hostname=worker1@worker1@worker1@worker1@%h
celery 26664 1.6 0.7 860384 59624 ? Sl 18:41 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker1.log --pidfile=/var/run/celery/worker1.pid --hostname=worker1@worker1@worker1@worker1@%h
celery 26665 0.2 0.7 270272 59672 ? S 18:41 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker1.log --pidfile=/var/run/celery/worker1.pid --hostname=worker1@worker1@worker1@worker1@%h
celery 26668 5.3 0.8 270340 66656 ? S 18:41 0:02 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker3.log --pidfile=/var/run/celery/worker3.pid --hostname=worker3@worker3@worker3@worker3@%h
celery 26682 0.3 0.7 270272 58048 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker2.log --pidfile=/var/run/celery/worker2.pid --hostname=worker2@worker2@worker2@worker2@%h
celery 26683 0.1 0.7 270272 58044 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker2.log --pidfile=/var/run/celery/worker2.pid --hostname=worker2@worker2@worker2@worker2@%h
celery 26684 0.3 0.7 270272 58044 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker2.log --pidfile=/var/run/celery/worker2.pid --hostname=worker2@worker2@worker2@worker2@%h
celery 26685 0.2 0.7 270272 58108 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker2.log --pidfile=/var/run/celery/worker2.pid --hostname=worker2@worker2@worker2@worker2@%h
celery 26687 5.5 0.8 270340 66824 ? S 18:42 0:02 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker4.log --pidfile=/var/run/celery/worker4.pid --hostname=worker4@worker4@worker4@worker4@%h
celery 26696 0.1 0.7 270272 58036 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker3.log --pidfile=/var/run/celery/worker3.pid --hostname=worker3@worker3@worker3@worker3@%h
celery 26697 1.3 0.7 861124 60648 ? Sl 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker3.log --pidfile=/var/run/celery/worker3.pid --hostname=worker3@worker3@worker3@worker3@%h
celery 26698 0.1 0.7 270272 58032 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker3.log --pidfile=/var/run/celery/worker3.pid --hostname=worker3@worker3@worker3@worker3@%h
celery 26699 0.1 0.7 270272 58100 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker3.log --pidfile=/var/run/celery/worker3.pid --hostname=worker3@worker3@worker3@worker3@%h
celery 26701 0.0 0.7 270272 57720 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker4.log --pidfile=/var/run/celery/worker4.pid --hostname=worker4@worker4@worker4@worker4@%h
celery 26702 0.0 0.6 270080 53260 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker4.log --pidfile=/var/run/celery/worker4.pid --hostname=worker4@worker4@worker4@worker4@%h
celery 26703 0.0 0.7 270272 57268 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker4.log --pidfile=/var/run/celery/worker4.pid --hostname=worker4@worker4@worker4@worker4@%h
celery 26704 0.0 0.7 270272 57336 ? S 18:42 0:00 /usr/bin/python -m celery worker --without-heartbeat -l DEBUG --config=c26_message.celeryconfig --loglevel=INFO --logfile=/var/log/celery/worker4.log --pidfile=/var/run/celery/worker4.pid --hostname=worker4@worker4@worker4@worker4@%h
root 26740 0.0 0.0 14196 984 pts/5 S+ 18:42 0:00 grep --color=auto celery worker
Celery flower also shows 20 workers, out of which 16 are offline and 4 are online. How is it possible?
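The root cause is that 4 nodes are created and each one uses the default concurrency, which is based on the number of CPUs. With 4 CPUs, each node starts 5 processes (1 main process plus 4 pool processes), so 4 nodes give 20 processes.
Set the concurrency of each node, as in the example below, so that there is one single-process node per CPU: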
CELERYD_OPTS="-c 4 -c:worker4 1 -c:worker3 1 -c:worker2 1 -c:worker1 1"
You can also refer to the example configuration in the Celery documentation: http://docs.celeryproject.org/en/latest/userguide/daemonizing.html#example-configuration
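The process count can be checked with a bit of arithmetic. This is only a sketch of the counting, assuming each node runs one main process plus one pool process per unit of concurrency; the function name expected_processes is made up for illustration.

```python
def expected_processes(nodes, concurrency):
    """Each node runs one main process plus `concurrency` pool processes."""
    return nodes * (1 + concurrency)


# 4 nodes with the default concurrency on a 4-CPU server
print(expected_processes(nodes=4, concurrency=4))  # 20

# 4 nodes with concurrency 1 each, as in the CELERYD_OPTS example above
print(expected_processes(nodes=4, concurrency=1))  # 8
```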
[user@centos-vm-02 ~]$ ps aux|grep python
user 4182 0.0 0.0 9228 1080 ? Ss 02:00 0:00 /bin/sh -c cd data/trandata && /usr/local/bin/python2.7 main.py >> /dev/null 2>&1
user 4190 0.1 0.1 341108 10740 ? Sl 02:00 0:52 /usr/local/bin/python2.7 main.py
user 4205 166 1.6 1175176 129312 ? Sl 02:00 901:39 /usr/local/bin/python2.7 main.py
user 10049 0.1 0.1 435856 10712 ? Sl 10:21 0:04 /usr/local/bin/python2.7 main.py
user 10051 71.1 2.5 948248 207628 ? Sl 10:21 28:42 /usr/local/bin/python2.7 main.py
user 10052 51.9 1.9 948380 154688 ? Sl 10:21 20:57 /usr/local/bin/python2.7 main.py
user 10053 85.9 0.9 815104 76652 ? Sl 10:21 34:41 /usr/local/bin/python2.7 main.py
user 11166 0.0 0.0 103240 864 pts/1 S+ 11:01 0:00 grep python
[user@centos-vm-02 ~]$ ps -ef|grep python
user 4182 4174 0 02:00 ? 00:00:00 /bin/sh -c cd /data/trandata && /usr/local/bin/python2.7 main.py >> /dev/null 2>&1
user 4190 4182 0 02:00 ? 00:00:52 /usr/local/bin/python2.7 main.py
user 4205 4190 99 02:00 ? 15:01:46 /usr/local/bin/python2.7 main.py
user 10049 1 0 10:21 ? 00:00:04 /usr/local/bin/python2.7 main.py
user 10051 10049 71 10:21 ? 00:28:47 /usr/local/bin/python2.7 main.py
user 10052 10049 51 10:21 ? 00:21:01 /usr/local/bin/python2.7 main.py
user 10053 10049 85 10:21 ? 00:34:45 /usr/local/bin/python2.7 main.py
user 11168 10904 0 11:01 pts/1 00:00:00 grep python
As shown above, I launch a Python process that spawns multiple processes; threads are started inside those processes, and further threads are started inside those threads.
Process tree like this:
main_process
--sub_process
----thread1
------sub_thread
------sub_thread
------sub_thread
------sub_thread
----thread2
----thread3
--sub_process
----......
In the output above, PID 4205 shows different CPU usage in ps aux and ps -ef: 166 in one and 99 in the other. top -c also showed 166.
I am sure that PID 4205 is one of the subprocesses, which means that with Python's GIL it should not be able to use more than 100% of a CPU.
So my question is: why do ps -ef and ps aux show different values?
It's just a sampling artifact. Say a factory produces one car per hour. If you get there right before a car is made and leave right after a car is made, you can see two cars made in a span of time just over an hour, resulting in you thinking the factory is operating at near double capacity.
Update: Let me try to clarify the example. Say a factory produces one car per hour, on the hour. It is incapable of producing more than one car per hour. If you watch the factory from 7:59 to 9:01, you will see two cars produced (one at 8:00 and one at 9:00) in just over one hour (62 minutes). So you would estimate the factory produces about two cars per hour, nearly double its actual production. That is what happened here. It's a sampling artifact caused by top checking the CPU counters at just the wrong time.
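The factory example can be put into numbers with a few lines of Python. This only illustrates the analogy, not how ps or top compute their figures; the function name observed_rate is made up, and times are given in minutes since midnight.

```python
def observed_rate(start_min, end_min, period_min=60):
    """Cars per hour seen by someone watching from start_min to end_min.

    A car rolls out at every multiple of period_min (8:00 is minute 480).
    """
    first = -(-start_min // period_min)   # first production tick at or after start (ceiling)
    last = end_min // period_min          # last production tick at or before end
    cars = max(0, last - first + 1)
    hours_watched = (end_min - start_min) / 60
    return cars / hours_watched


# Watching from 7:59 to 9:01: two cars in 62 minutes
print(round(observed_rate(7 * 60 + 59, 9 * 60 + 1), 2))  # 1.94

# Watching from 0:30 to 24:30: 24 cars in 24 hours
print(observed_rate(30, 24 * 60 + 30))  # 1.0
```

The short window lands right around two production ticks and nearly doubles the estimate, while the long window averages out to the true rate.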