I was seeing some Killed messages in my logs for my gunicorn processes, and sometimes when I am in the Django shell it just gets Killed.
After doing some research, I found that this could be due to a lack of resources, especially memory.
I am using a 512MB Vagrant VM in development with the following stack:
- Nginx
- Gunicorn with 3 workers
- 3 RQ workers
- Redis as cache
- Redis as datastore (not meant to store a lot of data; mainly used to hold the queues for the workers)
- PostgreSQL
For deployment, I initially planned to use 2 machines of 1GB each to start.
The database (PostgreSQL server) would be on one machine and everything else on the other machine. I am wondering whether I should change this plan based on what I found out.
Machine 1 of 1GB: Nginx, Gunicorn, RQ Workers, Redis Cache, Redis DataStore
Machine 2 of 1GB: PostgreSQL
Indeed, when I looked at the memory consumption, I saw that it was mostly Gunicorn and the RQ workers that were consuming a lot of RAM: about 45MB per process, and there are 7 processes (4 Gunicorn + 3 RQ workers).
1) Is this a normal memory footprint for a Django application?
2) Could these processes grow even bigger?
3) Also, any idea what the postgres: my_project my_project 10.0.0.51 idle processes are?
(venv)mike@vagrant-ubuntu-trusty-64:/var/local/sites/my_project$ ps aux --sort=-rss,-vsz | head -21
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
my_project 24650 0.2 8.7 161200 44072 ? S 19:14 0:02 /var/local/sites/my_project/venv/bin/python /projects/my_project/webapp/manage.py rqworker high
my_project 24654 0.2 8.7 161188 44072 ? S 19:14 0:02 /var/local/sites/my_project/venv/bin/python /projects/my_project/webapp/manage.py rqworker high default low
my_project 24651 0.2 8.7 161184 44068 ? S 19:14 0:02 /var/local/sites/my_project/venv/bin/python /projects/my_project/webapp/manage.py rqworker high default low
my_project 26144 0.2 8.6 163612 43628 ? S 19:14 0:01 /var/local/sites/my_project/venv/bin/python /var/local/sites/my_project/venv/bin/gunicorn my_project.wsgi:application --name my_project_app --workers 3 --user=my_project --group=my_project --log-level=info --bind=unix:/tmp/my_project.gunicorn.sock --access-logfile=/projects/my_project-logs/vm_logs/gunicorn-my_project-access.log --log-file=/projects/my_project-logs/vm_logs/gunicorn-my_project.log
my_project 26143 0.2 8.6 163596 43608 ? S 19:14 0:01 /var/local/sites/my_project/venv/bin/python /var/local/sites/my_project/venv/bin/gunicorn my_project.wsgi:application --name my_project_app --workers 3 --user=my_project --group=my_project --log-level=info --bind=unix:/tmp/my_project.gunicorn.sock --access-logfile=/projects/my_project-logs/vm_logs/gunicorn-my_project-access.log --log-file=/projects/my_project-logs/vm_logs/gunicorn-my_project.log
my_project 26139 0.2 8.6 163592 43608 ? S 19:14 0:01 /var/local/sites/my_project/venv/bin/python /var/local/sites/my_project/venv/bin/gunicorn my_project.wsgi:application --name my_project_app --workers 3 --user=my_project --group=my_project --log-level=info --bind=unix:/tmp/my_project.gunicorn.sock --access-logfile=/projects/my_project-logs/vm_logs/gunicorn-my_project-access.log --log-file=/projects/my_project-logs/vm_logs/gunicorn-my_project.log
postgres 25660 0.0 3.2 246436 16212 ? S 19:14 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
my_project 26102 0.0 2.4 56704 12332 ? S 19:14 0:00 /var/local/sites/my_project/venv/bin/python /var/local/sites/my_project/venv/bin/gunicorn my_project.wsgi:application --name my_project_app --workers 3 --user=my_project --group=my_project --log-level=info --bind=unix:/tmp/my_project.gunicorn.sock --access-logfile=/projects/my_project-logs/vm_logs/gunicorn-my_project-access.log --log-file=/projects/my_project-logs/vm_logs/gunicorn-my_project.log
root 24615 0.0 2.3 56624 11936 ? Ss 19:14 0:00 /usr/bin/python /usr/local/bin/supervisord -c /etc/supervisord.conf --nodaemon
postgres 25679 0.0 1.2 247668 6268 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55189) idle
postgres 25674 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55187) idle
postgres 25678 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55188) idle
postgres 25683 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55190) idle
postgres 26547 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55191) idle
postgres 26548 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55192) idle
postgres 26549 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55193) idle
my_project 26652 0.0 0.8 21844 4284 pts/0 Ss 19:15 0:00 -bash
root 26603 0.0 0.8 107696 4236 ? Ss 19:15 0:00 sshd: my_project [priv]
postgres 25662 0.0 0.6 246572 3200 ? Ss 19:14 0:00 postgres: checkpointer process
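For reference, a minimal sketch of how one could follow up on questions 1) and 2): sample the RSS of the Gunicorn and RQ worker processes over time and see whether it keeps climbing. It assumes the third-party psutil package is installed in the virtualenv; the name patterns are just examples.

# Sketch: print the RSS of every gunicorn / rqworker process once a minute.
import time

import psutil

PATTERNS = ("gunicorn", "rqworker")  # adjust to your process names

while True:
    for proc in psutil.process_iter(["pid", "cmdline", "memory_info"]):
        info = proc.info
        cmdline = " ".join(info["cmdline"] or [])
        if info["memory_info"] and any(p in cmdline for p in PATTERNS):
            rss_mb = info["memory_info"].rss / 1024 / 1024
            print(f"{info['pid']:>6}  {rss_mb:6.1f} MB  {cmdline[:60]}")
    print("-" * 40)
    time.sleep(60)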
Related
More than a week ago, I ran nohup python3 -u script.py on an Ubuntu beowulf cluster I was connected to via SSH. I've now gone back wanting to kill off these processes (the program uses multiprocessing with a Pool object), but I haven't been able to do so, because I haven't been able to find the PIDs. I know the processes are still running because nohup.out is still being appended to and other data is being generated, but nothing relevant seems to appear when I run commands like ps or top. For example, when I run ps -x -U mkarrmann, I get:
PID TTY STAT TIME COMMAND
1296920 ? Ss 0:00 /lib/systemd/systemd --user
1296929 ? S 0:00 (sd-pam)
1296937 ? Ssl 0:00 /usr/bin/pulseaudio --daemonize=no --log-target=journal
1296939 ? SNsl 0:00 /usr/libexec/tracker-miner-fs
1296944 ? Ss 0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
1296945 ? R 0:00 sshd: mkarrmann@pts/0
1296960 ? Ssl 0:00 /usr/libexec/gvfsd
1296965 ? Sl 0:00 /usr/libexec/gvfsd-fuse /run/user/3016/gvfs -f -o big_writes
1296972 ? Ssl 0:00 /usr/libexec/gvfs-udisks2-volume-monitor
1296979 pts/0 Ss 0:00 -bash
1296980 ? Ssl 0:00 /usr/libexec/gvfs-gphoto2-volume-monitor
1296987 ? Ssl 0:00 /usr/libexec/gvfs-afc-volume-monitor
1296992 ? Ssl 0:00 /usr/libexec/gvfs-mtp-volume-monitor
1297001 ? Ssl 0:00 /usr/libexec/gvfs-goa-volume-monitor
1297005 ? Sl 0:00 /usr/libexec/goa-daemon
1297014 ? Sl 0:00 /usr/libexec/goa-identity-service
1297126 pts/0 R+ 0:00 ps -x -U mkarrmann
Or when I run ps -faux | grep py, I get:
root 975 0.0 0.0 34240 8424 ? Ss Jul28 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 1046 0.0 0.3 476004 245516 ? Ss Jul28 66:45 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
root 1275 0.0 0.0 20612 7732 ? S Jul28 0:00 \_ /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
mkarrma+ 1297143 0.0 0.0 6380 736 pts/0 S+ 14:40 0:00 \_ grep --color=auto py
Do any of these actually correspond to my Python processes, and am I just missing it? Is there anything else I should try? I feel like the only thing I haven't tried is manually parsing through /proc, but that obviously shouldn't be necessary, so I'm sure I'm missing something else.
I'm happy to provide any additional information that could be helpful. Thanks!
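As a minimal sketch of the "manually parsing through /proc" idea mentioned above (script.py is the script name from the question), something like this would list matching PIDs:

# Scan /proc and print any PID whose command line mentions script.py.
import os

for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            cmdline = f.read().replace(b"\x00", b" ").decode(errors="replace")
    except OSError:
        continue  # the process exited or is not readable
    if "script.py" in cmdline:
        print(pid, cmdline)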
I'm managing Nginx with a custom web app (Python) running under Apache on a Debian 10 system.
It works fine: I can restart, reload, stop, and check the syntax of nginx without issue.
The problem arises when Nginx is started via the custom web app (Apache): if I then restart/stop Apache via "service apache2 restart" or "/etc/init.d/apache2 restart", Nginx gets stopped too.
The way I start nginx is with Python's subprocess:
subprocess.Popen(['sudo', '/opt/waf/nginx/sbin/nginx'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
This works, but if I do a manual restart/stop of Apache, the nginx service is stopped too.
I tried changing nginx.conf from "user www-data" to "user root" to see whether changing the user would solve it, but no, the problem persists.
Inspecting with ps aux --forest with "user www-data" in nginx.conf:
root 2637 0.0 0.2 20312 7984 ? Ss 15:20 0:00 /usr/sbin/apache2 -k start
www-data 2638 2.4 1.9 461320 59732 ? Sl 15:20 0:09 \_ /usr/sbin/apache2 -k start
www-data 2639 0.0 0.9 2014312 28572 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
www-data 2640 0.0 0.9 2014328 28528 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
root 3322 0.0 1.9 190476 59156 ? Ss 15:27 0:00 nginx: master process /opt/waf/nginx/sbin/nginx
www-data 3323 0.0 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3324 0.0 2.8 212248 85232 ? S 15:27 0:00 \_ nginx: worker process
www-data 3325 0.0 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3326 0.5 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3327 0.0 2.1 190556 64440 ? S 15:27 0:00 \_ nginx: cache manager process
www-data 3328 0.0 2.1 190556 64440 ? S 15:27 0:00 \_ nginx: cache loader process
Inspecting with ps aux --forest with "user root" in nginx.conf:
root 2637 0.0 0.2 20312 7984 ? Ss 15:20 0:00 /usr/sbin/apache2 -k start
www-data 2638 5.2 1.5 455400 47264 ? Sl 15:20 0:01 \_ /usr/sbin/apache2 -k start
www-data 2639 0.2 0.8 2014236 24608 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
www-data 2640 0.1 0.7 2014328 23160 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
root 2758 0.0 1.9 190476 59156 ? Ss 15:20 0:00 nginx: master process /opt/waf/nginx/sbin/nginx
root 2759 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2760 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2761 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2762 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2763 0.0 2.1 190556 64352 ? S 15:20 0:00 \_ nginx: cache manager process
root 2764 0.0 2.1 190556 64352 ? S 15:20 0:00 \_ nginx: cache loader process
I don't know how to solve this issue; I need nginx started by the web app as a totally independent process.
Any help is really appreciated.
Cheers.
I guess this should help with your question:
Launch a completely independent process
I think there is a great explanation there of what you are trying to achieve with PIPEs.
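A minimal sketch of that idea, assuming Python 3: start nginx in its own session and do not hand it pipes that the web app holds open, so an Apache restart no longer takes it down.

# Sketch: launch nginx detached from the web app process.
# start_new_session puts nginx in its own session (outside Apache's process
# group), and DEVNULL avoids leaving pipes shared between the two processes.
import subprocess

subprocess.Popen(
    ['sudo', '/opt/waf/nginx/sbin/nginx'],
    stdin=subprocess.DEVNULL,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    start_new_session=True,
)

Redirecting stdout/stderr to a log file instead of DEVNULL works just as well; the point is not to keep PIPEs open that the parent never reads.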
So far I don't really understand how gunicorn works. What I currently see is that if I start:
/usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
and then run
ps -aux | grep flaskVideo
I get this response
user 0.0 0.0 13464 1096 pts/14 S+ 10:33 0:00 grep --color=auto flaskVideo
user 13684 0.0 0.4 95624 34796 pts/7 S+ 10:20 0:00 /usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
user 13698 0.4 0.5 199228 45696 pts/7 S+ 10:20 0:03 /usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
so it seems that more than one thread is running.
How should I interpret the two running threads?
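Gunicorn is a pre-fork server, so the two PIDs are most likely the master (arbiter) process plus the single worker requested with --workers=1, rather than two threads. A small sketch, assuming the third-party psutil package, to check the parent/child relationship:

# Print pid, parent pid and command line of every gunicorn process;
# the worker's PPID should point at the master.
import psutil

for proc in psutil.process_iter(["pid", "ppid", "cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "gunicorn" in cmdline:
        print(proc.info["pid"], "parent:", proc.info["ppid"], cmdline[:60])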
Is there any way to hot reload Python modules for a running Python process? In the usual cases we can run kill -HUP <pid> for some servers like squid, nginx, and gunicorn. My running processes are:
root 6 0.6 0.9 178404 39116 ? S 14:21 0:00 python3 ./src/app.py --config ./conf/config.yml
root 7 0.0 1.0 501552 43404 ? Sl 14:21 0:00 python3 ./src/app.py --config ./conf/config.yml
root 8 0.0 1.0 501808 43540 ? Sl 14:21 0:00 python3 ./src/app.py --config ./conf/config.yml
Is the question about reloading a Sanic app? If yes, then there is a hot reload built into the server.
app.run(debug=True)
Or, if you want the reload without debugging:
app.run(auto_reload=True)
See docs
Or, if this is a general question, check out aoiklivereload.
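As a general illustration of the kill -HUP pattern from the question (framework-agnostic; mymodule is a hypothetical name), a process can install its own SIGHUP handler and reload a module with importlib:

# Sketch: reload a module in place when the process receives SIGHUP.
# Caveat: importlib.reload() rebinds the module's names, but objects created
# from the old code keep referencing the old definitions.
import importlib
import signal

import mymodule  # hypothetical module to refresh


def reload_on_hup(signum, frame):
    importlib.reload(mymodule)
    print("reloaded", mymodule.__name__)


signal.signal(signal.SIGHUP, reload_on_hup)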
[user@centos-vm-02 ~]$ ps aux|grep python
user 4182 0.0 0.0 9228 1080 ? Ss 02:00 0:00 /bin/sh -c cd data/trandata && /usr/local/bin/python2.7 main.py >> /dev/null 2>&1
user 4190 0.1 0.1 341108 10740 ? Sl 02:00 0:52 /usr/local/bin/python2.7 main.py
user 4205 166 1.6 1175176 129312 ? Sl 02:00 901:39 /usr/local/bin/python2.7 main.py
user 10049 0.1 0.1 435856 10712 ? Sl 10:21 0:04 /usr/local/bin/python2.7 main.py
user 10051 71.1 2.5 948248 207628 ? Sl 10:21 28:42 /usr/local/bin/python2.7 main.py
user 10052 51.9 1.9 948380 154688 ? Sl 10:21 20:57 /usr/local/bin/python2.7 main.py
user 10053 85.9 0.9 815104 76652 ? Sl 10:21 34:41 /usr/local/bin/python2.7 main.py
user 11166 0.0 0.0 103240 864 pts/1 S+ 11:01 0:00 grep python
[user@centos-vm-02 ~]$ ps -ef|grep python
user 4182 4174 0 02:00 ? 00:00:00 /bin/sh -c cd /data/trandata && /usr/local/bin/python2.7 main.py >> /dev/null 2>&1
user 4190 4182 0 02:00 ? 00:00:52 /usr/local/bin/python2.7 main.py
user 4205 4190 99 02:00 ? 15:01:46 /usr/local/bin/python2.7 main.py
user 10049 1 0 10:21 ? 00:00:04 /usr/local/bin/python2.7 main.py
user 10051 10049 71 10:21 ? 00:28:47 /usr/local/bin/python2.7 main.py
user 10052 10049 51 10:21 ? 00:21:01 /usr/local/bin/python2.7 main.py
user 10053 10049 85 10:21 ? 00:34:45 /usr/local/bin/python2.7 main.py
user 11168 10904 0 11:01 pts/1 00:00:00 grep python
As you can see, I launch a Python process that spawns multiple subprocesses; inside those processes multiple threads are started, and inside those threads further threads are started.
The process tree looks like this:
main_process
--sub_process
----thread1
------sub_thread
------sub_thread
------sub_thread
------sub_thread
----thread2
----thread3
--sub_process
----......
In the output above, PID 4205 shows different CPU usage in ps aux and ps -ef: one says 166, the other 99; 166 is also what top -c showed.
And I am sure that PID 4205 is one of the subprocesses, which means it should not be able to use more than 100% of a CPU because of Python's GIL.
So that's my question: why do ps -ef and ps aux show different values?
It's just a sampling artifact. Say a factory produces one car per hour. If you get there right before a car is made and leave right after a car is made, you can see two cars made in a span of time just over an hour, leading you to think the factory is operating at nearly double capacity.
Update: Let me try to clarify the example. Say a factory produces one car per hour, on the hour. It is incapable of producing more than one car per hour. If you watch the factory from 7:59 to 9:01, you will see two cars produced (one at 8:00 and one at 9:00) in just over one hour (62 minutes). So you would estimate the factory produces about two cars per hour, nearly double its actual production. That is what happened here: it's a sampling artifact caused by top checking the CPU counters at just the wrong time.
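Spelled out with the numbers from the analogy:

# Two cars observed (8:00 and 9:00) over a 62-minute window (7:59 to 9:01).
cars_seen = 2
minutes_watched = 62
print(f"{cars_seen / (minutes_watched / 60):.2f} cars/hour")  # ~1.94, vs. a true rate of 1.00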