"Killed" processes and django/gunicorn memory footprint - python

I was seeing some Killed messages in my logs for my gunicorn processes, and sometimes when I am in the django shell it just gets Killed.
After doing some research, I found that this could be due to a lack of resources, especially memory.
I am using a vagrant VM of 512MB in development with the following stack:
Nginx
+
Gunicorn with 3 workers
+
3 RQ workers
+
Redis as Cache
+
Redis as DataStore (is not meant to store a lot of data, mainly used to store the queues for the workers)
+
PostgreSQL
For deployment, I initially planned to use 2 machines of 1GB each to start.
The DataBase (PostgreSQL Server) would be on one machine and all the other ones on the other machine. I am wondering whether I should change this plan based on what I found out.
Machine 1 of 1GB: Nginx, Gunicorn, RQ Workers, Redis Cache, Redis DataStore
Machine 2 of 1GB: PostgreSQL
Indeed, when I looked at the memory consumption, I saw that it was mainly gunicorn and the RQ workers that were consuming a lot of RAM: about 45MB per process, and there are 7 processes (4 Gunicorn + 3 RQ workers).
1) Is it a normal memory footprint for a django application?
2) Could these processes grow even bigger?
3) Also, any idea what the postgres: my_project my_project 10.0.0.51 idle processes are?
(venv)mike@vagrant-ubuntu-trusty-64:/var/local/sites/my_project$ ps aux --sort=-rss,-vsz | head -21
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
my_project 24650 0.2 8.7 161200 44072 ? S 19:14 0:02 /var/local/sites/my_project/venv/bin/python /projects/my_project/webapp/manage.py rqworker high
my_project 24654 0.2 8.7 161188 44072 ? S 19:14 0:02 /var/local/sites/my_project/venv/bin/python /projects/my_project/webapp/manage.py rqworker high default low
my_project 24651 0.2 8.7 161184 44068 ? S 19:14 0:02 /var/local/sites/my_project/venv/bin/python /projects/my_project/webapp/manage.py rqworker high default low
my_project 26144 0.2 8.6 163612 43628 ? S 19:14 0:01 /var/local/sites/my_project/venv/bin/python /var/local/sites/my_project/venv/bin/gunicorn my_project.wsgi:application --name my_project_app --workers 3 --user=my_project --group=my_project --log-level=info --bind=unix:/tmp/my_project.gunicorn.sock --access-logfile=/projects/my_project-logs/vm_logs/gunicorn-my_project-access.log --log-file=/projects/my_project-logs/vm_logs/gunicorn-my_project.log
my_project 26143 0.2 8.6 163596 43608 ? S 19:14 0:01 /var/local/sites/my_project/venv/bin/python /var/local/sites/my_project/venv/bin/gunicorn my_project.wsgi:application --name my_project_app --workers 3 --user=my_project --group=my_project --log-level=info --bind=unix:/tmp/my_project.gunicorn.sock --access-logfile=/projects/my_project-logs/vm_logs/gunicorn-my_project-access.log --log-file=/projects/my_project-logs/vm_logs/gunicorn-my_project.log
my_project 26139 0.2 8.6 163592 43608 ? S 19:14 0:01 /var/local/sites/my_project/venv/bin/python /var/local/sites/my_project/venv/bin/gunicorn my_project.wsgi:application --name my_project_app --workers 3 --user=my_project --group=my_project --log-level=info --bind=unix:/tmp/my_project.gunicorn.sock --access-logfile=/projects/my_project-logs/vm_logs/gunicorn-my_project-access.log --log-file=/projects/my_project-logs/vm_logs/gunicorn-my_project.log
postgres 25660 0.0 3.2 246436 16212 ? S 19:14 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
my_project 26102 0.0 2.4 56704 12332 ? S 19:14 0:00 /var/local/sites/my_project/venv/bin/python /var/local/sites/my_project/venv/bin/gunicorn my_project.wsgi:application --name my_project_app --workers 3 --user=my_project --group=my_project --log-level=info --bind=unix:/tmp/my_project.gunicorn.sock --access-logfile=/projects/my_project-logs/vm_logs/gunicorn-my_project-access.log --log-file=/projects/my_project-logs/vm_logs/gunicorn-my_project.log
root 24615 0.0 2.3 56624 11936 ? Ss 19:14 0:00 /usr/bin/python /usr/local/bin/supervisord -c /etc/supervisord.conf --nodaemon
postgres 25679 0.0 1.2 247668 6268 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55189) idle
postgres 25674 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55187) idle
postgres 25678 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55188) idle
postgres 25683 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55190) idle
postgres 26547 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55191) idle
postgres 26548 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55192) idle
postgres 26549 0.0 1.2 247668 6264 ? Ss 19:14 0:00 postgres: my_project my_project 10.0.0.51(55193) idle
my_project 26652 0.0 0.8 21844 4284 pts/0 Ss 19:15 0:00 -bash
root 26603 0.0 0.8 107696 4236 ? Ss 19:15 0:00 sshd: my_project [priv]
postgres 25662 0.0 0.6 246572 3200 ? Ss 19:14 0:00 postgres: checkpointer process
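For a sanity check on whether those figures could exhaust a 512MB VM, here is a rough tally of the largest RSS values from the listing above. Note that RSS counts shared pages (libpython, copy-on-write pages inherited from the gunicorn master) once per process, so this sum is an upper bound on real usage:

```python
# Rough tally of the largest RSS figures from the `ps` listing above (KB).
# RSS double-counts pages shared between processes, so this overstates
# the true combined footprint.
rss_kb = (
    3 * 44072      # three RQ workers
    + 3 * 43608    # three gunicorn workers (approximate, per listing)
    + 12332        # gunicorn master
)
total_mb = rss_kb / 1024
print(round(total_mb, 1))  # roughly 269 MB before Redis, Nginx, PostgreSQL
```

Even as an upper bound, ~269MB of application processes plus Redis, PostgreSQL, and the OS leaves very little headroom in 512MB, which is consistent with the OOM killer producing the Killed messages.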

Related

Processes Not Appearing on Beowulf Cluster

More than a week ago, I ran nohup python3 -u script.py on an Ubuntu beowulf cluster I was connected to via SSH. I've now gone back wanting to kill off these processes (this program uses multiprocessing with a Pool object), but I haven't been able to do so, as I haven't been able to find the PIDs. I know that the processes are still running because nohup.out is still being appended to and other data is being generated, but nothing relevant seems to appear when I run commands like ps or top. For example, when I run ps -x -U mkarrmann, I get:
PID TTY STAT TIME COMMAND
1296920 ? Ss 0:00 /lib/systemd/systemd --user
1296929 ? S 0:00 (sd-pam)
1296937 ? Ssl 0:00 /usr/bin/pulseaudio --daemonize=no --log-target=journal
1296939 ? SNsl 0:00 /usr/libexec/tracker-miner-fs
1296944 ? Ss 0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
1296945 ? R 0:00 sshd: mkarrmann@pts/0
1296960 ? Ssl 0:00 /usr/libexec/gvfsd
1296965 ? Sl 0:00 /usr/libexec/gvfsd-fuse /run/user/3016/gvfs -f -o big_writes
1296972 ? Ssl 0:00 /usr/libexec/gvfs-udisks2-volume-monitor
1296979 pts/0 Ss 0:00 -bash
1296980 ? Ssl 0:00 /usr/libexec/gvfs-gphoto2-volume-monitor
1296987 ? Ssl 0:00 /usr/libexec/gvfs-afc-volume-monitor
1296992 ? Ssl 0:00 /usr/libexec/gvfs-mtp-volume-monitor
1297001 ? Ssl 0:00 /usr/libexec/gvfs-goa-volume-monitor
1297005 ? Sl 0:00 /usr/libexec/goa-daemon
1297014 ? Sl 0:00 /usr/libexec/goa-identity-service
1297126 pts/0 R+ 0:00 ps -x -U mkarrmann
Or when I run ps -faux | grep py, I get:
root 975 0.0 0.0 34240 8424 ? Ss Jul28 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 1046 0.0 0.3 476004 245516 ? Ss Jul28 66:45 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
root 1275 0.0 0.0 20612 7732 ? S Jul28 0:00 \_ /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
mkarrma+ 1297143 0.0 0.0 6380 736 pts/0 S+ 14:40 0:00 \_ grep --color=auto py
Do any of these actually correspond to my Python processes, and I'm just missing it? Is there anything else I should try? I feel like the only thing I haven't tried is manually parsing through /proc, but that obviously shouldn't be necessary, so I'm sure I'm missing something else.
I'm happy to provide any additional information that could be helpful. Thanks!
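Parsing /proc is actually less painful than it sounds, and it sidesteps any filtering ps might be doing. A minimal sketch (the matching substring is up to you; searching for your script name, e.g. script.py, would be the natural choice here):

```python
import os

def find_pids_by_cmdline(needle):
    """Scan /proc for processes whose command line contains `needle`.

    /proc/<pid>/cmdline stores the argv vector NUL-separated, so we
    replace NULs with spaces before matching.
    """
    pids = []
    for entry in os.listdir('/proc'):
        if not entry.isdigit():          # skip non-process entries
            continue
        try:
            with open(os.path.join('/proc', entry, 'cmdline'), 'rb') as f:
                cmdline = f.read().replace(b'\0', b' ').decode(errors='replace')
        except OSError:
            continue                     # process exited or not readable
        if needle in cmdline:
            pids.append(int(entry))
    return pids
```

One caveat: if the job was started on a different node of the cluster, /proc (and ps) on the login node will never show it; you would need to check each node, which may explain why nothing appears.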

Start nginx with apache (python app) as independent process

I'm managing Nginx with a custom web app (python) running over apache in a Debian 10 system.
It works fine, I can restart, reload, stop, check syntax of nginx without issue.
The problem arises when Nginx is started via the custom web app (apache): if I then restart/stop apache via "service apache2 restart" or "/etc/init.d/apache2 restart", Nginx gets stopped too.
The way I start nginx is with Python's subprocess:
subprocess.Popen(['sudo', '/opt/waf/nginx/sbin/nginx'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
This works, but if I do a manual restart/stop of apache, the nginx service is stopped too.
I tried changing nginx.conf from "user www-data" to "user root" to see whether changing the user would solve it, but no, the problem persists.
Inspecting with ps aux --forest with "user www-data" in nginx.conf:
root 2637 0.0 0.2 20312 7984 ? Ss 15:20 0:00 /usr/sbin/apache2 -k start
www-data 2638 2.4 1.9 461320 59732 ? Sl 15:20 0:09 \_ /usr/sbin/apache2 -k start
www-data 2639 0.0 0.9 2014312 28572 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
www-data 2640 0.0 0.9 2014328 28528 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
root 3322 0.0 1.9 190476 59156 ? Ss 15:27 0:00 nginx: master process /opt/waf/nginx/sbin/nginx
www-data 3323 0.0 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3324 0.0 2.8 212248 85232 ? S 15:27 0:00 \_ nginx: worker process
www-data 3325 0.0 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3326 0.5 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3327 0.0 2.1 190556 64440 ? S 15:27 0:00 \_ nginx: cache manager process
www-data 3328 0.0 2.1 190556 64440 ? S 15:27 0:00 \_ nginx: cache loader process
Inspecting with ps aux --forest with "user root" in nginx.conf:
root 2637 0.0 0.2 20312 7984 ? Ss 15:20 0:00 /usr/sbin/apache2 -k start
www-data 2638 5.2 1.5 455400 47264 ? Sl 15:20 0:01 \_ /usr/sbin/apache2 -k start
www-data 2639 0.2 0.8 2014236 24608 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
www-data 2640 0.1 0.7 2014328 23160 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
root 2758 0.0 1.9 190476 59156 ? Ss 15:20 0:00 nginx: master process /opt/waf/nginx/sbin/nginx
root 2759 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2760 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2761 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2762 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2763 0.0 2.1 190556 64352 ? S 15:20 0:00 \_ nginx: cache manager process
root 2764 0.0 2.1 190556 64352 ? S 15:20 0:00 \_ nginx: cache loader process
I don't know how to solve this issue; I need nginx started by the web app as a totally independent process.
Any help is really appreciated.
Cheers.
I guess this should help with your question:
Launch a completely independent process
I think there is a great explanation there of what you are trying to achieve with pipes.
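The key point is that your Popen child stays in apache's session and process group, so the signals sent when apache is stopped reach nginx too. A sketch of a detached launch (redirecting the pipes to DEVNULL also avoids nginx blocking on a dead parent's pipe):

```python
import os
import subprocess

def launch_detached(cmd):
    """Launch cmd in its own session, detached from the parent.

    start_new_session=True calls setsid() in the child, putting it in a
    new session and process group, so group-wide signals sent to the
    parent (e.g. when apache is restarted) no longer reach it.
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )

# e.g. launch_detached(['sudo', '/opt/waf/nginx/sbin/nginx'])
```

Note that with sudo in front there is an extra layer; running the web app with permission to start nginx directly (or delegating to systemd) would be cleaner, but the session separation above is the part that decouples nginx's lifetime from apache's.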

Gunicorn has more than one thread despite --thread 1

I don't really understand, so far, how gunicorn works. What I currently see is that if I start:
/usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
and then run
ps -aux | grep flaskVideo
I get this response
user 0.0 0.0 13464 1096 pts/14 S+ 10:33 0:00 grep --color=auto flaskVideo
user 13684 0.0 0.4 95624 34796 pts/7 S+ 10:20 0:00 /usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
user 13698 0.4 0.5 199228 45696 pts/7 S+ 10:20 0:03 /usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
so it seems that there is more than one thread running.
How should I interpret the two running threads?
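What you are seeing are two processes, not two threads: gunicorn follows a pre-fork model where a master process forks N workers, so ps always shows N+1 entries with the same command line, even with --workers=1. A minimal sketch of that model:

```python
import os

def prefork(n_workers):
    """Minimal sketch of gunicorn's pre-fork model: one master process
    forks n_workers children, so `ps` shows n_workers + 1 processes
    sharing the same command line."""
    children = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Worker process: a real worker would accept() connections
            # and serve requests here.
            os._exit(0)
        children.append(pid)   # master records each worker's PID
    for pid in children:
        os.waitpid(pid, 0)     # master reaps workers (gunicorn respawns them)
    return children
```

In your listing, PID 13684 is the master (it only supervises, which is why it stays small) and 13698 is the single eventlet worker doing the actual work.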

Hot reloading python process for code reload

Is there any way to hot reload Python modules for a running Python process? In the usual cases we could run kill -HUP <pid> for servers like squid, nginx, gunicorn. My running processes are:
root 6 0.6 0.9 178404 39116 ? S 14:21 0:00 python3 ./src/app.py --config ./conf/config.yml
root 7 0.0 1.0 501552 43404 ? Sl 14:21 0:00 python3 ./src/app.py --config ./conf/config.yml
root 8 0.0 1.0 501808 43540 ? Sl 14:21 0:00 python3 ./src/app.py --config ./conf/config.yml
Is the question about reloading a Sanic app? If yes, then there is a hot reload built into the server.
app.run(debug=True)
Or if you want the reload without debugging
app.run(auto_reload=True)
See docs
Or, if this is a question in general, check out aoiklivereload
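If you want the kill -HUP <pid> convention specifically, you can wire it up yourself with a signal handler that calls importlib.reload. A sketch (with the usual caveat that reload only rebinds the module's top-level names; objects created from the old code keep running it):

```python
import importlib
import signal

def install_hup_reload(module):
    """Re-import `module`'s source when the process receives SIGHUP,
    mimicking the kill -HUP reload convention of servers like nginx.

    Caveat: importlib.reload re-executes the module and rebinds its
    top-level names, but instances and references created from the old
    code are not updated.
    """
    def _reload(signum, frame):
        importlib.reload(module)
    signal.signal(signal.SIGHUP, _reload)
```

With this installed for the module you edit most, `kill -HUP 6` (using your app's PID) would pick up source changes without a restart; for anything deeper than one module, a supervised full restart is usually more reliable.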

CPU usage difference between ps aux and -ef

[user@centos-vm-02 ~]$ ps aux|grep python
user 4182 0.0 0.0 9228 1080 ? Ss 02:00 0:00 /bin/sh -c cd data/trandata && /usr/local/bin/python2.7 main.py >> /dev/null 2>&1
user 4190 0.1 0.1 341108 10740 ? Sl 02:00 0:52 /usr/local/bin/python2.7 main.py
user 4205 166 1.6 1175176 129312 ? Sl 02:00 901:39 /usr/local/bin/python2.7 main.py
user 10049 0.1 0.1 435856 10712 ? Sl 10:21 0:04 /usr/local/bin/python2.7 main.py
user 10051 71.1 2.5 948248 207628 ? Sl 10:21 28:42 /usr/local/bin/python2.7 main.py
user 10052 51.9 1.9 948380 154688 ? Sl 10:21 20:57 /usr/local/bin/python2.7 main.py
user 10053 85.9 0.9 815104 76652 ? Sl 10:21 34:41 /usr/local/bin/python2.7 main.py
user 11166 0.0 0.0 103240 864 pts/1 S+ 11:01 0:00 grep python
[user@centos-vm-02 ~]$ ps -ef|grep python
user 4182 4174 0 02:00 ? 00:00:00 /bin/sh -c cd /data/trandata && /usr/local/bin/python2.7 main.py >> /dev/null 2>&1
user 4190 4182 0 02:00 ? 00:00:52 /usr/local/bin/python2.7 main.py
user 4205 4190 99 02:00 ? 15:01:46 /usr/local/bin/python2.7 main.py
user 10049 1 0 10:21 ? 00:00:04 /usr/local/bin/python2.7 main.py
user 10051 10049 71 10:21 ? 00:28:47 /usr/local/bin/python2.7 main.py
user 10052 10049 51 10:21 ? 00:21:01 /usr/local/bin/python2.7 main.py
user 10053 10049 85 10:21 ? 00:34:45 /usr/local/bin/python2.7 main.py
user 11168 10904 0 11:01 pts/1 00:00:00 grep python
As you can see, I launch a Python process that spawns multiple processes; inside those processes multiple threads are started, and inside those threads further sub-threads are started.
Process tree like this:
main_process
--sub_process
----thread1
------sub_thread
------sub_thread
------sub_thread
------sub_thread
----thread2
----thread3
--sub_process
----......
In the listing above, pid 4205 shows different CPU usage in ps aux and ps -ef: one is 166, the other is 99; 166 was also shown in top -c.
And I am sure that pid 4205 is one of the sub-processes, which means it could not use more than 100% of a CPU because of the GIL in Python.
So that's my question: why do ps -ef and ps aux show a difference?
It's just a sampling artifact. Say a factory produces one car per hour. If you get there right before a car is made and leave right after a car is made, you can see two cars made in a span of time just over an hour, resulting in you thinking the factory is operating at near double capacity.
Update: Let me try to clarify the example. Say a factory produces one car per hour, on the hour. It is incapable of producing more than one car per hour. If you watch the factory from 7:59 to 9:01, you will see two cars produced (one at 8:00 and one at 9:00) in just over one hour (62 minutes). So you would estimate that the factory produces about two cars per hour, nearly double its actual production. That is what happened here. It's a sampling artifact caused by top checking the CPU counters at just the wrong time.
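The factory analogy works out numerically like this (the 62-minute window is illustrative, matching the example above):

```python
# Worked version of the factory analogy: a process that can complete at
# most one unit of work per hour, observed over a 62-minute window that
# happens to contain two completions (one at each edge of the window).
events_seen = 2
window_hours = 62 / 60
observed_rate = events_seen / window_hours   # inflated by the short window
true_rate = 1.0

print(round(observed_rate, 2))  # nearly double the true rate of 1.0
```

The same edge effect applies to CPU tick counters sampled over a short interval, which is how a GIL-bound process can momentarily appear above 100%.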
