Processes Not Appearing on Beowulf Cluster - python

More than a week ago, I ran nohup python3 -u script.py on an Ubuntu beowulf cluster I was connected to via SSH. I've now gone back wanting to kill off these processes (this program is using multiprocesing with a Pool object), but I haven't been able to do so, as I haven't been able to find the PIDs. I know that the processes are still being run because nohup.out is still being appended to and other data is being generated, but nothing relevant seems to appear when I run commands like ps or top. For example, when I run ps -x -U mkarrmann, I get:
PID TTY STAT TIME COMMAND
1296920 ? Ss 0:00 /lib/systemd/systemd --user
1296929 ? S 0:00 (sd-pam)
1296937 ? Ssl 0:00 /usr/bin/pulseaudio --daemonize=no --log-target=journal
1296939 ? SNsl 0:00 /usr/libexec/tracker-miner-fs
1296944 ? Ss 0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
1296945 ? R 0:00 sshd: mkarrmann#pts/0
1296960 ? Ssl 0:00 /usr/libexec/gvfsd
1296965 ? Sl 0:00 /usr/libexec/gvfsd-fuse /run/user/3016/gvfs -f -o big_writes
1296972 ? Ssl 0:00 /usr/libexec/gvfs-udisks2-volume-monitor
1296979 pts/0 Ss 0:00 -bash
1296980 ? Ssl 0:00 /usr/libexec/gvfs-gphoto2-volume-monitor
1296987 ? Ssl 0:00 /usr/libexec/gvfs-afc-volume-monitor
1296992 ? Ssl 0:00 /usr/libexec/gvfs-mtp-volume-monitor
1297001 ? Ssl 0:00 /usr/libexec/gvfs-goa-volume-monitor
1297005 ? Sl 0:00 /usr/libexec/goa-daemon
1297014 ? Sl 0:00 /usr/libexec/goa-identity-service
1297126 pts/0 R+ 0:00 ps -x -U mkarrmann
Or when I run ps -faux | grep py, I get:
root 975 0.0 0.0 34240 8424 ? Ss Jul28 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 1046 0.0 0.3 476004 245516 ? Ss Jul28 66:45 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
root 1275 0.0 0.0 20612 7732 ? S Jul28 0:00 \_ /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
mkarrma+ 1297143 0.0 0.0 6380 736 pts/0 S+ 14:40 0:00 \_ grep --color=auto py
Do any of these actually correspond to my Python processes and I'm just missing it? Anything else that I should try? I feel like the only thing I haven't tried is manually parsing through /proc, but that obviously shouldn't be necessary so I'm sure I'm missing something else.
I'm happy to provide any additional information that could be helpful. Thanks!

Related

Python script not working when launched at startup of RPi

RPi 1B with V1 cam.
Python script takes a picture when pushbutton hooked to gpio is pressed. Then picture is emailed via Mutt.
All working fine when doing step by step.
But not doing as intended when lauched automatically at startup.
import subprocess
from datetime import datetime
from gpiozero import Button
button = Button(17)
while True:
button.wait_for_press()
time = datetime.now()
filename = "capture-%04d%02d%02d-%02d%02d%02d.jpg" % (time.year, time.month, time.day, time.hour, time.minute, time.second)
subprocess.call("raspistill -t 500 -o %s" % filename, shell=True)
subprocess.call("echo "" | mutt -s 'Someone at the door' -i messageBody.txt myname#mailprovider.com -a %s" % filename, shell=True)
All working fine when typing :
$ python raspicam.py
I get a nice email within seconds with picture attached to it.
Next logical step is to get this script to be launched at startup:
$ nano launcher.sh
#!/bin/sh
# launcher.sh
cd /
cd home/pi
python doorbell02.py
cd /
$ chmod 755 launcher.sh
$ sh launcher.sh
Then get it to be launched at startup via cron :
$ mkdir logs
$ sudo crontab -e
add: #reboot sh /home/pi/launcher.sh >/home/pi/logs/cronlog 2>&1
At next reboot all working fine except sending mail via with mutt.
$ ps aux shows that my python script and script launcher belongs to "root"... is it where trouble comes from ?
root 475 0.0 0.0 0 0 ? S 16:51 0:00 [cifsd]
root 500 0.0 0.6 7932 2300 ? Ss 16:51 0:00 /usr/sbin/cron -f
root 502 0.0 0.6 9452 2384 ? S 16:51 0:00 /usr/sbin/CRON -f
root 506 0.0 0.3 1924 1148 ? Ss 16:51 0:00 /bin/sh -c sh /home/pi/launcher.sh >/home/pi/logs/cronlog 2>&1
root 511 0.0 0.2 1924 1108 ? S 16:51 0:00 sh /home/pi/launcher.sh
root 513 1.5 2.5 34348 9728 ? Sl 16:51 4:25 python doorbell02.py
I am also unable to get pdb to work alongside with my script to get some log or debug info...
Some hints would be very appreciated
Thank you very much for your time
Try using absolute paths in your code.
It helped me in my case.

Start nginx with apache (python app) as independent process

I'm managing Nginx with a custom web app (python) running over apache in a Debian 10 system.
It works fine, I can restart, reload, stop, check syntax of nginx without issue.
The problem arise when Nginx is started via custom web app (apache) and then if I restart/stop the apache via "service apache2 restart" or "/etc/init.d/apache2 restart", Nginx get stopped too.
the way that I start nginx is with python subprocess:
subprocess.Popen(['sudo', '/opt/waf/nginx/sbin/nginx'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
this works, but if I do a manual restart/stop of apache, the nginx service is stopped too.
I tried changing the nginx.conf from "user www-data" to "user root", to see if changing the user this can be solved, but no, the problem persist.
inspecting with ps aux --forest with "user www-data" in nginx.conf
root 2637 0.0 0.2 20312 7984 ? Ss 15:20 0:00 /usr/sbin/apache2 -k start
www-data 2638 2.4 1.9 461320 59732 ? Sl 15:20 0:09 \_ /usr/sbin/apache2 -k start
www-data 2639 0.0 0.9 2014312 28572 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
www-data 2640 0.0 0.9 2014328 28528 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
root 3322 0.0 1.9 190476 59156 ? Ss 15:27 0:00 nginx: master process /opt/waf/nginx/sbin/nginx
www-data 3323 0.0 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3324 0.0 2.8 212248 85232 ? S 15:27 0:00 \_ nginx: worker process
www-data 3325 0.0 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3326 0.5 2.8 212248 85172 ? S 15:27 0:00 \_ nginx: worker process
www-data 3327 0.0 2.1 190556 64440 ? S 15:27 0:00 \_ nginx: cache manager process
www-data 3328 0.0 2.1 190556 64440 ? S 15:27 0:00 \_ nginx: cache loader process
inspecting with ps aux --forest with "user root" in nginx.conf
root 2637 0.0 0.2 20312 7984 ? Ss 15:20 0:00 /usr/sbin/apache2 -k start
www-data 2638 5.2 1.5 455400 47264 ? Sl 15:20 0:01 \_ /usr/sbin/apache2 -k start
www-data 2639 0.2 0.8 2014236 24608 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
www-data 2640 0.1 0.7 2014328 23160 ? Sl 15:20 0:00 \_ /usr/sbin/apache2 -k start
root 2758 0.0 1.9 190476 59156 ? Ss 15:20 0:00 nginx: master process /opt/waf/nginx/sbin/nginx
root 2759 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2760 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2761 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2762 0.0 2.8 212248 85232 ? S 15:20 0:00 \_ nginx: worker process
root 2763 0.0 2.1 190556 64352 ? S 15:20 0:00 \_ nginx: cache manager process
root 2764 0.0 2.1 190556 64352 ? S 15:20 0:00 \_ nginx: cache loader process
I don't know how to solve this issue, I need nginx started by the web app as totally independent process.
Any help is really appreciated.
Cheers.
I guess this should help for you question:
Launch a completely independent process
I think there is great explanation of what are you tring to achieve with PIPEs.

Gunicorn has more than one thread despite --thread 1

I don't really understand up to now, how gunicorn works. What I currently see is, that if I start:
/usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
and than run
ps -aux | grep flaskVideo
I get this response
user 0.0 0.0 13464 1096 pts/14 S+ 10:33 0:00 grep --color=auto flaskVideo
user 13684 0.0 0.4 95624 34796 pts/7 S+ 10:20 0:00 /usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
user 13698 0.4 0.5 199228 45696 pts/7 S+ 10:20 0:03 /usr/bin/python3 /usr/bin/gunicorn -k eventlet --timeout 60 --log-level debug --workers=1 -b 0.0.0.0:5001 flaskVideoClient2:create_app(5001,10)
so it seems, that there is running more than one thread.
How do I have to interpretate the two running threads?

Hot reloading python process for code reload

Any way to hot reload python modules for a running python process? In usual cases we could run kill -HUP <pid> for some of the servers like squid, nginx,gunicorn. My running processes are
root 6 0.6 0.9 178404 39116 ? S 14:21 0:00 python3 ./src/app.py --config ./conf/config.yml
root 7 0.0 1.0 501552 43404 ? Sl 14:21 0:00 python3 ./src/app.py --config ./conf/config.yml
root 8 0.0 1.0 501808 43540 ? Sl 14:21 0:00 python3 ./src/app.py --config ./conf/config.yml
Is the question about reloading a Sanic app? If yes, then there is a hot reload built into the server.
app.run(debug=True)
Or if you want the reload without debugging
app.run(auto_reload=True)
See docs
Or, if this is a question in general, checkout aoiklivereload

CPU usage difference between ps aux and -ef

[user#centos-vm-02 ~]$ ps aux|grep python
user 4182 0.0 0.0 9228 1080 ? Ss 02:00 0:00 /bin/sh -c cd data/trandata && /usr/local/bin/python2.7 main.py >> /dev/null 2>&1
user 4190 0.1 0.1 341108 10740 ? Sl 02:00 0:52 /usr/local/bin/python2.7 main.py
user 4205 166 1.6 1175176 129312 ? Sl 02:00 901:39 /usr/local/bin/python2.7 main.py
user 10049 0.1 0.1 435856 10712 ? Sl 10:21 0:04 /usr/local/bin/python2.7 main.py
user 10051 71.1 2.5 948248 207628 ? Sl 10:21 28:42 /usr/local/bin/python2.7 main.py
user 10052 51.9 1.9 948380 154688 ? Sl 10:21 20:57 /usr/local/bin/python2.7 main.py
user 10053 85.9 0.9 815104 76652 ? Sl 10:21 34:41 /usr/local/bin/python2.7 main.py
user 11166 0.0 0.0 103240 864 pts/1 S+ 11:01 0:00 grep python
[user#centos-vm-02 ~]$ ps -ef|grep python
user 4182 4174 0 02:00 ? 00:00:00 /bin/sh -c cd /data/trandata && /usr/local/bin/python2.7 main.py >> /dev/null 2>&1
user 4190 4182 0 02:00 ? 00:00:52 /usr/local/bin/python2.7 main.py
user 4205 4190 99 02:00 ? 15:01:46 /usr/local/bin/python2.7 main.py
user 10049 1 0 10:21 ? 00:00:04 /usr/local/bin/python2.7 main.py
user 10051 10049 71 10:21 ? 00:28:47 /usr/local/bin/python2.7 main.py
user 10052 10049 51 10:21 ? 00:21:01 /usr/local/bin/python2.7 main.py
user 10053 10049 85 10:21 ? 00:34:45 /usr/local/bin/python2.7 main.py
user 11168 10904 0 11:01 pts/1 00:00:00 grep python
As we see, I launch a python process that it would spwan multiprocess, and inside the processes, multithreads are started, and inside the threads, multithreads are started.
Process tree like this:
main_process
--sub_process
----thread1
------sub_thread
------sub_thread
------sub_thread
------sub_thread
----thread2
----thread3
--sub_process
----......
Inside the picture, the pid-4205 shows different CPU usage in ps aux and ps -ef, one is 166, and the other is 99, 166 was also shown in top -c.
And I assure that the pid-4205 is one of the sub processes, which means it could not use more than 100% of CPU with GIL in python.
So that's my question, why ps -ef and ps aux show difference.
It's just a sampling artifact. Say a factory produces one car per hour. If you get there right before a car is made and leave right after a car is made, you can see two cars made in a span of time just over an hour, resulting in you thinking the factory is operating at near double capacity.
Update: Let me try to clarify the example. Say a factory produces one car per hour, on the hour. It is incapable of producing more than one car per hour. If you watch the factory from 7:59 to 9:01, you will see two cars produced (one at 8:00 and one at 9:00) in just over one hours (62 minutes). So you would estimate the factory produces about two cars per hour, nearly double its actual production. That is what happened here. It's a sampling artifact caused by top checking the CPU counters at just the wrong time.

Categories