I would like some help on how to set up properly a complicated job on a HPC. So, at some point in my python code I want to submit a job by using os.system("bsub -K < mama.sh") , I fould that the -K arg would actually wait for the job to end before continuing. So now I want from this mama.sh script to call 5 other jobs (kid1.sh, kid2.sh ... kid5.sh) that would run in parallel (to reduce computational time). Each one of these 5 children scripts will run a python piece of code. mama.sh should wait until all 5 other jobs have finished before continuing.
I thought of something like that:
#!/bin/sh
#BSUB -q hpc
#BSUB -J kids[1-5]
#BSUB -n 5
#BSUB -W 10:00
#BSUB -R "rusage[mem=6GB]"
#BSUB -R "span[hosts=1]"
# -- end of LSF options --
module load python3/3.8
python3 script%Ι.py
ORRR
python3 script1.py
python3 script2.py
python3 script3.py
python3 script4.py
python3 script5.py
Maybe the above doesn't make sense at all though. Is there any way to actually do that?
Thanks in advance
As is know to me, you can accomplish the goal in different level.
By two easy ways:
parallel your python code by import multiprocessing
parallel your shell script by &, command can be executed in the background.
python3 script1.py &
python3 script2.py
Related
For one gitlab CI runner
I have a jar file which needs to be continuosly running in the Git linux box but since this is a application which is continuosly running, the python script in the next line is not getting executed. How to run the jar application and then execute the python script simultaneously one after another?
.gitlab.ci-yml file:
pwd && ls -l
unzip ZAP_2.8.0_Core.zip && ls -l
bash scan.sh
python3 Report.py
scan.sh file has the code java -jar app.jar.
Since, this application is continuosly running, 4th line code python3 Report.py is not getting executed.
How do I make both these run simulataneously without the .jar application stopping?
The immediate solution would probably be:
pwd && ls -l
echo "ls OK"
unzip ZAP_2.8.0_Core.zip && ls -l
echo "unzip + ls OK"
bash scan.sh &
scanpid=$!
echo "started scanpid with pid $scanpid"]
ps axuf | grep $scanpid || true
echo "ps + grep OK"
( python3 Report.py ; echo $? > report_status.txt ) || true
echo "report script OK"
kill $scanpid
echo "kill OK"
echo "REPORT STATUS = $(cat report_status.txt)"
test $(cat report_status.txt) -eq 0
Start the java process in the background,
run your python code and remember its return status and always return true.
kill the background process after running python
check for the status code of the python script.
Perhaps this is not necessary, as I never checked how gitlabci deals with background processes, that were spawned by its runners.
I do here a conservative approach.
- I remember the process id of the bash script, so that I can kill it later
- I ensure, that the line running the python script always returns a 0 exit code such, that gitlabci does not stop executing the next lines, but I remember the status code
- then I kill the bash script
- then I check whether the exit code of the python script was 0 or not, such, that gitlabci can perform the proper checking whether the runner was executed successfully or not.
Another minor comment (not related to your question)
I don't really understand why you write
unzip ZAP_2.8.0_Core.zip && ls -l
instead of
unzip ZAP_2.8.0_Core.zip ; ls -l```
If you expect the unzip command to fail you could just write
unzip ZAP_2.8.0_Core.zip
ls -l
and gitlabci would abort automatically before executing ls -l
I also added many echo statements for better debugging, error analysis, you might remove them in your final solution.
To run the two scripts one after the other, you can add & to the end of the line that is blocking. That will make it run in the background.
Either do
bash scan.sh & or add & to the end of the line calling the jar file within the scan.sh...
I am trying to run shell code from a python file to submit another python file to a computing cluster. The shell code is as follows:
#BSUB -J Proc[1]
#BSUB -e ~/logs/proc.%I.%J.err
#BSUB -o ~/logs/proc.%I.%J.out
#BSUB -R "span[hosts=1]"
#BSUB -n 1
python main.py
But when I run it from python like the following I can't get it to work:
from os import system
system('bsub -n 1 < #BSUB -J Proc[1];#BSUB -e ~/logs/proc.%I.%J.err;#BSUB -o ~/logs/proc.%I.%J.out;#BSUB -R "span[hosts=1]";#BSUB -n 1;python main.py')
Is there something I'm doing wrong here?
If I understand correctly, all the #BSUB stuff is text that should be fed to the bsub command as input; bsub is run locally, then runs those commands for you on the compute node.
In that case, you can't just do:
bsub -n 1 < #BSUB -J Proc[1];#BSUB -e ~/logs/proc.%I.%J.err;#BSUB -o ~/logs/proc.%I.%J.out;#BSUB -R "span[hosts=1]";#BSUB -n 1;python main.py
That's interpreted by the shell as "run bsub -n 1 and read from a file named OH CRAP A COMMENT STARTED AND NOW WE DON'T HAVE A FILE TO READ!"
You could fix this with MOAR HACKERY (using echo or here strings taking further unnecessary dependencies on shell execution). But if you want to feed stdin input, the best solution is to use a more powerful tool for the task, the subprocess module:
# Open a process (no shell wrapper) that we can feed stdin to
proc = subprocess.Popen(['bsub', '-n', '1'], stdin=subprocess.PIPE)
# Feed the command series you needed to stdin, then wait for process to complete
# Per Michael Closson, can't use semi-colons, bsub requires newlines
proc.communicate(b'''#BSUB -J Proc[1]
#BSUB -e ~/logs/proc.%I.%J.err
#BSUB -o ~/logs/proc.%I.%J.out
#BSUB -R "span[hosts=1]"
#BSUB -n 1
python main.py
''')
# Assuming the exit code is meaningful, check it here
if proc.returncode != 0:
# Handle a failed process launch here
This avoids a shell launch entirely (removing the issue with needing to deal with comment characters at all, along with all the other issues with handling shell metacharacters), and is significantly more explicit about what is being run locally (bsub -n 1) and what is commands being run in the bsub session (the stdin).
The #BSUB directives are parsed by the bsub binary, which doesn't support ; as a delimiter. You need to use newlines. This worked for me.
#!/usr/bin/python
import subprocess;
# Open a process (no shell wrapper) that we can feed stdin to
proc = subprocess.Popen(['bsub', '-n', '1'], stdin=subprocess.PIPE)
# Feed the command series you needed to stdin, then wait for process to complete
input="""#!/bin/sh
#BSUB -J mysleep
sleep 101
"""
proc.communicate(input);
*** So obviously I got the python code from #ShadowRanger. +1 his answer. I would have posted this as a comment to his answer if SO supported python code in a comment.
I am trying to submit a python job with qsub which in turn submits several other jobs using subprocess and qsub.
I submit these jobs using 2 bash scripts shown below. run_test is the first one submitted and run_script is submit through subprocess.
$ cat run_test
#$ -cwd
#$ -V
#$ -pe openmpi 1
mpirun -n 1 python test_multiple_submit.py
$ cat run_script
#$ -cwd
#$ -V
#$ -pe openmpi 1
mpirun -n 1 python $1
I am having a problem with the second script where it seems to hang at the mpirun call. I was getting an error from bash before about 'module' not found but that has vanished recently.
A simplified version of the python script is shown below
import subprocess
subprocess.Popen(cmd)
subprocess.Popen('qsub run_script '+input)
<Some checks to see if jobs are still running>
The first subprocess runs a case on the current node and the second one should outsource the job to another node, then there are some checks to see if the jobs are still running. There are also some other bits to get other jobs submitted as well but I'm pretty sure this isn't a problem with the script.
Can anyone shed any light on why the second script is failing?
I found that the compute nodes on the cluster were not submit hosts therefore I was getting an error. The only submit host was the head node.
qconf -ss
The above lists the submit hosts. To add a node to the summit list as admin is shown below:
qconf -as < host name>
I have a python script i'd like to start on startup on an ubuntu ec2 instance but im running into troubles.
The script runs in a loop and takes care or exiting when its ready so i shouldn't need to start or stop it after its running.
I've read and tried a lot of approaches with various degrees of success and honestly im confused about whats the best approach. I've tried putting a shell script that starts the python script in /etc/init.d, making it executable and doing update-rc.d to try to get it to run but its failed at every stage.
here's the contents of the script ive tried:
#!/bin/bash
cd ~/Dropbox/Render\ Farm\ 1/appleseed/bin
while :
do
python ./watchfolder18.py -t ./appleseed.cli -u ec2 ../../data/
done
i then did
sudo chmod +x /etc/init.d/script_name
sudo sudo update-rc.d /etc/init.d/script_name defaults
This doesn't seem to run on startup and i cant see why, if i run the command manually it works as expected.
I also tried adding a line to rc.local to start the script but that doesn't seem to work either
Can anybody share what they have found is the simplest way to run a python script in the background with arguments on startup of an ec2 instance.
UPDATE: ----------------------
I've since moved this code to a file called /home/ubuntu/bin/watch_folder_start
#!/bin/bash
cd /home/ubuntu/Dropbox/Render\ Farm\ 1/appleseed/bin
while :
do
python ./watchfolder18.py -t ./appleseed.cli -u ec2 ../../data/
done
and changed my rc.local file to this:
nohup /home/ubuntu/bin/watch_folder_start &
exit 0
Which works when i manually run rc.local but wont fire on startup, i did chmod +x rc.local but that didn't change anything,
Your /etc/init.d/script_name is missing the plumbing that update-rc.d and so on use, and won't properly handle stop, start, and other init-variety commands, so...
For initial experimentation, take advantage of the /etc/init.d/rc.local script (which should be linked to by default from /etc/rc2/S99rc.local). The gets you out of having to worry about the init.d conventions and just add things to /etc/rc.local before the exit 0 at its end.
Additionally, that ~ isn't going to be defined, you'll need to use a full pathname - and furthermore the script will run as root. We'll address how to avoid this if desired in a bit. In any of these, you'll need to replace "whoeveryouare" with something more useful. Also be warned that you may need to prefix the python command with a su command and some arguments to get the process to run with the user id you might need.
You might try (in /etc/rc.local):
( if cd '/home/whoeveryouare/Dropbox/Render Farm 1/appleseed/bin' ; then
while : ; do
# This loop should respawn watchfolder18.py if it dies, but
# ideally one should fix watchfolder18.py and remove this loop.
python ./watchfolder18.py -t ./appleseed.cli -u ec2 ../../data/
done
else
echo warning: could not find watchfolder 1>&2
fi
) &
You could also put all that in a script and just call it from /etc/rc.local.
The first pass is roughly what you had, but if we assume that watchfolder18.py will arrange to avoid dying we can cut it down to:
( cd '/home/whoeveryouare/Dropbox/Render Farm 1/appleseed/bin' \
&& exec python ./watchfolder18.py -t ./appleseed.cli -u ec2 ../../data/ ) &
These aren't all that pretty, but it should let you get your daemon sorted out so you can debug it and so on, then come back to making a proper /etc/init.d or /etc/init script later. Something like this might work in /etc/init/watchfolder.conf, but I'm not yet facile enough to claim this is anything other than a rough stab at it:
# watchfolder - spawner for watchfolder18.py
description "watchfolder program"
start on runlevel [2345]
stop on runlevel [!2345]
script
if cd '/home/whoeveryouare/Dropbox/Render Farm 1/appleseed/bin' ; then
exec python ./watchfolder18.py -t ./appleseed.cli -u ec2 ../../data/0
fi
end script
I found that the best solution in the end was to use 'upstart' and create a file in etc/init called myfile.conf that contained the following
description "watch folder service"
author "Jonathan Topf"
start on startup
stop on shutdown
# Automatically Respawn:
respawn
respawn limit 99 5
script
HOST=`hostname`
chdir /home/ubuntu/Dropbox/Render\ Farm\ 1/appleseed/bin
exec /usr/bin/python ./watchfolder.py -t ./appleseed.cli -u $HOST ../../data/ >> /home/ubuntu/bin/ec2_server.log 2>&1
echo "watch_folder started"
end script
More info on using the upstart system here
http://upstart.ubuntu.com/
https://help.ubuntu.com/community/UbuntuBootupHowto
http://blog.joshsoftware.com/2012/02/14/upstart-scripts-in-ubuntu/
I have a python script to start a process which I want to monitor using Nagios. When I run that script and perform ps -ef on my ubuntu EC2 instance, it shows process as python <filename>.py --arguments. For Nagios to monitor that process using check_procs, we need to supply process name. Here process name becomes 'python'.
/usr/lib/nagios/plugins/check_procs -C python
It returns the output that one python process is running. This is fine when I'm running one python process. But If I'm running multiple python scripts and monitor only few, then I have to give that particular process name. If in the above command, I give python script name, it throws an error. So I want to mask whole python <filename>.py --arguments to some other name so that while performing check_procs, I can give that new name.
If anyone have any idea, please let me know. I have checked other stackoverflow questions which suggest changing python process name using setproctitle but I want to perform it using shell.
Regards,
Sanket
You can use the check_procs command to look at arguments, which includes the module name. The following command will let you know if the python module 'module.py' is running.
/usr/lib/nagios/plugins/check_procs -c 1:1 -a module.py -C python
The -c argument lets you set the critical range. 1:1 will trigger a critical status if there is more or less than 1 process that matches running.
The -a argument will filter based on processes that contain the args 'module.py' (change it to the name of the module you want to monitor)
The -C argument will make sure that the process is a python process
If you need help figuring out how to create the service definition, I had to figure that out too. Just let me know.
REFERENCE:
check_procs plugin manpage
http://nagiosplugins.org/man/check_procs
You can't change the process name from pure Python, although you can use a wrapper (for example, written in C) to do so.
However, what you should do instead is making your program a daemon, and using a pidfile. Have a look at the python Daemon API and its implementation python-daemon.
check_procs already handles this situation.
check_procs can tell the difference between scripts launched as an argument to the interpreter vs jobs run directly a hashbang interpreter. Even though both of these look the same in the ps output!! The latter case will not be listed in check_procs -C python!
If you run your scripts explicitly via python: python <filename.py>, then you can monitor them with the check_procs -C python -a filename.py.
If you put #!/usr/bin/python in your scripts and run them as ./filename.py, then you can monitor with check_procs -C filename.py.
Example command line session showing this behavior:
#make test.py directly executable. See code below
$ chmod a+x test.py
#launch via python explicitly:
$ /usr/bin/python ./test.py &
[1] 27094
$ check_procs -C python && check_procs -C test.py && check_procs -a test.py
PROCS OK: 1 process with command name 'python'
PROCS OK: 0 processes with command name 'test.py'
PROCS OK: 1 process with args 'test.py'
#launch via python implicitly
$ ./test.py &
[2] 27134
$ check_procs -C python && check_procs -C test.py && check_procs -a test.py
PROCS OK: 1 process with command name 'python'
PROCS OK: 1 process with command name 'test.py'
PROCS OK: 2 processes with args 'test.py'
#PS 'COMMAND' output looks the same
$ ps 27094 27134
PID TTY STAT TIME COMMAND
27094 pts/6 S 0:00 /usr/bin/python ./test.py
27134 pts/6 S 0:00 /usr/bin/python ./test.py
#kill the explicit test
$ kill 27094
[1] - terminated /usr/bin/python ./test.py
$ check_procs -C python && check_procs -C test.py && check_procs -a test.py
PROCS OK: 0 processes with command name 'python'
PROCS OK: 1 process with command name 'test.py'
PROCS OK: 1 process with args 'test.py'
#kill the implicit test
$ kill 27134
[2] + terminated ./test.py
$ check_procs -C python && check_procs -C test.py && check_procs -a test.py
PROCS OK: 0 processes with command name 'python'
PROCS OK: 0 processes with command name 'test.py'
PROCS OK: 0 processes with args 'test.py'
test.py is a python script that sleeps for 2 minutes. It is chmod +x and has a hashbang #! line invoking /usr/bin/python.
#!/usr/bin/python
import time
time.sleep(120)
Create a pid file and use that file for the process lookup with nagios.
I'm not saying this is the best solution (it wouldn't scale well at all), but you can create a symbolic link to the python command and execute your script using this link. e.g.
ln -s `which python` ~/mypython
~/mypython myscript.py
Scripts launched using the link should show up as mypython in ps.
You can use subprocess.Popen to change the executable name, but you'd have to use a wrapper script (or some weird fork magic). The following code causes ps to list the executable as kwyjibo /tmp/test.py instead of /usr/bin/python /tmp/test.py:
import subprocess
p = subprocess.Popen(['kwyjibo', '/tmp/test.py'], executable='/usr/bin/python')