Qsub job using subprocess from worker node on cluster - python

I am trying to submit a python job with qsub which in turn submits several other jobs using subprocess and qsub.
I submit these jobs using the two bash scripts shown below; run_test is submitted first and run_script is submitted through subprocess.
$ cat run_test
#$ -cwd
#$ -V
#$ -pe openmpi 1
mpirun -n 1 python test_multiple_submit.py
$ cat run_script
#$ -cwd
#$ -V
#$ -pe openmpi 1
mpirun -n 1 python $1
I am having a problem with the second script: it seems to hang at the mpirun call. I was previously getting a bash error about 'module' not being found, but that has since disappeared.
A simplified version of the python script is shown below
import subprocess
subprocess.Popen(cmd)                               # runs the first case on the current node
subprocess.Popen(['qsub', 'run_script', input])     # submits the second case to the queue via qsub
<Some checks to see if jobs are still running>
The first subprocess runs a case on the current node and the second should farm the job out to another node; then there are some checks to see if the jobs are still running. There are also some other bits that submit further jobs, but I'm fairly sure the problem isn't with the script itself.
Can anyone shed any light on why the second script is failing?

I found that the compute nodes on the cluster were not submit hosts, which is why I was getting an error: the only submit host was the head node.
qconf -ss
The above lists the submit hosts. To add a node to the submit list (as an admin):
qconf -as <hostname>
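Once the compute nodes are submit hosts, the <Some checks to see if jobs are still running> step from the question can be done by polling the queue. A minimal sketch of that check expressed in shell, assuming SGE's qstat is on PATH and that the sub-jobs carry a recognisable name (here run_script, which is an assumption):
#!/bin/bash
# Sketch: block until no job named "run_script" remains in this user's queue.
while qstat -u "$USER" | grep -q run_script; do
    sleep 30    # poll every 30 seconds
done
echo "all sub-jobs have finished"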

Related

Force flushing from inside a bash script to a stdout file

I'm trying to flush to stdout the output of a bioinformatics tool written in Python (the ETE Toolkit). I tried the stdbuf command described in Force flushing of output to a file while bash script is still running, but it does not work, because as far as I can tell stdbuf can only be run on an external command, not on a bash function (How to use stdbuf on a bash function).
From the Python side I also found the following call, which might be useful:
import sys
sys.stdout.flush()
But I don't know how I can use it from within the bash script attached below.
The problem is that if I only use the -o and -e options in the bash script (as you can see), the output is written to logs_40markers in bursts rather than continuously, which does not let me see the error as it happens. I can do it working directly from the shell, but my internet connection is not stable and there is a power outage practically every night, so I have to restart a command that takes at least a week.
#!/bin/bash
#$ -N tree
#$ -o logs_40markers
#$ -e logs_40markers
#$ -q all.q#compute-0-3
#$ -l mf=100G
stdbuf -oL
module load apps/etetoolkit-3.1.2
export QT_QPA_PLATFORM='offscreen'
ete3 build -w mafft_default-none-none-none -m sptree_fasttree_all -o provaflush --cogs coglist_species_filtered.txt -a multifasta_speciesunique.fa --clearall --cpu 40
&> logs_40markers
Thanks in advance if someone can give me some guidance or advice.
Maggi
A colleague of mine, an informatician, solved the problem using the PYTHONUNBUFFERED environment variable.
#!/bin/bash
#$ -N tree
#$ -o logs_40markers
#$ -e logs_40markers
#$ -q all.q#compute-0-3
#$ -l mf=100G
module load apps/etetoolkit-3.1.2
export QT_QPA_PLATFORM='offscreen'
export PYTHONUNBUFFERED="TRUE"
ete3 build -w mafft_default-none-none-none -m sptree_fasttree_all -o provaflush --cogs coglist_species_filtered.txt -a multifasta_speciesunique.fa --clearall --cpu 40 --v 4
To check on the progress of the process, type in the shell:
tail -f logs_40markers
(the -f flag means follow). I hope someone finds this solution helpful.
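For reference, the PYTHONUNBUFFERED variable has the same effect as Python's -u flag. If you prefer to keep everything on the command line, a hedged sketch (this assumes the ete3 entry point is a plain Python script on PATH and that the loaded module provides python3, which may not hold on every install):
# run the ete3 entry point with unbuffered output instead of exporting PYTHONUNBUFFERED
python3 -u "$(command -v ete3)" build -w mafft_default-none-none-none -m sptree_fasttree_all \
    -o provaflush --cogs coglist_species_filtered.txt \
    -a multifasta_speciesunique.fa --clearall --cpu 40 --v 4
Note also that stdbuf only affects the buffering of the command it launches; on a line of its own, as in the question's script, it has no effect on the commands that follow.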

How to run my python script in parallel with another Java application on the same Linux box in GitLab CI?

For a single GitLab CI runner:
I have a jar file which needs to run continuously on the GitLab Linux box, but because that application runs continuously, the python script on the next line never gets executed. How can I run the jar application and the python script at the same time?
.gitlab-ci.yml file:
pwd && ls -l
unzip ZAP_2.8.0_Core.zip && ls -l
bash scan.sh
python3 Report.py
The scan.sh file contains the command java -jar app.jar.
Since this application runs continuously, the 4th line, python3 Report.py, never gets executed.
How do I make both of these run simultaneously without stopping the .jar application?
The immediate solution would probably be:
pwd && ls -l
echo "ls OK"
unzip ZAP_2.8.0_Core.zip && ls -l
echo "unzip + ls OK"
bash scan.sh &
scanpid=$!
echo "started scanpid with pid $scanpid"]
ps axuf | grep $scanpid || true
echo "ps + grep OK"
( python3 Report.py ; echo $? > report_status.txt ) || true
echo "report script OK"
kill $scanpid
echo "kill OK"
echo "REPORT STATUS = $(cat report_status.txt)"
test $(cat report_status.txt) -eq 0
In short:
Start the java process in the background,
run your python code, remember its return status, and always return true,
kill the background process after running python,
then check the status code of the python script.
Killing the background process may not even be necessary, as I never checked how GitLab CI deals with background processes spawned by its runners, so I take a conservative approach here:
- I remember the process id of the backgrounded script so that I can kill it later.
- I ensure that the line running the python script always returns a 0 exit code, so that GitLab CI does not stop executing the next lines, while saving the real status code to a file.
- Then I kill the background script.
- Then I check whether the exit code of the python script was 0, so that GitLab CI can correctly report whether the job succeeded.
Another minor comment (not related to your question)
I don't really understand why you write
unzip ZAP_2.8.0_Core.zip && ls -l
instead of
unzip ZAP_2.8.0_Core.zip ; ls -l
If you are worried that the unzip command might fail, you could just write
unzip ZAP_2.8.0_Core.zip
ls -l
and GitLab CI would abort automatically before executing ls -l.
I also added many echo statements for easier debugging and error analysis; you can remove them in your final solution.
To let the python script run while the jar is still going, you can add & to the end of the line that is blocking; that will make it run in the background.
Either do bash scan.sh & in the CI script, or add & to the end of the line calling the jar file within scan.sh.
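For example, inside scan.sh (a sketch; the pidfile name zap.pid is illustrative):
#!/bin/bash
# run the ZAP jar in the background so the CI job can move on to the next line
java -jar app.jar &
echo $! > zap.pid    # hypothetical pidfile so a later step can kill the process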

Running background process with kubectl exec

I am trying to execute a Python program as a background process inside a container with kubectl as below (kubectl issued on local machine):
kubectl exec -it <container_id> -- bash -c "cd some-dir && (python xxx.py --arg1 abc &)"
When I log in to the container and check ps -ef, I do not see this process running. There is also no output from the kubectl command itself.
Is the kubectl command issued correctly?
Is there a better way to achieve the same?
How can I see the output/logs printed off the background process being run?
If I need to stop this background process after some duration, what is the best way to do this?
The nohup Wikipedia page can help; you need to redirect all three IO streams (stdout, stdin and stderr) - an example with yes:
kubectl exec pod -- bash -c "yes > /dev/null 2> /dev/null &"
nohup is not required in the above case because I did not allocate a pseudo terminal (no -t flag) and the shell was not interactive (no -i flag) so no HUP signal is sent to the yes process on session termination. See this answer for more details.
Redirecting /dev/null to stdin is not required in the above case since stdin already refers to /dev/null (you can see this by running ls -l /proc/YES_PID/fd in another shell).
To see the output you can instead redirect stdout to a file.
To stop the process you'd need to identify the PID of the process you want to stop (pgrep could be useful for this purpose) and send it a fatal signal (kill PID, for example).
If you want to stop the process after a fixed duration, timeout might be a better option.
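Putting those pieces together, a hedged sketch (the pod name, paths and log file are illustrative, and it assumes pkill and timeout exist in the container image):
# start the script detached, capturing its output in a file inside the container
kubectl exec <pod> -- bash -c "cd some-dir && python xxx.py --arg1 abc > /tmp/xxx.log 2>&1 &"
# follow the output later
kubectl exec <pod> -- tail -f /tmp/xxx.log
# stop it by matching its command line, or bound its runtime up front with timeout
kubectl exec <pod> -- pkill -f "python xxx.py"
kubectl exec <pod> -- bash -c "cd some-dir && timeout 3600 python xxx.py --arg1 abc > /tmp/xxx.log 2>&1 &"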
Actually, the best way to do this kind of thing is to add an entrypoint to your container and run the commands there.
Like:
entrypoint.sh:
#!/bin/bash
set -e
cd some-dir && (python xxx.py --arg1 abc &)
./somethingelse.sh
exec "$#"
That way you don't need to go into every single container manually and run the command.

Exit code 191 when running a python script to run a shell file

I'm trying to use a python script to run a series of OOMMF simulations on a Unix cluster, but I'm getting stuck at the point where I send a command from python to bash. I'm using the line:
subprocess.check_call(['qsub', 'shellfile.sh'])
This fails with exit code 191. What is exit code 191? I can't seem to find it documented anywhere online. It may be a PBS error rather than a Unix error, but I'm not sure. The error doesn't seem to come from the shell file itself, since the only commands in there are:
#!/bin/bash
# This is an example submit script for the hello world program.
# OPTIONS FOR PBS PRO ==============================================================
#PBS -l walltime=1:00:00
# This specifies the job should run for no longer than 1 hour
#PBS -l select=1:ncpus=8:mem=2048mb
# This specifies the job needs 1 'chunk', with 8 CPU cores and 2048 MB of RAM (memory).
#PBS -j oe
# This joins up the error and output into one file rather than making two files
##PBS -o $working_folder/$PBS_JOBID-oommf_log
# This sends your output to the file "hello_output" rather than the standard filename
# OPTIONS FOR PBS PRO ==============================================================
#PBS -P HPCA-000987-EFR
#PBS -M ppxsb3@nottingham.ac.uk
#PBS -m abe
# Here we just use Unix command to run our program
echo "Running on hostname"
sleep 20
echo "Finished job now""
Which should just print the hostname and 'Finished job now'
Thanks
Exit code 191 indicates that the project code associated with the job is invalid. That code is set on line 13 of the script:
#PBS -P HPCA-000987-EFR
which tells the cluster which project the job is associated with.
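A quick way to confirm that this kind of failure comes from qsub itself rather than from Python is to run the same submission by hand and inspect the exit status (sketch):
qsub shellfile.sh
echo "qsub exit status: $?"    # 191 here points at the job options (e.g. the -P project code), not the script body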

Changing Process Name using Shell for nagios monitoring with check_procs

I have a python script that starts a process which I want to monitor using Nagios. When I run that script and do ps -ef on my Ubuntu EC2 instance, it shows the process as python <filename>.py --arguments. For Nagios to monitor that process using check_procs, we need to supply the process name, and here the process name becomes 'python'.
/usr/lib/nagios/plugins/check_procs -C python
It reports that one python process is running, which is fine when I'm only running one python process. But if I'm running multiple python scripts and want to monitor only some of them, I have to give each one a distinct process name. If I give the python script's name in the above command, it throws an error. So I want to mask the whole python <filename>.py --arguments command line under some other name, so that I can pass that new name to check_procs.
If anyone has any ideas, please let me know. I have checked other Stack Overflow questions, which suggest changing the python process name using setproctitle, but I want to do it from the shell.
Regards,
Sanket
You can use the check_procs command to look at the process arguments, which include the module name. The following command will let you know whether the python module 'module.py' is running.
/usr/lib/nagios/plugins/check_procs -c 1:1 -a module.py -C python
The -c argument lets you set the critical range; 1:1 will trigger a critical status if more or fewer than 1 matching process is running.
The -a argument will filter based on processes that contain the args 'module.py' (change it to the name of the module you want to monitor)
The -C argument will make sure that the process is a python process
If you need help figuring out how to create the service definition, I had to figure that out too. Just let me know.
REFERENCE:
check_procs plugin manpage
http://nagiosplugins.org/man/check_procs
You can't change the process name from pure Python, although you can use a wrapper (for example, written in C) to do so.
However, what you should do instead is make your program a daemon and use a pidfile. Have a look at the python daemon API and its implementation, python-daemon.
check_procs already handles this situation.
check_procs can tell the difference between scripts launched as an argument to the interpreter and jobs run directly via a hashbang (#!) interpreter line, even though both of these look the same in the ps output! The latter case will not be listed by check_procs -C python!
If you run your scripts explicitly via python: python <filename.py>, then you can monitor them with the check_procs -C python -a filename.py.
If you put #!/usr/bin/python in your scripts and run them as ./filename.py, then you can monitor with check_procs -C filename.py.
Example command line session showing this behavior:
#make test.py directly executable. See code below
$ chmod a+x test.py
#launch via python explicitly:
$ /usr/bin/python ./test.py &
[1] 27094
$ check_procs -C python && check_procs -C test.py && check_procs -a test.py
PROCS OK: 1 process with command name 'python'
PROCS OK: 0 processes with command name 'test.py'
PROCS OK: 1 process with args 'test.py'
#launch via python implicitly
$ ./test.py &
[2] 27134
$ check_procs -C python && check_procs -C test.py && check_procs -a test.py
PROCS OK: 1 process with command name 'python'
PROCS OK: 1 process with command name 'test.py'
PROCS OK: 2 processes with args 'test.py'
#PS 'COMMAND' output looks the same
$ ps 27094 27134
PID TTY STAT TIME COMMAND
27094 pts/6 S 0:00 /usr/bin/python ./test.py
27134 pts/6 S 0:00 /usr/bin/python ./test.py
#kill the explicit test
$ kill 27094
[1] - terminated /usr/bin/python ./test.py
$ check_procs -C python && check_procs -C test.py && check_procs -a test.py
PROCS OK: 0 processes with command name 'python'
PROCS OK: 1 process with command name 'test.py'
PROCS OK: 1 process with args 'test.py'
#kill the implicit test
$ kill 27134
[2] + terminated ./test.py
$ check_procs -C python && check_procs -C test.py && check_procs -a test.py
PROCS OK: 0 processes with command name 'python'
PROCS OK: 0 processes with command name 'test.py'
PROCS OK: 0 processes with args 'test.py'
test.py is a python script that sleeps for 2 minutes. It is chmod +x and has a hashbang #! line invoking /usr/bin/python.
#!/usr/bin/python
import time
time.sleep(120)
Create a pidfile and use that file for the process lookup with Nagios.
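A minimal sketch of that idea (paths are illustrative, and the check would still need to be wrapped as a Nagios command/plugin):
#!/bin/bash
# start the script and record its PID in a pidfile
python /path/to/filename.py --arguments &
echo $! > /var/run/filename.pid

# later, a check can verify that the recorded PID is still alive
if kill -0 "$(cat /var/run/filename.pid)" 2>/dev/null; then
    echo "OK: filename.py is running"; exit 0            # Nagios exit code 0 = OK
else
    echo "CRITICAL: filename.py is not running"; exit 2  # Nagios exit code 2 = CRITICAL
fi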
I'm not saying this is the best solution (it wouldn't scale well at all), but you can create a symbolic link to the python command and execute your script using this link. e.g.
ln -s `which python` ~/mypython
~/mypython myscript.py
Scripts launched using the link should show up as mypython in ps.
You can use subprocess.Popen to change the executable name, but you'd have to use a wrapper script (or some weird fork magic). The following code causes ps to list the executable as kwyjibo /tmp/test.py instead of /usr/bin/python /tmp/test.py:
import subprocess
p = subprocess.Popen(['kwyjibo', '/tmp/test.py'], executable='/usr/bin/python')
