Running a job on multiple nodes of a GridEngine cluster - python

I have access to a 128-core cluster on which I would like to run a parallelised job. The cluster uses Sun GridEngine and my program is written to run using Parallel Python, numpy and scipy on Python 2.5.8. Running the job on a single node (4 cores) yields roughly a 3.5x improvement over a single core. I would now like to take this to the next level and split the job across ~4 nodes. My qsub script looks something like this:
#!/bin/bash
# The name of the job, can be whatever makes sense to you
#$ -N jobname
# The job should be placed into the queue 'all.q'.
#$ -q all.q
# Redirect output stream to this file.
#$ -o jobname_output.dat
# Redirect error stream to this file.
#$ -e jobname_error.dat
# The batch system should use the current directory as the working directory.
# Both output files will be placed in the current directory.
# The batch system expects to find the executable in this directory.
#$ -cwd
# request Bourne shell as shell for job.
#$ -S /bin/sh
# print date and time
date
# spython is the server's Python 2.5 interpreter; using python instead of spython runs the program under Python 2.3
spython programname.py
# print date and time again
date
Does anyone have any idea of how to do this?

Yes, you need to include the Grid Engine option -np 16 either in your script like this:
# Use 16 processors
#$ -np 16
or on the command line when you submit the script. Or, for more permanent arrangements, use an .sge_request file.
On all the GE installations I've ever used, this will give you 16 processors (or processor cores, these days) on as few nodes as necessary: if your nodes have 4 cores you'll get 4 nodes, if they have 8 cores you'll get 2, and so on. Placing the job on, say, 2 cores on each of 8 nodes (which you might want to do if each process needs a lot of memory) is a little more complicated, and you should consult your support team.
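For concreteness, here is how the question's script might look with the parallel request added. This is a sketch only: the -np 16 line follows the answer above, while many installations instead expose slot counts through a parallel environment, in which case the directive would be #$ -pe <pe_name> 16 with a name listed by qconf -spl (both the count of 16 and the environment name are placeholders).
#!/bin/bash
#$ -N jobname
#$ -q all.q
#$ -o jobname_output.dat
#$ -e jobname_error.dat
#$ -cwd
#$ -S /bin/sh
# Request 16 slots, spread over as few nodes as necessary
#$ -np 16
date
spython programname.py
date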

Related

Run multiple files consecutively via SLURM with individual timeout

I have a python script I run on HPC that takes a list of files in a text file and starts multiple SBATCH runs:
./launch_job.sh 0_folder_file_list.txt
launch_job.sh goes through 0_folder_file_list.txt and starts an sbatch job for each file:
SAMPLE_LIST=`cut -d "." -f 1 $1`
for SAMPLE in $SAMPLE_LIST
do
    echo "Getting accessions from $SAMPLE"
    sbatch get_acc.slurm $SAMPLE
    #./get_job.slurm $SAMPLE
done
get_job.slurm has all of my SBATCH information, module loads, etc. and performs
srun --mpi=pmi2 -n 5 python python_script.py ${SAMPLE}.txt
I don't want to start all of the jobs at once; I would like them to run consecutively, each with a 24-hour maximum run time. I have already set my SBATCH -t to allow for a maximum time, but I only want each job to run for a maximum of 24 hours. Is there an srun argument I can set that will accomplish this? Something else?
You can use the --wait flag with sbatch.
-W, --wait
Do not exit until the submitted job terminates. The exit code of the sbatch command will be the same as the exit code of the submitted job. If the job terminated due to a signal rather than a normal exit, the exit code will be set to 1. In the case of a job array, the exit code recorded will be the highest value for any task in the job array.
In your case,
for SAMPLE in $SAMPLE_LIST
do
    echo "Getting accessions from $SAMPLE"
    sbatch --wait get_acc.slurm $SAMPLE
done
So the next sbatch command will only be called after the previous one finishes (either because your job ended or because its time limit was reached).
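If you also want to enforce the 24-hour cap per job from the submission side (rather than relying on the #SBATCH -t inside get_acc.slurm), sbatch's --time option can be combined with --wait. A sketch reusing the loop above:
for SAMPLE in $SAMPLE_LIST
do
    echo "Getting accessions from $SAMPLE"
    # --wait blocks until this job finishes; --time caps it at 24 hours
    sbatch --wait --time=24:00:00 get_acc.slurm $SAMPLE
done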

How to immediately submit all Snakemake jobs to slurm cluster

I'm using snakemake to build a variant calling pipeline that can be run on a SLURM cluster. The cluster has login nodes and compute nodes. Any real computing should be done on the compute nodes in the form of an srun or sbatch job. Jobs are limited to 48 hours of runtime. My problem is that with many samples, especially when the queue is busy, it will take more than 48 hours to process all the rules for every sample. The traditional cluster execution for snakemake leaves a master thread running that only submits rules to the queue after all the rule's dependencies have finished running. I'm supposed to run this master program on a compute node, so this limits the runtime of my entire pipeline to 48 hours.
I know SLURM jobs have dependency directives that tell a job to wait to run until other jobs have finished. Because the snakemake workflow is a DAG, is it possible to submit all the jobs at once, with each job's dependencies defined by the rule's dependencies from the DAG? After all the jobs are submitted, the master thread would complete, circumventing the 48-hour limit. Is this possible with snakemake, and if so, how does it work? I've found the --immediate-submit command line option, but I'm not sure whether it has the behavior I'm looking for, or how to use it, because my cluster prints Submitted batch job [id] after a job is submitted to the queue instead of just the job id.
Immediate submit unfortunately does not work out of the box and needs some tuning, because the way dependencies between jobs are passed along differs between cluster systems. A while ago I struggled with the same problem. As the immediate-submit docs say:
Immediately submit all jobs to the cluster instead of waiting for present input files. This will fail, unless you make the cluster aware of job dependencies, e.g. via: $ snakemake --cluster 'sbatch --dependency {dependencies}'. Assuming that your submit script (here sbatch) outputs the generated job id to the first stdout line, {dependencies} will be filled with space separated job ids this job depends on.
So the problem is that sbatch does not output just the generated job id on the first stdout line; it prints Submitted batch job [id]. However, we can circumvent this with our own shell script:
parseJobID.sh:
#!/bin/bash
# helper script that parses slurm output for the job ID,
# and feeds it to back to snakemake/slurm for dependencies.
# This is required when you want to use the snakemake --immediate-submit option
if [[ "Submitted batch job" =~ "$#" ]]; then
echo -n ""
else
deplist=$(grep -Eo '[0-9]{1,10}' <<< "$#" | tr '\n' ',' | sed 's/.$//')
echo -n "--dependency=aftercorr:$deplist"
fi;
And make sure to give the script execute permission with chmod +x parseJobID.sh.
We can then call immediate submit like this:
snakemake --cluster 'sbatch $(./parseJobID.sh {dependencies})' --jobs 100 --notemp --immediate-submit
Note that this will submit at max 100 jobs at the same time. You can increase or decrease this to any number you like, but know that most cluster systems do not allow more than 1000 jobs per user at the same time.
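To sanity-check the helper you can call it by hand. The job id below is made up; the quoted string stands in for the sbatch output line that snakemake passes along as a dependency:
$ ./parseJobID.sh 'Submitted batch job 123456'
--dependency=aftercorr:123456
Called with no arguments (a rule with no dependencies), it prints nothing, so sbatch receives no extra option.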

Run python script when computer is not being used?

I recently got into machine learning. I'm running a Python script that is heavy on my processor. My first idea was to set up a cron job that runs in the background, and then cancel the job from Python if the time is between 06:00 and 07:00 in the morning. (The job should ideally only be cancelled at certain stages.)
0 1 * * * cd ~/web/im2txt/im2txt && ./train.sh >/Users/kristoffer/Desktop/train.py 2>/Users/kristoffer/Desktop/train.log
But then I got thinking: is there some way, either in Python or via the shell, to run a script only when the computer is not being used? When it is idle, or something like that?
xscreensaver can run any program that is specified in its configuration file, e.g.:
programs: \
qix -root \n\
ico -r -faces -sleep 1 -obj ico \n\
xdaliclock -builtin2 -root \n\
xv -root -rmode 5 image.gif -quit \n
Then you can add your own program and let xscreensaver do the rest, determining when your computer is idle.
The standard way to make your program run with lower priority compared to other processes is using the nice command:
nice -n 20 ./train.sh
The command will run all the time, but the scheduler will give it the lowest possible priority, effectively giving it CPU time only when there is nothing else to do.
Note, however, that nice will only make the process nice (hence the name) to other processes. If no other processes are competing for CPU time, a CPU-hungry process will utilize 100% of the available cores (and heat up the machine), even when niced to the lowest priority.
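Tying this back to the cron line in the question, the same trick can be applied directly in the crontab entry. A sketch reusing the question's paths, with stdout sent to a .out file and 19 used as the largest niceness value Linux accepts:
0 1 * * * cd ~/web/im2txt/im2txt && nice -n 19 ./train.sh >/Users/kristoffer/Desktop/train.out 2>/Users/kristoffer/Desktop/train.log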

SLURM and python, nodes are allocated, but the code only runs on one node

I have a 4*64 CPU cluster. I installed SLURM, and it seems to be working: if I call sbatch I get the proper allocation and queue. However, if I use more than 64 cores (so basically more than 1 node), it allocates the correct number of nodes, but if I ssh into the allocated nodes I only see actual work in one of them. The rest just sit there doing nothing.
My code is complex, and it uses multiprocessing. I call pools with around 300 workers, so I guess that should not be the problem.
What I would like to achieve is to call sbatch myscript.py on, say, 200 cores, and have SLURM distribute my run across those 200 cores, rather than just allocating the correct number of nodes and actually using only one.
The header of my python script looks like this:
#!/usr/bin/python3
#SBATCH --output=SLURM_%j.log
#SBATCH --partition=part
#SBATCH -n 200
and I call the script with sbatch myscript.py.
Unfortunately, multiprocessing does not allow working on several nodes. From the documentation:
the multiprocessing module allows the programmer to fully leverage
multiple processors on a given machine
One option, often used with Slurm, is to use MPI (with the MPI4PY package), but MPI is considered to be 'the assembly language of parallel programming' and you will need to modify your code extensively.
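For a feel of what the MPI route involves, here is a minimal mpi4py sketch; the work split and names are illustrative, not taken from your code, and it would be launched with something like srun -n 200 python3 myscript.py:
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this task's index across all nodes, 0..size-1
size = comm.Get_size()   # total number of tasks Slurm started

# Illustrative split: each rank processes every size-th item of a shared list.
items = list(range(1000))
partial = sum(items[rank::size])   # stand-in for the real per-rank work

# Combine the per-rank results on rank 0.
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print("total =", total)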
Another option is to look into the Parallel Processing packages for one that suits your needs and requires minimal changes to your code. See also this other question for more insights.
A final note: it is perfectly fine to put the #SBATCH directives in the Python script and use the Python shebang. But as Slurm executes a copy of the script rather than the script itself, you must add a line such as
sys.path.append(os.getcwd())
at the beginning of the script (but after the #SBATCH lines) to make sure Python finds any module located in your directory.
I think your sbatch script should not be inside the python script. Rather, it should be a normal bash script that includes the #SBATCH options and then runs the actual script with srun, like the following:
#!/usr/bin/bash
#SBATCH --output=SLURM_%j.log
#SBATCH --partition=part
#SBATCH -n 200
srun python3 myscript.py
I suggest testing this with a simple python script like this:
import multiprocessing as mp

def main():
    print("cpus =", mp.cpu_count())

if __name__ == "__main__":
    main()
I tried to get around using different Python libraries by using srun with the following bash script; srun should run it on each node allocated to you. The basic idea is that the script determines which node it is running on and assigns it a node id of 0, 1, ..., nnodes-1. It then passes that information to the python program along with a thread id. In the program I combine these two numbers to make a distinct id for each cpu on each node. This code assumes that there are 16 cores on each node and that 10 nodes are going to be used.
#!/bin/bash
# figure out which of the allocated nodes this copy of the script is running on
nnames=(`scontrol show hostnames`)
nnodes=${#nnames[@]}
nIDs=`seq 0 $(($nnodes-1))`
nID=0
for i in $nIDs
do
    hname=`hostname`
    if [ "${nnames[$i]}" == "$hname" ]
    then nID=$i
    fi
done
# start one python worker per core (16 per node), passing the node id, thread id
# and the total worker count (16 cores * 10 nodes = 160)
tIDs=`seq 0 15`
for tID in $tIDs
do
    python testDataFitting2.py $nID $tID 160 &
done
wait
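On the Python side, the two arguments can be folded into one distinct id per core. A sketch of the receiving end; the script name comes from the loop above, but the argument handling here is my guess at it:
import sys

node_id = int(sys.argv[1])    # 0 .. nnodes-1, computed by the bash wrapper
thread_id = int(sys.argv[2])  # 0 .. 15, one per core on the node
total = int(sys.argv[3])      # 160 = 16 cores * 10 nodes in this setup

distinct_id = node_id * 16 + thread_id   # unique across the whole allocation
print("worker", distinct_id, "of", total)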

python capture and store curl command process id

I have a Python script which uses threading to run a process. It starts three threads executing curl commands on a different server.
I am not using threading sub-processes or any Python sub-processes (as my version of Python does not support it), and I want to kill these specific running curl processes on the other server.
I want to capture the process id at the time the curl command is executed, put it in a list, and then use this list of PIDs to kill the processes if necessary.
What is the best way to do this ?
I have tried a few ways, but nothing is working.
curl command & echo $!
will print the value to the screen, but I want to capture it. I have tried setting it as a variable, exporting it, and so on.
If I try:
ret,output = remoteConnection.runCmdGetOutput(cmd)
it returns the output (which includes the pid) of one curl command (the first one) but not all three (I think this has to do with the threading), and the parent script continues.
Any ideas?
Thanks
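A shell-level sketch of the idea behind the echo $! attempt above, with made-up URL and file names: record each curl PID on the remote host as it is launched, then kill from that list later.
# launch curl in the background and append its PID to a file (URL and paths are made up)
curl -s http://example.com/bigfile -o /tmp/bigfile &
echo $! >> /tmp/curl_pids.txt

# later, kill every recorded curl process
kill $(cat /tmp/curl_pids.txt)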
