How to immediately submit all Snakemake jobs to a SLURM cluster - python

I'm using Snakemake to build a variant-calling pipeline that can be run on a SLURM cluster. The cluster has login nodes and compute nodes. Any real computing should be done on the compute nodes, in the form of an srun or sbatch job. Jobs are limited to 48 hours of runtime. My problem is that processing many samples, especially when the queue is busy, can take more than 48 hours. The traditional cluster execution for Snakemake leaves a master thread running that only submits a rule to the queue after all of that rule's dependencies have finished running. I'm supposed to run this master program on a compute node, which limits the runtime of my entire pipeline to 48 hours.
I know SLURM jobs have dependency directives that tell a job to wait to run until other jobs have finished. Because the Snakemake workflow is a DAG, is it possible to submit all the jobs at once, with each job's dependencies defined by the rule's dependencies from the DAG? After all the jobs were submitted, the master thread would complete, circumventing the 48-hour limit. Is this possible with Snakemake, and if so, how does it work? I've found the --immediate-submit command-line option, but I'm not sure it has the behavior I'm looking for, or how to use it, because my cluster prints Submitted batch job [id] after a job is submitted to the queue, instead of just the job id.

Immediate submit unfortunately does not work out of the box, but needs some tuning, because the way dependencies between jobs are passed along differs between cluster systems. A while ago I struggled with the same problem. As the --immediate-submit docs say:
Immediately submit all jobs to the cluster instead of waiting for
present input files. This will fail, unless you make the cluster aware
of job dependencies, e.g. via: $ snakemake --cluster 'sbatch
--dependency {dependencies}'. Assuming that your submit script (here
sbatch) outputs the generated job id to the first stdout line,
{dependencies} will be filled with space separated job ids this job
depends on.
So the problem is that sbatch does not output the generated job id to the first stdout line. However, we can circumvent this with our own shell script:
parseJobID.sh:
#!/bin/bash
# helper script that parses slurm output for the job ID,
# and feeds it back to snakemake/slurm for dependencies.
# This is required when you want to use the snakemake --immediate-submit option
if [[ "Submitted batch job" =~ "$*" ]]; then
  echo -n ""
else
  deplist=$(grep -Eo '[0-9]{1,10}' <<< "$*" | tr '\n' ',' | sed 's/.$//')
  echo -n "--dependency=aftercorr:$deplist"
fi
And make sure to give the script execute permission with chmod +x parseJobID.sh.
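You can sanity-check the extraction pipeline without a cluster by feeding it some sample sbatch output (the job IDs here are invented):

```shell
# Two dependency lines as snakemake would pass them via {dependencies}.
out='Submitted batch job 1234 Submitted batch job 5678'
# Same pipeline as in parseJobID.sh: pull out the numeric IDs,
# join them with commas, and drop the trailing comma.
deplist=$(printf '%s\n' "$out" | grep -Eo '[0-9]{1,10}' | tr '\n' ',' | sed 's/.$//')
echo "--dependency=aftercorr:$deplist"
# prints: --dependency=aftercorr:1234,5678
```

(Recent Slurm versions also provide sbatch --parsable, which prints just the job ID and can remove the need for this parsing.)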
We can then call immediate submit like this:
snakemake --cluster 'sbatch $(./parseJobID.sh {dependencies})' --jobs 100 --notemp --immediate-submit
Note that this will submit at most 100 jobs at the same time. You can raise or lower this number as you like, but be aware that many clusters limit how many jobs a user may have queued at once (often on the order of 1000).

Related

Run multiple files consecutively via SLURM with individual timeout

I have a Python script I run on HPC that takes a list of files in a text file and submits multiple sbatch runs:
./launch_job.sh 0_folder_file_list.txt
launch_job.sh goes through 0_folder_file_list.txt and submits an sbatch job for each file
SAMPLE_LIST=`cut -d "." -f 1 $1`
for SAMPLE in $SAMPLE_LIST
do
echo "Getting accessions from $SAMPLE"
sbatch get_acc.slurm $SAMPLE
#./get_job.slurm $SAMPLE
done
get_job.slurm has all of my SBATCH information, module loads, etc. and performs
srun --mpi=pmi2 -n 5 python python_script.py ${SAMPLE}.txt
I don't want to start all of the jobs at the same time; I would like them to run consecutively, with a 24-hour maximum run time for each. I have already set my SBATCH -t to allow for a maximum time, but I only want each job to run for a maximum of 24 hours. Is there an srun argument I can set that will accomplish this? Something else?
You can use the --wait flag with sbatch.
-W, --wait
Do not exit until the submitted job terminates. The exit code of the sbatch command will be the same as the exit code of the submitted
job. If the job terminated due to a signal rather than a normal exit,
the exit code will be set to 1. In the case of a job array, the exit
code recorded will be the highest value for any task in the job array.
In your case,
for SAMPLE in $SAMPLE_LIST
do
echo "Getting accessions from $SAMPLE"
sbatch --wait get_acc.slurm $SAMPLE
done
So, the next sbatch command will only be called after the previous one finishes (either because your job ended or because it hit its time limit).
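Because --wait makes sbatch exit with the job's own exit code, the loop can also stop early when a sample fails. A local sketch with a stand-in for sbatch (the exit code 7 is arbitrary):

```shell
# Stand-in for `sbatch --wait get_acc.slurm $SAMPLE`; on the cluster
# this would block until the job finishes and return its exit code.
submit() { sh -c 'exit 7' ; }
submit
status=$?
echo "job finished with exit code $status"
# A real loop could bail out on failure with:
#   sbatch --wait get_acc.slurm "$SAMPLE" || break
```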

How to extract with Python the list of ids of jobs running on an LSF cluster?

I am currently writing a Python script that launches many simulations in parallel by calling this command repeatedly:
os.system("bsub -q reg -app ... file.cir")
I need to retrieve the list of job IDs so I know exactly when all the jobs are completed and can then process the data. My idea is simply to loop over the job ID list and check whether each job is completed.
I have tried using getpid(), but I believe that only gives me the ID of the running Python process, not the LSF jobs.
I know bjobs lists the jobs that are running, but I do not see how to parse its output from my Python script.
How can I do that?
Or is there an easier way to find out when all the processes I run on the LSF cluster are finished?
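One approach is to capture each ID at submission time rather than scraping bjobs afterwards: bsub prints a line like "Job <12345> is submitted to queue <reg>." (the exact wording may vary between LSF versions, so treat the format as an assumption). A shell sketch of the extraction, which could equally be done with re.search in Python:

```shell
# A line like the one bsub typically prints; the format and the
# job ID here are assumptions for illustration.
line='Job <12345> is submitted to queue <reg>.'
# Capture the digits between the first pair of angle brackets.
jobid=$(printf '%s\n' "$line" | sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p')
echo "$jobid"
# prints: 12345
```

With the IDs collected, you can poll each one with bjobs (e.g. looking for DONE or EXIT in its STAT column); from Python, subprocess.check_output(["bjobs", jobid]) gives you the same text to parse.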

Is there any way to run a secondary python script at regular intervals to work on output of a primary script in Slurm?

I am submitting a batch script that involves a primary command/script (an MPI process) that outputs data, and I need to evaluate the progress of the primary process by running a secondary Python script at fixed time intervals while the primary process is still running. Is there any command that would allow me to do this with a Slurm batch script?
As an example, suppose the primary process takes 24 hours. If I simply place the Python script after the primary command/script, it would only run at the end of the primary process. I need the Python command/script to run every hour to process the data generated by the primary process. Is this possible on Slurm?
The structure of the script would look like this:
#! /bin/bash
#SBATCH ...
#SBATCH ...
while : ; do sleep 3600 ; python <secondary script> ; done &
mpirun <primary command>
The idea is to run the secondary script in an infinite loop in the background. When the primary command finishes, the job is terminated and the background loop is stopped.
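A scaled-down local version of this pattern (one-second ticks standing in for the hourly interval, and sleep 3 standing in for the mpirun step) shows the mechanics:

```shell
tickfile=$(mktemp)
# Background "secondary script": wake up every second and record a tick.
( while : ; do sleep 1 ; echo tick >> "$tickfile" ; done ) &
loop_pid=$!
sleep 3                        # stand-in for the long-running primary command
kill "$loop_pid" 2>/dev/null   # in a real job, Slurm does this cleanup at job end
ticks=$(wc -l < "$tickfile")
echo "recorded $ticks progress ticks"
rm -f "$tickfile"
```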

Run python script when computer is not being used?

I recently got into machine learning. I'm running a Python script that is heavy on my processor. My first idea was to set up a cron job that runs in the background, and then cancel the job from Python if the time is between 06:00 and 07:00 in the morning. (The job should ideally only be canceled at certain stages.)
0 1 * * * cd ~/web/im2txt/im2txt && ./train.sh >/Users/kristoffer/Desktop/train.py 2>/Users/kristoffer/Desktop/train.log
But then I got thinking: is there some way, either in Python or via the shell, to run a script only when the computer is not being used? When it is idle, or something like that?
xscreensaver can run any program specified in its configuration file, e.g.:
programs: \
qix -root \n\
ico -r -faces -sleep 1 -obj ico \n\
xdaliclock -builtin2 -root \n\
xv -root -rmode 5 image.gif -quit \n
then you can add your own and let xscreensaver do the rest, determining when your computer is idle.
The standard way to make your program run with lower priority relative to other processes is the nice command:
nice -n 19 ./train.sh
The command will run all the time, but the scheduler will give it the lowest possible priority (niceness 19), effectively giving it CPU time only when there is nothing else to do.
Note, however, that nice only makes the process nice (hence the name) to other processes. If no other processes are competing for CPU time, a CPU-hungry process will still utilize 100% of the available cores (and heat up the machine), even when niced to the lowest priority.
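A quick way to confirm the priority drop: when given no command, the coreutils nice prints the current niceness, so running it under nice -n 19 should report 19 (assuming the shell started at the default niceness of 0):

```shell
# The child inherits the requested niceness; `nice` with no command
# simply prints the niceness it is running at.
nice -n 19 nice
```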

Running a job on multiple nodes of a GridEngine cluster

I have access to a 128-core cluster on which I would like to run a parallelised job. The cluster uses Sun GridEngine and my program is written to run using Parallel Python, numpy, scipy on Python 2.5.8. Running the job on a single node (4-cores) yields an ~3.5x improvement over a single core. I would now like to take this to the next level and split the job across ~4 nodes. My qsub script looks something like this:
#!/bin/bash
# The name of the job, can be whatever makes sense to you
#$ -N jobname
# The job should be placed into the queue 'all.q'.
#$ -q all.q
# Redirect output stream to this file.
#$ -o jobname_output.dat
# Redirect error stream to this file.
#$ -e jobname_error.dat
# The batchsystem should use the current directory as working directory.
# Both files will be placed in the current
# directory. The batchsystem assumes to find the executable in this directory.
#$ -cwd
# request Bourne shell as shell for job.
#$ -S /bin/sh
# print date and time
date
# spython is the server's version of Python 2.5. Using python instead of spython causes the program to run in python 2.3
spython programname.py
# print date and time again
date
Does anyone have any idea of how to do this?
Yes, you need to request 16 slots through a Grid Engine parallel environment, either in your script like this:
# Use 16 slots; the parallel environment name (here 'mpi') is
# site-specific -- list the available ones with qconf -spl
#$ -pe mpi 16
or on the command line when you submit the script (qsub -pe mpi 16 ...). Or, for more permanent arrangements, use an .sge_request file.
How those slots are spread over nodes depends on the parallel environment's allocation rule. With the common fill-up rule you get the slots on as few nodes as necessary, so if your nodes have 4 cores you'll get 4 nodes, if they have 8 you'll get 2, and so on. To place the job on, say, 2 cores on each of 8 nodes (which you might want to do if you need a lot of memory for each process) is a little more complicated, requiring a parallel environment with a matching allocation rule, and you should consult your support team.
