I'm trying to flush the stdout of a bioinformatics program written in Python (the ETE Toolkit). I tried the stdbuf command described in "Force flushing of output to a file while bash script is still running", but it does not work: from what I have read, stdbuf can only be applied from the shell, not from within a bash script (see "How to use stdbuf on a bash function").
From Python itself, I also discovered the following call that might be relevant:
import sys
sys.stdout.flush()
but I don't know how to make use of it from the bash script attached below.
The issue is that if I only use the -o and -e options in the bash script (as shown), the output is written to logs_40markers in a non-continuous way, which prevents me from seeing errors as they occur. I can make it work by running the command directly from the shell, but my internet connection is unstable (there is a power outage practically every night), and I would have to restart a command that takes at least a week to finish.
#!/bin/bash
#$ -N tree
#$ -o logs_40markers
#$ -e logs_40markers
#$ -q all.q#compute-0-3
#$ -l mf=100G
stdbuf -oL
module load apps/etetoolkit-3.1.2
export QT_QPA_PLATFORM='offscreen'
ete3 build -w mafft_default-none-none-none -m sptree_fasttree_all -o provaflush --cogs coglist_species_filtered.txt -a multifasta_speciesunique.fa --clearall --cpu 40
&> logs_40markers
A colleague of mine, an informatician, solved the problem using the PYTHONUNBUFFERED environment variable.
#!/bin/bash
#$ -N tree
#$ -o logs_40markers
#$ -e logs_40markers
#$ -q all.q#compute-0-3
#$ -l mf=100G
module load apps/etetoolkit-3.1.2
export QT_QPA_PLATFORM='offscreen'
export PYTHONUNBUFFERED="TRUE"
ete3 build -w mafft_default-none-none-none -m sptree_fasttree_all -o provaflush --cogs coglist_species_filtered.txt -a multifasta_speciesunique.fa --clearall --cpu 40 --v 4
To check on the running process, follow the log file from the shell:
tail -f logs_40markers
(-f means follow.)
I hope someone finds this solution helpful.
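To tie this back to the sys.stdout.flush() idea from the question: the environment variable, the -u interpreter flag, and explicit flushing are three routes to the same line-by-line behavior. A minimal sketch (the child program here is a stand-in, not ETE's code):

```python
import os
import subprocess
import sys

# 1. PYTHONUNBUFFERED, as in the answer above: no code changes needed.
env = dict(os.environ, PYTHONUNBUFFERED="TRUE")
out1 = subprocess.run(
    [sys.executable, "-c", "print('step done')"],
    env=env, capture_output=True, text=True,
).stdout

# 2. The -u flag is equivalent to setting PYTHONUNBUFFERED.
out2 = subprocess.run(
    [sys.executable, "-u", "-c", "print('step done')"],
    capture_output=True, text=True,
).stdout

# 3. From inside the program itself, flush explicitly after each write.
print("step done", flush=True)  # or: sys.stdout.write(...); sys.stdout.flush()
```

Any one of the three is enough; PYTHONUNBUFFERED is the least invasive when you cannot modify the program being run.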
I am pretty new to bash, and I have some basic questions.
I have one job array job_array_1.sh, which I am running in Hoffman2.
job_array_1.sh is the following:
#!/bin/bash
#$ -cwd
#$ -o test.joblog.$JOB_ID.$TASK_ID
#$ -j y
#$ -l h_data=5G,h_rt=00:20:00
#$ -m n
#$ -t 1-5:1
. /u/local/Modules/default/init/modules.sh
module load anaconda3
#module load python/3.9.6
python3 file1.py $SGE_TASK_ID
If I type qsub job_array_1.sh from the terminal, this produces 5 different files named test.joblog.$JOB_ID.$TASK_ID (with the task index as $TASK_ID). Note that this way the 5 jobs start in parallel.
I need to create another file, call it loop.sh, that submits job_array_1.sh sequentially (in this case twice). So far I have:
#$ -cwd
#$ -j y
#$ -l h_data=3G,h_rt=01:00:00
#$ -m n
for ((i=1; i<=2; i++)); do
# job submission scripts or shell scripts
fname_in1="job_array_1.sh"
./$fname_in1 &
wait
done
When I type qsub loop.sh from the terminal, it does not produce the 5 files that qsub job_array_1.sh does. How can I modify loop.sh so that job_array_1.sh produces the 5 files?
I'm guessing wildly here because I don't know anything about your job submission system, but I do know a little about bash and am trying to help. I suspect you need something more like this:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -l h_data=3G,h_rt=01:00:00
#$ -m n
for ((i=1; i<=2; i++)); do
# job submission scripts or shell scripts
echo "Loop: $i"
qsub job_array_1.sh &
done
wait
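One caveat, in case the loop is meant to make the second array wait for the first: both qsub calls above return as soon as the jobs are queued, so the two arrays can still run concurrently. Grid Engine can serialize submissions with job dependencies; a sketch, with job names chosen arbitrarily for illustration:

```shell
# Submit the first array under an explicit name, then make the second
# submission wait for it to finish (-hold_jid). Both flags are standard
# Grid Engine qsub options; the names "arr1"/"arr2" are just examples.
qsub -N arr1 job_array_1.sh
qsub -N arr2 -hold_jid arr1 job_array_1.sh
```

This is a scheduler-side fragment and only runs on a host configured for qsub submission.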
How can I run a sourced bash script, then change directories, and then run a command, all within the same shell (using Python)? Is this even possible?
My Attempt:
subprocess.check_call(["env -i bash -c 'source ./init-build ARG'", "cd ../myDir", "bitbake myBoard"], shell =True)
I would write this out for you, but I would need to see the absolute paths. Here is an example:
subprocess.check_call(["""/usr/bin/env bash -c "cd /home/x/y/tools && source /home/x/y/venv/bin/activate && python asdf.py" >> /tmp/asdf.txt 2>&1"""], shell=True)
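The key point in the one-liner above is that source, cd, and the final command must all live inside a single bash -c string, chained with &&, so they share one shell. A self-contained sketch of the same pattern (the sourced file and the GREETING variable are made up for illustration, standing in for init-build and bitbake):

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for the real environment script (e.g. ./init-build ARG).
    setup = os.path.join(workdir, "setup.sh")
    with open(setup, "w") as f:
        f.write("export GREETING=hello\n")

    # source + cd + command all in ONE shell invocation, chained with &&.
    cmd = "source {s} && cd {d} && echo $GREETING".format(s=setup, d=workdir)
    result = subprocess.run(
        ["/usr/bin/env", "bash", "-c", cmd],
        capture_output=True, text=True, check=True,
    )
```

If the three steps were passed as separate list items (as in the question's attempt), they would not share state; only the single chained string keeps the sourced variables and the working directory alive for the final command.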
I am trying to run a pssh command inside a shell script, but the script freezes and no connections are made, as verified with a ps -ef command (and this is easy to check, because there is only one host in the hosts file I am using).
At this point, Control-C fails to kill the script, and it will not timeout. Only a kill command works.
If I run the same command on the command-line, there is no issue. Also, a pscp command in the same script causes no issues, so it seems that the required libraries are being loaded.
$ cat /home/myusername/tmp/hosts
mysinglehostname
Here is the script being run:
$ cat /home/myusername/bin/testpssh
#!/bin/bash
source ~/.bashrc
$HOME/path/to/python-virtualenv/bin/pscp -h "/home/myusername/tmp/hosts" "/tmp/garbage" "/tmp/garbage"
$HOME/path/to/python-virtualenv/bin/pssh -h "/home/myusername/tmp/hosts" -l myusername -p 512 -t 3 -o "out" -O GSSAPIAuthentication=no -i "whoami"
Here is what happens when I run the script:
$ /home/myusername/bin/testpssh &
[1] 18553
$ [1] 14:51:12 [SUCCESS] mysinglehostname 22
$ ps -ef | grep pssh
myusername 18580 18553 0 14:33 pts/16 00:00:00 /home/myusername/path/to/python-virtualenv/bin/python /home/myusername/path/to/python-virtualenv/bin/pssh -h /home/myusername/tmp/hosts -l myusername -p 512 -t 3 -o out -O GSSAPIAuthentication=no -i whoami
$ ## The script above is hanging after completing the pscp, before pssh completes.\
> But if I copy and paste the process line, it works fine as shown here:
$ /home/myusername/path/to/python-virtualenv/bin/python \
> /home/myusername/path/to/python-virtualenv/bin/pssh \
> -h /home/myusername/tmp/hosts -l myusername \
> -p 512 -t 3 -o out -O GSSAPIAuthentication=no -i whoami
[1] 14:59:03 [SUCCESS] mysinglehostname 22
myusername
$
The first [SUCCESS] above is for the pscp action, and no subsequent [SUCCESS] comes from the pssh command, unless it is performed explicitly on the command-line.
Why will the pssh command not work inside the bash shell script?
The script works fine if I use ksh instead of bash in the shebang line (and remove the line that sources ~/.bashrc).
I am on Red Hat 6.4, using Python 2.6.6.
I am trying to run shell code from a Python file to submit another Python file to a computing cluster. The shell code is as follows:
#BSUB -J Proc[1]
#BSUB -e ~/logs/proc.%I.%J.err
#BSUB -o ~/logs/proc.%I.%J.out
#BSUB -R "span[hosts=1]"
#BSUB -n 1
python main.py
But when I run it from Python as follows, I can't get it to work:
from os import system
system('bsub -n 1 < #BSUB -J Proc[1];#BSUB -e ~/logs/proc.%I.%J.err;#BSUB -o ~/logs/proc.%I.%J.out;#BSUB -R "span[hosts=1]";#BSUB -n 1;python main.py')
Is there something I'm doing wrong here?
If I understand correctly, all the #BSUB stuff is text that should be fed to the bsub command as input; bsub is run locally, then runs those commands for you on the compute node.
In that case, you can't just do:
bsub -n 1 < #BSUB -J Proc[1];#BSUB -e ~/logs/proc.%I.%J.err;#BSUB -o ~/logs/proc.%I.%J.out;#BSUB -R "span[hosts=1]";#BSUB -n 1;python main.py
That's interpreted by the shell as "run bsub -n 1 and read from a file named OH CRAP A COMMENT STARTED AND NOW WE DON'T HAVE A FILE TO READ!"
You could fix this with MOAR HACKERY (using echo or here strings taking further unnecessary dependencies on shell execution). But if you want to feed stdin input, the best solution is to use a more powerful tool for the task, the subprocess module:
import subprocess

# Open a process (no shell wrapper) that we can feed stdin to
proc = subprocess.Popen(['bsub', '-n', '1'], stdin=subprocess.PIPE)
# Feed the command series you needed to stdin, then wait for process to complete
# Per Michael Closson, can't use semi-colons, bsub requires newlines
proc.communicate(b'''#BSUB -J Proc[1]
#BSUB -e ~/logs/proc.%I.%J.err
#BSUB -o ~/logs/proc.%I.%J.out
#BSUB -R "span[hosts=1]"
#BSUB -n 1
python main.py
''')
# Assuming the exit code is meaningful, check it here
if proc.returncode != 0:
    raise RuntimeError('bsub exited with status %d' % proc.returncode)
This avoids a shell launch entirely (removing the issue with needing to deal with comment characters at all, along with all the other issues with handling shell metacharacters), and is significantly more explicit about what is being run locally (bsub -n 1) and what is commands being run in the bsub session (the stdin).
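For completeness, the "hackery" route alluded to above can be done reasonably cleanly in pure shell with a here-document, which sidesteps the comment-character problem (sketch only; assumes bsub is available on the submission host):

```shell
# Feed the #BSUB directives to bsub on stdin via a here-document.
# Quoting 'EOF' prevents the shell from expanding anything inside.
bsub -n 1 <<'EOF'
#BSUB -J Proc[1]
#BSUB -e ~/logs/proc.%I.%J.err
#BSUB -o ~/logs/proc.%I.%J.out
#BSUB -R "span[hosts=1]"
#BSUB -n 1
python main.py
EOF
```

The subprocess version is still preferable from Python, since it avoids the extra shell layer entirely.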
The #BSUB directives are parsed by the bsub binary, which doesn't support ; as a delimiter. You need to use newlines. This worked for me.
#!/usr/bin/python
import subprocess

# Open a process (no shell wrapper) that we can feed stdin to
proc = subprocess.Popen(['bsub', '-n', '1'], stdin=subprocess.PIPE)
# Feed the command series you needed to stdin, then wait for process to complete
input = """#!/bin/sh
#BSUB -J mysleep
sleep 101
"""
proc.communicate(input)
So obviously I got the Python code from @ShadowRanger; +1 his answer. I would have posted this as a comment on his answer if SO supported Python code in comments.
I am trying to submit a Python job with qsub which in turn submits several other jobs using subprocess and qsub.
I submit these jobs using the 2 bash scripts shown below: run_test is submitted first, and run_script is submitted through subprocess.
$ cat run_test
#$ -cwd
#$ -V
#$ -pe openmpi 1
mpirun -n 1 python test_multiple_submit.py
$ cat run_script
#$ -cwd
#$ -V
#$ -pe openmpi 1
mpirun -n 1 python $1
I am having a problem with the second script: it seems to hang at the mpirun call. I was previously getting an error from bash about 'module' not being found, but that has recently disappeared.
A simplified version of the python script is shown below
import subprocess
subprocess.Popen(cmd)
subprocess.Popen('qsub run_script '+input)
<Some checks to see if jobs are still running>
The first subprocess runs a case on the current node and the second one should outsource the job to another node; then there are some checks to see if the jobs are still running. There are some other bits that submit further jobs as well, but I'm pretty sure that part of the script isn't the problem.
Can anyone shed any light on why the second script is failing?
I found that the compute nodes on the cluster were not submit hosts, which is why I was getting an error; the only submit host was the head node.
qconf -ss
The above lists the submit hosts. To add a node to the submit list (as admin):
qconf -as <hostname>