I have written a bash script that combines several Unix commands and Python scripts. The goal is a pipeline for detecting long non-coding RNA from a given input. Ultimately I would like to turn this into an 'app' and host it on a bioinformatics website. One problem I am facing is using getopt tools in bash; I couldn't find a tutorial that I clearly understand. Any other comments on the code are also appreciated.
#!/bin/bash
if [ "$1" == "-h" ]
then
echo "Usage: sh $0 cuffcompare_output reference_genome blast_file"
exit
else
wget https://github.com/TransDecoder/TransDecoder/archive/2.0.1.tar.gz && tar xvf 2.0.1.tar.gz && rm 2.0.1.tar.gz
makeblastdb -in $3 -dbtype nucl -out $3.blast.out
grep '"u"' $1 | \
gffread -w transcripts_u.fa -g $2 - && \
python2.7 get_gene_length_filter.py transcripts_u.fa transcripts_u_filter.fa && \
TransDecoder-2.0.1/TransDecoder.LongOrfs -t transcripts_u_filter.fa
sed 's/ .*//' transcripts_u_filter.fa | grep ">" | sed 's/>//' > transcripts_u_filter.fa.genes
cd transcripts_u_filter.fa.transdecoder_dir
sed 's/|.*//' longest_orfs.cds | grep ">" | sed 's/>//' | uniq > longest_orfs.cds.genes
grep -v -f longest_orfs.cds.genes ../transcripts_u_filter.fa.genes > longest_orfs.cds.genes.not.genes
sed 's/^/>/' longest_orfs.cds.genes.not.genes > temp && mv temp longest_orfs.cds.genes.not.genes
python ../extract_sequences.py longest_orfs.cds.genes.not.genes ../transcripts_u_filter.fa longest_orfs.cds.genes.not.genes.fa
blastn -query longest_orfs.cds.genes.not.genes.fa -db ../$3.blast.out -out longest_orfs.cds.genes.not.genes.fa.blast.out -outfmt 6
python ../filter_sequences.py longest_orfs.cds.genes.not.genes.fa.blast.out longest_orfs.cds.genes.not.genes.fa.blast.out.filtered
grep -v -f longest_orfs.cds.genes.not.genes.fa.blast.out.filtered longest_orfs.cds.genes.not.genes.fa > lincRNA_final.fa
fi
Here is how I run it:
sh test.sh cuffcompare_out_annot_no_annot.combined.gtf /mydata/db/Brapa_sequence_v1.2.fa TE_RNA_transcripts.fa
If you wanted the call to be:
test -c cuffcompare_output -r reference_genome -b blast_file
You would have something like:
#!/bin/bash
while getopts ":b:c:hr:" opt; do
    case $opt in
        b)
            blastfile=$OPTARG
            ;;
        c)
            comparefile=$OPTARG
            ;;
        h)
            echo "USAGE : test -c cuffcompare_output -r reference_genome -b blast_file"
            exit 0
            ;;
        r)
            referencegenome=$OPTARG
            ;;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            exit 1
            ;;
        :)
            echo "Option -$OPTARG requires an argument." >&2
            exit 1
            ;;
    esac
done
In the string ":b:c:hr:",
- the first ":" tells getopts that we'll handle any errors,
- subsequent letters are the allowable flags. If the letter is followed by a ':', then getopts will expect that flag to take an argument, and supply that argument as $OPTARG
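As a rough sketch of how the parsed values might then be checked, the loop can be wrapped in a function that also verifies that all three files were given. The function name parse_args and the rule that every option is mandatory are assumptions for illustration, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: parse -b/-c/-r into shell variables and
# fail if any of the three is missing.
parse_args() {
    OPTIND=1    # reset so the function can be called more than once
    blastfile='' comparefile='' referencegenome=''
    while getopts ":b:c:hr:" opt; do
        case $opt in
            b) blastfile=$OPTARG ;;
            c) comparefile=$OPTARG ;;
            r) referencegenome=$OPTARG ;;
            h) echo "USAGE: $0 -c cuffcompare_output -r reference_genome -b blast_file"
               exit 0 ;;
            \?) echo "Invalid option: -$OPTARG" >&2; return 1 ;;
            :)  echo "Option -$OPTARG requires an argument." >&2; return 1 ;;
        esac
    done
    # In this sketch all three options are treated as mandatory.
    if [ -z "$blastfile" ] || [ -z "$comparefile" ] || [ -z "$referencegenome" ]; then
        echo "Missing required option; run with -h for usage." >&2
        return 1
    fi
}
```

In the pipeline script this would be called as parse_args "$@" || exit 1, after which "$comparefile", "$referencegenome" and "$blastfile" take the place of $1, $2 and $3.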
Is there a way to create a Makefile/script that will fail if a file from coverage.py library has values below a certain threshold? Say 80%.
For coverage.py you can use the fail_under option.
First, gather test coverage with the coverage run command, then run the coverage report command.
Option 1.
Run via coverage report command directly
coverage report --fail-under=80
Option 2.
Use a configuration file, defining the value in the [report] section of .coveragerc
[report]
fail_under = 80
Then run the coverage report command
coverage report
It will return a non-zero exit code if the coverage is below the value of fail_under.
If you want something more fine-grained than an overall percentage threshold, you can try this coverage goals program I threw together: https://nedbatchelder.com/blog/202111/coverage_goals.html
I figured it out with the following. Feel free to copy if it fits your needs!
The Makefile goes as follows:
test: $(TEST_DIRECTORY)/*/*
	rm -f $(TEST_OUTPUT_DIRECTORY)/coverage_tests.log
	cd $(TEST_DIRECTORY)/inference && coverage run -m unittest discover
	for file in $^ ; do \
		TEST_FILE=$$(echo $${file} | grep -E -o 'test_.*') ; \
		FILE=$$(echo $${TEST_FILE} | sed 's/test_//') ; \
		if [ -n "$${FILE}" ] ; then \
			cd $(TEST_DIRECTORY)/inference && coverage report --include $(INFERENCE_DIRECTORY)/$${FILE} > $(TEST_OUTPUT_DIRECTORY)/coverage_temp.log ; \
			cd $(TEST_DIRECTORY) && ./coverage_check.sh $${FILE} < $(TEST_OUTPUT_DIRECTORY)/coverage_temp.log >> $(TEST_OUTPUT_DIRECTORY)/coverage_tests.log ; \
			rm -f $(TEST_OUTPUT_DIRECTORY)/coverage_temp.log ; \
		fi ; \
	done
	rm -f $(TEST_DIRECTORY)/inference/.coverage
and it utilizes a coverage check script which is here:
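#!/bin/bash
# Reads a coverage report on stdin; $1 is the file name to print.
while read -r line; do
    for word in $line; do
        # The first word containing "%" is the coverage figure.
        if [[ "$word" == *"%"* ]]; then
            COVERAGE=$(echo "$word" | sed 's/%//g')
            if [ "$COVERAGE" -gt 80 ]
            then
                echo "$1: PASSED $word"
                break 2
            else
                echo "$1: FAILED $word"
                break 2
            fi
        fi
    done
done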
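For what it's worth, here is a sketch of the same check written so that it also runs under plain sh and returns a non-zero status on failure, which is what make needs in order to stop. The function name check_coverage, the threshold argument and the choice to treat a value equal to the threshold as passing (as fail_under does) are my assumptions; it also assumes whole-number percentages.

```shell
#!/bin/sh
# Hypothetical helper: read a "coverage report" table on stdin and
# compare the first percentage found against a threshold.
# $1: label to print, $2: minimum percentage (defaults to 80).
check_coverage() {
    label=$1
    threshold=${2:-80}
    while read -r line; do
        for word in $line; do
            case $word in
                *%)
                    pct=${word%"%"}    # strip the trailing percent sign
                    if [ "$pct" -ge "$threshold" ]; then
                        echo "$label: PASSED $word"
                        return 0
                    fi
                    echo "$label: FAILED $word"
                    return 1
                    ;;
            esac
        done
    done
    echo "$label: no percentage found" >&2
    return 1
}
```

It would be used like coverage report --include some_file.py | check_coverage some_file.py 80, where some_file.py is only a placeholder name.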
I am new to bash and wanted to learn what this code is trying to do, if it is done poorly or with errors and how it can be improved.
COMMAND=$1
case $COMMAND in
"upgrade")
UPSCRIPT=`ls -t ./assets/upgrade | head -n1`
python ./assets/upgrade/$UPSCRIPT | tee >> biglog.txt
VERSION=$(echo $UPSCRIPT | awk -F. '{print $1}')
echo `date` $VERSION > ./version.txt
test -e ./artifcts && rm -rf ./artifacts
;;
"downgrade")
DOWNSCRIPT=`ls -t ./assets/downgrade | head -n1`
python ./assets/downgrade/$DOWNSCRIPT | tee >> biglog.txt
VERSION=$(echo $UPSCRIPT | awk -F. '{print $1}')
echo `date` $VERSION > ./version.txt
test -e ./artifcts && rm -rf ./artifacts
;;
*)
while read -r UPSCRIPT; do
python $UPSCRIPT | tee >> biglog.txt
VERSION=$(echo $UPSCRIPT | awk -F. '{print $1}')
echo `date` $VERSION > ./version.txt
test -e ./artifcts && rm -rf ./artifacts
done <<< $(find "./assets/update" -type f -name "*.py")
esac
Use lower case variable names. Upper case is recommended for environment and shell internal variables.
Use $() instead of `...`. It nests better.
Use parameter expansion instead of running a command in a subshell where possible. It's much faster.
Where the logic of the script was unclear, I left a comment in the code.
#! /bin/bash
command=$1
artifacts=./artifacts
case "$command" in
    upgrade)
        upscript=$(ls -t ./assets/upgrade | head -n1)
        python ./assets/upgrade/"$upscript" | tee -a biglog.txt
        version=${upscript%.*}
        echo "$(date) $version" > ./version.txt
        test -e "$artifacts" && rm -rf "$artifacts" # artifacts or artifcts?
        ;;
    downgrade)
        downscript=$(ls -t ./assets/downgrade | head -n1)
        python ./assets/downgrade/"$downscript" | tee -a biglog.txt
        version=${downscript%.*} # upscript or downscript?
        echo "$(date) $version" > ./version.txt
        test -e "$artifacts" && rm -rf "$artifacts"
        ;;
    *)
        while read -r upscript; do
            python "$upscript" | tee -a biglog.txt
            version=${upscript%.*}
            echo "$(date) $version" > ./version.txt
            test -e "$artifacts" && rm -rf "$artifacts"
        done < <(find "./assets/update" -type f -name '*.py')
esac
I would probably also extract the common logic from upgrade and downgrade to a function to avoid repetition.
Parsing the output of ls or find is suspicious, as file names can contain weird characters. I'd need to understand more what you're trying to do to fix that.
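Extracting the common logic could look roughly like the sketch below. The function name run_migration and the PYTHON override variable are my own inventions for illustration. The version is taken as the text before the first dot of the file name, which mirrors the awk -F. '{print $1}' in the question.

```shell
#!/bin/bash
# Hypothetical helper: run one migration script, log its output,
# record the version and remove the artifacts directory.
# $1: path to the Python script to run.
run_migration() {
    script=$1
    name=${script##*/}     # strip the directory part
    version=${name%%.*}    # keep the text before the first dot
    # PYTHON is an assumed override; it falls back to plain "python".
    "${PYTHON:-python}" "$script" | tee -a biglog.txt
    echo "$(date) $version" > ./version.txt
    rm -rf ./artifacts     # -f already ignores a missing directory
}
```

The upgrade branch would then shrink to something like run_migration "./assets/upgrade/$upscript", and the downgrade branch to the same call with the downgrade path.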
Running this on osx...
cd ${BUILD_DIR}/mydir && for DIR in $(find ./ '.*[^_].py' | sed 's/\/\//\//g' | awk -F "/" '{print $2}' | sort |uniq | grep -v .py); do
if [ -f $i/requirements.txt ]; then
pip install -r $i/requirements.txt -t $i/
fi
cd ${DIR} && zip -r ${DIR}.zip * > /dev/null && mv ${DIR}.zip ../../ && cd ../
done
cd ../
error:
(env) ➜ sh package_lambdas.sh find: .*[^_].py: No such file or directory
why?
find takes as an argument a list of directories to search. You provided what appears to be a regular expression. Because there is no directory named (literally) .*[^_].py, find returns an error.
Below I have revised your script to correct that mistake (if I understand your intention). Because I see so many ill-written shell scripts these days, I've taken the liberty of "traditionalizing" it. Please see if you don't also find it more readable.
Changes:
use #!/bin/sh, which is guaranteed to exist on a Unix-like system. It is faster than bash, unless (as on OS X) it is bash.
use lower case for variable names to distinguish from system variables (and not hide them).
eschew braces for variables (${var}); they're not needed in the simple case
do not pipe output to /usr/bin/true; route it to /dev/null if that's what you mean
rm -f does not fail when the file is missing; if you meant || true, it's superfluous
put then and do on separate lines, easier to read, and that's how the Bourne shell language was meant to be used
Let && and || serve as line-continuation, so you can see what's happening step by step
Other changes I would suggest:
Use a subshell when changing the working directory temporarily. When it terminates, the working directory is restored automatically (retained by the parent), saving you the cd .. step, and errors.
Use set -e to cause the script to terminate on error. For expected errors, use || true explicitly.
Change grep .py to grep '\.py$', just for good measure.
To avoid Tilting Matchstick Syndrome, use something other than / as a sed substitute delimiter, e.g., sed 's://:/:g'. But sed could be avoided altogether with awk -F '/+' '{print $2}'.
Revised version:
#! /bin/sh
src_dir=lambdas
build_dir=bin
mkdir -p $build_dir/lambdas
rm -rf $build_dir/*.zip
cp -r $src_dir/* $build_dir/lambdas
#
# The sed is a bit complicated to be osx / linux cross compatible:
# (.//run.sh vs ./run.sh)
#
cd $build_dir/lambdas &&
for L in $(find . -exec grep -l '.*[^_].py' {} + |
           sed 's/\/\//\//g' |
           awk -F "/" '{print $2}' |
           sort |
           uniq |
           grep -v .py)
do
    if [ -f $L/requirements.txt ]
    then
        echo "Installing requirements"
        pip install -r $L/requirements.txt -t $L/
    fi
    cd $L &&
        zip -r $L.zip * > /dev/null &&
        mv $L.zip ../../ &&
        cd ../
done
cd ../
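The subshell suggestion above might be sketched like this. The helper name in_dir is hypothetical; the point is only that the cd happens inside the parentheses, so the caller's working directory is never touched and no cd ../ is needed afterwards.

```shell
#!/bin/sh
# Hypothetical helper: run a command inside a directory without
# changing the caller's working directory.
# $1: directory to enter, remaining arguments: the command to run.
in_dir() {
    dir=$1
    shift
    # The parentheses start a subshell; the cd is forgotten when it exits.
    ( cd "$dir" && "$@" )
}
```

In the loop above, the zip step could then be written as in_dir "$L" zip -r "../../$L.zip" . with no cd back up.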
The find(1) manpage says its args are [path ...] [expression], where "expression" consists of "primaries" and "operands" (-flags). '.*[^_].py' doesn't look like any expression, so it's being interpreted as a path, and find is reporting that there is no file named '.*[^_].py' in the working directory.
Perhaps you meant:
find ./ -regex '.*[^_].py'
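If the goal is to list the top-level directories that contain Python files whose names do not end in _.py, a sketch using -name instead of -regex could look like the following. This is my guess at the intent, and the function name list_py_dirs is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: print the first-level subdirectories of $1 that
# contain at least one *.py file whose name does not end in "_.py".
list_py_dirs() {
    (
        cd "$1" || exit 1
        # find prints paths like ./dir/file.py; field 2 is the directory.
        # NF > 2 skips .py files that sit directly in the root.
        find . -type f -name '*.py' ! -name '*_.py' |
            awk -F/ 'NF > 2 { print $2 }' |
            sort -u
    )
}
```

The for loop in the question could then iterate over $(list_py_dirs .), with the usual caveat that this breaks on directory names containing whitespace.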
When I run /var/tmp/server_always_alive.sh manually from the shell, it works without any problem. But when I let crontab run it, server.py never starts, even though the logic is correct.
How can I make python server.py run via this crontab?
sun#sun-Inspiron-One-2320:~$ uname -a
Linux sun-Inspiron-One-2320 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:31:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
/var/tmp/server_always_alive.sh:
#!/bin/bash
echo "test 1"
echo "test 2"
# 58888 TCP port is server port of server.py, if its not running server.py has to execute auto
main=$(export DISPLAY=:0.0 && lsof -i tcp:58888 | grep LISTEN | awk '{print $2}')
if [ -z "$main" ]; then
export DISPLAY=:0.0 && python /var/tmp/python/server.py &
sleep 2
break
fi
echo "test 3"
echo "all runs except python server.py"
crontab :
* * * * * /var/tmp/server_always_alive.sh &
DISPLAY=:0.0 indicates your python 'server' is connecting to an X server. Why?
Cron won't have the necessary X "cookie", and almost certainly won't be running as the same user as the X server.
edit: I'll take you at your word that you're running as the correct user.
edit: If you really need to run a graphical program from cron, try
xhost +si:localuser:`whoami`
For reference, here is an alternative workaround.
Step 1: Put the command that starts your Python script in a wrapper script and save it as /var/tmp/main.sh
A) NON GUI BASED
#!/bin/sh
script='/my/python/script/is/here/ok.py'
/usr/bin/python $script &
B) GUI (GTK/TK etc)
#!/bin/sh
script='/my/python/script/is/here/ok.py'
export DISPLAY=:0.0 && /usr/bin/python $script &
Step 2: Now create a file /etc/init.d/scriptname_what_ever_feed_i_name with the following content (copy and paste):
#! /bin/sh
PATH=/bin:/usr/bin:/sbin:/usr/sbin
DAEMON=/home/CHANGE _ ____ HERE ______ to the Step 1 file name
PIDFILE=/var/run/scriptname.pid
test -x $DAEMON || exit 0
. /lib/lsb/init-functions
case "$1" in
    start)
        log_daemon_msg "Starting feedparser"
        start_daemon -p $PIDFILE $DAEMON
        log_end_msg $?
        ;;
    stop)
        log_daemon_msg "Stopping feedparser"
        killproc -p $PIDFILE $DAEMON
        PID=`ps x |grep feed | head -1 | awk '{print $1}'`
        kill -9 $PID
        log_end_msg $?
        ;;
    force-reload|restart)
        $0 stop
        $0 start
        ;;
    status)
        status_of_proc -p $PIDFILE $DAEMON atd && exit 0 || exit $?
        ;;
    *)
        echo "Usage: /etc/init.d/atd {start|stop|restart|force-reload|status}"
        exit 1
        ;;
esac
exit 0
Step 3: Make it executable with chmod +x /etc/init.d/scriptname_what_ever_feed_i_name and chmod -R 777 /etc/init.d/scriptname_what_ever_feed_i_name, so that any user can execute it without sudo.
Step 4: for example:
/etc/init.d/scriptname_what_ever_feed_i_name restart
or
* * * * * /etc/init.d/scriptname_what_ever_feed_i_name restart
WORKING - and much better/safer.
ps aux | grep python
root 5026 0.5 0.3 170464 19336 pts/0 S 07:40 0:00 /usr/bin/python /var/tmp/python/server.py
Now you can start and stop your python script using the command /etc/init.d/scriptname start or stop manually or cron etc
I tried to write a script that converts image sources in HTML files from normal links to base64-encoded data.
But there is a problem: sometimes, sed tells me
script.sh: line 25: /bin/sed: Argument list too long
This is the code:
#!/bin/bash
# usage: ./script.sh file.html
mkdir images_temp
for i in `sed -n '/<img/s/.*src="\([^"]*\)".*/\1/p' $1`;
do echo "######### download the image";
wget -P images_temp/ $i;
#echo "######### convert the image for size saving";
#convert -quality 70 `echo ${i##*/}` `echo ${i##*/}`.temp;
#echo "######### rename temp image";
#rm `echo ${i##*/}` && mv `echo ${i##*/}`.temp `echo ${i##*/}`;
echo "######### encode in base64";
k="`echo "data:image/png;base64,"`$(base64 -w 0 images_temp/`echo ${i##*/}`)";
echo "######### deletion of images_temp pictures";
rm images_temp/*;
echo "######### remplace string in html";
sed -e "s|$i|$k|" $1 > temp.html;
echo "######### remplace final file";
rm -rf $1 && mv temp.html $1;
sleep 5;
done;
I think the $k argument is too long for sed when the image is bigger than ~128 KB; sed can't process it.
How do I make it work?
Thank you in advance!
PS1: Sorry for the very ugly code.
PS2: Or how would I do this in Python or PHP? I'm open to anything!
Your base64 encoded image can be multiple megabytes, while the system may place a limit on the maximum length of parameters (traditionally around 128k). Sed is also not guaranteed to handle lines over 8kb, though versions like GNU sed can deal with much more.
If you want to try with your sed, provide the instructions in a file rather than on the command line. Instead of
sed -e "s|$i|$k|" $1 > temp.html;
use
echo "s|$i|$k|" > foo.sed
sed -f foo.sed "$1" > temp.html
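Wrapped up as a function, the idea might look like the sketch below. The name replace_with_sed_file and the use of mktemp are my additions. It assumes that neither string contains "|", "&" or a backslash, and note that the first string is still treated as a regular expression, so a "." in a URL matches any character.

```shell
#!/bin/sh
# Hypothetical helper: replace the first occurrence per line of $2 with $3
# in file $1, handing the substitution to sed through a script file.
replace_with_sed_file() {
    file=$1 needle=$2 replacement=$3
    script=$(mktemp) || return 1
    out=$(mktemp) || return 1
    # printf is a shell builtin, so the long string is never passed
    # as an argument to an external program.
    printf 's|%s|%s|\n' "$needle" "$replacement" > "$script"
    sed -f "$script" "$file" > "$out" && cat "$out" > "$file"
    status=$?
    rm -f "$script" "$out"
    return "$status"
}
```

Inside the loop from the question this would be called as replace_with_sed_file "$1" "$i" "$k", replacing the sed, rm and mv lines.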