Bash - change image urls to base64 in html - python

I tried to write a script that converts image sources from normal links to base64 encoding in HTML files.
But there is a problem: sometimes, sed tells me
script.sh: line 25: /bin/sed: Argument list too long
This is the code:
#!/bin/bash
# usage: ./script.sh file.html
mkdir images_temp
for i in `sed -n '/<img/s/.*src="\([^"]*\)".*/\1/p' $1`;
do echo "######### download the image";
wget -P images_temp/ $i;
#echo "######### convert the image for size saving";
#convert -quality 70 `echo ${i##*/}` `echo ${i##*/}`.temp;
#echo "######### rename temp image";
#rm `echo ${i##*/}` && mv `echo ${i##*/}`.temp `echo ${i##*/}`;
echo "######### encode in base64";
k="`echo "data:image/png;base64,"`$(base64 -w 0 images_temp/`echo ${i##*/}`)";
echo "######### deletion of images_temp pictures";
rm images_temp/*;
echo "######### remplace string in html";
sed -e "s|$i|$k|" $1 > temp.html;
echo "######### remplace final file";
rm -rf $1 && mv temp.html $1;
sleep 5;
done;
I think the $k argument is too long for sed when the image is bigger than ~128 kB; sed can't process it.
How do I make it work?
Thank you in advance !
PS1: and sorry for the very very ugly code
PS2: or how do I do that in python ? PHP ? I'm open !

Your base64 encoded image can be multiple megabytes, while the system may place a limit on the maximum length of parameters (traditionally around 128k). Sed is also not guaranteed to handle lines over 8kb, though versions like GNU sed can deal with much more.
If you want to try with your sed, provide the instructions in a file rather than on the command line. Instead of
sed -e "s|$i|$k|" $1 > temp.html;
use
echo "s|$i|$k|" > foo.sed
sed -f foo.sed "$1" > temp.html
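For illustration, here is a minimal sketch of that file-based approach wrapped in a function. The name inline_image and its three arguments are made up for this example, the image/png type is simply carried over from the question, and it assumes the URL contains no | character. It pipes base64 through tr rather than relying on the GNU-only -w 0 flag.

```shell
#!/bin/sh
# inline_image HTML_FILE URL IMAGE_FILE   (hypothetical helper, not from the question)
# Replaces every occurrence of URL in HTML_FILE with a base64 data URI
# built from IMAGE_FILE.
inline_image() {
    html=$1 url=$2 img=$3
    script=$(mktemp) || return 1
    # Write the substitution into a file so the (possibly huge) base64 string
    # never appears on a command line. tr strips the line wrapping that some
    # base64 implementations add.
    {
        printf 's|%s|data:image/png;base64,' "$url"
        base64 < "$img" | tr -d '\n'
        printf '|g\n'
    } > "$script"
    sed -f "$script" "$html" > "$html.tmp" && mv "$html.tmp" "$html"
    status=$?
    rm -f "$script"
    return $status
}
```

Inside the question's loop this would be called as inline_image "$1" "$i" "images_temp/${i##*/}" in place of the sed -e line. Whether your sed copes with a multi-megabyte line still depends on the implementation.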

Related

Download ipynb from colab notebook Url

Given a list of colab notebooks how can I download the ipynb of each one of them using wget or curl?
https://colab.research.google.com/notebooks/gpu.ipynb
https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/tf2_arbitrary_image_stylization.ipynb
https://colab.research.google.com/drive/1sVsoBd9AjckIXThgtZhGrHRfFI6UUYOo
This question explains how to download notebooks stored on gdrive, but what about notebooks stored on github or on colab directories (colab.research.google.com/notebooks/) or other sources?
There are two options I recommend, assuming all the target URLs are in a text file. Save the code to a .sh file (e.g. dlnb.sh) and the URLs to a text file (e.g. list.txt), like this:
https://colab.research.google.com/notebooks/gpu.ipynb
https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/tf2_arbitrary_image_stylization.ipynb
https://colab.research.google.com/drive/1sVsoBd9AjckIXThgtZhGrHRfFI6UUYOo
tl;dr: I recommend solution 2, which uses gdown (just run pip install gdown), because wget can't save a notebook under its real name when the URL doesn't contain that name. Then run bash dlnb.sh list.txt in a terminal.
1. wget and cat only. This one has one drawback: since we only use wget, a link that doesn't contain a name will be saved as random_id_here.ipynb
dlnb.sh
grabid() { fileid=$( echo "$1" | egrep -o '(\w|-){26,}' ); echo $fileid; }
cat $1 | while read line || [[ -n $line ]];
do
if [[ $line != *.ipynb ]]; then
id=$(grabid "$line")
wget -O $id.ipynb 'https://docs.google.com/uc?export=download&id='$id;
else
wget $line;
fi;
done
I take this regex, egrep -o '(\w|-){26,}', and plug it into my function, which extracts and returns the ID from the link:
grabid() { fileid=$( echo "$1" | egrep -o '(\w|-){26,}' ); echo $fileid; }
Assign id by calling grabid; line is the URL:
id=$(grabid "$line")
Then loop through each line with while read line || [[ -n $line ]]; (the || [[ -n $line ]] part makes sure a last line without a trailing newline is still processed) and download it using wget:
wget -O $id.ipynb 'https://docs.google.com/uc?export=download&id='$id;
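To see the ID extraction in isolation, here is a small sketch of grabid that does no downloading. It spells the character class out as [A-Za-z0-9_-] instead of (\w|-), and takes only the first match; both are my choices, not part of the script above. Note that the pattern is a heuristic: any run of 26 or more such characters matches, so a long file name like tf2_arbitrary_image_stylization would be picked up too, which is why the script checks for .ipynb first.

```shell
#!/bin/sh
# Sketch of the ID-extraction step on its own, without any download.
# Prints the first run of 26+ letters, digits, "_" or "-" found in the URL,
# or nothing if there is no such run.
grabid() {
    printf '%s\n' "$1" | grep -Eo '[A-Za-z0-9_-]{26,}' | head -n 1
}

# Prints 1sVsoBd9AjckIXThgtZhGrHRfFI6UUYOo
grabid 'https://colab.research.google.com/drive/1sVsoBd9AjckIXThgtZhGrHRfFI6UUYOo'
```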
OR
2. A better solution: install gdown. This works like solution 1, but uses gdown instead of wget
dlnb.sh
grabid() { fileid=$( echo "$1" | egrep -o '(\w|-){26,}' ); echo $fileid; }
cat $1 | while read line || [[ -n $line ]];
do
if [[ $line != *.ipynb ]]; then
gdown $(grabid "$line");
else
gdown $line;
fi;
done
If the URL does not end with .ipynb (if [[ $line != *.ipynb ]]; then), gdown grabs the ID with $(grabid "$line") and downloads that instead. Where solution 1 saves the notebook as id_of_notebook.ipynb, gdown saves it under its own name.

Post real time output to Slack with bash script

I have a python script that I am executing with cron job. This script generates some output while executing and I wish to post it to Slack channel in real time.
Here is a bash script that I have:
#!/bin/bash
log_file=logs_$(date '+\%Y-\%m-\%d_\%H:\%M').txt
cd /path/to/script/run.py > /path/to/logs/${log_file} 2>&1
cat /path/to/logs/${log_file} | while read LINE; do
(echo "$LINE" | grep -e "Message" ) && curl -X POST --silent --data-urlencode \
"payload={\"text\": \"$(echo $LINE | sed "s/\"/'/g")\"}" "https://hooks.slack.com/services/xxxxxx";
done
This script works, but of course it posts all messages to Slack only after the python script has finished. Is there any way I could configure it so that messages are sent to Slack in real time while the python script is still running?
You may be able to read the output from your run.py script via process substitution:
#!/bin/bash
log_file=logs_$(date '+%Y-%m-%d_%H:%M').txt
while read -r line ; do
echo "$line"
(echo "$line" | grep -e "Message" ) && curl -X POST --silent --data-urlencode \
"payload={\"text\": \"$(echo $line | sed "s/\"/'/g")\"}" "https://hooks.slack.com/services/xxxxxx";
done < <(/path/to/script/run.py 2>&1) >> "$log_file"
It may also prove useful to paste your code into shellcheck.net and have a look at the suggested changes.
Your script shouldn't work at all as you're not executing run.py but you're changing your working directory into it, so unless run.py is a directory, your script should fail.
Also, commands in bash scripting are executed sequentially, so if you launch your python command and then read the log, no wonder that you're not getting log lines in real time.
What I would do is use some pipes and xargs:
#!/bin/bash
/path/to/script/run.py | grep -e "Message" | sed "s/\"/'/g" | xargs -I{} curl -L -X POST --silent --data-urlencode 'payload={"text":"{}"}' https://hooks.slack.com/services/xxx
I've added -L to the curl command because hooks.slack.com makes a redirect to api.slack.com and without that flag curl will stop after the 302 instead of following the Location header in the response.
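Both answers come down to reading the program's output while it is still being produced. As a rough sketch of that idea with a plain pipe and a read loop: send_payload below is a hypothetical stand-in that only prints the payload, so the snippet can be tried without a webhook; in real use you would put the curl call from above in its place. One thing to watch for is that Python usually buffers stdout when it is not attached to a terminal, so you may need python -u or PYTHONUNBUFFERED=1 to actually see lines as they happen.

```shell
#!/bin/sh
# Hypothetical sender: prints the payload instead of posting it.
# Swap the printf for the curl call when using a real webhook.
send_payload() {
    printf '%s\n' "$1"
}

# Reads lines from stdin as they arrive and forwards those containing
# "Message", with double quotes turned into single quotes so the JSON
# string stays valid.
stream_messages() {
    while IFS= read -r line; do
        case $line in
            *Message*)
                text=$(printf '%s' "$line" | sed "s/\"/'/g")
                send_payload "payload={\"text\": \"$text\"}"
                ;;
        esac
    done
}

# Usage (not run here): /path/to/script/run.py 2>&1 | stream_messages
```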

pass bash script args as named parameters to a command inside the script

I have a bash script that takes two parameters. Inside that script, I need to call ssh using a heredoc and call a method that expects the two arguments. For example:
ssh -o "IdentitiesOnly=yes" -t -i $key -l user localhost << 'ENDSSH'
/my_python_app.py -u -t tar -p $1 -f $2
ENDSSH
key is set by my script, I know that part is good.
However, my_python_app prints out args and it doesn't show any arguments for -p and -f
I would call my script like
my_script /tmp filename
I use argparse in my python app, but I am also printing out sys.argv and it gives me:
['my_python_app.py', '-u', '-t', 'tar', '-p', '-f']
Note there are no values received for -p and -f. (-u is a flag, and that is set correctly).
How do I pass $1 and $2 to my_python_app as the -p and -f values?
Remove the quotes around the here-document delimiter (i.e. use << ENDSSH instead of << 'ENDSSH'). The quotes tell the shell not to expand variable references (and some other things) in the here-document, so $1 and $2 are passed through to the remote shell... which doesn't have any parameters so it replaces them with nothing.
BTW, removing the single-quotes may not fully work, since if either argument contains whitespace or shell metacharacters, the remote end will parse those in a way you probably don't intend. As long as neither argument can contain a single-quote, you can use this:
ssh -o "IdentitiesOnly=yes" -t -i $key -l user localhost << ENDSSH
/my_python_app.py -u -t tar -p '$1' -f '$2'
ENDSSH
If either might contain single-quotes, it gets a little more complicated.
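The difference between the two delimiter styles can be tried locally without ssh. In this sketch sh -s plays the role of the remote shell (it reads its script from the here-document), and the two function names are made up for the demonstration.

```shell
#!/bin/sh
# Local stand-in for the ssh call: "sh -s" reads its script from the
# here-document, just as the remote shell does.

with_quoted_delimiter() {
    # Quoted delimiter: $1 reaches the inner shell unexpanded, and the inner
    # shell has no positional parameters, so it expands to nothing.
    sh -s <<'ENDSSH'
printf 'arg=[%s]\n' "$1"
ENDSSH
}

with_unquoted_delimiter() {
    # Unquoted delimiter: the calling shell substitutes $1 before the inner
    # shell ever sees the text.
    sh -s <<ENDSSH
printf 'arg=[%s]\n' '$1'
ENDSSH
}

with_quoted_delimiter /tmp     # prints arg=[]
with_unquoted_delimiter /tmp   # prints arg=[/tmp]
```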
The more paranoid way to do this would be:
# store these in an array to reduce the incidental complexity below
ssh_args=( -o "IdentitiesOnly=yes" -t -i "$key" -l user )
posixQuote() {
python -c 'import sys, pipes; sys.stdout.write(pipes.quote(sys.argv[1])+"\n")' "$@"
}
ssh "${ssh_args[@]}" localhost "bash -s $(posixQuote "$1") $(posixQuote "$2")" << 'ENDSSH'
/path/to/my_python_app.py -u -t tar -p "$1" -f "$2"
ENDSSH
If you know with certainty that the destination account's shell matches the local one (bash if the local shell is bash, ksh if the local shell is ksh), consider the following instead:
printf -v remoteCmd '%q ' /path/to/my_python_app.py -u -t tar -p "$1" -f "$2"
ssh "${ssh_args[@]}" localhost "$remoteCmd"
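If calling out to Python just for quoting feels heavy, the same single-quote escaping can be sketched in plain sh. This is my own illustration rather than part of the answer above; posix_quote is a made-up name, and because it goes through command substitution it will drop trailing newlines from the argument.

```shell
#!/bin/sh
# posix_quote STRING: print STRING wrapped in single quotes, with each
# embedded single quote rewritten as '\'' (close quote, escaped quote,
# reopen quote), so a POSIX shell reads it back as one literal word.
posix_quote() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

posix_quote "it's here"   # prints 'it'\''s here'
```

It would slot in where posixQuote is used, e.g. "bash -s $(posix_quote "$1") $(posix_quote "$2")".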

No such file or directory in find running .sh

Running this on osx...
cd ${BUILD_DIR}/mydir && for DIR in $(find ./ '.*[^_].py' | sed 's/\/\//\//g' | awk -F "/" '{print $2}' | sort |uniq | grep -v .py); do
if [ -f $i/requirements.txt ]; then
pip install -r $i/requirements.txt -t $i/
fi
cd ${DIR} && zip -r ${DIR}.zip * > /dev/null && mv ${DIR}.zip ../../ && cd ../
done
cd ../
error:
(env) ➜ sh package_lambdas.sh
find: .*[^_].py: No such file or directory
why?
find takes as an argument a list of directories to search. You provided what appears to be regular expression. Because there is no directory named (literally) .*[^_].py, find returns an error.
Below I have revised your script to correct that mistake (if I understand your intention). Because I see so many ill-written shell scripts these days, I've taken the liberty of "traditionalizing" it. Please see if you don't also find it more readable.
Changes:
use #!/bin/sh, guaranteed to be on a Unix-like system. Faster than bash, unless (like OS X) it is bash.
use lower case for variable names to distinguish from system variables (and not hide them).
eschew braces for variables (${var}); they're not needed in the simple case
do not pipe output to /usr/bin/true; route it to /dev/null if that's what you mean
rm -f by definition cannot fail; if you meant || true, it's superfluous
put then and do on separate lines, easier to read, and that's how the Bourne shell language was meant to be used
Let && and || serve as line-continuation, so you can see what's happening step by step
Other changes I would suggest:
Use a subshell when changing the working directory temporarily. When it terminates, the working directory is restored automatically (retained by the parent), saving you the cd .. step, and errors.
Use set -e to cause the script to terminate on error. For expected errors, use || true explicitly.
Change grep .py to grep '\.py$', just for good measure.
To avoid Tilting Matchstick Syndrome, use something other than / as a sed substitute delimiter, e.g., sed 's://:/:g'. But sed could be avoided altogether with awk -F '/+' '{print $2}'.
Revised version:
#! /bin/sh
src_dir=lambdas
build_dir=bin
mkdir -p $build_dir/lambdas
rm -rf $build_dir/*.zip
cp -r $src_dir/* $build_dir/lambdas
#
# The sed is a bit complicated to be osx / linux cross compatible :
# ( .//run.sh vs ./run.sh
#
cd $build_dir/lambdas &&
for L in $(find . -exec grep -l '.*[^_].py' {} + |
sed 's/\/\//\//g' |
awk -F "/" '{print $2}' |
sort |
uniq |
grep -v .py)
do
if [ -f $L/requirements.txt ]
then
echo "Installing requirements"
pip install -r $L/requirements.txt -t $L/
fi
cd $L &&
zip -r $L.zip * > /dev/null &&
mv $L.zip ../../ &&
cd ../
done
cd ../
The find(1) manpage says its args are [path ...] [expression], where "expression" consists of "primaries" and "operands" (-flags). '.*[^_].py' doesn't look like any expression, so it's being interpreted as a path, and find is reporting that there is no file named '.*[^_].py' in the working directory.
Perhaps you meant:
find ./ -regex '.*[^_].py'
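If the intent was "list the top-level directories that contain a .py file whose name does not end in _.py" (that is my reading of the pipeline, not something the question states), a sketch with -name instead of a regex could look like this. list_lambda_dirs is a made-up name, and the cd happens inside a subshell so the caller's working directory is left alone.

```shell
#!/bin/sh
# list_lambda_dirs ROOT   (hypothetical helper)
# Prints each top-level directory under ROOT that contains, at any depth,
# a *.py file whose name does not end in "_.py" (so __init__.py is ignored).
list_lambda_dirs() {
    (
        cd "$1" || exit 1
        # The first argument to find is the path to search; the pattern
        # goes after -name. [!_] means "any character except underscore".
        find . -type f -name '*[!_].py' |
            awk -F/ 'NF > 2 {print $2}' |
            sort -u
    )
}

# Usage (not run here): for d in $(list_lambda_dirs "$build_dir/lambdas"); do ...; done
```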

how to add getopt options in a bash script

I have written a bash script that consists of multiple Unix commands and Python scripts. The goal is to make a pipeline for detecting long non coding RNA from a certain input. Ultimately I would like to turn this into an 'app' and host it on some bioinformatics website. One problem I am facing is using getopt tools in bash. I couldn't find a good tutorial that I understand clearly. In addition any other comments related to the code is appreciated.
#!/bin/bash
if [ "$1" == "-h" ]
then
echo "Usage: sh $0 cuffcompare_output reference_genome blast_file"
exit
else
wget https://github.com/TransDecoder/TransDecoder/archive/2.0.1.tar.gz && tar xvf 2.0.1 && rm -r 2.0.1
makeblastdb -in $3 -dbtype nucl -out $3.blast.out
grep '"u"' $1 | \
gffread -w transcripts_u.fa -g $2 - && \
python2.7 get_gene_length_filter.py transcripts_u.fa transcripts_u_filter.fa && \
TransDecoder-2.0.1/TransDecoder.LongOrfs -t transcripts_u_filter.fa
sed 's/ .*//' transcripts_u_filter.fa | grep ">" | sed 's/>//' > transcripts_u_filter.fa.genes
cd transcripts_u_filter.fa.transdecoder_dir
sed 's/|.*//' longest_orfs.cds | grep ">" | sed 's/>//' | uniq > longest_orfs.cds.genes
grep -v -f longest_orfs.cds.genes ../transcripts_u_filter.fa.genes > longest_orfs.cds.genes.not.genes
sed 's/^/>/' longest_orfs.cds.genes.not.genes > temp && mv temp longest_orfs.cds.genes.not.genes
python ../extract_sequences.py longest_orfs.cds.genes.not.genes ../transcripts_u_filter.fa longest_orfs.cds.genes.not.genes.fa
blastn -query longest_orfs.cds.genes.not.genes.fa -db ../$3.blast.out -out longest_orfs.cds.genes.not.genes.fa.blast.out -outfmt 6
python ../filter_sequences.py longest_orfs.cds.genes.not.genes.fa.blast.out longest_orfs.cds.genes.not.genes.fa.blast.out.filtered
grep -v -f longest_orfs.cds.genes.not.genes.fa.blast.out.filtered longest_orfs.cds.genes.not.genes.fa > lincRNA_final.fa
fi
Here is how I run it:
sh test.sh cuffcompare_out_annot_no_annot.combined.gtf /mydata/db/Brapa_sequence_v1.2.fa TE_RNA_transcripts.fa
If you wanted the call to be:
test -c cuffcompare_output -r reference_genome -b blast_file
you would have something like:
#!/bin/bash
while getopts ":b:c:hr:" opt; do
case $opt in
b)
blastfile=$OPTARG
;;
c)
comparefilefile=$OPTARG
;;
h)
echo "USAGE : test -c cuffcompare_output -r reference_genome -b blast_file"
;;
r)
referencegenome=$OPTARG
;;
\?)
echo "Invalid option: -$OPTARG" >&2
exit 1
;;
:)
echo "Option -$OPTARG requires an argument." >&2
exit 1
;;
esac
done
In the string ":b:c:hr:",
- the first ":" tells getopts that we'll handle any errors,
- subsequent letters are the allowable flags. If the letter is followed by a ':', then getopts will expect that flag to take an argument, and supply that argument as $OPTARG
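To try the option string out without touching the pipeline, here is a sketch that wraps the same loop in a function. parse_args is a made-up name, the variable comparefile is my shortening of the one above, and the check that all three options were given is an addition of mine. OPTIND is reset because getopts keeps its position between calls.

```shell
#!/bin/sh
# parse_args: sketch of the getopts loop as a reusable function.
# Sets blastfile, comparefile and referencegenome; returns non-zero on error.
parse_args() {
    OPTIND=1
    blastfile= comparefile= referencegenome=
    while getopts ":b:c:hr:" opt; do
        case $opt in
            b) blastfile=$OPTARG ;;
            c) comparefile=$OPTARG ;;
            r) referencegenome=$OPTARG ;;
            h) echo "USAGE : test -c cuffcompare_output -r reference_genome -b blast_file"
               return 0 ;;
            \?) echo "Invalid option: -$OPTARG" >&2
                return 1 ;;
            :) echo "Option -$OPTARG requires an argument." >&2
               return 1 ;;
        esac
    done
    # Added check (not in the answer above): all three options are required.
    if [ -z "$blastfile" ] || [ -z "$comparefile" ] || [ -z "$referencegenome" ]; then
        echo "Missing one of -b, -c, -r" >&2
        return 1
    fi
}

# Usage (not run here): parse_args "$@" || exit 1
```

After a successful call the script body can use "$comparefile", "$referencegenome" and "$blastfile" where it previously used $1, $2 and $3.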
