Processing files from a shell script as arguments to a Python script

I am manually running the script from the command line:
python script.py input_files/input.txt text_out/output.json
while inside script.py there is:
input_path = sys.argv[1]
out_path = sys.argv[2]
Now I have a shell script to process all files in one go, but I am facing an issue.
My shell script is like below. There are two folders: 1) input_files and 2) text_out.
for i in input_files/*.txt; do
    name=`echo "$i" | cut -d'.' -f1`
    echo "$name"
    python script.py -i "$i" text_out/"${name}.json"
done
But when I execute the .sh as stated above, it throws an error because sys.argv is not being picked up properly:
out_path = sys.argv[2]
IndexError: list index out of range
If you can advise what to change in the .py or the .sh script, that would be kind.

I don't know exactly why you're getting an IndexError, but it doesn't really matter, since you're also passing -i after script.py, so out_path cannot be what you expect.
$ cat script.py
import sys; print(len(sys.argv)); print(sys.argv); print({i:v for i, v in enumerate(sys.argv)})
$ (set -x; i=input_files/foo.txt; name=`echo "$i" | cut -d'.' -f1`; python script.py -i "$i" text_out/"${name}.json")
+ i=input_files/foo.txt
++ echo input_files/foo.txt
++ cut -d. -f1
+ name=input_files/foo
+ python script.py -i input_files/foo.txt text_out/input_files/foo.json
4
['script.py', '-i', 'input_files/foo.txt', 'text_out/input_files/foo.json']
{0: 'script.py', 1: '-i', 2: 'input_files/foo.txt', 3: 'text_out/input_files/foo.json'}
I recommend using argparse whenever you need to deal with CLI arguments in Python. It will give better feedback and reduce the ambiguity of looking directly at indices.
$ cat script2.py
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--input-file', type=argparse.FileType('r'))
parser.add_argument('output_file', type=argparse.FileType('w'))
print(parser.parse_args())
$ (set -x; i=input_files/foo.txt; name=`echo "$i" | cut -d'.' -f1`; python script2.py -i "$i" text_out/"${name}.json")
+ i=input_files/foo.txt
++ echo input_files/foo.txt
++ cut -d. -f1
+ name=input_files/foo
+ python script2.py -i input_files/foo.txt text_out/input_files/foo.json
Namespace(input_file=<_io.TextIOWrapper name='input_files/foo.txt' mode='r' encoding='UTF-8'>, output_file=<_io.TextIOWrapper name='text_out/input_files/foo.json' mode='w' encoding='UTF-8'>)

Drop the -i flag so both paths are plain positional arguments:
for i in input_files/*.txt; do
    name=`echo "$i" | cut -d'.' -f1`
    echo "$name"
    python script.py "$i" text_out/"${name}.json"
done
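Note that name still contains the input_files/ prefix, so the output lands at text_out/input_files/foo.json (as the trace above shows). A minimal sketch that strips the directory instead, assuming every input ends in .txt:
for i in input_files/*.txt; do
    name=$(basename "$i" .txt)    # foo, with directory and extension stripped
    python script.py "$i" text_out/"${name}.json"
done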

You get the IndexError: list index out of range error because your Python script tries to access the argument list at index 2 for out_path, but your shell script doesn't pass two positional arguments the way the script expects. If only one positional argument reaches the script (effectively python script.py text_out/"${name}.json"), then sys.argv[2] does not exist.
Your first example (python script.py input_files/input.txt text_out/output.json) works, of course, because you pass exactly two arguments into the Python script. That's why you can access sys.argv at index 1 and 2.
You should check len(sys.argv) to know how many arguments were actually passed into the Python script.
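For example, a minimal guard at the top of script.py (a sketch, reusing the variable names from the question) turns the IndexError into a readable usage message:
import sys

# Fail early with a clear message instead of an IndexError
if len(sys.argv) < 3:
    sys.exit(f"usage: {sys.argv[0]} INPUT_FILE OUTPUT_FILE")

input_path = sys.argv[1]
out_path = sys.argv[2]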
Example Shell Script
Your shell script should look something like this to get rid of the IndexError:
for i in input_files/*.txt; do
    name=`echo "$i" | cut -d'.' -f1`
    echo "$name"
    # Pass two positional arguments into your Python script
    python script.py "$i" text_out/"${name}.json"
done

Related

A bash script that reads python files recursively and stops after the output of each file exists

I have a bash script that reads *.py scripts recursively and stops when the output of each *.py file (a *.pkl file) exists in the directory. The main idea is that if an output does not exist, the Python script has to run again until the output for each *.py script has been created.
bash.sh
model1.py
model2.py
model3.py
model1.pkl # expected output
model2.pkl # expected output
model3.pkl # expected output
However, I have a problem here: when the second/third output does NOT exist (from the second/third *.py script), the bash script does not run again (while if the first output does not exist, it runs again, as it should).
My bash script is the following:
#!/bin/bash
for x in $(find . -type f -name "*.py"); do
    if [[ ! -f `basename -s .py $x`.pkl ]]; then  # output with the same name as the *.py file
        python3 ${x}
    else
        exit 0
    fi
done
So, how can I force the bash script to run again if the output of any *.py script is missing? Or is it a problem with the names of the outputs?
I tried using while read and until, but I failed to make the script read all the *.py files.
Thanks in advance!
Try this. It's not the best way, but it should at least point you in the right direction:
keep_running(){
    for f in $(find . -type f -name "*.py"); do
        file_name=$(dirname "$f")/$(basename "$f" .py).pkl
        if [ ! -f "$file_name" ]; then
            echo "$file_name doesn't exist"
            python3 "$f"    # run the script here to produce its missing output
        fi
    done
}

cnt_py=0
cnt_pkl=1
while [ "$cnt_pkl" -ne "$cnt_py" ]; do
    keep_running
    cnt_py=$(find . -type f -name "*.py" | wc -l)
    cnt_pkl=$(find . -type f -name "*.pkl" | wc -l)
done
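A tighter variant (a sketch, assuming all files sit in one flat directory as in the layout above) keeps looping until every .py file has a matching .pkl next to it:
all_done=0
while [ "$all_done" -eq 0 ]; do
    all_done=1
    for f in *.py; do
        if [ ! -f "${f%.py}.pkl" ]; then
            all_done=0
            python3 "$f"    # rerun until the expected .pkl appears
        fi
    done
done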

Robust Python3 shebang line?

Is there a way to write the shebang line such that it will find the Python3 interpreter, if present?
Naively, from PEP 394 I would expect that #!/usr/bin/env python3 should work.
However, I've noticed that on some systems where python is Python3, they don't provide a python3 alias. On these systems, you'd need to use #!/usr/bin/env python to get Python3.
Is there a robust way to handle this ambiguity? Is there some way to write the shebang line such that it will use python3 if present, but try python if not? (Requiring that end users manually fix their systems to add a python3 alias is not ideal.)
The only way I can see to do this is to provide your own shebang wrapper that calls the correct version of Python. If you can reliably place the wrapper in a set location, you can do this:
Create a wrapper script, e.g. /usr/local/bin/python3_wrapper:
#!/bin/bash
cmd="$1"
shift
if which python3 >/dev/null; then
    exec python3 "$cmd" "$@"
elif which python >/dev/null; then
    version=$(python --version 2>&1 | cut -d' ' -f2 | cut -d. -f1)
    if [[ "$version" == "3" ]]; then
        exec python "$cmd" "$@"
    else
        echo "python is version $version (python3 not found)"
    fi
else
    echo "neither python3 nor python found"
fi
exit 1
Then use the following shebang in your script:
#!/usr/local/bin/python3_wrapper
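Note that the wrapper has to be executable for the shebang to work (using the assumed location above):
chmod +x /usr/local/bin/python3_wrapper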
Your other option is to have a small Python script that works under both version 2 and 3 and then execs your Python 3 script with the correct executable. If your script is called script.py, rename it to script.py3 and create script.py as follows:
#!/usr/bin/env python
import os
import sys

if sys.version_info[0] == 3:
    exe = "python"   # python is already version 3.x
else:
    exe = "python3"  # python is not version 3.x, so try python3
try:
    # Replace this process with the chosen interpreter running script.py3
    os.execvp(exe, [exe, sys.argv[0] + '3'] + sys.argv[1:])
except OSError:
    print(exe + " not found")
    sys.exit(1)

Run Python Script with variable in arguments

I have a python command that runs as follows:
python script.py -file 1000G_EUR_Phase3_plink/1000G.NUMBER --out GTEx_Cortex.chrNUMBER
I would like to replace the NUMBER variable with the numbers 1 to 20. So if I replace NUMBER with 1 it would look like this:
python script.py -file 1000G_EUR_Phase3_plink/1000G.1 --out GTEx_Cortex.chr1
and this on the second iteration (if I replace it with 2):
python script.py -file 1000G_EUR_Phase3_plink/1000G.2 --out GTEx_Cortex.chr2
But I don't want to keep manually changing NUMBER 20 times; I want to automate the entire thing.
How can I do this from the command line? Should this be done in Vim, or is there another way in Python?
Thanks!
for i in `seq 1 20`; do
    python script.py -file 1000G_EUR_Phase3_plink/1000G.${i} --out GTEx_Cortex.chr${i}
done
If you are doing this frequently, you could also write a bash script.
Create a file run_stuff that loops through commands. It could be analogous to this:
#!/bin/bash
n=$1
i=1
while (( i <= n )); do
    python prog${i}.py
    (( i = i + 1 ))
done
The above script runs prog1.py, then prog2.py, and so on. For your code, just replace the 5th line with the analogous line you want.
Then in the terminal you would do:
chmod u+x run_stuff
./run_stuff 20
The chmod command just changes the permissions of the file so you can execute it.
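For the command in the question, replacing that 5th line gives something like this (a sketch reusing the paths from the question):
#!/bin/bash
n=$1
i=1
while (( i <= n )); do
    python script.py -file 1000G_EUR_Phase3_plink/1000G.${i} --out GTEx_Cortex.chr${i}
    (( i = i + 1 ))
done
Then ./run_stuff 20 processes all twenty numbers.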

Problems running terminal command via Python

I'm working on a small project where I need to control a console player via Python. This example command works perfectly in the Linux terminal:
mplayer -loop 0 -playlist <(find "/mnt/music/soundtrack" -type f | egrep -i '(\.mp3|\.wav|\.flac|\.ogg|\.avi|\.flv|\.mpeg|\.mpg)'| sort)
In Python I'm doing the following:
command = """mplayer -loop 0 -playlist <(find "/mnt/music/soundtrack" -type f | egrep -i '(\.mp3|\.wav|\.flac|\.ogg|\.avi|\.flv|\.mpeg|\.mpg)'| sort)"""
os.system(command)
The problem is that when I run it via Python, it gives me an error:
sh: 1: Syntax error: "(" unexpected
I'm really confused here because it is the exact same string. Why doesn't the second method work?
Thanks.
Your interactive shell is probably bash, but Python's os.system command runs commands with sh on Linux, and plain sh does not support the <( ) process-substitution syntax. subprocess with shell=True also defaults to /bin/sh, so the workaround is to use subprocess.check_call() with shell=True and point its executable argument at bash:
import subprocess

command = r"""mplayer -loop 0 -playlist <(find "/mnt/music/soundtrack" -type f | egrep -i '(\.mp3|\.wav|\.flac|\.ogg|\.avi|\.flv|\.mpeg|\.mpg)' | sort)"""
subprocess.check_call(command, shell=True, executable='/bin/bash')
Your Python call os.system is probably just using a different shell than the one you're using on the terminal:
os.system() execute command under which linux shell?
The shell spawned by os.system may not support the <( ) syntax for process substitution.
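A quick way to check which shell you actually get (a small sketch; on many Linux distributions /bin/sh is dash rather than bash):
import os
os.system('echo "running under: $0"')  # typically prints /bin/sh, not bash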
import subprocess

# command is the same command string as above
proc = subprocess.Popen(command, shell=True, executable='/bin/bash',
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
output_bytes = proc.stdout.read() + proc.stderr.read()
print(output_bytes)

Python Popen grep

I'd like Popen to execute:
grep -i --line-buffered "grave" data/*.txt
When run from the shell, this gives me the wanted result. If I start a Python REPL in the very same directory where I tested grep and follow the instructions from the docs, I obtain what should be the proper argument list to feed to Popen:
['grep', '-i', '--line-buffered', 'grave', 'data/*.txt']
The result of p = subprocess.Popen(args) is
grep: data/*.txt: No such file or directory
and if I try p = subprocess.Popen(args, shell=True), I get:
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
Any help on how to make this work as intended? I'm on Mac OS X Lion.
If you type * in bash, the shell expands it to the files in the given directory before executing the command. Python's Popen does no such thing, so when you call Popen like that you're telling grep there is a file literally called *.txt in the data directory, instead of all the .txt files in that directory. That file doesn't exist, and you get the expected error.
To solve this, you can tell Python to run the command through the shell by passing shell=True to Popen:
subprocess.Popen('grep -i --line-buffered grave data/*.txt', shell=True)
Which gets translated to:
subprocess.Popen(['/bin/sh', '-c', 'grep -i --line-buffered "grave" data/*.txt'])
As explained in the documentation of Popen.
You have to use a string instead of a list here, because you want to execute /bin/sh -c "grep -i --line-buffered "grave" data/*.txt" (N.B. quotes around the command, making it a single argument to sh). If you use a list, this command is run instead: /bin/sh -c grep -i --line-buffered "grave" data/*.txt, which runs plain grep with no arguments and therefore prints its usage message.
The problem is that the shell does the file globbing for you: data/*.txt. You will need to do it yourself, for example by using the glob module.
import glob
import subprocess

cmd_line = ['grep', '-i', '--line-buffered', 'grave'] + glob.glob('data/*.txt')
subprocess.Popen(cmd_line)
