Issue parsing shell program through Python

My function run_deinterleave() is meant to copy the code from the file deinterleave.sh, replace the placeholder (sra_data) with a file name entered by the user, and then run it on the command line.
def run_deinterleave():
    codes = open('Project/CODE/deinterleave.sh')
    codex = codes.read()
    print(inp_address)
    codex = codex.replace('sra_data', inp_address)
    #is opening this twice creating another pipeline?
    stream = os.popen(codex)
    codes.close()
    self.txtarea.insert(END, codex)
    #stuff
However, I keep getting this error:
/bin/sh: 5: Syntax error: "(" unexpected
The code in deinterleave.sh works fine and produces two individual files, given an interleaved paired-end SRA file (an output file from genetic sequencing machines, I think :P):
# deinterleave paired end fastq file
paste - - - - - - - - < sra_data \
| tee >(cut -f 1-4 | tr "\t" "\n" > /home/lols/Project/reads-1.fq) \
| cut -f 5-8 | tr "\t" "\n" > /home/lols/Project/reads-2.fq

As the error message shows, the code was interpreted by /bin/sh; if you executed
/bin/sh Project/CODE/deinterleave.sh, you'd get the same error, because the process substitution >(…) is a Bash extension not understood by /bin/sh.
Besides, since you don't communicate with the shell code, we don't need pipes at all. So instead of os.popen I'd use subprocess.run, which allows you to specify Bash as the shell.
subprocess.run(codex, shell=True, executable="bash")
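For context, a minimal sketch of the whole function with that change might look like this; it assumes inp_address, self.txtarea and END exist exactly as in the question:

import subprocess

def run_deinterleave(self):
    # sketch only: inp_address, self.txtarea and END are assumed from the question
    with open('Project/CODE/deinterleave.sh') as codes:
        codex = codes.read().replace('sra_data', inp_address)
    # run the substituted script text under Bash instead of /bin/sh
    subprocess.run(codex, shell=True, executable='bash', check=True)
    self.txtarea.insert(END, codex)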

The best fix is probably to replace the shell script with native Python code entirely; but without a specification and/or sample input, I don't think we can tell you exactly how to do that.
An immediate and trivial fix is to change deinterleave so that it accepts an input file parameter.
#!/usr/bin/env bash
paste - - - - - - - - < "${1-sra_data}" |
tee >(cut -f 1-4 | tr "\t" "\n" > "${2-/home/lols/Project/reads-1.fq}") |
cut -f 5-8 | tr "\t" "\n" > "${3-/home/lols/Project/reads-2.fq}"
This refactoring also allows you to specify the names of the output files as the second and third command-line arguments.
Also, a Bash script really should not have a .sh extension, so probably take that out.
Explicitly naming Bash in the shebang line should solve the error message you got from running Bash code in sh; perhaps see also Difference between sh and bash.
With that, your Python code can be reduced to something like
subprocess.run(
    ['Project/CODE/deinterleave', inp_address],
    # probably a good idea
    check=True)
though I don't exactly understand the rest of the surrounding function, so it's not clear how exactly to rewrite it.
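If you also want to control the output names, the second and third arguments of the refactored script can be passed explicitly; a sketch using the paths from the question:

subprocess.run(
    ['Project/CODE/deinterleave', inp_address,
     '/home/lols/Project/reads-1.fq', '/home/lols/Project/reads-2.fq'],
    check=True)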
I think the shell script could be reimplemented something like
with open(inp_address, 'r') as sra_data, open(
        '/home/lols/Project/reads-1.fq', 'w') as first, open(
        '/home/lols/Project/reads-2.fq', 'w') as second:
    # copy interleaved records until end of file
    while True:
        line = sra_data.readline()
        if not line:
            break
        first.write(line)
        for idx in range(3):
            first.write(sra_data.readline())
        for idx in range(4):
            second.write(sra_data.readline())

Related

Send parameters to python from bash

I have a bash script that calls a python script with parameters.
In the bash script, I'm reading a file that contains one row of parameters wrapped in double quotes, and then calling the python script with the line I read.
My problem is that Python receives the parameters split on spaces.
The line looks like this: "param_a" "Param B" "Param C"
Code Example:
Bash Script:
LINE=`cat $tmp_file`
id=`python /full_path/script.py $LINE`
Python Script:
print sys.argv[1]
print sys.argv[2]
print sys.argv[3]
Received output:
"param_a"
"Param
B"
Wanted output:
param_a
Param B
Param C
How can I send the parameters to the Python script the way I need?
Thanks!
What about
id=`python /full_path/script.py $tmp_file`
and
import sys
for line in open(sys.argv[1]):
    print(line)
?
The issue is in how bash passes the arguments; Python has nothing to do with it.
So you have to sort all of this out before sending it to Python. I decided to use awk and xargs for this (but xargs is the actual MVP here).
LINE=$(cat $tmp_file)
awk -v ORS="\0" -v FPAT='"[^"]+"' '{for (i=1;i<=NF;i++){print substr($i,2,length($i)-2)}}' <<<$LINE |
xargs -0 python ./script.py
First, $(..) is preferred over backticks because it is more readable; you are making a variable, after all.
awk only reads from stdin or a file, but you can force it to read from a variable with <<<, also called a "here string".
With awk I loop over all fields (as defined by the regex in the FPAT variable), and print them without the "".
The output record separator I chose is the NULL character (-v ORS='\0'); xargs will split on this character.
xargs will now parse the piped input by separating the arguments on NULL characters (set with -0) and execute the command given with the parsed arguments.
Note: while awk is found on most UNIX systems, I make use of FPAT, which is a GNU awk extension, and GNU awk may not be your default awk (for example on Ubuntu), but it is usually just an install of gawk away.
Also, the following command would be a quick and easy solution, but it is generally considered unsafe, since eval will execute everything it receives.
eval "python ./script "$LINE
This can be done using bash arrays:
tmp_file='gash.txt'
# Set IFS to " which splits on double quotes and removes them
# Using read is preferable to using the external program cat
# read -a reads into the array called "line"
# UPPERCASE variable names are discouraged because of collisions with bash variables
IFS=\" read -ra line < "$tmp_file"
# That leaves blank and space elements in "line",
# we create a new array called "params" without those elements
declare -a params
for ((i=0; i < ${#line[@]}; i++))
do
    p="${line[i]}"
    if [[ -n "$p" && "$p" != " " ]]
    then
        params+=("$p")
    fi
done
# `backticks` are frowned upon because of poor readability
# I've called the python script "gash.py"
id=$(python ./gash.py "${params[@]}")
echo "$id"
gash.py:
import sys
print "1",sys.argv[1]
print "2",sys.argv[2]
print "3",sys.argv[3]
Gives:
1 param_a
2 Param B
3 Param C

EoF / Ctrl+D command in Bash + Python

I'm trying to write some functions in Python so that I can connect to a Linux terminal and do stuff (like, in this case, create a file). The code I have works partially. The only thing that doesn't work is doing something after you have entered the file contents: for instance, you create the file and then want to navigate somewhere else (cd /tmp). Instead of executing the next command, it just appends it to the file that was created.
def create_file(self, name, contents, location):
    try:
        log.info("Creating a file...")
        self.device.execute("mkdir -p {}".format(location))
        self.cd_path(location)
        self.device.sendline("cat > {}".format(name))
        self.device.sendline("{}".format(contents))
        self.device.sendline("EOF") # send the CTRL + D command to save and exit I tried here with ^D as well
    except:
        log.info("Failed to create the file!")
The contents of the file is:
cat test.txt
#!/bin/bash
echo "Fail Method Requested"
exit 1
EOF
ls -d /tmp/asdasd
The order of commands executed is:
execute.create_file(test.txt, the_message, the_location)
execute.check_path("/tmp/adsasd") #this function just checks with ls -d if the directory exists.
I have tried with sendline the following combinations:
^D, EOF, <<EOF
I don't really understand how I could make this happen. I just want to create a file with a specific message. (When researching how to do this with vi I ran into the same problem, but there the command I needed was the one for ESC.)
If anyone could help with some input that would be great!!
Edit: As Rob mentioned below, sending the character "\x04" actually works. For anyone else having this issue, you can also consult this chart for other combinations if needed:
http://donsnotes.com/tech/charsets/ascii.html
You probably need to send the EOF character, which is typically CONTROL-D, not the three characters E, O, and F.
self.device.sendline("\x04")
http://wiki.bash-hackers.org/syntax/redirection#here_documents
Here docs allow you to use any file input termination string you like to represent end of file (such as the literal EOF you're attempting to use now). Quoting that string tells the shell not to interpret expansions inside the heredoc content, ensuring that said content is treated as literal.
Using pipes.quote() here ensures that filenames with literal quotes, $s, spaces, or other surprising characters won't break your script. (Of course, you'll need to import pipes; on Python 3, by contrast, this has moved to shlex.quote()).
self.device.sendline("cat > {} <<'EOF'".format(pipes.quote(name)))
Then you can write the EOF as is, having told bash to interpret it as the end of file input.
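Putting the two parts together, the relevant lines of create_file might look something like this; it is only a sketch, assuming the same self.device object with execute()/sendline() and the cd_path() helper from the question:

import pipes  # on Python 3, use shlex.quote instead of pipes.quote

def create_file(self, name, contents, location):
    # sketch only: self.device and self.cd_path are assumed from the question
    self.device.execute("mkdir -p {}".format(pipes.quote(location)))
    self.cd_path(location)
    # the quoted 'EOF' tells the shell to treat the heredoc body literally
    self.device.sendline("cat > {} <<'EOF'".format(pipes.quote(name)))
    self.device.sendline(contents)
    # this literal EOF line now terminates the heredoc instead of ending up in the file
    self.device.sendline("EOF")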

Python plumbum: Passing $ in a cmd argument

I am trying to execute the following command in python using plumbum:
sort -u -f -t$'\t' -k1,1 file1 > file2
However, I am having issues passing the -t$'\t' argument. Here is my code:
from plumbum.cmd import sort
separator = r"-t$'\t'"
print separator
cmd = (sort["-u", "-f", separator, "-k1,1", "file1"]) > "file2"
print cmd
print cmd()
I can see problems right away after print separator and print cmd() executes:
-t$'\t'
/usr/bin/sort -u -f "-t\$'\\t'" -k1,1 file1 > file2
The argument is wrapped in double quotes.
An extra \ before $ and \t is inserted.
How should I pass this argument to plumbum?
You may have stumbled into limitations of the command-line escaping.
I could make it work using the subprocess module, passing a real tab character literally and letting Python handle the output redirection:
import subprocess

with open("file2", "w") as out:
    p = subprocess.Popen(["sort", "-u", "-f", "-t\t", "-k1,1", "file1"], stdout=out)
    p.wait()
Also, a short full-Python solution that does what you want:
with open("file1") as fr, open("file2", "w") as fw:
    fw.writelines(sorted(set(fr), key=lambda x: x.split("\t")[0]))
The full-Python solution doesn't behave exactly the same way sort does when dealing with uniqueness: if two lines have the same first field but not the same second field, sort keeps only one of them, whereas the set will keep both.
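If you need behaviour closer to sort -u -f -k1,1, a possible sketch is to deduplicate on the case-folded first field explicitly; the file names are the ones from the question:

# keep only the first line seen for each case-folded first field,
# then write lines in key order, roughly mimicking sort -u -f -k1,1
seen = {}
with open("file1") as fr:
    for line in fr:
        key = line.split("\t")[0].lower()
        seen.setdefault(key, line)
with open("file2", "w") as fw:
    fw.writelines(seen[k] for k in sorted(seen))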
EDIT: unchecked, but you just confirmed that it works: tweaking your plumbum code with
separator = "-t\t"
works as well. Of the three options, though, I'd recommend the full-Python solution, since it doesn't involve an external process and is therefore more pythonic and portable.
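For completeness, the question's plumbum code with that one change would look something like:

from plumbum.cmd import sort

# pass a literal tab instead of the shell-only $'\t' quoting
separator = "-t\t"
cmd = (sort["-u", "-f", separator, "-k1,1", "file1"]) > "file2"
cmd()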

Why is sys.argv() giving me an index out of bound?

I have a text file with content like this:
honda motor co of japan doesn't expect output at its car manufacturing plant in thailand
When I run wc -l textfile.txt, I receive 0.
The problem is I am running a python script that needs to count the number of line in this text file and run accordingly. I have tried two ways of computing the number of lines but they both keep giving me 0 and my code refuses to run.
Python code:
#Way 1
with open(sys.argv[1]) as myfile:
    row = sum(1 for line in myfile)
print(row)
#Way 2
row = run("cat %s | wc -l" % sys.argv[1]).split()[0]
I receive an error that says: with open(sys.argv[1]) as myfile IndexError: list index out of range
I am calling the script and passing this file from PHP:
exec('python testthis.py $file 2>&1', $output);
I suspect that sys.argv[1] is giving me an error.
There's nothing wrong with the first example of your Python code (way 1).
The problem is the PHP calling code; the string being passed to exec() uses single quotes which prevents the expansion of the $file variable into the command string. The resulting call therefore passes the literal string $file as the argument to exec(), which in turn runs the command in a shell. That shell treats $file as a shell variable and tries to expand it, but it is not defined, and so it expands to an empty string. The resulting call is:
python testthis.py 2>&1
to which Python raises IndexError: list index out of range because it is missing an argument.
To fix use double quotes around the command when calling exec() in PHP:
$file = 'test.txt';
exec("python testthis.py $file 2>&1", $output);
Now $file can be expanded into the string as required.
This does assume that you actually want to expand a PHP variable into the string. Because exec() runs the command in a shell, it is also possible to have the variable defined in the shell's environment, and it will be expanded by the shell into the final command. To do this you would use single quotes around the command passed to exec().
Note that the Python code of "way 1" will return a line count of 1, not 0 as does wc -l.
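The discrepancy is simply that wc -l counts newline characters, and the sample file has no trailing newline, while iterating over the file in Python still yields the final unterminated line; a quick, hypothetical demonstration:

# textfile.txt contains one line without a trailing newline
with open('textfile.txt') as f:
    data = f.read()
print(data.count('\n'))          # 0, which is what wc -l reports
with open('textfile.txt') as f:
    print(sum(1 for line in f))  # 1, which is what "way 1" reports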

Passing individual lines from files into a python script using a bash script

This might be a simple question, but I am new to bash scripting and have spent quite a bit of time on this with no luck; I hope I can get an answer here.
I am trying to write a bash script that reads individual lines from a text file and passes them along as arguments to a python script. I have a list of files (which I have saved into a single text file, one per line) that I need to use as arguments for my python script, and I would like to use a bash script to send them all through. Of course I could take the tedious route and copy/paste the rest of the python command onto individual lines of the script, but I would think there is a way to do this with the "read line" command. I have tried all sorts of combinations of commands, but here is the most recent one I have:
#!/bin/bash
# Command Output Test
cat infile.txt << EOF
while read line
do
VALUE = $line
python fits_edit_head.py $line $line NEW_PARA 5
echo VALUE+"huh"
done
EOF
When I do this, all I get returned is the individual lines from the input file. I have the extra VALUE there to see if it will print that, but it does not. Clearly there is something simple about the "read line" command that I do not understand but after messing with it for quite a long time, I do not know what it is. I admit I am still a rookie to this bash scripting game, and not a very good one at that. Any help would certainly be appreciated.
You probably meant:
while read line; do
VALUE=$line ## No spaces allowed
python fits_edit_head.py "$line" "$line" NEW_PARA 5 ## Quote properly to isolate arguments well
echo "$VALUE+huh" ## You don't expand without $
done < infile.txt
Python may also read STDIN so that it could accidentally read input from infile.txt so you can use another file descriptor:
while read -u 4 line; do
    ...
done 4< infile.txt
Better yet, if you're using Bash 4.0, it's safer and cleaner to use readarray:
readarray -t lines < infile.txt
for line in "${lines[@]}"; do
    ...
done
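As a side note, if the surrounding logic ever grows, the same loop can be driven from Python itself rather than bash; a hypothetical sketch using the file and script names from the question:

import subprocess

# call fits_edit_head.py once per non-empty line of infile.txt
with open('infile.txt') as f:
    for line in f:
        line = line.strip()
        if line:
            subprocess.run(
                ['python', 'fits_edit_head.py', line, line, 'NEW_PARA', '5'],
                check=True)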
