awk commands within python script - python

I need to write a Python script that calls a few awk commands.
#!/usr/bin/python
import os, sys
input_dir = '/home/abc/data'
os.chdir(input_dir)
#wd=os.getcwd()
#print wd
os.system ("tail -n+2 ./*/*.tsv|cat|awk 'BEGIN{FS="\t"};{split($10,arr,"-")}{print arr[1]}'|sort|uniq -c")
It gives an error in line 8: SyntaxError: unexpected character after line continuation character
Is there a way I can get the awk command to work within the Python script?
Thanks

You have both types of quotes in that string, so use triple quotes around the whole thing:
>>> x = '''tail -n+2 ./*/*.tsv|cat|awk 'BEGIN{FS="\t"};{split($10,arr,"-")}{print arr[1]}'|sort|uniq -c'''
>>> x
'tail -n+2 ./*/*.tsv|cat|awk \'BEGIN{FS="\t"};{split($10,arr,"-")}{print arr[1]}\'|sort|uniq -c'

You should use subprocess instead of os.system:
import subprocess
COMMAND = "tail -n+2 ./*/*.tsv|cat|awk 'BEGIN{FS=\"\t\"};{split($10,arr,\"-\")}{print arr[1]}'|sort|uniq -c"
subprocess.call(COMMAND, shell=True)
As TehTris has pointed out, the arrangement of quotes in the question breaks the command string into multiple strings. Pre-formatting the command and escaping the double-quotes fixes this.
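A third option, sketched here as an assumption rather than something either answer proposed, is a raw triple-quoted string: neither kind of quote needs escaping, and \t reaches awk as backslash + t so awk interprets it itself. The names COMMAND and run_pipeline are only illustrative, and text=True assumes Python 3.7 or later.

```python
import subprocess

# Raw triple-quoted string: single and double quotes need no escaping,
# and \t stays as the two characters backslash + t for awk to interpret.
COMMAND = r'''tail -n+2 ./*/*.tsv|cat|awk 'BEGIN{FS="\t"};{split($10,arr,"-")}{print arr[1]}'|sort|uniq -c'''


def run_pipeline(cwd="."):
    """Run the pipeline through the shell in cwd and return its output as text."""
    return subprocess.check_output(COMMAND, shell=True, cwd=cwd, text=True)
```

Calling run_pipeline('/home/abc/data') would replace both the os.chdir and the os.system call from the question.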

Related

sed error: sed: -e expression #1, char 22: unterminated `s' command

I am running the sed command inside python using os.system. Below is the code.
os.system("sed -i /solid/s/Visualization Toolkit generated SLA File/chestwall/g mesh1.stl")
The name to be changed has spaces in it. Also, in the end part i.e. mesh1.stl, the 1 need to be variable. How to do it?
Firstly, for this code, I am getting error as:
sed: -e expression #1, char 22: unterminated s command
I tried putting / at the end.
Second, I need the mesh1 to be a variable from previous line. Say, mesh1 as a and everytime, a changes. How to write like that?
Make sure the sed expression is wrapped in single or double quotes, and then use "+" to concatenate the strings before passing them to os.system. Convert the number to a string first:
import os
var = 1
os.system("sed -i '/solid/s/Visualization Toolkit generated SLA File/chestwall/g' mesh" + str(var) + ".stl")
The function os.system() is now considered to be superseded by subprocess.call().
Would you please try the following:
import subprocess
a = 'mesh1'
cmd = ['sed', '-i', '/solid/s/Visualization Toolkit generated SLA File/chestwall/g', '{0}.stl'.format(a)]
subprocess.call(cmd)
Passing the command as a list rather than a string lets you separate the arguments explicitly, so the spaces in the sed expression need no shell quoting.
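If an external sed is not required, the same edit can be sketched in plain Python. This is an assumption-laden equivalent, not what either answer proposed: the function name replace_in_solid_lines is hypothetical, and it treats both "solid" and the text to replace as literal strings, whereas sed treats them as regular expressions.

```python
from pathlib import Path


def replace_in_solid_lines(path, old, new):
    """Rough stand-in for sed -i '/solid/s/old/new/g' using literal strings."""
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    # Only lines containing "solid" are edited, like sed's /solid/ address.
    edited = [line.replace(old, new) if "solid" in line else line
              for line in lines]
    p.write_text("".join(edited))
```

With a = 'mesh1', the call would be replace_in_solid_lines('{0}.stl'.format(a), 'Visualization Toolkit generated SLA File', 'chestwall').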

Fail to run a bash command on python with subprocess and sed

My goal is to execute the following bash command in Python and store its output:
echo 'sudo ./run_script.sh -dates \\{\\'2017-11-16\\',\\'2017-11-29\\'\\}'|sed 's;\\\\;\\;'
When I run this command in bash, the output is: sudo ./run_script.sh -dates \{\'2019-10-05\',\'2019-10-04\'\}
My initial idea was to replace the double backslash with a single backslash in Python. As ridiculous as it seems, I couldn't do it in Python (only when using print() is the output as I would like, but I can't store the output of print(), and str() doesn't convert \\ to \). So I decided to do it in bash.
import subprocess
t= 'some \\ here'
cmd = "echo \'"+ t+"\'|sed 's;\\\\;\\;'"
ps = subprocess.run(cmd,shell=True,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
ps.stdout
Out[6]: b"sed: -e expression #1, char 7: unterminated `s' command\n"
Running Python 3.6.8 on Ubuntu 18
Try using subprocess.check_output instead. You're also forgetting an extra backslash for every backslash in your command.
import subprocess
command = "echo 'some \\\\here'|sed 's;\\\\\\\\;\\\\;'"
output = subprocess.check_output(command, shell=True).decode()
print(output) # prints your expected result "some \here"
After re-reading your question I kinda understood what you wanted.
a = r'some \here'
print(a) #some \here
Again, raw string literals...
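The confusion usually comes from repr() versus the real contents of the string. The following sketch, with illustrative variable names, shows that a doubled backslash in source code is a single character, and that str.replace can collapse a real pair of backslashes without calling the shell at all.

```python
t = 'some \\\\ here'              # four backslashes in source, two in the string
single = t.replace('\\\\', '\\')  # collapse each pair into one backslash

print(t)             # shows the real contents: some \\ here
print(repr(single))  # repr() doubles the one backslash: 'some \\ here'
print(len('\\'))     # 1: an escaped backslash is a single character
```

The value stored in single already holds one backslash; the doubled form only appears when the string is displayed with repr(), for example at the interactive prompt.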

Escaping quotation marks in python string

I'm using subprocess to call a program within python and I'm passing a string to it, which can contain quotation marks.
This is the piece of code that is giving me troubles
import subprocess
text = subprocess.Popen("""awk 'BEGIN { print "%s"}' | my_program """ % sentence, stdout=subprocess.PIPE, shell=True)
When sentence = "I'm doing this" I get the following error message
/bin/sh: -c: line 0: unexpected EOF while looking for matching `"'
/bin/sh: -c: line 1: syntax error: unexpected end of file
I guess this has to do with the way quotes are escaped in python and linux. Is there a way to fix it?
You're confusing awk and the underlying shell because there's a single quote inside your single-quoted awk expression. The first part is equivalent to:
awk 'BEGIN { print "I'm doing this"}'
Which is incorrect, even in pure shell.
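Quick fix: escape the single quotes in your sentence. Inside a single-quoted shell string a backslash does not escape anything, so each ' has to be written as '\'' (close the quote, add an escaped quote, reopen the quote):
text = subprocess.Popen("""awk 'BEGIN { print "%s"}' | my_program """ % sentence.replace("'", "'\\''"), stdout=subprocess.PIPE, shell=True)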
Proper fix: don't use awk at all just to print something, just feed input to your subprocess:
text = subprocess.Popen(my_program, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
output,error = text.communicate(sentence.encode())
(and you can get rid of the shell=True in the process)
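A self-contained sketch of that approach follows. Since my_program is not specified in the question, a small Python child process that upper-cases its input stands in for it; that stand-in and the helper name pipe_through are assumptions made only so the example runs anywhere.

```python
import subprocess
import sys


def pipe_through(argv, sentence):
    """Send sentence to the program's stdin and return its stdout as text."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    output, _ = proc.communicate(sentence.encode())
    return output.decode()


# Hypothetical stand-in for my_program: upper-cases whatever arrives on stdin.
my_program = [sys.executable, "-c",
              "import sys; sys.stdout.write(sys.stdin.read().upper())"]

result = pipe_through(my_program, "I'm doing this")
print(result)
```

Because no shell is involved, the apostrophe in the sentence needs no escaping at all.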
Last point: you seem to have trouble because my_program is some program plus arguments. To pass a command such as aspell -a you can use a single string, which on POSIX systems needs shell=True so that the shell splits it:
my_program = "aspell -a"
or, without a shell, a list with one item per argument:
my_program = ['aspell','-a']
but not
my_program = ['aspell -a']
which is probably what you've done here, so Python tries to literally execute a program named "aspell -a" instead of splitting it into program + argument.
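If the command arrives as a single string, the standard library's shlex.split can turn it into the list form using shell-like rules. A minimal sketch (the grep command and the file name notes.txt are made up for illustration):

```python
import shlex

# Split a command string the way a POSIX shell would.
my_program = shlex.split("aspell -a")
print(my_program)   # ['aspell', '-a']

# Quoted arguments stay together as one list item.
quoted = shlex.split("grep 'two words' notes.txt")
print(quoted)       # ['grep', 'two words', 'notes.txt']
```

The resulting list can be passed straight to subprocess.Popen without shell=True.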

Python plumbum: Passing $ in an cmd argument

I am trying to execute the following command in python using plumbum:
sort -u -f -t$'\t' -k1,1 file1 > file2
However, I am having issues passing the -t$'\t' argument. Here is my code:
from plumbum.cmd import sort
separator = r"-t$'\t'"
print separator
cmd = (sort["-u", "-f", separator, "-k1,1", "file1"]) > "file2"
print cmd
print cmd()
I can see problems right away after print separator and print cmd() executes:
-t$'\t'
/usr/bin/sort -u -f "-t\$'\\t'" -k1,1 file1 > file2
The argument is wrapped in double quotes.
An extra \ before $ and \t is inserted.
How should I pass this argument to plumbum?
You may have stumbled into limitations of the command line escaping.
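I could make it work using the subprocess module, passing a real tab character literally. Note that ">" is shell syntax, so without a shell the redirection is done by handing the output file to stdout:
import subprocess
with open("file2", "w") as out:
    subprocess.call(["sort", "-u", "-f", "-t\t", "-k1,1", "file1"], stdout=out)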
Also, a short pure-Python solution that does what you want:
with open("file1") as fr, open("file2","w") as fw:
    fw.writelines(sorted(set(fr),key=lambda x : x.split("\t")[0]))
The pure-Python solution doesn't behave exactly like sort when it comes to uniqueness. If two lines have the same first field but different second fields, sort keeps one of them, whereas the set keeps both.
EDIT: I hadn't checked this, but you just confirmed that it works: tweak your plumbum code with:
separator = "-t\t"
Of the three options, I'd recommend the pure-Python solution, since it doesn't involve an external process and is therefore more Pythonic and portable.
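To get closer to what sort -u -f -k1,1 does, the pure-Python version can deduplicate on the first field instead of on the whole line. The sketch below is an approximation under stated assumptions: it keeps the first line seen for each key, lower-cases the key to imitate -f, and orders by plain code points rather than by locale. The function name unique_by_first_field is hypothetical.

```python
def unique_by_first_field(lines, sep="\t"):
    """Keep one line per case-folded first field, ordered by that field."""
    seen = {}
    for line in lines:
        key = line.rstrip("\n").split(sep)[0].lower()
        seen.setdefault(key, line)   # first line for each key wins
    return [seen[key] for key in sorted(seen)]


sample = ["b\t1\n", "a\t2\n", "A\t3\n", "b\t4\n"]
print(unique_by_first_field(sample))
```

Reading from file1 and writing to file2 would then be fw.writelines(unique_by_first_field(fr)) inside the same with block.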

Avoid subprocess.Popen auto escaping my backslashes in grep

I'm trying to write an svn pre-commit hook in python. Part of this involves checking the diff file to see if there are any actual file changes (as opposed to just property changes).
I have a working grep command which I can execute fine on the shell
grep "^\(Added: \|Modified: \|Deleted: \)" diff filename | grep -v 'svn:'
However, when I put it through subprocess.Popen it escapes all my backslashes, which knackers the regexp.
Executing command: ['grep', '"^\\(Added: \\|Modified: \\|Deleted: \\)"', ...]
How do I avoid this?
NB: I'm aware that I can pipe results between subprocesses and I can do the two greps that way. I need help getting the first one working first though :/
NB2: I also tried using filterdiff --clean instead and couldn't get it to work. Searching for Added, Modified or Deleted lines, removing those with 'svn:' in and checking I had some results seemed to work though.
Python code:
command = ['grep', '"^\(Added: \|Modified: \|Deleted: \)"', filename]
sys.stdout.write('Executing command: %s\n' % (command))
p = subprocess.Popen(command,
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     shell=True)
data = p.stdout.read()
if len(data) == 0:
    sys.stdout.write("Diff does not contain any file modifications.\n")
    exit(0)
You need to consider what you want grep to see in its command line arguments.
The first argument needs to be the literal string "^\(Added: \|Modified: \|Deleted: \)", so that means that it shouldn't include the double quotes but should include the backslashes.
The way to express this kind of string is to use Python raw strings:
command = ['grep', r'^\(Added: \|Modified: \|Deleted: \)', filename]
A good way to check what you're actually running is to replace grep with echo so you can at least see what you're passing to the command.
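The same check can be sketched without relying on echo by using a small Python child process that prints its first argument. The child command is an assumption chosen for portability; the point is that the doubled backslashes in the logged list are only repr() notation, and the child receives single backslashes.

```python
import subprocess
import sys

pattern = r'^\(Added: \|Modified: \|Deleted: \)'

# Child process that prints exactly the argument it received.
show_arg = [sys.executable, "-c", "import sys; print(sys.argv[1])", pattern]
received = subprocess.check_output(show_arg).decode().strip()

print(repr(pattern))   # repr() doubles the backslashes, as in the log line
print(received)        # the child sees single backslashes
```

Passing the list without shell=True means the pattern reaches the program unchanged, with no shell quoting layer in between.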
