python subprocess grep returns zero

I'm trying to count the number of times the string OW appears in a file with the following script:
import subprocess
subprocess.call("grep OW file.txt | wc -l", shell=True)
but it always prints the correct answer followed by a zero:
>>> subprocess.call("grep OW production_run.gro | wc -l", shell=True)
2638
0
>>>
and when I try to assign the result to a variable, it stores 0.
Does anyone have any idea why that's happening and how to fix it?

Python's docs about subprocess.call
Run the command described by args. Wait for command to complete, then
return the returncode attribute.
Try check_output instead

Based on the documentation for subprocess.check_output:
Run command with arguments and return its output as a byte string.
Thus you can use this:
count = int(subprocess.check_output("grep OW file.txt | wc -l", shell=True))
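If you'd rather skip the shell pipeline entirely, here is a minimal sketch that does the counting in Python (assuming file.txt is the file from the question; like grep OW | wc -l, it counts matching lines, not total occurrences):
# Count lines containing "OW" without invoking a shell.
with open("file.txt") as f:
    count = sum(1 for line in f if "OW" in line)
print(count)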

Related

Executing awk in Python shell

I have a shell command which parses a certain content and gives the required output. I need to implement this in Python, but the shell command has a newline character "\n" which is not being handled correctly when run through Python.
Of the many lines in the output log, the required line looks like - configurationFile=/app/log/conf/the_jvm_name.4021.logback.xml
I would only need the_jvm_name from the above. The syntax will always be the same. The shell command works fine.
Shell Command -
ps -ef | grep 12345 | tr " " "\n" | grep logback.configurationFile | awk -F"/" '{print $NF}'| cut -d. -f1
Python (escaped all the required double quotes) -
import subprocess
pid_arr = "12345"
sh_command = "ps -ef | grep "+pid_arr+" | tr \" \" \"\n\" | grep configurationFile | awk -F \"/\" '{print $NF}' | cut -d. -f1"
outpt = subprocess.Popen(sh_command , shell=True,stdout=subprocess.PIPE).communicate()[0].decode('utf-8').strip()
With Python, I'm not getting the desired output. It just prints configurationFile as it appears in the command.
What am I missing here? Is there a better way to get these details?
You can achieve what you want using a regex substitution in Python:
import re
import subprocess

# Read ps output as text (universal_newlines=True) so the regexes work on str.
output = subprocess.check_output(["ps", "-ef"], universal_newlines=True)
for line in output.splitlines():
    if re.search("12345", line):
        output = re.sub(r".*configurationFile=.*/([^.]+).*", r"\1", line)
This captures the part after the last / in the configuration file path, up to the next ..
You could make it slightly more robust by checking only the second column (the PID) for 12345, either by splitting each line on white space:
cols = re.split(r"\s+", line)
if len(cols) > 1 and cols[1] == "12345":
or by using a better regex, like:
if re.match(r"\S+\s+12345\s", line):
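Putting those pieces together, a fuller sketch of the PID-column check (assuming Python 3 and the PID 12345 from the question; the regex here keeps the match within a single token so it does not run into later arguments):
import re
import subprocess

output = subprocess.check_output(["ps", "-ef"], universal_newlines=True)
jvm_name = None
for line in output.splitlines():
    # Match 12345 only when it sits in the PID column (second field).
    if re.match(r"\S+\s+12345\s", line):
        m = re.search(r"configurationFile=\S*/([^.\s]+)", line)
        if m:
            jvm_name = m.group(1)
print(jvm_name)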
Note that you could also shorten your pipeline considerably by just doing something like:
ps -ef | sed -nE '/12345/ { s/.*configurationFile=.*\/([^.]*).*/\1/; p }'
Your shell command works, but it has to deal with too many lines of output and too many fields per line. An easier solution is to tell the ps command to just give you 1 line and on that line, just one field that you care about. For example, on my system:
ps -o cmd h 979
will output:
/usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
The -o cmd flag outputs only the CMD column, while the h option tells ps to omit the header. Finally, 979 is the process ID, which tells ps to output information just for that process.
This output is not exactly what you have in your problem, but it is similar enough. Once we have limited the output, we eliminate the need for other commands such as grep, awk, and so on. At this point, we can use a regular expression to extract what we want:
from __future__ import print_function
import re
import subprocess

pid = '979'
command = ['ps', '-o', 'cmd', 'h', pid]
# Read the output as text so the regex below operates on str, not bytes.
output = subprocess.check_output(command, universal_newlines=True)
pattern = re.compile(r"""
    config-file=    # Literal string search
    .+/             # Everything up to the last forward slash
    ([^.]+)         # Non-dot chars, this is what we want
    """, re.VERBOSE)
matched = pattern.search(output)
if matched:
    print(matched.group(1))
Notes
For the regular expression, I am using the verbose form, which allows me to use comments to annotate my pattern. I like this style because regular expressions can be difficult to read.
On your system, please adjust the config-file part of the pattern to match your output.

How to remove b symbol in python3

How do I remove the b prefix from the output of this python3 script?
import subprocess
get_data=subprocess.check_output(["df -k | awk '{print $6}'"],shell=True)
data_arr=get_data.split()
data_arr.pop(0)
data_arr.pop(0)
for i in data_arr[:]:
    print(str(i))
Output
b'/dev/shm'
b'/run'
b'/sys/fs/cgroup'
b'/'
b'/tmp'
b'/test'
b'/boot'
b'/home'
b'/var'
b'/mnt/install'
b'/mnt/snapshot'
b'/mnt/share'
b'/mnt/storage'
b'/mnt/linux'
b'/mnt/download'
b'/run/user/1001'
The b prefix indicates that the output of check_output is a bytes object rather than a str. The best way to remove it is to convert the output to a string before you do any further work on it:
byte_data=subprocess.check_output(["df -k | awk '{print $6}'"],shell=True)
str_data = byte_data.decode('utf-8')
data_arr=str_data.split()
...
The decode method will take care of any unicode you may have in the string. If your default encoding (or the one used by awk I suppose) is not UTF-8, substitute the correct one in the example above.
Possibly an even better way to get around this issue is to tell check_output to open stdout as a text stream. The easiest way is to add a universal_newlines=True argument, which will use the default encoding for your current locale:
str_data = subprocess.check_output(["df -k | awk '{print $6}'"], shell=True, universal_newlines=True)
Alternatively, you can specify an explicit encoding:
str_data = subprocess.check_output(["df -k | awk '{print $6}'"], shell=True, encoding='utf-8')
In both of these cases, you do not need to decode because the output will already be str rather than bytes.
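As a side note, you can sidestep both the bytes issue and the awk dependency by doing the column selection in Python. A small sketch, assuming the mount point is the last whitespace-separated column of df -k output (mount points containing spaces would break this):
import subprocess

out = subprocess.check_output(["df", "-k"], universal_newlines=True)
# Skip the header line and take the last column of each remaining line.
mount_points = [line.split()[-1] for line in out.splitlines()[1:] if line.strip()]
for mp in mount_points:
    print(mp)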
from my SO question:
import subprocess

read_key = ["binary", "arg1", "arg2", "arg3"]
proc = subprocess.Popen(read_key, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding='utf-8')
output = proc.communicate()[0]
print(output)
MY_EXPECTED_OUTPUT_STRING
Try this:
a = str(yourvalue, 'utf-8')

Escaping quotation marks in python string

I'm using subprocess to call a program within python and I'm passing a string to it, which can contain quotation marks.
This is the piece of code that is giving me troubles
import subprocess
text = subprocess.Popen("""awk 'BEGIN { print "%s"}' | my_program """ % sentence, stdout=subprocess.PIPE, shell=True)
When sentence = "I'm doing this" I get the following error message
/bin/sh: -c: line 0: unexpected EOF while looking for matching `"'
/bin/sh: -c: line 1: syntax error: unexpected end of file
I guess this has to do with the way quotes are escaped in Python and Linux. Is there a way to fix it?
You're confusing awk and the underlying shell because there's a quote inside your quoted awk expression. The first part is equivalent to:
awk 'BEGIN { print "I'm doing this"}'
Which is incorrect, even in pure shell.
Quick fix: escape the single quotes in your sentence so the shell still sees a single quoted awk program (the '\'' idiom closes the quoted string, inserts a literal quote, and reopens it):
text = subprocess.Popen("""awk 'BEGIN { print "%s"}' | my_program """ % sentence.replace("'", "'\\''"), stdout=subprocess.PIPE, shell=True)
Proper fix: don't use awk at all just to print something, just feed input to your subprocess:
text = subprocess.Popen(my_program, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
output,error = text.communicate(sentence.encode())
(and you can get rid of the shell=True in the process)
Last point: you seem to have trouble because my_program is some program plus arguments. To pass a command such as aspell -a you can do:
my_program = "aspell -a"
or:
my_program = ['aspell','-a']
but not
my_program = ['aspell -a']
which is probably what you've done here; Python then tries to execute a program literally named "aspell -a" instead of splitting it into program + argument.
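Putting the proper fix together, a minimal end-to-end sketch (assuming aspell is the program in question and is installed; since no shell is involved, the sentence may contain quotes freely):
import subprocess

sentence = "I'm doing this"
# Feed the sentence to the program on stdin instead of building a shell string.
proc = subprocess.Popen(['aspell', '-a'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
output, error = proc.communicate(sentence.encode())
print(output.decode())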

Storing value from a parsed ping

I'm working on some code that performs a ping operation from python and extracts only the latency by using awk. This is currently what I have:
from os import system
l = system("ping -c 1 sitename | awk -F = 'FNR==2 {print substr($4,1,length($4)-3)}'")
print l
The system() call works fine, but the output goes to the terminal rather than being stored in l. Basically, an example output I'd get from this particular block of code would be
90.3
0
Why does this happen, and how would I go about actually storing that value into l? This is part of a larger thing I'm working on, so preferably I'd like to keep it in native python.
Use subprocess.check_output if you want to store the output in a variable:
from subprocess import check_output
l = check_output("ping -c 1 sitename | awk -F = 'FNR==2 {print substr($4,1,length($4)-3)}'", shell=True)
print l
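If you want the latency as a number rather than a raw string, a small sketch (assuming Python 3; sitename is the placeholder host from the question, and universal_newlines=True makes check_output return str instead of bytes):
from subprocess import check_output

raw = check_output(
    "ping -c 1 sitename | awk -F = 'FNR==2 {print substr($4,1,length($4)-3)}'",
    shell=True, universal_newlines=True,
)
# Strip the trailing newline and convert the latency to a float.
latency = float(raw.strip())
print(latency)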
Related: Extra zero after executing a python script
os.system() returns the return code of the called command, not the output to stdout.
For detail on how to properly get the command's output (including pre-Python 2.7), see this: Running shell command from Python and capturing the output
BTW I would use Ping Package https://pypi.python.org/pypi/ping
It looks promising
Here is how I store the output in a variable (in the shell):
test=$(ping -c 1 google.com | awk -F"=| " 'NR==2 {print $11}')
echo "$test"
34.9

Avoid subprocess.Popen auto escaping my backslashes in grep

I'm trying to write an svn pre-commit hook in python. Part of this involves checking the diff file to see if there are any actual file changes (as opposed to just property changes).
I have a working grep command which I can execute fine on the shell
grep "^\(Added: \|Modified: \|Deleted: \)" diff filename | grep -v 'svn:'
However, when I put it through subprocess.Popen it escapes all my backslashes, which knackers the regexp.
Executing command: ['grep', '"^\\(Added: \\|Modified: \\|Deleted: \\)"', ...]
How do I avoid this?
NB: I'm aware that I can pipe results between subprocesses and I can do the two greps that way. I need help getting the first one working first though :/
NB2: I also tried using filterdiff --clean instead and couldn't get it to work. Searching for Added, Modified or Deleted lines, removing those with 'svn:' in them, and checking I had some results seemed to work though.
Python code:
command = ['grep', '"^\(Added: \|Modified: \|Deleted: \)"', filename]
sys.stdout.write('Executing command: %s\n' % (command))
p = subprocess.Popen(command,
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     shell=True)
data = p.stdout.read()
if len(data) == 0:
    sys.stdout.write("Diff does not contain any file modifications.\n")
    exit(0)
You need to consider what you want grep to see in its command line arguments.
The first argument needs to be the literal string "^\(Added: \|Modified: \|Deleted: \)", so that means that it shouldn't include the double quotes but should include the backslashes.
The way to express this kind of string is to use Python raw strings:
command = ['grep', r'^\(Added: \|Modified: \|Deleted: \)', filename]
A good way to check what you're actually running is to replace grep with echo, so you can at least see what you're passing to the command.
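Since the question's note mentions piping the two greps together, here is a minimal sketch of that pipeline without shell=True (filename is the diff file variable from the question's code; the second grep is the 'svn:' filter from the original command):
import subprocess

# First grep: match Added/Modified/Deleted lines in the diff file.
p1 = subprocess.Popen(
    ['grep', r'^\(Added: \|Modified: \|Deleted: \)', filename],
    stdout=subprocess.PIPE,
)
# Second grep: drop property changes (lines containing 'svn:').
p2 = subprocess.Popen(
    ['grep', '-v', 'svn:'],
    stdin=p1.stdout,
    stdout=subprocess.PIPE,
)
p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits first
data = p2.communicate()[0]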
