Use of subprocess with Linux pipe command

I want to run the following command from a Python script:
strings <FILE NAME> | grep "Version = <VERSION STRING>" > /dev/null
I need to save the command's return code and output for the script logic that follows.
Currently I use the following code:
strings_out = subprocess.Popen(('strings', file), stdout=subprocess.PIPE)
grep_output = subprocess.check_output(('grep', "Version = " + version_string), stdin=strings_out.stdout)
strings_out.wait()
I get this error:
subprocess.CalledProcessError: Command '('grep', 'Version = <VERSION STRING>')' returned non-zero exit status 1
My assumption is that check_output ran out of memory.
What is wrong with my use of subprocess?

A non-zero exit status from check_output means that the command itself reported a problem; I don't think you ran out of memory. grep in particular exits with status 1 when it finds no matching line.
When I tested this myself, I found that if I gave grep a string that exists within a file, I got proper output with your code (I'm not using Version because I don't know what input files you have, but otherwise things are just about the same). I do, however, get the same error you get if I grep for a string that doesn't exist.
Maybe you are running it on a file for which strings doesn't output any instance of "Version = " + version_string. If you are in a loop, it would only take one file without the expected string to trigger the error.
On another note, if you plan to reproduce the complete line strings <FILE NAME> | grep "Version = <VERSION STRING>" > /dev/null with subprocess, the output of grep will go to /dev/null, so you won't be able to capture it.
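To make the exit-status point concrete, here is a minimal sketch that runs a producer command piped into grep without raising on "no match". The helper name pipe_to_grep is made up for illustration; it assumes grep is on the PATH and uses subprocess.run instead of check_output so that status 1 is returned rather than raised.

```python
import subprocess


def pipe_to_grep(producer_cmd, pattern):
    """Run `producer_cmd | grep -e pattern`; return (grep exit status, matched text)."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    grep = subprocess.run(
        ["grep", "-e", pattern],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
        text=True,
        errors="replace",  # tolerate bytes that are not valid text
    )
    # Close our copy of the pipe and reap the producer once grep is done.
    producer.stdout.close()
    producer.wait()
    # grep convention: 0 = match found, 1 = no match, 2 or more = real error.
    return grep.returncode, grep.stdout
```

For the question's case the call would look like pipe_to_grep(['strings', file], "Version = " + version_string); a status of 1 then simply means the version string was not found.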

As @samsonjm has mentioned, every bash command that runs successfully exits with code 0, so the non-zero status implies that the grep command failed. Moreover, nothing points to an out-of-memory error.
I suspect that the input file to the strings command is large, so strings may take a while to produce its result. I would therefore call strings_out.wait() immediately after the first line, so that strings has finished before its output is fed to grep's stdin. This seems reasonable because subprocess runs each command in a child process that may still be running when the next line executes. Be aware, though, that the subprocess documentation warns that wait() can deadlock with stdout=subprocess.PIPE if the child produces more output than the OS pipe buffer holds.
strings_out = subprocess.Popen(('strings', file), stdout=subprocess.PIPE)
strings_out.wait()
grep_output = subprocess.check_output(('grep', "Version = " + version_string), stdin=strings_out.stdout)

That's neat, I've never thought to use subprocess stdin/stdout like that before. However, my advice would be to either go pure Python and write a method to search for the string in a file, or get a little fancier with your subprocess line.
Python might look something like:
import os

search_term = bytes("Version = " + version_string, encoding='utf-8')
i = 0
found = False
file_size = os.stat(file_name).st_size
chunk_size = len(search_term) * 10
with open(file_name, 'rb') as f:
    while i < file_size:
        f.seek(i)
        x = f.read(chunk_size)  # read a small amount of data
        if search_term in x:
            found = True
            break
        # advance by less than a full chunk so consecutive chunks overlap
        # and a search_term straddling a chunk boundary is not missed
        i += chunk_size - len(search_term)
For subprocess:
cmd = f'strings {file_name} | grep "Version = {version_string}"'
test = subprocess.run(cmd, shell=True, capture_output=True)
test.returncode
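If the shell=True route is taken, file names or version strings containing spaces or shell metacharacters can break the command line. A minimal sketch of one way to guard against that, using shlex.quote from the standard library (the helper name build_pipeline is hypothetical):

```python
import shlex


def build_pipeline(file_name, version_string):
    """Build a `strings ... | grep ...` command line with shell-safe quoting."""
    pattern = "Version = " + version_string
    return f"strings {shlex.quote(file_name)} | grep -e {shlex.quote(pattern)}"
```

The resulting string can then be passed to subprocess.run(..., shell=True, capture_output=True) as in the answer above.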

Related

issue parsing shell program through python

My function run_deinterleave() is meant to copy code from the file deinterleave.sh then replace the placeholder (sra_data) with a file name which has been input by the user and then run it on the command line.
def run_deinterleave():
    codes = open('Project/CODE/deinterleave.sh')
    codex = codes.read()
    print(inp_address)
    codex = codex.replace('sra_data', inp_address)
    #is opening this twice creating another pipeline?
    stream = os.popen(codex)
    codes.close()
    self.txtarea.insert(END,codex)
    #stuff
However, I keep getting this error:
/bin/sh: 5: Syntax error: "(" unexpected
The code in deinterleave.sh works fine and produces two individual files given an interleaved paired end sra_file (an output file from genetic sequencing machines, I think :P)
#1deinterleave paired end fastq file
paste - - - - - - - - < sra_data \
| tee >(cut -f 1-4 | tr "\t" "\n" > /home/lols/Project/reads-1.fq) \
| cut -f 5-8 | tr "\t" "\n" > /home/lols/Project/reads-2.fq

As the error message shows, the code was interpreted by /bin/sh; if you executed /bin/sh Project/CODE/deinterleave.sh, you'd get the same error, because the process substitution >(…) is a Bash extension not understood by /bin/sh.
Besides, since you don't communicate with the shell code, you don't need pipes at all. So instead of os.popen I'd use subprocess.run, which lets you specify Bash as the shell.
subprocess.run(codex, shell=True, executable="bash")
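As a rough illustration of the same idea, the sketch below wraps that call in a small helper that first checks that Bash is available. The name run_in_bash is made up for this example, and capturing the output as text is an assumption about what the caller wants.

```python
import shutil
import subprocess


def run_in_bash(script):
    """Run a shell snippet with Bash instead of the default /bin/sh."""
    bash = shutil.which("bash")
    if bash is None:
        raise FileNotFoundError("bash is not installed or not on PATH")
    return subprocess.run(
        script,
        shell=True,
        executable=bash,  # shell=True would otherwise use /bin/sh
        capture_output=True,
        text=True,
    )
```

Bash-only syntax such as arrays or process substitution is then accepted, whereas a strict POSIX /bin/sh such as dash reports a syntax error at the opening parenthesis.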

The absolutely best fix is probably to replace the shell script with native Python code; but without a specification and/or sample input, I don't think we can tell you exactly how to do that.
An immediate and trivial fix is to change deinterleave so that it accepts an input file parameter.
#!/usr/bin/env bash
paste - - - - - - - - < "${1-sra_data}" |
tee >(cut -f 1-4 | tr "\t" "\n" > "${2-/home/lols/Project/reads-1.fq}") |
cut -f 5-8 | tr "\t" "\n" > "${3-/home/lols/Project/reads-2.fq}"
This refactoring also allows you to specify the names of the output files as the second and third command-line arguments.
Also, a Bash script really should not have a .sh extension, so probably take that out.
Explicitly naming Bash in the shebang line should solve the error message you got when running Bash code in sh; perhaps see also Difference between sh and bash
With that, your Python code can be reduced to something like
subprocess.run(
    ['Project/CODE/deinterleave', inp_address],
    # probably a good idea
    check=True)
though I don't exactly understand the rest of the surrounding function, so it's not clear how exactly to rewrite it.
I think the shell script could be reimplemented something like
with open(inp_address, 'r') as sra_data, open(
        '/home/lols/Project/reads-1.fq', 'w') as first, open(
        '/home/lols/Project/reads-2.fq', 'w') as second:
    while True:
        # one interleaved pair is 8 lines: 4 for each read
        record = [sra_data.readline() for _ in range(8)]
        if not record[0]:
            break  # end of file
        first.writelines(record[:4])
        second.writelines(record[4:])

I am trying to print the last line of every file in a directory using shell command from python script

I am storing the number of files in a directory in a variable and storing their names in an array. I'm unable to store file names in the array.
Here is the piece of code I have written.
import os
temp = os.system('ls -l /home/demo/ | wc -l')
no_of_files = temp - 1
command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"
file_list=[os.system(command)]
for i in range(len(file_list)):
    os.system('tail -1 file_list[i]')

Your shell scripting is orders of magnitude too complex.
output = subprocess.check_output('tail -qn1 *', shell=True)
or if you really prefer,
os.system('tail -qn1 *')
which however does not capture the output in a Python variable.
If you have a recent-enough Python, you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:
output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:
for filename in os.listdir('.'):
    os.system("tail -n 1 '%s'" % filename)
This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).
for filename in os.listdir('.'):
    print(subprocess.check_output(['tail', '-n1', filename]))
Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.
for filename in os.listdir('.'):
    with open(filename, 'r') as handle:
        for line in handle:
            pass
        # print the last one only
        print(line.rstrip('\r\n'))
If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
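When the line length is not known in advance, one option is to read backwards from the end in fixed-size blocks until a complete final line has been seen. The following is only a sketch of that idea; the function name last_line and the default block_size are arbitrary choices, and the file is read as bytes so no encoding is assumed.

```python
import os


def last_line(path, block_size=4096):
    """Return the last line of a file (as bytes) without reading the whole file."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        tail = b""
        while position > 0:
            step = min(block_size, position)
            position -= step
            handle.seek(position)
            tail = handle.read(step) + tail
            # Ignore the newline that terminates the file, then look for
            # the newline that precedes the final line.
            body = tail[:-1] if tail.endswith(b"\n") else tail
            if b"\n" in body:
                break
    body = tail[:-1] if tail.endswith(b"\n") else tail
    return body.rsplit(b"\n", 1)[-1].rstrip(b"\r")
```

Calling last_line(filename).decode() inside the os.listdir loop would then print each file's last line while only touching the end of large files.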

os.system returns the exitcode of the command and not the output. Try using subprocess.check_output with shell=True
Example:
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").split("\n")
Edit (as suggested by @tripleee): you probably don't want to do this, as it will get messy. Python has great functions for things like this. For example:
>>> import glob
>>> names = glob.glob("/home/demo/*")
will directly give you a list of the files and folders inside that folder. Once you have this, you can just use len(names) to get the count that your first command was meant to compute.
Another option is:
>>> import os
>>> os.listdir("/home/demo")
Here, glob will give you the whole filepath /home/demo/file.txt and os.listdir will just give you the filename file.txt
Also, the output of ls -l /home/demo/ | wc -l is not the number of files, because ls -l prints a "total X" summary line at the top, which wc -l counts too.
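Both glob.glob and os.listdir return directories as well as files. If only regular files are wanted, a small helper along these lines may be useful; this is a sketch, and the name list_regular_files is invented for the example.

```python
import os


def list_regular_files(directory):
    """Return the sorted names of regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())
```

With this, len(list_regular_files("/home/demo")) gives the file count and the list itself gives the names, with no shell involved.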

You could likely use a loop without much issue:
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    with open(f, 'rb') as fh:
        last = fh.readlines()[-1].decode()
        print('file: {0}\n{1}\n'.format(f, last))
Output:
file: file.txt
Hello, World!
...
If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:
for f in files:
    print('file: {0}'.format(f))
    subprocess.check_call(['tail', '-n', '1', f])
    print('\n')
The decode is optional. For text files, "utf-8" usually works; if the content is a mix of binary and text, something such as "iso-8859-1" should usually work.

You are not able to store the file names because os.system does not return the command's output as you expect it to.
From the docs:
On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
os.system simply executes the shell command as given. To capture the output of a shell command, you have to use Python's subprocess module.
Note: In your case you can get the file names using either the glob module or os.listdir(); see How to list all files of a directory

Running bash command on server

I am trying to run the bash command pdfcrack in Python on a remote server. This is my code:
bashCommand = "pdfcrack -f pdf123.pdf > myoutput.txt"
import subprocess
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
I, however, get the following error message:
Non-option argument myoutput2.txt
Error: file > not found
Can anybody see my mistake?

The first argument to Popen is a list containing the command name and its arguments. > is not an argument to the command, though; it is shell syntax. You could simply pass the entire line to Popen and instruct it to use the shell to execute it:
process = subprocess.Popen(bashCommand, shell=True)
(Note that since you are redirecting the output of the command to a file, though, there is no reason to set its standard output to a pipe, because there will be nothing to read.)
A better solution, though, is to let Python handle the redirection.
process = subprocess.Popen(['pdfcrack', '-f', 'pdf123.pdf'], stdout=subprocess.PIPE)
with open('myoutput.txt', 'wb') as fh:  # binary mode, because the pipe yields bytes
    for line in process.stdout:
        fh.write(line)
        # Do whatever else you want with line
Also, don't use str.split as a replacement for the shell's word splitting. A valid command line like pdfcrack -f "foo bar.pdf" would be split into the incorrect list ['pdfcrack', '-f', '"foo', 'bar.pdf"'], rather than the correct list ['pdfcrack', '-f', 'foo bar.pdf'].
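The difference is easy to see by comparing str.split with shlex.split from the standard library, which follows shell-like quoting rules. The snippet below is only an illustration; note that shlex.split still does not turn > into a redirection, it merely produces it as an ordinary token.

```python
import shlex

command_line = 'pdfcrack -f "foo bar.pdf"'

# Plain whitespace splitting breaks the quoted file name apart.
naive = command_line.split()

# shlex.split honours the quotes, as a POSIX shell would.
shell_like = shlex.split(command_line)

# Redirection operators are not interpreted; ">" is just another argument.
with_redirect = shlex.split("pdfcrack -f pdf123.pdf > myoutput.txt")
```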

> is interpreted by shell, but not valid otherwise.
So, that would work (don't split, use as-is):
process = subprocess.Popen(bashCommand, shell=True)
(and stdout=subprocess.PIPE isn't useful since all output is redirected to the output file)
But it would be better to let native Python handle the redirection to the output file and to pass the arguments as a list (which takes care of quoting where needed):
with open("myoutput.txt", "wb") as f:
    process = subprocess.Popen(["pdfcrack", "-f", "pdf123.pdf"], stdout=subprocess.PIPE)
    f.write(process.stdout.read())
    process.wait()

Your mistake is the > in the command.
It isn't treated as a redirection to a file because redirection is normally done by the shell, and here you run the command without one.
Try shell=True if you want to use the shell. Then you also don't have to split the command into a list.
subprocess.Popen("pdfcrack -f pdf123.pdf > myoutput.txt", shell=True)

Python subprocess library: Running grep command from Python

I am trying to run the grep command from my Python module using the subprocess library. Since I am doing this operation on a doc file, I am using the third-party catdoc tool to get the content as a plain text file. I want to store the content in a file. I don't know where I am going wrong, but the program fails to generate the plain text file and consequently to get the grep result. I have gone through the error log but it's empty. Thanks for all the help.
def search_file(name, keyword):
    #Extract and save the text from doc file
    catdoc_cmd = ['catdoc', '-w' , name, '>', 'testing.txt']
    catdoc_process = subprocess.Popen(catdoc_cmd, stdout=subprocess.PIPE,stderr=subprocess.PIPE, shell=True)
    output = catdoc_process.communicate()[0]
    grep_cmd = []
    #Search the keyword through the text file
    grep_cmd.extend(['grep', '%s' %keyword , 'testing.txt'])
    print grep_cmd
    p = subprocess.Popen(grep_cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE, shell=True)
    stdoutdata = p.communicate()[0]
    print stdoutdata

On UNIX, specifying shell=True will cause the first argument to be treated as the command to execute, with all subsequent arguments treated as arguments to the shell itself. Thus, the > won't have any effect (since with /bin/sh -c, all arguments after the command are ignored).
Therefore, you should actually use
catdoc_cmd = ['catdoc -w "%s" > testing.txt' % name]
A better solution, though, would probably be to just read the text out of the subprocess' stdout, and process it using re or Python string operations:
catdoc_cmd = ['catdoc', '-w' , name]
catdoc_process = subprocess.Popen(catdoc_cmd, stdout=subprocess.PIPE,stderr=subprocess.PIPE)
for line in catdoc_process.stdout:
    if keyword in line:
        print line.strip()

I think you're trying to pass the > to the shell, but that's not going to work the way you've done it. If you want to spawn a process, you should arrange for its standard out to be redirected. Fortunately, that's really easy to do; all you have to do is open the file you want the output to go to for writing and pass it to popen using the stdout keyword argument, instead of PIPE, which causes it to be attached to a pipe which you can read with communicate().
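A minimal sketch of that approach is shown here. The helper name run_to_file is hypothetical; it passes an open file object as stdout so the child process writes straight into it, with no shell and no > involved.

```python
import subprocess


def run_to_file(cmd, out_path):
    """Run `cmd` (an argument list, no shell) with stdout written to `out_path`."""
    with open(out_path, "wb") as out_file:
        completed = subprocess.run(cmd, stdout=out_file, stderr=subprocess.PIPE)
    return completed.returncode
```

For the question's case this would be called as run_to_file(['catdoc', '-w', name], 'testing.txt'), after which grep (or plain Python) can search testing.txt.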

Avoid subprocess.Popen auto escaping my backslashes in grep

I'm trying to write an svn pre-commit hook in python. Part of this involves checking the diff file to see if there are any actual file changes (as opposed to just property changes).
I have a working grep command which I can execute fine on the shell
grep "^\(Added: \|Modified: \|Deleted: \)" diff filename | grep -v 'svn:'
However, when I put it through subprocess.Popen it escapes all my backslashes, which knackers the regexp.
Executing command: ['grep', '"^\\Added: \\|Modified: \\|Deleted: \\)", ...]
How do I avoid this?
NB: I'm aware that I can pipe results between subprocesses and I can do the two greps that way. I need help getting the first one working first though :/
NB2: I also tried using filterdiff --clean instead and couldn't get it to work. Searching for Added, Modified or Deleted lines, removing those with 'svn:' in and checking I had some results seemed to work though.
Python code:
command = ['grep', '"^\(Added: \|Modified: \|Deleted: \)"', filename]
sys.stdout.write('Executing command: %s\n' % (command))
p = subprocess.Popen(command,
                     stdin = subprocess.PIPE,
                     stdout = subprocess.PIPE,
                     stderr = subprocess.STDOUT,
                     shell = True)
data = p.stdout.read()
if len(data) == 0:
    sys.stdout.write("Diff does not contain any file modifications.\n")
    exit(0)

You need to consider what you want grep to see in its command line arguments.
The first argument needs to be the literal string "^\(Added: \|Modified: \|Deleted: \)", so that means that it shouldn't include the double quotes but should include the backslashes.
The way to express this kind of string is to use Python raw strings:
command = ['grep', r'^\(Added: \|Modified: \|Deleted: \)', filename]
A good way to check what you're actually running is to replace grep by echo so you can at least see what you're passing to the command.
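The same check can be done from Python. The sketch below is an illustration only: it uses a tiny Python child process instead of echo so that it does not depend on any particular echo implementation, and the helper name argument_as_seen_by_child is made up. It also shows that the doubled backslashes in the printed list are just how Python displays a string, not what the child receives.

```python
import subprocess
import sys

PATTERN = r'^\(Added: \|Modified: \|Deleted: \)'


def argument_as_seen_by_child(arg):
    """Return the first command-line argument exactly as a child process receives it."""
    child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", arg]
    return subprocess.run(child, capture_output=True, text=True, check=True).stdout


# repr() doubles each backslash for display; the string itself holds single ones.
displayed = repr(PATTERN)
```

So a log line such as "Executing command: [...]" shows the repr of each argument, which is why the backslashes appear escaped even though grep is given them once each.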
