Calling Python subprocess with variable in for loop - python

I am attempting to call a bash script via the subprocess Popen function inside a for loop. My intent is that with each iteration, a new string commit from an array out is passed as an argument to the Popen command. The command invokes a bash script that outputs the text identified by the variable commit and greps certain lines from that particular text. However, I can't get the output to flush out in the Python for loop. Right now, only the grepped data from the final commit in out is being passed into my final data structure (a pandas dataframe).
accuracy_dictionary = {}
for commit in out:
    accuracy_dictionary.setdefault(commit, {})
    p2 = subprocess.Popen(['~/Desktop/find_accuracies.sh', commit], encoding='utf-8', shell=True, stdout=subprocess.PIPE)
    outputstring = p2.stdout.read()
    # This part below is less critical to the problem at hand
    # I'm putting the data from each file in a dictionary
    for acc_type_line in outputstring.split('\n'):
        accuracy = acc_type_line.split(': ')
        if accuracy != ['']:
            acc_type = accuracy[0]
            value = accuracy[1]
            accuracy_dictionary[commit][acc_type] = float(value)
acc_data = pd.DataFrame.from_dict(accuracy_dictionary).T
Here is the bash script that is being called:
"find_accuracies.sh":
#!/bin/sh
COMMIT=$1
git show $COMMIT:blahblahfolder/blahblah.txt | grep --line-buffered 'accuracy'
acc_data comes back as a dataframe with nrows=len(out), indexed by the unique commits, but the value is exactly the same across all rows for each acc_type.
How can I call the file "find_accuracies.sh" with the subprocess command and have it flush the unique values of each file for each commit?

I hope this helps address the immediate problem you're seeing: here you should really use communicate with subprocess.PIPE, as it waits for the command to finish and gives you all of its output:
outputstring = p2.communicate()[0]
You can also use a convenience method like check_output to the same effect:
outputstring = subprocess.check_output(['~/Desktop/find_accuracies.sh', commit],
                                       encoding='utf-8', shell=True)
Or, in Python 3.5+, run should also do:
p2 = subprocess.run(['~/Desktop/find_accuracies.sh', commit],
                    encoding='utf-8', shell=True, stdout=subprocess.PIPE)
outputstring = p2.stdout
Now a few more comments, hints and suggestions:
I am a little surprised it works for you at all: using shell=True with a list of arguments should (see the paragraph starting with "On POSIX with shell=True" in the subprocess docs) make commit an argument of the underlying sh wrapped around your script call, and not of the script itself. In any case, you can (and I would suggest you do) drop the shell and leave HOME resolution to Python:
from pathlib import Path
executable = Path.home().joinpath('Desktop/find_accuracies.sh')
p2 = subprocess.run([executable, commit],
                    encoding='utf-8', stdout=subprocess.PIPE)
outputstring = p2.stdout
You can (or, for Python < 3.5, must) use os.path.expanduser('~/Desktop/find_accuracies.sh') instead of Path.home() to get the script path. On the other hand, on 3.7+ you could replace stdout=subprocess.PIPE with capture_output=True.
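For instance, a sketch of the same call on 3.7+ combining both of those suggestions:

import os
import subprocess

script = os.path.expanduser('~/Desktop/find_accuracies.sh')
p2 = subprocess.run([script, commit], encoding='utf-8', capture_output=True)
outputstring = p2.stdout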
And last but not least: it seems a bit unnecessary to call a bash script (especially one double-wrapped in an sh call, as in the original example) just to run git through grep when we already have a Python script to process the information. I would actually try to run the corresponding git command directly, get the bulk of its output, and process it in the Python script itself to extract the bits of interest.
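A minimal sketch of that direct approach (assuming, as in the original script, that the file lives at blahblahfolder/blahblah.txt and the interesting lines look like 'acc_type: value'):

import subprocess

def accuracies_for(commit):
    # Run git directly instead of wrapping it in a shell script
    show = subprocess.run(
        ['git', 'show', commit + ':blahblahfolder/blahblah.txt'],
        encoding='utf-8', stdout=subprocess.PIPE, check=True)
    result = {}
    for line in show.stdout.splitlines():
        if 'accuracy' in line:  # the grep part, done in Python
            acc_type, _, value = line.partition(': ')
            result[acc_type] = float(value)
    return result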

Related

Iterate through linux sort output in python

I am having trouble finding a solution to utilize the Linux sort command as input to my Python script.
For example, I would like to iterate through the result of sort -mk1 <(cat file1.txt) <(cat file2.txt)
Normally I would use Popen and iterate through it using next and stdout.readline(), something like:
import os
import subprocess

class Reader():
    def __init__(self):
        self.proc = subprocess.Popen(['sort -mk1', '<(', 'cat file1.txt', ')', '<(', 'cat file2.txt', ')'], stdout=subprocess.PIPE)

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            line = self.proc.stdout.readline()
            if not line:
                raise StopIteration
            return line

p = Reader()
for line in p:
    # only print certain lines based on some filter
With the above, I would get an error: No such file or directory: 'sort -mk1'
After doing some research, I guess I can't use Popen, and have to use os.execl to utilize /bin/bash.
So now I try below:
import os
import subprocess

class Reader():
    def __init__(self):
        self.proc = os.execl('/bin/bash', '/bin/bash', '-c', 'set -o pipefail; sort -mk1 <(cat file1.txt) <(cat file2.txt)')

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            line = self.proc.stdout.readline()
            if not line:
                raise StopIteration
            return line

p = Reader()
for line in p:
    # only print certain lines based on some filter
The problem with this is that it actually prints all the lines right away. I guess one solution is to pipe the results to a file and then iterate through that file in Python, but I don't really want to save it to a file and then filter it; that seems unnecessary. Yes, I can use other Linux commands such as awk, but I would like to use Python for further processing.
So my questions are:
Is there a way to make solution one with Popen to work?
How can I iterate through the output of sort using the second solution?
If you want to use shell features, you have to use shell=True. If you want to use Bash features, you have to make sure the shell you run is Bash.
self.proc = subprocess.Popen(
    'sort -mk1 <(cat file1.txt) <(cat file2.txt)',
    stdout=subprocess.PIPE,
    shell=True,
    executable='/bin/bash')
Notice how with shell=True the first argument to Popen and friends is a single string (and vice versa; if you don't have shell=True you have to parse the command line into tokens yourself).
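For the tokenizing direction, the standard library's shlex.split does that parsing for you:

import shlex

# Split a command line into Popen-ready tokens, respecting quoting
print(shlex.split('sort -mk1 "my file.txt"'))
# ['sort', '-mk1', 'my file.txt']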
Of course, the cats here are useless; but if you replace them with something which the shell performs easily and elegantly, and which you cannot easily replicate in native Python code, this is probably the way to go.
In brief, <(command) is a Bash process substitution; the shell will run command in a subprocess, and replace the argument with the device name of the open file handle where the process generates its output. So sort will see something like
sort -mk1 /dev/fd/63 /dev/fd/64
where /dev/fd/63 is a pipe where the first command's output is available, and /dev/fd/64 is the read end of the other command's standard output.
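Put together with the Reader from the question, a minimal sketch (note that a pipe is already iterable line by line, so the custom __iter__/__next__ isn't strictly needed):

import subprocess

proc = subprocess.Popen(
    'sort -mk1 <(cat file1.txt) <(cat file2.txt)',
    stdout=subprocess.PIPE,
    shell=True,
    executable='/bin/bash')

for line in proc.stdout:  # a pipe can be iterated line by line directly
    # only print certain lines based on some filter
    print(line.decode().rstrip())
proc.wait()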
Quite a lot of problems in your scripts.
First, your Popen won't work for several reasons:
The first argument is supposed to be the command to run, and you passed sort -mk1, and there is no such file. You should simply pass sort, and pass -mk1 as an argument.
Process substitution <( command ) is handled by the shell: the shell runs the command, creates a FIFO, and substitutes the name of the FIFO into the argument list. Passing these tokens directly to sort is not going to work; sort will probably just treat <( as a filename.
Your second approach using os.exec* won't work either, because os.exec* replaces your current process. Hence your Python script never continues to the next statement.
In your case, there seems to be no reason to use process substitution. Why can't you simply do something like subprocess.Popen(['sort', '-mk1', 'filename1', 'filename2'])?
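A sketch of that simpler call, with the filtering done in Python (the 'keyword' filter below is a hypothetical placeholder):

import subprocess

proc = subprocess.Popen(['sort', '-mk1', 'file1.txt', 'file2.txt'],
                        stdout=subprocess.PIPE, encoding='utf-8')
for line in proc.stdout:
    if 'keyword' in line:  # hypothetical filter; replace with your own
        print(line, end='')
proc.wait()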
I do not understand why you are doing sort -mk1 $(cat file); sort can operate on files directly. Look at check_output; that will make your life simpler:
import subprocess

output = subprocess.check_output('ls', encoding='utf-8')
for line in output.splitlines():
    print(line)
You will, of course, have to deal with the exceptions; the documentation has the details.
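For instance, a small sketch of that handling (check_output raises subprocess.CalledProcessError on a nonzero exit status):

import subprocess

try:
    output = subprocess.check_output(['sort', '-mk1', 'file1.txt', 'file2.txt'],
                                     encoding='utf-8')
except subprocess.CalledProcessError as e:
    print('sort failed with exit status', e.returncode)
else:
    for line in output.splitlines():
        print(line)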

How to manipulate input in bash program with python [duplicate]

I'm trying to write a Python script that starts a subprocess, and writes to the subprocess stdin. I'd also like to be able to determine an action to be taken if the subprocess crashes.
The process I'm trying to start is a program called nuke, which has its own built-in version of Python that I'd like to submit commands to, and then tell it to quit after the commands execute. So far I've worked out that if I start Python on the command prompt and then start nuke as a subprocess, I can type in commands to nuke. But I'd like to be able to put this all in a script, so that the master Python program can start nuke and then write to its standard input (and thus into its built-in version of Python) and tell it to do snazzy things. So I wrote a script that starts nuke like this:
subprocess.call(["C:/Program Files/Nuke6.3v5/Nuke6.3", "-t", "E:/NukeTest/test.nk"])
Then nothing happens because nuke is waiting for user input. How would I now write to standard input?
I'm doing this because I'm running a plugin with nuke that causes it to crash intermittently when rendering multiple frames. So I'd like this script to be able to start nuke, tell it to do something and then if it crashes, try again. So if there is a way to catch a crash and still be OK then that'd be great.
It might be better to use communicate:
from subprocess import Popen, PIPE, STDOUT
p = Popen(['myapp'], stdout=PIPE, stdin=PIPE, stderr=PIPE)
stdout_data = p.communicate(input='data_to_write')[0]
"Better", because of this warning:
Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
To clarify some points:
As jro has mentioned, the right way is to use subprocess.communicate.
Yet, when feeding the stdin using subprocess.communicate with input, you need to initiate the subprocess with stdin=subprocess.PIPE according to the docs.
Note that if you want to send data to the process’s stdin, you need to create the Popen object with stdin=PIPE. Similarly, to get anything other than None in the result tuple, you need to give stdout=PIPE and/or stderr=PIPE too.
Also, qed has mentioned in the comments that for Python 3.4 you need to encode the string, meaning you need to pass bytes to input rather than a string. This is not entirely true: according to the docs, if the streams were opened in text mode, the input should be a string (the source is the same page).
If streams were opened in text mode, input must be a string. Otherwise, it must be bytes.
So, if the streams were not opened explicitly in text mode, then something like below should work:
import subprocess
command = ['myapp', '--arg1', 'value_for_arg1']
p = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output = p.communicate(input='some data'.encode())[0]
I've left the stderr value above deliberately as STDOUT as an example.
That being said, sometimes you might want the output of another process rather than building it up from scratch. Let's say you want to run the equivalent of echo -n 'CATCH\nme' | grep -i catch | wc -m. This should normally return the number of characters in 'CATCH' plus a newline character, which results in 6. The point of the echo here is to feed the CATCH\nme data to grep. So we can feed the data to grep via stdin in the Python subprocess chain as a variable, and then pass its stdout as input to the wc process' stdin (in the meantime, getting rid of the extra newline character):
import subprocess
what_to_catch = 'catch'
what_to_feed = 'CATCH\nme'
# We create the first subprocess, note that we need stdin=PIPE and stdout=PIPE
p1 = subprocess.Popen(['grep', '-i', what_to_catch], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# We immediately run the first subprocess and get the result
# Note that we encode the data, otherwise we'd get a TypeError
p1_out = p1.communicate(input=what_to_feed.encode())[0]
# Well the result includes an '\n' at the end,
# if we want to get rid of it in a VERY hacky way
p1_out = p1_out.decode().strip().encode()
# We create the second subprocess, note that we need stdin=PIPE
p2 = subprocess.Popen(['wc', '-m'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# We run the second subprocess feeding it with the first subprocess' output.
# We decode the output to convert to a string
# We still have a '\n', so we strip that out
output = p2.communicate(input=p1_out)[0].decode().strip()
This is somewhat different than the response here, where you pipe two processes directly without adding data directly in Python.
Hope that helps someone out.
Since Python 3.5, there is the subprocess.run() function, which provides a convenient way to initialize and interact with Popen() objects. run() takes an optional input argument, through which you can pass things to stdin (like you would using Popen.communicate(), but all in one go).
Adapting jro's example to use run() would look like:
import subprocess
p = subprocess.run(['myapp'], input='data_to_write', capture_output=True, text=True)
After execution, p will be a CompletedProcess object. By setting capture_output to True (available in Python 3.7+), we make available a p.stdout attribute which gives us access to the output, if we care about it. text=True tells it to work with regular strings rather than bytes. If you want, you can also add the argument check=True to make it throw an error if the exit status (accessible regardless via p.returncode) isn't 0.
This is the "modern"/quick and easy way to do this.
One can write data to the subprocess object on-the-fly, instead of collecting all the input in a string beforehand to pass through the communicate() method.
This example sends a list of animal names to the Unix utility sort, and sends the output to standard output.
import sys, subprocess

p = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sys.stdout)
for v in ('dog', 'cat', 'mouse', 'cow', 'mule', 'chicken', 'bear', 'robin'):
    p.stdin.write(v.encode() + b'\n')
p.communicate()
Note that writing to the process is done via p.stdin.write(v.encode() + b'\n'). I tried using print(v.encode(), file=p.stdin), but that failed with the message TypeError: a bytes-like object is required, not 'str'. I haven't figured out how to get print() to work with this.
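For what it's worth, a sketch (not part of the original answer) of one way to get print() to work: open the pipe in text mode so it accepts str objects. This assumes Python 3.7+ for the text keyword:

import sys, subprocess

# text=True makes p.stdin a str-accepting stream, so print() can write to it
p = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sys.stdout, text=True)
for v in ('dog', 'cat', 'mouse'):
    print(v, file=p.stdin)
p.communicate()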
You can provide a file-like object to the stdin argument of subprocess.call().
The documentation for the Popen object applies here.
To capture the output, you should instead use subprocess.check_output(), which takes similar arguments. From the documentation:
>>> subprocess.check_output(
... "ls non_existent_file; exit 0",
... stderr=subprocess.STDOUT,
... shell=True)
'ls: non_existent_file: No such file or directory\n'

Start a subprocess, wait for it to complete and then retrieve data in Python

I'm struggling to get a Python script to start a subprocess, wait until it completes, and then retrieve the required data. I'm quite new to Python.
The command I wish to run as a subprocess is
./bin.testing/Eva -t --suite="temp0"
Running that command by hand in the Linux terminal produces:
in terminal mode
Evaluation error = 16.7934
I want to run the command as a python sub-process, and receive the output back. However, everything I try seems to skip the second line (ultimately, it's the second line that I want.) At the moment, I have this:
def job(self, fen_file):
    from subprocess import Popen, PIPE
    from sys import exit
    try:
        eva = Popen('{0}/Eva -t --suite"{0}"'.format(self.exedir, fen_file), shell=True, stdout=PIPE, stderr=PIPE)
        stdout, stderr = eva.communicate()
    except:
        print('Error running test suite ' + fen_file)
        exit("Stopping")
    print(stdout)
    .
    .
    .
    return 0
All this seems to produce is
in terminal mode
0
with the important line missing. The print statement is just so I can see what I am getting back from the sub-process; the intention is that it will be replaced with code that processes the number from the second line and returns the output (here I'm returning 0 just so I can get this particular bit to work first; the caller of this function prints the result, which is why there is a zero at the end of the output). exedir is just the directory of the executable for the sub-process, and fen_file is just an ASCII file that the sub-process needs. I have tried removing the 'in terminal mode' line from the source code of the sub-process and recompiling it, but that doesn't work -- it still doesn't return the important second line.
Thanks in advance; I expect what I am doing wrong is really very simple.
Edit: I ought to add that the subprocess Eva can take a second or two to complete.
Since the 2nd line is an error message, it's probably stored in your stderr variable!
To know for sure you can print your stderr in your code, or you can run the program on the command line and see if the output is split into stdout and stderr. One easy way is to do ./bin.testing/Eva -t --suite="temp0" > /dev/null. Any messages you get are stderr since stdout is redirected to /dev/null.
Also, typically with Popen the shell=True option is discouraged unless really needed. Instead pass a list:
[os.path.join(self.exedir, 'Eva'), '-t', '--suite=' + fen_file], shell=False, ...
This can avoid problems down the line if one of your arguments would normally be interpreted by the shell. (Note, I removed the ""'s, because the shell would normally eat those for you!)
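A minimal sketch of that suggestion with stderr captured separately, so the missing line becomes visible (exedir and fen_file stand in for the asker's self.exedir and suite file):

import os
from subprocess import Popen, PIPE

exedir = './bin.testing'  # stand-in for self.exedir
fen_file = 'temp0'        # stand-in for the suite argument

eva = Popen([os.path.join(exedir, 'Eva'), '-t', '--suite=' + fen_file],
            stdout=PIPE, stderr=PIPE, encoding='utf-8')
stdout, stderr = eva.communicate()
print('stdout:', stdout)
print('stderr:', stderr)  # the "Evaluation error = ..." line likely lands here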
Try using subprocess.check_output:
output_lines = subprocess.check_output(['./bin.testing/Eva', '-t', '--suite=temp0'])
for line in output_lines.splitlines():
    print(line)

How to run some other program interactively from a python script

I am new to Python. I would like to run an "EDA tool" from Python interactively.
Here are the steps I wanted to follow:
Start the tool
Run the first command in the tool
Check the first command's output or parse it (in the main Python script)
Run the second command
Parse the output in the Python script
[...]
x. Exit the tool
x+1. Do some post-processing in the main Python script
I am looking for some information or pointers related to it so that I can read on my own.
This depends on what you mean by a "command". Is each command a separate process (in the operating-system sense of the word)? If so, it sounds like you need the subprocess module.
import subprocess
execNamePlusArgs = [ 'ls', '-l' ] # unix-like (i.e. non-Windows) example
sp = subprocess.Popen( execNamePlusArgs, stdout=subprocess.PIPE, stderr=subprocess.PIPE )
stdout, stderr = sp.communicate() # this blocks until the process terminates
print( stdout )
If you don't want it to block until termination (e.g. if you want to feed the subprocess input line by line and examine its output line by line), then you would define stdin=subprocess.PIPE as well and then, instead of communicate, use calls to sp.stdin.write(whatever), sp.stdout.readline() and sp.stderr.readline().
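A minimal sketch of that line-by-line exchange, with cat standing in as a hypothetical interactive tool (it echoes each input line back as one output line):

import subprocess

sp = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                      stdout=subprocess.PIPE, text=True, bufsize=1)

for command in ('first command', 'second command'):
    sp.stdin.write(command + '\n')
    sp.stdin.flush()              # make sure the child sees the line now
    reply = sp.stdout.readline()  # read the tool's one-line response
    print('tool said:', reply.rstrip())

sp.stdin.close()
sp.wait()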
You should look into using something like python-fabric.
It allows you to use higher-level language constructs such as context managers, and makes the shell more usable from Python in general.
Example usage:
from fabric.operations import local
from fabric.context_managers import lcd
with lcd(".."): # Prefix all commands with 'cd.. &&'
ls = local('ls',capture=True) # Run 'ls' command and put result into variable
print ls
>>>
[localhost] local: ls
Eigene Bilder
Eigene Musik
Eigene Videos
SynKernelDiag2015-11-07_10-01-13.log
desktop.ini
foo
scripts

Sending multiple commands to a bash shell which must share an environment

I am attempting to follow this answer here: https://stackoverflow.com/a/5087695/343381
I have a need to execute multiple bash commands within a single environment. My test case is simple:
import subprocess
cmd = subprocess.Popen(['bash'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# Write the first command
command = "export greeting=hello\n"
cmd.stdin.write(command)
cmd.stdin.flush() # Must include this to ensure data is passed to child process
result = cmd.stdout.read()
print result
# Write the second command
command = "echo $greeting world\n"
cmd.stdin.write(command)
cmd.stdin.flush() # Must include this to ensure data is passed to child process
result = cmd.stdout.read()
print result
What I expected to happen (based on the referenced answer) is that I see "hello world" printed. What actually happens is that it hangs on the first cmd.stdout.read(), and never returns.
Can anyone explain why cmd.stdout.read() never returns?
Notes:
It is absolutely essential that I run multiple bash commands from python within the same environment. Thus, subprocess.communicate() does not help because it waits for the process to terminate.
Note that in my real test case, it is not a static list of bash commands to execute. The logic is more dynamic. I don't have the option of running all of them at once.
You have two problems here:
Your first command does not produce any output, so the first read blocks waiting for some.
You are using read() instead of readline(); read() with no argument blocks until EOF, i.e. until the pipe is closed, which never happens here since bash keeps running.
The following modified code (updated with Martijn's polling suggestion) works fine:
import subprocess
import select

cmd = subprocess.Popen(['bash'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
poll = select.poll()
poll.register(cmd.stdout.fileno(), select.POLLIN)

# Write the first command
command = "export greeting=hello\n"
cmd.stdin.write(command)
cmd.stdin.flush()  # Must include this to ensure data is passed to child process
ready = poll.poll(500)
if ready:
    result = cmd.stdout.readline()
    print result

# Write the second command
command = "echo $greeting world\n"
cmd.stdin.write(command)
cmd.stdin.flush()  # Must include this to ensure data is passed to child process
ready = poll.poll(500)
if ready:
    result = cmd.stdout.readline()
    print result
The above has a 500ms timeout - adjust to your needs.
