Python subprocess: Giving stdin, reading stdout, then giving more stdin

Python subprocess: Giving stdin, reading stdout, then giving more stdin - python

I'm working with a piece of scientific software called Chimera. For some of the code downstream of this question, it requires that I use Python 2.7.
I want to call a process, give that process some input, read its output, give it more input based on that, etc.
I've used Popen to open the process, process.stdin.write to pass standard input, but then I've gotten stuck trying to get output while the process is still running. process.communicate() stops the process, process.stdout.readline() seems to keep me in an infinite loop.
Here's a simplified example of what I'd like to do:
Let's say I have a bash script called exampleInput.sh.
#!/bin/bash
# exampleInput.sh
# Read a number from the input
read -p 'Enter a number: ' num
# Multiply the number by 5
ans1=$( expr $num \* 5 )
# Give the user the multiplied number
echo $ans1
# Ask the user whether they want to keep going
read -p 'Based on the previous output, would you like to continue? ' doContinue
if [ $doContinue == "yes" ]
then
echo "Okay, moving on..."
# [...] more code here [...]
else
exit 0
fi
Interacting with this through the command line, I'd run the script, type in "5" and then, if it returned "25", I'd type "yes" and, if not, I would type "no".
I want to run a python script where I pass exampleInput.sh "5" and, if it gives me "25" back, then I pass "yes"
So far, this is as close as I can get:
#!/home/user/miniconda3/bin/python2
# talk_with_example_input.py
import subprocess
process = subprocess.Popen(["./exampleInput.sh"],
stdin = subprocess.PIPE,
stdout = subprocess.PIPE)
process.stdin.write("5")
answer = process.communicate()[0]
if answer == "25":
process.stdin.write("yes")
## I'd like to print the STDOUT here, but the process is already terminated
But that fails of course, because after `process.communicate()', my process isn't running anymore.
(Just in case/FYI): Actual problem
Chimera is usually a gui-based application to examine protein structure. If you run chimera --nogui, it'll open up a prompt and take input.
I often need to know what chimera outputs before I run my next command. For example, I will often try to generate a protein surface and, if Chimera can't generate a surface, it doesn't break--it just says so through STDOUT. So, in my python script, while I'm looping through many proteins to analyze, I need to check STDOUT to know whether to continue analysis on that protein.
In other use cases, I'll run lots of commands through Chimera to clean up a protein first, and then I'll want to run lots of separate commands to get different pieces of data, and use that data to decide whether to run other commands. I could get the data, close the subprocess, and then run another process, but that would require re-running all of those cleaning up commands each time.
Anyways, those are some of the real-world reasons why I want to be able to push STDIN to a subprocess, read the STDOUT, and still be able to push more STDIN.
Thanks for your time!

you don't need to use process.communicate in your example.
Simply read and write using process.stdin.write and process.stdout.read. Also make sure to send a newline, otherwise read won't return. And when you read from stdin, you also have to handle newlines coming from echo.
Note: process.stdout.read will block until EOF.
# talk_with_example_input.py
import subprocess
process = subprocess.Popen(["./exampleInput.sh"],
stdin = subprocess.PIPE,
stdout = subprocess.PIPE)
process.stdin.write("5\n")
stdout = process.stdout.readline()
print(stdout)
if stdout == "25\n":
process.stdin.write("yes\n")
print(process.stdout.readline())
$ python2 test.py
25
Okay, moving on...
Update
When communicating with an program in that way, you have to pay special attention to what the application is actually writing. Best is to analyze the output in a hex editor:
$ chimera --nogui 2>&1 | hexdump -C
Please note that readline [1] only reads to the next newline (\n). In your case you have to call readline at least four times to get that first block of output.
If you just want to read everything up until the subprocess stops printing, you have to read byte by byte and implement a timeout. Sadly, neither read nor readline does provide such a timeout mechanism. This is probably because the underlying read syscall [2] (Linux) does not provide one either.
On Linux we can write a single-threaded read_with_timeout() using poll / select. For an example see [3].
from select import epoll, EPOLLIN
def read_with_timeout(fd, timeout__s):
"""Reads from fd until there is no new data for at least timeout__s seconds.
This only works on linux > 2.5.44.
"""
buf = []
e = epoll()
e.register(fd, EPOLLIN)
while True:
ret = e.poll(timeout__s)
if not ret or ret[0][1] is not EPOLLIN:
break
buf.append(
fd.read(1)
)
return ''.join(buf)
In case you need a reliable way to read non blocking under Windows and Linux, this answer might be helpful.
[1] from the python 2 docs:
readline(limit=-1)
Read and return one line from the stream. If limit is specified, at most limit bytes will be read.
The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.
[2] from man 2 read:
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
[3] example
$ tree
.
├── prog.py
└── prog.sh
prog.sh
#!/usr/bin/env bash
for i in $(seq 3); do
echo "${RANDOM}"
sleep 1
done
sleep 3
echo "${RANDOM}"
prog.py
# talk_with_example_input.py
import subprocess
from select import epoll, EPOLLIN
def read_with_timeout(fd, timeout__s):
"""Reads from f until there is no new data for at least timeout__s seconds.
This only works on linux > 2.5.44.
"""
buf = []
e = epoll()
e.register(fd, EPOLLIN)
while True:
ret = e.poll(timeout__s)
if not ret or ret[0][1] is not EPOLLIN:
break
buf.append(
fd.read(1)
)
return ''.join(buf)
process = subprocess.Popen(
["./prog.sh"],
stdin = subprocess.PIPE,
stdout = subprocess.PIPE
)
print(read_with_timeout(process.stdout, 1.5))
print('-----')
print(read_with_timeout(process.stdout, 3))
$ python2 prog.py
6194
14508
11293
-----
10506

Related

Knowing a python script has something to read from stdin when launched by crontab

I have a python script that is run by the crontab and can be executed manually.
This script takes as an input:
either the value from -i argument,
or whatever comes from stdin through a pipe.
The code should be something like this:
if ???: #test to check if there is some data in stdin
print("I have data from stdin!")
else:
print("I have no data from stdin!")
The script is executed as follows:
$ ./myscript.py -i myInput
> I have no data from stdin!
$ cat myInput | ./myscript.py
> I have data from stdin!
I have tried several methods that work fine when executed via the console, but don't work as expected when executed by the crontab : the script always considers there IS data from stdin.
First test:
if not sys.stdin.isatty():
print("I have data from stdin!")
else:
print("I have no data from stdin!")
I think this one does not work because because there is no tty in crontab mode so the first statement is always true.
Second test:
import stat
mode = os.fstat(sys.stdin.fileno()).st_mode
if stat.S_ISFIFO(mode):
print("I have data from stdin!")
else:
print("I have no data from stdin!")
Third test:
import select
r, w, x = select.select([sys.stdin], [], [], 0)
if r:
print("I have data from stdin!")
else:
print("I have no data from stdin!")
Is there a correct way to make it work for both console and crontab mode?

As Nullman already wrote in a comment it is better to check your command line options to decide if you want to try stdin or not.
Short summary: You cannot safely guess if you should read data from stdin by checking stdin. You should only rely on checking the command line to find out what is expected.
For example cat will use stdin only if no input file was specified as a command line argument or if the special file name - was specified.
All the tests in your examples will work in certain conditions only and will not work in other cases.
Checking if stdin is a TTY does not help. It will only tell you if it is connected to a terminal. Your script can get input from a terminal if the user types something or if it is a pseudo-terminal connected to something else. Your script can also get input from stdin if it is not connected to a terminal but to something else (pipe, file, socket,...)
Checking if stdin is a FIFO is also wrong because you can read data both from a pipe/fifo or from something else (file, socket, terminal,...).
Using select will not tell you if there is any data, but only if a read will not block. It also will not block on EOF. To distinguish these cases you would have to check the result of a read from stdin. Without a delay/timeout it might also tell you that a read would block if the data is not yet available.
There are more ways to use the script:
Instead of cat myInput | ./myscript.py you could also use ./myscript.py < myInput. In the first case stdin will be a pipe, in the second case a file.
Or imagine ./myscript.py < /dev/null. This will return EOF condition on the first read.
Or ./myscript.py <&- which will close stdin leading to an error when you try to read from it.
If stdin is connected to a terminal a read might block if the user does not enter anything. This would happen if you call ./myscript.py. You could use select to find out if data is available now, but you cannot find out if the user will enter data later. So your script does not know the intention of the user.

Unix: why I need to close the input FIFO to my program before I can read from its output FIFO

I've got a problem: I have one program running in a shell that does some calculations based on user input, and I can launch this program in an interactive way so it will keep asking for input and it outputs its calculations after user press enter. So it remains open inside the shell till user types the exit-word.
What I want to do is to create an interface in such a way that user has to type input somewhere else off the shell and using pipe, fifo and so on, input is carried away to that program, and its output goes to this interface.
In a few word: I have a long running process and I need to attach, when needed, my interface to its stdin and stdout.
For this kind of problem, I was thinking to use a FIFO file made by mkfifo command (we are in Unix, especially for Mac user) and redirect program stdin and stdout to this file:
my_program < fifofile > fifofile
But I've found some difficulties about reading and writing to this fifo file. So I decided to use 2 fifo files, one for input and one for output. So:
exec my_program < fifofile_in > fifofile_out
(don't know why I use exec for redirection, but it works... and I'm okay with exec ;) )
If I launch this command in a shell, and in another one I wrote:
echo -n "date()" > fifofile_in
The echoing process is succesful, and if I do:
cat fifofile_out
I'm able to see my_program output. Ok! But I don't want to deal with shell, instead I want to deal with a program written by me, like this python script:
import os, time;
text="";
OUT=os.open("sample_out",os.O_RDONLY | os.O_NONBLOCK)
out=os.fdopen(OUT);
while(1):
#IN=open("sample_in",'w' );
IN=os.open("sample_in",os.O_WRONLY)
#OUT=os.fdopen(os.open("sample_out",os.O_RDONLY| os.O_NONBLOCK |os.O_APPEND));
#OUT=open("sample_out","r");
print "Write your mess:";
text=raw_input();
if (text=="exit"):
break;
os.write(IN,text);
os.close(IN);
#os.fsync(IN);
time.sleep(0.05);
try:
while True:
#c=os.read(OUT,1);
c=out.readline();
print "Read: ", c#, " -- ", ord(c);
if not c:
print "End of file";
quit();
#break;
except OSError as e:
continue;
#print "OSError"
except IOError as e:
continue;
#print "IOError"
Where:
sample_in, sample_out are respectively fifo files used for redirections to stdin and stdout (so I write to stdin in order to give input to my_program and I read from stdout in order to get my_program output)
out is my os.fdopen file descriptor used for getting lines with out.readline() instead of using OUT.read(1) (char by char)
time.sleep(0.05) is for delay some time before go reading my_program output (needed for calculations, else I got nothing to read).
Whit this script and my_program running in background from the shell, I'm able to write to stdin and read from stdout correctly, but the journey to achieve this code wasn't easy: after have read all posts about fifo and reading/writing from/to fifo files, I came with this solution of closing the IN fd before reading from OUT even if the fifo files are different! From what I read around internet and in Stackoverflow articles, I thought that this procedure was for handle only one fifo file, but here I deal with two (different!). I think it is something related to how I write into sample_in: I tried to flush to look like echo -n command, but it seems useless.
So I would like to ask you if this behaviour is normal, and how can achieve the same thing with echo -n "...." > sample_in and in other shell cat sample_out? In particular, cat is outputting data continuously as soon as I echo input in sample_in, but my way of reading is with data blocks.
Thanks so much, I hope everything it's clear enough!

Start a subprocess, wait for it to complete and then retrieve data in Python

I'm struggling to get some python script to start a subprocess, wait until it completes and then retrieve the required data. I'm quite new to Python.
The command I wish to run as a subprocess is
./bin.testing/Eva -t --suite="temp0"
Running that command by hand in the Linux terminal produces:
in terminal mode
Evaluation error = 16.7934
I want to run the command as a python sub-process, and receive the output back. However, everything I try seems to skip the second line (ultimately, it's the second line that I want.) At the moment, I have this:
def job(self,fen_file):
from subprocess import Popen, PIPE
from sys import exit
try:
eva=Popen('{0}/Eva -t --suite"{0}"'.format(self.exedir,fen_file),shell=True,stdout=PIPE,stderr=PIPE)
stdout,stderr=eva.communicate()
except:
print ('Error running test suite '+fen_file)
exit("Stopping")
print(stdout)
.
.
.
return 0
All this seems to produce is
in terminal mode
0
with the important line missing. The print statement is just so I can see what I am getting back from the sub-process -- the intention is that it will be replaced with code that processes the number from the second line and returns the output (here I'm just returning 0 just so I can get this particular bit to work first. The caller of this function prints the result, which is why there is a zero at the end of the output.) exedir is just the directory of the executable for the sub-process, and fen-file is just an ascii file that the sub-process needs. I have tried removing the 'in terminal mode' from the source code of the sub-process and re compiling it, but that doesn't work -- it still doesn't return the important second line.
Thanks in advance; I expect what I am doing wrong is really very simple.
Edit: I ought to add that the subprocess Eva can take a second or two to complete.

Since the 2nd line is an error message, it's probably stored in your stderr variable!
To know for sure you can print your stderr in your code, or you can run the program on the command line and see if the output is split into stdout and stderr. One easy way is to do ./bin.testing/Eva -t --suite="temp0" > /dev/null. Any messages you get are stderr since stdout is redirected to /dev/null.
Also, typically with Popen the shell=True option is discouraged unless really needed. Instead pass a list:
[os.path.join(self.exedir, 'Eva'), '-t', '--suite=' + fen_file], shell=False, ...
This can avoid problems down the line if one of your arguments would normally be interpreted by the shell. (Note, I removed the ""'s, because the shell would normally eat those for you!)

Try using subprocess check_output.
output_lines = subprocess.check_output(['./bin.testing/Eva', '-t', '--suite="temp0"'])
for line in output_lines.splitlines():
print(line)

Controlling a python script from another script

I am trying to learn how to write a script control.py, that runs another script test.py in a loop for a certain number of times, in each run, reads its output and halts it if some predefined output is printed (e.g. the text 'stop now'), and the loop continues its iteration (once test.py has finished, either on its own, or by force). So something along the lines:
for i in range(n):
os.system('test.py someargument')
if output == 'stop now': #stop the current test.py process and continue with next iteration
#output here is supposed to contain what test.py prints
The problem with the above is that, it does not check the output of test.py as it is running, instead it waits until test.py process is finished on its own, right?
Basically trying to learn how I can use a python script to control another one, as it is running. (e.g. having access to what it prints and so on).
Finally, is it possible to run test.py in a new terminal (i.e. not in control.py's terminal) and still achieve the above goals?
An attempt:
test.py is this:
from itertools import permutations
import random as random
perms = [''.join(p) for p in permutations('stop')]
for i in range(1000000):
rand_ind = random.randrange(0,len(perms))
print perms[rand_ind]
And control.py is this: (following Marc's suggestion)
import subprocess
command = ["python", "test.py"]
n = 10
for i in range(n):
p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
while True:
output = p.stdout.readline().strip()
print output
#if output == '' and p.poll() is not None:
# break
if output == 'stop':
print 'sucess'
p.kill()
break
#Do whatever you want
#rc = p.poll() #Exit Code

You can use subprocess module or also the os.popen
os.popen(command[, mode[, bufsize]])
Open a pipe to or from command. The return value is an open file object connected to the pipe, which can be read or written depending on whether mode is 'r' (default) or 'w'.
With subprocess I would suggest
subprocess.call(['python.exe', command])
or the subprocess.Popen --> that is similar to os.popen (for instance)
With popen you can read the connected object/file and check whether "Stop now" is there.
The os.system is not deprecated and you can use as well (but you won't get a object from that), you can just check if return at the end of execution.
From subprocess.call you can run it in a new terminal or if you want to call multiple times ONLY the test.py --> than you can put your script in a def main() and run the main as much as you want till the "Stop now" is generated.
Hope this solve your query :-) otherwise comment again.
Looking at what you wrote above you can also redirect the output to a file directly from the OS call --> os.system(test.py *args >> /tmp/mickey.txt) then you can check at each round the file.
As said the popen is an object file that you can access.

What you are hinting at in your comment to Marc Cabos' answer is Threading
There are several ways Python can use the functionality of other files. If the content of test.py can be encapsulated in a function or class, then you can import the relevant parts into your program, giving you greater access to the runnings of that code.
As described in other answers you can use the stdout of a script, running it in a subprocess. This could give you separate terminal outputs as you require.
However if you want to run the test.py concurrently and access variables as they are changed then you need to consider threading.

Yes you can use Python to control another program using stdin/stdout, but when using another process output often there is a problem of buffering, in other words the other process doesn't really output anything until it's done.
There are even cases in which the output is buffered or not depending on if the program is started from a terminal or not.
If you are the author of both programs then probably is better using another interprocess channel where the flushing is explicitly controlled by the code, like sockets.

You can use the "subprocess" library for that.
import subprocess
command = ["python", "test.py", "someargument"]
for i in range(n):
p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
while True:
output = p.stdout.readline()
if output == '' and p.poll() is not None:
break
if output == 'stop now':
#Do whatever you want
rc = p.poll() #Exit Code

Detecting the end of the stream on popen.stdout.readline

I have a python program which launches subprocesses using Popen and consumes their output nearly real-time as it is produced. The code of the relevant loop is:
def run(self, output_consumer):
self.prepare_to_run()
popen_args = self.get_popen_args()
logging.debug("Calling popen with arguments %s" % popen_args)
self.popen = subprocess.Popen(**popen_args)
while True:
outdata = self.popen.stdout.readline()
if not outdata and self.popen.returncode is not None:
# Terminate when we've read all the output and the returncode is set
break
output_consumer.process_output(outdata)
self.popen.poll() # updates returncode so we can exit the loop
output_consumer.finish(self.popen.returncode)
self.post_run()
def get_popen_args(self):
return {
'args': self.command,
'shell': False, # Just being explicit for security's sake
'bufsize': 0, # More likely to see what's being printed as it happens
# Not guarantted since the process itself might buffer its output
# run `python -u` to unbuffer output of a python processes
'cwd': self.get_cwd(),
'env': self.get_environment(),
'stdout': subprocess.PIPE,
'stderr': subprocess.STDOUT,
'close_fds': True, # Doesn't seem to matter
}
This works great on my production machines, but on my dev machine, the call to .readline() hangs when certain subprocesses complete. That is, it will successfully process all of the output, including the final output line saying "process complete", but then will again poll readline and never return. This method exits properly on the dev machine for most of the sub-processes I call, but consistently fails to exit for one complex bash script that itself calls many sub-processes.
It's worth noting that popen.returncode gets set to a non-None (usually 0) value many lines before the end of the output. So I can't just break out of the loop when that is set or else I lose everything that gets spat out at the end of the process and is still buffered waiting for reading. The problem is that when I'm flushing the buffer at that point, I can't tell when I'm at the end because the last call to readline() hangs. Calling read() also hangs. Calling read(1) gets me every last character out, but also hangs after the final line. popen.stdout.closed is always False. How can I tell when I'm at the end?
All systems are running python 2.7.3 on Ubuntu 12.04LTS. FWIW, stderr is being merged with stdout using stderr=subprocess.STDOUT.
Why the difference? Is it failing to close stdout for some reason? Could the sub-sub-processes do something to keep it open somehow? Could it be because I'm launching the process from a terminal on my dev box, but in production it's launched as a daemon through supervisord? Would that change the way the pipes are processed and if so how do I normalize them?

The main code loop looks right. It could be that the pipe isn't closing because another process is keeping it open. For example, if script launches a background process that writes to stdout then the pipe will no close. Are you sure no other child process still running?
An idea is to change modes when you see the .returncode has set. Once you know the main process is done, read all its output from buffer, but don't get stuck waiting. You can use select to read from the pipe with a timeout. Set a several seconds timeout and you can clear the buffer without getting stuck waiting child process.

Without knowing the contents of the "one complex bash script" which causes the problem, there's too many possibilities to determine the exact cause.
However, focusing on the fact that you claim it works if you run your Python script under supervisord, then it might be getting stuck if a sub-process is trying to read from stdin, or just behaves differently if stdin is a tty, which (I presume) supervisord will redirect from /dev/null.
This minimal example seems to cope better with cases where my example test.sh runs subprocesses which try to read from stdin...
import os
import subprocess
f = subprocess.Popen(args='./test.sh',
shell=False,
bufsize=0,
stdin=open(os.devnull, 'rb'),
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
close_fds=True)
while 1:
s = f.stdout.readline()
if not s and f.returncode is not None:
break
print s.strip()
f.poll()
print "done %d" % f.returncode
Otherwise, you can always fall back to using a non-blocking read, and bail out when you get your final output line saying "process complete", although it's a bit of a hack.

If you use readline() or read(), it should not hang. No need to check returncode or poll(). If it is hanging when you know the process is finished, it is most probably a subprocess keeping your pipe open, as others said before.
There are two things you could do to debug this:
* Try to reproduce with a minimal script instead of the current complex one, or
* Run that complex script with strace -f -e clone,execve,exit_group and see what is that script starting, and if any process is surviving the main script (check when the main script calls exit_group, if strace is still waiting after that, you have a child still alive).

I find that calls to read (or readline) sometimes hang, despite previously calling poll. So I resorted to calling select to find out if there is readable data. However, select without a timeout can hang, too, if the process was closed. So I call select in a semi-busy loop with a tiny timeout for each iteration (see below).
I'm not sure if you can adapt this to readline, as readline might hang if the final \n is missing, or if the process doesn't close its stdout before you close its stdin and/or terminate it. You could wrap this in a generator, and everytime you encounter a \n in stdout_collected, yield the current line.
Also note that in my actual code, I'm using pseudoterminals (pty) to wrap the popen handles (to more closely fake user input) but it should work without.
# handle to read from
handle = self.popen.stdout
# how many seconds to wait without data
timeout = 1
begin = datetime.now()
stdout_collected = ""
while self.popen.poll() is None:
try:
fds = select.select([handle], [], [], 0.01)[0]
except select.error, exc:
print exc
break
if len(fds) == 0:
# select timed out, no new data
delta = (datetime.now() - begin).total_seconds()
if delta > timeout:
return stdout_collected
# try longer
continue
else:
# have data, timeout counter resets again
begin = datetime.now()
for fd in fds:
if fd == handle:
data = os.read(handle, 1024)
# can handle the bytes as they come in here
# self._handle_stdout(data)
stdout_collected += data
# process exited
# if using a pseudoterminal, close the handles here
self.popen.wait()

Why are you setting the sdterr to STDOUT?
The real benefit of making a communicate() call on a subproces is that you are able to retrieve a tuple containining the stdout response as well as the stderr meesage.
Those might be useful if the logic depends on their succsss or failure.
Also, it would save you from the pain of having to iterate through lines. Communicate() gives you everything and there would be no unresolved questions about whether or not the full message was received

I wrote a demo with bash subprocess that can be easy explored.
A closed pipe can be recognized by '' in the output from readline(), while the output from an empty line is '\n'.
from subprocess import Popen, PIPE, STDOUT
p = Popen(['bash'], stdout=PIPE, stderr=STDOUT)
out = []
while True:
outdata = p.stdout.readline()
if not outdata:
break
#output_consumer.process_output(outdata)
print "* " + repr(outdata)
out.append(outdata)
print "* closed", repr(out)
print "* returncode", p.wait()
Example of input/output: Closing the pipe distinctly before terminating the process. That is why wait() should be used instead of poll()
[prompt] $ python myscript.py
echo abc
* 'abc\n'
exec 1>&- # close stdout
exec 2>&- # close stderr
* closed ['abc\n']
exit
* returncode 0
[prompt] $
Your code did output a huge number of empty strings for this case.
Example: Fast terminated process without '\n' on the last line:
echo -n abc
exit
* 'abc'
* closed ['abc']
* returncode 0

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.