I have one sh file which I need to install on a target Linux box, so I'm in the process of writing an automatic installation for it; the sh file requires a lot of input from the user. For example, the first thing that happens when I run ./file.sh is that it shows a big paragraph and asks the user to press Enter. This is where I'm stuck: how do I send key data to the subprocess? Here is what I've tried.
import subprocess

def runProcess(exe):
    global p
    p = subprocess.Popen(exe, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    while True:
        retcode = p.poll()  # returns None while subprocess is running
        line = p.stdout.readline()
        yield line
        if retcode is not None:
            break

for line in runProcess('./file.sh'.split()):
    if '[Enter]' in line:
        print line + 'got it'
        p.communicate('\r')
Correct me if my understanding is wrong, and pardon me if this is a duplicate.
If you need to send a bunch of newlines and nothing else, you need to:
1. Make sure the stdin for the Popen is a pipe
2. Send the newlines without causing a deadlock
Your current code does neither. Something that might work (assuming they're not using APIs that require direct interaction in a tty, rather than just reading stdin):
import subprocess
import threading

def feednewlines(f):
    try:
        # Write as many newlines as it will take
        while True:
            f.write(b'\n')  # Write newline, not carriage return
            f.flush()       # Flush to ensure it's sent as quickly as possible
    except OSError:
        return  # Done when pipe closed/process exited

def runProcess(exe):
    global p
    # Get stdin as a pipe too
    p = subprocess.Popen(exe, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Use a thread to feed as many newlines as needed to the subprocess's stdin
    feeder = threading.Thread(target=feednewlines, args=(p.stdin,))
    feeder.daemon = True
    feeder.start()
    # No need to poll, just read until it closes stdout or exits
    for line in p.stdout:
        yield line
    p.stdin.close()  # Stop feeding (causes thread to error and exit)
    p.wait()         # Clean up the process

# Iterate output, and echo when [Enter] is seen
for line in runProcess('./file.sh'.split()):
    if '[Enter]' in line:
        print line + 'got it'
For the case where you need to customize the responses, you're going to need to add communication between the parent and the feeder thread, which makes this uglier, and it only works if the child process properly flushes its output when it prompts you, even when not connected to a terminal. You might do something like this to define a global queue:
import queue  # named Queue on Python 2

feederqueue = queue.Queue()
then change the feeder function to:
def feednewlines(f):
    try:
        while True:
            f.write(feederqueue.get())
            f.flush()
    except OSError:
        return
and change the global code lower down to:
for line in runProcess('./file.sh'.split()):
    if '[Enter]' in line:
        print line + 'got it'
        feederqueue.put(b'\n')
    elif 'THING THAT REQUIRES YOU TO TYPE FOO' in line:
        feederqueue.put(b'foo\n')
etc.
Command line programs run differently when they are run in a terminal versus when they are run in the background. If a program is attached to a terminal, it runs in an interactive line mode, expecting user interaction. If stdout is a file or a pipe, it runs in block mode, where writes are delayed until a certain block size is buffered. Your program will never see the [Enter] prompt because it uses pipes and the data is still sitting in the subprocess's output buffer.
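As a side note (my addition, not part of the original answer): when the child's buffering comes from the C standard library, you can sometimes force line-buffered output from the parent side with the stdbuf utility from GNU coreutils. A rough sketch, assuming a Linux box where ./file.sh is the script in question:

import subprocess

# `stdbuf -oL` asks the child's C library to line-buffer stdout even when
# it is connected to a pipe; programs that manage their own buffering
# (e.g. Python without -u) are unaffected.
p = subprocess.Popen(['stdbuf', '-oL', './file.sh'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
for line in p.stdout:
    print(line)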
The python pexpect module solves this problem by emulating a terminal and allowing you to interact with the program with a series of "expect" statements.
Suppose we want to run a test program
#!/usr/bin/env python3
data = input('[Enter]')
print(data)
It's pretty boring: it prompts for data, prints it, then exits. We can run it with pexpect:
#!/usr/bin/env python3
import pexpect
# run the program
p = pexpect.spawn('./test.py')
# we don't need to see our input to the program echoed back
p.setecho(False)
# read lines until the desired program output is seen
p.expect(r'\[Enter\]')
# send some data to the program
p.sendline('inner data')
# wait for it to exit
p.expect(pexpect.EOF)
# show everything since the previous expect
print(p.before)
print('outer done')
Related
I want to achieve something which is very similar to this.
My actual goal is to run Rasa from within python.
Taken from Rasa's site:
Rasa is a framework for building conversational software: Messenger/Slack bots, Alexa skills, etc. We’ll abbreviate this as a bot in this documentation.
It is basically a chatbot which runs in the command prompt. This is how it works on cmd:
Now I want to run Rasa from python so that I can integrate it with my Django-based website. i.e. I want to keep taking inputs from the user, pass it to rasa, rasa processes the text and gives me an output which I show back to the user.
I have tried this (running it from cmd as of now)
import sys
import subprocess
from threading import Thread
from queue import Queue, Empty  # python 3.x

def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

def getOutput(outQueue):
    outStr = ''
    try:
        while True:  # Adds output from the Queue until it is empty
            outStr += outQueue.get_nowait()
    except Empty:
        return outStr

p = subprocess.Popen('command_to_run_rasa',
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     shell=False,
                     universal_newlines=True,
                     )

outQueue = Queue()
outThread = Thread(target=enqueue_output, args=(p.stdout, outQueue))
outThread.daemon = True
outThread.start()

someInput = ""
while someInput != "stop":
    someInput = input("Input: ")  # to take input from user
    p.stdin.write(someInput)      # passing input to be processed by the rasa command
    p.stdin.flush()
    output = getOutput(outQueue)
    print("Output: " + output + "\n")
    p.stdout.flush()
But it works fine only for the first line of output, not for successive input/output cycles. See the output below.
How do I get it working for multiple cycles?
I've referred to this, and I think I understand the problem in my code from it, but I don't know how to solve it.
EDIT: I'm using Python 3.6.2 (64-bit) on Windows 10
You need to keep interacting with your subprocess - at the moment, once you pick up the output from your subprocess, you're pretty much done, as you close its STDOUT stream.
Here is the most rudimentary way to keep the user input -> process output cycle going:
import subprocess
import sys
import time

if __name__ == "__main__":  # a guard from unintended usage
    input_buffer = sys.stdin    # a buffer to get the user input from
    output_buffer = sys.stdout  # a buffer to write rasa's output to
    proc = subprocess.Popen(["path/to/rasa", "arg1", "arg2", "etc."],  # start the process
                            stdin=subprocess.PIPE,   # pipe its STDIN so we can write to it
                            stdout=output_buffer,    # pipe directly to the output_buffer
                            universal_newlines=True)
    while True:  # run a main loop
        time.sleep(0.5)  # give some time for `rasa` to forward its STDOUT
        print("Input: ", end="", file=output_buffer, flush=True)  # print the input prompt
        print(input_buffer.readline(), file=proc.stdin, flush=True)  # forward the user input
You can replace input_buffer with a buffer coming from your remote user(s) and output_buffer with a buffer that forwards the data to your user(s) and you'll get essentially what you're looking for - the sub-process will be getting the input directly from the user (input_buffer) and print its output to the user (output_buffer).
If you need to perform other tasks while all this is running in the background, just run everything under the if __name__ == "__main__": guard in a separate thread, and I'd suggest adding a try..except block to pick up KeyboardInterrupt and exit gracefully.
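For example, a minimal sketch of that structure (interact_with_rasa here is a hypothetical stand-in for the loop above):

import threading
import time

def interact_with_rasa():
    # stand-in for the user input -> process output loop shown above
    while True:
        time.sleep(0.5)

worker = threading.Thread(target=interact_with_rasa)
worker.daemon = True
worker.start()
try:
    while worker.is_alive():
        worker.join(timeout=0.1)  # short joins so Ctrl+C is delivered promptly
except KeyboardInterrupt:
    print("Interrupted, exiting gracefully.")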
But... soon enough you'll notice that it doesn't exactly work properly all the time - if rasa takes longer than half a second to print its STDOUT and enter the wait-for-STDIN stage, the outputs will start to mix. This problem is considerably more complex than you might expect. The main issue is that STDOUT and STDIN (and STDERR) are separate buffers and you cannot know when a subprocess is actually expecting something on its STDIN. This means that without a clear indication from the subprocess (like the \r\n[path]> prompt the Windows CMD prompt writes to its STDOUT, for example) you can only send data to the subprocess's STDIN and hope it will be picked up.
Based on your screenshot, it doesn't really give a distinguishable STDIN request prompt because the first prompt is ... :\n and then it waits for STDIN, but then once the command is sent it lists options without an indication of its end of STDOUT stream (technically making the prompt just ...\n but that would match any line preceding it as well). Maybe you can be clever and read the STDOUT line by line, then on each new line measure how much time has passed since the sub-process wrote to it and once a threshold of inactivity is reached assume that rasa expects input and prompt the user for it. Something like:
import subprocess
import sys
import threading

# we'll be using a separate thread and a timed event to request the user input
def timed_user_input(timer, wait, buffer_in, buffer_out, buffer_target):
    while True:  # user input loop
        timer.wait(wait)  # wait for the specified time...
        if not timer.is_set():  # if the timer was not stopped/restarted...
            print("Input: ", end="", file=buffer_out, flush=True)  # print the input prompt
            print(buffer_in.readline(), file=buffer_target, flush=True)  # forward the input
        timer.clear()  # reset the 'timer' event

if __name__ == "__main__":  # a guard from unintended usage
    input_buffer = sys.stdin    # a buffer to get the user input from
    output_buffer = sys.stdout  # a buffer to write rasa's output to
    proc = subprocess.Popen(["path/to/rasa", "arg1", "arg2", "etc."],  # start the process
                            stdin=subprocess.PIPE,   # pipe its STDIN so we can write to it
                            stdout=subprocess.PIPE,  # pipe its STDOUT so we can process it
                            universal_newlines=True)
    # let's build a timer which will fire off if we don't reset it
    timer = threading.Event()  # a simple Event timer
    input_thread = threading.Thread(target=timed_user_input,
                                    args=(timer,  # pass the timer
                                          1.0,    # prompt after one second
                                          input_buffer, output_buffer, proc.stdin))
    input_thread.daemon = True  # no need to keep the input thread blocking...
    input_thread.start()  # start the timer thread
    # now we'll read the `rasa` STDOUT line by line, forward it to output_buffer and reset
    # the timer each time a new line is encountered
    for line in proc.stdout:
        output_buffer.write(line)  # forward the STDOUT line
        output_buffer.flush()  # flush the output buffer
        timer.set()  # reset the timer
You can use a similar technique to check for more complex 'expected user input' patterns. There is a whole module called pexpect designed to deal with this type of task, and I wholeheartedly recommend it if you're willing to give up some flexibility.
Now... all this being said, you are aware that Rasa is built in Python, installs as a Python module and has a Python API, right? Since you're already using Python, why would you call it as a subprocess and deal with all this STDOUT/STDIN shenanigans when you can run it directly from your Python code? Just import it and interact with it directly; they even have a very simple example that does exactly what you're trying to do: Rasa Core with minimal Python.
We have created a commodity function used in many projects which uses subprocess to start a command. This function is as follows:
def _popen( command_list ):
    p = subprocess.Popen( command_list, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE )
    out, error_msg = p.communicate()
    # Some processes (e.g. system_start) print a number of dots in stderr
    # even when no error occurs.
    if error_msg.strip('.') == '':
        error_msg = ''
    return out, error_msg
For most processes this works as intended.
But now I have to use it with a background process which needs to keep running as long as my Python script is running, and thus now the fun starts ;-).
Note: the script also needs to start other non-background processes using this same _popen function.
I know that by skipping p.communicate I can make the process start in the background while my Python script continues.
But there are 2 problems with this:
1. I need to check that the background process started correctly.
2. While the main process is running I need to check the stdout and stderr of the background process from time to time, without stopping the process / ending up hanging in the background process.
Check background process started correctly
For 1, I currently adapted the _popen version to take an extra parameter skip_com (default False) to skip the p.communicate call, and in that case I return the p object instead of out and error_msg.
This way I can check if the process is running directly after starting it up, and if not, call communicate on the p object to check what the error_msg is.
MY_COMMAND_LIST = [ "<command that should go to background>" ]

def _popen( command_list, skip_com=False ):
    p = subprocess.Popen( command_list, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE )
    if not skip_com:
        out, error_msg = p.communicate()
        # Some processes (e.g. system_start) print a number of dots in stderr
        # even when no error occurs.
        if error_msg.strip('.') == '':
            error_msg = ''
        return out, error_msg
    else:
        return p

...

p = _popen( MY_COMMAND_LIST, True )
error = _get_command_pid( MY_COMMAND_LIST )  # checks if background command is running using _popen and ps -ef
if error:
    _, error_msg = p.communicate()
I do not know if there is a better way to do this.
Check stdout / stderr
For 2 I have not found a solution which does not cause the script to wait for the end of the background process.
The only way I know to communicate is using iter on e.g. p.stdout.readline. But that will hang if the process is still running:

for line in iter( p.stdout.readline, "" ):
    print line
Anyone have an idea how to do this?
/edit/ I need to check the data I get from stdout and stderr separately. Especially stderr is important in this case, since if the background process encounters an error it will exit, and I need to catch that in my main program to be able to prevent errors caused by that exit.
The stdout output is needed in some situations to check the expected behaviour in the background process and to react to it.
Update
The subprocess will actually exit if it encounters an error
If you don't need to read the output to detect an error, then redirect it to DEVNULL and call .poll() to check the child process's status from time to time without stopping the process.
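A minimal sketch of that approach (./background.sh is a hypothetical command; subprocess.DEVNULL exists on Python 3.3+, on Python 2 pass open(os.devnull, 'wb') instead):

import subprocess

p = subprocess.Popen(['./background.sh'],
                     stdout=subprocess.DEVNULL,  # discard output we don't need
                     stderr=subprocess.DEVNULL)

# ... later, from time to time:
if p.poll() is None:  # poll() returns None while the child is running
    print('still running')
else:
    print('exited with %d' % p.returncode)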
Assuming you have to read the output:
Do not use stdout=PIPE, stderr=PIPE unless you read from the pipes. Otherwise, the child process may hang as soon as any of the corresponding OS pipe buffers fill up.
If you want to start a process and do something else while it is running then you need a non-blocking way to read its output. A simple portable way is to use a thread:
from subprocess import Popen, PIPE, STDOUT
from threading import Thread

# `finishing`, `detected_error` and `communicate_error` are placeholders
# for your own context manager and error-handling code.
def process_output(process):
    with finishing(process):  # close pipes, call .wait()
        for line in iter(process.stdout.readline, b''):
            if detected_error(line):
                communicate_error(process, line)

process = Popen(command, stdout=PIPE, stderr=STDOUT, bufsize=1)
Thread(target=process_output, args=[process]).start()
I need to check the data I get from stdout and stderr seperately.
Use two threads:
def read_stdout(process):
    with waiting(process), process.stdout:  # close pipe, call .wait()
        for line in iter(process.stdout.readline, b''):
            do_something_with_stdout(line)

def read_stderr(process):
    with process.stderr:
        for line in iter(process.stderr.readline, b''):
            if detected_error(line):
                communicate_error(process, line)

process = Popen(command, stdout=PIPE, stderr=PIPE, bufsize=1)
Thread(target=read_stdout, args=[process]).start()
Thread(target=read_stderr, args=[process]).start()
You could put the code into a custom class (to group do_something_with_stdout(), detected_error(), communicate_error() methods).
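A sketch of what such a class might look like (the method names are the placeholders from the snippets above; the error heuristic is purely illustrative):

from subprocess import Popen, PIPE
from threading import Thread

class BackgroundProcess:
    def __init__(self, command):
        self.errors = []
        self.process = Popen(command, stdout=PIPE, stderr=PIPE, bufsize=1)
        for target in (self._read_stdout, self._read_stderr):
            t = Thread(target=target)
            t.daemon = True
            t.start()

    def _read_stdout(self):
        with self.process.stdout as pipe:
            for line in iter(pipe.readline, b''):
                self.do_something_with_stdout(line)

    def _read_stderr(self):
        with self.process.stderr as pipe:
            for line in iter(pipe.readline, b''):
                if self.detected_error(line):
                    self.communicate_error(line)

    def do_something_with_stdout(self, line):
        pass  # application-specific handling goes here

    def detected_error(self, line):
        return b'error' in line.lower()  # illustrative heuristic only

    def communicate_error(self, line):
        self.errors.append(line)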
It may be better or worse than what you imagine...
Anyway, the correct way of reading a pipe line by line is simply:
for line in p.stdout:
    # process line if you want, or just
    print line
Or, if you need to process it inside a higher-level loop:

line = next(p.stdout)
But a harder problem can come from the commands started from Python. Many programs use the underlying C standard library, and by default stdout is a buffered stream. The system detects whether the standard output is connected to a terminal, and automatically flushes output on a newline (\n) or on a read from the same terminal. But if output is connected to a pipe or a file, everything is buffered until the buffer is full, which on current systems requires several kilobytes. In that case nothing can be done at the Python level. The above code would get a full line as soon as it is written to the pipe, but it cannot guess anything before the callee has actually written something...
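If you control the child program, the fix therefore has to happen on the child side: run a Python child with python -u, or flush explicitly. A minimal, illustrative child script:

import sys
import time

for i in range(3):
    print(i)
    sys.stdout.flush()  # push each line through even when stdout is a pipe
    time.sleep(1)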
I have a script named 1st.py which creates a REPL (read-eval-print-loop):
print "Something to print"
while True:
r = raw_input()
if r == 'n':
print "exiting"
break
else:
print "continuing"
I then launched 1st.py with the following code:
p = subprocess.Popen(["python","1st.py"], stdin=PIPE, stdout=PIPE)
And then tried this:
print p.communicate()[0]
It failed, providing this traceback:
Traceback (most recent call last):
File "1st.py", line 3, in <module>
r = raw_input()
EOFError: EOF when reading a line
Can you explain what is happening here please? When I use p.stdout.read(), it hangs forever.
.communicate() writes input (there is no input in this case, so it just closes the subprocess's stdin to indicate to the subprocess that there is no more input), reads all output, and waits for the subprocess to exit.
The exception EOFError is raised in the child process by raw_input() (it expected data but got EOF, i.e. no data).
p.stdout.read() hangs forever because it tries to read all output from the child at the same time as the child waits for input (raw_input()), which causes a deadlock.
To avoid the deadlock you need to read/write asynchronously (e.g., by using threads or select) or to know exactly when and how much to read/write, for example:
from subprocess import PIPE, Popen

p = Popen(["python", "-u", "1st.py"], stdin=PIPE, stdout=PIPE, bufsize=1)
print p.stdout.readline(),  # read the first line
for i in range(10):  # repeat several times to show that it works
    print >>p.stdin, i  # write input
    p.stdin.flush()  # not necessary in this case
    print p.stdout.readline(),  # read output

print p.communicate("n\n")[0],  # signal the child to exit,
                                # read the rest of the output,
                                # wait for the child to exit
Note: this is very fragile code; if the reads and writes get out of sync, it deadlocks.
Beware of the block-buffering issue (here it is solved by using the "-u" flag, which turns off buffering for stdin and stdout in the child).
bufsize=1 makes the pipes line-buffered on the parent side.
Do not use communicate(input=""). It writes input to the process, closes its stdin and then reads all output.
Do it like this:
p=subprocess.Popen(["python","1st.py"],stdin=PIPE,stdout=PIPE)
# get output from process "Something to print"
one_line_output = p.stdout.readline()
# write 'a line\n' to the process
p.stdin.write('a line\n')
# get output from process "not time to break"
one_line_output = p.stdout.readline()
# write "n\n" to that process for if r=='n':
p.stdin.write('n\n')
# read the last output from the process "Exiting"
one_line_output = p.stdout.readline()
What you would do to remove the error:

all_the_process_will_tell_you = p.communicate('all you will ever say to this process\nn\n')[0]

But since communicate closes stdout, stdin and stderr, you cannot read or write after you have called it.
Your second bit of code starts the first bit of code as a subprocess with piped input and output. It then closes its input and tries to read its output.
The first bit of code tries to read from standard input, but the process that started it closed its standard input, so it immediately reaches an end-of-file, which Python turns into an exception.
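A tiny demonstration of this explanation (my sketch; python -c stands in for 1st.py, Python 2 syntax to match the question):

import subprocess
import sys

p = subprocess.Popen([sys.executable, '-c', 'raw_input()'],
                     stdin=subprocess.PIPE)
p.stdin.close()  # the child's stdin immediately reaches end-of-file
p.wait()         # the child dies with an uncaught EOFError
print p.returncode  # nonzero, because of the exception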
I have a python program which launches subprocesses using Popen and consumes their output nearly real-time as it is produced. The code of the relevant loop is:
def run(self, output_consumer):
    self.prepare_to_run()
    popen_args = self.get_popen_args()
    logging.debug("Calling popen with arguments %s" % popen_args)
    self.popen = subprocess.Popen(**popen_args)
    while True:
        outdata = self.popen.stdout.readline()
        if not outdata and self.popen.returncode is not None:
            # Terminate when we've read all the output and the returncode is set
            break
        output_consumer.process_output(outdata)
        self.popen.poll()  # updates returncode so we can exit the loop
    output_consumer.finish(self.popen.returncode)
    self.post_run()

def get_popen_args(self):
    return {
        'args': self.command,
        'shell': False,  # Just being explicit for security's sake
        'bufsize': 0,  # More likely to see what's being printed as it happens
                       # Not guaranteed since the process itself might buffer its output
                       # run `python -u` to unbuffer output of a python process
        'cwd': self.get_cwd(),
        'env': self.get_environment(),
        'stdout': subprocess.PIPE,
        'stderr': subprocess.STDOUT,
        'close_fds': True,  # Doesn't seem to matter
    }
This works great on my production machines, but on my dev machine, the call to .readline() hangs when certain subprocesses complete. That is, it will successfully process all of the output, including the final output line saying "process complete", but then will again poll readline and never return. This method exits properly on the dev machine for most of the sub-processes I call, but consistently fails to exit for one complex bash script that itself calls many sub-processes.
It's worth noting that popen.returncode gets set to a non-None (usually 0) value many lines before the end of the output. So I can't just break out of the loop when that is set or else I lose everything that gets spat out at the end of the process and is still buffered waiting for reading. The problem is that when I'm flushing the buffer at that point, I can't tell when I'm at the end because the last call to readline() hangs. Calling read() also hangs. Calling read(1) gets me every last character out, but also hangs after the final line. popen.stdout.closed is always False. How can I tell when I'm at the end?
All systems are running python 2.7.3 on Ubuntu 12.04LTS. FWIW, stderr is being merged with stdout using stderr=subprocess.STDOUT.
Why the difference? Is it failing to close stdout for some reason? Could the sub-sub-processes do something to keep it open somehow? Could it be because I'm launching the process from a terminal on my dev box, but in production it's launched as a daemon through supervisord? Would that change the way the pipes are processed and if so how do I normalize them?
The main code loop looks right. It could be that the pipe isn't closing because another process is keeping it open. For example, if the script launches a background process that writes to stdout, then the pipe will not close. Are you sure no other child process is still running?
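A quick sketch that demonstrates this effect (the shell exits immediately, but the backgrounded sleep inherits the pipe and keeps it open):

import subprocess

# the shell exits right after `echo`, but `sleep 60` inherits its stdout
p = subprocess.Popen(['sh', '-c', 'sleep 60 & echo started'],
                     stdout=subprocess.PIPE)
print(p.stdout.readline())  # 'started\n' arrives immediately
print(p.stdout.readline())  # blocks: EOF only comes once `sleep 60` exits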
An idea is to change modes when you see that .returncode has been set. Once you know the main process is done, read all its output from the buffer, but don't get stuck waiting. You can use select to read from the pipe with a timeout. Set a timeout of several seconds and you can clear the buffer without getting stuck waiting on the child process.
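A sketch of that mode switch (POSIX only; drain_remaining_output is a hypothetical helper, and p is assumed to be a Popen with stdout=PIPE):

import os
import select

def drain_remaining_output(p, timeout=5.0):
    """Read whatever is left on p.stdout without blocking forever."""
    leftover = b''
    while True:
        ready, _, _ = select.select([p.stdout], [], [], timeout)
        if not ready:
            break  # nothing arrived within the timeout; assume we are done
        chunk = os.read(p.stdout.fileno(), 4096)
        if not chunk:
            break  # real EOF: the pipe is closed
        leftover += chunk
    return leftover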
Without knowing the contents of the "one complex bash script" which causes the problem, there's too many possibilities to determine the exact cause.
However, focusing on the fact that you claim it works if you run your Python script under supervisord, then it might be getting stuck if a sub-process is trying to read from stdin, or just behaves differently if stdin is a tty, which (I presume) supervisord will redirect from /dev/null.
This minimal example seems to cope better with cases where my example test.sh runs subprocesses which try to read from stdin...
import os
import subprocess

f = subprocess.Popen(args='./test.sh',
                     shell=False,
                     bufsize=0,
                     stdin=open(os.devnull, 'rb'),
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     close_fds=True)

while 1:
    s = f.stdout.readline()
    if not s and f.returncode is not None:
        break
    print s.strip()
    f.poll()

print "done %d" % f.returncode
Otherwise, you can always fall back to using a non-blocking read, and bail out when you get your final output line saying "process complete", although it's a bit of a hack.
If you use readline() or read(), it should not hang. No need to check returncode or poll(). If it is hanging when you know the process is finished, it is most probably a subprocess keeping your pipe open, as others said before.
There are two things you could do to debug this:
* Try to reproduce with a minimal script instead of the current complex one, or
* Run that complex script with strace -f -e clone,execve,exit_group and see what that script is starting, and whether any process survives the main script (check when the main script calls exit_group; if strace is still waiting after that, you have a child still alive).
I find that calls to read (or readline) sometimes hang, despite previously calling poll. So I resorted to calling select to find out if there is readable data. However, select without a timeout can hang, too, if the process was closed. So I call select in a semi-busy loop with a tiny timeout for each iteration (see below).
I'm not sure if you can adapt this to readline, as readline might hang if the final \n is missing, or if the process doesn't close its stdout before you close its stdin and/or terminate it. You could wrap this in a generator, and every time you encounter a \n in stdout_collected, yield the current line (see the sketch after the code below).
Also note that in my actual code, I'm using pseudoterminals (pty) to wrap the popen handles (to more closely fake user input) but it should work without.
# requires: import os, select; from datetime import datetime

# handle to read from
handle = self.popen.stdout

# how many seconds to wait without data
timeout = 1

begin = datetime.now()
stdout_collected = ""

while self.popen.poll() is None:
    try:
        fds = select.select([handle], [], [], 0.01)[0]
    except select.error, exc:
        print exc
        break

    if len(fds) == 0:
        # select timed out, no new data
        delta = (datetime.now() - begin).total_seconds()
        if delta > timeout:
            return stdout_collected

        # try longer
        continue
    else:
        # have data, timeout counter resets again
        begin = datetime.now()

    for fd in fds:
        if fd == handle:
            # os.read() wants the file descriptor, not the file object
            data = os.read(handle.fileno(), 1024)
            # can handle the bytes as they come in here
            # self._handle_stdout(data)
            stdout_collected += data

# process exited
# if using a pseudoterminal, close the handles here
self.popen.wait()
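The generator wrapper mentioned above might look like this (a sketch; chunks stands in for the pieces of data produced by the select loop):

def iter_lines(chunks):
    buffered = ""
    for chunk in chunks:
        buffered += chunk
        while "\n" in buffered:
            line, buffered = buffered.split("\n", 1)
            yield line
    if buffered:
        yield buffered  # final partial line without a trailing newline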
Why are you setting stderr to STDOUT?
The real benefit of making a communicate() call on a subprocess is that you are able to retrieve a tuple containing the stdout response as well as the stderr message.
Those might be useful if the logic depends on their success or failure.
Also, it would save you from the pain of having to iterate through lines. communicate() gives you everything, and there would be no unresolved questions about whether or not the full message was received.
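A minimal illustration of the point (ls of a missing file produces output on stderr only):

from subprocess import Popen, PIPE

p = Popen(['ls', 'no_such_file'], stdout=PIPE, stderr=PIPE)
out, err = p.communicate()  # returns (stdout, stderr) as a pair
print("stdout: %r" % out)
print("stderr: %r" % err)
print("returncode: %d" % p.returncode)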
I wrote a demo with a bash subprocess that can be easily explored.
A closed pipe can be recognized by '' in the output from readline(), while the output from an empty line is '\n'.
from subprocess import Popen, PIPE, STDOUT

p = Popen(['bash'], stdout=PIPE, stderr=STDOUT)
out = []
while True:
    outdata = p.stdout.readline()
    if not outdata:
        break
    # output_consumer.process_output(outdata)
    print "* " + repr(outdata)
    out.append(outdata)

print "* closed", repr(out)
print "* returncode", p.wait()
Example of input/output: closing the pipe distinctly before terminating the process. That is why wait() should be used instead of poll().
[prompt] $ python myscript.py
echo abc
* 'abc\n'
exec 1>&- # close stdout
exec 2>&- # close stderr
* closed ['abc\n']
exit
* returncode 0
[prompt] $
Your code did output a huge number of empty strings for this case.
Example: Fast terminated process without '\n' on the last line:
echo -n abc
exit
* 'abc'
* closed ['abc']
* returncode 0