Python: monitoring stderr and stdout of a subprocess

I'm trying to start a program (HandBrakeCLI) as a subprocess or thread from within Python 2.7. I have gotten as far as starting it, but I can't figure out how to monitor its stderr and stdout.
The program outputs its status (% done) and info about the encode to stderr and stdout, respectively. I'd like to be able to periodically retrieve the % done from the appropriate stream.
I've tried calling subprocess.Popen with stderr and stdout set to PIPE and using subprocess.communicate, but it just sits and waits until the process is killed or complete and only then retrieves the output. That doesn't do me much good.
I've got it up and running as a thread, but as far as I can tell I still have to eventually call subprocess.Popen to execute the program, and I run into the same wall.
Am I going about this the right way? What other options do I have, and how do I get this to work as described?

I have accomplished the same thing with ffmpeg. This is a stripped-down version of the relevant portions. bufsize=1 means line buffering and may not be needed.
import subprocess

def Run(command):
    proc = subprocess.Popen(command, bufsize=1,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return proc

def Trace(proc):
    # Poll until the process exits, reading one line at a time as it appears.
    while proc.poll() is None:
        line = proc.stdout.readline()
        if line:
            # Process output here
            print 'Read line', line

proc = Run([ handbrakePath ] + allOptions)
Trace(proc)
Edit 1: I noticed that the subprocess (HandBrake in this case) needs to flush after each line for this to work (ffmpeg does).
Edit 2: Some quick tests reveal that bufsize=1 may not actually be needed.

Related

How to make subprocess only communicate error

We have created a utility function, used in many projects, which uses subprocess to start a command. This function is as follows:
import subprocess

def _popen( command_list ):
    p = subprocess.Popen( command_list, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE )
    out, error_msg = p.communicate()
    # Some processes (e.g. system_start) print a number of dots in stderr
    # even when no error occurs.
    if error_msg.strip('.') == '':
        error_msg = ''
    return out, error_msg
For most processes this works as intended.
But now I have to use it with a background process which needs to keep running as long as my Python script is running as well, and thus now the fun starts ;-).
Note: the script also needs to start other non-background processes using this same _popen function.
I know that by skipping p.communicate I can make the process start in the background, while my python script continues.
But there are 2 problems with this:
1. I need to check that the background process started correctly.
2. While the main process is running, I need to check the stdout and stderr of the background process from time to time, without stopping the process or ending up hanging on the background process.
Check background process started correctly
For 1, I currently adapted the _popen version to take an extra parameter skip_com (default False) to skip the p.communicate call, and in that case I return the p object instead of out and error_msg.
That way I can check whether the process is running directly after starting it, and if not, call communicate on the p object to check what the error_msg is.
MY_COMMAND_LIST = [ "<command that should go to background>" ]
def _popen( command_list, skip_com=False ):
    p = subprocess.Popen( command_list, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE )
    if not skip_com:
        out, error_msg = p.communicate()
        # Some processes (e.g. system_start) print a number of dots in stderr
        # even when no error occurs.
        if error_msg.strip('.') == '':
            error_msg = ''
        return out, error_msg
    else:
        return p
...
p = _popen( MY_COMMAND_LIST, True )
error = _get_command_pid( MY_COMMAND_LIST )  # checks if background command is running using _popen and ps -ef
if error:
    _, error_msg = p.communicate()
I do not know if there is a better way to do this.
check stdout / stderr
For 2, I have not found a solution which does not cause the script to wait for the end of the background process.
The only way I know to read the output is using iter on e.g. p.stdout.readline, but that will hang if the process is still running:
for line in iter( p.stdout.readline, "" ): print line
Anyone have an idea how to do this?
/edit/ I need to check the data I get from stdout and stderr separately. stderr especially is important in this case, since if the background process encounters an error it will exit, and I need to catch that in my main program to be able to prevent errors caused by that exit.
The stdout output is needed in some situations to check the expected behaviour of the background process and to react to it.
Update
The subprocess will actually exit if it encounters an error
If you don't need to read the output to detect an error, then redirect it to DEVNULL and call .poll() to check the child process's status from time to time without stopping the process.
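A minimal sketch of that approach (assuming Python 3.3+ for subprocess.DEVNULL; on older versions pass open(os.devnull, 'wb') instead; command stands for your own argument list):
import subprocess

p = subprocess.Popen(command, stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL)
# ... do other work, checking in on the child occasionally ...
if p.poll() is not None:  # poll() returns None while the child is running
    print('child exited with status %d' % p.returncode)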
assuming you have to read the output:
Do not use stdout=PIPE, stderr=PIPE unless you read from the pipes. Otherwise, the child process may hang as soon as any of the corresponding OS pipe buffers fill up.
If you want to start a process and do something else while it is running then you need a non-blocking way to read its output. A simple portable way is to use a thread:
from subprocess import Popen, PIPE, STDOUT
from threading import Thread

def process_output(process):
    with finishing(process):  # close pipes, call .wait()
        for line in iter(process.stdout.readline, b''):
            if detected_error(line):
                communicate_error(process, line)

process = Popen(command, stdout=PIPE, stderr=STDOUT, bufsize=1)
Thread(target=process_output, args=[process]).start()
I need to check the data I get from stdout and stderr separately.
Use two threads:
def read_stdout(process):
    with waiting(process), process.stdout:  # close pipe, call .wait()
        for line in iter(process.stdout.readline, b''):
            do_something_with_stdout(line)

def read_stderr(process):
    with process.stderr:
        for line in iter(process.stderr.readline, b''):
            if detected_error(line):
                communicate_error(process, line)

process = Popen(command, stdout=PIPE, stderr=PIPE, bufsize=1)
Thread(target=read_stdout, args=[process]).start()
Thread(target=read_stderr, args=[process]).start()
You could put the code into a custom class (to group do_something_with_stdout(), detected_error(), communicate_error() methods).
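A rough sketch of such a class (names and structure are illustrative assumptions, not an API from the answer; the cleanup that finishing()/waiting() hinted at is folded into the reader threads):
from subprocess import Popen, PIPE
from threading import Thread

class ProcessWatcher(object):
    """Subclass and override the three hook methods at the bottom."""
    def __init__(self, command):
        self.process = Popen(command, stdout=PIPE, stderr=PIPE, bufsize=1)
        for reader in (self._read_stdout, self._read_stderr):
            t = Thread(target=reader)
            t.daemon = True  # don't let reader threads keep the interpreter alive
            t.start()

    def _read_stdout(self):
        with self.process.stdout:
            for line in iter(self.process.stdout.readline, b''):
                self.do_something_with_stdout(line)

    def _read_stderr(self):
        with self.process.stderr:
            for line in iter(self.process.stderr.readline, b''):
                if self.detected_error(line):
                    self.communicate_error(line)

    def do_something_with_stdout(self, line):
        pass  # override

    def detected_error(self, line):
        return False  # override

    def communicate_error(self, line):
        pass  # override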
It may be better or worse than what you imagine...
Anyway, the correct way of reading a pipe line by line is simply:
for line in p.stdout:
    # process the line if you want, or just
    print line
Or, if you need to process it inside a higher-level loop:
line = next(p.stdout)
But a harder problem can come from the command started from Python. Many programs use the underlying C standard library, and by default stdout is a buffered stream. The system detects whether standard output is connected to a terminal, and automatically flushes output on a newline (\n) or on a read from that same terminal. But if output is connected to a pipe or a file, everything is buffered until the buffer is full, which on current systems requires several kilobytes. In that case nothing can be done at the Python level. The code above would get a full line as soon as it is written to the pipe, but it cannot guess anything before the callee has actually written something...
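The effect is easy to reproduce from Python itself (a sketch: the child here is an inline Python program, but any program using default stdio buffering behaves the same way):
import subprocess
import sys

# The child prints one line per second but never flushes; with its stdout on
# a pipe the output is block-buffered and arrives all at once when it exits.
child_code = (
    "import time\n"
    "for i in range(5):\n"
    "    print('tick %d' % i)\n"
    "    time.sleep(1)\n"
)
p = subprocess.Popen([sys.executable, '-c', child_code],
                     stdout=subprocess.PIPE)
for line in p.stdout:
    print(line)  # the five lines typically appear together after ~5 seconds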

Python subprocess timing out?

I have a script that runs another command, waits for it to finish, logs the stdout and stderr, and based on the return code does other stuff. Here is the code:
p = subprocess.Popen(command, stdin=subprocess.PIPE, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
o, e = p.communicate()
if p.returncode:
    # report error
# do other stuff
The problem I'm having is that if the command takes a long time to run, none of the other actions get done. The possible errors won't get reported, and the other stuff that needs to happen if there are no errors doesn't get done. It essentially doesn't go past p.communicate() if it takes too long. Sometimes this command can take hours (or even longer) to run, and sometimes it can take as little as 5 seconds.
Am I missing something or doing something wrong?
As per the documentation located here, it's safe to say that your code is waiting for the subprocess to finish.
If you need to go do 'other things' while you wait, you could create a loop like:
import time

while p.poll() is None:  # poll() returns None while the process is still running
    # 'other things'
    time.sleep(0.2)
Pick a sleep time that's reasonable for how often you want python to wake up and check the subprocess as well as doing its 'other things'.
Popen.communicate waits for the process to finish before anything is returned. Thus it is not ideal for any long-running command, and even less so if the subprocess can hang waiting for input, say prompting for a password.
The stderr=subprocess.PIPE, stdout=subprocess.PIPE are needed only if you want to capture the output of the command into a variable. If you are OK with the output going to your terminal, then you can remove both of them, and even use subprocess.call instead of Popen. Also, if you do not provide input to your subprocess, then do not use stdin=subprocess.PIPE at all; direct it from the null device instead (in Python 3.3+ you can use stdin=subprocess.DEVNULL; in Python <3.3 use stdin=open(os.devnull, 'rb')).
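Putting those pieces together, a minimal sketch (the command name is illustrative):
import os
import subprocess

with open(os.devnull, 'rb') as devnull:  # Python 3.3+: subprocess.DEVNULL
    # No PIPEs: output goes straight to the terminal, and the child
    # cannot hang waiting for input on stdin.
    rc = subprocess.call(['some_long_command', '--flag'], stdin=devnull)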
If you need the contents too, then instead of calling p.communicate() you can read p.stdout and p.stderr yourself in chunks and output them to the terminal, but it is a bit complicated, as it is easy to deadlock the program - the naive approach would try to read from the subprocess's stdout while the subprocess wants to write to stderr. For this case there are 2 remedies:
you could use select.select to poll both stdout and stderr to see whichever becomes ready first and read from it then
or, if you do not mind stdout and stderr being combined into one stream,
you can use STDOUT to redirect the stderr stream into the stdout stream: stdout=subprocess.PIPE, stderr=subprocess.STDOUT; now all the output comes to p.stdout, which you can read easily in a loop, outputting the chunks without worrying about deadlocks:
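A sketch of that merged-stream read loop (command is a placeholder, as above):
import subprocess
import sys

p = subprocess.Popen(command, stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
for chunk in iter(lambda: p.stdout.read(4096), b''):
    sys.stdout.write(chunk)  # Python 2; on Python 3 use sys.stdout.buffer.write(chunk)
p.wait()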
If stdout and stderr are going to be huge, you can also spool them to files right there in Popen; say:
import subprocess
import time

stdout = open('stdout.txt', 'w+b')
stderr = open('stderr.txt', 'w+b')
p = subprocess.Popen(..., stdout=stdout, stderr=stderr)
while p.poll() is None:
    # reading at the end of the file will return an empty string
    err = stderr.read()
    print(err)
    out = stdout.read()
    print(out)
    # if we met the end of the file, then we can sleep a bit
    # here to avoid spending excess CPU cycles just to poll;
    # another option would be to use `select`
    if not err and not out:  # no new output, sleep a bit
        time.sleep(0.01)

Printing output in realtime from subprocess

I'm trying to print stdout in real time for a subprocess, but it looks like stdout is buffered even with bufsize=0, and I can't figure out how to make it work; I always have a delay.
The code I tried:
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     bufsize=0)
line = p.stdout.readline()
while line:
    sys.stdout.write(line)
    sys.stdout.flush()
    # DO OTHER STUFF
    line = p.stdout.readline()
I also tried for line in iter(p.stdout.readline, b'') instead of the while loop, and read(1) instead of readline(). Always the same result: the output gets delayed by a lot of seconds or minutes, and multiple lines appear suddenly at once.
What I think happens:
bufsize is set to 0 (it is set to 0 by default according to the docs), so the lines piped to p.stdout should be available immediately. But since p.stdout.readline() doesn't return immediately when a new line is piped, that means that it IS buffered, hence the multiple lines appearing at once when the buffer is finally flushed to p.stdout.
What can I do to make it work?
Thanks to pobrelkey, who found the source of the problem. Indeed, the delay is due to the fact that the child is buffering its writes to stdout because it is not writing to a tty. The child uses stdio, which is line-buffered when writing to a tty; otherwise it is fully buffered.
I managed to get it to work by using pexpect instead of subprocess. pexpect uses a pseudo-tty, and that's exactly what we need here:
import sys
import pexpect

p = pexpect.spawn(cmd, args, timeout=None)
line = p.readline()
while line:
    sys.stdout.write(line)
    sys.stdout.flush()
    # DO OTHER STUFF
    line = p.readline()
Or even better in my case :
p = pexpect.spawn(cmd, args, timeout=None, logfile=sys.stdout)
line = p.readline()
while line:
    # DO OTHER STUFF
    line = p.readline()
No more delay !
More info about pexpect: wiki
I would first make sure the subprocess itself doesn't buffer its output. If the subprocess is in turn a Python program, proceed to the paragraph below to see how to disable output buffering for Python processes.
As for Python: usually the problem is that Python buffers its stdout and stderr by default when they are not connected to a terminal. The solution is to pass -u to the interpreter when starting your program.
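For example (child_script.py is a hypothetical script; sys.executable just reuses the current interpreter):
import sys
from subprocess import Popen, PIPE, STDOUT

# -u makes the child interpreter unbuffered, so its prints reach the pipe
# immediately instead of sitting in a block buffer.
p = Popen([sys.executable, '-u', 'child_script.py'],
          stdout=PIPE, stderr=STDOUT)
for line in p.stdout:
    print(line)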
Also, you can just do for line in p.stdout instead of the tricky while loop.
P.S. I actually tried running your code (with cmd = ['cat', '/dev/urandom']) and without -u, and it output everything in real time already; this is on OS X 10.8.
If you just want stdout of your child process to go to your stdout, why not just have the child process inherit stdout from your process?
subprocess.Popen(cmd, stdout=None, stderr=subprocess.STDOUT)

python Popen: How do I block the execution of a grep command until the content to grep is ready?

I have been fighting with Popen in Python for a couple of days now, so I decided to put all my doubts here; hopefully they can all be clarified by Python experts.
Initially I used Popen to execute a command and grep the result (as one command using a pipe, something like xxx | grep yyy) with shell=False; as you can imagine, that didn't work very well. Following the guide in this post, I changed my code to the following:
checkCmd = ["sudo", "pyrit", "-r", self.capFile, "analyze"]
checkExec = Popen(checkCmd, shell=False, stdout=PIPE, stderr=STDOUT)
grepExec = Popen(["grep", "good"], stdin=checkExec.stdout, stdout=PIPE)
output = grepExec.stdout.readline()
output = grepExec.communicate()[0]
But I realized that checkExec runs slowly, and since Popen is non-blocking, grepExec always gets executed before checkExec shows any result, so the grep output is always blank. How can I postpone the execution of grepExec until checkExec is finished?
In another Popen in my program, I try to keep a service running in the background, so I use a separate thread to execute it. When all the tasks are done, I notify this thread to quit, and I explicitly call Popen.kill() to stop the service. However, my system ends up with a zombie process that is not reaped. Is there a nice way to clean up everything in this background thread after it finishes?
What are the differences between Popen.communicate()[0] and Popen.stdout.readline()? Can I use a loop to keep reading output from both of them?
Your example would work if you do it like this:
checkCmd = ["sudo", "pyrit", "-r", self.capFile, "analyze"]
checkExec = Popen(checkCmd, shell=False, stdout=PIPE, stderr=STDOUT)
grepExec = Popen(["grep", "good"], stdin=checkExec.stdout, stdout=PIPE)
for line in grepExec.stdout:
    # do something with line
You use communicate when you want to give some input to a process and read all the output on its stdout and stderr at the same time. This is probably not what you want in your case. communicate is more for cases where you want to start an application, feed all the input it needs to it, and read its output.
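For completeness, a minimal sketch of the kind of use communicate is meant for:
from subprocess import Popen, PIPE

p = Popen(['grep', 'good'], stdin=PIPE, stdout=PIPE)
out, _ = p.communicate(b'good line\nbad line\n')  # feed all input, read all output
print(out)  # -> good line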
As other answers have pointed out, you can use shell=True to create the pipeline in your call to subprocess, but an alternative which I would prefer is to leverage Python and, instead of setting up a pipeline, do:
checkCmd = ["sudo", "pyrit", "-r", self.capFile, "analyze"]
checkExec = Popen(checkCmd, shell=False, stdout=PIPE, stderr=STDOUT)
for line in checkExec.stdout:
if line.find('good') != -1:
do something with the matched line here
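For reference, the shell=True variant mentioned at the start of this answer would look roughly like this (the capture file name is illustrative):
from subprocess import Popen, PIPE

p = Popen("sudo pyrit -r capture.cap analyze | grep good",
          shell=True, stdout=PIPE)
output = p.communicate()[0]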
Use subprocess with the complete command line and shell=True; then you can simplify things drastically.
http://docs.python.org/library/subprocess.html
e.g.:
import subprocess as sub

retcode = sub.call("cat file | grep string", executable="/bin/bash", shell=True)

real time subprocess.Popen via stdout and PIPE

I am trying to grab stdout from a subprocess.Popen call, and although I am achieving this easily by doing:
cmd = subprocess.Popen('ls -l', shell=True, stdout=PIPE)
for line in cmd.stdout.readlines():
    print line
I would like to grab stdout in "real time". With the above method, PIPE waits to grab all of stdout and only then returns.
So for logging purposes this doesn't meet my requirements (e.g. "seeing" what is going on while it happens).
Is there a way to get stdout line by line while it is running? Or is this a limitation of subprocess (having to wait until the PIPE closes)?
EDIT
If I switch readlines() for readline() I only get the first line of the stdout, printed one character at a time (not ideal):
In [75]: cmd = Popen('ls -l', shell=True, stdout=PIPE)
In [76]: for i in cmd.stdout.readline(): print i
....:
t
o
t
a
l
1
0
4
Your interpreter is buffering. Add a call to sys.stdout.flush() after your print statement.
Actually, the real solution is to directly redirect the stdout of the subprocess to the stdout of your process.
Indeed, with your solution you can only print stdout; you could not, for instance, print stderr at the same time.
import sys
from subprocess import Popen
Popen("./slow_cmd_output.sh", stdout=sys.stdout, stderr=sys.stderr).communicate()
The communicate() is there to make the call block until the end of the subprocess; otherwise execution would go directly to the next line and your program might terminate before the subprocess (although the redirection to your stdout will still work, even after your Python script has exited; I tested it).
That way, for instance, you are redirecting both stdout and stderr, in absolute real time.
For instance, in my case I tested with this script slow_cmd_output.sh:
#!/bin/bash
for i in 1 2 3 4 5 6; do sleep 5 && echo "${i}th output" && echo "err output num ${i}" >&2; done
To get output "in real time", subprocess is unsuitable because it can't defeat the other process's buffering strategies. That's the reason I always recommend, whenever such "real time" output grabbing is desired (quite a frequent question on stack overflow!), to use instead pexpect (everywhere but Windows -- on Windows, wexpect).
Drop the readlines() which is coalescing the output.
Also, you'll need to enforce line buffering, since most commands will internally buffer output going to a pipe. For details see: http://www.pixelbeat.org/programming/stdio_buffering/
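On Linux, one common way to enforce that is GNU coreutils' stdbuf (a sketch; some_command is a placeholder, and this only helps programs that use the default C stdio buffering):
from subprocess import Popen, PIPE

# stdbuf -oL asks the child's stdio to line-buffer its stdout.
p = Popen(['stdbuf', '-oL', 'some_command'], stdout=PIPE)
for line in p.stdout:
    print(line)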
As this is a question I searched for an answer to for days, I wanted to leave this here for those who follow. While it is true that subprocess cannot combat the other process's buffering strategy, in the case where you are calling another Python script with subprocess.Popen, you CAN tell it to start an unbuffered Python:
command = ["python", "-u", "python_file.py"]
p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in iter(p.stdout.readline, ''):
line = line.replace('\r', '').replace('\n', '')
print line
sys.stdout.flush()
I have also seen cases where the popen arguments bufsize=1 and universal_newlines=True have helped with exposing the hidden stdout.
cmd = subprocess.Popen(["ls", "-l"], stdout=subprocess.PIPE)
for line in cmd.stdout:
print line.rstrip("\n")
The call to readlines is waiting for the process to exit. Replace it with a loop around cmd.stdout.readline() (note the singular) and all should be well.
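That loop would look something like this (mirroring the snippet from the question; cmd is the Popen object):
line = cmd.stdout.readline()
while line:
    print line.rstrip('\n')
    line = cmd.stdout.readline()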
As stated already, the issue is the stdio library's buffering of printf-like output when no terminal is attached to the process. There is a way around this, on the Windows platform anyway. There may be a similar solution on other platforms as well.
On Windows you can force the creation of a new console at process creation. The good thing is this console can remain hidden so you never see it (this is done by shell=True inside the subprocess module).
cmd = subprocess.Popen('ls -l', shell=True, stdout=PIPE,
                       creationflags=_winapi.CREATE_NEW_CONSOLE,
                       bufsize=1, universal_newlines=True)
for line in cmd.stdout.readlines():
    print line
or
A slightly more complete solution is to explicitly set the STARTUPINFO params, which avoids launching the new and unnecessary cmd.exe shell process that shell=True did above.
class PopenBackground(subprocess.Popen):
    def __init__(self, *args, **kwargs):
        si = kwargs.get('startupinfo', subprocess.STARTUPINFO())
        si.dwFlags |= _winapi.STARTF_USESHOWWINDOW
        si.wShowWindow = _winapi.SW_HIDE
        kwargs['startupinfo'] = si
        kwargs['creationflags'] = kwargs.get('creationflags', 0) | _winapi.CREATE_NEW_CONSOLE
        kwargs['bufsize'] = 1
        kwargs['universal_newlines'] = True
        super(PopenBackground, self).__init__(*args, **kwargs)

process = PopenBackground(['ls', '-l'], stdout=subprocess.PIPE)
for line in process.stdout.readlines():
    print line
