How to make subprocess only communicate error - python

We have created a commodity function used in many projects which uses subprocess to start a command. This function is as follows:
def _popen( command_list ):
p = subprocess.Popen( command_list, stdout=subprocess.PIPE,
stderr=subprocess.PIPE )
out, error_msg = p.communicate()
# Some processes (e.g. system_start) print a number of dots in stderr
# even when no error occurs.
if error_msg.strip('.') == '':
error_msg = ''
return out, error_msg
For most processes this works as intended.
But now I have to use it with a background-process which need to keep running as long as my python-script is running as well and thus now the fun starts ;-).
Note: the script also needs to start other non background-processes using this same _popen-function.
I know that by skipping p.communicate I can make the process start in the background, while my python script continues.
But there are 2 problems with this:
I need to check that the background process started correctly
While the main process is running I need to check the stdout and stderr of the background process from time to time without stopping the process / ending hanging in the background process.
Check background process started correctly
For 1 I currently adapted the _popen version to take an extra parameter 'skip_com' (default False) to skip the p.communicate call. And in that case I return the p-object i.s.o. out and error_msg.
This so I can check if the process is running directly after starting it up and if not call communicate on the p-object to check what the error_msg is.
MY_COMMAND_LIST = [ "<command that should go to background>" ]
def _popen( command_list, skip_com=False ):
p = subprocess.Popen( command_list, stdout=subprocess.PIPE,
stderr=subprocess.PIPE )
if not skip_com:
out, error_msg = p.communicate()
# Some processes (e.g. system_start) print a number of dots in stderr
# even when no error occurs.
if error_msg.strip('.') == '':
error_msg = ''
return out, error_msg
else:
return p
...
p = _popen( MY_COMMAND_LIST, True )
error = _get_command_pid( MY_COMMAND_LIST ) # checks if background command is running using _popen and ps -ef
if error:
_, error_msg = p.communicate()
I do not know if there is a better way to do this.
check stdout / stderr
For 2 I have not found a solution which does not cause the script to wait for the end of the background process.
The only ways I know to communicate is using iter on e.g. p.stdout.readline. But that will hang if the process is still running:
for line in iter( p.stdout.readline, "" ): print line
Any one an idea how to do this?
/edit/ I need to check the data I get from stdout and stderr seperately. Especially stderr is important in this case since if the background process encounters an error it will exit and I need to catch that in my main program to be able to prevent errors caused by that exit.
The stdout output is needed in some situations to check the expected behaviour in the background process and to react on that.

Update
The subprocess will actually exit if it encounters an error
If you don't need to read the output to detect an error then redirect it to DEVNULL and call .poll() to check child process' status from time to time without stopping the process.
assuming you have to read the output:
Do not use stdout=PIPE, stderr=PIPE unless you read from the pipes. Otherwise, the child process may hang as soon as any of the corresponding OS pipe buffers fill up.
If you want to start a process and do something else while it is running then you need a non-blocking way to read its output. A simple portable way is to use a thread:
def process_output(process):
with finishing(process): # close pipes, call .wait()
for line in iter(process.stdout.readline, b''):
if detected_error(line):
communicate_error(process, line)
process = Popen(command, stdout=PIPE, stderr=STDOUT, bufsize=1)
Thread(target=process_output, args=[process]).start()
I need to check the data I get from stdout and stderr seperately.
Use two threads:
def read_stdout(process):
with waiting(process), process.stdout: # close pipe, call .wait()
for line in iter(process.stdout.readline, b''):
do_something_with_stdout(line)
def read_stderr(process):
with process.stderr:
for line in iter(process.stderr.readline, b''):
if detected_error(line):
communicate_error(process, line)
process = Popen(command, stdout=PIPE, stderr=PIPE, bufsize=1)
Thread(target=read_stdout, args=[process]).start()
Thread(target=read_stderr, args=[process]).start()
You could put the code into a custom class (to group do_something_with_stdout(), detected_error(), communicate_error() methods).

It may be better or worse than what you imagine...
Anyway, the correct way of reading a pipe line by line is simply:
for line in p.stdout:
#process line is you want of just
print line
Or if you need to process that inside of a higher level loop
line = next(p.stdout)
But a harder problem could come from the commands started from Python. Many programs use the underlying C standard library, and by default stdout is a buffered stream. The system detects whether the standard output is connected to a terminal, and automatically flushes output on a new line (\n) or on a read on same terminal. But if output is connected to a pipe or a file, everything is buffered until the buffer is full, which on current systems requires several kBytes. In that case nothing can be done at Python level. Above code would get a full line as soon as it would written on the pipe, but cannot guess before callee has actually written something...

Related

stdout.read() from finished subprocess sometimes returning empty?

I have created a dictionary where I associate an id with a subprocess.
Something like:
cmd = "ls"
processes[id] = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE)
Then I call a method with this process map as an input, that checks which process has finished. If the process finishes, I check the process's stdout.read() for a particular string match.
The issue is sometimes stdout.read() returns an empty value which causes issues in string matching.
Sample Code:
#Create a map
processes[id] = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE)
...
#Pass that map to a method which checks which processes have finished
completedProcesses(processes)
def completedProcesses(processes):
processList = []
for id,process in processes.iteritems():
if process.poll() is not None:
#If some error in process stdout then print id
verifySuccessStatus(id, processes[id])
processList.add(id)
def verifySuccessStatus(id, process):
file=open(FAILED_IDS_FILE, 'a+')
buffer = process.stdout.read() #This returns empty value sometime
if 'Error' not in buffer:
file.write(id)
file.write('\n')
file.close()
I am new to python, I might be missing some internal functionality understanding of subprocess
There are at least two issues:
There is no point to call process.stdout.read() more than once. .read() does not return until EOF. It returns an empty string to indicate EOF after that.
You should read from the pipes while the processes are still running otherwise they may hang if they generate enough output to fill OS pipe buffers (~65K on my Linux box)
If you want to run multiple external processes concurrently and check their output after they are finished then see this answer that shows "thread pool" and async.io solutions.
Judging by your example command of ls, your issue may be caused by the stdout pipe filling up. Using the process.communicate() method handles this case for you, since you don't need to write multiple times to stdin.
# Recommend the future print function for easier file writing.
from __future__ import print_function
# Create a map
# Keeping access to 'stderr' is generally recommended, but not required.
# Also, if you don't know you need 'shell=True', it's safer practice not to use it.
processes[id] = subprocess.Popen(
[cmd],
shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)
...
#Pass that map to a method which checks which processes have finished
check_processes(processes)
def check_processes(processes):
process_ids = []
# 'id' is a built-in function in python, so it's safer to use a different name.
for idx, process in processes.iteritems():
# When using pipes, communicate() will handle the case of the pipe
# filling up for you.
stdout, stderr = process.communicate()
if not is_success(stdout):
write_failed_id(idx)
process_ids.append(idx)
def is_success(stdout):
return 'Error' not in stdout
def write_failed_id(idx):
# Recommend using a context manager when interacting with files.
# Also, 'file' is a built-in function in python.
with open(FAILED_IDS_FILE, 'a+') as fail_file:
# The future print function makes file printing simpler.
print(idx, file=fail_file)
You're only reading stdout and looking for "Error". Perhaps you should also be looking in stderr:
processes[id] = subprocess.Popen(
[cmd],
shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
)
From the subprocess docs:
subprocess.STDOUT
Special value that can be used as the stderr argument to Popen and indicates that standard error should go into the same handle as standard output.
The process could have failed unexpectedly, returning no stdout but a non-zero return code. You can check this using process.returncode.
Popen.returncode
The child return code, set by poll() and wait() (and indirectly by communicate()). A None value indicates that the process hasn’t terminated yet.
A negative value -N indicates that the child was terminated by signal N (Unix only).

Python subprocess timing out?

I have a script that runs another command, waits for it to finish, logs the stdout and stderr and based the return code does other stuff. Here is the code:
p = subprocess.Popen(command, stdin=subprocess.PIPE, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
o, e = p.communicate()
if p.returncode:
# report error
# do other stuff
The problem I'm having is that if command takes a long time to run none of the other actions get done. The possible errors won't get reported and the other stuff that needs to happen if no errors doesn't get done. It essentially doesn't go past p.communicate() if it takes too long. Some times this command can takes hours (or even longer) to run and some times it can take as little as 5 seconds.
Am I missing something or doing something wrong?
As per the documentation located here, it's safe to say that you're code is waiting for the subprocess to finish.
If you need to go do 'other things' while you wait you could create a loop like:
while p.poll():
# 'other things'
time.sleep(0.2)
Pick a sleep time that's reasonable for how often you want python to wake up and check the subprocess as well as doing its 'other things'.
The Popen.communicate waits for the process to finish, before anything is returned. Thus it is not ideal for any long running command; and even less so if the subprocess can hang waiting for input, say prompting for a password.
The stderr=subprocess.PIPE, stdout=subprocess.PIPE are needed only if you want to capture the output of the command into a variable. If you are OK with the output going to your terminal, then you can remove these both; and even use subprocess.call instead of Popen. Also, if you do not provide input to your subprocess, then do not use stdin=subprocess.PIPE at all, but direct that from the null device instead (in Python 3.3+ you can use stdin=subprocess.DEVNULL; in Python <3.3 use stdin=open(os.devnull, 'rb')
If you need the contents too, then instead of calling p.communicate(), you can read p.stdout and p.stderr yourself in chunks and output to the terminal, but it is a bit complicated, as it is easy to deadlock the program - the dummy approach would try to read from the subprocess' stdout while the subprocess would want to write to stderr. For this case there are 2 remedies:
you could use select.select to poll both stdout and stderr to see whichever becomes ready first and read from it then
or, if you do not care for stdout and stderr being combined into one,
you can use STDOUT to redirect the stderr stream into the stdout stream: stdout=subprocess.PIPE, stderr=subprocess.STDOUT; now all the output comes to p.stdout that you can read easily in loop and output the chunks, without worrying about deadlocks:
If the stdout, stderr are going to be huge, you can also spool them to a file right there in Popen; say,
stdout = open('stdout.txt', 'w+b')
stderr = open('stderr.txt', 'w+b')
p = subprocess.Popen(..., stdout=stdout, stderr=stderr)
while p.poll() is None:
# reading at the end of the file will return an empty string
err = stderr.read()
print(err)
out = stdout.read()
print(out)
# if we met the end of the file, then we can sleep a bit
# here to avoid spending excess CPU cycles just to poll;
# another option would be to use `select`
if not err and not out: # no input, sleep a bit
time.sleep(0.01)

Printing output in realtime from subprocess

I'm trying to print stdout in realtime for a subprocess but it looks like stdout is buffered even with bufsize=0 and I can't figure out how to make it work, I always have a delay.
The code I tried :
p = subprocess.Popen(cmd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
bufsize=0)
line = p.stdout.readline()
while line:
sys.stdout.write(line)
sys.stdout.flush()
# DO OTHER STUFF
line = p.stdout.readline()
Also tried with for line in iter(p.stdout.readline, b'') instead of the while loop and with read(1) instead of readline(). Always the same result, the output gets delayed by a lot of seconds or minutes and multiple lines appear suddenly at once.
What I think happens :
bufsize is set to 0 ( it is set to 0 by default according to the docs ) so the lines piped top.stdout should be available immediately. But since p.stdout.readline() doesn't return immediately when a new line is piped, that means that it IS buffered, hence the multiple lines at once when the buffer is finally flushed to p.stdout.
What can I do to make it work ?
Thanks to pobrelkey who found the source of the problem. Indeed, the delay is due to the fact that the child is buffering its write to stdout because it is not writing to a tty. The child uses stdio which is line buffered when writing to a tty, else it is fully buffered.
I managed to get it to work by using pexpect instead of subprocess. pexpect uses a pseudo-tty and that's exactly what we need here :
p = pexpect.spawn(cmd,args,timeout=None)
line = p.readline()
while line:
sys.stdout.write(line)
sys.stdout.flush()
# DO OTHER STUFF
line = p.readline()
Or even better in my case :
p = pexpect.spawn(cmd,args,timeout=None,logfile=sys.stdout)
line = p.readline()
while line:
# DO OTHER STUFF
line = p.readline()
No more delay !
More infos about pexpect : wiki
I would first make sure the subprocess itself doesn't buffer its output. If the subprocess is in turn a Python program, proceed to the paragraph below to see how to disable output buffering for Python processes.
As per Python, usually the problem is that Python by default buffers stderr and stdout even if you explicitly .flush() it from the code. The solution is to pass -u to Python when starting your program.
Also, you can just do for line in p.stdout instead of the tricky while loop.
P.S. actually I tried running your code (with cmd = ['cat', '/dev/urandom']) and without -u and it outputted everything in real time already; this is on OS X 10.8.
If you just want stdout of your child process to go to your stdout, why not just have the child process inherit stdout from your process?
subprocess.Popen(cmd, stdout=None, stderr=subprocess.STDOUT)

Detecting the end of the stream on popen.stdout.readline

I have a python program which launches subprocesses using Popen and consumes their output nearly real-time as it is produced. The code of the relevant loop is:
def run(self, output_consumer):
self.prepare_to_run()
popen_args = self.get_popen_args()
logging.debug("Calling popen with arguments %s" % popen_args)
self.popen = subprocess.Popen(**popen_args)
while True:
outdata = self.popen.stdout.readline()
if not outdata and self.popen.returncode is not None:
# Terminate when we've read all the output and the returncode is set
break
output_consumer.process_output(outdata)
self.popen.poll() # updates returncode so we can exit the loop
output_consumer.finish(self.popen.returncode)
self.post_run()
def get_popen_args(self):
return {
'args': self.command,
'shell': False, # Just being explicit for security's sake
'bufsize': 0, # More likely to see what's being printed as it happens
# Not guarantted since the process itself might buffer its output
# run `python -u` to unbuffer output of a python processes
'cwd': self.get_cwd(),
'env': self.get_environment(),
'stdout': subprocess.PIPE,
'stderr': subprocess.STDOUT,
'close_fds': True, # Doesn't seem to matter
}
This works great on my production machines, but on my dev machine, the call to .readline() hangs when certain subprocesses complete. That is, it will successfully process all of the output, including the final output line saying "process complete", but then will again poll readline and never return. This method exits properly on the dev machine for most of the sub-processes I call, but consistently fails to exit for one complex bash script that itself calls many sub-processes.
It's worth noting that popen.returncode gets set to a non-None (usually 0) value many lines before the end of the output. So I can't just break out of the loop when that is set or else I lose everything that gets spat out at the end of the process and is still buffered waiting for reading. The problem is that when I'm flushing the buffer at that point, I can't tell when I'm at the end because the last call to readline() hangs. Calling read() also hangs. Calling read(1) gets me every last character out, but also hangs after the final line. popen.stdout.closed is always False. How can I tell when I'm at the end?
All systems are running python 2.7.3 on Ubuntu 12.04LTS. FWIW, stderr is being merged with stdout using stderr=subprocess.STDOUT.
Why the difference? Is it failing to close stdout for some reason? Could the sub-sub-processes do something to keep it open somehow? Could it be because I'm launching the process from a terminal on my dev box, but in production it's launched as a daemon through supervisord? Would that change the way the pipes are processed and if so how do I normalize them?
The main code loop looks right. It could be that the pipe isn't closing because another process is keeping it open. For example, if script launches a background process that writes to stdout then the pipe will no close. Are you sure no other child process still running?
An idea is to change modes when you see the .returncode has set. Once you know the main process is done, read all its output from buffer, but don't get stuck waiting. You can use select to read from the pipe with a timeout. Set a several seconds timeout and you can clear the buffer without getting stuck waiting child process.
Without knowing the contents of the "one complex bash script" which causes the problem, there's too many possibilities to determine the exact cause.
However, focusing on the fact that you claim it works if you run your Python script under supervisord, then it might be getting stuck if a sub-process is trying to read from stdin, or just behaves differently if stdin is a tty, which (I presume) supervisord will redirect from /dev/null.
This minimal example seems to cope better with cases where my example test.sh runs subprocesses which try to read from stdin...
import os
import subprocess
f = subprocess.Popen(args='./test.sh',
shell=False,
bufsize=0,
stdin=open(os.devnull, 'rb'),
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
close_fds=True)
while 1:
s = f.stdout.readline()
if not s and f.returncode is not None:
break
print s.strip()
f.poll()
print "done %d" % f.returncode
Otherwise, you can always fall back to using a non-blocking read, and bail out when you get your final output line saying "process complete", although it's a bit of a hack.
If you use readline() or read(), it should not hang. No need to check returncode or poll(). If it is hanging when you know the process is finished, it is most probably a subprocess keeping your pipe open, as others said before.
There are two things you could do to debug this:
* Try to reproduce with a minimal script instead of the current complex one, or
* Run that complex script with strace -f -e clone,execve,exit_group and see what is that script starting, and if any process is surviving the main script (check when the main script calls exit_group, if strace is still waiting after that, you have a child still alive).
I find that calls to read (or readline) sometimes hang, despite previously calling poll. So I resorted to calling select to find out if there is readable data. However, select without a timeout can hang, too, if the process was closed. So I call select in a semi-busy loop with a tiny timeout for each iteration (see below).
I'm not sure if you can adapt this to readline, as readline might hang if the final \n is missing, or if the process doesn't close its stdout before you close its stdin and/or terminate it. You could wrap this in a generator, and everytime you encounter a \n in stdout_collected, yield the current line.
Also note that in my actual code, I'm using pseudoterminals (pty) to wrap the popen handles (to more closely fake user input) but it should work without.
# handle to read from
handle = self.popen.stdout
# how many seconds to wait without data
timeout = 1
begin = datetime.now()
stdout_collected = ""
while self.popen.poll() is None:
try:
fds = select.select([handle], [], [], 0.01)[0]
except select.error, exc:
print exc
break
if len(fds) == 0:
# select timed out, no new data
delta = (datetime.now() - begin).total_seconds()
if delta > timeout:
return stdout_collected
# try longer
continue
else:
# have data, timeout counter resets again
begin = datetime.now()
for fd in fds:
if fd == handle:
data = os.read(handle, 1024)
# can handle the bytes as they come in here
# self._handle_stdout(data)
stdout_collected += data
# process exited
# if using a pseudoterminal, close the handles here
self.popen.wait()
Why are you setting the sdterr to STDOUT?
The real benefit of making a communicate() call on a subproces is that you are able to retrieve a tuple containining the stdout response as well as the stderr meesage.
Those might be useful if the logic depends on their succsss or failure.
Also, it would save you from the pain of having to iterate through lines. Communicate() gives you everything and there would be no unresolved questions about whether or not the full message was received
I wrote a demo with bash subprocess that can be easy explored.
A closed pipe can be recognized by '' in the output from readline(), while the output from an empty line is '\n'.
from subprocess import Popen, PIPE, STDOUT
p = Popen(['bash'], stdout=PIPE, stderr=STDOUT)
out = []
while True:
outdata = p.stdout.readline()
if not outdata:
break
#output_consumer.process_output(outdata)
print "* " + repr(outdata)
out.append(outdata)
print "* closed", repr(out)
print "* returncode", p.wait()
Example of input/output: Closing the pipe distinctly before terminating the process. That is why wait() should be used instead of poll()
[prompt] $ python myscript.py
echo abc
* 'abc\n'
exec 1>&- # close stdout
exec 2>&- # close stderr
* closed ['abc\n']
exit
* returncode 0
[prompt] $
Your code did output a huge number of empty strings for this case.
Example: Fast terminated process without '\n' on the last line:
echo -n abc
exit
* 'abc'
* closed ['abc']
* returncode 0

Python monitoring stderr and stdout of a subprocess

I trying to start a program (HandBreakCLI) as a subprocess or thread from within python 2.7. I have gotten as far as starting it, but I can't figure out how to monitor it's stderr and stdout.
The program outputs it's status (% done) and info about the encode to stderr and stdout, respectively. I'd like to be able to periodically retrieve the % done from the appropriate stream.
I've tried calling subprocess.Popen with stderr and stdout set to PIPE and using the subprocess.communicate, but it sits and waits till the process is killed or complete then retrieves the output then. Doesn't do me much good.
I've got it up and running as a thread, but as far as I can tell I still have to eventually call subprocess.Popen to execute the program and run into the same wall.
Am I going about this the right way? What other options do I have or how to I get this to work as described?
I have accomplished the same with ffmpeg. This is a stripped down version of the relevant portions. bufsize=1 means line buffering and may not be needed.
def Run(command):
proc = subprocess.Popen(command, bufsize=1,
stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
universal_newlines=True)
return proc
def Trace(proc):
while proc.poll() is None:
line = proc.stdout.readline()
if line:
# Process output here
print 'Read line', line
proc = Run([ handbrakePath ] + allOptions)
Trace(proc)
Edit 1: I noticed that the subprocess (handbrake in this case) needs to flush after lines to use this (ffmpeg does).
Edit 2: Some quick tests reveal that bufsize=1 may not be actually needed.

Categories