I'm trying to print a subprocess's stdout in real time, but it looks like stdout is buffered even with bufsize=0. I can't figure out how to make it work; there is always a delay.
The code I tried:
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     bufsize=0)
line = p.stdout.readline()
while line:
    sys.stdout.write(line)
    sys.stdout.flush()
    # DO OTHER STUFF
    line = p.stdout.readline()
I also tried for line in iter(p.stdout.readline, b'') instead of the while loop, and read(1) instead of readline(). The result is always the same: the output is delayed by many seconds or even minutes, and then multiple lines appear at once.
What I think happens:
bufsize is set to 0 (which is also the default according to the docs), so the lines piped to p.stdout should be available immediately. But since p.stdout.readline() doesn't return as soon as a new line is piped, the output must be buffered somewhere, hence the multiple lines at once when the buffer is finally flushed to p.stdout.
What can I do to make it work?
Thanks to pobrelkey, who found the source of the problem. The delay is due to the child buffering its writes to stdout because it is not writing to a tty. The child uses stdio, which is line-buffered when writing to a tty and fully buffered otherwise.
I managed to get it to work by using pexpect instead of subprocess. pexpect uses a pseudo-tty, which is exactly what we need here:
p = pexpect.spawn(cmd, args, timeout=None)
line = p.readline()
while line:
    sys.stdout.write(line)
    sys.stdout.flush()
    # DO OTHER STUFF
    line = p.readline()
Or even better in my case :
p = pexpect.spawn(cmd, args, timeout=None, logfile=sys.stdout)
line = p.readline()
while line:
    # DO OTHER STUFF
    line = p.readline()
No more delay!
More information about pexpect: wiki
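For readers who cannot install pexpect, the same pseudo-tty idea can be sketched with the standard library's pty module. This is only a minimal sketch under stated assumptions: it is POSIX-only, the function name run_on_pty is made up for illustration, and it does not handle input to the child or terminal resizing the way pexpect does.

```python
import os
import pty
import subprocess
import sys


def run_on_pty(cmd):
    """Run cmd with stdout/stderr attached to a pseudo-terminal (POSIX only).

    Returns (exit_code, captured_bytes). Hypothetical helper, not part of pexpect.
    """
    master_fd, slave_fd = pty.openpty()
    try:
        # The child believes it is writing to a terminal, so C stdio
        # switches to line buffering.
        proc = subprocess.Popen(cmd, stdout=slave_fd, stderr=slave_fd)
    finally:
        os.close(slave_fd)  # the child keeps its own copy of the slave end
    captured = []
    try:
        while True:
            try:
                chunk = os.read(master_fd, 1024)
            except OSError:
                break  # Linux raises EIO once the child has closed the terminal
            if not chunk:
                break  # other platforms report EOF as an empty read
            captured.append(chunk)
            sys.stdout.write(chunk.decode(errors="replace"))
            sys.stdout.flush()
    finally:
        os.close(master_fd)
    return proc.wait(), b"".join(captured)
```

Note that a terminal usually translates "\n" into "\r\n", so the captured bytes may contain carriage returns that a plain pipe would not produce.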
I would first make sure the subprocess itself doesn't buffer its output. If the subprocess is in turn a Python program, proceed to the paragraph below to see how to disable output buffering for Python processes.
As for Python, the usual problem is that a Python child block-buffers its stdout by default when stdout is not connected to a terminal, so output only arrives when the buffer fills up or the child flushes it. The solution is to pass -u to Python when starting the child program.
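As a rough illustration of the -u advice, the sketch below starts a child Python interpreter with -u and echoes each line as soon as it arrives. The child program (CHILD_SOURCE) and the function name stream_unbuffered_child are invented for this example; text=True assumes Python 3.7 or later.

```python
import subprocess
import sys

# A stand-in child program that prints slowly (illustrative only).
CHILD_SOURCE = (
    "import time\n"
    "for i in range(3):\n"
    "    print('tick', i)\n"
    "    time.sleep(0.1)\n"
)


def stream_unbuffered_child(source):
    """Run `source` in a child interpreter started with -u and echo its lines."""
    cmd = [sys.executable, "-u", "-c", source]
    lines = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # shown as soon as the child writes it
            sys.stdout.flush()
            lines.append(line.rstrip("\n"))
    # Leaving the with block waits for the child, so returncode is set here.
    return proc.returncode, lines
```

Setting the PYTHONUNBUFFERED environment variable for the child has the same effect as -u when you cannot change the command line.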
Also, you can just do for line in p.stdout instead of the tricky while loop.
P.S. actually I tried running your code (with cmd = ['cat', '/dev/urandom']) and without -u and it outputted everything in real time already; this is on OS X 10.8.
If you just want stdout of your child process to go to your stdout, why not just have the child process inherit stdout from your process?
subprocess.Popen(cmd, stdout=None, stderr=subprocess.STDOUT)
I'm using Python's subprocess.communicate() to read stdout from a process that runs for about a minute.
How can I print out each line of that process's stdout in a streaming fashion, so that I can see the output as it's generated, but still block on the process terminating before continuing?
subprocess.communicate() appears to give all the output at once.
To get subprocess' output line by line as soon as the subprocess flushes its stdout buffer:
#!/usr/bin/env python2
from subprocess import Popen, PIPE

p = Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1)
with p.stdout:
    for line in iter(p.stdout.readline, b''):
        print line,
p.wait() # wait for the subprocess to exit
iter() is used to read lines as soon as they are written, to work around the read-ahead bug in Python 2.
If the subprocess's stdout uses block buffering instead of line buffering in non-interactive mode (which delays the output until the child's buffer is full or the child flushes it explicitly), you could try to force unbuffered output using the pexpect or pty modules, or the unbuffer, stdbuf, or script utilities; see Q: Why not just use a pipe (popen())?
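One of the utilities mentioned above, stdbuf (from GNU coreutils), can be applied by prefixing the command. The following is a minimal sketch, assuming stdbuf may or may not be installed; the helper names with_line_buffering and stream are hypothetical. stdbuf only influences programs that use C stdio and do not set their own buffering, so it will not help with every child.

```python
import shutil
import subprocess


def with_line_buffering(cmd):
    """Prefix cmd with `stdbuf -oL` when the stdbuf utility is available."""
    stdbuf = shutil.which("stdbuf")
    if stdbuf is None:
        return list(cmd)  # fall back to the plain command
    return [stdbuf, "-oL"] + list(cmd)


def stream(cmd):
    """Yield the child's output line by line as it arrives."""
    with subprocess.Popen(with_line_buffering(cmd),
                          stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line
```

Usage would look like `for line in stream(["cmd", "arg1"]): print(line, end="")`.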
Here's Python 3 code:
#!/usr/bin/env python3
from subprocess import Popen, PIPE

with Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1,
           universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')
Note: unlike the Python 2 code, which outputs the subprocess's bytestrings as is, the Python 3 code uses text mode (cmd's output is decoded using the locale.getpreferredencoding(False) encoding).
Please note, I think J.F. Sebastian's method (below) is better.
Here is a simple example (with no checking for errors):
import subprocess

proc = subprocess.Popen('ls',
                        shell=True,
                        stdout=subprocess.PIPE,
                        )
while proc.poll() is None:
    output = proc.stdout.readline()
    print output,
If ls ends too fast, then the while loop may end before you've read all the data.
You can catch the remainder in stdout this way:
output = proc.communicate()[0]
print output,
I believe the simplest way to collect output from a process in a streaming fashion is like this:
import sys
from subprocess import *

proc = Popen('ls', shell=True, stdout=PIPE)
while True:
    data = proc.stdout.readline() # Alternatively proc.stdout.read(1024)
    if len(data) == 0:
        break
    sys.stdout.write(data) # sys.stdout.buffer.write(data) on Python 3.x
The readline() or read() function should only return an empty string on EOF, after the process has terminated - otherwise it will block if there is nothing to read (readline() includes the newline, so on empty lines, it returns "\n"). This avoids the need for an awkward final communicate() call after the loop.
On files with very long lines read() may be preferable to reduce maximum memory usage - the number passed to it is arbitrary, but excluding it results in reading the entire pipe output at once which is probably not desirable.
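A possible sketch of the chunked variant on Python 3 follows. One caveat worth hedging: with the default buffered pipe object, read(1024) may wait until 1024 bytes are available or EOF is reached, so this sketch uses read1(), which returns after at most one underlying read. The function name iter_chunks is an assumption made for illustration.

```python
import subprocess
import sys


def iter_chunks(cmd, chunk_size=1024):
    """Yield the child's stdout in pieces of at most chunk_size bytes."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            # read1() returns whatever is available (up to chunk_size)
            # instead of waiting for a complete line or a full chunk.
            chunk = proc.stdout.read1(chunk_size)
            if not chunk:  # empty bytes means EOF
                break
            yield chunk
```

A caller can pass each chunk to `sys.stdout.buffer.write()` followed by `sys.stdout.buffer.flush()` to forward the bytes unchanged.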
If you want a non-blocking approach, don't use process.communicate(). If you set the subprocess.Popen() argument stdout to PIPE, you can read from process.stdout and check if the process still runs using process.poll().
If you're simply trying to pass the output through in realtime, it's hard to get simpler than this:
import subprocess
# This will raise a CalledProcessError if the program returns a nonzero code.
# You can use call() instead if you don't care about that case.
subprocess.check_call(['ls', '-l'])
See the docs for subprocess.check_call().
If you need to process the output, sure, loop on it. But if you don't, just keep it simple.
Edit: J.F. Sebastian points out both that the defaults for the stdout and stderr parameters pass through to sys.stdout and sys.stderr, and that this will fail if sys.stdout and sys.stderr have been replaced (say, for capturing output in tests).
import subprocess

myCommand = "ls -l"
cmd = myCommand.split()
# Universal newlines mode: \n, \r\n and \r are each interpreted as a newline.
p = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
while True:
    line = p.stderr.readline()
    if not line:  # an empty string means EOF: the child closed stderr
        break
    print(line.rstrip('\r\n'))
Adding another python3 solution with a few small changes:
Allows you to catch the exit code of the shell process (I have been unable to get the exit code while using the with construct)
Also pipes stderr out in real time
import subprocess
import sys

def subcall_stream(cmd, fail_on_error=True):
    # Run a shell command, streaming output to STDOUT in real time
    # Expects a list style command, e.g. `["docker", "pull", "ubuntu"]`
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=1, universal_newlines=True)
    for line in p.stdout:
        sys.stdout.write(line)
    p.wait()
    exit_code = p.returncode
    if exit_code != 0 and fail_on_error:
        raise RuntimeError(f"Shell command failed with exit code {exit_code}. Command: `{cmd}`")
    return exit_code
We have created a commodity function used in many projects which uses subprocess to start a command. This function is as follows:
def _popen( command_list ):
    p = subprocess.Popen( command_list, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE )
    out, error_msg = p.communicate()
    # Some processes (e.g. system_start) print a number of dots in stderr
    # even when no error occurs.
    if error_msg.strip('.') == '':
        error_msg = ''
    return out, error_msg
For most processes this works as intended.
But now I have to use it with a background process which needs to keep running as long as my Python script is running, and so the fun starts ;-).
Note: the script also needs to start other non background-processes using this same _popen-function.
I know that by skipping p.communicate I can make the process start in the background, while my python script continues.
But there are 2 problems with this:
1. I need to check that the background process started correctly.
2. While the main process is running, I need to check the stdout and stderr of the background process from time to time without stopping the process or hanging on it.
Check background process started correctly
For 1, I adapted _popen to take an extra parameter skip_com (default False) that skips the p.communicate call. In that case I return the p object instead of out and error_msg.
This way I can check whether the process is running directly after starting it and, if it is not, call communicate on the p object to see what the error_msg is.
MY_COMMAND_LIST = [ "<command that should go to background>" ]

def _popen( command_list, skip_com=False ):
    p = subprocess.Popen( command_list, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE )
    if not skip_com:
        out, error_msg = p.communicate()
        # Some processes (e.g. system_start) print a number of dots in stderr
        # even when no error occurs.
        if error_msg.strip('.') == '':
            error_msg = ''
        return out, error_msg
    else:
        return p
...
p = _popen( MY_COMMAND_LIST, True )
error = _get_command_pid( MY_COMMAND_LIST ) # checks if background command is running using _popen and ps -ef
if error:
    _, error_msg = p.communicate()
I do not know if there is a better way to do this.
check stdout / stderr
For 2 I have not found a solution which does not cause the script to wait for the end of the background process.
The only way I know to read the output is using iter on e.g. p.stdout.readline. But that will hang if the process is still running:
for line in iter( p.stdout.readline, "" ): print line
Does anyone have an idea how to do this?
/edit/ I need to check the data I get from stdout and stderr separately. stderr is especially important in this case: if the background process encounters an error it will exit, and I need to catch that in my main program to prevent errors caused by that exit.
The stdout output is needed in some situations to check for the expected behaviour of the background process and to react to it.
Update
The subprocess will actually exit if it encounters an error
If you don't need to read the output to detect an error then redirect it to DEVNULL and call .poll() to check child process' status from time to time without stopping the process.
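The DEVNULL-plus-poll() suggestion could look roughly like the sketch below. The helper names start_background and has_failed are hypothetical, and treating any nonzero exit code as an error is an assumption about the background program.

```python
import subprocess
import sys


def start_background(cmd):
    """Start cmd with its output discarded so no pipe buffer can fill up."""
    return subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)


def has_failed(proc):
    """Return True if proc has already exited with a nonzero code; never blocks."""
    code = proc.poll()  # None while the process is still running
    return code is not None and code != 0
```

The main program can call `has_failed(proc)` from time to time between its other tasks.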
Assuming you have to read the output:
Do not use stdout=PIPE, stderr=PIPE unless you read from the pipes. Otherwise, the child process may hang as soon as any of the corresponding OS pipe buffers fill up.
If you want to start a process and do something else while it is running then you need a non-blocking way to read its output. A simple portable way is to use a thread:
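from subprocess import Popen, PIPE, STDOUT
from threading import Thread

# finishing(), detected_error() and communicate_error() are placeholders
# that you supply yourself.
def process_output(process):
    with finishing(process): # close pipes, call .wait()
        for line in iter(process.stdout.readline, b''):
            if detected_error(line):
                communicate_error(process, line)

process = Popen(command, stdout=PIPE, stderr=STDOUT, bufsize=1)
Thread(target=process_output, args=[process]).start()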
I need to check the data I get from stdout and stderr seperately.
Use two threads:
def read_stdout(process):
    with waiting(process), process.stdout: # close pipe, call .wait()
        for line in iter(process.stdout.readline, b''):
            do_something_with_stdout(line)

def read_stderr(process):
    with process.stderr:
        for line in iter(process.stderr.readline, b''):
            if detected_error(line):
                communicate_error(process, line)

process = Popen(command, stdout=PIPE, stderr=PIPE, bufsize=1)
Thread(target=read_stdout, args=[process]).start()
Thread(target=read_stderr, args=[process]).start()
You could put the code into a custom class (to group do_something_with_stdout(), detected_error(), communicate_error() methods).
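As one possible way to combine the two reader threads with a main program that only checks in "from time to time", the threads can push lines onto a queue that the main loop drains without blocking. This is a sketch in Python 3, not the answer's own code; the names _pump, start_with_queue, and drain are invented for illustration.

```python
import queue
import subprocess
import sys
import threading


def _pump(stream, name, out_queue):
    """Forward every line from stream to out_queue, tagged with its name."""
    with stream:
        for line in stream:
            out_queue.put((name, line))
    out_queue.put((name, None))  # sentinel: this stream reached EOF


def start_with_queue(cmd):
    """Start cmd and return (process, queue of (stream_name, line) tuples)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True, bufsize=1)
    lines = queue.Queue()
    for name, stream in (("stdout", proc.stdout), ("stderr", proc.stderr)):
        threading.Thread(target=_pump, args=(stream, name, lines),
                         daemon=True).start()
    return proc, lines


def drain(lines):
    """Return everything currently queued without blocking."""
    items = []
    while True:
        try:
            items.append(lines.get_nowait())
        except queue.Empty:
            return items
```

The main program can call `drain(lines)` whenever convenient and inspect the "stderr" entries separately from the "stdout" ones; a `None` line marks the end of that stream.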
It may be better or worse than what you imagine...
Anyway, the correct way of reading a pipe line by line is simply:
for line in p.stdout:
    # process the line if you want, or just print it
    print line
Or if you need to process that inside of a higher level loop
line = next(p.stdout)
But a harder problem can come from the commands started from Python. Many programs use the underlying C standard library, where stdout is a buffered stream by default. The library detects whether standard output is connected to a terminal and, if so, flushes output automatically on each newline (\n) or on a read from the same terminal. But if output is connected to a pipe or a file, everything is buffered until the buffer is full, which on current systems takes several kilobytes. In that case nothing can be done at the Python level. The code above gets a full line as soon as it has been written to the pipe, but it cannot see anything before the callee has actually written it.
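The terminal detection described above can be observed from Python with a small probe. In this sketch (the PROBE program and the function name are made up for the example), a child interpreter reports whether its own stdout is a terminal when it is started with a pipe.

```python
import subprocess
import sys

# Child program that reports whether its stdout is a terminal.
PROBE = "import sys; print(sys.stdout.isatty())"


def child_stdout_is_tty_through_pipe():
    """Return what the child sees for stdout when the parent uses a pipe."""
    result = subprocess.run([sys.executable, "-c", PROBE],
                            stdout=subprocess.PIPE, text=True, check=True)
    return result.stdout.strip() == "True"
```

Through a pipe the child reports that stdout is not a terminal, which is the condition under which stdio-based programs switch to full buffering.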
I am trying to grab stdout from a subprocess.Popen call and although I am achieving this easily by doing:
cmd = subprocess.Popen('ls -l', shell=True, stdout=PIPE)
for line in cmd.stdout.readlines():
    print line
I would like to grab stdout in "real time". With the above method, PIPE is waiting to grab all the stdout and then it returns.
So for logging purposes, this doesn't meet my requirements (e.g. "see" what is going on while it happens).
Is there a way to get stdout line by line while the process is running? Or is this a limitation of subprocess (having to wait until the PIPE closes)?
EDIT
If I switch readlines() for readline() I only get the last line of the stdout (not ideal):
In [75]: cmd = Popen('ls -l', shell=True, stdout=PIPE)
In [76]: for i in cmd.stdout.readline(): print i
....:
t
o
t
a
l
1
0
4
Your interpreter is buffering. Add a call to sys.stdout.flush() after your print statement.
Actually, the real solution is to directly redirect the stdout of the subprocess to the stdout of your process.
Indeed, with your solution, you can only print stdout, and not stderr, for instance, at the same time.
import sys
from subprocess import Popen
Popen("./slow_cmd_output.sh", stdout=sys.stdout, stderr=sys.stderr).communicate()
The communicate() call is there to make the call block until the subprocess ends; otherwise execution would go straight to the next line and your program might terminate before the subprocess (although the redirection to your stdout still works even after your Python script has exited; I tested it).
That way, for instance, you are redirecting both stdout and stderr, and in absolute real time.
For instance, in my case I tested with this script slow_cmd_output.sh:
#!/bin/bash
for i in 1 2 3 4 5 6; do sleep 5 && echo "${i}th output" && echo "err output num ${i}" >&2; done
To get output "in real time", subprocess is unsuitable because it can't defeat the other process's buffering strategies. That's the reason I always recommend, whenever such "real time" output grabbing is desired (quite a frequent question on Stack Overflow!), to use pexpect instead (everywhere but Windows -- on Windows, wexpect).
Drop the readlines() which is coalescing the output.
Also, you'll need to enforce line buffering, since most commands internally buffer their output when writing to a pipe. For details see: http://www.pixelbeat.org/programming/stdio_buffering/
As this is a question I searched for an answer to for days, I wanted to leave this here for those who follow. While it is true that subprocess cannot combat the other process's buffering strategy, in the case where you are calling another Python script with subprocess.Popen, you CAN tell it to start an unbuffered python.
import subprocess
import sys

command = ["python", "-u", "python_file.py"]
p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in iter(p.stdout.readline, ''):
    line = line.replace('\r', '').replace('\n', '')
    print line
    sys.stdout.flush()
I have also seen cases where the popen arguments bufsize=1 and universal_newlines=True have helped with exposing the hidden stdout.
cmd = subprocess.Popen(["ls", "-l"], stdout=subprocess.PIPE)
for line in cmd.stdout:
    print line.rstrip("\n")
The call to readlines is waiting for the process to exit. Replace this with a loop around cmd.stdout.readline() (note singular) and all should be well.
As stated already the issue is in the stdio library's buffering of printf like statements when no terminal is attached to the process. There is a way around this on the Windows platform anyway. There may be a similar solution on other platforms as well.
On Windows you can force create a new console at process creation. The good thing is this can remain hidden so you never see it (this is done by shell=True inside the subprocess module).
cmd = subprocess.Popen('ls -l', shell=True, stdout=subprocess.PIPE, creationflags=subprocess.CREATE_NEW_CONSOLE, bufsize=1, universal_newlines=True)
for line in cmd.stdout:
    print(line, end='')
A slightly more complete solution is to set the STARTUPINFO parameters explicitly, which avoids launching the new and unnecessary cmd.exe shell process that shell=True started above.
import subprocess

class PopenBackground(subprocess.Popen):
    def __init__(self, *args, **kwargs):
        si = kwargs.get('startupinfo', subprocess.STARTUPINFO())
        si.dwFlags |= subprocess.STARTF_USESHOWWINDOW
        si.wShowWindow = subprocess.SW_HIDE  # keep the new console hidden
        kwargs['startupinfo'] = si
        kwargs['creationflags'] = kwargs.get('creationflags', 0) | subprocess.CREATE_NEW_CONSOLE
        kwargs['bufsize'] = 1
        kwargs['universal_newlines'] = True
        super(PopenBackground, self).__init__(*args, **kwargs)

process = PopenBackground(['ls', '-l'], stdout=subprocess.PIPE)
for line in process.stdout:
    print(line, end='')