Real-time subprocess.Popen output via stdout and PIPE - Python

I am trying to grab stdout from a subprocess.Popen call, and although I can do this easily with:
import subprocess
from subprocess import PIPE

cmd = subprocess.Popen('ls -l', shell=True, stdout=PIPE)
for line in cmd.stdout.readlines():
    print line
I would like to grab stdout in "real time". With the above method, PIPE waits to grab all of the stdout and only then returns.
So for logging purposes this doesn't meet my requirements (e.g. I want to "see" what is going on while it happens).
Is there a way to get stdout line by line while the process is running? Or is this a limitation of subprocess (having to wait until the PIPE closes)?
EDIT
If I switch readlines() for readline(), I get the output character by character rather than line by line, since readline() returns a single string and iterating over it yields individual characters (not ideal):
In [75]: cmd = Popen('ls -l', shell=True, stdout=PIPE)
In [76]: for i in cmd.stdout.readline(): print i
....:
t
o
t
a
l
1
0
4

Your interpreter is buffering. Add a call to sys.stdout.flush() after your print statement.
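For illustration, here is a minimal sketch of that suggestion in Python 3 syntax, flushing the parent's stdout after each line it relays:

import sys
from subprocess import Popen, PIPE

cmd = Popen('ls -l', shell=True, stdout=PIPE)
for line in cmd.stdout:
    print(line.decode(), end='')  # cmd.stdout is a byte stream here
    sys.stdout.flush()            # push each line out immediately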

Actually, the real solution is to directly redirect the stdout of the subprocess to the stdout of your process.
Indeed, with your solution you can only print stdout; you could not, for instance, print stderr at the same time.
import sys
from subprocess import Popen
Popen("./slow_cmd_output.sh", stdout=sys.stdout, stderr=sys.stderr).communicate()
The communicate() call is there to make the call block until the subprocess finishes; otherwise execution would fall through to the next line and your program might terminate before the subprocess does (although the redirection to your stdout would still work, even after your Python script has exited; I tested it).
That way, for instance, you are redirecting both stdout and stderr, and in absolute real time.
For instance, in my case I tested with this script slow_cmd_output.sh:
#!/bin/bash
for i in 1 2 3 4 5 6; do sleep 5 && echo "${i}th output" && echo "err output num ${i}" >&2; done

To get output "in real time", subprocess is unsuitable because it can't defeat the other process's buffering strategies. That's the reason I always recommend, whenever such "real time" output grabbing is desired (quite a frequent question on Stack Overflow!), using pexpect instead (everywhere but Windows; on Windows, wexpect).
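As a hedged sketch of that approach (assuming pexpect is installed, and reusing the slow_cmd_output.sh script from the answer above):

import sys
import pexpect

# spawn gives the child a pseudo-terminal, so stdio line-buffers its output
child = pexpect.spawn('./slow_cmd_output.sh', timeout=None, encoding='utf-8')
for line in child:            # spawn objects are iterable line by line
    sys.stdout.write(line)
child.close()
print('exit status:', child.exitstatus)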

Drop the readlines(), which is coalescing the output.
Also, you'll need to enforce line buffering, since most commands will internally buffer output to a pipe. For details see: http://www.pixelbeat.org/programming/stdio_buffering/
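As a hedged illustration of forcing line buffering from outside the child (stdbuf is from GNU coreutils, so this is Linux-specific; the ping command is just an example of a long-running process):

from subprocess import Popen, PIPE

# stdbuf -oL asks the child's stdio to line-buffer its stdout
p = Popen(['stdbuf', '-oL', 'ping', '-c', '3', 'localhost'], stdout=PIPE)
for line in p.stdout:
    print(line.decode(), end='')
p.wait()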

As this is a question I searched for an answer to for days, I wanted to leave this here for those who follow. While it is true that subprocess cannot combat the other process's buffering strategy, in the case where you are calling another Python script with subprocess.Popen, you CAN tell it to start an unbuffered python.
command = ["python", "-u", "python_file.py"]
p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in iter(p.stdout.readline, ''):
line = line.replace('\r', '').replace('\n', '')
print line
sys.stdout.flush()
I have also seen cases where the popen arguments bufsize=1 and universal_newlines=True have helped with exposing the hidden stdout.
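For reference, a sketch of those arguments together in Python 3, where text=True (3.7+) is the modern spelling of universal_newlines=True and bufsize=1 requests line buffering (which only applies in text mode); python_file.py is the script from the snippet above:

from subprocess import Popen, PIPE, STDOUT

p = Popen(['python', '-u', 'python_file.py'], stdout=PIPE, stderr=STDOUT,
          bufsize=1, text=True)
for line in p.stdout:          # yields lines as the child flushes them
    print(line, end='')
p.wait()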

cmd = subprocess.Popen(["ls", "-l"], stdout=subprocess.PIPE)
for line in cmd.stdout:
print line.rstrip("\n")

The call to readlines is waiting for the process to exit. Replace this with a loop around cmd.stdout.readline() (note singular) and all should be well.
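A sketch of that loop (Python 3 syntax):

import subprocess

cmd = subprocess.Popen('ls -l', shell=True, stdout=subprocess.PIPE)
while True:
    line = cmd.stdout.readline()
    if not line:               # readline() returns b'' only at EOF
        break
    print(line.decode(), end='')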

As stated already, the issue is the stdio library's buffering of printf-like output when no terminal is attached to the process. There is a way around this on the Windows platform, anyway. There may be a similar solution on other platforms as well.
On Windows you can force create a new console at process creation. The good thing is this can remain hidden so you never see it (this is done by shell=True inside the subprocess module).
import subprocess

cmd = subprocess.Popen('ls -l', shell=True, stdout=subprocess.PIPE,
                       creationflags=subprocess.CREATE_NEW_CONSOLE,
                       bufsize=1, universal_newlines=True)
for line in cmd.stdout:
    print line
or
A slightly more complete solution is to explicitly set the STARTUPINFO params, which avoids launching the new and unnecessary cmd.exe shell process that shell=True created above.
class PopenBackground(subprocess.Popen):
    def __init__(self, *args, **kwargs):
        si = kwargs.get('startupinfo', subprocess.STARTUPINFO())
        si.dwFlags |= subprocess.STARTF_USESHOWWINDOW
        si.wShowWindow = subprocess.SW_HIDE
        kwargs['startupinfo'] = si
        kwargs['creationflags'] = kwargs.get('creationflags', 0) | subprocess.CREATE_NEW_CONSOLE
        kwargs['bufsize'] = 1
        kwargs['universal_newlines'] = True
        super(PopenBackground, self).__init__(*args, **kwargs)

process = PopenBackground(['ls', '-l'], stdout=subprocess.PIPE)
for line in process.stdout:
    print line

Related

Python subprocess.Popen stdout=subprocess.PIPE blocking execution [duplicate]

I'm using Python's subprocess.communicate() to read stdout from a process that runs for about a minute.
How can I print out each line of that process's stdout in a streaming fashion, so that I can see the output as it's generated, but still block on the process terminating before continuing?
subprocess.communicate() appears to give all the output at once.
To get subprocess' output line by line as soon as the subprocess flushes its stdout buffer:
#!/usr/bin/env python2
from subprocess import Popen, PIPE

p = Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1)
with p.stdout:
    for line in iter(p.stdout.readline, b''):
        print line,
p.wait() # wait for the subprocess to exit
iter() is used to read lines as soon as they are written, to work around the read-ahead bug in Python 2.
If the subprocess' stdout uses block buffering instead of line buffering in non-interactive mode (which delays the output until the child's buffer is full or is flushed explicitly by the child), then you could try to force unbuffered output using the pexpect or pty modules, or the unbuffer, stdbuf, or script utilities; see Q: Why not just use a pipe (popen())?
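For example, a hedged sketch of the pty approach (POSIX only; ["cmd", "arg1"] is the same placeholder command as above). Giving the child a pseudo-terminal makes most stdio-based programs line-buffer their output:

import os
import pty
import subprocess

master, slave = pty.openpty()
p = subprocess.Popen(['cmd', 'arg1'], stdout=slave, stderr=slave)
os.close(slave)                  # the child keeps its own copy of the fd
try:
    while True:
        chunk = os.read(master, 1024)
        if not chunk:
            break
        print(chunk.decode(), end='', flush=True)
except OSError:                  # on Linux, reading a closed pty raises EIO
    pass
p.wait()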
Here's Python 3 code:
#!/usr/bin/env python3
from subprocess import Popen, PIPE

with Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1,
           universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')
Note: unlike Python 2, which outputs the subprocess' bytestrings as-is, Python 3 works in text mode (cmd's output is decoded using the locale.getpreferredencoding(False) encoding).
Please note, I think J.F. Sebastian's method (above) is better.
Here is a simple example (with no checking for errors):
import subprocess

proc = subprocess.Popen('ls',
                        shell=True,
                        stdout=subprocess.PIPE,
                        )
while proc.poll() is None:
    output = proc.stdout.readline()
    print output,
If ls ends too fast, then the while loop may end before you've read all the data.
You can catch the remainder in stdout this way:
output = proc.communicate()[0]
print output,
I believe the simplest way to collect output from a process in a streaming fashion is like this:
import sys
from subprocess import Popen, PIPE

proc = Popen('ls', shell=True, stdout=PIPE)
while True:
    data = proc.stdout.readline() # Alternatively proc.stdout.read(1024)
    if len(data) == 0:
        break
    sys.stdout.write(data) # sys.stdout.buffer.write(data) on Python 3.x
The readline() or read() function should only return an empty string on EOF, after the process has terminated - otherwise it will block if there is nothing to read (readline() includes the newline, so on empty lines, it returns "\n"). This avoids the need for an awkward final communicate() call after the loop.
On files with very long lines read() may be preferable to reduce maximum memory usage - the number passed to it is arbitrary, but excluding it results in reading the entire pipe output at once which is probably not desirable.
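As a sketch of that chunked variant in Python 3, read1() is used rather than read(), because BufferedReader.read(n) waits for a full n bytes while read1(n) returns as soon as any data is available:

import sys
from subprocess import Popen, PIPE

proc = Popen('ls', shell=True, stdout=PIPE)
while True:
    data = proc.stdout.read1(4096)   # at most 4096 bytes, without waiting for a full chunk
    if not data:                     # b'' means EOF
        break
    sys.stdout.buffer.write(data)
    sys.stdout.buffer.flush()
proc.wait()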
If you want a non-blocking approach, don't use process.communicate(). If you set the subprocess.Popen() argument stdout to PIPE, you can read from process.stdout and check if the process still runs using process.poll().
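A hedged sketch of that poll()-based approach (POSIX, Python 3), with the pipe put into non-blocking mode so reads never stall:

import os
import time
import subprocess

p = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)
fd = p.stdout.fileno()
os.set_blocking(fd, False)
while True:
    try:
        data = os.read(fd, 4096)    # raises BlockingIOError if nothing is ready
    except BlockingIOError:
        data = None
    if data:
        print(data.decode(), end='')
    elif data == b'':               # b'' means the pipe reached EOF
        break
    elif p.poll() is not None:      # no data and the process has exited
        break
    else:
        time.sleep(0.1)             # avoid a busy loop
p.wait()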
If you're simply trying to pass the output through in realtime, it's hard to get simpler than this:
import subprocess
# This will raise a CalledProcessError if the program returns a nonzero code.
# You can use call() instead if you don't care about that case.
subprocess.check_call(['ls', '-l'])
See the docs for subprocess.check_call().
If you need to process the output, sure, loop on it. But if you don't, just keep it simple.
Edit: J.F. Sebastian points out both that the defaults for the stdout and stderr parameters pass through to sys.stdout and sys.stderr, and that this will fail if sys.stdout and sys.stderr have been replaced (say, for capturing output in tests).
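In that replaced-stream case, a sketch of the workaround is to capture through a real pipe and copy to whatever sys.stdout currently is:

import subprocess
import sys

p = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE, text=True)
for line in p.stdout:
    sys.stdout.write(line)     # works even if sys.stdout is, say, a StringIO
p.wait()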
myCommand="ls -l"
cmd=myCommand.split()
# "universal newline support" This will cause to interpret \n, \r\n and \r equally, each as a newline.
p = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
while True:
print(p.stderr.readline().rstrip('\r\n'))
Adding another python3 solution with a few small changes:
Allows you to catch the exit code of the shell process (I have been unable to get the exit code while using the with construct)
Also pipes stderr out in real time
import subprocess
import sys

def subcall_stream(cmd, fail_on_error=True):
    # Run a shell command, streaming output to STDOUT in real time
    # Expects a list style command, e.g. `["docker", "pull", "ubuntu"]`
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=1, universal_newlines=True)
    for line in p.stdout:
        sys.stdout.write(line)
    p.wait()
    exit_code = p.returncode
    if exit_code != 0 and fail_on_error:
        raise RuntimeError(f"Shell command failed with exit code {exit_code}. Command: `{cmd}`")
    return exit_code
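Usage is then a one-liner; for example, with the docker command from the docstring:

exit_code = subcall_stream(["docker", "pull", "ubuntu"], fail_on_error=False)
print(f"docker pull exited with {exit_code}")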

Printing output in realtime from subprocess

I'm trying to print stdout in realtime for a subprocess but it looks like stdout is buffered even with bufsize=0 and I can't figure out how to make it work, I always have a delay.
The code I tried:
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     bufsize=0)
line = p.stdout.readline()
while line:
    sys.stdout.write(line)
    sys.stdout.flush()
    # DO OTHER STUFF
    line = p.stdout.readline()
Also tried with for line in iter(p.stdout.readline, b'') instead of the while loop and with read(1) instead of readline(). Always the same result, the output gets delayed by a lot of seconds or minutes and multiple lines appear suddenly at once.
What I think happens:
bufsize is set to 0 (it is set to 0 by default according to the docs), so the lines piped to p.stdout should be available immediately. But since p.stdout.readline() doesn't return immediately when a new line is piped, that means that it IS buffered, hence the multiple lines at once when the buffer is finally flushed to p.stdout.
What can I do to make it work ?
Thanks to pobrelkey who found the source of the problem. Indeed, the delay is due to the fact that the child is buffering its write to stdout because it is not writing to a tty. The child uses stdio which is line buffered when writing to a tty, else it is fully buffered.
I managed to get it to work by using pexpect instead of subprocess. pexpect uses a pseudo-tty, and that's exactly what we need here:
p = pexpect.spawn(cmd, args, timeout=None)
line = p.readline()
while line:
    sys.stdout.write(line)
    sys.stdout.flush()
    # DO OTHER STUFF
    line = p.readline()
Or even better in my case:
p = pexpect.spawn(cmd, args, timeout=None, logfile=sys.stdout)
line = p.readline()
while line:
    # DO OTHER STUFF
    line = p.readline()
No more delay!
More info about pexpect: wiki
I would first make sure the subprocess itself doesn't buffer its output. If the subprocess is in turn a Python program, proceed to the paragraph below to see how to disable output buffering for Python processes.
As for Python specifically, the usual problem is that Python buffers stdout and stderr by default when they are not attached to a terminal. The solution is to pass -u to Python when starting your program.
Also, you can just do for line in p.stdout instead of the tricky while loop.
P.S. Actually I tried running your code (with cmd = ['cat', '/dev/urandom']) without -u, and it output everything in real time already; this is on OS X 10.8.
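A hedged variant of the -u advice, using the equivalent PYTHONUNBUFFERED environment variable instead (child_script.py is a hypothetical child program):

import os
import subprocess
import sys

env = dict(os.environ, PYTHONUNBUFFERED='1')  # same effect as passing -u
p = subprocess.Popen([sys.executable, 'child_script.py'],
                     stdout=subprocess.PIPE, text=True, env=env)
for line in p.stdout:
    sys.stdout.write(line)
p.wait()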
If you just want stdout of your child process to go to your stdout, why not just have the child process inherit stdout from your process?
subprocess.Popen(cmd, stdout=None, stderr=subprocess.STDOUT)

python Popen: How do I block the execution of grep command until the content to grep is ready?

I have been fighting against Popen in python for couple of days now, so I decided to put all my doubts here, hopefully all of them can be clarified by python experts.
Initially I used Popen to execute a command and grep the result (as one command using a pipe, something like xxx | grep yyy) with shell=False; as you can imagine, that didn't work well. Following the guide in this post, I changed my code to the following:
checkCmd = ["sudo", "pyrit", "-r", self.capFile, "analyze"]
checkExec = Popen(checkCmd, shell=False, stdout=PIPE, stderr=STDOUT)
grepExec = Popen(["grep", "good"], stdin=checkExec.stdout, stdout=PIPE)
output = grepExec.stdout.readline()
output = grepExec.communicate()[0]
But I realized that the checkExec runs slowly and since Popen is non-blocking, grepExec always get executed before checkExec shows any result, thus the grep output would always be blank. How can I postpone the execution of grepExec till checkExec is finished?
In another Popen in my program, I tried to keep a service open at the back, so I use a separate thread to execute it. When all the tasks are done, I notify this thread to quit, and I explicitly call Popen.kill() to stop the service. However, my system ends up with a zombie process that is not reaped. I don't know if there's a nice way to clean up everything in this background thread after it finishes?
What are the differences between Popen.communicate()[0] and Popen.stdout.readline()? Can I use a loop to keep reading output from both of them?
Your example would work if you do it like this:
checkCmd = ["sudo", "pyrit", "-r", self.capFile, "analyze"]
checkExec = Popen(checkCmd, shell=False, stdout=PIPE, stderr=STDOUT)
grepExec = Popen(["grep", "good"], stdin=checkExec.stdout, stdout=PIPE)
for line in grepExec.stdout:
    print line  # do something with line
You use communicate when you want to give some input to a process and read all output on stdout, stderr of the process at the same time. This is probably not what you want for your case. communicate is more for the cases where you want to start an application, feed all the input it needs to it and read its output.
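A minimal sketch of that communicate() use case, feeding grep its whole input up front (Python 3):

from subprocess import Popen, PIPE

p = Popen(['grep', 'good'], stdin=PIPE, stdout=PIPE, text=True)
out, err = p.communicate(input='good line\nbad line\ngood again\n')
print(out, end='')   # only the lines containing 'good'; err is None since stderr was not piped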
As other answers have pointed out you can use shell=True to create the pipeline in your call to subprocess, but an alternative which I would prefer is to leverage python and instead of setting up a pipeline doing:
checkCmd = ["sudo", "pyrit", "-r", self.capFile, "analyze"]
checkExec = Popen(checkCmd, shell=False, stdout=PIPE, stderr=STDOUT)
for line in checkExec.stdout:
    if line.find('good') != -1:
        print line  # do something with the matched line here
Use the higher-level subprocess functions instead of Popen directly; then you can simplify things drastically by passing the complete command line.
http://docs.python.org/library/subprocess.html
eg.
import subprocess as sub

# call() runs the pipeline to completion and returns its exit code
proc = sub.call("cat file | grep string", executable="/bin/bash", shell=True)

Python monitoring stderr and stdout of a subprocess

I am trying to start a program (HandBrakeCLI) as a subprocess or thread from within Python 2.7. I have gotten as far as starting it, but I can't figure out how to monitor its stderr and stdout.
The program outputs its status (% done) and info about the encode to stderr and stdout, respectively. I'd like to be able to periodically retrieve the % done from the appropriate stream.
I've tried calling subprocess.Popen with stderr and stdout set to PIPE and using subprocess.communicate, but it sits and waits until the process is killed or completes, and only then retrieves the output. That doesn't do me much good.
I've got it up and running as a thread, but as far as I can tell I still have to eventually call subprocess.Popen to execute the program and run into the same wall.
Am I going about this the right way? What other options do I have or how to I get this to work as described?
I have accomplished the same with ffmpeg. This is a stripped down version of the relevant portions. bufsize=1 means line buffering and may not be needed.
import subprocess

def Run(command):
    proc = subprocess.Popen(command, bufsize=1,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return proc

def Trace(proc):
    while proc.poll() is None:
        line = proc.stdout.readline()
        if line:
            # Process output here
            print 'Read line', line

proc = Run([handbrakePath] + allOptions)
Trace(proc)
Edit 1: I noticed that the subprocess (handbrake in this case) needs to flush after lines to use this (ffmpeg does).
Edit 2: Some quick tests reveal that bufsize=1 may not be actually needed.
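To tie this back to the original goal of periodically retrieving the % done, here is a hypothetical sketch that reuses Run() from the code above. The regex is illustrative only; HandBrakeCLI's actual progress format may differ, and handbrakePath/allOptions are the names from the snippet above:

import re

proc = Run([handbrakePath] + allOptions)
percent_re = re.compile(r'(\d+(?:\.\d+)?)\s*%')  # illustrative pattern, not HandBrake-specific
for line in proc.stdout:
    m = percent_re.search(line)
    if m:
        print('progress: ' + m.group(1) + '%')
proc.wait()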
