I'm trying to do two things when executing a shell command with Python:
Capture stdout and print it as it happens
Capture stdout as a whole and process it when the command is complete
I looked at subprocess.check_output, but it does not have an stdout param that would allow me to print the output as it happens.
So after reading this question, I realized I may need to try a different approach.
from subprocess import Popen, PIPE
process = Popen(task_cmd, stdout = PIPE)
stdout, stderr = process.communicate()
print(stdout, stderr)
The problem with this approach is that according to the docs, Popen.communicate():
Read data from stdout and stderr, until end-of-file is reached.
Wait for process to terminate.
I still cannot seem to redirect output both to stdout AND to some sort of buffer that can be parsed when the command is complete.
Ideally, I'd like something like:
# captures the process output and dumps it to stdout in realtime
stdout_capture = Something(prints_to_stdout = True)
process = Popen(task_cmd, stdout = stdout_capture)
# prints the entire output of the executed process
print(stdout_capture.complete_capture)
Is there a recommended way to accomplish this?
You were on the right track giving Popen stdout=PIPE, but you can't use .communicate() for this, because it only returns the output after the process has finished. Instead, I suggest you read from .stdout directly.
The only guaranteed way to get the output the moment it's generated is to read from the pipe one character at a time. Here is my approach:
import subprocess
import sys

def passthrough_and_capture_output(args):
    # universal_newlines=True means the output of the process is interpreted as text
    process = subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True)
    capture = ""
    s = process.stdout.read(1)
    while len(s) > 0:
        sys.stdout.write(s)
        sys.stdout.flush()
        capture += s
        s = process.stdout.read(1)
    return capture
Note that reading one character at a time can incur significant overhead, so if you are alright with lagging behind a bit, I suggest that you replace the 1 in read(1) with a different number of characters to output in batches.
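For example, a minimal sketch of that batched variant (the chunk size of 64 is an arbitrary assumption, not anything prescribed above):

import subprocess
import sys

def passthrough_and_capture_batched(args, chunk_size=64):
    # Same idea as above, but read up to chunk_size characters per call instead of one.
    process = subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True)
    capture = ""
    chunk = process.stdout.read(chunk_size)
    while chunk:
        sys.stdout.write(chunk)
        sys.stdout.flush()
        capture += chunk
        chunk = process.stdout.read(chunk_size)
    return capture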
from subprocess import check_output, CalledProcessError

def shell_command(args):
    # Run the command and capture its whole output as one string,
    # even if it exits with a nonzero return code.
    try:
        res = check_output(args).decode()
    except CalledProcessError as e:
        res = e.output.decode()
    for r in ['\r', '\n\n']:
        res = res.replace(r, '')
    return res.strip()
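A hypothetical call, just to show the shape of the result (the command here is an arbitrary example):

# Returns the cleaned-up listing as a single stripped string.
listing = shell_command(['ls', '-l'])
print(listing)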
Related
I'm using Python's subprocess.communicate() to read stdout from a process that runs for about a minute.
How can I print out each line of that process's stdout in a streaming fashion, so that I can see the output as it's generated, but still block on the process terminating before continuing?
subprocess.communicate() appears to give all the output at once.
To get subprocess' output line by line as soon as the subprocess flushes its stdout buffer:
#!/usr/bin/env python2
from subprocess import Popen, PIPE
p = Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1)
with p.stdout:
    for line in iter(p.stdout.readline, b''):
        print line,
p.wait()  # wait for the subprocess to exit
iter() is used to read lines as soon as they are written, to work around the read-ahead bug in Python 2.
If the subprocess' stdout uses block buffering instead of line buffering in non-interactive mode (which delays the output until the child's buffer is full or is flushed explicitly by the child), then you could try to force unbuffered output using the pexpect or pty modules, or the unbuffer, stdbuf, or script utilities; see Q: Why not just use a pipe (popen())?
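For instance, a minimal sketch using the stdbuf utility (assuming it is available on your system and that the child uses C stdio buffering; "cmd" still stands in for your real command) to request line-buffered stdout from the child:

from subprocess import Popen, PIPE

# stdbuf -oL asks the child to line-buffer its stdout, so lines arrive promptly.
p = Popen(["stdbuf", "-oL", "cmd", "arg1"], stdout=PIPE, bufsize=1)
with p.stdout:
    for line in iter(p.stdout.readline, b''):
        print(line.decode(), end='')
p.wait()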
Here's Python 3 code:
#!/usr/bin/env python3
from subprocess import Popen, PIPE
with Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1,
           universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')
Note: unlike Python 2, which passes the subprocess' bytestrings through as-is, Python 3 uses text mode here (cmd's output is decoded using the locale.getpreferredencoding(False) encoding).
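If you'd rather not depend on the locale default, Python 3.6+ lets you pass an explicit encoding to Popen; a small sketch (the choice of utf-8 is an assumption, not something from the answer above):

from subprocess import Popen, PIPE

# encoding= switches the pipe to text mode with a specific codec instead of the locale default.
with Popen(["cmd", "arg1"], stdout=PIPE, bufsize=1, encoding='utf-8') as p:
    for line in p.stdout:
        print(line, end='')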
Please note, I think J.F. Sebastian's method (above) is better.
Here is a simple example (with no error checking):
import subprocess

proc = subprocess.Popen('ls',
                        shell=True,
                        stdout=subprocess.PIPE,
                        )
while proc.poll() is None:
    output = proc.stdout.readline()
    print output,
If ls ends too fast, then the while loop may end before you've read all the data.
You can catch the remainder in stdout this way:
output = proc.communicate()[0]
print output,
I believe the simplest way to collect output from a process in a streaming fashion is like this:
import sys
from subprocess import *
proc = Popen('ls', shell=True, stdout=PIPE)
while True:
    data = proc.stdout.readline()  # Alternatively proc.stdout.read(1024)
    if len(data) == 0:
        break
    sys.stdout.write(data)  # sys.stdout.buffer.write(data) on Python 3.x
The readline() or read() function should only return an empty string on EOF, after the process has terminated - otherwise it will block if there is nothing to read (readline() includes the newline, so on empty lines, it returns "\n"). This avoids the need for an awkward final communicate() call after the loop.
On files with very long lines read() may be preferable to reduce maximum memory usage - the number passed to it is arbitrary, but excluding it results in reading the entire pipe output at once which is probably not desirable.
If you want a non-blocking approach, don't use process.communicate(). If you set the subprocess.Popen() argument stdout to PIPE, you can read from process.stdout and check if the process still runs using process.poll().
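A rough sketch of that idea (my own illustration, not code from this thread; note that readline() itself still blocks until a line or EOF is available):

import subprocess
import sys

proc = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE, universal_newlines=True)
while proc.poll() is None:          # the process is still running
    line = proc.stdout.readline()   # blocks until the next line (or EOF) arrives
    if line:
        sys.stdout.write(line)
# Drain whatever is still buffered after the process has exited.
for line in proc.stdout:
    sys.stdout.write(line)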
If you're simply trying to pass the output through in realtime, it's hard to get simpler than this:
import subprocess
# This will raise a CalledProcessError if the program returns a nonzero code.
# You can use call() instead if you don't care about that case.
subprocess.check_call(['ls', '-l'])
See the docs for subprocess.check_call().
If you need to process the output, sure, loop on it. But if you don't, just keep it simple.
Edit: J.F. Sebastian points out both that the defaults for the stdout and stderr parameters pass through to sys.stdout and sys.stderr, and that this will fail if sys.stdout and sys.stderr have been replaced (say, for capturing output in tests).
import subprocess

myCommand = "ls -l"
cmd = myCommand.split()
# "universal newline support": \n, \r\n and \r are each interpreted as a newline.
p = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
while True:
    line = p.stderr.readline()
    if not line:  # readline() returns '' at EOF, so stop instead of looping forever
        break
    print(line.rstrip('\r\n'))
Adding another Python 3 solution with a few small changes:
Allows you to catch the exit code of the shell process (I have been unable to get the exit code while using the with construct)
Also pipes stderr out in real time
import subprocess
import sys

def subcall_stream(cmd, fail_on_error=True):
    # Run a shell command, streaming output to STDOUT in real time
    # Expects a list-style command, e.g. `["docker", "pull", "ubuntu"]`
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         bufsize=1, universal_newlines=True)
    for line in p.stdout:
        sys.stdout.write(line)
    p.wait()
    exit_code = p.returncode
    if exit_code != 0 and fail_on_error:
        raise RuntimeError(f"Shell command failed with exit code {exit_code}. Command: `{cmd}`")
    return exit_code
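A hypothetical call, just to show how it is used (the command is an arbitrary example):

# Streams the directory listing to the terminal, then returns the exit code (0 on success).
code = subcall_stream(["ls", "-l"])
print("exit code:", code)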
I'm trying to execute a bash command using Python. I want to display the command's live output to the user as well as capture it.
A sample example:
import subprocess

# This will store the output in result but print nothing to the terminal
result = subprocess.run(['ls', '-lh'], check=True, universal_newlines=True, stdout=subprocess.PIPE)
print(result.stdout)  # the captured output

# This will print everything to the terminal; result will hold no output
result = subprocess.run(['ls', '-lh'], check=True, universal_newlines=True)
print(result.stdout)  # OUTPUT = None
Here is one possibility which will gather output lines from a long-running process, write them to the terminal as it goes, and return them all when the process exits.
It returns a list of output lines, rather than the full block of text that check_output or run would return, but that's easily changed. Perhaps an IO buffer might be more efficient, depending on how much output you're expecting (see the sketch after the code below).
import subprocess
import sys

def capture_and_echo_stdout(cmd):
    """ Start a subprocess and write its stdout to stdout of this process.
        Capture and return stdout lines as a list. """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    stdout_lines = []
    for line in proc.stdout:
        sys.stdout.write(line.decode())
        stdout_lines.append(line)
    proc.communicate()
    # Roughly equivalent to check=True
    if proc.returncode:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return stdout_lines
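And a sketch of the IO-buffer variant mentioned above, assuming you'd rather get back one block of text than a list of lines (io.StringIO and the decode step are my additions, not part of the original function):

import io
import subprocess
import sys

def capture_and_echo_stdout_buffered(cmd):
    # Same idea as above, but accumulate the decoded output in an in-memory text buffer.
    buf = io.StringIO()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    for line in proc.stdout:
        text = line.decode()
        sys.stdout.write(text)
        buf.write(text)
    proc.communicate()
    if proc.returncode:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return buf.getvalue()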
There are a few similar options in this answer (although the focus there is more on writing to multiple files like unix tee): How to replicate tee behavior in Python when using subprocess?
I have created a dictionary where I associate an id with a subprocess.
Something like:
cmd = "ls"
processes[id] = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE)
Then I call a method, with this process map as input, that checks which processes have finished. If a process has finished, I check its stdout.read() for a particular string match.
The issue is that sometimes stdout.read() returns an empty value, which causes issues in the string matching.
Sample Code:
# Create a map
processes[id] = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE)
...
# Pass that map to a method which checks which processes have finished
completedProcesses(processes)

def completedProcesses(processes):
    processList = []
    for id, process in processes.iteritems():
        if process.poll() is not None:
            # If some error in process stdout then print id
            verifySuccessStatus(id, processes[id])
            processList.append(id)

def verifySuccessStatus(id, process):
    file = open(FAILED_IDS_FILE, 'a+')
    buffer = process.stdout.read()  # This returns an empty value sometimes
    if 'Error' not in buffer:
        file.write(id)
        file.write('\n')
    file.close()
I am new to Python, so I might be missing some understanding of how subprocess works internally.
There are at least two issues:
There is no point in calling process.stdout.read() more than once: .read() does not return until EOF, and after that it returns an empty string to indicate EOF.
You should read from the pipes while the processes are still running; otherwise they may hang if they generate enough output to fill the OS pipe buffers (~65K on my Linux box).
If you want to run multiple external processes concurrently and check their output after they are finished then see this answer that shows "thread pool" and async.io solutions.
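A rough sketch of the thread-pool idea (the helper name and the use of concurrent.futures are my assumptions, not code from the linked answer):

import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_and_capture(cmd):
    # communicate() keeps reading while the process runs, so the pipe cannot fill up.
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, universal_newlines=True)
    stdout, _ = proc.communicate()
    return proc.returncode, stdout

commands = {1: "ls", 2: "ls /nonexistent"}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pid: pool.submit(run_and_capture, cmd) for pid, cmd in commands.items()}

for pid, future in futures.items():
    code, out = future.result()
    if code != 0 or 'Error' in out:
        print("process", pid, "failed")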
Judging by your example command of ls, your issue may be caused by the stdout pipe filling up. Using the process.communicate() method handles this case for you, since you don't need to write multiple times to stdin.
# Recommend the future print function for easier file writing.
from __future__ import print_function

# Create a map
# Keeping access to 'stderr' is generally recommended, but not required.
# Also, if you don't know you need 'shell=True', it's safer practice not to use it.
processes[id] = subprocess.Popen(
    [cmd],
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
...
# Pass that map to a method which checks which processes have finished
check_processes(processes)

def check_processes(processes):
    process_ids = []
    # 'id' is a built-in function in python, so it's safer to use a different name.
    for idx, process in processes.iteritems():
        # When using pipes, communicate() will handle the case of the pipe
        # filling up for you.
        stdout, stderr = process.communicate()
        if not is_success(stdout):
            write_failed_id(idx)
        process_ids.append(idx)

def is_success(stdout):
    return 'Error' not in stdout

def write_failed_id(idx):
    # Recommend using a context manager when interacting with files.
    # Also, 'file' is a built-in function in python.
    with open(FAILED_IDS_FILE, 'a+') as fail_file:
        # The future print function makes file printing simpler.
        print(idx, file=fail_file)
You're only reading stdout and looking for "Error". Perhaps you should also be looking in stderr:
processes[id] = subprocess.Popen(
    [cmd],
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)
From the subprocess docs:
subprocess.STDOUT
Special value that can be used as the stderr argument to Popen and indicates that standard error should go into the same handle as standard output.
The process could have failed unexpectedly, returning no stdout but a non-zero return code. You can check this using process.returncode.
Popen.returncode
The child return code, set by poll() and wait() (and indirectly by communicate()). A None value indicates that the process hasn’t terminated yet.
A negative value -N indicates that the child was terminated by signal N (Unix only).
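For instance, a small sketch that checks both the captured output and the return code once poll() reports the process has finished (the variable names mirror the question; this is just an illustration):

# communicate() collects any remaining output and sets returncode.
stdout, _ = processes[id].communicate()
if processes[id].returncode != 0 or 'Error' in stdout.decode():
    print("process failed with return code", processes[id].returncode)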