redirecting shell output using subprocess - python

I have a python script which calls a lot of shell functions. The script can be run interactively from a terminal, in which case I'd like to display output right away, or called by crontab, in which case I'd like to email error output.
I wrote a helper function for calling shell functions:
import subprocess
import shlex
import sys

def shell(cmdline, interactive=True):
    args = shlex.split(cmdline.encode("ascii"))
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    val = proc.communicate()
    if interactive is True:
        if proc.returncode:
            print "returncode " + str(proc.returncode)
            print val[1]
            sys.exit(1)
        else:
            print val[0]
    else:
        if proc.returncode:
            print ""
            # send email with val[0] + val[1]

if __name__ == "__main__":
    # example of command that produces non-zero returncode
    shell("ls -z")
The problem I'm having is two-fold.
1) In interactive mode, when the shell command takes a while to finish (e.g. few minutes), I don't see anything until the command is completely done since communicate() buffers output. Is there a way to display output as it comes in, and avoid buffering? I also need a way to check the returncode, which is why I'm using communicate().
2) Some shell commands I call can produce a lot of output (e.g. 2MB). The documentation for communicate() says "do not use this method if the data size is large or unlimited." Does anyone know how large is "large"?

1) When you use communicate, you capture the output of the subprocess so nothing is sent to your standard output. The only reason why you see the output when the subprocess is finished is because you print it yourself.
Since you either want to see the output as it runs without capturing it, or capture everything and act on it only at the end, you can change how interactive mode works by leaving stdout and stderr as None (their default). This makes the subprocess use the same streams as your program. You'll also have to replace the call to communicate() with a call to wait():
if interactive is True:
    proc = subprocess.Popen(args)
    proc.wait()
    if proc.returncode:
        print "returncode " + str(proc.returncode)
        sys.exit(1)
else:
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    val = proc.communicate()
    if proc.returncode:
        print ""
        # send email with val[0] + val[1]
2) Too large is "too large to store in memory", so it all depends on a lot of factors. If storing temporarily 2MB of data in memory is fine in your situation, then there's nothing to worry about.
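If the output could grow beyond what you are comfortable holding in memory, you can stream it instead of buffering it. Here is a minimal sketch, assuming the command writes line-oriented text, that merging stderr into stdout is acceptable, and that you only need the return code once it finishes:

import subprocess

def shell_streaming(args):
    # Merge stderr into stdout and read line by line as output arrives,
    # instead of letting communicate() buffer everything in memory.
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    for line in iter(proc.stdout.readline, ""):
        print line.rstrip()
    proc.stdout.close()
    proc.wait()  # populates proc.returncode
    return proc.returncode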

Related

How to make a generic method in Python to execute multiple piped shell commands?

I have many shell commands that need to be executed in my Python script. I know that I shouldn't use shell=True, as mentioned here, and that I can use the standard outputs and inputs when I have pipes in the command, as mentioned here.
But the problem is that my shell commands are complex and full of pipes, so I'd like to make a generic method to be used by my script.
I made a small test below, but it hangs after printing the result (I simplified it to post here). Can somebody please let me know:
Why it is hanging.
If there's a better method of doing this.
Thanks.
PS: This is just a small portion of a big python project and there are business reasons why I'm trying to do this. Thanks.
#!/usr/bin/env python3
import subprocess as sub
from subprocess import Popen, PIPE
import shlex

def exec_cmd(cmd, p=None, isFirstLoop=True):
    if not isFirstLoop and not p:
        print("Error, p is null")
        exit()

    if "|" in cmd:
        cmds = cmd.split("|")
        while "|" in cmd:
            # separates what is before and what is after the first pipe
            now_cmd = cmd.split('|', 1)[0].strip()
            next_cmd = cmd.split('|', 1)[-1].strip()

            try:
                if isFirstLoop:
                    p1 = sub.Popen(shlex.split(now_cmd), stdout=PIPE)
                    exec_cmd(next_cmd, p1, False)
                else:
                    p2 = sub.Popen(shlex.split(now_cmd), stdin=p.stdout, stdout=PIPE)
                    exec_cmd(next_cmd, p2, False)
            except Exception as e:
                print("Error executing command '{0}'.\nOutput:\n:{1}".format(cmd, str(e)))
                exit()

            # Adjust cmd to execute the next part
            cmd = next_cmd
    else:
        proc = sub.Popen(shlex.split(cmd), stdin=p.stdout, stdout=PIPE, universal_newlines=True)
        (out, err) = proc.communicate()
        if err:
            print(str(err).strip())
        else:
            print(out)

exec_cmd("ls -ltrh | awk '{print $9}' | wc -l ")
Instead of using a shell string and trying to parse it with your own means, I'd ask the user to provide the commands as separate entities themselves. This avoids the obvious trap of detecting a | that is part of a command and not used as a shell pipe. Whether you ask them to provide commands as lists of strings or as single strings that you shlex.split afterwards is up to the interface you want to expose; I'd choose the first one for its simplicity in the following example.
Once you have the individual commands, a simple for loop is enough to pipe outputs of the previous commands to inputs of the next ones, as you have found yourself:
import subprocess

def pipe_subprocesses(*commands):
    if not commands:
        return

    next_input = None
    for command in commands:
        p = subprocess.Popen(command, stdin=next_input, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        next_input = p.stdout

    out, err = p.communicate()
    if err:
        print(err.decode().strip())
    else:
        print(out.decode())
Usage being:
>>> pipe_subprocesses(['ls', '-lhtr'], ['awk', '{print $9}'], ['wc', '-l'])
25
Now this is a quick and dirty way to get it set up and have it seemingly work as you want. But there are at least two issues with this code:
You leak zombie processes/open process handles, because no process's exit code but the last one is collected, and the OS keeps resources open until you do so;
You can't access the information of a process that fails midway through.
To avoid that, you need to maintain a list of opened processes and explicitly wait for each of them. And because I don't know your exact use case, I'll just return the first process that failed (if any) or the last process (if not) so you can act accordingly:
import shlex
import subprocess

def pipe_subprocesses(*commands):
    if not commands:
        return

    processes = []
    next_input = None
    for command in commands:
        if isinstance(command, str):
            command = shlex.split(command)
        p = subprocess.Popen(command, stdin=next_input, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        next_input = p.stdout
        processes.append(p)

    for p in processes:
        p.wait()

    for p in processes:
        if p.returncode != 0:
            return p
    return p  # return the last process in case everything went well
I also threw in some shlex handling as an example so you can mix raw strings and already-parsed lists:
>>> pipe_subprocesses('ls -lhtr', ['awk', '{print $9}'], 'wc -l')
25
This unfortunately has a few edge cases in it that the shell takes care of for you, or alternatively, that the shell completely ignores for you. Some concerns:
The function should always wait() for every process to finish, or else you will get what are called zombie processes.
The commands should be connected to each other using real pipes, that way the entire output doesn't need to be read into memory at once. This is the normal way pipes work.
The read end of every pipe should be closed in the parent process, so children can properly SIGPIPE when the next process closes its input. Without this, the parent process can keep the pipe open and the child does not know to exit, and it may run forever.
Errors in child processes should be raised as exceptions, except SIGPIPE. It is left as an exercise to the reader to raise exceptions for SIGPIPE on the final process because SIGPIPE is not expected there, but ignoring it is not harmful.
Note that subprocess.DEVNULL does not exist prior to Python 3.3. I know there are some of you out there still living with 2.x, you will have to open a file for /dev/null manually or just decide that the first process in the pipeline gets to share stdin with the parent process.
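For those older versions, a minimal sketch of that fallback (just opening /dev/null by hand to use as stdin for the first process) might look like this:

import os

# Python 2 fallback for subprocess.DEVNULL: open /dev/null manually.
devnull = open(os.devnull, "rb")
# ...pass stdin=devnull to the first Popen in the pipeline, then devnull.close() when done.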
Here is the code:
import signal
import subprocess

def run_pipe(*cmds):
    """Run a pipe that chains several commands together."""
    pipe = subprocess.DEVNULL
    procs = []
    try:
        for cmd in cmds:
            proc = subprocess.Popen(cmd, stdin=pipe,
                                    stdout=subprocess.PIPE)
            procs.append(proc)
            if pipe is not subprocess.DEVNULL:
                pipe.close()
            pipe = proc.stdout
        stdout, _ = proc.communicate()
    finally:
        # Must call wait() on every process, otherwise you get
        # zombies.
        for proc in procs:
            proc.wait()
    # Fail if any command in the pipe failed, except due to SIGPIPE
    # which is expected.
    for proc in procs:
        if (proc.returncode
                and proc.returncode != -signal.SIGPIPE):
            raise subprocess.CalledProcessError(
                proc.returncode, proc.args)
    return stdout
Here we can see it in action. You can see that the pipeline correctly terminates with yes (which runs until SIGPIPE) and correctly fails with false (which always fails).
In [1]: run_pipe(["yes"], ["head", "-n", "1"])
Out[1]: b'y\n'
In [2]: run_pipe(["false"], ["true"])
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-2-db97c6876cd7> in <module>()
----> 1 run_pipe(["false"], ["true"])
~/test.py in run_pipe(*cmds)
22 for proc in procs:
23 if proc.returncode and proc.returncode != -signal.SIGPIPE:
---> 24 raise subprocess.CalledProcessError(proc.returncode, proc.args)
25 return stdout
CalledProcessError: Command '['false']' returned non-zero exit status 1

Python: Capture stdout from subprocess.call

I'm trying to do two things when executing a shell cmd with Python:
Capture stdout and print it as it happens
Capture stdout as a whole and process it when the cmd is complete
I looked at subprocess.check_output, but it does not have an stdout param that would allow me to print the output as it happens.
So after reading this question, I realized I may need to try a different approach.
from subprocess import Popen, PIPE
process = Popen(task_cmd, stdout = PIPE)
stdout, stderr = process.communicate()
print(stdout, stderr)
The problem with this approach is that according to the docs, Popen.communicate():
Reads data from stdout and stderr, until end-of-file is reached.
Wait for process to terminate
I still cannot seem to redirect output both to stdout AND to some sort of buffer that can be parsed when the command is complete.
Ideally, I'd like something like:
# captures the process output and dumps it to stdout in realtime
stdout_capture = Something(prints_to_stdout = True)
process = Popen(task_cmd, stdout = stdout_capture)
# prints the entire output of the executed process
print(stdout_capture.complete_capture)
Is there a recommended way to accomplish this?
You were on the right track with giving Popen stdout=PIPE, but you can't use .communicate() because it only returns the values after execution finishes. Instead, I suggest you read from .stdout.
The only guaranteed way to get the output the moment it's generated is to read from the pipe one character at a time. Here is my approach:
def passthrough_and_capture_output(args):
    import sys
    import subprocess
    process = subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True)
    # universal_newlines means that the output of the process will be interpreted as text
    capture = ""
    s = process.stdout.read(1)
    while len(s) > 0:
        sys.stdout.write(s)
        sys.stdout.flush()
        capture += s
        s = process.stdout.read(1)
    return capture
Note that reading one character at a time can incur significant overhead, so if you are alright with lagging behind a bit, I suggest that you replace the 1 in read(1) with a different number of characters to output in batches.
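For example, here is a sketch of that batched variant; the function name and the 1024-character default are just illustrative:

import sys
import subprocess

def passthrough_and_capture_output_batched(args, batch_size=1024):
    # Same idea as above, but each read fetches a batch_size chunk,
    # trading a little latency for far fewer read/write calls.
    process = subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True)
    capture = ""
    s = process.stdout.read(batch_size)
    while len(s) > 0:
        sys.stdout.write(s)
        sys.stdout.flush()
        capture += s
        s = process.stdout.read(batch_size)
    return capture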
from subprocess import check_output, CalledProcessError

def shell_command(args):
    try:
        res = check_output(args).decode()
    except CalledProcessError as e:
        res = e.output.decode()
    for r in ['\r', '\n\n']:
        res = res.replace(r, '')
    return res.strip()

Python: subprocess32 process.stdout.readline() waiting time

If I run the following function "run" with for example "ls -Rlah /" I get output immediately via the print statement as expected
import subprocess32 as subprocess

def run(command):
    process = subprocess.Popen(command,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    try:
        while process.poll() == None:
            print process.stdout.readline()
    finally:
        # Handle the scenario if the parent
        # process has terminated before this subprocess
        if process.poll():
            process.kill()
However if I use the python example program below it seems to be stuck on either process.poll() or process.stdout.readline() until the program has finished. I think it is stdout.readline() since if I increase the number of strings to output from 10 to 10000 (in the example program) or add in a sys.stdout.flush() just after every print, the print in the run function does get executed.
How can I make the output from a subprocess more real-timeish?
Note: I have just discovered that the Python example program does not perform a sys.stdout.flush() when it outputs; is there a way for the caller of subprocess to enforce this somehow?
Example program which outputs 10 strings every 5 seconds.
#!/bin/env python
import time

if __name__ == "__main__":
    i = 0
    start = time.time()
    while True:
        if time.time() - start >= 5:
            for _ in range(10):
                print "hello world" + str(i)
            start = time.time()
            i += 1
        if i >= 3:
            break
On most systems, command line programs line buffer or block buffer depending on whether stdout is a terminal or a pipe. On unixy systems, the parent process can create a pseudo-terminal to get terminal-like behavior even though the child isn't really run from a terminal. You can use the pty module to create a pseudo-terminal or use the pexpect module which eases access to interactive programs.
As mentioned in the comments, using poll() to decide when to read lines can result in lost data; one example is data left in the stdout pipe when the process terminates. Reading from a pty is a bit different from reading from pipes, and you'll find you need to catch an IOError when the child closes the terminal to get it all to work properly, as in the example below.
try:
    import subprocess32 as subprocess
except ImportError:
    import subprocess
import pty
import sys
import os
import time
import errno

print("running %s" % sys.argv[1])

m, s = (os.fdopen(pipe) for pipe in pty.openpty())
process = subprocess.Popen([sys.argv[1]],
                           stdin=s,
                           stdout=s,
                           stderr=subprocess.STDOUT)
s.close()

try:
    graceful = False
    while True:
        line = m.readline()
        print line.rstrip()
except IOError, e:
    if e.errno != errno.EIO:
        raise
    graceful = True
finally:
    # Handle the scenario if the parent
    # process has terminated before this subprocess
    m.close()
    if not graceful:
        process.kill()
process.wait()
You should flush standard output in your script:
print "hello world" + str(i)
sys.stdout.flush()
When standard output is a terminal, stdout is line-buffered. But when it is not, stdout is block buffered and you need to flush it explicitly.
If you can't change the source of your script, you can use the -u option of Python (in the subprocess):
-u Force stdin, stdout and stderr to be totally unbuffered.
Your command should be: ['python', '-u', 'script.py']
In general, this kind of buffering happens in userspace. There are no generic ways to force an application to flush its buffers: some applications support command line options (like Python), others support signals, others do not support anything.
One solution might be to emulate a pseudo terminal, giving "hints" to the programs that they should operate in line-buffered mode. Still, this is not a solution that works in every case.
For things other than python you could try using unbuffer:
unbuffer disables the output buffering that occurs when program output is redirected from non-interactive programs. For example, suppose you are watching the output from a fifo by running it through od and then more.
od -c /tmp/fifo | more
You will not see anything until a full page of output has been produced.
You can disable this automatic buffering as follows:
unbuffer od -c /tmp/fifo | more
Normally, unbuffer does not read from stdin. This simplifies use of unbuffer in some situations. To use unbuffer in a pipeline, use the -p flag. Example:
process1 | unbuffer -p process2 | process3
So in your case:
run(["unbuffer",cmd])
There are some caveats listed in the docs but it is another option.

stdout.read() from finished subprocess sometimes returning empty?

I have created a dictionary where I associate an id with a subprocess.
Something like:
cmd = "ls"
processes[id] = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE)
Then I call a method with this process map as an input, that checks which process has finished. If the process finishes, I check the process's stdout.read() for a particular string match.
The issue is sometimes stdout.read() returns an empty value which causes issues in string matching.
Sample Code:
# Create a map
processes[id] = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE)
...
# Pass that map to a method which checks which processes have finished
completedProcesses(processes)

def completedProcesses(processes):
    processList = []
    for id, process in processes.iteritems():
        if process.poll() is not None:
            # If some error in process stdout then print id
            verifySuccessStatus(id, processes[id])
            processList.add(id)

def verifySuccessStatus(id, process):
    file = open(FAILED_IDS_FILE, 'a+')
    buffer = process.stdout.read()  # This returns empty value sometime
    if 'Error' not in buffer:
        file.write(id)
        file.write('\n')
        file.close()
I am new to Python; I might be missing some understanding of how subprocess works internally.
There are at least two issues:
There is no point in calling process.stdout.read() more than once: .read() does not return until EOF, and after that it returns an empty string to indicate EOF.
You should read from the pipes while the processes are still running; otherwise they may hang if they generate enough output to fill the OS pipe buffers (~65K on my Linux box).
If you want to run multiple external processes concurrently and check their output after they are finished then see this answer that shows "thread pool" and async.io solutions.
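As a rough sketch of the thread-pool idea (assuming each entry is a shell command string and you only need its combined output and exit code after it finishes; the names here are illustrative):

import subprocess
from multiprocessing.pool import ThreadPool

def run_one(cmd):
    # communicate() drains the pipe while the process runs, so the OS
    # pipe buffer cannot fill up and block the child.
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return cmd, proc.returncode, out

commands = ["ls", "ls /nonexistent"]
for cmd, code, out in ThreadPool(len(commands)).map(run_one, commands):
    print "%s exited with %d" % (cmd, code)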
Judging by your example command of ls, your issue may be caused by the stdout pipe filling up. Using the process.communicate() method handles this case for you, since you don't need to write multiple times to stdin.
# Recommend the future print function for easier file writing.
from __future__ import print_function

# Create a map
# Keeping access to 'stderr' is generally recommended, but not required.
# Also, if you don't know you need 'shell=True', it's safer practice not to use it.
processes[id] = subprocess.Popen(
    [cmd],
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
...
# Pass that map to a method which checks which processes have finished
check_processes(processes)

def check_processes(processes):
    process_ids = []
    # 'id' is a built-in function in python, so it's safer to use a different name.
    for idx, process in processes.iteritems():
        # When using pipes, communicate() will handle the case of the pipe
        # filling up for you.
        stdout, stderr = process.communicate()
        if not is_success(stdout):
            write_failed_id(idx)
            process_ids.append(idx)

def is_success(stdout):
    return 'Error' not in stdout

def write_failed_id(idx):
    # Recommend using a context manager when interacting with files.
    # Also, 'file' is a built-in name in python.
    with open(FAILED_IDS_FILE, 'a+') as fail_file:
        # The future print function makes file printing simpler.
        print(idx, file=fail_file)
You're only reading stdout and looking for "Error". Perhaps you should also be looking in stderr:
processes[id] = subprocess.Popen(
    [cmd],
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)
From the subprocess docs:
subprocess.STDOUT
Special value that can be used as the stderr argument to Popen and indicates that standard error should go into the same handle as standard output.
The process could have failed unexpectedly, returning no stdout but a non-zero return code. You can check this using process.returncode.
Popen.returncode
The child return code, set by poll() and wait() (and indirectly by communicate()). A None value indicates that the process hasn’t terminated yet.
A negative value -N indicates that the child was terminated by signal N (Unix only).
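A rough sketch of that check, fitted to the poll()-based loop in the question (the message printed is just illustrative):

for id, process in processes.iteritems():
    if process.poll() is not None:       # the process has finished
        output = process.stdout.read()   # read the pipe once, after exit
        if process.returncode != 0:
            print "process %s failed with return code %d" % (id, process.returncode)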

Handling stdin and stdout

I'm trying to use subprocess to handle streams. I need to write data to the stream and be able to read from it asynchronously (before the program finishes, because mine will take minutes to complete, though it produces output along the way).
For the learn case, I've been using the timeout command from Windows 7:
import subprocess
import time

args = ['timeout', '5']
p = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=False)
p.stdin.write('\n')  # this is supposed to mimic Enter button pressed event.
while True:
    print p.stdout.read()  # expected this to print output interactively. This actually hangs.
    time.sleep(1)
Where am I wrong?
This line:
print p.stdout.read()  # expected this to print output interactively. This actually hangs.
hangs because read() means "read all data until EOF". See the documentation. It seems like you may have wanted to read a line at a time:
print p.stdout.readline()
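A minimal sketch of the loop with that change (it drops the sleep, since readline() already blocks until output is available, and stops once the process has exited and its output is drained):

while True:
    line = p.stdout.readline()      # blocks until a full line or EOF ('')
    if line:
        print line.rstrip()
    elif p.poll() is not None:      # EOF and the process has finished
        break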
