Python: subprocess32 process.stdout.readline() waiting time

If I run the following function "run" with, for example, "ls -Rlah /", I get output immediately via the print statement, as expected:
import subprocess32 as subprocess

def run(command):
    process = subprocess.Popen(command,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    try:
        while process.poll() is None:
            print process.stdout.readline()
    finally:
        # Handle the scenario if the parent
        # process has terminated before this subprocess
        if process.poll() is None:
            process.kill()
However, if I use the Python example program below, it seems to be stuck on either process.poll() or process.stdout.readline() until the program has finished. I think the culprit is stdout.readline(), since the print in the run function does get executed if I increase the number of strings to output from 10 to 10000 (in the example program) or add a sys.stdout.flush() just after every print.
How can I make the output from a subprocess more real-timeish?
Note: I have just discovered that the Python example program does not perform a sys.stdout.flush() when it outputs. Is there a way for the caller of subprocess to enforce this somehow?
Example program which outputs 10 strings every 5 seconds.
#!/usr/bin/env python
import time

if __name__ == "__main__":
    i = 0
    start = time.time()
    while True:
        if time.time() - start >= 5:
            for _ in range(10):
                print "hello world" + str(i)
            start = time.time()
            i += 1
            if i >= 3:
                break

On most systems, command line programs line-buffer or block-buffer their output depending on whether stdout is a terminal or a pipe. On unixy systems, the parent process can create a pseudo-terminal to get terminal-like behavior even though the child isn't really run from a terminal. You can use the pty module to create a pseudo-terminal, or use the pexpect module, which eases access to interactive programs.
As mentioned in the comments, using poll() to decide when to stop reading lines can result in lost data; one example is data left in the stdout pipe when the process terminates. Reading from a pty is a bit different from reading from a pipe: you'll find you need to catch an IOError when the child closes its end to get it all to work properly, as in the example below.
try:
    import subprocess32 as subprocess
except ImportError:
    import subprocess
import pty
import sys
import os
import errno

print("running %s" % sys.argv[1])

m, s = (os.fdopen(fd) for fd in pty.openpty())
process = subprocess.Popen([sys.argv[1]],
                           stdin=s,
                           stdout=s,
                           stderr=subprocess.STDOUT)
s.close()

try:
    graceful = False
    while True:
        line = m.readline()
        print line.rstrip()
except IOError, e:
    # reading a pty raises EIO when the child closes its end
    if e.errno != errno.EIO:
        raise
    graceful = True
finally:
    # Handle the scenario if the parent
    # process has terminated before this subprocess
    m.close()
    if not graceful:
        process.kill()
    process.wait()
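For comparison, here is a rough sketch of the same loop using pexpect (assuming the pexpect package is installed); spawn allocates the pseudo-terminal internally, so the EIO handling above isn't needed:

import sys
import pexpect

child = pexpect.spawn(sys.argv[1])
while True:
    line = child.readline()  # returns '' at EOF on the pty
    if not line:
        break
    print line.rstrip()
child.close()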

You should flush standard output in your script:
print "hello world" + str(i)
sys.stdout.flush()
When standard output is a terminal, stdout is line-buffered. But when it is not, stdout is block buffered and you need to flush it explicitly.
If you can't change the source of your script, you can use the -u option of Python (in the subprocess):
-u Force stdin, stdout and stderr to be totally unbuffered.
Your command should be: ['python', '-u', 'script.py']
In general, this kind of buffering happens in userspace. There are no generic ways to force an application to flush its buffers: some applications support command line options (like Python), others support signals, others do not support anything.
One solution might be to emulate a pseudo terminal, giving "hints" to the programs that they should operate in line-buffered mode. Still, this is not a solution that works in every case.
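For the Python-child case specifically, here is a minimal sketch of both options from the caller's side (script.py is a stand-in name for the example program):

import os
import subprocess

# Option 1: pass -u so the child Python never buffers stdout.
p = subprocess.Popen(['python', '-u', 'script.py'],
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

# Option 2: any non-empty PYTHONUNBUFFERED in the environment has the
# same effect, without touching the command line.
env = dict(os.environ, PYTHONUNBUFFERED='1')
p = subprocess.Popen(['python', 'script.py'], env=env,
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)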

For things other than Python, you could try using unbuffer:
unbuffer disables the output buffering that occurs when program output is redirected from non-interactive programs. For example, suppose you are watching the output from a fifo by running it through od and then more.
od -c /tmp/fifo | more
You will not see anything until a full page of output has been produced.
You can disable this automatic buffering as follows:
unbuffer od -c /tmp/fifo | more
Normally, unbuffer does not read from stdin. This simplifies use of unbuffer in some situations. To use unbuffer in a pipeline, use the -p flag. Example:
process1 | unbuffer -p process2 | process3
So in your case:
run(["unbuffer",cmd])
There are some caveats listed in the docs but it is another option.
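One detail worth spelling out: if cmd is an argument list rather than a single word, it should be spliced after unbuffer, not nested inside another list. A sketch using the run() helper from the first question:

cmd = ['python', 'script.py']   # illustrative command
run(['unbuffer'] + cmd)         # not run(['unbuffer', cmd])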

Related

Python Popen _with_ realtime input/output control

I have searched and experimented for over an hour on this, and there doesn't seem to be a way to both feed a 'here document' to a subprocess and get its output line by line as it occurs:
python = '''var="some character text"
print(var)
print(var)
exit()
'''

from subprocess import Popen, PIPE, STDOUT
import shlex

def run_process(command):
    p = Popen(shlex.split(command), stdin=PIPE, stdout=PIPE, stderr=STDOUT)
    p.stdin.write(python)
    while True:
        output = p.stdout.readline()
        if output == '' and p.poll() is not None:
            break
        if output:
            print output.strip()
    rc = p.poll()
    return rc

run_process("/usr/bin/python")
The above code hangs indefinitely. Yes, it's a snake eating its tail, but it was just to prove the concept.
The problem is my subprocess takes a LONG time to run and I need to be able to see the output without waiting hours to figure out if anything is wrong. Any hints? Thanks.
The Python interpreter behaves differently when run in interactive vs. non-interactive mode. From the python(1) manual page:
In non-interactive mode, the entire input is parsed before it is executed.
Of course, “entire input” is delimited by EOF, and your program never sends an EOF, which is why it hangs.
Python runs in interactive mode if its stdin is a tty. You can use the Ptyprocess library to spawn a process with a tty as stdin. Or use the Pexpect library (based on Ptyprocess), which even includes ready-made REPL wrappers for Python and other programs.
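As a minimal sketch (assuming a reasonably recent pexpect, which ships the replwrap module), the ready-made Python REPL wrapper looks like this:

from pexpect import replwrap

repl = replwrap.python()  # spawns python with a pty as its stdin
repl.run_command('var = "some character text"')
print(repl.run_command('print(var)'))  # output comes back per command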
But if you replace Python with sed — which of course doesn’t have an interactive mode — the program still doesn’t work:
sed = '''this is a foo!\n
another foo!\n
'''

from subprocess import Popen, PIPE, STDOUT
import shlex

def run_process(command):
    p = Popen(shlex.split(command), stdin=PIPE, stdout=PIPE, stderr=STDOUT)
    p.stdin.write(sed)
    while True:
        output = p.stdout.readline()
        if output == '' and p.poll() is not None:
            break
        if output:
            print output.strip()
    rc = p.poll()
    return rc

run_process("/bin/sed -e 's/foo/bar/g'")
This is caused by a different problem: output buffering in sed. Some programs have options to disable buffering. In particular, both sed and Python have a -u option, which solves this problem:
run_process("/bin/sed -ue 's/foo/bar/g'")

Communicate with process send key in subprocess linux

I have one sh file and I need to install it on a target Linux box, so I'm in the process of writing an automatic installation for the sh file, which requires a lot of input from the user. For example, when I first run ./file.sh it shows a big paragraph and asks the user to press Enter. I'm stuck at this point: how do I send key data to the subprocess? Here is what I've tried.
import subprocess

def runProcess(exe):
    global p
    p = subprocess.Popen(exe, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    while True:
        retcode = p.poll()  # returns None while subprocess is running
        line = p.stdout.readline()
        yield line
        if retcode is not None:
            break

for line in runProcess('./file.sh'.split()):
    if '[Enter]' in line:
        print line + 'got it'
        p.communicate('\r')
Correct me if my understanding is wrong; pardon me if this is a duplicate.
If you need to send a bunch of newlines and nothing else, you need to:
* Make sure the stdin for the Popen is a pipe
* Send the newlines without causing a deadlock
Your current code does neither. Something that might work (assuming the child isn't using APIs that require direct interaction with a tty, rather than just reading stdin):
import subprocess
import threading

def feednewlines(f):
    try:
        # Write as many newlines as it will take
        while True:
            f.write(b'\n')  # Write newline, not carriage return
            f.flush()       # Flush to ensure it's sent as quickly as possible
    except OSError:
        return              # Done when pipe closed/process exited

def runProcess(exe):
    global p
    # Get stdin as pipe too
    p = subprocess.Popen(exe, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Use thread to just feed as many newlines as needed to stdin of subprocess
    feeder = threading.Thread(target=feednewlines, args=(p.stdin,))
    feeder.daemon = True
    feeder.start()
    # No need to poll, just read until it closes stdout or exits
    for line in p.stdout:
        yield line
    p.stdin.close()  # Stop feeding (causes thread to error and exit)
    p.wait()         # Clean up process

# Iterate output, and echo when [Enter] seen
for line in runProcess('./file.sh'.split()):
    if '[Enter]' in line:
        print line + 'got it'
For the case where you need to customize the responses, you're going to need to add communication between parent and feeder thread, which makes this uglier, and it only works if the child process is properly flushing its output when it prompts you, even when not connected to a terminal. You might do something like this to define a global queue:
import queue # Queue on Python 2
feederqueue = queue.Queue()
then change the feeder function to:
def feednewlines(f):
    try:
        while True:
            f.write(feederqueue.get())
            f.flush()
    except OSError:
        return
and change the global code lower down to:
for line in runProcess('./file.sh'.split()):
    if '[Enter]' in line:
        print line + 'got it'
        feederqueue.put(b'\n')
    elif 'THING THAT REQUIRES YOU TO TYPE FOO' in line:
        feederqueue.put(b'foo\n')
etc.
Command line programs run differently when they are run in a terminal versus when they are run in the background. If the program is attached to a terminal, it runs in an interactive, line-buffered mode expecting user interaction. If stdin is a file or a pipe, it runs in block-buffered mode, where writes are delayed until a certain block size is buffered. Your program will never see the [Enter] prompt because it uses pipes and the data is still in the subprocess's output buffer.
The python pexpect module solves this problem by emulating a terminal and allowing you to interact with the program with a series of "expect" statements.
Suppose we want to run a test program
#!/usr/bin/env python3
data = input('[Enter]')
print(data)
It's pretty boring: it prompts for data, prints it, then exits. We can run it with pexpect:
#!/usr/bin/env python3
import pexpect
# run the program
p = pexpect.spawn('./test.py')
# we don't need to see our input to the program echoed back
p.setecho(False)
# read lines until the desired program output is seen
p.expect(r'\[Enter\]')
# send some data to the program
p.sendline('inner data')
# wait for it to exit
p.expect(pexpect.EOF)
# show everything since the previous expect
print(p.before)
print('outer done')

Filter out command that needs a terminal in Python subprocess module

I am developing a robot that accepts commands from network (XMPP) and uses subprocess module in Python to execute them and sends back the output of commands. Essentially it is an SSH-like XMPP-based non-interactive shell.
The robot only executes commands from authenticated trusted sources, so arbitrary shell commands are allowed (shell=True).
However, when I accidentally send some command that needs a tty, the robot is stuck.
For example:
subprocess.check_output(['vim'], shell=False)
subprocess.check_output('vim', shell=True)
Should either of the above commands be received, the robot gets stuck, and the terminal from which the robot is run is broken.
Though the robot only executes commands from authenticated trusted sources, humans err. How could I make the robot filter out those commands that would break it? I know there is os.isatty, but how could I utilize it? Is there a way to detect those "bad" commands and refuse to execute them?
TL;DR:
Say, there are two kinds of commands:
* Commands like ls: do not need a tty to run.
* Commands like vim: need a tty; break the subprocess call if no tty is given.
How could I tell whether a command is ls-like or vim-like, and refuse to run it if it is vim-like?
What you expect is a function that receives a command as input and returns meaningful output by running the command.
Since the command is arbitrary, requiring a tty is just one of many bad cases that may happen (others include running an infinite loop). Your function should therefore only concern itself with the running period; in other words, whether a command is "bad" should be determined by whether it ends in a limited time. Since subprocess is asynchronous by nature, you can just run the command and handle it from a higher vantage point.
Demo code to play with; you can change the cmd value to see how it performs differently:
#!/usr/bin/env python
# coding: utf-8

import time
import subprocess
from subprocess import PIPE

#cmd = ['ls']
#cmd = ['sleep', '3']
cmd = ['vim', '-u', '/dev/null']

print 'call cmd'
# note: no shell=True here; with shell=True and a list argument,
# only the first element of the list would actually be run
p = subprocess.Popen(cmd,
                     stdin=PIPE, stderr=PIPE, stdout=PIPE)
print 'called', p

time_limit = 2
timer = 0
time_gap = 0.2
ended = False
while True:
    time.sleep(time_gap)
    returncode = p.poll()
    print 'process status', returncode
    timer += time_gap
    if timer >= time_limit:
        print 'timeout, kill process'
        p.kill()
        break
    if returncode is not None:
        ended = True
        break

if ended:
    print 'process ended by', returncode
    print 'read'
    out, err = p.communicate()
    print 'out', repr(out)
    print 'error', repr(err)
else:
    print 'process failed'
Three points are notable in the above code:
* We use Popen instead of check_output to run the command. Unlike check_output, which waits for the process to end, Popen returns immediately, so we can do further things to control the process.
* We implement a timer to check the process's status. If it runs for too long, we kill it manually, because we consider a process meaningless if it cannot finish in a limited time. This solves your original problem: vim never ends on its own, so it will definitely be killed as a "meaningless" command.
* After the timer has filtered out bad commands, we can get the stdout and stderr of the command by calling the communicate method of the Popen object; after that it is your choice what to return to the user.
Conclusion
tty simulation is not needed. Run the subprocess asynchronously, then control it with a timer to decide whether it should be killed; for commands that end normally, it is safe and easy to get the output.
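If the subprocess32 module from the first question is available (or Python 3), the same timeout-and-kill policy can be written without a manual polling loop; a sketch:

import subprocess32 as subprocess

p = subprocess.Popen(['vim', '-u', '/dev/null'],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
try:
    out, err = p.communicate(timeout=2)   # wait at most 2 seconds
except subprocess.TimeoutExpired:
    p.kill()                              # same "meaningless command" policy
    out, err = p.communicate()            # reap the process, drain the pipes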
Well, SSH is already a tool that allows users to remotely execute commands and be authenticated at the same time. The authentication piece is extremely tricky; please be aware that building the software you're describing is a bit risky from a security perspective.
There isn't a way to determine whether a process is going to need a tty or not. And os.isatty won't help, because even if you ran a subprocess that needed one, that wouldn't mean there was one. :)
In general, it would probably be safer from a security perspective, and also a solution to this problem, if you were to consider a whitelist of commands. You could choose that whitelist to avoid things that need a tty, because I don't think you'll easily get around this.
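A minimal sketch of that idea (the allowed commands below are purely illustrative):

import shlex
import subprocess

ALLOWED = set(['ls', 'df', 'uptime', 'date'])

def run_whitelisted(cmdline):
    args = shlex.split(cmdline)
    if not args or args[0] not in ALLOWED:
        raise ValueError('command not in whitelist: %r' % cmdline)
    # shell=False on purpose: a whitelist is pointless if shell
    # metacharacters can smuggle in extra commands
    return subprocess.check_output(args)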
Thanks a lot to @J.F. Sebastian for his help (see the comments under the question); I've found a solution (workaround?) for my case.
The reason vim breaks the terminal while ls does not is that vim needs a tty. As Sebastian says, we can feed vim a pty using pty.openpty(). Feeding a pty guarantees the command will not break the terminal, and we can add a timeout to auto-kill such processes. Here is a (dirty) working example:
#!/usr/bin/env python3
import pty
from subprocess import STDOUT, check_output, TimeoutExpired

master_fd, slave_fd = pty.openpty()

try:
    output1 = check_output(['ls', '/'], stdin=slave_fd, stderr=STDOUT,
                           universal_newlines=True, timeout=3)
    print(output1)
except TimeoutExpired:
    print('Timed out')

try:
    output2 = check_output(['vim'], stdin=slave_fd, stderr=STDOUT,
                           universal_newlines=True, timeout=3)
    print(output2)
except TimeoutExpired:
    print('Timed out')
Note it is stdin that we need to take care of, not stdout or stderr.
You can refer to my answer at https://stackoverflow.com/a/43012138/3555925, which uses a pseudo-terminal to make stdout non-blocking and uses select to handle stdin/stdout.
I can just modify the command var to 'vim' and the script works fine.
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import sys
import select
import termios
import tty
import pty
from subprocess import Popen

command = 'vim'

# save original tty setting then set it to raw mode
old_tty = termios.tcgetattr(sys.stdin)
tty.setraw(sys.stdin.fileno())

# open pseudo-terminal to interact with subprocess
master_fd, slave_fd = pty.openpty()

# use os.setsid() to make the process the leader of a new session,
# or bash job control will not be enabled
p = Popen(command,
          preexec_fn=os.setsid,
          stdin=slave_fd,
          stdout=slave_fd,
          stderr=slave_fd,
          universal_newlines=True)

while p.poll() is None:
    r, w, e = select.select([sys.stdin, master_fd], [], [])
    if sys.stdin in r:
        d = os.read(sys.stdin.fileno(), 10240)
        os.write(master_fd, d)
    elif master_fd in r:
        o = os.read(master_fd, 10240)
        if o:
            os.write(sys.stdout.fileno(), o)

# restore tty settings back
termios.tcsetattr(sys.stdin, termios.TCSADRAIN, old_tty)

Detecting the end of the stream on popen.stdout.readline

I have a python program which launches subprocesses using Popen and consumes their output nearly real-time as it is produced. The code of the relevant loop is:
def run(self, output_consumer):
    self.prepare_to_run()
    popen_args = self.get_popen_args()
    logging.debug("Calling popen with arguments %s" % popen_args)
    self.popen = subprocess.Popen(**popen_args)
    while True:
        outdata = self.popen.stdout.readline()
        if not outdata and self.popen.returncode is not None:
            # Terminate when we've read all the output and the returncode is set
            break
        output_consumer.process_output(outdata)
        self.popen.poll()  # updates returncode so we can exit the loop
    output_consumer.finish(self.popen.returncode)
    self.post_run()

def get_popen_args(self):
    return {
        'args': self.command,
        'shell': False,   # Just being explicit for security's sake
        'bufsize': 0,     # More likely to see what's being printed as it happens
                          # Not guaranteed since the process itself might buffer its output
                          # (run `python -u` to unbuffer output of a Python process)
        'cwd': self.get_cwd(),
        'env': self.get_environment(),
        'stdout': subprocess.PIPE,
        'stderr': subprocess.STDOUT,
        'close_fds': True,  # Doesn't seem to matter
    }
This works great on my production machines, but on my dev machine, the call to .readline() hangs when certain subprocesses complete. That is, it will successfully process all of the output, including the final output line saying "process complete", but then will again poll readline and never return. This method exits properly on the dev machine for most of the sub-processes I call, but consistently fails to exit for one complex bash script that itself calls many sub-processes.
It's worth noting that popen.returncode gets set to a non-None (usually 0) value many lines before the end of the output. So I can't just break out of the loop when that is set or else I lose everything that gets spat out at the end of the process and is still buffered waiting for reading. The problem is that when I'm flushing the buffer at that point, I can't tell when I'm at the end because the last call to readline() hangs. Calling read() also hangs. Calling read(1) gets me every last character out, but also hangs after the final line. popen.stdout.closed is always False. How can I tell when I'm at the end?
All systems are running python 2.7.3 on Ubuntu 12.04LTS. FWIW, stderr is being merged with stdout using stderr=subprocess.STDOUT.
Why the difference? Is it failing to close stdout for some reason? Could the sub-sub-processes do something to keep it open somehow? Could it be because I'm launching the process from a terminal on my dev box, but in production it's launched as a daemon through supervisord? Would that change the way the pipes are processed and if so how do I normalize them?
The main code loop looks right. It could be that the pipe isn't closing because another process is keeping it open. For example, if the script launches a background process that writes to stdout, then the pipe will not close. Are you sure no other child process is still running?
An idea is to change modes once you see that .returncode has been set. Once you know the main process is done, read all its output from the buffer, but don't get stuck waiting. You can use select to read from the pipe with a timeout. Set a timeout of several seconds and you can clear the buffer without getting stuck waiting on a child process.
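A sketch of that two-phase idea, with popen and output_consumer as in the question (the drain helper and its 5-second timeout are illustrative choices):

import os
import select

def drain(popen, output_consumer, timeout=5.0):
    # Called once returncode is known to be set: keep reading, but cap
    # each wait so a pipe held open by a grandchild can't hang us forever.
    while True:
        ready, _, _ = select.select([popen.stdout], [], [], timeout)
        if not ready:
            break  # nothing arrived within the timeout; give up
        data = os.read(popen.stdout.fileno(), 4096)
        if not data:
            break  # real EOF: every writer has closed the pipe
        output_consumer.process_output(data)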
Without knowing the contents of the "one complex bash script" which causes the problem, there's too many possibilities to determine the exact cause.
However, focusing on the fact that you claim it works if you run your Python script under supervisord, then it might be getting stuck if a sub-process is trying to read from stdin, or just behaves differently if stdin is a tty, which (I presume) supervisord will redirect from /dev/null.
This minimal example seems to cope better with cases where my example test.sh runs subprocesses which try to read from stdin...
import os
import subprocess

f = subprocess.Popen(args='./test.sh',
                     shell=False,
                     bufsize=0,
                     stdin=open(os.devnull, 'rb'),
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     close_fds=True)

while 1:
    s = f.stdout.readline()
    if not s and f.returncode is not None:
        break
    print s.strip()
    f.poll()

print "done %d" % f.returncode
Otherwise, you can always fall back to using a non-blocking read, and bail out when you get your final output line saying "process complete", although it's a bit of a hack.
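That hack could look roughly like this sketch (f being the Popen object from the snippet above); fcntl puts the pipe into non-blocking mode so a read returns immediately instead of hanging:

import errno
import fcntl
import os

fd = f.stdout.fileno()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

try:
    chunk = os.read(fd, 4096)  # '' means EOF; otherwise whatever is buffered
except OSError as e:
    if e.errno != errno.EAGAIN:
        raise
    chunk = ''  # no data right now, but not EOF either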
If you use readline() or read(), it should not hang. No need to check returncode or poll(). If it is hanging when you know the process is finished, it is most probably a subprocess keeping your pipe open, as others have said before.
There are two things you could do to debug this:
* Try to reproduce with a minimal script instead of the current complex one, or
* Run that complex script with strace -f -e clone,execve,exit_group and see what that script is starting, and whether any process survives the main script (check when the main script calls exit_group; if strace is still waiting after that, you have a child still alive).
I find that calls to read (or readline) sometimes hang, despite previously calling poll. So I resorted to calling select to find out if there is readable data. However, select without a timeout can hang, too, if the process was closed. So I call select in a semi-busy loop with a tiny timeout for each iteration (see below).
I'm not sure if you can adapt this to readline, as readline might hang if the final \n is missing, or if the process doesn't close its stdout before you close its stdin and/or terminate it. You could wrap this in a generator, and every time you encounter a \n in stdout_collected, yield the current line (as sketched after the code below).
Also note that in my actual code, I'm using pseudoterminals (pty) to wrap the popen handles (to more closely fake user input) but it should work without.
import os
import select
from datetime import datetime

# handle to read from
handle = self.popen.stdout
# how many seconds to wait without data
timeout = 1

begin = datetime.now()
stdout_collected = ""

while self.popen.poll() is None:
    try:
        fds = select.select([handle], [], [], 0.01)[0]
    except select.error, exc:
        print exc
        break

    if len(fds) == 0:
        # select timed out, no new data
        delta = (datetime.now() - begin).total_seconds()
        if delta > timeout:
            return stdout_collected
        # try longer
        continue
    else:
        # have data, timeout counter resets again
        begin = datetime.now()

    for fd in fds:
        if fd == handle:
            data = os.read(handle.fileno(), 1024)
            # can handle the bytes as they come in here
            # self._handle_stdout(data)
            stdout_collected += data

# process exited
# if using a pseudoterminal, close the handles here
self.popen.wait()
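The generator wrapping mentioned above could look like this sketch, where chunks stands in for the successive os.read results of the loop:

def iter_lines(chunks):
    buf = ""
    for chunk in chunks:
        buf += chunk
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            yield line  # a complete line as soon as its newline arrives
    if buf:
        yield buf  # trailing output that never got a final newline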
Why are you setting stderr to STDOUT?
The real benefit of making a communicate() call on a subprocess is that you are able to retrieve a tuple containing the stdout response as well as the stderr message.
Those might be useful if the logic depends on their success or failure.
Also, it would save you the pain of having to iterate through lines. communicate() gives you everything, and there would be no unresolved questions about whether or not the full message was received.
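As a sketch of what that buys you (the ls invocation is just an example that writes to stderr and exits non-zero):

from subprocess import Popen, PIPE

p = Popen(['ls', '/nonexistent'], stdout=PIPE, stderr=PIPE)
out, err = p.communicate()  # returns (stdout, stderr) as a tuple
if p.returncode:
    print "failed: %r" % err
else:
    print out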
I wrote a demo with a bash subprocess that can be easily explored.
A closed pipe can be recognized by '' in the output from readline(), while the output from an empty line is '\n'.
from subprocess import Popen, PIPE, STDOUT

p = Popen(['bash'], stdout=PIPE, stderr=STDOUT)
out = []
while True:
    outdata = p.stdout.readline()
    if not outdata:
        break
    #output_consumer.process_output(outdata)
    print "* " + repr(outdata)
    out.append(outdata)
print "* closed", repr(out)
print "* returncode", p.wait()
Example of input/output, closing the pipe distinctly before terminating the process (this is why wait() should be used instead of poll()):
[prompt] $ python myscript.py
echo abc
* 'abc\n'
exec 1>&- # close stdout
exec 2>&- # close stderr
* closed ['abc\n']
exit
* returncode 0
[prompt] $
Your code did output a huge number of empty strings for this case.
Example: a quickly terminated process without '\n' on the last line:
echo -n abc
exit
* 'abc'
* closed ['abc']
* returncode 0

redirecting shell output using subprocess

I have a python script which calls a lot of shell functions. The script can be run interactively from a terminal, in which case I'd like to display output right away, or called by crontab, in which case I'd like to email error output.
I wrote a helper function for calling shell functions:
import subprocess
import shlex
import sys

def shell(cmdline, interactive=True):
    args = shlex.split(cmdline.encode("ascii"))
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    val = proc.communicate()
    if interactive is True:
        if proc.returncode:
            print "returncode " + str(proc.returncode)
            print val[1]
            sys.exit(1)
        else:
            print val[0]
    else:
        if proc.returncode:
            print ""
            # send email with val[0] + val[1]

if __name__ == "__main__":
    # example of command that produces non-zero returncode
    shell("ls -z")
The problem I'm having is two-fold.
1) In interactive mode, when the shell command takes a while to finish (e.g. few minutes), I don't see anything until the command is completely done since communicate() buffers output. Is there a way to display output as it comes in, and avoid buffering? I also need a way to check the returncode, which is why I'm using communicate().
2) Some shell commands I call can produce a lot of output (e.g. 2MB). The documentation for communicate() says "do not use this method if the data size is large or unlimited." Does anyone know how large is "large"?
1) When you use communicate, you capture the output of the subprocess, so nothing is sent to your standard output. The only reason you see the output when the subprocess is finished is that you print it yourself.
Since you want to either see it as it runs and not capture it or capture everything and do something with it only at the end, you can change the way it works in interactive mode by leaving stdout and stderr to None. This makes the subprocess use the same streams as your program. You'll also have to replace the call to communicate with a call to wait:
if interactive is True:
    proc = subprocess.Popen(args)
    proc.wait()
    if proc.returncode:
        print "returncode " + str(proc.returncode)
        sys.exit(1)
else:
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    val = proc.communicate()
    if proc.returncode:
        print ""
        # send email with val[0] + val[1]
2) "Too large" means too large to store in memory, so it all depends on a lot of factors. If temporarily holding 2MB of data in memory is fine in your situation, then there's nothing to worry about.
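If you do expect output too big to hold in memory, one hedge is to hand the subprocess a temporary file instead of a pipe; a sketch (args as in the shell() helper above):

import subprocess
import tempfile

with tempfile.TemporaryFile() as out:
    proc = subprocess.Popen(args, stdout=out, stderr=subprocess.STDOUT)
    proc.wait()            # no deadlock risk: the OS buffers to disk
    out.seek(0)
    data = out.read()      # or iterate in chunks to bound memory use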
