Linux pipe finished reading but want to discard rest - python

I have a piece of code that is starting a process then reading from stdout to see if it has loaded OK.
After that, I'd ideally like to redirect the output to /dev/null or something that discards it. I was wondering (A) what the best practice is in this situation and (B) what will happen to the writing process if the pipe becomes full. Will it block when the pipe is full and is not being read or cleared?
If the aim is to redirect to /dev/null, would it be possible to show me how to do this with Python and subprocess.Popen?
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
while True:
    if init_string in proc.stderr.readline():
        break
proc.stderr.redirect ??

As far as I know, there is no way to close and reopen file descriptors of a child process after it has started executing. And yes, there is a limited buffer in the OS, so if you don't consume anything from the pipe, eventually the child process will block. That means you'll just have to keep reading from the pipe until it's closed from the write end.
If you want your program to continue doing something useful in the meantime, consider moving the data-consuming part to a separate thread (untested):
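from threading import Thread

def read_all_from_pipe(pipe):
    for line in pipe:  # assuming it's line-based
        pass  # discard the output

# Thread's first positional parameter is `group`, so pass the function as `target`
Thread(target=read_all_from_pipe, args=(proc.stderr,)).start()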
There may be other ways to solve your problem, though. Why do you need to wait for some particular output in the first place? Shouldn't the child just die with a nonzero exit code if it didn't "load OK"? Can you instead check that the child is doing what it should, rather than that it's printing some arbitrary output?
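A slightly fuller sketch of the same idea, written for Python 3: wait for the marker on stderr, then hand the pipe to a daemon thread that throws the rest away. The function name wait_then_drain and the child command used to exercise it are made up for illustration, and init_string has to be bytes because the pipe is opened in binary mode.

```python
import subprocess
import sys
import threading


def wait_then_drain(command, init_string):
    """Start command, wait for init_string on stderr, then discard the rest."""
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
    found = False
    # readline() returns b"" at EOF, which ends the loop if the child dies early.
    for line in iter(proc.stderr.readline, b""):
        if init_string in line:
            found = True
            break

    def drain(pipe):
        # Keep consuming so the child never blocks on a full pipe.
        for _ in pipe:
            pass
        pipe.close()

    threading.Thread(target=drain, args=(proc.stderr,), daemon=True).start()
    return proc, found


# Hypothetical child: prints a marker, then a lot of output nobody reads.
child = "import sys; sys.stderr.write('ready\\n'); sys.stderr.write('x' * 200000 + '\\n')"
proc, found = wait_then_drain([sys.executable, "-c", child], b"ready")
exit_code = proc.wait()
```

Because the thread is a daemon, it will not keep the parent alive if the parent wants to exit while the child is still running.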

If you would like to discard all the output:
python your_script.py > /dev/null
However, if you want to do it from Python you can use:
import sys
sys.stdout = open('file', 'w')
print 'this goes to file'
After this, everything you print goes to the file "file"; change that to /dev/null or any other file to get the result you want. Note that this only redirects the output of the Python script itself; it does not affect a child process started with subprocess.Popen.
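For a child process whose output is never needed at all, a minimal sketch (assuming Python 3.3 or later, where subprocess.DEVNULL exists) is to connect its streams to the null device when it is started. The child command here is a made-up placeholder.

```python
import subprocess
import sys

# Hypothetical child that writes noise to both streams.
command = [sys.executable, "-c",
           "import sys; print('noise'); sys.stderr.write('more noise\\n')"]

# The child's stdout and stderr are attached to the null device from the start,
# so there is no pipe that could fill up.
proc = subprocess.Popen(command,
                        stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL)
returncode = proc.wait()
```

This does not help when the parent first has to read some of the output, as in the question; in that case the pipe still has to be drained.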

Related

Gather subprocess output nonblocking in Python

Is there an easy way of gathering the output of a subprocess without actually waiting for it?
I could create a subprocess.Popen() that captures stdout and then call p.communicate(), but that would block until the subprocess terminates.
I could use subprocess.check_output() or similar, but that would also block.
I need something which I can start, then do other stuff, then check whether the subprocess has terminated and, if it has, take its output.
I can think of two rather complicated ways to achieve this:
* Redirect the output into a file, then after termination I can read the output from that file.
* Implement and start a handler thread(!) which constantly tries to read data from the stdout of the subprocess and adds it to a buffer.
The first one needs temporary files and disk I/O which I do not really like in my case. The second one means implementing quite a bit.
I guess there might be a simpler way I couldn't think of yet, or a ready-to-be-used solution in some library I didn't find yet.
What's wrong with calling check_output in a thread?
import threading, subprocess

output = ""

def f():
    global output
    output = subprocess.check_output("ls")  # ["cmd","/c","dir"] for windows

t = threading.Thread(target=f)
t.start()
print('Started')
t.join()
print(output)
Note that one could be tempted to use p = subprocess.Popen(cmd, stdout=subprocess.PIPE), wait until p.poll() is not None, and read p.stdout afterwards. That only works when the output is small; otherwise you get a deadlock, because the child blocks once the pipe buffer is full and nothing is reading from it.
Using p.stdout.readline() would work but would also block if the process doesn't print on a regular basis. If your application prints to the output all the time, then you can consider it as non-blocking and the solution is acceptable.
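A variation on the thread idea, sketched for Python 3 with concurrent.futures: the executor runs check_output in a worker thread, and the returned future can be polled with done() before the output is collected. The child command is a placeholder chosen only so the example runs anywhere.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical child command used only for this demonstration.
cmd = [sys.executable, "-c", "print('hello from child')"]

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(subprocess.check_output, cmd)

# ... do other work here; done() reports whether the child has finished ...
finished_yet = future.done()

# result() returns the captured output, blocking only if the child is still running.
output = future.result()
executor.shutdown()
```

If the command fails, result() re-raises the CalledProcessError from check_output in the calling thread.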
I think what you want is an unbuffered stdout stream.
With that you will be able to capture the output of your process without waiting for it to finish.
You can achieve that with the subprocess.Popen() function and the parameter stdout=subprocess.PIPE.
Try something like this
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
line = proc.stdout.readline()
while line:
    print line
    line = proc.stdout.readline()

Read from pty without endless hanging

I have a script, that prints colored output if it is on tty. A bunch of them executes in parallel, so I can't put their stdout to tty. I don't have control over the script code either (to force coloring), so I want to fake it via pty. My code:
invocation = get_invocation()
master, slave = pty.openpty()
subprocess.call(invocation, stdout=slave)
print string_from_fd(master)
And I can't figure out, what should be in string_from_fd. For now, I have something like
def string_from_fd(fd):
    return os.read(fd, 1000)
It works, but that number 1000 looks strange. I think the output can be quite large, and any number there might not be sufficient. I tried a lot of solutions from Stack Overflow, but none of them work (they either print nothing or hang forever).
I am not very familiar with file descriptors and all that, so any clarification if I'm doing something wrong would be much appreciated.
Thanks!
This won't work for long outputs: subprocess.call will block once the PTY's buffer is full. That's why Popen.communicate exists, but that won't work with a PTY.
The standard/easiest solution is to use the external module pexpect, which uses PTYs internally: For example,
pexpect.spawn("/bin/ls --color=auto").read()
will give you the ls output with color codes.
If you'd like to stick to subprocess, then you must use subprocess.Popen for the reason stated above. You are right in your assumption that by passing 1000 you read at most 1000 bytes, so you'll have to use a loop. os.read blocks if there is nothing to read and waits for data to appear.
The catch is how to recognize that the process has terminated: at that point no more data will arrive, and a naive next call to os.read would block forever. Luckily, the operating system helps you detect this situation: if all file descriptors to the pseudo terminal that could be used for writing are closed, then os.read will either return an empty string or raise an error, depending on the OS. You can check for this condition and exit the loop when it happens.
The final piece to understanding the following code is how open file descriptors and subprocess go together. subprocess.Popen internally calls fork(), which duplicates the current process including all open file descriptors, and then within one of the two execution paths calls exec(), which replaces the program running in that process with the new one. In the other execution path, control returns to your Python script. So after calling subprocess.Popen there are two valid file descriptors for the slave end of the PTY: one belongs to the spawned process, one to your Python script. If you close yours, then the only file descriptor that could be used to send data to the master end belongs to the spawned process. Upon its termination, it is closed, and the PTY enters the state where calls to read on the master end fail.
Here's the code:
import os
import pty
import subprocess

master, slave = pty.openpty()
process = subprocess.Popen("/bin/ls --color", shell=True, stdout=slave,
                           stdin=slave, stderr=slave, close_fds=True)
os.close(slave)

output = []
while True:
    try:
        data = os.read(master, 1024)
    except OSError:
        break
    if not data:
        break
    output.append(data)  # In Python 3, append ".decode()" to os.read()
output = "".join(output)
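The same loop wrapped in a function, as a sketch for Python 3 on a POSIX system. The name run_in_pty and the child command are invented for the example; the output is kept as bytes and joined with b"".join, and the child simply reports whether it believes its stdout is a terminal.

```python
import os
import pty
import subprocess
import sys


def run_in_pty(argv):
    """Run argv with its standard streams on a PTY and return everything it wrote."""
    master, slave = pty.openpty()
    try:
        proc = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave,
                                close_fds=True)
    finally:
        # Close our copy so the child holds the only descriptor for the slave end.
        os.close(slave)
    chunks = []
    while True:
        try:
            data = os.read(master, 1024)
        except OSError:   # raised on some systems once the slave end is fully closed
            break
        if not data:      # other systems signal the same condition with an empty read
            break
        chunks.append(data)
    os.close(master)
    proc.wait()
    return b"".join(chunks)


# Hypothetical child that checks whether its stdout looks like a terminal.
output = run_in_pty([sys.executable, "-c", "import sys; print(sys.stdout.isatty())"])
```

Note that the terminal layer usually translates "\n" into "\r\n", so the captured bytes may differ slightly from what a pipe would deliver.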

About piping stdio and subprocess.Popen

I have one Python program, that is opening another Python program via subprocess.Popen. The 1st is supposed to output some text into the console (just for info), and write some text to the 2nd program it had spawned. Then, it should wait for the 2nd program to respond (read() from it), and print that response.
The 2nd one is supposed to listen to the first one's input (via raw_input()) and then print text to the 1st.
To understand what exactly was happening, I had put a 5 second delay into the 2nd, and the result surprised me a bit.
Here's the code:
import subprocess
print "1st starting."
app = subprocess.Popen("name", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE) #<--- B
print "Writing something to app's STDIN..."
app.stdin.write(some_text)
print "Reading something from my STDIN..." #<--- A
result = app.stdout.read()
print "Result:"
print result
And for the 2nd one:
import time
print "app invoked."
print "Waiting for text from STDIN..."
text = raw_input()
#process(text)
time.sleep(5)
print "magic"
When I ran this code, it paused at point A, as that was the last console output.
After 5 seconds, the "Result:\n" line would be outputted, and everything the 2nd program had printed would show up in the console.
Why did the 1st program pause when reading the stdout of the 2nd one? Does it have to wait for its child to terminate before reading its output? How can this be changed so I can pass messages between programs?
I'm running Debian Linux 7.0.
The answer lies not in any magic related to the subprocess module, but in the typical behaviour of the read() method on Python objects.
If you run this:
import subprocess
p = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
help(p.stdout.read)
You'll see this:
read(...)
read([size]) -> read at most size bytes, returned as a string.
If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.
(END)
The same thing applies to all file-like objects. It's very simple: calling read() with no argument keeps reading until it reaches EOF (or an error occurs).
EOF is not seen by the reader until either:
* the subprocess calls sys.stdout.close(), or
* the subprocess exits and the Python runtime and/or OS kernel clean up its file descriptors
Beware that os.read has different behaviour: it is a thin wrapper around the read() system call, so it returns as soon as some data is available instead of waiting for EOF. The built-in Python help for it is not very informative, but if you're on any UNIXy system you should be able to run man 2 read; the Python behaviour more or less matches what's there.
A word of warning
The program above is fine, but patterns like that sometimes lead to a deadlock. The docs for the subprocess module warn about this where Popen.wait() is documented:
Warning
This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
It's possible to get in a similar situation if you're not careful during two-way communication with a subprocess, depending on what the subprocess is doing.
edit:
By the way, the pipe(7) man page covers the behaviour of pipes with EOF:
If all file descriptors referring to the write end of a pipe have been
closed, then an attempt to read(2) from the pipe will see end-of-file
(read(2) will return 0).
edit 2:
As Lennart mentioned above, if you want truly two-way communication that goes beyond write-once read-once, you'll also need to beware of buffering. If you read this you'll get some idea of it, but you should be aware that this is how buffered IO almost always works in UNIX-based systems - it's not a Python quirk. Run man stdio.h for more information.
You are asking program 1 to read input from program 2. And you are pausing program two for five seconds before it outputs anything. Obviously program 1 then needs to wait those five seconds. So what happens is perfectly expected.
Does it have to wait for its child to terminate before reading its output?
To some extent, yes, because input and output are buffered, so it's possible that the same thing will happen even if you move the delay to after you print something.
raw_input() will wait for a linefeed, in any case.
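To make the buffering point concrete, here is a sketch in Python 3 of one request/reply exchange that does not wait for the child to exit. The child program is a made-up stand-in for the second script; the essential parts are that both sides flush after writing and that the parent uses readline() instead of read().

```python
import subprocess
import sys

# Hypothetical child: reads one line, answers immediately, then keeps running.
child = (
    "import sys, time\n"
    "text = sys.stdin.readline()\n"
    "sys.stdout.write('got: ' + text)\n"
    "sys.stdout.flush()  # without this the reply sits in the child's buffer\n"
    "time.sleep(1)\n"
)

app = subprocess.Popen([sys.executable, "-c", child],
                       stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                       universal_newlines=True)
app.stdin.write("hello\n")
app.stdin.flush()               # push the request out of our own buffer
reply = app.stdout.readline()   # returns once a full line arrives, not at EOF
app.stdin.close()
app.stdout.close()
app.wait()
```

The reply arrives while the child is still sleeping, because readline() only needs one complete line, whereas read() with no argument would wait for EOF.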

Detecting the end of the stream on popen.stdout.readline

I have a python program which launches subprocesses using Popen and consumes their output nearly real-time as it is produced. The code of the relevant loop is:
def run(self, output_consumer):
    self.prepare_to_run()
    popen_args = self.get_popen_args()
    logging.debug("Calling popen with arguments %s" % popen_args)
    self.popen = subprocess.Popen(**popen_args)
    while True:
        outdata = self.popen.stdout.readline()
        if not outdata and self.popen.returncode is not None:
            # Terminate when we've read all the output and the returncode is set
            break
        output_consumer.process_output(outdata)
        self.popen.poll()  # updates returncode so we can exit the loop
    output_consumer.finish(self.popen.returncode)
    self.post_run()

def get_popen_args(self):
    return {
        'args': self.command,
        'shell': False,  # Just being explicit for security's sake
        'bufsize': 0,  # More likely to see what's being printed as it happens
                       # Not guaranteed since the process itself might buffer its output
                       # run `python -u` to unbuffer output of a python processes
        'cwd': self.get_cwd(),
        'env': self.get_environment(),
        'stdout': subprocess.PIPE,
        'stderr': subprocess.STDOUT,
        'close_fds': True,  # Doesn't seem to matter
    }
This works great on my production machines, but on my dev machine, the call to .readline() hangs when certain subprocesses complete. That is, it will successfully process all of the output, including the final output line saying "process complete", but then will again poll readline and never return. This method exits properly on the dev machine for most of the sub-processes I call, but consistently fails to exit for one complex bash script that itself calls many sub-processes.
It's worth noting that popen.returncode gets set to a non-None (usually 0) value many lines before the end of the output. So I can't just break out of the loop when that is set or else I lose everything that gets spat out at the end of the process and is still buffered waiting for reading. The problem is that when I'm flushing the buffer at that point, I can't tell when I'm at the end because the last call to readline() hangs. Calling read() also hangs. Calling read(1) gets me every last character out, but also hangs after the final line. popen.stdout.closed is always False. How can I tell when I'm at the end?
All systems are running python 2.7.3 on Ubuntu 12.04LTS. FWIW, stderr is being merged with stdout using stderr=subprocess.STDOUT.
Why the difference? Is it failing to close stdout for some reason? Could the sub-sub-processes do something to keep it open somehow? Could it be because I'm launching the process from a terminal on my dev box, but in production it's launched as a daemon through supervisord? Would that change the way the pipes are processed and if so how do I normalize them?
The main code loop looks right. It could be that the pipe isn't closing because another process is keeping it open. For example, if the script launches a background process that writes to stdout, then the pipe will not close. Are you sure no other child process is still running?
One idea is to change modes once you see that .returncode has been set. Once you know the main process is done, read all of its remaining output from the buffer, but don't get stuck waiting. You can use select to read from the pipe with a timeout: set a timeout of several seconds and you can clear the buffer without getting stuck waiting on a child process.
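The select-with-timeout idea could look roughly like this in Python 3 on a POSIX system. The function name read_with_grace, the grace parameter and the shell one-liner are all invented for the example; the one-liner imitates a script whose background child keeps the pipe open after the main process has exited.

```python
import os
import select
import subprocess


def read_with_grace(proc, grace=2.0):
    """Collect proc's stdout; stop at EOF or after `grace` quiet seconds post-exit."""
    fd = proc.stdout.fileno()
    chunks = []
    while True:
        exited = proc.poll() is not None
        # Short timeout while running, longer grace period once the child has exited.
        ready, _, _ = select.select([fd], [], [], grace if exited else 0.1)
        if ready:
            data = os.read(fd, 4096)
            if not data:      # EOF: every writer has closed the pipe
                break
            chunks.append(data)
        elif exited:          # child is gone and the pipe stayed quiet
            break
    return b"".join(chunks)


# Hypothetical script: the shell exits at once, but a background sleep keeps the pipe open.
proc = subprocess.Popen(["sh", "-c", "echo main; sleep 3 & exit 0"],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=0)
output = read_with_grace(proc, grace=0.5)
proc.stdout.close()
```

The trade-off is that output written by a surviving background process after the grace period is lost, so the timeout has to be chosen with the script's behaviour in mind.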
Without knowing the contents of the "one complex bash script" which causes the problem, there are too many possibilities to determine the exact cause.
However, focusing on the fact that you claim it works if you run your Python script under supervisord, then it might be getting stuck if a sub-process is trying to read from stdin, or just behaves differently if stdin is a tty, which (I presume) supervisord will redirect from /dev/null.
This minimal example seems to cope better with cases where my example test.sh runs subprocesses which try to read from stdin...
import os
import subprocess

f = subprocess.Popen(args='./test.sh',
                     shell=False,
                     bufsize=0,
                     stdin=open(os.devnull, 'rb'),
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     close_fds=True)

while 1:
    s = f.stdout.readline()
    if not s and f.returncode is not None:
        break
    print s.strip()
    f.poll()
print "done %d" % f.returncode
Otherwise, you can always fall back to using a non-blocking read, and bail out when you get your final output line saying "process complete", although it's a bit of a hack.
If you use readline() or read(), it should not hang. No need to check returncode or poll(). If it is hanging when you know the process is finished, it is most probably a subprocess keeping your pipe open, as others said before.
There are two things you could do to debug this:
* Try to reproduce with a minimal script instead of the current complex one, or
* Run that complex script with strace -f -e clone,execve,exit_group and see what that script is starting, and whether any process survives the main script (check when the main script calls exit_group; if strace is still waiting after that, you have a child still alive).
I find that calls to read (or readline) sometimes hang, despite previously calling poll. So I resorted to calling select to find out if there is readable data. However, select without a timeout can hang, too, if the process was closed. So I call select in a semi-busy loop with a tiny timeout for each iteration (see below).
I'm not sure if you can adapt this to readline, as readline might hang if the final \n is missing, or if the process doesn't close its stdout before you close its stdin and/or terminate it. You could wrap this in a generator, and every time you encounter a \n in stdout_collected, yield the current line.
Also note that in my actual code, I'm using pseudoterminals (pty) to wrap the popen handles (to more closely fake user input) but it should work without.
import os
import select
from datetime import datetime

# handle to read from
handle = self.popen.stdout
# how many seconds to wait without data
timeout = 1
begin = datetime.now()
stdout_collected = ""
while self.popen.poll() is None:
    try:
        fds = select.select([handle], [], [], 0.01)[0]
    except select.error, exc:
        print exc
        break
    if len(fds) == 0:
        # select timed out, no new data
        delta = (datetime.now() - begin).total_seconds()
        if delta > timeout:
            return stdout_collected
        # try longer
        continue
    else:
        # have data, timeout counter resets again
        begin = datetime.now()
    for fd in fds:
        if fd == handle:
            # os.read needs a file descriptor, not a file object
            data = os.read(handle.fileno(), 1024)
            # can handle the bytes as they come in here
            # self._handle_stdout(data)
            stdout_collected += data
# process exited
# if using a pseudoterminal, close the handles here
self.popen.wait()
Why are you setting stderr to STDOUT?
The real benefit of making a communicate() call on a subprocess is that you are able to retrieve a tuple containing the stdout response as well as the stderr message.
Those might be useful if the logic depends on their success or failure.
Also, it would save you from the pain of having to iterate through lines. communicate() gives you everything, and there would be no unresolved questions about whether or not the full message was received.
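A minimal illustration of that tuple in Python 3, using a placeholder child that writes one line to each stream. It only fits cases where the output is needed after the process has finished, not while it is running.

```python
import subprocess
import sys

# Hypothetical child writing one line to stdout and one to stderr.
child = "import sys; print('out'); sys.stderr.write('err\\n')"

proc = subprocess.Popen([sys.executable, "-c", child],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)
# communicate() reads both pipes to EOF and waits for the process to exit.
out, err = proc.communicate()
```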
I wrote a demo with a bash subprocess that can be easily explored.
A closed pipe can be recognized by '' in the output from readline(), while the output from an empty line is '\n'.
from subprocess import Popen, PIPE, STDOUT
p = Popen(['bash'], stdout=PIPE, stderr=STDOUT)
out = []
while True:
    outdata = p.stdout.readline()
    if not outdata:
        break
    #output_consumer.process_output(outdata)
    print "* " + repr(outdata)
    out.append(outdata)
print "* closed", repr(out)
print "* returncode", p.wait()
Example of input/output: the pipe is closed well before the process terminates. That is why wait() should be used instead of poll().
[prompt] $ python myscript.py
echo abc
* 'abc\n'
exec 1>&- # close stdout
exec 2>&- # close stderr
* closed ['abc\n']
exit
* returncode 0
[prompt] $
Your code did output a huge number of empty strings for this case.
Example: Fast terminated process without '\n' on the last line:
echo -n abc
exit
* 'abc'
* closed ['abc']
* returncode 0

scrambled output from a child process run from subprocess

I'm using the following code to run another python script. The problem I'm facing is that the output of that script is coming out in an unorderly manner.
While running it from the command line, I get the correct output i.e. :
some output here
Editing xml file and saving changes
Uploading xml file back..
While running the script using subprocess, I am getting some of the output in reverse order:
correct output till here
Uploading xml file back..
Editing xml file and saving changes
The script is executing without errors and making the right changes. So I think the culprit might be the code that is calling the child script, but I can't find the problem:
cmd = "child_script.py"
proc = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
(fout ,ferr) = ( proc.stdout, proc.stderr )
print "Going inside while - loop"
while True:
    line = proc.stdout.readline()
    print line
    fo.write(line)
    try :
        err = ferr.readline()
        fe.write(err)
    except Exception, e:
        pass
    if not line:
        pass
        break
[EDIT]: fo and fe are file handles to output and error logs. Also, the script is being run on Windows. Sorry for missing these details.
There are a few problems with the part of the script you've quoted, I'm afraid:
* As mentioned in detly's comment, what are fo and fe? Presumably those are objects to which you're writing the output of the child process? (Update: you indicate that these are both for writing output logs.)
* There's an indentation error on line 3. (Update: I've fixed that in the original post.)
* You're specifying stderr=subprocess.STDOUT, so: (a) ferr will always be None in your loop and (b) due to buffering, standard output and error may be mixed in an unpredictable way. However, it looks from your code as if you actually want to deal with standard output and standard error separately, so perhaps try stderr=subprocess.PIPE instead.
It would be a good idea to rewrite your loop as jsbueno suggests:
from subprocess import Popen, PIPE
proc = Popen(["child_script.py"], stdout=PIPE, stderr=PIPE)
fout, ferr = proc.stdout, proc.stderr
for line in fout:
    print(line.rstrip())
    fo.write(line)
for line in ferr:
    fe.write(line)
... or to reduce it even further, since it seems that the aim is essentially that you just want to write the standard output and standard error from the child process to fo and fe, just do:
proc = subprocess.Popen(["child_script.py"], stdout=fo, stderr=fe)
If you still see the output lines swapped in the file that fo is writing to, then we can only assume that there is some way in which this can happen in the child script. e.g. is the child script multi-threaded? Is one of the lines printed via a callback from another function?
Most of the times I've seen the order of output differ between runs, some output was sent to the C standard IO stream stdout and some was sent to stderr. The buffering characteristics of stdout and stderr vary depending upon whether they are connected to a terminal, pipes, files, etc.:
NOTES
The stream stderr is unbuffered. The stream stdout is
line-buffered when it points to a terminal. Partial lines
will not appear until fflush(3) or exit(3) is called, or a
newline is printed. This can produce unexpected results,
especially with debugging output. The buffering mode of
the standard streams (or any other stream) can be changed
using the setbuf(3) or setvbuf(3) call. Note that in case
stdin is associated with a terminal, there may also be
input buffering in the terminal driver, entirely unrelated
to stdio buffering. (Indeed, normally terminal input is
line buffered in the kernel.) This kernel input handling
can be modified using calls like tcsetattr(3); see also
stty(1), and termios(3).
So perhaps you should configure both stdout and stderr to go to the same destination, so the same buffering will be applied to both streams.
Also, some programs open the terminal directly with open("/dev/tty",...) (mostly so they can read passwords), so comparing terminal output with pipe output isn't always going to work.
Further, if your program is mixing direct write(2) calls with standard IO calls, the order of output can be different based on the different buffering choices.
I hope one of these is right :) let me know which, if any.
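If the child script can be changed, one way to take buffering out of the picture is to flush each stream right after writing to it. The sketch below (Python 3, with a made-up child that stands in for child_script.py) merges stderr into stdout and flushes explicitly, so the lines reach the pipe in the order they were written.

```python
import subprocess
import sys

# Hypothetical child that writes to both streams and flushes after each write.
child = (
    "import sys\n"
    "sys.stdout.write('Editing xml file\\n')\n"
    "sys.stdout.flush()\n"
    "sys.stderr.write('Uploading xml file\\n')\n"
    "sys.stderr.flush()\n"
)

proc = subprocess.Popen([sys.executable, "-c", child],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                        universal_newlines=True)
merged, _ = proc.communicate()
```

When the child is itself a Python script that cannot be edited, starting it with python -u is a commonly used alternative for disabling its output buffering.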
