Python reading Popen continuously (Windows)

I'm trying to read with stdout.readline and put the results (i.e. each line, at the moment it is printed to the terminal) on a multiprocessing.Queue for use in another .py file. However, the call:
res = subprocess.Popen(command, stdout=subprocess.PIPE, bufsize=1)
with res.stdout:
    for line in iter(res.stdout.readline, b''):
        print line
res.wait()
will block, and the results will only be printed after the process is complete (or not at all if an exit code isn't returned).
I've browsed SO for answers to this and tried setting bufsize=1, spawning threads that handle the reading, using file descriptors, etc. None seem to work. I might have to use the pexpect module, but I'm not sure how it works yet.
I have also tried
def enqueue_output(self, out, queue):
    for line in iter(out.readline, b''):
        queue.put([line])
    out.close()
to put the data on the queue, but since out.readline seems to block, the result is the same.
In short: how do I make the subprocess output available to me at the time it is printed? The process prints chunks of 1-10 lines at a time, but these only reach me, still newline-separated, once the process completes.
Related:
Python subprocess readlines() hangs
Python: read streaming input from subprocess.communicate()
Non-blocking read on a subprocess.PIPE in python

As explained by eryksun, and confirmed by your comment, the cause of the buffering is the use of printf by the C application.
By default, printf buffers its output. When the output is directed to a terminal, the buffer is flushed on each newline or whenever a read occurs; when the output is directed to a file or a pipe, the actual output only happens when the buffer is full.
Fortunately, on Windows there is no low-level buffering (*), so calling setvbuf(stdout, NULL, _IONBF, 0); near the beginning of the program is enough. Note that it has to be no buffering at all (_IONBF), because line buffering on Windows is implemented as full buffering.
(*) On Unix or Linux systems, the underlying system call can add its own buffering. That means that a program using low level write(1, buf, strlen(buf)); will be unbuffered on Windows, but will still be buffered on Linux when standard output is connected to a pipe or a file.

Related

Avoid Deadlock with Popen and stdout = PIPE in python

I am executing a shell script using Popen. I am also using stdout=PIPE to capture the output. The code is:
pipe = Popen('acbd.sh', shell=True, stdout=PIPE)
while pipe.poll() is None:
    time.sleep(0.5)
text = pipe.communicate()[0]
if pipe.returncode == 0:
    print "File executed"
According to the documentation, using poll() with stdout=PIPE can lead to deadlock, and communicate() can be used to avoid this problem. I have used communicate() here.
Will my code lead to deadlock with communicate() too, or am I using communicate() wrong?
I also have an alternative in subprocess.check_output, but I would prefer to use Popen and capture the output with it.
Yes, you can deadlock, because of these two lines:
while pipe.poll() is None:
    time.sleep(0.5)
Take them out; there's no need for them here. communicate() already waits for the subprocess to close its FDs (as happens on exit). When you add such a loop yourself and don't read until after it completes, the child can be stuck indefinitely trying to write output that cannot be written until communicate() causes your side of the pipe to start reading, while your loop waits forever for the child to exit.
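For reference, a minimal sketch of the corrected pattern (same script name as in the question), in which communicate() does both the reading and the waiting:
from subprocess import Popen, PIPE

pipe = Popen('acbd.sh', shell=True, stdout=PIPE)
text = pipe.communicate()[0]   # reads all output, then waits for exit
if pipe.returncode == 0:
    print "File executed"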
As background: The POSIX specification for the write() call does not make any guarantees about the amount of data that can be written to a FIFO before it will block, or that this amount of data will be consistent even within a given system -- thus, the safe thing is to assume that any write to a FIFO is always allowed to block unless there's a reader actively consuming that data.

About piping stdio and subprocess.Popen

I have one Python program that opens another Python program via subprocess.Popen. The 1st is supposed to output some text to the console (just for info) and write some text to the 2nd program it has spawned. Then it should wait for the 2nd program to respond (read() from it) and print that response.
The 2nd one is supposed to listen to the first one's input (via raw_input()) and then print text to the 1st.
To understand what exactly was happening, I had put a 5 second delay into the 2nd, and the result surprised me a bit.
Here's the code:
import subprocess
print "1st starting."
app = subprocess.Popen("name", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE) #<--- B
print "Writing something to app's STDIN..."
app.stdin.write(some_text)
print "Reading something from my STDIN..." #<--- A
result = app.stdout.read()
print "Result:"
print result
And for the 2nd one:
import time
print "app invoked."
print "Waiting for text from STDIN..."
text = raw_input()
#process(text)
time.sleep(5)
print "magic"
When I ran this code, it paused at point A, as that was the last console output.
After 5 seconds, the "Result:\n" line would be outputted, and everything the 2nd program had printed would show up in the console.
Why did the 1st program pause when reading the stdout of the 2nd one? Does it have to wait for its child to terminate before reading its output? How can this be changed so I can pass messages between programs?
I'm running Debian Linux 7.0.
The answer lies not in any magic related to the subprocess module, but in the typical behaviour of the read() method on Python objects.
If you run this:
import subprocess
p = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
help(p.stdout.read)
You'll see this:
read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.
(END)
The same thing applies to all file-like objects. It's very simple: calling read() with no argument consumes the stream until it reaches EOF (or hits an error).
EOF is not sent until either:
the subprocess calls sys.stdout.close(), or
the subprocess exits and the Python runtime and/or OS kernel clean up its file descriptors
Beware that os.read has different behaviour: it is a thin wrapper around the underlying read system call. The built-in Python help function isn't much use here, but if you're on any UNIXy system you should be able to run man 2 read; the Python behaviour more or less matches what's there.
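For illustration, a hedged sketch of that difference (ls is just a stand-in command):
import os
import subprocess

p = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
# os.read returns as soon as some data is available (here at most 4096
# bytes), rather than waiting for EOF like p.stdout.read() does.
chunk = os.read(p.stdout.fileno(), 4096)
p.wait()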
A word of warning
The program above is fine, but patterns like that sometimes lead to a deadlock. The docs for the subprocess module warn about this where Popen.wait() is documented:
Warning
This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
It's possible to get in a similar situation if you're not careful during two-way communication with a subprocess, depending on what the subprocess is doing.
edit:
By the way, this page covers the behaviour of pipes with EOF:
If all file descriptors referring to the write end of a pipe have been
closed, then an attempt to read(2) from the pipe will see end-of-file
(read(2) will return 0).
edit 2:
As Lennart mentioned above, if you want truly two-way communication that goes beyond write-once read-once, you'll also need to beware of buffering. If you read this you'll get some idea of it, but you should be aware that this is how buffered IO almost always works in UNIX-based systems - it's not a Python quirk. Run man stdio.h for more information.
You are asking program 1 to read input from program 2, and you are pausing program 2 for five seconds before it outputs anything. Obviously program 1 then needs to wait those five seconds. So what happens is perfectly expected.
Does it have to wait for its child to terminate before reading its output?
To some extent, yes, because input and output are buffered, so it's possible that even if you move the delay to after the print, the same thing will happen.
raw_input() will wait for a linefeed, in any case.
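Putting those two points together, a hedged sketch of how the relevant lines of the first program could look (the variable names are the ones from the question; each readline() returns whichever line the child has printed so far, so you may need several of them):
app.stdin.write(some_text + "\n")   # raw_input() in the child needs a newline
app.stdin.flush()                   # push it past any buffering on our side
result = app.stdout.readline()      # one line at a time, instead of waiting for EOF
print "Result:"
print result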

Python Popen().stdout.read() hang

I'm trying to get the output of another script using Python's subprocess.Popen, as follows:
process = Popen(command, stdout=PIPE, shell=True)
exitcode = process.wait()
output = process.stdout.read() # hangs here
It hangs at the third line, only when I run it as a python script and I cannot reproduce this in the python shell.
The other script prints just a few words and I am assuming that it's not a buffer issue.
Does anyone have an idea of what I am doing wrong here?
You probably want to use .communicate() rather than .wait() plus .read(). Note the warning about wait() on the subprocess documentation page:
Warning This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
http://docs.python.org/2/library/subprocess.html#subprocess.Popen.wait
read() waits for EOF before returning.
You can:
wait for the subprocess to die, then read() will return.
use readline() if your output is broken into lines (will still hang if no output lines).
use os.read(F,N) which returns at most N bytes from F, but will still block if the pipe is empty (unless O_NONBLOCK is set on the fd).
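As an illustration of the second option above, a hedged sketch that reads line by line while the child is still running (command is the value from the question):
from subprocess import Popen, PIPE

process = Popen(command, stdout=PIPE, shell=True)
for line in iter(process.stdout.readline, ''):   # '' signals EOF on the pipe
    print line,                                   # lines arrive as the child writes them
exitcode = process.wait()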
You can see how to deal with hanging reads of stdout/stderr in the following source:
readingproc

scrambled output from a child process run from subprocess

I'm using the following code to run another Python script. The problem I'm facing is that the output of that script comes out in a scrambled order.
While running it from the command line, I get the correct output, i.e.:
some output here
Editing xml file and saving changes
Uploading xml file back..
While running the script using subprocess, I am getting some of the output in reverse order:
correct output till here
Uploading xml file back..
Editing xml file and saving changes
The script is executing without errors and making the right changes. So I think the culprit might be the code that is calling the child script, but I can't find the problem:
cmd = "child_script.py"
proc = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
(fout ,ferr) = ( proc.stdout, proc.stderr )
print "Going inside while - loop"
while True:
line = proc.stdout.readline()
print line
fo.write(line)
try :
err = ferr.readline()
fe.write(err)
except Exception, e:
pass
if not line:
pass
break
[EDIT]: fo and fe are file handles to the output and error logs. Also, the script is being run on Windows. Sorry for missing these details.
There are a few problems with the part of the script you've quoted, I'm afraid:
As mentioned in detly's comment, what are fo and fe? Presumably those are objects to which you're writing the output of the child process? (Update: you indicate that these are both for writing output logs.)
There's an indentation error on line 3. (Update: I've fixed that in the original post.)
You're specifying stderr=subprocess.STDOUT, so: (a) ferr will always be None in your loop and (b) due to buffering, standard output and error may be mixed in an unpredictable way. However, it looks from your code as if you actually want to deal with standard output and standard error separately, so perhaps try stderr=subprocess.PIPE instead.
It would be a good idea to rewrite your loop as jsbueno suggests:
from subprocess import Popen, PIPE

proc = Popen(["child_script.py"], stdout=PIPE, stderr=PIPE)
fout, ferr = proc.stdout, proc.stderr

for line in fout:
    print(line.rstrip())
    fo.write(line)

for line in ferr:
    fe.write(line)
... or to reduce it even further, since it seems that the aim is essentially that you just want to write the standard output and standard error from the child process to fo and fe, just do:
proc = subprocess.Popen(["child_script.py"], stdout=fo, stderr=fe)
If you still see the output lines swapped in the file that fo is writing to, then we can only assume that there is some way in which this can happen in the child script. e.g. is the child script multi-threaded? Is one of the lines printed via a callback from another function?
Most of the times I've seen the order of output differ based on how a program was executed, some of the output was sent to the C standard IO stream stdout and some was sent to stderr. The buffering characteristics of stdout and stderr vary depending on whether they are connected to a terminal, a pipe, a file, etc.:
NOTES
The stream stderr is unbuffered. The stream stdout is
line-buffered when it points to a terminal. Partial lines
will not appear until fflush(3) or exit(3) is called, or a
newline is printed. This can produce unexpected results,
especially with debugging output. The buffering mode of
the standard streams (or any other stream) can be changed
using the setbuf(3) or setvbuf(3) call. Note that in case
stdin is associated with a terminal, there may also be
input buffering in the terminal driver, entirely unrelated
to stdio buffering. (Indeed, normally terminal input is
line buffered in the kernel.) This kernel input handling
can be modified using calls like tcsetattr(3); see also
stty(1), and termios(3).
So perhaps you should configure both stdout and stderr to go to the same source, so the same buffering will be applied to both streams.
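A hedged sketch of that suggestion, merging stderr into the stdout pipe so both streams pass through the same file object and the same buffering on the parent side (fo is the log handle from the question):
proc = subprocess.Popen(["child_script.py"],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT)
for line in proc.stdout:    # interleaved stdout + stderr, in arrival order
    fo.write(line)
proc.wait()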
Also, some programs open the terminal directly with open("/dev/tty", ...) (mostly so they can read passwords), so comparing terminal output with pipe output isn't always going to work.
Further, if your program is mixing direct write(2) calls with standard IO calls, the order of output can be different based on the different buffering choices.
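A small hedged demonstration of that last point: when stdout is a pipe, the os.write() line below can appear before the print line, because print goes through the stdio buffer and is only flushed later, while os.write() hits the file descriptor immediately.
import os, sys

print "via print (buffered by stdio)"
os.write(sys.stdout.fileno(), "via os.write (straight to the fd)\n")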
I hope one of these is right :) let me know which, if any.

Bypassing buffering of subprocess output with popen in C or Python

I have a general question about popen (and all related functions), applicable to all operating systems. When I write a Python script or some C code and run the resulting executable from the console (Windows or Linux), I can immediately see the output from the process. However, if I run the same executable as a forked process with its stdout redirected into a pipe, the output buffers somewhere, usually up to 4096 bytes, before it is written to the pipe where the parent process can read it.
The following Python script will generate output in chunks of 1024 bytes:
import os, sys, time

if __name__ == "__main__":
    dye = '#' * 1024
    for i in range(0, 8):
        print dye
        time.sleep(1)
The following Python script will execute the previous script and read the output as soon as it comes to the pipe, byte by byte:
import os, sys, subprocess, time, thread

if __name__ == "__main__":
    execArgs = ["c:\\python25\\python.exe", "C:\\Scripts\\PythonScratch\\byte_stream.py"]
    p = subprocess.Popen(execArgs, bufsize=0, stdout=subprocess.PIPE)
    while p.returncode is None:
        data = p.stdout.read(1)
        sys.stdout.write(data)
        p.poll()
Adjust the path for your operating system. When run in this configuration, the output does not appear in chunks of 1024 bytes but in chunks of 4096, despite the buffer size of the Popen call being set to 0 (which is the default anyway). Can anyone tell me how to change this behaviour? Is there any way I can force the operating system to treat the output from the forked process in the same way as when it is run from the console, i.e. just feed the data through without buffering?
In general, the standard C runtime library (that's running on behalf of just about every program on every system, more or less;-) detects whether stdout is a terminal or not; if not, it buffers the output (which can be a huge efficiency win, compared to unbuffered output).
If you're in control of the program that's doing the writing, you can (as another answer suggested) flush stdout continuously, or (more elegantly if feasible) try to force stdout to be unbuffered, e.g. by running Python with the -u commandline flag:
-u : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
see man page for details on internal buffering relating to '-u'
(what the man page adds is a mention of stdin and issues with binary mode[s]).
If you can't or don't want to touch the program that's writing, -u or the like on the program that's just reading is unlikely to help (the buffering that matters most is the one happening on the writer's stdout, not the one on the reader's stdin). The alternative is to trick the writer into believing that it's writing to a terminal (even though in fact it's writing to another program!), via the pty standard library module or the higher-level third party pexpect module (or, for Windows, its port wexpect).
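A hedged sketch of the pty approach on Linux (it will not work on Windows, where wexpect is the usual substitute); the child script name is the one from the question, and python is assumed to be on PATH:
import os
import pty

pid, fd = pty.fork()
if pid == 0:
    # Child: stdout is now a pseudo-terminal, so stdio stays line-buffered.
    os.execvp("python", ["python", "byte_stream.py"])
else:
    # Parent: read from the master side as output appears.
    while True:
        try:
            data = os.read(fd, 1024)
        except OSError:        # raised once the child closes the pty
            break
        if not data:
            break
        os.write(1, data)
    os.waitpid(pid, 0)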
That's correct, and applies to both Windows and Linux (and possibly other systems), with popen() and fopen(). If you want the output buffer to be flushed before 4096 bytes have accumulated, use fflush() (in C) or sys.stdout.flush() (in Python).
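For the writer in the question, that is a one-line change; a hedged sketch of the flushing version of byte_stream.py:
import sys, time

if __name__ == "__main__":
    dye = '#' * 1024
    for i in range(0, 8):
        print dye
        sys.stdout.flush()   # push each chunk to the pipe immediately
        time.sleep(1)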
