I'm using the following code to run another Python script. The problem I'm facing is that the output of that script is coming out of order.
While running it from the command line, I get the correct output, i.e.:
some output here
Editing xml file and saving changes
Uploading xml file back..
While running the script using subprocess, I'm getting some of the output in reverse order:
correct output till here
Uploading xml file back..
Editing xml file and saving changes
The script is executing without errors and making the right changes. So I think the culprit might be the code that is calling the child script, but I can't find the problem:
cmd = "child_script.py"
proc = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
(fout ,ferr) = ( proc.stdout, proc.stderr )
print "Going inside while - loop"
while True:
line = proc.stdout.readline()
print line
fo.write(line)
try :
err = ferr.readline()
fe.write(err)
except Exception, e:
pass
if not line:
pass
break
[EDIT]: fo and fe are file handles to the output and error logs. Also, the script is being run on Windows. Sorry for missing these details.
There are a few problems with the part of the script you've quoted, I'm afraid:
As mentioned in detly's comment, what are fo and fe? Presumably those are objects to which you're writing the output of the child process? (Update: you indicate that these are both for writing output logs.)
There's an indentation error on line 3. (Update: I've fixed that in the original post.)
You're specifying stderr=subprocess.STDOUT, so: (a) ferr will always be None in your loop and (b) due to buffering, standard output and error may be mixed in an unpredictable way. However, it looks from your code as if you actually want to deal with standard output and standard error separately, so perhaps try stderr=subprocess.PIPE instead.
It would be a good idea to rewrite your loop as jsbueno suggests:
from subprocess import Popen, PIPE

proc = Popen(["child_script.py"], stdout=PIPE, stderr=PIPE)
fout, ferr = proc.stdout, proc.stderr
for line in fout:
    print(line.rstrip())
    fo.write(line)
for line in ferr:
    fe.write(line)
... or, to reduce it even further, since the aim seems to be simply to write the standard output and standard error of the child process to fo and fe, just do:
proc = subprocess.Popen(["child_script.py"], stdout=fo, stderr=fe)
If you still see the output lines swapped in the file that fo is writing to, then we can only assume that there is some way in which this can happen in the child script. e.g. is the child script multi-threaded? Is one of the lines printed via a callback from another function?
Most of the time when I've seen the order of output differ between runs, some of the output was sent to the C standard IO stream stdout and some was sent to stderr. The buffering characteristics of stdout and stderr vary depending on whether they are connected to a terminal, pipes, files, etc.:
NOTES
The stream stderr is unbuffered. The stream stdout is
line-buffered when it points to a terminal. Partial lines
will not appear until fflush(3) or exit(3) is called, or a
newline is printed. This can produce unexpected results,
especially with debugging output. The buffering mode of
the standard streams (or any other stream) can be changed
using the setbuf(3) or setvbuf(3) call. Note that in case
stdin is associated with a terminal, there may also be
input buffering in the terminal driver, entirely unrelated
to stdio buffering. (Indeed, normally terminal input is
line buffered in the kernel.) This kernel input handling
can be modified using calls like tcsetattr(3); see also
stty(1), and termios(3).
So perhaps you should configure both stdout and stderr to go to the same destination, so that the same buffering will be applied to both streams.
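For what it's worth, a minimal sketch of that suggestion, reusing the names from the question (fo is the output log handle): stderr is merged into stdout so both streams share one pipe and one buffering policy. The -u flag is an extra step beyond what's described above; it turns off the child interpreter's own output buffering.

import subprocess, sys

proc = subprocess.Popen(
    [sys.executable, "-u", "child_script.py"],   # -u: run the child unbuffered
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,                    # merge stderr into stdout
)
for line in proc.stdout:
    fo.write(line)                               # fo: output log handle from the question
proc.wait()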
Also, some programs open the terminal directly with open("/dev/tty", ...) (mostly so they can read passwords), so comparing terminal output with pipe output isn't always going to work.
Further, if your program is mixing direct write(2) calls with standard IO calls, the order of output can be different based on the different buffering choices.
I hope one of these is right :) let me know which, if any.
Related
I have a program called my_program that operates a system. The program runs on Linux, and I'm trying to automate it using Python.
my_program is constantly generating output and is supposed to receive input and respond to it.
When I'm running my_program in bash it works as it should: I receive constant output from the program, and when I press a certain sequence (for instance /3 to change the mode of the system), the program responds with output.
To start the process I am using:
self.process = Popen(my_program, stdin=PIPE, stdout=PIPE, text=True)
And in order to write input to the system I am using:
self.process.stdin.write('/3')
But the writing does not seem to work, I also tried using:
self.process.communicate('/3')
But since my system constantly generates output, it deadlocks the process and the whole program gets stuck.
Any solution for writing to a process that is constantly generating output?
Edit:
I don't think I can provide code that reproduces the problem because I'm using unique software that my company has, but it goes something like this:
self.process = Popen(my_program, stdin=PIPE, stdout=PIPE, text=True)
self.process.stdin.write('/3')
# try to find a specific string that indicates that the input string was received
string_received = False
while string_received == False:
    response = self.process.stdout.readline().strip()
    if response == expected_string:
        break
The operating system implements buffered I/O between processes unless you specifically request otherwise.
In very brief, the output buffer will be flushed and written when it fills up, or (with default options) when you write a newline.
You can reduce the buffering when you create the Popen object (with text=True, bufsize=1 selects line buffering):
self.process = Popen(my_program, stdin=PIPE, stdout=PIPE, text=True, bufsize=1)
... or you can explicitly flush() the file handle when you want to force writing.
self.process.stdin.flush()
However, as the documentation warns you, if you can't predict when the subprocess can read and when it can write, you can easily end up in deadlock. A more maintainable solution might be to run the subprocess via pexpect or similar.
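For completeness, here is a minimal sketch of the pexpect route (a third-party, POSIX-only library), reusing my_program, '/3' and expected_string from the question; treat it as an outline rather than a drop-in implementation.

import pexpect

child = pexpect.spawn(my_program, encoding="utf-8")
child.sendline("/3")               # sendline appends the newline for you
child.expect(expected_string)      # blocks until the pattern (a regex) shows up
print(child.before + child.after)  # everything seen up to and including the match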
I'm trying to readline() from stdout and put the results (i.e. each line, at the time it is printed to the terminal) on a multiprocessing.Queue for use in another .py file. However, the call:
res = subprocess.Popen(command, stdout=subprocess.PIPE, bufsize=1)
with res.stdout:
    for line in iter(res.stdout.readline, b''):
        print line
res.wait()
will block, and the results will only be printed after the process completes (or not at all if no exit code is returned).
I've browsed SO for answers to this, and tried setting bufsize=1, spawning threads that handle the reading, using file descriptors, etc. None seem to work. I might have to use the pexpect module, but I'm not sure how it works yet.
I have also tried
def enqueue_output(self, out, queue):
    for line in iter(out.readline, b''):
        queue.put([line])
    out.close()
To put the data on the queue, but since out.readline seems to block, the result will be the same.
In short: how do I make the subprocess output available to me at the time it is printed? The child prints chunks of 1-10 lines at a time; however, these only reach me when the process completes, still separated by newlines.
Related:
Python subprocess readlines() hangs
Python: read streaming input from subprocess.communicate()
Non-blocking read on a subprocess.PIPE in python
As explained by @eryksun, and confirmed by your comment, the cause of the buffering is the use of printf by the C application.
By default, printf buffers its output, but the output is flushed on newline or if a read occurs when the output is directed to a terminal. When the output is directed to a file or a pipe, the actual output only occurs when the buffer is full.
Fortunately on Windows, there is no low level buffering (*). That means that calling setvbuf(stdout, NULL, _IONBF, 0); near the beginning of the program would be enough. But unfortunately, you need no buffering at all (_IONBF), because line buffering on Windows is implemented as full buffering.
(*) On Unix or Linux systems, the underlying system call can add its own buffering. That means that a program using low level write(1, buf, strlen(buf)); will be unbuffered on Windows, but will still be buffered on Linux when standard output is connected to a pipe or a file.
I have one Python program that opens another Python program via subprocess.Popen. The 1st is supposed to output some text to the console (just for info) and write some text to the 2nd program it has spawned. Then, it should wait for the 2nd program to respond (read() from it), and print that response.
The 2nd one is supposed to listen for input from the first one (via raw_input()) and then print text back to the 1st.
To understand what exactly was happening, I had put a 5 second delay into the 2nd, and the result surprised me a bit.
Here's the code:
import subprocess
print "1st starting."
app = subprocess.Popen("name", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE) #<--- B
print "Writing something to app's STDIN..."
app.stdin.write(some_text)
print "Reading something from my STDIN..." #<--- A
result = app.stdout.read()
print "Result:"
print result
And for the 2nd one:
import time
print "app invoked."
print "Waiting for text from STDIN..."
text = raw_input()
#process(text)
time.sleep(5)
print "magic"
When I ran this code, it paused at point A, as that was the last console output.
After 5 seconds, the "Result:\n" line would be printed, and everything the 2nd program had printed would show up in the console.
Why did the 1st program pause when reading the stdout of the 2nd one? Does it have to wait for its child to terminate before reading its output? How can this be changed so I can pass messages between programs?
I'm running Debian Linux 7.0.
The answer lies not in any magic related to the subprocess module, but in the typical behaviour of the read() method on Python objects.
If you run this:
import subprocess
p = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
help(p.stdout.read)
You'll see this:
read(...)
read([size]) -> read at most size bytes, returned as a string.
If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.
(END)
The same thing applies to all file-like objects. It's very simple: calling read() with no argument consumes the stream until it reaches EOF.
EOF is not sent until either:
the subprocess calls sys.stdout.close(), or
the subprocess exits and the Python runtime and/or OS kernel clean up its file descriptors
Beware that os.read has different behaviour - it is a much thinner wrapper around the underlying read system call. The built-in Python help function isn't much use there, but if you're on any UNIXy system you should be able to run man 2 read; the Python behaviour more or less matches what's there.
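As an illustration of the difference, here is a minimal sketch of line-by-line interaction instead of read()-to-EOF; the script name and the message are placeholders, and the child still has to flush its own stdout (or run unbuffered) for each line to arrive promptly.

import subprocess, sys

app = subprocess.Popen([sys.executable, "second_program.py"],
                       stdin=subprocess.PIPE, stdout=subprocess.PIPE)
app.stdin.write(b"some text\n")   # raw_input()/input() needs the trailing newline
app.stdin.flush()                 # push it past the parent-side buffer
result = app.stdout.readline()    # returns as soon as one line is available
print(result)
app.stdin.close()
app.wait()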
A word of warning
The program above is fine, but patterns like that sometimes lead to a deadlock. The docs for the subprocess module warn about this where Popen.wait() is documented:
Warning
This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
It's possible to get in a similar situation if you're not careful during two-way communication with a subprocess, depending on what the subprocess is doing.
edit:
By the way, this page covers the behaviour of pipes with EOF:
If all file descriptors referring to the write end of a pipe have been
closed, then an attempt to read(2) from the pipe will see end-of-file
(read(2) will return 0).
edit 2:
As Lennart mentioned above, if you want truly two-way communication that goes beyond write-once read-once, you'll also need to beware of buffering. If you read this you'll get some idea of it, but you should be aware that this is how buffered IO almost always works in UNIX-based systems - it's not a Python quirk. Run man stdio.h for more information.
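On the child side that usually means flushing after each print (or running the second script with python -u / PYTHONUNBUFFERED=1); a sketch of the second program in the same Python 2 style as the question, with the flushes added:

import sys
import time

print "app invoked."
sys.stdout.flush()                  # without this the line sits in the stdio buffer
print "Waiting for text from STDIN..."
sys.stdout.flush()
text = raw_input()
time.sleep(5)
print "magic"
sys.stdout.flush()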
You are asking program 1 to read input from program 2. And you are pausing program two for five seconds before it outputs anything. Obviously program 1 then needs to wait those five seconds. So what happens is perfectly expected.
Does it have to wait for its child to terminate before reading its output?
To some extent, yes, because input and output is buffered, so it's possible that even if you move the delay to after you print something the same will happen.
raw_input() will wait for a linefeed, in any case.
I am running a sub-program using subprocess.Popen. When I start my Python program from the command window (cmd.exe), the program writes some info and dates to the window as the program runs.
When I run my Python code not in a command window, it opens a new command window for this sub-program's output, and I want to avoid that. When I used the following code, it doesn't show the cmd window, but it also doesn't print the status:
p = subprocess.Popen("c:/flow/flow.exe", shell=True, stdout=subprocess.PIPE)
print p.stdout.read()
How can I show the sub-program's output in my program's output as it occurs?
Use this:
cmd = subprocess.Popen(["c:/flow/flow.exe"], stdout=subprocess.PIPE)
for line in cmd.stdout:
    print line.rstrip("\n")
cmd.wait()  # you may already be handling this in your current code
Note that you will still have to wait for the sub-program to flush its stdout buffer (which is commonly buffered differently when not writing to a terminal window), so you may not see each line instantaneously as the sub-program prints it (this depends on various OS details and details of the sub-program).
Also notice how I've removed the shell=True and replaced the string argument with a list, which is generally recommended.
Looking for a recipe to process Popen data asynchronously I stumbled upon http://code.activestate.com/recipes/576759-subprocess-with-async-io-pipes-class/
This looks quite promising; however, I got the impression that there might be some typos in it. I haven't tried it yet.
It is an old post, but a common problem with a hard to find solution. Try this: http://code.activestate.com/recipes/440554-module-to-allow-asynchronous-subprocess-use-on-win/
I have some Python code that executes an external app which works fine when the app has a small amount of output, but hangs when there is a lot. My code looks like:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
errcode = p.wait()
retval = p.stdout.read()
errmess = p.stderr.read()
if errcode:
    log.error('cmd failed <%s>: %s' % (errcode, errmess))
There are comments in the docs that seem to indicate the potential issue. Under wait, there is:
Warning: This will deadlock if the child process generates enough output to a stdout or stderr pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
though under communicate, I see:
Note The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
So it is unclear to me whether I should use either of these if I have a large amount of data. They don't indicate what method I should use in that case.
I do need the return value from the call, and I do parse and use both stdout and stderr.
So what is an equivalent method in Python to exec an external app that is going to have large output?
You're doing blocking reads to two files; the first needs to complete before the second starts. If the application writes a lot to stderr, and nothing to stdout, then your process will sit waiting for data on stdout that isn't coming, while the program you're running sits there waiting for the stuff it wrote to stderr to be read (which it never will be--since you're waiting for stdout).
There are a few ways you can fix this.
The simplest is to not intercept stderr; leave stderr=None. Errors will be output to stderr directly. You can't intercept them and display them as part of your own message. For commandline tools, this is often OK. For other apps, it can be a problem.
Another simple approach is to redirect stderr to stdout, so you only have one incoming file: set stderr=STDOUT. This means you can't distinguish regular output from error output. This may or may not be acceptable, depending on how the application writes output.
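A minimal sketch of that second option, reusing cmd and log from the question; communicate() drains the single merged pipe, so the deadlock described above cannot happen.

import subprocess

p = subprocess.Popen(cmd, shell=True,
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output, _ = p.communicate()
if p.returncode:
    log.error('cmd failed <%s>: %s' % (p.returncode, output))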
The complete and complicated way of handling this is select (http://docs.python.org/library/select.html). This lets you read in a non-blocking way: you get data whenever data appears on either stdout or stderr. I'd only recommend this if it's really necessary. This probably doesn't work in Windows.
Reading stdout and stderr independently with very large output (i.e., lots of megabytes) using select:
import subprocess, select

proc = subprocess.Popen(cmd, bufsize=8192, shell=False,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
with open(outpath, "wb") as outf:
    dataend = False
    while (proc.returncode is None) or (not dataend):
        proc.poll()
        dataend = False
        ready = select.select([proc.stdout, proc.stderr], [], [], 1.0)
        if proc.stderr in ready[0]:
            data = proc.stderr.read(1024)
            if len(data) > 0:
                handle_stderr_data(data)
        if proc.stdout in ready[0]:
            data = proc.stdout.read(1024)
            if len(data) == 0:  # Read of zero bytes means EOF
                dataend = True
            else:
                outf.write(data)
"A lot of output" is subjective, so it's a little difficult to make a recommendation. If the amount of output is really large then you likely don't want to grab it all with a single read() call anyway. You may want to try writing the output to a file and then pulling the data in incrementally, like so:
f = file('data.out', 'w')
p = subprocess.Popen(cmd, shell=True, stdout=f, stderr=subprocess.PIPE)
errcode = p.wait()
f.close()
if errcode:
    errmess = p.stderr.read()
    log.error('cmd failed <%s>: %s' % (errcode, errmess))
for line in file('data.out'):
    # do something
Glenn Maynard is right in his comment about deadlocks. However, the best way of solving this problem is to create two threads, one for stdout and one for stderr, which read those respective streams until exhausted and do whatever you need with the output.
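A minimal sketch of that idea, again reusing cmd, log, retval and errmess from the question; each thread drains one pipe to the end, so the child can never block on a full pipe buffer.

import subprocess, threading

def drain(pipe, parts):
    # Read the pipe to EOF, collecting everything into parts.
    for chunk in iter(lambda: pipe.read(4096), b''):
        parts.append(chunk)
    pipe.close()

p = subprocess.Popen(cmd, shell=True,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out_parts, err_parts = [], []
readers = [threading.Thread(target=drain, args=(p.stdout, out_parts)),
           threading.Thread(target=drain, args=(p.stderr, err_parts))]
for t in readers:
    t.start()
errcode = p.wait()
for t in readers:
    t.join()
retval = b''.join(out_parts)
errmess = b''.join(err_parts)
if errcode:
    log.error('cmd failed <%s>: %s' % (errcode, errmess))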
The suggestion of using temporary files may or may not work for you depending on the size of output etc. and whether you need to process the subprocess' output as it is generated.
As Heikki Toivonen has suggested, you should look at the communicate method. However, this buffers the stdout/stderr of the subprocess in memory and you get those returned from the communicate call - this is not ideal for some scenarios. But the source of the communicate method is worth looking at.
Another example is in a package I maintain, python-gnupg, where the gpg executable is spawned via subprocess to do the heavy lifting, and the Python wrapper spawns threads to read gpg's stdout and stderr and consume them as data is produced by gpg. You may be able to get some ideas by looking at the source there, as well. Data produced by gpg to both stdout and stderr can be quite large, in the general case.
I had the same problem. If you have to handle a large amount of output, another good option is to use files for stdout and stderr, and pass those files as parameters.
Check the tempfile module in python: https://docs.python.org/2/library/tempfile.html.
Something like this might work:
out = tempfile.NamedTemporaryFile(delete=False)
Then you would do:
Popen(... stdout=out,...)
Then you can read the file, and erase it later.
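Putting those pieces together, a fuller sketch (cmd and log are the names from the question, and the per-line processing is left as a placeholder):

import os, subprocess, tempfile

out = tempfile.NamedTemporaryFile(delete=False)
err = tempfile.NamedTemporaryFile(delete=False)
p = subprocess.Popen(cmd, shell=True, stdout=out, stderr=err)
errcode = p.wait()
out.close()
err.close()
if errcode:
    with open(err.name) as f:
        log.error('cmd failed <%s>: %s' % (errcode, f.read()))
with open(out.name) as f:
    for line in f:
        pass            # process each line of output here
os.unlink(out.name)     # erase the files once you're done with them
os.unlink(err.name)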
You could try communicate and see if that solves your problem. If not, I'd redirect the output to a temporary file.
Here is a simple approach which captures both regular output and error output, all within Python, so limitations in stdout don't apply:
com_str = 'uname -a'
command = subprocess.Popen([com_str], stdout=subprocess.PIPE, shell=True)
(output, error) = command.communicate()
print output
Linux 3.11.0-20-generic SMP Fri May 2 21:32:55 UTC 2014
and
com_str = 'id'
command = subprocess.Popen([com_str], stdout=subprocess.PIPE, shell=True)
(output, error) = command.communicate()
print output
uid=1000(myname) gid=1000(mygrp) groups=1000(cell),0(root)