Bypassing buffering of subprocess output with popen in C or Python

I have a general question about popen (and all related functions), applicable to all operating systems. When I write a Python script or some C code and run the resulting executable from the console (Windows or Linux), I can immediately see the output from the process. However, if I run the same executable as a forked process with its stdout redirected into a pipe, the output buffers somewhere, usually up to 4096 bytes, before it is written to the pipe where the parent process can read it.
The following Python script will generate output in chunks of 1024 bytes:
import os, sys, time
if __name__ == "__main__":
    dye = '#' * 1024
    for i in range(0, 8):
        print dye
        time.sleep(1)
The following Python script will execute the previous script and read its output from the pipe as soon as it arrives, byte by byte:
import os, sys, subprocess, time, thread
if __name__ == "__main__":
    execArgs = ["c:\\python25\\python.exe", "C:\\Scripts\\PythonScratch\\byte_stream.py"]
    p = subprocess.Popen(execArgs, bufsize=0, stdout=subprocess.PIPE)
    while p.returncode == None:
        data = p.stdout.read(1)
        sys.stdout.write(data)
        p.poll()
Adjust the path for your operating system. When run in this configuration, the output does not appear in chunks of 1024 bytes but in chunks of 4096 bytes, despite the buffer size of the Popen call being set to 0 (which is the default anyway). Can anyone tell me how to change this behaviour? Is there any way I can force the operating system to treat the output from the forked process the same way as when it is run from the console, i.e. just feed the data through without buffering?

In general, the standard C runtime library (that's running on behalf of just about every program on every system, more or less;-) detects whether stdout is a terminal or not; if not, it buffers the output (which can be a huge efficiency win, compared to unbuffered output).
If you're in control of the program that's doing the writing, you can (as another answer suggested) flush stdout continuously, or (more elegantly if feasible) try to force stdout to be unbuffered, e.g. by running Python with the -u commandline flag:
-u : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
see man page for details on internal buffering relating to '-u'
(what the man page adds is a mention of stdin and issues with binary mode[s]).
If you can't or don't want to touch the program that's writing, -u or the like on the program that's just reading is unlikely to help (the buffering that matters most is the one happening on the writer's stdout, not the one on the reader's stdin). The alternative is to trick the writer into believing that it's writing to a terminal (even though in fact it's writing to another program!), via the pty standard library module or the higher-level third party pexpect module (or, for Windows, its port wexpect).
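For instance, here's a minimal sketch of the pty approach on a Unix-like system (assuming the writer script above is saved as byte_stream.py; the filename and the python command are placeholders for your setup):
import os
import pty
import subprocess

# Give the child a pseudo-terminal as its stdout, so its C runtime
# believes it is writing to a terminal and line-buffers accordingly.
master_fd, slave_fd = pty.openpty()
p = subprocess.Popen(["python", "byte_stream.py"], stdout=slave_fd, close_fds=True)
os.close(slave_fd)                        # the parent only needs the master end
try:
    while True:
        data = os.read(master_fd, 1024)   # returns as soon as the child writes
        if not data:
            break
        os.write(1, data)                 # echo to our own stdout
except OSError:
    pass                                  # on Linux, raised once the child exits
finally:
    os.close(master_fd)
    p.wait()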

That's correct, and it applies to both Windows and Linux (and possibly other systems), with popen() and fopen(). If you want the output buffer to be dispatched before 4096 bytes, use fflush() (in C) or sys.stdout.flush() (in Python).
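For example, a variant of the 1024-byte writer from the question that flushes after every chunk (the C equivalent would call fflush(stdout) after each printf):
import sys
import time

dye = '#' * 1024
for i in range(8):
    sys.stdout.write(dye + '\n')
    sys.stdout.flush()   # push the chunk into the pipe immediately
    time.sleep(1)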

Related

How does subprocess capture the output of a process with a pipe?

What exactly does subprocess do to capture the output of the thing being run?
Is it using some OS hook to direct the output into a shared ram space?
Does it direct the process to write to a file on disk that it then reads? Where is the file?
Network Socket?
Does it do something else?
proc = subprocess.Popen(cmd, stdout = subprocess.PIPE, stderr = subprocess.STDOUT)
Windows 10
Python 3.7
You could check its source code; on Windows it's just a fancy wrapper around CreateProcess, which accepts input, output and error handles that are typically wrapped as Python TextIO buffers (think of them like StringIO or BytesIO buffers).
On Linux, since there is no CreateProcess function, this changes to a fork followed by an exec, after rewiring the standard input and output file descriptors; again, you can check the source code.
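Roughly speaking, the Linux path looks like the sketch below. This is a simplified illustration of the fork/exec dance, not the actual CPython implementation (which also handles close_fds, error reporting and much more); ls -l stands in for an arbitrary child command.
import os

read_end, write_end = os.pipe()   # what stdout=subprocess.PIPE sets up
pid = os.fork()
if pid == 0:                      # child process
    os.close(read_end)
    os.dup2(write_end, 1)         # rewire stdout into the pipe's write end
    os.close(write_end)
    os.execvp("ls", ["ls", "-l"]) # replace the child image with the command
else:                             # parent process
    os.close(write_end)
    with os.fdopen(read_end, "rb") as child_stdout:
        print(child_stdout.read().decode())
    os.waitpid(pid, 0)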

Writing input to a process opened with Popen

I have a program called my_program that operates a system. The program runs on Linux, and I'm trying to automate it using Python.
my_program constantly generates output and is supposed to receive input and respond to it.
When I run my_program in bash it works as it should: I receive constant output from the program, and when I type a certain sequence (for instance /3 to change the mode of the system), the program responds with output.
To start the process I am using:
self.process = Popen(my_program,stdin=PIPE,stdout=PIPE,text=True)
And in order to write input to the system I am using:
self.process.stdin.write('/3')
But the writing does not seem to work. I also tried using:
self.process.communicate('/3')
But since my system constantly generates output, that deadlocks the process and the whole program gets stuck.
Any solution for writing to a process that is constantly generating output?
Edit:
I don't think I can provide code that reproduces the problem because I'm using unique software that my company has, but it goes something like this:
self.process = Popen(my_program, stdin=PIPE, stdout=PIPE, text=True)
self.process.stdin.write('/3')
# try to find a specific string that indicates that the input string was received
string_received = False
while not string_received:
    response = self.process.stdout.readline().strip()
    if response == expected_string:
        break
The operating system implements buffered I/O between processes unless you specifically request otherwise.
Very briefly, the output buffer will be flushed and written when it fills up, or (with line buffering enabled) when you write a newline.
You can request line buffering when you create the Popen object:
self.process = Popen(my_program, stdin=PIPE, stdout=PIPE, text=True, bufsize=1)
... or you can explicitly flush() the file handle when you want to force writing.
self.process.stdin.flush()
However, as the documentation warns you, if you can't predict when the subprocess can read and when it can write, you can easily end up in deadlock. A more maintainable solution might be to run the subprocess via pexpect or similar.
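A minimal pexpect sketch might look like the following (pexpect is a third-party, Unix-only package; my_program and the expected response string are placeholders from the question):
import pexpect

child = pexpect.spawn("my_program", encoding="utf-8", timeout=30)
child.sendline("/3")              # pexpect appends the newline for you
child.expect("expected_string")   # block until the program prints the response
print(child.before)               # everything received before the match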

Python reading Popen continuously (Windows)

I'm trying to stdout.readline() and put the results (i.e. each line, at the time it is printed to the terminal) on a multiprocessing.Queue for use in another .py file. However, the call:
res = subprocess.Popen(command, stdout=subprocess.PIPE, bufsize=1)
with res.stdout:
    for line in iter(res.stdout.readline, b''):
        print line
res.wait()
will block, and the results will be printed only after the process is complete (or not at all if no exit code is returned).
I've browsed SO for answers to this, and tried setting bufsize=1, spawning threads that handle the reading, using file descriptors, etc. None seem to work. I might have to use the pexpect module but I'm not sure how it works yet.
I have also tried
def enqueue_output(self, out, queue):
    for line in iter(out.readline, b''):
        queue.put([line])
    out.close()
In short: how do I make the subprocess output available to me at the time it is printed? The process prints chunks of 1-10 lines at a time; however, these are returned to me only when the process completes, still separated by newlines.
Related:
Python subprocess readlines() hangs
Python: read streaming input from subprocess.communicate()
Non-blocking read on a subprocess.PIPE in python
As explained by @eryksun, and confirmed by your comment, the cause of the buffering is the use of printf by the C application.
By default, printf buffers its output. When the output is directed to a terminal, the buffer is flushed on each newline or when a read occurs; when the output is directed to a file or a pipe, the actual output only happens when the buffer is full.
Fortunately, on Windows there is no low-level buffering (*), so calling setvbuf(stdout, NULL, _IONBF, 0); near the beginning of the program is enough. Unfortunately, you need to turn buffering off entirely (_IONBF), because line buffering on Windows is implemented as full buffering.
(*) On Unix or Linux systems, the underlying system call can add its own buffering. That means that a program using low level write(1, buf, strlen(buf)); will be unbuffered on Windows, but will still be buffered on Linux when standard output is connected to a pipe or a file.
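For completeness, here is roughly what the thread-and-queue pattern from the question looks like as a self-contained sketch (Python 3; your_command is a placeholder for the real executable). Note that it only helps once the writer's own buffering is dealt with as described above:
import subprocess
import threading
from queue import Empty, Queue

def enqueue_output(out, queue):
    # Runs in a background thread: push each line onto the queue as it arrives.
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

proc = subprocess.Popen(["your_command"], stdout=subprocess.PIPE)
q = Queue()
t = threading.Thread(target=enqueue_output, args=(proc.stdout, q), daemon=True)
t.start()

while proc.poll() is None or not q.empty():
    try:
        line = q.get(timeout=0.1)
    except Empty:
        continue
    print(line.decode(errors="replace"), end="")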

In Linux launch a c program from Python handing the c program a large text string as an argument

In Linux, I have a C program that reads a 2048-byte text file as input. I'd like to launch the C program from a Python script, and I'd like the Python script to hand the C program the text string as an argument, instead of writing the text string to a file for the C program to then read.
How can a Python program launch a C program, handing it a ~2K (text) data structure?
Also note, I cannot use subprocess.check_output(); I have to use os.system(). That's because the latter allows my C program direct access to terminal input/output, while the former does not.
You can pass it as an argument by just… passing it as an argument. Presumably you want to quote it rather than passing it as an arbitrary number of arguments that need to be escaped and so on, but that's easy with shlex.quote. For example:
import os, shlex

# shlex.quote needs a str, so read the file in text mode
with open('bigfile.txt', 'r') as infile:
    biginput = infile.read(2048)
os.system('cprogram {}'.format(shlex.quote(biginput)))
If you get an error about the argument or the command line being too long for the shell… then you can't do it. Python can't make the shell do things it can't do, and you refuse to go around the shell (I think because of a misunderstanding, but let's ignore that for the moment). So, you will need some other way to pass the data.
But that doesn't mean you have to store it in a file. You can use the shell from subprocess just as easily as from os.system, which means you can pass it to your child process's stdin:
with subprocess.Popen('cprogram {}'.format(shlex.quote(biginput)),
                      shell=True, stdin=subprocess.PIPE) as p:
    p.communicate(biginput.encode())  # stdin is a byte pipe here
Since you're using shell=True, and not replacing either stdout or stderr, it will get the exact same terminal that it would get with os.system. So, for example, if it's doing, say, isatty(fileno(stdout)), it will be true if your Python script is running in a tty, false otherwise.
As a side note, storing it in a tempfile.NamedTemporaryFile may not cost nearly as much as you expect it to. In particular, the child process will likely be able to read the data you wrote right out of the in-memory disk cache instead of waiting for it to be flushed to disk (and it may never get flushed to disk).
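For instance, a sketch of the temporary-file route, reusing biginput from above (cprogram remains the placeholder name from the question):
import subprocess
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt") as tmp:
    tmp.write(biginput)
    tmp.flush()   # make sure the child can see the data
    subprocess.check_call(["cprogram", tmp.name])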
I suspect that the reason you thought you couldn't use subprocess is that you were using check_output when you wanted check_call.
If you use check_output (or if you explicit pass stdout=PIPE to most other subprocess functions), the child process's stdout is the pipe that you're reading from, so it's obviously not a tty.
This makes sense: either you want to capture the output, in which case the C program can't output to the tty, or you want to let the C program output to the tty, in which case you can't capture it.* So, just don't capture the output, and everything will be fine.
If I'm right, this means you have no reason to use the shell in the first place, which makes everything a whole lot easier. Of course your data might still be larger than the maximum system argument size** or resource limits***, even without the shell. On most modern systems, you can count on at least 64KB, so definitely try it first:
subprocess.check_call(['cprogram', biginput])
But if you get an E2BIG error:
with subprocess.Popen(['cprogram', biginput], stdin=subprocess.PIPE) as p:
    p.communicate(biginput.encode())
* Unless, of course, you want to fake a tty for your child process, in which case you need to look at os.forkpty and related functions, or the pty module.
** On most *BSD and related systems, sysctl kern.argmax and/or getconf ARG_MAX will give you the system limit, or sysconf(_SC_ARG_MAX) from C. There may also be a constant ARG_MAX accessible through <limits.h>. On Linux, things are a bit more complicated, because there are a number of different limits (most of which are very, very high) rather than just one single limit. Check your platform's manpage for execve for the details.
*** On some platforms, including recent linux, RLIMIT_STACK affects the max arg size that you can pass. Again, see your platform's execve manpage.
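(As a side note, Python exposes the same POSIX limit, so you can check it without dropping to C; os.sysconf is Unix-only.)
import os
print(os.sysconf('SC_ARG_MAX'))   # combined size limit for argv + environ, in bytes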

About piping stdio and subprocess.Popen

I have one Python program, that is opening another Python program via subprocess.Popen. The 1st is supposed to output some text into the console (just for info), and write some text to the 2nd program it had spawned. Then, it should wait for the 2nd program to respond (read() from it), and print that response.
The 2nd one is supposed to listen to the first one's input (via raw_input()) and then print text to the 1st.
To understand what exactly was happening, I put a 5-second delay into the 2nd, and the result surprised me a bit.
Here's the code:
import subprocess
print "1st starting."
app = subprocess.Popen("name", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE) #<--- B
print "Writing something to app's STDIN..."
app.stdin.write(some_text)
print "Reading something from my STDIN..." #<--- A
result = app.stdout.read()
print "Result:"
print result
And for the 2nd one:
import time
print "app invoked."
print "Waiting for text from STDIN..."
text = raw_input()
#process(text)
time.sleep(5)
print "magic"
When I ran this code, it paused at point A, as that was the last console output.
After 5 seconds, the "Result:\n" line would be outputted, and everything the 2nd program had printed would show up in the console.
Why did the 1st program pause when reading the stdout of the 2nd one? Does it have to wait for its child to terminate before reading its output? How can this be changed so I can pass messages between programs?
I'm running Debian Linux 7.0.
The answer lies not in any magic related to the subprocess module, but in the typical behaviour of the read() method on Python objects.
If you run this:
import subprocess
p = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
help(p.stdout.read)
You'll see this:
read(...)
read([size]) -> read at most size bytes, returned as a string.
If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.
(END)
The same thing applies to all file-like objects. It's very simple: calling read() with no argument reads the stream until it reaches EOF.
EOF is not sent until either:
the subprocess calls sys.stdout.close(), or
the subprocess exits and the Python runtime and/or OS kernel clean up its file descriptors
Beware that os.read has different behaviour: it is a thin wrapper around the read(2) system call, so it returns as soon as some data is available rather than waiting for EOF. The built-in Python help function isn't much use here, but if you're on any UNIXy system you should be able to run man 2 read; the Python behaviour more or less matches what's there.
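A small illustration of the difference, using ls as a stand-in child:
import os
import subprocess

p = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
# The file object's read() with no argument waits for EOF;
# os.read returns as soon as some data (at most 1024 bytes here) is available.
print(os.read(p.stdout.fileno(), 1024))
p.wait()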
A word of warning
The program above is fine, but patterns like that sometimes lead to a deadlock. The docs for the subprocess module warn about this where Popen.wait() is documented:
Warning
This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
It's possible to get in a similar situation if you're not careful during two-way communication with a subprocess, depending on what the subprocess is doing.
edit:
By the way, this page covers the behaviour of pipes with EOF:
If all file descriptors referring to the write end of a pipe have been
closed, then an attempt to read(2) from the pipe will see end-of-file
(read(2) will return 0).
edit 2:
As Lennart mentioned above, if you want truly two-way communication that goes beyond write-once read-once, you'll also need to beware of buffering. If you read this you'll get some idea of it, but you should be aware that this is how buffered I/O almost always works on UNIX-based systems; it's not a Python quirk. Run man stdio.h for more information.
You are asking program 1 to read input from program 2. And you are pausing program two for five seconds before it outputs anything. Obviously program 1 then needs to wait those five seconds. So what happens is perfectly expected.
Does it have to wait for its child to terminate before reading its output?
To some extent, yes, because input and output are buffered, so it's possible that the same thing would happen even if you moved the delay to after the print.
raw_input() will wait for a linefeed, in any case.
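Putting the pieces together, a sketch of the first program with those traps avoided: run the child with -u so its prints are not buffered, send a newline so raw_input() returns, and use readline() instead of read() so you don't wait for EOF (child.py stands in for the second script):
import subprocess

app = subprocess.Popen(["python", "-u", "child.py"],
                       stdin=subprocess.PIPE, stdout=subprocess.PIPE)
app.stdin.write(b"some text\n")     # raw_input() needs the trailing newline
app.stdin.flush()                   # push it through the pipe right away
first_line = app.stdout.readline()  # readline() returns as soon as a line arrives
print(first_line)
app.stdin.close()                   # send EOF so the child can finish up
print(app.stdout.read())            # read() is safe now: it returns at child exit
app.wait()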
