How does subprocess capture the output of a process with a pipe? - python

What exactly does subprocess do to capture the output of the thing being run?
Is it using some OS hook to direct the output into a shared ram space?
Does it direct the process to write to a file on disk that it then reads? Where is the file?
Network Socket?
Does it do something else?
proc = subprocess.Popen(cmd, stdout = subprocess.PIPE, stderr = subprocess.STDOUT)
Windows 10
Python 3.7

You could check its source code. On Windows it's essentially a wrapper around CreateProcess: Popen creates an anonymous pipe, hands the write end to the child as its standard output/error handle, and wraps the read end in a Python file object (think of it like a StringIO or BytesIO buffer, except it is backed by an OS pipe handle rather than memory you allocated).
On Linux there is no CreateProcess function, so this becomes a pipe followed by fork and exec, with the child's standard file descriptors rewired onto the pipe ends first; again, you can check the source code.
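To make that concrete, here is a minimal POSIX-only sketch (illustrative, not CPython's actual implementation) of what stdout=subprocess.PIPE boils down to: an anonymous kernel pipe plus fork and exec, with the child's stdout file descriptor rewired onto the pipe's write end. On Windows the same idea is expressed with an inheritable pipe handle passed to CreateProcess.
import os

read_fd, write_fd = os.pipe()          # both ends share a kernel pipe buffer

pid = os.fork()
if pid == 0:                           # child process
    os.close(read_fd)
    os.dup2(write_fd, 1)               # point fd 1 (stdout) at the pipe
    os.close(write_fd)
    os.execvp("echo", ["echo", "hello"])
else:                                  # parent process
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as child_stdout:   # roughly what proc.stdout is
        print(child_stdout.read())     # b'hello\n'
    os.waitpid(pid, 0)
So there is no file on disk and no socket: the bytes sit in an in-kernel pipe buffer until the parent reads them.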

Related

What is the simplest and most reliable way to send a command-line command in python, and print the stdout and stderr?

To elaborate on what I'm doing:
I want to create a web-based CLI for my Raspberry Pi. I want to take a websocket and connect it to this Raspberry Pi script, so that the text I type into the webpage will get entered directly into the CLI on the Raspberry Pi, and the response will return to me on the webpage.
My first goal is creating the python script that can properly send a user-inputted command to the CLI and return all responses in the CLI back.
If you just need the return value you can use os.system, but then you won't get the output of stdout and stderr. So you probably have to use the subprocess module, which requires you to split the input text into command and parameters first.
Sounds like you are looking for the python subprocess module in the standard library. This will allow you to interact with the CLI from a python script.
The subprocess module will do this for you but has a few quirks. You can pass in file objects to the various calls to bind to stderr and stdout, but they have to be real file objects. StringIO doesn't cut it.
The below uses check_output() as it grabs stdout for us and saves us opening a file. I'm sure there's a fancier way of doing this.
from tempfile import TemporaryFile
from subprocess import check_output, CalledProcessError

def shell(command):
    stdout = None
    # stderr needs a real file object, so use a temporary file ('w+' = read/write text)
    with TemporaryFile('w+') as fh:
        try:
            stdout = check_output(command, shell=True, stderr=fh)
        except CalledProcessError:
            pass
        # Rewind the file handle to read from the beginning
        fh.seek(0)
        stderr = fh.read()
    return stdout, stderr

print shell("echo hello")[0]
# hello
print shell("not_a_shell_command")[1]
# /bin/sh: 1: not_a_shell_command: not found
As one of the other posters mentions, you should really sanitize your input to prevent security exploits (and drop the shell=True). To be honest, though, your project sounds like you are purposefully building a remote execution exploit for yourself, so it probably doesn't matter.
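For reference, a sketch of the same helper without shell=True (the function name just mirrors the one above; subprocess.run with capture_output requires Python 3.7+):
import shlex
import subprocess

def shell(command):
    result = subprocess.run(
        shlex.split(command),    # "echo hello" -> ["echo", "hello"]
        capture_output=True,     # equivalent to stdout=PIPE, stderr=PIPE
        text=True,               # decode bytes to str
    )
    return result.stdout, result.stderr

print(shell("echo hello")[0])
# hello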

Python reading Popen continuously (Windows)

I'm trying to stdout.readline and put the results (i.e. each line, at the time it is printed to the terminal) on a multiprocessing.Queue for use in another .py file. However, the call:
res = subprocess.Popen(command, stdout=subprocess.PIPE, bufsize=1)
with res.stdout:
    for line in iter(res.stdout.readline, b''):
        print line
res.wait()
Will block and the results will be printed after the process is complete (or not at all if exit code isn't returned).
I've browsed SO for answers to this, and tried setting bufsize=1, spawning threads that handle the reading, using filedescriptors, etc. None seem to work. I might have to use the module pexpect but I'm not sure how it works yet.
I have also tried
def enqueue_output(self, out, queue):
    for line in iter(out.readline, b''):
        queue.put([line])
    out.close()
To put the data on the queue, but since out.readline seems to block, the result will be the same.
In short: how do I make the subprocess output available to me at the time it is printed? It prints chunks of 1-10 lines at a time; however, these are returned to me only when the process completes, separated by newlines as well.
Related:
Python subprocess readlines() hangs
Python: read streaming input from subprocess.communicate()
Non-blocking read on a subprocess.PIPE in python
As explained by @eryksun, and confirmed by your comment, the cause of the buffering is the use of printf by the C application.
By default, printf buffers its output, but the output is flushed on newline or if a read occurs when the output is directed to a terminal. When the output is directed to a file or a pipe, the actual output only occurs when the buffer is full.
Fortunately on Windows, there is no low level buffering (*). That means that calling setvbuf(stdout, NULL, _IONBF, 0); near the beginning of the program would be enough. But unfortunately, you need no buffering at all (_IONBF), because line buffering on Windows is implemented as full buffering.
(*) On Unix or Linux systems, the underlying system call can add its own buffering. That means that a program using low level write(1, buf, strlen(buf)); will be unbuffered on Windows, but will still be buffered on Linux when standard output is connected to a pipe or a file.
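The effect is easy to reproduce without a C program, since Python's own stdout is block-buffered when connected to a pipe, much like printf's. A small sketch (the inline child code is purely illustrative):
import subprocess, sys, time

child_code = (
    "import sys, time\n"
    "for i in range(3):\n"
    "    print('line', i)\n"
    "    # sys.stdout.flush()  # uncomment (or run python -u) for real-time output\n"
    "    time.sleep(1)\n"
)

proc = subprocess.Popen([sys.executable, "-c", child_code],
                        stdout=subprocess.PIPE)
start = time.time()
for line in iter(proc.stdout.readline, b""):
    # Buffered: all three lines arrive together after ~3 s.
    # With flush (or -u): one line per second.
    print(round(time.time() - start, 1), line.decode().rstrip())
proc.wait()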

Using subprocess to launch hadoop job but can't get log from stdout

To simplify my question, here's a Python script:
from subprocess import Popen, PIPE

proc = Popen(['./mr-task.sh'], shell=True, stdout=PIPE, stderr=PIPE)
while True:
    out = proc.stdout.readline()
    print(out)
Here's mr-task.sh; it starts a MapReduce job:
hadoop jar xxx.jar some-conf-we-don't-need-to-care
When I run ./mr-task.sh, I can see the log printed on the screen, something like:
14/12/25 14:56:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/12/25 14:56:44 INFO snappy.LoadSnappy: Snappy native library loaded
14/12/25 14:57:01 INFO mapred.JobClient: Running job: job_201411181108_16380
14/12/25 14:57:02 INFO mapred.JobClient: map 0% reduce 0%
14/12/25 14:57:28 INFO mapred.JobClient: map 100% reduce 0%
But I can't get this output when running the Python script. I tried removing shell=True and fetching stderr instead, and still got nothing.
Does anyone have any idea why this happens?
You could redirect stderr to stdout:
from subprocess import Popen, PIPE, STDOUT

proc = Popen(['./mr-task.sh'], stdout=PIPE, stderr=STDOUT, bufsize=1)
for line in iter(proc.stdout.readline, b''):
    print line,
proc.stdout.close()
proc.wait()
See Python: read streaming input from subprocess.communicate().
in my real program I redirect stderr to stdout and read from stdout, so bufsize is not needed, is it?
The redirection of stderr to stdout and bufsize are unrelated. Changing bufsize might affect the time performance (the default bufsize=0 i.e., unbuffered on Python 2). Unbuffered I/O might be 10..100 times slower. As usual, you should measure the time performance if it is important.
Calling Popen.wait/communicate after the subprocess has terminated is just for clearing the zombie process, and these two methods have no difference in such a case, correct?
The difference is that proc.communicate() closes the pipes before reaping the child process. It releases file descriptors (a finite resource) to be used by other files in your program.
About the buffer: if the output fills the buffer's maximum size, will the subprocess hang? Does that mean that if I use the default bufsize=0 setting I need to read from stdout as soon as possible so that the subprocess doesn't block?
No. It is a different buffer. bufsize controls the buffer inside the parent that is filled/drained when you call .readline() method. There won't be a deadlock whatever bufsize is.
The code (as written above) won't deadlock no matter how much output the child might produce.
The code in #falsetru's answer can deadlock because it creates two pipes (stdout=PIPE, stderr=PIPE) but it reads only from one pipe (proc.stderr).
There are several buffers between the child and the parent, e.g., C stdio's stdout buffer (a libc buffer inside the child process, inaccessible from the parent) and the child's stdout OS pipe buffer (inside the kernel; the parent process may read the data from here). These buffers are fixed in size; they won't grow if you put more data into them. If stdio's buffer overflows (e.g., during a printf() call) then the data is pushed downstream into the child's stdout OS pipe buffer. If nobody reads from the pipe then this OS pipe buffer fills up and the child blocks (e.g., on the write() system call) trying to flush the data.
To be concrete, I've assumed C stdio's based program and POSIXy OS.
The deadlock happens because the parent tries to read from the stderr pipe that is empty because the child is busy trying to flush its stdout. Thus both processes hang.
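A sketch of that failure mode (the inline child code is illustrative): with stdout=PIPE and stderr=PIPE, looping over only one pipe can hang once the other pipe's OS buffer fills, while proc.communicate() drains both and cannot deadlock.
import subprocess, sys

child_code = (
    "import sys\n"
    "sys.stdout.write('x' * 1000000)\n"   # far more than an OS pipe buffer holds
    "sys.stderr.write('done\\n')\n"
)

proc = subprocess.Popen([sys.executable, "-c", child_code],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Reading only proc.stderr in a loop here could block forever;
# communicate() reads both pipes concurrently.
out, err = proc.communicate()
print(len(out), err)        # 1000000 b'done\n'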
One possible reason is that the output is printed to standard error instead of standard output.
Try to replace stdout with stderr:
from subprocess import Popen, PIPE

proc = Popen(['./mr-task.sh'], stdout=PIPE, stderr=PIPE)
while True:
    out = proc.stderr.readline()  # <----
    if not out:
        break
    print(out)

Skip stdin and stderr of child with pexpect

I'm controlling a child process using pexpect (because subprocess doesn't support ptys and I run into a deadlock with two pipes). The process creates a lot of output on stderr, in which I'm not interested, and apparently pexpect also echoes back anything I write to its stdin:
>>> import pexpect
>>> p = pexpect.spawn('rev')
>>> p.sendline('Hello!')
7
>>> p.readline()
'Hello!\r\n'
>>> p.readline()
'!olleH\r\n'
How can I turn this off?
Using ptys is not quite the same as a pipe. If you don't put it in raw mode, the tty driver will echo back the characters and perform other line editing. So to get a clean data path you need to also put the pty/tty in raw mode.
Since you are now dealing with a pseudo device you have only a single I/O stream. There is no distinction there between stdout and stderr (that is a userspace convention). So you will always see stdout and stderr mixed when using a pty/tty.
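For the echo specifically, pexpect can ask the tty driver not to echo. A hedged sketch (setecho may not take effect on every platform, and newer pexpect versions also accept echo=False when calling spawn):
import pexpect

p = pexpect.spawn('rev')
p.setecho(False)             # disable echo on the pty before sending anything
p.sendline('Hello!')
print(p.readline())          # b'!olleH\r\n' -- no echoed 'Hello!' line first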

Bypassing buffering of subprocess output with popen in C or Python

I have a general question about popen (and all related functions), applicable to all operating systems. When I write a Python script or some C code and run the resulting executable from the console (Windows or Linux), I can immediately see the output from the process. However, if I run the same executable as a forked process with its stdout redirected into a pipe, the output buffers somewhere, usually up to 4096 bytes, before it is written to the pipe where the parent process can read it.
The following Python script will generate output in chunks of 1024 bytes:
import os, sys, time

if __name__ == "__main__":
    dye = '#' * 1024
    for i in range(0, 8):
        print dye
        time.sleep(1)
The following Python script will execute the previous script and read the output as soon as it comes to the pipe, byte by byte:
import os, sys, subprocess, time, thread

if __name__ == "__main__":
    execArgs = ["c:\\python25\\python.exe", "C:\\Scripts\\PythonScratch\\byte_stream.py"]
    p = subprocess.Popen(execArgs, bufsize=0, stdout=subprocess.PIPE)
    while p.returncode == None:
        data = p.stdout.read(1)
        sys.stdout.write(data)
        p.poll()
Adjust the path for your operating system. When run in this configuration, the output will not appear in chunks of 1024 but in chunks of 4096, despite the buffer size of the Popen command being set to 0 (which is the default anyway). Can anyone tell me how to change this behaviour? Is there any way I can force the operating system to treat the output from the forked process in the same way as when it is run from the console, i.e., just feed the data through without buffering?
In general, the standard C runtime library (that's running on behalf of just about every program on every system, more or less;-) detects whether stdout is a terminal or not; if not, it buffers the output (which can be a huge efficiency win, compared to unbuffered output).
If you're in control of the program that's doing the writing, you can (as another answer suggested) flush stdout continuously, or (more elegantly if feasible) try to force stdout to be unbuffered, e.g. by running Python with the -u commandline flag:
-u : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
see man page for details on internal buffering relating to '-u'
(what the man page adds is a mention of stdin and issues with binary mode[s]).
If you can't or don't want to touch the program that's writing, -u or the like on the program that's just reading is unlikely to help (the buffering that matters most is the one happening on the writer's stdout, not the one on the reader's stdin). The alternative is to trick the writer into believing that it's writing to a terminal (even though in fact it's writing to another program!), via the pty standard library module or the higher-level third party pexpect module (or, for Windows, its port wexpect).
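A POSIX-only sketch of that trick using the standard pty module (the writer script name is illustrative): because the child sees a terminal on its stdout, the C runtime chooses line buffering rather than full buffering, and the reader gets each line as it is produced.
import os
import pty
import subprocess
import sys

master_fd, slave_fd = pty.openpty()                      # pseudo-terminal pair
proc = subprocess.Popen([sys.executable, "writer.py"],   # illustrative writer script
                        stdout=slave_fd)
os.close(slave_fd)                                       # parent keeps only the master end

try:
    while True:
        chunk = os.read(master_fd, 1024)    # available as soon as the child emits a line
        if not chunk:
            break
        sys.stdout.buffer.write(chunk)
        sys.stdout.buffer.flush()
except OSError:
    pass                                    # EIO on Linux once the child exits
finally:
    os.close(master_fd)
    proc.wait()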
That's correct, and it applies to both Windows and Linux (and possibly other systems), with popen() and fopen(). If you want the output buffer to be dispatched before 4096 bytes, use fflush() (in C) or sys.stdout.flush() (Python).
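For example, if you control the writer above, flushing after each chunk (a sketch of the modified generator script) makes the data reach the pipe immediately:
import sys
import time

for i in range(8):
    sys.stdout.write('#' * 1024 + '\n')
    sys.stdout.flush()      # push this chunk into the pipe now, don't wait for 4096 bytes
    time.sleep(1)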
