Python close_fds not clear

I had an issue with close_fds in Python 2.7, so after doing some research I found this example:
from subprocess import Popen, PIPE, STDOUT
p1 = Popen(['cat'], stdin=PIPE, stdout=PIPE)
p2 = Popen(['grep', 'a'], stdin=p1.stdout, stdout=PIPE)
p1.stdin.write("aaaaaaaaaaaaaaaa\n")
p1.stdin.close()
p2.stdout.read()
My problem is that I can't understand why p1.stdin remains open. p1 is not a child of p2, so p2 shouldn't inherit any p1 resource except p1.stdout, which is explicitly passed. Furthermore, why does setting close_fds=True on p2 resolve the issue? The documentation says:
If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed.
So even if I could understand the inheritance between p1 and p2, p1.stdin still shouldn't be closed by close_fds=True, because it is the standard input (0).

Since p1 and p2 are siblings, there is no inheritance going on between their corresponding processes directly.
However, consider the file descriptor that the parent sees as p1.stdin, inherited by p1 and redirected to its stdin. This file descriptor exists in the parent process (with a number other than 0, 1, or 2 - you can verify this by printing p1.stdin.fileno()), and it has to exist, because we intend to write to it from the parent. It is this file descriptor that is unintentionally inherited and kept open by p2.
When an open file is referenced by multiple file descriptors, as is the case with p1.stdin, it is only closed when all the descriptors are closed. This is why it is necessary to both close p1.stdin and pass close_fds to p2. (If you implemented the spawning code manually, you would simply close the file descriptor after the second fork().)
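As a minimal sketch of the fix (assuming Python 2.7, where close_fds defaults to False; on Python 3.2+ it already defaults to True), pass close_fds=True when creating p2 so it does not keep a copy of the parent's write end of p1's stdin pipe:
from subprocess import Popen, PIPE

p1 = Popen(['cat'], stdin=PIPE, stdout=PIPE)
# close_fds=True: p2 closes every inherited descriptor except 0, 1 and 2,
# so it does not hold on to the parent's write end of p1's stdin pipe.
p2 = Popen(['grep', 'a'], stdin=p1.stdout, stdout=PIPE, close_fds=True)
p1.stdin.write("aaaaaaaaaaaaaaaa\n")
p1.stdin.close()            # the last write end is now gone, so cat sees EOF and exits
print(p2.stdout.read())     # no longer blocks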

Under what condition does a Python subprocess get a SIGPIPE?

I am reading the Python documentation on the Popen class in the subprocess module and I came across the following code:
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
The documentation also states that
"The p1.stdout.close() call after starting the p2 is important in order for p1 to receive a SIGPIPE if p2 exits before p1.
Why must p1.stdout be closed before we can receive a SIGPIPE, and how does p1 know that p2 exited before p1 if we already closed it?
SIGPIPE is a signal that would be sent to dmesg if it tried to write to a pipe with no readers left. Here, dmesg's stdout pipe ends up with two read handles: one in your Python process and one in the grep process.
That's because subprocess clones file handles (using the os.dup2() function). Configuring p2 with stdin=p1.stdout triggers an os.dup2() call that asks the OS to duplicate the pipe file handle; the duplicate is what connects dmesg to grep.
With two open read handles on dmesg's stdout, dmesg is never sent a SIGPIPE if only one of them closes early, so grep exiting would never be detected and dmesg would needlessly continue to produce output.
So by closing p1.stdout immediately, you ensure that the only remaining filehandle reading from dmesg stdout is the grep process, and if that process were to exit, dmesg receives a SIGPIPE.
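If you want to see the effect directly, here is a small illustrative sketch that substitutes yes (which writes forever) for dmesg and head for grep; the negative return code shows that the writer was killed by SIGPIPE once its only remaining reader exited:
from subprocess import Popen, PIPE

p1 = Popen(["yes"], stdout=PIPE)
p2 = Popen(["head", "-n", "3"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()            # drop the parent's duplicate of the read end
print(p2.communicate()[0])   # head reads three lines and exits
p1.wait()
print(p1.returncode)         # -13 on Linux: yes was terminated by SIGPIPE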

Change Output Redirection of Running Process

I have a parent Python script that launches a child (which launches grandchildren), and after some time, I terminate the child, but the grandchildren continue to pump to stdout. After I kill the child, I want to suppress/redirect the stdout and stderr of the grandchildren (and all their descendants).
Here is the parent:
import time
import subprocess
proc = subprocess.Popen('./child.sh')
print("Dad: I have begotten a son!")
time.sleep(1)
proc.kill()
proc.wait()
print("Dad: My son hath died!")
time.sleep(2)
print("Dad: Why does my grandson still speak?")
Here is the child script which I cannot modify.
#!/bin/bash
./grandchild.sh &
echo "Child: I had a son!"
for (( i = 0; i < 10; i++ )); do
    echo "Child: Hi Dad, meet your grandson!"
    sleep 1
done
exit 0
Here is a noisy grandchild which I cannot modify.
#!/bin/bash
for (( i = 0; i < 10; i++ )); do
    echo "Grandchild: Wahhh"
    sleep 1
done
exit 0
I tried doing this right before killing the child:
import os
f = open(os.devnull,"w")
proc.stdout = proc.stderr = f
But it doesn't seem to work. The output is:
> ./parent.py
Dad: I have begotten a son!
Child: I had a son!
Child: Hi Dad, meet your grandson!
Grandchild: Wahhh
Dad: My son hath died!
Grandchild: Wahhh
Grandchild: Wahhh
Dad: Why does my grandson still speak?
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
When you invoke subprocess.Popen you can tell it to redirect stdout and/or stderr. If you don't, they are left un-redirected: the child simply inherits the Python process's actual STDOUT_FILENO and STDERR_FILENO (the fixed file descriptors 1 and 2).
This means that if Python's fds 1 and 2 are going to your tty session (perhaps on an underlying device like /dev/pts/0, for instance), the child, and in this case the grandchild as well, is talking directly to that same session (the same /dev/pts/0). Nothing you do in the Python process itself can change this: those are independent processes with independent, direct access to the session.
What you can do is invoke ./child.sh with redirection in place:
proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
Quick side-note edit: if you want to discard all output from the child and its grandchildren, open os.devnull (either as you did, or with os.open() to get a raw integer file descriptor) and connect stdout and stderr to the underlying file descriptor. If you have opened it as a Python stream:
f = open(os.devnull, "w")
then the underlying file descriptor is f.fileno():
proc = subprocess.Popen('./child.sh', stdout=f.fileno(), stderr=f.fileno())
In this case you cannot get any output from any of the processes involved.
Back to the stdout=subprocess.PIPE case: file descriptor 1 in the child is now connected to a pipe-entity, rather than directly to the session. (Since there is no stderr= above, fd 2 in the child is still connected directly to the session.)
The pipe-entity, which lives inside the operating system, simply copies from one end (the "write end" of the pipe) to the other (the "read end"). Your Python process has control of the read-end. You must invoke the OS read system call—often not directly, but see below—on that read end, to collect the output from it.
In general, if you stop reading from your read-end, the pipe "fills up" and any process attempting an OS-level write on the write-end is "blocked" until someone with access to the read end (that's you, again) reads from it.
If you discard the read-end, leaving the pipe with nowhere to dump its output, the write end starts returning EPIPE errors and delivering SIGPIPE signals to any process attempting an OS-level write call. This kind of discard occurs when you call the OS-level close system call, assuming you have not handed the descriptor off to some other process(es). It also occurs when your process exits (under the same assumption, again).
There is no convenient method by which you can connect the read-end to an infinite data sink like /dev/null, at least in most Unix-like systems (there are a few with some special funky system calls to do this kind of "plumbing"). But if you plan to kill the child and are willing to let its grandchildren die from SIGPIPE signals, you can simply close the descriptor (or exit) and let the chips fall where they may.
Children and grandchildren can protect themselves from dying by setting SIGPIPE to SIG_IGN, or by blocking SIGPIPE. Signal masks are inherited across exec system calls, so in some cases you can block SIGPIPE for children (but some children will unblock signals).
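As a rough sketch of that last point (assuming Python 3.3+ for signal.pthread_sigmask), you could block SIGPIPE in the child just before exec via preexec_fn; the mask survives the exec, though the child is free to unblock it again:
import signal
import subprocess

def block_sigpipe():
    # runs in the child between fork() and exec(); the mask is inherited across exec
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGPIPE})

proc = subprocess.Popen('./child.sh', preexec_fn=block_sigpipe)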
If closing the descriptor is not suitable, you can create a new process that simply reads and discards incoming pipe data. If you use the fork system call, this is trivial. Alternatively some Unix-like systems allow you to pass file descriptors through AF_UNIX sockets to otherwise-unrelated (parent/child-wise) processes, so you could have a daemon that does this, reachable via an AF_UNIX socket. (This is nontrivial to code.)
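A bare-bones sketch of the fork() approach (spawn_sink is a hypothetical helper; read_fd is a raw file descriptor you own, e.g. proc.stdout.fileno()):
import os

def spawn_sink(read_fd):
    # fork a helper that drains read_fd and discards the data, so writers never block
    pid = os.fork()
    if pid == 0:                            # child: read until EOF, throw everything away
        while os.read(read_fd, 65536):
            pass
        os._exit(0)
    os.close(read_fd)                       # parent gives up its copy of the read end
    return pid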
If you wish the child process to send its stderr output to the same pipe, so that you can read both its stdout and its stderr, simply add stderr=subprocess.STDOUT to the Popen() call. If you wish the child process to send its stderr output to a separate pipe, add stderr=subprocess.PIPE. If you do the latter, however, things can get a bit tricky.
To prevent children from blocking, as noted above, you must invoke the OS read call. If there is only one pipe this is easy:
for line in proc.stdout:
    ...
for instance, or:
line = proc.stdout.readline()
will read the pipe one line at a time (modulo buffering inside Python). You can read as many or as few lines as you like.
If there are two pipes, though, you must read whichever one(s) is/are "full". Python's subprocess module provides the communicate() method to do this for you:
stdout, stderr = proc.communicate()
The drawback here is that communicate() reads to completion: it needs to get all output that can go to the write end of each pipe. This means it repeatedly calls the OS-level read operation until read indicates end-of-data. That occurs only when all processes that had, at some point, write access to the write end of the corresponding pipe, have closed that end of the pipe. In other words, it waits for the child and any grandchildren to close the descriptors connected to the write end of the pipe(s).
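For instance, a minimal two-pipe variant of the earlier call might look like this (a sketch; './child.sh' as above):
import subprocess

proc = subprocess.Popen('./child.sh',
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
out, err = proc.communicate()   # drains both pipes concurrently, returns once each hits EOF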
In general it's much simpler to use only one pipe, read as much (but only as much) as you like, then simply close the pipe:
proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
line1 = proc.stdout.readline()
line2 = proc.stdout.readline()
# that's all we care about
proc.stdout.close()
proc.kill()
status = proc.wait()
Whether this suffices depends on your particular problem.
If you don't care about the grandchildren, you could kill them all:
#!/usr/bin/env python3
import os
import signal
import subprocess
import sys
import time
proc = subprocess.Popen('./child.sh', start_new_session=True)
print("Dad: I have begotten a son!")
time.sleep(1)
print("Dad: kill'em all!")
os.killpg(proc.pid, signal.SIGKILL)
for msg in "dead... silence... crickets... chirping...".split():
    time.sleep(1)
    print(msg, end=' ', flush=True)
You can emulate start_new_session=True on old Python versions using preexec_fn=os.setsid. See Best way to kill all child processes.
You can collect children's output before the killing:
#!/usr/bin/env python
import collections
import os
import signal
import threading
from subprocess import Popen, PIPE, STDOUT
def killall(proc):
    print "Dad: kill'em all!"
    os.killpg(proc.pid, signal.SIGKILL)
    proc.wait()
proc = Popen('./child.sh', stdout=PIPE, stderr=STDOUT, preexec_fn=os.setsid)
print("Dad: I have begotten a son!")
# kill in a second
hitman = threading.Timer(1, killall, [proc])
hitman.start()
# save last 200 lines of output
q = collections.deque(proc.stdout, maxlen=200)
hitman.cancel()
proc.wait()
# print collected output
print '*'*60
print ''.join(q).decode('ascii'),
print '*'*60
See Stop reading process output in Python without hang?
Right now, your subprocess is allowed to communicate with your terminal via STDOUT and STDERR. Instead, you can hijack this data from the subprocess like so:
import subprocess
cmd = ['./child.sh']
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
This redirects all STDERR output of your child to the normal STDOUT channel, then redirects the normal STDOUT output of your child to your python script, via a PIPE. You can now read from that PIPE using line = process.stdout.readline(), which grabs a single line of output. You can print that back to STDOUT with print(line).
Once you kill your son (gasp), simply stop reading from the pipe and stop printing the lines; no further output from the subprocess (or its grandchildren) will reach your terminal.
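A rough sketch of that idea (the one-second cutoff is purely illustrative):
import subprocess
import time

process = subprocess.Popen(['./child.sh'],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT)
start = time.time()
while time.time() - start < 1:           # echo the child's output for about a second
    line = process.stdout.readline()
    if not line:
        break
    print(line.rstrip())
process.kill()
process.wait()
# From here on we simply never read the pipe again, so the grandchildren's chatter
# stays out of the terminal (they eventually block on the full pipe or die of SIGPIPE).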
For more information on subprocess, see one of my previous answers, which is similar to this one: python subprocess.call output is not interleaved

Python subprocesses with several pipes

I know how to do several "nested" pipes using subprocesses however I have another doubt. I want to do the following:
p1=Popen(cmd1,stdout=PIPE)
p2=Popen(cmd2,stdin=p1.stdout)
p3=Popen(cmd3,stdin=p1.stdout)
Take into account that p3 uses p1.stdout instead of p2.stdout. The problem is that after doing p2, p1.stdout is blank. Please help me!
You can't send the same pipe to two different processes. Or, rather, if you do, they end up accessing the same pipe, meaning if one process reads something, it's no longer available to the other one.
What you need to do is "tee" the data in some way.
If you don't need to stream the data as they come in, you can read all the output from p1, then send it as input to both p2 and p3. This is easy:
from subprocess import check_output, Popen, PIPE

output = check_output(cmd1)
p2 = Popen(cmd2, stdin=PIPE)
p2.communicate(output)
p3 = Popen(cmd3, stdin=PIPE)
p3.communicate(output)
If you just need p2 and p3 to run in parallel, you can just run them each in a thread.
But if you actually need real-time streaming, you have to connect things up more carefully. If you can be sure that p2 and p3 will always consume their input, without blocking, faster than p1 can supply it, you can do this without threads (just loop on p1.stdout.read()), but otherwise, you'll need an output thread for each consumer process, and a Queue or some other way to pass the data around. See the source code to communicate for more ideas on how to synchronize the separate threads.
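One possible shape of that threaded approach (a sketch in Python 3, assuming cmd1, cmd2 and cmd3 are defined and the data need not be line-oriented):
import threading
from queue import Queue
from subprocess import Popen, PIPE

def writer(q, sink):
    # drain one queue into one consumer's stdin, then close it
    for chunk in iter(q.get, None):          # None is the end-of-data sentinel
        sink.write(chunk)
    sink.close()

p1 = Popen(cmd1, stdout=PIPE)
consumers = [Popen(cmd2, stdin=PIPE), Popen(cmd3, stdin=PIPE)]
queues = [Queue() for _ in consumers]
threads = [threading.Thread(target=writer, args=(q, p.stdin))
           for q, p in zip(queues, consumers)]
for t in threads:
    t.start()

# the main thread is the only reader of p1.stdout; it fans the data out to the queues
for chunk in iter(lambda: p1.stdout.read(4096), b''):
    for q in queues:
        q.put(chunk)
for q in queues:
    q.put(None)                              # signal end of data

for t in threads:
    t.join()
for proc in [p1] + consumers:
    proc.wait()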
If you want to copy the output from a subprocess to other processes without reading all output at once then here's an implementation of #abarnert's suggestion to loop over p1.stdout that achieves it:
from subprocess import Popen, PIPE
# start subprocesses
p1 = Popen(cmd1, stdout=PIPE, bufsize=1)
p2 = Popen(cmd2, stdin=PIPE, bufsize=1)
p3 = Popen(cmd3, stdin=PIPE, bufsize=1)
# "tee" the data
for line in iter(p1.stdout.readline, b''): # assume line-oriented data
    p2.stdin.write(line)
    p3.stdin.write(line)
# clean up
for pipe in [p1.stdout, p2.stdin, p3.stdin]:
    pipe.close()
for proc in [p1, p2, p3]:
    proc.wait()

Python subprocess reading process terminates before writing process example, clarification needed

Code snippet from: http://docs.python.org/3/library/subprocess.html#replacing-shell-pipeline
output=`dmesg | grep hda`
# becomes
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
Question: I do not quite understand why this line is needed: p1.stdout.close()?
What if, by doing this, p1's stdout is closed even before it is completely done outputting data and p2 is still alive? Are we not risking that by closing p1.stdout so soon? How does this work?
p1.stdout.close() closes Python's copy of the file descriptor. p2 already has that descriptor open (via stdin=p1.stdout), so closing Python's copy doesn't affect p2. However, that end of the pipe is now open in only one place (p2), so when it closes (e.g. if p2 dies), p1 will see the pipe close and will get SIGPIPE.
If you didn't close p1.stdout in Python, and p2 died, p1 would get no signal because Python's descriptor would be holding the pipe open.
Pipes are external to processes (it's an operating system thing) and are accessed by processes using read and write handles. Many processes can have handles to the same pipe and can read and write in all sorts of disastrous ways if not managed properly. Pipes close when all handles to them are closed.
Although process execution works differently on Linux and Windows, here is basically what happens (I'm going to get killed on this!):
p1 = Popen(["dmesg"], stdout=PIPE)
Create pipe_1, give a write handle to dmesg as its stdout, and return a read handle in the parent as p1.stdout. You now have 1 pipe with 2 handles (pipe_1 write in dmesg, pipe_1 read in the parent).
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
Create pipe_2. Give grep a write handle to pipe_2 and a copy of the read handle to pipe_1. You now have 2 pipes and 5 handles (pipe_1 write in dmesg, pipe_1 read and pipe_2 write in grep, pipe_1 read and pipe_2 read in the parent).
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
Notice that pipe_1 has two read handles. You want grep to have the read handle so that it reads dmesg data. You don't need the handle in the parent any more. Close it so that there is only 1 read handle on pipe_1. If grep dies, its pipe_1 read handle is closed, the operating system notices there are no remaining read handles for pipe_1 and gives dmesg the bad news.
output = p2.communicate()[0]
dmesg sends data to stdout (the pipe_1 write handle) which begins filling pipe_1. grep reads stdin (the pipe_1 read handle) which empties pipe_1. grep also writes stdout (the pipe_2 write handle) filling pipe_2. The parent process reads pipe_2... and you got yourself a pipeline!
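If you want to see the parent's extra pipe_1 read handle for yourself, a quick illustrative check is to print its descriptor number before closing it:
from subprocess import Popen, PIPE

p1 = Popen(["dmesg"], stdout=PIPE)
print(p1.stdout.fileno())   # the parent's read handle on pipe_1: some fd greater than 2
p1.stdout.close()           # after this, only grep (once started) holds a read handle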

Explain example pipeline from Python subprocess module

Section 17.1.4.2, Replacing shell pipeline, of the Python subprocess documentation says to replace
output=`dmesg | grep hda`
with
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
The comment on the third line explains why the close function is being called, but not why it makes sense. It doesn't, to me. Will not closing p1.stdout before the communicate method is called prevent any output from being sent through the pipe? (Obviously it won't; I've tried running the code and it runs fine.) Why is it necessary to call close to make p1 receive SIGPIPE? What kind of close is it that doesn't close? What, exactly, is it closing?
Please consider this an academic question, I'm not trying to accomplish anything except understanding these things better.
You are closing the parent's copy of p1.stdout, the read end of the pipe dmesg writes into, thus leaving grep as the only process with that read end open. If you didn't do this, then even after grep exited the parent would still hold the read end open, and no SIGPIPE would ever be generated for dmesg. (The OS basically keeps a reference count of open read ends and sends SIGPIPE to a writer only when that count hits zero. If you don't close your copy, you prevent it from ever reaching zero.)
