Explain example pipeline from Python subprocess module

Section 17.1.4.2 ("Replacing shell pipeline") of the Python subprocess documentation says to replace
output=`dmesg | grep hda`
with
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
The comment on the third line explains why the close function is being called, but not why it makes sense. It doesn't, to me. Won't closing p1.stdout before the communicate method is called prevent any output from being sent through the pipe? (Obviously it won't; I've run the code and it works fine.) Why is it necessary to call close to make p1 receive SIGPIPE? What kind of close is it that doesn't close? What, exactly, is it closing?
Please consider this an academic question, I'm not trying to accomplish anything except understanding these things better.

You are closing p1.stdout in the parent process, thus leaving dmesg as the only process with that file descriptor open. If you didn't do this, even when dmesg closed its stdout, you would still have it open, and a SIGPIPE would not be generated. (The OS basically keeps a reference count, and generates SIGPIPE when it hits zero. If you don't close the file, you prevent it from ever reaching zero.)
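Here is a sketch that makes the effect visible (assuming the POSIX tools yes and head are available): head exits after one line, so the pipe's read-handle count drops to zero once the parent has closed its copy, and yes is killed by SIGPIPE.
from subprocess import Popen, PIPE

# a sketch, assuming POSIX `yes` and `head`
p1 = Popen(["yes"], stdout=PIPE)
p2 = Popen(["head", "-n", "1"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()             # drop the parent's read handle on the pipe
output = p2.communicate()[0]  # b'y\n'
p1.wait()
print(output, p1.returncode)  # returncode -13 means yes was killed by SIGPIPE
Without the p1.stdout.close() line, p1.wait() would block forever: yes would fill the pipe and then block on write, because the parent still holds a read handle that keeps the pipe alive.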

Related

Under what condition does a Python subprocess get a SIGPIPE?

I am reading the Python documentation on the Popen class in the subprocess module section and I came across the following code:
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
The documentation also states:
"The p1.stdout.close() call after starting the p2 is important in order for p1 to receive a SIGPIPE if p2 exits before p1."
Why must p1.stdout be closed before we can receive a SIGPIPE, and how does p1 know that p2 exits before p1 if we have already closed it?
SIGPIPE is a signal that would be sent if dmesg tried to write to a closed pipe. Here, dmesg ends up with two targets to write to, your Python process and the grep process.
That's because subprocess clones file handles (using the os.dup2() function). Configuring p2 to use p1.stdout triggers an os.dup2() call that asks the OS to duplicate the pipe filehandle; the duplicate is used to connect dmesg to grep.
With two open file handles for dmesg stdout, dmesg is never given a SIGPIPE signal if only one of them closes early, so grep closing would never be detected. dmesg would needlessly continue to produce output.
So by closing p1.stdout immediately, you ensure that the only remaining filehandle reading from dmesg stdout is the grep process, and if that process were to exit, dmesg receives a SIGPIPE.
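You can watch the same reference counting at the os level (a minimal sketch; the os.dup() call stands in for the duplication Popen performs with os.dup2()):
import os

r, w = os.pipe()
w2 = os.dup(w)          # duplicate the write end, as Popen does via dup2()
os.close(w)             # one write handle closed; the duplicate remains
os.write(w2, b"data")   # the pipe is still writable through the duplicate
os.close(w2)            # now ALL write handles are closed
print(os.read(r, 100))  # b'data'
print(os.read(r, 100))  # b'' -> EOF, only once every write end was closed
os.close(r)
The reader only sees EOF once every write handle is gone; SIGPIPE is the mirror image of this for a writer whose read handles are all gone.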

Python: how to kill a Popen process with shell=False [why is it not working with standard methods?]

I'm trying to kill a subprocess started with:
playing_long = Popen(["omxplayer", "/music.mp3"], stdout=subprocess.PIPE)
and after a while
pid = playing_long.pid
playing_long.terminate()
os.kill(pid,0)
playing_long.kill()
Which doesn't work.
Nor does the solution pointed out here work:
How to terminate a python subprocess launched with shell=True
Note that I am using threads, and it is not recommended to use preexec_fn when you use threads (or at least that's what I read; in any case, it doesn't work either).
Why is it not working? There's no error message in the code, but I have to manually kill -9 the process to stop the mp3 file from playing.
Thanks
EDIT:
From here, I have added a wait() after the kill().
Surprisingly, before re-starting the process I check whether it is still alive, so that I don't start a chorus with the mp3 file.
Without the wait(), the system still sees the process as alive.
With the wait(), the system understands that the process is dead and starts it again.
However, the music is still playing. I just can't seem to get the process killed.
EDIT2: The problem is that omxplayer starts a second process that I don't kill, and that process is responsible for the actual music.
I've tried to use this code, found in several places on the internet; it seems to work for everyone but not for me:
playing_long.stdin.write('q')
playing_long.stdin.flush()
It fails with 'NoneType' object has no attribute 'write'. Even when using this code immediately after starting the Popen process, it fails with the same message:
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write('q')
playing_long.stdin.flush()
EDIT3: The problem was that I wasn't setting up stdin in the Popen call. Now it is:
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write(b'q')
playing_long.stdin.flush()
*note that it must be bytes that I write to stdin
Final solution then (see the process edited in the question):
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write(b'q')
playing_long.stdin.flush()
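An alternative sketch for cases like this, where the player spawns a helper process (assuming POSIX and Python 3.2+): start the child in its own process group and signal the whole group, instead of sending 'q' over stdin.
import os
import signal
import time
import subprocess

# start_new_session=True is the Python 3.2+ equivalent of
# preexec_fn=os.setsid: the player gets its own process group
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"],
                                start_new_session=True)
time.sleep(5)
# signal every process in the group, including any helper omxplayer spawned
os.killpg(os.getpgid(playing_long.pid), signal.SIGTERM)
playing_long.wait()  # reap the direct child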

Python subprocesses with several pipes

I know how to do several "nested" pipes using subprocesses; however, I have another question. I want to do the following:
p1=Popen(cmd1,stdout=PIPE)
p2=Popen(cmd2,stdin=p1.stdout)
p3=Popen(cmd3,stdin=p1.stdout)
Take into account that p3 uses p1.stdout instead of p2.stdout. The problem is that after starting p2, p1.stdout is blank. Please help me!
You can't send the same pipe to two different processes. Or, rather, if you do, they end up accessing the same pipe, meaning if one process reads something, it's no longer available to the other one.
What you need to do is "tee" the data in some way.
If you don't need to stream the data as they come in, you can read all the output from p1, then send it as input to both p2 and p3. This is easy:
from subprocess import check_output, Popen, PIPE

output = check_output(cmd1)
p2 = Popen(cmd2, stdin=PIPE)
p2.communicate(output)
p3 = Popen(cmd3, stdin=PIPE)
p3.communicate(output)
If you just need p2 and p3 to run in parallel, you can just run them each in a thread.
But if you actually need real-time streaming, you have to connect things up more carefully. If you can be sure that p2 and p3 will always consume their input, without blocking, faster than p1 can supply it, you can do this without threads (just loop on p1.stdout.read()), but otherwise you'll need a writer thread for each consumer process, and a Queue or some other way to pass the data around; a threaded variant is sketched below. See the source code of communicate() for more ideas on how to synchronize the separate threads.
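Here is a sketch of that threaded variant, with one writer thread and one Queue per consumer (cmd1, cmd2, cmd3 are hypothetical placeholders):
from subprocess import Popen, PIPE
from threading import Thread
from queue import Queue

cmd1, cmd2, cmd3 = ["seq", "1", "5"], ["cat"], ["rev"]  # hypothetical placeholders

p1 = Popen(cmd1, stdout=PIPE)
p2 = Popen(cmd2, stdin=PIPE)
p3 = Popen(cmd3, stdin=PIPE)

def pump(q, dest):
    # drain one queue into one consumer's stdin, then close it
    for chunk in iter(q.get, None):
        dest.write(chunk)
    dest.close()

queues = [Queue(), Queue()]
threads = [Thread(target=pump, args=(q, p.stdin))
           for q, p in zip(queues, [p2, p3])]
for t in threads:
    t.start()

# fan each chunk of p1's output out to every consumer's queue
for line in iter(p1.stdout.readline, b""):
    for q in queues:
        q.put(line)
for q in queues:
    q.put(None)  # sentinel: no more data

for t in threads:
    t.join()
p1.stdout.close()
for proc in (p1, p2, p3):
    proc.wait()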
If you want to copy the output from a subprocess to other processes without reading all the output at once, here's an implementation of @abarnert's suggestion to loop over p1.stdout that achieves it:
from subprocess import Popen, PIPE

# start subprocesses
p1 = Popen(cmd1, stdout=PIPE, bufsize=1)
p2 = Popen(cmd2, stdin=PIPE, bufsize=1)
p3 = Popen(cmd3, stdin=PIPE, bufsize=1)

# "tee" the data
for line in iter(p1.stdout.readline, b''):  # assume line-oriented data
    p2.stdin.write(line)
    p3.stdin.write(line)

# clean up
for pipe in [p1.stdout, p2.stdin, p3.stdin]:
    pipe.close()
for proc in [p1, p2, p3]:
    proc.wait()

Python subprocess reading process terminates before writing process example, clarification needed

Code snippet from: http://docs.python.org/3/library/subprocess.html#replacing-shell-pipeline
output=`dmesg | grep hda`
# becomes
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
Question: I do not quite understand why this line is needed: p1.stdout.close()?
What if p1's stdout is closed before it is completely done outputting data and p2 is still alive? Aren't we risking that by closing p1.stdout so soon? How does this work?
p1.stdout.close() closes Python's copy of the file descriptor. p2 already has that descriptor open (via stdin=p1.stdout), so closing Python's copy doesn't affect p2. However, that end of the pipe is now open in only one place, so when it closes (e.g. if p2 dies), p1 will see the pipe close and will get SIGPIPE.
If you didn't close p1.stdout in Python, and p2 died, p1 would get no signal because Python's descriptor would be holding the pipe open.
Pipes are external to processes (they're an operating system thing) and are accessed by processes using read and write handles. Many processes can have handles to the same pipe and can read and write in all sorts of disastrous ways if not managed properly. A pipe closes when all handles to it are closed.
Although process execution works differently on Linux and Windows, here is basically what happens (I'm going to get killed for this!):
p1 = Popen(["dmesg"], stdout=PIPE)
Create pipe_1, give a write handle to dmesg as its stdout, and return a read handle in the parent as p1.stdout. You now have 1 pipe with 2 handles (pipe_1 write in dmesg, pipe_1 read in the parent).
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
Create pipe_2. Give grep a write handle to pipe_2 and a copy of the read handle to pipe_1. You now have 2 pipes and 5 handles (pipe_1 write in dmesg, pipe_1 read and pipe_2 write in grep, pipe_1 read and pipe_2 read in the parent).
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
Notice that pipe_1 has two read handles. You want grep to have the read handle so that it reads dmesg data. You don't need the handle in the parent any more. Close it so that there is only 1 read handle on pipe_1. If grep dies, its pipe_1 read handle is closed, the operating system notices there are no remaining read handles for pipe_1 and gives dmesg the bad news.
output = p2.communicate()[0]
dmesg sends data to stdout (the pipe_1 write handle) which begins filling pipe_1. grep reads stdin (the pipe_1 read handle) which empties pipe_1. grep also writes stdout (the pipe_2 write handle) filling pipe_2. The parent process reads pipe_2... and you got yourself a pipeline!

Proper way of re-using and closing a subprocess object

I have the following code in a loop:
while True:
    # Define shell_command
    p1 = Popen(shell_command, shell=shell_type, stdout=PIPE, stderr=PIPE, preexec_fn=os.setsid)
    result = p1.stdout.read()
    # Define condition
    if condition:
        break
where shell_command is something like ls (it just prints stuff).
I have read in different places that I can close/terminate/exit a Popen object in a variety of ways, e.g. :
p1.stdout.close()
p1.stdin.close()
p1.terminate()
p1.kill()
My question is:
What is the proper way of closing a subprocess object once we are done using it?
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands? Would that be more efficient in any way than opening new subprocess objects each time?
Update
I am still a bit confused about the sequence of steps to follow depending on whether I use p1.communicate() or p1.stdout.read() to interact with my process.
From what I understood in the answers and the comments:
If I use p1.communicate() I don't have to worry about releasing resources, since communicate() waits until the process is finished, grabs the output, and properly closes the subprocess object.
If I follow the p1.stdout.read() route (which I think fits my situation, since the shell command is just supposed to print stuff), I should call things in this order:
p1.wait()
p1.stdout.read()
p1.terminate()
Is that right?
What is the proper way of closing a subprocess object once we are done using it?
stdout.close() and stdin.close() will not terminate a process unless it exits itself on end of input or on write errors.
.terminate() and .kill() both do the job, with kill being a bit more "drastic" on POSIX systems, as SIGKILL is sent, which cannot be ignored by the application. Specific differences are explained in this blog post, for example. On Windows, there's no difference.
Also, remember to .wait() and to close the pipes after killing a process to avoid zombies and force the freeing of resources.
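For example (a minimal sketch, with a long-running sleep standing in for the real child):
from subprocess import Popen, PIPE

# kill the child, then release the pipe and reap it
p = Popen(["sleep", "100"], stdout=PIPE)
p.terminate()        # or p.kill() if the program ignores SIGTERM
p.stdout.close()     # free the pipe's file descriptor
p.wait()             # reap the child so it doesn't linger as a zombie
print(p.returncode)  # -15 on POSIX: terminated by SIGTERM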
A special case that is often encountered is processes which read from STDIN and write their result to STDOUT, closing themselves when EOF is encountered. With these kinds of programs, it's often sensible to use Popen.communicate():
>>> p = Popen(["sort"], stdin=PIPE, stdout=PIPE)
>>> p.communicate("4\n3\n1")
('1\n3\n4\n', None)
>>> p.returncode
0
This can also be used for programs which print something and exit right after:
>>> p = Popen(["ls", "/home/niklas/test"], stdin=PIPE, stdout=PIPE)
>>> p.communicate()
('file1\nfile2\n', None)
>>> p.returncode
0
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands? Would that be more efficient in any way than opening new subprocess objects each time?
I don't think the subprocess module supports this and I don't see what resources could be shared here, so I don't think it would give you a significant advantage.
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands?
Yes.
#!/usr/bin/env python
from __future__ import print_function
import uuid
import random
from subprocess import Popen, PIPE, STDOUT

MARKER = str(uuid.uuid4())
shell_command = 'echo a'

p = Popen('sh', stdin=PIPE, stdout=PIPE, stderr=STDOUT,
          universal_newlines=True)  # decode output as utf-8, newline is '\n'

while True:
    # write next command
    print(shell_command, file=p.stdin)
    # insert MARKER into stdout to separate output from different shell_command
    print("echo '%s'" % MARKER, file=p.stdin)

    # read command output
    for line in iter(p.stdout.readline, MARKER + '\n'):
        if line.endswith(MARKER + '\n'):
            print(line[:-len(MARKER) - 1])
            break  # command output ended without a newline
        print(line, end='')

    # exit on condition
    if random.random() < 0.1:
        break

# cleanup
p.stdout.close()
if p.stderr:
    p.stderr.close()
p.stdin.close()
p.wait()
Put while True inside try: ... finally: to perform the cleanup in case of exceptions. On Python 3.2+ you could use with Popen(...): instead.
Would that be more efficient in any way than opening new subprocess objects each time?
Does it matter in your case? Don't guess. Measure it.
The "correct" order is:
Create a thread to read stdout (and a second one to read stderr, unless you merged them into one).
Write commands to be executed by the child to stdin. If you're not reading stdout at the same time, writing to stdin can block.
Close stdin (this is the signal for the child that it can now terminate by itself whenever it is done)
When stdout returns EOF, the child has terminated. Note that you need to synchronize the stdout reader thread and your main thread.
Call wait() to see if there was a problem and to clean up the child process (a sketch of this sequence follows below).
If you need to stop the child process for any reason (maybe the user wants to quit), then you can:
Close stdin if the child terminates when it reads EOF.
Kill the child with terminate(). This is the correct solution for child processes which ignore stdin.
If the child doesn't respond, try kill()
In all three cases, you must call wait() to clean up the dead child process.
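A sketch of the normal sequence described above, with a sort child standing in for the real program (it reads stdin until EOF, then writes its result):
from subprocess import Popen, PIPE
from threading import Thread

p = Popen(["sort"], stdin=PIPE, stdout=PIPE)

chunks = []
def reader():
    # read stdout until EOF, which signals that the child has terminated
    for line in p.stdout:
        chunks.append(line)

t = Thread(target=reader)
t.start()                   # 1. start the stdout reader first
p.stdin.write(b"b\na\n")    # 2. write while the reader drains stdout
p.stdin.close()             # 3. EOF tells the child to finish up
t.join()                    # 4. EOF on stdout -> the child is done
p.wait()                    # 5. reap the child and check for errors
print(b"".join(chunks))     # b'a\nb\n'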
It depends on what you expect the process to do; you should always call p1.wait() in order to avoid zombies. Other steps depend on the behaviour of the subprocess: if it produces any output, you should consume it (e.g. p1.stdout.read(), though this could eat lots of memory) and only then call p1.wait(); or you may wait for some timeout and call p1.terminate() to kill the process if you think it isn't working as expected, and possibly call p1.wait() to clean up the zombie.
Alternatively, p1.communicate(...) would do the handling of I/O and waiting for you (but not the killing).
Subprocess objects aren't supposed to be reused.
