I have a parent Python script that launches a child (which launches grandchildren), and after some time, I terminate the child, but the grandchildren continue to pump to stdout. After I kill the child, I want to suppress/redirect the stdout and stderr of the grandchildren (and all their descendants).
Here is the parent:
import time
import subprocess
proc = subprocess.Popen('./child.sh')
print("Dad: I have begotten a son!")
time.sleep(1)
proc.kill()
proc.wait()
print("Dad: My son hath died!")
time.sleep(2)
print("Dad: Why does my grandson still speak?")
Here is the child script, which I cannot modify:
#!/bin/bash
./grandchild.sh &
echo "Child: I had a son!"
for (( i = 0; i < 10; i++ )); do
echo "Child: Hi Dad, meet your grandson!"
sleep 1
done
exit 0
Here is a noisy grandchild, which I cannot modify:
#!/bin/bash
for (( i = 0; i < 10; i++ )); do
echo "Grandchild: Wahhh"
sleep 1
done
exit 0
I tried doing this right before killing the child:
import os
f = open(os.devnull,"w")
proc.stdout = proc.stderr = f
But it doesn't seem to work. The output is:
> ./parent.py
Dad: I have begotten a son!
Child: I had a son!
Child: Hi Dad, meet your grandson!
Grandchild: Wahhh
Dad: My son hath died!
Grandchild: Wahhh
Grandchild: Wahhh
Dad: Why does my grandson still speak?
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
When you invoke subprocess.Popen you can tell it to redirect stdout and/or stderr. If you don't, it leaves them un-redirected: the child simply inherits the Python process's actual STDOUT_FILENO and STDERR_FILENO (the fixed constants 1 and 2).
This means that if Python's fds 1 and 2 are going to your tty session (on an underlying device like /dev/pts/0, for instance), the child—and, in this case, the grandchild as well—talk directly to that same session (the same /dev/pts/0). Nothing you do in the Python process itself can change this: they are independent processes with independent, direct access to the session.
What you can do is invoke ./child.sh with redirection in place:
proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
Quick side note: if you want to discard all output from the child and its grandchildren, open os.devnull (either as you did, or with os.open() to get a raw integer file descriptor) and connect stdout and stderr to the underlying file descriptor. If you have opened it as a Python stream:
f = open(os.devnull, "w")
then the underlying file descriptor is f.fileno():
proc = subprocess.Popen('./child.sh', stdout=f.fileno(), stderr=f.fileno())
In this case you cannot get any output from any of the processes involved.
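On Python 3.3 and later you can skip opening os.devnull yourself and let subprocess do it via subprocess.DEVNULL; a minimal sketch of that variant:
import subprocess
# Everything the child (and its descendants) writes to fd 1 and 2 is discarded.
proc = subprocess.Popen('./child.sh',
                        stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL)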
Returning to the stdout=subprocess.PIPE example: file descriptor 1 in the child is now connected to a pipe-entity, rather than directly to the session. (Since there is no stderr= above, fd 2 in the child is still connected directly to the session.)
The pipe-entity, which lives inside the operating system, simply copies from one end (the "write end" of the pipe) to the other (the "read end"). Your Python process has control of the read-end. You must invoke the OS read system call—often not directly, but see below—on that read end, to collect the output from it.
In general, if you stop reading from your read-end, the pipe "fills up" and any process attempting an OS-level write on the write-end is "blocked" until someone with access to the read end (that's you, again) reads from it.
If you discard the read-end, leaving the pipe with nowhere to dump its output, the write end starts returning EPIPE errors and sending SIGPIPE signals, to any process attempting an OS-level write call. This kind of discard occurs when you call the OS-level close system call, assuming you have not handed the descriptor off to some other process(es). It also occurs when your process exits (under the same assumption, again).
There is no convenient method by which you can connect the read-end to an infinite data sink like /dev/null, at least in most Unix-like systems (there are a few with some special funky system calls to do this kind of "plumbing"). But if you plan to kill the child and are willing to let its grandchildren die from SIGPIPE signals, you can simply close the descriptor (or exit) and let the chips fall where they may.
Children and grandchildren can protect themselves from dying by setting SIGPIPE to SIG_IGN, or by blocking SIGPIPE. Signal masks are inherited across exec system calls, so in some cases you can block SIGPIPE on behalf of the children (though some children will unblock signals themselves).
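For example, here is a minimal sketch of the close-and-let-SIGPIPE-work approach described above, assuming the grandchildren do not ignore or block SIGPIPE:
import subprocess
import time

proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
time.sleep(1)
proc.kill()
proc.wait()
# Discard the read end: the next OS-level write by any surviving grandchild
# gets EPIPE/SIGPIPE and, by default, kills it.
proc.stdout.close()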
If closing the descriptor is not suitable, you can create a new process that simply reads and discards incoming pipe data. If you use the fork system call, this is trivial. Alternatively some Unix-like systems allow you to pass file descriptors through AF_UNIX sockets to otherwise-unrelated (parent/child-wise) processes, so you could have a daemon that does this, reachable via an AF_UNIX socket. (This is nontrivial to code.)
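A minimal sketch of the fork-a-reader idea (the helper process just drains the pipe until every writer has closed its end; the parent should eventually reap the helper with os.wait()):
import os
import subprocess

proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
if os.fork() == 0:
    # Helper process: read and discard until EOF, then exit.
    while proc.stdout.read(4096):
        pass
    os._exit(0)
proc.stdout.close()  # the parent no longer needs the read end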
If you wish the child process to send its stderr output to the same pipe, so that you can read both its stdout and its stderr, simply add stderr=subprocess.STDOUT to the Popen() call. If you wish the child process to send its stderr output to a separate pipe, add stderr=subprocess.PIPE. If you do the latter, however, things can get a bit tricky.
To prevent children from blocking, as noted above, you must invoke the OS read call. If there is only one pipe this is easy:
for line in proc.stdout:
    ...
for instance, or:
line = proc.stdout.readline()
will read the pipe one line at a time (modulo buffering inside Python). You can read as many or as few lines as you like.
If there are two pipes, though, you must read whichever one(s) is/are "full". Python's subprocess module defines the communicate() method to do this for you:
stdout, stderr = proc.communicate()
The drawback here is that communicate() reads to completion: it needs to get all output that can go to the write end of each pipe. This means it repeatedly calls the OS-level read operation until read indicates end-of-data. That occurs only when all processes that had, at some point, write access to the write end of the corresponding pipe, have closed that end of the pipe. In other words, it waits for the child and any grandchildren to close the descriptors connected to the write end of the pipe(s).
In general it's much simpler to use only one pipe, read as much (but only as much) as you like, then simply close the pipe:
proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
line1 = proc.stdout.readline()
line2 = proc.stdout.readline()
# that's all we care about
proc.stdout.close()
proc.kill()
status = proc.wait()
Whether this suffices depends on your particular problem.
If you don't care about the grandchildren, you can kill them all:
#!/usr/bin/env python3
import os
import signal
import subprocess
import sys
import time
proc = subprocess.Popen('./child.sh', start_new_session=True)
print("Dad: I have begotten a son!")
time.sleep(1)
print("Dad: kill'em all!")
os.killpg(proc.pid, signal.SIGKILL)
for msg in "dead... silence... crickets... chirping...".split():
    time.sleep(1)
    print(msg, end=' ', flush=True)
You can emulate start_new_session=True on old Python versions using preexec_fn=os.setsid. See Best way to kill all child processes.
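That is, a minimal sketch of the preexec_fn variant (imports as in the script above):
proc = subprocess.Popen('./child.sh', preexec_fn=os.setsid)  # child leads a new process group
os.killpg(proc.pid, signal.SIGKILL)                          # later: kill the whole group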
You can collect children's output before the killing:
#!/usr/bin/env python
import collections
import os
import signal
import threading
from subprocess import Popen, PIPE, STDOUT
def killall(proc):
    print "Dad: kill'em all!"
    os.killpg(proc.pid, signal.SIGKILL)
    proc.wait()
proc = Popen('./child.sh', stdout=PIPE, stderr=STDOUT, preexec_fn=os.setsid)
print("Dad: I have begotten a son!")
# kill in a second
hitman = threading.Timer(1, killall, [proc])
hitman.start()
# save last 200 lines of output
q = collections.deque(proc.stdout, maxlen=200)
hitman.cancel()
proc.wait()
# print collected output
print '*'*60
print ''.join(q).decode('ascii'),
print '*'*60
See Stop reading process output in Python without hang?
Right now, your subprocess is allowed to communicate with your terminal via STDOUT and STDERR. Instead, you can hijack this data from the subprocess like so:
import subprocess
cmd = ['./child.sh']
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
This redirects all STDERR output of your child to the normal STDOUT channel, then redirects the normal STDOUT output of your child to your python script, via a PIPE. You can now read from that PIPE using line = process.stdout.readline(), which grabs a single line of output. You can print that back to STDOUT with print(line).
Once you kill your son (gasp), simply stop reading and printing the output from your subprocess; since the grandchildren can only write to the pipe, nothing more reaches your terminal.
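A minimal sketch of that pattern, assuming child.sh from the question and an arbitrary one-second budget for echoing output:
import subprocess
import time

proc = subprocess.Popen(['./child.sh'],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
deadline = time.time() + 1
while time.time() < deadline:
    line = proc.stdout.readline()   # blocks until a line (or EOF) arrives
    if not line:
        break
    print(line.decode(), end='')
proc.kill()                         # gasp
proc.wait()
proc.stdout.close()                 # stop accepting output entirely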
For more information on subprocess, see one of my previous answers, which is similar to this one: python subprocess.call output is not interleaved
Related
I am running two processes simultaneously in python using the subprocess module:
p_topic = subprocess.Popen(['rostopic','echo','/msg/address'], stdout=PIPE)
p_play = subprocess.Popen(['rosbag','play',bagfile_path])
These are ROS processes: p_topic listens for a .bag file to be played and outputs certain information from that .bag file to the stdout stream; I want to then access this output using the p_topic.stdout object (which behaves as a file).
However, what I find happening is that the p_topic.stdout object only contains the first ~1/3 of the output lines it should have - that is, in comparison to running the two commands manually, simultaneously in two shells side by side.
I've tried waiting for many seconds for the output to finish, but this doesn't change anything; it's approximately the same ratio of lines captured by p_topic.stdout each time. Any hints on what this could be would be greatly appreciated!
EDIT:
Here's the reading code:
# wait for playing to stop
while p_play.poll() == None:
    time.sleep(.1)
time.sleep(X)  # wait for some time for the p_topic to finish
p_topic.terminate()
output = []
for line in p_topic.stdout:
    output.append(line)
Note that the value X in time.sleep(X) doesn't make any difference.
By default, when a process's stdout is not connected to a terminal, its output is block buffered; when connected to a terminal, it's line buffered. You expect to get complete lines, but you can't unless rostopic unbuffers or explicitly line buffers its stdout (if it's a C program, it can call setvbuf to do that).
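If rostopic happens to use C stdio's default buffering, you may be able to force line buffering from the outside with the GNU coreutils stdbuf wrapper; this is an assumption about rostopic's implementation, and it has no effect on programs that manage their own buffers:
import subprocess
p_topic = subprocess.Popen(['stdbuf', '-oL', 'rostopic', 'echo', '/msg/address'],
                           stdout=subprocess.PIPE)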
The other (possibly overlapping) possibility is that the pipe buffer itself is filling (pipe buffers are usually fairly small), and because you never drain it, rostopic fills the pipe buffer and then blocks indefinitely until you kill it, leaving only what managed to fit in the pipe to be drained when you read the process's stdout. In that case, you'd need to either spawn a thread to keep the pipe drained from Python, or have your main thread use select module components to monitor and drain the pipe (intermingled with polling the other process). The thread is generally easier, though you do need to be careful to avoid thread safety issues.
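A minimal sketch of the drain-in-a-thread approach (Python 3; p_topic is the Popen object from the question, and the lines end up on a queue for the main thread to consume):
import queue
import threading

lines = queue.Queue()

def drain(pipe, sink):
    # Keep reading so the child never blocks on a full OS pipe buffer.
    for line in iter(pipe.readline, b''):
        sink.put(line)
    pipe.close()

t = threading.Thread(target=drain, args=(p_topic.stdout, lines), daemon=True)
t.start()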
Is it worth trying communicate()/wait() on the process rather than sleep? Would that solve your issue?
I have this for general-purpose use, so I'm not sure whether you can take it and change it to what you need:
executable_Params = "{0} {1} {2} {3} {4}".format(my_Binary,
                                                 arg1,
                                                 arg2,
                                                 arg3,
                                                 arg4)
# execute the process
process = subprocess.Popen(shlex.split(executable_Params),
                           shell=False,
                           stderr=subprocess.PIPE,
                           stdout=subprocess.PIPE)
stdout, stderr = process.communicate()
ret_code = process.wait()
if ret_code == 0:
    return 0
else:
    # get the correct message from my enum method
    error_msg = Process_Error_Codes(ret_code).name
    raise subprocess.CalledProcessError(returncode=ret_code,
                                        cmd=executable_Params)
To simplify my question, here's a Python script:
from subprocess import Popen, PIPE
proc = Popen(['./mr-task.sh'], shell=True, stdout=PIPE, stderr=PIPE)
while True:
    out = proc.stdout.readline()
    print(out)
Here's mr-task.sh, it starts a mapreduce job:
hadoop jar xxx.jar some-conf-we-don't-need-to-care
When I run ./mr-task.sh directly, I can see the log printed on the screen, something like:
14/12/25 14:56:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/12/25 14:56:44 INFO snappy.LoadSnappy: Snappy native library loaded
14/12/25 14:57:01 INFO mapred.JobClient: Running job: job_201411181108_16380
14/12/25 14:57:02 INFO mapred.JobClient: map 0% reduce 0%
14/12/25 14:57:28 INFO mapred.JobClient: map 100% reduce 0%
But I can't get this output when running the Python script. I tried removing shell=True and fetching stderr instead; I still got nothing.
Does anyone have any idea why this happens?
You could redirect stderr to stdout:
from subprocess import Popen, PIPE, STDOUT
proc = Popen(['./mr-task.sh'], stdout=PIPE, stderr=STDOUT, bufsize=1)
for line in iter(proc.stdout.readline, b''):
    print line,
proc.stdout.close()
proc.wait()
See Python: read streaming input from subprocess.communicate().
In my real program I redirect stderr to stdout and read from stdout, so bufsize is not needed, is it?
The redirection of stderr to stdout and bufsize are unrelated. Changing bufsize might affect the time performance (the default is bufsize=0, i.e., unbuffered, on Python 2). Unbuffered I/O might be 10..100 times slower. As usual, you should measure the time performance if it is important.
Calling Popen.wait/communicate after the subprocess has terminated is just for reaping the zombie process, and the two methods make no difference in that case, correct?
The difference is that proc.communicate() closes the pipes before reaping the child process. It releases file descriptors (a finite resource) to be used by other files in your program.
About the buffer: if the output fills the buffer, will the subprocess hang? Does that mean that with the default bufsize=0 I need to read from stdout as soon as possible so that the subprocess doesn't block?
No. It is a different buffer. bufsize controls the buffer inside the parent that is filled/drained when you call the .readline() method. There won't be a deadlock whatever bufsize is.
The code (as written above) won't deadlock no matter how much output the child might produce.
The code in @falsetru's answer (below) can deadlock because it creates two pipes (stdout=PIPE, stderr=PIPE) but reads only from one of them (proc.stderr).
There are several buffers between the child and the parent, e.g., C stdio's stdout buffer (a libc buffer inside the child process, inaccessible from the parent) and the child's stdout OS pipe buffer (inside the kernel; the parent process may read the data from here). These buffers are fixed in size: they won't grow if you put more data into them. If stdio's buffer overflows (e.g., during a printf() call), the data is pushed downstream into the child's stdout OS pipe buffer. If nobody reads from the pipe, then this OS pipe buffer fills up and the child blocks (e.g., on the write() system call) trying to flush the data.
To be concrete, I've assumed a C stdio-based program and a POSIXy OS.
The deadlock happens because the parent tries to read from the stderr pipe that is empty because the child is busy trying to flush its stdout. Thus both processes hang.
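If you really do need the two streams as separate pipes, the simple deadlock-free option is to let communicate() drain both of them for you:
from subprocess import Popen, PIPE

proc = Popen(['./mr-task.sh'], stdout=PIPE, stderr=PIPE)
out, err = proc.communicate()  # reads both pipes concurrently, then reaps the child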
One possible reason is that the output is printed to standard error instead of standard output.
Try to replace stdout with stderr:
from subprocess import Popen, PIPE
proc = Popen(['./mr-task.sh'], stdout=PIPE, stderr=PIPE)
while True:
    out = proc.stderr.readline() # <----
    if not out:
        break
    print(out)
I am using linux/cpython 3.3/bash. Here's my problem:
#!/usr/bin/env python3
from subprocess import Popen, PIPE, DEVNULL
import time
s = Popen('cat', stdin=PIPE, stdout=DEVNULL, stderr=DEVNULL)
s.stdin.write(b'helloworld')
s.stdin.close()
time.sleep(1000) #doing stuff
This leaves cat as a zombie (and I'm busy "doing stuff" and can't wait on the child process). Is there a way in bash that I can wrap cat (e.g. through creating a grand-child) that would allow me to write to cat's stdin, but have init take over as the parent? A python solution would work too, and I can also use nohup, disown etc.
Run the subprocess from another process whose only task is to wait on it.
import os
import sys
import time
from subprocess import Popen, PIPE, DEVNULL

pid = os.fork()
if pid == 0:
    s = Popen('cat', stdin=PIPE, stdout=DEVNULL, stderr=DEVNULL)
    s.stdin.write(b'helloworld')
    s.stdin.close()
    s.wait()
    sys.exit()
time.sleep(1000)
One workaround might be to "daemonize" your cat: fork, then quickly fork again and exit in the 2nd process, with the 1st one wait()ing for the 2nd. The 3rd process can then exec() cat, which will inherit its file descriptors from its parent. Thus you need to create a pipe first, then close stdin in the child and dup() it from the pipe.
I don't know how to do these things in python, but I'm fairly certain it should be possible.
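A rough Python sketch of that double-fork idea (hedged; treat it as an outline rather than a definitive implementation): the grandchild is adopted by init, so nothing is left to reap once the intermediate child has been waited for.
import os

r, w = os.pipe()
pid = os.fork()
if pid == 0:                        # 2nd process
    if os.fork() == 0:              # 3rd process, reparented to init when its parent exits
        os.close(w)
        os.dup2(r, 0)               # cat reads from the pipe
        os.close(r)
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, 1)
        os.dup2(devnull, 2)
        os.execvp('cat', ['cat'])
    os._exit(0)                     # 2nd process exits immediately
os.close(r)
os.waitpid(pid, 0)                  # reap the 2nd process; no zombie remains
os.write(w, b'helloworld')
os.close(w)
# ... keep "doing stuff"; init reaps cat when it finishes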
I have the following code in a loop:
while True:
    # Define shell_command
    p1 = Popen(shell_command, shell=shell_type, stdout=PIPE, stderr=PIPE, preexec_fn=os.setsid)
    result = p1.stdout.read();
    # Define condition
    if condition:
        break;
where shell_command is something like ls (it just prints stuff).
I have read in different places that I can close/terminate/exit a Popen object in a variety of ways, e.g.:
p1.stdout.close()
p1.stdin.close()
p1.terminate
p1.kill
My question is:
What is the proper way of closing a subprocess object once we are done using it?
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands? Would that be more efficient in any way than opening new subprocess objects each time?
Update
I am still a bit confused about the sequence of steps to follow depending on whether I use p1.communicate() or p1.stdout.read() to interact with my process.
From what I understood in the answers and the comments:
If I use p1.communicate() I don't have to worry about releasing resources, since communicate() would wait until the process is finished, grab the output and properly close the subprocess object
If I follow the p1.stdout.read() route (which I think fits my situation, since the shell command is just supposed to print stuff) I should call things in this order:
p1.wait()
p1.stdout.read()
p1.terminate()
Is that right?
What is the proper way of closing a subprocess object once we are done using it?
stdout.close() and stdin.close() will not terminate a process unless it exits itself on end of input or on write errors.
.terminate() and .kill() both do the job, with kill being a bit more "drastic" on POSIX systems, as SIGKILL is sent, which cannot be ignored by the application. Specific differences are explained in this blog post, for example. On Windows, there's no difference.
Also, remember to .wait() and to close the pipes after killing a process to avoid zombies and force the freeing of resources.
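For example, a shutdown sequence along those lines on Python 3.3+ (proc is an existing Popen object; the five-second grace period is an arbitrary choice):
import subprocess

proc.terminate()                    # polite request (SIGTERM on POSIX)
try:
    proc.wait(timeout=5)
except subprocess.TimeoutExpired:
    proc.kill()                     # it ignored SIGTERM; be drastic
    proc.wait()
for pipe in (proc.stdin, proc.stdout, proc.stderr):
    if pipe:
        pipe.close()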
A special case that often comes up is a process that reads from STDIN, writes its result to STDOUT, and closes itself when EOF is encountered. With these kinds of programs, it's often sensible to use Popen.communicate():
>>> p = Popen(["sort"], stdin=PIPE, stdout=PIPE)
>>> p.communicate("4\n3\n1")
('1\n3\n4\n', None)
>>> p.returncode
0
This can also be used for programs which print something and exit right after:
>>> p = Popen(["ls", "/home/niklas/test"], stdin=PIPE, stdout=PIPE)
>>> p.communicate()
('file1\nfile2\n', None)
>>> p.returncode
0
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands? Would that be more efficient in any way than opening new subprocess objects each time?
I don't think the subprocess module supports this and I don't see what resources could be shared here, so I don't think it would give you a significant advantage.
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands?
Yes.
#!/usr/bin/env python
from __future__ import print_function
import uuid
import random
from subprocess import Popen, PIPE, STDOUT
MARKER = str(uuid.uuid4())
shell_command = 'echo a'
p = Popen('sh', stdin=PIPE, stdout=PIPE, stderr=STDOUT,
          universal_newlines=True)  # decode output as utf-8, newline is '\n'
while True:
    # write next command
    print(shell_command, file=p.stdin)
    # insert MARKER into stdout to separate output from different shell_command
    print("echo '%s'" % MARKER, file=p.stdin)
    # read command output
    for line in iter(p.stdout.readline, MARKER+'\n'):
        if line.endswith(MARKER+'\n'):
            print(line[:-len(MARKER)-1])
            break  # command output ended without a newline
        print(line, end='')
    # exit on condition
    if random.random() < 0.1:
        break
# cleanup
p.stdout.close()
if p.stderr:
    p.stderr.close()
p.stdin.close()
p.wait()
Put while True inside try: ... finally: to perform the cleanup in case of exceptions. On Python 3.2+ you could use with Popen(...): instead.
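The context-manager form mentioned above would look roughly like this; on exit it closes the pipes and waits for the shell even if an exception escapes the loop (run_commands is a hypothetical helper wrapping the loop above):
with Popen('sh', stdin=PIPE, stdout=PIPE, stderr=STDOUT,
           universal_newlines=True) as p:
    run_commands(p)  # hypothetical helper: the same command/marker loop as above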
Would that be more efficient in any way than opening new subprocess objects each time?
Does it matter in your case? Don't guess. Measure it.
The "correct" order is:
Create a thread to read stdout (and a second one to read stderr, unless you merged them into one).
Write commands to be executed by the child to stdin. If you're not reading stdout at the same time, writing to stdin can block.
Close stdin (this is the signal for the child that it can now terminate by itself whenever it is done)
When stdout returns EOF, the child has terminated. Note that you need to synchronize the stdout reader thread and your main thread.
Call wait() to see if there was a problem and to clean up the child process.
If you need to stop the child process for any reason (maybe the user wants to quit), then you can:
Close stdin if the child terminates when it reads EOF.
Kill the child with terminate(). This is the correct solution for child processes which ignore stdin.
If the child doesn't respond, try kill().
In all three cases, you must call wait() to clean up the dead child process.
It depends on what you expect the process to do; you should always call p1.wait() in order to avoid zombies. Other steps depend on the behaviour of the subprocess: if it produces any output, you should consume it (e.g. p1.stdout.read(), but this could eat a lot of memory) and only then call p1.wait(); alternatively you may wait for some timeout and call p1.terminate() to kill the process if you think it isn't working as expected, and possibly call p1.wait() to clean up the zombie.
Alternatively, p1.communicate(...) would do the handling of I/O and the waiting for you (but not the killing).
Subprocess objects aren't supposed to be reused.
I want to execute a program in a python application, it will run in the background but eventually come to the foreground.
A GUI is used to interact with it. But controls are offered via a console on stdin and stdout. I want to be able to control it using my application's GUI, so my first idea was:
Fork
in the parent, dup2 stdin and stdout in order to access them
exec the child
Is this easily implementable in python and how? Are there alternative ways to achieve what I want, what would that be?
First, the python subprocess module is the correct answer.
As a subprocess example:
import subprocess
x = subprocess.check_output(["echo","one","two","three"])
where x will be the output (a Python 3 bytes object; use x.decode('utf-8') to get a string).
Note that this will NOT capture stderr. If you need stderr as well, you can do something like:
x = subprocess.check_output(["bash","-c", 'echo foo; echo bar >&2'],stderr=subprocess.STDOUT)
Of course, there are many other ways of capturing stderr, including to a different output variable.
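For instance, on Python 3.5+ subprocess.run() can capture the two streams into separate variables:
import subprocess

result = subprocess.run(['bash', '-c', 'echo foo; echo bar >&2'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out = result.stdout   # b'foo\n'
err = result.stderr   # b'bar\n'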
Using direct control
However, if you are doing something tricky and need to have direct control, examine the code below:
import os
import sys

rside, wside = os.pipe()
if not os.fork():
    # Child
    os.close(rside)
    # Make stdout go to parent
    os.dup2(wside, 1)
    # Make stderr go to parent
    os.dup2(wside, 2)
    # Optionally make stdin come from nowhere
    devnull = os.open("/dev/null", os.O_RDONLY)
    os.dup2(devnull, 0)
    # Execute the desired program
    os.execve("/bin/bash", ["/bin/bash", "-c", "echo stdout; echo stderr >&2"], os.environ)
    print("Failed to exec program!")
    sys.exit(1)
# Parent
os.close(wside)
pyrside = os.fdopen(rside)
for line in pyrside:
    print("Child (stdout or stderr) said: <%s>" % line)
# Prevent zombies! Reap the child after exit
pid, status = os.waitpid(-1, 0)
print("Child exited: pid %d returned %d" % (pid, status))
Note: @Beginner's answer is flawed in a few ways: os._exit(0) is included, which immediately causes the child to exit, rendering everything else pointless; there is no os.execve(), which was the primary goal of the question; and there is no way to access the child's stdout/stderr, another goal of the question.
This is reasonably easy using the standard Python subprocess module:
http://docs.python.org/py3k/library/subprocess.html
That is not very complex to build! Check this example:
import os

if os.fork():
    os._exit(0)
os.setsid()
os.chdir("/")
fd = os.open("/dev/null", os.O_RDWR)
os.dup2(fd, 0)
os.dup2(fd, 1)
os.dup2(fd, 2)
if fd > 2:
    os.close(fd)
This Python code detaches the process: it forks, creates a new session, changes the working directory, opens /dev/null, duplicates it onto the standard file descriptors, and closes the spare descriptor.