I want to execute a program from a Python application; it will run in the background but eventually come to the foreground.
A GUI is used to interact with it, but its controls are offered via a console on stdin and stdout. I want to be able to control it using my application's GUI, so my first idea was:
Fork
in the parent, dup2 stdin and stdout in order to access them
exec the child
Is this easily implementable in Python, and how? Are there alternative ways to achieve what I want, and what would they be?
First, the Python subprocess module is the correct answer.
As a subprocess example:
import subprocess
x = subprocess.check_output(["echo","one","two","three"])
Here x will be the output (a Python 3 bytes object; use x.decode('utf-8') to get a string).
Note that this will NOT duplicate stderr. If you need stderr as well, you can do something like:
x = subprocess.check_output(["bash","-c", 'echo foo; echo bar >&2'],stderr=subprocess.STDOUT)
Of course, there are many other ways of capturing stderr, including to a different output variable.
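For instance, one way (a sketch only, using subprocess.run, which is available in Python 3.5+) to keep stderr in its own variable:
import subprocess
# Capture stdout and stderr into separate variables.
result = subprocess.run(["bash", "-c", "echo foo; echo bar >&2"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(result.stdout)   # b'foo\n'
print(result.stderr)   # b'bar\n'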
Using direct control
However, if you are doing something tricky and need to have direct control, examine the code below:
import os
import sys

rside, wside = os.pipe()
if not os.fork():
    # Child
    os.close(rside)
    # Make stdout go to parent
    os.dup2(wside, 1)
    # Make stderr go to parent
    os.dup2(wside, 2)
    # Optionally make stdin come from nowhere
    devnull = os.open("/dev/null", os.O_RDONLY)
    os.dup2(devnull, 0)
    # Execute the desired program
    os.execve("/bin/bash", ["/bin/bash", "-c", "echo stdout; echo stderr >&2"], os.environ)
    print("Failed to exec program!")
    sys.exit(1)
# Parent
os.close(wside)
pyrside = os.fdopen(rside)
for line in pyrside:
    print("Child (stdout or stderr) said: <%s>" % line)
# Prevent zombies! Reap the child after exit
pid, status = os.waitpid(-1, 0)
print("Child exited: pid %d returned %d" % (pid, status))
Note: #Beginner's answer is flawed in a few ways: it includes os._exit(0), which immediately makes the child exit and renders everything after it pointless; it never calls os.execve(), so the primary goal of the question is not met; and it gives no way to access the child's stdout/stderr, which is another goal of the question.
This is reasonably easy using the standard Python subprocess module:
http://docs.python.org/py3k/library/subprocess.html
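For example (a rough sketch only; "./the_program" is a placeholder for the console-controlled GUI program), you can attach pipes to its console and drive it from your application:
import subprocess
# "./the_program" is a placeholder for the program you want to control.
proc = subprocess.Popen(["./the_program"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        universal_newlines=True, bufsize=1)
proc.stdin.write("some console command\n")   # send a command to its console
proc.stdin.flush()
reply = proc.stdout.readline()               # read one line of its response
print("program said:", reply)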
It is not that complex to build!
Check this example:
import os
if os.fork():
    os._exit(0)
os.setsid()
os.chdir("/")
fd = os.open("/dev/null", os.O_RDWR)
os.dup2(fd, 0)
os.dup2(fd, 1)
os.dup2(fd, 2)
if fd > 2:
    os.close(fd)
This Python code forks into the background, starts a new session, changes the working directory, redirects the standard file descriptors to /dev/null, and closes the leftover descriptor.
Related
I have a python script that does this:
p = subprocess.Popen(["pythonscript.py"], stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=False)
theStdin=request.input.encode('utf-8')
(outputhere,errorshere) = p.communicate(input=theStdin)
It works as expected, it waits for the subprocess to finish via p.communicate(). However within the pythonscript.py I want to "fire and forget" a "grandchild" process. I'm currently doing this by overwriting the join function:
from multiprocessing import Process

class EverLastingProcess(Process):
    def join(self, *args, **kwargs):
        pass  # Overrides join so that it doesn't block; otherwise the parent waits.
    def __del__(self):
        pass
And starting it like this:
p = EverLastingProcess(target=nameOfMyFunction, args=(arg1, etc,), daemon=False)
p.start()
This also works fine when I just run pythonscript.py in a bash terminal or bash script: control returns with a response while the process started by EverLastingProcess keeps going. However, when I run pythonscript.py via Popen as shown above, the timings suggest that Popen is waiting on the grandchild to finish.
How can I make it so that the Popen only waits on the child process, and not any grandchild processes?
The earlier solution (using the join override together with the shell=True addition) stopped working when we upgraded our Python recently.
There are many references on the internet about the pieces and parts of this, but it took me some doing to come up with a useful solution to the entire problem.
The following solution has been tested in Python 3.9.5 and 3.9.7.
Problem Synopsis
The names of the scripts match those in the code example below.
A top-level program (grandparent.py):
Uses subprocess.run or subprocess.Popen to call a program (parent.py)
Checks return value from parent.py for sanity.
Collects stdout and stderr from the main process 'parent.py'.
Does not want to wait around for the grandchild to complete.
The called program (parent.py)
Might do some stuff first.
Spawns a very long process (the grandchild - "longProcess" in the code below).
Might do a little more work.
Returns its results and exits while the grandchild (longProcess) continues doing what it does.
Solution Synopsis
The important part isn't so much what happens with subprocess. Instead, the method for creating the grandchild/longProcess is the critical part. It is necessary to ensure that the grandchild is truly emancipated from parent.py.
Subprocess only needs to be used in a way that captures output.
The longProcess (grandchild) needs the following to happen:
It should be started using multiprocessing.
It needs multiprocessing's 'daemon' set to False.
It should also be invoked using the double-fork procedure.
In the double-fork, extra work needs to be done to ensure that the process is truly separate from parent.py. Specifically:
Move the execution away from the environment of parent.py.
Use file handling to ensure that the grandchild no longer uses the file handles (stdin, stdout, stderr) inherited from parent.py.
Example Code
grandparent.py - calls parent.py using subprocess.run()
#!/usr/bin/env python3
import subprocess
p = subprocess.run(["/usr/bin/python3", "/path/to/parent.py"], capture_output=True)
## Comment out the following if you don't need reassurance
print("The return code is: " + str(p.returncode))
print("The standard out is: ")
print(p.stdout)
print("The standard error is: ")
print(p.stderr)
parent.py - starts the longProcess/grandchild and exits, leaving the grandchild running. After 10 seconds, the grandchild will write timing info to /tmp/timelog.
#!/usr/bin/env python3
import time
def longProcess():
    time.sleep(10)
    fo = open("/tmp/timelog", "w")
    fo.write("I slept! The time now is: " + time.asctime(time.localtime()) + "\n")
    fo.close()
import os,sys
def spawnDaemon(func):
    # do the UNIX double-fork magic, see Stevens' "Advanced
    # Programming in the UNIX Environment" for details (ISBN 0201563177)
    try:
        pid = os.fork()
        if pid > 0:  # parent process
            return
    except OSError as e:
        print("fork #1 failed. See next.")
        print(e)
        sys.exit(1)
    # Decouple from the parent environment.
    os.chdir("/")
    os.setsid()
    os.umask(0)
    # do second fork
    try:
        pid = os.fork()
        if pid > 0:
            # exit from second parent
            sys.exit(0)
    except OSError as e:
        print("fork #2 failed. See next.")
        print(e)
        sys.exit(1)
    # Redirect standard file descriptors.
    # Here, they are reassigned to /dev/null, but they could go elsewhere.
    sys.stdout.flush()
    sys.stderr.flush()
    si = open('/dev/null', 'r')
    so = open('/dev/null', 'a+')
    se = open('/dev/null', 'a+')
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())
    # Run your daemon
    func()
    # Ensure that the daemon exits when complete
    os._exit(os.EX_OK)
import multiprocessing
daemonicGrandchild=multiprocessing.Process(target=spawnDaemon, args=(longProcess,))
daemonicGrandchild.daemon=False
daemonicGrandchild.start()
print("have started the daemon") # This will get captured as stdout by grandparent.py
References
The code above was mainly inspired by the following two resources.
This reference is succinct about the use of the double-fork but does not include the file handling we need in this situation.
This reference contains the needed file handling, but does many other things that we do not need.
Edit: the below stopped working after a Python upgrade, see the accepted answer from Lachele.
Working answer from a colleague, change to shell=True like this:
p = subprocess.Popen("pythonscript.py", stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=True)
I've tested it, and the grandchild subprocesses stay alive after the child process returns, without waiting for them to finish.
I have an assignment where we are making a shell for the Linux OS. And I have a lot of questions!
I was allowed to do it in python using some of the methods from the os library. The idea is that my program should communicate directly with the linux operating system calls.
So this include:
Create
Open
Close
Read
Write
Exit
Pipe
Exec
Fork
Dup2
Wait
So far I made a working shell which can execute commands with execvp but I am having trouble with the piping stuff.
I was reading this Q/A and I felt that I almost understood what I have to do.
I guess I have to use dup2 to write (and maybe read later). Also, I am a little confused about whether I should use read() and write() at some point with regard to piping.
from os import (
execvp,
wait,
fork,
close,
pipe,
dup2,
)
from os import _exit as kill
STDIN = 0
STDOUT = 1
STDERR = 2
CHILD = 0
def piping(cmd):
    reading, writing = pipe()
    pid = fork()
    if pid > CHILD:
        wait()
        close(writing)
        dup2(reading, STDIN)
        execvp(cmd[1][0], cmd[1])
        kill(127)
    elif pid == CHILD:
        close(reading)
        dup2(writing, STDOUT)
        execvp(cmd[0][0], cmd[0])
        kill(127)
    else:
        print('Command not found:', cmd)
piping([['ls', '-l', '/'], ['grep', 'var']])
If I run this code it works. But I don't understand some things:
How can the execvp know that it gets extra arguments from the pipe?
Why should I kill at the end, and why is it 127?
How is it possible to run the execvp inside the parent? Is this also possible in C?
If I have a nested pipe, e.g. ls -l / | grep var | xclip -selection clipboard, should I create a new fork then? (maybe some recursion; see the sketch below)
It is not a part of the assignment to write to a file, but I might implement it as well later, when I get the piping to work.
Should I use dup2 for that as well, or maybe read/write?
Thank you in advance! :)
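One way the nested-pipe case could work is recursion: fork once per extra pipe stage, let the child produce the input for the last command, and exec the last command in the current process. This is only a sketch under those assumptions (run_pipeline is a made-up helper, and error handling is omitted):
import os
def run_pipeline(cmds):
    # cmds is a list of argv lists, e.g. [['ls', '-l', '/'], ['grep', 'var']].
    if len(cmds) == 1:
        os.execvp(cmds[0][0], cmds[0])
        os._exit(127)                  # only reached if exec failed
    reading, writing = os.pipe()
    if os.fork() == 0:
        # Child: run everything except the last command, stdout -> pipe.
        os.close(reading)
        os.dup2(writing, 1)
        os.close(writing)
        run_pipeline(cmds[:-1])        # never returns
    # Parent: the last command reads its stdin from the pipe.
    os.close(writing)
    os.dup2(reading, 0)
    os.close(reading)
    os.execvp(cmds[-1][0], cmds[-1])
    os._exit(127)
# Run the whole pipeline in a child so the calling shell keeps running;
# the pipes themselves provide the synchronization, and the single wait()
# here reaps the final stage.
if os.fork() == 0:
    run_pipeline([['ls', '-l', '/'], ['grep', 'var'], ['wc', '-l']])
os.wait()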
I want to call a program from Python and make it believe that its stdout is a tty even when Python's process stdout is attached to a pipe. So I used the pty.spawn function to achieve that, which can be verified from the following :
$ python -c "import sys; from subprocess import call; call(sys.argv[1:])" python -c "import sys; print sys.stdout.isatty()" | cat
False
$ python -c "import sys; import pty; pty.spawn(sys.argv[1:])" python -c "import sys; print sys.stdout.isatty()" | cat
True
(We see that in the second command we have achieved our goal, i.e. the spawned process is tricked into thinking that its stdout is a tty.)
But the problem is that if we use pty.spawn then its input is not echoed, rather it is being redirected to the master's stdout. This can be seen by the following command :
$ python -c "import sys; import pty; pty.spawn(sys.argv[1:])" cat > out.txt
$ # Typed "hello" in input, but that is not echoed (use ^D to exit). It is redirected to output.txt
$ cat out.txt
hello
hello
(But this problem does not exist when we use subprocess.call
$ python -c "import sys; from subprocess import call; call(sys.argv[1:])" cat > out1.txt
hello
$ cat out1.txt
hello
since its stdin and stdout are correctly attached to the master.)
I could not find a way so that a program is called by Python, where it sees its stdout as a tty (similar to pty.spawn) but its input is echoed correctly (similar to subprocess.call). Any ideas?
You are creating a terminal with stdout connected to a file, so the normal echo-back that terminals do is being sent to the file rather than to the screen.
I'm not sure that spawn is intended to be used directly like this: the pty library offers pty.fork() to create a child process and return a file descriptor for its stdin/stdout, but you'll need a lot more code to use this.
To overcome the current problem you are having with spawn, here are two easy options:
Option 1: If all you care about is sending the output of the spawned command to a file, then you can do (I prefer named pipes and here-files for Python one-liners):
$ python <(cat << EOF
import sys
import pty
print 'start to stdout only'
pty.spawn(sys.argv[1:])
print 'complete to stdout only'
EOF
) bash -c 'cat > out.txt'
which will look like this when run:
start to stdout only
hello
complete to stdout only
That shows that the input (I typed hello) and the output of the print statements are going to the screen. The contents of out.txt will be:
$ cat out.txt
hello
That is, only what you typed.
Option 2: If on the other hand you want the out file to contain the python output around the spawned command output, then you need something a bit more complicated, like:
python <(cat << EOF
import sys
import pty
import os
old_stdout = sys.stdout
sys.stdout = myfdout = os.fdopen(4,"w")
print 'start to out file only'
myfdout.flush()
pty.spawn(sys.argv[1:])
print 'complete to out file only'
sys.stdout = old_stdout
EOF
) bash -c 'cat >&4' 4>out.txt
which will only have this output to the terminal when run (ie whatever you type):
hello
but the out file will contain:
$ cat out.txt
start to out file only
hello
complete to out file only
Background: the Python pty library is powerful: it creates a terminal device attached to the Python process's stdout and stdin. I'd imagine most uses of it will go through the pty.fork() call so that the real stdin/stdout are not affected.
However in your case, at your shell, you redirected the stdout of the python process to a file. The resulting pty also therefore had its stdout attached to the file so the normal action of echoing stdin back to stdout was being redirected. The regular stdout (screen) was still in place but not being used by the new pty.
The key difference for Option 1 above is to move the redirection of stdout to occur somewhere inside the pty.spawn call, so that the pty created still has a clear connection to the actual terminal stdout (for when it tries to echo stdin as you type)
The difference for Option 2 is to create a second channel on an arbitrary file descriptor (ie file descriptor 4) and use this in place of stdout, once you are inside python and when you create your spawned process (ie redirect the stdout of your spawned process to the same file descriptor)
Both of these differences prevent the pty that pty.spawn creates from having its stdout changed or disconnected from the real terminal. This allows the echo-back of stdin to work properly.
There are packages that use the pty library and give you more control, but you'll find most of these use pty.fork() (and interestingly I haven't found one yet that actually uses pty.spawn).
EDIT: Here's an example of using pty.fork():
import sys
import pty
import os
import select
import time
import tty
import termios
print 'start'
try:
    pid, fd = pty.fork()
    print 'forked'
except OSError as e:
    print e
if pid == pty.CHILD:
    cmd = sys.argv[1]
    args = sys.argv[1:]
    print cmd, args
    os.execvp(cmd, args)
else:
    tty.setraw(fd, termios.TCSANOW)
    try:
        child_file = os.fdopen(fd, 'rw')
        read_list = [sys.stdin, child_file]
        while read_list:
            ready = select.select(read_list, [], [], 0.1)[0]
            if not ready and len(read_list) < 2:
                break
            elif not ready:
                time.sleep(1)
            else:
                for file in ready:
                    try:
                        line = file.readline()
                    except IOError as e:
                        print "Ignoring: ", e
                        line = None
                    if not line:
                        read_list.remove(file)
                    else:
                        if file == sys.stdin:
                            os.write(fd, line)
                        else:
                            print "from child:", line
    except KeyboardInterrupt:
        pass
EDIT This question has some good links for pty.fork()
UPDATE: should have put some comments in the code
How the pty.fork() example works:
When the interpreter executes the call to pty.fork(), execution splits into two: there are now two processes that both appear to have just executed the pty.fork() call.
One process is the one you were originally in (the parent) and one is new (the child).
In the parent, pid and fd are set to the process id of the child and a file descriptor connected to the child's stdin and stdout: in the parent, when you read from fd you are reading what has been written to the child's stdout; when you write to fd you are writing to the child's stdin. So now, in the parent, we have a way of communicating with the other process over its stdout/stdin.
In the child, pid is set to 0 and fd is not set. If we want to talk to the parent process, we can read and write over stdin/stdout, knowing that the parent can and should do something with this.
The two processes are going to execute the same code from this point on, but we can tell whether we are in the parent or the child based on the value of pid. If we want to do different things in the child and the parent, then we just need a conditional statement that sends the child down one code path and the parent down a different one. That's what this does:
if pid == pty.CHILD:
    # the child process will execute this code
    ....
else:
    # the parent process will execute this code
    ...
In the child, we simply want to spawn the new command in the new pty. os.execvp is used because we get more control over the pty as a terminal with this method, but essentially it is the same as pty.spawn(). This means the child's stdin/stdout are now connected to the command you wanted via a pty. Importantly, any input or output from the command (or the pty, for that matter) will be available to the parent process by reading from fd, and the parent can write to the command via the pty by writing to fd.
So now, in the parent, we need to connect the real stdin/stdout to the child's stdin/stdout via reading and writing to fd. That's what the parent code does (the else part). Any data that turns up on the real stdin is written out to fd. Any data read from fd (by the parent) is written to stdout. So the only thing the parent process is now doing is proxying between the real stdin/stdout and fd. If you wanted to do something with the input and output of your command programmatically, this is where you would do it.
The only other thing that happens in the parent is this call:
tty.setraw(fd, termios.TCSANOW)
This is one way to tell the pty in the child to stop doing echo-back.
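Alternatively (not part of the original answer, just a sketch), you could leave the terminal settings otherwise intact and only clear the ECHO flag on the master descriptor with termios:
import termios
def disable_echo(fd):
    # fd is assumed to be the pty master descriptor returned by pty.fork()
    attrs = termios.tcgetattr(fd)
    attrs[3] &= ~termios.ECHO      # attrs[3] is the local-modes (lflags) word
    termios.tcsetattr(fd, termios.TCSANOW, attrs)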
This solves the problem you were originally having:
- your local terminal is connected only to the parent process
- normal echo-back is in place (ie before your input is passed into the process)
- stdout of the process can be redirected
- whatever you do with your terminal stdout has no impact on the stdin/stdout of the child process
- the child process has been told to not do local echo-back of its stdin
That seems like a lot of explanation - if anyone has any edits for clarity?
I have a parent Python script that launches a child (which launches grandchildren), and after some time, I terminate the child, but the grandchildren continue to pump to stdout. After I kill the child, I want to suppress/redirect the stdout and stderr of the grandchildren (and all their descendants).
Here is the parent:
import time
import subprocess
proc = subprocess.Popen('./child.sh')
print("Dad: I have begotten a son!")
time.sleep(1)
proc.kill()
proc.wait()
print("Dad: My son hath died!")
time.sleep(2)
print("Dad: Why does my grandson still speak?")
Here is the child script which I cannot modify.
#!/bin/bash
./grandchild.sh &
echo "Child: I had a son!"
for (( i = 0; i < 10; i++ )); do
echo "Child: Hi Dad, meet your grandson!"
sleep 1
done
exit 0
Here is a noisy grandchild which I cannot modify.
#!/bin/bash
for (( i = 0; i < 10; i++ )); do
echo "Grandchild: Wahhh"
sleep 1
done
exit 0
I tried doing this right before killing the child:
import os
f = open(os.devnull,"w")
proc.stdout = proc.stderr = f
But it doesn't seem to work. The output is:
> ./parent.py
Dad: I have begotten a son!
Child: I had a son!
Child: Hi Dad, meet your grandson!
Grandchild: Wahhh
Dad: My son hath died!
Grandchild: Wahhh
Grandchild: Wahhh
Dad: Why does my grandson still speak?
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
When you invoke subprocess.Popen you can tell it to redirect stdout and/or stderr. If you don't, it leaves them un-redirected: the child simply inherits the Python process's actual STDOUT_FILENO and STDERR_FILENO (which are fixed constants, 1 and 2).
This means that if Python's fd 1 and 2 are going to your tty session (perhaps on an underlying device like /dev/pts/0, for instance), the child—and, in this case, consequently the grandchild as well—are talking directly to the same session (the same /dev/pts/0). Nothing you do in the Python process itself can change this: those are independent processes with independent, direct access to the session.
What you can do is invoke ./child.sh with redirection in place:
proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
Quick side-note edit: if you want to discard all output from the child and its grandchildren, open os.devnull (either as you did, or with os.open() to get a raw integer file descriptor) and connect stdout and stderr to the underlying file descriptor. If you have opened it as a Python stream:
f = open(os.devnull, "w")
then the underlying file descriptor is f.fileno():
proc = subprocess.Popen('./child.sh', stdout=f.fileno(), stderr=f.fileno())
In this case you cannot get any output from any of the processes involved.
Now file descriptor 1 in the child is connected to a pipe-entity, rather than directly to the session. (Since there is no stderr= above, fd 2 in the child is still connected directly to the session.)
The pipe-entity, which lives inside the operating system, simply copies from one end (the "write end" of the pipe) to the other (the "read end"). Your Python process has control of the read-end. You must invoke the OS read system call—often not directly, but see below—on that read end, to collect the output from it.
In general, if you stop reading from your read-end, the pipe "fills up" and any process attempting an OS-level write on the write-end is "blocked" until someone with access to the read end (that's you, again) reads from it.
If you discard the read-end, leaving the pipe with nowhere to dump its output, the write end starts returning EPIPE errors and sending SIGPIPE signals, to any process attempting an OS-level write call. This kind of discard occurs when you call the OS-level close system call, assuming you have not handed the descriptor off to some other process(es). It also occurs when your process exits (under the same assumption, again).
There is no convenient method by which you can connect the read-end to an infinite data sink like /dev/null, at least in most Unix-like systems (there are a few with some special funky system calls to do this kind of "plumbing"). But if you plan to kill the child and are willing to let its grandchildren die from SIGPIPE signals, you can simply close the descriptor (or exit) and let the chips fall where they may.
Children and grandchildren can protect themselves from dying by setting SIGPIPE to SIG_IGN, or by blocking SIGPIPE. Signal masks are inherited across exec system calls so in some cases, you can block SIGPIPE for children (but some children will unblock signals).
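For example, a grandchild written in Python could protect itself like this (purely illustrative, since the scripts in this question are bash):
import signal
# Ignore SIGPIPE so a write to a closed pipe raises an error (EPIPE)
# instead of killing this process.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)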
If closing the descriptor is not suitable, you can create a new process that simply reads and discards incoming pipe data. If you use the fork system call, this is trivial. Alternatively some Unix-like systems allow you to pass file descriptors through AF_UNIX sockets to otherwise-unrelated (parent/child-wise) processes, so you could have a daemon that does this, reachable via an AF_UNIX socket. (This is nontrivial to code.)
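A rough sketch of that fork-based drain (spawn_drain is a made-up name; it assumes you pass it the read end you no longer want to service, e.g. proc.stdout):
import os
def spawn_drain(pipe):
    # Fork a helper that reads and discards everything arriving on the pipe,
    # so writers on the far end neither block nor receive SIGPIPE.
    if os.fork() == 0:
        fd = pipe.fileno()
        while os.read(fd, 65536):
            pass
        os._exit(0)
    pipe.close()   # the parent keeps no copy of the read end
# hypothetical usage, just before killing the child:
# spawn_drain(proc.stdout)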
If you wish the child process to send its stderr output to the same pipe, so that you can read both its stdout and its stderr, simply add stderr=subprocess.STDOUT to the Popen() call. If you wish the child process to send its stderr output to a separate pipe, add stderr=subprocess.PIPE. If you do the latter, however, things can get a bit tricky.
To prevent children from blocking, as noted above, you must invoke the OS read call. If there is only one pipe this is easy:
for line in proc.stdout:
...
for instance, or:
line = proc.stdout.readline()
will read the pipe one line at a time (modulo buffering inside Python). You can read as many or as few lines as you like.
If there are two pipes, though, you must read whichever one(s) is/are "full". Python's subprocess module defines the communicate() function to do this for you:
stdout, stderr = proc.communicate()
The drawback here is that communicate() reads to completion: it needs to get all output that can go to the write end of each pipe. This means it repeatedly calls the OS-level read operation until read indicates end-of-data. That occurs only when all processes that had, at some point, write access to the write end of the corresponding pipe, have closed that end of the pipe. In other words, it waits for the child and any grandchildren to close the descriptors connected to the write end of the pipe(s).
In general it's much simpler to use only one pipe, read as much (but only as much) as you like, then simply close the pipe:
proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
line1 = proc.stdout.readline()
line2 = proc.stdout.readline()
# that's all we care about
proc.stdout.close()
proc.kill()
status = proc.wait()
Whether this suffices depends on your particular problem.
If you don't care about the grandchildren, you could kill them all:
#!/usr/bin/env python3
import os
import signal
import subprocess
import sys
import time
proc = subprocess.Popen('./child.sh', start_new_session=True)
print("Dad: I have begotten a son!")
time.sleep(1)
print("Dad: kill'em all!")
os.killpg(proc.pid, signal.SIGKILL)
for msg in "dead... silence... crickets... chirping...".split():
    time.sleep(1)
    print(msg, end=' ', flush=True)
You can emulate start_new_session=True on old Python versions using preexec_fn=os.setsid. See Best way to kill all child processes.
You can collect children's output before the killing:
#!/usr/bin/env python
import collections
import os
import signal
import threading
from subprocess import Popen, PIPE, STDOUT
def killall(proc):
    print "Dad: kill'em all!"
    os.killpg(proc.pid, signal.SIGKILL)
    proc.wait()
proc = Popen('./child.sh', stdout=PIPE, stderr=STDOUT, preexec_fn=os.setsid)
print("Dad: I have begotten a son!")
# kill in a second
hitman = threading.Timer(1, killall, [proc])
hitman.start()
# save last 200 lines of output
q = collections.deque(proc.stdout, maxlen=200)
hitman.cancel()
proc.wait()
# print collected output
print '*'*60
print ''.join(q).decode('ascii'),
print '*'*60
See Stop reading process output in Python without hang?
Right now, your subprocess is allowed to communicate with your terminal via STDOUT and STDERR. Instead, you can hijack this data from the subprocess like so:
import subprocess
cmd = ['./child.sh']
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
This redirects all STDERR output of your child to the normal STDOUT channel, then redirects the normal STDOUT output of your child to your python script, via a PIPE. You can now read from that PIPE using line = process.stdout.readline(), which grabs a single line of output. You can print that back to STDOUT with print(line).
Once you kill your son (gasp), you can simply stop relaying output from your subprocess.
For more information on subprocess, see one of my previous answers, which is similar to this one: python subprocess.call output is not interleaved
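Putting that together, a minimal sketch (using child.sh from the question; the number of lines read here is arbitrary):
import subprocess
cmd = ['./child.sh']
process = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT,
                           universal_newlines=True)
for _ in range(3):                 # relay only as much output as we want
    line = process.stdout.readline()
    print(line, end='')
process.kill()                     # kill the son (gasp)
process.stdout.close()             # stop servicing the pipe
process.wait()                     # and reap him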
I have the following code in a loop:
while True:
    # Define shell_command
    p1 = Popen(shell_command, shell=shell_type, stdout=PIPE, stderr=PIPE, preexec_fn=os.setsid)
    result = p1.stdout.read()
    # Define condition
    if condition:
        break
where shell_command is something like ls (it just prints stuff).
I have read in different places that I can close/terminate/exit a Popen object in a variety of ways, e.g. :
p1.stdout.close()
p1.stdin.close()
p1.terminate()
p1.kill()
My question is:
What is the proper way of closing a subprocess object once we are done using it?
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands? Would that be more efficient in any way than opening new subprocess objects each time?
Update
I am still a bit confused about the sequence of steps to follow depending on whether I use p1.communicate() or p1.stdout.read() to interact with my process.
From what I understood in the answers and the comments:
If I use p1.communicate() I don't have to worry about releasing resources, since communicate() would wait until the process is finished, grab the output and properly close the subprocess object
If I follow the p1.stdout.read() route (which I think fits my situation, since the shell command is just supposed to print stuff) I should call things in this order:
p1.wait()
p1.stdout.read()
p1.terminate()
Is that right?
What is the proper way of closing a subprocess object once we are done using it?
stdout.close() and stdin.close() will not terminate a process unless it exits itself on end of input or on write errors.
.terminate() and .kill() both do the job, with kill being a bit more "drastic" on POSIX systems, as SIGKILL is sent, which cannot be ignored by the application. Specific differences are explained in this blog post, for example. On Windows, there's no difference.
Also, remember to .wait() and to close the pipes after killing a process to avoid zombies and force the freeing of resources.
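For example, a sketch of that cleanup sequence (the timeout value is arbitrary, and wait(timeout=...) needs Python 3.3+):
import subprocess
p1 = subprocess.Popen(["ls"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output = p1.stdout.read()          # consume the output first
p1.terminate()                     # ask the process to exit (SIGTERM on POSIX)
try:
    p1.wait(timeout=5)             # reap it so it doesn't linger as a zombie
except subprocess.TimeoutExpired:
    p1.kill()                      # escalate to SIGKILL
    p1.wait()
p1.stdout.close()                  # release the pipe file descriptors
p1.stderr.close()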
A special case that is often encountered are processes which read from STDIN and write their result to STDOUT, closing themselves when EOF is encountered. With these kinds of programs, it's often sensible to use subprocess.communicate:
>>> p = Popen(["sort"], stdin=PIPE, stdout=PIPE)
>>> p.communicate("4\n3\n1")
('1\n3\n4\n', None)
>>> p.returncode
0
This can also be used for programs which print something and exit right after:
>>> p = Popen(["ls", "/home/niklas/test"], stdin=PIPE, stdout=PIPE)
>>> p.communicate()
('file1\nfile2\n', None)
>>> p.returncode
0
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands? Would that be more efficient in any way than opening new subprocess objects each time?
I don't think the subprocess module supports this and I don't see what resources could be shared here, so I don't think it would give you a significant advantage.
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands?
Yes.
#!/usr/bin/env python
from __future__ import print_function
import uuid
import random
from subprocess import Popen, PIPE, STDOUT
MARKER = str(uuid.uuid4())
shell_command = 'echo a'
p = Popen('sh', stdin=PIPE, stdout=PIPE, stderr=STDOUT,
          universal_newlines=True)  # decode output as utf-8, newline is '\n'
while True:
    # write next command
    print(shell_command, file=p.stdin)
    # insert MARKER into stdout to separate output from different shell_command
    print("echo '%s'" % MARKER, file=p.stdin)
    # read command output
    for line in iter(p.stdout.readline, MARKER+'\n'):
        if line.endswith(MARKER+'\n'):
            print(line[:-len(MARKER)-1])
            break  # command output ended without a newline
        print(line, end='')
    # exit on condition
    if random.random() < 0.1:
        break
# cleanup
p.stdout.close()
if p.stderr:
    p.stderr.close()
p.stdin.close()
p.wait()
Put while True inside try: ... finally: to perform the cleanup in case of exceptions. On Python 3.2+ you could use with Popen(...): instead.
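For instance, on Python 3.2+ the explicit cleanup can be replaced by a context manager, roughly like this:
from subprocess import Popen, PIPE, STDOUT
# The with-block closes the pipes and waits for the process on exit,
# even if an exception is raised inside it.
with Popen('sh', stdin=PIPE, stdout=PIPE, stderr=STDOUT,
           universal_newlines=True) as p:
    print('echo a', file=p.stdin)
    p.stdin.close()
    for line in p.stdout:
        print(line, end='')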
Would that be more efficient in any way than opening new subprocess objects each time?
Does it matter in your case? Don't guess. Measure it.
The "correct" order is:
Create a thread to read stdout (and a second one to read stderr, unless you merged them into one).
Write commands to be executed by the child to stdin. If you're not reading stdout at the same time, writing to stdin can block.
Close stdin (this is the signal for the child that it can now terminate by itself whenever it is done)
When stdout returns EOF, the child has terminated. Note that you need to synchronize the stdout reader thread and your main thread.
call wait() to see if there was a problem and to clean up the child process
If you need to stop the child process for any reason (maybe the user wants to quit), then you can:
Close stdin if the child terminates when it reads EOF.
Kill the process with terminate(). This is the correct solution for child processes which ignore stdin.
If the child doesn't respond, try kill()
In all three cases, you must call wait() to clean up the dead child process.
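A minimal sketch of that sequence (the command and names here are mine, not from the answer): a reader thread drains stdout while the main thread writes commands, closes stdin, and then waits:
import subprocess
import threading
# A stand-in interactive child; cat simply echoes its stdin back.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, universal_newlines=True)
def drain(pipe):
    # 1. Reader thread: consume stdout until EOF so the child never blocks.
    for line in pipe:
        print("child said:", line, end="")
reader = threading.Thread(target=drain, args=(proc.stdout,))
reader.start()
proc.stdin.write("hello\n")   # 2. write commands to the child
proc.stdin.close()            # 3. EOF tells the child it may finish
reader.join()                 # 4. stdout EOF means the child is done
proc.wait()                   # 5. reap the child and check its status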
It depends on what you expect the process to do; you should always call p1.wait() in order to avoid zombies. Other steps depend on the behaviour of the subprocess: if it produces any output, you should consume it (e.g. p1.stdout.read(), but this could eat lots of memory) and only then call p1.wait(); or you may wait for some timeout and call p1.terminate() to kill the process if you think it isn't working as expected, and possibly call p1.wait() to clean up the zombie.
Alternatively, p1.communicate(...) would do the handling of I/O and waiting for you (but not the killing).
Subprocess objects aren't supposed to be reused.