Backgrounding a process after writing to its stdin - python

I am using linux/cpython 3.3/bash. Here's my problem:
#!/usr/bin/env python3
from subprocess import Popen, PIPE, DEVNULL
import time
s = Popen('cat', stdin=PIPE, stdout=DEVNULL, stderr=DEVNULL)
s.stdin.write(b'helloworld')
s.stdin.close()
time.sleep(1000) #doing stuff
This leaves cat as a zombie (and I'm busy "doing stuff" and can't wait on the child process). Is there a way in bash that I can wrap cat (e.g. through creating a grand-child) that would allow me to write to cat's stdin, but have init take over as the parent? A python solution would work too, and I can also use nohup, disown etc.

Run the subprocess from another process whose only task is to wait on it.
import os, sys, time
from subprocess import Popen, PIPE, DEVNULL

pid = os.fork()
if pid == 0:  # child: feed cat and wait on it so it gets reaped
    s = Popen('cat', stdin=PIPE, stdout=DEVNULL, stderr=DEVNULL)
    s.stdin.write(b'helloworld')
    s.stdin.close()
    s.wait()
    sys.exit()
time.sleep(1000)  # parent: doing stuff

One workaround might be to "daemonize" your cat: fork, then quickly fork again and exit in the 2nd process, with the 1st one wait()ing for the 2nd. The 3rd process can then exec() cat, which will inherit its file descriptors from its parent. Thus you need to create a pipe first, then close stdin in the child and dup() it from the pipe.
I don't know how to do these things in python, but I'm fairly certain it should be possible.
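Here is a minimal sketch of that idea in Python, assuming Linux/CPython as in the question; the variable names are mine, and real code would want error handling around the forks:
import os

r, w = os.pipe()                     # make the pipe before forking
pid = os.fork()
if pid == 0:                         # 2nd process: forks again and exits
    if os.fork() == 0:               # 3rd process: will become cat
        os.close(w)                  # keep only the read end
        os.dup2(r, 0)                # stdin now reads from the pipe
        os.close(r)
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, 1)          # discard stdout/stderr, as in the question
        os.dup2(devnull, 2)
        os.close(devnull)
        os.execvp('cat', ['cat'])
    os._exit(0)                      # 2nd process dies immediately...
os.waitpid(pid, 0)                   # ...so the parent reaps it here; init
                                     # adopts the orphaned grandchild
os.close(r)
with os.fdopen(w, 'wb') as pipe_in:  # write to cat's stdin, then close it
    pipe_in.write(b'helloworld')
# cat sees EOF, exits, and init (not us) reaps it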

Related

How to kill subprocess after time.sleep()?

I am running some shell scripts with the subprocess module in Python. If a shell script runs too long, I'd like to kill the subprocess. I thought it would be enough to pass timeout=30 to my run(...) call.
Here is the code:
try:
    result = run(['utilities/shell_scripts/{0} {1} {2}'.format(
        self.language_conf[key][1], self.proc_dir, config.main_file)],
        shell=True,
        check=True,
        stdout=PIPE,
        stderr=PIPE,
        universal_newlines=True,
        timeout=30,
        bufsize=100)
except TimeoutExpired as timeout:
I have tested this call with shell scripts that run for 120s. I expected the subprocess to be killed after 30s, but in fact the process finishes the 120s script and only then raises the TimeoutExpired exception. Now the question: how can I kill the subprocess on timeout?
The documentation explicitly states that the process should be killed:
from the docs for subprocess.run:
"The timeout argument is passed to Popen.communicate(). If the timeout expires, the child process will be killed and waited for. The TimeoutExpired exception will be re-raised after the child process has terminated."
But in your case you're using shell=True, and I've seen issues like that before, because the blocking process is a child of the shell process.
I don't think you need shell=True if you decompose your arguments properly and your scripts have the proper shebang. You could try this:
result = run(
    [os.path.join('utilities/shell_scripts', self.language_conf[key][1]),
     self.proc_dir, config.main_file],  # don't compose the argument line yourself
    shell=False,  # no shell wrapper
    check=True,
    stdout=PIPE,
    stderr=PIPE,
    universal_newlines=True,
    timeout=30,
    bufsize=100)
Note that I can reproduce this issue very easily on Windows (using Popen, but it's the same thing):
import subprocess,time
p=subprocess.Popen("notepad",shell=True)
time.sleep(1)
p.kill()
=> notepad stays open, probably because it manages to detach from the parent shell process.
import subprocess,time
p=subprocess.Popen("notepad",shell=False)
time.sleep(1)
p.kill()
=> notepad closes after 1 second
Funnily enough, if you remove time.sleep(), kill() works even with shell=True probably because it successfully kills the shell which is launching notepad.
I'm not saying you have exactly the same issue, I'm just demonstrating that shell=True is evil for many reasons, and not being able to kill/timeout the process is one more reason.
However, if you need shell=True for a reason, you can use psutil to kill all the children in the end. In that case, it's better to use Popen so you get the process id directly:
import subprocess, time, psutil

parent = subprocess.Popen("notepad", shell=True)
for _ in range(30):  # 30 seconds
    if parent.poll() is not None:  # process just ended
        break
    time.sleep(1)
else:
    # the for loop ended without break: timeout
    parent = psutil.Process(parent.pid)
    for child in parent.children(recursive=True):  # or parent.children() for recursive=False
        child.kill()
    parent.kill()
(source: how to kill process and child processes from python?)
That example kills the notepad instance as well.

Kill a chain of sub processes on KeyboardInterrupt

I'm having a strange problem I've encountered as I wrote a script to start my local JBoss instance.
My code looks something like this:
with open("/var/run/jboss/jboss.pid", "wb") as f:
    process = subprocess.Popen(["/opt/jboss/bin/standalone.sh", "-b=0.0.0.0"])
    f.write(str(process.pid))
try:
    process.wait()
except KeyboardInterrupt:
    process.kill()
Should be fairly simple to understand: write the PID to a file while it's running; once I get a KeyboardInterrupt, kill the child process.
The problem is that JBoss keeps running in the background after I send the kill signal, as it seems that the signal doesn't propagate down to the Java process started by standalone.sh.
I like the idea of using Python to write system management scripts, but there are a lot of weird edge cases like this where, had I written it in Bash, everything would have just worked™.
How can I kill the entire subprocess tree when I get a KeyboardInterrupt?
You can do this using the psutil library:
import psutil

#..
proc = psutil.Process(process.pid)
for child in proc.children(recursive=True):
    child.kill()
proc.kill()
As far as I know the subprocess module does not offer any API function to retrieve the children spawned by subprocesses, nor does the os module.
A better way of killing the processes would probably be the following:
proc = psutil.Process(process.pid)
procs = proc.children(recursive=True)
procs.append(proc)
for proc in procs:
    proc.terminate()
gone, alive = psutil.wait_procs(procs, timeout=1)
for p in alive:
    p.kill()
This would give a chance to the processes to terminate correctly and when the timeout ends the remaining processes will be killed.
Note that psutil also provides a Popen class that has the same interface as subprocess.Popen plus all the extra functionality of psutil.Process. You may want to simply use that instead of subprocess.Popen. It is also safer, because psutil checks that PIDs don't get reused when a process terminates, while subprocess doesn't.
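A minimal sketch of that drop-in use, reusing the standalone.sh command from the question:
import psutil

# psutil.Popen accepts the same arguments as subprocess.Popen
process = psutil.Popen(["/opt/jboss/bin/standalone.sh", "-b=0.0.0.0"])
try:
    process.wait()
except KeyboardInterrupt:
    # process is already a psutil.Process, so children() works directly
    for child in process.children(recursive=True):
        child.kill()
    process.kill()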

Change Output Redirection of Running Process

I have a parent Python script that launches a child (which launches grandchildren), and after some time, I terminate the child, but the grandchildren continue to pump to stdout. After I kill the child, I want to suppress/redirect the stdout and stderr of the grandchildren (and all their descendants).
Here is the parent:
import time
import subprocess
proc = subprocess.Popen('./child.sh')
print("Dad: I have begotten a son!")
time.sleep(1)
proc.kill()
proc.wait()
print("Dad: My son hath died!")
time.sleep(2)
print("Dad: Why does my grandson still speak?")
Here is the child script which I cannot modify.
#!/bin/bash
./grandchild.sh &
echo "Child: I had a son!"
for (( i = 0; i < 10; i++ )); do
    echo "Child: Hi Dad, meet your grandson!"
    sleep 1
done
exit 0
Here is a noisy grandchild which I cannot modify.
#!/bin/bash
for (( i = 0; i < 10; i++ )); do
    echo "Grandchild: Wahhh"
    sleep 1
done
exit 0
I tried doing this right before killing the child:
import os
f = open(os.devnull,"w")
proc.stdout = proc.stderr = f
But it doesn't seem to work. The output is:
> ./parent.py
Dad: I have begotten a son!
Child: I had a son!
Child: Hi Dad, meet your grandson!
Grandchild: Wahhh
Dad: My son hath died!
Grandchild: Wahhh
Grandchild: Wahhh
Dad: Why does my grandson still speak?
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
Grandchild: Wahhh
When you invoke subprocess.Popen you can tell it to redirect stdout and/or stderr. If you don't, it leaves them un-redirected by allowing the OS to copy from the Python process's actual STDOUT_FILENO and STDERR_FILENO (which are fixed constants, 1 and 2).
This means that if Python's fd 1 and 2 are going to your tty session (perhaps on an underlying device like /dev/pts/0 for instance), the child—and with this case, consequently, the grandchild as well—are talking directly to the same session (the same /dev/pts/0). Nothing you do in the Python process itself can change this: those are independent processes with independent, direct access to the session.
What you can do is invoke ./child.sh with redirection in place:
proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
Quick side-note edit: if you want to discard all output from the child and its grandchildren, open os.devnull (either as you did, or with os.open() to get a raw integer file descriptor) and connect stdout and stderr to the underlying file descriptor. If you have opened it as a Python stream:
f = open(os.devnull, "w")
then the underlying file descriptor is f.fileno():
proc = subprocess.Popen('./child.sh', stdout=f.fileno(), stderr=f.fileno())
In this case you cannot get any output from any of the processes involved.
Now file descriptor 1 in the child is connected to a pipe-entity, rather than directly to the session. (Since there is no stderr= above, fd 2 in the child is still connected directly to the session.)
The pipe-entity, which lives inside the operating system, simply copies from one end (the "write end" of the pipe) to the other (the "read end"). Your Python process has control of the read-end. You must invoke the OS read system call—often not directly, but see below—on that read end, to collect the output from it.
In general, if you stop reading from your read-end, the pipe "fills up" and any process attempting an OS-level write on the write-end is "blocked" until someone with access to the read end (that's you, again) reads from it.
If you discard the read-end, leaving the pipe with nowhere to dump its output, the write end starts returning EPIPE errors and sending SIGPIPE signals, to any process attempting an OS-level write call. This kind of discard occurs when you call the OS-level close system call, assuming you have not handed the descriptor off to some other process(es). It also occurs when your process exits (under the same assumption, again).
There is no convenient method by which you can connect the read-end to an infinite data sink like /dev/null, at least in most Unix-like systems (there are a few with some special funky system calls to do this kind of "plumbing"). But if you plan to kill the child and are willing to let its grandchildren die from SIGPIPE signals, you can simply close the descriptor (or exit) and let the chips fall where they may.
Children and grandchildren can protect themselves from dying by setting SIGPIPE to SIG_IGN, or by blocking SIGPIPE. Signal masks are inherited across exec system calls so in some cases, you can block SIGPIPE for children (but some children will unblock signals).
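For instance, a sketch of the blocking approach, assuming POSIX and Python 3.3+ (preexec_fn runs in the forked child just before exec, and the blocked-signal mask survives the exec):
import signal
import subprocess

def block_sigpipe():
    # runs between fork() and exec() in the child
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGPIPE})

proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE,
                        preexec_fn=block_sigpipe)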
If closing the descriptor is not suitable, you can create a new process that simply reads and discards incoming pipe data. If you use the fork system call, this is trivial. Alternatively some Unix-like systems allow you to pass file descriptors through AF_UNIX sockets to otherwise-unrelated (parent/child-wise) processes, so you could have a daemon that does this, reachable via an AF_UNIX socket. (This is nontrivial to code.)
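The fork-based discarder might look like this (a sketch for Unix-like systems, with no error handling):
import os
import subprocess

proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
if os.fork() == 0:                  # reader process: inherits the read end
    fd = proc.stdout.fileno()
    while os.read(fd, 65536):       # drain and discard until EOF
        pass
    os._exit(0)
proc.stdout.close()                 # parent drops its copy; writers never block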
If you wish the child process to send its stderr output to the same pipe, so that you can read both its stdout and its stderr, simply add stderr=subprocess.STDOUT to the Popen() call. If you wish the child process to send its stderr output to a separate pipe, add stderr=subprocess.PIPE. If you do the latter, however, things can get a bit tricky.
To prevent children from blocking, as noted above, you must invoke the OS read call. If there is only one pipe this is easy:
for line in proc.stdout:
    ...
for instance, or:
line = proc.stdout.readline()
will read the pipe one line at a time (modulo buffering inside Python). You can read as many or as few lines as you like.
If there are two pipes, though, you must read whichever one(s) is/are "full". Python's subprocess module defines the communicate() function to do this for you:
stdout, stderr = proc.communicate()
The drawback here is that communicate() reads to completion: it needs to get all output that can go to the write end of each pipe. This means it repeatedly calls the OS-level read operation until read indicates end-of-data. That occurs only when all processes that had, at some point, write access to the write end of the corresponding pipe, have closed that end of the pipe. In other words, it waits for the child and any grandchildren to close the descriptors connected to the write end of the pipe(s).
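The usual pattern from the subprocess documentation is communicate() with a timeout, but note how it interacts with the warning above: if a grandchild still holds the write end open, even the final communicate() after kill() can block, which is one more argument for killing the whole process group as shown further below:
import subprocess

proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
try:
    stdout, stderr = proc.communicate(timeout=15)
except subprocess.TimeoutExpired:
    proc.kill()                           # stops the child itself...
    stdout, stderr = proc.communicate()   # ...yet a surviving grandchild can
                                          # still keep this from returning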
In general it's much simpler to use only one pipe, read as much (but only as much) as you like, then simply close the pipe:
proc = subprocess.Popen('./child.sh', stdout=subprocess.PIPE)
line1 = proc.stdout.readline()
line2 = proc.stdout.readline()
# that's all we care about
proc.stdout.close()
proc.kill()
status = proc.wait()
Whether this suffices depends on your particular problem.
If you don't care about the grandchildren; you could kill them all:
#!/usr/bin/env python3
import os
import signal
import subprocess
import time

proc = subprocess.Popen('./child.sh', start_new_session=True)
print("Dad: I have begotten a son!")
time.sleep(1)
print("Dad: kill'em all!")
os.killpg(proc.pid, signal.SIGKILL)
for msg in "dead... silence... crickets... chirping...".split():
    time.sleep(1)
    print(msg, end=' ', flush=True)
You can emulate start_new_session=True on old Python versions using preexec_fn=os.setsid. See Best way to kill all child processes.
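That is, on older Pythons (POSIX only; the docs caution that preexec_fn is not safe in multithreaded programs):
proc = subprocess.Popen('./child.sh', preexec_fn=os.setsid)  # same effect as start_new_session=True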
You can collect children's output before the killing:
#!/usr/bin/env python3
import collections
import os
import signal
import threading
from subprocess import Popen, PIPE, STDOUT

def killall(proc):
    print("Dad: kill'em all!")
    os.killpg(proc.pid, signal.SIGKILL)
    proc.wait()

proc = Popen('./child.sh', stdout=PIPE, stderr=STDOUT, preexec_fn=os.setsid)
print("Dad: I have begotten a son!")
# kill in a second
hitman = threading.Timer(1, killall, [proc])
hitman.start()
# save last 200 lines of output
q = collections.deque(proc.stdout, maxlen=200)
hitman.cancel()
proc.wait()
# print collected output
print('*' * 60)
print(b''.join(q).decode('ascii'), end='')
print('*' * 60)
See Stop reading process output in Python without hang?
Right now, your subprocess is allowed to communicate with your terminal via STDOUT and STDERR. Instead, you can hijack this data from the subprocess like so:
import subprocess
cmd = ['./child.sh']
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
This redirects all STDERR output of your child to the normal STDOUT channel, then redirects the normal STDOUT output of your child to your python script, via a PIPE. You can now read from that PIPE using line = process.stdout.readline(), which grabs a single line of output. You can print that back to STDOUT with print(line).
Once you kill your son (gasp), simply stop echoing the pipe's data, and the grandchildren's output no longer reaches your terminal.
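A rough sketch of that relay, reusing child.sh from the question (the one-second budget is illustrative, and remember from the discussion above that unread grandchildren will eventually block once the pipe fills):
import subprocess
import time

process = subprocess.Popen(['./child.sh'], stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT)
deadline = time.time() + 1
while time.time() < deadline:      # relay output for about a second
    line = process.stdout.readline()
    if not line:                   # EOF: every writer closed the pipe
        break
    print(line.decode(), end='')
process.kill()
process.wait()
# no more reads: the grandchildren's chatter never reaches the terminal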
For more information on subprocess, see one of my previous answers, which is similar to this one: python subprocess.call output is not interleaved

Proper way of re-using and closing a subprocess object

I have the following code in a loop:
while True:
    # Define shell_command
    p1 = Popen(shell_command, shell=shell_type, stdout=PIPE, stderr=PIPE, preexec_fn=os.setsid)
    result = p1.stdout.read()
    # Define condition
    if condition:
        break
where shell_command is something like ls (it just prints stuff).
I have read in different places that I can close/terminate/exit a Popen object in a variety of ways, e.g. :
p1.stdout.close()
p1.stdin.close()
p1.terminate()
p1.kill()
My question is:
What is the proper way of closing a subprocess object once we are done using it?
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands? Would that be more efficient in any way than opening new subprocess objects each time?
Update
I am still a bit confused about the sequence of steps to follow depending on whether I use p1.communicate() or p1.stdout.read() to interact with my process.
From what I understood in the answers and the comments:
If I use p1.communicate() I don't have to worry about releasing resources, since communicate() would wait until the process is finished, grab the output and properly close the subprocess object
If I follow the p1.stdout.read() route (which I think fits my situation, since the shell command is just supposed to print stuff) I should call things in this order:
p1.wait()
p1.stdout.read()
p1.terminate()
Is that right?
What is the proper way of closing a subprocess object once we are done using it?
stdout.close() and stdin.close() will not terminate a process unless it exits itself on end of input or on write errors.
.terminate() and .kill() both do the job, with kill being a bit more "drastic" on POSIX systems, as SIGKILL is sent, which cannot be ignored by the application. Specific differences are explained in this blog post, for example. On Windows, there's no difference.
Also, remember to .wait() and to close the pipes after killing a process to avoid zombies and force the freeing of resources.
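In sketch form, assuming the pipes from the question (the order is the point here):
p1.kill()            # or p1.terminate() for the gentler signal
p1.stdout.close()    # release the pipe resources...
p1.stderr.close()
p1.wait()            # ...and reap the child so no zombie is left behind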
A special case that is often encountered are processes which read from STDIN and write their result to STDOUT, closing themselves when EOF is encountered. With these kinds of programs, it's often sensible to use subprocess.communicate:
>>> p = Popen(["sort"], stdin=PIPE, stdout=PIPE)
>>> p.communicate("4\n3\n1")
('1\n3\n4\n', None)
>>> p.returncode
0
This can also be used for programs which print something and exit right after:
>>> p = Popen(["ls", "/home/niklas/test"], stdin=PIPE, stdout=PIPE)
>>> p.communicate()
('file1\nfile2\n', None)
>>> p.returncode
0
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands? Would that be more efficient in any way than opening new subprocess objects each time?
I don't think the subprocess module supports this and I don't see what resources could be shared here, so I don't think it would give you a significant advantage.
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands?
Yes.
#!/usr/bin/env python
from __future__ import print_function
import uuid
import random
from subprocess import Popen, PIPE, STDOUT

MARKER = str(uuid.uuid4())
shell_command = 'echo a'
p = Popen('sh', stdin=PIPE, stdout=PIPE, stderr=STDOUT,
          universal_newlines=True)  # decode output as utf-8, newline is '\n'
while True:
    # write next command
    print(shell_command, file=p.stdin)
    # insert MARKER into stdout to separate output from different shell_command
    print("echo '%s'" % MARKER, file=p.stdin)
    p.stdin.flush()  # needed where stdin is block-buffered (e.g. Python 3)
    # read command output
    for line in iter(p.stdout.readline, MARKER + '\n'):
        if line.endswith(MARKER + '\n'):
            print(line[:-len(MARKER) - 1])
            break  # command output ended without a newline
        print(line, end='')
    # exit on condition
    if random.random() < 0.1:
        break
# cleanup
p.stdout.close()
if p.stderr:
    p.stderr.close()
p.stdin.close()
p.wait()
Put while True inside try: ... finally: to perform the cleanup in case of exceptions. On Python 3.2+ you could use with Popen(...): instead.
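For a single command, the with form might look like this (Python 3.2+; the context manager closes the pipes and waits for the process on exit):
from subprocess import Popen, PIPE, STDOUT

with Popen('sh', stdin=PIPE, stdout=PIPE, stderr=STDOUT,
           universal_newlines=True) as p:
    print('echo a', file=p.stdin)
    p.stdin.close()                 # EOF: sh runs the command and exits
    print(p.stdout.read(), end='')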
Would that be more efficient in any way than opening new subprocess objects each time?
Does it matter in your case? Don't guess. Measure it.
The "correct" order is:
Create a thread to read stdout (and a second one to read stderr, unless you merged them into one).
Write commands to be executed by the child to stdin. If you're not reading stdout at the same time, writing to stdin can block.
Close stdin (this is the signal for the child that it can now terminate by itself whenever it is done)
When stdout returns EOF, the child has terminated. Note that you need to synchronize the stdout reader thread and your main thread.
Call wait() to see if there was a problem and to clean up the child process.
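A minimal sketch of those five steps, assuming Python 3 and using cat as a stand-in child (the reader helper is illustrative):
import subprocess
import threading

p = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
lines = []

def reader():                      # step 1: read stdout in its own thread
    for line in p.stdout:
        lines.append(line)

t = threading.Thread(target=reader)
t.start()
p.stdin.write(b'hello world\n')    # step 2: write to the child
p.stdin.close()                    # step 3: EOF lets the child finish
t.join()                           # step 4: reader hit EOF, child is done
status = p.wait()                  # step 5: reap and check for problems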
If you need to stop the child process for any reason (maybe the user wants to quit), then you can:
Close stdin if the child terminates when it reads EOF.
Kill the child with terminate(). This is the correct solution for child processes which ignore stdin.
If the child doesn't respond, try kill()
In all three cases, you must call wait() to clean up the dead child process.
It depends on what you expect the process to do; you should always call p1.wait() in order to avoid zombies. Other steps depend on the behaviour of the subprocess: if it produces any output, you should consume it (e.g. p1.stdout.read(), though this could eat lots of memory) and only then call p1.wait(); alternatively you may wait for some timeout and call p1.terminate() to kill the process if you think it isn't working as expected, and possibly call p1.wait() afterwards to clean up the zombie.
Alternatively, p1.communicate(...) would do the I/O handling and waiting for you (but not the killing).
Subprocess objects aren't supposed to be reused.

web.py + subprocess = hang

Here's my main file:
import subprocess, time
pipe = subprocess.PIPE
popen = subprocess.Popen('pythonw -uB test_web_app.py', stdout=pipe)
time.sleep(3)
And here's test_web_app.py:
import web
class Handler:
    def GET(self): pass

app = web.application(['/', 'Handler'], globals())
app.run()
When I run the main file, the program executes, but a zombie process is left hanging and I have to kill it manually. Why is this? How can I get the Popen to die when the program ends? The Popen only hangs if I pipe stdout and sleep for a bit before the program ends.
Edit -- here's the final, working version of the main file:
import subprocess, time, atexit

pipe = subprocess.PIPE
popen = subprocess.Popen('pythonw -uB test_web_app.py', stdout=pipe)

def kill_app():
    popen.kill()
    popen.wait()

atexit.register(kill_app)
time.sleep(3)
You have not waited for the process. Once it's done, you have to call popen.wait().
You can check if the process is terminated using the poll method of the popen object to see if it has completed.
If you don't need the stdout of the web server process, you can simply ignore the stdout option.
You can use the atexit module to implement a hook that gets called when your main file exits. This should use the kill method of the Popen object and then wait on it to make sure that it's terminated.
If your main script doesn't need to be doing anything else while the subprocess executes I'd do:
import subprocess, time
pipe = subprocess.PIPE
popen = subprocess.Popen('pythonw -uB test_web_app.py', stdout=pipe, stderr=pipe)
out, err = popen.communicate()
(I think if you specifically pipe stdout back to your program, you need to read it at some point to avoid creating zombie processes - communicate will read it in a reasonably safe way).
Or if you don't care about parsing stdout / stderr, don't bother piping them:
popen = subprocess.Popen('pythonw -uB test_web_app.py')
popen.communicate()
