I have a script that repeatedly runs an Ant buildfile and scrapes output into a parsable format. When I create the subprocess using Popen, there is a small time window where hitting Ctrl+C will kill the script, but will not kill the subprocess running Ant, leaving a zombie that is printing output to the console that can only be killed using Task Manager. Once Ant has started printing output, hitting Ctrl+C will always kill my script as well as Ant. Is there a way to make it so that hitting Ctrl+C will always kill the subprocess running Ant without leaving a zombie behind?
Also of note: I have a handler for SIGINT that performs a few cleanup operations before calling exit(0). If I manually kill the subprocess in the handler using os.kill(p.pid, signal.SIGTERM) (not SIGINT), then I can successfully kill the subprocess in situations where it would normally zombify. However, when you hit Ctrl+C once Ant has started producing output, you get a stacktrace from subprocess where it is unable to kill the subprocess itself as I have already killed it.
EDIT: My code looked something like:
import os
import signal
from subprocess import Popen
p = Popen('ls')
def handle_sig_int(signum, stack_frame):
    # perform cleanup
    os.kill(p.pid, signal.SIGTERM)
    exit(0)
signal.signal(signal.SIGINT, handle_sig_int)
p.wait()
Which would produce the following stacktrace when triggered incorrectly:
  File "****.py", line ***, in run_test
    p.wait()
  File "/usr/lib/python2.5/subprocess.py", line 1122, in wait
    pid, sts = os.waitpid(self.pid, 0)
  File "****.py", line ***, in handle_sig_int
    os.kill(p.pid, signal.SIGTERM)
I fixed it by catching the OSError raised by p.wait and exiting:
try:
    p.wait()
except OSError:
    exit('The operation was interrupted by the user')
This seems to work in the vast majority of my test runs. I occasionally get a uname: write error: Broken pipe, though I don't know what causes it. It seems to happen if I time the Ctrl+C just right before the child process can start displaying output.
Call p.terminate() in your SIGINT handler:
if p.poll() is None:  # Child still around?
    p.terminate()  # kill it
[EDIT] Since you're stuck with Python 2.5, use os.kill(p.pid, signal.SIGTERM) instead of p.terminate(). The check should make sure you don't get an exception (or reduce the number of times you get one).
To make it even better, you can catch the exception and check the message. If it means "child process not found", then ignore the exception. Otherwise, rethrow it with raise (no arguments).
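The check-then-catch idea above can be sketched as follows. This is a minimal illustration, not the original poster's code: the helper name kill_if_running is made up for this example, and it assumes "child process not found" corresponds to errno.ESRCH, which is what os.kill reports on POSIX systems. It is written for current Python; on Python 2.5 the except clause would need the older "except OSError, exc" spelling.

```python
import errno
import os
import signal
import subprocess
import sys


def kill_if_running(p, sig=signal.SIGTERM):
    """Send sig to child p (hypothetical helper).

    Returns True if a signal was sent, False if the child was already gone.
    """
    if p.poll() is not None:  # child has already exited and been reaped
        return False
    try:
        os.kill(p.pid, sig)
    except OSError as exc:
        if exc.errno == errno.ESRCH:  # "No such process": child vanished meanwhile
            return False
        raise  # any other error is unexpected, so re-raise it unchanged
    return True
```

Calling this from the SIGINT handler instead of a bare os.kill should avoid the traceback when the child has already died, while still surfacing errors that are not about a missing process.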
I am using the following code to launch a subprocess:
# Run the program
subprocess_result = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    check=False,
    timeout=timeout,
    cwd=directory,
    env=env,
    preexec_fn=set_memory_limits,
)
The launched subprocess is also a Python program, with a shebang.
This subprocess may last for longer than the specified timeout.
The subprocess does heavy computations, writes its results to a file, and does not contain any signal handler.
According to the documentation https://docs.python.org/3/library/subprocess.html#subprocess.run, subprocess.run kills a child that times out:
The timeout argument is passed to Popen.communicate(). If the timeout
expires, the child process will be killed and waited for. The
TimeoutExpired exception will be re-raised after the child process has
terminated.
When my subprocess times out, I always receive the subprocess.TimeoutExpired exception, but from time to time the subprocess is not killed and keeps consuming resources on my machine.
So my question is: am I doing something wrong here? If yes, what? If no, why do I have this issue and how can I solve it?
Note: I am using Python 3.10 on Ubuntu 22.04.
The most likely culprit for the behaviour you see is that the subprocess you are spawning uses multiprocessing and spawns its own child processes. Killing the parent process does not automatically kill the whole set of descendants. The grandchildren are inherited by the init process (i.e. the process with PID 1) and will continue to run.
You can verify this from the source code of subprocess.run:
with Popen(*popenargs, **kwargs) as process:
    try:
        stdout, stderr = process.communicate(input, timeout=timeout)
    except TimeoutExpired as exc:
        process.kill()
        if _mswindows:
            # Windows accumulates the output in a single blocking
            # read() call run on child threads, with the timeout
            # being done in a join() on those threads. communicate()
            # _after_ kill() is required to collect that and add it
            # to the exception.
            exc.stdout, exc.stderr = process.communicate()
        else:
            # POSIX _communicate already populated the output so
            # far into the TimeoutExpired exception.
            process.wait()
        raise
    except:  # Including KeyboardInterrupt, communicate handled that.
        process.kill()
        # We don't call process.wait() as .__exit__ does that for us.
        raise
Here you can see that the timeout is set on the communicate call (line 550 of subprocess.py in the version quoted); if it fires, the subprocess is .kill()ed (line 552). On POSIX, the kill method sends SIGKILL, which immediately kills the subprocess without any cleanup. It's a signal that cannot be caught by the subprocess, so it's not possible that the child is somehow ignoring it.
The TimeoutExpired exception is then re-raised (line 564), so if your parent process sees this exception, the subprocess is already dead.
This, however, says nothing about grandchild processes. Those will continue to run as children of PID 1.
I don't see any way in which you can customize how subprocess.run handles subprocess termination. For example, if it used SIGTERM instead of SIGKILL you could modify your child process or write a wrapper process that will catch the signal and properly kill all its descendants. But SIGKILL doesn't give you this luxury.
So I believe that for your use case you cannot use the subprocess.run facade but you should use Popen directly. You can look at the subprocess.run implementation and take just the things that you need, maybe dropping support for platforms you don't use.
Note: There are extremely rare situations in which the subprocesses won't die immediately on SIGKILL. I believe the only situation in which this happens is if the subprocess is performing a very long system call or other kernel operation, which might not be interrupted immediately. If the operation is in deadlock this might prevent the process from terminating forever. However I don't think that this is your case, since you did not mention that the process is stuck doing nothing, but from what you said the process simply seems to continue running.
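One way to use Popen directly, as suggested above, is to put the child in its own session and signal the whole process group on timeout. The following is a minimal POSIX-only sketch under stated assumptions: the function name run_killing_group is hypothetical, and it relies on start_new_session=True making the child a group leader whose process group ID equals its PID. It will not reach descendants that move themselves into a different session or group.

```python
import os
import signal
import subprocess
import sys


def run_killing_group(cmd, timeout, **popen_kwargs):
    """Run cmd; on timeout, SIGKILL its whole process group (hypothetical helper)."""
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,  # child calls setsid(): new session, new group
        **popen_kwargs,
    ) as process:
        try:
            stdout, stderr = process.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            try:
                # The child leads its group, so its PID is also the group ID.
                os.killpg(process.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group is already gone
            process.wait()  # reap the direct child
            raise
    return subprocess.CompletedProcess(cmd, process.returncode, stdout, stderr)
```

Extra keyword arguments such as cwd, env, or preexec_fn are passed through to Popen unchanged, so a call like run_killing_group(cmd, timeout, cwd=directory, env=env) should be a close stand-in for the original subprocess.run invocation.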
I have a compiled program I launch using python sh as a background process. I want to run it for 20 seconds, then kill it. I always get an exception I can't catch. The code looks like
cmd = sh.Command('./rtlogger')
try:
    p = cmd('config.txt', _bg=True, _out='/dev/null', _err='/dev/null', _timeout=20)
    p.wait()
except sh.TimeoutException:
    print('caught timeout')
I have also tried to use p.kill() and p.terminate() after catching the timeout exception. I see a stack trace that ends in SignalException_SIGKILL. I can't seem to catch that. The stack trace references none of my code. Also, the text comes to the screen even though I'm routing stdout and stderr to /dev/null.
The program seems to run OK. The logger collects the data, but I want to eliminate or catch the exception. Any advice appreciated.
_timeout for the original invocation only applies when the command is run synchronously, in the foreground. When you run a command asynchronously, in the background, with _bg=True, you need to pass timeout to the wait call instead, e.g.:
cmd = sh.Command('./rtlogger')
try:
    p = cmd('config.txt', _bg=True, _out='/dev/null', _err='/dev/null')
    p.wait(timeout=20)
except sh.TimeoutException:
    print('caught timeout')
Of course, in this case, you're not taking advantage of it being in the background (no work is done between launch and wait), so you may as well run it in the foreground and leave the _timeout on the invocation:
cmd = sh.Command('./rtlogger')
try:
    p = cmd('config.txt', _out='/dev/null', _err='/dev/null', _timeout=20)
except sh.TimeoutException:
    print('caught timeout')
You don't need to explicitly kill or terminate the child process; the _timeout_signal argument is used to signal the child on timeout (defaulting to signal.SIGKILL). You can change it to another signal if SIGKILL is not what you desire, but you don't need to call kill/terminate yourself either way; the act of timing out sends the signal for you.
I wrote a simple Python script, ./vader-shell, which uses subprocess.Popen to launch a spark-shell. I have to deal with KeyboardInterrupt, since otherwise the child process would not die:
command = ['/opt/spark/current23/bin/spark-shell']
command.extend(params)
p = subprocess.Popen(command)
try:
    p.communicate()
except KeyboardInterrupt:
    p.terminate()
This is what I see with ps f
When I actually interrupt with Ctrl+C, I see the processes dying (most of the time). However, the terminal starts acting weird: I don't see any cursor, and all the lines start to appear randomly.
I am really lost as to the best way to run a subprocess with this library and how to handle killing the child processes. What I want to achieve is basic: whenever my Python process is killed with Ctrl+C, I want the whole family of processes to be killed. I googled several solutions (os.kill, p.wait() after termination, calling subprocess.Popen(['reset']) after termination), but none of them worked.
Do you know the best way to kill the child processes when KeyboardInterrupt happens? Or do you know a more reliable library for spinning up processes?
There is nothing blatantly wrong with your code. The problem is that the command you are launching changes settings of the current terminal and does not correctly restore them when shutting down. Replacing your command with a "sleep" like below will run just fine and stop on Ctrl+C without problems:
import subprocess
command = ['/bin/bash']
command.extend(['-c', 'sleep 600'])
p = subprocess.Popen(command)
try:
    p.communicate()
except KeyboardInterrupt:
    p.terminate()
I don't know what you're trying to do with spark-shell, but if you don't need its output you could try redirecting it to /dev/null so that it doesn't mess up the terminal display:
p = subprocess.Popen(command, stdout=subprocess.DEVNULL)
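If the output is needed, another option to consider is saving the terminal settings before launching the child and restoring them afterwards. This is only a sketch of that idea, not something tested against spark-shell: the function name run_and_restore_terminal is hypothetical, it is POSIX-only (it uses the standard termios module), and it assumes the garbled display comes from terminal attributes the child left changed.

```python
import os
import subprocess
import sys
import termios


def run_and_restore_terminal(command):
    """Run command, then put the terminal back as it was (hypothetical helper)."""
    fd = 0  # standard input; only meaningful when it is a terminal
    saved = termios.tcgetattr(fd) if os.isatty(fd) else None
    p = subprocess.Popen(command)
    try:
        try:
            p.wait()
        except KeyboardInterrupt:
            p.terminate()
            p.wait()  # let the child finish before touching the terminal
    finally:
        if saved is not None:
            # Restore the attributes captured before the child started.
            termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    return p.returncode
```

Waiting for the child before restoring matters here: if the settings were restored while the child was still shutting down, the child could change them again afterwards.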
I'm launching a subprocess with Popen, and I expect it to finish and exit. However, not only does the process not exit, but sending SIGKILL to it still leaves it alive! Below is a script that demonstrates this:
from subprocess import Popen
import os
import time
import signal
command = ["python","--version"]
process = Popen(command)
pid = process.pid
time.sleep(5) #ample time to finish
print pid
print "Sending sigkill"
os.kill(pid,signal.SIGKILL)
try:
    #Kill with signal 0 just checks whether process exists
    os.kill(pid,0)
    print "Process still alive immediately after (not so bad...)!"
except Exception as e:
    print "Succeeded in terminating child quickly!"
time.sleep(20) #Give it ample time to die
#Kill with signal 0 just checks whether process exists
try:
    os.kill(pid,0)
    print "Process still alive! That's bad!"
except Exception as e:
    print "Succeeded in terminating child!"
For me, this prints:
77881
Python 2.7.10
Sending sigkill
Process still alive immediately after (not so bad...)!
Process still alive! That's bad!
Not only can this script verify that the child is still alive after it should have finished, but I can use ps on the process ID that's printed and see that it still exists. Oddly, ps lists the process name as (Python) (note the parentheses).
You need to either call process.wait(), or use signal.signal(signal.SIGCHLD, signal.SIG_IGN) once to indicate that you don't intend to wait for any children. The former is portable; the latter only works on Unix (but is POSIX-standard). If you do neither of these things, on Unix the process will hang around as a zombie, and Windows has a similar behavior if you keep the process handle open.
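The effect of wait() can be sketched with a small Python 3 experiment on a Unix system. The names pid_exists and demo are made up for this illustration. The sketch assumes SIGCHLD has its default disposition, so the finished child stays in the process table as a zombie until it is reaped, and it assumes the PID is not immediately reused by an unrelated process.

```python
import os
import subprocess
import sys
import time


def pid_exists(pid):
    """Return True if signal 0 can be delivered to pid, zombies included."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def demo():
    """Check whether a finished child is still visible before and after wait()."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    time.sleep(1)  # give the child time to finish; it is not reaped yet
    before = pid_exists(child.pid)
    child.wait()  # reaps the child and frees its process table entry
    after = pid_exists(child.pid)
    return before, after
```

Under those assumptions, the first check should still find the PID even though the child has finished, which matches what the question's script observed, and the second check should no longer find it once wait() has returned.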
I am executing a bash script in Python using the tempfile and subprocess like so:
with tempfile.NamedTemporaryFile() as scriptfile:
    scriptfile.write(teststr)
    scriptfile.flush()
    subprocess.call(['/bin/bash', scriptfile.name])
Here, teststr has the entire bash script within it.
My problem is that once the script starts to execute, it doesn't respond to keyboard interrupts like Ctrl+C and Ctrl+Z.
Is there any way to interrupt the execution of the script once it has begun?
I assume the problem is that the Python parent process receives SIGINT from Ctrl+C and quits with an unhandled exception, but the child ignores the signal and keeps running. That is the only scenario I was able to reproduce; the actual problem may differ. Catching the exception and killing the subprocess explicitly with SIGKILL may work.
Instead of subprocess.call:
proc = subprocess.Popen(['/bin/bash', scriptfile.name])
try:
    proc.wait()
except:
    proc.kill()
    raise