I am running some shell scripts with the subprocess module in Python. If a shell script runs too long, I would like to kill the subprocess. I thought it would be enough to pass timeout=30 to my run(..) statement.
Here is the code:
try:
    result = run(['utilities/shell_scripts/{0} {1} {2}'.format(
                     self.language_conf[key][1], self.proc_dir, config.main_file)],
                 shell=True,
                 check=True,
                 stdout=PIPE,
                 stderr=PIPE,
                 universal_newlines=True,
                 timeout=30,
                 bufsize=100)
except TimeoutExpired as timeout:
I have tested this call with a shell script that runs for 120s. I expected the subprocess to be killed after 30s, but in fact the process finishes the 120s script and then raises the TimeoutExpired exception. Now the question: how can I kill the subprocess on timeout?
The documentation explicitly states that the process should be killed:
from the docs for subprocess.run:
"The timeout argument is passed to Popen.communicate(). If the timeout expires, the child process will be killed and waited for. The TimeoutExpired exception will be re-raised after the child process has terminated."
But in your case you're using shell=True, and I've seen issues like that before, because the blocking process is a child of the shell process.
I don't think you need shell=True if you decompose your arguments properly and your scripts have the proper shebang. You could try this:
result = run(
    [os.path.join('utilities/shell_scripts', self.language_conf[key][1]),
     self.proc_dir, config.main_file],  # don't compose argument line yourself
    shell=False,  # no shell wrapper
    check=True,
    stdout=PIPE,
    stderr=PIPE,
    universal_newlines=True,
    timeout=30,
    bufsize=100)
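If a command already exists as a single string, the standard library's shlex.split can decompose it into an argument list for shell=False. This is only a sketch: the script name build.sh and the paths below are made up for illustration and do not come from the question.

```python
import shlex

# Hypothetical command line; build.sh and the paths are placeholders.
command_line = "utilities/shell_scripts/build.sh '/tmp/my project' main.c"

# shlex.split applies POSIX shell quoting rules, so the quoted path
# stays a single argument instead of being split at the space.
argv = shlex.split(command_line)
print(argv)
```

The resulting list can be passed to run() as its first argument with shell=False. Shell features such as pipes, redirection, or globbing are not interpreted this way, which is the point of avoiding the shell wrapper.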
Note that I can reproduce this issue very easily on Windows (using Popen, but it's the same thing):
import subprocess,time
p=subprocess.Popen("notepad",shell=True)
time.sleep(1)
p.kill()
=> notepad stays open, probably because it manages to detach from the parent shell process.
import subprocess,time
p=subprocess.Popen("notepad",shell=False)
time.sleep(1)
p.kill()
=> notepad closes after 1 second
Funnily enough, if you remove time.sleep(), kill() works even with shell=True probably because it successfully kills the shell which is launching notepad.
I'm not saying you have exactly the same issue, I'm just demonstrating that shell=True is evil for many reasons, and not being able to kill/timeout the process is one more reason.
However, if you need shell=True for a reason, you can use psutil to kill all the children in the end. In that case, it's better to use Popen so you get the process id directly:
import subprocess, time, psutil
parent = subprocess.Popen("notepad", shell=True)
for _ in range(30):  # 30 seconds
    if parent.poll() is not None:  # process just ended
        break
    time.sleep(1)
else:
    # the for loop ended without break: timeout
    parent = psutil.Process(parent.pid)
    for child in parent.children(recursive=True):  # or parent.children() for recursive=False
        child.kill()
    parent.kill()
(source: how to kill process and child processes from python?)
that example kills the notepad instance as well.
Related
I am using the following code to launch a subprocess :
# Run the program
subprocess_result = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    check=False,
    timeout=timeout,
    cwd=directory,
    env=env,
    preexec_fn=set_memory_limits,
)
The launched subprocess is also a Python program, with a shebang.
This subprocess may last for longer than the specified timeout.
The subprocess does heavy computations, writes its results to a file, and does not contain any signal handler.
According to the documentation https://docs.python.org/3/library/subprocess.html#subprocess.run, subprocess.run kills a child that times out:
The timeout argument is passed to Popen.communicate(). If the timeout
expires, the child process will be killed and waited for. The
TimeoutExpired exception will be re-raised after the child process has
terminated.
When my subprocess times out, I always receive the subprocess.TimeoutExpired exception, but from time to time the subprocess is not killed and keeps consuming resources on my machine.
So my question is: am I doing something wrong here? If yes, what? If no, why do I have this issue and how can I solve it?
Note: I am using Python 3.10 on Ubuntu 22.04
The most likely culprit for the behaviour you see is that the subprocess you are spawning is using multiprocessing and spawning its own child processes. Killing the parent process does not automatically kill the whole set of descendants. The grandchildren are inherited by the init process (i.e. the process with PID 1) and will continue to run.
You can verify this from the source code of subprocess.run:
with Popen(*popenargs, **kwargs) as process:
    try:
        stdout, stderr = process.communicate(input, timeout=timeout)
    except TimeoutExpired as exc:
        process.kill()
        if _mswindows:
            # Windows accumulates the output in a single blocking
            # read() call run on child threads, with the timeout
            # being done in a join() on those threads. communicate()
            # _after_ kill() is required to collect that and add it
            # to the exception.
            exc.stdout, exc.stderr = process.communicate()
        else:
            # POSIX _communicate already populated the output so
            # far into the TimeoutExpired exception.
            process.wait()
        raise
    except:  # Including KeyboardInterrupt, communicate handled that.
        process.kill()
        # We don't call process.wait() as .__exit__ does that for us.
        raise
Here you can see that the timeout is set on the communicate call; if it fires, the except TimeoutExpired branch calls .kill() on the subprocess. On POSIX, the kill method sends SIGKILL, which immediately kills the subprocess without any cleanup. It's a signal that cannot be caught by the subprocess, so it's not possible that the child is somehow ignoring it.
The TimeoutExpired exception is then re-raised by the bare raise at the end of that branch, so if your parent process sees this exception the subprocess is already dead.
This, however, says nothing about grandchild processes. Those will continue to run as children of PID 1.
I don't see any way in which you can customize how subprocess.run handles subprocess termination. For example, if it used SIGTERM instead of SIGKILL you could modify your child process or write a wrapper process that will catch the signal and properly kill all its descendants. But SIGKILL doesn't give you this luxury.
So I believe that for your use case you cannot use the subprocess.run facade but you should use Popen directly. You can look at the subprocess.run implementation and take just the things that you need, maybe dropping support for platforms you don't use.
Note: There are extremely rare situations in which the subprocesses won't die immediately on SIGKILL. I believe the only situation in which this happens is if the subprocess is performing a very long system call or other kernel operation, which might not be interrupted immediately. If the operation is in deadlock this might prevent the process from terminating forever. However I don't think that this is your case, since you did not mention that the process is stuck doing nothing, but from what you said the process simply seems to continue running.
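One way to follow the "use Popen directly" suggestion on POSIX is to start the child in its own session and signal the whole process group on timeout, so that grandchildren receive the signal too. The sketch below is an assumption about how this could look, not the asker's code: run_in_group and grace are hypothetical names, and a descendant that starts its own session or process group would still escape.

```python
import os
import signal
import subprocess

def run_in_group(cmd, timeout, grace=2.0, **popen_kwargs):
    """Run cmd in a new session; on timeout, signal its whole process group.

    POSIX only. Returns (returncode, stdout, stderr) or re-raises TimeoutExpired.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process group id equals proc.pid.
    proc = subprocess.Popen(cmd, start_new_session=True, **popen_kwargs)
    try:
        out, err = proc.communicate(timeout=timeout)
        return proc.returncode, out, err
    except subprocess.TimeoutExpired:
        # Ask the whole group to stop, then give it a short grace period.
        os.killpg(proc.pid, signal.SIGTERM)
        try:
            proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            pass
        # Force-kill whatever is left; the group may already be gone.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        proc.communicate()  # reap the child
        raise

# Usage: a shell that starts a background grandchild and waits for it.
try:
    run_in_group(["sh", "-c", "sleep 30 & wait"], timeout=1)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True
```

Other Popen arguments from the question (cwd, env, preexec_fn, the pipes) can be passed through popen_kwargs. Sending SIGTERM first gives well-behaved children a chance to clean up before SIGKILL is used.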
An issue I have with Python's (3.4) subprocess.Popen:
Very rarely (once in several thousand calls), a call to Popen seems to create another forked process in addition to the intended one. That extra process hangs (possibly waiting?), and the intended process becomes a zombie.
Here's the call sequence:
with subprocess.Popen(['prog', 'arg1', 'arg2'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as p:
    std_out, std_err = p.communicate()
    p.wait()
Note: the above call sequence is itself run from a forked process (a form of process pooling, see the process list below).
The issue happens with multiple programs (7z, for example), so I assume the problem is with the caller and not the callee.
prog is zombified, so I assume the p.wait() statement is never reached or not executed properly.
The resulting process list (ps -ef output):
my_user 18219 18212 9 16:16 pts/1 00:18:11 python3 script.py # original process
my_user 1045 18219 0 16:18 pts/1 00:00:14 python3 script.py # Intentionally forked from original (poor man's process pool) - Seems to be stuck or waiting
my_user 2834 1045 0 16:18 pts/1 00:00:00 [prog] <defunct> # Program run by subprocess.popen - Zombie
my_user 2841 1045 0 16:18 pts/1 00:00:00 python3 script.py # !!!! Should not be here, also stuck or waiting, never finishes
Edited (added code sample as requested):
The code in question:
import logging
import os
import subprocess
pid = os.fork()
if pid == 0:
    # child
    file_name = 'test.zip'
    out_dir = '/tmp'
    while True:
        with subprocess.Popen(['7z', 'x', '-y', '-p', '-o' + out_dir, file_name], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as p:
            try:
                std_out, std_err = p.communicate(timeout=600)
            except subprocess.TimeoutExpired:
                p.kill()
                std_out, std_err = p.communicate()
                logging.critical('7z failed, a timeout has occurred during waiting')
            except:
                p.kill()
                p.wait()
                raise
            return_code = p.poll()
        # do something
else:
    # parent
    wpid, status = os.waitpid(pid, 0)
    exit_code = status >> 8
I believe this is an effect of mixing forking and threading, which is a bad thing to do in Linux. Here are a couple references:
Is it safe to fork from within a thread?
https://rachelbythebay.com/w/2011/06/07/forked/
I believe your process is multithreaded once you import the logging module. (In my case, I was sometimes seeing my program hang while waiting on a logging futex and sometimes hang while waiting inside subprocess with the subprocess having become a zombie.) That module uses OS locks to ensure that it can be called in a thread-safe manner. Once you fork, that lock's state is inherited by the child process. So the child (which is single threaded but inherited the memory of the parent) can't acquire the logging lock because the lock was sometimes locked when the fork happened.
(I'm not super confident in my explanation. My problem went away when I switched from using multiprocessing's default fork behavior to using spawn behavior. In the latter, a child does not inherit its parent's memory, and subprocess and logging no longer caused hangs for me.)
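For reference, switching multiprocessing from fork to spawn can be done per pool through a context object. This is a minimal sketch of that switch, not the answerer's actual code: square and run_pool are hypothetical names chosen for illustration.

```python
import multiprocessing as mp

def square(x):
    """Toy work function; it must be importable by the spawned workers."""
    return x * x

def run_pool(values):
    """Map square over values using workers started with the spawn method."""
    # A spawn context starts each worker as a fresh interpreter, so workers
    # do not inherit the parent's memory (including any held locks).
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)
```

With spawn, the call to run_pool([1, 2, 3]) has to sit under an `if __name__ == "__main__":` guard in the main script, because each worker re-imports that module on startup.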
subprocess indeed forks before running the command. This is mentioned in PEP 324 (ctrl-f for “fork”).
The reason is that the command is run using exec, which replaces the calling process by the executed one.
As you can see, it shares the same pid as the executed script, so it actually is the same process, but it is not the python interpreter that is being run.
So, as long as the child process does not return, the caller python process can't.
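The fork-then-exec sequence described above can be written out by hand with the os module. This is a simplified illustration for POSIX systems, not what subprocess does internally in full; fork_and_exec is a hypothetical helper name.

```python
import os
import sys

def fork_and_exec(argv):
    """Fork, replace the child with argv via exec, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: exec replaces this process image and keeps the same PID.
        # It only comes back (by raising) if the program cannot be started.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)
    # Parent: block until the child finishes, then decode the wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)

exit_code = fork_and_exec([sys.executable, "-c", "import sys; sys.exit(3)"])
```

Decoding the status with os.WIFEXITED and os.WEXITSTATUS also distinguishes a normal exit from death by signal, which a plain `status >> 8` does not.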
I'm really new to Python and I got a little problem with the subprocess class.
I'm starting an external Program with :
thread1.event.clear()
thread2.event.clear()
print("Sending motors STOP")
print("Requesting IMAGE capture")
proc = Popen(['gphoto2 --capture-image &'], shell=True, stdin=None, stdout=None, stderr=None, close_fds=True)
sleep(args.max+2)
thread1.event.set()
thread2.event.set()
sleep(args.tp-2-args.max)
My problem is that in the shell where I started the Python script, I still get the output of gphoto2, and I think Python is still waiting for gphoto2 to finish.
Any ideas?
The documentation for subprocess.Popen states:
stdin, stdout and stderr specify the executed programs' standard
input, standard output and standard error file handles, respectively.
[...]
With None, no redirection will occur; the child's file handles will be
inherited from the parent.
So you might want to try something along the lines of the following, which, by the way, blocks until completion. So you might not need the sleep() (here the wait() from subprocess.Popen might be what you want?).
import subprocess
ret_code = subprocess.call(["echo", "Hello World!"], stdout=subprocess.PIPE)
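If the goal is to keep the child's output off the terminal without blocking, one option on Python 3.3 and later is to point its streams at subprocess.DEVNULL and call wait() only when the result is needed. The sketch below makes assumptions: start_quietly is a hypothetical helper, and a small Python one-liner stands in for the gphoto2 command.

```python
import subprocess
import sys

def start_quietly(argv):
    """Start argv without waiting for it and discard its stdout and stderr."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

# Stand-in for the real command; nothing it prints reaches the terminal.
proc = start_quietly([sys.executable, "-c", "print('this output is discarded')"])
# ... other work can happen here while the child runs ...
exit_code = proc.wait()  # block only at the point where the result matters
```

Unlike stdout=PIPE, DEVNULL cannot fill up when nobody reads it, and because Popen returns immediately there is no need for a trailing & or shell=True.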
I'm having a strange problem I've encountered as I wrote a script to start my local JBoss instance.
My code looks something like this:
with open("/var/run/jboss/jboss.pid", "w") as f:  # text mode, since a str is written
    process = subprocess.Popen(["/opt/jboss/bin/standalone.sh", "-b=0.0.0.0"])
    f.write(str(process.pid))
    try:
        process.wait()
    except KeyboardInterrupt:
        process.kill()
It should be fairly simple to understand: write the PID to a file while it's running and, once I get a KeyboardInterrupt, kill the child process.
The problem is that JBoss keeps running in the background after I send the kill signal, as it seems that the signal doesn't propagate down to the Java process started by standalone.sh.
I like the idea of using Python to write system management scripts, but there are a lot of weird edge cases like this where if I would have written it in Bash, everything would have just worked™.
How can I kill the entire subprocess tree when I get a KeyboardInterrupt?
You can do this using the psutil library:
import psutil
#..
proc = psutil.Process(process.pid)
for child in proc.children(recursive=True):
    child.kill()
proc.kill()
As far as I know the subprocess module does not offer any API function to retrieve the children spawned by subprocesses, nor does the os module.
A better way of killing the processes would probably be the following:
proc = psutil.Process(process.pid)
procs = proc.children(recursive=True)
procs.append(proc)
for proc in procs:
    proc.terminate()
gone, alive = psutil.wait_procs(procs, timeout=1)
for p in alive:
    p.kill()
This would give a chance to the processes to terminate correctly and when the timeout ends the remaining processes will be killed.
Note that psutil also provides a Popen class that has the same interface of subprocess.Popen plus all the extra functionality of psutil.Process. You may want to simply use that instead of subprocess.Popen. It is also safer because psutil checks that PIDs don't get reused if a process terminates, while subprocess doesn't.
I'm launching a program with subprocess on Python.
In some cases the program may freeze. This is out of my control. The only thing I can do from the command line it is launched from is Ctrl+Esc, which kills the program quickly.
Is there any way to emulate this with subprocess? I am using subprocess.Popen(cmd, shell=True) to launch the program.
Well, there are a couple of methods on the object returned by subprocess.Popen() which may be of use: Popen.terminate() and Popen.kill(), which send a SIGTERM and SIGKILL respectively.
For example...
import subprocess
import time
process = subprocess.Popen(cmd, shell=True)
time.sleep(5)
process.terminate()
...would terminate the process after five seconds.
Or you can use os.kill() to send other signals, like SIGINT to simulate CTRL-C, with...
import subprocess
import time
import os
import signal
process = subprocess.Popen(cmd, shell=True)
time.sleep(5)
os.kill(process.pid, signal.SIGINT)
p = subprocess.Popen("echo 'foo' && sleep 60 && echo 'bar'", shell=True)
p.kill()
Check out the docs on the subprocess module for more info: http://docs.python.org/2/library/subprocess.html
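The two methods can also be combined: ask the process to stop with terminate(), wait a little, and fall back to kill() only if it is still running. This is a sketch under assumptions: stop_process and grace are hypothetical names, and a sleeping Python child stands in for the frozen program.

```python
import subprocess
import sys

def stop_process(proc, grace=5.0):
    """Terminate proc, escalating to kill() if it ignores the request."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, cannot be caught or ignored
        proc.wait()  # reap the process so it does not linger as a zombie
    return proc.returncode

# Stand-in for a hung program: a child that would sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
return_code = stop_process(child, grace=5.0)
```

Note that with shell=True these calls act on the shell, not necessarily on the program the shell started, so the child is launched here from an argument list instead.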
You can use two signals to kill a running subprocess call i.e., signal.SIGTERM and signal.SIGKILL; for example
import subprocess
import os
import signal
import time
..
process = subprocess.Popen(..)
..
# killing all processes in the group
# (os.killpg needs a process group id, so this assumes the child leads
# its own group, e.g. it was started with start_new_session=True)
os.killpg(process.pid, signal.SIGTERM)
time.sleep(2)
if process.poll() is None:  # Force kill if process is still alive
    time.sleep(3)
    os.killpg(process.pid, signal.SIGKILL)
Your question is not too clear, but I assume that you are about to launch a process which turns into a zombie and that you want to be able to control it at some point in your script. If this is the case, I propose the following:
p = subprocess.Popen([cmd_list], shell=False)
Passing the command through the shell is not really recommended.
I would suggest you use shell=False; this way you run less risk of an overflow.
# Get the process id & try to terminate it gracefully
pid = p.pid
p.terminate()
# Check if the process has really terminated & force kill if not.
try:
    os.kill(pid, 0)  # signal 0 only checks that the pid still exists
    p.kill()
    print("Forced kill")
except OSError:
    print("Terminated gracefully")
The following command worked for me:
os.system("pkill -TERM -P %s"%process.pid)
Try wrapping your subprocess.Popen call in a try except block. Depending on why your process is hanging, you may be able to cleanly exit. Here is a list of exceptions you can check for: Python 3 - Exceptions Handling