I have a multiprocessing.Process subclass that ignores SIGINT:
# inside the run method
signal.signal(signal.SIGINT, signal.SIG_IGN)
I don't want this process to terminate when pressing CTRL + C, so I'm trying to simulate this terminal event in my unittests by sending a SIGINT signal to this process ID:
os.kill(PID, signal.SIGINT)
But even without ignoring this signal, the process does not terminate, so this test is useless. I found out from other questions that on a CTRL + C event the terminal sends SIGINT to the process group ID, but I can't do that in my case because it would also terminate the unittest process.
So why doesn't the process terminate when it receives a SIGINT from os.kill? And should I be doing this another way?
The child process should terminate on receipt of SIGINT, unless it is ignoring that signal or has its own handler installed. If you are not explicitly ignoring SIGINT in the child, then it is possible that SIGINT is being ignored in the parent, and therefore in the child, because the signal disposition is inherited.
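If you want to check which disposition the child actually inherited, a quick sketch (not part of the original code) is to print the current handler at the top of run(); signal.SIG_IGN there means the parent was already ignoring SIGINT when the child was started:

import signal
from multiprocessing import Process

class P(Process):
    def run(self):
        # SIG_IGN here means the "ignore" disposition was inherited from the parent
        print(signal.getsignal(signal.SIGINT))

if __name__ == '__main__':
    p = P()
    p.start()
    p.join()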
However, I have not been able to replicate your issue; in fact, I find the opposite problem: the child process terminates regardless of its signal disposition.
If the signal is sent too soon, before the child process has ignored SIGINT (in its run() method), it will be terminated. Here is some code that demonstrates the problem:
import os, time, signal
from multiprocessing import Process
class P(Process):
    def run(self):
        signal.signal(signal.SIGINT, signal.SIG_IGN)
        return super(P, self).run()

def f():
    print 'Child sleeping...'
    time.sleep(10)
    print 'Child done'
p = P(target=f)
p.start()
print 'Child started with PID', p.pid
print 'Killing child'
os.kill(p.pid, signal.SIGINT)
print 'Joining child'
p.join()
Output
Child started with PID 1515
Killing child
Joining child
Traceback (most recent call last):
  File "p1.py", line 15, in <module>
    p.start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 126, in __init__
    code = process_obj._bootstrap()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 242, in _bootstrap
    from . import util
KeyboardInterrupt
Adding a small delay with time.sleep(0.1) in the parent, just before sending SIGINT to the child, fixes the problem: it gives the child enough time to enter its run() method, in which SIGINT is ignored.
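For reference, a minimal sketch of the parent side with the delay added (same P and f as above):

p = P(target=f)
p.start()
print 'Child started with PID', p.pid
time.sleep(0.1)  # give the child time to reach run() and install SIG_IGN
print 'Killing child'
os.kill(p.pid, signal.SIGINT)
print 'Joining child'
p.join()

Now the signal is ignored by the child: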
Child started with PID 1589
Killing child
Child sleeping...
Joining child
Child done
An alternative that requires neither delays nor a custom run() method is to have the parent ignore SIGINT, start the child, and then restore the parent's original SIGINT handler. Because the signal disposition is inherited, the child will ignore SIGINT from the moment it starts:
import os, time, signal
from multiprocessing import Process
def f():
    print 'Child sleeping...'
    time.sleep(10)
    print 'Child done'
p = Process(target=f)
old_sigint = signal.signal(signal.SIGINT, signal.SIG_IGN)
p.start()
signal.signal(signal.SIGINT, old_sigint) # restore parent's handler
print 'Child started with PID', p.pid
print 'Killing child'
os.kill(p.pid, signal.SIGINT)
print 'Joining child'
p.join()
Output
Child started with PID 1660
Killing child
Joining child
Child sleeping...
Child done
A simplified version of the issue is:
import os, time, signal
childpid = os.fork()
if childpid == 0:
    # in the child
    time.sleep(5)  # will be interrupted by KeyboardInterrupt
    print "stop child"
else:
    # in the parent
    #time.sleep(1)
    os.kill(childpid, signal.SIGINT)
If the parent does sleep(1) before sending the signal, everything works as expected: the child (and only the child) receives a Python KeyboardInterrupt exception, which interrupts the sleep(5). However, if we comment out sleep(1) as in the example above, the kill() appears to be completely ignored: the child runs, sleeps 5 seconds, and finally prints "stop child". So a simple workaround is possible for your test suite: simply add a small sleep().
As far as I understand it, this occurs for the following (bad) reason: looking at the CPython source code, after the system call fork(), the child process explicitly clears the list of pending signals. But the following situation seems to occur often: the parent continues slightly ahead of the child, and sends the SIGINT signal. The child receives it, but at that point it is still only shortly after the system call fork(), and before the _clear_pending_signals(). As a result, the signal is lost.
This could be regarded as a CPython bug, if you feel like filing an issue on http://bugs.python.org . See PyOS_AfterFork() in signalmodule.c.
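A more robust workaround than a fixed sleep (a sketch, not from the original answers) is to have the child announce, for example via a multiprocessing.Event, that it has installed its handler before the parent sends the signal:

import os, signal
from multiprocessing import Process, Event

def child(ready):
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    ready.set()      # tell the parent the handler is installed
    # ... the real work goes here ...

if __name__ == '__main__':
    ready = Event()
    p = Process(target=child, args=(ready,))
    p.start()
    ready.wait()     # no race: the child has already ignored SIGINT
    os.kill(p.pid, signal.SIGINT)
    p.join()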
Related
I have an architecture where the main process can spawn child processes.
The main process sends computation requests to the children via Pipe.
Here is my current code for the child process:
while True:
    try:
        # not sufficient because conn.recv() is blocking
        if self.close_event.is_set():
            break
        fun, args = self.conn.recv()
        # some heavy computation
        res = getattr(ds, fun)(*args)
        self.conn.send(res)
    except EOFError as err:
        # should be raised by conn.recv() if connection is closed
        # but it never happens
        break
and how it is initialized in the main process:
def init_worker(self):
    close_event = DefaultCtxEvent()
    conn_parent, conn_child = Pipe()
    process = WorkerProcess(
        i, self.nb_workers, conn_child, close_event, arguments=self.arguments)
    process.daemon = True
    process.start()
    # close the side we don't use
    conn_child.close()
    # Remember the side we need
    self.conn = conn_parent
I have a clean method in the main process that should close all children, like so:
def clean(self):
    self.conn.close()
    # waiting for the loop to break for a clean exit
    self.child_process.join()
However, the call to conn.recv() blocks and never throws an error as I would expect.
Am I confusing the behaviour of "conn_parent" and "conn_child" somehow?
How do I properly close the child's connection?
Edit: a possible solution is to explicitly send a message with content like "_break". The loop receives the message via conn.recv() and breaks. Is that a "normal" pattern? As a bonus, is there a way to kill a potentially long-running method without terminating the process?
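Regarding the edit: yes, sending an explicit sentinel message is a common shutdown pattern. A self-contained sketch of the idea (the "_break" token and names are illustrative, not from the original code):

import multiprocessing as mp

def worker(conn):
    while True:
        msg = conn.recv()
        if msg == "_break":        # sentinel: the parent asked us to stop
            break
        conn.send(msg * 2)         # stand-in for the real computation

if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(21)
    print(parent_conn.recv())      # 42
    parent_conn.send("_break")     # request a clean shutdown
    p.join()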
Apparently there's a problem with Linux pipes: because the child forks the parent's connection, that connection is still open in the child and needs to be closed explicitly on the child's side.
Here is a dummy example of how it can be done:
from multiprocessing import Pipe, Process
def worker_func(parent_conn, child_conn):
    parent_conn.close()  # close parent connection forked in child
    while True:
        try:
            a = child_conn.recv()
        except EOFError:
            print('child cancelled')
            break
        else:
            print(a)

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    child = Process(target=worker_func, args=(parent_conn, child_conn,))
    child.start()
    child_conn.close()
    parent_conn.send("a")
    parent_conn.close()
    child.join()
    print('child done')
Output
a
child cancelled
child done
This is not required on Windows, or when Linux uses "spawn" for creating workers, because then the child does not fork the parent's connection; but the code above will work on any system with any worker creation strategy.
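For completeness, if you rely on the "spawn" start method and pass only the child's end of the pipe to the worker, the parent's end is never duplicated into the child, so EOFError is raised as soon as the parent closes its side. A sketch (assuming Python 3's multiprocessing.get_context):

import multiprocessing as mp

def worker(child_conn):
    while True:
        try:
            print(child_conn.recv())
        except EOFError:           # raised once the parent's end is closed everywhere
            print('child cancelled')
            break

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    parent_conn, child_conn = ctx.Pipe()
    child = ctx.Process(target=worker, args=(child_conn,))
    child.start()
    child_conn.close()             # parent no longer needs the child's end
    parent_conn.send("a")
    parent_conn.close()            # triggers EOFError in the child
    child.join()
    print('child done')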
Is there a way to make the processes in concurrent.futures.ProcessPoolExecutor terminate if the parent process terminates for any reason?
Some details: I'm using ProcessPoolExecutor in a job that processes a lot of data. Sometimes I need to terminate the parent process with a kill command, but when I do that the processes from ProcessPoolExecutor keep running and I have to manually kill them too. My primary work loop looks like this:
with concurrent.futures.ProcessPoolExecutor(n_workers) as executor:
    result_list = [executor.submit(_do_work, data) for data in data_list]
    for id, future in enumerate(
            concurrent.futures.as_completed(result_list)):
        print(f'{id}: {future.result()}')
Is there anything I can add here or do differently to make the child processes in executor terminate if the parent dies?
You can start a thread in each worker process that terminates the worker when the parent process dies:
import os
import signal
import threading
import time

def start_thread_to_terminate_when_parent_process_dies(ppid):
    pid = os.getpid()

    def f():
        while True:
            try:
                os.kill(ppid, 0)  # signal 0 sends nothing; it only checks that the parent still exists
            except OSError:
                os.kill(pid, signal.SIGTERM)  # parent is gone, terminate this worker
            time.sleep(1)

    thread = threading.Thread(target=f, daemon=True)
    thread.start()
Usage: pass initializer and initargs to ProcessPoolExecutor
with concurrent.futures.ProcessPoolExecutor(
    n_workers,
    initializer=start_thread_to_terminate_when_parent_process_dies,  # +
    initargs=(os.getpid(),),  # +
) as executor:
This works even if the parent process is SIGKILL/kill -9'ed: os.kill(ppid, 0) does not deliver a signal, it merely checks whether the parent PID still exists and raises OSError once it is gone.
I would suggest two changes:
Use kill -15, which the Python program can handle as a SIGTERM signal, rather than kill -9 (SIGKILL), which it cannot catch.
Use a multiprocessing pool created with the multiprocessing.pool.Pool class, whose terminate method works quite differently from that of the concurrent.futures.ProcessPoolExecutor class: it kills all processes in the pool, so any tasks that have been submitted and are running are also terminated immediately.
Your equivalent program using the new pool and handling a SIGTERM interrupt would be:
from multiprocessing import Pool
import signal
import sys
import os
...
def handle_sigterm(*args):
    #print('Terminating...', file=sys.stderr, flush=True)
    pool.terminate()
    sys.exit(1)

# The process to be "killed", if necessary:
print(os.getpid(), file=sys.stderr)

pool = Pool(n_workers)
signal.signal(signal.SIGTERM, handle_sigterm)
results = pool.imap_unordered(_do_work, data_list)
for id, result in enumerate(results):
    print(f'{id}: {result}')
You could run the script in its own cgroup. When you need to kill the whole thing, you can do so using the cgroup's kill switch (cgroup.kill in cgroup v2). Even a cpu cgroup will do the trick, as you can access the group's pids.
Check this article on how to use cgexec.
I am writing a python script which has 2 child processes. The main logic occurs in one process and another process waits for some time and then kills the main process even if the logic is not done.
I read that calling os._exit(1) stops the interpreter, so the entire script is killed automatically. I've used it as shown below:
import os
import time
from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Array
# Main process
def main_process(shared_variable):
    shared_variable.value = "mainprc"
    time.sleep(20)
    print("Task finished normally.")
    os._exit(1)

# Timer process
def timer_process(shared_variable):
    threshold_time_secs = 5
    time.sleep(threshold_time_secs)
    print("Timeout reached")
    print("Shared variable ", shared_variable.value)
    print("Task is shutdown.")
    os._exit(1)

if __name__ == "__main__":
    lock = Lock()
    shared_variable = Array('c', "initial", lock=lock)
    process_main = Process(target=main_process, args=(shared_variable,))
    process_timer = Process(target=timer_process, args=(shared_variable,))
    process_main.start()
    process_timer.start()
    process_timer.join()
The timer process calls os._exit but the script still waits for the main process to print "Task finished normally." before exiting.
How do I make it such that if timer process exits, the entire program is shutdown (including main process)?
Thanks.
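Note that os._exit() only terminates the process that calls it, so the timer child exiting cannot by itself take down its sibling. One possible approach (a sketch, not from the original post) is to let the parent perform the shutdown: join the timer, then terminate the main process if it is still running:

# at the end of the "__main__" block
process_main.start()
process_timer.start()
process_timer.join()            # returns as soon as the timer process exits
if process_main.is_alive():
    process_main.terminate()    # forcibly stop the main worker
    process_main.join()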
def daemon_start(pid_file, log_file):
    def handle_exit(signum, _):
        if signum == signal.SIGTERM:
            sys.exit(0)
        sys.exit(1)

    signal.signal(signal.SIGINT, handle_exit)
    signal.signal(signal.SIGTERM, handle_exit)

    # fork only once because we are sure parent will exit
    pid = os.fork()
    assert pid != -1

    if pid > 0:
        # parent waits for its child
        time.sleep(5)
        sys.exit(0)

    # child signals its parent to exit
    ppid = os.getppid()
    pid = os.getpid()
    if write_pid_file(pid_file, pid) != 0:
        os.kill(ppid, signal.SIGINT)
        sys.exit(1)

    os.setsid()
    signal.signal(signal.SIGHUP, signal.SIG_IGN)

    print('started')
    os.kill(ppid, signal.SIGTERM)

    sys.stdin.close()
    try:
        freopen(log_file, 'a', sys.stdout)
        freopen(log_file, 'a', sys.stderr)
    except IOError as e:
        shell.print_exception(e)
        sys.exit(1)
This daemon does not use a double fork. It says "fork only once because we are sure parent will exit". The parent calls sys.exit(0) to exit; however, the child also calls os.kill(ppid, signal.SIGTERM) to make the parent exit.
What is the point of doing it this way?
The phrase "double fork" is a standard technique to ensure a daemon is reparented to the init (pid 1) process so that the shell which launched it does not kill it. This is actually using that technique because the first fork is done by the process that launched the python program. When a program calls daemon_start it forks. The original (now parent) process exits a few seconds later or sooner when the child it forked signals it. That will cause the kernel to reparent the child process to pid 1. "Double fork" does not mean the daemon calls fork() twice.
Also, your subject line asks "why does this function kill parent twice?" But the code in question does no such thing. I have no idea how you got that idea.
I am running a Python script on a Linux machine which creates a child process using subprocess.check_output(), as follows:
subprocess.check_output(["ls", "-l"], stderr=subprocess.STDOUT)
The problem is that even if the parent process dies, the child is still running.
Is there any way I can kill the child process as well when the parent dies?
Yes, you can achieve this by two methods. Both of them require you to use Popen instead of check_output. The first is a simpler method, using try..finally, as follows:
import subprocess
from contextlib import contextmanager

@contextmanager
def run_and_terminate_process(*args, **kwargs):
    try:
        p = subprocess.Popen(*args, **kwargs)
        yield p
    finally:
        p.terminate()  # send sigterm, or ...
        p.kill()       # send sigkill
def main():
    with run_and_terminate_process(args) as running_proc:
        # Your code here, such as running_proc.stdout.readline()
This will catch sigint (keyboard interrupt) and sigterm, but not sigkill (if you kill your script with -9).
The other method is a bit more complex, and uses ctypes' prctl PR_SET_PDEATHSIG. The system will send a signal to the child once the parent exits for any reason (even sigkill).
import signal
import subprocess
import ctypes

libc = ctypes.CDLL("libc.so.6")
def set_pdeathsig(sig=signal.SIGTERM):
    def callable():
        return libc.prctl(1, sig)
    return callable
p = subprocess.Popen(args, preexec_fn = set_pdeathsig(signal.SIGTERM))
Your problem is with using subprocess.check_output - you are correct, you can't get the child PID using that interface. Use Popen instead:
import subprocess

proc = subprocess.Popen(["ls", "-l"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Here you can get the PID
global child_pid
child_pid = proc.pid
# Now we can wait for the child to complete
(output, error) = proc.communicate()
if error:
    print "error:", error
print "output:", output
To make sure you kill the child on exit:
import os
import signal
def kill_child():
    if child_pid is None:
        pass
    else:
        os.kill(child_pid, signal.SIGTERM)

import atexit
atexit.register(kill_child)
I don't know the specifics, but the best way is still to catch errors (perhaps even all errors) with signal and terminate any remaining processes there.
import signal
import sys
import subprocess
import os
def signal_handler(signal, frame):
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

a = subprocess.check_output(["ls", "-l"], stderr=subprocess.STDOUT)

while 1:
    pass  # Press Ctrl-C (breaks the application and is caught by signal_handler())
This is just a mockup; you'd need to catch more than just SIGINT, and you'd still need to keep track of the spawned process somehow, but the idea might get you started (see the sketch after the links below).
http://docs.python.org/2/library/os.html#os.kill
http://docs.python.org/2/library/subprocess.html#subprocess.Popen.pid
http://docs.python.org/2/library/subprocess.html#subprocess.Popen.kill
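A slightly fuller sketch of that idea (illustrative only): use Popen so there is a handle to the child, and kill it from a handler registered for both SIGINT and SIGTERM:

import signal
import subprocess
import sys
import time

proc = subprocess.Popen(["sleep", "60"])   # stand-in for a long-running child

def signal_handler(signum, frame):
    proc.kill()                            # take the child down with us
    sys.exit(0)

for sig in (signal.SIGINT, signal.SIGTERM):
    signal.signal(sig, signal_handler)

while proc.poll() is None:                 # wait, but stay interruptible by signals
    time.sleep(0.5)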
I'd recommend writing a personalized version of check_output, because, as I just realized, check_output is really only useful for simple debugging since you can't interact much with the process while it is executing.
Rewrite of check_output:
from subprocess import Popen, PIPE, STDOUT
from time import sleep, time

def checkOutput(cmd):
    a = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT)
    print(a.pid)
    start = time()
    while a.poll() is None and time() - start <= 30:  # 30 sec grace period
        sleep(0.25)
    if a.poll() is None:
        print('Still running, killing')
        a.kill()
    else:
        print('exit code:', a.poll())
    output = a.stdout.read()
    a.stdout.close()
    a.stdin.close()
    return output
And do whatever you'd like with it; perhaps store the active executions in a temporary variable and kill them upon exit with signal or other means of intercepting errors/shutdowns of the main loop.
In the end, you still need to catch terminations in the main application in order to safely kill any children; the best way to approach this is with try/except or signal.
As of Python 3.2 there is a ridiculously simple way to do this:
from subprocess import Popen
with Popen(["sleep", "60"]) as process:
print(f"Just launched server with PID {process.pid}")
I think this will be best for most use cases because it's simple and portable, and it avoids any dependence on global state.
If this solution isn't powerful enough, then I would recommend checking out the other answers and discussion on this question or on Python: how to kill child process(es) when parent dies?, as there are a lot of neat ways to approach the problem that provide different trade-offs around portability, resilience, and simplicity. 😊
Manually you could do this:
ps aux | grep <process name>
get the PID (second column), and then:
kill -9 <PID>
The -9 forces the kill.