How to terminate Python's `ProcessPoolExecutor` when the parent process dies?

Is there a way to make the processes in concurrent.futures.ProcessPoolExecutor terminate if the parent process terminates for any reason?
Some details: I'm using ProcessPoolExecutor in a job that processes a lot of data. Sometimes I need to terminate the parent process with a kill command, but when I do that the processes from ProcessPoolExecutor keep running and I have to manually kill them too. My primary work loop looks like this:
with concurrent.futures.ProcessPoolExecutor(n_workers) as executor:
    result_list = [executor.submit(_do_work, data) for data in data_list]
    for id, future in enumerate(
            concurrent.futures.as_completed(result_list)):
        print(f'{id}: {future.result()}')
Is there anything I can add here or do differently to make the child processes in executor terminate if the parent dies?

You can start a thread in each process to terminate when parent process dies:
import os
import signal
import threading
import time

def start_thread_to_terminate_when_parent_process_dies(ppid):
    pid = os.getpid()

    def f():
        while True:
            try:
                os.kill(ppid, 0)  # signal 0 only checks that the parent still exists
            except OSError:
                os.kill(pid, signal.SIGTERM)
            time.sleep(1)

    thread = threading.Thread(target=f, daemon=True)
    thread.start()
Usage: pass initializer and initargs to ProcessPoolExecutor
with concurrent.futures.ProcessPoolExecutor(
    n_workers,
    initializer=start_thread_to_terminate_when_parent_process_dies,  # +
    initargs=(os.getpid(),),  # +
) as executor:
This works even if the parent process is SIGKILL/kill -9'ed.
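Putting the pieces together, a minimal end-to-end sketch on a Unix-like system (as implied by the question's kill command); _do_work, data_list and n_workers are hypothetical stand-ins for the real workload:
import concurrent.futures
import os
import signal
import threading
import time

def start_thread_to_terminate_when_parent_process_dies(ppid):
    pid = os.getpid()

    def f():
        while True:
            try:
                os.kill(ppid, 0)   # signal 0 only checks that the parent still exists
            except OSError:
                os.kill(pid, signal.SIGTERM)
            time.sleep(1)

    threading.Thread(target=f, daemon=True).start()

def _do_work(data):                # hypothetical placeholder work function
    time.sleep(2)
    return data * 2

if __name__ == '__main__':
    data_list = list(range(10))    # hypothetical input
    n_workers = 4
    with concurrent.futures.ProcessPoolExecutor(
        n_workers,
        initializer=start_thread_to_terminate_when_parent_process_dies,
        initargs=(os.getpid(),),
    ) as executor:
        result_list = [executor.submit(_do_work, data) for data in data_list]
        for id, future in enumerate(concurrent.futures.as_completed(result_list)):
            print(f'{id}: {future.result()}')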

I would suggest two changes:
Use a kill -15 command, which can be handled by the Python program as a SIGTERM signal, rather than a kill -9 command.
Use a multiprocessing pool created with the multiprocessing.pool.Pool class, whose terminate method works quite differently from that of concurrent.futures.ProcessPoolExecutor: it kills all processes in the pool, so any tasks that have been submitted and are still running are terminated immediately.
Your equivalent program using the new pool and handling a SIGTERM interrupt would be:
from multiprocessing import Pool
import signal
import sys
import os

...

def handle_sigterm(*args):
    #print('Terminating...', file=sys.stderr, flush=True)
    pool.terminate()
    sys.exit(1)

# The process to be "killed", if necessary:
print(os.getpid(), file=sys.stderr)

pool = Pool(n_workers)
signal.signal(signal.SIGTERM, handle_sigterm)
results = pool.imap_unordered(_do_work, data_list)
for id, result in enumerate(results):
    print(f'{id}: {result}')

You could run the script in its own cgroup. When you need to kill the whole thing, you can do so with the cgroup's kill switch. Even a cpu cgroup will do the trick, since you can read the group's PIDs from it.
Check this article on how to use cgexec.
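If the script runs under cgroup v2 (Linux 5.14+), one way to implement that kill switch from Python is to write "1" to the group's cgroup.kill file; the group path below is hypothetical and writing it needs the appropriate permissions:
# Sketch only: assumes cgroup v2 and that the job was started inside this
# (hypothetical) group, e.g. created beforehand with mkdir /sys/fs/cgroup/myjob.
CGROUP_PATH = "/sys/fs/cgroup/myjob"

def kill_cgroup(path=CGROUP_PATH):
    # Writing "1" to cgroup.kill sends SIGKILL to every process in the group.
    with open(f"{path}/cgroup.kill", "w") as f:
        f.write("1")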

Related

os._exit(1) does not kill non-daemonic sibling processes

I am writing a python script which has 2 child processes. The main logic occurs in one process and another process waits for some time and then kills the main process even if the logic is not done.
I read that calling os._exit(1) stops the interpreter, so the entire script is killed automatically. I've used it as shown below:
import os
import time
from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Array

# Main process
def main_process(shared_variable):
    shared_variable.value = b"mainprc"
    time.sleep(20)
    print("Task finished normally.")
    os._exit(1)

# Timer process
def timer_process(shared_variable):
    threshold_time_secs = 5
    time.sleep(threshold_time_secs)
    print("Timeout reached")
    print("Shared variable ", shared_variable.value)
    print("Task is shutdown.")
    os._exit(1)

if __name__ == "__main__":
    lock = Lock()
    shared_variable = Array('c', b"initial", lock=lock)
    process_main = Process(target=main_process, args=(shared_variable,))
    process_timer = Process(target=timer_process, args=(shared_variable,))
    process_main.start()
    process_timer.start()
    process_timer.join()
The timer process calls os._exit but the script still waits for the main process to print "Task finished normally." before exiting.
How do I make it such that if timer process exits, the entire program is shutdown (including main process)?
Thanks.
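One possible pattern, offered here as a sketch rather than something from the thread: drop the separate timer process and let the parent own the timeout via Process.join(timeout=...), then terminate the worker itself; main_logic is a hypothetical stand-in for the main process above:
import time
from multiprocessing import Process

def main_logic():
    time.sleep(20)                 # stands in for the real work
    print("Task finished normally.")

if __name__ == "__main__":
    worker = Process(target=main_logic)
    worker.start()
    worker.join(timeout=5)         # wait at most 5 seconds
    if worker.is_alive():
        print("Timeout reached, task is shutdown.")
        worker.terminate()         # or worker.kill() for SIGKILL (Python 3.7+)
        worker.join()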

How to handle abnormal child process termination?

I'm using python 3.7 and following this documentation. I want to have a process, which should spawn a child process, wait for it to finish a task, and get some info back. I use the following code:
from multiprocessing import Process, Queue

if __name__ == '__main__':
    q = Queue()
    p = Process(target=some_func, args=(q,))
    p.start()
    print(q.get())
    p.join()
When the child process finishes correctly there is no problem, and it works great, but the problem starts when my child process is terminated before it finished.
In this case, my application is hanging on wait.
Giving a timeout to q.get() and p.join() doesn't completely solve the issue, because I want to know immediately that the child process died, not wait for the timeout.
Another problem is that a timeout on q.get() yields an exception, which I prefer to avoid.
Can someone suggest a more elegant way to overcome these issues?
Queue & Signal
One possibility would be to register a signal handler and use it to pass a sentinel value.
On Unix you could handle SIGCHLD in the parent, but that's not an option in your case. According to the signal module:
On Windows, signal() can only be called with SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM, or SIGBREAK.
Not sure if killing it through Task-Manager will translate into SIGTERM but you can give it a try.
For handling SIGTERM you would need to register the signal handler in the child.
import os
import sys
import time
import signal
from functools import partial
from multiprocessing import Process, Queue

SENTINEL = None

def _sigterm_handler(signum, frame, queue):
    print("received SIGTERM")
    queue.put(SENTINEL)
    sys.exit()

def register_sigterm(queue):
    global _sigterm_handler
    _sigterm_handler = partial(_sigterm_handler, queue=queue)
    signal.signal(signal.SIGTERM, _sigterm_handler)

def some_func(q):
    register_sigterm(q)
    print(os.getpid())
    for i in range(30):
        time.sleep(1)
        q.put(f'msg_{i}')

if __name__ == '__main__':
    q = Queue()
    p = Process(target=some_func, args=(q,))
    p.start()
    for msg in iter(q.get, SENTINEL):
        print(msg)
    p.join()
Example Output:
12273
msg_0
msg_1
msg_2
msg_3
received SIGTERM
Process finished with exit code 0
Queue & Process.is_alive()
Even if this works with Task-Manager, your use-case sounds like you can't exclude force kills, so I think you're better off with an approach which doesn't rely on signals.
You can check in a loop whether your process is alive with p.is_alive(), call queue.get() with a timeout specified, and handle the Empty exception:
import os
import time
from queue import Empty
from multiprocessing import Process, Queue

def some_func(q):
    print(os.getpid())
    for i in range(30):
        time.sleep(1)
        q.put(f'msg_{i}')

if __name__ == '__main__':
    q = Queue()
    p = Process(target=some_func, args=(q,))
    p.start()
    while p.is_alive():
        try:
            msg = q.get(timeout=0.1)
        except Empty:
            pass
        else:
            print(msg)
    p.join()
It would also be possible to avoid an exception, but I wouldn't recommend this because you don't spend your waiting time "on the queue", which decreases the responsiveness:
while p.is_alive():
    if not q.empty():
        msg = q.get_nowait()
        print(msg)
    time.sleep(0.1)
Pipe & Process.is_alive()
If you intend to use one connection per child, it would however be possible to use a pipe instead of a queue. It's more performant than a queue
(which is built on top of a pipe), and you can use multiprocessing.connection.wait (Python 3.3+) to await readiness of multiple objects at once.
multiprocessing.connection.wait(object_list, timeout=None)
Wait till an object in object_list is ready. Returns the list of those objects in object_list which are ready. If timeout is a float then the call blocks for at most that many seconds. If timeout is None then it will block for an unlimited period. A negative timeout is equivalent to a zero timeout.
For both Unix and Windows, an object can appear in object_list if it is a readable Connection object;
a connected and readable socket.socket object; or
the sentinel attribute of a Process object.
A connection or socket object is ready when there is data available to be read from it, or the other end has been closed.
Unix: wait(object_list, timeout) is almost equivalent to select.select(object_list, [], [], timeout). The difference is that, if select.select() is interrupted by a signal, it can raise OSError with an error number of EINTR, whereas wait() will not.
Windows: An item in object_list must either be an integer handle which is waitable (according to the definition used by the documentation of the Win32 function WaitForMultipleObjects()) or it can be an object with a fileno() method which returns a socket handle or pipe handle. (Note that pipe handles and socket handles are not waitable handles.)
You can use this to await the sentinel attribute of the process and the parental end of the pipe concurrently.
import os
import time
from multiprocessing import Process, Pipe
from multiprocessing.connection import wait

def some_func(conn_write):
    print(os.getpid())
    for i in range(30):
        time.sleep(1)
        conn_write.send(f'msg_{i}')

if __name__ == '__main__':
    conn_read, conn_write = Pipe(duplex=False)
    p = Process(target=some_func, args=(conn_write,))
    p.start()
    while p.is_alive():
        wait([p.sentinel, conn_read])  # block-wait until something gets ready
        if conn_read.poll():           # check if something can be received
            print(conn_read.recv())
    p.join()

Kill a multiprocessing pool with SIGKILL instead of SIGTERM (I think)

So, I have this program that utilizes multiprocessing with multiple selenium browser windows.
Here's what the program looks like:
from multiprocessing import Pool
import time

pool = Pool(5)
results = pool.map_async(worker, range(10))
time.sleep(10)
pool.terminate()
However, this waits for the processes in the pool to finish their current tasks. I want instant termination of all the workers.
multiprocessing.Pool stores its worker process list in the Pool._pool attribute, so sending a signal to them is straightforward:
import multiprocessing.pool
import os
import signal

def kill(pool):
    # stop the pool from repopulating new children
    pool._state = multiprocessing.pool.TERMINATE
    pool._worker_handler._state = multiprocessing.pool.TERMINATE
    for p in pool._pool:
        os.kill(p.pid, signal.SIGKILL)
    # .is_alive() will reap the dead processes
    while any(p.is_alive() for p in pool._pool):
        pass
    pool.terminate()
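For reference, a hedged usage sketch of the kill() helper above, with worker standing in as a hypothetical placeholder for the asker's selenium worker (Unix-only, since it relies on SIGKILL):
import time
from multiprocessing import Pool

def worker(i):          # hypothetical stand-in for the selenium worker
    time.sleep(60)
    return i

if __name__ == '__main__':
    pool = Pool(5)
    results = pool.map_async(worker, range(10))
    time.sleep(10)
    kill(pool)          # the kill() helper defined above: SIGKILLs every worker
    pool.join()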

A new process with multiprocessing can run in main thread?

One library that I need to use with python3 (iperf3) requires that its 'run' function be executed in the main thread.
I'm performing some tests to verify whether a new process created with the multiprocessing library will let me use that process's main thread, but with the snippet below it seems that I cannot get a new 'main thread' for the process.
What would be the recommended way to run a forked process as the main thread of the new process? Is that possible? Will a system like Celery help with this? I'm planning to run this from a Flask app.
Thanks!
#! /usr/bin/python3
import threading
import multiprocessing as mp

def mp_call():
    try:
        print("mp_call is mainthread? {}".format(isinstance(threading.current_thread(), threading._MainThread)))
    except Exception as e:
        print('create iperf e:{}'.format(e))

def thread_call():
    try:
        print("thread_call is mainthread? {}".format(isinstance(threading.current_thread(), threading._MainThread)))
        p = mp.Process(target=mp_call, args=[])
        p.daemon = False
        p.start()
        p.join()
        print('Process ended')
    except Exception as e:
        print('thread e:{}'.format(e))

t = threading.Thread(target=thread_call)
t.daemon = False
t.start()
t.join()
print('Thread ended')
In fact, all other threads are dead after a fork; you get a new "main" thread, which is your current thread, so your checking method is wrong. threading._MainThread is not a public API; use threading.main_thread() instead:
assert threading.current_thread() == threading.main_thread()
The main thread gets replaced after the subprocess fork, so it is no longer a _MainThread subclass.
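A small self-contained sketch of that check (mine, not the answer's), showing that the target function does run on the child process's main thread even when the Process is started from a non-main thread of the parent:
import multiprocessing as mp
import threading

def mp_call():
    # Inside the child process, the current thread is that process's main thread.
    print("child: is main thread?", threading.current_thread() is threading.main_thread())

def thread_call():
    print("parent worker: is main thread?", threading.current_thread() is threading.main_thread())
    p = mp.Process(target=mp_call)
    p.start()
    p.join()

if __name__ == '__main__':
    t = threading.Thread(target=thread_call)
    t.start()
    t.join()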

How to kill a python child process created with subprocess.check_output() when the parent dies?

I am running a python script on a linux machine which creates a child process using subprocess.check_output(), as follows:
subprocess.check_output(["ls", "-l"], stderr=subprocess.STDOUT)
The problem is that even if the parent process dies, the child is still running.
Is there any way I can kill the child process as well when the parent dies?
Yes, you can achieve this with two methods. Both of them require you to use Popen instead of check_output. The first is the simpler method, using try..finally, as follows:
import subprocess
from contextlib import contextmanager

@contextmanager
def run_and_terminate_process(*args, **kwargs):
    try:
        p = subprocess.Popen(*args, **kwargs)
        yield p
    finally:
        p.terminate()  # send sigterm, or ...
        p.kill()       # send sigkill

def main():
    with run_and_terminate_process(args) as running_proc:
        # Your code here, such as running_proc.stdout.readline()
        ...
This will catch sigint (keyboard interrupt) and sigterm, but not sigkill (if you kill your script with -9).
The other method is a bit more complex, and uses ctypes' prctl PR_SET_PDEATHSIG. The system will send a signal to the child once the parent exits for any reason (even sigkill).
import signal
import ctypes
import subprocess

libc = ctypes.CDLL("libc.so.6")

def set_pdeathsig(sig=signal.SIGTERM):
    def callable():
        # 1 == PR_SET_PDEATHSIG: deliver `sig` to this process when the parent dies
        return libc.prctl(1, sig)
    return callable

p = subprocess.Popen(args, preexec_fn=set_pdeathsig(signal.SIGTERM))
Your problem is with using subprocess.check_output - you are correct, you can't get the child PID using that interface. Use Popen instead:
import subprocess
from subprocess import PIPE

proc = subprocess.Popen(["ls", "-l"], stdout=PIPE, stderr=PIPE)

# Here you can get the PID
global child_pid
child_pid = proc.pid

# Now we can wait for the child to complete
(output, error) = proc.communicate()
if error:
    print("error:", error)
print("output:", output)
To make sure you kill the child on exit:
import os
import signal

def kill_child():
    if child_pid is None:
        pass
    else:
        os.kill(child_pid, signal.SIGTERM)

import atexit
atexit.register(kill_child)
Don't know the specifics, but the best way is still to catch errors (and perhaps even all errors) with signal and terminate any remaining processes there.
import signal
import sys
import subprocess
import os

def signal_handler(signal, frame):
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)
a = subprocess.check_output(["ls", "-l"], stderr=subprocess.STDOUT)
while 1:
    pass  # Press Ctrl-C (breaks the application and is caught by signal_handler())
This is just a mockup; you'd need to catch more than just SIGINT, but the idea might get you started, and you'd still need to keep track of the spawned processes somehow.
http://docs.python.org/2/library/os.html#os.kill
http://docs.python.org/2/library/subprocess.html#subprocess.Popen.pid
http://docs.python.org/2/library/subprocess.html#subprocess.Popen.kill
I'd recommend writing a personalized version of check_output, because as I just realized, check_output is really only meant for simple debugging etc., since you can't interact with it much during execution.
Rewrite check_output:
from subprocess import Popen, PIPE, STDOUT
from time import sleep, time

def checkOutput(cmd):
    a = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT)
    print(a.pid)
    start = time()
    while a.poll() is None and time() - start <= 30:  # 30 sec grace period
        sleep(0.25)
    if a.poll() is None:
        print('Still running, killing')
        a.kill()
    else:
        print('exit code:', a.poll())
    output = a.stdout.read()
    a.stdout.close()
    a.stdin.close()
    return output
And do whatever you'd like with it; perhaps store the active executions in a temporary variable and kill them upon exit with signal or other means of intercepting errors/shutdowns of the main loop.
In the end, you still need to catch terminations in the main application in order to safely kill any children; the best way to approach this is with try & except or signal.
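A hedged sketch of that bookkeeping idea (active_procs and run are my names, not the answer's): register every Popen you start and terminate whatever is still alive from an atexit hook, with a signal handler that funnels SIGTERM/SIGINT into a normal exit:
import atexit
import signal
import sys
from subprocess import Popen

active_procs = []                    # registry of children started by run()

def run(cmd):
    p = Popen(cmd, shell=True)
    active_procs.append(p)
    return p

def kill_active_procs():
    for p in active_procs:
        if p.poll() is None:         # still running
            p.terminate()            # or p.kill() for SIGKILL

def _exit_on_signal(signum, frame):
    sys.exit(0)                      # raises SystemExit, so the atexit hook runs

atexit.register(kill_active_procs)
signal.signal(signal.SIGTERM, _exit_on_signal)
signal.signal(signal.SIGINT, _exit_on_signal)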
As of Python 3.2 there is a ridiculously simple way to do this:
from subprocess import Popen

with Popen(["sleep", "60"]) as process:
    print(f"Just launched server with PID {process.pid}")
I think this will be best for most use cases because it's simple and portable, and it avoids any dependence on global state.
If this solution isn't powerful enough, then I would recommend checking out the other answers and discussion on this question or on Python: how to kill child process(es) when parent dies?, as there are a lot of neat ways to approach the problem that provide different trade-offs around portability, resilience, and simplicity. 😊
Manually, you could do this:
ps aux | grep <process name>
get the PID (second column) and
kill -9 <PID>
-9 forces the kill
