Consider a simple setup with a child process. It is basically a producer (parent) / consumer (child) scenario:
import multiprocessing

class Job:
    def start_process(self):
        self.queue = multiprocessing.Queue(3)
        # note: args must be a tuple, hence the trailing comma
        self.process = multiprocessing.Process(target=run,
                                               args=(self.queue,))
        self.process.start()

def run(queue):
    while True:
        item = queue.get()
        ....
If I do kill -9 on the parent process, the child hangs forever. I was sure it would receive SIGHUP, the way it does with subprocess.Popen - when the Python process quits, the popened process quits as well. Any idea how to clean up the child?
If the daemon param doesn't work for you, you can catch a SIGINT signal and have it set a boolean flag that exits the while loop in your children, i.e.:
import signal
g_run_loops = True
def signal_handler(signum, frame):
global g_run_loops
g_run_loops = False
signal.signal(signal.SIGINT, signal_handler)
def run(queue):
global g_run_loops
while g_run_loops:
item = queue.get()
....
Note that this won't work for SIGKILL (kill -9) but should work for SIGINT (kill -2).
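For reference, the daemon parameter mentioned above is set when the Process is created. Note that it only makes the child terminate when the parent exits normally, so it does not help against kill -9 either. A minimal sketch, assuming the same run function as in the question:

import multiprocessing

def run(queue):
    while True:
        item = queue.get()
        # ... process item ...

if __name__ == '__main__':
    queue = multiprocessing.Queue(3)
    # daemon=True: the child is terminated automatically when the parent
    # process exits normally (it does not protect against kill -9)
    process = multiprocessing.Process(target=run, args=(queue,), daemon=True)
    process.start()
    queue.put('some work')
    # when the parent reaches the end of the script, the daemon child is killed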
Related
Is there a way to make the processes in concurrent.futures.ProcessPoolExecutor terminate if the parent process terminates for any reason?
Some details: I'm using ProcessPoolExecutor in a job that processes a lot of data. Sometimes I need to terminate the parent process with a kill command, but when I do that the processes from ProcessPoolExecutor keep running and I have to manually kill them too. My primary work loop looks like this:
with concurrent.futures.ProcessPoolExecutor(n_workers) as executor:
result_list = [executor.submit(_do_work, data) for data in data_list]
for id, future in enumerate(
concurrent.futures.as_completed(result_list)):
print(f'{id}: {future.result()}')
Is there anything I can add here or do differently to make the child processes in executor terminate if the parent dies?
You can start a thread in each worker process that terminates it when the parent process dies:
import os
import signal
import threading
import time

def start_thread_to_terminate_when_parent_process_dies(ppid):
    pid = os.getpid()

    def f():
        while True:
            try:
                # signal 0 does not kill anything; it only checks
                # whether the parent PID still exists
                os.kill(ppid, 0)
            except OSError:
                # parent is gone: terminate this worker
                os.kill(pid, signal.SIGTERM)
            time.sleep(1)

    thread = threading.Thread(target=f, daemon=True)
    thread.start()
Usage: pass initializer and initargs to ProcessPoolExecutor
with concurrent.futures.ProcessPoolExecutor(
n_workers,
initializer=start_thread_to_terminate_when_parent_process_dies, # +
initargs=(os.getpid(),), # +
) as executor:
This works even if the parent process is SIGKILL/kill -9'ed.
I would suggest two changes:
Use a kill -15 command, which can be handled by the Python program as a SIGTERM signal, rather than a kill -9 command.
Use a multiprocessing pool created with the multiprocessing.pool.Pool class, whose terminate method works quite differently from that of the concurrent.futures.ProcessPoolExecutor class: it kills all processes in the pool, so any tasks that have been submitted and are running are also terminated immediately.
Your equivalent program using the new pool and handling a SIGTERM interrupt would be:
from multiprocessing import Pool
import signal
import sys
import os
...
def handle_sigterm(*args):
#print('Terminating...', file=sys.stderr, flush=True)
pool.terminate()
sys.exit(1)
# The process to be "killed", if necessary:
print(os.getpid(), file=sys.stderr)
pool = Pool(n_workers)
signal.signal(signal.SIGTERM, handle_sigterm)
results = pool.imap_unordered(_do_work, data_list)
for id, result in enumerate(results):
print(f'{id}: {result}')
You could run the script in a kill-cgroup. When you need to kill the whole thing, you can do so using the cgroup's kill switch. Even a cpu-cgroup will do the trick, since you can access the group's PIDs.
Check this article on how to use cgexec.
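For illustration, a minimal sketch of the kill-switch idea, assuming cgroup v2 (Linux 5.14+) and a hypothetical cgroup named jobgroup that the job was launched into (e.g. via cgexec or systemd-run):

import os
import signal

CGROUP = '/sys/fs/cgroup/jobgroup'   # hypothetical cgroup path

# cgroup v2 kill switch: writing "1" kills every process in the group at once
# (requires suitable permissions on the cgroup files)
with open(os.path.join(CGROUP, 'cgroup.kill'), 'w') as f:
    f.write('1')

# Alternatively, read the group's PIDs from cgroup.procs and signal each one:
with open(os.path.join(CGROUP, 'cgroup.procs')) as f:
    for pid in f.read().split():
        os.kill(int(pid), signal.SIGKILL)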
I have a piece of python code that should spawn an interruptible task in a child process:
import os
import signal
import subprocess
import sys

class Script:
'''
Class instantiated by the parent process
'''
def __init__(self, *args, **kwargs):
self.process = None
self.start()
def __del__(self):
if self.process:
if self.process.poll() is None:
self.stop()
def start(self):
popen_kwargs = {
'executable': sys.executable,
'creationflags': 0 * subprocess.CREATE_DEFAULT_ERROR_MODE
| subprocess.CREATE_NEW_PROCESS_GROUP,
}
self.process = subprocess.Popen(['python', os.path.realpath(__file__)],
**popen_kwargs)
def stop(self):
if not self.process:
return
try:
self.process.send_signal(signal.CTRL_C_EVENT)
self.process.wait()
self.process = None
except KeyboardInterrupt:
pass
class ScriptSubprocess:
def __init__(self):
self.stop = False
def run(self):
try:
while not self.stop:
# ...
except KeyboardInterrupt:
# interrupted!
pass
finally:
# make a *clean* exit if interrupted
self.stop = True
if __name__ == '__main__':
p = ScriptSubprocess()
p.run()
del p
and it works fine in a standalone python interpreter.
The problem arises when I move this code in the real application, which has an embedded Python interpreter.
In this case, it hangs when trying to stop the child process at the line self.process.wait(), which indicates that the previous line, self.process.send_signal(signal.CTRL_C_EVENT), did not work and the child process is in fact still running. If I manually terminate it via Task Manager, the call to self.process.wait() returns as if it had succeeded in stopping the child process.
I am looking for possible causes (e.g. some process flag of the parent process) that disables CTRL_C_EVENT.
The documentation of subprocess says:
Popen.send_signal(signal)
Sends the signal signal to the child.
Do nothing if the process completed.
Note On Windows, SIGTERM is an alias for terminate(). CTRL_C_EVENT and
CTRL_BREAK_EVENT can be sent to processes started with a creationflags
parameter which includes CREATE_NEW_PROCESS_GROUP.
and also:
subprocess.CREATE_NEW_PROCESS_GROUP
A Popen creationflags parameter to specify that a new process group will be created. This flag is necessary for using os.kill() on the subprocess.
This flag is ignored if CREATE_NEW_CONSOLE is specified.
So I am using creationflags with subprocess.CREATE_NEW_PROCESS_GROUP, but the real application is still unable to kill the subprocess with CTRL_C_EVENT (same as without this flag).
Since the real application (i.e. the parent process) also uses SetConsoleCtrlHandler to handle certain signals, I also tried passing creationflags with subprocess.CREATE_DEFAULT_ERROR_MODE to override that error mode in the child process, but I am still unable to kill the child process with CTRL_C_EVENT.
Note: CTRL_BREAK_EVENT works but does not give a clean exit (i.e. the finally: clause is not executed).
My guess is that SetConsoleCtrlHandler is the culprit, but I have no means of avoiding that being called in the parent process, or undoing its effect...
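Not part of the original question, but since CTRL_BREAK_EVENT does get delivered, one possible workaround is to catch it in the child and turn it into the same clean shutdown path. A sketch, assuming Windows:

import signal

class ScriptSubprocess:
    def __init__(self):
        self.stop = False
        # On Windows, CTRL_BREAK_EVENT arrives as SIGBREAK; translate it into
        # the same clean shutdown that KeyboardInterrupt would have triggered.
        signal.signal(signal.SIGBREAK, self._on_break)

    def _on_break(self, signum, frame):
        self.stop = True

    def run(self):
        while not self.stop:
            pass  # ... do the actual work here ...
        # the loop exits cleanly once self.stop is set

# The parent would then send CTRL_BREAK_EVENT instead of CTRL_C_EVENT:
# self.process.send_signal(signal.CTRL_BREAK_EVENT)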
From the picamera docs I took the example for fast capture and processing and added a SIGINT handler to catch the keyboard interrupt:
import io
import signal
import sys
import time
import threading
import picamera
# Create a pool of image processors
done = False
lock = threading.Lock()
pool = []
def signal_handler(signal, frame):
global done
print 'You pressed Ctrl+C!'
done=True
sys.exit()
signal.signal(signal.SIGINT, signal_handler)
class ImageProcessor(threading.Thread):
def __init__(self):
super(ImageProcessor, self).__init__()
self.stream = io.BytesIO()
self.event = threading.Event()
self.terminated = False
self.daemon=True;
self.start()
def run(self):
# This method runs in a separate thread
global done
while not self.terminated:
# Wait for an image to be written to the stream
if self.event.wait(1):
try:
self.stream.seek(0)
# Read the image and do some processing on it
#Image.open(self.stream)
#...
#...
# Set done to True if you want the script to terminate
# at some point
#done=True
finally:
# Reset the stream and event
self.stream.seek(0)
self.stream.truncate()
self.event.clear()
# Return ourselves to the pool
with lock:
pool.append(self)
def streams():
while not done:
with lock:
if pool:
processor = pool.pop()
else:
processor = None
if processor:
yield processor.stream
processor.event.set()
else:
# When the pool is starved, wait a while for it to refill
time.sleep(0.1)
with picamera.PiCamera() as camera:
pool = [ImageProcessor() for i in range(4)]
camera.resolution = (640, 480)
camera.framerate = 30
camera.start_preview()
time.sleep(2)
camera.capture_sequence(streams(), use_video_port=True)
# Shut down the processors in an orderly fashion
while pool:
with lock:
processor = pool.pop()
processor.terminated = True
processor.join()
but the interrupt signal is never caught.
Until the camera.capture_sequence(streams(), use_video_port=True) runs the signal is caught, after capture_sequence is started the signal handler is not called.
I'm new to Python, so maybe the answer is simple. What am I doing wrong here?
EDIT:
If I remove the following line, the signal is caught:
yield processor.stream
The problem there is that you are using thread.join(): it blocks the main thread, which means your program has to wait until the thread you joined finishes before it can continue.
The signals will always be caught by the main process, because it is the one that receives the signals; it is the process that owns the threads.
There are plenty of answers about how to deal with the main thread and CTRL+C, and I'll give you three options.
First, add a timeout to the join() call:
thread1.join(60) (detail here)
Second, start a new process that deals with the signal and kills the program:
import os
import signal
import sys

class Watcher():
def __init__(self):
self.child = os.fork()
if self.child == 0:
return
else:
self.watch()
def watch(self):
try:
os.wait()
except KeyboardInterrupt:
self.kill()
sys.exit()
def kill(self):
try:
os.kill(self.child, signal.SIGKILL)
except OSError:
pass
Start a Watcher before you start your worker threads, like:
def main():
init()
Watcher()
start_your_thread1()
start_your_thread2()
start_your_thread3()
Finally, there is your original way, the more complicated producer/consumer approach: just delete the final join() and give the main thread some work to do.
I prefer the second option; it's easy to use and it solves two problems with multithreaded programs in Python: (1) a signal might be delivered to any thread (which is just a malfeature) and (2) if the thread that gets the signal is waiting, the signal is ignored (which is a bug).
More detail about the Watcher is in Appendix A of The Little Book of Semaphores.
In your code, the done variable is a global variable.
So, whenever you want to modify it inside a function, you need to use the global keyword, or else it becomes a local variable.
You should fix your code like this:
import signal
import sys
done = False
def signal_handler(signal, frame):
global done
print('You pressed Ctrl+C!')
done = True
sys.exit()
signal.signal(signal.SIGINT, signal_handler)
Here is my code: it launches a subprocess, waits until it ends, and returns stdout; if a timeout occurs, it raises an exception. Common use is print(Run('python --version').execute()).
from subprocess import Popen, PIPE, STDOUT
from threading import Thread

class Run(object):
def __init__(self, cmd, timeout=2*60*60):
self.cmd = cmd.split()
self.timeout = timeout
self._stdout = b''
self.dt = 10
self.p = None
def execute(self):
print("Execute command: {}".format(' '.join(self.cmd)))
def target():
self.p = Popen(self.cmd, stdout=PIPE, stderr=STDOUT)
self._stdout = self.p.communicate()[0]
thread = Thread(target=target)
thread.start()
t = 0
while t < self.timeout:
thread.join(self.dt)
if thread.is_alive():
t += self.dt
print("Running for: {} seconds".format(t))
else:
ret_code = self.p.poll()
if ret_code:
raise AssertionError("{} failed.\nretcode={}\nstdout:\n{}".format(
self.cmd, ret_code, self._stdout))
return self._stdout
else:
print('Timeout {} reached, kill task, pid={}'.format(self.timeout, self.p.pid))
self.p.terminate()
thread.join()
raise AssertionError("Timeout")
The problem is the following case: the process that I launch spawns more child processes. So when the timeout is reached and I kill the main process (the one I started using my class) with self.p.terminate(), the children remain and my code hangs on the line self._stdout = self.p.communicate()[0]. Execution continues only once I manually kill all the child processes.
I tried a solution where, instead of self.p.terminate(), I kill the whole process tree.
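For illustration, one common way to kill a whole tree on POSIX is to start the child in its own process group and signal the group rather than the single PID (a sketch with a hypothetical cmd, not the exact code from the question):

import os
import signal
import subprocess

cmd = ['some_command', 'arg']   # hypothetical command that spawns children

# start_new_session=True makes the child the leader of a new process group,
# so its own children normally end up in the same group
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                     start_new_session=True)
try:
    stdout, _ = p.communicate(timeout=10)
except subprocess.TimeoutExpired:
    # signal the whole group: the direct child and its descendants
    os.killpg(os.getpgid(p.pid), signal.SIGKILL)
    stdout, _ = p.communicate()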
This also does not work if the main process has finished by itself and its children keep running on their own; I have no way to find and kill them, but they still block self.p.communicate().
Is there way to effectively solve this?
You could use the ProcessWrapper from the PySys framework - it offers a lot of this functionality as an abstraction in a cross-platform way, i.e.
import sys, os
from pysys.constants import *
from pysys.process.helper import ProcessWrapper
from pysys.exceptions import ProcessTimeout
command=sys.executable
arguments=['--version']
try:
process = ProcessWrapper(command, arguments=arguments, environs=os.environ, workingDir=os.getcwd(), stdout='stdout.log', stderr='stderr.log', state=FOREGROUND, timeout=5.0)
process.start()
except ProcessTimeout:
print "Process timeout"
process.stop()
It's at SourceForge (http://sourceforge.net/projects/pysys/files/ and http://pysys.sourceforge.net/) if of interest.
I would like to repeatedly execute a subprocess as fast as possible. However, sometimes the process will take too long, so I want to kill it.
I use signal.signal(...) like below:
ppid=pipeexe.pid
signal.signal(signal.SIGALRM, stop_handler)
signal.alarm(1)
.....
def stop_handler(signal, frame):
print 'Stop test'+testdir+'for time out'
if(pipeexe.poll()==None and hasattr(signal, "SIGKILL")):
os.kill(ppid, signal.SIGKILL)
return False
but sometimes this code will stop the next round from executing:
Stop test/home/lu/workspace/152/treefit/test2for time out
/bin/sh: /home/lu/workspace/153/squib_driver: not found ---this is the next execution; the program wrongly stops it.
Does anyone know how to solve this? I want to stop the subprocess when the timeout is reached, but not force every run to take a whole second: time.sleep(n) always waits n seconds, and I want a command to be able to finish in less than 1 second.
You could do something like this:
import subprocess as sub
import threading
class RunCmd(threading.Thread):
def __init__(self, cmd, timeout):
threading.Thread.__init__(self)
self.cmd = cmd
self.timeout = timeout
def run(self):
self.p = sub.Popen(self.cmd)
self.p.wait()
def Run(self):
self.start()
self.join(self.timeout)
if self.is_alive():
self.p.terminate() #use self.p.kill() if process needs a kill -9
self.join()
RunCmd(["./someProg", "arg1"], 60).Run()
The idea is that you create a thread that runs the command, and kill it if the timeout exceeds some suitable value - in this case, 60 seconds.
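As a side note (not part of the answer above), on Python 3.5+ subprocess.run() has a built-in timeout that covers the simple case without an extra thread; a minimal sketch:

import subprocess

try:
    # run() kills the child and raises TimeoutExpired if the timeout elapses
    result = subprocess.run(["./someProg", "arg1"], timeout=60)
    print(result.returncode)
except subprocess.TimeoutExpired:
    print("Timed out")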
Here is something I wrote as a watchdog for subprocess execution. I use it now a lot, but I'm not so experienced so maybe there are some flaws in it:
import subprocess
import time
def subprocess_execute(command, time_out=60):
"""executing the command with a watchdog"""
# launching the command
c = subprocess.Popen(command)
# now waiting for the command to complete
t = 0
while t < time_out and c.poll() is None:
time.sleep(1) # (comment 1)
t += 1
# there are two possibilities for the while to have stopped:
if c.poll() is None:
# in the case the process did not complete, we kill it
c.terminate()
# and fill the return code with some error value
returncode = -1 # (comment 2)
else:
# in the case the process completed normally
returncode = c.poll()
return returncode
Usage:
returncode = subprocess_execute(['java', '-jar', 'some.jar'])
Comments:
1. Here the watchdog timeout is in seconds, but it is easy to change to whatever is needed by adjusting the time.sleep() value; the time_out parameter will have to be documented accordingly.
2. Depending on what is needed, it may be more suitable to raise some exception here instead of returning -1.
Documentation: I struggled a bit with the documentation of the subprocess module before understanding that subprocess.Popen is not blocking; the process is executed in parallel (maybe I am not using the correct word here, but I think it's understandable).
But since what I wrote runs linearly, I really have to wait for the command to complete, with a timeout so that bugs in the command don't stall the nightly execution of the script.
I guess this is a common synchronization problem in event-oriented programming with threads and processes.
If you should always have only one subprocess running, make sure the current subprocess is killed before running the next one. Otherwise the signal handler may get a reference to the last subprocess run and ignore the older one.
Suppose subprocess A is running. Before the alarm signal is handled, subprocess B is launched. Just after that, your alarm signal handler attempts to kill a subprocess. As the current PID (or the current subprocess pipe object) was set to B's when launching the subprocess, B gets killed and A keeps running.
Is my guess correct?
To make your code easier to understand, I would put the part that creates a new subprocess right after the part that kills the current one. That would make it clear that there is only one subprocess running at any time. The signal handler could do both the killing and the launching, as if it were the iteration block of a loop - in this case event-driven, with the alarm signal firing every second.
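A rough sketch of that restructuring, with a hypothetical commands list, just to illustrate the idea:

import signal
import subprocess
import time

commands = [['cmd_one'], ['cmd_two'], ['cmd_three']]   # hypothetical commands
current = None                                         # subprocess currently running

def on_alarm(signum, frame):
    """Kill the running command if it is still alive, then launch the next one."""
    global current
    if current is not None and current.poll() is None:
        current.kill()
    if commands:
        current = subprocess.Popen(commands.pop(0))
        signal.alarm(1)          # give the new command at most one second

signal.signal(signal.SIGALRM, on_alarm)
on_alarm(signal.SIGALRM, None)   # launch the first command immediately

# keep the main thread alive while the handler drives the iterations
while commands or (current is not None and current.poll() is None):
    time.sleep(0.1)
signal.alarm(0)                  # cancel any pending alarm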
Here's what I use:
import os
import signal
import subprocess
import threading

class KillerThread(threading.Thread):
    def __init__(self, pid, timeout, event):
        threading.Thread.__init__(self)
        self.pid = pid
        self.timeout = timeout
        self.event = event
        self.daemon = True

    def run(self):
        # wait until either the process finishes (event is set) or the timeout expires
        self.event.wait(self.timeout)
        if not self.event.is_set():
            try:
                os.kill(self.pid, signal.SIGKILL)
            except OSError:
                # This is raised if the process has already completed
                pass

def runTimed(dt, dir, args, kwargs):
    event = threading.Event()
    cwd = os.getcwd()
    os.chdir(dir)
    proc = subprocess.Popen(args, **kwargs)
    os.chdir(cwd)
    killer = KillerThread(proc.pid, dt, event)
    killer.start()
    (stdout, stderr) = proc.communicate()
    event.set()
    return (stdout, stderr, proc.returncode)
A bit more complex, I added an answer to solve a similar problem: Capturing stdout, feeding stdin, and being able to terminate after some time of inactivity and/or after some overall runtime.