Python: killing a process does not kill its child daemon processes

I have a system implemented in Python, with a main script, launcher.py, that I use to create and run a bunch of inter-communicating child processes.
launcher.py:

import sys
import multiprocessing as mp
from multiprocessing import set_start_method

def main():
    set_start_method("spawn")
    ev_killing_switch = mp.Event()

    ## -----------------------
    ## multiprocessing
    ## -----------------------
    ## child_01
    child_01 = mp.Process(target=Child_01_Initializer, args=(...), daemon=True)
    child_01.start()
    ## child_02
    child_02 = mp.Process(target=Child_02_Initializer, args=(ev_killing_switch, ...), daemon=True)
    child_02.start()
    ## child_03
    child_03 = mp.Process(target=Child_03_Initializer, args=(...), daemon=True)
    child_03.start()
    ## child_04
    child_04 = mp.Process(target=Child_04_Initializer, args=(...), daemon=True)
    child_04.start()

    ev_killing_switch.wait()
    print("System stopped, terminating all processes.")
    sys.exit(0)
I have an Event that can be set in one of the child processes; when it is set, the main process terminates, and all the daemon children are closed as well.
However, I also need to terminate the main process (and, consequently, the children) externally, via a shell script. I tried retrieving the PID of the main process and killing it, but that way no cleanup function runs in the main process, which leaves its children running.
How can I fix this? One idea would be to retrieve the PIDs of the children as well and kill them too; is there another possibility?
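One way to make the external kill behave like the internal switch (an editorial sketch, not from the thread; `worker` is a placeholder for the real child initializers): install a SIGTERM handler in launcher.py that sets the same event, so a plain `kill <main_pid>` from the shell script triggers the normal shutdown path.

```python
import os
import signal
import multiprocessing as mp

def worker(ev):
    # Stand-in for one of the real child processes; blocks until told to stop.
    ev.wait()

def main():
    ev_stop = mp.Event()

    # Trap SIGTERM so an external `kill <main_pid>` behaves like the
    # in-process killing switch: the handler sets the event, wait()
    # returns, and main() exits normally, at which point multiprocessing
    # terminates the daemon children during interpreter shutdown.
    signal.signal(signal.SIGTERM, lambda signum, frame: ev_stop.set())

    child = mp.Process(target=worker, args=(ev_stop,), daemon=True)
    child.start()

    # Simulate the external shell script sending SIGTERM to the main process.
    os.kill(os.getpid(), signal.SIGTERM)

    ev_stop.wait()
    print("System stopped, exiting cleanly.")
    return True

if __name__ == "__main__":
    main()
```

The same handler can be registered for SIGINT as well, so Ctrl-C takes the clean path too.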

Related

Python Kill all subprocesses if one of them is finished

I have a Python script that runs other scripts in multiple instances using subprocess.Popen and waits for them to finish with Popen.wait().
Everything works fine; however, I want to kill all subprocesses if one of them terminates. Here is the code I use to run the multiple instances with the subprocess package:
import ctypes
import os
import signal
import subprocess

libc = ctypes.CDLL("libc.so.6")

def set_pdeathsig(sig=signal.SIGTERM):
    def callable():
        return libc.prctl(1, sig)
    return callable

if __name__ == "__main__":
    procs = []
    for i in range((os.cpu_count() * 2) - 1):
        proc = subprocess.Popen(['python', "pythonscript_i_need_to_run/"], preexec_fn=set_pdeathsig(signal.SIGTERM))
        procs.append(proc)
    procs.append(subprocess.Popen(["python", "other_pythonscript_i_need_to_run"], preexec_fn=set_pdeathsig(signal.SIGTERM)))
    for proc in procs:
        proc.wait()
The set_pdeathsig function is for killing the children if the parent is killed. Long story short: I need to kill all children if one of them dies. How can I do that?
*** NOTE ***
When I try to kill the parent when one child is dead with os.kill(os.getppid(), signal.SIGTERM), it doesn't kill the original parent script. I also tried killing by process-group ID, but that didn't work either.
Unix and Unix-like operating systems have a SIGCHLD signal, which is sent by the OS kernel to a parent process when one of its child processes terminates. If you have no handler for this signal, SIGCHLD is ignored by default. But if you register a handler function, you tell the kernel: "hey, I have a handler function; when a child process terminates, please run it."
In your case you have many child processes; if one of them is killed, or finishes its execution (via the exit() syscall), the kernel sends a SIGCHLD signal to the parent process, which is your shared code.
We have a handler for the SIGCHLD signal, the chld_handler() function. When one of the child processes terminates, SIGCHLD is sent to the parent process and chld_handler is triggered by the OS kernel. (This is called catching a signal.)
With signal.signal(signal.SIGCHLD, chld_handler) we tell the kernel, "I have a handler function for SIGCHLD; don't ignore it when a child terminates." Inside chld_handler, which runs when SIGCHLD is delivered, we call signal.signal(signal.SIGCHLD, signal.SIG_IGN), telling the kernel, "I no longer have a handler; ignore SIGCHLD." We do that because we don't need the signal anymore: we are about to kill the other children ourselves by looping over procs and calling p.terminate().
The complete code would look like this:

import ctypes
import os
import signal
import subprocess

libc = ctypes.CDLL("libc.so.6")

def set_pdeathsig(sig=signal.SIGTERM):
    def callable():
        return libc.prctl(1, sig)
    return callable

def chld_handler(sig, frame):
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    print("one of the children died")
    for p in procs:
        p.terminate()

signal.signal(signal.SIGCHLD, chld_handler)

if __name__ == "__main__":
    procs = []
    for i in range((os.cpu_count() * 2) - 1):
        proc = subprocess.Popen(['python', "pythonscript_i_need_to_run/"], preexec_fn=set_pdeathsig(signal.SIGTERM))
        procs.append(proc)
    procs.append(subprocess.Popen(["python", "other_pythonscript_i_need_to_run"], preexec_fn=set_pdeathsig(signal.SIGTERM)))
    for proc in procs:
        proc.wait()
There is much more detail to the SIGCHLD signal, the Python signal library, and zombie processes; I won't cover everything here because there are so many details and I'm not an expert in all of them.
I hope the information above gives you some insight. If you think I'm wrong somewhere, please correct me.
Signal delivery (in Python, that is, using user-defined signal.signal() handlers) is sometimes race-prone. It's easy to code a solution that works most of the time but may still miss a signal that arrives just before or just after you are prepared to deal with it.
(For reliable delivery as an I/O event, the venerable self-pipe trick may be implemented in python.)
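For reference, Python's stdlib exposes the self-pipe trick directly via signal.set_wakeup_fd(): the interpreter writes the signal number into the pipe, turning signal delivery into an I/O event you can select() on. A minimal sketch (POSIX only; SIGUSR1 chosen arbitrarily):

```python
import os
import signal

def demo_self_pipe():
    # Self-pipe trick: signal delivery becomes a readable byte on a pipe.
    r, w = os.pipe()
    os.set_blocking(w, False)           # the wakeup fd must be non-blocking
    old_fd = signal.set_wakeup_fd(w)
    # A handler must be installed (even a no-op one); otherwise the
    # signal's default action runs instead of a byte being written.
    signal.signal(signal.SIGUSR1, lambda signum, frame: None)

    os.kill(os.getpid(), signal.SIGUSR1)   # simulate a signal arriving
    data = os.read(r, 1)                   # pipe readable => a signal arrived
    signal.set_wakeup_fd(old_fd)
    return data == bytes([signal.SIGUSR1])

if __name__ == "__main__":
    print("signal observed via pipe:", demo_self_pipe())
```

In a real program you would pass the read end of the pipe to select/poll alongside your other file descriptors instead of calling os.read() directly.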
Signal acceptance is another approach, in which you SIG_BLOCK a signal to hold it pending when generated, and then accept it with the signal module's sigwait(), sigwaitinfo(), or sigtimedwait() when you're ready to do so. There's no chance of missing the signal here, but you must remember that basic UNIX signals do not queue up: only one signal of each type will be held pending for acceptance regardless of how many times that signal was generated.
For your problem, that would look something like this, assuming your implementation supports signal.pthread_sigmask():

def main():
    signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGCHLD])
    # ... launch children ...
    signal.sigwait([signal.SIGCHLD])
    # OK, at least one child terminated
    # ... terminate other children ...
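Fleshed out into a runnable sketch (Python 3, POSIX only; the two command lines are placeholders for the real child scripts):

```python
import signal
import subprocess
import sys

def main():
    # Block SIGCHLD *before* spawning, so a child that dies immediately
    # still leaves the signal pending for sigwait() to accept later.
    signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGCHLD])

    procs = [
        subprocess.Popen([sys.executable, "-c", "pass"]),  # exits at once
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"]),
    ]

    signal.sigwait([signal.SIGCHLD])  # accept the pending/next SIGCHLD
    print("at least one child terminated")

    # Terminate whichever children are still running, and reap them all.
    for p in procs:
        p.terminate()
        p.wait()
    return True

if __name__ == "__main__":
    main()
```

Because basic UNIX signals don't queue, one sigwait() only tells you that "at least one" child died; looping over procs and reaping each of them handles the case where several died at once.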

Killing parent process from a child process with Python on Linux

In my (very simplified) scenario, in Python 2.7, I have 2 processes:
Parent process, which does some tasks.
Child process, which needs to kill the parent process after X time.
Creation of child process:
killer = multiprocessing.Process(...)
killer.start()
The child process executes the following code after X time (a simplified version):

process = psutil.Process(parent_pid)
...
if time_elapsed:
    while True:
        process.kill()
        if not process.is_running():
            exit()
The problem is that it's leaving the parent as a zombie process, and the child is never exiting because the parent is still alive.
The same code works as expected in Windows.
All the solutions that I saw were talking about the parent process waiting for the child to finish by calling killer.join(), but in my case, the parent is the one who does the task, and it shouldn't wait for its child.
What is the best way to deal with a scenario like that?
You could use os.getppid() to retrieve the parent's PID, and kill it with os.kill().
E.g. os.kill(os.getppid(), signal.SIGKILL)
See https://docs.python.org/2/library/os.html and https://docs.python.org/2/library/signal.html#module-signal for reference.
A minimal working example:
Parent:
import subprocess32 as subprocess
subprocess.run(['python', 'ch.py'])
Child (ch.py):
import os
import signal
os.kill(os.getppid(), signal.SIGTERM)
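For the multiprocessing variant of this scenario, here is a self-contained sketch (Python 3, POSIX only; the parent program is hypothetical and written to a temp file so its exit status can be observed): the child sleeps for its timeout and then SIGTERMs the parent, and the parent's negative return code shows it was killed by that signal.

```python
import os
import signal
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical parent program, written to a temp file so we can run it
# and observe how it exits.
PARENT_SRC = textwrap.dedent("""
    import multiprocessing, os, signal, time

    def killer(parent_pid, delay):
        time.sleep(delay)                    # "after X time"
        os.kill(parent_pid, signal.SIGTERM)  # default action terminates parent

    if __name__ == "__main__":
        p = multiprocessing.Process(target=killer, args=(os.getpid(), 0.5), daemon=True)
        p.start()
        while True:          # the parent's "task"
            time.sleep(0.1)
""")

def demo():
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(PARENT_SRC)
        path = f.name
    try:
        # run() waits for (and reaps) the parent, so no zombie is left behind.
        return subprocess.run([sys.executable, path], timeout=30).returncode
    finally:
        os.unlink(path)

if __name__ == "__main__":
    # On POSIX, a negative return code -N means "killed by signal N".
    print("parent exit code:", demo())
```

Note the zombie from the original question disappears here only because something (subprocess.run) eventually waits on the dead parent; whoever launched the parent still has to reap it.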

Weird behaviour with threads and processes mixing

I'm running the following python code:
import threading
import multiprocessing

def forever_print():
    while True:
        print("")

def main():
    t = threading.Thread(target=forever_print)
    t.start()
    return

if __name__ == '__main__':
    p = multiprocessing.Process(target=main)
    p.start()
    p.join()
    print("main process on control")
It terminates.
When I unwrapped main from the new process and just ran it directly, like this:

if __name__ == '__main__':
    main()

the script went on forever, as I thought it should. Am I wrong to assume that, given that t is a non-daemon thread, p shouldn't halt in the first case?
I basically set up this little test because I've been developing an app in which threads are spawned inside subprocesses, and it's been showing some weird behaviour (sometimes it terminates properly, sometimes it doesn't). What I want to know, in a broader sense, is whether there is some sort of "gotcha" when mixing these two Python libs.
My running environment: Python 2.7 on Ubuntu 14.04 LTS
For now, threads created by multiprocessing worker processes act like daemon threads with respect to process termination: the worker process exits without waiting for the threads it created to terminate. This is due to worker processes using os._exit() to shut down, which skips most normal shutdown processing (and in particular skips the normal exit processing code (sys.exit()) that .join()'s non-daemon threading.Threads).
The easiest workaround is for worker processes to explicitly .join() the non-daemon threads they create.
There's an open bug report about this behavior, but it hasn't made much progress: http://bugs.python.org/issue18966
You need to call t.join() in your main function.
As your main function returns, the process gets terminated with both its threads.
p.join() blocks the main thread, waiting for the spawned process to end. Your spawned process then creates a thread but does not wait for it to end; it returns immediately, thus trashing the thread itself.
Threads share memory; processes don't. Therefore, the thread you create in the newly spawned process remains relegated to that process, and the parent process is not aware of it.
The gotcha is that the multiprocessing machinery calls os._exit() after your target function exits, which violently kills the child process, even if it has background threads running.
The code for Process.start() looks like this:
def start(self):
    '''
    Start child process
    '''
    assert self._popen is None, 'cannot start a process twice'
    assert self._parent_pid == os.getpid(), \
           'can only start a process object created by current process'
    assert not _current_process._daemonic, \
           'daemonic processes are not allowed to have children'
    _cleanup()
    if self._Popen is not None:
        Popen = self._Popen
    else:
        from .forking import Popen
    self._popen = Popen(self)
    _current_process._children.add(self)
Popen.__init__ looks like this:
def __init__(self, process_obj):
    sys.stdout.flush()
    sys.stderr.flush()
    self.returncode = None
    self.pid = os.fork()  # This forks a new process
    if self.pid == 0:  # This if block runs in the new process
        if 'random' in sys.modules:
            import random
            random.seed()
        code = process_obj._bootstrap()  # This calls your target function
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(code)  # Violent death of the child process happens here
The _bootstrap method is the one that actually executes the target function you passed to the Process object; in your case, that's main. main returns right after you start your background thread, even though ordinarily the still-running non-daemon thread would keep the process alive.
However, as soon as execution hits os._exit(code), the child process is killed, regardless of any non-daemon threads still executing.
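The workaround from the answers above, as a runnable sketch (Python 3 syntax; short_task is a stand-in for the original forever_print, shortened so it can actually finish):

```python
import multiprocessing
import threading
import time

def short_task():
    time.sleep(0.2)
    print("thread finished")

def main():
    t = threading.Thread(target=short_task)
    t.start()
    # Joining before returning keeps the worker process alive until the
    # thread is done; without this, os._exit() kills the thread mid-run.
    t.join()
    return t

if __name__ == "__main__":
    p = multiprocessing.Process(target=main)
    p.start()
    p.join()
    print("main process in control")
```

With the t.join() in place, "thread finished" is printed by the worker before the worker's target returns and os._exit() runs.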

Why doesn't the daemon program exit without join()

The answer might be right in front of me on the link below but I still don't understand. I'm sure after someone explains this to me, Darwin will be making a call to me.
The example is at this link here, although I've made some changes to try to experiment and help my understanding.
Here's the code:
import multiprocessing
import time
import sys

def daemon():
    p = multiprocessing.current_process()
    print 'Starting:', p.name, p.pid
    sys.stdout.flush()
    time.sleep(2)
    print 'Exiting :', p.name, p.pid
    sys.stdout.flush()

def non_daemon():
    p = multiprocessing.current_process()
    print 'Starting:', p.name, p.pid
    sys.stdout.flush()
    time.sleep(6)
    print 'Exiting :', p.name, p.pid
    sys.stdout.flush()

if __name__ == '__main__':
    d = multiprocessing.Process(name='daemon', target=daemon)
    d.daemon = True

    n = multiprocessing.Process(name='non-daemon', target=non_daemon)
    n.daemon = False

    d.start()
    time.sleep(1)
    n.start()
    # d.join()
And the output of the code is:
Starting: daemon 6173
Starting: non-daemon 6174
Exiting: non-daemon 6174
If the join() at the end is uncommented, then the output is:
Starting: daemon 6247
Starting: non-daemon 6248
Exiting: daemon 6247
Exiting: non-daemon 6248
I'm confused because the daemon's sleep is 2 seconds, whereas the non-daemon's is 6 seconds. Why doesn't it print the "Exiting" message in the first case? The daemon should have woken up before the non-daemon and printed its message.
The explanation from the site is as such:
The output does not include the “Exiting” message from the daemon
process, since all of the non-daemon processes (including the main
program) exit before the daemon process wakes up from its 2 second
sleep.
but I changed it such that the daemon should have woken up before the non-daemon does. What am I missing here? Thanks in advance for your help.
EDIT: Forgot to mention I'm using python 2.7 but apparently this problem is also in python 3.x
This was a fun one to track down. The docs are somewhat misleading, in that they describe the non-daemon processes as if they are all equivalent; the existence of any non-daemon process means the process "family" is alive. But that's not how it's implemented. The parent process is "more equal" than others; multiprocessing registers an atexit handler that does the following:
for p in active_children():
    if p.daemon:
        info('calling terminate() for daemon %s', p.name)
        p._popen.terminate()

for p in active_children():
    info('calling join() for process %s', p.name)
    p.join()
So when the main process finishes, it first terminates all daemon child processes, then joins all child processes to wait on non-daemon children and clean up resources from daemon children.
Because it performs cleanup in this order, a moment after your non-daemon Process starts, the main process begins cleanup and forcibly terminates the daemon Process.
Note that fixing this can be as simple as manually joining the non-daemon process, rather than joining the daemon process (which would defeat the whole point of a daemon); that delays the atexit cleanup until the non-daemon child finishes, by which point the daemon child has already woken up and exited on its own.
It's arguably a bug (one that seems to exist up through 3.5.1; I reproduced it myself), but whether it's a behavior bug or a docs bug is arguable.
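A sketch of that fix with shortened sleeps (my own illustration; an Event stands in as proof that the daemon finished, since prints from child processes are easy to lose):

```python
import multiprocessing
import time

def daemon_task(done):
    time.sleep(0.2)
    done.set()  # proof that the daemon woke up and finished its work

def non_daemon_task():
    time.sleep(0.6)

def run():
    done = multiprocessing.Event()
    d = multiprocessing.Process(target=daemon_task, args=(done,), daemon=True)
    n = multiprocessing.Process(target=non_daemon_task)
    d.start()
    n.start()
    # Joining the *non-daemon* child keeps the main process alive past the
    # daemon's sleep, so the atexit handler can't terminate the daemon early.
    n.join()
    return done.is_set()

if __name__ == "__main__":
    print("daemon finished:", run())
```

Removing the n.join() reproduces the original behaviour: the main process reaches its atexit cleanup almost immediately and terminates the daemon before it wakes up.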

Using join() on Processes created using multiprocessing in python

I am using the multiprocessing module's Process class to spawn multiple processes; those processes execute some script and then die. What I want is a timeout applied to each process, so that a process is killed if it can't finish within the timeout. I am using join(timeout) on the Process objects.
The join() function doesn't kill the process; it just blocks until the process finishes.
Now my question: are there any side effects of using join() with a timeout? For example, will the processes be cleaned up automatically after the main process dies, or do I have to kill them manually?
I am a newbie to Python and its multiprocessing module, so please be patient.
My code, which creates the processes in a loop:

q = Queue()
jobs = [
    Process(
        target=get_current_value,
        args=(q,),
        kwargs={
            'device': device,
            'service_list': service_list,
            'data_source_list': data_source_list,
        },
    )
    for device in device_list
]

for j in jobs:
    j.start()
for k in jobs:
    k.join()
The timeout argument just tells join how long to wait for the Process to exit before giving up. If the timeout expires, the Process does not exit; the join call simply unblocks. If you want to end your workers when the timeout expires, you need to do so manually. You can either use terminate, as suggested by wRAR, to uncleanly shut things down, or use some other signaling mechanism to tell the children to shut down cleanly:

p = Process(target=worker, args=(queue,))
p.start()
p.join(50)
if p.is_alive():  # join timed out without the process actually finishing
    p.terminate()  # unclean shutdown
If you don't want to use terminate, the alternative approach really depends on what the workers are doing. If they're consuming from a queue, you can use a sentinel:

def worker(queue):
    for item in iter(queue.get, None):  # a None item breaks the loop
        pass  # Do normal work

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    # Do normal work here
    # Time to shut down
    queue.put(None)
Or you could use an Event, if you're doing some other operation in a loop:

def worker(event):
    while not event.is_set():
        pass  # Do work here

if __name__ == "__main__":
    event = multiprocessing.Event()
    p = multiprocessing.Process(target=worker, args=(event,))
    p.start()
    # Do normal work here
    # Time to shut down
    event.set()
Using terminate could be just fine, though, unless your child processes are using resources that could be corrupted if the process is unexpectedly shut down (like writing to a file or db, or holding a lock). If you're just doing some calculations in the worker, using terminate won't hurt anything.
join() does nothing to the child process. If you really want to terminate a worker process in a non-clean manner, you should use terminate() (and understand the consequences).
If you want the children to be terminated when the main process exits, you should set the daemon attribute on them.
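A minimal sketch of the daemon-attribute approach (worker is a placeholder loop standing in for the real script):

```python
import multiprocessing
import time

def worker():
    # Placeholder workload that never finishes on its own.
    while True:
        time.sleep(0.05)

def spawn_daemon():
    p = multiprocessing.Process(target=worker, daemon=True)
    p.start()
    return p

if __name__ == "__main__":
    p = spawn_daemon()
    print("daemon child pid:", p.pid)
    # No explicit cleanup needed: multiprocessing's atexit handler
    # terminates daemon children when the main process exits.
```

Note this only covers a *normal* exit of the main process; if the main process is killed with SIGKILL, the atexit handler never runs and the children are orphaned.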
