Using join() on Processes created using multiprocessing in Python

I am using the multiprocessing module's Process class to spawn multiple processes; those processes execute some script and then die. I wanted a timeout applied to each process, so that a process dies if it cannot finish within the timeout. I am using join(timeout) on the Process objects.
Since join() doesn't kill the process, it only blocks the calling process until the child finishes.
Now my question: are there any side effects of using join() with a timeout? For example, will the processes be cleaned up automatically after the main process dies, or do I have to kill them manually?
I am a newbie to Python and its multiprocessing module, so please be patient.
My code, which creates the Processes in a loop:
from multiprocessing import Process, Queue  # imports implied by the snippet

q = Queue()
jobs = [
    Process(
        target=get_current_value,
        args=(q,),
        kwargs={
            'device': device,
            'service_list': service_list,
            'data_source_list': data_source_list
        }
    ) for device in device_list
]
for j in jobs:
    j.start()
for k in jobs:
    k.join()

The timeout argument just tells join how long to wait for the Process to exit before giving up. If timeout expires, the Process does not exit; the join call simply unblocks. If you want to end your workers when the timeout expires, you need to do so manually. You can either use terminate, as suggested by wRAR, to uncleanly shut things down, or use some other signaling mechanism to tell the children to shutdown cleanly:
p = Process(target=worker, args=(queue,))
p.start()
p.join(50)
if p.is_alive():   # join timed out without the process actually finishing
    p.terminate()  # unclean shutdown
If you don't want to use terminate, the alternative approach is really dependent on what the workers are doing. If they're consuming from a queue, you can use a sentinel:
import multiprocessing

def worker(queue):
    for item in iter(queue.get, None):  # a None item breaks the loop
        pass  # Do normal work

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    # Do normal work here
    # Time to shut down
    queue.put(None)
Or you could use an Event, if you're doing some other operation in a loop:
import multiprocessing

def worker(event):
    while not event.is_set():
        pass  # Do work here

if __name__ == "__main__":
    event = multiprocessing.Event()
    p = multiprocessing.Process(target=worker, args=(event,))
    p.start()
    # Do normal work here
    # Time to shut down
    event.set()
Using terminate could be just fine, though, unless your child processes are using resources that could be corrupted if the process is unexpectedly shut down (like writing to a file or db, or holding a lock). If you're just doing some calculations in the worker, using terminate won't hurt anything.

join() does nothing to the child process; it only waits for it. If you really want to terminate a worker process in a non-clean manner you should use terminate() (and you should understand the consequences).
If you want the children to be terminated when the main process exits, you should set the daemon attribute on them.
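A minimal sketch of the daemon approach (the worker function here is a placeholder, not from the question): daemonic children are terminated automatically when the parent process exits.
from multiprocessing import Process
import time

def work():
    while True:
        time.sleep(1)  # placeholder for real work

if __name__ == "__main__":
    p = Process(target=work)
    p.daemon = True  # must be set before start()
    p.start()
    # do the main process's work here; when this process exits,
    # the daemonic child is terminated automatically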

Related

Python, killing a process does not kill its child daemon processes

I have a system implemented in Python, with a main script, launcher.py, that I use to create and run a bunch of inter-communicating child processes.
launcher.py
import sys                                    # imports implied by the snippet
import multiprocessing as mp
from multiprocessing import set_start_method

def main():
    set_start_method("spawn")
    ev_killing_switch = mp.Event()

    ## -----------------------
    ## multiprocessing
    ## -----------------------
    ## child_01
    child_01 = mp.Process(target=Child_01_Initializer, args=(...), daemon=True)
    child_01.start()
    ## child_02
    child_02 = mp.Process(target=Child_02_Initializer, args=(ev_killing_switch, ...), daemon=True)
    child_02.start()
    ## child_03
    child_03 = mp.Process(target=Child_03_Initializer, args=(...), daemon=True)
    child_03.start()
    ## child_04
    child_04 = mp.Process(target=Child_04_Initializer, args=(...), daemon=True)
    child_04.start()

    ev_killing_switch.wait()
    print("System stopped, terminating all processes.")
    sys.exit(0)
I have an Event that can be set in one of the child processes; when it is set, the main process terminates, and all the daemon child processes are closed as well.
However, I need to terminate the main process (and, consequently, the children) externally, using a shell script. I tried retrieving the PID of the main process and killing it, but that way no cleanup function is called by the main process, which leaves its children running.
How can I fix this? One idea would be to retrieve the PIDs of the children as well and kill them too; is there another possibility?
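One possibility (a sketch of one approach, not from the original post): trap SIGTERM in launcher.py and set the same shutdown Event, so an external kill <main_pid> from a shell script goes through the exact cleanup path an internal shutdown uses. The handler name and placement are assumptions.
import signal
import sys
import multiprocessing as mp
from multiprocessing import set_start_method

def main():
    set_start_method("spawn")
    ev_killing_switch = mp.Event()

    # Assumed addition: translate an external SIGTERM into the existing Event.
    def handle_sigterm(signum, frame):
        ev_killing_switch.set()

    signal.signal(signal.SIGTERM, handle_sigterm)

    # ... start child_01 .. child_04 exactly as in launcher.py above ...

    ev_killing_switch.wait()
    print("System stopped, terminating all processes.")
    sys.exit(0)  # daemonic children are terminated on normal interpreter exit

if __name__ == "__main__":
    main()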

Multiprocessing subprocesses randomly receive SIGTERMs

I'm fiddling with multiprocessing and signal.
I'm creating a pool and having the workers catch SIGTERMs.
For no apparent reason, I observe that the subprocesses randomly receive SIGTERMs.
Here is a MWE:
import multiprocessing as mp
import signal
import os
import time

def start_process():
    print("Starting process #{}".format(os.getpid()))

def sigterm_handler(signo, _frame):
    print("Process #{} received a SIGTERM".format(os.getpid()))

def worker(i):
    time.sleep(1)
    signal.signal(signal.SIGTERM, sigterm_handler)

while True:
    with mp.Pool(initializer=start_process) as pool:
        pool.map(worker, range(10))
    print("Job done.")  # matches the "Job done." lines in the output below
    time.sleep(2)
Output:
Starting process #7735
Starting process #7736
Starting process #7737
Starting process #7738
Starting process #7739
Starting process #7740
Starting process #7741
Starting process #7742
Job done.
Starting process #7746
Starting process #7747
Starting process #7748
Starting process #7749
Starting process #7750
Starting process #7751
Starting process #7752
Starting process #7753
Process #7748 received a SIGTERM
Process #7746 received a SIGTERM
Job done.
Starting process #7757
Starting process #7758
Starting process #7759
Starting process #7760
Starting process #7761
Starting process #7762
Starting process #7763
Starting process #7764
As you can see, that looks unpredictable.
So, where do these SIGTERMs come from?
Is this normal?
Am I guaranteed that the workers will finish their job?
And in the end, is it OK to have the subprocesses capture SIGTERMs?
It's normal and can happen while your pool is executing __exit__ upon leaving the context-manager.
Since the workers have finished their jobs at that point, there's nothing to worry about.
The pool itself sends the SIGTERM to workers which don't have an exitcode available when the pool checks for it. This happens in the Pool._terminate_pool method (Python 3.7.1):
# Terminate workers which haven't already finished.
if pool and hasattr(pool[0], 'terminate'):
    util.debug('terminating workers')
    for p in pool:
        if p.exitcode is None:
            p.terminate()
The pool-workers will get joined a few lines later:
if pool and hasattr(pool[0], 'terminate'):
    util.debug('joining pool workers')
    for p in pool:
        if p.is_alive():
            # worker has not yet exited
            util.debug('cleaning up worker %d' % p.pid)
            p.join()
In a scenario where you call pool.terminate() explicitly while your workers are still running (for example, you use pool.map_async and then pool.terminate()), your application would deadlock waiting on that p.join(), unless you let your sigterm_handler eventually call sys.exit().
Better not to mess with signal handlers if you don't have to.
I think it is normal, but I can't say anything about the random message printing. You can get more info by inserting this in the main part (it also needs import logging):
mp.log_to_stderr(logging.DEBUG)
and changing start_process():
def start_process():
    proc = mp.current_process()
    print("Starting process #{}, its name is {}".format(os.getpid(), proc.name))

Weird behaviour with threads and processes mixing

I'm running the following python code:
import threading
import multiprocessing

def forever_print():
    while True:
        print("")

def main():
    t = threading.Thread(target=forever_print)
    t.start()
    return

if __name__ == '__main__':
    p = multiprocessing.Process(target=main)
    p.start()
    p.join()
    print("main process on control")
It terminates.
When I unwrapped main from the new process, and just ran it directly, like this:
if __name__ == '__main__':
    main()
The script went on forever, as I thought it should. Am I wrong to assume that, given that t is a non-daemon thread, p shouldn't halt in the first case?
I basically set up this little test because I've been developing an app in which threads are spawned inside subprocesses, and it's been showing some weird behaviour (sometimes it terminates properly, sometimes it doesn't). I guess what I wanted to know, in a broader sense, is whether there is some sort of "gotcha" when mixing these two Python libs.
My running environment: Python 2.7 on Ubuntu 14.04 LTS.
For now, threads created by multiprocessing worker processes act like daemon threads with respect to process termination: the worker process exits without waiting for the threads it created to terminate. This is because worker processes shut down via os._exit(), which skips most normal shutdown processing, and in particular skips the normal exit processing (sys.exit()) that joins non-daemon threading.Thread instances.
The easiest workaround is for worker processes to explicitly .join() the non-daemon threads they create.
There's an open bug report about this behavior, but it hasn't made much progress: http://bugs.python.org/issue18966
You need to call t.join() in your main function.
As your main function returns, the process gets terminated with both its threads.
p.join() blocks the main thread, waiting for the spawned process to end. Your spawned process then creates a thread but does not wait for it to end; it returns immediately, thus trashing the thread itself.
While threads share memory, processes don't. Therefore, the thread you create in the newly spawned process remains confined to that process; the parent process is not aware of it.
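Applied to the example above, that fix amounts to joining the thread inside main(); here it blocks forever, which is the behaviour the asker expected:
import threading
import multiprocessing

def forever_print():
    while True:
        print("")

def main():
    t = threading.Thread(target=forever_print)
    t.start()
    t.join()  # wait for the thread, so os._exit() is not reached while it runs

if __name__ == '__main__':
    p = multiprocessing.Process(target=main)
    p.start()
    p.join()
    print("main process on control")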
The gotcha is that the multiprocessing machinery calls os._exit() after your target function exits, which violently kills the child process, even if it has background threads running.
The code for Process.start() looks like this:
def start(self):
    '''
    Start child process
    '''
    assert self._popen is None, 'cannot start a process twice'
    assert self._parent_pid == os.getpid(), \
           'can only start a process object created by current process'
    assert not _current_process._daemonic, \
           'daemonic processes are not allowed to have children'
    _cleanup()
    if self._Popen is not None:
        Popen = self._Popen
    else:
        from .forking import Popen
    self._popen = Popen(self)
    _current_process._children.add(self)
Popen.__init__ looks like this:
def __init__(self, process_obj):
    sys.stdout.flush()
    sys.stderr.flush()
    self.returncode = None
    self.pid = os.fork()  # This forks a new process
    if self.pid == 0:  # This if block runs in the new process
        if 'random' in sys.modules:
            import random
            random.seed()
        code = process_obj._bootstrap()  # This calls your target function
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(code)  # Violent death of the child process happens here
The _bootstrap method is the one that actually executes the target function you passed to the Process object; in your case, that's main. main returns right after you start your background thread, even though that non-daemon thread is still running, which would ordinarily keep a process alive.
However, as soon as execution hits os._exit(code), the child process is killed, regardless of any non-daemon threads still executing.
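A tiny standalone demonstration of that point (not from the answer): os._exit() ends the process immediately, so a running non-daemon thread never gets to finish.
import os
import threading
import time

def background():
    time.sleep(1)
    print("you will never see this")  # os._exit() fires long before this line

t = threading.Thread(target=background)  # non-daemon thread
t.start()
os._exit(0)  # immediate process death: no atexit hooks, no waiting for threads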

Script using multiprocessing module does not terminate

The following code does not print "here". What is the problem?
I tested it on both of my machines (Windows 7, Ubuntu 12.10) and on
http://www.compileonline.com/execute_python_online.php
It does not print "here" in any case.
from multiprocessing import Queue, Process

def runLang(que):
    print "start"
    myDict = dict()
    for i in xrange(10000):
        myDict[i] = i
    que.put(myDict)
    print "finish"

def run(fileToAnalyze):
    que = Queue()
    processList = []
    dicList = []
    langs = ["chi", "eng"]
    for lang in langs:
        p = Process(target=runLang, args=(que,))
        processList.append(p)
        p.start()

    for p1 in processList:
        p1.join()

    print "here"

    for _ in xrange(len(langs)):
        item = que.get()
        print item
        dicList.append(item)

if __name__ == "__main__":
    processList = []
    for fileToAnalyse in ["abc.txt", "def.txt"]:
        p = Process(target=run, args=(fileToAnalyse,))
        processList.append(p)
        p.start()

    for p1 in processList:
        p1.join()
This is because when you put lots of items into a multiprocessing.Queue, they eventually get buffered in memory once the underlying Pipe is full. The buffer won't get flushed until something starts reading from the other end of the Queue, which allows the Pipe to accept more data. A Process cannot terminate until the buffers for all of its Queue instances have been entirely flushed to their underlying Pipes. The implication is that if you try to join a process without another process/thread calling get on its Queue, you could deadlock. This is mentioned in the docs:
Warning
As mentioned above, if a child process has put items on a queue (and
it has not used JoinableQueue.cancel_join_thread), then that process
will not terminate until all buffered items have been flushed to the
pipe.
This means that if you try joining that process you may get a deadlock
unless you are sure that all items which have been put on the queue
have been consumed. Similarly, if the child process is non-daemonic
then the parent process may hang on exit when it tries to join all its
non-daemonic children.
Note that a queue created using a manager does not have this issue.
You can fix the issue by not calling join until after you empty the Queue in the parent:
for _ in xrange(len(langs)):
    item = que.get()
    print(item)
    dicList.append(item)

# join after emptying the queue.
for p in processList:
    p.join()

print("here")

Blocking subprocess function in Python?

I had an asynchronous function being called like this:
from multiprocessing import Process

def my_function(arg1, arg2):
    print 'Long process begins'

p = Process(target=my_function, args=(arg1, arg2,)).start()
How can I make this blocking? I need to finish the process before running the rest of the script.
Use p.join()
Block the calling thread until the process whose join() method is
called terminates or until the optional timeout occurs.
If timeout is None then there is no timeout.
A process can be joined many times.
A process cannot join itself because this would cause a deadlock. It
is an error to attempt to join a process before it has been started.
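Applied to the snippet above (a sketch; arg1 and arg2 are placeholders, and note that start() returns None, so keep the Process object and call start() and join() on it separately):
from multiprocessing import Process

def my_function(arg1, arg2):
    print('Long process begins')

if __name__ == '__main__':
    arg1, arg2 = 1, 2                  # placeholder arguments
    p = Process(target=my_function, args=(arg1, arg2))
    p.start()
    p.join()                           # blocks until my_function has finished
    print('The rest of the script runs only after the process has exited')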
