I have two Python scripts and I want them to communicate with each other. Specifically, I want script Communication.py to send an array to script Process.py when the latter requires it. I've used multiprocessing.Process and multiprocessing.Pipe to make it work. My code works, but I want to handle SIGINT and SIGTERM gracefully. I've tried the following, but it does not exit gracefully:
Process.py
from multiprocessing import Process, Pipe
from Communication import arraySender
import time
import signal
class GracefulKiller:
kill_now = False
def __init__(self):
signal.signal(signal.SIGINT, self.exit_gracefully)
signal.signal(signal.SIGTERM, self.exit_gracefully)
def exit_gracefully(self, *args):
self.kill_now = True
def main():
parent_conn, child_conn = Pipe()
p = Process(target=arraySender, args=(child_conn,True))
p.start()
print(parent_conn.recv())
if __name__ == '__main__':
killer = GracefulKiller()
while not killer.kill_now:
main()
Communication.py
import numpy
from multiprocessing import Process, Pipe
def arraySender(child_conn, sendData):
if sendData:
child_conn.send(numpy.random.randint(0, high=10, size=15, dtype=int))
child_conn.close()
What am I doing wrong?
I strongly suspect you are running this under Windows because I think the code you have should work under Linux. This is why it is important to always tag your questions concerning Python and multiprocessing with the actual platform you are on.
The problem appears to be that, in addition to your main process, the child process you create in function main is also receiving the signals. The usual solution would be to add calls like signal.signal(signal.SIGINT, signal.SIG_IGN) to your arraySender worker function. But there are two problems with this:
There is a race condition: the signal could be received by the child process before it has a chance to ignore signals.
Regardless, the call to ignore signals does not seem to work when you are using multiprocessing.Process (perhaps that class does its own signal handling that overrides these calls).
The solution is to create a multiprocessing pool and initialize each pool process to ignore signals before you submit any tasks. Another advantage of using a pool (here a pool size of 1 suffices, because you never have more than one task running at a time) is that the process is created only once and then reused.
As an aside, there is an inconsistency in your GracefulKiller class: you mix a class attribute kill_now with an instance attribute kill_now that gets created when you execute self.kill_now = True. So when the main process tests killer.kill_now, it accesses the class attribute until self.kill_now is set to True, at which point it accesses the instance attribute.
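A minimal, hypothetical illustration of that shadowing (class C below is not part of the question's code):
class C:
    flag = False              # class attribute

obj = C()
print(obj.flag)               # False -- falls back to the class attribute
obj.flag = True               # creates an instance attribute that shadows it
print(obj.flag, C.flag)       # True False -- the class attribute is untouched
With that in mind, here is the pool-based solution: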
from multiprocessing import Pool, Pipe
import time
import signal
import numpy
class GracefulKiller:
def __init__(self):
self.kill_now = False # Instance attribute
signal.signal(signal.SIGINT, self.exit_gracefully)
signal.signal(signal.SIGTERM, self.exit_gracefully)
def exit_gracefully(self, *args):
self.kill_now = True
def init_pool_processes():
signal.signal(signal.SIGINT, signal.SIG_IGN)
signal.signal(signal.SIGTERM, signal.SIG_IGN)
def arraySender(sendData):
if sendData:
return numpy.random.randint(0, high=10, size=15, dtype=int)
def main(pool):
result = pool.apply(arraySender, args=(True,))
print(result)
if __name__ == '__main__':
# Create pool with only 1 process:
pool = Pool(1, initializer=init_pool_processes)
killer = GracefulKiller()
while not killer.kill_now:
main(pool)
pool.close()
pool.join()
Ideally, GracefulKiller should be a singleton class so that, regardless of how many times GracefulKiller is instantiated by a process, signal.signal is called only once for each type of signal you want to handle:
class Singleton(type):
def __init__(self, *args, **kwargs):
self.__instance = None
super().__init__(*args, **kwargs)
def __call__(self, *args, **kwargs):
if self.__instance is None:
self.__instance = super().__call__(*args, **kwargs)
return self.__instance
class GracefulKiller(metaclass=Singleton):
def __init__(self):
self.kill_now = False # Instance attribute
signal.signal(signal.SIGINT, self.exit_gracefully)
signal.signal(signal.SIGTERM, self.exit_gracefully)
def exit_gracefully(self, *args):
self.kill_now = True
Related
What's the proper way to tell a looping thread to stop looping?
I have a fairly simple program that pings a specified host in a separate threading.Thread class. In this class it sleeps 60 seconds, then runs again until the application quits.
I'd like to implement a 'Stop' button in my wx.Frame to ask the looping thread to stop. It doesn't need to end the thread right away, it can just stop looping once it wakes up.
Here is my threading class (note: I haven't implemented looping yet, but it would likely fall under the run method in PingAssets)
class PingAssets(threading.Thread):
def __init__(self, threadNum, asset, window):
threading.Thread.__init__(self)
self.threadNum = threadNum
self.window = window
self.asset = asset
def run(self):
config = controller.getConfig()
fmt = config['timefmt']
start_time = datetime.now().strftime(fmt)
try:
if onlinecheck.check_status(self.asset):
status = "online"
else:
status = "offline"
except socket.gaierror:
status = "an invalid asset tag."
msg =("{}: {} is {}. \n".format(start_time, self.asset, status))
wx.CallAfter(self.window.Logger, msg)
And in my wxPython Frame I have this function, called from a Start button:
def CheckAsset(self, asset):
self.count += 1
thread = PingAssets(self.count, asset, self)
self.threads.append(thread)
thread.start()
Threaded stoppable function
Instead of subclassing threading.Thread, one can modify the function to allow
stopping by a flag.
We need an object, accessible to the running function, on which we set the flag to stop running.
We can use the threading.currentThread() object.
import threading
import time
def doit(arg):
t = threading.currentThread()
while getattr(t, "do_run", True):
print ("working on %s" % arg)
time.sleep(1)
print("Stopping as you wish.")
def main():
t = threading.Thread(target=doit, args=("task",))
t.start()
time.sleep(5)
t.do_run = False
if __name__ == "__main__":
main()
The trick is that additional properties can be attached to the running thread. The solution builds
on two assumptions:
the thread has a property "do_run" with default value True
the driving parent process can set the started thread's property "do_run" to False.
Running the code, we get following output:
$ python stopthread.py
working on task
working on task
working on task
working on task
working on task
Stopping as you wish.
Pill to kill - using Event
Another alternative is to use threading.Event as a function argument. It is
False by default, but an external process can "set it" (to True), and the function can
learn about it using the wait(timeout) function.
We can wait with a zero timeout, but we can also use it as the sleeping timer (as done below).
def doit(stop_event, arg):
while not stop_event.wait(1):
print ("working on %s" % arg)
print("Stopping as you wish.")
def main():
pill2kill = threading.Event()
t = threading.Thread(target=doit, args=(pill2kill, "task"))
t.start()
time.sleep(5)
pill2kill.set()
t.join()
Edit: I tried this in Python 3.6. Calling stop_event.wait() with no timeout blocks (and so does the while loop) until the event is released, rather than returning a boolean immediately. Using stop_event.is_set() works instead.
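For reference, a sketch of the variant that edit describes, polling is_set() and sleeping explicitly (an interpretation of the edit, not code from the original answer):
def doit(stop_event, arg):
    # poll the event instead of relying on wait()'s return value
    while not stop_event.is_set():
        print("working on %s" % arg)
        time.sleep(1)
    print("Stopping as you wish.")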
Stopping multiple threads with one pill
The advantage of the pill to kill is seen more clearly if we have to stop multiple threads
at once, as one pill works for all of them.
doit will not change at all; only main handles the threads a bit differently.
def main():
pill2kill = threading.Event()
tasks = ["task ONE", "task TWO", "task THREE"]
def thread_gen(pill2kill, tasks):
for task in tasks:
t = threading.Thread(target=doit, args=(pill2kill, task))
yield t
threads = list(thread_gen(pill2kill, tasks))
for thread in threads:
thread.start()
time.sleep(5)
pill2kill.set()
for thread in threads:
thread.join()
This has been asked before on Stack Overflow. See the following links:
Is there any way to kill a Thread in Python?
Stopping a thread after a certain amount of time
Basically you just need to set up the thread with a stop function that sets a sentinel value the thread will check. In your case, you'll have something in your loop check the sentinel value to see if it's changed and, if it has, the loop can break and the thread can die.
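A minimal sketch of that sentinel pattern (the class and attribute names are illustrative, not taken from the question's code):
import threading
import time

class Worker(threading.Thread):
    def __init__(self):
        super().__init__()
        self._stop_requested = False   # the sentinel value

    def run(self):
        while not self._stop_requested:
            # ... do one unit of work ...
            time.sleep(1)

    def stop(self):
        self._stop_requested = True    # the thread exits after its current iteration

w = Worker()
w.start()
time.sleep(3)
w.stop()
w.join()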
I read the other questions on Stack Overflow, but I was still a little confused about communicating across classes. Here is how I approached it:
I use a list to hold all my threads in the __init__ method of my wxFrame class: self.threads = []
As recommended in How to stop a looping thread in Python?, I use a signal in my thread class, which is set to True when initializing the threading class.
class PingAssets(threading.Thread):
def __init__(self, threadNum, asset, window):
threading.Thread.__init__(self)
self.threadNum = threadNum
self.window = window
self.asset = asset
self.signal = True
def run(self):
while self.signal:
do_stuff()
sleep()
and I can stop these threads by iterating over my threads:
def OnStop(self, e):
for t in self.threads:
t.signal = False
I had a different approach. I sub-classed the Thread class and created an Event object in the constructor. Then I wrote a custom join() method, which first sets this event and then calls the parent's version of itself.
Here is my class, which I'm using for serial port communication in a wxPython app:
import wx, threading, serial, Events, Queue
class PumpThread(threading.Thread):
def __init__ (self, port, queue, parent):
super(PumpThread, self).__init__()
self.port = port
self.queue = queue
self.parent = parent
self.serial = serial.Serial()
self.serial.port = self.port
self.serial.timeout = 0.5
self.serial.baudrate = 9600
self.serial.parity = 'N'
self.stopRequest = threading.Event()
def run (self):
try:
self.serial.open()
except Exception, ex:
print ("[ERROR]\tUnable to open port {}".format(self.port))
print ("[ERROR]\t{}\n\n{}".format(ex.message, ex.traceback))
self.stopRequest.set()
else:
print ("[INFO]\tListening port {}".format(self.port))
self.serial.write("FLOW?\r")
while not self.stopRequest.isSet():
msg = ''
if not self.queue.empty():
try:
command = self.queue.get()
self.serial.write(command)
except Queue.Empty:
continue
while self.serial.inWaiting():
char = self.serial.read(1)
if '\r' in char and len(msg) > 1:
char = ''
#~ print('[DATA]\t{}'.format(msg))
event = Events.PumpDataEvent(Events.SERIALRX, wx.ID_ANY, msg)
wx.PostEvent(self.parent, event)
msg = ''
break
msg += char
self.serial.close()
def join (self, timeout=None):
self.stopRequest.set()
super(PumpThread, self).join(timeout)
def SetPort (self, serial):
self.serial = serial
def Write (self, msg):
if self.serial.is_open:
self.queue.put(msg)
else:
print("[ERROR]\tPort {} is not open!".format(self.port))
def Stop(self):
if self.isAlive():
self.join()
The Queue is used for sending messages to the port, and the main loop takes responses back. I did not use the serial.readline() method because of differing end-of-line characters, and I found the use of the io classes to be too much fuss.
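For illustration, a hypothetical way to use this class from the wx frame (the port name, command, and 'frame' variable are assumptions; the queue uses the Python 2 Queue module to match the code above):
cmd_queue = Queue.Queue()
pump = PumpThread('COM3', cmd_queue, frame)   # 'frame' is the parent wx window
pump.start()
pump.Write("FLOW?\r")    # queued here, written to the port by the thread's loop
# ... later, e.g. in the frame's close handler:
pump.Stop()              # sets stopRequest and joins the thread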
Depends on what you run in that thread.
If that's your code, then you can implement a stop condition (see other answers).
However, if what you want is to run someone else's code, then you should fork and start a process. Like this:
import multiprocessing
proc = multiprocessing.Process(target=your_proc_function, args=())
proc.start()
Now, whenever you want to stop that process, send it a SIGTERM like this:
proc.terminate()
proc.join()
And it's not slow: fractions of a second.
Enjoy :)
My solution is:
import threading, time
def a():
t = threading.currentThread()
while getattr(t, "do_run", True):
print('Do something')
time.sleep(1)
def getThreadByName(name):
threads = threading.enumerate() #Threads list
for thread in threads:
if thread.name == name:
return thread
threading.Thread(target=a, name='228').start() #Init thread
t = getThreadByName('228') #Get thread by name
time.sleep(5)
t.do_run = False #Signal to stop thread
t.join()
I find it useful to have a class, derived from threading.Thread, to encapsulate my thread functionality. You simply provide your own main loop in an overridden version of run() in this class. Calling start() arranges for the object’s run() method to be invoked in a separate thread.
Inside the main loop, periodically check whether a threading.Event has been set. Such an event is thread-safe.
Inside this class, you have your own join() method that sets the stop event object before calling the join() method of the base class. It can optionally take a time value to pass to the base class's join() method to ensure your thread is terminated in a short amount of time.
import threading
import time
class MyThread(threading.Thread):
def __init__(self, sleep_time=0.1):
self._stop_event = threading.Event()
self._sleep_time = sleep_time
"""call base class constructor"""
super().__init__()
def run(self):
"""main control loop"""
while not self._stop_event.isSet():
#do work
print("hi")
self._stop_event.wait(self._sleep_time)
def join(self, timeout=None):
"""set stop event and join within a given time period"""
self._stop_event.set()
super().join(timeout)
if __name__ == "__main__":
t = MyThread()
t.start()
time.sleep(5)
t.join(1) #wait 1s max
Having a small sleep inside the main loop before checking the threading.Event is less CPU intensive than looping continuously. You can have a default sleep time (e.g. 0.1s), but you can also pass the value in the constructor.
Sometimes you don't have control over the running target. In those cases you can use signal.pthread_kill to send a stop signal.
from signal import pthread_kill, SIGTSTP
from threading import Thread
from itertools import count
from time import sleep
def target():
for num in count():
print(num)
sleep(1)
thread = Thread(target=target)
thread.start()
sleep(5)
pthread_kill(thread.ident, SIGTSTP)
result
0
1
2
3
4
[14]+ Stopped
I need to run a GStreamer pipeline to perform video streaming. The GStreamer pipeline requires a GObject.MainLoop object, which has a run() method that does not terminate until quit() is called.
For this I create a process (P2) from my main application process (P1), which runs the GObject.MainLoop instance in its main thread. The problem is that the loop runs indefinitely within process P2 and I'm unable to exit/quit it from the main application process (P1).
Following is the section of code that might help in understanding the scenario.
'''
start() spawns a new process P2 that runs Mainloop within its main thread.
stop() is called from P1, but does not quit the Mainloop. This is probably because
processes do not have shared memory
'''
from multiprocessing import Process
import gi
from gi.repository import GObject
class Main:
def __init__(self):
self.process = None
self.loop = GObject.MainLoop()
def worker(self):
self.loop.run()
def start(self):
self.process=Process(target=self.worker, args=())
self.process.start()
def stop(self):
self.loop.quit()
Next, I tried using a multiprocessing Queue for sharing the 'loop' variable between the processes, but am still unable to quit the mainloop.
'''
start() spawns a new process and puts the loop object in a multiprocessing Queue
stop() calls get() from the loop and calls the quit() method, though it still does not quit the mainloop.
'''
from multiprocessing import Process, Queue
import gi
from gi.repository import GObject
class Main:
def __init__(self):
self.p=None
self.loop = GObject.MainLoop()
self.queue = Queue()
def worker(self):
self.queue.put(self.loop)
self.loop.run()
def start(self):
self.p=Process(target=self.worker, args=())
self.p.start()
def stop(self):
# receive loop instance shared by Child Process
loop=self.queue.get()
loop.quit()
How do I call the quit method for the MainLoop object which is only accessible within the child Process P2?
OK, firstly we need to be using threads, not processes. Processes will be in a different address space.
What is the difference between a process and a thread?
Try passing the main loop object to a separate thread that does the actual work. This will turn your main method into nothing but a basic GLib event processing loop, but that is fine and the normal behavior in many GLib applications.
Lastly, we need to handle the race condition of the worker thread finishing its work before the main loop activates. We do this with the while not loop.is_running() snippet.
from threading import Thread
import gi
from gi.repository import GObject
def worker(loop):
while not loop.is_running():
print("waiting for loop to run")
print("working")
loop.quit()
print("quitting")
class Main:
def __init__(self):
self.thread = None
self.loop = GObject.MainLoop()
def start(self):
self.thread=Thread(target=worker, args=(self.loop,))
self.thread.start()
self.loop.run()
def main():
GObject.threads_init()
m = Main()
m.start()
if __name__ =='__main__' : main()
I extended the multiprocessing.Process class in my class Main and overrode its run() method to run the GObject.MainLoop instance inside another thread (T1) instead of its main thread. I then implemented a wait-notify mechanism that puts the main thread of process P2 into a wait-notify loop, and used a multiprocessing.Queue to forward messages to the main thread of P2, which is notified at the same time. For example, the stop() method sends the quit message to P2, for which a handler is defined in the overridden run() method.
This module can be extended to pass any number of messages to the child process, provided handlers for them are defined as well.
Following is the code snippet which I used.
from multiprocessing import Process, Condition, Queue
from threading import Thread
import gi
from gi.repository import GObject
loop=GObject.MainLoop()
def worker():
loop.run()
class Main(Process):
def __init__(self, target=None, args=()):
self.target=target
self.args=tuple(args)
print self.args
self.message_queue = Queue()
self.cond = Condition()
self.thread = None
self.loop = GObject.MainLoop()
Process.__init__(self)
def run(self):
if self.target:
self.thread = Thread(target=self.target, args=())
print "running target method"
self.thread.start()
while True:
with self.cond:
self.cond.wait()
msg = self.message_queue.get()
if msg == 'quit':
print loop.is_running()
loop.quit()
print loop.is_running()
break
else:
print 'message received', msg
def send_message(self, msg):
self.message_queue.put(msg)
with self.cond:
self.cond.notify_all()
def stop(self):
self.send_message("quit")
self.join()
def func1(self):
self.send_message("msg 1") # handler is defined in the overridden run method
# few others functions which will send unique messages to the process, and their handlers
# are defined in the overridden run method above
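A hypothetical usage of this class (the method names are those defined above; the sleep just lets the pipeline run for a while):
if __name__ == '__main__':
    import time
    m = Main(target=worker)   # worker runs loop.run() in a thread inside P2
    m.start()
    time.sleep(5)             # let the stream run for a while
    m.func1()                 # sends "msg 1"; handled in the overridden run()
    m.stop()                  # sends "quit"; P2 quits the loop and is joined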
This method is working fine for my scenario, but suggestions are welcome if there is a better way to do the same.
I was looking for a good example of managing a worker process from a Qt GUI created in Python. I need this to be as complete as possible, including reporting progress from the process, aborting the process, and handling possible errors coming from the process.
I only found some semi-finished examples which did only part of the work, and when I tried to make them complete I failed. My current design comes in three layers:
1) there is the main thread, in which the GUI and the ProcessScheduler reside; the scheduler ensures that only one instance of the worker process is running and can abort it
2) there is another thread in which I have the ProcessObserver, which actually runs the process and interprets what comes from the queue (used for inter-process communication); this must be in a non-GUI thread to keep the GUI responsive
3) there is the actual worker process, which executes a given piece of code (my future intention is to replace multiprocessing with multiprocess or pathos or something else that can pickle function objects, but this is not my current issue) and reports progress or the result to the queue
Currently I have this snippet (the print functions in the code are just for debugging and will be deleted eventually):
import multiprocessing
from PySide import QtCore, QtGui
QtWidgets = QtGui
N = 10000000
# I would like this to be a function object
# but multiprocessing cannot pickle it :(
# so I will use multiprocess in the future
CODE = """
# calculates sum of numbers from 0 to n-1
# reports percent progress of finished work
sum = 0
progress = -1
for i in range(n):
sum += i
p = i * 100 // n
if p > progress:
queue.put(["progress", p])
progress = p
queue.put(["result", sum])
"""
class EvalProcess(multiprocessing.Process):
def __init__(self, code, symbols):
super(EvalProcess, self).__init__()
self.code= code
self.symbols = symbols # symbols must contain 'queue'
def run(self):
print("EvalProcess started")
exec(self.code, self.symbols)
print("EvalProcess finished")
class ProcessObserver(QtCore.QObject):
"""Resides in worker thread. Its role is to understand
to what is received from the process via the queue."""
progressChanged = QtCore.Signal(float)
finished = QtCore.Signal(object)
def __init__(self, process, queue):
super(ProcessObserver, self).__init__()
self.process = process
self.queue = queue
def run(self):
print("ProcessObserver started")
self.process.start()
try:
while True:
# this loop keeps running and listening to the queue
# even if the process is aborted
result = self.queue.get()
print("received from queue:", result)
if result[0] == "progress":
self.progressChanged.emit(result[1])
elif result[0] == "result":
self.finished.emit(result[1])
break
except Exception as e:
print(e) # QUESTION: WHAT HAPPENS WHEN THE PROCESS FAILS?
self.process.join() # QUESTION: DO I NEED THIS LINE?
print("ProcessObserver finished")
class ProcessScheduler(QtCore.QObject):
"""Resides in the main thread."""
sendText = QtCore.Signal(str)
def __init__(self):
super(ProcessScheduler, self).__init__()
self.observer = None
self.thread = None
self.process = None
self.queue = None
def start(self):
if self.process: # Q: IS THIS OK?
# should kill current process and start a new one
self.abort()
self.queue = multiprocessing.Queue()
self.process = EvalProcess(CODE, {"n": N, "queue": self.queue})
self.thread = QtCore.QThread()
self.observer = ProcessObserver(self.process, self.queue)
self.observer.moveToThread(self.thread)
self.observer.progressChanged.connect(self.onProgressChanged)
self.observer.finished.connect(self.onResultReceived)
self.thread.started.connect(self.observer.run)
self.thread.finished.connect(self.onThreadFinished)
self.thread.start()
self.sendText.emit("Calculation started")
def abort(self):
self.process.terminate()
self.sendText.emit("Aborted.")
self.onThreadFinished()
def onProgressChanged(self, percent):
self.sendText.emit("Progress={}%".format(percent))
def onResultReceived(self, result):
print("onResultReceived called")
self.sendText.emit("Result={}".format(result))
self.thread.quit()
def onThreadFinished(self):
print("onThreadFinished called")
self.thread.deleteLater() # QUESTION: DO I NEED THIS LINE?
self.thread = None
self.observer = None
self.process = None
self.queue = None
if __name__ == '__main__':
app = QtWidgets.QApplication([])
scheduler = ProcessScheduler()
window = QtWidgets.QWidget()
layout = QtWidgets.QVBoxLayout(window)
startButton = QtWidgets.QPushButton("sum(range({}))".format(N))
startButton.pressed.connect(scheduler.start)
layout.addWidget(startButton)
abortButton = QtWidgets.QPushButton("Abort")
abortButton.pressed.connect(scheduler.abort)
layout.addWidget(abortButton)
console = QtWidgets.QPlainTextEdit()
scheduler.sendText.connect(console.appendPlainText)
layout.addWidget(console)
window.show()
app.exec_()
It works kind of OK, but it still lacks proper error handling and aborting of the process. In particular, I am now struggling with the aborting. The main problem is that the worker thread keeps running (in the loop listening to the queue) even if the process has been aborted/terminated in the middle of the calculation (or at least it prints this error in the console: QThread: Destroyed while thread is still running). Is there a way to solve this? Or any alternative approach? Or, if possible, any real-life and complete example of such a task fulfilling all the requirements mentioned above? Any comment would be much appreciated.
import multiprocessing as mp
import time as t
class MyProcess(mp.Process):
def __init__(self, target, args, name):
mp.Process.__init__(self, target=target, args=args)
self.exit = mp.Event()
self.name = name
print("{0} initiated".format(self.name))
def run(self):
while not self.exit.is_set():
pass
print("Process {0} exited.".format(self.name))
def shutdown(self):
print("Shutdown initiated for {0}.".format(self.name))
self.exit.set()
def f(x):
while True:
print(x)
x = x+1
if __name__ == "__main__":
p = MyProcess(target=f, args=[3], name="function")
p.start()
#p.join()
t.sleep(2)
p.shutdown()
I'm trying to extend the multiprocessing.Process class to add a shutdown method, in order to be able to exit a function that could potentially have to run for an undefined amount of time. Following the instructions from Python Multiprocessing Exit Elegantly How? and adding the argument passing I came up with myself only gets me this output:
function initiated
Shutdown initiated for function.
Process function exited.
But there is no actual output from the method f(x). It seems that the actual process target doesn't get started. I'm obviously doing something wrong, but I just can't figure out what. Any ideas?
Thanks!
The sane way to handle this situation is, where possible, to have the background task cooperate in the exit mechanism by periodically checking the exit event. For that, there's no need to subclass Process: you can rewrite your background task to include that check. For example, here's your code rewritten using that approach:
import multiprocessing as mp
import time as t
def f(x, exit_event):
while not exit_event.is_set():
print(x)
x = x+1
print("Exiting")
if __name__ == "__main__":
exit_event = mp.Event()
p = mp.Process(target=f, args=(3, exit_event), name="function")
p.start()
t.sleep(2)
exit_event.set()
p.join()
If that's not an option (for example because you can't modify the code that's being run in the background job), then you can use the Process.terminate method. But you should be aware that using it is dangerous: the child process won't have an opportunity to clean up properly, so, for example, if it's shut down while holding a multiprocessing lock, no other process will be able to acquire that lock, giving a risk of deadlock. It's far better to have the child cooperate in the shutdown if possible.
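For completeness, a minimal sketch of that last-resort approach applied to the question's function f (a sketch, not the answer author's preferred solution):
import multiprocessing as mp
import time as t

def f(x):
    while True:
        print(x)
        x = x + 1

if __name__ == "__main__":
    p = mp.Process(target=f, args=(3,), name="function")
    p.start()
    t.sleep(2)
    p.terminate()   # kills the child; it gets no chance to clean up
    p.join()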
The solution to this problem is to call super().run() in your class's run() method.
Of course, this will make your function run forever because of the while True loop, so the event you defined will not end it.
You can use the Process.terminate() method to end your process.
import multiprocessing as mp
import time as t
class MyProcess(mp.Process):
def __init__(self, target, args, name):
mp.Process.__init__(self, target=target, args=args)
self.name = name
print("{0} initiated".format(self.name))
def run(self):
print("Process {0} started.".format(self.name))
super().run()
def shutdown(self):
print("Shutdown initiated for {0}.".format(self.name))
self.terminate()
def f(x):
while True:
print(x)
t.sleep(1)
x += 1
if __name__ == "__main__":
p = MyProcess(target=f, args=(3,), name="function")
p.start()
# p.join()
t.sleep(5)
p.shutdown()
Would it be possible to create a Python Pool that is non-daemonic? I want a pool to be able to call a function that has another pool inside.
I want this because daemon processes cannot create processes. Specifically, it will cause the error:
AssertionError: daemonic processes are not allowed to have children
For example, consider the scenario where function_a has a pool which runs function_b which has a pool which runs function_c. This function chain will fail, because function_b is being run in a daemon process, and daemon processes cannot create processes.
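A minimal, hypothetical sketch of that failing chain (the function names follow the description above); running it raises the AssertionError quoted earlier:
import multiprocessing

def function_c(x):
    return x * x

def function_b(x):
    # this pool is created inside a daemonic worker process -> AssertionError
    with multiprocessing.Pool(2) as inner:
        return sum(inner.map(function_c, range(x)))

def function_a():
    with multiprocessing.Pool(2) as outer:
        print(outer.map(function_b, [3, 4, 5]))

if __name__ == "__main__":
    function_a()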
The multiprocessing.pool.Pool class creates the worker processes in its __init__ method, makes them daemonic and starts them, and it is not possible to re-set their daemon attribute to False before they are started (and afterwards it's not allowed anymore). But you can create your own sub-class of multiprocessing.pool.Pool (multiprocessing.Pool is just a wrapper function) and substitute your own multiprocessing.Process sub-class, which is always non-daemonic, to be used for the worker processes.
Here's a full example of how to do this. The important parts are the two classes NoDaemonProcess and MyPool at the top and to call pool.close() and pool.join() on your MyPool instance at the end.
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import multiprocessing
# We must import this explicitly, it is not imported by the top-level
# multiprocessing module.
import multiprocessing.pool
import time
from random import randint
class NoDaemonProcess(multiprocessing.Process):
# make 'daemon' attribute always return False
def _get_daemon(self):
return False
def _set_daemon(self, value):
pass
daemon = property(_get_daemon, _set_daemon)
# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class MyPool(multiprocessing.pool.Pool):
Process = NoDaemonProcess
def sleepawhile(t):
print("Sleeping %i seconds..." % t)
time.sleep(t)
return t
def work(num_procs):
print("Creating %i (daemon) workers and jobs in child." % num_procs)
pool = multiprocessing.Pool(num_procs)
result = pool.map(sleepawhile,
[randint(1, 5) for x in range(num_procs)])
# The following is not really needed, since the (daemon) workers of the
# child's pool are killed when the child is terminated, but it's good
# practice to cleanup after ourselves anyway.
pool.close()
pool.join()
return result
def test():
print("Creating 5 (non-daemon) workers and jobs in main process.")
pool = MyPool(5)
result = pool.map(work, [randint(1, 5) for x in range(5)])
pool.close()
pool.join()
print(result)
if __name__ == '__main__':
test()
I needed to employ a non-daemonic pool in Python 3.7 and ended up adapting the code posted in the accepted answer. Below is the snippet that creates the non-daemonic pool:
import multiprocessing.pool
class NoDaemonProcess(multiprocessing.Process):
@property
def daemon(self):
return False
@daemon.setter
def daemon(self, value):
pass
class NoDaemonContext(type(multiprocessing.get_context())):
Process = NoDaemonProcess
# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class NestablePool(multiprocessing.pool.Pool):
def __init__(self, *args, **kwargs):
kwargs['context'] = NoDaemonContext()
super(NestablePool, self).__init__(*args, **kwargs)
As the current implementation of multiprocessing has been extensively refactored to be based on contexts, we need to provide a NoDaemonContext class that has our NoDaemonProcess as attribute. NestablePool will then use that context instead of the default one.
That said, I should warn that there are at least two caveats to this approach:
It still depends on implementation details of the multiprocessing package, and could therefore break at any time.
There are valid reasons why multiprocessing made it so hard to use non-daemonic processes, many of which are explained here. The most compelling in my opinion is:
As for allowing children threads to spawn off children of its own using
subprocess runs the risk of creating a little army of zombie
'grandchildren' if either the parent or child threads terminate before
the subprocess completes and returns.
As of Python 3.8, concurrent.futures.ProcessPoolExecutor doesn't have this limitation. It can have a nested process pool with no problem at all:
from concurrent.futures import ProcessPoolExecutor as Pool
from itertools import repeat
from multiprocessing import current_process
import time
def pid():
return current_process().pid
def _square(i): # Runs in inner_pool
square = i ** 2
time.sleep(i / 10)
print(f'{pid()=} {i=} {square=}')
return square
def _sum_squares(i, j): # Runs in outer_pool
with Pool(max_workers=2) as inner_pool:
squares = inner_pool.map(_square, (i, j))
sum_squares = sum(squares)
time.sleep(sum_squares ** .5)
print(f'{pid()=}, {i=}, {j=} {sum_squares=}')
return sum_squares
def main():
with Pool(max_workers=3) as outer_pool:
for sum_squares in outer_pool.map(_sum_squares, range(5), repeat(3)):
print(f'{pid()=} {sum_squares=}')
if __name__ == "__main__":
main()
The above demonstration code was tested with Python 3.8.
A limitation of ProcessPoolExecutor, however, is that it doesn't have maxtasksperchild. If you need this, consider the answer by Massimiliano instead.
Credit: answer by jfs
The multiprocessing module has a nice interface for using pools with processes or threads. Depending on your current use case, you might consider using multiprocessing.pool.ThreadPool for your outer Pool, which will result in threads (which are allowed to spawn processes from within) as opposed to processes.
It might be limited by the GIL, but in my particular case (I tested both), the startup time of the processes from the outer Pool as created here far outweighed the solution with ThreadPool.
It's really easy to swap Processes for Threads. Read more about how to use a ThreadPool solution here or here.
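A short sketch of that swap, assuming the nesting from the question is kept (the function names are placeholders):
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def inner_job(x):
    return x * x

def outer_job(n):
    # real worker processes, spawned from a (non-daemonic) worker thread
    with Pool(2) as procs:
        return sum(procs.map(inner_job, range(n)))

if __name__ == "__main__":
    # threads at the outer level, so the daemonic-process restriction never applies
    with ThreadPool(3) as outer:
        print(outer.map(outer_job, [3, 4, 5]))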
On some Python versions, replacing the standard Pool with a custom one can raise the error: AssertionError: group argument must be None for now.
Here I found a solution that can help:
class NoDaemonProcess(multiprocessing.Process):
# make 'daemon' attribute always return False
@property
def daemon(self):
return False
@daemon.setter
def daemon(self, val):
pass
class NoDaemonProcessPool(multiprocessing.pool.Pool):
def Process(self, *args, **kwds):
proc = super(NoDaemonProcessPool, self).Process(*args, **kwds)
proc.__class__ = NoDaemonProcess
return proc
I have seen people deal with this issue by using celery's fork of multiprocessing called billiard (multiprocessing pool extensions), which allows daemonic processes to spawn children. The workaround is to simply replace the multiprocessing module with:
import billiard as multiprocessing
The issue I encountered was in trying to import globals between modules, causing the ProcessPool() line to get evaluated multiple times.
globals.py
from processing import Manager, Lock
from pathos.multiprocessing import ProcessPool
from pathos.threading import ThreadPool
class SingletonMeta(type):
def __new__(cls, name, bases, dict):
dict['__deepcopy__'] = dict['__copy__'] = lambda self, *args: self
return super(SingletonMeta, cls).__new__(cls, name, bases, dict)
def __init__(cls, name, bases, dict):
super(SingletonMeta, cls).__init__(name, bases, dict)
cls.instance = None
def __call__(cls,*args,**kw):
if cls.instance is None:
cls.instance = super(SingletonMeta, cls).__call__(*args, **kw)
return cls.instance
def __deepcopy__(self, item):
return item.__class__.instance
class Globals(object):
__metaclass__ = SingletonMeta
"""
This class is a workaround to the bug: AssertionError: daemonic processes are not allowed to have children
The root cause is that importing this file from different modules causes this file to be reevaluated each time,
thus ProcessPool() gets reexecuted inside that child thread, thus causing the daemonic processes bug
"""
def __init__(self):
print "%s::__init__()" % (self.__class__.__name__)
self.shared_manager = Manager()
self.shared_process_pool = ProcessPool()
self.shared_thread_pool = ThreadPool()
self.shared_lock = Lock() # BUG: Windows: global name 'lock' is not defined | doesn't affect cygwin
Then import safely from elsewhere in your code
from globals import Globals
Globals().shared_manager
Globals().shared_process_pool
Globals().shared_thread_pool
Globals().shared_lock
I have written a more expanded wrapper class around pathos.multiprocessing here:
https://github.com/JamesMcGuigan/python2-timeseries-datapipeline/blob/master/src/util/MultiProcessing.py
As a side note, if your use case just requires an async multiprocess map as a performance optimization, then joblib will manage all your process pools behind the scenes and allow this very simple syntax:
from joblib import Parallel, delayed
squares = Parallel(-1)( delayed(lambda num: num**2)(x) for x in range(100) )
https://joblib.readthedocs.io/
This presents a workaround for when the error is seemingly a false positive. As also noted by James, this can happen due to an unintentional import from a daemonic process.
For example, if you have the following simple code, WORKER_POOL can inadvertently be imported from a worker, leading to the error.
import multiprocessing
WORKER_POOL = multiprocessing.Pool()
A simple but reliable approach for a workaround is:
import multiprocessing
import multiprocessing.pool
class MyClass:
@property
def worker_pool(self) -> multiprocessing.pool.Pool:
# Ref: https://stackoverflow.com/a/63984747/
try:
return self._worker_pool # type: ignore
except AttributeError:
# pylint: disable=protected-access
self.__class__._worker_pool = multiprocessing.Pool() # type: ignore
return self.__class__._worker_pool # type: ignore
# pylint: enable=protected-access
In the above workaround, MyClass.worker_pool can be used without the error. If you think this approach can be improved upon, let me know.
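A hypothetical usage of that lazily created pool (the mapped function is arbitrary):
if __name__ == '__main__':
    obj = MyClass()
    print(obj.worker_pool.map(abs, range(-3, 3)))   # [3, 2, 1, 0, 1, 2]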
Since Python 3.7 we can create a non-daemonic ProcessPoolExecutor.
Using if __name__ == "__main__": is necessary while using multiprocessing.
from concurrent.futures import ProcessPoolExecutor as Pool
num_pool = 10
def main_pool(num):
print(num)
strings_write = (f'{num}-{i}' for i in range(num))
with Pool(num) as subp:
subp.map(sub_pool,strings_write)
return None
def sub_pool(x):
print(f'{x}')
return None
if __name__ == "__main__":
with Pool(num_pool) as p:
p.map(main_pool,list(range(1,num_pool+1)))
Here is how you can start a pool even if you are in a daemonic process already. This was tested in Python 3.8.5.
First, define the Undaemonize context manager, which temporarily deletes the daemon state of the current process.
class Undaemonize(object):
'''Context Manager to resolve AssertionError: daemonic processes are not allowed to have children
Tested in python 3.8.5'''
def __init__(self):
self.p = multiprocessing.process.current_process()
if 'daemon' in self.p._config:
self.daemon_status_set = True
else:
self.daemon_status_set = False
self.daemon_status_value = self.p._config.get('daemon')
def __enter__(self):
if self.daemon_status_set:
del self.p._config['daemon']
def __exit__(self, type, value, traceback):
if self.daemon_status_set:
self.p._config['daemon'] = self.daemon_status_value
Now you can start a pool as follows, even from within a daemon process:
with Undaemonize():
pool = multiprocessing.Pool(1)
pool.map(... # you can do something with the pool outside of the context manager
While the other approaches here aim to create pool that is not daemonic in the first place, this approach allows you to start a pool even if you are in a daemonic process already.