Python Process Pool non-daemonic?

Would it be possible to create a python Pool that is non-daemonic? I want a pool to be able to call a function that has another pool inside.
I want this because daemon processes cannot create processes. Specifically, it will cause the error:
AssertionError: daemonic processes are not allowed to have children
For example, consider the scenario where function_a has a pool which runs function_b which has a pool which runs function_c. This function chain will fail, because function_b is being run in a daemon process, and daemon processes cannot create processes.

The multiprocessing.pool.Pool class creates the worker processes in its __init__ method, makes them daemonic and starts them, and it is not possible to re-set their daemon attribute to False before they are started (and afterwards it's not allowed anymore). But you can create your own sub-class of multiprocessing.pool.Pool (multiprocessing.Pool is just a wrapper function) and substitute your own multiprocessing.Process sub-class, which is always non-daemonic, for the worker processes.
Here's a full example of how to do this. The important parts are the two classes NoDaemonProcess and MyPool at the top, and calling pool.close() and pool.join() on your MyPool instance at the end.
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import multiprocessing
# We must import this explicitly, it is not imported by the top-level
# multiprocessing module.
import multiprocessing.pool
import time

from random import randint


class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    def _get_daemon(self):
        return False
    def _set_daemon(self, value):
        pass
    daemon = property(_get_daemon, _set_daemon)

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class MyPool(multiprocessing.pool.Pool):
    Process = NoDaemonProcess

def sleepawhile(t):
    print("Sleeping %i seconds..." % t)
    time.sleep(t)
    return t

def work(num_procs):
    print("Creating %i (daemon) workers and jobs in child." % num_procs)
    pool = multiprocessing.Pool(num_procs)

    result = pool.map(sleepawhile,
                      [randint(1, 5) for x in range(num_procs)])

    # The following is not really needed, since the (daemon) workers of the
    # child's pool are killed when the child is terminated, but it's good
    # practice to cleanup after ourselves anyway.
    pool.close()
    pool.join()
    return result

def test():
    print("Creating 5 (non-daemon) workers and jobs in main process.")
    pool = MyPool(5)

    result = pool.map(work, [randint(1, 5) for x in range(5)])

    pool.close()
    pool.join()
    print(result)

if __name__ == '__main__':
    test()

I needed to use a non-daemonic pool in Python 3.7 and ended up adapting the code posted in the accepted answer. Below is the snippet that creates the non-daemonic pool:
import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class NestablePool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super(NestablePool, self).__init__(*args, **kwargs)
As the current implementation of multiprocessing has been extensively refactored to be based on contexts, we need to provide a NoDaemonContext class that has our NoDaemonProcess as an attribute. NestablePool will then use that context instead of the default one.
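For illustration, here is a minimal usage sketch, assuming the classes above live in the same module (the worker functions below are placeholders of mine, not part of the original answer): the outer NestablePool runs workers that each open an ordinary multiprocessing.Pool.
import multiprocessing

def inner_job(x):
    return x * x

def outer_job(n):
    # Each outer worker is non-daemonic, so it is allowed to create its own pool.
    with multiprocessing.Pool(2) as inner:
        return sum(inner.map(inner_job, range(n)))

if __name__ == '__main__':
    with NestablePool(3) as outer:
        print(outer.map(outer_job, [4, 5, 6]))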
That said, I should warn that there are at least two caveats to this approach:
It still depends on implementation details of the multiprocessing package, and could therefore break at any time.
There are valid reasons why multiprocessing made it so hard to use non-daemonic processes, many of which are explained here. The most compelling in my opinion is:
As for allowing children threads to spawn off children of its own using
subprocess runs the risk of creating a little army of zombie
'grandchildren' if either the parent or child threads terminate before
the subprocess completes and returns.

As of Python 3.8, concurrent.futures.ProcessPoolExecutor doesn't have this limitation. It can have a nested process pool with no problem at all:
from concurrent.futures import ProcessPoolExecutor as Pool
from itertools import repeat
from multiprocessing import current_process
import time

def pid():
    return current_process().pid

def _square(i):  # Runs in inner_pool
    square = i ** 2
    time.sleep(i / 10)
    print(f'{pid()=} {i=} {square=}')
    return square

def _sum_squares(i, j):  # Runs in outer_pool
    with Pool(max_workers=2) as inner_pool:
        squares = inner_pool.map(_square, (i, j))
    sum_squares = sum(squares)
    time.sleep(sum_squares ** .5)
    print(f'{pid()=}, {i=}, {j=} {sum_squares=}')
    return sum_squares

def main():
    with Pool(max_workers=3) as outer_pool:
        for sum_squares in outer_pool.map(_sum_squares, range(5), repeat(3)):
            print(f'{pid()=} {sum_squares=}')

if __name__ == "__main__":
    main()
The above demonstration code was tested with Python 3.8.
A limitation of ProcessPoolExecutor, however, is that it doesn't have maxtasksperchild. If you need this, consider the answer by Massimiliano instead.
Credit: answer by jfs
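(Side note, not from the original answer: if I remember correctly, Python 3.11 later added a max_tasks_per_child parameter to ProcessPoolExecutor, which covers much of the same ground, though it appears to require a non-"fork" start method; check the docs for your version. A minimal sketch:)
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

# Assumes Python 3.11+ (please verify): each worker is replaced after one task.
# The "spawn" context is used because the feature seems incompatible with "fork".
executor = ProcessPoolExecutor(
    max_workers=3,
    mp_context=multiprocessing.get_context("spawn"),
    max_tasks_per_child=1,
)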

The multiprocessing module has a nice interface for using pools with processes or threads. Depending on your use case, you might consider using multiprocessing.pool.ThreadPool for your outer pool, which results in threads (which are allowed to spawn processes from within) as opposed to processes.
It might be limited by the GIL, but in my particular case (I tested both), the startup time of the processes from the outer Pool as created here far outweighed that of the ThreadPool solution.
It's really easy to swap Processes for Threads; see the linked answers for more on how to use a ThreadPool solution.
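As a rough sketch of the idea (worker functions are placeholders of mine, not from the answer): the outer pool uses threads, and each thread starts a real process pool.
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def inner_job(x):
    return x * x

def outer_job(n):
    # Worker threads are not daemonic processes, so they may start a process Pool.
    with Pool(2) as inner:
        return sum(inner.map(inner_job, range(n)))

if __name__ == '__main__':
    with ThreadPool(3) as outer:
        print(outer.map(outer_job, [4, 5, 6]))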

On some Python versions, replacing the standard Pool with a custom one can raise the error: AssertionError: group argument must be None for now.
Here I found a solution that can help:
import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, val):
        pass

class NoDaemonProcessPool(multiprocessing.pool.Pool):
    def Process(self, *args, **kwds):
        proc = super(NoDaemonProcessPool, self).Process(*args, **kwds)
        proc.__class__ = NoDaemonProcess
        return proc
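A hypothetical usage sketch, mirroring the accepted answer's MyPool example (the worker functions below are mine, and this only works on the Python versions the snippet above targets):
def square(x):
    return x * x

def child_work(n):
    # Allowed here because the outer workers are no longer daemonic.
    with multiprocessing.Pool(2) as inner:
        return inner.map(square, range(n))

if __name__ == '__main__':
    with NoDaemonProcessPool(3) as outer:
        print(outer.map(child_work, [3, 4, 5]))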

I have seen people dealing with this issue by using celery's fork of multiprocessing called billiard (multiprocessing pool extensions), which allows daemonic processes to spawn children. The workaround is to simply replace the multiprocessing module with:
import billiard as multiprocessing
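Assuming billiard keeps the same Pool API as multiprocessing (which is its stated goal), nested pools might then look roughly like this untested sketch (worker functions are placeholders of mine):
import billiard as multiprocessing  # drop-in replacement for multiprocessing

def inner_job(x):
    return x * x

def outer_job(n):
    # billiard permits this even from a pool worker.
    with multiprocessing.Pool(2) as inner:
        return sum(inner.map(inner_job, range(n)))

if __name__ == '__main__':
    with multiprocessing.Pool(3) as outer:
        print(outer.map(outer_job, [4, 5, 6]))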

The issue I encountered was in trying to import globals between modules, causing the ProcessPool() line to get evaluated multiple times.
globals.py
from processing import Manager, Lock
from pathos.multiprocessing import ProcessPool
from pathos.threading import ThreadPool

class SingletonMeta(type):
    def __new__(cls, name, bases, dict):
        dict['__deepcopy__'] = dict['__copy__'] = lambda self, *args: self
        return super(SingletonMeta, cls).__new__(cls, name, bases, dict)

    def __init__(cls, name, bases, dict):
        super(SingletonMeta, cls).__init__(name, bases, dict)
        cls.instance = None

    def __call__(cls, *args, **kw):
        if cls.instance is None:
            cls.instance = super(SingletonMeta, cls).__call__(*args, **kw)
        return cls.instance

    def __deepcopy__(self, item):
        return item.__class__.instance

class Globals(object):
    __metaclass__ = SingletonMeta
    """
    This class is a workaround for the bug: AssertionError: daemonic processes are not allowed to have children

    The root cause is that importing this file from different modules causes it to be re-evaluated each time,
    thus ProcessPool() gets re-executed inside that child thread, causing the daemonic processes bug.
    """
    def __init__(self):
        print "%s::__init__()" % (self.__class__.__name__)
        self.shared_manager      = Manager()
        self.shared_process_pool = ProcessPool()
        self.shared_thread_pool  = ThreadPool()
        self.shared_lock         = Lock()  # BUG: Windows: global name 'lock' is not defined | doesn't affect cygwin
Then import it safely from elsewhere in your code:
from globals import Globals
Globals().shared_manager
Globals().shared_process_pool
Globals().shared_thread_pool
Globals().shared_lock
I have written a more expanded wrapper class around pathos.multiprocessing here:
https://github.com/JamesMcGuigan/python2-timeseries-datapipeline/blob/master/src/util/MultiProcessing.py
As a side note, if your use case just requires an asynchronous multiprocess map as a performance optimization, then joblib will manage all your process pools behind the scenes and allow this very simple syntax:
from joblib import Parallel, delayed
squares = Parallel(-1)(delayed(lambda num: num ** 2)(x) for x in range(100))
https://joblib.readthedocs.io/

This presents a workaround for when the error is seemingly a false positive. As also noted by James, this can happen due to an unintentional import from a daemonic process.
For example, if you have the following simple code, WORKER_POOL can inadvertently be imported from a worker, leading to the error.
import multiprocessing
WORKER_POOL = multiprocessing.Pool()
A simple but reliable approach for a workaround is:
import multiprocessing
import multiprocessing.pool

class MyClass:

    @property
    def worker_pool(self) -> multiprocessing.pool.Pool:
        # Ref: https://stackoverflow.com/a/63984747/
        try:
            return self._worker_pool  # type: ignore
        except AttributeError:
            # pylint: disable=protected-access
            self.__class__._worker_pool = multiprocessing.Pool()  # type: ignore
            return self.__class__._worker_pool  # type: ignore
        # pylint: enable=protected-access
In the above workaround, MyClass.worker_pool can be used without the error. If you think this approach can be improved upon, let me know.
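A hypothetical usage sketch of the lazily created pool (the worker function is my own example, not from the answer):
def _double(x):
    return 2 * x

if __name__ == '__main__':
    obj = MyClass()
    print(obj.worker_pool.map(_double, range(5)))  # the pool is created once, on first access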

Since Python 3.7 we can create a non-daemonic ProcessPoolExecutor.
Using if __name__ == "__main__": is necessary while using multiprocessing.
from concurrent.futures import ProcessPoolExecutor as Pool

num_pool = 10

def main_pool(num):
    print(num)
    strings_write = (f'{num}-{i}' for i in range(num))
    with Pool(num) as subp:
        subp.map(sub_pool, strings_write)
    return None

def sub_pool(x):
    print(f'{x}')
    return None

if __name__ == "__main__":
    with Pool(num_pool) as p:
        p.map(main_pool, list(range(1, num_pool + 1)))

Here is how you can start a pool, even if you are in a daemonic process already. This was tested in python 3.8.5
First, define the Undaemonize context manager, which temporarily deletes the daemon state of the current process.
import multiprocessing
import multiprocessing.process

class Undaemonize(object):
    '''Context Manager to resolve AssertionError: daemonic processes are not allowed to have children

    Tested in python 3.8.5'''
    def __init__(self):
        self.p = multiprocessing.process.current_process()
        if 'daemon' in self.p._config:
            self.daemon_status_set = True
        else:
            self.daemon_status_set = False
        self.daemon_status_value = self.p._config.get('daemon')

    def __enter__(self):
        if self.daemon_status_set:
            del self.p._config['daemon']

    def __exit__(self, type, value, traceback):
        if self.daemon_status_set:
            self.p._config['daemon'] = self.daemon_status_value
Now you can start a pool as follows, even from within a daemon process:
with Undaemonize():
    pool = multiprocessing.Pool(1)
pool.map(...  # you can do something with the pool outside of the context manager
While the other approaches here aim to create a pool that is not daemonic in the first place, this approach allows you to start a pool even if you are already in a daemonic process.

Related

Python multiprocessing - catch SIGINT/SIGTERM and exit gracefully

I have two python scripts and I want them to communicate with each other. Specifically, I want script Communication.py to send an array to script Process.py if required by the latter. I've used multiprocessing.Process and multiprocessing.Pipe to make it work. My code works, but I want to handle SIGINT and SIGTERM gracefully; I've tried the following but it does not exit gracefully:
Process.py
from multiprocessing import Process, Pipe
from Communication import arraySender
import time
import signal

class GracefulKiller:
    kill_now = False
    def __init__(self):
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)
    def exit_gracefully(self, *args):
        self.kill_now = True

def main():
    parent_conn, child_conn = Pipe()
    p = Process(target=arraySender, args=(child_conn, True))
    p.start()
    print(parent_conn.recv())

if __name__ == '__main__':
    killer = GracefulKiller()
    while not killer.kill_now:
        main()
Communication.py
import numpy
from multiprocessing import Process, Pipe

def arraySender(child_conn, sendData):
    if sendData:
        child_conn.send(numpy.random.randint(0, high=10, size=15, dtype=int))
        child_conn.close()
what am I doing wrong?
I strongly suspect you are running this under Windows because I think the code you have should work under Linux. This is why it is important to always tag your questions concerning Python and multiprocessing with the actual platform you are on.
The problem appears to be due to the fact that in addition to your main process you have created a child process in function main that is also receiving the signals. The solution would normally be to add calls like signal.signal(signal.SIGINT, signal.SIG_IGN) to your arraySender worker function. But there are two problems with this:
There is a race condition: the signal could be received by the child process before it has a chance to ignore signals.
Regardless, the call to ignore signals when you are using multiprocessing.Process does not seem to work (perhaps that class does its own signal handling that overrides these calls).
The solution is to create a multiprocessing pool and initialize each pool process so that they ignore signals before you submit any tasks. The other advantage of using a pool, although in this case we only need a pool size of 1 because you never have more than one task running at a time, is that you only need to create the process once which can then be reused.
As an aside, you have some inconsistency in your GracefulKiller class by mixing a class attribute kill_now with an instance attribute kill_now that gets created when you execute self.kill_now = True. So when the main process is testing killer.kill_now it is accessing the class attribute until such time as self.kill_now is set to True when it will then be accessing the instance attribute.
from multiprocessing import Pool, Pipe
import time
import signal
import numpy

class GracefulKiller:
    def __init__(self):
        self.kill_now = False  # Instance attribute
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, *args):
        self.kill_now = True

def init_pool_processes():
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    signal.signal(signal.SIGTERM, signal.SIG_IGN)

def arraySender(sendData):
    if sendData:
        return numpy.random.randint(0, high=10, size=15, dtype=int)

def main(pool):
    result = pool.apply(arraySender, args=(True,))
    print(result)

if __name__ == '__main__':
    # Create pool with only 1 process:
    pool = Pool(1, initializer=init_pool_processes)
    killer = GracefulKiller()
    while not killer.kill_now:
        main(pool)
    pool.close()
    pool.join()
Ideally GracefulKiller should be a singleton class so that regardless of how many times GracefulKiller was instantiated by a process, you would be calling signal.signal only once for each type of signal you want to handle:
class Singleton(type):
    def __init__(self, *args, **kwargs):
        self.__instance = None
        super().__init__(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        if self.__instance is None:
            self.__instance = super().__call__(*args, **kwargs)
        return self.__instance

class GracefulKiller(metaclass=Singleton):
    def __init__(self):
        self.kill_now = False  # Instance attribute
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, *args):
        self.kill_now = True

How to create redis workers dynamically without blocking the main thread?

I want to have a queue/worker management tool that allows adding new queues and registering jobs to those queues, with workers spawned to handle those jobs.
I have this code so far:
from redis import Redis
from rq import Queue, Retry, Worker

class WorkerPool:  # TODO: find a better name
    def __init__(self):
        self._queues = {}
        self._workers = []
        self._redis_conn = Redis()

    def _get_queue(self, name):
        try:
            return self._queues[name]
        except KeyError:
            new_queue = Queue(name, connection=self._redis_conn)
            self._queues[name] = new_queue
            new_worker = Worker([new_queue], connection=self._redis_conn, name=name)
            new_worker.work()  # Blocking :(
            return new_queue

    def add_job(self, queue, func, *func_args):
        q = self._get_queue(queue)
        job = q.enqueue(func, *func_args, retry=Retry(max=3))
        return job
As can be seen - the work() function blocks execution, while I want it to work in the background. I guess I can just create another thread here - and call work() from one thread, while the main thread returns the job, however, this seems a bit awkward to me. Is there a built-in Redis (or other known module) solution for this?
PS, better names for my class are welcome :)
This is my take on multiprocessing it (threading won't work due to signals sent from illegal threads):
import multiprocessing as mp

from redis import Redis
from rq import Queue, Retry, Worker

class WorkerPool:  # TODO: find a better name
    def __init__(self):
        self._queues = {}
        self._worker_procs = []
        self._redis_conn = Redis()

    def __del__(self):
        for proc in self._worker_procs:
            proc.kill()

    def _get_queue(self, name):
        try:
            return self._queues[name]
        except KeyError:
            new_queue = Queue(name, connection=self._redis_conn)
            self._queues[name] = new_queue
            new_worker = Worker([new_queue], connection=self._redis_conn, name=name)
            worker_process = mp.Process(target=new_worker.work)
            worker_process.start()
            self._worker_procs.append(worker_process)
            return new_queue

    def add_job(self, queue, func, *func_args):
        q = self._get_queue(queue)
        job = q.enqueue(func, *func_args, retry=Retry(max=3))
        return job
Not sure how good this is, but it seems to do what I want for now
If you only need small-scale multiprocessing, tied to one main process, all running on the one machine, take a look at the multiprocessing module and the concurrent.futures module and their Pool and ProcessPoolExecutor objects. Unless you have specific requirements, it's probably better to use the Pool or ProcessPoolExecutor rather than start up Process objects manually. (In that case Redis may or may not be overkill.)
If your needs are larger-scale, workers across multiple machines, there's a whole category of software for running these; RabbitMQ is one widely-known one, but it's just one of several, each with its own strengths and weaknesses. Each of the cloud providers (if you're in the cloud) also has its own offering for this functionality. You probably want to read up on the features of several of the off-the-shelf solutions, decide which one is a good match, then set that up.
That said, I have in the past implemented a custom Redis-based queueing system; sometimes you really do need something not provided by any of the existing solutions. In that situation, the design will be heavily influenced by what features you do need. In my case, it was fine-grained priorities...
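For the small-scale case mentioned above, a minimal sketch of the ProcessPoolExecutor route might look like this (the job function is a placeholder of mine):
from concurrent.futures import ProcessPoolExecutor

def job(n):
    # Placeholder for real work.
    return n * n

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(job, i) for i in range(10)]
        print([f.result() for f in futures])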

Is it right to init multiprocess in class __init__?

# Note: the question omits several imports (redis, threading, json, logging, os,
# collections.deque, flask.request) and the Flask `app` object, which is assumed
# to be defined elsewhere in the original module.
from multiprocessing.dummy import Pool as ThreadPool

class TSNew:
    def __init__(self):
        self.redis_client = redis.StrictRedis(host="172.17.31.147", port=4401, db=0)
        self.global_switch = 0
        self.pool = ThreadPool(40)  # init pool
        self.dnn_model = None
        self.nnf = None
        self.md5sum_nnf = "initialize"
        self.thread = threading.Thread(target=self.load_model_item)
        self.ts_picked_ids = None
        self.thread.start()

        self.memory = deque(maxlen=3000)

        self.process = threading.Thread(target=self.process_user_dict)
        self.process.start()

    def load_model_item(self):
        '''
        code
        '''

    def predict_memcache(self, user_dict):
        '''
        code
        '''

    def process_user_dict(self):
        while True:
            '''
            code to generate user_dicts which is a list
            '''
            results = self.pool.map(self.predict_memcache, user_dicts)
            '''
            code
            '''

TSNew_ = TSNew()

def get_user_result():
    logging.info("----------------come in ------------------")
    if request.method == 'POST':
        user_dict_json = request.get_data()  # userid
        if user_dict_json == '' or user_dict_json is None:
            logging.info("----------------user_dict_json is ''------------------")
            return ''
        try:
            user_dict = json.loads(user_dict_json)
        except:
            logging.info("json load error, pass")
            return ''
        TSNew_.memory.append(user_dict)
        logging.info('add to deque TSNew_.memory size: %d PID: %d', len(TSNew_.memory), os.getpid())
        logging.info("add to deque userid: %s, nation: %s \n", user_dict['user_id'], user_dict['user_country'])
    return 'SUCCESS\n'

@app.route('/', methods=['POST'])
def get_ts_gbdt_id():
    return get_user_result()

from werkzeug.contrib.fixers import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=4444)
I create a multithread pool in the class __init__ and use self.pool to map the predict_memcache function.
I have two doubts:
(a) Should I initialize the pool in __init__ or just initialize it right before
results = self.pool.map(self.predict_memcache, user_dicts)
(b) Since the pool is a multithread operation and it is executed in the thread running process_user_dict, is there any hidden error?
Thanks.
Question (a):
It depends. If you need to run process_user_dict more than once, then it makes sense to start the pool in the constructor and keep it running. Creating a thread pool always comes with some overhead and by keeping the pool alive between calls to process_user_dict you would avoid that additional overhead.
If you just want to process one set of input, you can as well create your pool right inside process_user_dict. But probably not right before results = self.pool.map(self.predict_memcache, user_dicts) because that would create a pool for every iteration of your surrounding while loop.
In your specific case, it does not make any difference. You create your TSNew_ object on module-level, so that it remains alive (and with it the thread pool) while your app is running; the same thread pool from the same TSNew instance is used to process all the requests during the lifetime of app.run().
Since you seem to be using that construct with self.process = threading.Thread(target=self.process_user_dict) as some sort of listener on self.memory, creating the pool in the constructor is functionally equivalent to creating the pool inside of process_user_dict (but outside the loop).
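For illustration, a minimal sketch of the "create the pool inside process_user_dict, but outside the loop" variant (the method is shown in isolation and _generate_user_dicts is a hypothetical helper, not from the question):
from multiprocessing.dummy import Pool as ThreadPool

def process_user_dict(self):
    # The pool is created once per call, not once per iteration of the loop.
    with ThreadPool(40) as pool:
        while True:
            user_dicts = self._generate_user_dicts()  # hypothetical helper
            results = pool.map(self.predict_memcache, user_dicts)
            # ... handle results ...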
Question (b):
Technically, there is no hidden error by default when creating a thread inside a thread. In the end, any additional thread's ultimate parent is always the MainThread, that is implicitly created for every instance of a Python interpreter. Basically, every time you create a thread inside a Python program, you create a thread in a thread.
Actually, your code does not even create a thread inside a thread. Your self.pool is created inside the MainThread. When the pool is instantiated via self.pool = ThreadPool(40) it creates the desired number (40) of worker threads, plus one worker handler thread, one task handler thread and one result handler thread. All of these are child threads of the MainThread. All you do with regards to your pool inside your thread under self.process is calling its map method to assign tasks to it.
However, I do not really see the point of what you are doing with that self.process here.
Making a guess, I would say that you want to start the loop in process_user_dict to act as a kind of listener on self.memory, so that the pool starts processing user_dicts as soon as they start showing up in the deque in self.memory. From what I see you doing in get_user_result, you seem to get one user_dict per request. I understand that you might have concurrent user sessions passing in these dicts, but do you really see a benefit from process_user_dict running in an infinite loop over simply calling TSNew_.process_user_dict() after TSNew_.memory.append(user_dict)? You could even omit self.memory completely and pass the dict directly to process_user_dict, unless I am missing something you did not show us.

Using Multiprocessing with Modules

I am writing a module such that in one function I want to use the Pool function from the multiprocessing library in Python 3.6. I have done some research on the problem and it seems that you cannot use if __name__=="__main__" as the code is not being run from main. I have also noticed that the python pool processes get initialized in my task manager but essentially are stuck.
So for example:
class myClass()
    ...
    lots of different functions here
    ...
    def multiprocessFunc()
        do stuff in here

    def funcThatCallsMultiprocessFunc()
        array=[array of filenames to be called]
        if __name__=="__main__":
            p = Pool(processes=20)
            p.map_async(multiprocessFunc,array)
I tried to remove the if __name__=="__main__" part but still no dice. Any help would be appreciated.
It seems to me that you have just missed out a self. in your code. I should think this will work:
class myClass():
    ...
    # lots of different functions here
    ...
    def multiprocessFunc(self, file):
        # do stuff in here

    def funcThatCallsMultiprocessFunc(self):
        array = [array of filenames to be called]
        p = Pool(processes=20)
        p.map_async(self.multiprocessFunc, array)  # added self. here
Now having done some experiments, I see that map_async could take quite some time to start up (I think because multiprocessing creates processes) and any test code might call funcThatCallsMultiprocessFunc and then quit before the Pool has got started.
In my tests I had to wait for over 10 seconds after funcThatCallsMultiprocessFunc before calls to multiprocessFunc started. But once started, they seemed to run just fine.
This is the actual code I've used:
MyClass.py
from multiprocessing import Pool
import time
import string

class myClass():
    def __init__(self):
        self.result = None

    def multiprocessFunc(self, f):
        time.sleep(1)
        print(f)
        return f

    def funcThatCallsMultiprocessFunc(self):
        array = [c for c in string.ascii_lowercase]
        print(array)
        p = Pool(processes=20)
        p.map_async(self.multiprocessFunc, array, callback=self.done)
        p.close()

    def done(self, arg):
        self.result = 'Done'
        print('done', arg)
Run.py
from MyClass import myClass
import time

def main():
    c = myClass()
    c.funcThatCallsMultiprocessFunc()
    for i in range(30):
        print(i, c.result)
        time.sleep(1)

if __name__=="__main__":
    main()
The if __name__=='__main__' construct is an import protection. You want to use it to stop multiprocessing from running your setup on import.
In your case, you can leave out this protection in the class setup. Be sure to protect the execution points of the class in the calling file like this:
def apply_async_with_callback():
    pool = mp.Pool(processes=30)
    for i in range(z):
        pool.apply_async(parallel_function, args=(i, x, y), callback=callback_function)
    pool.close()
    pool.join()
    print("Multiprocessing done!")

if __name__ == '__main__':
    apply_async_with_callback()

Extending mp.Process in Python 3

import multiprocessing as mp
import time as t

class MyProcess(mp.Process):
    def __init__(self, target, args, name):
        mp.Process.__init__(self, target=target, args=args)
        self.exit = mp.Event()
        self.name = name
        print("{0} initiated".format(self.name))

    def run(self):
        while not self.exit.is_set():
            pass
        print("Process {0} exited.".format(self.name))

    def shutdown(self):
        print("Shutdown initiated for {0}.".format(self.name))
        self.exit.set()

def f(x):
    while True:
        print(x)
        x = x+1

if __name__ == "__main__":
    p = MyProcess(target=f, args=[3], name="function")
    p.start()
    #p.join()
    t.wait(2)
    p.shutdown()
I'm trying to extend the multiprocessing.Process class to add a shutdown method in order to be able to exit a function which could potentially have to run for an undefined amount of time. Following the instructions from Python Multiprocessing Exit Elegantly How? and adding the argument passing I came up with myself only gets me this output:
function initiated
Shutdown initiated for function.
Process function exited.
But no actual method f(x) output. It seems that the actual process target doesn't get started. I'm obviously doing something wrong, but just can't figure out what, any ideas?
Thanks!
The sane way to handle this situation is, where possible, to have the background task cooperate in the exit mechanism by periodically checking the exit event. For that, there's no need to subclass Process: you can rewrite your background task to include that check. For example, here's your code rewritten using that approach:
import multiprocessing as mp
import time as t

def f(x, exit_event):
    while not exit_event.is_set():
        print(x)
        x = x+1
    print("Exiting")

if __name__ == "__main__":
    exit_event = mp.Event()
    p = mp.Process(target=f, args=(3, exit_event), name="function")
    p.start()
    t.sleep(2)
    exit_event.set()
    p.join()
If that's not an option (for example because you can't modify the code that's being run in the background job), then you can use the Process.terminate method. But you should be aware that using it is dangerous: the child process won't have an opportunity to clean up properly, so for example if it's shut down while holding a multiprocessing lock, no other process will be able to acquire that lock, giving a risk of deadlock. It's far better to have the child cooperate in the shutdown if possible.
The solution to this problem is to call super().run() in your class's run method.
Of course, this means your target function will then run forever because of its while True loop, and setting an event will not end it.
You can use the Process.terminate() method to end your process instead.
import multiprocessing as mp
import time as t

class MyProcess(mp.Process):
    def __init__(self, target, args, name):
        mp.Process.__init__(self, target=target, args=args)
        self.name = name
        print("{0} initiated".format(self.name))

    def run(self):
        print("Process {0} started.".format(self.name))
        super().run()

    def shutdown(self):
        print("Shutdown initiated for {0}.".format(self.name))
        self.terminate()

def f(x):
    while True:
        print(x)
        t.sleep(1)
        x += 1

if __name__ == "__main__":
    p = MyProcess(target=f, args=(3,), name="function")
    p.start()
    # p.join()
    t.sleep(5)
    p.shutdown()
