Will 'multiprocessing' automatically close finished child processes? - Python

I used the multiprocessing library to process a list of files (20+ files) in parallel.
When I run the .py file, I set the pool size to 4, but cmd shows more than 10 processes, and most of them have been running for a long time. The files are large and take a long time to process, so I'm not sure whether the processes are hanging or still executing.
So my questions are:
If they are still executing, how can I limit the number of processes to exactly 4?
If they are hanging, it means child processes do not shut down after finishing. Can I make them shut down automatically once they are done?
import sys
from multiprocessing import Pool

poolNum = int(sys.argv[1])

pool = Pool(poolNum)
pool.map(processFunc, fileList)  # processFunc and fileList are defined elsewhere

It won't, not until the Pool is close-ed or terminate-ed (IIRC Pools at least at present have a reference cycle involved, so even when the last live reference to the Pool goes away, the Pool is not deterministically collected, even on CPython, which uses reference counting and normally has deterministic behavior).
Since you're using map, your work is definitely done when map returns, so the simplest solution is just to use a with statement for guaranteed termination:
import sys
from multiprocessing import Pool

def main():
    poolNum = int(sys.argv[1])
    with Pool(poolNum) as pool:  # Pool created
        pool.map(processFunc, fileList)
    # terminate has been called, all workers will be killed

# Adding main guard so this code is valid on Windows and anywhere else which
# doesn't use forking for whatever reason
if __name__ == '__main__':
    main()
As I commented, I wrapped the code in a main function with the standard guard against being invoked on import, because Windows (and macOS on 3.8+, plus any OS where the script opts into the 'spawn' start method) simulates forking by re-importing the main module (but not naming it __main__); without the guard, the child processes would re-run the pool-creation code and spawn new processes of their own, which is problematic.
Side-note: If you are dispatching a bunch of tasks but not waiting on them immediately (so you don't want to terminate the pool anywhere near when you create it, but want to ensure the workers are cleaned up promptly), you can still use context management to help out. Just use contextlib.closing to close the pool once all the tasks are dispatched; you must dispatch all the tasks before the end of the with block, but you can retrieve the results later, and when all results are computed, the child processes will close. For example:
import sys
from contextlib import closing
from multiprocessing import Pool

def main():
    poolNum = int(sys.argv[1])
    with closing(Pool(poolNum)) as pool:  # Pool created
        results = pool.imap_unordered(processFunc, fileList)
    # close has been called, so no new work can be submitted,
    # and when all outstanding tasks complete, the workers will exit
    # immediately/cleanly
    for res in results:
        ...  # Can still retrieve results even after the pool is closed

# Adding main guard so this code is valid on Windows and anywhere else which
# doesn't use forking for whatever reason
if __name__ == '__main__':
    main()

Related

How to get rid of zombie processes using torch.multiprocessing.Pool (Python)

I am using torch.multiprocessing.Pool to speed up my NN in inference, like this:
import functools

import torch
import torch.multiprocessing as mp
from tqdm import tqdm

mp = torch.multiprocessing.get_context('forkserver')

def parallel_predict(predict_func, sequences, args):
    predicted_cluster_ids = []
    pool = mp.Pool(args.num_workers, maxtasksperchild=1)
    out = pool.imap(
        func=functools.partial(predict_func, args=args),
        iterable=sequences,
        chunksize=1)
    for item in tqdm(out, total=len(sequences), ncols=85):
        predicted_cluster_ids.append(item)
    pool.close()
    pool.terminate()
    pool.join()
    return predicted_cluster_ids
Note 1) I am using imap because I want to be able to show a progress bar with tqdm.
Note 2) I tried with both forkserver and spawn but no luck. I cannot use other methods because of how they interact (poorly) with CUDA.
Note 3) I am using maxtasksperchild=1 and chunksize=1 so for each sequence in sequences it spawns a new process.
Note 4) Adding or removing pool.terminate() and pool.join() makes no difference.
Note 5) predict_func is a method of a class I created. I could also pass the whole model to parallel_predict but it does not change anything.
Everything works fine except that after a while I run out of memory on the CPU (while on the GPU everything works as expected). Using htop to monitor memory usage, I notice that for every process I spawn with the pool I get a zombie that uses 0.4% of the memory. They don't get cleared, so they keep using space. Still, parallel_predict does return the correct result and the computation goes on. My script is structured so that it does validation multiple times, so the next time parallel_predict is called the zombies add up.
This is what I get in htop (screenshot omitted): one zombie process per spawned worker, accumulating over time.
Usually, these zombies get cleared after ctrl-c but in some rare cases I need to killall.
Is there some way I can force the Pool to close them?
UPDATE:
I tried to kill the zombie processes using this:
def kill(pool):
    import os
    import signal
    import multiprocessing.pool
    # stop repopulating new child
    pool._state = multiprocessing.pool.TERMINATE
    pool._worker_handler._state = multiprocessing.pool.TERMINATE
    for p in pool._pool:
        os.kill(p.pid, signal.SIGKILL)
    # .is_alive() will reap dead process
    while any(p.is_alive() for p in pool._pool):
        pass
    pool.terminate()
But it does not work. It gets stuck at pool.terminate()
UPDATE2:
I tried to use the initializer arg of the Pool to catch signals like this:
import functools
import os
import signal
from tqdm import tqdm

# mp is the forkserver context created above

def process_initializer():
    def handler(_signal, frame):
        print('exiting')
        exit(0)
    signal.signal(signal.SIGTERM, handler)

def parallel_predict(predict_func, sequences, args):
    predicted_cluster_ids = []
    with mp.Pool(args.num_workers, initializer=process_initializer, maxtasksperchild=1) as pool:
        out = pool.imap(
            func=functools.partial(predict_func, args=args),
            iterable=sequences,
            chunksize=1)
        for item in tqdm(out, total=len(sequences), ncols=85):
            predicted_cluster_ids.append(item)
        for p in pool._pool:
            os.kill(p.pid, signal.SIGTERM)
        pool.close()
        pool.terminate()
        pool.join()
    return predicted_cluster_ids
but again it does not free memory.
Ok, I have more insights to share with you. Indeed this is not a bug, it is actually the "supposed" behavior for the multiprocessing module in Python (torch.multiprocessing wraps it). What happens is that, although the Pool terminates all the processes, the memory is not released (given back to the OS). This is also stated in the documentation, though in a very confusing way.
In the documentation it says that
Worker processes within a Pool typically live for the complete duration of the Pool’s work queue
but also:
A frequent pattern found in other systems (such as Apache, mod_wsgi, etc) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user
but the "clean up" does NOT happen.
To make things worse, I found this post in which they recommend using maxtasksperchild=1. This makes the leak worse, because this way the number of zombies grows with the number of data points to be predicted, and since pool.close() does not free the memory, they add up.
This is very bad if you are using multiprocessing for example in validation. For every validation step I was reinitializing the pool but the memory didn't get freed from the previous iteration.
The SOLUTION here is to move pool = mp.Pool(args.num_workers) outside the training loop, so the pool does not get closed and reopened, and therefore it always reuses the same processes. NOTE: again remember to remove maxtasksperchild=1 and chunksize=1.
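A minimal sketch of that restructuring (hedged: train, sequences_per_step, and the loop shape are placeholders, not the original training code): create the pool once, reuse it for every validation pass, and only close it when training ends.
import functools
import torch.multiprocessing

mp = torch.multiprocessing.get_context('forkserver')

def train(predict_func, sequences_per_step, args):
    # Create the pool ONCE, outside the loop, and WITHOUT maxtasksperchild=1,
    # so the same worker processes are reused for every validation step.
    pool = mp.Pool(args.num_workers)
    try:
        for sequences in sequences_per_step:  # e.g. one validation pass per epoch
            out = pool.imap(functools.partial(predict_func, args=args), sequences)
            predicted_cluster_ids = list(out)
            # ... use the predictions, keep training ...
    finally:
        pool.close()  # only tear the workers down when everything is done
        pool.join()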
I think this should be included in the best practices page.
BTW, in my opinion this behavior of the multiprocessing library should be considered a bug and should be fixed on the Python side (not the PyTorch side).

How can I refresh Pool processes reading from Queue?

I'm using multiprocessing Pool + Queue to share processing work between a parent process (processing with GPUs) and child processes (processing on the CPU). My program looks like this:
from multiprocessing import Pool, JoinableQueue

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()  # Read from the queue and do nothing
        do_cpu_work(msg)
        if msg == 'DONE':
            break

if __name__ == '__main__':
    queue = JoinableQueue()
    pool = Pool(reader_proc, target=(queue,))
    for task in GPUWork:
        results = do_task(task)
        for result in results:
            queue.put(result)
    # put 'DONE' on and join and close
I'm having a severe memory leak right now, even after explicitly deleting every variable in the reader_proc and calling gc.collect(). I'm calling into various C++ libraries from the reader_proc and I suspect one of them could be leaking memory. While I try and debug that, I need to get some processing done on this data.
Is there any way to refresh these reader processes? E.g. periodically terminate them and restart them. This exists as maxtasksperchild for a Pool operating over an iterable, but it doesn't seem to apply to this Queue/Process-based scheme.
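For reference, the maxtasksperchild mechanism the question alludes to looks roughly like this for an iterable-driven Pool (a hedged sketch; cpu_work and items are placeholder names, not from the question):
from multiprocessing import Pool

def cpu_work(item):
    # placeholder for the per-item CPU processing
    return item

if __name__ == '__main__':
    items = range(1000)
    # Each worker process is retired and replaced after 10 tasks, which bounds
    # any per-process memory growth (e.g. from a leaky C++ extension).
    with Pool(processes=4, maxtasksperchild=10) as pool:
        for result in pool.imap_unordered(cpu_work, items):
            pass  # consume results as they arrive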

How do I wait for any thread that ends instead of a specific one?

My script has started many threads. It has kept count. The maximum number of threads are now running. There are more to be run. How can the script wait for any one of the running threads to end so it can safely start another one? It is using threading.Thread() to create each thread but that can be changed if there is a better module. I am using Python 3.6.x.
To create a pool of processes and pass tasks to them:
from multiprocessing import Pool

def processing_task(arg1, arg2):
    ...

with Pool() as worker_pool:  # By default creates processes == number of CPUs
    while True:
        task = some_queue_implementation.get()  # Some blocking method that receives tasks
        worker_pool.apply_async(processing_task, (task.arg1, task.arg2))
This will create child processes that will be idle until they are passed a task
Here is a snippet of code I always use when using threads. It:
sets a certain number of threads;
ensures that no code coming after the context manager block executes until all threads complete; and
cancels the remaining futures and raises the exception if one of the child threads throws one.
import concurrent.futures

futures = []
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures.append(executor.submit(function, args))  # submit each task like this
    for future in concurrent.futures.as_completed(futures):
        if future.exception():
            for child in futures:
                child.cancel()
            raise future.exception()
"It has kept count"
I hope you are guarding that count with a threading.Lock() and .acquire() ;-)
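In case it helps, a minimal sketch of what that quip means (the counter and helper names here are made up): protect the shared count with a threading.Lock so concurrent updates don't race.
import threading

active_count = 0                # shared count of running threads
count_lock = threading.Lock()   # protects active_count

def worker():
    global active_count
    try:
        pass  # ... the real work goes here ...
    finally:
        with count_lock:        # acquire/release around the update
            active_count -= 1

def start_one():
    global active_count
    with count_lock:
        active_count += 1
    threading.Thread(target=worker).start()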

Why won't Python Multiprocessing Workers die?

I'm using the python multiprocessing functionality to map some function across some elements. Something along the lines of this:
import multiprocessing

def computeStuff(arguments, globalData, concurrent=True):
    pool = multiprocessing.Pool(initializer=initWorker, initargs=(globalData,))
    results = pool.map(workerFunction, list(enumerate(arguments)))
    return results

def initWorker(globalData):
    workerFunction.globalData = globalData

def workerFunction((index, argument)):  # Python 2 tuple-parameter unpacking
    pass  # computation here
Generally I run tests in IPython using both CPython and PyPy. I have noticed that the spawned processes often don't get killed, so they start accumulating, each using a gig of RAM. This happens when hitting ctrl-k during a computation, which sends multiprocessing into a big frenzy of confusion. But even when letting the computation finish, those processes won't die in PyPy.
According to the documentation, when the pool gets garbage collected, it should call terminate() and kill all the processes. What's happening here? Do I have to explicitly call close()? If yes, is there some sort of context manager that properly manages closing the resources (i.e. processes)?
This is on Mac OS X Yosemite.
PyPy's garbage collection is lazy, so failing to call close means the Pool is cleaned "sometime", but that might not mean "anytime soon".
Once the Pool is properly closed, the workers exit when they run out of tasks. An easy way to ensure the Pool is closed in pre-3.3 Python is:
from contextlib import closing

def computeStuff(arguments, globalData, concurrent=True):
    with closing(multiprocessing.Pool(initializer=initWorker, initargs=(globalData,))) as pool:
        return pool.map(workerFunction, enumerate(arguments))
Note: I also removed the explicit conversion to list (pointless, since map will iterate the enumerate iterator for you), and returned the results directly (no need to assign to a name only to return on the next line).
If you want to ensure immediate termination in the exception case (on pre-3.3 Python), you'd use a try/finally block, or write a simple context manager (which could be reused for other places where you use a Pool):
from contextlib import contextmanager

@contextmanager
def terminating(obj):
    try:
        yield obj
    finally:
        obj.terminate()

def computeStuff(arguments, globalData, concurrent=True):
    with terminating(multiprocessing.Pool(initializer=initWorker, initargs=(globalData,))) as pool:
        return pool.map(workerFunction, enumerate(arguments))
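For completeness, the try/finally variant mentioned above would look roughly like this (a sketch reusing initWorker/workerFunction from the question's code):
def computeStuff(arguments, globalData, concurrent=True):
    pool = multiprocessing.Pool(initializer=initWorker, initargs=(globalData,))
    try:
        return pool.map(workerFunction, enumerate(arguments))
    finally:
        pool.terminate()  # workers are killed even if map raises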
The terminating approach is superior in that it guarantees the processes exit immediately; in theory, if you're using threads elsewhere in your main program, the Pool workers might be forked with non-daemon threads, which would keep the processes alive even when the worker task thread exited; terminating handles this by killing the processes forcibly.
If your interpreter is Python 3.3 or higher, the terminating approach is built into Pool, so no special wrapper is needed for the with statement: with multiprocessing.Pool(initializer=initWorker, initargs=(globalData,)) as pool: works directly.
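Concretely, on Python 3.3+ the whole function reduces to something like this (again reusing the question's initWorker/workerFunction):
import multiprocessing

def computeStuff(arguments, globalData, concurrent=True):
    # Pool.__exit__ calls terminate(), so the workers are killed on the way out
    with multiprocessing.Pool(initializer=initWorker, initargs=(globalData,)) as pool:
        return pool.map(workerFunction, enumerate(arguments))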

How to list Processes started by multiprocessing Pool?

While attempting to store multiprocessing's Process instances in the multiprocessing list variable poolList, I am getting the following exception:
SimpleQueue objects should only be shared between processes through inheritance
The reason why I would like to store the PROCESS instances in a variable is to be able to terminate all or just some of them later (if, for example, a PROCESS freezes). If storing a PROCESS in a variable is not an option, I would like to know how to get or to list all the PROCESSES started by a multiprocessing POOL. That would be very similar to what the .current_process() method does, except that .current_process() gets only a single process while I need all the processes started or all the processes currently running.
Two questions:
Is it even possible to store an instance of the Process (as a result of mp.current_process())?
Currently I am only able to get a single process from inside the function that the process is running (from inside myFunct(), using the .current_process() method).
Instead, I would like to list all the processes currently run by multiprocessing. How do I achieve that?
import multiprocessing as mp
from multiprocessing import Pool

poolList = mp.Manager().list()
poolDict = mp.Manager().dict()

def myFunct(arg):
    print 'myFunct(): current process:', mp.current_process()
    try: poolList.append(mp.current_process())
    except Exception, e: print e

    for i in range(110):
        for n in range(500000):
            pass
        poolDict[arg] = i
    print 'myFunct(): completed', arg, poolDict

myArgsList = ['arg1', 'arg2', 'arg3']

pool = Pool(processes=2)
pool.map_async(myFunct, myArgsList)
pool.close()
pool.join()
To list the processes started by a Pool() instance (which is what you mean, if I understand you correctly), there is the pool._pool list, and it contains the instances of the processes.
However, it is not part of the documented interface and hence, really should not be used.
BUT...it seems a little bit unlikely that it would change just like that anyway. I mean, should they stop having an internal list of processes in the pool? And not call that _pool?
And also, it annoys me that there isn't at least a get-processes method, or something.
And handling it breaking due to some name change should not be that difficult.
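If you do rely on it, one hedged way to soften a future rename (a sketch, not part of the original answer) is to fail loudly when the private attribute disappears:
from multiprocessing import Pool

def pool_processes(pool):
    # _pool is undocumented; guard against it being renamed or removed
    procs = getattr(pool, '_pool', None)
    if procs is None:
        raise RuntimeError('Pool no longer exposes _pool; update this helper '
                           'for your Python version')
    return list(procs)

if __name__ == '__main__':
    with Pool(3) as p:
        for proc in pool_processes(p):
            print(proc.pid, proc.is_alive())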
But still, use at your own risk:
from multiprocessing import pool

# Have to run in main
if __name__ == '__main__':
    # Create 3 worker processes
    _my_pool = pool.Pool(3)

    # Loop, terminate, and remove from the process list
    # Use a copy [:] of the list to remove items correctly
    for _curr_process in _my_pool._pool[:]:
        print("Terminating process " + str(_curr_process.pid))
        _curr_process.terminate()
        _my_pool._pool.remove(_curr_process)

    # If you call _repopulate, the pool will again contain 3 worker processes.
    _my_pool._repopulate_pool()

    for _curr_process in _my_pool._pool[:]:
        print("After repopulation " + str(_curr_process.pid))
The example creates a pool and manually terminates all processes.
It is important that you remember to delete the process you terminate from the pool yourself if you want Pool() to continue working as usual.
_my_pool._repopulate_pool() increases the number of worker processes to 3 again; it is not needed to answer the question, but gives a little bit of behind-the-scenes insight.
Yes, you can get all active child processes and act on them based on the process name, e.g.
multiprocessing.Process(target=foo, name="refresh-reports")
and then
for p in multiprocessing.active_children():
    if p.name == "refresh-reports":
        p.terminate()
You're creating a managed List object, but then letting the associated Manager object expire.
Process objects aren't shareable because they aren't pickle-able; that is, they aren't simple.
Oddly the multiprocessing module doesn't have the equivalent of threading.enumerate() -- that is, you can't list all outstanding processes. As a workaround, I just store procs in a list. I never terminate() a process, but do sys.exit(0) in the parent. It's rough, because the workers will leave things in an inconsistent state, but it's okay for smaller programs
To kill a frozen worker, I suggest: 1) worker receives "heartbeat" jobs in a queue every now and then, 2) if parent notices worker A hasn't responded to a heartbeat in a certain amount of time, then p.terminate(). Consider restating the problem in another SO question, as it's interesting.
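A rough sketch of that heartbeat/watchdog idea (hedged: all names here are invented, and it omits error handling): workers report in through a queue, and the parent terminates any worker that has been silent too long.
import time
from multiprocessing import Process, Queue

HEARTBEAT_TIMEOUT = 30.0  # seconds of silence before a worker counts as frozen

def worker(worker_id, jobs, heartbeats):
    while True:
        job = jobs.get()
        if job is None:
            return
        heartbeats.put((worker_id, time.time()))  # report in before doing work
        # ... do the real work on `job` here ...

def watchdog(procs, heartbeats):
    # procs maps worker_id -> Process
    last_seen = {worker_id: time.time() for worker_id in procs}
    while any(p.is_alive() for p in procs.values()):
        while not heartbeats.empty():
            worker_id, ts = heartbeats.get()
            last_seen[worker_id] = ts
        for worker_id, proc in procs.items():
            if proc.is_alive() and time.time() - last_seen[worker_id] > HEARTBEAT_TIMEOUT:
                proc.terminate()  # kill the frozen worker
        time.sleep(1)

if __name__ == '__main__':
    jobs, heartbeats = Queue(), Queue()
    procs = {i: Process(target=worker, args=(i, jobs, heartbeats)) for i in range(4)}
    for p in procs.values():
        p.start()
    for item in ['job-a', 'job-b', None, None, None, None]:  # real jobs, then one sentinel per worker
        jobs.put(item)
    watchdog(procs, heartbeats)  # returns once every worker has exited or been killed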
To be honest the map stuff is much easier than using a Manager.
Here's a Manager example I've used. A worker adds stuff to a shared list. Another worker occasionally wakes up, processes everything on the list, then goes back to sleep. The code also has verbose logs, which are essential for ease in debugging.
# producer adds to fixed-sized list; scanner uses them
import logging, multiprocessing, sys, time

def producer(objlist):
    '''
    add an item to list every sec; ensure fixed size list
    '''
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        try:
            time.sleep(1)
        except KeyboardInterrupt:
            return
        msg = 'ding: {:04d}'.format(int(time.time()) % 10000)
        logger.info('put: %s', msg)
        del objlist[0]
        objlist.append( msg )

def scanner(objlist):
    '''
    every now and then, run calculation on objlist
    '''
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        try:
            time.sleep(5)
        except KeyboardInterrupt:
            return
        logger.info('items: %s', list(objlist))

def main():
    logger = multiprocessing.log_to_stderr(
        level=logging.INFO
    )
    logger.info('setup')

    # create fixed-length list, shared between producer & consumer
    manager = multiprocessing.Manager()
    my_objlist = manager.list(  # pylint: disable=E1101
        [None] * 10
    )

    multiprocessing.Process(
        target=producer,
        args=(my_objlist,),
        name='producer',
    ).start()

    multiprocessing.Process(
        target=scanner,
        args=(my_objlist,),
        name='scanner',
    ).start()

    logger.info('running forever')
    try:
        manager.join()  # wait until both workers die
    except KeyboardInterrupt:
        pass
    logger.info('done')

if __name__ == '__main__':
    main()
