I am trying to run inference with tensorflow using multiprocessing. Each process uses 1 GPU. I have a list of files input_files[]. Every process gets one file, runs the model.predict on it and writes the results to file. To move on to next file, I need to close the process and restart it. This is because tensorflow doesn't let go of memory. So if I use the same process, I get memory leak.
I have written a code below which works. I start 5 processes, close them and start another 5. The issue is that all processes need to wait for the slowest one before they can move on. How can I start and close each process independent of the others?
Note that Pool.map is over input_files_small not input_files.
file1 --> start new process --> run prediction --> close process --> file2 --> start new process --> etc.
for i in range(0, len(input_files), num_process):
input_files_small = input_files[i:i+num_process]
try:
process_pool = multiprocessing.Pool(processes=num_process, initializer=init_worker, initargs=(gpu_ids))
pool_output = process_pool.map(worker_fn, input_files_small)
finally:
process_pool.close()
process_pool.join()
There is no need to re-create over and over the processing pool. First, specify maxtasksperchild=1 when creating the pool. This should result in creating a new process for each new task submitted. And instead of using method map, use method map_async, which will not block. You can use pool.close followed by pool.join() to wait for these submissions to complete implicitly if your worker function does not return results you need, as follows or use the second code variation:
process_pool = multiprocessing.Pool(processes=num_process, initializer=init_worker, initargs=(gpu_ids), maxtasksperchild=1)
for i in range(0, len(input_files), num_process):
input_files_small = input_files[i:i+num_process]
process_pool.map_async(worker_fn, input_files_small))
# wait for all outstanding tasks to complete
process_pool.close()
process_pool.join()
If you need return values from worker_fn:
process_pool = multiprocessing.Pool(processes=num_process, initializer=init_worker, initargs=(gpu_ids), maxtasksperchild=1)
results = []
for i in range(0, len(input_files), num_process):
input_files_small = input_files[i:i+num_process]
results.append(process_pool.map_async(worker_fn, input_files_small))
# get return values from map_async
pool_outputs = [result.get() for result in results]
# you do not need process_pool.close() and process_pool.join()
But, since there may be some "slow" tasks still running from an earlier invocation of map_async when tasks from a later invocation of map_async start up, some of these tasks may still have to wait to run. But at least all of your processes in the pool should stay fairly busy.
If you are expecting exceptions from your worker function and need to handle them in your main process, it gets more complicated.
Related
I want to do multiple transformations on some data. I figured I can use multiple Pool.imap's because each of the transformations is just a simple map. And Pool.imap is lazy, so it only does computation when needed.
But strangely, it looks like multiple consecutive Pool.imap's are blocking. And not lazy. Look at the following code as an example.
import time
from multiprocessing import Pool
def slow(n):
time.sleep(0.01)
return n*n
for i in [10, 100, 1000]:
with Pool() as p:
numbers = range(i)
iter1 = p.imap(slow, numbers)
iter2 = p.imap(slow, iter1)
start = time.perf_counter()
next(iter2)
print(i, time.perf_counter() - start)
# Prints
# 10 0.0327413540071575
# 100 0.27094774100987706
# 1000 2.6275791430089157
As you can see the time to the first element is increasing. I have 4 cores on my machine, so it roughly takes 2.5 seconds to process 1000 items with a 0.01 second delay. Hence, I think two consecutive Pool.imap's are blocking. And that the first Pool.imap finishes the entire workload before the second one starts. That is not lazy.
I've did some additional research. It does not matter if I use a process pool or a thread pool. It happens with Pool.imap and Pool.imap_unordered. The blocking takes longer when I do a third Pool.imap. A single Pool.imap is not blocking. This bug report seems related but different.
TL;DR imap is not a real generator, meaning it does not generate items on-demand (lazy computation aka similar to coroutine), and pools initiate "jobs" in serial.
longer answer: Every type of submission to a Pool be it imap, apply, apply_async etc.. gets written to a queue of "jobs". This queue is read by a thread in the main process (pool._handle_tasks) in order to allow jobs to continue to be initiated while the main process goes off and does other things. This thread contains a very simple double for loop (with a lot of error handling) that basically iterates over each job, then over each task within each job. The inner loop blocks until a worker is available to get each task, meaning tasks (and jobs) are always started in serial in the exact order they were submitted. This does not mean they will finish in perfect serial, which is why map, and imap collect results, and re-order them back to their original order (handled by pool._handle_resluts thread) before passing back to the main thread.
Rough pseudocode of what's going on:
#task_queue buffers task inputs first in - first out
pool.imap(foo, ("bar", "baz", "bat"), chunksize=1)
#put an iterator on the task queue which will yield "chunks" (a chunk is given to a single worker process to compute)
pool.imap(fun, ("one", "two", "three"), chunksize=1)
#put a second iterator to the task queue
#inside the pool._task_handler thread within the main proces
for task in task_queue: #[imap_1, imap_2]
#this is actually a while loop in reality that tries to get new tasks until the pool is close()'d
for chunk in task:
_worker_input_queue.put(chunk) # give the chunk to the next available worker
# This blocks until a worker actually takes the chunk, meaning the loop won't
# continue until all chunks are taken by workers.
def worker_function(_worker_input_queue, _worker_output_queue):
while True:
task = _worker_input_queue.get() #get the next chunk of tasks
#if task == StopSignal: break
result = task.func(task.args)
_worker_output_queue.put(result) #results are collected, and re-ordered
# by another thread in the main process
# as they are completed.
I'm using multiprocessing Pool + Queue to share processing work between a parent process (processing with GPUs) and child processes (processing on the CPU). My program looks like this:
def reader_proc(queue):
## Read from the queue; this will be spawned as a separate Process
while True:
msg = queue.get() # Read from the queue and do nothing
do_cpu_work(msg)
if (msg == 'DONE'):
break
if __name__=='__main__':
queue = JoinableQueue()
pool = Pool(reader_proc, target=(queue,))
for task in GPUWork:
results = do_task(task)
for result in results:
queue.put(task)
# put 'DONE' on and join and close
I'm having a severe memory leak right now, even after explicitly deleting every variable in the reader_proc and calling gc.collect(). I'm calling into various C++ libraries from the reader_proc and I suspect one of them could be leaking memory. While I try and debug that, I need to get some processing done on this data.
Is there any way to refresh these reader processes? E.g. periodically terminate them and restart them. This exists with maxtasksperchild for a Pool operating on an iter but doesn't seem to apply to this Queue / Process based scheme.
please be warned that this demonstration code generates a few GB data.
I have been using versions of the code below for multiprocessing for some time. It works well when the run time of each process in the pool is similar but if one process takes much longer I end up with many blocked processes waiting on the one, so I'm trying to make it run asynchronously - just for one function at a time.
For example, if I have 70 cores and need to run a function 2000 times I want that to run asynchronously then wait for the last process before calling the next function. Currently it just submits processes in batches of how ever many cores I give it and each batch has to wait for the longest process.
As you can see I've tried using map_async but this is clearly the wrong syntax. Can anyone help me out?
import os
p='PATH/test/'
def f1(tup):
x,y=tup
to_write = x*(y**5)
with open(p+x+str(y)+'.txt','w') as fout:
fout.write(to_write)
def f2(tup):
x,y=tup
print (os.path.exists(p+x+str(y)+'.txt'))
def call_func(f,nos,threads,call):
print (call)
for i in range(0, len(nos), threads):
print (i)
chunk = nos[i:i + threads]
tmp = [('args', no) for no in chunk]
pool.map(f, tmp)
#pool.map_async(f, tmp)
nos=[i for i in range(55)]
threads=8
if __name__ == '__main__':
with Pool(processes=threads) as pool:
call_func(f1,nos,threads,'f1')
call_func(f2,nos,threads,'f2')
map will only return and map_async will only call the callback after all tasks of the current chunk are done.
So you can only either give all tasks to map/map_async at once or use apply_async (initially called threads times) where the callback calls apply_asyncfor the next task.
If the actual return values of the call don't matter (or at least their order doesn't), imap_unordered may be another efficient solution when giving it all tasks at once (or an iterator/generator producing the tasks on demand)
I'm building a Python script/application which launches multiple so-called Fetchers.
They in turn do something and return data into a queue.
I want to make sure the Fetchers don't run for more than 60 seconds (because the entire application runs multiple times in one hour).
Reading the Python docs i noticed they say to be carefull when using Process.Terminate() because it can break the Queue.
My current code:
# Result Queue
resultQueue = Queue();
# Create Fetcher Instance
fetcher = fetcherClass()
# Create Fetcher Process List
fetcherProcesses = []
# Run Fetchers
for config in configList:
# Create Process to encapsulate Fetcher
log.debug("Creating Fetcher for Target: %s" % config['object_name'])
fetcherProcess = Process(target=fetcher.Run, args=(config,resultQueue))
log.debug("Starting Fetcher for Target: %s" % config['object_name'])
fetcherProcess.start()
fetcherProcesses.append((config, fetcherProcess))
# Wait for all Workers to complete
for config, fetcherProcess in fetcherProcesses:
log.debug("Waiting for Thread to complete (%s)." % str(config['object_name']))
fetcherProcess.join(DEFAULT_FETCHER_TIMEOUT)
if fetcherProcess.is_alive():
log.critical("Fetcher thread for object %s Timed Out! Terminating..." % config['object_name'])
fetcherProcess.terminate()
# Loop thru results, and save them in RRD
while not resultQueue.empty():
config, fetcherResult = resultQueue.get()
result = storage.Save(config, fetcherResult)
I want to make sure my Queue doesn't get corrupted when one of my Fetchers times out.
What is the best way to do this?
Edit: In response to a chat with sebdelsol a few clarifications:
1) I want to start processing data as soon as possible, because otherwise i have to perform a lot of Disk Intensive operations all at once. So sleeping the main thread for X_Timeout is not an option.
2) I need to wait for the Timeout only once, but per process, so if the main thread launches 50 fetchers, and this takes a few seconds to half a minute, i need to compensate.
3) I want to make sure the data that comes from Queue.Get() is put there by a Fetcher that didn't timeout (since it is theoretically possible that a fetcher was putting the data in the Queue, when the timeout occured, and it was shot to death...) That data should be dumped.
It's not a very bad thing when a timeout occurs, it's not a desirable situation, but corrupt data is worse.
You could pass a new multiprocessing.Lock() to every fetcher you start.
In the fetcher's process, be sure to wrap the Queue.put() with this lock:
with self.lock:
self.queue.put(result)
When you need to terminate a fetcher's process, use its lock :
with fetcherLock:
fetcherProcess.terminate()
This way, your queue won't get corrupted by killing a fetcher during the queue access.
Some fetcher's locks could get corrupted. But, that's not an issue since every new fetcher you launch has a brand new lock.
Why not
create a new queue and start all the fetchers that will use this
queue.
have your script actually sleep the amount of time you want the fetcher's processes to have for getting a result.
get everything from the resultQueue - it won't be corrupted, since you didn't have to kill any process.
finally, terminate all fetcher's processes that are still alive.
loop !
While attempting to store multiprocessing's process instance in multiprocessing list-variable 'poolList` I am getting a following exception:
SimpleQueue objects should only be shared between processes through inheritance
The reason why I would like to store the PROCESS instances in a variable is to be able to terminate all or just some of them later (if for example a PROCESS freezes). If storing a PROCESS in variable is not an option I would like to know how to get or to list all the PROCESSES started by mutliprocessing POOL. That would be very similar to what .current_process() method does. Except .current_process gets only a single process while I need all the processes started or all the processes currently running.
Two questions:
Is it even possible to store an instance of the Process (as a result of mp.current_process()
Currently I am only able to get a single process from inside of the function that the process is running (from inside of myFunct() using .current_process() method).
Instead I would like to to list all the processes currently running by multiprocessing. How to achieve it?
import multiprocessing as mp
poolList=mp.Manager().list()
def myFunct(arg):
print 'myFunct(): current process:', mp.current_process()
try: poolList.append(mp.current_process())
except Exception, e: print e
for i in range(110):
for n in range(500000):
pass
poolDict[arg]=i
print 'myFunct(): completed', arg, poolDict
from multiprocessing import Pool
pool = Pool(processes=2)
myArgsList=['arg1','arg2','arg3']
pool=Pool(processes=2)
pool.map_async(myFunct, myArgsList)
pool.close()
pool.join()
To list the processes started by a Pool()-instance(which is what you mean if I understand you correctly), there is the pool._pool-list. And it contains the instances of the processes.
However, it is not part of the documented interface and hence, really should not be used.
BUT...it seems a little bit unlikely that it would change just like that anyway. I mean, should they stop having an internal list of processes in the pool? And not call that _pool?
And also, it annoys me that there at least isn't a get processes-method. Or something.
And handling it breaking due to some name change should not be that difficult.
But still, use at your own risk:
from multiprocessing import pool
# Have to run in main
if __name__ == '__main__':
# Create 3 worker processes
_my_pool = pool.Pool(3)
# Loop, terminate, and remove from the process list
# Use a copy [:] of the list to remove items correctly
for _curr_process in _my_pool._pool[:]:
print("Terminating process "+ str(_curr_process.pid))
_curr_process.terminate()
_my_pool._pool.remove(_curr_process)
# If you call _repopulate, the pool will again contain 3 worker processes.
_my_pool._repopulate_pool()
for _curr_process in _my_pool._pool[:]:
print("After repopulation "+ str(_curr_process.pid))
The example creates a pool and manually terminates all processes.
It is important that you remember to delete the process you terminate from the pool yourself i you want Pool() to continue working as usual.
_my_pool._repopulate increases the number of working processes to 3 again, not needed to answer the question, but gives a little bit of behind-the-scenes insight.
Yes you can get all active process and perform action based on name of process
e.g
multiprocessing.Process(target=foo, name="refresh-reports")
and then
for p in multiprocessing.active_children():
if p.name == "foo":
p.terminate()
You're creating a managed List object, but then letting the associated Manager object expire.
Process objects are shareable because they aren't pickle-able; that is, they aren't simple.
Oddly the multiprocessing module doesn't have the equivalent of threading.enumerate() -- that is, you can't list all outstanding processes. As a workaround, I just store procs in a list. I never terminate() a process, but do sys.exit(0) in the parent. It's rough, because the workers will leave things in an inconsistent state, but it's okay for smaller programs
To kill a frozen worker, I suggest: 1) worker receives "heartbeat" jobs in a queue every now and then, 2) if parent notices worker A hasn't responded to a heartbeat in a certain amount of time, then p.terminate(). Consider restating the problem in another SO question, as it's interesting.
To be honest the map stuff is much easier than using a Manager.
Here's a Manager example I've used. A worker adds stuff to a shared list. Another worker occasionally wakes up, processes everything on the list, then goes back to sleep. The code also has verbose logs, which are essential for ease in debugging.
source
# producer adds to fixed-sized list; scanner uses them
import logging, multiprocessing, sys, time
def producer(objlist):
'''
add an item to list every sec; ensure fixed size list
'''
logger = multiprocessing.get_logger()
logger.info('start')
while True:
try:
time.sleep(1)
except KeyboardInterrupt:
return
msg = 'ding: {:04d}'.format(int(time.time()) % 10000)
logger.info('put: %s', msg)
del objlist[0]
objlist.append( msg )
def scanner(objlist):
'''
every now and then, run calculation on objlist
'''
logger = multiprocessing.get_logger()
logger.info('start')
while True:
try:
time.sleep(5)
except KeyboardInterrupt:
return
logger.info('items: %s', list(objlist))
def main():
logger = multiprocessing.log_to_stderr(
level=logging.INFO
)
logger.info('setup')
# create fixed-length list, shared between producer & consumer
manager = multiprocessing.Manager()
my_objlist = manager.list( # pylint: disable=E1101
[None] * 10
)
multiprocessing.Process(
target=producer,
args=(my_objlist,),
name='producer',
).start()
multiprocessing.Process(
target=scanner,
args=(my_objlist,),
name='scanner',
).start()
logger.info('running forever')
try:
manager.join() # wait until both workers die
except KeyboardInterrupt:
pass
logger.info('done')
if __name__=='__main__':
main()