I want to run parallel computation on some input data which is loaded from a file. (The file can be really big, so I use a generator for this.)
Up to a certain number of items my code runs OK, but above this threshold the program hangs (some of the worker processes never end).
Any suggestions? (I am running this with Python 2.7 on 8 CPUs; 5,000 lines still work, 7,500 do not.)
Firstly, you need an input file. Generate it in bash:
for i in {0..10000}; do echo -e "$i"'\r' >> counter.txt; done
Then, run this:
python2.7 main.py 100 counter.txt > run_log.txt
main.py:
#!/usr/bin/python2.7
import os, sys, signal, time
import Queue
import multiprocessing as mp

def eat_queue(job_queue, result_queue):
    """Eats input queue, feeds output queue
    """
    proc_name = mp.current_process().name
    while True:
        try:
            job = job_queue.get(block=False)
            if job == None:
                print(proc_name + " DONE")
                return
            result_queue.put(execute(job))
        except Queue.Empty:
            pass

def execute(x):
    """Does the computation on the input data
    """
    return x*x

def save_result(result):
    """Saves results in a list
    """
    result_list.append(result)

def load(ifilename):
    """Generator reading the input file and
    yielding it row by row
    """
    ifile = open(ifilename, "r")
    for line in ifile:
        line = line.strip()
        num = int(line)
        yield (num)
    ifile.close()
    print("file closed".upper())

def put_tasks(job_queue, ifilename):
    """Feeds the job queue
    """
    for item in load(ifilename):
        job_queue.put(item)
    for _ in range(get_max_workers()):
        job_queue.put(None)

def get_max_workers():
    """Returns optimal number of processes to run
    """
    max_workers = mp.cpu_count() - 2
    if max_workers < 1:
        return 1
    return max_workers

def run(workers_num, ifilename):
    job_queue = mp.Queue()
    result_queue = mp.Queue()
    # decide how many processes are to be created
    max_workers = get_max_workers()
    print "processes available: %d" % max_workers
    if workers_num < 1 or workers_num > max_workers:
        workers_num = max_workers
    workers_list = []
    # a process for feeding job queue with the input file
    task_gen = mp.Process(target=put_tasks, name="task_gen",
                          args=(job_queue, ifilename))
    workers_list.append(task_gen)
    for i in range(workers_num):
        tmp = mp.Process(target=eat_queue, name="w%d" % (i+1),
                         args=(job_queue, result_queue))
        workers_list.append(tmp)
    for worker in workers_list:
        worker.start()
    for worker in workers_list:
        worker.join()
        print "worker %s finished!" % worker.name

if __name__ == '__main__':
    result_list = []
    args = sys.argv
    workers_num = int(args[1])
    ifilename = args[2]
    run(workers_num, ifilename)
This is because nothing in your code takes anything off result_queue. The behavior then depends on internal queue buffering details: if "not a lot" of data is waiting, everything appears fine, but if "a lot" of data is waiting, everything freezes. Not much more can be said, because it involves layers of internal magic ;-) But the docs do warn about it:
Warning
As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
One easy way to repair that: First add
result_queue.put(None)
before eat_queue() returns. Then add:
count = 0
while count < workers_num:
    if result_queue.get() is None:
        count += 1
before the main program .join()s the workers. That drains the result queue, and everything shuts down cleanly then.
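For reference, here is a minimal Python 3 sketch of the question's run() with the fix applied: workers block on get(), send a None sentinel on result_queue when they stop, and the parent drains result_queue until it has seen one sentinel per worker before calling join(). It is a sketch under assumptions: the items come from an iterable instead of counter.txt, and the worker count is passed in directly rather than derived from get_max_workers().

```python
import multiprocessing as mp


def execute(x):
    """Stand-in for the real computation (same as the question's execute)."""
    return x * x


def eat_queue(job_queue, result_queue):
    # Blocking get(): the worker sleeps until a job or the None stop signal arrives.
    for job in iter(job_queue.get, None):
        result_queue.put(execute(job))
    # Tell the parent this worker will not put any more results.
    result_queue.put(None)


def put_tasks(job_queue, items, workers_num):
    for item in items:
        job_queue.put(item)
    for _ in range(workers_num):
        job_queue.put(None)  # one stop signal per worker


def run(workers_num, items):
    """Assumes workers_num >= 1."""
    job_queue = mp.Queue()
    result_queue = mp.Queue()
    procs = [mp.Process(target=put_tasks, args=(job_queue, items, workers_num))]
    procs += [mp.Process(target=eat_queue, args=(job_queue, result_queue))
              for _ in range(workers_num)]
    for p in procs:
        p.start()

    # Drain result_queue BEFORE joining, so no child is stuck flushing its buffer.
    results, finished = [], 0
    while finished < workers_num:
        item = result_queue.get()
        if item is None:
            finished += 1
        else:
            results.append(item)

    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(len(run(4, range(7500))))
```

With 7,500 items (or many more) this should finish instead of hanging, because no child is left waiting to flush buffered results into a pipe that nobody reads.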
BTW, this code is pretty bizarre:
while True:
    try:
        job = job_queue.get(block=False)
        if job == None:
            print(proc_name + " DONE")
            return
        result_queue.put(execute(job))
    except Queue.Empty:
        pass
Why are you doing non-blocking get()? This turns into a CPU-hog "busy loop" so long as the queue is empty. The primary point of .get() is to supply an efficient way to wait for work to show up. So:
while True:
    job = job_queue.get()
    if job is None:
        print(proc_name + " DONE")
        break
    else:
        result_queue.put(execute(job))
result_queue.put(None)
does the same thing, but far more efficiently.
Queue size caution
You didn't ask about this, but let's cover it before it bites you ;-) By default, there is no bound on a Queue's size. If, e.g., you add a billion items to the Queue, it will demand enough RAM to hold a billion items. So if your producer(s) can generate work items faster than your consumer(s) can process them, memory use can get out of hand quickly.
Fortunately, that's easy to repair: specify a maximum queue size. For example,
job_queue = mp.Queue(maxsize=10*workers_num)  # note the maxsize argument
Then job_queue.put(some_work_item) will block until consumers reduce the size of the queue to less than the maximum. This way you can process enormous problems with a queue that requires trivial RAM.
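A small way to see the bound in action without any worker processes: the hypothetical helper fill_until_full below uses put(block=False), which raises queue.Full at exactly the point where a plain put() would block. It assumes a positive maxsize (multiprocessing treats maxsize <= 0 as unbounded).

```python
import multiprocessing as mp
import queue  # multiprocessing queues raise queue.Full and queue.Empty


def fill_until_full(maxsize):
    """Hypothetical helper: count how many items a bounded queue accepts
    before put() would block. Assumes maxsize > 0."""
    q = mp.Queue(maxsize=maxsize)
    accepted = 0
    try:
        while True:
            q.put(accepted, block=False)  # raises queue.Full instead of blocking
            accepted += 1
    except queue.Full:
        pass
    # Each get() frees a slot again; take everything back out.
    drained = [q.get() for _ in range(accepted)]
    return accepted, drained


if __name__ == "__main__":
    print(fill_until_full(5))  # prints (5, [0, 1, 2, 3, 4])
```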
Related
I have a large number of tasks (40,000 to be exact) that I am using a Pool to run in parallel. To maximize efficiency, I pass the list of all tasks at once to starmap and let them run.
I would like to have it so that if my program is broken using Ctrl+C then currently running tasks will be allowed to finish but new ones will not be started. I have figured out the signal handling part to handle the Ctrl+C breaking just fine using the recommended method and this works well (at least with Python 3.6.9 that I am using):
import os
import signal
import random as rand
import multiprocessing as mp

def init():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def child(a, b, c):
    st = rand.randrange(5, 20+1)
    print("Worker thread", a+1, "sleep for", st, "...")
    os.system("sleep " + str(st))

pool = mp.Pool(initializer=init)
try:
    pool.starmap(child, [(i, 2*i, 3*i) for i in range(10)])
    pool.close()
    pool.join()
    print("True exit!")
except KeyboardInterrupt:
    pool.terminate()
    pool.join()
    print("Interrupted exit!")
The problem is that Pool seems to have no function to let the currently running tasks complete and then stop. It only has terminate and close. In the example above I use terminate but this is not what I want as this immediately terminates all running tasks (whereas I want to let the currently running tasks run to completion). On the other hand, close simply prevents adding more tasks, but calling close then join will wait for all pending tasks to complete (40,000 of them in my real case) (whereas I only want currently running tasks to finish not all of them).
I could somehow gradually add my tasks one by one or in chunks so I could use close and join when interrupted, but this seems less efficient unless there is a way to add a new task as soon as one finishes manually (which I'm not seeing how to do from the Pool documentation). It really seems like my use case would be common and that Pool should have a function for this, but I have not seen this question asked anywhere (or maybe I'm just not searching for the right thing).
Does anyone know how to accomplish this easily?
I tried to do something similar with concurrent.futures - see the last code block in this answer: it attempts to throttle adding tasks to the pool and only adds new tasks as tasks complete. You could change the logic to fit your needs. Maybe keep the pending work items slightly greater than the number of workers so you don't starve the executor. Something like:
import concurrent.futures
import random as rand
import signal
import sys
import time

def child(*args, n=0):
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    a, b, c = args
    st = rand.randrange(1, 5)
    time.sleep(st)
    x = f"Worker {n} thread {a+1} slept for {st} - args:{args}"
    return (n, x)

if __name__ == '__main__':
    nworkers = 5  # ncpus?
    results = []
    fs = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=nworkers) as executor:
        data = ((i, 2*i, 3*i) for i in range(100))
        for n, args in enumerate(data):
            try:
                # limit pending tasks (_pending_work_items is a private attribute)
                while len(executor._pending_work_items) >= nworkers + 2:
                    # wait till one completes and get the result
                    futures = concurrent.futures.wait(fs, return_when=concurrent.futures.FIRST_COMPLETED)
                    #print(futures)
                    results.extend(future.result() for future in futures.done)
                    print(f'{len(results)} results so far')
                    fs = list(futures.not_done)
                print(f'add a new task {n}')
                fs.append(executor.submit(child, *args, **{'n': n}))
            except KeyboardInterrupt as e:
                print('ctrl-c!!', file=sys.stderr)
                # don't add any more tasks
                break
        # get leftover results as they finish
        for future in concurrent.futures.as_completed(fs):
            print(f'{len(executor._pending_work_items)} tasks pending:')
            result = future.result()
            results.append(result)
    results.sort()
    # separate the results from the value used to sort
    for n, result in results:
        print(result)
Here is a way to get the results sorted in submission order without modifying the task. It uses a dictionary to relate each future to its submission order and uses it for the sort key.
# same imports as above (concurrent.futures, random as rand, signal, sys, time)

def child(*args):
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    a, b, c = args
    st = rand.randrange(1, 5)
    time.sleep(st)
    x = f"Worker thread {a+1} slept for {st} - args:{args}"
    return x

if __name__ == '__main__':
    nworkers = 5  # ncpus?
    sort_dict = {}
    results = []
    fs = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=nworkers) as executor:
        data = ((i, 2*i, 3*i) for i in range(100))
        for n, args in enumerate(data):
            try:
                # limit pending tasks
                while len(executor._pending_work_items) >= nworkers + 2:
                    # wait till one completes and grab it
                    futures = concurrent.futures.wait(fs, return_when=concurrent.futures.FIRST_COMPLETED)
                    results.extend(future for future in futures.done)
                    print(f'{len(results)} futures completed so far')
                    fs = list(futures.not_done)
                future = executor.submit(child, *args)
                fs.append(future)
                print(f'task {n} added - future:{future}')
                sort_dict[future] = n
            except KeyboardInterrupt as e:
                print('ctrl-c!!', file=sys.stderr)
                # don't add any more tasks
                break
        # get leftover futures as they finish
        for future in concurrent.futures.as_completed(fs):
            print(f'{len(executor._pending_work_items)} tasks pending:')
            results.append(future)
    # sort the futures
    results.sort(key=lambda f: sort_dict[f])
    # get the results
    for future in results:
        print(future.result())
You could also just add an attribute to each future and sort on that (no need for the dictionary)
...
future = executor.submit(child, *args)
# add an attribute to the future that can be sorted on
future.submitted = n
fs.append(future)
...
results.sort(key=lambda f: f.submitted)
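Since the question asks specifically for "finish what is running, start nothing new", another option worth trying (assuming Python 3.9 or later) is Executor.shutdown(wait=True, cancel_futures=True): it cancels every future that has not started yet and waits for the ones that are already running. The sketch below submits everything up front like the question's starmap call; child, ignore_sigint and stop_gracefully are illustrative names. Two caveats: ProcessPoolExecutor queues roughly one task beyond max_workers ahead of time, so slightly more than max_workers tasks may still run after Ctrl+C, and I have only reasoned about POSIX behaviour for Ctrl+C interrupting the blocking result() call.

```python
import concurrent.futures as cf
import signal
import time


def ignore_sigint():
    # Runs once in every worker process: Ctrl+C should only reach the parent.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def child(a, b, c):
    time.sleep(0.2)  # stand-in for the real task
    return a + b + c


def stop_gracefully(executor, futures):
    """Cancel tasks that have not started yet, then wait for the running ones."""
    executor.shutdown(wait=True, cancel_futures=True)  # cancel_futures needs Python 3.9+
    results = [f.result() for f in futures if not f.cancelled()]
    cancelled = sum(f.cancelled() for f in futures)
    return results, cancelled


def main(n_tasks=20, nworkers=4):
    executor = cf.ProcessPoolExecutor(max_workers=nworkers, initializer=ignore_sigint)
    futures = []
    try:
        for i in range(n_tasks):
            futures.append(executor.submit(child, i, 2 * i, 3 * i))
        results = [f.result() for f in futures]  # Ctrl+C lands here in the parent
        executor.shutdown()
        print("True exit!")
    except KeyboardInterrupt:
        results, cancelled = stop_gracefully(executor, futures)
        print(f"Interrupted exit! {len(results)} finished, {cancelled} never started")
    return results


if __name__ == "__main__":
    main()
```

Unlike Pool.terminate(), nothing that has already started is killed; unlike close() plus join(), the thousands of queued-but-unstarted tasks are dropped.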
How do I use a pipe correctly with more than two processes (e.g. one producer and several consumers)?
This code fails on Linux, but works fine on Windows:
import multiprocessing, time

def consumer(pipe, id):
    output_p, input_p = pipe
    input_p.close()
    while True:
        try:
            item = output_p.recv()
        except EOFError:
            break
        print("%s consume:%s" % (id, item))
        #time.sleep(3)  # without this sleep the code fails on Linux,
                        # but it works fine on Windows
    print('Consumer done')

def producer(sequence, input_p):
    for item in sequence:
        print('produce:', item)
        input_p.send(item)
        time.sleep(1)

if __name__ == '__main__':
    (output_p, input_p) = multiprocessing.Pipe()
    # create two consumer processes
    cons_p1 = multiprocessing.Process(target=consumer, args=((output_p, input_p), 1))
    cons_p1.start()
    cons_p2 = multiprocessing.Process(target=consumer, args=((output_p, input_p), 2))
    cons_p2.start()
    output_p.close()
    sequence = [i for i in range(10)]
    producer(sequence, input_p)
    input_p.close()
    cons_p1.join()
    cons_p2.join()
Do not use a pipe for multiple consumers. The documentation explicitly says the data may be corrupted when two processes read from or write to the same end at the same time, which is exactly what you do: two readers.
The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.
So use Queue, or JoinableQueue even.
from multiprocessing import Process, JoinableQueue
from queue import Empty  # Python 3; in Python 2 this module is named Queue
import time

def consumer(que, pid):
    while True:
        try:
            item = que.get(timeout=10)
            print("%s consume:%s" % (pid, item))
            que.task_done()
        except Empty:
            break
    print('Consumer done')

def producer(sequence, que):
    for item in sequence:
        print('produce:', item)
        que.put(item)
        time.sleep(1)

if __name__ == '__main__':
    que = JoinableQueue()
    # create two consumer processes
    cons_p1 = Process(target=consumer, args=(que, 1))
    cons_p1.start()
    cons_p2 = Process(target=consumer, args=(que, 2))
    cons_p2.start()
    sequence = [i for i in range(10)]
    producer(sequence, que)
    que.join()
    cons_p1.join()
    cons_p2.join()
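If you do want pipes, one way to stay within the rule quoted above is to give each consumer its own Pipe, so every pipe end has exactly one reader and one writer, and have the producer deal items out round-robin. The sketch below (consumer and distribute are illustrative names) uses None as an end-of-work sentinel instead of a timeout, and lets each consumer report a total back over its own duplex pipe so the result can be checked.

```python
import multiprocessing as mp


def consumer(conn, cid):
    total = 0
    while True:
        item = conn.recv()
        if item is None:  # sentinel: the producer has no more work for this consumer
            break
        print("%s consume:%s" % (cid, item))
        total += item
    conn.send((cid, total))  # report back over the same duplex pipe
    conn.close()


def distribute(sequence, n_consumers=2):
    pipes, procs = [], []
    for cid in range(n_consumers):
        parent_end, child_end = mp.Pipe()  # duplex by default
        p = mp.Process(target=consumer, args=(child_end, cid))
        p.start()
        child_end.close()  # the parent only uses its own end
        pipes.append(parent_end)
        procs.append(p)

    # Round-robin: each pipe end is used by exactly one process.
    for i, item in enumerate(sequence):
        pipes[i % n_consumers].send(item)
    for conn in pipes:
        conn.send(None)

    totals = dict(conn.recv() for conn in pipes)
    for conn in pipes:
        conn.close()
    for p in procs:
        p.join()
    return totals


if __name__ == "__main__":
    print(distribute(range(10), 2))
```

The trade-off compared with a shared Queue is that round-robin does not balance load: a slow consumer still gets every n-th item.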
I am launching concurrent threads doing some stuff:
concurrent = 10
q = Queue(concurrent * 2)
for j in range(concurrent):
    t = threading.Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    # process each line and assign it to an available thread
    for line in call_file:
        q.put(line)
    q.join()
except KeyboardInterrupt:
    sys.exit(1)
At the same time I have a distinct thread counting time:
def printit():
    threading.Timer(1.0, printit).start()
    print current_status

printit()
I would like to increase (or decrease) the number of concurrent threads in the main process, let's say every minute. I can make a time counter in the timer thread and make it do things every minute, but how do I change the number of concurrent threads in the main process?
Is it possible (and if yes how) to do that ?
This is my worker:
def UpdateProcesses(start,processnumber,CachesThatRequireCalculating,CachesThatAreBeingCalculated,CacheDict,CacheLock,IdleLock,FileDictionary,MetaDataDict,CacheIndexDict):
    NewPool()
    while start[processnumber]:
        IdleLock.wait()
        while len(CachesThatRequireCalculating)>0 and start[processnumber] == True:
            CacheLock.acquire()
            try:
                cacheCode = CachesThatRequireCalculating[0] # The list can be empty if another process takes the last item during the CacheLock
                CachesThatRequireCalculating.remove(cacheCode)
                print cacheCode,"starts processing by",processnumber,"process"
            except:
                CacheLock.release()
            else:
                CacheLock.release()
                CachesThatAreBeingCalculated.append(cacheCode[:3])
                Array,b,f = TIPP.LoadArray(FileDictionary[cacheCode[:2]])#opens the dask array
                Array = ((Array[:,:,CacheIndexDict[cacheCode[:2]][cacheCode[2]]:CacheIndexDict[cacheCode[:2]][cacheCode[2]+1]].compute()/2.**(MetaDataDict[cacheCode[:2]]["Bit Depth"])*255.).astype(np.uint16)).transpose([1,0,2]) #slices and calculates the array
                f.close() #close the file
                if CachesThatAreBeingCalculated.count(cacheCode[:3]) != 0: #if not, this cache is not needed anymore (the cacheCode is removed by a wavelength change)
                    CachesThatAreBeingCalculated.remove(cacheCode[:3])
                    try: #If the object is not available the first time, try a second time
                        CacheDict[cacheCode[:3]] = Array
                    except:
                        CacheDict[cacheCode[:3]] = Array
                print cacheCode,"done processing by",processnumber,"process"
        if start[processnumber]:
            IdleLock.clear()
This is how I start them:
self.ProcessLst = [] #list with all the processes that calculate the caches
for processnumber in range(min(NumberOfMaxProcess,self.processes)):
    self.ProcessTerminateLst.append(True)
for processnumber in range(min(NumberOfMaxProcess,self.processes)):
    self.ProcessLst.append(process.Process(target=Proc.UpdateProcesses,args=(self.ProcessTerminateLst,processnumber,self.CachesThatRequireCalculating,self.CachesThatAreBeingCalculated,self.CacheDict,self.CacheLock,self.IdleLock,self.FileDictionary,self.MetaDataDict,self.CacheIndexDict,)))
    self.ProcessLst[-1].daemon = True
    self.ProcessLst[-1].start()
I close them like this:
for i in range(len(self.ProcessLst)): #Both while loops in the worker require self.ProcessTerminateLst[i] to be True, so each process is now either ready to be terminated or still in idle mode.
    self.ProcessTerminateLst[i] = False
self.IdleLock.set() #Makes sure no process stays idle and all are ready to be terminated
I would use a pool. A pool has a maximum number of threads it uses at the same time, but you can submit any number of jobs; they stay in the waiting list until a thread is available. I don't think you can change the number of workers in an existing pool, though.
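If you manage the threads yourself instead of using a pool, the count can be changed at runtime: start new threads to grow, and put one "stop" sentinel on the work queue per thread you want to retire. The ResizableWorkers class below is a hypothetical sketch of that idea; your printit() timer could call resize() every minute. Because the queue is FIFO, a shrink only takes effect once the items queued ahead of the sentinels have been handled, and a handler that raises ends its thread without adjusting the target count, so handlers should catch their own errors.

```python
import queue
import threading

_STOP = object()  # private sentinel: each one retires exactly one worker thread


class ResizableWorkers:
    """Hypothetical helper: a work queue whose number of worker threads can change."""

    def __init__(self, handler, n):
        self.handler = handler  # called once per submitted item
        self.q = queue.Queue()
        self.threads = []
        self.size = 0  # target number of live workers
        self._lock = threading.Lock()
        self.resize(n)

    def _work(self):
        while True:
            item = self.q.get()
            try:
                if item is _STOP:
                    return
                self.handler(item)
            finally:
                self.q.task_done()

    def submit(self, item):
        self.q.put(item)

    def resize(self, n):
        """Grow by starting threads; shrink by queueing one sentinel per thread."""
        with self._lock:
            for _ in range(n - self.size):
                t = threading.Thread(target=self._work, daemon=True)
                t.start()
                self.threads.append(t)
            for _ in range(self.size - n):
                self.q.put(_STOP)
            self.size = n

    def shutdown(self):
        self.resize(0)
        for t in self.threads:
            t.join()


if __name__ == "__main__":
    w = ResizableWorkers(print, 2)
    for i in range(5):
        w.submit(i)
    w.resize(4)  # e.g. called from the timer thread
    w.shutdown()
```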
I am having a problem where child processes hang in my Python application: only 4 of 16 processes have finished. All of these processes add items to a multiprocessing queue. According to the Python docs (https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues):
Warning
As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
I believe this may be my problem; however, I do call get() on the queue before I join. I am not sure what other alternatives I have.
def RunInThread(dictionary):
    startedProcesses = list()
    resultList = list()
    output = Queue()
    scriptList = ThreadChunk(dictionary, 16) # last number determines how many threads
    for item in scriptList:
        if __name__ == '__main__':
            proc = Process(target=CreateScript, args=(item, output))
            startedProcesses.append(proc)
            proc.start()
    while not output.empty():
        resultList.append(output.get())
    #we must wait for the processes to finish before continuing
    for process in startedProcesses:
        process.join()
    print "finished"

#defines chunk of data each thread will process
def ThreadChunk(seq, num):
    avg = len(seq) / float(num)
    out = []
    last = 0.0
    while last < len(seq):
        out.append(seq[int(last):int(last + avg)])
        last += avg
    return out

def CreateScript(scriptsToGenerate, queue):
    start = time.clock()
    for script in scriptsToGenerate:
        ...
        queue.put([script['timeInterval'], script['script']])
    print time.clock() - start
    print "I have finished"
The issue with your code is that while not output.empty() is not reliable (see the docs for empty()). You might also run into the scenario where the interpreter reaches while not output.empty() before the processes you created have finished their initialization, so the queue really is empty at that moment and the loop exits immediately.
Since you know exactly how many items will be put in the queue (i.e. len(dictionary)), you can read exactly that number of items from the queue:
def RunInThread(dictionary):
    startedProcesses = list()
    output = Queue()
    scriptList = ThreadChunk(dictionary, 16) # last number determines how many threads
    for item in scriptList:
        proc = Process(target=CreateScript, args=(item, output))
        startedProcesses.append(proc)
        proc.start()

    resultList = [output.get() for _ in xrange(len(dictionary))]

    #we must wait for the processes to finish before continuing
    for process in startedProcesses:
        process.join()
    print "finished"
If at some point you modify your script and no longer know how many items will be produced, you can use Queue.get with a reasonable timeout:
from Queue import Empty  # Python 2; in Python 3: from queue import Empty

def RunInThread(dictionary):
    startedProcesses = list()
    resultList = list()
    output = Queue()
    scriptList = ThreadChunk(dictionary, 16) # last number determines how many threads
    for item in scriptList:
        proc = Process(target=CreateScript, args=(item, output))
        startedProcesses.append(proc)
        proc.start()

    try:
        while True:
            resultList.append(output.get(True, 2)) # block for up to 2 seconds, just in case
    except Empty:
        pass # no more items produced

    #we must wait for the processes to finish before continuing
    for process in startedProcesses:
        process.join()
    print "finished"
You might need to adjust the timeout depending on the actual time of the computation in your CreateScript.
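The warning quoted in the question also says that a queue created by a manager does not have this issue. A minimal Python 3 sketch of that route: the queue lives in the manager's server process and each put() is a synchronous call to it, so it is safe to join() the producers first and then drain the queue with get_nowait(). create_script stands in for CreateScript, and the strided chunking replaces ThreadChunk for brevity; both are assumptions. The trade-off is that every put() and get() is a round trip to the manager process, which is slower than a plain multiprocessing.Queue.

```python
import multiprocessing as mp
import queue


def create_script(chunk, out):
    """Stand-in for the question's CreateScript: one result per input item."""
    for script in chunk:
        out.put((script["timeInterval"], script["script"]))


def run_with_manager(scripts, n_procs=4):
    chunks = [scripts[i::n_procs] for i in range(n_procs)]  # simple strided split
    with mp.Manager() as manager:
        out = manager.Queue()  # a proxy; the real queue lives in the manager process
        procs = [mp.Process(target=create_script, args=(chunk, out)) for chunk in chunks]
        for p in procs:
            p.start()
        for p in procs:
            p.join()  # every put() has already reached the manager at this point
        results = []
        while True:
            try:
                results.append(out.get_nowait())
            except queue.Empty:
                break
    return results


if __name__ == "__main__":
    demo = [{"timeInterval": i, "script": "script %d" % i} for i in range(10)]
    print(sorted(run_with_manager(demo, n_procs=3)))
```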
Update: Here is a more specific example
Suppose I want to compile some statistical data from a sizable set of files:
I can make a generator (line for line in fileinput.input(files)) and some processor:
from collections import defaultdict
scores = defaultdict(int)

def process(line):
    if 'Result' in line:
        res = line.split('\"')[1].split('-')[0]
        scores[res] += 1
The question is how to handle this when one gets to the multiprocessing.Pool.
Of course it's possible to define a multiprocessing.sharedctypes as well as a custom struct instead of a defaultdict but this seems rather painful. On the other hand I can't think of a pythonic way to instantiate something before the process or to return something after a generator has run out to the main thread.
So you are basically creating a histogram. This can easily be parallelized, because histograms can be merged without complication. One might want to say that this problem is trivially parallelizable or "embarrassingly parallel". That is, you do not need to worry about communication among workers.
Just split your data set into multiple chunks, let your workers work on these chunks independently, collect the histogram of each worker, and then merge the histograms.
In practice, this problem is best solved by letting each worker process/read its own file. That is, a "task" could be a file name. You should not start pickling file contents and send them around between processes through pipes. Let each worker process retrieve the bulk data directly from files. Otherwise your architecture spends too much time on inter-process communication, instead of doing some real work.
Do you need an example or can you figure this out yourself?
Edit: example implementation
I have a number of data files with file names in this format: data0.txt, data1.txt, ... .
Example contents:
wolf
wolf
cat
blume
eisenbahn
The goal is to create a histogram over the words contained in the data files. This is the code:
from multiprocessing import Pool
from collections import Counter
import glob

def build_histogram(filepath):
    """This function is run by a worker process.
    The `filepath` argument is communicated to the worker
    through a pipe. The return value of this function is
    communicated to the manager through a pipe.
    """
    hist = Counter()
    with open(filepath) as f:
        for line in f:
            hist[line.strip()] += 1
    return hist

def main():
    """This function runs in the manager (main) process."""

    # Collect paths to data files.
    datafile_paths = glob.glob("data*.txt")

    # Create a pool of worker processes and distribute work.
    # The input to worker processes (function argument) as well
    # as the output by worker processes is transmitted through
    # pipes, behind the scenes.
    pool = Pool(processes=3)
    histograms = pool.map(build_histogram, datafile_paths)

    # Properly shut down the pool of worker processes, and
    # wait until all of them have finished.
    pool.close()
    pool.join()

    # Merge sub-histograms. Do not create too many intermediate
    # objects: update the first sub-histogram with the others.
    # Relevant docs: collections.Counter.update
    merged_hist = histograms[0]
    for h in histograms[1:]:
        merged_hist.update(h)

    for word, count in merged_hist.items():
        print "%s: %s" % (word, count)

if __name__ == "__main__":
    main()
Test output:
python countwords.py
eisenbahn: 12
auto: 6
cat: 1
katze: 10
stadt: 1
wolf: 3
zug: 4
blume: 5
herbert: 14
destruction: 4
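If the data really arrives as one stream of lines, as with the fileinput generator in the question, the same split/count/merge idea still applies: cut the stream into chunks, let Pool.imap_unordered count each chunk, and add the per-chunk Counters together. This does ship the lines through pipes, which the answer above advises against whenever you can hand workers file names instead. count_chunk reuses the question's parsing logic; chunked, count_results and the default chunk size are illustrative assumptions.

```python
import itertools
from collections import Counter
from multiprocessing import Pool


def count_chunk(lines):
    """Worker: histogram of one chunk, using the question's parsing logic."""
    scores = Counter()
    for line in lines:
        if 'Result' in line:
            res = line.split('"')[1].split('-')[0]
            scores[res] += 1
    return scores


def chunked(iterable, size):
    """Yield lists of up to `size` items from any iterable, including a generator."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def count_results(lines, chunk_size=10000, processes=None):
    total = Counter()
    with Pool(processes=processes) as pool:
        for partial in pool.imap_unordered(count_chunk, chunked(lines, chunk_size)):
            total.update(partial)  # merging histograms is just adding counts
    return total


if __name__ == "__main__":
    sample = ['[Result "1-0"]', '[Event "x"]', '[Result "0-1"]', '[Result "1-0"]']
    print(count_results(sample, chunk_size=2, processes=2))
```

With the question's setup this would be called as count_results(fileinput.input(files)); no shared defaultdict is needed because each worker returns its own Counter.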
I had to modify the original pool.py (the trouble was that worker is defined as a plain function, so it cannot be extended through inheritance) to get what I want, but it's not so bad, and probably better than writing a new pool entirely.
class worker(object):
    def __init__(self, inqueue, outqueue, initializer=None, initargs=(), maxtasks=None,
                 wrap_exception=False, finalizer=None, finargs=()):
        assert maxtasks is None or (type(maxtasks) == int and maxtasks > 0)
        # Store everything run() needs on the instance; plain locals of
        # __init__ are not visible inside run().
        self.put = outqueue.put
        self.get = inqueue.get
        self.maxtasks = maxtasks
        self.wrap_exception = wrap_exception
        self.finalizer = finalizer
        self.finargs = finargs
        self.completed = 0
        if hasattr(inqueue, '_writer'):
            inqueue._writer.close()
            outqueue._reader.close()
        if initializer is not None:
            initializer(self, *initargs)
        # the process target is worker(...), so start working right away
        self.run()

    def run(self):
        while self.maxtasks is None or (self.maxtasks and self.completed < self.maxtasks):
            try:
                task = self.get()
            except (EOFError, OSError):
                util.debug('worker got EOFError or OSError -- exiting')
                break
            if task is None:
                util.debug('worker got sentinel -- exiting')
                break
            job, i, func, args, kwds = task
            try:
                result = (True, func(*args, **kwds))
            except Exception as e:
                if self.wrap_exception:
                    e = ExceptionWithTraceback(e, e.__traceback__)
                result = (False, e)
            try:
                self.put((job, i, result))
            except Exception as e:
                wrapped = MaybeEncodingError(e, result[1])
                util.debug("Possible encoding error while sending result: %s" % (
                    wrapped))
                self.put((job, i, (False, wrapped)))
            self.completed += 1
        if self.finalizer:
            self.finalizer(self, *self.finargs)
        util.debug('worker exiting after %d tasks' % self.completed)