The Python docs say that if maxsize is less than or equal to zero, the Queue size is infinite. I've also tried maxsize=-1. However, this isn't the case here and the program hangs. So as a work-around I created multiple Queues to work with, but this is not ideal: I will need to work with even bigger lists, and would then have to create more and more Queue() objects and add additional code to process their elements.
queue = Queue(maxsize=0)
queue2 = Queue(maxsize=0)
queue3 = Queue(maxsize=0)

PROCESS_COUNT = 6

def filter(aBigList):
    list_chunks = list(chunks(aBigList, PROCESS_COUNT))
    pool = multiprocessing.Pool(processes=PROCESS_COUNT)
    for chunk in list_chunks:
        pool.apply_async(func1, (chunk,))
    pool.close()
    pool.join()

    allFiltered = []
    # list of dicts
    while not queue.empty():
        allFiltered.append(queue.get())
    while not queue2.empty():
        allFiltered.append(queue2.get())
    while not queue3.empty():
        allFiltered.append(queue3.get())

    # do work with allFiltered

def func1(subList):
    SUBLIST_SPLIT = 3
    theChunks = list(chunks(subList, SUBLIST_SPLIT))
    for i in theChunks[0]:
        dictQ = updateDict(i)
        queue.put(dictQ)
    for x in theChunks[1]:
        dictQ = updateDict(x)
        queue2.put(dictQ)
    for y in theChunks[2]:
        dictQ = updateDict(y)
        queue3.put(dictQ)
Your issue happens because you do not process the Queue before the join call.
When you use a multiprocessing.Queue, you should empty it before trying to join the feeder processes. A Process waits for all the objects it put in the Queue to be flushed before terminating. I don't know why this is the case even for a Queue with a large size, but it is probably linked to the fact that the underlying os.pipe object does not have a large enough buffer.
So putting your get calls before the pool.join should solve your problem.
PROCESS_COUNT = 6

def filter(aBigList):
    list_chunks = list(chunks(aBigList, PROCESS_COUNT))
    pool = multiprocessing.Pool(processes=PROCESS_COUNT)
    # A plain multiprocessing.Queue cannot be passed as an argument to Pool
    # workers, so a Manager queue (a picklable proxy) is used here instead.
    result_queue = multiprocessing.Manager().Queue()
    async_result = []
    for chunk in list_chunks:
        async_result.append(pool.apply_async(
            func1, (chunk, result_queue)))

    # Drain the queue before joining; each worker signals completion with None
    all_filtered = []
    done = 0
    while done < len(list_chunks):
        res = result_queue.get()
        if res is None:
            done += 1
        else:
            all_filtered.append(res)

    pool.close()
    pool.join()

    # do work with all_filtered

def func1(sub_list, result_queue):
    # mapping function
    for i in sub_list:
        result_queue.put(updateDict(i))
    result_queue.put(None)
One question is: why do you need to handle the communication yourself? You could just let the Pool manage that for you if you refactor:
PROCESS_COUNT = 6

def filter(aBigList):
    list_chunks = list(chunks(aBigList, PROCESS_COUNT))
    pool = multiprocessing.Pool(processes=PROCESS_COUNT)
    async_result = []
    for chunk in list_chunks:
        async_result.append(pool.apply_async(func1, (chunk,)))
    pool.close()
    pool.join()

    # Reduce the result
    allFiltered = [res.get() for res in async_result]
    # do work with allFiltered

def func1(sub_list):
    # mapping function
    results = []
    for i in sub_list:
        results.append(updateDict(i))
    return results
This helps avoid this kind of bug.
EDIT
Finally, you can reduce your code even further by using the Pool.map function, which also handles the chunking for you.
If your chunks get too big, you might get errors when pickling the results (as stated in your comment). You can thus adapt the size of the chunks using the chunksize argument of map:
PROCESS_COUNT = 6

def filter(aBigList):
    # Run in parallel an internal function of mp.Pool which runs
    # updateDict on chunks of 100 items from aBigList and returns them.
    # The map function takes care of chunking, dispatching and
    # collecting the items in the right order.
    with multiprocessing.Pool(processes=PROCESS_COUNT) as pool:
        allFiltered = pool.map(updateDict, aBigList, chunksize=100)
    # do work with allFiltered
Related
Suppose I have N generators gen_1, ..., gen_N where each of them will yield the same number of values. I would like a generator gen such that it runs gen_1, ..., gen_N in N parallel processes and yields (next(gen_1), next(gen_2), ..., next(gen_N)).
That is, I would like to have:
def gen():
    yield (next(gen_1), next(gen_2), ..., next(gen_N))
in such a way that each gen_i runs in its own process. Is it possible to do this? I have tried it in the following dummy example, with no success:
from multiprocessing import Process

A = range(4)

def gen(a):
    B = ['a', 'b', 'c']
    for b in B:
        yield b + str(a)

def target(g):
    return next(g)

processes = [Process(target=target, args=(gen(a),)) for a in A]

for p in processes:
    p.start()

for p in processes:
    p.join()
However I get the error TypeError: cannot pickle 'generator' object.
EDIT:
I have modified @darkonaut's answer a bit to fit my needs. I am posting it in case some of you find it useful. We first define a couple of utility functions:
from itertools import zip_longest
from typing import List, Generator

def grouper(iterable, n, fillvalue=iter([])):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def split_generators_into_batches(generators: List[Generator], n_splits):
    chunks = grouper(generators, len(generators) // n_splits + 1)
    return [zip_longest(*chunk) for chunk in chunks]
The following class is responsible for splitting any number of generators into n batches (one per process) and processing them, yielding the desired result:
import itertools
import multiprocessing as mp

class GeneratorParallelProcessor:
    SENTINEL = 'S'

    def __init__(self, generators, n_processes = 2 * mp.cpu_count()):
        self.n_processes = n_processes
        self.generators = split_generators_into_batches(list(generators), n_processes)
        self.queue = mp.SimpleQueue()
        self.barrier = mp.Barrier(n_processes + 1)
        self.sentinels = [self.SENTINEL] * n_processes
        self.processes = [
            mp.Process(target=self._worker, args=(self.barrier, self.queue, gen)) for gen in self.generators
        ]

    def process(self):
        for p in self.processes:
            p.start()
        while True:
            results = list(itertools.chain(*(self.queue.get() for _ in self.generators)))
            if results != self.sentinels:
                yield results
                self.barrier.wait()
            else:
                break
        for p in self.processes:
            p.join()

    def _worker(self, barrier, queue, generator):
        for x in generator:
            queue.put(x)
            barrier.wait()
        queue.put(self.SENTINEL)
To use it just do the following:
parallel_processor = GeneratorParallelProcessor(generators)

for grouped_generator in parallel_processor.process():
    output_handler(grouped_generator)
It's possible to get such a "Unified Parallel Generator" (UPG, an attempt to coin a name) with some effort, but as @jasonharper already mentioned, you definitely need to assemble the sub-generators within the child processes, since a running generator can't be pickled.
The pattern below is re-usable with only the generator function gen() being custom to this example. The design uses multiprocessing.SimpleQueue for returning generator results to the parent and multiprocessing.Barrier for synchronization.
Calling Barrier.wait() will block the caller (thread in any process) until the number of specified parties has called .wait(), whereupon all threads currently waiting on the Barrier get released simultaneously. The usage of Barrier here ensures further generator-results are only started to be computed after the parent has received all results from an iteration, which might be desirable to keep overall memory consumption in check.
The number of parallel workers used equals the number of argument-tuples you provide within the gen_args_tuples-iterable, so gen_args_tuples=zip(range(4)) will use four workers for example. See comments in code for further details.
import multiprocessing as mp

SENTINEL = 'SENTINEL'

def gen(a):
    """Your individual generator function."""
    lst = ['a', 'b', 'c']
    for ch in lst:
        for _ in range(int(10e6)):  # some dummy computation
            pass
        yield ch + str(a)

def _worker(i, barrier, queue, gen_func, gen_args):
    for x in gen_func(*gen_args):
        print(f"WORKER-{i} sending item.")
        queue.put((i, x))
        barrier.wait()
    queue.put(SENTINEL)

def parallel_gen(gen_func, gen_args_tuples):
    """Construct and yield from parallel generators
    built from `gen_func(gen_args)`.
    """
    gen_args_tuples = list(gen_args_tuples)  # ensure list
    n_gens = len(gen_args_tuples)
    sentinels = [SENTINEL] * n_gens
    queue = mp.SimpleQueue()
    barrier = mp.Barrier(n_gens + 1)  # `parties`: + 1 for parent
    processes = [
        mp.Process(target=_worker, args=(i, barrier, queue, gen_func, args))
        for i, args in enumerate(gen_args_tuples)
    ]

    for p in processes:
        p.start()

    while True:
        results = [queue.get() for _ in range(n_gens)]
        if results != sentinels:
            results.sort()
            yield tuple(r[1] for r in results)  # sort and drop ids
            barrier.wait()  # all workers are waiting
            # already, so this will unblock immediately
        else:
            break

    for p in processes:
        p.join()

if __name__ == '__main__':
    for res in parallel_gen(gen_func=gen, gen_args_tuples=zip(range(4))):
        print(res)
Output:
WORKER-1 sending item.
WORKER-0 sending item.
WORKER-3 sending item.
WORKER-2 sending item.
('a0', 'a1', 'a2', 'a3')
WORKER-1 sending item.
WORKER-2 sending item.
WORKER-3 sending item.
WORKER-0 sending item.
('b0', 'b1', 'b2', 'b3')
WORKER-2 sending item.
WORKER-3 sending item.
WORKER-1 sending item.
WORKER-0 sending item.
('c0', 'c1', 'c2', 'c3')
Process finished with exit code 0
I went for a slightly different approach; you can modify the example below accordingly.
Somewhere in your main script, initialize the pool according to your needs; you need just these two lines:
from multiprocessing import Pool
pool = Pool(processes=4)
Then you can define a generator function like this (note that the generators input is assumed to be any iterable containing all the generators):
def parallel_generators(generators, pool):
    results = ['placeholder']
    while len(results) != 0:
        batch = pool.map_async(next, generators)  # defines the next round of values
        results = list(batch.get())  # actual calculation done here
        yield results
    return
We define the results condition in the while loop like this because a map over next and the generators returns an empty list once the generators stop producing values, so at that point we just terminate the parallel generator.
EDIT
So apparently the multiprocessing Pool and map don't play well with generators, making the above code not work as intended, so do not use it until a later update.
As for the pickle error: it seems some bound functions do not support pickle, which the multiprocessing library needs in order to transfer objects and functions between processes. As a workaround, the pathos multiprocessing library uses dill, which removes the need for pickle and is an option you might want to try. Searching Stack Overflow for your error, you can also find some more complicated solutions with custom code for pickling the required functions.
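For illustration only, here is a minimal sketch of what the pathos variant could look like; it assumes pathos is installed (pip install pathos) and is not taken from the original answer:
# Hedged sketch: pathos serializes with dill, so objects plain pickle rejects
# (lambdas, some bound methods) can be passed to the workers.
from pathos.multiprocessing import ProcessingPool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = ProcessingPool(nodes=4)      # 4 worker processes
    print(pool.map(square, range(10)))  # [0, 1, 4, 9, ...]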
I want to execute this function without having to rewrite all the code for each process.
def executeNode(node):
    node.execution()

And the code below is what I don't want to repeat n times. I need to use Process, not Threads.
a0 = Process(target=executeNode, args=(node1,))
a1 = Process(target=executeNode, args=(node2,))
a2 = Process(target=executeNode, args=(node3,))
...............................
an = Process(target=executeNode, args=(nodeN,))
So I decided to create a list of nodes but I don't know how to execute a process for each item (node) of the list.
sNodes = []
for i in range(0, n):
    node = node("a" + str(i), (4001 + i))
    sNodes.append(node)
How can I execute a process for each item (node) of the list (sNodes).
Thank you all.
You can use a Pool:
from multiprocessing import Pool

if __name__ == '__main__':
    with Pool(n) as p:
        print(p.map(executeNode, sNodes))
Where n is the number of processes you want.
In case you want detached processes or you don't expect a result, it is better to simply use a plain loop:
processes = []
for node in sNodes:
    p = Process(target=executeNode, args=(node,))
    processes.append(p)
    p.start()
General tip: having a lot of processes will not speed up your code but will make your processor start swapping, and everything will be slower. This is just in case you are looking for a speedup rather than a particular logical architecture.
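As a rough sketch of that tip (not part of the original answer), you can cap the pool at the machine's CPU count instead of starting one process per node; executeNode and sNodes are the names used in the question:
# Hedged sketch: bound the number of worker processes by the CPU count
# so a long sNodes list does not oversubscribe the machine.
from multiprocessing import Pool, cpu_count

if __name__ == '__main__':
    n_workers = min(len(sNodes), cpu_count())
    with Pool(n_workers) as pool:
        pool.map(executeNode, sNodes)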
Try something like this:
from multiprocessing import Pool
process_number = 4
nodes = [...]
def execute_node(node):
    print(node)
pool = Pool(processes=process_number)
pool.starmap(execute_node, [(node,) for node in nodes])
pool.close()
You will find more intel here: https://docs.python.org/3/library/multiprocessing.html
What I'm trying to do is running a list of prime number decomposition in different processes at once. I have a threaded version that's working, but can't seem to get it working with processes.
import math
from Queue import Queue
import multiprocessing
def primes2(n):
    primfac = []
    num = n
    d = 2
    while d * d <= n:
        while (n % d) == 0:
            primfac.append(d)  # supposing you want multiple factors repeated
            n //= d
        d += 1
    if n > 1:
        primfac.append(n)
    myfile = open('processresults.txt', 'a')
    myfile.write(str(num) + ":" + str(primfac) + "\n")
    return primfac

def mp_factorizer(nums, nprocs):
    def worker(nums, out_q):
        """ The worker function, invoked in a process. 'nums' is a
            list of numbers to factor. The results are placed in
            a dictionary that's pushed to a queue.
        """
        outdict = {}
        for n in nums:
            outdict[n] = primes2(n)
        out_q.put(outdict)

    # Each process will get 'chunksize' nums and a queue to put his out
    # dict into
    out_q = Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []

    for i in range(nprocs):
        p = multiprocessing.Process(
            target=worker,
            args=(nums[chunksize * i:chunksize * (i + 1)],
                  out_q))
        procs.append(p)
        p.start()

    # Collect all results into a single result dict. We know how many dicts
    # with results to expect.
    resultdict = {}
    for i in range(nprocs):
        resultdict.update(out_q.get())

    # Wait for all worker processes to finish
    for p in procs:
        p.join()

    print resultdict

if __name__ == '__main__':
    mp_factorizer((400243534500, 100345345000, 600034522000, 9000045346435345000), 4)
I'm getting a pickle error.
Any help would be greatly appreciated :)
You need to use multiprocessing.Queue instead of a regular Queue (see the multiprocessing docs for more).
This is because a Process doesn't run in the same memory space, and some objects aren't picklable, like a regular queue (Queue.Queue). To overcome this, the multiprocessing library provides a Queue class that is actually a proxy to a queue.
Also, you should extract def worker(...) out as a top-level function like any other. This could be your main problem because of how a process is forked at the OS level.
You can also use a multiprocessing.Manager (see the docs for more).
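A minimal sketch of the Manager variant (not from the original answer): a Manager-backed queue is a picklable proxy, so unlike a plain multiprocessing.Queue it can also be passed as an argument to Pool workers:
# Hedged sketch: a Manager queue behaves like multiprocessing.Queue but is
# a proxy object, so it can be handed to workers as a plain argument.
import multiprocessing

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    out_q = manager.Queue()
    # pass out_q to Process/Pool workers and use out_q.put() / out_q.get()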
Dynamically created functions cannot be pickled and therefore cannot be used as the target of a Process; the function worker needs to be defined in the global scope instead of inside the definition of mp_factorizer.
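Putting those fixes together, a minimal sketch of the corrected structure could look like this (adapted from the question's code, not a verbatim answer; primes2 is assumed to be defined as in the question):
# Hedged sketch: worker lives at module level so it can be pickled, and the
# queue is a multiprocessing.Queue instead of the unpicklable Queue.Queue.
import math
import multiprocessing

def worker(nums, out_q):
    outdict = {}
    for n in nums:
        outdict[n] = primes2(n)   # primes2 as defined in the question
    out_q.put(outdict)

def mp_factorizer(nums, nprocs):
    out_q = multiprocessing.Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):
        p = multiprocessing.Process(
            target=worker,
            args=(nums[chunksize * i:chunksize * (i + 1)], out_q))
        procs.append(p)
        p.start()
    # Drain the queue before joining the workers.
    resultdict = {}
    for i in range(nprocs):
        resultdict.update(out_q.get())
    for p in procs:
        p.join()
    return resultdict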
I want to parallelize the processing of a dictionary using the multiprocessing library.
My problem can be reduced to this code:
from multiprocessing import Manager, Pool

def modify_dictionary(dictionary):
    if (3,3) not in dictionary:
        dictionary[(3,3)] = 0.
    for i in range(100):
        dictionary[(3,3)] = dictionary[(3,3)] + 1
    return 0

if __name__ == "__main__":
    manager = Manager()
    dictionary = manager.dict(lock=True)
    jobargs = [(dictionary) for i in range(5)]
    p = Pool(5)
    t = p.map(modify_dictionary, jobargs)
    p.close()
    p.join()
    print dictionary[(3,3)]
I create a pool of 5 workers, and each worker should increment dictionary[(3,3)] 100 times. So, if the locking process works correctly, I expect dictionary[(3,3)] to be 500 at the end of the script.
However, something in my code must be wrong, because this is not what I get: the locking does not seem to be "activated" and dictionary[(3,3)] always has a value < 500 at the end of the script.
Could you help me?
The problem is with this line:
dictionary[(3,3)] = dictionary[(3,3)]+1
Three things happen on that line:
1. Read the value of the dictionary key (3,3)
2. Increment the value by 1
3. Write the value back again
But the increment part is happening outside of any locking.
The whole sequence must be atomic, and must be synchronized across all processes. Otherwise the processes will interleave giving you a lower than expected total.
Holding a lock whilst incrementing the value ensures that you get the total of 500 you expect:
from multiprocessing import Manager, Pool, Lock

lock = Lock()

def modify_array(dictionary):
    if (3,3) not in dictionary:
        dictionary[(3,3)] = 0.
    for i in range(100):
        with lock:
            dictionary[(3,3)] = dictionary[(3,3)] + 1
    return 0

if __name__ == "__main__":
    manager = Manager()
    dictionary = manager.dict(lock=True)
    jobargs = [(dictionary) for i in range(5)]
    p = Pool(5)
    t = p.map(modify_array, jobargs)
    p.close()
    p.join()
    print dictionary[(3,3)]
I've managed many times to find the correct solution to a programming difficulty here, so I would like to contribute a little bit. The code above still has the problem of not updating the dictionary correctly. To get the right result you have to pass the lock and the correct jobargs to f; in the code above you make a new dictionary in every process. The code I found to work fine:
from multiprocessing import Process, Manager, Pool, Lock
from functools import partial

def f(dictionary, l, k):
    with l:
        for i in range(100):
            dictionary[3] += 1

if __name__ == "__main__":
    manager = Manager()
    dictionary = manager.dict()
    lock = manager.Lock()
    dictionary[3] = 0
    jobargs = list(range(5))
    pool = Pool()
    func = partial(f, dictionary, lock)
    t = pool.map(func, jobargs)
    pool.close()
    pool.join()
    print(dictionary)
The code above holds the lock for the entire iteration. In general, you should hold a lock only for the shortest time that is still effective. The following variant is much more efficient: you acquire the lock only to make the increment atomic.
def f(dictionary, l, k):
    for i in range(100):
        with l:
            dictionary[3] += 1
Note that dictionary[3] += 1 is not atomic, so it must be locked.
I have a simulation that is currently running, but the ETA is about 40 hours -- I'm trying to speed it up with multi-processing.
It essentially iterates over 3 values of one variable (L), and over 99 values of a second variable (a). Using these values, it runs a complex simulation and returns 9 different standard deviations. Thus (even though I haven't coded it that way yet) it is essentially a function that takes two values as inputs (L, a) and returns 9 values.
Here is the essence of the code I have:
STD_1 = []
STD_2 = []
# etc.

for L in range(0,6,2):
    for a in range(1,100):
        ### simulation code ###
        STD_1.append(value_1)
        STD_2.append(value_2)
        # etc.
Here is what I can modify it to:
master_list = []

def simulate(a,L):
    ### simulation code ###
    return (a, L, STD_1, STD_2, etc.)

for L in range(0,6,2):
    for a in range(1,100):
        master_list.append(simulate(a,L))
Since each of the simulations is independent, it seems like an ideal place to implement some sort of multi-threading/processing.
How exactly would I go about coding this?
EDIT: Also, will everything be returned to the master list in order, or could it possibly be out of order if multiple processes are working?
EDIT 2: This is my code -- but it doesn't run correctly. It asks if I want to kill the program right after I run it.
import multiprocessing

data = []
for L in range(0,6,2):
    for a in range(1,100):
        data.append((L,a))

print(data)

def simulation(arg):
    # unpack the tuple
    a = arg[1]
    L = arg[0]
    STD_1 = a**2
    STD_2 = a**3
    STD_3 = a**4
    # simulation code #
    return((STD_1, STD_2, STD_3))

print("1")
p = multiprocessing.Pool()
print("2")
results = p.map(simulation, data)
EDIT 3: Also, what are the limitations of multiprocessing? I've heard that it doesn't work on OS X. Is this correct?
1. Wrap the data for each iteration up into a tuple.
2. Make a list data of those tuples.
3. Write a function f to process one tuple and return one result.
4. Create a p = multiprocessing.Pool() object.
5. Call results = p.map(f, data)
This will run as many instances of f as your machine has cores in separate processes.
Edit1: Example:
from multiprocessing import Pool
data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]
def f(t):
    name, a, b, c = t
    return (name, a + b + c)
p = Pool()
results = p.map(f, data)
print results
Edit2:
Multiprocessing should work fine on UNIX-like platforms such as OSX. Only platforms that lack os.fork (mainly MS Windows) need special attention. But even there it still works. See the multiprocessing documentation.
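Related to this, on platforms that spawn rather than fork, the Pool creation and the map call must sit under the usual if __name__ == '__main__' guard, which may also be why the code in EDIT 2 misbehaves right after starting. A minimal sketch (not from the original answer) applying that guard to the code from EDIT 2, with a placeholder simulation body:
# Hedged sketch: guard pool creation so child processes can safely
# re-import this module on platforms without os.fork.
import multiprocessing

def simulation(arg):
    L, a = arg                          # unpack the (L, a) tuple
    return (a ** 2, a ** 3, a ** 4)     # placeholder for the real simulation

if __name__ == '__main__':
    data = [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]
    with multiprocessing.Pool() as pool:
        results = pool.map(simulation, data)   # map preserves the order of data
    print(results[:3])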
Here is one way to run it in parallel threads:
import threading

L_a = []

for L in range(0,6,2):
    for a in range(1,100):
        L_a.append((L,a))
        # Add the rest of your objects here

def RunParallelThreads():
    # Create an index list
    indexes = range(0, len(L_a))
    # Create the output list
    output = [None for i in indexes]
    # Create all the parallel threads
    threads = [threading.Thread(target=simulate, args=(output,i)) for i in indexes]
    # Start all the parallel threads
    for thread in threads: thread.start()
    # Wait for all the parallel threads to complete
    for thread in threads: thread.join()
    # Return the output list
    return output

def simulate(list, index):
    (L,a) = L_a[index]
    list[index] = (a,L)  # Add the rest of your objects here

master_list = RunParallelThreads()
Use Pool().imap_unordered if ordering is not important. It will return results in a non-blocking fashion.
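A short sketch of that suggestion (not from the original answer), reusing the data/simulation shape from EDIT 2; note that plain Pool.map returns results in the order of its input, while imap_unordered yields them as they finish:
# Hedged sketch: imap_unordered yields results as workers finish, so tag each
# result with its inputs if you need to know which (L, a) produced it.
from multiprocessing import Pool

def simulate_tagged(arg):
    L, a = arg
    return (L, a, a ** 2, a ** 3, a ** 4)   # placeholder simulation

if __name__ == '__main__':
    data = [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]
    with Pool() as pool:
        for result in pool.imap_unordered(simulate_tagged, data):
            print(result)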