Python Multiprocessing with a single function - python

I have a simulation that is currently running, but the ETA is about 40 hours -- I'm trying to speed it up with multi-processing.
It iterates over 3 values of one variable (L), and over 99 values of a second variable (a). Using these values, it runs a complex simulation and returns 9 different standard deviations. Thus (even though I haven't coded it that way yet) it is essentially a function that takes two values as inputs (L, a) and returns 9 values.
Here is the essence of the code I have:
STD_1 = []
STD_2 = []
# etc.
for L in range(0,6,2):
    for a in range(1,100):
        ### simulation code ###
        STD_1.append(value_1)
        STD_2.append(value_2)
        # etc.
Here is what I can modify it to:
master_list = []
def simulate(a,L):
    ### simulation code ###
    return (a,L,STD_1, STD_2 etc.)
for L in range(0,6,2):
    for a in range(1,100):
        master_list.append(simulate(a,L))
Since each simulation is independent of the others, this seems like an ideal place to use some sort of multi-threading/processing.
How exactly would I go about coding this?
EDIT: Also, will everything be returned to the master list in order, or could it possibly be out of order if multiple processes are working?
EDIT 2: This is my code -- but it doesn't run correctly. It asks if I want to kill the program right after I run it.
import multiprocessing
data = []
for L in range(0,6,2):
    for a in range(1,100):
        data.append((L,a))
print (data)
def simulation(arg):
    # unpack the tuple
    a = arg[1]
    L = arg[0]
    STD_1 = a**2
    STD_2 = a**3
    STD_3 = a**4
    # simulation code #
    return((STD_1,STD_2,STD_3))
print("1")
p = multiprocessing.Pool()
print ("2")
results = p.map(simulation, data)
EDIT 3: Also, what are the limitations of multiprocessing? I've heard that it doesn't work on OS X. Is this correct?

Wrap the data for each iteration up into a tuple.
Make a list data of those tuples
Write a function f to process one tuple and return one result
Create p = multiprocessing.Pool() object.
Call results = p.map(f, data)
This will run as many instances of f as your machine has cores in separate processes.
Edit1: Example:
from multiprocessing import Pool
data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]
def f(t):
    name, a, b, c = t
    return (name, a + b + c)
if __name__ == '__main__':  # required on platforms that spawn instead of fork
    p = Pool()
    results = p.map(f, data)
    print(results)
Edit2:
Multiprocessing should work fine on UNIX-like platforms such as OSX. Only platforms that lack os.fork (mainly MS Windows) need special attention. But even there it still works. See the multiprocessing documentation.
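One plausible reason the code in EDIT 2 fails right away is that the pool is created at module level, so on a platform that starts workers by re-importing the script, each worker tries to create its own pool. The following is a minimal sketch of the same code with the pool moved under a main guard. The arithmetic is the placeholder from the question, and build_inputs is a helper name introduced here for illustration only.

```python
import multiprocessing

def simulation(arg):
    """Placeholder for the real simulation; takes one (L, a) tuple."""
    L, a = arg
    # Stand-in arithmetic copied from the question, not a real simulation.
    return (L, a, a**2, a**3, a**4)

def build_inputs():
    """All (L, a) pairs, in the same order as the original nested loops."""
    return [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]

if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when they
    # import the module. This matters with the "spawn" start method, which is
    # the default on Windows and, since Python 3.8, on macOS.
    data = build_inputs()
    with multiprocessing.Pool() as pool:
        results = pool.map(simulation, data)  # same order as `data`
    print(len(results), results[:2])
```

Because Pool.map returns its results in the order of the input list, this also covers the ordering concern from the first EDIT: results[i] always belongs to data[i], regardless of which worker finished first.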

Here is one way to run it in parallel threads:
import threading
L_a = []
for L in range(0,6,2):
    for a in range(1,100):
        L_a.append((L,a))
        # Add the rest of your objects here
def RunParallelThreads():
    # Create an index list
    indexes = range(0,len(L_a))
    # Create the output list
    output = [None for i in indexes]
    # Create all the parallel threads
    threads = [threading.Thread(target=simulate,args=(output,i)) for i in indexes]
    # Start all the parallel threads
    for thread in threads: thread.start()
    # Wait for all the parallel threads to complete
    for thread in threads: thread.join()
    # Return the output list
    return output
def simulate(list,index):
    (L,a) = L_a[index]
    list[index] = (a,L) # Add the rest of your objects here
master_list = RunParallelThreads()

Use Pool().imap_unordered if ordering is not important. It will return results in a non-blocking fashion.
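A short sketch of what that might look like for the simulation in the question. Since results arrive in completion order, the worker here returns its inputs alongside the outputs so each result can be identified later. The simulate body is a hypothetical stand-in, not the real simulation.

```python
from multiprocessing import Pool

def simulate(arg):
    """Hypothetical stand-in for the simulation; echoes its inputs back."""
    L, a = arg
    return (L, a, a**2)  # inputs first, so each result identifies itself

if __name__ == "__main__":
    data = [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]
    with Pool() as pool:
        # Results are yielded in completion order, not input order.
        results = list(pool.imap_unordered(simulate, data))
    # Restore the input order afterwards if it is needed.
    results.sort(key=lambda r: (r[0], r[1]))
    print(results[:3])
```

If the order matters from the start, Pool.map or Pool.imap keep it without the extra sort.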

Related

Given N generators, is it possible to create a generator that runs them in parallel processes and yields the zip of those generators?

Suppose I have N generators gen_1, ..., gen_N, each of which yields the same number of values. I would like a generator gen that runs gen_1, ..., gen_N in N parallel processes and yields (next(gen_1), next(gen_2), ... next(gen_N)).
That is I would like to have:
def gen():
    yield (next(gen_1), next(gen_2), ... next(gen_N))
in such a way that each gen_i is running on its own process. Is it possible to do this? I have tried doing this in the following dummy example with no success:
from multiprocessing import Process
A = range(4)
def gen(a):
    B = ['a', 'b', 'c']
    for b in B:
        yield b + str(a)
def target(g):
    return next(g)
processes = [Process(target=target, args=(gen(a),)) for a in A]
for p in processes:
    p.start()
for p in processes:
    p.join()
However I get the error TypeError: cannot pickle 'generator' object.
EDIT:
I have modified @darkonaut's answer a bit to fit my needs. I am posting it in case some of you find it useful. We first define a couple of utility functions:
from itertools import zip_longest
from typing import List, Generator
def grouper(iterable, n, fillvalue=iter([])):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
def split_generators_into_batches(generators: List[Generator], n_splits):
    chunks = grouper(generators, len(generators) // n_splits + 1)
    return [zip_longest(*chunk) for chunk in chunks]
The following class splits any number of generators into n batches (n being the number of processes) and processes them, yielding the desired result:
import itertools
import multiprocessing as mp
class GeneratorParallelProcessor:
    SENTINEL = 'S'
    def __init__(self, generators, n_processes = 2 * mp.cpu_count()):
        self.n_processes = n_processes
        self.generators = split_generators_into_batches(list(generators), n_processes)
        # grouper() can produce fewer batches than n_processes, so size the
        # barrier and the sentinel list by the number of batches actually made
        n_batches = len(self.generators)
        self.queue = mp.SimpleQueue()
        self.barrier = mp.Barrier(n_batches + 1)
        self.sentinels = [self.SENTINEL] * n_batches
        self.processes = [
            mp.Process(target=self._worker, args=(self.barrier, self.queue, gen)) for gen in self.generators
        ]
    def process(self):
        for p in self.processes:
            p.start()
        while True:
            results = list(itertools.chain(*(self.queue.get() for _ in self.generators)))
            if results != self.sentinels:
                yield results
                self.barrier.wait()
            else:
                break
        for p in self.processes:
            p.join()
    def _worker(self, barrier, queue, generator):
        for x in generator:
            queue.put(x)
            barrier.wait()
        queue.put(self.SENTINEL)
To use it just do the following:
parallel_processor = GeneratorParallelProcessor(generators)
for grouped_generator in parallel_processor.process():
    output_handler(grouped_generator)
It's possible to get such a "Unified Parallel Generator (UPG)" (attempt to coin a name) with some effort, but as @jasonharper already mentioned, you definitely need to assemble the sub-generators within the child processes, since a running generator can't be pickled.
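To make the pickling constraint concrete, here is a small sketch that checks what pickle accepts. The helper can_pickle is a name made up for this illustration. The gen function mirrors the one in the question.

```python
import pickle

def gen(a):
    """Same shape as the generator in the question."""
    for ch in ['a', 'b', 'c']:
        yield ch + str(a)

def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False

if __name__ == "__main__":
    # A live generator object, as passed to Process in the question.
    print(can_pickle(gen(0)))
    # The generator function together with its arguments.
    print(can_pickle((gen, (0,))))
```

When this is run as a script, the live generator is rejected, while the module-level function plus its argument tuple should be accepted. That is why the design hands gen_func and gen_args to each worker and calls gen_func(*gen_args) inside the child.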
The pattern below is re-usable with only the generator function gen() being custom to this example. The design uses multiprocessing.SimpleQueue for returning generator results to the parent and multiprocessing.Barrier for synchronization.
Calling Barrier.wait() blocks the caller (a thread in any process) until the specified number of parties has called .wait(), whereupon all threads currently waiting on the Barrier are released simultaneously. Using a Barrier here ensures that workers start computing the next round of generator results only after the parent has received all results from the current iteration, which might be desirable to keep overall memory consumption in check.
The number of parallel workers used equals the number of argument-tuples you provide within the gen_args_tuples-iterable, so gen_args_tuples=zip(range(4)) will use four workers for example. See comments in code for further details.
import multiprocessing as mp
SENTINEL = 'SENTINEL'
def gen(a):
    """Your individual generator function."""
    lst = ['a', 'b', 'c']
    for ch in lst:
        for _ in range(int(10e6)): # some dummy computation
            pass
        yield ch + str(a)
def _worker(i, barrier, queue, gen_func, gen_args):
    for x in gen_func(*gen_args):
        print(f"WORKER-{i} sending item.")
        queue.put((i, x))
        barrier.wait()
    queue.put(SENTINEL)
def parallel_gen(gen_func, gen_args_tuples):
    """Construct and yield from parallel generators
    built from `gen_func(gen_args)`.
    """
    gen_args_tuples = list(gen_args_tuples) # ensure list
    n_gens = len(gen_args_tuples)
    sentinels = [SENTINEL] * n_gens
    queue = mp.SimpleQueue()
    barrier = mp.Barrier(n_gens + 1) # `parties`: + 1 for parent
    processes = [
        mp.Process(target=_worker, args=(i, barrier, queue, gen_func, args))
        for i, args in enumerate(gen_args_tuples)
    ]
    for p in processes:
        p.start()
    while True:
        results = [queue.get() for _ in range(n_gens)]
        if results != sentinels:
            results.sort()
            yield tuple(r[1] for r in results) # sort and drop ids
            barrier.wait() # all workers are waiting
            # already, so this will unblock immediately
        else:
            break
    for p in processes:
        p.join()
if __name__ == '__main__':
    for res in parallel_gen(gen_func=gen, gen_args_tuples=zip(range(4))):
        print(res)
Output:
WORKER-1 sending item.
WORKER-0 sending item.
WORKER-3 sending item.
WORKER-2 sending item.
('a0', 'a1', 'a2', 'a3')
WORKER-1 sending item.
WORKER-2 sending item.
WORKER-3 sending item.
WORKER-0 sending item.
('b0', 'b1', 'b2', 'b3')
WORKER-2 sending item.
WORKER-3 sending item.
WORKER-1 sending item.
WORKER-0 sending item.
('c0', 'c1', 'c2', 'c3')
Process finished with exit code 0
I went for a slightly different approach; you can modify the example below accordingly.
Somewhere in the main script, initialize the pool according to your needs. You need just these two lines:
from multiprocessing import Pool
pool = Pool(processes=4)
then you can define a generator function like this:
(Note that the generators input is assumed to be any iterable containing all the generators)
def parallel_generators(generators, pool):
    results = ['placeholder']
    while len(results) != 0:
        batch = pool.map_async(next, generators) # defines the next round of values
        results = list(batch.get()) # actual calculation done here
        yield results
    return
We define the results condition in the while loop like this because map objects with next and generators return an empty list when the generators stop producing values. So at that point we just terminate the parallel generator.
EDIT
So apparently the multiprocessing Pool and map don't play well with generators, so the above code does not work as intended. Do not use it until a later update.
As for the pickle error, it seems some bound functions do not support pickling, which the multiprocessing library needs in order to transfer objects and functions. As a workaround, the pathos multiprocessing library uses dill, which removes the dependence on pickle and is an option you might want to try. Searching Stack Overflow for your error also turns up some more complicated solutions with custom code for pickling the functions needed.

Multiprocess a Queue and return a list

So I am trying to process a queue of zip-code pairs. The function fed by the queue calculates the driving and straight-line distance between the two zip codes. I would like the function to return both values (driving and straight-line distance) and to be able to append those values to a list so I can use them later. I'm fairly new to multiprocessing, so I'm not sure where to go from here. I originally wasn't sure if you could pass two arguments through a pool/queue, so I decided to put the two zip codes into a tuple, pass that as one argument, and pull out the individual items inside the function. Please let me know if you need more information from me.
doc_num = []
origin_zip = []
origin_add = []
origin_city = []
destin_zip = []
destin_add = []
destin_city = []
#for i in range(14, len(data)-1):
for i in range(14, 16):
    doc_num.append(data['AutoTable+Fit.13'][i])
    origin_add.append(data['AutoTable+Fit'][i])
    origin_city.append(data['AutoTable+Fit.1'][i])
    origin_zip.append(data['AutoTable+Fit.2'][i])
    destin_add.append(data['AutoTable+Fit.8'][i])
    destin_city.append(data['AutoTable+Fit.6'][i])
    destin_zip.append(data['AutoTable+Fit.7'][i])
distances = []
def calculate_distances(q):
    try:
        zip_sets = q.get()
        drive = (gmaps.distance_matrix(zip_sets[1][0], zip_sets[1][1]))['rows'][0]['elements'][0]['distance']['value'] * 0.000621371
        #print(f"Driving distance:", drive_dist)
        LAT1 = gmaps.geocode(zip_sets[1][0])[0]['geometry']['location']['lat']
        LONG1 = gmaps.geocode(zip_sets[1][0])[0]['geometry']['location']['lng']
        LAT2 = gmaps.geocode(zip_sets[1][1])[0]['geometry']['location']['lat']
        LONG2 = gmaps.geocode(zip_sets[1][1])[0]['geometry']['location']['lng']
        distance = math.acos(math.cos(math.radians(90-(LAT1))) *math.cos(math.radians(90-(LAT2))) + math.sin(math.radians(90-(LAT1)))\
            * math.sin(math.radians(90-(LAT2))) * math.cos(math.radians((LONG1)-(LONG2)))) * 3958.756
    except:
        drive = -1
        distance = -1
    return (drive, distance)
q = Queue()
for x in range(len(origin_zip)):
    q.put((origin_zip[x], destin_zip[x]))
pool = Pool(5)
pool.map(calculate_distances, (q,))
pool.close()
pool.join()
A queue.Queue() should not be used for passing arguments to a pool of workers. Instead, pass an iterable (e.g. a list of argument tuples) directly to pool.map():
# Create a list of pairs as input to pool.map() (Not needed, see below)
pairs = [pair for pair in zip(origin_zip, destin_zip)]
distances = pool.map(calculate_distances, pairs)
pool.close()
pool.join()
However, creating the pairs list is not necessary in this case, as the output of the zip operation is already an iterable and can be used directly. Also, if you are running Python 3.3 or above (which I sincerely hope you are), you should consider using a context manager (with ...) instead of the pool.close() and pool.join() calls, giving you the following piece of code:
with Pool(5) as pool:
    distances = pool.map(calculate_distances, zip(origin_zip, destin_zip))
i.e., no explicit close() or join() and no intermediate list of zip codes.
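For this to work, the worker function has to accept one pair per call rather than a queue. The following is a rough sketch of that shape, covering only the straight-line distance and taking coordinates directly so that it runs without the Google Maps client. The function name straight_distance and the sample coordinates are assumptions made for illustration. The formula is the spherical law of cosines from the question, written with latitudes instead of 90 minus latitude.

```python
import math
from multiprocessing import Pool

EARTH_RADIUS_MILES = 3958.756  # same constant as in the question

def straight_distance(pair):
    """Great-circle distance in miles between two (lat, lng) points.

    `pair` is ((lat1, lng1), (lat2, lng2)) in degrees. It arrives as a
    single argument because pool.map passes each item of the iterable whole.
    """
    (lat1, lng1), (lat2, lng2) = pair
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlng = math.radians(lng1 - lng2)
    # Spherical law of cosines; clamp to [-1, 1] to absorb rounding error.
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlng)
    return math.acos(max(-1.0, min(1.0, c))) * EARTH_RADIUS_MILES

if __name__ == "__main__":
    # Hypothetical coordinates standing in for geocoded zip codes.
    origins = [(40.7128, -74.0060), (34.0522, -118.2437)]
    destinations = [(41.8781, -87.6298), (37.7749, -122.4194)]
    with Pool(2) as pool:
        distances = pool.map(straight_distance, zip(origins, destinations))
    print(distances)
```

The return value of pool.map is already the list of results, in input order, so there is no need to append to a shared distances list from inside the workers.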

Use a global variable to keep track of progression of a multiprocessing program

I have a program that runs with multiprocessing. I would like to add a progress indicator that prints as it goes.
This is what I came up with:
import multiprocessing as mp
import os
global counter
global size
def f(x):
    global counter
    global size
    print ("{} / {}".format(counter, size))
    counter += 1
    return x**2
size = 4
counter = 1
result = list()
for x in [1,2,3,4]:
    result.append(f(x))
This one works. However, if you replace the bottom part with:
with mp.Pool(processes = 2) as p:
    p.starmap(f, [1,2,3,4])
It doesn't. I don't understand why, can anyone help to get that up and running ? Thanks :)
N.B: This is of course a dummy example.
EDIT:
OK, a new issue appears with your solution. Here is an example:
fix1 = 1
fix2 = 2
dynamic = [1,2,3,4,5]
def f(x, y, z):
    return x**2 + y + z
size = len(dynamic)
counter = 1
with mp.Pool(processes = 2) as p:
    for output in p.starmap(f, [(x, fix1, fix2) for x in dynamic]):
        print ("{} / {}".format(counter, size))
        counter += 1
This one works but does all the print at the end.
with mp.Pool(processes = 2) as p:
    for output in p.imap_unordered(f, [(x, fix1, fix2) for x in dynamic]):
        print ("{} / {}".format(counter, size))
        counter += 1
This one doesn't work and says that f() is missing 2 required positional arguments, y and z.
Any idea why I get this behavior?
N.B: I'm running on windows.
On a forking system like Linux, subprocesses share a copy-on-write view of the parent's memory space. If one side updates memory, it gets its own private copy of the changed pages. On other systems, a new process is created and a fresh Python interpreter is started in it. In either case, neither side sees the changes the other makes. That means every process updates its own private copy of counter and doesn't see the additions made by the others.
To complicate things further, stdout is not synchronized. If workers print, you're likely to get garbled messages.
An alternative is to count the results as they come back to the parent from the pool. The parent tracks the count and is the only one printing. If you don't care about the order of the returned data, then imap_unordered will work well for you.
import multiprocessing as mp
def f(x):
    return x**2
if __name__ == "__main__":  # needed on Windows, where workers re-import this module
    data = [1,2,3,4]
    result = []
    with mp.Pool(processes = 2) as p:
        for val in p.imap_unordered(f, data):
            result.append(val)
            print("progress", len(result)/len(data))
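Regarding the follow-up in the EDIT: starmap unpacks each tuple into separate arguments but only returns once every job is done, which is why all the prints come at the end. imap_unordered yields results as they finish but passes each item to the function as a single argument, hence the missing-arguments error. One way to combine the two, sketched below, is a small module-level wrapper that unpacks the tuple. The name f_star is made up for this example.

```python
import multiprocessing as mp

def f(x, y, z):
    return x**2 + y + z

def f_star(args):
    """Unpack one argument tuple; imap_unordered passes a single item per call."""
    return f(*args)

if __name__ == "__main__":
    fix1, fix2 = 1, 2
    dynamic = [1, 2, 3, 4, 5]
    jobs = [(x, fix1, fix2) for x in dynamic]
    with mp.Pool(processes=2) as p:
        # Each result is yielded as soon as a worker finishes it, so the
        # progress line is printed incrementally rather than all at the end.
        for done, output in enumerate(p.imap_unordered(f_star, jobs), start=1):
            print("{} / {}".format(done, len(jobs)))
```

The wrapper is defined at module level rather than as a lambda so that it can be pickled and sent to the workers.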

Getting a pickle error when trying to run processes

What I'm trying to do is run a list of prime number decompositions in different processes at once. I have a threaded version that's working, but can't seem to get it working with processes.
import math
from Queue import Queue
import multiprocessing
def primes2(n):
    primfac = []
    num = n
    d = 2
    while d * d <= n:
        while (n % d) == 0:
            primfac.append(d) # supposing you want multiple factors repeated
            n //= d
        d += 1
    if n > 1:
        primfac.append(n)
    myfile = open('processresults.txt', 'a')
    myfile.write(str(num) + ":" + str(primfac) + "\n")
    return primfac
def mp_factorizer(nums, nprocs):
    def worker(nums, out_q):
        """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
        """
        outdict = {}
        for n in nums:
            outdict[n] = primes2(n)
        out_q.put(outdict)
    # Each process will get 'chunksize' nums and a queue to put his out
    # dict into
    out_q = Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):
        p = multiprocessing.Process(
            target=worker,
            args=(nums[chunksize * i:chunksize * (i + 1)],
                  out_q))
        procs.append(p)
        p.start()
    # Collect all results into a single result dict. We know how many dicts
    # with results to expect.
    resultdict = {}
    for i in range(nprocs):
        resultdict.update(out_q.get())
    # Wait for all worker processes to finish
    for p in procs:
        p.join()
    print resultdict
if __name__ == '__main__':
    mp_factorizer((400243534500, 100345345000, 600034522000, 9000045346435345000), 4)
I'm getting a pickle error shown below:
Any help would be greatly appreciated :)
You need to use multiprocessing.Queue instead of the regular Queue.
This is because a Process doesn't run in the same memory space as its parent, and some objects aren't picklable, such as a regular queue (Queue.Queue). To overcome this, the multiprocessing library provides its own Queue class that is designed to be shared between processes.
Also, move def worker(...) out of mp_factorizer to module level, like any other function. This could well be your main problem, because of how a new process is started at the OS level.
You can also use a multiprocessing.Manager.
Dynamically created (nested) functions cannot be pickled and therefore cannot be used as the target of a Process whenever the target has to be pickled, as it is on Windows. The function worker needs to be defined in the global scope instead of inside the definition of mp_factorizer.
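Putting both suggestions together, a Python 3 sketch of the factorizer might look like this. It is an adaptation rather than the asker's exact code: the file logging is dropped, prime_factors is a renamed primes2, and the input numbers are small illustrative values so the sketch runs quickly.

```python
import math
import multiprocessing

def prime_factors(n):
    """Trial-division factorization, as in the question (file logging omitted)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def worker(nums, out_q):
    """Module-level so it can be pickled; pushes one dict of results."""
    out_q.put({n: prime_factors(n) for n in nums})

def mp_factorizer(nums, nprocs):
    out_q = multiprocessing.Queue()  # not queue.Queue
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):
        chunk = nums[chunksize * i:chunksize * (i + 1)]
        p = multiprocessing.Process(target=worker, args=(chunk, out_q))
        procs.append(p)
        p.start()
    # Read one dict per worker before joining, so no worker is left
    # blocked trying to flush its result into the queue.
    resultdict = {}
    for _ in range(nprocs):
        resultdict.update(out_q.get())
    for p in procs:
        p.join()
    return resultdict

if __name__ == "__main__":
    # Illustrative inputs, much smaller than the ones in the question.
    print(mp_factorizer((12, 97, 360, 1001), 2))
```

Returning the dictionary instead of printing it inside mp_factorizer leaves it up to the caller what to do with the result.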

How to parallel sum a loop using multiprocessing in Python

I am having difficulty understanding how to use Python's multiprocessing module.
I have a sum from 1 to n where n=10^10, which is too large to fit into a list, which seems to be the thrust of many examples online using multiprocessing.
Is there a way to "split up" the range into segments of a certain size and then perform the sum for each segment?
For instance
def sum_nums(low,high):
    result = 0
    for i in range(low,high+1):
        result += i
    return result
And I want to compute sum_nums(1,10**10) by breaking it up into many pieces: sum_nums(1,1000) + sum_nums(1001,2000) + sum_nums(2001,3000)... and so on. I know there is a closed-form n(n+1)/2, but pretend we don't know that.
Here is what I've tried
import multiprocessing
def sum_nums(low,high):
    result = 0
    for i in range(low,high+1):
        result += i
    return result
if __name__ == "__main__":
    n = 1000
    procs = 2
    sizeSegment = n/procs
    jobs = []
    for i in range(0, procs):
        process = multiprocessing.Process(target=sum_nums, args=(i*sizeSegment+1, (i+1)*sizeSegment))
        jobs.append(process)
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    #where is the result?
I find multiprocessing.Pool and map() much simpler to use.
Using your code:
from multiprocessing import Pool
def sum_nums(args):
    low = int(args[0])
    high = int(args[1])
    return sum(range(low,high+1))
if __name__ == "__main__":
    n = 1000
    procs = 2
    sizeSegment = n // procs  # assumes n is divisible by procs
    # Create size segments list
    jobs = []
    for i in range(0, procs):
        jobs.append((i*sizeSegment+1, (i+1)*sizeSegment))
    partial_sums = Pool(procs).map(sum_nums, jobs)
    result = sum(partial_sums)
    print(result)  # 500500
You can do this sum without multiprocessing at all, and it's probably simpler, if not faster, to just use generators.
# prepare a generator of generators each at 1000 point intervals
>>> xr = (xrange(1000*i+1,i*1000+1001) for i in xrange(10000000))
>>> list(xr)[:3]
[xrange(1, 1001), xrange(1001, 2001), xrange(2001, 3001)]
# sum, using two map functions
>>> xr = (xrange(1000*i+1,i*1000+1001) for i in xrange(10000000))
>>> sum(map(sum, map(lambda x:x, xr)))
50000000005000000000L
However, if you want to use multiprocessing, you can do that too. I'm using a fork of multiprocessing that is better at serialization (but otherwise not really different).
>>> xr = (xrange(1000*i+1,i*1000+1001) for i in xrange(10000000))
>>> import pathos
>>> mmap = pathos.multiprocessing.ProcessingPool().map
>>> tmap = pathos.multiprocessing.ThreadingPool().map
>>> sum(tmap(sum, mmap(lambda x:x, xr)))
50000000005000000000L
The version w/o multiprocessing is faster and takes about a minute on my laptop. The multiprocessing version takes a few minutes due to the overhead of spawning multiple python processes.
If you are interested, get pathos here: https://github.com/uqfoundation
First, the best way to get around the memory issue is to use an iterator/generator instead of a list:
def sum_nums(low, high):
    result = 0
    for i in xrange(low, high+1):
        result += i
    return result
In Python 3, range() is already lazy, so xrange is only needed in Python 2.
Now, where multiprocessing comes in is when you want to split up the processing across different processes or CPU cores. If you don't need to control the individual workers, then the easiest method is to use a process pool. This will let you map a function to the pool and get the output. Alternatively, you can use apply_async to submit jobs to the pool one at a time; each call returns a delayed result which you can retrieve with .get():
import multiprocessing
from multiprocessing import Pool
from time import time
def sum_nums(low, high):
    result = 0
    for i in range(low, high+1):
        result += i
    return result
# map requires a function that takes a single argument
def sn(bounds):
    low, high = bounds
    return sum_nums(low, high)
if __name__ == '__main__':
    #t = time()
    # takes forever
    #print(sum_nums(1,10**10))
    #print('{} s'.format(time() -t))
    p = Pool(4)
    n = int(1e8)
    r = range(0,10**10+1,n)
    results = []
    # using apply_async
    t = time()
    for arg in zip([x+1 for x in r],r[1:]):
        results.append(p.apply_async(sum_nums, arg))
    # wait for results
    print(sum(res.get() for res in results))
    print('{} s'.format(time() -t))
    # using process pool
    t = time()
    print(sum(p.map(sn, zip([x+1 for x in r], r[1:]))))
    print('{} s'.format(time() -t))
On my machine, just calling sum_nums with 10**10 takes almost 9 minutes, but using a Pool(8) and n=int(1e8) reduces this to just over a minute.
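The splitting step itself can be factored into a small helper that also copes with ranges that do not divide evenly. The following is a sketch under that assumption. The name segments and the chunk size are choices made for this example, and n is kept far below 10**10 so that it finishes quickly.

```python
from multiprocessing import Pool

def sum_nums(low, high):
    """Sum the integers low..high inclusive without building a list."""
    return sum(range(low, high + 1))

def segments(low, high, size):
    """Yield inclusive (start, stop) bounds covering low..high in chunks of `size`.

    The last chunk is shorter when the span is not a multiple of `size`.
    """
    start = low
    while start <= high:
        stop = min(start + size - 1, high)
        yield (start, stop)
        start = stop + 1

if __name__ == "__main__":
    n = 10**7  # illustrative; kept small so the sketch finishes quickly
    with Pool(4) as pool:
        # starmap unpacks each (start, stop) tuple into sum_nums(start, stop).
        total = sum(pool.starmap(sum_nums, segments(1, n, 10**6)))
    # Compare against the closed form as a sanity check.
    print(total == n * (n + 1) // 2)
```

Only the segment bounds and the partial sums travel between processes, so memory use stays small no matter how large n is.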
