I am trying to use Python's multiprocessing module to speed up some processing. However, when I run the test_parallel_compute function at the very bottom of the code on a computing cluster with 32 nodes (EDIT: I've found out that I'm only running across one node), the program takes longer with multiprocessing than without: 1024 seconds (32 processes) vs. 231 seconds (no multiprocessing module used). 1022 of those seconds were spent in the pool.map call within the parallel_compute_new_2 function, so the time is not limited by partitioning the inputs nor by joining the return values.
I have an input list (b) and several other arguments (a and c) for the function (test_function). To prepare these for the multiple processes, I partition b. I then pass the function and its partitioned arguments to worker_function_new, which calls test_function on its slice of the arguments.
QUESTION EDITED:
Can you see any inefficiencies in mapping the multiple processes as below? Again, 1022 of the seconds were spent in the pool.map call within the parallel_compute_new_2 function, so the time is not limited by partitioning the inputs nor by joining the return values.
I am running this with inputs of a = 100.0, b = range(10000000), and c = 15.0.
Thank you!
# Partition the input list
def partition_inputs(input, number):
    num_inputs = len(input)
    return [input[num_inputs * i/number:num_inputs * (i+1)/number] for i in range(number)]
# This is the function that each process is supposed to run.
# It takes in several arguments. b is a long list, which is partitioned
# into multiple slices for each process. a and c are just other numbers.
# This function's return values, d, e, and f, are joined between each process.
def test_function_2(args):
    a = args[0]
    b = args[1]
    c = args[2]
    d = []
    e = 0
    f = {}
    for i in b:
        d.append(a*i*c)
        f[i] = set([a, i, c])
    return d, e, f
def parallel_compute_new_2(function, args, input, input_index, partition_input_function, join_functions_dict, pool,
                           number=32, procnumber=32):
    # Partition the b list. In my case, the partition_input_function is
    # partition_inputs, as above.
    new_inputs = partition_input_function(input, number)
    # Since test_function_2 requires arguments (a, c) beyond the partitioned
    # list b, create a list of the complete arguments.
    worker_function_args_list = []
    for i in range(number):
        new_args = args[:]
        new_args[input_index] = new_inputs[i]
        worker_function_args_list.append(new_args)
    returnlist = pool.map(function, worker_function_args_list)
    # Join the return values from each process.
    return_values = list(returnlist[0])
    for index in join_functions_dict:
        for proc in range(1, number):
            return_values[index] = join_functions_dict[index](return_values[index], returnlist[proc][index])
    return return_values
def test_parallel_compute(a, b, c, number=32, procnumber=32):
    join_functions_dict = {}
    join_functions_dict[0] = lambda a, b: a + b
    join_functions_dict[2] = combine_dictionaries
    # a = 100.
    # b = range(1000000000)
    # c = 15.
    d, e, f = test_function(a, b, c)
    pool = mup.Pool(processes=procnumber)
    d1, e1, f1 = parallel_compute_new_2(test_function_2, [a, b, c], b, 1, partition_inputs, join_functions_dict, pool,
                                        number=number, procnumber=procnumber)
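The snippet above references two helpers that are not shown here, so the following is a minimal sketch of assumed stand-ins (combine_dictionaries as a plain dict merge, test_function as the serial version of test_function_2) plus a small driver, just so the posted code can be run on its own:

import multiprocessing as mup

# Assumed stand-in: merge the per-process dicts (this helper is not shown in the question).
def combine_dictionaries(d1, d2):
    combined = dict(d1)
    combined.update(d2)
    return combined

# Assumed stand-in: the serial baseline, i.e. test_function_2 without partitioning.
def test_function(a, b, c):
    return test_function_2([a, b, c])

if __name__ == '__main__':
    # Small input for a quick check; the question uses b = range(10000000).
    test_parallel_compute(100.0, range(10000), 15.0)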
Related
I'm trying to run a function like f in parallel in Python but have two problems:
When using map, function f is not applied to all permuted tuples of arrays a and b.
When trying to use Pool, I get the following error:
TypeError: '<=' not supported between instances of 'tuple' and 'int'
def f(n,m):
    x = n * m
    return x

a = (1,2,3,4,5)
b = (3,4,7,8,9)

result = map(f, a, b)
print(list(result))

#now trying parallel computing
from multiprocessing import Pool
pool = Pool(processes=4)
print(*pool.map(f, a, b))
I didn't make any changes for your #1 issue and I get the expected result from using map(). You seem to have an incorrect assumption of how it works, but you didn't provide expected vs. actual results for your example.
For #2 to return the same answers as #1, you need starmap() instead of map() for this use of multiprocessing, and then zip() the argument lists to provide the sets of arguments. If you are on an OS that doesn't fork (and for portability even if you aren't), run global code only in the main process, not in a spawned process, by using the documented if __name__ == '__main__': idiom:
from multiprocessing import Pool

def f(n,m):
    x = n * m
    return x

if __name__ == '__main__':
    a = (1,2,3,4,5)
    b = (3,4,7,8,9)

    result = map(f, a, b)
    print(list(result))

    #now trying parallel computing
    pool = Pool(processes=4)
    print(*pool.starmap(f, zip(a, b)))
Output:
[3, 8, 21, 32, 45]
3 8 21 32 45
If you actually want permutations as mentioned in #1, use itertools.starmap or pool.starmap with itertools.product(a,b) as parameters instead.
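A minimal sketch of that variant, assuming the same f as above:

from itertools import product
from multiprocessing import Pool

def f(n, m):
    return n * m

if __name__ == '__main__':
    a = (1, 2, 3, 4, 5)
    b = (3, 4, 7, 8, 9)
    with Pool(processes=4) as pool:
        # one result per (n, m) pair in the Cartesian product of a and b: 25 values here
        print(pool.starmap(f, product(a, b)))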
I want to have a function foo which outputs another function, whose list of variables depends on an input list.
Precisely:
Suppose func is a function with the free variable t and three parameters A,gamma,x
Example: func = lambda t,A,gamma,x: Somefunction
I want to define a function foo, which takes a list as input and outputs another function. The output function is a sum of func's, where each func summand has its parameters independent of the others.
Depending on the input list, the variables of the output function change as follows:
If an entry of the list is None, then the output function 'gains' a variable; if an entry is a float, it 'fixes' that parameter.
Example:
li=[None,None,0.1]
g=foo(li)
gives the same output as
g = lambda t,A,gamma: func(t,A,gamma,0.1)
or
li=[None,None,0.1,None,0.2,0.3]
g=foo(li)
gives the same output as
g = lambda t,A1,gamma1,A2: func(t,A1,gamma1,0.1) + func(t,A2,0.2,0.3)
Note: the order in the list is relevant and this behaviour is wanted.
I don't have any clue on how to do that...
I first tried to build a string that depends on the input list and then execute it, but this is surely not the way.
First, partition the parameters from li into chunks. Then use an iterator to take either the next value from the function's *args (if the value in the chunk is None) or the fixed value provided in the chunk.
def foo(li, func, num_params):
    # Use a list (not a generator) so the returned function can be called more than once.
    chunks = [li[i:i+num_params] for i in range(0, len(li), num_params)]
    def function(t, *args):
        result = 0
        iter_args = iter(args)
        for chunk in chunks:
            actual_params = [next(iter_args) if x is None else x for x in chunk]
            result += func(t, *actual_params)
        return result
    return function
Example:
def f(t, a, b, c):
    print t, a, b, c
    return a + b + c

func = foo([1,2,None,4,None,6], f, 3)
print func("foo", 3, 5)
Output:
foo 1 2 3 # from first call to f
foo 4 5 6 # from second call to f
21 # result of func
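To tie it back to the question's own example, a quick sketch using a hypothetical concrete func of the stated form lambda t, A, gamma, x:

func = lambda t, A, gamma, x: A * t + gamma * x   # hypothetical body, just for illustration

g = foo([None, None, 0.1], func, 3)
# g(t, A, gamma) now behaves like func(t, A, gamma, 0.1)
print g(1.0, 2.0, 3.0)          # 2.0*1.0 + 3.0*0.1 = 2.3
print func(1.0, 2.0, 3.0, 0.1)  # 2.3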
I would like to integrate a system of differential equations for several parameter combinations using Python's multiprocessing module. So, the system should get integrated and the parameter combination should be stored as well as its index and the final value of one of the variables.
While that works fine when I use apply_async - which is already faster than doing it in a simple for-loop - I fail to implement the same thing using map_async, which seems to be faster than apply_async. The callback function is never called and I have no clue why. Could anyone explain why this happens and how to get the same output using map_async instead of apply_async?
Here is my code:
from pylab import *
import multiprocessing as mp
from scipy.integrate import odeint
import time

#my system of differential equations
def myODE(yn, tvec, allpara):
    (x, y, z) = yn
    a, b = allpara['para']
    dx = -x + a*y + x*x*y
    dy = b - a*y - x*x*y
    dz = x*y
    return (dx, dy, dz)

#returns the index of the parameter combination, the parameters and the integrated solution
#this way I know which parameter combination belongs to which outcome in the asynch-case
def runMyODE(yn, tvec, allpara):
    return allpara['index'], allpara['para'], transpose(odeint(myODE, yn, tvec, args=(allpara,)))

#for reproducibility
seed(0)

#time settings for integration
dt = 0.01
tmax = 50
tval = arange(0, tmax, dt)

numVar = 3   #number of variables (x, y, z)
numPar = 2   #number of parameters (a, b)
numComb = 5  #number of parameter combinations

INIT = zeros((numComb, numVar)) #initial conditions will be stored here
PARA = zeros((numComb, numPar)) #parameter combinations for a and b will be stored here

#create some initial conditions and random parameters
for combi in range(numComb):
    INIT[combi,:] = append(10*rand(2), 0) #initial conditions for x and y are randomly chosen, z is 0
    PARA[combi,:] = 10*rand(2)            #parameter a and b are chosen randomly

#################################using loop over apply####################

#results will be stored in here
asyncResultsApply = []

#my callback function
def saveResultApply(result):
    # storing the index, a, b and the final value of z
    asyncResultsApply.append((result[0], result[1], result[2][2,-1]))

#start the multiprocessing part
pool = mp.Pool(processes=4)
for combi in range(numComb):
    pool.apply_async(runMyODE, args=(INIT[combi,:], tval, {'para': PARA[combi,:], 'index': combi}), callback=saveResultApply)
pool.close()
pool.join()

for res in asyncResultsApply:
    print res[0], res[1], res[2] #printing the index, a, b and the final value of z

#######################################using map#####################
#the only difference is that the for loop is replaced by a "map_async" call

print "\n\nnow using map\n\n"

asyncResultsMap = []

#my callback function which is never called
def saveResultMap(result):
    # storing the index, a, b and the final value of z
    asyncResultsMap.append((result[0], result[1], result[2][2,-1]))

pool = mp.Pool(processes=4)
pool.map_async(lambda combi: runMyODE(INIT[combi,:], tval, {'para': PARA[combi,:], 'index': combi}), range(numComb), callback=saveResultMap)
pool.close()
pool.join()

#this does not work yet
for res in asyncResultsMap:
    print res[0], res[1], res[2] #printing the index, a, b and the final value of z
If I understood you correctly, this stems from something that confuses people quite often. apply_async's callback is called after its single operation, and so is map_async's - it is not called on each element, but rather once on the entire result list.
You are correct in noting that map is faster than repeated apply_async calls. If you want something to happen after each result, there are a few ways to go:
You can effectively add the callback to the operation you want to be performed on each element, and map using that.
You could use imap (or imap_unordered) in a loop, and do the callback within the loop body. Of course, this means that all will be performed in the parent process, but the nature of stuff written as callbacks means that's usually not a problem (it tends to be cheap functions). YMMV.
For example, suppose you have the functions f and cb, and you'd like to map f on es with cb for each op. Then you could either do:
def look_ma_no_cb(e):
    r = f(e)
    cb(r)
    return r

p = multiprocessing.Pool()
p.map(look_ma_no_cb, es)
or
for r in p.imap(f, es):
    cb(r)
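Applied to your script, the second variant might look roughly like this; it is only a sketch, and integrate_one is a hypothetical module-level wrapper (rather than a lambda) so it can be sent to the worker processes:

# assumes runMyODE, INIT, PARA, tval, numComb and mp from the script above
def integrate_one(combi):
    return runMyODE(INIT[combi,:], tval, {'para': PARA[combi,:], 'index': combi})

pool = mp.Pool(processes=4)
asyncResultsMap = []
for index, para, sol in pool.imap(integrate_one, range(numComb)):
    # the "callback" work now runs in the parent process, once per finished item
    asyncResultsMap.append((index, para, sol[2,-1]))
pool.close()
pool.join()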
I'm testing some functionality of IPython and I think I'm doing something wrong.
I'm testing 3 different ways to execute some math operations.
1st: using the @parallel.parallel(view=dview, block=True) decorator and its map function
2nd: using a single-core function (a normal Python function)
3rd: using the client's load-balanced view
I have this code:
from IPython import parallel
import numpy as np
import multiprocessing as mp
import time

rc = parallel.Client(block=True)
dview = rc[:]
lbview = rc.load_balanced_view()

@parallel.require(np)
def suma_pll(a, b):
    return a + b

@parallel.require(np)
def producto_pll(a, b):
    return a * b

def suma(a, b):
    return a + b

def producto(a, b):
    return a * b

@parallel.parallel(view=dview, block=True)
@parallel.require(np)
@parallel.require(suma_pll)
@parallel.require(producto_pll)
def a_calc_pll(a, b):
    result = []
    for i, v in enumerate(a):
        result.append(
            producto_pll(suma_pll(a[i], a[i]), suma_pll(b[i], b[i]))//100
        )
    return result

@parallel.require(suma)
@parallel.require(producto)
def a_calc_remote(a, b):
    result = []
    for i, v in enumerate(a):
        result.append(
            producto(suma(a[i], a[i]), suma(b[i], b[i]))//100
        )
    return result

def a_calc(a, b):
    return producto(suma(a, a), suma(b, b))//100

def main_pll(a, b):
    return a_calc_pll.map(a, b)

def main_lb(a, b):
    c = lbview.map(a_calc_remote, a, b, block=True)
    return c

def main(a, b):
    c = []
    for i in range(len(a)):
        c += [a_calc(a[i], b[i]).tolist()]
    return c

if __name__ == '__main__':
    a, b = [], []
    for i in range(1, 1000):
        a.append(np.array(range(i+00, i+10)))
        b.append(np.array(range(i+10, i+20)))

    t = time.time()
    c1 = main_pll(a, b)
    t1 = time.time()-t

    t = time.time()
    c2 = main(a, b)
    t2 = time.time()-t

    t = time.time()
    c3 = main_lb(a, b)
    t3 = time.time()-t

    print(str(c1) == str(c2))
    print(str(c3) == str(c2))
    print('%f secs (multicore)' % t1)
    print('%f secs (singlecore)' % t2)
    print('%f secs (multicore_load_balance)' % t3)
My result is:
True
True
0.040741 secs (multicore)
0.004004 secs (singlecore)
1.286592 secs (multicore_load_balance)
Why are my multicore routines slower than my single core routine? What is wrong with this approach? What can I do to fix it?
Some information: python3.4.1, ipython 2.2.0, numpy 1.9.0, ipcluster starting 8 Engines with LocalEngineSetLauncher
It seems to me that you are trying to parallelise something that takes too little time to execute on a single core. In Python, any form of "true" parallelism is multi-process, which means that you have to spawn multiple Python interpreters, transfer the data via pickling/unpickling, etc.
This is going to result in a noticeable overhead for small workloads. On my system, just starting and then immediately stopping a Python interpreter takes around 1/100 of a second:
# time python -c "pass"
real 0m0.018s
user 0m0.012s
sys 0m0.005s
I am not sure what the decorators you are using are doing behind the scenes, but as you can see just setting up the infrastructure for parallel work can take quite a bit of time.
edit
On further inspection, it looks like you are already setting up the workers before running your code, so the overhead hinted at above might be out of the picture.
You are, though, moving data to the worker processes: two lists of 1000 NumPy arrays. Pickling a and b to a string on my system takes ~0.13 seconds with pickle and ~0.046 seconds with cPickle. The pickling time can be reduced by storing your data in NumPy arrays instead of lists:
a = np.array(a)
b = np.array(b)
This cuts down the cPickle time to ~0.029 seconds.
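For reference, a minimal sketch of one way to measure that pickling cost yourself (it assumes the a and b built in the script above; the exact numbers will of course differ per system):

import time
import cPickle as pickle   # on Python 3 this would simply be "import pickle"

# a and b are the two lists of NumPy arrays built in the script above
t = time.time()
s = pickle.dumps(a) + pickle.dumps(b)
print('pickling took %f secs, %d bytes' % (time.time() - t, len(s)))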
I am trying to write a parameter search function to loop over one of the parameters and repeatedly call a function with all other parameters the same, other than the one I am searching over. Here is some sample code:
def worker1(a, b, c):
    return a + b + c

def worker2(d, e, f):
    return d * e * f

def search(model, params):
    res = []
    # Loop over one of the parameters and repeatedly append to res
    if model == 1:
        res.append(worker1(**params))
    elif model == 2:
        res.append(worker2(**params))
    return res

params = dict(a=1, b=2, c=3)
print search(1, params)
I have two workers and they are called depending on the value of the model flag I pass to search(). The problem I am trying to solve here is to write a loop (commented in the code) over the if statements to repeatedly call, say, worker1 while varying only one of the parameters. I want my code to be flexible - sometimes I want to loop through a and keep b and c the same, but sometimes I want to loop through b and keep a and c the same.
I'm open to whatever solution is suggested, but I think I would be specifying the search parameters in the params dictionary. E.g., to loop a over 1, 2, 3, 4, I would say:
`params = dict(a=[1,2,3,4], b=2, c=3)`
Also it would be nice if I don't have to modify the code for worker1 and worker2.
Thank you!
You could perhaps use itertools.product to call your workers with all combinations of params:
http://docs.python.org/2/library/itertools.html#itertools.product
e.g.
from itertools import product

def worker1(a, b, c):
    return a + b + c

def worker2(d, e, f):
    return d * e * f

def search(model, *params):
    res = []
    # Loop over one of the parameters and repeatedly append to res
    for current_params in product(*params):
        if model == 1:
            res.append(worker1(*current_params))
        elif model == 2:
            res.append(worker2(*current_params))
    return res

print search(1, [1,2,3,4], [2], [3])

# more complicated combinations are possible:
print search(1, [1,2,3,4], [2,7,9], [3,13,23,43])
I've avoided using keyword arguments as your worker functions take differently-named args so it wouldn't make much sense.
I'm assuming your worker functions don't actually look like the ones above, because if they did you could simplify the code further using the built-in sum and reduce functions, for instance:
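A sketch of that simplification, assuming the workers really are just a sum and a product:

from itertools import product
from operator import mul

def search(model, *params):
    res = []
    for current_params in product(*params):
        if model == 1:
            res.append(sum(current_params))          # same as worker1
        elif model == 2:
            res.append(reduce(mul, current_params))  # same as worker2
    return res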
I am not sure if I understood the problem. Check if this is what you want (omitted the model parameter):
>>> def worker1(a, b, c):
        return a + b + c

>>> def search(params):
        params = params.values()
        var_param = filter(lambda p: type(p) == list, params)[0]
        other_params = filter(lambda p: p != var_param, params)
        return [worker1(x, *other_params) for x in var_param]

>>> search({'a':2, 'b':[3,4,5], 'c':3})
[8, 9, 10]
Assuming:
arguments of worker1() are commutative (order does not matter).
variable parameter is a list
other parameters are single values.
In the above sample, b is the variable parameter that you want to loop over.
Update:
In case order of the arguments of the function worker1 is to be preserved:
def search(params):
    params = params.items()
    var_param = filter(lambda t: type(t[1]) == list, params)[0]
    other_params = filter(lambda t: t != var_param, params)
    var_param_key = var_param[0]
    var_param_values = var_param[1]
    return [worker1(**dict([(var_param_key, x)] + other_params)) for x in var_param_values]
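Used the same way as before, and assuming the worker1 defined above, this gives the same result:

>>> search({'a':2, 'b':[3,4,5], 'c':3})
[8, 9, 10]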