Which is the correct way to parallelize this? Essentially, I have a very large 2D array and I want to do a linear fit of each row against a separate array (x) of the same length, which is constant for all rows. The expected result is a 1D array (data_slopes) containing the fitted slopes. This code works, but it is very slow:
for j in range(img1_data_r.shape[0]):
    y = img1_data_r[j, :]
    model = LinearRegression()
    model.fit(x.reshape((-1, 1)), y, 1)
    data_slopes[j] = model.coef_[0]
I have no previous experience with multiprocessing.Pool, and my attempts so far have been unsuccessful.
If you can pass in the domain of the data that needs to be processed, you can use xargs to run your program in parallel. xargs executes a program in parallel, passing in different parameters read from stdin. I've used it successfully to make bash shells work in parallel.
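As a rough sketch of that idea (the fit_row.py script name and the .npy file paths are hypothetical; it assumes x and img1_data_r have been saved to disk beforehand), each invocation fits a single row, and xargs -P runs several invocations at once:

import sys
import numpy as np
from sklearn.linear_model import LinearRegression

# fit_row.py: fit one row and print "index slope"
# driven in parallel with, e.g.:  seq 0 9999 | xargs -n 1 -P 4 python fit_row.py
j = int(sys.argv[1])                                      # row index supplied by xargs
x = np.load("x.npy")                                      # shared x values
img1_data_r = np.load("img1_data_r.npy", mmap_mode="r")   # memory-map the large array

model = LinearRegression()
model.fit(x.reshape(-1, 1), img1_data_r[j, :])
print(j, model.coef_[0])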
See if this question helps you: Python read .json files from GCS into pandas DF in parallel
You can try the following. Instead of iterating over the rows yourself, write a function that takes one of your arrays and returns your expected LinearRegression output. Then create a list (the iterator) containing all the arrays you need to process -
import multiprocessing as mp

# Function that works on a single object
def fn(x):
    out = x**3  # your code here
    return out

iterator = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # list of objects that you need to run your function on
pool = mp.Pool(processes=4)                 # number of cores you want to utilize
results = [pool.apply_async(fn, args=(x,)) for x in iterator]  # submits the function with each item to the workers asynchronously
output = [p.get() for p in results]         # collects and returns the results as a list of outputs

output
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
pool.apply_async combined with the list comprehension dispatches work quickly, because it submits each task to the workers asynchronously instead of waiting for all cores to finish one batch before passing the next.
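Adapted to the row-wise fit from the question, the same pattern might look roughly like this (an untested sketch; it assumes numpy, LinearRegression, x and img1_data_r are already imported/defined as in the question):

import multiprocessing as mp

def fit_row(y):
    # fit one row against the shared x values and return the slope
    model = LinearRegression()
    model.fit(x.reshape(-1, 1), y)
    return model.coef_[0]

if __name__ == "__main__":
    pool = mp.Pool(processes=4)
    results = [pool.apply_async(fit_row, args=(row,)) for row in img1_data_r]
    data_slopes = np.array([p.get() for p in results])
    pool.close()
    pool.join()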
Here is an example of how you can use multiprocessing to operate on rows of a 2d numpy array and a constant vector.
In this example, the same vector b (equivalent to your x) is dot-producted with each of the rows of the a array.
import numpy as np
from multiprocessing import Pool

def dot_product(row, vec):
    return (row * vec).sum()

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])
b = np.array([10, 11, 12])

p = Pool(3)  # max number of simultaneous processes
print(p.starmap(dot_product, ((row, b) for row in a)))
Note that anything you pass to a multiprocessing.Pool must be picklable. NumPy arrays are picklable, and so is a module-level function such as dot_product here, but lambdas and locally defined functions are not. So rather than trying to pass your model (LinearRegression()) through Pool.map (or Pool.starmap), instantiate LinearRegression inside the worker function, so that each task creates its own.
Putting this together for you (although obviously I don't have enough information to test this), you would get something like this:
def get_data_slope(row, x):
    model = LinearRegression()
    model.fit(x.reshape((-1, 1)), row, 1)
    return model.coef_[0]

p = Pool(3)
data_slopes[:] = p.starmap(get_data_slope, ((row, x) for row in img1_data_r))
I have a matrix of matrices that I need to do computations on (i.e. an x-by-y-by-z-by-z array); I need to perform a computation on each of the z-by-z matrices, so x*y operations in total. Currently I am looping over the first two dimensions and performing the computation, but the computations are entirely independent. Is there a way I can compute these in parallel, without knowing in advance how many such matrices there will be (i.e. x and y unknown)?
Yes; see the multiprocessing module. Here's an example (tweaked from the one in the docs to suit your use case). (I assume z = 1 here for simplicity, so that f takes a scalar.)
from multiprocessing import Pool

# x = 2, y = 3, z = 1 - needn't be known in advance
matrix = [[1, 2, 3], [4, 5, 6]]

def f(x):
    # Your computation for each z-by-z submatrix goes here.
    return x**2

with Pool() as p:
    flat_results = p.map(f, [x for row in matrix for x in row])

# If you don't need to preserve the x*y shape of the input matrix,
# you can use `flat_results` and skip the rest.
x = len(matrix)
y = len(matrix[0])
results = [flat_results[i*y:(i+1)*y] for i in range(x)]

# Now `results` contains `[[1, 4, 9], [16, 25, 36]]`
This will divide up the x * y computations across several processes (one per CPU core; this can be tweaked by providing an argument to Pool()).
Depending on what you're doing, consider trying vectorized operations (as opposed to for loops) with numpy first; you might find that it's fast enough to make multiprocessing unnecessary. (If matrix were a numpy array in this example, the code would just be results = matrix**2.)
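For the full x-by-y-by-z-by-z case, note that many numpy routines already operate on stacks of matrices, so if your per-submatrix computation maps onto one of them you may not need a loop (or multiprocessing) at all. A small sketch; the inverse and trace here are just stand-ins for whatever your computation is:

import numpy as np

x, y, z = 4, 5, 3
stack = np.random.rand(x, y, z, z) + z * np.eye(z)   # (x, y, z, z): well-conditioned submatrices

inverses = np.linalg.inv(stack)                      # inverts every z-by-z submatrix -> (x, y, z, z)
traces = np.trace(stack, axis1=-2, axis2=-1)         # per-submatrix trace -> (x, y)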
The way I approach parallel processing in python is to define a function to do what I want, then apply it in parallel using multiprocessing.Pool.starmap. It's hard to suggest any code for your problem without knowing what you are computing and how.
import multiprocessing as mp

def my_function(matrices_to_compare, matrix_of_matrices):
    m1 = matrices_to_compare[0]
    m2 = matrices_to_compare[1]
    result = m1 - m2  # or whatever you want to do
    return result

matrices_x = <list of x matrices>
matrices_y = <list of y matrices>
matrices_to_compare = list(zip(matrices_x, matrices_y))

with mp.Pool(mp.cpu_count()) as pool:
    results = pool.starmap(my_function,
                           [(x, matrix_of_matrices) for x in matrices_to_compare],
                           chunksize=1)
An alternative to the multiprocessing pool approach proposed by the other answers: if you have a GPU available, possibly the most straightforward approach is to use a tensor algebra package that takes advantage of it, such as cupy or torch.
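As a rough illustration (assuming cupy is installed and a CUDA GPU is available), cupy mirrors the numpy API, so a batched computation over the whole x-by-y stack of z-by-z matrices could look like this; the matrix product is just a placeholder for your actual computation:

import numpy as np
import cupy as cp

stack_cpu = np.random.rand(100, 100, 8, 8)   # x-by-y stack of z-by-z matrices

stack_gpu = cp.asarray(stack_cpu)            # copy the stack to the GPU
products = cp.matmul(stack_gpu, stack_gpu)   # operate on every submatrix at once
result = cp.asnumpy(products)                # copy the result back to the host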
You could also get some more speedup by JIT-compiling your code (for the CPU) with cython or numba (for GPU programming there is also numba.cuda, which however requires some background to use).
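For example, a numba-compiled parallel loop over the submatrices might look roughly like this (a sketch only; the per-submatrix sum is an arbitrary stand-in for your computation):

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def per_submatrix_sums(stack):
    # stack has shape (x, y, z, z); compute one scalar per z-by-z submatrix
    nx, ny = stack.shape[0], stack.shape[1]
    out = np.empty((nx, ny))
    for i in prange(nx):              # the outer loop is distributed across threads
        for j in range(ny):
            out[i, j] = stack[i, j].sum()
    return out

result = per_submatrix_sums(np.random.rand(10, 20, 5, 5))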
I was trying to evaluate a customized function over every point on an n-dimensional grid, after which I can marginalize and do corner plots.
This is conceptually simple but I'm struggling to find a way to do it efficiently. I tried a loop regardless, and it is indeed too slow, especially considering that I will be needing this for more parameters (a1, a2, a3...) as well. I was wondering if there is a faster way or any reliable package that could help?
EDIT: Sorry that my description of myfunction hasn't been very clear; the function takes some specific external data. Nevertheless, here is a sample function that demonstrates it:
import numpy as np
from scipy.ndimage import gaussian_filter
#This gaussian filter is needed to process my data
data = gaussian_filter(np.array([[1, 2], [3, 4]]), sigma = 1)
model1 = np.array([[0, 1], [2, 3]])
model2 = np.array([[2, 3], [4, 5]])
models = np.array([model1, model2])
(This is just a demonstration. The actual data and models are some 500x500-ish 2D arrays.)
and then
from scipy.special import gammaln

def myfunction(params):
    """
    Calculates the log of the Poisson likelihood of generating data
    given model params and fits.
    params: array-like, with number entries. For example,
    if params = np.array([a1, a2]), we generate a model of
    a1*model1 + a2*model2.
    """
    model_combined = np.sum(models * params[:, None, None], axis=0)
    # Unfortunately I need to process my combined model as well
    model_smeared = gaussian_filter(model_combined, sigma=1)
    # The following line calculates the log of the Poisson likelihood of each
    # pixel taking its value given the combined model as the expectation value,
    # taking advantage of numpy's element-wise calculations
    loglikelihood_array = data * np.log(model_combined) - model_combined - gammaln(data + 1)
    # Sum up the log-likelihoods
    loglikelihood_sum = np.sum(loglikelihood_array)
    return loglikelihood_sum
The function itself returns a result immediately, but not if I simply write a for-loop to evaluate it for, say, 100x100 pairs of parameters.
EDIT #2: I understand that the for-loop within my originally shown code (sorry for my sloppiness) was confusing (and thanks for the comments about the broadcasting simplification!), so I have edited it.
My real problem isn't with combining the models[i], but with the implementation of the entire function (again described by a very sloppy for-loop here), and loglikes is what I finally want:
a1_array = np.linspace(0, 2, 100)
a2_array = np.linspace(2, 4, 100)
loglikes = np.empty((100, 100))
for i in range(len(a1_array)):
    for j in range(len(a2_array)):
        loglikes[i, j] = myfunction(np.array([a1_array[i], a2_array[j]]))
I think there should be a better way of doing this than this for-loop, but I am unfortunately unaware of it. By more parameters I mean, say, adding an a3_array = np.linspace(3, 5, 100), in which case loglikes becomes a 3-dimensional array, and so on.
Thanks again so much for any feedback!
Vectorising that loop won't save you any time and in fact may make things worse.
Instead of looping through a1_array and a2_array to create pairs, you can generate all pairs from the start by putting them in a 100x100x2 array. This operation takes an insignificant amount of time compared to Python loops. But once you are inside the function and broadcasting your arrays so that the calculations can be done against data, you are suddenly dealing with 100x100x2x500x500 arrays. You won't have the memory for this, and if you rely on swapping to disk the operations become dramatically slower.
Not only do you not save any time (well, you do for very small arrays, but it's the difference between 0.03 s and 0.005 s), but while the Python loops only use a few tens of MB of RAM, the vectorised approach skyrockets into the GB.
But out of curiosity, this is how it could be vectorised.
import itertools

def vectorised(params):
    model_combined = np.sum(params[..., None, None] * models, axis=2)
    model_smeared = gaussian_filter(model_combined, sigma=1)
    log_array = data * np.log(model_combined) - model_combined - gammaln(data + 1)
    return np.sum(log_array, axis=(-2, -1))

np.random.seed(0)
M = 500
N = 100
# data must have the same shape as the models for the broadcast above to work
data = gaussian_filter(np.random.randint(0, 1000, (M, M)), sigma=1)
models = np.random.randint(1, 1000, (2, M, M))

a1 = np.linspace(0, 2, N)
a2 = np.linspace(2, 4, N)
a = np.array(list(itertools.product(a1, a2))).reshape((N, N, 2))

log_sum = vectorised(a)
As for benchmarks, the bottom line is that a Python loop run 10000 times just to fetch some array elements takes about 0.001 s in total, which is insignificant next to your function taking about 0.01 s per call.
In the context of functional programming, a function that takes and returns a scalar can be mapped onto lists/vectors to return a list/vector of the mapped values. In regular Python, I would do this either functionally or in NumPy's vectorized fashion:
import numpy as np # NumPy import
xs = [1,2,3,4,5] # List of inputs
f = lambda x: x**2 # Some function, which could be the equivalent of an ExplicitComponent
list(map(f, xs)) # Functional style
[ f(x) for x in xs ] # List comprehension
f(np.array(xs)) # NumPy vectorized style
Is this type of behaviour attainable using Components? By this I mean a Component could take scalar inputs and perform like a normal function, but automatically performs the operations on vectors and lists of varying length as well.
I haven't been able to find anything similar in the documentation. I understand that most of the behaviour in OpenMDAO uses NumPy's vectorization for efficiency, but does this mean all components that could have vector inputs must be written using some kind of self.options.declare('num_nodes', default=1) method and passing a global state n for the number of nodes for lists/vectors of length/dimension n across all Components?
Regarding design considerations, I understand that NumPy's broadcasting is zip-like rather than taking Cartesian products of the input vectors. But it does behave like a partially applied mapped function when only a single NumPy array is passed, e.g.:
>>> xs = [1,2,3,4,5]
>>> f = lambda x,y: x**2 + y**2
>>> f(np.array(xs), np.array([2,4,6,8]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <lambda>
ValueError: operands could not be broadcast together with shapes (5,) (4,)
>>> f(np.array(xs), np.array([2,4,6,8,10]))
array([ 5, 20, 45, 80, 125])
>>> f(np.array(xs), 1)
array([ 2, 5, 10, 17, 26])
An alternative is to use NumPy's meshgrid() as follows:
xs, ys = [1,2,3,4,5], [2,4,6,8]
f = lambda x, y: x**2 + y**2
xys = np.meshgrid(xs, ys)
f(xys[0], xys[1])
So would something like this be more feasible (and desirable!) behaviour for Components?
As of OpenMDAO V3.1 the only way to do this in OpenMDAO is via an option/argument such as num_nodes or vec_size.
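A minimal sketch of the num_nodes pattern (the component and variable names here are purely illustrative): the component declares a num_nodes option, sizes its inputs and outputs from it, and the scalar-like compute then works on vectors of whatever length was declared.

import numpy as np
import openmdao.api as om

class SquareComp(om.ExplicitComponent):
    def initialize(self):
        self.options.declare('num_nodes', types=int, default=1)

    def setup(self):
        n = self.options['num_nodes']
        self.add_input('x', shape=(n,))
        self.add_output('y', shape=(n,))
        # the Jacobian is diagonal, so declare only the nonzero entries
        self.declare_partials('y', 'x', rows=np.arange(n), cols=np.arange(n))

    def compute(self, inputs, outputs):
        outputs['y'] = inputs['x'] ** 2

    def compute_partials(self, inputs, partials):
        partials['y', 'x'] = 2.0 * inputs['x']

prob = om.Problem()
prob.model.add_subsystem('sq', SquareComp(num_nodes=5))
prob.setup()
prob['sq.x'] = np.arange(5.0)
prob.run_model()
print(prob['sq.y'])   # [ 0.  1.  4.  9. 16.]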
Other users have expressed an interest in allowing dynamically sized IO in components, for instance basing the size of an input on the output to which it is connected. See https://github.com/OpenMDAO/POEMs/pull/51.
We're working on it, but we don't currently have a timetable for when we'll have an acceptable solution.
I have multiple arrays of the same length, or rather a matrix, say
data.shape
# (n, m)
I want to interpolate along the m-axis and leave the n-axis alone. Ideally I would get a function which I can call with an x-array of length n.
interpolated(x)
x.shape
# (n,)
I tried
from scipy import interpolate
interpolated = interpolate.interp1d(x=x_points, y=data)
interpolated(x).shape
# (n, n)
but this evaluates every one of the n arrays at every given point (hence the (n, n) result). Is there a better way to do it than ugly loops like
interpolated = np.array([interpolate.interp1d(x=x_points, y=array_)
                         for array_ in data])
np.array([func_(xi) for func_, xi in zip(interpolated, x)])
Your (n,m)-shaped data is, as you said, a collection of n datasets, each of length m. You're trying to pass this an n-length x array, and you expect to obtain an n-length result. That is, you're querying the n independent datasets at n unrelated points.
This makes me believe that you need to use n independent interpolators. There is no real benefit in trying to get away with a single call to an interpolation routine. Interpolation routines as far as I know assume that the target of the interpolation is a single object. Either a multivariate function, or a function that has an array-shaped value; in either case you can query the function one (optionally higher-dimensional) point at a time. For instance, multilinear interpolation works across rows of the input, so there's (again, as far as I know) no way to "interpolate linearly along an axis". In your case, there is absolutely no relationship between the rows of your data, and there's no relationship between query points, so it's also semantically motivated to use n independent interpolators for your problem.
As for convenience, you can shove all those interpolating functions into a single function for ease of use:
interpolated = [interpolate.interp1d(x=x_points, y=array_)
                for array_ in data]

def common_interpolator(x):
    '''interpolate n separate datasets at n separate input points'''
    return np.array([fun(xx) for fun, xx in zip(interpolated, x)])
This will allow you to use a single call to common_interpolator with an input array_like of length n.
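A small self-contained demonstration of that idea, with made-up x_points and data just to show the shapes involved:

import numpy as np
from scipy import interpolate

n, m = 4, 10
x_points = np.linspace(0.0, 1.0, m)              # common sample locations, length m
data = np.random.rand(n, m)                      # n independent datasets of length m

interpolated = [interpolate.interp1d(x=x_points, y=row) for row in data]

def common_interpolator(x):
    '''interpolate n separate datasets at n separate input points'''
    return np.array([fun(xx) for fun, xx in zip(interpolated, x)])

x = np.random.rand(n)                            # one query point per dataset, length n
print(common_interpolator(x).shape)              # (4,)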
But since you mentioned it in the comments, you can actually make use of np.vectorize if you want to apply multiple sets of query points to this function. Here's a complete example with three trivial dummy functions:
import numpy as np

# three scalar (well, or vectorized) functions:
funs = [lambda x, i=i: x + i for i in range(3)]

# define a wrapper for calling them together
def allfuns(xs):
    '''bundled call to functions: n-length input to n-length output'''
    return np.array([fun(x) for fun, x in zip(funs, xs)])

# define a vectorized version of the wrapper, (...,n) to (...,n)-shape
allfuns_vector = np.vectorize(allfuns, signature='(n)->(n)')

# print some examples
x = np.arange(3)
print([fun(xx) for fun, xx in zip(funs, x)])
# [0, 2, 4]
print(allfuns(x))
# [0 2 4]
print(allfuns_vector(x))
# [0 2 4]
print(allfuns_vector([x, x + 10]))
# [[ 0  2  4]
#  [10 12 14]]
As you can see, all of the above work the same way for a 1d input array. But we can pass a (k,n)-shaped array to the vectorized version, and it will perform the interpolation row-wise, that is, each length-n row is fed to the original interpolator bundle. As far as I know np.vectorize is essentially a wrapper for a for loop, so don't expect a speedup from it, but at least it makes calling your functions more convenient.
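Applied to the interpolator bundle from above, the same trick would look something like this (a sketch; it assumes common_interpolator and n from the earlier snippet, and that the query points lie inside the range of x_points):

# vectorized bundle: accepts (..., n)-shaped query points, one column per dataset
common_interpolator_v = np.vectorize(common_interpolator, signature='(n)->(n)')

xs = np.random.rand(5, n)                 # 5 sets of n query points
print(common_interpolator_v(xs).shape)    # (5, n)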
I would like to apply one function to the corresponding elements of two ndarrays at once, without using a for loop. Let's say I have the following two ndarrays x and y and a function foo that takes in two 1d arrays and computes the beta.
The end result I want is to compute beta00 = foo(x[0, 0], y[0]), beta01 = foo(x[0, 1], y[1]), beta10 = foo(x[1, 0], y[0]) and beta11 = foo(x[1, 1], y[1]), yielding the expected result of
[[beta00, beta01],
[beta10, beta11]]
I have been looking into the vectorize and apply functions, but still don't have a solution. Could someone help me with this? Many thanks in advance.
import numpy as np

x = np.array([[[0, 1, 2, 3], [0, 1, 2, 3]],
              [[2, 3, 4, 5], [2, 3, 4, 5]]])
y = np.array([[-1, 0.2, 0.9, 2.1], [-1, 0.2, 0.9, 2.1]])

def foo(x, y):
    A = np.vstack([x, np.ones(x.shape)]).T
    return np.linalg.lstsq(A, y, rcond=None)[0][0]
So you want
beta[i,j] = foo(x[i,j,:], y[j,:])
where foo takes two 1d arrays and returns a scalar. The explicit : makes it clear that we are working with 3d and 2d arrays.
np.vectorize will not help here because, by default, the function it wraps must accept scalars, not arrays. And it is not a speed solution either; it is just a convenient way of enabling broadcasting and handling inputs with a variety of dimensions.
There are looping wrappers like np.apply_along_axis (and np.apply_over_axes), but they are still Python-level loops. The key to any real speedup will be reworking foo so it operates on 2d or 3d arrays, not just 1d ones. But that may be more work than it's worth, or even impossible.
So for reference, any alternative must match:
beta = np.zeros(x.shape[:2])
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        beta[i, j] = foo(x[i, j, :], y[j, :])
An alternative way of generating the multidimensional indexes is:
for i, j in np.ndindex(x.shape[:2]):
    beta[i, j] = foo(x[i, j, :], y[j, :])
but it's not a time saver.
Look into whether foo can be written to accept a 2d y,
foo(x[i,j,:], y[None,j,:])
aiming eventually to be able to do:
beta = foo1(x, y[None,:])
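For reference, here is a sketch of what such a fully array-aware version could look like, assuming (as in the posted foo) that only the slope coefficient is needed; the simple-regression slope can then be computed with broadcasting instead of np.linalg.lstsq (foo1_slopes is just an illustrative name):

def foo1_slopes(x, y):
    # x: (n, m, k), y: (m, k); returns the (n, m) array of fitted slopes
    xc = x - x.mean(axis=-1, keepdims=True)    # centre each length-k vector
    yc = y - y.mean(axis=-1, keepdims=True)
    return (xc * yc).sum(axis=-1) / (xc ** 2).sum(axis=-1)

beta = foo1_slopes(x, y)    # matches the element-wise double loop above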