I have a function foo(x,y) that takes two scalars (or lists of scalars) and returns a scalar output (or a list of scalars computed pairwise from the inputs). I want to evaluate this function over two orthogonal arrays so that the output is a matrix whose (i, j) entry is foo(x[i], y[j]).
I have a for-loop version that solves this problem as below:
import numpy as np

x = np.arange(50)  # Could be linspaces, whatever the axis in the vector space is
y = np.arange(50)
mat = np.zeros((len(x), len(y)))  # To hold the result for plotting

for i in range(len(x)):
    for j in range(len(y)):
        mat[i][j] = foo(x[i], y[j])
where my result is stored in mat. However, this is dreadfully slow, and looks to me as if it could easily be vectorized. I'm not aware of how Python solves this problem however, as this doesn't appear to be something like zip or map. Is there another such function or concept (beyond trivially making extremely long arrays of the same array rotated by a value and passing them that way) that could vectorize this successfully? Or is the nature of the foo function limiting the ability to vectorize this?
In this case, itertools.product is the tool you want. It generates an iterable sequence of elements from the Cartesian product of N inputs, which you can use to discretely map a vector space. You can then evaluate foo on these pairs. This isn't vectorization per se, but it does collapse the nested for loops into a single loop.
See docs at https://docs.python.org/3/library/itertools.html#itertools.product
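For example, a minimal sketch (with foo as a placeholder standing in for your real pairwise function):

import itertools
import numpy as np

def foo(a, b):          # placeholder for the real pairwise function
    return a * b

x = np.arange(50)
y = np.arange(50)
mat = np.zeros((len(x), len(y)))

# itertools.product yields every (i, j) index pair of the Cartesian product,
# so the two nested loops collapse into one.
for i, j in itertools.product(range(len(x)), range(len(y))):
    mat[i, j] = foo(x[i], y[j])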
I am trying to get rid of the for loop and instead do an array-matrix multiplication to decrease the processing time when the weights array is very large:
import numpy as np
sequence = [np.random.random(10), np.random.random(10), np.random.random(10)]
weights = np.array([[0.1,0.3,0.6],[0.5,0.2,0.3],[0.1,0.8,0.1]])
Cov_matrix = np.matrix(np.cov(sequence))
results = []
for w in weights:
    result = np.matrix(w) * Cov_matrix * np.matrix(w).T
    results.append(result.A)
Where:
Cov_matrix is a 3x3 matrix
weights is an array of length n containing n 1x3 row vectors.
Is there a way to multiply/map weights to Cov_matrix and bypass the for loop? I am not very familiar with all the numpy functions.
I'd like to reiterate what's already been said in another answer: the np.matrix class has far more disadvantages than advantages these days, and I suggest moving to the use of the np.array class alone. Matrix multiplication of arrays can easily be written using the @ operator, so the notation is in most cases as elegant as for the matrix class (and arrays don't have several restrictions that matrices do).
With that out of the way, what you need can be done with a single call to np.einsum. We need to contract certain indices of three matrices while keeping one index free in two of them. That is, we want to compute w_{ij} * Cov_{jk} * (w^T)_{ki} with a summation over j and k, giving us an array indexed by i. The following call to einsum will do:
res = np.einsum('ij,jk,ik->i', weights, Cov_matrix, weights)
Note that the above will give you a single 1d array, whereas you originally had a list of arrays with shape (1,1). I suspect the above result will even make more sense. Also, note that I omitted the transpose in the second weights argument, and this is why the corresponding summation indices appear as ik rather than ki. This should be marginally faster.
To prove that the above gives the same result:
In [8]: results # original
Out[8]: [array([[0.02803215]]), array([[0.02280609]]), array([[0.0318784]])]
In [9]: res # einsum
Out[9]: array([0.02803215, 0.02280609, 0.0318784 ])
The same can be achieved by working with the weights as a matrix and then looking at the diagonal elements of the result. Namely:
np.diag(weights.dot(Cov_matrix).dot(weights.transpose()))
which gives:
array([0.03553664, 0.02394509, 0.03765553])
This does more calculations than necessary (calculates off-diagonals) so maybe someone will suggest a more efficient method.
Note: I'd suggest slowly moving away from np.matrix and instead work with np.array. It takes a bit of getting used to not being able to do A*b but will pay dividends in the long run. Here is a related discussion.
I have a question about how to efficiently apply a function which takes an m-dimensional slice of an n-dimensional array as input.
For example, I have an n-dimensional array of shape (i,j,k,l). For every pair of indices along the dimensions (j,l), I want to apply the function to the 2-d slice over the remaining dimensions, which gives me back a matrix of the same shape as that slice. The resulting numpy array should again have the shape (i,j,k,l).
For example, I want to apply the following normalisation function
def norm(arr2d):
    return arr2d - np.mean(arr2d)
over the array
arrnd = np.arange(2*3*4*5).reshape(2,3,4,5) # Shape is (2,3,4,5)
to each 2-d slice arrnd[:,j,:,l].
I can get the result I want via a (slow?) Python list comprehension and moving axes.
result = np.asarray([[norm(arrnd[:, j, :, l]) for l in range(5)] for j in range(3)])  # Shape is (3,5,2,4)
result = np.moveaxis(np.moveaxis(result, 2, 0), 2, 3)  # Shape is (2,3,4,5) again
Is there any better, more "numpyic" way to achieve this, without any involved loops?
I already looked at np.apply_along_axis() and np.apply_over_axes(), but the former only works for 1-d functions, and the latter might only work if my function is implemented as a ufunc.
The example I provided is just a toy example. The solution should work for any python function.
((If normalising a slice were my specific problem, I could have circumvented the python loop and moveaxis by using the ufunc's axes=(..) argument, as sketched below.))
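Spelling out that aside for the toy normalisation only (this covers just this specific function, not an arbitrary python function):

import numpy as np

arrnd = np.arange(2*3*4*5).reshape(2, 3, 4, 5)

# For every (j, l), subtract the mean of the corresponding (i, k)-slice;
# keepdims=True keeps the mean at shape (1, 3, 1, 5) so it broadcasts back to (2, 3, 4, 5).
result = arrnd - arrnd.mean(axis=(0, 2), keepdims=True)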
I have multiple arrays of the same dimension, or rather a matrix say
data.shape
# (n, m)
I want to interpolate along the m-axis and leave the n-axis alone. Ideally I would get a function which I can call with an x-array of length n.
interpolated(x)
x.shape
# (n,)
I tried
from scipy import interpolate
interpolated = interpolate.interp1d(x=x_points, y=data)
interpolated(x).shape
# (n, n)
but this evaluates every array at the given point. Is there a better way to do it than ugly loops like
interpolated = [interpolate.interp1d(x=x_points, y=array_) for array_ in data]
np.array([func_(xi) for func_, xi in zip(interpolated, x)])
Your (n,m)-shaped data is, as you said, a collection of n datasets, each of length m. You're trying to pass it an n-length x array, and expect to obtain an n-length result. That is, you're querying the n independent datasets at n unrelated points.
This makes me believe that you need to use n independent interpolators. There is no real benefit in trying to get away with a single call to an interpolation routine. Interpolation routines as far as I know assume that the target of the interpolation is a single object. Either a multivariate function, or a function that has an array-shaped value; in either case you can query the function one (optionally higher-dimensional) point at a time. For instance, multilinear interpolation works across rows of the input, so there's (again, as far as I know) no way to "interpolate linearly along an axis". In your case, there is absolutely no relationship between the rows of your data, and there's no relationship between query points, so it's also semantically motivated to use n independent interpolators for your problem.
As for convenience, you can shove all those interpolating functions into a single function for ease of use:
interpolated = [interpolate.interp1d(x=x_points, y=array_) for array_ in data]

def common_interpolator(x):
    '''interpolate n separate datasets at n separate input points'''
    return np.array([fun(xx) for fun, xx in zip(interpolated, x)])
This will allow you to use a single call to common_interpolator with an input array_like of length n.
But since you mentioned it in comments, you can actually make use of np.vectorize if you want to pass multiple sets of query points to this function. Here's a complete example with three trivial dummy functions:
import numpy as np
# three scalar (well, or vectorized) functions:
funs = [lambda x,i=i: x+i for i in range(3)]
# define a wrapper for calling them together
def allfuns(xs):
    '''bundled call to functions: n-length input to n-length output'''
    return np.array([fun(x) for fun, x in zip(funs, xs)])
# define a vectorized version of the wrapper, (...,n) to (...,n)-shape
allfuns_vector = np.vectorize(allfuns,signature='(n)->(n)')
# print some examples
x = np.arange(3)
print([fun(xx) for fun,xx in zip(funs,x)])
# [0, 2, 4]
print(allfuns(x))
# [0 2 4]
print(allfuns_vector(x))
# [0 2 4]
print(allfuns_vector([x,x+10]))
#[[ 0 2 4]
# [10 12 14]]
As you can see, all of the above work the same way for a 1d input array. But we can pass a (k,n)-shaped array to the vectorized version, and it will apply the bundle row-wise, that is, each length-n row will be fed to the original bundle of functions (the interpolators, in your case). As far as I know np.vectorize is essentially a wrapper for a for loop, but at least it makes calling your functions more convenient.
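Applied back to the interpolation problem, a sketch along these lines (with dummy data standing in for your x_points and data) might look like:

import numpy as np
from scipy import interpolate

# dummy stand-ins for the question's x_points (length m) and data (shape (n, m))
x_points = np.linspace(0.0, 1.0, 7)
data = np.random.rand(4, 7)

# one independent interpolator per row of data
interpolators = [interpolate.interp1d(x=x_points, y=row) for row in data]

def bundle(xs):
    '''evaluate the i-th interpolator at the i-th query point'''
    return np.array([fun(xx) for fun, xx in zip(interpolators, xs)])

# (..., n) -> (..., n): each length-n row of the input is one set of query points
bundle_vec = np.vectorize(bundle, signature='(n)->(n)')

queries = np.random.rand(5, 4)      # 5 sets of n=4 query points, all inside [0, 1]
print(bundle_vec(queries).shape)    # (5, 4)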
I am trying to vectorize a function that takes as its input a 3-component vector "x" and a 3x3 "matrix" and produces a scalar:
def myfunc(x, matrix):
    return np.dot(x, np.dot(matrix, x))
However this needs to be called "n" times, and the vector x has different components each time. I would like to modify this function such that it takes as input a 3xn array (the columns of which are the vectors x) and produces a vector whose components are the scalars that would have been computed at each iteration.
I can write down an Einstein summation that does the job but it requires that I construct a 3x3xn stack of "copies" of the original 3x3. I am concerned that doing this will blow away any performance gains I get from trying to do this. Is there any way to compute the vector I want without making copies of the 3x3?
Let x be the 3xN array and y be the 3x3 array. You're looking for
z = numpy.einsum('ji,jk,ki->i', x, y, x)
You also could have built that 3x3xN array you were talking about as a view of y to avoid copying, but it isn't necessary.
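As a quick sanity check of that expression against the original per-column computation (a sketch with random data, assuming x is the 3xN stack and y is the 3x3 matrix):

import numpy as np

rng = np.random.default_rng(0)
x = rng.random((3, 5))    # 3xN stack of column vectors (N = 5 here)
y = rng.random((3, 3))    # the 3x3 matrix

# z_i = sum_{j,k} x_{ji} * y_{jk} * x_{ki}
z = np.einsum('ji,jk,ki->i', x, y, x)

# the same result, one column at a time
z_loop = np.array([x[:, i] @ y @ x[:, i] for i in range(x.shape[1])])
print(np.allclose(z, z_loop))   # True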
I am working on an image processing problem where I have code that looks like this (the code written below just illustrates the type of problem I want to solve):
import random
import numpy as np

Z = [[None for j in range(10)] for i in range(10)]  # some 10x10 container for the per-pixel vectors

for i in range(0, 10):
    for j in range(0, 10):
        number_length = round(random.random() * 10)
        a = np.zeros(number_length)
        Z[i][j] = a
What I want to do is create some sort of 2D list or np.array (not really sure) where I essentially index a term for every pixel in an image and store a vector/list of values for that pixel, whose length I cannot anticipate; moreover, the length of the vector differs from pixel to pixel. What is the best way to go about this?
In my MATLAB code the workaround is simple: I define a 2D cell and just assign any vector to any element in the 2D cell. Since cells do not require every indexed vector to have the same length, this works well. What is the equivalent optimal solution to handle this in python?
Ideally the solution should not involve anticipating the maximum length of "a" for any pixel and making all indexed vectors the same length (since this implies some sort of zero padding that will consume memory if the indexed vectors are high dimensional and these high dimensional vectors are sparse throughout the image).
A NumPy array won't work because it requires fixed dimensions. You can use a 2d list (i.e. list of lists), where each element can be an array of arbitrary length. This is analogous to your setup in Matlab, using a 2d cell array of vectors.
Try this:
z = [[np.zeros(np.random.randint(10)+1) for j in range(10)] for i in range(10)]
This creates a 10x10 list, where z[i][j] is a NumPy array of zeros with random length (from 1 to 10).
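To see that the lengths really do vary per "pixel", a quick check (redefining z as above):

import numpy as np

z = [[np.zeros(np.random.randint(10) + 1) for j in range(10)] for i in range(10)]

print(len(z), len(z[0]))         # 10 10
print([len(v) for v in z[0]])    # per-pixel lengths, each between 1 and 10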
Edit (nested loops requested in comment):
z = [[None for j in range(10)] for i in range(10)]

for i in range(len(z)):
    for j in range(len(z[i])):
        z[i][j] = np.zeros(np.random.randint(10) + 1)