I have two arrays A,B and want to take the outer product on their last dimension,
e.g.
result[:,i,j]=A[:,i]*B[:,j]
when A,B are 2-dimensional.
How can I do this if I don't know whether they will be 2 or 3 dimensional?
In my specific problem, A and B are slices out of a bigger 3-dimensional array Z.
Sometimes this may be called with integer indices, A=Z[:,1,:], B=Z[:,2,:], and other times
with slices, A=Z[:,1:3,:], B=Z[:,4:6,:].
Since integer indexing "squeezes" out the singleton dimension, I won't know in advance what
dimensions my inputs will have.
The array-outer-product I'm trying to define should satisfy
array_outer_product( Y[a,b,:], Z[i,j,:] ) == scipy.outer( Y[a,b,:], Z[i,j,:] )
array_outer_product( Y[a:a+N,b,:], Z[i:i+N,j,:])[n,:,:] == scipy.outer( Y[a+n,b,:], Z[i+n,j,:] )
array_outer_product( Y[a:a+N,b:b+M,:], Z[i:i+N, j:j+M,:] )[n,m,:,:]==scipy.outer( Y[a+n,b+m,:] , Z[i+n,j+m,:] )
for any rank-3 arrays Y,Z and integers a,b,...i,j,k...n,N,...
The kind of problem I'm dealing with involves a 2-D spatial grid, with a vector-valued function at each grid point. I want to be able to calculate the covariance matrix (outer product) of these vectors, over regions defined by slices in the first two axes.
You may have some luck with einsum:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html
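For instance, a minimal sketch of the ellipsis-subscript form with made-up test shapes (my own example, not taken from the docs page):

import numpy as np

Y = np.random.random((4, 3, 5))
Z = np.random.random((4, 3, 5))

# '...i,...j->...ij' forms the outer product over the last axes and leaves
# any leading axes alone, so 2-D and 3-D slices work the same way.
full = np.einsum('...i,...j->...ij', Y[:, 0:2, :], Z[:, 1:3, :])   # shape (4, 2, 5, 5)
single = np.einsum('...i,...j->...ij', Y[:, 0, :], Z[:, 1, :])     # shape (4, 5, 5)

# Agrees with the plain outer product element by element:
assert np.allclose(full[2, 1], np.outer(Y[2, 1, :], Z[2, 2, :]))
assert np.allclose(single[2], np.outer(Y[2, 0, :], Z[2, 1, :]))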
After discovering the use of ellipsis in numpy/scipy arrays
I ended up implementing it as a recursive function:
import scipy

def array_outer_product(A, B, result=None):
    ''' Compute the outer-product in the final two dimensions of the given arrays.
    If the result array is provided, the results are written into it.
    '''
    assert A.shape[:-1] == B.shape[:-1]
    if result is None:
        result = scipy.zeros(A.shape + B.shape[-1:], dtype=A.dtype)
    if A.ndim == 1:
        result[:, :] = scipy.outer(A, B)
    else:
        for idx in xrange(A.shape[0]):
            array_outer_product(A[idx, ...], B[idx, ...], result[idx, ...])
    return result
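A quick sanity check of this function against the identities from the question (my own test data; assumes numpy is imported as np alongside scipy):

Y = np.random.random((4, 3, 5))
Z = np.random.random((4, 3, 5))

r2 = array_outer_product(Y[:, 1, :], Z[:, 2, :])      # 2-D inputs -> result shape (4, 5, 5)
r3 = array_outer_product(Y[:, 1:3, :], Z[:, 2:4, :])  # 3-D inputs -> result shape (4, 2, 5, 5)

assert np.allclose(r2[0], np.outer(Y[0, 1, :], Z[0, 2, :]))
assert np.allclose(r3[0, 1], np.outer(Y[0, 2, :], Z[0, 3, :]))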
Assuming I've understood you correctly, I encountered a similar issue in my research a couple weeks ago. I realized that the Kronecker product is simply an outer product which preserves dimensionality. Thus, you could do something like this:
import numpy as np
# Generate some data
a = np.random.random((3,2,4))
b = np.random.random((2,5))
# Now compute the Kronecker product
c = np.kron(a,b)
# Check the shape
np.prod(c.shape) == np.prod(a.shape)*np.prod(b.shape)
I'm not sure what shape you want at the end, but you could use array slicing in combination with np.rollaxis, np.reshape, np.ravel (etc.) to shuffle things around as you wish. I guess the downside of this is that it does some extra calculations. This may or may not matter, depending on your limitations.
In my current project, I have a D-dimensional array. For the sake of exposition, we can assume D=2, but the code should work with arbitrarily high dimensions. I need to run some operations on this matrix when it is sorted according to its last dimension, and subsequently reverse the sorting on the matrix.
The first part of sorting the matrix is relatively simple:
import numpy as np
D = 2
matrix = np.random.uniform(low=0.,high=1.,size=tuple([5]*D))
matrix_sorted = np.sort(matrix,axis=-1)
This code snippet sorts the matrix according to the last dimension, but does not remember how the array was sorted, and consequently does not allow me to revert the sorting. Alternatively, I could get the sorted indices with the following line:
sorted_indices = np.argsort(matrix,axis=-1)
Unfortunately, these indices do not seem very useful on their own. I am not sure how I can use them to (a) sort the matrix and (b) undo the sorting for general D. A simple approach would be a for-loop over all rows for the D=2 case (where we sorted across the columns), but since I want the code to work for arbitrary dimensions, hard-coding nested for-loops is not really an option.
Do you have any elegant suggestions on how I could tackle this issue?
So yes, after following Mark M's suggestion and reading up on some other StackOverflow answers from there, the answer seems to be as follows:
import numpy as np
# Create the initial random matrix
D = 2
matrix = np.random.uniform(low=0.,high=1.,size=tuple([5]*D))
# Get the sorting indices
sorting = np.argsort(matrix,axis=-1)
# Get the indices for unsorting the matrix
reverse_sorting = np.argsort(sorting,axis=-1)
# Sort the initial matrix
matrix_sorted = np.take_along_axis(matrix, sorting, axis=-1)
# Undo the sorting
matrix_unsorted = np.take_along_axis(matrix_sorted, reverse_sorting, axis=-1)
The trick consists of two steps: np.take_along_axis allows us to sort arbitrary-dimensional matrices according to the indices we get from np.argsort, and applying np.argsort to those sorting indices in turn gives us the set of indices required to undo the sorting, again with np.take_along_axis. I can do the desired complex operations between the penultimate and final steps. Perfect!
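As a quick check (my own addition), the round trip really does recover the original matrix:

# The unsorted matrix matches the original, and the sorted one matches np.sort.
assert np.allclose(matrix, matrix_unsorted)
assert np.allclose(matrix_sorted, np.sort(matrix, axis=-1))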
Given a matrix of functions (each function can have different logic) with shape [N, x]:
matrix_of_functions = [
[fun11, fun12, fun13],
[fun21, fun22, fun23],
...
[funN1, funN2, funN3]
]
There is also an array of parameters with shape [x].
array_of_parameters = [param1, param2, param3]
Its single dimension has the same size x as the last dimension of the function matrix. The parameters should broadcast across the rows, so that each function is applied to the parameter in its column.
The resulting matrix therefore is (should be a numpy array in the end):
matrix_of_results = [
[fun11(param1), fun12(param2), fun13(param3)],
[fun21(param1), fun22(param2), fun23(param3)],
...
[funN1(param1), funN2(param2), funN3(param3)]
]
It feels like there has to be a beautiful way to do this, but how?
Obviously, this could be done like this:
matrix_of_results = []
for array_of_functions in matrix_of_functions:
    array_of_results = [fun(param) for fun, param in zip(array_of_functions, array_of_parameters)]
    matrix_of_results.append(array_of_results)
Or more compact:
matrix_of_results = [
    [fun(param) for fun, param in zip(array_of_functions, array_of_parameters)]
    for array_of_functions in matrix_of_functions
]
Or many other ways... but that is neither readable nor beautiful.
I did hope that there is a numpy-ish way, meaning that the broadcasting is handled automatically since the shapes [N, x] and [x] are broadcastable. But that doesn't seem to be the case (np.vectorize only takes a single function, not a list or matrix of functions).
@mephisto pointed me to another question with a similar goal: "Numpy: Apply an array of functions to a same length 2d-array of value as if multiplying elementwise? (using a python function as an operator?)"
The difference is that the other question was about broadcasting an array of functions to a matrix of parameters.
The good part is that the answer also applies when mapping an array of parameters to a matrix of functions.
The solution was np.vectorize after all:
apply_vectorized = np.vectorize(lambda f, x: f(x))
matrix_of_results = apply_vectorized(matrix_of_functions, array_of_parameters)
The apply_vectorized method is exactly what I was looking for.
I did hope that this functionality would come out of the box and that I would not have to implement a helper function, but I can live with that.
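For reference, a tiny self-contained example of the pattern (the functions here are arbitrary placeholders of my own, not from the question):

import numpy as np

matrix_of_functions = np.array([
    [np.sin, np.cos, np.exp],
    [np.tanh, np.sqrt, np.log1p],
])                                                 # shape (2, 3), dtype=object
array_of_parameters = np.array([0.1, 0.2, 0.3])    # shape (3,)

apply_vectorized = np.vectorize(lambda f, x: f(x))
matrix_of_results = apply_vectorized(matrix_of_functions, array_of_parameters)

# matrix_of_results[i, j] == matrix_of_functions[i, j](array_of_parameters[j])
print(matrix_of_results.shape)   # (2, 3)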
Assume we have two arrays of shape m x 6 and n x 6:
import numpy as np
a = np.random.random((m, 6))
b = np.random.random((n, 6))
Using np.inner works as expected and yields
np.inner(a,b).shape
(m,n)
with every element being the scalar product of each combination of rows. I now want to compute a special inner product (namely the Plucker product). Right now I'm using
def pluckerSide(a, b):
    a0, a1, a2, a3, a4, a5 = a
    b0, b1, b2, b3, b4, b5 = b
    return a0*b4 + a1*b5 + a2*b3 + a4*b0 + a5*b1 + a3*b2
with a and b sliced row by row in a for loop, which is way too slow. All my attempts at vectorizing fail, mostly with broadcast errors due to wrong shapes, and I can't get np.vectorize to work either.
Maybe someone can help here?
The function pluckerSide pairs up elements of the two input arrays according to a fixed index pattern, multiplies them pairwise, and sums the products. So, I would list out those indices, index into the arrays with them, and finally use matrix multiplication with np.dot to perform the sum-reduction.
Thus, one approach would be like this -
a_idx = np.array([0,1,2,4,5,3])
b_idx = np.array([4,5,3,0,1,2])
out = a[a_idx].dot(b[b_idx])
If you are doing this in a loop across all rows of a and b and thus generating an output array of shape (m,n), we can vectorize that, like so -
out_all = a[:,a_idx].dot(b[:,b_idx].T)
To make things a bit easier, we can re-arrange a_idx such that it becomes range(6) and re-arrange b_idx accordingly. So, we would have:
a_idx = np.array([0,1,2,3,4,5])
b_idx = np.array([4,5,3,2,0,1])
Thus, we can skip indexing into a and the solution would be simply -
a.dot(b[:,b_idx].T)
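To be safe, here's a quick check (with my own random test data) that the vectorized form matches the original loop:

import numpy as np

m, n = 4, 3                       # small example sizes
a = np.random.random((m, 6))
b = np.random.random((n, 6))

b_idx = np.array([4, 5, 3, 2, 0, 1])
out_all = a.dot(b[:, b_idx].T)    # shape (m, n)

# Reference result from the original pluckerSide, row pair by row pair.
expected = np.array([[pluckerSide(ai, bj) for bj in b] for ai in a])
assert np.allclose(out_all, expected)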
For an assignment I have to use different combinations of features belonging to some data, to evaluate a classification system. By features I mean measurements, e.g. height, weight, age, income. So for instance I want to see how well a classifier performs when given just the height and weight to work with, and then the height and age say. I not only want to be able to test what two features work best together, but also what 3 features work best together and would like to be able to generalise this to n features.
I've been attempting this using numpy's mgrid to create n-dimensional arrays, flattening them, and then making arrays that use the same elements from each array to create new ones. It's tricky to explain, so here is some code and pseudocode:
import numpy as np

def test_feature_combos(data, combinations):
    dimensions = combinations.shape[0]
    grid = np.empty(dimensions)
    for i in xrange(dimensions):
        grid[i] = combinations[i].flatten()
    # The above code throws a "setting an array element with a sequence" error,
    # which I understand, but it shows my approach.

    # **Pseudo code begin**
    # For each element of each element of this new array, create a new array like so:
    #     [[1,1,2,2],[1,2,1,2]] ---> [[1,1],[1,2],[2,1],[2,2]]
    # Call this new array combo_indices.
    # Then choose the columns (features) from the data in a loop using:
    #     new_data = data[:, combo_indices[j]]

combinations = np.mgrid[1:5, 1:5]
test_feature_combos(data, combinations)
I concede that this approach means a lot of unnecessary combinations due to repeats; however, I cannot even implement this, so beggars can't be choosers.
Please can someone advise me on how I can either a) implement my approach or b) achieve this goal in a much more elegant way.
Thanks in advance, and let me know if any clarification needs to be made, this was tough to explain.
To generate all combinations of k elements drawn without replacement from a set of size n, you can use itertools.combinations, e.g.:
idx = np.vstack(itertools.combinations(range(n), k)) # shape (n choose k, k) array of indices
For the special case where k=2 it's often faster to use the indices of the upper triangle of an n x n matrix, e.g.:
idx = np.vstack(np.triu_indices(n, 1)).T
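To connect this back to the feature-selection loop, here is a small sketch (the data array and the classifier call are placeholders of my own):

import itertools
import numpy as np

n_features = 4    # total number of features (columns of data)
k = 2             # size of each feature subset

idx = np.vstack(list(itertools.combinations(range(n_features), k)))  # shape (n choose k, k)
data = np.random.random((10, n_features))    # placeholder data: 10 samples, 4 features

for combo in idx:
    subset = data[:, combo]       # shape (10, k): only the chosen features
    # evaluate_classifier(subset)   # hypothetical evaluation step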
numpy provides three handy routines to turn an array into one with at least 1, 2, or 3 dimensions, e.g. numpy.atleast_3d.
I need the equivalent for one more dimension: atleast_4d. I can think of various ways using nested if statements, but I was wondering whether there is a more efficient and faster method of returning the array in question. In your answer, I would be interested to see a big-O estimate of the execution speed if you can.
The np.array function has an optional ndmin keyword argument that:
Specifies the minimum number of dimensions that the resulting array
should have. Ones will be pre-pended to the shape as needed to meet
this requirement.
If you also set copy=False you should get close to what you are after.
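A minimal sketch of that route (note that ndmin prepends the new axes):

import numpy as np

x = np.arange(12).reshape(3, 4)         # a 2-D array
y = np.array(x, copy=False, ndmin=4)    # no copy needed here; new axes are prepended
print(y.shape)                          # (1, 1, 3, 4)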
As a do-it-yourself alternative, if you want extra dimensions trailing rather than leading:
arr.shape += (1,) * (4 - arr.ndim)
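For example (this mutates the shape in place, which works as long as no copy would be required):

arr = np.ones((3, 4))
arr.shape += (1,) * (4 - arr.ndim)   # append singleton axes up to 4 dimensions
print(arr.shape)                     # (3, 4, 1, 1)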
Why couldn't it just be something as simple as this:
import numpy as np
def atleast_4d(x):
    if x.ndim < 4:
        y = np.expand_dims(np.atleast_3d(x), axis=3)
    else:
        y = x
    return y
I.e., if the number of dimensions is less than four, call atleast_3d and append an extra dimension on the end; otherwise just return the array unchanged.
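A few shape checks of this version (my own examples) show where the padding axes end up:

print(atleast_4d(np.zeros(5)).shape)             # (1, 5, 1, 1)
print(atleast_4d(np.zeros((2, 3))).shape)        # (2, 3, 1, 1)
print(atleast_4d(np.zeros((2, 3, 4))).shape)     # (2, 3, 4, 1)
print(atleast_4d(np.zeros((2, 3, 4, 5))).shape)  # (2, 3, 4, 5)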