I want to repeat a 1D array along the dimensions of another array, knowing that the number of dimensions of that other array can vary.
For example:
import numpy as np
to_repeat = np.linspace(0, 100, 10)
base_array = np.random.random((24, 60)) ## this one can have more than two dimensions.
final_array = np.array([[to_repeat for i in range(base_array.shape[0])] for j in range(base_array.shape[1])]).T
print(final_array.shape)
# >>> (10, 24, 60)
How can this be extended to an array base_array with an arbitrary number of dimensions?
Possibly using numpy vectorized functions in order to avoid loops?
EDIT (bigger picture):
base_array is in fact of shape (10, 24, 60) (if we stick to this example), where the coordinates along the first dimension are the vector to_repeat.
I'm looking for the minimum along the first dimension of base_array, and want to create the array of the corresponding coordinates, here of shape (24, 60).
You don't need final_array, you can get the result you want by:
to_repeat[base_array.argmin(0)]
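For the question as originally asked, a loop-free way to build final_array for a base_array of any dimensionality is to give to_repeat one trailing length-1 axis per dimension of base_array and let np.broadcast_to expand it. This is a minimal sketch, assuming the repeated axis should come first as in final_array; repeat_along is a hypothetical helper name and the random shapes are illustrative only. It also shows the argmin lookup from the answer.

```python
import numpy as np

def repeat_along(vec, shape):
    """Broadcast the 1D array `vec` to shape (vec.size, *shape).

    `repeat_along` is a hypothetical helper, not a NumPy function.
    """
    # Append one length-1 axis per dimension of `shape`, e.g. (10,) -> (10, 1, 1, 1).
    expanded = vec.reshape(vec.shape + (1,) * len(shape))
    # Expand the length-1 axes without copying data (read-only view).
    return np.broadcast_to(expanded, vec.shape + tuple(shape))

to_repeat = np.linspace(0, 100, 10)
base_array = np.random.random((24, 60, 5))  # three dimensions this time

final_array = repeat_along(to_repeat, base_array.shape)
print(final_array.shape)  # (10, 24, 60, 5)

# The "bigger picture": coordinate of the minimum along axis 0, per point.
data = np.random.random((10, 24, 60))
coords = to_repeat[data.argmin(axis=0)]
print(coords.shape)  # (24, 60)
```

np.broadcast_to returns a read-only view that shares memory with to_repeat, so nothing is actually copied; call .copy() if a writable array is needed.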
Related
Suppose I have a NumPy array (float) with shape (36, 2, 400, 400), where the last two axes (400 by 400) represent an image. For each pixel I would like to find the two values (second dimension) whose norm, taken over the second dimension, is the lowest with respect to the first dimension. So I end up with an array of shape (2, 400, 400).
With np.argmin(np.linalg.norm(array, axis=1), axis=0) I am able to get the index for each of those 400 by 400 pixels, which is almost what I want. But now I want to use this index to slice the original array along the first dimension, so that I am left with an array of shape (2, 400, 400).
What I can do is loop over all indices and construct the result pixel by pixel, but I am convinced there is a smarter way. Can anyone help me with a smarter way?
A minimal reproducible example, as requested, where distances is the array:
import numpy as np
import matplotlib.pyplot as plt
shape = (400, 400)
centers = np.random.randint(400, size=(36, 2))
distances = np.array([np.indices(shape) - np.array(center)[:, None, None] for center in centers])
nearest_center_index = np.argmin(np.linalg.norm(distances, axis=1), axis=0)
print(distances.shape)
print(nearest_center_index.shape)
plt.imshow(nearest_center_index)
out:
(36, 2, 400, 400)
(400, 400)
With help from the comments I was able to produce a somewhat ugly answer, which helped me further understand the problem. Let me elaborate: it is possible to flatten the image and the argmin result, and then use advanced indexing, combining the argmin values with the indices over the image, to produce the result.
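flatten_indices = nearest_center_index.reshape(400**2)
image_indices = np.arange(400**2)
# Indexing (36, 2, 160000) with two (160000,) index arrays around a slice gives (160000, 2);
# reshape back to (400, 400, 2), then move the pair axis to the front to get (2, 400, 400).
results = distances.reshape(36, 2, 400**2)[flatten_indices, :, image_indices].reshape(400, 400, 2).transpose(2, 0, 1)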
However, I think it is common to have an index array whose shape is a subset of the array's dimensions and whose values are indices into another dimension. I would expect a generic method for this kind of slicing.
So let us have an array with n dimensions, shape = (x1, x2, ..., xn), and an array of indices for one dimension, e.g. xi, whose shape is a subset of the original shape that does not contain xi. Then I would expect a method to slice the original array with it.
The function I was looking for is numpy.take_along_axis().
For the specific example, the only thing needed is to make sure that nearest_center_index (the output of argmin) has the same number of dimensions as the array to be sliced. In the example this can be achieved by passing keepdims=True to both norm and argmin (argmin accepts keepdims since NumPy 1.22); the result can then be used directly as the second argument of take_along_axis. The third argument should be the xi axis (axis 0 in the example).
Without passing keepdims=True, and following the exact example above, the stated objective can be achieved by:
result = np.take_along_axis(distances, nearest_center_index[None,None,:,:], 0)[0]
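A minimal sketch of the generic version asked for above, assuming the index array has the target array's shape with the indexed axis removed (other axes may have length 1 and broadcast): put the missing axis back with np.expand_dims, call np.take_along_axis, then squeeze that axis away. take_index_along is a hypothetical helper name, and the small random array only stands in for distances. The sketch uses expand_dims instead of argmin's keepdims so that it also runs on NumPy versions older than 1.22.

```python
import numpy as np

def take_index_along(arr, idx, axis):
    """Return the values of `arr` selected by `idx` along `axis`.

    `idx` must have one dimension fewer than `arr` (its shape is arr.shape
    without `axis`; length-1 axes broadcast). Hypothetical helper name.
    """
    expanded = np.expand_dims(idx, axis=axis)            # put the missing axis back, length 1
    picked = np.take_along_axis(arr, expanded, axis=axis)
    return np.squeeze(picked, axis=axis)                 # and remove it again

# Small random stand-in for the (36, 2, 400, 400) distances array.
rng = np.random.default_rng(0)
distances = rng.integers(-50, 50, size=(6, 2, 8, 8))

# keepdims=True keeps axis 1 with length 1, so the argmin result has shape
# (1, 8, 8): distances' shape without axis 0, with axis 1 broadcastable.
norms = np.linalg.norm(distances, axis=1, keepdims=True)  # (6, 1, 8, 8)
nearest = np.argmin(norms, axis=0)                         # (1, 8, 8)
result = take_index_along(distances, nearest, axis=0)      # (2, 8, 8)

# Cross-check against an explicit pixel-by-pixel loop.
expected = np.empty((2, 8, 8), dtype=distances.dtype)
for r in range(8):
    for c in range(8):
        expected[:, r, c] = distances[nearest[0, r, c], :, r, c]
print(result.shape, np.array_equal(result, expected))  # (2, 8, 8) True
```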
I want to merge two dimensions (y,z) of a 3D array (x,y,z) into one. Each corresponding value from y should be copied next to z.
For example, I have 100 frames of a video with the coordinates of 15 key points in 3 dimensions. The array shape is (100, 15, 3). I want the output to be (100, 45), i.e. y and z merged as 15x3.
Just use numpy.reshape. It can be used to flatten dimensions selectively.
import numpy as np
mat_3d = np.random.randn(2, 3, 4)
mat_2d = mat_3d.reshape((mat_3d.shape[0], -1))
print(mat_3d)
print(mat_2d)
In this example, I'm using (mat_3d.shape[0], -1) as the argument of reshape. It means that the first dimension must stay unchanged, but all the other ones must be flattened (-1 is extra sugar that lets NumPy infer the right size; using np.prod(mat_3d.shape[1:]) would be the same).
In such a case, NumPy reads the values in row-major (C) order: the last axis (z here) varies fastest, then the second-to-last axis (y here), and so on for higher dimensions.
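Applied to the shapes from the question, a short sketch (random stand-in data) shows where each value ends up: with row-major order, column 3*k + c of the result holds coordinate c of key point k, so the x, y, z of each key point sit next to each other.

```python
import numpy as np

frames = np.random.randn(100, 15, 3)  # (frames, key points, xyz), stand-in data
flat = frames.reshape(frames.shape[0], -1)
print(flat.shape)  # (100, 45)

# Column 3*k + c holds coordinate c of key point k.
k, c = 7, 2
print(np.array_equal(flat[:, 3 * k + c], frames[:, k, c]))  # True

# The operation is reversible.
print(np.array_equal(flat.reshape(frames.shape), frames))  # True
```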
I want to multiply two numpy arrays. One numpy array is given by matrix of shape (10, 10) and the other is given by a matrix of matrices, i.e. shape (10, 10, 256, 256).
I now simply want to multiply each matrix in the second matrix of matrices with the corresponding component in the first matrix. For instance, the matrix at position (0, 0) in the second matrix shall be multiplied by the value at position (0, 0) in the first matrix.
Intuitively, this is not really complicated, but numpy does not seem to support that. Or at least I am not smart enough to make it work. The ValueError that is thrown says:
ValueError: operands could not be broadcast together with shapes (10,10) (10,10,256,256)
Can anybody help me, please? How can I achieve this in a NumPy way?
You can use the NumPy einsum function, e.g., (using zeros arrays as dummies in this example):
import numpy as np
x = np.zeros((10, 10))
y = np.zeros((10, 10, 256, 256))
z = np.einsum("ij,ijkm->ijkm", x, y)  # i and j kept in the output: each y[i, j] is scaled by x[i, j], nothing is summed
print(z.shape)
(10, 10, 256, 256)
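Because broadcasting aligns shapes on the right, x only needs two trailing length-1 axes to line up with y, which is the plain-broadcasting alternative to einsum. A minimal sketch with random stand-in data (smaller trailing dimensions to keep it quick), checked against einsum with all four indices kept in the output. Dropping i and j from the output, as in "ij,ijkm->km", would instead sum all the scaled matrices into a single (k, m) matrix.

```python
import numpy as np

x = np.random.rand(10, 10)
y = np.random.rand(10, 10, 4, 4)  # small stand-in for (10, 10, 256, 256)

# x[:, :, None, None] has shape (10, 10, 1, 1), which broadcasts against
# (10, 10, 4, 4): each y[i, j] is multiplied by the scalar x[i, j].
z_broadcast = x[:, :, None, None] * y

# Same element-wise product with einsum: i and j stay in the output.
z_einsum = np.einsum("ij,ijkm->ijkm", x, y)

print(z_broadcast.shape)                                  # (10, 10, 4, 4)
print(np.allclose(z_broadcast, z_einsum))                 # True
print(np.allclose(z_broadcast[3, 5], x[3, 5] * y[3, 5]))  # True
```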
See here for a nice description of einsum's usage.
I'm currently learning about broadcasting in NumPy, and in the book I'm reading (Python for Data Analysis by Wes McKinney) the author gives the following example to "demean" a two-dimensional array:
import numpy as np
arr = np.random.randn(4, 3)
print(arr.mean(0))
demeaned = arr - arr.mean(0)
print(demeaned)
print(demeaned.mean(0))
This effectively makes each column of demeaned have a mean of 0.
I had the idea to apply this to an image-like, three-dimensional array:
import numpy as np
arr = np.random.randint(0, 256, (400,400,3))
demeaned = arr - arr.mean(2)
This of course failed, because by the broadcasting rules the trailing dimensions have to be equal (or one of them has to be 1), and that's not the case here:
print(arr.shape) # (400, 400, 3)
print(arr.mean(2).shape) # (400, 400)
Now I have gotten it to work, mostly, by subtracting the mean from every single index in the third dimension of the array:
means = arr.mean(2)
demeaned = np.ones(arr.shape)
for i in range(3):
    demeaned[..., i] = arr[..., i] - means
print(demeaned.mean(2))
At this point, the returned values are very close to zero, and I think that's a floating-point precision error. Am I right about this, or is there another caveat that I missed?
Also, this doesn't seem to be the cleanest, most 'numpy' way to achieve what I wanted. Is there a function or a principle that I can use to improve the code?
As of numpy version 1.7.0, np.mean, and several other functions, accept a tuple in their axis parameter. This means that you can perform the operation on the planes of the image all at once:
m = arr.mean(axis=(0, 1))
This mean will have shape (3,), with one element for each plane of the image.
If you want to subtract the means of each pixel individually, you have to remember that broadcasting aligns shape tuples on the right edge. That means that you need to insert an extra dimension:
n = arr.mean(axis=2)
n = n.reshape(*n.shape, 1)
Or
n = arr.mean(axis=2)[..., None]
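The same per-pixel and per-plane subtractions can be written with keepdims=True, which keeps the reduced axis with length 1 so that no reshape or [..., None] is needed. A minimal sketch with random data in place of an image; it checks the near-zero means with a tolerance, since subtracting floating-point means only cancels up to rounding error.

```python
import numpy as np

arr = np.random.randint(0, 256, (400, 400, 3))

# Per-pixel demeaning: keepdims=True gives means of shape (400, 400, 1),
# which broadcasts against (400, 400, 3).
per_pixel = arr - arr.mean(axis=2, keepdims=True)

# Per-plane demeaning: the means have shape (3,), which already lines up
# with the last axis of arr, so no extra dimension is needed.
per_plane = arr - arr.mean(axis=(0, 1))

# Compare with a tolerance rather than == 0 (floating-point rounding).
print(np.allclose(per_pixel.mean(axis=2), 0))       # True
print(np.allclose(per_plane.mean(axis=(0, 1)), 0))  # True
```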
Try np.apply_along_axis().
np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)
Output: an array of the same shape as arr, where each 1D slice along the chosen axis (the second parameter, here 2) has been demeaned.
I have a NumPy array test of dimension (100, 100, 16, 16), which gives me a different 16x16 matrix for each point on a 100x100 grid.
I also have some eigenvalues and vectors where vals has the dimension (100, 100, 16) and vecs (100, 100, 16, 16) where vecs[x, y, :, i] would be the ith eigenvector of the matrix at the point (x, y) corresponding to the ith eigenvalue vals[x, y, i].
Now I want to take the first eigenvector of the array at ALL points on the grid, do a matrix product with the test matrix and then do a scalar product of the resulting vector with all the other eigenvectors of the array at all points on the grid and sum them.
The resulting array should have the dimension (100, 100). After this I would like to take the 2nd eigenvector of the array, matrix-multiply it with test, and then take the scalar product of the result with all eigenvectors other than the 2nd, and so on, so that in the end I have 16 (100, 100) arrays, or rather one (100, 100, 16) array. So far I have only succeeded with a lot of for loops, which I would like to avoid; using tensordot gives me the wrong dimensions, and I don't see how to pick the axis along which np.dot is vectorized.
I heard that einsum might be suitable to this task, but everything that doesn't rely on the python loops is fine by me.
import numpy as np
from numpy import linalg as la
test = np.arange(16*16*10*10).reshape((10, 10, 16, 16))  # a 10x10 grid here; with 100x100 the tensordot result below would be far too large
vals, vecs = la.eig(test + 1)
np.tensordot(vecs, test, axes=[2, 3]).shape
>>> (10, 10, 16, 10, 10, 16)
EDIT: Ok, so I used np.einsum to get a correct intermediate result.
np.einsum('ijkl, ijkm -> ijlm', vecs, test)
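On the tensordot problem: np.matmul (the @ operator) already vectorizes the matrix product over leading axes, treating the last two axes as the matrices, so no axis has to be picked by hand. A minimal sketch on a small random grid (an assumption, to keep it fast), checking that the einsum step above equals swapping the last two axes of vecs and using @. The einsum subscripts are the question's, written without spaces.

```python
import numpy as np
from numpy import linalg as la

rng = np.random.default_rng(0)
test = rng.standard_normal((5, 5, 16, 16))  # small 5x5 grid of 16x16 matrices
vals, vecs = la.eig(test)                    # vecs[x, y, :, i] is the i-th eigenvector

# einsum from the question: sums over the component index k.
step = np.einsum('ijkl,ijkm->ijlm', vecs, test)

# Same thing with matmul: @ multiplies the last two axes and broadcasts
# over the leading (grid) axes.
step_matmul = np.swapaxes(vecs, -1, -2) @ test

print(step.shape)                      # (5, 5, 16, 16)
print(np.allclose(step, step_matmul))  # True
```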
But in the next step I want to take the scalar product only with all the other eigenvectors in vecs. Could I implement some kind of inverse Kronecker delta in the einsum formalism, or should I switch back to regular NumPy now?
OK, I played around and found a way to do what is described above with np.einsum. A nice feature of einsum is that if an index occurring in both operands is repeated in the output (to the right of the '->'), that axis is multiplied element-wise rather than summed, so you can have element-wise multiplication along some axes and contraction along others (something that you don't have in handwritten tensor algebra notation).
result = np.einsum('ijkl, ijlm -> ijkm', np.einsum('ijkl, ijkm -> ijlm', vecs, test), vecs)
This nearly does the trick. Now only the diagonal terms have to be taken out, which we can do by simply subtracting them like this:
result = result - result * np.eye(np.shape(test)[-1])[None, None, ...]
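To reach the (100, 100, 16) array the question asks for, the off-diagonal matrix still has to be summed over its last axis. A minimal sketch on a small random grid (an assumption, for speed), which also shows an equivalent form that subtracts the diagonal, extracted with einsum's '...kk->...k', from the row sums instead of building a masked copy.

```python
import numpy as np
from numpy import linalg as la

rng = np.random.default_rng(1)
test = rng.standard_normal((4, 4, 16, 16))  # small stand-in grid
vals, vecs = la.eig(test)

# result[x, y, k, m] = v_k^T @ test @ v_m on every grid point, as in the answer.
result = np.einsum('ijkl,ijlm->ijkm',
                   np.einsum('ijkl,ijkm->ijlm', vecs, test), vecs)

# Remove the diagonal (m == k), then sum over m: one value per eigenvector k.
off_diag = result - result * np.eye(result.shape[-1])
summed = off_diag.sum(axis=-1)  # shape (4, 4, 16)

# Equivalent: row sums minus the diagonal entries.
summed_alt = result.sum(axis=-1) - np.einsum('...kk->...k', result)

print(summed.shape)                     # (4, 4, 16)
print(np.allclose(summed, summed_alt))  # True
```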