Appending matrices into a single matrix with numpy - python

I have a function in Python that returns a numpy.mat of shape (100, 1). I call this function 4 times in a loop and would like to combine the 4 resulting matrices into a single matrix of shape (100, 4). I have spent some time looking at numpy.append, numpy.concatenate, and numpy.insert but have not been able to get this working.
Here is a short SSCCE of my issue:
import numpy as np
zeros = np.zeros(shape=(100, 4))
for i in range(1, 5):
    np.append(zeros, np.empty(shape=(100, 1)))
print(zeros)
Here zeros should end up as a matrix of shape (100, 4) filled with "junk" values from each of the calls to numpy.empty, not all 0s.
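The reason the loop above leaves zeros untouched can be checked directly: np.append never modifies its input but returns a new array, and without an axis argument it flattens both inputs first. A small sketch (the variable names appended and widened are illustrative only):

```python
import numpy as np

zeros = np.zeros(shape=(100, 4))

# np.append returns a NEW array; the input is left unchanged.
# With no axis argument, both inputs are flattened before joining.
appended = np.append(zeros, np.empty(shape=(100, 1)))
print(zeros.shape)     # (100, 4): unchanged, still all zeros
print(appended.shape)  # (500,): 400 + 100 flattened values

# With axis=1 the new column is joined on the right, again as a new array.
widened = np.append(zeros, np.empty(shape=(100, 1)), axis=1)
print(widened.shape)   # (100, 5)
```

So even if the result were kept, appending four columns to a (100, 4) array would give (100, 8), not (100, 4).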

Do something along these lines:
import numpy as np
zeros = np.zeros(shape=(100, 4))
for i in range(1, 5):
    data = np.random.rand(100, 1)  # stand-in for the function that returns a (100, 1) array
    zeros[:, i-1] = data.ravel()
Instead of ravel(), we could also use data[:,0] or np.squeeze(data). The basic idea is to feed in a 1D array, because the left-hand side zeros[:,i-1] is 1D (shape (100,)) and expects a 1D array.
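To see why the 1D conversion matters, here is a short sketch: assigning the (100, 1) array straight into the 1D column slice raises a broadcasting ValueError, while each of the three suggested conversions yields the (100,) shape the slice expects. np.random.rand stands in for the question's function.

```python
import numpy as np

zeros = np.zeros(shape=(100, 4))
data = np.random.rand(100, 1)   # stand-in for the (100, 1) function result

# zeros[:, 0] has shape (100,), and a (100, 1) value cannot broadcast into it.
raised = False
try:
    zeros[:, 0] = data
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Each of these produces the 1D shape (100,) that the target expects.
for flat in (data.ravel(), data[:, 0], np.squeeze(data)):
    assert flat.shape == (100,)

zeros[:, 0] = data.ravel()
```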
As an alternative, inside the loop, we could also do:
zeros[:,[i-1]] = data
With the list of column indices [i-1] instead of the scalar i-1, the target zeros[:,[i-1]] keeps its second dimension (it stays 2D, with shape (100, 1)), so data, which is also 2D, can be assigned without any change.
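Since the question also mentions numpy.concatenate, here is a sketch comparing the list-index assignment above with collecting the columns first and joining them once with np.hstack (equivalent to np.concatenate(..., axis=1)). make_column is a hypothetical stand-in for the question's function; it is seeded so both approaches see identical data.

```python
import numpy as np

def make_column(seed):
    # Hypothetical stand-in for the function that returns a (100, 1) array.
    return np.random.default_rng(seed).random((100, 1))

# Option 1: preallocate and assign with a list index, which keeps the target 2D.
out = np.zeros(shape=(100, 4))
for i in range(4):
    out[:, [i]] = make_column(i)

# Option 2: collect the columns, then join them along axis 1 in one call.
stacked = np.hstack([make_column(i) for i in range(4)])

print(stacked.shape)                 # (100, 4)
print(np.array_equal(out, stacked))  # True
```

Collecting and stacking once avoids repeatedly copying a growing array, which is what calling np.append inside a loop would do.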

Related

Slice based on numpy.argmin results

Let us have a numpy array (float) with shape (36, 2, 400, 400), where each 400 by 400 slice represents an image. For each pixel, I would like to take the norm over the second dimension, find the entry along the first dimension with the lowest norm, and keep the two values (the second dimension) of that entry. So I end up with an array of shape (2, 400, 400).
With np.argmin(np.linalg.norm(array, axis=1), axis=0) I am able to get this index for each of the 400 by 400 pixels, which is almost what I want. But now I want to use these indices to slice the original array along the first dimension so that I am left with an array of shape (2, 400, 400).
I could loop over all indices and construct the result pixel by pixel, but I am convinced there is a smarter way. Can anyone help me with one?
Here is a minimal reproducible example, as requested, where distances is the array:
import numpy as np
import matplotlib.pyplot as plt
shape = (400, 400)
centers = np.random.randint(400, size=(36, 2))
distances = np.array([np.indices(shape) - np.array(center)[:, None, None] for center in centers])
nearest_center_index = np.argmin(np.linalg.norm(distances, axis=1), axis=0)
print(distances.shape)
print(nearest_center_index.shape)
plt.imshow(nearest_center_index)
out:
(36, 2, 400, 400)
(400, 400)
With help from the comments, I was able to produce a somewhat ugly answer, which helped me further understand the problem. Let me elaborate. It is possible to flatten the image and the argmin results, and then use advanced indexing with the argmin values and the indices over the image to produce the result.
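flatten_indices = nearest_center_index.reshape(400**2)
image_indices = np.arange(400**2)
# The two advanced indices are separated by a slice, so the pixel axis comes first: shape (400**2, 2).
# transpose(2, 0, 1) then gives (2, 400, 400); swapaxes(0, 2) would also give that shape but with the two pixel axes swapped.
results = distances.reshape(36, 2, 400**2)[flatten_indices, :, image_indices].reshape(400, 400, 2).transpose(2, 0, 1)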
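The flattening step can likely be skipped: a sketch that builds row and column index grids with np.indices and indexes all four axes at once. It uses a smaller, non-square grid and 6 random centers (both assumptions, chosen so it runs quickly and so a swap of the pixel axes would be detectable).

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (40, 30)                                  # smaller stand-in for (400, 400)
centers = rng.integers(0, 30, size=(6, 2))
distances = np.array([np.indices(shape) - c[:, None, None] for c in centers])  # (6, 2, 40, 30)
nearest = np.argmin(np.linalg.norm(distances, axis=1), axis=0)                  # (40, 30)

rows, cols = np.indices(shape)                    # each (40, 30)
# The advanced indices (nearest, rows, cols) are separated by a slice, so their
# broadcast shape (40, 30) comes first and the sliced axis (length 2) comes last.
picked = distances[nearest, :, rows, cols]        # (40, 30, 2)
result = np.moveaxis(picked, -1, 0)               # (2, 40, 30)
print(result.shape)
```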
However, I think it is common to have an index array whose shape is a subset of the array's dimensions and whose values are indices into another dimension. I would expect a generic method for this kind of slicing.
That is, let us have an array with n dimensions, shape = (x1, x2, ..., xn), and an array of indices into one dimension, say xi, whose shape is made up of the other dimensions (not containing xi). Then I would expect a method to slice the original array with it.
The function I was looking for is numpy.take_along_axis().
For the specific example, the only thing needed is to make sure that nearest_center_index (the output of argmin) has the same number of dimensions as the array to be sliced. In the example, this can be achieved by passing keepdims=True to both norm and argmin (argmin accepts keepdims from NumPy 1.22 on); the result can then be used directly as the second argument of the function. The third argument should be the xi axis (axis 0 in the example).
Without passing keepdims=True, and following the exact example, the stated objective can be achieved by:
result = np.take_along_axis(distances, nearest_center_index[None,None,:,:], 0)[0]
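The keepdims=True variant described above might look like the following sketch, again on a smaller hypothetical grid with 6 random centers. It assumes NumPy 1.22 or newer for argmin's keepdims.

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (40, 30)
centers = rng.integers(0, 30, size=(6, 2))
distances = np.array([np.indices(shape) - c[:, None, None] for c in centers])  # (6, 2, 40, 30)

# keepdims=True keeps each reduced axis as a length-1 axis, so the index array
# ends up with the same number of dimensions (4) as distances.
norms = np.linalg.norm(distances, axis=1, keepdims=True)  # (6, 1, 40, 30)
idx = np.argmin(norms, axis=0, keepdims=True)            # (1, 1, 40, 30)

# The length-1 axis 1 of idx broadcasts against the length-2 axis of distances.
picked = np.take_along_axis(distances, idx, axis=0)      # (1, 2, 40, 30)
result = picked[0]                                       # (2, 40, 30)
print(result.shape)
```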

How to make a 2D ndarray from 3D so that (100, 50, 20) becomes (100, 1000)

I want to merge two dimensions (y, z) of a 3D array (x, y, z) into one. Each value along y should be laid out next to its corresponding z values.
For example, I have 100 frames of a video with the coordinates of 15 key points in 3 dimensions. The array shape is (100, 15, 3). I want the output to be (100, 45), merging y and z as 15x3.
Just use numpy.reshape. It can be used to flatten dimensions selectively.
import numpy as np
mat_3d = np.random.randn(2, 3, 4)
mat_2d = mat_3d.reshape((mat_3d.shape[0], -1))
print(mat_3d)
print(mat_2d)
In this example, I'm using (mat_3d.shape[0], -1) as the argument of reshape. It means that the first dimension must stay unchanged, but all the other ones must be flattened (-1 is extra sugar to let numpy infer the right size; using np.prod(mat_3d.shape[1:]) would be the same).
In such a case, numpy (in its default C order) first fetches values across the last axis (z here), then the second-to-last axis (y here), and so on in higher dimensions.
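Applied to the question's (100, 15, 3) shape, a quick check of that ordering: column j*3 + k of the result holds coordinate k of key point j, so each key point's three coordinates stay adjacent. np.arange fills the array only so the values are easy to trace.

```python
import numpy as np

frames = np.arange(100 * 15 * 3).reshape(100, 15, 3)   # (frames, key points, coordinates)
flat = frames.reshape(frames.shape[0], -1)             # (100, 45)

# In C order the last axis varies fastest, so column j*3 + k of flat
# holds coordinate k of key point j.
j, k = 7, 2
print(flat.shape)                                      # (100, 45)
print(np.array_equal(flat[:, j * 3 + k], frames[:, j, k]))  # True
```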

Subtract Mean from Multidimensional Numpy-Array

I'm currently learning about broadcasting in Numpy, and in the book I'm reading (Python for Data Analysis by Wes McKinney), the author mentions the following example to "demean" a two-dimensional array:
import numpy as np
arr = np.random.randn(4, 3)
print(arr.mean(0))
demeaned = arr - arr.mean(0)
print(demeaned)
print(demeaned.mean(0))
This effectively gives demeaned a mean of 0 along axis 0 (that is, in every column).
I had the idea to apply this to an image-like, three-dimensional array:
import numpy as np
arr = np.random.randint(0, 256, (400,400,3))
demeaned = arr - arr.mean(2)
This of course failed, because according to the broadcasting rules, trailing dimensions have to be equal (or one of them has to be 1), and that's not the case here:
print(arr.shape) # (400, 400, 3)
print(arr.mean(2).shape) # (400, 400)
Now, I have gotten it to work mostly, by subtracting the means from every single index in the third dimension of the array:
means = arr.mean(2)
demeaned = np.ones(arr.shape)
for i in range(3):
    demeaned[..., i] = arr[..., i] - means
print(demeaned.mean(2))
At this point, the returned values are very close to zero, and I think that's a precision error. Am I right about this, or is there another caveat that I missed?
Also, this doesn't seem to be the cleanest, most 'numpy' way to achieve what I wanted. Is there a function or a principle that I can make use of to improve the code?
As of numpy version 1.7.0, np.mean, and several other functions, accept a tuple in their axis parameter. This means that you can perform the operation on the planes of the image all at once:
m = arr.mean(axis=(0, 1))
This mean will have shape (3,), with one element for each plane of the image.
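With those per-plane means, the subtraction itself needs no loop, because a (3,) array lines up with the trailing axis of (400, 400, 3). A short sketch:

```python
import numpy as np

arr = np.random.randint(0, 256, (400, 400, 3))
m = arr.mean(axis=(0, 1))   # (3,): one mean per color plane
demeaned = arr - m          # (400, 400, 3) - (3,) broadcasts on the trailing axis

print(m.shape)                                     # (3,)
print(np.allclose(demeaned.mean(axis=(0, 1)), 0))  # True, up to floating-point rounding
```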
If you want to subtract the means of each pixel individually, you have to remember that broadcasting aligns shape tuples on the right edge. That means that you need to insert an extra dimension:
n = arr.mean(axis=2)
n = n.reshape(*n.shape, 1)
Or
n = arr.mean(axis=2)[..., None]
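np.mean also accepts keepdims=True, which leaves the reduced axis in place with length 1 and so produces the same (400, 400, 1) shape in one step. The sketch below also checks the per-pixel means of the result: what remains is on the order of floating-point rounding error, which is consistent with the precision explanation in the question.

```python
import numpy as np

arr = np.random.randint(0, 256, (400, 400, 3))

# keepdims=True keeps axis 2 with length 1: shape (400, 400, 1).
n = arr.mean(axis=2, keepdims=True)
demeaned = arr - n                  # (400, 400, 3) - (400, 400, 1) broadcasts

# Same values as the [..., None] spelling.
assert np.array_equal(n, arr.mean(axis=2)[..., None])

residual = np.abs(demeaned.mean(axis=2)).max()
print(residual)   # tiny, possibly nonzero: floating-point rounding, not a logic error
```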
Try np.apply_along_axis().
np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)
Output: an array of the same shape, where each value is demeaned along the dimension you choose (the second argument, here 2).
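A quick check, on a smaller array, that this matches the broadcasting approach. Note that np.apply_along_axis calls the Python function once per 1D slice (here 50 * 40 times), so for large images the broadcast version is generally much faster.

```python
import numpy as np

arr = np.random.randint(0, 256, (50, 40, 3))

via_apply = np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)
via_broadcast = arr - arr.mean(axis=2, keepdims=True)

print(via_apply.shape)                        # (50, 40, 3)
print(np.allclose(via_apply, via_broadcast))  # True
```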

numpy array not broadcastable

This is an example of my error. Say I created a numpy array
X = np.zeros((1000, 50))
where the 1000 rows are the features and the 50 columns are the examples.
Since I am adding examples one by one, I have to replace the columns of the array one at a time to get the final feature array. I tried this:
X[:,i] = example
where example has shape (1000, 1) and i iterates over the examples. This does not work because X[:,i] has shape (1000,), a rank-1 array. How do I code it so that each example replaces a column of X without throwing the broadcast error? Thank you.
Reshape your vector before assigning it.
X[:,i] = example.reshape(-1,)
This removes the second (length-1) dimension and turns example into shape (1000,).
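Put into the question's loop, the fix might look like this sketch; np.random.rand stands in for however each (1000, 1) example is actually produced.

```python
import numpy as np

n_features, n_examples = 1000, 50
X = np.zeros((n_features, n_examples))
for i in range(n_examples):
    example = np.random.rand(n_features, 1)   # stand-in for one (1000, 1) example
    X[:, i] = example.reshape(-1)             # (1000,) fits the 1D column slice
    # Equivalent alternatives: X[:, i] = example[:, 0]  or  X[:, [i]] = example

print(X.shape)   # (1000, 50)
```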
Alternatively, to avoid assigning one by one in the loop, you can put all of your arrays in a list, call np.array on the list, and transpose the result so they become columns. This will probably work best if you can construct your list of arrays with a list comprehension.
Example:
import numpy as np
arrs = [np.random.randint(10, size=5) for _ in range(5)]
X = np.array(arrs).T
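The example above uses 1D arrays. If each example keeps the question's (1000, 1) column shape, np.array(...).T carries the extra length-1 axis along, whereas np.hstack joins the columns directly. A sketch with randomly generated stand-in examples:

```python
import numpy as np

# Stand-in column-shaped examples, as in the question: each has shape (1000, 1).
examples = [np.random.rand(1000, 1) for _ in range(50)]

print(np.array(examples).T.shape)   # (1, 1000, 50): the length-1 axis survives
X = np.hstack(examples)             # joins the columns along axis 1
print(X.shape)                      # (1000, 50)
```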

What is the best way to do multi-dimensional indexing with numpy?

I am trying to do some indexing on a 3D numpy array.
Basically I have an array phi which has shape (F,A,D); for example (5, 3, 7). Generated, for example as follows:
F=5; A=3; D=7; phi = np.random.random((F,A,D))
My goal is to be able to index over A and D with a 2D array such as [[0,1,2],[5,5,6]], which means: take the values indexed by 0 in the 3rd dimension for the first position in A, the values indexed by 1 in the 3rd dimension for the second position in A, and so on. The result should have shape (F,A,2) or (F,2,A).
This would be equivalent to manually cycling all the values of the "indexer array" such as:
phi[:,0,0]; phi[:,1,1]; phi[:,2,2]
phi[:,0,5]; phi[:,1,5]; phi[:,2,6]
Intuitively I would do something like phi[:,:,[[0,1,2],[5,5,6]]], but its shape ends up being (5, 3, 2, 3).
Any ideas on how to obtain the correct result?
I think this is what you want
phi[:,range(A),[[0,1,2],[5,5,6]]]
Your attempt
phi[:,:,[[0,1,2],[5,5,6]]]
takes the values along the third dimension for every value of the first two dimensions; therefore you end up with a shape of (5, 3, 2, 3).
However, according to your example, you want the index into the second dimension to increase together with the columns of the indexer array (0, 1, 2), which my code accomplishes with range(A) and numpy's broadcasting.
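A quick check that the answer's indexing matches the manual cycling described in the question: the two adjacent advanced indices (shapes (A,) and (2, A)) broadcast to (2, A), which replaces axes 1 and 2, giving (F, 2, A). The manual construction is written only for comparison.

```python
import numpy as np

F, A, D = 5, 3, 7
phi = np.random.random((F, A, D))
idx = np.array([[0, 1, 2], [5, 5, 6]])   # shape (2, A)

out = phi[:, np.arange(A), idx]          # (F, 2, A)

# Manual version: out[f, r, a] == phi[f, a, idx[r, a]]
manual = np.stack([np.stack([phi[:, a, idx[r, a]] for a in range(A)], axis=-1)
                   for r in range(2)], axis=1)

print(out.shape)                         # (5, 2, 3)
print(np.array_equal(out, manual))       # True
```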
