Appending along 4th dimension of numpy array - python

I have a 4D data set structured as (time, level, lat, lon) that I am trying to interpolate. To do so easily, I need to add an extra longitude value onto the end of the data with the same values as the first longitude. This will allow the interpolation method I am using to interpolate correctly at higher longitude values (e.g. 359).
Currently the data has shape (64, 70, 64, 128); I need it to have shape (64, 70, 64, 129), where the values at the last longitude are the same as those at the first longitude.
Here is what I have tried so far:
data = np.concatenate((data, data[:,:,:,0]), axis = 3)
and
data = np.append( data, data[:,:,:,0],axis = 3)
However, I get
ValueError: all the input arrays must have same number of dimensions
for both. I tried adding an extra dimension to the data to append with data[:,:,:,0][...,np.newaxis], but that did not help.
At this point I am not sure how to go about doing this, other than looping through each time, level, and lat and appending a single value. However, I need to perform this operation on hundreds of data sets, so that would get very slow.
Any ideas?

The error message says it directly: the arrays passed to np.append or np.concatenate must have the same number of dimensions. Indexing with data[:,:,:,0] drops the last axis, so you are trying to join a 4D array with a 3D one. The quick answer is to use
np.append(data, data[:,:,:,0,np.newaxis], axis=3)
# or alternatively in shorthand:
np.append(data, data[...,0,None], axis=-1)
Adding either None or np.newaxis at the end of your slice adds an extra dimension to the array:
>>> data.shape
(64, 70, 64, 128)
>>> data[...,0].shape
(64, 70, 64)
>>> data[...,0,None].shape
(64, 70, 64, 1)
This allows the arrays to share the same number of dimensions and the same shape in all dimensions but the one you're appending over.
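As a quick check of the approach above, here is a minimal sketch on a small stand-in array (the shape (2, 3, 4, 5) and the names first_lon and wrapped are made up for illustration). It uses a length-1 slice, which keeps the last axis just like indexing with 0 and then adding None.

```python
import numpy as np

# Hypothetical small stand-in for the (time, level, lat, lon) data.
data = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)

# A length-1 slice keeps the longitude axis, so the result is still 4D.
first_lon = data[..., :1]            # shape (2, 3, 4, 1)

# Append the first longitude after the last one along the final axis.
wrapped = np.concatenate((data, first_lon), axis=-1)

print(wrapped.shape)                 # (2, 3, 4, 6)
```

The same call should work unchanged on the (64, 70, 64, 128) array, since nothing in it depends on the sizes of the leading axes.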

Related

Slice based on numpy.argmin results

Let us have a NumPy float array with shape (36, 2, 400, 400), where the 400 by 400 part represents an image. For each pixel, I would like to find the pair of values (second dimension) whose norm, taken over the second dimension, is the lowest along the first dimension. So I end up with an array of shape (2, 400, 400).
With np.argmin(np.linalg.norm(array, axis=1), axis=0) I am able to get, for each of the 400 by 400 pixels, the index along the first dimension, which is almost what I want. But now I want to use this index to slice the original array in the first dimension so I am left with an array of shape (2, 400, 400).
What I can do is loop over all indices and construct the result pixel by pixel, but I am convinced there is a smarter way. Can anyone help me with a smarter way?
A minimal reproducible example as requested where distances is the array:
import numpy as np
import matplotlib.pyplot as plt
shape = (400, 400)
centers = np.random.randint(400, size=(36, 2))
distances = np.array([np.indices(shape) - np.array(center)[:, None, None] for center in centers])
nearest_center_index = np.argmin(np.linalg.norm(distances, axis=1), axis=0)
print(distances.shape)
print(nearest_center_index.shape)
plt.imshow(nearest_center_index)
out:
(36, 2, 400, 400)
(400, 400)
With help from the comments, I was able to produce a somewhat ugly answer, which helped me further understand the problem. Let me elaborate. It is possible to flatten the image and the argmin results and then use advanced indexing, with the argmin values and the flat pixel indices, to produce the result.
flatten_indices = nearest_center_index.reshape(400**2)
image_indices = range(400**2)
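# transpose(2, 0, 1) moves the component axis to the front and keeps the row/column order (swapaxes(0, 2) would also swap rows and columns)
results = distances.reshape(36, 2, 400**2)[flatten_indices, :, image_indices].reshape(400, 400, 2).transpose(2, 0, 1)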
However, I think it happens a lot that you have indices that are shaped as a subset of the dimensions and have values containing indices of another dimension. I would expect a generic method to slice this.
Thus, let us have an array with n dimensions and shape (x1, x2, ..., xn), and let us say we have an array of indices into one dimension, e.g. xi, whose shape is a subset of the original array's shape and does not contain xi. I would expect a method that slices the original array with it.
The function I was looking for is numpy.take_along_axis().
For the specific example, the only thing needed is to make sure that nearest_center_index (the output of argmin) has the same number of dimensions as the array being sliced. In the example this can be achieved by passing keepdims=True to both norm and argmin; the result can then be used directly as the second argument of numpy.take_along_axis(). The third argument should be the xi axis (axis 0 in the example).
Without passing the keepdims=True, following the exact example, the stated objective can be achieved by:
result = np.take_along_axis(distances, nearest_center_index[None,None,:,:], 0)[0]
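For comparison, here is a hedged sketch of the keepdims=True variant described above, on a small stand-in array (5 centers and a 4 by 3 image, sizes chosen only for illustration). It assumes NumPy 1.22 or later, where argmin accepts keepdims. The explicit loop at the end is only a cross-check.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical small stand-in for the (36, 2, 400, 400) array.
distances = rng.normal(size=(5, 2, 4, 3))

# keepdims=True keeps each reduced axis with length 1.
norms = np.linalg.norm(distances, axis=1, keepdims=True)   # (5, 1, 4, 3)
nearest = np.argmin(norms, axis=0, keepdims=True)          # (1, 1, 4, 3)

# The indices broadcast over the component axis; [0] drops the leading axis.
result = np.take_along_axis(distances, nearest, axis=0)[0]  # (2, 4, 3)

# Cross-check against the per-pixel loop the question wanted to avoid.
expected = np.empty((2, 4, 3))
for i in range(4):
    for j in range(3):
        expected[:, i, j] = distances[nearest[0, 0, i, j], :, i, j]

print(np.array_equal(result, expected))
```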

How do i add a dimension to a numpy array and copy the dimension from another numpy array

I have a numpy array with the shape (128, 8)
I want to add an extra dimension so it has the shape (128, 168, 8)
And add the content of a 168 dimension from another array that has the shape (128, 168, 8).
I can always permute the positions of the dimensions if I can somehow add it.
Is this possible somehow? I have seen the append and concatenation methods but to no luck.
You can also do:
small[:,None,:]+big
Adding None to the index creates a new dimension of size 1, and adding the result to the bigger array broadcasts that size-1 dimension of small to the bigger array's corresponding dimension size (168 here).
np.expand_dims(smaller_array, axis=1) + bigger_array
Is the correct solution, thanks!
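A small sketch of the two spellings above, on stand-in shapes (4, 2) and (4, 3, 2) instead of (128, 8) and (128, 168, 8). The np.broadcast_to line is an addition not mentioned in the thread: it covers the case where the goal is only to repeat small along the new axis without adding anything.

```python
import numpy as np

small = np.arange(4 * 2).reshape(4, 2)         # stands in for the (128, 8) array
big = np.ones((4, 3, 2), dtype=small.dtype)    # stands in for the (128, 168, 8) array

# Both spellings insert a length-1 axis in the middle before adding.
a = small[:, None, :] + big
b = np.expand_dims(small, axis=1) + big

# Assumption: if only the repeated copy is wanted, broadcast_to returns
# a read-only view of small with the shape of big.
repeated = np.broadcast_to(small[:, None, :], big.shape)

print(a.shape, repeated.shape)                 # (4, 3, 2) (4, 3, 2)
```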

Appending matrices into a single matrix with numpy

I have a function in Python that returns a numpy.mat of shape (100, 1). I am calling this function 4 times in a loop and would like to take the resulting 4 matrices and create a matrix of shape (100, 4). I have looked for some time at numpy.append, numpy.concatenate, and numpy.insert but have not been able to get this working.
Here is a short SSCCE of my issue
zeros = np.zeros(shape=(100, 4))
for i in range(1, 5):
    np.append(zeros, np.empty(shape=(100, 1)))
print(zeros)
Here, zeros should end up as a matrix of shape (100, 4) holding the "junk" values from each call to numpy.empty, not all 0.
Do something along these lines -
zeros = np.zeros(shape=(100, 4))
for i in range(1, 5):
    data = np.random.rand(100,1) # func that returns (100,1) shaped array
    zeros[:,i-1] = data.ravel()
In place of ravel(), we could also use : data[:,0] or np.squeeze(data), basic idea is to feed a 1D array there, because the LHS zeros[:,i-1] expects a 1D array there.
As an alternative, inside the loop, we could also do -
zeros[:,[i-1]] = data
Thus, with that list of column index [i-1] instead of i-1, we are keeping the dimensions into which data is to be assigned (keeps as 2D) and that allows us to feed in data, which is also 2D without any change.
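A different route from the in-place assignment above, offered as a sketch rather than as the answer's method: collect the columns in a list and join them once with np.hstack. The function make_column is a hypothetical stand-in for the function that returns a (100, 1) result.

```python
import numpy as np

def make_column(i):
    # Hypothetical stand-in for the function returning a (100, 1) array.
    return np.full((100, 1), float(i))

# Build all columns first, then join them side by side in one call.
columns = [make_column(i) for i in range(4)]
stacked = np.hstack(columns)

print(stacked.shape)   # (100, 4)
```

This avoids preallocating the result, at the cost of holding the list of columns in memory until the final call.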

NumPy ndarray broadcasting - shape (X,) vs (X, 1) to operate with (X,Y)

I have a NumPy ndarray which is shaped (32, 1024) and holds 32 signal measurements which I would like to combine into a single 1024 element long array, with a different weight for each of the 32. I was using numpy.average but my weights are complex and average performs a normalisation of the weights based on the sum which throws off my results.
Looking at the code for average I realised that I can accomplish the same thing by multiplying the weights by the signal array and then summing over the first axis. However when I try and multiply my (32,) weights array by the (32, 1024) signal array I get a dimension mismatch as the (32,) cannot be broadcast to (32, 1024). If I reshape the weights array to (32, 1) then everything works as expected, however this results in rather ugly code:
avg = (weights.reshape((32, 1)) * data).sum(axis=0)
Can anybody explain why NumPy will not allow my (32,) array to broadcast to (32, 1024) and/or suggest an alternative, neater way of performing the weighted average?
Generic setup for alignment between (X,) and (X,Y) shaped arrays
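On the question of why (32,) can't broadcast to (32, 1024): broadcasting aligns shapes starting from the last axis, so the single axis of weights (length 32) is matched against the last axis of data (length 1024), and the lengths differ. To put it into a schematic, we have: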
weights : 32
data : 32 x 1024
We need to align the only axis of weights with the first axis of data. So, as you discovered, one way is to reshape to 2D, such that we end up with a singleton dimension as the second axis. This can be achieved by introducing a new axis with None/np.newaxis: weights[:,np.newaxis] or weights[:,None], or with a simple reshape: weights.reshape(-1,1). Hence, going back to the schematic, with the modified version we would have:
weights[:,None] : 32 x 1
data : 32 x 1024
Now, that the shapes are aligned, we can perform any generic element-wise operation between these two with the result schematic looking like so -
weights[:,None] : 32 x 1
data : 32 x 1024
result : 32 x 1024
This would broadcast weights and the relevant element-wise operation would be performed with data resulting in result.
Solving our specific case and alternatives
Following the discussion in previous section, to solve our case of element-wise multiplication, it would be weights[:,None]*data and then sum along axis=0, i.e. -
(weights[:,None]*data).sum(axis=0)
Let's look for neat alternatives!
One neat and probably intuitive way would be with np.einsum -
np.einsum('i,ij->j',weights,data)
Another way would be with matrix-multiplication using np.dot, as we lose the first axis of weights against the first axis of data, like so -
weights.dot(data)
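To see that the three forms above agree, here is a minimal sketch with complex weights on stand-in shapes (4,) and (4, 6) rather than (32,) and (32, 1024). The variable names by_broadcast, by_einsum and by_dot are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=4) + 1j * rng.normal(size=4)   # complex weights, shape (4,)
data = rng.normal(size=(4, 6))                           # signals, shape (4, 6)

# Broadcast multiply, then sum over the signal axis.
by_broadcast = (weights[:, None] * data).sum(axis=0)

# Same contraction over the shared index i.
by_einsum = np.einsum('i,ij->j', weights, data)

# A 1D array dotted with a 2D array sums over the first axis of the 2D array.
by_dot = weights.dot(data)

print(np.allclose(by_broadcast, by_einsum), np.allclose(by_broadcast, by_dot))
```

None of the three normalises by the sum of the weights, which is the behaviour of numpy.average that the question wanted to avoid.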

What is the best way to do multi-dimensional indexing with numpy?

I am trying to do some indexing on a 3D numpy array.
Basically I have an array phi which has shape (F,A,D); for example (5, 3, 7). Generated, for example as follows:
F=5; A=3; D=7; phi = np.random.random((F,A,D))
My goal is to index over A and D with a 2D array such as [[0,1,2],[5,5,6]], which means: take the values at index 0 in the 3rd dimension for the first position in A, the values at index 1 in the 3rd dimension for the second position in A, and so on. The result should have shape (F,A,2) or (F,2,A).
This would be equivalent to manually cycling all the values of the "indexer array" such as:
phi[:,0,0]; phi[:,1,1]; phi[:,2,2]
phi[:,0,5]; phi[:,1,5]; phi[:,2,6]
Intuitively I would do something like phi[:,:,[[0,1,2],[3,3,3]]], but its shape ends up being (5, 3, 2, 3).
Any ideas on how to obtain the correct result?
I think this is what you want
phi[:,range(A),[[0,1,2],[5,5,6]]]
Your attempt
phi[:,:,[[0,1,2],[5,5,6]]]
takes the values along the third dimension for every combination of indices in the first two dimensions, so you end up with a shape of (5,3,2,3).
However, according to your example you want the index into the second dimension to increase in step with the columns of the indexer array, which is accomplished in my code by range(A) and NumPy's broadcasting.
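A short sketch checking the indexing above against the manual slices listed in the question. The names idx, out and expected are made up for illustration, and np.arange(A) is used in place of range(A), which should behave the same here.

```python
import numpy as np

F, A, D = 5, 3, 7
phi = np.random.default_rng(0).random((F, A, D))
idx = np.array([[0, 1, 2], [5, 5, 6]])     # shape (2, A)

# np.arange(A) has shape (A,) and broadcasts against idx to shape (2, A),
# pairing position a in A with idx[r, a] in D.
out = phi[:, np.arange(A), idx]            # shape (F, 2, A)

# Manual construction from slices such as phi[:, 0, 0], phi[:, 1, 1], ...
expected = np.stack(
    [np.stack([phi[:, a, idx[r, a]] for a in range(A)], axis=-1) for r in range(2)],
    axis=1,
)

print(out.shape)                           # (5, 2, 3)
```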
