Crop part of np.array - python

I have a NumPy array A with the following shape:
A.shape
(512,270,1,20)
I don't want to use all 20 layers in dimension 4. The new array should have this shape:
Anew.shape
(512,270,1,2)
So I want to crop out 2 "slices" of the array A.

Using basic slicing on the last axis, the answer is:
start = 4 # Index where you want to start.
Anew = A[:,:,:,start:start+2]
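A quick check of the slicing approach, using a zero-filled array as a stand-in for A (the data itself is an assumption; only the shape matters here). Basic slicing returns a view, so Anew shares memory with A and no data is copied:

```python
import numpy as np

A = np.zeros((512, 270, 1, 20))  # stand-in for the real data

start = 4  # index of the first layer to keep
Anew = A[:, :, :, start:start + 2]  # layers 4 and 5 of the last axis

print(Anew.shape)                 # (512, 270, 1, 2)
print(np.shares_memory(A, Anew))  # True: basic slicing returns a view
```

If you need an independent array that does not change when A changes, call Anew.copy().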

You can use a list or array of indices rather than slice notation in order to select an arbitrary sequence of indices in the final dimension:
x = np.zeros((512, 270, 1, 20))
y = x[..., [4, 10]] # the 5th and 11th indices in the final dimension
print(y.shape)
# (512,270,1,2)
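Unlike slicing, indexing with a list (advanced indexing) always returns a copy. For a single axis, np.take with axis=-1 should make the same selection. A small sketch comparing the two, with random data assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((512, 270, 1, 20))

y = x[..., [4, 10]]               # advanced indexing on the last axis
z = np.take(x, [4, 10], axis=-1)  # same selection via np.take

print(y.shape)                    # (512, 270, 1, 2)
print(np.array_equal(y, z))       # True
print(np.shares_memory(x, y))     # False: advanced indexing copies
```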

Related

Extract 2d ndarray from arbitrarily dimensional ndarray using index arrays

I want to extract parts of a NumPy ndarray based on arrays of index positions for some of the dimensions. Let me illustrate with an example.
Example data
dummy = np.random.rand(5,2,100)
X = np.array([[0,1],[4,1],[2,0]])
dummy is the original ndarray with dimensionality 5x2x100. This dimensionality is arbitrary; it could just as well be 5x2x4x100.
X is a matrix of index values, here X[:,0] are the indices of the first dimension of dummy, X[:,1] those of the second dimension. The number of columns in X is always the number of dimensions in dummy minus 1.
Example output
I want to extract an ndarray of the following form for this example
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Complications
If the number of dimensions in dummy were fixed, this could simply be done with dummy[X[:,0],X[:,1],:]. Sadly, the dimensionality can vary, e.g. dummy could be a 5x2x4x6x100 ndarray, and X would then correspondingly be 3x4. My attempts at dealing with this have not yielded the desired result.
dummy[X,:] yields a 3x2x2x100 ndarray for this example, the same as dummy[X].
Iteratively reducing dummy with something like dummy = dummy[X[:,i],:], with i iterating over the columns of X, also does not reduce the ndarray in the example past 3x2x100.
I have a feeling that this should be pretty simple with numpy indexing, but I guess my search for a solution was missing the right terms for this.
Does anyone have a solution to this?
I will try to explain Michael Szczesny's answer.
First, notice that if you have an np.array with n dimensions and pass m indices, where m < n, it is the same as using : for the remaining dimensions (from m onward). In your case, for example:
dummy[(0, 0)] == dummy[0, 0, :]
Given that, note that you can also pass an array as an index. Thus:
dummy[([0, 1], [0, 0])]
This is the same as:
np.array([dummy[(0,0)], dummy[(1,0)]])
You can validate that using:
dummy[([0, 1], [0, 0])] == np.array([dummy[(0,0)], dummy[(1,0)]])
Finally, notice that:
(*X.T,)
# (array([0, 4, 2]), array([1, 1, 0]))
Here you get the indices for each dimension as a separate array, so indexing dummy with this tuple gives:
[
dummy[0,1],
dummy[4,1],
dummy[2,0]
]
Which is the same as:
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Edit: Instead of (*X.T,), you can use tuple(X.T), which to me reads more clearly.
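A small check of the tuple(X.T) form, using random data in place of the question's dummy: the result matches picking the rows one at a time, and the same expression works unchanged when dummy has an extra axis (the 4-D shape and the extra index column below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# 3-D case from the question: X has ndim - 1 = 2 columns
dummy = rng.random((5, 2, 100))
X = np.array([[0, 1], [4, 1], [2, 0]])

fast = dummy[tuple(X.T)]                           # one advanced-indexing call
slow = np.array([dummy[tuple(row)] for row in X])  # row-by-row reference
print(fast.shape, np.array_equal(fast, slow))      # (3, 100) True

# 4-D case: X now needs 3 columns, the indexing expression is unchanged
dummy4 = rng.random((5, 2, 4, 100))
X4 = np.array([[0, 1, 3], [4, 1, 0], [2, 0, 2]])
out4 = dummy4[tuple(X4.T)]
print(out4.shape)                                   # (3, 100)
print(np.array_equal(out4[1], dummy4[4, 1, 0, :]))  # True
```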
As Michael Szczesny wrote, the best solution is dummy[(*X.T,)].
Since X[:,0] are the indices of the first dimension of dummy and X[:,1] are the indices of the second dimension, if you transpose X (X.T) you get the indices of the first dimension of dummy as X.T[0] and those of the second dimension as X.T[1].
Now, to index dummy as you want, you can pass the indices of the first and second dimensions like this:
dummy[(first_dim_indices, second_dim_indices)]  # i.e. dummy[(X.T[0], X.T[1])]
To simplify the code (and since you don't want to transpose X twice), you can unpack X.T into a tuple as (*X.T,); writing dummy[(*X.T,)] is the same as writing dummy[(X.T[0], X.T[1])].
This form is also useful when the number of dimensions to index is not fixed, because unpacking X.T yields as many index arrays as there are dimensions to index in dummy. For example, suppose you want to retrieve a 1D array from dummy given the following indices:
first_dim: (0, 4, 2)
second_dim: (1, 1, 0)
third_dim: (9, 8, 7)
You can specify the indices of the 3 dimensions as X = np.array([[0,1,9],[4,1,8],[2,0,7]]), and dummy[(*X.T,)] is still valid.
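When X has as many columns as dummy has dimensions, every axis is indexed and the result is 1-D. A sketch using the index values listed above, with random data assumed for dummy:

```python
import numpy as np

rng = np.random.default_rng(1)
dummy = rng.random((5, 2, 100))

# one row per element: (first_dim, second_dim, third_dim)
X = np.array([[0, 1, 9], [4, 1, 8], [2, 0, 7]])

picked = dummy[(*X.T,)]
print(picked.shape)  # (3,)

expected = np.array([dummy[0, 1, 9], dummy[4, 1, 8], dummy[2, 0, 7]])
print(np.array_equal(picked, expected))  # True
```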

Randomly select rows from numpy array based on a condition

Let's say I have 2 arrays of arrays: labels is 1D and data is 5D. Note that both arrays have the same first dimension.
To simplify things, let's say labels contains only 3 arrays (ragged rows need dtype=object in recent NumPy versions):
labels=np.array([[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]], dtype=object)
And let's say I have a datalist of data arrays (length=3) where each array has a 5D shape where the first dimension of each one is the same as the arrays of the labels array.
In this example, datalist has 3 arrays of shapes (8,3,100,10,1), (5,3,100,10,1) and (10,3,100,10,1) respectively. Here, the first dimension of each of these arrays equals the length of the corresponding array in labels.
Now I want to reduce the number of zeros in each array of labels and keep the other values. Let's say I want to keep only 3 zeros for each array. Then the length of each array in labels, as well as the first dimension of each array in data, will be 6, 4 and 8.
To reduce the number of zeros in each array of labels, I want to randomly select and keep only 3 of them. These same randomly selected indices will then be used to select the corresponding rows from data.
For this example, the new_labels array will be something like this :
new_labels=np.array([[0,0,1,1,2,0],[4,0,0,0],[0,3,2,1,0,1,7,0]], dtype=object)
Here's what I have tried so far:
all_ind = []          # indices where value == 0, for all arrays
indexes_to_keep = []  # the randomly selected indices
new_labels = []       # the final results
for i in range(len(labels)):
    ind = []  # indices where value == 0 for one array
    for j in range(len(labels[i])):
        if labels[i][j] == 0:
            ind.append(j)
    all_ind.append(ind)
for i in range(len(labels)):
    # replace=False so the same zero is not picked twice
    indexes_to_keep.append(np.random.choice(all_ind[i], 3, replace=False))
    aux = np.zeros(len(labels[i]) - len(all_ind[i]) + 3)
    # ...
    # Here, how can I fill aux with the values?
    # ...
    new_labels.append(aux)
Any suggestions ?
Working with NumPy arrays of different lengths is awkward, so you have to iterate over the items and apply a function to each one. Assuming you only want to optimize that function, masking works well here:
import numpy as np

def specific_choice(x, n):
    '''Keep all non-zero entries of x and only n random zeros.'''
    x = np.array(x)
    mask = x != 0                # True for every non-zero entry
    idx = np.flatnonzero(~mask)  # positions of the zeros
    np.random.shuffle(idx)       # shuffles idx in place, quite fast
    idx = idx[:n]                # keep n of them (or all, if there are fewer)
    mask[idx] = True
    return x[mask]               # or return mask if you need it
Iterating over a list is faster than iterating over an object array, so an effective usage would be:
labels = [[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]]
output = [specific_choice(n, 3) for n in labels]
Output (random, so this is one possible run):
[array([0, 1, 1, 2, 0, 0]), array([0, 4, 0, 0]), array([0, 3, 0, 2, 1, 1, 7, 0])]
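The question also needs the same rows removed from each data array. One way, assuming the data arrays are kept in a list called datalist as described (filled with zeros here purely for the shapes), is to return the mask instead of the filtered labels and apply it to both. The helper name keep_n_zeros_mask and its rng parameter are hypothetical:

```python
import numpy as np

def keep_n_zeros_mask(x, n, rng):
    """Hypothetical helper: mask keeping all non-zero entries of x plus n random zeros."""
    x = np.asarray(x)
    mask = x != 0
    zero_idx = np.flatnonzero(~mask)
    # choose without replacement; if there are fewer than n zeros, keep them all
    chosen = rng.choice(zero_idx, size=min(n, zero_idx.size), replace=False)
    mask[chosen] = True
    return mask

rng = np.random.default_rng(42)
labels = [[0, 0, 0, 1, 1, 2, 0, 0], [0, 4, 0, 0, 0], [0, 3, 0, 2, 1, 0, 0, 1, 7, 0]]
# stand-ins for the real 5-D data arrays: first dimension matches each label list
datalist = [np.zeros((len(lab), 3, 100, 10, 1)) for lab in labels]

new_labels, new_data = [], []
for lab, dat in zip(labels, datalist):
    m = keep_n_zeros_mask(lab, 3, rng)
    new_labels.append(np.asarray(lab)[m])
    new_data.append(dat[m])  # the same mask selects the matching data rows

print([len(lab) for lab in new_labels])  # [6, 4, 8]
print([d.shape[0] for d in new_data])    # [6, 4, 8]
```

Because the mask is boolean, the kept rows stay in their original order in both labels and data.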

Dimensions of array after multidimensional index slicing

I want to slice a multidimensional numpy array (>2 dimensions) along 2 of its axes using index slicing. What are the rules for where each of its original dimensions end up?
To illustrate my problem, let me provide an example. Say we have a 4D array:
import numpy as np
a = np.arange(2*3*4*5).reshape(2,3,4,5)
I'll create a tuple of indices using numpy.where, for slicing along axes 1 and 3:
mask = np.where(np.random.rand(3,5) > 0.5)
This will pick out random slices from my array a. Let's say it returned a tuple of two index arrays, each of length 7.
To preserve the remaining dimensions I will use slice(None) objects:
b = a[(slice(None), mask[0], slice(None), mask[1])]
This changed the shape:
>>> a.shape
(2, 3, 4, 5)
>>> b.shape
(7, 2, 4)
The axes that were untouched (i.e. sliced using the slice(None) object) appear to have been preserved, whereas the sliced axes are destroyed and the resulting axis is moved to the front.
However, this is not always the case. When I apply a mask to axes 1 and 2:
mask2 = np.where(np.random.rand(3,4) > 0.5)
c = a[(slice(None), mask2[0], mask2[1], slice(None))]
I observe the following (numpy.where has again returned index arrays of length 7):
>>> c.shape
(2, 7, 5)
The axis resulting from the axes that have been destroyed by the slicing did not move to the front this time.
My guess is that it is related to whether the sliced axes are adjacent or not, but I want to know from what rules this behavior emerges.
https://docs.scipy.org/doc/numpy-1.15.4/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
Your where masks, used as index arrays on two axes, produce a 1d result of shape (7,): one entry for each position where the condition is true. That is what you describe as 'destroying' a pair of axes.
In the second case the two advanced indices are adjacent, so that dimension of 7 can be placed between the 2 and the 5.
In the first case it's ambiguous because of the slice in the middle (the non-adjacency), so the fallback rule is to put the new dimension first and order the sliced dimensions after it. In other words, instead of trying to choose between a (2,7,4) and a (2,4,7) order, it chooses (7,2,4).
The ambiguity is clear in this case, and the default is reasonable. It's more complicated when one or more of the dimensions is eliminated by a scalar index.
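The rule can be checked directly with fixed index arrays instead of random where output (the index values below are arbitrary). Adjacent advanced indices keep their position, a slice between them sends the broadcast dimension to the front, and an integer scalar combined with an index array also counts as an advanced index:

```python
import numpy as np

a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
i = np.array([0, 1, 2])  # three positions, playing the role of the length-7 where output

# advanced indices on axes 1 and 3, separated by a slice: result axis goes first
print(a[:, i, :, np.array([0, 2, 4])].shape)  # (3, 2, 4)

# advanced indices on adjacent axes 1 and 2: result axis stays in place
print(a[:, i, np.array([0, 1, 3]), :].shape)  # (2, 3, 5)

# a scalar on axis 1 plus an array on axis 3 is still "non-adjacent"
print(a[:, 0, :, np.array([0, 2, 4])].shape)  # (3, 2, 4)
```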

Append value to each array in a numpy array

I have a numpy array of arrays, for example:
x = np.array([[1,2,3],[10,20,30]])
Now let's say I want to extend the rows with 4 and 40 respectively, to generate the following resulting array:
[[1,2,3,4],[10,20,30,40]]
How can I do this without making a copy of the whole array? I tried to change the shape of the array in place but it throws a ValueError:
x[0] = np.append(x[0],4)
x[1] = np.append(x[1],40)
ValueError: could not broadcast input array from shape (4) into shape (3)
You can't do this. Numpy arrays allocate contiguous blocks of memory, if at all possible. Any change to the array size will force an inefficient copy of the whole array. You should use Python lists to grow your structure if possible, then convert the end result back to an array.
However, if you know the final size of the resulting array, you could instantiate it with something like np.empty() and then assign values by index, rather than appending. This does not change the size of the array itself, only reassigns values, so should not require copying.
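A minimal sketch of that preallocation idea, assuming the final shape (2, 4) is known up front and taking the dtype from the original array so integers stay integers. It still copies x once; the saving comes when a large preallocated array is filled incrementally instead of being regrown:

```python
import numpy as np

x = np.array([[1, 2, 3], [10, 20, 30]])
new_col = np.array([4, 40])

# allocate the final shape once, then fill it by index
out = np.empty((x.shape[0], x.shape[1] + 1), dtype=x.dtype)
out[:, :-1] = x       # copy the existing values
out[:, -1] = new_col  # write the new last column

print(out)
# [[ 1  2  3  4]
#  [10 20 30 40]]
```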
While roganjosh is right that you cannot modify NumPy arrays without making a copy (in the underlying process), there is a simpler way of appending each value of a 1D array to the end of the corresponding row of a 2D array, using numpy.column_stack:
>>> x = np.array([[1,2,3],[10,20,30]])
>>> x
array([[ 1,  2,  3],
       [10, 20, 30]])
>>> stack_y = np.array([4,40])
>>> stack_y
array([ 4, 40])
>>> np.column_stack((x, stack_y))
array([[ 1,  2,  3,  4],
       [10, 20, 30, 40]])
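column_stack is not the only spelling: np.hstack with the new values reshaped into a column, or np.append with axis=1, should give the same array. All three build a new array and leave x unchanged, consistent with the point that a copy cannot be avoided:

```python
import numpy as np

x = np.array([[1, 2, 3], [10, 20, 30]])
stack_y = np.array([4, 40])

a1 = np.column_stack((x, stack_y))
a2 = np.hstack((x, stack_y[:, None]))        # stack_y as a (2, 1) column vector
a3 = np.append(x, stack_y[:, None], axis=1)  # axis=1 keeps the 2-D layout

print(np.array_equal(a1, a2) and np.array_equal(a1, a3))  # True
print(x.shape)  # (2, 3): the original array is untouched
```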
Create a new matrix
Insert the values of your old matrix
Then, insert your new values in the last positions
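x = np.array([[1,2,3],[10,20,30]])
new_X = np.zeros((2, 4), dtype=x.dtype)  # keep the integer dtype of x
new_X[:, :3] = x                         # old values
new_X[:, -1] = [4, 40]                   # new values in the last column
x = new_X
Note that np.reshape() cannot change the total number of elements, and np.resize() refills the new shape from the flattened data (here [[1, 2, 3, 10], [20, 30, 1, 2]]), so neither one appends to each row on its own.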

Selecting which dimension to index in a numpy array

I am writing a program that is supposed to be able to import NumPy arrays of some higher dimension, e.g. an array a like:
a = numpy.zeros([3,5,7,2])
Further, each dimension will correspond to some physical dimension, e.g. frequency, distance, ... and I will also import arrays with information about these dimensions, e.g. for a above:
freq = [1,2,3]
time = [0,1,2,3,4,5,6]
distance = [0,0,0,4,1]
angle = [0,180]
Clearly, from this example and the shapes, it can be figured out that freq belongs to dimension 0, time to dimension 2, and so on. But since this is not known in advance, I cannot take a frequency slice like
a_f1 = a[1,:,:,:]
since I do not know along which dimension frequency is indexed.
So what I would like is some way to choose which dimension to index; in Python-ish pseudocode, something like
a_f1 = a.get_slice([0,], [[1],])
This is supposed to return the slice with index 1 from dimension 0 and the full other dimensions.
Doing
a_p = a[0, 1:, ::2, :-1]
would then correspond to something like
a_p = a.get_slice([0, 1, 2, 3], [[0,], [1,2,3,4], [0,2,4,6], [0,]])
You can fairly easily construct a tuple of indices, using slice objects where needed, and then use it to index into your array. The basic recipe is this:
indices = {
    0: idx0,  # placeholder: whatever you want to get on dimension 0
    1: idx1,  # placeholder: whatever you want to get on dimension 1
    # leave out whatever dimensions you want to get all of
}
ix = [indices.get(dim, slice(None)) for dim in range(arr.ndim)]
arr[tuple(ix)]  # current NumPy needs a tuple here, not a list
Here I have done it with a dictionary since I think that makes it easier to see which dimension goes with which indexer.
So with your example data:
x = np.zeros([3,5,7,2])
We do this:
indices = {0: 1}
ix = [indices.get(dim, slice(None)) for dim in range(x.ndim)]
>>> x[tuple(ix)].shape
(5, 7, 2)
Because your array is all zeros, I'm just showing the shape of the result to indicate that it is what we want. (Even if it weren't all zeros, it's hard to read a 3D array in text form.)
For your second example:
indices = {
0: 0,
1: slice(1, None),
2: slice(None, None, 2),
3: slice(None, -1)
}
ix = [indices.get(dim, slice(None)) for dim in range(x.ndim)]
>>> x[tuple(ix)].shape
(4, 4, 1)
You can see that the shape corresponds to the number of values in your a_p example. One thing to note is that the first dimension is gone, since you only specified one value for that index. The last dimension still exists, but with a length of one, because you specified a slice that happens to just get one element. (This is the same reason that some_list[0] gives you a single value, but some_list[:1] gives you a one-element list.)
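The recipe can be wrapped in a small function close to the get_slice call sketched in the question. The name get_slice and its dict argument are hypothetical, not a NumPy API; for a single axis, NumPy's own np.take(arr, index, axis=...) also works:

```python
import numpy as np

def get_slice(arr, indices):
    """Hypothetical helper: index arr by {axis: indexer}; other axes are taken in full.

    Indexers may be ints, slice objects or index arrays.
    """
    ix = tuple(indices.get(dim, slice(None)) for dim in range(arr.ndim))
    return arr[ix]

a = np.arange(3 * 5 * 7 * 2).reshape(3, 5, 7, 2)

a_f1 = get_slice(a, {0: 1})
print(a_f1.shape, np.array_equal(a_f1, a[1]))  # (5, 7, 2) True

a_p = get_slice(a, {0: 0, 1: slice(1, None), 2: slice(None, None, 2), 3: slice(None, -1)})
print(a_p.shape, np.array_equal(a_p, a[0, 1:, ::2, :-1]))  # (4, 4, 1) True

# a single axis can also be handled by np.take
print(np.array_equal(np.take(a, 1, axis=0), a_f1))  # True
```

One caveat: if two or more of the indexers are arrays, they are broadcast against each other (advanced indexing) rather than combined as an outer product; np.ix_ gives the outer-product behaviour.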
You can use advanced indexing to achieve this.
The index for each dimension needs to be shaped appropriately so that the indices will broadcast correctly across the array. For example, the index for the first dimension of a 3-d array needs to be shaped (x, 1, 1) so that it will broadcast across the first dimension. The index for the second dimension of a 3-d array needs to be shaped (1, y, 1) so that it will broadcast across the second dimension.
import numpy as np

a = np.zeros([3,5,7,2])
b = a[0, 1:, ::2, :-1]
indices = [[0,], [1,2,3,4], [0,2,4,6], [0,]]

def get_aslice(a, indices):
    n_dim_ = len(indices)
    index_array = [np.array(thing) for thing in indices]
    idx = []
    # reshape the arrays by adding single-dimensional entries
    # based on the position in the index array
    for d, thing in enumerate(index_array):
        shape = [1] * n_dim_
        shape[d] = thing.shape[0]
        idx.append(thing.reshape(shape))
    # index with a tuple: current NumPy does not treat a list of index arrays
    # as one index per dimension
    c = a[tuple(idx)]
    # to remove leading single-dimensional entries from the shape
    #while c.shape[0] == 1:
    #    c = np.squeeze(c, 0)
    # to remove all single-dimensional entries from the shape
    #c = np.squeeze(c)
    return c
For a as input, it returns an array of shape (1, 4, 4, 1), whereas your a_p example has shape (4, 4, 1). If the extra dimensions need to be removed, uncomment the np.squeeze lines in the function (removing only the leading ones gives (4, 4, 1); removing all of them gives (4, 4)).
Now I feel silly. While reading the docs more slowly, I noticed that NumPy has an indexing routine that does what you want: numpy.ix_
>>> import numpy as np
>>> a = np.zeros([3,5,7,2])
>>> indices = [[0,], [1,2,3,4], [0,2,4,6], [0,]]
>>> index_arrays = np.ix_(*indices)
>>> a_p = a[index_arrays]
>>> a_p.shape
(1, 4, 4, 1)
>>> a_p = np.squeeze(a_p)
>>> a_p.shape
(4, 4)
