Problem simultaneously indexing several dimensions of a multidimensional numpy array - python

Consider a 4-dimensional numpy array (variable a). We have a.shape = (16, 5, 66, 717).
From the second dimension, which contains 5 elements, I want to select the second and the fifth:
b = a[:, [1,4],:,:]
b.shape returns (16, 2, 66, 717), so I guess what I did is correct. Now I want to extract 5 elements from the first dimension (eighth, eleventh, thirteenth, fourteenth, fifteenth) and two elements from the second dimension (second and fifth):
b = a[[7,10,12,13,14], [1,4],:,:]
which gives an error:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (5,) (2,)
I don't understand why this simultaneous indexing across >1 dimensions of numpy array doesn't work. I guess I could sequentially do b = a[:, [1,4],:,:] and c = b[[7,10,12,13,14],:,:,:] to get what I want, but there must be a way to do that in one step. Could you please help?

Make a smaller 3d array:
In [155]: a = np.arange(24).reshape(2,3,4)
In [158]: a
Out[158]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
Selecting two "rows" (on the middle dimension):
In [159]: a[:,[0,2],:]
Out[159]:
array([[[ 0, 1, 2, 3],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[20, 21, 22, 23]]])
If we use 2 lists (or arrays) of the same shape, they are paired element by element, so we end up selecting just 2 of the "rows" from [159], a[0,0,:] and a[1,2,:]:
In [160]: a[[0,1],[0,2],:]
Out[160]:
array([[ 0, 1, 2, 3],
[20, 21, 22, 23]])
If instead the first list/array is a "column vector", we select a (2,2) "block":
In [161]: a[[[0],[1]],[0,2],:]
Out[161]:
array([[[ 0, 1, 2, 3],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[20, 21, 22, 23]]])
ix_ can be used to create the same 2 arrays:
In [162]: np.ix_([0,1],[0,2])
Out[162]:
(array([[0],
[1]]),
array([[0, 2]]))
So using ix_ arrays:
In [163]: I,J = np.ix_([0,1],[0,2])
In [164]: a[I,J,:]
Out[164]:
array([[[ 0, 1, 2, 3],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[20, 21, 22, 23]]])
The two index arrays broadcast against each other in the same sense as broadcasting during addition or multiplication:
In [165]: I*10 + J
Out[165]:
array([[ 0, 2],
[10, 12]])
reference: https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing
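As a quick check of the broadcasting rule, here is a minimal sketch that reuses the small (2,3,4) array from this answer. The variable names index_shape, block and pairs are only illustrative. It shows that the leading dimensions of the result are the broadcast shape of the index arrays, followed by the sliced dimensions.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# ix_ returns a (2, 1) array and a (1, 2) array.
I, J = np.ix_([0, 1], [0, 2])

# The index arrays broadcast together to (2, 2), just as in I*10 + J.
index_shape = np.broadcast(I, J).shape

# Result shape = broadcast index shape + shape of the sliced last axis.
block = a[I, J, :]

# Two same-shape lists are paired element by element instead.
pairs = a[[0, 1], [0, 2], :]

print(index_shape, block.shape, pairs.shape)
```

The block has shape (2, 2, 4), while the paired version has shape (2, 4).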
edit
In [166]: np.ix_([7,10,12,13,14], [1,4])
Out[166]:
(array([[ 7],
[10],
[12],
[13],
[14]]),
array([[1, 4]]))
Regarding your error:
In [167]: np.ix_([7,10,12,13,14], [1,4],:,:)
Input In [167]
np.ix_([7,10,12,13,14], [1,4],:,:)
^
SyntaxError: invalid syntax
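ix_ is a function, and ':' isn't allowed in a function call. It only works inside indexing brackets, where it's converted to a slice object. That's why you get a syntax error. Pass only the index lists to ix_ and keep the slices in the indexing expression itself, as in a[I,J,:] above.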
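Applied to the array in the question, the one-step selection could look like the sketch below. The trailing dimensions are shrunk from (66, 717) to (6, 7) purely to keep the example small, which is an assumption of this sketch and not part of the question.

```python
import numpy as np

# Stand-in for the (16, 5, 66, 717) array, with smaller trailing axes.
a = np.arange(16 * 5 * 6 * 7).reshape(16, 5, 6, 7)

rows = [7, 10, 12, 13, 14]
cols = [1, 4]

# One step: ix_ builds (5, 1) and (1, 2) index arrays that broadcast to (5, 2).
b = a[np.ix_(rows, cols)]

# Equivalent without ix_: make the first index a column vector by hand.
b_manual = a[np.array(rows)[:, None], cols]

# The two-step version from the question, for comparison.
b_two_step = a[:, cols, :, :][rows, :, :, :]

print(b.shape)
```

All three results have shape (5, 2, 6, 7) and hold the same values.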

Related

Taking a Different Subset of Indices per Row in Numpy Using Fancy Indexing

I have an array of pairwise differences of features:
diff.shape = (200, 200, 2)
of which I am trying to take only the columns corresponding to the 50 closest points. For each row, I have the indices of the closest 50 points stored as:
dist_idx.shape = (200, 50).
How can I index the 50 closest entries (different indices per row) using fancy indexing? I have tried:
diff[dist_idx].shape = (200, 50, 200, 2)
diff[np.arange(200), dist_idx] -> IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (200,) (200,50)
diff[np.arange(200), dist_idx[np.arange(200)]] -> IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (200,) (200,50)
An iterative solution that works is:
X_diff = np.zeros((200, 50, 2))
for i in range(200):
    X_diff[i] = diff[i, dist_idx[i]]
Make a smaller example
In [159]: arr = np.arange(24).reshape(3, 4, 2)
In [160]: arr
Out[160]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11],
[12, 13],
[14, 15]],
[[16, 17],
[18, 19],
[20, 21],
[22, 23]]])
In [161]: idx = np.array([[0, 1], [0, 2], [1, 3]])
In [162]: idx.shape
Out[162]: (3, 2)
Your iterative approach:
In [164]: out = np.zeros((3, 2, 2), int)
In [165]: for i in range(3):
     ...:     out[i] = arr[i, idx[i]]
In [166]: out
Out[166]:
array([[[ 0, 1],
[ 2, 3]],
[[ 8, 9],
[12, 13]],
[[18, 19],
[22, 23]]])
Your first try doesn't work because it applies idx to the 1st dimension of the array only. In my smaller case I get an error, because idx contains the value 3, which is out of bounds for the first dimension (size 3):
In [167]: arr[idx]
Traceback (most recent call last):
Input In [167] in <module>
arr[idx]
IndexError: index 3 is out of bounds for axis 0 with size 3
But if we use a (3,1) array as the first index, it broadcasts nicely against the (3,2) idx array used for the 2nd:
In [168]: arr[np.arange(3)[:, None], idx, :]
Out[168]:
array([[[ 0, 1],
[ 2, 3]],
[[ 8, 9],
[12, 13]],
[[18, 19],
[22, 23]]])
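Scaled back up to the shapes in the question, the same pattern might be sketched as follows. The data here is random and dist_idx is a random stand-in for the real nearest-neighbour indices, so only the shapes and the agreement with the loop are meaningful. np.take_along_axis is shown as an alternative that I believe gives the same result.

```python
import numpy as np

rng = np.random.default_rng(0)
diff = rng.normal(size=(200, 200, 2))             # stand-in data
dist_idx = rng.integers(0, 200, size=(200, 50))   # stand-in indices

# (200, 1) row index broadcasts against the (200, 50) column index.
rows = np.arange(200)[:, None]
X_fancy = diff[rows, dist_idx]

# Alternative: take_along_axis needs indices with the same ndim as diff.
X_take = np.take_along_axis(diff, dist_idx[:, :, None], axis=1)

# The loop from the question, for comparison.
X_loop = np.zeros((200, 50, 2))
for i in range(200):
    X_loop[i] = diff[i, dist_idx[i]]

print(X_fancy.shape)
```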

Figuring out correct numpy transpose for 3*3*3 array

Suppose I have a 3*3*3 array x. I would like to find an array y such that y[0,1,2] = x[1,2,0], or more generally, y[a,b,c] = x[b,c,a]. I can try numpy.transpose:
import numpy as np
x = np.arange(27).reshape((3,3,3))
y = np.transpose(x, [2,0,1])
print(x[0,1,2],x[1,2,0])
print(y[0,1,2])
The output is
5 15
15
The result 15, 15 is what I expected (the first 15 is the reference value from x[1,2,0]; the second is from y[0,1,2]). However, I found the transpose [2,0,1] by drawing it on paper:
B C A
A B C
By inspection, the transpose should be [2,0,1]: the last entry in the upper row goes to first in the lower row, the middle goes last, and the first goes to the middle. Is there any automatic and hopefully efficient way to do it (like a standard function in numpy/sympy)?
Given the input y[a,b,c]= x[b,c,a], output [2,0,1]?
I find it easier to explore transpose with an example of shape like (2,3,4), where each axis has a different length.
But sticking with your (3,3,3)
In [23]: x = np.arange(27).reshape(3,3,3)
In [24]: x
Out[24]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
In [25]: x[0,1,2]
Out[25]: 5
Your sample transpose:
In [26]: y = x.transpose(2,0,1)
In [27]: y
Out[27]:
array([[[ 0, 3, 6],
[ 9, 12, 15],
[18, 21, 24]],
[[ 1, 4, 7],
[10, 13, 16],
[19, 22, 25]],
[[ 2, 5, 8],
[11, 14, 17],
[20, 23, 26]]])
We get the same 5 with
In [28]: y[2,0,1]
Out[28]: 5
We could get that (2,0,1) by applying the same transposing values:
In [31]: idx = np.array((0,1,2)) # use an array for ease of indexing
In [32]: idx[[2,0,1]]
Out[32]: array([2, 0, 1])
The way I think about the transpose (2,0,1): we are moving the last axis, 2, to the front, and preserving the order of the other 2.
With differing dimensions, it's easier to visualize the change:
In [33]: z=np.arange(2*3*4).reshape(2,3,4)
In [34]: z
Out[34]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [35]: z.transpose(2,0,1)
Out[35]:
array([[[ 0, 4, 8],
[12, 16, 20]],
[[ 1, 5, 9],
[13, 17, 21]],
[[ 2, 6, 10],
[14, 18, 22]],
[[ 3, 7, 11],
[15, 19, 23]]])
In [36]: _.shape
Out[36]: (4, 2, 3)
np.swapaxes is another compiled function for making these changes. np.rollaxis is another, though it's python code that ends up calling transpose.
I haven't tried to follow all of your reasoning, though I think you want a kind of reverse of the transpose numbers: one where you specify the result order and want to find the axes that produce it.
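One way to get that "reverse" automatically, offered as a sketch and not as something stated in the thread: number the indices of y as a=0, b=1, c=2, write down the order in which they appear in x[b,c,a], and take np.argsort of that list to invert the permutation. The names q and axes are illustrative. np.einsum can also express the relabeling directly from the index letters. A (2,3,4) array is used so that every axis has a different length.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Target: y[a, b, c] = x[b, c, a], with y's indices numbered a=0, b=1, c=2.
# q lists, for each axis of x, which index of y is used there: (b, c, a).
q = [1, 2, 0]

# The transpose axes are the inverse permutation of q.
axes = np.argsort(q).tolist()
y = x.transpose(axes)

# einsum states the same relabeling with the letters themselves.
y_einsum = np.einsum('bca->abc', x)

print(axes, y.shape)
```

For this case argsort returns [2, 0, 1], the same axes found by hand in the question.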

Multi dimensional Indexing with Numpy

I'm using a 3 dimensional array, that is defined like this:
x = np.zeros((dim1, dim2, dim3), dtype=np.float32)
After inserting some data I need to apply a function only if values in specific columns are still zero.
The columns I'm interested in are selected by this array containing the correct indexes
scale_idx = np.array([0,1,3])
therefore what I'm trying to do is to use indexing to select those rows and columns.
At first I tried to do this using a boolean mask for the first 2 dimensions and an array for the third:
x[x[:,:,scale_idx].any(axis=2), scale_idx]
but I get this error:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,) (3,)
If I change the last index to : I get all the rows I'm interested in, but I also get all the possible columns. I was expecting that the last array would act as an indexer, as explained in https://docs.scipy.org/doc/numpy/user/basics.indexing.html.
x[x[:,:,scale_idx].any(axis =2)]
My scale_idx should be interpreted as column indexes but is apparently interpreted as row indexes; since only 2 rows satisfy the condition but I have 3 indexes, I get an IndexError.
I have found a workaround to this using
x[x[:,:,scale_idx].any(axis =2)][:,:,scale_idx]
but it's kind of ugly and, since the first indexing step returns a copy, I can't modify the original array.
Anybody willing to explain to me what I'm doing wrong?
EDIT:
Thanks to @hpaulj I've managed to isolate the cells I need. After that I created a matrix with the same shape as the selected values and assigned it to the masked cells. To my surprise, the new values are not the ones I just set but some seemingly random integers, and I can't figure out where they came from.
Code to reproduce:
scale_idx = np.array([0,3,1])
b = x[:,:,scale_idx].any(axis =2)
I, J = np.nonzero(b)
x[I[:,None], J[:,None], scale_idx] #this selects the correct cells
>>>
array([[ 50, 50, 50],
[100, 100, 100],
[100, 100, 100]])
scaler.transform(x[I[:,None], J[:,None], scale_idx]) #sklearn standard scaler, returns a matrix with the scaled values
>>>
array([[-0.50600345, -0.5445559 , -1.2957878 ],
[-0.50600345, -0.25915199, -1.22266904],
[-0.50600345, -0.25915199, -1.22266904]])
x[I[:,None], J[:,None], scale_idx] = scaler.transform(x[I[:,None], J[:,None], scale_idx]) #assign the new values to the selected cells
x[I[:,None], J[:,None], scale_idx] #check the new values
array([[0, 2, 0],
[0, 6, 2],
[0, 6, 2]])
Why are the new values different from what I'm expecting?
Let's take the 3d boolean mask example from the indexing docs:
In [135]: x = np.arange(30).reshape(2,3,5)
...: b = np.array([[True, True, False], [False, True, True]])
In [136]: x
Out[136]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
In [137]: b
Out[137]:
array([[ True, True, False],
[False, True, True]])
In [138]: x[b]
Out[138]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]])
This is a 2d array. The mask b selects elements from the first 2 dimensions. The False values cause it to skip the [10...] and [15...] rows.
We can slice on the last dimension:
In [139]: x[b,:3]
Out[139]:
array([[ 0, 1, 2],
[ 5, 6, 7],
[20, 21, 22],
[25, 26, 27]])
but a list index will produce an error (unless its length is 4, matching the number of True values in b, or 1):
In [140]: x[b,[0,1,2]]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-140-7f1dbec100f2> in <module>
----> 1 x[b,[0,1,2]]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (4,) (3,)
The reason is that the boolean mask is effectively translated into indexing with the arrays returned by np.nonzero (the same arrays that single-argument np.where returns):
In [141]: np.nonzero(b)
Out[141]: (array([0, 0, 1, 1]), array([0, 1, 1, 2]))
nonzero found 4 nonzero elements. The x[b] indexing is then:
In [143]: x[[0,0,1,1],[0,1,1,2],:]
Out[143]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]])
https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#boolean-array-indexing
The shape mismatch then becomes more obvious:
In [144]: x[[0,0,1,1],[0,1,1,2],[1,2,3]]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-144-1efd76049cb0> in <module>
----> 1 x[[0,0,1,1],[0,1,1,2],[1,2,3]]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (4,) (3,)
If the lists match in size, the indexing runs, but produces a 'diagonal', not a block:
In [145]: x[[0,0,1,1],[0,1,1,2],[1,2,3,4]]
Out[145]: array([ 1, 7, 23, 29])
As you found, the two-stage indexing works, but not for setting values, because the first step returns a copy:
In [146]: x[[0,0,1,1],[0,1,1,2]][:,[1,2,3]]
Out[146]:
array([[ 1, 2, 3],
[ 6, 7, 8],
[21, 22, 23],
[26, 27, 28]])
We can get the block by 'transposing' the last index list:
In [147]: x[[0,0,1,1],[0,1,1,2],[[1],[2],[3]]]
Out[147]:
array([[ 1, 6, 21, 26],
[ 2, 7, 22, 27],
[ 3, 8, 23, 28]])
OK, this is the transpose of what we want. We could apply transpose to it, or we could turn the I and J index arrays from b into column vectors first:
In [148]: I,J=np.nonzero(b)
In [149]: x[I[:,None], J[:,None], [1,2,3]]
Out[149]:
array([[ 1, 2, 3],
[ 6, 7, 8],
[21, 22, 23],
[26, 27, 28]])
And this works for setting
In [150]: x[I[:,None], J[:,None], [1,2,3]]=0
In [151]: x
Out[151]:
array([[[ 0, 0, 0, 0, 4],
[ 5, 0, 0, 0, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 0, 0, 0, 24],
[25, 0, 0, 0, 29]]])
It's a long answer. I had a general idea of what was happening, but needed to work out the details. Plus, you need to understand what's going on.
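Put together for an array like the one in the question, the steps above might be sketched as follows. The shape (2, 3, 5), the two nonzero entries and the fill value -1.0 are all made up for illustration. The sketch contrasts the chained indexing, which assigns into a temporary copy, with the single indexing expression, which modifies x in place.

```python
import numpy as np

# Hypothetical small array: zeros with two nonzero entries inserted.
x = np.zeros((2, 3, 5), dtype=np.float32)
x[0, 1, 0] = 50.0
x[1, 2, 3] = 100.0

scale_idx = np.array([0, 1, 3])

# (2, 3) mask: True where any of the selected columns is nonzero.
mask = x[:, :, scale_idx].any(axis=2)
I, J = np.nonzero(mask)

# Chained indexing assigns into a temporary copy, so x is unchanged.
x[mask][:, scale_idx] = -1.0
after_chained = x.copy()

# A single indexing expression with column-vector I and J writes into x.
x[I[:, None], J[:, None], scale_idx] = -1.0

print(after_chained[0, 1], x[0, 1])
```

Only the columns listed in scale_idx of the selected rows change; column 2 and column 4 keep their values.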

splitting ND arrays using numpy

I have a 3D numpy array and I want to partition it by the first 2 dimensions (and select all elements in the last one). Is there a simple way I can do that using numpy?
Example: given array
a = array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I would like to split it N ways by the first two axes (while retaining all elements in the last one), e.g.,:
a[0:2, 0:2, :], a[2:3, 2:3, :]
But it doesn't need to be evenly split. Seems like numpy.array_split will split on all axes?
In [179]: np.array_split(a,2,0)
Out[179]:
[array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]]),
array([[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])]
is the same as [a[:2,:,:], a[2:,:,:]]
You could loop on those 2 arrays (a1 below is the list returned by np.array_split(a,2,0) above) and apply the split on the next axis.
In [182]: a2=[np.array_split(aa,2,1) for aa in a1]
In [183]: a2 # edited for clarity
Out[183]:
[[array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 9, 10, 11],
[12, 13, 14]]]), # (2,2,3)
array([[[ 6, 7, 8]],
[[15, 16, 17]]])], # (2,1,3)
[array([[[18, 19, 20],
[21, 22, 23]]]), # (1,2,3)
array([[[24, 25, 26]]])]] # (1,1,3)
In [184]: a2[0][0].shape
Out[184]: (2, 2, 3)
In [185]: a2[0][1].shape
Out[185]: (2, 1, 3)
In [187]: a2[1][0].shape
Out[187]: (1, 2, 3)
In [188]: a2[1][1].shape
Out[188]: (1, 1, 3)
With the potential of splitting into uneven arrays in each dimension, it is hard to do this in a fully vectorized form. And even if the splits were even, it's tricky to do this sort of grid splitting because the values are not contiguous. In this example there's a gap between 5 and 9 in the first subarray.
A quick list comprehension will do the trick
[np.array_split(arr, 2, axis=1)
for arr in np.array_split(a, 2, axis=0)]
This will result in a list of lists, the items of which contain the arrays you're looking for.
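Since the question says the split need not be even, here is a small sketch that wraps the list comprehension in a helper and passes explicit split positions instead of a section count. split_grid is a hypothetical name, not a numpy function. np.array_split accepts either an integer or a list of indices as its second argument.

```python
import numpy as np

def split_grid(arr, row_bounds, col_bounds):
    """Split arr along axis 0 and then axis 1, keeping the last axis whole.

    row_bounds and col_bounds are passed straight to np.array_split, so
    each may be an int (number of sections) or a list of split indices.
    """
    return [np.array_split(block, col_bounds, axis=1)
            for block in np.array_split(arr, row_bounds, axis=0)]

a = np.arange(27).reshape(3, 3, 3)

# Split before index 2 on both axes.
grid = split_grid(a, [2], [2])

# Uneven example: 1 + 2 rows, and columns cut at 1 and 2.
uneven = split_grid(a, [1], [1, 2])

print(grid[0][0].shape, grid[1][1].shape)
```

Here grid[0][0] corresponds to a[0:2, 0:2, :] and grid[1][1] to a[2:3, 2:3, :] from the question.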

How to select values in a n-dimensional array

I have been trying to perform a simple operation, but I can't seem to find a simple way to do it using Numpy functions without creating unnecessary copies of the array.
Suppose we have the following 3-dimensional array :
In [171]: x = np.arange(24).reshape((4, 3, 2))
In [172]: x
Out[172]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]])
And the following array :
In [173]: y = np.array([0, 1, 1, 0])
I want to select in x, for each row, the value of the last dimension whose index is the corresponding element in y. In other words, I want :
array([[ 0, 2, 4],
[ 7, 9, 11],
[13, 15, 17],
[18, 20, 22]])
The only solution that I have for now is using a for loop over the first dimension of x and y, as follows :
z = np.zeros((4, 3), dtype=int)
for i, row in enumerate(x):
    z[i, :] = row[:, y[i]]
Is there a way of avoiding a for loop here, using numpy functions or fancy indexing?
Thanks!
The tricky aspect is that you don't want all of the 0th-dimension for each slice, you want the slices to correspond to each element in the 0th-dimension. So you could do something like:
>>> x[np.arange(x.shape[0]), :, y]
array([[ 0, 2, 4],
[ 7, 9, 11],
[13, 15, 17],
[18, 20, 22]])
Fancy indexing:
x[np.arange(y.size),:,y]
gives:
array([[ 0, 2, 4],
[ 7, 9, 11],
[13, 15, 17],
[18, 20, 22]])
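Another option, not mentioned in the answers above, is np.take_along_axis. The sketch below checks it against the fancy-indexing result. The index array needs the same number of dimensions as x, hence y[:, None, None]. Like any advanced indexing, both versions return a new array, not a view.

```python
import numpy as np

x = np.arange(24).reshape((4, 3, 2))
y = np.array([0, 1, 1, 0])

# Fancy indexing from the answers: pair each row i with column y[i].
z_fancy = x[np.arange(y.size), :, y]

# take_along_axis: indices of shape (4, 1, 1) broadcast to (4, 3, 1).
z_take = np.take_along_axis(x, y[:, None, None], axis=2)[..., 0]

print(z_take)
```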
