I'm using a 3-dimensional array that is defined like this:
x = np.zeros((dim1, dim2, dim3), dtype=np.float32)
After inserting some data, I need to apply a function only to rows where the values in specific columns are no longer zero.
The columns I'm interested in are selected by this array of indexes:
scale_idx = np.array([0,1,3])
Therefore, what I'm trying to do is use indexing to select those rows and columns.
At first I tried this, using a boolean mask for the first 2 dimensions and an array for the third:
x[x[:,:,scale_idx].any(axis =2), scale_idx]
but I get this error:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,) (3,)
If I change the last index to : I get all the rows I'm interested in, but also all the possible columns. I was expecting the last array to act as a column indexer, as explained in https://docs.scipy.org/doc/numpy/user/basics.indexing.html.
x[x[:,:,scale_idx].any(axis =2)]
My scale_idx should be interpreted as column indexes, but it is actually interpreted as row indexes. Since only 2 rows satisfy the condition but I have 3 indexes, I get an IndexError.
I have found a workaround to this using
x[x[:,:,scale_idx].any(axis =2)][:,:,scale_idx]
but it's kinda ugly and, since the first indexing step returns a copy, I can't use it to modify the original array.
Is anybody willing to explain to me what I'm doing wrong?
EDIT:
Thanks to @hpaulj I've managed to isolate the cells I need. After that I created a matrix with the same shape as the selected values and assigned it to the masked cells. To my surprise, the new values are not the ones I just set but some seemingly random integers, and I can't figure out where they came from.
Code to reproduce:
scale_idx = np.array([0,3,1])
b = x[:,:,scale_idx].any(axis =2)
I, J = np.nonzero(b)
x[I[:,None], J[:,None], scale_idx] #this selects the correct cells
>>>
array([[ 50, 50, 50],
[100, 100, 100],
[100, 100, 100]])
scaler.transform(x[I[:,None], J[:,None], scale_idx]) #sklearn standard scaler, returns a matrix with the scaled values
>>>
array([[-0.50600345, -0.5445559 , -1.2957878 ],
[-0.50600345, -0.25915199, -1.22266904],
[-0.50600345, -0.25915199, -1.22266904]])
x[I[:,None], J[:,None], scale_idx] = scaler.transform(x[I[:,None], J[:,None], scale_idx]) #assign the new values to the selected cells
x[I[:,None], J[:,None], scale_idx] #check the new values
array([[0, 2, 0],
[0, 6, 2],
[0, 6, 2]])
Why are the new values different from what I'm expecting?
Let's take the 3d boolean mask example from the indexing docs:
In [135]: x = np.arange(30).reshape(2,3,5)
...: b = np.array([[True, True, False], [False, True, True]])
In [136]: x
Out[136]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
In [137]: b
Out[137]:
array([[ True, True, False],
[False, True, True]])
In [138]: x[b]
Out[138]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]])
This is a 2d array. The mask b selects elements from the first 2 dimensions. The False values cause it to skip the [10...] and [15...] rows.
We can slice on the last dimension:
In [139]: x[b,:3]
Out[139]:
array([[ 0, 1, 2],
[ 5, 6, 7],
[20, 21, 22],
[25, 26, 27]])
but a list index will produce an error (unless it's length 4):
In [140]: x[b,[0,1,2]]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-140-7f1dbec100f2> in <module>
----> 1 x[b,[0,1,2]]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (4,) (3,)
The reason is that the boolean mask is effectively translated into indexing with the arrays returned by np.nonzero (the same arrays np.where gives):
In [141]: np.nonzero(b)
Out[141]: (array([0, 0, 1, 1]), array([0, 1, 1, 2]))
nonzero found 4 nonzero elements. The x[b] indexing is then:
In [143]: x[[0,0,1,1],[0,1,1,2],:]
Out[143]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]])
https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#boolean-array-indexing
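As a quick check of the equivalence described above, here is a minimal sketch that compares x[b] with indexing by the arrays from np.nonzero(b). The names via_mask and via_index are only illustrative.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
b = np.array([[True, True, False], [False, True, True]])

# A 2-d boolean mask on the first two axes behaves like the pair of
# integer index arrays that np.nonzero returns for it.
I, J = np.nonzero(b)
via_mask = x[b]
via_index = x[I, J, :]

print(I, J)                                 # [0 0 1 1] [0 1 1 2]
print(np.array_equal(via_mask, via_index))  # True
```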
The shape mismatch then becomes more obvious:
In [144]: x[[0,0,1,1],[0,1,1,2],[1,2,3]]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-144-1efd76049cb0> in <module>
----> 1 x[[0,0,1,1],[0,1,1,2],[1,2,3]]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (4,) (3,)
If the lists match in size, the indexing runs, but produces a 'diagonal', not a block:
In [145]: x[[0,0,1,1],[0,1,1,2],[1,2,3,4]]
Out[145]: array([ 1, 7, 23, 29])
As you found, the two-stage indexing works for reading, but not for setting values:
In [146]: x[[0,0,1,1],[0,1,1,2]][:,[1,2,3]]
Out[146]:
array([[ 1, 2, 3],
[ 6, 7, 8],
[21, 22, 23],
[26, 27, 28]])
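To see why the two-stage form cannot be used for assignment, here is a small sketch (the variable before is just a helper for the comparison). The first indexing step uses lists, so it returns a copy, and the assignment lands in that temporary copy.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
before = x.copy()

# Advanced (list) indexing returns a copy, so the second indexing step
# assigns into that temporary copy and x itself is left unchanged.
x[[0, 0, 1, 1], [0, 1, 1, 2]][:, [1, 2, 3]] = 0

print(np.array_equal(x, before))  # True
```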
We can get the block by 'transposing' the last index list:
In [147]: x[[0,0,1,1],[0,1,1,2],[[1],[2],[3]]]
Out[147]:
array([[ 1, 6, 21, 26],
[ 2, 7, 22, 27],
[ 3, 8, 23, 28]])
Ok, this is the transpose of the block we want. We could apply a transpose to the result, or we could instead turn the index arrays derived from b into column vectors first:
In [148]: I,J=np.nonzero(b)
In [149]: x[I[:,None], J[:,None], [1,2,3]]
Out[149]:
array([[ 1, 2, 3],
[ 6, 7, 8],
[21, 22, 23],
[26, 27, 28]])
And this works for setting
In [150]: x[I[:,None], J[:,None], [1,2,3]]=0
In [151]: x
Out[151]:
array([[[ 0, 0, 0, 0, 4],
[ 5, 0, 0, 0, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 0, 0, 0, 24],
[25, 0, 0, 0, 29]]])
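Applied to the setup from the question, the same pattern might look like the sketch below. The sample data and the division by 100 are made up for illustration; the division simply stands in for whatever function (for example a scaler) should be applied to the selected cells.

```python
import numpy as np

x = np.zeros((2, 3, 5), dtype=np.float32)
x[0, 1, [0, 1, 3]] = 50      # hypothetical data
x[1, 2, [0, 1, 3]] = 100     # hypothetical data
scale_idx = np.array([0, 1, 3])

# (2, 3) mask: True where any of the selected columns is nonzero
b = x[:, :, scale_idx].any(axis=2)
I, J = np.nonzero(b)

# (k, 1), (k, 1) and (3,) broadcast to a (k, 3) block of cells
sel = (I[:, None], J[:, None], scale_idx)
x[sel] = x[sel] / 100.0      # stand-in for the real transformation

print(x[0, 1])  # [0.5 0.5 0.  0.5 0. ]
print(x[1, 2])  # [1. 1. 0. 1. 0.]
```

Because sel is used directly on x (no chained indexing), the assignment modifies the original array.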
It's a long answer. I had a general idea of what was happening, but needed to work out the details. Plus, you need to understand what's going on.
Consider a 4-dimensional numpy array (variable a). We have a.shape = (16, 5, 66, 717).
From the second dimension, which contains 5 elements, I want to select the second and the fifth:
b = a[:, [1,4],:,:]
b.shape returns (16, 2, 66, 717), so I guess what I did is correct. Now I want to extract 5 elements from the first dimension (eighth, eleventh, thirteenth, fourteenth, fifteenth) and two elements from the second dimension (second and fifth):
b = a[[7,10,12,13,14], [1,4],:,:]
which gives an error:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (5,) (2,)
I don't understand why this simultaneous indexing across >1 dimensions of numpy array doesn't work. I guess I could sequentially do b = a[:, [1,4],:,:] and c = b[[7,10,12,13,14],:,:,:] to get what I want, but there must be a way to do that in one step. Could you please help?
Make a smaller 3d array:
In [155]: a = np.arange(24).reshape(2,3,4)
In [158]: a
Out[158]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
Selecting two "rows" (on the middle dimension):
In [159]: a[:,[0,2],:]
Out[159]:
array([[[ 0, 1, 2, 3],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[20, 21, 22, 23]]])
If we use 2 lists (or arrays) of the same shape, the indices are paired element by element, so we end up selecting only 2 of the "rows" from [159] (the 'diagonal' of that block):
In [160]: a[[0,1],[0,2],:]
Out[160]:
array([[ 0, 1, 2, 3],
[20, 21, 22, 23]])
If instead the first list/array is a "column vector", we select a (2,2) "block":
In [161]: a[[[0],[1]],[0,2],:]
Out[161]:
array([[[ 0, 1, 2, 3],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[20, 21, 22, 23]]])
ix_ can be used to create the same 2 arrays:
In [162]: np.ix_([0,1],[0,2])
Out[162]:
(array([[0],
[1]]),
array([[0, 2]]))
So using ix_ arrays:
In [163]: I,J = np.ix_([0,1],[0,2])
In [164]: a[I,J,:]
Out[164]:
array([[[ 0, 1, 2, 3],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[20, 21, 22, 23]]])
When I say the index arrays broadcast against each other, I mean it in the same sense as broadcasting during addition or multiplication:
In [165]: I*10 + J
Out[165]:
array([[ 0, 2],
[10, 12]])
reference: https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing
edit
In [166]: np.ix_([7,10,12,13,14], [1,4])
Out[166]:
(array([[ 7],
[10],
[12],
[13],
[14]]),
array([[1, 4]]))
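Put together for the 4-d case, a sketch could look like this. The trailing dimensions are shrunk from (66, 717) to (6, 7) purely to keep the example small, and c is the sequential two-step version used only as a cross-check.

```python
import numpy as np

# Same leading dimensions as in the question, smaller trailing ones
a = np.arange(16 * 5 * 6 * 7).reshape(16, 5, 6, 7)

# (5, 1) and (1, 2) index arrays broadcast to a (5, 2) block
I, J = np.ix_([7, 10, 12, 13, 14], [1, 4])
b = a[I, J, :, :]
print(b.shape)  # (5, 2, 6, 7)

# Cross-check against indexing one dimension at a time
c = a[:, [1, 4], :, :][[7, 10, 12, 13, 14], :, :, :]
print(np.array_equal(b, c))  # True
```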
Regarding your error:
In [167]: np.ix_([7,10,12,13,14], [1,4],:,:)
Input In [167]
np.ix_([7,10,12,13,14], [1,4],:,:)
^
SyntaxError: invalid syntax
ix_ is a function. ':' isn't allowed in a function call. It only works in an indexing, where it's converted to a slice. That's why you get a syntax error.
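If a "take everything" entry is needed outside of square brackets, it has to be written as a slice object. A minimal sketch, reusing the small 3d array from above; the name index is only illustrative:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
I, J = np.ix_([0, 1], [0, 2])

# Inside [], ':' is shorthand for slice(None); outside of indexing the
# slice object has to be written out explicitly.
index = (I, J, slice(None))
print(np.array_equal(a[index], a[I, J, :]))  # True

# np.s_ builds the same objects using indexing syntax
print(np.s_[:, 1:3])  # (slice(None, None, None), slice(1, 3, None))
```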
I have an array of pairwise differences of features:
diff.shape = (200, 200, 2)
of which I am trying to take only the columns corresponding to the 50 closest points. For each row, I have the indices of the closest 50 points stored as:
dist_idx.shape = (200, 50).
How can I index the 50 closest entries (different indices per row) using fancy indexing? I have tried:
diff[dist_idx].shape = (200, 50, 200, 2)
diff[np.arange(200), dist_idx] -> IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (200,) (200,50)
diff[np.arange(200), dist_idx[np.arange(200)]] -> IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (200,) (200,50)
An iterative solution that works is:
X_diff = np.zeros((200, 50, 2))
for i in range(200):
X_diff[i] = diff[i, dist_idx[i]]
Make a smaller example
In [159]: arr = np.arange(24).reshape(3, 4, 2)
In [160]: arr
Out[160]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11],
[12, 13],
[14, 15]],
[[16, 17],
[18, 19],
[20, 21],
[22, 23]]])
In [161]: idx = np.array([[0, 1], [0, 2], [1, 3]])
In [162]: idx.shape
Out[162]: (3, 2)
Your iterative approach:
In [164]: out = np.zeros((3, 2, 2), int)
In [165]: for i in range(3):
...: out[i] = arr[i, idx[i]]
In [166]: out
Out[166]:
array([[[ 0, 1],
[ 2, 3]],
[[ 8, 9],
[12, 13]],
[[18, 19],
[22, 23]]])
Your first try doesn't work because it applies idx to the 1st dimension of the array. In my case I get an error because idx contains the value 3, which is out of bounds for the first dimension (size 3):
In [167]: arr[idx]
Traceback (most recent call last):
Input In [167] in <module>
arr[idx]
IndexError: index 3 is out of bounds for axis 0 with size 3
But if we use a (3,1) array as the first index, it pairs nicely with the (3,2) idx array for the 2nd:
In [168]: arr[np.arange(3)[:, None], idx, :]
Out[168]:
array([[[ 0, 1],
[ 2, 3]],
[[ 8, 9],
[12, 13]],
[[18, 19],
[22, 23]]])
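At the sizes from the question, the same idea might be sketched as follows. The random data and indices are placeholders for the real diff and dist_idx, and np.take_along_axis is shown as one possible alternative spelling, not as the only way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)
diff = rng.normal(size=(200, 200, 2))            # placeholder data
dist_idx = rng.integers(0, 200, size=(200, 50))  # placeholder indices

# (200, 1) row index broadcasts against the (200, 50) column index
rows = np.arange(200)[:, None]
X_diff = diff[rows, dist_idx]
print(X_diff.shape)  # (200, 50, 2)

# The iterative version from the question, for comparison
loop = np.zeros((200, 50, 2))
for i in range(200):
    loop[i] = diff[i, dist_idx[i]]
print(np.array_equal(X_diff, loop))  # True

# Alternative: give the indices a trailing axis and pick along axis 1
alt = np.take_along_axis(diff, dist_idx[:, :, None], axis=1)
print(np.array_equal(alt, loop))  # True
```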
I have an MxNxD array I and also a binary MxN mask M.
Let's say that there are k 1s in M. What I want is to extract a kxD array that contains all the D-length vectors corresponding to the 1s in the mask.
I can get the indices of these vectors in I by calling numpy.nonzero() but I can't find a nice compact way of getting my slice without horrible loops.
Any help will be much appreciated.
I think this is what you want:
In [283]: A = np.arange(24).reshape(2,3,4)
In [284]: M = np.array([[1,0,1],[0,1,0]],dtype=bool)
In [285]: A
Out[285]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [286]: M
Out[286]:
array([[ True, False, True],
[False, True, False]])
In [287]: I,J = np.nonzero(M)
In [288]: I,J
Out[288]: (array([0, 0, 1]), array([0, 2, 1]))
In [289]: A[I,J,:]
Out[289]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11],
[16, 17, 18, 19]])
Since M is masking the initial dimensions, it can be simplified to
A[np.nonzero(M)]
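When the mask has a boolean dtype, it can also be used as the index directly. A small sketch follows; note that a mask stored as integer 0/1 values would be treated as integer indices instead, so the astype(bool) conversion matters in that case.

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
M = np.array([[1, 0, 1], [0, 1, 0]], dtype=bool)

# Boolean mask on the first two axes -> (k, D) array of selected vectors
rows = A[M]
print(rows.shape)                               # (3, 4)
print(np.array_equal(rows, A[np.nonzero(M)]))   # True

# An integer 0/1 mask must be converted to bool before indexing
M_int = np.array([[1, 0, 1], [0, 1, 0]])
print(np.array_equal(A[M_int.astype(bool)], rows))  # True
```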
I am basically trying to achieve this, but need the unfolding to be done in a different fashion: I want all samples of the last dimension to be concatenated. For example, if my unfolding were applied to an RGB image of shape (100,100,3), the new array would become a (100,300) array in which the 3 colour channel images sit side by side.
All my attempts to use a neat built-in numpy function like flatten or concatenate yielded no results. (I mention flatten because the end goal is to apply this unfolding until the array is 1D.)
I can't even think of a way of doing it with slicing in a loop, since the starting number of dimensions isn't constant (array = array[:,...,:,0]+...+array[:,...,:,0])
EDIT
I just came up with this way of achieving what I want, but would still welcome better, more pure, numpy solutions.
shape = numpy.random.randint(100, size=numpy.random.randint(100))
array = numpy.random.uniform(size=shape)
array = array.T
for i in range(0, len(shape)-1, -1):
array = numpy.concatenate(array)
Am I right in deducing that you want to flatten the last 2 dimensions of the array?
In [96]: shape = numpy.random.randint(10, size=numpy.random.randint(10))+1
In [97]: shape
Out[97]: array([2, 7, 2])
In [98]: newshape=tuple(shape[:-2])+(-1,)
In [99]: arr = np.arange(np.prod(shape)).reshape(shape)
In [100]: arr.shape
Out[100]: (2, 7, 2)
In [101]: arr.reshape(newshape).shape
Out[101]: (2, 14)
In [102]: arr.reshape(newshape)
Out[102]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]])
If you don't like the order of terms in the last dimension, you may need to transpose
In [109]: np.swapaxes(arr, -1,-2).reshape(newshape)
Out[109]:
array([[ 0, 2, 4, 6, 8, 10, 12, 1, 3, 5, 7, 9, 11, 13],
[14, 16, 18, 20, 22, 24, 26, 15, 17, 19, 21, 23, 25, 27]])
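For the RGB example from the question, the swapaxes-then-reshape version places the channels side by side. The sketch below uses a small made-up (4, 5, 3) "image" and checks the result against an explicit per-channel concatenation; the names side_by_side and expected are illustrative.

```python
import numpy as np

img = np.arange(4 * 5 * 3).reshape(4, 5, 3)  # stand-in for (100, 100, 3)

# Move the channel axis in front of the width axis, then merge the two
newshape = img.shape[:-2] + (-1,)
side_by_side = np.swapaxes(img, -1, -2).reshape(newshape)
print(side_by_side.shape)  # (4, 15)

# Same thing spelled out: one (4, 5) image per channel, joined on the last axis
expected = np.concatenate([img[..., c] for c in range(img.shape[-1])], axis=-1)
print(np.array_equal(side_by_side, expected))  # True
```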
I can't test it against your code because range(0,len(shape)-1, -1) is an empty range.
I don't think you want
In [112]: np.concatenate(arr,axis=-1).shape
Out[112]: (7, 4)
In [113]: np.concatenate((arr[0,...],arr[1,...]), axis=-1)
Out[113]:
array([[ 0, 1, 14, 15],
[ 2, 3, 16, 17],
[ 4, 5, 18, 19],
[ 6, 7, 20, 21],
[ 8, 9, 22, 23],
[10, 11, 24, 25],
[12, 13, 26, 27]])
That splits the arr on the 1st axis, and then joins it on the last.
I have a 3D numpy array and I want to partition it by the first 2 dimensions (and select all elements in the last one). Is there a simple way I can do that using numpy?
Example: given array
a = array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I would like to split it N ways by the first two axes (while retaining all elements in the last one), e.g.,:
a[0:2, 0:2, :], a[2:3, 2:3, :]
But it doesn't need to be evenly split. Seems like numpy.array_split will split on all axes?
In [179]: np.array_split(a,2,0)
Out[179]:
[array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]]),
array([[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])]
is the same as [a[:2,:,:], a[2:,:,:]]
With that list saved as a1, you could loop over those 2 arrays and apply the split on the next axis.
In [182]: a2=[np.array_split(aa,2,1) for aa in a1]
In [183]: a2 # edited for clarity
Out[183]:
[[array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 9, 10, 11],
[12, 13, 14]]]), # (2,2,3)
array([[[ 6, 7, 8]],
[[15, 16, 17]]])], # (2,1,3)
[array([[[18, 19, 20],
[21, 22, 23]]]), # (1,2,3)
array([[[24, 25, 26]]])]] # (1,1,3)
In [184]: a2[0][0].shape
Out[184]: (2, 2, 3)
In [185]: a2[0][1].shape
Out[185]: (2, 1, 3)
In [187]: a2[1][0].shape
Out[187]: (1, 2, 3)
In [188]: a2[1][1].shape
Out[188]: (1, 1, 3)
Since the splits can be uneven in each dimension, it is hard to do this in a fully vectorized form. And even if the splits were even, this sort of grid splitting is tricky because the values in each block are not contiguous. In this example there's a gap between 5 and 9 in the first subarray.
A quick list comprehension will do the trick
[np.array_split(arr, 2, axis=1)
for arr in np.array_split(a, 2, axis=0)]
This will result in a list of lists, the items of which contain the arrays you're looking for.
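A self-contained version of that comprehension, using the 3x3x3 example array from the question, could look like this. The names blocks, shapes and rebuilt are only illustrative, and the reassembly step is there as a sanity check.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# Split on axis 0 first, then split each piece on axis 1
blocks = [np.array_split(rows, 2, axis=1)
          for rows in np.array_split(a, 2, axis=0)]

shapes = [[blk.shape for blk in row] for row in blocks]
print(shapes)  # [[(2, 2, 3), (2, 1, 3)], [(1, 2, 3), (1, 1, 3)]]

# Joining the pieces back in the reverse order recovers the original array
rebuilt = np.concatenate([np.concatenate(row, axis=1) for row in blocks], axis=0)
print(np.array_equal(rebuilt, a))  # True
```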