Subsetting a 2D numpy array - python

I have looked into the documentation and also other questions here, but it seems I have not got the hang of subsetting in numpy arrays yet.
I have a numpy array, and for the sake of argument, let it be defined as follows:
import numpy as np
a = np.arange(100)
a.shape = (10,10)
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
# [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
# [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
# [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
# [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
# [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
# [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
# [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
Now I want to choose the rows and columns of a specified by the vectors n1 and n2. As an example:
n1 = range(5)
n2 = range(5)
But when I use:
b = a[n1,n2]
# array([ 0, 11, 22, 33, 44])
Then only the first five diagonal elements are chosen, not the whole 5x5 block. The solution I have found is to do it like this:
b = a[n1,:]
b = b[:,n2]
# array([[ 0, 1, 2, 3, 4],
# [10, 11, 12, 13, 14],
# [20, 21, 22, 23, 24],
# [30, 31, 32, 33, 34],
# [40, 41, 42, 43, 44]])
But I am sure there should be a way to do this simple task in just one command.

You've gotten a handful of nice examples of how to do what you want. However, it's also useful to understand what's happening and why things work the way they do. There are a few simple rules that will help you in the future.
There's a big difference between "fancy" indexing (i.e. using a list/sequence) and "normal" indexing (using a slice). The underlying reason has to do with whether or not the array can be "regularly strided", and therefore whether or not a copy needs to be made. Arbitrary sequences therefore have to be treated differently, if we want to be able to create "views" without making copies.
In your case:
import numpy as np
a = np.arange(100).reshape(10,10)
n1, n2 = np.arange(5), np.arange(5)
# Not what you want
b = a[n1, n2] # array([ 0, 11, 22, 33, 44])
# What you want, but only for simple sequences
# Note that no copy of *a* is made!! This is a view.
b = a[:5, :5]
# What you want, but probably confusing at first. (Also, makes a copy.)
# np.meshgrid and np.ix_ are basically equivalent to this.
b = a[n1[:,None], n2[None,:]]
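The view-versus-copy distinction in the comments above can be checked directly. This is only a small sketch using np.shares_memory; the names view_block and copy_block are made up for the illustration.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
n1, n2 = np.arange(5), np.arange(5)

view_block = a[:5, :5]                     # slice: a view onto a's memory
copy_block = a[n1[:, None], n2[None, :]]   # fancy indexing: a new array

print(np.shares_memory(a, view_block))     # True
print(np.shares_memory(a, copy_block))     # False

# Writing through the view changes a; writing to the copy does not.
view_block[0, 0] = -1
copy_block[1, 1] = -2
print(a[0, 0], a[1, 1])                    # -1 11
```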
Fancy indexing with 1D sequences is basically equivalent to zipping them together and indexing with the result.
print("Fancy Indexing:")
print(a[n1, n2])
print("Manual indexing:")
for i, j in zip(n1, n2):
    print(a[i, j])
However, if the sequences you're indexing with are two-dimensional (a row vector and a column vector, in this case), the indexing is treated differently. Instead of "zipping the two together", numpy broadcasts the index arrays against each other, so every row index is paired with every column index.
In other words, a[[[1, 2, 3]], [[1],[2],[3]]] is treated completely differently than a[[1, 2, 3], [1, 2, 3]], because the sequences/arrays that you're passing in are two-dimensional.
In [4]: a[[[1, 2, 3]], [[1],[2],[3]]]
array([[11, 21, 31],
[12, 22, 32],
[13, 23, 33]])
In [5]: a[[1, 2, 3], [1, 2, 3]]
Out[5]: array([11, 22, 33])
To be a bit more precise,
a[[[1, 2, 3]], [[1],[2],[3]]]
is treated exactly like:
i = [[1, 2, 3],
     [1, 2, 3],
     [1, 2, 3]]
j = [[1, 1, 1],
     [2, 2, 2],
     [3, 3, 3]]
a[i, j]
In other words, whether the input is a row/column vector is a shorthand for how the indices should repeat in the indexing.
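To see the repetition spelled out, here is a minimal sketch that uses np.broadcast_arrays to expand a column vector and a row vector of indices to their common shape. The names rows and cols are just for illustration.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
rows = np.array([[1], [2], [3]])   # column vector, shape (3, 1)
cols = np.array([[1, 2, 3]])       # row vector, shape (1, 3)

# Expand both index arrays to the common (3, 3) shape, as broadcasting does.
i, j = np.broadcast_arrays(rows, cols)
print(i)
print(j)

# Indexing with the compact and the expanded forms gives the same block.
block = a[rows, cols]
assert np.array_equal(block, a[i, j])
assert np.array_equal(block, a[1:4, 1:4])
```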
np.meshgrid and np.ix_ are just convenient ways to turn your 1D sequences into their 2D versions for indexing:
In [6]: np.ix_([1, 2, 3], [1, 2, 3])
Out[6]:
(array([[1],
        [2],
        [3]]), array([[1, 2, 3]]))
Similarly (the sparse argument would make it identical to ix_ above):
In [7]: np.meshgrid([1, 2, 3], [1, 2, 3], indexing='ij')
[array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]]),
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])]

Another quick way to build the desired index is to use the np.ix_ function:
>>> a[np.ix_(n1, n2)]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])
This provides a convenient way to construct an open mesh from sequences of indices.
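np.ix_ is not limited to contiguous ranges. A short sketch with arbitrary row and column lists (the values of rows and cols are made up for the example), also showing that the same index works for assignment:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
rows = [0, 3, 7]   # hypothetical, non-contiguous row selection
cols = [1, 2]      # hypothetical column selection

block = a[np.ix_(rows, cols)]
print(block)
# [[ 1  2]
#  [31 32]
#  [71 72]]

# The same index also works on the left-hand side of an assignment.
a[np.ix_(rows, cols)] = 0
print(a[3, 1], a[3, 3])   # 0 33
```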

You could use np.meshgrid to give the n1, n2 arrays the proper shape to perform the desired indexing:
In [104]: a[tuple(np.meshgrid(n1, n2, sparse=True, indexing='ij'))]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])
Or, without meshgrid:
In [117]: a[np.array(n1)[:,np.newaxis], np.array(n2)[np.newaxis,:]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])
There is a similar example with an explanation of how this integer array indexing works in the docs.
See also the Cookbook recipe Picking out rows and columns.

A nice trick I've managed to pull off (for lazy people only) is filter + transpose + filter.
a = np.arange(100).reshape(10,10)
subsetA = [1,3,5,7]
a[subsetA].T[subsetA]
array([[11, 31, 51, 71],
[13, 33, 53, 73],
[15, 35, 55, 75],
[17, 37, 57, 77]])
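One thing to watch with filter + transpose + filter: the result comes out transposed relative to the original row/column orientation. The sketch below assumes the trick means a[subsetA].T[subsetA], which reproduces the output shown above, and compares it with np.ix_.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
subsetA = [1, 3, 5, 7]

trick = a[subsetA].T[subsetA]          # filter rows, transpose, filter again
block = a[np.ix_(subsetA, subsetA)]    # rows and columns in original orientation

# The trick returns the transpose of the block.
assert np.array_equal(trick, block.T)
# Filtering rows and then columns keeps the original orientation.
assert np.array_equal(a[subsetA][:, subsetA], block)
```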

It seems that a use case for your particular question would deal with image manipulation. To the extent that you are using your example to edit numpy arrays arising from images, you can use the Python Imaging Library (PIL).
# Import Pillow:
from PIL import Image
# Load the original image:
img = Image.open("flowers.jpg")
# Crop the image
img2 = img.crop((0, 0, 5, 5))
The img2 object is a PIL Image holding the cropped region; np.asarray(img2) converts it to a numpy array.
You can read more about image manipulation in the documentation of the Pillow package (a user-friendly fork of PIL).


Array based indexing of an ndarray

I am not understanding numpy.take though it seems like it is the function I want. I have an ndarray and I want to use another ndarray to index into the first.
import numpy as np
# Create a matrix
A = np.arange(75).reshape((5,5,3))
# Create the index array
idx = np.array([[1, 0, 0, 1, 1],
[1, 1, 0, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 0, 0, 0],
[1, 1, 1, 1, 0]])
Given the above, I want to index A by the values in idx. I thought take does this, but it doesn't output what I expected.
# Index the 3rd dimension of the A matrix by the idx array.
Asub = np.take(A, idx)
print(f'Value in A at 1,1,1 is {A[1,1,1]}')
print(f'Desired index from idx {idx[1,1]}')
print(f'Value in Asub at [1,1,1] {Asub[1,1]} <- thought this would be 19')
I was expecting Asub[1,1] to hold the value of A selected by idx at that location, but this is the output:
Value in A at 1,1,1 is 19
Desired index from idx 1
Value in Asub at [1,1,1] 1 <- thought this would be 19
One possibility is to create row and column indices that broadcast against idx, which indexes the third dimension, i.e. a (5,1) and a (5,) array that pair with the (5,5) idx:
In [132]: A[np.arange(5)[:,None],np.arange(5), idx]
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
This ends up picking values from A[:,:,0] and A[:,:,1]. The values of idx are used as integer indices along the last axis, where the valid values are 0, 1 and 2 (the axis has length 3). They aren't boolean selectors.
Out[132][1,1] is 19, same as A[1,1,1]; Out[132][1,2] is the same as A[1,2,0].
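For contrast, a short sketch of why the np.take call in the question returns only 0s and 1s: without an axis argument, np.take indexes the flattened array, and passing axis=2 selects along the last axis for every position rather than pairing idx[i,j] with A[i,j].

```python
import numpy as np

A = np.arange(75).reshape((5, 5, 3))
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

# With no axis, take() indexes the flattened array,
# so every result is A.flat[0] or A.flat[1].
flat_take = np.take(A, idx)
assert np.array_equal(flat_take, A.ravel()[idx])
assert np.array_equal(flat_take, idx)   # because A.ravel()[:2] is [0, 1]

# With axis=2, every (i, j) position receives all 25 choices from idx.
print(np.take(A, idx, axis=2).shape)    # (5, 5, 5, 5)
```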
take_along_axis gets the same values, but with an added dimension:
In [142]: np.take_along_axis(A, idx[:,:,None], 2).shape
Out[142]: (5, 5, 1)
In [143]: np.take_along_axis(A, idx[:,:,None], 2)[:,:,0]
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
The iterative equivalent might be easier to understand:
In [145]: np.array([[A[i,j,idx[i,j]] for j in range(5)] for i in range(5)])
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
If you have trouble expressing an action in "vectorized" array ways, go ahead and write an iterative version. It will avoid a lot of ambiguity and misunderstanding.
Another way to get the same values, treating the idx values as True/False booleans is:
In [146]: np.where(idx, A[:,:,1], A[:,:,0])
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
IIUC, you can get the resulting array by broadcasting idx to the shape of A, multiplying, and then taking index 1 along the last axis:
Asub = (A * idx[:, :, None])[:, :, 1] # --> Asub[1, 1] = 19
# [[ 1 0 0 10 13]
# [16 19 0 25 28]
# [31 0 37 0 43]
# [46 49 0 0 0]
# [61 64 67 70 0]]
I think it would be the fastest way (or one of the best), particularly for large arrays.
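A quick check, sketched under the assumption that the desired result is A[i, j, idx[i, j]]: the masked product agrees with that result only where idx is 1, and is 0 elsewhere instead of A[i, j, 0]. The names wanted, masked and selected are made up for the comparison.

```python
import numpy as np

A = np.arange(75).reshape((5, 5, 3))
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

wanted = np.take_along_axis(A, idx[:, :, None], 2)[:, :, 0]
masked = (A * idx[:, :, None])[:, :, 1]

selected = idx == 1
assert np.array_equal(masked[selected], wanted[selected])
assert np.all(masked[~selected] == 0)

print(wanted[0])   # [ 1  3  6 10 13]
print(masked[0])   # [ 1  0  0 10 13]
```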

Error in comparing of two arrays: DeprecationWarning: elementwise comparison failed; this will raise an error in the future

I have an array a with the shape of (1000000,32) and array b with the shape of (10000,32)
I want to find the indices of the rows of a that also occur as rows of b.
I have written the following code:
I = np.argwhere((a == b[:, None]).all(axis=2))[:, 1]
When I test it on other cases it works very well, but for my current arrays it gives the following error:
...\Anaconda3\lib\site-packages\ DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
AttributeError: 'bool' object has no attribute 'all'
Any idea what is the source of the error? thanks
result = (a[:, np.newaxis] == b).all(-1).any(-1)
a[:, np.newaxis] == b - element-wise comparison. The first and second indices are the row indices in a and b; the third index is the column index in both rows.
….all(-1) - does a[i] have its "counterpart" in b[j] (all elements of both rows are equal)?
….any(-1) - does a[i] have its "counterpart" in any row of b?
To check the results of each step, use 2 arrays with e.g. up to 10 rows and 2 columns.
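With the shapes from the question, the full comparison needs a boolean temporary of 1000000 x 10000 x 32 elements. My assumption is that this allocation is what fails and makes the comparison fall back to a plain bool. Below is a minimal sketch that compares a in chunks instead; matching_row_indices and chunk_size are hypothetical names, not numpy API.

```python
import numpy as np

def matching_row_indices(a, b, chunk_size=1000):
    """Indices of rows of `a` that equal at least one row of `b`.

    Hypothetical helper: compares `chunk_size` rows of `a` at a time so the
    boolean temporary has shape (chunk_size, len(b), n_cols) rather than
    (len(a), len(b), n_cols).
    """
    hits = []
    for start in range(0, len(a), chunk_size):
        block = a[start:start + chunk_size]
        mask = (block[:, np.newaxis] == b).all(-1).any(-1)
        hits.append(np.flatnonzero(mask) + start)
    if not hits:
        return np.empty(0, dtype=np.intp)
    return np.concatenate(hits)

a = np.arange(60).reshape(10, 6)
b = a[[7, 2]]
print(matching_row_indices(a, b, chunk_size=4))   # [2 7]
```

A smaller chunk_size lowers peak memory at the cost of more Python-level iterations.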
>>> import numpy as np
>>> a=np.arange(60).reshape(10,6)
>>> a
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59]])
>>> b=np.arange(24).reshape(4,6)
>>> b
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
>>> np.arange(a.shape[0])[np.isin(a,b).all(axis=1)]
array([0, 1, 2, 3])
np.isin() is described in the numpy docs; it tests whether each element of a is in b. The questioner is already familiar with all() and the use of the axis=1 argument. So np.isin(a,b).all(axis=1) produces a boolean array that selects from among the indices of a, which are represented by np.arange(a.shape[0]).
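One caveat worth checking on a small case: np.isin tests each element on its own, so a row of a whose elements all occur somewhere in b, but not together in one row, is also selected. A sketch with made-up data comparing it to the row-wise comparison:

```python
import numpy as np

a = np.array([[0, 1],
              [1, 0],
              [2, 3]])
b = np.array([[0, 1]])

# Element-wise membership, then "all elements of the row were found".
elementwise = np.isin(a, b).all(axis=1)
# Row-wise equality against every row of b.
rowwise = (a[:, np.newaxis] == b).all(-1).any(-1)

print(elementwise)   # [ True  True False]
print(rowwise)       # [ True False False]
```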

Reshaping rank > 2 numpy arrays in Python

I am working with numpy arrays as rank > 2 tensors in Python and am trying to reshape such a tensor into a matrix, i.e. a rank-2 array. The standard ndarray.reshape() function doesn't really work for this because I need to group the indices of my tensor in a particular way. What I mean is this: say I start with a rank 3 tensor, T_ijk. I am trying to find a function that will output the rank 2 tensor T_(j)(ik), for instance, i.e. for this example the desired input/output would be
[Input:] T = np.array([[[1, 2],
                        [3, 4]],
                       [[5, 6],
                        [7, 8]]])
[Output:] array([[1, 2, 5, 6],
[3, 4, 7, 8]])
Also, a friend suggested to me that tensorflow might have functions like this, but I've never used it. Does anyone have any insight here?
Try this -
k = 1
m = 2
i = 5
j = 5
l = 2
#dummy T_ijklm
T = np.array(range(100)).reshape(k,m,i,j,l)
T_new = T.reshape(k*m,i*j*l)
print('Original T:',T.shape)
print('New T:',T_new.shape)
#(km)(ijl) = 2*50
Original T: (1, 2, 5, 5, 2)
New T: (2, 50)
The new tensor is now rank 2:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
98, 99]])
In [216]: arr = np.arange(1,9).reshape(2,2,2)
In [217]: arr
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
reshape keeps elements in the original [1,2,3,4,5...] order
In [218]: arr.reshape(2,4)
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
Figuring out the correct transpose order can be tricky. Sometimes I just try several things. Here I note that you want to preserve the order on the last dimension, so all we have to do is swap the first 2 axes:
In [219]: arr.transpose(1,0,2)
array([[[1, 2],
[5, 6]],
[[3, 4],
[7, 8]]])
now the reshape does what we want:
In [220]: arr.transpose(1,0,2).reshape(2,4)
array([[1, 2, 5, 6],
[3, 4, 7, 8]])
This sequence is, as best I know, the best "built-in" approach.
You comment:
if I wanted to transform T_ijklmno to T_(ilo)(jmnk) having to figure out which axes to switch and how to reshape will probably get out of hand... that's why I'm looking for an in-built solution
The T_.... notation reminds me that we could use einsum to do the transpose:
In [221]: np.einsum('ijk->jik',arr)
array([[[1, 2],
[5, 6]],
[[3, 4],
[7, 8]]])
So T_ijklmno to T_(ilo)(jmnk) could become
np.einsum('ijklmno->ilojmnk',T).reshape(I*L*O, J*M*N*K)
(I wrote these by just eyeballing your T expression)
There are so many ways you could transpose and reshape an array with 7 dimensions, that there's little point in coming up with anything more general than the existing methods - transpose, swapaxes, einsum. Simply identifying the dimensions as you do with 'ijk...' is the toughest part of the problem.
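If labelling the axes is the hard part, a small wrapper can do the bookkeeping once the labels are written down. This is only a sketch; group_axes and its arguments are hypothetical names, and it simply combines transpose and reshape as above.

```python
import numpy as np

def group_axes(T, labels, row_labels, col_labels):
    """Reshape T into a matrix whose rows and columns are grouped axes.

    Hypothetical helper: `labels` names the axes of T in order (e.g. 'ijk'),
    `row_labels` and `col_labels` list the axes merged into rows and columns.
    """
    order = [labels.index(c) for c in row_labels + col_labels]
    n_rows = int(np.prod([T.shape[labels.index(c)] for c in row_labels]))
    return T.transpose(order).reshape(n_rows, -1)

arr = np.arange(1, 9).reshape(2, 2, 2)
print(group_axes(arr, 'ijk', 'j', 'ik'))
# [[1 2 5 6]
#  [3 4 7 8]]
```

For the 7-dimensional case this would be called as group_axes(T, 'ijklmno', 'ilo', 'jmnk').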

How to access array by a list of point coordinates

I have an array A = np.ones((4,4,4)) and another array B that represents the coordinates of points in A; let's assume that B = [[2,2,2], [3,2,1]].
I tried to access A by array indexing like A[B], but it didn't work.
How can I do this in an elegant way that also works for a higher-dimensional B, e.g. of shape (10, 20, 3)?
You can pass a list of coordinates, but you should transpose the list first, such that the items of the i-th dimension are passed as the i-th element in the indexing. For example, for a 4×4×4 array:
>>> A = np.arange(64).reshape(4,4,4)
>>> A
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]],
[[48, 49, 50, 51],
[52, 53, 54, 55],
[56, 57, 58, 59],
[60, 61, 62, 63]]])
we get for the given coordinates:
>>> A[tuple(np.transpose(B))]
array([42, 57])
and if we calculate these manually, we get:
>>> A[2,2,2]
42
>>> A[3,2,1]
57
A[1,2,3] is short for A[(1,2,3)] (so the indices are passed as a tuple). You can fetch multiple items with A[([2,3], [2,2], [2,1])], but for that you first need to transpose the data.
Since the data is represented as [[2,2,2], [3,2,1]], we thus first need to transpose it to [[2,3], [2,2], [2,1]]. Next we wrap it in a tuple, and can use this to subscript A.
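For a higher-dimensional B such as shape (10, 20, 3), the same idea should carry over if the last axis (the one holding the coordinates) is moved to the front instead of doing a full transpose. A sketch with randomly generated, made-up coordinates:

```python
import numpy as np

A = np.arange(64).reshape(4, 4, 4)
rng = np.random.default_rng(0)
# Made-up coordinates: the last axis holds one (i, j, k) triple per point.
B = rng.integers(0, 4, size=(10, 20, 3))

# Move the coordinate axis to the front, then unpack it into
# one index array per dimension of A.
values = A[tuple(np.moveaxis(B, -1, 0))]
print(values.shape)   # (10, 20)

assert values[4, 7] == A[tuple(B[4, 7])]
```

For the 2D B from the question, np.moveaxis(B, -1, 0) is the same as np.transpose(B).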
