I am not understanding numpy.take though it seems like it is the function I want. I have an ndarray and I want to use another ndarray to index into the first.
import numpy as np
# Create a 3-D array
A = np.arange(75).reshape((5,5,3))
# Create the index array
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])
Given the above, I want to index A by the values in idx. I thought take does this, but it doesn't output what I expected.
# Index the 3rd dimension of the A matrix by the idx array.
Asub = np.take(A, idx)
print(f'Value in A at 1,1,1 is {A[1,1,1]}')
print(f'Desired index from idx {idx[1,1]}')
print(f'Value in Asub at [1,1,1] {Asub[1,1]} <- thought this would be 19')
I was expecting Asub to contain the values of A selected along the third dimension by idx, but here's what I got:
Value in A at 1,1,1 is 19
Desired index from idx 1
Value in Asub at [1,1,1] 1 <- thought this would be 19
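For reference, np.take without an axis argument indexes the flattened array, which is exactly what produces these values. A quick check, reusing the A and idx above:
# With no axis, np.take treats A as 1-D: Asub[i,j] == A.flat[idx[i,j]]
print(np.array_equal(np.take(A, idx), A.ravel()[idx]))  # True
print(A.flat[idx[1, 1]])  # 1 -- hence Asub[1,1] is 1, not 19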
One possibility is to create row and column indices that broadcast with idx, i.e. a (5,1) and a (5,) that pair with the (5,5) idx:
In [132]: A[np.arange(5)[:,None],np.arange(5), idx]
Out[132]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
This ends up picking values from A[:,:,0] and A[:,:,1]. It treats the values of idx as integers in the valid range (0, 1, 2) for an axis of length 3; they aren't boolean selectors.
Out[132][1,1] is 19, same as A[1,1,1]; Out[132][1,2] is the same as A[1,2,0].
take_along_axis gets the same values, but with an added dimension:
In [142]: np.take_along_axis(A, idx[:,:,None], 2).shape
Out[142]: (5, 5, 1)
In [143]: np.take_along_axis(A, idx[:,:,None], 2)[:,:,0]
Out[143]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
The iterative equivalent might be easier to understand:
In [145]: np.array([[A[i,j,idx[i,j]] for j in range(5)] for i in range(5)])
Out[145]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
If you have trouble expressing an action in "vectorized" array ways, go ahead and write an iterative version. It will avoid a lot of ambiguity and misunderstanding.
Another way to get the same values, treating the idx values as True/False booleans is:
In [146]: np.where(idx, A[:,:,1], A[:,:,0])
Out[146]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
IIUC, you can get the resulting array by broadcasting the idx array to a shape compatible with A, multiplying, and then indexing to keep channel 1:
Asub = (A * idx[:, :, None])[:, :, 1] # --> Asub[1, 1] = 19
# [[ 1 0 0 10 13]
# [16 19 0 25 28]
# [31 0 37 0 43]
# [46 49 0 0 0]
# [61 64 67 70 0]]
I think this will be the fastest way (or one of the best), particularly for large arrays.
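If speed matters, it's worth measuring rather than guessing. A rough timeit sketch (the array sizes here are arbitrary; results will vary by machine and shape):
import timeit
setup = ("import numpy as np; "
         "A = np.random.rand(1000, 1000, 3); "
         "idx = np.random.randint(0, 2, (1000, 1000))")
for stmt in ["(A * idx[:, :, None])[:, :, 1]",
             "np.where(idx, A[:, :, 1], A[:, :, 0])",
             "np.take_along_axis(A, idx[:, :, None], 2)"]:
    print(stmt, '->', timeit.timeit(stmt, setup=setup, number=10))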
I have an array a with the shape of (1000000,32) and array b with the shape of (10000,32)
I want to find the indices of the rows of a that also appear as rows of b.
I have written the following code:
I = np.argwhere((a == b[:, None]).all(axis=2))[:, 1]
when I test it in other cases it works very well. But for my current arrays it gives the following error:
...\Anaconda3\lib\site-packages\ipykernel_launcher.py:111: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
AttributeError: 'bool' object has no attribute 'all'
Any idea what the source of the error is? Thanks.
Run:
result = (a[:, np.newaxis] == b).all(-1).any(-1)
Steps:
a[:, np.newaxis] == b - element-by-element comparison. The first and second indices are the row indices into a and b; the third index is the column index within both rows.
….all(-1) - does a[i] have a "counterpart" in b[j] (all elements of both rows equal)?
….any(-1) - does a[i] have a "counterpart" in any row of b?
To check the results of each step, use two small arrays, e.g. with up to 10 rows and 2 columns, as in the check below.
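A small worked check along those lines (the toy arrays here are made up for illustration):
import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
b = np.array([[3, 4], [9, 9]])
result = (a[:, np.newaxis] == b).all(-1).any(-1)
print(result)               # [False  True False False]
print(np.where(result)[0])  # [1] -- the indices the question asks for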
np.arange(a.shape[0])[np.isin(a,b).all(axis=1)]
>>> import numpy as np
>>> a=np.arange(60).reshape(10,6)
>>> a
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59]])
>>> b=np.arange(24).reshape(4,6)
>>> b
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
>>> np.arange(a.shape[0])[np.isin(a,b).all(axis=1)]
array([0, 1, 2, 3])
np.isin() is documented in the numpy reference and tests whether each element of a is in b. The questioner is already familiar with all() and the use of the axis=1 argument, so np.isin(a,b).all(axis=1) produces a boolean array that selects from among the indices of a, which are represented by np.arange(a.shape[0]).
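One caveat: np.isin tests elementwise membership, not row membership, so this can report a row of a as present even when its elements are merely scattered across different rows of b. A minimal example of the false positive:
a = np.array([[0, 7]])
b = np.array([[0, 1], [7, 8]])
print(np.isin(a, b).all(axis=1))  # [ True], although [0, 7] is not a row of b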
You can use numpy.intersect1d(a1, a2), and the docs show how to intersect multiple arrays with functools.reduce:
from functools import reduce
reduce(np.intersect1d, ([1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]))
What I want to do is to find the intersection between a 1D array and every row in the corresponding 2D array.
Or better yet just the COUNT of the overlapping elements in every row.
I know I can do that with intersect1d() and a loop, but it will be too slow.
How can we count the overlapping elements in every row, the numpy way?
Ex:
In [59]: a2 = np.random.choice(np.arange(0,100),(10,5), replace=False)
In [60]: a2
Out[60]:
array([[50,  5, 25, 40, 19],   # desired count: 1
       [43, 37, 21, 55, 11],   # desired count: 0
       [16, 49,  6, 86, 96],   # desired count: 0
       [80, 66, 87, 51, 64],   # desired count: 0
       [42,  7, 20, 24, 74],   # desired count: 1
       [92, 63, 75, 54, 90],   # desired count: 2
       [ 9, 91, 88, 85, 22],   # desired count: 0
       [ 4, 65, 97, 93, 53],   # desired count: 0
       [18,  0, 57, 71, 76],   # desired count: 0
       [94,  1, 77, 89, 45]])  # desired count: 0
In [61]: a1 = np.random.choice(np.arange(0,100),5, replace=False)
In [63]: a1
Out[63]: array([63, 54, 20, 60, 25])
To simply get the count of common elements per row, we can get a mask of matches with np.isin and then just the count per row -
np.isin(arr2D,arr1D).sum(axis=1)
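Applied to the a1/a2 sample above, this reproduces the per-row counts annotated next to a2:
print(np.isin(a2, a1).sum(axis=1))  # [1 0 0 0 1 2 0 0 0 0]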
If you want to count each unique element only once in case of duplicate occurrences per row, and if the input elements are positive numbers, we need a few more steps -
# https://stackoverflow.com/a/46256361/ #Divakar
def bincount2D_vectorized(a):
    N = a.max() + 1
    a_offs = a + np.arange(a.shape[0])[:, None] * N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1, N)

count = (bincount2D_vectorized(np.isin(arr2D, arr1D) * arr2D)[:, 1:] != 0).sum(1)
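A toy example showing the difference (arrays invented here for illustration; the [:, 1:] slice drops zeros, which is why this path assumes positive integers):
arr1D = np.array([2, 5])
arr2D = np.array([[2, 2, 3],
                  [5, 1, 2],
                  [4, 4, 4]])
print(np.isin(arr2D, arr1D).sum(axis=1))  # [2 2 0] -- duplicates counted twice
count = (bincount2D_vectorized(np.isin(arr2D, arr1D) * arr2D)[:, 1:] != 0).sum(1)
print(count)                              # [1 2 0] -- each unique match counted once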
Given a 2D numpy image array with shape (height, width, 3), and BGR tuples as elements, I want to multiply each element by a kernel to extract the B/G/R channels individually. The blue kernel, for example, would be (1, 0, 0). Something like this:
# extract a color channel
def extract_color_channel(image, kernel):
    channel = np.copy(image)
    height, width = image.shape[:2]
    for y in range(0, height):
        for x in range(0, width):
            channel[y, x] = image[y, x] * kernel
    return channel

# extract the blue channel
def extract_blue(image):
    return extract_color_channel(image, (1, 0, 0))
What is the most efficient "numpy way" to do this?
With a sample array:
In [220]: arr = np.arange(5*5*3).reshape(5,5,3)
Basic indexing is the most efficient way (this will be a view)
In [221]: arr[:,:,0]
Out[221]:
array([[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57],
[60, 63, 66, 69, 72]])
The [1,0,0] list is not what you want. But you could cast it as a bool array.
In [222]: kernel = np.array([1,0,0],dtype=bool)
In [223]: kernel
Out[223]: array([ True, False, False], dtype=bool)
In [224]: arr[:,:,kernel].shape
Out[224]: (5, 5, 1)
In [225]: arr[:,:,kernel].squeeze()
Out[225]:
array([[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57],
[60, 63, 66, 69, 72]])
Notice that the shape with the boolean is still 3d. If you don't want that, then you'll need to reshape or squeeze that last dimension out. This indexing is slower since it makes a copy.
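You can confirm the view-versus-copy distinction directly with np.shares_memory (reusing arr and kernel from above):
print(np.shares_memory(arr, arr[:, :, 0]))       # True: basic indexing returns a view
print(np.shares_memory(arr, arr[:, :, kernel]))  # False: boolean indexing copies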
This boolean indexing is the equivalent of
In [226]: arr[:,:,[0]].shape
Out[226]: (5, 5, 1)
where [0] is the location of the 'true' value(s) in kernel.
You could also use a dot (matrix product):
In [228]: np.dot(arr,[1,0,0])
Out[228]:
array([[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57],
[60, 63, 66, 69, 72]])
It will be slower than indexing.
Element multiplication:
In [232]: arr*np.array([1,0,0])
Out[232]:
array([[[ 0, 0, 0],
[ 3, 0, 0],
[ 6, 0, 0],
[ 9, 0, 0],
[12, 0, 0]],
[[15, 0, 0],
[18, 0, 0],
....
[66, 0, 0],
[69, 0, 0],
[72, 0, 0]]])
In this multiplication the [1,0,0] behaves as though it were a (1,1,3) array, and broadcasts with the (n,n,3) just fine.
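To make that broadcasting explicit, multiplying by the kernel reshaped to (1,1,3) by hand gives the identical result:
k = np.array([1, 0, 0])
print(np.array_equal(arr * k, arr * k.reshape(1, 1, 3)))  # True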
I have looked into documentations and also other questions here, but it seems I
have not got the hang of subsetting in numpy arrays yet.
I have a numpy array,
and for the sake of argument, let it be defined as follows:
import numpy as np
a = np.arange(100)
a.shape = (10,10)
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
# [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
# [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
# [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
# [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
# [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
# [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
# [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
now I want to choose rows and columns of a specified by vectors n1 and n2. As an example:
n1 = range(5)
n2 = range(5)
But when I use:
b = a[n1,n2]
# array([ 0, 11, 22, 33, 44])
Then only the first five diagonal elements are chosen, not the whole 5x5 block. The solution I have found is to do it like this:
b = a[n1,:]
b = b[:,n2]
# array([[ 0, 1, 2, 3, 4],
# [10, 11, 12, 13, 14],
# [20, 21, 22, 23, 24],
# [30, 31, 32, 33, 34],
# [40, 41, 42, 43, 44]])
But I am sure there should be a way to do this simple task in just one command.
You've gotten a handful of nice examples of how to do what you want. However, it's also useful to understand what's happening and why things work the way they do. There are a few simple rules that will help you in the future.
There's a big difference between "fancy" indexing (i.e. using a list/sequence) and "normal" indexing (using a slice). The underlying reason has to do with whether or not the array can be "regularly strided", and therefore whether or not a copy needs to be made. Arbitrary sequences therefore have to be treated differently, if we want to be able to create "views" without making copies.
In your case:
import numpy as np
a = np.arange(100).reshape(10,10)
n1, n2 = np.arange(5), np.arange(5)
# Not what you want
b = a[n1, n2] # array([ 0, 11, 22, 33, 44])
# What you want, but only for simple sequences
# Note that no copy of *a* is made!! This is a view.
b = a[:5, :5]
# What you want, but probably confusing at first. (Also, makes a copy.)
# np.meshgrid and np.ix_ are basically equivalent to this.
b = a[n1[:,None], n2[None,:]]
Fancy indexing with 1D sequences is basically equivalent to zipping them together and indexing with the result.
print "Fancy Indexing:"
print a[n1, n2]
print "Manual indexing:"
for i, j in zip(n1, n2):
print a[i, j]
However, if the sequences you're indexing with match the dimensionality of the array you're indexing (2D, in this case), the indexing is treated differently. Instead of "zipping the two together", numpy broadcasts the index arrays against each other to form a grid of indices.
In other words, a[[[1, 2, 3]], [[1],[2],[3]]] is treated completely differently than a[[1, 2, 3], [1, 2, 3]], because the sequences/arrays that you're passing in are two-dimensional.
In [4]: a[[[1, 2, 3]], [[1],[2],[3]]]
Out[4]:
array([[11, 21, 31],
[12, 22, 32],
[13, 23, 33]])
In [5]: a[[1, 2, 3], [1, 2, 3]]
Out[5]: array([11, 22, 33])
To be a bit more precise,
a[[[1, 2, 3]], [[1],[2],[3]]]
is treated exactly like:
i = [[1, 2, 3],
     [1, 2, 3],
     [1, 2, 3]]
j = [[1, 1, 1],
     [2, 2, 2],
     [3, 3, 3]]
a[i, j]
In other words, whether the input is a row/column vector is a shorthand for how the indices should repeat in the indexing.
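np.broadcast_arrays makes those expanded index arrays explicit (using the same two index arrays as above):
i, j = np.broadcast_arrays(np.array([[1, 2, 3]]), np.array([[1], [2], [3]]))
print(i)
# [[1 2 3]
#  [1 2 3]
#  [1 2 3]]
print(j)
# [[1 1 1]
#  [2 2 2]
#  [3 3 3]]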
np.meshgrid and np.ix_ are just convenient ways to turn your 1D sequences into their 2D versions for indexing:
In [6]: np.ix_([1, 2, 3], [1, 2, 3])
Out[6]:
(array([[1],
[2],
[3]]), array([[1, 2, 3]]))
Similarly (the sparse argument would make it identical to ix_ above):
In [7]: np.meshgrid([1, 2, 3], [1, 2, 3], indexing='ij')
Out[7]:
[array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]]),
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])]
Another quick way to build the desired index is to use the np.ix_ function:
>>> a[np.ix_([n1, n2])]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])
This provides a convenient way to construct an open mesh from sequences of indices.
You could use np.meshgrid to give the n1, n2 arrays the proper shape to perform the desired indexing:
In [104]: a[tuple(np.meshgrid(n1, n2, sparse=True, indexing='ij'))]
Out[104]:
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])
Or, without meshgrid:
In [117]: a[np.array(n1)[:,np.newaxis], np.array(n2)[np.newaxis,:]]
Out[117]:
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])
There is a similar example with an explanation of how this integer array indexing works in the docs.
See also the Cookbook recipe Picking out rows and columns.
A nice trick I've managed to pull (for lazy people only) is filter + transpose + filter.
a = np.arange(100).reshape(10,10)
subsetA = [1,3,5,7]
a[subsetA].T[subsetA]
array([[11, 31, 51, 71],
[13, 33, 53, 73],
[15, 35, 55, 75],
[17, 37, 57, 77]])
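One caveat: this returns the transpose of the block you'd get from a[np.ix_(subsetA, subsetA)], so add a final .T if you want rows and columns in their original orientation:
print(np.array_equal(a[subsetA].T[subsetA].T, a[np.ix_(subsetA, subsetA)]))  # True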
It seems that a likely use case for your particular question is image manipulation. To the extent that you are using your example to edit numpy arrays arising from images, you can use the Python Imaging Library (PIL).
# Import Pillow:
from PIL import Image
# Load the original image:
img = Image.open("flowers.jpg")
# Crop the image
img2 = img.crop((0, 0, 5, 5))
The img2 object is a PIL Image of the cropped region, not a numpy array; convert it if you need an ndarray (see below). You can read more about image manipulation in the docs for the Pillow package (a user-friendly fork of the PIL package).
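A minimal conversion back to numpy (reusing the img2 from above, and assuming an RGB source image):
import numpy as np
img2_arr = np.array(img2)  # ndarray of shape (5, 5, 3) for an RGB image
print(img2_arr.shape)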