I have a multidimensional array of shape (n,x,y). For this example, we can use this array:
A = array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]],
[[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]],
[[24, 25, 26],
[27, 28, 29],
[30, 31, 32],
[33, 34, 35]]])
I then have another multidimensional array holding index values that I want to use on the original array, A. It has shape (z,2), and its values are row indices:
Row_values = array([[0,1],
[0,2],
[1,2],
[1,3]])
So I want to apply all the index pairs in Row_values to each of the three arrays in A, so that I end up with a final array of shape (12,2,3):
Result = [[[0,1,2],
[3,4,5]],
[[0,1,2],
[6,7,8]],
[[3,4,5],
[6,7,8]],
[[3,4,5],
[9,10,11]],
[[12,13,14],
[15,16,17]],
[[12,13,14],
[18,19,20]],
[[15,16,17],
[18,19,20]],
[[15,16,17],
[21,22,23]],
[[24,25,26],
[27,28,29]],
[[24,25,26],
[30,31,32]],
[[27,28,29],
[30,31,32]],
[[27,28,29],
[33,34,35]]]
I have tried using np.take() but haven’t been able to make it work. Not sure if there’s another numpy function that is easier to use
We can take advantage of NumPy's advanced indexing, using np.repeat and np.tile to build the index arrays.
cidx = np.tile(Row_values, (A.shape[0], 1))
ridx = np.repeat(np.arange(A.shape[0]), Row_values.shape[0])
out = A[ridx[:, None], cidx]
# out.shape -> (12, 2, 3)
Using np.take
np.take(A, Row_values, axis=1).reshape((-1, 2, 3))
# Or
A[:, Row_values].reshape((-1, 2, 3))
Output:
array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 9, 10, 11]],
[[12, 13, 14],
[15, 16, 17]],
[[12, 13, 14],
[18, 19, 20]],
[[15, 16, 17],
[18, 19, 20]],
[[15, 16, 17],
[21, 22, 23]],
[[24, 25, 26],
[27, 28, 29]],
[[24, 25, 26],
[30, 31, 32]],
[[27, 28, 29],
[30, 31, 32]],
[[27, 28, 29],
[33, 34, 35]]])
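To check that the two approaches above agree, here is a minimal sketch. It assumes A can be rebuilt with np.arange (which matches the values shown in the question); the names out_tile and out_take are only illustrative.

```python
import numpy as np

# Rebuild the example inputs; np.arange reproduces the values shown in the question.
A = np.arange(36).reshape(3, 4, 3)
Row_values = np.array([[0, 1], [0, 2], [1, 2], [1, 3]])

# Approach 1: explicit block indices and row indices, broadcast against each other.
cidx = np.tile(Row_values, (A.shape[0], 1))                    # shape (12, 2)
ridx = np.repeat(np.arange(A.shape[0]), Row_values.shape[0])   # shape (12,)
out_tile = A[ridx[:, None], cidx]

# Approach 2: index axis 1 with the 2D index array, then merge the first two axes.
out_take = np.take(A, Row_values, axis=1).reshape(-1, 2, A.shape[2])

print(out_tile.shape)                      # (12, 2, 3)
print(np.array_equal(out_tile, out_take))  # True
```

The second form is shorter because indexing axis 1 with a (4, 2) index array already yields shape (3, 4, 2, 3); the reshape only flattens the leading two axes.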
I have a 2D numpy array x as:
[ [ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15],
[16, 17, 18],
[19, 20, 21],
[22, 23, 24],
[25, 26, 27],
[28, 29, 30],
[31, 32, 33],
[34, 35, 36],
[37, 38, 39],
[40, 41, 42],
[43, 44, 45],
[46, 47, 48],
[49, 50, 51],
[52, 53, 54],
[55, 56, 57],
[58, 59, 60] ]
I want to extract the indices of the rows in which any element is less than 25. So the output I need is [0,1,2,3,4,5,6,7], just the rows, but np.where(x<25) gives me the 2D indices of every matching value. In other words, what I want are the indices of all the rows of x where at least one element is less than 25, but what I am getting are the indices of all the elements of x that are less than 25.
What should I do? Is there a specific function for this or should I just select the unique values from the returned list of arguments?
One way would be this:
import numpy as np
# x is your array
x1 = (x < 25).sum(axis = 1)
rows = np.where(x1 > 0)[0]
The row indices are in rows as array([0, 1, 2, 3, 4, 5, 6, 7]).
You can also use nonzero as:
rows = np.nonzero((x < 25).sum(axis = 1))[0]
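A variant of the same idea, sketched here under the assumption that x holds the values 1 to 60 as in the question, reduces the boolean mask with any instead of counting with sum. It states the "at least one element" condition directly.

```python
import numpy as np

# Same values as the array in the question: 20 rows of 3 consecutive integers.
x = np.arange(1, 61).reshape(20, 3)

# True for each row that contains at least one element below 25.
row_mask = (x < 25).any(axis=1)

# Positions of the True entries, as a 1D integer array.
rows = np.flatnonzero(row_mask)
print(rows.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```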
I have looked into documentations and also other questions here, but it seems I
have not got the hang of subsetting in numpy arrays yet.
I have a numpy array,
and for the sake of argument, let it be defined as follows:
import numpy as np
a = np.arange(100)
a.shape = (10,10)
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
# [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
# [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
# [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
# [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
# [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
# [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
# [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
Now I want to choose the rows and columns of a specified by the vectors n1 and n2. As an example:
n1 = range(5)
n2 = range(5)
But when I use:
b = a[n1,n2]
# array([ 0, 11, 22, 33, 44])
Then only the first five diagonal elements are chosen, not the whole 5x5 block. The solution I have found is to do it like this:
b = a[n1,:]
b = b[:,n2]
# array([[ 0, 1, 2, 3, 4],
# [10, 11, 12, 13, 14],
# [20, 21, 22, 23, 24],
# [30, 31, 32, 33, 34],
# [40, 41, 42, 43, 44]])
But I am sure there should be a way to do this simple task in just one command.
You've gotten a handful of nice examples of how to do what you want. However, it's also useful to understand what's happening and why things work the way they do. There are a few simple rules that will help you in the future.
There's a big difference between "fancy" indexing (i.e. using a list/sequence) and "normal" indexing (using a slice). The underlying reason has to do with whether or not the array can be "regularly strided", and therefore whether or not a copy needs to be made. Arbitrary sequences therefore have to be treated differently, if we want to be able to create "views" without making copies.
In your case:
import numpy as np
a = np.arange(100).reshape(10,10)
n1, n2 = np.arange(5), np.arange(5)
# Not what you want
b = a[n1, n2] # array([ 0, 11, 22, 33, 44])
# What you want, but only for simple sequences
# Note that no copy of *a* is made!! This is a view.
b = a[:5, :5]
# What you want, but probably confusing at first. (Also, makes a copy.)
# np.meshgrid and np.ix_ are basically equivalent to this.
b = a[n1[:,None], n2[None,:]]
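The view-versus-copy distinction can be checked directly. The following is a small sketch using np.shares_memory; the variable names view and copy are only illustrative.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

view = a[:5, :5]              # basic slicing: regularly strided, so no copy is needed
copy = a[np.arange(5), :5]    # integer-array index on the first axis: data is copied

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

view[0, 0] = -1   # writes through to a
copy[0, 1] = -2   # leaves a untouched
print(a[0, :2].tolist())  # [-1, 1]
```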
Fancy indexing with 1D sequences is basically equivalent to zipping them together and indexing with the result.
print("Fancy Indexing:")
print(a[n1, n2])
print("Manual indexing:")
for i, j in zip(n1, n2):
    print(a[i, j])
However, if the sequences you're indexing with match the dimensionality of the array you're indexing (2D, in this case), the indexing is treated differently. Instead of "zipping the two together", numpy broadcasts the index arrays against each other and uses the result to select a whole block, much like a mask.
In other words, a[[[1, 2, 3]], [[1],[2],[3]]] is treated completely differently than a[[1, 2, 3], [1, 2, 3]], because the sequences/arrays that you're passing in are two-dimensional.
In [4]: a[[[1, 2, 3]], [[1],[2],[3]]]
Out[4]:
array([[11, 21, 31],
[12, 22, 32],
[13, 23, 33]])
In [5]: a[[1, 2, 3], [1, 2, 3]]
Out[5]: array([11, 22, 33])
To be a bit more precise,
a[[[1, 2, 3]], [[1],[2],[3]]]
is treated exactly like:
i = [[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]
j = [[1, 1, 1],
[2, 2, 2],
[3, 3, 3]]
a[i, j]
In other words, whether the input is a row/column vector is a shorthand for how the indices should repeat in the indexing.
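How the row and column vectors repeat can be made explicit with np.broadcast_arrays. This is a sketch for illustration; the names row_idx and col_idx are not from the original answer.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

row_idx = np.array([[1, 2, 3]])        # shape (1, 3): a row vector
col_idx = np.array([[1], [2], [3]])    # shape (3, 1): a column vector

# Expand both index arrays to their common broadcast shape (3, 3).
i, j = np.broadcast_arrays(row_idx, col_idx)
print(i.tolist())  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(j.tolist())  # [[1, 1, 1], [2, 2, 2], [3, 3, 3]]

# Indexing with the compact vectors and with the expanded arrays gives the same result.
print(np.array_equal(a[row_idx, col_idx], a[i, j]))  # True
```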
np.meshgrid and np.ix_ are just convenient ways to turn your 1D sequences into their 2D versions for indexing:
In [6]: np.ix_([1, 2, 3], [1, 2, 3])
Out[6]:
(array([[1],
[2],
[3]]), array([[1, 2, 3]]))
Similarly (passing sparse=True would make the result identical to ix_ above):
In [7]: np.meshgrid([1, 2, 3], [1, 2, 3], indexing='ij')
Out[7]:
[array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]]),
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])]
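The relationship between the two helpers can be verified with a short sketch. The names open_mesh and sparse_mesh are illustrative, and the tuple(...) conversion is there because recent NumPy versions do not accept a list of arrays as a multidimensional index.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
rows, cols = [1, 2, 3], [1, 2, 3]

open_mesh = np.ix_(rows, cols)
sparse_mesh = np.meshgrid(rows, cols, sparse=True, indexing='ij')

# Both produce one column-shaped and one row-shaped index array.
print([m.shape for m in open_mesh])    # [(3, 1), (1, 3)]
print([m.shape for m in sparse_mesh])  # [(3, 1), (1, 3)]

# Convert to a tuple so each array indexes its own axis.
block = a[tuple(sparse_mesh)]
print(np.array_equal(block, a[open_mesh]))  # True
```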
Another quick way to build the desired index is to use the np.ix_ function:
>>> a[np.ix_(n1, n2)]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])
This provides a convenient way to construct an open mesh from sequences of indices.
You could use np.meshgrid to give the n1, n2 arrays the proper shape to perform the desired indexing:
In [104]: a[tuple(np.meshgrid(n1,n2, sparse=True, indexing='ij'))]
Out[104]:
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])
Or, without meshgrid:
In [117]: a[np.array(n1)[:,np.newaxis], np.array(n2)[np.newaxis,:]]
Out[117]:
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])
There is a similar example with an explanation of how this integer array indexing works in the docs.
See also the Cookbook recipe Picking out rows and columns.
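A nice trick I've managed to pull off (for lazy people only) is filter + transpose + filter. Note that the result comes out transposed relative to the block itself, so append another .T if the orientation matters.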
a = np.arange(100).reshape(10,10)
subsetA = [1,3,5,7]
a[subsetA].T[subsetA]
array([[11, 31, 51, 71],
[13, 33, 53, 73],
[15, 35, 55, 75],
[17, 37, 57, 77]])
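A quick comparison against np.ix_ shows how the output of this trick is oriented. This is only a sketch; the names tricked and block are illustrative.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
subsetA = [1, 3, 5, 7]

tricked = a[subsetA].T[subsetA]        # filter rows, transpose, filter again
block = a[np.ix_(subsetA, subsetA)]    # rows 1,3,5,7 crossed with columns 1,3,5,7

# The trick returns the transpose of the row/column block.
print(np.array_equal(tricked, block.T))  # True
print(np.array_equal(tricked.T, block))  # True
```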
It seems that a use case for your particular question would deal with image manipulation. To the extent that you are using your example to edit numpy arrays arising from images, you can use the Python Imaging Library (PIL).
# Import Pillow:
from PIL import Image
# Load the original image:
img = Image.open("flowers.jpg")
# Crop the image
img2 = img.crop((0, 0, 5, 5))
The img2 object is a PIL Image holding the cropped region, not a numpy array; np.asarray(img2) converts it to one.
You can read more about image manipulation here with the Pillow package (a user-friendly fork of the PIL package):
I am working with images through numpy. I want to set a chunk of the image to its average color. I am able to do this, but I have to re-index the array, when I would like to use the original view to do this. In other words, I would like to use that 4th line of code, but I'm stuck with the 3rd one.
I have read a few posts about the as_strided function, but it is confusing to me, and I was hoping there might be a simpler solution. So is there a way to slightly modify that last line of code to do what I want?
box = im[x-dx:x+dx, y-dy:y+dy, :]
avg = block(box) #returns a 1D numpy array with 3 values
im[x-dx:x+dx, y-dy:y+dy, :] = avg[None,None,:] #sets box to average color
#box = avg[None,None,:] #does not affect original array
box = blah
just reassigns the box variable. The array that the box variable previously referred to is unaffected. This is not what you want.
box[:] = blah
is a slice assignment. It modifies the contents of the array. This is what you want.
Note that slice assignment is dependent on the syntactic form of the statement. The fact that box was assigned by box = im[stuff] does not make further assignments to box slice assignments. This is similar to how if you do
l = [1, 2, 3]
b = l[2]
b = 0
the assignment to b doesn't affect l.
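Applied to the image case, the difference looks like this. The sketch assumes a small zero-filled array in place of the real image and a hard-coded avg in place of the asker's block function, which is not shown in the question.

```python
import numpy as np

im = np.zeros((4, 4, 3))
avg = np.array([1.0, 2.0, 3.0])   # stand-in for the average color of the box

box = im[1:3, 1:3, :]   # basic slicing, so box is a view of im
box[...] = avg          # in-place write; avg broadcasts over the first two axes

print(im[1, 1].tolist())  # [1.0, 2.0, 3.0]
print(im[0, 0].tolist())  # [0.0, 0.0, 0.0]

box = avg               # rebinding only: this line does not modify im
print(im[2, 2].tolist())  # [1.0, 2.0, 3.0]
```

Either box[:] = avg or box[...] = avg works here; the explicit avg[None,None,:] reshaping is optional because broadcasting aligns the trailing axis automatically.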
Gray-scale Images
This will set a chunk of an array to its average (mean) value:
im[2:4, 2:4] = im[2:4, 2:4].mean()
For example:
In [9]: im
Out[9]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [10]: im[2:4, 2:4] = im[2:4, 2:4].mean()
In [11]: im
Out[11]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 12, 12],
[12, 13, 12, 12]])
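One detail worth noticing in the output above: the mean of 10, 11, 14 and 15 is 12.5, but the array stores 12 because it has an integer dtype. A minimal sketch of the difference, assuming a float copy of the image is acceptable (im_f is an illustrative name):

```python
import numpy as np

im = np.arange(16).reshape(4, 4)
print(im[2:4, 2:4].mean())   # 12.5

# Assigning into an integer array truncates the fractional part.
im[2:4, 2:4] = im[2:4, 2:4].mean()
print(im[2, 2])              # 12

# With a float array the mean is stored exactly.
im_f = np.arange(16, dtype=float).reshape(4, 4)
im_f[2:4, 2:4] = im_f[2:4, 2:4].mean()
print(im_f[2, 2])            # 12.5
```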
Color Images
Suppose that we want to average over each component of color separately:
In [22]: im = np.arange(48).reshape((4,4,3))
In [23]: im
Out[23]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]],
[[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]],
[[24, 25, 26],
[27, 28, 29],
[30, 31, 32],
[33, 34, 35]],
[[36, 37, 38],
[39, 40, 41],
[42, 43, 44],
[45, 46, 47]]])
In [24]: im[2:4, 2:4, :] = im[2:4, 2:4, :].mean(axis=0).mean(axis=0)[np.newaxis, np.newaxis, :]
In [25]: im
Out[25]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]],
[[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]],
[[24, 25, 26],
[27, 28, 29],
[37, 38, 39],
[37, 38, 39]],
[[36, 37, 38],
[39, 40, 41],
[37, 38, 39],
[37, 38, 39]]])
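The chained mean(axis=0).mean(axis=0) can also be written as a single reduction over both spatial axes. The sketch below assumes the same 4x4x3 example array; box, chained and direct are illustrative names.

```python
import numpy as np

im = np.arange(48).reshape(4, 4, 3)
box = im[2:4, 2:4, :]          # view of the bottom-right 2x2 block

chained = box.mean(axis=0).mean(axis=0)
direct = box.mean(axis=(0, 1))  # average over rows and columns in one call

print(direct.tolist())               # [37.5, 38.5, 39.5]
print(np.allclose(chained, direct))  # True

# In-place write through the view; values are truncated because im is an integer array.
box[...] = direct
print(im[3, 3].tolist())             # [37, 38, 39]
```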