How to write numpy where condition based on indices and not values?

I have a 2d numpy array and I need to extract all elements array[i][j] if the conditions
x1range < i < x2range and y1range < j < y2range are satisfied.
How do I write such conditions? Do I need to use mgrid/ogrid?
Edit: I should have stated my additional requirement. I was looking for a where condition rather than slicing, because I want to change the values of all elements that satisfy the above condition to (0,0,0). I assumed a where condition would let me do that.
Edit2: Also, is it possible to get the 'not' of the above condition?
As in,
if i > x1range and i < x2range and j > y1range and j < y2range: # the above condition
    pass # do nothing, keep original value
else:
    val = (0,0,0)

Problem #1: Getting indices within the range
You could use np.meshgrid to get those indices -
In [145]: x1range,x2range = 2,5
...: y1range,y2range = 1,4
...:
In [146]: np.meshgrid(np.arange(x1range,x2range),np.arange(y1range,y2range))
Out[146]:
[array([[2, 3, 4],
[2, 3, 4],
[2, 3, 4]]), array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])]
Problem #2 : Extracting or setting input array elements within those ranges
You could use np.ix_ to directly index into the input array arr -
In [148]: arr
Out[148]:
array([[97, 69, 0, 60, 28, 97],
[98, 85, 24, 75, 97, 23],
[70, 25, 77, 86, 93, 66],
[ 0, 85, 51, 17, 40, 92],
[66, 28, 28, 22, 79, 52]])
In [149]: arr[np.ix_(np.arange(x1range,x2range),np.arange(y1range,y2range))]
Out[149]:
array([[25, 77, 86],
[85, 51, 17],
[28, 28, 22]])
With this indexing, one can also set all those elements directly.
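As a minimal sketch of that assignment (the sample array and the ranges below are made up for illustration), setting through np.ix_ gives the same result as setting through a plain slice when the index ranges are contiguous. Note that the ranges here are half-open, as in the answer above, not strict on both sides as in the question.

```python
import numpy as np

arr = np.arange(30).reshape(5, 6)      # hypothetical sample data
x1range, x2range = 2, 5                # rows 2, 3, 4
y1range, y2range = 1, 4                # columns 1, 2, 3

rows = np.arange(x1range, x2range)
cols = np.arange(y1range, y2range)

# Assign through the open mesh built by np.ix_
by_ix = arr.copy()
by_ix[np.ix_(rows, cols)] = 0

# Assign through an ordinary slice (only possible for contiguous ranges)
by_slice = arr.copy()
by_slice[x1range:x2range, y1range:y2range] = 0

print(np.array_equal(by_ix, by_slice))  # True
```

np.ix_ is the more general of the two, since rows and cols do not have to be contiguous.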
Problem #3 : Extracting or setting input array elements NOT within those ranges
To set the elements that do NOT satisfy the condition to 0 while keeping the rest unchanged, you can use NumPy broadcasting along with boolean indexing, like so (on recent NumPy versions, np.isin is the recommended replacement for np.in1d) -
In [150]: Imask = np.in1d(np.arange(arr.shape[0]),np.arange(x1range,x2range))
...: Jmask = np.in1d(np.arange(arr.shape[1]),np.arange(y1range,y2range))
...: arr[~(Imask[:,None] & Jmask)] = 0
...:
In [151]: arr
Out[151]:
array([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0],
[ 0, 25, 77, 86, 0, 0],
[ 0, 85, 51, 17, 0, 0],
[ 0, 28, 28, 22, 0, 0]])
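For the case in the question (strict inequalities on both sides and (0,0,0) values), here is a hedged sketch that builds the mask directly from index comparisons. The image array and range values are assumptions chosen for illustration, and np.ogrid is one of several ways to get the index grids.

```python
import numpy as np

img = np.arange(5 * 6 * 3).reshape(5, 6, 3)   # hypothetical (rows, cols, 3) image
x1range, x2range = 1, 4
y1range, y2range = 0, 3

# Open index grids with shapes (5, 1) and (1, 6); they broadcast to (5, 6)
i, j = np.ogrid[:img.shape[0], :img.shape[1]]

# Strict inequalities, exactly as written in the question
inside = (i > x1range) & (i < x2range) & (j > y1range) & (j < y2range)

# In-place style: zero every pixel outside the window
out = img.copy()
out[~inside] = (0, 0, 0)

# Equivalent np.where form; the mask needs a trailing axis to broadcast over channels
out_where = np.where(inside[:, :, None], img, 0)

print(np.array_equal(out, out_where))  # True
```

Using inside instead of ~inside in the assignment zeroes the window itself and keeps everything else.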

just a guess.
x=array[x1range:x2range,y1range:y2range]

What about slicing?
array[x1range:x2range,y1range:y2range]
Example:
numpy.array([[1,2,3],[4,5,6],[7,8,9]])[0:2,0:2]
array([[1, 2],
[4, 5]])

Related

Array based indexing of an ndarray

I am not understanding numpy.take though it seems like it is the function I want. I have an ndarray and I want to use another ndarray to index into the first.
import numpy as np
# Create a matrix
A = np.arange(75).reshape((5,5,3))
# Create the index array
idx = np.array([[1, 0, 0, 1, 1],
[1, 1, 0, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 0, 0, 0],
[1, 1, 1, 1, 0]])
Given the above, I want to index A by the values in idx. I thought take does this, but it doesn't output what I expected.
# Index the 3rd dimension of the A matrix by the idx array.
Asub = np.take(A, idx)
print(f'Value in A at 1,1,1 is {A[1,1,1]}')
print(f'Desired index from idx {idx[1,1]}')
print(f'Value in Asub at [1,1,1] {Asub[1,1]} <- thought this would be 19')
I was expecting the value in Asub at each location to be the value of A selected by idx at that location, but this is what I get:
Value in A at 1,1,1 is 19
Desired index from idx 1
Value in Asub at [1,1,1] 1 <- thought this would be 19
One possibility is to create row and column indices that broadcast with the third-dimension index, i.e. a (5,1) and a (5,) array that pair with the (5,5) idx:
In [132]: A[np.arange(5)[:,None],np.arange(5), idx]
Out[132]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
This ends up picking values from A[:,:,0] and A[:,:,1]. The values of idx are used as integer indices, which must lie in the valid range (0,1,2) for an axis of size 3. They aren't boolean selectors.
Out[132][1,1] is 19, same as A[1,1,1]; Out[132][1,2] is the same as A[1,2,0].
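The reason np.take(A, idx) returned 1 rather than 19 can be checked with a short sketch: when no axis is given, np.take indexes the flattened array, so every entry of idx is read as a position in A.ravel(). The variable names flat_take and picked are just for illustration.

```python
import numpy as np

A = np.arange(75).reshape(5, 5, 3)
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

# Without axis=, np.take works on the flattened array
flat_take = np.take(A, idx)
print(np.array_equal(flat_take, A.ravel()[idx]))  # True

# Broadcasting row/column indices against idx picks A[i, j, idx[i, j]]
rows = np.arange(5)[:, None]
cols = np.arange(5)
picked = A[rows, cols, idx]
print(picked[1, 1], A[1, 1, 1])  # 19 19
```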
take_along_axis gets the same values, but with an added dimension:
In [142]: np.take_along_axis(A, idx[:,:,None], 2).shape
Out[142]: (5, 5, 1)
In [143]: np.take_along_axis(A, idx[:,:,None], 2)[:,:,0]
Out[143]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
The iterative equivalent might be easier to understand:
In [145]: np.array([[A[i,j,idx[i,j]] for j in range(5)] for i in range(5)])
Out[145]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
If you have trouble expressing an action in "vectorized" array ways, go ahead and write an iterative version. It will avoid a lot of ambiguity and misunderstanding.
Another way to get the same values, treating the idx values as True/False booleans is:
In [146]: np.where(idx, A[:,:,1], A[:,:,0])
Out[146]:
array([[ 1, 3, 6, 10, 13],
[16, 19, 21, 25, 28],
[31, 33, 37, 39, 43],
[46, 49, 51, 54, 57],
[61, 64, 67, 70, 72]])
IIUC, you can get the resulting array by broadcasting idx to the shape of A, multiplying the two, and then indexing to take column 1:
Asub = (A * idx[:, :, None])[:, :, 1] # --> Asub[1, 1] = 19
# [[ 1 0 0 10 13]
# [16 19 0 25 28]
# [31 0 37 0 43]
# [46 49 0 0 0]
# [61 64 67 70 0]]
I think this would be the fastest way (or one of the best), particularly for large arrays.
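Note that this multiplication answers a slightly different question than the indexing approaches: where idx is 0 it yields 0 instead of A[i, j, 0]. A small sketch comparing the two (the names masked and picked are only for illustration):

```python
import numpy as np

A = np.arange(75).reshape(5, 5, 3)
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

# Multiply-then-select: keeps A[:, :, 1] where idx is 1, zero elsewhere
masked = (A * idx[:, :, None])[:, :, 1]

# Index-based selection: A[i, j, idx[i, j]] everywhere
picked = np.take_along_axis(A, idx[:, :, None], axis=2)[:, :, 0]

print(np.array_equal(masked, np.where(idx == 1, A[:, :, 1], 0)))  # True
print(np.array_equal(masked[idx == 1], picked[idx == 1]))         # True
print(masked[0, 1], picked[0, 1])                                 # 0 3
```

Which one is wanted depends on whether idx is meant as a 0/1 mask or as an index into the last axis.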

Slicing 3D numpy array using list of index

The objective is to slice a 3D array using a list of indices.
Here, the array has shape (2,5,5). For simplicity, let's assume the indices 0 to 4 are labelled A,B,C,D,E.
Assume we have a 3D array as below
array([[[44, 47, 64, 67, 67],
[ 9, 83, 21, 36, 87],
[70, 88, 88, 12, 58],
[65, 39, 87, 46, 88],
[81, 37, 25, 77, 72]],
[[ 9, 20, 80, 69, 79],
[47, 64, 82, 99, 88],
[49, 29, 19, 19, 14],
[39, 32, 65, 9, 57],
[32, 31, 74, 23, 35]]], dtype=int64)
The indices of interest are [1,3,4], which we label B,D,E. The expected output when slicing the 3D array with these indices is as below
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]], dtype=int64)
However, slicing the array as below
import numpy as np
np.random.seed(0)
arr = np.random.randint(0, 100, size=(2, 5, 5))
k=arr[:,(1,3,4),(1,3,4)]
does not produce the expected output.
In the actual use case, more than 3 elements are to be sliced (more than B,D,E). Sorry for any incorrect terminology.
Try this, which has a similar structure to your arr[:,idx,idx] but uses np.ix_(). Do read the documentation for np.ix_().
idx = [1,3,4]
ixgrid = np.ix_(idx,idx)
arr[:,ixgrid[0],ixgrid[1]]
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]])
Explanation
What you WANT to do is extract a mesh from the last 2 axes of the array. But what you are doing is extracting exact index pairs from the 2 axes.
When you use arr[:,(1,3,4),(1,3,4)], you are essentially asking for (1,1), (3,3) and (4,4) from the two matrices arr[0] and arr[1]
What you need is to extract a mesh. This can be achieved with np.ix_ and the magic of broadcasting.
If you ask for ...
[[1],
[3], and [1,3,4]
[4]]
... which is what the np.ix_ constructs, you broadcast the indexes and instead ask for a cross product between them, which is (1,1), (1,3), (1,4), (3,1), (3,3)... etc.
Hope that clarifies why you get the result you are getting and how you can actually get what you need.
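The difference between the two requests can be seen from the shapes alone. This is a small sketch on a made-up array (np.arange instead of the random data in the question, so the values are predictable):

```python
import numpy as np

arr = np.arange(50).reshape(2, 5, 5)   # hypothetical stand-in for the question's array
idx = [1, 3, 4]

# Paired indices: (1,1), (3,3), (4,4) from each 5x5 block
diag = arr[:, idx, idx]
print(diag.shape)          # (2, 3)

# np.ix_ reshapes the index lists so that they broadcast into a mesh
r, c = np.ix_(idx, idx)
print(r.shape, c.shape)    # (3, 1) (1, 3)

mesh = arr[:, r, c]
print(mesh.shape)          # (2, 3, 3)

# Same result by indexing one axis at a time
two_step = arr[:, idx][:, :, idx]
print(np.array_equal(mesh, two_step))  # True
```

The paired result is just the diagonal of each 3x3 block in the mesh result.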
The problem
Advanced indexing expects all dimensions to be indexed explicitly. What you're doing here is grabbing the elements at coordinates (1, 1), (3, 3), (4, 4) in each array along axis 0.
The solution
What you need to do is this instead:
idx = (1, 3, 4) # the indices of interest
arr[np.ix_((0, 1), idx, idx)]
Where (0, 1) corresponds to the first two arrays along axis 0.
Output:
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]], dtype=int64)
As shown above, np.ix_((0, 1), idx, idx) produces an object which can be used for advanced indexing. The (0, 1) means that you're explicitly selecting the elements from the arrays arr[0] and arr[1]. If you have a more general 3D array of shape (n, m, q) and want to grab the same subarray out of every array along axis 0, you can use
np.ix_(np.arange(arr.shape[0]), idx, idx)
as your indices. Note that idx is repeated here because you wanted those specific indices, but in general they don't need to match.
Generalizing
More generally, you can slice and dice however you want like so:
In [1]: arrays_to_select = (0, 1)
In [2]: rows_to_select = (1, 3, 4)
In [3]: cols_to_select = (1, 3, 4)
In [4]: indices = np.ix_(arrays_to_select, rows_to_select, cols_to_select)
In [5]: arr[indices]
Out[5]:
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]], dtype=int64)
Let's consider some other shape:
In [4]: x = np.random.randint(0, 9, (4, 3, 5))
In [5]: x
Out[5]:
array([[[1, 0, 2, 1, 0],
[3, 5, 1, 4, 3],
[1, 8, 1, 4, 2]],
[[1, 6, 8, 2, 8],
[0, 0, 4, 2, 3],
[8, 5, 6, 2, 5]],
[[4, 4, 8, 6, 0],
[3, 0, 1, 2, 8],
[0, 8, 2, 4, 3]],
[[7, 8, 8, 1, 4],
[5, 7, 4, 8, 5],
[7, 5, 5, 3, 4]]])
In [6]: rows = (0, 2)
In [7]: cols = (0, 2, 3, 4)
By using those rows and cols, you'll be grabbing the subarrays composed of the elements in columns 0, 2, 3 and 4, from only the rows 0 and 2. Let's verify that with the first array along axis 0:
In [8]: arrs = (0,) # A 1-tuple which will give us only the first array along axis 0
In [9]: x[np.ix_(arrs, rows, cols)]
Out[9]:
array([[[1, 2, 1, 0],
[1, 1, 4, 2]]])
Now suppose you want the subarrays produced by rows and cols of only the first and last arrays along axis 0. You can explicitly select (0, -1):
In [10]: arrs = (0, -1)
In [11]: x[np.ix_(arrs, rows, cols)]
Out[11]:
array([[[1, 2, 1, 0],
[1, 1, 4, 2]],
[[7, 8, 1, 4],
[7, 5, 3, 4]]])
If, instead, you want that same subarray from all the arrays along axis 0:
In [12]: arrs = np.arange(x.shape[0])
In [13]: arrs
Out[13]: array([0, 1, 2, 3])
In [14]: x[np.ix_(arrs, rows, cols)]
Out[14]:
array([[[1, 2, 1, 0],
[1, 1, 4, 2]],
[[1, 8, 2, 8],
[8, 6, 2, 5]],
[[4, 8, 6, 0],
[0, 2, 4, 3]],
[[7, 8, 1, 4],
[7, 5, 3, 4]]])

Replace values from (m,n,3) array with conditions from (m,n,1) array

Let's say I have the following array:
a = np.random.randint(5, size=(2000, 2000, 1))
a = np.repeat(a, 3, axis=2) # Using this method to have a (m,n,3) array with the same values
and the next arrays:
val_old = np.array([[0, 0, 0], [3, 3, 3]])
val_new = np.array([[12, 125, 13], [78, 78, 0]])
What I want to do is to replace the values from the array a with the values specified in the array val_new. So, all [0,0,0] arrays would become [12,125,13] and all [3,3,3] would become [78, 78, 0].
I can't find an efficient way to do this... I tried to adapt this solution but it's only for 1-d arrays...
Does anyone know a fast way/method to replace these values ?
Assuming you have a "map" for each integer, you can use a (2000, 2000) integer index on a (5, 3) lookup array to get a (2000, 2000, 3) array. Example, with 4 values and a small index array:
val_new = np.array([[12, 125, 13], [0,0,0], [1,3,3], [78, 78, 0]]) #0, 1, 2, 3
a = np.random.randint(4,size=(4,5))
val_new[a] # (4,5,3) shaped array
>>array([[[ 0, 0, 0],
[ 78, 78, 0],
[ 78, 78, 0],
[ 12, 125, 13],
[ 0, 0, 0]],
....
[[ 12, 125, 13],
[ 12, 125, 13],
[ 0, 0, 0],
[ 12, 125, 13],
[ 0, 0, 0]]])
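Applied to the setup in the question, a lookup table could be sketched as follows. This assumes, as in the question, that the three channels of a are identical and hold small non-negative integers; the name lut and the small array size are only for illustration. Values without a replacement map to themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(5, size=(4, 5, 1))    # small stand-in for the (2000, 2000, 1) array
a = np.repeat(a, 3, axis=2)            # (4, 5, 3), identical channels

# Row v of the lookup table holds the triple that value v maps to.
# Start from the identity mapping v -> [v, v, v] ...
lut = np.repeat(np.arange(5)[:, None], 3, axis=1)
# ... then overwrite the rows that should be replaced
lut[0] = [12, 125, 13]   # [0, 0, 0] -> [12, 125, 13]
lut[3] = [78, 78, 0]     # [3, 3, 3] -> [78, 78, 0]

# Index the table with one channel; the result has shape (4, 5, 3)
out = lut[a[:, :, 0]]
print(out.shape)
```

The whole replacement is one indexing operation, so no Python-level loop over pixels is needed.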

Multiply each element of a numpy array by a tuple

Given a 2D numpy image array with shape (height, width, 3), and BGR tuples as elements, I want to multiply each element by a kernel to extract the B/G/R channels individually. The blue kernel, for example, would be (1, 0, 0). Something like this:
# extract color channel
def extract_color_channel(image, kernel):
    channel = np.copy(image)
    height, width = image.shape[:2]
    for y in range(0, height):
        for x in range(0, width):
            channel[y, x] = image[y, x] * kernel
    return channel
# extract blue channel
def extract_blue(image):
    return extract_color_channel(image, (1, 0, 0))
What is the most efficient "numpy way" to do this?
With a sample array:
In [220]: arr = np.arange(5*5*3).reshape(5,5,3)
Basic indexing is the most efficient way (this will be a view)
In [221]: arr[:,:,0]
Out[221]:
array([[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57],
[60, 63, 66, 69, 72]])
The [1,0,0] list is not what you want. But you could cast it as a bool array.
In [222]: kernel = np.array([1,0,0],dtype=bool)
In [223]: kernel
Out[223]: array([ True, False, False], dtype=bool)
In [224]: arr[:,:,kernel].shape
Out[224]: (5, 5, 1)
In [225]: arr[:,:,kernel].squeeze()
Out[225]:
array([[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57],
[60, 63, 66, 69, 72]])
Notice that the shape with the boolean is still 3d. If you don't want that, then you'll need to reshape or squeeze that last dimension out. This indexing is slower since it makes a copy.
This boolean indexing is the equivalent of
In [226]: arr[:,:,[0]].shape
Out[226]: (5, 5, 1)
where [0] is the location of the 'true' value(s) in kernel.
You could also use a dot (matrix product):
In [228]: np.dot(arr,[1,0,0])
Out[228]:
array([[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57],
[60, 63, 66, 69, 72]])
It will be slower than indexing.
Element multiplication:
In [232]: arr*np.array([1,0,0])
Out[232]:
array([[[ 0, 0, 0],
[ 3, 0, 0],
[ 6, 0, 0],
[ 9, 0, 0],
[12, 0, 0]],
[[15, 0, 0],
[18, 0, 0],
....
[66, 0, 0],
[69, 0, 0],
[72, 0, 0]]])
In this multiplication the [1,0,0] behaves as though it were a (1,1,3) array, and broadcasts with the (n,n,3) just fine.
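If the goal is to keep the question's function signature, the broadcasting multiplication can replace both loops. This is a sketch under the assumption that image is an integer (height, width, 3) array; the loop version is reproduced only to check that the two agree.

```python
import numpy as np

def extract_color_channel_loop(image, kernel):
    # Pixel-by-pixel version from the question, kept for comparison
    channel = np.copy(image)
    height, width = image.shape[:2]
    for y in range(height):
        for x in range(width):
            channel[y, x] = image[y, x] * np.asarray(kernel)
    return channel

def extract_color_channel(image, kernel):
    # The (3,) kernel broadcasts against the trailing axis of (height, width, 3)
    return image * np.asarray(kernel, dtype=image.dtype)

image = np.arange(4 * 5 * 3, dtype=np.uint8).reshape(4, 5, 3)  # hypothetical image
blue = extract_color_channel(image, (1, 0, 0))

print(np.array_equal(blue, extract_color_channel_loop(image, (1, 0, 0))))  # True
```

Unlike image[:, :, 0], this keeps the (height, width, 3) shape with the other two channels zeroed, which matches what the original loop produces.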

Sum along axis in numpy array

I want to understand how this ndarray.sum(axis=) works. I know that axis=0 is for columns and axis=1 is for rows.
But in the case of 3 dimensions (3 axes) it's difficult to interpret the result below.
arr = np.arange(0,30).reshape(2,3,5)
arr
Out[1]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
arr.sum(axis=0)
Out[2]:
array([[15, 17, 19, 21, 23],
[25, 27, 29, 31, 33],
[35, 37, 39, 41, 43]])
arr.sum(axis=1)
Out[8]:
array([[15, 18, 21, 24, 27],
[60, 63, 66, 69, 72]])
arr.sum(axis=2)
Out[3]:
array([[ 10, 35, 60],
[ 85, 110, 135]])
Here, in this example of a 3-axis array of shape (2,3,5), there are 3 rows and 5 columns. But if I look at the array as a whole, it seems to have only two rows (each with 3 array elements).
Can anyone please explain how this sum works on arrays of 3 or more axes (dimensions)?
If you want to keep the dimensions you can specify keepdims:
>>> arr = np.arange(0,30).reshape(2,3,5)
>>> arr.sum(axis=0, keepdims=True)
array([[[15, 17, 19, 21, 23],
[25, 27, 29, 31, 33],
[35, 37, 39, 41, 43]]])
Otherwise the axis you sum along is removed from the shape. An easy way to keep track of this is using the numpy.ndarray.shape property:
>>> arr.shape
(2, 3, 5)
>>> arr.sum(axis=0).shape
(3, 5) # the first entry (index = axis = 0) dimension was removed
>>> arr.sum(axis=1).shape
(2, 5) # the second entry (index = axis = 1) was removed
You can also sum along multiple axes if you want (reducing the dimensionality by the number of specified axes):
>>> arr.sum(axis=(0, 1))
array([75, 81, 87, 93, 99])
>>> arr.sum(axis=(0, 1)).shape
(5, ) # first and second entry is removed
Here is another way to interpret this. You can consider a multi-dimensional array as a tensor, T[i][j][k], where i, j, k index axes 0, 1, 2 respectively.
T.sum(axis = 0) mathematically will be equivalent to: $R_{jk} = \sum_i T_{ijk}$
Similarly, T.sum(axis = 1): $R_{ik} = \sum_j T_{ijk}$
And, T.sum(axis = 2): $R_{ij} = \sum_k T_{ijk}$
In other words, the specified axis is the one summed over; for instance, with axis = 0 the first index is summed over. Written as a for loop:
result[j][k] = sum(T[i][j][k] for i in range(T.shape[0])) for all j,k
for axis = 1:
result[i][k] = sum(T[i][j][k] for j in range(T.shape[1])) for all i,k
etc.
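The loop description above can be checked directly against ndarray.sum. This is only a sketch for verification (the names sum0, sum1, sum2 are illustrative); the explicit loops are far slower than the built-in reduction.

```python
import numpy as np

T = np.arange(30).reshape(2, 3, 5)
I, J, K = T.shape

# axis=0: sum over i, result indexed by (j, k)
sum0 = np.array([[sum(T[i, j, k] for i in range(I)) for k in range(K)]
                 for j in range(J)])

# axis=1: sum over j, result indexed by (i, k)
sum1 = np.array([[sum(T[i, j, k] for j in range(J)) for k in range(K)]
                 for i in range(I)])

# axis=2: sum over k, result indexed by (i, j)
sum2 = np.array([[sum(T[i, j, k] for k in range(K)) for j in range(J)]
                 for i in range(I)])

print(np.array_equal(sum0, T.sum(axis=0)))  # True
print(np.array_equal(sum1, T.sum(axis=1)))  # True
print(np.array_equal(sum2, T.sum(axis=2)))  # True
```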
numpy displays a (2,3,5) array as 2 blocks of 3x5 arrays (3 rows, 5 columns). Or call them 'planes' (MATLAB would show it as 5 blocks of 2x3).
The numpy display also matches a nested list - a list of two sublists; each with 3 sublists. Each of those is 5 elements long.
In the 3x5 2d case, axis 0 sums along the size 3 dimension, resulting in a 5 element array. The descriptions 'sum over rows' or 'sum along columns' are a little vague in English. Focus on the results, the change in shape, and which values are being summed, not on the description.
Back to the 3d case:
With axis=0, it sums along the 1st dimension, effectively removing it, leaving us with a 3x5 array. 0+15=15, 1+16=17 etc.
Axis 1 condenses the size 3 dimension; the result is 2x5. 0+5+10=15, etc.
Axis 2 condenses the size 5 dimension; the result is 2x3. sum((0,1,2,3,4))=10, etc.
Your example is good, since the 3 dimensions are different, and it is easier to see which one was eliminated during the sum.
With 2d there's some ambiguity; 'sum over rows' - does that mean the rows are eliminated or retained? With 3d there's no ambiguity; with axis=0, you can only remove it, leaving the other 2.
The axis you specify is the one that is effectively removed. So given a shape of (2,3,5), axis 0 gives (3,5), axis 1 gives (2,5), etc. This extends to any number of dimensions.
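A quick way to confirm this shape rule is to compare each result's shape with the original shape minus the summed axis. The sketch below uses the (2,3,5) array from the question; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(30).reshape(2, 3, 5)
shape = arr.shape

for axis in range(arr.ndim):
    # Expected shape: the original shape with entry `axis` dropped
    expected = shape[:axis] + shape[axis + 1:]
    print(axis, arr.sum(axis=axis).shape, expected)

# A negative axis counts from the end, so -1 is the same as axis 2 here
last = arr.sum(axis=-1)
print(np.array_equal(last, arr.sum(axis=2)))  # True
```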
You seem to be confused by the output style of numpy arrays. The "row" of the output is almost always the last index, not the first. Example:
x=np.arange(1,4)
y=np.arange(10,31,10)
z=np.arange(100,301,100)
xy=x[:,None]+y[None,:]
xy
Out[100]:
array([[11, 21, 31],
[12, 22, 32],
[13, 23, 33]])
Notice the tens place increments on the row, not the column, even though y is the second index.
xyz=x[:,None,None]+y[None,:,None]+z[None,None,:]
xyz
Out[102]:
array([[[111, 211, 311],
[121, 221, 321],
[131, 231, 331]],
[[112, 212, 312],
[122, 222, 322],
[132, 232, 332]],
[[113, 213, 313],
[123, 223, 323],
[133, 233, 333]]])
Now the hundreds place increments in the row, even though z is the last index. This can be somewhat counter-intuitive to beginners.
Thus when you do np.sum(x,axis=-1) you will always sum over the "rows" as shown in the np.array([]) format. Looking at arr.sum(axis=2)[0,0], that's 0+1+2+3+4=10.
Think of a multi-dimensional array as a tree. Each dimension is a level in the tree. Each grouping at that level is a node. A sum along a specific axis (say axis=4) means coalescing (overlaying) all nodes at that level into a single node (under their respective parents). Sub-trees rooted at the overlaid nodes at that level are stacked on top of each other. All overlapping nodes' values are added together.
Picture: https://ibb.co/dg3P3w
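One way to read the "overlaying" idea in code, as a rough sketch: summing along an axis is the same as taking the sub-arrays you get by fixing that axis and adding them element by element. The names overlay0, overlay1 and overlay2 are made up for this example.

```python
import numpy as np

arr = np.arange(30).reshape(2, 3, 5)

# axis=0: stack the two (3, 5) blocks on top of each other and add
overlay0 = arr[0] + arr[1]

# axis=1: add the three (2, 5) slices obtained by fixing the middle index
overlay1 = arr[:, 0] + arr[:, 1] + arr[:, 2]

# axis=2: add the five (2, 3) slices obtained by fixing the last index
overlay2 = sum(arr[:, :, k] for k in range(arr.shape[2]))

print(np.array_equal(overlay0, arr.sum(axis=0)))  # True
print(np.array_equal(overlay1, arr.sum(axis=1)))  # True
print(np.array_equal(overlay2, arr.sum(axis=2)))  # True
```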
It's maybe a little easier to see with a simpler 3D array. After filling the array with ones, the numbers in the sums come out to be the size of the particular dimension summed over! The other two dimensions in each case are left intact.
arr = np.arange(0,60).reshape(4,3,5)
arr
Out[10]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]],
[[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44]],
[[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
arr=arr*0+1
arr
Out[12]:
array([[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]],
[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]],
[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]],
[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]]])
arr0=arr.sum(axis=0,keepdims=True)
arr2=arr.sum(axis=2,keepdims=True)
arr1=arr.sum(axis=1,keepdims=True)
arr0
Out[20]:
array([[[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4]]])
arr1
Out[21]:
array([[[3, 3, 3, 3, 3]],
[[3, 3, 3, 3, 3]],
[[3, 3, 3, 3, 3]],
[[3, 3, 3, 3, 3]]])
arr2
Out[22]:
array([[[5],
[5],
[5]],
[[5],
[5],
[5]],
[[5],
[5],
[5]],
[[5],
[5],
[5]]])
