Query about picking values from numpy array - python

I have a numpy array of shape (206, 482, 3). I wanted to pick the 1st channel, so I used name_of_array[:][:][0], but apparently that doesn't select the 1st channel.
I think name_of_array[:,:,0] picks the 1st channel, but I don't understand why. Why is name_of_array[:][:][0] != name_of_array[:,:,0]?

It's important to understand what each piece does. To do that, break the expression up left to right. Rewriting it may make this clearer:
x[:][:][0] -> ( ( x[:] )[:] )[0] # Both are valid and equivalent Python syntax
So we apply [:] to x, then [:] to that result, then [0] to that result. What does x[:] do? For a NumPy array it just returns a view of the whole array (for a list it would be a copy), so it behaves exactly like x for further indexing. Thus
( (x[:])[:] )[0] == ( (x)[:] )[0] == (x[:])[0] == x[0]
This is, of course, not what you expected. On the other hand,
x[:, :, 0]
returns, in one step, element 0 along the last axis for all rows of all frames (I'm treating the index order as [frame, row, col]), i.e. the 1st channel you wanted.
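A quick way to see the difference is to compare shapes and check the equalities directly (a minimal check, reusing the (206, 482, 3) shape from the question):
import numpy as np

x = np.zeros((206, 482, 3))
print(np.array_equal(x[:][:][0], x[0]))   # True: the chained [:] do nothing
print(x[0].shape)                         # (482, 3)  -> the first "frame"
print(x[:, :, 0].shape)                   # (206, 482) -> the first channel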

The short answer: because that's the syntax (see the NumPy basics indexing documentation).
arr[:] == arr # full slice of all dimensions of the array
arr[:][:] == arr # full slice of a full slice of all dimensions
arr[:][:][0] == arr[0] # the first two [:] slice everything, so only the final [0] has an effect
vs
arr[:,:,0] # slice all of 1st dim, slice all of 2nd dim, get 0th of 3rd arr
One way to figure things like this out yourself is to make a simplified example and experiment (heeding How to debug small programs):
import numpy as np
res = np.arange(4 * 3 * 2).reshape(4,3,2)
print(":,:,:")
print(res[:, :, :])
print("\n1:2,1:2,:")
print(res[1:2, 1:2, :])
print("\n:,:,0")
print(res[:, :, 0])
print("\n:,:,1")
print(res[:,:,1])
Output:
# :,:,: == all of it
[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]

 [[12 13]
  [14 15]
  [16 17]]

 [[18 19]
  [20 21]
  [22 23]]]
# 1:2,1:2,:
[[[8 9]]]
# :,:,0
[[ 0  2  4]
 [ 6  8 10]
 [12 14 16]
 [18 20 22]]
# :,:,1
[[ 1  3  5]
 [ 7  9 11]
 [13 15 17]
 [19 21 23]]
There are lots of questions about numpy slicing on SO, some of which are worth studying to advance your knowledge (they were suggested as probable duplicates, but they do not address this particular confusion directly):
Numpy extract submatrix
Selecting specific rows and columns from NumPy array

Numpy array in-place add: values not summed when the same row is selected multiple times

Suppose I have a matrix; I want to select a few rows and add another array of the correct shape to them in place. The problem is that when a row is selected multiple times, the values from the other array are not summed:
Example:
I have a 5x3 matrix:
>>> import numpy as np
>>> x = np.arange(15).reshape((5,3))
>>> print(x)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]]
I want to select a few rows, and add values:
>>> x[np.array([[1,1],[2,3]])] # row 1 is selected twice
[[[ 3  4  5]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
>>> add_value = np.random.randint(0,10,(2,2,3))
[[[6 1 2]    # add to row 1
  [9 8 5]]   # add to row 1 again!

 [[5 0 5]    # add to row 2
  [1 9 3]]]  # add to row 3
>>> x[np.array([[1,1],[2,3]])] += add_value
>>> print(x)
[[ 0  1  2]
 [12 12 10]   # [12,12,10] = [3,4,5] + [9,8,5]
 [11  7 13]
 [10 19 14]
 [12 13 14]]
As shown above, row 1 ends up as [12,12,10] = [3,4,5] + [9,8,5]; the [6,1,2] contribution is lost rather than being summed in as well. Are there any solutions? Thanks!
This behavior is described in the numpy documentation, near the bottom of this page, under "assigning values to indexed arrays":
https://numpy.org/doc/stable/user/basics.indexing.html#basics-indexing
Quoting:
Unlike some of the references (such as array and mask indices) assignments are always made to the original data in the array (indeed, nothing else would make sense!). Note though, that some actions may not work as one may naively expect. This particular example is often surprising to people:
>>> x = np.arange(0, 50, 10)
>>> x
array([ 0, 10, 20, 30, 40])
>>> x[np.array([1, 1, 3, 1])] += 1
>>> x
array([ 0, 11, 20, 31, 40])
Where people expect that the 1st location will be incremented by 3. In fact, it will only be incremented by 1. The reason is that a new array is extracted from the original (as a temporary) containing the values at 1, 1, 3, 1, then the value 1 is added to the temporary, and then the temporary is assigned back to the original array. Thus the value of the array at x[1] + 1 is assigned to x[1] three times, rather than being incremented 3 times.
Just want to share what @hpaulj suggested, which uses np.add.at:
>>> import numpy as np
>>> x = np.arange(15).reshape((5,3))
>>> select = np.array([[1,1],[2,3]])
>>> add_value = np.array([[[6,1,2],[9,8,5]],[[5,0,5],[1,9,3]]])
>>> np.add.at(x, select.flatten(), add_value.reshape(-1, add_value.shape[-1]))
>>> print(x)
[[ 0  1  2]
 [18 13 12]
 [11  7 13]
 [10 19 14]
 [12 13 14]]
Now row 1 is [18,13,12], which is the sum of [3,4,5], [6,1,2] and [9,8,5].
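For the simple example quoted from the documentation above, np.add.at gives the unbuffered accumulation one would naively expect:
>>> import numpy as np
>>> x = np.arange(0, 50, 10)
>>> np.add.at(x, [1, 1, 3, 1], 1)   # index 1 really is incremented 3 times
>>> x
array([ 0, 13, 20, 31, 40])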

3D slice from 3D array

I have a large 3D, N x N x N, numpy array with a value at each index in the array.
I want to be able to take cubic slices from the array using a center point:
def take_slice(large_array, center_point):
    ...
    return cubic_slice_from_center
To illustrate, I want cubic_slice_from_center to come back in the following shape, where cubic_slice_from_center[1][1][1] would be the value of the center point used to generate the slice:
print(cubic_slice_from_center)
array([[[0.32992015, 0.30037145, 0.04947877],
        [0.0158681 , 0.26743224, 0.49967057],
        [0.04274621, 0.0738851 , 0.60360489]],

       [[0.78985965, 0.16111745, 0.51665212],
        [0.08491344, 0.30240689, 0.23544363],
        [0.47282742, 0.5777977 , 0.92652398]],

       [[0.78797628, 0.98634545, 0.17903971],
        [0.76787071, 0.29689657, 0.08112121],
        [0.08786254, 0.06319838, 0.27050039]]])
I looked at a couple of ways to do this. One way was the following:
def get_cubic_slice(space, slice_center_x, slice_center_y, slice_center_z):
    return space[slice_center_x-1:slice_center_x+2,
                 slice_center_y-1:slice_center_y+2,
                 slice_center_z-1:slice_center_z+2]
This works so long as the cubic slice is not on the edge, but if the center point is on an edge it returns an empty or truncated array (the -1 start index wraps around to the other end of the axis)!
Sometimes, the center point of the slice will be on the edge of the 3D numpy array. When this occurs, rather than return nothing, I would like to return the values of the slice of cubic space that are within the bounds of the space and, where the slice would be out of bounds, fill the return array with np.nan values.
For example, for a 20 x 20 x 20 space, with indices 0-19 for the x, y and z axes, I would like the get_cubic_slice function to return the following kind of result for the point (0,5,5):
print(get_cubic_slice(space,0,5,5))
array([[[np.nan, np.nan, np.nan],
        [np.nan, np.nan, np.nan],
        [np.nan, np.nan, np.nan]],

       [[0.78985965, 0.16111745, 0.51665212],
        [0.08491344, 0.30240689, 0.23544363],
        [0.47282742, 0.5777977 , 0.92652398]],

       [[0.78797628, 0.98634545, 0.17903971],
        [0.76787071, 0.29689657, 0.08112121],
        [0.08786254, 0.06319838, 0.27050039]]])
What would be the best way to do this with numpy?
x = np.arange(27).reshape(3,3,3)
print(x)
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
print(x[1][1][2])
14
print(x[1:, 0:2, 1:4])
[[[10 11]
  [13 14]]

 [[19 20]
  [22 23]]]
This is how basic indexing and slicing work on a 3D array.
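The answer above only shows basic slicing and does not handle the edge case from the question. One way to get the nan-padded behaviour (a sketch, assuming a float-valued array and a fixed 3x3x3 window; the name get_cubic_slice follows the question) is to clip the requested window to the array bounds and copy it into a nan-filled output:
import numpy as np

def get_cubic_slice(space, cx, cy, cz):
    """Return the 3x3x3 block centred on (cx, cy, cz); out-of-bounds cells stay np.nan."""
    out = np.full((3, 3, 3), np.nan)
    lo = np.array([cx, cy, cz]) - 1          # requested window start (may be negative)
    hi = lo + 3                              # requested window end (may exceed the shape)
    src_lo = np.clip(lo, 0, space.shape)     # part of the window that lies inside `space`
    src_hi = np.clip(hi, 0, space.shape)
    dst_lo = src_lo - lo                     # where that part lands inside the 3x3x3 output
    dst_hi = dst_lo + (src_hi - src_lo)
    out[dst_lo[0]:dst_hi[0], dst_lo[1]:dst_hi[1], dst_lo[2]:dst_hi[2]] = \
        space[src_lo[0]:src_hi[0], src_lo[1]:src_hi[1], src_lo[2]:src_hi[2]]
    return out

space = np.random.rand(20, 20, 20)
print(get_cubic_slice(space, 0, 5, 5))       # the first 3x3 plane is all nan
For the (0, 5, 5) example, the x = -1 plane falls outside the array, so the first plane of the result is filled with nan, matching the expected output shown in the question.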

Sorting a 3-dimensional array by a single row/column in Python - using vectorisation

I have a 3D numpy array arr of shape (x, y, z); in this example the shape is (10000, 99, 2).
I.e. we have 10000 instances of 99 x 2 two-dimensional arrays.
I would like to sort the whole array by the values along the z index, i.e. ranking the 99 values within each column, for every instance.
Is there an easy way to do this using vectorisation? I'm aware I could loop over the 10000 instances, sorting each 2D array like below and combining the results into a 3D output.
np.unique(arr[:,0], return_inverse=True)
np.unique(arr[:,1], return_inverse=True)
Given that I have 10000 outer instances, I am however interested in avoiding loops and sorting all 10000 instances in a more efficient manner.
I am not sure I understand the z index part completely, but you can try:
np.sort(arr,axis=1)
An example 3-d input:
import numpy as np
rng_seed = 42 # control reproducibility
rng = np.random.RandomState(rng_seed)
arr=rng.randint(0,40,20).reshape(2,5,2)
The input looks like:
[[[38 28]
  [14  7]
  [20 38]
  [18 22]
  [10 10]]

 [[23 35]
  [39 23]
  [ 2 21]
  [ 1 23]
  [29 37]]]
Applying:
arr1 = np.sort(arr, axis=1)
print(arr1)
Gives you the array sorted within each column of each instance:
[[[10  7]
  [14 10]
  [18 22]
  [20 28]
  [38 38]]

 [[ 1 21]
  [ 2 23]
  [23 23]
  [29 35]
  [39 37]]]
If you want the rank of each value instead, try:
arr_rank = arr.argsort(axis=1)
print(arr_rank)
The output is:
[[[4 1]
  [1 4]
  [3 3]
  [2 0]
  [0 2]]

 [[3 2]
  [2 1]
  [0 3]
  [4 0]
  [1 4]]]
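As a quick sanity check (a sketch with random data at the shapes from the question), the axis=1 sort matches sorting each 99 x 2 instance in an explicit loop:
import numpy as np

arr = np.random.rand(10000, 99, 2)

vectorised = np.sort(arr, axis=1)                      # sort within each column of each instance
looped = np.stack([np.sort(a, axis=0) for a in arr])   # the per-instance loop it replaces
print(np.array_equal(vectorised, looped))              # True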

Numpy: Multidimensional index. Row by row with no loop

I have a Nx2x2x2 array called A. I also have a Nx2 array called B, which tells me the position in the last two dimensions of A that I am interested in. I am currently getting a Nx2 result, either by using a loop (as in C in the code below) or by using a list comprehension (as in D in the code below). I want to know whether there would be time gains from vectorization, and, if so, how to vectorize this task.
My current attempt at vectorization (E in the code below) uses B to index into every submatrix of A, so it does not return what I want. I want E to return the same as C or D.
Input:
A=np.reshape(np.arange(0,32),(4,2,2,2))
print("A")
print(A)
B=np.concatenate((np.array([0,1,0,1])[:,np.newaxis],np.array([1,1,0,0])[:,np.newaxis]),axis=1)
print("B")
print(B)
C=np.empty(shape=(4,2))
for n in range(0, 4):
    C[n,:]=A[n,:,B[n,0],B[n,1]]
print("C")
print(C)
D = np.array([A[n,:,B[n,0],B[n,1]] for n in range(0, 4)])
print("D")
print(D)
E=A[:,:,B[:,0],B[:,1]]
print("E")
print(E)
Output:
A
[[[[ 0  1]
   [ 2  3]]

  [[ 4  5]
   [ 6  7]]]


 [[[ 8  9]
   [10 11]]

  [[12 13]
   [14 15]]]


 [[[16 17]
   [18 19]]

  [[20 21]
   [22 23]]]


 [[[24 25]
   [26 27]]

  [[28 29]
   [30 31]]]]
B
[[0 1]
 [1 1]
 [0 0]
 [1 0]]
C
[[ 1.  5.]
 [11. 15.]
 [16. 20.]
 [26. 30.]]
D
[[ 1  5]
 [11 15]
 [16 20]
 [26 30]]
E
[[[ 1  3  0  2]
  [ 5  7  4  6]]

 [[ 9 11  8 10]
  [13 15 12 14]]

 [[17 19 16 18]
  [21 23 20 22]]

 [[25 27 24 26]
  [29 31 28 30]]]
The complicated slicing operation could be done in a vectorized manner like so -
shp = A.shape
out = A.reshape(shp[0],shp[1],-1)[np.arange(shp[0]),:,B[:,0]*shp[3] + B[:,1]]
You are using the first and second columns of B to index into the third and fourth dimensions of the input 4D array A. What this means is that you are effectively slicing the 4D array with its last two dimensions fused together, so you need to turn each (B[:,0], B[:,1]) pair into a linear index into that fused axis. Of course, before doing all that, you need to reshape A into a 3D array with A.reshape(shp[0],shp[1],-1).
Verify results for a generic 4D array case -
In [104]: A = np.random.rand(6,3,4,5)
     ...: B = np.concatenate((np.random.randint(0,4,(6,1)),np.random.randint(0,5,(6,1))),1)
     ...:
In [105]: C = np.empty(shape=(6,3))
     ...: for n in range(0, 6):
     ...:     C[n,:] = A[n,:,B[n,0],B[n,1]]
     ...:
In [106]: shp = A.shape
     ...: out = A.reshape(shp[0],shp[1],-1)[np.arange(shp[0]),:,B[:,0]*shp[3] + B[:,1]]
     ...:
In [107]: np.allclose(C,out)
Out[107]: True
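An alternative, not from the original answer but a standard advanced-indexing pattern, is to index axes 0, 2 and 3 with arrays and leave axis 1 as a slice; because the advanced indices are separated by the slice, their broadcast dimension comes first in the result, which is exactly the Nx2 shape wanted:
import numpy as np

A = np.reshape(np.arange(0, 32), (4, 2, 2, 2))
B = np.array([[0, 1], [1, 1], [0, 0], [1, 0]])

# advanced indices on axes 0, 2 and 3 broadcast to shape (4,); the sliced axis 1 is kept
out = A[np.arange(A.shape[0]), :, B[:, 0], B[:, 1]]
print(out)
# [[ 1  5]
#  [11 15]
#  [16 20]
#  [26 30]]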

How to iterate over initial dimensions of a Numpy array?

I have a Numpy array with shape [1000, 1000, 1000, 3], where the last dimension, of size 3, contains the components of 3D spatial vectors. How can I use nditer to iterate over each triplet? Like this:
for vec in np.nditer(my_array, op_flags=['writeonly', <???>]):
    vec = np.array(something)
I've addressed this question before, but here's a short example:
vec=np.arange(2*2*2*3).reshape(2,2,2,3)
it=np.ndindex(2,2,2)
for i in it:
    print(vec[i])
producing:
[0 1 2]
[3 4 5]
[6 7 8]
[ 9 10 11]
[12 13 14]
[15 16 17]
[18 19 20]
[21 22 23]
ndindex constructs a multi-index iterator over a dummy array of the shape you give it (here (2,2,2)) and yields one index tuple at a time.
So you can use ndindex as is, or use it as a model for constructing your own nditer.
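If the goal is to write a new triplet at each position, a simple sketch (using a small stand-in array instead of the full 1000^3 one, and a hypothetical constant vector as the value being assigned) is to loop over ndindex of the leading dimensions and assign whole length-3 slices:
import numpy as np

my_array = np.zeros((2, 2, 2, 3))             # small stand-in for the (1000, 1000, 1000, 3) array

for idx in np.ndindex(my_array.shape[:-1]):   # iterate over the leading dimensions only
    my_array[idx] = np.array([1.0, 2.0, 3.0]) # assign the whole triplet at this position

print(my_array[0, 0, 0])                      # [1. 2. 3.]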
