Numpy: Multidimensional index. Row by row with no loop - python

I have an Nx2x2x2 array called A. I also have an Nx2 array called B, which tells me, for each n, the position in the last two dimensions of A that I am interested in. I currently get an Nx2 array either with a loop (C in the code below) or with a list comprehension (D in the code below). I want to know whether vectorization would give any time savings and, if so, how to vectorize this task.
My current attempt at vectorization (E in the code below) uses B to index every submatrix of A, so it does not return what I want. I want E to return the same as C or D.
Input:
import numpy as np
A=np.reshape(np.arange(0,32),(4,2,2,2))
print("A")
print(A)
B=np.concatenate((np.array([0,1,0,1])[:,np.newaxis],np.array([1,1,0,0])[:,np.newaxis]),axis=1)
print("B")
print(B)
C=np.empty(shape=(4,2))
for n in range(0, 4):
    C[n,:]=A[n,:,B[n,0],B[n,1]]
print("C")
print(C)
D = np.array([A[n,:,B[n,0],B[n,1]] for n in range(0, 4)])
print("D")
print(D)
E=A[:,:,B[:,0],B[:,1]]
print("E")
print(E)
Output:
A
[[[[ 0  1]
   [ 2  3]]
  [[ 4  5]
   [ 6  7]]]
 [[[ 8  9]
   [10 11]]
  [[12 13]
   [14 15]]]
 [[[16 17]
   [18 19]]
  [[20 21]
   [22 23]]]
 [[[24 25]
   [26 27]]
  [[28 29]
   [30 31]]]]
B
[[0 1]
 [1 1]
 [0 0]
 [1 0]]
C
[[ 1.  5.]
 [11. 15.]
 [16. 20.]
 [26. 30.]]
D
[[ 1  5]
 [11 15]
 [16 20]
 [26 30]]
E
[[[ 1  3  0  2]
  [ 5  7  4  6]]
 [[ 9 11  8 10]
  [13 15 12 14]]
 [[17 19 16 18]
  [21 23 20 22]]
 [[25 27 24 26]
  [29 31 28 30]]]

The complicated slicing operation could be done in a vectorized manner like so -
shp = A.shape
out = A.reshape(shp[0],shp[1],-1)[np.arange(shp[0]),:,B[:,0]*shp[3] + B[:,1]]
You are using the first and second columns of B to index into the third and fourth dimensions of the 4D input array A. In other words, you are indexing the 4D array as if its last two dimensions were fused into one, so you need to use B to compute linear indices into that fused dimension: B[:,0]*shp[3] + B[:,1]. Before doing that, you reshape A into a 3D array with A.reshape(shp[0],shp[1],-1).
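The linear index B[:,0]*shp[3] + B[:,1] is the row-major (C-order) flat position inside the fused last two dimensions. As a cross-check (not part of the original answer), NumPy's np.ravel_multi_index computes the same index from the (row, column) pairs and the trailing shape. A minimal sketch on the small A and B from the question:

```python
import numpy as np

A = np.arange(32).reshape(4, 2, 2, 2)
B = np.array([[0, 1], [1, 1], [0, 0], [1, 0]])
shp = A.shape

# Hand-written linear index into the fused last two dimensions (C order)
lin_manual = B[:, 0] * shp[3] + B[:, 1]
# The same index computed by NumPy from the (row, col) pairs and the trailing shape
lin_numpy = np.ravel_multi_index((B[:, 0], B[:, 1]), shp[2:])

# Fuse the last two axes, then pick one fused position per leading index n
out = A.reshape(shp[0], shp[1], -1)[np.arange(shp[0]), :, lin_numpy]
print(lin_manual, lin_numpy)
print(out)
```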
Verify results for a generic 4D array case -
In [104]: A = np.random.rand(6,3,4,5)
...: B = np.concatenate((np.random.randint(0,4,(6,1)),np.random.randint(0,5,(6,1))),1)
...:
In [105]: C=np.empty(shape=(6,3))
...: for n in range(0, 6):
...:     C[n,:]=A[n,:,B[n,0],B[n,1]]
...:
In [106]: shp = A.shape
...: out = A.reshape(shp[0],shp[1],-1)[np.arange(shp[0]),:,B[:,0]*shp[3] + B[:,1]]
...:
In [107]: np.allclose(C,out)
Out[107]: True
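A possibly simpler alternative (not from the original answer) is to skip the reshape and index A directly with integer arrays for dimensions 0, 2 and 3 and a full slice for dimension 1. Because the integer-array indices are separated by a slice, NumPy places the broadcast index dimension first, so the result has shape (N, 2) for the question's data, matching C. A minimal sketch, checked against the loop on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 3, 4, 5))
# One (row, col) position in the last two dimensions per leading index
B = np.column_stack((rng.integers(0, 4, 6), rng.integers(0, 5, 6)))

# Loop reference from the question
C = np.empty((6, 3))
for n in range(6):
    C[n, :] = A[n, :, B[n, 0], B[n, 1]]

# Pair each leading index n with its own (B[n, 0], B[n, 1]);
# the slice keeps all of dimension 1
out = A[np.arange(A.shape[0]), :, B[:, 0], B[:, 1]]
print(out.shape)
```

Whether this or the reshape version is faster for your array sizes is something to measure (for example with timeit); both avoid the Python-level loop.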

Related

How to sort 3D array's inner 2d arrays with numpy?

How can I sort each 2D array inside a 3D array?
I have an array of shape (10, 1000, 3).
I am trying to do a reverse sort by the last column in each subarray.
Looping solution:
result = []
for subarr2d in arr3d:
    subarr2d_sorted = subarr2d[subarr2d[:, -1].argsort()][::-1]
    result.append(subarr2d_sorted)
I want to do this in NumPy alone. Is it even possible?
You could use numpy.sort and choose the inner axis to sort along.
import numpy as np
result = np.random.randint(20, size=(3, 4, 3))
print(np.sort(result, axis=2))
# outputs
# [[[ 8 11 11]
# [ 2 11 16]
# [ 3 7 14]
# [ 7 12 12]]
# [[ 8 10 16]
# [ 0 6 14]
# [ 0 16 17]
# [ 0 14 19]]
# [[ 2 4 5]
# [ 3 4 4]
# [ 1 1 11]
# [ 0 8 17]]]
print(np.sort(result, axis=1))
#outputs
# [[[ 7 2 7]
# [11 3 11]
# [12 8 11]
# [16 12 14]]
# [[14 6 0]
# [16 8 0]
# [17 14 0]
# [19 16 10]]
# [[ 2 1 0]
# [ 3 4 1]
# [11 4 4]
# [17 8 5]]]
To get the result in descending order you could use -np.sort(-result, axis=2), or reverse the ascending result with numpy.flip, e.g. np.flip(np.sort(result, axis=2), axis=2).
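Note that np.sort sorts each row (axis=2) or column (axis=1) independently, so the rows of each 2D subarray are not kept intact, unlike the loop in the question. To reorder whole rows by their last column in descending order, as the loop does, one option is to argsort the last column of every subarray at once and gather the rows with np.take_along_axis (available since NumPy 1.15). A sketch, assuming arr3d has shape (n, rows, cols); reverse_sort_rows_by_last_column is just an illustrative name:

```python
import numpy as np

def reverse_sort_rows_by_last_column(arr3d):
    # Indices that sort each subarray's last column: shape (n, rows)
    order = arr3d[:, :, -1].argsort(axis=1)
    # Reverse for descending order, like [::-1] in the loop
    order = order[:, ::-1]
    # Add a trailing axis so the row indices broadcast across all columns
    return np.take_along_axis(arr3d, order[:, :, np.newaxis], axis=1)

rng = np.random.default_rng(1)
arr3d = rng.random((10, 1000, 3))

# Loop version from the question, for comparison
looped = np.array([sub[sub[:, -1].argsort()][::-1] for sub in arr3d])
vectorized = reverse_sort_rows_by_last_column(arr3d)
```

With tied values in the last column, the relative order of tied rows depends on the sort algorithm; passing kind="stable" to argsort makes it deterministic.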

Dividing numpy 2d array into equal sections

I have a 30*30 px image that I converted to a NumPy array. Now I want to divide this 30*30 image into 9 equal pieces (imagine a tic-tac-toe board). I wrote the code below for that purpose, but the problem is that it has two nested loops, and in Python that means a straight ticket to lower-performance town (especially for large amounts of data). Is there a better way of doing this using NumPy and NumPy indexing?
import numpy as np

# factor says the image should be divided into 3*3 = 9 sections (3 rows, 3 columns)
def section(img, factor=3):
    secs = []
    # Check that the image can actually be divided into equal sections
    if img.shape[0] % factor != 0:
        return False
    # Number of pixels in each row and column of a section
    pix_num = int(img.shape[0] / factor)
    ptr_x_a = 0
    ptr_x_b = pix_num  # slice ends are exclusive, so the end pointer starts at pix_num, not pix_num - 1
    for i in range(factor):
        ptr_y_a = 0
        ptr_y_b = pix_num
        for j in range(factor):
            secs.append(img[ptr_x_a:ptr_x_b, ptr_y_a:ptr_y_b])
            ptr_y_a += pix_num
            ptr_y_b += pix_num
        ptr_x_a += pix_num
        ptr_x_b += pix_num
    return np.array(secs, dtype="int16")
P.S.: You don't need to read the whole code; just know that it uses pointers to select different areas of the image.
P.S. 2: See the image below to get an idea of what's happening. It is a 6*6 image divided into 9 pieces (factor = 3).
If you have an array of shape (K * M, K * N), you can transform it into something of shape (K * K, M, N) using reshape and transpose. For example, if you have K = M = N = 3, you want to transform
>>> a = np.arange(81).reshape(9, 9)
into
[[[ 0,  1,  2],
  [ 9, 10, 11],
  [18, 19, 20]],
 [[ 3,  4,  5],
  [12, 13, 14],
  [21, 22, 23]],
 [[ 6,  7,  8],
  [15, 16, 17],
  [24, 25, 26]],
 ...]
The idea is that you need to get the elements lined up in memory in the order shown here (i.e. 0, 1, 2, 9, 10, 11, 18, ...). You can do this by adding the appropriate auxiliary dimensions and transposing:
b = a.reshape(K, M, K, N)
c = b.transpose(0, 2, 1, 3)
d = c.reshape(-1, M, N)
As a one-liner:
a.reshape(K, M, K, N).transpose(0, 2, 1, 3).reshape(-1, M, N)
The order of the transpose determines the order of the blocks. The first two dimensions, 0, 2, represent the fact that your inner loop iterates the columns faster than the rows. If you wanted to arrange the blocks by column (iterate the rows faster), you could do
c = b.transpose(2, 0, 1, 3)
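Reshaping a contiguous array does not change the memory layout of the elements, and transposing only returns a view with permuted strides; it is the final reshape of the non-contiguous transposed array that copies the data into the new block order.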
In your particular example, K = 3 and M = N = 10. The code above does not change in any way besides that.
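Putting the pieces together for the square image in the question, here is a minimal sketch that returns the factor*factor blocks in row-major order and checks them against explicit slicing. split_into_blocks is a hypothetical helper name, and raising ValueError instead of returning False (as the question's section() does) is a design choice of this sketch:

```python
import numpy as np

def split_into_blocks(img, factor=3):
    """Split a 2D array into factor x factor equal blocks, row by row."""
    rows, cols = img.shape
    if rows % factor or cols % factor:
        raise ValueError("image size must be divisible by factor")
    M, N = rows // factor, cols // factor  # block height and width
    # (K*M, K*N) -> (K, M, K, N) -> (K, K, M, N) -> (K*K, M, N)
    return (img.reshape(factor, M, factor, N)
               .transpose(0, 2, 1, 3)
               .reshape(-1, M, N))

img = np.arange(30 * 30).reshape(30, 30)
blocks = split_into_blocks(img, 3)

# Reference: slice each 10x10 block explicitly, row of blocks by row of blocks
expected = np.array([img[r:r + 10, c:c + 10]
                     for r in range(0, 30, 10)
                     for c in range(0, 30, 10)])
print(blocks.shape)
```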
As an aside, your loops can be improved by iterating directly over the indices you want rather than over auxiliary quantities, and by pre-allocating the output:
result = np.zeros((factor * factor, pix_num, pix_num), dtype=img.dtype)
n = 0
for r in range(0, img.shape[0], pix_num):
    for c in range(0, img.shape[1], pix_num):
        result[n, :, :] = img[r:r + pix_num, c:c + pix_num]
        n += 1
import numpy as np
a = np.arange(36)
a.resize(6, 6)
print(a)
# Split into 3 bands of rows, then split each band into 3 blocks of columns
b = list(map(lambda x: np.array_split(x, 3, axis=1), np.array_split(a, 3, axis=0)))
print(np.array(b).reshape(9,2,2))
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 31 32 33 34 35]]
[[[ 0  1]
  [ 6  7]]
 [[ 2  3]
  [ 8  9]]
 [[ 4  5]
  [10 11]]
 [[12 13]
  [18 19]]
 [[14 15]
  [20 21]]
 [[16 17]
  [22 23]]
 [[24 25]
  [30 31]]
 [[26 27]
  [32 33]]
 [[28 29]
  [34 35]]]
You can do something like this to get one section:
sec = img[:10]
sec = list(zip(*sec))[:10]
sec = list(zip(*sec))
This picks out the first 10x10 section, as a list of row tuples. With NumPy indexing, the same section as an array is simply img[:10, :10].

Query about picking values from numpy array

I have a numpy array of shape (206, 482, 3). I wanted to pick the 1st channel so I used name_of_array[:][:][0] but apparently that doesn't select the 1st channel.
I think name_of_array[:,:,0] picks the 1st channel, but I don't understand why. Why is name_of_array[:][:][0] different from name_of_array[:,:,0]?
It's important to understand what each operation does. To see this, break the expression up from left to right. Rewriting it may make this clearer:
x[:][:][0] -> ( ( x[:] )[:] )[0] # Both are valid and equivalent Python syntax
So basically, we apply [:] to x, then [:] to the result, then [0] to that result. What does x[:] do? It selects the whole of x (for a NumPy array it returns a view of x, for a Python list a shallow copy). Thus
( (x[:])[:] )[0] == ( (x)[:] )[0] == (x[:])[0] == x[0]
This is of course, not what you expected. On the other hand,
x[:, :, 0]
returns, in a single step, column 0 of every row of every frame (I'm treating the index as [frame, row, col]).
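A quick check with an array of the same shape as in the question (the values are arbitrary) makes the difference visible: chaining [:] keeps the full array, so the trailing [0] selects the first entry along axis 0, not the first channel.

```python
import numpy as np

x = np.arange(206 * 482 * 3).reshape(206, 482, 3)

chained = x[:][:][0]   # [:] twice selects everything, then [0] indexes axis 0
channel = x[:, :, 0]   # one index tuple: all rows, all columns, channel 0

print(np.shares_memory(x[:], x))  # True: x[:] is a view, not a copy
print(chained.shape)   # (482, 3): the first "frame"
print(channel.shape)   # (206, 482): the first channel
```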
The short answer: because that's the syntax (see NumPy basic indexing).
arr[:] == arr # full slice of all dimensions of the array
arr[:][:] == arr # full slice of a full slice of all dimensions
arr[:][:][0] == arr[0] # the first two [:] select everything, so only [0] matters
vs
arr[:,:,0] # all of the 1st dim, all of the 2nd dim, index 0 of the 3rd dim
One way to figure things like this out yourself is to make a simplified example and experiment (heeding How to debug small programs):
import numpy as np
res = np.arange(4 * 3 * 2).reshape(4,3,2)
print(":,:,:")
print(res[:, :, :])
print("\n1:2,1:2,:")
print(res[1:2, 1:2, :])
print("\n:,:,0")
print(res[:, :, 0])
print("\n:,:,1")
print(res[:,:,1])
Output:
# :,:,: == all of it
[[[ 0  1]
  [ 2  3]
  [ 4  5]]
 [[ 6  7]
  [ 8  9]
  [10 11]]
 [[12 13]
  [14 15]
  [16 17]]
 [[18 19]
  [20 21]
  [22 23]]]
# 1:2,1:2,:
[[[8 9]]]
# :,:,0
[[ 0  2  4]
 [ 6  8 10]
 [12 14 16]
 [18 20 22]]
# :,:,1
[[ 1  3  5]
 [ 7  9 11]
 [13 15 17]
 [19 21 23]]
There are lots of questions about NumPy slicing on SO, some of which are worth studying to advance your knowledge (they were suggested as probable duplicates, but they do not quite address this confusion):
Numpy extract submatrix
Selecting specific rows and columns from NumPy array

Sorting a 3-dimensional array by a single row/column in Python - using vectorisation

I have a 3D numpy array arr with shape (x, y, z); in this example the shape is (10000, 99, 2).
I.e. there are 10000 instances of 99 x 2 two-dimensional arrays.
I would like to sort the whole array by the values along the z index, i.e. rank the 99 variables down the rows of each column, for every instance.
Is there an easy way to do this using vectorisation? I'm aware I could loop over the 10000 instances, sorting each 2D array as below and combining the results into a 3D output.
np.unique(arr[:,0], return_inverse=True)
np.unique(arr[:,1], return_inverse=True)
Given that I have 10000 outer instances, however, I am interested in avoiding loops and processing all 10000 instances more efficiently.
I am not sure I completely understand what you mean by the z index, but you can try:
np.sort(arr,axis=1)
An example 3-d input:
import numpy as np
rng_seed = 42 # control reproducibility
rng = np.random.RandomState(rng_seed)
arr=rng.randint(0,40,20).reshape(2,5,2)
The input looks like:
[[[38 28]
  [14  7]
  [20 38]
  [18 22]
  [10 10]]
 [[23 35]
  [39 23]
  [ 2 21]
  [ 1 23]
  [29 37]]]
Applying:
arr1=np.sort(arr,axis=1)
print (arr1)
This sorts each column independently within each instance:
[[[10  7]
  [14 10]
  [18 22]
  [20 28]
  [38 38]]
 [[ 1 21]
  [ 2 23]
  [23 23]
  [29 35]
  [39 37]]]
If you want the indices that would sort each column instead, try:
arr_rank = arr.argsort(axis=1)
print (arr_rank)
The output is:
[[[4 1]
  [1 4]
  [3 3]
  [2 0]
  [0 2]]
 [[3 2]
  [2 1]
  [0 3]
  [4 0]
  [1 4]]]
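argsort returns the positions that would sort each column, not the rank of each value. If ranks are what you need, one common approach is to apply argsort twice. A sketch using the input printed above (written out literally so it does not depend on the random generator). With kind="stable", tied values get ranks in order of appearance, so, unlike np.unique(..., return_inverse=True) in the question, ties receive distinct ranks:

```python
import numpy as np

arr = np.array([[[38, 28], [14, 7], [20, 38], [18, 22], [10, 10]],
                [[23, 35], [39, 23], [2, 21], [1, 23], [29, 37]]])

# Positions that would sort each column within each instance
order = arr.argsort(axis=1, kind="stable")
# Ranking those positions gives each value's rank within its column
ranks = order.argsort(axis=1, kind="stable")
print(ranks)
```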

Apply function to vectors in 3D numpy array

I have a question about how to apply a function to vectors in a 3D numpy array.
My problem is the following: let's say I have an array like this one:
a = np.arange(24)
a = a.reshape([4,3,2])
I want to apply a function to all following vectors to modify them:
[0 6], [1 7], [2 8], [4 10], [3 9] ...
What is the best method to use? My array is quite big, so looping over two of the three dimensions is quite slow...
Thanks in advance!
You can use the function np.apply_along_axis. From the docs:
Apply a function to 1-D slices along the given axis.
For example:
>>> import numpy as np
>>> a = np.arange(24)
>>> a = a.reshape([4,3,2])
>>>
>>> def my_func(a):
...     print("vector: " + str(a))
...     return sum(a) // len(a)
...
>>> np.apply_along_axis(my_func, 0, a)
vector: [ 0 6 12 18]
vector: [ 1 7 13 19]
vector: [ 2 8 14 20]
vector: [ 3 9 15 21]
vector: [ 4 10 16 22]
vector: [ 5 11 17 23]
array([[ 9, 10],
       [11, 12],
       [13, 14]])
In the example above I've used axis 0. If you need to apply the function along n axes, you can call it n times.
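np.apply_along_axis still calls the Python function once per 1-D slice, so it saves typing rather than loop overhead. When the operation can be written with NumPy functions that accept an axis argument, calling them directly is usually much faster. For the averaging example above, a sketch (floor division is chosen only to match my_func):

```python
import numpy as np

a = np.arange(24).reshape(4, 3, 2)

def my_func(v):
    # Integer mean of one 1-D slice, as in the example above
    return v.sum() // len(v)

via_apply = np.apply_along_axis(my_func, 0, a)
# Same reduction done in one vectorized call over axis 0
vectorized = a.sum(axis=0) // a.shape[0]
# A float mean is also available directly
means = a.mean(axis=0)
```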
