Slicing 3D numpy array using list of index - python

The objective is to slice a 3D array using a list of indices.
Here, the array has shape (2, 5, 5). For simplicity, let's assume the indices 0 to 4 are labelled A, B, C, D, E.
Assume we have a 3D array as below
array([[[44, 47, 64, 67, 67],
[ 9, 83, 21, 36, 87],
[70, 88, 88, 12, 58],
[65, 39, 87, 46, 88],
[81, 37, 25, 77, 72]],
[[ 9, 20, 80, 69, 79],
[47, 64, 82, 99, 88],
[49, 29, 19, 19, 14],
[39, 32, 65, 9, 57],
[32, 31, 74, 23, 35]]], dtype=int64)
The indices of interest are [1,3,4]; again, we label these B, D, E. The expected output when slicing the 3D array with these indices is as below
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]], dtype=int64)
However, slicing the array as below
import numpy as np
np.random.seed(0)
arr = np.random.randint(0, 100, size=(2, 5, 5))
k=arr[:,(1,3,4),(1,3,4)]
does not produce the expected output.
In the actual use case, the number of elements to be sliced is greater than 3 (more than B, D, E). Sorry for the lack of correct terminology.

Try this, which has a similar structure to your arr[:,idx,idx] but uses np.ix_(). Do read the documentation for np.ix_().
idx = [1,3,4]
ixgrid = np.ix_(idx,idx)
arr[:,ixgrid[0],ixgrid[1]]
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]])
Explanation
What you WANT to do is extract a mesh from the last 2 axes of the array. But what you are doing is extracting exact index pairs from those 2 axes.
When you use arr[:,(1,3,4),(1,3,4)], you are essentially asking for the elements at (1,1), (3,3) and (4,4) from each of the two matrices arr[0] and arr[1].
What you need is to extract a mesh. This can be achieved with np.ix_ and the magic of broadcasting.
If you ask for ...
[[1],
[3], and [1,3,4]
[4]]
... which is what np.ix_ constructs, the indexes are broadcast against each other and you instead get the cross product between them, which is (1,1), (1,3), (1,4), (3,1), (3,3)... etc.
Hope that clarifies why you get the result you are getting and how you can actually get what you need.
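To make the difference concrete, here is a minimal sketch that inspects the index arrays np.ix_ builds and compares the paired lookup with the mesh lookup. The names paired and mesh are only illustrative; arr is built the same way as in the question.

```python
import numpy as np

np.random.seed(0)
arr = np.random.randint(0, 100, size=(2, 5, 5))
idx = [1, 3, 4]

# np.ix_ returns one index array per input, shaped so they broadcast to a grid
rows, cols = np.ix_(idx, idx)
print(rows.shape, cols.shape)  # (3, 1) (1, 3)

# Same-shaped index lists are paired element-wise: (1,1), (3,3), (4,4)
paired = arr[:, idx, idx]
print(paired.shape)  # (2, 3)

# The (3, 1) and (1, 3) index arrays broadcast to every (row, col) combination
mesh = arr[:, rows, cols]
print(mesh.shape)  # (2, 3, 3)

# The paired result is just the diagonal of each 3x3 mesh block
print(np.array_equal(paired, np.diagonal(mesh, axis1=1, axis2=2)))  # True
```

The shapes tell the story: two index arrays of shape (3,) stay (3,), while shapes (3, 1) and (1, 3) broadcast to (3, 3).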

The problem
When several index sequences are used together in advanced indexing, they are broadcast against each other and matched element-wise rather than combined into a grid. What you're doing here is grabbing the elements at coordinates (1, 1), (3, 3), (4, 4) in each array along axis 0.
The solution
What you need to do is this instead:
idx = (1, 3, 4) # the indices of interest
arr[np.ix_((0, 1), idx, idx)]
Where (0, 1) corresponds to the first two arrays along axis 0.
Output:
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]], dtype=int64)
As shown above, np.ix_((0, 1), idx, idx) produces an object which can be used for advanced indexing. The (0, 1) means that you're explicitly selecting the elements from the arrays arr[0] and arr[1]. If you have a more general 3D array of shape (n, m, q) and want to grab the same subarray out of every array along axis 0, you can use
np.ix_(np.arange(arr.shape[0]), idx, idx)
as your indices. Note that idx is repeated here because you wanted those specific indices for both rows and columns, but in general they don't need to match.
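As a quick check, the sketch below (assuming the arr from the question) confirms that building the axis-0 indices with np.arange gives the same result as leaving axis 0 as a plain slice, or as indexing rows and columns in two separate steps. The variable names are illustrative.

```python
import numpy as np

np.random.seed(0)
arr = np.random.randint(0, 100, size=(2, 5, 5))
idx = [1, 3, 4]

# Explicit indices on every axis
full = arr[np.ix_(np.arange(arr.shape[0]), idx, idx)]

# Plain slice on axis 0, open mesh on the last two axes
rows, cols = np.ix_(idx, idx)
sliced = arr[:, rows, cols]

# Two-step indexing: pick the rows first, then the columns
chained = arr[:, idx, :][:, :, idx]

print(full.shape)                       # (2, 3, 3)
print(np.array_equal(full, sliced))     # True
print(np.array_equal(full, chained))    # True
```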
Generalizing
More generally, you can slice and dice however you want like so:
In [1]: arrays_to_select = (0, 1)
In [2]: rows_to_select = (1, 3, 4)
In [3]: cols_to_select = (1, 3, 4)
In [4]: indices = np.ix_(arrays_to_select, rows_to_select, cols_to_select)
In [5]: arr[indices]
Out[5]:
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]], dtype=int64)
Let's consider some other shape:
In [4]: x = np.random.randint(0, 9, (4, 3, 5))
In [5]: x
Out[5]:
array([[[1, 0, 2, 1, 0],
[3, 5, 1, 4, 3],
[1, 8, 1, 4, 2]],
[[1, 6, 8, 2, 8],
[0, 0, 4, 2, 3],
[8, 5, 6, 2, 5]],
[[4, 4, 8, 6, 0],
[3, 0, 1, 2, 8],
[0, 8, 2, 4, 3]],
[[7, 8, 8, 1, 4],
[5, 7, 4, 8, 5],
[7, 5, 5, 3, 4]]])
In [6]: rows = (0, 2)
In [7]: cols = (0, 2, 3, 4)
By using those rows and cols, you'll be grabbing the subarrays composed of the elements in columns 0, 2, 3 and 4 (column 1 is skipped), from only rows 0 and 2. Let's verify that with the first array along axis 0:
In [8]: arrs = (0,) # A 1-tuple which will give us only the first array along axis 0
In [9]: x[np.ix_(arrs, rows, cols)]
Out[9]:
array([[[1, 2, 1, 0],
[1, 1, 4, 2]]])
Now suppose you want the subarrays produced by rows and cols of only the first and last arrays along axis 0. You can explicitly select (0, -1):
In [10]: arrs = (0, -1)
In [11]: x[np.ix_(arrs, rows, cols)]
Out[11]:
array([[[1, 2, 1, 0],
[1, 1, 4, 2]],
[[7, 8, 1, 4],
[7, 5, 3, 4]]])
If, instead, you want that same subarray from all the arrays along axis 0:
In [12]: arrs = np.arange(x.shape[0])
In [13]: arrs
Out[13]: array([0, 1, 2, 3])
In [14]: x[np.ix_(arrs, rows, cols)]
Out[14]:
array([[[1, 2, 1, 0],
[1, 1, 4, 2]],
[[1, 8, 2, 8],
[8, 6, 2, 5]],
[[4, 8, 6, 0],
[0, 2, 4, 3]],
[[7, 8, 1, 4],
[7, 5, 3, 4]]])

Related

Appending 2-D numpy array to 3-D numpy array

I have a loop that creates a new (2, 3) array on every iteration. I want to append these arrays together to create a new 3-D array; I think an example explains what I want to do best. Say I have an array arr1 = array([[1, 2, 3], [4, 5, 6]]) and an array arr2 = array([[11, 22, 33], [44, 55, 66]]). I want to get a new array by "adding the arrays", so something like arr3 = np.array([[[1, 11], [2, 22], [3, 33]], [[4,44], [5,55], [6,66]]]). In practice this would look something like:
import numpy as np
n_samples = 10
total_arr = np.empty([2, 3, n_samples])
for i in range(n_samples):
    arr = np.random.rand(2,3)
    total_arr.append(arr) # This is the step I don't know what to do with
print(total_arr.shape)
>>> (2, 3, 10) # Where 10 is whatever n_samples is
My current method is to convert total_arr to a list with total_lst = total_arr.tolist() and append each arr[i,j] to the lists in total_lst with a for loop, so something like total_lst[i][j].append(arr[i,j]). But this is taking much too long; is there a numpy solution for this?
Thanks
In [179]: arr1 = np.array([[1, 2, 3], [4, 5, 6]]); arr2 = np.array([[11, 22, 33], [44, 55, 66]])
In [180]: arr1
Out[180]:
array([[1, 2, 3],
[4, 5, 6]])
In [181]: arr2
Out[181]:
array([[11, 22, 33],
[44, 55, 66]])
np.stack can join a list of arrays along a new axis. The default is to act like np.array([arr1, arr2]), but it looks like you want a new last axis:
In [182]: np.stack([arr1, arr2], axis=2)
Out[182]:
array([[[ 1, 11],
[ 2, 22],
[ 3, 33]],
[[ 4, 44],
[ 5, 55],
[ 6, 66]]])
In general, collecting the arrays in a list and doing one 'join' at the end is best. Trying to join the arrays iteratively is slower, and harder to do right.
Alternatively you can create an array of the right target size, and assign elements/blocks.
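A minimal sketch of both approaches, assuming each iteration yields a (2, 3) array as in the question (np.random.rand stands in for the real computation, and the variable names are illustrative):

```python
import numpy as np

n_samples = 10
blocks = [np.random.rand(2, 3) for _ in range(n_samples)]

# Option 1: collect in a list, join once at the end along a new last axis
stacked = np.stack(blocks, axis=2)

# Option 2: preallocate the target array and assign one block per iteration
total_arr = np.empty((2, 3, n_samples))
for i, block in enumerate(blocks):
    total_arr[:, :, i] = block

print(stacked.shape, total_arr.shape)      # (2, 3, 10) (2, 3, 10)
print(np.array_equal(stacked, total_arr))  # True
```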
===
New first axis example:
In [183]: np.stack([arr1, arr2])
Out[183]:
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[11, 22, 33],
[44, 55, 66]]])
In [184]: np.stack([arr1, arr2]).transpose(1,2,0)
Out[184]:
array([[[ 1, 11],
[ 2, 22],
[ 3, 33]],
[[ 4, 44],
[ 5, 55],
[ 6, 66]]])

Select different slices from each numpy row

I have a 3d tensor and I want to select different slices from the dim=2. something like a[[0, 1], :, [slice(2, 4), slice(1, 3)]].
a=np.arange(2*3*5).reshape(2, 3, 5)
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
# then I want something like a[[0, 1], :, [slice(2, 4), slice(1, 3)]]
# that gives me np.stack([a[0, :, 2:4], a[1, :, 1:3]]) without a for loop
array([[[ 2, 3],
[ 7, 8],
[12, 13]],
[[16, 17],
[21, 22],
[26, 27]]])
and I've seen this and it is not what I want.
You can use advanced indexing as explained here. You will have to pass the row ids, which are [0, 1] in your case, and the column ids [2, 3] and [1, 2]. Here [2, 3] stands for the slice 2:4 and [1, 2] for the slice 1:3.
import numpy as np
a=np.arange(2*3*5).reshape(2, 3, 5)
rows = np.array([[0], [1]], dtype=np.intp)
cols = np.array([[2, 3], [1, 2]], dtype=np.intp)
aa = np.stack(a[rows, :, cols]).swapaxes(1, 2)
# array([[[ 2, 3],
# [ 7, 8],
# [12, 13]],
# [[16, 17],
# [21, 22],
# [26, 27]]])
Another equivalent way, which avoids swapaxes and still gives the result in the desired format, is
aa = np.stack(a[rows, :, cols], axis=2).T
A third way I figured out is by passing the list of indices. Here [0, 0] will correspond to [2,3] and [1, 1] will correspond to [1, 2]. The swapaxes is just to get your desired format of output
a[[[0,0], [1,1]], :, [[2,3], [1,2]]].swapaxes(1,2)
A solution...
import numpy as np
a = np.arange(2*3*5).reshape(2, 3, 5)
np.array([a[0,:,2:4], a[1,:,1:3]])
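If every slice has the same width, another loop-free option is np.take_along_axis. This is a sketch that is not part of the answers above; starts and width are illustrative names for the per-block slice start and the common slice length.

```python
import numpy as np

a = np.arange(2 * 3 * 5).reshape(2, 3, 5)

starts = np.array([2, 1])  # slice start for a[0] and a[1]
width = 2                  # common slice length (2:4 and 1:3)

# Column indices per block: [[2, 3], [1, 2]]
cols = starts[:, None] + np.arange(width)

# Insert a length-1 axis so the indices broadcast over the 3 rows of each block
result = np.take_along_axis(a, cols[:, None, :], axis=2)

expected = np.stack([a[0, :, 2:4], a[1, :, 1:3]])
print(result.shape)                      # (2, 3, 2)
print(np.array_equal(result, expected))  # True
```

Because the result already has shape (2, 3, 2), no swapaxes or transpose is needed afterwards.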

Multiplying arrays with broadcasting

I have an mxn matrix A and an nxr matrix B that I want to multiply in a specific way to get an mxr matrix. I want to multiply every element in the ith column of A as a scalar with the ith row of B and then sum the n resulting matrices.
For example
a = [[0, 1, 2],
     [3, 4, 5]]
b = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]
The product would be
a*b = [[0, 0, 0, 0],    [[ 4,  5,  6,  7],    [[16, 18, 20, 22],    [[20, 23, 26, 29],
       [0, 3, 6, 9]]  +  [16, 20, 24, 28]]  +  [40, 45, 50, 55]]  =  [56, 68, 80, 92]]
I can't use any loops so I'm pretty sure I have to use broadcasting but I don't know how. Any help is appreciated
Your input matrices are of shape (2, 3) and (3, 4) respectively and the result you want is of shape (2, 4).
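What you need is just the dot (matrix) product of your two matrices. Summing, over i, the ith column of a times the ith row of b is exactly how the matrix product is defined:
import numpy as np
a = np.array([[0, 1, 2],
              [3, 4, 5]])
b = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])
print(np.dot(a, b))
# [[20 23 26 29]
#  [56 68 80 92]]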
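To see that the matrix product really is the sum described in the question, the sketch below builds the n intermediate matrices with broadcasting and sums them. The name terms is illustrative.

```python
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5]])
b = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

# terms[i] is column i of a (as a column vector) times row i of b: shape (3, 2, 4)
terms = a.T[:, :, None] * b[:, None, :]
print(terms[1])
# [[ 4  5  6  7]
#  [16 20 24 28]]

# Summing the n terms reproduces the matrix product
print(np.array_equal(terms.sum(axis=0), a @ b))  # True
```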

python numpy array from index array

I have a 3d array a = np.arange(108).reshape(6, 6, 3). I want to grab certain indices of the array, as defined by i = np.array([[0, 1], [1, 3], [2, 1]]) such that the result is [[3, 4, 5], [27, 28, 29], [39, 40, 41]]. I need an efficient way to do this, as my actual arrays are significantly larger.
Extract the first and second dimension indices from i, then use advanced indexing:
a[i[:,0], i[:,1], :] # or a[i[:,0], i[:,1]]
#array([[ 3, 4, 5],
# [27, 28, 29],
# [39, 40, 41]])
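An equivalent spelling, shown here as a small sketch, is to transpose i and unpack it into one index array per axis. The name picked is illustrative.

```python
import numpy as np

a = np.arange(108).reshape(6, 6, 3)
i = np.array([[0, 1], [1, 3], [2, 1]])

# i.T has one row per indexed axis; a tuple of those rows is a valid index
picked = a[tuple(i.T)]
print(picked.tolist())  # [[3, 4, 5], [27, 28, 29], [39, 40, 41]]

# Same result as spelling out the two columns of i
print(np.array_equal(picked, a[i[:, 0], i[:, 1]]))  # True
```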

Sum along axis in numpy array

I want to understand how this ndarray.sum(axis=) works. I know that axis=0 is for columns and axis=1 is for rows.
But in the case of 3 dimensions (3 axes) it's difficult to interpret the result below.
arr = np.arange(0,30).reshape(2,3,5)
arr
Out[1]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
arr.sum(axis=0)
Out[2]:
array([[15, 17, 19, 21, 23],
[25, 27, 29, 31, 33],
[35, 37, 39, 41, 43]])
arr.sum(axis=1)
Out[8]:
array([[15, 18, 21, 24, 27],
[60, 63, 66, 69, 72]])
arr.sum(axis=2)
Out[3]:
array([[ 10, 35, 60],
[ 85, 110, 135]])
In this example of a 3-axis array of shape (2,3,5), there are 3 rows and 5 columns. But if I look at the array as a whole, it seems to have only two rows (each with 3 array elements).
Can anyone please explain how this sum works on arrays of 3 or more axes (dimensions)?
If you want to keep the dimensions you can specify keepdims:
>>> arr = np.arange(0,30).reshape(2,3,5)
>>> arr.sum(axis=0, keepdims=True)
array([[[15, 17, 19, 21, 23],
[25, 27, 29, 31, 33],
[35, 37, 39, 41, 43]]])
Otherwise the axis you sum along is removed from the shape. An easy way to keep track of this is using the numpy.ndarray.shape property:
>>> arr.shape
(2, 3, 5)
>>> arr.sum(axis=0).shape
(3, 5) # the first entry (index = axis = 0) dimension was removed
>>> arr.sum(axis=1).shape
(2, 5) # the second entry (index = axis = 1) was removed
You can also sum along multiple axes if you want (reducing the dimensionality by the number of axes specified):
>>> arr.sum(axis=(0, 1))
array([75, 81, 87, 93, 99])
>>> arr.sum(axis=(0, 1)).shape
(5,) # the first and second entries are removed
Here is another way to interpret this. You can consider a multi-dimensional array as a tensor, T[i][j][k], where i, j, k index axes 0, 1, 2 respectively.
T.sum(axis = 0) is mathematically equivalent to:
$$R_{jk} = \sum_i T_{ijk}$$
Similarly, T.sum(axis = 1):
$$R_{ik} = \sum_j T_{ijk}$$
And T.sum(axis = 2):
$$R_{ij} = \sum_k T_{ijk}$$
In other words, the index belonging to the given axis is the one that is summed over; for instance, with axis = 0 the first index is summed over. Written as a for loop:
result[j][k] = sum(T[i][j][k] for i in range(T.shape[0])) for all j,k
for axis = 1:
result[i][k] = sum(T[i][j][k] for j in range(T.shape[1])) for all i,k
etc.
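The loop description above can be checked directly. This is a sketch that uses explicit Python loops purely for illustration; in practice T.sum(axis=...) is the way to go. The names result0 and result1 are illustrative.

```python
import numpy as np

T = np.arange(30).reshape(2, 3, 5)

# axis=0: the first index i is summed away, leaving result0[j][k]
result0 = np.zeros((3, 5), dtype=T.dtype)
for j in range(T.shape[1]):
    for k in range(T.shape[2]):
        result0[j, k] = sum(T[i, j, k] for i in range(T.shape[0]))

# axis=1: the second index j is summed away, leaving result1[i][k]
result1 = np.zeros((2, 5), dtype=T.dtype)
for i in range(T.shape[0]):
    for k in range(T.shape[2]):
        result1[i, k] = sum(T[i, j, k] for j in range(T.shape[1]))

print(np.array_equal(result0, T.sum(axis=0)))  # True
print(np.array_equal(result1, T.sum(axis=1)))  # True
```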
numpy displays a (2,3,5) array as 2 blocks of 3x5 arrays (3 rows, 5 columns). Or call them 'planes' (MATLAB would show it as 5 blocks of 2x3).
The numpy display also matches a nested list - a list of two sublists; each with 3 sublists. Each of those is 5 elements long.
In the 3x5 2d case, axis 0 sums along the size 3 dimension, resulting in a 5 element array. The descriptions 'sum over rows' or 'sum along columns' are a little vague in English. Focus on the results, the change in shape, and which values are being summed, not on the description.
Back to the 3d case:
With axis=0, it sums along the 1st dimension, effectively removing it, leaving us with a 3x5 array. 0+15=15, 1+16=17 etc.
Axis 1 condenses the size 3 dimension; the result is 2x5. 0+5+10=15, etc.
Axis 2 condenses the size 5 dimension; the result is 2x3, e.g. sum((0,1,2,3,4)) = 10.
Your example is good, since the 3 dimensions are different, and it is easier to see which one was eliminated during the sum.
With 2d there's some ambiguity; 'sum over rows' - does that mean the rows are eliminated or retained? With 3d there's no ambiguity; with axis=0, you can only remove it, leaving the other 2.
The axis you specify is the one that is effectively removed. So given a shape of (2,3,5), axis 0 gives (3,5), axis 1 gives (2,5), etc. This extends to any number of dimensions.
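A short sketch with a 4D array (the shape is chosen arbitrarily for illustration) shows the same rule: the summed axis disappears from the shape.

```python
import numpy as np

arr4 = np.ones((2, 3, 4, 5), dtype=int)

# Sum over each axis in turn and record the resulting shape
shapes = [arr4.sum(axis=ax).shape for ax in range(arr4.ndim)]
for ax, shape in enumerate(shapes):
    print(ax, shape)
# 0 (3, 4, 5)
# 1 (2, 4, 5)
# 2 (2, 3, 5)
# 3 (2, 3, 4)
```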
You seem to be confused by the output style of numpy arrays. The "row" of the output is almost always the last index, not the first. Example:
x=np.arange(1,4)
y=np.arange(10,31,10)
z=np.arange(100,301,100)
xy=x[:,None]+y[None,:]
xy
Out[100]:
array([[11, 21, 31],
[12, 22, 32],
[13, 23, 33]])
Notice the tens place increments on the row, not the column, even though y is the second index.
xyz=x[:,None,None]+y[None,:,None]+z[None,None,:]
xyz
Out[102]:
array([[[111, 211, 311],
[121, 221, 321],
[131, 231, 331]],
[[112, 212, 312],
[122, 222, 322],
[132, 232, 332]],
[[113, 213, 313],
[123, 223, 323],
[133, 233, 333]]])
Now the hundred's place increments in the row, even though z is the last index. This can be somewhat counter-intuitive to beginners.
Thus when you do np.sum(x, axis=-1) you will always sum over the "rows" as shown in the np.array([]) format. Looking at arr.sum(axis=2)[0,0], that's 0+1+2+3+4=10.
Think of a multi-dimensional array as a tree. Each dimension is a level in the tree. Each grouping at that level is a node. A sum along a specific axis (say axis=4) means coalescing (overlaying) all nodes at that level into a single node (under their respective parents). Sub-trees rooted at the overlaid nodes at that level are stacked on top of each other. All overlapping nodes' values are added together.
Picture: https://ibb.co/dg3P3w
It's maybe a little easier to see with a simpler 3D array. After filling the array with ones, the numbers in the sums come out to be the size of the particular dimension summed over! The other two dimensions in each case are left intact.
arr = np.arange(0,60).reshape(4,3,5)
arr
Out[10]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]],
[[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44]],
[[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
arr=arr*0+1
arr
Out[12]:
array([[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]],
[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]],
[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]],
[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]]])
arr0=arr.sum(axis=0,keepdims=True)
arr2=arr.sum(axis=2,keepdims=True)
arr1=arr.sum(axis=1,keepdims=True)
arr0
Out[20]:
array([[[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4]]])
arr1
Out[21]:
array([[[3, 3, 3, 3, 3]],
[[3, 3, 3, 3, 3]],
[[3, 3, 3, 3, 3]],
[[3, 3, 3, 3, 3]]])
arr2
Out[22]:
array([[[5],
[5],
[5]],
[[5],
[5],
[5]],
[[5],
[5],
[5]],
[[5],
[5],
[5]]])
