I have a loop that creates a new (2, 3) array on every iteration. I want to append these arrays together to create a new 3-D array. I think an example explains best what I want to do. Say I have an array arr1 = np.array([[1, 2, 3], [4, 5, 6]]) and an array arr2 = np.array([[11, 22, 33], [44, 55, 66]]). I want to get a new array by "adding the arrays", so something like arr3 = np.array([[[1, 11], [2, 22], [3, 33]], [[4, 44], [5, 55], [6, 66]]]). In practice this would look something like:
import numpy as np
n_samples = 10
total_arr = np.empty([2, 3, n_samples])
for i in range(n_samples):
    arr = np.random.rand(2,3)
    total_arr.append(arr) #This is the step I don't know what to do with
print(total_arr.shape)
>>> (2, 3, 10) #Where 10 is whatever n_samples is
My current method is to cast total_arr to a list with total_lst = total_arr.tolist() and append each arr[i,j] to the lists in total_lst with a for loop, so something like total_lst[i][j].append(arr[i,j]). But this is taking much too long. Is there a numpy solution for this?
Thanks
In [179]: arr1 = np.array([[1, 2, 3], [4, 5, 6]]); arr2 = np.array([[11, 22, 33], [44, 55, 66]])
In [180]: arr1
Out[180]:
array([[1, 2, 3],
[4, 5, 6]])
In [181]: arr2
Out[181]:
array([[11, 22, 33],
[44, 55, 66]])
np.stack can join a list of arrays along a new axis. The default is to act like np.array([arr1, arr2]), but it looks like you want a new last axis:
In [182]: np.stack([arr1, arr2], axis=2)
Out[182]:
array([[[ 1, 11],
[ 2, 22],
[ 3, 33]],
[[ 4, 44],
[ 5, 55],
[ 6, 66]]])
In general, collecting the arrays in a list and doing one 'join' at the end is best. Trying to join the arrays iteratively is slower, and harder to do right.
Alternatively you can create an array of the right target size, and assign elements/blocks.
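Both suggestions can be sketched as follows. This is a minimal illustration, not the only way to write it; the names rng and chunks and the fixed seed are assumptions added here so the two results can be compared, they are not part of the question.

```python
import numpy as np

n_samples = 10
rng = np.random.default_rng(0)  # assumption: seeded generator, only for reproducibility

# Approach 1: collect the (2, 3) arrays in a list, then join once at the end.
chunks = [rng.random((2, 3)) for _ in range(n_samples)]
stacked = np.stack(chunks, axis=2)   # join along a new last axis
print(stacked.shape)                 # (2, 3, 10)

# Approach 2: preallocate the target array and assign one block per iteration.
total_arr = np.empty((2, 3, n_samples))
for i, arr in enumerate(chunks):
    total_arr[:, :, i] = arr         # fill the i-th "layer" along the last axis
print(total_arr.shape)               # (2, 3, 10)
```

Either way there is no repeated reallocation, which is what makes appending to an array inside the loop slow.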
===
New first axis example:
In [183]: np.stack([arr1, arr2])
Out[183]:
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[11, 22, 33],
[44, 55, 66]]])
In [184]: np.stack([arr1, arr2]).transpose(1,2,0)
Out[184]:
array([[[ 1, 11],
[ 2, 22],
[ 3, 33]],
[[ 4, 44],
[ 5, 55],
[ 6, 66]]])
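If the transpose order is hard to read, np.moveaxis should express the same move more directly. A small sketch that compares it with the two results above (the variable names first and moved are only illustrative):

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[11, 22, 33], [44, 55, 66]])

first = np.stack([arr1, arr2])       # shape (2, 2, 3): new first axis
moved = np.moveaxis(first, 0, -1)    # shape (2, 3, 2): first axis moved to the end

print(moved.shape)                   # (2, 3, 2)
```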
I have a 3d tensor and I want to select different slices along dim=2, something like a[[0, 1], :, [slice(2, 4), slice(1, 3)]].
a=np.arange(2*3*5).reshape(2, 3, 5)
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
# then I want something like a[[0, 1], :, [slice(2, 4), slice(1, 3)]]
# that gives me np.stack([a[0, :, 2:4], a[1, :, 1:3]]) without a for loop
array([[[ 2, 3],
[ 7, 8],
[12, 13]],
[[16, 17],
[21, 22],
[26, 27]]])
I've seen this, and it is not what I want.
You can use advanced indexing as explained here. You will have to pass the row ids, which are [0, 1] in your case, and the column ids [2, 3] and [1, 2]. Here [2, 3] stands for the slice 2:4 and [1, 2] for the slice 1:3.
import numpy as np
a=np.arange(2*3*5).reshape(2, 3, 5)
rows = np.array([[0], [1]], dtype=np.intp)
cols = np.array([[2, 3], [1, 2]], dtype=np.intp)
aa = np.stack(a[rows, :, cols]).swapaxes(1, 2)
# array([[[ 2, 3],
# [ 7, 8],
# [12, 13]],
# [[16, 17],
# [21, 22],
# [26, 27]]])
Another equivalent way, which avoids swapaxes and still gives the result in the desired format, is
aa = np.stack(a[rows, :, cols], axis=2).T
A third way I figured out is by passing the list of indices. Here [0, 0] will correspond to [2,3] and [1, 1] will correspond to [1, 2]. The swapaxes is just to get your desired format of output
a[[[0,0], [1,1]], :, [[2,3], [1,2]]].swapaxes(1,2)
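The column index arrays above can also be built from the slice start positions, and indexing all three axes with broadcastable integer arrays gives the (2, 3, 2) result directly, without a swapaxes. This is a sketch under the assumption that every slice has the same width; starts and width are hypothetical names introduced here.

```python
import numpy as np

a = np.arange(2 * 3 * 5).reshape(2, 3, 5)

starts = np.array([2, 1])   # hypothetical: first column of each slice (2:4 and 1:3)
width = 2                   # hypothetical: common slice length

cols = starts[:, None] + np.arange(width)   # shape (2, 2): [[2, 3], [1, 2]]
i = np.arange(a.shape[0])[:, None, None]    # shape (2, 1, 1), selects the block
j = np.arange(a.shape[1])[None, :, None]    # shape (1, 3, 1), keeps every row
k = cols[:, None, :]                        # shape (2, 1, 2), per-block columns

aa = a[i, j, k]                             # indices broadcast to shape (2, 3, 2)
print(aa.shape)                             # (2, 3, 2)
```

Because all three indices are integer arrays, the result shape is just the broadcast shape of i, j and k, so aa[p, q, r] is a[p, q, cols[p, r]].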
A solution...
import numpy as np
a = np.arange(2*3*5).reshape(2, 3, 5)
np.array([a[0,:,2:4], a[1,:,1:3]])
I have an mxn matrix A and an nxr matrix B that I want to multiply in a specific way to get an mxr matrix. I want to multiply every element in the ith column of A, as a scalar, by the ith row of B, and then sum the n resulting matrices.
For example
a = [[0, 1, 2],
     [3, 4, 5]]
b = [[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]]
The product would be
a*b = [[0, 0, 0, 0],  + [[ 4,  5,  6,  7],  + [[16, 18, 20, 22],  = [[20, 23, 26, 29],
       [0, 3, 6, 9]]     [16, 20, 24, 28]]     [40, 45, 50, 55]]     [56, 68, 80, 92]]
I can't use any loops so I'm pretty sure I have to use broadcasting but I don't know how. Any help is appreciated
Your input matrices are of shape (2, 3) and (3, 4) respectively and the result you want is of shape (2, 4).
What you need is just the matrix (dot) product of your two matrices:
import numpy as np
a = np.array([[0, 1, 2],
              [3, 4, 5]])
b = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])
print(np.dot(a, b))
# [[20 23 26 29]
#  [56 68 80 92]]
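To see why the dot product is the answer, here is a sketch that builds the n intermediate matrices from the question explicitly with broadcasting and then sums them. The name terms is introduced here for illustration only.

```python
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5]])
b = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

# terms[:, i, :] is column i of a times row i of b (an outer product).
terms = a[:, :, None] * b[None, :, :]   # shapes (2, 3, 1) * (1, 3, 4) -> (2, 3, 4)
result = terms.sum(axis=1)              # add up the n = 3 matrices -> shape (2, 4)

print(np.array_equal(result, a @ b))    # True
```

In practice np.dot(a, b) or a @ b is preferable, since it does not materialise the intermediate (m, n, r) array.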
I want to understand how this ndarray.sum(axis=) works. I know that axis=0 is for columns and axis=1 is for rows.
But in the case of 3 dimensions (3 axes), it's difficult to interpret the result below.
arr = np.arange(0,30).reshape(2,3,5)
arr
Out[1]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
arr.sum(axis=0)
Out[2]:
array([[15, 17, 19, 21, 23],
[25, 27, 29, 31, 33],
[35, 37, 39, 41, 43]])
arr.sum(axis=1)
Out[8]:
array([[15, 18, 21, 24, 27],
[60, 63, 66, 69, 72]])
arr.sum(axis=2)
Out[3]:
array([[ 10, 35, 60],
[ 85, 110, 135]])
In this example of a 3-axis array of shape (2,3,5), there are 3 rows and 5 columns. But if I look at the array as a whole, it seems to have only two rows (each with 3 array elements).
Can anyone please explain how this sum works on arrays with 3 or more axes (dimensions)?
If you want to keep the dimensions you can specify keepdims:
>>> arr = np.arange(0,30).reshape(2,3,5)
>>> arr.sum(axis=0, keepdims=True)
array([[[15, 17, 19, 21, 23],
[25, 27, 29, 31, 33],
[35, 37, 39, 41, 43]]])
Otherwise the axis you sum along is removed from the shape. An easy way to keep track of this is using the numpy.ndarray.shape property:
>>> arr.shape
(2, 3, 5)
>>> arr.sum(axis=0).shape
(3, 5) # the first entry (index = axis = 0) dimension was removed
>>> arr.sum(axis=1).shape
(2, 5) # the second entry (index = axis = 1) was removed
You can also sum along multiple axes if you want (reducing the dimensionality by the number of axes specified):
>>> arr.sum(axis=(0, 1))
array([75, 81, 87, 93, 99])
>>> arr.sum(axis=(0, 1)).shape
(5,) # the first and second entries were removed
Here is another way to interpret this. You can consider a multi-dimensional array as a tensor, T[i][j][k], where i, j, k index axes 0, 1, 2 respectively.
T.sum(axis = 0) is mathematically equivalent to:
$$\mathrm{result}_{jk} = \sum_{i} T_{ijk}$$
Similarly, T.sum(axis = 1):
$$\mathrm{result}_{ik} = \sum_{j} T_{ijk}$$
And T.sum(axis = 2):
$$\mathrm{result}_{ij} = \sum_{k} T_{ijk}$$
In other words, the specified axis is the one that is summed over; for instance, with axis = 0 the first index is summed over. Written as a for loop:
result[j][k] = sum(T[i][j][k] for i in range(T.shape[0])) for all j,k
for axis = 1:
result[i][k] = sum(T[i][j][k] for j in range(T.shape[1])) for all i,k
etc.
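The loop description above can be checked against NumPy directly. A small runnable sketch, assuming the same (2, 3, 5) array as in the question (result0 and result1 are names chosen here):

```python
import numpy as np

T = np.arange(30).reshape(2, 3, 5)

# axis = 0: the first index i is summed over, j and k survive.
result0 = np.empty((3, 5), dtype=T.dtype)
for j in range(T.shape[1]):
    for k in range(T.shape[2]):
        result0[j, k] = sum(T[i, j, k] for i in range(T.shape[0]))

# axis = 1: the second index j is summed over, i and k survive.
result1 = np.empty((2, 5), dtype=T.dtype)
for i in range(T.shape[0]):
    for k in range(T.shape[2]):
        result1[i, k] = sum(T[i, j, k] for j in range(T.shape[1]))

print(np.array_equal(result0, T.sum(axis=0)))  # True
print(np.array_equal(result1, T.sum(axis=1)))  # True
```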
numpy displays a (2,3,5) array as 2 blocks of 3x5 arrays (3 rows, 5 columns). Or call them 'planes' (MATLAB would show it as 5 blocks of 2x3).
The numpy display also matches a nested list - a list of two sublists; each with 3 sublists. Each of those is 5 elements long.
In the 3x5 2d case, axis 0 sums along the size 3 dimension, resulting in a 5 element array. The descriptions 'sum over rows' or 'sum along columns' are a little vague in English. Focus on the results, the change in shape, and which values are being summed, not on the description.
Back to the 3d case:
With axis=0, it sums along the 1st dimension, effectively removing it, leaving us with a 3x5 array. 0+15=15, 1+16=17 etc.
Axis 1 condenses the size 3 dimension; the result is 2x5. 0+5+10=15, etc.
Axis 2 condenses the size 5 dimension; the result is 2x3, sum((0,1,2,3,4))=10, etc.
Your example is good, since the 3 dimensions are different, and it is easier to see which one was eliminated during the sum.
With 2d there's some ambiguity; 'sum over rows' - does that mean the rows are eliminated or retained? With 3d there's no ambiguity; with axis=0, you can only remove it, leaving the other 2.
The axis you specify is the one that is effectively removed. So given a shape of (2,3,5), axis 0 gives (3,5), axis 1 gives (2,5), etc. This extends to any number of dimensions.
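A quick way to convince yourself that this extends to more dimensions is to print the resulting shape for every axis. The 4-D shape (2, 3, 5, 7) below is an arbitrary example chosen here, not something from the question.

```python
import numpy as np

arr4 = np.zeros((2, 3, 5, 7))   # hypothetical 4-D array with distinct sizes

# Summing over an axis drops exactly that entry from the shape.
shapes = [arr4.sum(axis=ax).shape for ax in range(arr4.ndim)]
for ax, shape in enumerate(shapes):
    print(ax, shape)
# 0 (3, 5, 7)
# 1 (2, 5, 7)
# 2 (2, 3, 7)
# 3 (2, 3, 5)
```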
You seem to be confused by the output style of numpy arrays. The "row" of the output is almost always the last index, not the first. Example:
x=np.arange(1,4)
y=np.arange(10,31,10)
z=np.arange(100,301,100)
xy=x[:,None]+y[None,:]
xy
Out[100]:
array([[11, 21, 31],
[12, 22, 32],
[13, 23, 33]])
Notice the tens place increments on the row, not the column, even though y is the second index.
xyz=x[:,None,None]+y[None,:,None]+z[None,None,:]
xyz
Out[102]:
array([[[111, 211, 311],
[121, 221, 321],
[131, 231, 331]],
[[112, 212, 312],
[122, 222, 322],
[132, 232, 332]],
[[113, 213, 313],
[123, 223, 323],
[133, 233, 333]]])
Now the hundreds place increments along the row, because z is the last index. This can be somewhat counter-intuitive to beginners.
Thus when you do np.sum(arr, axis=-1) you will always sum over the "rows" as shown in the np.array([]) format. Looking at arr.sum(axis=2)[0,0], that's 0+1+2+3+4=10.
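A short check of that statement, using the array from the question (the name last is only illustrative):

```python
import numpy as np

arr = np.arange(30).reshape(2, 3, 5)

last = arr.sum(axis=-1)   # axis=-1 is the last axis, i.e. axis=2 for a 3-D array
print(last[0, 0])         # 10, which is 0+1+2+3+4 (the first printed "row")
print(last.shape)         # (2, 3)
```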
Think of a multi-dimensional array as a tree. Each dimension is a level in the tree. Each grouping at that level is a node. A sum along a specific axis (say axis=4) means coalescing (overlaying) all nodes at that level into a single node (under their respective parents). Sub-trees rooted at the overlaid nodes at that level are stacked on top of each other. All overlapping nodes' values are added together.
Picture: https://ibb.co/dg3P3w
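One way to read the "overlaying" idea in code, sketched here for axis=1 on the (2, 3, 5) array from the question: the three nodes at the second level of each block are laid on top of each other and added elementwise. The name overlay is introduced only for this illustration.

```python
import numpy as np

arr = np.arange(30).reshape(2, 3, 5)

# Overlay the three level-1 nodes (the rows of each block) and add them up.
overlay = arr[:, 0, :] + arr[:, 1, :] + arr[:, 2, :]

print(overlay.shape)                            # (2, 5)
print(np.array_equal(overlay, arr.sum(axis=1))) # True
```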
It's maybe a little easier to see with a simpler 3D array. After filling the array with ones, the numbers in the sums come out to be the size of the particular dimension summed over! The other two dimensions in each case are left intact.
arr = np.arange(0,60).reshape(4,3,5)
arr
Out[10]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]],
[[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44]],
[[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
arr=arr*0+1
arr
Out[12]:
array([[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]],
[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]],
[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]],
[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]]])
arr0=arr.sum(axis=0,keepdims=True)
arr2=arr.sum(axis=2,keepdims=True)
arr1=arr.sum(axis=1,keepdims=True)
arr0
Out[20]:
array([[[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4]]])
arr1
Out[21]:
array([[[3, 3, 3, 3, 3]],
[[3, 3, 3, 3, 3]],
[[3, 3, 3, 3, 3]],
[[3, 3, 3, 3, 3]]])
arr2
Out[22]:
array([[[5],
[5],
[5]],
[[5],
[5],
[5]],
[[5],
[5],
[5]],
[[5],
[5],
[5]]])