Consider a 4-dimensional numpy array (variable a). We have a.shape = (16, 5, 66, 717).
From the second dimension, which contains 5 elements, I want to select the second and the fifth:
b = a[:, [1,4],:,:]
b.shape returns (16, 2, 66, 717), so I guess what I did is correct. Now I want to extract 5 elements from the first dimension (eighth, eleventh, thirteenth, fourteenth, fifteenth) and two elements from the second dimension (second and fifth):
b = a[[7,10,12,13,14], [1,4],:,:]
which gives an error:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (5,) (2,)
I don't understand why this simultaneous indexing across >1 dimensions of numpy array doesn't work. I guess I could sequentially do b = a[:, [1,4],:,:] and c = b[[7,10,12,13,14],:,:,:] to get what I want, but there must be a way to do that in one step. Could you please help?
Make a smaller 3d array:
In [155]: a = np.arange(24).reshape(2,3,4)
In [158]: a
Out[158]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
Selecting two "rows" (on the middle dimension):
In [159]: a[:,[0,2],:]
Out[159]:
array([[[ 0, 1, 2, 3],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[20, 21, 22, 23]]])
If we use 2 lists (or arrays) of the same shape, we end up selecting 2 "rows" from [159]:
In [160]: a[[0,1],[0,2],:]
Out[160]:
array([[ 0, 1, 2, 3],
[20, 21, 22, 23]])
If instead the first list/array is a "column vector", we select a (2,2) "block":
In [161]: a[[[0],[1]],[0,2],:]
Out[161]:
array([[[ 0, 1, 2, 3],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[20, 21, 22, 23]]])
ix_ can be used to create the same 2 arrays:
In [162]: np.ix_([0,1],[0,2])
Out[162]:
(array([[0],
[1]]),
array([[0, 2]]))
So using ix_ arrays:
In [163]: I,J = np.ix_([0,1],[0,2])
In [164]: a[I,J,:]
Out[164]:
array([[[ 0, 1, 2, 3],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[20, 21, 22, 23]]])
When I say the two index arrays broadcast against each other, I mean it in the same sense as broadcasting during addition or multiplication:
In [165]: I*10 + J
Out[165]:
array([[ 0, 2],
[10, 12]])
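To make the broadcasting step visible, here is a small sketch (not part of the original session) that expands the two ix_ arrays to their common shape with np.broadcast_arrays and checks that indexing with the expanded arrays gives the same block. The names bi, bj, block and explicit are only illustrative.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
I, J = np.ix_([0, 1], [0, 2])          # shapes (2, 1) and (1, 2)

# Expand both index arrays to their common broadcast shape (2, 2).
bi, bj = np.broadcast_arrays(I, J)
print(bi)   # first-axis index used for each cell of the block
print(bj)   # second-axis index used for each cell of the block

block = a[I, J, :]        # indexing with the compact ix_ arrays
explicit = a[bi, bj, :]   # indexing with the fully expanded arrays
print(block.shape, np.array_equal(block, explicit))
```

Each cell (r, c) of the result is a[bi[r, c], bj[r, c], :], which is why a column vector paired with a row vector selects a block rather than a diagonal.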
reference: https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing
edit
In [166]: np.ix_([7,10,12,13,14], [1,4])
Out[166]:
(array([[ 7],
[10],
[12],
[13],
[14]]),
array([[1, 4]]))
Regarding your error:
In [167]: np.ix_([7,10,12,13,14], [1,4],:,:)
Input In [167]
np.ix_([7,10,12,13,14], [1,4],:,:)
^
SyntaxError: invalid syntax
ix_ is a function, and ':' isn't allowed as an argument in a function call. It only works inside an indexing expression, where it's converted to a slice. That's why you get a syntax error.
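Putting this together for the 4-dimensional case in the question, a minimal sketch might look like the following. It assumes much smaller trailing dimensions than (66, 717) to keep the example light; the names first, second, b and c are only for illustration.

```python
import numpy as np

# Same leading shape as in the question, smaller trailing dimensions.
a = np.arange(16 * 5 * 3 * 4).reshape(16, 5, 3, 4)

first = [7, 10, 12, 13, 14]   # indices on the first dimension
second = [1, 4]               # indices on the second dimension

I, J = np.ix_(first, second)  # shapes (5, 1) and (1, 2)
b = a[I, J, :, :]             # block selection in one step

# The two-step version from the question, for comparison.
c = a[:, second, :, :][first, :, :, :]

print(b.shape)
print(np.array_equal(b, c))
```

The slices stay in the indexing expression; only the two lists go through ix_.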
The objective is to slice a 3D array using a list of indices.
Here, the array has shape (2, 5, 5). For simplicity, let's assume the indices 0 to 4 are labelled A, B, C, D, E.
Assume we have a 3D array as below
array([[[44, 47, 64, 67, 67],
[ 9, 83, 21, 36, 87],
[70, 88, 88, 12, 58],
[65, 39, 87, 46, 88],
[81, 37, 25, 77, 72]],
[[ 9, 20, 80, 69, 79],
[47, 64, 82, 99, 88],
[49, 29, 19, 19, 14],
[39, 32, 65, 9, 57],
[32, 31, 74, 23, 35]]], dtype=int64)
The indices of interest are [1,3,4]. Again, we label these as B, D, E. The expected output when slicing the 3D array with these indices is as below
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]], dtype=int64)
However, slicing the array as below
import numpy as np
np.random.seed(0)
arr = np.random.randint(0, 100, size=(2, 5, 5))
k=arr[:,(1,3,4),(1,3,4)]
does not produce the expected output.
In the actual use case, more than 3 elements (more than B, D, E) need to be selected. Sorry if the terminology I've used is not correct.
Try this, which has a similar structure to your arr[:,idx,idx] but uses np.ix_(). Do read the documentation for np.ix_().
idx = [1,3,4]
ixgrid = np.ix_(idx,idx)
arr[:,ixgrid[0],ixgrid[1]]
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]])
Explanation
What you WANT to do is extract a mesh from the last 2 axes of the array. But what you are doing is extracting exact index pairs from the 2 axes.
When you use arr[:,(1,3,4),(1,3,4)], you are essentially asking for (1,1), (3,3) and (4,4) from the two matrices arr[0] and arr[1]
What you need is to extract a mesh. This can be achieved with np.ix_ and the magic of broadcasting.
If you ask for ...
[[1],
[3], and [1,3,4]
[4]]
... which is what np.ix_ constructs, you broadcast the indexes and instead ask for a cross product between them, which is (1,1), (1,3), (1,4), (3,1), (3,3)... etc.
Hope that clarifies why you get the result you are getting and how you can actually get what you need.
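As a quick check of the explanation above, this sketch compares the paired selection with the mesh selection on the seeded array from the question. The variable names (paired, mesh, diag) are mine, not from the original posts.

```python
import numpy as np

np.random.seed(0)
arr = np.random.randint(0, 100, size=(2, 5, 5))
idx = [1, 3, 4]

# Paired indexing: picks (1,1), (3,3), (4,4) from each matrix.
paired = arr[:, idx, idx]            # shape (2, 3)

# Mesh indexing: every row in idx crossed with every column in idx.
rows, cols = np.ix_(idx, idx)        # shapes (3, 1) and (1, 3)
mesh = arr[:, rows, cols]            # shape (2, 3, 3)

# The paired result is just the diagonal of each 3x3 mesh block.
diag = np.diagonal(mesh, axis1=1, axis2=2)
print(paired.shape, mesh.shape)
print(np.array_equal(paired, diag))
```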
The problem
Advanced indexing pairs up the index sequences element by element instead of taking their cross product. What you're doing here is grabbing the elements at coordinates (1, 1), (3, 3), (4, 4) in each array along axis 0.
The solution
What you need to do is this instead:
idx = (1, 3, 4) # the indices of interest
arr[np.ix_((0, 1), idx, idx)]
Where (0, 1) corresponds to the first two arrays along axis 0.
Output:
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]], dtype=int64)
As shown above, np.ix_((0, 1), idx, idx) produces an object which can be used for advanced indexing. The (0, 1) means that you're explicitly selecting the elements from the arrays arr[0] and arr[1]. If you have a more general 3D array of shape (n, m, q) and want to grab the same subarray out of every array along axis 0, you can use
np.ix_(np.arange(arr.shape[0]), idx, idx)
as your indices. Note that idx is repeated here because you wanted those specific indices, but in general the row and column indices don't need to match.
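For what it's worth, here is a sketch (my own comparison, not from the original answers) showing that the full np.ix_ form, the slice-plus-ix_ form, and two chained selections should all agree on the seeded array from the question.

```python
import numpy as np

np.random.seed(0)
arr = np.random.randint(0, 100, size=(2, 5, 5))
idx = [1, 3, 4]

# 1) Explicit indices on every axis via np.ix_.
full = arr[np.ix_(np.arange(arr.shape[0]), idx, idx)]

# 2) A plain slice on axis 0, np.ix_ only for the last two axes.
rows, cols = np.ix_(idx, idx)
sliced = arr[:, rows, cols]

# 3) Two separate selections, one axis at a time.
chained = arr[:, idx, :][:, :, idx]

print(full.shape)
print(np.array_equal(full, sliced), np.array_equal(full, chained))
```

The slice form avoids building the arange when every array along axis 0 is wanted.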
Generalizing
More generally, you can slice and dice however you want like so:
In [1]: arrays_to_select = (0, 1)
In [2]: rows_to_select = (1, 3, 4)
In [3]: cols_to_select = (1, 3, 4)
In [4]: indices = np.ix_(arrays_to_select, rows_to_select, cols_to_select)
In [5]: arr[indices]
Out[5]:
array([[[83, 36, 87],
[39, 46, 88],
[37, 77, 72]],
[[64, 99, 88],
[32, 9, 57],
[31, 23, 35]]], dtype=int64)
Let's consider some other shape:
In [4]: x = np.random.randint(0, 9, (4, 3, 5))
In [5]: x
Out[5]:
array([[[1, 0, 2, 1, 0],
[3, 5, 1, 4, 3],
[1, 8, 1, 4, 2]],
[[1, 6, 8, 2, 8],
[0, 0, 4, 2, 3],
[8, 5, 6, 2, 5]],
[[4, 4, 8, 6, 0],
[3, 0, 1, 2, 8],
[0, 8, 2, 4, 3]],
[[7, 8, 8, 1, 4],
[5, 7, 4, 8, 5],
[7, 5, 5, 3, 4]]])
In [6]: rows = (0, 2)
In [7]: cols = (0, 2, 3, 4)
By using those rows and cols, you'll be grabbing the subarrays composed of the elements in columns 0, 2, 3 and 4, from only rows 0 and 2. Let's verify that with the first array along axis 0:
In [8]: arrs = (0,) # A 1-tuple which will give us only the first array along axis 0
In [9]: x[np.ix_(arrs, rows, cols)]
Out[9]:
array([[[1, 2, 1, 0],
[1, 1, 4, 2]]])
Now suppose you want the subarrays produced by rows and cols of only the first and last arrays along axis 0. You can explicitly select (0, -1):
In [10]: arrs = (0, -1)
In [11]: x[np.ix_(arrs, rows, cols)]
Out[11]:
array([[[1, 2, 1, 0],
[1, 1, 4, 2]],
[[7, 8, 1, 4],
[7, 5, 3, 4]]])
If, instead, you want that same subarray from all the arrays along axis 0:
In [12]: arrs = np.arange(x.shape[0])
In [13]: arrs
Out[13]: array([0, 1, 2, 3])
In [14]: x[np.ix_(arrs, rows, cols)]
Out[14]:
array([[[1, 2, 1, 0],
[1, 1, 4, 2]],
[[1, 8, 2, 8],
[8, 6, 2, 5]],
[[4, 8, 6, 0],
[0, 2, 4, 3]],
[[7, 8, 1, 4],
[7, 5, 3, 4]]])
Say I want to do a lot of matrix multiplications in Numpy; what is the fastest way?
For concreteness, say this is the problem: I have two long lists of matrices, and I want to multiply them together pairwise. That is, I have
[a_1, a_2, a_3, ..., a_N]
and
[b_1, b_2, b_3, ..., b_N],
where each a_i, b_i is an nxn matrix (n is small, say n=2), and N is large (say N = 100000), and I want to find the matrix products a_1 * b_1, a_2 * b_2, ...
What is the fastest way to do this using Python and Numpy/Scipy?
Some options are:
with a for loop--this is slow since Python loops are slow.
putting the small matrices into two NxN block diagonal matrices A and B--this will result in having to multiply a much bigger matrix than needed.
using vectorize-- this is easiest to code, but isn't any faster than a for loop.
You already can multiply 3D arrays, simply put your list of arrays into numpy arrays, e.g.,
A = np.array([a_1, a_2, ..., a_N])
B = np.array([b_1, b_2, ..., b_N])
Then multiply A @ B (@ is the matrix multiplication operator). Here's an example using two "lists" of 3x3 arrays:
In [1]: import numpy as np
In [2]: x = np.random.randint(0, 9, (2, 3, 3))
In [3]: y = np.random.randint(0, 9, (2, 3, 3))
In [4]: x
Out[4]:
array([[[0, 4, 8],
[2, 5, 5],
[3, 0, 5]],
[[7, 6, 1],
[7, 0, 7],
[5, 2, 8]]])
In [5]: y
Out[5]:
array([[[7, 2, 6],
[6, 1, 4],
[6, 8, 5]],
[[8, 5, 4],
[8, 2, 7],
[3, 7, 0]]])
In [7]: x @ y
Out[7]:
array([[[ 72, 68, 56],
[ 74, 49, 57],
[ 51, 46, 43]],
[[107, 54, 70],
[ 77, 84, 28],
[ 80, 85, 34]]])
To demonstrate that all this does is the product of each matrix at the corresponding index:
In [8]: x[0]
Out[8]:
array([[0, 4, 8],
[2, 5, 5],
[3, 0, 5]])
In [9]: y[0]
Out[9]:
array([[7, 2, 6],
[6, 1, 4],
[6, 8, 5]])
In [10]: x[0] @ y[0]
Out[10]:
array([[72, 68, 56],
[74, 49, 57],
[51, 46, 43]])
In [11]: (x @ y)[0]
Out[11]:
array([[72, 68, 56],
[74, 49, 57],
[51, 46, 43]])
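As a sanity check on a larger batch, this sketch compares the batched product with an explicit Python loop and with np.einsum. The sizes and the names batched, looped and via_einsum are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 2                     # many small n x n matrices
A = rng.random((N, n, n))
B = rng.random((N, n, n))

# Batched product: one matrix product per index along axis 0.
batched = A @ B

# Reference: explicit Python loop over the matching pairs.
looped = np.array([a_i @ b_i for a_i, b_i in zip(A, B)])

# Same contraction written with einsum.
via_einsum = np.einsum("nij,njk->nik", A, B)

print(batched.shape)
print(np.allclose(batched, looped), np.allclose(batched, via_einsum))
```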
Just use numpy.matmul or @ like usual; it's a ufunc and can do broadcasting, just not over array elements but over matrix subarrays. In your case, you just need to stack your (n,m) matrices in a (N,n,m) numpy array and your (m,p) matrices in a (N,m,p) numpy array.
m = np.array([[1, 3], [2,4]])
m
Out[12]:
array([[1, 3],
[2, 4]])
m @ m
Out[13]:
array([[ 7, 15],
[10, 22]])
stackedm = np.stack([m,m,m])
stackedm
Out[15]:
array([[[1, 3],
[2, 4]],
[[1, 3],
[2, 4]],
[[1, 3],
[2, 4]]])
stackedm @ stackedm
Out[16]:
array([[[ 7, 15],
[10, 22]],
[[ 7, 15],
[10, 22]],
[[ 7, 15],
[10, 22]]])
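The same idea with non-square matrices, as a rough sketch: stacks of shape (N, n, m) and (N, m, p) should give (N, n, p), and a single (m, p) matrix is broadcast against the whole stack. The sizes and names here are made up for the example.

```python
import numpy as np

N, n, m, p = 4, 2, 3, 5
A = np.ones((N, n, m))             # N matrices of shape (n, m)
B = np.ones((N, m, p))             # N matrices of shape (m, p)

C = np.matmul(A, B)                # shape (N, n, p)
print(C.shape)

# A single (m, p) matrix is broadcast across the leading axis of A.
M = np.arange(m * p, dtype=float).reshape(m, p)
D = A @ M                          # shape (N, n, p)
print(D.shape, np.allclose(D[0], A[0] @ M))
```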
I have a 3d array a = np.arange(108).reshape(6, 6, 3). I want to grab certain indices of the array, as defined by i = np.array([[0, 1], [1, 3], [2, 1]]) such that the result is [[3, 4, 5], [27, 28, 29], [39, 40, 41]]. I need an efficient way to do this, as my actual arrays are significantly larger.
Extract the first and second dimension indices from i, then use advanced indexing:
a[i[:,0], i[:,1], :] # or a[i[:,0], i[:,1]]
#array([[ 3, 4, 5],
# [27, 28, 29],
# [39, 40, 41]])
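A self-contained version of the above, plus an equivalent spelling that unpacks the columns of i with tuple(i.T). The second form is my addition and the names picked and same are only illustrative.

```python
import numpy as np

a = np.arange(108).reshape(6, 6, 3)
i = np.array([[0, 1], [1, 3], [2, 1]])   # one (row, col) pair per line

# Pair the first column of i with the second column of i.
picked = a[i[:, 0], i[:, 1]]

# Equivalent: transpose i and pass its rows as separate index arrays.
same = a[tuple(i.T)]

print(picked)
print(np.array_equal(picked, same))
```

Both forms pair the indices element by element, so the result has one length-3 row per pair in i.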
I have a 3D numpy array and I want to partition it by the first 2 dimensions (and select all elements in the last one). Is there a simple way I can do that using numpy?
Example: given array
a = array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I would like to split it N ways by the first two axes (while retaining all elements in the last one), e.g.,:
a[0:2, 0:2, :], a[2:3, 2:3, :]
But it doesn't need to be evenly split. Seems like numpy.array_split will split on all axes?
In [179]: np.array_split(a,2,0)
Out[179]:
[array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]]),
array([[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])]
This is the same as [a[:2,:,:], a[2:,:,:]].
You could loop on those 2 arrays (the list in Out[179], called a1 below) and apply split on the next axis.
In [182]: a2=[np.array_split(aa,2,1) for aa in a1]
In [183]: a2 # edited for clarity
Out[183]:
[[array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 9, 10, 11],
[12, 13, 14]]]), # (2,2,3)
array([[[ 6, 7, 8]],
[[15, 16, 17]]])], # (2,1,3)
[array([[[18, 19, 20],
[21, 22, 23]]]), # (1,2,3)
array([[[24, 25, 26]]])]] # (1,1,3)
In [184]: a2[0][0].shape
Out[184]: (2, 2, 3)
In [185]: a2[0][1].shape
Out[185]: (2, 1, 3)
In [187]: a2[1][0].shape
Out[187]: (1, 2, 3)
In [188]: a2[1][1].shape
Out[188]: (1, 1, 3)
With the potential for uneven splits in each dimension, it is hard to do this in a fully vectorized form. And even if the splits were even, this sort of grid splitting is tricky because the values are not contiguous. In this example there's a gap between 5 and 9 in the first subarray.
A quick list comprehension will do the trick
[np.array_split(arr, 2, axis=1)
for arr in np.array_split(a, 2, axis=0)]
This will result in a list of lists, the items of which contain the arrays you're looking for.
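Here is a runnable sketch of that comprehension on the 3x3x3 example, along with a variant that uses np.split with explicit cut points when the pieces should be uneven in a specific way. The cut points [1] and the names blocks, shapes and custom are assumptions for illustration.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# Split along axis 0 first, then split each piece along axis 1.
blocks = [np.array_split(part, 2, axis=1)
          for part in np.array_split(a, 2, axis=0)]
shapes = [[b.shape for b in row] for row in blocks]
print(shapes)

# Explicit cut points instead of a number of sections:
# cut after index 1 on axis 0, and after index 1 on axis 1.
custom = [np.split(part, [1], axis=1)
          for part in np.split(a, [1], axis=0)]
custom_shapes = [[b.shape for b in row] for row in custom]
print(custom_shapes)
```

In both cases the last axis is left untouched, so every block keeps all 3 elements along it.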