Trying to understand how numpy selects elements when indexing in more than 2 dimensions.
import numpy as np
x = np.arange(24).reshape(2,3,4)
x[:,:,0].shape #(2, 3)
In writing x[:,:,0] I am selecting all the elements along the "depth", all the "rows", and the first column. When thinking about this visually I would have thought numpy would return something with a shape of (2, 3, 1), but instead the last dimension is dropped. This is reasonable, but how does numpy populate the result? I.e. in this example, why does x[:,:,0] result in the elements [0, 12] forming the first column? Just trying to figure out the general logic, which for some reason I am not comprehending at the moment.
General NumPy indexing is complicated, but this is still the easy case. I've always felt that it helps to think in terms of how indexing the result corresponds to indexing the original array.
The result of x[:, :, 0] is an array such that for any indices i and j,
result[i, j] == x[i, j, 0]
Similarly, if you index a 5D array a as a[:, 1, :, 2, :], the result is such that
result[i, j, k] == a[i, 1, j, 2, k]
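You can check this correspondence directly. The sketch below uses the 3D array from the question plus a made-up 5D array for the second case:

```python
import numpy as np

# The 3D case from the question.
x = np.arange(24).reshape(2, 3, 4)
result = x[:, :, 0]
assert result.shape == (2, 3)
for i in range(2):
    for j in range(3):
        assert result[i, j] == x[i, j, 0]
# First column of the result: x[0, 0, 0] and x[1, 0, 0], i.e. 0 and 12.
assert (result[:, 0] == np.array([0, 12])).all()

# The 5D case: a[:, 1, :, 2, :] keeps axes 0, 2 and 4 of a.
a = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)
result5 = a[:, 1, :, 2, :]
assert result5.shape == (2, 4, 6)
for i in range(2):
    for j in range(4):
        for k in range(6):
            assert result5[i, j, k] == a[i, 1, j, 2, k]
```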
Related
I have two 3D arrays, one containing the values I am using and one containing indices. I want to fill a 4D array using these two.
Each entry of the index array points towards a row of the input array.
At first I simply iterated through the values of i, j, and k and manually filled in each row. However, since this is a machine learning project, this method takes way too long.
# x.shape = (8, 2500, 3)
# ind.shape = (8, 2500, 9)
M = np.empty((8, 2500, 9, 3))
for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        for k in range(M.shape[2]):
            M[i, j, k, :] = x[i, ind[i, j, k], :]
Is there a faster way that exists to do this?
You can try something like:
import numpy as np
M = x[np.arange(0,ind.shape[0])[:, None, None], ind]
where [:, None, None] is needed to broadcast np.arange(0,ind.shape[0]) to the correct dimensions for indexing the array x.
As a test, you can generate the array M with your current method, then use the above method to generate an array M_, and confirm that (M == M_).all() returns True.
I make it to be at least 30x as fast.
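A self-contained sketch of that comparison, using smaller made-up shapes so it runs quickly (25 in place of 2500):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 25, 3))           # stand-in for (8, 2500, 3)
ind = rng.integers(0, 25, size=(8, 25, 9))    # stand-in for (8, 2500, 9)

# Loop version from the question.
M = np.empty((8, 25, 9, 3))
for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        for k in range(M.shape[2]):
            M[i, j, k, :] = x[i, ind[i, j, k], :]

# Broadcast-indexing version: arange selects the first axis, ind the second,
# and the trailing axis of x comes along for free.
M_ = x[np.arange(ind.shape[0])[:, None, None], ind]

assert (M == M_).all()
```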
I want to multiply two big matrices and then to take only diagonal elements of the resulting matrix:
m1[i, j] = sum_k A[i, k] * B[k, j]
m2[i] = m1[i, i]
I can do it this way: compute the full product and then extract the diagonal. However, doing it this way involves a lot of unnecessary operations. A better way to do it would be:
m[i, i] = sum_k A[i, k] * B[k, i]
Is there a way to "force" to do it the second way?
The solution:
T.sum(a * b.dimshuffle(1,0), axis = 1)
The explanation:
To "pair" indices of the first tensor with the indices of the second tensor we can use pairwise (element-wise) multiplication. In this case every dimension of the first tensor is paired with the corresponding dimension of the second tensor. However, to be able to do this we may need to reorder the dimensions, since the i-th dimension of the first tensor is always paired with the i-th dimension of the second tensor. So, use a transpose in cases like the one described.
After the element-wise multiplication you can sum over some dimensions.
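The same idea in plain NumPy, as a sketch (Theano's dimshuffle(1, 0) is just a transpose, and T.sum corresponds to np.sum):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 4))

# Naive: compute the full matrix product, then take its diagonal.
naive = np.diag(A @ B)

# Cheaper: pair row i of A with column i of B by transposing B,
# multiply element-wise, then sum over k. No off-diagonal work is done.
fast = np.sum(A * B.T, axis=1)

assert np.allclose(naive, fast)
```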
gather(params, indices) does the following
output[i, ..., j, :, ..., :] = params[indices[i, ..., j], :, ..., :]
so if you have 4-dimensional params and 2-dimensional indices, you end up with a 5-dimensional array as a result
the question is how to do
output[i, ..., j, :, ..., :] = params[indices[i, :], ..., indices[j, :], :, ..., :]
so that it acts as numpy's
output = params[indices[0], indices[1], .. , :]
(the #206 ticket on github is regarding different issue: it is about numpy-like api, not gathering in general)
one possible way is to use gather_nd, but (as far as I understand) if we want to gather over only some of the dimensions, we still have to create indices for the rest. E.g. if we have a 10-dimensional array A and we want to index the first two dimensions with a 2-dimensional array B, like A[B[0], B[1], :], our indices matrix would have to have 11 columns (with 8 redundant):
--- old indices ---- new index
0 0 <all rows of length 8> 0
1 1 <all rows of length 8> 1
...
There's an update on #206 that @ebrevdo is working on generalizing slicing.
Meanwhile, you could flatten your array, construct linear indices for the elements you want, use gather, then reshape back, as was done in another answer by mrry. That's probably not much worse in efficiency than a native implementation.
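Here is a sketch of that flatten/gather/reshape workaround in plain NumPy (the TensorFlow version would use tf.reshape and tf.gather in the same way; the shapes below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
params = rng.standard_normal((4, 5, 6))   # made-up shapes
idx0 = rng.integers(0, 4, size=7)         # indices into axis 0
idx1 = rng.integers(0, 5, size=7)         # indices into axis 1

# Desired: output[i, :] = params[idx0[i], idx1[i], :]
want = params[idx0, idx1]                 # NumPy's native advanced indexing

# Workaround: flatten the first two axes into one, build linear (row-major)
# indices into the flattened axis, gather along it, and keep the trailing axis.
flat = params.reshape(4 * 5, 6)
linear = idx0 * 5 + idx1                  # row-major: index0 * dim1 + index1
got = flat[linear]                        # the "gather" step

assert np.allclose(want, got)
```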
I encountered a quite weird problem when indexing a numpy ndarray. You can reproduce it with the following code. I don't understand why the result of indexing a is somehow transposed, while the result for the 2d array b is normal. Thanks.
In [1]: a = np.array(range(6)).reshape((1,2,3))
In [2]: mask = np.array([True, True, True])
In [3]: a
Out[3]:
array([[[0, 1, 2],
[3, 4, 5]]])
In [4]: a[0, :, mask]
Out[4]:
array([[0, 3],
[1, 4],
[2, 5]])
In [5]: a[0, :, mask].shape
Out[5]: (3, 2)
In [6]: b = np.array(range(6)).reshape((2,3))
In [7]: b[:, mask].shape
Out[7]: (2, 3)
a[0, :, mask] mixes advanced indexing with slicing. The : is a "slice index", while the 0 (for this purpose) and mask are considered "advanced indexes".
The rules governing the behavior of indexing when both advanced indexing and slicing are combined state:
There are two parts to the indexing operation, the subspace defined by the basic indexing (excluding integers) and the subspace from the advanced indexing part. Two cases of index combination need to be distinguished:
The advanced indexes are separated by a slice, ellipsis or newaxis. For example x[arr1, :, arr2].
The advanced indexes are all next to each other. For example x[..., arr1, arr2, :] but not x[arr1, :, 1] since 1 is an advanced index in this regard.
In the first case, the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that. In the second case, the dimensions from the advanced indexing operations are inserted into the result array at the same spot as they were in the initial array (the latter logic is what makes simple advanced indexing behave just like slicing).
So since a[0, :, mask] has advanced indexes separated by a slice (the first case), the shape of the resulting array has the axes associated with the advanced indexes pushed to the front and the axes associated with the slice pushed to the end. Thus the shape is (3, 2), since the mask is associated with the axis of length 3 and the slice, :, with the axis of length 2. (The 0 index in effect removes the axis of length 1 from the resultant array, so it plays no role in the resultant shape.)
In contrast, b[:, mask] has all the advanced indexes together (the second case). So the shape of the resulting array keeps the axes in place. b[:, mask].shape is thus (2, 3).
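A quick check of both cases, using the arrays from the question (the two-step indexing at the end is a common workaround when you want the "natural" axis order):

```python
import numpy as np

a = np.arange(6).reshape(1, 2, 3)
mask = np.array([True, True, True])

# Case 1: advanced indexes (0 and mask) separated by a slice.
# The mask's axis (length 3) moves to the front; the sliced axis (length 2) follows.
assert a[0, :, mask].shape == (3, 2)

# Case 2: all advanced indexes adjacent -- axes stay in place.
b = np.arange(6).reshape(2, 3)
assert b[:, mask].shape == (2, 3)

# Workaround for case 1: index in two steps so the mask is the only advanced index.
assert a[0][:, mask].shape == (2, 3)
```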
I have a multidimensional numpy array, and I need to iterate across a given dimension. Problem is, I won't know which dimension until runtime. In other words, given an array m, I could want
m[:,:,:,i] for i in xrange(n)
or I could want
m[:,:,i,:] for i in xrange(n)
etc.
I imagine that there must be a straightforward feature in numpy to write this, but I can't figure out what it is/what it might be called. Any thoughts?
There are many ways to do this. You could build the right index with a list of slices, or perhaps alter m's strides. However, the simplest way may be to use np.swapaxes:
import numpy as np
m=np.arange(24).reshape(2,3,4)
print(m.shape)
# (2, 3, 4)
Let axis be the axis you wish to loop over. m_swapped is the same as m except the axis=1 axis is swapped with the last (axis=-1) axis.
axis=1
m_swapped=m.swapaxes(axis,-1)
print(m_swapped.shape)
# (2, 4, 3)
Now you can just loop over the last axis:
for i in range(m_swapped.shape[-1]):
    assert np.all(m[:,i,:] == m_swapped[...,i])
Note that m_swapped is a view, not a copy, of m. Altering m_swapped will alter m.
m_swapped[1,2,0]=100
print(m)
assert(m[1,0,2]==100)
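An alternative sketch using np.moveaxis (available in newer NumPy versions), which reads a bit more explicitly and likewise returns a view:

```python
import numpy as np

m = np.arange(24).reshape(2, 3, 4)
axis = 1  # the axis to loop over, chosen at runtime

# Move the chosen axis to the end, then loop over the last axis.
m_moved = np.moveaxis(m, axis, -1)
assert m_moved.shape == (2, 4, 3)

for i in range(m_moved.shape[-1]):
    assert np.all(m[:, i, :] == m_moved[..., i])

# Like swapaxes, moveaxis returns a view, so writes propagate back to m.
m_moved[1, 2, 0] = 100
assert m[1, 0, 2] == 100
```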
You can use slice(None) in place of the :. For example,
from numpy import *
d = 2 # the dimension to iterate
x = arange(5*5*5).reshape((5,5,5))
s = slice(None)              # equivalent to :
for i in range(5):
    slicer = [s] * 3         # [:, :, :]
    slicer[d] = i            # [:, :, i]
    print(x[tuple(slicer)])  # x[:, :, i] -- modern NumPy requires a tuple index here