Numpy: 2D array access with 2D array of indices - python

I have two arrays, one is a matrix of index pairs,
a = array([[[0,0],[1,1]],[[2,0],[2,1]]], dtype=int)
and another which is a matrix of data to access at these indices
b = array([[1,2,3],[4,5,6],[7,8,9]])
and I want to be able to use the indices of a to get the entries of b. Just doing:
>>> b[a]
does not work, as it gives one row of b for each entry in a, i.e.
array([[[[1, 2, 3],
         [1, 2, 3]],
        [[4, 5, 6],
         [4, 5, 6]]],
       [[[7, 8, 9],
         [1, 2, 3]],
        [[7, 8, 9],
         [4, 5, 6]]]])
when I would like to use the index pair in the last axis of a to give the two indices of b:
array([[1,5],[7,8]])
Is there a clean way of doing this, or do I need to reshape b and combine the columns of a in a corresponding manner?
In my actual problem a has about 5 million entries and b is 100-by-100, so I'd like to avoid for loops.

Actually, this works:
b[a[:, :, 0], a[:, :, 1]]
which gives
array([[1, 5],
       [7, 8]])

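For this case, this works (the result is flat, so reshape it back to the shape of a without its last axis):
tmp = a.reshape(-1,2)
b[tmp[:,0], tmp[:,1]].reshape(a.shape[:-1])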
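A related option (not from the original answers) is to convert each (row, column) pair into a single position in the flattened b with np.ravel_multi_index and index b.ravel() directly. A sketch using the question's arrays:

```python
import numpy as np

a = np.array([[[0, 0], [1, 1]], [[2, 0], [2, 1]]])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Convert each (row, col) pair into one row-major index into the flattened b
flat = np.ravel_multi_index((a[..., 0], a[..., 1]), b.shape)
# b.ravel() is row-major, matching ravel_multi_index's default order='C'
result = b.ravel()[flat]
print(result)
# [[1 5]
#  [7 8]]
```

The flat indices keep the shape of a[..., 0], so no reshape is needed afterwards; ravel_multi_index also raises a ValueError for out-of-range pairs by default.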

A more general solution, whenever you want to use a 2D array of indices of shape (n,m) with an arbitrarily large dimension m, named inds, in order to access elements of another 2D array of shape (n,k), named B:
# array of index offsets to be added to each row of inds: row i of B starts at flat position i*k
offset = np.arange(0, B.size, B.shape[1])
# numpy.take(B, C) treats B as flattened and selects elements from it at the indices in C; the result has the shape of C
Result = np.take(B, offset[:,np.newaxis]+inds)
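The offsets must step by the row length of B (k), not by the width of inds (m); the two only coincide when m == k. A small check with m != k (the sizes are arbitrary illustrative values), compared against an explicit loop:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 4, 6, 3                       # B is (n, k); inds is (n, m) with m != k
B = rng.random((n, k))
inds = rng.integers(0, k, size=(n, m))  # column indices into each row of B

# Row i of B starts at flat position i * k, so the step is B.shape[1]
offset = np.arange(0, B.size, B.shape[1])
result = np.take(B, offset[:, np.newaxis] + inds)

# Reference: explicit loop over rows
expected = np.array([B[i, inds[i]] for i in range(n)])
print(np.array_equal(result, expected))  # True
```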
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
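A short sketch of both uses with a tiny illustrative B: the row-index column broadcasts against inds, so the same expression reads the selected entries and, on the left-hand side, overwrites them.

```python
import numpy as np

B = np.arange(12).reshape(3, 4)
inds = np.array([[0, 3], [1, 1], [2, 0]])   # two column indices per row
rows = np.arange(B.shape[0])[:, np.newaxis]  # shape (3, 1), broadcasts against inds (3, 2)

# Reading
picked = B[rows, inds]
print(picked)
# [[ 0  3]
#  [ 5  5]
#  [10  8]]

# Writing: the same index expression works as an assignment target
B[rows, inds] = -1
print(B)
```

NumPy 1.15 and later also provide np.take_along_axis(B, inds, axis=1) and np.put_along_axis for exactly this per-row pattern.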
You can test this by using, e.g.:
n, m = 5, 7
B = 1/(np.arange(n*m).reshape(n,-1) + 1)
inds = np.random.randint(0,B.shape[1],(B.shape[0],B.shape[1]))

Related

Iterating over Numpy Array rows in python even with 1 row / 2 columns array

I am trying to iterate over the rows of a numpy array. The array may consist of two columns and multiple rows like [[a, b], [c, d], ...], or sometimes a single row like [a, b].
For the one-dimensional array, when I use enumerate to iterate over rows, Python yields the individual elements a and then b instead of the complete row [a, b] all at once.
How do I iterate over the one-dimensional case in the same way as the 2D case?
Numpy iterates over the first dimension no matter what. Check the shape before you iterate.
>>> x = np.array([1, 2])
>>> x.ndim
1
>>> y = np.array([[1, 2], [3, 4], [5, 6]])
>>> y.ndim
2
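Probably the simplest method is to always wrap in a call to np.array:
>>> x = np.array(x, ndmin=2, copy=False)
>>> y = np.array(y, ndmin=2, copy=False)
This will prepend an axis of length 1 to your array as necessary. It has the advantage that your inputs don't even have to be arrays, just something that can be converted to an array. (On NumPy 2.0 and later, copy=False means "never copy" and raises a ValueError when a copy cannot be avoided, e.g. for a plain list; use copy=None there to get "copy only if needed".)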
Another option is to use the atleast_2d function:
>>> x = np.atleast_2d(x)
All that being said, you are likely sacrificing most of the benefits of using numpy in the first place by attempting a vanilla python loop. Try to vectorize your operation instead.
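If you do need the loop, a minimal sketch that treats a 1-D input as a single row (iter_rows is a hypothetical helper name):

```python
import numpy as np

def iter_rows(arr):
    """Yield rows of arr, treating a 1-D input as a single row."""
    # np.atleast_2d turns shape (2,) into (1, 2) and leaves 2-D arrays unchanged
    for row in np.atleast_2d(arr):
        yield row

single = np.array([1, 2])
many = np.array([[1, 2], [3, 4], [5, 6]])

for i, row in enumerate(iter_rows(single)):
    print(i, row)   # 0 [1 2]
for i, row in enumerate(iter_rows(many)):
    print(i, row)   # 0 [1 2], then 1 [3 4], then 2 [5 6]
```

Because np.atleast_2d accepts anything array-like, plain lists such as [7, 8] work too.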

How do I in a sense "juggle" elements in ndarrays?

What I mean by this is imagine you have an ndarray a with shape (2,3,4). I want to define another ndarray b with shape (3,2,4) such that
b[i][j][k] = a[j][i][k]
Matrix operations only apply to the last 2 index places. If there is a way to make matrix operations act on any 2 chosen index places then everything can be solved.
Thank you
On the same lines of your thought, you can use numpy.einsum() to achieve what you want.
In [21]: arr = np.random.randn(2,3,4)
In [22]: arr.shape
Out[22]: (2, 3, 4)
# swap first two dimensions
In [23]: rolled = np.einsum('ijk->jik', arr)
In [24]: rolled.shape
Out[24]: (3, 2, 4)
But pay attention to what you want to do with the resulting array because a view of the original array is returned. Thus, if you modify the rolled array, the original arr will also be affected.
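Use numpy.rollaxis:
numpy.rollaxis(a, 1)
This rolls axis 1 back to position 0. The NumPy documentation now recommends numpy.moveaxis(a, 1, 0) instead, which gives the same result here; rollaxis is kept for backward compatibility.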
What you are looking for is probably np.transpose(..) (in fact the transpose of a 2D matrix is a specific case of this):
b = a.transpose((1, 0, 2))
Here we specify that the first index of the new matrix (b) is the second (1) index of the old matrix (a); that the second index of the new matrix is the first (0) index of the old matrix; and the third index of the new matrix is the third index (2) of the old matrix.
This thus means that if a has a.shape = (m, n, p), then b.shape = (n, m, p).
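To compare the answers above, here is a quick check that each suggested call satisfies b[i, j, k] == a[j, i, k], with np.swapaxes added as one more equivalent option. np.shares_memory reports whether the result is a view onto a:

```python
import numpy as np

a = np.random.randn(2, 3, 4)

# Reference result built straight from the definition b[i, j, k] = a[j, i, k]
i, j, k = np.indices((3, 2, 4))
expected = a[j, i, k]

candidates = {
    "transpose": a.transpose((1, 0, 2)),
    "swapaxes": np.swapaxes(a, 0, 1),
    "moveaxis": np.moveaxis(a, 1, 0),
    "rollaxis": np.rollaxis(a, 1),
    "einsum": np.einsum('ijk->jik', a),
}

for name, b in candidates.items():
    print(name, b.shape, np.array_equal(b, expected), np.shares_memory(a, b))
```

transpose, swapaxes and moveaxis all return views, so the einsum caveat about modifying the result applies to them as well.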

Apply a function to each row of a numpy matrix w.r.t. its index

I have a numpy matrix A of shape [n,m] and an array b of length n. What I need is to take the sum of the b[i] smallest elements of the i-th row of A.
So the code might look like this:
A = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
b = np.array([2,3,1])
sums = magic_function() #sums = [3, 15, 7]
I've considered np.apply_along_axis() function but it seems that your function can only depend on the row itself in this case.
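Vectorized approach: first sort each row so that its smallest elements come first (the example rows happen to be sorted already, but in general they won't be), then use NumPy broadcasting to create the mask of valid entries along each row and perform a sum-reduction -
A = np.sort(A, axis=1)  # smallest elements of each row first
mask = b[:,None] > np.arange(A.shape[1])
out = (A*mask).sum(1)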
Alternatively, with np.einsum to get the reduction -
out = np.einsum('ij,ij->i',A,mask)
We can also use the np.matmul / @ notation on Python 3.5+ -
out = (A[:,None] @ mask[...,None]).squeeze()
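A self-contained sketch of the mask approach on rows that are not pre-sorted (sum_smallest is a hypothetical helper name; the input values are illustrative):

```python
import numpy as np

def sum_smallest(A, b):
    """Row-wise sum of the b[i] smallest entries of A[i]."""
    A_sorted = np.sort(A, axis=1)              # smallest values first in each row
    mask = b[:, None] > np.arange(A.shape[1])  # True for the first b[i] columns of row i
    return (A_sorted * mask).sum(axis=1)

A = np.array([[3, 1, 2],
              [6, 4, 5],
              [9, 7, 8]])
b = np.array([2, 3, 1])
print(sum_smallest(A, b))  # [ 3 15  7]
```

Without the sort, the mask would sum the first b[i] entries of each row rather than the smallest ones.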

Replace for loop by using a boolean matrix to perform advanced indexing

When dealing with a 3-dimensional matrix "M" of dimensions (A, B, C), one can index M using 2 vectors X with elements in [0, A) and Y with elements in [0, B) of the same dimension D.
More specifically, I understand that when writing
M[X,Y,:]
we are taking, for each "i" in D,
M[X[i], Y[i], :],
thus producing a DxC matrix in the end.
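To make this indexing rule concrete, here is a small check with arbitrary illustrative sizes (A, B, C) = (4, 5, 3) and D = 4; the names follow the question:

```python
import numpy as np

M = np.arange(4 * 5 * 3).reshape(4, 5, 3)   # dimensions (A, B, C) = (4, 5, 3)
X = np.array([0, 3, 3, 1])                  # D = 4 indices into the first axis
Y = np.array([2, 0, 4, 4])                  # D = 4 indices into the second axis

out = M[X, Y, :]
print(out.shape)  # (4, 3), i.e. D x C

# Same thing, one pair at a time
loop = np.stack([M[X[i], Y[i], :] for i in range(len(X))])
print(np.array_equal(out, loop))  # True
```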
Now suppose
X is a numpy array of dim U, same concept as before;
this time Y is a matrix UxL, where each row corresponds to a Boolean numpy array (a mask);
and look at the following code
for u in range(U):
    my_matrix[Y[u], X[u], :] += 1  # Y[u] is the mask that selects specific elements of the first dimension
I would like to write the same code without the for loop. Something like this
np.add.at(my_matrix, (Y, X), 1)  # I use numpy.ufunc.at since the same elements could occur multiple times in X or Y.
which unfortunately returns the following error
IndexError: boolean index did not match indexed array along dimension 0; dimension is L but corresponding boolean dimension is 1
The same issue arises when performing assignment:
for u in range(U):
    a_matrix[u, Y[u], :] = my_matrix[Y[u], X[u], :]
Do you know how I can address these problems in an elegant way?
The straightforward way of simply using the usual nd-array-shaped fancy indexing is unlikely to work for your problem. Here's why: Y has boolean rows which tell you which indices to take along the first dimension. So Y[0] and Y[1] might have a different number of True elements, and the rows of Y would slice subarrays of varying length along the first dimension. In other words, your array-shaped indices can't be translated to a rectangular subarray.
But if you think about what your indexing arrays mean, there's a way out. The rows of Y tell you exactly which elements to modify. If we unroll all the indices into one big collection of 1d fancy indices, we can pinpoint every (first-axis, second-axis) index pair that we want to touch.
In particular, consider the following example (sorely missing from your question, by the way):
A = np.arange(4*3*2).reshape(4,3,2)
Y = np.array([[True,False,False,True],
[True,True,True,False],
[True,False,False,True]])
X = np.array([2,1,2])
A has shape (4,3,2), Y has shape (3,4) (and the first and last rows are identical on purpose), and X has shape (3,) (and the first and last elements are identical on purpose). Let's turn the boolean indices into a collection of linear indices:
U,inds = Y.nonzero()
#U: array([0, 0, 1, 1, 1, 2, 2])
#inds: array([0, 3, 0, 1, 2, 0, 3])
As you can see, U contains the row index of each True element in Y. These are the indices that give the correspondence between rows of Y and elements of X. The second array, inds, contains the actual linear indices (for a given row) along the first dimension.
We're almost done, all we need is to pair up the elements of inds with the corresponding index from X for the second dimension. This is actually pretty easy: we just need to index X with U.
So all in all, the following two are equivalent looping and fancy-indexing solutions to the same problem:
B = A.copy()
for u in range(X.size):
    A[Y[u],X[u],:] += 1
U,inds = Y.nonzero()
np.add.at(B,(inds,X[U]),1)
A is modified with a loop, B is modified using np.add.at. We can see that the two are equal:
>>> (A == B).all()
True
And if you take a look at the example, you can see that I intentionally duplicated the first and third bunch of indices. This demonstrates that np.add.at is working properly with these fancy indices, complete with accumulating indices that appear multiple times on input. (Printing B and comparing it with the initial value of A, i.e. np.arange(4*3*2).reshape(4,3,2), you can see that the entries B[0,2,:] and B[3,2,:] are incremented twice.)
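The assignment loop from the question can be unrolled the same way. This sketch assumes a_matrix has shape (number of rows of Y, L, C) and my_matrix has shape (L, second dimension, C); the concrete sizes below are illustrative. A plain assignment is enough here because Y.nonzero() never repeats a (u, ind) pair:

```python
import numpy as np

Y = np.array([[True, False, False, True],
              [True, True, True, False],
              [True, False, False, True]])
X = np.array([2, 1, 2])
my_matrix = np.arange(4 * 3 * 2).reshape(4, 3, 2)    # shape (L, B, C), illustrative
a_loop = np.zeros((3, 4, 2), dtype=my_matrix.dtype)  # shape (U, L, C), illustrative
a_vec = a_loop.copy()

# Loop from the question
for u in range(X.size):
    a_loop[u, Y[u], :] = my_matrix[Y[u], X[u], :]

# Vectorized: each True in Y becomes one (u, ind) pair
U, inds = Y.nonzero()
a_vec[U, inds, :] = my_matrix[inds, X[U], :]

print(np.array_equal(a_loop, a_vec))  # True
```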

Index of multidimensional array

I have a problem using multi-dimensional vectors as indices for multi-dimensional vectors. Say I have C.ndim == idx.shape[0], then I want C[idx] to give me a single element. Allow me to explain with a simple example:
A = arange(0,10)
B = 10+A
C = array([A.T, B.T])
C = C.T
idx = array([3,1])
Now, C[3] gives me row 3 (the fourth row, since indexing starts at 0), and C[1] gives me row 1. C[idx] then gives me a vstack of both rows. However, I need C[3,1]. How would I achieve that given arrays C and idx?
Edit:
An answer suggested tuple(idx). This works perfectly for a single idx. But:
Let's take it to the next level: say INDICES is an array in which I have vertically stacked several arrays like idx. tuple(INDICES) will give me one long tuple, so C[tuple(INDICES)] won't work. Is there a clean way of doing this, or will I need to iterate over the rows?
If you convert idx to a tuple, it'll be interpreted as basic and not advanced indexing:
>>> C[3,1]
13
>>> C[tuple(idx)]
13
For the vector case:
>>> idx
array([[3, 1],
[7, 0]])
>>> C[3,1], C[7,0]
(13, 7)
>>> C[tuple(idx.T)]
array([13, 7])
>>> C[idx[:,0], idx[:,1]]
array([13, 7])
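The tuple(idx.T) trick is not limited to 2D arrays: for any C, transpose INDICES so that there is one index array per axis. A sketch with an illustrative 3D C, plus the equivalent route through flat indices with np.ravel_multi_index:

```python
import numpy as np

C = np.arange(2 * 3 * 4).reshape(2, 3, 4)
INDICES = np.array([[1, 2, 3],
                    [0, 0, 1],
                    [1, 0, 0]])   # each row is one full index (i, j, k) into C

# Transposing gives one index array per axis; tuple() turns that into a multi-axis index
values = C[tuple(INDICES.T)]
print(values)  # [23  1 12]

# Equivalent route: convert each row to a flat (row-major) position first
flat = np.ravel_multi_index(tuple(INDICES.T), C.shape)
print(C.ravel()[flat])  # [23  1 12]
```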
