What I mean is this: imagine you have an ndarray a with shape (2,3,4). I want to define another ndarray b with shape (3,2,4) such that
b[i][j][k] = a[j][i][k]
Matrix operations only apply to the last 2 index places. If there is a way to make matrix operations act on any 2 chosen index places then everything can be solved.
Thank you
Along the same lines, you can use numpy.einsum() to achieve what you want.
In [21]: arr = np.random.randn(2,3,4)
In [22]: arr.shape
Out[22]: (2, 3, 4)
# swap first two dimensions
In [23]: rolled = np.einsum('ijk->jik', arr)
In [24]: rolled.shape
Out[24]: (3, 2, 4)
But pay attention to what you want to do with the resulting array because a view of the original array is returned. Thus, if you modify the rolled array, the original arr will also be affected.
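To make the caveat about views concrete, here is a minimal sketch. The names arr and rolled follow the session above; np.shares_memory is used only as a check, and the sketch assumes a NumPy version in which einsum returns a writeable view for a pure axis permutation. Calling .copy() is one way to get an independent array.

```python
import numpy as np

arr = np.zeros((2, 3, 4))
rolled = np.einsum('ijk->jik', arr)   # pure axis permutation, no data copied

# True when both arrays are backed by the same buffer.
shares = np.shares_memory(arr, rolled)

# Writing through the view changes the original:
# rolled[j, i, k] is the same element as arr[i, j, k].
rolled[2, 1, 3] = 7.0

# Work on a copy if the original must stay untouched.
independent = rolled.copy()
independent[0, 0, 0] = 5.0
```

After this runs, arr[1, 2, 3] holds 7.0 while arr[0, 0, 0] is still 0.0.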
Use numpy.rollaxis:
numpy.rollaxis(a, 1)
What you are looking for is probably np.transpose(..) (in fact the transpose of a 2D matrix is a specific case of this):
b = a.transpose((1, 0, 2))
Here we specify that the first index of the new matrix (b) is the second (1) index of the old matrix (a); that the second index of the new matrix is the first (0) index of the old matrix; and the third index of the new matrix is the third index (2) of the old matrix.
This thus means that if a has a.shape = (m, n, p), then b.shape = (n, m, p).
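As a quick check of the index rule from the question, the sketch below builds a small array (the values are arbitrary) and compares transpose with np.swapaxes and np.moveaxis, which give the same result for this particular permutation.

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)

b = a.transpose((1, 0, 2))      # new axis order: old axis 1, old axis 0, old axis 2
b_swap = np.swapaxes(a, 0, 1)   # exchange axes 0 and 1
b_move = np.moveaxis(a, 1, 0)   # move axis 1 to the front

# Check b[i][j][k] == a[j][i][k] for every index triple.
matches = all(
    b[i, j, k] == a[j, i, k]
    for i in range(3) for j in range(2) for k in range(4)
)
print(b.shape, matches)
```

For exchanging exactly two axes, swapaxes is arguably the most readable; transpose is the general tool when more than two axes are permuted.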
The standard use of np.newaxis (or None) is within the index brackets. For example
arr = np.arange(4)
print(arr[:,None].shape)
print(arr[None,:].shape)
gives
(4, 1)
(1, 4)
But I recently saw someone using it as an index in a separate bracket, like so
arr = np.arange(4)
print(arr[:][None].shape)
print(arr[None][:].shape)
which gives
(1, 4)
(1, 4)
To my surprise, both results are the same, even though the newaxis was in the second index position in the first case. Why is arr[:][None].shape not (4,1)?
In arr[:][np.newaxis] and arr[np.newaxis][:] the indexing is done sequentially, so arr2 = arr[:][np.newaxis] is equivalent to:
arr_temp = arr[:]
arr2 = arr_temp[np.newaxis]
del arr_temp
The same logic applies to ordering the indexing operators the other way round, for arr2 = arr[np.newaxis][:]:
arr_temp = arr[np.newaxis]
arr2 = arr_temp[:]
del arr_temp
Now, to quote https://numpy.org/doc/1.19/reference/arrays.indexing.html:
Each newaxis object in the selection tuple serves to expand the dimensions of the resulting selection by one unit-length dimension. The added dimension is the position of the newaxis object in the selection tuple.
Since np.newaxis is at the first position (there is only one position) in the indexing selection tuple in both arr[np.newaxis] and arr_temp[np.newaxis], it will create the new dimension as the first dimension, and thus the resulting shape is (1, 4) in both cases.
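The difference between one index tuple and two chained index operations can be seen side by side in this short sketch (the variable names are only illustrative):

```python
import numpy as np

arr = np.arange(4)

col = arr[:, None]         # one index tuple (slice, newaxis): new axis comes second
row = arr[None, :]         # one index tuple (newaxis, slice): new axis comes first

chained_a = arr[:][None]   # arr[:] is still shape (4,), then [None] prepends an axis
chained_b = arr[None][:]   # arr[None] is shape (1, 4), then [:] leaves it unchanged

print(col.shape, row.shape, chained_a.shape, chained_b.shape)
# (4, 1) (1, 4) (1, 4) (1, 4)
```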
When dealing with a 3-dimensional matrix "M" of dimensions (A, B, C), one can index M using 2 vectors X with elements in [0, A) and Y with elements in [0, B) of the same dimension D.
More specifically, I understand that when writing
M[X,Y,:]
we are taking, for each "i" in D,
M[X[i], Y[i], :],
thus producing a DxC matrix in the end.
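A small sketch of this pairing rule, with made-up sizes and index values (A_dim, B_dim, C_dim and the contents of X and Y are assumptions for illustration only):

```python
import numpy as np

A_dim, B_dim, C_dim = 3, 4, 5            # hypothetical sizes
M = np.arange(A_dim * B_dim * C_dim).reshape(A_dim, B_dim, C_dim)

X = np.array([0, 2, 2, 1])               # values in [0, A_dim)
Y = np.array([3, 0, 1, 1])               # values in [0, B_dim)

picked = M[X, Y, :]                      # shape (D, C_dim) with D = len(X)

# The same result built one row at a time.
looped = np.stack([M[X[i], Y[i], :] for i in range(len(X))])
```

Here picked and looped are equal, and both have shape (4, 5).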
Now suppose
X is a numpy array of dimension U, same concept as before;
this time Y is a UxL matrix, where each row corresponds to a Boolean numpy array (a mask);
and look at the following code
for u in range(U):
    my_matrix[Y[u], X[u], :] += 1 # Y[u] is the mask that selects specific elements of the first dimension
I would like to write the same code without the for loop. Something like this
np.add.at(my_matrix, (Y, X), 1) # I use numpy.ufunc.at since the same elements could occur multiple times in X or Y.
which unfortunately returns the following error
IndexError: boolean index did not match indexed array along dimension 0; dimension is L but corresponding boolean dimension is 1
This issue can also be found when performing assignment
for u in range(U):
    a_matrix[u, Y[u], :] = my_matrix[Y[u], X[u], :]
Do you know how I can address these problems in an elegant way?
The straightforward way of simply using the usual nd-array-shaped fancy indexing is unlikely to work for your problem. Here's why I'm saying this: Y has boolean rows which tell you which indices to take along the first dimension. So Y[0] and Y[1] might have a different number of True elements, thus the rows of Y would slice subarrays with varying length along the first dimension. In other words, your array-shaped indices can't be translated to a rectangular subarray.
But if you think about what your indexing arrays mean, there's a way out. The rows of Y tell you exactly which elements to modify. If we flatten all the indices into one long collection of 1d fancy indices, we can pinpoint each (x,y) pair in the first two dimensions that we want to index.
In particular, consider the following example (sorely missing from your question, by the way):
A = np.arange(4*3*2).reshape(4,3,2)
Y = np.array([[True,False,False,True],
[True,True,True,False],
[True,False,False,True]])
X = np.array([2,1,2])
A is shape (4,3,2), Y is shape (3,4) (and the first and last rows are identical on purpose), X is shape (3,) (and the first and last elements are identical on purpose). Let's turn the boolean indices into a collection of linear indices:
U,inds = Y.nonzero()
#U: array([0, 0, 1, 1, 1, 2, 2])
#inds: array([0, 3, 0, 1, 2, 0, 3])
As you can see, U holds the row index of each True element in Y. These are the indices that give the correspondence between rows of Y and elements of X. The second array, inds, holds the actual linear indices (for a given row) along the first dimension.
We're almost done, all we need is to pair up the elements of inds with the corresponding index from X for the second dimension. This is actually pretty easy: we just need to index X with U.
So all in all, the following two are equivalent looping and fancy-indexing solutions to the same problem:
B = A.copy()
for u in range(X.size):
    A[Y[u],X[u],:] += 1
U,inds = Y.nonzero()
np.add.at(B,(inds,X[U]),1)
A is modified with a loop, B is modified using np.add.at. We can see that the two are equal:
>>> (A == B).all()
True
And if you take a look at the example, you can see that I intentionally duplicated the first and third bunch of indices. This demonstrates that np.add.at is working properly with these fancy indices, complete with accumulating indices that appear multiple times on input. (Printing B and comparing with the initial value of A you can see that the last items are incremented twice.)
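The assignment loop from the question is not covered above, but the same pair of index arrays appears to handle it too. The following is a minimal sketch under stated assumptions: a_matrix is taken to have shape (rows of Y, length of each mask, last axis of my_matrix), which is inferred from the loop in the question, and the example data are the ones used in this answer.

```python
import numpy as np

my_matrix = np.arange(4 * 3 * 2).reshape(4, 3, 2)
Y = np.array([[True, False, False, True],
              [True, True, True, False],
              [True, False, False, True]])
X = np.array([2, 1, 2])

# Assumed shape for a_matrix, inferred from the loop in the question.
a_loop = np.zeros((Y.shape[0], Y.shape[1], my_matrix.shape[2]),
                  dtype=my_matrix.dtype)
a_vec = np.zeros_like(a_loop)

# Loop version from the question.
for u in range(X.size):
    a_loop[u, Y[u], :] = my_matrix[Y[u], X[u], :]

# Vectorized version: one (row, position) pair per True entry of Y.
U, inds = Y.nonzero()
a_vec[U, inds, :] = my_matrix[inds, X[U], :]
```

Because each (U, inds) pair occurs only once, plain assignment is enough here; np.add.at is only needed when repeated indices must accumulate.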
Say I have matrices A, B, and C. When I initialize the matrices to
A = np.array([[]])
B = np.array([[1,2,3]])
C = np.array([[1,2,3],[4,5,6]])
Then
B.shape[0]
C.shape[0]
give 1 and 2, respectively (as expected), but
A.shape[0]
gives 1, just like B.shape[0].
What is the simplest way to get the number of rows of a given matrix, while still ensuring that an empty matrix like A gives a value of zero?
After searching Stack Overflow for a while, I couldn't find an answer, so I'm posting my own below, but if you can come up with a cleaner, more general answer, I'll accept your answer instead. Thanks!
A = np.array([[]])
That's a 1-by-0 array. You seem to want a 0-by-3 array. Such an array is almost completely useless, but if you really want one, you can make one:
A = np.zeros([0, 3])
Then you'll have A.shape[0] == 0.
You could qualify the shape[0] by the test of whether size is 0 or not.
In [121]: A.shape[0]*(A.size>0)
Out[121]: 0
In [122]: B.shape[0]*(B.size>0)
Out[122]: 1
In [123]: C.shape[0]*(C.size>0)
Out[123]: 2
or test the number of columns
In [125]: A.shape[0]*(A.shape[1]>0)
Out[125]: 0
What's distinctive about A is the number of columns, the 2nd dimension.
Using
A.size//(len(A[0]) or 1)
B.size//(len(B[0]) or 1)
C.size//(len(C[0]) or 1)
yields 0, 1, and 2, respectively (integer division keeps the results as integers in Python 3).
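The size test from the answers above can be wrapped in a small helper. This is only a sketch; the name row_count is made up for illustration and is not a NumPy function.

```python
import numpy as np

def row_count(m):
    """Number of rows of a 2-D array, counting an array with no elements as 0 rows."""
    m = np.asarray(m)
    return m.shape[0] if m.size > 0 else 0

A = np.array([[]])                      # shape (1, 0)
B = np.array([[1, 2, 3]])               # shape (1, 3)
C = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)

counts = [row_count(A), row_count(B), row_count(C)]
print(counts)
# [0, 1, 2]
```

A genuinely empty 0-by-3 array such as np.zeros([0, 3]) also gives 0 with this helper.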
I do not understand the behaviour of numpy.array_split with sub-indices. Given an array of a certain length, I select a subset with a boolean index and then call array_split on it. I get different behaviour depending on whether the number of selected elements is odd or even. Here is an example:
import numpy as np
a = np.ones(2750001) # fake array
t = np.arange(a.size) # fake time basis
indA = ((t>= 5e5) & (t<= 1e6)) # First subindices odd number
indB = ((t>=5e5+1) & (t<= 1e6)) # Second indices even number
# now perform array_split
print(np.shape(np.array_split(a[indA],10)))
# (10,)
print(np.shape(np.array_split(a[indB],10)))
# (10, 50000)
Now we have different results: for the even number of elements, np.shape gives (10, 50000), whereas for the odd number it gives (10,) (presumably the 10 lists). I'm a bit surprised, and I would like to understand the reason. I know that array_split can also be used when the number of sections does not divide the array equally. But I would like some clue, because I need to use this in a loop where I do not know a priori whether the number of elements will be even or odd.
I think the surprising behavior has more to do with np.shape than np.array_split:
In [58]: np.shape([(1,2),(3,4)])
Out[58]: (2, 2)
In [59]: np.shape([(1,2),(3,4,5)])
Out[59]: (2,)
np.shape(a) is showing the shape of the array np.asarray(a):
def shape(a):
    try:
        result = a.shape
    except AttributeError:
        result = asarray(a).shape
    return result
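So, when np.array_split returns a list of arrays of unequal length, np.asarray(a) is a 1-dimensional array of object dtype (this is the behaviour of older NumPy versions; from NumPy 1.24 on, ragged input like this raises a ValueError unless dtype=object is passed explicitly):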
In [61]: np.asarray([(1,2),(3,4,5)])
Out[61]: array([(1, 2), (3, 4, 5)], dtype=object)
When array_split returns a list of arrays of equal length, then np.asarray(a) returns a 2-dimensional array:
In [62]: np.asarray([(1,2),(3,4)])
Out[62]:
array([[1, 2],
[3, 4]])
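If the goal is to handle both cases in a loop, one option is to inspect the returned list directly instead of passing it to np.shape. A minimal sketch with small made-up arrays (11 and 12 elements split into 3 pieces):

```python
import numpy as np

uneven = np.ones(11)     # 11 is not divisible by 3
even = np.ones(12)       # 12 is divisible by 3

pieces_uneven = np.array_split(uneven, 3)
pieces_even = np.array_split(even, 3)

# array_split always returns a plain Python list of arrays.
lengths_uneven = [len(p) for p in pieces_uneven]
lengths_even = [len(p) for p in pieces_even]

print(len(pieces_uneven), lengths_uneven)
# 3 [4, 4, 3]
print(len(pieces_even), lengths_even)
# 3 [4, 4, 4]
```

Iterating over the list works the same way in both cases, so the loop does not need to know in advance whether the split is even.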
I have two arrays, one is a matrix of index pairs,
a = array([[[0,0],[1,1]],[[2,0],[2,1]]], dtype=int)
and another which is a matrix of data to access at these indices
b = array([[1,2,3],[4,5,6],[7,8,9]])
and I want to be able to use the indices of a to get the entries of b. Just doing:
>>> b[a]
does not work, as it gives one row of b for each entry in a, i.e.
array([[[[1,2,3],
[1,2,3]],
[[4,5,6],
[4,5,6]]],
[[[7,8,9],
[1,2,3]],
[[7,8,9],
[4,5,6]]]])
when I would like to use the index pair in the last axis of a to give the two indices of b:
array([[1,5],[7,8]])
Is there a clean way of doing this, or do I need to reshape b and combine the columns of a in a corresponding manner?
In my actual problem a has about 5 million entries and b is 100-by-100, so I'd like to avoid for loops.
Actually, this works:
b[a[:, :, 0],a[:, :, 1]]
Gives array([[1, 5],
[7, 8]]).
For this case, this works
tmp = a.reshape(-1,2)
b[tmp[:,0], tmp[:,1]]
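Both answers can be checked against the data from the question. The sketch below also shows a variant that should work for any number of indexed axes, by moving the last axis of a to the front and converting it to a tuple of index arrays (np.moveaxis is used for this; the variable names are illustrative).

```python
import numpy as np

a = np.array([[[0, 0], [1, 1]], [[2, 0], [2, 1]]])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Explicit form: row indices and column indices as separate arrays.
rows, cols = a[..., 0], a[..., 1]
result = b[rows, cols]

# Flattened form: gives the same values as a 1-D array.
tmp = a.reshape(-1, 2)
flat_result = b[tmp[:, 0], tmp[:, 1]]

# General form: one index array per axis of b, taken from the last axis of a.
result_general = b[tuple(np.moveaxis(a, -1, 0))]

print(result)
# [[1 5]
#  [7 8]]
```

Note that the reshape version returns a flat array, so it needs a final reshape to a.shape[:-1] if the original layout matters.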
A more general solution applies whenever you have a 2D array of indices named inds, of shape (n,m) with arbitrarily large m, and want to use row i of inds to pick elements from row i of another 2D array named B, of shape (n,k):
# flat position of the first element of each row of B
offset = np.arange(0, B.size, B.shape[1])
# with no axis given, numpy.take(B, C) indexes into the flattened B and returns an array with the shape of C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
You can test this by using, e.g. (for any positive integers n and m):
B = 1/(np.arange(n*m).reshape(n,-1) + 1)
inds = np.random.randint(0,B.shape[1],(B.shape[0],B.shape[1]))
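A self-contained version of that test might look like the sketch below. The sizes n, k and m are made up, with k different from m on purpose so that the row offsets into the flattened B (multiples of k, the width of B) are exercised. np.take_along_axis, available from NumPy 1.15, is included as a third way to express the same selection.

```python
import numpy as np

n, k, m = 4, 6, 3     # hypothetical sizes: B is (n, k), inds is (n, m)
rng = np.random.default_rng(0)
B = 1 / (np.arange(n * k).reshape(n, k) + 1)
inds = rng.integers(0, k, size=(n, m))

# Flat-index approach: row i of B starts at flat position i * k.
offset = np.arange(n) * k
via_take = np.take(B, offset[:, np.newaxis] + inds)

# Fancy-indexing approach: a column of row numbers broadcast against inds.
via_fancy = B[np.arange(n)[:, np.newaxis], inds]

# take_along_axis selects along axis 1, row by row.
via_along = np.take_along_axis(B, inds, axis=1)

# Reference result from an explicit double loop.
expected = np.array([[B[i, inds[i, j]] for j in range(m)] for i in range(n)])
```

All three results have shape (n, m) and agree with the loop. Only the fancy-indexing form can also appear on the left-hand side of an assignment.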