Suppose I have a two-dimensional NumPy array. For example:
dog = np.random.rand(3, 3)
Now, I can extract the intersection of the first and second rows and the second and third columns of dog like this:
dog[:2, 1:]
I can also do
dog[[0, 1], 1:]
or
dog[:2, [1, 2]]
But I CANNOT do
dog[[0, 1], [1, 2]]
That returns a one-dimensional array of the elements dog[0, 1] and dog[1, 2].
And this seems to mean that to extract the principal submatrix formed by the intersection of the first and last rows of dog and the first and last columns, I have to do something gross like:
tmp = dog[[0, 2], :]
ans = tmp[:, [0, 2]]
Is there a more civilized way of extracting submatrices? The obvious solution dog[[0, 2], [0, 2]] does work in Julia.
In [94]: dog = np.arange(9).reshape(3,3)
In [95]: dog
Out[95]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
The slices select a block:
In [96]: dog[:2,1:]
Out[96]:
array([[1, 2],
[4, 5]])
With two lists (1d arrays), we select the diagonal of that block:
In [97]: dog[[0,1],[1,2]]
Out[97]: array([1, 5])
But if we change the first to a (2,1) array, it broadcasts with the (2,) array to select a (2,2) block:
In [98]: dog[[[0],[1]],[1,2]]
Out[98]:
array([[1, 2],
[4, 5]])
In [99]: dog[np.ix_([0,1],[1,2])]
Out[99]:
array([[1, 2],
[4, 5]])
ix_ turns the 2 lists into (2,1) and (1,2) arrays:
In [100]: np.ix_([0,1],[1,2])
Out[100]:
(array([[0],
[1]]),
array([[1, 2]]))
The diagonal selection in [97] follows the same broadcasting logic: (2,) and (2,) => (2,).
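A minimal, self-contained check of the point above, reusing the answer's dog = np.arange(9).reshape(3, 3) and the question's first/last-rows-and-columns submatrix (the specific indices are just an illustration): np.ix_ gives the block, the question's two-step tmp approach gives the same block, and two plain lists give the diagonal.

```python
import numpy as np

dog = np.arange(9).reshape(3, 3)

# np.ix_ reshapes the index lists to (2, 1) and (1, 2), which broadcast to (2, 2)
rows, cols = np.ix_([0, 2], [0, 2])
block = dog[rows, cols]            # first/last rows x first/last columns

# The same block via the two-step approach from the question
tmp = dog[[0, 2], :]
two_step = tmp[:, [0, 2]]

# Two (2,) index arrays broadcast to (2,), so this picks dog[0, 0] and dog[2, 2]
diag = dog[[0, 2], [0, 2]]

print(np.broadcast(rows, cols).shape)  # (2, 2)
print(block)                           # [[0 2]
                                       #  [6 8]]
print(diag)                            # [0 8]
```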
I don't know about Julia, but MATLAB lets us use [97]-like syntax to select the block; to get the 'diagonal' we have to convert the indices to flat indices, the equivalent of:
In [104]: np.ravel_multi_index(([0,1],[1,2]),(3,3))
Out[104]: array([1, 5])
In [105]: dog.flat[_]
Out[105]: array([1, 5])
So what's easy in numpy is harder in MATLAB, and vice versa. Once you understand broadcasting, the numpy approach is logical and general.
Related
There is this method written in MATLAB that I want to translate into Python. However, I don't understand how to interpret the notation of indexing the sparse matrix M with a row of the matrix faces. What would be the equivalent in Python?
M = spalloc(size(template,1), size(template,1), 10*size(template,1));
for i = 1:size(faces,1)
v = faces(i,:); % faces is a Nx3 matrix
...
M(v, v) = M(v, v) + WIJ; % WIJ is some 3x3 matrix
@Eric Yu uses a dense numpy array:
In [239]: A=np.array([[1,2,3],[3,4,5],[5,6,7]])
In [240]: A
Out[240]:
array([[1, 2, 3],
[3, 4, 5],
[5, 6, 7]])
In [241]: v=[0,1]
This indexing selects rows:
In [242]: A[v]
Out[242]:
array([[1, 2, 3],
[3, 4, 5]])
and from that select columns:
In [243]: A[v][:,v]
Out[243]:
array([[1, 2],
[3, 4]])
But A[v] is a copy, not a view, so the assignment only changes that temporary copy and A itself is left unchanged:
In [244]: A[v][:,v] = 0
In [245]: A
Out[245]:
array([[1, 2, 3],
[3, 4, 5],
[5, 6, 7]])
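A small sketch of why the chained form sometimes works and sometimes doesn't: the second step only writes back into A if the first step returned a view. np.shares_memory is used here as a quick check. The slice A[:2] happens to cover the same rows as v = [0, 1]; that trick is not available for a non-contiguous v such as [0, 2], which is why a single indexing step is needed in general.

```python
import numpy as np

A = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7]])
v = [0, 1]

# A[v] (list index) is a new array, so this zeroes a temporary copy
A[v][:, v] = 0
print(A[0, 0])                      # 1, A is unchanged
print(np.shares_memory(A, A[v]))    # False: advanced indexing copies

# A[:2] (basic slice) is a view, so the same chained pattern writes into A
print(np.shares_memory(A, A[:2]))   # True
A[:2][:, :2] = 0
print(A)                            # top-left 2x2 block is now zero
```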
===
To properly index a block of a numpy array, use ix_ (or equivalent) to create indexing arrays that broadcast against each other to define the block:
In [247]: np.ix_(v,v)
Out[247]:
(array([[0],
[1]]), array([[0, 1]]))
In [248]: A[np.ix_(v,v)]
Out[248]:
array([[1, 2],
[3, 4]])
In [249]: A[np.ix_(v,v)]=0
In [250]: A
Out[250]:
array([[0, 0, 3],
[0, 0, 5],
[5, 6, 7]])
Without the ix_ transform, indexing with [v,v] selects a diagonal:
In [251]: A[v,v]
Out[251]: array([0, 0])
MATLAB's M(v,v) indexes the block. Indexing the diagonal, on the other hand, requires converting subscripts with sub2ind. This is a case where MATLAB's indexing notation makes one task easy and the other more complex. numpy does the reverse.
===
What I wrote applies to sparse matrices as well:
In [253]: M=sparse.lil_matrix(np.array([[1,2,3],[3,4,5],[5,6,7]]))
In [254]: M
Out[254]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in LInked List format>
The diagonal selection:
In [255]: M[v,v]
Out[255]:
<1x2 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in LInked List format>
In [256]: _.A
Out[256]: array([[1, 4]], dtype=int64)
Note that this matrix is (1,2), still 2d, in the style of MATLAB matrices.
Block selection:
In [258]: M[np.ix_(v,v)]
Out[258]:
<2x2 sparse matrix of type '<class 'numpy.int64'>'
with 4 stored elements in LInked List format>
In [259]: _.A
Out[259]:
array([[1, 2],
[3, 4]], dtype=int64)
In [260]: M[np.ix_(v,v)]=0
In [261]: M.A
Out[261]:
array([[0, 0, 3],
[0, 0, 5],
[5, 6, 7]], dtype=int64)
sparse.csr_matrix will index in the same way (with some differences in the assignment step).
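Putting this together for the original MATLAB loop, a rough Python translation might look like the sketch below. faces, WIJ and the vertex count n are made-up stand-ins (the question elides how they are built), indices are 0-based, and lil_matrix is used only as a loose analogue of spalloc. The second half shows an alternative that is often preferred for assembling matrices like this: collect all (row, column, value) triples and let a COO matrix sum the duplicates when it is converted to CSR.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the question's data (not from the original code):
# n vertices, faces as an (N, 3) array of 0-based vertex indices, and a 3x3 WIJ.
n = 4
faces = np.array([[0, 1, 2], [1, 2, 3]])
WIJ = np.ones((3, 3))

# Direct translation: lil_matrix supports incremental block assignment
M = sparse.lil_matrix((n, n))
for v in faces:                  # v plays the role of faces(i, :)
    idx = np.ix_(v, v)           # block index, the numpy counterpart of M(v, v)
    M[idx] = M[idx].toarray() + WIJ

# Alternative: build all (row, col, value) triples at once.
# For each face, rows repeat each vertex 3 times and cols cycle through the face,
# matching WIJ.ravel() in C order.
r = np.repeat(faces, 3, axis=1).ravel()
c = np.tile(faces, (1, 3)).ravel()
vals = np.tile(WIJ.ravel(), len(faces))
M2 = sparse.coo_matrix((vals, (r, c)), shape=(n, n)).tocsr()  # duplicates are summed

print(M.toarray())
print(M2.toarray())
```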
import numpy as np
A=[[1,2,3],[3,4,5],[5,6,7]]
M=np.array(A)
v=[0,1]
M[v][:,v]
The result is:
array([[1, 2],
[3, 4]])
In [136]: s = np.array([[1,0,1],[0,1,1],[0,0,1],[1,1,1]])
In [137]: s
Out[137]:
array([[1, 0, 1],
[0, 1, 1],
[0, 0, 1],
[1, 1, 1]])
In [138]: x = s[0:1]
In [139]: x.shape
Out[139]: (1, 3)
In [140]: y = s[0]
In [141]: y.shape
Out[141]: (3,)
In [142]: x
Out[142]: array([[1, 0, 1]])
In [143]: y
Out[143]: array([1, 0, 1])
In the above code, x's shape is (1,3) and y's shape is (3,).
(1,3): 1 row and 3 columns
(3,): How many rows and columns in this case?
Does (3,) represent a 1-dimensional array?
In practice, if I want to iterate through the matrix row by row, which way should I go?
for i in range(len(x)):
    row = x[i]
    # OR
    row = x[i:i+1]
First, you can get the number of dimensions of a numpy array named array through len(array.shape), or equivalently array.ndim.
An array with some dimensions of length 1 is not equal to an array with those dimensions removed, for example:
>>> a = np.array([[1], [2], [3]])
>>> b = np.array([1, 2, 3])
>>> a
array([[1],
[2],
[3]])
>>> b
array([1, 2, 3])
>>> a.shape
(3, 1)
>>> b.shape
(3,)
>>> a + a
array([[2],
[4],
[6]])
>>> a + b
array([[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
Conceptually, the difference between an array of shape (3, 1) and one of shape (3,) is like the difference between the length of [100] and 100.
[100] is a list that happens to have one element. It could have more, but right now it has the minimum possible number of elements.
On the other hand, it doesn't even make sense to talk about the length of 100, because it doesn't have one.
Similarly, the array of shape (3, 1) has 3 rows and 1 column, while the array of shape (3,) has no columns at all. It doesn't even have rows, in a sense; it is a row, just like 100 has no elements, because it is an element.
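A quick sketch using the question's s, showing how the two shapes differ in ndim and how to move between them by adding or removing a length-1 axis:

```python
import numpy as np

s = np.array([[1, 0, 1], [0, 1, 1], [0, 0, 1], [1, 1, 1]])
x = s[0:1]   # a slice keeps axis 0 (length 1): shape (1, 3), ndim 2
y = s[0]     # an integer index removes axis 0: shape (3,), ndim 1

print(x.ndim, y.ndim)               # 2 1
print(y[np.newaxis, :].shape)       # (1, 3), add a leading axis
print(y[:, np.newaxis].shape)       # (3, 1), add a trailing axis
print(x.reshape(-1).shape)          # (3,), drop back to 1-d
print(np.array_equal(x[0], y))      # True, same values either way
```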
For more information on how differently shaped arrays behave when interacting with other arrays, you can see the broadcasting rules.
Lastly, for completeness, to iterate through the rows of a numpy array, you could just do for row in array. If you want to iterate through the back axes, you can use np.moveaxis, for example:
>>> array = np.array([[1, 2], [3, 4], [5, 6]])
>>> for row in array:
...     print(row)
...
[1 2]
[3 4]
[5 6]
>>> for col in np.moveaxis(array, [0, 1], [1, 0]):
...     print(col)
...
[1 3 5]
[2 4 6]
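To tie this back to the question's two loop variants, a small sketch: plain iteration (like array[i]) yields 1-d rows, while array[i:i+1] keeps each row 2-d with shape (1, n). For 2-d arrays, the np.moveaxis call above is the same as the transpose .T.

```python
import numpy as np

array = np.array([[1, 2], [3, 4], [5, 6]])

# Iterating a 2-d array yields its rows as 1-d arrays, like array[i]
rows = [row for row in array]
# Slicing keeps the row axis, giving (1, 2) arrays instead
rows_2d = [array[i:i + 1] for i in range(len(array))]
# For a 2-d array, swapping axes 0 and 1 with np.moveaxis is just the transpose
cols = [col for col in array.T]

print(rows[0].shape, rows_2d[0].shape)                               # (2,) (1, 2)
print(np.array_equal(np.moveaxis(array, [0, 1], [1, 0]), array.T))  # True
```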
Let's say I have a 3x3 matrix. The 1D indices of this matrix are:
0 1 2
3 4 5
6 7 8
Is there a function that receives a slice and returns the 1D indices, whatever the dimension? Something like:
m = np.ones((3, 3))
id1 = some_function(m, (1, :)) # [3, 4, 5]
id2 = some_function(m, (:, 1)) # [1, 4, 7]
# Use the indices together
m[id1 + id2] = whatever
m[~(id1 + id2)] = whatever_else
I don't want to code it because I'm sure it exists somewhere in numpy! For those who wonder why I want that, it's because I want to merge several slices together, use not (~) on the indices, etc.
ravel_multi_index returns the 1d equivalent of an n-d indexing tuple:
In [208]: np.ravel_multi_index(([1],[0,1,2]),(3,3))
Out[208]: array([3, 4, 5], dtype=int32)
In [209]: np.ravel_multi_index(([0,1,2],[1]),(3,3))
Out[209]: array([1, 4, 7], dtype=int32)
For more complex indexing we may need to use ix_ to get index broadcasting right:
In [214]: np.ravel_multi_index((np.ix_([0,1,2],[1,2])),(3,3))
Out[214]:
array([[1, 2],
[4, 5],
[7, 8]], dtype=int32)
Now we just need to turn [1,:] into that tuple. Something among numpy's index tricks (np.r_, np.ix_ and friends) should do that.
In [222]: np.ravel_multi_index((np.ix_(np.r_[0:3],[1,2])),(3,3))
Out[222]:
array([[1, 2],
[4, 5],
[7, 8]], dtype=int32)
In [223]: np.ravel_multi_index((np.ix_([1],np.r_[0:3])),(3,3))
Out[223]: array([[3, 4, 5]], dtype=int32)
In a more general case we'd want to use m.shape instead of (3,3).
~ works on boolean masks, not indices. So to 'delete' the [1] element from an array, we can do:
In [225]: mask = np.ones((3,),bool)
In [226]: mask[1] = False # index to delete
In [227]: np.arange(3)[mask]
Out[227]: array([0, 2])
This is essentially what np.delete does.
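As an alternative to building the index tuple by hand, one sketch is to index an array of flat positions with the same key. flat_indices below is a hypothetical helper written for this example, not part of numpy; it accepts any basic-indexing key, such as those built with np.s_, and its output can be merged into a boolean mask so that ~ behaves the way the question wants:

```python
import numpy as np

def flat_indices(a, key):
    """Hypothetical helper (not a numpy function): flat C-order indices of a[key]."""
    # Index an array holding each element's flat position, then flatten the result
    return np.arange(a.size).reshape(a.shape)[key].ravel()

m = np.ones((3, 3))
id1 = flat_indices(m, np.s_[1, :])   # [3 4 5]
id2 = flat_indices(m, np.s_[:, 1])   # [1 4 7]

# Merge the selections in a boolean mask, so ~ means "not selected"
mask = np.zeros(m.size, dtype=bool)
mask[np.union1d(id1, id2)] = True
mask = mask.reshape(m.shape)

m[mask] = 5
m[~mask] = 0
print(m)
```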
I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5,4,0],
[1,3,9]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcast against i.
I think of
np.array([[0],
[1]])
as shorthand for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
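A short sketch making j explicit with the question's a and b: the (2, 1) row index broadcasts to the full j shown above, and because the (j, i) pair only describes positions, it reorders b exactly like a, which is what the question's edit asks for.

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])

i = np.argsort(a, axis=1)             # source column of each output element
j = np.arange(a.shape[0])[:, None]    # (2, 1) source row, broadcasts against i (2, 3)

print(np.broadcast_to(j, i.shape))    # the full j array: [[0 0 0] [1 1 1]]
print(a[j, i])                        # [[1 2 3] [2 8 9]]
print(b[j, i])                        # [[5 4 0] [1 3 9]], b reordered like a
```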
def foo(a):
    i = np.argsort(a, axis=1)
    return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1, 2, 3],
[2, 8, 9]])
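The same call also covers the question's second requirement (reordering b by a's sort order) and extends to more dimensions without building any row index. A sketch, assuming NumPy 1.17 or later for np.random.default_rng (take_along_axis itself needs 1.15, as noted above); the 3-d array c is made-up test data:

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])

order = a.argsort(axis=-1)
b_sorted = np.take_along_axis(b, order, axis=-1)   # b reordered by a's sort order
print(b_sorted)

# The same call works unchanged with more leading axes (c is random test data)
c = np.random.default_rng(0).random((2, 3, 4))
c_sorted = np.take_along_axis(c, c.argsort(axis=-1), axis=-1)
print(np.array_equal(c_sorted, np.sort(c, axis=-1)))   # True
```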
I found the answer here, with someone having the same problem. The key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might perform better, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) are basically the linear indices used to index the flattened b to get the desired sorted output for b. The same linear indices can also be used on a.ravel() to get the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
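As a sanity check on those linear indices, a sketch showing that np.ravel_multi_index, given the broadcast (row, column) pair used in the other answers, produces the same flat indices:

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])
M, N = a.shape

order = a.argsort(1)
linear = order + np.arange(M)[:, None] * N   # row r starts at flat offset r * N

# ravel_multi_index builds the same flat indices from (row, column) pairs;
# the (M, 1) row index broadcasts against the (M, N) column index
same = np.ravel_multi_index((np.arange(M)[:, None], order), a.shape)

print(linear)                # [[1 2 0] [5 3 4]]
print(b.ravel()[linear])     # [[5 4 0] [1 3 9]]
```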
Runtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop
If I have
x = np.arange(1, 10).reshape((3,3))
# array([[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]])
and
ind = np.array([[1,1], [1,2]])
# array([[1, 1],
# [1, 2]])
How do I use each row (axis 0) of ind to extract a cell of x? I hope to end up with the array [5, 6]. np.take(x, ind, axis=0) does not seem to work.
You could use "advanced integer indexing" by indexing x with two integer arrays, the first array for indexing the row, the second array for indexing the column:
In [58]: x[ind[:,0], ind[:,1]]
Out[58]: array([5, 6])
x[tuple(ind.T.tolist())]
works, too, and can also be used for multidimensional NumPy arrays.
Why?
NumPy arrays are indexed by tuples. Usually, these tuples are created implicitly by Python:
Note
In Python, x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN]; the latter is just syntactic sugar for the former.
Note that this syntactic sugar isn't NumPy-specific. You could use it on dictionaries when the key is a tuple:
In [1]: d = { 'I like the number': 1, ('pi', "isn't"): 2}
In [2]: d[('pi', "isn't")]
Out[2]: 2
In [3]: d['pi', "isn't"]
Out[3]: 2
Actually, it's not even related to indexing:
In [5]: 1, 2, 3
Out[5]: (1, 2, 3)
Thus, for your NumPy array, x = np.arange(1,10).reshape((3,3))
In [11]: x[1,2]
Out[11]: 6
because
In [12]: x[(1,2)]
Out[12]: 6
So, in unutbu's answer, actually a tuple containing the columns of ind is passed:
In [21]: x[(ind[:,0], ind[:,1])]
Out[21]: array([5, 6])
with x[ind[:,0], ind[:,1]] just being an equivalent (and recommended) short hand notation for the same.
Here's what that tuple looks like:
In [22]: (ind[:,0], ind[:,1])
Out[22]: (array([1, 1]), array([1, 2]))
We can construct the same tuple differently from ind: tolist() returns a NumPy array's rows as a list of lists. Transposing switches rows and columns, so we can get a list of columns by first transposing and calling tolist on the result:
In [23]: ind.T.tolist()
Out[23]: [[1, 1], [1, 2]]
Because ind is symmetric in your example, it is its own transpose. Thus, for illustration, let's use
In [24]: ind_2 = np.array([[1,1], [1,2], [0, 0]])
# array([[1, 1],
# [1, 2],
# [0, 0]])
In [25]: ind_2.T.tolist()
Out[25]: [[1, 1, 0], [1, 2, 0]]
This can easily be converted to the tuples we want:
In [27]: tuple(ind_2.T.tolist())
Out[27]: ([1, 1, 0], [1, 2, 0])
In [28]: tuple(ind.T.tolist())
Out[28]: ([1, 1], [1, 2])
Thus,
In [29]: x[tuple(ind.T.tolist())]
Out[29]: array([5, 6])
This is equivalent to unutbu's answer for x.ndim == 2 and ind_2.shape[1] == 2, but it also works more generally whenever x.ndim == ind_2.shape[1], in case you have to work with multi-dimensional NumPy arrays.
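Older NumPy versions also let you drop the tuple(...) and index directly with the list, interpreting a list of sequences as a tuple of per-axis indices:
In [43]: x[ind_2.T.tolist()]
Out[43]: array([5, 6, 1])
That behavior was deprecated in NumPy 1.15, and since NumPy 1.23 a list is treated as a single array index along the first axis, so keep the tuple(...).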
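To illustrate the more general case with a hypothetical 3-d array (x3 and ind3 are made up for this sketch): each row of the index array is one coordinate, and tuple(ind3.T) turns its columns into one index array per axis.

```python
import numpy as np

x3 = np.arange(24).reshape(2, 3, 4)
# Hypothetical coordinates: each row of ind3 is one (i, j, k) position in x3
ind3 = np.array([[0, 1, 2], [1, 2, 3]])

vals = x3[tuple(ind3.T)]      # tuple of 3 index arrays, one per axis
print(vals)                   # [ 6 23]

# Cross-check against the flat indices of the same coordinates
flat = np.ravel_multi_index(tuple(ind3.T), x3.shape)
print(np.array_equal(vals, x3.ravel()[flat]))   # True
```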