List as index in matlab - python

There is this method written in Matlab that I want to translate into Python. However, I don't understand how to interpret the notation of indexing the sparse matrix M with a row of the matrix faces. What would be the equivalent in Python?
M = spalloc(size(template,1), size(template,1), 10*size(template,1));
for i = 1:size(faces,1)
v = faces(i,:); % faces is a Nx3 matrix
...
M(v, v) = M(v, v) + WIJ; % WIJ is some 3x3 matrix

@Eric Yu uses a dense numpy array:
In [239]: A=np.array([[1,2,3],[3,4,5],[5,6,7]])
In [240]: A
Out[240]:
array([[1, 2, 3],
[3, 4, 5],
[5, 6, 7]])
In [241]: v=[0,1]
this indexing selects rows:
In [242]: A[v]
Out[242]:
array([[1, 2, 3],
[3, 4, 5]])
and from that select columns:
In [243]: A[v][:,v]
Out[243]:
array([[1, 2],
[3, 4]])
But A[v] is a copy, not a view, so the assignment modifies only the temporary copy and leaves A unchanged:
In [244]: A[v][:,v] = 0
In [245]: A
Out[245]:
array([[1, 2, 3],
[3, 4, 5],
[5, 6, 7]])
===
To properly index a block of a numpy array, use ix_ (or equivalent) to create indexing arrays that broadcast against each other to define the block:
In [247]: np.ix_(v,v)
Out[247]:
(array([[0],
[1]]), array([[0, 1]]))
In [248]: A[np.ix_(v,v)]
Out[248]:
array([[1, 2],
[3, 4]])
In [249]: A[np.ix_(v,v)]=0
In [250]: A
Out[250]:
array([[0, 0, 3],
[0, 0, 5],
[5, 6, 7]])
Without the ix_ transform, indexing with [v,v] selects a diagonal:
In [251]: A[v,v]
Out[251]: array([0, 0])
MATLAB M(v,v) indexes the block. Indexing the diagonal, on the other hand, requires converting the subscripts to linear indices with sub2ind. This is a case where MATLAB's indexing notation makes one task easy and the other more complex. numpy does the reverse.
===
What I wrote is applicable to sparse matrices as well
In [253]: M=sparse.lil_matrix(np.array([[1,2,3],[3,4,5],[5,6,7]]))
In [254]: M
Out[254]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in LInked List format>
The diagonal selection:
In [255]: M[v,v]
Out[255]:
<1x2 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in LInked List format>
In [256]: _.A
Out[256]: array([[1, 4]], dtype=int64)
Note that this matrix is (1,2), still 2d, in the style of MATLAB matrices.
block selection:
In [258]: M[np.ix_(v,v)]
Out[258]:
<2x2 sparse matrix of type '<class 'numpy.int64'>'
with 4 stored elements in LInked List format>
In [259]: _.A
Out[259]:
array([[1, 2],
[3, 4]], dtype=int64)
In [260]: M[np.ix_(v,v)]=0
In [261]: M.A
Out[261]:
array([[0, 0, 3],
[0, 0, 5],
[5, 6, 7]], dtype=int64)
sparse.csr_matrix will index in the same way (with some differences in the assignment step).
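Putting the pieces together, here is a rough sketch of how the MATLAB loop from the question might translate. The names n_vertices, faces and the all-ones WIJ are placeholders I made up for illustration, and I'm assuming faces already holds 0-based vertex indices (MATLAB's are 1-based, so subtract 1 when porting real data).

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the question's inputs.
n_vertices = 4                            # plays the role of size(template, 1)
faces = np.array([[0, 1, 2], [1, 2, 3]])  # vertex indices, assumed 0-based

# lil_matrix is a sparse format that supports cheap incremental assignment.
M = sparse.lil_matrix((n_vertices, n_vertices))

for v in faces:
    WIJ = np.ones((3, 3))                 # placeholder for the real 3x3 weights
    block = np.ix_(v, v)                  # (3,1) and (1,3) index arrays
    # Equivalent of MATLAB's  M(v, v) = M(v, v) + WIJ
    M[block] = M[block].toarray() + WIJ

dense = M.toarray()                       # dense copy, only for inspection
```

Vertices 1 and 2 appear in both faces, so their entries receive two contributions. Once the loop is done, M.tocsr() gives a format better suited to arithmetic.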

import numpy as np
A=[[1,2,3],[3,4,5],[5,6,7]]
M=np.array(A)
v=[0,1]
M[v][:,v]
the result is:
array([[1, 2],
[3, 4]])

Related

Scipy create sparse row matrix from a list of indices and a list of list data

suppose I have a list of list data, and a list containing the row number of each data, how to convert to a sparse matrix?
Example:
import numpy as np
data = np.array([[1,2,3],[4,5,6],[7,8,9]])
indices = np.array([0,0,4]) # row number, sum when duplicated
expected output is:
[[5, 7, 9], # row 0: [5,7,9]=[1,2,3]+[4,5,6]
[0, 0, 0],
[0, 0, 0],
[7, 8, 9]] # row 4
I understand that I can construct it using scipy.sparse.csr_matrix with data, and row, col, or indptr, but I now have already calculated data and indices, is there a way to simply construct a sparse matrix using these two? Thanks!
According to the documentation, there is a constructor that utilizes the CSR information directly:
csr_matrix((data, indices, indptr), [shape=(M, N)])
So in your specific case, you could write it like:
data = np.array([1,2,3,4,5,6,7,8,9])
indices = np.array([0,1,2,0,1,2,0,1,2]) # col numbers
indptr = np.array([0,6,6,6,9]) # row pointers
mat = csr_matrix((data, indices, indptr), shape=(4, 3))
To get an example on how the CSR format works, you can take a look into sparse matrices. I will explain the code nonetheless:
First, the data needs to be flattened to a single list. The indices of the CSR format relate to the column-indices, while the indptr is used to point to the rows.
So having an indptr value of 0 at position 0 in the list tells us that the 1st row (position + 1) of the matrix starts after 0 data entries. Similarly, a value of 6 at position 1 in the list tells us that the 2nd row (position + 1) of the matrix starts after 6 data entries.
The column-indices list is as you would expect it to behave: data[i] is positioned in column indices[i].
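To make the role of indptr concrete, here is a small sketch that reads one row back out of the three CSR arrays by hand. The helper name csr_row_entries is my own, not part of scipy.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
indices = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])  # column numbers
indptr = np.array([0, 6, 6, 6, 9])               # row pointers


def csr_row_entries(data, indices, indptr, i):
    """Return (columns, values) stored for row i of a CSR matrix."""
    start, stop = indptr[i], indptr[i + 1]
    return indices[start:stop], data[start:stop]


cols0, vals0 = csr_row_entries(data, indices, indptr, 0)  # six entries
cols1, vals1 = csr_row_entries(data, indices, indptr, 1)  # empty row
cols3, vals3 = csr_row_entries(data, indices, indptr, 3)  # last three entries
```

Row 0 owns the first six entries (two values per column, not yet summed), rows 1 and 2 are empty because their start and stop pointers coincide, and row 3 owns the last three.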
In [131]: data = np.array([[1,2,3],[4,5,6],[7,8,9]])
...: indices = np.array([0,0,3]) # row number, sum when duplicated
I corrected the indices for 0 based indexing.
We don't need sparse to sum the duplicates. There's a np.add.at that does this nicely:
In [135]: res = np.zeros((4,3),int)
In [136]: np.add.at(res, indices, data)
In [137]: res
Out[137]:
array([[5, 7, 9],
[0, 0, 0],
[0, 0, 0],
[7, 8, 9]])
If we make a csr from that:
In [141]: M = sparse.csr_matrix(res)
In [142]: M
Out[142]:
<4x3 sparse matrix of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
In [143]: M.data
Out[143]: array([5, 7, 9, 7, 8, 9])
In [144]: M.indices
Out[144]: array([0, 1, 2, 0, 1, 2], dtype=int32)
In [145]: M.indptr
Out[145]: array([0, 3, 3, 3, 6], dtype=int32)
To make a csr directly, it's often easier to use the coo style of inputs. They are easier to understand.
Those inputs are 3 1d arrays of the same size:
In [160]: data.ravel()
Out[160]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [161]: row = np.repeat(indices,3)
In [162]: row
Out[162]: array([0, 0, 0, 0, 0, 0, 3, 3, 3])
In [163]: col = np.tile(np.arange(3),3)
In [164]: col
Out[164]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])
In [165]: M1 = sparse.coo_matrix((data.ravel(),(row, col)))
In [166]: M1.data
Out[166]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
The coo format leaves the inputs as given; but on conversion to csr duplicates are summed.
In [168]: M2 = M1.tocsr()
In [169]: M2
Out[169]:
<4x3 sparse matrix of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
In [170]: M2.data
Out[170]: array([5, 7, 9, 7, 8, 9])
In [171]: M2.indices
Out[171]: array([0, 1, 2, 0, 1, 2], dtype=int32)
In [172]: M2.indptr
Out[172]: array([0, 3, 3, 3, 6], dtype=int32)
In [173]: M2.A
Out[173]:
array([[5, 7, 9],
[0, 0, 0],
[0, 0, 0],
[7, 8, 9]])
@Erik shows how to use the csr format directly:
In [174]: M3 =sparse.csr_matrix((data.ravel(), col, [0,6,6,6,9]))
In [175]: M3
Out[175]:
<4x3 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in Compressed Sparse Row format>
In [176]: M3.A
Out[176]:
array([[5, 7, 9],
[0, 0, 0],
[0, 0, 0],
[7, 8, 9]])
In [177]: M3.indices
Out[177]: array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=int32)
Note this has 9 nonzero elements; it hasn't summed the duplicates for storage (though the .A display shows them summed). To sum, we need an extra step:
In [179]: M3.sum_duplicates()
In [180]: M3.data
Out[180]: array([5, 7, 9, 7, 8, 9])
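If this comes up often, the coo route can be wrapped in a small function. This is only a sketch: rows_to_csr and its n_rows argument are names I chose, and it assumes data is 2d with one row id per row of data.

```python
import numpy as np
from scipy import sparse


def rows_to_csr(data, row_ids, n_rows):
    """Scatter the rows of `data` into a sparse matrix, summing duplicates."""
    data = np.asarray(data)
    row_ids = np.asarray(row_ids)
    n_items, n_cols = data.shape
    row = np.repeat(row_ids, n_cols)            # row id for every element
    col = np.tile(np.arange(n_cols), n_items)   # column id for every element
    coo = sparse.coo_matrix((data.ravel(), (row, col)),
                            shape=(n_rows, n_cols))
    return coo.tocsr()                          # conversion sums duplicates


data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
indices = np.array([0, 0, 3])                   # 0-based row numbers
M = rows_to_csr(data, indices, n_rows=4)
```

Passing the shape explicitly avoids relying on the largest row id to infer the number of rows.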

2D slicing confusion

Suppose I have a two dimensional numpy array. For example:
dog = np.random.rand(3, 3)
Now, I can extract the intersection of the first and second row and the second and third column of dog thus:
dog[:2, 1:]
I can also do
dog[[0, 1], 1:]
or
dog[:2, [1, 2]]
But I CAN NOT do
dog[[0, 1], [1, 2]]
That returns a one dimensional array of the [0, 1] and [1, 2] elements of dog.
And this seems to mean that to extract the principal submatrix which is the intersection of the first and last rows of dog and the first and last columns, I have to do something gross like:
tmp = dog[[0, 2], :]
ans = tmp[:, [0, 2]]
Is there some more civilized way of extracting submatrices? The obvious solution dog[[0, 2], [0, 2]] does work in Julia.
In [94]: dog = np.arange(9).reshape(3,3)
In [95]: dog
Out[95]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
the slice block:
In [96]: dog[:2,1:]
Out[96]:
array([[1, 2],
[4, 5]])
With 2 lists (1d arrays), we select the diagonal from that block:
In [97]: dog[[0,1],[1,2]]
Out[97]: array([1, 5])
But if we change the first to a (2,1) array, it broadcasts with the (2,) to select a (2,2) block:
In [98]: dog[[[0],[1]],[1,2]]
Out[98]:
array([[1, 2],
[4, 5]])
In [99]: dog[np.ix_([0,1],[1,2])]
Out[99]:
array([[1, 2],
[4, 5]])
ix_ turns the 2 lists into (2,1) and (1,2) arrays:
In [100]: np.ix_([0,1],[1,2])
Out[100]:
(array([[0],
[1]]),
array([[1, 2]]))
The diagonal selection in [97] follows the same logic: (2,) and (2,) => (2,)
I don't know about Julia, but MATLAB lets us use [97] like syntax to select the block, but to get the 'diagonal' we have to convert the indices to a flat index, the equivalent of:
In [104]: np.ravel_multi_index(([0,1],[1,2]),(3,3))
Out[104]: array([1, 5])
In [105]: dog.flat[_]
Out[105]: array([1, 5])
So what's easy in numpy is harder in MATLAB, and vice versa. Once you understand broadcasting, the numpy approach is logical and general.
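Applied to the principal submatrix from the question (first and last rows, first and last columns), a minimal sketch looks like this. The variable names keep, sub and same are mine.

```python
import numpy as np

dog = np.arange(9).reshape(3, 3)
keep = [0, 2]                        # first and last row / column

# ix_ builds (2,1) and (1,2) index arrays that broadcast to a (2,2) block.
sub = dog[np.ix_(keep, keep)]

# The same block with the column-vector index written out by hand.
same = dog[np.array(keep)[:, None], keep]
```

Both forms replace the two-step tmp = dog[[0, 2], :] followed by tmp[:, [0, 2]], and unlike the chained version they can also be used on the left-hand side of an assignment.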

Numpy slice to indices

Let's say I have a 3x3 matrix. The 1D indices of this matrix are:
0 1 2
3 4 5
6 7 8
Is there a function that receives a slice and returns the 1D indices, whatever the dimension? Something like:
m = np.ones((3, 3))
id1 = some_function(m, (1, :)) # [3, 4, 5]
id2 = some_function(m, (:, 1)) # [1, 4, 7]
# Use the indices together
m[id1 + id2] = whatever
m[~(id1 + id2)] = whatever else
I don't want to code it because I'm sure it exists somewhere in numpy! For those who wonder why I want that, it's because I want to merge several slices together, use not (~) on the indices, etc.
ravel_multi_index returns the 1d equivalent of n-d indexing tuple:
In [208]: np.ravel_multi_index(([1],[0,1,2]),(3,3))
Out[208]: array([3, 4, 5], dtype=int32)
In [209]: np.ravel_multi_index(([0,1,2],[1]),(3,3))
Out[209]: array([1, 4, 7], dtype=int32)
For more complex indexing we may need to use ix_ to get index broadcasting right:
In [214]: np.ravel_multi_index((np.ix_([0,1,2],[1,2])),(3,3))
Out[214]:
array([[1, 2],
[4, 5],
[7, 8]], dtype=int32)
Now we just need to turn [1,:] into that tuple. Something in numpy's index_tricks module (np.r_, for example) should do that.
In [222]: np.ravel_multi_index((np.ix_(np.r_[0:3],[1,2])),(3,3))
Out[222]:
array([[1, 2],
[4, 5],
[7, 8]], dtype=int32)
In [223]: np.ravel_multi_index((np.ix_([1],np.r_[0:3])),(3,3))
Out[223]: array([[3, 4, 5]], dtype=int32)
In a more general case we'd want to use m.shape instead of (3,3).
~ works on boolean masks, not indices. So to 'delete' the [1] element from an array, we can do:
In [225]: mask = np.ones((3,),bool)
In [226]: mask[1] = False # index to delete
In [227]: np.arange(3)[mask]
Out[227]: array([0, 2])
This is essentially what np.delete does.
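Another way to get flat indices from an ordinary slice is to index an array of flat indices with it. The sketch below is not a built-in numpy function, just one possible arrangement; flat_ids, union and mask are names I picked, and 5.0 and -1.0 stand in for the question's "whatever" values.

```python
import numpy as np

m = np.ones((3, 3))

# Array with the same shape as m whose entries are their own flat index.
flat_ids = np.arange(m.size).reshape(m.shape)

id1 = flat_ids[1, :]               # flat indices of row 1
id2 = flat_ids[:, 1]               # flat indices of column 1
union = np.union1d(id1, id2)       # merge the two selections

# A boolean mask is what ~ can actually negate.
mask = np.zeros(m.size, dtype=bool)
mask[union] = True

m.flat[union] = 5.0                        # selected elements
m.flat[np.flatnonzero(~mask)] = -1.0       # everything else
```

Merging is done with np.union1d rather than +, since adding two index arrays would add the numbers instead of combining the selections.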

argsort for a multidimensional ndarray

I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5,4,0],
[1,3,9]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcast against i.
I think of
np.array([[0],
[1]])
as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
def foo(a):
i = np.argsort(a, axis=1)
return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1, 2, 3],
[2, 8, 9]])
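The same call also covers the second half of the question, reordering another array b by the sort order of a. A short sketch, assuming numpy 1.15 or later; the names order, a_sorted and b_reordered are mine.

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])

# Sort order of each row of a, computed once and reused.
order = np.argsort(a, axis=-1)

a_sorted = np.take_along_axis(a, order, axis=-1)     # rows of a in sorted order
b_reordered = np.take_along_axis(b, order, axis=-1)  # b rearranged the same way
```

Computing order once and reusing it keeps the two arrays aligned, which is the point of using argsort instead of sort.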
I found the answer here, with someone having the same problem. The key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might be better with performance, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
Runtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop

matrix entries also matrices in python

Is there a way to create a matrix whose entries are also matrices in Python? I don't see any way to do so with numpy.
In other words, I want A[i,j] to be a matrix as well.
If a 4d array is ok, then
x = np.zeros((3,4,2,2), dtype=int)
where
x[0,0].shape # (2,2)
If it must be np.matrix type, then it has to be 2d. It can be dtype=object, where each element is in turn a 2d matrix. That construction is a bit more convoluted (a lot more?).
Make an empty array with dtype=object
In [565]: x=np.zeros((2,2),dtype=object)
In [566]: x
Out[566]:
array([[0, 0],
[0, 0]], dtype=object)
Fill each element with a matrix:
In [567]: x[0,0]=np.matrix([[0,1],[2,3]])
In [569]: x[0,1]=np.matrix([[0,1],[2,3]])
In [570]: x[1,0]=np.matrix([[0,1],[2,3]])
In [571]: x[1,1]=np.matrix([[0,1],[2,3]])
In [572]: x
Out[572]:
array([[matrix([[0, 1],
[2, 3]]), matrix([[0, 1],
[2, 3]])],
[matrix([[0, 1],
[2, 3]]), matrix([[0, 1],
[2, 3]])]], dtype=object)
Turn it into a matrix:
In [573]: xm=np.matrix(x)
In [574]: xm
Out[574]:
matrix([[matrix([[0, 1],
[2, 3]]), matrix([[0, 1],
[2, 3]])],
[matrix([[0, 1],
[2, 3]]), matrix([[0, 1],
[2, 3]])]], dtype=object)
I don't know whether xm has any useful computational properties.
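For comparison, the 4d array option mentioned at the start keeps everything numeric, and it can be flattened into one ordinary 2d block matrix when needed. This is a sketch with sizes I picked arbitrarily; n_rows, n_cols, p, q and big are illustrative names.

```python
import numpy as np

n_rows, n_cols, p, q = 3, 4, 2, 2   # 3x4 grid of 2x2 blocks (arbitrary sizes)
x = np.arange(n_rows * n_cols * p * q).reshape(n_rows, n_cols, p, q)

block = x[1, 2]                     # the (2,2) "entry" at grid position (1, 2)

# Interleave the axes so that block (i, j) lands at
# rows i*p:(i+1)*p and columns j*q:(j+1)*q of one big 2d array.
big = x.transpose(0, 2, 1, 3).reshape(n_rows * p, n_cols * q)
```

With this layout, big[2:4, 4:6] holds the same values as x[1, 2], so the array can be viewed either as a grid of small matrices or as a single large one.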
