Simple extending csr matrix by adding a column - python

I have this code
import numpy as np
from scipy.sparse import csr_matrix
q = csr_matrix([[1.], [0.]])
ones = np.ones((2, 1))
How can I append the ones column to matrix q so that the result has shape (2, 2)?
(matrix q is sparse and I don't want to change type from csr)

The code for sparse.hstack is
return bmat([blocks], format=format, dtype=dtype)
In bmat, blocks is then a 1xN array. If all the blocks are csc, it does a fast version of the stack:
A = _compressed_sparse_stack(blocks[0,:], 1)
Conversely, sparse.vstack with csr matrices does
A = _compressed_sparse_stack(blocks[:,0], 0)
In effect, given how data is stored in a csr matrix, it is relatively easy to add rows (or columns for csc) (I can elaborate if that needs explanation).
Otherwise bmat does:
# convert everything to COO format
# calculate total nnz
data = np.empty(nnz, dtype=dtype)
for B in blocks:
    data[nnz:nnz + B.nnz] = B.data
return coo_matrix((data, (row, col)), shape=shape).asformat(format)
In other words, it gets the data, row, col values for each block, concatenates them, makes a new coo matrix, and finally converts it to the desired format.
sparse readily converts between formats. Even the display of a matrix can involve a conversion - to coo for the (i,j) d format, to csr for dense/array. sparse.nonzero converts to coo. Most math converts to csr. A csr is transposed by converting it to a csc (without change of attribute arrays). Much of the conversion is done in compiled code so you don't see delays.
Adding columns directly to csr format is a lot of work. All 3 attribute arrays have to be modified row by row. Again I could go into detail if needed.
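For the question as asked, a minimal sketch using sparse.hstack could look like the following. Wrapping the dense ones column in csr_matrix is my own choice for clarity, and format="csr" simply requests a csr result regardless of which internal route hstack takes.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

q = csr_matrix([[1.], [0.]])
ones = np.ones((2, 1))

# Stack the two blocks side by side and ask for the result in csr format.
result = hstack([q, csr_matrix(ones)], format="csr")

print(result.shape)
print(result.toarray())
```

The original q is left untouched; hstack builds a new matrix.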

Related

Adding dimensions to scipy sparse

I wish to add a dimension to a sparse matrix. In numpy it's simply a matter of doing [:,None]. I tried reshape and resize without any success.
Here's some dummy data:
from scipy.sparse import csr_matrix
data = [1,2,3,4,5,6]
col = [0,0,0,1,1,1]
row = [0,1,2,0,1,2]
a = csr_matrix((data, (row, col)))
a.reshape((3,2,1))
The last line gives the error: ValueError: matrix shape must be two-dimensional. Doing resize instead gives the error ValueError: shape must be a 2-tuple of positive integers.
In my particular case I also need to reshape it to (3,1,2). Any thoughts?
scipy.sparse can only handle 2d arrays. You might want to look into pydata/sparse which looks to handle n-dimensional sparse data while following the array interface. At the moment, it has fewer types of arrays and will have some performance issues, but is being actively developed.
from scipy.sparse import csr_matrix
data = [1,2,3,4,5,6]
col = [0,0,0,1,1,1]
row = [0,1,2,0,1,2]
a = csr_matrix((data, (row, col)))
a.reshape((3,2))
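As a rough illustration of the 2-D limit: reshape on a csr_matrix works as long as the target is two-dimensional. If the matrix is small enough to densify (an assumption on my part, not something the answer suggests), the extra axis can be added on the dense ndarray instead.

```python
import numpy as np
from scipy.sparse import csr_matrix

data = [1, 2, 3, 4, 5, 6]
col = [0, 0, 0, 1, 1, 1]
row = [0, 1, 2, 0, 1, 2]
a = csr_matrix((data, (row, col)))  # shape (3, 2)

# Any 2-D target shape with the same number of elements is accepted.
flat = a.reshape((6, 1))

# Fallback that gives up sparsity: add the new axis on a dense copy.
dense = a.toarray()
with_trailing_axis = dense[:, :, None]  # shape (3, 2, 1)
with_middle_axis = dense[:, None, :]    # shape (3, 1, 2)
```

This only makes sense when the dense array fits in memory; for large data an n-dimensional sparse library is the better fit.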

Transpose a 1-dimensional array in Numpy without casting to matrix

My goal is to turn a row vector into a column vector and vice versa. The documentation for numpy.ndarray.transpose says:
For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.)
However, when I try this:
my_array = np.array([1,2,3])
my_array_T = np.transpose(np.matrix(my_array))
I do get the wanted result, albeit in matrix form (matrix([[66],[640],[44]])), but I also get this warning:
PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
my_array_T = np.transpose(np.matrix(my_array))
How can I properly transpose an ndarray then?
Transposing a 1D array returns the array unchanged, unlike in Matlab, where 1D arrays don't exist and every array is at least 2D.
What you want is to reshape it:
my_array.reshape(-1, 1)
Or:
my_array.reshape(1, -1)
Depending on what kind of vector you want (column or row vector).
The -1 tells reshape to infer that dimension from the number of elements, and the 1 creates the second required dimension.
If your array is my_array and you want to convert it to a column vector you can do:
my_array.reshape(-1, 1)
For a row vector you can use
my_array.reshape(1, -1)
Both of these can also be transposed and that would work as expected.
IIUC, use reshape
my_array.reshape(my_array.size, -1)
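A short sketch of the reshape approach from the answers above, with the resulting shapes printed so the difference from the 1D case is visible:

```python
import numpy as np

my_array = np.array([1, 2, 3])

column = my_array.reshape(-1, 1)  # 2-D column vector
row = my_array.reshape(1, -1)     # 2-D row vector

print(my_array.T.shape)  # transposing the 1-D array changes nothing
print(column.shape)
print(row.shape)
print(column.T.shape)    # once the array is 2-D, .T swaps the axes
```

Both results are plain ndarrays, so no np.matrix is involved and no deprecation warning is raised.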

init a COO matrix from its attributes (data,row,col)

I have a script that computes the three attributes of the COO format:
data COO format data array of the matrix
row COO format row index array of the matrix
col COO format column index array of the matrix
I want to use these three arrays to initialise a coo_matrix() in order to use the methods available to the coo_matrix class. What would be the fastest way to do that without changing the main script?
See the scipy.sparse documentation:
coo_matrix((data, (row, col)), [shape=(M, N)])
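A minimal sketch of that constructor in use. The three arrays below are made-up sample values standing in for the output of the main script.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical stand-ins for the arrays computed by the main script.
data = np.array([4.0, 5.0, 7.0])
row = np.array([0, 1, 2])
col = np.array([0, 2, 1])

# shape is optional; without it the shape is inferred from the largest indices.
m = coo_matrix((data, (row, col)), shape=(3, 3))

# coo_matrix methods are now available, for example format conversion.
m_csr = m.tocsr()
print(m.toarray())
```

Passing shape explicitly is the safer choice when the last rows or columns may be entirely zero.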

easy sampling of vectors from a sparse matrix, and creating a new matrix from the sample (python)

This question has two parts (maybe one solution?):
Sample vectors from a sparse matrix: Is there an easy way to sample vectors from a sparse matrix?
When I try to sample rows using random.sample, I get a TypeError: sparse matrix length is ambiguous.
from random import sample
import numpy as np
from scipy.sparse import lil_matrix
K = 2
m = [[1,2],[0,4],[5,0],[0,8]]
sample(m,K) #works OK
mm = np.array(m)
sample(list(mm),K) #works OK (random.sample needs a sequence, so the array is wrapped in a list)
sm = lil_matrix(m)
sample(sm,K) #throws exception TypeError: sparse matrix length is ambiguous.
My current solution is to sample from the number of rows in the matrix, then use getrow(), something like:
indxSampls = sample(range(sm.shape[0]), K)
sampledRows = []
for i in indxSampls:
    sampledRows.append(sm.getrow(i))
Any other efficient/elegant ideas? The dense matrix size is 1000x30000 and could be larger.
Constructing a sparse matrix from a list of sparse vectors: Now imagine I have the list of sampled vectors sampledRows. How can I convert it to a sparse matrix without densifying it, converting it to a list of lists and then converting that to a lil_matrix?
Try
sm[np.random.sample(sm.shape[0], K, replace=False), :]
This gets you out an LIL-format matrix with just K of the rows (in the order determined by the random.sample). I'm not sure it's super-fast, but it can't really be worse than manually accessing row by row like you're currently doing, and probably preallocates the results.
The accepted answer to this question is outdated and no longer works. With newer versions of numpy, you should use np.random.choice in place of np.random.sample, e.g.:
sm[np.random.choice(sm.shape[0], K, replace=False), :]
as opposed to:
sm[np.random.sample(sm.shape[0], K, replace=False), :]
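Both parts of the question can be sketched together as below. The use of np.random.default_rng with a fixed seed and the conversion to csr before indexing are my own choices, not part of the answers; vstack is one way to rebuild a sparse matrix from a list of sparse rows without densifying.

```python
import numpy as np
from scipy.sparse import lil_matrix, vstack

K = 2
sm = lil_matrix([[1, 2], [0, 4], [5, 0], [0, 8]])

# Part 1: draw K distinct row indices and slice them out in one step.
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
idx = rng.choice(sm.shape[0], size=K, replace=False)
sm_csr = sm.tocsr()             # csr supports fast row indexing
sampled = sm_csr[idx, :]

# Part 2: rebuild a sparse matrix from a list of 1 x N sparse rows.
row_list = [sm_csr[int(i), :] for i in idx]
stacked = vstack(row_list, format="csr")
```

Both sampled and stacked hold the same K rows in the order given by idx.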

Is there an efficient way of concatenating scipy.sparse matrices?

I'm working with some rather large sparse matrices (from 5000x5000 to 20000x20000) and need to find an efficient way to concatenate matrices in a flexible way in order to construct a stochastic matrix from separate parts.
Right now I'm using the following way to concatenate four matrices, but it's horribly inefficient. Is there any better way to do this that doesn't involve converting to a dense matrix?
rmat[0:m1.shape[0],0:m1.shape[1]] = m1
rmat[m1.shape[0]:rmat.shape[0],m1.shape[1]:rmat.shape[1]] = m2
rmat[0:m1.shape[0],m1.shape[1]:rmat.shape[1]] = bridge
rmat[m1.shape[0]:rmat.shape[0],0:m1.shape[1]] = bridge.transpose()
The sparse library now has hstack and vstack for respectively concatenating matrices horizontally and vertically.
Amos's answer is no longer necessary. Scipy now does something similar to this internally if the input matrices are in csr or csc format and the desired output format is set to None or the same format as the input matrices. It's efficient to vertically stack matrices in csr format, or to horizontally stack matrices in csc format, using scipy.sparse.vstack or scipy.sparse.hstack, respectively.
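As a small sketch of the block layout from the question, here are two equivalent ways to assemble it, one with bmat and one with nested hstack/vstack. The tiny matrices are made-up stand-ins for m1, m2 and bridge.

```python
import numpy as np
from scipy.sparse import csr_matrix, bmat, hstack, vstack

# Hypothetical small blocks standing in for the large matrices.
m1 = csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
m2 = csr_matrix(np.array([[3.0]]))
bridge = csr_matrix(np.array([[0.0], [4.0]]))

# Layout: [[m1, bridge], [bridge.T, m2]]
rmat = bmat([[m1, bridge], [bridge.T, m2]], format="csr")

# The same result from nested horizontal and vertical stacking.
top = hstack([m1, bridge])
bottom = hstack([bridge.T, m2])
rmat2 = vstack([top, bottom], format="csr")

print(rmat.toarray())
```

Neither variant creates a dense intermediate, which is the main concern at sizes like 20000x20000.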
Using hstack, vstack, or concatenate is dramatically slower than concatenating the inner data objects themselves. The reason is that hstack/vstack converts the sparse matrix to coo format, which can be very slow when the matrix is very large and not already in coo format. Here is the code for concatenating csc matrices; a similar method can be used for csr matrices:
import numpy as np
from scipy.sparse import csc_matrix

def concatenate_csc_matrices_by_columns(matrix1, matrix2):
    # Both inputs must be csc matrices with the same number of rows.
    new_data = np.concatenate((matrix1.data, matrix2.data))
    new_indices = np.concatenate((matrix1.indices, matrix2.indices))
    # Shift matrix2's column pointers past matrix1's entries; drop its leading 0.
    new_ind_ptr = matrix2.indptr[1:] + len(matrix1.data)
    new_ind_ptr = np.concatenate((matrix1.indptr, new_ind_ptr))
    # Pass the shape explicitly so trailing all-zero rows are not lost.
    shape = (matrix1.shape[0], matrix1.shape[1] + matrix2.shape[1])
    return csc_matrix((new_data, new_indices, new_ind_ptr), shape=shape)
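The csr counterpart mentioned above might look like the sketch below. The function name is hypothetical, and it assumes both inputs are csr matrices with the same number of columns; the result is compared against scipy.sparse.vstack as a sanity check.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

def concatenate_csr_matrices_by_rows(matrix1, matrix2):
    # Hypothetical helper: stacks matrix2 below matrix1 (same column count).
    new_data = np.concatenate((matrix1.data, matrix2.data))
    new_indices = np.concatenate((matrix1.indices, matrix2.indices))
    # Row pointers of matrix2 are shifted by the number of stored entries
    # in matrix1; the leading 0 is dropped to avoid a duplicate boundary.
    new_indptr = np.concatenate(
        (matrix1.indptr, matrix2.indptr[1:] + matrix1.indptr[-1])
    )
    shape = (matrix1.shape[0] + matrix2.shape[0], matrix1.shape[1])
    return csr_matrix((new_data, new_indices, new_indptr), shape=shape)

a = csr_matrix(np.array([[1, 0, 2], [0, 0, 3]]))
b = csr_matrix(np.array([[0, 4, 0]]))
stacked = concatenate_csr_matrices_by_rows(a, b)
reference = vstack([a, b], format="csr")
```

Only the pointer array needs arithmetic here; data and indices are concatenated as they are, which is why appending rows to csr is cheap.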
Okay, I found the answer. Using scipy.sparse.coo_matrix is much much faster than using lil_matrix. I converted the matrices to coo (painless and fast) and then just concatenated the data, rows and columns after adding the right padding.
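import numpy as np
import scipy.sparse
# m1S, bridgeS, bridgeTS and m2S are the four blocks, already converted to coo.
# np.concatenate replaces the deprecated scipy.concatenate alias.
data = np.concatenate((m1S.data, bridgeS.data, bridgeTS.data, m2S.data))
rows = np.concatenate((m1S.row, bridgeS.row, bridgeTS.row + m1S.shape[0], m2S.row + m1S.shape[0]))
cols = np.concatenate((m1S.col, bridgeS.col + m1S.shape[1], bridgeTS.col, m2S.col + m1S.shape[1]))
rmat = scipy.sparse.coo_matrix((data, (rows, cols)), shape=(m1S.shape[0] + m2S.shape[0], m1S.shape[1] + m2S.shape[1]))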
