Adding dimensions to scipy sparse - python

I wish to add a dimension to a sparse matrix. In numpy it's simply a matter of doing [:,None]. I tried reshape and resize without any success.
Here's some dummy data:
from scipy.sparse import csr_matrix
data = [1,2,3,4,5,6]
col = [0,0,0,1,1,1]
row = [0,1,2,0,1,2]
a = csr_matrix((data, (row, col)))
a.reshape((3,2,1))
The last line gives the error: ValueError: matrix shape must be two-dimensional. Doing resize instead gives the error ValueError: shape must be a 2-tuple of positive integers.
In my particular case I also need to reshape it to (3,1,2). Any thoughts?

scipy.sparse can only handle 2D arrays. You might want to look into pydata/sparse, which aims to handle n-dimensional sparse data while following the NumPy array interface. At the moment it has fewer array types and may have some performance issues, but it is being actively developed.

from scipy.sparse import csr_matrix
data = [1,2,3,4,5,6]
col = [0,0,0,1,1,1]
row = [0,1,2,0,1,2]
a = csr_matrix((data, (row, col)))  # shape (3, 2)
a.reshape((3,2))  # works: the target shape is still 2-D
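Since scipy.sparse cannot store a 3-D result, one workaround is to compute where each stored value would land in the new shape and hand those coordinates to an n-dimensional container. The sketch below does only the index arithmetic with NumPy: each stored (row, col) pair is turned into its flat C-order position, then unraveled into the target shape. `nd_coords` is a hypothetical helper, not a scipy function. As far as I know, pydata/sparse's `COO` constructor accepts the result as `sparse.COO(coords, data, shape=new_shape)`, but check its documentation before relying on that.

```python
import numpy as np
from scipy.sparse import csr_matrix

def nd_coords(a, new_shape):
    """Return (coords, data) for sparse matrix `a` reshaped to `new_shape`.

    Hypothetical helper: scipy.sparse cannot store the N-D result, so this
    only computes the coordinates of the stored values in the new shape.
    """
    coo = a.tocoo()
    if int(np.prod(new_shape)) != coo.shape[0] * coo.shape[1]:
        raise ValueError("cannot reshape %s into %s" % (coo.shape, new_shape))
    # Flat (C-order) position of every stored value in the 2-D matrix
    flat = np.ravel_multi_index((coo.row, coo.col), coo.shape)
    # The same flat positions expressed as coordinates in the new shape
    coords = np.vstack(np.unravel_index(flat, new_shape))
    return coords, coo.data

data = [1, 2, 3, 4, 5, 6]
col = [0, 0, 0, 1, 1, 1]
row = [0, 1, 2, 0, 1, 2]
a = csr_matrix((data, (row, col)))
# (3, 1, 2) corresponds to the dense a[:, None, :]
coords, values = nd_coords(a, (3, 1, 2))  # coords has one row per axis
```

Working on coordinates keeps memory proportional to the number of stored values rather than to the full dense size.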

Related

Transpose a 1-dimensional array in Numpy without casting to matrix

My goal is to turn a row vector into a column vector and vice versa. The documentation for numpy.ndarray.transpose says:
For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.)
However, when I try this:
import numpy as np
my_array = np.array([1,2,3])
my_array_T = np.transpose(np.matrix(my_array))
I do get the wanted result, albeit in matrix form (matrix([[1],[2],[3]])), but I also get this warning:
PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
my_array_T = np.transpose(np.matrix(my_array))
How can I properly transpose an ndarray then?
Transposing a 1D array returns it unchanged, unlike Matlab, where 1D arrays don't exist and every array is at least 2D.
What you want is to reshape it:
my_array.reshape(-1, 1)
Or:
my_array.reshape(1, -1)
Depending on what kind of vector you want (column or row vector).
The -1 tells reshape to infer that dimension from the total number of elements, and the 1 creates the second required dimension.
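A short check of the shapes involved, using the `my_array` from the question. It also shows that indexing with `None` (the same object as `np.newaxis`) adds the extra axis exactly like the reshape calls, and that `.T` only has a visible effect once the array is 2-D.

```python
import numpy as np

my_array = np.array([1, 2, 3])
col = my_array.reshape(-1, 1)    # column vector, shape (3, 1)
row = my_array.reshape(1, -1)    # row vector, shape (1, 3)
# Indexing with None (np.newaxis) adds the extra axis the same way
col_idx = my_array[:, None]
row_idx = my_array[None, :]
print(my_array.T.shape, col.shape, row.shape)  # (3,) (3, 1) (1, 3)
```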
If your array is my_array and you want to convert it to a column vector you can do:
my_array.reshape(-1, 1)
For a row vector you can use
my_array.reshape(1, -1)
Both results are 2-D, so transposing them works as expected.
IIUC, use reshape
my_array.reshape(my_array.size, -1)

Transform square matrix into 1D array with NumPy

Let's say I have a square matrix with 20 rows and 20 columns. Using NumPy, what should I do to transform this matrix into a 1D array with a single row and 400 columns (that is, 20 × 20 = 400, all in one row)?
So far, I've tried:
1) array = np.ravel(matrix)
2) array = np.squeeze(np.asarray(matrix))
But when I print array, it's still a square matrix.
Use the reshape method:
array = matrix.reshape((1,400))
This works for both NumPy arrays and matrices.
UPDATE: As sacul noted, matrix.reshape(-1) is more general, since it does not hard-code the dimensions.
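A sketch of the difference between the two reshape calls, using `np.arange` as stand-in data (an assumption; any 20x20 array behaves the same). Note that `np.squeeze` only removes axes of length 1, so it leaves a (20, 20) array unchanged. The `np.matrix` subclass is always 2-D, so flattening it keeps a leading axis of length 1; converting to a plain ndarray first gives a true 1-D result.

```python
import numpy as np

matrix = np.arange(400).reshape(20, 20)   # stand-in 20x20 data
one_row = matrix.reshape((1, 400))        # 2-D: shape (1, 400)
flat = matrix.reshape(-1)                 # 1-D: shape (400,)

# The np.matrix subclass is always 2-D, so reshape(-1) keeps a leading 1
m = np.asmatrix(matrix)
m_flat = m.reshape(-1)                    # matrix of shape (1, 400)
arr_flat = np.asarray(m).reshape(-1)      # ndarray of shape (400,)
```

Creating an `np.matrix` may emit a PendingDeprecationWarning, which is another reason to prefer plain arrays.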

how to convert list of different shape of arrays to numpy array in Python

I have 3D matrices of different shapes, such as:
Matrix shape = [5,10,2048]
Matrix shape = [5,6,2048]
Matrix shape = [5,1,2048]
and so on.
I would like to put them into one big matrix, but I get a shape error (since they have different shapes) when I try to use numpy.asarray(list_of_matrix).
What would be your recommendation to handle such a case?
My implementation was like the following:
matrices = []
matrices.append(mat1)
matrices.append(mat2)
matrices.append(mat3)
result_matrix = numpy.asarray(matrices)
and I get a shape error.
UPDATE
I would like the result matrix to be 4D.
Thank you.
I'm not entirely certain this would work for you, but it looks as though your matrices only differ along axis 1 (the second dimension), so why not concatenate them:
e.g.
>>> import numpy as np
>>> c=np.zeros((5,10,2048))
>>> d=np.zeros((5,6,2048))
>>> e=np.zeros((5,1,2048))
>>> f=np.concatenate((c,d,e),axis=1)
>>> f.shape
(5, 17, 2048)
Now you'd have to keep track of which indices along axis 1 correspond to which matrix, but maybe this could work for you?
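One way to do that bookkeeping, sketched with random data standing in for the question's matrices: record the cumulative sizes along axis 1, then slice or `np.split` the concatenated array at those offsets to get each original matrix back.

```python
import numpy as np

# Stand-in inputs with the shapes from the question
mats = [np.random.rand(5, n, 2048) for n in (10, 6, 1)]
f = np.concatenate(mats, axis=1)          # shape (5, 17, 2048)

# Start/end position of each matrix along axis 1
sizes = [m.shape[1] for m in mats]        # [10, 6, 1]
offsets = np.cumsum([0] + sizes)          # [0, 10, 16, 17]

# Recover one block by slicing, or all of them with np.split
second = f[:, offsets[1]:offsets[2], :]   # the (5, 6, 2048) block
pieces = np.split(f, offsets[1:-1], axis=1)
```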

Simple extending csr matrix by adding a column

I have this code
import numpy as np
from scipy.sparse import csr_matrix
q = csr_matrix([[1.], [0.]])
ones = np.ones((2, 1))
Now, how do I add the ones column to matrix q so that the result has shape (2, 2)?
(Matrix q is sparse and I don't want to change its type from csr.)
The code for sparse.hstack is
return bmat([blocks], format=format, dtype=dtype)
So for bmat, blocks is a 1xN array. If they are all csc, it takes a fast stacking path:
A = _compressed_sparse_stack(blocks[0,:], 1)
Conversely, sparse.vstack with csr matrices does
A = _compressed_sparse_stack(blocks[:,0], 0)
In effect, given how data is stored in a csr matrix, it is relatively easy to add rows (or columns for csc). I can elaborate if that needs explanation.
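For the question as asked, a minimal sketch: wrap the dense column as a sparse block and pass `format="csr"` to `sparse.hstack` so the result comes back as csr, whichever internal path hstack takes.

```python
import numpy as np
from scipy import sparse
from scipy.sparse import csr_matrix

q = csr_matrix([[1.], [0.]])
ones = np.ones((2, 1))
# Convert the dense column to a sparse block, then stack horizontally;
# format="csr" requests a csr result
result = sparse.hstack([q, csr_matrix(ones)], format="csr")
print(result.shape, result.format)   # (2, 2) csr
print(result.toarray())
```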
Otherwise bmat does:
# convert everything to COO format
# calculate total nnz
data = np.empty(nnz, dtype=dtype)
for B in blocks:
data[nnz:nnz + B.nnz] = B.data
return coo_matrix((data, (row, col)), shape=shape).asformat(format)
In other words, it gets the data, row, col values for each block, concatenates them, makes a new coo matrix, and finally converts it to the desired format.
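A simplified sketch of that general path for the horizontal case: `hstack_via_coo` is a hypothetical helper (not scipy's code) that assumes all blocks have the same number of rows and skips bmat's dtype handling. Each block's column indices are shifted by the widths of the blocks before it.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

def hstack_via_coo(blocks, format="csr"):
    """Hypothetical helper mimicking bmat's general path for one row of blocks."""
    coos = [coo_matrix(b) for b in blocks]   # convert everything to COO
    nrows = coos[0].shape[0]
    data, row, col = [], [], []
    col_offset = 0
    for B in coos:
        data.append(B.data)
        row.append(B.row)
        col.append(B.col + col_offset)       # shift this block to the right
        col_offset += B.shape[1]
    # Concatenate the pieces, build one coo matrix, then convert the format
    out = coo_matrix(
        (np.concatenate(data), (np.concatenate(row), np.concatenate(col))),
        shape=(nrows, col_offset),
    )
    return out.asformat(format)

q = csr_matrix([[1.], [0.]])
ones = np.ones((2, 1))
result = hstack_via_coo([q, ones])
```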
sparse readily converts between formats. Even displaying a matrix can involve a conversion: to coo for the (i, j) value listing, and to csr for the dense/array form. The .nonzero() method converts to coo. Most math converts to csr. A csr is transposed by converting it to a csc (without changing the attribute arrays). Much of the conversion is done in compiled code, so you don't see delays.
Adding columns directly to csr format is a lot of work. All 3 attribute arrays have to be modified row by row. Again I could go into detail if needed.
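To illustrate why rows are the easy case for csr, here is a sketch of appending one dense row by extending the three attribute arrays. `append_row_csr` is a hypothetical helper. Appending a row only adds entries at the end of `data` and `indices` plus one new `indptr` value, whereas a new column would require inserting entries into the middle of every row and shifting `indptr` throughout.

```python
import numpy as np
from scipy.sparse import csr_matrix

def append_row_csr(A, new_row):
    """Hypothetical helper: append one dense row to csr matrix A."""
    new_row = np.asarray(new_row).ravel()
    nz = np.flatnonzero(new_row)                       # columns of nonzeros
    data = np.concatenate([A.data, new_row[nz]])       # values go at the end
    indices = np.concatenate([A.indices, nz])          # their column indices
    indptr = np.append(A.indptr, A.indptr[-1] + len(nz))  # end of new row
    return csr_matrix((data, indices, indptr), shape=(A.shape[0] + 1, A.shape[1]))

A = csr_matrix([[1., 0.], [0., 2.]])
B = append_row_csr(A, [3., 0.])
```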

easy sampling of vectors from a sparse matrix, and creating a new matrix from the sample (python)

This question has two parts (maybe one solution?):
Sample vectors from a sparse matrix: Is there an easy way to sample vectors from a sparse matrix?
When I try to sample rows using random.sample I get a TypeError: sparse matrix length is ambiguous.
from random import sample
import numpy as np
from scipy.sparse import lil_matrix
K = 2
m = [[1,2],[0,4],[5,0],[0,8]]
sample(m,K) #works OK
mm = np.array(m)
sample(m,K) #works OK
sm = lil_matrix(m)
sample(sm,K) #throws exception TypeError: sparse matrix length is ambiguous.
My current solution is to sample from the number of rows in the matrix, then use getrow(), something like:
indxSampls = sample(range(sm.shape[0]), K)
sampledRows = []
for i in indxSampls:
sampledRows+=[sm.getrow(i)]
Any other efficient/elegant ideas? The dense matrix size is 1000x30000 and could be larger.
Constructing a sparse matrix from a list of sparse vectors: Now imagine I have the list of sampled vectors sampledRows. How can I convert it to a sparse matrix without densifying it, converting it to a list of lists, and then converting that to a lil_matrix?
Try
sm[np.random.sample(sm.shape[0], K, replace=False), :]
This gives you an LIL-format matrix with just K of the rows (in the order determined by the random sampling). I'm not sure it's super fast, but it can't really be worse than manually accessing row by row as you're currently doing, and it probably preallocates the result.
The accepted answer to this question is outdated and no longer works. With newer versions of numpy, you should use np.random.choice in place of np.random.sample, e.g.:
sm[np.random.choice(sm.shape[0], K, replace=False), :]
as opposed to:
sm[np.random.sample(sm.shape[0], K, replace=False), :]
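A sketch covering both parts of the question, using the small matrix from the question and a seeded `np.random.default_rng` Generator (an assumption; the legacy `np.random.choice` works the same way). Part 1 fancy-indexes the sampled rows in one step; part 2 uses `sparse.vstack` to build a sparse matrix from a list of sparse row vectors without densifying.

```python
import numpy as np
from scipy import sparse
from scipy.sparse import lil_matrix

m = [[1, 2], [0, 4], [5, 0], [0, 8]]
sm = lil_matrix(m)
K = 2
rng = np.random.default_rng(0)

# Part 1: choose K distinct row indices and index them all at once
idx = rng.choice(sm.shape[0], K, replace=False)
sampled = sm[idx, :]

# Part 2: stack a list of sparse row vectors into one sparse matrix
sampled_rows = [sm.getrow(i) for i in idx]
stacked = sparse.vstack(sampled_rows, format="lil")
```

For large matrices, row indexing is usually faster on csr than on lil, so converting once with `sm.tocsr()` may be worth it.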
