init a COO matrix from its attributes (data,row,col) - python

I have a script that computes the three attributes of the COO format:
data: the COO format data array of the matrix
row: the COO format row index array of the matrix
col: the COO format column index array of the matrix
I want to use these three arrays to initialise a coo_matrix() so that I can use the methods available on the coo_matrix class. What would be the fastest way to do that without changing the main script?

According to the scipy.sparse documentation, the constructor takes the three arrays directly:
coo_matrix((data, (row, column)), [shape=(M, N)])
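As a minimal sketch of that constructor call, assuming the script has already produced the three arrays (the values and the 3x3 shape below are made up for illustration):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Illustrative stand-ins for the arrays computed by the main script.
data = np.array([4.0, 5.0, 7.0])
row = np.array([0, 1, 2])
col = np.array([0, 2, 1])

# Build the matrix directly from the three COO attribute arrays.
# shape is optional; if omitted it is inferred from the largest indices.
A = coo_matrix((data, (row, col)), shape=(3, 3))

dense = A.toarray()   # dense view, for inspection only
A_csr = A.tocsr()     # convert when arithmetic or slicing is needed
```

No copy through a dense array is involved, so the main script can stay as it is.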

Related

python: How to construct a sparse matrix out of a 2D-image?

I have a 2D array which represents an image (376, 450)
How can I construct a new diagonal matrix (sparse matrix) which has all of the entries of the image along the diagonal and zeroes everywhere else? When I use, for example, scipy.sparse.diags, I get the following error:
ValueError: Different number of diagonals and offsets.
I already checked the documentation here but for me, it is still not quite clear how to do that. Thanks a lot in advance for any tip!
Edit: If I want the entries of the image (2D array) only on the zero-diagonal, do I have to flatten the 2D array into a single dimension first?
So that I have something like:
from scipy.sparse import diags
new_flat_array = [1,2,3,4, ...]
diags(new_flat_array, [0]).toarray()
But that way the result is a dense array, not the sparse 2D matrix I want.
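A possible sketch of the flattening approach: pass the flattened image as a single diagonal and skip the .toarray() call, so the result stays sparse. The small 3x4 image here is a stand-in for the (376, 450) one, and the choice of CSR as output format is an assumption.

```python
import numpy as np
from scipy.sparse import diags

# Small stand-in for the (376, 450) image from the question.
image = np.arange(12, dtype=float).reshape(3, 4)

# Flatten to 1-D so diags sees one diagonal for the single offset 0.
# A 2-D input would be read as several diagonals, which is why the
# number of diagonals and offsets no longer match.
D = diags(image.ravel(), 0, format="csr")

# D is still sparse; only call D.toarray() if a dense copy is really needed.
main_diagonal = D.diagonal()
```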

Transpose a 1-dimensional array in Numpy without casting to matrix

My goal is to turn a row vector into a column vector and vice versa. The documentation for numpy.ndarray.transpose says:
For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.)
However, when I try this:
my_array = np.array([1,2,3])
my_array_T = np.transpose(np.matrix(my_array))
I do get the wanted result, albeit in matrix form (matrix([[66],[640],[44]])), but I also get this warning:
PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
my_array_T = np.transpose(np.matrix(my_array))
How can I properly transpose an ndarray then?
Transposing a 1D array returns the array unchanged, unlike in Matlab, where 1D arrays don't exist and every array is at least 2D.
What you want is to reshape it:
my_array.reshape(-1, 1)
Or:
my_array.reshape(1, -1)
Depending on what kind of vector you want (column or row vector).
The -1 tells reshape to infer that dimension from the number of elements, and the 1 creates the second required dimension.
If your array is my_array and you want to convert it to a column vector you can do:
my_array.reshape(-1, 1)
For a row vector you can use
my_array.reshape(1, -1)
Both of these can also be transposed and that would work as expected.
IIUC, use reshape
my_array.reshape(my_array.size, -1)
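For reference, a short sketch of the reshape approach from the answers above. The np.newaxis variant is an addition not mentioned in the answers; it produces the same column shape.

```python
import numpy as np

my_array = np.array([1, 2, 3])

column = my_array.reshape(-1, 1)      # shape (3, 1): column vector
row = my_array.reshape(1, -1)         # shape (1, 3): row vector
column_alt = my_array[:, np.newaxis]  # same shape as column, via indexing

# Transposing the 2-D results now swaps the axes as expected.
row_from_column = column.T
```

All results are plain ndarrays, so no np.matrix and no deprecation warning is involved.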

Simple extending csr matrix by adding a column

I have this code
import numpy as np
from scipy.sparse import csr_matrix
q = csr_matrix([[1.], [0.]])
ones = np.ones((2, 1))
How can I add the ones column to matrix q so that the result has shape (2, 2)?
(matrix q is sparse and I don't want to change type from csr)
The code for sparse.hstack is
return bmat([blocks], format=format, dtype=dtype)
so for bmat, blocks is a 1xN array. If all the blocks are csc, it uses a fast version of stack:
A = _compressed_sparse_stack(blocks[0,:], 1)
Conversely, sparse.vstack with csr matrices does
A = _compressed_sparse_stack(blocks[:,0], 0)
In effect, given how data is stored in a csr matrix, it is relatively easy to add rows (or columns for csc) (I can elaborate if that needs explanation).
Otherwise bmat does:
# convert everything to COO format
# calculate total nnz
data = np.empty(nnz, dtype=dtype)
for B in blocks:
data[nnz:nnz + B.nnz] = B.data
return coo_matrix((data, (row, col)), shape=shape).asformat(format)
In other words, it gets the data, row, col values for each block, concatenates them, makes a new coo matrix, and finally converts it to the desired format.
sparse readily converts between formats. Even the display of a matrix can involve a conversion - to coo for the (i,j) d format, to csr for dense/array. sparse.nonzero converts to coo. Most math converts to csr. A csr is transposed by converting it to a csc (without change of attribute arrays). Much of the conversion is done in compiled code so you don't see delays.
Adding columns directly to csr format is a lot of work. All 3 attribute arrays have to be modified row by row. Again I could go into detail if needed.
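In practice, a minimal sketch for the question's case is to let sparse.hstack do the COO round trip and ask for csr output. Wrapping the dense ones column in csr_matrix first is a choice made here so that every block is sparse; it is not something the answer above prescribes.

```python
import numpy as np
from scipy import sparse
from scipy.sparse import csr_matrix

q = csr_matrix([[1.], [0.]])
ones = np.ones((2, 1))

# Stack the blocks side by side; format="csr" converts the
# intermediate result back to csr at the end.
result = sparse.hstack([q, csr_matrix(ones)], format="csr")

dense = result.toarray()  # for inspection only
```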

mmap sparse vector in python

I'm looking for simple sparse vector implementation that can be mapped into memory, similarly to numpy.memmap.
Unfortunately, the numpy implementation only handles dense vectors. Example usage:
vec = SparseVector('/tmp/file.dat') # SparseVector is the class I'm looking for
vec[10] = 10
vec[50] = 21
for key in vec:
print(vec[key]) # 10, 21
I found the scipy class representing a sparse matrix, but two dimensions are clumsy to use, as I'd need to make a matrix with only one row and then use vec[0,i].
Any suggestions?
Someone else was just asking about 1d sparse vectors, only they wanted to take advantage of the scipy.sparse method of handling duplicate indices.
is there something like coo_matrix but for sparse vectors?
As shown there, a coo_matrix actually consists of 3 numpy arrays, data, row, col. Other formats rearrange the values in other ways, lil for example has 2 nested lists, one for the data, another for the coordinates. dok is a regular dictionary, with (i,j) tuples as keys.
In theory then a sparse vector will require 2 arrays. Or as your example shows it could be a simple dictionary.
So you could implement a mmap sparse vector by using two mmap arrays. As far as I know there isn't a mmap version of the scipy sparse matrices, though it's not something I've looked for.
But what functionality do you want? What dimension? So large that a dense version would not fit in regular memory? Are you doing math with it? Or just data lookup?
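The two-array idea can be sketched roughly as below. MemmapSparseVector, its file suffixes, the fixed capacity, and the convention of keeping the entry count in slot 0 of the index file are all hypothetical choices made for illustration; lookups are a linear scan, so this is not tuned for large vectors.

```python
import os
import tempfile
import numpy as np

class MemmapSparseVector:
    """Hypothetical sparse vector backed by two np.memmap arrays."""

    def __init__(self, prefix, capacity=1024):
        # Index file: slot 0 holds the number of stored entries,
        # slots 1..capacity hold the keys. mode="w+" creates the files.
        self._idx = np.memmap(prefix + ".idx", dtype=np.int64,
                              mode="w+", shape=(capacity + 1,))
        # Value file: one value per stored key, in insertion order.
        self._val = np.memmap(prefix + ".val", dtype=np.float64,
                              mode="w+", shape=(capacity,))

    def _count(self):
        return int(self._idx[0])

    def _find(self, key):
        # Linear scan over the stored keys; returns -1 if key is absent.
        n = self._count()
        hits = np.nonzero(self._idx[1:n + 1] == key)[0]
        return int(hits[0]) if hits.size else -1

    def __setitem__(self, key, value):
        pos = self._find(key)
        if pos < 0:
            pos = self._count()
            if pos >= self._val.shape[0]:
                raise IndexError("capacity exceeded")
            self._idx[pos + 1] = key
            self._idx[0] = pos + 1
        self._val[pos] = value

    def __getitem__(self, key):
        pos = self._find(key)
        return float(self._val[pos]) if pos >= 0 else 0.0

    def __iter__(self):
        n = self._count()
        return iter(self._idx[1:n + 1].tolist())

# Usage mirroring the example in the question.
tmpdir = tempfile.mkdtemp()
vec = MemmapSparseVector(os.path.join(tmpdir, "vec"), capacity=16)
vec[10] = 10
vec[50] = 21
stored = [(key, vec[key]) for key in vec]
```

Keeping the keys sorted and using np.searchsorted would make lookups faster, at the cost of shifting entries on insert.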

numpy create 2D mask from list of indices [+ then draw from masked array]

I have a 2-D array of values and need to mask certain elements of that array (with indices taken from a list of ~ 100k tuple-pairs) before drawing random samples from the remaining elements without replacement.
I need something that is both quite fast/efficient (hopefully avoiding for loops) and has a small memory footprint because in practice the master array is ~ 20000 x 20000.
For now I'd be content with something like (for illustration):
xys=[(1,2),(3,4),(6,9),(7,3)]
gxx,gyy=numpy.mgrid[0:100,0:100]
mask = numpy.where((gxx,gyy) not in set(xys)) # The bit I can't get right
# Now sample the masked array
draws=numpy.random.choice(master_array[mask].flatten(),size=40,replace=False)
Fortunately for now I don't need the x,y coordinates of the drawn fluxes - but bonus points if you know an efficient way to do this all in one step (i.e. it would be acceptable for me to identify those coordinates first and then use them to fetch the corresponding master_array values; the illustration above is a shortcut).
Thanks!
Linked questions:
Numpy mask based on if a value is in some other list
Mask numpy array based on index
Implementation of numpy in1d for 2D arrays?
You can do it efficiently using a sparse coo matrix:
import numpy
from scipy import sparse
xys=[(1,2),(3,4),(6,9),(7,3)]
rows, cols = zip(*xys)  # split the pairs into row and column index tuples
mask = sparse.coo_matrix((numpy.ones(len(rows), dtype=bool), (rows, cols)), shape=master_array.shape, dtype=bool)
draws=numpy.random.choice(master_array[~mask.toarray()], size=10, replace=False)
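For the bonus part of the question, one possible sketch draws flat indices of the unmasked cells first and then converts them back to coordinates. The 10x10 master_array and the seeded generator are stand-ins for illustration, and the flatnonzero/unravel_index step is an addition to the answer above. Note that the dense boolean mask still costs one byte per cell.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
master_array = rng.random((10, 10))  # small stand-in for the 20000 x 20000 array

xys = [(1, 2), (3, 4), (6, 9), (7, 3)]
rows, cols = zip(*xys)

# True at every excluded coordinate, False elsewhere.
mask = sparse.coo_matrix(
    (np.ones(len(rows), dtype=bool), (rows, cols)),
    shape=master_array.shape, dtype=bool,
).toarray()

# Flat indices of the cells that may be sampled.
allowed = np.flatnonzero(~mask)

# Sample positions without replacement, then recover coordinates and values.
picked = rng.choice(allowed, size=40, replace=False)
draw_rows, draw_cols = np.unravel_index(picked, master_array.shape)
draws = master_array[draw_rows, draw_cols]
```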
