Computing top eigenvalues, operator norm of sparse matrix - python

I have a large sparse square non-normal matrix: 73080 rows, but only 6 nonzero entries per row (and all equal to 1.). I'd like to compute the two largest eigenvalues, as well as the operator (2) norm, ideally with Python. The natural way for me to store this matrix is with scipy's csr_matrix, especially since I'll be multiplying it with other sparse matrices. However, I don't see a good way to compute the relevant statistics: scipy.sparse.linalg's norm method doesn't have the 2-norm implemented and converting to a dense matrix seems like it would be a bad idea, and running scipy.sparse.linalg.eigs seems to run extremely, maybe prohibitively, slowly, and in any event it computes lots of data that I just don't need. I suppose I could subtract off the spectral projector corresponding to the top eigenvalue but then I'd still need to know the top eigenvalue of the new matrix, which I'd like to do with an out-of-the-box method if at all possible, and in any event this wouldn't continue to work after multiplying with other large sparse matrices.
However, these kinds of computations seem to be doable: the top of page 6 of this paper seems to have data on the eigenvalues of ~10000-row matrices. If this is not feasible in Python, is there another way I should try to do this? Thanks in advance.

Related

6x6 block matrix inversion

I'm facing the inversion of a 6x6 matrix which can also be represented as a symmetric block matrix as following:
Each of the P sub matrices is then a 3x3 matrix. P12 and P21 are equal so that P is symmetric. I would like to exploit this form to compute the inverse P matrix in an efficient way. Until now I'm using using the inv() function from Scipy directly on P but, having profiled my code and considering that I have to invert this type of matrices thousands of times in the code I would wish for a better way. Looking up online I found a formula using Schur complements as follow:
I'm wondering if using this strategy will be more computationally efficient then inverting the 6x6 matrix after assembling it. Since the blocks are only 3x3 I could also think of using formulas for calculating the inverse of the blocks, and then use them in the formula represented in the second picture.

Fast subsequent multiplication of many matrices in python

I have to generate a matrix (propagator in physics) by ordered multiplication of many other matrices. Each matrix is about the size of (30,30), all real entries (floats), but not symmetric. The number of matrices to multiply varies between 1e3 to 1e5. Each matrix is only slightly different from previous, however they are not commutative (and at the end I need the product of all these non-commutative multiplication). Each matrix is for certain time slice, so I know how to generate each of them independently, wherever they are in the multiplication sequence. At the end, I have to produce many such matrix propagators, so any performance enhancement is welcomed.
What is the algorithm for fastest implementation of such matrix multiplication in python?
In particular -
How to structure it? Are there fast axes and so on? preferable dimensions for rows / columns of the matrix?
Assuming memory is not a problem, to allocate and build all matrices before multiplication, or to generate each per time step? To store each matrix in dedicated variable before multiplication, or to generate when needed and directly multiply?
Cumulative effects of function call overheads effects when generating matrices?
As I know how to build each, should it be parallelized? For example maybe to create batch sequences from start of the sequence and from the end, multiply them in parallel and at the end multiply the results in proper order?
Is it preferable to use other module than numpy? Numba can be useful? or some other efficient way to compile in place to C, or use of optimized external libraries? (please give reference if so, I don't have experience in that)
Thanks in advance.
I don't think that the matrix multiplication would take much time. So, I would do it in a single loop. The assembling is probably the costly part here.
If you have bigger matrices, a map-reduce approach could be helpful. (split the set of matrices, apply matrix multiplication to each set and do the same for the resulting matrices)
Numpy is perfectly fine for problems like this as it is pretty optimized. (and is partly in C)
Just test how much time the matrix multiplication takes and how much the assembling. The result should indicate where you need to optimize.

Performing Decomposition on Sparse Matrices in Python

I'm trying to decomposing signals in components (matrix factorization) in a large sparse matrix in Python using the sklearn library.
I made use of scipy's scipy.sparse.csc_matrix to construct my matrix of data. However I'm unable to perform any analysis such as factor analysis or independent component analysis. The only thing I'm able to do is use truncatedSVD or scipy's scipy.sparse.linalg.svds and perform PCA.
Does anyone know any work-arounds to doing ICA or FA on a sparse matrix in python? Any help would be much appreciated! Thanks.
Given:
M = UΣV^t
The drawback with SVD is that the matrix U and V^t are dense matrices. It doesn't really matter that the input matrix is sparse, U and T will be dense. Also the computational complexity of SVD is O(n^2*m) or O(m^2*n) where n is the number of rows and m the number of columns in the input matrix M. It depends on which one is biggest.
It is worth mentioning that SVD will give you the optimal solution and if you can live with a smaller loss, calculated by the frobenius norm, you might want to consider using the CUR algorithm. It will scale to larger datasets with O(n*m).
U = CUR^t
Where C and R are now SPARSE matrices.
If you want to look at a python implementation, take a look at pymf. But be a bit careful about that exact implementations since it seems, at this point in time, there is an open issue with the implementation.
Even the input matrix is sparse the output will not be a sparse matrix. If the system does not support a dense matrix neither the results will not be supported
It is usually a best practice to use coo_matrix to establish the matrix and then convert it using .tocsc() to manipulate it.

Inverse of a Matrix in Python

While trying to compute inverse of a matrix in python using numpy.linalg.inv(matrix), I get singular matrix error. Why does it happen? Has it anything to do with the smallness of the values in the matrix. The numbers in my matrix are probabilities and add up to 1.
It may very well have to do with the smallness of the values in the matrix.
Some matrices that are not, in fact, mathematically singular (with a zero determinant) are totally singular from a practical point of view, in that the math library one is using cannot process them properly.
Numerical analysis is tricky, as you know, and how well it deals with such situations is a measure of the quality of a matrix library.

PCA on large Sparse matrix using Correlation matrix

I have a large (500k by 500k), sparse matrix. I would like to get the principle components of it (in fact, even computing just the largest PC would be fine). Randomized PCA works great, except that it is essentially finding the eigenvectors of the covariance matrix instead of the correlation matrix. Any ideas of a package that will find PCA using the covariance matrix of a large, sparse matrix? Preferably in python, though matlab and R work too.
(For reference, a similar question was asked here but the methods refer to covariance matrix).
Are they not the same thing? As far as I understand it, the correlation matrix is just the covariance matrix normalised by the product of each variable's standard deviation. And, if I recall correctly, isn't there a scaling ambiguity in PCA anyway?
Have you ever tried irlba package in R - "The IRLBA package is the R language implementation of the method. With it, you can compute partial SVDs and principal component analyses of very large scale data. The package works well with sparse matrices and with other matrix classes like those provided by the Bigmemory package." you can check here for details

Categories