How to invert a matrix that contains all-zero rows? - python

I am running this algorithm, which computes the MLE of the matrix normal distribution.
One part of the algorithm requires computing the inverse of a matrix that contains all-zero rows. Such a matrix is singular, so no true inverse exists. How can I work around this?
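
Two standard workarounds for inverting a singular matrix are the Moore-Penrose pseudoinverse and a small ridge (Tikhonov) regularisation. Whether either is appropriate depends on the algorithm, so treat this as a minimal numpy sketch rather than a fix specific to the matrix normal MLE:

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],   # all-zero row: A is singular
              [1.0, 3.0, 1.0]])

# Option 1: Moore-Penrose pseudoinverse, defined for any matrix
A_pinv = np.linalg.pinv(A)

# Option 2: ridge regularisation; eps trades bias against numerical stability
eps = 1e-8
A_ridge_inv = np.linalg.inv(A + eps * np.eye(A.shape[0]))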

Related

Computing top eigenvalues, operator norm of sparse matrix

I have a large sparse square non-normal matrix: 73080 rows, but only 6 nonzero entries per row (all equal to 1.0). I'd like to compute its two largest eigenvalues, as well as its operator (2-)norm, ideally with Python. The natural way for me to store this matrix is with scipy's csr_matrix, especially since I'll be multiplying it by other sparse matrices. However, I don't see a good way to compute the relevant statistics: scipy.sparse.linalg's norm method doesn't implement the 2-norm, converting to a dense matrix seems like a bad idea, and scipy.sparse.linalg.eigs seems to run extremely, maybe prohibitively, slowly, and in any event computes lots of data that I just don't need. I suppose I could subtract off the spectral projector corresponding to the top eigenvalue, but then I'd still need the top eigenvalue of the new matrix, which I'd like to get with an out-of-the-box method if at all possible; in any event this wouldn't keep working after multiplying with other large sparse matrices.
However, these kinds of computations seem to be doable: the top of page 6 of this paper seems to have data on the eigenvalues of ~10000-row matrices. If this is not feasible in Python, is there another way I should try to do this? Thanks in advance.
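
For what it's worth, scipy can get both quantities with iterative methods that only need matrix-vector products; here is a sketch on a smaller stand-in matrix (the size and random construction are illustrative, not the asker's data):

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# stand-in for the 73080-row matrix: ~6 ones per row on average
n = 5000
A = sp.random(n, n, density=6.0 / n, format='csr',
              random_state=0, data_rvs=np.ones)

# two eigenvalues of largest magnitude (possibly complex: A is non-normal)
eigvals = spla.eigs(A, k=2, which='LM', return_eigenvectors=False)

# operator 2-norm = largest singular value
sigma_max = spla.svds(A, k=1, return_singular_vectors=False)[0]

Note that for a non-normal matrix the 2-norm is the largest singular value, not the largest eigenvalue magnitude, which is why svds rather than eigs is used for the norm.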

6x6 block matrix inversion

I'm facing the inversion of a 6x6 matrix, which can also be represented as a symmetric block matrix as follows:
Each of the P sub-matrices is a 3x3 matrix; P12 and P21 are equal, so P is symmetric. I would like to exploit this structure to compute the inverse of P efficiently. Until now I have been using the inv() function from Scipy directly on P, but having profiled my code, and considering that I have to invert this type of matrix thousands of times, I would like a better approach. Looking online, I found a formula using Schur complements:
I'm wondering whether this strategy will be more computationally efficient than inverting the assembled 6x6 matrix. Since the blocks are only 3x3, I could also use closed-form formulas for the inverses of the blocks and then plug them into the formula from the second picture.
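
For reference, the block-inversion identity the question alludes to uses the Schur complement S = P22 - P21 @ inv(P11) @ P12; assuming P11 and S are both invertible, a direct numpy translation looks like this (a sketch, not tuned for speed):

import numpy as np

def block_inv(P11, P12, P21, P22):
    # inverse of [[P11, P12], [P21, P22]] via the Schur complement of P11:
    #   S = P22 - P21 @ inv(P11) @ P12
    #   inv = [[inv(P11) + inv(P11) P12 inv(S) P21 inv(P11), -inv(P11) P12 inv(S)],
    #          [-inv(S) P21 inv(P11),                         inv(S)            ]]
    P11_inv = np.linalg.inv(P11)
    S_inv = np.linalg.inv(P22 - P21 @ P11_inv @ P12)
    TL = P11_inv + P11_inv @ P12 @ S_inv @ P21 @ P11_inv
    TR = -P11_inv @ P12 @ S_inv
    BL = -S_inv @ P21 @ P11_inv
    return np.block([[TL, TR], [BL, S_inv]])

Whether this beats Scipy's inv() on the assembled 6x6 is an empirical question; for matrices this small, Python-level overhead often dominates, so profiling both variants is worthwhile.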

Find underlying normal distribution of random vectors

I am trying to solve a statistics-related real-world problem with Python and am looking for input on my ideas: I have N random vectors from an m-dimensional normal distribution. I have no information about the mean or the covariance matrix of the underlying distribution; in fact, even that it is a normal distribution is only an assumption, though a very plausible one. I want to compute an approximation of the mean vector and covariance matrix of the distribution. The number of random vectors is on the order of 100 to 300, and the dimensionality of the normal distribution is somewhere between 2 and 5. The calculation should ideally take no more than 1 minute on a standard home computer.
I am currently considering three approaches and would be happy about suggestions for other approaches or preferences among these three:
Fitting: Make a multi-dimensional histogram of all random vectors and fit a multi-dimensional normal distribution to the histogram. Problem with this approach: the covariance matrix has many entries, which could be a problem for the fitting process.
Invert the cumulative distribution function: Make a multi-dimensional histogram as an approximation of the density function of the random vectors, then integrate it to get a multi-dimensional cumulative distribution function. In one dimension this is invertible, and one could use the CDF to generate random numbers distributed like the original distribution. Problem: in the multi-dimensional case the CDF is not invertible(?), and I don't know whether the approach still works then.
Bayesian: Use Bayesian statistics with some normal distribution as prior and update for every observation. The result should always again be a normal distribution. Problem: I think this is computationally expensive? Also, I don't want the later updates to have more impact on the resulting distribution than the earlier ones.
Also, maybe there is a library that already implements this task? I did not find exactly this in Numpy or Scipy; maybe someone has an idea where else to look?
If the simple estimates described in the Parameter estimation section of the Wikipedia article on the multivariate normal distribution are sufficient for your needs, you can use numpy.mean to compute the mean and numpy.cov to compute the sample covariance matrix.
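
A minimal sketch of those estimators (the synthetic data below is illustrative; the only assumption is one sample per row):

import numpy as np

# X: N x m array of samples, here N=200 draws from a 3-d normal
rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[0.0, 1.0, -2.0],
                            cov=np.diag([1.0, 2.0, 0.5]),
                            size=200)

mu_hat = X.mean(axis=0)                       # estimated mean vector
cov_hat = np.cov(X, rowvar=False)             # sample covariance (divides by N-1)
cov_mle = np.cov(X, rowvar=False, bias=True)  # MLE variant (divides by N)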

Dissimilarity matrix of a scipy.sparse.csc.csc_matrix in Python

I am searching for a Python implementation for computing dissimilarity measures on a sparse matrix. I tried using scipy.spatial.distance.pdist, but I get an error:
ValueError: setting an array element with a sequence.
I think this is because pdist does not work on a scipy.sparse.csc.csc_matrix; it works fine on dense matrices. So far I have been using
Dismat = A.T * A
to compute the Euclidean distance, where A is the sparse matrix and Dismat is the dissimilarity matrix. But I would like to compute other distances, such as Manhattan, Jaccard, shortest path, and so on.
I am wondering if anyone knows whether there is a Python package that calculates dissimilarity measures on a scipy.sparse.csc.csc_matrix. That would be great!
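
One possibility (my suggestion, not something from the question) is scikit-learn's pairwise_distances, which accepts scipy sparse input for several metrics; boolean metrics such as Jaccard generally need a dense array:

import numpy as np
from scipy.sparse import csc_matrix
from sklearn.metrics import pairwise_distances

A = csc_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0],
                         [4.0, 0.0, 0.0]]))

# euclidean and manhattan work directly on sparse input
D_euc = pairwise_distances(A, metric='euclidean')
D_man = pairwise_distances(A, metric='manhattan')

# jaccard is a boolean metric and needs dense input
D_jac = pairwise_distances(A.toarray().astype(bool), metric='jaccard')

Shortest-path distances are a different beast: scipy.sparse.csgraph.shortest_path operates on a sparse adjacency matrix directly.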

PCA on large Sparse matrix using Correlation matrix

I have a large (500k by 500k) sparse matrix. I would like to get its principal components (in fact, even computing just the largest PC would be fine). Randomized PCA works great, except that it essentially finds the eigenvectors of the covariance matrix instead of the correlation matrix. Any ideas for a package that will compute PCA using the correlation matrix of a large, sparse matrix? Preferably in Python, though Matlab and R work too.
(For reference, a similar question was asked here, but the methods refer to the covariance matrix.)
Are they not the same thing? As far as I understand it, the correlation matrix is just the covariance matrix normalised by the product of each variable's standard deviation. And, if I recall correctly, isn't there a scaling ambiguity in PCA anyway?
Have you tried the irlba package in R? "The IRLBA package is the R language implementation of the method. With it, you can compute partial SVDs and principal component analyses of very large scale data. The package works well with sparse matrices and with other matrix classes like those provided by the Bigmemory package." You can check here for details.
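
Staying in Python, one way to get correlation-matrix PCs from a large sparse matrix without densifying it is to apply the centering and scaling implicitly through a scipy LinearOperator. The following is a sketch of that idea, with all names hypothetical:

import numpy as np
from scipy.sparse.linalg import LinearOperator, svds

def top_correlation_pcs(X, k=1):
    # top-k principal components of the correlation matrix of sparse X
    # (n samples x p features); X is never centered/scaled explicitly
    n, p = X.shape
    mu = np.asarray(X.mean(axis=0)).ravel()               # column means
    ex2 = np.asarray(X.multiply(X).mean(axis=0)).ravel()  # E[x^2] per column
    sd = np.sqrt(np.maximum(ex2 - mu**2, 1e-12))          # column std devs

    def matvec(v):            # (X - 1 mu^T) D^{-1/2} v, without forming it
        w = np.ravel(v) / sd
        return X @ w - (mu @ w) * np.ones(n)

    def rmatvec(u):           # transpose action
        u = np.ravel(u)
        return (X.T @ u - mu * u.sum()) / sd

    Z = LinearOperator((n, p), matvec=matvec, rmatvec=rmatvec)
    _, s, Vt = svds(Z, k=k)
    order = np.argsort(s)[::-1]
    # right singular vectors of the standardized data are the correlation PCs;
    # s**2 / n are the corresponding correlation-matrix eigenvalues
    return Vt[order], s[order] ** 2 / n

irlba takes a similar implicit approach (implicitly restarted Lanczos bidiagonalization with optional centering and scaling), so the R route and this sketch should agree on the leading components.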
