I'm trying to implement Reinsch's algorithm (p. 4).
Since the working matrices are sparse, I'm using the scipy.sparse module. As you can see, Reinsch's algorithm needs the Cholesky decomposition of a sparse matrix (let's call it my_matrix) in order to solve a certain system, but I couldn't find anything related to this.
Of course, within the algorithm I can solve the sparse system using, for instance, scipy.sparse.linalg.spsolve, and then at the end of the algorithm use something like:
R = numpy.linalg.cholesky(my_matrix.A)
But in my application my_matrix is usually about 800x800, so this last approach is very inefficient.
So, my question is: where can I find such a decomposition?
Thanks in advance.
For a fast decomposition, you can try:
from scikits.sparse.cholmod import cholesky
factor = cholesky(A.tocsc())
x = factor(b)
A is your sparse, symmetric, positive-definite matrix.
Since your matrix is only about 800x800 (far from huge), the factorisation is fast in any case; note that cholesky works on the sparse matrix directly, expecting it in CSC format rather than as a dense numpy array.
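For reference, a minimal end-to-end sketch; the test matrix and right-hand side here are made up, and depending on the scikit-sparse version the import may be from sksparse.cholmod instead:

import numpy as np
import scipy.sparse as sp
from scikits.sparse.cholmod import cholesky

# A small sparse, symmetric, positive-definite stand-in for my_matrix:
# a 1-D Laplacian with a diagonal shift, in CSC format.
n = 800
main = 2.1 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], [-1, 0, 1], format='csc')
b = np.ones(n)

factor = cholesky(A)   # CHOLMOD factorisation of the sparse matrix
x = factor(b)          # solve A x = b using the factor
print(np.allclose(A @ x, b))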
I have a large sparse square non-normal matrix: 73080 rows, but only 6 nonzero entries per row (all equal to 1). I'd like to compute the two largest eigenvalues, as well as the operator 2-norm, ideally with Python. The natural way for me to store this matrix is with scipy's csr_matrix, especially since I'll be multiplying it by other sparse matrices. However, I don't see a good way to compute the relevant statistics: scipy.sparse.linalg's norm method doesn't implement the 2-norm, converting to a dense matrix seems like a bad idea, and scipy.sparse.linalg.eigs seems to run extremely, maybe prohibitively, slowly, computing lots of data I just don't need. I suppose I could subtract off the spectral projector corresponding to the top eigenvalue, but then I'd still need the top eigenvalue of the new matrix, which I'd like to obtain with an out-of-the-box method if at all possible; in any event this wouldn't continue to work after multiplying by other large sparse matrices.
However, these kinds of computations do seem to be doable: the top of page 6 of this paper appears to report eigenvalue data for matrices with roughly 10000 rows. If this is not feasible in Python, is there another way I should try to do this? Thanks in advance.
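For what it's worth, a sketch of how one might attempt this with scipy's sparse tools; the matrix below is a made-up stand-in with roughly the stated shape and density. The operator 2-norm is the largest singular value, which svds can compute without densifying anything:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs, svds

# Made-up stand-in: ~6 ones per row of a 73080 x 73080 matrix
# (duplicate positions get summed; close enough for illustration).
rng = np.random.default_rng(0)
n = 73080
rows = np.repeat(np.arange(n), 6)
cols = rng.integers(0, n, size=6 * n)
A = sp.csr_matrix((np.ones(6 * n), (rows, cols)), shape=(n, n))

# The two eigenvalues of largest magnitude (ARPACK, no dense conversion).
vals = eigs(A, k=2, which='LM', return_eigenvectors=False)

# Operator 2-norm = largest singular value.
sigma_max = svds(A, k=1, return_singular_vectors=False)[0]
print(vals, sigma_max)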
I am trying to invert a large (150000,150000) sparse matrix as follows:
import scipy as sp
import scipy.sparse.linalg as splu
# Bs is a large sparse matrix with shape (150000, 150000).
# Calculate the sparse inverse.
iBs = splu.inv(Bs)
leads to the following error message:
Traceback (most recent call last):
iBs=splu.inv(Bs)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 134, in spsolve
autoTranspose=True)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/dsolve/umfpack/umfpack.py", line 603, in linsolve
self.numeric(mtx)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/dsolve/umfpack/umfpack.py", line 450, in numeric
umfStatus[status]))
RuntimeError: <function umfpack_di_numeric at 0x7f2c76b1d320> failed with UMFPACK_ERROR_out_of_memory
I rejigged the program to simply solve a system of linear equations instead:
import numpy as np

N = Bs.shape[0]
I = np.ones(N)
M = splu.spsolve(Bs, I)
and I encounter the same error again.
I was using this code on a machine with 16 GB of RAM and then moved it onto a server with 32 GB of RAM, still to no avail.
Has anyone encountered this before?
First, let me say that this question would be better asked on http://scicomp.stackexchange.com, where there is a great community of experts in computational science and numerical linear algebra.
Let's start from the basics: never invert a sparse matrix; it's completely meaningless. See this discussion on MATLAB Central, and in particular this comment by Tim Davis.
Briefly: there are no algorithms that numerically invert a matrix directly. Whenever you try to compute the inverse of an NxN matrix numerically, you are in fact solving N linear systems whose right-hand sides are the N columns of the identity matrix.
In other words, when you compute
from scipy.sparse import eye
from scipy.sparse.linalg import (inv, spsolve)
N = Bs.shape[0]
iBs = inv(Bs)
iBs = spsolve(Bs, eye(N))
the last two statements (inv(Bs) and spsolve(Bs, eye(N))) are equivalent. Please note that the identity matrix (eye(N)) is not a vector of ones (np.ones(N)), as your question wrongly assumes.
The point here is that matrix inverses are seldom useful in numerical linear algebra: the solution of Ax = b is not computed as inv(A)*b, but by a specialised algorithm.
Coming to your specific problem: for large sparse systems of equations there are no black-box solvers. You can choose the correct class of solver only if you have a good understanding of the structure and properties of your matrix, which in turn are a consequence of the problem you are trying to solve. E.g., when you discretise a system of elliptic PDEs with the FEM, you end up with a symmetric positive-definite sparse system of algebraic equations. Once you know the properties of your problem, you can choose the correct solution strategy.
In your case, you are trying to use a generic direct solver without reordering the equations. It is well known that this generates fill-in, which destroys the sparsity of the matrix during the first phase of spsolve (the factorisation). Note that a full double-precision 150000 x 150000 matrix requires about 167 GiB of memory (150000^2 entries at 8 bytes each). There are many techniques for reordering the equations to reduce fill-in during factorisation, but you don't provide enough information for me to give you a sensible hint.
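If you need many solves with Bs, a sketch of the usual pattern is to factor once with a fill-reducing ordering and reuse the factor; COLAMD is one of the column orderings scipy's SuperLU interface offers, though whether the factorisation fits in memory still depends entirely on your matrix's structure:

import numpy as np
from scipy.sparse.linalg import splu

# Factor once with a fill-reducing column ordering, then reuse the factor.
lu = splu(Bs.tocsc(), permc_spec='COLAMD')

b = np.ones(Bs.shape[0])
x = lu.solve(b)   # solves Bs x = b without ever forming inv(Bs)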
I'm sorry, but you should consider reformulating your question on http://scicomp.stackexchange.com, clearly stating which problem you are trying to solve, to give a clue about the matrix's structure and properties.
A sparse array stores only the non-zero entries of your matrix in memory. Now suppose you compute its inverse: almost all entries of the result become non-zero, so the whole point of the format is lost. Sparse matrices are a memory optimisation.
There are some operations which you can apply to sparse matrices without losing the sparsity property: addition of two sparse matrices, for example, yields another sparse matrix, as the sketch below illustrates.
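A quick illustration with toy matrices (note that scipy deliberately refuses to add a nonzero scalar to a sparse matrix, precisely because the result would be dense):

import scipy.sparse as sp

A = sp.random(1000, 1000, density=0.001, format='csr', random_state=0)
B = sp.random(1000, 1000, density=0.001, format='csr', random_state=1)

C = A + B               # still sparse: at most A.nnz + B.nnz entries
print(type(C), C.nnz)

D = A.toarray() + 1.0   # adding a constant to every entry: dense result
print(type(D))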
I'm trying to decompose signals into components (matrix factorization) of a large sparse matrix in Python using the sklearn library.
I made use of scipy's scipy.sparse.csc_matrix to construct my matrix of data. However, I'm unable to perform any analysis such as factor analysis or independent component analysis. The only thing I'm able to do is use sklearn's TruncatedSVD, or scipy's scipy.sparse.linalg.svds, to perform PCA.
Does anyone know any workarounds for doing ICA or FA on a sparse matrix in Python? Any help would be much appreciated! Thanks.
Given:
M = UΣV^t
The drawback with SVD is that the factors U and V^t are dense. It doesn't really matter that the input matrix is sparse; U and V^t will be dense anyway. Also, the computational complexity of SVD is O(n^2*m) or O(m^2*n), where n is the number of rows and m the number of columns of the input matrix M, depending on which one is bigger.
It is worth mentioning that SVD gives you the optimal low-rank solution, and if you can live with a somewhat larger loss, measured in the Frobenius norm, you might want to consider the CUR decomposition. It scales to larger datasets, with O(n*m) complexity:
M ≈ CUR
where C and R are now sparse matrices.
If you want to look at a Python implementation, take a look at pymf. But be a bit careful with that exact implementation, since at the time of writing there seems to be an open issue with it.
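If you just want to see the shape of the idea, here is a deliberately naive CUR sketch (uniform sampling for brevity; real implementations sample columns and rows by importance scores, and the function name here is made up):

import numpy as np

def naive_cur(M, k, seed=0):
    # Deliberately naive CUR: uniform column/row sampling from a
    # scipy sparse matrix M. C and R stay sparse; only the small
    # k x k core U is dense.
    rng = np.random.default_rng(seed)
    cols = rng.choice(M.shape[1], size=k, replace=False)
    rows = rng.choice(M.shape[0], size=k, replace=False)
    C = M[:, cols]                   # sparse column sample
    R = M[rows, :]                   # sparse row sample
    W = M[rows][:, cols].toarray()   # tiny k x k intersection block
    U = np.linalg.pinv(W)            # pseudo-inverse of the core
    return C, U, R                   # M is approximated by C @ U @ R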
Even if the input matrix is sparse, the output will not be a sparse matrix. If your system cannot hold a dense matrix of that size, it will not be able to hold the result either.
It is usually best practice to build the matrix in COO format (coo_matrix) and then convert it with .tocsc() for computation, as in the sketch below.
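A minimal sketch of that construction pattern, with made-up triplet data:

import numpy as np
import scipy.sparse as sp

# Collect (row, col, value) triplets, which COO handles naturally...
rows = np.array([0, 3, 1, 0])
cols = np.array([0, 3, 1, 2])
vals = np.array([4.0, 5.0, 7.0, 9.0])
A = sp.coo_matrix((vals, (rows, cols)), shape=(4, 4))

# ...then convert to CSC for efficient arithmetic, slicing and solves.
A_csc = A.tocsc()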
I have a large (500k by 500k) sparse matrix. I would like to get its principal components (in fact, even computing just the largest PC would be fine). Randomized PCA works great, except that it essentially finds the eigenvectors of the covariance matrix instead of the correlation matrix. Any ideas for a package that will do PCA using the correlation matrix of a large, sparse matrix? Preferably in Python, though MATLAB and R would work too.
(For reference, a similar question was asked here but the methods refer to covariance matrix).
Are they not the same thing? As far as I understand it, the correlation matrix is just the covariance matrix normalised by the product of each pair of variables' standard deviations. And if I recall correctly, isn't there a scaling ambiguity in PCA anyway?
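Not quite: standardising also subtracts each column's mean, and centering a sparse matrix makes it dense. The scaling part alone does preserve sparsity, so one common compromise is to divide each column by its standard deviation and skip the centering (reasonable when the column means are near zero). A sketch, with a made-up stand-in matrix:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# X: large sparse data matrix (samples x features); made-up stand-in.
X = sp.random(5000, 2000, density=0.001, format='csc', random_state=0)

# Column standard deviations via E[x^2] - E[x]^2, without densifying.
mean = np.asarray(X.mean(axis=0)).ravel()
mean_sq = np.asarray(X.multiply(X).mean(axis=0)).ravel()
std = np.sqrt(np.maximum(mean_sq - mean**2, 1e-12))

# Scale columns by 1/std; this keeps X sparse (centering would not).
Xs = X @ sp.diags(1.0 / std)

# Top principal direction of the scaled (approximately correlation) PCA.
u, s, vt = svds(Xs, k=1)
top_pc = vt[0]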
Have you tried the irlba package in R? "The IRLBA package is the R language implementation of the method. With it, you can compute partial SVDs and principal component analyses of very large scale data. The package works well with sparse matrices and with other matrix classes like those provided by the Bigmemory package." You can check here for details.
I was wondering if there is a Python package, numpy or otherwise, that has a function that computes the first eigenvalue and eigenvector of a small matrix, say 2x2. I could use the linalg package in numpy as follows.
import numpy as np

def whatever():
    A = np.asmatrix(np.random.rand(2, 2))
    evals, evecs = np.linalg.eig(A)
    # Assume that the eigenvalues are ordered from large to small and that the
    # eigenvectors are ordered accordingly.
    return evals[0], evecs[:, 0]
But this takes a really long time. I suspect that it's because numpy computes eigenvectors through some sort of iterative process. So I was wondering if there were a much faster algorithm that only returns the first (largest) eigenvalue and eigenvector, since I only need the first.
For 2x2 matrices I could of course write the function myself and compute the eigenvalue and eigenvector analytically, but then there are problems with floating-point computations; for example, when I divide a very big number by a very small number, I get infinity or NaN. Does anyone know anything about this? Please help! Thank you in advance!
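For completeness, here is one way to write the closed-form 2x2 computation while dodging the worst of the floating-point pitfalls mentioned above. This is just a sketch for real 2x2 input (the function name is made up): the eigenvalues come from the trace/determinant formula, and the eigenvector is built from whichever off-diagonal entry is larger in magnitude, to avoid dividing small numbers by small numbers:

import numpy as np

def top_eigpair_2x2(A):
    # Largest-|lambda| eigenpair of a real 2x2 matrix in closed form.
    (a, b), (c, d) = A
    half_tr = 0.5 * (a + d)
    # np.emath.sqrt returns a complex root when the discriminant is negative.
    disc = np.emath.sqrt(half_tr ** 2 - (a * d - b * c))
    lam1, lam2 = half_tr + disc, half_tr - disc
    lam = lam1 if abs(lam1) >= abs(lam2) else lam2

    # Eigenvector from the better-conditioned of the two standard choices.
    if abs(b) >= abs(c) and b != 0:
        v = np.array([b, lam - a])
    elif c != 0:
        v = np.array([lam - d, c])
    else:  # b == c == 0: diagonal matrix, eigenvectors are the axes
        v = np.array([1.0, 0.0]) if abs(lam - a) <= abs(lam - d) else np.array([0.0, 1.0])
    return lam, v / np.linalg.norm(v)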
Use this: http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html
http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.eigs.html#scipy.sparse.linalg.eigs
Find k eigenvalues and eigenvectors of the square matrix A.
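For example, a sketch on a made-up large sparse matrix (note that ARPACK requires k < n - 1, so eigs is aimed at large problems and cannot be applied to a 2x2 matrix):

import scipy.sparse as sp
from scipy.sparse.linalg import eigs

# Made-up example: eigs targets large sparse matrices.
A = sp.random(1000, 1000, density=0.01, format='csr', random_state=0)

# Largest-magnitude eigenvalue and its eigenvector.
vals, vecs = eigs(A, k=1, which='LM')
print(vals[0], vecs[:, 0])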
According to the docs:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.eig.html
and also to my own experience, numpy.linalg.eig(A) does NOT return the eigenvalues (and hence the eigenvectors) in any particular order, which is what the OP and subsequent answers seem to be assuming. I suggest something like:
rearrangedEvalsVecs = sorted(zip(evals, evecs.T),
                             key=lambda x: x[0].real, reverse=True)
There doesn't appear to be a numpy equivalent of Matlab's eigs(A,B,k) for finding the k largest eigenvectors.
If you're interested, Enthought has compiled a table showing the differences between Matlab and numpy. That should be helpful for answering questions like this one: Link
One other thought: for 2x2 matrices, I don't think eigs(A,B,1) would help anyway. The work needed to compute the first eigenpair already leaves the matrix transformed so that the second eigenpair emerges directly; there is only a benefit for 3x3 and larger.