How to compute scipy sparse matrix determinant without turning it to dense? - python

I am trying to figure out the fastest method to find the determinant of sparse, symmetric, real matrices in Python. I am using the scipy sparse module, but I am really surprised that there is no determinant function. I am aware I could use LU factorization to compute the determinant, but I don't see an easy way to do it: the return value of scipy.sparse.linalg.splu is an object, and instantiating dense L and U matrices is not worth it. I may as well do sp.linalg.det(A.todense()), where A is my scipy sparse matrix.
I am also a bit surprised that others have not faced the problem of efficient determinant computation within scipy. How would one use splu to compute the determinant?
I looked into pySparse and scikits.sparse.cholmod. The latter is not practical for me right now: it needs package installations, and I am not sure how fast the code is before I go through all the trouble.
Any solutions? Thanks in advance.

Here are some references I provided as part of an answer here.
I think they address the actual problem you are trying to solve:
notes for an implementation in the Shogun library
Erlend Aune, Daniel P. Simpson: Parameter estimation in high dimensional Gaussian distributions, particularly section 2.1 (arxiv:1105.5256)
Ilse C.F. Ipsen, Dean J. Lee: Determinant Approximations (arxiv:1105.0437)
Arnold Reusken: Approximation of the determinant of large sparse symmetric positive definite matrices (arxiv:hep-lat/0008007)
Quoting from the Shogun notes:
The usual technique for computing the log-determinant term in the likelihood expression relies on Cholesky factorization of the matrix, i.e. $\Sigma = LL^T$ ($L$ is the lower triangular Cholesky factor), and then using the diagonal entries of the factor to compute $\log(\det(\Sigma)) = 2\sum_{i=1}^{n} \log(L_{ii})$. However, for sparse matrices, as covariance matrices usually are, the Cholesky factors often suffer from fill-in phenomena - they turn out to be not so sparse themselves. Therefore, for large dimensions this technique becomes infeasible because of a massive memory requirement for storing all these irrelevant non-diagonal co-efficients of the factor. While ordering techniques have been developed to permute the rows and columns beforehand in order to reduce fill-in, e.g. approximate minimum degree (AMD) reordering, these techniques depend largely on the sparsity pattern and therefore not guaranteed to give better result.
Recent research shows that using a number of techniques from complex analysis, numerical linear algebra and greedy graph coloring, we can, however, approximate the log-determinant up to an arbitrary precision [Aune et. al., 2012]. The main trick lies within the observation that we can write log(det(Σ)) as trace(log(Σ)), where log(Σ) is the matrix-logarithm.
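Both identities in the quote can be checked numerically on a small dense matrix. The sketch below is only an illustration under stated assumptions: it builds a made-up random symmetric positive definite matrix, uses numpy.linalg.cholesky for the Cholesky route and scipy.linalg.logm for the matrix logarithm, and compares both against numpy.linalg.slogdet. It is not the sparse, approximate method from the cited papers.

```python
import numpy as np
from scipy.linalg import logm

# made-up SPD test matrix (assumption: any SPD matrix works here)
rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)

# Cholesky route: Sigma = L L^T, so log det(Sigma) = 2 * sum(log(L_ii))
L = np.linalg.cholesky(Sigma)
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))

# trace route: log det(Sigma) = trace(log(Sigma)), with log the matrix logarithm
logdet_trace = np.trace(logm(Sigma)).real

# dense reference value
sign, logdet_ref = np.linalg.slogdet(Sigma)
print(logdet_chol, logdet_trace, logdet_ref)
```

For large sparse matrices, forming logm densely is exactly what the papers avoid; the point here is only the identity.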

The "standard" way to solve this problem is with a Cholesky decomposition, but if you're not up to using any new compiled code, then you're out of luck. The best sparse Cholesky implementation is Tim Davis's CHOLMOD, which is licensed under the LGPL and thus not available in scipy proper (scipy is BSD).

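You can use scipy.sparse.linalg.splu to obtain sparse matrices for the lower (L) and upper (U) triangular factors of the decomposition Pr*M*Pc = L*U, where Pr and Pc are the row and column permutation matrices chosen by SuperLU:
import numpy as np
from scipy.sparse.linalg import splu
lu = splu(M.tocsc())
Because permutation matrices have determinant ±1, the determinant det(M) can then be represented as:
det(M) = det(Pr)det(Pc)det(L)det(U)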
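The determinant of a triangular matrix is just the product of its diagonal terms (SuperLU's L has a unit diagonal), so the following gives det(M) up to the sign contributed by the permutations: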
diagL = lu.L.diagonal()
diagU = lu.U.diagonal()
d = diagL.prod()*diagU.prod()
However, for large matrices underflow or overflow commonly occurs, which can be avoided by working with the logarithms.
diagL = diagL.astype(np.complex128)
diagU = diagU.astype(np.complex128)
logdet = np.log(diagL).sum() + np.log(diagU).sum()
Note that I invoke complex arithmetic to account for negative numbers that might appear in the diagonals. Now, from logdet you can recover the determinant:
det = np.exp(logdet) # usually underflows/overflows for large matrices
whereas the sign of the determinant can be calculated directly from diagL and diagU (important for example when implementing Crisfield's arc-length method):
sign = swap_sign*np.sign(diagL).prod()*np.sign(diagU).prod()
where swap_sign accounts for the parity of both the row and the column permutations applied by the LU decomposition. Thanks to @Luiz Felippe Rodrigues, it can be calculated as:
def minimumSwaps(arr):
    """
    Minimum number of swaps needed to order a
    permutation array
    """
    # from https://www.thepoorcoder.com/hackerrank-minimum-swaps-2-solution/
    a = dict(enumerate(arr))
    b = {v: k for k, v in a.items()}
    count = 0
    for i in a:
        x = a[i]
        if x != i:
            y = b[i]
            a[y] = x
            b[x] = y
            count += 1
    return count

# each permutation contributes a factor of +1 or -1 to the determinant
swap_sign = (-1)**(minimumSwaps(lu.perm_r) + minimumSwaps(lu.perm_c))
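Putting the pieces together, here is a minimal sketch of a sparse slogdet built on splu. It assumes the SuperLU convention Pr @ M @ Pc = L @ U with a unit-diagonal L, so only the diagonal of U and the two permutation parities are needed. The helper names permutation_parity and sparse_slogdet are hypothetical (not part of scipy), and the parity is computed by cycle counting instead of the swap-counting function above; both give the same parity.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def permutation_parity(perm):
    """Return 0 for an even permutation and 1 for an odd one (hypothetical helper)."""
    perm = np.asarray(perm)
    seen = np.zeros(len(perm), dtype=bool)
    swaps = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        # walk one cycle; a cycle of length k needs k - 1 transpositions
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        swaps += length - 1
    return swaps % 2


def sparse_slogdet(M):
    """Return (sign, log|det|) of a square sparse matrix via SuperLU (hypothetical helper).

    splu factorizes Pr @ M @ Pc = L @ U with a unit-diagonal L, hence
    det(M) = det(Pr) * det(Pc) * prod(diag(U)).
    """
    lu = splu(sp.csc_matrix(M))
    diagU = lu.U.diagonal()
    perm_sign = (-1) ** (permutation_parity(lu.perm_r) + permutation_parity(lu.perm_c))
    sign = perm_sign * np.prod(np.sign(diagU))
    # summing logs of |U_ii| avoids the overflow/underflow of the raw product
    logabsdet = np.sum(np.log(np.abs(diagU)))
    return sign, logabsdet


# made-up sparse test matrix (random pattern plus identity, nonsingular in practice)
rng = np.random.default_rng(1)
n = 40
dense = rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.2) + np.eye(n)
A = sp.csc_matrix(dense)
print(sparse_slogdet(A), np.linalg.slogdet(dense))
```

Working with real logs of |U_ii| plus a separate sign avoids the complex arithmetic used above; both approaches carry the same information.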

Things start to go wrong with the determinant of sparse tridiagonal (-1 2 -1) around N=1e6 using both SuperLU and CHOLMOD...
The determinant should be N+1.
It's probably propagation of error when calculating the product of the U diagonal:
from scipy.sparse import diags
from scipy.sparse.linalg import splu
from sksparse.cholmod import cholesky
from math import exp
n=int(5e6)
K = diags([-1.],-1,shape=(n,n)) + diags([2.],shape=(n,n)) + diags([-1.],1,shape=(n,n))
lu = splu(K.tocsc())
diagL = lu.L.diagonal()
diagU = lu.U.diagonal()
det=diagL.prod()*diagU.prod()
print(det)
factor = cholesky(K.tocsc())
ld = factor.logdet()
print(exp(ld))
Output:
4999993.625461911
4999993.625461119
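Even if the diagonal of U is accurate to about 13 digits, this might be expected: a relative error of 2.5e-13 in each of the n = 5e6 diagonal entries compounds to a relative error of about n*2.5e-13 = 1.25e-6 in the product: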
n=int(5e6)
print(n*diags([1-0.00000000000025],0,shape=(n,n)).diagonal().prod())
4999993.749444371
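The claim that the determinant is N+1 follows from cofactor expansion along the first row, which gives the recurrence d_k = 2*d_{k-1} - d_{k-2} with d_0 = 1 and d_1 = 2, hence d_k = k+1. The sketch below (the helper tridiag_det_recurrence is hypothetical, and n = 1000 is chosen only to keep it fast) checks the recurrence and compares it with the splu diagonal, summed in log space.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def tridiag_det_recurrence(n):
    """det of the n x n (-1, 2, -1) matrix via d_k = 2*d_{k-1} - d_{k-2}."""
    d_prev, d = 1, 2          # d_0 = 1 (empty matrix), d_1 = 2
    for _ in range(n - 1):
        d_prev, d = d, 2 * d - d_prev
    return d


n = 1000
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
lu = splu(K)
# K is symmetric positive definite, so det(K) > 0 and |prod(diag(U))| = det(K)
logdet = np.sum(np.log(np.abs(lu.U.diagonal())))
print(tridiag_det_recurrence(n), np.exp(logdet))
```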

Related

not able to resolve LinAlgError: Last 2 dimensions of the array must be square [duplicate]

I need to solve a set of simultaneous equations of the form Ax = B for x. I've used the numpy.linalg.solve function, inputting A and B, but I get the error 'LinAlgError: Last 2 dimensions of the array must be square'. How do I fix this?
Here's my code:
A = matrix([[v1x, v2x], [v1y, v2y], [v1z, v2z]])
print(A)
B = [(p2x-p1x-nmag[0]), (p2y-p1y-nmag[1]), (p2z-p1z-nmag[2])]
print(B)
x = numpy.linalg.solve(A, B)
The values of the matrix/vector are calculated earlier in the code and this works fine, but the values are:
A =
(-0.56666301, -0.52472909)
(0.44034147, 0.46768087)
(0.69641397, 0.71129036)
B =
(-0.38038602567630364, -24.092279373295057, 0.0)
x should have the form (x1,x2,0)
In case you still haven't found an answer, or in case someone in the future has this question.
To solve Ax=b:
numpy.linalg.solve uses LAPACK gesv. As mentioned in the documentation of LAPACK, gesv requires A to be square:
LA_GESV computes the solution to a real or complex linear system of equations AX = B, where A is a square matrix and X and B are rectangular matrices or vectors. Gaussian elimination with row interchanges is used to factor A as A = P*L*U, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular. The factored form of A is then used to solve the above system.
If the matrix A is not square, you have either more unknowns than equations or the other way around. In these situations you can end up with no solution or with infinitely many solutions. What determines the solution space is the rank of the matrix compared to the number of columns, so you first have to check the rank of the matrix.
That being said, you can use another method to solve your system of linear equations. I suggest having a look at factorization methods like LU, QR or even SVD. In LAPACK you can use the least-squares driver gels (getrs reuses an LU factorization from getrf and therefore also requires a square matrix); in Python you can do different things:
first do the factorization like QR and then feed the resulting matrices to a method like scipy.linalg.solve_triangular
solve the least-squares using numpy.linalg.lstsq
Also have a look here where a simple example is formulated and solved.
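Both options from the list can be sketched with the values printed in the question. Since A is 3 x 2, the result is the least-squares solution of the overdetermined system, not an exact solution of all three equations. The QR route uses numpy.linalg.qr in its default reduced mode followed by scipy.linalg.solve_triangular; this is one reasonable way to combine them, not the only one.

```python
import numpy as np
from scipy.linalg import solve_triangular

# A (3 x 2) and B (length 3) as printed in the question
A = np.array([[-0.56666301, -0.52472909],
              [0.44034147, 0.46768087],
              [0.69641397, 0.71129036]])
B = np.array([-0.38038602567630364, -24.092279373295057, 0.0])

# option 1: least squares directly
x_lstsq, residuals, rank, sv = np.linalg.lstsq(A, B, rcond=None)

# option 2: reduced QR (Q is 3 x 2, R is 2 x 2 upper triangular),
# then back-substitution on R x = Q^T B
Q, R = np.linalg.qr(A)
x_qr = solve_triangular(R, Q.T @ B)

print(rank, x_lstsq, x_qr)
```

Checking that rank equals the number of columns (2 here) is the rank test mentioned above; if it were smaller, the least-squares solution would not be unique.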
A square matrix is a matrix with the same number of rows and columns. The matrix you are using is 3 by 2. Add a column of zeroes to fix this problem.

Eigenanalysis of complex hermitian matrix: different phase angles for EIG and EIGH

I understand that eigenvectors are only defined up to a multiplicative constant. As far as I can see, all numpy algorithms (e.g. linalg.eig, linalg.eigh, linalg.svd) yield identical eigenvectors for real matrices, so apparently they use the same normalization. In the case of a complex matrix, however, the algorithms yield different results.
That is, the eigenvectors are the same up to a (complex) constant z. After some experimenting with eig and eigh I realised that eigh always sets the phase angle (defined as arctan(imaginary part/real part)) to 0 for the first component of each eigenvector, whereas eig seems to start with some (arbitrary?) non-zero phase angle.
Q: Is there a way to normalize the eigenvectors from eigh in the way eig is doing it (that is not to force phase angle = 0)?
Example
I have a complex hermitian matrix G for which I want to calculate the eigenvectors using the two following algorithms:
numpy.linalg.eig for a real/complex square matrix
numpy.linalg.eigh for a real symmetric/complex hermitian matrix (a special case of numpy.linalg.eig)
Check that G is hermitian
import numpy as np

# check if a matrix is hermitian
def isHermitian(a, rtol=1e-05, atol=1e-08):
    return np.allclose(a, a.conjugate().T, rtol=rtol, atol=atol)
print('G is hermitian:', isHermitian(G))
Out:
G is hermitian: True
Perform eigenanalysis
# eigenvectors from EIG()
l1,u1 = np.linalg.eig(G)
idx = np.argsort(l1)[::-1]
l1,u1 = l1[idx].real,u1[:,idx]
# eigenvectors from EIGH()
l2,u2 = np.linalg.eigh(G)
idx = np.argsort(l2)[::-1]
l2,u2 = l2[idx],u2[:,idx]
Check eigenvalues
print('Eigenvalues')
print('eig\t:',l1[:3])
print('eigh\t:',l2[:3])
Out:
Eigenvalues
eig : [2.55621629e+03 3.48520440e+00 3.16452447e-02]
eigh : [2.55621629e+03 3.48520440e+00 3.16452447e-02]
Both methods yield the same eigenvalues.
Check eigenvectors
Now look at the eigenvectors (e.g. the 3rd eigenvector), which differ by a constant factor z.
multFactors = u1[:,2]/u2[:,2]
# compare every ratio with the first one, up to floating-point tolerance
if np.allclose(multFactors, multFactors[0]):
    print("All multiplication factors are same:", multFactors[0])
else:
    print("Multiplication factors are different.")
Out:
All multiplication factors are same: (-0.8916113627685007+0.45280147727156245j)
Check phase angle
Now check the phase angle for the first component of the 3rd eigenvector:
print('Phase angle (in units of pi) for first point:')
print('Eig\t:',np.arctan2(u1[0,2].imag,u1[0,2].real)/np.pi)
print('Eigh\t:',np.arctan2(u2[0,2].imag,u2[0,2].real)/np.pi)
Out:
Phase angle (in units of pi) for first point:
Eig : 0.8504246311627189
Eigh : 0.0
Code to reproduce figure
import matplotlib.pyplot as plt
from matplotlib import gridspec

num = 2
fig = plt.figure()
gs = gridspec.GridSpec(2, 3)
ax0 = plt.subplot(gs[0,0])
ax1 = plt.subplot(gs[1,0])
ax2 = plt.subplot(gs[0,1:])
ax3 = plt.subplot(gs[1,1:])
ax2r= ax2.twinx()
ax3r= ax3.twinx()
ax0.imshow(G.real,vmin=-30,vmax=30,cmap='RdGy')
ax1.imshow(G.imag,vmin=-30,vmax=30,cmap='RdGy')
ax2.plot(u1[:,num].real,label='eig')
ax2.plot((u2[:,num]).real,label='eigh')
ax3.plot(u1[:,num].imag,label='eig')
ax3.plot((u2[:,num]).imag,label='eigh')
for a in [ax0,ax1,ax2,ax3]:
    a.set_xticks([])
    a.set_yticks([])
ax0.set_title('Re(G)')
ax1.set_title('Im(G)')
ax2.set_title('Re('+str(num+1)+'. Eigenvector)')
ax3.set_title('Im('+str(num+1)+'. Eigenvector)')
ax2.legend(loc=0)
ax3.legend(loc=0)
fig.subplots_adjust(wspace=0, hspace=.2,top=.9)
fig.suptitle('Eigenanalysis of Hermitian Matrix G',size=16)
plt.show()
As you say, the eigenvalue problem only fixes the eigenvectors up to a scalar x. Transforming an eigenvector v as v = v*x does not change its status as an eigenvector.
There is an "obvious" way to normalize the vectors (according to the euclidean inner product np.vdot(v1, v1)), but this only fixes the amplitude of the scalar, which can be complex.
Fixing the angle or "phase" is kind of arbitrary without further context. I tried out eigh() and indeed it just makes the first entry of the vector real (with an apparently random sign!?).
eig() instead chooses to make real the vector entry with the largest real part. For example, here is what I get for a random Hermitian matrix:
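import numpy as np
import numpy.linalg as la  # assumed; the original did not show which linalg module
n = 10
X = np.random.randn(n, n) + 1j*np.random.randn(n, n)  # assumed random complex matrix (not shown originally)
H = 0.5*(X + X.conj().T)
np.max(la.eig(H)[1], axis=0)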
# returns
array([0.57590624+0.j, 0.42672485+0.j, 0.51974879+0.j, 0.54500475+0.j,
0.4644593 +0.j, 0.53492448+0.j, 0.44080532+0.j, 0.50544424+0.j,
0.48589402+0.j, 0.43431733+0.j])
This is arguably more sensible, as just picking the first entry, like eigh() does, is not very robust if the first entry happens to be very small. Picking the max value avoids this. I am not sure if eig() also fixes the sign (a random matrix is not a very good test case for this as it would be very unusual for all entries in an eigenvector to have negative real parts, which is the only case in which an unfixed sign would show up).
In any case, I would not rely on the eigensolver using any particular way of fixing phases. It's not documented and so could, in principle, change in the future. Instead, fix the phases yourself, perhaps the same way eig() does it now.
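A minimal sketch of fixing the phases yourself, assuming one particular convention: rotate each eigenvector by a unit complex scalar so that its largest-magnitude entry becomes real and positive. The helper fix_phases is hypothetical, and this convention is an assumption; it is not claimed to be what eig() or eigh() do internally.

```python
import numpy as np


def fix_phases(V):
    """Rotate each column of V so its largest-|entry| is real and positive.

    Hypothetical helper; the convention is an assumption, not numpy's.
    """
    V = np.array(V, dtype=complex)
    idx = np.argmax(np.abs(V), axis=0)          # row of the largest |entry| per column
    pivots = V[idx, np.arange(V.shape[1])]
    # |p| / p = exp(-i * phase(p)), so each pivot becomes |p|
    return V * (np.abs(pivots) / pivots)


rng = np.random.default_rng(0)
n = 6
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = 0.5 * (X + X.conj().T)                      # random Hermitian matrix
w, V = np.linalg.eigh(H)
Vf = fix_phases(V)
```

Because the result does not depend on the phase the solver happened to pick, applying it to sorted eig() and eigh() output should make the eigenvectors agree for distinct eigenvalues, barring near-ties in the largest magnitude.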
In my experience (and there are many questions here to back this up), you NEVER want to use eig when eigh is an option - eig is very slow and very unstable. The relevance of this is that I believe your question is backward - you want to normalize the eigenvectors of eig to be like those of eigh, and this you know how to do.

How does numpy.linalg.eigh compare with numpy.linalg.svd?

problem description
For a square matrix X, one can obtain the SVD decomposition
X = USV'
by simply using the numpy.linalg.svd
u,s,vh = numpy.linalg.svd(X)
routine, or by using numpy.linalg.eigh to compute the eigendecomposition of the Hermitian matrices X'X and XX'.
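The relationship the question relies on can be checked directly: the singular values of X are the square roots of the eigenvalues of X'X, and the right singular vectors are its eigenvectors (up to sign). This sketch uses a made-up random 5 x 5 matrix. Note that forming X'X squares the condition number, so the eigh route can lose accuracy for the small singular values of ill-conditioned matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))               # made-up test matrix

u, s, vh = np.linalg.svd(X)                   # s is sorted in descending order
evals, evecs = np.linalg.eigh(X.T @ X)        # eigenvalues in ascending order

# singular values of X = square roots of the eigenvalues of X'X (descending)
s_from_eigh = np.sqrt(np.clip(evals[::-1], 0, None))
print(s, s_from_eigh)
```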
Are they using the same algorithm? Calling the same Lapack routine?
Is there any difference in terms of speed? and stability?
Indeed, numpy.linalg.svd and numpy.linalg.eigh do not call the same LAPACK routine. On the one hand, numpy.linalg.eigh refers to LAPACK's dsyevd(), while numpy.linalg.svd makes use of LAPACK's dgesdd().
The common point between these routines is the use of Cuppen's divide and conquer algorithm, first designed to solve tridiagonal eigenvalue problems. For instance, dsyevd() only handles symmetric matrices (zheevd() is its Hermitian counterpart) and performs the following steps, only if eigenvectors are required:
Reduce matrix to tridiagonal form using DSYTRD()
Compute the eigenvectors of the tridiagonal matrix using the divide and conquer algorithm, through DSTEDC()
Apply the Householder reflection reported by DSYTRD() using DORMTR().
On the contrary, to compute the SVD, dgesdd() performs the following steps in the case JOBZ='A' (U and VT required):
Bidiagonalize A using dgebrd()
Compute the SVD of the bidiagonal matrix using divide and conquer algorithm using DBDSDC()
Revert the bidiagonalization using the matrices P and Q returned by dgebrd(), applying dormbr() twice, once for U and once for V
While the actual operations performed by LAPACK are very different, the strategies are globally similar. It may stem from the fact that computing the SVD of a general matrix A is similar to performing the eigendecomposition of the symmetric matrix A^T.A.
Regarding the accuracy and performance of LAPACK's divide and conquer SVD, see this survey of SVD methods:
They often achieve the accuracy of QR-based SVD, though it is not proven
The worst case is O(n^3) if no deflation occurs, but often proves better than that
The memory requirement is 8 times the size of the matrix, which can become prohibitive
Regarding the symmetric eigenvalue problem, the complexity is (4/3)n^3 (but often proves better than that) and the memory footprint is about 2n^2 plus the size of the matrix. Hence, the best choice is likely numpy.linalg.eigh if your matrix is symmetric.
The actual complexity can be estimated empirically for your particular matrices using the following code:
import time

import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit

# see https://stackoverflow.com/questions/41109122/fitting-a-curve-to-a-power-law-distribution-with-curve-fit-does-not-work
def func_powerlaw(x, m, c):
    # log of the power law c*x**m; the fit is done on log(time)
    return np.log(np.abs(x**m * c))

timeev = []
timesvd = []
size = []
for n in range(10, 600):
    print(n)
    size.append(n)
    A = np.zeros((n, n))
    # populate A, 1D diffusion.
    for j in range(n):
        A[j, j] = 2.
        if j > 0:
            A[j-1, j] = -1.
        if j < n-1:
            A[j+1, j] = -1.
    # EIG
    Aev = A.copy()
    start = time.time()
    w, v = np.linalg.eigh(Aev, 'L')
    end = time.time()
    timeev.append(end - start)
    # SVD
    Asvd = A.copy()
    start = time.time()
    u, s, vh = np.linalg.svd(Asvd)
    end = time.time()
    timesvd.append(end - start)

# fit only the larger half of the sizes (integer division for Python 3 slicing)
half = len(size) // 2
poptev, pcov = curve_fit(func_powerlaw, size[half:], np.log(timeev[half:]), p0=[2.1, 1e-7], maxfev=8000)
print(poptev)
poptsvd, pcov = curve_fit(func_powerlaw, size[half:], np.log(timesvd[half:]), p0=[2.1, 1e-7], maxfev=8000)
print(poptsvd)

fig, ax = plt.subplots()
ax.plot(size, timeev, label="eigh")
ax.plot(size, [np.exp(func_powerlaw(x, poptev[0], poptev[1])) for x in size], label="eigh-adjusted complexity: " + str(poptev[0]))
ax.plot(size, timesvd, label="svd")
ax.plot(size, [np.exp(func_powerlaw(x, poptsvd[0], poptsvd[1])) for x in size], label="svd-adjusted complexity: " + str(poptsvd[0]))
ax.set_xlabel('n')
ax.set_ylabel('time, s')
ax.legend(loc="lower right")
# 'nonposy' was renamed 'nonpositive' in Matplotlib 3.3
ax.set_yscale("log", nonpositive='clip')
fig.tight_layout()
plt.savefig('eigh.jpg')
plt.show()
For such 1D diffusion matrices, eigh outperforms svd, but the fitted complexities are similar: slightly lower than n^3, something like n^2.5.
Checking of the accuracy could be performed as well.
No, they do not use the same algorithm, as they do different things. They are somewhat related but also very different. Let's start with the fact that you can do SVD on m x n matrices, where m and n don't need to be the same.
Which routines are called depends on the version of numpy you are using. Here are the eigenvalue routines in LAPACK for double precision:
http://www.netlib.org/lapack/explore-html/d9/d8e/group__double_g_eeigen.html
And the according SVD routines:
http://www.netlib.org/lapack/explore-html/d1/d7e/group__double_g_esing.html
There are differences in routines. Big differences. If you care for the details, they are specified in the fortran headers very well. In many cases it makes sense to find out, what kind of matrix you have in front of you, to make a good choice of routine. Is the matrix symmetric/hermitian? Is it in upper diagonal form? Is it positive semidefinite? ...
There are ginormous differences in runtime. But as a rule of thumb, EIGs are cheaper than SVDs. That also depends on convergence speed, which in turn depends a lot on the condition number of the matrix, in other words, how ill-posed the matrix is, ...
SVDs are usually very robust but slow algorithms, often used for inversion, speed optimisation through truncation, and principal component analysis, really with the expectation that the matrix you are dealing with is just a pile of shitty rows ;)

fit through origin via matrix algebra

Usually I use the following code to carry out a linear fit or a quadratic fit. Sometimes it is necessary to weight the model by 1/x^2 using weight=2. I would like to know if I can force the model through the origin by adding some matrix algebra (obviously if weight=0). Thanks.
import numpy
from numpy import loadtxt, zeros, diag, dot, transpose
from numpy.linalg import inv
data=loadtxt('...')
degree=1
weight=0
x,y,w=data[:,0],data[:,1],1/data[:,0]**weight
n=len(data)
d=degree+1
f=zeros(n*d).reshape((n,d))
for i in range(0,n):
    for j in range(0,d):
        f[i,j]=x[i]**j
q=diag(w)
fT=dot(transpose(f),q)
fTx=dot(fT,f)
fTy=dot(fT,y)
coeffs=dot(inv(fTx),fTy)
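For the weight=0 case, get rid of the constant term in your feature vector by changing
for j in range(0,d) to for j in range(1,d), and drop the resulting all-zero first column (f=f[:,1:]) before forming fTx; otherwise fTx is singular and inv fails.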
For larger values of your weight term, the weights associated with 1/x^p terms would have to be zero, which probably won't happen in the ordinary least squares solution.
For best numpy practices, I would suggest that you replace zeros(n*d).reshape((n,d)) with zeros( (n,d) ) and dot(inv(fTx),fTy) with linalg.solve(fTx,fTy).
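Combining both suggestions, a sketch of a weighted polynomial fit forced through the origin could look like this. The function name weighted_polyfit_through_origin is hypothetical; it keeps the question's weights w = 1/x**weight, builds only the columns x**1 ... x**degree, and uses numpy.linalg.solve instead of inv. The test data are made up and exact, so the fit should recover the generating coefficients.

```python
import numpy as np


def weighted_polyfit_through_origin(x, y, degree=1, weight=0):
    """Weighted least squares for y = c_1*x + ... + c_degree*x**degree.

    Hypothetical helper: no constant term, weights w = 1/x**weight as in the question.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / x**weight
    # design matrix with columns x**1 ... x**degree (the x**0 column is dropped)
    f = np.column_stack([x**j for j in range(1, degree + 1)])
    fT = f.T * w        # same as dot(transpose(f), diag(w)) without forming diag(w)
    return np.linalg.solve(fT @ f, fT @ y)


x = np.linspace(1.0, 5.0, 20)
y = 2.0 * x + 0.5 * x**2          # made-up data passing through the origin
print(weighted_polyfit_through_origin(x, y, degree=2))
print(weighted_polyfit_through_origin(x, y, degree=2, weight=2))
```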

Scipy's sparse eigsh() for small eigenvalues

I'm trying to write a spectral clustering algorithm using NumPy/SciPy for larger (but still tractable) systems, making use of SciPy's sparse linear algebra library. Unfortunately, I'm running into stability issues with eigsh().
Here's my code:
import numpy as np
import scipy.sparse
import scipy.sparse.linalg as SLA
import sklearn.utils.graph as graph
W = self._sparse_rbf_kernel(self.X_, self.datashape)
D = scipy.sparse.csc_matrix(np.diag(np.array(W.sum(axis = 0))[0]))
L = graph.graph_laplacian(W) # D - W
vals, vects = SLA.eigsh(L, k = self.k, M = D, which = 'SM', sigma = 0, maxiter = 1000)
The sklearn library refers to the scikit-learn package, specifically this method for calculating a graph laplacian from a sparse SciPy matrix.
_sparse_rbf_kernel is a method I wrote to compute pairwise affinities of the data points. It operates by creating a sparse affinity matrix from image data, specifically by only computing pairwise affinities for the 8-neighborhoods around each pixel (instead of pairwise for all pixels with scikit-learn's rbf_kernel method, which for the record doesn't fix this either).
Since the laplacian is unnormalized, I'm looking for the smallest eigenvalues and corresponding eigenvectors of the system. I understand that ARPACK is ill-suited for finding small eigenvalues, but I'm trying to use shift-invert to find these values and am still not having much success.
With the above arguments (specifically, sigma = 0), I get the following error:
RuntimeError: Factor is exactly singular
With sigma = 0.001, I get a different error:
scipy.sparse.linalg.eigen.arpack.arpack.ArpackNoConvergence: ARPACK error -1: No convergence (1001 iterations, 0/5 eigenvectors converged)
I've tried all three different values for mode with the same result. Any suggestions for using the SciPy sparse library for finding small eigenvalues of a large system?
You should use which='LM': in the shift-invert mode, this parameter refers to the transformed eigenvalues. (As explained in the documentation.)
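A sketch of shift-invert with which='LM' on a small made-up problem, the unnormalized Laplacian of a path graph with its degree matrix as M. As an added assumption, it uses a small negative sigma instead of sigma=0: the Laplacian is singular (constant vectors are in its null space), so factorizing L - 0*D fails, whereas L - sigma*D with sigma < 0 is positive definite. With which='LM', eigsh picks the largest values of 1/(lambda - sigma), i.e. the eigenvalues closest to sigma, which here are the smallest ones.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as SLA
from scipy.linalg import eigh

# made-up test problem: path graph with n nodes
n = 50
W = sp.diags([1.0, 1.0], [-1, 1], shape=(n, n), format="csc")   # adjacency
D = sp.diags(np.asarray(W.sum(axis=0)).ravel(), format="csc")   # degree matrix
L = (D - W).tocsc()                                               # singular Laplacian

k = 4
# shift-invert around a small negative sigma; which='LM' refers to 1/(lambda - sigma)
vals, vecs = SLA.eigsh(L, k=k, M=D, sigma=-1e-3, which="LM")

# dense reference for the generalized problem L v = lambda D v
ref = eigh(L.toarray(), D.toarray(), eigvals_only=True)[:k]
print(np.sort(vals), ref)
```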
