How does numpy.linalg.eigh compare with numpy.linalg.svd?

Problem description
For a square matrix X, one can obtain the SVD decomposition
X = U S V'
simply by calling the numpy.linalg.svd routine:
u, s, vh = numpy.linalg.svd(X)
or by using numpy.linalg.eigh to compute the eigendecomposition of the Hermitian matrices X'X and XX'.
Do they use the same algorithm? Do they call the same LAPACK routine?
Is there any difference in terms of speed and stability?

Indeed, numpy.linalg.svd and numpy.linalg.eigh do not call the same LAPACK routine. numpy.linalg.eigh relies on LAPACK's dsyevd(), while numpy.linalg.svd makes use of LAPACK's dgesdd().
The common point between these routines is the use of Cuppen's divide and conquer algorithm, first designed to solve tridiagonal eigenvalue problems. For instance, dsyevd() only handles real symmetric matrices (zheevd() is its complex Hermitian counterpart) and performs the following steps when eigenvectors are required:
Reduce matrix to tridiagonal form using DSYTRD()
Compute the eigenvectors of the tridiagonal matrix using the divide and conquer algorithm, through DSTEDC()
Apply the Householder reflections computed by DSYTRD() using DORMTR()
In contrast, to compute the SVD, dgesdd() performs the following steps in the case JOBZ = 'A' (U and VT required):
Bidiagonalize A using dgebrd()
Compute the SVD of the bidiagonal matrix with the divide and conquer algorithm, through DBDSDC()
Revert the bidiagonalization using the matrices P and Q returned by dgebrd(), applying dormbr() twice, once for U and once for V
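The two drivers named above can also be called directly, which makes it easy to see that they produce the same numbers as the NumPy wrappers. The sketch below assumes that your SciPy build exposes the low-level wrappers scipy.linalg.lapack.dsyevd and scipy.linalg.lapack.dgesdd; the test matrices are made up for illustration.

```python
import numpy as np
from scipy.linalg import lapack

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))   # general square matrix (for the SVD)
A = B + B.T                       # real symmetric matrix (for the eigenproblem)

# Symmetric eigenproblem through the divide and conquer driver dsyevd
w, v, info_ev = lapack.dsyevd(A)

# SVD through the divide and conquer driver dgesdd
u, s, vt, info_svd = lapack.dgesdd(B)

# Same quantities through the high-level NumPy wrappers
w_np = np.linalg.eigh(A)[0]
s_np = np.linalg.svd(B, compute_uv=False)

print("dsyevd info:", info_ev, "max eigenvalue difference:", np.max(np.abs(w - w_np)))
print("dgesdd info:", info_svd, "max singular value difference:", np.max(np.abs(s - s_np)))
```

An info value of 0 means the LAPACK call reported success.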
While the actual operations performed by LAPACK are very different, the strategies are broadly similar. This may stem from the fact that computing the SVD of a general matrix A is similar to performing the eigendecomposition of the symmetric matrix A^T.A.
Regarding the accuracy and performance of LAPACK's divide and conquer SVD, see this survey of SVD methods:
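The link between the two decompositions can be checked numerically. In this small sketch (the random matrix is only an example), the singular values of A are the square roots of the eigenvalues of A^T.A, and the right singular vectors match the eigenvectors up to a sign:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))

# SVD of A: singular values come back in descending order
u, s, vh = np.linalg.svd(A, full_matrices=False)

# Eigendecomposition of the symmetric matrix A^T A: eigenvalues come back ascending
w, v = np.linalg.eigh(A.T @ A)
s_from_eigh = np.sqrt(w[::-1])   # reorder to descending, then take square roots
v_from_eigh = v[:, ::-1]         # reorder the eigenvectors the same way

print(s)
print(s_from_eigh)
# Each vector is only defined up to a sign, so compare absolute values
print(np.allclose(np.abs(vh.T), np.abs(v_from_eigh), atol=1e-8))
```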
They often achieve the accuracy of QR-based SVD, though it is not proven
The worst case is O(n^3) if no deflation occurs, but often proves better than that
The memory requirement is 8 times the size of the matrix, which can become prohibitive
Regarding the symmetric eigenvalue problem, the complexity is (4/3)n^3 (but often proves better than that) and the memory footprint is about 2n^2 plus the size of the matrix. Hence, the best choice is likely numpy.linalg.eigh if your matrix is symmetric.
The actual complexity can be computed for your particular matrices using the following code:
import time
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
# see https://stackoverflow.com/questions/41109122/fitting-a-curve-to-a-power-law-distribution-with-curve-fit-does-not-work
def func_powerlaw(x, m, c):
    # log of the power law c * x**m, fitted against log(time)
    return np.log(np.abs(x**m * c))
# quick check that the timer works
start = time.time()
print("hello")
end = time.time()
print(end - start)
timeev = []
timesvd = []
size = []
for n in range(10, 600):
    print(n)
    size.append(n)
    A = np.zeros((n, n))
    # populate A, 1D diffusion.
    for j in range(n):
        A[j, j] = 2.
        if j > 0:
            A[j-1, j] = -1.
        if j < n-1:
            A[j+1, j] = -1.
    # EIG
    Aev = A.copy()
    start = time.time()
    w, v = np.linalg.eigh(Aev, 'L')
    end = time.time()
    timeev.append(end - start)
    # SVD
    Asvd = A.copy()
    start = time.time()
    u, s, vh = np.linalg.svd(Asvd)
    end = time.time()
    timesvd.append(end - start)
# fit the power law on the upper half of the sizes only (integer division for Python 3)
half = len(size) // 2
poptev, pcov = curve_fit(func_powerlaw, size[half:], np.log(timeev[half:]), p0=[2.1, 1e-7], maxfev=8000)
print(poptev)
poptsvd, pcov = curve_fit(func_powerlaw, size[half:], np.log(timesvd[half:]), p0=[2.1, 1e-7], maxfev=8000)
print(poptsvd)
fig, ax = plt.subplots()
plt.plot(size, timeev, label="eigh")
plt.plot(size, [np.exp(func_powerlaw(x, poptev[0], poptev[1])) for x in size], label="eigh-adjusted complexity: "+str(poptev[0]))
plt.plot(size, timesvd, label="svd")
plt.plot(size, [np.exp(func_powerlaw(x, poptsvd[0], poptsvd[1])) for x in size], label="svd-adjusted complexity: "+str(poptsvd[0]))
ax.set_xlabel('n')
ax.set_ylabel('time, s')
#plt.legend(loc="upper left")
ax.legend(loc="lower right")
# 'nonposy' was renamed to 'nonpositive' in recent matplotlib versions
ax.set_yscale("log", nonpositive='clip')
fig.tight_layout()
plt.savefig('eigh.jpg')
plt.show()
For such 1D diffusion matrices, eigh outperforms svd, but the actual complexities are similar: slightly lower than n^3, something like n^2.5.
The accuracy could be checked as well.
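One way to look at accuracy is sketched below. It builds a matrix with known, widely spread singular values (an artificial test case chosen for illustration, not the diffusion matrix above) and compares the values recovered by svd with those recovered from eigh applied to X^T.X. Forming X^T.X squares the condition number, so the smallest singular values are expected to suffer on the eigh route.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Orthogonal factors obtained from QR factorizations of random matrices
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Prescribed singular values spanning ten orders of magnitude
s_true = np.array([1.0, 1e-3, 1e-6, 1e-10])
X = U @ np.diag(s_true) @ V.T

# Route 1: direct SVD
s_svd = np.linalg.svd(X, compute_uv=False)

# Route 2: eigenvalues of X^T X, reordered to descending and clipped at zero
w = np.linalg.eigh(X.T @ X)[0][::-1]
s_eigh = np.sqrt(np.clip(w, 0.0, None))

rel_err_svd = np.abs(s_svd - s_true) / s_true
rel_err_eigh = np.abs(s_eigh - s_true) / s_true
print("relative error, svd :", rel_err_svd)
print("relative error, eigh:", rel_err_eigh)
```

For a well-conditioned symmetric matrix passed to eigh directly, this loss does not occur; it is specific to going through X^T.X.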

No, they do not use the same algorithm, as they do different things. They are somewhat related but also very different. Let's start with the fact that you can compute an SVD of m x n matrices, where m and n don't need to be the same.
Which routines are called depends on the version of numpy you are using. Here are the eigenvalue routines in LAPACK for double precision:
http://www.netlib.org/lapack/explore-html/d9/d8e/group__double_g_eeigen.html
And the corresponding SVD routines:
http://www.netlib.org/lapack/explore-html/d1/d7e/group__double_g_esing.html
There are differences between the routines. Big differences. If you care about the details, they are specified very well in the Fortran headers. In many cases it makes sense to find out what kind of matrix you have in front of you in order to make a good choice of routine. Is the matrix symmetric/Hermitian? Is it in upper diagonal form? Is it positive semidefinite? ...
There are enormous differences in runtime. As a rule of thumb, EIGs are cheaper than SVDs, but that also depends on convergence speed, which in turn depends a lot on the condition number of the matrix, in other words on how ill-conditioned the matrix is, ...
SVDs are usually very robust but slow algorithms, often used for inversion, speed optimisation through truncation, and principal component analysis, really with the expectation that the matrix you are dealing with is just a pile of messy rows ;)

Related

Scikit learn NMF how to adjust sparseness of resulting factorization?

Nonnegative matrix factorization is lauded for generating sparse basis sets. However, when I run sklearn.decomposition.NMF the factors are not sparse. Older versions of NMF had a 'degree of sparseness' parameter beta. Newer versions do not, but I want my basis matrix W to actually be sparse. What can I do? (Code to reproduce problem is below).
I have toyed around with increasing various regularization parameters (e.g., alpha), but am not getting anything very sparse (like in the paper by Lee and Seung (1999)) when I apply it to the Olivetti faces dataset. They still basically end up looking like eigenfaces.
My NMF output (not very sparse):
Lee and Seung NMF paper output basis columns (looks sparse to me):
Code to reproduce my problem:
from sklearn.datasets import fetch_olivetti_faces
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import NMF
faces, _ = fetch_olivetti_faces(return_X_y=True)
# run nmf on the faces data set
num_nmf_components = 50
estimator = NMF(num_nmf_components,
                init='nndsvd',
                tol=5e-3,
                max_iter=1000,
                alpha_W=0.01,
                l1_ratio=0)
H = estimator.fit_transform(faces)
W = estimator.components_
# plot the basis faces
n_row, n_col = 6, 4 # how many faces to plot
image_shape = (64, 64)
n_samples, n_features = faces.shape
plt.figure(figsize=(10,12))
for face_id, face in enumerate(W[:n_row*n_col]):
    plt.subplot(n_row, n_col, face_id+1)
    plt.imshow(face.reshape(image_shape), cmap='gray')
    plt.axis('off')
plt.tight_layout()
Is there some combination of parameters for sklearn.decomposition.NMF() that lets you dial in sparseness? I have played with different combinations of alpha_W and l1_ratio and even tweaked the number of components. I still end up with eigenface-looking things.

There are a couple of things going on here that we need to disentangle. First, what happened to sparseness? Second, how do you generate sparse faces using the sklearn function?
Where did the sparseness go?
The sklearn.decomposition.NMF function went through a major change from versions 0.16 to 0.19. There are multiple ways to implement nonnegative matrix factorization.
Before 0.16, NMF used projected gradient descent as described in Hoyer 2004, and included a sparseness parameter (which as OP noted let you adjust the sparseness of the resulting W basis).
Because of various limitations outlined in this extremely thorough issue at sklearn's github repo, it was decided to move on to two additional methods:
Release 0.16: coordinate descent (PR here which was in version 0.16)
Release 0.19: multiplicative update (PR here which was in version 0.19)
This was a pretty major undertaking, and the upshot is we now have a great deal more freedom in terms of error functions, initialization, and regularization. You can read about that at the issue. The objective function is now:
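For reference, here is a transcription of the objective as I understand it from the scikit-learn documentation for recent versions (those that have alpha_W and alpha_H). Treat it as an assumption to check against the docs for your installed version; the symbol \rho stands for l1_ratio and is my own shorthand.

```latex
L(W, H) = \tfrac{1}{2}\,\lVert X - WH \rVert_{\mathrm{loss}}^{2}
  + \alpha_W \,\rho\, n_{\mathrm{features}} \,\lVert \mathrm{vec}(W) \rVert_{1}
  + \alpha_H \,\rho\, n_{\mathrm{samples}} \,\lVert \mathrm{vec}(H) \rVert_{1}
  + \tfrac{1}{2}\,\alpha_W \,(1-\rho)\, n_{\mathrm{features}} \,\lVert W \rVert_{\mathrm{Fro}}^{2}
  + \tfrac{1}{2}\,\alpha_H \,(1-\rho)\, n_{\mathrm{samples}} \,\lVert H \rVert_{\mathrm{Fro}}^{2}
```

The first term is the reconstruction error, the two L1 terms push entries of W and H to exactly zero, and the two Frobenius terms shrink entries without zeroing them.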
You can read more details/explanation at the docs, but to note a few things relevant to the question:
The solver parameter takes mu for multiplicative update or cd for coordinate descent. The older projected gradient descent method (with the sparseness parameter) is deprecated.
As you can see in the objective function, there are weights for regularizing W and H (alpha_W and alpha_H respectively). In theory, if you want to rein in W, you should increase alpha_W.
You can regularize using the L1 or L2 norm, and the ratio between the two is set by l1_ratio. The larger you make l1_ratio, the more you weight the L1 norm over L2 norm. Note: the L1 norm tends to generate more sparse parameter sets, while the L2 norm tends to generate small parameter sets, so in theory if you want sparseness, then set your l1_ratio high.
How to generate sparse faces?
The examination of the objective function suggests what to do. Crank up alpha_W and l1_ratio. But also note that the Lee and Seung paper used multiplicative update (mu), so if you wanted to reproduce their results, I would recommend setting solver to mu, setting alpha_W high, and l1_ratio high, and see what happens.
In the OP's question, they implicitly used the cd solver (which is the default), and set alpha_W=0.01 and l1_ratio=0, which I wouldn't necessarily expect to create a sparse basis set.
But things are actually not that simple. I tried some initial runs of coordinate descent with high l1_ratio and alpha_W and found very low sparseness. So to quantify some of this, I did a grid search, and used a sparseness measure.
Quantifying sparseness is itself a cottage industry (e.g., see this post, and the paper cited there). I used Hoyer's measure of sparsity, adapted from the one used in the nimfa package:
import numpy as np
def sparseness_hoyer(x):
    """
    The sparseness of array x is a real number in [0, 1], where sparser array
    has value closer to 1. Sparseness is 1 iff the vector contains a single
    nonzero component and is equal to 0 iff all components of the vector are
    the same
    modified from Hoyer 2004: [sqrt(n)-L1/L2]/[sqrt(n)-1]
    adapted from nimfa package: https://nimfa.biolab.si/
    """
    from math import sqrt  # faster than numpy sqrt
    eps = np.finfo(x.dtype).eps if 'int' not in str(x.dtype) else 1e-9
    n = x.size
    # measure is meant for nmf: things get weird for negative values
    if np.min(x) < 0:
        x = x - np.min(x)  # shift a copy so the caller's array is not modified
    # patch for array of zeros
    if np.allclose(x, np.zeros(x.shape), atol=1e-6):
        return 0.0
    L1 = abs(x).sum()
    L2 = sqrt(np.multiply(x, x).sum())
    sparseness_num = sqrt(n) - (L1 + eps) / (L2 + eps)
    sparseness_den = sqrt(n) - 1
    return sparseness_num / sparseness_den
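To get a feel for the scale of this measure, here is a stripped-down sketch of the same formula, [sqrt(n) - L1/L2] / [sqrt(n) - 1], applied to three toy vectors. The helper name hoyer_sparseness and the test vectors are made up for this illustration; the eps and negative-value handling of the function above are left out.

```python
import numpy as np

def hoyer_sparseness(x):
    """Hoyer (2004) sparseness of a nonnegative vector: 1 = one active entry, 0 = all entries equal."""
    x = np.abs(np.asarray(x, dtype=float)).ravel()
    n = x.size
    l1 = x.sum()
    l2 = np.sqrt((x ** 2).sum())
    if l2 == 0:
        return 0.0  # all-zero input: report no sparseness rather than divide by zero
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

one_hot = np.zeros(100)
one_hot[3] = 1.0                                        # a single active "pixel"
flat = np.ones(100)                                     # every "pixel" equally active
half_on = np.concatenate([np.ones(50), np.zeros(50)])   # half of the "pixels" active

for name, vec in [("one_hot", one_hot), ("flat", flat), ("half_on", half_on)]:
    print(name, hoyer_sparseness(vec))
```

The single-entry vector sits at the top of the scale, the constant vector at the bottom, and the half-active vector in between.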
What this measure actually quantifies is sort of complicated, but roughly, a sparse image is one with only a few pixels active, while a non-sparse image has lots of pixels active. If we run PCA on the faces example from the OP, we can see that the sparseness values are low, around 0.04, for the eigenfaces:
Sparsifying using coordinate descent?
If we run NMF using the params used in the OP (using coordinate descent, with low alpha_W and l1_ratio, except with 200 components), the sparseness values are again low:
If you look at the histogram of sparseness values this is verified:
Different, but not super impressive, compared with PCA.
I next did a grid search through alpha_W and l1_ratio space, varying them between 0 and 1 (in increments of 0.1). I found that sparsity was not maximized when they were 1. Surprisingly, contrary to theoretical expectations, I found that sparsity was only high when l1_ratio was 0, and it dropped off precipitously above 0. And within this slice of parameters, sparsity was maximized when alpha_W was 0.9:
Intuitively, this is a huge improvement. There is still a lot of variation in the distribution of sparseness values, but they are much higher:
However, maybe in order to replicate the Lee and Seung results, and better control sparseness, we should be using multiplicative update (which is what they used). Let's try that next.
Sparsifying using multiplicative update
For the next attempt, I used multiplicative update, and this behaved much more as expected, with sparse, parts-based representations emerging:
You can see the drastic difference, and this is reflected in the histogram of sparseness values:
Note the code to generate this is below.
One final interesting thing to note: the sparseness values with this method seem to increase with the component number. I plotted sparseness as a function of component, and this is (roughly) borne out, and was borne out consistently over all my runs of the algorithm:
I have not seen this discussed elsewhere, so thought I'd mention it.
Code to generate sparse representation of faces using the mu NMF algorithm:
from sklearn.datasets import fetch_olivetti_faces
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import NMF
faces, _ = fetch_olivetti_faces(return_X_y=True)
num_nmf_components = 200
alph_W = 0.9  # cd: .9, mu: .9
L1_ratio = 0.9  # cd: 0, mu: 0.9
# drop any estimator left over from a previous run (e.g. in a notebook)
try:
    del estimator
except NameError:
    print("first run")
estimator = NMF(num_nmf_components,
                init='nndsvdar',  # nndsvd
                solver='mu',
                max_iter=50,
                alpha_W=alph_W,
                alpha_H=0,  # zeros
                l1_ratio=L1_ratio,
                shuffle=True)
H = estimator.fit_transform(faces)
W = estimator.components_
# plot the basis faces
n_row, n_col = 5, 7  # how many faces to plot
image_shape = (64, 64)
n_samples, n_features = faces.shape
plt.figure(figsize=(10,12))
for face_id, face in enumerate(W[:n_row*n_col]):
    plt.subplot(n_row, n_col, face_id+1)
    # sparseness_hoyer is the function defined earlier in this answer
    face_sparseness = sparseness_hoyer(face)
    plt.imshow(face.reshape(image_shape), cmap='gray')
    plt.title(f"{face_sparseness: 0.2f}")
    plt.axis('off')
plt.suptitle('NMF', fontsize=16, y=1)
plt.tight_layout()

Eigenanalysis of complex hermitian matrix: different phase angles for EIG and EIGH

I understand that eigenvectors are only defined up to a multiplicative constant. As far as I see all numpy algorithms (e.g. linalg.eig, linalg.eigh, linalg.svd) yield identical eigenvectors for real matrices, so apparently they use the same normalization. In the case of a complex matrix, however, the algorithms yield different results.
That is, the eigenvectors are the same up to a (complex) constant z. After some experimenting with eig and eigh I realised that eigh always sets the phase angle (defined as arctan(imaginary part/real part)) to 0 for the first component of each eigenvector, whereas eig seems to start with some (arbitrary?) non-zero phase angle.
Q: Is there a way to normalize the eigenvectors from eigh in the way eig is doing it (that is not to force phase angle = 0)?
Example
I have a complex hermitian matrix G for which I want to calculate the eigenvectors using the two following algorithms:
1. numpy.linalg.eig for a real/complex square matrix
2. numpy.linalg.eigh for a real symmetric/complex hermitian matrix (special case of 1.)
Check that G is hermitian
# check if a matrix is hermitian
def isHermitian(a, rtol=1e-05, atol=1e-08):
    return np.allclose(a, a.conjugate().T, rtol=rtol, atol=atol)
print('G is hermitian:', isHermitian(G))
Out:
G is hermitian: True
Perform eigenanalysis
# eigenvectors from EIG()
l1,u1 = np.linalg.eig(G)
idx = np.argsort(l1)[::-1]
l1,u1 = l1[idx].real,u1[:,idx]
# eigenvectors from EIGH()
l2,u2 = np.linalg.eigh(G)
idx = np.argsort(l2)[::-1]
l2,u2 = l2[idx],u2[:,idx]
Check eigenvalues
print('Eigenvalues')
print('eig\t:',l1[:3])
print('eigh\t:',l2[:3])
Out:
Eigenvalues
eig : [2.55621629e+03 3.48520440e+00 3.16452447e-02]
eigh : [2.55621629e+03 3.48520440e+00 3.16452447e-02]
Both methods yield the same eigenvalues.
Check eigenvectors
Now look at the eigenvectors (e.g. the 3rd eigenvector), which differ by a constant factor z.
multFactors = u1[:,2]/u2[:,2]
# every component-wise ratio must agree for the vectors to differ by a single factor
if np.allclose(multFactors, multFactors[0]):
    print("All multiplication factors are same:", multFactors[0])
else:
    print("Multiplication factors are different.")
Out:
All multiplication factors are same: (-0.8916113627685007+0.45280147727156245j)
Check phase angle
Now check the phase angle for the first component of the 3rd eigenvector:
print('Phase angle (in PI) for first point:')
print('Eig\t:',np.arctan2(u1[0,2].imag,u1[0,2].real)/np.pi)
print('Eigh\t:',np.arctan2(u2[0,2].imag,u2[0,2].real)/np.pi)
Out:
Phase angle (in PI) for first point:
Eig : 0.8504246311627189
Eigh : 0.0
Code to reproduce figure
import matplotlib.pyplot as plt
from matplotlib import gridspec
num = 2
fig = plt.figure()
gs = gridspec.GridSpec(2, 3)
ax0 = plt.subplot(gs[0,0])
ax1 = plt.subplot(gs[1,0])
ax2 = plt.subplot(gs[0,1:])
ax3 = plt.subplot(gs[1,1:])
ax2r= ax2.twinx()
ax3r= ax3.twinx()
ax0.imshow(G.real,vmin=-30,vmax=30,cmap='RdGy')
ax1.imshow(G.imag,vmin=-30,vmax=30,cmap='RdGy')
ax2.plot(u1[:,num].real,label='eig')
ax2.plot((u2[:,num]).real,label='eigh')
ax3.plot(u1[:,num].imag,label='eig')
ax3.plot((u2[:,num]).imag,label='eigh')
for a in [ax0,ax1,ax2,ax3]:
    a.set_xticks([])
    a.set_yticks([])
ax0.set_title('Re(G)')
ax1.set_title('Im(G)')
ax2.set_title('Re('+str(num+1)+'. Eigenvector)')
ax3.set_title('Im('+str(num+1)+'. Eigenvector)')
ax2.legend(loc=0)
ax3.legend(loc=0)
fig.subplots_adjust(wspace=0, hspace=.2,top=.9)
fig.suptitle('Eigenanalysis of Hermitian Matrix G',size=16)
plt.show()

As you say, the eigenvalue problem only fixes the eigenvectors up to a scalar x. Transforming an eigenvector v as v = v*x does not change its status as an eigenvector.
There is an "obvious" way to normalize the vectors (according to the euclidean inner product np.vdot(v1, v1)), but this only fixes the amplitude of the scalar, which can be complex.
Fixing the angle or "phase" is kind of arbitrary without further context. I tried out eigh() and indeed it just makes the first entry of the vector real (with an apparently random sign!?).
eig() instead chooses to make real the vector entry with the largest real part. For example, here is what I get for a random Hermitian matrix:
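import numpy as np
from numpy import linalg as la
n = 10
# X is assumed here to be a random complex n x n matrix; its definition is missing from the original snippet
X = np.random.randn(n, n) + 1j*np.random.randn(n, n)
H = 0.5*(X + X.conj().T)
np.max(la.eig(H)[1], axis=0)
# returns
array([0.57590624+0.j, 0.42672485+0.j, 0.51974879+0.j, 0.54500475+0.j,
       0.4644593 +0.j, 0.53492448+0.j, 0.44080532+0.j, 0.50544424+0.j,
       0.48589402+0.j, 0.43431733+0.j])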
This is arguably more sensible, as just picking the first entry, like eigh() does, is not very robust if the first entry happens to be very small. Picking the max value avoids this. I am not sure if eig() also fixes the sign (a random matrix is not a very good test case for this as it would be very unusual for all entries in an eigenvector to have negative real parts, which is the only case in which an unfixed sign would show up).
In any case, I would not rely on the eigensolver using any particular way of fixing phases. It's not documented and so could, in principle, change in the future. Instead, fix the phases yourself, perhaps the same way eig() does it now.
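Fixing the phases yourself could look like the sketch below. The convention chosen here (rotate each eigenvector so that its largest-magnitude entry is real and positive) is one reasonable option, not something NumPy guarantees, and the helper name fix_phases is made up for this example. Once both sets of eigenvectors follow the same convention, eig and eigh agree.

```python
import numpy as np

def fix_phases(vecs):
    """Rotate each column so that its largest-magnitude entry is real and positive."""
    vecs = np.asarray(vecs, dtype=complex)
    cols = np.arange(vecs.shape[1])
    # Entry of largest magnitude in every column
    pivots = vecs[np.argmax(np.abs(vecs), axis=0), cols]
    # Multiplying by |p| / p is a pure rotation that removes the phase of p
    return vecs * (np.abs(pivots) / pivots)

# Random Hermitian test matrix (illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
G = 0.5 * (X + X.conj().T)

l1, u1 = np.linalg.eig(G)
order = np.argsort(l1.real)          # eig returns eigenvalues in no particular order
l1, u1 = l1[order].real, u1[:, order]
l2, u2 = np.linalg.eigh(G)           # eigh returns them in ascending order

u1_fixed = fix_phases(u1)
u2_fixed = fix_phases(u2)
print(np.allclose(u1_fixed, u2_fixed, atol=1e-8))
```

This comparison assumes the eigenvalues are distinct; for repeated eigenvalues the eigenvectors are only defined up to a rotation within the eigenspace, and a phase convention alone does not make them unique.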

In my experience (and there are many questions here to back this up), you NEVER want to use eig when eigh is an option - eig is very slow and very unstable. The relevance of this is that I believe your question is backward - you want to normalize the eigenvectors of eig to be like those of eigh, and this you know how to do.

Is t-SNE's computational bottleneck its memory complexity?

I've been exploring different dimensionality reduction algorithms, specifically PCA and T-SNE. I'm taking a small subset of the MNIST dataset (with ~780 dimensions) and attempting to reduce the raw data down to three dimensions to visualize as a scatter plot. T-SNE is described in great detail here.
I'm using PCA as an intermediate dimensional reduction step prior to T-SNE, as described by the original creators of T-SNE on the source code from their website.
I'm finding that T-SNE takes forever to run (10-15 minutes to go from a 2000 x 25 to a 2000 x 3 feature space), while PCA runs relatively quickly (a few seconds for a 2000 x 780 => 2000 X 20).
Why is this the case? My theory is that in the PCA implementation (directly from primary author's source code in Python), he utilizes Numpy dot product notations to calculate X and X.T:
# Math is numpy here (the reference code does "import numpy as Math")
def pca(X = Math.array([]), no_dims = 50):
    """Runs PCA on the NxD array X in order to reduce its dimensionality to no_dims dimensions."""
    print("Preprocessing the data using PCA...")
    (n, d) = X.shape;
    X = X - Math.tile(Math.mean(X, 0), (n, 1));
    (l, M) = Math.linalg.eig(Math.dot(X.T, X));
    Y = Math.dot(X, M[:,0:no_dims]);
    return Y;
As far as I recall, this is significantly more efficient than scalar operations, and also means that only 2N (where N is the number of rows) of data is loaded into memory (you need to load one row of X and one column of X.T).
However, I don't think this is the root reason. T-SNE definitely also contains vector operations, for example, when calculating the pairwise distances D:
D = Math.add(Math.add(-2 * Math.dot(X, X.T), sum_X).T, sum_X);
Or, when calculating P (higher dimension) and Q (lower dimension). In t-SNE, however, you have to create two N X N matrices to store your pairwise distances between each data, one for its original high-dimensional space representation and the other for its reduced dimensional space.
In computing your gradient, you also have to create another N X N matrix called PQ, which is P - Q.
It seems to me that the memory complexity here is the bottleneck. T-SNE requires 3N^2 of memory. There is no way this can fit in local memory, so the algorithm experiences significant cache line misses and needs to go to global memory to retrieve the values.
Is this correct? How do I explain to a client or a reasonable non-technical person why t-SNE is slower than PCA?
The co-author's Python implementation is found here.

The main reason for t-SNE being slower than PCA is that no analytical solution exists for the criterion that is being optimised. Instead, a solution must be approximated through gradient descent iterations.
In practice, this means lots of for loops. Not least the main iteration for-loop in line 129, which runs up to max_iter=1000 times. Additionally, the x2p function iterates over all data points with a for loop.
The reference implementation is optimised for readability, not for computational speed. The authors link to an optimised Torch implementation as well, which should speed up the computation a lot. If you want to stay in pure Python, I recommend the implementation in Scikit-Learn, which should also be a lot faster.

t-SNE tries to lower the dimensionality while preserving the distributions of distances between elements.
This requires computing distances between all the points. Pairwise distance matrix has N^2 entries where N is the number of examples.
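The size of such a matrix is easy to check. The sketch below builds the squared pairwise distance matrix with the same vectorised identity used in the question, ||xi - xj||^2 = ||xi||^2 - 2 xi.xj + ||xj||^2, on made-up random data, and reports its memory footprint; the function name is hypothetical.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X, as an N x N matrix."""
    sum_X = np.sum(X ** 2, axis=1)
    D = sum_X[:, None] - 2.0 * (X @ X.T) + sum_X[None, :]
    return np.maximum(D, 0.0)  # clip tiny negative values caused by rounding

rng = np.random.default_rng(0)
N, d = 200, 25
X = rng.standard_normal((N, d))

D = pairwise_sq_dists(X)
print(D.shape)                 # one entry per pair of points
print(D.nbytes, "bytes")       # 8 bytes per float64 entry, N**2 entries
```

By the same arithmetic, N = 2000 points give 8 * 2000**2 bytes, about 32 MB per float64 matrix.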

How to compute scipy sparse matrix determinant without turning it to dense?

I am trying to figure out the fastest method to find the determinant of sparse, symmetric, real matrices in Python using the scipy sparse module, and I am really surprised that there is no determinant function. I am aware I could use LU factorization to compute the determinant, but I don't see an easy way to do it because the return value of scipy.sparse.linalg.splu is an object, and instantiating dense L and U matrices is not worth it - I may as well do sp.linalg.det(A.todense()), where A is my scipy sparse matrix.
I am also a bit surprised that others have not faced the problem of efficient determinant computation within scipy. How would one use splu to compute the determinant?
I looked into pySparse and scikits.sparse.cholmod. The latter is not practical for me right now - it needs package installations, and I am also not sure how fast the code is before I go to all the trouble.
Any solutions? Thanks in advance.

Here are some references I provided as part of an answer here.
I think they address the actual problem you are trying to solve:
notes for an implementation in the Shogun library
Erlend Aune, Daniel P. Simpson: Parameter estimation in high dimensional Gaussian distributions, particularly section 2.1 (arxiv:1105.5256)
Ilse C.F. Ipsen, Dean J. Lee: Determinant Approximations (arxiv:1105.0437)
Arnold Reusken: Approximation of the determinant of large sparse symmetric positive definite matrices (arxiv:hep-lat/0008007)
Quoting from the Shogun notes:
The usual technique for computing the log-determinant term in the likelihood expression relies on Cholesky factorization of the matrix, i.e. Σ = LL^T (L is the lower triangular Cholesky factor), and then using the diagonal entries of the factor to compute log(det(Σ)) = 2 ∑_{i=1}^{n} log(L_ii). However, for sparse matrices, as covariance matrices usually are, the Cholesky factors often suffer from fill-in phenomena - they turn out to be not so sparse themselves. Therefore, for large dimensions this technique becomes infeasible because of a massive memory requirement for storing all these irrelevant non-diagonal co-efficients of the factor. While ordering techniques have been developed to permute the rows and columns beforehand in order to reduce fill-in, e.g. approximate minimum degree (AMD) reordering, these techniques depend largely on the sparsity pattern and therefore not guaranteed to give better result.
Recent research shows that using a number of techniques from complex analysis, numerical linear algebra and greedy graph coloring, we can, however, approximate the log-determinant up to an arbitrary precision [Aune et. al., 2012]. The main trick lies within the observation that we can write log(det(Σ)) as trace(log(Σ)), where log(Σ) is the matrix-logarithm.
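Both identities from the quote can be checked on a small dense example. This is only a sanity check of the formulas on a made-up symmetric positive definite matrix, not the sparse approximation scheme the notes describe: the trace of the matrix logarithm is evaluated here simply as the sum of the logs of the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
Sigma = B @ B.T + 50.0 * np.eye(50)   # symmetric positive definite by construction

# log(det(Sigma)) = 2 * sum_i log(L_ii), with Sigma = L L^T
L = np.linalg.cholesky(Sigma)
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))

# log(det(Sigma)) = trace(log(Sigma)) = sum of the logs of the eigenvalues
logdet_trace = np.sum(np.log(np.linalg.eigvalsh(Sigma)))

# Reference value from NumPy
sign, logdet_ref = np.linalg.slogdet(Sigma)

print(logdet_chol, logdet_trace, logdet_ref)
```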

The "standard" way to solve this problem is with a cholesky decomposition, but if you're not up to using any new compiled code, then you're out of luck. The best sparse cholesky implementation is Tim Davis's CHOLMOD, which is licensed under the LGPL and thus not available in scipy proper (scipy is BSD).

You can use scipy.sparse.linalg.splu to obtain sparse matrices for the lower (L) and upper (U) triangular matrices of an M=LU decomposition:
from scipy.sparse.linalg import splu
lu = splu(M)
The determinant det(M) can be then represented as:
det(M) = det(LU) = det(L)det(U)
The determinant of triangular matrices is just the product of the diagonal terms:
diagL = lu.L.diagonal()
diagU = lu.U.diagonal()
d = diagL.prod()*diagU.prod()
However, for large matrices underflow or overflow commonly occurs, which can be avoided by working with the logarithms.
diagL = diagL.astype(np.complex128)
diagU = diagU.astype(np.complex128)
logdet = np.log(diagL).sum() + np.log(diagU).sum()
Note that I invoke complex arithmetic to account for negative numbers that might appear in the diagonals. Now, from logdet you can recover the determinant:
det = np.exp(logdet) # usually underflows/overflows for large matrices
whereas the sign of the determinant can be calculated directly from diagL and diagU (important for example when implementing Crisfield's arc-length method):
sign = swap_sign*np.sign(diagL).prod()*np.sign(diagU).prod()
where swap_sign is a term to consider the number of permutations in the LU decomposition. Thanks to @Luiz Felippe Rodrigues, it can be calculated:
swap_sign = (-1)**minimumSwaps(lu.perm_r)
def minimumSwaps(arr):
    """
    Minimum number of swaps needed to order a
    permutation array
    """
    # from https://www.thepoorcoder.com/hackerrank-minimum-swaps-2-solution/
    a = dict(enumerate(arr))
    b = {v:k for k,v in a.items()}
    count = 0
    for i in a:
        x = a[i]
        if x!=i:
            y = b[i]
            a[y] = x
            b[x] = y
            count+=1
    return count
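Putting the pieces together, a self-contained sketch of a sparse slogdet could look like this. The function names are made up for the example. It relies on SuperLU's documented factorization Pr * A * Pc = L * U, so, as an addition to the recipe above, it also accounts for the sign of the column permutation lu.perm_c, which matters whenever the column ordering is not the identity.

```python
import numpy as np
from scipy.sparse import csc_matrix, diags
from scipy.sparse.linalg import splu

def permutation_sign(perm):
    """Sign (+1 or -1) of a permutation given as an index array."""
    perm = np.asarray(perm)
    seen = np.zeros(len(perm), dtype=bool)
    sign = 1
    for start in range(len(perm)):
        length = 0
        j = start
        while not seen[j]:          # walk one cycle of the permutation
            seen[j] = True
            j = perm[j]
            length += 1
        if length > 0 and length % 2 == 0:
            sign = -sign            # cycles of even length are odd permutations
    return sign

def sparse_slogdet(A):
    """Return (sign, log|det|) of a real sparse square matrix via its LU factors."""
    lu = splu(csc_matrix(A))
    diag_l = lu.L.diagonal()        # unit diagonal in SuperLU, kept for completeness
    diag_u = lu.U.diagonal()
    sign = (permutation_sign(lu.perm_r) * permutation_sign(lu.perm_c)
            * np.prod(np.sign(diag_l)) * np.prod(np.sign(diag_u)))
    logabsdet = np.sum(np.log(np.abs(diag_l))) + np.sum(np.log(np.abs(diag_u)))
    return sign, logabsdet

# Tridiagonal (-1, 2, -1) matrix of size n: the determinant is n + 1
n = 1000
K = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
sign_K, logdet_K = sparse_slogdet(K)
print(sign_K, np.exp(logdet_K))

# Small random matrix, compared against the dense reference
M = np.random.default_rng(0).standard_normal((8, 8))
sign_M, logdet_M = sparse_slogdet(M)
print((sign_M, logdet_M), np.linalg.slogdet(M))
```

Returning the sign and the log of the absolute value separately avoids both the overflow problem and the detour through complex logarithms.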

Things start to go wrong with the determinant of sparse tridiagonal (-1 2 -1) around N=1e6 using both SuperLU and CHOLMOD...
The determinant should be N+1.
It's probably propagation of error when calculating the product of the U diagonal:
from scipy.sparse import diags
from scipy.sparse.linalg import splu
from sksparse.cholmod import cholesky
from math import exp
n=int(5e6)
K = diags([-1.],-1,shape=(n,n)) + diags([2.],shape=(n,n)) + diags([-1.],1,shape=(n,n))
lu = splu(K.tocsc())
diagL = lu.L.diagonal()
diagU = lu.U.diagonal()
det=diagL.prod()*diagU.prod()
print(det)
factor = cholesky(K.tocsc())
ld = factor.logdet()
print(exp(ld))
Output:
4999993.625461911
4999993.625461119
Even if U is 10-13 digit accurate, this might be expected:
n=int(5e6)
print(n*diags([1-0.00000000000025],0,shape=(n,n)).diagonal().prod())
4999993.749444371

Scipy's sparse eigsh() for small eigenvalues

I'm trying to write a spectral clustering algorithm using NumPy/SciPy for larger (but still tractable) systems, making use of SciPy's sparse linear algebra library. Unfortunately, I'm running into stability issues with eigsh().
Here's my code:
import numpy as np
import scipy.sparse
import scipy.sparse.linalg as SLA
import sklearn.utils.graph as graph
W = self._sparse_rbf_kernel(self.X_, self.datashape)
D = scipy.sparse.csc_matrix(np.diag(np.array(W.sum(axis = 0))[0]))
L = graph.graph_laplacian(W) # D - W
vals, vects = SLA.eigsh(L, k = self.k, M = D, which = 'SM', sigma = 0, maxiter = 1000)
The sklearn library refers to the scikit-learn package, specifically this method for calculating a graph laplacian from a sparse SciPy matrix.
_sparse_rbf_kernel is a method I wrote to compute pairwise affinities of the data points. It operates by creating a sparse affinity matrix from image data, specifically by only computing pairwise affinities for the 8-neighborhoods around each pixel (instead of pairwise for all pixels with scikit-learn's rbf_kernel method, which for the record doesn't fix this either).
Since the laplacian is unnormalized, I'm looking for the smallest eigenvalues and corresponding eigenvectors of the system. I understand that ARPACK is ill-suited for finding small eigenvalues, but I'm trying to use shift-invert to find these values and am still not having much success.
With the above arguments (specifically, sigma = 0), I get the following error:
RuntimeError: Factor is exactly singular
With sigma = 0.001, I get a different error:
scipy.sparse.linalg.eigen.arpack.arpack.ArpackNoConvergence: ARPACK error -1: No convergence (1001 iterations, 0/5 eigenvectors converged)
I've tried all three different values for mode with the same result. Any suggestions for using the SciPy sparse library for finding small eigenvalues of a large system?

You should use which='LM': in the shift-invert mode, this parameter refers to the transformed eigenvalues. (As explained in the documentation.)
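A minimal sketch of this, on a made-up test matrix rather than the graph Laplacian from the question: in shift-invert mode ARPACK works on the transformed eigenvalues 1 / (lambda - sigma), so the eigenvalues of A closest to sigma become the largest in magnitude and which='LM' is the right selector. The tridiagonal (-1, 2, -1) matrix is used because it is nonsingular and its eigenvalues are known in closed form.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

n, k = 200, 3
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

# Shift-invert around sigma=0: 'LM' now selects the eigenvalues of A closest to 0
vals, vecs = eigsh(A, k=k, sigma=0, which="LM")
vals = np.sort(vals)

# Closed-form smallest eigenvalues of this matrix: 2 - 2*cos(j*pi/(n+1)), j = 1..k
exact = 2.0 - 2.0 * np.cos(np.arange(1, k + 1) * np.pi / (n + 1))

print(vals)
print(exact)
```

For a matrix that is exactly singular at the shift, such as an unnormalized graph Laplacian with sigma = 0, the factorization fails as in the question; a shift slightly away from the eigenvalue (for example a small negative sigma) avoids that.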
