Singular value decomposition of matrix M of size (M,N) means factoring
How to obtain all three matrices from scikit-learn and numpy package?
I think I can obtain Sigma with PCA model:
import numpy as np
from sklearn.decomposition import PCA
model = PCA(N, copy=True, random_state=0)
model.fit(X)
Sigma = model.singular_values_
Sigma = np.diag(singular_values)
What about other matrices?
You can get these matrices using numpy.linalg.svd as follows:
a=np.array([[1,2,3],[4,5,6],[7,8,9]])
U, S, V = np.linalg.svd(a, full_matrices=True)
S is a 1D array that represents the diagonal entries in Sigma. U and V are the corresponding matrices from the decomposition.
By the way, note that when you used PCA, the data is centered before svd is applied (unlike numpy.linalg.svd, where svd is applied directly on the matrix itself. see lines 409-410 here).
Can't comment on Mirian's answer because I don't have enough reputation, but from looking at Miriam's link, sklearn actually calls scipy's linalg.svd which is doesn't seem to be the same as np.linalg.svd (discussion here)
So it may be better to use U, S, V = scipy.linalg.svd(a, full_matrices=True)
Related
I am trying to implement SVD using np.linalg.eig method for an image compression assignment. We are not allowed to use the np.linalg.svd method directly.
Here is my svd method:
def svd(A):
evals, U = LA.eig(A # A.T)
evals2, V = LA.eig(A.T # A)
idx = evals.argsort()[::-1]
evals = evals[idx]
U = U[:, idx]
idx = evals2.argsort()[::-1]
V = V[ :, idx]
sigma = np.array(list(map(math.sqrt, evals)))
return U, sigma, V.T
But when I try to reconstruct the image using U and V returned by the above svd, the error rate is so much that the image is completely blurry even after using all the singular vectors. Whereas when I try the same reconstruction procedure with the U & V matrices returned by np.linalg.svd, I am able to clearly reconstruct the image.
Please let me know if there is anything wrong with my svd method.
Both SVD and eigenvectors are not fully unique. In SVD you can sign flip any vector in U as long as you do the same to the corresponding vector in V. The eigenvectors you get are not coupled in that way, therefore there is a good chance of sign mismatch. You can check and correct for that by using the fact that U.T#A#V.T is sigma, so check the signs of the diagonal elements of U.T#A#V.T and for each negative one flip the corresponding vector in either U or V (but not in both).
Additional suggestions:
Since you only need the diagonal elements it is wasteful to compute the full product U.T#A#V.T; the simplest way to compute only diagonal elements would be np.einsum('ij,ik,jk->j',U,A,V).
Use eigh instead of eig because you know A#A.T and A.T#A are symmetric.
You can save one eigen decomposition because sigma#V = U.T#A and sigma being diagonal is easy to invert. This also has the advantage that the above sign problem can't happen.
I am trying to find the eigenvalues and eigenvectors of a complex matrix with scipy.sparse.linalg.eigsh using its shift-invert mode. With just real numbers in the matrix I get the same result for the spicy.linalg.eigh solver, but when adding the imaginary parts the eigenvalues diverge. A tiny example:
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.linalg import eigsh
n = 10
X = np.random.random((n, n)) - 0.5 + (np.random.random((n, n)) - 0.5) * 1j
X = np.dot(X, X.T) # create a symmetric matrix
evals_all, evecs_all = eigh(X)
evals_small, evecs_small = eigsh(X, 3, sigma=0, which='LM')
print(sorted(evals_all, key=abs))
print(sorted(evals_small, key=abs))
The prints in this case are for example
[0.041577858515751132, -0.084104744918533481, -0.58668240775486691, 0.63845672501004724, -1.2311727737115068, 1.5193345703630159, -1.8652302423152105, 1.9970059660853923, -2.6414593461321654, 2.8624290667460293]
[-0.017278543470343462, -0.32684893256215408, 0.34551438015659475]
whereas in the real case, the first three eigenvalues are identical.
I am aware that I'm passing a dense matrix to the sparse solver, but this is just intended as an example.
I am probably missing something obvious somewhere, but I'd be happy about some hints where to look. Thank you!
scipy is not checking your input if it's hermitian.
Doing it like proposed in the link:
if not np.allclose(X, np.asmatrix(X).H):
raise ValueError('expected symmetric or Hermitian matrix')
outputs:
ValueError: expected symmetric or Hermitian matrix
I think this is also indicated by those negative eigenvalues you see (but complex-based math is really not my speciality...).
I have been utilizing PCA implemented in scikit-learn. However, I want to find the eigenvalues and eigenvectors that result after we fit the training dataset. There is no mention of both in the docs.
Secondly, can these eigenvalues and eigenvectors themselves be utilized as features for classification purposes?
I am assuming here that by EigenVectors you mean the Eigenvectors of the Covariance Matrix.
Lets say that you have n data points in a p-dimensional space, and X is a p x n matrix of your points then the directions of the principal components are the Eigenvectors of the Covariance matrix XXT. You can obtain the directions of these EigenVectors from sklearn by accessing the components_ attribute of the PCA object. This can be done as follows:
from sklearn.decomposition import PCA
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA()
pca.fit(X)
print pca.components_
This gives an output like
[[ 0.83849224 0.54491354]
[ 0.54491354 -0.83849224]]
where every row is a principal component in the p-dimensional space (2 in this toy example). Each of these rows is an Eigenvector of the centered covariance matrix XXT.
As far as the Eigenvalues go, there is no straightforward way to get them from the PCA object. The PCA object does have an attribute called explained_variance_ratio_ which gives the percentage of the variance of each component. These numbers for each component are proportional to the Eigenvalues. In the case of our toy example, we get these if print the explained_variance_ratio_ attribute :
[ 0.99244289 0.00755711]
This means that the ratio of the eigenvalue of the first principal component to the eigenvalue of the second principal component is 0.99244289:0.00755711.
If the understanding of the basic mathematics of PCA is clear, then a better way to get the Eigenvectors and Eigenvalues is to use numpy.linalg.eig to get Eigenvalues and Eigenvectors of the centered covariance matrix. If your data matrix is a p x n matrix, X (p features, n points), then the you can use the following code:
import numpy as np
centered_matrix = X - X.mean(axis=1)[:, np.newaxis]
cov = np.dot(centered_matrix, centered_matrix.T)
eigvals, eigvecs = np.linalg.eig(cov)
Coming to your second question. These EigenValues and EigenVectors cannot be used themselves for classification. For classification you need features for each data point. These Eigenvectors and Eigenvalues that you generate are derived from the entire covariance matrix, XXT. For dimensionality reduction you could use the projections of your original points(in the p-dimensional space) on the principal components obtained as a result of PCA. However, this is also not always useful, because PCA does not take into account the labels of your training data. I would recommend you to look into LDA for supervised problems.
Hope that helps.
The docs say explained_variance_ will give you
"The amount of variance explained by each of the selected components. Equal to n_components largest eigenvalues of the covariance matrix of X.", new in version 0.18.
Seems a little questionable since the first and second sentences do not seem to agree.
sklearn PCA documentation
I found some examples online showing how to find the null space of a regular matrix in Python, but I couldn't find any examples for a sparse matrix (scipy.sparse.csr_matrix).
By null space I mean x such that M·x = 0, where '·' is matrix multiplication. Does anybody know how to do this?
Furthermore, in my case I know that the null space will consist of a single vector. Can this information be used to improve the efficiency of the method?
This isn't a complete answer yet, but hopefully it will be a starting point towards one. You should be able to compute the null space using a variant on the SVD-based approach shown for dense matrices in this question:
import numpy as np
from scipy import sparse
import scipy.sparse.linalg
def rand_rank_k(n, k, **kwargs):
"generate a random (n, n) sparse matrix of rank <= k"
a = sparse.rand(n, k, **kwargs)
b = sparse.rand(k, n, **kwargs)
return a.dot(b)
# I couldn't think of a simple way to generate a random sparse matrix with known
# rank, so I'm currently using a dense matrix for proof of concept
n = 100
M = rand_rank_k(n, n - 1, density=1)
# # this seems like it ought to work, but it doesn't
# u, s, vh = sparse.linalg.svds(M, k=1, which='SM')
# this works OK, but obviously converting your matrix to dense and computing all
# of the singular values/vectors is probably not feasible for large sparse matrices
u, s, vh = np.linalg.svd(M.todense(), full_matrices=False)
tol = np.finfo(M.dtype).eps * M.nnz
null_space = vh.compress(s <= tol, axis=0).conj().T
print(null_space.shape)
# (100, 1)
print(np.allclose(M.dot(null_space), 0))
# True
If you know that x is a single row vector then in principle you would only need to compute the smallest singular value/vector of M. It ought to be possible to do this using scipy.sparse.linalg.svds, i.e.:
u, s, vh = sparse.linalg.svds(M, k=1, which='SM')
null_space = vh.conj().ravel()
Unfortunately, scipy's svds seems to be badly behaved when finding small singular values of singular or near-singular matrices and usually either returns NaNs or throws an ArpackNoConvergence error.
I'm not currently aware of an alternative implementation of truncated SVD with Python bindings that will work on sparse matrices and can selectively find the smallest singular values - perhaps someone else knows of one?
Edit
As a side note, the second approach seems to work reasonably well using MATLAB or Octave's svds function:
>> M = rand(100, 99) * rand(99, 100);
% svds converges much more reliably if you set sigma to something small but nonzero
>> [U, S, V] = svds(M, 1, 1E-9);
>> max(abs(M * V))
ans = 1.5293e-10
I have been trying to find a solution to the same problem. Using Scipy's svds function provides unreliable results for small singular values. Therefore i have been using QR decomposition instead. The sparseqr https://github.com/yig/PySPQR provides a wrapper for Matlabs SuiteSparseQR method, and works reasonably well. Using this the null space can be calculated as:
from sparseqr import qr
Q, _, _,r = qr( M.transpose() )
N = Q.tocsr()[:,r:]
I am using scikit-learn. The nature of my application is such that I do the fitting offline, and then can only use the resulting coefficients online(on the fly), to manually calculate various objectives.
The transform is simple, it is just data * pca.components_, i.e. simple dot product. However, I have no idea how to perform the inverse transform. Which field of the pca object contains the relevant coefficients for the inverse transform? How do I calculate the inverse transform?
Specifically, I am referring to the PCA.inverse_transform() method call available in the sklearn.decomposition.PCA package: how can I manually reproduce its functionality using various coefficients calculated by the PCA?
1) transform is not data * pca.components_.
Firstly, * is not dot product for numpy array. It is element-wise multiplication. To perform dot product, you need to use np.dot.
Secondly, the shape of PCA.components_ is (n_components, n_features) while the shape of data to transform is (n_samples, n_features), so you need to transpose PCA.components_ to perform dot product.
Moreover, the first step of transform is to subtract the mean, therefore if you do it manually, you also need to subtract the mean at first.
The correct way to transform is
data_reduced = np.dot(data - pca.mean_, pca.components_.T)
2) inverse_transform is just the inverse process of transform
data_original = np.dot(data_reduced, pca.components_) + pca.mean_
If your data already has zero mean in each column, you can ignore the pca.mean_ above, for example
import numpy as np
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
pca.fit(data)
data_reduced = np.dot(data, pca.components_.T) # transform
data_original = np.dot(data_reduced, pca.components_) # inverse_transform