I am trying to get accustomed to doing singular value decomposition with numpy. I decided to run the SVD on a matrix from an example to understand how it works. I am following this pdf, where A = [[3, 2, 2], [2, 3, -2]]. When I run the SVD, however, I get something different for the matrices U and V than what is provided in the pdf. They are the same matrices, except that the signs have been flipped. Since the signs are flipped on both U and V, the flips cancel out in the product and the result is technically still correct. But why is it this way?
Remember that the columns of U and V are eigenvectors: those of U are eigenvectors of AA^T and those of V are eigenvectors of A^T A. Any nonzero scalar multiple of an eigenvector is still an eigenvector, so as long as you get a scalar multiple of the solution given in the PDF, it is perfectly acceptable. You know the implementation is correct if the singular values (the square roots of the eigenvalues of A^T A) are the same. Since you didn't comment on them in your post, I'm assuming that they are correct. The singular values need to be the same, but the vectors can differ.
In your case, the scaling is by -1, which still gives valid eigenvectors for the same eigenvalues. The difference in sign most likely comes from the way the SVD is calculated. Forming AA^T and A^T A and solving both eigenvector problems directly is costly and numerically less accurate, so libraries use dedicated algorithms that arrive at an equivalent factorization, and the signs those algorithms land on may differ from what you expect.
I'd finally like to point you to this Cross Validated post that talks about the different algorithms that compute the SVD. numpy.linalg.svd itself relies on LAPACK's gesdd routine.
https://stats.stackexchange.com/questions/66034/what-are-efficient-algorithms-to-compute-singular-value-decomposition-svd
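To make the sign argument concrete, here is a minimal sketch using the matrix from the question. It flips the sign of one left/right singular vector pair by hand and checks that the product still rebuilds A. The variable names are only illustrative, and the signs numpy returns on your machine may differ from the ones in the PDF.

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])

# Thin SVD: U is 2x2, s has 2 entries, Vt is 2x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Flip the sign of the first left/right singular vector pair.
U_flipped = U.copy()
Vt_flipped = Vt.copy()
U_flipped[:, 0] *= -1
Vt_flipped[0, :] *= -1

# Both factorizations rebuild A; the singular values are untouched.
A_original = U @ np.diag(s) @ Vt
A_flipped = U_flipped @ np.diag(s) @ Vt_flipped
print(s)
print(np.allclose(A_original, A), np.allclose(A_flipped, A))
```

Note that the flip has to be applied to a matching column of U and row of Vt. Flipping only one of the two changes the sign of that term in the product.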
Related
I have a large sparse square non-normal matrix: 73080 rows, but only 6 nonzero entries per row (and all equal to 1.). I'd like to compute the two largest eigenvalues, as well as the operator (2) norm, ideally with Python. The natural way for me to store this matrix is with scipy's csr_matrix, especially since I'll be multiplying it with other sparse matrices. However, I don't see a good way to compute the relevant statistics: scipy.sparse.linalg's norm method doesn't have the 2-norm implemented and converting to a dense matrix seems like it would be a bad idea, and running scipy.sparse.linalg.eigs seems to run extremely, maybe prohibitively, slowly, and in any event it computes lots of data that I just don't need. I suppose I could subtract off the spectral projector corresponding to the top eigenvalue but then I'd still need to know the top eigenvalue of the new matrix, which I'd like to do with an out-of-the-box method if at all possible, and in any event this wouldn't continue to work after multiplying with other large sparse matrices.
However, these kinds of computations seem to be doable: the top of page 6 of this paper seems to have data on the eigenvalues of ~10000-row matrices. If this is not feasible in Python, is there another way I should try to do this? Thanks in advance.
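One possible starting point, sketched under assumptions: the snippet below builds a small random stand-in (200 rows instead of 73080) with six ones per row, then asks ARPACK through scipy.sparse.linalg.eigs for only the two eigenvalues of largest magnitude, and uses scipy.sparse.linalg.svds for the largest singular value, which equals the operator 2-norm. The matrix size, the random construction and the variable names are illustrative, and nothing here says how long the full-size problem would take.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigs, svds

# Assumption: a small random stand-in for the real matrix, with 6 ones per row.
n, per_row = 200, 6
rng = np.random.default_rng(0)
rows = np.repeat(np.arange(n), per_row)
cols = np.concatenate(
    [rng.choice(n, size=per_row, replace=False) for _ in range(n)]
)
A = sparse.csr_matrix((np.ones(n * per_row), (rows, cols)), shape=(n, n))

# Two eigenvalues of largest magnitude, without converting to a dense matrix.
top_eigenvalues = eigs(A, k=2, which="LM", return_eigenvectors=False)

# The operator 2-norm is the largest singular value.
two_norm = svds(A, k=1, return_singular_vectors=False)[0]

print(top_eigenvalues, two_norm)
```

Because every row sums to 6, the all-ones vector is an eigenvector with eigenvalue 6, which gives a quick sanity check on the output.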
I have written simple PCA code that calculates the covariance matrix and then uses linalg.eig on that covariance matrix to find the principal components. When I use scikit's PCA for three principal components I get almost the same result. My PCA function outputs the third column of the transformed data with signs flipped relative to what scikit's PCA function produces. I think it is more likely that scikit's built-in PCA is correct than that my code is. I have noticed that the third principal component/eigenvector has flipped signs in my case. So if scikit's third eigenvector is (a,-b,-c,-d) then mine is (-a,b,c,d). I might be a bit shaky on my linear algebra, but I assume those are different results. I arrive at my eigenvectors by computing the eigenvectors and eigenvalues of the covariance matrix using linalg.eig. I would gladly try to find the eigenvectors by hand, but doing that for a 4x4 matrix (I am using the iris data set) is not fun.
Iris data set has 4 dimensions, so at most I can run PCA for 4 components. When I run for one component, the results are equivalent. When I run for 2, also equivalent. For three, as I said, my function outputs flipped signs in the third column. When I run for four, again signs are flipped in the third column and all other columns are fine. I am afraid I cannot provide the code for this. This is a project, kind of.
This is expected behaviour, and it is even stated in the documentation of sklearn's PCA:
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
and it is clearly correct from a mathematical perspective: if v is an eigenvector of A with eigenvalue k, then
Av = kv
thus also
A(-v) = -(Av) = -(kv) = k(-v)
So if scikit's third eigenvector is (a,-b,-c,-d) then mine is (-a,b,c,d).
That's completely normal. If v is an eigenvector of a matrix, then -v is an eigenvector with the same eigenvalue.
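The same point can be checked numerically. The sketch below uses random data as a stand-in for a 4-feature data set, verifies that both v and -v satisfy the eigenvector equation, and shows one possible convention for comparing two PCA implementations: flip each eigenvector so that its largest-magnitude entry is positive. The helper align_signs is hypothetical and is not part of numpy or scikit-learn.

```python
import numpy as np

def align_signs(vectors):
    """Flip each column so its largest-magnitude entry is positive (hypothetical helper)."""
    vectors = np.array(vectors, dtype=float)
    idx = np.argmax(np.abs(vectors), axis=0)
    signs = np.sign(vectors[idx, np.arange(vectors.shape[1])])
    signs[signs == 0] = 1.0
    return vectors * signs

rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))      # stand-in for a 4-feature data set
cov = np.cov(data, rowvar=False)

evals, evecs = np.linalg.eigh(cov)    # eigh: symmetric input, ascending eigenvalues
v, k = evecs[:, -1], evals[-1]

# Both v and -v satisfy the eigenvector equation for the same eigenvalue k.
print(np.allclose(cov @ v, k * v), np.allclose(cov @ (-v), k * (-v)))

# After alignment, a copy with the third column flipped matches the original.
flipped = evecs * np.array([1, 1, -1, 1])
print(np.allclose(align_signs(evecs), align_signs(flipped)))
```

Applying the same convention to both sets of eigenvectors before comparing them removes the sign ambiguity without changing the subspaces they span.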
While trying to compute the inverse of a matrix in Python using numpy.linalg.inv(matrix), I get a singular matrix error. Why does this happen? Does it have anything to do with the smallness of the values in the matrix? The numbers in my matrix are probabilities and add up to 1.
It may very well have to do with the smallness of the values in the matrix.
Some matrices that are not, in fact, mathematically singular (that is, their determinant is nonzero) are effectively singular from a practical point of view, in that the math library one is using cannot process them reliably.
Numerical analysis is tricky, as you know, and how well it deals with such situations is a measure of the quality of a matrix library.
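One thing that may be worth checking before inverting is the condition number, which numpy exposes as numpy.linalg.cond. The sketch below wraps the inversion in a hypothetical helper, try_inverse, that gives up when the matrix is singular or badly conditioned. The 1e12 threshold and the two example matrices are assumptions made for illustration, not values taken from the question.

```python
import numpy as np

def try_inverse(m, max_cond=1e12):
    """Return inv(m), or None when m is singular or badly conditioned.

    The 1e12 threshold is an arbitrary choice for this sketch.
    """
    cond = np.linalg.cond(m)
    if not (cond < max_cond):         # also catches inf and nan
        return None
    try:
        return np.linalg.inv(m)
    except np.linalg.LinAlgError:
        return None

# Rows sum to 1 but are identical, so the matrix is exactly singular.
singular = np.array([[0.2, 0.8],
                     [0.2, 0.8]])

# Rows sum to 1 and are linearly independent.
regular = np.array([[0.9, 0.1],
                    [0.3, 0.7]])

print(try_inverse(singular))
print(try_inverse(regular))
```

A matrix of probabilities can be singular simply because two rows (or columns) are equal or proportional, which is a separate issue from the entries being small.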
I have a graph Laplacian for which I need to find the largest k eigenvalues and eigenvectors. I am using something like this:
from scipy.sparse.linalg import eigsh
# L = Laplacian matrix
eigVal, eigVectors = eigsh(L, k, which='LA')
This gives me approximately correct results, but something is going wrong: I am getting eigenvalues slightly greater than 1 (say 1.05), while in my case the eigenvalues are bounded above by 1. When using MATLAB and other platforms I get the expected results.
What am I doing wrong here? Is there any way to parallelize the computation of the eigenvectors and eigenvalues? (I am considering pyCuda.)
Are you sure that your Python implementation of the Laplacian is correct? Did you double-check e.g. that the input matrix is symmetric?
Without having your specific matrix at hand, it is difficult to say what exactly goes wrong. Can you save the matrix and put it somewhere on the internet?
EDIT: removed mention of eigs* previous behavior -- the routine did not have the eigsh name before that, so that's not the case here.
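The symmetry check suggested above can be done without converting to a dense matrix. The following sketch builds a small random undirected graph as a stand-in for the real data, tests the Laplacian for symmetry, and compares the eigsh output against a dense solve, which is only practical at this small size. The helper is_symmetric, the graph construction and the sizes are illustrative assumptions.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

def is_symmetric(m, tol=1e-12):
    """True if the sparse matrix m equals its transpose up to tol."""
    return abs(m - m.T).max() <= tol

# Assumption: a small random undirected graph stands in for the real data.
n, k = 60, 3
rng = np.random.default_rng(0)
upper = np.triu(rng.random((n, n)) < 0.1, k=1).astype(float)
adjacency = upper + upper.T
laplacian = sparse.csr_matrix(np.diag(adjacency.sum(axis=1)) - adjacency)

print(is_symmetric(laplacian))

# k largest (algebraic) eigenvalues from ARPACK, checked against a dense solve.
sparse_vals = eigsh(laplacian, k=k, which="LA", return_eigenvectors=False)
dense_vals = np.linalg.eigvalsh(laplacian.toarray())[-k:]
print(np.allclose(np.sort(sparse_vals), dense_vals))
```

eigsh assumes a symmetric (or Hermitian) input and does not verify it, so an asymmetric matrix can produce eigenvalues that look plausible but are wrong.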
I was wondering if there is a Python package, numpy or otherwise, that has a function that computes the first eigenvalue and eigenvector of a small matrix, say 2x2. I could use the linalg package in numpy as follows.
import numpy as np
def whatever():
    A = np.asmatrix(np.random.rand(2, 2))
    evals, evecs = np.linalg.eig(A)
    # Assume that the eigenvalues are ordered from large to small and that the
    # eigenvectors are ordered accordingly.
    return evals[0], evecs[:, 0]
But this takes a really long time. I suspect that it's because numpy computes eigenvectors through some sort of iterative process. So I was wondering if there were a much faster algorithm that only returns the first (largest) eigenvalue and eigenvector, since I only need the first.
For 2x2 matrices of course I can write a function myself, that computes the eigenvalue and eigenvector analytically, but then there are problems with floating point computations, for example when I divide a very big number by a very small number, I get infinity or NaN. Does anyone know anything about this? Please help! Thank you in advance!
Use this: http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html
http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.eigs.html#scipy.sparse.linalg.eigs
Find k eigenvalues and eigenvectors of the square matrix A.
According to the docs:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.eig.html
and also in my own experience, numpy.linalg.eig(A) does NOT return the eigenvalues and eigenvectors in any particular order, which is what the OP and subsequent answers seem to be assuming. I suggest something like:
rearrangedEvalsVecs = sorted(zip(evals, evecs.T),
                             key=lambda x: x[0].real, reverse=True)
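An equivalent way to do the reordering, sketched here with numpy.argsort, keeps the results as arrays instead of a list of tuples. The diagonal test matrix and the variable names are only for illustration.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 5.0]])
evals, evecs = np.linalg.eig(A)

# Indices that sort the eigenvalues by real part, largest first.
order = np.argsort(evals.real)[::-1]
evals_sorted = evals[order]
evecs_sorted = evecs[:, order]    # columns stay paired with their eigenvalues

largest_value, largest_vector = evals_sorted[0], evecs_sorted[:, 0]
print(largest_value, largest_vector)
```

Indexing the columns of evecs with the same order array is what keeps each eigenvector attached to its eigenvalue.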
There doesn't appear to be a numpy equivalent of Matlab's eigs(A,B,k) for finding the k largest eigenvalues and their eigenvectors.
If you're interested, Enthought has compiled a table showing the differences between Matlab and numpy. That should be helpful for answering questions like this one: Link
One other thought: for 2x2 matrices, I don't think eigs(A,B,1) would help anyway. The effort involved in computing the first eigenpair leaves the matrix transformed to the point where the second emerges directly. There is only a benefit for 3x3 and larger.
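For the 2x2 case specifically, a closed form is cheap to write down. The sketch below covers only the real symmetric case [[a, b], [b, d]], which is an assumption that goes beyond the question (the general 2x2 case can have complex eigenvalues). It uses math.hypot for the square root and picks the eigenvector expression that avoids subtracting nearly equal numbers. The function name is hypothetical.

```python
import math

def largest_eigenpair_sym2x2(a, b, d):
    """Largest eigenvalue and a unit eigenvector of [[a, b], [b, d]].

    Assumes a real symmetric matrix; hypothetical helper for illustration.
    """
    mean = 0.5 * (a + d)
    diff = 0.5 * (a - d)
    radius = math.hypot(diff, b)      # avoids overflow in sqrt(diff**2 + b**2)
    value = mean + radius
    if radius == 0.0:                 # multiple of the identity: any vector works
        return value, (1.0, 0.0)
    # Pick the form whose large component is a sum of same-sign terms.
    if diff >= 0.0:
        x, y = diff + radius, b       # (value - d, b)
    else:
        x, y = b, radius - diff       # (b, value - a)
    norm = math.hypot(x, y)
    return value, (x / norm, y / norm)

value, vector = largest_eigenpair_sym2x2(1.0, 2.0, 4.0)
print(value, vector)
```

The two branches are the two rows of (A - value*I)v = 0 solved for v. Either is valid mathematically, and the choice only affects rounding error.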