I am trying to implement SVD using np.linalg.eig method for an image compression assignment. We are not allowed to use the np.linalg.svd method directly.
Here is my svd method:
import math
import numpy as np
from numpy import linalg as LA

def svd(A):
    evals, U = LA.eig(A @ A.T)
    evals2, V = LA.eig(A.T @ A)
    idx = evals.argsort()[::-1]
    evals = evals[idx]
    U = U[:, idx]
    idx = evals2.argsort()[::-1]
    V = V[:, idx]
    sigma = np.array(list(map(math.sqrt, evals)))
    return U, sigma, V.T
But when I try to reconstruct the image using U and V returned by the above svd, the error rate is so much that the image is completely blurry even after using all the singular vectors. Whereas when I try the same reconstruction procedure with the U & V matrices returned by np.linalg.svd, I am able to clearly reconstruct the image.
Please let me know if there is anything wrong with my svd method.
Neither the SVD nor eigenvectors are fully unique. In an SVD you can flip the sign of any vector in U as long as you do the same to the corresponding vector in V. The eigenvectors you get from two separate eig calls are not coupled in that way, so there is a good chance of a sign mismatch. You can check and correct for that by using the fact that U.T @ A @ V.T is sigma (here U and V are the first and third values your function returns, so the rows of V are the right singular vectors): check the signs of the diagonal elements of U.T @ A @ V.T and, for each negative one, flip the corresponding vector in either U or V (but not in both).
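The sign check described above can be sketched as follows. The helper name align_signs is made up for illustration, and the sketch assumes a real matrix with distinct, nonzero singular values whose U columns and V rows are already sorted in the same order.

```python
import numpy as np

def align_signs(U, A, Vh):
    """Flip rows of Vh so that the diagonal of U.T @ A @ Vh.T is non-negative.

    Assumption: A is real, its singular values are distinct and nonzero, and
    U (columns) and Vh (rows) are sorted consistently.
    """
    r = min(A.shape)                       # number of singular values
    U, Vh = U[:, :r], Vh[:r, :]
    # diag[j] = U[:, j] @ A @ Vh[j, :], computed without the full product
    diag = np.einsum('ij,ik,jk->j', U, A, Vh)
    signs = np.where(diag < 0, -1.0, 1.0)
    return U, Vh * signs[:, None]          # flip only Vh, never both
```

After this step, U @ np.diag(sigma) @ Vh should reproduce A up to round-off, provided the assumptions hold.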
Additional suggestions:
Since you only need the diagonal elements, it is wasteful to compute the full product U.T @ A @ V.T; the simplest way to compute only the diagonal elements would be np.einsum('ij,ik,jk->j', U, A, V).
Use eigh instead of eig because you know A @ A.T and A.T @ A are symmetric.
You can save one eigendecomposition because sigma @ V = U.T @ A, and sigma, being diagonal, is easy to invert (as long as no singular value is zero). This also has the advantage that the above sign problem can't happen.
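A minimal sketch of the last two suggestions combined, assuming a real input matrix. The function name svd_via_eigh and the rtol cutoff are illustrative choices, not part of NumPy; the cutoff is there because the square roots of near-zero eigenvalues are dominated by round-off and cannot be safely inverted.

```python
import numpy as np

def svd_via_eigh(A, rtol=1e-6):
    """Thin SVD of a real matrix from a single symmetric eigendecomposition.

    rtol is a hypothetical cutoff: singular values below rtol * largest are
    dropped together with their vectors.
    """
    A = np.asarray(A, dtype=float)
    evals, U = np.linalg.eigh(A @ A.T)         # eigh returns ascending order
    order = np.argsort(evals)[::-1]            # largest first
    evals = np.clip(evals[order], 0.0, None)   # round-off can give tiny negatives
    sigma = np.sqrt(evals)
    keep = sigma > rtol * sigma[0]
    U, sigma = U[:, order][:, keep], sigma[keep]
    Vh = (U.T @ A) / sigma[:, None]            # from sigma @ V = U.T @ A
    return U, sigma, Vh
```

Because Vh is derived from U, the signs of the two sets of vectors are consistent by construction, so U @ np.diag(sigma) @ Vh reconstructs A without any sign correction.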
Related
I'm trying to diagonalize a unitary matrix using numpy, in particular the numpy.linalg.eig function. Since the matrix is unitary, its eigenvectors are supposed to form an orthonormal basis. However, it seems that this is not the case:
import numpy as np
from qiskit.circuit.library import QFT
from qiskit.quantum_info import Operator
op = Operator(QFT(num_qubits=4, do_swaps=True)).data
eigvals, eigvecs = np.linalg.eig(op) # Compute the eigenvalues
eigvecs = eigvecs.T # Since the eigenvectors are arranged in columns
lambda_i = eigvals[2]
lambda_j = eigvals[-1]
v_i = eigvecs[2].reshape((-1, 1))
v_j = eigvecs[-1].reshape((-1, 1))
print(np.linalg.norm(op @ v_i - lambda_i * v_i)) # Should be close to 0 by definition, actually yields 6.706985734026871e-16, which is fine
print(np.linalg.norm(op @ v_j - lambda_j * v_j)) # Should be close to 0 by definition, actually yields 8.151878100248519e-16, which is fine
print(v_j.T.conj() @ v_i) # Should be close to zero since the basis is supposed to be orthonormal, but actually yields array([[-0.15147621-0.06735767j]]), which is not fine
print(np.linalg.norm(op.T.conj() @ op - np.eye(16))) # Should be around zero if and only if op is unitary, actually yields 2.3337334181537826e-15, which is fine
I've tested it with the qiskit library, however my question is purely numpy-related. If required, the same matrix can be loaded using the following .npy file:
b"\x93NUMPY\x01\x00v\x00{'descr': '<c16', 'fortran_order': False, 'shape': (16, 16), } \n\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00C\x8d2\xcfk\x90\xcd?`\xa9\xae\xa6\xe2}\xb8?\xcb;\x7ff\x9e\xa0\xc6?\xca;\x7ff\x9e\xa0\xc6?c\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd?\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?^\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd?\xc9;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?A\x8d2\xcfk\x90\xcd\xbfc\xa9\xae\xa6\xe2}\xb8?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00C\x8d2\xcfk\x90\xcd\xbf`\xa9\xae\xa6\xe2}\xb8\xbf\xcb;\x7ff\x9e\xa0\xc6\xbf\xca;\x7ff\x9e\xa0\xc6\xbfc\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd\xbf\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf^\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd\xbf\xc9;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbfA\x8d2\xcfk\x90\xcd?c\xa9\xae\xa6\xe2}\xb8\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xcb;\x7ff\x9e\xa0\xc6?\xca;\x7ff\x9e\xa0\xc6?\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\
xff\xcf?\xca;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xcb;\x7ff\x9e\xa0\xc6\xbf\xca;\x7ff\x9e\xa0\xc6\xbf\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xca;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xcb;\x7ff\x9e\xa0\xc6?\xca;\x7ff\x9e\xa0\xc6?\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xca;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xcb;\x7ff\x9e\xa0\xc6\xbf\xca;\x7ff\x9e\xa0\xc6\xbf\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xca;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00a\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd?\xca;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?D\x8d2\xcfk\x90\xcd\xbf[\xa9\xae\xa6\xe2}\xb8\xbf\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbfC\x8d2\xcfk\x90\xcd?c\xa9\xae\xa6\xe2}\xb8\xbf\xcb;\x7ff\x9e\xa0\xc6?\xc9;\x7ff\x9e\xa0\xc6?Z\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00a\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd\xbf\xca;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbfD\x8d2\xcfk\x90\xcd?[\xa9\xae\xa6\xe2}\xb8?\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?C\x8d2\xcfk\x90\xcd\xbfc\xa9\xae\xa6\xe2}\xb8?\xcb;\x7ff\x9e\xa0\xc6\xbf\xc9;\x7ff\x9e\xa0\xc6\xbfZ\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\x05
\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00^\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd?\xcb;\x7ff\x9e\xa0\xc6\xbf\xca;\x7ff\x9e\xa0\xc6\xbfC\x8d2\xcfk\x90\xcd?d\xa9\xae\xa6\xe2}\xb8\xbf\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?C\x8d2\xcfk\x90\xcd\xbf]\xa9\xae\xa6\xe2}\xb8\xbf\xc9;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbfd\xa9\xae\xa6\xe2}\xb8?A\x8d2\xcfk\x90\xcd?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00^\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd\xbf\xcb;\x7ff\x9e\xa0\xc6?\xca;\x7ff\x9e\xa0\xc6?C\x8d2\xcfk\x90\xcd\xbfd\xa9\xae\xa6\xe2}\xb8?\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbfC\x8d2\xcfk\x90\xcd?]\xa9\xae\xa6\xe2}\xb8?\xc9;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?d\xa9\xae\xa6\xe2}\xb8\xbfA\x8d2\xcfk\x90\xcd\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xc9;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xcb;\x7ff\x9e\xa0\xc6?\xc9;\x7ff\x9e\xa0\xc6?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xc9;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbf\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xcb;\x7ff\x9e\xa0\xc6\xbf\xc9;\x7ff\x9e\xa0\xc6\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xc9;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xcb;\x7ff\x9e\xa0\xc6?\xc9;\x7ff\x9e\xa0\xc6?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xc9;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbf\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xcb;\x7ff\x9e\xa0\xc6\xbf\xc9;\x7ff
\x9e\xa0\xc6\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00C\x8d2\xcfk\x90\xcd\xbfc\xa9\xae\xa6\xe2}\xb8?\xca;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbfZ\xa9\xae\xa6\xe2}\xb8\xbfD\x8d2\xcfk\x90\xcd?\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbfd\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd?\xcb;\x7ff\x9e\xa0\xc6\xbf\xc9;\x7ff\x9e\xa0\xc6\xbfC\x8d2\xcfk\x90\xcd?Y\xa9\xae\xa6\xe2}\xb8?\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00C\x8d2\xcfk\x90\xcd?c\xa9\xae\xa6\xe2}\xb8\xbf\xca;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?Z\xa9\xae\xa6\xe2}\xb8?D\x8d2\xcfk\x90\xcd\xbf\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?d\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd\xbf\xcb;\x7ff\x9e\xa0\xc6?\xc9;\x7ff\x9e\xa0\xc6?C\x8d2\xcfk\x90\xcd\xbfY\xa9\xae\xa6\xe2}\xb8\xbf\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00C\x8d2\xcfk\x90\xcd\xbf`\xa9\xae\xa6\xe2}\xb8\xbf\xcb;\x7ff\x9e\xa0\xc6?\xca;\x7ff\x9e\xa0\xc6?c\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x
90\xcd\xbf\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?^\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd\xbf\xc9;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?A\x8d2\xcfk\x90\xcd?c\xa9\xae\xa6\xe2}\xb8\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00C\x8d2\xcfk\x90\xcd?`\xa9\xae\xa6\xe2}\xb8?\xcb;\x7ff\x9e\xa0\xc6\xbf\xca;\x7ff\x9e\xa0\xc6\xbfc\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd?\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf^\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd?\xc9;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbfA\x8d2\xcfk\x90\xcd\xbfc\xa9\xae\xa6\xe2}\xb8?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xcb;\x7ff\x9e\xa0\xc6\xbf\xca;\x7ff\x9e\xa0\xc6\xbf\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xca;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xcb;\x7ff\x9e\xa0\xc6?\xca;\x7ff\x9e\xa0\xc6?\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xca;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xcb;\x7ff\x9e\xa0\xc6\xbf\xca;\x7ff\x9e\xa0\xc6\xbf\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xca;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xcb;\x7ff\x9e\xa0\xc6?\xca;\x7ff\x9e\xa0\xc6?\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xca;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00a\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd\xbf\xca;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?D\x8d2\xcfk\x90\xcd?[\xa9\xae\xa6\xe2}\xb8?\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbfC\x8d2\xcfk\x90\xcd\xbfc\xa9\xae\xa6\xe2}\xb8?\xcb;\x7ff\x9e\xa0\xc6?\xc9;\x7ff\x9e\xa0\xc6?Z\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00a\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd?\xca;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbfD\x8d2\xcfk\x90
\xcd\xbf[\xa9\xae\xa6\xe2}\xb8\xbf\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?C\x8d2\xcfk\x90\xcd?c\xa9\xae\xa6\xe2}\xb8\xbf\xcb;\x7ff\x9e\xa0\xc6\xbf\xc9;\x7ff\x9e\xa0\xc6\xbfZ\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00^\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd\xbf\xcb;\x7ff\x9e\xa0\xc6\xbf\xca;\x7ff\x9e\xa0\xc6\xbfC\x8d2\xcfk\x90\xcd\xbfd\xa9\xae\xa6\xe2}\xb8?\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?C\x8d2\xcfk\x90\xcd?]\xa9\xae\xa6\xe2}\xb8?\xc9;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbfd\xa9\xae\xa6\xe2}\xb8\xbfA\x8d2\xcfk\x90\xcd\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00^\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd?\xcb;\x7ff\x9e\xa0\xc6?\xca;\x7ff\x9e\xa0\xc6?C\x8d2\xcfk\x90\xcd?d\xa9\xae\xa6\xe2}\xb8\xbf\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbfC\x8d2\xcfk\x90\xcd\xbf]\xa9\xae\xa6\xe2}\xb8\xbf\xc9;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?d\xa9\xae\xa6\xe2}\xb8?A\x8d2\xcfk\x90\xcd?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xc9;\x7ff\x9e\xa0\xc6?\x
cb;\x7ff\x9e\xa0\xc6\xbf\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xcb;\x7ff\x9e\xa0\xc6\xbf\xc9;\x7ff\x9e\xa0\xc6\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xc9;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xcb;\x7ff\x9e\xa0\xc6?\xc9;\x7ff\x9e\xa0\xc6?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00\xc9;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbf\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbf\xcb;\x7ff\x9e\xa0\xc6\xbf\xc9;\x7ff\x9e\xa0\xc6\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00\xc9;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?\xcb;\x7ff\x9e\xa0\xc6?\xc9;\x7ff\x9e\xa0\xc6?\xfd\xff\xff\xff\xff\xff\xcf?\x00\x00\x00\x00\x00\x00\x00\x00C\x8d2\xcfk\x90\xcd?c\xa9\xae\xa6\xe2}\xb8\xbf\xca;\x7ff\x9e\xa0\xc6?\xcb;\x7ff\x9e\xa0\xc6\xbfZ\xa9\xae\xa6\xe2}\xb8?D\x8d2\xcfk\x90\xcd\xbf\x05\\\x143&\xa6q\xbc\xfd\xff\xff\xff\xff\xff\xcf\xbfd\xa9\xae\xa6\xe2}\xb8\xbfC\x8d2\xcfk\x90\xcd\xbf\xcb;\x7ff\x9e\xa0\xc6\xbf\xc9;\x7ff\x9e\xa0\xc6\xbfC\x8d2\xcfk\x90\xcd\xbfY\xa9\xae\xa6\xe2}\xb8\xbf\xfd\xff\xff\xff\xff\xff\xcf\xbf\x00\x00\x00\x00\x00\x00\x00\x00C\x8d2\xcfk\x90\xcd\xbfc\xa9\xae\xa6\xe2}\xb8?\xca;\x7ff\x9e\xa0\xc6\xbf\xcb;\x7ff\x9e\xa0\xc6?Z\xa9\xae\xa6\xe2}\xb8\xbfD\x8d2\xcfk\x90\xcd?\x05\\\x143&\xa6q<\xfd\xff\xff\xff\xff\xff\xcf?d\xa9\xae\xa6\xe2}\xb8?C\x8d2\xcfk\x90\xcd?\xcb;\x7ff\x9e\xa0\xc6?\xc9;\x7ff\x9e\xa0\xc6?C\x8d2\xcfk\x90\xcd?Y\xa9\xae\xa6\xe2}\xb8?"
The behavior seems to occur mainly for purely imaginary eigenvalues (that is, with zero real part). For most of the other eigenvectors, the scalar product does yield 0, but I was able to replicate this behavior on others.
I don't think it is due to some numerical approximation, since the final scalar product is quite large.
I don't think it is due to a conjugation that I forgot. As a sanity check, I've ensured that v_i and v_j are indeed eigenvectors of op, which means they are correct, and which means they should be orthogonal.
The error in the reasoning comes from this part:
As a sanity check, I've ensured that v_i and v_j are indeed eigenvectors of op, which means they are correct, and which means they should be orthogonal.
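If v_i and v_j are associated with the same eigenvalue lambda, then (v_i+v_j)/sqrt(2) is also an eigenvector associated with lambda, and it isn't orthogonal to v_i or v_j. So when an eigenvalue is repeated, eig is free to return any basis of the corresponding eigenspace, orthogonal or not.
In order to ensure that the eigenvectors are orthogonal, apply a QR decomposition to the matrix whose columns are the eigenvectors (that is, the matrix as returned by np.linalg.eig, before the transpose in your code):
eigvecs = np.linalg.qr(eigvecs)[0]
The columns of eigvecs will now be orthonormal eigenvectors of op. This works because op is unitary (hence normal): eigenvectors belonging to different eigenvalues are already orthogonal, so the orthogonalization only mixes vectors within the same eigenspace.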
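A small hand-built illustration of the point above, using a hypothetical 3x3 unitary matrix with a repeated eigenvalue rather than the QFT operator from the question. The eigenvectors are deliberately chosen non-orthogonal, stored as columns, and then repaired with QR.

```python
import numpy as np

# A small unitary matrix with a repeated eigenvalue (1, 1, -1).
op = np.diag([1.0, 1.0, -1.0]).astype(complex)
eigvals = np.array([1.0, 1.0, -1.0], dtype=complex)

# Valid but non-orthogonal eigenvectors, stored as columns:
# the second column mixes the two eigenvectors of eigenvalue 1.
eigvecs = np.array([[1.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]], dtype=complex)
eigvecs /= np.linalg.norm(eigvecs, axis=0)

# Overlap of the first two eigenvectors before orthogonalization (nonzero).
overlap_before = abs(np.vdot(eigvecs[:, 0], eigvecs[:, 1]))

Q = np.linalg.qr(eigvecs)[0]                     # orthonormalize the columns
gram = Q.conj().T @ Q                            # should be the identity
residual = np.linalg.norm(op @ Q - Q * eigvals)  # columns are still eigenvectors
```

Here gram is the identity up to round-off and residual stays near zero, so the columns of Q are both orthonormal and still eigenvectors of op.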
I was going through the book called Hands-On Machine Learning with Scikit-Learn, Keras and Tensorflow and the author was explaining how the pseudo-inverse (Moore-Penrose inverse) of a matrix is calculated in the context of Linear Regression. I'm quoting verbatim here:
The pseudoinverse itself is computed using a standard matrix
factorization technique called Singular Value Decomposition (SVD) that
can decompose the training set matrix X into the matrix
multiplication of three matrices U Σ VT (see numpy.linalg.svd()). The
pseudoinverse is calculated as X+ = V * Σ+ * UT. To compute the matrix
Σ+, the algorithm takes Σ and sets to zero all values smaller than a
tiny threshold value, then it replaces all nonzero values with their
inverse, and finally it transposes the resulting matrix. This approach
is more efficient than computing the Normal equation.
I've got an understanding of how the pseudo-inverse and SVD are related from this post. But I'm not able to grasp the rationale behind setting all values less than the threshold to zero. The inverse of a diagonal matrix is obtained by taking the reciprocals of the diagonal elements. Then small values would be converted to large values in the inverse matrix, right? Then why are we removing the large values?
I went and looked into the numpy code, and it looks as follows, just for reference:
@array_function_dispatch(_pinv_dispatcher)
def pinv(a, rcond=1e-15, hermitian=False):
    a, wrap = _makearray(a)
    rcond = asarray(rcond)
    if _is_empty_2d(a):
        m, n = a.shape[-2:]
        res = empty(a.shape[:-2] + (n, m), dtype=a.dtype)
        return wrap(res)
    a = a.conjugate()
    u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)

    # discard small singular values
    cutoff = rcond[..., newaxis] * amax(s, axis=-1, keepdims=True)
    large = s > cutoff
    s = divide(1, s, where=large, out=s)
    s[~large] = 0

    res = matmul(transpose(vt), multiply(s[..., newaxis], transpose(u)))
    return wrap(res)
It's almost certainly an adjustment for numerical error. To see why this might be necessary, look what happens when you take the svd of a rank-one 2x2 matrix. We can create a rank-one matrix by taking the outer product of a vector like so:
>>> a = numpy.arange(2) + 1
>>> A = a[:, None] * a[None, :]
>>> A
array([[1, 2],
[2, 4]])
Although this is a 2x2 matrix, it only has one linearly independent column, and so its rank is one instead of two. So we should expect that when we pass it to svd, one of the singular values will be zero. But look what happens:
>>> U, s, V = numpy.linalg.svd(A)
>>> s
array([5.00000000e+00, 1.98602732e-16])
What we actually get is a singular value that is not quite zero. This result is inevitable in many cases given that we are working with finite-precision floating point numbers. So although the problem you have identified is a real one, we will not be able to tell in practice the difference between a matrix that really has a very small singular value and a matrix that ought to have a zero singular value but doesn't. Setting small values to zero is the safest practical way to handle that problem.
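To make the effect concrete, here is a minimal sketch of the thresholding step applied to the same rank-one matrix. The helper pinv_via_svd is made up for illustration; it mirrors the idea in the NumPy source quoted in the question but is not NumPy's implementation.

```python
import numpy as np

def pinv_via_svd(A, rcond=1e-15):
    """Pseudoinverse via SVD, zeroing singular values below rcond * largest."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    large = s > rcond * s.max()
    s_inv = np.zeros_like(s)
    s_inv[large] = 1.0 / s[large]          # invert only the trusted values
    return Vh.conj().T @ (s_inv[:, None] * U.conj().T)

a = np.arange(2) + 1
A = a[:, None] * a[None, :]                # rank-one matrix [[1, 2], [2, 4]]

A_pinv = pinv_via_svd(A)                   # thresholded: entries stay small

# With rcond=0 the round-off singular value is inverted too. If it is tiny
# but not exactly zero, the result typically has entries of order 1e15.
A_naive = pinv_via_svd(A, rcond=0.0)
```

The thresholded result agrees with np.linalg.pinv(A), while the unthresholded one is dominated by the reciprocal of a number that is pure round-off.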
I am trying to diagonalize an n*100*100 3D array K with numpy.linalg.eig to get the eigenvalues w and eigenvectors v. Each matrix is 100*100; n is the number of matrices I stack so that the computation is broadcast over them. The matrices are not Hermitian.
w,v=np.linalg.eig(K)
At first I tried n=1000 and got real eigenvalues and eigenvectors, i.e. xxxxxxxxxe+xx, but when I tried n=2000, the elements of w and v showed up as xxxxxxxxxe+xx+0.j. Because of the +0.j, further calculations with w and v produced complex numbers.
I thought this was caused by floating-point error in the algorithm, but why does it happen?
Does numpy.linalg use LAPACK? Could the error come from LAPACK?
How can I get rid of +0.j?
According to the documentation, numpy.linalg.eig uses (for real arguments) the LAPACK routine DGEEV, which does not make any assumptions about the input matrix (apart from being real). If the matrix is, within floating point precision, sufficiently symmetric, the imaginary part of the returned eigenvalues will be zero (the output argument WI of DGEEV). However, due to finite precision, you might get some spurious imaginary parts.
EDIT:
If you are sure that your matrices have only real eigenvalues, you could strip the imaginary part with numpy.real or, if the matrices are symmetric, use numpy.linalg.eigh, which is specialized for symmetric matrices.
As for numpy.linalg.eig, the relevant part in numpy/linalg/linalg.py is:
w, vt = _umath_linalg.eig(a, signature=signature, extobj=extobj)

if not isComplexType(t) and all(w.imag == 0.0):
    w = w.real
    vt = vt.real
    result_t = _realType(result_t)
else:
    result_t = _complexType(result_t)
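So the test is a strict comparison, all(w.imag == 0.0), taken over the entire result, and only then are the eigenvalues cast to real with w = w.real. For a stacked input this means that a single eigenvalue with a nonzero imaginary part in any of the n matrices makes the whole output complex, and every other entry is then displayed with +0.j.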
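A small sketch of that all-or-nothing behavior on a stack of 2x2 matrices (the arrays K_real and K_mixed are made up for illustration). It also shows np.real_if_close, which drops imaginary parts only when they are negligible, as one hedged way to get rid of +0.j per matrix.

```python
import numpy as np

# Two stacked matrices with real eigenvalues only.
K_real = np.stack([np.diag([1.0, 2.0]), np.diag([3.0, 4.0])])

# Same first matrix, but the second one is a rotation with eigenvalues +-1j.
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])
K_mixed = np.stack([np.diag([1.0, 2.0]), rotation])

w_real, _ = np.linalg.eig(K_real)     # every imaginary part is zero: real dtype
w_mixed, _ = np.linalg.eig(K_mixed)   # one complex pair: whole result is complex
print(w_real.dtype, w_mixed.dtype)

# Eigenvalues of the first matrix now carry +0.j; real_if_close converts them
# back to floats only if all imaginary parts are within tolerance of zero.
first = np.real_if_close(w_mixed[0])
```

Applying np.real_if_close to w_mixed[1] would leave it complex, which is the safer behavior compared to calling numpy.real blindly.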
I found some examples online showing how to find the null space of a regular matrix in Python, but I couldn't find any examples for a sparse matrix (scipy.sparse.csr_matrix).
By null space I mean x such that M·x = 0, where '·' is matrix multiplication. Does anybody know how to do this?
Furthermore, in my case I know that the null space will consist of a single vector. Can this information be used to improve the efficiency of the method?
This isn't a complete answer yet, but hopefully it will be a starting point towards one. You should be able to compute the null space using a variant on the SVD-based approach shown for dense matrices in this question:
import numpy as np
from scipy import sparse
import scipy.sparse.linalg
def rand_rank_k(n, k, **kwargs):
    "generate a random (n, n) sparse matrix of rank <= k"
    a = sparse.rand(n, k, **kwargs)
    b = sparse.rand(k, n, **kwargs)
    return a.dot(b)
# I couldn't think of a simple way to generate a random sparse matrix with known
# rank, so I'm currently using a dense matrix for proof of concept
n = 100
M = rand_rank_k(n, n - 1, density=1)
# # this seems like it ought to work, but it doesn't
# u, s, vh = sparse.linalg.svds(M, k=1, which='SM')
# this works OK, but obviously converting your matrix to dense and computing all
# of the singular values/vectors is probably not feasible for large sparse matrices
u, s, vh = np.linalg.svd(M.todense(), full_matrices=False)
tol = np.finfo(M.dtype).eps * M.nnz
null_space = vh.compress(s <= tol, axis=0).conj().T
print(null_space.shape)
# (100, 1)
print(np.allclose(M.dot(null_space), 0))
# True
If you know that the null space consists of a single vector, then in principle you would only need to compute the smallest singular value/vector of M. It ought to be possible to do this using scipy.sparse.linalg.svds, i.e.:
u, s, vh = sparse.linalg.svds(M, k=1, which='SM')
null_space = vh.conj().ravel()
Unfortunately, scipy's svds seems to be badly behaved when finding small singular values of singular or near-singular matrices and usually either returns NaNs or throws an ArpackNoConvergence error.
I'm not currently aware of an alternative implementation of truncated SVD with Python bindings that will work on sparse matrices and can selectively find the smallest singular values - perhaps someone else knows of one?
Edit
As a side note, the second approach seems to work reasonably well using MATLAB or Octave's svds function:
>> M = rand(100, 99) * rand(99, 100);
% svds converges much more reliably if you set sigma to something small but nonzero
>> [U, S, V] = svds(M, 1, 1E-9);
>> max(abs(M * V))
ans = 1.5293e-10
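The same small-but-nonzero shift idea can be tried in SciPy by running shift-invert eigsh on M.T @ M. This is only a sketch under stated assumptions: M is real, its null space is one-dimensional, and the helper null_vector, the shift value, and the test matrix (a path-graph Laplacian, whose null vector is constant) are all chosen for illustration. Forming M.T @ M squares the condition number, so accuracy may suffer on harder problems.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

def null_vector(M, shift=-1e-6):
    """Approximate unit vector x with M @ x ~ 0 (assumes a real M, 1-D null space)."""
    gram = (M.T @ M).tocsc()       # symmetric positive semidefinite
    # Shift-invert around a slightly negative shift, so gram - shift*I is
    # positive definite and can be factorized; the eigenvalue nearest the
    # shift is the (numerically) zero one.
    vals, vecs = eigsh(gram, k=1, sigma=shift, which='LM')
    return vecs[:, 0]

# Illustrative sparse matrix with a known null vector (all ones).
n = 50
main = np.full(n, 2.0)
main[[0, -1]] = 1.0
off = -np.ones(n - 1)
M = sparse.diags([off, main, off], offsets=[-1, 0, 1], format='csr')

x = null_vector(M)
print(np.linalg.norm(M @ x))       # should be close to zero
```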
I have been trying to find a solution to the same problem. SciPy's svds function gives unreliable results for small singular values, so I have been using a QR decomposition instead. The sparseqr package (https://github.com/yig/PySPQR) provides a wrapper for the SuiteSparseQR method used by MATLAB, and works reasonably well. Using this, the null space can be calculated as:
from sparseqr import qr
Q, _, _,r = qr( M.transpose() )
N = Q.tocsr()[:,r:]
My code:
from numpy import *
def pca(orig_data):
    data = array(orig_data)
    data = (data - data.mean(axis=0)) / data.std(axis=0)
    u, s, v = linalg.svd(data)
    print(s)  # should be s**2 instead!
    print(v)

def load_iris(path):
    lines = []
    with open(path) as input_file:
        lines = input_file.readlines()
    data = []
    for line in lines:
        cur_line = line.rstrip().split(',')
        cur_line = cur_line[:-1]
        cur_line = [float(elem) for elem in cur_line]
        data.append(array(cur_line))
    return array(data)

if __name__ == '__main__':
    data = load_iris('iris.data')
    pca(data)
The iris dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
Output:
[ 20.89551896 11.75513248 4.7013819 1.75816839]
[[ 0.52237162 -0.26335492 0.58125401 0.56561105]
[-0.37231836 -0.92555649 -0.02109478 -0.06541577]
[ 0.72101681 -0.24203288 -0.14089226 -0.6338014 ]
[ 0.26199559 -0.12413481 -0.80115427 0.52354627]]
Desired Output:
Eigenvalues - [2.9108 0.9212 0.1474 0.0206]
Principal Components - Same as I got but transposed so okay I guess
Also, what's with the output of the linalg.eig function? According to the PCA description on Wikipedia, I'm supposed to do this:
cov_mat = cov(orig_data)
val, vec = linalg.eig(cov_mat)
print(val)
But it doesn't really match the output in the tutorials I found online. Plus, if I have 4 dimensions, I thought I should have 4 eigenvalues and not 150 like the eig gives me. Am I doing something wrong?
edit: I've noticed that the values differ by 150, which is the number of elements in the dataset. Also, the eigenvalues are supposed to add to be equal to the number of dimensions, in this case, 4. What I don't understand is why this difference is happening. If I simply divided the eigenvalues by len(data) I could get the result I want, but I don't understand why. Either way the proportion of the eigenvalues isn't altered, but they are important to me so I'd like to understand what's going on.
You decomposed the wrong matrix.
Principal Component Analysis requires manipulating the eigenvectors/eigenvalues of the covariance matrix, not the data itself. The covariance matrix of an m x n data matrix (m observations in rows, n features in columns) is an n x n matrix; if the data are standardized first it becomes the correlation matrix, which has ones along the main diagonal.
You can indeed use the cov function, but you need further manipulation of your data. It's probably a little easier to use a similar function, corrcoef:
import numpy as NP
import numpy.linalg as LA
# a simulated data set with 8 data points, each point having five features
# (cast to float so the in-place mean centering below is allowed)
data = NP.random.randint(0, 10, 40).reshape(8, 5).astype(float)
# usually a good idea to mean center your data first:
data -= NP.mean(data, axis=0)
# calculate the correlation matrix (the covariance matrix of the standardized data)
C = NP.corrcoef(data, rowvar=0)
# returns an n x n matrix, here a 5 x 5 matrix
# now get the eigenvalues/eigenvectors of C:
evals, evecs = LA.eig(C)
To get the eigenvectors/eigenvalues, I did not decompose the covariance matrix using SVD, though you certainly can. My preference is to calculate them using eig in NumPy's (or SciPy's) LA module: it is a little easier to work with than svd, since the return values are the eigenvectors and eigenvalues themselves, and nothing else. By contrast, as you know, svd doesn't return these directly.
Granted, the SVD function will decompose any matrix, not just square ones (to which the eig function is limited); however, when doing PCA you'll always have a square matrix to decompose, regardless of the form that your data is in. This is because the matrix you are decomposing in PCA is a covariance matrix, which by definition is always square (its rows and columns both index the features of the original data, and each cell is the covariance of the corresponding pair of features; in the correlation matrix the main diagonal is all ones because each feature is perfectly correlated with itself).
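The left singular vectors returned by SVD(A) are the eigenvectors of AA^T.
The covariance matrix of a mean-centered dataset A (one observation per column) is: 1/(N-1) * AA^T
So when you do PCA by using the SVD, you have to divide the squared singular values by (N-1) (equivalently, divide each entry of A by sqrt(N-1) before the decomposition) to get the eigenvalues of the covariance matrix with the correct scale.
In your case, N=150 and you haven't done this division, hence the discrepancy. (Your code standardizes with data.std, which divides by N rather than N-1, so there the exact factor is N: 20.8955**2 / 150 = 2.9108, the first desired eigenvalue.)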
This is explained in detail here
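The scaling can be checked numerically. The sketch below uses synthetic random data as a stand-in for the 150 x 4 iris matrix (observations in rows, as in the question's code), so the numbers themselves are not the iris values; only the relationship between the two computations matters.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a 150 x 4 data set with correlated features.
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))

# Standardize exactly as in the question (np.std defaults to ddof=0).
Z = (X - X.mean(axis=0)) / X.std(axis=0)
s = np.linalg.svd(Z, compute_uv=False)      # singular values, descending

eig_from_svd = s**2 / len(Z)                # divide by N because ddof=0
# Eigenvalues of the correlation matrix, reversed to descending order.
eig_from_corr = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

print(eig_from_svd, eig_from_corr, eig_from_svd.sum())
```

The two arrays agree, and the eigenvalues sum to the number of features, which matches the observation in the question's edit.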
(Can you ask one question, please? Or at least list your questions separately. Your post reads like a stream of consciousness because you are not asking one single question.)
You probably used cov incorrectly by not transposing the matrix first. If cov_mat is 4-by-4, then eig will produce four eigenvalues and four eigenvectors.
Note how SVD and PCA, while related, are not exactly the same. Let X be a 4-by-150 matrix of observations where each 4-element column is a single observation. Then, the following are equivalent:
a. the left singular vectors of X,
b. the principal components of X,
c. the eigenvectors of X X^T.
Also, the eigenvalues of X X^T are equal to the square of the singular values of X. To see all this, let X have the SVD X = QSV^T, where S is a diagonal matrix of singular values. Then consider the eigendecomposition D = Q^T X X^T Q, where D is a diagonal matrix of eigenvalues. Replace X with its SVD, and see what happens.
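Carrying out that substitution, as a short sketch assuming a real X with orthogonal Q and V (so V^T V = I and Q^T Q = I), and writing \sigma_i for the singular values on the diagonal of S:

```latex
X X^T = (Q S V^T)(Q S V^T)^T = Q S V^T V S^T Q^T = Q \,(S S^T)\, Q^T
\quad\Longrightarrow\quad
D = Q^T X X^T Q = S S^T = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots)
```

So the columns of Q are eigenvectors of X X^T and the eigenvalues are the squared singular values.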
Question already addressed: Principal component analysis in Python