I understand that eigenvectors are only defined up to a multiplicative constant. As far as I see all numpy algorithms (e.g. linalg.eig, linalg.eigh, linalg.svd) yield identical eigenvectors for real matrices, so apparently they use the same normalization. In the case of a complex matrix, however, the algorithms yield different results.
That is, the eigenvectors are the same up to a (complex) constant z. After some experimenting with eig and eigh I realised that eigh always sets the phase angle (defined as arctan(imaginary part/real part)) to 0 for the first component of each eigenvector, whereas eig seems to start with some (arbitrary?) non-zero phase angle.
Q: Is there a way to normalize the eigenvectors from eigh in the way eig is doing it (that is not to force phase angle = 0)?
Example
I have a complex hermitian matrix G for which I want to calculate the eigenvectors using the two following algorithms:
1. numpy.linalg.eig for a real/complex square matrix
2. numpy.linalg.eigh for a real symmetric/complex hermitian matrix (special case of 1.)
Check that G is hermitian
import numpy as np
# check if a matrix is hermitian
def isHermitian(a, rtol=1e-05, atol=1e-08):
    return np.allclose(a, a.conjugate().T, rtol=rtol, atol=atol)
print('G is hermitian:', isHermitian(G))
Out:
G is hermitian: True
Perform eigenanalysis
# eigenvectors from EIG()
l1,u1 = np.linalg.eig(G)
idx = np.argsort(l1)[::-1]
l1,u1 = l1[idx].real,u1[:,idx]
# eigenvectors from EIGH()
l2,u2 = np.linalg.eigh(G)
idx = np.argsort(l2)[::-1]
l2,u2 = l2[idx],u2[:,idx]
Check eigenvalues
print('Eigenvalues')
print('eig\t:',l1[:3])
print('eigh\t:',l2[:3])
Out:
Eigenvalues
eig : [2.55621629e+03 3.48520440e+00 3.16452447e-02]
eigh : [2.55621629e+03 3.48520440e+00 3.16452447e-02]
Both methods yield the same eigenvalues.
Check eigenvectors
Now look at the eigenvectors (e.g. the 3rd eigenvector), which differ by a constant factor z.
multFactors = u1[:,2]/u2[:,2]
# compare with a tolerance; exact equality is unreliable for floats
if np.allclose(multFactors, multFactors[0]):
    print("All multiplication factors are same:", multFactors[0])
else:
    print("Multiplication factors are different.")
Out:
All multiplication factors are same: (-0.8916113627685007+0.45280147727156245j)
Check phase angle
Now check the phase angle for the first component of the 3rd eigenvector:
print('Phase angle (in PI) for first point:')
print('Eig\t:',np.arctan2(u1[0,2].imag,u1[0,2].real)/np.pi)
print('Eigh\t:',np.arctan2(u2[0,2].imag,u2[0,2].real)/np.pi)
Out:
Phase angle (in PI) for first point:
Eig : 0.8504246311627189
Eigh : 0.0
Code to reproduce figure
import matplotlib.pyplot as plt
from matplotlib import gridspec
num = 2
fig = plt.figure()
gs = gridspec.GridSpec(2, 3)
ax0 = plt.subplot(gs[0,0])
ax1 = plt.subplot(gs[1,0])
ax2 = plt.subplot(gs[0,1:])
ax3 = plt.subplot(gs[1,1:])
ax2r= ax2.twinx()
ax3r= ax3.twinx()
ax0.imshow(G.real,vmin=-30,vmax=30,cmap='RdGy')
ax1.imshow(G.imag,vmin=-30,vmax=30,cmap='RdGy')
ax2.plot(u1[:,num].real,label='eig')
ax2.plot((u2[:,num]).real,label='eigh')
ax3.plot(u1[:,num].imag,label='eig')
ax3.plot((u2[:,num]).imag,label='eigh')
for a in [ax0,ax1,ax2,ax3]:
    a.set_xticks([])
    a.set_yticks([])
ax0.set_title('Re(G)')
ax1.set_title('Im(G)')
ax2.set_title('Re('+str(num+1)+'. Eigenvector)')
ax3.set_title('Im('+str(num+1)+'. Eigenvector)')
ax2.legend(loc=0)
ax3.legend(loc=0)
fig.subplots_adjust(wspace=0, hspace=.2,top=.9)
fig.suptitle('Eigenanalysis of Hermitian Matrix G',size=16)
plt.show()
As you say, the eigenvalue problem only fixes the eigenvectors up to a nonzero scalar x. Rescaling an eigenvector v to v*x does not change its status as an eigenvector.
There is an "obvious" way to normalize the vectors (to unit length according to the euclidean inner product np.vdot(v1, v1)), but this only fixes the magnitude of the scalar, which can be complex, and leaves its phase free.
Fixing the angle or "phase" is kind of arbitrary without further context. I tried out eigh() and indeed it just makes the first entry of the vector real (with an apparently random sign!?).
eig() instead chooses to make real the vector entry with the largest real part. For example, here is what I get for a random Hermitian matrix:
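import numpy as np
import numpy.linalg as la
n = 10
X = np.random.randn(n, n) + 1j*np.random.randn(n, n)  # assumed; X is not defined in the original post
H = 0.5*(X + X.conj().T)
np.max(la.eig(H)[1], axis=0)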
# returns
array([0.57590624+0.j, 0.42672485+0.j, 0.51974879+0.j, 0.54500475+0.j,
0.4644593 +0.j, 0.53492448+0.j, 0.44080532+0.j, 0.50544424+0.j,
0.48589402+0.j, 0.43431733+0.j])
This is arguably more sensible, as just picking the first entry, like eigh() does, is not very robust if the first entry happens to be very small. Picking the max value avoids this. I am not sure if eig() also fixes the sign (a random matrix is not a very good test case for this as it would be very unusual for all entries in an eigenvector to have negative real parts, which is the only case in which an unfixed sign would show up).
In any case, I would not rely on the eigensolver using any particular way of fixing phases. It's not documented and so could, in principle, change in the future. Instead, fix the phases yourself, perhaps the same way eig() does it now.
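As a minimal sketch of fixing the phases yourself: the helper below (fix_phase is a hypothetical name, not a numpy function) rotates each eigenvector so that its largest-magnitude entry becomes real and positive. This is just one possible convention, chosen here for illustration; the random test matrix and the seed are also assumptions.

```python
import numpy as np

def fix_phase(vecs):
    """Rotate each column so that its largest-magnitude entry is real and positive."""
    vecs = np.asarray(vecs, dtype=complex)
    # row index of the largest-magnitude entry in every column
    idx = np.argmax(np.abs(vecs), axis=0)
    pivots = vecs[idx, np.arange(vecs.shape[1])]
    # multiply each column by exp(-1j * phase of its pivot entry)
    return vecs * (np.abs(pivots) / pivots)

# Hypothetical test matrix: a random complex Hermitian matrix
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
H = 0.5 * (X + X.conj().T)

w1, u1 = np.linalg.eig(H)
u1 = u1[:, np.argsort(w1.real)]   # sort ascending, like eigh does
w2, u2 = np.linalg.eigh(H)

u_eig_fixed = fix_phase(u1)
u_eigh_fixed = fix_phase(u2)
print(np.allclose(u_eig_fixed, u_eigh_fixed, atol=1e-8))
```

After applying the same convention to both results, the eigenvectors from eig and eigh can be compared directly, regardless of how either solver happens to choose its phases.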
In my experience (and there are many questions here to back this up), you NEVER want to use eig when eigh is an option - eig is very slow and very unstable. The relevance of this is that I believe your question is backward - you want to normalize the eigenvectors of eig to be like those of eigh, and this you know how to do.
This is the code I've found online
import numpy as np
import pandas as pd
d0 = pd.read_csv('./mnist_train.csv')
labels = d0.label.head(15000)
data = d0.drop('label', axis=1).head(15000)  # drop the label column, keep the pixel columns
from sklearn.preprocessing import StandardScaler
standardized_data = StandardScaler().fit_transform(data)
#find the co-variance matrix which is : (A^T * A)/n
sample_data = standardized_data
# matrix multiplication using numpy
covar_matrix = np.matmul(sample_data.T , sample_data) / len(sample_data)
How does multiplying the data by itself, np.matmul(sample_data.T, sample_data), give the covariance matrix? What is the covariance matrix according to this tutorial I found online? The last step is what I don't understand.
This might be a better question for the math or stats stack exchange, but I'll answer here for now.
This comes from the definition of covariance. The Wikipedia page (linked) gives a whole lot of detail, but covariance is defined as (in pseudo-code)
cov = E[dot((x - E[x]), (x - E[x]).T)]
for column vectors, but in your case you probably have row vectors (one sample per row), which is why the first factor in your product is transposed, not the second. The E[...] means expected value, which in practice is estimated by the sample mean; that is where the division by the number of samples comes from. When you perform StandardScaler().fit_transform(data), you are subtracting the mean of each column, which is why you don't subtract it explicitly in your product.
Note that StandardScaler() also divides each column by its standard deviation, so it normalizes everything to unit variance. This affects your result: what you get is the correlation matrix of the original data rather than its covariance! So if you need the actual covariance of the data without normalization, just calculate it with something like np.cov() from the numpy module.
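To make the effect of the standardization concrete, here is a small sketch on made-up data. The standardization is done by hand (subtract the column mean, divide by the column standard deviation), which is assumed to mirror what StandardScaler() does with its default settings; the data and variable names are illustrative only.

```python
import numpy as np

# Hypothetical data: 500 samples, 3 features with different scales and offsets
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3)) * np.array([1.0, 5.0, 0.2]) + np.array([10.0, -3.0, 0.0])
data[:, 1] += 2.0 * data[:, 0]          # make two features correlated

# Manual standardization: zero mean and unit variance per column
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

# Same formula as in the question: (A^T * A) / n
covar_matrix = standardized.T @ standardized / len(standardized)

corr = np.corrcoef(data, rowvar=False)  # correlation matrix of the raw data
cov = np.cov(data, rowvar=False)        # covariance matrix of the raw data

print(np.allclose(covar_matrix, corr))  # matches the correlation matrix
print(np.allclose(covar_matrix, cov))   # does not match the covariance matrix
```

Note the rowvar=False argument: np.cov() and np.corrcoef() treat rows as variables by default, while here each row is a sample.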
Let's build towards Covariance matrix step by step, first let's define variance.
The variance of some random variable X is a measure of how much values in the distribution vary on average with respect to the mean.
Now we have to define covariance.
Covariance is a measure of the joint variability of two random variables. It describes how the two variables change together. Read here.
So now armed with that you can understand that Co-variance matrix is a matrix which shows how each feature varies with changes in other features. Which can be calculated as
and there you can see the equation that you are confused about formed at the bottom. If you have any further queries, comment down.
Image Source: Wikipedia.
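For reference, the standard definitions behind the steps above can be written as follows. The notation is an assumption made here for illustration: x_i is the i-th sample as a column vector, mu is the sample mean, n is the number of samples, and A is the matrix whose rows are the centered samples (x_i - mu)^T.

```latex
\operatorname{Var}(X) = E\big[(X - E[X])^2\big],
\qquad
\operatorname{Cov}(X, Y) = E\big[(X - E[X])\,(Y - E[Y])\big]

\Sigma = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{T} = \frac{1}{n} A^{T} A
```

The last equality is the (A^T * A)/n expression from the question: once the mean has been subtracted from every column, a single matrix product computes all pairwise covariances at once.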
If I have two separate multivariate normal random variables:
from scipy.stats import multivariate_normal
import numpy as np
cov0=np.array([
[1,0,0],
[0,1,0],
[0,0,1]
])
mean0 = np.array([1,1,1])
rv3d_0 = multivariate_normal(mean=mean0, cov=cov0)
cov1=np.array([
[1,0,0],
[0,1,0],
[0,0,1]
])
mean1 = np.array([4,4,4])
rv3d_1 = multivariate_normal(mean=mean1, cov=cov1)
Then I am interested in creating a new random variable that is between these two:
mean_avg = (mean0+mean1)/2
cov_avg = (cov0+cov1)/2
rv3d_avg = multivariate_normal(mean=mean_avg, cov=cov_avg)
# I can then plot the points generated by:
rv3d_0.rvs(1000)
rv3d_1.rvs(1000)
rv3d_avg.rvs(1000)
However, when looking at the points generated, the covariance is predictably the same as that of the two components. What I would like is for the variance to be greater along the vector (mean1-mean0) than along the orthogonal directions. I think maybe taking the average of the covariances is not the proper technique? Any suggestions welcome, thanks!
This is an interesting problem. Look at it this way: you have some specific directions for the covariance components, namely mean1 - mean0 is one direction and the plane orthogonal to mean1 - mean0 contains the others. In these directions you want to specify the magnitude of the variation, namely it's something (let's say FOO) in the orthogonal plane and a lot more (let's say 100 times FOO) in the direction mean1 - mean0.
You can find a basis for the orthogonal plane via the Gram-Schmidt algorithm or something similar. At this point you can construct a covariance matrix: let S = the matrix whose columns are the directions you've found, each normalized to unit length (namely mean1 - mean0 plus an orthonormal basis of the orthogonal plane), and let D = diagonal matrix with 100 FOO, FOO, FOO, ..., FOO on the diagonal. Now S D S^T (where S^T is the matrix transpose) is a positive definite matrix with the desired properties.
You might be able to avoid Gram-Schmidt, but your goal would be the same in any case: specify the properties you want and then construct a matrix to satisfy them.
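A minimal sketch of this construction, assuming FOO = 1 and a stretch factor of 100 (both placeholder values). Instead of writing out Gram-Schmidt by hand, it uses a QR decomposition of the matrix [d | I] to obtain an orthonormal basis whose first vector is parallel to d = mean1 - mean0; that is a substitution made here for brevity, not something the answer prescribes.

```python
import numpy as np

mean0 = np.array([1.0, 1.0, 1.0])
mean1 = np.array([4.0, 4.0, 4.0])

foo = 1.0        # assumed variance in the orthogonal plane
stretch = 100.0  # assumed factor for the variance along mean1 - mean0

d = mean1 - mean0
# QR of [d | I]: the first column of S is parallel to d (up to sign),
# the remaining columns form an orthonormal basis of the orthogonal plane.
S, _ = np.linalg.qr(np.column_stack([d, np.eye(d.size)]))

D = np.diag([stretch * foo] + [foo] * (d.size - 1))
cov_stretched = S @ D @ S.T

# Variance along d and along a direction orthogonal to d
u = d / np.linalg.norm(d)
w = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
print(u @ cov_stretched @ u, w @ cov_stretched @ w)
```

The resulting matrix can be passed as cov to multivariate_normal together with whatever mean is wanted, for example (mean0 + mean1)/2.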
I would suggest the following approach:
1- sample a good amount of observations (say 10000) from both distributions: obs0 and obs1
2- create a new array of observations obs_avg which is the sum of obs0 and obs1 divided by 2
3- for the obtained array, calculate the mean and the covariance. the code should look like this:
import numpy as np
obs0 = np.random.multivariate_normal(mean0, cov0, 10000) # sampling from a multivariate normal distribution
obs1 = np.random.multivariate_normal(mean1, cov1, 10000)
obs_avg = (obs0 + obs1)/2
mean_avg = np.mean(obs_avg, axis=0)
cov_avg = np.cov(obs_avg.T)
It's an experimental way of generating the mean and covariance of the average distribution, and I think it should give you pretty accurate results if you take a large enough number of observations.
I've seen several posts on this subject, but I need a pure Python (no Numpy or any other imports) solution that accepts a list of points (x,y,z coordinates) and calculates a normal for the plane that best fits those points.
I'm following one of the working Numpy examples from here: Fit points to a plane algorithms, how to iterpret results?
import numpy as np
def fitPLaneLTSQ(XYZ):
    # Fits a plane to a point cloud,
    # Where Z = aX + bY + c ----Eqn #1
    # Rearranging Eqn1: aX + bY -Z +c =0
    # Gives normal (a,b,-1)
    # Normal = (a,b,-1)
    [rows,cols] = XYZ.shape
    G = np.ones((rows,3))
    G[:,0] = XYZ[:,0] #X
    G[:,1] = XYZ[:,1] #Y
    Z = XYZ[:,2]
    (a,b,c),resid,rank,s = np.linalg.lstsq(G,Z)
    normal = (a,b,-1)
    nn = np.linalg.norm(normal)
    normal = normal / nn
    return normal
XYZ = np.array([
[0,0,1],
[0,1,2],
[0,2,3],
[1,0,1],
[1,1,2],
[1,2,3],
[2,0,1],
[2,1,2],
[2,2,3]
])
print(fitPLaneLTSQ(XYZ))
[ -8.10792259e-17 7.07106781e-01 -7.07106781e-01]
I'm trying to adapt this code: Basic ordinary least squares calculation to replace np.linalg.lstsq
Here is what I have so far without using Numpy using the same coords as above:
import math
xvals = [0,0,0,1,1,1,2,2,2]
yvals = [0,1,2,0,1,2,0,1,2]
zvals = [1,2,3,1,2,3,1,2,3]
""" Basic ordinary least squares calculation. """
sumx, sumy = map(sum, [xvals, yvals])
sumxy = sum(map(lambda x, y: x*y, xvals, yvals))
sumxsq = sum(map(lambda x: x**2, xvals))
Nsamp = len(xvals)
# y = a*x + b
# a (slope)
slope = (Nsamp*sumxy - sumx*sumy) / ((Nsamp*sumxsq - sumx**2))
# b (intercept)
intercept = (sumy - slope*sumx) / (Nsamp)
a = slope
b = intercept
normal = (a,b,-1)
mag = lambda x : math.sqrt(sum(i**2 for i in x))
nn = mag(normal)
normal = [i/nn for i in normal]
print(normal)
[0.0, 0.7071067811865475, -0.7071067811865475]
As you can see, the answers come out the same, but that is only because of this particular example. In other examples, they don't match. If you look closely you'll see that in the Numpy example the 'z' values are fed into 'np.linalg.lstsq', but in the non-Numpy version the 'z' values are ignored. How do I work in the 'z' values to the least-squares code?
Thanks
I do not think you can get away without implementing some basic matrix operations. As this is a multivariate linear regression problem, you will definitely need dot product, transpose and norm. These are easy. The difficult part is that you also need matrix inverse or QR decomposition or something similar. People usually use BLAS for these for good reasons, implementing them is not easy - but not impossible either.
With QR decomposition
I would start by creating a Matrix class that has the following methods
dot(m1, m2) (or __matmul__(m1, m2) if you have python 3.5): it is just the sum of products, should be straightforward
transpose(self): swapping matrix elements, should be easy
norm(self): square root of sum of squares (should be only used on vectors)
qr_decomp(self): this one is tricky. For an almost pure python implementation see this rosetta code solution (disclaimer: I have not thoroughly checked this code). It uses some numpy functions, but these are basic functions you can implement for your matrix class (shape, eye, dot, copysign, norm).
leastsqr_ut(R, A): solve the equation Rx = A if R is an upper triangular matrix. Not trivial, but should be easy enough as you can solve it equation by equation from the bottom.
With these, the solution is easy:
Generate the matrix G as detailed in your numpy example
Find the QR decomposition of G
Solve Rb = Q'z for b using that R is an upper triangular matrix
Then the normal vector you are looking for is (b[0], b[1], -1) (divide it by its norm if you want a unit-length normal vector).
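The steps above could look roughly like this in pure Python. This is only a sketch: it uses plain lists and free functions instead of a Matrix class, and it computes the QR decomposition with modified Gram-Schmidt rather than the Householder approach of the linked code. All function names are made up for illustration.

```python
import math

def qr_gram_schmidt(G):
    """Thin QR of a list-of-rows matrix via modified Gram-Schmidt.
    Returns Q as a list of columns and R as a list of rows (upper triangular)."""
    rows, cols = len(G), len(G[0])
    columns = [[G[i][j] for i in range(rows)] for j in range(cols)]
    Q = []
    R = [[0.0] * cols for _ in range(cols)]
    for j in range(cols):
        v = columns[j][:]
        for k in range(j):
            # remove the component of v along the k-th orthonormal column
            R[k][j] = sum(q * x for q, x in zip(Q[k], v))
            v = [x - R[k][j] * q for x, q in zip(v, Q[k])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        if R[j][j] < 1e-12:
            raise ValueError("columns of G are linearly dependent")
        Q.append([x / R[j][j] for x in v])
    return Q, R

def back_substitute(R, y):
    """Solve R b = y for upper triangular R, starting from the bottom row."""
    n = len(y)
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (y[i] - s) / R[i][i]
    return b

def plane_normal_qr(points):
    """Fit z = a*x + b*y + c and return the unit vector along (a, b, -1)."""
    G = [[float(x), float(y), 1.0] for x, y, _ in points]
    z = [float(p[2]) for p in points]
    Q, R = qr_gram_schmidt(G)
    qtz = [sum(q * zi for q, zi in zip(col, z)) for col in Q]  # Q'z
    a, b, _c = back_substitute(R, qtz)
    nn = math.sqrt(a * a + b * b + 1.0)
    return [a / nn, b / nn, -1.0 / nn]

points = [(0, 0, 1), (0, 1, 2), (0, 2, 3),
          (1, 0, 1), (1, 1, 2), (1, 2, 3),
          (2, 0, 1), (2, 1, 2), (2, 2, 3)]
normal_qr = plane_normal_qr(points)
print(normal_qr)
```

On the sample points from the question, this should agree with the Numpy result up to rounding.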
With matrix inverse
The inverse of a 3x3 matrix is relatively easy to calculate, but this method is much less numerically stable than doing QR decomposition. If it is not an important concern, then you can do the following: implement
dot(m1, m2) (or __matmul__(m1, m2) if you have python 3.5): it is just the sum of products, should be straightforward
transpose(self): swapping matrix elements, should be easy
norm(self): square root of sum of squares (should be only used on vectors)
det(self): determinant, but it is enough if it works on 2x2 and 3x3 matrices, and for those simple formulas are available
inv(self): matrix inverse. It is enough if it works on 3x3 matrices, there is a simple formula for example here
Then the formula for b is b = inv(G'G) * (G'z) and your normal vector is again (b[0], b[1], -1).
As you can see, none of these are simple, and most of it is replicating some numpy functionality while making it a lot slower. So make sure you have absolutely no other choice.
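For the matrix-inverse route, a compact pure-Python sketch is shown below. Rather than forming inv(G'G) explicitly, it solves the 3x3 system G'G b = G'z with Cramer's rule, which relies on the same determinant formulas; this is a simplification chosen here, and the function names are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def plane_normal_normal_eq(points):
    """Fit z = a*x + b*y + c via the normal equations; return the unit vector along (a, b, -1)."""
    # Entries of G'G and G'z, where each row of G is (x, y, 1)
    sxx = sxy = sx = syy = sy = sxz = syz = sz = 0.0
    for x, y, z in points:
        sxx += x * x
        sxy += x * y
        sx += x
        syy += y * y
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
    n = float(len(points))
    A = [[sxx, sxy, sx],
         [sxy, syy, sy],
         [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("points do not determine a unique plane z = a*x + b*y + c")
    # Cramer's rule: replace one column of A at a time with the right-hand side
    coeffs = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    a, b, _c = coeffs
    nn = math.sqrt(a * a + b * b + 1.0)
    return [a / nn, b / nn, -1.0 / nn]

points = [(0, 0, 1), (0, 1, 2), (0, 2, 3),
          (1, 0, 1), (1, 1, 2), (1, 2, 3),
          (2, 0, 1), (2, 1, 2), (2, 2, 3)]
normal_ne = plane_normal_normal_eq(points)
print(normal_ne)
```

Unlike the code in the question, the z values enter here through the right-hand side G'z.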
I generated a code with a similar purpose (see "tangentplane_3D" function in the linked code).
In my case I had a scatter cloud of points that define a 3D ellipsoid. For each point I wanted to determine the tangent plane to the ellipsoid containing such point --> Goal: Determination of a 3D plane.
The problem can be seen in the following way: a plane is defined by a point and its normal, and the normal of the best-fit plane is the eigenvector associated with the smallest eigenvalue of the covariance matrix of the set of n points.
What I did, and you can check it in the code I posted, is to select the k points closest to the point of interest at which I wanted to calculate the tangent plane. Then, I performed a Singular Value Decomposition (SVD) on these k points. Finally, from this SVD I selected the smallest singular value and its associated singular vector, which is, in fact, the normal of the plane best fitting my set of points, and thus in my case the normal of the plane tangent to the ellipsoid. With the normal vector and the point you can subsequently calculate the complete plane equation.
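A short sketch of the SVD idea with Numpy (so not the pure Python the question asks for). It is not the linked "tangentplane_3D" code; the function name is made up here, and centering the points on their centroid before the decomposition is an assumption of this sketch.

```python
import numpy as np

def plane_normal_svd(points):
    """Unit normal of the best-fit plane through a set of 3D points."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)          # the plane passes through the centroid
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[-1]                              # direction of the smallest singular value

XYZ = [[0, 0, 1], [0, 1, 2], [0, 2, 3],
       [1, 0, 1], [1, 1, 2], [1, 2, 3],
       [2, 0, 1], [2, 1, 2], [2, 2, 3]]
normal_svd = plane_normal_svd(XYZ)
print(normal_svd)
```

The sign of the returned normal is arbitrary, so it may come out as the negative of the least-squares result.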
I hope it helps!!
Best wishes.
I'm trying to write a spectral clustering algorithm using NumPy/SciPy for larger (but still tractable) systems, making use of SciPy's sparse linear algebra library. Unfortunately, I'm running into stability issues with eigsh().
Here's my code:
import numpy as np
import scipy.sparse
import scipy.sparse.linalg as SLA
import sklearn.utils.graph as graph
W = self._sparse_rbf_kernel(self.X_, self.datashape)
D = scipy.sparse.csc_matrix(np.diag(np.array(W.sum(axis = 0))[0]))
L = graph.graph_laplacian(W) # D - W
vals, vects = SLA.eigsh(L, k = self.k, M = D, which = 'SM', sigma = 0, maxiter = 1000)
The sklearn library refers to the scikit-learn package, specifically this method for calculating a graph laplacian from a sparse SciPy matrix.
_sparse_rbf_kernel is a method I wrote to compute pairwise affinities of the data points. It operates by creating a sparse affinity matrix from image data, specifically by only computing pairwise affinities for the 8-neighborhoods around each pixel (instead of pairwise for all pixels with scikit-learn's rbf_kernel method, which for the record doesn't fix this either).
Since the laplacian is unnormalized, I'm looking for the smallest eigenvalues and corresponding eigenvectors of the system. I understand that ARPACK is ill-suited for finding small eigenvalues, but I'm trying to use shift-invert to find these values and am still not having much success.
With the above arguments (specifically, sigma = 0), I get the following error:
RuntimeError: Factor is exactly singular
With sigma = 0.001, I get a different error:
scipy.sparse.linalg.eigen.arpack.arpack.ArpackNoConvergence: ARPACK error -1: No convergence (1001 iterations, 0/5 eigenvectors converged)
I've tried all three different values for mode with the same result. Any suggestions for using the SciPy sparse library for finding small eigenvalues of a large system?
You should use which='LM': in the shift-invert mode, this parameter refers to the transformed eigenvalues. (As explained in the documentation.)
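A minimal sketch of such a call on a toy problem. The path graph and the shift sigma = -0.01 are assumptions made for illustration: a slightly negative shift keeps L - sigma*D nonsingular, whereas sigma = 0 asks for a factorization of L itself, which is singular because a graph Laplacian always has a zero eigenvalue.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as SLA

# Toy affinity matrix: a path graph on n nodes with unit weights (assumed example)
n = 50
off = np.ones(n - 1)
W = sp.diags([off, off], offsets=[-1, 1], format='csc')
degrees = np.asarray(W.sum(axis=0)).ravel()
D = sp.diags(degrees, format='csc')
L = (D - W).tocsc()   # unnormalized graph Laplacian

# Shift-invert: with sigma set, which='LM' selects the largest transformed
# eigenvalues 1/(lambda - sigma), i.e. the eigenvalues lambda closest to sigma.
vals, vects = SLA.eigsh(L, k=3, M=D, sigma=-0.01, which='LM')

order = np.argsort(vals)
vals, vects = vals[order], vects[:, order]

# Residual of the generalized eigenproblem L v = lambda D v
residual = np.abs(L @ vects - (D @ vects) * vals).max()
print(vals, residual)
```

The eigenvalues returned are those of the original problem, not the transformed ones; only the selection criterion which refers to the transformed spectrum.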