Calculate the distance between fitted hyperplane and points - python

I'm trying to find the distance between a fitted hyperplane and five points. Most of the responses I've read use SVM, but I'm not trying to do a classification problem. I know there are probably multiple ways to do this in Python, but I'm a little stumped.
As an example here are my points:
[[ 163.3828172 169.65537306 144.69201418]
[-212.50951396 -167.06555958 56.69388025]
[-164.65129832 -163.42420063 -149.97008725]
[ 41.8704004 52.2538316 14.0683657 ]
[-128.38386078 -102.76840542 -303.4960438 ]]
To find the equation of a fitted plane I use SVD to compute the coefficients of ax + by + cz + d = 0.
import numpy as np

def fit_plane(points):
    assert points.shape[1] == 3
    centroid = points.mean(axis=0)
    x = points - centroid[None, :]
    U, S, Vt = np.linalg.svd(x.T @ x)
    # normal vector of the best-fitting plane is the left
    # singular vector corresponding to the least singular value
    normal = U[:, -1]
    # calculate the distance from the origin
    origin_distance = normal @ centroid
    return np.hstack([normal, -origin_distance])

fit_plane(X)
Giving the equation:
-0.67449074x + 0.73767288y -0.03001614z -10.75632119 = 0
Now how do I calculate the distance between the points and the hyperplane? The answer I've seen used in conjunction with SVMs is d = |w^Tx +b|/||w||, but I don't know how to go from the equation I have already.

You can find the distance between a plane π and a point P by dropping a perpendicular N from P to π and finding the point A where N and π intersect. The distance you are looking for is the distance between A and P.
This video explains the math of finding A (although it is about finding the reflection, finding A is part of it).
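The formula quoted in the question, d = |w^Tx + b|/||w||, applies directly to the fitted coefficients: w is the normal (a, b, c) and the offset is the last coefficient. A minimal sketch, assuming the plane is given as the 4-element array returned by fit_plane; the helper name point_plane_distances is made up for illustration:

```python
import numpy as np

def point_plane_distances(points, plane):
    """Unsigned distances from each row of `points` to the plane
    a*x + b*y + c*z + d = 0, with `plane` = [a, b, c, d]."""
    points = np.asarray(points, dtype=float)
    plane = np.asarray(plane, dtype=float)
    w, offset = plane[:-1], plane[-1]
    # |w.x + offset| / ||w||; the division is a no-op when w is a unit vector
    return np.abs(points @ w + offset) / np.linalg.norm(w)

# Sanity check against the plane z = 0
pts = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, -3.0]])
dists = point_plane_distances(pts, [0.0, 0.0, 1.0, 0.0])
print(dists)
```

With the question's data this would be called as point_plane_distances(X, fit_plane(X)). Dropping the np.abs gives signed distances, which tell you on which side of the plane each point lies.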

Related

Fitting a plane by Orthogonal Regression in Python

I want to fit a plane to a set of points (x, y, z) in Python. I found various answers how to perform the fitting if the error is measured with respect to the z-axis but I want to consider errors in orthogonal direction. I found the following question (Best fit plane by minimizing orthogonal distances) which addresses the same question - but it's not clear to me how to implement this in Python (likely with NumPy/SciPy). Further details regarding the mathematical derivation can also be found here: http://www.ncorr.com/download/publications/eberlyleastsquares.pdf (section 2).
The first link you gave does describe the algorithm for orthogonal distance fitting, but rather tersely. Here, in case it helps, is a more prolix description:
I suppose you have points (in your case 3d, but the dimension makes no odds to the algorithm) P[i], i=1..N
You want to find a (hyper-) plane that is of minimal orthogonal distance from your points.
A hyper-plane can be described by a unit vector n and a scalar d. The set of points on the plane is
{ P | n.P + d = 0 }
and the signed (orthogonal) distance of a point P from the plane is
n.P + d
So we want to find n and d to minimise
Q(n,d) = Sum{ i | (n.P[i]+d)*(n.P[i]+d) } /N
(The division by N isn't essential, and makes no difference to the values of n and d that are found, but to my mind makes the algebra neater)
The first thing to notice is that if we knew n, the d that minimises Q will be
d = -n.Pbar where
Pbar = Sum{ i | P[i]}/N, the mean of the P[]
We may as well use this value of d, so that, after a little algebra the problem reduces to minimising Q^:
Q^(n) = Sum{ i | (n.P[i]-n.Pbar)*(n.P[i]-n.Pbar) } /N
= n' * C * n
where
C = Sum{ i | (P[i]-Pbar)*(P[i]-Pbar)' } /N
The form of Q^ tells us that the value of n to minimise Q^ will be an eigenvector of C corresponding to a minimal eigenvalue.
So (sorry I can't give code but my python is contemptible):
a/ compute
Pbar = Sum{ i | P[i]}/N, the mean of the points
b/ compute
C = Sum{ i | (P[i]-Pbar)*(P[i]-Pbar)' } /N, the covariance matrix of the points
c/ diagonalise C, and pick out a minimal eigenvalue and the corresponding eigenvector n
d/ compute
d = -Pbar.n
Then n, d define the hyperplane you want.
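Steps a/ to d/ can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the points are the rows of an (N, dim) array, the function name fit_hyperplane is made up, and np.linalg.eigh is used because C is symmetric and eigh returns eigenvalues in ascending order.

```python
import numpy as np

def fit_hyperplane(points):
    """Return (n, d) such that n.P + d = 0 is the orthogonal least-squares
    hyperplane through the rows of `points`."""
    points = np.asarray(points, dtype=float)
    p_bar = points.mean(axis=0)                  # a/ mean of the points
    centered = points - p_bar
    cov = centered.T @ centered / len(points)    # b/ covariance matrix C
    eigvals, eigvecs = np.linalg.eigh(cov)       # c/ eigenvalues in ascending order
    n = eigvecs[:, 0]                            # eigenvector of the minimal eigenvalue
    d = -p_bar @ n                               # d/ offset
    return n, d

# Points lying exactly on the plane z = 1
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(50, 2))
pts = np.column_stack([xy, np.ones(50)])
n, d = fit_hyperplane(pts)
print(n, d)
```

The sign of n is arbitrary (both n and -n are unit eigenvectors), so d changes sign with it.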
I've also had to deal with this situation and at first the mathematical notation can be overwhelming, but in the end the solution is fairly simple.
Once you get the intuition that the vector (A,B,C) that defines the best fitting plane Ax+By+Cz+D=0 is the one that explains the minimum variance of your set of coordinates, then the solution is straightforward.
First thing to do is center your coordinates (this way D will be 0 in your plane equation)
coords -= coords.mean(axis=0)
Then you have 2 options to get the vector you are interested in: (1) use the PCA implementation from sklearn or scipy to get the vector that explains minimal variance
from sklearn.decomposition import PCA

pca = PCA(n_components=3)
pca.fit(coords)
# The last component/vector is the one with minimal variance, see PCA documentation
normal_vector = pca.components_[-1]
(2) re-implement the procedure described in the Geometric Tool reference you've linked.
import numpy as np
from numba import njit

@njit
def get_best_fitting_plane_vector(coords):
    # Calculate the covariance matrix of the coordinates
    covariance_matrix = np.cov(coords, rowvar=False) # Variables = columns
    # Calculate the eigenvalues & eigenvectors of the covariance matrix
    e_val, e_vect = np.linalg.eig(covariance_matrix)
    # The normal vector to the plane is the eigenvector associated to the minimum eigenvalue
    min_eval = np.argmin(e_val)
    normal_vector = e_vect[:, min_eval]
    return normal_vector
In terms of speed, the re-implemented procedure is faster than using PCA, and can be a lot faster if you use numba (just decorate the function with @njit).
Based on your second reference
Say you have n samples (x,y,z)
I'll call the 3 terms M*A=V, and define the column arrays
X=[ x_0, x_1 .. x_n ]'
Y=[ y_0, y_1 .. y_n ]'
Z=[ z_0, z_1 .. z_n ]'
Define the (n by 3) matrix XY1=[X,Y,1n]:
XY1 = [[x_0, y_0, 1],
       [x_1, y_1, 1],
       ...
       [x_n, y_n, 1]]
The matrix M can be obtained as
M = XY1' * XY1
Where apostrophe (') is the transposition operator and (*) the matrix product.
And the array V is
V = XY1'*Z
The least squares solution can be obtained through the Moore-Penrose pseudoinverse: [(M'*M)^-1 * M']
~A = [(M'*M)^-1 * M'] * V
Sample code:
import numpy as np
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
#Input your values
A=3
B=2
C=1
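#number of samples (must be defined before it is used below)
n=30
#reserve memory; the third column stays 1 and carries the constant term
xy1=np.ones([n,3])
#Make random data, n ( x,y ) tuples.
xy1[:,:2]=np.random.rand(n,2)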
#plane: A*x+B*y+C = z , the z coord is calculated from random x,y
z=xy1.dot (np.array([[A,B,C],]).transpose() )
#addnoise
xy1[:,:2]+=np.random.normal(scale=0.05,size=[n,2])
z+=np.random.normal(scale=0.05,size=[n,1])
#calculate M and V
M=xy1.transpose().dot(xy1)
V=xy1.transpose().dot(z)
#pseudoinverse:
Mp=np.linalg.inv(M.transpose().dot(M)).dot(M.transpose())
#Least-squares Solution
ABC= Mp.dot(V)
Output
In [24]: ABC
Out[24]:
array([[3.11395111],
[2.02909874],
[1.01340411]])
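As an alternative to building the pseudoinverse by hand, NumPy's np.linalg.lstsq solves the same least-squares problem directly from XY1 and Z. A minimal sketch, assuming noise-free data so the recovered coefficients can be checked exactly; the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
A, B, C = 3.0, 2.0, 1.0

# Design matrix [x, y, 1]; the column of ones carries the constant term C
xy1 = np.ones((n, 3))
xy1[:, :2] = rng.random((n, 2))

# Noise-free plane z = A*x + B*y + C
z = xy1 @ np.array([A, B, C])

# Least-squares solution of xy1 @ coeffs = z
coeffs, residuals, rank, singular_values = np.linalg.lstsq(xy1, z, rcond=None)
print(coeffs)
```

Note that this, like the code above, minimises the error along z only, not the orthogonal distance to the plane.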

How do I calculate the covariance matrix for a specific centroid (k-means clustering) in Python 3?

I am implementing an algorithm for k-means clustering. So far it works using Euclidean distances. Switching out Euclidean distances for Mahalanobis distances fails to cluster correctly.
For some reason, the Mahalanobis distance is negative at times. Turns out the covariance matrix has negative eigenvalues, which apparently is not good for covariance matrices.
Here are the functions I'm using:
# takes in data point x, centroid m, covariance matrix sigma
def mahalanobis(x, m, sigma):
    return np.dot(np.dot(np.transpose(x - m), np.linalg.inv(sigma)), x - m)

# takes in centroid m and data (iris in 2d, dimensions: 2x150)
def covar_matrix(m, data):
    d, n = data.shape
    R = np.zeros((d,d))
    for i in range(n):
        R += np.dot(data[:,i:i+1] , np.transpose(data[:,i:i+1]))
    R /= n
    # autocorrelation_matrix - centroid*centroid'
    return R - np.dot(m, np.transpose(m))
How I implemented the algorithm:
1. Set k
2. Randomly choose k centroids
3. Calculate covar_matrix() of each centroid
4. Calculate mahalanobis() of each data point to each centroid and add to closest cluster
5. Start looking for new centroids; for each data point* in each cluster, calculate the sum of mahalanobis() to every other point in the cluster; point with minimum sum becomes new centroid
6. Repeat 3-5 until old centroid and new centroids are the same
*Calculate covar_matrix() with this point
I expect a positive Mahalanobis distance and a positive definite covariance matrix (the latter will fix the former I hope).
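One possible cause, offered as an assumption rather than a confirmed diagnosis: the identity "autocorrelation minus m*m'" equals a covariance matrix only when m is the mean of the very data being averaged. For an arbitrary centroid m, R - m*m' need not be positive semi-definite. Averaging the outer products of (x_i - m) instead is positive semi-definite for any m. A minimal sketch with made-up helper names and random stand-in data (not the iris set):

```python
import numpy as np

def covar_about(m, data):
    """Scatter matrix of `data` (d x n, one column per point) about the
    column vector `m` (d x 1): mean of (x_i - m)(x_i - m)'."""
    diff = data - m                       # m broadcasts over the n columns
    return diff @ diff.T / data.shape[1]

def mahalanobis_sq(x, m, sigma):
    """Squared Mahalanobis distance (x - m)' sigma^-1 (x - m)."""
    diff = x - m
    return (diff.T @ np.linalg.solve(sigma, diff)).item()

rng = np.random.default_rng(0)
data = rng.normal(size=(2, 150))          # stand-in for the 2 x 150 data
m = data[:, 3:4]                          # a data point used as centroid, shape (2, 1)
sigma = covar_about(m, data)
eigenvalues = np.linalg.eigvalsh(sigma)
dist_sq = mahalanobis_sq(data[:, 10:11], m, sigma)
print(eigenvalues, dist_sq)
```

Whether the matrix should be computed from all the data or only from the points currently assigned to the cluster is a separate design choice that this sketch does not settle.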

Is there an algorithm to calculate the area of a Lissajous figure?

Suppose I have measurements of two signals
V = V(t) and U = U(t)
that are periodic in time with a phase difference between them. When plotted against each other in a graph V vs U they form a Lissajous figure, and I want to calculate the area inside it.
Is there an algorithm for such calculation?
I would like to solve this problem using Python. But a response in any language or an algorithm to do it will be very appreciated.
Examples of V and U signals can be generated using expressions like:
V(t) = V0*sin(2*pi*t) ; U(t) = U0*sin(2*pi*t + delta)
Figure 1 shows a graph of V,U vs t for V0=10, U0=5, t=np.arange(0.0,2.0,0.01) and delta = pi/5.
And Figure 2 shows the corresponding Lissajous figure V vs U.
This is a specific case of a more general question: How to calculate a closed path integral obtained with a discrete (x_i,y_i) data set?
To find area of (closed) parametric curve in Cartesian coordinates, you can use Green's theorem (4-th formula here)
A = 1/2 * Abs(Integral[t=0..t=period] {(V(t) * U'(t) - V'(t) * U(t))dt})
But remember that the interpretation of what the real area of a self-intersecting curve is remains ambiguous, as @algrid noted in the comments.
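For sampled data, the Green's theorem integral above reduces to the shoelace formula over the polygon through the samples. A minimal sketch, assuming the samples cover exactly one period so the curve is traced once (the question's t range of 0 to 2 covers two periods and would count the area twice); closed_curve_area is a made-up name:

```python
import numpy as np

def closed_curve_area(x, y):
    """Area enclosed by the closed polygon through the points (x[i], y[i])."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.roll pairs each sample with the next one and closes the loop
    return 0.5 * abs(np.sum(x * np.roll(y, -1) - y * np.roll(x, -1)))

V0, U0, delta = 10.0, 5.0, np.pi / 5
t = np.linspace(0.0, 1.0, 1000, endpoint=False)   # exactly one period
V = V0 * np.sin(2 * np.pi * t)
U = U0 * np.sin(2 * np.pi * t + delta)

area = closed_curve_area(U, V)
exact = np.pi * V0 * U0 * np.sin(delta)   # closed form for this same-frequency pair
print(area, exact)
```

The closed form follows from the integrand, which for these two signals is the constant 2*pi*V0*U0*sin(delta) up to sign.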
For the area of the outermost curve of the usual Lissajous shapes I would try this:
find period of signal
so find T such that:
U(t) = U(t+T)
V(t) = V(t+T)
sample data on t=<0,T>
I would use polar coordinate system with center equal to average U,V coordinate on interval t=<0,T> and call it U0,V0. Convert and store the data in polar coordinates so:
a(t)=atan2( V(t)-V0 , U(t)-U0 )
r(t)=sqrt( (U(t)-U0)^2 + (V(t)-V0)^2 )
and remember only the points with max radius for each angle position. That can be done either with arrays (limiting precision in angle) or geometrically by computing polyline intersections with overlapping segments and removing the inside parts.
Compute the area from sampled data
So compute the area by summing the pie triangles for each angular position, covering the whole circle.
This may not work for exotic shapes.
Both solutions above - by @MBo and by @Spektre (and @meowgoesthedog in the comments) - work fine. Thank you guys.
But I found another way to calculate the area A of an elliptical Lissajous curve: use the A = Pi*a*b formula (a and b are, respectively, the major and minor semi axis of the ellipse).
Steps:
1 - Find the period T of the V (or U) signal;
2 - In the time interval 0<t<T:
2.a - calculate the average values of V and U (V0 and U0), in order to determine the center of the ellipse;
2.b - calculate the distance r(t) from the point (V0,U0) using:
r(t)=sqrt( (U(t)-U0)^2 + (V(t)-V0)^2 )
3 - Find a and b values using:
a = max(r(t)); b = min(r(t))
4 - calculate A: A = Pi*a*b
The Lissajous curves will always be elliptical if the U,V signals are sinusoidal-like and have the same frequency.
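Steps 1 to 4 for the elliptical case can be sketched as below. This assumes the period is already known (T = 1 for the example signals in the question) and that the samples are dense enough for max(r) and min(r) to approximate the semi-axes well:

```python
import numpy as np

V0, U0, delta = 10.0, 5.0, np.pi / 5
T = 1.0                                           # step 1: period, assumed known
t = np.linspace(0.0, T, 1000, endpoint=False)     # step 2: one period
V = V0 * np.sin(2 * np.pi * t)
U = U0 * np.sin(2 * np.pi * t + delta)

Vc, Uc = V.mean(), U.mean()                       # step 2.a: centre of the ellipse
r = np.hypot(U - Uc, V - Vc)                      # step 2.b: distance from the centre
a, b = r.max(), r.min()                           # step 3: semi-axes
area = np.pi * a * b                              # step 4
print(area)
```

The centre is called (Vc, Uc) here only to avoid clashing with the amplitudes V0 and U0 of the example signals.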
Seizing the opportunity, I will propose a solution for the case where the V,U signals are triangular and have the same frequency. In this case, the Lissajous curve will be a parallelogram, then one can calculate its area A using A = 2*|D|*|d|*sin(q), where |D| and |d| are, respectively, the length of major and minor semi diagonals of the parallelogram and q is the angle between the vectors D and d.
Repeat steps 1 and 2 for the elliptical case.
In step 3 we will have:
|D| = max(r(t)) = r(t1); |d| = min(r(t)) = r(t2)
4' - Obtain t1 and t2 and use them to get the coordinates (V(t1)=V1,U(t1)=U1) and (V(t2)=V2,U(t2)=U2). Then the vectors D and d can be written as:
D=(V1,U1)-(V0,U0); d=(V2,U2)-(V0,U0)
5' - Calculate the angle q between D and d;
6' - Perform the calculation of A: A = 2*|D|*|d|*sin(q)

Intersection of nD line with convex hull in Python

I have created a convex hull using scipy.spatial.ConvexHull. I need to compute the intersection point between the convex hull and a ray, starting at 0 and in the direction of some other defined point. The convex hull is known to contain 0 so the intersection should be guaranteed. The dimension of the problem can vary between 2 and 5. I have tried some google searching but haven't found an answer. I am hoping this is a common problem with known solutions in computational geometry. Thank you.
According to qhull.org, the points x of a facet of the convex hull verify V.x+b=0, where V and b are given by hull.equations. (. stands for the dot product here. V is a normal vector of length one.)
If V is a normal, b is an offset, and x is a point inside the convex hull, then Vx+b <0.
If U is a vector of the ray starting in O, the equation of the ray is x=αU, α>0, so the intersection of ray and facet is x = αU = -b/(V.U) U. The unique intersection point with the hull corresponds to the minimum of the positive values of α.
The following code computes it:
import numpy as np
from scipy.spatial import ConvexHull

def hit(U, hull):
    eq = hull.equations.T
    V, b = eq[:-1], eq[-1]
    # V has shape (dim, n_facets), so U goes on the left of the product
    alpha = -b / np.dot(U, V)
    return np.min(alpha[alpha > 0]) * U
It is a pure numpy solution so it is fast. An example for 1 million points in the [-1,1]^3 cube :
In [13]: points=2*np.random.rand(10**6,3)-1;hull=ConvexHull(points)
In [14]: %timeit x=hit(np.ones(3),hull)
#array([ 0.98388702, 0.98388702, 0.98388702])
10000 loops, best of 3: 30 µs per loop
As mentioned by Ante in the comments, you need to find the closest intersection of all the lines/planes/hyper-planes in the hull.
To find the intersection of the ray with the hyperplane, do a dot product of the normalized ray with the hyperplane normal, which will tell you how far in the direction of the hyperplane normal you move for each unit distance along the ray.
If the dot product is negative it means that the hyperplane is in the opposite direction of the ray, if zero it means the ray is parallel to it and won't intersect.
Once you have a positive dot product, you can work out how far away the hyperplane is in the direction of the ray, by dividing the distance of the plane in the direction of the plane normal by the dot product. For example if the plane is 3 units away, and the dot product is 0.5, then you only get 0.5 units closer for every unit you move along the ray, so the hyperplane is 3 / 0.5 = 6 units away in the direction of the ray.
Once you have calculated this distance for all the hyperplanes and found the closest one, the intersection point is just the ray multiplied by the closest distance.
Here is a solution in Python (normalize function is from here):
import numpy as np

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm

def find_hull_intersection(hull, ray_point):
    # normalise ray_point
    unit_ray = normalize(ray_point)
    # find the closest line/plane/hyperplane in the hull:
    closest_plane = None
    closest_plane_distance = 0
    for plane in hull.equations:
        normal = plane[:-1]
        # each row of hull.equations is (normal, offset) with
        # normal.x + offset = 0 on the facet; with the origin inside
        # the hull the offset is negative, so the facet lies -offset
        # along the outward normal
        distance = -plane[-1]
        # if plane passes through the origin then return the origin
        if distance == 0:
            return np.multiply(ray_point, 0)  # return n-dimensional zero vector
        # find out how much we move along the plane normal for
        # every unit distance along the ray normal:
        dot_product = np.dot(normal, unit_ray)
        # check the dot product is positive, if not then the
        # plane is in the opposite direction to the ray
        if dot_product > 0:
            # calculate the distance of the plane
            # along the ray normal:
            ray_distance = distance / dot_product
            # is this the closest so far:
            if closest_plane is None or ray_distance < closest_plane_distance:
                closest_plane = plane
                closest_plane_distance = ray_distance
    # was there no valid plane? (should never happen):
    if closest_plane is None:
        return None
    # return the point along the unit_ray of the closest plane,
    # which will be the intersection point
    return np.multiply(unit_ray, closest_plane_distance)
Test code in 2D (the solution generalizes to higher dimensions):
from scipy.spatial import ConvexHull
import numpy as np
points = np.array([[-2, -2], [2, 0], [-1, 2]])
h = ConvexHull(points)
closest_point = find_hull_intersection(h, [1, -1])
print(closest_point)
output:
[ 0.66666667 -0.66666667]

Points on a geodesic line

I am working on a unit sphere. I would like to place N points on a straight line over the surface of the sphere (a geodesic) between two arbitrary points. The coordinates of these points are given in spherical coordinates (radians).
How do I compute a set of N equally spaced points along such a line? I would like to take the curvature of the sphere into account in my calculation.
I am using python 2.7.9
You may consider SLERP - spherical linear interpolation
P = P0*Sin(Omega*(1-t))/Sin(Omega) + P1*Sin(Omega * t)/Sin(Omega)
where Omega is central angle between start and end points (arc of great circle), t is parameter in range [0..1], for i-th point t(i) = i/N
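A minimal sketch of the SLERP formula above. The spherical convention (polar angle theta measured from +z, azimuth phi) and both function names are assumptions for illustration; the formula divides by Sin(Omega), so it breaks down when the two points coincide or are antipodal.

```python
import numpy as np

def sph_to_cart(theta, phi):
    """Unit vector for polar angle theta (from +z) and azimuth phi, in radians."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def slerp_points(p0, p1, n):
    """n + 1 equally spaced unit vectors from p0 to p1 along the great circle."""
    omega = np.arccos(np.clip(np.dot(p0, p1), -1.0, 1.0))   # central angle
    t = np.arange(n + 1)[:, None] / n                       # t(i) = i / n
    return (np.sin((1.0 - t) * omega) * p0 + np.sin(t * omega) * p1) / np.sin(omega)

p0 = sph_to_cart(np.pi / 2, 0.0)          # on the equator, along +x
p1 = sph_to_cart(np.pi / 2, np.pi / 2)    # on the equator, along +y
pts = slerp_points(p0, p1, 4)
print(pts)
```

Each row of pts is a Cartesian unit vector; converting back to spherical coordinates uses theta = arccos(z) and phi = arctan2(y, x) under the same convention.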
Let us reason geometrically.
Convert the two given points to Cartesian coordinates.
The angle between the position vectors from the center to P0 and P1 is given by the dot product
cos A = P0.P1
Construct a linear combination of these:
P = (1-t).P0 + t.P1
The angle between P and P0 is given by the dot product with P normalized
cos a = cos kA/N = P.P0/|P| = ((1-t) + t.cos A)/ sqrt((1-t)² + 2.(1-t).t.cos A + t²)
Squaring and rewriting, you obtain a second degree equation in t:
cos²a.(1-t)² + 2.(1-t).t.cos²a.cos A + t².cos²a - (1-t)² - 2.(1-t).t.cos A - t².cos²A = 0
- sin²a.(1-t)² - 2.(1-t).t.sin²a.cos A - t².(cos²A - cos² a) = 0
t²(-sin²a + 2.sin²a.cos A - cos²A + cos²a) + 2.t.sin²a.(1 - cos A) - sin²a = 0
Solve the equation, compute the vector P from its definition and normalize it.
Then revert to spherical coordinates. Varying k between 1 and N-1 will give you the required intermediate points.
Alternatively, you can use Rodrigues' rotation formula around an axis in 3D. The axis is given by the cross-product P0 x P1.
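The rotation approach can be sketched as follows: rotate P0 about the normalised axis P0 x P1 by the angles k*A/N. The function names are made up for illustration, and the sketch assumes P0 and P1 are unit vectors that are neither identical nor antipodal (otherwise the cross product is zero and the axis is undefined).

```python
import numpy as np

def rotate_about_axis(v, axis, angle):
    """Rodrigues' rotation of vector v about `axis` by `angle` radians."""
    k = axis / np.linalg.norm(axis)
    return (v * np.cos(angle)
            + np.cross(k, v) * np.sin(angle)
            + k * np.dot(k, v) * (1.0 - np.cos(angle)))

def geodesic_points(p0, p1, n):
    """n + 1 equally spaced points from p0 to p1 along the great circle."""
    axis = np.cross(p0, p1)
    total = np.arccos(np.clip(np.dot(p0, p1), -1.0, 1.0))   # angle A
    return np.array([rotate_about_axis(p0, axis, total * i / n)
                     for i in range(n + 1)])

p0 = np.array([1.0, 0.0, 0.0])
p1 = np.array([0.0, 1.0, 0.0])
pts = geodesic_points(p0, p1, 4)
print(pts)
```

Because the axis is perpendicular to P0, the last term of the formula vanishes here; it is kept so the helper stays a general rotation.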
