Plane fitting to 4 (or more) XYZ points - python

I have 4 points, which are very nearly coplanar - they belong to the 1,4-dihydropyridine (1,4-DHP) ring.
I need to calculate the distances from C3 and N1 to the plane defined by C1, C2, C4 and C5.
Calculating the distance is fine, but fitting the plane is quite difficult for me.
[Images: the 1,4-DHP cycle shown from two different views]
from array import *
from numpy import *
from scipy import *
# coordinates (XYZ) of C1, C2, C4 and C5
x = [0.274791784, -1.001679346, -1.851320839, 0.365840754]
y = [-1.155674199, -1.215133985, 0.053119249, 1.162878076]
z = [1.216239624, 0.764265677, 0.956099579, 1.198231236]
# plane equation Ax + By + Cz + D = 0
# non-fitted plane
abcd = [0.506645455682, -0.185724560275, -1.43998120646, 1.37626378129]
# creating distance variable
distance = zeros(4, float)
# calculating distance from point to plane
for i in range(4):
    distance[i] = (x[i]*abcd[0] + y[i]*abcd[1] + z[i]*abcd[2] + abcd[3])/sqrt(abcd[0]**2 + abcd[1]**2 + abcd[2]**2)
print(distance)
# calculating squares
squares = distance**2
print(squares)
How do I minimize sum(squares)? I have tried least squares, but it is too hard for me.

That sounds about right, but you should replace the nonlinear optimization with an SVD. The following creates the moment of inertia tensor, M, and then SVD's it to get the normal to the plane. This should be a close approximation to the least-squares fit and be much faster and more predictable. It returns the point-cloud center and the normal.
def planeFit(points):
    """
    p, n = planeFit(points)

    Given an array, points, of shape (d, ...)
    representing points in d-dimensional space,
    fit a d-dimensional plane to the points.
    Return a point, p, on the plane (the point-cloud centroid),
    and the normal, n.
    """
    import numpy as np
    from numpy.linalg import svd
    points = np.reshape(points, (np.shape(points)[0], -1))  # Collapse trailing dimensions
    assert points.shape[0] <= points.shape[1], "There are only {} points in {} dimensions.".format(points.shape[1], points.shape[0])
    ctr = points.mean(axis=1)
    x = points - ctr[:, np.newaxis]
    M = np.dot(x, x.T)  # Could also use np.cov(x) here.
    return ctr, svd(M)[0][:, -1]
For example: Construct a 2D cloud at (10, 100) that is thin in the x direction and 100 times bigger in the y direction:
>>> pts = np.diag((.1, 10)).dot(np.random.randn(2, 1000)) + np.reshape((10, 100), (2, -1))
The fit plane is very nearly at (10, 100) with a normal very nearly along the x axis.
>>> planeFit(pts)
(array([ 10.00382471, 99.48404676]),
array([ 9.99999881e-01, 4.88824145e-04]))
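To tie this back to the original question: once you have the centroid p and the unit normal n, the signed distance of any atom from the fitted plane is n . (r - p). A small sketch, assuming planeFit from above; the C3 position used here is a made-up placeholder, not taken from the original post:
import numpy as np

xyz_C1_C2_C4_C5 = np.array([
    [0.274791784, -1.001679346, -1.851320839, 0.365840754],
    [-1.155674199, -1.215133985, 0.053119249, 1.162878076],
    [1.216239624, 0.764265677, 0.956099579, 1.198231236]])
p, n = planeFit(xyz_C1_C2_C4_C5)        # centroid and unit normal of the fitted plane
r_C3 = np.array([-1.7, -0.6, 0.85])     # placeholder coordinates for C3
signed_distance = np.dot(n, r_C3 - p)   # distance of C3 from the plane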

Least squares should fit a plane easily. The equation for a plane is: ax + by + c = z. So set up matrices like this with all your data:
    | x_0  y_0  1 |
A = | x_1  y_1  1 |
    |     ...     |
    | x_n  y_n  1 |

and

    | a |
x = | b |
    | c |

and

    | z_0 |
B = | z_1 |
    | ... |
    | z_n |
In other words: Ax = B. Now solve for x, which contains your coefficients. Since you have more than 3 points, the system is over-determined, so you need to use the left pseudo-inverse. The answer is:

| a |
| b | = (A^T A)^-1 A^T B
| c |
And here is some simple Python code with an example:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
N_POINTS = 10
TARGET_X_SLOPE = 2
TARGET_y_SLOPE = 3
TARGET_OFFSET = 5
EXTENTS = 5
NOISE = 5
# create random data
xs = [np.random.uniform(-EXTENTS, EXTENTS) for i in range(N_POINTS)]
ys = [np.random.uniform(-EXTENTS, EXTENTS) for i in range(N_POINTS)]
zs = []
for i in range(N_POINTS):
    zs.append(xs[i]*TARGET_X_SLOPE +
              ys[i]*TARGET_y_SLOPE +
              TARGET_OFFSET + np.random.normal(scale=NOISE))
# plot raw data
plt.figure()
ax = plt.subplot(111, projection='3d')
ax.scatter(xs, ys, zs, color='b')
# do fit
tmp_A = []
tmp_b = []
for i in range(len(xs)):
    tmp_A.append([xs[i], ys[i], 1])
    tmp_b.append(zs[i])
b = np.matrix(tmp_b).T
A = np.matrix(tmp_A)
fit = (A.T * A).I * A.T * b
errors = b - A * fit
residual = np.linalg.norm(errors)
print("solution: %f x + %f y + %f = z" % (fit[0], fit[1], fit[2]))
print("errors:")
print(errors)
print("residual: {}".format(residual))
# plot plane
xlim = ax.get_xlim()
ylim = ax.get_ylim()
X, Y = np.meshgrid(np.arange(xlim[0], xlim[1]),
                   np.arange(ylim[0], ylim[1]))
Z = np.zeros(X.shape)
for r in range(X.shape[0]):
    for c in range(X.shape[1]):
        Z[r, c] = fit[0] * X[r, c] + fit[1] * Y[r, c] + fit[2]
ax.plot_wireframe(X,Y,Z, color='k')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
The solution for your points:
0.143509 x + 0.057196 y + 1.129595 = z
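As a side note, NumPy can solve the same over-determined system directly with np.linalg.lstsq, which avoids forming (A^T A)^-1 explicitly and is numerically better behaved. A small sketch, assuming the xs, ys, zs lists from the example above:
import numpy as np

A = np.column_stack([xs, ys, np.ones(len(xs))])
B = np.asarray(zs)
(a, b, c), *_ = np.linalg.lstsq(A, B, rcond=None)
print("lstsq solution: %f x + %f y + %f = z" % (a, b, c))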

The fact that you are fitting a plane is only slightly relevant here. What you are trying to do is minimize a particular function starting from a guess. For that, use scipy.optimize. Note that there is no guarantee that this is the globally optimal solution, only a locally optimal one. A different initial condition may converge to a different result; this works well if you start close to the local minimum you are seeking.
I've taken the liberty to clean up your code by taking advantage of numpy's broadcasting:
import numpy as np
# coordinates (XYZ) of C1, C2, C4 and C5
XYZ = np.array([
[0.274791784, -1.001679346, -1.851320839, 0.365840754],
[-1.155674199, -1.215133985, 0.053119249, 1.162878076],
[1.216239624, 0.764265677, 0.956099579, 1.198231236]])
# Initial guess of the plane
p0 = [0.506645455682, -0.185724560275, -1.43998120646, 1.37626378129]

def f_min(X, p):
    plane_xyz = p[0:3]
    distance = (plane_xyz*X.T).sum(axis=1) + p[3]
    return distance / np.linalg.norm(plane_xyz)

def residuals(params, signal, X):
    return f_min(X, params)
from scipy.optimize import leastsq
sol = leastsq(residuals, p0, args=(None, XYZ))[0]
print("Solution: ", sol)
print("Old Error: ", (f_min(XYZ, p0)**2).sum())
print("New Error: ", (f_min(XYZ, sol)**2).sum())
This gives:
Solution: [ 14.74286241 5.84070802 -101.4155017 114.6745077 ]
Old Error: 0.441513295404
New Error: 0.0453564286112
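Because the plane coefficients are only defined up to a common scale factor, it can be easier to compare fits after normalizing the solution so that the normal (A, B, C) has unit length; a small post-processing sketch, assuming sol from the leastsq call above:
import numpy as np

sol_unit = sol / np.linalg.norm(sol[:3])   # rescale so that (A, B, C) is a unit normal
print("Normalized solution:", sol_unit)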

This returns the 3D plane coefficients along with the RMSE of the fit.
The plane is provided in a homogeneous coordinate representation, meaning its dot product with the homogeneous coordinates of a point produces the distance between the two.
import numpy as np

def fit_plane(points):
    assert points.shape[1] == 3
    centroid = points.mean(axis=0)
    x = points - centroid[None, :]
    U, S, Vt = np.linalg.svd(x.T @ x)
    normal = U[:, -1]
    origin_distance = normal @ centroid
    rmse = np.sqrt(S[-1] / len(points))
    return np.hstack([normal, -origin_distance]), rmse
Minor note: the SVD can also be directly applied to the points instead of the outer product matrix, but I found it to be slower with NumPy's SVD implementation.
U, S, Vt = np.linalg.svd(x.T, full_matrices=False)
rmse = S[-1] / np.sqrt(len(points))
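For example, the per-point distances to the fitted plane follow directly from the homogeneous representation. A small usage sketch, assuming fit_plane from above; the point cloud here is a random placeholder:
import numpy as np

points = np.random.rand(100, 3)                          # placeholder (n, 3) point cloud
plane, rmse = fit_plane(points)
homogeneous = np.hstack([points, np.ones((len(points), 1))])
distances = homogeneous @ plane                          # signed point-to-plane distances
print(rmse, np.sqrt((distances**2).mean()))              # these should agree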

Another way, aside from SVD, to quickly reach a solution while dealing with outliers (when you have a large data set) is RANSAC:
import numpy as np

def fit_plane(voxels, iterations=50, inlier_thresh=10):  # voxels: x, y, z
    inliers, planes = [], []
    xy1 = np.concatenate([voxels[:, :-1], np.ones((voxels.shape[0], 1))], axis=1)
    z = voxels[:, -1].reshape(-1, 1)
    for _ in range(iterations):
        random_pts = voxels[np.random.choice(voxels.shape[0], voxels.shape[1] * 10, replace=False), :]
        plane_transformation, residual = fit_pts_to_plane(random_pts)
        inliers.append(((z - np.matmul(xy1, plane_transformation)) <= inlier_thresh).sum())
        planes.append(plane_transformation)
    return planes[np.array(inliers).argmax()]

def fit_pts_to_plane(voxels):  # x y z (m x 3)
    # https://math.stackexchange.com/questions/99299/best-fitting-plane-given-a-set-of-points
    xy1 = np.concatenate([voxels[:, :-1], np.ones((voxels.shape[0], 1))], axis=1)
    z = voxels[:, -1].reshape(-1, 1)
    fit = np.matmul(np.matmul(np.linalg.inv(np.matmul(xy1.T, xy1)), xy1.T), z)
    errors = z - np.matmul(xy1, fit)
    residual = np.linalg.norm(errors)
    return fit, residual
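A minimal usage sketch (the synthetic data and thresholds are illustrative assumptions, not from the original answer): build a noisy plane z = 2x + 3y + 5 with a few gross outliers and let the RANSAC loop pick the consensus plane.
import numpy as np

rng = np.random.default_rng(0)
xy = rng.uniform(-5, 5, size=(500, 2))
z = 2*xy[:, 0] + 3*xy[:, 1] + 5 + rng.normal(scale=0.1, size=500)
z[:20] -= 50                                   # gross outliers below the plane
voxels = np.column_stack([xy, z])
best_plane = fit_plane(voxels, iterations=100, inlier_thresh=1)
print(best_plane.ravel())                      # ideally close to [2, 3, 5]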

Here's one way. If your points are P[1]..P[n] then compute the mean M of these and subtract it from each, getting points p[1]..p[n]. Then compute C = Sum{ p[i]*p[i]' } (the "covariance" matrix of the points). Next diagonalise C, that is, find an orthogonal U and a diagonal E so that C = U*E*U'. If your points are indeed on a plane then one of the eigenvalues (i.e. the diagonal entries of E) will be very small (with perfect arithmetic it would be 0). In any case, if the j'th of these is the smallest, then let the j'th column of U be the normal N = (A, B, C) and compute D = -M'*N. These parameters define the "best" plane, the one such that the sum of the squares of the distances from the P[] to the plane is least.
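A compact NumPy sketch of this recipe (assuming the points are the rows of an (n, 3) array P; the array here is a random placeholder):
import numpy as np

P = np.random.rand(100, 3)              # placeholder (n, 3) array of points
M = P.mean(axis=0)                      # mean of the points
p = P - M                               # centred points
C = p.T @ p                             # "covariance" matrix, Sum{ p[i]*p[i]' }
evals, U = np.linalg.eigh(C)            # diagonalise: C = U*E*U'
N = U[:, np.argmin(evals)]              # column for the smallest eigenvalue = (A, B, C)
D = -np.dot(M, N)                       # plane: A*x + B*y + C*z + D = 0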

Related

Wondering if my code is correct for Simple Kriging

I am a student and we learned about simple kriging with an example of 3 points with known elevations and 1 point with unknown elevation with the assumption that the empirical semivariogram is represented by the linear regression line with Y intercept = 0 and slope of 4.0.
I attempted to generalize this for any number of points with known elevations and any number of points with unknown elevations. I will show my code, and then an edited version with 50 known points and a grid of 40,000 unknown points, plotted so that it looks like an interpolated surface. I just want to know if my code is working as it should, because I can't seem to find any examples of simple kriging done the way we did it in class.
'''
Simple Kriging using any number of known and unknown points
Assumption: Let us assume that the empirical semivariogram is represented
by the linear regression line with Y intercept = 0 and slope = 4.0
'''
__author__ = "Frank D"
import numpy as np
import matplotlib.pyplot as plt
import random
# Define the data in the form of a list of tuples
# k for known points xyz and u for unknown points xy
# k = [(3.00, 4.00, 120), (6.30, 3.40, 103.00), (2.00, 1.30, 142)]
# u = [(3.00, 3.00)]
k = [(random.uniform(0, 20), random.uniform(0, 20), random.uniform(100, 200)) for _ in range(50)]
u = [(random.uniform(0, 20), random.uniform(0, 20)) for _ in range(30)]
def distance(p1, p2):
    '''
    This function calculates the distance between two points.

    Keyword arguments:
    p1, p2 (tuple, list): coordinate with x and y as first and second indices.
    '''
    return ((p2[0]-p1[0])**2 + (p2[1]-p1[1])**2)**(1/2)
distance_matrix = np.zeros((len(k), len(k))) # Creating empty matrix
for i in range(len(k)):  # Looping through the k rows of the matrix
    for j in range(len(k)):  # Looping through the elements in each row
        distance_matrix[i, j] = distance(k[i], k[j])
semivariance_matrix = 4*distance_matrix
distance_to_unknowns = np.zeros((len(u), len(k))) # Creating empty matrix
for i in range(len(u)):
    for j in range(len(k)):
        distance_to_unknowns[i, j] = distance(u[i], k[j])
semivariance_to_unknowns = 4*distance_to_unknowns
# Assembling Gamma and appending Lagrange multipliers
# by adding ones to the right side and bottom and then
# setting the diagonal to zeros
gamma = np.append(semivariance_matrix, np.ones(
[semivariance_matrix.shape[0], 1]), axis=1)
gamma = np.append(gamma, np.ones([1, gamma.shape[1]]), axis=0)
np.fill_diagonal(gamma, 0)
# Assembling vector Beta and appending Lagrange multipliers
beta = np.append(semivariance_to_unknowns,
np.ones([len(u), 1]), axis=1).transpose()
# Calculating lambda:
lambda_vector = np.matmul(np.linalg.inv(gamma), beta)
# Finding the variance
variance = np.zeros([len(u), 1])
for i in range(len(u)):
    for j in range(len(k)):
        variance[i][0] += lambda_vector[j][i]*semivariance_to_unknowns[i][j]
# Finding the standard error
std_error = np.sqrt(variance)
# Print known points to the console
for i in range(len(k)):
    print(f'k{i} = {k[i]}')
# Assembling results vector containing elevations for unknown points
# and printing the results to the console
results = np.zeros([len(u), 1])
for i in range(len(u)):
    for j in range(len(k)):
        results[i][0] += lambda_vector[j][i]*k[j][2]
    print(f'u{i} = ({u[i][0]}, {u[i][1]}, {results[i][0]}), variance =' +
          f' {round(variance[i][0], 2)}, standard error =' +
          f' {round(std_error[i][0], 2)}, 95% CI = ' +
          f'({round(results[i][0] - 1.96*std_error[i][0], 2)},' +
          f' {round(results[i][0] + 1.96*std_error[i][0], 2)})')
# Plotting the results
x_known = [point[0] for point in k]
y_known = [point[1] for point in k]
x_unknown = [point[0] for point in u]
y_unknown = [point[1] for point in u]
plt.scatter(x_known, y_known)
plt.scatter(x_unknown, y_unknown, color='red')
plt.title('Scatterplot of x vs y')
plt.xlabel("x")
plt.ylabel("y", rotation='horizontal')
for i in range(len(k)):  # adding elevation labels for known points
    plt.annotate(round(k[i][2], 2), (k[i][0], k[i][1]))
for i in range(len(u)):  # adding elevation labels for unknown points
    plt.annotate(round(results[i][0], 2), (u[i][0], u[i][1]))
plt.show()
Here is the edited version with 40 000 points and a color gradient added to make it look like an interpolated surface.
'''
Simple Kriging using any number of known and a grid of unknown points
Assumption: Let us assume that the empirical semivariogram is represented
by the linear regression line with Y intercept = 0 and slope = 4.0
'''
__author__ = "Frank D"
import numpy as np
import matplotlib.pyplot as plt
import random
# Define the data in the form of a list of tuples
# k for known points xyz and u for unknown points xy
k = [(random.uniform(0, 20), random.uniform(0, 20), random.uniform(100, 200)) for _ in range(50)]
u=[]
for j in range(200):
    u += [(0.1*i, 0.1*j) for i in range(200)]
x_u_fill = [point[0] for point in u]
y_u_fill = [point[1] for point in u]
def distance(p1, p2):
    '''
    This function calculates the distance between two points.

    Keyword arguments:
    p1, p2 (tuple, list): coordinate with x and y as first and second indices.
    '''
    return ((p2[0]-p1[0])**2 + (p2[1]-p1[1])**2)**(1/2)
distance_matrix = np.zeros((len(k), len(k))) # Creating empty matrix
for i in range(len(k)):  # Looping through the k rows of the matrix
    for j in range(len(k)):  # Looping through the elements in each row
        distance_matrix[i, j] = distance(k[i], k[j])
semivariance_matrix = 4*distance_matrix
distance_to_unknowns = np.zeros((len(u), len(k))) # Creating empty matrix
for i in range(len(u)):
    for j in range(len(k)):
        distance_to_unknowns[i, j] = distance(u[i], k[j])
semivariance_to_unknowns = 4*distance_to_unknowns
# Assembling Gamma and appending Lagrange multipliers
# by adding ones to the right side and bottom and then
# setting the diagonal to zeros
gamma = np.append(semivariance_matrix, np.ones(
[semivariance_matrix.shape[0], 1]), axis=1)
gamma = np.append(gamma, np.ones([1, gamma.shape[1]]), axis=0)
np.fill_diagonal(gamma, 0)
# Assembling vector Beta and appending Lagrange multipliers
beta = np.append(semivariance_to_unknowns,
np.ones([len(u), 1]), axis=1).transpose()
# Calculating lambda:
lambda_vector = np.matmul(np.linalg.inv(gamma), beta)
# Finding the variance
variance = np.zeros([len(u), 1])
for i in range(len(u)):
    for j in range(len(k)):
        variance[i][0] += lambda_vector[j][i]*semivariance_to_unknowns[i][j]
# Finding the standard error
std_error = np.sqrt(variance)
# Print known points to the console
for i in range(len(k)):
    print(f'k{i} = {k[i]}')
# Assembling results vector containing elevations for unknown points
# and printing the results to the console
results = np.zeros([len(u), 1])
for i in range(len(u)):
    for j in range(len(k)):
        results[i][0] += lambda_vector[j][i]*k[j][2]
    print(f'u{i} = ({u[i][0]}, {u[i][1]}, {results[i][0]}), ' +
          f'variance = {round(variance[i][0], 2)}, standard error = ' +
          f'{round(std_error[i][0], 2)}, 95% CI = ' +
          f'({round(results[i][0] - 1.96*std_error[i][0], 2)}, ' +
          f'{round(results[i][0] + 1.96*std_error[i][0], 2)})')
# Plotting the results
fig, ax = plt.subplots()
scat_fill = ax.scatter(x_u_fill, y_u_fill, c=results, cmap='jet')
scat_fill.set_clim(100, 200)
x_known = [point[0] for point in k]
y_known = [point[1] for point in k]
plt.scatter(x_known, y_known, s=20)
plt.title('Scatterplot of x vs y')
plt.xlabel("x")
plt.ylabel("y", rotation='horizontal')
for i in range(len(k)):  # adding elevation labels for known points
    plt.annotate(round(k[i][2], 2), (round(k[i][0], 2), round(k[i][1], 2)))
plt.show()

How to use a smooth curve to link points approximately distributing in a circle?

I have a set of twelve points, which center at (0, 0) and distribute approximately in a circle, at the interval of 30 degrees, shown in the image.
[Image: the twelve points]
I want to use a smooth curve to link (go through) them like the image below (I draw the red line by hand).
[Image: a hand-drawn curve in red]
I want to make it in python or matlab. I have tried some interpolation methods for the upper half and lower half separately, and wanted to combine them as a complete curve. However, the results always overshoot.
Thank you for any suggestions!
I think the key here is to note that you have to consider it as a parametrized curve in 2d, not just a 1d to 2d function. Furthermore since it should be something like a circle, you need an interpolation method that supports periodic boundaries. Here are two methods for which this applies:
% set up toy data
t = linspace(0, 2*pi, 10);
t = t(1:end-1);
a = 0.08;
b = 0.08;
x = cos(t+a*randn(size(t))) + b*randn(size(t));
y = sin(t+a*randn(size(t))) + b*randn(size(t));
plot(x, y, 'ok');
% fourier interpolation
z = x+1i*y;
y = interpft(z, 200);
hold on
plot(real(y), imag(y), '-.r')
% periodic spline interpolation
z = [z, z(1)];
n = numel(z);
t = 1:n;
pp = csape(t, z, 'periodic');
ts = linspace(1, n, 200);
y = ppval(pp, ts);
plot(real(y), imag(y), ':b');
Thanks for the suggestions from @flawr. Following that answer, I implemented the periodic spline interpolation in Python (I am still working on implementing the Fourier interpolation in Python; a rough sketch is included after the code below). Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline
# set up toy data
t = np.linspace(0, 2*np.pi, 10)
t = t[0:-1]
a = 0.08
b = 0.08
x = np.cos(t + a * np.random.normal(size=len(t))) + b * np.random.normal(size=len(t))
y = np.sin(t + a * np.random.normal(size=len(t))) + b * np.random.normal(size=len(t))
plt.scatter(x, y)
# periodic spline interpolation
z = []
for idx in range(len(x)):
    z.append(complex(x[idx], y[idx]))
z.append(complex(x[0], y[0]))
len_z = len(z)
t = [i for i in range(len_z)]
cs = CubicSpline(t, z, bc_type='periodic')
xs = np.linspace(0, len_z, 200)
y_new = cs(xs)
plt.plot(y_new.real, y_new.imag)
plt.show()
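For the Fourier part (the analogue of MATLAB's interpft), one option is to zero-pad the FFT of the complex samples. A rough sketch, assuming the x and y arrays from the code above; fourier_interp is a made-up helper name and n_out must be at least the number of input samples:
import numpy as np

def fourier_interp(z, n_out):
    n_in = len(z)
    spec = np.fft.fft(z)
    half = (n_in + 1) // 2
    padded = np.zeros(n_out, dtype=complex)
    padded[:half] = spec[:half]                     # DC and positive frequencies
    padded[n_out - (n_in - half):] = spec[half:]    # negative frequencies
    return np.fft.ifft(padded) * (n_out / n_in)     # rescale for the longer inverse FFT

z_samples = x + 1j*y                    # no need to repeat the first point; the FFT assumes periodicity
z_fine = fourier_interp(z_samples, 200)
plt.plot(z_fine.real, z_fine.imag, '-.r')
plt.show()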

How to convert a cartesian problem in a cylindrical problem?

I display a gyroid structure (TPMS) in a Cartesian system using PyVista. I am now trying to display the structure in cylindrical coordinates. PyVista does display something cylindrical, but it seems that the unit cell length is not uniform, although there is no reason for it to change since my parameter "a" stays constant. The change appears especially along z and I don't understand why (see image).
[Image: the current cylindrical result]
Here is a part of my code.
Thank you for your help.
import pyvista as pv
import numpy as np
from numpy import cos, sin, pi
from random import uniform
lattice_par = 1.0 # Unit cell length
a = (2*pi)/lattice_par
res = 200j
r, theta, z = np.mgrid[0:2:res, 0:2*pi:res, 0:4:res]
# consider using non-equidistant r for uniformity
def GyroidCyl(r, theta, z, b=0.8):
    return (sin(a*(r*cos(theta) - 1))*cos(a*(r*sin(theta) - 1))
            + sin(a*(r*sin(theta) - 1))*cos(a*(z - 1))
            + sin(a*(z - 1))*cos(a*(r*cos(theta) - 1))
            - b)

b = 0.8  # volume fraction parameter (also used in the plot title below)
vol3 = GyroidCyl(r, theta, z, b)
# compute Cartesian coordinates for grid points
x = r * cos(theta)
y = r * sin(theta)
grid = pv.StructuredGrid(x, y, z)
grid["vol3"] = vol3.flatten()
contours3 = grid.contour([0]) # Isosurface = 0
pv.set_plot_theme('document')
p = pv.Plotter()
p.add_mesh(contours3, scalars=contours3.points[:, 2], show_scalar_bar=False, interpolate_before_map=True,
show_edges=False, smooth_shading=False, render=True)
p.show_axes_all()
p.add_floor()
p.show_grid()
p.add_title('Gyroid in cylindrical coordinates')
p.add_text('Volume Fraction Parameter = ' + str(b))
p.show(window_size=[2040, 1500])
So you've noted in comments that you're trying to replicate something like the strategy explained in this paper. What they do is take a regular gyroid unit cell, and then transform it to build a cylindrical shell. If igloos were cylindrical, then a gyroid cell would be a single piece of snow brick. Put them next to one another and stack them in a column, and you get a cylinder.
Since I can't use figures from the paper we'll have to recreate some ourselves. So you have to start from a regular gyroid defined by the implicit function
cos(x) sin(y) + cos(y) sin(z) + cos(z) sin(x) = 0
(or some variation thereof). Here's how a single unit cell looks:
import pyvista as pv
import numpy as np
res = 100j
a = 2*np.pi
x, y, z = np.mgrid[0:a:res, 0:a:res, 0:a:res]
def Gyroid(x, y, z):
    return np.cos(x)*np.sin(y) + np.cos(y)*np.sin(z) + np.cos(z)*np.sin(x)
# compute implicit function
fun_values = Gyroid(x, y, z)
# create grid for contouring
grid = pv.StructuredGrid(x, y, z)
grid["vol3"] = fun_values.ravel('F')
contours3 = grid.contour([0]) # isosurface for 0
# plot the contour, i.e. the gyroid
pv.set_plot_theme('document')
plotter = pv.Plotter()
plotter.add_mesh(contours3, scalars=contours3.points[:, -1],
show_scalar_bar=False)
plotter.add_bounding_box()
plotter.enable_terrain_style()
plotter.show_axes()
plotter.show()
Using the "unit cell" term implies there's an underlying infinite lattice, which can be built by stacking these (rectangular) unit cells neatly next to one another. With some imagination we can convince ourselves that this is true. Or we can look at the formula and note that due to the trigonometric functions the function is periodic in x, y and z, with period 2*pi. This also tells us that we can transform the unit cell to have arbitrary rectangular dimensions by introducing lattice parameters a, b and c:
cos(kx x) sin(ky y) + cos(ky y) sin(kz z) + cos(kz z) sin(kx x) = 0, where
kx = 2 pi/a
ky = 2 pi/b
kz = 2 pi/c
(These kx, ky and kz quantities are called wave vectors in solid state physics.)
The corresponding change only affects the header:
res = 100j
a, b, c = lattice_params = 1, 2, 3
kx, ky, kz = [2*np.pi/lattice_param for lattice_param in lattice_params]
x, y, z = np.mgrid[0:a:res, 0:b:res, 0:c:res]
def Gyroid(x, y, z):
    return ( np.cos(kx*x)*np.sin(ky*y)
           + np.cos(ky*y)*np.sin(kz*z)
           + np.cos(kz*z)*np.sin(kx*x))
This is where we start. What we have to do is take this unit cell, bend it so that it corresponds to a 30-degree circular arc on a cylinder, and stack the cylinder using this unit. According to the paper, they used 12 unit cells to create a circle in a plane (hence the 30-degree magic number), and stacked three such circular bands on top of each other to build the cylinder.
The actual mapping is also fairly clearly explained in the paper. Whereas your original x, y and z parameters of the function essentially interpolated between [0, a], [0, b] and [0, c], respectively, in the new setup x interpolates in the radius range [r1, r2], y interpolates in the angular range [0, pi/6] and z is just z. (In the paper x and y seem to be reversed with respect to this convention, but this shouldn't matter. If it matters, that's left as an exercise to the reader.)
So what we need to do is more or less keep the current grid points, but transform the corresponding x, y and z grid points so that they lie on a cylinder instead. Here's one take:
import pyvista as pv
import numpy as np
res = 100j
a, b, c = lattice_params = 1, 1, 1
kx, ky, kz = [2*np.pi/lattice_param for lattice_param in lattice_params]
r_aux, phi, z = np.mgrid[0:a:res, 0:b:res, 0:3*c:res]
# convert r_aux range to actual radii
r1, r2 = 1.5, 2
r = r2/a*r_aux + r1/a*(1 - r_aux)
def Gyroid(x, y, z):
    return ( np.cos(kx*x)*np.sin(ky*y)
           + np.cos(ky*y)*np.sin(kz*z)
           + np.cos(kz*z)*np.sin(kx*x))
# compute data for cylindrical gyroid
# r_aux is x, phi / 12 is y and z is z
fun_values = Gyroid(r_aux, phi * 12, z)
# compute Cartesian coordinates for grid points
x = r * np.cos(phi*ky)
y = r * np.sin(phi*ky)
grid = pv.StructuredGrid(x, y, z)
grid["vol3"] = fun_values.ravel('F')
contours3 = grid.contour([0])
# plot cylindrical gyroid
pv.set_plot_theme('document')
plotter = pv.Plotter()
plotter.add_mesh(contours3, scalars=contours3.points[:, -1],
show_scalar_bar=False)
plotter.add_bounding_box()
plotter.show_axes()
plotter.enable_terrain_style()
plotter.show()
If you want to look at a single transformed unit cell in the cylindrical setting, use a single domain of phi and z for the function and only convert to 1/12 a full circle for the grid points:
fun_values = Gyroid(r_aux, phi, z/3)
# compute Cartesian coordinates for grid points
x = r * np.cos(phi*ky/12)
y = r * np.sin(phi*ky/12)
grid = pv.StructuredGrid(x, y, z/3)
But it's not easy to see the curvature in the (no longer a) unit cell:
[Image: a single transformed unit cell]

Plotting orthogonal distances in python

Given a set of points and a line in 2D, I would like to plot the orthogonal distance between each point and the line. Any suggestions?
Find the equation of your given line in the form y = m*x + b, where m is the slope and b is the y-intercept. The slope of the perpendicular line is the negative reciprocal of your known slope (i.e. m2 = -1/m). Use the given point and the new slope m2 to get the equation of the line perpendicular to the given line which goes through your point. Set the second line equal to the first and solve for x and y; this is where the two lines intersect. Take the difference between the intersection and the given point and find its magnitude to get the distance between the given line and the given point:
distance = ((x2 - x)**2 + (y2 - y)**2)**0.5
where [x2, y2] is the given point and [x, y] is the intersection.
The sample code below illustrates this technique; it also generates a plot showing the given line, the given point, the perpendicular line, the intersection and the distance segment:
import matplotlib.pyplot as plt
import numpy as np
# points follow [x, y] format
line_point1 = [2, 3]
line_point2 = [6, 8]
random_point = [-6, 5]
def line(x, get_eq=False):
    m = (line_point1[1] - line_point2[1])/(line_point1[0] - line_point2[0])
    b = line_point1[1] - m*line_point1[0]
    if get_eq:
        return m, b
    else:
        return m*x + b

def perpendicular_line(x, get_eq=False):
    m, b = line(0, True)
    m2 = -1/m
    b2 = random_point[1] - m2*random_point[0]
    if get_eq:
        return m2, b2
    else:
        return m2*x + b2

def get_intersection():
    m, b = line(0, True)
    m2, b2 = perpendicular_line(0, True)
    x = (b2 - b) / (m - m2)
    y = line(x)
    return [x, y]
domain = np.linspace(-10, 10)
plt.figure(figsize=(8, 9))
plt.plot(domain, [line(x) for x in domain], label='given line')
plt.plot(random_point[0], random_point[1], 'ro', label='given point')
plt.plot(domain, [perpendicular_line(x) for x in domain], '--', color='orange', label='perpendicular line')
intersection = get_intersection()
plt.plot(intersection[0], intersection[1], 'go', label='intersection')
plt.plot([intersection[0], random_point[0]], [intersection[1], random_point[1]], color='black', label='distance')
plt.legend()
plt.grid()
plt.show()
distance = ((random_point[0] - intersection[0])**2 + (random_point[1] -
intersection[1])**2)**0.5
print(distance)
This is more like a math question. What @jacob says is a perfect solution using coordinate geometry. If you prefer to use vector math (and hence numpy arrays as vectors), you can go about it like this:
Consider the vector equation of a line: L = A + ql
(q is a free parameter, the one we wish to find)
The position vector of your point: P
The orthogonality condition (the vector from the given point to the line must have zero dot product with the line's direction): (L - P) . l = 0
Hence, (A + ql - P) . l = 0
or, q = (P - A) . l (since l . l = 1 for a unit vector)
(Bold denotes a vector, bold-small denotes a unit vector, all else are scalars)
We've found q; substituting q in the vector equation of the line yields the position vector of the foot of the perpendicular dropped from the point onto the line. Now just find the distance between the two points, which is the magnitude of the difference vector:
d = |P - L(q)|
A numpy implementation is pretty straightforward:
(Define A, l and P as numpy arrays first)
...
L = lambda q: A + q*l              # define the line as a function of q
q = numpy.dot(P - A, l)            # find q from the orthogonality condition (l is a unit vector)
d = numpy.linalg.norm(P - L(q))    # find the norm of the difference vector
The advantage to this method is, it works in N-dimensions as well.
Here is a resource to refer to, on vector equations of lines.
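A small self-contained version of the recipe above (the numbers are arbitrary illustration values, not taken from the original answer):
import numpy as np

A = np.array([2.0, 3.0])                 # a point on the line
l = np.array([4.0, 5.0])
l = l / np.linalg.norm(l)                # unit direction vector of the line
P = np.array([-6.0, 5.0])                # the given point

q = np.dot(P - A, l)                     # parameter of the foot of the perpendicular
foot = A + q * l                         # L(q)
d = np.linalg.norm(P - foot)             # orthogonal distance
print(d)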

Inverse Distance Weighted (IDW) Interpolation with Python

The Question:
What is the best way to calculate inverse distance weighted (IDW) interpolation in Python, for point locations?
Some Background:
Currently I'm using RPy2 to interface with R and its gstat module. Unfortunately, the gstat module conflicts with arcgisscripting which I got around by running RPy2 based analysis in a separate process. Even if this issue is resolved in a recent/future release, and efficiency can be improved, I'd still like to remove my dependency on installing R.
The gstat website does provide a stand alone executable, which is easier to package with my python script, but I still hope for a Python solution which doesn't require multiple writes to disk and launching external processes. The number of calls to the interpolation function, of separate sets of points and values, can approach 20,000 in the processing I'm performing.
I specifically need to interpolate for points, so using the IDW function in ArcGIS to generate rasters sounds even worse than using R in terms of performance, unless there is a way to efficiently mask out only the points I need. Even with this modification, I wouldn't expect performance to be all that great. I will look into this option as another alternative. UPDATE: The problem here is that you are tied to the cell size you are using. If you reduce the cell size to get better accuracy, processing takes a long time. You also need to follow up by extracting by points; overall an ugly method if you want values for specific points.
I have looked at the scipy documentation, but it doesn't look like there is a straightforward way to calculate IDW.
I'm thinking of rolling my own implementation, possibly using some of the scipy functionality to locate the closest points and calculate distances.
Am I missing something obvious? Is there a python module I haven't seen that does exactly what I want? Is creating my own implementation with the aid of scipy a wise choice?
changed 20 Oct: this class Invdisttree combines inverse-distance weighting and
scipy.spatial.KDTree.
Forget the original brute-force answer;
this is imho the method of choice for scattered-data interpolation.
""" invdisttree.py: inverse-distance-weighted interpolation using KDTree
fast, solid, local
"""
from __future__ import division
import numpy as np
from scipy.spatial import cKDTree as KDTree
# http://docs.scipy.org/doc/scipy/reference/spatial.html
__date__ = "2010-11-09 Nov" # weights, doc
#...............................................................................
class Invdisttree:
    """ inverse-distance-weighted interpolation using KDTree:
invdisttree = Invdisttree( X, z ) -- data points, values
interpol = invdisttree( q, nnear=3, eps=0, p=1, weights=None, stat=0 )
interpolates z from the 3 points nearest each query point q;
For example, interpol[ a query point q ]
finds the 3 data points nearest q, at distances d1 d2 d3
and returns the IDW average of the values z1 z2 z3
(z1/d1 + z2/d2 + z3/d3)
/ (1/d1 + 1/d2 + 1/d3)
= .55 z1 + .27 z2 + .18 z3 for distances 1 2 3
q may be one point, or a batch of points.
eps: approximate nearest, dist <= (1 + eps) * true nearest
p: use 1 / distance**p
weights: optional multipliers for 1 / distance**p, of the same shape as q
stat: accumulate wsum, wn for average weights
How many nearest neighbors should one take ?
a) start with 8 11 14 .. 28 in 2d 3d 4d .. 10d; see Wendel's formula
b) make 3 runs with nnear= e.g. 6 8 10, and look at the results --
|interpol 6 - interpol 8| etc., or |f - interpol*| if you have f(q).
I find that runtimes don't increase much at all with nnear -- ymmv.
p=1, p=2 ?
p=2 weights nearer points more, farther points less.
In 2d, the circles around query points have areas ~ distance**2,
so p=2 is inverse-area weighting. For example,
(z1/area1 + z2/area2 + z3/area3)
/ (1/area1 + 1/area2 + 1/area3)
= .74 z1 + .18 z2 + .08 z3 for distances 1 2 3
Similarly, in 3d, p=3 is inverse-volume weighting.
Scaling:
if different X coordinates measure different things, Euclidean distance
can be way off. For example, if X0 is in the range 0 to 1
but X1 0 to 1000, the X1 distances will swamp X0;
rescale the data, i.e. make X0.std() ~= X1.std() .
A nice property of IDW is that it's scale-free around query points:
if I have values z1 z2 z3 from 3 points at distances d1 d2 d3,
the IDW average
(z1/d1 + z2/d2 + z3/d3)
/ (1/d1 + 1/d2 + 1/d3)
is the same for distances 1 2 3, or 10 20 30 -- only the ratios matter.
In contrast, the commonly-used Gaussian kernel exp( - (distance/h)**2 )
is exceedingly sensitive to distance and to h.
"""
    # anykernel( dj / av dj ) is also scale-free
    # error analysis, |f(x) - idw(x)| ? todo: regular grid, nnear ndim+1, 2*ndim

    def __init__( self, X, z, leafsize=10, stat=0 ):
        assert len(X) == len(z), "len(X) %d != len(z) %d" % (len(X), len(z))
        self.tree = KDTree( X, leafsize=leafsize )  # build the tree
        self.z = z
        self.stat = stat
        self.wn = 0
        self.wsum = None

    def __call__( self, q, nnear=6, eps=0, p=1, weights=None ):
        # nnear nearest neighbours of each query point --
        q = np.asarray(q)
        qdim = q.ndim
        if qdim == 1:
            q = np.array([q])
        if self.wsum is None:
            self.wsum = np.zeros(nnear)

        self.distances, self.ix = self.tree.query( q, k=nnear, eps=eps )
        interpol = np.zeros( (len(self.distances),) + np.shape(self.z[0]) )
        jinterpol = 0
        for dist, ix in zip( self.distances, self.ix ):
            if nnear == 1:
                wz = self.z[ix]
            elif dist[0] < 1e-10:
                wz = self.z[ix[0]]
            else:  # weight z s by 1/dist --
                w = 1 / dist**p
                if weights is not None:
                    w *= weights[ix]  # >= 0
                w /= np.sum(w)
                wz = np.dot( w, self.z[ix] )
                if self.stat:
                    self.wn += 1
                    self.wsum += w
            interpol[jinterpol] = wz
            jinterpol += 1
        return interpol if qdim > 1 else interpol[0]
#...............................................................................
if __name__ == "__main__":
    import sys
    N = 10000
    Ndim = 2
    Nask = N  # N Nask 1e5: 24 sec 2d, 27 sec 3d on mac g4 ppc
    Nnear = 8  # 8 2d, 11 3d => 5 % chance one-sided -- Wendel, mathoverflow.com
    leafsize = 10
    eps = .1  # approximate nearest, dist <= (1 + eps) * true nearest
    p = 1  # weights ~ 1 / distance**p
    cycle = .25
    seed = 1

    exec("\n".join( sys.argv[1:] ))  # python this.py N= ...
    np.random.seed(seed)
    np.set_printoptions( 3, threshold=100, suppress=True )  # .3f
    print("\nInvdisttree: N %d Ndim %d Nask %d Nnear %d leafsize %d eps %.2g p %.2g" % (
        N, Ndim, Nask, Nnear, leafsize, eps, p))

    def terrain(x):
        """ ~ rolling hills """
        return np.sin( (2*np.pi / cycle) * np.mean( x, axis=-1 ))

    known = np.random.uniform( size=(N,Ndim) ) ** .5  # 1/(p+1): density x^p
    z = terrain( known )
    ask = np.random.uniform( size=(Nask,Ndim) )

#...............................................................................
    invdisttree = Invdisttree( known, z, leafsize=leafsize, stat=1 )
    interpol = invdisttree( ask, nnear=Nnear, eps=eps, p=p )

    print("average distances to nearest points: %s" %
          np.mean( invdisttree.distances, axis=0 ))
    print("average weights: %s" % (invdisttree.wsum / invdisttree.wn))
    # see Wikipedia Zipf's law
    err = np.abs( terrain(ask) - interpol )
    print("average |terrain() - interpolated|: %.2g" % np.mean(err))
    # print("interpolate a single point: %.2g" %
    #       invdisttree( known[0], nnear=Nnear, eps=eps ))
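For quick reference, a minimal call pattern outside the self-test above (shapes follow the docstring; the data here are placeholders):
import numpy as np

X = np.random.rand(100, 2)               # known coordinates
z = np.sin(2*np.pi * X[:, 0])            # known values
q = np.random.rand(10, 2)                # query points

invdisttree = Invdisttree(X, z, leafsize=10, stat=1)
zq = invdisttree(q, nnear=6, eps=0.1, p=2)   # IDW estimates at the query points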
Edit: @Denis is right, a linear Rbf (e.g. scipy.interpolate.Rbf with "function='linear'") isn't the same as IDW...
(Note, all of these will use excessive amounts of memory if you're using a large number of points!)
Here's a simple example of IDW:
def simple_idw(x, y, z, xi, yi):
    dist = distance_matrix(x,y, xi,yi)
    # In IDW, weights are 1 / distance
    weights = 1.0 / dist
    # Make weights sum to one
    weights /= weights.sum(axis=0)
    # Multiply the weights for each interpolated point by all observed Z-values
    zi = np.dot(weights.T, z)
    return zi
Whereas, here's what a linear Rbf would be:
def linear_rbf(x, y, z, xi, yi):
    dist = distance_matrix(x,y, xi,yi)
    # Mutual pairwise distances between observations
    internal_dist = distance_matrix(x,y, x,y)
    # Now solve for the weights such that the misfit at the observations is minimized
    weights = np.linalg.solve(internal_dist, z)
    # Multiply the weights for each interpolated point by the distances
    zi = np.dot(dist.T, weights)
    return zi
(Using the distance_matrix function here:)
def distance_matrix(x0, y0, x1, y1):
    obs = np.vstack((x0, y0)).T
    interp = np.vstack((x1, y1)).T
    # Make a distance matrix between pairwise observations
    # Note: from <http://stackoverflow.com/questions/1871536>
    # (Yay for ufuncs!)
    d0 = np.subtract.outer(obs[:,0], interp[:,0])
    d1 = np.subtract.outer(obs[:,1], interp[:,1])
    return np.hypot(d0, d1)
Putting it all together into a nice copy-paste example yields some quick comparison plots:
[Images: comparison plots of the three methods (source: jkington at www.geology.wisc.edu)]
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import Rbf
def main():
    # Setup: Generate data...
    n = 10
    nx, ny = 50, 50
    x, y, z = map(np.random.random, [n, n, n])
    xi = np.linspace(x.min(), x.max(), nx)
    yi = np.linspace(y.min(), y.max(), ny)
    xi, yi = np.meshgrid(xi, yi)
    xi, yi = xi.flatten(), yi.flatten()

    # Calculate IDW
    grid1 = simple_idw(x,y,z,xi,yi)
    grid1 = grid1.reshape((ny, nx))

    # Calculate scipy's RBF
    grid2 = scipy_idw(x,y,z,xi,yi)
    grid2 = grid2.reshape((ny, nx))

    grid3 = linear_rbf(x,y,z,xi,yi)
    print(grid3.shape)
    grid3 = grid3.reshape((ny, nx))

    # Comparisons...
    plot(x,y,z,grid1)
    plt.title('Homemade IDW')

    plot(x,y,z,grid2)
    plt.title("Scipy's Rbf with function=linear")

    plot(x,y,z,grid3)
    plt.title('Homemade linear Rbf')

    plt.show()

def simple_idw(x, y, z, xi, yi):
    dist = distance_matrix(x,y, xi,yi)
    # In IDW, weights are 1 / distance
    weights = 1.0 / dist
    # Make weights sum to one
    weights /= weights.sum(axis=0)
    # Multiply the weights for each interpolated point by all observed Z-values
    zi = np.dot(weights.T, z)
    return zi

def linear_rbf(x, y, z, xi, yi):
    dist = distance_matrix(x,y, xi,yi)
    # Mutual pairwise distances between observations
    internal_dist = distance_matrix(x,y, x,y)
    # Now solve for the weights such that the misfit at the observations is minimized
    weights = np.linalg.solve(internal_dist, z)
    # Multiply the weights for each interpolated point by the distances
    zi = np.dot(dist.T, weights)
    return zi

def scipy_idw(x, y, z, xi, yi):
    interp = Rbf(x, y, z, function='linear')
    return interp(xi, yi)

def distance_matrix(x0, y0, x1, y1):
    obs = np.vstack((x0, y0)).T
    interp = np.vstack((x1, y1)).T
    # Make a distance matrix between pairwise observations
    # Note: from <http://stackoverflow.com/questions/1871536>
    # (Yay for ufuncs!)
    d0 = np.subtract.outer(obs[:,0], interp[:,0])
    d1 = np.subtract.outer(obs[:,1], interp[:,1])
    return np.hypot(d0, d1)

def plot(x,y,z,grid):
    plt.figure()
    plt.imshow(grid, extent=(x.min(), x.max(), y.max(), y.min()))
    # (plt.hold is no longer needed in modern matplotlib)
    plt.scatter(x,y,c=z)
    plt.colorbar()

if __name__ == '__main__':
    main()
I also needed something fast, and I started with the @joerington solution, but ended up at Numba.
I always experiment between SciPy, NumPy and Numba and choose the best one. For this problem I used Numba: the extra temporary memory is negligible and it gives super speed.
With NumPy there is a trade-off between memory and speed. For example, on 16 GB of RAM, if you want to interpolate 50,000 points onto another 50,000 points, it will either go out of memory or be incredibly slow, no matter what.
So to save memory we need to use for loops, to keep temporary memory allocation to a minimum. But writing for loops in NumPy would mean losing possible vectorization. For this we have Numba: you can add a Numba jit decorator to a function with for loops over NumPy arrays and it will effectively vectorize it on the hardware, with additional parallelism on multi-core. It gives a better speed-up for the huge-array case, and you can also run it on a GPU without writing CUDA.
An extremely simple snippet is to calculate the distance matrix; in the IDW case we need the inverse distance matrix. But even for methods other than IDW you can do something similar.
There are also a few caution points about custom methods for calculating the hypotenuse; see the links under "Refer" below.
import numpy as np
import numba as nb

@nb.njit((nb.float64[:, :], nb.float64[:, :]), parallel=True)
def f2(d0, d1):
    print('Numba with parallel')
    res = np.empty((d0.shape[0], d1.shape[0]), dtype=d0.dtype)
    for i in nb.prange(d0.shape[0]):
        for j in range(d1.shape[0]):
            res[i, j] = np.sqrt((d0[i, 0] - d1[j, 0])**2 + (d0[i, 1] - d1[j, 1])**2)
    return res
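A hypothetical usage sketch for f2 above (the arrays are placeholders); the result is the pairwise distance matrix, which can then be inverted to form IDW weights:
import numpy as np

pts_a = np.random.rand(1000, 2)
pts_b = np.random.rand(2000, 2)
dist = f2(pts_a, pts_b)                        # (1000, 2000) distance matrix
inv_dist = 1.0 / np.maximum(dist, 1e-12)       # guard against division by zero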
Also, recent Numba is becoming compatible with scikit, so that is a +1.
Refer:
Why np.hypot and np.subtract.outer very fast compared to vanilla broadcast ? Using Numba for speedup numpy in parallel for distance matrix calculation
Custom dtype in numpy for lattitude, longitude for faster distance matrix/krigging/IDW interpolation calculations
