Say we have a 2D grid that is projected onto a 3D surface, resulting in a 3D numpy array, like in the image below. What is the most efficient way to calculate a surface normal for each point of this grid?
I can give you an example with simulated data:
I implemented your approach with three points. With three points you can always take the cross product of the two vectors they span to get a vector perpendicular to the surface; the order of the points only affects the sign (direction) of the resulting normal.
I took the liberty of also adding the PCA approach, using predefined sklearn functions. You could create your own PCA (a good exercise to understand what happens under the hood), but this works fine. The benefit of this approach is that it is easy to increase the number of neighbors and still be able to calculate the normal vector. It is also possible to select the neighbors within a radius instead of the N nearest neighbors.
If you need more explanation about how the code works, please let me know.
from functools import partial
import numpy as np
from sklearn.neighbors import KDTree
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Grab some test data.
X, Y, Z = axes3d.get_test_data(0.25)
X, Y, Z = map(lambda x: x.flatten(), [X, Y, Z])
plt.plot(X, Y, Z, '.')
plt.show(block=False)
data = np.array([X, Y, Z]).T
tree = KDTree(data, metric='minkowski')  # Minkowski with the default p=2 is the Euclidean distance
# Get indices and distances:
dist, ind = tree.query(data, k=3) #k=3 points including itself
def calc_cross(p1, p2, p3):
    v1 = p2 - p1
    v2 = p3 - p1
    v3 = np.cross(v1, v2)
    return v3 / np.linalg.norm(v3)
def PCA_unit_vector(array, pca=PCA(n_components=3)):
    pca.fit(array)
    eigenvalues = pca.explained_variance_
    # The normal is the principal component with the smallest variance
    return pca.components_[np.argmin(eigenvalues)]
combinations = data[ind]
normals = list(map(lambda x: calc_cross(*x), combinations))
# lazy with map
normals2 = list(map(PCA_unit_vector, combinations))
## NEW ##
def calc_angle_with_xy(vectors):
    '''
    Assuming unit vectors!
    '''
    l = np.sum(vectors[:, :2]**2, axis=1) ** 0.5
    return np.arctan2(vectors[:, 2], l)
dist, ind = tree.query(data, k=5)  # k=5 points including the point itself
combinations = data[ind]
# map with functools
pca = PCA(n_components=3)
normals3 = list(map(partial(PCA_unit_vector, pca=pca), combinations))
print( combinations[10] )
print(normals3[10])
n = np.array(normals3)
n[calc_angle_with_xy(n) < 0] *= -1
def set_axes_equal(ax):
    '''Make axes of 3D plot have equal scale so that spheres appear as spheres,
    cubes as cubes, etc. This is one possible solution to Matplotlib's
    ax.set_aspect('equal') and ax.axis('equal') not working for 3D.

    Input
      ax: a matplotlib axis, e.g., as output from plt.gca().

    FROM: https://stackoverflow.com/questions/13685386/matplotlib-equal-unit-length-with-equal-aspect-ratio-z-axis-is-not-equal-to
    '''
    x_limits = ax.get_xlim3d()
    y_limits = ax.get_ylim3d()
    z_limits = ax.get_zlim3d()

    x_range = abs(x_limits[1] - x_limits[0])
    x_middle = np.mean(x_limits)
    y_range = abs(y_limits[1] - y_limits[0])
    y_middle = np.mean(y_limits)
    z_range = abs(z_limits[1] - z_limits[0])
    z_middle = np.mean(z_limits)

    # The plot bounding box is a sphere in the sense of the infinity
    # norm, hence I call half the max range the plot radius.
    plot_radius = 0.5*max([x_range, y_range, z_range])

    ax.set_xlim3d([x_middle - plot_radius, x_middle + plot_radius])
    ax.set_ylim3d([y_middle - plot_radius, y_middle + plot_radius])
    ax.set_zlim3d([z_middle - plot_radius, z_middle + plot_radius])
u, v, w = n.T
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# ax.set_aspect('equal')
# Plot the (re-oriented) normals as arrows
ax.quiver(X, Y, Z, u, v, w, length=10, normalize=True)
set_axes_equal(ax)
plt.show()
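As mentioned above, you can also select the neighbors within a radius instead of the N nearest. A minimal sketch of that variant, reusing the tree and PCA_unit_vector from above (the radius of 10 is an arbitrary choice for this test surface; a group needs at least three points for the PCA to define a plane):

ind_radius = tree.query_radius(data, r=10)
normals_radius = [PCA_unit_vector(data[idx]) for idx in ind_radius if len(idx) >= 3]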
The surface normal for a point cloud is not well defined. One way to define it is via the surface normals of a mesh reconstructed by triangulation (which can introduce artifacts depending on your specific input). A relatively simple and fast solution is to use VTK for that, more specifically vtkSurfaceReconstructionFilter and vtkPolyDataNormals. Depending on your needs, it might be useful to apply other filters as well.
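For reference, a minimal sketch of such a VTK pipeline, under the assumption that the points sit in an (N, 3) numpy array; vtkSurfaceReconstructionFilter produces an implicit distance function, so a vtkContourFilter extracts the zero-level surface before the normals are computed:

import vtk
import numpy as np

data = np.random.rand(1000, 3)  # stand-in for your (N, 3) point cloud

points = vtk.vtkPoints()
for p in data:
    points.InsertNextPoint(*p)
polydata = vtk.vtkPolyData()
polydata.SetPoints(points)

# Reconstruct an implicit surface from the point cloud...
recon = vtk.vtkSurfaceReconstructionFilter()
recon.SetInputData(polydata)

# ...extract its zero iso-surface as a mesh...
contour = vtk.vtkContourFilter()
contour.SetInputConnection(recon.GetOutputPort())
contour.SetValue(0, 0.0)

# ...and compute per-point normals on that mesh.
normals = vtk.vtkPolyDataNormals()
normals.SetInputConnection(contour.GetOutputPort())
normals.Update()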
Goal
I would like to compute the 3D volume integral of a numeric scalar field.
Code
For this post, I will use an example of which the integral can be exactly computed. I have therefore chosen the following function: f(x, y, z) = x*y*z, whose integral over the unit cube [0, 1]^3 is exactly 1/8 = 0.125.
In Python, I define the function, and a set of points in 3D, and then generate the discrete values at these points:
import numpy as np
# Make data.
def function(x, y, z):
    return x**y**z
N = 5
grid = np.meshgrid(
    np.linspace(0, 1, N),
    np.linspace(0, 1, N),
    np.linspace(0, 1, N)
)
points = np.vstack(list(map(np.ravel, grid))).T
x = points[:, 0]
y = points[:, 1]
z = points[:, 2]
values = [function(points[i, 0], points[i, 1], points[i, 2])
          for i in range(len(points))]
Question
How can I find the integral, if I don't know the underlying function, i.e. if I only have the coordinates (x, y, z) and the values?
A nice way to go about this would be using scipy's tplquad integration. However, to use that, we need a function, not a point cloud.
An easy way around that is to use an interpolator to get a function approximating our point cloud. We can for example use scipy's RegularGridInterpolator if the data is on a regular grid:
import numpy as np
from scipy import integrate
from scipy.interpolate import RegularGridInterpolator
# Make data.
def function(x, y, z):
    return x*y*z
N = 5
xmin, xmax = 0, 1
ymin, ymax = 0, 1
zmin, zmax = 0, 1
x = np.linspace(xmin, xmax, N)
y = np.linspace(ymin, ymax, N)
z = np.linspace(zmin, zmax, N)
values = function(*np.meshgrid(x,y,z, indexing='ij'))
# Interpolate:
function_interpolated = RegularGridInterpolator((x, y, z), values)
# tplquad integrates func(z, y, x)
f = lambda z, y, x: function_interpolated([x, y, z])[0]
result, error = integrate.tplquad(
    f, xmin, xmax,
    lambda x: ymin, lambda x: ymax,
    lambda x, y: zmin, lambda x, y: zmax)
In the example above, we get result = 0.12499999999999999, which matches the exact value of 1/8 = 0.125 - close enough!
The easiest way to achieve what you are looking for is probably scipy's nquad integration function. Here is your example:
from scipy import integrate
# Make data.
def func(x, y, z):
    return x**y**z
ranges = [[0,1], [0,1], [0,1]]
result, error = integrate.nquad(func, ranges)
Are you aware that the function you created is different from the one you show in the image? The one you created is an exponential (x**y**z), while the one you are showing is just a multiplication. If you want to represent the function in the image, use
def func(x, y, z):
    return x*y*z
Hope this answers your question, otherwise just write a comment!
Edit:
Misread your post. If you only have the results, and they are not regularly spaced, you would have to figure out some form of interpolation (e.g. linear) and a lookup table. If you do not know how to create that, let me know. The rest of the stated answer could still be used if you define func to return interpolated values from your original data.
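For instance, a minimal sketch of that idea, assuming scattered samples in a hypothetical (n, 3) array xyz with matching values vals (filled with fake data here): scipy's LinearNDInterpolator builds a piecewise-linear interpolant over scattered points, which nquad can then integrate.

import numpy as np
from scipy import integrate
from scipy.interpolate import LinearNDInterpolator

# Fake scattered samples of x*y*z, standing in for the real data.
rng = np.random.default_rng(0)
xyz = rng.uniform(0, 1, size=(200, 3))
vals = xyz[:, 0] * xyz[:, 1] * xyz[:, 2]

interp = LinearNDInterpolator(xyz, vals, fill_value=0.0)

def func(x, y, z):
    # nquad passes scalars; the interpolant expects an array of points.
    return interp([[x, y, z]])[0]

ranges = [[0, 1], [0, 1], [0, 1]]
result, error = integrate.nquad(func, ranges)

Expect this to be slow, and only as accurate as the interpolation: with fill_value=0.0, the estimate is biased slightly low wherever the samples' convex hull does not cover the integration domain.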
The first answer explains nicely the principal approach to handle this. I just wanted to illustrate an alternative way, by showing the power of the sklearn package and machine-learning regression.
Doing the meshgrid in 3D gives a very large numpy array:
import numpy as np
N = 5
xmin, xmax = 0, 1
ymin, ymax = 0, 1
zmin, zmax = 0, 1
x = np.linspace(xmin, xmax, N)
y = np.linspace(ymin, ymax, N)
z = np.linspace(zmin, zmax, N)
grid = np.array(np.meshgrid(x,y,z, indexing='ij'))
grid.shape  # (3, 5, 5, 5) -> 3*5*5*5 = 375 numbers
Which is visually not very intuitive with 375 numbers, and comes with different possible indexing conventions ('ij' or 'xy'). Using regression we can get the same result with only a few input points (15-20).
# building random combinations from (x,y,z)
X = np.random.choice(x, 20)[:,None]
Y = np.random.choice(y, 20)[:,None]
Z = np.random.choice(z, 20)[:,None]
xyz = np.concatenate((X,Y,Z), axis = 1)
data = np.multiply.reduce(xyz, axis = 1)
So the input (grid) is just a 2D numpy array:
xyz.shape  # (20, 3)
with the corresponding data:
data.shape  # (20,)
Now the regression function and integration,
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from scipy import integrate
pipe = Pipeline([('polynomial', PolynomialFeatures(degree=3)),
                 ('model', LinearRegression())])
pipe.fit(xyz, data)
def func(x, y, z):
    return pipe.predict([[x, y, z]])[0]
ranges = [[0,1], [0,1], [0,1]]
result, error = integrate.nquad(func, ranges)
print(result)
# 0.1257
This approach is useful when only a limited number of points is available.
Based on your requirements, it sounds like the most appropriate technique would be Monte Carlo integration:
import numpy as np
from scipy.spatial import ConvexHull

# Step 0 - start with some empirical data
observed_points = np.random.uniform(0, 1, size=(10000, 3))
unknown_fn = lambda x: np.prod(x) # just used to generate fake values
observed_values = np.apply_along_axis(unknown_fn, 1, observed_points)
K = 1000000
# Step 1 - assume that f(x,y,z) can be approximated by an interpolation
# of the data we have (you could get really fancy with the
# selection of interpolation method - we'll stick with straight lines here)
from scipy.interpolate import LinearNDInterpolator
f_interpolate = LinearNDInterpolator(observed_points, observed_values)
# Step 2 randomly sample from within convex hull of observed data
# Step 2a - Uniformly sample from bounding 3D-box of data
lower_bounds = observed_points.min(axis=0)
upper_bounds = observed_points.max(axis=0)
sampled_points = np.random.uniform(lower_bounds, upper_bounds,size=(K, 3))
# Step 2b - Reject points outside of convex hull...
# Luckily, we get a np.nan from LinearNDInterpolator in this case
sampled_values = f_interpolate(sampled_points)
rejected_idxs = np.argwhere(np.isnan(sampled_values))
# Step 2c - Remember accepted values of estimated f(x_i, y_i, z_i)
final_sampled_values = np.delete(sampled_values, rejected_idxs, axis=0)
# Step 3 - Calculate estimate of volume of observed data domain
# Since we sampled uniformly from the convex hull of data domain,
# each point was selected with P(x,y,z)= 1 / Volume of convex hull
volume = ConvexHull(observed_points).volume
# Step 4 - Multiply estimated volume of domain by average sampled value
I_hat = volume * final_sampled_values.mean()
print(I_hat)
For a derivation of why this works, see: https://cs.dartmouth.edu/wjarosz/publications/dissertation/appendixA.pdf
I have two sets 1__scatter_xyz.dat and 2__scatter_xyz.dat of scattered points.
These points are defined by 3 coordinates: x, y, z
1__scatter_xyz.dat : https://paste.ubuntu.com/25069931/
2__scatter_xyz.dat : https://paste.ubuntu.com/25069938/
These two sets of scattered points intersect in a region:
gnuplot> splot "1__scatter_xyz.dat" using 3:1:2 with points lt 1 title "1", "2__scatter_xyz.dat" using 3:1:2 with points lt 1 lc 2 title "2"
gnuplot> set xlabel 'x'
gnuplot> set ylabel 'y'
gnuplot> set zlabel 'z'
The crossing between the surface of set 1 and the surface of set 2 will define a line/curve that, plotted in a 2D y-x diagram, will give us the phase boundary between these two sets.
I would like to plot in a 2D y-x diagram this line/curve that arises from the crossing of both surfaces.
The way I thought of attacking this problem:
We can define a new function, w = z_1 - z_2. The crossing between these two surfaces will be the points where w = (z_1 - z_2) = 0. I could then define two regions:
a) A region where w = 0
b) A region where w ≠ 0
If I plot these two values of w in a 2D y-x diagram, I could then define this line/curve as the phase boundary between these two sets:
a) The region where w = 0 is where both sets coexist together
b) The region where w ≠ 0 is where both sets do not coexist together
Why I cannot progress with this solution:
If we just remove the blank lines in the .dat files and sort them x-wise:
sed '/^\s*$/d' 1__scatter_xyz.dat | grep -v "^#" | sort -k1 -n > 1__scatter_xyz_sort_x_wise.dat
sed '/^\s*$/d' 2__scatter_xyz.dat | grep -v "^#" | sort -k1 -n > 2__scatter_xyz_sort_x_wise.dat
If you look at both x_wise.dat files, there is overlapping data:
set 1 goes from a y of -4.41 to 10.85, and set 2 goes from 8.06 to 17.64. The array of y is different on both sets. However, the array of x is the same: from 10 to 2000 with a step of 20.1.
Thus, set 1 and set 2 share the same array of x_j: from 10 to 2000 in steps of 20.1. However, they do not share the same array of y's: there is an array y_i^(1) for set 1 and a different array y_i^(2) for set 2.
In other words: imagine that I find a point where both surfaces have the same value of z. This point will be defined by x_j, y_i^(1) and y_i^(2), rather than by two unique coordinates.
More efficient ideas are more than welcome.
Using scipy's griddata for this:
import numpy as np
import sys
import matplotlib.pyplot as plt
from scipy.interpolate import griddata
# Load data:
x_1, y_1, z_1 = np.loadtxt("./1__scatter_xyz.dat", skiprows=1).T
x_2, y_2, z_2 = np.loadtxt("./2__scatter_xyz.dat", skiprows=1).T
# According to the example posted in the above scipy's griddata link,
# variables "points" and "values" are defined, so we can similarly use:
points_1 = (x_1, y_1)
points_2 = (x_2, y_2)
values_1 = z_1
values_2 = z_2
We would now have to define the grid.
As explained above, the array of y is sampled differently in both sets. If we carefully study the data, there is a region in y space where both sets overlap:
So, continuing with this scipy's griddata example, we can set:
T_initial = 10.0
T_end = 2000.0
number_of_Ts = 100
P_initial = 8.0622
P_end = 10.8535
number_of_Ps = 100
# And then define the mesh as:
grid_T, grid_P = np.meshgrid(np.linspace(T_initial, T_end, number_of_Ts), np.linspace(P_initial, P_end, number_of_Ps))
At this point I do not know how to continue: can we actually just define two sets of grids?
grid_Gibbs_1 = griddata(points_1, values_1, (grid_T, grid_P), method='cubic')
grid_Gibbs_2 = griddata(points_2, values_2, (grid_T, grid_P), method='cubic')
Which would be the approach to follow ?
Let f(x,y) and g(x,y) denote the functions corresponding to your two surfaces. What you are looking for is to plot the contour corresponding to the equation f(x,y) == g(x,y), or, equivalently f(x,y) - g(x,y) == 0.
Matplotlib offers the function contour for this purpose. As a simple example, consider the two surfaces given by the functions
import numpy as np
def f(x, y):
    return np.exp(-(x**2 + y**2))

def g(x, y):
    return (3*x**2 + y**2)/16
The following snippet plots the function f-g, the (3D) contour corresponding to f-g == 0, as well as its (2D) projection onto the z-plane:
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection='3d')
X = np.linspace(-2, 2, 30)
Y = np.linspace(-2, 2, 30)
X, Y = np.meshgrid(X, Y)
Z = g(X,Y)
ax.plot_surface(X, Y, f(X,Y)-g(X,Y), rstride=1, cstride=1, cmap = cm.viridis, antialiased=False, alpha = 0.5)
ax.contour(X, Y, f(X,Y) - g(X,Y), zdir='z', offset=-2, levels = [0])
ax.contour(X, Y, f(X,Y) - g(X,Y), levels = [0])
ax.set_zlim(zmin = -2)
In your case, you have data samples instead of functions. You may easily obtain (approximate) functions of the surfaces from your data by interpolation (see scipy.interpolate).
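Applied to the data in the question, a minimal sketch (assuming the x_1, y_1, z_1 and x_2, y_2, z_2 arrays loaded earlier, and the overlapping y range identified above) could look like this:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# Common grid covering the shared x range and the overlapping y range.
grid_x, grid_y = np.meshgrid(np.linspace(10, 2000, 200),
                             np.linspace(8.0622, 10.8535, 200))

z1_grid = griddata((x_1, y_1), z_1, (grid_x, grid_y), method='cubic')
z2_grid = griddata((x_2, y_2), z_2, (grid_x, grid_y), method='cubic')

# The zero-level contour of the difference is the phase boundary.
plt.contour(grid_x, grid_y, z1_grid - z2_grid, levels=[0])
plt.xlabel('x')
plt.ylabel('y')
plt.show()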
I am trying to find pairs of (x,y) points within a maximum distance of each other. I thought the simplest thing to do would be to generate a DataFrame and go through each point, one by one, calculating if there are points with coordinates (x,y) within distance r of the given point (x_0, y_0). Then, divide the total number of discovered pairs by 2.
%pylab inline
import pandas as pd
def find_nbrs(low, high, num, max_d):
    x = random.uniform(low, high, num)
    y = random.uniform(low, high, num)
    points = pd.DataFrame({'x': x, 'y': y})
    tot_nbrs = 0
    for i in arange(len(points)):
        x_0 = points.x[i]
        y_0 = points.y[i]
        pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]
        tot_nbrs += len(pt_nbrz)
        plot(pt_nbrz.x, pt_nbrz.y, 'r-')
    plot(points.x, points.y, 'b.')
    return tot_nbrs
print(find_nbrs(0, 1, 50, 0.1))
First of all, it's not always finding the right pairs (I see points that are within the stated distance that are not labeled).
If I write plot(..., 'or'), it highlights all the points. Which means that pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2] returns at least one (x,y). Why? Shouldn't it return an empty array if the comparison is False?
How do I do all of the above more elegantly in Pandas? For example, without having to loop through each element.
The functionality you're looking for is included in scipy's spatial distance module.
Here's an example of how you could use it. The real magic is in squareform(pdist(points)).
from scipy.spatial.distance import pdist, squareform
import numpy as np
import matplotlib.pyplot as plt
points = np.random.uniform(-.5, .5, (1000,2))
# Compute the distance between each different pair of points in X with pdist.
# Then, just for ease of working, convert to a typical symmetric distance matrix
# with squareform.
dists = squareform(pdist(points))
poi = points[4] # point of interest
dist_min = .1
close_points = dists[4] < dist_min
print("There are {} other points within a distance of {} from the point "
"({:.3f}, {:.3f})".format(close_points.sum() - 1, dist_min, *poi))
There are 27 other points within a distance of 0.1 from the point (0.194, 0.160)
For visualization purposes:
f, ax = plt.subplots(subplot_kw=dict(aspect='equal', xlim=(-.5, .5), ylim=(-.5, .5)))
ax.plot(points[:, 0], points[:, 1], 'b+')
ax.plot(poi[0], poi[1], ms=15, marker='s', mfc='none', mec='g')
ax.plot(points[close_points, 0], points[close_points, 1],
        marker='o', mfc='none', mec='r', ls='')  # draw all points within distance
t = np.linspace(0, 2*np.pi, 512)
circle = dist_min*np.vstack([np.cos(t), np.sin(t)]).T
ax.plot((circle+poi)[:,0], (circle+poi)[:,1], 'k:') # Add a visual check for that distance
plt.show()
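As for doing the whole thing without an explicit loop over the points: the total pair count the question asks for can be read straight off the same distance matrix. A small sketch (names are illustrative):

max_d = 0.1
d = squareform(pdist(points))
# Upper triangle (k=1) counts each unordered pair exactly once and
# excludes the zero self-distances on the diagonal.
pairs = np.argwhere(np.triu(d < max_d, k=1))
print(len(pairs), "pairs within", max_d)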
Let's say I have a path in the 2D plane given by a parametrization, for example the Archimedean spiral:
x(φ) = a*φ*cos(φ), y(φ) = a*φ*sin(φ)
I'm looking for a way to discretize this with a numpy array. The problem is that if I use
import numpy as np
import matplotlib.pyplot as plt

a = 1
phi = np.arange(0, 10*np.pi, 0.1)
x = a*phi*np.cos(phi)
y = a*phi*np.sin(phi)
plt.plot(x, y, "ro")
I get a nice curve, but the points don't have the same distance: for growing φ the distance between two consecutive points gets larger.
I'm looking for a nice and, if possible, fast way to fix this.
It might be possible to derive an exact analytical formula for your simple spiral, but I am not in the mood to do that, and it might not be possible in the more general case. Instead, here is a numerical solution:
import matplotlib.pyplot as plt
import numpy as np
a = 1
phi = np.arange(0, 10*np.pi, 0.1)
x = a*phi*np.cos(phi)
y = a*phi*np.sin(phi)
dr = (np.diff(x)**2 + np.diff(y)**2)**.5 # segment lengths
r = np.zeros_like(x)
r[1:] = np.cumsum(dr) # integrate path
r_int = np.linspace(0, r.max(), 200) # regular spaced path
x_int = np.interp(r_int, r, x) # interpolate
y_int = np.interp(r_int, r, y)
plt.subplot(1,2,1)
plt.plot(x, y, 'o-')
plt.title('Original')
plt.axis([-32,32,-32,32])
plt.subplot(1,2,2)
plt.plot(x_int, y_int, 'o-')
plt.title('Interpolated')
plt.axis([-32,32,-32,32])
plt.show()
It calculates the length of all the individual segments, integrates the total path with cumsum, and finally interpolates to get a regularly spaced path. You might have to play with the step size in phi: if it is too large, you will see that the spiral is not a smooth curve but is instead built from straight line segments. Result:
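As an aside on the analytical route mentioned at the start: for this particular spiral the arc length does have a closed form, s(φ) = (a/2)(φ√(1+φ²) + asinh(φ)), which can be inverted numerically to place exactly equidistant points. A sketch of that idea (assuming a = 1 as above):

import numpy as np
from scipy.optimize import brentq

a = 1
phi_max = 10*np.pi

def arclength(phi):
    # Closed-form arc length of r = a*phi from 0 to phi.
    return 0.5*a*(phi*np.sqrt(1 + phi**2) + np.arcsinh(phi))

total = arclength(phi_max)
targets = np.linspace(0, total, 200)
# Invert arclength(phi) = target with a root finder for each target.
phis = np.array([brentq(lambda p, t=t: arclength(p) - t, 0, phi_max)
                 for t in targets])
x_eq = a*phis*np.cos(phis)
y_eq = a*phis*np.sin(phis)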