Finding the distance of points to an axis - python

I have an array of points in 3D Cartesian space:
P = np.random.random((10, 3))
Now I'd like to find their distances to a given axis, and their positions along that axis:
Ax_support = np.array([3, 2, 1])
Ax_direction = np.array([1, 2, 3])
I've found a solution that first finds the vector from each point that is perpendicular to the direction vector... I feel however, that it is really complicated and that for such a standard problem there would be a numpy or scipy routine already out there (as there is to find the distance between points in scipy.spatial.distance)

I would be surprised to see such an operation among the standard operations of numpy/scipy. What you are looking for is the projection distance onto your line. Start by subtracting Ax_support:
P_centered = P - Ax_support
The points on the line through 0 with direction Ax_direction that have the shortest distance in the L2 sense to each element of P_centered are given by
P_projected = P_centered.dot(np.linalg.pinv(
    Ax_direction[np.newaxis, :])).dot(Ax_direction[np.newaxis, :])
Thus the formula you are looking for is
distances = np.sqrt(((P_centered - P_projected) ** 2).sum(axis=1))
Yes, this is exactly what you propose, in a vectorized way of doing things, so it should be pretty fast for reasonably many data points.
Note: If anybody knows a built-in function for this, I'd be very interested!
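For a single direction vector, the pseudo-inverse projection reduces to a plain dot product with the unit direction, so the same distances can be obtained without pinv. The following is only a minimal sketch of that idea; the function name distance_to_axis and its return convention (distance, signed position along the axis in length units) are my own choices, not part of NumPy or SciPy.

```python
import numpy as np

def distance_to_axis(points, support, direction):
    """Return (perpendicular distance, signed position along the axis) per point.

    The name and return convention are illustrative assumptions.
    """
    points = np.asarray(points, dtype=float)
    support = np.asarray(support, dtype=float)
    unit = np.asarray(direction, dtype=float)
    unit = unit / np.linalg.norm(unit)           # normalise the axis direction
    centered = points - support                  # move the support point to the origin
    along = centered @ unit                      # scalar projection onto the axis
    perpendicular = centered - np.outer(along, unit)
    return np.linalg.norm(perpendicular, axis=1), along

# Two points measured against the x axis through the origin
d, s = distance_to_axis([[0, 1, 0], [2, 0, 3]], [0, 0, 0], [1, 0, 0])
print(d, s)
```

Note that the second return value is measured in length units from the support point, not as a multiple of the direction vector.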

It's been bothering me that something like this does not exist, so I added haggis.math.segment_distance to the library of utilities I maintain, haggis.
One catch is that this function expects a line defined by two points rather than a support and direction. You can supply as many points and lines as you want, as long as the dimensions broadcast. The usual formats are many points projected on one line (as you have), or one point projected on many lines.
distances = haggis.math.segment_distance(
P, Ax_support, Ax_support + Ax_direction,
axis=1, segment=False)
Here is a reproducible example:
np.random.seed(0xBEEF)
P = np.random.random((10,3))
Ax_support = np.array([3, 2, 1])
Ax_direction = np.array([1, 2, 3])
d, t = haggis.math.segment_distance(
P, Ax_support, Ax_support + Ax_direction,
axis=1, return_t=True, segment=False)
return_t returns the location of the normal points along the line as a ratio of the line length from the support (i.e., Ax_support + t * Ax_direction is the location of the projected points).
>>> d
array([2.08730838, 2.73314321, 2.1075711 , 2.5672012 , 1.96132443,
2.53325436, 2.15278454, 2.77763701, 2.50545181, 2.75187883])
>>> t
array([-0.47585462, -0.60843258, -0.46755277, -0.4273361 , -0.53393468,
-0.58564737, -0.38732655, -0.53212317, -0.54956548, -0.41748691])
This allows you to make the following plot:
fig, ax = plt.subplots(subplot_kw={'projection': '3d'})
ax.plot(*Ax_support, 'ko')
ax.plot(*Ax_support + Ax_direction, 'ko')
start = min(t.min(), 0)
end = max(t.max(), 1)
margin = 0.1 * (end - start)
start -= margin
end += margin
ax.plot(*Ax_support[:, None] + [start, end] * Ax_direction[:, None], 'k--')
for s, te in zip(P, t):
    pt = Ax_support + te * Ax_direction
    ax.plot(*np.stack((s, pt), axis=-1), 'k-')
    ax.plot(*pt, 'ko', markerfacecolor='w')
ax.plot(*P.T, 'ko', markerfacecolor='r')
plt.show()

Related

How to find intersection of a line with a mesh?

I have trajectory data, where each trajectory consists of a sequence of coordinates(x, y points) and each trajectory is identified by a unique ID.
These trajectories are in x - y plane, and I want to divide the whole plane into equal sized grid (square grid). This grid is obviously invisible but is used to divide trajectories into sub-segments. Whenever a trajectory intersects with a grid line, it is segmented there and becomes a new sub-trajectory with new_id.
I have included a simple handmade graph to make clear what I am expecting.
It can be seen how the trajectory is divided at the intersections of the grid lines, and each of these segments has new unique id.
I am working in Python and am looking for Python implementation links, suggestions, algorithms, or even pseudocode for this.
Please let me know if anything is unclear.
UPDATE
In order to divide the plane into grid , cell indexing is done as following:
#finding cell id for each coordinate
#cellid = (coord / cellSize).astype(int)
cellid = (coord / 0.5).astype(int)
cellid
Out[] : array([[1, 1],
[3, 1],
[4, 2],
[4, 4],
[5, 5],
[6, 5]])
#Getting x-cell id and y-cell id separately
x_cellid = cellid[:,0]
y_cellid = cellid[:,1]
#finding total number of cells
xmax = df.xcoord.max()
xmin = df.xcoord.min()
ymax = df.ycoord.max()
ymin = df.ycoord.min()
no_of_xcells = math.floor((xmax-xmin)/ 0.5)
no_of_ycells = math.floor((ymax-ymin)/ 0.5)
total_cells = no_of_xcells * no_of_ycells
total_cells
Out[] : 25
The plane is now divided into 25 cells, each with a cellid. To find intersections, maybe I could check the next coordinate in the trajectory: if the cellid remains the same, then that segment of the trajectory is in the same cell and has no intersection with the grid. Say, if x_cellid[2] is greater than x_cellid[0], then the segment intersects vertical grid lines. However, I am still unsure how to find the intersections with the grid lines and how to segment the trajectory at the intersections, giving each segment a new id.
This can be solved by shapely:
%matplotlib inline
import pylab as pl
from shapely.geometry import MultiLineString, LineString
import numpy as np
from matplotlib.collections import LineCollection
x0, y0, x1, y1 = -10, -10, 10, 10
n = 11
lines = []
for x in np.linspace(x0, x1, n):
    lines.append(((x, y0), (x, y1)))
for y in np.linspace(y0, y1, n):
    lines.append(((x0, y), (x1, y)))
grid = MultiLineString(lines)
x = np.linspace(-9, 9, 200)
y = np.sin(x)*x
line = LineString(np.c_[x, y])
fig, ax = pl.subplots()
# .geoms is required on Shapely 2.x, where multi-part geometries are not directly iterable
for i, segment in enumerate(line.difference(grid).geoms):
    x, y = segment.xy
    pl.plot(x, y)
    pl.text(np.mean(x), np.mean(y), str(i))
lc = LineCollection(lines, color="gray", lw=1, alpha=0.5)
ax.add_collection(lc);
The result:
To not use shapely, and do it yourself:
import pylab as pl
import numpy as np
from matplotlib.collections import LineCollection
x0, y0, x1, y1 = -10, -10, 10, 10
n = 11
xgrid = np.linspace(x0, x1, n)
ygrid = np.linspace(y0, y1, n)
x = np.linspace(-9, 9, 200)
y = np.sin(x)*x
t = np.arange(len(x))
idx_grid, idx_t = np.where((xgrid[:, None] - x[None, :-1]) * (xgrid[:, None] - x[None, 1:]) <= 0)
tx = idx_t + (xgrid[idx_grid] - x[idx_t]) / (x[idx_t+1] - x[idx_t])
idx_grid, idx_t = np.where((ygrid[:, None] - y[None, :-1]) * (ygrid[:, None] - y[None, 1:]) <= 0)
ty = idx_t + (ygrid[idx_grid] - y[idx_t]) / (y[idx_t+1] - y[idx_t])
t2 = np.sort(np.r_[t, tx, tx, ty, ty])
x2 = np.interp(t2, t, x)
y2 = np.interp(t2, t, y)
loc = np.where(np.diff(t2) == 0)[0] + 1
xlist = np.split(x2, loc)
ylist = np.split(y2, loc)
fig, ax = pl.subplots()
for i, (xp, yp) in enumerate(zip(xlist, ylist)):
    pl.plot(xp, yp)
    pl.text(np.mean(xp), np.mean(yp), str(i))
lines = []
for x in np.linspace(x0, x1, n):
    lines.append(((x, y0), (x, y1)))
for y in np.linspace(y0, y1, n):
    lines.append(((x0, y), (x1, y)))
lc = LineCollection(lines, color="gray", lw=1, alpha=0.5)
ax.add_collection(lc);
You're asking a lot. You should attack most of the design and coding yourself, once you have a general approach. Algorithm identification is reasonable for Stack Overflow; asking for design and reference links is not.
I suggest that you put the point coordinates into a list. Use the NumPy and SciKit capabilities to interpolate the grid intersections. You can store segments in a list (of whatever defines a segment in your data design). Consider making a dictionary that allows you to retrieve the segments by grid coordinates. For instance, if segments are denoted only by their endpoints, and points are a class of yours, you might have something like this, using the lower-left corner of each square as its defining point:
grid_seg = {
(0.5, 0.5): [p0, p1],
(1.0, 0.5): [p1, p2],
(1.0, 1.0): [p2, p3],
...
}
where p0, p1, etc. are the interpolated crossing points.
Each trajectory is composed of a series of straight line segments. You therefore need a routine to break each line segment into sections that lie completely within a grid cell. The basis for such a routine would be the Digital Differential Analyzer (DDA) algorithm, though you'll need to modify the basic algorithm since you need endpoints of the line within each cell, not just which cells are visited.
A couple of things you have to be careful of:
1) If you're working with floating point numbers, beware of rounding errors in the calculation of the step values, as these can cause the algorithm to fail. For this reason many people choose to convert to an integer grid, obviously with a loss of precision. This is a good discussion of the issues, with some working code (though not python).
2) You'll need to decide which of the 4 grid lines surrounding a cell belong to the cell. One convention would be to use the bottom and left edges. You can see the issue if you consider a horizontal line segment that falls on a grid line: do its segments belong to the cell above or the cell below?
Cheers
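To make the "break each line segment into sections that lie within a grid cell" step concrete, here is a minimal sketch. It is not a full DDA traversal: it simply computes, for one straight segment, the parameter values at which the segment crosses a vertical or horizontal grid line and returns the original endpoints plus those crossing points in order. The function name split_segment_at_grid and the assumption that grid lines sit at integer multiples of cell_size are mine.

```python
import numpy as np

def split_segment_at_grid(p0, p1, cell_size):
    """Return the points p0, <grid crossings...>, p1 along the segment, in order.

    Assumes grid lines at k * cell_size for integer k (illustrative choice).
    """
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    d = p1 - p0
    ts = [0.0, 1.0]                               # always keep both endpoints
    for axis in range(2):                         # 0: vertical lines, 1: horizontal lines
        if d[axis] == 0:
            continue                              # segment runs parallel to these grid lines
        lo, hi = sorted((p0[axis], p1[axis]))
        k_first = np.floor(lo / cell_size) + 1    # first grid line strictly above lo
        k_last = np.ceil(hi / cell_size) - 1      # last grid line strictly below hi
        for k in np.arange(k_first, k_last + 1):
            ts.append((k * cell_size - p0[axis]) / d[axis])
    ts = np.unique(np.clip(ts, 0.0, 1.0))         # sort and drop duplicate crossings
    return p0 + ts[:, None] * d

print(split_segment_at_grid((0.25, 0.25), (1.25, 0.25), 0.5))
```

Consecutive pairs of the returned points are the sub-segments, each lying inside one cell. Which cell owns a point that falls exactly on a grid line is still up to the convention mentioned in 2).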
data = list of list of coordinates
for point_id, point_coord in enumerate(point_coord_list):
    if current point & last point stayed in same cell:
        append point's index to last list of data
    else:
        append a new empty list to data
        interpolate the two points and add a new point
        that is on the grid lines.
Data stores all trajectories. Each list within the data is a trajectory.
The cell indices along the x and y axes (x_cell_id, y_cell_id) can be found by dividing the coordinates of the point by the dimension of the cell, then truncating to an integer. If the cell indices of the current point are the same as those of the last point, then the two points are in the same cell. A list is good for inserting new points, but it is not as memory efficient as an array.
It might be a good idea to create a class for a trajectory. Alternatively, if the list of lists wastes too much memory, use a memory buffer with a sparse data structure for the trajectories and an array for the x-y coordinates.
Inserting new points into array is slow, so we can use another array for new points.
Warning: I haven't thought too much about the things below. It probably has bugs, and someone needs to fill in the gaps.
# coord      n x 2 numpy array.
#            columns 0, 1 are x and y coordinate.
#            row n is for point n
# cell_size  length of one side of the square cell.
# n_x_cells  number of cells along the x axis
import numpy as np
cell_id_2d = (coord / cell_size).astype(int)
x_cell_id = cell_id_2d[:,0]
y_cell_id = cell_id_2d[:,1]
cell_id_1d = x_cell_id + y_cell_id*n_x_cells
# if the trajectory exits a cell, its cell id changes
# and the delta_cell_id is not zero.
delta_cell_id = cell_id_1d[1:] - cell_id_1d[:-1]
# w[k] is the index of the last point before the kth crossing, i.e. the
# trajectory leaves a cell between coord[w[k]] and coord[w[k] + 1].
w = np.where(delta_cell_id != 0)[0]
# The nth sub-trajectory should contain the points from
# the (crossing_ids[n])th to the (crossing_ids[n + 1] - 1)th.
# Integer dtype, because these values are used as indices.
crossing_ids = np.empty(w.size + 1, dtype=int)
crossing_ids[1:] = w + 1
crossing_ids[0] = 0
# need to interpolate when the trajectory crosses a cell boundary.
# probably can replace this loop with numpy functions/indexing
new_points = np.empty((w.size, 2))
for i in range(w.size):
    st = coord[w[i]]      # last point inside the old cell
    en = coord[w[i] + 1]  # first point inside the new cell
    # 1. check which boundary of the cell is crossed
    # 2. interpolate
    # 3. put points into new_points
# Each trajectory contains some points from coord array and 2 points
# in the new_points array.
For retrieval, make a sparse array that contains the index of the starting point in the coord array.
Linear interpolation can look bad if the cell size is large.
Further explanation:
Description of the grid
For n_x_cells = 5, n_y_cells = 3, the grid is:
   0   1   2   3   4
0 [  ][  ][  ][  ][  ]
1 [  ][  ][  ][* ][  ]
2 [  ][  ][  ][  ][  ]
[* ] has an x_index of 3 and a y_index of 1.
There are (n_x_cells * n_y_cells) cells in the grid.
Relationship between point and cell
The cell that contains the ith point of the trajectory has an x_index of x_cell_id[i] and a y_index of y_cell_id[i]. I get these by discretization: dividing the xy-coordinates of the points by the length of the cell and then truncating to integers.
The cell_id_1d of the cells are the number in [ ]
   0   1   2   3   4
0 [0 ][1 ][2 ][3 ][4 ]
1 [5 ][6 ][7 ][8 ][9 ]
2 [10][11][12][13][14]
cell_id_1d[i] = x_cell_id[i] + y_cell_id[i]*n_x_cells
I converted the pair of cell indices (x_cell_id[i], y_cell_id[i]) for the ith point to a single index called cell_id_1d.
How to find whether the trajectory exits a cell at the ith point
Now, the ith and (i + 1)th points are in the same cell if and only if (x_cell_id[i], y_cell_id[i]) == (x_cell_id[i + 1], y_cell_id[i + 1]), which is equivalent to cell_id_1d[i] == cell_id_1d[i + 1], i.e. cell_id_1d[i + 1] - cell_id_1d[i] == 0. So delta_cell_id[i] = cell_id_1d[i + 1] - cell_id_1d[i] is zero if and only if the ith and (i + 1)th points are in the same cell.
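The cell-id test described above fits in a few lines. This is only a sketch with made-up coordinates; the helper name cell_change_indices is hypothetical. It uses np.floor rather than astype(int) alone, since the two differ for negative coordinates (astype truncates toward zero).

```python
import numpy as np

def cell_change_indices(coord, cell_size, n_x_cells):
    """Indices i where point i and point i + 1 lie in different grid cells."""
    coord = np.asarray(coord, dtype=float)
    cell = np.floor(coord / cell_size).astype(int)     # (x_cell_id, y_cell_id) per point
    cell_id_1d = cell[:, 0] + cell[:, 1] * n_x_cells   # single index per cell
    delta_cell_id = np.diff(cell_id_1d)                # non-zero where the cell changes
    return np.flatnonzero(delta_cell_id)

# Made-up trajectory on a grid with 0.5-wide cells and 5 cells per row
coord = [[0.1, 0.1], [0.2, 0.3], [0.6, 0.3], [0.7, 0.4], [0.7, 0.9]]
print(cell_change_indices(coord, 0.5, 5))
```

Here the trajectory changes cell after point 1 and again after point 3, so those are the segments that need an interpolated crossing point.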

Rotating 1D numpy array of radial intensities into 2D array of spatial intensities

I have a numpy array filled with intensity readings at different radii in a uniform circle (for context, this is a 1D radiative transfer project for protostellar formation models: while much better models exist, my supervisor wants me to have the experience of producing one so I understand how others work).
I want to take that 1d array, and "rotate" it through a circle, forming a 2D array of intensities that could then be shown with imshow (or, with a bit of work, aplpy). The final array needs to be 2d, and the projection needs to be Cartesian, not polar.
I can do it with nested for loops, and I can do it with lookup tables, but I have a feeling there must be a neat way of doing it in numpy or something.
Any ideas?
EDIT:
I have had to go back and recreate my (frankly horrible) mess of for loops and if statements that I had before. If I really tried, I could probably get rid of one of the loops and one of the if statements by condensing things down. However, the aim is not to make it work with for loops, but see if there is a built in way to rotate the array.
impB is an array that differs slightly from what I stated it was before. It's actually just a list of radii where particles are detected. I then bin those into radius bins to get the intensity (or frequency if you prefer) in each radius. R is the scale factor for my radius, as I run the model in a dimensionless way. iRes is a resolution scale factor, essentially how often I want to sample my radial bins. Everything else should be clear.
radJ = np.ndarray(shape=(2*iRes, 2*iRes)) # Create array of 2xRadius square
for i in range(iRes):
    n = len(impB[np.where(impB[:] < ((i+1.) * (R / iRes)))]) # Count number of things within this radius +1
    m = len(impB[np.where(impB[:] <= ((i) * (R / iRes)))]) # Count number of things in this radius
    a = (((i + 1) * (R / iRes))**2 - ((i) * (R / iRes))**2) * math.pi # A normalisation factor based on area.....dont ask
    for x in range(iRes):
        for y in range(iRes):
            if (x**2 + y**2) < (i * iRes)**2:
                if (x**2 + y**2) >= (i * iRes)**2: # Checks for radius, and puts in cartesian space
                    radJ[x+iRes,y+iRes] = (n-m) / a # Put in actual intensity bins
                    radJ[x+iRes,-y+iRes] = (n-m) / a
                    radJ[-x+iRes,y+iRes] = (n-m) / a
                    radJ[-x+iRes,-y+iRes] = (n-m) / a
Nested loops are a simple approach for that. With ri_data_r and y containing your radius values (difference to the middle pixel) and the array for rotation, respectively, I would suggest:
from scipy import interpolate
import numpy as np
y = np.random.rand(100)
ri_data_r = np.linspace(-len(y)/2,len(y)/2,len(y))
interpol_index = interpolate.interp1d(ri_data_r, y)
xv = np.arange(-1, 1, 0.01) # adjust your matrix values here
X, Y = np.meshgrid(xv, xv)
profilegrid = np.ones(X.shape, float)
for i, x in enumerate(X[0, :]):
    for k, y in enumerate(Y[:, 0]):
        current_radius = np.sqrt(x ** 2 + y ** 2)
        profilegrid[i, k] = interpol_index(current_radius)
print(profilegrid)
This will give you exactly what you are looking for. You just have to take in your array and calculate a symmetric array ri_data_r that has the same length as your data array and contains the distance between the actual data and the middle of the array. The code is doing this automatically.
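The two Python loops can also be avoided, because np.interp accepts an array of query positions of any shape. The sketch below is one possible vectorized variant, not the code above: it assumes the profile is given at increasing radii starting from the centre, and the function name radial_profile_to_image and the choice to fill pixels beyond the last radius with 0 are my own.

```python
import numpy as np

def radial_profile_to_image(profile, radii, half_size):
    """Rotate a 1D radial profile into a (2*half_size + 1)-square Cartesian image.

    Assumes `radii` is increasing; pixels beyond radii[-1] are set to 0.
    """
    coords = np.arange(-half_size, half_size + 1)
    X, Y = np.meshgrid(coords, coords)
    R = np.hypot(X, Y)                            # distance of each pixel from the centre
    return np.interp(R, radii, profile, right=0.0)

# Linearly decreasing intensity: 3 at the centre, 0 at radius 3
img = radial_profile_to_image([3.0, 2.0, 1.0, 0.0], [0.0, 1.0, 2.0, 3.0], 3)
print(img.shape)
```

The result can be passed straight to imshow. Linear interpolation is used between the given radii, so a coarse profile will look faceted.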
I stumbled upon this question in a different context and I hope I understood it right. Here are two other ways of doing this. The first uses skimage.transform.warp with interpolation of the desired order (here we use order=0, nearest-neighbor). This method is slower but more precise and needs less memory than the second method.
The second one does not use interpolation, therefore is faster but also less precise and needs way more memory because it stores each 2D array containing one tilt until the end, where they are averaged with np.nanmean().
The difference between both solutions stemmed from the problem of handling the center of the final image where the tilts overlap the most, i.e. the first one would just add values with each tilt ending up out of the original range. This was "solved" by clipping the matrix in each step to a global_min and global_max (consult the code). The second one solves it by taking the mean of the tilts where they overlap, which forces us to use the np.nan.
Please, read the Example of usage and Sanity check sections in order to understand the plot titles.
Solution 1:
import numpy as np
from skimage.transform import warp
def rotate_vector(vector, deg_angle):
    # Credit goes to skimage.transform.radon
    assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
    center = vector.size // 2
    square = np.zeros((vector.size, vector.size))
    square[center,:] = vector
    rad_angle = np.deg2rad(deg_angle)
    cos_a, sin_a = np.cos(rad_angle), np.sin(rad_angle)
    R = np.array([[cos_a, sin_a, -center * (cos_a + sin_a - 1)],
                  [-sin_a, cos_a, -center * (cos_a - sin_a - 1)],
                  [0, 0, 1]])
    # Approx. 80% of time is spent in this function
    return warp(square, R, clip=False, output_shape=((vector.size, vector.size)))
def place_vectors(vectors, deg_angles):
    matrix = np.zeros((vectors.shape[-1], vectors.shape[-1]))
    global_min, global_max = 0, 0
    for i, deg_angle in enumerate(deg_angles):
        tilt = rotate_vector(vectors[i], deg_angle)
        global_min = tilt.min() if global_min > tilt.min() else global_min
        global_max = tilt.max() if global_max < tilt.max() else global_max
        matrix += tilt
        matrix = np.clip(matrix, global_min, global_max)
    return matrix
Solution 2:
Credit for the idea goes to my colleague Michael Scherbela.
import numpy as np
def rotate_vector(vector, deg_angle):
    assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
    square = np.ones([vector.size, vector.size]) * np.nan
    radius = vector.size // 2
    r_values = np.linspace(-radius, radius, vector.size)
    rad_angle = np.deg2rad(deg_angle)
    # np.int was removed in NumPy 1.24; the built-in int is equivalent here
    ind_x = np.round(np.cos(rad_angle) * r_values + vector.size/2).astype(int)
    ind_y = np.round(np.sin(rad_angle) * r_values + vector.size/2).astype(int)
    ind_x = np.clip(ind_x, 0, vector.size-1)
    ind_y = np.clip(ind_y, 0, vector.size-1)
    square[ind_y, ind_x] = vector
    return square
def place_vectors(vectors, deg_angles):
    matrices = []
    for deg_angle, vector in zip(deg_angles, vectors):
        matrices.append(rotate_vector(vector, deg_angle))
    matrix = np.nanmean(np.array(matrices), axis=0)
    return np.nan_to_num(matrix, copy=False, nan=0.0)
Example of usage:
r = 100 # Radius of the circle, i.e. half the length of the vector
n = int(np.pi * r / 8) # Number of vectors, e.g. number of tilts in tomography
v = np.ones(2*r) # One vector, e.g. one tilt in tomography
V = np.array([v]*n) # All vectors, e.g. a sinogram in tomography
# Rotate 1D vector to a specific angle (output is 2D)
angle = 45
rotated = rotate_vector(v, angle)
# Rotate each row of a 2D array according to its angle (output is 2D)
angles = np.linspace(-90, 90, num=n, endpoint=False)
inplace = place_vectors(V, angles)
Sanity check:
These are just simple checks which by no means cover all possible edge cases. Depending on your use case you might want to extend the checks and adjust the method.
# I. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then sum(inplace) should be approx. equal to (n * (2πr - n)) / π
# which is an area that should be covered by the tilts
desired_area = (n * (2 * np.pi * r - n)) / np.pi
covered_area = np.sum(inplace)
covered_frac = covered_area / desired_area
print(f'This method covered {covered_frac * 100:.2f}% '
'of the area which should be covered in total.')
# II. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then a circle M with radius m <= r should be the largest circle which
# is fully covered by the vectors. I.e. its mean should be no less than 1.
# If n = πr then m = r.
# m = n / π
m = int(n / np.pi)
# Code for circular mask not included
mask = create_circular_mask(2*r, 2*r, center=None, radius=m)
m_area = np.mean(inplace[mask])
print(f'Full radius r={r}, radius m={m}, mean(M)={m_area:.4f}.')
Code for plotting:
import matplotlib.pyplot as plt
plt.figure(figsize=(16, 8))
plt.subplot(121)
rotated = np.nan_to_num(rotated) # not necessary in case of the first method
plt.title(
f'Output of rotate_vector(), angle={angle}°\n'
f'Sum is {np.sum(rotated):.2f} and should be {np.sum(v):.2f}')
plt.imshow(rotated, cmap=plt.cm.Greys_r)
plt.subplot(122)
plt.title(
f'Output of place_vectors(), r={r}, n={n}\n'
f'Covered {covered_frac * 100:.2f}% of the area which should be covered.\n'
f'Mean of the circle M is {m_area:.4f} and should be 1.0.')
plt.imshow(inplace)
circle=plt.Circle((r, r), m, color='r', fill=False)
plt.gcf().gca().add_artist(circle)
plt.gcf().gca().legend([circle], [f'Circle M (m={m})'])

Find point along line a specified distance from a polygon

Given a 2-D closed polygon defined by a series of points and an infinite line, I would like to find points on that line a specified distance from the polygon. The polygon is known to be closed, not intersecting, and not containing 3 consecutive collinear points. In general there are many possible points along the line. Ideally I would like to find them all, or alternatively the one nearest some initial guess location. I am using python but a solution in any language would be helpful. I believe scipy.spatial kdtree will be one important component, but I cannot see how to do the whole solution. Here is some code to define the problem, which shows at least some of the corner cases involved:
import numpy as np
import matplotlib.pyplot as plt
poly = np.array([[0, 0],
[10, 0],
[10, 3],
[1, 1],
[1, 6],
[0, 6],
[.8, 4],
[0, 0]])
line = np.array([[-2, 4.5],
[12, 3]])
plt.plot(poly[:, 0], poly[:, 1])
plt.plot(line[:, 0], line[:, 1])
plt.xlim([-1, 11])
plt.ylim([-1, 7])
plt.show()
points = find_points_distance_from_polygon(poly, line, distance)
Edit: I am looking for the algorithm to find the points.
Update:
What I have tried so far is an approximate solution using the distance to each point. My thought was that if I refined the polygon by adding additional points along each line, then this approach might be accurate enough. However I would have to add a lot of points if the distance was small. I thought there is probably a better way.
import scipy.spatial as spatial
import scipy.optimize as opt
import math
def find_point_distance_from_polygon_along_line(tree, line, dist, guess_ratio):
    def f(x):
        pt = line[0, :] + x * (line[1, :] - line[0, :])
        d, i = tree.query(pt)
        return math.fabs(d - dist)
    res = opt.minimize(f, [guess_ratio])
    return line[0, :] + res.x * (line[1, :] - line[0, :])
tree = spatial.cKDTree(poly)
pt = find_point_distance_from_polygon_along_line(tree, line, 1, 0)
For the example in the plot and a distance of 0.5, I expect to find 4 points at approximately (.1, 4.2), (1.5, 4.1), (9.1, 3.3), and (10.5, 3.1). My current plan would find more points, particularly points which are some distance from the opposite edge of the polygon. I want the line connecting the point on the line to the polygon to be external to the polygon.
If the number of polygon edges is reasonable, you can use a simple linear algorithm.
Let the parametric equation of the line be
L(u) = L0 + u * dL
where L0 is some base point, dL is the direction vector, and u is the parameter,
and let the parametric equation of the i-th segment be
P = P[i] + t * Dir[i]
where P[i] is the first point of the segment, Dir[i] is the normalized direction vector, and t is a parameter in the range 0..Len[i] (Len[i] is the segment length, because Dir[i] is normalized).
An arbitrary point on the line has its projection on the given segment at parameter
t = DotProduct(L(u) - P[i], Dir[i]) //equation 1
and the length of the normal to the projection (the needed distance) is
Dist = Abs(CrossProduct(L(u) - P[i], Dir[i]))
Abs((L0x + u * dLx - Px) * Diry - (L0y + u * dLy - Py) * Dirx) = Dist
so
u = (+-Dist - ((L0x - Px) * Diry - (L0y - Py) * Dirx)) / (dLx * Diry - dLy * Dirx)
Substitute both values of u into equation 1 and check whether the parameter t is in the range 0..Len[i] (projection inside the segment). If yes, L(u) is a needed point.
Then check the distance to the vertices: for each vertex (Px, Py), solve the following quadratic in u
(L0x + u * dLx - Px)^2 + (L0y + u * dLy - Py)^2 = Dist^2
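The edge and vertex equations above translate fairly directly into NumPy. The sketch below is one way to do it under my own assumptions: the function names are made up, the polygon is passed closed (first point repeated at the end), and instead of testing t per edge it filters every candidate by its true distance to the whole polygon, which also discards points that are closer to some other edge. It does not implement the "external to the polygon" requirement from the question.

```python
import numpy as np

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the finite segment a-b."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def points_at_distance(poly, L0, dL, dist, tol=1e-9):
    """Parameters u for which L0 + u * dL lies exactly `dist` from the polygon boundary."""
    poly = np.asarray(poly, dtype=float)
    L0 = np.asarray(L0, dtype=float)
    dL = np.asarray(dL, dtype=float)
    edges = list(zip(poly[:-1], poly[1:]))
    candidates = []
    for a, b in edges:
        direction = (b - a) / np.linalg.norm(b - a)
        # Edge case: |cross(L(u) - a, direction)| = dist is linear in u.
        denom = dL[0] * direction[1] - dL[1] * direction[0]
        if abs(denom) > tol:                      # skip edges parallel to the line
            base = (L0[0] - a[0]) * direction[1] - (L0[1] - a[1]) * direction[0]
            candidates += [(s * dist - base) / denom for s in (1.0, -1.0)]
        # Vertex case: |L(u) - a|^2 = dist^2 is quadratic in u.
        w = L0 - a
        qa, qb, qc = dL @ dL, 2.0 * (w @ dL), w @ w - dist ** 2
        disc = qb * qb - 4.0 * qa * qc
        if disc >= 0.0:
            root = np.sqrt(disc)
            candidates += [(-qb + root) / (2.0 * qa), (-qb - root) / (2.0 * qa)]
    # Keep only candidates whose distance to the nearest edge really is `dist`.
    kept = []
    for u in candidates:
        p = L0 + u * dL
        nearest = min(point_segment_distance(p, a, b) for a, b in edges)
        if abs(nearest - dist) < 1e-7:
            kept.append(u)
    return np.unique(np.round(kept, 9))

square = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
print(points_at_distance(square, [-2.0, 0.25], [1.0, 0.0], 0.5))
```

The cost is quadratic in the number of edges because of the final filter, which is fine for small polygons. A line that runs exactly parallel to an edge at the requested distance has infinitely many solutions, and this sketch skips that case.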

Find the area between two curves plotted in matplotlib (fill_between area)

I have a list of x and y values for two curves, both having weird shapes, and I don't have a function for any of them. I need to do two things:
Plot it and shade the area between the curves like the image below.
Find the total area of this shaded region between the curves.
I'm able to plot and shade the area between those curves with fill_between and fill_betweenx in matplotlib, but I have no idea how to calculate the exact area between them, especially because I don't have a function for any of those curves.
Any ideas?
I looked everywhere and can't find a simple solution for this. I'm quite desperate, so any help is much appreciated.
Thank you very much!
EDIT: For future reference (in case anyone runs into the same problem), here is how I've solved this: connected the first and last node/point of each curve together, resulting in a big weird-shaped polygon, then used shapely to calculate the polygon's area automatically, which is the exact area between the curves, no matter which way they go or how nonlinear they are. Works like a charm! :)
Here is my code:
from shapely.geometry import Polygon
x_y_curve1 = [(0.121,0.232),(2.898,4.554),(7.865,9.987)] #these are your points for curve 1 (I just put some random numbers)
x_y_curve2 = [(1.221,1.232),(3.898,5.554),(8.865,7.987)] #these are your points for curve 2 (I just put some random numbers)
polygon_points = [] #creates an empty list where we will append the points to create the polygon
for xyvalue in x_y_curve1:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 1
for xyvalue in x_y_curve2[::-1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 2 in the reverse order (from last point to first point)
for xyvalue in x_y_curve1[0:1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append the first point in curve 1 again, so it "closes" the polygon
polygon = Polygon(polygon_points)
area = polygon.area
print(area)
EDIT 2: Thank you for the answers. Like Kyle explained, this only works for positive values. If your curves go below 0 (which is not my case, as shown in the example chart), then you would have to work with absolute numbers.
The area calculation is straightforward in blocks where the two curves don't intersect: that's the trapezium, as has been pointed out above. If they intersect, then you create two triangles between x[i] and x[i+1], and you should add the areas of the two. If you want to do it directly, you should handle the two cases separately. Here's a basic working example to solve your problem. First, I will start with some fake data:
#!/usr/bin/python
import numpy as np
# let us generate fake test data
x = np.arange(10)
y1 = np.random.rand(10) * 20
y2 = np.random.rand(10) * 20
Now, the main code. Based on your plot, looks like you have y1 and y2 defined at the same X points. Then we define,
z = y1-y2
dx = x[1:] - x[:-1]
cross_test = np.sign(z[:-1] * z[1:])
cross_test will be negative whenever the two graphs cross. At these points, we want to calculate the x coordinate of the crossover. For simplicity, I will calculate x coordinates of the intersection of all segments of y. For places where the two curves don't intersect, they will be useless values, and we won't use them anywhere. This just keeps the code easier to understand.
Suppose you have z1 and z2 at x1 and x2, then we are solving for x0 such that z = 0:
# (z2 - z1)/(x2 - x1) = (z0 - z1) / (x0 - x1) = -z1/(x0 - x1)
# x0 = x1 - (x2 - x1) / (z2 - z1) * z1
x_intersect = x[:-1] - dx / (z[1:] - z[:-1]) * z[:-1]
dx_intersect = - dx / (z[1:] - z[:-1]) * z[:-1]
Where the curves don't intersect, area is simply given by:
areas_pos = abs(z[:-1] + z[1:]) * 0.5 * dx # signs of both z are same
Where they intersect, we add areas of both triangles:
areas_neg = 0.5 * dx_intersect * abs(z[:-1]) + 0.5 * (dx - dx_intersect) * abs(z[1:])
Now, the area in each block x[i] to x[i+1] is to be selected, for which I use np.where:
areas = np.where(cross_test < 0, areas_neg, areas_pos)
total_area = np.sum(areas)
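The steps above can be folded into one function. This is only a sketch, assuming both curves are sampled at the same x values; the name area_between is my own. For a crossing block the two triangle areas simplify to 0.5 * dx * (z0**2 + z1**2) / (|z0| + |z1|), which avoids computing the intersection point explicitly and avoids dividing by zero where the curves touch.

```python
import numpy as np

def area_between(x, y1, y2):
    """Unsigned area between two polylines sampled at the same x values."""
    x, y1, y2 = (np.asarray(v, dtype=float) for v in (x, y1, y2))
    z = y1 - y2
    dx = np.diff(x)
    z0, z1 = z[:-1], z[1:]
    crosses = z0 * z1 < 0                         # sign change inside the block
    trapezoid = 0.5 * np.abs(z0 + z1) * dx        # same sign at both ends
    denom = np.abs(z0) + np.abs(z1)
    safe = np.where(denom > 0, denom, 1.0)        # avoid 0/0 where both ends touch
    triangles = 0.5 * dx * (z0 ** 2 + z1 ** 2) / safe
    return float(np.sum(np.where(crosses, triangles, trapezoid)))

# Two straight lines that cross at x = 1: two triangles of area 1 each
print(area_between([0, 2], [0, 2], [2, 0]))
```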
That is your desired answer. As has been pointed out above, this gets more complicated if the two y graphs are defined at different x points. If you want to test this, you can simply plot it (in my test case, the y range will be -20 to 20):
import matplotlib.pyplot as plt
negatives = np.where(cross_test < 0)
positives = np.where(cross_test >= 0)
plt.plot(x, y1)
plt.plot(x, y2)
plt.plot(x, z)
plt.vlines(x_intersect[negatives], -20, 20)
plt.show()
Define your two curves as functions f and g that are linear by segment, e.g. between x1 and x2, f(x) = f(x1) + ((x-x1)/(x2-x1))*(f(x2)-f(x1)).
Define h(x)=abs(g(x)-f(x)). Then use scipy.integrate.quad to integrate h.
That way you don't need to bother about the intersections. It will do the "trapeze summing" suggested by ch41rmn automatically.
Your set of data is quite "nice" in the sense that the two sets of data share the same set of x-coordinates. You can therefore calculate the area using a series of trapezoids.
e.g. define the two functions as f(x) and g(x), then, between any two consecutive points in x, you have four points of data:
(x1, f(x1))-->(x2, f(x2))
(x1, g(x1))-->(x2, g(x2))
Then, the area of the trapezoid is
A(x1-->x2) = ( f(x1)-g(x1) + f(x2)-g(x2) ) * (x2-x1)/2 (1)
A complication arises that equation (1) only works for simply-connected regions, i.e. there must not be a cross-over within this region:
|\ |\/|
|_| vs |/\|
The area of the two sides of the intersection must be evaluated separately. You will need to go through your data to find all points of intersections, then insert their coordinates into your list of coordinates. The correct order of x must be maintained. Then, you can loop through your list of simply connected regions and obtain a sum of the area of trapezoids.
EDIT:
For curiosity's sake, if the x-coordinates for the two lists are different, you can instead construct triangles. e.g.
.____.
| / \
| / \
| / \
|/ \
._________.
Overlap between triangles must be avoided, so you will again need to find points of intersections and insert them into your ordered list. The lengths of each side of the triangle can be calculated using Pythagoras' formula, and the area of the triangles can be calculated using Heron's formula.
The area_between_two_curves function in pypi library similaritymeasures (released in 2018) might give you what you need. I tried a trivial example on my side, comparing the area between a function and a constant value and got pretty close tie-back to Excel (within 2%). Not sure why it doesn't give me 100% tie-back, maybe I am doing something wrong. Worth considering though.
I had the same problem. The answer below is based on an attempt by the question author. However, shapely will not directly give the area of the polygon in purple. You need to edit the code to break it up into its component polygons and then get the area of each, after which you simply add them up.
Area Between two lines
Consider the lines below:
Sample Two lines
If you run the code below you will get zero for area because it takes the clockwise and subtracts the anti clockwise area:
from shapely.geometry import Polygon
x_y_curve1 = [(1,1),(2,1),(3,3),(4,3)] # these are your points for curve 1
x_y_curve2 = [(1,3),(2,3),(3,1),(4,1)] # these are your points for curve 2
polygon_points = [] # creates an empty list where we will append the points to create the polygon
for xyvalue in x_y_curve1:
    polygon_points.append([xyvalue[0],xyvalue[1]]) # append all xy points for curve 1
for xyvalue in x_y_curve2[::-1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) # append all xy points for curve 2 in the reverse order (from last point to first point)
for xyvalue in x_y_curve1[0:1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) # append the first point in curve 1 again, so it "closes" the polygon
polygon = Polygon(polygon_points)
area = polygon.area
print(area)
The solution is therefore to split the polygon into smaller pieces based on where the lines intersect. Then use a for loop to add these up:
import numpy as np
from shapely.geometry import Polygon, LineString
from shapely.ops import unary_union, polygonize
x_y_curve1 = [(1,1),(2,1),(3,3),(4,3)] # these are your points for curve 1
x_y_curve2 = [(1,3),(2,3),(3,1),(4,1)] # these are your points for curve 2
polygon_points = [] # creates an empty list where we will append the points to create the polygon
for xyvalue in x_y_curve1:
    polygon_points.append([xyvalue[0],xyvalue[1]]) # append all xy points for curve 1
for xyvalue in x_y_curve2[::-1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) # append all xy points for curve 2 in the reverse order (from last point to first point)
for xyvalue in x_y_curve1[0:1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) # append the first point in curve 1 again, so it "closes" the polygon
polygon = Polygon(polygon_points)
area = polygon.area
x,y = polygon.exterior.xy
# original data
ls = LineString(np.c_[x, y])
# closed, non-simple
lr = LineString(ls.coords[:] + ls.coords[0:1])
lr.is_simple # False
# unary_union nodes the ring at its self-intersections
mls = unary_union(lr)
mls.geom_type # 'MultiLineString'
Area_cal = []
# polygonize rebuilds one simple polygon per enclosed region
for polygon in polygonize(mls):
    Area_cal.append(polygon.area)
Area_poly = (np.asarray(Area_cal).sum())
print(Area_poly)
A straightforward application of the area of a general polygon (see Shoelace formula) makes for a super-simple and fast, vectorized calculation:
def area(p):
    # for p: 2D vertices of a polygon:
    # area = 1/2 abs(sum(p0 ^ p1 + p1 ^ p2 + ... + pn-1 ^ p0))
    # where ^ is the cross product
    return np.abs(np.cross(p, np.roll(p, 1, axis=0)).sum()) / 2
Application to area between two curves. In this example, we don't even have matching x coordinates!
np.random.seed(0)
n0 = 10
n1 = 15
xy0 = np.c_[np.linspace(0, 10, n0), np.random.uniform(0, 10, n0)]
xy1 = np.c_[np.linspace(0, 10, n1), np.random.uniform(0, 10, n1)]
p = np.r_[xy0, xy1[::-1]]
>>> area(p)
4.9786...
Plot:
plt.plot(*xy0.T, 'b-')
plt.plot(*xy1.T, 'r-')
p = np.r_[xy0, xy1[::-1]]
plt.fill(*p.T, alpha=.2)
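One thing worth checking before relying on this: the Shoelace formula sums signed areas, so when the two curves cross, the lobes on opposite sides of a crossing partly cancel. The sketch below is an illustration under my own naming (shoelace_area is hypothetical); it writes the 2D cross product out explicitly, since np.cross on 2-element vectors is deprecated as of NumPy 2.0, and applies it to a unit square and to the crossing sample curves from the shapely answer above.

```python
import numpy as np


def shoelace_area(p):
    """Absolute value of the signed (net) area of a polygon with 2D vertices p."""
    p = np.asarray(p, dtype=float)
    x, y = p[:, 0], p[:, 1]
    # sum over edges of x_i * y_(i+1) - x_(i+1) * y_i, i.e. the 2D cross product
    signed = x @ np.roll(y, -1) - y @ np.roll(x, -1)
    return abs(signed) / 2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(shoelace_area(square))  # simple polygon: the usual area

curve1 = [(1, 1), (2, 1), (3, 3), (4, 3)]
curve2 = [(1, 3), (2, 3), (3, 1), (4, 1)]
crossing = curve1 + curve2[::-1]
print(shoelace_area(crossing))  # lobes on either side of the crossing cancel
```

For the crossing curves the net area comes out as zero, although the unsigned area between them is 5, so for curves that intersect the polygon still has to be split at the intersections first.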
Speed
For both curves having 1 million points:
n = 1_000_000
xy0 = np.c_[np.linspace(0, 10, n), np.random.uniform(0, 10, n)]
xy1 = np.c_[np.linspace(0, 10, n), np.random.uniform(0, 10, n)]
%timeit area(np.r_[xy0, xy1[::-1]])
# 42.9 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Simple viz of polygon area calculation
# say:
p = np.array([[0, 3], [1, 0], [3, 3], [1, 3], [1, 2]])
p_closed = np.r_[p, p[:1]]
fig, axes = plt.subplots(ncols=2, figsize=(10, 5), subplot_kw=dict(box_aspect=1), sharex=True)
ax = axes[0]
ax.set_aspect('equal')
ax.plot(*p_closed.T, '.-')
ax.fill(*p_closed.T, alpha=0.6)
center = p.mean(0)
txtkwargs = dict(ha='center', va='center')
ax.text(*center, f'{area(p):.2f}', **txtkwargs)
ax = axes[1]
ax.set_aspect('equal')
for a, b in zip(p_closed, p_closed[1:]):
    ar = 1/2 * np.cross(a, b)
    pos = ar >= 0
    tri = np.c_[(0,0), a, b, (0,0)].T
    # shrink a bit to make individual triangles easier to visually identify
    center = tri.mean(0)
    tri = (tri - center)*0.95 + center
    c = 'b' if pos else 'r'
    ax.plot(*tri.T, 'k')
    ax.fill(*tri.T, c, alpha=0.2, zorder=2 - pos)
    t = ax.text(*center, f'{ar:.1f}', color=c, fontsize=8, **txtkwargs)
    t.set_bbox(dict(facecolor='white', alpha=0.8, edgecolor='none'))
plt.tight_layout()

Second order gradient in numpy

I am trying to calculate the 2nd-order gradient numerically of an array in numpy.
a = np.sin(np.arange(0, 10, .01))
da = np.gradient(a)
dda = np.gradient(da)
This is what I came up with. Is this the way it should be done?
I am asking this, because in numpy there isn't an option saying np.gradient(a, order=2). I am concerned about whether this usage is wrong, and that is why numpy does not have this implemented.
PS1: I do realize that there is np.diff(a, 2). But this is only single-sided estimation, so I was curious why np.gradient does not have a similar keyword.
PS2: The np.sin() is a toy data - the real data does not have an analytic form.
Thank you!
I'll second #jrennie's first sentence - it can all depend. The numpy.gradient function requires that the data be evenly spaced (although it allows for different distances in each direction if multi-dimensional). If your data does not adhere to this, then numpy.gradient isn't going to be much use. Experimental data may have (OK, will have) noise on it, in addition to not necessarily being evenly spaced. In this case it might be better to use one of the scipy.interpolate spline functions (or objects). These can take unevenly spaced data, allow for smoothing, and can return derivatives up to k-1 where k is the order of the spline fit requested. The default value for k is 3, so a second derivative is just fine.
Example:
spl = scipy.interpolate.splrep(x,y,k=3) # no smoothing, 3rd order spline
ddy = scipy.interpolate.splev(x,spl,der=2) # use those knots to get second derivative
The object oriented splines like scipy.interpolate.UnivariateSpline have methods for the derivatives. Note that the derivative methods are implemented in Scipy 0.13 and are not present in 0.12.
Note that, as pointed out by #JosephCottham in comments in 2018, this answer (good for Numpy 1.08 at least), is no longer applicable since (at least) Numpy 1.14. Check your version number and the available options for the call.
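To illustrate the note above: as far as I know, np.gradient accepts an array of sample coordinates instead of a scalar spacing since NumPy 1.13, so unevenly spaced data can be differentiated directly. A minimal sketch with made-up sample positions:

```python
import numpy as np

# unevenly spaced sample positions (made up for illustration)
x = np.array([0.0, 0.1, 0.3, 0.6, 1.0, 1.5])
y = x**2  # exact derivative is 2*x

# pass the coordinates themselves instead of a scalar spacing
dy = np.gradient(y, x)
dy_edge2 = np.gradient(y, x, edge_order=2)

print(dy)        # interior points match 2*x, edges are first-order one-sided
print(dy_edge2)  # second-order one-sided differences at the edges as well
```

The interior formula is second-order accurate on a non-uniform grid, so it reproduces the derivative of a quadratic exactly; noisy data would still call for smoothing, as discussed above.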
There's no universal right answer for numerical gradient calculation. Before you can calculate the gradient about sample data, you have to make some assumption about the underlying function that generated that data. You can technically use np.diff for gradient calculation. Using np.gradient is a reasonable approach. I don't see anything fundamentally wrong with what you are doing---it's one particular approximation of the 2nd derivative of a 1-D function.
The double gradient approach fails for discontinuities in the first derivative.
Because the gradient function takes one data point to the left and one to the right into account, the stencil widens each time it is applied.
On the other hand, the second derivative can be calculated by the formula
d^2 f(x[i]) / dx^2 = (f(x[i-1]) - 2*f(x[i]) + f(x[i+1])) / h^2
(compare here). This has the advantage of taking only the two directly neighboring pixels into account.
In the picture, the double np.gradient approach (left) and the above-mentioned formula (right), as implemented by np.diff, are compared. As f(x) has only one kink at zero, the second derivative (green) should have a peak only there.
Because the double gradient solution takes 2 neighboring points in each direction into account, it produces nonzero second derivative values at +/- 1 as well.
In some cases, however, you may prefer the double gradient solution, as it is more robust to noise.
I am not sure why there is both np.gradient and np.diff, but a reason might be that the second argument of np.gradient defines the pixel distance (for each dimension), and for images it can be applied to both dimensions simultaneously: gy, gx = np.gradient(a).
Code
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(-5,6,1)
f = np.abs(xs)
f_x = np.gradient(f)
f_xx_bad = np.gradient(f_x)
f_xx_good = np.diff(f, 2)
test = f[:-2] - 2* f[1:-1] + f[2:]
# lets plot all this
fig, axs = plt.subplots(1, 2, figsize=(9, 3), sharey=True)
ax = axs[0]
ax.set_title('bad: double gradient')
ax.plot(xs, f, marker='o', label='f(x)')
ax.plot(xs, f_x, marker='o', label='d f(x) / dx')
ax.plot(xs, f_xx_bad, marker='o', label='d^2 f(x) / dx^2')
ax.legend()
ax = axs[1]
ax.set_title('good: diff with n=2')
ax.plot(xs, f, marker='o', label='f(x)')
ax.plot(xs, f_x, marker='o', label='d f(x) / dx')
ax.plot(xs[1:-1], f_xx_good, marker='o', label='d^2 f(x) / dx^2')
ax.plot(xs[1:-1], test, marker='o', label='test', markersize=1)
ax.legend()
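The widening can also be checked numerically without a plot. Away from the boundaries, applying the central difference twice with unit spacing is the same as the stencil (f[i-2] - 2*f[i] + f[i+2]) / 4, whereas np.diff(f, 2) uses only the direct neighbors. A small sketch on the same f(x) = |x| data:

```python
import numpy as np

xs = np.arange(-5, 6, 1)
f = np.abs(xs).astype(float)

# central difference applied twice (unit spacing)
double_grad = np.gradient(np.gradient(f))
# equivalent wide stencil, valid two or more points away from the edges
wide = (f[4:] - 2 * f[2:-2] + f[:-4]) / 4
# compact three-point stencil f[i-1] - 2*f[i] + f[i+1]
compact = np.diff(f, 2)

print(double_grad[2:-2])  # kink smeared over x = -1, 0, 1
print(wide)               # same values as the line above
print(compact)            # single peak at x = 0
```

The double gradient spreads the kink over three samples (0.5, 1, 0.5), while the compact stencil puts the whole jump of 2 at x = 0.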
As I keep stumbling over this problem in one form or another, I decided to write a function gradient_n, which adds differentiation-order functionality to np.gradient. Not all features of np.gradient are supported, such as differentiation along multiple axes.
Like np.gradient, gradient_n returns the differentiated result in the same shape as the input. Also a pixel distance argument (d) is supported.
import numpy as np
def gradient_n(arr, n, d=1, axis=0):
    """Differentiate np.ndarray n times.
    Similar to np.diff, but additional support of pixel distance d
    and padding of the result to the same shape as arr.
    If n is even: np.diff is applied and the result is zero-padded
    If n is odd:
    np.diff is applied n-1 times and zero-padded.
    Then gradient is applied. This ensures the right output shape.
    """
    n2 = int((n // 2) * 2)
    diff = arr
    if n2 > 0:
        diff = np.diff(arr, n2, axis=axis) / d**n2
        # np.diff drops n2 samples along axis, so pad n2 // 2 zeros on each
        # side (a fixed pad of 1 only restores the shape for n2 == 2)
        pad_width = [(0, 0)] * arr.ndim
        pad_width[axis] = (n2 // 2, n2 // 2)
        diff = np.pad(diff, pad_width, 'constant', constant_values=0)
    if n > n2:
        assert n-n2 == 1, 'n={:f}, n2={:f}'.format(n, n2)
        diff = np.gradient(diff, d, axis=axis)
    return diff
def test_gradient_n():
    import matplotlib.pyplot as plt
    x = np.linspace(-4, 4, 17)
    y = np.linspace(-2, 2, 9)
    X, Y = np.meshgrid(x, y)
    arr = np.abs(X)
    arr_x = np.gradient(arr, .5, axis=1)
    arr_x2 = gradient_n(arr, 1, .5, axis=1)
    arr_xx = np.diff(arr, 2, axis=1) / .5**2
    arr_xx = np.pad(arr_xx, ((0, 0), (1, 1)), 'constant', constant_values=0)
    arr_xx2 = gradient_n(arr, 2, .5, axis=1)
    assert np.sum(arr_x - arr_x2) == 0
    assert np.sum(arr_xx - arr_xx2) == 0
    fig, axs = plt.subplots(2, 2, figsize=(29, 21))
    axs = np.array(axs).flatten()
    ax = axs[0]
    ax.set_title('x-cut')
    ax.plot(x, arr[0, :], marker='o', label='arr')
    ax.plot(x, arr_x[0, :], marker='o', label='arr_x')
    ax.plot(x, arr_x2[0, :], marker='x', label='arr_x2', ls='--')
    ax.plot(x, arr_xx[0, :], marker='o', label='arr_xx')
    ax.plot(x, arr_xx2[0, :], marker='x', label='arr_xx2', ls='--')
    ax.legend()
    ax = axs[1]
    ax.set_title('arr')
    im = ax.imshow(arr, cmap='bwr')
    cbar = ax.figure.colorbar(im, ax=ax, pad=.05)
    ax = axs[2]
    ax.set_title('arr_x')
    im = ax.imshow(arr_x, cmap='bwr')
    cbar = ax.figure.colorbar(im, ax=ax, pad=.05)
    ax = axs[3]
    ax.set_title('arr_xx')
    im = ax.imshow(arr_xx, cmap='bwr')
    cbar = ax.figure.colorbar(im, ax=ax, pad=.05)
test_gradient_n()
This is an excerpt from the original documentation (at the time of writing found at http://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html). It states that unless the sampling distance is 1 you need to include a list containing the distances as an argument.
numpy.gradient(f, *varargs, **kwargs)
Return the gradient of an N-dimensional array.
The gradient is computed using second order accurate central differences in the interior and either first differences or second order accurate one-sides (forward or backwards) differences at the boundaries. The returned gradient hence has the same shape as the input array.
Parameters:
f : array_like
An N-dimensional array containing samples of a scalar function.
varargs : list of scalar, optional
N scalars specifying the sample distances for each dimension, i.e. dx, dy, dz, ... Default distance: 1.
edge_order : {1, 2}, optional
Gradient is calculated using Nth order accurate differences at the boundaries. Default: 1.
New in version 1.9.1.
Returns:
gradient : ndarray
N arrays of the same shape as f giving the derivative of f with respect to each dimension.
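Applied to the data from the question, this means the sample spacing of 0.01 has to be passed on every call; otherwise each np.gradient call returns differences per sample rather than per unit of x. A minimal sketch, using the known analytic derivatives of sin only as a check:

```python
import numpy as np

h = 0.01
x = np.arange(0, 10, h)
a = np.sin(x)

da = np.gradient(a, h)    # approximates cos(x)
dda = np.gradient(da, h)  # approximates -sin(x)

# maximum deviation from the analytic derivatives, away from the edges
print(np.max(np.abs(da[1:-1] - np.cos(x[1:-1]))))
print(np.max(np.abs(dda[2:-2] + np.sin(x[2:-2]))))

# without the spacing argument the result is scaled by h per application
unscaled = np.gradient(a)
print(np.allclose(unscaled, da * h))
```

Without the spacing, the nested call from the question would therefore be too small by a factor of h**2.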
My solution is to create a function similar to np.gradient that calculates the 2nd derivatives numerically from the array data.
import numpy as np
def gradient2_even(y, h=None, edge_order=1):
    """
    Return the 2nd-order gradient i.e.
    2nd derivatives of y with n samples and k components.
    The 2nd-order gradient is computed using second-order-accurate central differences
    in the interior points and either first or second order accurate one-sided
    (forward or backwards) differences at the boundaries.
    The returned gradient hence has the same shape as the input array.
    Parameters
    ----------
    y : 1d or 2d array_like
        The array containing the samples. If 2d with shape (n,k),
        n is the number of samples at least 2 while k is the number of
        y series/components. 1d input is equivalent to 2d input with shape (n,1).
    h : constant or 1d, optional
        spacing between the y samples. Default unitary spacing for
        all y components. Spacing can be specified using:
        1. Single scalar spacing value for all y components.
        2. 1d array_like of length k specifying the spacing for each y component
    edge_order : {1, 2}, optional
        Order 1 means 3-point forward/backward finite differences
        are used to calculate the 2nd derivatives at the edge points while
        order 2 uses 4-point forward/backward finite differences.
    Returns
    -------
    d2y : 1d or 2d array
        Array containing the 2nd derivatives. The output shape is the same as y.
    """
    if edge_order!=1 and edge_order!=2:
        raise ValueError('edge_order must be 1 or 2.')
    # np.asfarray was removed in NumPy 2.0; np.asarray with dtype=float does the same here
    y = np.asarray(y, dtype=float)
    origshape = y.shape
    if y.ndim!=1 and y.ndim!=2:
        raise ValueError('y can only be 1d or 2d.')
    elif y.ndim==1:
        y = np.atleast_2d(y).T
    elif y.shape[0]<2:
        raise ValueError('The number of y samples must be at least 2.')
    n,k = y.shape
    if h is None:
        h = 1.0
    else:
        h = np.asarray(h, dtype=float)
        if h.ndim!=0 and h.ndim!=1:
            raise ValueError('h can only be 0d or 1d.')
        elif h.ndim==1 and h.size!=k:
            # one spacing per y component (k), as documented above, not per sample (n)
            raise ValueError('If h is 1d, its length must equal the number of components of y.')
    d2y = np.zeros_like(y)
    if n==2:
        pass
    elif n==3:
        d2y[:] = ( 1/h**2 * (y[0] - 2*y[1] + y[2]) )
    else:
        # central differences in the interior
        d2y[1:-1]=1/h**2 * ( y[:-2] - 2*y[1:-1] + y[2:] )
        if edge_order==1:
            # 3-point one-sided differences at the edges
            d2y[0]=1/h**2 * ( y[0] - 2*y[1] + y[2] )
            d2y[-1]=1/h**2 * ( y[-1] - 2*y[-2] + y[-3] )
        else:
            # 4-point one-sided differences at the edges
            d2y[0]=1/h**2 * ( 2*y[0] - 5*y[1] + 4*y[2] - y[3] )
            d2y[-1]=1/h**2 * ( 2*y[-1] - 5*y[-2] + 4*y[-3] - y[-4] )
    return d2y.reshape(origshape)
Using your example,
# After importing the function from the script file or running it
from numpy import *
from matplotlib.pyplot import *
x, h = linspace(0, 10, 17, retstep=True) # use a fairly coarse grid to see the discrepancies better; retstep=True also returns the spacing h
y = sin(x)
ypp = -sin(x) # analytical 2nd derivatives
# Compute numerically the 2nd derivatives using 2nd-order finite differences at the edge points
d2y = gradient2_even(y, h, 2)
# Compute numerically the 2nd derivatives using nested gradient function
d2y2 = gradient(gradient(y, h, edge_order=2), h, edge_order=2)
# Compute numerically the 2nd derivatives using 1st-order finite differences at the edge points
d2y3 = gradient2_even(y, h, 1)
fig,ax=subplots(1,1)
ax.plot(x, ypp, x, d2y, 'o', x, d2y2, 'o', x, d2y3, 'o'), ax.grid()
ax.legend(['Analytical', 'edge_order=2', 'nested gradient', 'edge_order=1'])
fig.tight_layout()
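The difference between the two edge formulas can be seen on a cubic, without any plotting. The sketch below uses hypothetical helper names (forward_3pt, forward_4pt) and evaluates both one-sided stencils at the left edge x = 1 for f(x) = x**3, whose second derivative there is 6.

```python
def forward_3pt(y, h):
    """3-point forward stencil for f'' at the first sample (edge_order=1)."""
    return (y[0] - 2 * y[1] + y[2]) / h**2


def forward_4pt(y, h):
    """4-point forward stencil for f'' at the first sample (edge_order=2)."""
    return (2 * y[0] - 5 * y[1] + 4 * y[2] - y[3]) / h**2


h = 0.5
x = [1.0 + i * h for i in range(4)]  # 1.0, 1.5, 2.0, 2.5
y = [xi**3 for xi in x]              # f(x) = x**3, so f''(x) = 6*x

print(forward_3pt(y, h))  # 9.0: this is really f'' at x = 1.5, shifted by one sample
print(forward_4pt(y, h))  # 6.0: matches f''(1) for a cubic
```

The 3-point version simply reuses the central difference of the neighboring point, which is why it is only first-order accurate at the edge.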
