Let's say you have two arrays of data values from a calculation, each of which you can model with a continuous, differentiable function. Both "lines" of data points intersect at (at least) one point, and now the question is whether the functions behind these datasets are actually crossing or anticrossing.
The image below shows the situation, where I know (from the physics behind it) that at the upper two "contact points" the yellow and green lines actually should "switch color", whereas at the lower one both functions go out of each other's way:
To give an easier "toy set" of data, take this code for example:
import matplotlib.pyplot as plt
import numpy as np
x=np.arange(-10,10,.5)
y1=[np.absolute(i**3)+100*np.absolute(i) for i in x]
y2=[-np.absolute(i**3)-100*np.absolute(i) for i in x][::-1]
plt.scatter(x,y1)
plt.scatter(x,y2,color='r')
plt.show()
Which should produce the following image:
Now how could I extrapolate whether the trend behind the data is crossing (so the data from the lower left continues to the upper right) or anti-crossing (as indicated with the colors above, the data from the lower left continues to the lower right)?
So far I was able to find the "contact point" between these two datasets by looking at the derivative of the difference between them, roughly like this:
closePoints=np.where(np.diff(np.diff(array_A - array_B) > 0))[0] + 1
(which probably would be faster to evaluate with something like scipy's cKDTree).
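For reference, the same turning-point search can be written without NumPy. This is only a sketch: the helper name `turning_points` is made up here, and it assumes the difference `array_A - array_B` is passed in as a plain sequence.

```python
def turning_points(diff_values):
    """Indices where the slope of diff_values changes sign (local extrema)."""
    # rising[k] is True when the step from sample k to sample k + 1 goes up
    rising = [b > a for a, b in zip(diff_values, diff_values[1:])]
    # a flip between consecutive steps marks an extremum at sample k + 1
    return [k + 1 for k in range(len(rising) - 1) if rising[k] != rising[k + 1]]


# the difference shrinks towards index 2 and grows again afterwards
print(turning_points([3.0, 2.0, 1.0, 2.0, 3.0]))
```

A minimum of the absolute difference found this way only says that the curves come close. It does not yet say whether they cross.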
Should I go on and (probably very inefficiently) check the derivative on both sides of the intersection? Or can I somehow check if the extrapolation of the data on the left side fits better to crossing or anticrossing?
I understood your problem as:
You have two sequences of points in a 2D plane.
The true curves can be approximated by straight lines between consecutive points of the sequences.
You want to know how often and where the two curves intersect (not only come into contact but really cross each other) (polygon intersection).
A potential solution is:
You look at each combination of a line segment of one curve with a line segment of another curve.
Combinations where the bounding boxes of the line segments have an overlap can potentially contain intersection points.
You solve a linear equation system to compute if and where an intersection between two lines occurs
In case of no solution to the equation system the lines are parallel but not overlapping, dismiss this case
In case of one solution check that it is truly within the segments, if so record this crossing point
In case of infinitely many intersections the lines are identical. This is also no real crossing and can be dismissed.
Do this for all combinations of line segments and eliminate twin cases, i.e. where the two curves intersect at a segment start or end
Let me give some details:
How to check if two bounding-boxes (rectangles) of the segments overlap so that the segments potentially can intersect?
The minimal x/y value of one rectangle must be smaller than the maximal x/y value of the other. This must hold for both.
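A minimal sketch of that bounding-box test for a single pair of segments, in plain Python. The function name `boxes_overlap` and the ((x, y), (x, y)) segment layout are assumptions made for this illustration; the full code further down does the same comparison for all segment pairs at once with NumPy.

```python
def boxes_overlap(seg_a, seg_b):
    """True if the axis-aligned bounding boxes of two segments overlap."""
    (ax0, ay0), (ax1, ay1) = seg_a
    (bx0, by0), (bx1, by1) = seg_b
    # the smaller bound of one box must not exceed the larger bound of the other,
    # in both directions and on both axes
    return (min(ax0, ax1) <= max(bx0, bx1) and min(bx0, bx1) <= max(ax0, ax1)
            and min(ay0, ay1) <= max(by0, by1) and min(by0, by1) <= max(ay0, ay1))


print(boxes_overlap(((0, 0), (2, 2)), ((1, 3), (3, 1))))  # boxes share the square [1, 2] x [1, 2]
print(boxes_overlap(((0, 0), (1, 1)), ((2, 0), (3, 1))))  # separated in x
```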
If you have two segments how do you solve for intersection points?
Let's say segment A has two points (x1, y1) and (x2, y2) and segment B has two points (x3, y3) and (x4, y4).
Then you simply have two parametrized line equations which have to be set equal:
(x1, y1) + t * (x2 - x1, y2 - y1) = (x3, y3) + q * (x4 - x3, y4 - y3)
And you need to find all solutions where both t and q are in [0, 1). The corresponding linear equation system may be rank deficient or not solvable at all, so it is best to use a general solver (I chose numpy.linalg.lstsq) that does everything in one go.
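To make the equation above concrete, here is a small sketch for one pair of segments. It uses Cramer's rule on the 2x2 system instead of numpy.linalg.lstsq, so it is not the solver used in the code below; the name `segment_intersection` and the tolerance `eps` are assumptions for this example.

```python
def segment_intersection(p1, p2, p3, p4, eps=1e-12):
    """Intersection point of segments p1->p2 and p3->p4, or None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    # rearranged equation: t * (p2 - p1) + q * (p3 - p4) = p3 - p1
    a11, a12 = x2 - x1, x3 - x4
    a21, a22 = y2 - y1, y3 - y4
    det = a11 * a22 - a12 * a21
    if abs(det) < eps:
        return None  # parallel or identical lines: no single crossing point
    b1, b2 = x3 - x1, y3 - y1
    t = (b1 * a22 - a12 * b2) / det
    q = (a11 * b2 - b1 * a21) / det
    if 0 <= t < 1 and 0 <= q < 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None  # the lines meet outside at least one of the segments


print(segment_intersection((0, 0), (2, 2), (0, 2), (2, 0)))
```

The half-open interval [0, 1) means a crossing exactly at the end point of a segment is attributed to the next segment, which avoids counting it twice.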
Curves sharing a common point
Surprisingly difficult are cases where one point is common in the segmentation of both curves. The difficulty lies then in the correct decision of real intersection vs. contact points. The solution is to compute the angle of both adjacent segments of both curves (gives 4 angles) around the common point and look at the order of the angles. If both curves come alternating when going around the equal point then it's an intersection, otherwise it isn't.
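The angle-ordering idea can be sketched in isolation like this. The function name and argument layout are assumptions for the illustration: `shared` is the common point, `prev_a`/`next_a` are the neighbouring points on curve A, and `prev_b`/`next_b` those on curve B.

```python
import math


def crosses_at_shared_point(prev_a, next_a, prev_b, next_b, shared):
    """True if curve B changes sides of curve A at the shared point."""
    def angle(p):
        # direction from the shared point to p, mapped to [0, 2*pi)
        return math.atan2(p[1] - shared[1], p[0] - shared[0]) % (2 * math.pi)

    a_lo, a_hi = sorted((angle(prev_a), angle(next_a)))
    # the two arms of A split the full circle into two sectors; check for each
    # arm of B whether it lies in the sector between a_lo and a_hi
    inside = [a_lo < angle(p) < a_hi for p in (prev_b, next_b)]
    return inside[0] != inside[1]  # arms in different sectors: a real crossing


# horizontal line crossed by a vertical line (the data_E case below)
print(crosses_at_shared_point((-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)))
# two "<" and ">" shapes touching at their tips (the data_F case below)
print(crosses_at_shared_point((-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0)))
```

Arms of B that lie exactly on an arm of A are treated as "not inside" here, which is a simplification.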
And a code example based on your data:
import math
import matplotlib.pyplot as plt
import numpy as np
def intersect_curves(x1, y1, x2, y2):
    """
    x1, y1 data vector for curve 1
    x2, y2 data vector for curve 2
    """
    # number of points in each curve, number of segments is one less, need at least one segment in each curve
    N1 = x1.shape[0]
    N2 = x2.shape[0]
    # get segment presentation (xi, xi+1; xi+1, xi+2; ..)
    xs1 = np.vstack((x1[:-1], x1[1:]))
    ys1 = np.vstack((y1[:-1], y1[1:]))
    xs2 = np.vstack((x2[:-1], x2[1:]))
    ys2 = np.vstack((y2[:-1], y2[1:]))
    # test if bounding-boxes of segments overlap
    mix1 = np.tile(np.amin(xs1, axis=0), (N2-1,1))
    max1 = np.tile(np.amax(xs1, axis=0), (N2-1,1))
    miy1 = np.tile(np.amin(ys1, axis=0), (N2-1,1))
    may1 = np.tile(np.amax(ys1, axis=0), (N2-1,1))
    mix2 = np.transpose(np.tile(np.amin(xs2, axis=0), (N1-1,1)))
    max2 = np.transpose(np.tile(np.amax(xs2, axis=0), (N1-1,1)))
    miy2 = np.transpose(np.tile(np.amin(ys2, axis=0), (N1-1,1)))
    may2 = np.transpose(np.tile(np.amax(ys2, axis=0), (N1-1,1)))
    idx = np.where((mix2 <= max1) & (max2 >= mix1) & (miy2 <= may1) & (may2 >= miy1)) # overlapping segment combinations
    # going through all the possible segments
    x0 = []
    y0 = []
    for (i, j) in zip(idx[0], idx[1]):
        # get segment coordinates
        xa = xs1[:, j]
        ya = ys1[:, j]
        xb = xs2[:, i]
        yb = ys2[:, i]
        # ax=b, prepare matrices a and b
        a = np.array([[xa[1] - xa[0], xb[0] - xb[1]], [ya[1] - ya[0], yb[0]- yb[1]]])
        b = np.array([xb[0] - xa[0], yb[0] - ya[0]])
        r, residuals, rank, s = np.linalg.lstsq(a, b, rcond=None)
        # keep only unique solutions (full rank) that lie within both segments;
        # residuals is always empty for a square system, so it is not tested
        if rank == 2 and r[0] >= 0 and r[0] < 1 and r[1] >= 0 and r[1] < 1:
            if r[0] == 0 and r[1] == 0 and i > 0 and j > 0:
                # super special case of one segment point (not the first) in common, need to differentiate between crossing or contact
                angle_a1 = math.atan2(ya[1] - ya[0], xa[1] - xa[0])
                angle_b1 = math.atan2(yb[1] - yb[0], xb[1] - xb[0])
                # get previous segment
                xa2 = xs1[:, j-1]
                ya2 = ys1[:, j-1]
                xb2 = xs2[:, i-1]
                yb2 = ys2[:, i-1]
                angle_a2 = math.atan2(ya2[0] - ya2[1], xa2[0] - xa2[1])
                angle_b2 = math.atan2(yb2[0] - yb2[1], xb2[0] - xb2[1])
                # determine in which order the 4 angle are
                if angle_a2 < angle_a1:
                    h = angle_a1
                    angle_a1 = angle_a2
                    angle_a2 = h
                if (angle_b1 > angle_a1 and angle_b1 < angle_a2 and (angle_b2 < angle_a1 or angle_b2 > angle_a2)) or\
                        ((angle_b1 < angle_a1 or angle_b1 > angle_a2) and angle_b2 > angle_a1 and angle_b2 < angle_a2):
                    # one arm of curve 2 inside and one outside: a real crossing
                    # (both in or both out would be just a contact point and is skipped)
                    x0.append(xa[0])
                    y0.append(ya[0])
            else:
                x0.append(xa[0] + r[0] * (xa[1] - xa[0]))
                y0.append(ya[0] + r[0] * (ya[1] - ya[0]))
    return (x0, y0)
# create data
def data_A():
    # data from question (does not intersect)
    x1 = np.arange(-10, 10, .5)
    x2 = x1
    y1 = [np.absolute(x**3)+100*np.absolute(x) for x in x1]
    y2 = [-np.absolute(x**3)-100*np.absolute(x) for x in x2][::-1]
    return (x1, y1, x2, y2)
def data_B():
    # sine, cosine, should have some intersection points
    x1 = np.arange(-10, 10, .5)
    x2 = x1
    y1 = np.sin(x1)
    y2 = np.cos(x2)
    return (x1, y1, x2, y2)
def data_C():
    # a spiral and a diagonal line, showing the more general case
    t = np.arange(0, 10, .2)
    x1 = np.sin(t * 2) * t
    y1 = np.cos(t * 2) * t
    x2 = np.arange(-10, 10, .5)
    y2 = x2
    return (x1, y1, x2, y2)
def data_D():
    # parallel and overlapping, should give no intersection point
    x1 = np.array([0, 1])
    y1 = np.array([0, 0])
    x2 = np.array([-1, 3])
    y2 = np.array([0, 0])
    return (x1, y1, x2, y2)
def data_E():
    # crossing at a segment point, should give exactly one intersection point
    x1 = np.array([-1,0,1])
    y1 = np.array([0,0,0])
    x2 = np.array([0,0,0])
    y2 = np.array([-1,0,1])
    return (x1, y1, x2, y2)
def data_F():
    # contacting at one segment point, should give no intersection point
    x1 = np.array([-1,0,-1])
    y1 = np.array([-1,0,1])
    x2 = np.array([1,0,1])
    y2 = np.array([-1,0,1])
    return (x1, y1, x2, y2)
x1, y1, x2, y2 = data_F() # select the data you like here
# show example data
plt.plot(x1, y1, 'b-o')
plt.plot(x2, y2, 'r-o')
# call to intersection computation
x0, y0 = intersect_curves(x1, y1, x2, y2)
print('{} intersection points'.format(len(x0)))
# display intersection points in green
plt.plot(x0, y0, 'go')
plt.show() # zoom in to see that the algorithm is correct
I tested it extensively and it should get most (all) border cases right (see data_A-F in code). Some examples:
Some comments:
The assumption about the line approximation is crucial. Most true curves can be approximated by straight lines only locally, and only to some extent. Because of this, in places where the two curves come close but do not intersect, at a distance on the order of the spacing between consecutive sampling points, you may obtain false positives or false negatives. The solution is then either to use more points or to use additional knowledge about the true curves. Splines might give a lower error rate but also require more computation; better sampling of the curves would be preferable.
Self-intersection is trivially included: take the same curve twice and intersect it with itself.
This solution has the additional advantage that it isn't restricted to curves of the form y=f(x) but it's applicable to arbitrary curves in 2D.
You could use a spline interpolation for the difference function g(x) = y1(x) - y2(x). A minimum of the square g(x)**2 would be a contact or crossing point. Looking at the first and second derivatives, you could decide whether it is a contact point (g(x) has a minimum: g'(x)==0, g''(x) != 0) or a crossing point (g(x) has a stationary point: g'(x)==0, g''(x)==0).
The following code searches for a minimum of g(x)**2 in a constrained interval and then plots the derivatives. The constrained interval makes it possible to find multiple points successively by excluding intervals in which previous points were found.
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as sopt
import scipy.interpolate as sip
# test functions:
nocrossingTest = True
if nocrossingTest:
    f1 = lambda x: +np.absolute(x**3)+100*np.absolute(x)
    f2 = lambda x: -np.absolute(x**3)-100*np.absolute(x)
else:
    f1 = lambda x: +np.absolute(x**3)+100*x
    f2 = lambda x: -np.absolute(x**3)-100*x
xp = np.arange(-10,10,.5)
y1p, y2p = f1(xp), f2(xp) # test array
# Do Interpolation of y1-y2 to find crossing point:
g12 = sip.InterpolatedUnivariateSpline(xp, y1p - y2p) # Spline Interpolator of Difference
dg12 = g12.derivative() # spline derivative
ddg12 = dg12.derivative() # spline derivative
# Bounded least square fit to find minimal distance
gg = lambda x: g12(x)*g12(x)
rr = sopt.minimize_scalar(gg, bounds=[-1,1], method='bounded') # search minimum in interval [-1,1]
x_c = rr['x'] # x value with minimum distance
print("Crossing point is at x = {} (Distance: {})".format(x_c, g12(x_c)))
fg = plt.figure(1)
fg.clf()
fg,ax = plt.subplots(1, 1,num=1)
ax.set_title("Function Values $y$")
ax.plot(xp, np.vstack([y1p,y2p]).T, 'x',)
xx = np.linspace(xp[0], xp[-1], 1000)
ax.plot(xx, np.vstack([f1(xx), f2(xx)]).T, '-', alpha=0.5)
ax.grid(True)
ax.legend(loc="best")
fg.canvas.draw()
fg = plt.figure(2)
fg.clf()
fg,axx = plt.subplots(3, 1,num=2)
axx[0].set_title("$g(x) = y_1(x) - y_2(x)$")
axx[1].set_title("$dg(x)/dx$")
axx[2].set_title("$d^2g(x)/dx^2$")
for ax,g in zip(axx, [g12, dg12, ddg12]):
    ax.plot(xx, g(xx))
    ax.plot(x_c, g(x_c), 'ro', alpha=.5)
    ax.grid(True)
fg.tight_layout()
plt.show()
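A simpler check that may complement the derivative plots is to compare the sign of g a small step to the left and right of the candidate point: if the sign flips the curves cross, otherwise they only touch. This is a sketch of that swapped-in sign test, not the derivative criterion described above; the name `classify_point` and the step size `h` are assumptions, and `g` can be any callable such as the spline `g12`.

```python
def classify_point(g, x_c, h):
    """Classify a near-zero of g at x_c by the sign of g on both sides."""
    left, right = g(x_c - h), g(x_c + h)
    # opposite signs on the two sides mean y1 - y2 changes sign: a crossing
    return "crossing" if left * right < 0 else "contact"


# differences f1 - f2 of the two test cases from the code above
g_touch = lambda x: 2 * abs(x**3) + 200 * abs(x)
g_cross = lambda x: 2 * abs(x**3) + 200 * x

print(classify_point(g_touch, 0.0, 0.25))
print(classify_point(g_cross, 0.0, 0.25))
```

The step `h` should be on the order of the sampling distance. If it is much smaller, the result depends mostly on how the interpolation behaves between samples.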
The difference function shows that the difference is not smooth:
I'm making a python script right now that is trying to find the length of an arc, where it given this information:
center of arc: x1, y1
start point of arc: x2, y2
end point of arc: x3, y3
direction, cw, ccw
so far I have been able to successfully calculate the radius, and I tried calculating the angle using the equation:
But for any arcs that have an angle greater than 1*pi or 180 degrees, it returns the incorrect (but correct) inside angle.
What is the correct equation knowing the radius and these three points that I can use to find the value of the angle of the arc from 0 rad/degrees to 360 degrees/2pi radians, going in either the clockwise or counterclockwise direction (it can be either or and I need to be able to calculate for both scenarios)
Code:
# code to find theta
aVector = np.array([x1 - x2, y1 - y2])
bVector = np.array([x1 - x3, y1 - y3])
aMag = np.linalg.norm(aVector)
bMag = np.linalg.norm(bVector)
theta = np.arccos(np.dot(aVector, bVector) / (aMag * bMag))
as you can see here, I'm using arccos which to my dismay only outputs 0-180 degrees
Solution/Working code:
# equation for angle using atan2
start = math.atan2(y2 - y1, x2 - x1)
end = math.atan2(y3 - y1, x3 - x1)
if gcodeAnalysis[tempLineNum][4] == "G3": # going CW
    start, end = end, start
tau = 2.0 * math.pi
theta = math.fmod(math.fmod(end - start, tau) + tau, tau)
Working Values:
X1 = 0.00048399999999998444
Y1 = 0.0002720000000007161
X2 = 0.378484
Y2 = -14.694728
X3 = 3.376
Y3 = -14.307
Proper result/value
Theta = 6.077209477545957
Assume this arc was done CCW
As you noticed, the range of math.acos is [0, pi], making it rather useless for telling you the relative directions of the vectors. To get full circular information about a pair of angles, you can use math.atan2. While regular math.atan has a range of [-pi/2, pi/2], atan2 splits the inputs into two parts and returns an angle in the range (-pi, pi]. You can compute the angles relative to any reference, not necessarily relative to each other:
start = math.atan2(y2 - y1, x2 - x1)
end = math.atan2(y3 - y1, x3 - x1)
Now you can use some common formulae to find the difference between the angles in whatever direction you want. I've implemented some of these in a small utility library I made called haggis. The specific function you want is haggis.math.ang_diff_pos.
First, the "manual" computation:
if direction == 'cw':
    start, end = end, start
tau = 2.0 * math.pi
angle = math.fmod(math.fmod(end - start, tau) + tau, tau)
If you want to use my function, you can do
from haggis.math import ang_diff_pos
if direction == 'cw':
    start, end = end, start
angle = ang_diff_pos(start, end)
All of these operations can be easily vectorized using numpy if you find yourself dealing with many points all at once.
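Wrapped up as one function, the manual computation might look like the sketch below. The function name `arc_angle` and the point-tuple arguments are assumptions for this example. Python's `%` on floats already returns a non-negative result for a positive modulus, so a single `%` stands in for the nested `fmod` calls.

```python
import math


def arc_angle(center, start_pt, end_pt, direction="ccw"):
    """Angle swept from start_pt to end_pt around center, in [0, 2*pi)."""
    start = math.atan2(start_pt[1] - center[1], start_pt[0] - center[0])
    end = math.atan2(end_pt[1] - center[1], end_pt[0] - center[0])
    if direction == "cw":
        # a clockwise sweep from start to end is a counterclockwise sweep from end to start
        start, end = end, start
    return (end - start) % (2.0 * math.pi)


# quarter circle from (1, 0) to (0, 1) around the origin
print(arc_angle((0, 0), (1, 0), (0, 1), "ccw"))  # a quarter turn
print(arc_angle((0, 0), (1, 0), (0, 1), "cw"))   # the remaining three quarters
```

Note that identical start and end points give 0 rather than a full turn, so a full circle would need special handling.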
You can use the cross product of the two vectors to determine whether the first has to rotate clockwise or counterclockwise to reach the second.
See code below:
import numpy as np
from numpy import linalg as LA
x1 = 0
y1 = 0
x2 = 2
y2 = 0
x3 = 2
y3 = -2
dir = 'ccw' # or 'cw'
v1 = np.array([x2-x1,y2-y1])
v2 = np.array( [x3-x1,y3-y1])
# if the cross product is positive, then the two vector need to rotate counter clockwise
# (z-component written out, since np.cross on 2D vectors is deprecated in NumPy 2.0)
rot = v1[0]*v2[1] - v1[1]*v2[0]
vdir = 'ccw' if rot >0 else 'cw'
r = (v1[0]*v2[0]+v1[1]*v2[1])/(LA.norm(v1)*LA.norm(v2))
deg = np.arccos(r)/np.pi*180
if vdir != dir:
    deg = 360 -deg
print(deg)
I have two points and would like to calculate the angle of the line crossing these points in degrees.
I calculated the angle like so:
import numpy as np
p1 = [0, 0.004583285714285714]
p2 = [1, 0.004588714285714285]
x1 = p1[0]
y1 = p1[1]
x2 = p2[0]
y2 = p2[1]
angle = np.rad2deg(np.arctan2(y1 - y2, x2 - x1))
print(angle)
As expected, the angle is a very small negative number (a small downward slope in relation to the X plane):
-0.00031103423163937605
If I plot this, you will see what I mean:
plt.ylim([0,1]) # making y axis range the same as X (a full unit)
plt.plot([x1, x2], [y1, y2])
Clearly the angle of that line is a very small number because the Y values are so small.
I know the lowest y number in this plot is 0.00458 and the highest is 0.00459.
I'm having trouble coming up with the way to scale this properly so that I can obtain this angle instead:
Which is closer to -35 degrees or so (visually).
How can I get the angle a person would see if the chart was plotted with the Y axis ranging only between those min and max values above?
Of course all plots are just for illustration - I'm trying to calculate just the raw angle number given two points and the min and max values for the Y axis.
Solved it, turns out it was exceedingly simple and I'm not sure why I was having trouble with it (or why folks seem not to understand the question ¯\ (ツ)/¯ ).
The angle I was looking for can be obtained by
yRange = yMaxValue - yMinimumValue
scaledY1 = y1 / yRange
scaledY2 = y2 / yRange
angle = np.rad2deg(np.arctan2(scaledY1 - scaledY2, x2 - x1))
Which, for the values posted in the question, results in -28.495638618242538
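The same idea, sketched as a small helper that scales both axes by their displayed ranges. The name `visual_angle_deg` and its parameters are assumptions for this example. It uses the conventional `y2 - y1` sign (an upward slope is positive), which is the opposite of the `y1 - y2` order used above, and it assumes the plot area is square.

```python
import math


def visual_angle_deg(p1, p2, x_range, y_range):
    """Angle of the line p1 -> p2 as it appears when each axis spans its range."""
    dx = (p2[0] - p1[0]) / x_range  # fraction of the plot width covered
    dy = (p2[1] - p1[1]) / y_range  # fraction of the plot height covered
    return math.degrees(math.atan2(dy, dx))


# a rise equal to the full y range over the full x range looks like 45 degrees
print(visual_angle_deg((0.0, 0.0), (1.0, 0.5), x_range=1.0, y_range=0.5))
```

For a non-square figure, the ratio of plot height to plot width would have to be multiplied into `dy` as well.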
I am given two coordinates (x1,y1) and (x2,y2) and would like to draw a line between them using a function drawdot(x,y). So to make it look like I draw a line, I just want to draw 100 dots between the two points. How would I do this? I guess it's just a y = ax + b problem, but I just can't manage to make it work.
thx
You need to take the unit vector from point A to point B and scale it in n equal steps.
I'll make a function that returns the points (pairs of coordinates) that need to be drawn.
import math
def drawLine(x1, y1, x2, y2, n):
    v = (x2 - x1, y2 - y1)
    length = math.sqrt(v[0]**2 + v[1]**2)
    unitary = (v[0]/length, v[1]/length)
    step = length / (n - 1)  # spacing so that the n dots span A to B (needs n >= 2 and A != B)
    return [(x1 + unitary[0]*step*i, y1 + unitary[1]*step*i) for i in range(n)]
I have many points in the x,y plane, with length around 10000, and each point (x,y) has an intrinsic radius r. This small data set is only one tiny corner of my entire data set. I have a point of interest (x1,y1), and I want to find the nearby points that are within 1 of (x1,y1) and meet the criterion that the distance between (x,y) and (x1,y1) is less than r. I want to return the indices of those good points, not the good points themselves.
import numpy as np
np.random.seed(2000)
x = 20.*np.random.rand(10000)
y = 20.*np.random.rand(10000)
r = 0.3*np.random.rand(10000)
x1 = 10. ### (x1,y1) is an interest point
y1 = 12.
def index_finder(x,y,r,x1,y1):
    idx = (abs(x - x1) < 1.) & (abs(y - y1) < 1.) ### This cut will probably cut 90% of the data
    x_temp = x[idx] ### but if I do like this, then I lose the track of the original index
    y_temp = y[idx]
    dis_square = (x_temp - x1)*(x_temp - x1) + (y_temp - y1)*(y_temp - y1)
    idx1 = dis_square < r[idx]*r[idx] ### after this cut, there are only a few left
    x_good = x_temp[idx1]
    y_good = y_temp[idx1]
In this function, I can find the good points around (x1,y1), but not the index of those good points. HOWEVER, I need the ORIGINAL index because the ORIGINAL index are used to extract other data associated with the coordinate (x,y). As I mentioned, the sample data set is only a tiny corner of my entire data set, I will call the above function around 1,000,000 times for my entire data set, therefore the efficiency of the above index_finder function is also a consideration.
Any thoughts on such task?
Approach #1
We could simply index the first mask with itself and assign the second-stage mask to its True positions, like so -
idx[idx] = idx1
Thus, idx would have the final valid masked values/ good valued places corresponding to original array x and y, i.e. -
x_good = x[idx]
y_good = y[idx]
This mask could then be used to index into other arrays as mentioned in the question.
Approach #2
As another approach, we could use two conditional statements , thus creating two masks with them. Finally, combine them with AND-ing to get the combined mask, which could be indexed into x and y arrays for the final outputs. We won't need to get the actual indices that way, so that's one more benefit with it.
Hence, the implementation -
X = x-x1
Y = y-y1
mask1 = (np.abs(X) < 1.) & (np.abs(Y) < 1.)
mask2 = X**2 + Y**2 < r**2
comb_mask = mask1 & mask2
x_good = x[comb_mask]
y_good = y[comb_mask]
If for some reason, you still need the corresponding indices, just do -
comb_idx = np.flatnonzero(comb_mask)
If you are doing these operations for different x1 and y1 pairs for the same x and y dataset, I would suggest using broadcasting to vectorize it through all those x1, y1 paired datasets, as shown in this post.
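Putting the two-stage cut from the question together with np.flatnonzero gives a function that returns the original indices directly. This is a sketch under a few assumptions: the `box` parameter is added here for illustration, and `x`, `y`, `r` are taken to be equally long 1-D NumPy arrays.

```python
import numpy as np


def index_finder(x, y, r, x1, y1, box=1.0):
    """Original indices of points closer to (x1, y1) than their own radius r."""
    # stage 1: cheap bounding-box cut, kept as integer positions into x and y
    candidates = np.flatnonzero((np.abs(x - x1) < box) & (np.abs(y - y1) < box))
    # stage 2: exact distance test, each candidate against its own radius
    dx = x[candidates] - x1
    dy = y[candidates] - y1
    keep = dx * dx + dy * dy < r[candidates] ** 2
    return candidates[keep]


x = np.array([0.0, 0.1, 5.0])
y = np.array([0.0, 0.0, 5.0])
r = np.array([0.5, 0.05, 10.0])
print(index_finder(x, y, r, 0.0, 0.0))  # only point 0 passes both cuts
```

The returned integer array can be used to index any other array that is aligned with `x` and `y`.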
numpy.where seems made for finding the indices
the vectorized norm calc + np.where() could be faster than a loop
sq_norm = (x - x1)**2 + (y - y1)**2 # no need to take 10000 sqrt
idcs = np.where(sq_norm < 1.)
len(idcs[0])
Out[193]: 69
np.stack((idcs[0], x[idcs], y[idcs]), axis=1)[:5]
Out[194]:
array([[ 38. , 9.47165956, 11.94250173],
[ 39. , 9.6966941 , 11.67505453],
[ 276. , 10.68835317, 12.11589316],
[ 288. , 9.93632584, 11.07624915],
[ 344. , 9.48644057, 12.04911857]])
the norm calc can include the r array too, the 2nd step?
r_sq_norm = (x[idcs] - x1)**2 + (y[idcs] - y1)**2 - r[idcs]**2
r_idcs = np.where(r_sq_norm < 0.)
idcs[0][r_idcs]
Out[11]: array([1575, 3476, 3709], dtype=int64)
you might want to time the 2 step test vs including r in the 1st vectorized norm calc?
sq_norm = (x - x1)**2 + (y - y1)**2 - r**2
idcs = np.where(sq_norm < 0.)
idcs[0]
Out[13]: array([1575, 3476, 3709], dtype=int64)
You can take a mask of your indices, like so:
def index_finder(x,y,r,x1,y1):
    idx = np.nonzero((abs(x - x1) < 1.) & (abs(y - y1) < 1.))[0] #numerical, not boolean
    mask = (x[idx] - x1)*(x[idx] - x1) + (y[idx] - y1)*(y[idx] - y1) < r[idx]*r[idx]
    idx1 = idx[mask]
    x_good = x[idx1]
    y_good = y[idx1]
    return idx1
now idx1 holds the indices you want to extract.
Faster way in general to do this is to use scipy.spatial.KDTree
from scipy.spatial import KDTree
xy = np.stack((x,y), axis=-1) # shape (N, 2): one row per point
kdt = KDTree(xy)
kdt.query_ball_point([x1, y1], r)
If you have many points to query against the same dataset, this will be much faster than sequentially calling your index_finder app.
x1y1 = np.stack((x1, y1), axis=-1) #`x1` and `y1` are arrays of coordinates.
kdt.query_ball_point(x1y1, r)
ALSO WRONG: if you have different distances for each point, you can do:
def query_variable_ball(kdtree, x, y, r):
    out = []
    for x_, y_, r_ in zip(x, y, r):
        out.append(kdtree.query_ball_point([x_, y_], r_))
    return out
xy = np.stack((x,y), axis=-1)
kdt = KDTree(xy)
query_variable_ball(kdt, x1, y1, r)
EDIT 2: This should work with different r values for each point
from scipy.spatial import KDTree
def index_finder_kd(x, y, r, x1, y1): # all arrays
    xy = np.stack((x,y), axis = -1)
    x1y1 = np.stack((x1, y1), axis = -1)
    xytree = KDTree(xy)
    # all data points within distance 1 of each (x1, y1); query_ball_point is used
    # here in place of query(..., k = None), which SciPy deprecated
    neighbours = xytree.query_ball_point(x1y1, 1.)
    good_idx = np.zeros(x.size, dtype = bool)
    for centre, idx in zip(x1y1, neighbours):
        idx = np.asarray(idx, dtype = int)
        dist = np.linalg.norm(xy[idx] - centre, axis = 1)
        good_idx[idx] |= r[idx] > dist
    x_good = x[good_idx]
    y_good = y[good_idx]
    return x_good, y_good, np.flatnonzero(good_idx)
This is very slow for only one (x1, y1) pair as the KDTree takes a while to populate. But if you have millions of pairs, this will be much faster.
(I've assumed you want the union of all good points in the (x, y) data for all (x1, y1). If you want them separately, it's also possible using a similar method: keep the indices idx[r[idx] > dist] for each (x1, y1) pair instead of OR-ing them into one mask.)
I have a list of x and y values for two curves, both having weird shapes, and I don't have a function for any of them. I need to do two things:
Plot it and shade the area between the curves like the image below.
Find the total area of this shaded region between the curves.
I'm able to plot and shade the area between those curves with fill_between and fill_betweenx in matplotlib, but I have no idea on how to calculate the exact area between them, specially because I don't have a function for any of those curves.
Any ideas?
I looked everywhere and can't find a simple solution for this. I'm quite desperate, so any help is much appreciated.
Thank you very much!
EDIT: For future reference (in case anyone runs into the same problem), here is how I've solved this: connected the first and last node/point of each curve together, resulting in a big weird-shaped polygon, then used shapely to calculate the polygon's area automatically, which is the exact area between the curves, no matter which way they go or how nonlinear they are. Works like a charm! :)
Here is my code:
from shapely.geometry import Polygon
x_y_curve1 = [(0.121,0.232),(2.898,4.554),(7.865,9.987)] #these are your points for curve 1 (I just put some random numbers)
x_y_curve2 = [(1.221,1.232),(3.898,5.554),(8.865,7.987)] #these are your points for curve 2 (I just put some random numbers)
polygon_points = [] #creates a empty list where we will append the points to create the polygon
for xyvalue in x_y_curve1:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 1
for xyvalue in x_y_curve2[::-1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 2 in the reverse order (from last point to first point)
for xyvalue in x_y_curve1[0:1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append the first point in curve 1 again, so it "closes" the polygon
polygon = Polygon(polygon_points)
area = polygon.area
print(area)
EDIT 2: Thank you for the answers. Like Kyle explained, this only works for positive values. If your curves go below 0 (which is not my case, as showed in the example chart), then you would have to work with absolute numbers.
The area calculation is straightforward in blocks where the two curves don't intersect: that's the trapezium, as has been pointed out above. If they intersect, then you create two triangles between x[i] and x[i+1], and you should add the areas of the two. If you want to do it directly, you should handle the two cases separately. Here's a basic working example to solve your problem. First, I will start with some fake data:
#!/usr/bin/python
import numpy as np
# let us generate fake test data
x = np.arange(10)
y1 = np.random.rand(10) * 20
y2 = np.random.rand(10) * 20
Now, the main code. Based on your plot, looks like you have y1 and y2 defined at the same X points. Then we define,
z = y1-y2
dx = x[1:] - x[:-1]
cross_test = np.sign(z[:-1] * z[1:])
cross_test will be negative whenever the two graphs cross. At these points, we want to calculate the x coordinate of the crossover. For simplicity, I will calculate x coordinates of the intersection of all segments of y. For places where the two curves don't intersect, they will be useless values, and we won't use them anywhere. This just keeps the code easier to understand.
Suppose you have z1 and z2 at x1 and x2, then we are solving for x0 such that z = 0:
# (z2 - z1)/(x2 - x1) = (z0 - z1) / (x0 - x1) = -z1/(x0 - x1)
# x0 = x1 - (x2 - x1) / (z2 - z1) * z1
x_intersect = x[:-1] - dx / (z[1:] - z[:-1]) * z[:-1]
dx_intersect = - dx / (z[1:] - z[:-1]) * z[:-1]
Where the curves don't intersect, area is simply given by:
areas_pos = abs(z[:-1] + z[1:]) * 0.5 * dx # signs of both z are same
Where they intersect, we add areas of both triangles:
areas_neg = 0.5 * dx_intersect * abs(z[:-1]) + 0.5 * (dx - dx_intersect) * abs(z[1:])
Now, the area in each block x[i] to x[i+1] is to be selected, for which I use np.where:
areas = np.where(cross_test < 0, areas_neg, areas_pos)
total_area = np.sum(areas)
That is your desired answer. As has been pointed out above, this will get more complicated if the two y graphs are defined at different x points. If you want to test this, you can simply plot it (in my test case, the y range will be -20 to 20)
import matplotlib.pyplot as plt
negatives = np.where(cross_test < 0)
positives = np.where(cross_test >= 0)
plt.plot(x, y1)
plt.plot(x, y2)
plt.plot(x, z)
plt.vlines(x_intersect[negatives], -20, 20)
plt.show()
Define your two curves as functions f and g that are linear by segment, e.g. between x1 and x2, f(x) = f(x1) + ((x-x1)/(x2-x1))*(f(x2)-f(x1)).
Define h(x)=abs(g(x)-f(x)). Then use scipy.integrate.quad to integrate h.
That way you don't need to bother about the intersections. It will do the "trapeze summing" suggested by ch41rmn automatically.
Your set of data is quite "nice" in the sense that the two sets of data share the same set of x-coordinates. You can therefore calculate the area using a series of trapezoids.
e.g. define the two functions as f(x) and g(x), then, between any two consecutive points in x, you have four points of data:
(x1, f(x1))-->(x2, f(x2))
(x1, g(x1))-->(x2, g(x2))
Then, the area of the trapezoid is
A(x1-->x2) = ( f(x1)-g(x1) + f(x2)-g(x2) ) * (x2-x1)/2 (1)
A complication arises that equation (1) only works for simply-connected regions, i.e. there must not be a cross-over within this region:
|\ |\/|
|_| vs |/\|
The area of the two sides of the intersection must be evaluated separately. You will need to go through your data to find all points of intersections, then insert their coordinates into your list of coordinates. The correct order of x must be maintained. Then, you can loop through your list of simply connected regions and obtain a sum of the area of trapezoids.
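The bookkeeping described above can be sketched in plain Python for the case where both curves share the same x-coordinates. Instead of inserting the intersection points into the coordinate list, this version splits each interval on the fly where the difference changes sign; the function name `area_between` is an assumption for this example.

```python
def area_between(x, y1, y2):
    """Unsigned area between two piecewise-linear curves sampled at the same x."""
    total = 0.0
    for i in range(len(x) - 1):
        dx = x[i + 1] - x[i]
        z0 = y1[i] - y2[i]          # difference at the left end
        z1 = y1[i + 1] - y2[i + 1]  # difference at the right end
        if z0 * z1 >= 0:
            # no cross-over inside the interval: a single trapezoid
            total += 0.5 * abs(z0 + z1) * dx
        else:
            # cross-over at fraction s of the interval: two triangles
            s = z0 / (z0 - z1)
            total += 0.5 * abs(z0) * s * dx + 0.5 * abs(z1) * (1.0 - s) * dx
    return total


# two curves that swap places between x = 2 and x = 3
print(area_between([1, 2, 3, 4], [1, 1, 3, 3], [3, 3, 1, 1]))
```

For curves sampled at different x-coordinates, both would first have to be resampled onto a common grid.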
EDIT:
For curiosity's sake, if the x-coordinates for the two lists are different, you can instead construct triangles. e.g.
.____.
| / \
| / \
| / \
|/ \
._________.
Overlap between triangles must be avoided, so you will again need to find points of intersections and insert them into your ordered list. The lengths of each side of the triangle can be calculated using Pythagoras' formula, and the area of the triangles can be calculated using Heron's formula.
The area_between_two_curves function in pypi library similaritymeasures (released in 2018) might give you what you need. I tried a trivial example on my side, comparing the area between a function and a constant value and got pretty close tie-back to Excel (within 2%). Not sure why it doesn't give me 100% tie-back, maybe I am doing something wrong. Worth considering though.
I had the same problem. The answer below is based on an attempt by the question author. However, shapely will not directly give the area of the polygon in purple. You need to edit the code to break it up into its component polygons and then get the area of each. After that you simply add them up.
Area Between two lines
Consider the lines below:
Sample Two lines
If you run the code below you will get zero for area because it takes the clockwise and subtracts the anti clockwise area:
from shapely.geometry import Polygon
x_y_curve1 = [(1,1),(2,1),(3,3),(4,3)] #these are your points for curve 1
x_y_curve2 = [(1,3),(2,3),(3,1),(4,1)] #these are your points for curve 2
polygon_points = [] #creates a empty list where we will append the points to create the polygon
for xyvalue in x_y_curve1:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 1
for xyvalue in x_y_curve2[::-1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 2 in the reverse order (from last point to first point)
for xyvalue in x_y_curve1[0:1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append the first point in curve 1 again, so it "closes" the polygon
polygon = Polygon(polygon_points)
area = polygon.area
print(area)
The solution is therefore to split the polygon into smaller pieces based on where the lines intersect. Then use a for loop to add these up:
from shapely.geometry import Polygon, LineString
from shapely.ops import unary_union, polygonize
import numpy as np
x_y_curve1 = [(1,1),(2,1),(3,3),(4,3)] #these are your points for curve 1
x_y_curve2 = [(1,3),(2,3),(3,1),(4,1)] #these are your points for curve 2
polygon_points = [] #creates a empty list where we will append the points to create the polygon
for xyvalue in x_y_curve1:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 1
for xyvalue in x_y_curve2[::-1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 2 in the reverse order (from last point to first point)
for xyvalue in x_y_curve1[0:1]:
    polygon_points.append([xyvalue[0],xyvalue[1]]) #append the first point in curve 1 again, so it "closes" the polygon
polygon = Polygon(polygon_points)
area = polygon.area
x,y = polygon.exterior.xy
# original data
ls = LineString(np.c_[x, y])
# closed, non-simple
lr = LineString(ls.coords[:] + ls.coords[0:1])
lr.is_simple # False
mls = unary_union(lr)
mls.geom_type # 'MultiLineString'
Area_cal =[]
for polygon in polygonize(mls):
    Area_cal.append(polygon.area)
Area_poly = (np.asarray(Area_cal).sum())
print(Area_poly)
A straightforward application of the area of a general polygon (see Shoelace formula) makes for a super-simple and fast, vectorized calculation:
def area(p):
    # for p: 2D vertices of a polygon:
    # area = 1/2 abs(sum(p0 ^ p1 + p1 ^ p2 + ... + pn-1 ^ p0))
    # where ^ is the cross product
    return np.abs(np.cross(p, np.roll(p, 1, axis=0)).sum()) / 2
Application to area between two curves. In this example, we don't even have matching x coordinates!
np.random.seed(0)
n0 = 10
n1 = 15
xy0 = np.c_[np.linspace(0, 10, n0), np.random.uniform(0, 10, n0)]
xy1 = np.c_[np.linspace(0, 10, n1), np.random.uniform(0, 10, n1)]
p = np.r_[xy0, xy1[::-1]]
>>> area(p)
4.9786...
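For readers who want to check the formula by hand, here is a plain-Python sketch of the same shoelace sum. The name `shoelace_area` is an assumption for this example. One caveat worth keeping in mind: when the two curves cross, the combined polygon is self-intersecting and oppositely oriented lobes cancel in the sum, so the result is the net area rather than the total shaded area (the same effect described for shapely above).

```python
def shoelace_area(points):
    """Absolute value of the signed shoelace area of a closed polygon."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# a simple polygon: the unit square
print(shoelace_area([(0, 0), (1, 0), (1, 1), (0, 1)]))

# curve 1 followed by reversed curve 2, with one crossing in between:
# the two lobes have opposite orientation and cancel
curve1 = [(1, 1), (2, 1), (3, 3), (4, 3)]
curve2 = [(1, 3), (2, 3), (3, 1), (4, 1)]
print(shoelace_area(curve1 + curve2[::-1]))
```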
Plot:
plt.plot(*xy0.T, 'b-')
plt.plot(*xy1.T, 'r-')
p = np.r_[xy0, xy1[::-1]]
plt.fill(*p.T, alpha=.2)
Speed
For both curves having 1 million points:
n = 1_000_000
xy0 = np.c_[np.linspace(0, 10, n), np.random.uniform(0, 10, n)]
xy1 = np.c_[np.linspace(0, 10, n), np.random.uniform(0, 10, n)]
%timeit area(np.r_[xy0, xy1[::-1]])
# 42.9 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Simple viz of polygon area calculation
# say:
p = np.array([[0, 3], [1, 0], [3, 3], [1, 3], [1, 2]])
p_closed = np.r_[p, p[:1]]
fig, axes = plt.subplots(ncols=2, figsize=(10, 5), subplot_kw=dict(box_aspect=1), sharex=True)
ax = axes[0]
ax.set_aspect('equal')
ax.plot(*p_closed.T, '.-')
ax.fill(*p_closed.T, alpha=0.6)
center = p.mean(0)
txtkwargs = dict(ha='center', va='center')
ax.text(*center, f'{area(p):.2f}', **txtkwargs)
ax = axes[1]
ax.set_aspect('equal')
for a, b in zip(p_closed, p_closed[1:]):
    ar = 1/2 * np.cross(a, b)
    pos = ar >= 0
    tri = np.c_[(0,0), a, b, (0,0)].T
    # shrink a bit to make individual triangles easier to visually identify
    center = tri.mean(0)
    tri = (tri - center)*0.95 + center
    c = 'b' if pos else 'r'
    ax.plot(*tri.T, 'k')
    ax.fill(*tri.T, c, alpha=0.2, zorder=2 - pos)
    t = ax.text(*center, f'{ar:.1f}', color=c, fontsize=8, **txtkwargs)
    t.set_bbox(dict(facecolor='white', alpha=0.8, edgecolor='none'))
plt.tight_layout()