Draw ellipses around points - python

I'm trying to draw ellipses around points of a group on a graph, with matplotlib. I would like to obtain something like this:
A dataset for a group (the red one for example) could look like this:
[[-23.88315146 -3.26328266] # first point
[-25.94906669 -1.47440904] # second point
[-26.52423229 -4.84947907]] # third point
I can easily draw the points on a graph, but I encounter problems to draw the ellipses.
Each ellipse has diameters of 2 * standard deviation, and its center has the coordinates (x_mean, y_mean). The width of one ellipse equals the x standard deviation * 2, and its height equals the y standard deviation * 2.
However, I don't know how to calculate the angle of the ellipses (you can see on the picture the ellipses are not perfectly vertical).
Do you have an idea of how to do that?
Note:
This question is a simplification of LDA problem (Linear Discriminant Analysis). I'm trying to simplify the problem to its most basic expression.

This is a well-studied problem. First take the convex hull of the set of points
you wish to enclose. Then perform computations as described in the literature.
I provide two sources below.
"Smallest Enclosing Ellipses--An Exact and Generic Implementation in C++" (abstract link).
Charles F. Van Loan. "Using the Ellipse to Fit and Enclose Data Points."
(PDF download).

This has a lot more to do with mathematics than programming ;)
Since you already have the dimensions and only want to find the angle, here is what I would do (based on my instinct):
Try to find the line that best fits the given set of points (trendline), this is also called Linear Regression. There are several methods to do this but the Least Squares method is a relatively easy one (see below).
Once you found the best fitting line, you could use the slope as your angle.
Least Squares Linear Regression
The least squares linear regression method is used to find the slope of the trendline, exactly what we want.
Here is a video explaining how it works
Let's assume you have a data set: data = [(x1, y1), (x2, y2), ...]
Using the least squares method, your slope would be:
# I see in your example that you already have x_mean and y_mean
# No need to calculate them again, skip the two following lines
# and use your values in the rest of the example
avg_x = sum(element[0] for element in data)/len(data)
avg_y = sum(element[1] for element in data)/len(data)
x_diff = [element[0] - avg_x for element in data]
y_diff = [element[1] - avg_y for element in data]
x_diff_squared = [element**2 for element in x_diff]
slope = sum(x * y for x,y in zip(x_diff, y_diff)) / sum(x_diff_squared)
Once you have that, you are almost done. The slope is equal to the tangent of the angle: slope = tan(angle).
Use Python's math module: angle = math.atan(slope). This will return the angle in radians. If you want it in degrees, you have to convert it using math.degrees(angle).
Combine this with the dimensions and position you already have and you got yourself an ellipse ;)
This is how I would solve this particular problem, but there are probably a thousand different methods that would work too,
and some may well be better (and more complex) than what I propose.
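For illustration, here is a minimal end-to-end sketch of this approach (my own illustration, not part of the original answer): numpy's polyfit supplies the least-squares slope, and the ellipse is drawn with matplotlib's Ellipse patch, sized at 2 * standard deviation per the question. The data array is the example from the question.
import math
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

points = np.array([[-23.88315146, -3.26328266],
                   [-25.94906669, -1.47440904],
                   [-26.52423229, -4.84947907]])

x, y = points[:, 0], points[:, 1]
slope = np.polyfit(x, y, 1)[0]          # least-squares slope of the trendline
angle = math.degrees(math.atan(slope))  # slope -> angle in degrees

ellipse = Ellipse(xy=(x.mean(), y.mean()),  # centered on the means
                  width=2 * x.std(),        # 2 * x standard deviation
                  height=2 * y.std(),       # 2 * y standard deviation
                  angle=angle, fill=False, color='red')

fig, ax = plt.subplots()
ax.scatter(x, y, color='red')
ax.add_patch(ellipse)
ax.set_aspect('equal')
plt.show()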

I wrote a simple function to implement Mathieu David's solution. I'm sure there are many ways to do this, but this worked for my application.
import numpy as np
from sklearn.linear_model import LinearRegression

def get_ellipse_params(points):
    '''Calculate the parameters needed to graph an ellipse around a cluster of points in 2D.

    Calculate the height, width and angle of an ellipse to enclose the points in a cluster.
    Calculate the width by finding the maximum distance between the x-coordinates of points
    in the cluster, and the height by finding the maximum distance between the y-coordinates
    in the cluster. Multiply both by a scale factor to give padding around the points when
    constructing the ellipse. Calculate the angle by taking the inverse tangent of the
    gradient of the regression line. Note that tangent solutions repeat every 180 degrees,
    and so to ensure the correct solution has been found for plotting, add a correction
    factor of +/- 90 degrees if the magnitude of the angle exceeds 45 degrees.

    Args:
        points (ndarray): The points in a cluster to enclose with an ellipse, containing n
                          ndarray elements representing each point, each with d elements
                          representing the coordinates for the point.

    Returns:
        width (float): The width of the ellipse.
        height (float): The height of the ellipse.
        angle (float): The angle of the ellipse in degrees.
    '''
    if points.ndim == 1:
        width, height, angle = 0.1, 0.1, 0
        return width, height, angle
    else:
        SCALE = 2.5
        width = np.amax(points[:, 0]) - np.amin(points[:, 0])
        height = np.amax(points[:, 1]) - np.amin(points[:, 1])
        # Calculate the angle from the slope of a regression line through the points
        x_reg, y_reg = [[p[0]] for p in points], [[p[1]] for p in points]
        grad = LinearRegression().fit(x_reg, y_reg).coef_[0][0]
        angle = np.degrees(np.arctan(grad))
        # Account for multiple solutions of arctan
        if angle < -45:
            angle += 90
        elif angle > 45:
            angle -= 90
        return width * SCALE, height * SCALE, angle
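For instance, the returned parameters can be passed straight to matplotlib (a hypothetical usage sketch; points is the cluster array from above and ax an existing matplotlib Axes):
from matplotlib.patches import Ellipse

width, height, angle = get_ellipse_params(points)
ellipse = Ellipse(xy=points.mean(axis=0), width=width, height=height,
                  angle=angle, fill=False)
ax.add_patch(ellipse)  # ax is an existing matplotlib Axes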


Point in Spherical Polygon using Python [duplicate]

Say I have an arbitrary set of latitude and longitude pairs representing points on some simple, closed curve. In Cartesian space I could easily calculate the area enclosed by such a curve using Green's Theorem. What is the analogous approach to calculating the area on the surface of a sphere? I guess what I am after is (even some approximation of) the algorithm behind Matlab's areaint function.
There are several ways to do this.
1) Integrate the contributions from latitudinal strips. Here the area of each strip will be (R cos(A) (B1 - B0)) (R dA), where A is the latitude, B1 and B0 are the starting and ending longitudes, and all angles are in radians.
2) Break the surface into spherical triangles, and calculate the area using Girard's Theorem, and add these up.
3) As suggested here by James Schek, in GIS work they use an area preserving projection onto a flat space and calculate the area in there.
From the description of your data, it sounds like the first method might be the easiest. (Of course, there may be other easier methods I don't know of.)
Edit – comparing these two methods:
On first inspection, it may seem that the spherical triangle approach is easiest, but, in general, this is not the case. The problem is that one not only needs to break the region up into triangles, but into spherical triangles, that is, triangles whose sides are great circle arcs. For example, latitudinal boundaries don't qualify, so these boundaries need to be broken up into edges that better approximate great circle arcs. And this becomes more difficult to do for arbitrary edges where the great circles require specific combinations of spherical angles. Consider, for example, how one would break up a middle band around a sphere, say all the area between lat 0 and 45deg into spherical triangles.
In the end, if one is to do this properly with similar errors for each method, method 2 will give fewer triangles, but they will be harder to determine. Method 1 gives more strips, but they are trivial to determine. Therefore, I suggest method 1 as the better approach.
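As a sketch of method 1 (my own illustration, not from the original answer): integrating R cos(A) dA between two latitudes gives each strip's area in closed form, so a latitude-longitude "rectangle" needs no numerical quadrature at all.
import numpy as np

def strip_area(lat0, lat1, lon0, lon1, R=1.0):
    """Area of the spherical 'rectangle' between latitudes lat0..lat1 and
    longitudes lon0..lon1 (all in radians). The integral of
    R*cos(A)*(lon1-lon0) * R dA evaluates to R^2*(lon1-lon0)*(sin(lat1)-sin(lat0))."""
    return R**2 * (lon1 - lon0) * (np.sin(lat1) - np.sin(lat0))

# Sanity check: the whole sphere should give 4*pi*R^2 ~ 12.566
print(strip_area(-np.pi/2, np.pi/2, 0, 2*np.pi))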
I rewrote MATLAB's "areaint" function in Java; it produces exactly the same result.
"areaint" calculates the "surface per unit", so I multiplied the answer by Earth's surface area (5.10072e14 sq m).
private double area(ArrayList<Double> lats, ArrayList<Double> lons)
{
    double sum = 0;
    double prevcolat = 0;
    double prevaz = 0;
    double colat0 = 0;
    double az0 = 0;
    for (int i = 0; i < lats.size(); i++)
    {
        double lat = lats.get(i) * Math.PI / 180; // latitude in radians
        double lon = lons.get(i) * Math.PI / 180; // longitude in radians
        // Colatitude of this vertex relative to (0, 0), haversine form
        double a = Math.pow(Math.sin(lat / 2), 2)
                 + Math.cos(lat) * Math.pow(Math.sin(lon / 2), 2);
        double colat = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        // Azimuth of this vertex relative to (0, 0)
        double az = 0;
        if (lats.get(i) >= 90)
        {
            az = 0;
        }
        else if (lats.get(i) <= -90)
        {
            az = Math.PI;
        }
        else
        {
            az = Math.atan2(Math.cos(lat) * Math.sin(lon), Math.sin(lat)) % (2 * Math.PI);
        }
        if (i == 0)
        {
            colat0 = colat;
            az0 = az;
        }
        if (i > 0 && i < lats.size())
        {
            // Contribution of the edge from the previous vertex to the line
            // integral, with the azimuth difference wrapped into (-pi, pi]
            sum = sum + (1 - Math.cos(prevcolat + (colat - prevcolat) / 2)) * Math.PI
                      * ((Math.abs(az - prevaz) / Math.PI)
                         - 2 * Math.ceil(((Math.abs(az - prevaz) / Math.PI) - 1) / 2))
                      * Math.signum(az - prevaz);
        }
        prevcolat = colat;
        prevaz = az;
    }
    // Close the polygon back to the first vertex
    sum = sum + (1 - Math.cos(prevcolat + (colat0 - prevcolat) / 2)) * (az0 - prevaz);
    // Fraction of the sphere times Earth's surface area (5.10072e14 sq m)
    return 5.10072E14 * Math.min(Math.abs(sum) / 4 / Math.PI, 1 - Math.abs(sum) / 4 / Math.PI);
}
You mention "geography" in one of your tags so I can only assume you are after the area of a polygon on the surface of a geoid. Normally, this is done using a projected coordinate system rather than a geographic coordinate system (i.e. lon/lat). If you were to do it in lon/lat, then I would assume the unit-of-measure returned would be percent of sphere surface.
If you want to do this with a more "GIS" flavor, then you need to select a unit of measure for your area and find an appropriate projection that preserves area (not all do). Since you are talking about calculating an arbitrary polygon, I would use something like a Lambert Azimuthal Equal Area projection. Set the origin/center of the projection to be the center of your polygon, project the polygon to the new coordinate system, then calculate the area using standard planar techniques.
If you needed to do many polygons in a geographic area, there are likely other projections that will work (or will be close enough). UTM, for example, is an excellent approximation if all of your polygons are clustered around a single meridian.
I am not sure if any of this has anything to do with how Matlab's areaint function works.
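A minimal sketch of that projection recipe (my own illustration, assuming the pyproj package is available; vertices must be in order around the polygon):
import numpy as np
from pyproj import Proj

def laea_area(lats, lons):
    """Project to a Lambert Azimuthal Equal Area frame centered on the
    polygon, then apply the planar shoelace formula. Returns square meters."""
    proj = Proj(proj='laea', lat_0=np.mean(lats), lon_0=np.mean(lons))
    x, y = proj(lons, lats)  # pyproj takes (longitude, latitude)
    x, y = np.asarray(x), np.asarray(y)
    # Shoelace formula on the projected planar coordinates
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))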
I don't know anything about Matlab's function, but here we go. Consider splitting your spherical polygon into spherical triangles, say by drawing diagonals from a vertex. The surface area of a spherical triangle is given by
R^2 * ( A + B + C - \pi)
where R is the radius of the sphere, and A, B, and C are the interior angles of the triangle (in radians). The quantity in the parentheses is known as the "spherical excess".
Your n-sided polygon will be split into n-2 triangles. Summing over all the triangles, extracting the common factor of R^2, and bringing all of the \pi together, the area of your polygon is
R^2 * ( S - (n-2)\pi )
where S is the angle sum of your polygon. The quantity in parentheses is again the spherical excess of the polygon.
[edit] This is true whether or not the polygon is convex. All that matters is that it can be dissected into triangles.
You can determine the angles from a bit of vector math. Suppose you have three vertices A,B,C and are interested in the angle at B. We must therefore find two tangent vectors (their magnitudes are irrelevant) to the sphere from point B along the great circle segments (the polygon edges). Let's work it out for BA. The great circle lies in the plane defined by OA and OB, where O is the center of the sphere, so it should be perpendicular to the normal vector OA x OB. It should also be perpendicular to OB since it's tangent there. Such a vector is therefore given by OB x (OA x OB). You can use the right-hand rule to verify that this is in the appropriate direction. Note also that this simplifies to OA * (OB.OB) - OB * (OB.OA) = OA * |OB|^2 - OB * (OB.OA).
You can then use the good ol' dot product to find the angle between sides: BA'.BC' = |BA'|*|BC'|*cos(B), where BA' and BC' are the tangent vectors from B along sides to A and C.
[edited to be clear that these are tangent vectors, not literal vectors between the points]
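As a sketch of the above (my own illustration), assuming the polygon's vertices are given in order as unit 3-vectors from the sphere's center:
import numpy as np

def spherical_angle(A, B, C):
    """Interior angle at vertex B between the great-circle arcs BA and BC.
    A, B, C are unit 3-vectors from the sphere's center O."""
    tangent_a = np.cross(B, np.cross(A, B))  # OB x (OA x OB): tangent at B toward A
    tangent_c = np.cross(B, np.cross(C, B))  # OB x (OC x OB): tangent at B toward C
    cos_angle = np.dot(tangent_a, tangent_c) / (
        np.linalg.norm(tangent_a) * np.linalg.norm(tangent_c))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def girard_polygon_area(vertices, R=1.0):
    """Area via the spherical excess: R^2 * (angle sum - (n-2)*pi)."""
    n = len(vertices)
    S = sum(spherical_angle(vertices[i - 1], vertices[i], vertices[(i + 1) % n])
            for i in range(n))
    return R**2 * (S - (n - 2) * np.pi)

# Sanity check: the octant triangle has three right angles, so area pi/2
octant = [np.array(v, float) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]]
print(girard_polygon_area(octant))  # ~1.5708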
Here is a Python 3 implementation, loosely inspired by the above answers:
def polygon_area(lats, lons, algorithm=0, radius=6378137):
    """
    Computes the area of a spherical polygon, assuming a spherical Earth.
    Returns the result in the units of the provided radius (squared) if the
    radius is specified. Otherwise, as a ratio of the sphere's total area.
    lats and lons are in degrees.
    """
    from numpy import arctan2, cos, sin, sqrt, pi, append, diff, deg2rad
    lats = deg2rad(lats)
    lons = deg2rad(lons)
    # Line integral based on Green's Theorem, assumes spherical Earth
    # Close the polygon
    if lats[0] != lats[-1]:
        lats = append(lats, lats[0])
        lons = append(lons, lons[0])
    # Colatitudes relative to (0, 0)
    a = sin(lats / 2)**2 + cos(lats) * sin(lons / 2)**2
    colat = 2 * arctan2(sqrt(a), sqrt(1 - a))
    # Azimuths relative to (0, 0)
    az = arctan2(cos(lats) * sin(lons), sin(lats)) % (2 * pi)
    # Calculate diffs, wrapping the azimuth differences into (-pi, pi]
    daz = diff(az)
    daz = (daz + pi) % (2 * pi) - pi
    # Midpoint colatitudes for the trapezoidal rule
    deltas = diff(colat) / 2
    colat = colat[0:-1] + deltas
    # Perform the integral
    integrands = (1 - cos(colat)) * daz
    # Integrate and take the fraction of the sphere's area
    area = abs(sum(integrands)) / (4 * pi)
    area = min(area, 1 - area)
    if radius is not None:
        # Return in units of the radius (squared)
        return area * 4 * pi * radius**2
    else:
        # Return as a ratio of the sphere's total area
        return area
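For example (hypothetical coordinates for a small quadrilateral near the equator; the function closes the polygon itself):
lats = [0.0, 0.0, 1.0, 1.0]
lons = [0.0, 1.0, 1.0, 0.0]
print(polygon_area(lats, lons))               # area in square meters
print(polygon_area(lats, lons, radius=None))  # fraction of the sphere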
Please find a somewhat more explicit version (and with many more references and TODOs...) here.
You could also have a look at this code of the spherical_geometry package: Here and here. It does provide two different methods for calculating the area of a spherical polygon.

LDA: ellipses for confidence intervals: error in the doc?

TL;DR
To plot confidence intervals after a LDA analysis:
Should I use the covariance matrix shared by all classes (lda.covariance_), or should I calculate and use the covariance matrix of each class ?
Long question
Some time ago, I asked a question about how to draw ellipses around points: Draw ellipses around points
These ellipses will represent confidence intervals for Linear Discriminant Analysis (LDA) data points.
I will reuse my old picture, which I got from a scientific publication:
The red points (for example) could be defined as follows, after the LDA calculations:
[[-23.88315146 -3.26328266] # first point
[-25.94906669 -1.47440904] # second point
[-26.52423229 -4.84947907]] # third point
You can see on the picture that the red points are surrounded by an ellipse, which represents the confidence interval (at a certain level) for the mean of the red points.
This is what I would like to obtain. Now scikit-learn's doc has an example about that (here):
def plot_ellipse(splot, mean, cov, color):
    v, w = linalg.eigh(cov)
    u = w[0] / linalg.norm(w[0])
    angle = np.arctan(u[1] / u[0])
    angle = 180 * angle / np.pi  # convert to degrees
    # filled Gaussian at 2 standard deviation
    ell = mpl.patches.Ellipse(mean, 2 * v[0] ** 0.5, 2 * v[1] ** 0.5,
                              180 + angle, color=color)
And this function is called like this:
plot_ellipse(splot, lda.means_[0], lda.covariance_, 'red')
In the doc's example, plot_ellipse is called to draw the confidence interval of all the classes, always with the same covariance: lda.covariance_.
lda.covariance_ is then used to determine the angle of the ellipses. As lda.covariance_ never changes, all the ellipses will have the same angle.
Is it mathematically correct to do that? I am tempted to say no.
On another post (multidimensional confidence intervals), which is not related to LDA, @Joe Kington simply uses a "2-sigma ellipse of the scatter of points". He calculates the covariance for each class:
cov = np.cov(points, rowvar=False)
where points would be the 3 points described above, for example. He then uses a similar way to calculate the angle of the ellipses. But as he calculates the covariance matrix for each class, the angles of the ellipses are not the same across the classes.
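A minimal sketch of that per-class alternative (my own illustration, not scikit-learn's code; class_ellipse and n_std are hypothetical names). It mirrors the doc's eigen-decomposition but feeds it each class's own covariance:
import numpy as np
from numpy import linalg
from matplotlib.patches import Ellipse

def class_ellipse(points, color, n_std=2.0):
    """Covariance ellipse built from one class's own points."""
    mean = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)          # per-class covariance
    v, w = linalg.eigh(cov)                     # eigenvalues and eigenvectors
    u = w[0] / linalg.norm(w[0])
    angle = np.degrees(np.arctan2(u[1], u[0]))  # arctan2 avoids dividing by u[0]
    return Ellipse(mean, n_std * 2 * v[0] ** 0.5, n_std * 2 * v[1] ** 0.5,
                   angle=angle, color=color, alpha=0.3)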

Computing diameter-lines of a 3D spherical mask

Background
For an algorithm I'm working on, I currently use a 3D sphere as a binary mask: an NxNxN array with the voxels inside a sphere of radius N//2 set to True. Further processing does a computation for each voxel set to True.
This proved computationally intensive for my specific task as N grew large (the work scales as O(N^3)), so I now want to reduce my binary mask to a subsample of lines radiating from the array center within the radius.
Objective
I want a 3D binary mask of the lines in gray in the image.
To have a bit of control over the number of voxels, I would have a parameter (say l) regulating the number of lines sampled in each 2D circle, and maybe a second one (k?) for the number of z-rotations.
What I tried
I am using numpy and scipy, and I thought that I could use the scipy.ndimage.interpolation.rotate method to rotate a single line around on a plane, then use that complete 2D mask to rotate around the z-axis.
This proved difficult, as interpolate uses some deep magic regarding splines that discard my True values on rotation.
I am thinking that I could compute mathematically which voxel should be set to True by following some line-equations, but I'm at a loss to find them.
Any idea how to get there ?
Update : Solution !
Thanks to jkalden, who helped me think this through and gave code samples, I have this:
rmax is the radius of the sphere, n_theta and n_phi the number of polar and azimuthal lines to use.
out_mask = np.zeros((rmax*2,) * 3, dtype=bool)
# for each phi = one circle among the azimuthal circles
for phi in np.linspace(0, np.deg2rad(360), n_phi, endpoint=False):
    # for all lines in the polar circle of this azimuthal circle
    for theta in np.linspace(0, np.deg2rad(360), n_theta, endpoint=False):
        # for all distances (0..rmax) along these lines
        for r in range(rmax):
            # round to integer voxel indices before writing into the mask
            # (assumes spherical_to_cartesian returns a numpy array)
            coords = np.round(spherical_to_cartesian([r, theta, phi])).astype(int) + rmax
            out_mask[tuple(coords)] = True
With the spherical_to_cartesian from this code sample.
Which gives me this (with rmax = 50 and n_theta = n_phi = 8):
(Center area tuned out of my function by choice)
I propose to change the coordinate system to spherical coordinates. Thus, you will choose your 2D circle by an azimuthal angle, and a line then is defined by additionally choosing a polar angle. The variable along the line is then just the radius, and you can use numpy.linspace to discretize it. Doing so might also save time during calculation.
You can switch your coordinate system any time by using the bijective relation which is implemented e.g. here or here.
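For reference, one common convention for that conversion (a sketch only; the code sample linked above may use a different angle convention):
import numpy as np

def spherical_to_cartesian(rtp):
    """Map (radius, polar angle theta, azimuthal angle phi) to (x, y, z),
    using the physics convention. Rounding to voxel indices is left to the caller."""
    r, theta, phi = rtp
    return np.array([r * np.sin(theta) * np.cos(phi),
                     r * np.sin(theta) * np.sin(phi),
                     r * np.cos(theta)])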

Generate random points on a surface of the cylinder

I want to generate random points on the surface of a cylinder such that the distance between the points falls in the range of 230 to 250. I used the following code to generate random points on the surface of a cylinder:
import random, math

H = 300
R = 20
s = random.random()
#theta = random.random()*2*math.pi
for i in range(0, 300):
    theta = random.random() * 2 * math.pi
    z = random.random() * H
    r = math.sqrt(s) * R
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    print('C', x, y, z)
How can I generate random points such that they fall within the range (on the surface of the cylinder)?
This is not a complete solution, but an insight that should help. If you "unroll" the surface of the cylinder into a rectangle of width w=2*pi*r and height h, the task of finding the distance between points is simplified. You have not explained how to measure "distance along the surface" between points on the top of the cylinder and on the side; this is a slightly tricky bit of geometry.
As for computing the distance along the surface when we created an artificial "seam", just use both (x1-x2) and (w -x1+x2) - whichever gives the shorter distance is the one you want.
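For example, a minimal sketch of that seam-aware distance on the unrolled rectangle (my own illustration; surface_distance is a hypothetical name):
import math

def surface_distance(p1, p2, w):
    """Distance between two points on the unrolled side of the cylinder,
    where p = (x, y): x is the position along the circumference (0..w)
    and y the height. The circumferential difference wraps at the seam."""
    dx = abs(p1[0] - p2[0])
    dx = min(dx, w - dx)  # take the shorter way around the seam
    return math.hypot(dx, p1[1] - p2[1])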
I do think that @VincentNivoliers' suggestion to use Poisson disk sampling is very good, but with the constraints of h=300 and r=20 you will get terrible results no matter what.
The basic way of creating a set of random points with constraints on the distances between them is to have a function that modulates the probability of points being placed at a certain location. This function starts out as a constant, and whenever a point is placed, forbidden areas surrounding the point are set to zero. That is difficult to do with continuous variables, but reasonably easy if you discretize your problem.
The other thing to be careful about is the being on a cylinder part. It may be easier to think of it as random points on a rectangular area that repeats periodically. This can be handled in two different ways:
the simplest is to take into consideration not only the rectangular tile where you are placing the points, but also its neighbouring ones. Whenever you place a point in your main tile, you also place one in the neighboring ones and compute their effect on the probability function inside your tile.
A more sophisticated approach considers the probability function as the convolution of a kernel that encodes the forbidden areas with a sum of delta functions corresponding to the points already placed. If this is computed using FFTs, the periodicity is a natural byproduct.
The first approach can be coded as follows:
from __future__ import division
import numpy as np

r, h = 20, 300
w = 2*np.pi*r
int_w = int(np.rint(w))
mult = 10  # discretization: grid cells per unit of length
pdf = np.ones((h*mult, int_w*mult), bool)  # True where a point may still go
points = []
min_d, max_d = 230, 250
available_locs = pdf.sum()
while available_locs:
    # Draw a uniformly random location among the still-allowed cells
    new_idx = np.random.randint(available_locs)
    new_idx = np.nonzero(pdf.ravel())[0][new_idx]
    new_point = np.array(np.unravel_index(new_idx, pdf.shape))
    points += [new_point]
    min_mask = np.ones_like(pdf)
    if max_d is not None:
        max_mask = np.zeros_like(pdf)
    else:
        max_mask = True
    # Apply the new point and its two periodic copies to the probability map
    for p in [new_point - [0, int_w*mult], new_point + [0, int_w*mult],
              new_point]:
        rows = ((np.arange(pdf.shape[0]) - p[0]) / mult)**2
        cols = ((np.arange(pdf.shape[1]) - p[1]) * 2*np.pi*r/int_w/mult)**2
        dist2 = rows[:, None] + cols[None, :]
        min_mask &= dist2 > min_d*min_d
        if max_d is not None:
            max_mask |= dist2 < max_d*max_d
    pdf &= min_mask & max_mask
    available_locs = pdf.sum()
points = np.array(points) / [mult, mult*int_w/(2*np.pi*r)]
If you run it with your values, the output is usually just one or two points, as the large minimum distance forbids all others. But if you run it with more reasonable values, e.g.
min_d, max_d = 50, 200
Here's how the probability function looks after placing each of the first 5 points:
Note that the points are returned as pairs of coordinates, the first being the height, the second the distance along the cylinder's circumference.

calculate turning points / pivot points in trajectory (path)

I'm trying to come up with an algorithm that will determine turning points in a trajectory of x/y coordinates. The following figure illustrates what I mean: green indicates the starting point and red the final point of the trajectory (the entire trajectory consists of ~1500 points):
In the following figure, I added by hand the possible (global) turning points that an algorithm could return:
Obviously, the true turning point is always debatable and will depend on the angle one requires between points. Furthermore, a turning point can be defined on a global scale (what I tried to do with the black circles), but could also be defined on a high-resolution local scale. I'm interested in the global (overall) direction changes, but I'd love to see a discussion on the different approaches that one would use to tease apart global vs local solutions.
What I've tried so far:
calculate distance between subsequent points
calculate angle between subsequent points
look at how distance / angle changes between subsequent points
Unfortunately this doesn't give me any robust results. I probably have to calculate the curvature along multiple points, but that's just an idea.
I'd really appreciate any algorithms / ideas that might help me here. The code can be in any programming language, matlab or python are preferred.
EDIT here's the raw data (in case somebody wants to play with it):
mat file
text file (x coordinate first, y coordinate in second line)
You could use the Ramer-Douglas-Peucker (RDP) algorithm to simplify the path. Then you could compute the change in directions along each segment of the simplified path. The points corresponding to the greatest change in direction could be called the turning points:
A Python implementation of the RDP algorithm can be found on github.
import matplotlib.pyplot as plt
import numpy as np
import os
import rdp

def angle(dir):
    """
    Returns the angles between vectors.

    Parameters:
    dir is a 2D-array of shape (N,M) representing N vectors in M-dimensional space.

    The return value is a 1D-array of values of shape (N-1,), with each value
    between 0 and pi.
    0 implies the vectors point in the same direction
    pi/2 implies the vectors are orthogonal
    pi implies the vectors point in opposite directions
    """
    dir2 = dir[1:]
    dir1 = dir[:-1]
    return np.arccos((dir1*dir2).sum(axis=1)/(
        np.sqrt((dir1**2).sum(axis=1)*(dir2**2).sum(axis=1))))

tolerance = 70
min_angle = np.pi*0.22
filename = os.path.expanduser('~/tmp/bla.data')
points = np.genfromtxt(filename).T
print(len(points))
x, y = points.T

# Use the Ramer-Douglas-Peucker algorithm to simplify the path
# http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
# Python implementation: https://github.com/sebleier/RDP/
simplified = np.array(rdp.rdp(points.tolist(), tolerance))
print(len(simplified))
sx, sy = simplified.T

# compute the direction vectors on the simplified curve
directions = np.diff(simplified, axis=0)
theta = angle(directions)

# Select the index of the points with the greatest theta
# Large theta is associated with greatest change in direction.
idx = np.where(theta > min_angle)[0] + 1

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'b-', label='original path')
ax.plot(sx, sy, 'g--', label='simplified path')
ax.plot(sx[idx], sy[idx], 'ro', markersize=10, label='turning points')
ax.invert_yaxis()
plt.legend(loc='best')
plt.show()
Two parameters were used above:
The RDP algorithm takes one parameter, the tolerance, which
represents the maximum distance the simplified path
can stray from the original path. The larger the tolerance, the cruder the simplified path.
The other parameter is the min_angle which defines what is considered a turning point. (I'm taking a turning point to be any point on the original path, whose angle between the entering and exiting vectors on the simplified path is greater than min_angle).
I will be giving numpy/scipy code below, as I have almost no Matlab experience.
If your curve is smooth enough, you could identify your turning points as those of highest curvature. Taking the point index number as the curve parameter, and a central differences scheme, you can compute the curvature with the following code
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage

def first_derivative(x):
    return x[2:] - x[0:-2]

def second_derivative(x):
    return x[2:] - 2 * x[1:-1] + x[:-2]

def curvature(x, y):
    x_1 = first_derivative(x)
    x_2 = second_derivative(x)
    y_1 = first_derivative(y)
    y_2 = second_derivative(y)
    return np.abs(x_1 * y_2 - y_1 * x_2) / np.sqrt((x_1**2 + y_1**2)**3)
You will probably want to smooth your curve out first, then calculate the curvature, then identify the highest curvature points. The following function does just that:
def plot_turning_points(x, y, turning_points=10, smoothing_radius=3,
                        cluster_radius=10):
    if smoothing_radius:
        weights = np.ones(2 * smoothing_radius + 1)
        new_x = scipy.ndimage.convolve1d(x, weights, mode='constant', cval=0.0)
        new_x = new_x[smoothing_radius:-smoothing_radius] / np.sum(weights)
        new_y = scipy.ndimage.convolve1d(y, weights, mode='constant', cval=0.0)
        new_y = new_y[smoothing_radius:-smoothing_radius] / np.sum(weights)
    else:
        new_x, new_y = x, y
    k = curvature(new_x, new_y)
    turn_point_idx = np.argsort(k)[::-1]
    t_points = []
    while len(t_points) < turning_points and len(turn_point_idx) > 0:
        t_points += [turn_point_idx[0]]
        idx = np.abs(turn_point_idx - turn_point_idx[0]) > cluster_radius
        turn_point_idx = turn_point_idx[idx]
    t_points = np.array(t_points)
    t_points += smoothing_radius + 1
    plt.plot(x, y, 'k-')
    plt.plot(new_x, new_y, 'r-')
    plt.plot(x[t_points], y[t_points], 'o')
    plt.show()
Some explaining is in order:
turning_points is the number of points you want to identify
smoothing_radius is the radius of a smoothing convolution to be applied to your data before computing the curvature
cluster_radius is the distance from a point of high curvature selected as a turning point within which no other point should be considered as a candidate.
You may have to play around with the parameters a little, but I got something like this:
>>> x, y = np.genfromtxt('bla.data')
>>> plot_turning_points(x, y, turning_points=20, smoothing_radius=15,
... cluster_radius=75)
Probably not good enough for a fully automated detection, but it's pretty close to what you wanted.
A very interesting question. Here is my solution, which allows for variable resolution. Fine-tuning it may not be simple, though, as it's mostly intended to narrow down the candidate points.
Every k points, calculate the convex hull and store it as a set. Go through the at most k points and remove any points that are not in the convex hull, in such a way that the points don't lose their original order.
The purpose here is that the convex hull will act as a filter, removing all of "unimportant points" leaving only the extreme points. Of course, if the k-value is too high, you'll end up with something too close to the actual convex hull, instead of what you actually want.
This should start with a small k, at least 4, then increase it until you get what you seek. You should also probably only include the middle point for every 3 points where the angle is below a certain amount, d. This would ensure that all of the turns are at least d degrees (not implemented in the code below). However, this should probably be done incrementally to avoid loss of information, same as increasing the k-value. Another possible improvement would be to actually re-run with the points that were removed, and only remove points that were not in both convex hulls, though this requires a higher minimum k-value of at least 8.
The following code seems to work fairly well, but could still use improvements for efficiency and noise removal. It's also rather inelegant in determining when it should stop, thus the code really only works (as it stands) from around k=4 to k=14.
def convex_filter(points, k):
    new_points = []
    # Process the points in consecutive chunks of k, keeping only hull points
    for pts in (points[i:i + k] for i in range(0, len(points), k)):
        hull = set(convex_hull(pts))  # convex_hull() is assumed to be available
        for point in pts:
            if point in hull:
                new_points.append(point)
    return new_points

# How the points are obtained is a minor point, but they need to be in the right order.
x_coords = [float(x) for x in x.split()]
y_coords = [float(y) for y in y.split()]
points = list(zip(x_coords, y_coords))

k = 10
prev_length = 0
new_points = points
# Filter using the convex hull until no more points are removed
while len(new_points) != prev_length:
    prev_length = len(new_points)
    new_points = convex_filter(new_points, k)
Here is a screen shot of the above code with k=14. The 61 red dots are the ones that remain after the filter.
The approach you took sounds promising but your data is heavily oversampled. You could filter the x and y coordinates first, for example with a wide Gaussian and then downsample.
In MATLAB, you could use x = conv(x, normpdf(-10 : 10, 0, 5)) and then x = x(1 : 5 : end). You will have to tweak those numbers depending on the intrinsic persistence of the objects you are tracking and the average distance between points.
Then, you will be able to detect changes in direction very reliably, using the same approach you tried before, based on the scalar product, I imagine.
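A rough Python equivalent of that MATLAB filtering and downsampling (a sketch, assuming scipy; x and y are the trajectory coordinates, and the kernel width and stride would need the same tweaking):
import numpy as np
from scipy.stats import norm
from scipy.ndimage import convolve1d

kernel = norm.pdf(np.arange(-10, 11), loc=0, scale=5)  # like normpdf(-10:10, 0, 5)
kernel /= kernel.sum()                                 # normalize the filter

x_filtered = convolve1d(x, kernel)[::5]  # smooth, then keep every 5th sample
y_filtered = convolve1d(y, kernel)[::5]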
Another idea is to examine the left and the right surroundings of every point. This may be done by fitting a linear regression to the N points before and after each point. If the intersecting angle between the two regression lines exceeds some threshold, then you have a corner.
This may be done efficiently by keeping a queue of the points currently in the linear regression and replacing old points with new points, similar to a running average.
You finally have to merge adjacent corners into a single corner, e.g. by choosing the point with the strongest corner property. A minimal sketch of the regression idea follows below.
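This sketch is my own illustration of the idea above (corner_angles is a hypothetical name; simple slope fits like this will misbehave on near-vertical segments, and the queue-based incremental update is omitted for brevity):
import numpy as np

def corner_angles(x, y, n=10):
    """For each interior point, fit a line to the n points before and the
    n points after, and return the angle between the two fitted directions."""
    angles = np.full(len(x), np.nan)
    for i in range(n, len(x) - n):
        k_left = np.polyfit(x[i - n:i + 1], y[i - n:i + 1], 1)[0]
        k_right = np.polyfit(x[i:i + n + 1], y[i:i + n + 1], 1)[0]
        angles[i] = abs(np.arctan(k_right) - np.arctan(k_left))
    return angles

# Points whose left/right directions differ by more than a threshold
# (say 30 degrees, i.e. np.deg2rad(30)) are corner candidates; adjacent
# candidates would still need to be merged as described above.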
