I have the 3D coordinates of 500,000 spheres, along with their radii (non-unique), stored in a pandas DataFrame.
I would like to build an efficient search routine: given the coordinates of many sampled points, it should return, for each point, whether it lies inside a sphere and, if so, which sphere.
To do that, I read that a Cartesian search mesh can be a solution. I therefore created a mesh with a given cell size and assigned cell indices to each sphere. In case it is useful, I also stored the offsets between the center of each cell and the center of the sphere it contains.
import pandas as pd
import numpy as np
size_cell = 20
fake_coordinates = np.random.uniform(-200, 200, (500000, 3)) # Here spheres will overlap, which should not be the case in the real input. If a point is in two spheres, pick the first one that comes.
data = pd.DataFrame(fake_coordinates, columns=['x','y','z'])
data['r'] = np.random.uniform(1, 3, 500000)
x_vect = np.arange(data.x.min()-np.max(data.r), data.x.max()+np.max(data.r), size_cell)
y_vect = np.arange(data.y.min()-np.max(data.r), data.y.max()+np.max(data.r), size_cell)
z_vect = np.arange(data.z.min()-np.max(data.r), data.z.max()+np.max(data.r), size_cell)
data['i_x'] = ((data.x-x_vect[0])//size_cell).astype(int)
data['i_y'] = ((data.y-y_vect[0])//size_cell).astype(int)
data['i_z'] = ((data.z-z_vect[0])//size_cell).astype(int)
data['dx'] = data.x-(x_vect[data.i_x]+x_vect[data.i_x+1])/2
data['dy'] = data.y-(y_vect[data.i_y]+y_vect[data.i_y+1])/2
data['dz'] = data.z-(z_vect[data.i_z]+z_vect[data.i_z+1])/2
Then, I noticed that a sphere which has its center that belongs to a neighboring cell can cross its cell and therefore, the sampled point can be in this neighbor sphere. So I computed the crossing nature of each sphere.
# A sphere crosses a cell face when its center lies within one radius of that face;
# since dx/dy/dz are measured from the cell center, the half cell size applies here.
data['crossing_x_left'] = data.dx < - (size_cell/2 - data.r)
data['crossing_x_right'] = data.dx > (size_cell/2 - data.r)
data['crossing_y_left'] = data.dy < - (size_cell/2 - data.r)
data['crossing_y_right'] = data.dy > (size_cell/2 - data.r)
data['crossing_z_down'] = data.dz < - (size_cell/2 - data.r)
data['crossing_z_top'] = data.dz > (size_cell/2 - data.r)
Now I should sample N points:
sample = np.random.uniform(-200, 200, (1000000, 3))
If I take into account only one point and only candidate spheres with the center within the cell:
i_x = int((sample[0,0]-x_vect[0])//size_cell)
i_y = int((sample[0,1]-y_vect[0])//size_cell)
i_z = int((sample[0,2]-z_vect[0])//size_cell)
ok_x = data['i_x']==i_x
ok_y = data['i_y']==i_y
ok_z = data['i_z']==i_z
candidates = data.loc[ok_x & ok_y & ok_z]
Now I want to test many points at the same time, but also take into account the crossing spheres.
But here I run into a methodological issue. How can I efficiently compute (probably by processing all points and all spheres at the same time with array operations) whether each point belongs to a sphere and, if so, which sphere it is? The fact that spheres can cross cell boundaries is the part I have trouble handling.
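For reference, one possible vectorized lookup that sidesteps the crossing bookkeeping entirely: since every radius is bounded by data.r.max(), a k-d tree over the sphere centers can return the candidate spheres for each sampled point, and the exact containment test is then applied per candidate list. This is only a sketch under that assumption (the names centers, radii and result are illustrative), not necessarily the mesh-based formulation asked about:
import numpy as np
from scipy.spatial import cKDTree

centers = data[['x', 'y', 'z']].to_numpy()
radii = data['r'].to_numpy()
tree = cKDTree(centers)
r_max = radii.max()

# Candidate spheres per sampled point: every sphere whose center lies within r_max.
candidate_lists = tree.query_ball_point(sample, r=r_max)

result = np.full(len(sample), -1, dtype=np.int64)  # -1 means "in no sphere"
for i, cand in enumerate(candidate_lists):
    if not cand:
        continue
    cand = np.asarray(cand)
    d2 = ((centers[cand] - sample[i])**2).sum(axis=1)
    inside = cand[d2 <= radii[cand]**2]
    if inside.size:
        result[i] = inside[0]  # first matching sphere, as stated above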
I am trying to calculate the area of a shape enclosed by a large set of unordered points in python. I have a 2D array of points which I can plot as a scatterplot like this.
There are several ways to calculate the area enclosed by points, but these all assume ordered points, such as here and here. This method calculates the area from unordered points, but it doesn't appear to work for complex shapes, as seen here. How would I calculate this area from unordered points in python?
Sample data looks like this:
[[225.93459 -27.25677 ]
[226.98128 -32.001945]
[223.3623 -34.119724]
[225.84741 -34.416553]]
From pen and paper one can see that this shape contains an area of ~12 (unitless) but putting these coordinates into one of the algorithms linked to previously returns an area of ~0.78.
Let's first mention that in the question "How would I calculate this area from unordered points in python?" the phrase 'unordered points', in the context of area calculation, usually means that the given points lie on a contour enclosing the area to be calculated.
But the data sample provided in the question does not consist of contour points; it is just a cloud of points which, when visualized as a scatterplot, produces a visually perceivable area.
This is why the links provided in the question to algorithms calculating areas from 'unordered points' do not apply at all to what the question is about.
In other words, the actual title of the question I will answer below will be:
Calculate the visually perceivable area a cloud of (x,y) points is forming when visualized as a scatterplot
One of the possible options is mentioned in a comment to the question:
Honestly, you might consider taking THAT graph as a bitmap, and counting the number of non-white pixels in it. That is probably as close as you can get. – Tim Roberts
Given an image that covers all the non-white pixels exactly (without any margin), you can express the area of the image rectangle in the units of the underlying (x,y) data. With the underlying list of points P with (x,y) coordinates ( P = [(x1,y1), (x2,y2), ...] ), the area TA of the rectangle visible in the image is calculated as follows:
X = [x for x,y in P]
Y = [y for x,y in P]
TA = (max(X)-min(X))*(max(Y)-min(Y))
With N_white the number of white pixels in an image of N pixels in total, the actual area A covered by non-white pixels, expressed in the units used in the list of points P, will be:
A = TA*(N-N_white)/N
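A minimal sketch of this pixel-counting idea, assuming matplotlib is used to render the scatterplot into an in-memory image with no margins (the figure setup, dpi and the white-pixel test are illustrative choices, not part of the quoted comment):
import io
import numpy as np
import matplotlib.pyplot as plt

def area_by_pixel_count(P, dpi=200):
    X = [x for x, y in P]
    Y = [y for x, y in P]
    TA = (max(X) - min(X)) * (max(Y) - min(Y))  # bounding-box area

    # Render the scatterplot with the axes filling the whole figure,
    # so the image rectangle matches the bounding box of the points.
    fig = plt.figure(figsize=(4, 4), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])
    ax.scatter(X, Y, s=1, color='black')
    ax.set_xlim(min(X), max(X))
    ax.set_ylim(min(Y), max(Y))
    ax.axis('off')

    buf = io.BytesIO()
    fig.savefig(buf, format='png', dpi=dpi)
    plt.close(fig)
    buf.seek(0)
    img = plt.imread(buf)  # RGBA array with values in [0, 1]

    white = np.all(img[:, :, :3] > 0.99, axis=2)
    N = white.size
    N_white = white.sum()
    return TA * (N - N_white) / N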
Another approach, which uses only the list of points P with (x,y) point coordinates ( without creating an image ), consists of the following steps:
decide which area Ap a point is covering and calculate half of the size h2 of a rectangle with this area around that point ( h2 = 0.5*sqrt(Ap) )
create a list R with rectangles around all points in the list P: R = [(x-h2, y+h2, x+h2, y-h2) for x,y in P]
use the code provided through a link listed in the stackoverflow question
Area of Union Of Rectangles using Segment Trees to calculate the total area covered by the rectangles in the list R.
The advantage of this approach over the graphical one based on the scatterplot is that, by choosing the area covered by a single point, you directly control the precision/resolution/granularity used for the area calculation.
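For illustration only, the rectangle-union step could also be done with shapely instead of the linked segment-tree code (my substitution, not part of the original suggestion; Ap is the chosen per-point area from the list above):
import numpy as np
from shapely.geometry import box
from shapely.ops import unary_union

def area_by_rect_union(P, Ap=0.01):
    """Union area of axis-aligned squares of area Ap centered on each point."""
    h2 = 0.5 * np.sqrt(Ap)
    rects = [box(x - h2, y - h2, x + h2, y + h2) for x, y in P]
    return unary_union(rects).area
This is slower than a segment tree for very large point sets, but much shorter to write.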
Given a 2D array of points, the area covered by the points can be calculated from the return value of the same hist2d() function in the matplotlib module (matplotlib.pyplot.hist2d()) that is used to show the scatterplot.
The 'trick' is to set the function's cmin parameter to 1 ( cmin=1 ) and then count the numpy.nan values in the array it returns, relating that count to the total number of array values.
In other words, everything necessary to calculate the area is already produced when the scatterplot is created; you only need to know that the histogram function returns it, and plug it into a simple area formula.
Below code of a ready to use function for the area calculation along with demonstration of function usage:
def area_of_points(points, grid_size = [1000, 1000]):
"""
Returns the area covered by N 2D-points provided in a 'points' array
points = [ (x1,y1), (x2,y2), ... , (xN, yN) ]
'grid_size' gives the number of grid cells in x and y direction the
'points' bounding box is divided into for calculation of the area.
Larger 'grid_size' values mean smaller grid cells, higher precision
of the area calculation and longer runtime.
area_of_points() requires installed matplotlib module. """
import matplotlib.pyplot as plt
import numpy as np
pts_x = [x for x,y in points]
pts_y = [y for x,y in points]
pts_bb_area = (max(pts_x)-min(pts_x))*(max(pts_y)-min(pts_y))
h2D,_,_,_ = plt.hist2d( pts_x, pts_y, bins = grid_size, cmin=1)
numberOfWhiteBins = np.count_nonzero(np.isnan(h2D))
numberOfAll2Dbins = h2D.shape[0]*h2D.shape[1]
areaFactor = 1.0 - numberOfWhiteBins/numberOfAll2Dbins
pts_pts_area = areaFactor * pts_bb_area
print(f'Areas: b-box = {pts_bb_area:8.4f}, points = {pts_pts_area:8.4f}')
plt.show()
return pts_pts_area
#:def area_of_points(points, grid_size = [1000, 1000])
import numpy as np
np.random.seed(12345)
x = np.random.normal(size=100000)
y = x + np.random.normal(size=100000)
pts = [[xi,yi] for xi,yi in zip(x,y)]
print(area_of_points(pts))
# ^-- prints: Areas: b-box = 114.5797, points = 7.8001
# ^-- prints: 7.800126875291629
The above code creates the following scatterplot:
Notice that the printed output Areas: b-box = 114.5797, points = 7.8001 and the area value 7.800126875291629 returned by the function give the area in the units in which the x,y coordinates in the array of points are specified.
Instead of using the function, you can apply the same know-how directly and play with the parameters of the scatterplot while calculating the area of what it shows.
Below is code that changes the displayed scatterplot while using the same underlying point data:
import numpy as np
np.random.seed(12345)
x = np.random.normal(size=100000)
y = x + np.random.normal(size=100000)
pts = [[xi,yi] for xi,yi in zip(x,y)]
pts_values_example = \
[[0.53005, 2.79209],
[0.73751, 0.18978],
... ,
[-0.6633, -2.0404],
[1.51470, 0.86644]]
# ---
pts_x = [x for x,y in pts]
pts_y = [y for x,y in pts]
pts_bb_area = (max(pts_x)-min(pts_x))*(max(pts_y)-min(pts_y))
# ---
import matplotlib.pyplot as plt
bins = [320, 300] # resolution of the grid (for the scatter plot)
# ^-- resolution of precision for the calculation of area
pltRetVal = plt.hist2d( pts_x, pts_y, bins = bins, cmin=1, cmax=15 )
plt.colorbar() # display the colorbar (for a 2d density histogram)
plt.show()
# ---
h2D, xedges1D, yedges1D, h2DhistogramObject = pltRetVal
numberOfWhiteBins = np.count_nonzero(np.isnan(h2D))
numberOfAll2Dbins = (len(xedges1D)-1)*(len(yedges1D)-1)
areaFactor = 1.0 - numberOfWhiteBins/numberOfAll2Dbins
area = areaFactor * pts_bb_area
print(f'Areas: b-box = {pts_bb_area:8.4f}, points = {area:8.4f}')
# prints "Areas: b-box = 114.5797, points = 20.7174"
creating the following scatterplot:
Notice that the calculated area is now larger: the coarser grid resolution results in more of the area being colored.
Problem
I need a way to find 2-mile clusters of points where each point has a value, i.e. to identify 2-mile areas whose sum(value) is > 50.
Data
I have data that looks like the following
ID COUNT LATITUDE LONGITUDE
187601546 20 025.56394 -080.03206
187601547 25 025.56394 -080.03206
187601548 4 025.56394 -080.03206
187601550 0 025.56298 -080.03285
Roughly 200K records. What I need to determine is whether there are any areas where the sum of COUNT exceeds 65 within a one-mile-radius (2-mile-diameter) area.
Using each point as a center for an area
Now, I have python code from another project that will draw a shapefile around a point of x diameter as follows:
from geopy import Point
from geopy.distance import vincenty  # geopy < 2.0; newer versions use geodesic instead

def poly_based_on_distance(center_lat, center_long, distance, bearing):
    # bearing is in degrees
    # distance in miles
    # print('center', center_lat, center_long)
    destination = (vincenty(miles=distance).destination(Point(center_lat,
                   center_long), bearing).format_decimal())
    return destination
And a routine to return destination and then see which points are inside the radius.
## This is the evaluation for overlap between points and
## area polyshapes
area_list = []
store_geo_dict = {}
for stores in locationdict:
    # Polygon / intersects are assumed to come from shapely.geometry
    location = Polygon(locationdict[stores])
    for areas in AREAdictionary:
        area = Polygon(AREAdictionary[areas])
        if location.intersects(area):
            area_list.append(areas)
    store_geo_dict[stores] = area_list
    area_list = []
At this point, I am simply drawing a circular shapefile around each of the 200K points, see which others were inside and doing the count.
Need Clustering Algorithm?
However, there might be an area with the required count density where one of the points is not in the center.
I'm familiar with clustering algorithms such as DBSCAN that use attributes for classification, but this is a matter of finding density clusters using a value attached to each point. Is there any clustering algorithm that finds clusters of 2-mile diameter where the total count inside is >= 50?
Any suggestions? Python or R are the preferred tools, but this is wide open and probably a one-off, so computational efficiency is not a priority.
Not a complete solution, but maybe it will help simplify the problem depending on the distribution of your data. I will use planar coordinates and cKDTree in my example, this might work with geographic data if you can ignore curvature in a projection.
The main observation is the following: a point (x,y) does not contribute to a dense cluster if the ball of radius 2*r (e.g. 2 miles) around (x,y) contributes less than the cutoff value (e.g. 50 in your title). In fact, any point within r of (x,y) does not contribute to any dense cluster.
This allows you to repeatedly discard points from consideration. If you are left with no points, there are no dense clusters; if you are left with some points, clusters may exist.
import numpy as np
from scipy.spatial import cKDTree
# test data
N = 1000
data = np.random.rand(N, 2)
x, y = data.T
# test weights of each point
weights = np.random.rand(N)
def filter_noncontrib(pts, weights, radius=0.1, cutoff=60):
tree = cKDTree(pts)
contribs = np.array(
[weights[tree.query_ball_point(pt, 2 * radius)].sum() for pt in pts]
)
return contribs >= cutoff
def possible_contributors(pts, weights, radius=0.1, cutoff=60):
n_pts = len(pts)
while len(pts):
mask = filter_noncontrib(pts, weights, radius, cutoff)
pts = pts[mask]
weights = weights[mask]
if len(pts) == n_pts:
break
n_pts = len(pts)
return pts
Example with dummy data:
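A brief usage sketch with the dummy data defined above (the call and printout are illustrative additions, not output from the original answer):
# keep only points that could still belong to a dense (>= cutoff) cluster
remaining = possible_contributors(data, weights, radius=0.1, cutoff=60)
print(f"{len(remaining)} of {len(data)} points could contribute to a dense cluster")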
DBSCAN can be adapted (see Generalized DBSCAN; define core points as weight sum >= 50), but it will not ensure the maximum cluster size (it computes transitive closures).
You could also try complete linkage. Use it to find clusters with the desired maximum diameter, then check if these satisfy the desired density. But that does not guarantee to find all.
It's probably faster to (a) build an index for fast radius search, and (b) for every point, find its neighbors within radius r and keep the point if they reach the desired minimum sum. But that does not guarantee finding everything, because the center of a qualifying circle is not necessarily a data point. Consider a maximum radius of 1 and a minimum weight of 100, with two points of weight 50 each at (0,0) and (1,1). Neither a query at (0,0) nor one at (1,1) will discover the solution, but a cluster centered at (.5,.5) satisfies the conditions.
Unfortunately, I believe your problem is at least NP-hard, so you won't be able to afford the ultimate solution.
I'm trying to find the degrees of visibility of natural environments from a number of viewpoints. I have a DEM, and a raster defining the natural environments. For each cell I have slope, height, aspect and distance from viewpoint. I want to calculate the angle between the viewpoint and each cell.
I originally did this by getting the x,y,z coordinates for the viewpoint and for both the bottom and the top of a sloped cell, then getting the 3D distance between each pair of coordinates and using these distances to get the angles. I would then sum over each cell to get the total degrees of visibility.
The problem is that this approach only works when the bearing between the viewpoint and the visible cell is in line with the x and y axes of the coordinate system. This is because, to calculate the coordinates of the cell corners, I'm subtracting and adding cell resolution/2 to the cell centroid.
Code below
import math
from math import sqrt

def Cell_Loc(VPX, VPY, VPZ, cell_x, cell_y, cell_z, aspect, cell_resolution, bearing):
    # Get the change in height across the cell from the angle and the cell
    # resolution (trig). Note: the parameter is named 'aspect' here, although
    # the height change across a cell is really governed by its slope.
    AspectTan = math.tan(math.radians(aspect))
    Opposite = AspectTan * cell_resolution
    # Get coordinates for the cell corners (cell_resolution/2 rather than the
    # hard-coded 2.5, which only held for a 5 m cell)
    CloseCornerX = cell_x - cell_resolution / 2
    CloseCornerY = cell_y - cell_resolution / 2
    FarCornerX = cell_x + cell_resolution / 2
    FarCornerY = cell_y + cell_resolution / 2
    CloseCornerZ = cell_z - Opposite / 2
    FarCornerZ = cell_z + Opposite / 2
    # Get distances between the coordinates
    VP = (VPX, VPY, VPZ)                                     # viewpoint
    cellcrner1 = (CloseCornerX, CloseCornerY, CloseCornerZ)  # close corner
    cellcrner2 = (FarCornerX, FarCornerY, FarCornerZ)        # far corner
    dist_to_far_corner = sqrt(sum((v - c)**2 for v, c in zip(VP, cellcrner2)))
    dist_to_close_corner = sqrt(sum((v - c)**2 for v, c in zip(VP, cellcrner1)))
    cell_dist = sqrt(sum((c1 - c2)**2 for c1, c2 in zip(cellcrner1, cellcrner2)))

    # Feed the three distances into ViewAngle to find the subtended angle
    def ViewAngle(a, b, c):
        """Calculates the viewing angle opposite side b (law of cosines)."""
        VA = (a*a - b*b + c*c) / (2.0 * a * c)
        rads = math.acos(VA)
        return math.degrees(rads)

    return ViewAngle(dist_to_close_corner, cell_dist, dist_to_far_corner)
How can I improve this code, or is another approach a better way of doing it? The output I need is of the form "for this viewpoint, there are X degrees of visible natural environments". Again, I have the DEM, visible areas from a viewshed analysis, slope, aspect and the distances between the viewpoint and each cell centroid. I use the Python programming language and have ArcGIS 10.2 with arcpy.
One improvement would be to use coordinates and inner products instead of distances to calculate the angles: this helps avoid some (though not all) expensive and messy square roots. If your viewpoint is at V=(x,y,z) and the two other points are P=(a,b,c) and Q=(d,e,f), the subtended angle is
angle = acos( <V-P, V-Q> / sqrt( <V-P, V-P> * <V-Q, V-Q> ) )
where <.,.> is the inner (dot) product, e.g. <P, Q> = a*d + b*e + c*f. Also, some of the products and inner products can be 'recycled' instead of being recalculated.
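A small sketch of this dot-product formulation (not from the original answer; the function name and the example coordinates are illustrative):
import numpy as np

def subtended_angle_deg(V, P, Q):
    """Angle at viewpoint V subtended by the segment P-Q, in degrees."""
    VP = np.asarray(P, float) - np.asarray(V, float)
    VQ = np.asarray(Q, float) - np.asarray(V, float)
    cos_a = VP.dot(VQ) / np.sqrt(VP.dot(VP) * VQ.dot(VQ))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# e.g. viewpoint at the origin, close and far corners of a distant cell
print(subtended_angle_deg((0, 0, 0), (100, 0, 10), (105, 5, 12)))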
I want to generate random points on the surface of a cylinder such that the distance between the points falls in the range 230 to 250. I used the following code to generate random points on the surface of a cylinder:
import random, math

H = 300
R = 20
s = random.random()
#theta = random.random()*2*math.pi
for i in range(0, 300):
    theta = random.random() * 2 * math.pi
    z = random.random() * H
    r = math.sqrt(s) * R  # note: for points on the lateral surface of a radius-R cylinder this would simply be r = R
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    print('C', x, y, z)
How can I generate random points such that they fall with in the range(on the surfaceof cylinder)?
This is not a complete solution, but an insight that should help. If you "unroll" the surface of the cylinder into a rectangle of width w = 2*pi*r and height h, the task of finding the distance between points is simplified. You have not explained how to measure the "distance along the surface" between points on the top of the cylinder and points on the side - that is a slightly tricky bit of geometry.
As for computing the distance along the surface across the artificial "seam" we created, just use both |x1 - x2| and w - |x1 - x2|: whichever gives the shorter distance is the one you want.
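A tiny sketch of that wrap-around distance on the unrolled rectangle (my own illustration of the idea above, not code from the answer):
import math

def surface_distance(p1, p2, r):
    """Distance between two points on the unrolled side of a cylinder.

    Each point is (x, z): x is the arc-length position along the
    circumference, z the height. The x axis wraps around at w = 2*pi*r.
    """
    w = 2 * math.pi * r
    dx = abs(p1[0] - p2[0])
    dx = min(dx, w - dx)  # go the short way around the seam
    dz = p1[1] - p2[1]
    return math.hypot(dx, dz)

print(surface_distance((5.0, 10.0), (120.0, 250.0), r=20))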
I do think that @VincentNivoliers' suggestion to use Poisson disk sampling is very good, but with the constraints of h=300 and r=20 you will get terrible results no matter what.
The basic way of creating a set of random points with constraints on the distances between them is to have a function that modulates the probability of a point being placed at a given location. This function starts out constant, and whenever a point is placed, forbidden areas surrounding it are set to zero. That is difficult to do with continuous variables, but reasonably easy if you discretize your problem.
The other thing to be careful about is the being on a cylinder part. It may be easier to think of it as random points on a rectangular area that repeats periodically. This can be handled in two different ways:
the simplest is to take into consideration not only the rectangular tile where you are placing the points, but also its neighbouring ones. Whenever you place a point in your main tile, you also place one in the neighboring ones and compute their effect on the probability function inside your tile.
A more sophisticated approach treats the probability function as the convolution of a kernel that encodes the forbidden areas with a sum of delta functions corresponding to the points already placed. If this is computed using FFTs, the periodicity is a natural by-product.
The first approach can be coded as follows:
from __future__ import division
import numpy as np
r, h = 20, 300
w = 2*np.pi*r
int_w = int(np.rint(w))
mult = 10
pdf = np.ones((h*mult, int_w*mult), np.bool)
points = []
min_d, max_d = 230, 250
available_locs = pdf.sum()
while available_locs:
new_idx = np.random.randint(available_locs)
new_idx = np.nonzero(pdf.ravel())[0][new_idx]
new_point = np.array(np.unravel_index(new_idx, pdf.shape))
points += [new_point]
min_mask = np.ones_like(pdf)
if max_d is not None:
max_mask = np.zeros_like(pdf)
else:
max_mask = True
for p in [new_point - [0, int_w*mult], new_point +[0, int_w*mult],
new_point]:
rows = ((np.arange(pdf.shape[0]) - p[0]) / mult)**2
cols = ((np.arange(pdf.shape[1]) - p[1]) * 2*np.pi*r/int_w/mult)**2
dist2 = rows[:, None] + cols[None, :]
min_mask &= dist2 > min_d*min_d
if max_d is not None:
max_mask |= dist2 < max_d*max_d
pdf &= min_mask & max_mask
available_locs = pdf.sum()
points = np.array(points) / [mult, mult*int_w/(2*np.pi*r)]
If you run it with your values, the output is usually just one or two points, as the large minimum distance forbids all others. But if you run it with more reasonable values, e.g.
min_d, max_d = 50, 200
Here's how the probability function looks after placing each of the first 5 points:
Note that the points are returned as pairs of coordinates, the first being the height, the second the distance along the cylinder's circumference.
I'm trying to come up with an algorithm that will determine turning points in a trajectory of x/y coordinates. The following figure illustrates what I mean: green indicates the starting point and red the final point of the trajectory (the entire trajectory consists of ~1500 points):
In the following figure, I added by hand the possible (global) turning points that an algorithm could return:
Obviously, the true turning point is always debatable and will depend on the angle that one specifies that has to lie between points. Furthermore a turning point can be defined on a global scale (what I tried to do with the black circles), but could also be defined on a high-resolution local scale. I'm interested in the global (overall) direction changes, but I'd love to see a discussion on the different approaches that one would use to tease apart global vs local solutions.
What I've tried so far:
calculate distance between subsequent points
calculate angle between subsequent points
look how distance / angle changes between subsequent points
Unfortunately this doesn't give me any robust results. I probably have to calculate the curvature along multiple points, but that's just an idea.
I'd really appreciate any algorithms / ideas that might help me here. The code can be in any programming language, matlab or python are preferred.
EDIT: here's the raw data (in case somebody wants to play with it):
mat file
text file (x coordinate first, y coordinate in second line)
You could use the Ramer-Douglas-Peucker (RDP) algorithm to simplify the path. Then you could compute the change in directions along each segment of the simplified path. The points corresponding to the greatest change in direction could be called the turning points:
A Python implementation of the RDP algorithm can be found on github.
import matplotlib.pyplot as plt
import numpy as np
import os
import rdp
def angle(dir):
"""
Returns the angles between vectors.
Parameters:
dir is a 2D-array of shape (N,M) representing N vectors in M-dimensional space.
The return value is a 1D-array of values of shape (N-1,), with each value
between 0 and pi.
0 implies the vectors point in the same direction
pi/2 implies the vectors are orthogonal
pi implies the vectors point in opposite directions
"""
dir2 = dir[1:]
dir1 = dir[:-1]
return np.arccos((dir1*dir2).sum(axis=1)/(
np.sqrt((dir1**2).sum(axis=1)*(dir2**2).sum(axis=1))))
tolerance = 70
min_angle = np.pi*0.22
filename = os.path.expanduser('~/tmp/bla.data')
points = np.genfromtxt(filename).T
print(len(points))
x, y = points.T
# Use the Ramer-Douglas-Peucker algorithm to simplify the path
# http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
# Python implementation: https://github.com/sebleier/RDP/
simplified = np.array(rdp.rdp(points.tolist(), tolerance))
print(len(simplified))
sx, sy = simplified.T
# compute the direction vectors on the simplified curve
directions = np.diff(simplified, axis=0)
theta = angle(directions)
# Select the index of the points with the greatest theta
# Large theta is associated with greatest change in direction.
idx = np.where(theta>min_angle)[0]+1
fig = plt.figure()
ax =fig.add_subplot(111)
ax.plot(x, y, 'b-', label='original path')
ax.plot(sx, sy, 'g--', label='simplified path')
ax.plot(sx[idx], sy[idx], 'ro', markersize = 10, label='turning points')
ax.invert_yaxis()
plt.legend(loc='best')
plt.show()
Two parameters were used above:
The RDP algorithm takes one parameter, the tolerance, which
represents the maximum distance the simplified path
can stray from the original path. The larger the tolerance, the cruder the simplified path.
The other parameter is the min_angle which defines what is considered a turning point. (I'm taking a turning point to be any point on the original path, whose angle between the entering and exiting vectors on the simplified path is greater than min_angle).
I will be giving numpy/scipy code below, as I have almost no Matlab experience.
If your curve is smooth enough, you could identify your turning points as those of highest curvature. Taking the point index number as the curve parameter, and a central differences scheme, you can compute the curvature with the following code
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage
def first_derivative(x) :
return x[2:] - x[0:-2]
def second_derivative(x) :
return x[2:] - 2 * x[1:-1] + x[:-2]
def curvature(x, y) :
x_1 = first_derivative(x)
x_2 = second_derivative(x)
y_1 = first_derivative(y)
y_2 = second_derivative(y)
return np.abs(x_1 * y_2 - y_1 * x_2) / np.sqrt((x_1**2 + y_1**2)**3)
You will probably want to smooth your curve out first, then calculate the curvature, then identify the highest curvature points. The following function does just that:
def plot_turning_points(x, y, turning_points=10, smoothing_radius=3,
cluster_radius=10) :
if smoothing_radius :
weights = np.ones(2 * smoothing_radius + 1)
new_x = scipy.ndimage.convolve1d(x, weights, mode='constant', cval=0.0)
new_x = new_x[smoothing_radius:-smoothing_radius] / np.sum(weights)
new_y = scipy.ndimage.convolve1d(y, weights, mode='constant', cval=0.0)
new_y = new_y[smoothing_radius:-smoothing_radius] / np.sum(weights)
else :
new_x, new_y = x, y
k = curvature(new_x, new_y)
turn_point_idx = np.argsort(k)[::-1]
t_points = []
while len(t_points) < turning_points and len(turn_point_idx) > 0:
t_points += [turn_point_idx[0]]
idx = np.abs(turn_point_idx - turn_point_idx[0]) > cluster_radius
turn_point_idx = turn_point_idx[idx]
t_points = np.array(t_points)
t_points += smoothing_radius + 1
plt.plot(x,y, 'k-')
plt.plot(new_x, new_y, 'r-')
plt.plot(x[t_points], y[t_points], 'o')
plt.show()
Some explaining is in order:
turning_points is the number of points you want to identify
smoothing_radius is the radius of a smoothing convolution to be applied to your data before computing the curvature
cluster_radius is the distance from a point of high curvature selected as a turning point where no other point should be considered as a candidate.
You may have to play around with the parameters a little, but I got something like this:
>>> x, y = np.genfromtxt('bla.data')
>>> plot_turning_points(x, y, turning_points=20, smoothing_radius=15,
... cluster_radius=75)
Probably not good enough for a fully automated detection, but it's pretty close to what you wanted.
A very interesting question. Here is my solution, which allows for variable resolution. Fine-tuning it may not be simple, though, as it is mostly intended to narrow down the set of candidate points.
Every k points, calculate the convex hull and store it as a set. Then go through those (at most k) points and remove any points that are not in the convex hull, in such a way that the points don't lose their original order.
The purpose here is that the convex hull will act as a filter, removing all of the "unimportant" points and leaving only the extreme points. Of course, if the k-value is too high, you'll end up with something too close to the actual convex hull instead of what you actually want.
You should start with a small k, at least 4, then increase it until you get what you seek. You should also probably only keep the middle point of every 3 points where the angle there is below a certain amount, d; this would ensure that all of the turns are at least d degrees (not implemented in the code below). However, this should probably be done incrementally to avoid loss of information, just like increasing the k-value. Another possible improvement would be to re-run on the points that were removed, and only remove points that were not in both convex hulls, though this requires a higher minimum k-value of at least 8.
The following code seems to work fairly well, but could still use improvements for efficiency and noise removal. It's also rather inelegant in determining when it should stop, thus the code really only works (as it stands) from around k=4 to k=14.
# convex_hull() is assumed to be an external 2D convex-hull routine that
# returns the hull vertices as the same (x, y) tuples it was given.
def convex_filter(points, k):
    new_points = []
    # process the points in consecutive chunks of (at most) k
    for pts in (points[i:i + k] for i in xrange(0, len(points), k)):
        hull = set(convex_hull(pts))
        for point in pts:
            if point in hull:
                new_points.append(point)
    return new_points
# How the points are obtained is a minor point, but they need to be in the right order.
x_coords = [float(x) for x in x.split()]
y_coords = [float(y) for y in y.split()]
points = zip(x_coords,y_coords)
k = 10
prev_length = 0
new_points = points
# Filter using the convex hull until no more points are removed
while len(new_points) != prev_length:
prev_length = len(new_points)
new_points = convex_filter(new_points,k)
Here is a screen shot of the above code with k=14. The 61 red dots are the ones that remain after the filter.
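The convex_hull helper is not shown in the answer; a minimal stand-in using scipy (my assumption about its expected behavior: it takes and returns (x, y) tuples) could look like this:
import numpy as np
from scipy.spatial import ConvexHull

def convex_hull(pts):
    """Return the input (x, y) tuples that lie on the convex hull."""
    pts = list(pts)
    if len(pts) < 3:  # the hull of 1 or 2 points is the points themselves
        return pts
    # note: ConvexHull raises QhullError for degenerate (e.g. collinear) input
    hull = ConvexHull(np.asarray(pts, dtype=float))
    return [pts[i] for i in hull.vertices]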
The approach you took sounds promising but your data is heavily oversampled. You could filter the x and y coordinates first, for example with a wide Gaussian and then downsample.
In MATLAB, you could use x = conv(x, normpdf(-10 : 10, 0, 5)) and then x = x(1 : 5 : end). You will have to tweak those numbers depending on the intrinsic persistence of the objects you are tracking and the average distance between points.
Then, you will be able to detect changes in direction very reliably, using the same approach you tried before, based on the scalar product, I imagine.
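A rough Python equivalent of that MATLAB smoothing-and-downsampling step (my translation using scipy; the kernel width and downsampling factor are the same illustrative numbers as above):
import numpy as np
from scipy.ndimage import gaussian_filter1d

# x in the first line of the file, y in the second (as in the question)
x, y = np.genfromtxt('bla.data')

# smooth with a wide Gaussian (sigma of about 5 samples), then keep every 5th point
xs = gaussian_filter1d(x, sigma=5)[::5]
ys = gaussian_filter1d(y, sigma=5)[::5]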
Another idea is to examine the left and the right surroundings of every point. This may be done by fitting a linear regression to the N points before and the N points after each point. If the angle at which the two fitted lines intersect is below some threshold, you have a corner.
This may be done efficiently by keeping a queue of the points currently in the linear regression and replacing old points with new points, similar to a running average.
Finally, you have to merge adjacent corners into a single corner, e.g. by choosing the point with the strongest corner property.
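A small sketch of this regression-window idea (my own illustration: it uses a total-least-squares fit per window rather than the running-queue update described above, the window size and threshold are arbitrary, and the merging of adjacent corners is left out):
import numpy as np

def corner_indices(x, y, n=20, min_turn_deg=60):
    """Flag indices where linear fits to the n points before and the n points
    after a point change direction by more than min_turn_deg degrees."""
    pts = np.column_stack([x, y]).astype(float)
    corners = []
    for i in range(n, len(pts) - n):
        left, right = pts[i - n:i + 1], pts[i:i + n + 1]
        dirs = []
        for win in (left, right):
            # principal direction of the window (total least squares fit)
            d = np.linalg.svd(win - win.mean(axis=0))[2][0]
            if np.dot(d, win[-1] - win[0]) < 0:  # orient along travel direction
                d = -d
            dirs.append(d)
        cos_turn = np.clip(np.dot(dirs[0], dirs[1]), -1.0, 1.0)
        if np.degrees(np.arccos(cos_turn)) > min_turn_deg:
            corners.append(i)
    return corners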