I have a game window of size 640 by 480, populated by particles. When a particle moves off one side of the window, it wraps around to the other (i.e. the window is a toroid).
I want to calculate the distance between each pair of particles, since this will be used to apply different forces to each particle.
At first I looped through each pair of particles and rescaled everything so that the first particle in the pair was centered, then calculated the distance to the second particle, but this was extremely slow to run.
Then I found some functions in scipy.spatial.distance that let me calculate the distance between all points very quickly, but the only problem is that they don't take the wrap-around into account.
Here is my current code
from scipy.spatial.distance import pdist, squareform
...
distance = squareform(pdist([(p.x, p.y) for p in particles]))
This works for particles near the center, but if one particle is at (1, 320) and the other particle is at (639, 320), then it calculates their distance as 638 instead of 2. It doesn't take into account the wrap.
Is there a different function I can use, or some transformation I can apply before/after to take into account the wrap?
You can compute the smaller of the x and y differences (the in-window difference versus the edge-crossing distance) like this:
game_width = 640
game_height = 480
def smaller_xy(point1, point2):
    xdiff = abs(point1.x - point2.x)
    if xdiff > (game_width / 2):
        xdiff = game_width - xdiff
    ydiff = abs(point1.y - point2.y)
    if ydiff > (game_height / 2):
        ydiff = game_height - ydiff
    return xdiff, ydiff
That is, if the in-window distance in the x or y directions is greater than half the size of the window in that direction, it's better to go off the edge -- and in that case the distance will be the window size in that direction minus the original in-window distance.
Obviously, once you have the x and y separations you can compute the distance between the points as:
import math
small_x, small_y = smaller_xy(p1, p2)
least_distance = math.sqrt(small_x**2 + small_y**2)
However, depending on how your force calculation is defined, you might find that all you really need is the square of the distance (just (small_x**2 + small_y**2)) and therefore you can avoid the work of finding the sqrt.
To get this plumbed into scipy.pdist, note that pdist can be called with a function argument in addition to the points, as:
Y = pdist(X, func)
This is the last form of invocation shown in the description of pdist at https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html#scipy.spatial.distance.pdist
You should be able to use that feature to have pdist build its distances-between-all-pairs-of-points matrix from distances calculated by a callback function that applies the smaller_xy computation.
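For example, here is a sketch of that approach; it assumes, as in the question, that particles holds objects with x and y attributes:

import numpy as np
from scipy.spatial.distance import pdist, squareform

game_width, game_height = 640, 480

def toroidal_distance(u, v):
    # u and v are length-2 arrays (x, y); wrap each axis before taking the norm
    dx = abs(u[0] - v[0])
    dy = abs(u[1] - v[1])
    dx = min(dx, game_width - dx)
    dy = min(dy, game_height - dy)
    return np.sqrt(dx*dx + dy*dy)

pts = np.array([(p.x, p.y) for p in particles])
distance = squareform(pdist(pts, toroidal_distance))

Note that a Python callback makes pdist much slower than its built-in metrics, so with many particles it can pay to compute the wrapped differences with whole-array operations instead.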
Let's imagine you replicate four boards above, below, left, and right to the original board, and also the particles on the original board onto the new boards. Let's also mark the particles.
Denote the N particles on the original board o(i), with i running from 1 to N; a(i) are the particles on the replicated board above, and b(i), l(i), r(i) those on the boards below, left, and right respectively.
For all distinct i and j, you need to find distances between o(i) and o(j), o(i) and a(j), etc, etc. There are 5x5 = 25 distances to compute for each pair of i and j. Once you have all these distances, take the minimum for each pair, that's your distance for i and j.
There may be ways to prune this computation, but I think you would at least need to compute the distances from the particles to the borders and compare them with the distances on the original board, which is an overhead as well.
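A compact NumPy sketch of that idea (my own condensation of the scheme: shifting one side of each pair through the 3x3 grid of image boards covers the same wrapped separations as comparing all five boards pairwise):

import numpy as np
from scipy.spatial.distance import cdist

W, H = 640, 480  # board size from the question
shifts = [np.array([sx, sy]) for sx in (-W, 0, W) for sy in (-H, 0, H)]

def toroidal_distances(pts):
    # pts is an (N, 2) array; take the elementwise minimum over the nine images
    return np.minimum.reduce([cdist(pts, pts + s) for s in shifts])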
Problem
I need a way to find 2-mile clusters of points, where each point has a value: identify 2-mile areas which have sum(value) > 50.
Data
I have data that looks like the following
ID COUNT LATITUDE LONGITUDE
187601546 20 025.56394 -080.03206
187601547 25 025.56394 -080.03206
187601548 4 025.56394 -080.03206
187601550 0 025.56298 -080.03285
Roughly 200K records. What I need to determine is whether there are any areas where the sum of COUNT exceeds 65 within a one-mile-radius (2-mile-diameter) area.
Using each point as a center for an area
Now, I have Python code from another project that will draw a shapefile around a point of x diameter, as follows:
from geopy import Point
from geopy.distance import vincenty  # geopy < 2.0

def poly_based_on_distance(center_lat, center_long, distance, bearing):
    # bearing is in degrees
    # distance in miles
    # print('center', center_lat, center_long)
    destination = (vincenty(miles=distance).destination(Point(center_lat,
                   center_long), bearing).format_decimal())
    return destination
And a routine that returns the destination and then checks which points are inside the radius.
## This is the evaluation for overlap between points and
## area polyshapes
area_list = []
store_geo_dict = {}
for stores in locationdict:
    location = Polygon(locationdict[stores])
    for areas in AREAdictionary:
        area = Polygon(AREAdictionary[areas])
        if location.intersects(area):
            area_list.append(areas)
    store_geo_dict[stores] = area_list
    area_list = []
At this point, I am simply drawing a circular shapefile around each of the 200K points, seeing which other points fall inside, and doing the count.
Need Clustering Algorithm?
However, there might be an area with the required count density in which none of the points is at the center.
I'm familiar with clustering algorithms such as DBSCAN that use attributes for classification, but this is a matter of finding density clusters using a value attached to each point. Is there any clustering algorithm that finds clusters of 2-mile diameter where the count inside is >= 50?
Any suggestions? Python or R are the preferred tools, but this is wide open, and since it is probably a one-off, computational efficiency is not a priority.
Not a complete solution, but maybe it will help simplify the problem, depending on the distribution of your data. I will use planar coordinates and cKDTree in my example; this might work with geographic data if you can ignore curvature in a projection.
The main observation is the following: a point (x,y) does not contribute to a dense cluster if the total weight within a ball of radius 2*r (e.g. 2 miles) around (x,y) is less than the cutoff value (e.g. 50 in your title). In fact, any point within r of (x,y) does not contribute to any dense cluster.
This allows you to repeatedly discard points from consideration. If you are left with no points, there are no dense clusters; if you are left with some points, clusters may exist.
import numpy as np
from scipy.spatial import cKDTree

# test data
N = 1000
data = np.random.rand(N, 2)
x, y = data.T

# test weights of each point
weights = np.random.rand(N)

def filter_noncontrib(pts, weights, radius=0.1, cutoff=60):
    tree = cKDTree(pts)
    contribs = np.array(
        [weights[tree.query_ball_point(pt, 2 * radius)].sum() for pt in pts]
    )
    return contribs >= cutoff

def possible_contributors(pts, weights, radius=0.1, cutoff=60):
    n_pts = len(pts)
    while len(pts):
        mask = filter_noncontrib(pts, weights, radius, cutoff)
        pts = pts[mask]
        weights = weights[mask]
        if len(pts) == n_pts:
            break
        n_pts = len(pts)
    return pts
Example with dummy data:
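A minimal invocation, reusing the data, weights, and defaults defined above:

candidates = possible_contributors(data, weights, radius=0.1, cutoff=60)
print(len(candidates), "points could still belong to a dense cluster")

If nothing survives the filter, no dense cluster exists; any surviving points still need a final check, since the filter only prunes.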
DBSCAN can be adapted (see Generalized DBSCAN; define core points as weight sum >= 50), but it will not ensure the maximum cluster size (it computes transitive closures).
You could also try complete linkage. Use it to find clusters with the desired maximum diameter, then check whether these satisfy the desired density, but that is not guaranteed to find all of them.
It's probably faster to (a) build an index for fast radius search, and (b) for every point, find the neighbors within radius r and keep the point if they have the desired minimum sum. But that does not guarantee finding everything, because the cluster center is not necessarily a data point. Consider a max radius of 1 and a minimum weight of 100, with two points of weight 50 each at (0,0) and (1,1): neither a query at (0,0) nor one at (1,1) will discover the solution, but a cluster at (.5,.5) satisfies the conditions.
Unfortunately, I believe your problem is at least NP-hard, so you won't be able to afford the ultimate solution.
For a project, I am trying to determine the time it would take for a photon to leave the Sun. However, I am having trouble with my code (found below).
More specifically, I set up a for loop with an if statement, and if some randomly generated probability is less than the probability of collision, that means the photon collides and it changes direction.
What I am having trouble with is setting up a condition where the for loop stops if the photon escapes (when distance > Sun radius). The one I have set up already doesn't appear to work.
I use a very scaled-down value for the Sun's radius, because otherwise it would take a very long time for the photon to escape in my simulation.
from numpy.random import random as rng # we want them random numbers
import numpy as np # for the math functions
import matplotlib.pyplot as plt # to make pretty pretty plots

mass_proton = 1.67e-27
mass_electron = 9.11e-31
Thompson_cross = 6.65e-29
Sun_density = 150000
Sun_radius = .005

Mean_Free = (mass_proton + mass_electron)/(Thompson_cross*Sun_density*np.sqrt(2))
time_step = 10**-13 # Used this specifically in order for the path length to be < Mean Free Path
path_length = (3e8)*time_step
Probability = 1-np.exp(-path_length/Mean_Free) # Probability of the photon colliding

def Random_walk():
    x = 0 # Start at origin (0,0)
    y = 0
    N = 1000
    m = 0 # This is a counter I have set up for the number of collisions
    for i in range(1, N+1):
        prand = rng(N+1) # Randomly generated probability
        if prand[i] < Probability: # If my prand is less than the probability of collision,
                                   # the photon collides and changes direction
            x += Mean_Free*np.cos(2*np.pi*prand)
            y += Mean_Free*np.sin(2*np.pi*prand)
            m += 1 # Every time a collision occurs 1 is added to my collision counter
        distance = np.sqrt(x**2 + y**2) # Final distance the photon covers
        if np.all(distance) > Sun_radius: # if the distance the photon travels is greater than
            break                         # the radius of the Sun, the for loop stops,
                                          # meaning the photon escaped
    print(m)
    return x, y, distance

x, y, d = Random_walk()
plt.plot(x, y, '--')
plt.plot(x[-1], y[-1], 'ro')
Any criticism of my code is welcome; this is for a grade, and I do want to learn how to do this correctly, so please tell me if you notice any other errors.
I don't understand the motivation for the formulas you've implemented. I'll explain my own motivation here, but if your instructor told you to do something else, I guess you should listen to them instead.
If I were going to do this, I would generate a sequence of movements of a photon, stopping when distance of the photon to the center of the sun is greater than the solar radius. Each movement is a sample from a distribution which has two components: one for the distance, and one for the direction. I will assume that these are independent (this may be questioned in a more careful simulation).
It seems plausible that the distribution of distance is an exponential distribution with parameter 1/(mean free path). Then the density is p(d) = (1/MFP) exp(-d/MFP). Its cdf is 1 - exp(-d/MFP) and the inverse of the cdf is -MFP log(1 - p) where p = cdf(d). Now you can sample from the distribution of distances: let p = rand(0, 1) where rand = uniform random and plug it into the inverse cdf to get d. This is called the inverse cdf method of sampling; a web search will find more info about it.
As for the direction, you can let angle = rand(0, 2*pi) and then (x, y) = (cos(angle), sin(angle)).
Now you can construct the series of positions. From an initial location, let the new location = previous + d*(x, y). Stop when distance of location to center is greater than radius.
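Here is a minimal sketch of that loop; the mean free path and solar radius are placeholder values, not the question's, so substitute your own scaled numbers:

import math, random

MFP = 1.0       # mean free path, placeholder value
RADIUS = 20.0   # solar radius in the same units, placeholder value

x = y = 0.0
movements = 0
while math.hypot(x, y) <= RADIUS:
    p = random.random()                    # uniform sample in [0, 1)
    d = -MFP * math.log(1 - p)             # inverse cdf of the exponential
    angle = random.uniform(0, 2 * math.pi) # isotropic direction
    x += d * math.cos(angle)
    y += d * math.sin(angle)
    movements += 1
print(movements, "movements before escape")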
Looks like a great problem! Good luck and have fun. Let me know if you have any questions.
Here is a way of thinking about the problem that you may find helpful. At each moment, the photon has a position (x, y) and a direction (dx, dy). The (dx, dy) variables are the components of a unit vector, so sqrt(dx**2 + dy**2) = 1.0. The displacement of the photon during one step is path_length * (dx, dy).
At each step you do 4 things:
calculate the photon's new position
figure out if the photon has left the sun by computing its distance from the center point
determine, with a single random number, whether or not the photon collides. If it does you randomly generate a new direction.
Append the photon's current position to a list. You might want to do this as a function of distance from the center rather than x,y.
At the end, you plot the list you have built up.
You should also choose a random direction at the very start.
I don't know how you will terminate the loop, for the photon isn't ever guaranteed to leave the sun - just like in the real world. In principle the program might run forever (or until the sun burns out).
There is a slight inaccuracy in that the photon can collide at any instant, not just at the end of one step. But since the steps are small, so is the error introduced by this simplification.
I will point out that you do not need numpy for any of this except perhaps the final plot. The standard Python library has all the math functions you need. Numpy is of course great for manipulating arrays of data, but the only array you will have here is the one you build, a step at a time, of photon position versus time.
As I pointed out in one of my comments, you are modeling the sun as a 2-dimensional object. If you want to do this calculation in three dimensions, you don't need to change this basic approach.
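For concreteness, here is a standard-library-only sketch of those four steps; the constants are placeholders, and the loop is capped because, as noted above, escape is not guaranteed:

import math, random

path_length = 0.05      # distance per step, placeholder
collision_prob = 0.1    # collision probability per step, placeholder
radius = 1.0            # sun radius, placeholder
max_steps = 10**6       # safety cap, since escape is not guaranteed

angle = random.uniform(0, 2 * math.pi)       # random initial direction
dx, dy = math.cos(angle), math.sin(angle)
x = y = 0.0
history = []
for _ in range(max_steps):
    x += path_length * dx                    # 1. new position
    y += path_length * dy
    r = math.hypot(x, y)                     # 2. distance from the center
    history.append(r)                        # 4. record position (as a distance)
    if r > radius:                           # the photon has left the sun
        break
    if random.random() < collision_prob:     # 3. collision? pick a new direction
        angle = random.uniform(0, 2 * math.pi)
        dx, dy = math.cos(angle), math.sin(angle)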
The Task
For a class in molecular dynamics, I have to simulate 100 particles in a box with periodic boundaries. I have to take particle-particle collisions into account; since the walls are 'transparent', those interactions can happen across the boundaries. Since the simulation should cover 50000 steps, and I'm expecting more and more additional tasks in the future, I want my code to be as efficient as possible (I have to use Python, despite the long run time).
The Setting
The system consists of
100 Particles
Box with x = 10, y = 5
Mass = 2
Radius = 0.2
Velocity |v| = 0.5 per step
Simulation of 50000 steps
What I've done so far
I have found this example for particles in a box with particle-particle collisions. Since the author used an efficient implementation, I took his approach.
My relevant code parts are (in strong resemblance to the linked site):
import numpy as np  # imports assumed by the snippet
from scipy.spatial.distance import pdist, squareform

class particlesInPeriodicBox:
    # define the particle properties
    def __init__(self,
                 initialState = [[0, 0, 0, 0]], # state with form [x, y, vx, vy] for each particle
                 boundaries = [0, 10, 0, 5],    # box boundaries with form [xmin, xmax, ymin, ymax] in nm
                 radius = 0.2,                  # particle radius in nm
                 mass = 2):                     # mass in g/mol. Here a parameter instead of a global variable
        self.initialState = np.asarray(initialState, dtype=float)
        self.state = self.initialState.copy()
        self.radius = radius
        self.time = 0  # keep count of time, if time, i want to implement a 'clock'
        self.mass = mass * np.ones(len(self.state))  # one mass per particle, so the per-particle lookups below work
        self.boundaries = boundaries

    def collision(self):
        """
        Now one has to check for collisions. I do this by a distance check and resolve the collisions below.
        To minimize the simulation time I then only consider the particles that collided.
        """
        dist = squareform(pdist(self.state[:, :2]))  # direct distance
        colPart1, colPart2 = np.where(dist < 2 * self.radius)  # collision partners 1 and 2: pairs whose
                                                               # centers are closer than two times the radius
        # resolve self-self collisions and duplicate pairs
        unique = (colPart1 < colPart2)
        colPart1 = colPart1[unique]
        colPart2 = colPart2[unique]
        """
        The following loop resolves the collisions. I zip the lists of collision partners into one array,
        where each entry contains both collision partners.
        """
        for cp1, cp2 in zip(colPart1, colPart2):  # cp1/cp2 are the two particles colliding in one collision.
            # masses could be different in future...
            m1 = self.mass[cp1]
            m2 = self.mass[cp2]
            # take the position (x,y) tuples for the two particles
            r1 = self.state[cp1, :2]
            r2 = self.state[cp2, :2]
            # same with velocities
            v1 = self.state[cp1, 2:]
            v2 = self.state[cp2, 2:]
            # get relative parameters
            r = r1 - r2
            v = v1 - v2  # relative velocity
            # center of mass velocity:
            vcm = (m1 * v1 + m2 * v2) / (m1 + m2)
            """
            This is the part with the elastic collision
            """
            dotrr = np.dot(r, r)  # the dot product of the relative position with itself
            dotvr = np.dot(v, r)  # the dot product of the relative velocity with the relative position
            v = 2 * r * dotvr / dotrr - v  # new relative velocity
            """
            In this center of mass frame, the velocities 'reflect' on the center of mass in opposite directions
            """
            self.state[cp1, 2:] = vcm + v * m2/(m1 + m2)  # new velocity of particle 1, still considering possibly different masses
            self.state[cp2, 2:] = vcm - v * m1/(m1 + m2)  # new velocity of particle 2
As I understand it, this technique of applying the operations to whole arrays is more efficient than manually looping through them every time. Moving the particles 'through' the wall is easy: I just subtract or add the dimension of the box, respectively. But:
The Problem
For now the algorithm only sees collisions inside the box, but not across the boundaries. I have thought about this problem for a while and came up with the following ideas:
I could make a total of 9 copies of the system in a 3x3 grid and, considering only the middle one, look into the neighboring cells for the nearest-neighbor search. BUT I can't think of an effective way to implement this, despite the fact that this approach seems to be the standard way.
Every other idea makes some hand-waving use of modulo, and I'm almost sure that this is not the way to go.
If I had to boil it down, I guess my key questions are:
How do I take periodic boundaries into account when calculating
the distance between particles?
the actual particle-particle collision (elastic) and resulting directions?
For the first problem it might be possible to use techniques like those in Calculation of Contact/Coordination number with Periodic Boundary Conditions, but I'm not sure if that is the most efficient way.
Thank you!
Modulus is likely as quick an operation as you're going to get. In any self-respecting run-time system, this will attach directly to the on-chip floating-divide operations, which may well be faster than a tedious set of "if-subtract" pairs.
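For example, the minimum-image convention needs only one rounding per component; this sketch assumes an (N, 2) array pos of positions and the 10 x 5 box from the question:

import numpy as np

box = np.array([10.0, 5.0])                # box dimensions, as in the question
delta = pos[:, None, :] - pos[None, :, :]  # all pairwise separation vectors, shape (N, N, 2)
delta -= box * np.round(delta / box)       # wrap each component into [-box/2, box/2]
dist = np.linalg.norm(delta, axis=-1)      # periodic pairwise distances

The wrapped separation vectors in delta are also exactly the r = r1 - r2 needed in the collision update.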
I think your 9-cell solution is overkill. Use 4 cells in a 2x2 matrix and check two regions: the original cell, and the same dimensions centered on the "four corners" point (middle of the 2x2). For any pair of points, the proper distance is the lesser of these two. Note that this method also gives you a frame in which you can easily calculate the momentum changes.
A third possible approach is to double the dimensions (ala the 2x2 above), but give each particle four sets of coordinates, one in each box. Alter your algorithms to consider all four when computing distance. If you have good vectorization packages and parallelism, this might be the preferred solution.
I want to generate random points on the surface of a cylinder such that the distances between the points fall in the range 230 to 250. I used the following code to generate random points on the surface of a cylinder:
import random, math

H = 300
R = 20
s = random.random()
#theta = random.random()*2*math.pi
for i in range(0, 300):
    theta = random.random()*2*math.pi
    z = random.random()*H
    r = math.sqrt(s)*R
    x = r*math.cos(theta)
    y = r*math.sin(theta)
    print('C', x, y, z)
How can I generate random points such that they fall within that range (on the surface of the cylinder)?
This is not a complete solution, but an insight that should help. If you "unroll" the surface of the cylinder into a rectangle of width w=2*pi*r and height h, the task of finding the distance between points is simplified. (You have not explained how to measure "distance along the surface" between points on the top of the cylinder and points on the side; that is a slightly tricky bit of geometry.)
As for computing the distance along the surface where we created an artificial "seam", just use both |x1-x2| and w - |x1-x2|; whichever gives the shorter distance is the one you want.
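In code, a sketch of that seam-aware distance on the unrolled rectangle, taking points as (x, z) pairs with x measured along the circumference:

import math

w = 2 * math.pi * 20   # unrolled width for r = 20

def surface_distance(p1, p2):
    dx = abs(p1[0] - p2[0])
    dx = min(dx, w - dx)   # crossing the seam may be shorter
    return math.hypot(dx, p1[1] - p2[1])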
I do think that @VincentNivoliers' suggestion to use Poisson disk sampling is very good, but with the constraints of h=300 and r=20 you will get terrible results no matter what.
The basic way of creating a set of random points with constraints on the distances between them is to have a function that modulates the probability of a point being placed at a given location. This function starts out constant, and whenever a point is placed, the forbidden areas around it are set to zero. That is difficult to do with continuous variables, but reasonably easy if you discretize the problem.
The other thing to be careful about is the being on a cylinder part. It may be easier to think of it as random points on a rectangular area that repeats periodically. This can be handled in two different ways:
the simplest is to take into consideration not only the rectangular tile where you are placing the points, but also its neighbouring ones. Whenever you place a point in your main tile, you also place one in the neighboring ones and compute their effect on the probability function inside your tile.
A more sophisticated approach computes the probability function as the convolution of a kernel that encodes the forbidden areas with a sum of delta functions corresponding to the points already placed. If this is computed using FFTs, the periodicity is a natural by-product.
The first approach can be coded as follows:
from __future__ import division
import numpy as np

r, h = 20, 300
w = 2*np.pi*r
int_w = int(np.rint(w))
mult = 10

pdf = np.ones((h*mult, int_w*mult), bool)
points = []
min_d, max_d = 230, 250

available_locs = pdf.sum()
while available_locs:
    new_idx = np.random.randint(available_locs)
    new_idx = np.nonzero(pdf.ravel())[0][new_idx]
    new_point = np.array(np.unravel_index(new_idx, pdf.shape))
    points += [new_point]
    min_mask = np.ones_like(pdf)
    if max_d is not None:
        max_mask = np.zeros_like(pdf)
    else:
        max_mask = True
    for p in [new_point - [0, int_w*mult], new_point + [0, int_w*mult],
              new_point]:
        rows = ((np.arange(pdf.shape[0]) - p[0]) / mult)**2
        cols = ((np.arange(pdf.shape[1]) - p[1]) * 2*np.pi*r/int_w/mult)**2
        dist2 = rows[:, None] + cols[None, :]
        min_mask &= dist2 > min_d*min_d
        if max_d is not None:
            max_mask |= dist2 < max_d*max_d
    pdf &= min_mask & max_mask
    available_locs = pdf.sum()

points = np.array(points) / [mult, mult*int_w/(2*np.pi*r)]
If you run it with your values, the output is usually just one or two points, as the large minimum distance forbids all others. But if you run it with more reasonable values, e.g.
min_d, max_d = 50, 200
Here's how the probability function looks after placing each of the first 5 points:
Note that the points are returned as pairs of coordinates, the first being the height, the second the distance along the cylinder's circumference.
I'm trying to come up with an algorithm that will determine turning points in a trajectory of x/y coordinates. The following figure illustrates what I mean: green indicates the starting point and red the final point of the trajectory (the entire trajectory consists of ~1500 points):
In the following figure, I added by hand the possible (global) turning points that an algorithm could return:
Obviously, the true turning point is always debatable and will depend on the angle that one specifies that has to lie between points. Furthermore a turning point can be defined on a global scale (what I tried to do with the black circles), but could also be defined on a high-resolution local scale. I'm interested in the global (overall) direction changes, but I'd love to see a discussion on the different approaches that one would use to tease apart global vs local solutions.
What I've tried so far:
calculate distance between subsequent points
calculate angle between subsequent points
look how distance / angle changes between subsequent points
Unfortunately this doesn't give me any robust results. I probably have to calculate the curvature over multiple points, but that's just an idea.
I'd really appreciate any algorithms or ideas that might help me here. The code can be in any programming language; MATLAB or Python is preferred.
EDIT: here's the raw data (in case somebody wants to play with it):
mat file
text file (x coordinates in the first line, y coordinates in the second line)
You could use the Ramer-Douglas-Peucker (RDP) algorithm to simplify the path. Then you could compute the change in directions along each segment of the simplified path. The points corresponding to the greatest change in direction could be called the turning points:
A Python implementation of the RDP algorithm can be found on github.
import matplotlib.pyplot as plt
import numpy as np
import os
import rdp

def angle(dir):
    """
    Returns the angles between vectors.

    Parameters:
    dir is a 2D-array of shape (N,M) representing N vectors in M-dimensional space.

    The return value is a 1D-array of values of shape (N-1,), with each value
    between 0 and pi.

    0 implies the vectors point in the same direction
    pi/2 implies the vectors are orthogonal
    pi implies the vectors point in opposite directions
    """
    dir2 = dir[1:]
    dir1 = dir[:-1]
    return np.arccos((dir1*dir2).sum(axis=1)/(
        np.sqrt((dir1**2).sum(axis=1)*(dir2**2).sum(axis=1))))

tolerance = 70
min_angle = np.pi*0.22
filename = os.path.expanduser('~/tmp/bla.data')
points = np.genfromtxt(filename).T
print(len(points))
x, y = points.T

# Use the Ramer-Douglas-Peucker algorithm to simplify the path
# http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
# Python implementation: https://github.com/sebleier/RDP/
simplified = np.array(rdp.rdp(points.tolist(), tolerance))
print(len(simplified))
sx, sy = simplified.T

# compute the direction vectors on the simplified curve
directions = np.diff(simplified, axis=0)
theta = angle(directions)

# Select the index of the points with the greatest theta
# Large theta is associated with greatest change in direction.
idx = np.where(theta > min_angle)[0] + 1

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'b-', label='original path')
ax.plot(sx, sy, 'g--', label='simplified path')
ax.plot(sx[idx], sy[idx], 'ro', markersize=10, label='turning points')
ax.invert_yaxis()
plt.legend(loc='best')
plt.show()
Two parameters were used above:
The RDP algorithm takes one parameter, the tolerance, which represents the maximum distance the simplified path may stray from the original path. The larger the tolerance, the cruder the simplified path.
The other parameter is min_angle, which defines what is considered a turning point. (I'm taking a turning point to be any point on the original path whose angle between the entering and exiting vectors on the simplified path is greater than min_angle.)
I will be giving numpy/scipy code below, as I have almost no Matlab experience.
If your curve is smooth enough, you could identify your turning points as those of highest curvature. Taking the point index number as the curve parameter, and a central differences scheme, you can compute the curvature with the following code
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage

def first_derivative(x):
    return x[2:] - x[0:-2]

def second_derivative(x):
    return x[2:] - 2 * x[1:-1] + x[:-2]

def curvature(x, y):
    x_1 = first_derivative(x)
    x_2 = second_derivative(x)
    y_1 = first_derivative(y)
    y_2 = second_derivative(y)
    return np.abs(x_1 * y_2 - y_1 * x_2) / np.sqrt((x_1**2 + y_1**2)**3)
You will probably want to smooth your curve out first, then calculate the curvature, then identify the highest curvature points. The following function does just that:
def plot_turning_points(x, y, turning_points=10, smoothing_radius=3,
                        cluster_radius=10):
    if smoothing_radius:
        weights = np.ones(2 * smoothing_radius + 1)
        new_x = scipy.ndimage.convolve1d(x, weights, mode='constant', cval=0.0)
        new_x = new_x[smoothing_radius:-smoothing_radius] / np.sum(weights)
        new_y = scipy.ndimage.convolve1d(y, weights, mode='constant', cval=0.0)
        new_y = new_y[smoothing_radius:-smoothing_radius] / np.sum(weights)
    else:
        new_x, new_y = x, y
    k = curvature(new_x, new_y)
    turn_point_idx = np.argsort(k)[::-1]
    t_points = []
    while len(t_points) < turning_points and len(turn_point_idx) > 0:
        t_points += [turn_point_idx[0]]
        idx = np.abs(turn_point_idx - turn_point_idx[0]) > cluster_radius
        turn_point_idx = turn_point_idx[idx]
    t_points = np.array(t_points)
    t_points += smoothing_radius + 1
    plt.plot(x, y, 'k-')
    plt.plot(new_x, new_y, 'r-')
    plt.plot(x[t_points], y[t_points], 'o')
    plt.show()
Some explaining is in order:
turning_points is the number of points you want to identify
smoothing_radius is the radius of a smoothing convolution to be applied to your data before computing the curvature
cluster_radius is the distance around a point of high curvature selected as a turning point within which no other point may be considered as a candidate.
You may have to play around with the parameters a little, but I got something like this:
>>> x, y = np.genfromtxt('bla.data')
>>> plot_turning_points(x, y, turning_points=20, smoothing_radius=15,
... cluster_radius=75)
Probably not good enough for a fully automated detection, but it's pretty close to what you wanted.
A very interesting question. Here is my solution, which allows for variable resolution, although fine-tuning it may not be simple, as it's mostly intended to narrow down the candidate points.
Every k points, calculate the convex hull and store it as a set. Go through those (at most k) points and remove any that are not in the convex hull, in such a way that the points keep their original order.
The purpose here is that the convex hull acts as a filter, removing all of the "unimportant" points and leaving only the extreme points. Of course, if the k-value is too high, you'll end up with something too close to the actual convex hull instead of what you actually want.
Start with a small k, at least 4, then increase it until you get what you seek. You should also probably only include the middle point for every 3 points where the angle is below a certain amount, d. This would ensure that all of the turns are at least d degrees (not implemented in the code below). However, this should probably be done incrementally to avoid loss of information, the same as increasing the k-value. Another possible improvement would be to re-run with the points that were removed, and only remove points that were not in both convex hulls, though this requires a higher minimum k-value of at least 8.
The following code seems to work fairly well, but could still use improvements for efficiency and noise removal. It's also rather inelegant in determining when it should stop, thus the code really only works (as it stands) from around k=4 to k=14.
from scipy.spatial import ConvexHull

def convex_hull(pts):
    # helper assumed by the snippet; one possible implementation using scipy,
    # returning the subset of pts that lie on the hull
    if len(pts) < 3:
        return pts
    hull = ConvexHull(pts)
    return [pts[i] for i in hull.vertices]

def convex_filter(points, k):
    new_points = []
    for pts in (points[i:i + k] for i in range(0, len(points), k)):
        hull = set(convex_hull(pts))
        for point in pts:
            if point in hull:
                new_points.append(point)
    return new_points

# How the points are obtained is a minor point, but they need to be in the right order.
x_coords = [float(x) for x in x.split()]
y_coords = [float(y) for y in y.split()]
points = list(zip(x_coords, y_coords))

k = 10
prev_length = 0
new_points = points

# Filter using the convex hull until no more points are removed
while len(new_points) != prev_length:
    prev_length = len(new_points)
    new_points = convex_filter(new_points, k)
Here is a screen shot of the above code with k=14. The 61 red dots are the ones that remain after the filter.
The approach you took sounds promising but your data is heavily oversampled. You could filter the x and y coordinates first, for example with a wide Gaussian and then downsample.
In MATLAB, you could use x = conv(x, normpdf(-10 : 10, 0, 5)) and then x = x(1 : 5 : end). You will have to tweak those numbers depending on the intrinsic persistence of the objects you are tracking and the average distance between points.
Then, you will be able to detect changes in direction very reliably, using the same approach you tried before, based on the scalar product, I imagine.
Another idea is to examine the left and right surroundings of every point. This may be done by fitting a linear regression to the N points before and after each point. If the intersection angle between the two fitted lines is below some threshold, you have a corner.
This may be done efficiently by keeping a queue of the points currently in the linear regression and replacing old points with new points, similar to a running average.
Finally, you have to merge adjacent corners into a single corner, e.g. by choosing the point with the strongest corner property.
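A rough sketch of this idea (my own illustration: it refits the two regressions at every point for clarity instead of keeping a running queue, and fits x and y against the point index so vertical segments don't break the fit):

import numpy as np

def corner_strength(x, y, n=10):
    # angle between the regression line fitted to the n points before each
    # point and the one fitted to the n points after it
    t = np.arange(len(x))
    strength = np.zeros(len(x))
    for i in range(n, len(x) - n):
        vb = np.array([np.polyfit(t[i-n:i+1], c[i-n:i+1], 1)[0] for c in (x, y)])
        va = np.array([np.polyfit(t[i:i+n+1], c[i:i+n+1], 1)[0] for c in (x, y)])
        cosang = vb @ va / (np.linalg.norm(vb) * np.linalg.norm(va) + 1e-12)
        strength[i] = np.arccos(np.clip(cosang, -1.0, 1.0))
    return strength

Indices where strength exceeds a threshold are corner candidates; adjacent candidates can then be merged by keeping the local maximum of strength.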