Efficiently calculate distance between multiple pygame objects - Python

I have a pygame program where I want to check whether any of the rabbits are close enough to mate. To do so I use two for loops and the distance-between-two-points formula to calculate each distance. This process consumes a lot of my computer's resources and causes the game's performance to drop dramatically.
What is the most efficient way to check each rabbit's distance to the others?
def mating(rabbits):
    for i in range(len(rabbits)):
        for x in range(len(rabbits)):
            if math.sqrt(math.pow((rabbits[i].xpos - rabbits[x].xpos), 2) + math.pow((rabbits[i].ypos - rabbits[x].ypos), 2)) <= 20:
                #add a new rabbit
                rabbits.append(rabbit())

In your algorithm, math.sqrt consumes most of the time. Calculating the square root is expensive. Compare the square of the distance instead of the distance, so you don't have to calculate the square root at all.
You also calculate the distance from one rabbit to the other rabbit twice, and you even calculate a rabbit's distance to itself (when i equals x), which is always 0. The outer loop must go through all rabbits, but the inner loop only needs to iterate through the subsequent rabbits in the list (rabbits[i+1:]).
def mating(rabbits):
    for i, rabbit1 in enumerate(rabbits):
        # only look at rabbits later in the list, so each pair is checked once
        for rabbit2 in rabbits[i+1:]:
            dx = rabbit1.xpos - rabbit2.xpos
            dy = rabbit1.ypos - rabbit2.ypos
            # compare the squared distance against the squared threshold (20 pixels)
            if dx*dx + dy*dy <= 20*20:
                #add a new rabbit
                rabbits.append(rabbit())

Related

Does anyone know a more efficient way to run a pairwise comparison of hundreds of trajectories?

I have two different files containing multiple trajectories in a square map (512x512 pixels). Each file contains information about the spatial position of each particle within a track/trajectory (X and Y coordinates) and which track/trajectory that spot belongs to (TRACK_ID).
My goal is to cluster similar trajectories between both files. I found a nice way to do this (distance clustering comparison), but the code is too slow. I was just wondering if someone has some suggestions to make it faster.
The approach that I implemented finds similar trajectories based on something called the Fréchet distance (maybe not too relevant here). Below you can find the function that I wrote, but briefly this is the rationale:
group all the spots by track using the pandas.groupby function for file1 (growth_xml) and file2 (shrinkage_xml)
for each trajectory in growth_xml (outer loop) I compare it with each trajectory in shrinkage_xml (inner loop)
if they pass the Fréchet distance criterion that I defined (an if statement) I save both tracks in a new table. You can see an additional filter condition that I called delay, but I guess that is not important to explain here.
So, really simple:
import numpy as np
import pandas as pd
# frechetDist is assumed to be defined elsewhere (a discrete Fréchet distance implementation)

def distance_clustering(growth_xml, shrinkage_xml):
    coords_g = pd.DataFrame()  # empty dataframes to save filtered tracks
    coords_s = pd.DataFrame()
    counter = 0  # initialize counter to count number of filtered tracks
    for track_g, param_g in growth_xml.groupby('TRACK_ID'):
        # define growing track as multi-point line object
        traj1 = [(x, y) for x, y in zip(param_g.POSITION_X.values, param_g.POSITION_Y.values)]
        for track_s, param_s in shrinkage_xml.groupby('TRACK_ID'):
            # define shrinking track as a second multi-point line object
            traj2 = [(x, y) for x, y in zip(param_s.POSITION_X.values, param_s.POSITION_Y.values)]
            # compute delay between shrinkage and growing ends to use as an extra filter
            delay = (param_s.FRAME.iloc[0] - param_g.FRAME.iloc[0])
            # keep track only if the Fréchet distance is lower than 0.2 microns
            if frechetDist(traj1, traj2) < 0.2 and delay > 0:
                counter += 1
                param_g = param_g.assign(NEW_ID=np.ones(param_g.shape[0]) * counter)
                coords_g = pd.concat([coords_g, param_g])
                param_s = param_s.assign(NEW_ID=np.ones(param_s.shape[0]) * counter)
                coords_s = pd.concat([coords_s, param_s])
    coords_g.reset_index(drop=True, inplace=True)
    coords_s.reset_index(drop=True, inplace=True)
    return coords_g, coords_s
The main problem is that most of the time I have more than two thousand tracks (!!) and this pairwise combination takes forever. I'm wondering if there's a simple and more efficient way to do this. Perhaps by doing the pairwise combination in multiple small areas instead of the whole map? Not sure...
Have you tried building a (DeltaX, DeltaY) lookup table (LUT) for the pairwise distances? It will take a long time to compute the LUT once, or you can write it to a file and load it when the algorithm starts.
Then you only have to look up the correct cell to get the result instead of recomputing it each time.
You could also fit a polynomial regression for the distance calculation; it will be less precise but definitely faster.
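A minimal sketch of such a lookup table, assuming coordinates can be rounded to integer pixel positions on the 512x512 map (the names MAP_SIZE, DIST_LUT and fast_distance are purely illustrative):
import numpy as np

MAP_SIZE = 512
dx = np.arange(MAP_SIZE)
dy = np.arange(MAP_SIZE)
DIST_LUT = np.sqrt(dx[:, None] ** 2 + dy[None, :] ** 2)  # DIST_LUT[|dx|, |dy|]

# optionally persist the table so it only has to be computed once across runs
np.save("dist_lut.npy", DIST_LUT)
# DIST_LUT = np.load("dist_lut.npy")

def fast_distance(p1, p2):
    # look the distance up instead of recomputing the square root every time
    ddx = int(abs(round(p1[0] - p2[0])))
    ddy = int(abs(round(p1[1] - p2[1])))
    return DIST_LUT[ddx, ddy]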
Maybe not an outright answer, but it's been a while. Could you not segment the lines and use a minimum bounding box around each segment to assess similarities? I might be thinking of your problem the wrong way around, I'm not sure. Right now I'm trying to work with polygons from two different data sets and want to optimize the processing by first identifying the polygons in both geometries that overlap.
In your case, I think segments would leave you with some edge artifacts. Maybe look at this paper: https://drops.dagstuhl.de/opus/volltexte/2021/14879/pdf/OASIcs-ATMOS-2021-10.pdf or this paper (with python code): https://www.austriaca.at/0xc1aa5576_0x003aba2b.pdf
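For illustration, a rough sketch of a bounding-box pre-filter that could skip most of the expensive Fréchet comparisons (bounding_box and boxes_close are hypothetical helper names; traj is a list of (x, y) tuples as in the question):
def bounding_box(traj):
    xs = [p[0] for p in traj]
    ys = [p[1] for p in traj]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_close(box1, box2, margin=0.2):
    x1min, y1min, x1max, y1max = box1
    x2min, y2min, x2max, y2max = box2
    # the boxes are too far apart if they are separated by more than `margin` on either axis
    if x1min > x2max + margin or x2min > x1max + margin:
        return False
    if y1min > y2max + margin or y2min > y1max + margin:
        return False
    return True

# usage: only run the expensive check on pairs whose boxes could be within the 0.2 micron threshold
# if boxes_close(bounding_box(traj1), bounding_box(traj2)):
#     if frechetDist(traj1, traj2) < 0.2 and delay > 0:
#         ...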

How to calculate movement from a history of x,y data in Python?

Hello, I'm trying to determine whether an object is stopped or moving.
In my code I keep a history of x, y coordinates in a list:
history = [[12,30],[15,30],[25,30],[35,30],[45,30],[50,32],[50,33],[51,32]]
I'd like to use this history to decide whether the object is stopped or moving.
If I take the distance between the last two coordinates I get a low value, which tells me the object is stopped.
But I'd like to use more data, say the last 10 entries, and if I don't have 10 items in my list, use the whole list.
After that I want to calculate the movement distance for each point and take the median.
My current distance function:
import math

def calculateDistance(x1, y1, x2, y2):
    dist = math.sqrt((x2 - x1)**2 + (y2 - y1)**2)
    return dist

Can anyone help me?
Assuming your position history was sampled at regular intervals, you can group the positions into streaks of "closeness" using zip, enumerate and comprehensions. With these groups of "stillness" you can select the periods of time where a pause of a minimal duration occurred.
Note that this only requires comparing distances so you don't need to use the square root.
history = [[12,30],[15,30],[25,30],[35,30],[45,30],[50,32],[50,33],[51,32]]
minDist = 5 # this is your distance threshold to determine if a movement occurred
minTime = 3 # stillness time expressed in number of position samples
dists = ( (ax-bx)**2+(ay-by)**2 for (ax,ay),(bx,by) in zip(history,history[1:]) ) # distance to neighbour
moves = [ i for i,d in enumerate(dists,1) if d>minDist**2 ] # positions where movement occurred
pauses = [ history[s:e] for s,e in zip([0]+moves,moves+[len(history)]) ] # groups of "stillness"
output:
for pause in pauses:
    if len(pause) >= minTime: # check it stayed still for a minimum amount of time
        print(pause[0], len(pause))
# [50, 32] 3
You could refine this by checking whether all points in the group are within 1/2 of the distance threshold from the center (averageX, averageY) and break the group down further based on that (using the same technique if needed).
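A minimal sketch of that refinement, assuming the history, minDist and pauses variables from above (is_tight_group is a hypothetical helper name):
def is_tight_group(group, max_dist):
    # centre of the group
    cx = sum(x for x, y in group) / len(group)
    cy = sum(y for x, y in group) / len(group)
    # compare squared distances so no square root is needed
    return all((x - cx)**2 + (y - cy)**2 <= max_dist**2 for x, y in group)

# keep only the pauses whose points all stay within half of minDist of the group centre
tight_pauses = [p for p in pauses if is_tight_group(p, minDist / 2)]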
You can calculate the distance between each pair of consecutive values:
import math

def calculateDistance(x1, y1, x2, y2):
    dist = math.sqrt((x2 - x1)**2 + (y2 - y1)**2)
    return dist

history = [[12,30],[15,30],[25,30],[35,30],[45,30],[50,32],[50,33],[51,32]]

distanceHistory = []
# stop one short of the end so history[idx+1] stays in range
for idx in range(len(history) - 1):
    distance = calculateDistance(history[idx][0], history[idx][1], history[idx+1][0], history[idx+1][1])
    distanceHistory.append(distance)
Answering here as I can't comment yet. For the first question: with only a pair of coordinates you won't be able to easily identify whether the player/object is moving or stopped. You could add a third variable: time. Between two past moves you'd then be able to tell for how long the object stood still or how long the move took. If you add a check every x time period, you'd have even more accuracy and you'd know for sure whether it stood still or not.
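A minimal sketch of that idea, storing (x, y, t) samples instead of just (x, y) (the sample values and the speeds helper are made up for illustration):
import math

history = [(12, 30, 0.0), (15, 30, 0.5), (25, 30, 1.0), (50, 32, 1.5)]

def speeds(samples):
    # average speed between consecutive samples: distance moved / elapsed time
    result = []
    for (x1, y1, t1), (x2, y2, t2) in zip(samples, samples[1:]):
        dist = math.hypot(x2 - x1, y2 - y1)
        result.append(dist / (t2 - t1))
    return result

# the object can be considered "stopped" while its recent speeds stay below some threshold
print(speeds(history))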

Fast great circle for multiple points - Python geopy

Is it possible to speed up great_circle(pos1, pos2).miles from geopy when using it for many thousands of points?
I want to create something like a distance matrix, and at the moment my machine needs 5 seconds for 250,000 calculations.
Actually pos1 is always the same, if that helps.
Another "restriction" in my case is that I only want the points pos2 whose distance to pos1 is less than a constant x.
(The exact distance doesn't matter in my case.)
Is there a fast method? Do I need to use a faster but less accurate function than great_circle, or is it possible to speed it up without losing accuracy?
Update
In my case the question is whether a point is inside a circle.
Therefore it is easy to first check whether the point is inside the circle's bounding square.
start = geopy.Point(mid_point_lat, mid_point_lon)
d = geopy.distance.VincentyDistance(miles=radius)
p_north_lat = d.destination(point=start, bearing=0).latitude
# check whether the given point lat is > p_north_lat
# and so on for east, south and west
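For illustration, a sketch of the full bounding-box pre-filter under the same assumptions as the snippet above (VincentyDistance as used there; in newer geopy versions geopy.distance.geodesic plays the same role). The function name and the no-dateline-crossing assumption are mine:
import geopy
import geopy.distance

def points_within_radius(mid_point_lat, mid_point_lon, radius_miles, points):
    start = geopy.Point(mid_point_lat, mid_point_lon)
    d = geopy.distance.VincentyDistance(miles=radius_miles)
    # compute the four extremes of the circle once
    north = d.destination(point=start, bearing=0).latitude
    east = d.destination(point=start, bearing=90).longitude
    south = d.destination(point=start, bearing=180).latitude
    west = d.destination(point=start, bearing=270).longitude
    for lat, lon in points:
        # cheap square test first (assumes the box does not cross the dateline)
        if south <= lat <= north and west <= lon <= east:
            # exact, more expensive circle test only for the remaining candidates
            if geopy.distance.great_circle((mid_point_lat, mid_point_lon), (lat, lon)).miles <= radius_miles:
                yield lat, lon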

Using flood fill algorithm to determine map area with equal height

I have a list of lists where each element is an integer representing the average height of one square metre of the map (one number = one square metre). For example:
map=[
[1,1,1,1],
[1,1,2,2],
[1,2,2,2]
] # where 1 and 2 are the average heights of those coordinates.
I'm trying to implement a method that, given a position, finds the area around it that has the same height. Let's call these 'flat areas'.
I found a solution in the flood-fill algorithm. However, I'm having some problems when it comes to writing the code. I get a
RuntimeError: maximum recursion depth exceeded
I have no idea where my problem is. Here is the code of the function:
def zona_igual_alcada(self, pos, zones=[], h=None):
    x, y = pos
    if h == None:
        h = base_terreny.base_terreny.__getitem__(self, (x, y))
    if base_terreny.base_terreny.__getitem__(self, (x, y)) != h:
        return
    if x in range(0, self.files) and y in range(0, self.columnes):
        if base_terreny.base_terreny.__getitem__(self, (x, y)) == h:
            zones.append((x, y))
            terreny.zona_igual_alcada(self, (x-1, y), zones, h)
            terreny.zona_igual_alcada(self, (x+1, y), zones, h)
            terreny.zona_igual_alcada(self, (x, y-1), zones, h)
            terreny.zona_igual_alcada(self, (x, y+1), zones, h)
    return set(zones)
You're not doing anything to "mark" the zones you have already visited, so you are doing the same zones over and over until the stack fills up.
This isn't a particularly efficient way to do a flood fill, so if you have a large number of zones you will be better off looking for a more efficient algorithm to do the flood fill (e.g. scanline fill).
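For illustration, a minimal iterative flood fill over a plain list-of-lists grid, using a visited set so each cell is processed only once (a standalone sketch with made-up names, not the questioner's class):
def flat_area(grid, start):
    # return the set of cells connected to `start` that share its height
    rows, cols = len(grid), len(grid[0])
    x0, y0 = start
    h = grid[x0][y0]
    visited = set()
    stack = [start]  # explicit stack avoids hitting the recursion limit
    while stack:
        x, y = stack.pop()
        if (x, y) in visited:
            continue  # already handled: this is the missing "mark" step
        if not (0 <= x < rows and 0 <= y < cols) or grid[x][y] != h:
            continue
        visited.add((x, y))
        stack.extend([(x-1, y), (x+1, y), (x, y-1), (x, y+1)])
    return visited

# usage with the example map from the question:
# terrain = [[1,1,1,1],[1,1,2,2],[1,2,2,2]]
# print(flat_area(terrain, (0, 0)))  # all the cells with height 1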

Euclidean Distance Python Implementation

I am playing with the following code from Programming Collective Intelligence; this is a function from the book that calculates the Euclidean distance between two movie critics.
This function sums the squared differences of the rankings in the dictionary, but Euclidean distance in n dimensions also includes the square root of that sum.
AFAIK, since we use the same function to rank everyone, it does not matter whether we take the square root or not, but I was wondering whether there is a particular reason for that?
from math import sqrt

# Returns a distance-based similarity score for person1 and person2
def sim_distance(prefs, person1, person2):
    # Get the list of shared_items
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1
    # if they have no ratings in common, return 0
    if len(si) == 0: return 0
    # Add up the squares of all the differences
    sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2)
                          for item in prefs[person1] if item in prefs[person2]])
    return 1 / (1 + sum_of_squares)
The reason the square root is not used is that it is computationally expensive; squaring is monotonic (i.e., it preserves order) for non-negative values, so if all you're interested in is the order of the distances, the square root is unnecessary (and, as mentioned, very expensive computationally).
That's correct. While the square root is necessary for a quantitatively correct result, if all you care about is distance relative to others for sorting, then taking the square root is superfluous.
To compute a Cartesian distance, first you must compute the distance-squared, then you take its square root. But computing a square root is computationally expensive. If all you're really interested in is comparing distances, it works just as well to compare the distance-squared--and it's much faster.
For every two real numbers A and B, where A and B are >= zero, it's always true that A-squared and B-squared have the same relationship as A and B:
if A < B, then A-squared < B-squared.
if A == B, then A-squared == B-squared.
if A > B, then A-squared > B-squared.
Since distances are always >= 0 this relationship means comparing distance-squared gives you the same answer as comparing distance.
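A quick illustration of this point (the example points are made up): sorting by squared distance from the origin gives the same order as sorting by true distance.
import math

points = [(3, 4), (1, 1), (6, 8), (0, 2)]

by_true_distance = sorted(points, key=lambda p: math.hypot(p[0], p[1]))
by_squared_distance = sorted(points, key=lambda p: p[0]**2 + p[1]**2)

print(by_true_distance == by_squared_distance)  # True: the ordering is identical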
Just for intercomparisons the square root is not necessary and you would get the squared euclidean distance... which is also a distance (mathematically speaking, see http://en.wikipedia.org/wiki/Metric_%28mathematics%29).
