I have a pygame program where I wish to check whether any of the rabbits are close enough to mate. To do so I use two nested for loops and the distance-between-two-points formula to calculate each distance. This process consumes a lot of my computer's resources and causes the game's performance to drop dramatically.
What is the most efficient way to check each rabbit's distance to every other rabbit?
def mating(rabbits):
    for i in range(len(rabbits)):
        for x in range(len(rabbits)):
            if math.sqrt(math.pow((rabbits[i].xpos - rabbits[x].xpos),2) + math.pow((rabbits[i].ypos - rabbits[x].ypos),2)) <= 20:
                #add a new rabbit
                rabbits.append(rabbit())
In your algorithm, math.sqrt consumes most of the time. Calculating the square root is very expensive. Compare the square of the distance instead of the distance, so you don't have to calculate the square root.
You also calculate the distance between each pair of rabbits twice. Note that you even calculate a rabbit's distance from itself (when i equals x), which is always 0. The outer loop must go through all rabbits, but the inner loop only needs to iterate over the subsequent rabbits in the list (rabbits[i+1:]).
def mating(rabbits):
    for i, rabbit1 in enumerate(rabbits):
        for rabbit2 in rabbits[i+1:]:
            dx = rabbit1.xpos - rabbit2.xpos
            dy = rabbit1.ypos - rabbit2.ypos
            if dx*dx + dy*dy <= 20*20:
                #add a new rabbit
                rabbits.append(rabbit())
So I have two different files containing multiple trajectories in a square map (512x512 pixels). Each file contains information about the spatial position of each particle within a track/trajectory (X and Y coordinates) and which track/trajectory each spot belongs to (TRACK_ID).
My goal is to cluster similar trajectories between the two files. I found a nice way to do this (distance clustering comparison), but the code is too slow. I was just wondering if someone has suggestions to make it faster.
My files look something like this (a table with a TRACK_ID, POSITION_X, POSITION_Y and FRAME column for each spot):
The approach that I implemented finds similar trajectories based on the Fréchet distance (maybe not too relevant here). Below you can find the function that I wrote, but briefly this is the rationale:
group all the spots by track using the pandas.groupby function for file1 (growth_xml) and file2 (shrinkage_xml)
for each trajectory in growth_xml (loop) I compare it with each trajectory in shrinkage_xml
if they pass the Fréchet distance criterion that I defined (an if statement), I save both tracks in a new table. You can see an additional filter condition that I called delay, but I guess that is not important to explain here.
so really simple:
def distance_clustering(growth_xml, shrinkage_xml):
    coords_g = pd.DataFrame() # empty dataframes to save filtered tracks
    coords_s = pd.DataFrame()
    counter = 0 # initialize counter to count number of filtered tracks
    for track_g, param_g in growth_xml.groupby('TRACK_ID'):
        # define growing track as multi-point line object
        traj1 = [(x, y) for x, y in zip(param_g.POSITION_X.values, param_g.POSITION_Y.values)]
        for track_s, param_s in shrinkage_xml.groupby('TRACK_ID'):
            # define shrinking track as a second multi-point line object
            traj2 = [(x, y) for x, y in zip(param_s.POSITION_X.values, param_s.POSITION_Y.values)]
            # compute delay between shrinkage and growing ends to use as an extra filter
            delay = (param_s.FRAME.iloc[0] - param_g.FRAME.iloc[0])
            # keep track only if the Fréchet distance is lower than 0.2 microns
            if frechetDist(traj1, traj2) < 0.2 and delay > 0:
                counter += 1
                param_g = param_g.assign(NEW_ID = np.ones(param_g.shape[0]) * counter)
                coords_g = pd.concat([coords_g, param_g])
                param_s = param_s.assign(NEW_ID = np.ones(param_s.shape[0]) * counter)
                coords_s = pd.concat([coords_s, param_s])
    coords_g.reset_index(drop = True, inplace = True)
    coords_s.reset_index(drop = True, inplace = True)
    return coords_g, coords_s
The main problem is that most of the time I have more than two thousand tracks (!!), and this pairwise comparison takes forever. I'm wondering if there's a simple and more efficient way to do this. Perhaps by doing the pairwise comparison in multiple small areas instead of over the whole map? Not sure...
Have you tried building a (DeltaX, DeltaY) lookup table (LUT) for the pairwise distances? It will take a while to calculate the LUT once, but you can write it to a file and load it when the algorithm starts.
Then you only have to look up the right entry to get the result instead of recalculating it every time.
You could also fit a polynomial regression for the distance calculation; it will be less precise but definitely faster.
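A minimal sketch of the lookup-table idea, assuming integer pixel coordinates on the 512x512 map mentioned in the question (the variable names are just placeholders):

import numpy as np

MAP_SIZE = 512
delta = np.arange(MAP_SIZE)
# dist_lut[abs(dx), abs(dy)] holds the Euclidean distance for that offset,
# so no square root is computed at query time.
dist_lut = np.sqrt(delta[:, None]**2 + delta[None, :]**2)
# np.save('dist_lut.npy', dist_lut)  # optionally cache it on disk

d = dist_lut[abs(130 - 97), abs(402 - 388)]  # distance between two example points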
Maybe not an outright answer, but it's been a while. Could you not segment the lines and use minimum bounding box around each segment to assess similarities? I might be thinking of your problem the wrong way around. I'm not sure. Right now I'm trying to work with polygons from two different data sets and want to optimize the processing by first identifying the polygons in both geometries that overlap.
In your case, I think segments would leave you with some edge artifacts. Maybe look at this paper: https://drops.dagstuhl.de/opus/volltexte/2021/14879/pdf/OASIcs-ATMOS-2021-10.pdf or this one (with Python code): https://www.austriaca.at/0xc1aa5576_0x003aba2b.pdf
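Not the author's method, but a small sketch of the bounding-box idea applied at the whole-track level: if the Fréchet distance between two tracks is below 0.2, their bounding boxes expanded by 0.2 must overlap, so pairs whose boxes do not overlap can be skipped before ever calling frechetDist. The column names and the 0.2 threshold come from the question; everything else is an assumption.

def bbox(track):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of one track."""
    xs, ys = track.POSITION_X.values, track.POSITION_Y.values
    return xs.min(), ys.min(), xs.max(), ys.max()

def boxes_overlap(b1, b2, pad=0.2):
    """True if the two boxes, each expanded by `pad`, overlap."""
    return not (b1[2] + pad < b2[0] or b2[2] + pad < b1[0] or
                b1[3] + pad < b2[1] or b2[3] + pad < b1[1])

def candidate_pairs(growth_xml, shrinkage_xml, pad=0.2):
    """Yield only (param_g, param_s) pairs that could pass the Frechet test."""
    growth = [(grp, bbox(grp)) for _, grp in growth_xml.groupby('TRACK_ID')]
    shrink = [(grp, bbox(grp)) for _, grp in shrinkage_xml.groupby('TRACK_ID')]
    for param_g, box_g in growth:
        for param_s, box_s in shrink:
            if boxes_overlap(box_g, box_s, pad):
                yield param_g, param_s  # only these pairs need frechetDist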
I made the original Battleship game and now I'm looking to upgrade my AI from random guessing to guessing statistically probable locations. I'm having trouble finding algorithms online, so my question is: what kinds of algorithms already exist for this application, and how would I implement one?
Ships: 5, 4, 3, 3, 2
Field: 10X10
Board:
OCEAN = "O"
FIRE = "X"
HIT = "*"
SIZE = 10
SEA = [] # Blank Board
for x in range(SIZE):
    SEA.append([OCEAN] * SIZE)
If you'd like to see the rest of the code, I posted it here: (https://github.com/Dbz/Battleship/blob/master/BattleShip.py); I didn't want to clutter the question with a lot of irrelevant code.
The ultimate naive solution would be to go through every possible placement of ships (legal given what information is known) and count the number of times each square is occupied.
Obviously, on a relatively empty board this will not work, as there are too many permutations, but a good start might be:
For each square on the board, go through all ships and count in how many different ways each one fits at that square, i.e. for each square along the ship's length check whether it fits horizontally and vertically.
An improvement might be to also check, for each possible ship placement, whether the rest of the ships can be placed legally while covering all known 'hits' (squares known to contain a ship).
To improve performance: if only one ship can be placed in a given spot, you no longer need to test it on other spots. Also, when there are many 'hits', it might be quicker to first cover all known 'hits' and, for each possible cover, go through the rest.
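A minimal, hypothetical sketch of the per-square count described above (the naive count only, without the refinements about covering known hits), assuming a board stored like SEA in the question, with "O" for untried water and "X" for recorded misses:

def placement_counts(board, ships=(5, 4, 3, 3, 2)):
    """For every square, count the legal ship placements that cover it."""
    size = len(board)
    counts = [[0] * size for _ in range(size)]
    for length in ships:
        for r in range(size):
            for c in range(size):
                # horizontal placement starting at (r, c)
                if c + length <= size and all(board[r][c + i] != "X" for i in range(length)):
                    for i in range(length):
                        counts[r][c + i] += 1
                # vertical placement starting at (r, c)
                if r + length <= size and all(board[r + i][c] != "X" for i in range(length)):
                    for i in range(length):
                        counts[r + i][c] += 1
    return counts
# The best next guess is an untried square with the highest count.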
Edit: you might want to look into DFS.
Edit: Elaboration on the OP's (@Dbz) suggestion in the comments:
Hold a set of dismissed ship placements ('dismissed'); each placement can be represented as a string, say "4V5x3" for the length-4 ship placed vertically at 5x3 (covering 5x3, 5x4, 5x5, 5x6). After a guess, add all the placements that the guess dismisses; also, for each square, hold the set of placements that intersect it ('placements[x,y]'). The probability would then be:
(34 - |intersection(placements[x,y], dismissed)|) / (3400 - |dismissed|)
To add to the dismissed list:
if the guess at (X,Y) is a miss, add placements[x,y]
if the guess at (X,Y) is a hit:
add neighboring placements (assuming that ships cannot be placed adjacently), i.e. add:
<(2,3a,3b,4,5)>H<X+1>x<Y>, <(2,3a,3b,4,5)>V<X>x<Y+1>
<(2,3a,3b,4,5)>H<X-(2,3,3,4,5)>x<Y>, <(2,3a,3b,4,5)>V<X>x<Y-(2,3,3,4,5)>
2H<X+-1>x<Y+(-2 to 1)>, 3aH<X+-1>x<Y+(-3 to 1)> ...
2V<X+(-2 to 1)>x<Y+-1>, 3aV<X+(-3 to 1)>x<Y+-1> ...
if |intersection(placements[x,y], dismissed)| == 33, i.e. only one placement is possible, add the ship (see later)
check whether any of the previous hits has only one possible placement left; if so, add the ship
check whether any of the ships has only one possible placement left; if so, add the ship
adding a ship:
add all other placements of that ship to dismissed
for each (x,y) of the ship's placement, add placements[x,y] without the actual placement
for each (x,y) of the ship's placement, mark it as a hit guess (if not already known) and run stage 2
for each (x,y) neighboring the ship's placement, mark it as a miss guess (if not already known) and run stage 1
run stages 3 and 4.
I might have overcomplicated this, and there may be some redundant actions, but you get the point.
Nice question, and I like your idea for a statistical approach.
I think I would have tried a machine learning approach for this problem as follows:
First model your problem as a classification problem.
The classification problem is: Given a square (x,y) - you want to tell the likelihood of having a ship in this square. Let this likelihood be p.
Next, you need to develop some 'features'. You can take the surroundings of (x,y) [as you might have partial knowledge of them] as your features.
For example, the features of the middle of the following mini-board (+ indicates the square you want to determine if there is a ship or not in):
OO*
O+*
?O?
can be something like:
f1 = (0,0) = false
f2 = (0,1) = false
f3 = (0,2) = true
f4 = (1,0) = false
**note skipping (1,1)
f5 = (1,2) = true
f6 = (2,0) = unknown
f7 = (2,1) = false
f8 = (2,2) = unknown
I'd implement the features relative to the point of origin (in this case (1,1)) and not as absolute locations on the board (so the corresponding square relative to (3,3) would also be f2).
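A minimal, hypothetical sketch of this feature extraction, assuming the board encoding from the question ("O" untried, "X" miss, "*" hit); the function name and the "?" code for off-board squares are just placeholders:

def neighborhood_features(board, row, col):
    """Return the 8 surrounding cell values in a fixed order relative to (row, col)."""
    size = len(board)
    features = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the square we are trying to classify
            r, c = row + dr, col + dc
            if 0 <= r < size and 0 <= c < size:
                features.append(board[r][c])
            else:
                features.append("?")  # off-board squares are treated as unknown
    return features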
Now, create a training set. The training set is a 'labeled' set of features - based on some real boards. You can create it manually (create a lot of boards), automatically by a random generator of placements, or by some other data you can gather.
Feed the training set to a learning algorithm. The algorithm should be able to handle 'unknowns' and to give a probability of "true" rather than only a boolean answer. I think a variation of Naive Bayes can fit well here.
After you have got a classifier - exploit it with your AI.
When it's your turn, choose to fire upon a square which has the maximal value of p. At first, the shots will be kinda random - but with more shots you fire, you will have more information on the board, and the AI will exploit it for better predictions.
Note that I gave features based on a square of size 1. You can of course choose any k and find features on a bigger square: it will give you more features, but each might be less informative. There is no rule of thumb for which will be better; it should be tested.
The main question is how you are going to find the statistically probable locations: are they already known, or do you want to figure them out?
Either way, I'd just make the grid weighted. In your case, the initial weight for each slot would be 1.0/(SIZE^2). The sum of the weights must equal 1.
You can then adjust weights based on the statistics gathered from N last played games.
Now, when your AI makes a choice, it chooses a coordinate to hit based on weighted probabilities. A quick and simple way to do that would be:
Generate a random number R in range [0..1]
Start from slot (0, 0) adding the weights, i.e. S = W(0, 0) + W(0, 1) + .... where W(n, m) is the weight of the corresponding slot. Once S >= R, you've got the coordinate to hit.
This can be optimised by pre-calculating cumulative weights for each row, have fun :)
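A minimal sketch of the weighted pick described above, assuming the weights are stored in a SIZE x SIZE list of lists that sums to 1 (the function name and layout are just placeholders):

import random

def pick_weighted_slot(weights):
    """Walk the grid accumulating weights until the running sum reaches R."""
    r = random.random()  # R in [0..1)
    s = 0.0
    for row in range(len(weights)):
        for col in range(len(weights[row])):
            s += weights[row][col]
            if s >= r:
                return row, col
    return len(weights) - 1, len(weights[0]) - 1  # guard against rounding

SIZE = 10
weights = [[1.0 / (SIZE * SIZE)] * SIZE for _ in range(SIZE)]
print(pick_weighted_slot(weights))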
Find out which ships are still alive:
alive = [2,2,3,4] # length of alive ships
Find out spots where you have not shot, for example with a numpy.where()
Loop over spots where you can shoot.
Check the sides of the given position: go left and right, how many free spaces are there? Go up and down, how many? If you can fit a boat in that many spaces, you can fit any smaller boat too, so I'd run this loop from the largest ship downwards, adding +1 to this position's count for every ship that fits (see the sketch below).
Once you have done all of this, the position with the most points should be the most likely to attack and hit something.
Of course, it can get as complicated as you want. You can also ask yourself, instead of 'which is my next hit?', which combination of hits will give me victory in the fewest shots, or any other combination/parametrization of the problem. Good luck!
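A minimal, hypothetical sketch of the space-counting heuristic above, assuming the board layout from the question ("O" untried, "X" miss) and that hits ("*") may still be part of a ship's run:

def score_board(board, alive=(5, 4, 3, 3, 2)):
    """Score each untried square by how many remaining ships fit through it."""
    size = len(board)
    scores = [[0] * size for _ in range(size)]

    def free_run(r, c, dr, dc):
        # Count squares in one direction (including the start) until a miss.
        n = 0
        while 0 <= r < size and 0 <= c < size and board[r][c] != "X":
            n += 1
            r, c = r + dr, c + dc
        return n

    for r in range(size):
        for c in range(size):
            if board[r][c] != "O":
                continue  # only score squares we have not shot at yet
            horiz = free_run(r, c, 0, -1) + free_run(r, c, 0, 1) - 1
            vert = free_run(r, c, -1, 0) + free_run(r, c, 1, 0) - 1
            for length in alive:
                scores[r][c] += (horiz >= length) + (vert >= length)
    return scores
# The untried square with the highest score is the most promising shot.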
I am having a hard time figuring out how to calculate when a satellite crosses a specific longitude. It would be nice to be able to provide a time period and a TLE and get back all the times at which the satellite crosses a given longitude during the specified time period. Does pyephem support something like this?
There are so many possible circumstances that users might ask about — when a satellite crosses a specific longitude; when it reaches a specific latitude; when it reaches a certain height or descends to its lowest altitude; when its velocity is greatest or least — that PyEphem does not try to provide built-in functions for all of them. Instead, it provides a newton() function that lets you find the zero-crossing of whatever comparison you want to make between a satellite attribute and a pre-determined value of that attribute that you want to search for.
Note that the SciPy Python library contains several very careful search functions that are much more sophisticated than PyEphem's newton() function, in case you are dealing with a particularly poorly-behaved function:
http://docs.scipy.org/doc/scipy/reference/optimize.html
Here is how you might search for when a satellite — in this example, the ISS — passes a particular longitude, to show the general technique. This is not the fastest possible approach — the minute-by-minute search, in particular, could be sped up if we were very careful — but it is written to be very general and very safe, in case there are other values besides longitude that you also want to search for. I have tried to add documentation and comments to explain what is going on, and why I use znorm instead of returning the simple difference. Let me know if this script works for you, and explains its approach clearly enough!
import ephem
line0 = 'ISS (ZARYA) '
line1 = '1 25544U 98067A 13110.27262069 .00008419 00000-0 14271-3 0 6447'
line2 = '2 25544 51.6474 35.7007 0010356 160.4171 304.1803 15.52381363825715'
sat = ephem.readtle(line0, line1, line2)
target_long = ephem.degrees('-83.8889')
def longitude_difference(t):
    '''Return how far the satellite is from the target longitude.

    Note carefully that this function does not simply return the
    difference of the two longitudes, since that would produce a
    terrible jagged discontinuity from 2pi to 0 when the satellite
    crosses from -180 to 180 degrees longitude, which could happen to be
    a point close to the target longitude. So after computing the
    difference in the two angles we run degrees.znorm on it, so that the
    result is smooth around the point of zero difference, and the
    discontinuity sits as far away from the target position as possible.
    '''
    sat.compute(t)
    return ephem.degrees(sat.sublong - target_long).znorm
t = ephem.date('2013/4/20')
# How did I know to make jumps by minute here? I experimented: a
# `print` statement in the loop showing the difference showed huge jumps
# when looping by a day or hour at a time, but minute-by-minute results
# were small enough steps to bring the satellite gradually closer to the
# target longitude at a rate slow enough that we could stop near it.
#
# The direction that the ISS travels makes the longitude difference
# increase with time; `print` statements at one-minute increments show a
# series like this:
#
# -25:16:40.9
# -19:47:17.3
# -14:03:34.0
# -8:09:21.0
# -2:09:27.0
# 3:50:44.9
# 9:45:50.0
# 15:30:54.7
#
# So the first `while` loop detects if we are in the rising, positive
# region of this negative-positive pattern and skips the positive
# region, since if the difference is positive then the ISS has already
# passed the target longitude and is on its way around the rest of
# the planet.
d = longitude_difference(t)
while d > 0:
    t += ephem.minute
    sat.compute(t)
    d = longitude_difference(t)
# We now know that we are on the negative-valued portion of the cycle,
# and that the ISS is closing in on our longitude. So we keep going
# only as long as the difference is negative, since once it jumps to
# positive the ISS has passed the target longitude, as in the sample
# data series above when the difference goes from -2:09:27.0 to
# 3:50:44.9.
while d < 0:
    t += ephem.minute
    sat.compute(t)
    d = longitude_difference(t)
# We are now sitting at a point in time when the ISS has just passed the
# target longitude. The znorm of the longitude difference ought to be a
# gently sloping zero-crossing curve in this region, so it should be
# safe to set Newton's method to work on it!
tn = ephem.newton(longitude_difference, t - ephem.minute, t)
# This should be the answer! So we print it, and also double-check
# ourselves by printing the longitude to see how closely it matches.
print 'When did ISS cross this longitude?', target_long
print 'At this specific date and time:', ephem.date(tn)
sat.compute(tn)
print 'To double-check, at that time, sublong =', sat.sublong
The output that I get when running this script suggests that it has indeed found the moment (within reasonable tolerance) when the ISS reaches the target longitude:
When did ISS cross this longitude? -83:53:20.0
At this specific date and time: 2013/4/20 00:18:21
To double-check, at that time, sublong = -83:53:20.1
There is a difference between the time at which the program calculates the pass over the longitude and the real time. I've checked it against NASA's LIS system (which is on board the ISS) for detecting lightning.
I have discovered that over Europe, on some orbits, the time the program calculates for the pass is 30 seconds ahead of the real time, and over Colombia, on some orbits, it is about 3 minutes ahead (perhaps because 1 degree of longitude in Colombia spans more kilometres than 1 degree of longitude in Europe). But this problem only happens on 2 particular orbits! The one that passes over France and comes down over Sicily, and the one that passes over the USA and comes down over Cuba.
Why could this be?
In my opinion, maybe there's some mistake in the ephem.newton algorithm, or in the TLE (normally it reads the one created at 00:00:00 when the day changes, not the current one, since 3-4 TLEs per day are issued for the ISS), or maybe the sat.sublong function calculates a wrong nadir for the satellite.
Does anyone have an idea or an explanation for this problem? Why does it happen?
PS: I need to check this carefully because I need to know when the ISS crosses an area (in order to detect the lightning inside that area). If the time the program calculates on some orbits is ahead of the real time, then the program already shows the ISS inside the area while in reality it hasn't arrived yet, so on some occasions the real time doesn't match the one the program calculates.
Thanks a lot for your time !
I have a 1D array of data and wish to extract the spatial variation. The standard way to do this, which I wish to pythonize, is to perform a moving linear regression on the data and save the gradient...
def nssl_kdp(phidp, distance, fitlen):
    kdp = zeros(phidp.shape, dtype=float)
    myshape = kdp.shape
    for swn in range(myshape[0]):
        print "Sweep ", swn+1
        for rayn in range(myshape[1]):
            print "ray ", rayn+1
            small = [polyfit(distance[a:a+2*fitlen], phidp[swn, rayn, a:a+2*fitlen], 1)[0] for a in xrange(myshape[2]-2*fitlen)]
            kdp[swn, rayn, :] = array((list(itertools.chain(*[fitlen*[small[0]], small, fitlen*[small[-1]]]))))
    return kdp
This works well but is SLOW... I need to do this 17*360 times...
I imagine the overhead is in the iterator in the [... for a in xrange(...)] line... Is there an implementation of a moving fit in numpy/scipy?
The calculation for linear regression is based on the sums of various values, so you could write a more efficient routine that modifies those sums as the window moves (adding one point and subtracting an earlier one).
This will be much more efficient than repeating the whole calculation every time the window shifts, but it is open to rounding errors, so you would need to restart the sums occasionally.
You can probably do better than this for equally spaced points by pre-calculating all the x dependencies, but I don't understand your example in detail, so I'm unsure whether it's relevant.
So I'll just assume that it is.
The slope is (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²); see http://easycalculation.com/statistics/learn-regression.php
For evenly spaced data the denominator is fixed (since you can shift the x axis to the start of the window without changing the gradient). The (ΣX) in the numerator is also fixed (for the same reason). So you only need to be concerned with ΣXY and ΣY. The latter is trivial: just add and subtract a value. The former decreases by the sum of the Y values that remain in the window (each X weighting decreases by 1) and increases by (N-1)·Y_new (assuming x_0 is 0 and x_{N-1} is N-1) at each step.
I suspect that's not clear, so: the formula for the slope does not need to be completely recalculated at each step, particularly because, at each step, you can rename the X values as 0, 1, ..., N-1 without changing the slope. Almost everything in the formula stays the same; all that changes are two terms, which depend on Y as Y_0 "drops out" of the window and a new Y "moves in".
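A minimal sketch of that incremental update, assuming evenly spaced points and a window of n samples (the function name is just a placeholder); the running sums of Y and of X*Y are adjusted as the window slides instead of refitting from scratch each time:

import numpy as np

def rolling_slope(y, n):
    """Slope of a least-squares line over every length-n window of y."""
    y = np.asarray(y, dtype=float)
    x = np.arange(n)
    sum_x = x.sum()                        # fixed for every window
    denom = n * (x * x).sum() - sum_x**2   # fixed for every window
    sum_y = y[:n].sum()
    sum_xy = (x * y[:n]).sum()
    slopes = [(n * sum_xy - sum_x * sum_y) / denom]
    for k in range(1, len(y) - n + 1):
        y_old, y_new = y[k - 1], y[k + n - 1]
        sum_xy += -(sum_y - y_old) + (n - 1) * y_new  # re-weight remaining Ys, add new point
        sum_y += y_new - y_old
        slopes.append((n * sum_xy - sum_x * sum_y) / denom)
    return np.array(slopes)

# Cross-check against np.polyfit on a small example:
# rolling_slope([1, 2, 4, 8], 3) gives [1.5, 3.0]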
I've used these moving window functions from the somewhat old scikits.timeseries module with some success. They are implemented in C, but I haven't managed to use them in a situation where the moving window varies in size (not sure if you need that functionality).
http://pytseries.sourceforge.net/lib.moving_funcs.html
Head here for downloads (if using Python 2.7+, you'll probably need to compile the extension itself -- I did this for 2.7 and it works fine):
http://sourceforge.net/projects/pytseries/files/scikits.timeseries/0.91.3/
I/we might be able to help you more if you clean up your example code a bit. I'd consider defining some of the arguments/objects in lines 7 and 8 (where you're defining 'small') as variables, so that you don't end row 8 with so many hard-to-follow parentheses.
OK... I have what seems to be a solution - not an answer per se, but a way of doing a moving, multi-point differential... I have tested this, and the result looks very, very similar to a moving regression... I used a 1D Sobel filter (a ramp from -1 to 1 convolved with the data):
def KDP(phidp, dx, fitlen):
    kdp = np.zeros(phidp.shape, dtype=float)
    myshape = kdp.shape
    for swn in range(myshape[0]):
        #print "Sweep ", swn+1
        for rayn in range(myshape[1]):
            #print "ray ", rayn+1
            kdp[swn, rayn, :] = sobel(phidp[swn, rayn, :], window_len=fitlen)/dx
    return kdp
def sobel(x, window_len=11):
    """Sobel differential filter for calculating KDP.

    output:
        differential signal (unscaled for gate spacing)
    """
    s = np.r_[x[window_len-1:0:-1], x, x[-1:-window_len:-1]]
    #print(len(s))
    w = 2.0*np.arange(window_len)/(window_len-1.0) - 1.0
    #print w
    w = w/(abs(w).sum())
    y = np.convolve(w, s, mode='valid')
    return -1.0*y[window_len/2:len(x)+window_len/2]/(window_len/3.0)
this runs QUICK!