Finding random closest neighbours in 3D - python

I am trying to write Python code that, given the names and GPS positions (latitude, longitude and elevation) of 750 people, finds the names of the 10 closest neighbors of a randomly selected individual.
import random
#random = rand.sample(range(0,750), 10)
coords = [(random.random()*2.0, random.random()*2.0, random.random()*2.0) for _ in range(750)]

To do this you should either work in spherical coordinates, or you can convert to Cartesian. Working in Cartesian makes the assumption that direct distance, and not a great elliptic arc, is how you are measuring distance.
import numpy as np
from sklearn.metrics import DistanceMetric  # older scikit-learn versions: from sklearn.neighbors import DistanceMetric
R = 6371 # approximate radius of earth in km
# coordinates in (lat,lon,elv) in units of (rad,rad,km)
coords = np.random.random((750, 3)) * 2
cart_coords = np.array([((R+coord[2]) * np.cos(coord[0]) * np.cos(coord[1]),
                         (R+coord[2]) * np.cos(coord[0]) * np.sin(coord[1]),
                         (R+coord[2]) * np.sin(coord[0])) for coord in coords])
# calculate distances between points
dist = DistanceMetric.get_metric('euclidean')
dist_vals = dist.pairwise(cart_coords)
# pick a random person
random_person = np.random.choice(np.arange(750))
top_ten = np.where(dist_vals[random_person] < sorted(dist_vals[random_person])[11])[0]
# remove self from list
top_ten = top_ten[top_ten!=random_person]
print(top_ten)
If you wish to ignore the elevation and use the haversine formula, see the post "Vectorizing Haversine distance calculation in Python".
The Earth is an ellipsoid with a difference of about 21km between the polar and equatorial radii. If you really want to go deeper you can look into the science of geodesy. astropy is a good package for this type of problem https://docs.astropy.org/en/stable/api/astropy.coordinates.spherical_to_cartesian.html

Couldn't you just use the distance formula for two points given x, y, z, d=sqrt((x2-x1)^2+(y2-y1)^2+(z2-z1)^2), to get the distance between the randomly selected person and all other elements? Just calculate the distance of every single person from the random person and then keep only the ten lowest values.
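The brute-force idea above can be sketched with NumPy. This is a minimal sketch, assuming the positions have already been converted to Cartesian coordinates; the function name `ten_nearest` and the random test data are made up for illustration.

```python
import numpy as np

def ten_nearest(cart_coords, person, k=10):
    """Return the indices of the k points closest to cart_coords[person]."""
    deltas = cart_coords - cart_coords[person]
    dist = np.sqrt((deltas ** 2).sum(axis=1))   # Euclidean distance to everyone
    dist[person] = np.inf                       # exclude the selected person
    nearest = np.argpartition(dist, k)[:k]      # the k smallest, in no particular order
    return nearest[np.argsort(dist[nearest])]   # order them from closest to farthest

rng = np.random.default_rng(0)
points = rng.random((750, 3)) * 2.0             # stand-in for real Cartesian positions
person = int(rng.integers(750))
neighbours = ten_nearest(points, person)
print(neighbours)
```

For 750 people a single pass like this is cheap; a tree structure only starts to pay off when many queries are run against the same data.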

You could use the excellent BallTree from sklearn:
import numpy as np
from sklearn.neighbors import BallTree
coords = np.random.random((750, 3)) * 2
tree = BallTree(coords)
random_person = np.random.choice(np.arange(750))
# query 11 points because the closest hit is the selected person (distance 0), then drop it
closest_people = tree.query(coords[None, random_person], k=11)[1][0, 1:]

Related

Creating Integer Range from multiple string math expressions in python

I have a problem in a project and I've searched the internet high and low with no clear answer.
How can I convert math expressions
such as 3x + 5y >= 100
and x, y < 500
into a range for x and a range for y, to be used as restrictions in a math problem,
e.g. f = x^2 + 4y?
The end goal is to find the largest value using a genetic algorithm where x and y are restricted in value.
I tried sympy and eval with no luck.
I searched everywhere and found only a few helpful, but insufficient, resources.
I just need to translate the user input into code that the genetic algorithm can use.
Your set of linear inequalities defines a polygon in the plane.
The edges of this polygon are the lines obtained by replacing the inequality sign with an equals sign in each inequality.
The vertices of this polygon are the intersections of two adjacent edges; equivalently, they are the intersections of two edges that satisfy the whole system of non-strict inequalities.
So, one way to find all the vertices of the polygon is to find every intersection point by solving every subsystem of two equalities, then filtering out the points that are outside of the polygon.
import numpy as np
from numpy.linalg import solve, LinAlgError
from itertools import combinations
import matplotlib.pyplot as plt
A = np.array([[-3, -5],[1,0],[0,1]])
b = np.array([-100,500,500])
# find polygon for system of linear inequalities
# expects input in "less than" form:
# A X <= b
def get_polygon(A, b, tol=1e-5):
    polygon = []
    for subsystem in map(list, combinations(range(len(A)), 2)):
        try:
            polygon.append(solve(A[subsystem], b[subsystem]))  # solve subsystem of 2 equalities
        except LinAlgError:
            pass  # parallel lines: no intersection point
    polygon = np.array(polygon)
    polygon = polygon[(polygon @ A.T <= b + tol).all(axis=1)]  # filter out points outside of polygon
    return polygon
polygon = get_polygon(A, b)
polygon = np.vstack((polygon, polygon[0])) # duplicate first point of polygon to "close the loop" before plotting
plt.plot(polygon[:,0], polygon[:,1])
plt.show()
Note that get_polygon will find all vertices of the polygon, but if there are more than 3, they might not be ordered in clockwise order.
If you want to sort the vertices in clockwise order before plotting the polygon, I refer you to this question:
How to sort a list of points in clockwise/anticlockwise in python?
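One possible way to order the vertices is to sort them by their angle around the centroid. This is a minimal sketch rather than the method from the linked question; it assumes the polygon is convex (which holds for the feasible region of a system of linear inequalities), and the name `sort_clockwise` is made up for illustration.

```python
import numpy as np

def sort_clockwise(vertices):
    """Order the vertices of a convex polygon clockwise around their centroid."""
    vertices = np.asarray(vertices, dtype=float)
    centroid = vertices.mean(axis=0)
    # angle of each vertex as seen from the centroid, in (-pi, pi]
    angles = np.arctan2(vertices[:, 1] - centroid[1],
                        vertices[:, 0] - centroid[0])
    return vertices[np.argsort(-angles)]  # decreasing angle means clockwise

square = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])
ordered = sort_clockwise(square)
print(ordered)
```

Applying it to the output of get_polygon before closing the loop should give an outline that does not cross itself.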
Using @Stef's approach in SymPy would give the triangular region of interest like this:
>>> from sympy.abc import x, y
>>> from sympy import Eq, intersection, Triangle, Line
>>> eqs = Eq(3*x+5*y,100), Eq(x,500), Eq(y,500)
>>> Triangle(*intersection(*[Line(eq) for eq in eqs], pairwise=True))
Triangle(Point2D(-800, 500), Point2D(500, -280), Point2D(500, 500))
So x is in range [-800, 500] and y is in range [m, 500] where m is the y value calculated from the equation of the diagonal:
from sympy import solve
m = solve(eqs[0], y)[0]  # m(x)
def yval(xi):
    if xi < -800 or xi > 500:
        return  # x is outside the allowed range
    return m.subs(x, xi)
yval(300)  # -> -160

Cubic search mesh with spheres and points sample (Python)

I have a set of 500,000 spheres 3D coordinates along with their radii (non-unique) which are put in a pandas DataFrame object.
I would like to create an efficient search routine where I sample many points' coordinates, and the routine returns if each point is in a sphere, and if yes, in which sphere it is.
To do that, I read that using a cartesian search mesh can be the solution. I therefore created a mesh with a given cell size, and assigned ids to each sphere. In case it is useful, I also stored the differences between the center of the cell and the center of the sphere.
import pandas as pd
import numpy as np
size_cell = 20
# Here spheres will cross, which should not be the case in the real input. If a point is in two spheres, choose the first one that comes.
fake_coordinates = np.random.uniform(-200, 200, (500000, 3))
data = pd.DataFrame(fake_coordinates, columns=['x','y','z'])
data['r'] = np.random.uniform(1, 3, 500000)
x_vect = np.arange(data.x.min()-np.max(data.r), data.x.max()+np.max(data.r), size_cell)
y_vect = np.arange(data.y.min()-np.max(data.r), data.y.max()+np.max(data.r), size_cell)
z_vect = np.arange(data.z.min()-np.max(data.r), data.z.max()+np.max(data.r), size_cell)
data['i_x'] = ((data.x-x_vect[0])//size_cell).astype(int)
data['i_y'] = ((data.y-y_vect[0])//size_cell).astype(int)
data['i_z'] = ((data.z-z_vect[0])//size_cell).astype(int)
data['dx'] = data.x-(x_vect[data.i_x]+x_vect[data.i_x+1])/2
data['dy'] = data.y-(y_vect[data.i_y]+y_vect[data.i_y+1])/2
data['dz'] = data.z-(z_vect[data.i_z]+z_vect[data.i_z+1])/2
Then, I noticed that a sphere which has its center that belongs to a neighboring cell can cross its cell and therefore, the sampled point can be in this neighbor sphere. So I computed the crossing nature of each sphere.
# dx, dy, dz are offsets from the cell center, so the cell faces lie at +/- size_cell/2
data['crossing_x_left'] = data.dx < -(size_cell/2 - data.r)
data['crossing_x_right'] = data.dx > (size_cell/2 - data.r)
data['crossing_y_left'] = data.dy < -(size_cell/2 - data.r)
data['crossing_y_right'] = data.dy > (size_cell/2 - data.r)
data['crossing_z_down'] = data.dz < -(size_cell/2 - data.r)
data['crossing_z_top'] = data.dz > (size_cell/2 - data.r)
Now I should sample N points:
sample = np.random.uniform(-200, 200, (1000000, 3))
If I take into account only one point and only candidate spheres with the center within the cell:
i_x = (sample[0,0]-x_vect[0])//size_cell
i_y = (sample[0,1]-y_vect[0])//size_cell
i_z = (sample[0,2]-z_vect[0])//size_cell
ok_x = data['i_x']==i_x
ok_y = data['i_y']==i_y
ok_z = data['i_z']==i_z
candidates = data.loc[ok_x & ok_y & ok_z]
Now I want to test many points at the same time, but also take into account the crossing spheres.
But then I run into a method issue. How to efficiently compute (probably meaning processing all points and all spheres at the same time using matrices) if the points belong to a sphere, and if yes, in which sphere they are? The part where spheres can cross the boundary makes it difficult for me to see.
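One possible direction, not taken from the thread, is to replace the hand-built mesh with a k-d tree over the sphere centers, which handles the boundary-crossing case implicitly. The sketch below is a hedged illustration: the function name `find_containing_sphere` and the choice `k=8` are assumptions, and `k` must be at least the largest number of centers that can lie within the maximum radius of any point. It picks the containing sphere with the closest center, and uses smaller arrays than the question so that it runs quickly.

```python
import numpy as np
from scipy.spatial import cKDTree

def find_containing_sphere(centers, radii, points, k=8):
    """For each point, return the index of a sphere containing it, or -1."""
    tree = cKDTree(centers)
    r_max = radii.max()
    # Only centers within r_max of a point can belong to a sphere containing it.
    dist, idx = tree.query(points, k=k, distance_upper_bound=r_max)
    # Missing neighbours come back with dist = inf and idx = len(centers);
    # pad the radii so that those slots can never count as a hit.
    padded = np.append(radii, -1.0)
    inside = dist <= padded[idx]
    first = inside.argmax(axis=1)          # closest center whose sphere contains the point
    hit = inside.any(axis=1)
    rows = np.arange(len(points))
    return np.where(hit, idx[rows, first], -1)

rng = np.random.default_rng(0)
centers = rng.uniform(-200, 200, (5000, 3))
radii = rng.uniform(1, 3, 5000)
points = rng.uniform(-200, 200, (20000, 3))
sphere_id = find_containing_sphere(centers, radii, points)
print((sphere_id >= 0).sum(), "points fall inside a sphere")
```

With the DataFrame from the question, `centers` would be `data[['x','y','z']].to_numpy()` and `radii` would be `data['r'].to_numpy()`.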

How to calculate 3D distance (including altitude) between two points in GeoDjango

Prologue:
This is a question arising often in SO:
3d distance calculations with GeoDjango
Calculating distance between two points using latitude longitude and altitude (elevation)
Distance between two 3D point in geodjango (postgis)
I wanted to compose an example on SO Documentation but the geodjango chapter never took off and since the Documentation got shut down on August 8, 2017, I will follow the suggestion of this widely upvoted and discussed meta answer and write my example as a self-answered post.
Of course, I would be more than happy to see any different approach as well!!
Question:
Assume the model:
class MyModel(models.Model):
    name = models.CharField()
    coordinates = models.PointField()
Where I store the point in the coordinates field as a lat, lng, alt point:
MyModel.objects.create(
    name='point_name',
    coordinates='SRID=3857;POINT Z (100.00 10.00 150)')
I am trying to calculate the 3D distance between two such points:
p1 = MyModel.objects.get(name='point_1').coordinates
p2 = MyModel.objects.get(name='point_2').coordinates
d = Distance(m=p1.distance(p2))
Now d=X in meters.
If I change only the altitude of one of the points in question:
For example:
p1.coordinates = 'SRID=3857;POINT Z (100.00 10.00 200)'
from 150 previously, the calculation:
d = Distance(m=p1.distance(p2))
returns d=X again, like the elevation is ignored.
How can I calculate the 3D distance between my points?
Reading from the documentation on the GEOSGeometry.distance method:
Returns the distance between the closest points on this geometry and the given geom (another GEOSGeometry object).
Note
GEOS distance calculations are linear – in other words, GEOS does not perform a spherical calculation even if the SRID specifies a geographic coordinate system.
Therefore we need to implement a method to calculate a more accurate 2D distance between 2 points and then we can try to apply the altitude (Z) difference between those points.
1. Great-Circle 2D distance calculation (Take a look at the 2022 UPDATE below the explanation for a better approach using geopy):
The most common way to calculate the distance between 2 points on the surface of a sphere (as the Earth is simplistically but usually modeled) is the Haversine formula:
The haversine formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes.
Although from the great-circle distance wiki page we read:
Although this formula is accurate for most distances on a sphere, it too suffers from rounding errors for the special (and somewhat unusual) case of antipodal points (on opposite ends of the sphere). A formula that is accurate for all distances is the following special case of the Vincenty formula for an ellipsoid with equal major and minor axes.
We can create our own implementation of the Haversine or the Vincenty formula (as shown here for Haversine: Haversine Formula in Python (Bearing and Distance between two GPS points)) or we can use one of the already implemented methods contained in geopy:
geopy.distance.great_circle (Haversine):
from geopy.distance import great_circle
newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)
# This call will result in 536.997990696 miles
great_circle(newport_ri, cleveland_oh).miles
geopy.distance.vincenty (Vincenty):
from geopy.distance import vincenty
newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)
# This call will result in approximately 538.39 miles
vincenty(newport_ri, cleveland_oh).miles
!!!2022 UPDATE: On 2D distance calculation using geopy:
GeoPy discourages the use of Vincenty as of version 1.14.0. Changelog states:
CHANGED: Vincenty usage now issues a warning. Geodesic should be used instead. Vincenty is planned to be removed in geopy 2.0. (#293)
So (especially if we are going to apply the calculation on a WGS84 ellipsoid) we should use geodesic distance instead:
from geopy.distance import geodesic
newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)
# This call will result in 538.390445368 miles
geodesic(newport_ri, cleveland_oh).miles
2. Adding altitude to the mix:
As mentioned, each of the above calculations yields a great circle distance between 2 points. That distance is also called "as the crow flies", assuming that the "crow" flies without changing altitude and as straight as possible from point A to point B.
We can have a better estimation of the "walking/driving" ("as the crow walks"??) distance by combining the result of one of the previous methods with the difference (delta) in altitude between point A and point B, inside the Euclidean Formula for distance calculation:
acw_dist = sqrt(great_circle(p1, p2).m**2 + (p1.z - p2.z)**2)
The previous solution becomes increasingly inaccurate as the real distance between the points grows. I leave it here so that the existing comments continue to make sense.
GeoDjango Distance calculates the 2D distance between two points and doesn't take into consideration the altitude differences.
In order to get the 3D calculation, we need to create a distance function that will consider altitude differences in the calculation:
Theory:
Latitude, longitude and altitude are spherical coordinates, and we need to translate them to Cartesian coordinates (x, y, z) in order to apply the Euclidean formula and calculate their 3D distance.
Assume:
polar_point_1 = (long_1, lat_1, alt_1)
and polar_point_2 = (long_2, lat_2, alt_2)
Translate each point to its Cartesian equivalent using this formula, where the angles are in radians and alt is the radial distance from the Earth's center (Earth radius plus elevation above the surface), not the elevation alone:
x = alt * cos(lat) * sin(long)
y = alt * sin(lat)
z = alt * cos(lat) * cos(long)
and you will have the points p_1 = (x_1, y_1, z_1) and p_2 = (x_2, y_2, z_2) respectively.
Finally use the Euclidean formula:
dist = sqrt((x_2-x_1)**2 + (y_2-y_1)**2 + (z_2-z_1)**2)
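The conversion and the Euclidean formula can be put together as follows. This is a minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km, latitude and longitude in degrees, and altitude in meters above the surface; the names `EARTH_RADIUS_M`, `to_cartesian` and `distance_3d` are made up for illustration. The axes are labeled differently from the formula above, which does not change the resulting distance.

```python
from math import radians, sin, cos, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical-Earth assumption)

def to_cartesian(lat_deg, lon_deg, alt_m):
    """Convert (latitude, longitude, altitude above the surface) to x, y, z in meters."""
    lat, lon = radians(lat_deg), radians(lon_deg)
    r = EARTH_RADIUS_M + alt_m          # radial distance from the Earth's center
    return (r * cos(lat) * cos(lon),
            r * cos(lat) * sin(lon),
            r * sin(lat))

def distance_3d(p1, p2):
    """Straight-line (chord) distance in meters between two (lat, lon, alt) points."""
    x1, y1, z1 = to_cartesian(*p1)
    x2, y2, z2 = to_cartesian(*p2)
    return sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2)

# Same latitude and longitude, 50 m apart in altitude: the result is about 50 m
same_spot = distance_3d((10.0, 100.0, 150.0), (10.0, 100.0, 200.0))
# One degree of longitude apart on the equator at sea level
one_degree = distance_3d((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(same_spot, one_degree)
```

Note that this is the straight line through the Earth, so it underestimates the travel distance more and more as the points move apart.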
Using geopy is the easiest solution.
https://geopy.readthedocs.io/en/stable/#geopy.distance.lonlat
>>> from geopy.distance import distance
>>> from geopy.point import Point
>>> a = Point(41.49008, -71.312796, 0)  # Point takes (latitude, longitude, altitude)
>>> b = Point(41.499498, -81.695391, 0)
>>> print(distance(a, b).miles)
538.3904453677203
Once converted into Cartesian coordinates, you can compute the norm with numpy:
np.linalg.norm(point_1 - point_2)

Scipy: how to convert KD-Tree distance from query to kilometers (Python/Pandas)

This post builds upon this one.
I got a Pandas dataframe containing cities with their geo-coordinates (geodetic) as longitude and latitude.
import pandas as pd
df = pd.DataFrame([{'city':"Berlin", 'lat':52.5243700, 'lng':13.4105300},
{'city':"Potsdam", 'lat':52.3988600, 'lng':13.0656600},
{'city':"Hamburg", 'lat':53.5753200, 'lng':10.0153400}])
For each city I'm trying to find the two other cities that are closest. Therefore I tried scipy.spatial.KDTree. To do so, I had to convert the geodetic coordinates into 3D Cartesian coordinates (ECEF = earth-centered, earth-fixed):
from math import *
def to_Cartesian(lat, lng):
    R = 6367 # radius of the Earth in kilometers
    x = R * cos(lat) * cos(lng)
    y = R * cos(lat) * sin(lng)
    z = R * sin(lat)
    return x, y, z
df['x'], df['y'], df['z'] = zip(*map(to_Cartesian, df['lat'], df['lng']))
df
This gives me a dataframe with the additional columns x, y and z.
With this I can create the KDTree:
coordinates = list(zip(df['x'], df['y'], df['z']))
from scipy import spatial
tree = spatial.KDTree(coordinates)
tree.data
Now I'm testing it with Berlin,
tree.query(coordinates[0], 2)
which correctly gives me Berlin (itself) and Potsdam as the two cities from my list that are closest to Berlin.
Question: But I wonder what to do with the distance from that query? It says 1501 - but how can I convert this to meters or kilometers? The real distance between Berlin and Potsdam is 27km and not 1501km.
Remark: I know I could get longitude/latitude for both cities and calculate the haversine-distance. But would be cool that use the output from KDTree instead.
(array([ 0. , 1501.59637685]), array([0, 1]))
Any help is appreciated.
The KDTree is computing the euclidean distance between the two points (cities). The two cities and the center of the earth form an isosceles triangle.
The German wikipedia entry contains a nice overview of the geometric properties which the English entry lacks. You can use this to compute the distance.
import numpy as np
from math import sin
def deg2rad(degree):
    rad = degree * 2*np.pi / 360
    return rad
def distToKM(x):
    R = 6367  # earth radius
    gamma = 2*np.arcsin(deg2rad(x/(2*R)))  # compute the angle of the isosceles triangle
    dist = 2*R*sin(gamma/2)  # compute the side of the triangle
    return dist
distToKM(1501.59637685)
# 26.207800812050056
Update
After the comment about obtaining the opposite I re-read the question and realised that while it seems that one can use the proposed function above, the real problem lies somewhere else.
cos and sin in your function to_Cartesian expect the input to be in radians (documentation) whereas you are handing them the angles in degree. You can use the function deg2rad defined above to transform the latitude and longitude to radians. This should give you the distance in km directly from the KDTree.
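The fix described in the update can be sketched as follows. This is a hedged illustration rather than the answerer's code: the names `to_cartesian` and `chord_to_arc` are made up, and a spherical Earth with the question's radius of 6367 km is assumed. With the angles converted to radians, the Euclidean distance is the chord through the sphere, which can then be turned into the distance along the surface using the isosceles triangle mentioned above.

```python
from math import radians, sin, cos, asin

R = 6367  # radius of the Earth in kilometers, same value as in the question

def to_cartesian(lat_deg, lng_deg):
    """Convert degrees of latitude/longitude to x, y, z in kilometers."""
    lat, lng = radians(lat_deg), radians(lng_deg)   # trig functions expect radians
    return (R * cos(lat) * cos(lng),
            R * cos(lat) * sin(lng),
            R * sin(lat))

def chord_to_arc(chord_km):
    """Convert a straight-line (chord) distance into a distance along the surface."""
    return 2 * R * asin(chord_km / (2 * R))

berlin = to_cartesian(52.52437, 13.41053)
potsdam = to_cartesian(52.39886, 13.06566)
# Euclidean distance, which is what a KDTree query would report
chord = sum((a - b) ** 2 for a, b in zip(berlin, potsdam)) ** 0.5
arc = chord_to_arc(chord)
print(chord, arc)  # both roughly 27 km; they differ noticeably only over long distances
```

For nearby cities the chord and the arc are nearly identical, so the KDTree distance can be read as kilometers directly once the input is in radians.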

Efficiently finding the closest coordinate pair from a set in Python

The Problem
Imagine I am stood in an airport. Given a geographic coordinate pair, how can one efficiently determine which airport I am stood in?
Inputs
A coordinate pair (x,y) representing the location I am stood at.
A set of coordinate pairs [(a1,b1), (a2,b2)...] where each coordinate pair represents one airport.
Desired Output
A coordinate pair (a,b) from the set of airport coordinate pairs representing the closest airport to the point (x,y).
Inefficient Solution
Here is my inefficient attempt at solving this problem. It is clearly linear in the length of the set of airports.
shortest_distance = None
shortest_distance_coordinates = None
point = (50.776435, -0.146834)
for airport in airports:
    distance = compute_distance(point, airport)
    # check for None first: comparing a number with None raises TypeError in Python 3
    if shortest_distance is None or distance < shortest_distance:
        shortest_distance = distance
        shortest_distance_coordinates = airport
The Question
How can this solution be improved? This might involve some way of pre-filtering the list of airports based on the coordinates of the location we are currently stood at, or sorting them in a certain order beforehand.
Using a k-dimensional tree:
>>> from scipy import spatial
>>> airports = [(10,10),(20,20),(30,30),(40,40)]
>>> tree = spatial.KDTree(airports)
>>> tree.query([(21,21)])
(array([ 1.41421356]), array([1]))
Where 1.41421356 is the distance between the queried point and the nearest neighbour and 1 is the index of the neighbour.
See: http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.query.html#scipy.spatial.KDTree.query
If your coordinates are unsorted, your search can only be improved slightly. Assuming they are (latitude, longitude) pairs, you can filter on latitude first, since on Earth
1 degree of latitude on the sphere is 111.2 km or 69 miles
but that would not give a huge speedup.
If you sort the airports by latitude first then you can use a binary search for finding the first airport that could match (airport_lat >= point_lat-tolerance) and then only compare up to the last one that could match (airport_lat <= point_lat+tolerance) - but take care of 0 degrees equaling 360. While you cannot use that library directly, the sources of bisect are a good start for implementing a binary search.
While technically this way the search is still O(n), you have much fewer actual distance calculations (depending on tolerance) and few latitude comparisons. So you will have a huge speedup.
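The latitude prefilter can be sketched with the standard bisect module by keeping a parallel, sorted list of latitudes. This is a minimal sketch under several assumptions: planar distance stands in for compute_distance, `tolerance` is in degrees and must be at least as large as the distance to the true nearest airport, and wrap-around of the coordinates is ignored. The name `nearest_airport` is made up for illustration.

```python
from bisect import bisect_left, bisect_right
from math import hypot

def nearest_airport(point, airports_sorted, latitudes, tolerance):
    """Return the closest airport whose latitude is within tolerance of the point."""
    lo = bisect_left(latitudes, point[0] - tolerance)    # first airport that could match
    hi = bisect_right(latitudes, point[0] + tolerance)   # one past the last that could match
    candidates = airports_sorted[lo:hi]
    if not candidates:
        return None
    # only the candidates in the latitude band need a full distance calculation
    return min(candidates,
               key=lambda a: hypot(a[0] - point[0], a[1] - point[1]))

airports = [(40, 40), (10, 10), (30, 30), (20, 20)]
airports_sorted = sorted(airports)                # sorted by latitude first
latitudes = [a[0] for a in airports_sorted]       # parallel list used by bisect
closest = nearest_airport((21, 21), airports_sorted, latitudes, tolerance=5)
print(closest)  # (20, 20)
```

Sorting is done once up front, so each lookup costs two binary searches plus the distance calculations inside the band.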
From this SO question:
import numpy as np
def closest_node(node, nodes):
    nodes = np.asarray(nodes)
    deltas = nodes - node
    dist_2 = np.einsum('ij,ij->i', deltas, deltas)  # squared distance to each node
    return np.argmin(dist_2)
where node is a tuple with two values (x, y) and nodes is an array of tuples with two values ([(x_1, y_1), (x_2, y_2),])
The answer of @Juddling is great, but KDTree does not support haversine distance, which is better suited for latitude/longitude coordinates.
For the haversine distance you can use BallTree. Please note, that you need to convert your coordinates to radians first.
from math import radians
from sklearn.neighbors import BallTree
import numpy as np
airports = [(10,10),(20,20),(30,30),(40,40)]
airports_rad = np.array([[radians(x[0]), radians(x[1])] for x in airports ])
tree = BallTree(airports_rad , metric = 'haversine')
result = tree.query([(radians(21),radians(21))])
print(result)
gives
(array([[0.02391369]]), array([[1]], dtype=int64))
To convert the distance to meters you need to multiply by the earth radius (in meters).
earth_radius = 6371000 # mean Earth radius in meters
print(result[0][0] * earth_radius)
[152354.11114795]
