Optimization function for the sum of Google Maps distances - Python

I am trying to find a point (latitude/longitude) that minimizes the sum of Google Maps distances to N other points.
I was able to extract the Google Maps distances between my latitude and longitude arrays, but I wasn't able to minimize my function.
Code
import googlemaps
import numpy as np
from scipy.optimize import minimize

def minimize_g(input_g):
    gmaps1 = googlemaps.Client(key="xxx")

    def distance_f(x):
        # Google Maps distances from the candidate point x to every input point
        dist = gmaps1.distance_matrix([x], np.array(input_g)[:, 1:3])
        sum_ = 0
        for obs in range(len(dist['rows'][0]['elements'])):
            sum_ += dist['rows'][0]['elements'][obs]['distance']['value']
        return sum_

    # initial guess: centroid of the coordinate columns
    centroid = np.array(input_g)[:, 1:3].mean(axis=0)
    optimization = minimize(distance_f, centroid, method='COBYLA')
    return optimization.x
Thanks!

If you are looking for any point on the map that results in the shortest total distance to all coordinates in your list, you can start by writing a function that calculates the distance from one coordinate to another. Once you have that function, it's a matter of calculating the total distance from a test point to all of your points.
Then, over some artificially created coordinates, you would minimize the distances to all your points with something along the lines of
import numpy as np

lats = [12.3, 12.4, 12.5]
lons = [16.1, 15.1, 14.1]

def total_distance_to_lats_and_lons(lat, lon):
    # placeholder: straight-line degree distance; swap in haversine or a
    # road-distance call for real use
    return sum(((lat - a)**2 + (lon - b)**2)**0.5 for a, b in zip(lats, lons))

# create two lists with 0.01 degree precision as an artificial grid of possibilities
test_lats = np.arange(min(lats), max(lats), 0.01)
test_lons = np.arange(min(lons), max(lons), 0.01)
test_distances = []  # to fill with the total distance to each combination of test_lat, test_lon
coordinate_combinations = []  # corresponding coordinates
for test_lat in test_lats:
    for test_lon in test_lons:
        coordinate_combinations.append([test_lat, test_lon])  # add a combination of coordinates
        test_distances.append(total_distance_to_lats_and_lons(test_lat, test_lon))  # add a distance

index_of_best_test_coordinate = np.argmin(test_distances)  # find index of the minimum value
print('Best match is index {}'.format(index_of_best_test_coordinate))
print('Coordinates: {}'.format(coordinate_combinations[index_of_best_test_coordinate]))
print('Total distance: {}'.format(test_distances[index_of_best_test_coordinate]))
This brute-force approach has precision limits and quickly becomes an expensive loop, so you can also apply the method iteratively, as sketched below: after each round, narrow the start and end points of the test coordinate lists around the minimum found so far and increase the precision. After a few iterations you should have a fairly precise estimate. On the other hand, such an iterative method can converge to one of several local minima, yielding only one of multiple solutions.
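A minimal sketch of that iterative refinement, assuming the total_distance_to_lats_and_lons function and the lats/lons lists from above; the shrink factor and iteration count are arbitrary choices:
import numpy as np

lat_lo, lat_hi = min(lats), max(lats)
lon_lo, lon_hi = min(lons), max(lons)
step = 0.01
best = None
for _ in range(4):  # each round shrinks the window and refines the grid
    test_lats = np.arange(lat_lo, lat_hi, step)
    test_lons = np.arange(lon_lo, lon_hi, step)
    combos = [(la, lo) for la in test_lats for lo in test_lons]
    dists = [total_distance_to_lats_and_lons(la, lo) for la, lo in combos]
    best = combos[int(np.argmin(dists))]
    # center a smaller, finer window on the current best estimate
    lat_lo, lat_hi = best[0] - 5*step, best[0] + 5*step
    lon_lo, lon_hi = best[1] - 5*step, best[1] + 5*step
    step /= 10
print('Refined estimate:', best)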

Related

How do I calculate bi-directional distance without using for loops?

I have a list with n numbers. I need to calculate the BD distance from each member of the list to all other members, sum the distances for each number, and then select the number that has the lowest sum of distances to all other members. I used two for loops for this, but it is time-consuming. Is there a way to calculate this distance without using for loops? As you know, the distance between i and j is symmetric, yet we calculate it twice in the loops. I have the distances of all points to each other in a big NumPy array; here we have some part of the points in each cluster.
for itr2 in range(K):
    tmp_cl = clusters[itr2+1]
    if len(tmp_cl) > 1:
        BD_cent = np.zeros((len(tmp_cl), 1))
        for itr3 in range(len(tmp_cl)):
            sumv = 0
            for itr5 in range(len(tmp_cl)):
                BD_R = bd_rate(rate, tmp_cl[itr3,:], rate, tmp_cl[itr5,:])
                BD_R = (BD_R - min_BDR) / (max_BDR - min_BDR)
                BD_Q = bd_PSNR(rate, tmp_cl[itr3,:], rate, tmp_cl[itr5,:])
                BD_Q = (BD_Q - min_BDQ) / (max_BDQ - min_BDQ)
                value = wr*BD_R + wq*BD_Q
                if value != np.NINF:
                    sumv += value
                else:
                    sumv += 1000  # for curves which have no overlap with others
            BD_cent[itr3] = sumv / len(tmp_cl)
        new_centroid_index = np.argmin(BD_cent)
        centroid[itr2] = clusters[itr2+1][new_centroid_index]
This should be O(N log(N)) for the sort, and O(N) to find the minimum.
import numpy as np
import matplotlib.pyplot as plt
data = np.array([11, 32, 71, 167, 217, 308, 366, 411, 449])
x = np.sort(data)
position = np.arange(len(x))
# distances between neighboring points
xdiff = np.diff(x, prepend=0, append=0)
# From here, most calculations are unnecessary, because
# the minimum point depends on local conditions, and
# can be obtained from the change in position and xdiff
# delta distance = (distance between neighboring points)*position
xAccumLeftToRight = np.cumsum(xdiff[:-1]*position)
xAccumRightToLeft = np.cumsum(xdiff[::-1][:-1]*position)[::-1]
# Sum of distances to the left and to the right
sumDist = xAccumLeftToRight+xAccumRightToLeft
# Finding the minimum. This can be sped up with a log(n) search
# because sumDist descends monotonically to the center
# (and then ascends monotonically)
indexMin = np.argmin(sumDist)
print(f"minimum distance at {x[indexMin]}")
print(xdiff)
print(xAccumLeftToRight)
print(xAccumRightToLeft)
print(sumDist)
plt.plot(x, sumDist)
plt.scatter(x[indexMin], sumDist[indexMin])
plt.show()
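If, as the question mentions, the full pairwise distance matrix is already available as a NumPy array, the medoid (the member with the lowest sum of distances) can also be picked without Python-level loops; a minimal sketch, assuming D is the precomputed (n, n) array:
import numpy as np

D = np.random.rand(5, 5)                 # stand-in for the precomputed distance matrix
D = (D + D.T) / 2                        # pairwise distances are symmetric
sum_distances = D.sum(axis=1)            # sum of distances from each point to all others
best_index = np.argmin(sum_distances)    # the member with the lowest total distance
print(best_index)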

How do I find the closest point for each point in a data set while keeping precision?

This is my data set:
https://pastebin.com/SsuKP2eH
I'm trying to find the nearest point for all points in the data set. These points are latitude and longitude on the Earth's surface. Of course, the nearest point cannot be the same point.
I tried the KDTree solutions listed in this post: https://stackoverflow.com/a/45128643 and changed the poster's random points (generated by np.random.uniform) to my own data set.
I expected to get an array full of distances, but instead, I got an array full of zeroes with some numbers like 2.87722e-06 and 0.616582 sprinkled in. This wasn't what I wanted. I tried the other solution, NearestNeighbours, on my data set and got the same result. So, I did some debugging and reduced the range of random numbers he used, making it closer to my own data set.
import numpy as np
import scipy.spatial as spatial
import pandas as pd

R = 6367  # Earth radius in km

def using_kdtree(data):
    "Based on https://stackoverflow.com/q/43020919/190597"
    def dist_to_arclength(chord_length):
        """
        https://en.wikipedia.org/wiki/Great-circle_distance
        Convert Euclidean chord length to great circle arc length
        """
        central_angle = 2*np.arcsin(chord_length/(2.0*R))
        arclength = R*central_angle
        return arclength
    phi = np.deg2rad(data['Latitude'])
    theta = np.deg2rad(data['Longitude'])
    data['x'] = R * np.cos(phi) * np.cos(theta)
    data['y'] = R * np.cos(phi) * np.sin(theta)
    data['z'] = R * np.sin(phi)
    tree = spatial.KDTree(data[['x', 'y', 'z']])
    distance, index = tree.query(data[['x', 'y', 'z']], k=2)
    return dist_to_arclength(distance[:, 1])
    #return distance, index

np.random.seed(2017)
N = 1000
#data = pd.DataFrame({'Latitude':np.random.uniform(-90,90,size=N), 'Longitude':np.random.uniform(0, 360,size=N)})
data = pd.DataFrame({'Latitude':np.random.uniform(-49.19,49.32,size=N), 'Longitude':np.random.uniform(-123.02, -123.23,size=N)})
result = using_kdtree(data)
I found that the resulting distances array had small values, close to 0. This makes me believe that the reason why the result array for my data set is full of zeroes is because the differences between points are very small. Somewhere, the KD Tree/nearest neighbours loses precision and outputs garbage. Is there a way to make them keep the precision of my floats? The brute-force method can keep precision but it is far too slow with 7200 points to iterate through.
I think what's happening is that k=2 in
distance, index = tree.query(data[['x', 'y', 'z']], k=2)
tells KDTree you want the closest two points to each point. The closest is obviously the point itself, with a distance of zero from itself. Also, if you print index, you see an Nx2 array in which each row starts with its own row number: this is KDTree's way of saying that the closest point to the i-th point is the i-th point itself.
Obviously that is not useful, and you probably want only the 2nd-closest point. Fortunately, I found this in the documentation of the k parameter of query:
Either the number of nearest neighbors to return, or a list of the
k-th nearest neighbors to return, starting from 1.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.query.html#scipy.spatial.KDTree.query
So
distance, index = tree.query(data[['x', 'y','z']], k=[2])
gives only the distance and index of the 2nd-closest point.
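A sketch of how this would slot into the question's using_kdtree function; note that the column index changes too, since k=[2] makes distance an (N, 1) array:
# inside using_kdtree, everything else unchanged
distance, index = tree.query(data[['x', 'y', 'z']], k=[2])
return dist_to_arclength(distance[:, 0])  # column 0 now holds the 2nd-closest distance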

Efficient way to sample N points from a 3D space with the constraint of a minimum distance between two points in Python

I have 200 data points; each point is a list of 3 numbers representing its position. I want to sample N=100 points from this 3D space, with the constraint that the minimum distance between any two points must be larger than 0.15. The script below is how I sample the points, but it just keeps running and never stops. Also, if I set N larger than some value, the code cannot find all N points, because I sample each point randomly and it reaches a state where no remaining point is far enough from the current ones, even though in reality N could be much larger if the point distribution is very "dense" (while still satisfying the minimum distance of 0.15). Is there a more efficient way to do this?
import numpy as np
import random
import time

def get_random_points_not_too_close(points, npoints, min_distance):
    random.shuffle(points)
    final_points = [points[0]]
    while len(final_points) < npoints:
        for point in points:
            if point in final_points:
                continue
            elif min([np.linalg.norm(np.array(p) - np.array(point)) for p in final_points]) > min_distance:
                final_points.append(point)
    return final_points

data = [[random.random() for i in range(3)] for j in range(200)]
t1 = time.time()
sample_points = get_random_points_not_too_close(points=data, npoints=100, min_distance=0.15)
t2 = time.time()
print(t2 - t1)
Your algorithm could work for small sets of points, but it does not run in a deterministic time.
I did the following to create a random forest (simulated trees): first generate a square grid of points whose grid spacing is 3x the minimal distance. Then take each point of the regular grid and translate it in a random direction, by a random distance that is at most the minimal distance. The resulting points can never be closer to each other than the minimal distance.
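A minimal sketch of that jittered-grid idea in 3D, assuming the unit cube and the min_distance = 0.15 from the question; the grid spacing and jitter bound follow the 3x rule above:
import numpy as np

def jittered_grid_3d(min_distance=0.15, seed=0):
    rng = np.random.default_rng(seed)
    spacing = 3 * min_distance  # grid spacing: 3x the minimal distance
    axis = np.arange(0.0, 1.0, spacing)
    grid = np.stack(np.meshgrid(axis, axis, axis), axis=-1).reshape(-1, 3)
    # translate each grid point in a random direction by at most min_distance
    directions = rng.normal(size=grid.shape)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    lengths = rng.uniform(0.0, min_distance, size=(len(grid), 1))
    return grid + directions * lengths

points = jittered_grid_3d()
# any two points are at least 3*d - 2*d = d apart, where d = min_distance;
# the grid spacing also caps how many points fit in the unit cube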

Identifying weighted clusters with max distance diameter and sum(weight) > 50

Problem
I need a way to find 2-mile clusters of points, where each point has a value, and to identify the 2-mile areas whose sum(value) > 50.
Data
I have data that looks like the following
ID         COUNT  LATITUDE   LONGITUDE
187601546  20     025.56394  -080.03206
187601547  25     025.56394  -080.03206
187601548  4      025.56394  -080.03206
187601550  0      025.56298  -080.03285
Roughly 200K records. What I need to determine is whether there are any areas where the sum of the counts exceeds 65 within a one-mile radius (two-mile diameter).
Using each point as a center for an area
Now, I have python code from another project that will draw a shapefile around a point of x diameter as follows:
from geopy import Point
from geopy.distance import vincenty

def poly_based_on_distance(center_lat, center_long, distance, bearing):
    # bearing is in degrees, distance in miles
    # print('center', center_lat, center_long)
    destination = (vincenty(miles=distance).destination(
        Point(center_lat, center_long), bearing).format_decimal())
    return destination
And a routine to return destination and then see which points are inside the radius.
## This is the evaluation for overlap between points and
## area polyshapes
area_list = []
store_geo_dict = {}
for stores in locationdict:
    location = Polygon(locationdict[stores])
    for areas in AREAdictionary:
        area = Polygon(AREAdictionary[areas])
        if location.intersects(area):
            area_list.append(areas)
    store_geo_dict[stores] = area_list
    area_list = []
At this point, I am simply drawing a circular shapefile around each of the 200K points, seeing which other points fall inside, and summing their counts.
Need Clustering Algorithm?
However, there might be an area with the required count density where none of the points sits at its center.
I'm familiar with clustering algorithms such as DBSCAN that use attributes for classification, but this is a matter of finding density clusters using a value for each point. Is there any clustering algorithm that finds clusters of a 2-mile diameter circle where the total count inside is >= 50?
Any suggestions? Python or R are the preferred tools, but this is wide-open and probably a one-off, so computational efficiency is not a priority.
Not a complete solution, but maybe it will help simplify the problem, depending on the distribution of your data. I will use planar coordinates and cKDTree in my example; this might work with geographic data if you can ignore curvature in a projection.
The main observation is the following: a point (x, y) does not contribute to a dense cluster if the ball of radius 2*r (e.g. 2 miles) around (x, y) holds less than the cutoff value (e.g. 50 in your title). In fact, in that case no point within r of (x, y) contributes to any dense cluster.
This allows you to repeatedly discard points from consideration. If you are left with no points, there are no dense clusters; if some points remain, clusters may exist.
import numpy as np
from scipy.spatial import cKDTree

# test data
N = 1000
data = np.random.rand(N, 2)
x, y = data.T
# test weights of each point
weights = np.random.rand(N)

def filter_noncontrib(pts, weights, radius=0.1, cutoff=60):
    # keep a point only if the ball of radius 2*r around it holds enough weight
    tree = cKDTree(pts)
    contribs = np.array(
        [weights[tree.query_ball_point(pt, 2 * radius)].sum() for pt in pts]
    )
    return contribs >= cutoff

def possible_contributors(pts, weights, radius=0.1, cutoff=60):
    # repeatedly discard non-contributing points until the survivor set stops shrinking
    n_pts = len(pts)
    while len(pts):
        mask = filter_noncontrib(pts, weights, radius, cutoff)
        pts = pts[mask]
        weights = weights[mask]
        if len(pts) == n_pts:
            break
        n_pts = len(pts)
    return pts
Example with dummy data:
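A minimal usage sketch, reusing the data and weights arrays and the default radius/cutoff from the code above:
remaining = possible_contributors(data, weights, radius=0.1, cutoff=60)
print(len(data), "points before filtering,", len(remaining), "possible contributors")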
DBSCAN can be adapted (see Generalized DBSCAN; define core points as weight sum >= 50), but it will not ensure the maximum cluster size (it computes transitive closures).
You could also try complete linkage: use it to find clusters with the desired maximum diameter, then check whether these satisfy the desired density, as in the sketch below. But that is not guaranteed to find all of them.
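A hedged sketch of that complete-linkage check using SciPy, assuming planar coordinates in miles; pts and w are stand-ins for the real coordinates and counts:
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

pts = np.random.rand(100, 2) * 10   # stand-in coordinates, in miles
w = np.random.randint(0, 30, 100)   # stand-in counts

Z = linkage(pdist(pts), method='complete')
labels = fcluster(Z, t=2.0, criterion='distance')  # complete linkage bounds the cluster diameter by t
for lab in np.unique(labels):
    total = w[labels == lab].sum()
    if total >= 50:
        print('candidate cluster', lab, 'with total count', total)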
It's probably faster to (a) build an index for fast radius search, and (b) for every point, find the neighbors within radius r and keep the point if they hold the desired minimum sum. But that is not guaranteed to find everything, because the optimal center is not necessarily a data point. Consider a maximum radius of 1 and a minimum weight of 100, with two points of weight 50 each at (0,0) and (1,1). Neither a query at (0,0) nor one at (1,1) will discover the solution, but a cluster centered at (.5,.5) satisfies the conditions.
Unfortunately, I believe your problem is at least NP-hard, so you won't be able to afford the ultimate solution.

Efficiently finding the closest coordinate pair from a set in Python

The Problem
Imagine I am stood in an airport. Given a geographic coordinate pair, how can one efficiently determine which airport I am stood in?
Inputs
A coordinate pair (x,y) representing the location I am stood at.
A set of coordinate pairs [(a1,b1), (a2,b2)...] where each coordinate pair represents one airport.
Desired Output
A coordinate pair (a,b) from the set of airport coordinate pairs representing the closest airport to the point (x,y).
Inefficient Solution
Here is my inefficient attempt at solving this problem. It is clearly linear in the length of the set of airports.
shortest_distance = None
shortest_distance_coordinates = None
point = (50.776435, -0.146834)

for airport in airports:
    distance = compute_distance(point, airport)
    # check for None first so the comparison never runs on the initial value
    if shortest_distance is None or distance < shortest_distance:
        shortest_distance = distance
        shortest_distance_coordinates = airport
The Question
How can this solution be improved? This might involve some way of pre-filtering the list of airports based on the coordinates of the location we are currently stood at, or sorting them in a certain order beforehand.
Using a k-dimensional tree:
>>> from scipy import spatial
>>> airports = [(10,10),(20,20),(30,30),(40,40)]
>>> tree = spatial.KDTree(airports)
>>> tree.query([(21,21)])
(array([ 1.41421356]), array([1]))
Where 1.41421356 is the distance between the queried point and the nearest neighbour and 1 is the index of the neighbour.
See: http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.query.html#scipy.spatial.KDTree.query
If your coordinates are unsorted, your search can only be improved slightly. Assuming they are (latitude, longitude) pairs, you can filter on latitude first, since on Earth
1 degree of latitude on the sphere is 111.2 km or 69 miles
but that alone would not give a huge speedup.
If you sort the airports by latitude first, you can use a binary search to find the first airport that could match (airport_lat >= point_lat - tolerance) and then compare only up to the last one that could match (airport_lat <= point_lat + tolerance) - but take care of 0 degrees equaling 360. While you cannot use bisect directly on the list of airport tuples, its sources are a good start for implementing a binary search.
While technically the search is still O(n) this way, you have far fewer actual distance calculations (depending on the tolerance) and only a few latitude comparisons, so you will get a huge speedup.
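A minimal sketch of that latitude pre-filter, assuming airports is sorted by latitude and compute_distance is the question's distance function; extracting the latitudes into their own list sidesteps the bisect limitation, and the 1-degree tolerance is a hypothetical choice:
import bisect

def closest_airport(point, airports_sorted, tolerance_deg=1.0):
    lats = [a[0] for a in airports_sorted]  # precompute once in real use
    lo = bisect.bisect_left(lats, point[0] - tolerance_deg)
    hi = bisect.bisect_right(lats, point[0] + tolerance_deg)
    candidates = airports_sorted[lo:hi]  # only this latitude band needs full distances
    return min(candidates, key=lambda airport: compute_distance(point, airport))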
From this SO question:
import numpy as np

def closest_node(node, nodes):
    nodes = np.asarray(nodes)
    deltas = nodes - node
    dist_2 = np.einsum('ij,ij->i', deltas, deltas)  # squared Euclidean distances
    return np.argmin(dist_2)
where node is a tuple with two values (x, y) and nodes is an array of tuples with two values ([(x_1, y_1), (x_2, y_2), ...]).
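A quick usage check, reusing the airports list from the KDTree answer above:
airports = [(10, 10), (20, 20), (30, 30), (40, 40)]
print(closest_node((21, 21), airports))  # -> 1, the index of (20, 20)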
The answer of @Juddling is great, but KDTree does not support haversine distance, which is better suited for latitude/longitude coordinates.
For the haversine distance you can use BallTree. Please note, that you need to convert your coordinates to radians first.
from math import radians
from sklearn.neighbors import BallTree
import numpy as np
airports = [(10,10),(20,20),(30,30),(40,40)]
airports_rad = np.array([[radians(x[0]), radians(x[1])] for x in airports])
tree = BallTree(airports_rad, metric='haversine')
result = tree.query([(radians(21),radians(21))])
print(result)
gives
(array([[0.02391369]]), array([[1]], dtype=int64))
To convert the distance to meters you need to multiply by the earth radius (in meters).
earth_radius = 6371000  # Earth radius in meters
print(result[0][0] * earth_radius)
[152354.11114795]
