I am working with a list of ID, X, and Y data for fire hydrant locations. I am trying to find the three closest fire hydrants for each fire hydrant in the list.
a = [[ID, X, Y],[ID, X, Y]]
I have tried implementing this using a for loop but I am having trouble because I cannot keep the original point data the same while iterating through the list of points.
Is there a straightforward way to calculate the distance from one point to each of the other points, and to repeat this for each point in the list? I am very new to Python and have not found anything online about how to do this.
Any help would be greatly appreciated.
You do not have to calculate all distances of all points to all others to get the three nearest neighbours for all points.
A k-d tree search will be much more efficient: each nearest-neighbour query takes roughly O(log n) on average, so querying all points costs about O(n log n), compared with O(n**2) for the brute-force method (calculating all distances).
Example
import numpy as np
from scipy import spatial
#Create some coordinates and indices
#It is assumed that the coordinates are unique (only one entry per hydrant)
Coords=np.random.rand(1000*2).reshape(1000,2)
Coords*=100
Indices=np.arange(1000) #Indices
def get_indices_of_nearest_neighbours(Coords,Indices):
    tree=spatial.cKDTree(Coords)
    #k=4 because the first entry is the nearest neighbour
    # of a point with itself
    res=tree.query(Coords, k=4)[1][:,1:]
    return Indices[res]
Here you go. Let's say you have an input list with this format [[ID, X, Y],[ID, X, Y]].
You can simply loop through each hydrant and, inside that loop, loop through every other hydrant and calculate the distance between the two. You just need some variables to store the minimum distance found so far for the current hydrant and the ID of the closest hydrant. Note that the code below keeps only the single closest hydrant.
import math # for sqrt calculation
def distance(p0, p1):
    """ Calculate the distance between two hydrants """
    return math.sqrt((p0[1] - p1[1])**2 + (p0[2] - p1[2])**2)
hydrants = [[0, 1, 2], [1, 2, -3], [2, -3, 5]] # your input list of hydrants (renamed so it does not shadow the built-in input)
for current_hydrant in hydrants: # loop through each hydrant
    min_distance = float("inf")
    closest_hydrant = None
    for other_hydrant in hydrants: # loop through each other hydrant
        if current_hydrant != other_hydrant:
            curr_distance = distance(current_hydrant, other_hydrant) # call the distance function
            if curr_distance < min_distance: # found a closer hydrant
                min_distance = curr_distance
                closest_hydrant = other_hydrant[0]
    print("Closest fire hydrant to", current_hydrant[0], "is hydrant",
          closest_hydrant, "with a distance of", min_distance) # print the closest hydrant
Since the distance function is not very complicated, I wrote it myself; you could also use a function from the scipy or numpy libraries to get the distance.
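The loop above keeps only one neighbour per hydrant, while the question asks for the three closest. A minimal sketch of one way to extend it, using only the standard library (the function name three_nearest and the sample coordinates are made up for illustration):

```python
import heapq
import math

def three_nearest(hydrants, k=3):
    """Map each hydrant ID to the IDs of its k closest other hydrants."""
    result = {}
    for hid, x, y in hydrants:
        # (distance, id) pairs for every hydrant except the current one
        others = ((math.hypot(x - ox, y - oy), oid)
                  for oid, ox, oy in hydrants if oid != hid)
        # nsmallest sorts by distance first, then by ID on ties
        result[hid] = [oid for _, oid in heapq.nsmallest(k, others)]
    return result

# Made-up sample data in the [ID, X, Y] format from the question
hydrants = [[0, 0, 0], [1, 1, 0], [2, 2, 0], [3, 3, 0], [4, 10, 0]]
nearest = three_nearest(hydrants)
print(nearest)
```

This is still a brute-force O(n**2) scan, so it is only reasonable for lists that are not too long.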
Hope this can help ;)
If you have geolocations (latitude and longitude), you can use the haversine formula (https://en.m.wikipedia.org/wiki/Haversine_formula) to get the distance in kilometers between two locations. This code is NOT meant to be efficient; if efficiency is what you want, we can use numpy to speed it up:
import math
def distance(lat,lon, lat2,lon2):
    R = 6372.8 # Earth radius in kilometers
    # change lat and lon to radians to find diff
    rlat = math.radians(lat)
    rlat2 = math.radians(lat2)
    rlon = math.radians(lon)
    rlon2 = math.radians(lon2)
    dlat = math.radians(lat2 - lat)
    dlon = math.radians(lon2 - lon)
    m = math.sin(dlat/2)**2 + \
        math.cos(rlat)*math.cos(rlat2)*math.sin(dlon/2)**2
    return 2 * R * math.atan2(math.sqrt(m),
                              math.sqrt(1 - m))
a = [['ID1', 52.5170365, 13.3888599],
['ID2', 54.5890365, 12.5865499],
['ID3', 50.5170365, 10.3888599],
]
b = []
for id, lat, lon in a:
    for id2, lat2, lon2 in a:
        if id != id2:
            d = distance(lat,lon,lat2,lon2)
            b.append([id,id2,d])
print(b)
Related
I currently have a dataframe which includes five columns, as seen below. I group the elements of the original dataframe such that they are within a 100 km x 100 km grid. For each grid element, I need to determine whether there is at least one pair of points within 100 m of each other. To do this, I am using the Haversine formula and calculating the distance between all points within a grid element using a for loop. This is rather slow, as my parent data structure can have billions of points and each grid element millions. Is there a quicker way to do this?
Here is a view into a group in the dataframe. "approx_LatSp" & "approx_LonSp" are what I use for groupBy in a previous function.
print(group.head())
            Time        Lat         Lon  approx_LatSp  approx_LonSp
197825  1.144823 -69.552576 -177.213646         -70.0   -177.234835
197826  1.144829 -69.579416 -177.213370         -70.0   -177.234835
197827  1.144834 -69.606256 -177.213102         -70.0   -177.234835
197828  1.144840 -69.633091 -177.212856         -70.0   -177.234835
197829  1.144846 -69.659925 -177.212619         -70.0   -177.234835
This group is equivalent to one grid element. This group gets passed to the following function which seems to be the crux of my issue (from a performance perspective):
def get_pass_in_grid(group):
    '''
    Checks if there are two points within 100m
    '''
    check_100m = 0
    check_1km = 0
    row_mins = []
    for index, row in group.iterrows():
        # Get distance
        distance_from_row = get_distance_lla(row['Lat'], row['Lon'], group['Lat'].drop(index), group['Lon'].drop(index))
        minimum = np.amin(distance_from_row)
        row_mins = row_mins + [minimum]
    array = np.array(row_mins)
    m_100 = array[array < 0.1]
    km_1 = array[array < 1.0]
    if m_100.size > 0:
        check_100m = 1
    if km_1.size > 0:
        check_1km = 1
    return check_100m, check_1km
And the Haversine formula is calculated as follows
def get_distance_lla(row_lat, row_long, group_lat, group_long):
    def radians(degrees):
        return degrees * np.pi / 180.0
    global EARTH_RADIUS
    lon1 = radians(group_long)
    lon2 = radians(row_long)
    lat1 = radians(group_lat)
    lat2 = radians(row_lat)
    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    # calculate the result
    return(c * EARTH_RADIUS)
One way in which I know I can improve this code is to stop the for loop if the 100m is met for any two points. If this is the only way to improve the speed then I will apply this. But I am hoping there is a better way to resolve my problem. Any thoughts are greatly appreciated! Let me know if I can help to clear something up.
1) Convert all points to Cartesian coordinates to make the task much easier (a distance of 100 m is small enough to disregard the fact that the Earth is not flat).
2) Divide each grid into NxN subgrids (20x20, 100x100? check which is faster), and for each point determine which subgrid it falls in. Compute distances only within each subgrid and its neighbours instead of searching the whole grid.
3) Use numpy to vectorize the calculations (doing point 1 will definitely help you).
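The subgrid idea can be sketched as follows. This is only an illustration under stated assumptions: the points are taken to be already projected to planar (x, y) coordinates in metres, the cell size is set equal to the distance limit, and the name has_close_pair is made up here.

```python
import math
from collections import defaultdict

def has_close_pair(points, limit):
    """Return True if any two (x, y) points lie within `limit` of each other."""
    cells = defaultdict(list)  # (cell_x, cell_y) -> points seen so far in that cell
    for x, y in points:
        cx, cy = math.floor(x / limit), math.floor(y / limit)
        # With cell size == limit, a close neighbour can only sit in the
        # 3x3 block of cells around the current point.
        for nx in (cx - 1, cx, cx + 1):
            for ny in (cy - 1, cy, cy + 1):
                for px, py in cells.get((nx, ny), ()):
                    if math.hypot(x - px, y - py) <= limit:
                        return True  # stop at the first hit
        cells[(cx, cy)].append((x, y))
    return False

print(has_close_pair([(0, 0), (500, 500), (50, 60)], 100))
print(has_close_pair([(0, 0), (500, 500), (150, 0)], 100))
```

Because each point is compared only with points in nearby cells, the work stays close to linear when the points are spread out, and the early return covers the "stop as soon as one pair is found" idea from the question.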
Thanks to @Corralien for the advice. I was able to use a BallTree to quickly find the closest elements. The improvement is something like 100x over my original code from a performance standpoint. Here is the new get_pass_in_grid:
from sklearn.neighbors import BallTree
def get_pass_in_grid(group):
    '''
    Checks if there is a pass within 100m to meet SWE L-Band requirement
    '''
    check_100m = 0
    check_1km = 0
    if len(group) < 2:
        return check_100m, check_1km
    row_mins = []
    # The haversine metric expects [lat, lon] in radians
    group['Lat'] = np.deg2rad(group['Lat'])
    group['Lon'] = np.deg2rad(group['Lon'])
    temp = np.array([group['Lat'],group['Lon']]).T
    tree = BallTree(temp, leaf_size=2, metric='haversine')
    for _, row in group.iterrows():
        # Get distance
        row_arr = np.array([row['Lat'], row['Lon']]).reshape((-1,2))
        closest_elem_lst, _ = tree.query(row_arr, k=2)
        # First element is always just this one (since d=0)
        closest_elem = closest_elem_lst[0,1] * EARTH_RADIUS
        row_mins = row_mins + [closest_elem]
        if closest_elem < 0.1:
            break
    array = np.array(row_mins)
    m_100 = array[array < 0.1]
    km_1 = array[array < 1.0]
    if m_100.size > 0:
        check_100m = 1
    if km_1.size > 0:
        check_1km = 1
    return check_100m, check_1km
I need to calculate the manhattan distance between 2 vectors
I found this code
https://www.geeksforgeeks.org/sum-manhattan-distances-pairs-points/
def distancesum (x, y, n):
    sum = 0
    # for each point, finding distance
    # to rest of the point
    for i in range(n):
        for j in range(i+1,n):
            sum += (abs(x[i] - x[j]) +
                    abs(y[i] - y[j]))
    return sum
But in another source I found a different formula for the Manhattan distance,
and the code for it is:
def manhattan_distance(instance1, instance2):
    n = len(instance1)-1
    sum = 0
    # for each point, finding distance
    # to rest of the point
    for i in range(n):
        sum += abs(float(instance1[i]) - float(instance2[i]))
    return sum
What is the algorithm for Manhattan distance?
Here's an example for calculating the manhattan distance.
In [1]: %paste
import numpy as np
def manhattan_distance(a, b):
    return np.abs(a - b).sum()
a = np.array([1, 2])
b = np.array([-1, 4])
print(manhattan_distance(a, b))
## -- End pasted text --
4
If dealing with vectors that are strings
In [1]: %paste
import numpy as np
def manhattan_distance(a, b):
    return np.abs(a - b).sum()
a = ['1', '2']
b = ['-1', '4']
print(manhattan_distance(np.array(a, dtype=float), np.array(b, dtype=float)))
## -- End pasted text --
4.0
In the referenced formula, you have n points, each with 2 coordinates, and you compute the distance from each point to all the others. So apart from the notation, both formulas are the same. The Manhattan distance between 2 vectors is the sum of the absolute values of the differences of their coordinates. An easy sanity check to remember is that the distance of a vector to itself must be 0.
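To make the relation between the two snippets concrete, here is a small sketch in plain Python (the function names manhattan and pairwise_sum are chosen here for illustration): the first is the distance between two vectors, and the second sums that same distance over every pair of points, which is what distancesum does.

```python
from itertools import combinations

def manhattan(p, q):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def pairwise_sum(points):
    """Sum of the Manhattan distance over every unordered pair of points."""
    return sum(manhattan(p, q) for p, q in combinations(points, 2))

print(manhattan([1, 2], [-1, 4]))                      # 4
print(manhattan([1, 2], [1, 2]))                       # 0
print(pairwise_sum([(-1, 5), (1, 6), (3, 5), (2, 3)]))  # 22
```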
You probably need this SciPy function:
from scipy.spatial.distance import cdist
Y = cdist(XA, XB, 'cityblock')
I am trying to implement an algorithm which computes the shortest path and its associated distance from a current position to a goal through an ordered list of waypoints in a 2D plane. A waypoint is defined by its center coordinates (x, y) and its radius r. The shortest path has to intersect each waypoint's circumference at least once. This is different from other path optimization problems because I already know the order in which the waypoints have to be crossed.
In the simple case, consecutive waypoints are distinct and not aligned, and this can be solved using consecutive angle bisections. The tricky cases are:
when three or more consecutive waypoints have the same center but different radii
when consecutive waypoints are aligned such that a straight line passes through all of them
Here is a stripped-down version of my Python implementation, which does not handle aligned waypoints and handles consecutive concentric waypoints badly. I adapted it because it normally uses latitudes and longitudes, not points in Euclidean space.
import math
def optimize(position, waypoints):
    # current position is on the shortest path, cumulative distance starts at zero
    shortest_path = [position.center]
    optimized_distance = 0
    # if only one waypoint left, go in a straight line
    if len(waypoints) == 1:
        shortest_path.append(waypoints[-1].center)
        optimized_distance += distance(position.center, waypoints[-1].center)
    else:
        # consider the last optimized point (one) and the next two waypoints (two, three)
        for two, three in zip(waypoints[:], waypoints[1:]):
            # NOTE: fast_waypoints (the optimized points so far) is not defined in this stripped-down version
            one = fast_waypoints[-1]
            in_heading = get_heading(two.center, one.center)
            in_distance = distance(one.center, two.center)
            out_distance = distance(two.center, three.center)
            # two next waypoints are concentric
            if out_distance == 0:
                next_target, nb_concentric = find_next_not_concentric(two, waypoints)
                out_heading = get_heading(two.center, next_target.center)
                angle = out_heading - in_heading
                leg_distance = two.radius
                leg_heading = in_heading + (0.5/nb_concentric) * angle
            else:
                out_heading = get_heading(two.center, three.center)
                angle = out_heading - in_heading
                leg_heading = in_heading + 0.5 * angle
                leg_distance = (2 * in_distance * out_distance * math.cos(math.radians(angle * 0.5))) / (in_distance + out_distance)
            best_leg_distance = min(leg_distance, two.radius)
            next_best = get_offset(two.center, leg_heading, best_leg_distance)
            shortest_path.append(next_best.center)
            optimized_distance += distance(one.center, next_best.center)
    return optimized_distance, shortest_path
I can see how to test for the different corner cases but I think this approach is bad, because there may be other corner cases I haven't thought of. Another approach would be to discretize the waypoints circumferences and apply a shortest path algorithm such as A*, but that would be highly inefficient.
So here is my question: is there a more concise approach to this problem?
For the record, I implemented a solution using Quasi-Newton methods, and described it in this short article. The main work is summarized below.
import numpy as np
from scipy.optimize import minimize
# objective function definition
def tasklen(θ, x, y, r):
    x_proj = x + r*np.sin(θ)
    y_proj = y + r*np.cos(θ)
    dists = np.sqrt(np.power(np.diff(x_proj), 2) + np.power(np.diff(y_proj), 2))
    return dists.sum()
# center coordinates and radii of turnpoints
X = np.array([0, 5, 0, 7, 12, 12]).astype(float)
Y = np.array([0, 0, 4, 7, 0, 5]).astype(float)
R = np.array([0, 2, 1, 2, 1, 0]).astype(float)
# first initialization vector is an array of zeros
init_vector = np.zeros(R.shape).astype(float)
# using scipy's solvers to minimize the objective function
result = minimize(tasklen, init_vector, args=(X, Y, R), tol=10e-5)
I would do it like this:
1) For each circle in order, pick any point on the circumference, and route the path through these points.
2) For each circle, move the point along the circumference in the direction that makes the total path length smaller.
3) Repeat 2) until no further improvement can be made.
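The steps above can be sketched with the standard library only. This is a rough illustration, not a tuned solver: the names path_length and shorten, the starting angle of 0, and the step-halving schedule are all choices made here, and like any local search it may stop at a local minimum.

```python
import math

def path_length(angles, circles):
    """Length of the polyline through one point on each circle."""
    pts = [(x + r * math.cos(a), y + r * math.sin(a))
           for a, (x, y, r) in zip(angles, circles)]
    return sum(math.hypot(qx - px, qy - py)
               for (px, py), (qx, qy) in zip(pts, pts[1:]))

def shorten(circles, step=0.5, min_step=1e-6):
    """Slide each point along its circle while the total length decreases."""
    angles = [0.0] * len(circles)          # step 1: pick any point per circle
    best = path_length(angles, circles)
    while step > min_step:
        improved = False
        for i in range(len(angles)):       # step 2: try moving each point
            for delta in (step, -step):
                trial = list(angles)
                trial[i] += delta
                length = path_length(trial, circles)
                if length < best:
                    angles, best, improved = trial, length, True
                    break
        if not improved:
            step /= 2                      # no move helped, so refine the step
    return best, angles

# (x, y, r) triples; radius 0 marks the fixed start and end points
circles = [(0, 0, 0), (0, 5, 1), (0, 10, 0)]
best, angles = shorten(circles)
print(best)
```

In this small made-up case the straight line from start to end already crosses the middle circle, so the search should settle on a length very close to 10.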
I am still very new to Python. I am heading a project to map the building footprints within our county on the tax map.
I have found a previous question that may be very helpful for this project: https://gis.stackexchange.com/questions/6724/creating-line-of-varying-distance-from-origin-point-using-python-in-arcgis-deskt
Our Cama system generates views/table with the needed information. Below is an example:
PARID    LLINE  VECT               X_COORD  Y_COORD
1016649  0      R59D26L39U9L20U17  482547   1710874
180,59,270,26,0,39,90,9,0,20,90,17 (VECT column converted)
I have found some Python examples that convert the VECT column, which holds distance-and-direction calls, into angles and distances separated by commas.
My question: Is there a way to implement a loop into the script below to utilize a table rather than static, user entered, numbers? This would be very valuable to the county as we have several thousand polygons to construct.
Below is the snippet to change the distances and angles to x, y points to be generated in ArcMap 10.2
#Using trig to deflect from a starting point
import arcpy
from math import radians, sin, cos
origin_x, origin_y = (400460.99, 135836.7)
distance = 800
angle = 15 # in degrees
# calculate offsets with light trig
(disp_x, disp_y) = (distance * sin(radians(angle)),\
distance * cos(radians(angle)))
(end_x, end_y) = (origin_x + disp_x, origin_y + disp_y)
output = "offset-line.shp"
arcpy.CreateFeatureclass_management(r"c:\workspace", output, "Polyline")
cur = arcpy.InsertCursor(output)
lineArray = arcpy.Array()
# start point
start = arcpy.Point()
(start.ID, start.X, start.Y) = (1, origin_x, origin_y)
lineArray.add(start)
# end point
end = arcpy.Point()
(end.ID, end.X, end.Y) = (2, end_x, end_y)
lineArray.add(end)
# write our fancy feature to the shapefile
feat = cur.newRow()
feat.shape = lineArray
cur.insertRow(feat)
# yes, this shouldn't really be necessary...
lineArray.removeAll()
del cur
Any suggestions would be greatly appreciated.
Thank you for your valuable time and knowledge.
You can create a dictionary of dictionaries from the given table that holds all the different values, such as:
d = {1:{"x":400460.99,"y":135836.7,"distance":800,"angle":15},
     2:{"x":"etc","y":"etc","distance":"etc","angle":"etc"}}
for k in d.keys():
    origin_x = d[k]["x"]
    origin_y = d[k]["y"]
    distance = d[k]["distance"]
    angle = d[k]["angle"]
    #rest of the code
    #.....
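To drive this from the table rather than hand-entered numbers, one option is to parse each VECT string into vertices first. The sketch below is only an illustration: the helper name vect_to_points and the meaning given to the letters (R = +x, L = -x, U = +y, D = -y) are assumptions that the CAMA export may not share, and the arcpy writing step is left out.

```python
import re

# Assumed meaning of the direction letters (not confirmed by the source table)
STEP = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}

def vect_to_points(vect, x0, y0):
    """Turn a string such as 'R59D26' into a list of (x, y) vertices."""
    points = [(x0, y0)]
    for letter, length in re.findall(r"([RLUD])(\d+(?:\.\d+)?)", vect):
        dx, dy = STEP[letter]
        x, y = points[-1]
        points.append((x + dx * float(length), y + dy * float(length)))
    return points

# One row per parcel: (PARID, VECT, X_COORD, Y_COORD), taken from the example above
rows = [(1016649, "R59D26L39U9L20U17", 482547, 1710874)]
footprints = {parid: vect_to_points(vect, x, y) for parid, vect, x, y in rows}
print(footprints[1016649])
```

With these assumed directions the example row walks back to its starting point, which is a useful check that a footprint closes. Each list of vertices could then be fed to the cursor code from the question inside the same loop.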
I have been banging my head against this for some time now. My problem is very simple to explain:
I have data containing longitudes and latitudes. For simplicity, let us assume these are coordinates of cities. What I want is to separate these city coordinates into groups, so that every city within a group lies within a given 'maximum distance' of its nearest neighbour in that group. All cities within a group must have at least one neighbour within this distance limit. The minimum distance between these separated groups is therefore greater than the 'maximum distance' mentioned above.
My understanding is that this is a clustering problem (e.g. minimum spanning tree). The distance on the sphere can be calculated with the haversine distance, but I can't wrap my head around how to implement this...my restriction are that I can only use numpy, scipy, and scikit-learn.
I hope someone can help
thanks
Ok, so I have implemented a brute force approach to solve this. I am not 100% sure if the results are correct in all cases, though...if some of you have time to check this, it would be greatly appreciated.
import numpy as np
import matplotlib.pyplot as plt
# -------------------------------------------------------------------
def distance_sphere(lon1, lat1, lon2, lat2):
    # Calculate distance on sphere
    return np.degrees(np.arccos(np.sin(np.radians(lat1)) * np.sin(np.radians(lat2)) +
                                np.cos(np.radians(lat1)) * np.cos(np.radians(lat2)) *
                                np.cos(np.radians(lon1 - lon2))))
# -------------------------------------------------------------------
def distance_euclid(lon1, lat1, lon2, lat2):
    # Calculate distance
    return np.sqrt((lon1 - lon2)**2 + (lat1 - lat2)**2)
# -------------------------------------------------------------------
# Maximum allowed distance in degrees
max_distance = 10
# Generate city coordinates
lon_all = np.random.random(100) * 100
lat_all = np.random.random(100) * 100
# Start with as many groups as cities
group = np.arange(len(lon_all))
# Loop over all city coordinates
for lon, lat in zip(lon_all, lat_all):
    # Calculate distance to all other cities
    dis = distance_euclid(lon1=lon, lat1=lat, lon2=lon_all, lat2=lat_all)
    # Get index of those which are within the given limits
    idx = np.where(dis <= max_distance)[0]
    # If there is no other city, we continue
    if len(idx) == 0:
        continue
    # Set common group for all cities within the limits
    for i in idx:
        group[group == group[i]] = min(group[idx])
# Rewrite labels starting with 0
for old, new in zip(set(group), range(len(set(group)))):
    idx = [i for i, j in enumerate(group) if j == old]
    group[idx] = new
# -------------------------------------------------------------------
# Plot results
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[10, 10])
for g, lon, lat in zip(group, lon_all, lat_all):
    ax.annotate(str(g), xy=(lon, lat), xycoords="data", size=12, ha="center", va="center")
    circ = plt.Circle((lon, lat), radius=max_distance/2, lw=0, color="gray")
    ax.add_patch(circ)
ax.set_xlim(-10, 110)
ax.set_ylim(-10, 110)
plt.show()
From the graphical output as it stands in your answer, I believe that your clusters are being terminated prematurely. This is my approach to the problem; the code is ugly because really I just wanted to demonstrate the concept and I don't have time to think about the most elegant way to illustrate this. Also, it's not in numpy because then I could steal my old distance calculation function to save me some time. Hopefully the concept though is clear enough and you'll see how it could be made faster and cleaner e.g. not repeatedly rebuilding available_locations and maybe not re-scanning items in the cluster from previous iteration.
Edit: Illustrated behaviour:
1) Always converges on same solution for each DISTANCE_CAP regardless of all the randomisation in the initialisation and progression of the solution
2) Modifying DISTANCE_CAP can result in single-location clusters or a giant blob
import math
from random import choice, shuffle
DISTANCE_CAP = 20
def crow_flies(lat1, lon1, lat2, lon2):
    dx1,dy1 = (lat1/180)*3.141593,(lon1/180)*3.141593
    dx2,dy2 = (lat2/180)*3.141593,(lon2/180)*3.141593
    dlat,dlon = abs(dx2-dx1),abs(dy2-dy1)
    a = (math.sin(dlat/2))**2 + (math.cos(dx1) * math.cos(dx2)
        * (math.sin(dlon/2))**2)
    c = 2*(math.atan2(math.sqrt(a),math.sqrt(1-a)))
    km = 6373 * c
    return km
# Aim: separate these back out
manchester = [[53.486286, -2.251476, 1],
[53.483586, -2.254534, 2],
[53.475158, -2.248011, 3],
[53.397161, -2.509189, 4]]
stoke = [[53.037375, -2.262903, 5],
[53.031031, -2.199587, 6]]
birmingham = [[52.443368, -1.975714, 7],
[52.429641, -1.902849, 8],
[52.483326, -1.817483, 9]]
# Mix them all together
combined_list = [item for item in manchester]
for item in stoke:
    combined_list.append(item)
for item in birmingham:
    combined_list.append(item)
shuffle(combined_list)
# Build a matrix:
matrix = {}
for item in combined_list:
    for pair_item in combined_list:
        if item[2] != pair_item[2]:
            distance = crow_flies(item[0], item[1], pair_item[0], pair_item[1])
            matrix[(item[2], pair_item[2])] = distance
# pick a random starting location
available_locations = [combined_list[x][2] for x in range(len(combined_list))]
start_loc = choice(available_locations)
available_locations = [a for a in available_locations if a != start_loc]
all_clusters = []
single_cluster = []
single_cluster.append(start_loc)
# REPEATEDLY add items to our cluster until it cannot get larger, then start a
# new one
cluster_got_bigger = True
while available_locations:
    if cluster_got_bigger == True:
        cluster_got_bigger = False
        for loc in single_cluster:
            for item in available_locations:
                distance = matrix[(loc, item)]
                if distance < DISTANCE_CAP:
                    single_cluster.append(item)
                    available_locations = [a for a in available_locations if a != item]
                    cluster_got_bigger = True
    if cluster_got_bigger == False:
        all_clusters.append(single_cluster)
        single_cluster = []
        new_seed = choice(available_locations)
        single_cluster.append(new_seed)
        available_locations = [a for a in available_locations if a != new_seed]
        cluster_got_bigger = True
    if not available_locations:
        all_clusters.append(single_cluster)
print(all_clusters)
Maybe my answer is too late.
But a quick solution is to construct a network data structure from your cities and get the connected components of the graph:
Each city is a node
There is an edge between two cities if the distance between them is lower than some threshold
Finally, use a Python network module (e.g. NetworkX).
The code will be something like this:
import networkx as nx
graph = nx.Graph()
# Add all vertices (cities) to the graph
for i, city in enumerate(cities):
    graph.add_node(i)
# Add edges between cities that lie under a distance threshold
for i, city_one in enumerate(cities):
    for j, city_two in enumerate(cities):
        if j > i:
            link_exists = calculate_distance(city_one, city_two) < threshold
            if link_exists:
                graph.add_edge(i,j)
# A list of sets, each set has the indices of cities
components = list(nx.connected_components(graph))
calculate_distance and threshold are assumed to be known: the first is a function and the second is the distance threshold.
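Since the question rules out NetworkX, the same connected-components idea can also be sketched with the standard library alone, using a small union-find. Everything named here (haversine_km, group_cities, the 6371 km radius, and the sample coordinates) is made up for illustration and is not taken from the answers above.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def group_cities(cities, max_km):
    """Group city indices so that linked cities (closer than max_km) share a group."""
    parent = list(range(len(cities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Union every pair of cities that lies under the threshold
    for i in range(len(cities)):
        for j in range(i + 1, len(cities)):
            if haversine_km(*cities[i], *cities[j]) < max_km:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(cities)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Made-up (lat, lon) pairs: three chained points near the equator, two elsewhere
cities = [(0, 0), (0, 0.1), (0, 0.2), (10, 10), (10, 10.05)]
print(group_cities(cities, max_km=15))
```

Here the first and third cities are more than 15 km apart but end up in the same group because each is within 15 km of the second, which is the chaining behaviour the question asks for. The pairwise loop is still O(n**2), so a spatial index would be needed for large inputs.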