Find the nearest set of coordinates from the command line - Python

I'm looking for a command-line solution to find the nearest sets of points from a list of CSV coordinates.
This was answered for Excel here, but I need a somewhat different solution.
I'm NOT looking for the nearest point for every point, but for the pair of points with the least distance between them.
I would like to match many power plants from GEO, so a (Python?) command-line tool would be great.
Here is an example dataset:
Chicoasén Dam,16.941064,-93.100828
Tuxpan Oil Power Plant,21.014891,-97.334492
Petacalco Coal Power Plant,17.983575,-102.115252
Angostura Dam,16.401226,-92.778926
Tula Oil Power Plant,20.055825,-99.276857
Carbon II Coal Power Plant,28.467176,-100.698559
Laguna Verde Nuclear Power Plant,19.719095,-96.406347
Carbón I Coal Power Plant,28.485238,-100.69096
Manzanillo I Oil Power Plant,19.027372,-104.319274
Tamazunchale Gas Power Plant,21.311282,-98.756266
The tool should print "Carbon II" and "Carbon I", because this pair has the minimal distance.
A code fragment could be:
from math import radians, cos, sin, asin, sqrt
import csv
def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6371 * c
    return km
with open('mexico-test.csv', newline='', encoding='utf-8') as csvfile:
    so = csv.reader(csvfile, delimiter=',', quotechar='|')
    data = []
    for row in so:
        data.append(row)
# haversine() takes longitude first: (lon1, lat1, lon2, lat2)
print(haversine(-100.698559, 28.467176, -100.69096, 28.485238))

A simple method is to compute all pairs, then find the minimum pair, where the "size" of a pair is defined as the distance between the two points in the pair:
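from itertools import combinations
# each row is [name, lat, lon]; haversine() expects (lon1, lat1, lon2, lat2)
closest = min(combinations(data, 2),
              key=lambda p: haversine(float(p[0][2]), float(p[0][1]), float(p[1][2]), float(p[1][1])))
print(closest[0][0], closest[1][0])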
To get the five smallest, use heapq.nsmallest with the same key (there is no need to build or heapify a list first):
import heapq
five_smallest = heapq.nsmallest(
    5,
    combinations(data, 2),
    key=lambda p: haversine(float(p[0][2]), float(p[0][1]), float(p[1][2]), float(p[1][1])))
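Putting the pieces together, here is a minimal, self-contained sketch of the command-line tool the question asks for. It assumes the CSV has exactly three columns (name, latitude, longitude) and no header, as in the example; the names load_points, pair_distance and closest_pairs are illustrative, not from the answer. It checks all n(n-1)/2 pairs, which is fine for hundreds or a few thousand plants.

```python
import csv
import heapq
import sys
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

# The example dataset from the question, embedded for a quick check.
SAMPLE = """\
Chicoasén Dam,16.941064,-93.100828
Tuxpan Oil Power Plant,21.014891,-97.334492
Petacalco Coal Power Plant,17.983575,-102.115252
Angostura Dam,16.401226,-92.778926
Tula Oil Power Plant,20.055825,-99.276857
Carbon II Coal Power Plant,28.467176,-100.698559
Laguna Verde Nuclear Power Plant,19.719095,-96.406347
Carbón I Coal Power Plant,28.485238,-100.69096
Manzanillo I Oil Power Plant,19.027372,-104.319274
Tamazunchale Gas Power Plant,21.311282,-98.756266
"""


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


def load_points(lines):
    """Read (name, lat, lon) rows from a file object or any iterable of lines."""
    points = []
    for row in csv.reader(lines):
        if row:  # skip blank lines
            name, lat, lon = row
            points.append((name, float(lat), float(lon)))
    return points


def pair_distance(pair):
    """Distance in km between the two (name, lat, lon) tuples of a pair."""
    (_, lat1, lon1), (_, lat2, lon2) = pair
    return haversine(lon1, lat1, lon2, lat2)  # haversine() wants longitude first


def closest_pairs(points, k=1):
    """The k closest pairs as (distance_km, name1, name2), nearest first."""
    best = heapq.nsmallest(k, combinations(points, 2), key=pair_distance)
    return [(pair_distance(p), p[0][0], p[1][0]) for p in best]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python closest_plants.py plants.csv
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        for dist, a, b in closest_pairs(load_points(f), k=5):
            print(f"{dist:10.3f} km  {a}  |  {b}")
```

With the embedded sample, closest_pairs(load_points(SAMPLE.splitlines())) should put the Carbon II / Carbón I pair first, at roughly 2.1 km. For much larger inputs, a spatial index (for example scikit-learn's BallTree with metric='haversine') avoids checking every pair.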

Related

Calculate the speed from two longitude and latitude GPS coordinates

I'm fiddling around with a GPS I bought a while back, a "GlobalSat GPS Receiver" model BU-S353S4, and it seems to be working well!
I have been able to read the GPS signal from the device with the help of Cody Wilson's excellent explanation, and to convert the GPGGA output to a latitude and longitude coordinate with the help of the excellent Python package pynmea2.
But how can I calculate my current speed from two positions? I have found this thread, which refers to this thread on how to calculate the distance with the haversine formula.
My code looks like this:
from math import radians, cos, sin, asin, sqrt
import serial
import pynmea2
ser = serial.Serial('/dev/ttyUSB0', 4800, timeout = 5)
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance in kilometers between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles. Determines return value units.
    return c * r
counter = 0
while 1:
    line = ser.readline().decode('UTF-8')
    splitline = line.split(',')
    if splitline[0] == '$GPGGA':
        counter += 1
        msg = line
        data = pynmea2.parse(msg)
        lat1 = data.latitude
        lon1 = data.longitude
        if counter % 2:
            distance = haversine(lon1, lat1, data.longitude, data.latitude)
            print(distance)
This periodically outputs the distance I have traveled. But here's the problem: it always returns 0.0. Maybe I haven't traveled far enough?
And how should I proceed to calculate the speed?
I know the formula speed = distance/time, and pynmea2 has a time property (msg.timestamp). But frankly, I don't know how to do this.
Final result, which seems to be working:
from math import radians, cos, sin, asin, sqrt, atan2, degrees
import serial
import pynmea2
ser = serial.Serial('/dev/ttyUSB0', 4800, timeout = 5)
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance in kilometers between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles. Determines return value units.
    return c * r
prev_data = None
while 1:
    line = ser.readline().decode('UTF-8')
    splitline = line.split(',')
    if splitline[0] == '$GPGGA':
        msg = line
        data = pynmea2.parse(msg)
        if prev_data is not None:
            distance = haversine(data.longitude, data.latitude, prev_data.longitude, prev_data.latitude)
            print('distance', distance)
            print('speed', round(distance*3600, 2))
        prev_data = data
The interface updates once every second, hence the speed formula: kilometers per second multiplied by 3600 gives km/h.
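Multiplying by 3600 assumes exactly one second between fixes. A slightly more robust sketch divides by the elapsed time between the two fixes instead. It assumes each fix is a (latitude, longitude, time) tuple whose time is a datetime.time, which is what pynmea2's timestamp attribute is expected to hold for GGA sentences (an assumption worth checking on your version); seconds_between and speed_kmh are hypothetical helper names.

```python
from datetime import date, datetime, time
from math import asin, cos, radians, sin, sqrt


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


def seconds_between(t1: time, t2: time) -> float:
    """Seconds from time-of-day t1 to t2, assuming t2 is later and under 24 h away."""
    delta = (datetime.combine(date.min, t2) - datetime.combine(date.min, t1)).total_seconds()
    if delta < 0:  # the two fixes straddle midnight
        delta += 24 * 3600
    return delta


def speed_kmh(prev_fix, fix):
    """Speed in km/h between two (lat, lon, time) fixes; None if no time has elapsed."""
    (lat1, lon1, t1), (lat2, lon2, t2) = prev_fix, fix
    elapsed = seconds_between(t1, t2)
    if elapsed == 0:
        return None
    return haversine(lon1, lat1, lon2, lat2) / elapsed * 3600
```

In the reading loop this would be called as speed_kmh((prev_data.latitude, prev_data.longitude, prev_data.timestamp), (data.latitude, data.longitude, data.timestamp)).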
Your formula for haversine is correct.
You have a bug in your program: you set lat1 and lon1 to data.latitude and data.longitude, and then immediately use them to compute the distance from data.longitude and data.latitude. Of course you're going to get a distance of zero. You need to separate the "current" longitude and latitude from the "previous" longitude and latitude.
(A minor nit: you're not calculating the haversine, you're calculating the distance using the haversine formula, so you probably just want to name your function distance. The haversine itself is hav(x) = sin(x/2)**2.)
prev_data = None
while True:
    ...  # read a line and parse it into data, as before
    if prev_data is not None:
        distance = haversine(data.longitude, data.latitude,
                             prev_data.longitude, prev_data.latitude)
    prev_data = data
    ...
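To make the naming nit concrete, here is a sketch that defines hav() itself and builds a distance() function on top of it. The 6371 km radius and the clamp on h are choices of this sketch, not part of the answer.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; use 3956 for miles


def hav(theta):
    """The haversine function: hav(theta) = sin(theta / 2) ** 2."""
    return sin(theta / 2) ** 2


def distance(lon1, lat1, lon2, lat2):
    """Great-circle distance in km via the haversine formula (decimal-degree inputs)."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    # hav(central angle) = hav(dlat) + cos(lat1) * cos(lat2) * hav(dlon)
    h = hav(lat2 - lat1) + cos(lat1) * cos(lat2) * hav(lon2 - lon1)
    # invert hav: angle = 2 * asin(sqrt(h)); min() guards against h rounding above 1
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, h)))
```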

Nested dict from for loop adding same values to all nested keys

I have address data and shapefiles with polygons, and am trying to determine the closest distance (in miles) of each address from each polygon, then create a nested dict containing all the info, with this format:
nested_dict = {poly_1: {address1: distance, address2: distance},
               poly_2: {address1: distance, address2: distance}, ...}
The full, applicable code I'm using is:
import pandas as pd
from shapely.geometry import mapping, Polygon, LinearRing, Point
import geopandas as gpd
from math import radians, cos, sin, asin, sqrt
address_dict = {k: [] for k in addresses_geo.input_string}
sludge_dtc = {k: [] for k in sf_geo.unique_name}
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 3956 # Radius of earth in miles. Use 6371 for kilometers
    return c * r
# Here's the key loop that isn't working correctly
for unique_name, i in zip(sf_geo.unique_name, sf_geo.index):
    for address, pt in zip(addresses_geo.input_string, addresses_geo.index):
        pol_ext = LinearRing(sf_geo.iloc[i].geometry.exterior.coords)
        d = pol_ext.project(addresses_geo.iloc[pt].geometry)
        p = pol_ext.interpolate(d)
        closest_point_coords = list(p.coords)[0]
        # print(closest_point_coords)
        dist = haversine(addresses_geo.iloc[pt].geometry.x,
                         addresses_geo.iloc[pt].geometry.y,
                         closest_point_coords[0], closest_point_coords[1])
        address_dict[address] = dist
    sludge_dtc[unique_name] = address_dict
# Test results on a single address
addresses_with_sludge_distance = pd.DataFrame(sludge_dtc)
print(addresses_with_sludge_distance.iloc[[1]].T)
If I break this code out and try to calculate the distances for a single polygon, it seems to work fine. However, when I create the DataFrame and check an address, it lists the same distance for every single polygon.
So the inner-dict key '123 Main Street' will have 5.25 miles for each of the polygon keys in the outer dict, and '456 South Street' will have 6.13 miles for each of the polygon keys in the outer dict. (Made-up examples.)
I realize I must be doing something dumb in the way I have the for loops set up, but I can't figure it out. I've reversed the order of the for statements and messed with the indentation, all with the same result.
To make it clear, what I want to happen is:
1. Take a single polygon.
2. For each address in the address data, find the distance from that polygon and add it to the address_dict dictionary, with the address as the key and the distance as the value.
3. When all addresses have been calculated, add the entire address_dict as the value for the polygon key in sludge_dtc.
4. Move on to the next polygon and continue.
Any ideas what I'm missing?
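The problem is very simple: you are always using the same address_dict instance, so every polygon key in sludge_dtc points to the same dictionary, which ends up holding the distances computed for the last polygon.
You just need to recreate it inside the outer (polygon) loop: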
import pandas as pd
from shapely.geometry import mapping, Polygon, LinearRing, Point
import geopandas as gpd
from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 3956 # Radius of earth in miles. Use 6371 for kilometers
    return c * r
sludge_dtc = {k: [] for k in sf_geo.unique_name}
# The key loop, now creating a separate address_dict per polygon
for unique_name, i in zip(sf_geo.unique_name, sf_geo.index):
    # a new dict object for every polygon, so each outer key gets its own inner dict
    address_dict = {k: [] for k in addresses_geo.input_string}
    for address, pt in zip(addresses_geo.input_string, addresses_geo.index):
        pol_ext = LinearRing(sf_geo.iloc[i].geometry.exterior.coords)
        d = pol_ext.project(addresses_geo.iloc[pt].geometry)
        p = pol_ext.interpolate(d)
        closest_point_coords = list(p.coords)[0]
        # print(closest_point_coords)
        dist = haversine(addresses_geo.iloc[pt].geometry.x,
                         addresses_geo.iloc[pt].geometry.y,
                         closest_point_coords[0], closest_point_coords[1])
        address_dict[address] = dist
    sludge_dtc[unique_name] = address_dict
# Test results on a single address
addresses_with_sludge_distance = pd.DataFrame(sludge_dtc)
print(addresses_with_sludge_distance.iloc[[1]].T)
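The aliasing is easy to reproduce without any GIS data. In this sketch the polygon and address names are placeholders, 10 * polygon_index + address_index stands in for the real distance, and build() is a hypothetical helper that toggles between the buggy and the fixed placement of the inner dict.

```python
def build(polygons, addresses, fresh_dict_per_polygon):
    """Build {polygon: {address: distance}} using a placeholder distance."""
    result = {}
    inner = {}  # buggy version: this single dict is reused for every polygon
    for p_idx, poly in enumerate(polygons):
        if fresh_dict_per_polygon:
            inner = {}  # the fix: a new dict object for each polygon
        for a_idx, addr in enumerate(addresses):
            inner[addr] = 10 * p_idx + a_idx  # stand-in for the real distance
        result[poly] = inner
    return result


polygons = ["poly_1", "poly_2"]
addresses = ["123 Main Street", "456 South Street"]

buggy = build(polygons, addresses, fresh_dict_per_polygon=False)
fixed = build(polygons, addresses, fresh_dict_per_polygon=True)

print(buggy["poly_1"] is buggy["poly_2"])  # True: both keys share one dict
print(buggy)  # both polygons show the last polygon's values
print(fixed)  # each polygon keeps its own values
```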
Another consideration:
You are creating dictionaries whose values are empty lists, but afterwards you set the values directly, so the empty lists are simply replaced. If you need to collect a list of values, you should append values to the existing list, e.g.:
address_dict[address].append(dist)
and
sludge_dtc[unique_name].append(address_dict)
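If lists really are what you want, collections.defaultdict(list) saves pre-building the empty lists, because a missing key gets a fresh empty list on first access. A small sketch with made-up (address, distance) pairs, contrasting assignment with append:

```python
from collections import defaultdict

measurements = [("123 Main Street", 5.25), ("456 South Street", 6.13),
                ("123 Main Street", 4.80)]  # made-up (address, distance) pairs

replaced = {}
collected = defaultdict(list)
for address, dist in measurements:
    replaced[address] = dist         # assignment keeps only the last value
    collected[address].append(dist)  # append keeps every value

print(replaced)         # {'123 Main Street': 4.8, '456 South Street': 6.13}
print(dict(collected))  # {'123 Main Street': [5.25, 4.8], '456 South Street': [6.13]}
```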

Lat/lon distance calculation by converting to utm gives different results than using approximate method

I've got two methods for calculating the distance between georeferenced coordinates in python:
from pyproj import Proj
import math
def calc_distance(lat1, lon1, lat2, lon2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.asin(math.sqrt(a))
    km = 6371 * c
    return km
def calc_distance_convert_utm(lat1, lon1, lat2, lon2):
    myProj = Proj("+proj=utm +zone=42 +north +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
    # convert to utm
    utm_x1, utm_y1 = myProj(lat1, lon1)
    utm_x2, utm_y2 = myProj(lat2, lon2)
    diff_x = abs(utm_x1 - utm_x2)
    diff_y = abs(utm_y1 - utm_y2)
    distance = math.sqrt(diff_x**2 + diff_y**2)
    return distance
Which I call with the following values:
lat1 = 34.866527
lon1 = 69.674606
lat2 = 34.864990
lon2 = 69.657655
print "approximation method: ", calc_distance(lat1, lon1, lat2, lon2)
print "converting to utm method: ", calc_distance_convert_utm(lat1, lon1, lat2, lon2)
However, if I compare the results, I get two different values:
approximation method: 1.55593476881
converting to utm method: 1928.21537269
Note that the first method returns the distance in kilometers while the second returns it in meters.
I've compared the results with distance calculators you can find online, and it seems that the first method (approximation method) gives the "more correct" answer, as this is the value most online calculators return. I wonder why the second method (converting to UTM first) does not return a more similar result (something like 1555.9347...). I have a difference of almost 0.4 km, which seems like a lot to me.
Did I do anything wrong?
Any help is appreciated! Thanks
I've found the error.
In the UTM conversion method I switched the lat/lon values: pyproj's Proj expects longitude first, then latitude. It should be:
utm_x1, utm_y1 = myProj(lon1, lat1)
utm_x2, utm_y2 = myProj(lon2, lat2)
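Once the arguments are in the right order, the two methods should agree closely: zone 42's central meridian is 69°E, right next to these points, and UTM's scale factor there is 0.9996, so the projected distance should differ from the spherical one by only a fraction of a percent. As a dependency-free sanity check, this sketch swaps UTM for a simple equirectangular (local flat-earth) projection. That is not what the question uses, but it illustrates the same idea of measuring a straight line on a locally projected plane; it reuses the question's points and its 6371 km radius.

```python
from math import asin, cos, radians, sin, sqrt

R_KM = 6371.0  # same radius as calc_distance in the question


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (same argument order as calc_distance)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * R_KM * asin(sqrt(a))


def equirectangular_km(lat1, lon1, lat2, lon2):
    """Planar approximation: x = dlon * cos(mean latitude), y = dlat, both in radians."""
    x = radians(lon2 - lon1) * cos(radians((lat1 + lat2) / 2))
    y = radians(lat2 - lat1)
    return R_KM * sqrt(x * x + y * y)


pts = (34.866527, 69.674606, 34.864990, 69.657655)
print(haversine_km(*pts))        # about 1.5559 km, matching the question's first method
print(equirectangular_km(*pts))  # agrees to well under a metre at this scale
```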

ST_Distance_Sphere() in Python?

I am working on a Python project where I have two lat/long pairs and I want to calculate the distance between them. In other projects I have calculated distance in Postgres using ST_Distance_Sphere(a.loc_point, b.loc_point), but I would like to avoid having to load all of my data into Postgres just so that I can calculate distance differences. I have searched, but have not been able to find what I would like, which is a purely Python implementation of this so that I don't have to load my data into Postgres.
I know there are other distance calculations that treat the earth as a perfect sphere, but those aren't good enough due to poor accuracy, which is why I would like to use the PostGIS ST_Distance_Sphere() function (or an equivalent).
Here are a couple of sample Lat/Longs that I would like to calculate the distance of:
Lat, Long 1: (49.8755, 6.07594)
Lat, Long 2: (49.87257, 6.0784)
I can't imagine I am the first person to ask this, but does anyone know of a way to use ST_Distance_Sphere() for lat/long distance calculations purely from within a Python script?
I would recommend the geopy package; see the Measuring Distance section of its documentation.
For your particular case:
from geopy.distance import great_circle
p1 = (49.8755, 6.07594)
p2 = (49.87257, 6.0784)
print(great_circle(p1, p2).kilometers)
This is a rudimentary function that calculates the distance between two coordinates on a perfect sphere whose radius is the Earth's radius (here in miles):
from math import pi, acos, sin, cos
def calcd(y1, x1, y2, x2):
    # y = latitude, x = longitude, both in decimal degrees
    y1 = float(y1)
    x1 = float(x1)
    y2 = float(y2)
    x2 = float(x2)
    R = 3958.76  # miles
    # degrees -> radians
    y1 *= pi/180.0
    x1 *= pi/180.0
    y2 *= pi/180.0
    x2 *= pi/180.0
    # approximate great circle distance with the spherical law of cosines
    x = sin(y1)*sin(y2) + cos(y1)*cos(y2)*cos(x2-x1)
    # clamp to [-1, 1]: rounding can push x slightly outside acos's domain
    x = max(-1.0, min(1.0, x))
    return acos(x) * R
Hope this helps!
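The spherical law of cosines and the haversine formula are algebraically equivalent on a sphere; the difference is numerical. For short distances the cosine is extremely close to 1, so acos() amplifies rounding error, which is also why the clamp is needed at all. A sketch comparing the two with the same radius (the km radius and the function names are choices of this sketch):

```python
from math import acos, asin, cos, radians, sin, sqrt

R_KM = 6371.0


def law_of_cosines_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the spherical law of cosines."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    x = sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(lon2 - lon1)
    return R_KM * acos(max(-1.0, min(1.0, x)))  # clamp into acos's domain


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * R_KM * asin(sqrt(a))


p1, p2 = (49.8755, 6.07594), (49.87257, 6.0784)
print(law_of_cosines_km(*p1, *p2), haversine_km(*p1, *p2))  # both about 0.3704 km
# Identical points: haversine returns exactly 0.0; the law of cosines may return a
# small nonzero value if rounding leaves x a hair below 1.
print(law_of_cosines_km(*p1, *p1), haversine_km(*p1, *p1))
```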
See also this question: How can I quickly estimate the distance between two (latitude, longitude) points?
from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c
    return km
By Aaron D
You can modify it to return miles by adding miles = km * 0.621371
I have since found another way in addition to the answers provided here: the Python haversine module.
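from haversine import haversine as h
a = (49.8755, 6.07594)   # (lat, lon) sample points from the question
b = (49.87257, 6.0784)
# h() returns kilometers; multiply by 1000 for meters
print('{0:30}{1:12}'.format("haversine module:", h(a, b)*1000))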
I tested all three answers plus the haversine module against what I got using ST_Distance_Sphere(a, b) in Postgres. All answers were excellent (thank you), but the pure-math answer (calcd) from Sishaar Rao was the closest. Here are the results (in meters):
# Short Distance Test
ST_Distance_Sphere(a, b): 370.43790478
vincenty: 370.778186438
great_circle: 370.541763803
calcd: 370.437386736
haversine function: 370.20481753
haversine module: 370.437394767
# Long Distance Test
ST_Distance_Sphere(a, b): 1011734.50495159
vincenty: 1013450.40832
great_circle: 1012018.16318
calcd: 1011733.11203
haversine function: 1011097.90053
haversine module: 1011733.11203
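Most of the spread in the short-distance test comes from the Earth radius each implementation assumes rather than from the formula. The "haversine function" row uses 6367 km, and the ratio of the "haversine module" row to it is 6371/6367, which suggests the module version tested here used 6371 km. This sketch reproduces both rows with one function by changing only the radius, assuming the short test used the two sample points from the question:

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2, radius_km):
    """Great-circle distance in meters on a sphere of the given radius."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * radius_km * 1000 * asin(sqrt(a))


p1, p2 = (49.8755, 6.07594), (49.87257, 6.0784)
for radius_km in (6367, 6371):
    # 6367 -> about 370.205 m ("haversine function"); 6371 -> about 370.437 m ("haversine module")
    print(radius_km, haversine_m(*p1, *p2, radius_km))
```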

IDW interpolation of point data using python and gdal

I have a CSV file with latitude, longitude and rainfall information. I would like to interpolate those points and create a TIFF file. Can anyone suggest the easiest way to do that?
I am trying to use gdal_grid. I am very new to using GDAL in Python.
This is actually several questions. Assuming you have scattered data at some lats and longs, you'll need to build all the locations where you want to make an estimate (all lats and longs for the pixels of your TIFF image).
Once you have that, you can use any of the available solutions to do IDW over your data (using a recent example from another question):
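import numpy as np
class Estimation():
    # IDW. Check: https://stackoverflow.com/questions/36031338/interpolate-z-values-in-a-3d-surface-starting-from-an-irregular-set-of-points/36037288#36037288
    def __init__(self, lon, lat, values):
        # x holds longitudes and y latitudes, so haversine(lon, lat, ...) gets the right order
        self.x = np.asarray(lon, dtype=float)
        self.y = np.asarray(lat, dtype=float)
        self.v = np.asarray(values, dtype=float)
    def estimate(self, x, y, using='ISD'):
        """
        Estimate point at coordinate x (lon), y (lat) based on the input data for this
        class.
        """
        if using == 'ISD':
            return self._isd(x, y)
    def _isd(self, x, y):
        # inverse squared distance weighting, using great-circle distances
        #d = np.sqrt((x-self.x)**2+(y-self.y)**2)
        d = np.empty(self.v.shape[0])  # one distance per observation
        for i in range(d.shape[0]):
            d[i] = haversine(self.x[i], self.y[i], x, y)
        if d.min() > 0:
            v = np.sum(self.v*(1/d**2)/np.sum(1/d**2))
            return v
        else:
            # the point coincides with an observation: return it directly
            return self.v[d.argmin()]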
The code above is adapted to calculate distance with the haversine formula (which gives great-circle distances between two points on a sphere from their longitudes and latitudes). Again, you can find all sorts of implementations of the haversine distance, like this one:
from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    """
    Check: https://stackoverflow.com/questions/15736995/how-can-i-quickly-estimate-the-distance-between-two-latitude-longitude-points
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c
    return km
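For a whole raster, looping over pixels with the class above gets slow. A vectorized NumPy sketch of the same inverse-squared-distance idea computes every pixel at once. It uses a 6371 km radius, and the names haversine_np and idw_grid are hypothetical. Memory grows with pixels × observations, so very large grids would need to be processed in chunks.

```python
import numpy as np


def haversine_np(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Vectorized great-circle distance in km; inputs in decimal degrees, broadcastable."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))


def idw_grid(lons, lats, values, grid_lons, grid_lats, power=2.0):
    """Inverse-distance-weighted estimate for every pixel of a lon/lat grid.

    Returns an array of shape (len(grid_lats), len(grid_lons)); pass grid_lats
    from north to south so that row 0 is the top row of the image.
    """
    lons, lats, values = (np.asarray(a, dtype=float) for a in (lons, lats, values))
    glon, glat = np.meshgrid(grid_lons, grid_lats)  # each (ny, nx)
    # distance from every pixel to every observation: shape (ny, nx, n_obs)
    d = haversine_np(glon[..., None], glat[..., None], lons, lats)
    with np.errstate(divide="ignore", invalid="ignore"):
        w = 1.0 / d ** power
        est = (w * values).sum(axis=-1) / w.sum(axis=-1)
    # pixels that coincide with an observation take that observation's value
    exact = d == 0
    hit = exact.any(axis=-1)
    est[hit] = values[exact.argmax(axis=-1)[hit]]
    return est
```

The returned array can then play the role of mag_grid when writing the TIFF.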
Finally, once your array is ready, build the TIFF using GDAL. For this, check the following question; I quote part of its solution:
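import numpy as np
from osgeo import gdal, osr
# xsize, ysize, ulx, uly, xres, yres and mag_grid (the interpolated 2-D array) come from your own grid
driver = gdal.GetDriverByName('GTiff')
ds = driver.Create('output.tif', xsize, ysize, 1, gdal.GDT_Float32)
# this assumes the projection is Geographic lat/lon WGS 84
srs = osr.SpatialReference()
srs.ImportFromEPSG(4326)
ds.SetProjection(srs.ExportToWkt())
gt = [ulx, xres, 0, uly, 0, yres]  # yres is negative for a north-up image
ds.SetGeoTransform(gt)
outband = ds.GetRasterBand(1)
outband.SetStatistics(np.min(mag_grid), np.max(mag_grid), np.average(mag_grid), np.std(mag_grid))
outband.WriteArray(mag_grid)
ds = None  # close the dataset so GDAL writes it to disk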
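The quoted fragment leaves ulx, uly, xres and yres undefined. If the grid was built from pixel-centre coordinates (as in the IDW step), one way to derive them is sketched below; geotransform_from_centres is a hypothetical helper, and it assumes at least two evenly spaced longitudes increasing west to east and latitudes listed north to south. GDAL's geotransform refers to the outer corner of the top-left pixel, hence the half-pixel shift, and yres is negative for a north-up image.

```python
def geotransform_from_centres(grid_lons, grid_lats):
    """GDAL-style [ulx, xres, 0, uly, 0, yres] from evenly spaced pixel-centre coordinates.

    grid_lons must increase west to east; grid_lats must decrease north to south.
    """
    xres = (grid_lons[-1] - grid_lons[0]) / (len(grid_lons) - 1)
    yres = (grid_lats[-1] - grid_lats[0]) / (len(grid_lats) - 1)  # negative when north-up
    ulx = grid_lons[0] - xres / 2  # left edge of the first column
    uly = grid_lats[0] - yres / 2  # top edge of the first row (yres < 0, so this moves north)
    return [ulx, xres, 0, uly, 0, yres]


print(geotransform_from_centres([0.0, 1.0, 2.0], [2.0, 1.0, 0.0]))
# [-0.5, 1.0, 0, 2.5, 0, -1.0]
```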
