Calculate (road travel) distance between postcodes/zipcodes python - python

I have a CSV file with start and end postcodes (UK equivalent of US zipcodes) and would like to compute simple distance, road travel distance and travel time between the two. I guess the way to go would be to use Google Maps in one way or another. I first tried using a spreadsheet and the following url http://maps.google.com/maps?saddr="&B2&"&daddr="&A2&" but
I do not know how to retrieve the resulting distance from google maps
I would like to know some more pythonic way to work this out

The distance between postal codes can be obtained with the pgeocode library. Unlike the above response, it does not query a web API, and is therefore more suitable for processing large amounts of data,
>>> import pgeocode
>>> dist = pgeocode.GeoDistance('GB')
>>> dist.query_postal_code('WC2N', 'EH53')
536.5 # returned distance in km
More information about these postal codes, including latitude and longitude, can be queried with,
>>> nomi = pgeocode.Nominatim('GB')
>>> nomi.query_postal_code(['WC2N', 'EH53'])
postal_code country code place_name \
0 WC2N GB London
1 EH53 GB Pumpherston, Mid Calder, East Calder, Oakbank
state_name state_code county_name county_code community_name \
0 England ENG Greater London 11609024 NaN
1 Scotland SCT West Lothian WLN NaN
community_code latitude longitude accuracy
0 NaN 51.5085 -0.125700 4.0
1 NaN 55.9082 -3.479025 4.0
This uses the GeoNames postal code dataset to get the GPS coordinates, then computes the Haversine (great circle) distance on those. Most countries are supported.
In the particular case of Great Britain, only the outward codes are included in the GB dataset. The full dataset is also available as GB_full, but it is currently not supported in pgeocode.

The main issue with finding a distance between 2 postcodes is that they aren't designed for it.
For the purposes of directing mail, the United Kingdom is divided by
Royal Mail into postcode areas. -Wikipedia
A postcode by itself provides no useful information, so you are correct you need help from an external source. The Google maps service at http://maps.google.com is of no use, as it's not designed for you to retrieve information like this.
Option 1 - Google Maps API
The Google Maps API is feature packed and will provide you with a lot of options. The Distance Matrix API in particular will help with working out distances between 2 points. The results from this will be based on travel (so driving distance), which may or may not be what you want.
Example
Python 3
import urllib.request
import json
res = urllib.request.urlopen("https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=SE1%208XX&destinations=B2%205NY").read()
data = json.loads(res.decode())
print(data["rows"][0]["elements"][0]["distance"])
# {'text': '127 mi', 'value': 204914}
Note: Google Maps API is subject to usage limits, and requests to it now need an API key passed in the key query parameter.
Option 2 - Do it yourself with postcodes.io
postcodes.io has a nice API backed by a public data set. Example lookup. Results are in JSON which can be mapped to a Python dictionary using the json module. The downside here is it provides no way to check distance, so you will have to do it yourself using the Longitude and Latitude returned.
Example
Python 3
import urllib.request
import json
res = urllib.request.urlopen("http://api.postcodes.io/postcodes/SE18XX").read()
data = json.loads(res)
print(data["result"]["longitude"], data["result"]["latitude"])
# -0.116825494204512 51.5057668390097
Calculating distance
I don't want to get too much into this because it's a big topic and varies greatly depending on what you're trying to achieve, but a good starting point would be the Haversine Formula, which takes into account the curvature of the Earth. However, it assumes the Earth is a perfect sphere (which it's not).
The haversine formula determines the great-circle distance between two
points on a sphere given their longitudes and latitudes. Important in
navigation, it is a special case of a more general formula in
spherical trigonometry, the law of haversines, that relates the sides
and angles of spherical triangles.
Here is an example of it implemented in Python: https://stackoverflow.com/a/4913653/7220776
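For reference, a minimal sketch of the haversine formula in plain Python. The function name and the 6371 km mean Earth radius are my own choices rather than anything taken from the linked answer, and the two coordinate pairs are the WC2N and EH53 centroids from the pgeocode output earlier on this page.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; treats the Earth as a perfect sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# (latitude, longitude) pairs for WC2N and EH53
london = (51.5085, -0.1257)
west_lothian = (55.9082, -3.479025)
distance_km = haversine_km(*london, *west_lothian)
print(round(distance_km, 1))
```

The result should land close to the 536.5 km that pgeocode reports for the same pair, since pgeocode applies the same formula to the same coordinates.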

This looks like the perfect resource for you (they provide lat and long values for each postcode in the UK, in various formats): https://www.freemaptools.com/download-uk-postcode-lat-lng.htm
and in particular this CSV file (linked in the same page):
https://www.freemaptools.com/download/full-postcodes/ukpostcodes.zip
Once you match geographical coordinates to each postcode you have (out of the scope of this question), say you'll have a table with 4 columns (i.e. 2 (lat, long) values per postcode).
You can compute the distances using numpy. Here's an example:
import numpy as np
latlong = np.random.random((3,4))
# Dummy table containing 3 records, will look like this:
# array([[ 0.258906 , 0.66073909, 0.25845113, 0.87433443],
# [ 0.7657047 , 0.48898144, 0.39812762, 0.66054291],
# [ 0.2839561 , 0.04679014, 0.40685189, 0.09550362]])
# The following will produce a numpy array with as many elements as your records
# (It's the Euclidean distance between the points)
distances = np.sqrt((latlong[:, 3] - latlong[:, 1])**2 + (latlong[:, 2] - latlong[:, 0])**2)
# and it looks like this:
# array([ 0.21359582, 0.405643 , 0.13219825])
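Note that a plain Euclidean distance on latitude/longitude values comes out in degrees, and a degree of longitude gets shorter as you move away from the equator. A rough way to turn it into kilometres is the equirectangular approximation: scale the longitude difference by the cosine of the mean latitude, then multiply by the Earth's radius. The sketch below assumes the four columns are lat1, lon1, lat2, lon2 in decimal degrees (my assumption about the column order) and a mean Earth radius of 6371 km. It is only reasonable for fairly short distances.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# Hypothetical table: one row per postcode pair, columns lat1, lon1, lat2, lon2 (degrees)
latlong = np.array([
    [51.5085, -0.1257, 55.9082, -3.479025],  # WC2N -> EH53
    [51.5058, -0.1168, 51.5085, -0.1257],    # SE1 8XX -> WC2N
])

# Convert everything to radians and split into four 1-D arrays
lat1, lon1, lat2, lon2 = np.radians(latlong).T

# Equirectangular approximation: shrink the longitude difference by cos(mean latitude)
x = (lon2 - lon1) * np.cos((lat1 + lat2) / 2)
y = lat2 - lat1
distances_km = EARTH_RADIUS_KM * np.hypot(x, y)
print(distances_km)
```

For anything spanning more than a few hundred kilometres, the haversine formula is the safer choice.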

The simplest way to calculate the distance between two UK postcodes is not to use latitude and longitude but to use easting and northing instead.
Once you have easting and northing you can just use Pythagoras's theorem to calculate the distance, making the maths much simpler.
Get the easting and northing for the postcodes. You can use Open Postcode Geo for this.
Use the below formula to find the distance:
sqrt(pow(abs(easting1 - easting2),2) + pow(abs(northing1 - northing2),2))
This example is from MySQL but you should be able to find similar functions in both Excel and Python:
sqrt(): Find the square root.
pow(): Raise to the power of.
abs(): Absolute value (ignore sign).
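In Python the same calculation could be sketched as follows, assuming both points use the same grid and the eastings and northings are in metres (as with Ordnance Survey National Grid values). The function name and the sample coordinates are made up for illustration.

```python
from math import hypot


def grid_distance_m(easting1, northing1, easting2, northing2):
    """Straight-line distance between two grid points, in the grid's own unit (metres here)."""
    # hypot(dx, dy) is sqrt(dx**2 + dy**2), so no abs() is needed
    return hypot(easting1 - easting2, northing1 - northing2)


# Hypothetical grid coordinates: 3 km apart east-west and 4 km apart north-south
print(grid_distance_m(530000, 180000, 533000, 184000))  # 5000.0
```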

Related

Python and Pandas - Distances with latitude and longitude

I am trying compare distances between points (in this case fake people) in longitudes and latitudes.
I can import the data, then convert the lat and long data to radians and get the following output with pandas:
lat long
name
Veronica Session 0.200081 0.246723
Lynne Donahoo 0.775020 -1.437292
Debbie Hanley 0.260559 -1.594263
Lisandra Earls 1.203430 -2.425601
Sybil Leef -0.029293 0.592702
From there I am trying to compare different points and get the distance between them.
I came across a post that seemed to be of use (https://stackoverflow.com/a/40453439/15001056) but I am unable to get this working for my data set.
Any help in calculating the distance between points would be appreciated. Ideally I'd like to expand and optimise the route once the distance function is working.
I used the function in the answer you linked and it worked fine. Can't confirm that the distance is in the unit you need though.
df['dist'] = \
haversine(df.lat.shift(), df.long.shift(),
df.loc[1:, 'lat'], df.loc[1:, 'long'], to_radians=False)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Veronica Session 0.200081 0.246723 NaN
Lynne Donahoo 0.775020 -1.437292 9625.250626
Debbie Hanley 0.260559 -1.594263 3385.893020
Lisandra Earls 1.203430 -2.425601 6859.234096
Sybil Leef -0.029293 0.592702 12515.848878
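The haversine function itself is not shown here; it comes from the linked answer. Below is a sketch of a function with the same call signature, written with NumPy. Treat it as a reconstruction rather than the exact code from that answer. With the default earth_radius of 6371 the result is in kilometres, and to_radians=False is used because the data in the question has already been converted to radians.

```python
import numpy as np
import pandas as pd


def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """Vectorised great-circle distance; in km for the default radius."""
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


# First three rows from the question, already in radians
df = pd.DataFrame(
    {"lat": [0.200081, 0.775020, 0.260559],
     "long": [0.246723, -1.437292, -1.594263]},
    index=["Veronica Session", "Lynne Donahoo", "Debbie Hanley"],
)

# Distance from the previous row to the current one; the first row has no predecessor
df["dist"] = haversine(df["lat"].shift(), df["long"].shift(),
                       df["lat"], df["long"], to_radians=False)
print(df)
```

Passing the full lat and long columns (instead of slicing with df.loc[1:, ...]) also works when the index holds names rather than integers.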

Most effective pathway to compare geographic locations? solutions pls

Say I have a location A, with IP address, and a geolocator API with latitude and longitude. Now I want to find all instances that are within a 25-mile radius of location A. How can I compute this with the least amount of steps?
solution A: I can compute all distances between location A and all instances in the database, and display the instances within the 25-mile radius. (way too slow, especially if I want a dynamic location with a large database of locations)
solution B: I can group all instances by zip code in addition to IP and (lat, long), so that fewer distances between location A and the instances need to be computed. (better, but what if the IP address is at the border of another zip code? This will add to the amount of needed computation)
solution C: I can use trigonometry. Using the latitude and longitude of location A, I can find each instance within the 25-mile radius.
Can someone please describe a better way of comparing distances? Ideas and suggestions are much appreciated (if further explanation is needed, please ask). Thanks.
I'd use a combination of your proposed solutions a and c. You can query your database directly using a filter that only selects the locations within a 25 miles radius (or any other radius). Calculating the longitudinal difference in miles is a little tricky because the mileage of one degree in longitude differs with latitude. Kudos go out to this explanation: https://gis.stackexchange.com/questions/142326/calculating-longitude-length-in-miles#142327
Assuming you have the following DB schema with existing locations (only latitude and longitude as columns):
CREATE TABLE location (
lat REAL,
lon REAL
);
You're able to filter only the locations within a 25 mile radius using this query:
query = """
SELECT (lat - ?) AS difflat,
(lon - ?) AS difflon
FROM location
WHERE POWER(POWER(difflat * 69.172, 2) + POWER(difflon * COS(lat*3.14/180.) * 69.172, 2), 0.5) < ?;
"""
Using the query then like this:
radius = 25 #miles
cursor.execute(query, (querylocation['lat'], querylocation['lon'], radius))
SQLite3 unfortunately doesn't support basic mathematical functions like COS and POWER but they can easily be created:
import math
import sqlite3
con = sqlite3.connect(db_path)
con.create_function('POWER', 2, math.pow)
con.create_function('COS', 1, math.cos)
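Putting the pieces together, here is a self-contained sketch against an in-memory database. The sample rows, the named placeholders and the extra RADIANS helper are my own additions for illustration; the distance expression is the same flat approximation as in the query above, written without column aliases in the WHERE clause.

```python
import math
import sqlite3

MILES_PER_DEGREE = 69.172  # approximate length of one degree of latitude, as used above

con = sqlite3.connect(":memory:")
# Register the maths helpers in case this SQLite build does not provide them
con.create_function("POWER", 2, math.pow)
con.create_function("COS", 1, math.cos)
con.create_function("RADIANS", 1, math.radians)

con.execute("CREATE TABLE location (lat REAL, lon REAL)")
# Hypothetical sample rows
con.executemany(
    "INSERT INTO location VALUES (?, ?)",
    [(51.5085, -0.1257),   # central London
     (51.4545, -0.9781),   # Reading, well over 25 miles to the west
     (51.5058, -0.1168)],  # South Bank, under a mile away
)

query = """
SELECT lat, lon
FROM location
WHERE POWER(POWER((lat - :lat) * :mpd, 2)
          + POWER((lon - :lon) * COS(RADIANS(lat)) * :mpd, 2), 0.5) < :radius
"""
origin = {"lat": 51.5074, "lon": -0.1278, "mpd": MILES_PER_DEGREE, "radius": 25}
nearby = con.execute(query, origin).fetchall()
print(nearby)  # the two London rows; Reading is filtered out
```

For a large table, a cheap bounding-box condition on lat and lon with an index on those columns would let the database skip most rows before this expression is evaluated.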

Excluding data points based on proximity in a scatterplot

I am trying to create a representation of Amsterdam's channels based on a very large data set of coordinates sent through AIS. As the AIS is sometimes calibrated wrongly, some coordinates are not on the actual channel, but rather on urban structures. Luckily, this happens relatively rarely. As a result, these data points are not in close proximity to other data points or data point clusters. As such, I want to exclude those data points that do not have a 'neighbour' within a margin (say 5 meters in real life) in the most pythonic way. Would anyone know how to approach this problem? My data is a simple pandas dataframe:
lng lat
0 4.962218 52.362260
1 4.882198 52.406013
2 4.918583 52.335535
3 4.908185 52.381353
4 5.020983 52.277188
... ... ...
2249835 4.979960 52.352660
2249836 4.914533 52.334980
2249837 4.856630 52.401977
2249838 4.971418 52.357525
2249839 5.042353 52.402142
[2211095 rows x 2 columns]
and the map currently looks as follows. I have marked examples of coordinates I want to filter out / exclude:

How to delete CSV table values outside of a longitude/latitude radius?

I have a csv file with a table that has the columns Longitude, Latitude, and Wind Speed. I have a code that takes a csv file and deletes values outside of a specified bound. I would like to retain values whose longitude/latitude is within a 0.5 lon/lat radius of a point located at -71.5 longitude and 40.5 latitude.
My example code below deletes any values whose longitude and latitude isn't between -71 to -72 and 40 to 41 respectively. Of course, this retains values within a square bound ±0.5 lon/lat around my point of interest. But I am interested in finding values within a circular bound with radius 0.5 lon/lat of my point of interest. How should I modify my code?
import pandas as pd
import numpy
df = pd.read_csv(r"C:\\Users\\xil15102\\Documents\\results\\EasternLongIsland50.csv") #file path
indexNames=df[(df['Longitude'] <= -72)|(df['Longitude']>=-71)|(df['Latitude']<=40)|(df['Latitude']>=41)].index
df.drop(indexNames,inplace=True)
df.to_csv(r"C:\\Users\\xil15102\\Documents\\results\\EasternLongIsland50.csv")
Basically you need to check if a value is a certain distance from a central point (-71.5 and 40.5); to do this use the pythagorean theorem/distance formula:
d = sqrt(dx^2+dy^2).
So programmatically, I would do this like:
from math import sqrt
drop_indices = []
for row in df.index:
    # offsets (in degrees) from the point of interest at (-71.5, 40.5)
    dx = -71.5 - df.loc[row, 'Longitude']
    dy = 40.5 - df.loc[row, 'Latitude']
    if sqrt(dx*dx + dy*dy) > 0.5:
        drop_indices.append(row)
# drop() returns a new DataFrame, so assign the result back
df = df.drop(drop_indices)
Sorry, that is sort of a disgusting way to get rid of the rows and your way looks much better, but the code should work.
You should write a function to calculate the distance from your point of interest and drop those. Some help here. Pretty sure the example below should work if you implement is_not_in_area as a function to calculate the distance and check if dist < 0.5.
df = df.drop(df[is_not_in_area(df.lat, df.lon)].index)
(This code lifted from here)
Edit: drop the ones that aren't in area, not the ones that are haha.
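A possible implementation of is_not_in_area, as a sketch: it returns a boolean mask that is True for every row further than 0.5 degrees from the point of interest. The constant names and the sample rows are hypothetical; the column names are the ones from the question. np.hypot works element-wise on pandas Series, so no loop is needed.

```python
import numpy as np
import pandas as pd

CENTER_LON, CENTER_LAT = -71.5, 40.5  # point of interest from the question
RADIUS_DEG = 0.5                      # radius in lon/lat degrees


def is_not_in_area(lat, lon):
    """Boolean mask: True where a point lies outside the circle (measured in degrees)."""
    return np.hypot(lon - CENTER_LON, lat - CENTER_LAT) > RADIUS_DEG


# Hypothetical sample rows
df = pd.DataFrame({
    "Longitude": [-71.5, -71.1, -71.9, -72.5],
    "Latitude": [40.5, 40.9, 40.5, 40.5],
    "Wind Speed": [5.0, 6.1, 7.2, 8.3],
})

df = df.drop(df[is_not_in_area(df["Latitude"], df["Longitude"])].index)
print(df)
```

The second sample row sits inside the original square bound but outside the circle, so it is dropped along with the fourth one.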

Calculate Km with latitude and longitude of different DataFrames Python Pandas

I have 4 Dataframes (ticket_data.csv, providers.csv, stations.csv and cities.csv)
In stations.csv I have 2 columns called o_city (origin city) and d_city (destination city); those two columns give me the id of the city I need to look up in cities.csv
In cities.csv I have the lat and long of each city.
How can I calculate the distance between o_city and d_city for each ticket? I tried to use pyproj but I didn't find a way to make it work with each ticket.
Screenshot of csv files :
ticket_data.csv
cities.csv
Welcome to StackOverflow! In your cities dataframe, assuming here it is called city_df, for each row you can use the haversine distance formula from spherical trigonometry to calculate the distance between two coordinate pairs on Earth's surface. Here is some dummy Python 3 code showing roughly how you may go about this (just using two pairs of coordinates for ease of communication):
from haversine import haversine
distance = haversine((city_df['origin_lat'][0], city_df['origin_lon'][0]), (city_df['destination_lat'][0], city_df['destination_lon'][0]))
The coordinates must be in decimal degree notation, as in 43.9202, rather than degrees-minutes-seconds notation such as 43° 55' 13". Given this, the output value of distance will be in km.
Hope this helps you get closer to solving your problem!
P.S. - you may need to install haversine, as it is not in the standard library
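To get a distance for every ticket rather than a single pair, one option is to join the city coordinates onto the tickets twice (once for the origin, once for the destination) and apply the formula to whole columns. The sketch below uses made-up data and guessed column names (id, latitude, longitude), since the real headers are only visible in the screenshots. It also swaps the haversine package for the same formula written directly in NumPy, so that it runs without extra installs.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the real column names in cities.csv and ticket_data.csv may differ
city_df = pd.DataFrame({
    "id": [1, 2, 3],
    "latitude": [48.8566, 45.7640, 43.2965],   # Paris, Lyon, Marseille
    "longitude": [2.3522, 4.8357, 5.3698],
})
ticket_df = pd.DataFrame({"o_city": [1, 1, 2], "d_city": [2, 3, 3]})

# Attach origin and destination coordinates to every ticket
coords = city_df.set_index("id")
tickets = (ticket_df
           .join(coords.add_prefix("o_"), on="o_city")
           .join(coords.add_prefix("d_"), on="d_city"))

# Haversine formula applied to whole columns (6371 km assumed mean Earth radius)
lat1, lon1, lat2, lon2 = (np.radians(tickets[c]) for c in
                          ["o_latitude", "o_longitude", "d_latitude", "d_longitude"])
a = (np.sin((lat2 - lat1) / 2) ** 2
     + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
tickets["distance_km"] = 2 * 6371 * np.arcsin(np.sqrt(a))
print(tickets[["o_city", "d_city", "distance_km"]])
```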
