I have code that takes a starting lat/lon, a bearing (direction), and a distance (in km) and finds the new lat/lon on a spherical Earth. The code looks like:
def get_new_lat_lon_from_distance_bearing_lat_lon(lat0, lon0, bearing, d):
    import math
    # Earth radius in km
    R = 6378.1
    # Convert to radians
    lat1 = math.radians(lat0)
    lon1 = math.radians(lon0)
    brng = math.radians(bearing)
    # Standard destination-point formulas for lat and lon
    lat2 = math.asin(math.sin(lat1)*math.cos(d/R) + math.cos(lat1)*math.sin(d/R)*math.cos(brng))
    lon2 = lon1 + math.atan2(math.sin(brng)*math.sin(d/R)*math.cos(lat1), math.cos(d/R) - math.sin(lat1)*math.sin(lat2))
    # Convert back to degrees
    lat2 = math.degrees(lat2)
    lon2 = math.degrees(lon2)
    return lat2, lon2
I can then call this such that:
lat_s,lon_s = get_new_lat_lon_from_distance_bearing_lat_lon(yll,xll,180,cellsize*r)
lat_e,lon_e = get_new_lat_lon_from_distance_bearing_lat_lon(yll,xll,90,cellsize*c)
where yll = 55 and xll = -130 (the starting lat and lon, matching the printed output below).
I want to move in 500 m steps from this starting lat/lon position south (bearing = 180) and also east (bearing = 90). The east direction should be traversed 14,000 times and the south direction 7,000 times. In other words, we can loop through these such that:
nrows = 7000
ncols = 14000
cellsize = 0.5  # km (500 m)
# Pre-allocate
biglats = []
biglons = []
# Traverse south
for r in range(0, nrows):
    lat_s, lon_s = get_new_lat_lon_from_distance_bearing_lat_lon(yll, xll, 180, cellsize*r)
    biglats.append(lat_s)
# Traverse east
for c in range(0, ncols):
    lat_e, lon_e = get_new_lat_lon_from_distance_bearing_lat_lon(yll, xll, 90, cellsize*c)
    biglons.append(lon_e)
However, when I print the first and last values of each:
55.000000000147004    # first latitude
23.56327426583246     # last latitude
-130.00000000007      # first longitude
-56.372396480687385   # last longitude
23.56 should be 20, and -56.37 should be -60. The end goal is to create a meshgrid of lat/lon as a [14000, 7000] array. However, the calculations are wrong. What could be done to get more correct lat/lon values and/or some sort of 'meshgrid' of 14000 x 7000 lat/lon values equally spaced 500 m apart, given the starting lat/lon provided?
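For what it's worth, the arithmetic may already be right: 3500 km along a sphere of radius 6378.1 km subtends about 31.4 degrees, so 55 - 31.4 ≈ 23.6 is what a spherical formula should give (the expected 20 and -60 would correspond to a flat 100 km per degree). Below is a minimal vectorized sketch of the same calculation plus the meshgrid; the starting values are my assumption, taken from the printed output:

import numpy as np

R = 6378.1  # km, same radius as above
yll, xll = 55.0, -130.0            # assumed starting lat/lon, per the printed output
nrows, ncols, cellsize = 7000, 14000, 0.5

lat1 = np.radians(yll)
lon1 = np.radians(xll)

# Southward (bearing 180): each step simply subtracts d/R radians of latitude
d_s = cellsize * np.arange(nrows) / R
biglats = np.degrees(lat1 - d_s)

# Eastward (bearing 90) from the start, same spherical formulas as the loop above
d_e = cellsize * np.arange(ncols) / R
lat_e = np.arcsin(np.sin(lat1) * np.cos(d_e))
biglons = np.degrees(lon1 + np.arctan2(np.sin(d_e) * np.cos(lat1),
                                       np.cos(d_e) - np.sin(lat1) * np.sin(lat_e)))

lon_grid, lat_grid = np.meshgrid(biglons, biglats)  # each of shape (7000, 14000)

This reproduces the loop results (last latitude ≈ 23.56, last longitude ≈ -56.37) without 21,000 function calls.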
My task is as follows: knowing the center (starting point), for example [{'lat': -7.7940023, 'lng': 110.3656535}], and knowing the radius of 5 km, I need to get all the points included in this square in 1 km increments. How do I achieve this?
P.S. Using the Haversine formula I can check whether a point is in a given square according to the radius.
If you consider a spherical Earth with radius R, the angle subtended by a segment of length L (5 km in your case) is:
import numpy as np
R = 6378.0 # km
L = 5.0 # km
angle = np.degrees(L/R)
so now you can easily check if a point is inside your square (comparing absolute differences, so points on either side of the center are handled):
center = {'lat': -7.7940023, 'lng': 110.3656535}
point = {'lat': 0.0, 'lng': 0.0}  # insert your values here
if abs(point['lat'] - center['lat']) < angle and abs(point['lng'] - center['lng']) < angle:
    print('Point is inside')
else:
    print('Point is outside')
EDIT: to generate all the points inside the square, check the code below.
import numpy as np
R = 6378.0  # km
L = 5  # side of the square, km
center = {'lat': -7.7940023, 'lng': 110.3656535}
square_side = np.linspace(-L/2, L/2, L+1)
angle = np.degrees(square_side/R)
latitude, longitude = np.meshgrid(angle + center['lat'], angle + center['lng'])
points = []
for lat, lng in zip(latitude.flatten(), longitude.flatten()):
    points.append({'lat': lat, 'lng': lng})
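A quick check of the output (my addition): with L = 5, linspace returns 6 values per axis, so the grid has 36 points spaced roughly 1 km apart:

print(len(points))  # 36
print(points[0])    # south-west corner, ~2.5 km from the center on each axis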
This example should illustrate the point:
import pandas as pd
import numpy as np
#for_map = pd.read_csv('campaign_contributions_for_map.tsv', sep='\t')
df_airports = pd.read_csv('C:\\airports.csv')
print(df_airports.head(3))
df_cities = pd.read_csv('C:\\worldcities.csv')
print(df_cities.head(3))
# join the two dataframes - must be the same length
df = pd.concat([df_cities, df_airports], axis=1)
# cast latitudes and longitudes to numeric
cols = ["lat", "lng", "latitude_deg", "longitude_deg"]
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce', axis=1)
# create a mask where our conditions are met (difference between city lat and airport lat < 0.5
# and difference between city lng and airport lng < 0.5)
mask = ((abs(df["lat"] - df["latitude_deg"]) < .5) & (abs(df["lng"] - df["longitude_deg"]) < .5))
# fill the type column
df.loc[mask, 'Type'] = "Airport"
df.shape
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
df.head()
# The haversine formula determines the great-circle distance between two points on a sphere given
# their longitudes and latitudes. Important in navigation, it is a special case of a more general
# formula in spherical trigonometry, the law of haversines, that relates the sides and angles of
# spherical triangles.
lat1 = df['lat']
lon1 = df['lng']
lat2 = df['latitude_deg']
lon2 = df['longitude_deg']
from math import radians, cos, sin, asin, sqrt
def haversine(lat1, lon1, lat2, lon2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kilometers is 6371
    km = 6371 * c
    return km
# Creating a new column to generate the output by passing lat long information to Haversine Equation
df['distance'] = [haversine(df.lat[i],df.lng[i],df.latitude_deg[i],df.longitude_deg[i]) for i in range(len(df))]
df['distance'] = df['distance'].round(decimals=3)
# Printing the data table
df.sort_values(by=['distance'], inplace=True)
df.head()
# Let's sort by our newly created field, which identifies airport lat/long coordinates within 0.5 of
# a city's lat/long coordinates
# Create a mask where our conditions are met (difference between lat and latitude_deg < 0.1 and
# difference between lng and longitude_deg < 0.1)
mask = ((abs(df["lat"] - df["latitude_deg"]) < 0.1) & (abs(df["lng"] - df["longitude_deg"]) < 0.1))
# Fill the type column
df.loc[mask, 'Type'] = "Airport"
df.sort_values(by=['Type'], inplace=True)
df.head()
More details here.
https://github.com/ASH-WICUS/Notebooks/blob/master/Haversine%20Distance%20-%20Airport%20or%20Not.ipynb
I have high-frequency GPS data which I want to downsample to every 50 meters, i.e. keep the GPS latitude and longitude every 50 meters and discard the in-between points. I found Python code on the internet which calculates the distance between two points. But I am not sure how to read the lat and long values from a CSV, feed them into the function, and calculate the distance. If the distance reaches 50 meters I simply save those GPS coordinates. So far, I have the following Python code:
from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371  # Radius of earth in kilometers. Use 3956 for miles
    return c * r
x1 = 52.19421607
x2 = 52.20000327
y1 = -1.484984011
y2 = -1.48533465
result = haversine(x1,y1,x2,y2) #need to give input from a csv
#if result is greater than 50m , save the coordinates
print(result)
How can I solve this problem? Any direction would be appreciated.
Here is an outline and a working code example, where I made some assumptions about which points to keep/drop. I assume the dataframe is sorted.
First, calculate the distance to the next point, using haversine for lat/long pairs. This part is not fast in my implementation; you can find faster.
Use cumsum() of the distances to create distance groups, where group 1 is all distances below 50, group 2 between 50 and 100, etc.
Within each group, keep for instance only the first() point.
Note that this is approximately every 50 units based on the group, so be aware this is different from taking a point, jumping to the next point closest to 50 units away, and repeating. But for data-reduction purposes it should be fine.
Generate some random data around London.
import numpy as np
import sklearn
import pandas as pd
LONDON = (51.509865, -0.118092)
random_gps = np.random.random( (10000,2) ) / 25
random_gps[:,0] += np.arange(random_gps.shape[0]) / 25
random_gps[:,0] += LONDON[0]
random_gps[:,1] += LONDON[1]
gps_data = pd.DataFrame( random_gps, columns=["lat","long"] )
Shift the data to get the lat/long of the next point
gps_data['next_lat'] = gps_data.lat.shift(-1)   # shift(-1) pulls the *next* row's value up
gps_data['next_long'] = gps_data.long.shift(-1)
gps_data.head()
Define the distance metric. This part can be improved in terms of speed by using vector expressions with numpy, so if speed is important change this part.
from sklearn.neighbors import DistanceMetric
dist = DistanceMetric.get_metric('haversine')
EARTH_RADIUS = 6371.009
def haversine_distance(row):
    point_a = np.array([[row.lat, row.long]])
    point_b = np.array([[row.next_lat, row.next_long]])
    return EARTH_RADIUS * dist.pairwise(np.radians(point_a), np.radians(point_b))[0][0]
and apply our distance function (slow part, which can be improved)
gps_data["distance_to_next"] = gps_data.apply( haversine_distance, axis=1)
gps_data["distance_cumsum"] = gps_data.distance_to_next.cumsum()
Finally, create groups and drop. Note that haversine returns the distance in km, so here I wrongly used 50 km instead of 50 meters in the example; use 0.05 for 50 m.
gps_data["distance_group"] = gps_data.distance_cumsum // 50
filtered = gps_data.groupby(['distance_group']).first()
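As flagged above, the row-wise apply is the slow part. A vectorized sketch of the same haversine over the whole column at once (my addition, assuming the gps_data frame and EARTH_RADIUS defined above):

import numpy as np

lat = np.radians(gps_data.lat.to_numpy())
lon = np.radians(gps_data.long.to_numpy())
dlat = np.diff(lat)   # difference to the next row
dlon = np.diff(lon)
a = np.sin(dlat/2)**2 + np.cos(lat[:-1]) * np.cos(lat[1:]) * np.sin(dlon/2)**2
# last row has no next point, so pad with NaN to match the frame length
gps_data["distance_to_next"] = np.append(2 * EARTH_RADIUS * np.arcsin(np.sqrt(a)), np.nan)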
I need to find the distance between two GPS trajectories from the US 101 dataset, which covers a total distance of 2000 ft.
"Vehicle ID","Frame ID","Total Frames","Global Time","Local X","Local Y","Global X","Global Y","V_Len","V_Width","V_Class","V_Vel","V_Acc","Lane_ID","Pre_Veh","Fol_Veh","Spacing","Headway"
2,13,437,1118846980200,16.467,35.381,6451137.641,1873344.962,14.5,4.9,2,40.00,0.00,2,0,0,0.00,0.00
2,14,437,1118846980300,16.447,39.381,6451140.329,1873342.000,14.5,4.9,2,40.00,0.00,2,0,0,0.00,0.00
2,15,437,1118846980400,16.426,43.381,6451143.018,1873339.038,14.5,4.9,2,40.00,0.00,2,0,0,0.00,0.00
2,16,437,1118846980500,16.405,47.380,6451145.706,1873336.077,14.5,4.9,2,40.00,0.00,2,0,0,0.00,0.00
2,17,437,1118846980600,16.385,51.381,6451148.395,1873333.115,14.5,4.9,2,40.00,0.00,2,0,0,0.00,0.00
But when I try to find the distance between two adjacent points of the same vehicle, it gives more than 20 km.
import math
def distance(origin, destination):
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 3959 * 5280  # Earth radius in feet (3959 miles * 5280 ft/mile)
    dlat = math.radians(lat2-lat1)
    dlon = math.radians(lon2-lon1)
    a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
        * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    d = radius * c
    return d
lat1 = 16.467; lat2 = 16.447; long1 = 35.381; long2 = 39.381;
print( distance((lat1, long1), (lat2, long2)) )
Can anyone help me find the distance between two adjacent trajectory points?
I need to segregate the dataset into subsections covering 200 ft of distance each.
Are you sure those coordinates are latitude and longitude?
I am not familiar with the dataset you are using. However, if I am not mistaken, this is it. And the documentation clearly states about Local X (my italics):
Lateral (X) coordinate of the front center of the vehicle in feet with respect to the left-most edge of the section in the direction of travel.
(and something similar for Local Y).
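If that is right, the coordinates are local offsets in feet, not degrees, so a plain Euclidean distance is enough. A minimal sketch under that assumption (my addition, not from the original answer):

import math

def local_distance_ft(p1, p2):
    """Euclidean distance in feet between two (Local X, Local Y) points."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

# Two adjacent rows of vehicle 2 from the sample above
d = local_distance_ft((16.467, 35.381), (16.447, 39.381))
print(d)  # ~4.0 ft between consecutive frames

Segmenting into 200 ft subsections could then be as simple as grouping rows by Local Y // 200.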
This code is a function which is then run through a for loop with an if statement (not included for brevity, but basically it lets me separate some factory output data by machine and location). The function is designed to convert the x and y data I am given, in meters from a known lat/long position, into a new lat and long. I do this using Pythagoras and then a formula I found on SO.
It runs without error but the math fails: it adds more or less a degree of longitude to the output. It should produce data very similar to the reference location, as it is never more than about 50 meters away from it.
Here is a section of the JSON from which the data is taken, for interest:
"id": "b4994c877c9c",
"name": "forklift_0001", <---forklift data used in IF statement
"areaId": "Tracking001",
"areaName": "hall_1",
"color": "#FF0000",
"coordinateSystemId": "CoordSys001",
"coordinateSystemName": null,
"covarianceMatrix": [
0.47,
0.06,
0.06,
0.61
],
"position": [
33.86, <---position data converted from known lat/long, X then Y.
33.07,
2.15
],
"positionAccuracy": 0.36,
"positionTS": 1489363199493,
"smoothedPosition": [
33.96,
33.13,
2.15
and here is the code
import json
import pprint
import time
import math
file_list = ['13_01.json']
output_nr = 1
def positionToLatLon( position ):
    posx = position[0]
    posy = position[1]
    R = 6371  # Radius of the Earth
    brng = 1.57  # Bearing is 90 degrees converted to radians.
    d = math.sqrt((posx*posx) + (posy*posy))  # Distance in km from the lat/long (Pythagoras formula)
    lat1 = math.radians(40.477719)  # reference lat point converted to radians
    lon1 = math.radians(16.941589)  # reference long point converted to radians
    lat2 = math.asin(math.sin(lat1)*math.cos(d/R) + math.cos(lat1)*math.sin(d/R)*math.cos(brng))
    lon2 = lon1 + math.atan2(math.sin(brng)*math.sin(d/R)*math.cos(lat1),
                             math.cos(d/R)-math.sin(lat1)*math.sin(lat2))
    lat2 = math.degrees(lat2)
    lon2 = math.degrees(lon2)
    result = []
    result.append(lat2)
    result.append(lon2)
    return result
So it runs without any errors, but the output is incorrect: it adds more or less a degree of longitude, which moves the whole result about 60 m east and makes the analysis no good, and I cannot see why.
I've looked for a different formula but had no luck, and my maths isn't good enough to tell whether I am using an incorrect trig function or something.
All help appreciated.
Change brng = 1.57 to brng = math.pi / 2. 1.57 is not a precise enough value for a 90 degree bearing in radians, and the rounding is likely the source of some, if not all, of your error.
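That said, the question states the position values are offsets in meters while R is in km, so d/R mixes units, and a fixed 90 degree bearing only holds for points due east of the reference. A sketch of a corrected version under those assumptions (my addition, not the original answer's code):

import math

def positionToLatLon(position, ref_lat=40.477719, ref_lon=16.941589):
    posx, posy = position[0], position[1]   # offsets in meters (assumed: x east, y north)
    R = 6371.0                              # Earth radius in km
    d = math.hypot(posx, posy) / 1000.0     # distance in km, same units as R
    brng = math.atan2(posx, posy)           # bearing from north, derived from the offsets
    lat1 = math.radians(ref_lat)
    lon1 = math.radians(ref_lon)
    lat2 = math.asin(math.sin(lat1)*math.cos(d/R) + math.cos(lat1)*math.sin(d/R)*math.cos(brng))
    lon2 = lon1 + math.atan2(math.sin(brng)*math.sin(d/R)*math.cos(lat1),
                             math.cos(d/R) - math.sin(lat1)*math.sin(lat2))
    return [math.degrees(lat2), math.degrees(lon2)]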
I'm dealing with two sets of three large lists of the same size containing longitude, latitude and altitude coordinates in UTM format (see the lists below). The sets contain overlapping coordinates (i.e. longitude and latitude values that are equal). If the values in Lon equal those in Lon2 and the values in Lat equal those in Lat2, then I want to calculate the mean altitude at those indices. However, if they're not equal, the longitude, latitude and altitude values should remain as they are. I only want to reduce the overlapping data to one set of longitude and latitude coordinates and calculate the mean altitude at those coordinates.
This is my attempt so far
import numpy as np
Lon = [450000.50, 459000.50, 460000, 470000]
Lat = [5800000.50, 459000.50, 500000, 470000]
Alt = [-1, -9, -2, 1]
Lon2 = [450000.50, 459000.50, 460000, 470000]
Lat2 = [5800000.50, 459000.50, 800000, 470000]
Alt2 = [-3, -1, -20, 2]
MeanAlt = []
appendAlt = MeanAlt.append
LonOverlap = []
appendLon = LonOverlap.append
LatOverlap = []
appendLat = LatOverlap.append
for i, a in enumerate(zip(Lon, Lat, Alt)):
    for j, b in enumerate(zip(Lon2, Lat2, Alt2)):
        if Lon[i] == Lon2[j] and Lat[i] == Lat2[j]:
            MeanAltData = (Alt[i] + Alt2[j]) / 2
            appendAlt(MeanAltData)
            LonOverlapData = Lon[i]
            appendLon(LonOverlapData)
            LatOverlapData = Lat[i]
            appendLat(LatOverlapData)
print(MeanAlt)  # correct ans should be MeanAlt = [-2.0, -5, 1.5]
print(LonOverlap)
print(LatOverlap)
I'm working in a Jupyter notebook and my laptop is rather slow, so I need to make this code much more efficient. I would appreciate any help with this. Thank you :)
I believe your code can be improved in 2 ways:
Firstly, the usage of tuples instead of lists, as iterating over a tuple is generally faster than iterating over a list.
Secondly, your for loops can be reduced to only one loop that iterates over the indices of the tuples you are going to read. Of course, this assumption holds if and only if all your tuples contain the same amount of items (i.e.: len(Lat) == len(Lon) == len(Alt) == len(Lat2) == len(Lon2) == len(Alt2)).
Here is the improved code (I took the liberty of removing the import numpy statement as it was not being used in the piece of code you provided):
# use of tuples
Lon = (450000.50, 459000.50, 460000, 470000)
Lat = (5800000.50, 459000.50, 500000, 470000)
Alt = (-1, -9, -2, 1)
Lon2 = (40000.50, 459000.50, 460000, 470000)
Lat2 = (5800000.50, 459000.50, 800000, 470000)
Alt2 = (-3, -1, -20, 2)
MeanAlt = []
appendAlt = MeanAlt.append
LonOverlap = []
appendLon = LonOverlap.append
LatOverlap = []
appendLat = LatOverlap.append
# only one loop
for i in range(len(Lon)):
    if (Lon[i] == Lon2[i]) and (Lat[i] == Lat2[i]):
        MeanAltData = (Alt[i] + Alt2[i]) / 2
        appendAlt(MeanAltData)
        LonOverlapData = Lon[i]
        appendLon(LonOverlapData)
        LatOverlapData = Lat[i]
        appendLat(LatOverlapData)
print(MeanAlt)
print(LonOverlap)
print(LatOverlap)
I executed this program 1 million times on my laptop. With my code, the total time for all executions is 1.41 seconds; with your approach, it is 4.01 seconds.
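A sketch of how such a timing comparison might be set up with timeit (my addition; the original benchmark harness isn't shown):

import timeit

def new_version():
    Lon = (450000.50, 459000.50, 460000, 470000)
    Lat = (5800000.50, 459000.50, 500000, 470000)
    Alt = (-1, -9, -2, 1)
    Lon2 = (40000.50, 459000.50, 460000, 470000)
    Lat2 = (5800000.50, 459000.50, 800000, 470000)
    Alt2 = (-3, -1, -20, 2)
    MeanAlt = []
    for i in range(len(Lon)):
        if Lon[i] == Lon2[i] and Lat[i] == Lat2[i]:
            MeanAlt.append((Alt[i] + Alt2[i]) / 2)
    return MeanAlt

print(timeit.timeit(new_version, number=1_000_000), "seconds")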
This is not 100% functionally equivalent, but I am guessing it is closer to what you actually want:
Lon = [450000.50, 459000.50, 460000, 470000]
Lat = [5800000.50, 459000.50, 500000, 470000]
Alt = [-1, -9, -2, 1]
Lon2 = [40000.50, 459000.50, 460000, 470000]
Lat2 = [5800000.50, 459000.50, 800000, 470000]
Alt2 = [-3, -1, -20, 2]
MeanAlt = []
appendAlt = MeanAlt.append
LonOverlap = []
appendLon = LonOverlap.append
LatOverlap = []
appendLat = LatOverlap.append
ll = dict((str(la) + '/' + str(lo), al) for (la, lo, al) in zip(Lat, Lon, Alt))
for la, lo, al in zip(Lat2, Lon2, Alt2):  # keep lat/lon order consistent with the dict keys
    al2 = ll.get(str(la) + '/' + str(lo))
    if al2 is not None:  # an altitude of 0 is still a valid match
        MeanAltData = (al + al2) / 2
        appendAlt(MeanAltData)
        LonOverlapData = lo
        appendLon(LonOverlapData)
        LatOverlapData = la
        appendLat(LatOverlapData)
print(MeanAlt)
print(LonOverlap)
print(LatOverlap)
Or simpler:
Lon = [450000.50, 459000.50, 460000, 470000]
Lat = [5800000.50, 459000.50, 500000, 470000]
Alt = [-1, -9, -2, 1]
Lon2 = [40000.50, 459000.50, 460000, 470000]
Lat2 = [5800000.50, 459000.50, 800000, 470000]
Alt2 = [-3, -1, -20, 2]
ll = dict((str(la) + '/' + str(lo), al) for (la, lo, al) in zip(Lat, Lon, Alt))
result = []
for la, lo, al in zip(Lat2, Lon2, Alt2):
    al2 = ll.get(str(la) + '/' + str(lo))
    if al2 is not None:
        result.append((la, lo, (al + al2) / 2))
print(result)
In practice, I would try to start with better-structured input data to begin with, making the conversion to a dict, or at the very least the zip(), unnecessary.
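For example (my sketch, not the original answer's code, assuming the same lists as above), keying the dict on (lat, lon) tuples avoids the string concatenation entirely:

ll = {(la, lo): al for (la, lo, al) in zip(Lat, Lon, Alt)}
result = [(la, lo, (al + ll[(la, lo)]) / 2)
          for la, lo, al in zip(Lat2, Lon2, Alt2)
          if (la, lo) in ll]
print(result)  # [(459000.5, 459000.5, -5.0), (470000, 470000, 1.5)]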
Use numpy to vectorize the computations. For 1,000,000-element arrays, execution time should be on the order of 15-25 ms if the inputs are already numpy.ndarrays, and ~140 ms if the inputs are Python lists.
import numpy as np
def mean_alt(lon, lon2, lat, lat2, alt, alt2):
    lon = np.asarray(lon)
    lon2 = np.asarray(lon2)
    lat = np.asarray(lat)
    lat2 = np.asarray(lat2)
    alt = np.asarray(alt)
    alt2 = np.asarray(alt2)
    ind = np.where((lon == lon2) & (lat == lat2))
    mean_alt = (0.5 * (alt[ind] + alt2[ind])).tolist()
    return (lon[ind].tolist(), lat[ind].tolist(), mean_alt)
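Called with the question's sample lists, this returns the matching coordinates and the expected means:

Lon = [450000.50, 459000.50, 460000, 470000]
Lat = [5800000.50, 459000.50, 500000, 470000]
Alt = [-1, -9, -2, 1]
Lon2 = [450000.50, 459000.50, 460000, 470000]
Lat2 = [5800000.50, 459000.50, 800000, 470000]
Alt2 = [-3, -1, -20, 2]
lons, lats, means = mean_alt(Lon, Lon2, Lat, Lat2, Alt, Alt2)
print(means)  # [-2.0, -5.0, 1.5]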