Python nearest neighbour - coordinates

I wanted to check I was using scipy's KD-tree correctly, because it appears slower than a simple brute-force search.
I had three questions regarding this:
Q1.
If I create the following test data:
import numpy as np

nplen = 1000000
# WGS84 lat/long
point = [51.349, -0.19]
# This contains WGS84 lat/long
points = np.ndarray.tolist(np.column_stack(
    [np.round(np.random.randn(nplen) + 51, 5),
     np.round(np.random.randn(nplen), 5)]))
And create three functions:
from math import hypot
from scipy import spatial

def kd_test(points, point):
    """KD-tree"""
    return points[spatial.KDTree(points).query(point)[1]]

def ckd_test(points, point):
    """C implementation of the KD-tree"""
    return points[spatial.cKDTree(points).query(point)[1]]

def closest_math(points, point):
    """Simple brute force on the raw lat/long values"""
    return min((hypot(x2 - point[1], y2 - point[0]), y2, x2) for y2, x2 in points)[1:3]
I would expect the cKDTree to be the fastest. However, running this (with f set to each of the functions in turn):
print("Co-ordinate: ", f(points,point))
print("Index: ", points.index(list(f(points,point))))
%timeit f(points,point)
Result times - the simple brute-force method is the fastest:
closest_math: 1 loops, best of 3: 3.59 s per loop
ckd_test: 1 loops, best of 3: 13.5 s per loop
kd_test: 1 loops, best of 3: 30.9 s per loop
Is this because I am somehow using it wrong?
Q2.
I would assume that, even to get just the ranking (rather than the distance) of the closest points, one still needs to project the data. However, it seems that the projected and un-projected points give me the same nearest neighbour:
from pyproj import Proj, transform

def proj_list(points,
              inproj=Proj(init='epsg:4326'),
              outproj=Proj(init='epsg:27700')):
    """Projected geo coordinates"""
    return [list(transform(inproj, outproj, x, y)) for y, x in points]

proj_points = proj_list(points)
proj_point = proj_list([point])[0]
Is this just because my spread of points is not big enough to introduce distortion? I re-ran it a few times and the projected and un-projected lists still returned the same index.
Q3.
Is it generally faster to project the points (like above) and calculate the hypotenuse distance, compared to calculating the haversine or Vincenty distance on (un-projected) latitudes/longitudes? Also, which option would be more accurate? I ran a small test:
from math import asin, cos, hypot, radians, sin, sqrt

def haversine(origin, destination):
    """Find the distance in metres between a pair of lat/lng coordinates."""
    lat1, lon1, lat2, lon2 = map(radians,
                                 [origin[0], origin[1], destination[0], destination[1]])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    r = 6371000  # metres
    return c * r
def closest_math_unproj(points, point):
    """Haversine on un-projected lat/lng"""
    return min((haversine(point, pt), pt[0], pt[1]) for pt in points)

def closest_math_proj(points, point):
    """Simple Euclidean distance, since the points are projected"""
    return min((hypot(x2 - point[1], y2 - point[0]), y2, x2) for y2, x2 in points)
Results:
So this seems to say that projecting and then computing the distance is faster than working with the un-projected coordinates - however, I am not sure which method gives the more accurate results.
Testing this against an online Vincenty calculation, it seems the projected co-ordinates are the way to go:

Q1.
The reason for the apparent inefficiency of the k-d tree is quite simple: you are measuring both the construction and querying of the k-d tree at once. This is not how you would or should use a k-d tree: you should construct it only once. If you measure only the querying, the time taken reduces to mere tens of milliseconds (vs seconds using the brute-force approach).
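A minimal sketch of the construct-once, query-many pattern (reusing points and point from the question; this is an illustration, not the question's original timing code):
from scipy import spatial

tree = spatial.cKDTree(points)   # build once: this is the expensive part
dist, idx = tree.query(point)    # each individual query is then very cheap
nearest = points[idx]

# Time only the query, e.g. in IPython:
# %timeit tree.query(point)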
Q2.
This will depend on the spatial distribution of the actual data being used and the projection being used. There might be slight differences based on how efficient the implementation of the k-d tree is at balancing the constructed tree. If you are querying only a single point, then the result will be deterministic and unaffected by the distribution of points anyway.
With the sample data that you are using, which has strong central symmetry, and with your map projection (Transverse Mercator), the difference should be negligible.
Q3.
Technically, the answer to your question is trivial: using the Haversine formula for geographic distance measurement is both more accurate and slower. Whether the tradeoff between accuracy and speed is warranted depends heavily on your use case and the spatial distribution of your data (mostly on the spatial extent, obviously).
If the spatial extent of your points is on the small, regional side, then using a suitable projection and the simple Euclidean distance measure might be accurate enough for your use case and faster than using the Haversine formula.
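As an illustration of that approach (a sketch, not part of the original post, using the newer pyproj Transformer API and assuming a regional extent covered by EPSG:27700 as in the question):
from math import hypot
from pyproj import Transformer

to_osgb = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)

def euclidean_m(pt_a, pt_b):
    """pt_a, pt_b are (lat, lon) in WGS84; returns the planar distance in metres."""
    xa, ya = to_osgb.transform(pt_a[1], pt_a[0])
    xb, yb = to_osgb.transform(pt_b[1], pt_b[0])
    return hypot(xb - xa, yb - ya)

# e.g. euclidean_m([51.349, -0.19], [51.36, -0.20])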

Related

When calcuating distance between points on earth why are my Haversine vs. Geodesic calculations diverging?

I am getting wildly diverging distances using two approximations to calculate distances between points on Earth's surface. I am using the Haversine (vectorized) approximation and the (presumably) more precise geopy.distance.geodesic.
As you can see, I am off by five percent as the distance between points becomes large. Is this divergence due to rounding error in the Haversine? Should I indeed trust the geodesic? Here is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

lat = np.linspace(35, 45, 100)
lon = np.linspace(-120, -110, 100)
data = pd.DataFrame({'Latitude': lat, 'Longitude': lon})

def Haversine(v):
    """
    distance between two lat,lon coordinates
    using the Haversine formula. Assumes one
    radius. r = 3,950 to 3,963 mi
    """
    from timeit import default_timer as timer
    start = timer()
    R = 3958  # radius at 40 deg, 750 m elev
    v = np.radians(v)
    dlat = v[:, 0, np.newaxis] - v[:, 0]
    dlon = v[:, 1, np.newaxis] - v[:, 1]
    c = np.cos(v[:, 0, None])
    a = np.sin(dlat / 2.0) ** 2 + c * c.T * np.sin(dlon / 2.0) ** 2
    c = 2 * np.arcsin(np.sqrt(a))
    result = R * c
    print(round(timer() - start, 3))
    return result

def slowdistancematrix(data):
    from geopy.distance import geodesic
    distance = np.zeros((data.shape[0], data.shape[0]))
    for i in range(data.shape[0]):
        lat_lon_i = data.Latitude.iloc[i], data.Longitude.iloc[i]
        for j in range(i):
            lat_lon_j = data.Latitude.iloc[j], data.Longitude.iloc[j]
            distance[i, j] = geodesic(lat_lon_i, lat_lon_j).miles
            distance[j, i] = distance[i, j]  # make use of symmetry
    return distance

distanceG = slowdistancematrix(data)
distanceH = Haversine(data.values)

plt.scatter(distanceH.ravel(), distanceG.ravel() / distanceH.ravel(), s=.5)
plt.ylabel('Geodesic/Haversine')
plt.xlabel('Haversine distance (miles)')
plt.title('all points in distance matrix')
I would rather use the vectorized version because it is fast. However, the 5% difference is too big for me to be comfortable with. Supposedly the Haversine is only supposed to be off by about 0.5%.
UPDATE:
Found the error: when implementing the vectorized version I wasn't calculating all the distances between points, only some of them. I updated the code to reflect this. Here is what the difference between Haversine and geodesic is for my domain (25-55° by -125 to -110):
Pretty darn good!
The Haversine formula calculates distances between points on a sphere (the great-circle distance), as does geopy.distance.great_circle.
On the other hand, geopy.distance.geodesic calculates distances between points on an ellipsoidal model of the earth, which you can think of as a "flattened" sphere.
The difference isn't due to rounding so much as they use different formulas, with the geodesic formula more accurately modeling the true shape of the earth.
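A quick way to see the size of that model difference (a sketch, not from the original answer, using two points from the question's domain):
from geopy.distance import geodesic, great_circle

p1 = (35.0, -120.0)
p2 = (45.0, -110.0)

sphere_mi = great_circle(p1, p2).miles   # spherical (haversine-style) model
ellipsoid_mi = geodesic(p1, p2).miles    # WGS-84 ellipsoidal model
print(sphere_mi, ellipsoid_mi, abs(sphere_mi - ellipsoid_mi) / ellipsoid_mi)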
There was a matrix algebra error in the Haversine formula. I updated the code in the question. I am getting much better agreement between Haversine and geodesic now:
On my actual dataset:

How to calculate 3D distance (including altitude) between two points in GeoDjango

Prologue:
This is a question arising often in SO:
3d distance calculations with GeoDjango
Calculating distance between two points using latitude longitude and altitude (elevation)
Distance between two 3D point in geodjango (postgis)
I wanted to compose an example on SO Documentation, but the geodjango chapter never took off, and since the Documentation was shut down on August 8, 2017, I will follow the suggestion of this widely upvoted and discussed meta answer and write my example as a self-answered post.
Of course, I would be more than happy to see any different approach as well!!
Question:
Assume the model:
class MyModel(models.Model):
    name = models.CharField()
    coordinates = models.PointField()
Where I store the point in the coordinates field as a lat, lng, alt point:
MyModel.objects.create(
    name='point_name',
    coordinates='SRID=3857;POINT Z (100.00 10.00 150)')
I am trying to calculate the 3D distance between two such points:
p1 = MyModel.objects.get(name='point_1').coordinates
p2 = MyModel.objects.get(name='point_2').coordinates
d = Distance(m=p1.distance(p2))
Now d=X in meters.
If I change only the altitude of one of the points in question:
For example:
p1.coordinates = 'SRID=3857;POINT Z (100.00 10.00 200)'
from 150 previously, the calculation:
d = Distance(m=p1.distance(p2))
returns d=X again, like the elevation is ignored.
How can I calculate the 3D distance between my points?
Reading from the documentation on the GEOSGeometry.distance method:
Returns the distance between the closest points on this geometry and the given geom (another GEOSGeometry object).
Note
GEOS distance calculations are linear – in other words, GEOS does not perform a spherical calculation even if the SRID specifies a geographic coordinate system.
Therefore we need to implement a method to calculate a more accurate 2D distance between 2 points and then we can try to apply the altitude (Z) difference between those points.
1. Great-Circle 2D distance calculation (Take a look at the 2022 UPDATE below the explanation for a better approach using geopy):
The most common way to calculate the distance between 2 points on the surface of a sphere (as the Earth is simplistically but usually modeled) is the Haversine formula:
The haversine formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes.
Although from the great-circle distance wiki page we read:
Although this formula is accurate for most distances on a sphere, it too suffers from rounding errors for the special (and somewhat unusual) case of antipodal points (on opposite ends of the sphere). A formula that is accurate for all distances is the following special case of the Vincenty formula for an ellipsoid with equal major and minor axes.
We can create our own implementation of the Haversine or the Vincenty formula (as shown here for Haversine: Haversine Formula in Python (Bearing and Distance between two GPS points)) or we can use one of the already implemented methods contained in geopy:
geopy.distance.great_circle (Haversine):
from geopy.distance import great_circle
newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)
# This call will result in 536.997990696 miles
great_circle(newport_ri, cleveland_oh).miles
geopy.distance.vincenty (Vincenty):
from geopy.distance import vincenty
newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)
# This call will result in 536.997990696 miles
vincenty(newport_ri, cleveland_oh).miles
!!!2022 UPDATE: On 2D distance calculation using geopy:
GeoPy discourages the use of Vincenty as of version 1.14.0. Changelog states:
CHANGED: Vincenty usage now issues a warning. Geodesic should be used instead. Vincenty is planned to be removed in geopy 2.0. (#293)
So (especially if we are going to apply the calculation on a WGS84 ellipsoid) we should use geodesic distance instead:
from geopy.distance import geodesic
newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)
# This call will result in 538.390445368 miles
geodesic(newport_ri, cleveland_oh).miles
2. Adding altitude to the mix:
As mentioned, each of the above calculations yields a great circle distance between 2 points. That distance is also called "as the crow flies", assuming that the "crow" flies without changing altitude and as straight as possible from point A to point B.
We can have a better estimation of the "walking/driving" ("as the crow walks"??) distance by combining the result of one of the previous methods with the difference (delta) in altitude between point A and point B, inside the Euclidean Formula for distance calculation:
acw_dist = sqrt(great_circle(p1, p2).m**2 + (p1.z - p2.z)**2)
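A minimal runnable sketch of that combination (assuming the two points are plain (lat, lon, altitude-in-metres) tuples rather than GEOS geometries; the sample values are hypothetical):
from math import sqrt
from geopy.distance import great_circle

p1 = (41.49008, -71.312796, 150.0)
p2 = (41.499498, -81.695391, 200.0)

surface_m = great_circle(p1[:2], p2[:2]).m          # 2D great-circle distance in metres
acw_dist = sqrt(surface_m ** 2 + (p1[2] - p2[2]) ** 2)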
The previous solution is prone to errors, especially the longer the real distance between the points is. I leave it here for the sake of comment continuity.
GeoDjango Distance calculates the 2D distance between two points and doesn't take into consideration the altitude differences.
In order to get the 3D calculation, we need to create a distance function that will consider altitude differences in the calculation:
Theory:
The latitude, longitude and altitude are polar (spherical) coordinates, and we need to translate them to Cartesian coordinates (x, y, z) in order to apply the Euclidean formula to them and calculate their 3D distance.
Assume:
polar_point_1 = (long_1, lat_1, alt_1)
and polar_point_2 = (long_2, lat_2, alt_2)
Translate each point to its Cartesian equivalent by utilizing this formula:
x = alt * cos(lat) * sin(long)
y = alt * sin(lat)
z = alt * cos(lat) * cos(long)
and you will have p_1 = (x_1, y_1, z_1) and p_2 = (x_2, y_2, z_2) points respectively.
Finally use the Euclidean formula:
dist = sqrt((x_2-x_1)**2 + (y_2-y_1)**2 + (z_2-z_1)**2)
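A minimal sketch of that translation, following the formulas above exactly as written (note that alt is used directly as the radial term, as in the formulas above):
from math import radians, sin, cos, sqrt

def polar_to_cartesian(long_deg, lat_deg, alt):
    lat, long = radians(lat_deg), radians(long_deg)
    x = alt * cos(lat) * sin(long)
    y = alt * sin(lat)
    z = alt * cos(lat) * cos(long)
    return x, y, z

def distance_3d(polar_point_1, polar_point_2):
    x_1, y_1, z_1 = polar_to_cartesian(*polar_point_1)
    x_2, y_2, z_2 = polar_to_cartesian(*polar_point_2)
    return sqrt((x_2 - x_1) ** 2 + (y_2 - y_1) ** 2 + (z_2 - z_1) ** 2)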
Using geopy, this is the easiest and cleanest solution.
https://geopy.readthedocs.io/en/stable/#geopy.distance.lonlat
>>> from geopy.distance import distance, lonlat
>>> a = lonlat(-71.312796, 41.49008, 0)
>>> b = lonlat(-81.695391, 41.499498, 0)
>>> print(distance(a, b).miles)
538.3904453677203
Once converted into Cartesian coordinates, you can compute the norm with numpy:
np.linalg.norm(point_1 - point_2)

Calculating distance travelled from gps track points using python [closed]

I have this very silly question to ask. I have GPS track points for a journey like this:
863.3,2013-10-05T01:21:07Z,0,13.348841,77.686539
863.3,2013-10-05T01:21:08Z,1,13.348841,77.686539
863.3,2013-10-05T01:21:23Z,2,13.348708,77.686248
861.1,2013-10-05T01:21:28Z,3,13.348647,77.686088
867.0,2013-10-05T01:29:03Z,4,13.34732,77.682364
All I want is to find the distance traveled: should I only consider the first track point and the last track point? Or do I need to find the distance traveled between every track point?
Once you parse your GPS points, you need to extract the lat/lon pair for each. You could use the following function, adapted from here, to get the distance between each pair of consecutive points and sum them for your total distance.
import math

def getDistance(lat1, lon1, lat2, lon2):
    # This uses the haversine formula, which remains a good numerical computation,
    # even at small distances, unlike the Spherical Law of Cosines.
    # This method has ~0.3% error built in.
    R = 6371  # Radius of Earth in km
    dLat = math.radians(float(lat2) - float(lat1))
    dLon = math.radians(float(lon2) - float(lon1))
    lat1 = math.radians(float(lat1))
    lat2 = math.radians(float(lat2))
    a = math.sin(dLat / 2) * math.sin(dLat / 2) + \
        math.cos(lat1) * math.cos(lat2) * math.sin(dLon / 2) * math.sin(dLon / 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    d = R * c * 0.621371  # Converting km to miles with "* 0.621371"
    return d
Note that this function returns distances in miles, but you can keep things metric (km) by removing the "* 0.621371" at the end.
Of course, these are great-circle distances. You're probably travelling along some sort of road network, so this will certainly not be perfectly accurate in the real world.
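A hypothetical usage sketch (not from the original answer): parse track lines in the question's format (elevation, timestamp, index, lat, lon) and sum the distances between consecutive points with getDistance() above:
track_lines = [
    "863.3,2013-10-05T01:21:07Z,0,13.348841,77.686539",
    "863.3,2013-10-05T01:21:08Z,1,13.348841,77.686539",
    "863.3,2013-10-05T01:21:23Z,2,13.348708,77.686248",
]

points = []
for line in track_lines:
    parts = line.split(",")
    points.append((float(parts[3]), float(parts[4])))   # (lat, lon)

total = sum(getDistance(lat1, lon1, lat2, lon2)
            for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]))
print(total)   # total distance in miles (per the function above)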
In order to get an estimate of the distance travelled between the GPS track points you have, you definitely need to consider the distances between all consecutive points. More precisely, if you have N positions, you need to iterate over all positions you have and sum up the distance between each point P_i and P_i+1 (ordered by the time it has been recorded).
If you were only to calculate the distance between the first and the last point, the result would not have any meaning at all. Imagine a set of N points that have been recorded while moving along a track that forms a large circle. The first and the last point would be almost the same, hence resulting in a very small distance, even though the total distance you travelled while moving around the circle is significantly larger.
However, be aware that summing up the distances between consecutive points will still only be an estimate of the total distance travelled. Depending on the resolution of your track (i.e., the frequency with which the positions on your track have been recorded), the accuracy compared to the real distance may vary significantly.

Fast python GIS library that supports Great Circle Distance and polygon

I was looking for a geographical library for python.
I need to be able to do the following:
Get the distance between 2 points (in meters) using the great-circle distance (not a linear distance calculation)
Check if a point is inside a polygon
Perform 1 and 2 a couple of thousand times per second
At start I've looked at this post: Python module for storing and querying geographical coordinates and started to use geopy.
I've encountered 2 problems:
Geopy doesn't support polygons
High CPU usage of geopy (it takes about 140 ms of CPU to calculate the distances between a point and 5000 other points)
I've continued looking and found Best Python GIS library? and https://gis.stackexchange.com/ . It looked promising, as GEOS uses compiled C code which should be faster, and Shapely supports polygons.
The problem is that GEOS/OGR performs linear distance calculations instead of spherical ones. This eliminates all other GEOS-based modules (like GeoDjango and Shapely).
Am I missing something here? I don't think that I'm the first person who is using python to perform GIS calculations and wants to get accurate results.
UPDATE
Moving on now to finishing out the other 576 functions in that library, not including the two polygon functions that are finished, the three sphere-distance algorithms that are done, and two new ones, angle_box_2d and angle_contains_ray_2d. Also, I switched to the C version so that externs are not needed, which simplifies the work. I put the old C++ version in the directory old_c++, so it's still there.
I tested performance; it is identical to what is listed at the bottom of the answer.
UPDATE 2
Just a quick update: I haven't finished the whole library yet (I'm only about 15% of the way through), but I've added these untested functions on GitHub, in case you need them right away, to go with the old point-in-polygon and sphere-distance algorithms.
angle_box_2d
angle_contains_ray_2d
angle_deg_2d
angle_half_2d # MLM: double *
angle_rad_2d
angle_rad_3d
angle_rad_nd
angle_turn_2d
anglei_deg_2d
anglei_rad_2d
annulus_area_2d
annulus_sector_area_2d
annulus_sector_centroid_2d # MLM: double *
ball_unit_sample_2d # MLM: double *
ball_unit_sample_3d # MLM: double *
ball_unit_sample_nd # MLM; double *
basis_map_3d #double *
box_01_contains_point_2d
box_01_contains_point_nd
box_contains_point_2d
box_contains_point_nd
box_ray_int_2d
box_segment_clip_2d
circle_arc_point_near_2d
circle_area_2d
circle_dia2imp_2d
circle_exp_contains_point_2d
circle_exp2imp_2d
circle_imp_contains_point_2d
circle_imp_line_par_int_2d
circle_imp_point_dist_2d
circle_imp_point_dist_signed_2d
circle_imp_point_near_2d
circle_imp_points_2d # MlM: double *
circle_imp_points_3d # MLM: double *
circle_imp_points_arc_2d
circle_imp_print_2d
circle_imp_print_3d
circle_imp2exp_2d
circle_llr2imp_2d # MLM: double *
circle_lune_area_2d
circle_lune_centroid_2d # MLM; double *
circle_pppr2imp_3d
The ones that I've commented above probably won't work; the others might, but again - the polygon and sphere distance ones definitely do. And you can specify meters, kilometers, miles, nautical miles - it doesn't really matter for the spherical distance ones; the output is in the same units as the input, as the algorithms are agnostic to the units.
I put this together this morning so it currently only provides the point in polygon, point in convex polygon, and three different types of spherical distance algorithms, but at least those ones that you requested are there for you to use now. I don't know if there is a name conflict with any other python library out there, I only get peripherally involved with python these days, so if there's a better name for it I'm open to suggestions.
On github: https://github.com/hoonto/pygeometry
It is just a python bridge to the functions described and implemented here:
http://people.sc.fsu.edu/~jburkardt/cpp_src/geometry/geometry.html
The GEOMETRY library is pretty good actually, so I think it'll be useful to bridge all of those functions for python, which I'll do probably tonight.
Edit: a couple other things
Because the math functions are actually compiled C++, you do of course need to make sure that the shared library is in the path. You can modify the geometry.py to point at wherever you want to put that shared library though.
It is only compiled for Linux; the .o and .so were built on x86_64 Fedora.
The spherical distance algorithms expect radians so you need to convert decimal lat/lon degrees for example to radians, as shown in geometry.py.
If you do need this on Windows let me know, it should only take a couple minutes to get it worked out in Visual Studio. But unless someone asks I'll probably just leave it alone for now.
Hope this helps!
Rgds....Hoonto/Matt
(new commit: SHA: 4fa2dbbe849c09252c7bd931edfe8db478de28e6 - fixed some things, like radian conversions and also the return types for the py functions. Also added some basic performance tests to make sure the library performs appropriately.)
Test Results
In each iteration there is one call to sphere_distance1 and one call to polygon_contains_point_2d, so 2 calls to the library in total.
~0.062s : 2000 iterations, 4000 calls
~0.603s : 20000 iterations, 40000 calls
~0.905s : 30000 iterations, 60000 calls
~1.198s : 40000 iterations, 80000 calls
If a spherical calculation is enough, I'd just use numpy for the distance and matplotlib for the polygon check (you'll find similar proposals on Stack Overflow).
from math import asin, cos, radians, sin, sqrt
import numpy as np

def great_circle_distance_py(pnt1, pnt2, radius):
    """ Returns distance on sphere between points given as (latitude, longitude) in degrees. """
    lat1 = radians(pnt1[0])
    lat2 = radians(pnt2[0])
    dLat = lat2 - lat1
    dLon = radians(pnt2[1]) - radians(pnt1[1])
    a = sin(dLat / 2.0) ** 2 + cos(lat1) * cos(lat2) * sin(dLon / 2.0) ** 2
    return 2 * asin(min(1, sqrt(a))) * radius

def great_circle_distance_numpy(pnt1, l_pnt2, radius):
    """ Similar to great_circle_distance_py(), but working on a list of pnt2 and returning the minimum. """
    dLat = np.radians(l_pnt2[:, 0]) - radians(pnt1[0])  # slice latitude from list of (lat, lon) points
    dLon = np.radians(l_pnt2[:, 1]) - radians(pnt1[1])
    a = np.square(np.sin(dLat / 2.0)) + np.cos(radians(pnt1[0])) * np.cos(np.radians(l_pnt2[:, 0])) * np.square(np.sin(dLon / 2.0))
    # clamp at 1 to guard against floating-point overshoot, as in the scalar version
    return np.min(2 * np.arcsin(np.minimum(np.sqrt(a), 1.0))) * radius

def aux_generateLatLon():
    import random
    while 1:
        yield (90.0 - 180.0 * random.random(), 180.0 - 360.0 * random.random())

if __name__ == "__main__":
    ## 1. Great-circle distance
    earth_radius_m = 6371000.785  # sphere of same volume
    nPoints = 1000
    nRep = 100  # just to measure time

    # generate a point and a list to check against
    pnt1 = next(aux_generateLatLon())
    l_pnt2 = np.array([next(aux_generateLatLon()) for i in range(nPoints)])

    dMin1 = min([great_circle_distance_py(pnt1, pnt2, earth_radius_m) for pnt2 in l_pnt2])
    dMin2 = great_circle_distance_numpy(pnt1, l_pnt2, earth_radius_m)

    # check performance
    import timeit
    print("random points: %7i" % nPoints)
    print("repetitions  : %7i" % nRep)
    print("function 1   : %14.6f s" % timeit.timeit('min([great_circle_distance_py(pnt1, pnt2, earth_radius_m) for pnt2 in l_pnt2])', 'from __main__ import great_circle_distance_py, pnt1, l_pnt2, earth_radius_m', number=nRep))
    print("function 2   : %14.6f s" % timeit.timeit('great_circle_distance_numpy(pnt1, l_pnt2, earth_radius_m)', 'from __main__ import great_circle_distance_numpy, pnt1, l_pnt2, earth_radius_m', number=nRep))

    # tell distance
    assert abs(dMin1 - dMin2) < 0.0001
    print()
    print("min. distance: %14.6f m" % dMin1)

    ## 2. Inside polygon?
    # Note, not handled:
    # - the "pathological case" mentioned on http://paulbourke.net/geometry/polygonmesh/
    # - special situations on a sphere: polygons covering "180 degrees longitude edge" or the Poles
    from matplotlib.path import Path
    x = y = 1.0
    l_pnt2 = [(-x, -y), (x, -y), (x, y), (-x, y), (-x, -y)]
    path = Path(l_pnt2)
    print("isInside ?")
    for pnt in [(0.9, -1.9), (0.9, -0.9)]:
        print("  ", pnt, bool(path.contains_point(pnt)))
If you want to do more, the Quantum GIS (QGIS) toolset is probably worth a look: PyQGIS Developer Cookbook (docs.qgis.org).

How to compute distance of two geographical coordinates?

I read this question and implemented the accepted answer in Python (see below). It works in principle, but the results are consistently about 30% higher than expected (Czech Republic) - is that the expected accuracy of this algorithm?
To verify the algorithm, I used BoundingBox to get a bounding box with a known diagonal distance (building, two cities) and used the output coordinates as input for "my" algorithm.
Where is the problem?
my implementation?
the algorithm itself?
Python?
testing?
My implementation:
import math

def calc(lat1, lon1, lat2, lon2):
    R = 6371  # km
    dLat = math.radians(lat2 - lat1)
    dLon = math.radians(lon2 - lon1)
    lat1 = math.radians(lat1)
    lat2 = math.radians(lat2)
    a = math.sin(dLat / 2) * math.sin(dLat / 2) + \
        math.sin(dLon / 2) * math.sin(dLon / 2) * math.cos(lat1) * math.cos(lat2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    d = R * c
    return d
No, the algorithm is not supposed to have an error of that magnitude. The link specifies that you can expect around a 0.3% error.
I cannot reproduce your results with your code, so I believe the error is in your testing.
Here's some testing data from a site that gives the distance between Prague and Brno, with coordinates in decimal-degrees format:
lat_prague, long_prague = 50.0833, 14.4667
lat_brno, long_brno = 49.2000, 16.6333
expected_km = 184.21
Here are the testing results:
>>> def calc(lat1, lon1, lat2, lon2):
...     # ... your code ...
>>> calc(lat_prague,long_prague,lat_brno,long_brno)
184.34019283649852
>>> calc(lat_prague,long_prague,lat_brno,long_brno) / expected_km
1.0007067631317437
A wild guess: for locations in the Czech Republic, the error you're getting seems to be of the right order of magnitude for mixing up latitude and longitude:
>>> calc(long_prague,lat_prague,long_brno,lat_brno)
258.8286271447481
>>> calc(long_prague,lat_prague,long_brno,lat_brno) / expected_km
1.405073704710646
This is apparently a known confusion. A coordinate specified as only a pair of numbers is ambiguous (for instance: both BoundingBox and the reference for the distance above use (long, lat), and the algorithm uses the ordering lat, long). When you come across the ambiguous format with an unfamiliar data source without a formal specification, you'll just have to sanity-check. Sites like Wikipedia will tell you unambiguously that Prague lies at "50°05′N 14°25′E" -- that is, very roughly, around 50 degrees latitude (north-south) and 14 degrees longitude (east-west).
