I'm using ArcGIS Pro and GeoPandas for spatial analysis operations. I noticed that the distance operations in ArcGIS and GeoPandas don't agree. I wonder which algorithm GeoPandas uses for its distance calculations (the distance function).
In my example I selected polygons within a distance of 10 km from another polygon. One polygon is selected in ArcGIS but not in GeoPandas as the distance there is > 10 km. The data is projected to the same crs in both cases.
It's not surprising that different distance algorithms are used, I just can't find any information on which algorithm GeoPandas uses. I already checked the documentation and the code in Git.
ArcGIS uses vertex distances for polygons (ArcGIS documentation here).
Does anyone have background information on the GeoPandas distance algorithm?
Help is greatly appreciated!
geopandas relies on a lot of dependencies. For distance computations between geometries (point to point, for example), it uses shapely's Euclidean distance, as can be traced in this file: https://github.com/Toblerity/Shapely/blob/master/shapely/geometry/base.py
If you have a geodataframe df5 like this
   a  b                 geometry
0  0  1  POINT (0.00000 0.00000)
1  0  2  POINT (1.00000 0.00000)
2  1  3  POINT (0.00000 2.00000)
3  5  4  POINT (1.00000 1.00000)
You can check the computation with
df5.geometry.values[0].distance( df5.geometry.values[3] )
The result will be 1.4142135623730951, which is the Euclidean distance.
The shapely documentation is brief on this: it only says that distance returns the minimum distance to the other geometric object. The result is presumably the distance between the pair of points returned by
https://shapely.readthedocs.io/en/latest/manual.html#shapely.ops.nearest_points
So I think it represents the minimum distance between two polygons.
However, this is just my opinion, and I think you will have to test it yourself in the end. As far as I can tell, shapely hands the computation to GEOS, and the closest points it finds are not restricted to the vertices of the polygon: they can lie anywhere along an edge. (You can check this with a point and a line: the minimum distance is the perpendicular distance to the segment whenever the foot of the perpendicular falls on it, and the distance to the nearest vertex otherwise.)
So if ArcGIS measures only between existing vertices, as its documentation suggests, the two values can differ. Both tools also need to measure in the same way (planar versus geodesic) for the numbers to be comparable.
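One way to settle which definition a library uses is to compare its result with both candidates on a case where they differ. The sketch below is plain Python, not shapely's or ArcGIS's actual code; the function names vertex_distance and true_min_distance are made up for illustration.

```python
import math

def vertex_distance(point, line):
    """Smallest distance from `point` to any vertex of `line`."""
    return min(math.dist(point, v) for v in line)

def true_min_distance(point, line):
    """Smallest distance from `point` to any location on the polyline `line`."""
    px, py = point
    best = math.inf
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        dx, dy = bx - ax, by - ay
        seg_len_sq = dx * dx + dy * dy
        # Parameter of the perpendicular foot along the segment (0 = a, 1 = b),
        # clamped so the closest location stays on the segment.
        t = 0.0 if seg_len_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
        best = min(best, math.hypot(px - (ax + t * dx), py - (ay + t * dy)))
    return best

# A point above the middle of a horizontal segment: the two definitions disagree.
point = (0.5, 1.0)
line = [(0.0, 0.0), (1.0, 0.0)]
d_vertex = vertex_distance(point, line)
d_true = true_min_distance(point, line)
print(d_vertex, d_true)
```

Running Point(0.5, 1).distance(LineString([(0, 0), (1, 0)])) in shapely and comparing it with the two printed values shows which definition it follows.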
Related
Given a (lat,lng) point and an all type OSMNX network, how can I find which nodes in the graph are within 1km walking distance from the point?
I was thinking about calculating the great circle distance between each node and the point and checking whether it is at most 1km, but I do not believe this will be very accurate since the topology of the network will be ignored.
This OSMnx usage example demonstrates how.
I have never used OSMnx before, but the documentation seems to be very good. And no, you are right that calculating the Haversine (great circle) distance or Euclidean distance will not give you the actual walking distance. The whole point of OSMnx is that it takes real-life street networks into account.
One of the functions that seem to be based on the actual network is
osmnx.distance.shortest_path(G, orig, dest, weight='length').
You might use this function to find the shortest route between each node and your point, sum the edge lengths along it, and then select the nodes whose shortest distance is below 1 km.
I do not know, however, how walking paths, cycling paths and streets for cars can be differentiated in OSMnx. You might need to consult the documentation for more details or open an issue in the OSMnx GitHub repo.
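Computing a separate shortest path to every node is wasteful; a single Dijkstra search that stops expanding once the distance budget is used up gives the same answer. Below is a minimal sketch of that idea on a hypothetical toy network stored as a plain dict (nodes_within and toy are illustrative names, not OSMnx API). On a real OSMnx graph, which is a networkx graph with a length attribute in metres on each edge, networkx's ego_graph(G, node, radius=1000, distance="length") should do the equivalent, starting from the node nearest to your point.

```python
import heapq

def nodes_within(graph, source, max_dist):
    """Dijkstra with a cutoff: network distance to every node within max_dist."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for v, length in graph.get(u, {}).items():
            nd = d + length
            # Only keep routes that stay inside the distance budget.
            if nd <= max_dist and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical toy network; edge lengths in metres.
toy = {
    "A": {"B": 400, "C": 900},
    "B": {"A": 400, "C": 300, "D": 700},
    "C": {"A": 900, "B": 300},
    "D": {"B": 700},
}
reachable = nodes_within(toy, "A", 1000)
print(reachable)
```

Node D is left out even though it may be close as the crow flies, because the only route to it is 1100 m long.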
I am looking for a way to find the percentage of a polygon's area that is covered by another polygon, from geo coordinates, using Python.
The polygon can either lie fully inside the other one or only partly overlap it.
Is there a solution to this?
Please advise.
Percentage is just area of intersection over area of the (other) polygon:
area(intersection)/area(polygon2).
Basically any of geometry packages should be able to compute this, as they all support area and intersection functions: I think Geopandas, SymPy, Shapely (and others I missed) should be able to do this. There might be differences in supported formats.
You did not specify which geo coordinates you use, though. As far as I know GeoPandas, SymPy and Shapely all work on a flat 2D plane, meaning you need to project latitude/longitude data with an appropriate (ideally equal-area) projection to get an exact result.
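To make the ratio concrete, here is a dependency-free sketch. It assumes planar (already projected) coordinates and that the clipping polygon is convex with counter-clockwise vertices, because it uses Sutherland-Hodgman clipping; the function names are illustrative. With shapely, the same ratio would be poly1.intersection(poly2).area / poly2.area, with no convexity restriction.

```python
def shoelace_area(poly):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def clip_convex(subject, clip):
    """Sutherland-Hodgman: part of `subject` inside the convex, CCW polygon `clip`."""
    def inside(p, a, b):
        # True if p is on the left of (or on) the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of line p-q with line a-b.
        den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        if not current:
            break  # nothing left to clip
        prev = current[-1]
        for cur in current:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output

poly1 = [(0, 0), (2, 0), (2, 2), (0, 2)]
poly2 = [(1, 1), (3, 1), (3, 3), (1, 3)]  # convex, counter-clockwise
overlap = clip_convex(poly1, poly2)
percentage = 100.0 * shoelace_area(overlap) / shoelace_area(poly2)
print(percentage)
```

Here the two squares share a 1 x 1 corner, so a quarter of poly2 is covered.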
I have a problem with the pdist function in Python. I have coordinates of points and want to find the distance between them, but pdist treats the coordinates as plain decimal numbers and not as geographic coordinates. I could not find anything so far on how to fix this. In other words, should I apply a transformation to my coordinates first? Any help is appreciated. Here is a sample code:
from scipy.spatial import distance as spdist
p1 = [39.1653, -86.5264]
p2 = [39.704166670000049, -86.399444439999826]
# pdist expects one row per point; a bare zip object is not accepted in Python 3
spdist.pdist([p1, p2], 'euclidean')
The result it gives me is 0.55361991 miles but when I put the coordinates in google map, it give me 42 miles.
Thanks
You can calculate distance from decimal coordinates if you know the formula involved: there is one for rectangular (planar) coordinate systems and another for spherical ones. pdist with 'euclidean' applies the planar formula, so the 0.5536 it returns is in degrees, not miles.
If the function you call takes point parameters, why not wrap your decimal values as points before calling it?
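For latitude/longitude input the spherical formula is usually the haversine formula. The sketch below applies it to the two points from the question; it assumes a spherical Earth with a mean radius of 3958.8 miles, and haversine_miles is an illustrative name, not a SciPy function.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; the sphere itself is an approximation

def haversine_miles(p, q):
    """Great-circle distance between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

p1 = [39.1653, -86.5264]
p2 = [39.704166670000049, -86.399444439999826]
d = haversine_miles(p1, p2)
print(d)
```

The straight-line result comes out a few miles below the 42 miles quoted from the map, which would be consistent with the map reporting a road distance, though that is only a guess. Since pdist also accepts a callable as its metric, passing haversine_miles in place of 'euclidean' should work for a whole array of points.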
from django.contrib.gis.geos import Point
p1 = Point(36.74851779201058, -6.429006806692149, srid=4326)
p2 = Point(37.03254161520977, -8.98366068931684, srid=4326)
p1.distance(p2)
Out: 2.5703941316759376
But what is the unit of this float number?
If you calculate this distance, this is 229.88 Km. You can get it too using geopy:
from geopy.distance import distance
distance(p1, p2)
Out: Distance(229.883275249)
distance(p1, p2).km
Out: 229.88327524944066
I have read that you can get a rough value if you divide the previous number by 111:
(2.5703941316759376 / 111) * 10000
Out: 231.5670388897241 # kilometers
Is there any way to get the real distance using only GeoDjango? Or should I use geopy?
Usually, all spatial calculations yield results in the same coordinate system as the input was given. In your case the input uses SRID 4326, which is longitude/latitude in degrees measured from the prime meridian and the equator, so the result is in degrees too.
Consequently, GeoDjango's distance calculation, if I get it correctly, is the Euclidean distance between the two pairs of coordinates. You are looking for the great-circle distance (a conversion factor of 111 km per degree is just a rough approximation that is only close to the actual great-circle distance in certain ranges of latitude).
geopy computes the distance on the Earth's surface for latitude/longitude input implicitly, yielding the correct result.
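Why a fixed factor per degree only works at some latitudes: a degree of longitude shrinks with the cosine of the latitude. The sketch below uses the equirectangular approximation, which scales the longitude difference accordingly, and compares it with a naive multiplication by 111. It assumes the coordinates in the question are (latitude, longitude) and a spherical Earth of radius 6371.0088 km; equirectangular_km is an illustrative name.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical assumption)

def equirectangular_km(lat1, lon1, lat2, lon2):
    """Approximate distance for nearby points, inputs in degrees."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    # East-west separation shrinks with cos(latitude); north-south does not.
    x = math.radians(lon2 - lon1) * math.cos(mean_lat)
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)

d_km = equirectangular_km(36.74851779201058, -6.429006806692149,
                          37.03254161520977, -8.98366068931684)
naive_km = 2.5703941316759376 * 111  # degrees times a fixed km-per-degree factor
print(d_km, naive_km)
```

The corrected value lands close to geopy's 229.88 km, while the naive product overshoots by more than 50 km because most of the separation here is east-west.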
You now have a few different options:
A: Implement the great-circle distance on your own
Search for the haversine formula: you plug in two pairs of lat/lon coordinates and get a good approximation of the actual great-circle distance. However, it assumes a spherical Earth; remember that Earth is not a perfect sphere, so expect errors of up to roughly 0.5%. The formula can also lose precision for nearly antipodal points.
B: Transform to a metric (local) coordinate system
If you transform your two locations to a projected coordinate system that is measured in meters, calculating the Euclidean distance will yield a good result. However, such coordinate systems (call them planar systems) differ between regions of the globe. There are different projections for different countries, because approximating the Earth's irregularly curved surface as a plane is erroneous, and the error is not the same at every location on its surface.
This is only applicable if all points among which you wish to calculate distances are in the same geographical region.
C: Use a library for this
Use geopy, pyproj, or any other qualified library that can calculate the great-circle or geodesic distance from longitude/latitude coordinates (shapely, like GeoDjango's distance, works in the plane). Remember that all coordinates are just approximations due to Earth's irregularity.
As far as I know, GeoDjango's distance method doesn't calculate the real distance on the Earth's surface; it just calculates the distance geometrically. Therefore, I think you should use geopy, as I did in my project. (I originally used vincenty, which was removed in geopy 2.0; geodesic is its replacement.)
from geopy.distance import geodesic
distance = geodesic((lat1, lon1), (lat2, lon2)).kilometers
This will give the right distance in kilometers.
For further information, check the geopy documentation.
http://geopy.readthedocs.io/en/latest/
There's a solution to this online, which explains both what GeoDjango is doing originally (a distance calculation that doesn't use any standard units, essentially), but also, how to get it into a form that returns the distance in more useful units -- the code is very similar to what you're already doing, except that it makes use of a transform on each point before retrieving the distance. The link is below, hopefully it's useful to you:
https://coderwall.com/p/k1gg1a/distance-calculation-in-geodjango
I am using Ipython Notebook. I am working on a project where I need to look at about 100 data points in 3D space and figure out the distance between each and the angle from one another. I want to see correlations of the data points and ultimately see if there is any structure to the data (a straight line hidden somewhere). I have looked into clustering techniques and hough transforms, but they seem not to give me the result I need. Any ideas are much appreciated.. thanks!
For the first issue of determining the pairwise distance between three dimensional points, you can use scipy.spatial.distance.pdist(). This will generate n(n-1)/2 distances for n points. For the second issue finding the angle between points, that's trickier. It seems so tricky that I don't even really want to think about it; however, to that end, you can use scipy.spatial.distance.cosine(), which will determine the cosine distance between two vectors.
Have you looked at scikits? I've found them very helpful in my work. http://scikit-learn.org/stable/
The distance is best found using scipy.spatial.distance.pdist() as mentioned in cjohnson318's answer. For a small array of points 'a' defined as:
import numpy as np
a=np.array([[0,0,0],[1,1,1],[4,2,-2],[3,-1,2]])
The Euclidean distance matrix 'D' between the points can be found as:
from scipy.spatial.distance import pdist, squareform
D = squareform(pdist(a))
In 3d polar notation, you would need 2 angles to define the direction from one point to another. It seems like a Cartesian unit vector giving the direction would likely serve your purpose just as well. These can be found as:
(a-a[:,np.newaxis,:]) / D[...,np.newaxis]
This will include NaN's in the diagonal elements, as there is no vector from a point to itself. If necessary, these can be changed to zeros using np.nan_to_num
If you actually do need the angles, you can get them from the components (x, y, z) of the unit vector: the azimuth is np.arctan2(y, x) and the polar angle is np.arccos(z).
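For a single pair of points, the two angles can be sketched without NumPy as follows. The convention assumed here is the physics one (azimuth measured in the x-y plane from the x axis, polar angle measured from the z axis); direction_angles is an illustrative name.

```python
import math

def direction_angles(p, q):
    """Distance, azimuth and polar angle (radians) of the vector from p to q."""
    dx, dy, dz = (qi - pi for pi, qi in zip(p, q))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0:
        raise ValueError("direction is undefined for identical points")
    azimuth = math.atan2(dy, dx)  # angle in the x-y plane
    # Clamp guards against rounding pushing the ratio just outside [-1, 1].
    polar = math.acos(max(-1.0, min(1.0, dz / r)))
    return r, azimuth, polar

r, azimuth, polar = direction_angles((0, 0, 0), (1, 1, 1))
print(r, math.degrees(azimuth), math.degrees(polar))
```

Points that lie on a common straight line through a reference point share the same pair of angles (or the exactly opposite direction), which is one simple way to look for the hidden line mentioned in the question.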