Need performance on PostGIS with GeoDjango - Python

This is the first time I'm using GeoDjango with PostGIS. After installation and some tests, with everything running fine, I am concerned about query performance as table rows grow.
I'm saving, in a geometry point, the longitudes and latitudes that I get from Google geocoding (WGS84, i.e. SRID 4326). Distance operations are very common in my application: I often need to find spots near a landmark. Geometry math is complex, so even with a spatial index it will probably take too long once there are more than 1000 spots in a nearby area.
So is there any way to project this geometry type to make distance operations faster? And does anyone know of a Django library that can render a Google map containing some of these points?
Any advice on how to speed up spatial queries in GeoDjango?

If you can fit your working area into a map projection, that will always be faster, as fewer math calls are necessary for things like distance calculations. However, if you have truly global data, suck it up and use geography. If you only have continental-USA data, use something like EPSG:2163 (http://spatialreference.org/ref/epsg/2163/).
The more constrained your working area, the more accurate the results you can get in a map projection. See the state plane projections for highly constrained, accurate projections for regional areas in the USA, or UTM projections for larger sub-national regions.
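As a rough sketch of what that looks like in a GeoDjango query (the Spot model and its field names are assumptions, not from the question): store the points in the projected SRID, transform the incoming lookup point once, and filter with cheap planar distances:
from django.contrib.gis.geos import Point
from django.contrib.gis.measure import D

# hypothetical Spot model whose PointField is stored in EPSG:2163
origin = Point(-77.03, 38.90, srid=4326)  # lon/lat from geocoding
origin.transform(2163)                    # reproject once into the storage SRID

# planar distance filter; cheap math because both sides are projected
nearby = Spot.objects.filter(point__distance_lte=(origin, D(km=5)))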

I've been researching this topic. As far as I have found, the coordinates that you get from the geopy library are in SRID 4326, so you can store them in a geometry field without problems. This would be an example of a GeoDjango model using a geography point:
from django.contrib.gis.db import models

class Landmark(models.Model):
    point = models.PointField(spatial_index=True,
                              srid=4326,
                              geography=True)
    objects = models.GeoManager()
By the way, be very careful to pass longitude / latitude to the PointField, in that exact order. geopy returns latitude / longitude coordinates, so you will need to reverse them.
For transforming points from one coordinate system to another we can use GEOS with GeoDjango. In this example I will transform a point in the famous Google projection 900913 to 4326:
from django.contrib.gis.geos import Point

punto = Point(40, -3)
punto.set_srid(900913)   # mark the point as Web Mercator (units are meters)
punto.transform(4326)    # reproject in place to WGS84 (units are degrees)
punto.wkt
Out[5]: 'POINT (0.0003593261136478 -0.0000269494585230)'
This way we can store coordinates in a projected system, which gives faster distance math.
For showing points on a Google map in the admin site interface, we can use this great article.
I have decided to go on with geography types for now, and I will convert them in the future if I need to improve performance.

Generally, GeoDjango will create and use spatial indexes on geometry columns where appropriate.
For an application dealing primarily with distances between points, the geography type (introduced in PostGIS 1.5, and supported by GeoDjango) may be a good fit. The GeoDjango documentation says it gives "much better performance on WGS84 distance queries" [link].
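On recent Django versions, a sketch of such a distance query against the Landmark model above (the coordinates and result limit here are made up) could be:
from django.contrib.gis.db.models.functions import Distance
from django.contrib.gis.geos import Point

origin = Point(-3.7038, 40.4168, srid=4326)  # lon/lat, e.g. from geocoding

# with a geography column, the distance is computed on the spheroid, in meters
closest = (Landmark.objects
           .annotate(d=Distance("point", origin))
           .order_by("d")[:10])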

Related

Points in Polygons. How can I match them spatially with given coordinates?

I have a dataset of georeferenced flickr posts (ca. 35k, picture below) and an unrelated dataset of georeferenced polygons (ca. 40k, picture below); both are currently pandas dataframes. The polygons do not cover the entire area where flickr posts are possible. I am having trouble understanding how to sort many different points into many different polygons (or check whether they are close). In the end I want a map with the points from the flickr data inside the polygons, colored by an attribute (Tag). I am trying to do this in Python. Do you have any ideas or recommendations?
(Images: point dataframe and polygon dataframe)
Since you don't have any sample data to load and play with, my answer will be descriptive in nature, trying to explain some possible strategies to approach the problem you are trying to solve.
I assume that these polygons are probably some addresses, and that you essentially want to place the geolocated flickr posts on the nearest best match among the polygons.
First of all, you need to identify or acquire information on the precision of those flickr geolocations: how far off they could possibly be, owing to numerous sources of error (the reason behind those errors is not your concern, but the amount of error is). This will give you an idea of a circle of confusion (2D), or more likely a sphere of confusion (3D). Why 3D? Well, you might have a flickr post from a certain elevation in a high-rise apartment, so (x: latitude, y: longitude, z: altitude) may all be necessary to consider. But you have to study the data and any other information available to you to determine the best option here (2D/3D space of confusion).
Once you have figured out the type of N-D space of confusion, you will need a distance metric (typically just the distance between two points); call this sigma. Just to be on the safe side, find all the addresses (geopolygons) within a radius of 1 sigma, and additionally within 2 sigma; these are your possible sets of target addresses. For each of these addresses, keep a variable that holds the distances of its centroid and the four corners of its rectangular outer bounding box from the flickr geolocation.
You will then want to rank these addresses for each flickr geolocation, based on their distances for all five points. You will need a way of identifying a flickr point that is far from a big building's center (the distance from the centroid could be far more than the distance from the corners) but close to its edges, versus a different property with a smaller area footprint.
For each flickr point you would thus have multiple predictions, with different probabilities (convert the distance-based scores into probabilities), of which polygon it belongs to.
Thus, if you choose any flickr location, you should be able to show the top-k geopolygons that location could belong to (with probabilities).
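If a plain point-in-polygon assignment turns out to be enough for the mapping step, a minimal GeoPandas sketch would look like this (the dataframe and column names are assumptions, and a recent GeoPandas is assumed for the predicate keyword):
import geopandas as gpd

# assumed inputs: points_df holds the flickr post geometries,
# polys_df holds the polygons with their "Tag" attribute
points = gpd.GeoDataFrame(points_df, geometry="geometry", crs="EPSG:4326")
polys = gpd.GeoDataFrame(polys_df, geometry="geometry", crs="EPSG:4326")

# spatial join: each point inherits the attributes of the polygon containing it
joined = gpd.sjoin(points, polys, how="left", predicate="within")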
For visualizations, I would suggest using holoviews with datashader, as they should be able to handle the sheer volume of points in your data. Also, please take a look at leafmap (or geemap).
References
holoviews: https://holoviews.org/
datashader: https://datashader.org/
leafmap: https://leafmap.org/
geemap: https://geemap.org/

How to snap coordinates to road and calculate distance

Maybe somebody knows something, since I am not able to find anything that makes sense to me.
I have a dataset of positions (lon, lat) and I want to snap them to the nearest road and calculate the distance between them.
So far I have discovered OSM, but I can't find a working example of how to use the API from Python.
If any of you could help, I'd be thankful for every little detail.
I'll try to figure it out myself in the meantime and publish the answer if successful (I couldn't find any similar question, so maybe it will help someone in the future).
Welcome! OSM is a wonderful resource, but it is essentially a raw dataset that you have to download and process yourself. There are a number of ways to do this; if you need a relatively small extract of the data (as opposed to the full planet file), the Overpass API is the place to look. Overpass turbo (docs) is a useful tool to help with this API.
Once you have the road network data you need, you can use a library like Shapely to snap your points to the road network geometry, and then either calculate the distance between them (if you need "as the crow flies" distance), or split the road geometry by the snapped points and calculate the length of the line. If you need real-world distance that takes the curvature of the earth into consideration (as opposed to the distance as it appears on a projected map), you can use something like Geopy.
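As an illustration of the snap-and-measure idea in Shapely (the road geometry and GPS points below are made up, and the coordinates are in degrees, so you would reproject to a metric CRS for real-world distances):
from shapely.geometry import LineString, Point

# made-up road centerline and GPS fixes, in lon/lat
road = LineString([(13.00, 52.00), (13.10, 52.05), (13.20, 52.10)])
gps_a = Point(13.02, 52.02)
gps_b = Point(13.15, 52.06)

# snap each point to the nearest location on the road
snap_a = road.interpolate(road.project(gps_a))
snap_b = road.interpolate(road.project(gps_b))

# distance along the road between the two snapped points (in CRS units)
along_road = abs(road.project(gps_b) - road.project(gps_a))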
You may also want to look into the Map Matching API from Mapbox (full disclosure, I work there), which takes a set of coordinates, snaps them to the road network, and returns the snapped geometry as well as information about the route, including distance.
You might use sklearn's KDTree for this. You fill an array with the coordinates of candidate roads (I downloaded these from OpenStreetMap). Then use KDTree to build a tree from this array. Finally, use KDTree.query(your_point, k=1) to get the nearest point in the tree (which is the nearest node of the candidate roads). Since searching the tree is very fast (essentially log(N) for N points in the tree), you can query lots of points.
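A small sketch of that approach (the node coordinates are made up; in practice they would come from the OSM extract):
import numpy as np
from sklearn.neighbors import KDTree

# made-up (lon, lat) road-node coordinates
road_nodes = np.array([[13.00, 52.00], [13.10, 52.05], [13.20, 52.10]])
tree = KDTree(road_nodes)

dist, idx = tree.query([[13.05, 52.01]], k=1)  # nearest road node to one query point
nearest_node = road_nodes[idx[0][0]]
Note that a KDTree measures Euclidean distance on raw lon/lat values, which is only a rough approximation; sklearn's BallTree with the haversine metric is an alternative if you need geodesic nearest neighbours.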

Python package/function to get percentage area covered by one polygon in another polygon using geo coordinates

I am looking for a solution to find the percentage of one polygon's area that is covered by another polygon, from geo coordinates, using Python.
The polygon can either reside fully inside the other one or cover only a portion of it.
Is there a solution to this?
Please advise.
Percentage is just the area of the intersection over the area of the (other) polygon:
area(intersection) / area(polygon2)
Basically any of the geometry packages should be able to compute this, as they all support area and intersection functions: I think GeoPandas, SymPy, Shapely (and others I missed) can do this. There might be differences in supported formats.
You did not specify what geo coordinates you use, though. Note that these packages compute on a flat 2D plane, so with latitude/longitude coordinates you need to reproject to an appropriate (ideally equal-area) projection to get an exact result.
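A minimal Shapely sketch, assuming the polygons are already in a planar projection (the coordinates are made up):
from shapely.geometry import Polygon

poly_a = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])  # made-up projected coordinates
poly_b = Polygon([(2, 2), (6, 2), (6, 6), (2, 6)])

# share of poly_b's area that is covered by poly_a
coverage = poly_a.intersection(poly_b).area / poly_b.area
print(f"{coverage:.0%}")  # 25%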

GeoDjango distance search

I want to use GeoDjango to do basic location searches. Specifically I want to give the search function a ZIP code/city/county and find all the ZIP codes/cities/counties within 5mi, 10mi, 20mi, etc. I found the following paragraph in the documentation:
Using a geographic coordinate system may introduce complications for the developer later on. For example, PostGIS does not have the capability to perform distance calculations between non-point geometries using geographic coordinate systems, e.g., constructing a query to find all points within 5 miles of a county boundary stored as WGS84. [6]
What does this mean exactly if I want to use PostGIS and be able to do the searches described above across the USA? The docs suggest using a projected coordinate system to cover only a specific region. I need to cover the whole country, so I suppose this is not an option.
Basically in the end I want to be able to find neighbouring ZIP codes/cities/counties given a starting location and distance. I don't really care how this is done on a technical level.
Also where would I find a database that contains the geographic boundaries of ZIP codes/cities/counties in the USA that I can import into a GeoDjango model?
UPDATE
I found a database that contains the latitude and longitude coordinates of all ZIP codes in the USA here. My plan is to import these points into a GeoDjango model and use PostGIS to construct queries that find other points within x miles of a given point. This gets around the issue raised in the documentation because all the ZIP codes are treated as points instead of as polygons. This is fine for my use case because perfect accuracy is not something I care about.
The good: the data file is free
The bad: this data is from the 2000 census so it is quite dated
The somewhat hopeful: the United States Census Bureau conducts a census every 10 years and it is almost 2010
The conclusion: it's good enough for me
To get around the limitation in the quote, you can just take the centroid of the ZIP code region provided by the user, and then from that point find all ZIP code regions that intersect a 5-, 10- or whatever-mile circle emanating from that point. I'm not sure how that would be achieved in GeoDjango, but with PostGIS it's definitely possible.
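A hedged GeoDjango sketch of that idea (the ZipCode model and its fields are assumptions): filtering regions by distance from the centroid is equivalent to intersecting them with a circle of that radius:
from django.contrib.gis.measure import D

# hypothetical model: ZipCode with a MultiPolygonField "boundary" (geography=True)
origin = ZipCode.objects.get(code="43085").boundary.centroid

# all ZIP regions whose boundary comes within 5 miles of that centroid
within_5mi = ZipCode.objects.filter(boundary__distance_lte=(origin, D(mi=5)))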
The limitation you quoted basically says you can't write a query that says "give me all points that are within 5 miles on the inside of the border of Ohio." Point-to-point distance calculations, on the other hand, work fine once you transform into a projected system, as in this example:
In [1]: o = Place.objects.get(pk=2463583) # Oakland, CA
In [2]: sf = Place.objects.get(pk=2487956) # San Francisco, CA
In [3]: o.coords.transform(3410) # use the NSIDC EASE-Grid Global projection
In [4]: sf.coords.transform(3410) # use the NSIDC EASE-Grid Global projection
In [5]: o.coords.distance(sf.coords) # find the distance between Oakland and San Francisco (in meters)
Out[5]: 14401.942808571299

Getting Easting & Northing Values from geopy

I have a table full of longitude/ latitude pairs in decimal format (e.g., -41.547, 23.456). I want to display the values in "Easting and Northing"/ UTM format. Does geopy provide a way to convert from decimal to UTM? I see in the code that it will parse UTM values, but I don't see how to get them back out and the geopy Google Group has gone the way of all things.
Nope. You need to reproject your points, and geopy isn't going to do that for you.
What you need is libgdal and some Python bindings. I always use the bindings in GeoDjango, but there are other alternatives.
EDIT: It is just a mathematical formula, but it's non-trivial. There are thousands of different ways to represent the surface of the Earth. See here for a huge but incomplete list.
There are two parts to a geographic projection of the Earth: a coordinate system and a datum. The latter is essentially a three-dimensional model of the planet. When you say you want to convert latitude/longitude points to UTM values, you're missing a couple of pieces of the puzzle.
Let's assume that your lat/long points are based on the WGS84 datum, because that's a pretty common standard for lat/long points these days. You want to convert those points to a UTM coordinate system. But to which UTM coordinate system? There are 60 of them.
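For illustration, here is a sketch using GeoDjango's GDAL bindings (the sample point and UTM zone 19N are just examples; pick the zone that covers your data):
from django.contrib.gis.gdal import CoordTransform, SpatialReference
from django.contrib.gis.geos import Point

wgs84 = SpatialReference(4326)
utm19n = SpatialReference(32619)      # UTM zone 19N, an example zone
ct = CoordTransform(wgs84, utm19n)

pt = Point(-70.896716, 42.519540, srid=4326)  # lon, lat
pt.transform(ct)
print(pt.x, pt.y)  # easting, northing in meters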
I think I may have over-complicated things. All I wanted was the DMS values (so 42.519540, -70.896716 becomes 42°31'10.34" N 70°53'48.18" W). You can get this by creating a geopy Point object with your latitude and longitude, then calling format(). However, as of this writing, format() is broken and requires the patch here.
