Mapping points to polygons in ESRI shape file in Python

Mapping points to polygons in ESRI shape file in Python - python

I have an ESRI shape file with many geographic areas, represented as non-overlapping polygons. I'm trying to map arbitrary points to the polygons which they belong to in Python.
I've looked into storing the polygons in a SQLite database as R-trees, but I think this works if the shapes are rectangles (or if I approximate the polygons by using minimum bounding rectangles).
Is there any way to do this exact calculation with R-trees (or a similar module provided by SQLite)? This way I could store this information as a SQLite database, making it very easy to perform this calculation cross platform.

Related

Points in Polygons. How can I match them spatially with given coordinates?

I have a dataset of georeferenced flickr posts (ca. 35k, picture below) and I have an unrelated dataset of georeferenced polygons (ca. 40k, picture below), both are currently panda dataframes. The polygons do not cover the entire area where flickr posts are possible. I am having trouble understanding how to sort many different points in many different polygons (or check if they are close). In the end I want a map with the points from the flickerdata in polygons colord to an attribute (Tag). I am trying to do this in Python. Do you have any ideas or recommendations?
Point dataframe Polygon dataframe

Since, you don't have any sample data to load and play with, my answer will be descriptive in nature, trying to explain some possible strategies to approach the problem you are trying to solve.
I assume that:
these polygons are probably some addresses and you essentially want to place the geolocated flickr posts to the nearest best-match among the polygons.
First of all, you need to identify or acquire information on the precision of those flickr geolocations. How off could they possibly be because of numerous sources of errors (the reason behind those errors is not your concern, but the amount of error is). This will give you an idea of a circle of confusion (2D) or more likely a sphere of confusion (3D). Why 3D? Well, you might have flickr post from a certain elevation on a high-rise apartment, and so, (x: latitude,y: longitude, z: altitude) all may be necessary to consider. But, you have to study the data and any other information available to you to determine the best option here (2D/3D space-of-confusion).
Once you have figured out the type of ND-space-of-confusion, you will need a distance metric (typically just a distance between two points) -- call this sigma. Just to be on the safe side, find all the addresses (geopolygons) within a radius of 1 sigma and additionally within 2 sigma -- these are your possible set of target addresses. For each of these addresses have a variable that calculates its distances of its centroid, and the four corners of its rectangular outer bounding box from the flickr geolocations.
You will then want to rank these addresses for each flickr geolocation, based on their distances for all the five points. You will need a way of identifying a flickr point that is far from a big building's center (distance from centroid could be way more than distance from the corners) but closer to it's edges vs. a different property with smaller area-footprint.
For each flickr point, thus you would have multiple predictions with different probabilities (convert the distance metric based scores into probabilities) using the distances, on which polygon they belong to.
Thus, if you choose any flickr location, you should be able to show top-k geopolygons that flickr location could belong to (with probabilities).
For visualizations, I would suggest you to use holoviews with datashader as that should be able to take care of curse of dimension in your data. Also, please take a look at leafmap (or, geemap).
References
holoviews: https://holoviews.org/
datshader: https://datashader.org/
leafmap: https://leafmap.org/
geemap: https://geemap.org/

How to determine whether a point is inside or outside a 3D model computationally

I have a .obj and .ply file of a 3D model.
What I want to do is read this 3D model file and see if a list of 3D coordinates are either inside or outside the 3D model space.
For example, if the 3D model is a sphere with radius 1, (0,0,0) would be inside (True) and (2,0,0) would be outside (False). Of course the 3D model I'm using is not as simple as a sphere.
I would like to add some of the methods I considered using.
Since I am using Python, I thought of using PyMesh, as their intersection feature looked promising. However the list of coordinates I have are not mesh files but just vectors, so it didn't seem to be the appropriate function to use.
I also found this method using ray casting. However, how to do this with PyMesh, or any other Python tool is something I need advice on.

Cast a ray from the 3D point along X-axis and check how many intersections with outer object you find.
Depending on the intersection number on each axis (even or odd) you can understand if your point is inside or outside.
You may want to repeat on Y and Z axes to improve the result (sometimes your ray is coincident with planar faces and intersection number is not reliable).

Converting my comment into an answer for future readers.
You can use a Convex Hull library to check whether a point is inside the hull. Most libraries use signed distance function to determine whether the point is inside. trimesh is one of the libraries that implements this feature.

Thiessen-like polygons out of pre-labeled points

I have a list of coordinate points that are already clustered. Each point is available to me as a row in a csv file, with one of the fields being the "zone id": the ID of the cluster to which a point belongs. I was wondering if there is a way, given the latitude, longitude and zone ID of each point, to draw polygons similar to Voronoi cells, such that:
each cluster is entirely contained within a polygon
each polygon contains points belonging to only one cluster
the union of the polygons is contiguous polygon that contains all the points. No holes: the polygons must border each other except at the edges. A fun extension would be to supply the "holes" (water bodies, for example) as part of the input.
I realise the problem is very abstract and could be very resource intensive, but I am curious to hear of any approaches. I am open to solutions using a variety or combination of tools, such as GIS software, Python, R, etc. I am also open to implementations that would be integrated into the clustering process.

Plotting gridded data using KML

We are beginning a project to visualize the results of a finite volume (FV) calculation using Google Earth. The FV data is essentially 2d (lat/long) data consisting of a Cartesian array of values (sea surface height, for example). Each value should be mapped to a color from some colormap, and then displayed as a single mesh cell in a gridded array suitable for Google Earth. The Cartesian array could be 100x100 or larger.
My question is, do we construct polygons for each mesh cell C_{ij} in the array, assigning a color corresponding to the q_{ij} value for that mesh cell? This would seem to create a huge KML file, if the coordinates of the four corners of every mesh cell must be described, (i.e. 10,000 polygons, for example).
Or are there KML tools we could use that would allow us to specify, for example, the lower and upper coordinates of the array, a generic mesh cell size (e.g. dX, dY values), and the array of q data (or, equivalently, colours) that should be used to fill the "patch"?
Alternatively, we could create an image file, containing for example, a rendered image of our data array (created by some other means), and then referenced from the KML file.
Our aim is to use PyKML for this project.
Any suggestions would be very helpful.

After much digging around, I think I now have a better understanding of what Google Earth can and cannot do, (or is not designed to do). It seems that Google Earth is not designed as a visualization tool for numerical data. This does not mean it cannot be done, but that one must create the image files elsewhere, and then overlay them onto Google Earth. For example, this link provides instructions for visualizing the output from a fire modeling code :
http://www.openwfm.org/wiki/Visualization_in_Google_Earth
The instructions here suggests how pseudocolor plots can used in at least one special case to visualize output in Google Earth.

Finding intersections

Given a scenario where there are millions of potentially overlapping bounding boxes of variable sizes less the 5km in width.
Create a fast function with the arguments findIntersections(Longitude,Latitude,Radius) and the output is a list of those bounding boxes ids where each bounding box origin is inside the perimeter of the function argument dimensions.
How do I solve this problem elegantly?

This is normally done using an R-tree data structure
dbs like mysql or postgresql have GIS modules that use an r-tree under the hood to quickly retrieve locations within a certain proximity to a point on a map.
From http://en.wikipedia.org/wiki/R-tree:
R-trees are tree data structures that
are similar to B-trees, but are used
for spatial access methods, i.e., for
indexing multi-dimensional
information; for example, the (X, Y)
coordinates of geographical data. A
common real-world usage for an R-tree
might be: "Find all museums within 2
kilometres (1.2 mi) of my current
location".
The data structure splits space with
hierarchically nested, and possibly
overlapping, minimum bounding
rectangles (MBRs, otherwise known as
bounding boxes, i.e. "rectangle", what
the "R" in R-tree stands for).
The Priority R-Tree (PR-tree) is a variant that has a maximum running time of:
"O((N/B)^(1-1/d)+T/B) I/Os, where N is the number of d-dimensional (hyper-)
rectangles stored in the R-tree, B is the disk block size, and T is the output
size."
In practice most real-world queries will have a much quicker average case run time.
fyi, in addition to the other great code posted, there's some cool stuff like SpatiaLite and SQLite R-tree module

PostGIS is an open-source GIS extention for postgresql.
They have ST_Intersects and ST_Intersection functions available.
If your interested you can dig around and see how it's implemented there:
http://svn.osgeo.org/postgis/trunk/postgis/

This seems like a better more general approach GiST
http://en.wikipedia.org/wiki/GiST

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.