I have a large list of polygons (>10^6), most of which are non-intersecting, but some of these polygons are holes of other polygons (~10^3 cases). Here is an image to explain the problem: the smaller polygon is a hole in the larger polygon, but both are independent polygons in the list.
Now I would like to efficiently determine which polygons are holes and subtract them, i.e. subtract the smaller polygons which lie completely inside another polygon, and return a list of "cleaned" polygons. A pair of hole and parent polygon should be transformed like this (so basically the hole subtracted from the parent):
There are plenty of similar questions on Stack Overflow and gis.stackexchange.com, but I haven't found one that actually solves this problem. Here are some related questions:
1. https://gis.stackexchange.com/questions/5405/using-shapely-translating-between-polygons-and-multipolygons
2. https://gis.stackexchange.com/questions/319546/converting-list-of-polygons-to-multipolygon-using-shapely
Here is some sample code.
from shapely.geometry import Point
from shapely.geometry import MultiPolygon
from shapely.ops import unary_union
import numpy as np

# Generate a list of polygons, where some are holes in others
def generateRandomPolygons(polygonCount = 100, areaDimension = 1000, holeProbability = 0.5):
    pl = []
    radiusLarge = 2  # In the real dataset the size of polygons can vary
    radiusSmall = 1  # Size of holes can also vary
    for i in range(polygonCount):
        x, y = np.random.randint(0, areaDimension, (2))
        rn1 = np.random.random(1)
        pl.append(Point(x, y).buffer(radiusLarge))
        if rn1 < holeProbability:  # With holeProbability add a hole in the large polygon that was just added to the list
            pl.append(Point(x, y).buffer(radiusSmall))
    return pl

polygons = generateRandomPolygons()
print(len(polygons))
Output looks like this:
Now how can I create a new list of polygons with the holes removed? Shapely provides a function to subtract one polygon from another (difference), but is there a similar function for lists of polygons (maybe something like unary_union but where overlaps are removed)? Alternatively, how can I efficiently determine which polygons are holes and then subtract them from the larger polygons?
Your problem is you don't know which ones are "holes", right? To "efficiently determine which polygons are holes", you can use an rtree to speed up the intersection check:
from rtree.index import Index

# create an rtree for efficient spatial queries
rtree = Index((i, p.bounds, None) for i, p in enumerate(polygons))
donuts = []

for i, this_poly in enumerate(polygons):
    # loop over indices of approximately intersecting polygons
    for j in rtree.intersection(this_poly.bounds):
        # ignore the intersection of this polygon with itself
        if i == j:
            continue
        other_poly = polygons[j]
        # ensure the polygon fully contains our match
        if this_poly.contains(other_poly):
            donut = this_poly.difference(other_poly)
            donuts.append(donut)
            break  # quit searching

print(len(donuts))
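If you also need the full "cleaned" list from the question (parents replaced by their donuts, the hole polygons dropped, everything else kept as-is), a minimal two-pass sketch along the same lines could look like this; hole_indices and cleaned are names I'm introducing here, not anything from the rtree API:

from rtree.index import Index

rtree = Index((i, p.bounds, None) for i, p in enumerate(polygons))

# first pass: record the indices of polygons that are holes of some parent
hole_indices = set()
for i, this_poly in enumerate(polygons):
    for j in rtree.intersection(this_poly.bounds):
        if i != j and this_poly.contains(polygons[j]):
            hole_indices.add(j)

# second pass: subtract any holes from their parents and skip the holes themselves
cleaned = []
for i, this_poly in enumerate(polygons):
    if i in hole_indices:
        continue
    result = this_poly
    for j in rtree.intersection(this_poly.bounds):
        if i != j and j in hole_indices and this_poly.contains(polygons[j]):
            result = result.difference(polygons[j])
    cleaned.append(result)

print(len(cleaned))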
Related
I have a dataset of circular polygons that correspond to tree crowns.
Many polygons overlap with each other, and some are even completely covered by larger polygons (or larger polygons are covered by many small polygons). I would like to clip the polygons based on an attribute value (tree height), so that the polygons with the maximum height clip the polygons with lower height values.
The image below describes the situation, where 1 is the lowest tree height and 3 is the tallest:
I attempted this workflow in QGIS (https://gis.stackexchange.com/questions/427555/cut-polygons-with-each-other-based-on-attribute-value), but it took very long and was unusable for larger datasets.
I would prefer to use Python, but if you can accomplish it with any other programming language I would accept that. Thanks in advance!
Test dataset located at:
https://github.com/arojas314/data-sharing/blob/main/niwo010_treepolys.zip
I attempted it but only got as far as splitting the polygons with the boundaries (lines) of each polygon, creating smaller polygons where they overlap:
import shapely.ops
import geopandas as gpd

# tree_polys_valid is the GeoDataFrame of (validated) tree crown polygons
# 1. convert polys to lines
tree_lines = tree_polys_valid.boundary

# 2. split polygons by lines
merged_lines = shapely.ops.linemerge(tree_lines.values)
border_lines = shapely.ops.unary_union(merged_lines)
decomposition = shapely.ops.polygonize(border_lines)

# 3. convert into a GeoSeries
poly_series = gpd.GeoSeries(list(decomposition))
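One possible direction from here (a rough sketch, not a tested solution): sort the crowns by height descending and subtract the taller crowns from each lower one, using an STRtree so that only nearby polygons are compared. This assumes a numeric height column in tree_polys_valid and Shapely 2.x, where STRtree.query returns integer indices:

import geopandas as gpd
from shapely.strtree import STRtree

# tallest trees first, so anything with a smaller index takes precedence
gdf = tree_polys_valid.sort_values('height', ascending=False).reset_index(drop=True)
tree = STRtree(gdf.geometry.values)

clipped = []
for i, geom in enumerate(gdf.geometry):
    # candidate neighbours whose bounding boxes intersect this crown
    for j in tree.query(geom):
        if j < i:  # j is a taller crown, so it clips the current one
            geom = geom.difference(gdf.geometry.iloc[j])
    clipped.append(geom)

result = gpd.GeoDataFrame(gdf.drop(columns='geometry'), geometry=clipped, crs=gdf.crs)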
I have a Shapely polygon. I want to cut this polygon into n polygons, which all have more-or-less equally sized areas. Equally sized would be best, but an approximation would be okay too.
I have tried to use the two methods described here, which are both a step in the right direction but not what I need. Neither allows for a target n.
I looked into Voronoi diagrams, with which I am largely unfamiliar. The resulting shapes this analysis gives would be ideal, but it requires points, not a shape, as input.
This is the best I could manage. It does not result in equal surface area per polygon, but it turned out to work for what I needed. This populates a shape with a specific number of points (if the parameters are kept constant, the number of points will be too). Then the points are converted to a Voronoi diagram, which is then turned into triangles.
from shapely import affinity
from shapely.geometry import LineString, MultiLineString
from shapely.geometry.multipolygon import MultiPolygon
from shapely.ops import triangulate
from scipy.spatial import Voronoi
import numpy as np

# shape is the input Shapely polygon to split
# Voronoi doesn't work properly with points below (0,0), so set the lowest point to (0,0)
shape = affinity.translate(shape, -shape.bounds[0], -shape.bounds[1])

points = shape_to_points(shape)
vor = points_to_voronoi(points)
triangles = MultiPolygon(triangulate(MultiLineString(vor)))

def shape_to_points(shape, num = 10, smaller_versions = 10):
    points = []
    # Take the shape, shrink it by a factor (first iteration factor=1), and then
    # take points around the contours
    for shrink_factor in range(0, smaller_versions, 1):
        # calculate the shrinking factor
        shrink_factor = smaller_versions - shrink_factor
        shrink_factor = shrink_factor / float(smaller_versions)
        # actually shrink - first iteration it remains at 1:1
        smaller_shape = affinity.scale(shape, shrink_factor, shrink_factor)
        # interpolate points around the boundary of the shape
        for i in range(0, int(num * shrink_factor), 1):
            i = i / int(num * shrink_factor)
            x, y = smaller_shape.interpolate(i, normalized=True).xy
            points.append((x[0], y[0]))
    # add the centroid (near, but usually not at, (0,0))
    x, y = smaller_shape.centroid.xy
    points.append((x[0], y[0]))
    points = np.array(points)
    return points

def points_to_voronoi(points):
    vor = Voronoi(points)
    vertices = [x for x in vor.ridge_vertices if -1 not in x]
    # Some ridges are effectively infinite (very long), so filter those out
    lines = [LineString(vor.vertices[x]) for x in vertices if not vor.vertices[x].max() > 50000]
    return MultiLineString(lines)
This is the input shape:
This is after shape_to_points:
This is after points_to_voronoi
And then we can triangulate the voronoi:
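As a possible follow-up (my own rough sketch, not part of the answer above), the triangles can then be greedily merged into a target number n of groups with roughly equal total area using unary_union. Note the groups are balanced by area but not guaranteed to be contiguous:

from shapely.ops import unary_union

def group_triangles(triangles, n=4):
    # greedily assign each triangle (largest first) to the group with the smallest area so far
    groups = [[] for _ in range(n)]
    areas = [0.0] * n
    for tri in sorted(triangles.geoms, key=lambda t: t.area, reverse=True):
        k = areas.index(min(areas))
        groups[k].append(tri)
        areas[k] += tri.area
    # merge each group back into a single geometry
    return [unary_union(g) for g in groups]

parts = group_triangles(triangles, n=4)
print([round(p.area, 1) for p in parts])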
Just combining the response and basic polyfill docs provided by @user3496060 (very helpful for me, thank you), here's a simple function.
And here's a great notebook from the h3 repo. Check out the "Census Polygon to Hex" section for how they use polyfill().
import h3

def h3_fill_shapely_poly(poly = shape, res = 10):
    """
    inputs:
    - poly: must be a shapely Polygon, cannot be any other shapely object
    - res: resolution (higher means more specific zoom)
    output:
    - h3_fill: a Python set() object, generated by polyfill
    """
    coordinates = [[i[0], i[1]] for i in poly.exterior.coords]
    geo_json = {
        "type": "Polygon",
        "coordinates": [coordinates]
    }
    h3_fill = h3.polyfill(geo_json, res, geo_json_conformant=False)
    print(f'h3_fill =\n{type(h3_fill), h3_fill}')
    return h3_fill
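For example, a hypothetical usage sketch with the h3 v3 API used above (the coordinates are made up, and because the function passes geo_json_conformant=False the pairs are interpreted as (lat, lng)):

from shapely.geometry import Polygon

# made-up square, coordinate pairs given as (lat, lng)
poly = Polygon([(37.77, -122.42), (37.78, -122.42), (37.78, -122.41), (37.77, -122.41)])
cells = h3_fill_shapely_poly(poly, res=9)
print(len(cells))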
Another out-of-the-box option is the h3 polyfill function. Basically any repeating structure would work (triangle, square, hex), but Uber's library uses hexes, so you're stuck with that unless you write a module to do the same thing with one of the other shapes. You still have the issue that "n" cannot be specified directly, though (only indirectly through the discrete zoom levels).
polyfill
I am trying to join two spatial datasets. The first contains points and the second polygons.
However, some of the points are outside of the polygons.
Is there a simple way of joining/snapping these points to the nearest polygon boundary, not the nearest polygon centroid?
At the moment I am joining to the nearest polygon centroid but this does not yield the results I am looking for.
You need to put all points (not polygon points) into a KD-Tree, using something like the sklearn package. This package contains an efficient nearest neighbours calculation. In Python it can be imported using:
import sklearn.neighbors as neighbors
If you have about 10 million polygons you only need a tree depth of 12 for it to be efficient. You can experiment with this. If you have fewer than 100,000, a leaf_size=9 might be enough. The code to put all points (in one single array) into a tree is the following:
tree = neighbors.KDTree( arrayOfPoints, leaf_size=12 )
Then you iterate over each polygon and the individual points in each polygon to find the nearest 5 points (for instance). The algorithm is super quick at finding these because of the nature of the KD-Tree. Brute-force comparison can be 1000 times slower (as I found for massive datasets).
shortestDistances, closestIndices = tree.query( pointInPolygon, k=5 )
You might just want the nearest point, so you can set k=1 and then the closestIndices[0] is what you want for the actual array index from the point list.
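To apply this to the question (snapping stray points to the nearest polygon boundary rather than the nearest centroid), one rough sketch is to build the tree from the polygons' boundary vertices and remember which polygon each vertex came from; points_gdf and polys_gdf are assumed names for your two datasets:

import numpy as np
import sklearn.neighbors as neighbors

# collect every exterior vertex of every polygon, remembering its parent polygon
boundary_xy = []
boundary_poly_id = []
for poly_id, poly in enumerate(polys_gdf.geometry):
    for x, y in poly.exterior.coords:
        boundary_xy.append((x, y))
        boundary_poly_id.append(poly_id)

tree = neighbors.KDTree(np.array(boundary_xy), leaf_size=12)

# nearest boundary vertex (and hence polygon) for every point
point_xy = np.array([(p.x, p.y) for p in points_gdf.geometry])
distances, indices = tree.query(point_xy, k=1)
nearest_polygon = [boundary_poly_id[i] for i in indices[:, 0]]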
This is not a complete answer, but you can check distances between points and polygons' boundaries using shapely:
from shapely.geometry import Point, Polygon
p = Point(0,0)
poly = Polygon([[1,0], [1,10], [10, 0]])
print(p.distance(poly))
Edit:
So if you're not using big datasets, you could do something like:
my_points = [...]
my_polys = [...]
dict_point_poly = {}

for point in my_points:
    try:
        # first look for a polygon that actually intersects the point
        intersecting_poly = [poly for poly in my_polys if point.intersects(poly)][0]
        this_poly = intersecting_poly
    except IndexError:
        # otherwise fall back to the closest polygon by distance
        distances = [(poly, point.distance(poly)) for poly in my_polys]
        distances.sort(key=lambda x: x[1])
        this_poly = distances[0][0]
    finally:
        dict_point_poly[point] = this_poly
(not the most efficient method, but one easily understood I think)
Let's say we have a 100x100 grid that contains a polygon.
Now if we color all possible (x,y) points [x, y are integers] that are contained in the polygon, we should expect the polygon to be somewhat painted/filled.
But the image that I'm getting never properly falls within and fills the polygon! Is this a limitation of shapely or am I doing something wrong?
(Please note I need this to work for other purposes, not just painting a polygon.)
polygon and filled area not overlapping
import numpy as np
import matplotlib.pyplot as plt
import shapely.geometry
points = np.random.randint(0,100, (10,2)) # 10 random points
poly = shapely.geometry.MultiPoint(points).convex_hull.buffer(1) # a polygon
grid_points = [ shapely.geometry.Point(x,y) for x in range(100) for y in range(100)]
in_poly = np.array([poly.contains(point) for point in grid_points])
#plot
plt.imshow(in_poly.reshape(100,100), origin='lower')
plt.plot(*poly.exterior.xy)
This seems to do what you want - replace this one line (swap y and x in for loops):
grid_points = [ shapely.geometry.Point(x,y) for y in range(100) for x in range(100)]
Couple of notes:
My installation of shapely has this module name (geometry may be spelled differently in yours, so you may need to change the name in the line above):
import shapely.geometry
And thanks for adding the second plot command - that helped a bunch.
Something along the way uses a different major order (row vs. column), so the line above switches to column-major.
And it may be you'd want to compensate by doing the inverse on the exterior plot.
(original (with new random shape), updated, with exterior)
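Equivalently (my own untested sketch, not from the answer above), you can keep the original loop order and transpose the boolean grid before plotting, since imshow treats the first index as the row (y):

# grid_points ordered as [Point(x, y) for x in range(100) for y in range(100)]
plt.imshow(in_poly.reshape(100, 100).T, origin='lower')
plt.plot(*poly.exterior.xy)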
I am using Python and GeoJSON to do this. I want to specify a point which will be the center of a square (say 1 mile by 1 mile), and list all the points and polygons found in that square, including polygons bigger than the square.
I have multiple GeoJSON files, so I will need to do the check a few times, which is fine. I have been playing with the code below, which checks whether the cell centroid is near the center of the square, but that will have issues for oddly shaped polygons. I really want to know all items/features that are found in the square.
import json
from shapely.geometry import shape, Point
from shapely.geometry import asShape, mapping

point = Point(14.9783266342289, 16.87265432621112)
max_distance_from_center = 1

with open('cells.geojson') as f:
    js = json.load(f)

for feature in js['features']:
    polygon = asShape(feature['geometry'])
    distance = point.distance(polygon.centroid)
    # print(f'{distance} - {polygon.centroid}')
    if distance < max_distance_from_center:
        print(f'Found cells containing polygon: {feature}')
For source data I was using an exported map from https://azgaar.github.io/Fantasy-Map-Generator/ and the grid should be 10 miles by 10 miles. Suggestions on how to do this?
Update:
Here is a poorly drawn diagram. Within the grid square I want to identify all markers and polygons that fall within the bounds of the square, even if they go outside of it. I want a list of all features that have some presence in the grid square. I highlighted the areas in yellow.
Poorly drawn image
I looked at intersects and it may do it. Will try tonight.
You can try this:
First, create the square.
from shapely.geometry import Point
import matplotlib.pyplot as plt
import geopandas as gpd

point = Point(0, -10)
square = point.buffer(0.5).envelope

fig, ax = plt.subplots(figsize=(5, 5))
gpd.GeoSeries(square).plot(ax=ax)
gpd.GeoSeries(point).plot(ax=ax, color="black", markersize=30)
plt.grid()
plt.show()
and then,
import geopandas as gpd
# get geodataframe from geojson file
geo_df = gpd.GeoDataFrame.from_file('cells.geojson')
geo_df['grid_yn'] = geo_df['geometry'].apply(lambda x : x.intersects(square))
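As a short follow-up sketch, the boolean column can then be used to pull out only the features that have some presence in the square:

# keep only the features whose geometry intersects the grid square
features_in_square = geo_df[geo_df['grid_yn']]
print(len(features_in_square))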