Calculate overlap efficiently - python

In my code I would like to calculate the overlap between multiple polygons and another polygon. As an example, I made the following three polygons ('tracks') and a square polygon. I need to calculate the overlay of the tracks with the square.
First, the polygons:
from shapely.ops import cascaded_union
from shapely import geometry
import matplotlib.pyplot as plt
from descartes import PolygonPatch
import cartopy.crs as ccrs
#Set up polygons
square = geometry.Polygon([(-3,-3),(3,-3),(3,3),(-3,3),(-3,-3)]) #blue square
track1 = geometry.Polygon([(-6,-0.5),(6,-0.5),(6,0.5),(-6,0.5),(-6,-0.5)])
#green track 1
track2 = geometry.Polygon([(-0.5,-6),(-0.5,6),(-2,6),(-2,-6),(-0.5,-6)]) #red
track 2
track3 = geometry.Polygon([(-8.5,-3),(-7.5,-3),(8.5,4),(7.5,4),(-8.5,-3)])
#yellow track 3
Which looks like this when plotted:
Now, to get the overlay, I first combine the three tracks using cascaded_union into an polygon. After that, I get the intersect polygon:
casc = cascaded_union([track1,track2,track3]) #ISSUE 1
intersect = square.intersection(casc) #ISSUE 2
Then, to get the overlap, I use:
ratio = intersect.area/square.area*100.0
print(ratio)
Which yields a value of 41.98598710317461. So the overlap is approximately 42%.
The issue is that I have to calculate the overlap hundreds of times. The geometries only have to be made once, but I know (from using time.time()) the two lines (indicated by ISSUE 1 and ISSUE 2) take up the major part of the running time. Is there any faster way to calculate the overlap?

Related

remove polygons with an intersection higher than a threshold

The goal is to remove polygons with an intersection higher than a threshold, let's say 80% of the area from the smaller intersected polygon.
In the following picture we see in first image how the small red circle in coordinate (2.06, 41.41) is overlapping with two big circles, in this case in a percentage higher than 80% of its area. So the output will be the second image where the polygons remains under an smaller area of intersection than 80%.
For a reproducible example the data from the image is:
df = pd.DataFrame({"centroid":[geometry.Point(2.1990512195822394, 41.390164933230444),
geometry.Point(2.1253931941117, 41.39962167422747),
geometry.Point(2.0894753122714187, 41.41858536477601),
geometry.Point(2.0724937348578973, 41.41460885909822),
geometry.Point(2.0617756309327735, 41.42279161424217)],
"radius":[591.0368301703261,
247.41971532239666,
1978.0374093864489,
270.9246060416432,
1218.9034814954907],
}
)
For computing the intersection area, just do the following:
import matplotlib.pyplot as plt
from shapely.geometry.point import Point
test_circle = Point(1.7, 3).buffer(0.5)
other = Point(3, 4).buffer(2)
intersection = other.intersection(test_circle)
plt.axis("equal")
plt.plot(*test_circle.exterior.xy, c="r")
plt.plot(*other.exterior.xy, c="g")
if intersection.area > 0.8 * test_circle.area:
# do stuff
...
then you just have to test the possible combinations of overlapping circles.

Problem with shapely polygon contain, seems not to correctly flag all contained points

Lets say we have a 100x100 grid that contains a polygon.
Now if we color all possible (x,y) points [x,y are integers] that are contained in the polygon we should expect the polygon to be somewhat painted/filled
But the image that i'm getting never properly falls within and fills the polygon! Is this a limitation of shapely or am I doing something wrong?!
(please note I need this to work for other purposes and not just paiting a polygon)
polygon and filled area not overlapping
import numpy as np
import matplotlib.pyplot as plt
import shapely.geometry
points = np.random.randint(0,100, (10,2)) # 10 random points
poly = shapely.geometry.MultiPoint(points).convex_hull.buffer(1) # a polygon
grid_points = [ shapely.geometry.Point(x,y) for x in range(100) for y in range(100)]
in_poly = np.array([poly.contains(point) for point in grid_points])
#plot
plt.imshow(in_poly.reshape(100,100), origin='lower')
plt.plot(*poly.exterior.xy)
This seems to do what you want - replace this one line (swap y and x in for loops):
grid_points = [ shapely.geometry.Point(x,y) for y in range(100) for x in range(100)]
Couple of notes:
My installation of shapely has this module name (geometry spelled differently so you may need to change name in above line):
import shapely.geometry
And thanks for adding the second plot command - that helped a bunch.
Something along the way has differing major orders (row-vs-column) so the above line changes to column-major.
And it may be you'd want to compensate by doing the inverse on the exterior plot.
(original (with new random shape), updated, with exterior)

Determining what points and polygons are in a grid square

I am using python and geojson to do this, I want to specify a point and that point will be the center of a square, assuming the square is 1 mile by one mile I want to list all the points and polys found in the square, including polys bigger than the square.
I have multiple geojson files so will need to do the check a few times which is fine. I have been playing with the code below which checks to see if the cell center is near the centre of the square but will have issues for oddly shaped polygons. I really want to know all items / features that are found in the square.
import json
from shapely.geometry import shape, Point
from shapely.geometry import asShape, mapping
point = Point(14.9783266342289, 16.87265432621112)
max_distance_from_center = 1
with open('cells.geojson') as f:
js = json.load(f)
for feature in js['features']:
polygon = asShape(feature['geometry'])
distance = point.distance(polygon.centroid)
# print(f'{distance} - {polygon.centroid}')
if distance < max_distance_from_center:
print (f'Found cells containing polygon:{feature}')
For source data I was using a exported map from https://azgaar.github.io/Fantasy-Map-Generator/ the grid should be 10 miles by 10 miles. Suggestions on how to do this?
Update:
Here is a poorly drawn diagram. Within the grid square I want to identify all markers and polygons that fall within the bounds of the square even if they go out side of it. I want to have a list of all features that have some presence in the grid square. I highlighted the areas in yellow.
Poorly draw image
I looked at intersects and it may do it. Will try tonight.
you can try this:
First, create grid.
from shapely.geometry import Point
from matplotlib.pyplot as plt
point = Point(0, -10)
square = point.buffer(0.5).envelope
fig, ax = plt.subplots(figsize=(5,5))
gpd.GeoSeries(square).plot(ax=ax)
gpd.GeoSeries(point).plot(ax=ax, color = "black",markersize=30)
plt.grid()
plt.show()
and then,
import geopandas as gpd
# get geodataframe from geojson file
geo_df = gpd.GeoDataFrame.from_file('cells.geojson')
geo_df['grid_yn'] = geo_df['geometry'].apply(lambda x : x.intersects(square))

Substracting inner rings from a list of polygons

I have a large list of polygons (>10^6) most of which are non-intersecting but some of these polygons are hole of another polygon (~10^3 cases). Here a image to explain the problem, the smaller polygon is a hole in the larger polygon but both are independent polygon in the list of polygons.
Now I would like to efficiently determine which polygons are holes and substract the holes i.e. subtract the smaller polygons which lie completely inside in another polygon and return a list of "cleaned" polygons. A pair of hole and parent polygon should be transformed like this (so basically hole subtracted from the parent):
There are plenty of similar questions on Stackoverflow and gis.stackexchange.com but I haven't found one that actually solves this problems. Here are some related questions:
1. https://gis.stackexchange.com/questions/5405/using-shapely-translating-between-polygons-and-multipolygons
2. https://gis.stackexchange.com/questions/319546/converting-list-of-polygons-to-multipolygon-using-shapely
Here is a sample code.
from shapely.geometry import Point
from shapely.geometry import MultiPolygon
from shapely.ops import unary_union
import numpy as np
#Generate a list of polygons, where some are holes in others;
def generateRandomPolygons(polygonCount = 100, areaDimension = 1000, holeProbability = 0.5):
pl = []
radiusLarge = 2 #In the real dataset the size of polygons can vary
radiusSmall = 1 #Size of holes can also vary
for i in range(polygonCount):
x, y = np.random.randint(0,areaDimension,(2))
rn1 = np.random.random(1)
pl.append(Point(x, y).buffer(radiusLarge))
if rn1 < holeProbability: #With a holeProbability add a hole in the large polygon that was just added to the list
pl.append(Point(x, y).buffer(radiusSmall))
return pl
polygons = generateRandomPolygons()
print(len(pl))
Output looks like this:
Now how can I create a new list of polygons with the holes removed. Shapely provides functions to subtract one polygon from another (difference) but is there a similar function for lists of polygons (maybe something like unary_union but where overlaps are removed)? Alternative how to efficiently determine which are holes and then subtract them from the larger polygons?
Your problem is you don't know which ones are "holes", right? To "efficiently determine which polygons are holes", you can use an rtree to speed up the intersection check:
from rtree.index import Index
# create an rtree for efficient spatial queries
rtree = Index((i, p.bounds, None) for i, p in enumerate(polygons))
donuts = []
for i, this_poly in enumerate(polygons):
# loop over indices of approximately intersecting polygons
for j in rtree.intersection(this_poly.bounds):
# ignore the intersection of this polygon with itself
if i == j:
continue
other_poly = polygons[j]
# ensure the polygon fully contains our match
if this_poly.contains(other_poly):
donut = this_poly.difference(other_poly)
donuts.append(donut)
break # quit searching
print(len(donuts))

Extract coordinates enclosed by a matplotlib patch.

I have created an ellipse using matplotlib.patches.ellipse as shown below:
patch = mpatches.Ellipse(center, major_ax, minor_ax, angle_deg, fc='none', ls='solid', ec='g', lw='3.')
What I want is a list of all the integer coordinates enclosed inside this patch.
I.e. If I was to plot this ellipse along with every integer point on the same grid, how many of those points are enclosed in the ellipse?
I have tried seeing if I can extract the equation of the ellipse so I can loop through each point and see whether it falls within the line but I can't seem to find an obvious way to do this, it becomes more complicated as the major axis of the ellipse can be orientated at any angle. The information to do this must be stored in patches somewhere, but I can't seem to find it.
Any advice on this would be much appreciated.
Ellipse objects have a method contains_point which will return 1 if the point is in the ellipse, 0 other wise.
Stealing from #DrV 's answer:
import matplotlib.pyplot as plt
import matplotlib.patches
import numpy as np
# create an ellipse
el = matplotlib.patches.Ellipse((50,-23), 10, 13.7, 30, facecolor=(1,0,0,.2), edgecolor='none')
# calculate the x and y points possibly within the ellipse
y_int = np.arange(-30, -15)
x_int = np.arange(40, 60)
# create a list of possible coordinates
g = np.meshgrid(x_int, y_int)
coords = list(zip(*(c.flat for c in g)))
# create the list of valid coordinates (from untransformed)
ellipsepoints = np.vstack([p for p in coords if el.contains_point(p, radius=0)])
# just to see if this works
fig = plt.figure()
ax = fig.add_subplot(111)
ax.add_artist(el)
ep = np.array(ellipsepoints)
ax.plot(ellipsepoints[:,0], ellipsepoints[:,1], 'ko')
plt.show()
This will give you the result as below:
If you really want to use the methods offered by matplotlib, then:
import matplotlib.pyplot as plt
import matplotlib.patches
import numpy as np
# create an ellipse
el = matplotlib.patches.Ellipse((50,-23), 10, 13.7, 30, facecolor=(1,0,0,.2), edgecolor='none')
# find the bounding box of the ellipse
bb = el.get_window_extent()
# calculate the x and y points possibly within the ellipse
x_int = np.arange(np.ceil(bb.x0), np.floor(bb.x1) + 1, dtype='int')
y_int = np.arange(np.ceil(bb.y0), np.floor(bb.y1) + 1, dtype='int')
# create a list of possible coordinates
g = np.meshgrid(x_int, y_int)
coords = np.array(zip(*(c.flat for c in g)))
# create a list of transformed points (transformed so that the ellipse is a unit circle)
transcoords = el.get_transform().inverted().transform(coords)
# find the transformed coordinates which are within a unit circle
validcoords = transcoords[:,0]**2 + transcoords[:,1]**2 < 1.0
# create the list of valid coordinates (from untransformed)
ellipsepoints = coords[validcoords]
# just to see if this works
fig = plt.figure()
ax = fig.add_subplot(111)
ax.add_artist(el)
ep = np.array(ellipsepoints)
ax.plot(ellipsepoints[:,0], ellipsepoints[:,1], 'ko')
Seems to work:
(Zooming in reveals that even the points hanging on the edge are inside.)
The point here is that matplotlib handles ellipses as transformed circles (translate, rotate, scale, anything affine). If the transform is applied in reverse, the result is a unit circle at origin, and it is very simple to check if a point is within that.
Just a word of warning: The get_window_extent may not be extremely reliable, as it seems to use the spline approximation of a circle. Also, see tcaswell's comment on the renderer-dependency.
In order to find a more reliable bounding box, you may:
create a horizontal and vertical vector into the plot coordinates (their position is not important, ([0,0],[1,0]) and ([0,0], [0,1]) will do)
transform these vectors into the ellipse coordinates (the get_transform, etc.)
find in the ellipse coordinate system (i.e. the system where the ellipse is a unit circle around the origin) the four tangents of the circle which are parallel to these two vectors
find the intersection points of the vectors (4 intersections, but 2 diagonal will be enough)
transform the intersection points back to the plot coordinates
This will give an accurate (but of course limited by the numerical precision) square bounding box.
However, you may use a simple approximation:
all possible points are within a circle whose center is the same as that of the ellipse and whose diameter is the same as that of the major axis of the ellipse
In other words, all possible points are within a square bounding box which is between x0+-m/2, y0+-m/2, where (x0, y0) is the center of the ellipse and m the major axis.
I'd like to offer another solution that uses the Path object's contains_points() method instead of contains_point():
First get the coordinates of the ellipse and make it into a Path object:
elpath=Path(el.get_verts())
(NOTE that el.get_paths() won't work for some reason.)
Then call the path's contains_points():
validcoords=elpath.contains_points(coords)
Below I'm comparing #tacaswell's solution (method 1), #Drv's (method 2) and my own (method 3) (I've enlarged the ellipse by ~5 times):
import numpy
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from matplotlib.path import Path
import time
#----------------Create an ellipse----------------
el=Ellipse((50,-23),50,70,30,facecolor=(1,0,0,.2), edgecolor='none')
#---------------------Method 1---------------------
t1=time.time()
for ii in range(50):
y=numpy.arange(-100,50)
x=numpy.arange(-30,130)
g=numpy.meshgrid(x,y)
coords=numpy.array(zip(*(c.flat for c in g)))
ellipsepoints = numpy.vstack([p for p in coords if el.contains_point(p, radius=0)])
t2=time.time()
print 'time of method 1',t2-t1
#---------------------Method 2---------------------
t2=time.time()
for ii in range(50):
y=numpy.arange(-100,50)
x=numpy.arange(-30,130)
g=numpy.meshgrid(x,y)
coords=numpy.array(zip(*(c.flat for c in g)))
invtrans=el.get_transform().inverted()
transcoords=invtrans.transform(coords)
validcoords=transcoords[:,0]**2+transcoords[:,1]**2<=1.0
ellipsepoints=coords[validcoords]
t3=time.time()
print 'time of method 2',t3-t2
#---------------------Method 3---------------------
t3=time.time()
for ii in range(50):
y=numpy.arange(-100,50)
x=numpy.arange(-30,130)
g=numpy.meshgrid(x,y)
coords=numpy.array(zip(*(c.flat for c in g)))
#------Create a path from ellipse's vertices------
elpath=Path(el.get_verts())
# call contains_points()
validcoords=elpath.contains_points(coords)
ellipsepoints=coords[validcoords]
t4=time.time()
print 'time of method 3',t4-t3
#---------------------Plot it ---------------------
fig,ax=plt.subplots()
ax.add_artist(el)
ep=numpy.array(ellipsepoints)
ax.plot(ellipsepoints[:,0],ellipsepoints[:,1],'ko')
plt.show(block=False)
I got these execution time:
time of method 1 62.2502269745
time of method 2 0.488734006882
time of method 3 0.588987112045
So the contains_point() approach is way slower. The coordinate-transformation method is faster than mine, but when you get irregular shaped contours/polygons, this method would still work.
Finally the result plot:

Categories