remove polygons with an intersection higher than a threshold - python

The goal is to remove polygons with an intersection higher than a threshold, let's say 80% of the area from the smaller intersected polygon.
In the following picture we see in first image how the small red circle in coordinate (2.06, 41.41) is overlapping with two big circles, in this case in a percentage higher than 80% of its area. So the output will be the second image where the polygons remains under an smaller area of intersection than 80%.
For a reproducible example the data from the image is:
df = pd.DataFrame({"centroid":[geometry.Point(2.1990512195822394, 41.390164933230444),
geometry.Point(2.1253931941117, 41.39962167422747),
geometry.Point(2.0894753122714187, 41.41858536477601),
geometry.Point(2.0724937348578973, 41.41460885909822),
geometry.Point(2.0617756309327735, 41.42279161424217)],
"radius":[591.0368301703261,
247.41971532239666,
1978.0374093864489,
270.9246060416432,
1218.9034814954907],
}
)

For computing the intersection area, just do the following:
import matplotlib.pyplot as plt
from shapely.geometry.point import Point
test_circle = Point(1.7, 3).buffer(0.5)
other = Point(3, 4).buffer(2)
intersection = other.intersection(test_circle)
plt.axis("equal")
plt.plot(*test_circle.exterior.xy, c="r")
plt.plot(*other.exterior.xy, c="g")
if intersection.area > 0.8 * test_circle.area:
# do stuff
...
then you just have to test the possible combinations of overlapping circles.

Related

Calculate the area enclosed by a 2D array of unordered points in python

I am trying to calculate the area of a shape enclosed by a large set of unordered points in python. I have a 2D array of points which I can plot as a scatterplot like this.
There are several ways to calculate the area enclosed by points, but these all assume ordered points, such as here and here. This method calculates the area unordered points, but it doesn't appear to work for complex shapes, as seen here. How would I calculate this area from unordered points in python?
Sample data looks like this:
[[225.93459 -27.25677 ]
[226.98128 -32.001945]
[223.3623 -34.119724]
[225.84741 -34.416553]]
From pen and paper one can see that this shape contains an area of ~12 (unitless) but putting these coordinates into one of the algorithms linked to previously returns an area of ~0.78.
Let's first mention that in the question How would I calculate this area from unordered points in python? used phrase 'unordered points' in the context of calculation of an area usually means that given are points of a contour enclosing an area which area is to calculate.
But in the question provided data sample are not points of a contour but just a cloud of points, which if visualized using a scatterplot results in a visually perceivable area.
The above is the reason why in the question provided links to algorithms calculating areas from 'unordered points' don't apply at all to what the question is about.
In other words, the actual title of the question I will answer below will be:
Calculate the visually perceivable area a cloud of (x,y) points is forming when visualized as a scatterplot
One of the possible options is mentioned in a comment to the question:
Honestly, you might consider taking THAT graph as a bitmap, and counting the number of non-white pixels in it. That is probably as close as you can get. – Tim Roberts
Given the image perfectly covering (without any margin) all the non-white pixels you can calculate the area the image rectangle is covering in units used in the underlying (x,y) data by calculating the area TA of the rectangle visible in the image from the underlying list of points P with (x,y) point coordinates ( P = [(x1,y1), (x2,y2), ...] ) as follows:
X = [x for x,y in P]
Y = [y for x,y in P]
TA = (max(X)-min(X))*(max(Y)-min(Y))
Assuming N_white is the number of all white pixels in the image with N pixels the actual area A covered by non-white pixels expressed in units used in the list of points P will be:
A = TA*(N-N_white)/N
Another approach using a list of points P with (x,y) point coordinates only ( without creation of an image ) consists of following steps:
decide which area Ap a point is covering and calculate half of the size h2 of a rectangle with this area around that point ( h2 = 0.5*sqrt(Ap) )
create a list R with rectangles around all points in the list P: R = [(x-h2, y+h2, x+h2, y-h2) for x,y in P]
use the code provided through a link listed in the stackoverflow question
Area of Union Of Rectangles using Segment Trees to calculate the total area covered by the rectangles in the list R.
The above approach has the advantage over the graphical one obtained from the scatterplot that with the choice of the area covered by a point you directly influence the used precision/resolution/granularity for the area calculation.
Given a 2D array of points the area covered by the points can be calculated with help of the return value of the same hist2d() function provided in the matplotlib module (as matplotlib.pyplot.hist2d()) which is used to show the scatterplot.
The 'trick' is to set the cmin parameter value of the function to 1 ( cmin=1 ) and then calculate the number of numpy.nan values in the by the function returned array setting them in relation to entire amount of array values.
In other words all what is necessary to calculate the area when creating the scatterplot is already there for easy use in a simple area calculation formulas if you know that the histogram creating function provide as return value all what is therefore necessary.
Below code of a ready to use function for the area calculation along with demonstration of function usage:
def area_of_points(points, grid_size = [1000, 1000]):
"""
Returns the area covered by N 2D-points provided in a 'points' array
points = [ (x1,y1), (x2,y2), ... , (xN, yN) ]
'grid_size' gives the number of grid cells in x and y direction the
'points' bounding box is divided into for calculation of the area.
Larger 'grid_size' values mean smaller grid cells, higher precision
of the area calculation and longer runtime.
area_of_points() requires installed matplotlib module. """
import matplotlib.pyplot as plt
import numpy as np
pts_x = [x for x,y in points]
pts_y = [y for x,y in points]
pts_bb_area = (max(pts_x)-min(pts_x))*(max(pts_y)-min(pts_y))
h2D,_,_,_ = plt.hist2d( pts_x, pts_y, bins = grid_size, cmin=1)
numberOfWhiteBins = np.count_nonzero(np.isnan(h2D))
numberOfAll2Dbins = h2D.shape[0]*h2D.shape[1]
areaFactor = 1.0 - numberOfWhiteBins/numberOfAll2Dbins
pts_pts_area = areaFactor * pts_bb_area
print(f'Areas: b-box = {pts_bb_area:8.4f}, points = {pts_pts_area:8.4f}')
plt.show()
return pts_pts_area
#:def area_of_points(points, grid_size = [1000, 1000])
import numpy as np
np.random.seed(12345)
x = np.random.normal(size=100000)
y = x + np.random.normal(size=100000)
pts = [[xi,yi] for xi,yi in zip(x,y)]
print(area_of_points(pts))
# ^-- prints: Areas: b-box = 114.5797, points = 7.8001
# ^-- prints: 7.800126875291629
The above code creates following scatterplot:
Notice that the printed output Areas: b-box = 114.5797, points = 7.8001 and the by the function returned area value 7.800126875291629 give the area in units in which the x,y coordinates in the array of points are specified.
Instead of usage of a function when utilizing the know how you can play around with the parameter of the scatterplot calculating the area of what can be seen in the scatterplot.
Below code which changes the displayed scatterplot using the same underlying point data:
import numpy as np
np.random.seed(12345)
x = np.random.normal(size=100000)
y = x + np.random.normal(size=100000)
pts = [[xi,yi] for xi,yi in zip(x,y)]
pts_values_example = \
[[0.53005, 2.79209],
[0.73751, 0.18978],
... ,
[-0.6633, -2.0404],
[1.51470, 0.86644]]
# ---
pts_x = [x for x,y in pts]
pts_y = [y for x,y in pts]
pts_bb_area = (max(pts_x)-min(pts_x))*(max(pts_y)-min(pts_y))
# ---
import matplotlib.pyplot as plt
bins = [320, 300] # resolution of the grid (for the scatter plot)
# ^-- resolution of precision for the calculation of area
pltRetVal = plt.hist2d( pts_x, pts_y, bins = bins, cmin=1, cmax=15 )
plt.colorbar() # display the colorbar (for a 2d density histogram)
plt.show()
# ---
h2D, xedges1D, yedges1D, h2DhistogramObject = pltRetVal
numberOfWhiteBins = np.count_nonzero(np.isnan(h2D))
numberOfAll2Dbins = (len(xedges1D)-1)*(len(yedges1D)-1)
areaFactor = 1.0 - numberOfWhiteBins/numberOfAll2Dbins
area = areaFactor * pts_bb_area
print(f'Areas: b-box = {pts_bb_area:8.4f}, points = {area:8.4f}')
# prints "Areas: b-box = 114.5797, points = 20.7174"
creating following scatterplot:
Notice that the calculated area is now larger due to smaller values used for grid resolution resulting in more of the area colored.

How do I triangulate a polygon with holes without extra points

I am trying to triangulate a number of polygons such that the triangles do not add extra points. For the sake of keeping the question short I will be using 2 circles in each other, in reality these will be opencv contours, however the translation between the two is quite complex and the circles also show the problem.
So I have the following code (based on the example) in order to first get the circles and then triangulate them with the triangle project
import matplotlib.pyplot as plt
import numpy as np
import triangle as tr
def circle(N, R):
i = np.arange(N)
theta = i * 2 * np.pi / N
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1) * R
seg = np.stack([i, i + 1], axis=1) % N
return pts, seg
pts0, seg0 = circle(30, 1.4)
pts1, seg1 = circle(16, 0.6)
pts = np.vstack([pts0, pts1])
seg = np.vstack([seg0, seg1 + seg0.shape[0]])
print(pts)
print(seg)
A = dict(vertices=pts, segments=seg, holes=[[0, 0]])
print(seg)
B = tr.triangulate(A) #note that the origin uses 'qpa0.05' here
tr.compare(plt, A, B)
plt.show()
Now this causes both the outer and inner circles to get triangulated like show here , clearly ignoring the hole. However by setting the 'qpa0.05' flag we can cause the circle to use the hole as seen here . However doing this causes the triangles to be split, adding many different triangles, increasing the qpa to a higher value does cause the number of triangles to be somewhat decreased, however they remain there.
Note that I want to be able to handle multiple holes in the same shape, and that shapes might end up being concave.
Anybody know how to get the triangulation to use the holes without adding extra triangles?
You can connect the hole (or holes) to the exterior perimeter so that you get a single "degenerate polygon" defined by a single point sequence that connects all points without self-intersection.
You go in and out through the same segment. If you follow the outer perimeter clockwise, you need to follow the hole perimeter counterclockwise or vice-versa. Otherwise, it would self-intersect.
I figured it out
the "qpa0.05" should have been a 'p', the p makes the code factor in holes and the a sets the maximum area for the triangles, this causes the extra points to be added.
B = tr.triangulate(A,'p')

Determining what points and polygons are in a grid square

I am using python and geojson to do this, I want to specify a point and that point will be the center of a square, assuming the square is 1 mile by one mile I want to list all the points and polys found in the square, including polys bigger than the square.
I have multiple geojson files so will need to do the check a few times which is fine. I have been playing with the code below which checks to see if the cell center is near the centre of the square but will have issues for oddly shaped polygons. I really want to know all items / features that are found in the square.
import json
from shapely.geometry import shape, Point
from shapely.geometry import asShape, mapping
point = Point(14.9783266342289, 16.87265432621112)
max_distance_from_center = 1
with open('cells.geojson') as f:
js = json.load(f)
for feature in js['features']:
polygon = asShape(feature['geometry'])
distance = point.distance(polygon.centroid)
# print(f'{distance} - {polygon.centroid}')
if distance < max_distance_from_center:
print (f'Found cells containing polygon:{feature}')
For source data I was using a exported map from https://azgaar.github.io/Fantasy-Map-Generator/ the grid should be 10 miles by 10 miles. Suggestions on how to do this?
Update:
Here is a poorly drawn diagram. Within the grid square I want to identify all markers and polygons that fall within the bounds of the square even if they go out side of it. I want to have a list of all features that have some presence in the grid square. I highlighted the areas in yellow.
Poorly draw image
I looked at intersects and it may do it. Will try tonight.
you can try this:
First, create grid.
from shapely.geometry import Point
from matplotlib.pyplot as plt
point = Point(0, -10)
square = point.buffer(0.5).envelope
fig, ax = plt.subplots(figsize=(5,5))
gpd.GeoSeries(square).plot(ax=ax)
gpd.GeoSeries(point).plot(ax=ax, color = "black",markersize=30)
plt.grid()
plt.show()
and then,
import geopandas as gpd
# get geodataframe from geojson file
geo_df = gpd.GeoDataFrame.from_file('cells.geojson')
geo_df['grid_yn'] = geo_df['geometry'].apply(lambda x : x.intersects(square))

Calculate overlap efficiently

In my code I would like to calculate the overlap between multiple polygons and another polygon. As an example, I made the following three polygons ('tracks') and a square polygon. I need to calculate the overlay of the tracks with the square.
First, the polygons:
from shapely.ops import cascaded_union
from shapely import geometry
import matplotlib.pyplot as plt
from descartes import PolygonPatch
import cartopy.crs as ccrs
#Set up polygons
square = geometry.Polygon([(-3,-3),(3,-3),(3,3),(-3,3),(-3,-3)]) #blue square
track1 = geometry.Polygon([(-6,-0.5),(6,-0.5),(6,0.5),(-6,0.5),(-6,-0.5)])
#green track 1
track2 = geometry.Polygon([(-0.5,-6),(-0.5,6),(-2,6),(-2,-6),(-0.5,-6)]) #red
track 2
track3 = geometry.Polygon([(-8.5,-3),(-7.5,-3),(8.5,4),(7.5,4),(-8.5,-3)])
#yellow track 3
Which looks like this when plotted:
Now, to get the overlay, I first combine the three tracks using cascaded_union into an polygon. After that, I get the intersect polygon:
casc = cascaded_union([track1,track2,track3]) #ISSUE 1
intersect = square.intersection(casc) #ISSUE 2
Then, to get the overlap, I use:
ratio = intersect.area/square.area*100.0
print(ratio)
Which yields a value of 41.98598710317461. So the overlap is approximately 42%.
The issue is that I have to calculate the overlap hundreds of times. The geometries only have to be made once, but I know (from using time.time()) the two lines (indicated by ISSUE 1 and ISSUE 2) take up the major part of the running time. Is there any faster way to calculate the overlap?

speedup geolocation algorithm in python

I have a set 100k of of geo locations (lat/lon) and a hexogonal grid (4k polygons). My goal is to calculate the total number of points which are located within each polygon.
My current algorithm uses 2 for loops to loop over all geo points and all polygons, which is really slow if I increase the number of polygons... How would you speedup the algorithm? I have uploaded a minimal example which creates 100k random geo points and uses 561 cells in the grid...
I also saw that reading the geo json file (with 4k polygons) takes some time, maybe i should export the polygons into a csv?
hexagon_grid.geojson file:
https://gist.github.com/Arnold1/9e41454e6eea910a4f6cd68ff1901db1
minimal python example:
https://gist.github.com/Arnold1/ee37a2e4b2dfbfdca9bfae7c7c3a3755
You don't need to explicitly test each hexagon to see whether a given point is located inside it.
Let's assume, for the moment, that all of your points fall somewhere within the bounds of your hexagonal grid. Because your hexagons form a regular lattice, you only really need to know which of the hexagon centers is closest to each point.
This can be computed very efficiently using a scipy.spatial.cKDTree:
import numpy as np
from scipy.spatial import cKDTree
import json
with open('/tmp/grid.geojson', 'r') as f:
data = json.load(f)
verts = []
centroids = []
for hexagon in data['features']:
# a (7, 2) array of xy coordinates specifying the vertices of the hexagon.
# we ignore the last vertex since it's equal to the first
xy = np.array(hexagon['geometry']['coordinates'][0][:6])
verts.append(xy)
# compute the centroid by taking the average of the vertex coordinates
centroids.append(xy.mean(0))
verts = np.array(verts)
centroids = np.array(centroids)
# construct a k-D tree from the centroid coordinates of the hexagons
tree = cKDTree(centroids)
# generate 10000 normally distributed xy coordinates
sigma = 0.5 * centroids.std(0, keepdims=True)
mu = centroids.mean(0, keepdims=True)
gen = np.random.RandomState(0)
xy = (gen.randn(10000, 2) * sigma) + mu
# query the k-D tree to find which hexagon centroid is nearest to each point
distance, idx = tree.query(xy, 1)
# count the number of points that are closest to each hexagon centroid
counts = np.bincount(idx, minlength=centroids.shape[0])
Plotting the output:
from matplotlib import pyplot as plt
fig, ax = plt.subplots(1, 1, subplot_kw={'aspect': 'equal'})
ax.hold(True)
ax.scatter(xy[:, 0], xy[:, 1], 10, c='b', alpha=0.25, edgecolors='none')
ax.scatter(centroids[:, 0], centroids[:, 1], marker='h', s=(counts + 5),
c=counts, cmap='Reds')
ax.margins(0.01)
I can think of several different ways you could handle points that fall outside your grid depending on how much accuracy you need:
You could exclude points that fall outside the outer bounding rectangle of your hexagon vertices (i.e. x < xmin, x > xmax etc.). However, this will fail to exclude points that fall within the 'gaps' along the edges of your grid.
Another straightforward option would be to set a cut-off on distance according to the spacing of your hexagon centers, which is equivalent to using a circular approximation for your outer hexagons.
If accuracy is crucial then you could define a matplotlib.path.Path corresponding to the outer vertices of your hexagonal grid, then use its .contains_points() method to test whether your points are contained within it. Compared to the other two methods, this would probably be slower and more fiddly to code.

Categories