Determining what points and polygons are in a grid square - python

I am using Python and GeoJSON. I want to specify a point that will be the center of a square and, assuming the square is 1 mile by 1 mile, list all the points and polygons found in the square, including polygons bigger than the square.
I have multiple GeoJSON files, so I will need to run the check a few times, which is fine. I have been playing with the code below, which checks whether each cell's centroid is near the center of the square, but it will have issues with oddly shaped polygons. I really want to know all items/features that are found in the square.
import json
from shapely.geometry import shape, Point

point = Point(14.9783266342289, 16.87265432621112)
max_distance_from_center = 1

with open('cells.geojson') as f:
    js = json.load(f)

for feature in js['features']:
    # asShape was removed in Shapely 2.0; shape() builds a geometry from a GeoJSON-like dict
    polygon = shape(feature['geometry'])
    distance = point.distance(polygon.centroid)
    # print(f'{distance} - {polygon.centroid}')
    if distance < max_distance_from_center:
        print(f'Found cells containing polygon: {feature}')
For source data I am using a map exported from https://azgaar.github.io/Fantasy-Map-Generator/; the grid should be 10 miles by 10 miles. Any suggestions on how to do this?
Update:
Here is a poorly drawn diagram. Within the grid square I want to identify all markers and polygons that fall within the bounds of the square, even if they extend outside of it. I want a list of all features that have some presence in the grid square. I highlighted the areas in yellow.
[Poorly drawn image]
I looked at intersects() and it may do it. I will try it tonight.

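You can try this:
First, create the grid square. Buffering the point by 0.5 and taking the envelope (bounding box) of the result gives a square of side 1, in coordinate units, centred on the point.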
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point
point = Point(0, -10)
square = point.buffer(0.5).envelope
fig, ax = plt.subplots(figsize=(5, 5))
gpd.GeoSeries(square).plot(ax=ax)
gpd.GeoSeries(point).plot(ax=ax, color="black", markersize=30)
plt.grid()
plt.show()
Then flag the features that intersect it:
import geopandas as gpd
# Read the GeoJSON file into a GeoDataFrame
geo_df = gpd.read_file('cells.geojson')
# True for every feature with some presence in the square, including polygons larger than it
geo_df['grid_yn'] = geo_df.intersects(square)

Related

Overlapping LineString and Polygon returning false in Shapely

I am using shapely to find if a LineString and Polygon overlap.
I have defined the polygon from the map area in which I need to check for overlaps:
from shapely.geometry import LineString, Polygon
polygon_map = Polygon([(-126.03561599522513,60.08405276856493),(-65.91842849522513,60.08405276856493),(-65.91842849522513,76.84958092750016),(-126.03561599522513,76.84958092750016),(-126.03561599522513,60.08405276856493)])
For the LineString, I have a long list of coordinates in two columns, x and y. I have taken the maximum and minimum coordinates of x and y to generate a LineString.
line = LineString([(32.823101,87.988993),(-153.01468,30.001368)])
When I plot these on a map, they overlap (as expected):
import folium
m = folium.Map([61.08405, -66.918], zoom_start=3, tiles='cartodbpositron')
folium.GeoJson(line).add_to(m)
folium.GeoJson(polygon_map).add_to(m)
folium.LatLngPopup().add_to(m)
m  # displays the map in a Jupyter notebook
[Image of map created showing intersecting polygon and linestring]
However, when I do:
line.overlaps(polygon_map)
It returns false, and I can't work out why.
I have simplified the LineString to only include the minimum and maximum coordinates as I have hundreds of coordinates in my original dataframe and I'm worried it will take too long to loop through each set of coordinates. I haven't used Shapely before so I'm not sure if this is why it isn't working.
This all comes down to geographic projections. Shapely has no knowledge of geographic projections; it is pure Cartesian geometry, and as Cartesian geometric objects (without accounting for the curvature of the earth) this polygon and LineString do not overlap (see the image below).
Only after the CRS is set does the folium map drawn by explore() show these geometries overlapping.
import geopandas as gpd
from shapely.geometry import Polygon, LineString
polygon_map = Polygon([(-126.03561599522513,60.08405276856493),(-65.91842849522513,60.08405276856493),(-65.91842849522513,76.84958092750016),(-126.03561599522513,76.84958092750016),(-126.03561599522513,60.08405276856493)]) # fmt: skip
line = LineString([(32.823101, 87.988993), (-153.01468, 30.001368)])
gdf = gpd.GeoDataFrame(geometry=[polygon_map, line])
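# No CRS: the geometries are treated as plain Cartesian shapes
gdf.explore(height=300, width=300)
# With a geographic CRS (WGS84 lon/lat) they are drawn on a web map and appear to overlap
# gdf.set_crs("epsg:4326").explore(height=300, width=300)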
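Separately from the projection issue, Shapely's overlaps() is only True for two geometries of the same dimension, so a LineString tested against a Polygon returns False even when it runs straight through it; intersects() or crosses() is the predicate to use here. A small sketch with the question's geometries plus a made-up line, `crossing`, that does pass through the box:

```python
from shapely.geometry import LineString, Polygon

# The question's geometries (Shapely closes the ring automatically)
polygon_map = Polygon([(-126.03561599522513, 60.08405276856493),
                       (-65.91842849522513, 60.08405276856493),
                       (-65.91842849522513, 76.84958092750016),
                       (-126.03561599522513, 76.84958092750016)])
line = LineString([(32.823101, 87.988993), (-153.01468, 30.001368)])

# Cartesian check: across the box's x-range the line stays below y = 60
print(line.intersects(polygon_map))  # False

# A made-up line that runs straight through the box
crossing = LineString([(-150, 65), (-40, 70)])
print(crossing.intersects(polygon_map))  # True
print(crossing.crosses(polygon_map))     # True
print(crossing.overlaps(polygon_map))    # False: line and polygon differ in dimension
```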

Problem with shapely polygon contains: it seems not to correctly flag all contained points

Let's say we have a 100x100 grid that contains a polygon.
Now, if we color all possible (x, y) points (x and y are integers) that are contained in the polygon, we should expect the polygon to be more or less painted/filled.
But the image I'm getting never properly falls within and fills the polygon! Is this a limitation of shapely, or am I doing something wrong?
(Please note I need this to work for other purposes, not just painting a polygon.)
[Image: polygon and filled area not overlapping]
import numpy as np
import matplotlib.pyplot as plt
import shapely.geometry
points = np.random.randint(0,100, (10,2)) # 10 random points
poly = shapely.geometry.MultiPoint(points).convex_hull.buffer(1) # a polygon
grid_points = [ shapely.geometry.Point(x,y) for x in range(100) for y in range(100)]
in_poly = np.array([poly.contains(point) for point in grid_points])
#plot
plt.imshow(in_poly.reshape(100,100), origin='lower')
plt.plot(*poly.exterior.xy)
This seems to do what you want - replace this one line (swap y and x in for loops):
grid_points = [ shapely.geometry.Point(x,y) for y in range(100) for x in range(100)]
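A couple of notes:
My installation of shapely uses this module name (if geometry is spelled differently in your code, you may need to change it in the line above):
import shapely.geometry
And thanks for adding the second plot command; that helped a bunch.
The mismatch comes from row-major vs. column-major ordering: imshow treats the first array index as the row (y), but the original loops vary y fastest, so reshape(100, 100) puts x on the rows and the mask comes out transposed. Swapping the loops makes x vary fastest, which matches imshow's layout.
Alternatively, you could compensate by doing the inverse on the exterior plot (swapping x and y there).
[Images: original (with a new random shape), updated, updated with exterior]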
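A vectorised sketch that sidesteps the loop-ordering pitfall altogether, assuming Shapely 2.0 or later (for shapely.contains_xy) and a fixed made-up triangle instead of the random polygon. np.meshgrid with its default indexing="xy" already returns arrays shaped (rows = y, columns = x), so the mask can go straight into imshow without reshaping:

```python
import numpy as np
import shapely
from shapely.geometry import Polygon

# Made-up, deliberately asymmetric triangle so a transposed mask would be obvious
poly = Polygon([(10, 10), (80, 20), (30, 70)])

# Default indexing="xy": xs[row, col] == col and ys[row, col] == row,
# i.e. the row index is y and the column index is x, as imshow expects
xs, ys = np.meshgrid(np.arange(100), np.arange(100))

# One vectorised point-in-polygon test over the whole grid (Shapely >= 2.0)
in_poly = shapely.contains_xy(poly, xs, ys)

print(in_poly.shape)                     # (100, 100)
print(in_poly[25, 70], in_poly[70, 25])  # True False  (point (70, 25) is inside, (25, 70) is not)
```

With this layout, plt.imshow(in_poly, origin='lower') lines up with plt.plot(*poly.exterior.xy) directly.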

Calculate overlap efficiently

In my code I would like to calculate the overlap between multiple polygons and another polygon. As an example, I made the following three polygons ('tracks') and a square polygon. I need to calculate the overlay of the tracks with the square.
First, the polygons:
from shapely.ops import cascaded_union
from shapely import geometry
import matplotlib.pyplot as plt
from descartes import PolygonPatch
import cartopy.crs as ccrs
#Set up polygons
square = geometry.Polygon([(-3,-3),(3,-3),(3,3),(-3,3),(-3,-3)])  # blue square
track1 = geometry.Polygon([(-6,-0.5),(6,-0.5),(6,0.5),(-6,0.5),(-6,-0.5)])  # green track 1
track2 = geometry.Polygon([(-0.5,-6),(-0.5,6),(-2,6),(-2,-6),(-0.5,-6)])  # red track 2
track3 = geometry.Polygon([(-8.5,-3),(-7.5,-3),(8.5,4),(7.5,4),(-8.5,-3)])  # yellow track 3
Which looks like this when plotted:
Now, to get the overlay, I first combine the three tracks into a single polygon using cascaded_union. After that, I get the intersection polygon:
casc = cascaded_union([track1,track2,track3]) #ISSUE 1
intersect = square.intersection(casc) #ISSUE 2
Then, to get the overlap, I use:
ratio = intersect.area/square.area*100.0
print(ratio)
Which yields a value of 41.98598710317461. So the overlap is approximately 42%.
The issue is that I have to calculate the overlap hundreds of times. The geometries only have to be made once, but I know (from using time.time()) the two lines (indicated by ISSUE 1 and ISSUE 2) take up the major part of the running time. Is there any faster way to calculate the overlap?
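Two things that may help. First, cascaded_union was deprecated in Shapely 1.8 and removed in 2.0 in favour of unary_union. Second, because the square here is axis-aligned, each track can be clipped to it with clip_by_rect before the union, so the union only works on the small pieces that lie inside the square; since (A ∪ B) ∩ S = (A ∩ S) ∪ (B ∩ S), the area is unchanged. This is an unbenchmarked sketch, so time it on your real data; also note that clip_by_rect favours speed over robustness and its output is not guaranteed to be valid for complex inputs. `overlap_percent` is a hypothetical helper name:

```python
from shapely.geometry import Polygon
from shapely.ops import clip_by_rect, unary_union

square = Polygon([(-3, -3), (3, -3), (3, 3), (-3, 3)])
tracks = [
    Polygon([(-6, -0.5), (6, -0.5), (6, 0.5), (-6, 0.5)]),   # track 1
    Polygon([(-0.5, -6), (-0.5, 6), (-2, 6), (-2, -6)]),     # track 2
    Polygon([(-8.5, -3), (-7.5, -3), (8.5, 4), (7.5, 4)]),   # track 3
]


def overlap_percent(square, tracks):
    """Percentage of `square` covered by the union of `tracks`.

    Assumes `square` is an axis-aligned rectangle, so its bounds describe it exactly.
    """
    xmin, ymin, xmax, ymax = square.bounds
    # Clip each track to the rectangle first (cheap for axis-aligned boxes)
    pieces = [clip_by_rect(t, xmin, ymin, xmax, ymax) for t in tracks]
    # Union of the clipped pieces == (union of tracks) intersected with the square
    covered = unary_union(pieces)
    return covered.area / square.area * 100.0


print(overlap_percent(square, tracks))  # about 41.986, matching the question
```

If the square is reused many times while the tracks stay fixed, computing unary_union(tracks) once up front and intersecting that with each square is another option worth timing.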

Calculating Centroid of outline of XY scatter

I'm working on a project to calculate the centroid of a state/country using python.
What I have done so far:
Take an outline of the state and run it through ImageJ to create a csv of the x,y coordinates of the border. This gives me a .csv file with data like this:
556,243
557,243
557,250
556,250
556,252
555,252
555,253
554,253
etc, etc,
For about 2500 data points.
Import this list into a Python script.
Calculate the average of the x and y coordinate arrays. This point is the centroid. (Idea similar to this)
Plot the points and the centroid using matplotlib.
Here is my code:
#####################################################
# Imports #
#####################################################
import csv
import matplotlib.pyplot as plt
import numpy as np
#####################################################
# Setup #
#####################################################
# Set empty lists for coordinates
x, y = [], []
# Import csv data; the with block closes the file automatically
with open("russiadata.csv", "r") as russiadataFile:
    russiadataReader = csv.reader(russiadataFile)
    for row in russiadataReader:
        # Skip empty rows (length zero)
        if len(row) != 0:
            # Append data to the lists created above
            x.append(float(row[0]))
            y.append(float(row[1]))
#####################################################
# Data Analysis #
#####################################################
# Convert lists to arrays for computations
x = np.array(x)
y = np.array(y)
# Number of data points
x_len = len(x)
y_len = len(y)
# Sum of the coordinates
x_sum = np.sum(x)
y_sum = np.sum(y)
# Centroid of the points (the mean of all vertices)
x_centroid = x_sum / x_len
y_centroid = y_sum / y_len
#####################################################
# Plotting #
#####################################################
# Plot all points in data
plt.xkcd()
plt.plot(x, y, "-.")
# Plot the centroid
plt.plot(x_centroid, y_centroid, '^')
# Add axis labels
plt.xlabel("X")
plt.ylabel("Y")
plt.title("russia")
# Show the plot
plt.show()
The problem I have run into is that some sides of the state have more points than others, so the centroid is being weighted towards areas with more points. This is not what I want. I'm trying to find the centroid of the polygon that has vertices from the x,y coordinates.
This is what my plot looks like:
https://imgur.com/a/ZdukA
As you can see, the centroid is weighted more towards the section of points with more density. (As a side note, yes, that is Russia. I'm having issues with the plot coming out backwards and stretched/squashed.)
In other words, is there a more accurate way to get the centroid?
Thanks in advance for any help.
It sounds to me like you don't want your centroid to be calculated with the density of the scatter in mind.
If you just want to use surface area, then I would eliminate any point that is contained within the current outline of the scatter. A slightly more accurate way might be to pretend there is a box outlined by your outer-most points, then to check the x- and y-coordinates of all of your points and eliminate any that fall inside of the box. Any points that fall inside the current outline are not contributing to the shape, only the density.
I think the most technical and accurate approach would be very complicated, and here's what I think it would require: to get the outer-most points to connect based on least distance from each other, and furthest distance from all other points. By "connect" I mean to pretend that a line passes through, and ends at, both points. It should be defined mathematically.
Then, for each point, calculate whether or not it falls inside or outside of this outline, and eliminate all that fall inside (they are redundant as they are already inside the shape).
You can find the correct formula for a closed polygon on Wikipedia: https://en.wikipedia.org/wiki/Centroid#Centroid_of_a_polygon
Another formula is helpful to deal with Kaliningrad oblast (exclave) and islands (if you want to be really precise): https://en.wikipedia.org/wiki/Centroid#By_geometric_decomposition
That said, such questions probably fit better to https://math.stackexchange.com
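The polygon formula from the Wikipedia link above takes only a few lines of NumPy. This is a minimal sketch, assuming the CSV points are in boundary order (as an ImageJ outline trace appears to be) and describe a single simple polygon; `polygon_centroid` is a hypothetical helper name and the square is made-up test data. Because the formula weights by area rather than by vertex, extra points crowded along one edge do not pull the centroid. For an exclave or islands, the decomposition link applies: compute each part's signed area A_i and centroid C_i and combine them as ΣA_i·C_i / ΣA_i.

```python
import numpy as np


def polygon_centroid(x, y):
    """Area centroid of a simple polygon whose vertices are given in order.

    Works for clockwise or counter-clockwise order; a repeated closing
    vertex is harmless because it contributes a zero-length edge.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pair each vertex with the next one, wrapping the last back to the first
    x_next = np.roll(x, -1)
    y_next = np.roll(y, -1)
    cross = x * y_next - x_next * y
    area = cross.sum() / 2.0  # signed area (shoelace formula)
    cx = ((x + x_next) * cross).sum() / (6.0 * area)
    cy = ((y + y_next) * cross).sum() / (6.0 * area)
    return cx, cy


# Made-up 10 x 10 square with eleven points crowded along its bottom edge
xs = list(range(11)) + [10, 0]
ys = [0] * 11 + [10, 10]
print(float(np.mean(ys)))  # the vertex mean is pulled towards the crowded edge
cx, cy = polygon_centroid(xs, ys)
print(float(cx), float(cy))  # 5.0 5.0
```

On the question's data this would replace the mean with polygon_centroid(x, y). As for the backwards, squashed plot: ImageJ's image coordinates have their origin at the top-left with y increasing downwards, so calling plt.gca().invert_yaxis() and plt.gca().set_aspect('equal') before plt.show() should make the outline look right.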

Extract coordinates enclosed by a matplotlib patch.

I have created an ellipse using matplotlib.patches.Ellipse as shown below:
import matplotlib.patches as mpatches
patch = mpatches.Ellipse(center, major_ax, minor_ax, angle=angle_deg, fc='none', ls='solid', ec='g', lw=3)
What I want is a list of all the integer coordinates enclosed inside this patch.
I.e., if I were to plot this ellipse along with every integer point on the same grid, how many of those points would be enclosed in the ellipse?
I have tried to extract the equation of the ellipse so I can loop through each point and check whether it falls inside, but I can't find an obvious way to do this, and it becomes more complicated because the major axis of the ellipse can be oriented at any angle. The information to do this must be stored in the patch somewhere, but I can't find it.
Any advice on this would be much appreciated.
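Ellipse objects have a contains_point method, which returns True if the point is inside the ellipse and False otherwise. The point is interpreted in the patch's transformed coordinates, which are data coordinates only while the ellipse has not yet been added to an Axes, so do the test before add_artist.
Stealing from @DrV's answer: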
import matplotlib.pyplot as plt
import matplotlib.patches
import numpy as np
# create an ellipse
el = matplotlib.patches.Ellipse((50,-23), 10, 13.7, angle=30, facecolor=(1,0,0,.2), edgecolor='none')
# calculate the x and y points possibly within the ellipse
y_int = np.arange(-30, -15)
x_int = np.arange(40, 60)
# create a list of possible coordinates
g = np.meshgrid(x_int, y_int)
coords = list(zip(*(c.flat for c in g)))
# create the list of valid coordinates (from untransformed)
ellipsepoints = np.vstack([p for p in coords if el.contains_point(p, radius=0)])
# just to see if this works
fig = plt.figure()
ax = fig.add_subplot(111)
ax.add_artist(el)
ep = np.array(ellipsepoints)
ax.plot(ellipsepoints[:,0], ellipsepoints[:,1], 'ko')
plt.show()
This will give you the result as below:
If you really want to use the methods offered by matplotlib, then:
import matplotlib.pyplot as plt
import matplotlib.patches
import numpy as np
# create an ellipse
el = matplotlib.patches.Ellipse((50,-23), 10, 13.7, angle=30, facecolor=(1,0,0,.2), edgecolor='none')
# find the bounding box of the ellipse
bb = el.get_window_extent()
# calculate the x and y points possibly within the ellipse
x_int = np.arange(np.ceil(bb.x0), np.floor(bb.x1) + 1, dtype='int')
y_int = np.arange(np.ceil(bb.y0), np.floor(bb.y1) + 1, dtype='int')
# create a list of possible coordinates
g = np.meshgrid(x_int, y_int)
coords = np.array(list(zip(*(c.flat for c in g))))  # list() is needed on Python 3
# create a list of transformed points (transformed so that the ellipse is a unit circle)
transcoords = el.get_transform().inverted().transform(coords)
# find the transformed coordinates which are within a unit circle
validcoords = transcoords[:,0]**2 + transcoords[:,1]**2 < 1.0
# create the list of valid coordinates (from untransformed)
ellipsepoints = coords[validcoords]
# just to see if this works
fig = plt.figure()
ax = fig.add_subplot(111)
ax.add_artist(el)
ep = np.array(ellipsepoints)
ax.plot(ellipsepoints[:,0], ellipsepoints[:,1], 'ko')
plt.show()
Seems to work:
(Zooming in reveals that even the points hanging on the edge are inside.)
The point here is that matplotlib handles ellipses as transformed circles (translate, rotate, scale, anything affine). If the transform is applied in reverse, the result is a unit circle at origin, and it is very simple to check if a point is within that.
Just a word of warning: get_window_extent may not be extremely reliable, as it seems to use the spline approximation of a circle. Also, see tacaswell's comment on the renderer dependency.
In order to find a more reliable bounding box, you may:
create a horizontal and a vertical vector in plot coordinates (their position is not important; ([0,0],[1,0]) and ([0,0],[0,1]) will do)
transform these vectors into ellipse coordinates (with get_transform, etc.)
find, in the ellipse coordinate system (i.e. the system where the ellipse is a unit circle around the origin), the four tangents of the circle that are parallel to these two vectors
find the intersection points of the tangents (there are 4, but 2 diagonally opposite ones are enough)
transform the intersection points back to plot coordinates
This will give an accurate (but of course limited by numerical precision) axis-aligned bounding box.
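The same bounding box also has a closed form, which may be simpler than constructing tangents. For semi-axes a = width/2, b = height/2 and rotation θ, the half-extents are sqrt(a²cos²θ + b²sin²θ) horizontally and sqrt(a²sin²θ + b²cos²θ) vertically (the maxima of the rotated ellipse's parametric x(s) and y(s)). A sketch; `ellipse_bbox` is a hypothetical helper name, with arguments following matplotlib.patches.Ellipse (full axis lengths, angle in degrees counter-clockwise):

```python
import numpy as np


def ellipse_bbox(center, width, height, angle_deg):
    """Exact axis-aligned bounding box (x0, y0, x1, y1) of a rotated ellipse."""
    a, b = width / 2.0, height / 2.0
    t = np.deg2rad(angle_deg)
    # x(s) = cx + a cos(s) cos(t) - b sin(s) sin(t); its maximum over s is
    # hypot(a cos t, b sin t). The same argument gives the y half-extent.
    hx = np.hypot(a * np.cos(t), b * np.sin(t))
    hy = np.hypot(a * np.sin(t), b * np.cos(t))
    cx, cy = center
    return cx - hx, cy - hy, cx + hx, cy + hy


# Integer candidates for the ellipse used above: only these need testing
x0, y0, x1, y1 = ellipse_bbox((50, -23), 10, 13.7, 30)
x_int = np.arange(np.ceil(x0), np.floor(x1) + 1, dtype=int)
y_int = np.arange(np.ceil(y0), np.floor(y1) + 1, dtype=int)
```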
However, you may use a simple approximation:
all possible points are within a circle whose center is the same as that of the ellipse and whose diameter is the same as that of the major axis of the ellipse
In other words, all possible points are within a square bounding box spanning x0 ± m/2, y0 ± m/2, where (x0, y0) is the center of the ellipse and m is the major axis.
I'd like to offer another solution that uses the Path object's contains_points() method instead of contains_point():
First get the coordinates of the ellipse and make it into a Path object:
elpath=Path(el.get_verts())
(Note that el.get_path() won't work directly here: it returns the untransformed unit-circle path rather than the ellipse in data coordinates.)
Then call the path's contains_points():
validcoords=elpath.contains_points(coords)
Below I'm comparing @tacaswell's solution (method 1), @DrV's (method 2) and my own (method 3) (I've enlarged the ellipse about 5 times):
import time
import numpy
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from matplotlib.path import Path
#----------------Create an ellipse----------------
el = Ellipse((50, -23), 50, 70, angle=30, facecolor=(1, 0, 0, .2), edgecolor='none')
#---------------------Method 1---------------------
t1 = time.time()
for ii in range(50):
    y = numpy.arange(-100, 50)
    x = numpy.arange(-30, 130)
    g = numpy.meshgrid(x, y)
    # list() is needed on Python 3, where zip returns an iterator
    coords = numpy.array(list(zip(*(c.flat for c in g))))
    ellipsepoints = numpy.vstack([p for p in coords if el.contains_point(p, radius=0)])
t2 = time.time()
print('time of method 1', t2 - t1)
#---------------------Method 2---------------------
t2 = time.time()
for ii in range(50):
    y = numpy.arange(-100, 50)
    x = numpy.arange(-30, 130)
    g = numpy.meshgrid(x, y)
    coords = numpy.array(list(zip(*(c.flat for c in g))))
    # map the points into the frame where the ellipse is a unit circle
    invtrans = el.get_transform().inverted()
    transcoords = invtrans.transform(coords)
    validcoords = transcoords[:, 0]**2 + transcoords[:, 1]**2 <= 1.0
    ellipsepoints = coords[validcoords]
t3 = time.time()
print('time of method 2', t3 - t2)
#---------------------Method 3---------------------
t3 = time.time()
for ii in range(50):
    y = numpy.arange(-100, 50)
    x = numpy.arange(-30, 130)
    g = numpy.meshgrid(x, y)
    coords = numpy.array(list(zip(*(c.flat for c in g))))
    #------Create a path from ellipse's vertices------
    elpath = Path(el.get_verts())
    # call contains_points()
    validcoords = elpath.contains_points(coords)
    ellipsepoints = coords[validcoords]
t4 = time.time()
print('time of method 3', t4 - t3)
#---------------------Plot it ---------------------
fig, ax = plt.subplots()
ax.add_artist(el)
ax.plot(ellipsepoints[:, 0], ellipsepoints[:, 1], 'ko')
plt.show(block=False)
I got these execution time:
time of method 1 62.2502269745
time of method 2 0.488734006882
time of method 3 0.588987112045
So the contains_point() approach is much slower. The coordinate-transformation method is slightly faster than mine, but the Path method also works for irregularly shaped contours/polygons, where the transformation trick does not apply.
Finally, the resulting plot:
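A further variant of method 3, assuming a reasonably recent Matplotlib: instead of building a polygon from get_verts(), pass the patch's own transform to Path.contains_points, which maps the (Bézier) unit-circle path into data coordinates and tests all points in one vectorised call. As with the other methods, this only holds while the ellipse has not been added to an Axes; afterwards get_transform() maps to display coordinates. It has not been timed against the numbers above:

```python
import numpy as np
from matplotlib.patches import Ellipse

# Same ellipse as in the comparison; angle passed by keyword
el = Ellipse((50, -23), 50, 70, angle=30, facecolor=(1, 0, 0, .2), edgecolor='none')

# Integer grid as an (N, 2) float array of (x, y) pairs
gx, gy = np.meshgrid(np.arange(-30, 130), np.arange(-100, 50))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)

# get_path() is the unit circle in patch coordinates; transform= maps it
# into data coordinates before the point-in-path test
inside = el.get_path().contains_points(coords, transform=el.get_transform())
ellipsepoints = coords[inside]
```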
