ERROR:
TypeError: 'LineString' object is not iterable
I am trying to find the bottom-right corner of the polygon in this .shp file. This .shp file contains a square, but other files may contain a rectangle or triangle.
I want to use GeoPandas, and apparently I can load the file using the read_file() method. I am quite new to shapefiles; I have the .shx and .dbf files as well, but when I pass just the .shp to this method I am not able to loop through the polygon coordinates.
Here is my code. I want to capture the bottom-right corner in a variable; currently all_coords captures all of the coordinates, so I need to get past this error and then extract the bottom-right corner.
import geopandas as gpd
import numpy as np
shapepath = r"FieldAlyticsCanada_WesternSales_AlexOlson_CouttsAgro2022_CouttsAgro_7-29-23.shp"
df = gpd.read_file(shapepath)
g = [i for i in df.geometry]
all_coords = []
for b in g[0].boundary: # error happens here
    coords = np.dstack(b.coords.xy).tolist()
    all_coords.append(*coords)
print(all_coords)
You can use much simpler concepts:
bounds provides a tuple (minx, miny, maxx, maxy)
shapely.ops.nearest_points() can be used to find the point on the geometry nearest to the (maxx, miny) corner
import geopandas as gpd
import numpy as np
import shapely.geometry
import shapely.ops
import pandas as pd
shapepath = (
r"FieldAlyticsCanada_WesternSales_AlexOlson_CouttsAgro2022_CouttsAgro_7-29-23.shp"
)
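# the question's shapefile is not available here, so a bundled GeoPandas demo dataset is used instead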
shapepath = gpd.datasets.get_path("naturalearth_lowres")
df = gpd.read_file(shapepath)
# find point closest to bottom right corner of geometry's bounds (maxx, miny)
df["nearest"] = df["geometry"].apply(
lambda g: shapely.ops.nearest_points(
g,
shapely.geometry.Point(
g.bounds[2], g.bounds[1]
),
)[0]
)
# visualize outcome...
gdfn = gpd.GeoDataFrame(df["name"], geometry=df["nearest"], crs=df.crs)
m = df.drop(columns=["nearest"]).explore()
gdfn.explore(m=m, color="red")
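Back with your own single-polygon shapefile, the same idea gives the corner coordinate directly. A minimal sketch for your file, assuming df holds just the one square polygon:
poly = df.geometry.iloc[0]
minx, miny, maxx, maxy = poly.bounds
# point on the polygon closest to the bottom-right corner of its bounding box
bottom_right = shapely.ops.nearest_points(poly, shapely.geometry.Point(maxx, miny))[0]
print(bottom_right.x, bottom_right.y)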
I am working on a shapefile in Python using GeoPandas and GDAL.
I am looking to create a meshgrid (with points at regular 1000 m intervals) inside the polygon shapefile. I have reprojected the file so that the units are meters. However, I could not find any direct way to implement this.
Can anyone guide me in this regard?
I am sharing the code I have tried so far:
from osgeo import gdal, ogr
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon
source_ds = ogr.Open(r"E:\review paper\sample tb data for recon\descend\tiffbt\alaska_bound.shp")
boundFile = gpd.read_file(r"E:\review paper\sample tb data for recon\descend\tiffbt\alaska_bound.shp")
bound_project = boundFile.to_crs({'init': 'EPSG:3572'})
print(bound_project.crs)
print(bound_project.total_bounds)
The coordinate system and bounding box coordinates are as below (output of above code):
+init=epsg:3572 +type=crs
[-2477342.73003557 -3852592.48050272 1305143.81797914 -2054961.64359753]
It's not clear if you are trying to create a grid of boxes or a grid of points. To change to points use:
# create a grid for geometry
gdf_grid = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.Point(x, y)
        for x in np.arange(a, c, STEP)
        for y in np.arange(b, d, STEP)
    ],
    crs=crs,
).to_crs(gdf.crs)
I have used 50 km instead of 1000 m for demonstration purposes.
With Alaska, for polygons it is necessary to take the antimeridian into account. Without this you will have polygons that span in excess of 350 degrees when re-projected to EPSG:4326.
The approach is simple:
1. obtain the Alaska geometry shapefile
2. project to a CRS in meters (I have used UTM)
3. get total_bounds
4. construct a grid of geometry objects using the bounds from step 3
5. restrict the grid to geometries that intersect with the Alaska geometry
At such latitudes you will observe distortion between UTM and EPSG:4326, as expected (the nature of projections).
Full code:
import geopandas as gpd
import numpy as np
import shapely.geometry
gdf = gpd.read_file("https://www2.census.gov/geo/tiger/TIGER2018/ANRC/tl_2018_02_anrc.zip")
STEP = 50000
crs = gdf.estimate_utm_crs()
# crs = "EPSG:3338"
a, b, c, d = gdf.to_crs(crs).total_bounds
# create a grid for geometry
gdf_grid = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.box(minx, miny, maxx, maxy)
        for minx, maxx in zip(np.arange(a, c, STEP), np.arange(a, c, STEP)[1:])
        for miny, maxy in zip(np.arange(b, d, STEP), np.arange(b, d, STEP)[1:])
    ],
    crs=crs,
).to_crs(gdf.crs)
# exclude geometries that cross the antimeridian
gdf_grid = gdf_grid.loc[
    ~gdf_grid["geometry"].bounds.pipe(lambda d: d["maxx"] - d["minx"]).ge(350)
]
# restrict grid to only squares that intersect with geometry
gdf_grid = (
    gdf_grid.sjoin(gdf.dissolve().loc[:, ["geometry"]])
    .pipe(lambda d: d.groupby(d.index).first())
    .set_crs(gdf.crs)
    .drop(columns=["index_right"])
)
m = gdf.explore(color="red", style_kwds={"fillOpacity":0})
gdf_grid.explore(m=m)
output
I am trying to plot the intersection between a buffer circle and the mesh blocks (or boundaries) within that circle of some radius (in this case, 80 km).
I got the intersection using sjoin() as follows:
intersection_MeshBlock = gpd.sjoin(buffer_df, rest_VIC, how='inner', predicate='intersects')
My buffer variable looks like this:
buffer_df
And the intersection looks like this:
intersection
The problem is that I am not able to plot the intersection polygons.
Here is the plot I get when I plot it using folium's polygon plotting:
for _, r in intersection_MeshBlock.iterrows():
    # Without simplifying the representation of each borough,
    # the map might not be displayed
    sim_geo = gpd.GeoSeries(r['geometry']).simplify(tolerance=0.00001)
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j,
                           style_function=lambda x: {'fillColor': 'orange'})
    folium.Popup(r['SA1_CODE21']).add_to(geo_j)
    geo_j.add_to(m)
m
Plot: color-filled map
What am I doing wrong?
EDIT:
I might have solved the issue partially. Now I am able to plot the polygons inside some buffer radius. This is how my plot looks:
If you look at the image, you will notice that certain mesh blocks cross the circular boundary. How do I get rid of everything outside that circular region?
I have located some geometry for Melbourne to demonstrate.
Fundamentally, you want to use overlay(), not sjoin().
Generating the folium map is much simpler using the GeoPandas 0.10 capability explore().
import geopandas as gpd
import numpy as np
import shapely.geometry
import folium
rest_VIC = gpd.read_file(
    "https://raw.githubusercontent.com/codeforgermany/click_that_hood/main/public/data/melbourne.geojson"
)
# select a point randomly from total bounds of geometry
buffer_df = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.Point(
            np.random.uniform(*rest_VIC.total_bounds[[0, 2]], size=1)[0],
            np.random.uniform(*rest_VIC.total_bounds[[1, 3]], size=1)[0],
        )
    ],
    crs=rest_VIC.crs,
)
buffer_df = gpd.GeoDataFrame(
    geometry=buffer_df.to_crs(buffer_df.estimate_utm_crs())
    .buffer(8 * 10**3)
    .to_crs(buffer_df.crs)
)
# need overlay not sjoin
intersection_MeshBlock = gpd.overlay(buffer_df, rest_VIC, how="intersection")
m = rest_VIC.explore(name="base", style_kwds={"fill":False}, width=400, height=300)
m = buffer_df.explore(m=m, name="buffer", style_kwds={"fill":False})
m = intersection_MeshBlock.explore(m=m, name="intersection", style_kwds={"fillColor":"orange"})
folium.LayerControl().add_to(m)
m
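For the edit about mesh blocks crossing the circle: overlay() already trims them to the buffer, but geopandas.clip() is another way to achieve the same trimming. A minimal sketch, reusing rest_VIC and buffer_df from above:
# clip the mesh blocks to the buffer geometry; equivalent result to
# overlay(..., how="intersection") for this purpose
clipped = gpd.clip(rest_VIC, buffer_df)
clipped.explore(style_kwds={"fillColor": "orange"})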
I want to do spatial binning (using the median as the aggregation function),
starting from a CSV file containing pollutant values measured at positions long and lat.
The resulting map should be something like this:
But for data over a city's extent.
In this regard I found a tutorial that is close to what I want to do, but I was not able to get the desired result.
I think I'm missing something about how to correctly use dissolve and how to plot the resulting data (preferably using folium).
Any useful example code?
You have not provided sample data, so I have used global earthquakes as the set of points and the geometry of California for the scope/extent.
It's simple to create a grid using shapely.geometry.box().
I have shown the use of median and also another aggfunc to demonstrate that multiple metrics can be calculated.
I have used folium to plot; this capability is new in GeoPandas 0.10.0: https://geopandas.org/en/stable/docs/user_guide/interactive_mapping.html
import geopandas as gpd
import shapely.geometry
import numpy as np
# equivalent of CSV, all earthquake points globally
gdf_e = gpd.read_file(
    "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.geojson"
)
# get geometry of bounding area. Have selected a state rather than a city
gdf_CA = gpd.read_file(
    "https://raw.githubusercontent.com/glynnbird/usstatesgeojson/master/california.geojson"
).loc[:, ["geometry"]]
BOXES = 50
a, b, c, d = gdf_CA.total_bounds
# create a grid for California, could be a city
gdf_grid = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.box(minx, miny, maxx, maxy)
        for minx, maxx in zip(np.linspace(a, c, BOXES), np.linspace(a, c, BOXES)[1:])
        for miny, maxy in zip(np.linspace(b, d, BOXES), np.linspace(b, d, BOXES)[1:])
    ],
    crs="epsg:4326",
)
# remove grid boxes created outside actual geometry
gdf_grid = gdf_grid.sjoin(gdf_CA).drop(columns="index_right")
# get earthquakes that have occured within one of the grid geometries
gdf_e_CA = gdf_e.loc[:, ["geometry", "mag"]].sjoin(gdf_grid)
# get median magnitude of earthquakes in grid
gdf_grid = gdf_grid.join(
    gdf_e_CA.dissolve(by="index_right", aggfunc="median").drop(columns="geometry")
)
# how many earthquakes in the grid
gdf_grid = gdf_grid.join(
    gdf_e_CA.dissolve(by="index_right", aggfunc=lambda d: len(d))
    .drop(columns="geometry")
    .rename(columns={"mag": "number"})
)
# drop grids geometries that have no measures and create folium map
m = gdf_grid.dropna().explore(column="mag")
# for good measure - boundary on map too
gdf_CA["geometry"].apply(lambda g: shapely.geometry.MultiLineString([p.exterior for p in g.geoms])).explore(m=m)
Thanks to @Rob Raymond, I finally solved it with the following code:
import pandas as pd
import geopandas as gpd
import pyproj
import matplotlib.pyplot as plt
import numpy as np
import shapely
from folium import plugins
df=pd.read_csv('../Desktop/test_esri.csv')
gdf_monica = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df.long, df.lat))
gdf_monica=gdf_monica.set_crs('epsg:4326')
gdf_area = gpd.read_file('https://raw.githubusercontent.com/openpolis/geojson-italy/master/geojson/limits_IT_municipalities.geojson')#.loc[:, ["geometry"]]
gdf_area =gdf_area[gdf_area['name']=='Portici'].loc[:,['geometry']]
BOXES = 50
a, b, c, d = gdf_area.total_bounds
gdf_grid = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.box(minx, miny, maxx, maxy)
        for minx, maxx in zip(np.linspace(a, c, BOXES), np.linspace(a, c, BOXES)[1:])
        for miny, maxy in zip(np.linspace(b, d, BOXES), np.linspace(b, d, BOXES)[1:])
    ],
    crs="epsg:4326",
)
# remove grid boxes created outside actual geometry
gdf_grid = gdf_grid.sjoin(gdf_area).drop(columns="index_right")
gdf_monica_binned = gdf_monica.loc[:, ["geometry", "CO"]].sjoin(gdf_grid)
# get the median of the CO pollutant in each grid cell
gdf_grid = gdf_grid.join(
    gdf_monica_binned.dissolve(by="index_right", aggfunc="median").drop(columns="geometry")
)
# how many measurements in each grid cell
gdf_grid = gdf_grid.join(
    gdf_monica_binned.dissolve(by="index_right", aggfunc=lambda d: len(d))
    .drop(columns="geometry")
    .rename(columns={"CO": "number"})
)
# drop grids geometries that have no measures and create folium map
m = gdf_grid.dropna().explore(column="CO")
# for good measure - boundary on map too
gdf_area["geometry"].apply(lambda g: shapely.geometry.MultiLineString([p.exterior for p in g.geoms])).explore(m=m)
which produces:
As you can tell, I have little or no knowledge of spatial analysis. I was not able to get correct results without using GeoJSON data that describes a geometry within which the points of interest fall.
If anyone could add more insights... thanks!
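If no boundary GeoJSON is available, one option is to build the grid directly from the extent of the measurement points themselves. A minimal sketch under that assumption, reusing gdf_monica, BOXES and the imports from the code above:
# use the points' own bounding box as the binning extent
a, b, c, d = gdf_monica.total_bounds
gdf_grid = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.box(minx, miny, maxx, maxy)
        for minx, maxx in zip(np.linspace(a, c, BOXES), np.linspace(a, c, BOXES)[1:])
        for miny, maxy in zip(np.linspace(b, d, BOXES), np.linspace(b, d, BOXES)[1:])
    ],
    crs="epsg:4326",
)
# no sjoin against a boundary is needed; bin and dissolve as before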
I want to convert a pandas DataFrame to a spatially enabled GeoPandas one as follows:
df=pd.read_csv('../Desktop/test_esri.csv')
df.head()
Then converted using:
gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(df.long, df.lat))
from pyproj import crs
crs_epsg = crs.CRS.from_epsg(4326)
gdf=gdf.set_crs('epsg:4326')
Then I want to superimpose a spatial grid as follows:
import numpy as np
import shapely
from pyproj import crs
# total area for the grid
xmin, ymin, xmax, ymax= gdf.total_bounds
# how many cells across and down
n_cells=30
cell_size = (xmax-xmin)/n_cells
# projection of the grid
# crs = "+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs"
# create the cells in a loop
grid_cells = []
for x0 in np.arange(xmin, xmax + cell_size, cell_size):
    for y0 in np.arange(ymin, ymax + cell_size, cell_size):
        # bounds
        x1 = x0 - cell_size
        y1 = y0 + cell_size
        grid_cells.append(shapely.geometry.box(x0, y0, x1, y1))
cell = geopandas.GeoDataFrame(grid_cells, columns=['geometry'],
                              crs=crs.CRS('epsg:4326'))
Then I merge the grid with the GeoPandas dataframe:
merged = geopandas.sjoin(gdf, cell, how='left', predicate='within')
To finally compute the desired metric inside "dissolve":
# Compute stats per grid cell -- aggregate fires to grid cells with dissolve
dissolve = merged.dissolve(by="index_right", aggfunc="median")
But I think I did something wrong with the "cell" grid and I can't figure it out!!
An extract of the CSV file used can be found here.
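One thing that may be going wrong is how the cell GeoDataFrame is built: passing the box list as plain data with columns=['geometry'] may not register it as the active geometry column in every GeoPandas version. A minimal sketch of the more idiomatic construction, reusing the grid_cells list from above:
# pass the geometries through the geometry= keyword so the column is set as
# the active geometry, then sjoin/dissolve as before
cell = geopandas.GeoDataFrame(geometry=grid_cells, crs="EPSG:4326")
merged = geopandas.sjoin(gdf, cell, how="left", predicate="within")
dissolve = merged.dissolve(by="index_right", aggfunc="median")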
I'm trying to create a choropleth in Python 3 using shapely and fiona, with bokeh for display.
I have a file with about 7000 lines that contain the location of a town and a count.
Example:
54.7604;9.55827;208
54.4004;9.95918;207
53.8434;9.95271;203
53.5979;10.0013;201
53.728;10.2526;197
53.646;10.0403;196
54.3977;10.1054;193
52.4385;9.39217;193
53.815;10.3476;192
...
I want to show these on a 12.5 km grid, for which a shapefile is available at
https://opendata-esri-de.opendata.arcgis.com/datasets/3c1f46241cbb4b669e18b002e4893711_0
The code I have works.
However, it is very slow, because it's a brute-force algorithm that checks each of the 7127 grid cells against all of the 7000 points.
import pandas as pd
import fiona
from shapely.geometry import Polygon, Point, MultiPoint, MultiPolygon
from shapely.prepared import prep
sf = r'c:\Temp\geo_de\Hexagone_125_km\Hexagone_125_km.shp'
shp = fiona.open(sf)
district_xy = [ [ xy for xy in feat["geometry"]["coordinates"][0]] for feat in shp]
district_poly = [ Polygon(xy) for xy in district_xy] # coords to Polygon
df_p = pd.read_csv('points_file.csv', sep=';', header=None)
df_p.columns = ('lat', 'lon', 'count')
map_points = [Point(x,y) for x,y in zip(df_p.lon, df_p.lat)] # Convert Points to Shapely Points
all_points = MultiPoint(map_points) # all points
def calc_points_per_poly(poly, points, values):  # Returns total for poly
    poly = prep(poly)
    return sum([v for p, v in zip(points, values) if poly.contains(p)])
# this is the slow part
# for each shape this sums um the points
sum_hex = [calc_points_per_poly(x, all_points, df_p['count']) for x in district_poly]
Since this is extremely slow, I'm wondering if there is a faster way to compute the sum_hex values, especially since the real-world list of points may be a lot larger and a smaller grid with more shapes would deliver a better result.
I would recommend using geopandas and its built-in rtree spatial index. It allows you to do the precise check only when there is a possibility that a point lies within a polygon.
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, Point
sf = 'Hexagone_125_km.shp'
shp = gpd.read_file(sf)
df_p = pd.read_csv('points_file.csv', sep=';', header=None)
df_p.columns = ('lat', 'lon', 'count')
gdf_p = gpd.GeoDataFrame(df_p, geometry=[Point(x,y) for x,y in zip(df_p.lon, df_p.lat)])
sum_hex = []
spatial_index = gdf_p.sindex
for index, row in shp.iterrows():
    polygon = row.geometry
    possible_matches_index = list(spatial_index.intersection(polygon.bounds))
    possible_matches = gdf_p.iloc[possible_matches_index]
    precise_matches = possible_matches[possible_matches.within(polygon)]
    sum_hex.append(sum(precise_matches['count']))
shp['sum'] = sum_hex
This solution should be faster than yours. You can then plot your GeoDataFrame via Bokeh. If you want more details on spatial indexing, I recommend this article by Geoff Boeing: https://geoffboeing.com/2016/10/r-tree-spatial-index-python/
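As a further simplification (assuming GeoPandas >= 0.10 and that both layers are in the same CRS), the explicit loop can be replaced by a spatial join plus a groupby. A minimal sketch, reusing shp and gdf_p from above:
# sjoin uses the spatial index internally; sum the point counts per hexagon
joined = gpd.sjoin(shp, gdf_p, how="left", predicate="contains")
shp["sum"] = joined.groupby(joined.index)["count"].sum()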
How would one extract superpixels, retrieve their edges and then simplify those?
This is what I've gotten so far:
import skimage.io
import skimage.segmentation
import skimage.util
import numpy as np
import rasterio.features
from shapely.geometry import Polygon, MultiPolygon
image = skimage.util.img_as_float(skimage.io.imread("image.jpg"))
labels = skimage.segmentation.slic(image, n_segments=200, sigma=5)
boundaries = skimage.segmentation.find_boundaries(labels).astype(np.uint8)
# this is where it goes wrong as rasterio creates shapes with distinct edges
shapes = rasterio.features.shapes(boundaries)
polygons = MultiPolygon([Polygon(sh[0]["coordinates"][0]) for sh in shapes])
out = polygons.simplify(0.05)
The problem here is that the simplification works on a per-polygon basis, and therefore its output isn't a tight mesh.
I'm looking to achieve something similar to this, i.e. obtaining the edges and being able to simplify them.
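One possible direction (not verified on this exact data): topology-aware simplification keeps shared edges in sync, which is what the topojson package offers. A minimal sketch, assuming the polygons MultiPolygon from the code above and that topojson is installed:
import geopandas as gpd
import topojson

# wrap the superpixel polygons in a GeoDataFrame so topojson can build shared arcs
gdf = gpd.GeoDataFrame(geometry=list(polygons.geoms))
# simplify the shared arcs once, so neighbouring polygons stay a tight mesh
topo = topojson.Topology(gdf, prequantize=False)
simplified = topo.toposimplify(0.05).to_gdf()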