Convert Geopandas Multipolygon to Polygon - python

I have a geodataframe with a Multipolygon geometry:
I would like to convert them to Polygons i.e. fill in the holes of multipolygon and make it a single polygon.
I have tried the code from this similar question:
from shapely.geometry import MultiPolygon, Polygon
gdf['Polygon'] = gdf['SHAPE'].apply( lambda x: MultiPolygon(Polygon(p.exterior) for p in x))
But I get the error:
TypeError: 'Polygon' object is not subscriptable
I have tried other solutions from stack overflow with no luck.
Any ideas?
Here are the dtypes:
FID int64
LHO object
Shape__Area float64
Shape__Length float64
SHAPE geometry
Here is the complete code to get the shapefile:
import pandas as pd
import geopandas as gpd
from arcgis import GIS
gis = GIS(verify_cert=False,api_key='your_api_key')
search_result = gis.content.search(query="title:National_LHO", item_type="Feature Layer")
# get layer
layer = search_result[0].layers[0]
# dataframe from layer
df= pd.DataFrame.spatial.from_layer(layer)
gdf = gpd.GeoDataFrame(df)
gdf = gdf.set_geometry('SHAPE')
gdf = gdf.set_crs(epsg='3857')
gdf = gdf.to_crs(epsg='4326')

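There is a method called .explode that you can use on your GeoDataFrame. It splits each multi-part geometry into one row per single-part geometry:
gdf_exploded = gdf.explode()
See the GeoPandas documentation for GeoDataFrame.explode.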
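Note that explode() only splits a MultiPolygon into its parts; it does not fill holes. A minimal sketch of doing both, assuming Shapely 2.x (where the parts of a multi-part geometry are reached through .geoms) and using made-up toy geometries in place of the asker's layer. The helper name fill_holes is hypothetical:

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon


def fill_holes(geom):
    """Drop interior rings; handles both Polygon and MultiPolygon rows."""
    if geom.geom_type == "Polygon":
        return Polygon(geom.exterior)
    if geom.geom_type == "MultiPolygon":
        # iterate over .geoms, not over the MultiPolygon itself
        return MultiPolygon([Polygon(p.exterior) for p in geom.geoms])
    return geom  # leave any other geometry type untouched


# Toy data (an assumption): a square with a hole plus a separate square
holed = Polygon(
    [(0, 0), (4, 0), (4, 4), (0, 4)],
    holes=[[(1, 1), (2, 1), (2, 2), (1, 2)]],
)
other = Polygon([(10, 10), (11, 10), (11, 11), (10, 11)])
gdf = gpd.GeoDataFrame(
    {"FID": [1]}, geometry=[MultiPolygon([holed, other])], crs="EPSG:4326"
)

# fill the holes first, then split into one row per polygon
filled = gdf.geometry.apply(fill_holes)
gdf_exploded = gdf.set_geometry(filled).explode(index_parts=False)
```

Checking geom_type before iterating also covers layers that mix Polygon and MultiPolygon rows, which may be one reason the original lambda fails.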

Related

How to limit the number of features read in using GeoPandas?

I have the following Python code to read my shapefile features into a GeoDataFrame using the points x, y.
import math
import shapely.geometry
import geopandas as gpd
from shapely.ops import nearest_points
absolute_path_to_shapefile = 'c:/test/test1.shp'
gdf1 = gpd.read_file(absolute_path_to_shapefile)
gdf = gpd.GeoDataFrame(
gdf1, geometry=gpd.points_from_xy(gdf1['x'], gdf1['y']))
Is there a way to limit the features read in? Some shapefiles have millions of points but I just want to read in the first 100 as proof of concept.
GeoPandas read_file() has a rows option to limit the number of rows read (or to use a slice to read specific rows).
import math
import shapely.geometry
import geopandas as gpd
from shapely.ops import nearest_points
absolute_path_to_shapefile = 'c:/test/test1.shp'
gdf1 = gpd.read_file(absolute_path_to_shapefile, rows=100)
gdf = gpd.GeoDataFrame(gdf1, geometry=gpd.points_from_xy(gdf1['x'], gdf1['y']))
GeoPandas documentation
geopandas.read_file(filename, bbox=None, mask=None, rows=None, **kwargs)
Returns a GeoDataFrame from a file or URL.
Parameters
filename: str, path object or file-like object
Either the absolute or relative path to the file or URL to be opened, or any object with a read() method (such as an open file or StringIO)
bbox: tuple | GeoDataFrame or GeoSeries | shapely Geometry, default None
Filter features by given bounding box, GeoSeries, GeoDataFrame or a shapely geometry. CRS mis-matches are resolved if given a GeoSeries or GeoDataFrame. Tuple is (minx, miny, maxx, maxy) to match the bounds property of shapely geometry objects. Cannot be used with mask.
mask: dict | GeoDataFrame or GeoSeries | shapely Geometry, default None
Filter for features that intersect with the given dict-like geojson geometry, GeoSeries, GeoDataFrame or shapely geometry. CRS mis-matches are resolved if given a GeoSeries or GeoDataFrame. Cannot be used with bbox.
rows: int or slice, default None
Load in specific rows by passing an integer (first n rows) or a slice() object.
**kwargs :
Keyword args to be passed to the open or BytesCollection method in the fiona library when opening the file. For more information on possible keywords, type: import fiona; help(fiona.open)
Returns
geopandas.GeoDataFrame or pandas.DataFrame :
If ignore_geometry=True a pandas.DataFrame will be returned.

Clipping shapefile with multi-polygon shapefile in geopandas

I have a shapefile (called the source file hereafter) that I need to clip by a multi-polygon shapefile, so that I get one clipped shapefile per polygon. With geopandas I can clip the source file by selecting each polygon from the multi-polygon shapefile individually, but when I loop over the polygons to automate the clipping I get the following error:
Error:
TypeError: 'mask' should be GeoDataFrame, GeoSeries or(Multi)Polygon, got <class 'tuple'>
Code:
import geopandas as gpd
source = ('source-shapefile.shp')
mask = ('mask_shapefile.shp')
sourcefile = gpd.read_file(source)
maskfile = gpd.read_file(mask)
for row in maskfile.iterrows():
    gpd.clip(sourcefile, row)
Two points:
https://geopandas.org/en/stable/docs/reference/api/geopandas.clip.html The mask can be a GeoDataFrame, so there is no need to loop.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html iterrows() yields a tuple of the index value and the row as a Series. Your error comes from passing this whole tuple to clip().
I have constructed an example. It is far simpler to clip using a GeoDataFrame as the mask.
import geopandas as gpd
import pandas as pd
# build a mask for use in clip, containing both multipolygons and polygons
# (gpd.datasets was removed in GeoPandas 1.0, so this needs an older version)
maskfile = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
maskfile = maskfile.loc[maskfile["continent"].eq("Europe") & maskfile["name"].ne("Russia")].pipe(
    lambda d: d.assign(gdp_grp=pd.cut(d["gdp_md_est"], bins=4, labels=list("abcd")))
).dissolve("gdp_grp").reset_index()
sourcefile = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# now clip, no looping needed
gpd.clip(sourcefile, maskfile)
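If a separate result per mask polygon is still wanted, the loop from the question can be kept by unpacking the tuple that iterrows() yields and passing only the geometry to clip(). A rough sketch with invented toy data; the zone column and the clipped_by_zone dictionary are assumptions for illustration:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Toy source layer: three points
sourcefile = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(9, 9)],
    crs="EPSG:4326",
)

# Toy mask layer: two square polygons
maskfile = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

clipped_by_zone = {}
# iterrows() yields (index, row); unpack it and pass only the geometry
for index, row in maskfile.iterrows():
    clipped_by_zone[row["zone"]] = gpd.clip(sourcefile, row.geometry)
```

Each value in clipped_by_zone is a GeoDataFrame that could then be written out with to_file().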
Finally, after 5 hours of research, I am now able to clip the shapefile by a multi-polygon shapefile and save the clipped polygons separately under their respective names. The following code may be dirty, but it works.
Code:
import geopandas as gpd
import pandas as pd
import os
source = 'source-shapefile.shp'
mask = 'mask_shapefile.shp'
outpath = '/outpath'
sourcefile = gpd.read_file(source)
maskfile = gpd.read_file(mask)
# split multi-part geometries into one row per polygon
clipshape = maskfile.explode()
clipshape.set_index('CATCH_NAME', inplace=True) # CATCH_NAME is the attribute column name
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for index, row in clipshape['geometry'].items():
    clipped = gpd.clip(sourcefile, row)
    clipped.to_file(os.path.join(outpath, f'{index}.shp'))

Shapely doesn't recognize the geometry type of geoJson

I'm a beginner with shapely and I'm trying to read a shapefile, save it as GeoJSON, and then use shape() to see the geometry type.
According to the docs, shape():
shapely.geometry.shape(context) Returns a new, independent geometry
with coordinates copied from the context.
Saving the shapefile as GeoJSON seems to work, but for some reason when I call shape() on the GeoJSON I get this error:
ValueError: Unknown geometry type: featurecollection
This is my script:
import geopandas
import numpy as np
from shapely.geometry import shape, Polygon, MultiPolygon, MultiLineString
#read shapefile:
myshpfile = geopandas.read_file('shape/myshape.shp')
myshpfile.to_file('myshape.geojson', driver='GeoJSON')
#read as GeoJson and use shape()
INPUT_FILE = 'shape/myshape.geojson'
geo_json = geopandas.read_file(INPUT_FILE)
#try to use shape()
geom = shape(geo_json)
>>>ValueError: Unknown geometry type: featurecollection
I have also tried to select the geometry column by slicing, but that does not seem to work either.
#try to use shape()
geom = shape(geo_json.iloc[:,9])
>>>TypeError: '(slice(None, None, None), 9)' is an invalid key
Right now I can't get past this step, but my end goal is to get the geometry type by printing geom.geom_type (currently the error occurs first).
Edit: when I check the type of the loaded GeoJSON I get "geopandas.geodataframe.GeoDataFrame".
Your geo_json object is a geopandas.GeoDataFrame, which has a column of shapely geometries. There's no need to call shape. If you want to check geom_type, you can do that directly.
import geopandas
import numpy as np
from shapely.geometry import shape, Polygon, MultiPolygon, MultiLineString
#read shapefile:
myshpfile = geopandas.read_file('shape/myshape.shp')
myshpfile.to_file('myshape.geojson', driver='GeoJSON')
#read the GeoJSON back as a GeoDataFrame and check the geometry types
INPUT_FILE = 'shape/myshape.geojson'
geo_json = geopandas.read_file(INPUT_FILE)
geo_json.geom_type
That will give you geom_type for each geometry in the dataframe. Maybe check geopandas documentation to get more familiar with the concept.
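For context, shape() expects a single GeoJSON-style geometry mapping, not a GeoDataFrame or a whole FeatureCollection. A small sketch of both routes, using an invented two-row GeoDataFrame rather than the asker's shapefile:

```python
import json

import geopandas as gpd
from shapely.geometry import Point, Polygon, shape

gdf = gpd.GeoDataFrame(
    {"name": ["pt", "poly"]},
    geometry=[Point(1, 2), Polygon([(0, 0), (1, 0), (1, 1)])],
    crs="EPSG:4326",
)

# Route 1: ask the GeoDataFrame directly, one entry per row
types = gdf.geom_type

# Route 2: if you really start from raw GeoJSON, call shape() on the
# "geometry" member of each feature, not on the FeatureCollection itself
feature_collection = json.loads(gdf.to_json())
geoms = [shape(feature["geometry"]) for feature in feature_collection["features"]]
geojson_types = [g.geom_type for g in geoms]
```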

Geopandas: how to read a csv and convert to a geopandas dataframe with polygons?

I read a .csv file as a dataframe that looks like the following:
import pandas as pd
df = pd.read_csv('myFile.csv')
df.head()
BoroName geometry
0 Brooklyn MULTIPOLYGON (((-73.97604935657381 40.63127590...
1 Queens MULTIPOLYGON (((-73.80379022888098 40.77561011...
2 Queens MULTIPOLYGON (((-73.8610972440186 40.763664477...
3 Queens MULTIPOLYGON (((-73.75725671509139 40.71813860...
4 Manhattan MULTIPOLYGON (((-73.94607828674226 40.82126321...
I want to convert it to a geopandas dataframe.
import geopandas as gpd
crs = {'init': 'epsg:4326'}
gdf = gpd.GeoDataFrame(df, crs=crs).set_geometry('geometry')
but I get the following error
TypeError: Input must be valid geometry objects: MULTIPOLYGON (((-73.97604935657381 40.631275905646774, -73.97716511994669 40.63074665412933,....
Geopandas seems to be unable to convert a geometry column from a pandas dataframe.
Solution number 2
Try applying the shapely wkt.loads function on your column before converting your dataframe to a geodataframe.
from shapely import wkt
df['geometry'] = df['geometry'].apply(wkt.loads)
gdf = gpd.GeoDataFrame(df, crs='epsg:4326')
Good luck!
Solution number 1 (do not use: it crashes Spyder and the Jupyter kernel for some people): try loading the CSV directly with geopandas
gdf = gpd.read_file('myFile.csv')
gdf.crs = 'epsg:4326'
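As an alternative to apply(wkt.loads), recent GeoPandas versions (0.9 and later, as far as I know) offer GeoSeries.from_wkt, which parses a whole column of WKT strings at once. A minimal sketch on a made-up two-row frame shaped like the one in the question:

```python
import geopandas as gpd
import pandas as pd

# Toy stand-in for the CSV contents: geometry stored as WKT strings
df = pd.DataFrame(
    {
        "BoroName": ["A", "B"],
        "geometry": [
            "MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)))",
            "MULTIPOLYGON (((2 2, 3 2, 3 3, 2 3, 2 2)))",
        ],
    }
)

# Parse the WKT column into real geometry objects
geometry = gpd.GeoSeries.from_wkt(df["geometry"])

# Build the GeoDataFrame from the remaining columns plus the parsed geometry
gdf = gpd.GeoDataFrame(df.drop(columns="geometry"), geometry=geometry, crs="EPSG:4326")
```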
You could also try this:
gdf = gpd.GeoDataFrame(
df, geometry=gpd.points_from_xy(df.longitude, df.latitude)
)
This will convert those lat/long columns to points
Dudes, there are WKT geometry strings in the original dataframe, not xy columns, so I would suggest reading this:
DataFrame with WKT Column to GeoPandas Geometry
Geopandas puts a geometry column at the end if you load directly. Found this out by experimenting with column names and it worked

Convert a column of GeoJSON-like strings to geometry objects in GeoPandas

I have a column in a GeoPandas dataframe with strings like this one '{type=Point, coordinates=[37.55, 55.71]}' or this '{type=MultiPoint, coordinates=[[37.6, 55.4]]}'. It can be a polygon or any other geometry as well. Then there are a few points in the form of nested list. How can I transform it to the ordinary GeoPandas geometry objects?
Use shapely.geometry.shape to convert GeoJSON-like mappings to shapely geometries. It expects a dict-like object, so strings have to be parsed into dicts first.
from shapely.geometry import shape
df['geometry'] = df.apply(lambda row: shape(row['jsoncolumn']), axis=1)
I implemented it as follows. Thanks to @martinfleis.
from shapely.geometry import shape
# Define the bare words used in the strings as variables so that eval() can resolve them
coordinates = 'coordinates'
type = 'type'
Point = 'Point'
MultiPoint = 'MultiPoint'
Polygon = 'Polygon'
MultiPolygon = 'MultiPolygon'
center='center'
df['geometry'] = df.geoData.apply(lambda x: shape(eval(x.replace('=',':'))))
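Since eval() executes whatever the strings contain, a variant that avoids it may be preferable for untrusted data. The sketch below is one possible approach, not the method used above: it assumes the strings only ever contain bare-word keys, bare-word type names, and numeric coordinate lists, and the helper name parse_geojson_like is made up:

```python
import json
import re

from shapely.geometry import shape


def parse_geojson_like(text):
    """Turn '{type=Point, coordinates=[1, 2]}' into a shapely geometry."""
    # key=  ->  "key":
    quoted_keys = re.sub(r"(\w+)=", r'"\1":', text)
    # bare-word values such as Point  ->  "Point"
    quoted_values = re.sub(r":\s*([A-Za-z_]\w*)", r': "\1"', quoted_keys)
    # the result is valid JSON, which shape() accepts once parsed to a dict
    return shape(json.loads(quoted_values))


point = parse_geojson_like("{type=Point, coordinates=[37.55, 55.71]}")
multi = parse_geojson_like("{type=MultiPoint, coordinates=[[37.6, 55.4]]}")
```

On a dataframe this could be applied as df.geoData.apply(parse_geojson_like), with the column name taken from the snippet above.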
Based on a source on GitHub, I built the following function:
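import geopandas as gpd
import geojson
import json
import numpy as np
from shapely.geometry import shape
def geojsonification(x):
    geom = x['geom']
    if isinstance(geom, dict):
        # round-trip through JSON to get a geojson geometry object
        s = json.dumps(geom)
        s2 = geojson.loads(s)
        res = shape(s2)
        return res
    else:
        # rows without a dict geometry become missing values
        return np.nan
Which you can use like this:
gdf.geometry = gdf.apply(geojsonification, axis=1)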
