I have a shapefile (will be called source-file hereafter), which I need to clip by a multi-polygon shapefile so that I can have a clipped shapefile for each polygon. I tried the geopandas, though I am able to clip the source-file by individually clipping the it by selecting the polygons separately from the multi-polygon shapefile, but when I try to loop over the polygons to automate the clipping process I get the following error:
Error:
TypeError: 'mask' should be GeoDataFrame, GeoSeries or(Multi)Polygon, got <class 'tuple'>
Code:
import geopandas as gpd
source = ('source-shapefile.shp')
mask = ('mask_shapefile.shp')
sourcefile = gpd.read_file(source)
maskfile = gpd.read_file(mask)
for row in maskfile.iterrows():
gpd.clip(sourcefile, row)
Two points
https://geopandas.org/en/stable/docs/reference/api/geopandas.clip.html mask can be a GeoDataFrame hence no need for looping
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html yields a tuple of the index value and named tuple of row. Hence your error is the fact your are passing this tuple to clip()
Have constructed an example. It is far simpler to clip using a GeoDataFrame as the mask.
import geopandas as gpd
import pandas as pd
# lets build a mask for use in clip, multipolygons and polygons
maskfile = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
maskfile = maskfile.loc[maskfile["continent"].eq("Europe") & maskfile["name"].ne("Russia")].pipe(
lambda d: d.assign(gdp_grp=pd.cut(d["gdp_md_est"], bins=4, labels=list("abcd")))
).dissolve("gdp_grp").reset_index()
sourcefile = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# now clip, no looping needed
gpd.clip(sourcefile, maskfile)
Finally, after 5 hours of research, I am now able clip the shapefile by a multi-polygon shapefile and save the clipped polygons separately with their respective names. Following code may be dirty
but it works.
code:
import geopandas as gpd
import pandas as pd
import os, sys
source = ('source-shapefile.shp')
mask = ('mask_shapefile.shp')
outpath = ('/outpath')
sourcefile = gpd.read_file(source)
maskfile = gpd.read_file(mask)
clipshape = maskfile.explode()
clipshape.set_index('CATCH_NAME', inplace=True) # CATCH_NAME is attribute column name
for index, row in clipshape['geometry'].iteritems():
clipped = gpd.clip(sourcefile, row)
clipped.to_file(os.path.join(outpath, f'{index}.shp'))
Related
I have the following Python code to read my shapefile features into a GeoDataFrame using the points x, y.
import math
import shapely.geometry
import geopandas as gpd
from shapely.ops import nearest_points
absolute_path_to_shapefile = 'c:/test/test1.shp'
gdf1 = gpd.read_file(absolute_path_to_shapefile)
gdf = gpd.GeoDataFrame(
gdf1, geometry=gpd.points_from_xy(gdf1['x'], gdf1['y']))
Is there a way to limit the features read in? Some shapefiles have millions of points but I just want to read in the first 100 as proof of concept.
GeoPandas read_file() has a rows option to limit the number of rows read (or to use a slice to read specific rows).
import math
import shapely.geometry
import geopandas as gpd
from shapely.ops import nearest_points
absolute_path_to_shapefile = 'c:/test/test1.shp'
gdf1 = gpd.read_file(absolute_path_to_shapefile, rows=100)
gdf = gpd.GeoDataFrame(gdf1, geometry=gpd.points_from_xy(gdf1['x'], gdf1['y']))
GeoPandas documentation
geopandas.read_file(filename, bbox=None, mask=None, rows=None, **kwargs)
Returns a GeoDataFrame from a file or URL.
Parameters
filename: str, path object or file-like object
Either the absolute or relative path to the file or URL to be opened, or any object with a read() method (such as an open file or StringIO)
bbox: tuple | GeoDataFrame or GeoSeries | shapely Geometry, default None
Filter features by given bounding box, GeoSeries, GeoDataFrame or a shapely geometry. CRS mis-matches are resolved if given a GeoSeries or GeoDataFrame. Tuple is (minx, miny, maxx, maxy) to match the bounds property of shapely geometry objects. Cannot be used with mask.
mask: dict | GeoDataFrame or GeoSeries | shapely Geometry, default None
Filter for features that intersect with the given dict-like geojson geometry, GeoSeries, GeoDataFrame or shapely geometry. CRS mis-matches are resolved if given a GeoSeries or GeoDataFrame. Cannot be used with bbox.
rows: int or slice, default None
Load in specific rows by passing an integer (first n rows) or a slice() object.
**kwargs :
Keyword args to be passed to the open or BytesCollection method in the fiona library when opening the file. For more information on possible keywords, type: import fiona; help(fiona.open)
Returns
geopandas.GeoDataFrame or pandas.DataFrame :
If ignore_geometry=True a pandas.DataFrame will be returned.
I am trying to convert a CSV file with WKT Polygon geometry to shapefile in Python, but cannot determine how to correctly integrate the geometry into a shapefile. Below is a segment of the CSV file:
ID Day Number Hours WKT
1 10 2 [12,12,13] POLYGON ((153.101112401 -27.797998206, 153.097860177 -27.807122487, 153.097715464 -27.8163131, 153.100598081 -27.821068293,...)
I am attempting to use the geopandas and shapely libraries and have found documentd to support conversion from CSV to Shapefile from Points geometry and using latitude/longtitude, but I cannot figure out how to do so without lat/lon and from Polygon geometry. When I attempted to plot the data, I get an "AttributeError: No geometry data set yet (expected in column 'geometry')". I can still generate a plot graphic, but there is no data associated with it. Once I can plot the data, I should be able to generate the desired shapefile output that preserves the attributes of the original CSV. Below is the the code I am using:
import pandas as pd
import geopandas as gpd
from shapely import wkt
test_file = pd.read_csv("C:\\Users\\mdl518\\Desktop\\sample_data.csv") ## read the CSV
test_file['geometry'] = test_file.WKT.apply(wkt.loads) ## load the WKT geometry
gdf = gpd.GeoDataFrame(test_file, geometry='geometry') ## load CRS into Geodataframe
test_file_gdf.plot(markersize = 1.5, figsize = (10,10)) ## plot the data
## Obtaining the ESRI WKT
ESRI_WKT = 'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]'
## Saving to shapefile
test_file_gdf.to_file(filename = "C:\\Users\\mdl518\\Desktop\\test_sample.shp", driver = "ESRI Shapefile", crs_wkt = ESRI_WKT)
I feel like this should otherwise be fairly straightforward, but I cannot figure out the missing steps in the geodataframe geometry integration, any assistance is most appreciated!
I am trying to save the following geodataframe with columns (geometry, area, centroid, and boundary) to a json file using df.to_file('result.geojson', driver="GeoJSON"):
However, I get the following error because I have centroid and boundary columns.
TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fb7fff86940>' as a data type
This works perfectly fine when there is only geometry column and area column.
Also when I try to save it as a csv file and then read it back as geopandas file, I am only able to convert geometry into datatype geometry. Howver, centroid and boundary show as object datatype. How do I convert them to geometry datatype?
as per comments, both geopandas and geojson only support one geometry per feature
hence your data frame is polygons as geometry and other columns are series of objects (shapely objects)
have simulated data set. This clearly demonstrates your data structure is not normalised. boundary, centroid and area are calculated columns. Hence in relational theory not normalised
this can be saved as CSV shapely objects will be encoded as WKT
this can then simply be loaded, with second step to decode WKT back into shapely objects
have demonstrated this works by plotting geometries loaded from CSV
import geopandas as gpd
import pandas as pd
import shapely.geometry
from pathlib import Path
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# derive columns in question...
gdf["boundary"] = gdf["geometry"].apply(lambda p: p.boundary )
gdf["centroid"] = gdf["geometry"].apply(lambda p: p.centroid )
gdf["area"] = gdf["geometry"].apply(lambda p: p.area )
# save as CSV shapely.wkt.dumps will be implicitly used
gdf.to_csv(Path.cwd().joinpath("SO_geom.csv"), index=False)
# load encoded dataframe
df = pd.read_csv(Path.cwd().joinpath("SO_geom.csv"))
# decode geometry columns as strings back into shapely objects
for c in ["geometry","boundary","centroid"]:
df[c] = df[c].apply(shapely.wkt.loads)
# finally reconstruct geodataframe
gdf = gpd.GeoDataFrame(df)
# show it has worked
gdf.plot()
gpd.GeoSeries(gdf["boundary"]).plot()
gpd.GeoSeries(gdf["centroid"]).plot()
I'm a beginner with shapely and i'm trying to read shapefile, save it as geoJson and then use shape() in order to see the geometry type.
according to the doc, shape():
shapely.geometry.shape(context) Returns a new, independent geometry
with coordinates copied from the context.
Saving the shapefile as geoJson seems to work but for some reason when I try to use shape() on the geoJson I get error:
ValueError: Unknown geometry type: featurecollection
This is my script:
import geopandas
import numpy as np
from shapely.geometry import shape, Polygon, MultiPolygon, MultiLineString
#read shapefile:
myshpfile = geopandas.read_file('shape/myshape.shp')
myshpfile.to_file('myshape.geojson', driver='GeoJSON')
#read as GeoJson and use shape()
INPUT_FILE = 'shape/myshape.geojson'
geo_json = geopandas.read_file(INPUT_FILE)
#try to use shape()
geom = shape(geo_json)
>>>ValueError: Unknown geometry type: featurecollection
I have also tried to specify the geometry with slicing but seems like impossible.
#try to use shape()
geom = shape(geo_json.iloc[:,9])
>>>TypeError: '(slice(None, None, None), 9)' is an invalid key
Right now I can't pass this level, but my end goal is to be able to get the geometry type when print geom.geom_type (now I get the error before).
Edit:when I check the type of the saved GeoJson I get "geopandas.geodataframe.GeoDataFrame"
Your geo_json object is geopandas.GeoDataFrame which has a column of shapely geometries. There's no need to call shape. If you want to check geom_type, there's an easy way to do that directly.
import geopandas
import numpy as np
from shapely.geometry import shape, Polygon, MultiPolygon, MultiLineString
#read shapefile:
myshpfile = geopandas.read_file('shape/myshape.shp')
myshpfile.to_file('myshape.geojson', driver='GeoJSON')
#read as GeoJson and use shape()
INPUT_FILE = 'shape/myshape.geojson'
geo_json = geopandas.read_file(INPUT_FILE)
geo_json.geom_type
That will give you geom_type for each geometry in the dataframe. Maybe check geopandas documentation to get more familiar with the concept.
I have this code:
import pandas as pd
import numpy as np
from geopandas import GeoDataFrame
import geopandas
from shapely.geometry import LineString, Point
import matplotlib.pyplot as plt
import contextily
''' Do Something'''
df = start_stop_df.drop('track', axis=1)
crs = {'init': 'epsg:4326'}
gdf = GeoDataFrame(df, crs=crs, geometry=geometry)
ax = gdf.plot()
contextily.add_basemap(ax)
ax.set_axis_off()
plt.show()
Basically, this generates a background map that is in Singapore. However, when I run it, I get the following error: HTTPError: Tile URL resulted in a 404 error. Double-check your tile url:http://tile.stamen.com/terrain/29/268436843/268435436.png
However, it still produces this image:
How can I change the Tile URL? I would still like to have the map of Singapore as the base layer.
EDIT:
Also tried including this argument to add_basemap:
url ='https://www.openstreetmap.org/#map=12/1.3332/103.7987'
Which produced this error:
OSError: cannot identify image file <_io.BytesIO object at 0x000001CC3CC4BC50>
First make sure that your GeoDataframe is in Web Mercator projection (epsg=3857). Once your Geodataframe is correctly georeferenced, you can achieve this by Geopandas reprojection:
df = df.to_crs(epsg=3857)
Once you have this done, you easily choose any of the supported map styles. A full list can be found in contextily.sources module, at the time of writing:
### Tile provider sources ###
ST_TONER = 'http://tile.stamen.com/toner/tileZ/tileX/tileY.png'
ST_TONER_HYBRID = 'http://tile.stamen.com/toner-hybrid/tileZ/tileX/tileY.png'
ST_TONER_LABELS = 'http://tile.stamen.com/toner-labels/tileZ/tileX/tileY.png'
ST_TONER_LINES = 'http://tile.stamen.com/toner-lines/tileZ/tileX/tileY.png'
ST_TONER_BACKGROUND = 'http://tile.stamen.com/toner-background/tileZ/tileX/tileY.png'
ST_TONER_LITE = 'http://tile.stamen.com/toner-lite/tileZ/tileX/tileY.png'
ST_TERRAIN = 'http://tile.stamen.com/terrain/tileZ/tileX/tileY.png'
ST_TERRAIN_LABELS = 'http://tile.stamen.com/terrain-labels/tileZ/tileX/tileY.png'
ST_TERRAIN_LINES = 'http://tile.stamen.com/terrain-lines/tileZ/tileX/tileY.png'
ST_TERRAIN_BACKGROUND = 'http://tile.stamen.com/terrain-background/tileZ/tileX/tileY.png'
ST_WATERCOLOR = 'http://tile.stamen.com/watercolor/tileZ/tileX/tileY.png'
# OpenStreetMap as an alternative
OSM_A = 'http://a.tile.openstreetmap.org/tileZ/tileX/tileY.png'
OSM_B = 'http://b.tile.openstreetmap.org/tileZ/tileX/tileY.png'
OSM_C = 'http://c.tile.openstreetmap.org/tileZ/tileX/tileY.png'
Keep in mind that you should not be adding actual x,y,z tile numbers in your tile URL (like you did in your "EDIT" example). ctx will take care of all this.
You can find a working copy-pastable example and further info at GeoPandas docs.
import contextily as ctx
# Dataframe you want to plot
gdf = GeoDataFrame(df, crs= {"init": "epsg:4326"}) # Create a georeferenced dataframe
gdf = gdf.to_crs(epsg=3857) # reproject it in Web mercator
ax = gdf.plot()
# choose any of the supported maps from ctx.sources
ctx.add_basemap(ax, url=ctx.sources.ST_TERRAIN)
ax.set_axis_off()
plt.show()
Contextily's default crs is epsg:3857. However, your data-frame is in different CRS. Use the following,refer the manual here:
ctx.add_basemap(ax, crs='epsg:4326', source=ctx.providers.Stamen.TonerLite)
Please, refer to this link for using different sources such as Stamen.Toner, Stamen.Terrain etc. (Stamen.Terrain is used as default).
Also, you can cast your data frame to EPSG:3857 by using df.to_crs(). In this case, you should skip crs argument inside ctx.add_basemap() function.
im too new to add a comment but I wanted to point out to those saying in the comments that they get a 404 error. Check you capitaliations, etc. Stamen's urls are specifc on this. For instance there is not an all caps call. It is only capitalize the first letter. For example:
ctx.add_basemap(ax=ax,url=ctx.providers.Stamen.Toner, zoom=10)