I am trying to convert a CSV file with WKT Polygon geometry to shapefile in Python, but cannot determine how to correctly integrate the geometry into a shapefile. Below is a segment of the CSV file:
ID Day Number Hours WKT
1 10 2 [12,12,13] POLYGON ((153.101112401 -27.797998206, 153.097860177 -27.807122487, 153.097715464 -27.8163131, 153.100598081 -27.821068293,...)
I am attempting to use the geopandas and shapely libraries and have found documentation that supports conversion from CSV to shapefile for Point geometry using latitude/longitude, but I cannot figure out how to do so without lat/lon and from Polygon geometry. When I attempt to plot the data, I get an "AttributeError: No geometry data set yet (expected in column 'geometry')". I can still generate a plot graphic, but there is no data associated with it. Once I can plot the data, I should be able to generate the desired shapefile output that preserves the attributes of the original CSV. Below is the code I am using:
import pandas as pd
import geopandas as gpd
from shapely import wkt
test_file = pd.read_csv("C:\\Users\\mdl518\\Desktop\\sample_data.csv") ## read the CSV
test_file['geometry'] = test_file.WKT.apply(wkt.loads) ## load the WKT geometry
test_file_gdf = gpd.GeoDataFrame(test_file, geometry='geometry') ## build the GeoDataFrame from the parsed geometry
test_file_gdf.plot(markersize = 1.5, figsize = (10,10)) ## plot the data
## Obtaining the ESRI WKT
ESRI_WKT = 'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]'
## Saving to shapefile
test_file_gdf.to_file(filename = "C:\\Users\\mdl518\\Desktop\\test_sample.shp", driver = "ESRI Shapefile", crs_wkt = ESRI_WKT)
I feel like this should otherwise be fairly straightforward, but I cannot figure out the missing steps in the geodataframe geometry integration, any assistance is most appreciated!
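A minimal sketch of how the conversion described above could look, not a definitive answer. It assumes a comma-separated file with the same columns as the sample; the two inline rows, the simplified polygon coordinates, and the temporary output directory are made up for illustration. It also assumes the coordinates are WGS 84 longitude/latitude (EPSG:4326), which the ESRI WKT in the question suggests.

```python
import io
import tempfile
from pathlib import Path

import geopandas as gpd
import pandas as pd

# Hypothetical two-row sample standing in for sample_data.csv
csv_text = '''ID,Day,Number,Hours,WKT
1,10,2,"[12,12,13]","POLYGON ((153.10 -27.79, 153.09 -27.80, 153.10 -27.82, 153.10 -27.79))"
2,11,3,"[9,10]","POLYGON ((153.20 -27.70, 153.19 -27.71, 153.20 -27.73, 153.20 -27.70))"
'''
df = pd.read_csv(io.StringIO(csv_text))

# Parse the WKT strings into shapely geometries
geometry = gpd.GeoSeries.from_wkt(df["WKT"])

# Build the GeoDataFrame with an explicit geometry and CRS (EPSG:4326 is an assumption)
gdf = gpd.GeoDataFrame(df.drop(columns="WKT"), geometry=geometry, crs="EPSG:4326")

# Write to a shapefile in a temporary directory; the attribute columns are kept
out_path = Path(tempfile.mkdtemp()) / "test_sample.shp"
gdf.to_file(out_path, driver="ESRI Shapefile")
```

Passing the CRS when the GeoDataFrame is created means `to_file` needs no separate CRS argument. Note that the shapefile format limits field names to 10 characters, so longer column names would be truncated.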
Related
I have a shapefile (called the source-file hereafter), which I need to clip by a multi-polygon shapefile so that I get one clipped shapefile per polygon. Using geopandas, I am able to clip the source-file by selecting each polygon from the multi-polygon shapefile individually, but when I try to loop over the polygons to automate the clipping process I get the following error:
Error:
TypeError: 'mask' should be GeoDataFrame, GeoSeries or(Multi)Polygon, got <class 'tuple'>
Code:
import geopandas as gpd
source = ('source-shapefile.shp')
mask = ('mask_shapefile.shp')
sourcefile = gpd.read_file(source)
maskfile = gpd.read_file(mask)
for row in maskfile.iterrows():
    gpd.clip(sourcefile, row)
Two points
https://geopandas.org/en/stable/docs/reference/api/geopandas.clip.html mask can be a GeoDataFrame, hence no need for looping
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html yields a tuple of the index value and the row as a Series. Hence your error: you are passing this tuple to clip()
I have constructed an example. It is far simpler to clip using a GeoDataFrame as the mask.
import geopandas as gpd
import pandas as pd
# let's build a mask for use in clip, multipolygons and polygons
# note: gpd.datasets was removed in GeoPandas 1.0, so this example needs an older release
maskfile = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
maskfile = (
    maskfile.loc[maskfile["continent"].eq("Europe") & maskfile["name"].ne("Russia")]
    .pipe(lambda d: d.assign(gdp_grp=pd.cut(d["gdp_md_est"], bins=4, labels=list("abcd"))))
    .dissolve("gdp_grp")
    .reset_index()
)
sourcefile = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# now clip, no looping needed
gpd.clip(sourcefile, maskfile)
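If one result per mask polygon is still wanted, the tuple from `iterrows()` can be unpacked and only the geometry passed to `clip()`. The following is a small sketch under stated assumptions: the point and box geometries, the `name` and `zone` columns, and the `clipped_by_zone` dictionary are invented for illustration and stand in for the real shapefiles.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-ins for the source and mask shapefiles
source = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)
mask = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

clipped_by_zone = {}
# iterrows() yields (index, row); unpack it instead of passing the tuple on
for idx, row in mask.iterrows():
    # pass the shapely polygon itself, which clip() accepts as a mask
    clipped_by_zone[row["zone"]] = gpd.clip(source, row["geometry"])
```

Each entry of `clipped_by_zone` is a GeoDataFrame holding only the source features that fall inside that mask polygon.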
Finally, after 5 hours of research, I am now able to clip the shapefile by a multi-polygon shapefile and save the clipped polygons separately with their respective names. The following code may be dirty, but it works.
code:
import os
import geopandas as gpd
source = 'source-shapefile.shp'
mask = 'mask_shapefile.shp'
outpath = '/outpath'
sourcefile = gpd.read_file(source)
maskfile = gpd.read_file(mask)
# split multi-part geometries into single polygons
clipshape = maskfile.explode()
clipshape.set_index('CATCH_NAME', inplace=True) # CATCH_NAME is attribute column name
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for index, geom in clipshape['geometry'].items():
    clipped = gpd.clip(sourcefile, geom)
    clipped.to_file(os.path.join(outpath, f'{index}.shp'))
I am trying to save the following geodataframe with columns (geometry, area, centroid, and boundary) to a json file using df.to_file('result.geojson', driver="GeoJSON"):
However, I get the following error because I have centroid and boundary columns.
TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fb7fff86940>' as a data type
This works perfectly fine when there is only geometry column and area column.
Also, when I save it as a CSV file and then read it back with geopandas, I am only able to convert the geometry column into the geometry datatype. However, centroid and boundary show as the object datatype. How do I convert them to the geometry datatype?
As per the comments, GeoJSON supports only one geometry per feature, and a GeoDataFrame has only one active geometry column, which is the one written by to_file
Hence your data frame has polygons as its geometry, and the other columns are series of shapely objects
I have simulated a data set. It shows that your data structure is not normalised: boundary, centroid and area are calculated columns, so in relational terms they are derived data
This can be saved as CSV; shapely objects will be encoded as WKT
It can then simply be loaded, with a second step to decode the WKT back into shapely objects
I have demonstrated that this works by plotting the geometries loaded from CSV
import geopandas as gpd
import pandas as pd
import shapely.wkt
from pathlib import Path
# note: gpd.datasets was removed in GeoPandas 1.0, so this example needs an older release
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# derive columns in question...
gdf["boundary"] = gdf["geometry"].apply(lambda p: p.boundary )
gdf["centroid"] = gdf["geometry"].apply(lambda p: p.centroid )
gdf["area"] = gdf["geometry"].apply(lambda p: p.area )
# save as CSV; the shapely geometries are written out as WKT strings
gdf.to_csv(Path.cwd().joinpath("SO_geom.csv"), index=False)
# load encoded dataframe
df = pd.read_csv(Path.cwd().joinpath("SO_geom.csv"))
# decode geometry columns as strings back into shapely objects
for c in ["geometry","boundary","centroid"]:
    df[c] = df[c].apply(shapely.wkt.loads)
# finally reconstruct geodataframe
gdf = gpd.GeoDataFrame(df)
# show it has worked
gdf.plot()
gpd.GeoSeries(gdf["boundary"]).plot()
gpd.GeoSeries(gdf["centroid"]).plot()
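For the GeoJSON case in the question, one possible workaround is to turn the extra geometry columns into WKT strings before writing, and decode them after reading. This is only a sketch: the two box polygons and the `id` column are made up, no CRS is set to keep it short, and `to_json()` plus `from_features()` are used in place of writing and reading an actual file.

```python
import json

import geopandas as gpd
from shapely.geometry import box

# Hypothetical data: two rectangles with derived geometry columns
gdf = gpd.GeoDataFrame({"id": [1, 2]}, geometry=[box(0, 0, 2, 2), box(2, 0, 3, 4)])
gdf["centroid"] = gdf.geometry.centroid
gdf["boundary"] = gdf.geometry.boundary
gdf["area"] = gdf.geometry.area

extra_geometry_columns = ["centroid", "boundary"]

# Encode the non-active geometry columns as WKT text so each feature has one geometry
out = gdf.copy()
for col in extra_geometry_columns:
    out[col] = gpd.GeoSeries(out[col]).to_wkt()
geojson_text = out.to_json()

# Read the features back and decode the WKT columns into geometry dtype
restored = gpd.GeoDataFrame.from_features(json.loads(geojson_text)["features"])
for col in extra_geometry_columns:
    restored[col] = gpd.GeoSeries.from_wkt(restored[col])
```

After the round trip, `restored["centroid"]` and `restored["boundary"]` have the geometry dtype again, and `gpd.GeoSeries.from_wkt` is also what would convert those columns after loading them from a CSV.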
I'm a beginner with shapely and I'm trying to read a shapefile, save it as GeoJSON, and then use shape() in order to see the geometry type.
according to the doc, shape():
shapely.geometry.shape(context) Returns a new, independent geometry
with coordinates copied from the context.
Saving the shapefile as geoJson seems to work but for some reason when I try to use shape() on the geoJson I get error:
ValueError: Unknown geometry type: featurecollection
This is my script:
import geopandas
import numpy as np
from shapely.geometry import shape, Polygon, MultiPolygon, MultiLineString
#read shapefile:
myshpfile = geopandas.read_file('shape/myshape.shp')
myshpfile.to_file('myshape.geojson', driver='GeoJSON')
#read as GeoJson and use shape()
INPUT_FILE = 'shape/myshape.geojson'
geo_json = geopandas.read_file(INPUT_FILE)
#try to use shape()
geom = shape(geo_json)
>>>ValueError: Unknown geometry type: featurecollection
I have also tried to select the geometry by slicing, but that does not seem to be possible.
#try to use shape()
geom = shape(geo_json.iloc[:,9])
>>>TypeError: '(slice(None, None, None), 9)' is an invalid key
Right now I can't get past this step, but my end goal is to be able to get the geometry type when I print geom.geom_type (at the moment I get the error before reaching that point).
Edit: when I check the type of the saved GeoJSON after reading it, I get "geopandas.geodataframe.GeoDataFrame"
Your geo_json object is a geopandas.GeoDataFrame, which has a column of shapely geometries. There's no need to call shape. If you want to check geom_type, there's an easy way to do that directly.
import geopandas
# read the shapefile and save it as GeoJSON
myshpfile = geopandas.read_file('shape/myshape.shp')
myshpfile.to_file('shape/myshape.geojson', driver='GeoJSON')
# read the GeoJSON back; the result is already a GeoDataFrame, so shape() is not needed
INPUT_FILE = 'shape/myshape.geojson'
geo_json = geopandas.read_file(INPUT_FILE)
geo_json.geom_type
That will give you geom_type for each geometry in the dataframe. Maybe check geopandas documentation to get more familiar with the concept.
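To illustrate the difference, here is a small sketch with an invented in-memory FeatureCollection (the two features and the `name` property are assumptions for the example). `shape()` expects a single geometry mapping, so it works on each feature's `geometry` member but not on the collection as a whole, while a GeoDataFrame exposes `geom_type` directly.

```python
import geopandas as gpd
from shapely.geometry import shape

# Hypothetical GeoJSON content, as it would look after json.load() on a file
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "triangle"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"name": "line"},
            "geometry": {"type": "LineString", "coordinates": [[0, 0], [2, 2]]},
        },
    ],
}

# Option 1: call shape() on each feature's geometry mapping
types_from_shape = [
    shape(feature["geometry"]).geom_type for feature in feature_collection["features"]
]

# Option 2: build a GeoDataFrame and read geom_type for every row at once
gdf = gpd.GeoDataFrame.from_features(feature_collection["features"])
types_from_gdf = gdf.geom_type.tolist()
```

Both lists contain one geometry type name per feature, in the same order as the features.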
I am trying to set the crs of a geopandas object as described here.
The example file can be downloaded from here
import pandas as pd
import geopandas as gpd
df = pd.read_pickle('myShp.pickle')
I upload the screenshot to show the values of the coordinates
then if I try to change the crs the values of the polygon don't change
tmp = gpd.GeoDataFrame(df, geometry='geometry')
tmp.crs = {'init' :'epsg:32618'}
I show again the screenshot
If I try:
import pandas as pd
import geopandas as gpd
df = pd.read_pickle('myShp.pickle')
df = gpd.GeoDataFrame(df, geometry='geometry')
dfNew=df.to_crs(epsg=32618)
I get:
ValueError: Cannot transform naive geometries. Please set a crs on the object first.
Setting the crs like:
gdf.crs = {'init' :'epsg:32618'}
does not transform your data, it only sets the CRS (it basically says: "my data is represented in this CRS"). In most cases, the CRS is already set while reading the data with geopandas.read_file (if your file has CRS information). So you only need the above when your data has no CRS information yet.
If you actually want to convert the coordinates to a different CRS, you can use the to_crs method:
gdf_new = gdf.to_crs(epsg=32618)
See https://geopandas.readthedocs.io/en/latest/projections.html
super late answer, but it's:
tmp.set_crs(...)
Use this to define whatever coordinate system the data is in, i.e. 'telling the computer what coordinate system we started in'
Then;
tmp.to_crs(...)
Use this to change to your new preferred crs.
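The two steps can be sketched on a tiny example. The single point and the assumption that the data starts out as WGS 84 longitude/latitude (EPSG:4326) are made up for illustration; the real starting CRS has to be whatever the pickled data was actually recorded in.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical data with no CRS attached, like the frame loaded from the pickle
gdf = gpd.GeoDataFrame({"name": ["sample"]}, geometry=[Point(-75.0, 40.0)])
has_no_crs = gdf.crs is None

# Step 1: declare the CRS the coordinates are already in (values are not changed)
labelled = gdf.set_crs(epsg=4326)

# Step 2: reproject to the target CRS (values are recomputed)
projected = labelled.to_crs(epsg=32618)

print(labelled.geometry.iloc[0])   # still longitude/latitude
print(projected.geometry.iloc[0])  # UTM zone 18N easting/northing in metres
```

Calling `to_crs` before any CRS is set is what raises the "Cannot transform naive geometries" error shown in the question.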
I have a shapefile which I am converting into a geopandas dataframe. The crs of this shapefile is epsg:4326 and I want to convert this to epsg:3857.
import geopandas as gp
file_path = 'zip_poly.shp'
sf = gp.read_file(file_path)
sf = sf[['ZIP_CODE','PO_NAME','STATE','geometry']]
sf = sf.to_crs(epsg=3857)
You can find the shapefile here.
The shapefile has around 33K zipcodes and it's taking too long to convert. Is there a better way to convert quickly?