I have a shapefile which I am converting into a geopandas dataframe. The crs of this shapefile is epsg:4326 and I want to convert this to epsg:3857.
import geopandas as gp
file_path = 'zip_poly.shp'
sf = gp.read_file(file_path)
sf = sf[['ZIP_CODE','PO_NAME','STATE','geometry']]
sf = sf.to_crs(epsg=3857)
You can find the shapefile here.
The shapefile has around 33K zipcodes and it's taking too long to convert. Is there a better way to convert quickly?
Related
I have a CSV file in which the 2nd and 3rd rows hold the lat and long values. The file contains temperature data for India from 2011 to 2099, and I want to keep only the data for the Satulaj basin, using the shapefile of the Satulaj basin. How do I do this in Python?
import shapefile
from shapely.geometry import shape, Point
import pandas as pd
path="D:\\THESIS\\Others\\DharamVeer_Sir\\1_Future_Climate Data\\"
df = pd.read_csv(path+"test3.csv")
path1 = "D:\\THESIS\\Others\\DharamVeer_Sir\\satulaj (1)\\Satluj\\"
# read your shapefile
r = shapefile.Reader(path1+"satulaj.shp")
# get the shapes
shapes = r.shapes()
# build a shapely polygon from your shape
polygon = shape(shapes[0])
def check(lon, lat):
    # build a shapely point from your geopoint
    point = Point(lon, lat)
    # the contains function does exactly what you want
    return polygon.contains(point)
for i in range(len(df.axes[1])):
    sfile = df.values[0][i]
    dst = df.values[1][i]
    print(check(sfile,dst))
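A minimal sketch of the filtering step, assuming a wide table with one column per grid point, where the first data row holds the longitude and the second the latitude (the order in which the loop above passes them to `check`). The rectangle `basin`, the column names `p1` to `p3`, and all values are hypothetical stand-ins for the real shapefile and CSV:

```python
import pandas as pd
from shapely.geometry import Point, Polygon

# Hypothetical stand-in for the basin outline: a simple lon/lat rectangle.
basin = Polygon([(76.0, 30.0), (79.0, 30.0), (79.0, 33.0), (76.0, 33.0)])

# Hypothetical wide table: one column per grid point.
# Row 0 = longitude, row 1 = latitude, remaining rows = temperature values.
df = pd.DataFrame({
    "p1": [77.0, 31.0, 25.1, 25.4],
    "p2": [80.5, 31.0, 27.3, 27.9],
    "p3": [78.2, 32.5, 19.8, 20.2],
})

lons = df.iloc[0]
lats = df.iloc[1]

# Keep only the columns whose (lon, lat) point lies inside the polygon.
inside = [col for col in df.columns
          if basin.contains(Point(lons[col], lats[col]))]
basin_df = df[inside]
print(inside)  # ['p1', 'p3']
```

With the real data, `basin` would be the polygon built from the shapefile, and `basin_df` could then be written out with `to_csv`.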
I am trying to convert a CSV file with WKT Polygon geometry to shapefile in Python, but cannot determine how to correctly integrate the geometry into a shapefile. Below is a segment of the CSV file:
ID Day Number Hours WKT
1 10 2 [12,12,13] POLYGON ((153.101112401 -27.797998206, 153.097860177 -27.807122487, 153.097715464 -27.8163131, 153.100598081 -27.821068293,...)
I am attempting to use the geopandas and shapely libraries and have found documentation that supports conversion from CSV to shapefile for Point geometry using latitude/longitude, but I cannot figure out how to do so without lat/lon and from Polygon geometry. When I attempt to plot the data, I get an "AttributeError: No geometry data set yet (expected in column 'geometry')". I can still generate a plot graphic, but there is no data associated with it. Once I can plot the data, I should be able to generate the desired shapefile output that preserves the attributes of the original CSV. Below is the code I am using:
import pandas as pd
import geopandas as gpd
from shapely import wkt
test_file = pd.read_csv("C:\\Users\\mdl518\\Desktop\\sample_data.csv") ## read the CSV
test_file['geometry'] = test_file.WKT.apply(wkt.loads) ## load the WKT geometry
test_file_gdf = gpd.GeoDataFrame(test_file, geometry='geometry') ## build the GeoDataFrame from the parsed geometry
test_file_gdf.plot(markersize = 1.5, figsize = (10,10)) ## plot the data
## Obtaining the ESRI WKT
ESRI_WKT = 'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]'
## Saving to shapefile
test_file_gdf.to_file(filename = "C:\\Users\\mdl518\\Desktop\\test_sample.shp", driver = "ESRI Shapefile", crs_wkt = ESRI_WKT)
I feel like this should otherwise be fairly straightforward, but I cannot figure out the missing steps in the geodataframe geometry integration. Any assistance is most appreciated!
I have a shapefile (called the source file hereafter) that I need to clip by a multi-polygon shapefile, so that I get one clipped shapefile per polygon. With geopandas I can clip the source file when I select each polygon from the multi-polygon shapefile by hand, but when I loop over the polygons to automate the clipping I get the following error:
Error:
TypeError: 'mask' should be GeoDataFrame, GeoSeries or(Multi)Polygon, got <class 'tuple'>
Code:
import geopandas as gpd
source = ('source-shapefile.shp')
mask = ('mask_shapefile.shp')
sourcefile = gpd.read_file(source)
maskfile = gpd.read_file(mask)
for row in maskfile.iterrows():
    gpd.clip(sourcefile, row)
Two points
https://geopandas.org/en/stable/docs/reference/api/geopandas.clip.html mask can be a GeoDataFrame, hence no need for looping
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html yields a tuple of the index value and the row as a Series. Hence your error: you are passing this tuple to clip()
Have constructed an example. It is far simpler to clip using a GeoDataFrame as the mask.
import geopandas as gpd
import pandas as pd
# lets build a mask for use in clip, multipolygons and polygons
maskfile = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
maskfile = maskfile.loc[maskfile["continent"].eq("Europe") & maskfile["name"].ne("Russia")].pipe(
lambda d: d.assign(gdp_grp=pd.cut(d["gdp_md_est"], bins=4, labels=list("abcd")))
).dissolve("gdp_grp").reset_index()
sourcefile = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# now clip, no looping needed
gpd.clip(sourcefile, maskfile)
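To see what `iterrows()` hands to the loop body, here is a minimal sketch using pandas only. The table, its column names, and the placeholder strings in the `geometry` column are hypothetical stand-ins for the mask shapefile:

```python
import pandas as pd

# Hypothetical stand-in for the mask table; plain strings replace real
# geometries so the sketch runs without geopandas.
maskfile = pd.DataFrame(
    {"name": ["north", "south"], "geometry": ["polygon-A", "polygon-B"]}
)

# Without unpacking, each item is an (index, row) tuple.
first = next(maskfile.iterrows())
print(type(first).__name__)  # tuple

# Unpack the tuple, then pick the geometry out of the row.
geoms = [row["geometry"] for _, row in maskfile.iterrows()]
print(geoms)  # ['polygon-A', 'polygon-B']
```

If a per-polygon loop is really needed, it is the unpacked row's geometry, not the tuple, that would be passed as the mask.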
Finally, after 5 hours of research, I am now able to clip the shapefile by a multi-polygon shapefile and save the clipped polygons separately with their respective names. The following code may be dirty, but it works.
code:
import geopandas as gpd
import pandas as pd
import os, sys
source = ('source-shapefile.shp')
mask = ('mask_shapefile.shp')
outpath = ('/outpath')
sourcefile = gpd.read_file(source)
maskfile = gpd.read_file(mask)
clipshape = maskfile.explode()
clipshape.set_index('CATCH_NAME', inplace=True) # CATCH_NAME is attribute column name
for index, row in clipshape['geometry'].items():  # iteritems() was removed in pandas 2.0
    clipped = gpd.clip(sourcefile, row)
    clipped.to_file(os.path.join(outpath, f'{index}.shp'))
I am trying to save the following geodataframe with columns (geometry, area, centroid, and boundary) to a json file using df.to_file('result.geojson', driver="GeoJSON"):
However, I get the following error because I have centroid and boundary columns.
TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fb7fff86940>' as a data type
This works perfectly fine when there is only geometry column and area column.
Also, when I save it as a CSV file and then read it back into geopandas, I am only able to convert geometry into the geometry datatype. However, centroid and boundary show up as object datatype. How do I convert them to the geometry datatype?
As per the comments, GeoJSON stores only one geometry per feature, and a GeoDataFrame has only one active geometry column
Hence your data frame has polygons as its geometry, and the other columns are series of shapely objects
I have simulated the data set. This clearly demonstrates that your data structure is not normalised: boundary, centroid and area are calculated columns, hence in relational theory not normalised
This can be saved as CSV; shapely objects will be encoded as WKT
This can then simply be loaded, with a second step to decode the WKT back into shapely objects
I have demonstrated that this works by plotting the geometries loaded from CSV
import geopandas as gpd
import pandas as pd
import shapely.wkt
from pathlib import Path
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# derive columns in question...
gdf["boundary"] = gdf["geometry"].apply(lambda p: p.boundary )
gdf["centroid"] = gdf["geometry"].apply(lambda p: p.centroid )
gdf["area"] = gdf["geometry"].apply(lambda p: p.area )
# save as CSV; the shapely objects are written out as WKT text
gdf.to_csv(Path.cwd().joinpath("SO_geom.csv"), index=False)
# load encoded dataframe
df = pd.read_csv(Path.cwd().joinpath("SO_geom.csv"))
# decode geometry columns as strings back into shapely objects
for c in ["geometry","boundary","centroid"]:
    df[c] = df[c].apply(shapely.wkt.loads)
# finally reconstruct geodataframe
gdf = gpd.GeoDataFrame(df)
# show it has worked
gdf.plot()
gpd.GeoSeries(gdf["boundary"]).plot()
gpd.GeoSeries(gdf["centroid"]).plot()
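If GeoJSON output is still the goal, one option is to keep a single geometry column and store the derived geometries as WKT text properties. This is a minimal sketch, not the only way to do it. The two polygons and the `name` column are hypothetical, and `to_json()` is used so that no file is written:

```python
import json

import geopandas as gpd
from shapely import wkt
from shapely.geometry import Polygon

# Hypothetical two-polygon frame standing in for the real data.
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(3, 0), (5, 0), (5, 1), (3, 1)]),
    ],
)
gdf["area"] = gdf.geometry.area
gdf["centroid"] = gdf.geometry.centroid
gdf["boundary"] = gdf.geometry.boundary

# Turn the extra geometry columns into WKT strings so that each feature
# carries exactly one real geometry.
out = gdf.copy()
for col in ["centroid", "boundary"]:
    out[col] = [geom.wkt for geom in out[col]]

# Serialise to GeoJSON in memory; the WKT strings end up as properties.
features = json.loads(out.to_json())["features"]

# The text can be decoded back into a shapely object when needed.
centroid = wkt.loads(features[0]["properties"]["centroid"])
print(centroid.x, centroid.y)
```

The same `out` frame could be passed to `to_file(..., driver="GeoJSON")` instead of `to_json()`.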
I read a .csv file as a dataframe that looks like the following:
import pandas as pd
df = pd.read_csv('myFile.csv')
df.head()
BoroName geometry
0 Brooklyn MULTIPOLYGON (((-73.97604935657381 40.63127590...
1 Queens MULTIPOLYGON (((-73.80379022888098 40.77561011...
2 Queens MULTIPOLYGON (((-73.8610972440186 40.763664477...
3 Queens MULTIPOLYGON (((-73.75725671509139 40.71813860...
4 Manhattan MULTIPOLYGON (((-73.94607828674226 40.82126321...
I want to convert it to a geopandas dataframe.
import geopandas as gpd
crs = {'init': 'epsg:4326'}
gdf = gpd.GeoDataFrame(df, crs=crs).set_geometry('geometry')
but I get the following error
TypeError: Input must be valid geometry objects: MULTIPOLYGON (((-73.97604935657381 40.631275905646774, -73.97716511994669 40.63074665412933,....
Geopandas seems to be unable to convert a geometry column from a pandas dataframe.
Solution number 2
Try applying the shapely wkt.loads function on your column before converting your dataframe to a geodataframe.
from shapely import wkt
df['geometry'] = df['geometry'].apply(wkt.loads)
gdf = gpd.GeoDataFrame(df, crs='epsg:4326')
Good luck!
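An alternative sketch of the same idea, assuming a geopandas version that provides `GeoSeries.from_wkt` (0.9 or later). The two-row table is a hypothetical stand-in for the CSV, with simplified coordinates:

```python
import pandas as pd
import geopandas as gpd

# Hypothetical two-row table mimicking the CSV: geometry stored as WKT text.
df = pd.DataFrame({
    "BoroName": ["A", "B"],
    "geometry": [
        "MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)))",
        "MULTIPOLYGON (((2 2, 3 2, 3 3, 2 3, 2 2)))",
    ],
})

# Parse the WKT column in one call and attach the CRS at the same time.
gdf = gpd.GeoDataFrame(
    df.drop(columns="geometry"),
    geometry=gpd.GeoSeries.from_wkt(df["geometry"]),
    crs="EPSG:4326",
)
print(gdf.geom_type.tolist())  # ['MultiPolygon', 'MultiPolygon']
```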
Do not use - crashes spyder and jupyter kernel for some people
Solution number 1: Try loading the csv directly with geopandas
gdf = gpd.read_file('myFile.csv')
gdf.crs = 'epsg:4326'
You could also try this:
gdf = gpd.GeoDataFrame(
df, geometry=gpd.points_from_xy(df.longitude, df.latitude)
)
This will convert those lat/long columns to points
Dudes, there are WKT geometry strings in the original dataframe, not xy columns, so I would suggest reading this:
DataFrame with WKT Column to GeoPandas Geometry
Geopandas puts a geometry column at the end if you load directly. Found this out by experimenting with column names and it worked