I have a DataFrame stored as a CSV file, one column of which contains Polygon objects. However, this column is stored as strings instead of GeoPandas geometry objects. How can I convert this column to GeoPandas geometry objects so that I can perform geo analysis?
This is what my data looks like:
my_df['geometry'].head()
0 POLYGON ((-122.419942 37.809021, -122.419938 3...
1 POLYGON ((-122.419942 37.809021, -122.419938 3...
2 POLYGON ((-122.419942 37.809021, -122.419938 3...
3 POLYGON ((-122.419942 37.809021, -122.419938 3...
4 POLYGON ((-122.405659 37.806674, -122.405974 3...
Name: geometry, dtype: object
I want to convert this pandas DataFrame to a GeoPandas GeoDataFrame, using the column 'geometry' as the geometry column.
my_geo_df = gpd.GeoDataFrame(my_df, geometry=my_df['geometry'])
However, as the column is stored as strings, gpd.GeoDataFrame() does not recognize it and therefore cannot create a GeoDataFrame:
TypeError: Input geometry column must contain valid geometry objects.
The format of your polygons is WKT (well-known text), so you have to convert the strings to shapely Polygon objects first. Following the GeoPandas docs (https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html), do the following.
Using GeoPandas 0.9+:
my_df['geometry'] = gpd.GeoSeries.from_wkt(my_df['geometry'])
my_geo_df = gpd.GeoDataFrame(my_df, geometry='geometry')
Using older versions:
from shapely import wkt
my_df['geometry'] = my_df['geometry'].apply(wkt.loads)
my_geo_df = gpd.GeoDataFrame(my_df, geometry='geometry')
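A self-contained sketch of the GeoPandas 0.9+ route, for trying it out before touching the real file. It assumes the coordinates are WGS84 longitude/latitude (EPSG:4326 is an assumption; the question does not state a CRS), and the two polygon strings are hypothetical stand-ins for the truncated values shown above.

```python
import io

import geopandas as gpd
import pandas as pd

# Hypothetical CSV content; WKT fields are quoted because they contain commas
csv_text = '''name,geometry
a,"POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"
b,"POLYGON ((2 2, 4 2, 4 4, 2 4, 2 2))"
'''

my_df = pd.read_csv(io.StringIO(csv_text))  # 'geometry' is read as plain text

# Parse the WKT strings into shapely geometries (vectorised, GeoPandas >= 0.9)
my_df["geometry"] = gpd.GeoSeries.from_wkt(my_df["geometry"])

# Assumed CRS (WGS84 lon/lat); replace with whatever your data actually uses
my_geo_df = gpd.GeoDataFrame(my_df, geometry="geometry", crs="EPSG:4326")

print(my_geo_df.geom_type.tolist())  # ['Polygon', 'Polygon']
```

Setting the CRS at construction time matters because WKT strings carry no CRS information, so it is lost whenever geometries go through a CSV.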
I have a geopandas dataframe that looks like this:
shape_id geometry
1000252 LINESTRING (4.91790 52.34725, 4.91797 52.34715...
1000254 LINESTRING (4.80382 52.34495, 4.80413 52.34500...
1000255 LINESTRING (4.89922 52.37811, 4.89923 52.37807...
With Python, I would like to extract the coordinates in the geometry column for each shape_id row individually as a list. For example, the output for shape_id = 1000252 should be as follows:
[[52.34725, 4.91790],
[52.34715, 4.91797],
[52.34742, 4.91723],
[52.34752, 4.91713]]
What is the most efficient way to achieve this?
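Each shapely LineString object has a coords attribute, a coordinate sequence of the points defining the line, and its xy attribute returns the x and y values as a tuple of two arrays. A bit of extra NumPy will get you to a stacked list of lists (note that this gives [x, y] pairs, i.e. [longitude, latitude]; reverse each pair if you need [latitude, longitude] as in the question):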
# e.g. for position 40...
In [3]: np.vstack(gdf.iloc[40].geometry.coords.xy).T.tolist()
Out[3]:
[[0.7741171421283728, 1.715569328873729],
[0.5852143769680165, 1.4516089839272017],
[0.378452363108969, 1.2226445706965148],
[0.43147551026039477, 0.7940308770193946],
[0.3105453476502247, 0.770655256832471],
[0.13440130471131118, 0.2957373776736154],
[0.6793980801823408, 1.4291149753156192],
[0.25803877234174954, 0.5296081932347322],
[0.12773596566152468, 0.6238335508304359],
[0.1575172393070674, 0.44929138014961945],
[0.2222528104586241, 0.8623618596533595],
[0.8185687868071416, 1.5897595726257494]]
See the shapely docs on coordinate sequences for more info.
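To get one list per shape_id in the [latitude, longitude] order asked for in the question, a minimal sketch (assuming 2D coordinates stored as x = longitude, y = latitude, as in the sample) is to iterate over coords and swap each pair. The first LineString is built from the four points of the expected output for shape_id 1000252; the second uses only the first two points visible in the truncated row for 1000254.

```python
import geopandas as gpd
from shapely.geometry import LineString

gdf = gpd.GeoDataFrame(
    {
        "shape_id": [1000252, 1000254],
        "geometry": [
            # Points from the expected output in the question (x = lon, y = lat)
            LineString([(4.91790, 52.34725), (4.91797, 52.34715),
                        (4.91723, 52.34742), (4.91713, 52.34752)]),
            # First two points of the truncated second row
            LineString([(4.80382, 52.34495), (4.80413, 52.34500)]),
        ],
    },
    geometry="geometry",
)


def coords_lat_lon(geom):
    """Return [[lat, lon], ...] for a LineString whose coords are (lon, lat[, z])."""
    return [[c[1], c[0]] for c in geom.coords]


# Map each shape_id to its list of [lat, lon] pairs
coords_by_id = {
    sid: coords_lat_lon(geom) for sid, geom in zip(gdf["shape_id"], gdf.geometry)
}

print(coords_by_id[1000252])
# [[52.34725, 4.9179], [52.34715, 4.91797], [52.34742, 4.91723], [52.34752, 4.91713]]
```

Iterating over coords avoids the NumPy round trip; if you prefer the vstack approach from the answer, np.vstack(geom.coords.xy)[::-1].T.tolist() gives the same [lat, lon] order.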
I have the following Python code to read my shapefile features into a GeoDataFrame, building point geometries from the x and y columns.
import math
import shapely.geometry
import geopandas as gpd
from shapely.ops import nearest_points
absolute_path_to_shapefile = 'c:/test/test1.shp'
gdf1 = gpd.read_file(absolute_path_to_shapefile)
gdf = gpd.GeoDataFrame(
gdf1, geometry=gpd.points_from_xy(gdf1['x'], gdf1['y']))
Is there a way to limit the features read in? Some shapefiles have millions of points but I just want to read in the first 100 as proof of concept.
GeoPandas read_file() has a rows option to limit the number of rows read (or to use a slice to read specific rows).
import math
import shapely.geometry
import geopandas as gpd
from shapely.ops import nearest_points
absolute_path_to_shapefile = 'c:/test/test1.shp'
gdf1 = gpd.read_file(absolute_path_to_shapefile, rows=100)
gdf = gpd.GeoDataFrame(gdf1, geometry=gpd.points_from_xy(gdf1['x'], gdf1['y']))
GeoPandas documentation
geopandas.read_file(filename, bbox=None, mask=None, rows=None, **kwargs)
Returns a GeoDataFrame from a file or URL.
Parameters
filename: str, path object or file-like object
Either the absolute or relative path to the file or URL to be opened, or any object with a read() method (such as an open file or StringIO)
bbox: tuple | GeoDataFrame or GeoSeries | shapely Geometry, default None
Filter features by given bounding box, GeoSeries, GeoDataFrame or a shapely geometry. CRS mis-matches are resolved if given a GeoSeries or GeoDataFrame. Tuple is (minx, miny, maxx, maxy) to match the bounds property of shapely geometry objects. Cannot be used with mask.
mask: dict | GeoDataFrame or GeoSeries | shapely Geometry, default None
Filter for features that intersect with the given dict-like geojson geometry, GeoSeries, GeoDataFrame or shapely geometry. CRS mis-matches are resolved if given a GeoSeries or GeoDataFrame. Cannot be used with bbox.
rows: int or slice, default None
Load in specific rows by passing an integer (first n rows) or a slice() object.
**kwargs :
Keyword args to be passed to the open or BytesCollection method in the fiona library when opening the file. For more information on possible keywords, type: import fiona; help(fiona.open)
Returns
geopandas.GeoDataFrame or pandas.DataFrame :
If ignore_geometry=True a pandas.DataFrame will be returned.
I am trying to convert a CSV file with WKT Polygon geometry to a shapefile in Python, but cannot determine how to correctly integrate the geometry into the shapefile. Below is a segment of the CSV file:
ID Day Number Hours WKT
1 10 2 [12,12,13] POLYGON ((153.101112401 -27.797998206, 153.097860177 -27.807122487, 153.097715464 -27.8163131, 153.100598081 -27.821068293,...)
I am attempting to use the geopandas and shapely libraries and have found documentation on converting a CSV to a shapefile from Point geometry using latitude/longitude columns, but I cannot figure out how to do so without lat/lon and from Polygon geometry. When I attempt to plot the data, I get "AttributeError: No geometry data set yet (expected in column 'geometry')". I can still generate a plot graphic, but there is no data associated with it. Once I can plot the data, I should be able to generate the desired shapefile output that preserves the attributes of the original CSV. Below is the code I am using:
import pandas as pd
import geopandas as gpd
from shapely import wkt
test_file = pd.read_csv("C:\\Users\\mdl518\\Desktop\\sample_data.csv") ## read the CSV
test_file['geometry'] = test_file.WKT.apply(wkt.loads) ## load the WKT geometry
gdf = gpd.GeoDataFrame(test_file, geometry='geometry') ## load CRS into Geodataframe
test_file_gdf.plot(markersize = 1.5, figsize = (10,10)) ## plot the data
## Obtaining the ESRI WKT
ESRI_WKT = 'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]'
## Saving to shapefile
test_file_gdf.to_file(filename = "C:\\Users\\mdl518\\Desktop\\test_sample.shp", driver = "ESRI Shapefile", crs_wkt = ESRI_WKT)
I feel like this should otherwise be fairly straightforward, but I cannot figure out the missing steps in the geodataframe geometry integration, any assistance is most appreciated!
I am trying to save a GeoDataFrame with the columns geometry, area, centroid, and boundary to a GeoJSON file using df.to_file('result.geojson', driver="GeoJSON").
However, I get the following error because I have centroid and boundary columns.
TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fb7fff86940>' as a data type
This works perfectly fine when there is only geometry column and area column.
Also, when I save it as a CSV file and then read it back as a GeoDataFrame, I can only convert geometry into the geometry dtype; centroid and boundary show up as object dtype. How do I convert them to geometry dtype?
As per the comments, GeoJSON (and the other file formats GeoPandas writes with to_file) supports only one geometry per feature. A GeoDataFrame has a single active geometry column (here the polygons); additional geometry columns such as boundary and centroid cannot be written as ordinary attributes, which is why to_file fails.
I have simulated a dataset. Note that your data structure is not normalised: boundary, centroid and area are calculated columns derived from geometry, so in relational terms they are redundant.
It can be saved as CSV, in which case the shapely objects are encoded as WKT.
It can then simply be loaded, with a second step to decode the WKT back into shapely objects.
I have demonstrated that this works by plotting the geometries loaded from the CSV.
import geopandas as gpd
import pandas as pd
import shapely.wkt
from pathlib import Path
# note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0;
# on newer versions, read any polygon dataset (e.g. a Natural Earth download) instead
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# derive columns in question...
gdf["boundary"] = gdf["geometry"].apply(lambda p: p.boundary)
gdf["centroid"] = gdf["geometry"].apply(lambda p: p.centroid)
gdf["area"] = gdf["geometry"].apply(lambda p: p.area)
# save as CSV; geometries are written via their string form, which is WKT
gdf.to_csv(Path.cwd().joinpath("SO_geom.csv"), index=False)
# load encoded dataframe
df = pd.read_csv(Path.cwd().joinpath("SO_geom.csv"))
# decode geometry columns from WKT strings back into shapely objects
for c in ["geometry", "boundary", "centroid"]:
    df[c] = df[c].apply(shapely.wkt.loads)
# finally reconstruct the GeoDataFrame; the CSV does not store the CRS, so set it again
gdf = gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")
# show it has worked
gdf.plot()
gpd.GeoSeries(gdf["boundary"]).plot()
gpd.GeoSeries(gdf["centroid"]).plot()
I read a .csv file as a dataframe that looks like the following:
import pandas as pd
df = pd.read_csv('myFile.csv')
df.head()
BoroName geometry
0 Brooklyn MULTIPOLYGON (((-73.97604935657381 40.63127590...
1 Queens MULTIPOLYGON (((-73.80379022888098 40.77561011...
2 Queens MULTIPOLYGON (((-73.8610972440186 40.763664477...
3 Queens MULTIPOLYGON (((-73.75725671509139 40.71813860...
4 Manhattan MULTIPOLYGON (((-73.94607828674226 40.82126321...
I want to convert it to a geopandas dataframe.
import geopandas as gpd
crs = {'init': 'epsg:4326'}
gdf = gpd.GeoDataFrame(df, crs=crs).set_geometry('geometry')
but I get the following error:
TypeError: Input must be valid geometry objects: MULTIPOLYGON (((-73.97604935657381 40.631275905646774, -73.97716511994669 40.63074665412933,....
Geopandas seems to be unable to convert a geometry column from a pandas dataframe.
Solution number 2
Try applying the shapely wkt.loads function on your column before converting your dataframe to a geodataframe.
from shapely import wkt
df['geometry'] = df['geometry'].apply(wkt.loads)
gdf = gpd.GeoDataFrame(df, crs='epsg:4326')
Good luck!
Solution number 1 (do not use: it crashes the Spyder and Jupyter kernels for some people): try loading the CSV directly with GeoPandas
gdf = gpd.read_file('myFile.csv')
gdf.crs = 'epsg:4326'
You could also try this:
gdf = gpd.GeoDataFrame(
df, geometry=gpd.points_from_xy(df.longitude, df.latitude)
)
This will convert those lat/long columns to points
Dudes, the original DataFrame contains WKT geometry strings, not x/y columns, so I would suggest reading this:
DataFrame with WKT Column to GeoPandas Geometry
GeoPandas puts the geometry column at the end if you load the file directly. I found this out by experimenting with column names, and it worked.