I have a geotiff file.
import xarray as xr
import matplotlib.pyplot as plt

urbanData = xr.open_rasterio('myGeotiff.tif')
plt.imshow(urbanData)
Here is the link to the file.
I can convert the file to a dataframe with the coordinates as points:
import geopandas as gpd

ur = xr.DataArray(urbanData, name='myData')
ur = ur.to_dataframe().reset_index()
gdfur = gpd.GeoDataFrame(ur, geometry=gpd.points_from_xy(ur.x, ur.y))
However, I would like to get a dataframe that contains the geometry of the pixels as polygons rather than points. Is that possible?
Somewhat to my surprise, I haven't really found a package that wraps rasterio.features to take DataArrays and produce GeoDataFrames.
These might be very useful though:
https://corteva.github.io/geocube/stable/
https://corteva.github.io/rioxarray/stable/
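For example, rioxarray can open the raster directly; note that it stores the CRS and transform on the .rio accessor rather than in da.attrs, which is what the helper below reads from (a minimal sketch, assuming rioxarray is installed):
import rioxarray

da = rioxarray.open_rasterio('myGeotiff.tif').squeeze('band', drop=True)
print(da.rio.crs)          # coordinate reference system of the raster
print(da.rio.transform())  # affine transform of the raster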
I generally use something like this:
import affine
import geopandas as gpd
import rasterio.features
import xarray as xr
import shapely.geometry as sg
def polygonize(da: xr.DataArray) -> gpd.GeoDataFrame:
    """
    Polygonize a 2D-DataArray into a GeoDataFrame of polygons.

    Parameters
    ----------
    da : xr.DataArray

    Returns
    -------
    polygonized : geopandas.GeoDataFrame
    """
    if da.dims != ("y", "x"):
        raise ValueError('Dimensions must be ("y", "x")')

    values = da.values
    transform = da.attrs.get("transform", None)
    if transform is None:
        raise ValueError("transform is required in da.attrs")
    transform = affine.Affine(*transform)
    shapes = rasterio.features.shapes(values, transform=transform)

    geometries = []
    colvalues = []
    for (geom, colval) in shapes:
        geometries.append(sg.Polygon(geom["coordinates"][0]))
        colvalues.append(colval)

    gdf = gpd.GeoDataFrame({"value": colvalues, "geometry": geometries})
    gdf.crs = da.attrs.get("crs")
    return gdf
Note that you should squeeze the band dimension from your DataArray first to make it 2D, after reading it with xr.open_rasterio:
urbanData = xr.open_rasterio('myGeotiff.tif').squeeze('band', drop=True)
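You can then call the helper on the squeezed array and inspect the result (a small usage sketch; the 'value' column comes from the function above):
gdf = polygonize(urbanData)
gdf.plot(column='value')  # quick visual check of the pixel polygons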
I have lists of coordinates in a CSV file. How should I convert them to polygons in a GeoDataFrame?
Below are the coordinates of one polygon; I have thousands of rows like this.
[118.103198,24.527338],[118.103224,24.527373],[118.103236,24.527366],[118.103209,24.527331],[118.103198,24.527338]
I tried the following code:
def bike_fence_format(s):
    s = s.replace('[', '').replace(']', '').split(',')
    return s
df['FENCE_LOC'] = df['FENCE_LOC'].apply(bike_fence_format)
df['LAT'] = df['FENCE_LOC'].apply(lambda x: x[1::2])
df['LON'] = df['FENCE_LOC'].apply(lambda x: x[::2])
df['geom'] = Polygon(zip(df['LON'].astype(str),df['LAT'].astype(str)))
But I failed at the last step, since df['LON'] is a Series, not a string. How can I get around this problem? It would be even better if there is an easier way to achieve my goal.
I recreated a sample df of what your .csv file would give (depending on how you read it in with .read_csv()).
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

df = pd.DataFrame({'FENCE_LOC': ['[32250,175889],[33913,180757],[29909,182124],[28246,177257],[32250,175889]',
                                 '[32250,175889],[33913,180757],[29909,182124],[28246,177257],[32250,175889]',
                                 '[32250,175889],[33913,180757],[29909,182124],[28246,177257],[32250,175889]']}, index=[0, 1, 2])
I modified your function slightly because we want numeric values, not strings:
def bike_fence_format(s):
    s = s.replace('[', '').replace(']', '').split(',')
    s = [float(x) for x in s]
    return s
df['FENCE_LOC'] = df['FENCE_LOC'].apply(bike_fence_format)
df['LAT'] = df['FENCE_LOC'].apply(lambda x: x[1::2])
df['LON'] = df['FENCE_LOC'].apply(lambda x: x[::2])
We can use some list comprehensions to build a list of Shapely polygons.
geom_list = [(x, y) for x, y in zip(df['LON'],df['LAT'])]
geom_list_2 = [Polygon(tuple(zip(x, y))) for x, y in geom_list]
Finally, we can create a gdf using our list of Shapely polygons.
polygon_gdf = gpd.GeoDataFrame(geometry=geom_list_2)
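If you want to keep the original columns next to the new geometry, you can also pass the existing df in (an optional variation):
polygon_gdf = gpd.GeoDataFrame(df, geometry=geom_list_2)
polygon_gdf.plot(edgecolor='black')  # quick visual check of the fences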
To make available a small representative dataset similar to what the OP posted as an image, I created these rows of data (sorry for the many decimal digits):
[[-2247824.100899419,-4996167.43201861],[-2247824.100899419,-4996067.43201861],[-2247724.100899419,-4996067.43201861],[-2247724.100899419,-4996167.43201861],[-2247824.100899419,-4996167.43201861]]
[[-2247724.100899419,-4996167.43201861],[-2247724.100899419,-4996067.43201861],[-2247624.100899419,-4996067.43201861],[-2247624.100899419,-4996167.43201861],[-2247724.100899419,-4996167.43201861]]
[[-2247624.100899419,-4996167.43201861],[-2247624.100899419,-4996067.43201861],[-2247524.100899419,-4996067.43201861],[-2247524.100899419,-4996167.43201861],[-2247624.100899419,-4996167.43201861]]
[[-2247824.100899419,-4996067.43201861],[-2247824.100899419,-4995967.43201861],[-2247724.100899419,-4995967.43201861],[-2247724.100899419,-4996067.43201861],[-2247824.100899419,-4996067.43201861]]
[[-2247724.100899419,-4996067.43201861],[-2247724.100899419,-4995967.43201861],[-2247624.100899419,-4995967.43201861],[-2247624.100899419,-4996067.43201861],[-2247724.100899419,-4996067.43201861]]
[[-2247624.100899419,-4996067.43201861],[-2247624.100899419,-4995967.43201861],[-2247524.100899419,-4995967.43201861],[-2247524.100899419,-4996067.43201861],[-2247624.100899419,-4996067.43201861]]
[[-2247824.100899419,-4995967.43201861],[-2247824.100899419,-4995867.43201861],[-2247724.100899419,-4995867.43201861],[-2247724.100899419,-4995967.43201861],[-2247824.100899419,-4995967.43201861]]
[[-2247724.100899419,-4995967.43201861],[-2247724.100899419,-4995867.43201861],[-2247624.100899419,-4995867.43201861],[-2247624.100899419,-4995967.43201861],[-2247724.100899419,-4995967.43201861]]
[[-2247624.100899419,-4995967.43201861],[-2247624.100899419,-4995867.43201861],[-2247524.100899419,-4995867.43201861],[-2247524.100899419,-4995967.43201861],[-2247624.100899419,-4995967.43201861]]
This data is saved as a polygon_data.csv file.
For the code, the required modules are loaded first:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon
Then, the data is read into a dataframe with pandas.read_csv(). To get each row of data into a single column of the dataframe, delimiter="x" is used; since there is no x within any row of the data, each whole row ends up as one long string.
df3 = pd.read_csv('polygon_data.csv', header=None, index_col=None, delimiter="x")
To view the content of df3, you can run
df3.head()
and get a single-column dataframe (with header: 0):
0
0 [[-2247824.100899419,-4996167.43201861],[-2247...
1 [[-2247724.100899419,-4996167.43201861],[-2247...
2 [[-2247624.100899419,-4996167.43201861],[-2247...
3 [[-2247824.100899419,-4996067.43201861],[-2247...
4 [[-2247724.100899419,-4996067.43201861],[-2247...
Next, df3 is used to create a GeoDataFrame. The data in each row of df3 is used to create a Polygon object to act as the geometry of the GeoDataFrame polygon_df3.
geometry = [Polygon(eval(xy_string)) for xy_string in df3[0]]
polygon_df3 = gpd.GeoDataFrame(df3,
                               # crs={'init': 'epsg:4326'},  # uncomment this if (x, y) is long/lat
                               geometry=geometry)
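If you prefer to avoid eval, each row in this sample data is also a valid JSON array and can be parsed with the standard library instead (a small alternative sketch):
import json

# each row like "[[x1,y1],[x2,y2],...]" parses directly into a list of [x, y] pairs
geometry = [Polygon(json.loads(xy_string)) for xy_string in df3[0]]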
Finally, the GeoDataFrame can be plotted with a simple command:
# this plots the GeoDataFrame
polygon_df3.plot(edgecolor='black')
In this particular case with my proposed data, the output plot is a 3 x 3 grid of square polygons.
I'm trying to create a choropleth in Python 3 using shapely, fiona & bokeh for display.
I have a file with about 7000 lines, each with the location of a town and a counter.
Example:
54.7604;9.55827;208
54.4004;9.95918;207
53.8434;9.95271;203
53.5979;10.0013;201
53.728;10.2526;197
53.646;10.0403;196
54.3977;10.1054;193
52.4385;9.39217;193
53.815;10.3476;192
...
I want to show these in a 12.5 km grid, for which a shapefile is available at
https://opendata-esri-de.opendata.arcgis.com/datasets/3c1f46241cbb4b669e18b002e4893711_0
The code I have works, but it's very slow, because it's a brute-force algorithm that checks each of the 7127 grid shapes against all of the 7000 points.
import pandas as pd
import fiona
from shapely.geometry import Polygon, Point, MultiPoint, MultiPolygon
from shapely.prepared import prep
sf = r'c:\Temp\geo_de\Hexagone_125_km\Hexagone_125_km.shp'
shp = fiona.open(sf)
district_xy = [ [ xy for xy in feat["geometry"]["coordinates"][0]] for feat in shp]
district_poly = [ Polygon(xy) for xy in district_xy] # coords to Polygon
df_p = pd.read_csv('points_file.csv', sep=';', header=None)
df_p.columns = ('lat', 'lon', 'count')
map_points = [Point(x,y) for x,y in zip(df_p.lon, df_p.lat)] # Convert Points to Shapely Points
all_points = MultiPoint(map_points) # all points
def calc_points_per_poly(poly, points, values):  # returns the total count for one polygon
    poly = prep(poly)
    return sum([v for p, v in zip(points, values) if poly.contains(p)])

# this is the slow part:
# for each shape this sums up the counts of all points falling inside it
sum_hex = [calc_points_per_poly(x, all_points, df_p['count']) for x in district_poly]
Since this is extremely slow, I'm wondering if there is a faster way to get the sum_hex values, especially since the real-world list of points may be a lot larger and a smaller grid with more shapes would deliver a better result.
I would recommend using geopandas and its built-in R-tree spatial index. It allows you to do the precise check only when there is a possibility that a point lies within a polygon.
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, Point
sf = 'Hexagone_125_km.shp'
shp = gpd.read_file(sf)
df_p = pd.read_csv('points_file.csv', sep=';', header=None)
df_p.columns = ('lat', 'lon', 'count')
gdf_p = gpd.GeoDataFrame(df_p, geometry=[Point(x,y) for x,y in zip(df_p.lon, df_p.lat)])
sum_hex = []
spatial_index = gdf_p.sindex
for index, row in shp.iterrows():
    polygon = row.geometry
    possible_matches_index = list(spatial_index.intersection(polygon.bounds))
    possible_matches = gdf_p.iloc[possible_matches_index]
    precise_matches = possible_matches[possible_matches.within(polygon)]
    sum_hex.append(sum(precise_matches['count']))
shp['sum'] = sum_hex
This solution should be much faster than yours. You can then plot your GeoDataFrame via Bokeh. If you want more details on spatial indexing, I recommend this article by Geoff Boeing: https://geoffboeing.com/2016/10/r-tree-spatial-index-python/
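A more compact alternative is a spatial join followed by a groupby, which uses the same spatial index under the hood (a sketch, assuming both layers share a CRS; the join keyword is op= in older geopandas and predicate= in newer releases):
# join each point to the hexagon it falls in, then sum the counts per hexagon
joined = gpd.sjoin(gdf_p, shp, how='inner', op='within')
sums = joined.groupby('index_right')['count'].sum()
shp['sum'] = sums.reindex(shp.index, fill_value=0)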
I have a geopandas df with a column of shapely point objects. I want to extract the coordinate (lat/lon) from the shapely point objects to generate latitude and longitude columns. There must be an easy way to do this, but I cannot figure it out.
I know you can extract the individual coordinates like this:
lon = df.point_object[0].x
lat = df.point_object[0].y
And I could create a function that does this for the entire df, but I figured there was a more efficient/elegant way.
If you have the latest version of geopandas (0.3.0 as of writing), and if df is a GeoDataFrame, you can use the x and y attributes on the geometry column:
df['lon'] = df.point_object.x
df['lat'] = df.point_object.y
In general, if you have a column of shapely objects, you can also use apply to do what you can do on individual coordinates for the full column:
df['lon'] = df.point_object.apply(lambda p: p.x)
df['lat'] = df.point_object.apply(lambda p: p.y)
Without having to iterate over the DataFrame, you can do the following:
df['lon'] = df['geometry'].x
df['lat'] = df['geometry'].y
A solution to extract the center point (latitude and longitude) from polygons and multi-polygons:
import geopandas as gpd
df = gpd.read_file(path + 'df.geojson')
# Find the center point
df['Center_point'] = df['geometry'].centroid

# Extract lat and lon from the center point
df["long"] = df.Center_point.map(lambda p: p.x)
df["lat"] = df.Center_point.map(lambda p: p.y)