How to convert shapefile/geojson to hexagons using uber h3 in python?

I want to create hexagons on my geographic map and want to preserve the digital boundary specified by the shapefile/geojson as well.
How do I do it using Uber's H3 Python library?
I'm new to shapefiles and other geographic data structures. I'm most comfortable with Python.

tl;dr
For convenience, you can use H3-Pandas.
import geopandas as gpd
import h3pandas
# Prepare data (gpd.datasets was removed in GeoPandas 1.0, so this line needs an older GeoPandas)
gdf = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf = gdf.loc[gdf['continent'].eq('Africa')]
gdf['gdp_md_per_capita'] = gdf['gdp_md_est'].div(gdf['pop_est'])
resolution = 4
# Resample to H3 cells
gdf.h3.polyfill_resample(resolution)
Full description
Use cases like this are precisely why I wrote H3-Pandas, an integration of H3 with Pandas and GeoPandas.
Let's use the naturalearth_lowres dataset included in GeoPandas to demonstrate the usage.
import geopandas as gpd
import h3pandas
path_shapefile = gpd.datasets.get_path('naturalearth_lowres')
gdf = gpd.read_file(path_shapefile)
Let's also create a numeric column for plotting.
gdf = gdf.loc[gdf['continent'].eq('Africa')]
gdf['gdp_md_per_capita'] = gdf['gdp_md_est'].div(gdf['pop_est'])
ax = gdf.plot(figsize=(15, 15), column='gdp_md_per_capita', cmap='RdBu')
ax.axis('off')
We will use H3 resolution 4. See the H3 resolution table for more details.
resolution = 4
We can add H3 hexagons using the function polyfill. This method adds a list of H3 cells whose centroid falls into each polygon.
gdf_h3 = gdf.h3.polyfill(resolution)
print(gdf_h3['h3_polyfill'].head(3))
1 [846aca7ffffffff, 8496b5dffffffff, 847b691ffff...
2 [84551a9ffffffff, 84551cdffffffff, 8455239ffff...
11 [846af51ffffffff, 8496e63ffffffff, 846a803ffff...
Name: h3_polyfill, dtype: object
If we want one row per H3 cell instead of one list per polygon, we can use the parameter explode:
gdf_h3 = gdf.h3.polyfill(resolution, explode=True)
print(gdf_h3.head(3))
pop_est continent name iso_a3 gdp_md_est \
1 53950935 Africa Tanzania TZA 150600.0
1 53950935 Africa Tanzania TZA 150600.0
1 53950935 Africa Tanzania TZA 150600.0
geometry gdp_md_per_capita \
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 0.002791
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 0.002791
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 0.002791
h3_polyfill
1 846aca7ffffffff
1 8496b5dffffffff
1 847b691ffffffff
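The exploded frame has the same shape you would get from pandas' own `DataFrame.explode` on the list column: the other columns are repeated and the original index is kept, which is why index 1 (Tanzania) appears on every row above. A toy illustration with plain pandas; the frame and the cell ids are made up.

```python
import pandas as pd

# Toy frame standing in for the polyfill output: one row per polygon,
# with a list of (hypothetical) H3 cell ids in the 'h3_polyfill' column.
df = pd.DataFrame(
    {
        "name": ["A", "B"],
        "h3_polyfill": [["cell_1", "cell_2", "cell_3"], ["cell_4"]],
    },
    index=[1, 2],
)

# explode() emits one row per list element, repeats the other columns
# and keeps the original index (so index 1 appears three times).
exploded = df.explode("h3_polyfill")
print(exploded)
```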
We can then use the method h3_to_geo_boundary to obtain the geometries of the H3 cells. It expects the index to already contain the H3 cell IDs.
gdf_h3 = gdf_h3.set_index('h3_polyfill').h3.h3_to_geo_boundary()
We can now plot the result
ax = gdf_h3.plot(figsize=(15, 15), column='gdp_md_per_capita', cmap='RdBu')
ax.axis('off')
H3-Pandas also has a convenience function that performs all of this at once: polyfill_resample
gdf_h3 = gdf.h3.polyfill_resample(resolution)
ax = gdf_h3.plot(figsize=(15, 15), column='gdp_md_per_capita', cmap='RdBu')
ax.axis('off')

h3-py doesn't know how to work with shapefile data directly, but it sounds like you could use a library like https://github.com/GeospatialPython/pyshp to convert your shapefile data to GeoJSON, and then use h3.polyfill() to convert it to a collection of H3 hexagons.
There are lots of options for plotting your boundary along with the H3 hexagons. For example, you might use pydeck and its GeoJsonLayer and H3HexagonLayer layers.
If your plotting software can't use H3 hexagons directly, you can convert them to other formats with functions like h3.h3_to_geo_boundary() or h3.h3_set_to_multi_polygon().

Converting a GeoJSON to Uber's h3 is quite simple.
Attaching a sample code snippet and GeoJSON used below:
GeoJSON:
{"type": "FeatureCollection","features": [{"type": "Feature","properties": {},"geometry": {"type": "Polygon", "coordinates": [[[77.5250244140625,13.00857192009273],[77.51266479492188,12.971103764892034],[77.52777099609375,12.94099133483504],[77.57171630859375,12.907528813968185],[77.60604858398438,12.914890953258695],[77.662353515625,12.928276105253065],[77.69874572753906,12.961066692801282],[77.65823364257812,13.00990996390651],[77.58956909179688,13.04469656691112],[77.53944396972656,13.038007215169166],[77.5250244140625,13.00857192009273]]]}}]}
Code:
import json
from h3converter import h3converter
with open("sampleJson.geojson") as geojson_raw:
    geojson = json.load(geojson_raw)
h3_list = []
h3_resolution = 7
for feature in geojson["features"]:
    # polyfill each feature's polygon geometry at the chosen resolution
    feature_h3 = h3converter.polyfill(feature["geometry"], h3_resolution)
    h3_list.append(feature_h3)
    print("H3's created => ", len(feature_h3))
print(h3_list)
Response:
[{'8761892edffffff', '8760145b0ffffff', '87618925effffff', '87618925bffffff', '87618924affffff', '876189256ffffff', '8760145b1ffffff', '8761892eaffffff', '8761892eeffffff', '876189253ffffff', '876189259ffffff', '8760145b2ffffff', '876014586ffffff', '8760145b4ffffff', '8761892e1ffffff', '8760145a2ffffff', '8761892ecffffff', '876189251ffffff', '8760145a4ffffff', '8761892e5ffffff', '87618925affffff', '8761892e9ffffff', '8761892cdffffff', '876189250ffffff', '87618925dffffff', '8760145b6ffffff', '876014595ffffff', '876189252ffffff', '8761892ebffffff', '8760145a3ffffff', '8760145a6ffffff', '876014584ffffff', '876189258ffffff', '8760145b5ffffff', '8760145b3ffffff', '876014594ffffff', '8761892c9ffffff', '87618925cffffff', '8760145a0ffffff', '8761892e8ffffff'}]
Package used: https://pypi.org/project/h3converter/
You could use the above package to convert h3 to GeoJSON as well.

You can extract GeoJSON from Shapely like this:
import geopandas as gpd
import shapely.geometry
# Create Geometry
shapely_polygon = shapely.geometry.Polygon([(0, 0), (0, 1), (1, 0)])
# Extract JSON Feature Collection
gpd.GeoSeries([shapely_polygon]).__geo_interface__
Output:
{"type":"FeatureCollection","features":[{"id":"0","type":"Feature","properties":{},"geometry":{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]},"bbox":[0.0,0.0,1.0,1.0]}],"bbox":[0.0,0.0,1.0,1.0]}
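If you only need the geometry part (for example to pass to `h3.polyfill`, or to `h3converter.polyfill` as in the answer above, which takes `feature["geometry"]`), `shapely.geometry.mapping` returns the bare GeoJSON geometry dict and `shapely.geometry.shape` converts it back. A short sketch with the same triangle:

```python
import json

from shapely.geometry import Polygon, mapping, shape

polygon = Polygon([(0, 0), (0, 1), (1, 0)])

# mapping() returns only the GeoJSON geometry dict (no FeatureCollection wrapper);
# Shapely closes the ring, so the first vertex is repeated at the end.
geometry = mapping(polygon)
print(json.dumps(geometry))

# shape() goes the other way: GeoJSON geometry dict -> Shapely object
roundtrip = shape(geometry)
print(roundtrip.equals(polygon))
```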

Related

Extract values from xarray dataset using geopandas multilinestring

I have a few hundred geopandas multilinestrings that trace along an object of interest (one line each week over a few years tracing the Gulf Stream) and I want to use those lines to extract values from a few other xarray datasets to know sea surface temperature, chlorophyll-a, and other variables along this path each week.
I'm unsure though how exactly to use these geopandas lines to extract values from the xarray datasets. I have thought about breaking them into points and grabbing the dataset values at each point but that seems a bit cumbersome. Is there any straightforward way to do this operation?
Breaking the lines into points and then extracting the values at those points is actually quite straightforward!
import geopandas as gpd
import numpy as np
import shapely.geometry as sg
import xarray as xr
# Setup an example DataArray:
y = np.arange(20.0)
x = np.arange(20.0)
da = xr.DataArray(
    data=np.random.rand(y.size, x.size),
    coords={"y": y, "x": x},
    dims=["y", "x"],
)
# Setup an example geodataframe:
gdf = gpd.GeoDataFrame(
    geometry=[
        sg.LineString([(0.0, 0.0), (5.0, 5.0)]),
        sg.LineString([(10.0, 10.0), (15.0, 15.0)]),
    ]
)
# Get the centroids, and create the indexers for the DataArray:
centroids = gdf.centroid
x_indexer = xr.DataArray(centroids.x, dims=["point"])
y_indexer = xr.DataArray(centroids.y, dims=["point"])
# Grab the results:
da.sel(x=x_indexer, y=y_indexer, method="nearest")
<xarray.DataArray (point: 2)>
array([0.80121949, 0.34728138])
Coordinates:
y (point) float64 3.0 13.0
x (point) float64 3.0 13.0
* point (point) int64 0 1
The main thing is to decide which points you'd like to sample, and how many.
Note that the geometry objects in the geodataframe also have an interpolate method, if you'd like to draw values at specific points along the trajectory:
https://shapely.readthedocs.io/en/stable/manual.html#object.interpolate
In such a case, .apply can come in handy:
gdf.geometry.apply(lambda geom: geom.interpolate(3.0))
0 POINT (2.12132 2.12132)
1 POINT (12.12132 12.12132)
Name: geometry, dtype: geometry
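To sample along the whole trajectory rather than only at the centroid, `interpolate` and the vectorised `.sel` above can be combined. A minimal sketch, assuming the grid's dimensions are named `x` and `y` and that a fixed number of evenly spaced samples per line is what you want; `sample_along_line` is a hypothetical helper and the toy grid (value = 100 * y + x) stands in for the real SST or chlorophyll data. Distances are in coordinate units (degrees for lat/lon data), and for a MultiLineString you would apply it to each part in `.geoms`.

```python
import numpy as np
import shapely.geometry as sg
import xarray as xr


def sample_along_line(da, line, n_points):
    """Nearest-neighbour samples of `da` at n_points evenly spaced along `line`."""
    # Evenly spaced distances from the start (0) to the end (line.length)
    distances = np.linspace(0.0, line.length, n_points)
    points = [line.interpolate(d) for d in distances]
    # Pointwise indexing: both indexers share the "point" dimension
    x_indexer = xr.DataArray([p.x for p in points], dims=["point"])
    y_indexer = xr.DataArray([p.y for p in points], dims=["point"])
    return da.sel(x=x_indexer, y=y_indexer, method="nearest")


# Toy grid where each value encodes its position: value = 100 * y + x
y = np.arange(20.0)
x = np.arange(20.0)
da = xr.DataArray(100 * y[:, None] + x[None, :], coords={"y": y, "x": x}, dims=["y", "x"])

line = sg.LineString([(0.0, 0.0), (5.0, 5.0)])
result = sample_along_line(da, line, 6)
print(result.values)  # values 0, 101, 202, 303, 404, 505
```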
I have used regionmask and it is pretty fast and easy to use. The mask_geopandas method is what you need.
Since GeoPandas builds on pandas, another option is to bring all the data into a single structure. xarray can build a Dataset from a pandas DataFrame with:
xr.Dataset.from_dataframe(df)
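A small sketch of that conversion, assuming a DataFrame with one row per grid point and plain `y`/`x` coordinate columns (the `sst` values are made up). A GeoDataFrame's geometry column would not become a sensible variable, so drop it or turn it into coordinate columns first.

```python
import pandas as pd
import xarray as xr

# Tabular data: one row per (y, x) grid point, with a hypothetical SST column
df = pd.DataFrame({
    "y": [0.0, 0.0, 1.0, 1.0],
    "x": [0.0, 1.0, 0.0, 1.0],
    "sst": [20.1, 20.4, 19.8, 20.0],
})

# from_dataframe turns each index level into a dimension of the Dataset
ds = xr.Dataset.from_dataframe(df.set_index(["y", "x"]))
print(ds)
print(ds["sst"].sel(y=1.0, x=0.0).item())
```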

PANDAS-GEOPANDAS: Localization of points in a shapefile

Using pandas and geopandas, I would like to define a function to be applied to each row of a dataframe which operates as follows:
INPUT: column with coordinates
OUTPUT: zone in which the point falls.
I tried with this, but it takes very long.
import numpy as np
def zone_assign(point, zones, codes):
    try:
        zone_label = zones[zones['geometry'].contains(point)][codes].values[0]
    except IndexError:  # no zone contains the point
        zone_label = np.nan
    return zone_label
where:
point is the cell of the row which contains geographical coordinates;
zones is the shapefile imported with geopandas;
codes is the column of the shapefile which contains label to be assigned to the point.
Part of this answer is taken from an earlier answer of mine that needed within rather than contains.
Your situation looks like a typical case where spatial joins are useful. The idea of spatial joins is to merge data using geographic coordinates instead of using attributes.
Three possibilities in geopandas:
intersects
within
contains
It seems like you want contains, which is possible using the following syntax:
geopandas.sjoin(polygons, points, how="inner", predicate='contains')
Note: GeoPandas 0.10 and later call this keyword predicate; older versions used op. Older versions also need rtree installed to perform such operations. If you need to install this dependency, use pip or conda.
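The three predicates differ mainly in argument order and in how boundaries are treated. A small Shapely sketch with a made-up square: `contains` and `within` are mirror images, so `sjoin(polygons, points, predicate='contains')` should match the same pairs as `sjoin(points, polygons, predicate='within')`, while a point lying exactly on the edge satisfies only `intersects`.

```python
from shapely.geometry import Point, Polygon

square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])

# (contains, within, intersects) for a point inside, on the edge and outside
results = {
    label: (square.contains(pt), pt.within(square), square.intersects(pt))
    for label, pt in [("inside", Point(1, 1)), ("on_edge", Point(2, 1)), ("outside", Point(3, 3))]
}
for label, (contains, within, intersects) in results.items():
    print(f"{label:8} contains={contains} within={within} intersects={intersects}")
```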
Example
As an example, let's take a random sample of cities and plot the countries they fall in. The two example datasets are:
import geopandas
import matplotlib.pyplot as plt
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))
cities = cities.sample(n=50, random_state=1)
world.head(2)
pop_est continent name iso_a3 gdp_md_est geometry
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
cities.head(3)
name geometry
196 Bogota POINT (-74.08529 4.59837)
95 Tbilisi POINT (44.78885 41.72696)
173 Seoul POINT (126.99779 37.56829)
world is a worldwide dataset and cities is a subset.
Both datasets need to be in the same projection system (CRS). If not, use .to_crs before merging.
data_merged = geopandas.sjoin(world, cities, how="inner", predicate='contains')
Finally, let's map the result:
f, ax = plt.subplots(1, figsize=(20,10))
data_merged.plot(ax=ax)
world.plot(ax=ax, alpha=0.25, linewidth=0.1)
plt.show()
and the underlying dataset merges together the information we need
data_merged.head(2)
pop_est continent name_left iso_a3 gdp_md_est geometry index_right name_right
7 6909701 Oceania Papua New Guinea PNG 28020.0 MULTIPOLYGON (((141.00021 -2.60015, 142.73525 ... 59 Port Moresby
9 44293293 South America Argentina ARG 879400.0 MULTIPOLYGON (((-68.63401 -52.63637, -68.25000... 182 Buenos Aires
Here I used an inner join, but how is a parameter you can change if, for instance, you want to keep all points, including those not within any polygon.
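Applied to the original question, a vectorised replacement for `zone_assign` might look like the sketch below. It assumes GeoPandas 0.10 or later (for `predicate=`) and that points and zones share a CRS, and it puts the points on the left with `how="left"` so that points outside every zone get NaN, like the `except` branch of the original function. The toy zones and points are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def zone_assign(points, zones, codes):
    """One zone label per point; NaN where no zone contains the point.

    If zones overlap, only the first match for each point is kept.
    """
    joined = gpd.sjoin(points, zones[[codes, "geometry"]], how="left", predicate="within")
    joined = joined[~joined.index.duplicated(keep="first")]
    return joined[codes].reindex(points.index)


zones = gpd.GeoDataFrame(
    {"code": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
points = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)
labels = zone_assign(points, zones, "code")
print(labels)  # A, B, and NaN for the point outside both zones
```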

Overlaying Shapefile datapoints on Density Map

I am new to shapefiles and mapping in python so I was hoping to get some help with overlaying data points from a shapefile on a density map.
To be honest, I am a beginner with mapping and reading in shapefiles, so I don't have much so far.
I have started off using pyshp but if there are better packages out there to do this then I would love any feedback.
The following code is to create the base map of the LA area:
import folium
def get_base_map(rides_clean):
    # center the map on the mean start location of the rides
    return folium.Map(location=[rides_clean.start_lat.mean(),
                                rides_clean.start_lon.mean()],
                      zoom_start=20, tiles='cartodbpositron')
The following code is to create the density/heat map:
from folium import plugins
stationArr = rides_clean[['start_lat', 'start_lon']][:40000].to_numpy()
get_base_map(rides_clean).add_child(plugins.HeatMap(stationArr,
                                                    radius=40, max_val=300))
The following code is the same heat map but with route lines added:
(draw_route_lines(get_base_map(rides_clean),
routedf_vol)).add_child(plugins.HeatMap(stationArr, radius=40,
max_val=300))
I want to see data points from the shapefile shown as markers on top of the density plot.
It is possible to do this with pyshp. I've only ever used Matplotlib to plot shapefile points on a map, but this method will create two arrays which will be the x and y coordinates of each point you'd like to plot. The first snippet is used if you have multiple shapes in your shapefile, while the second can be used if you only have one shape.
import shapefile
import numpy as np
sf = shapefile.Reader('/path/to/shapefile')
point_list = []
for shape in sf.iterShapes():
    # shape.points is a list of (x, y) pairs; collect them all in one flat list
    point_list.extend(shape.points)
point_list = np.array(point_list)
x = point_list[:,0]
y = point_list[:,1]
And for a shapefile with only a single shape:
import shapefile
import numpy as np
sf = shapefile.Reader('/path/to/shapefile')
point_list = np.array(sf.shape(0).points)
x = point_list[:,0]
y = point_list[:,1]
You can tell how many shapes are in your shapefile with len(sf.shapes()); sf.shapes() returns a list of all the shapes. From your question it appeared you wanted to plot the data as point markers rather than lines; sorry if this is not the case.

Plotting with folium

The task is to make an address popularity map for Moscow. Basically, it should look like this:
https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/GeoJSON_and_choropleth.ipynb
For my map I use public geojson: http://gis-lab.info/qa/moscow-atd.html
The only data I have is point coordinates; there's no information about the district each point belongs to.
Question 1:
Do I have to manually check for each district whether the point belongs to it, or is there a more effective way to do this?
Question 2:
If there is no easier way, how can I get all the coordinates for each district from the geojson file (link above)?
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
Reading in the Moscow area shape file with geopandas
districts = gpd.read_file('mo-shape/mo.shp')
Construct a mock user dataset
moscow = [55.7, 37.6]
data = (
np.random.normal(size=(100, 2)) *
np.array([[.25, .25]]) +
np.array([moscow])
)
my_df = pd.DataFrame(data, columns=['lat', 'lon'])
my_df['pop'] = np.random.randint(500, 100000, size=len(data))
Create Point objects from the user data
geom = [Point(x, y) for x,y in zip(my_df['lon'], my_df['lat'])]
# and a geopandas dataframe using the same crs from the shape file
my_gdf = gpd.GeoDataFrame(my_df, geometry=geom)
my_gdf.crs = districts.crs
Then do the join, using the default how='inner':
gpd.sjoin(districts, my_gdf, predicate='contains')
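For the popularity map, the joined frame can then be aggregated per district and attached back to the district polygons, ready for a choropleth (for example `districts.plot(column="n_points")` or folium's `Choropleth`). A sketch under assumptions: points and districts share a CRS, GeoPandas 0.10+ for `predicate=`, and `name_col` is whatever column holds the district name in the real shapefile; the two rectangular districts and the points are made up.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def points_per_district(districts, points, name_col):
    """Number of points inside each district, aligned with `districts` (0 where none)."""
    joined = gpd.sjoin(points, districts[[name_col, "geometry"]], how="inner", predicate="within")
    counts = joined.groupby(name_col).size()
    # Keep every district, even those without points, so the choropleth has no gaps
    return districts[name_col].map(counts).fillna(0).astype(int)


districts = gpd.GeoDataFrame(
    {"name": ["north", "south"]},
    geometry=[box(37.5, 55.75, 37.7, 55.9), box(37.5, 55.6, 37.7, 55.75)],
    crs="EPSG:4326",
)
points = gpd.GeoDataFrame(
    geometry=[Point(37.6, 55.8), Point(37.55, 55.85), Point(38.5, 55.0)],
    crs="EPSG:4326",
)
districts["n_points"] = points_per_district(districts, points, "name")
print(districts[["name", "n_points"]])
```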
Thanks to @BobHaffner, I tried to solve the problem using geopandas.
Here are my steps:
I downloaded shapefiles for Moscow using this link.
From a list of tuples containing x and y (longitude and latitude) coordinates, I created a list of shapely Points (see the Shapely docs).
Assuming that the dataframe from the first link holds the district polygons, I can write a simple loop that checks whether each Point is inside each polygon.
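The simple loop described above can be sketched with Shapely alone. It assumes the districts are given as a dict of name to polygon (hypothetical boxes near Moscow here) and the points as (lon, lat) tuples; `assign_district` is a hypothetical helper. It checks every point against every polygon, so the work grows as points × districts; the `sjoin` in the previous answer uses a spatial index instead and is usually much faster on real data.

```python
from shapely.geometry import Point, Polygon


def assign_district(points, districts):
    """For each (lon, lat) pair, return the first district containing it, or None."""
    result = []
    for lon, lat in points:
        pt = Point(lon, lat)  # Shapely uses x = longitude, y = latitude
        name = next((n for n, poly in districts.items() if poly.contains(pt)), None)
        result.append(name)
    return result


districts = {
    "west": Polygon([(37.5, 55.6), (37.6, 55.6), (37.6, 55.8), (37.5, 55.8)]),
    "east": Polygon([(37.6, 55.6), (37.7, 55.6), (37.7, 55.8), (37.6, 55.8)]),
}
labels = assign_district([(37.55, 55.7), (37.65, 55.7), (38.0, 55.7)], districts)
print(labels)
```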

Combining OSMnx Multipolygons

What are best practices in generating a union of Multipolygons acquired as a group using OSMnx's gdf_from_places()?
In gboeing's 02-example-osm-to-shapefile.ipynb example, multiple shapefiles are downloaded from OSM to a geodataframe using the gdf_from_places() method. The geometry is stored as Multipolygons in a GeoPandas GeoDataFrame, with each row representing a place.
# you can pass multiple queries with mixed types (dicts and strings)
mx_gt_tx = ox.gdf_from_places(queries=[{'country':'Mexico'}, 'Guatemala', {'state':'Texas'}])
mx_gt_tx = ox.project_gdf(mx_gt_tx)
fig, ax = ox.plot_shape(mx_gt_tx)
Regarding the question, I've experimented with GeoPandas' GeoSeries.unary_union but wanted to know how others were accomplishing this programmatically in Python.
Current Process 2018
This method uses Shapely's unary_union function (the GeoPandas equivalent would be mx_gt_tx["geometry"].unary_union, as pointed out in @joris's comment).
import geopandas as gpd
import matplotlib.pyplot as plt
import osmnx as ox  # gdf_from_places is from OSMnx < 1.0 (newer versions: ox.geocode_to_gdf)
from shapely.ops import unary_union
queries = [{'country':'Mexico'}, 'Guatemala', {'state':'Texas'}]
mx_gt_tx = ox.gdf_from_places(queries, gdf_name='region_mx_gt_tx')
mx_gt_tx
# project the geometry to the appropriate UTM zone then plot it
mx_gt_tx = ox.project_gdf(mx_gt_tx)
fig, ax = ox.plot_shape(mx_gt_tx)
# unary union through Shapely, wrapped back into a GeoSeries
region_mx_gt_tx = gpd.GeoSeries(unary_union(mx_gt_tx["geometry"]))
region_mx_gt_tx.plot(color='blue')
plt.show()
print(region_mx_gt_tx)
import osmnx as ox
gdf = ox.gdf_from_places(queries=[{'country':'Mexico'}, 'Guatemala', {'state':'Texas'}])
unified = gdf.unary_union
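What `unary_union` does with the place polygons can be seen on a toy example: overlapping parts are dissolved into one shape, while disjoint parts remain separate members of a MultiPolygon. A Shapely-only sketch with made-up squares. On GeoPandas 1.0 and later, the `GeoSeries.unary_union` property is deprecated in favour of `GeoSeries.union_all()`, which returns the same merged geometry.

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Two overlapping squares plus one disjoint square (stand-ins for place polygons)
parts = [box(0, 0, 2, 2), box(1, 1, 3, 3), box(10, 10, 11, 11)]

merged = unary_union(parts)
print(merged.geom_type)  # MultiPolygon: the overlapping pair merges, the far square stays separate
print(merged.area)       # 8.0 = (4 + 4 - 1 overlap) + 1
```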
