What are best practices in generating a union of Multipolygons acquired as a group using OSMnx's gdf_from_places()?
In gboeing's 02-example-osm-to-shapefile.ipynb example, multiple place boundaries are downloaded from OSM into a GeoDataFrame using the gdf_from_places() method. The geometry is stored as MultiPolygons in a GeoPandas GeoDataFrame, with each row representing a place.
# you can pass multiple queries with mixed types (dicts and strings)
mx_gt_tx = ox.gdf_from_places(queries=[{'country':'Mexico'}, 'Guatemala', {'state':'Texas'}])
mx_gt_tx = ox.project_gdf(mx_gt_tx)
fig, ax = ox.plot_shape(mx_gt_tx)
Regarding the question, I've experimented with GeoPandas' GeoSeries.unary_union, but I wanted to know how others accomplish this programmatically in Python.
Current Process 2018
This method uses Shapely's unary_union function (equivalently, mx_gt_tx["geometry"].unary_union through GeoPandas, as pointed out by @joris in the comments).
queries = [{'country':'Mexico'}, 'Guatemala', {'state':'Texas'}]
mx_gt_tx = ox.gdf_from_places(queries, gdf_name='region_mx_gt_tx')
mx_gt_tx
# project the geometry to the appropriate UTM zone then plot it
mx_gt_tx = ox.project_gdf(mx_gt_tx)
fig, ax = ox.plot_shape(mx_gt_tx)
# unary union through Shapely, wrapped back into a GeoSeries
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.ops import unary_union
region_mx_gt_tx = gpd.GeoSeries(unary_union(mx_gt_tx["geometry"]))
region_mx_gt_tx.plot(color='blue')
plt.show()
print(region_mx_gt_tx)
import osmnx as ox
gdf = ox.gdf_from_places(queries=[{'country':'Mexico'}, 'Guatemala', {'state':'Texas'}])
unified = gdf.unary_union
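To see what unary_union actually does, here is a minimal, self-contained Shapely sketch using toy polygons (hypothetical stand-ins for the OSM boundaries above): overlapping geometries are dissolved into a single geometry.

```python
from shapely.geometry import Polygon
from shapely.ops import unary_union

# two overlapping 2x2 squares (toy stand-ins for place boundaries)
a = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
b = Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])

merged = unary_union([a, b])
print(merged.geom_type)  # Polygon -- the overlap is dissolved
print(merged.area)       # 7.0 (4 + 4 - 1 of overlap)
```

With disjoint inputs you would instead get a MultiPolygon back, which is what happens with Mexico/Guatemala/Texas above wherever the boundaries do not touch.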
Related
I want to create hexagons on my geographic map and want to preserve the digital boundary specified by the shapefile/geojson as well.
How do I do it using uber's h3 python library?
I'm new to shapefiles or any other geographic data structures. I'm most comfortable with python.
tl;dr
For convenience, you can use H3-Pandas.
import geopandas as gpd
import h3pandas
# Prepare data
gdf = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf = gdf.loc[gdf['continent'].eq('Africa')]
gdf['gdp_md_per_capita'] = gdf['gdp_md_est'].div(gdf['pop_est'])
resolution = 4
# Resample to H3 cells
gdf.h3.polyfill_resample(resolution)
Full description
Use cases like this are precisely why I wrote H3-Pandas, an integration of H3 with Pandas and GeoPandas.
Let's use the naturalearth_lowres dataset included in GeoPandas to demonstrate the usage.
import geopandas as gpd
import h3pandas
path_shapefile = gpd.datasets.get_path('naturalearth_lowres')
gdf = gpd.read_file(path_shapefile)
Let's also create a numeric column for plotting.
gdf = gdf.loc[gdf['continent'].eq('Africa')]
gdf['gdp_md_per_capita'] = gdf['gdp_md_est'].div(gdf['pop_est'])
ax = gdf.plot(figsize=(15, 15), column='gdp_md_per_capita', cmap='RdBu')
ax.axis('off')
We will use H3 resolution 4. See the H3 resolution table for more details.
resolution = 4
We can add H3 hexagons using the function polyfill. This method adds a list of H3 cells whose centroid falls into each polygon.
gdf_h3 = gdf.h3.polyfill(resolution)
print(gdf_h3['h3_polyfill'].head(3))
1 [846aca7ffffffff, 8496b5dffffffff, 847b691ffff...
2 [84551a9ffffffff, 84551cdffffffff, 8455239ffff...
11 [846af51ffffffff, 8496e63ffffffff, 846a803ffff...
Name: h3_polyfill, dtype: object
If we want to explode the values (ending up with one row per H3 cell), we can use the parameter explode:
gdf_h3 = gdf.h3.polyfill(resolution, explode=True)
print(gdf_h3.head(3))
pop_est continent name iso_a3 gdp_md_est \
1 53950935 Africa Tanzania TZA 150600.0
1 53950935 Africa Tanzania TZA 150600.0
1 53950935 Africa Tanzania TZA 150600.0
geometry gdp_md_per_capita \
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 0.002791
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 0.002791
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 0.002791
h3_polyfill
1 846aca7ffffffff
1 8496b5dffffffff
1 847b691ffffffff
We can then utilize the method h3_to_geo_boundary to obtain the geometries for the H3 cells. It expects that the index already has the H3 cell ids.
gdf_h3 = gdf_h3.set_index('h3_polyfill').h3.h3_to_geo_boundary()
We can now plot the result
ax = gdf_h3.plot(figsize=(15, 15), column='gdp_md_per_capita', cmap='RdBu')
ax.axis('off')
H3-Pandas also has a convenience function that performs all of this at once: polyfill_resample
gdf_h3 = gdf.h3.polyfill_resample(resolution)
ax = gdf_h3.plot(figsize=(15, 15), column='gdp_md_per_capita', cmap='RdBu')
ax.axis('off')
h3-py doesn't know how to work with shapefile data directly, but it sounds like you could use a library like https://github.com/GeospatialPython/pyshp to convert your shapefile data to GeoJSON, and then use h3.polyfill() to convert it to a collection of H3 hexagons.
There are lots of options for plotting your boundary along with the H3 hexagons. For example, you might use pydeck and its GeoJsonLayer and H3HexagonLayer layers.
If your plotting software can't use H3 hexagons directly, you can convert them to other formats with functions like h3.h3_to_geo_boundary() or h3.h3_set_to_multi_polygon().
Converting a GeoJSON to Uber's h3 is quite simple.
Attaching a sample code snippet and GeoJSON used below:
GeoJSON:
{"type": "FeatureCollection","features": [{"type": "Feature","properties": {},"geometry": {"type": "Polygon", "coordinates": [[[77.5250244140625,13.00857192009273],[77.51266479492188,12.971103764892034],[77.52777099609375,12.94099133483504],[77.57171630859375,12.907528813968185],[77.60604858398438,12.914890953258695],[77.662353515625,12.928276105253065],[77.69874572753906,12.961066692801282],[77.65823364257812,13.00990996390651],[77.58956909179688,13.04469656691112],[77.53944396972656,13.038007215169166],[77.5250244140625,13.00857192009273]]]}}]}
Code:
import json
from h3converter import h3converter

with open("sampleJson.geojson") as geojson_raw:
    geojson = json.load(geojson_raw)

h3_list = []
h3_resolution = 7
for feature in geojson["features"]:
    feature_h3 = h3converter.polyfill(feature["geometry"], h3_resolution)
    h3_list.append(feature_h3)
    print("H3's created => ", len(feature_h3))
print(h3_list)
Response:
[{'8761892edffffff', '8760145b0ffffff', '87618925effffff', '87618925bffffff', '87618924affffff', '876189256ffffff', '8760145b1ffffff', '8761892eaffffff', '8761892eeffffff', '876189253ffffff', '876189259ffffff', '8760145b2ffffff', '876014586ffffff', '8760145b4ffffff', '8761892e1ffffff', '8760145a2ffffff', '8761892ecffffff', '876189251ffffff', '8760145a4ffffff', '8761892e5ffffff', '87618925affffff', '8761892e9ffffff', '8761892cdffffff', '876189250ffffff', '87618925dffffff', '8760145b6ffffff', '876014595ffffff', '876189252ffffff', '8761892ebffffff', '8760145a3ffffff', '8760145a6ffffff', '876014584ffffff', '876189258ffffff', '8760145b5ffffff', '8760145b3ffffff', '876014594ffffff', '8761892c9ffffff', '87618925cffffff', '8760145a0ffffff', '8761892e8ffffff'}]
Package used: https://pypi.org/project/h3converter/
You could use the above package to convert h3 to GeoJSON as well.
You can extract GeoJSON from Shapely like this:
import geopandas as gpd
import shapely.geometry
# Create Geometry
shapely_polygon = shapely.geometry.Polygon([(0, 0), (0, 1), (1, 0)])
# Extract JSON Feature Collection
gpd.GeoSeries([shapely_polygon]).__geo_interface__
Output:
{"type":"FeatureCollection","features":[{"id":"0","type":"Feature","properties":{},"geometry":{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]},"bbox":[0.0,0.0,1.0,1.0]}],"bbox":[0.0,0.0,1.0,1.0]}
I am new to Python, so I apologize for the rudimentary programming skills; I am aware I am using too many for loops (coming from Matlab, that habit is dragging me down).
I have millions of points (timestep, long, lat, pointID) and hundreds of irregular, non-overlapping polygons (vertex_long, vertex_lat, polygonID) — see the points-and-polygons format sample.
I want to know what polygon contains each point.
I was able to do it this way:
from matplotlib import path

def inpolygon(lon_point, lat_point, lon_poly, lat_poly):
    shape = lon_point.shape
    lon_point = lon_point.reshape(-1)
    lat_point = lat_point.reshape(-1)
    lon_poly = lon_poly.values.reshape(-1)
    lat_poly = lat_poly.values.reshape(-1)
    points = [(lon_point[i], lat_point[i]) for i in range(lon_point.shape[0])]
    polys = path.Path([(lon_poly[i], lat_poly[i]) for i in range(lon_poly.shape[0])])
    return polys.contains_points(points).reshape(shape)
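The heavy lifting here is matplotlib's path.Path.contains_points, which is already vectorized over points. A minimal self-contained check with a toy square and two hypothetical points:

```python
import numpy as np
from matplotlib import path

# unit square as a matplotlib Path
square = path.Path([(0, 0), (1, 0), (1, 1), (0, 1)])

# one point inside, one outside
pts = np.array([[0.5, 0.5], [2.0, 2.0]])
inside = square.contains_points(pts)
print(inside)  # [ True False]
```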
And then
import numpy as np
import pandas as pd
Areas_Lon = Areas.iloc[:,0]
Areas_Lat = Areas.iloc[:,1]
Areas_ID = Areas.iloc[:,2]
Unique_Areas = np.unique(Areas_ID)
Areas_true=np.zeros((Areas_ID.shape[0],Unique_Areas.shape[0]))
for i in range(Areas_ID.shape[0]):
    for ii in range(Unique_Areas.shape[0]):
        Areas_true[i, ii] = (Areas_ID[i] == Unique_Areas[ii])
Areas_Lon_Vertex=np.zeros(Unique_Areas.shape[0],dtype=object)
Areas_Lat_Vertex=np.zeros(Unique_Areas.shape[0],dtype=object)
for i in range(Unique_Areas.shape[0]):
    Areas_Lon_Vertex[i] = Areas_Lon[Areas_true[:, i] == 1]
    Areas_Lat_Vertex[i] = Areas_Lat[Areas_true[:, i] == 1]
import f_inpolygon as inpolygon
Areas_in = np.zeros((Unique_Areas.shape[0], Points.shape[0]))
for i in range(Unique_Areas.shape[0]):
    for ii in range(Points.shape[0]):
        Areas_in[i, ii] = inpolygon.inpolygon(Points[ii, 2], Points[ii, 3], Areas_Lon_Vertex[i], Areas_Lat_Vertex[i])
This way the final outcome, Areas_in, contains as many rows as there are polygons and as many columns as there are points; each column is true (1) at the row of the polygon containing that point (1st polygon ID --> 1st row, and so on).
The code works, but very slowly for what it is supposed to do. When locating points in a regular grid or within a point radius I have successfully implemented a KDTree, which increases the speed dramatically, but I can't do the same (or anything faster) for irregular, non-overlapping polygons.
I have seen some related questions, but they were about whether a point is inside a polygon, rather than which polygon contains each point.
Any idea please?
Have you tried GeoPandas' spatial join (sjoin)?
Install the package using pip:
pip install geopandas
or conda:
conda install -c conda-forge geopandas
Then you should be able to read the data as a GeoDataFrame:
import geopandas
from shapely.geometry import Point

df = geopandas.read_file("file_name1.csv")        # you can read shp files too
right_df = geopandas.read_file("file_name2.csv")  # you can read shp files too
# Convert into a geometry column
geometry = [Point(xy) for xy in zip(df['longitude'], df['latitude'])]
# Coordinate reference system: WGS84
crs = {'init': 'epsg:4326'}
# Creating a geographic dataframe
left_df = geopandas.GeoDataFrame(df, crs=crs, geometry=geometry)
Then you can apply the sjoin:
jdf = geopandas.sjoin(left_df, right_df, how='inner', op='intersects', lsuffix='left', rsuffix='right')
The options for op are:
intersects
contains
within
All should behave the same in your case, since you are joining a Polygon geometry column to a Point geometry column.
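For intuition, the point-in-polygon test that sjoin's spatial index accelerates can be sketched in pure Python with the classic ray-casting algorithm. This is a teaching sketch only, far slower than sjoin on real data:

```python
def point_in_polygon(x, y, vertices):
    """Ray casting: count how many polygon edges a ray going right from
    (x, y) crosses; an odd count means the point is inside."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(point_in_polygon(0.5, 0.5, square))  # True
print(point_in_polygon(1.5, 0.5, square))  # False
```

A spatial join avoids running this test for every point-polygon pair by first filtering candidates with an R-tree on bounding boxes, which is where the dramatic speedup over the double loop comes from.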
The task is to make an address-popularity map for Moscow. Basically, it should look like this:
https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/GeoJSON_and_choropleth.ipynb
For my map I use public geojson: http://gis-lab.info/qa/moscow-atd.html
The only data I have is point coordinates; there is no information about the district they belong to.
Question 1:
Do I have to manually calculate for each district whether the point belongs to it, or is there a more efficient way to do this?
Question 2:
If there is no easier way, then how can I get all the coordinates for each district from the geojson file (link above)?
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
Reading in the Moscow area shape file with geopandas
districts = gpd.read_file('mo-shape/mo.shp')
Construct a mock user dataset
moscow = [55.7, 37.6]
data = (
np.random.normal(size=(100, 2)) *
np.array([[.25, .25]]) +
np.array([moscow])
)
my_df = pd.DataFrame(data, columns=['lat', 'lon'])
my_df['pop'] = np.random.randint(500, 100000, size=len(data))
Create Point objects from the user data
geom = [Point(x, y) for x,y in zip(my_df['lon'], my_df['lat'])]
# and a geopandas dataframe using the same crs from the shape file
my_gdf = gpd.GeoDataFrame(my_df, geometry=geom)
my_gdf.crs = districts.crs
Then the join using default value of 'inner'
gpd.sjoin(districts, my_gdf, op='contains')
Thanks to @BobHaffner, I tried to solve the problem using geopandas.
Here are my steps:
I downloaded the shape files for Moscow using this link.
From a list of tuples containing x and y (latitude and longitude) coordinates, I created a list of Points (docs).
Assuming the dataframe from the first link contains polygons, I can write a simple loop that checks whether each Point is inside a polygon. For details, read this.
I have this code:
import pandas as pd
import numpy as np
from geopandas import GeoDataFrame
import geopandas
from shapely.geometry import LineString, Point
import matplotlib.pyplot as plt
import contextily
''' Do Something'''
df = start_stop_df.drop('track', axis=1)
crs = {'init': 'epsg:4326'}
gdf = GeoDataFrame(df, crs=crs, geometry=geometry)
ax = gdf.plot()
contextily.add_basemap(ax)
ax.set_axis_off()
plt.show()
Basically, this generates a background map that is in Singapore. However, when I run it, I get the following error: HTTPError: Tile URL resulted in a 404 error. Double-check your tile url:http://tile.stamen.com/terrain/29/268436843/268435436.png
However, it still produces this image:
How can I change the Tile URL? I would still like to have the map of Singapore as the base layer.
EDIT:
Also tried including this argument to add_basemap:
url ='https://www.openstreetmap.org/#map=12/1.3332/103.7987'
Which produced this error:
OSError: cannot identify image file <_io.BytesIO object at 0x000001CC3CC4BC50>
First make sure that your GeoDataFrame is in the Web Mercator projection (EPSG:3857). If your GeoDataFrame is correctly georeferenced in some other CRS, you can achieve this with a GeoPandas reprojection:
df = df.to_crs(epsg=3857)
Once you have done this, you can easily choose any of the supported map styles. A full list can be found in the contextily.sources module; at the time of writing:
### Tile provider sources ###
ST_TONER = 'http://tile.stamen.com/toner/tileZ/tileX/tileY.png'
ST_TONER_HYBRID = 'http://tile.stamen.com/toner-hybrid/tileZ/tileX/tileY.png'
ST_TONER_LABELS = 'http://tile.stamen.com/toner-labels/tileZ/tileX/tileY.png'
ST_TONER_LINES = 'http://tile.stamen.com/toner-lines/tileZ/tileX/tileY.png'
ST_TONER_BACKGROUND = 'http://tile.stamen.com/toner-background/tileZ/tileX/tileY.png'
ST_TONER_LITE = 'http://tile.stamen.com/toner-lite/tileZ/tileX/tileY.png'
ST_TERRAIN = 'http://tile.stamen.com/terrain/tileZ/tileX/tileY.png'
ST_TERRAIN_LABELS = 'http://tile.stamen.com/terrain-labels/tileZ/tileX/tileY.png'
ST_TERRAIN_LINES = 'http://tile.stamen.com/terrain-lines/tileZ/tileX/tileY.png'
ST_TERRAIN_BACKGROUND = 'http://tile.stamen.com/terrain-background/tileZ/tileX/tileY.png'
ST_WATERCOLOR = 'http://tile.stamen.com/watercolor/tileZ/tileX/tileY.png'
# OpenStreetMap as an alternative
OSM_A = 'http://a.tile.openstreetmap.org/tileZ/tileX/tileY.png'
OSM_B = 'http://b.tile.openstreetmap.org/tileZ/tileX/tileY.png'
OSM_C = 'http://c.tile.openstreetmap.org/tileZ/tileX/tileY.png'
Keep in mind that you should not add actual x, y, z tile numbers to your tile URL (as you did in your "EDIT" example); ctx will take care of all that.
You can find a working copy-pastable example and further info at GeoPandas docs.
import contextily as ctx
# Dataframe you want to plot
gdf = GeoDataFrame(df, crs= {"init": "epsg:4326"}) # Create a georeferenced dataframe
gdf = gdf.to_crs(epsg=3857) # reproject it in Web mercator
ax = gdf.plot()
# choose any of the supported maps from ctx.sources
ctx.add_basemap(ax, url=ctx.sources.ST_TERRAIN)
ax.set_axis_off()
plt.show()
Contextily's default CRS is EPSG:3857; however, your dataframe is in a different CRS. Use the following (refer to the manual here):
ctx.add_basemap(ax, crs='epsg:4326', source=ctx.providers.Stamen.TonerLite)
Please refer to this link for using different sources such as Stamen.Toner, Stamen.Terrain, etc. (Stamen.Terrain is the default).
Also, you can cast your dataframe to EPSG:3857 by using df.to_crs(). In that case, you should skip the crs argument inside ctx.add_basemap().
I'm too new to add a comment, but I wanted to point out to those saying in the comments that they get a 404 error: check your capitalization. Stamen's URLs are specific about this; there is no all-caps form, only the first letter is capitalized. For example:
ctx.add_basemap(ax=ax,url=ctx.providers.Stamen.Toner, zoom=10)
I have a pandas Dataframe with a few million rows, each with an X and Y attribute with their location in kilometres according to the WGS 1984 World Mercator projection (created using ArcGIS).
What is the easiest way to project these points back to degrees, without leaving the Python/pandas environment?
There is already a Python module that can do these kinds of transformations for you, called pyproj. I will admit it is not the simplest module to find via Google. Some examples of its use can be seen here.
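For intuition about what such a transform does, the spherical Web Mercator inverse can be written with the stdlib alone. This is only an illustration of the math (it assumes EPSG:3857-style coordinates in metres); pyproj or GeoPandas should be used in practice, since they handle datums and edge cases properly:

```python
import math

R = 6378137.0  # radius of the WGS84 sphere used by Web Mercator (EPSG:3857)

def mercator_to_degrees(x, y):
    """Invert spherical Web Mercator: (x, y) in metres -> (lon, lat) in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat

lon, lat = mercator_to_degrees(50e3, 20e3)
print(round(lon, 3), round(lat, 3))  # 0.449 0.18
```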
Many years later, this is how I would do this. Keeping everything in GeoPandas to minimise the possibility of footguns.
Some imports:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
Create a dataframe (note the values must be in metres!)
df = pd.DataFrame({"X": [50e3, 900e3], "Y": [20e3, 900e3]})
Create geometries from the X/Y coordinates
df["geometry"] = df.apply(lambda row: Point(row.X, row.Y), axis=1)
Convert to a GeoDataFrame, setting the current CRS.
In this case EPSG:3857, Web Mercator (note: if the data was exported from ArcGIS's "WGS 1984 World Mercator" as in the question, the correct code may instead be EPSG:3395).
gdf = gpd.GeoDataFrame(df, crs=3857)
Project it to the standard WGS84 CRS in degrees (EPSG:4326).
gdf = gdf.to_crs(4326)
And then (optionally), extract the X/Y coordinates in degrees back into standard columns:
gdf["X_deg"] = gdf.geometry.apply(lambda p: p.x)
gdf["Y_deg"] = gdf.geometry.apply(lambda p: p.y)