PANDAS-GEOPANDAS: Localization of points in a shapefile


Using pandas and geopandas, I would like to define a function to be applied to each row of a dataframe which operates as follows:
INPUT: column with coordinates
OUTPUT: zone in which the point falls.
I tried with this, but it takes very long.
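import numpy as np

def zone_assign(point, zones, codes):
    # Return the label of the first zone that contains the point, or NaN if none does
    try:
        zone_label = zones[zones['geometry'].contains(point)][codes].values[0]
    except IndexError:
        zone_label = np.nan
    return zone_label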
where:
point is the cell of the row which contains geographical coordinates;
zones is the shapefile imported with geopandas;
codes is the column of the shapefile which contains the label to be assigned to the point.

Part of this answer is taken from an earlier answer of mine, which needed within rather than contains.
Your situation looks like a typical case where spatial joins are useful. The idea of spatial joins is to merge data using geographic coordinates instead of using attributes.
Three possibilities in geopandas:
intersects
within
contains
It seems like you want contains, which is possible using the following syntax:
geopandas.sjoin(polygons, points, how="inner", predicate='contains')
Note: in GeoPandas 0.10 and later this keyword is called predicate; older releases call it op. You also need to have rtree installed to perform such operations. If you need to install this dependency, use pip or conda.
Example
As an example, let's take a random sample of cities and plot the countries associated with them. The two example datasets are loaded as follows:
import geopandas
import matplotlib.pyplot as plt
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))
cities = cities.sample(n=50, random_state=1)
world.head(2)
pop_est continent name iso_a3 gdp_md_est geometry
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
cities.head(3)
name geometry
196 Bogota POINT (-74.08529 4.59837)
95 Tbilisi POINT (44.78885 41.72696)
173 Seoul POINT (126.99779 37.56829)
world is a worldwide dataset and cities is a subset.
Both datasets need to be in the same coordinate reference system. If they are not, use .to_crs before merging.
data_merged = geopandas.sjoin(world, cities, how="inner", predicate='contains')
Finally, to see the result, let's draw a map:
f, ax = plt.subplots(1, figsize=(20,10))
data_merged.plot(ax=ax)
world.plot(ax=ax, alpha=0.25, linewidth=0.1)
plt.show()
and the underlying dataset merges together the information we need
data_merged.head(2)
pop_est continent name_left iso_a3 gdp_md_est geometry index_right name_right
7 6909701 Oceania Papua New Guinea PNG 28020.0 MULTIPOLYGON (((141.00021 -2.60015, 142.73525 ... 59 Port Moresby
9 44293293 South America Argentina ARG 879400.0 MULTIPOLYGON (((-68.63401 -52.63637, -68.25000... 182 Buenos Aires
Here I used an inner join, but how is a parameter you can change if, for instance, you want to keep all points, including those not within any polygon (how="right" with the argument order used above).
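As a minimal sketch of that last point, assuming GeoPandas 0.10 or later (where the keyword is predicate): putting the points on the left and using how="left" keeps every point, and a point that falls in no zone gets NaN as its label. The zones, codes and coordinates below are invented toy data, not the asker's shapefile.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical zones: two adjacent unit squares labelled "A" and "B"
zones = gpd.GeoDataFrame(
    {"code": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Hypothetical points: one in each zone and one outside both
points = gpd.GeoDataFrame(
    {"pid": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# Left join on "point within zone": every point is kept,
# and unmatched points get NaN in the zone columns
joined = gpd.sjoin(points, zones, how="left", predicate="within")
print(joined[["pid", "code"]])
```

This does the whole lookup in one vectorised call instead of applying a function row by row, which is usually the reason the row-wise version is slow.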

Related

Which multipolygon does the point of longitude and latitude belong to in Python?

I have the longitude and latitude and the expected result is that whichever multipolygon the point is in, I get the name or ID of the multipolygon.
import geopandas as gpd
world = gpd.read_file('/Control_Areas.shp')
world.plot()
Output
0 MULTIPOLYGON (((-9837042.000 6137048.000, -983...
1 MULTIPOLYGON (((-11583146.000 5695095.000, -11...
2 MULTIPOLYGON (((-8542840.287 4154568.013, -854...
3 MULTIPOLYGON (((-10822667.912 2996855.452, -10...
4 MULTIPOLYGON (((-13050304.061 3865631.027, -13.
Previous attempts:
I have tried fiona, shapely and geopandas, but I have struggled to make progress. The closest I have come is the within and contains functions; where I am stuck is converting the multipolygons to polygons and then using within and contains to get the desired output.
The shapefile has been downloaded from here.

world.crs gives {'init': 'epsg:3857'} (Web Mercator projection) so you should first reproject your GeoDataFrame in the WGS84 projection if you want to keep the latitude-longitude coordinate system of your point.
world = world.to_crs("EPSG:4326")
Then you can use the intersects method of GeoPandas to find the indexes of the Polygons that contain your point.
For example for the city of New York:
from shapely.geometry import Point
NY_pnt = Point(-74.005941, 40.712784)  # shapely takes (x, y), i.e. (longitude, latitude)
world[["ID","NAME"]][world.intersects(NY_pnt)]
which results in:
ID NAME
20 13501 NEW YORK INDEPENDENT SYSTEM OPERATOR
You can check the result with shapely's within method:
NY_pnt.within(world["geometry"][20])
If you have multiple points, you can create a GeoDataFrame and use the sjoin method:
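# shapely takes (x, y), i.e. (longitude, latitude)
NY_pnt = Point(-74.005941, 40.712784)
LA_pnt = Point(-118.243683, 34.052235)
points_df = gpd.GeoDataFrame({'geometry': [NY_pnt, LA_pnt]}, crs='EPSG:4326')
# predicate is the keyword in GeoPandas 0.10 and later; older releases call it op
results = gpd.sjoin(points_df, world, predicate='within')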
results[['ID', 'NAME']]
Output:
ID NAME
0 13501 NEW YORK INDEPENDENT SYSTEM OPERATOR
1 11208 LOS ANGELES DEPARTMENT OF WATER AND POWER
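If a lookup like this unexpectedly returns nothing, the coordinate order is worth checking: shapely treats the first argument of Point as x, and current GeoPandas releases store EPSG:4326 geometries as x = longitude, y = latitude. The sketch below shows the effect using shapely alone; the bounding box is a rough, made-up rectangle around New York State, not a geometry from the shapefile.

```python
from shapely.geometry import Point, box

# Hypothetical rough bounding box: box(minx, miny, maxx, maxy) in lon/lat degrees
ny_box = box(-80.0, 40.0, -71.0, 46.0)

lon, lat = -74.005941, 40.712784

lon_lat_point = Point(lon, lat)  # x = longitude, y = latitude
lat_lon_point = Point(lat, lon)  # same numbers, swapped order

inside_lon_lat = lon_lat_point.within(ny_box)
inside_lat_lon = lat_lon_point.within(ny_box)

print(inside_lon_lat, inside_lat_lon)
```

Only the (longitude, latitude) point falls inside the box; the swapped point lands at 74 degrees south.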

How to convert shapefile/geojson to hexagons using uber h3 in python?

I want to create hexagons on my geographic map and want to preserve the digital boundary specified by the shapefile/geojson as well.
How do I do it using uber's h3 python library?
I'm new to shapefiles or any other geographic data structures. I'm most comfortable with python.

tl;dr
For convenience, you can use H3-Pandas.
import geopandas as gpd
import h3pandas
# Prepare data
gdf = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf = gdf.loc[gdf['continent'].eq('Africa')]
gdf['gdp_md_per_capita'] = gdf['gdp_md_est'].div(gdf['pop_est'])
resolution = 4
# Resample to H3 cells
gdf.h3.polyfill_resample(resolution)
Full description
Use cases like this are precisely why I wrote H3-Pandas, an integration of H3 with Pandas and GeoPandas.
Let's use the naturalearth_lowres dataset included in GeoPandas to demonstrate the usage.
import geopandas as gpd
import h3pandas
path_shapefile = gpd.datasets.get_path('naturalearth_lowres')
gdf = gpd.read_file(path_shapefile)
Let's also create a numeric column for plotting.
gdf = gdf.loc[gdf['continent'].eq('Africa')]
gdf['gdp_md_per_capita'] = gdf['gdp_md_est'].div(gdf['pop_est'])
ax = gdf.plot(figsize=(15, 15), column='gdp_md_per_capita', cmap='RdBu')
ax.axis('off')
We will use H3 resolution 4. See the H3 resolution table for more details.
resolution = 4
We can add H3 hexagons using the function polyfill. This method adds a list of H3 cells whose centroid falls into each polygon.
gdf_h3 = gdf.h3.polyfill(resolution)
print(gdf_h3['h3_polyfill'].head(3))
1 [846aca7ffffffff, 8496b5dffffffff, 847b691ffff...
2 [84551a9ffffffff, 84551cdffffffff, 8455239ffff...
11 [846af51ffffffff, 8496e63ffffffff, 846a803ffff...
Name: h3_polyfill, dtype: object
If we want to explode the values into separate rows (ending up with as many rows as there are H3 cells), we can use the parameter explode
gdf_h3 = gdf.h3.polyfill(resolution, explode=True)
print(gdf_h3.head(3))
pop_est continent name iso_a3 gdp_md_est \
1 53950935 Africa Tanzania TZA 150600.0
1 53950935 Africa Tanzania TZA 150600.0
1 53950935 Africa Tanzania TZA 150600.0
geometry gdp_md_per_capita \
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 0.002791
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 0.002791
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 0.002791
h3_polyfill
1 846aca7ffffffff
1 8496b5dffffffff
1 847b691ffffffff
We can then utilize the method h3_to_geo_boundary to obtain the geometries for the H3 cells. It expects that the index already has the H3 cell ids.
gdf_h3 = gdf_h3.set_index('h3_polyfill').h3.h3_to_geo_boundary()
We can now plot the result
ax = gdf_h3.plot(figsize=(15, 15), column='gdp_md_per_capita', cmap='RdBu')
ax.axis('off')
H3-Pandas actually has a convenience function that performs all this at once: polyfill_resample
gdf_h3 = gdf.h3.polyfill_resample(resolution)
ax = gdf_h3.plot(figsize=(15, 15), column='gdp_md_per_capita', cmap='RdBu')
ax.axis('off')

h3-py doesn't know how to work with shapefile data directly, but it sounds like you could use a library like https://github.com/GeospatialPython/pyshp to convert your shapefile data to GeoJSON, and then use h3.polyfill() to convert it to a collection of H3 hexagons.
There are lots of options for plotting your boundary along with the H3 hexagons. For example, you might use pydeck and its GeoJsonLayer and H3HexagonLayer layers.
If your plotting software can't use H3 hexagons directly, you can convert them to other formats with functions like h3.h3_to_geo_boundary() or h3.h3_set_to_multi_polygon().

Converting a GeoJSON to Uber's h3 is quite simple.
Attaching a sample code snippet and GeoJSON used below:
GeoJSON:
{"type": "FeatureCollection","features": [{"type": "Feature","properties": {},"geometry": {"type": "Polygon", "coordinates": [[[77.5250244140625,13.00857192009273],[77.51266479492188,12.971103764892034],[77.52777099609375,12.94099133483504],[77.57171630859375,12.907528813968185],[77.60604858398438,12.914890953258695],[77.662353515625,12.928276105253065],[77.69874572753906,12.961066692801282],[77.65823364257812,13.00990996390651],[77.58956909179688,13.04469656691112],[77.53944396972656,13.038007215169166],[77.5250244140625,13.00857192009273]]]}}]}
Code:
import json
from h3converter import h3converter

# Load the GeoJSON file shown above
with open("sampleJson.geojson") as geojson_raw:
    geojson = json.load(geojson_raw)

h3_list = []
h3_resolution = 7
# Fill each feature's geometry with H3 cells at the chosen resolution
for feature in geojson["features"]:
    feature_h3 = h3converter.polyfill(feature["geometry"], h3_resolution)
    h3_list.append(feature_h3)
    print("H3's created => ", len(feature_h3))
print(h3_list)
Response:
[{'8761892edffffff', '8760145b0ffffff', '87618925effffff', '87618925bffffff', '87618924affffff', '876189256ffffff', '8760145b1ffffff', '8761892eaffffff', '8761892eeffffff', '876189253ffffff', '876189259ffffff', '8760145b2ffffff', '876014586ffffff', '8760145b4ffffff', '8761892e1ffffff', '8760145a2ffffff', '8761892ecffffff', '876189251ffffff', '8760145a4ffffff', '8761892e5ffffff', '87618925affffff', '8761892e9ffffff', '8761892cdffffff', '876189250ffffff', '87618925dffffff', '8760145b6ffffff', '876014595ffffff', '876189252ffffff', '8761892ebffffff', '8760145a3ffffff', '8760145a6ffffff', '876014584ffffff', '876189258ffffff', '8760145b5ffffff', '8760145b3ffffff', '876014594ffffff', '8761892c9ffffff', '87618925cffffff', '8760145a0ffffff', '8761892e8ffffff'}]
Package used: https://pypi.org/project/h3converter/
You could use the above package to convert h3 to GeoJSON as well.

You can extract GeoJSON from Shapely like this:
import geopandas as gpd
import shapely.geometry
# Create Geometry
shapely_polygon = shapely.geometry.Polygon([(0, 0), (0, 1), (1, 0)])
# Extract JSON Feature Collection
gpd.GeoSeries([shapely_polygon]).__geo_interface__
Output:
{"type":"FeatureCollection","features":[{"id":"0","type":"Feature","properties":{},"geometry":{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,0.0],[0.0,0.0]]]},"bbox":[0.0,0.0,1.0,1.0]}],"bbox":[0.0,0.0,1.0,1.0]}
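When only a single geometry is needed rather than a whole FeatureCollection (for example, something shaped like the feature["geometry"] argument used in the polyfill snippet earlier), shapely's mapping function returns a GeoJSON-like dict for one geometry. A minimal sketch, reusing the same triangle:

```python
import json

from shapely.geometry import Polygon, mapping

# Same triangle as above
shapely_polygon = Polygon([(0, 0), (0, 1), (1, 0)])

# GeoJSON-like dict with "type" and "coordinates" keys
geometry_dict = mapping(shapely_polygon)

# Serialise to a GeoJSON geometry string (tuples become JSON arrays)
geojson_text = json.dumps(geometry_dict)
print(geojson_text)
```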

How to implement weights in folium Heatmap?

I'm trying to add weights to my folium heatmap layer, but I can't figure out how to correctly implement this.
I have a dataframe with 3 columns: LAT, LON and VALUE. Value being the total sales of that location.
self.map = folium.Map([mlat, mlon], tiles=tiles, zoom_start=8)
locs = zip(self.data.LAT, self.data.LON, self.data.VALUE)
HeatMap(locs, radius=30, blur=10).add_to(self.map)
I tried to use the absolute sales values and I also tried to normalize sales/sales.sum(). Both give me similar results.
The problem is:
The heatmap shows stronger red levels for regions with more stores, even if the total sales of those stores together are a lot smaller than the sales of a distant, isolated large store.
Expected behaviour:
I would expect that the intensity of the heatmap should use the value of sales of each store, as sales was passed in the zip object to the HeatMap plugin.
Let's say I have 2 regions: A and B.
In region A I have 3 stores: 10 + 15 + 10 = 35 total sales.
In region B I have 1 big store: 100 total sales
I'd expect a greater intensity for region B than for region A. I noticed that a similar behaviour only occurs when the difference is very large (if I try 35 vs 5000000 then region B becomes more relevant).
My CSV file is just a random sample, like this:
LAT,LON,VALUE,DATE,DIFFLAT1,DIFFLON1
-22.4056,-53.6193,14,2010,0.0242,0.4505
-22.0516,-53.7025,12,2010,0.3137,0.6636
-22.3239,-52.9108,100,2010,0.0514,0.0002
-22.6891,-53.7424,6,2010,0.0002,0.7887
-21.8762,-53.6866,16,2010,0.7283,0.6180
-22.1861,-53.5353,11,2010,0.1420,0.2924

import folium
from folium.plugins import HeatMap

heat_df = df.loc[:, ["lat", "lon", "weight"]]
map_hooray = folium.Map(location=[45.517999, -73.568184], zoom_start=12)
# Format: list of lists, each holding lat, lon and weight
heat_data = heat_df.values.tolist()
# Plot it on the map
HeatMap(heat_data, radius=13).add_to(map_hooray)
# Save the map
map_hooray.save('heat_map.html')
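As a sketch of how that list of [lat, lon, weight] rows could be built from the CSV sample in the question: the column names LAT, LON and VALUE come from the question, while scaling the weights to the range 0 to 1 by dividing by the largest value is an assumption made here for illustration. This sketch does not establish whether that scaling changes how the layer is rendered.

```python
import io

import pandas as pd

# Sample rows copied from the question
csv_text = """LAT,LON,VALUE,DATE,DIFFLAT1,DIFFLON1
-22.4056,-53.6193,14,2010,0.0242,0.4505
-22.0516,-53.7025,12,2010,0.3137,0.6636
-22.3239,-52.9108,100,2010,0.0514,0.0002
-22.6891,-53.7424,6,2010,0.0002,0.7887
-21.8762,-53.6866,16,2010,0.7283,0.6180
-22.1861,-53.5353,11,2010,0.1420,0.2924
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the three columns the heatmap needs and drop incomplete rows
heat_df = df[["LAT", "LON", "VALUE"]].dropna().copy()

# Assumption: scale weights to [0, 1] by dividing by the largest value
heat_df["weight"] = heat_df["VALUE"] / heat_df["VALUE"].max()

# List of [lat, lon, weight] rows, the format described above
heat_data = heat_df[["LAT", "LON", "weight"]].values.tolist()
print(heat_data[:3])
```

The resulting heat_data can be passed to HeatMap in place of the list built from heat_df above.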

How to extract LineString data from GeoDataFrame and match with Polygon?

Dear StackOverflow community,
Main question:
I have a GeoDataFrame containing the StreetNetwork of NYC - Manhattan (obtained through the osmnx package), where I would like to extract the coordinates (lon/lat data) from all streets which are stored as LineStrings under geometry, like so:
field_1 0
access
bridge
geometry LINESTRING (-73.9975944 40.7140611, -73.997492...
highway residential
junction
key 0
lanes
length 11.237
maxspeed 25 mph
name Catherine Street
oneway True
osmid 5670536
ref
service
tunnel
u 1773060097
v 42437559
width
geometry LINESTRING (-73.9975944 40.7140611, -73.997492...
Name: 0, dtype: object
What I am trying to do is to extract the geometry information for each line-item:
df.iloc[x][3]
The issue is that the output is a str:
[1]: LINESTRING (-73.9975944 40.7140611, -73.9974922 40.7139962)
[2]: print(type(...))
[2]: <class 'str'>
This makes it hard to automate and process the output data. Does anyone know how to extract it so that it is already in LineString (or any other useable list/array) format?
Further question:
My overall goal is to map this street information onto a shapefile of taxi zones (zones in the form of polygons) to identify which streets are in which zone, and which lon/lat areas are covered by streets within one zone (polygon). Is there any straightforward way to do this with the shapely, geopandas or osmnx packages (i.e. something like "polygon.contains(Point)" but in the sense of "polygon.contains(LineString)")?
Thanks a lot for your support!

To extract the coordinates of each geometry, you can use the following code (this assumes the geometry column already holds shapely objects, not strings):
df.geometry.apply(lambda geom: list(geom.coords))
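If the geometry column holds plain strings, as in the question, the text first has to be parsed into shapely objects. A minimal sketch, assuming the strings are well-known text (WKT) like the LINESTRING shown in the question; the rectangular zone is a hypothetical stand-in for a taxi-zone polygon.

```python
from shapely import wkt
from shapely.geometry import box

# Geometry as it appears in the question: a WKT string, not a LineString
geometry_text = "LINESTRING (-73.9975944 40.7140611, -73.9974922 40.7139962)"

# Parse the string into a shapely LineString
line = wkt.loads(geometry_text)

# Coordinates as a plain list of (lon, lat) tuples
coords = list(line.coords)
print(coords)

# Hypothetical zone: box(minx, miny, maxx, maxy) covering the street segment
zone = box(-74.0, 40.71, -73.99, 40.72)

# contains is True only if the whole line lies inside the zone;
# intersects is True if any part of it does
fully_inside = zone.contains(line)
touches_zone = zone.intersects(line)
print(fully_inside, touches_zone)
```

For a whole column, the same parser can be applied with df['geometry'].apply(wkt.loads) before building a GeoDataFrame.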

Creating a UK Heatmap

I have a data frame for UK data that looks something like this:
longitude latitude region priority
51.307733 -0.75708898 South East High
51.527477 -0.20646542 London Medium
51.725135 0.4747223 East of England Low
This dataframe is several thousand rows long. I want a heatmap of the UK broken down by the regions and colour intensity to be dependent on the priority in each region.
I would like to know the best way to turn this into a heatmap of the UK. I have tried GeoPandas and Plotly but I have no working knowledge of either. Are these the best way to do it, or is there a tool out there that you can simply upload your data to and it will plot it for you? Thanks!

For this kind of job I usually go with folium, which is great for working with maps.
But for the HeatMap, your "priority" column has to be a float!
import folium
from folium import plugins
from folium.plugins import HeatMap

# Map centered on London, UK
my_map = folium.Map(location=[51.5074, -0.1278],
                    zoom_start=13)
your_dataframe['latitude'] = your_dataframe['latitude'].astype(float)
your_dataframe['longitude'] = your_dataframe['longitude'].astype(float)
your_dataframe['priority'] = your_dataframe['priority'].astype(float)
heat_df = your_dataframe[['latitude', 'longitude', 'priority']]
heat_df = heat_df.dropna(axis=0, subset=['latitude', 'longitude', 'priority'])
# List comprehension to make our list of lists
heat_data = [[row['latitude'], row['longitude'], row['priority']] for index, row in heat_df.iterrows()]
my_map.add_child(plugins.HeatMap(heat_data))
my_map.save('map.html')
Then open map.html in your browser.
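Since the priority column in the question holds the labels High, Medium and Low, a direct astype(float) would fail on it. One possible way to get a float column is to map each label to a number first. In the sketch below the numeric scale (1.0, 2.0, 3.0) is an arbitrary assumption, and the sample values near 51 are treated as latitudes, which is also an assumption about the question's data.

```python
import pandas as pd

# Sample rows from the question (values near 51 assumed to be latitudes)
your_dataframe = pd.DataFrame({
    "latitude": [51.307733, 51.527477, 51.725135],
    "longitude": [-0.75708898, -0.20646542, 0.4747223],
    "region": ["South East", "London", "East of England"],
    "priority": ["High", "Medium", "Low"],
})

# Hypothetical numeric scale for the priority labels
priority_scale = {"Low": 1.0, "Medium": 2.0, "High": 3.0}
your_dataframe["priority"] = your_dataframe["priority"].map(priority_scale)

# List of [latitude, longitude, priority] rows for the heatmap
heat_data = your_dataframe[["latitude", "longitude", "priority"]].values.tolist()
print(heat_data)
```

Labels missing from the mapping become NaN, so the dropna step in the answer above would discard those rows.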
