I have been working with an Accident Database from Seattle which contains the coordinates of around 200,000 accidents. What I want to do is to group those accidents geographically in districts, for example. To that end I visualised the grouping on a map using Folium but now I don't know how to extract those same groups into a new column in my database (or if it is even possible).
Here is what I have been doing with Folium and the result:
import folium
from folium import plugins

# Using Seattle's latitude and longitude
latitude = 47.608013
longitude = -122.335167

seattle_map = folium.Map(location=[latitude, longitude], zoom_start=12)
incidents = plugins.MarkerCluster().add_to(seattle_map)

for lat, lng, label in zip(database.Y, database.X, database.SEVERITYCODE):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=folium.Popup(label),
    ).add_to(incidents)

seattle_map
[Output: Folium map with clustered accident markers]
If you want to add the districts of Seattle, you can use this GitHub repository, which has all kinds of geographical data about Seattle: https://github.com/seattleio/seattle-boundaries-data
For instance, to add the ZIP code areas to the map, you can use the GeoJSON file like this:
import folium

latitude = 47.608013
longitude = -122.335167
url = "https://raw.githubusercontent.com/seattleio/seattle-boundaries-data/master/data/zip-codes.geojson"

seattle_map = folium.Map(location=[latitude, longitude], zoom_start=12)
folium.GeoJson(
    url,
    name='zip_code'
).add_to(seattle_map)
seattle_map
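If you also want the ZIP code to appear when hovering over an area, folium's GeoJsonTooltip can be attached to the same layer. A minimal sketch, continuing from the snippet above and assuming the ZIP code is stored in the ZCTA5CE10 property of that GeoJSON (check the feature properties first):
folium.GeoJson(
    url,
    name='zip_code',
    # Assumption: ZCTA5CE10 is the property holding the ZIP code in this file
    tooltip=folium.GeoJsonTooltip(fields=['ZCTA5CE10'], aliases=['ZIP code:']),
).add_to(seattle_map)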
If you want to tie the ZIP code areas to the collision data, the best option is a Folium Choropleth map. You first need to work out which ZIP code area each collision belongs to; I use the shapely library for that. The code can look like this:
import json
import requests
import folium
import pandas as pd
from shapely.geometry import shape, Point

# URL of the GeoJSON with the ZIP codes of Seattle
url = "https://raw.githubusercontent.com/seattleio/seattle-boundaries-data/master/data/zip-codes.geojson"

# Import the Seattle collisions data
df = pd.read_csv("Collisions.csv")

# Keep only longitude (X) and latitude (Y)
df_clean = df.loc[:, ["X", "Y"]]
df_clean = df_clean.dropna()

# Download and parse the GeoJSON once
features = requests.get(url).json()["features"]

# For each collision, find the ZIP code polygon that contains it
for index, row in df_clean.iterrows():
    point = Point(row["X"], row["Y"])
    for feature in features:
        polygon = shape(feature["geometry"])
        if polygon.contains(point):
            df_clean.loc[index, "ZCTA5CE10"] = feature["properties"]["ZCTA5CE10"]
            break

df_clean = df_clean.dropna()

# Count collisions per ZIP code
result = df_clean.groupby(["ZCTA5CE10"])["X"].count()
result = pd.DataFrame(result)
result.reset_index(level=0, inplace=True)

# Using Seattle's latitude and longitude
latitude = 47.608013
longitude = -122.335167

seattle_map = folium.Map(location=[latitude, longitude], zoom_start=12)
folium.Choropleth(
    geo_data=url,
    name='choropleth',
    data=result,
    columns=["ZCTA5CE10", 'X'],
    key_on='feature.properties.ZCTA5CE10',
    fill_color='YlOrRd',
).add_to(seattle_map)
seattle_map
My code can be improved a lot in terms of performance; looping over the DataFrame row by row to build the new column is clearly not the best choice.
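For reference, a vectorized alternative is a spatial join with geopandas, which avoids the Python-level loop entirely. This is only a sketch, assuming the collision coordinates are in WGS 84 with X = longitude and Y = latitude as above; other column names would need adjusting.
import geopandas as gpd
import pandas as pd

url = "https://raw.githubusercontent.com/seattleio/seattle-boundaries-data/master/data/zip-codes.geojson"

df = pd.read_csv("Collisions.csv").dropna(subset=["X", "Y"])

# Build a GeoDataFrame of collision points (X = longitude, Y = latitude, WGS 84)
points = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["X"], df["Y"]), crs="EPSG:4326"
)

# Read the ZIP code polygons and spatially join: each point receives the
# attributes of the polygon that contains it (including ZCTA5CE10).
# Older geopandas versions use op="within" instead of predicate="within".
zips = gpd.read_file(url)
joined = gpd.sjoin(points, zips[["ZCTA5CE10", "geometry"]], how="left", predicate="within")

# Count collisions per ZIP code, same shape as the loop-based result
result = joined.groupby("ZCTA5CE10", as_index=False)["X"].count()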
I'm working with Folium for the first time and attempting to make a choropleth map of housing values in North Carolina, using Zillow data as the source. I've been running into lots of issues along the way, and right now I'm a bit stuck on how to add colors to the map: if the property value is >100k make it green, slowly increasing the gradient towards orange as it approaches 850k.
At the moment the map does generate the zip code data fine, but all of the polygons are a black-grey color. It's also not showing a color key or map name, and I have a feeling some of my earlier code could be off.
import folium
import pandas as pd
import requests
import os
working_directory = os.getcwd()
print(working_directory)
path = working_directory + '/Desktop/NCHomes.csv'
df = pd.read_csv(path)
df.head()
df['Homes'].min(), df['Homes'].max()
INDICATOR = 'North Carolina Home Values by Zip Code'
data = df[df['RegionName'] == INDICATOR]
max_value = data['Homes'].max()
data = data[data['Homes'] == max_value]
data.head()
geojson_url = 'https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/nc_north_carolina_zip_codes_geo.min.json'
response = requests.get(geojson_url)
geojson = response.json()
geojson
geojson['features'][0]
map_data = data[['RegionName', 'Homes']]
map_data.head()
M = folium.Map(location=[20, 10], zoom_start=2)
folium.Choropleth(
    geo_data=geojson,
    data=map_data,
    columns=['RegionName', 'Homes'],
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name=INDICATOR
).add_to(M)
M
You can specify the threshold_scale parameter as follows:
folium.Choropleth(
    geo_data=geojson,
    data=map_data,
    columns=['RegionName', 'Homes'],
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    threshold_scale=[100000, 850000],
    legend_name=INDICATOR
).add_to(M)
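If the polygons still render grey, it is usually because the values could not be matched to the GeoJSON features: Choropleth needs a key_on path pointing at the feature property that holds the ZIP code, and the key types have to match on both sides. Folium also expects the scale to cover the full range of the data. A hedged sketch of both fixes, assuming this GeoJSON stores ZIP codes as strings in a ZCTA5CE10 property (check geojson['features'][0]['properties'] to confirm):
import numpy as np

# If RegionName is numeric, convert it to string so it matches the GeoJSON properties
map_data = map_data.assign(RegionName=map_data['RegionName'].astype(str))

# Derive a 6-step scale spanning the observed values so every value falls in a bin
scale = np.linspace(map_data['Homes'].min(), map_data['Homes'].max(), 6).tolist()

folium.Choropleth(
    geo_data=geojson,
    data=map_data,
    columns=['RegionName', 'Homes'],
    key_on='feature.properties.ZCTA5CE10',  # assumption: ZIP code property in this GeoJSON
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    threshold_scale=scale,
    legend_name=INDICATOR
).add_to(M)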
I'm working in Python with a dataset that has a numerical variable for each Italian region, like this:
import numpy as np
import pandas as pd
regions = ['Trentino Alto Adige', "Valle d'Aosta", 'Veneto', 'Lombardia', 'Emilia-Romagna', 'Toscana', 'Friuli-Venezia Giulia', 'Liguria', 'Piemonte', 'Marche', 'Lazio', 'Umbria', 'Abruzzo', 'Sardegna', 'Puglia', 'Molise', 'Basilicata', 'Calabria', 'Sicilia', 'Campania']
df = pd.DataFrame([regions,[10+(i/2) for i in range(20)]]).transpose()
df.columns = ['region','quantity']
df.head()
I would like to generate a map of Italy in which the colour of each region depends on the numeric value of the variable quantity (df['quantity']), i.e., a choropleth map like this:
How can I do it?
You can use geopandas.
The region names in your df don't exactly match those in the GeoJSON. I'm sure you can find another file, or alter the names so that they match.
import pandas as pd
import geopandas as gpd

regions = ['Trentino Alto Adige', "Valle d'Aosta", 'Veneto', 'Lombardia', 'Emilia-Romagna', 'Toscana', 'Friuli-Venezia Giulia', 'Liguria', 'Piemonte', 'Marche', 'Lazio', 'Umbria', 'Abruzzo', 'Sardegna', 'Puglia', 'Molise', 'Basilicata', 'Calabria', 'Sicilia', 'Campania']
df = pd.DataFrame([regions, [10+(i/2) for i in range(20)]]).transpose()
df.columns = ['region', 'quantity']

# Download a GeoJSON of the region geometries
gdf = gpd.read_file(filename=r'https://raw.githubusercontent.com/openpolis/geojson-italy/master/geojson/limits_IT_municipalities.geojson')
gdf = gdf.dissolve(by='reg_name')  # The GeoJSON is too detailed, so dissolve the boundaries by the reg_name attribute
gdf = gdf.reset_index()

# gdf.reg_name[~gdf.reg_name.isin(regions)] shows two regions are missing from your df:
# 16    Trentino-Alto Adige/Südtirol
# 18    Valle d'Aosta/Vallée d'Aoste

gdf = pd.merge(left=gdf, right=df, how='left', left_on='reg_name', right_on='region')

ax = gdf.plot(
    column="quantity",
    legend=True,
    figsize=(15, 10),
    cmap='OrRd',
    missing_kwds={'color': 'lightgrey'});
ax.set_axis_off();
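To get those two regions to join as well, one option is to rename them in df before the pd.merge so that they match the reg_name spellings noted in the comments above:
# Rename the two mismatched regions to the reg_name values used by the GeoJSON
df['region'] = df['region'].replace({
    'Trentino Alto Adige': 'Trentino-Alto Adige/Südtirol',
    "Valle d'Aosta": "Valle d'Aosta/Vallée d'Aoste",
})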
I have plotted a heatmap with the following data.
I have thousands of rows; this is just a sample. I also wanted to see the Google Maps view of each coordinate, so I did something like this.
import folium
from folium.plugins import HeatMap
from folium.plugins import FastMarkerCluster

default_location = [11.1657, 45.4515]
m = folium.Map(location=default_location, zoom_start=13)

# test is the DataFrame holding the 'lat' and 'lon' columns
heat_data = [[row['lat'], row['lon']] for index, row in test.iterrows()]

# Plot it on the map
HeatMap(heat_data).add_to(m)

# Leaflet callback that builds a marker with a popup linking to Google Maps
callback = ('function (row) {'
                'var marker = L.marker(new L.LatLng(row[0], row[1]), {color: "red"});'
                'var icon = L.AwesomeMarkers.icon({'
                    "icon: 'info-sign',"
                    "iconColor: 'white',"
                    "markerColor: 'green',"
                    "prefix: 'glyphicon',"
                    "extraClasses: 'fa-rotate-0'"
                '});'
                'marker.setIcon(icon);'
                "var popup = L.popup({maxWidth: '300'});"
                "const display_text = {text1: row[0], text2: row[1]};"
                "var mytext = $(`<div id='mytext' class='display_text' style='width: 100.0%; height: 100.0%;'>\
                    <a href=https://www.google.com/maps?ll=${display_text.text1},${display_text.text2} target='_blank'>Open Google Maps</a></div>`)[0];"
                "popup.setContent(mytext);"
                "marker.bindPopup(popup);"
            'return marker};')

m.add_child(FastMarkerCluster(heat_data, callback=callback))

# Display the map
m
Now, for every GPS coordinate, I want to plot a small arrow (or a few small arrows) at the angle of heading_direction and, if possible, show the distance_of_item along that angle from the GPS coordinate. The expected outcome might be something like this.
In the above image, the location pointer is the GPS coordinate, the direction and angle follow the heading-direction angle, and the little star is the object, which should be placed at the distance (in meters) given in the dataset. I am not sure how to achieve that. Any leads or suggestions are most welcome. Thanks!
Given that your sample data is an image, I have used alternative GPS data (UK hospitals) and added distance and direction columns as random values.
Given the requirement is to plot a marker at a location defined by a distance and a direction, the first step is to calculate the GPS coordinates of that point:
use a UTM CRS so that the distance is meaningful
use high school maths to calculate x and y in the UTM CRS
convert the CRS back to WGS 84 so that we have GPS coordinates again
You have tagged the question as plotly, so I have used mapbox line and scatter traces to demonstrate building a tiled map.
The sample data is 1200+ hospitals; performance is decent.
A geopandas data frame could also be used to build folium tiles / markers. The key step is calculating the GPS coordinates.
import geopandas as gpd
import pandas as pd
import numpy as np
import shapely
import math
import plotly.express as px
import plotly.graph_objects as go
import io, requests

# get some public addresses - hospitals - as data that has GPS lat / lon
dfhos = pd.read_csv(
    io.StringIO(requests.get("http://media.nhschoices.nhs.uk/data/foi/Hospital.csv").text),
    sep="¬", engine="python",
).loc[:, ["OrganisationName", "Latitude", "Longitude"]]

# debug with fewer records
# df = dfhos.loc[0:500]
df = dfhos

# to use CRS transformations, use geopandas; the initial data is WGS 84, transform to a UTM geometry
# directions and distances are random
gdf = gpd.GeoDataFrame(
    data=df.assign(
        heading_direction=lambda d: np.random.randint(0, 360, len(d)),
        distance_of_item=lambda d: np.random.randint(10 ** 3, 10 ** 4, len(d)),
    ),
    geometry=df.loc[:, ["Longitude", "Latitude"]].apply(
        lambda r: shapely.geometry.Point(r["Longitude"], r["Latitude"]), axis=1
    ),
    crs="EPSG:4326",
).pipe(lambda d: d.to_crs(d.estimate_utm_crs()))

# standard high school geometry...
def new_point(point, d, alpha):
    alpha = math.radians(alpha)
    return shapely.geometry.Point(
        point.x + (d * math.cos(alpha)),
        point.y + (d * math.sin(alpha)),
    )

# calculate points based on direction and distance in the UTM CRS, then convert back to the WGS 84 CRS
gdf["geometry2"] = gpd.GeoSeries(
    gdf.apply(
        lambda r: new_point(
            r["geometry"], r["distance_of_item"], r["heading_direction"]
        ),
        axis=1,
    ),
    crs=gdf.geometry.crs,
).to_crs("EPSG:4326")
gdf = gdf.to_crs("EPSG:4326")

# plot lines to show the start point and direction; plot markers at the destinations for the distance text, etc.
fig = px.line_mapbox(
    lon=np.stack(
        [gdf.geometry.x.values, gdf.geometry2.x.values, np.full(len(gdf), np.nan)],
        axis=1,
    ).reshape([1, len(gdf) * 3])[0],
    lat=np.stack(
        [gdf.geometry.y.values, gdf.geometry2.y.values, np.full(len(gdf), np.nan)],
        axis=1,
    ).reshape([1, len(gdf) * 3])[0],
).add_traces(
    px.scatter_mapbox(
        gdf,
        lat=gdf.geometry2.y,
        lon=gdf.geometry2.x,
        hover_data=["distance_of_item", "OrganisationName"],
    ).data
)

fig.update_layout(
    mapbox={"style": "open-street-map", "zoom": 8, "center": {"lat": 52.2316838387109, "lon": -1.4577750831062155}},
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
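One detail to double-check: new_point uses the mathematical angle convention (measured counterclockwise from east), whereas a compass heading is measured clockwise from north. If your heading_direction is a compass bearing, the easting offset uses sin and the northing offset uses cos; a small variant of the helper, as a sketch:
def new_point_from_bearing(point, d, bearing):
    # bearing is a compass heading in degrees: 0 = north, 90 = east, clockwise
    b = math.radians(bearing)
    return shapely.geometry.Point(
        point.x + (d * math.sin(b)),  # easting offset
        point.y + (d * math.cos(b)),  # northing offset
    )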
I am plotting some maps using folium, and it works pretty smoothly. However, I could not figure out how to pre-calculate the right level of zoom. I can set it automatically:
import folium
m = folium.Map([point_of_interest.iloc[0].Lat, point_of_interest.iloc[0].Long])
but in my use case I would need to pre-calculate zoom_start such that:
all (Lat, Long) pairs from my point_of_interest pandas DataFrame are within the map
the zoom level is the minimum possible
folium's fit_bounds method should work for you.
Some random sample data:
import folium
import numpy as np
import pandas as pd
center_point = [40, -90]
data = (
    np.random.normal(size=(100, 2)) *
    np.array([[.5, .5]]) +
    np.array([center_point])
)
df = pd.DataFrame(data, columns=['Lat', 'Long'])
Creating a map with some markers
m = folium.Map(df[['Lat', 'Long']].mean().values.tolist())
for lat, lon in zip(df['Lat'], df['Long']):
    folium.Marker([lat, lon]).add_to(m)
fit_bounds requires the 'bounds' of our data in the form of the southwest and northeast corners. There are some padding parameters you can use as well:
sw = df[['Lat', 'Long']].min().values.tolist()
ne = df[['Lat', 'Long']].max().values.tolist()
m.fit_bounds([sw, ne])
m
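If you want some breathing room around the outermost markers, fit_bounds also accepts padding in pixels, for example:
# ~20 px of padding so markers on the edge are not clipped
m.fit_bounds([sw, ne], padding=(20, 20))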
I have a geopandas df with a column of shapely point objects. I want to extract the coordinate (lat/lon) from the shapely point objects to generate latitude and longitude columns. There must be an easy way to do this, but I cannot figure it out.
I know you can extract the individual coordinates like this:
lon = df.point_object[0].x
lat = df.point_object[0].y
And I could create a function that does this for the entire df, but I figured there was a more efficient/elegant way.
If you have the latest version of geopandas (0.3.0 as of writing), and if df is a GeoDataFrame, you can use the x and y attributes on the geometry column:
df['lon'] = df.point_object.x
df['lat'] = df.point_object.y
In general, if you have a column of shapely objects, you can also use apply to do for the full column what you would do on an individual point:
df['lon'] = df.point_object.apply(lambda p: p.x)
df['lat'] = df.point_object.apply(lambda p: p.y)
Without having to iterate over the DataFrame, you can do the following:
df['lon'] = df['geometry'].x
df['lat'] = df['geometry'].y
Here is a solution to extract the center point (latitude and longitude) from polygons and multi-polygons:
import geopandas as gpd
df = gpd.read_file(path + 'df.geojson')
# Find the center point
df['Center_point'] = df['geometry'].centroid

# Extract lat and lon from the center point
df["long"] = df.Center_point.map(lambda p: p.x)
df["lat"] = df.Center_point.map(lambda p: p.y)