How to generate a choropleth map based on region names? - python

I'm working in Python with a dataset that holds one numerical value for each Italian region, like this:
import numpy as np
import pandas as pd
regions = ['Trentino Alto Adige', "Valle d'Aosta", 'Veneto', 'Lombardia', 'Emilia-Romagna', 'Toscana', 'Friuli-Venezia Giulia', 'Liguria', 'Piemonte', 'Marche', 'Lazio', 'Umbria', 'Abruzzo', 'Sardegna', 'Puglia', 'Molise', 'Basilicata', 'Calabria', 'Sicilia', 'Campania']
df = pd.DataFrame([regions,[10+(i/2) for i in range(20)]]).transpose()
df.columns = ['region','quantity']
df.head()
I would like to generate a map of Italy in which the colour of each region depends on the numeric value of the variable quantity (df['quantity']), i.e., a choropleth map.
How can I do it?

You can use geopandas.
The region names in your df don't match the ones in the GeoJSON exactly (two of them differ). You could find another GeoJSON, or alter the names so they match.
import pandas as pd
import geopandas as gpd
regions = ['Trentino Alto Adige', "Valle d'Aosta", 'Veneto', 'Lombardia', 'Emilia-Romagna', 'Toscana', 'Friuli-Venezia Giulia', 'Liguria', 'Piemonte', 'Marche', 'Lazio', 'Umbria', 'Abruzzo', 'Sardegna', 'Puglia', 'Molise', 'Basilicata', 'Calabria', 'Sicilia', 'Campania']
# Build the frame from a dict so that 'quantity' gets a numeric dtype
df = pd.DataFrame({'region': regions, 'quantity': [10 + (i / 2) for i in range(20)]})
# Download a GeoJSON of the municipality geometries
gdf = gpd.read_file(filename=r'https://raw.githubusercontent.com/openpolis/geojson-italy/master/geojson/limits_IT_municipalities.geojson')
# The GeoJSON is too detailed, so dissolve the boundaries by the reg_name attribute
gdf = gdf.dissolve(by='reg_name')
gdf = gdf.reset_index()
# gdf.reg_name[~gdf.reg_name.isin(regions)] shows the two regions that have no match in your df:
# 16    Trentino-Alto Adige/Südtirol
# 18    Valle d'Aosta/Vallée d'Aoste
gdf = pd.merge(left=gdf, right=df, how='left', left_on='reg_name', right_on='region')
ax = gdf.plot(
    column='quantity',
    legend=True,
    figsize=(15, 10),
    cmap='OrRd',
    missing_kwds={'color': 'lightgrey'},  # regions without a value are drawn in grey
)
ax.set_axis_off()
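One way to alter the names so they match is to map the two mismatching spellings from your df onto the spellings used in the GeoJSON before merging. The sketch below assumes that the two names listed in the comments above are the only mismatches; REGION_ALIASES and to_geojson_name are hypothetical names chosen for illustration, not part of any library.

```python
# Hypothetical alias table: spelling in df -> spelling in the GeoJSON.
# The two entries come from the mismatches noted above; extend it if more differ.
REGION_ALIASES = {
    "Trentino Alto Adige": "Trentino-Alto Adige/Südtirol",
    "Valle d'Aosta": "Valle d'Aosta/Vallée d'Aoste",
}


def to_geojson_name(region):
    """Return the GeoJSON spelling of a region name, or the name unchanged."""
    return REGION_ALIASES.get(region, region)


sample = ["Trentino Alto Adige", "Veneto", "Valle d'Aosta"]
converted = [to_geojson_name(name) for name in sample]
print(converted)
```

With the DataFrame from the question, this could be applied as df['region'] = df['region'].map(to_geojson_name) before the pd.merge call.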

Related

Folium Color Issues

I'm working with Folium for the first time, and attempting to make a Choropleth map of housing values in North Carolina using Zillow data as the source. I've been running into lots of issues along the way, and right now I'm a bit stuck on how to add colors to the map: if the property value is >100k make it green, slowly shifting the gradient towards orange as it approaches 850k.
At the moment the map does generate the zip code data fine, but all of the polygons are a black-grey color. It's also not showing a color key or map name, and I have a feeling some of my earlier code could be off.
import folium
import pandas as pd
import requests
import os
working_directory = os.getcwd()
print(working_directory)
path = working_directory + '/Desktop/NCHomes.csv'
df = pd.read_csv(path)
df.head()
df['Homes'].min(), df['Homes'].max()
INDICATOR = 'North Carolina Home Values by Zip Code'
data = df[df['RegionName'] == INDICATOR]
max_value = data['Homes'].max()
data = data[data['Homes'] == max_value]
data.head()
geojson_url = 'https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/nc_north_carolina_zip_codes_geo.min.json'
response = requests.get(geojson_url)
geojson = response.json()
geojson
geojson['features'][0]
map_data = data[['RegionName', 'Homes']]
map_data.head()
M = folium.Map(location=[20, 10], zoom_start=2)
folium.Choropleth(
    geo_data=geojson,
    data=map_data,
    columns=['RegionName', 'Homes'],
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name=INDICATOR
).add_to(M)
M
You can specify the threshold_scale parameter as follows (in newer folium versions threshold_scale is deprecated in favour of bins, which accepts the same kind of list):
folium.Choropleth(
    geo_data=geojson,
    data=map_data,
    columns=['RegionName', 'Homes'],
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    threshold_scale=[100000, 850000],
    legend_name=INDICATOR
).add_to(M)
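If you would rather have several colour steps between the two thresholds, a list of evenly spaced edges can be built first. This is only a sketch: make_bins and clip are hypothetical helpers, the 100000 and 850000 limits come from the question, and the choice of five steps is an assumption.

```python
def make_bins(low, high, steps):
    """Return steps + 1 evenly spaced edges running from low to high."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    width = (high - low) / steps
    # Append high explicitly so the last edge is exact
    return [low + i * width for i in range(steps)] + [high]


def clip(value, low, high):
    """Clamp a value into [low, high] so it falls inside the outer edges."""
    return max(low, min(high, value))


edges = make_bins(100000, 850000, 5)
print(edges)
print(clip(50000, 100000, 850000), clip(900000, 100000, 850000))
```

The resulting list could then be passed as threshold_scale=edges (or bins=edges in newer folium versions). Clipping the Homes column to the outer edges beforehand keeps every value inside the scale.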

Frequency map by country in python

I am trying to make a world map with some specific frequency data for some countries. I have tried to use plotly (below), but the base map is not available, and it won't let me load a new one I found.
The map I need should shade each country where this variable is present on a color scale (by intensity).
These are the data and the code with which I have tried to plot the map:
database = px.data.gapminder()
d = {'Australia': [3],
     'Brazil': [2],
     'Canada': [6],
     'Chile': [3],
     'Denmark': [1],
     'France': [16],
     'Germany': [3],
     'Israel': [1]}
data = pd.DataFrame(d).T.reset_index()
data.columns = ['country', 'count']
df = pd.merge(database, yourdata, how='left', on='country')
url = (
    "https://raw.githubusercontent.com/python-visualization/folium/master/examples/data"
)
fig = px.choropleth(df, locations="country",
                    locationmode='ISO-3',
                    geojson=f"{url}/world-countries.json",
                    color="count")
I keep getting the same error.
Thank you for your help
I think the map is not displayed because the location mode and the locations column do not match. If the location mode is 'ISO-3', the locations column for this data should be 'iso_alpha'; if the location mode is 'country names', it should be 'country'. Because gapminder has one row per country and year, I also filtered it to a single year (2007) and changed the merge to an inner join.
import pandas as pd
import plotly.express as px
d = {'Australia': [3],
     'Brazil': [2],
     'Canada': [6],
     'Chile': [3],
     'Denmark': [1],
     'France': [16],
     'Germany': [3],
     'Israel': [1]}
data = pd.DataFrame(d).T.reset_index()
data.columns = ['country', 'count']
# Keep a single year so that each country appears only once
database = px.data.gapminder().query('year == 2007')
df = pd.merge(database, data, how='inner', on='country')
url = (
    "https://raw.githubusercontent.com/python-visualization/folium/master/examples/data"
)
fig = px.choropleth(df,
                    locations="country",  # or "iso_alpha"
                    locationmode="country names",  # or "ISO-3"
                    geojson=f"{url}/world-countries.json",
                    color="count"
                    )
fig.show()

Create a new column from results in Folium

I have been working with an Accident Database from Seattle which contains the coordinates of around 200,000 accidents. What I want to do is to group those accidents geographically in districts, for example. To that end I visualised the grouping on a map using Folium but now I don't know how to extract those same groups into a new column in my database (or if it is even possible).
Here is what I have been doing with Folium and the result:
import folium
from folium import plugins
# Using Seattle's latitude and longitude
latitude = 47.608013
longitude = -122.335167
seattle_map = folium.Map(location=[latitude, longitude], zoom_start=12)
incidents = plugins.MarkerCluster().add_to(seattle_map)
for lat, lng, label in zip(database.Y, database.X, database.SEVERITYCODE):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=folium.Popup(label),
    ).add_to(incidents)
seattle_map
If you want to add the districts of Seattle, you can use this GitHub repository, which holds all kinds of geographical data about Seattle: https://github.com/seattleio/seattle-boundaries-data
For instance, if you want to add the zip code areas to the map, you can use the GeoJSON file like this:
import folium
latitude = 47.608013
longitude = -122.335167
url = "https://raw.githubusercontent.com/seattleio/seattle-boundaries-data/master/data/zip-codes.geojson"
seattle_map = folium.Map(location=[latitude, longitude], zoom_start=12)
folium.GeoJson(
    url,
    name='zip_code'
).add_to(seattle_map)
seattle_map
If you want to attach the zip code areas to the collision data, a Choropleth map from Folium is a good choice. You first need to work out which zip code area each collision belongs to; I use the shapely library for that. The code could look like this:
import requests
import folium
import pandas as pd
from shapely.geometry import shape, Point
# URL of the GeoJSON with the zip code areas of Seattle
url = "https://raw.githubusercontent.com/seattleio/seattle-boundaries-data/master/data/zip-codes.geojson"
# Import the collision data for Seattle
df = pd.read_csv("Collisions.csv")
# Keep only longitude (X) and latitude (Y)
df_clean = df.loc[:, ["X", "Y"]]
df_clean = df_clean.dropna()
r = requests.get(url)
# Parse the GeoJSON and build each polygon once, instead of on every row
zones = [(shape(feature["geometry"]), feature["properties"]["ZCTA5CE10"])
         for feature in r.json()["features"]]
for index, row in df_clean.iterrows():
    point = Point(row["X"], row["Y"])
    for polygon, zipcode in zones:
        if polygon.contains(point):
            # Store the zip code of the first polygon that contains the point
            df_clean.loc[index, "ZCTA5CE10"] = zipcode
            break
# Drop collisions that fall outside every zip code area
df_clean = df_clean.dropna()
# Count the collisions per zip code area
result = df_clean.groupby(["ZCTA5CE10"])["X"].count()
result = pd.DataFrame(result)
result.reset_index(level=0, inplace=True)
# Using Seattle's latitude and longitude
latitude = 47.608013
longitude = -122.335167
seattle_map = folium.Map(location=[latitude, longitude], zoom_start=12)
folium.Choropleth(
    geo_data=url,
    name='choropleth',
    data=result,
    columns=["ZCTA5CE10", 'X'],
    key_on='feature.properties.ZCTA5CE10',
    fill_color='YlOrRd',
).add_to(seattle_map)
seattle_map
This code could be made a lot faster (a Python for loop is clearly not the best way to create a new column in a DataFrame).
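To make the core of that nested loop concrete without any dependencies, here is a minimal pure-Python sketch of giving each point the label of the polygon that contains it, which is the value that ends up in the new column. It swaps shapely for a simple ray-casting test, ignores holes and multi-part polygons, and uses made-up zones and points; point_in_ring and assign_zone are hypothetical helper names.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_zone(x, y, zones):
    """Return the label of the first zone containing (x, y), or None."""
    for label, ring in zones.items():
        if point_in_ring(x, y, ring):
            return label
    return None


# Two made-up square zones and three made-up points (x = longitude, y = latitude)
zones = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(1, 1), (3, 1), (5, 5)]
labels = [assign_zone(x, y, zones) for x, y in points]
print(labels)
```

The list of labels plays the role of the ZCTA5CE10 column above; points outside every zone get None and would be dropped by dropna().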

How to get variation of the color in choropleth Folium?

I am surely missing something in the choropleth configuration. Please find my code below.
import pandas as pd
import folium
df = pd.read_csv("https://cocl.us/sanfran_crime_dataset",index_col=0)
# group by neighborhood
sf = df.groupby('PdDistrict').count()
sf = pd.DataFrame(sf,columns=['Category']) # remove unneeded columns
sf.reset_index(inplace=True) # default index, otherwise groupby column becomes index
sf.rename(columns={'PdDistrict':'Neighborhood','Category':'Count'}, inplace=True)
sf.sort_values(by='Count', inplace=True, ascending=False)
sf
# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42
sf_neighborhood_geo = 'https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/san-francisco.geojson'
# Create map
sf_map = folium.Map(location=[latitude,longitude], zoom_start=12)
# Use json file TEST based on class
sf_map.choropleth(
    geo_data=sf_neighborhood_geo,
    data=sf,
    columns=['Neighborhood','Count'],
    key_on='name',
    fill_color='YlOrRd',
    fill_opacity='0.7',
    line_opacity='0.3',
    legend_name='Crime Rate in San Francisco, by Neighborhood')
folium.LayerControl().add_to(sf_map)
# display the map
sf_map
Please let me know which part of the choropleth is not correct.
First of all, please use the class folium.Choropleth() instead of the method choropleth(), which is deprecated.
For example, for your problem:
m = folium.Map(location=[latitude,longitude], zoom_start=12)
folium.Choropleth(geo_data=sf_neighborhood_geo,
                  name='choropleth',
                  data=sf,
                  columns=['Neighborhood','Count'],
                  key_on='feature.properties.name',
                  fill_color='YlOrRd',
                  fill_opacity=0.7,
                  line_opacity=0.2,
                  legend_name='Crime Rate in San Francisco, by Neighborhood').add_to(m)
folium.LayerControl().add_to(m)
Having said that, there are two problems in your code:
According to the GeoJSON file, key_on='name' should be key_on='feature.properties.name'.
The Neighborhood column in your DataFrame holds the PdDistrict values, which are not names contained in the GeoJSON file, so the polygons cannot be matched to counts and you will likely obtain a map without any colour variation.
In order to obtain a meaningful choropleth map, the names in sf_neighborhood_geo should correspond to the values in sf['Neighborhood'].
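A quick way to check this before plotting is to list the DataFrame keys that have no counterpart in the GeoJSON properties. The sketch below is only an illustration: unmatched_keys is a hypothetical helper, and the tiny FeatureCollection and the names in it are made up rather than taken from the real file.

```python
def unmatched_keys(geojson, data_keys, property_name="name"):
    """Return the data keys that match no feature's properties[property_name]."""
    geo_names = {
        feature["properties"].get(property_name)
        for feature in geojson["features"]
    }
    return sorted(key for key in set(data_keys) if key not in geo_names)


# Made-up FeatureCollection standing in for the downloaded GeoJSON
geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Mission"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Bayview"}, "geometry": None},
    ],
}
missing = unmatched_keys(geo, ["MISSION", "Bayview", "SOUTHERN"])
print(missing)
```

With the real data, the second argument would be sf['Neighborhood'] and the first the parsed GeoJSON; an empty result means every row can be matched to a polygon. Note that the comparison is case-sensitive, as in the made-up example.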

pre-determine optimal level of zoom in folium

I am plotting some maps using folium, and it works pretty smoothly.
However, I could not figure out how to pre-calculate the right level of zoom.
I can have it set automatically:
import folium
m = folium.Map([point_of_interest.iloc[0].Lat, point_of_interest.iloc[0].Long])
but in my use case I would need to pre-calculate zoom_start such that:
all (Lat, Long) pairs from my pandas dataframe point_of_interest are within the map
the zoom level is the minimum possible
folium's fit_bounds method should work for you.
Some random sample data:
import folium
import numpy as np
import pandas as pd
center_point = [40, -90]
data = (
    np.random.normal(size=(100, 2)) *
    np.array([[.5, .5]]) +
    np.array([center_point])
)
df = pd.DataFrame(data, columns=['Lat', 'Long'])
Creating a map with some markers:
m = folium.Map(df[['Lat', 'Long']].mean().values.tolist())
for lat, lon in zip(df['Lat'], df['Long']):
    folium.Marker([lat, lon]).add_to(m)
fit_bounds requires the bounds of our data in the form of the southwest and northeast corners. There are some padding parameters you can use as well.
sw = df[['Lat', 'Long']].min().values.tolist()
ne = df[['Lat', 'Long']].max().values.tolist()
m.fit_bounds([sw, ne])
m
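For reference, the two corners are just the column-wise minima and maxima, which can also be computed without pandas. This is a small sketch with made-up coordinates; bounds is a hypothetical helper name, and it assumes the points do not straddle the 180° meridian.

```python
def bounds(points):
    """Return [[min_lat, min_lon], [max_lat, max_lon]] for (lat, lon) pairs."""
    if not points:
        raise ValueError("need at least one point")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    # Southwest corner = smallest latitude and longitude, northeast = largest
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


# Made-up (lat, lon) pairs around the sample centre point
pts = [(40.2, -90.5), (39.8, -89.7), (40.6, -90.1)]
sw, ne = bounds(pts)
print(sw, ne)
```

The result can be passed straight to the map as m.fit_bounds([sw, ne]), exactly like the pandas version above.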
