Creating a UK Heatmap - python

I have a data frame for UK data that looks something like this:
longitude latitude region priority
51.307733 -0.75708898 South East High
51.527477 -0.20646542 London Medium
51.725135 0.4747223 East of England Low
This dataframe is several thousand rows long. I want a heatmap of the UK broken down by the regions and colour intensity to be dependent on the priority in each region.
I would like to know the best way to turn this into a heatmap of the UK. I have tried geoPandas and Plotly but I have no functioning knowledge of these. Are these the best way to do it or is there a tool out there that you can simply upload your data to and it will plot it for you? Thanks!

For this kind of job i use to go with folium, which is great to work with maps,
But for the heatMap you have to have your "priority" column as float!
import folium
from folium import plugins
from folium.plugins import HeatMap
my_map = folium.Map(location=[51.5074, 0.1278],
zoom_start = 13) # for UK
your_dataframe['latitude'] = your_dataframe['latitude'].astype(float)
your_dataframe['longitude'] = your_dataframe['longitude'].astype(float)
your_dataframe['priority'] = your_dataframe['priority'].astype(float)
heat_df = your_dataframe[['latitude', 'longitude','priority']]
heat_df = heat_df.dropna(axis=0, subset=['latitude','longitude','priority'])
# List comprehension to make out list of lists
heat_data = [[row['latitude'],row['longitude'],row['priority']] for index, row in heat_df.iterrows()]
my_map.add_children(plugins.HeatMap(heat_data))
my_map.save('map.html')
and then you have to open map.html with yout browser

Related

Folium chloropleth map only showing grey - help to troubleshoot

I having trouble getting some air pollution data to show different colors in a chloropleth map using folium. Please let me know where my code may be throwing an error. I think it is the key_on parameter but need help.
This is how my map turns out.
enter image description here
What I would like is for the mean concentration of the air pollution data to show up on the map but the map is still greyed out.
Here are the files I used:
Geojson file - Used "download zip" in upper right of this website https://gist.github.com/miguelpaz/edbc79fc55447ae736704654b3b2ef90#file-uhf42-geojson
Data file - Exported data from here https://a816-dohbesp.nyc.gov/IndicatorPublic/VisualizationData.aspx?id=2023,719b87,122,Summarize
Here is my code:
import geopandas as gpd
import folium
#clean pollution data
pm_df1 = pd.read_csv('/work/Fine Particulate Matter (PM2.5).csv',header = 5, usecols = ['GeoTypeName', 'Borough','Geography', 'Geography ID','Mean (mcg per cubic meter)'], nrows = 140)
#limit dataframe to rows with neighborhood (UHF 42) that matches geojson file
pm_df2 = pm_df1[(pm_df1['GeoTypeName'] == 'Neighborhood (UHF 42)')]
pm_df2
#clean geojson file
uhf_df2 = gpd.read_file('/work/uhf42.geojson', driver='GeoJSON')
uhf_df2.head()
#drop row 1 that has no geography
uhf_df3 = uhf_df2.iloc[1:]
uhf_df3.head()
## create a map
pm_testmap = folium.Map(location=[40.65639,-73.97379], tiles = "cartodbpositron", zoom_start=10)
# generate choropleth map
pm_testmap.choropleth(
geo_data=uhf_df3,
data=pm_df2,
columns=['Geography', 'Mean (mcg per cubic meter)'],
key_on='feature.properties.uhf_neigh', #think this is where I mess up.
fill_color='BuPu',
fill_opacity=0.2,
line_opacity=0.7,
legend_name='Average dust concentration',
smooth_factor=0)
# display map
pm_testmap
The problem with key_on is right as you think.
Both data have the name of UHF written on them, but in a completely different form.
In order to link these two, the data must first be preprocessed.
I don't know your data.
It would be nice if you could df.head() the two data to show them, but I'll explain based on the data I checked through the link you provided.
In your geojson file, uhf_neigh simply says Northeast Bronx. However, your PM data appears to have the region listed as Bronx: Northeast Bronx. The following process seems to be necessary to unify your local name before plotting map.
uhf_df2['UHF_NEIGH'] = uhf_df2['BOROUGH']+ ': ' + uhf_df2['UHF_NEIGH']
I tried to run it with your data and code, but it was not even displaying the map. There should be no problem in your code because you have associated the place name in the data frame with the place name in geojson. I gave up on the string association and changed the association to a place name code association, and the map was displayed. The provided csv file failed to load, so I deleted the unnecessary lines and loaded it. Also, I read the file as a json file instead of geopandas.
import pandas as pd
import geopandas as gpd
import json
import folium
pm_df1 = pd.read_csv('./data/test_20211221.csv')
pm_df1 = pm_df1[['GeoTypeName', 'Borough', 'Geography', 'Geography ID', 'Mean (mcg per cubic meter)']]
pm_df2 = pm_df1[(pm_df1['GeoTypeName'] == 'Neighborhood (UHF 42)')]
with open('./data/uhf42.geojson') as f:
uhf_df3 = json.load(f)
pm_testmap = folium.Map(location=[40.65639,-73.97379], tiles = "cartodbpositron", zoom_start=10)
# generate choropleth map
pm_testmap.choropleth(
geo_data=uhf_df3,
data=pm_df2,
columns=['Geography ID', 'Mean (mcg per cubic meter)'],
key_on='feature.properties.uhfcode', #think this is where I mess up.
fill_color='BuPu',
fill_opacity=0.2,
line_opacity=0.7,
legend_name='Average dust concentration',
smooth_factor=0)
# display map
pm_testmap

PANDAS-GEOPANDAS: Localization of points in a shapefile

Using pandas and geopandas, I would like to define a function to be applied to each row of a dataframe which operates as follows:
INPUT: column with coordinates
OUTPUT: zone in which the point falls.
I tried with this, but it takes very long.
def zone_assign(point,zones,codes):
try:
zone_label=zones[zones['geometry'].contains(point)][codes].values[0]
except:
zone_label=np.NaN
return(zone_label)
where:
point is the cell of the row which contains geographical coordinates;
zones is the shapefile imported with geopandas;
codes is the column of the shapefile which contains label to be assigned to the point.
Part of the answer, is taken from another answer I made earlier that needed within rather than contains
Your situation looks like a typical case where spatial joins are useful. The idea of spatial joins is to merge data using geographic coordinates instead of using attributes.
Three possibilities in geopandas:
intersects
within
contains
It seems like you want contains, which is possible using the following syntax:
geopandas.sjoin(polygons, points, how="inner", op='contains')
Note: You need to have installed rtree to be able to perform such operations. If you need to install this dependency, use pip or conda to install it
Example
As an example, let's take a random sample of cities and plot countries associated. The two example datasets are
import geopandas
import matplotlib.pyplot as plt
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))
cities = cities.sample(n=50, random_state=1)
world.head(2)
pop_est continent name iso_a3 gdp_md_est geometry
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
cities.head(3)
name geometry
196 Bogota POINT (-74.08529 4.59837)
95 Tbilisi POINT (44.78885 41.72696)
173 Seoul POINT (126.99779 37.56829)
world is a worldwide dataset and cities is a subset.
Both dataset need to be in the same projection system. If not, use .to_crs before merging.
data_merged = geopandas.sjoin(countries, cities, how="inner", op='contains')
Finally, to see the result let's do a map
f, ax = plt.subplots(1, figsize=(20,10))
data_merged.plot(axes=ax)
countries.plot(axes=ax, alpha=0.25, linewidth=0.1)
plt.show()
and the underlying dataset merges together the information we need
data_merged.head(2)
pop_est continent name_left iso_a3 gdp_md_est geometry index_right name_right
7 6909701 Oceania Papua New Guinea PNG 28020.0 MULTIPOLYGON (((141.00021 -2.60015, 142.73525 ... 59 Port Moresby
9 44293293 South America Argentina ARG 879400.0 MULTIPOLYGON (((-68.63401 -52.63637, -68.25000... 182 Buenos Aires
Here, I used inner join method but that's a parameter you can change if, for instance, you want to keep all points, including those not within a polygon.

Plotly Express Choropleth for Country Regions

I have a dataframe created on a csv file about Italian Covid-19 spread all over regions. I was trying to create a px.choropleth plot in which showing Total Positive values for every regions in Italy.
This the code tried:
italy_regions=[i for i in region['Region'].unique()]
fig = px.choropleth(italy_last, locations="Country",
locationmode=italy_regions,
color=np.log(italy_last["TotalPositive"]),
hover_name="Region", hover_data=['TotalPositive'],
color_continuous_scale="Sunsetdark",
title='Regions with Positive Cases')
fig.update(layout_coloraxis_showscale=False)
fig.show()
Now I report some info: 'Country' is the name given to my dataframe and is filled only with the same values: 'Italy'. If I only input 'location="Country"' the graph is fine and I can see Italy colored into the world map.
The problems start when I try to make pyplot color my regions. As I'm a newbye in pyplot express, I read some examples and I thought I had to create a list of italian regions names and then put into 'choropleth' as input for 'barmode'.
Clearly I'm wrong.
So, what is the procedure to follow to make it run (if any)?
In case of need, I can provide both the csv file that the jupyter file I'm working on.
You need to provide a geojson with Italian region borders as geojson parameter to plotly.express.choropleth, for instance this one
https://gist.githubusercontent.com/datajournalism-it/48e29e7c87dca7eb1d29/raw/2636aeef92ba0770a073424853f37690064eb0ea/regioni.geojson
If you use this one, you need to explicitly pass featureidkey='properties.NOME_REG' as a parameter of plotly.express.choropleth.
Working example:
import pandas as pd
import requests
import plotly.express as px
regions = ['Piemonte', 'Trentino-Alto Adige', 'Lombardia', 'Puglia', 'Basilicata',
'Friuli Venezia Giulia', 'Liguria', "Valle d'Aosta", 'Emilia-Romagna',
'Molise', 'Lazio', 'Veneto', 'Sardegna', 'Sicilia', 'Abruzzo',
'Calabria', 'Toscana', 'Umbria', 'Campania', 'Marche']
# Create a dataframe with the region names
df = pd.DataFrame(regions, columns=['NOME_REG'])
# For demonstration, create a column with the length of the region's name
df['name_length'] = df['NOME_REG'].str.len()
# Read the geojson data with Italy's regional borders from github
repo_url = 'https://gist.githubusercontent.com/datajournalism-it/48e29e7c87dca7eb1d29/raw/2636aeef92ba0770a073424853f37690064eb0ea/regioni.geojson'
italy_regions_geo = requests.get(repo_url).json()
# Choropleth representing the length of region names
fig = px.choropleth(data_frame=df,
geojson=italy_regions_geo,
locations='NOME_REG', # name of dataframe column
featureidkey='properties.NOME_REG', # path to field in GeoJSON feature object with which to match the values passed in to locations
color='name_length',
color_continuous_scale="Magma",
scope="europe",
)
fig.update_geos(showcountries=False, showcoastlines=False, showland=False, fitbounds="locations")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
Output image

How to implement weights in folium Heatmap?

I'm trying to add weights to my folium heatmap layer, but I can't figure out how to correctly implement this.
I have a dataframe with 3 columns: LAT, LON and VALUE. Value being the total sales of that location.
self.map = folium.Map([mlat, mlon], tiles=tiles, zoom_start=8)
locs = zip(self.data.LAT, self.data.LON, self.data.VALUE)
HeatMap(locs, radius=30, blur=10).add_to(self.map)
I tried to use the absolute sales values and I also tried to normalize sales/sales.sum(). Both give me similar results.
The problem is:
Heatmap shows stronger red levels for regions with more stores. Even if the total sales of those stores together is a lot smaller than sales of a distant and isolate large store.
Expected behaviour:
I would expect that the intensity of the heatmap should use the value of sales of each store, as sales was passed in the zip object to the HeatMap plugin.
Let's say I have 2 regions: A and B.
In region A I have 3 stores: 10 + 15 + 10 = 35 total sales.
In region B I have 1 big store: 100 total sales
I'd expect a greater intensity for region B than for region A. I noticed that a similar behaviour only occurs when the difference is very large (if I try 35 vs 5000000 then region B becomes more relevant).
My CSV file is just a random sample, like this:
LAT,LON,VALUE,DATE,DIFFLAT1,DIFFLON1
-22.4056,-53.6193,14,2010,0.0242,0.4505
-22.0516,-53.7025,12,2010,0.3137,0.6636
-22.3239,-52.9108,100,2010,0.0514,0.0002
-22.6891,-53.7424,6,2010,0.0002,0.7887
-21.8762,-53.6866,16,2010,0.7283,0.6180
-22.1861,-53.5353,11,2010,0.1420,0.2924
from folium import plugins
from folium.plugins import HeatMap
heat_df = df.loc[:,["lat","lon","weight"]]
map_hooray = folium.Map(location=[45.517999 ,-73.568184 ], zoom_start=12 )
Format: list of lists as well as lat, lon and weight
heat_data = heat_df.values.tolist()
Plot it on the map
HeatMap(heat_data,radius=13).add_to(map_hooray)
Save the map
map_hooray.save('heat_map.html')

Pandas plot several df with different variables on the same barplot

I have 4 Dataframes with different location: Indonesia, Singapore, Malaysia and Total each of them containing the percentage of the 5 top revenue-generating products. I have plotted them separately.
I want to combine them together on one plot where X-axis shows different locations and top-revenue-generating products for each location.
I have printed data frames and as you can see they have different products in them.
print(Ind_top_cat, Sin_top_cat, Mal_top_cat, Tot_top_cat)
Category Amt
M020P 0.144131
MH 0.099439
ML 0.055052
PB 0.050057
PPDR 0.048315
Category Amt
ML 0.480781
M015 0.073034
PPDR 0.035412
M025 0.033418
M020 0.031836
Category Amt
TN 0.343650
PPDR 0.190773
NMCN 0.118425
M015 0.047539
NN 0.038140
Category Amt
M020P 0.158575
MH 0.092012
ML 0.064179
PPDR 0.050803
PB 0.044301
Thanks to joelostblom I was able to construct a plot, however, there are still some issues.
enter image description here
all_countries = pd.concat([Ind_top_cat, Sin_top_cat, Mal_top_cat, Tot_top_cat])
all_countries['Category'] = all_countries.index
sns.barplot(x='Country', y='Amt',hue = 'Category',data=all_countries)
Is there any way I can put legend values on the x-axis (no need to colour categories on I want to instead colour countries), and put data values on top of bars. Also, bars are not centred and have no idea how to solve it.
You could create a new column in each dataframe with the country name, e.g.
Ind_top_cat['Country'] = 'Indonesia'
Sin_top_cat['Country'] = 'Singapore'
The you can create one big dataframe by concatenating the country dataframes together:
all_countries = pd.concat([Ind_top_cat, Sin_top_cat])
And finally, you can use a high level plotting library such as seaborn to assign one column to the x-axis location and one to the color of the bars:
import seaborn as sns
sns.barplot(x='Country', y='Amt', color='Category', data=all_countries)
You can scroll down to the second example on this page to get an idea what such a plot would look like (also pasted below):

Categories