Plotly Express Choropleth for Country Regions - python

I have a dataframe created on a csv file about Italian Covid-19 spread all over regions. I was trying to create a px.choropleth plot in which showing Total Positive values for every regions in Italy.
This the code tried:
italy_regions=[i for i in region['Region'].unique()]
fig = px.choropleth(italy_last, locations="Country",
locationmode=italy_regions,
color=np.log(italy_last["TotalPositive"]),
hover_name="Region", hover_data=['TotalPositive'],
color_continuous_scale="Sunsetdark",
title='Regions with Positive Cases')
fig.update(layout_coloraxis_showscale=False)
fig.show()
Now I report some info: 'Country' is the name given to my dataframe and is filled only with the same values: 'Italy'. If I only input 'location="Country"' the graph is fine and I can see Italy colored into the world map.
The problems start when I try to make pyplot color my regions. As I'm a newbye in pyplot express, I read some examples and I thought I had to create a list of italian regions names and then put into 'choropleth' as input for 'barmode'.
Clearly I'm wrong.
So, what is the procedure to follow to make it run (if any)?
In case of need, I can provide both the csv file that the jupyter file I'm working on.

You need to provide a geojson with Italian region borders as geojson parameter to plotly.express.choropleth, for instance this one
https://gist.githubusercontent.com/datajournalism-it/48e29e7c87dca7eb1d29/raw/2636aeef92ba0770a073424853f37690064eb0ea/regioni.geojson
If you use this one, you need to explicitly pass featureidkey='properties.NOME_REG' as a parameter of plotly.express.choropleth.
Working example:
import pandas as pd
import requests
import plotly.express as px
regions = ['Piemonte', 'Trentino-Alto Adige', 'Lombardia', 'Puglia', 'Basilicata',
'Friuli Venezia Giulia', 'Liguria', "Valle d'Aosta", 'Emilia-Romagna',
'Molise', 'Lazio', 'Veneto', 'Sardegna', 'Sicilia', 'Abruzzo',
'Calabria', 'Toscana', 'Umbria', 'Campania', 'Marche']
# Create a dataframe with the region names
df = pd.DataFrame(regions, columns=['NOME_REG'])
# For demonstration, create a column with the length of the region's name
df['name_length'] = df['NOME_REG'].str.len()
# Read the geojson data with Italy's regional borders from github
repo_url = 'https://gist.githubusercontent.com/datajournalism-it/48e29e7c87dca7eb1d29/raw/2636aeef92ba0770a073424853f37690064eb0ea/regioni.geojson'
italy_regions_geo = requests.get(repo_url).json()
# Choropleth representing the length of region names
fig = px.choropleth(data_frame=df,
geojson=italy_regions_geo,
locations='NOME_REG', # name of dataframe column
featureidkey='properties.NOME_REG', # path to field in GeoJSON feature object with which to match the values passed in to locations
color='name_length',
color_continuous_scale="Magma",
scope="europe",
)
fig.update_geos(showcountries=False, showcoastlines=False, showland=False, fitbounds="locations")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
Output image

Related

Plotting a map using Geoview and using size/ colour option

I'm trying to visualize a dataset which I've filtered down to just longitude/latitude, country name, year and a count of deaths. I'm trying to plot that using geoviews as I wish to add lot more to my dataset and interactive map would be a great add on
My code is as follows: (for_plot is the dataframe)
# Plotting the graph
Best = gv.Dataset(for_plot)
points = Best.to(gv.Points, ['longitude', 'latitude'], ['deaths', 'country'])
(gts.Wikipedia * points).opts(
opts.Points(width=600, height=350, tools=['hover'],
size='deaths', cmap='viridis'))
This creates a perfect graph put the 'size' function doesn't work. If I change size to color, graph is not generated. I'm okay with either but just need atleast one marker.
Thanks for any help
Tried to switch values for color instead of size, works with year but not deaths

Folium chloropleth map only showing grey - help to troubleshoot

I having trouble getting some air pollution data to show different colors in a chloropleth map using folium. Please let me know where my code may be throwing an error. I think it is the key_on parameter but need help.
This is how my map turns out.
enter image description here
What I would like is for the mean concentration of the air pollution data to show up on the map but the map is still greyed out.
Here are the files I used:
Geojson file - Used "download zip" in upper right of this website https://gist.github.com/miguelpaz/edbc79fc55447ae736704654b3b2ef90#file-uhf42-geojson
Data file - Exported data from here https://a816-dohbesp.nyc.gov/IndicatorPublic/VisualizationData.aspx?id=2023,719b87,122,Summarize
Here is my code:
import geopandas as gpd
import folium
#clean pollution data
pm_df1 = pd.read_csv('/work/Fine Particulate Matter (PM2.5).csv',header = 5, usecols = ['GeoTypeName', 'Borough','Geography', 'Geography ID','Mean (mcg per cubic meter)'], nrows = 140)
#limit dataframe to rows with neighborhood (UHF 42) that matches geojson file
pm_df2 = pm_df1[(pm_df1['GeoTypeName'] == 'Neighborhood (UHF 42)')]
pm_df2
#clean geojson file
uhf_df2 = gpd.read_file('/work/uhf42.geojson', driver='GeoJSON')
uhf_df2.head()
#drop row 1 that has no geography
uhf_df3 = uhf_df2.iloc[1:]
uhf_df3.head()
## create a map
pm_testmap = folium.Map(location=[40.65639,-73.97379], tiles = "cartodbpositron", zoom_start=10)
# generate choropleth map
pm_testmap.choropleth(
geo_data=uhf_df3,
data=pm_df2,
columns=['Geography', 'Mean (mcg per cubic meter)'],
key_on='feature.properties.uhf_neigh', #think this is where I mess up.
fill_color='BuPu',
fill_opacity=0.2,
line_opacity=0.7,
legend_name='Average dust concentration',
smooth_factor=0)
# display map
pm_testmap
The problem with key_on is right as you think.
Both data have the name of UHF written on them, but in a completely different form.
In order to link these two, the data must first be preprocessed.
I don't know your data.
It would be nice if you could df.head() the two data to show them, but I'll explain based on the data I checked through the link you provided.
In your geojson file, uhf_neigh simply says Northeast Bronx. However, your PM data appears to have the region listed as Bronx: Northeast Bronx. The following process seems to be necessary to unify your local name before plotting map.
uhf_df2['UHF_NEIGH'] = uhf_df2['BOROUGH']+ ': ' + uhf_df2['UHF_NEIGH']
I tried to run it with your data and code, but it was not even displaying the map. There should be no problem in your code because you have associated the place name in the data frame with the place name in geojson. I gave up on the string association and changed the association to a place name code association, and the map was displayed. The provided csv file failed to load, so I deleted the unnecessary lines and loaded it. Also, I read the file as a json file instead of geopandas.
import pandas as pd
import geopandas as gpd
import json
import folium
pm_df1 = pd.read_csv('./data/test_20211221.csv')
pm_df1 = pm_df1[['GeoTypeName', 'Borough', 'Geography', 'Geography ID', 'Mean (mcg per cubic meter)']]
pm_df2 = pm_df1[(pm_df1['GeoTypeName'] == 'Neighborhood (UHF 42)')]
with open('./data/uhf42.geojson') as f:
uhf_df3 = json.load(f)
pm_testmap = folium.Map(location=[40.65639,-73.97379], tiles = "cartodbpositron", zoom_start=10)
# generate choropleth map
pm_testmap.choropleth(
geo_data=uhf_df3,
data=pm_df2,
columns=['Geography ID', 'Mean (mcg per cubic meter)'],
key_on='feature.properties.uhfcode', #think this is where I mess up.
fill_color='BuPu',
fill_opacity=0.2,
line_opacity=0.7,
legend_name='Average dust concentration',
smooth_factor=0)
# display map
pm_testmap

Encoding a list column to the legend of a plot

Apologies in advance, I am not sure how to word this question best:
I am working with a large dataset, and I would like to plot Latitude and Longitude where the colour of the points (actually the opacity) is encoded to a 'FeatureType' column binded to the legend. This way I can use the legend to highlight on my map various features I am looking for.
Here is a picture of my map and legend so far
The problem is that in my dataset, the FeatureType column is a list of features that can be found there (i.e arch, bridge, etc..).
How can I make it so that the point shows up for both arch, and bridge. At the moment it creates its own category of (arch,bridge etc.), leading to over 300 combinations of about 20 different FeatureTypes.
The dataset can be found at http://atlantides.org/downloads/pleiades/dumps/pleiades-locations-latest.csv.gz
N.B: I am using altair/pandas
import altair as alt
import pandas as pd
from vega_datasets import data
df = pd.read_csv ('C://path/pleiades-locations.csv')
alt.data_transformers.enable('json')
countries = alt.topo_feature(data.world_110m.url, 'countries')
selection = alt.selection_multi(fields=['featureType'], bind='legend')
brush = alt.selection(type='interval', encodings=['x'])
map = alt.Chart(countries).mark_geoshape(
fill='lightgray',
stroke='white'
).project('equirectangular').properties(
width=500,
height=300
)
points = alt.Chart(df).mark_circle().encode(
alt.Latitude('reprLat:Q'),
alt.Longitude('reprLong:Q'),
alt.Color('featureType:N'),
tooltip=['featureType','timePeriodsKeys:N'],
opacity=alt.condition(selection, alt.value(1), alt.value(0.0))
).add_selection(
selection)
(map + points)
It is not possible for Altair to generate the labels you want from your current column format. You will need to turn your comma-separated string labels into lists and then explode the column so that you get one row per item in the list:
import altair as alt
import pandas as pd
from vega_datasets import data
alt.data_transformers.enable('data_server')
df = pd.read_csv('http://atlantides.org/downloads/pleiades/dumps/pleiades-locations-latest.csv.gz')[['reprLong', 'reprLat', 'featureType']]
df['featureType'] = df['featureType'].str.split(',')
df = df.explode('featureType')
countries = alt.topo_feature(data.world_110m.url, 'countries')
world_map = alt.Chart(countries).mark_geoshape(
fill='lightgray',
stroke='white')
points = alt.Chart(df).mark_circle(size=10).encode(
alt.Latitude('reprLat:Q'),
alt.Longitude('reprLong:Q'),
alt.Color('featureType:N', legend=alt.Legend(columns=2)))
world_map + points
Note that having this many entries in the legend is not meaningful since the colors are repeated. The interactivity would help with that somewhat, but I would consider splitting this up into multiple charts. I am not sure if it is even possible to expand the legend to show those hidden 81 entries. And double check that the long lat location corresponds correctly with the world map projection you are using, they seemed to move around when I changed the projection.

Pandas plot several df with different variables on the same barplot

I have 4 Dataframes with different location: Indonesia, Singapore, Malaysia and Total each of them containing the percentage of the 5 top revenue-generating products. I have plotted them separately.
I want to combine them together on one plot where X-axis shows different locations and top-revenue-generating products for each location.
I have printed data frames and as you can see they have different products in them.
print(Ind_top_cat, Sin_top_cat, Mal_top_cat, Tot_top_cat)
Category Amt
M020P 0.144131
MH 0.099439
ML 0.055052
PB 0.050057
PPDR 0.048315
Category Amt
ML 0.480781
M015 0.073034
PPDR 0.035412
M025 0.033418
M020 0.031836
Category Amt
TN 0.343650
PPDR 0.190773
NMCN 0.118425
M015 0.047539
NN 0.038140
Category Amt
M020P 0.158575
MH 0.092012
ML 0.064179
PPDR 0.050803
PB 0.044301
Thanks to joelostblom I was able to construct a plot, however, there are still some issues.
enter image description here
all_countries = pd.concat([Ind_top_cat, Sin_top_cat, Mal_top_cat, Tot_top_cat])
all_countries['Category'] = all_countries.index
sns.barplot(x='Country', y='Amt',hue = 'Category',data=all_countries)
Is there any way I can put legend values on the x-axis (no need to colour categories on I want to instead colour countries), and put data values on top of bars. Also, bars are not centred and have no idea how to solve it.
You could create a new column in each dataframe with the country name, e.g.
Ind_top_cat['Country'] = 'Indonesia'
Sin_top_cat['Country'] = 'Singapore'
The you can create one big dataframe by concatenating the country dataframes together:
all_countries = pd.concat([Ind_top_cat, Sin_top_cat])
And finally, you can use a high level plotting library such as seaborn to assign one column to the x-axis location and one to the color of the bars:
import seaborn as sns
sns.barplot(x='Country', y='Amt', color='Category', data=all_countries)
You can scroll down to the second example on this page to get an idea what such a plot would look like (also pasted below):

Setting col_colors in seaborn clustermap from pandas

I have a clustermap generated from a pandas dataframe. Two of the columns are used to generate the clustermap and I need to use a 3rd column to generate a col_colors bar using sns.palplot(sns.light_palette('red')) palette (values will be from 0 - 1, light - dark colors).
The pseudo-code looks something like this:
df=pd.DataFrame(input, columns = ['Source', 'amplicon', 'coverage', 'GC'])
tiles = df.pivot("Source", "amplicon", "coverage")
col_colors = [values from df['GC']]
sns.clustermap(tiles, vmin=0, vmax=2, col_colors=col_colors)
I'm battling to find details on how to setup the col_colors so the correct values are linked to the appropriate tiles. Some direction would be greatly appreciated.
This example will be much easier to explain with sample data. I don't know what your data looks like, but say you had a bunch of GC content measurements For instance:
import seaborn as sns
import numpy as np
import pandas as pd
data = {'16S':np.random.normal(.52, 0.05, 12),
'ITS':np.random.normal(.52, 0.05, 12),
'Source':np.random.choice(['soil', 'water', 'air'], 12, replace=True)}
df=pd.DataFrame(data)
df[:3]
16S ITS Source
0 0.493087 0.460066 air
1 0.607229 0.592945 water
2 0.577155 0.440726 water
So data is GC content, and then there is a column describing the source. Say we want to plot a cluster map of the GC content where we use the Source column to define the network
#create a color palette with the same number of colors as unique values in the Source column
network_pal = sns.light_palette('red', len(df.Source.unique()))
#Create a dictionary where the key is the category and the values are the
#colors from the palette we just created
network_lut = dict(zip(df.Source.unique(), network_pal))
#get the series of all of the categories
networks = df.Source
#map the colors to the series. Now we have a list of colors the same
#length as our dataframe, where unique values are mapped to the same color
network_colors = pd.Series(networks).map(network_lut)
#plot the heatmap with the 16S and ITS categories with the network colors
#defined by Source column
sns.clustermap(df[['16S', 'ITS']], row_colors=network_colors, cmap='BuGn_r')
Basically what most of the above code is doing is creating a vector of colors that corrospond to the Source column of the data frame. You could of course create this manually, where the first color in the list would be mapped to the first row in the dataframe and the second color would be mapped to the second row and so on (this order will change when you plot it), however that would be a lot of work. I used a red color palette as that is what you mentioned in your question though I might recommend using a different palette. I colored by rows, however you can do the same thing for columns. Hope this helps!

Categories