Color map based on countries' frequency counts - python

I have a users data set with a Country column and want to plot a map of the users' distribution across countries. I converted the data set into a dictionary whose keys are country names and whose values are the frequency counts for each country. The dictionary looks like this:
'usa': 139421,
'canada': 21601,
'united kingdom': 18314,
'germany': 17024,
'spain': 13096,
[...]
To plot distribution on a world map I used this code:
#Convert to dictionary
counts = users['Country'].value_counts().to_dict()
#Country names
def getList(dict):
    return [*dict]
countrs = getList(counts)
#Frequency counts
freqs = list(counts.values())
#Plotting
data = dict(
type = 'choropleth',
colorscale = 'Viridis',
reversescale = True,
locations = countrs,
locationmode = "country names",
z = freqs,
text = users['Country'],
colorbar = {'title' : 'Number of Users'},
)
layout = dict(title = 'Number of Users per Country',
geo = dict(showframe = False)
)
choromap = go.Figure(data = [data],layout = layout)
iplot(choromap,validate=False)
This is the result I got:
The coloring is wrong; it shows that all countries fall into 0-20K range, which is false. Is there a way to fix this? Thank you

Without access to your complete dataset, this is really hard to answer. I'd suggest starting out with this example instead:
Plot 1:
Here you can simply replace lifeExp with your data, and everything should work as long as your data is in the correct format. In the following snippet I've created random values for each country to stand in for your counts variable.
Code:
import plotly.express as px
import numpy as np
np.random.seed(12)
gapminder = px.data.gapminder().query("year==2007")
gapminder['counts'] = np.random.uniform(low=100000, high=200000, size=len(gapminder)).tolist()
fig = px.choropleth(gapminder, locations="iso_alpha",
color="counts",
hover_name="country", # column to add to hover information
color_continuous_scale=px.colors.sequential.Plasma)
fig.show()
Plot 2:
Let me know how this works out for you.
Edit: suggestion with your data:
If you have a dictionary with country names and counts, you can easily construct a dataframe from it and perform a left join to get this:
Plot 2:
Just make sure that your dictionary values are lists, and that the country names are spelled and formatted correctly.
Code 2:
import plotly.express as px
import numpy as np
import pandas as pd
np.random.seed(12)
gapminder = px.data.gapminder().query("year==2007")
#gapminder['counts'] = np.nan
d = {'United States': [139421],
'Canada': [21601],
'United Kingdom': [18314],
'Germany': [17024],
'Spain': [13096]}
yourdata = pd.DataFrame(d).T.reset_index()
yourdata.columns=['country', 'count']
df=pd.merge(gapminder, yourdata, how='left', on='country')
fig = px.choropleth(df, locations="iso_alpha",
color="count",
hover_name="country", # column to add to hover information
color_continuous_scale=px.colors.sequential.Plasma)
fig.show()
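The point about spelling and formatting matters for the dictionary in the question, whose keys are lowercase ('usa', 'united kingdom'). Below is a minimal sketch of normalizing such keys before building the dataframe. It assumes title-casing is enough for most names and uses a small hand-written alias table (ALIASES and normalize_country are hypothetical names, not part of plotly or pandas) for exceptions such as 'usa'; real data will likely need more entries.

```python
# Counts as in the question: lowercase country names as keys.
counts = {
    'usa': 139421,
    'canada': 21601,
    'united kingdom': 18314,
    'germany': 17024,
    'spain': 13096,
}

# Hypothetical alias table for names that title-casing alone cannot fix.
ALIASES = {'usa': 'United States'}


def normalize_country(name):
    """Return a display-style country name for a raw lowercase key."""
    key = name.strip().lower()
    return ALIASES.get(key, key.title())


# Wrap each count in a list so the result has the same shape as d in Code 2.
d = {normalize_country(name): [n] for name, n in counts.items()}
print(d)
```

The resulting d has the same keys and list-valued layout as the dictionary in Code 2, so it could be passed to pd.DataFrame(d) in the same way.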

Related

Plotly pie graph not showing all data

I have noticed that my go.Pie graph only shows 2 of the 3 values held in the dataframe column. I noticed this when creating a px.treemap referencing the exact same column in the dataframe and it shows all 3 values.
Below is my code for the pie chart and then the treemap
#docCategory count pie graph
valuesDocCat = df['docCategory'].value_counts()
figDocCat = go.Figure(data=[go.Pie(labels = df['docCategory'], values = valuesDocCat)])
figDocCat.update_traces(textposition = 'inside')
figDocCat.update_layout(uniformtext_minsize=14, uniformtext_mode='hide', title='Document Category breakdown')
#treeMap test graph
valuesTreemap = df['Kind'].value_counts()
figTreemap = px.treemap(df, path = ['docCategory', 'Kind'], color='docCategory')
figTreemap.update_traces(root_color='lightgrey')
figTreemap.update_layout(margin = dict(t=50, l=25, r=25, b=25))
You can see that my code above references df['docCategory'] in both instances, but as the images below show, the pie chart doesn't have the 'Unknown' field whereas the treemap does.
Any ideas on why? I have other pie charts that have more than 2 fields being referenced and no issues, it is just happening on this one.
Regarding your question "Plotly pie graph not showing all data": it is showing everything.
figDocCat = go.Figure(data=[go.Pie(labels = df['docCategory'], values = valuesDocCat)])
You are passing arrays of different lengths for labels and values. Plotly takes the first 3 items from labels, some of which are the same.
To be consistent, this line should be figDocCat = go.Figure(data=[go.Pie(labels=valuesDocCat.index, values=valuesDocCat)]), i.e. both labels and values come from the same pandas Series.
I have simulated a data frame to demonstrate.
Full solution:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
import numpy as np
cats = {
"Structured": ["Spreadsheet"],
"Unknown": ["System File", "Unrecognised"],
"Unstructured": ["Document", "Email", "Image", "Calendar Entry"],
}
df = pd.DataFrame(
[
{"docCategory": c, "Kind": np.random.choice(cats[c], 2)[0]}
for c in np.random.choice(list(cats.keys()), 25)
]
)
# docCategory count pie graph
valuesDocCat = df["docCategory"].value_counts()
figDocCat = go.Figure(data=[go.Pie(labels=valuesDocCat.index, values=valuesDocCat)])
figDocCat.update_traces(textposition="inside")
figDocCat.update_layout(
uniformtext_minsize=14, uniformtext_mode="hide", title="Document Category breakdown"
)
figDocCat.show()
# treeMap test graph
valuesTreemap = df["Kind"].value_counts()
figTreemap = px.treemap(df, path=["docCategory", "Kind"], color="docCategory")
figTreemap.update_traces(root_color="lightgrey")
figTreemap.update_layout(margin=dict(t=50, l=25, r=25, b=25))
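The length mismatch can be seen without plotly. The sketch below uses made-up category values (an assumption, standing in for df['docCategory']) and collections.Counter in place of value_counts() to show why labels and values should come from the same counted object.

```python
from collections import Counter

# Made-up stand-in for df['docCategory']: one entry per document.
doc_categories = [
    "Structured", "Structured", "Unstructured", "Unknown",
    "Unstructured", "Structured", "Unknown", "Unstructured",
]

# Counter plays the role of value_counts(): one entry per distinct category.
category_counts = Counter(doc_categories)

# Mismatched: one label per row, but one value per distinct category.
raw_labels = doc_categories
values = list(category_counts.values())
print(len(raw_labels), len(values))  # the two lengths differ

# Consistent: labels and values both come from the counted object.
labels = list(category_counts.keys())
print(list(zip(labels, values)))
```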

Python Plotly express two bubble markers on the same scatter_geo?

Hi is it possible to have two different bubble types representing two different values from the same dataframe?
Currently my code is as follows:
covid = pd.read_csv('covid_19_data.csv')
fig = px.scatter_geo(covid, locations="Country/Region", locationmode="country names",animation_frame = "ObservationDate", hover_name = "Country/Region", size = "Confirmed", size_max = 100, projection= "natural earth")
Which produces the following output:
Map output
Is it possible to get it to show two different bubbles, one for confirmed cases and another for tweets? The data frame I'm working with is shown here:
Dataframe
Sure! You can freely add another dataset from px.scatter_geo() on an existing px.scatter_geo() using:
fig=px.scatter_geo()
fig.add_traces(fig1._data)
fig.add_traces(fig2._data)
Where fig1._data comes from a setup similar to yours in:
fig = px.scatter_geo(covid, locations="Country/Region", locationmode="country names",animation_frame = "ObservationDate", hover_name = "Country/Region", size = "Confirmed", size_max = 100, projection= "natural earth")
Since you haven't provided a dataset I'll use px.data.gapminder() and use the columns pop and gdpPercap, where the color of the latter is set to 'rgba(255,0,0,0.1)' which is a transparent red:
Complete code:
import plotly.express as px
df = px.data.gapminder().query("year == 2007")
fig1 = px.scatter_geo(df, locations="iso_alpha",
size="pop", # size of markers, "pop" is one of the columns of gapminder
)
fig2 = px.scatter_geo(df, locations="iso_alpha",
size="gdpPercap", # size of markers, "gdpPercap" is one of the columns of gapminder
)
# fig1.add_traces(fig2._data)
# fig1.show()
fig=px.scatter_geo()
fig.add_traces(fig1._data)
fig.add_traces(fig2._data)
fig.data[1].marker.color = 'rgba(255,0,0,0.1)'
f = fig.full_figure_for_development(warn=False)
fig.show()
Please let me know how this works out for you.

Add dropdown menu to plotly express treemap

I am currently trying to add a dropdown menu to my treemap plot
The code I am using :
import pandas as pd
import plotly.express as px
fig = px.treemap(df,
path=['RuleName','RuleNumber','ParaInvolved',"CreationP","MAjP"],
color='Somme',
hover_data=["RuleDecision","RuleMAJ"],
color_continuous_scale='RdBu')
fig.show()
The problem I am facing is that my "RuleName" column has 151 different values (across 1300 rows in total), which is why I'm trying to add a button that lets me choose which RuleName value to plot the treemap for. For now I am using a crude method: filtering my dataframe by each RuleName value, which gives me 151 different treemaps. I haven't found a solution on this website or any other.
Thanks for your help
Here I'm basically using the same logic from this answer but I use px.treemap(...).data[0] to produce the traces instead of go.
import plotly.express as px
import plotly.graph_objects as go
df = px.data.tips()
# We have a list for every day
# In your case it will be groupby('RuleName')
# here for every element d
# d[0] is the name(key) and d[1] is the dataframe
dfs = list(df.groupby("day"))
first_title = dfs[0][0]
traces = []
buttons = []
for i,d in enumerate(dfs):
    visible = [False] * len(dfs)
    visible[i] = True
    name = d[0]
    traces.append(
        px.treemap(d[1],
                   path=['day', 'time', 'sex'],
                   values='total_bill').update_traces(visible=True if i==0 else False).data[0]
    )
    buttons.append(dict(label=name,
                        method="update",
                        args=[{"visible":visible},
                              {"title":f"{name}"}]))
updatemenus = [{'active':0, "buttons":buttons}]
fig = go.Figure(data=traces,
layout=dict(updatemenus=updatemenus))
fig.update_layout(title=first_title, title_x=0.5)
fig.show()
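The key part of the loop above is the visibility mask: each button carries a list with one True for its own trace and False for all the others. Here is a plotly-free sketch of just that step; build_buttons is a hypothetical helper name and the group names are made up.

```python
def build_buttons(names):
    """Build one 'update' button per name, each showing only its own trace."""
    buttons = []
    for i, name in enumerate(names):
        # One flag per trace; only the i-th trace is visible for this button.
        visible = [j == i for j in range(len(names))]
        buttons.append(
            dict(
                label=name,
                method="update",
                args=[{"visible": visible}, {"title": str(name)}],
            )
        )
    return buttons


buttons = build_buttons(["Fri", "Sat", "Sun"])
for button in buttons:
    print(button["label"], button["args"][0]["visible"])
```

A list built this way has the same shape as the buttons list in the answer, so it could be used in updatemenus in the same manner.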

plotly.graph_objects Choropleth Map subplots with shared colorscale

I'm new to python and plotly.graph_objects. I created some maps similar to the example found here: United States Choropleth Map
I'd like to combine the maps into one figure with a common color scale. I've looked at lots of examples of people using shared scales on subplots but they are using different graphing libraries. Is the functionality I want supported? If so, how is it done?
Here is the code I am using:
import plotly.graph_objects as go
import pandas as pd
df_shootings = pd.read_csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv')
state_count = df_shootings.groupby(['state', 'race']).size().reset_index(name='total')
races = pd.DataFrame({'W': 'White, non-Hispanic',
'B': 'Black, non-Hispanic',
'A': 'Asian',
'N': 'Native American',
'H': 'Hispanic'}, index=[0])
for race in races:
    result = state_count[['state', 'total']][state_count.race == race]
    fig = go.Figure(data=go.Choropleth(
        locations=result.state,
        z = result.total,
        locationmode = 'USA-states', # set of locations match entries in `locations`
        marker_line_color='white',
        colorbar_title = "Shooting deaths",
    ))
    fig.update_layout(
        title_text = races[race][0],
        geo_scope='usa', # limit map scope to USA
    )
    fig.data[0].hovertemplate = 'State: %{location}<br>Shooting deaths: %{z:.2f}<extra></extra>'
    fig.show()
This is what I would like to get:
Right now I get individual maps with their own color scale which is different for each map.
I was working on a similar project and got this far (see picture).
I couldn't find a way to name the subplots.
import plotly.graph_objects as go
import pandas as pd
df_shootings = pd.read_csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv')
state_count = df_shootings.groupby(['state', 'race']).size().reset_index(name='total')
races = pd.DataFrame({'W': 'White, non-Hispanic',
'B': 'Black, non-Hispanic',
'A': 'Asian',
'N': 'Native American',
'H': 'Hispanic'}, index=[0])
races
fig = go.Figure()
layout = dict(
title_text = "Fatal Police Shootings Data",
geo_scope='usa',
)
for index, race in enumerate(races):
    result = state_count[['state', 'total']][state_count.race == race]
    geo_key = 'geo'+str(index+1) if index != 0 else 'geo'
    fig.add_trace(
        go.Choropleth(
            locations=result.state,
            z = result.total,
            locationmode = 'USA-states', # set of locations match entries in `locations`
            marker_line_color='white',
            colorbar_title = "Shooting deaths",
            geo=geo_key,
            name=races[race].values[0],
            coloraxis = 'coloraxis',
        )
    )
    layout[geo_key] = dict(
        scope = 'usa',
        domain = dict( x = [], y = [] ),
    )
layout
z = 0
COLS = 3
ROWS = 2
for y in reversed(range(ROWS)):
    for x in range(COLS):
        geo_key = 'geo'+str(z+1) if z != 0 else 'geo'
        layout[geo_key]['domain']['x'] = [float(x)/float(COLS), float(x+1)/float(COLS)]
        layout[geo_key]['domain']['y'] = [float(y)/float(ROWS), float(y+1)/float(ROWS)]
        z=z+1
        if z > 4:
            break
fig.update_layout(layout)
fig.show()
I just solved this problem for my own project. I rewrote your code and split it up with comments.
No changes here, except adding a couple common packages and maybe the location of some code.
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
import numpy as np
df_shootings = pd.read_csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv')
state_count = df_shootings.groupby(['state', 'race']).size().reset_index(name='total')
races = pd.DataFrame({'W': 'White, non-Hispanic',
'B': 'Black, non-Hispanic',
'A': 'Asian',
'N': 'Native American',
'H': 'Hispanic'}, index=[0])
Declare a figure of subplots. You need to specify the number of rows and columns, you need to prepare the subplots to receive choropleths specifically, and we'll use your races dataframe to declare your subplot titles.
rows = 2
cols = 3
fig = make_subplots(
rows=rows, cols=cols,
specs = [[{'type': 'choropleth'} for c in np.arange(cols)] for r in np.arange(rows)],
subplot_titles = list(races.loc[0,:]))
I use enumerate in order to calculate the correct row and column of each subplot graph. I don't know if this is pythonic. The value i is just counting from 0 as we go through races. Each time through the loop we're going to add a trace to the appropriate row and column. In order to make sure that we use the same colorscale for each subplot, we set zmin and zmax to 0 and the maximum state_count['total'] respectively. The calculations for row and col use the count from enumerate to determine which row and column we graph this in.
for i, race in enumerate(races):
    result = state_count[['state', 'total']][state_count.race == race]
    fig.add_trace(go.Choropleth(
        locations=result.state,
        z = result.total,
        locationmode = 'USA-states', # set of locations match entries in `locations`
        marker_line_color='white',
        zmin = 0,
        zmax = max(state_count['total']),
        colorbar_title = "Shooting deaths",
    ), row = i//cols+1, col = i%cols+1)
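The row and column arithmetic can be checked in isolation. The sketch below, with a hypothetical helper grid_position, maps a zero-based counter to the one-based (row, col) pairs used in the loop; divmod is just a compact way of writing i//cols and i%cols together.

```python
def grid_position(i, cols):
    """Map a zero-based index to a one-based (row, col) in a row-major grid."""
    row, col = divmod(i, cols)
    return row + 1, col + 1


cols = 3
positions = [grid_position(i, cols) for i in range(5)]
print(positions)  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
```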
I made up what I thought was an appropriate title for the entire figure. The next line was a heck of a trick for me, but it is what sets each subplot to a map of the USA instead of the default map of the entire world. Without it, only the first subplot would use the USA map. Here is my best understanding of what is going on; I couldn't find or figure out a better solution. Each subplot has a kinda sorta location, and these are named, in order, geo, geo2, geo3, geo4, etc. Each one has to have its scope set to usa. 'Magic' underscores didn't work, so I had to put together a **kwargs (keyword argument) expansion equivalent to geo_scope = 'usa', geo2_scope = 'usa', geo3_scope = 'usa', geo4_scope = 'usa', geo5_scope = 'usa'. Granted, that wouldn't be so bad to type out, but in my project I had 50 subplots, so I coded it: I made a dictionary and then converted it to kwargs with **. The fact that the keyword argument list starts with geo instead of geo0 or even geo1 is why that line is as complicated as it is.
fig.update_layout(
title_text = 'Shooting Deaths by Race',
**{'geo' + str(i) + '_scope': 'usa' for i in [''] + np.arange(2,rows*cols+1).tolist()},
)
for index, trace in enumerate(fig.data):
    fig.data[index].hovertemplate = 'State: %{location}<br>Shooting deaths: %{z:.2f}<extra></extra>'
fig.show()
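The keyword-argument trick can be tried on its own. This sketch builds the same kind of dictionary with plain Python (no numpy), using a hypothetical helper geo_scope_kwargs; the first key has no number and the rest count up from 2.

```python
def geo_scope_kwargs(n_subplots, scope="usa"):
    """Build {'geo_scope': scope, 'geo2_scope': scope, ...} for n subplots."""
    suffixes = [""] + [str(i) for i in range(2, n_subplots + 1)]
    return {f"geo{suffix}_scope": scope for suffix in suffixes}


kwargs = geo_scope_kwargs(5)
print(list(kwargs))
# ['geo_scope', 'geo2_scope', 'geo3_scope', 'geo4_scope', 'geo5_scope']
```

A dictionary like this would be expanded with ** in the update_layout call, in the same way as the dictionary comprehension in the answer.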
And here's the result:
Shooting Deaths by Race

Plotly: How to prepare data visualization for below image using scatter bubble chart?

Here is my dataset after cleaning csv file
Here is output what I want
What I want is to display the years on the x axis and the column values on the y axis, and to display bubbles with different colors and sizes along with a play animation button.
I am new to data science. Can someone help me with how to achieve this?
Judging by your dataset and attached image, what you're asking for is something like this:
But I'm not sure that is what you actually want. You see, with your particular dataset there aren't enough dimensions to justify an animation, or even a bubble plot, because you're only looking at one value. So you end up showing the same value through the bubble sizes and on the y axis. And there's really no need to change your dataset, given that your provided screenshot is in fact your desired plot. But we can talk more about that if you'd like.
Since you haven't provided a sample dataset, I've used a dataset that's available through plotly express and reshaped it so that it matches your dataset:
Complete code:
# imports
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
import math
import numpy as np
# color cycle
colors = px.colors.qualitative.Alphabet*10
# sample data with similar structure as OP
df = px.data.gapminder().query("continent=='Americas'")
dfp=df.pivot(index='year', columns='country', values='pop')
dfp=dfp[['United States', 'Mexico', 'Argentina', 'Brazil', 'Colombia']]
dfp=dfp.sort_values(by='United States', ascending = False)
dfp=dfp.T
dfp.columns = [str(yr) for yr in dfp.columns]
dfp = dfp[dfp.columns[::-1]].T
# build figure and add traces
fig=go.Figure()
for col, country in enumerate(dfp):
    vals = dfp[country].values
    yVals = [col]*len(vals)
    fig.add_traces(go.Scatter(
        y=yVals,
        x=dfp.index,
        mode='markers',
        marker=dict(color=colors[col],
                    size=vals,
                    sizemode='area',
                    #sizeref=2.*max(vals)/(40.**2),
                    sizeref=2.*max(dfp.max())/(40.**2),
                    sizemin=4),
        name = country
    ))
# edit y tick layout
tickVals = np.arange(0, len(dfp.columns))
fig.update_layout(
yaxis = dict(tickmode = 'array',
tickvals = tickVals,
ticktext = dfp.columns.tolist()))
fig.show()
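The sizeref expression in the marker settings follows the scaling that plotly's documentation suggests for sizemode='area': sizeref = 2 * max(size values) / (desired maximum marker size in pixels) ** 2. Here is a small sketch with made-up numbers; area_sizeref is a hypothetical helper name, and 40 is the pixel size used in the code above.

```python
def area_sizeref(values, max_marker_px=40.0):
    """Return a sizeref that scales the largest value to about max_marker_px."""
    return 2.0 * max(values) / (max_marker_px ** 2)


# Made-up population-like values.
populations = [800, 1600, 400]
print(area_sizeref(populations))  # 2.0
```

Computing sizeref from the maximum over all countries, as the answer does with max(dfp.max()), keeps bubble sizes comparable across traces; the commented-out per-trace variant would rescale each country on its own.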
