"Complex" plotting with plottly.express

What I am trying to do is something like this using plotly.express:
It partly worked, but I would like each part of each bar to have a different color,
and I would like the values in the columns 'CBK_total' and 'Estorno_total' to be shown
on each individual part of each bar. I don't know if that is possible.
My code:
performance_mes_CBK = px.bar(
    dados,
    x='Ano_Mes_Solicitacao',
    y=['Prop_CBK', 'Prop_Estorno'],
    color='Regra',
    barmode='group',
    height=600,
    title='Performance Regras',
)

When asking questions, provide your data as marked-up text, not a screenshot. Running OCR on data is not straightforward.
This can be achieved by encoding opacity into rgba() colors, given that marker_color can be a single value or an array.
I have restructured the dataframe to stack the y-values into one column, with another column showing which measure each value belongs to.
You can then use for_each_trace() to update marker_color from the color plotly assigned and the measure column, which is included in customdata through hover_data.
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.colors
# simulate data frame... data as images in questions is unusable
s = 100
dados = pd.DataFrame(
    {
        "Ano_Mes_Solicitacao": np.random.choice(
            pd.date_range("1-oct-2021", freq="MS", periods=4), s
        ),
        "Prop_CBK": np.random.randint(20, 50, s),
        "Prop_Estorno": np.random.randint(20, 50, s),
        "Regra": np.random.choice([0.0, 1.0, 2.0], 100).astype(str),
    }
)
dados = dados.groupby(["Ano_Mes_Solicitacao", "Regra"], as_index=False).sum()
# OP code, from simulated dataframe
performance_mes_CBK = px.bar(
    dados,
    x="Ano_Mes_Solicitacao",
    y=["Prop_CBK", "Prop_Estorno"],
    color="Regra",
    barmode="group",
    height=600,
    title="Performance Regras",
)
performance_mes_CBK.show()
# restructure dataframe so that data is stacked
d2 = (
    dados.set_index(["Ano_Mes_Solicitacao", "Regra"])
    .stack()
    .to_frame()
    .reset_index()
    .rename(columns={"level_2": "column", 0: "value"})
)
# utility function to set transparency based on which measure is being displayed
def color_array(t):
    r, g, b = plotly.colors.hex_to_rgb(t.marker.color)
    return [
        f"rgba({r},{g},{b},{1 if v==t.customdata[0] else .6})"
        for v in t.customdata.T[0]
    ]
# use hover_data to create custom data so that measures are identifiable
# update marker_color to use transparency function
fig = px.bar(
    d2,
    x="Ano_Mes_Solicitacao",
    y="value",
    color="Regra",
    barmode="group",
    height=600,
    hover_data=["column"],
    title="Performance Regras",
).for_each_trace(lambda t: t.update(marker_color=color_array(t)))
fig.show()

Related

Adding counts to Plotly boxplots

I have a relatively simple issue, but cannot find any answer online that addresses it. Starting from a simple boxplot:
import plotly.express as px
df = px.data.iris()
fig = px.box(
df, x='species', y='sepal_length'
)
val_counts = df['species'].value_counts()
I would now like to add val_counts (in this dataset, 50 for each species) to the plots, preferably on either of the following places:
On top of the median line
On top of the max/min line
Inside the hoverbox
How can I achieve this?

The snippet below adds the count (50 for each species in this dataset) above the max line of each box using fig.add_annotation:
for s in df.species.unique():
    fig.add_annotation(
        x=s,
        y=df[df['species'] == s]['sepal_length'].max(),
        text=str(len(df[df['species'] == s]['species'])),
        yshift=10,
        showarrow=False,
    )
Plot:
Complete code:
import plotly.express as px
df = px.data.iris()
fig = px.box(
df, x='species', y='sepal_length'
)
for s in df.species.unique():
    fig.add_annotation(
        x=s,
        y=df[df['species'] == s]['sepal_length'].max(),
        text=str(len(df[df['species'] == s]['species'])),
        yshift=10,
        showarrow=False,
    )
f = fig.full_figure_for_development(warn=False)
fig.show()

This uses the same approach that I presented in this answer: Change Plotly Boxplot Hover Data
Calculate all the measures a box plot calculates, plus the additional measure you want: count.
Overlay bar traces over the box plot traces so the hover has all the measures required.
import plotly.express as px
df = px.data.iris()
# summarize data as per same dimensions as boxplot
df2 = (
    df.groupby("species")
    .agg(
        **{
            m if isinstance(m, str) else m[0]: (
                "sepal_length",
                m if isinstance(m, str) else m[1],
            )
            for m in [
                "max",
                ("q75", lambda s: s.quantile(0.75)),
                "median",
                ("q25", lambda s: s.quantile(0.25)),
                "min",
                "count",
            ]
        }
    )
    .reset_index()
    .assign(y=lambda d: d["max"] - d["min"])
)
# overlay bar over boxplot
px.bar(
    df2,
    x="species",
    y="y",
    base="min",
    hover_data={c: c not in ["y", "species"] for c in df2.columns},
    hover_name="species",
).update_traces(opacity=0.1).add_traces(px.box(df, x="species", y="sepal_length").data)
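The dictionary comprehension passed to agg() above is compact but dense. As a sketch of what it expands to, the same summary can be written with pandas named aggregation, one keyword per measure. The small data frame below is made up so the numbers are easy to check by hand; with the real data you would use px.data.iris() instead.

```python
import pandas as pd

# small made-up sample standing in for px.data.iris()
df = pd.DataFrame(
    {
        "species": ["a", "a", "a", "b", "b"],
        "sepal_length": [1.0, 2.0, 3.0, 4.0, 6.0],
    }
)

# same measures as the dict comprehension, spelled out one by one
df2 = (
    df.groupby("species")
    .agg(
        max=("sepal_length", "max"),
        q75=("sepal_length", lambda s: s.quantile(0.75)),
        median=("sepal_length", "median"),
        q25=("sepal_length", lambda s: s.quantile(0.25)),
        min=("sepal_length", "min"),
        count=("sepal_length", "count"),
    )
    .reset_index()
    .assign(y=lambda d: d["max"] - d["min"])  # bar height spanning min..max
)
print(df2)
```

The y column is the height of the overlaid bar, and base="min" in px.bar makes that bar start at the box's minimum, so the hover area covers the whole box.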

How to remove x-axis tick labels in a stacked-and-grouped bar chart using Plotly

I have a dataset:
I want to visualise this dataset in a stacked-and-grouped bar chart using Plotly. Unfortunately, Plotly does not have this type of chart yet, but there is a workaround that I tried.
My code:
import plotly.graph_objects as go
sv_clusters = ["cluster_1", "cluster_2", "cluster_3", "cluster_4", "cluster_5", "cluster_6", "cluster_7"]
sv_data = sv_data[["Population", "Sample_name"] + sv_clusters]
fig = go.Figure()
# one stacked trace per cluster; the two-level x gives a multicategorical axis
for r in sv_clusters:
    fig.add_trace(
        go.Bar(
            x=[sv_data.Population, sv_data.Sample_name],
            y=sv_data[r],
            name=r,
            marker=dict(line_width=0),
        )
    )
fig.update_layout(
    template="simple_white",
    xaxis=dict(title_text=None),
    yaxis=dict(title_text="fraction"),
    width=2000,
    bargap=0,
    title='Alles',
    barmode="stack",
)
Now my plot looks like this:
I want to remove the sample-name tick labels, since they clutter the chart, while keeping the population names. So I tried showticklabels=False, which resulted in this:
That removes all x-axis labels.
How do I remove only the sample name tick labels?

I have simulated data to make the code reproducible.
I found the same issue you noted.
I reverted to not using multicategorical axes (https://plotly.com/python/categorical-axes/#multicategorical-axes) and instead encoded both levels into a single x-axis value.
The tick arrays can then be set to define which ticks are labelled.
import requests
import pandas as pd
import numpy as np
import plotly.graph_objects as go
# generate some data... similar to what was presented
sv_data = (
    pd.DataFrame(
        {
            "Population": pd.json_normalize(
                requests.get("https://restcountries.eu/rest/v2/all").json()
            )["subregion"].unique()
        }
    )
    .loc[0:6,]
    .assign(
        Sample_name=lambda d: d["Population"]
        .str[:2]
        .str.upper()
        .apply(lambda s: [f"{s}{i}" for i in range(1500, 1550)])
    )
    .explode("Sample_name")
    .assign(**{f"cluster_{i}": lambda d: np.random.uniform(0, 1, len(d)) for i in range(1, 8)})
)
sv_clusters = ["cluster_1", "cluster_2", "cluster_3", "cluster_4", "cluster_5", "cluster_6", "cluster_7"]
sv_data = sv_data[["Population", "Sample_name"] + sv_clusters]
fig=go.Figure()
# instead of a multicategorical x, use a concatenated value for x; define text so hover works
for r in sv_clusters:
    fig.add_trace(
        go.Bar(
            x=sv_data.loc[:, ["Population", "Sample_name"]].apply(lambda row: " ".join(row), axis=1),
            y=sv_data[r],
            text=sv_data.loc[:, ["Population", "Sample_name"]].apply(lambda row: " ".join(row), axis=1),
            name=r,
            marker=dict(line_width=0),
        )
    )
# given simple x, set tick vals as wanted
fig.update_layout(
    template="simple_white",
    yaxis=dict(title_text="fraction"),
    width=2000,
    bargap=0,
    title='Alles',
    barmode="stack",
    xaxis={
        "tickmode": "array",
        "tickvals": sv_data.loc[:, ["Population", "Sample_name"]].apply(lambda r: " ".join(r), axis=1),
        "ticktext": np.where(sv_data["Population"] == sv_data["Population"].shift(), "", sv_data["Population"]),
    },
)
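The ticktext expression is the key step: comparing the Population column with a shifted copy of itself blanks every label that repeats the previous row, so only the first bar of each population keeps its name. A small standalone sketch with made-up population codes:

```python
import numpy as np
import pandas as pd

# made-up population labels, already sorted so equal values are adjacent
population = pd.Series(["AF", "AF", "AF", "EU", "EU", "AS"])

# keep a label only where it differs from the previous row
ticktext = np.where(population == population.shift(), "", population)
print(list(ticktext))
```

This relies on the rows being ordered so that all samples of a population are adjacent. If they are not, the same population name appears once per run of consecutive rows.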

Adding outliers to plotly boxplot properly

I am building a series of boxplots from pre-calculated data using plotly graph_objects. My problem comes when I need to pass the list of outliers for each plot; I did not find a proper way of sending them.
My code looks like this:
from plotly import graph_objects as go
fig = go.Figure()
fig.add_trace(go.Box(x = df.mes, y = df.json_agg, mean = df.media, q1 = df.p25, median = df.mediana, q3 = df.p75, lowerfence = df.li, upperfence = df.ls))
fig.update_xaxes(
dtick="M1",
tickformat="%m-%Y",
ticklabelmode="period")
fig.show()
And my final plot:
What I need is for the outliers to be shown properly above or below each boxplot, not side by side.
Thanks, you all help a lot.

I have simulated data to make your code sample work.
Whenever I try passing q3, the plot fails to build.
The parameter you need to show outliers is boxpoints: https://plotly.com/python/box-plots/#styling-outliers
import plotly.graph_objects as go
import pandas as pd
import numpy as np
S = 1000
df = pd.DataFrame(
    {
        "mes": np.random.choice(pd.date_range("1-jan-2021", freq="M", periods=10), S),
        "json_agg": np.random.uniform(-0.4, 0.5, S) * np.random.uniform(0.1, 1, S),
    }
)
df = (
    df.groupby("mes", as_index=False)
    .apply(
        lambda d: d.assign(
            media=d["json_agg"].mean(),
            p25=np.percentile(d["json_agg"], 25),
            p75=np.percentile(d["json_agg"], 75),
            mediana=np.percentile(d["json_agg"], 50),
            li=np.percentile(d["json_agg"], 20),
            ls=np.percentile(d["json_agg"], 80),
        )
    )
    .sort_values("mes")
)
fig = go.Figure()
fig.add_trace(
    go.Box(
        x=df.mes,
        y=df.json_agg,
        mean=df.media,
        q1=df.p25,
        # q3=df.p75,
        median=df.mediana,
        lowerfence=df.li,
        upperfence=df.ls,
        boxpoints="outliers",
    )
)
# fig.update_xaxes(dtick="M1", tickformat="%m-%Y", ticklabelmode="period")

Python Plotly express two bubble markers on the same scatter_geo?

Hi is it possible to have two different bubble types representing two different values from the same dataframe?
Currently my code is as follows:
covid = pd.read_csv('covid_19_data.csv')
fig = px.scatter_geo(covid, locations="Country/Region", locationmode="country names",animation_frame = "ObservationDate", hover_name = "Country/Region", size = "Confirmed", size_max = 100, projection= "natural earth")
Which produces the following output:
Map output
Is it possible to get it to show two different bubbles, one for confirmed cases and another for tweets? The data frame I'm working with is shown here:
Dataframe

Sure! You can add the traces from one px.scatter_geo() figure to another figure using:
fig=px.scatter_geo()
fig.add_traces(fig1._data)
fig.add_traces(fig2._data)
Where fig1._data comes from a setup similar to yours in:
fig = px.scatter_geo(covid, locations="Country/Region", locationmode="country names",animation_frame = "ObservationDate", hover_name = "Country/Region", size = "Confirmed", size_max = 100, projection= "natural earth")
Since you haven't provided a dataset I'll use px.data.gapminder() and use the columns pop and gdpPercap, where the color of the latter is set to 'rgba(255,0,0,0.1)' which is a transparent red:
Complete code:
import plotly.express as px
df = px.data.gapminder().query("year == 2007")
fig1 = px.scatter_geo(
    df,
    locations="iso_alpha",
    size="pop",  # size of markers, "pop" is one of the columns of gapminder
)
fig2 = px.scatter_geo(
    df,
    locations="iso_alpha",
    size="gdpPercap",  # size of markers, "gdpPercap" is another column of gapminder
)
# fig1.add_traces(fig2._data)
# fig1.show()
fig=px.scatter_geo()
fig.add_traces(fig1._data)
fig.add_traces(fig2._data)
fig.data[1].marker.color = 'rgba(255,0,0,0.1)'
f = fig.full_figure_for_development(warn=False)
fig.show()
Please let me know how this works out for you.

python plotly: box plot using column in dataframe

I am enjoying using plotly and wanted to plot boxplots for my data.
From their website, I do the following:
import plotly.plotly as py
import plotly.graph_objs as go
import numpy as np
y0 = np.random.randn(50)
y1 = np.random.randn(50)+1
trace0 = go.Box(
y=y0,
name = 'Sample A',
marker = dict(
color = 'rgb(214, 12, 140)',
)
)
trace1 = go.Box(
y=y1,
name = 'Sample B',
marker = dict(
color = 'rgb(0, 128, 128)',
)
)
data = [trace0, trace1]
py.iplot(data)
The challenge is that the total number of traces is not known in advance. For example:
titanic = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv")
I would like to plot boxplots of the 'fare' column, one per value of the 'embarked' column. Since the number of unique values in 'embarked' is unknown, I do not want to hardcode it.
Does anyone know how I can do this properly in plotly?
Thank you!

You could loop over your unique values in embarked and add a trace for each one. In this case there is also nan which needs separate treatment.
import plotly
plotly.offline.init_notebook_mode()
import pandas as pd
import numpy as np
titanic = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv")
traces = list()
# one box trace per unique value of embarked; NaN needs its own filter
for embarked in titanic.embarked.unique():
    if str(embarked) == 'nan':
        traces.append(
            plotly.graph_objs.Box(
                y=titanic[pd.isnull(titanic.embarked)].fare,
                name=str(embarked),
            )
        )
    else:
        traces.append(
            plotly.graph_objs.Box(
                y=titanic[titanic.embarked == embarked].fare,
                name=embarked,
            )
        )
plotly.offline.iplot(traces)
