How to avoid None when plotting sunburst chart using Plotly? - python

I am trying to create a sunburst chart using Plotly. My data consists of several different types of journeys with varying numbers of steps. Some journeys are 10 steps; others are 100. For simplicity, let us consider only 3 steps.
Here is the data -
import pandas as pd
import plotly.express as px
import numpy as np
data = {
'step0' :['home', 'home','product2','product1','home'],
'step1' : ['product1','product1', None, 'product2',None] ,
'step2' : ['product2','checkout', None, None,None] ,
'total_sales' : [50,20,10,0,7]
}
data_df = pd.DataFrame(data)
data_df.head()
I now try to plot these steps in sunburst chart. Because some journeys can be short, the subsequent steps are marked as None in those cases.
data_df = data_df.fillna('end')
plotting code -
fig = px.sunburst(data_df, path=['step0','step1','step2'], values='total_sales', height = 400)
fig.show()
As you can see above, the None have been filled by end because Plotly does not like NAs. But then I do not want to show the end in the sunburst chart.
I want to re-create something like this -
https://bl.ocks.org/kerryrodden/7090426
How can I make this work in Plotly?

One workaround that builds on what you already have is to fillna with a single-space string " " instead, so the word "end" doesn't appear on the chart. You can then loop through the marker colors and labels in fig.data[0] and set the color to transparent, "rgba(0,0,0,0)", for every label that matches that single space.
The one drawback is that hovering will still show information for the sectors this workaround hides, but the static image will look correct.
For example:
import pandas as pd
import plotly.express as px
import numpy as np
data = {
'step0' :['home', 'home','product2','product1','home'],
'step1' : ['product1','product1', None, 'product2',None] ,
'step2' : ['product2','checkout', None, None,None] ,
'total_sales' : [50,20,10,0,7]
}
data_df = pd.DataFrame(data)
# data_df.head()
data_df = data_df.fillna(" ")
fig = px.sunburst(
data_df,
path=['step0','step1','step2'],
values='total_sales',
color=["red","orange","yellow","green","blue"],
height = 400
)
## set marker colors whose labels are " " to transparent
marker_colors = list(fig.data[0].marker['colors'])
marker_labels = list(fig.data[0]['labels'])
new_marker_colors = ["rgba(0,0,0,0)" if label==" " else color for (color, label) in zip(marker_colors, marker_labels)]
marker_colors = new_marker_colors
fig.data[0].marker['colors'] = marker_colors
fig.show()

Related

Plotly: different colors in one line

I am trying to distinguish weekends from weekdays by either 1) shading the region, 2) coloring points with different colors, or 3) marking the x-axis labels differently for weekends.
Here I am trying the 2nd option: coloring data points for weekends differently. I first created an additional column (IsWeekend) to distinguish weekends from weekdays. However, the result is not drawn as one line; instead, Plotly draws two separate lines with different colors. I would like a single line in which the weekend values have a different color.
Here’s my code for reproducible data:
import numpy as np
import pandas as pd
from datetime import datetime
import plotly.express as px
np.random.seed(42)
rng = pd.date_range('2022-04-10', periods=21, freq='D')
practice_df = pd.DataFrame({ 'Date': rng, 'Val' : np.random.randn(len(rng))})
practice_df = practice_df.set_index('Date')
weekend_list = []
for i in range(len(practice_df)):
    # weekday() returns 5 for Saturday and 6 for Sunday
    if practice_df.index[i].weekday() > 4:
        weekend_list.append(True)
    else:
        weekend_list.append(False)
practice_df['IsWeekend'] = weekend_list
fig = px.line(practice_df,
              x=practice_df.index, y='Val',
              color='IsWeekend',
              markers=True)
fig.show()
What I want to do would look something like this but coloring data points/line for weekends differently.
Edit:
Thanks so much to @Derek_O, I was able to color the weekends with my original dataset. I also want the Friday-Saturday segment to be colored as weekend, so I set practice_df.index[i].weekday() >= 4 instead of practice_df.index[i].weekday() > 4.
But would it be possible for the Friday point itself to keep the weekday color?
Also, is it possible to connect the points with straight lines rather than stairs?
Otherwise, it would also work if we could shade the weekend region like the image at the bottom.
Borrowing from @Rob Raymond's answer here, we can loop through practice_df two elements at a time, adding a trace to the fig for each iteration of the loop.
We also only want to show the legend category the first time it occurs (so that the legend entries only show each category like True or False once), which is why I've created a new column called "showlegend" that determines whether the legend is shown or not.
import numpy as np
import pandas as pd
from datetime import datetime
import plotly.express as px
import plotly.graph_objects as go
np.random.seed(42)
rng = pd.date_range('2022-04-10', periods=21, freq='D')
practice_df = pd.DataFrame({ 'Date': rng, 'Val' : np.random.randn(len(rng))})
practice_df = practice_df.set_index('Date')
weekend_list = []
for i in range(len(practice_df)):
    if practice_df.index[i].weekday() > 4:
        weekend_list.append(True)
    else:
        weekend_list.append(False)
practice_df['IsWeekend'] = weekend_list
weekend_color_map = {True:0, False:1}
weekend_name_map = {True:"True", False:"False"}
practice_df['color'] = practice_df['IsWeekend'].map(weekend_color_map)
practice_df['name'] = practice_df['IsWeekend'].map(weekend_name_map)
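## the color column maps weekend to 0 and non-weekend to 1, so idxmin()/idxmax()
## return the index labels of the first weekend and first non-weekend rows;
## get_loc() converts those labels into the integer positions that .iat expects
first_weekend_idx = practice_df.index.get_loc(practice_df['color'].idxmin())
first_nonweekend_idx = practice_df.index.get_loc(practice_df['color'].idxmax())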
practice_df["showlegend"] = False
showlegendIdx = practice_df.columns.get_indexer(["showlegend"])[0]
practice_df.iat[first_weekend_idx, showlegendIdx] = True
practice_df.iat[first_nonweekend_idx, showlegendIdx] = True
practice_df["showlegend"] = practice_df["showlegend"].astype(object)
fig = go.Figure(
    [
        go.Scatter(
            # each trace is one segment: the point at position tn and the next one
            x=practice_df.index[tn : tn + 2],
            y=practice_df['Val'].iloc[tn : tn + 2],
            mode='lines+markers',
            # line_shape="hv",
            line_color=px.colors.qualitative.Plotly[practice_df['color'].iloc[tn]],
            name=practice_df['name'].iloc[tn],
            legendgroup=practice_df['name'].iloc[tn],
            showlegend=practice_df['showlegend'].iloc[tn],
        )
        for tn in range(len(practice_df))
    ]
)
fig.update_layout(legend_title_text='Is Weekend')
fig.show()

How plot points based on categorical variable in plotly

I am using Plotly for visualization. I want to make a plot and color the points based on a categorical variable.
fig = go.Figure()
fig.add_trace(go.Scatter(x=df.Predicted, y=df.Predicted,colors='Category',mode='markers',
))
fig.add_trace(go.Scatter(x=df.Predicted, y=df.real , colors='Category'
))
fig.show()
where Category is a column in my dataframe. How can I make this kind of graph?
You have implied a data frame structure, which I have simulated.
It's simpler to use the higher-level Plotly Express API than graph objects.
I have used px.scatter() and px.line() to generate the two sets of traces defined in your question. I have also renamed the traces from the second call so the legend is clear, and drawn them as lines.
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.DataFrame(
    {
        "Predicted": np.sort(np.random.uniform(3, 15, 100)),
        "real": np.sort(np.random.uniform(3, 15, 100)),
        "Category": np.random.choice(list("ABCD"), 100),
    }
)
px.scatter(df, x="Predicted", y="Predicted", color="Category").add_traces(
    px.line(df, x="Predicted", y="real", color="Category")
    .for_each_trace(
        lambda t: t.update(name="real " + t.name)
    )  # make it clear in legend this is second set of traces
    .data
)

Plotly pie graph not showing all data

I have noticed that my go.Pie graph only shows 2 of the 3 values held in the dataframe column. I noticed this when creating a px.treemap referencing the exact same column in the dataframe and it shows all 3 values.
Below is my code for the pie chart and then the treemap
#docCategory count pie graph
valuesDocCat = df['docCategory'].value_counts()
figDocCat = go.Figure(data=[go.Pie(labels = df['docCategory'], values = valuesDocCat)])
figDocCat.update_traces(textposition = 'inside')
figDocCat.update_layout(uniformtext_minsize=14, uniformtext_mode='hide', title='Document Category breakdown')
#treeMap test graph
valuesTreemap = df['Kind'].value_counts()
figTreemap = px.treemap(df, path = ['docCategory', 'Kind'], color='docCategory')
figTreemap.update_traces(root_color='lightgrey')
figTreemap.update_layout(margin = dict(t=50, l=25, r=25, b=25))
My code above references df['docCategory'] in both instances, but as you can see in the images below, the pie chart doesn't have the 'Unknown' field whereas the treemap does.
Any ideas why? I have other pie charts that reference more than 2 fields without issues; it only happens on this one.
Regarding your question title, "Plotly pie graph not showing all data": it is showing everything you passed to it.
figDocCat = go.Figure(data=[go.Pie(labels = df['docCategory'], values = valuesDocCat)])
You are passing arrays of different lengths for labels and values. Plotly takes the first 3 items from labels, some of which are the same.
To be consistent, this line should be figDocCat = go.Figure(data=[go.Pie(labels=valuesDocCat.index, values=valuesDocCat)]), i.e. both labels and values come from the same pandas Series.
I have simulated a data frame to demonstrate.
Full solution:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
import numpy as np
cats = {
    "Structured": ["Spreadsheet"],
    "Unknown": ["System File", "Unrecognised"],
    "Unstructured": ["Document", "Email", "Image", "Calendar Entry"],
}
df = pd.DataFrame(
    [
        {"docCategory": c, "Kind": np.random.choice(cats[c], 2)[0]}
        for c in np.random.choice(list(cats.keys()), 25)
    ]
)
# docCategory count pie graph
valuesDocCat = df["docCategory"].value_counts()
figDocCat = go.Figure(data=[go.Pie(labels=valuesDocCat.index, values=valuesDocCat)])
figDocCat.update_traces(textposition="inside")
figDocCat.update_layout(
    uniformtext_minsize=14, uniformtext_mode="hide", title="Document Category breakdown"
)
figDocCat.show()
# treeMap test graph
valuesTreemap = df["Kind"].value_counts()
figTreemap = px.treemap(df, path=["docCategory", "Kind"], color="docCategory")
figTreemap.update_traces(root_color="lightgrey")
figTreemap.update_layout(margin=dict(t=50, l=25, r=25, b=25))
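The length mismatch can be seen without drawing anything. The sketch below uses a tiny made-up column (six hypothetical rows, not the asker's data) to show that the raw column has one entry per row while value_counts() has one entry per category, and that the first three raw labels need not cover all three categories.

```python
import pandas as pd

# made-up rows, chosen so that the first three labels contain a duplicate
df = pd.DataFrame(
    {
        "docCategory": [
            "Structured", "Structured", "Unknown",
            "Unstructured", "Unstructured", "Unstructured",
        ]
    }
)

counts = df["docCategory"].value_counts()

raw_labels = df["docCategory"].tolist()   # one label per row: 6 items
count_values = counts.tolist()            # one count per category: 3 items

# pairing by position, as the answer describes, only uses the first 3 labels
labels_actually_used = raw_labels[: len(count_values)]

# labels and values that belong together come from the same Series
aligned_labels = counts.index.tolist()
```

Here labels_actually_used contains only two distinct categories, which matches the symptom in the question, while aligned_labels lists every category exactly once in the same order as the counts.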

How to get standard notation (rather than scientific) when hovering over pie chart in Plotly

I have a pie chart that displays worldwide movie sales by rating. When I hover over the chart, the worldwide sales are displayed in scientific notation. How do I fix this so that worldwide sales are shown in standard notation instead? I would appreciate it if anyone has a solution to this in express or graph objects (or both).
Thank you.
# formatting and importing data
import pandas as pd
movie_dataframe = pd.read_csv("https://raw.githubusercontent.com/NicholasTuttle/public_datasets/main/movie_data.csv") # importing dataset to dataframe
movie_dataframe['worldwide_gross'] = movie_dataframe['worldwide_gross'].str.replace(',', '', regex=True) # removing commas from column
movie_dataframe['worldwide_gross'] = movie_dataframe['worldwide_gross'].str.replace('$', '', regex=False) # removing dollar signs from column (regex=False so '$' is a literal, not an end-of-string anchor)
movie_dataframe['worldwide_gross'] = movie_dataframe['worldwide_gross'].astype(float)
# narrowing dataframe to specific columns
movies_df = movie_dataframe.loc[:, ['title', 'worldwide_gross', 'rating', 'rt_score', 'rt_freshness']]
# plotly express
import plotly.express as px
fig = px.pie(movies_df,
             values= movies_df['worldwide_gross'],
             names= movies_df['rating'],
             )
fig.show()
# plotly graph objects
import plotly.graph_objects as go
fig = go.Figure(go.Pie(
    values = movies_df['worldwide_gross'],
    labels = movies_df['rating']
))
fig.show()
Have a look here: https://plotly.com/python/hover-text-and-formatting/#disabling-or-customizing-hover-of-columns-in-plotly-express
Basically, you pass hover_data a dictionary that maps a column name to a format string. The format string follows d3-format syntax.
import plotly.express as px
fig = px.pie(
    movies_df, values= movies_df['worldwide_gross'], names= movies_df['rating'],
    hover_data={
        "worldwide_gross": ':.d',
        # "worldwide_gross": ':.2f', # float
    }
)
fig.show()
For the graph objects API you need to write a hovertemplate:
https://plotly.com/python/reference/pie/#pie-hovertemplate
import plotly.graph_objects as go
fig = go.Figure(go.Pie(
    values = movies_df['worldwide_gross'],
    labels = movies_df['rating'],
    hovertemplate='Rating: %{label}<br />World wide gross: %{value:d}<extra></extra>'
))
fig.show()

Plotly subplot represent same y-axis name with same color and single legend

I am trying to create a plot for two categories in a subplot grid. The 1st column represents category FF and the 2nd column represents category RF.
The x-axis is always time and the y-axis is each of the remaining columns. In other words, it is a plot of one column vs. the rest.
The 1st and 2nd categories always have the same column names; only the values differ.
I tried to generate the plot in a for loop, but the problem is that Plotly treats each trace as distinct and therefore draws the lines for the same y-axis name in different colors. As a consequence, a separate legend entry is also created for each.
For example, in the first row (Time vs Price2010) I want both the FF and RF subplots to be drawn in the same color (say blue) with a single legend entry.
I tried adding legendgroup in go.Scatter but it doesn't help.
import pandas as pd
from pandas import DataFrame
from plotly import tools
from plotly.offline import init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from plotly.subplots import make_subplots
CarA = {'Time': [10,20,30,40 ],
'Price2010': [22000,26000,27000,35000],
'Price2011': [23000,27000,28000,36000],
'Price2012': [24000,28000,29000,37000],
'Price2013': [25000,29000,30000,38000],
'Price2014': [26000,30000,31000,39000],
'Price2015': [27000,31000,32000,40000],
'Price2016': [28000,32000,33000,41000]
}
ff = DataFrame(CarA)
CarB = {'Time': [8,18,28,38 ],
'Price2010': [19000,20000,21000,22000],
'Price2011': [20000,21000,22000,23000],
'Price2012': [21000,22000,23000,24000],
'Price2013': [22000,23000,24000,25000],
'Price2014': [23000,24000,25000,26000],
'Price2015': [24000,25000,26000,27000],
'Price2016': [25000,26000,27000,28000]
}
rf = DataFrame(CarB)
Type = {
'FF' : ff,
'RF' : rf
}
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.3/len(ff.columns))
labels = ff.columns[1:]
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params)
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
fig.update_layout(height=2024, width=1024,title_text="Car Analysis")
iplot(fig)
It might not be a good solution, but so far this hack is all I have been able to come up with.
fig = make_subplots(rows=len(ff.columns), cols=len(Type), subplot_titles=('FF','RF'),vertical_spacing=0.2/len(ff.columns))
labels = ff.columns[1:]
colors = [ '#a60000', '#f29979', '#d98d36', '#735c00', '#778c23', '#185900', '#00a66f']
legend = True
for indexC, (cat, values) in enumerate(Type.items()):
    for indexP, params in enumerate(values.columns[1:]):
        # same color per column name in both categories
        trace = go.Scatter(x=values.iloc[:,0], y=values[params], mode='lines', name=params,legendgroup=params, showlegend=legend,
                           marker=dict(color=colors[indexP]))
        fig.append_trace(trace,indexP+1, indexC+1)
        fig.update_xaxes(title_text=values.columns[0],row=indexP+1, col=indexC+1)
        fig.update_yaxes(title_text=params,row=indexP+1, col=indexC+1)
    # only the traces of the first category get legend entries
    legend = False
fig.update_layout(height=1068, width=1024,title_text="Car Analysis")
If you combine your data into a single tidy data frame, you can use a simple Plotly Express call to make the chart: px.line() with color, facet_row and facet_col
