python plotly: box plot using column in dataframe - python

I am enjoying using plotly and wanted to plot boxplots for my data.
From their website, I do the following:
import plotly.plotly as py
import plotly.graph_objs as go
import numpy as np
y0 = np.random.randn(50)
y1 = np.random.randn(50)+1
trace0 = go.Box(
    y=y0,
    name = 'Sample A',
    marker = dict(
        color = 'rgb(214, 12, 140)',
    )
)
trace1 = go.Box(
    y=y1,
    name = 'Sample B',
    marker = dict(
        color = 'rgb(0, 128, 128)',
    )
)
data = [trace0, trace1]
py.iplot(data)
The challenge is that I do not know the total number of traces in advance. For example:
titanic = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv")
I would like to plot a boxplot of the 'fare' column for each value of the 'embarked' column. Since the number of unique values in 'embarked' is not known in advance, I do not want to hardcode the traces.
Does anyone know how I can do this properly in plotly?
Thank you!

You could loop over the unique values in embarked and add a trace for each one. The column also contains NaN, which needs separate treatment because NaN never compares equal to anything, including itself.
import plotly
import pandas as pd
plotly.offline.init_notebook_mode()
titanic = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv")
traces = list()
for embarked in titanic.embarked.unique():
    if pd.isnull(embarked):
        # an equality test cannot match NaN, so select the missing rows explicitly
        traces.append(plotly.graph_objs.Box(
            y=titanic[pd.isnull(titanic.embarked)].fare,
            name=str(embarked)
        ))
    else:
        traces.append(plotly.graph_objs.Box(
            y=titanic[titanic.embarked == embarked].fare,
            name=embarked
        ))
plotly.offline.iplot(traces)

Related

How to add labels to plotly Box chart like Scatter chart?

I couldn't find a way to add text labels to a plotly/dash box plot the way you can add them to a scatter plot. In the example below, the scatter plot uses x=qty and y=price, and the salesperson is shown when the cursor hovers over a marker. For this I use the 'text' argument.
In the second example, a box plot with x=date and y=price, I want to add the salesperson in the same way. It would be very useful for outliers, to see immediately who the salesperson for that purchase was. I looked in the documentation but found no clue. I assume it's not possible, but still decided to try my luck here.
scatterplot:
import pandas as pd
import plotly.offline as pyo
import plotly.graph_objs as go
purchase={'date':['11/03/2021','12/03/2021','14/03/2021','11/03/2021'],
          'price':[300, 400,200, 200],
          'currency':['eur', 'usd','usd','usd'],
          'qty':[200, 300, 400, 500],
          'salesman':['AC', 'BC', "CC", 'DC']}
pur=pd.DataFrame(purchase)
pur
data = [go.Scatter(
    x = pur['qty'],
    y = pur['price'],
    mode = 'markers',
    text=pur['salesman'],
    marker = dict(
        size = 12,
        color = 'rgb(51,204,153)',
        symbol = 'pentagon',
        line = dict(
            width = 2,
        )
    )
)]
layout = go.Layout(
    title = 'Random Data Scatterplot',
    xaxis = dict(title = 'Some random x-values'),
    yaxis = dict(title = 'Some random y-values'),
    hovermode ='closest'
)
fig = go.Figure(data=data, layout=layout)
fig.show()
boxplot:
import plotly.offline as pyo
import plotly.graph_objs as go
x = pur['date']
y = pur['price']
data = [
    go.Box(
        y=y,
        x=x,
        text=pur['salesman']
    )
]
layout = go.Layout(
    title = 'box_plot'
)
fig = go.Figure(data=data, layout=layout)
fig.show()
The data you currently have is not suitable for boxplot. If you try to plot a boxplot with your data, the list [300, 400,200, 200] is used only once for the first date. For the other dates, there is no data.
I will show a simpler example with my own data.
dataset.csv
salesman,sales
alan,1.8
bary,2.3
copa,4.2
dac,1.19
eila,2.3
foo,2.5
gary,0.1
holland,10
code
import plotly.graph_objs as go
import pandas as pd
import plotly.io as pio
pio.renderers.default = 'browser'
# read the dataset.csv file shown above
df = pd.read_csv("dataset.csv")
fig = go.Figure()
fig.add_trace(go.Box(
    y=df["sales"],
    name='12/12/22',
    customdata=df["salesman"],
    hovertemplate='<b>sales: %{y}</b><br>salesperson: %{customdata}'
))
fig.show()
Diagram
As you can see, the name of the outlier salesperson is displayed on the hover label.

Adding outliers to plotly boxplot properly

I am building a series of boxplots from precalculated data using plotly graph_objects. My problem comes when I need to send the list of outliers for each plot. I did not find a proper way of sending them.
My code looks like this:
from plotly import graph_objects as go
fig = go.Figure()
fig.add_trace(go.Box(x = df.mes, y = df.json_agg, mean = df.media, q1 = df.p25, median = df.mediana, q3 = df.p75, lowerfence = df.li, upperfence = df.ls))
fig.update_xaxes(
dtick="M1",
tickformat="%m-%Y",
ticklabelmode="period")
fig.show()
And my final plot:
What I need is the outliers properly shown at the top or bottom of each boxplot, not side by side.
Thanks, you all help a lot.
I have simulated data to make your code sample work.
Whenever I try passing q3, the plot fails to build, so it is commented out below.
The parameter you need to show outliers is boxpoints: https://plotly.com/python/box-plots/#styling-outliers
import plotly.graph_objects as go
import pandas as pd
import numpy as np
S = 1000
df = pd.DataFrame(
    {
        "mes": np.random.choice(pd.date_range("1-jan-2021", freq="M", periods=10), S),
        "json_agg": np.random.uniform(-0.4, 0.5, S) * np.random.uniform(0.1, 1, S),
    }
)
df = (
    df.groupby("mes", as_index=False)
    .apply(
        lambda d: d.assign(
            media=d["json_agg"].mean(),
            p25=np.percentile(d["json_agg"], 25),
            p75=np.percentile(d["json_agg"], 75),
            mediana=np.percentile(d["json_agg"], 50),
            li=np.percentile(d["json_agg"], 20),
            ls=np.percentile(d["json_agg"], 80),
        )
    )
    .sort_values("mes")
)
fig = go.Figure()
fig.add_trace(
    go.Box(
        x=df.mes,
        y=df.json_agg,
        mean=df.media,
        q1=df.p25,
        # q3=df.p75,
        median=df.mediana,
        lowerfence=df.li,
        upperfence=df.ls,
        boxpoints="outliers",
    )
)
# fig.update_xaxes(dtick="M1", tickformat="%m-%Y", ticklabelmode="period")

How to get standard notation (rather than scientific) when hovering over pie chart in Plotly

I have a pie chart that displays worldwide movie sales by rating. When I hover over the chart, the worldwide sales are displayed in scientific notation. How do I fix this so that worldwide sales are shown in standard notation instead? I would appreciate it if anyone has a solution to this in express or graph objects (or both).
Thank you.
# formatting and importing data
import pandas as pd
movie_dataframe = pd.read_csv("https://raw.githubusercontent.com/NicholasTuttle/public_datasets/main/movie_data.csv") # importing dataset to dataframe
movie_dataframe['worldwide_gross'] = movie_dataframe['worldwide_gross'].str.replace(',', '', regex=False) # removing commas from column
movie_dataframe['worldwide_gross'] = movie_dataframe['worldwide_gross'].str.replace('$', '', regex=False) # removing dollar signs from column ('$' as a regex would match end of string)
movie_dataframe['worldwide_gross'] = movie_dataframe['worldwide_gross'].astype(float)
# narrowing dataframe to specific columns
movies_df = movie_dataframe.loc[:, ['title', 'worldwide_gross', 'rating', 'rt_score', 'rt_freshness']]
# plotly express
import plotly.express as px
fig = px.pie(movies_df,
             values= movies_df['worldwide_gross'],
             names= movies_df['rating'],
             )
fig.show()
# plotly graph objects
import plotly.graph_objects as go
fig = go.Figure(go.Pie(
    values = movies_df['worldwide_gross'],
    labels = movies_df['rating']
))
fig.show()
Have a look here: https://plotly.com/python/hover-text-and-formatting/#disabling-or-customizing-hover-of-columns-in-plotly-express
Basically, you pass hover_data a dictionary that maps a column name to a format string. The format string follows d3-format syntax.
import plotly.express as px
fig = px.pie(
    movies_df, values= movies_df['worldwide_gross'], names= movies_df['rating'],
    hover_data={
        "worldwide_gross": ':.d',
        # "worldwide_gross": ':.2f', # float
    }
)
fig.show()
For the graph objects API you need to write a hovertemplate:
https://plotly.com/python/reference/pie/#pie-hovertemplate
import plotly.graph_objects as go
fig = go.Figure(go.Pie(
    values = movies_df['worldwide_gross'],
    labels = movies_df['rating'],
    hovertemplate='Rating: %{label}<br />World wide gross: %{value:d}<extra></extra>'
))
fig.show()

Set up multiple subplots with moving averages using cufflinks and plotly offline

I'm trying to select 4 different product prices from my dataframe and plot their moving averages as a (2,2) grid of subplots using plotly cufflinks. I would appreciate any guidance on this.
I tried plotting the prices as shown below.
I came across cufflinks technical analysis, which would let me plot moving averages in a cleaner way, but I'm not sure how to apply it yet.
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
from plotly import tools
import plotly.graph_objs as go
trace1= go.Scatter(name='milk', x=df.Date, y=df['milk'])
trace2= go.Scatter(name='soap', x=df.Date, y=df['soap'])
trace3= go.Scatter(name='rice', x=df.Date, y=df['rice'])
trace4= go.Scatter(name='water', x=df.Date, y=df['water'])
fig = tools.make_subplots(rows=2, cols=2, subplot_titles=('milk', 'soap',
                                                          'rice', 'water'))
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)
fig['layout'].update(height=1000, width=1800, title='supermarket')
plot(fig, filename='supermarket.html')
I would appreciate if someone could teach me how to use plotly cufflinks to plot four moving averages from the selected columns from a dataframe using plotly offline.
Paste the code section below into a Jupyter Notebook to produce the plot using cufflinks and plotly offline:
# imports
import plotly
from plotly import tools
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import pandas as pd
import numpy as np
from IPython.display import display, HTML
import copy
import plotly.graph_objs as go
####### PART 1 - SETUP AND SAMPLE DATA #######
# setup
display(HTML("<style>.container { width:55% !important; } .widget-select > select {background-color: gainsboro;}</style>"))
init_notebook_mode(connected=True)
np.random.seed(123)
cf.set_config_file(theme='pearl')
# Random data using cufflinks
df = cf.datagen.lines().iloc[:,0:4]
df.columns = ['StockA', 'StockB', 'StockC', 'StockD']
####### PART 2 - FUNCTION FOR MOVING AVERAGES #######
# Function for moving averages
def movingAvg(df, win, keepSource):
    """Add moving averages for all columns in a dataframe.

    Arguments:
    df -- pandas dataframe
    win -- length of movingAvg estimation window
    keepSource -- True or False for keep or drop source data in output dataframe
    """
    df_temp = df.copy()
    # Manage existing column names
    colNames = list(df_temp.columns.values).copy()
    removeNames = colNames.copy()
    for col in colNames:
        # Make new names for movingAvgs
        movingAvgName = col + '_MA' #+ str(win)
        # Add movingAvgs
        df_temp[movingAvgName] = df[col].rolling(window=win).mean()
    # Remove estimates with insufficient window length
    df_temp = df_temp.iloc[win:]
    # Remove or keep source data
    if not keepSource:
        df_temp = df_temp.drop(removeNames, axis=1)
    return df_temp
# Add moving averages to df
windowLength = 10
df = movingAvg(df=df, win=windowLength, keepSource = True)
####### PART 3 -PLOTLY RULES #######
# Structure lines / traces for the plots
# trace 1
trace1 = go.Scatter(
    x=df.index,
    y=df['StockA'],
    name='StockA'
)
trace1_ma = go.Scatter(
    x=df.index,
    y=df['StockA_MA'],
    name='StockA_MA'
)
# trace 2
trace2 = go.Scatter(
    x=df.index,
    y=df['StockB'],
    name='StockB'
)
trace2_ma = go.Scatter(
    x=df.index,
    y=df['StockB_MA'],
    name='StockB_MA'
)
# trace 3
trace3 = go.Scatter(
    x=df.index,
    y=df['StockC'],
    name='StockC'
)
trace3_ma = go.Scatter(
    x=df.index,
    y=df['StockC_MA'],
    name='StockC_MA'
)
# trace 4
trace4 = go.Scatter(
    x=df.index,
    y=df['StockD'],
    name='StockD'
)
trace4_ma = go.Scatter(
    x=df.index,
    y=df['StockD_MA'],
    name='StockD_MA'
)
# Structure traces as datasets
data1 = [trace1, trace1_ma]
data2 = [trace2, trace2_ma]
data3 = [trace3, trace3_ma]
data4 = [trace4, trace4_ma]
# Build figures
fig1 = go.Figure(data=data1)
fig2 = go.Figure(data=data2)
fig3 = go.Figure(data=data3)
fig4 = go.Figure(data=data4)
# Subplots setup and layout
figs = cf.subplots([fig1, fig2, fig3, fig4],shape=(2,2))
figs['layout'].update(height=800, width=1200,
                      title='Stocks with moving averages = '+ str(windowLength))
iplot(figs)

Generate random Colors in Bar chart with Plotly Python

I am using Plotly for Python to generate some stacked bar charts. Since I have 17 objects which are getting stacked, the colour of the bars has started repeating as seen in the image below.
Can someone tell me how to get unique colours for each stack?
Please find my code to generate the bar chart below:
import plotly
plotly.tools.set_credentials_file(username='xxxxxxxx',
api_key='********')
dd = []
import plotly.plotly as py
import plotly.graph_objs as go
import numpy as np
for k,v in new_dict.items():
    trace = go.Bar(x = x['unique_days'],
                   y = v,
                   name = k,
                   text=v,
                   textposition = 'auto',
                   )
    dd.append(trace)
layout= go.Layout(
    title= 'Daily Cumulative Spend per campaign',
    hovermode= 'closest',
    autosize= True,
    width =5000,
    barmode='stack',
    xaxis= dict(
        title= 'Date',
        zeroline= False,
        gridwidth= 0,
        showticklabels=True,
        tickangle=-45,
        nticks = 60,
        ticklen = 5
    ),
    yaxis=dict(
        title= 'Cumulative Spend($)',
        ticklen= 5,
        gridwidth= 2,
    ),
    showlegend= True
)
fig = dict(data=dd, layout = layout)
py.iplot(fig)
I ran into the same issue this week and solved it with the Matplotlib module. Here is my code:
import random
import matplotlib.colors
hex_colors_dic = {}
rgb_colors_dic = {}
hex_colors_only = []
for name, hex_code in matplotlib.colors.cnames.items():
    hex_colors_only.append(hex_code)
    hex_colors_dic[name] = hex_code
    rgb_colors_dic[name] = matplotlib.colors.to_rgb(hex_code)
print(hex_colors_only)
# getting random color from list of hex colors
print(random.choice(hex_colors_only))
There are 148 colors in the list, and you can use it however you wish. Hopefully it is useful for someone :)
The same as above, in a shorter version:
import random
import matplotlib.colors
colors = dict(matplotlib.colors.cnames.items())
hex_colors = tuple(colors.values())
print(hex_colors)
# getting a random color from the tuple
print(random.choice(hex_colors))
