How do you add labels to a plotly boxplot in python? - python

I have the following code;
y = errnums
err_box = Box(
y=y,
name='Error Percent',
boxmean='sd',
marker=Marker(color='red'),
boxpoints='all',
jitter=0.5,
pointpos=-2.0
)
layout = Layout(
title='Error BoxPlot',
height=500,
width=500
)
fig = Figure(data=Data([err_box]), layout=layout)
plotly.image.save_as(fig, os.path.join(output_images, 'err_box.png'))
Which generates the following image;
What I would like to do is the following two things;
1) Add % next to the y-axis numbers. (Instead of having a traditional y-axis label saying "Error (%)")
2) Label all the vital points: mean, first quartile, third quartile, and stdev. Ideally the label would be a 4 sig-fig ('.2f') number next to the line.
Also, the stdev is the dotted line, and the diamond represents 1 sigma? 2 sigma?

For labels, try annotations. You'll have to compute the quartiles and mean yourself to position the labels.
Simple example:
import plotly.plotly as py
from plotly.graph_objs import *
data = Data([
Box(
y=[0, 1, 1, 2, 3, 5, 8, 13, 21],
boxpoints='all',
jitter=0.3,
pointpos=-1.8
)
])
layout = Layout(
annotations=Annotations([
Annotation(
x=0.3,
y=8.822,
text='3rd Quartile',
showarrow=False,
font=Font(
size=16
)
)
])
)
fig = Figure(data=data, layout=layout)
plot_url = py.plot(fig)
I recommend adding and positioning the annotations in the Plotly workspace, and then viewing the generated code:
The diamond shows the mean, and +- 1 standard deviation away from it.
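When this answer was written, it was not possible to add a % to the y-axis tick labels. Later Plotly versions provide the axis attribute ticksuffix for this, e.g. yaxis=dict(ticksuffix='%').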
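The advice above to compute the quartiles and mean yourself could be sketched roughly as follows. This uses only the standard library; `stat_annotations` is a hypothetical helper name, and `statistics.quantiles` may follow a slightly different quartile convention than Plotly's own, so the labels can sit marginally off the drawn box edges.

```python
import statistics

def stat_annotations(values, x=0.3):
    """Build annotation dicts for the quartiles and mean (hypothetical helper).

    Assumption: the default 'exclusive' method of statistics.quantiles is
    close enough to Plotly's quartile rule for placing labels.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    stats = {
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Mean": statistics.mean(values),
    }
    # One plain dict per label, formatted to two decimals
    return [
        dict(x=x, y=value, text=f"{name}: {value:.2f}", showarrow=False)
        for name, value in stats.items()
    ]

annotations = stat_annotations([0, 1, 1, 2, 3, 5, 8, 13, 21])
for a in annotations:
    print(a["text"], "at y =", a["y"])
```

The resulting list of dicts can be passed as the annotations of the layout, in place of the single hand-positioned annotation in the example above.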

Related

is there any way to get plotly radar charts with a complete line using go.Scatterpolar()?

In plotly's example scatter radar plots (here), one of the line segments is missing. This is also the case when I've tried it myself - I think if you're using plotly express you can use line_close. Is there an equivalent in using go.Scatterpolar?
The examples in the reference are built with graph objects, which do not close the line automatically. To close it, the data must be adjusted by repeating the first point at the end.
import plotly.graph_objects as go
r = [1, 5, 2, 2, 3]
r.append(r[0])
theta = ['processing cost','mechanical properties','chemical stability','thermal stability','device integration']
theta.append(theta[0])
fig = go.Figure(data=go.Scatterpolar(
r=r,
theta=theta,
fill='toself'
))
fig.update_layout(autosize=False,
height=450,
polar=dict(
radialaxis=dict(
visible=True
),
),
showlegend=False)
fig.show()
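If the same closing step is needed for several traces, it might be wrapped in a small function. The sketch below is only an illustration: `close_loop` is a hypothetical name, not part of Plotly, and it assumes that a trace whose first and last points already match should be left alone.

```python
def close_loop(r, theta):
    """Return copies of r and theta with the first point repeated at the end.

    Hypothetical helper: the append is skipped when the trace is already closed.
    """
    r, theta = list(r), list(theta)
    if len(r) != len(theta):
        raise ValueError("r and theta must have the same length")
    if r and (r[0], theta[0]) != (r[-1], theta[-1]):
        r.append(r[0])
        theta.append(theta[0])
    return r, theta

r, theta = close_loop(
    [1, 5, 2, 2, 3],
    ['processing cost', 'mechanical properties', 'chemical stability',
     'thermal stability', 'device integration'],
)
print(len(r), len(theta))
```

The returned lists can then be passed as r and theta to go.Scatterpolar exactly as in the code above.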

How can I plot a line with a confidence interval in python using plotly?

I am trying to use plotly to plot a graph similar to the one here below:
Unfortunately I am only able to plot something like this
What I would like is to have normal boundaries (upper and lower, defined by two dataframe columns) and only one entry in the legend.
import plotly.graph_objs as go
# Create a trace for the lower bound
trace1 = go.Scatter(x=df.index,
y=df['lower'],
name='Lower Bound',
fill='tonexty',
fillcolor='rgba(255,0,0,0.2)',
line=dict(color='blue'))
# Create a trace for the median
trace2 = go.Scatter(x=df.index,
y=df['median'],
name='median',
line=dict(color='blue', width=2))
# Create a trace for the upper bound
trace3 = go.Scatter(x=df.index,
y=df['upper'],
name='Upper Bound',
fill='tonexty',
fillcolor='rgba(255,0,0,0.2)',
line=dict(color='blue'))
# Create the layout
layout = go.Layout(xaxis=dict(title='Date'),
yaxis=dict(title='title'))
# Create the figure with the three traces and the layout
fig = go.Figure(data=[trace1, trace2, trace3], layout=layout)
context['pltyplot'] = pltyplot(fig, output_type="div")
I want to use plotly because I am integrating the resulting figure into a Django web page, and plotly makes it possible, with the last line, to import the whole object into the page in a clean, simple and interactive way.
Any ideas?
You can try this code:
import numpy as np
import plotly.graph_objs as go
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 3, 6]
# Half-width of the shaded band (an illustrative value, not a statistically derived confidence interval)
interval = 0.6 * np.std(y) / np.mean(y)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode='lines', name='Line'))
fig.add_trace(go.Scatter(x=x+x[::-1],
y=y+[i + interval for i in y[::-1]],
fill='toself',
fillcolor='rgba(0,100,80,0.2)',
line=dict(width=0),
showlegend=False))
fig.add_trace(go.Scatter(x=x+x[::-1],
y=y+[i - interval for i in y[::-1]],
fill='toself',
fillcolor='rgba(0,100,80,0.2)',
line=dict(width=0),
showlegend=False))
fig.show()
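When the bounds come from two existing columns, as in the question, the same fill='toself' idea can be applied to a single polygon that runs along the upper bound and back along the lower bound. The following is a minimal sketch; `band_polygon` is a hypothetical helper, and the short lists stand in for df.index, df['lower'] and df['upper'].

```python
def band_polygon(x, lower, upper):
    """Return (xs, ys) tracing the upper bound forward and the lower bound back."""
    x, lower, upper = list(x), list(lower), list(upper)
    if not (len(x) == len(lower) == len(upper)):
        raise ValueError("x, lower and upper must have the same length")
    xs = x + x[::-1]          # left to right, then right to left
    ys = upper + lower[::-1]  # along the top, then back along the bottom
    return xs, ys

xs, ys = band_polygon([1, 2, 3], lower=[0.5, 1.5, 2.5], upper=[1.5, 2.5, 3.5])
print(xs)
print(ys)
```

Passing xs and ys to one go.Scatter with fill='toself', line=dict(width=0) and showlegend=False, next to the median trace, should leave the median as the only legend entry.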

Connecting data points with lines in a Plotly boxplot in Python

I am working on some boxplots. I found this code very helpful and I managed to replicate it for my needs:
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import pandas as pd
np.random.seed(1)
y0 = np.random.randn(50) - 1
y1 = np.random.randn(50) + 1
df = pd.DataFrame({'graph_name':['trace 0']*len(y0)+['trace 1']*len(y1),
'value': np.concatenate([y0,y1],0),
'color':np.random.choice([0,1,2,3,4,5,6,7,8,9], size=100, replace=True)}
)
fig = px.strip(df,
x='graph_name',
y='value',
color='color',
stripmode='overlay')
fig.add_trace(go.Box(y=df.query('graph_name == "trace 0"')['value'], name='trace 0'))
fig.add_trace(go.Box(y=df.query('graph_name == "trace 1"')['value'], name='trace 1'))
fig.update_layout(autosize=False,
width=600,
height=600,
legend={'traceorder':'normal'})
fig.show()
I am now trying to put some lines connecting the datapoints with the same colors, but I am lost. Any idea?
Something similar to this:
My first idea was to add lines to your figure using Plotly shapes, specifying the start and end points in x- and y-axis coordinates. However, px.strip applies jittering (it adds small random offsets, say between -0.1 and 0.1, to the x-coordinates under the hood so that points do not overlap), and as far as I know there is no way to retrieve the exact x-coordinates of each point.
We can get around this by using go.Scatter to plot each pair of points individually, adding jitter to the x-values ourselves and connecting each pair with a line. We are basically implementing px.strip ourselves, but with full control over the exact coordinates of each point.
To toggle colors the same way px.strip allows, we need to assign all points of the same color to the same legendgroup, and show the legend entry only the first time a color is plotted (as we don't want a legend entry for each point).
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import pandas as pd
np.random.seed(1)
y0 = np.random.randn(50) - 1
y1 = np.random.randn(50) + 1
## sort both sets of data so we can easily connect them with line annotations
y0.sort()
y1.sort()
df = pd.DataFrame({'graph_name':['trace 0']*len(y0)+['trace 1']*len(y1),
'value': np.concatenate([y0,y1],0)}
# 'color':np.random.choice([0,1,2,3,4,5,6,7,8,9], size=100, replace=True)}
)
fig = go.Figure()
## i will set jittering to 0.1
x0 = np.array([0]*len(y0)) + np.random.uniform(-0.1,0.1,len(y0))
x1 = np.array([1]*len(y1)) + np.random.uniform(-0.1,0.1,len(y1))
## px.colors.qualitative.D3 contains 10 distinct colors
## colors_list = np.random.choice(px.colors.qualitative.D3, size=50)
## for simplicity, we repeat it 5 times instead of selecting randomly
## this guarantees the colors appear in order in the legend
colors_list = px.colors.qualitative.D3*5
## map each color string to its index, used as the legend name
color_number = {i:color for color,i in enumerate(px.colors.qualitative.D3)}
## keep track of whether the color is showing up for the first time as we build out the legend
colors_legend = {color:False for color in colors_list}
for x_start,x_end,y_start,y_end,color in zip(x0,x1,y0,y1,colors_list):
    ## if the color hasn't been added to the legend yet, add a legend entry
    if colors_legend[color] == False:
        fig.add_trace(
            go.Scatter(
                x=[x_start,x_end],
                y=[y_start,y_end],
                mode='lines+markers',
                marker=dict(color=color),
                line=dict(color="rgba(100,100,100,0.5)"),
                legendgroup=color_number[color],
                name=color_number[color],
                showlegend=True,
                hoverinfo='skip'
            )
        )
        colors_legend[color] = True
    ## otherwise omit the legend entry, but add it to the same legend group
    else:
        fig.add_trace(
            go.Scatter(
                x=[x_start,x_end],
                y=[y_start,y_end],
                mode='lines+markers',
                marker=dict(color=color),
                line=dict(color="rgba(100,100,100,0.5)"),
                legendgroup=color_number[color],
                showlegend=False,
                hoverinfo='skip'
            )
        )
fig.add_trace(go.Box(y=df.query('graph_name == "trace 0"')['value'], name='trace 0'))
fig.add_trace(go.Box(y=df.query('graph_name == "trace 1"')['value'], name='trace 1'))
fig.update_layout(autosize=False,
width=600,
height=600,
legend={'traceorder':'normal'})
fig.show()
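The two ideas in this answer, manual jitter and one legend entry per color, can also be looked at in isolation from the plotting calls. The sketch below builds plain dicts of keyword arguments rather than traces; `paired_trace_specs` is a hypothetical helper, the color names are placeholders, and the standard library random module stands in for NumPy.

```python
import random

def paired_trace_specs(y_left, y_right, colors, jitter=0.1, seed=1):
    """Return one dict of go.Scatter keyword arguments per connected pair."""
    rng = random.Random(seed)
    seen = set()  # colors that already have a legend entry
    specs = []
    for y0, y1, color in zip(y_left, y_right, colors):
        specs.append(dict(
            # jitter around x=0 (left box) and x=1 (right box)
            x=[0 + rng.uniform(-jitter, jitter), 1 + rng.uniform(-jitter, jitter)],
            y=[y0, y1],
            mode="lines+markers",
            marker=dict(color=color),
            legendgroup=color,
            name=color,
            showlegend=color not in seen,  # True only for the first pair of a color
        ))
        seen.add(color)
    return specs

specs = paired_trace_specs([-1.2, -0.8, -0.5], [0.9, 1.1, 1.4], ["red", "blue", "red"])
print([s["showlegend"] for s in specs])
```

Each dict could then be expanded into a trace with fig.add_trace(go.Scatter(**spec)), which replaces the duplicated if/else branches with a single call.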

Rounding Numbers in a Quartile Figures of a Plotly Box Plot

I have been digging around for a while trying to figure out how to round the numbers shown for the quartiles in the hover labels. There must be a straightforward way to do this, as there is for the x and y coordinates. In this case rounding to two decimals would be sufficient.
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")
fig = go.Figure(data=go.Box(y=df['total_bill'],
name='total_bill',
boxmean=True,
)
)
fig.update_layout(width=800, height=800,
hoverlabel=dict(bgcolor="white",
font_size=16,
font_family="Arial",
)
)
fig.show()
Unfortunately, Plotly does not appear to support this directly. If you modify the hovertemplate, it only applies to the markers you hover over (the outliers); the decimals shown for the boxplot statistics remain unchanged. Another issue is that plotly-python cannot extract the boxplot statistics, because they are computed by the JavaScript under the hood.
However, you can calculate the boxplot statistics yourself using the same method as Plotly and round each of them to two decimal places. Then you can pass the statistics (lowerfence, q1, median, mean, q3, upperfence) to go.Box so that Plotly constructs the boxplot from them, and plot the outliers as a separate scatter trace.
This is a pretty ugly hack, because you are essentially redoing all of the calculations Plotly already does and then constructing the boxplot manually, but it does force the boxplot statistics to display with two decimal places.
from math import floor, ceil
from numpy import mean
import pandas as pd
import plotly.graph_objects as go
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")
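## calculate quartiles as outlined in the plotly documentation
def get_percentile(data, p):
    data = sorted(data) # sort a copy so the caller's array is left untouched
    n = len(data)
    x = n*p + 0.5
    x1, x2 = floor(x), ceil(x)
    y1, y2 = data[x1-1], data[x2-1] # account for zero-indexing
    if x1 == x2: # x is a whole number, so no interpolation is needed
        return round(y1, 2)
    return round(y1 + ((x - x1) / (x2 - x1))*(y2 - y1), 2)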
## calculate all boxplot statistics
y = df['total_bill'].values
lowerfence = min(y)
q1, median, q3 = get_percentile(y, 0.25), get_percentile(y, 0.50), get_percentile(y, 0.75)
upperfence = max([y0 for y0 in y if y0 < (q3 + 1.5*(q3-q1))])
## construct the boxplot
fig = go.Figure(data=go.Box(
x=["total_bill"]*len(y),
q1=[q1], median=[median], mean=[round(mean(y),2)],
q3=[q3], lowerfence=[lowerfence],
upperfence=[upperfence], orientation='v', showlegend=False,
)
)
outliers = y[y>upperfence]
fig.add_trace(go.Scatter(x=["total_bill"]*len(outliers), y=outliers, showlegend=False, mode='markers', marker={'color':'#1f77b4'}))
fig.update_layout(width=800, height=800,
hoverlabel=dict(bgcolor="white",
font_size=16,
font_family="Arial",
)
)
fig.show()
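The code above takes the minimum as the lower fence, which works when the data has no low outliers. A more general sketch of the same manual calculation is given below, under the assumption that both whiskers follow the usual 1.5 x IQR rule; `percentile` and `box_stats` are hypothetical helpers, and the small list is made-up data chosen to contain one outlier.

```python
from math import floor, ceil

def percentile(sorted_data, p):
    """Linear interpolation at position n*p + 0.5 (1-based), clamped to the data."""
    n = len(sorted_data)
    x = n * p + 0.5
    lo = min(max(floor(x), 1), n)
    hi = min(max(ceil(x), 1), n)
    y_lo, y_hi = sorted_data[lo - 1], sorted_data[hi - 1]
    if lo == hi:  # whole-number position: nothing to interpolate
        return y_lo
    return y_lo + (x - lo) * (y_hi - y_lo)

def box_stats(values, ndigits=2):
    """Return (rounded statistics, outliers), assuming 1.5*IQR fences on both sides."""
    data = sorted(values)
    q1, median, q3 = (percentile(data, p) for p in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_limit <= v <= high_limit]
    stats = dict(lowerfence=inside[0], q1=q1, median=median, q3=q3,
                 upperfence=inside[-1], mean=sum(data) / len(data))
    stats = {name: round(value, ndigits) for name, value in stats.items()}
    outliers = [v for v in data if v < low_limit or v > high_limit]
    return stats, outliers

stats, outliers = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
print(outliers)
```

Each value in stats would be wrapped in a one-element list before being passed to go.Box, as in the code above, and outliers would go into the separate scatter trace.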
For me, setting yaxis_tickformat=",.2f" worked:
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")
fig = go.Figure(data=go.Box(y=df['total_bill'],
name='total_bill',
boxmean=True,
)
)
fig.update_layout(width=800, height=800,
# >>>>
yaxis_tickformat=",.2f",
# <<<<
hoverlabel=dict(bgcolor="white",
font_size=16,
font_family="Arial",
)
)
fig.show()
If you want the y-axis ticks to stay unchanged, you can also override them by setting the tick text explicitly:
fig.update_layout(
    yaxis = dict(
        tickformat=",.2f",
        tickmode = 'array',
        tickvals = [10, 20, 30, 40, 50],
        ticktext =["10", "20", "30", "40", "50"],
    ))
(tested on plotly 5.8.2)
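To preview what the ",.2f" specifier produces, Python's built-in format can be used, since Plotly's tickformat strings follow the d3-format mini-language, which is modeled on Python's format specification. This is only a sketch, and it assumes the two implementations agree for this particular specifier; the sample values are made up.

```python
# Comma as thousands separator, fixed-point notation, two decimals
values = [3.0, 17.7956, 1234.5]
labels = [format(v, ",.2f") for v in values]
print(labels)  # ['3.00', '17.80', '1,234.50']
```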

Label specific bubbles in Plotly bubble chart

I am having trouble figuring out how to label specific bubbles in a plot.ly bubble chart. I want certain "outlier" bubbles to have text written inside the bubble instead of via hover text.
Let's say I have this data:
import plotly.plotly as py
import plotly.graph_objs as go
trace0 = go.Scatter(
x=[1, 2, 3, 4],
y=[10, 11, 12, 13],
mode='markers',
marker=dict(
size=[40, 60, 80, 100],
)
)
data = [trace0]
py.iplot(data, filename='bubblechart-size')
I'd like to only add text markers on bubbles that correspond to (1,10) and (4,13). Furthermore, is it possible to control the location of text markers?
You can achieve this with annotations.
This allows you to write any text you want on the chart and reference it to your data. You can also control where the text appears using position anchors or by applying an additional calculation on top of the x and y data. For example:
x_data = [1, 2, 3, 4]
y_data = [10, 11, 12, 13]
z_data = [40, 60, 80, 100]
annotations = [
dict(
x=x,
y=y,
text='' if 1 < x < 4 else str(z), # Some conditional to define outliers; str() because annotation text is a string attribute
showarrow=False,
xanchor='center', # Position of text relative to x axis (left/right/center)
yanchor='middle', # Position of text relative to y axis (top/bottom/middle)
) for x, y, z in zip(x_data, y_data, z_data)
]
trace0 = go.Scatter(
x=x_data,
y=y_data,
mode='markers',
marker=dict(
size=z_data,
)
)
data = [trace0]
layout = go.Layout(annotations=annotations)
py.iplot(go.Figure(data=data, layout=layout), filename='bubblechart-size')
Edit
If using cufflinks, then the above can be adapted slightly to:
bubbles_to_annotate = df[(df['avg_pos'] < 2) | (df['avg_pos'] > 3)] # Some conditional to define outliers
annotations = [
dict(
x=row['avg_pos'],
y=row['avg_neg'],
text=row['subreddit'],
showarrow=False,
xanchor='center', # Position of text relative to x axis (left/right/center)
yanchor='middle', # Position of text relative to y axis (top/bottom/middle)
) for _, row in bubbles_to_annotate.iterrows()
]
df.iplot(kind='bubble', x='avg_pos', y='avg_neg', size='counts',
text='subreddit', xTitle='Average Positive Sentiment',
yTitle='Average Negative Sentiment', annotations=annotations,
filename='simple-bubble-chart')
You will still need to define the annotations since you need a conditional argument. Then pass this directly to cufflinks via annotations.
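A different technique from the annotations shown above is to let the scatter trace draw the text itself: go.Scatter accepts a per-point text list together with mode='markers+text', and textposition controls where the text sits relative to each marker. The sketch below only builds that text list; `selective_labels` is a hypothetical helper, and the target points are the two bubbles named in the question.

```python
def selective_labels(x, y, labels, targets):
    """Return one label per point, blank unless (x, y) is in targets."""
    targets = set(targets)
    return [
        str(label) if (xi, yi) in targets else ""
        for xi, yi, label in zip(x, y, labels)
    ]

text = selective_labels(
    [1, 2, 3, 4], [10, 11, 12, 13], [40, 60, 80, 100],
    targets=[(1, 10), (4, 13)],
)
print(text)  # ['40', '', '', '100']
```

The list would be passed as text=text with mode='markers+text' and, for example, textposition='middle center' to place the labels inside the bubbles.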
