Rounding Numbers in a Quartile Figures of a Plotly Box Plot

I have been digging around for a while trying to figure out how to round the numbers shown for the quartile figures in the hover label. There must be a straightforward way to do this, as there is with the x and y coordinates. In this case, rounding to two decimals would be sufficient.
import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")
fig = go.Figure(data=go.Box(y=df['total_bill'],
                            name='total_bill',
                            boxmean=True,
                            )
                )
fig.update_layout(width=800, height=800,
                  hoverlabel=dict(bgcolor="white",
                                  font_size=16,
                                  font_family="Arial",
                                  )
                  )
fig.show()

Unfortunately, it looks like this is something Plotly cannot easily do. If you modify the hovertemplate, it only applies to the markers you hover over (the outliers), and the decimals of the boxplot statistics remain unchanged on hover. Another issue with plotly-python is that you cannot extract the boxplot statistics, because that would require interacting with the JavaScript under the hood.
However, you can calculate the boxplot statistics yourself using the same method as Plotly and round all of them to two decimal places. Then you can pass the boxplot statistics (lowerfence, q1, median, mean, q3, upperfence) to force Plotly to construct the boxplot from them, and plot all the outliers as another scatter trace.
This is a pretty ugly hack because you are essentially redoing all of the calculations Plotly already does and then constructing the boxplot manually, but it does force the boxplot statistics to display to two decimal places.
from math import floor, ceil
from numpy import mean
import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")

## calculate quartiles as outlined in the plotly documentation
def get_percentile(data, p):
    data = sorted(data)  # sort a copy so the caller's array is not reordered
    n = len(data)
    x = n*p + 0.5
    x1, x2 = floor(x), ceil(x)
    y1, y2 = data[x1-1], data[x2-1]  # account for zero-indexing
    if x1 == x2:  # x falls exactly on a data point, so there is nothing to interpolate
        return round(y1, 2)
    return round(y1 + ((x - x1) / (x2 - x1))*(y2 - y1), 2)

## calculate all boxplot statistics
y = df['total_bill'].values
q1, median, q3 = get_percentile(y, 0.25), get_percentile(y, 0.50), get_percentile(y, 0.75)
iqr = q3 - q1
## the fences are the most extreme data points within 1.5*IQR of the quartiles
lowerfence = min([y0 for y0 in y if y0 > (q1 - 1.5*iqr)])
upperfence = max([y0 for y0 in y if y0 < (q3 + 1.5*iqr)])

## construct the boxplot
fig = go.Figure(data=go.Box(
    x=["total_bill"]*len(y),
    q1=[q1], median=[median], mean=[round(mean(y), 2)],
    q3=[q3], lowerfence=[lowerfence],
    upperfence=[upperfence], orientation='v', showlegend=False,
    )
)

## plot the points beyond either fence as a separate scatter trace
outliers = y[(y < lowerfence) | (y > upperfence)]
fig.add_trace(go.Scatter(x=["total_bill"]*len(outliers), y=outliers, showlegend=False, mode='markers', marker={'color':'#1f77b4'}))
fig.update_layout(width=800, height=800,
                  hoverlabel=dict(bgcolor="white",
                                  font_size=16,
                                  font_family="Arial",
                                  )
                  )
fig.show()

For me, setting yaxis_tickformat=",.2f" worked:
import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")
fig = go.Figure(data=go.Box(y=df['total_bill'],
                            name='total_bill',
                            boxmean=True,
                            )
                )
fig.update_layout(width=800, height=800,
                  # >>>>
                  yaxis_tickformat=",.2f",
                  # <<<<
                  hoverlabel=dict(bgcolor="white",
                                  font_size=16,
                                  font_family="Arial",
                                  )
                  )
fig.show()
This also changes the y-axis tick labels to two decimals. If you want the tick labels to stay unchanged, you can override them by setting the tick text explicitly:
fig.update_layout(
    yaxis=dict(
        tickformat=",.2f",
        tickmode='array',
        tickvals=[10, 20, 30, 40, 50],
        ticktext=["10", "20", "30", "40", "50"],
    ))
(tested on plotly 5.8.2)

Related

Plotly to show 2 decimal points when hovering over the chart, not nearest point

I am using Plotly to build a line chart, and when I hover over the line I would like it to display the x and y axis values up to 2 decimal points, instead of displaying the nearest data point on the line chart. To explain better, please see the example:
import pandas as pd
import plotly.graph_objects as go

df = pd.DataFrame({'col1':[0.5,1.5,2.5], 'time':[2,3.5,4.5]})
def plot():
    fig = go.Figure()
    fig.add_trace(go.Scatter(x = df['time'],
                             y = df['col1'],
                             mode='lines', name = 'time plot',
                             hovertemplate='%{x:.2f}: %{y:.2f}'))
    fig.update_layout(title='Plot', xaxis_tickformat = '.3f')
    return fig
So, when I hover over the line, I can see x and y axis values to the nearest point from my dataset. I would like to be able to see 2 decimal points, for example, if I hover over the line, I want to see the points 2.11, 2.12 etc from the x-axis, even though they are not available on the data points.

I cannot think of a way to do this using plotly methods but I was able to think of a workaround by creating another line plot and setting the opacity to zero.
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# your data
df = pd.DataFrame({'col1':[0.5,1.5,2.5], 'time':[2,3.5,4.5]})
# get the min and max X axis values
min_val, max_val = df['time'].agg(['min', 'max'])
# use np.arange to create the range with a step of .01
x = np.arange(min_val, max_val+.01, .01)
# create a zeros array of the same length
y = np.zeros(len(x))
# create your go.Figure object
fig = go.Figure()
# add your traces
fig.add_trace(go.Scatter(x=df['time'],
y=df['col1'],
name='time plot',
hovertemplate='%{x:.2f}: %{y:.2f}'))
fig.add_trace(go.Scatter(x=x,
y=y,
showlegend=False, # remove line from legend
hoverinfo='x',
opacity=0)) # set opacity to zero so it does not display on the graph
# your layout
fig.update_layout(hovermode='x unified', xaxis_tickformat = '.2f', title='Plot')
fig.show()

Connecting data points with lines in a Plotly boxplot in Python

I am working on some boxplots. I found this code very helpful and I managed to replicate it for my needs:
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import pandas as pd
np.random.seed(1)
y0 = np.random.randn(50) - 1
y1 = np.random.randn(50) + 1
df = pd.DataFrame({'graph_name':['trace 0']*len(y0)+['trace 1']*len(y1),
'value': np.concatenate([y0,y1],0),
'color':np.random.choice([0,1,2,3,4,5,6,7,8,9], size=100, replace=True)}
)
fig = px.strip(df,
x='graph_name',
y='value',
color='color',
stripmode='overlay')
fig.add_trace(go.Box(y=df.query('graph_name == "trace 0"')['value'], name='trace 0'))
fig.add_trace(go.Box(y=df.query('graph_name == "trace 1"')['value'], name='trace 1'))
fig.update_layout(autosize=False,
width=600,
height=600,
legend={'traceorder':'normal'})
fig.show()
I am now trying to put some lines connecting the datapoints with the same colors, but I am lost. Any idea?
Something similar to this:

My first idea was to add lines to your figure by using plotly shapes and specifying the start and end points in x- and y-axis coordinates. However, when you use px.strip, plotly implements jittering (adding randomly generated small values, say between -0.1 and 0.1, to the x-coordinates under the hood to avoid points overlapping), but as far as I know, there is no way to retrieve the exact x-coordinates of each point.
However we can get around this by using go.Scatter to plot all the paired points individually, adding jittering as needed to the x-values and connecting each pair of points with a line. We are basically implementing px.strip ourselves but with full control of the exact coordinates of each point.
In order to toggle colors the same way that px.strip allows you to, we need to assign all points of the same color to the same legendgroup, and only show the legend entry the first time a color is plotted (as we don't want a legend entry for each point).
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import pandas as pd
np.random.seed(1)
y0 = np.random.randn(50) - 1
y1 = np.random.randn(50) + 1
## sort both sets of data so we can easily connect them with line annotations
y0.sort()
y1.sort()
df = pd.DataFrame({'graph_name':['trace 0']*len(y0)+['trace 1']*len(y1),
'value': np.concatenate([y0,y1],0)}
# 'color':np.random.choice([0,1,2,3,4,5,6,7,8,9], size=100, replace=True)}
)
fig = go.Figure()
## i will set jittering to 0.1
x0 = np.array([0]*len(y0)) + np.random.uniform(-0.1,0.1,len(y0))
x1 = np.array([1]*len(y1)) + np.random.uniform(-0.1,0.1,len(y1))
## px.colors.qualitative.D3 contains 10 distinct colors
## colors_list = np.random.choice(px.colors.qualitative.D3, size=50)
## for simplicity, we repeat it 5 times instead of selecting randomly
## this guarantees the colors appear in order in the legend
colors_list = px.colors.qualitative.D3*5
## map each color to its position in the palette, used as the legend label
color_number = {color: str(i) for i, color in enumerate(px.colors.qualitative.D3)}
## keep track of whether the color is showing up for the first time as we build out the legend
colors_legend = {color:False for color in colors_list}
for x_start,x_end,y_start,y_end,color in zip(x0,x1,y0,y1,colors_list):
    ## show a legend entry only the first time a color is plotted;
    ## later pairs join the same legend group without adding an entry
    first_time = not colors_legend[color]
    fig.add_trace(
        go.Scatter(
            x=[x_start,x_end],
            y=[y_start,y_end],
            mode='lines+markers',
            marker=dict(color=color),
            line=dict(color="rgba(100,100,100,0.5)"),
            legendgroup=color_number[color],
            name=color_number[color],
            showlegend=first_time,
            hoverinfo='skip'
        )
    )
    colors_legend[color] = True
fig.add_trace(go.Box(y=df.query('graph_name == "trace 0"')['value'], name='trace 0'))
fig.add_trace(go.Box(y=df.query('graph_name == "trace 1"')['value'], name='trace 1'))
fig.update_layout(autosize=False,
width=600,
height=600,
legend={'traceorder':'normal'})
fig.show()

draw quantile lines and connect two violin plots

How do I draw quantile lines and connect two violin plots in plotly in Python?
For example, there is a library to do this in R (https://github.com/GRousselet/rogme). The library provided does not necessarily work when there are more than two groups.

There is definitely no built-in method to do something this specific in Plotly. The best you can do is probably draw some lines, and consider writing a function or some loops if you need to do this for multiple groups of data for different quantile values.
Here is how I would get started. You can create a list or array to store all of the coordinates of the lines if you want to connect the same quantiles from the grouped violin plots. I acknowledge that what I have at the moment is hacky, as it relies on groups in Plotly having y-coordinates starting at 0 and increasing by 1. There might be a way to access the y-coordinates of grouped violin plots; I'd recommend looking into the documentation.
Some more work will need to be done if you want to add text boxes to indicate the values of quantiles.
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
# generate some random data that is normally distributed
np.random.seed(42)
y1 = np.random.normal(0, 1, 1000) * 1.5 + 6
y2 = np.random.normal(0, 5, 1000) + 6
# group the data together and combine into one dataframe
df1 = pd.DataFrame({'Group': 'Group1', 'Values': y1})
df2 = pd.DataFrame({'Group': 'Group2', 'Values': y2})
df_final = pd.concat([df1, df2])
fig = px.strip(df_final, x='Values', y='Group', color_discrete_sequence=['grey'])
quantiles_list = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]
## this is a bit hacky and relies on y coordinates for groups starting from 0 and increasing by 1
y_diff = 0
## these store the coordinates in order to connect the quantile lines
lower_coordinates, upper_coordinates = [], []
for group_name in df_final.Group.unique():
    for quantile in quantiles_list:
        quantile_value = np.quantile(df_final[df_final['Group'] == group_name].Values, quantile)
        if group_name == 'Group1':
            lower_coordinates.append((quantile_value, 0.2+1*y_diff))
        if group_name == 'Group2':
            upper_coordinates.append((quantile_value, -0.2+1*y_diff))
        fig.add_shape(
            # short vertical line marking this quantile for the current group
            dict(
                type="line",
                x0=quantile_value,
                y0=-0.2+1*y_diff,
                x1=quantile_value,
                y1=0.2+1*y_diff,
                line=dict(
                    color="black",
                    width=4
                )
            ),
        )
    y_diff += 1
## draw connecting lines
for idx in range(len(upper_coordinates)):
    fig.add_shape(
        dict(
            type="line",
            x0=lower_coordinates[idx][0],
            y0=lower_coordinates[idx][1],
            x1=upper_coordinates[idx][0],
            y1=upper_coordinates[idx][1],
            line=dict(
                color="chocolate",
                width=4
            )
        ),
    )
fig.show()

Plotly: How to set a fill color between two vertical lines?

Using matplotlib, we can "trivially" fill the area between two vertical lines using fill_between() as in the example:
https://matplotlib.org/3.2.1/gallery/lines_bars_and_markers/fill_between_demo.html#selectively-marking-horizontal-regions-across-the-whole-axes
Using matplotlib, I can make what I need:
We have two signals, and I'm computing the rolling/moving Pearson's and Spearman's correlation. When the correlations go either below -0.5 or above 0.5, I want to shade the period (blue for Pearson's and orange for Spearman's). I also darken the weekends in gray in all plots.
However, I'm having a hard time accomplishing the same using Plotly. It would also be helpful to know how to do it between two horizontal lines.
Note that I'm using Plotly and Dash to speed up the visualization of several plots. Users asked for a more "dynamic type of thing." However, I'm not a GUI guy and cannot spend time on this, although I need to feed them with initial results.
BTW, I tried Bokeh in the past, and I gave up for some reason I cannot remember. Plotly looks good since I can use either from Python or R, which are my main development tools.
Thanks,
Carlos

I don't think there is any built-in Plotly method that is equivalent to matplotlib's fill_between() method. However, you can draw shapes, so a possible workaround is to draw a grey rectangle and set the parameter layer="below" so that the signal is still visible. You can also set the coordinates of the rectangle outside of your axis range to ensure the rectangle extends to the edges of the plot.
You can fill the area in between horizontal lines by drawing a rectangle and setting the axes ranges in a similar manner.
import numpy as np
import plotly.graph_objects as go
x = np.arange(0, 4 * np.pi, 0.01)
y = np.sin(x)
fig = go.Figure()
fig.add_trace(go.Scatter(
x=x,
y=y
))
# hard-code the axes
fig.update_xaxes(range=[0, 4 * np.pi])
fig.update_yaxes(range=[-1.2, 1.2])
# specify the corners of the rectangles
fig.update_layout(
shapes=[
dict(
type="rect",
xref="x",
yref="y",
x0=4,
y0=-1.3,
x1=5,
y1=1.3,
fillcolor="lightgray",
opacity=0.4,
line_width=0,
layer="below"
),
dict(
type="rect",
xref="x",
yref="y",
x0=9,
y0=-1.3,
x1=10,
y1=1.3,
fillcolor="lightgray",
opacity=0.4,
line_width=0,
layer="below"
),
]
)
fig.show()

You haven't provided a data sample, so I'm going to use a synthetic time series to show how you can add a number of shapes with defined start and stop dates for several different categories using a custom function, bgLevels.
Two vertical lines with a fill between them very quickly turn into a rectangle, and rectangles can easily be added as shapes using fig.add_shape. The example below shows how to find start and stop dates for periods given by a certain criterion. In your case the criterion is whether the value of a variable is higher or lower than a certain level.
Using shapes instead of traces added with fig.add_trace() lets you define the position with regard to plot layers using layer='below'. And the shape outlines can easily be hidden using line=dict(color="rgba(0,0,0,0)").
Plot 1: Time series figure with random data:
Plot 2: Background is set to a transparent grey when A > 100:
Plot 3: Background is also set to a transparent red when D < -60:
Complete code:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import datetime
pd.set_option('display.max_rows', None)
# data sample
nperiods = 200
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10, 12, size=(nperiods, 4)),
columns=list('ABCD'))
datelist = pd.date_range(datetime.datetime(2020, 1, 1).strftime('%Y-%m-%d'),periods=nperiods).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0] = 0
df = df.cumsum().reset_index()
# plotly setup
fig = px.line(df, x='dates', y=df.columns[1:])
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')

# function to set background color for a
# specified variable and a specified level
def bgLevels(fig, variable, level, mode, fillcolor, layer):
    """
    Set a specified color as background for given
    levels of a specified variable using a shape.
    Keyword arguments:
    ==================
    fig -- plotly figure
    variable -- column name in a pandas dataframe
    level -- int or float
    mode -- set threshold above or below
    fillcolor -- any color type that plotly can handle
    layer -- position of shape in plotly figure, like "below"
    """
    if mode == 'above':
        m = df[variable].gt(level)
    elif mode == 'below':
        m = df[variable].lt(level)
    else:
        raise ValueError("mode must be 'above' or 'below'")
    # label each run of consecutive matching rows, then take its first and last date
    df1 = df[m].groupby((~m).cumsum())['dates'].agg(['first','last'])
    for index, row in df1.iterrows():
        #print(row['first'], row['last'])
        fig.add_shape(type="rect",
                      xref="x",
                      yref="paper",
                      x0=row['first'],
                      y0=0,
                      x1=row['last'],
                      y1=1,
                      line=dict(color="rgba(0,0,0,0)",width=3,),
                      fillcolor=fillcolor,
                      layer=layer)
    return fig

fig = bgLevels(fig = fig, variable = 'A', level = 100, mode = 'above',
               fillcolor = 'rgba(100,100,100,0.2)', layer = 'below')
fig = bgLevels(fig = fig, variable = 'D', level = -60, mode = 'below',
               fillcolor = 'rgba(255,0,0,0.2)', layer = 'below')
fig.show()
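The least obvious step in bgLevels is probably the groupby on (~m).cumsum(), so here is that step on its own with a tiny made-up frame. The counter increases on every row where the condition is False and stays constant while it is True, so each unbroken run of True rows shares one label. The column names and values below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"dates": pd.date_range("2020-01-01", periods=8),
                   "A": [1, 5, 6, 2, 7, 8, 9, 0]})

m = df["A"].gt(4)                      # True where the condition holds
run_id = (~m).cumsum().rename("run")   # constant within each run of True rows

# Keep only the matching rows, then take the first and last date of each run.
runs = df[m].groupby(run_id[m])["dates"].agg(["first", "last"])
print(runs)
```

Each row of runs is one shaded period: here 2020-01-02 to 2020-01-03 and 2020-01-05 to 2020-01-07.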

I think that fig.add_hrect() and fig.add_vrect() are the simplest approaches to reproducing the Matplotlib fill_between functionality in this case:
https://plotly.com/python/horizontal-vertical-shapes/
For your example, add_vrect() should do the trick.

Logarithmic color scale in plotly

I'm trying to visualize data with some outliers using Plotly and Python 3. The outliers make the color scale legend look bad: there are only a few high data points, yet the space between 2k and 10k takes up most of the legend.
So the question is: how do I change the appearance of the color legend on the right (see image below) so that it mostly shows the difference between 0 and 2k? Unfortunately, I couldn't get an answer from this doc file.
Sample code (jupyter notebook):
import numpy as np
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()
x = np.random.randn(100,1) + 3
y = np.random.randn(100,1) + 10
x = np.reshape(x, 100)
y = np.reshape(y, 100)
color = np.random.randint(0,1000, [100])
color[[1,3,5]] = color[[1,3,5]] + 10000 # create outliers in color var
trace = Scatter(
x = x,
y = y,
mode = 'markers',
marker=dict(
color = color,
showscale=True,
colorscale = [[0, 'rgb(166,206,227, 0.5)'],
[0.05, 'rgb(31,120,180,0.5)'],
[0.1, 'rgb(178,223,138,0.5)'],
[0.15, 'rgb(51,160,44,0.5)'],
[0.2, 'rgb(251,154,153,0.5)'],
[1, 'rgb(227,26,28,0.5)']
]
)
)
fig = Figure(data=[trace])
iplot(fig)
What i'm looking for:

You can accomplish what I think you're after by customizing the colorscale, cmin, and cmax properties to have a discrete color change at exactly 2000. Then you can customize colorbar.tickvals to label the boundary as 2000. See https://plot.ly/python/reference/#scatter-marker-colorbar.
import numpy as np
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()
x = np.random.randn(100,1) + 3
y = np.random.randn(100,1) + 10
x = np.reshape(x, 100)
y = np.reshape(y, 100)
color = np.random.randint(0,1000, [100])
color[[1,3,5]] = color[[1,3,5]] + 10000 # create outliers in color var
bar_max = 2000
factor = 0.9 # Normalized location where continuous colorscale should end
trace = Scatter(
x = x,
y = y,
mode = 'markers',
marker=dict(
color = color,
showscale=True,
cmin=0,
cmax= bar_max/factor,
colorscale = [[0, 'rgb(166,206,227, 0.5)'],
[0.05, 'rgb(31,120,180,0.5)'],
[0.2, 'rgb(178,223,138,0.5)'],
[0.5, 'rgb(51,160,44,0.5)'],
[factor, 'rgb(251,154,153,0.5)'],
[factor, 'rgb(227,26,28,0.5)'],
[1, 'rgb(227,26,28,0.5)']
],
colorbar=dict(
tickvals = [0, 500, 1000, 1500, 2000],
ticks='outside'
)
)
)
fig = Figure(data=[trace])
iplot(fig)
New figure result

Since you asked a precise question, I'll try to reply with a precise answer, even though I don't think this is the best approach for data visualization. I'll show you why below.
Anyway, you can normalize the color values and "squeeze" your data into a much smaller interval by taking the natural logarithm. The natural logarithm of a value is the power to which the number e must be raised to produce that value. You can use log10 if you're more comfortable with it.
The code is very simple; I attach only the trace definition, as the rest is unchanged. I used a standard colormap for convenience, since the interval of the values is continuous.
trace = Scatter(
x = x,
y = y,
mode = 'markers',
marker=dict(
color = np.log(color),
showscale=True,
colorscale = 'RdBu'
)
)
As I said, transforming the values with log isn't always the best choice. It forces the observer into a rough reading of the graph. As an example, the orange markers in my figure range between 410 and 950; can you tell the difference?
