How to draw custom error bars with plotly? - python

I have a data frame with one column that holds the y-axis values and two more columns that hold the upper and lower bounds of a confidence interval. I would like to use those values to draw error bars with plotly. I am aware that plotly can draw confidence intervals (using the error_y and error_y_minus keyword arguments), but not with the logic I need, because those keywords are interpreted as distances added to and subtracted from the y-values. Instead, I would like to define the upper and lower positions directly:
For instance, how could I use plotly and this example data frame
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'x':[0, 1, 2],
'y':[6, 10, 2],
'ci_upper':[8,11,2.5],
'ci_lower':[5,9,1.5]})
to produce a plot like this?

I used Plotly Express to create the bar chart.
I followed https://plotly.com/python/error-bars/#asymmetric-error-bars to generate asymmetric error bars, converting your bounds into the distances from y that plotly expects, which gives your required outcome:
import pandas as pd
import plotly.express as px
df = pd.DataFrame(
{"x": [0, 1, 2], "y": [6, 10, 2], "ci_upper": [8, 11, 2.5], "ci_lower": [5, 9, 1.5]}
)
px.bar(df, x="x", y="y").update_traces(
error_y={
"type": "data",
"symmetric": False,
"array": df["ci_upper"] - df["y"],
"arrayminus": df["y"] - df["ci_lower"],
}
)

Related

How do I normalize plotly express's histogram as probability density for multiple groups?

Let's take this toy df as an example:
import pandas as pd
toy_df = pd.DataFrame({"value": [1, 12, 16, 4, 27, 38, 19], "group": [1, 1, 2, 1, 2, 2, 2]})
Using this df, we can build a histogram in plotly express with data normalized as probability density as follows:
import plotly.express as px
px.histogram(toy_df, x="value", histnorm='probability density')
This will give us the following graph:
If we manually calculate the total area of both rectangles, it adds up to 1.
But if we introduce the group column as color, the probability densities are computed separately for each group:
px.histogram(toy_df, x="value", color="group", histnorm='probability density')
How can I make the areas of both groups together add up to 1 as well?
I looked through the documentation of px.histogram but haven't found any parameter that could help with that.
I would appreciate any suggestion on how this might be achieved within plotly express (I am pretty sure matplotlib is more flexible in this case).
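To make the normalization target concrete, here is a NumPy sketch of the arithmetic only, not a px.histogram option. It assumes shared bin edges for all groups (four equal-width bins, an arbitrary choice here) and divides each group's counts by the total sample size times the bin width, rather than by the group's own size, so that the areas of all groups together sum to 1.

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({"value": [1, 12, 16, 4, 27, 38, 19], "group": [1, 1, 2, 1, 2, 2, 2]})

# Shared bin edges for all groups (assumption: 4 equal-width bins over the data range)
edges = np.histogram_bin_edges(toy_df["value"], bins=4)
widths = np.diff(edges)
n_total = len(toy_df)

# Per-group densities normalized by the *total* sample size, not the group size
joint_density = {
    g: np.histogram(sub["value"], bins=edges)[0] / (n_total * widths)
    for g, sub in toy_df.groupby("group")
}

# Stacked on top of each other, the areas of all groups sum to 1
total_area = sum((d * widths).sum() for d in joint_density.values())
print(total_area)
```

Each group's area is then its share of the data (3/7 for group 1 here); the arrays could be drawn, for example, as stacked bars at the bin centers.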

plotly don't show zeros in area plot

For context: I'd like to make a plot in plotly showing the evolution of an investment portfolio where the value of each asset is plotted on top of each other. Since assets are bought and sold, not every asset should be shown for the entire range of the curve.
The below example can clarify this. Leading or trailing zeros indicate that the asset was not in the portfolio at that moment.
import pandas as pd
import plotly.express as px
import numpy as np
data = {"Asset 1": [0, 1, 2, 3, 4, 5], "Asset 2": [0, 0, 2, 3, 2, 2], "Asset 3": [1, 1, 3, 0, 0, 0]}
df = pd.DataFrame(data)
fig = px.area(df)
fig.show()
This results in the following figure:
The problem is that at the indicated time (index=4), Asset 3 is no longer in the portfolio, hence its value of 0. However, it is still shown, and the bigger problem is that it makes it impossible to see the value of Asset 2, which is in the portfolio.
I tried changing the zeros to NaN values to indicate that they don't exist but that gives the exact same figure.
data2 = {"a": [np.nan, 1, 2, 3, 4, 5], "b": [np.nan, np.nan, 2, 3, 2, 2], "c": [1, 1, 3, np.nan, np.nan, np.nan]}
df2 = pd.DataFrame(data2)
fig2 = px.area(df2)
fig2.show()
I am afraid I cannot construct an elegant solution. However, this approach meets most of the requirements you stated. How it works:
Instead of using the automatic stacking, draw the lines one by one yourself.
That means you have to pre-process the dataframe a little, by calculating the values of columns a+b and a+b+c.
plotly.express offers limited customization. Instead of plotly.express, use plotly.graph_objects, which has a similar syntax.
The order in which the "traces" (aka lines) are added matters: the last one rendered is placed on top. In your problem statement, the lines are drawn from the leftmost to the rightmost column, which is why overlaps favor the rightmost column.
The NaN values have to be zero-filled manually before plotting. Otherwise the filled areas take on weird shapes, since your sample data contains a number of NaNs.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
data = {"a": [np.nan, 1, 2, 3, 4, 5], "b": [np.nan, np.nan, 2, 3, 2, 2], "c": [1, 1, 3, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)
# fill NAs with zeros before doing anything
df = df.fillna(0)
fig = go.Figure()
# add lines one by one. The order matters - last one lays on top along with its hoverinfo
fig.add_trace(go.Scatter(
x=df.index,
y=df['a'],
mode='lines',
fill='tonexty', # fill the area under line to next y
))
fig.add_trace(go.Scatter(
x=df.index,
y=df['a']+df['b'], # sum of 'a' and 'b'
mode='lines',
fill='tonexty', # fill the area under line to next y
))
fig.add_trace(go.Scatter(
x=df.index,
y=df['a']+df['b']+df['c'], # sum of 'a' and 'b' and 'c'
mode='lines',
fill='tonexty', # fill the area under line to next y
))
# minor bug where an area below zero is shown
fig.update_layout(yaxis=dict(range=[0, max(df.sum(axis=1) * 1.05)]))
fig.show()
The resulting plot would look like:
The green line, representing the values of df['a']+df['b']+df['c'], still sits on top. However, the hover label now shows the value of df['a']+df['b']+df['c'] instead of the value of an individual asset.
In fact, I find asset-allocation-style plots like this prettier without the edge lines:
This can be done by setting mode='none' for each of the 3 traces.
Remarks:
Another approach I tried, for anyone reading: treat each filled area and its line as two separate traces. This requires defining custom pairs of colors (a solid color and its half-transparent counterpart), and it gave some buggy results. Also, traces that set the stackgroup argument cannot contain NaN values: NaNs are either zero-filled or interpolated, which produces bad plots in the context of this problem.

Python Plotly hoverover information for two or more heatmaps

I am plotting 3 heatmaps in plotly on top of each other and would like to display the z value of all 3 when I hover over the (x,y) points.
I have seen that for scatter plots you can use unified x to display the info of all plots on hover. Is there a similar way to do a unified z for heatmap plots?
I have also seen that you can create a data frame of custom texts and use those as hoverlabels but that seems a little too excessive for what I'm trying to do.
Thanks
This is effectively the answer to Plotly Python - Heatmap - Change Hovertext (x,y,z):
simulated 3 heatmaps on top of each other
built a text array containing the z values across all three layers
import pandas as pd
import numpy as np
import plotly.graph_objects as go
dfs = [pd.DataFrame(index=list("abcd"), columns=list("ab"),
data=np.where(np.random.randint(1, 8, [4, 2]) == 1,
np.nan, np.random.randint(1, 500, [4, 2]),)
)
for i in range(3)]
# create text array same shape as z
text = pd.concat(dfs).groupby(level=0).agg({c:lambda v: ", ".join(v.astype(str)) for c in dfs[0].columns}).values
# figure
go.Figure([go.Heatmap(z=df.values, x=df.columns, y=df.index, name=i, text=text, hoverinfo="text")
for i, df in enumerate(dfs)
])

Disable hue nesting in Seaborn

When plotting a bar chart with Seaborn and using the hue parameter to color the bars according to their column value, bars with identical column values are nested, or aggregated, and only a single bar is shown. The image below illustrates the problem. Patient number 1 has two samples of sample_type 1, with values 10 and 20. The two values have been nested, and both values are represented as a single bar (as the average of the two).
I'd like to avoid this nesting, and rather have something like in the image below.
Is this possible to achieve? MVE below. Thanks!
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({
"patient_number": [1, 1, 1, 2, 2, 2],
"sample_type": [1, 1, 2, 1, 2, 3],
"value": [10, 20, 15, 10, 11, 12]
})
sns.barplot(x="patient_number", y="value", hue="sample_type", data=df)
plt.show()
The following approach obtains the desired plot:
Seaborn's hue= parameter both defines the color and the position of the bars.
Per patient, an extra field ('idx') contains a unique number for each of the desired bars. This field 'idx' restarts from 0 for each patient and is added to the dataframe.
'idx' can then be used as hue='idx' to get the desired bars, although they will simply be colored sequentially.
In order to get one color per sample type, an extra column contains a factorized version of the sample types (so, 0 for the first type, 1 for the next, etc.).
Seaborn generates the bars per hue level, one for each patient. These bars can be accessed as a list via ax.patches. If some patient doesn't have a value for a given 'idx', a dummy bar will be added to the list.
By iterating through the patients and then through the 'idx', all bars can be visited and colored via 'sample_type'. As the ordering of the bars is a bit tricky, an adequate renumbering is needed.
The legend needs to be changed to reflect the sample types.
The given data is extended a bit to be able to test different numbers of samples per patient, and sample types that aren't simple subsequent numbers.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({
'patient_number': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
'sample_type': ['st1', 'st1', 'st2', 'st1', 'st2', 'st3', 'st4', 'st4', 'st4', 'st4', 'st4'],
'value': [10, 20, 15, 10, 11, 12, 1, 2, 3, 4, 5]
})
df['idx'] = df.groupby('patient_number').cumcount()
df['sample_factors'], sample_labels = pd.factorize(df['sample_type'])
ax = sns.barplot(x='patient_number', y='value', hue='idx', data=df)
colors = plt.get_cmap('Set2').colors  # https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
handles = [None for _ in sample_labels]
num_patients = len(ax.patches) // (df['idx'].max() + 1)
for i, (patient_id, group) in enumerate(df.groupby('patient_number')):
    for j, factor in enumerate(group['sample_factors']):
        # bars are stored hue level by hue level, so bar j of patient i sits at i + j * num_patients
        patch = ax.patches[i + j * num_patients]
        patch.set_color(colors[factor])
        handles[factor] = patch
ax.legend(handles=handles, labels=list(sample_labels), title='Sample type')
plt.show()
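If relying on the order of ax.patches feels fragile, a different technique is to skip seaborn and place each bar explicitly with matplotlib's ax.bar. This sketch reuses the extended data and the per-patient 'idx' counter; the slot width (0.8 split across the largest group) is an assumption.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'patient_number': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'sample_type': ['st1', 'st1', 'st2', 'st1', 'st2', 'st3', 'st4', 'st4', 'st4', 'st4', 'st4'],
    'value': [10, 20, 15, 10, 11, 12, 1, 2, 3, 4, 5]
})
df['idx'] = df.groupby('patient_number').cumcount()

sample_types = list(pd.unique(df['sample_type']))
colors = dict(zip(sample_types, plt.get_cmap('Set2').colors))

n_slots = df['idx'].max() + 1  # the largest patient group sets the bar width
width = 0.8 / n_slots
patients = sorted(df['patient_number'].unique())
x_pos = {p: i for i, p in enumerate(patients)}

fig, ax = plt.subplots()
for row in df.itertuples(index=False):
    # center the slots of each patient around its tick position
    x = x_pos[row.patient_number] + (row.idx - (n_slots - 1) / 2) * width
    ax.bar(x, row.value, width=width, color=colors[row.sample_type], label=row.sample_type)

ax.set_xticks(range(len(patients)))
ax.set_xticklabels(patients)
# keep one legend entry per sample type (first occurrence wins)
unique = dict(zip(*ax.get_legend_handles_labels()[::-1]))
ax.legend(list(unique.values()), list(unique.keys()), title='Sample type')
# plt.show()
```

Every row becomes its own bar, so no values are averaged, and the color depends only on sample_type.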

How do I show only available values in the x-axis

I would like to plot a chart with plotly that shows only the existing values in the x-axis.
When I execute the code below, a chart that looks like in the following image appears:
The range on the x-axis as well as the range on the y-axis is evenly set from zero up to the maximal value.
import plotly.graph_objects as go
from plotly.offline import plot
xValues = [1, 2, 27, 50]
yValues = [7, 1, 2, 3]
trace = go.Scatter( x = xValues, y = yValues, mode='lines+markers', name='high limits' )
plottedData = [trace]
plot( plottedData )
Now, I would like to show only the existing values on the x axis. Related to my example, I want just the values [1, 2, 27, 50] to appear. And they should have the same space in between. Is this possible? If yes, how?
You can force the xaxis.type to be category like this:
plot( dict(data=plottedData, layout=go.Layout(xaxis = {"type": "category"} )))
