Python Seaborn Boxplot: Overlay 95 percentile values on whisker - python

I want to overlay 95 percentile values on seaborn boxplot. I could not figure out the ways to overlay text or if there is seaborn capability for that. How would I modify following code to overlay the 95 percentile values on plot.
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.randn(200, 4), columns=list('ABCD'))*100
alphabet = list('AB')
df['Gr'] = np.random.choice(np.array(alphabet, dtype="|S1"), df.shape[0])
df_long = pd.melt(df, id_vars=['Gr'], value_vars = ['A','B','C','D'])
sns.boxplot(x = "variable", y="value", hue = 'Gr', data=df_long, whis = [5,95])

Consider seaborn's plot.text, borrowing from #bernie's answer (also a healty +1 for including sample dataset). The only challenge is adjusting the alignment due to grouping in hue field to have labels overlay over each boxplot series. Even have labels color coded according to series.
import pandas as pd
import numpy as np
import seaborn as sns
np.random.seed(61518)
# ... same as OP
# 95TH PERCENTILE SERIES
pctl95 = df_long.groupby(['variable', 'Gr'])['value'].quantile(0.95)
pctl95_labels = [str(np.round(s, 2)) for s in pctl95]
# GROUP INDEX TUPLES
grps = [(i, 2*i, 2*i+1) for i in range(4)]
# [(0,0,1), (1,2,3), (2,4,5), (3,6,7)]
pos = range(len(pctl95))
# ADJUST HORIZONTAL ALIGNMENT WITH MORE SERIES
for tick, label in zip(grps, hplot.get_xticklabels()):
hplot.text(tick[0]-0.1, pctl95[tick[1]] + 0.95, pctl95_labels[tick[1]],
ha='center', size='x-small', color='b', weight='semibold')
hplot.text(tick[0]+0.1, pctl95[tick[2]] + 0.95, pctl95_labels[tick[2]],
ha='center', size='x-small', color='g', weight='semibold')
sns.plt.show()

Related

Change the color of lines with categorical column values [duplicate]

I am trying to plot two columns of a pandas dataframe against each other, grouped by a values in a third column. The color of each line should be determined by that third column, i.e. one color per group.
For example:
import pandas as pd
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({'x': [0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3],'y':[1,2,3,2,3,4,4,3,2], 'colors':[0.3,0.3,0.3,0.7,0.7,0.7,1.3,1.3,1.3]})
df.groupby('colors').plot('x','y',ax=ax)
If I do it this way, I end up with three different lines plotting x against y, with each line a different color. I now want to determine the color by the values in 'colors'. How do I do this using a gradient colormap?
Looks like seaborn is applying the color intensity automatically based on the value in hue..
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'x': [0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3],'y':[1,2,3,2,3,4,4,3,2,3,4,2], 'colors':[0.3,0.3,0.3,0.7,0.7,0.7,1.3,1.3,1.3,1.5,1.5,1.5]})
import seaborn as sns
sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors')
Gives:
you can change the colors by adding palette argument as below:
import seaborn as sns
sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors', palette = 'mako')
#more combinations : viridis, mako, flare, etc.
gives:
Edit (for colormap):
based on answers at Make seaborn show a colorbar instead of a legend when using hue in a bar plot?
import seaborn as sns
fig = sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors', palette = 'mako')
norm = plt.Normalize(vmin = df['colors'].min(), vmax = df['colors'].max())
sm = plt.cm.ScalarMappable(cmap="mako", norm = norm)
fig.figure.colorbar(sm)
fig.get_legend().remove()
plt.show()
gives..
Hope that helps..
Complementing to Prateek's very good answer, once you have assigned the colors based on the intensity of the palette you choose (for example Mako):
plots = sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors',palette='mako')
You can add a colorbar with matplotlib's function plt.colorbar() and assign the palette you used:
sm = plt.cm.ScalarMappable(cmap='mako')
plt.colorbar(sm)
After plt.show(), we get the combined output:

Seaborn: reverse cbar

I have a Dataframe which represents a binary matrix (0 and 1), with labels on rows and columns. I'm using the following code to print the matrix assigning each label a color:
import seaborn as sns
import matplotlib.pylab as plt
import matplotlib as mpl
import pandas as pd
import numpy as np
N = 100
M = 200
p = 0.8
df = pd.DataFrame(np.random.choice([0,1], (M,N), p=(p, 1-p)),
columns=sorted((list(range(10))*N)[0:N]),
index=sorted((list(range(10))*N)[0:M]))
cmap = mpl.colors.ListedColormap([(.8, .8, .8, 1.0)] + [plt.cm.jet(i) for i in range(plt.cm.jet.N-1)])
ax = sns.heatmap(df.apply(lambda s: (s.name==s.index)*s*(s.index+1)), mask=df.eq(0), cmap=cmap )
My issue is that the colors displayed in the cbar are in the reversed order with respect to those shown in the figure (and so are the labels). How can I reverse the colors and the labels in the cbar?
I tried:
ax.invert_yaxis()
but it also changes the structure of the plot.
Is there a solution?
You can grab the colorbar via ax.collections[0].colorbar and then call invert_yaxis() on its ax.
ax.collections[0].colorbar.ax.invert_yaxis()

Python Seaborn Chart - Shadow Area

Sorry to my noob question, but how can I add a shadow area/color between the upper and lower lines in a seaborn chart?
The primary code I've working on is the following:
plt.figure(figsize=(18,10))
sns.set(style="darkgrid")
palette = sns.color_palette("mako_r", 3)
sns.lineplot(x="Date", y="Value", hue='Std_Type', style='Value_Type', sizes=(.25, 2.5), palette = palette, data=tbl4)
The idea is to get some effect like below (the example from seaborn website):
But I could not replicate the effect although my data structure is pretty much in the same fashion as fmri (seaborn example)
from seaborn link:
import seaborn as sns
sns.set(style="darkgrid")
# Load an example dataset with long-form data
fmri = sns.load_dataset("fmri")
# Plot the responses for different events and regions
sns.lineplot(x="timepoint", y="signal",
hue="region", style="event",
data=fmri)
Do you have some ideas?
I tried to change the chart style, but if I go to a distplot or relplot, for example, the x_axis cannot show the timeframe...
Check this code:
# import
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
sns.set(style = 'darkgrid')
# data generation
time = pd.date_range(start = '2006-01-01', end = '2020-01-01', freq = 'M')
tbl4 = pd.DataFrame({'Date': time,
'down': 1 - 0.5*np.random.randn(len(time)),
'up': 4 + 0.5*np.random.randn(len(time))})
tbl4 = tbl4.melt(id_vars = 'Date',
value_vars = ['down', 'up'],
var_name = 'Std_Type',
value_name = 'Value')
# figure plot
fig, ax = plt.subplots(figsize=(18,10))
sns.lineplot(ax = ax,
x = 'Date',
y = 'Value',
hue = 'Std_Type',
data = tbl4)
# fill area
plt.fill_between(x = tbl4[tbl4['Std_Type'] == 'down']['Date'],
y1 = tbl4[tbl4['Std_Type'] == 'down']['Value'],
y2 = tbl4[tbl4['Std_Type'] == 'up']['Value'],
alpha = 0.3,
facecolor = 'green')
plt.show()
which gives me this plot:
Since I do not have access to your data, I generated random ones. Replace them with yours.
The shadow area is done with plt.fill_between (documentation here), where you specify the x array (common to both curves), the upper and lower limits of the area as y1 and y2 and, optionally a color and its transparency with the facecolor and alpha parameters respectively.
You cannot do it through ci parameter, since it is used to show the confidence interval of your data.

How to add a marker from a different column to a seaborn pandas barplot

I have the following dataset, code and plot:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
data = [['tom', 10,15], ['matt', 13,10]]
df3 = pd.DataFrame(data, columns = ['Name', 'Attempts','L4AverageAttempts'])
f,ax = plt.subplots(nrows=1,figsize=(16,9))
sns.barplot(x='Attempts',y='Name',data=df3)
plt.show()
How can get a marker of some description (dot, *, shape, etc) to show that tomhas averaged 15 (so is below his average) and matt has averaged 10 so is above average. So a marker basxed off the L4AverageAttempts value for each person.
I have looked into axvline but that seems to be only a set number rather than a specific value for each y axis category. Any help would be much appreciated! thanks!
You can simply plot a scatter plot on top of your bar plot using L4AverageAttempts as the x value:
You can use seaborn.scatterplot for this. Make sure to set the zorder parameter so that the markers appear on top of the bars.
import seaborn as sns
import pandas as pd
data = [['tom', 10,15], ['matt', 13,10]]
df3 = pd.DataFrame(data, columns = ['Name', 'Attempts','L4AverageAttempts'])
f,ax = plt.subplots(nrows=1,figsize=(16,9))
sns.barplot(x='Attempts',y='Name',data=df3)
sns.scatterplot(x='L4AverageAttempts', y="Name", data=df3, zorder=10, color='k', edgecolor='k')
plt.show()

Plotting grouped data in same plot using Pandas

In Pandas, I am doing:
bp = p_df.groupby('class').plot(kind='kde')
p_df is a dataframe object.
However, this is producing two plots, one for each class.
How do I force one plot with both classes in the same plot?
Version 1:
You can create your axis, and then use the ax keyword of DataFrameGroupBy.plot to add everything to these axes:
import matplotlib.pyplot as plt
p_df = pd.DataFrame({"class": [1,1,2,2,1], "a": [2,3,2,3,2]})
fig, ax = plt.subplots(figsize=(8,6))
bp = p_df.groupby('class').plot(kind='kde', ax=ax)
This is the result:
Unfortunately, the labeling of the legend does not make too much sense here.
Version 2:
Another way would be to loop through the groups and plot the curves manually:
classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
p_df = pd.DataFrame({"class": classes, "vals": vals})
fig, ax = plt.subplots(figsize=(8,6))
for label, df in p_df.groupby('class'):
df.vals.plot(kind="kde", ax=ax, label=label)
plt.legend()
This way you can easily control the legend. This is the result:
import matplotlib.pyplot as plt
p_df.groupby('class').plot(kind='kde', ax=plt.gca())
Another approach would be using seaborn module. This would plot the two density estimates on the same axes without specifying a variable to hold the axes as follows (using some data frame setup from the other answer):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# data to create an example data frame
classes = ["c1"] * 5 + ["c2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
# the data frame
df = pd.DataFrame({"cls": classes, "indices":idx, "vals": vals})
# this is to plot the kde
sns.kdeplot(df.vals[df.cls == "c1"],label='c1');
sns.kdeplot(df.vals[df.cls == "c2"],label='c2');
# beautifying the labels
plt.xlabel('value')
plt.ylabel('density')
plt.show()
This results in the following image.
There are two easy methods to plot each group in the same plot.
When using pandas.DataFrame.groupby, the column to be plotted, (e.g. the aggregation column) should be specified.
Use seaborn.kdeplot or seaborn.displot and specify the hue parameter
Using pandas v1.2.4, matplotlib 3.4.2, seaborn 0.11.1
The OP is specific to plotting the kde, but the steps are the same for many plot types (e.g. kind='line', sns.lineplot, etc.).
Imports and Sample Data
For the sample data, the groups are in the 'kind' column, and the kde of 'duration' will be plotted, ignoring 'waiting'.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('geyser')
# display(df.head())
duration waiting kind
0 3.600 79 long
1 1.800 54 short
2 3.333 74 long
3 2.283 62 short
4 4.533 85 long
Plot with pandas.DataFrame.plot
Reshape the data using .groupby or .pivot
.groupby
Specify the aggregation column, ['duration'], and kind='kde'.
ax = df.groupby('kind')['duration'].plot(kind='kde', legend=True)
.pivot
ax = df.pivot(columns='kind', values='duration').plot(kind='kde')
Plot with seaborn.kdeplot
Specify hue='kind'
ax = sns.kdeplot(data=df, x='duration', hue='kind')
Plot with seaborn.displot
Specify hue='kind' and kind='kde'
fig = sns.displot(data=df, kind='kde', x='duration', hue='kind')
Plot
Maybe you can try this:
fig, ax = plt.subplots(figsize=(10,8))
classes = list(df.class.unique())
for c in classes:
df2 = data.loc[data['class'] == c]
df2.vals.plot(kind="kde", ax=ax, label=c)
plt.legend()

Categories