I'm trying to plot multi-hue distributions with Seaborn, but it is hard to trace each plot back to the tick it belongs to. I tried adding a grid, but the grid only shows along the dimension of the distribution, so it divides each distribution itself rather than separating different distributions from each other. Is it possible to have Seaborn add a grid line between different violin plot groups/hues? To illustrate, take one of the plots from the docs. I've added what I'd like to see to this plot (I've made these separators quite heavy for illustration purposes; in the solution I'd like them to be just as thick as the grid lines):
You could use matplotlib's axvline to draw vertical lines at positions 0.5, 1.5, ...
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set(style="whitegrid")
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", hue="smoker",
                    data=tips, palette="muted")
# Draw a separator halfway between each pair of adjacent categories
for i in range(len(np.unique(tips['day'])) - 1):
    ax.axvline(i + 0.5, color='grey', lw=1)
plt.show()
Alternatively, you could set minor ticks at these positions and turn the minor gridlines on for the x-axis.
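The minor-tick variant could look roughly like the sketch below. It uses plain matplotlib with synthetic data so that it runs without downloading the tips dataset; the assumption is that the Axes returned by sns.violinplot behaves the same way, with the categories sitting at 0, 1, 2, ... The names groups, positions and separators are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FixedLocator

rng = np.random.default_rng(0)
# Four synthetic groups standing in for the four days of the tips dataset
groups = [rng.normal(loc=m, scale=1.0, size=50) for m in (0, 1, 2, 3)]
positions = np.arange(len(groups))  # categorical plots place groups at 0, 1, 2, ...

fig, ax = plt.subplots()
ax.violinplot(groups, positions=positions)
ax.set_xticks(positions)

# Minor ticks halfway between neighbouring groups: 0.5, 1.5, 2.5
separators = positions[:-1] + 0.5
ax.xaxis.set_minor_locator(FixedLocator(separators))

# Turn on grid lines for the minor x ticks only, and hide the tick marks
ax.xaxis.grid(True, which="minor", color="grey", linewidth=1)
ax.tick_params(axis="x", which="minor", length=0)

fig.canvas.draw()  # render once; use plt.show() or fig.savefig(...) to view
```

Compared with axvline, this keeps the separators part of the grid, so they pick up later grid styling changes made through ax.xaxis.grid(..., which="minor").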
Below is the code I am using to generate the plot. The issue is that the marker style shown in the legend differs from the marker style in the plot itself.
import matplotlib.pyplot as plt
import seaborn as sns
# df_whole is the asker's DataFrame (not shown in the question)
sns.set_style(rc={'boxplot.flierprops.markeredgecolor': 'black', 'boxplot.flierprops.markeredgewidth': 1.25, 'boxplot.flierprops.markerfacecolor': 'white'})
fig, scatter = plt.subplots(figsize=(6, 4), dpi=100)
scatter = sns.lineplot(data=df_whole, x='shortest_distance', y='similarity', style='Metric', hue='Metric',
                       markers=True, lw=1, markeredgewidth=1.25, markeredgecolor='black', markersize=7, dashes=False, errorbar=None, markerfacecolor='white')
scatter.set(title='TF-IDF')
scatter.legend(title="Similarity Methods", prop={'size': 12})
As seaborn uses complex combinations of matplotlib elements to create its plots, and tries to make the legend as compact as possible, the legend is often custom-made. As such, seaborn unfortunately does not always take into account all matplotlib-level parameters.
In this case, the problem can be worked around by assigning these parameters again to the legend handles. Here is an example using one of seaborn's test datasets:
import matplotlib.pyplot as plt
import seaborn as sns
flights = sns.load_dataset('flights')
markerprops = dict(markeredgewidth=1.25, markeredgecolor='black', markersize=7, markerfacecolor='none')
ax = sns.lineplot(data=flights, x='year', y='passengers', style='month', hue='month',
                  markers=True, lw=1, dashes=False, errorbar=None, **markerprops)
ax.set(title='TF-IDF')
handles, labels = ax.get_legend_handles_labels()
# Re-apply the marker properties to each legend handle
for h in handles:
    h.set(**markerprops)
ax.legend(handles=handles, title="Months", prop={'size': 12}, ncol=3)
plt.tight_layout()
plt.show()
PS: Matplotlib functions usually return the graphical elements they created (e.g. scatter dots or lines), while seaborn (and pandas) usually returns the subplot (ax) or grid of subplots. As such, giving the name scatter to the return value of sns.lineplot might be confusing when comparing code with other matplotlib and seaborn examples.
I'm plotting two series on one chart using twiny() and want the grid lines behind the bars. However, by default the behaviour is mixed: in front of elements of the original axes and behind the twin's.
Neither axes.set_axisbelow(True) nor zorder=0 with the grid call fixes this. Is this a bug in matplotlib? Anyone know of a work around?
Here's an example to illustrate:
import matplotlib, matplotlib.pyplot as plt
print('matplotlib: {}'.format(matplotlib.__version__))
series1 = [1.25,2.25,3]
series2 = [12,9.75,6.75]
categories = ['A','B','C']
fig, axes1 = plt.subplots(nrows=1,ncols=1)
axes2 = axes1.twiny()
axes1.barh(categories, series1, color='red', align='edge', height=-0.4)
axes2.barh(categories, series2, color='blue', align='edge', height=+0.4)
axes1.set_axisbelow(True)
axes2.set_axisbelow(True)
axes1.grid(axis='x')
axes2.grid(axis='x')
plt.show()
matplotlib v. 3.2.2
Everything drawn on one axes (including its grid) is always either completely in front of or completely behind everything drawn on the other axes. The only solution in your case is to draw a grid only for axes1 and to remove the grid call for the other axes.
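Applied to the example from the question, that would look roughly like the sketch below. It assumes the twin axes is rendered after the original one, which matches the behaviour described in the question; note that the remaining grid follows the tick positions of axes1 only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

series1 = [1.25, 2.25, 3]
series2 = [12, 9.75, 6.75]
categories = ['A', 'B', 'C']

fig, axes1 = plt.subplots()
axes2 = axes1.twiny()
axes1.barh(categories, series1, color='red', align='edge', height=-0.4)
axes2.barh(categories, series2, color='blue', align='edge', height=+0.4)

# Grid on the original axes only: it is drawn below the bars of axes1,
# and axes2 (bars, no grid) is rendered on top of all of axes1.
axes1.set_axisbelow(True)
axes1.grid(axis='x')

fig.canvas.draw()  # render once; use plt.show() or fig.savefig(...) to view
```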
I'm using Seaborn to make some plots using the whitegrid style. After calling despine(), I'm seeing that the gridlines that would overlap with the axes spines have a smaller linewidth than the other gridlines. But it seems like this only happens when I save the plots as PDF. I'm sharing three different figures with different despine configurations that show the effect.
Does anyone know why this occurs? And is there a simple fix?
PDF plot with all spines
PDF plot that despines all axes
PDF plot that despines left, top, and right axes
Code:
splot = sns.boxplot(data=df, palette=color, whis=np.inf, width=0.5, linewidth = 0.5)
splot.set_ylabel('Normalized WS')
plt.xticks(rotation=90)
plt.tight_layout()
sns.despine(left=True, bottom=True)
plt.savefig('test.pdf', bbox_inches='tight')
Essentially what's happening here is that the grid lines are centered on the tick position, so the outer half of the extreme grid lines are not drawn because they extend past the limits of the axes.
One approach is to disable clipping for the grid lines:
import numpy as np
import seaborn as sns
sns.set(style="whitegrid", rc={"grid.linewidth": 5})
x = np.random.randn(100, 6)
ax = sns.boxplot(data=x)
ax.yaxis.grid(True, clip_on=False)
sns.despine(left=True)
My hacky workaround for now is to not despine the top and bottom axes and to make them the same width as the gridlines. This is not ideal. If someone can point out a way to fix the root cause, I would really appreciate it.
I am trying to figure a nice way to plot two distplots (from seaborn) on the same axis. It is not coming out as pretty as I want since the histogram bars are covering each other. And I don't want to use countplot or barplot simply because they don't look as pretty. Naturally if there is no other way I shall do it in that fashion, but distplot looks very good. But, as said, the bars are now covering each other (see pic).
Thus is there any way to fit two distplot frequency bars onto one bin so that they do not overlap? Or placing the counts on top of each other? Basically I want to do this in seaborn:
Any ideas to clean it up are most welcome. Thanks.
MWE:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib import rc
sns.set_context("paper",font_scale=2)
sns.set_style("white")
rc('text', usetex=False)
fig, ax = plt.subplots(figsize=(7,7),sharey=True)
sns.despine(left=True)
mats=dict()
mats[0]=[1,1,1,1,1,2,3,3,2,3,3,3,3,3]
mats[1]=[3,3,3,3,3,4,4,4,5,6,1,1,2,3,4,5,5,5]
N=max(max(set(mats[0])),max(set(mats[1])))
binsize = np.arange(0,N+1,1)
B=['Thing1','Thing2']
for i in range(len(B)):
    ax = sns.distplot(mats[i],
                      kde=False,
                      label=B[i],
                      bins=binsize)
ax.set_xlabel('My label')
ax.get_yaxis().set_visible(False)
ax.legend()
plt.show()
As @mwaskom has said, seaborn wraps matplotlib plotting functions (for the most part) to deliver more complex and nicer-looking charts.
What you are looking for is "simple enough" to get it done with matplotlib:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context("paper", font_scale=2)
sns.set_style("white")
plt.rc('text', usetex=False)
fig, ax = plt.subplots(figsize=(4,4))
sns.despine(left=True)
# mats=dict()
mats0=[1,1,1,1,1,2,3,3,2,3,3,3,3,3]
mats1=[3,3,3,3,3,4,4,4,5,6,1,1,2,3,4,5,5,5]
N=max(mats0 + mats1)
# binsize = np.arange(0,N+1,1)
binsize = N
B=['Thing1','Thing2']
# Passing both lists at once makes hist draw the bars side by side in each bin
ax.hist([mats0, mats1], binsize, histtype='bar',
        align='mid', label=B, alpha=0.4)#, rwidth=0.6)
ax.set_xlabel('My label')
ax.get_yaxis().set_visible(False)
# ax.set_xlim(0,N+1)
ax.legend()
plt.show()
Which yields:
You can uncomment ax.set_xlim(0,N+1) to give more space around this histogram.
I am trying to create violin plots that show confidence intervals for the mean. I thought an easy way to do this would be to plot a pointplot on top of the violinplot, but this does not work because the two seem to use different indices for the x-axis, as in this example:
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset("titanic")
titanic.dropna(inplace=True)
fig, (ax1,ax2,ax3) = plt.subplots(1,3, sharey=True, figsize=(12,4))
#ax1
sns.pointplot("who", "age", data=titanic, join=False,n_boot=10, ax=ax1)
#ax2
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax2)
#ax3
sns.pointplot("who", "age", data=titanic, join=False, n_boot=10, ax=ax3)
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax3)
ax3.set_xlim([-0.5,4])
print(ax1.get_xticks(), ax2.get_xticks())
gives: [0 1 2] [1 2 3]
Why are these plots not assigning the same xtick numbers to the 'who'-variable and is there any way I can change this?
I also wonder if there is any way I can change the marker for pointplot, because as you can see in the figure, the point is so big that it covers the entire confidence interval. I would like just a horizontal line if possible.
I'm posting my final solution here. The reason I wanted to do this kind of plot to begin with, was to display information about the distribution shape, shift in means, and outliers in the same figure. With mwaskom's pointers and some other tweaks I finally got what I was looking for.
The left-hand figure is there as a comparison, with all data points plotted as lines, and the right-hand one is my final figure. The thick grey line in the middle of the violin is the bootstrapped 99% confidence interval of the mean, which is the white horizontal line, both from pointplot. The three dotted lines are the standard 25th, 50th and 75th percentiles, and the lines outside those are the caps of the whiskers of a boxplot I plotted on top of the violin plot. Individual data points are plotted as lines beyond these points, since my data usually has a few extreme ones that I need to remove manually, like the two points in the violin below.
For now, I am going to continue making histograms and boxplots in addition to these enhanced violins, but I hope to find that all the information is accurately captured in the violinplot and that I can start to rely on it as my main initial data exploration plot. Here is the final code to produce the plots in case someone else finds them useful (or finds something that can be improved). Lots of tweaking to the boxplot.
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
# change the line width to get a thicker confidence interval line
mpl.rc("lines", linewidth=3)
df = sns.load_dataset("titanic")
df.dropna(inplace=True)
x = 'who'
y = 'age'
fig, (ax1,ax2) = plt.subplots(1,2, sharey=True, figsize=(12,6))
#Left hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax1, inner='stick')
#Right hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax2, positions=0)
sns.pointplot(df[x],df[y], join=False, ci=99, n_boot=1000, ax=ax2, color=[0.3,0.3,0.3], markers=' ')
df.boxplot(y, by=x, sym='_', ax=ax2, showbox=False, showmeans=True, whiskerprops={'linewidth':0},
           medianprops={'linewidth':0}, flierprops={'markeredgecolor':'k', 'markeredgewidth':1},
           meanprops={'marker':'_', 'color':'w', 'markersize':6, 'markeredgewidth':1.5},
           capprops={'linewidth':1, 'color':[0.3,0.3,0.3]}, positions=[0,1,2])
#One could argue that this is not beautiful
labels = [item.get_text() + '\nn=' + str(df.groupby(x).size().loc[item.get_text()]) for item in ax2.get_xticklabels()]
ax2.set_xticklabels(labels)
#Clean up
fig.suptitle('')
ax2.set_title('')
fig.set_facecolor('w')
Edit: Added 'n='
violinplot takes a positions argument that you can use to put the violins somewhere else (they currently just inherit the default matplotlib boxplot positions).
pointplot takes a markers argument that you can use to change how the point estimate is rendered.
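The same two ideas (shared positions and a horizontal-line marker) can be sketched with plain matplotlib, which avoids depending on a particular seaborn version. This is only an analogue under stated assumptions, not the seaborn call itself: the data is synthetic, and groups, positions and bootstrap_mean_ci are names invented for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the three 'who' groups (hypothetical data)
groups = {
    "child": rng.normal(8, 4, 40),
    "man": rng.normal(33, 12, 120),
    "woman": rng.normal(30, 11, 90),
}
positions = np.arange(len(groups))  # 0, 1, 2, shared by both layers


def bootstrap_mean_ci(values, n_boot=1000, level=99):
    """Percentile bootstrap confidence interval for the mean."""
    samples = rng.choice(values, size=(n_boot, len(values)), replace=True)
    means = samples.mean(axis=1)
    half = (100 - level) / 2
    return np.percentile(means, [half, 100 - half])


fig, ax = plt.subplots()
# Place the violins explicitly at the same x positions as the intervals
ax.violinplot(list(groups.values()), positions=positions, showextrema=False)

for pos, values in zip(positions, groups.values()):
    low, high = bootstrap_mean_ci(values)
    # Thick grey vertical line for the confidence interval
    ax.vlines(pos, low, high, color="0.3", linewidth=3)
    # '_' is matplotlib's horizontal-line marker, used here for the mean
    ax.plot(pos, values.mean(), marker="_", color="white",
            markersize=12, markeredgewidth=1.5)

ax.set_xticks(positions)
ax.set_xticklabels(list(groups))
fig.canvas.draw()  # render once; use plt.show() or fig.savefig(...) to view
```

Because both layers are given the same positions explicitly, the tick mismatch from the question ([0 1 2] versus [1 2 3]) cannot arise here.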