Plotting a reference line over facet plots - python

I need to scatter-plot my data with a fitted line per type, and compare each facet against a reference line. I am struggling to get the line y=8x+10 to plot on each facet.
import pandas as pd
import seaborn as sns
sns.lmplot(x="18O‰ VSMOW", y="D‰ VSMOW", hue="Type",
           col="Type", col_wrap=2, data=df)
My goal is to enable easy comparison of each Type to a known general relationship. Below, I drew in what I would like on the top two plots:

As of matplotlib 3.3, use axline() to easily plot reference lines.
Figure-level functions like lmplot return a FacetGrid, so store the grid to access the facets.
Either use FacetGrid.map_dataframe() to apply axline to each facet:
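import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('tips')  # seaborn's tips example dataset (total_bill, tip, day)
# store underlying facet grid
g = sns.lmplot(x='total_bill', y='tip', col='day', hue='day', col_wrap=2, data=df)
# apply axline to each facet (y = 0.18*x - 0.3)
g.map_dataframe(lambda data, **kws: plt.axline((0, -0.3), slope=0.18))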
Or iterate the facets manually via g.axes.flat:
for ax in g.axes.flat:
    ax.axline((0, -0.3), slope=0.18)  # y = 0.18*x - 0.3
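Applied to the line from the question, y = 8x + 10, the same loop might look like the sketch below. The synthetic DataFrame and the short column names `d18O` and `dD` are assumptions standing in for the asker's data (`18O‰ VSMOW`, `D‰ VSMOW`); only `lmplot`, `g.axes.flat` and `axline` come from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the asker's data (hypothetical column names)
rng = np.random.default_rng(0)
types = np.repeat(["rain", "river", "lake"], 30)
x = rng.uniform(-15, 0, size=types.size)
df = pd.DataFrame({
    "Type": types,
    "d18O": x,
    "dD": 7.5 * x + 5 + rng.normal(0, 4, size=types.size),
})

# One facet per Type, each with its own regression line
g = sns.lmplot(x="d18O", y="dD", hue="Type", col="Type", col_wrap=2, data=df)

# Draw y = 8x + 10 on every facet: the line passes through (0, 10) with slope 8
ref_lines = [
    ax.axline((0, 10), slope=8, color="k", ls="--")
    for ax in g.axes.flat
]
```

Keeping the returned line objects in `ref_lines` makes it possible to restyle or remove the reference line later without redrawing the facets.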

Related

Marker style of legend and the graph are not matching?

Below is the code I am using to generate the plot. The problem is that the marker style in the legend differs from the one in the graph.
sns.set_style(rc={'boxplot.flierprops.markeredgecolor':'black' ,'boxplot.flierprops.markeredgewidth':1.25,'boxplot.flierprops.markerfacecolor':'white'})
fig, scatter = plt.subplots(figsize = (6,4), dpi = 100)
scatter = sns.lineplot(data=df_whole, x='shortest_distance', y='similarity', style='Metric', hue='Metric',
                       markers=True, lw=1, markeredgewidth=1.25, markeredgecolor='black', markersize=7, dashes=False, errorbar=None, markerfacecolor='white')
scatter.set(title='TF-IDF')
scatter.legend(title = "Similarity Methods",prop={'size': 12})
As seaborn uses complex combinations of matplotlib elements to create its plots, and tries to make the legend as compact as possible, the legend is often custom-made. As such, seaborn unfortunately does not always take into account all matplotlib-level parameters.
In this case, the problem can be worked around via assigning these parameters again to the legend handles. Here is an example using one of seaborn's test datasets:
import matplotlib.pyplot as plt
import seaborn as sns
flights = sns.load_dataset('flights')
markerprops = dict(markeredgewidth=1.25, markeredgecolor='black', markersize=7, markerfacecolor='none')
ax = sns.lineplot(data=flights, x='year', y='passengers', style='month', hue='month',
                  markers=True, lw=1, dashes=False, errorbar=None, **markerprops)
ax.set(title='TF-IDF')
handles, labels = ax.get_legend_handles_labels()
for h in handles:
    h.set(**markerprops)
ax.legend(handles=handles, title="Months", prop={'size': 12}, ncol=3)
plt.tight_layout()
plt.show()
PS: Matplotlib functions usually return the graphical elements they created (e.g. scatter dots or lines), while seaborn (and pandas) usually returns the subplot (ax) or grid of subplots. As such, giving the name scatter to the return value of sns.lineplot might be confusing when comparing code with other matplotlib and seaborn examples.

Separating violinplots in seaborn with a line

I'm trying to plot multi-hue distributions with Seaborn, but I find it hard to trace the plots back to the tick they belong to. I have tried adding a grid, but the grid lines only appear along the value axis of the distributions, so they divide each distribution rather than separating the distributions from each other. Is it possible to have Seaborn add a grid line between different violin plot groups/hues? To illustrate, take one of the plots from the docs. I've added what I'd like to see to this plot (I've made these separators quite heavy for illustration purposes; in the solution I'd like them to be just as thick as the grid lines):
You could use matplotlib's axvline to draw vertical lines at positions 0.5, 1.5, ...
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set(style="whitegrid")
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", hue="smoker",
                    data=tips, palette="muted")
# draw a separator halfway between each pair of neighbouring groups
for i in range(len(np.unique(tips['day'])) - 1):
    ax.axvline(i + 0.5, color='grey', lw=1)
plt.show()
Alternatively, you could set minor ticks at these positions and turn the minor gridlines on for the x-axis.
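A minimal sketch of that minor-tick alternative, assuming a small synthetic DataFrame in place of the tips dataset (its column names are reused only for readability):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (values are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 40),
    "smoker": np.tile(["Yes", "No"], 80),
    "total_bill": rng.gamma(4, 5, size=160),
})

sns.set_style("whitegrid")
ax = sns.violinplot(x="day", y="total_bill", hue="smoker", data=df)

# Categories sit at 0, 1, 2, ... so the boundaries are at 0.5, 1.5, ...
n_groups = df["day"].nunique()
minor_positions = np.arange(n_groups - 1) + 0.5
ax.set_xticks(minor_positions, minor=True)

# Show gridlines only for the minor x ticks and hide the tick marks themselves
ax.grid(True, which="minor", axis="x")
ax.tick_params(axis="x", which="minor", length=0)
```

Minor gridlines take their color and width from the same grid rcParams as the major ones, so the separators match the thickness of the existing grid.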

Superimposing Images in Catplot

I am attempting to superimpose dot plots on top of a bar graph. Both types of plots have been generated using seaborn's catplot function.
Scatter-plot Code:
dotplot2 = sns.catplot(x="Group", y="MASQ_Score", col= "MASQ_Item", units="subject", aspect=.6, hue="Group", ci = 68, data=df_reshaped)
Resulting Plot Image:
Bar-plot Code:
barplot2 = sns.catplot(x="Group", y="MASQ_Score", hue='Group', kind="bar", col= "MASQ_Item", units="subject", aspect=.6, ci = 68, data=df_reshaped)
Resulting Plot Image:
Does anyone know if there is a way to superimpose the scatter-plot data on top of the bar-plot? So that both types of information are conveniently visible.
Based on @ImportanceOfBeingErnest's comment, here is a working solution with open data using the map_dataframe method from FacetGrid:
import seaborn as sns
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time")
g.map_dataframe(sns.stripplot,x="sex", y="total_bill")
g.map_dataframe(sns.boxplot,x="sex", y="total_bill",boxprops={'facecolor':'None'},showfliers = False)
The stripplot and boxplot can easily be replaced by other seaborn components like swarmplot or violinplot.
What you are plotting is not a scatter plot, it is a stripplot, as @Rachid Riad shows. If you want a barplot instead of a boxplot, swap sns.boxplot in the second map_dataframe call (dropping its box-specific arguments) for:
sns.barplot()
I would personally recommend using boxplot and swarmplot.
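For the bar-plus-points layout the question asks about, the two answers can be combined roughly as follows. This is a sketch on synthetic data; the column names `Group`, `Item` and `Score` are assumptions standing in for the asker's `Group`, `MASQ_Item` and `MASQ_Score`, and the explicit `order` is there so both layers place the categories identically.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the asker's reshaped data (hypothetical columns)
rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({
    "Group": np.tile(["control", "patient"], n // 2),
    "Item": np.repeat(["item_1", "item_2"], n // 2),
    "Score": rng.normal(3, 1, size=n),
})
group_order = ["control", "patient"]

# One facet per item; draw the bars first, then the individual points on top
g = sns.FacetGrid(df, col="Item", aspect=0.6)
g.map_dataframe(sns.barplot, x="Group", y="Score",
                order=group_order, color="lightgray")
g.map_dataframe(sns.stripplot, x="Group", y="Score",
                order=group_order, color="black")
```

Because both calls draw onto the same facet axes, the later layer (the points) ends up on top of the bars.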

Using seaborn, how can I add a data point of a different color to my scatterplot or change to color of last data point?

import seaborn as sns
iris = sns.load_dataset("iris")
grid = sns.JointGrid(iris.petal_length, iris.petal_width, space=0, size=6, ratio=50)
grid.plot_joint(plt.scatter, color="g")
The above code will create the scatter plot based on the Iris data set. I want to add another data point at [3,.05] that will be red in color; or make the last point within the data set red in color. How do I go about doing this?
To add a point at custom x and y coordinates, call matplotlib.pyplot.scatter with your coordinates:
plt.scatter(x=3, y=0.5, color='r')
And to color your last point, use the .iloc locator on your data:
plt.scatter(iris.petal_length.iloc[-1], iris.petal_width.iloc[-1], color='r')
Note that the iloc locator is from pandas, and plt.scatter is from matplotlib.pyplot. Both of these are mandatory dependencies of seaborn, so you definitely have them on your machine if you're using seaborn.
For example:
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset("iris")
# seaborn 0.9 renamed 'size' to 'height'; recent versions also require x and y as keywords
grid = sns.JointGrid(x=iris.petal_length, y=iris.petal_width, space=0, height=6, ratio=50)
grid.plot_joint(plt.scatter, color="g")
# add your point
plt.scatter(x=3, y=0.5, color='r')
# or
# plt.scatter(iris.petal_length.iloc[-1], iris.petal_width.iloc[-1], color='r')
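The snippet above relies on the joint axes being matplotlib's current axes when plt.scatter is called. A variant that does not depend on that is to draw on `grid.ax_joint` directly. The sketch below assumes a recent seaborn (keyword `x`/`y`, `height` instead of `size`) and uses synthetic data with iris-like column names in place of the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the iris petal measurements (values are made up)
rng = np.random.default_rng(2)
length = rng.uniform(1, 7, size=50)
df = pd.DataFrame({
    "petal_length": length,
    "petal_width": 0.4 * length + rng.normal(0, 0.2, size=50),
})

grid = sns.JointGrid(x=df.petal_length, y=df.petal_width,
                     space=0, height=6, ratio=50)
grid.plot_joint(plt.scatter, color="g")

# Recolor the last data point by drawing it again in red on the joint axes
last = df.iloc[-1]
grid.ax_joint.scatter(last.petal_length, last.petal_width, color="r")

# Add an extra point at custom coordinates on the same axes
grid.ax_joint.scatter(3, 0.5, color="r", marker="x")
```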

How to use seaborn pointplot and violinplot in the same figure? (change xticks and marker of pointplot)

I am trying to create violinplots that show confidence intervals for the mean. I thought an easy way to do this would be to plot a pointplot on top of the violinplot, but this does not work because the two seem to use different indices for the x-axis, as in this example:
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset("titanic")
titanic.dropna(inplace=True)
fig, (ax1,ax2,ax3) = plt.subplots(1,3, sharey=True, figsize=(12,4))
#ax1
sns.pointplot("who", "age", data=titanic, join=False,n_boot=10, ax=ax1)
#ax2
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax2)
#ax3
sns.pointplot("who", "age", data=titanic, join=False, n_boot=10, ax=ax3)
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax3)
ax3.set_xlim([-0.5,4])
print(ax1.get_xticks(), ax2.get_xticks())
gives: [0 1 2] [1 2 3]
Why are these plots not assigning the same xtick numbers to the 'who'-variable and is there any way I can change this?
I also wonder if there is any way I can change the marker for pointplot, because as you can see in the figure, the point is so big that it covers the entire confidence interval. I would like just a horizontal line if possible.
I'm posting my final solution here. The reason I wanted to do this kind of plot to begin with, was to display information about the distribution shape, shift in means, and outliers in the same figure. With mwaskom's pointers and some other tweaks I finally got what I was looking for.
The left-hand figure is there as a comparison, with all data points plotted as lines, and the right-hand one is my final figure. The thick grey line in the middle of the violin is the bootstrapped 99% confidence interval of the mean, which is the white horizontal line, both from pointplot. The three dotted lines are the standard 25th, 50th and 75th percentiles, and the lines outside those are the caps of the whiskers of a boxplot I plotted on top of the violin plot. Individual data points are plotted as lines beyond these points, since my data usually has a few extreme ones that I need to remove manually, like the two points in the violin below.
For now, I am going to continue making histograms and boxplots in addition to these enhanced violins, but I hope to find that all the information is accurately captured in the violinplot and that I can start to rely on it as my main initial data exploration plot. Here is the final code to produce the plots in case someone else finds them useful (or finds something that can be improved). Lots of tweaking to the boxplot.
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
#change the linewidth to get a thicker confidence interval line
mpl.rc("lines", linewidth=3)
df = sns.load_dataset("titanic")
df.dropna(inplace=True)
x = 'who'
y = 'age'
fig, (ax1,ax2) = plt.subplots(1,2, sharey=True, figsize=(12,6))
#Left hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax1, inner='stick')
#Right hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax2, positions=0)
sns.pointplot(df[x],df[y], join=False, ci=99, n_boot=1000, ax=ax2, color=[0.3,0.3,0.3], markers=' ')
df.boxplot(y, by=x, sym='_', ax=ax2, showbox=False, showmeans=True, whiskerprops={'linewidth':0},
           medianprops={'linewidth':0}, flierprops={'markeredgecolor':'k', 'markeredgewidth':1},
           meanprops={'marker':'_', 'color':'w', 'markersize':6, 'markeredgewidth':1.5},
           capprops={'linewidth':1, 'color':[0.3,0.3,0.3]}, positions=[0,1,2])
#One could argue that this is not beautiful
labels = [item.get_text() + '\nn=' + str(df.groupby(x).size().loc[item.get_text()]) for item in ax2.get_xticklabels()]
ax2.set_xticklabels(labels)
#Clean up
fig.suptitle('')
ax2.set_title('')
fig.set_facecolor('w')
Edit: Added 'n='
violinplot takes a positions argument that you can use to put the violins somewhere else (they currently just inherit the default matplotlib boxplot positions).
pointplot takes a markers argument that you can use to change how the point estimate is rendered.
