I have a scatter plot matrix generated using the seaborn package and I'd like to remove all the tick mark labels as these are just messying up the graph (either that or just remove those on the x-axis), but I'm not sure how to do it and have had no success doing Google searches. Any suggestions?
import seaborn as sns
sns.pairplot(wheat[['area_planted',
'area_harvested',
'production',
'yield']])
plt.show()
import seaborn as sns
iris = sns.load_dataset("iris")
g = sns.pairplot(iris)
g.set(xticklabels=[])
You can use a list comprehension to loop through all columns and turn off visibility of the xaxis.
df = pd.DataFrame(np.random.randn(1000, 2)) * 1e6
sns.pairplot(df)
plot = sns.pairplot(df)
[plot.axes[len(df.columns) - 1][col].xaxis.set_visible(False)
for col in range(len(df.columns))]
plt.show()
You could also rescale your data to something more readable:
df /= 1e6
sns.pairplot(df)
Probably using the following is more appropriate
import seaborn as sns
iris = sns.load_dataset("iris")
g = sns.pairplot(iris)
g.set(xticks=[])
Related
I have a dataframe and I'm using seaborn pairplot to plot one target column vs the rest of the columns.
Code is below,
import seaborn as sns
import matplotlib.pyplot as plt
tgt_var = 'AB'
var_lst = ['A','GH','DL','GT','MS']
pp = sns.pairplot(data=df,
y_vars=[tgt_var],
x_vars=var_lst)
pp.fig.set_figheight(6)
pp.fig.set_figwidth(20)
The var_lst is not a static list, I just provided an example.
What I need is to plot tgt_var on Y axis and each var_lst on x axis.
I'm able to do this with above code, but I also want to use log scale on X axis only if the var_lst item is 'GH' or 'MS', for the rest normal scale. Is there any way to achieve this?
Iterate pp.axes.flat and set xscale="log" if the xlabel matches "GH" or "MS":
log_columns = ["GH", "MS"]
for ax in pp.axes.flat:
if ax.get_xlabel() in log_columns:
ax.set(xscale="log")
Full example with the iris dataset where the petal columns are xscale="log":
import seaborn as sns
df = sns.load_dataset("iris")
pp = sns.pairplot(df)
log_columns = ["petal_length", "petal_width"]
for ax in pp.axes.flat:
if ax.get_xlabel() in log_columns:
ax.set(xscale="log")
In the example below, how do I use seaborn.PairGrid() to reproduce the plots created by seaborn.pairplot()? Specifically, I'd like the diagonal distributions to span the vertical axis. Markers with white borders etc... would be great too. Thanks!
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
# pairplot() example
g = sns.pairplot(iris, kind='scatter', diag_kind='kde')
plt.show()
# PairGrid() example
g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot)
g.map_offdiag(plt.scatter)
plt.show()
This is quite simple to achieve. The main differences between your plot and what pairplot does are:
the use of the diag_sharey parameter of PairGrid
using sns.scatterplot instead of plt.scatter
With that, we have:
iris = sns.load_dataset('iris')
g = sns.PairGrid(iris, diag_sharey=False)
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.scatterplot)
To change the visual style:
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot, shade=True)
g.map_offdiag(plt.scatter, edgecolor="w")
plt.show()
I'm plotting time series data using seaborn lineplot (https://seaborn.pydata.org/generated/seaborn.lineplot.html), and plotting the median instead of mean. Example code:
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
fmri = sns.load_dataset("fmri")
ax = sns.lineplot(x="timepoint", y="signal", estimator = np.median, data=fmri)
I want the error bands to show the interquartile range as opposed to the confidence interval. I know I can use ci = "sd" for standard deviation, but is there a simple way to add the IQR instead? I cannot figure it out.
Thank you!
I don't know if this can be done with seaborn alone, but here's one way to do it with matplotlib, keeping the seaborn style. The describe() method conveniently provides summary statistics for a DataFrame, among them the quartiles, which we can use to plot the medians with inter-quartile-ranges.
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
fmri = sns.load_dataset("fmri")
fmri_stats = fmri.groupby(['timepoint']).describe()
x = fmri_stats.index
medians = fmri_stats[('signal', '50%')]
medians.name = 'signal'
quartiles1 = fmri_stats[('signal', '25%')]
quartiles3 = fmri_stats[('signal', '75%')]
ax = sns.lineplot(x, medians)
ax.fill_between(x, quartiles1, quartiles3, alpha=0.3);
You can calculate the median within lineplot like you have done, set ci to be none and fill in using ax.fill_between()
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
fmri = sns.load_dataset("fmri")
ax = sns.lineplot(x="timepoint", y="signal", estimator = np.median,
data=fmri,ci=None)
bounds = fmri.groupby('timepoint')['signal'].quantile((0.25,0.75)).unstack()
ax.fill_between(x=bounds.index,y1=bounds.iloc[:,0],y2=bounds.iloc[:,1],alpha=0.1)
This option is possible since version 0.12 of seaborn, see here for the documentation.
pip install --upgrade seaborn
The estimator specifies the point by the name of pandas method or callable, such as 'median' or 'mean'.
The errorbar is an option to plot a distribution spread by a string, (string, number) tuple, or callable. In order to mark the median value and fill the area between the interquartile, you would need the params:
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
fmri = sns.load_dataset("fmri")
ax = sns.lineplot(data=fmri, x="timepoint", y="signal", estimator=np.median,
errorbar=lambda x: (np.quantile(x, 0.25), np.quantile(x, 0.75)))
You can now!
estimator="median", errobar=("pi",0.5)
https://seaborn.pydata.org/tutorial/error_bars
how to use groupby function in the y-axis? the below code doesn't display what i expect, due to y = df.groupby('column1')['column2'].count()
import seaborn as sns
import pandas as pd
sns.set(style="whitegrid", color_codes=True)
sns.stripplot(x="column1", y = df.groupby('column1')['column2'].count(), data=df)
Seaborn just doesn't work that way. In seaborn, you specify the x and y columns as well as the data frame. Seaborn will do the aggregation itself.
import seaborn as sns
sns.striplot('column1', 'column2', data=df)
For the count, maybe what you need is countplot
sns.countplot('column1', data=df)
The equivalent pandas code is:
df.groupby('column1').size().plot(kind='bar')
this code will create a count plot with horizontal bar equivalent and descending sorted values
fig,ax = plt.subplots(figsize=(10,16))
grouped=df.groupby('Age').size(). \
sort_values(ascending=False).plot(kind='barh',ax=ax)
When plotting correlations, this code
>>> import seaborn as sns
>>> iris = sns.load_dataset("iris")
>>> g = sns.pairplot(iris)
results in the following pairplot:
http://seaborn.pydata.org/_images/seaborn-pairplot-1.png
What if I just want to show the first row out of those four (i.e. correlations of 'sepal_length' vs all other features)? How can I plot that? Could pairplot be used but with some modifications?
Thanks
Using the x_vars and y_vars arguments of pairplot you can select which columns to correlate.
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset("iris")
g = sns.pairplot(iris,
x_vars=["sepal_width","petal_length","petal_width"],
y_vars=["sepal_length"])
plt.show()