Two seaborn distplots one same axis - python

I am trying to figure a nice way to plot two distplots (from seaborn) on the same axis. It is not coming out as pretty as I want since the histogram bars are covering each other. And I don't want to use countplot or barplot simply because they don't look as pretty. Naturally if there is no other way I shall do it in that fashion, but distplot looks very good. But, as said, the bars are now covering each other (see pic).
Thus is there any way to fit two distplot frequency bars onto one bin so that they do not overlap? Or placing the counts on top of each other? Basically I want to do this in seaborn:
Any ideas to clean it up are most welcome. Thanks.
MWE:
sns.set_context("paper",font_scale=2)
sns.set_style("white")
rc('text', usetex=False)
fig, ax = plt.subplots(figsize=(7,7),sharey=True)
sns.despine(left=True)
mats=dict()
mats[0]=[1,1,1,1,1,2,3,3,2,3,3,3,3,3]
mats[1]=[3,3,3,3,3,4,4,4,5,6,1,1,2,3,4,5,5,5]
N=max(max(set(mats[0])),max(set(mats[1])))
binsize = np.arange(0,N+1,1)
B=['Thing1','Thing2']
for i in range(len(B)):
ax = sns.distplot(mats[i],
kde=False,
label=B[i],
bins=binsize)
ax.set_xlabel('My label')
ax.get_yaxis().set_visible(False)
ax.legend()
plt.show()

As #mwaskom has said seaborn is wrapping matplotlib plotting functions (well to most part) to deliver more complex and nicer looking charts.
What you are looking for is "simple enough" to get it done with matplotlib:
sns.set_context("paper", font_scale=2)
sns.set_style("white")
plt.rc('text', usetex=False)
fig, ax = plt.subplots(figsize=(4,4))
sns.despine(left=True)
# mats=dict()
mats0=[1,1,1,1,1,2,3,3,2,3,3,3,3,3]
mats1=[3,3,3,3,3,4,4,4,5,6,1,1,2,3,4,5,5,5]
N=max(mats0 + mats1)
# binsize = np.arange(0,N+1,1)
binsize = N
B=['Thing1','Thing2']
ax.hist([mats0, mats1], binsize, histtype='bar',
align='mid', label=B, alpha=0.4)#, rwidth=0.6)
ax.set_xlabel('My label')
ax.get_yaxis().set_visible(False)
# ax.set_xlim(0,N+1)
ax.legend()
plt.show()
Which yields:
You can uncomment ax.set_xlim(0,N+1) to give more space around this histogram.

Related

How i can delete xlabel of plot? [duplicate]

I'm trying to plot a figure without tickmarks or numbers on either of the axes (I use axes in the traditional sense, not the matplotlib nomenclature!). An issue I have come across is where matplotlib adjusts the x(y)ticklabels by subtracting a value N, then adds N at the end of the axis.
This may be vague, but the following simplified example highlights the issue, with '6.18' being the offending value of N:
import matplotlib.pyplot as plt
import random
prefix = 6.18
rx = [prefix+(0.001*random.random()) for i in arange(100)]
ry = [prefix+(0.001*random.random()) for i in arange(100)]
plt.plot(rx,ry,'ko')
frame1 = plt.gca()
for xlabel_i in frame1.axes.get_xticklabels():
xlabel_i.set_visible(False)
xlabel_i.set_fontsize(0.0)
for xlabel_i in frame1.axes.get_yticklabels():
xlabel_i.set_fontsize(0.0)
xlabel_i.set_visible(False)
for tick in frame1.axes.get_xticklines():
tick.set_visible(False)
for tick in frame1.axes.get_yticklines():
tick.set_visible(False)
plt.show()
The three things I would like to know are:
How to turn off this behaviour in the first place (although in most cases it is useful, it is not always!) I have looked through matplotlib.axis.XAxis and cannot find anything appropriate
How can I make N disappear (i.e. X.set_visible(False))
Is there a better way to do the above anyway? My final plot would be 4x4 subplots in a figure, if that is relevant.
Instead of hiding each element, you can hide the whole axis:
frame1.axes.get_xaxis().set_visible(False)
frame1.axes.get_yaxis().set_visible(False)
Or, you can set the ticks to an empty list:
frame1.axes.get_xaxis().set_ticks([])
frame1.axes.get_yaxis().set_ticks([])
In this second option, you can still use plt.xlabel() and plt.ylabel() to add labels to the axes.
If you want to hide just the axis text keeping the grid lines:
frame1 = plt.gca()
frame1.axes.xaxis.set_ticklabels([])
frame1.axes.yaxis.set_ticklabels([])
Doing set_visible(False) or set_ticks([]) will also hide the grid lines.
If you are like me and don't always retrieve the axes, ax, when plotting the figure, then a simple solution would be to do
plt.xticks([])
plt.yticks([])
I've colour coded this figure to ease the process.
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
You can have full control over the figure using these commands, to complete the answer I've add also the control over the spines:
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# X AXIS -BORDER
ax.spines['bottom'].set_visible(False)
# BLUE
ax.set_xticklabels([])
# RED
ax.set_xticks([])
# RED AND BLUE TOGETHER
ax.axes.get_xaxis().set_visible(False)
# Y AXIS -BORDER
ax.spines['left'].set_visible(False)
# YELLOW
ax.set_yticklabels([])
# GREEN
ax.set_yticks([])
# YELLOW AND GREEN TOGHETHER
ax.axes.get_yaxis().set_visible(False)
I was not actually able to render an image without borders or axis data based on any of the code snippets here (even the one accepted at the answer). After digging through some API documentation, I landed on this code to render my image
plt.axis('off')
plt.tick_params(axis='both', left=False, top=False, right=False, bottom=False, labelleft=False, labeltop=False, labelright=False, labelbottom=False)
plt.savefig('foo.png', dpi=100, bbox_inches='tight', pad_inches=0.0)
I used the tick_params call to basically shut down any extra information that might be rendered and I have a perfect graph in my output file.
Somewhat of an old thread but, this seems to be a faster method using the latest version of matplotlib:
set the major formatter for the x-axis
ax.xaxis.set_major_formatter(plt.NullFormatter())
One trick could be setting the color of tick labels as white to hide it!
plt.xticks(color='w')
plt.yticks(color='w')
or to be more generalized (#Armin Okić), you can set it as "None".
When using the object oriented API, the Axes object has two useful methods for removing the axis text, set_xticklabels() and set_xticks().
Say you create a plot using
fig, ax = plt.subplots(1)
ax.plot(x, y)
If you simply want to remove the tick labels, you could use
ax.set_xticklabels([])
or to remove the ticks completely, you could use
ax.set_xticks([])
These methods are useful for specifying exactly where you want the ticks and how you want them labeled. Passing an empty list results in no ticks, or no labels, respectively.
You could simply set xlabel to None, straight in your axis. Below an working example using seaborn
from matplotlib import pyplot as plt
import seaborn as sns
tips = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", data=tips)
ax.set(xlabel=None)
plt.show()
Just do this in case you have subplots
fig, axs = plt.subplots(1, 2, figsize=(16, 8))
ax[0].set_yticklabels([]) # x-axis
ax[0].set_xticklabels([]) # y-axis

Remove text from figure when using dataframe.boxplot(by=...) [duplicate]

I'm trying to plot a figure without tickmarks or numbers on either of the axes (I use axes in the traditional sense, not the matplotlib nomenclature!). An issue I have come across is where matplotlib adjusts the x(y)ticklabels by subtracting a value N, then adds N at the end of the axis.
This may be vague, but the following simplified example highlights the issue, with '6.18' being the offending value of N:
import matplotlib.pyplot as plt
import random
prefix = 6.18
rx = [prefix+(0.001*random.random()) for i in arange(100)]
ry = [prefix+(0.001*random.random()) for i in arange(100)]
plt.plot(rx,ry,'ko')
frame1 = plt.gca()
for xlabel_i in frame1.axes.get_xticklabels():
xlabel_i.set_visible(False)
xlabel_i.set_fontsize(0.0)
for xlabel_i in frame1.axes.get_yticklabels():
xlabel_i.set_fontsize(0.0)
xlabel_i.set_visible(False)
for tick in frame1.axes.get_xticklines():
tick.set_visible(False)
for tick in frame1.axes.get_yticklines():
tick.set_visible(False)
plt.show()
The three things I would like to know are:
How to turn off this behaviour in the first place (although in most cases it is useful, it is not always!) I have looked through matplotlib.axis.XAxis and cannot find anything appropriate
How can I make N disappear (i.e. X.set_visible(False))
Is there a better way to do the above anyway? My final plot would be 4x4 subplots in a figure, if that is relevant.
Instead of hiding each element, you can hide the whole axis:
frame1.axes.get_xaxis().set_visible(False)
frame1.axes.get_yaxis().set_visible(False)
Or, you can set the ticks to an empty list:
frame1.axes.get_xaxis().set_ticks([])
frame1.axes.get_yaxis().set_ticks([])
In this second option, you can still use plt.xlabel() and plt.ylabel() to add labels to the axes.
If you want to hide just the axis text keeping the grid lines:
frame1 = plt.gca()
frame1.axes.xaxis.set_ticklabels([])
frame1.axes.yaxis.set_ticklabels([])
Doing set_visible(False) or set_ticks([]) will also hide the grid lines.
If you are like me and don't always retrieve the axes, ax, when plotting the figure, then a simple solution would be to do
plt.xticks([])
plt.yticks([])
I've colour coded this figure to ease the process.
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
You can have full control over the figure using these commands, to complete the answer I've add also the control over the spines:
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# X AXIS -BORDER
ax.spines['bottom'].set_visible(False)
# BLUE
ax.set_xticklabels([])
# RED
ax.set_xticks([])
# RED AND BLUE TOGETHER
ax.axes.get_xaxis().set_visible(False)
# Y AXIS -BORDER
ax.spines['left'].set_visible(False)
# YELLOW
ax.set_yticklabels([])
# GREEN
ax.set_yticks([])
# YELLOW AND GREEN TOGHETHER
ax.axes.get_yaxis().set_visible(False)
I was not actually able to render an image without borders or axis data based on any of the code snippets here (even the one accepted at the answer). After digging through some API documentation, I landed on this code to render my image
plt.axis('off')
plt.tick_params(axis='both', left=False, top=False, right=False, bottom=False, labelleft=False, labeltop=False, labelright=False, labelbottom=False)
plt.savefig('foo.png', dpi=100, bbox_inches='tight', pad_inches=0.0)
I used the tick_params call to basically shut down any extra information that might be rendered and I have a perfect graph in my output file.
Somewhat of an old thread but, this seems to be a faster method using the latest version of matplotlib:
set the major formatter for the x-axis
ax.xaxis.set_major_formatter(plt.NullFormatter())
One trick could be setting the color of tick labels as white to hide it!
plt.xticks(color='w')
plt.yticks(color='w')
or to be more generalized (#Armin Okić), you can set it as "None".
When using the object oriented API, the Axes object has two useful methods for removing the axis text, set_xticklabels() and set_xticks().
Say you create a plot using
fig, ax = plt.subplots(1)
ax.plot(x, y)
If you simply want to remove the tick labels, you could use
ax.set_xticklabels([])
or to remove the ticks completely, you could use
ax.set_xticks([])
These methods are useful for specifying exactly where you want the ticks and how you want them labeled. Passing an empty list results in no ticks, or no labels, respectively.
You could simply set xlabel to None, straight in your axis. Below an working example using seaborn
from matplotlib import pyplot as plt
import seaborn as sns
tips = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", data=tips)
ax.set(xlabel=None)
plt.show()
Just do this in case you have subplots
fig, axs = plt.subplots(1, 2, figsize=(16, 8))
ax[0].set_yticklabels([]) # x-axis
ax[0].set_xticklabels([]) # y-axis

Overlay two separate histograms in python

I have two separate dataframes that I made into histograms and I want to know how I can overlay them so for each category in the x axis the bar is a different color for each dataframe. This is the code I have for the separate bar graphs.
df1.plot.bar(x='brand', y='desc')
df2.groupby(['brand']).count()['desc'].plot(kind='bar')
I tried this code:
previous = df1.plot.bar(x='brand', y='desc')
current= df2.groupby(['brand']).count()['desc'].plot(kind='bar')
bins = np.linspace(1, 4)
plt.hist(x, bins, alpha=0.9,normed=1, label='Previous')
plt.hist(y, bins, alpha=0.5, normed=0,label='Current')
plt.legend(loc='upper right')
plt.show()
This code is not overlaying the graphs properly. The problem is dataframe 2 doesn't have numeric values so i need to use the count method. Appreciate the help!
You might have to use axes objects in matplotlib. In simple terms, you create a figure with some axes object associated with it, then you can call hist from it. Here's one way you can do it:
fig, ax = plt.subplots(1, 1)
ax.hist(x, bins, alpha=0.9,normed=1, label='Previous')
ax.hist(y, bins, alpha=0.5, normed=0,label='Current')
ax.legend(loc='upper right')
plt.show()
Make use of seaborn's histogram with several variables. In your case it would be:
import seaborn as sns
previous = df1.plot.bar(x='brand', y='desc')
current= df2.groupby(['brand']).count()['desc']
sns.distplot( previous , color="skyblue", label="previous")
sns.distplot( current , color="red", label="Current")

How to use seaborn pointplot and violinplot in the same figure? (change xticks and marker of pointplot)

I am trying to create violinplots that shows confidence intervals for the mean. I thought an easy way to do this would be to plot a pointplot on top of the violinplot, but this is not working since they seem to be using different indices for the xaxis as in this example:
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset("titanic")
titanic.dropna(inplace=True)
fig, (ax1,ax2,ax3) = plt.subplots(1,3, sharey=True, figsize=(12,4))
#ax1
sns.pointplot("who", "age", data=titanic, join=False,n_boot=10, ax=ax1)
#ax2
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax2)
#ax3
sns.pointplot("who", "age", data=titanic, join=False, n_boot=10, ax=ax3)
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax3)
ax3.set_xlim([-0.5,4])
print(ax1.get_xticks(), ax2.get_xticks())
gives: [0 1 2] [1 2 3]
Why are these plots not assigning the same xtick numbers to the 'who'-variable and is there any way I can change this?
I also wonder if there is anyway I can change the marker for pointplot, because as you can see in the figure, the point is so big so that it covers the entire confidence interval. I would like just a horizontal line if possible.
I'm posting my final solution here. The reason I wanted to do this kind of plot to begin with, was to display information about the distribution shape, shift in means, and outliers in the same figure. With mwaskom's pointers and some other tweaks I finally got what I was looking for.
The left hand figure is there as a comparison with all data points plotted as lines and the right hand one is my final figure. The thick grey line in the middle of the violin is the bootstrapped 99% confidence interval of the mean, which is the white horizontal line, both from pointplot. The three dotted lines are the standard 25th, 50th and 75th percentile and the lines outside that are the caps of the whiskers of a boxplot I plotted on top of the violin plot. Individual data points are plotted as lines beyond this points since my data usually has a few extreme ones that I need to remove manually like the two points in the violin below.
For now, I am going to to continue making histograms and boxplots in addition to these enhanced violins, but I hope to find that all the information is accurately captured in the violinplot and that I can start and rely on it as my main initial data exploration plot. Here is the final code to produce the plots in case someone else finds them useful (or finds something that can be improved). Lots of tweaking to the boxplot.
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
#change the linewidth which to get a thicker confidence interval line
mpl.rc("lines", linewidth=3)
df = sns.load_dataset("titanic")
df.dropna(inplace=True)
x = 'who'
y = 'age'
fig, (ax1,ax2) = plt.subplots(1,2, sharey=True, figsize=(12,6))
#Left hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax1, inner='stick')
#Right hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax2, positions=0)
sns.pointplot(df[x],df[y], join=False, ci=99, n_boot=1000, ax=ax2, color=[0.3,0.3,0.3], markers=' ')
df.boxplot(y, by=x, sym='_', ax=ax2, showbox=False, showmeans=True, whiskerprops={'linewidth':0},
medianprops={'linewidth':0}, flierprops={'markeredgecolor':'k', 'markeredgewidth':1},
meanprops={'marker':'_', 'color':'w', 'markersize':6, 'markeredgewidth':1.5},
capprops={'linewidth':1, 'color':[0.3,0.3,0.3]}, positions=[0,1,2])
#One could argue that this is not beautiful
labels = [item.get_text() + '\nn=' + str(df.groupby(x).size().loc[item.get_text()]) for item in ax2.get_xticklabels()]
ax2.set_xticklabels(labels)
#Clean up
fig.suptitle('')
ax2.set_title('')
fig.set_facecolor('w')
Edit: Added 'n='
violinplot takes a positions argument that you can use to put the violins somewhere else (they currently just inherit the default matplotlib boxplot positions).
pointplot takes a markers argument that you can use to change how the point estimate is rendered.

matplotlib scatter plot change distance in x-axis

I want to plot some Data with Matplotlib scatter plot.
I used the following code to plot the Data as a scatter with using the same axes for the different subplots.
import numpy as np
import matplotlib.pyplot as plt
epsilon= np.array([1,2,3,4,5])
f, (ax1, ax2, ax3, ax4) = plt.subplots(4, sharex= True, sharey=True)
ax1.scatter(epsilon, mean_percent_100_0, color='r', label='Totaldehnung= 0.000')
ax1.scatter(epsilon, mean_percent_100_03, color='g',label='Totaldehnung= 0.003')
ax1.scatter(epsilon, mean_percent_100_05, color='b',label='Totaldehnung= 0.005')
ax1.set_title('TOR_R')
ax2.scatter(epsilon, mean_percent_111_0,color='r')
ax2.scatter(epsilon, mean_percent_111_03,color='g')
ax2.scatter(epsilon, mean_percent_111_05,color='b')
ax3.scatter(epsilon, mean_percent_110_0,color='r')
ax3.scatter(epsilon, mean_percent_110_03,color='g')
ax3.scatter(epsilon, mean_percent_110_05,color='b')
ax4.scatter(epsilon, mean_percent_234_0,color='r')
ax4.scatter(epsilon, mean_percent_234_03,color='g')
ax4.scatter(epsilon, mean_percent_234_05,color='b')
# Fine-tune figure; make subplots close to each other and hide x ticks for
# all but bottom plot.
f.subplots_adjust(hspace=0.13)
plt.setp([a.get_xticklabels() for a in f.axes[:-1]], visible=False)
plt.locator_params(axis = 'y', nbins = 4)
ax1.grid()
ax2.grid()
ax3.grid()
ax4.grid()
plt.show()
Now i want to have a x-axis with smaller space between each point. I tried to change the range but it was not working. Can someone help me?
To make the x ticks come closer you might have to set the dimensions of the figure.
Since, in your case, the figure is already created, Set the size of the plot using set_size_inches method of the figure object.
This question contains a few other ways to do the same.
Adding the following line before the plt.show()
fig.set_size_inches(2,8)
Gives me this :
Which I hope is what you are trying to do.

Categories