How do I overlay different kinds of graph in Seaborn? - python

I'm trying to get two graphs into the same fig, using different y-axes, and it works fine when I use the same kind of plot (two barplots or two lineplots, for example). Using this code
fig, graph = plt.subplots(figsize=(75,3))
sns.lineplot(x='YearBuilt',y='SalePrice',ax=graph,data=processed_data,color='red')
graph2 = graph.twinx()
sns.lineplot(x='YearBuilt', y='AvgOverallQual',ax=graph2,data=processed_data,color='teal')
I obtain this
But when I try to use different kinds, like this:
fig, graph = plt.subplots(figsize=(75,3))
sns.barplot(x='YearBuilt',y='SalePrice',ax=graph,data=processed_data,color='red')
graph2 = graph.twinx()
sns.lineplot(x='YearBuilt', y='AvgOverallQual',ax=graph2,data=processed_data,color='teal')
my graph looks like:
How do I overlay different kinds of graph in Seaborn?

A seaborn barplot is a categorical plot. The first bar will be at position 0, the second at position 1 etc. A lineplot is a numeric plot; it will put all points at a position given by the numeric coordinates.
Here, it seems there is no need to use seaborn at all. Since matplotlib bar plots are numerical as well, doing this in matplotlib alone will give you the desired overlay:
fig, ax = plt.subplots(figsize=(75,3))
ax.bar('YearBuilt','SalePrice', data=processed_data, color='red')
ax2 = ax.twinx()
ax2.plot('YearBuilt', 'AvgOverallQual', data=processed_data, color='teal')
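To see why the numeric approach lines up, here is a minimal self-contained sketch. The lists below are made-up stand-ins for processed_data (they are not from the question), and the Agg backend is chosen only so the script runs without a display. It checks that each bar is centred on its year, which is the same x coordinate the line uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs anywhere
import matplotlib.pyplot as plt

# Hypothetical stand-in for processed_data: years with uneven gaps
years = [1990, 1995, 2000, 2010]
sale_price = [120000, 150000, 180000, 210000]
avg_qual = [5.0, 5.5, 6.1, 6.8]

fig, ax = plt.subplots(figsize=(8, 3))
# ax.bar places each bar at its numeric x value, not at 0, 1, 2, ...
bars = ax.bar(years, sale_price, width=3, color="red")

ax2 = ax.twinx()  # second y-axis sharing the same x-axis
(line,) = ax2.plot(years, avg_qual, color="teal", marker="o")

# Centre of each bar in data coordinates
bar_centers = [b.get_x() + b.get_width() / 2 for b in bars]
```

With a seaborn barplot the bars would instead sit at positions 0 to 3, so a line drawn at x values around 2000 would land far to the right of them.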

Related

Plotting two pandas series together one appears flat

I am practicing with Python Pandas plotting functions and I am trying to plot the content of two series extracted from the same dataframe into one plot.
When I plot the two series individually the result is correct. However, when I plot them together, the one that I plot as second appears flat in the picture.
Here is my code:
# dailyFlow and smooth are created in the same way from the same dataframe
dailyFlow = pd.Series(dataFrame...
smooth = pd.Series(dataFrame...
# lower the noise in the signal with standard deviation = 6
smooth = smooth.resample('D').sum().rolling(31, center=True, win_type='gaussian').sum(std=6)
dailyFlow.plot(style ='-b')
plt.legend(loc = 'upper right')
plt.show()
smooth.plot(style ='-r')
plt.legend(loc = 'upper right')
plt.show()
plt.figure(figsize=(12,5))
smooth.plot(style ='-r')
dailyFlow.plot(style ='-b')
plt.legend(loc = 'upper right')
plt.show()
Here is the output of my function:
I already tried using the parameter secondary_y=True in the second plot, but then I lose the information on the second line in the legend and the scaling between the two plots is wrong.
Many sources on the Internet seem to suggest that plotting the two series like I am doing should be correct, but then why is the third plot incorrect?
Thank you very much for your help.
For the data you have, the 3rd plot is correct. Look at the scale of the y axis on your two plots: one goes up to 70,000 and the other to 60,000,000.
I suspect what you actually want is a .rolling(...).mean() which should have a range comparable to your original data.
If you would like to make both plots bigger, you could try something like this:
fig, ax1 = plt.subplots()
ax1.set_ylim([0, 75000])
# plot first graph
ax2 = ax1.twinx() # second axes that shares the same x-axis
ax2.set_ylim([0, 60000000])
#plot the second graph
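The scale difference between a rolling sum and a rolling mean can be checked on a small example. This is only a sketch: the series is invented, and it uses a plain unweighted 31-day window rather than the Gaussian window from the question, so the factor is exactly the window length.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series standing in for dailyFlow
index = pd.date_range("2020-01-01", periods=120, freq="D")
daily_flow = pd.Series(np.linspace(1000.0, 2000.0, 120), index=index)

window = 31
rolling_sum = daily_flow.rolling(window, center=True).sum()
rolling_mean = daily_flow.rolling(window, center=True).mean()

# For an unweighted window, the sum is the mean times the window length
ratio = (rolling_sum / rolling_mean).dropna()
```

Here the rolling mean stays within the range of the original data, while the rolling sum is 31 times larger, which is why one of the two lines looks flat when both share a y-axis.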

Seaborn -xtick labels in KDE plot

I've created a simple histogram/KDE plot with seaborn and I'm trying to add custom labels to the x-axis as follows:
plt.title("Cond Density")
plt.xlabel("Cond")
plt.ylabel("Density")
plt.xticks = (['Bob','Alex','Steve','Gwen','Darren'])
sns.distplot(rawData['Conditions'], bins=20)
sns.kdeplot(rawData['Conditions'], shade=True)
plt.show()
There are only 5 int elements in rawData['Conditions'], but the x-axis just reflects the values in rawData['Conditions'], which are just [0,1,2,3,4].
What am I missing?
Histograms need sequential ticks. I'm unsure as to what you're exactly trying to plot, but if you want to graph the density relative to each of these names, a bar graph would be best.
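It is also worth noting that `plt.xticks = ([...])` in the question assigns a list to the name `plt.xticks` instead of calling the function, so no labels are ever set. A minimal sketch of the bar-graph idea follows, assuming the five names correspond to the values 0 to 4 in that order; the `conditions` list is invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs anywhere
import matplotlib.pyplot as plt

names = ['Bob', 'Alex', 'Steve', 'Gwen', 'Darren']
# Hypothetical stand-in for rawData['Conditions']
conditions = [0, 1, 1, 2, 3, 3, 3, 4]

positions = list(range(len(names)))
counts = [conditions.count(p) for p in positions]

fig, ax = plt.subplots()
ax.bar(positions, counts)
ax.set_title("Cond Density")
ax.set_xlabel("Cond")
ax.set_ylabel("Count")

# Set the tick positions first, then attach one label per position
ax.set_xticks(positions)
ax.set_xticklabels(names)

tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

The pyplot equivalent is a function call, `plt.xticks(positions, names)`, not an assignment.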

Matplotlib: data being plotted over legend when using twinx

I'm trying to use Python and Matplotlib to plot a number of different data sets. I'm using twinx to have one data set plotted on the primary axis and another on the secondary axis. I would like to have two separate legends for these data sets.
In my current solution, the data from the secondary axis is being plotted over the top of the legend for the primary axis, while data from the primary axis is not being plotted over the secondary axis legend.
I have generated a simplified version based on the example here: http://matplotlib.org/users/legend_guide.html
Here is what I have so far:
import matplotlib.pyplot as plt
import pylab
fig, ax1 = plt.subplots()
fig.set_size_inches(18/1.5, 10/1.5)
ax2 = ax1.twinx()
ax1.plot([1,2,3], label="Line 1", linestyle='--')
ax2.plot([3,2,1], label="Line 2", linewidth=4)
ax1.legend(loc=2, borderaxespad=1.)
ax2.legend(loc=1, borderaxespad=1.)
pylab.savefig('test.png',bbox_inches='tight', dpi=300, facecolor='w', edgecolor='k')
With the result being the following plot:
As shown in the plot, the data from ax2 is being plotted over the ax1 legend and I would like the legend to be over the top of the data. What am I missing here?
Thanks for the help.
You could replace your legend setting lines with these:
ax1.legend(loc=1, borderaxespad=1.).set_zorder(2)
ax2.legend(loc=2, borderaxespad=1.).set_zorder(2)
And it should do the trick.
Note that the locations have changed to correspond to the lines, and that the .set_zorder() method is applied after the legend is defined.
The higher the zorder integer, the 'higher' the layer it will be painted on.
The trick is to draw your first legend, remove it, and then redraw it on the second axis with add_artist():
legend_1 = ax1.legend(loc=2, borderaxespad=1.)
legend_1.remove()
ax2.legend(loc=1, borderaxespad=1.)
ax2.add_artist(legend_1)
Credit to @ImportanceOfBeingErnest:
https://github.com/matplotlib/matplotlib/issues/3706#issuecomment-378407795
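Put together with the plotting code from the question, the remove-and-re-add approach might look like the sketch below. The Agg backend is an assumption so that it runs without a display; saving or showing the figure is left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs anywhere
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
fig.set_size_inches(18 / 1.5, 10 / 1.5)
ax2 = ax1.twinx()

ax1.plot([1, 2, 3], label="Line 1", linestyle='--')
ax2.plot([3, 2, 1], label="Line 2", linewidth=4)

# Build the first legend on ax1 so it picks up ax1's handles and labels,
# then detach it from ax1
legend_1 = ax1.legend(loc=2, borderaxespad=1.)
legend_1.remove()

# ax2 is drawn after ax1, so artists attached to it end up on top
ax2.legend(loc=1, borderaxespad=1.)
ax2.add_artist(legend_1)

legend_texts = [t.get_text() for t in legend_1.get_texts()]
```

Both legends now belong to ax2, so the thick line from ax2 no longer covers the "Line 1" legend.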

How to use seaborn pointplot and violinplot in the same figure? (change xticks and marker of pointplot)

I am trying to create violinplots that show confidence intervals for the mean. I thought an easy way to do this would be to plot a pointplot on top of the violinplot, but this is not working since they seem to be using different indices for the x-axis, as in this example:
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset("titanic")
titanic.dropna(inplace=True)
fig, (ax1,ax2,ax3) = plt.subplots(1,3, sharey=True, figsize=(12,4))
#ax1
sns.pointplot("who", "age", data=titanic, join=False,n_boot=10, ax=ax1)
#ax2
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax2)
#ax3
sns.pointplot("who", "age", data=titanic, join=False, n_boot=10, ax=ax3)
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax3)
ax3.set_xlim([-0.5,4])
print(ax1.get_xticks(), ax2.get_xticks())
gives: [0 1 2] [1 2 3]
Why are these plots not assigning the same xtick numbers to the 'who'-variable and is there any way I can change this?
I also wonder if there is any way I can change the marker for pointplot, because as you can see in the figure, the point is so big that it covers the entire confidence interval. I would like just a horizontal line if possible.
I'm posting my final solution here. The reason I wanted to do this kind of plot to begin with, was to display information about the distribution shape, shift in means, and outliers in the same figure. With mwaskom's pointers and some other tweaks I finally got what I was looking for.
The left-hand figure is there as a comparison with all data points plotted as lines and the right-hand one is my final figure. The thick grey line in the middle of the violin is the bootstrapped 99% confidence interval of the mean, which is the white horizontal line, both from pointplot. The three dotted lines are the standard 25th, 50th and 75th percentiles, and the lines outside those are the caps of the whiskers of a boxplot I plotted on top of the violin plot. Individual data points are plotted as lines beyond these points, since my data usually has a few extreme ones that I need to remove manually, like the two points in the violin below.
For now, I am going to continue making histograms and boxplots in addition to these enhanced violins, but I hope to find that all the information is accurately captured in the violinplot and that I can start to rely on it as my main initial data exploration plot. Here is the final code to produce the plots in case someone else finds them useful (or finds something that can be improved). Lots of tweaking to the boxplot.
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
#change the linewidth to get a thicker confidence interval line
mpl.rc("lines", linewidth=3)
df = sns.load_dataset("titanic")
df.dropna(inplace=True)
x = 'who'
y = 'age'
fig, (ax1,ax2) = plt.subplots(1,2, sharey=True, figsize=(12,6))
#Left hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax1, inner='stick')
#Right hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax2, positions=0)
sns.pointplot(df[x],df[y], join=False, ci=99, n_boot=1000, ax=ax2, color=[0.3,0.3,0.3], markers=' ')
df.boxplot(y, by=x, sym='_', ax=ax2, showbox=False, showmeans=True, whiskerprops={'linewidth':0},
medianprops={'linewidth':0}, flierprops={'markeredgecolor':'k', 'markeredgewidth':1},
meanprops={'marker':'_', 'color':'w', 'markersize':6, 'markeredgewidth':1.5},
capprops={'linewidth':1, 'color':[0.3,0.3,0.3]}, positions=[0,1,2])
#One could argue that this is not beautiful
labels = [item.get_text() + '\nn=' + str(df.groupby(x).size().loc[item.get_text()]) for item in ax2.get_xticklabels()]
ax2.set_xticklabels(labels)
#Clean up
fig.suptitle('')
ax2.set_title('')
fig.set_facecolor('w')
Edit: Added 'n='
violinplot takes a positions argument that you can use to put the violins somewhere else (they currently just inherit the default matplotlib boxplot positions).
pointplot takes a markers argument that you can use to change how the point estimate is rendered.
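The idea of pinning both layers to the same x positions can be illustrated without seaborn. The sketch below swaps in plain matplotlib (`Axes.violinplot` and `Axes.errorbar`), so it is not the seaborn API discussed above. The data are randomly generated stand-ins for the titanic ages, and the interval is a normal approximation rather than the bootstrap that pointplot uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the age distribution of each 'who' group
groups = {
    "man": rng.normal(33, 12, 200),
    "woman": rng.normal(32, 12, 150),
    "child": rng.normal(6, 4, 60),
}
positions = [0, 1, 2]  # shared x positions for violins and interval markers

fig, ax = plt.subplots(figsize=(6, 4))
parts = ax.violinplot(list(groups.values()), positions=positions,
                      showextrema=False)

means = [values.mean() for values in groups.values()]
# Normal-approximation 99% interval (2.576 standard errors), not a bootstrap
half_widths = [2.576 * values.std(ddof=1) / np.sqrt(len(values))
               for values in groups.values()]

# marker='_' draws the mean as a short horizontal line instead of a large dot
ax.errorbar(positions, means, yerr=half_widths, marker="_", markersize=20,
            linestyle="none", color="k", elinewidth=3)

ax.set_xticks(positions)
ax.set_xticklabels(list(groups.keys()))
```

Because both calls receive the same `positions` list, the interval markers sit in the middle of their violins.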

plotting 2 graph in same window using matplotlib in python

I was plotting a line graph and a bar chart in matplotlib, and each worked fine individually with my script.
But I'm facing two problems:
1. I want to plot both graphs in the same output window.
2. I want to customize the display window to 1024*700.
In the first case I was using subplot to plot two graphs in the same window, but I'm not able to give both graphs their individual x-axis and y-axis names and also their individual titles.
My failed code is:
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
xs,ys = np.loadtxt("c:/users/name/desktop/new folder/x/counter.cnt",delimiter = ',').T
fig = plt.figure()
lineGraph = fig.add_subplot(211)
barChart = fig.add_subplot(212)
plt.title('DISTRIBUTION of NUMBER')
lineGraph = lineGraph.plot(xs,ys,'-') #generate line graph
barChart = barChart.bar(xs,ys,width=1.0,facecolor='g') #generate bar plot
plt.grid(True)
plt.axis([0,350,0,25]) #controlls axis for charts x first and then y axis.
plt.savefig('new.png',dpi=400)
plt.show()
But with this I am not able to label both graphs properly.
Also, please suggest how to resize the window to 1024*700.
When you say
I was using subplot to plot two graphs in same window but i'm not being able to give both graphs their individual x-axis and y-axis names and also their individual title.
do you mean you want to set axis labels? If so, try using lineGraph.set_xlabel and lineGraph.set_ylabel. Alternatively, call plt.xlabel and plt.ylabel just after you create a plot and before you create any other plots. For example:
# Line graph subplot
lineGraph.plot(xs,ys,'-')  # do not reassign lineGraph: plot() returns a list of lines, not the Axes
lineGraph.set_xlabel('x')
lineGraph.set_ylabel('y')
# Bar graph subplot
barChart.bar(xs,ys,width=1.0,facecolor='g')  # likewise, bar() returns a bar container, not the Axes
barChart.set_xlabel('x')
barChart.set_ylabel('y')
The same applies to the title. Calling plt.title will add a title to the currently active plot. This is the last plot that you created or the last plot you activated with plt.sca. If you want a title on a specific subplot, use the subplot handle: lineGraph.set_title or barChart.set_title.
fig.add_subplot returns a matplotlib Axes object. Methods on that object include set_xlabel and set_ylabel, as described by Chris. You can see the full set of methods available on Axes objects at http://matplotlib.sourceforge.net/api/axes_api.html.
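The pieces above can be combined into one short script that also addresses the 1024*700 request. This is a sketch under assumptions: the data are invented in place of counter.cnt, the titles and label strings are placeholders, and the size is set on the figure (10.24 x 7.00 inches at 100 dpi), which controls the saved image size and the initial window size on typical backends.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for the contents of counter.cnt
xs = np.arange(0, 350, 10)
ys = (xs % 25).astype(float)

# 10.24 in x 7.00 in at 100 dpi gives a 1024 x 700 pixel figure
fig, (line_ax, bar_ax) = plt.subplots(2, 1, figsize=(10.24, 7.0), dpi=100)

# Each Axes gets its own title and axis labels through its own methods
line_ax.plot(xs, ys, '-')
line_ax.set_title('Line graph')
line_ax.set_xlabel('x (line)')
line_ax.set_ylabel('y (line)')
line_ax.grid(True)

bar_ax.bar(xs, ys, width=8.0, facecolor='g')
bar_ax.set_title('Bar chart')
bar_ax.set_xlabel('x (bar)')
bar_ax.set_ylabel('y (bar)')
bar_ax.axis([0, 350, 0, 25])  # x limits first, then y limits

fig.tight_layout()  # keep titles and labels of the two subplots from overlapping

width_px, height_px = fig.get_size_inches() * fig.dpi
```

Calling `fig.savefig('new.png')` afterwards would write the image at the figure's own dpi, so the file comes out at 1024 x 700 pixels unless a different `dpi` is passed to `savefig`.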
