Plotting two pandas series together one appears flat - python

I am practicing with Python Pandas plotting functions and I am trying to plot the content of two series extracted from the same dataframe into one plot.
When I plot the two series individually, the result is correct. However, when I plot them together, the one plotted second appears flat in the picture.
Here is my code:
import matplotlib.pyplot as plt
import pandas as pd
# dailyFlow and smooth are created in the same way from the same dataframe
dailyFlow = pd.Series(dataFrame...
smooth = pd.Series(dataFrame...
# lower the noise in the signal with standard deviation = 6
smooth = smooth.resample('D').sum().rolling(31, center=True, win_type='gaussian').sum(std=6)
# first plot: the raw series on its own
dailyFlow.plot(style='-b')
plt.legend(loc='upper right')
plt.show()
# second plot: the smoothed series on its own
smooth.plot(style='-r')
plt.legend(loc='upper right')
plt.show()
# third plot: both series on the same axes
plt.figure(figsize=(12, 5))
smooth.plot(style='-r')
dailyFlow.plot(style='-b')
plt.legend(loc='upper right')
plt.show()
Here is the output of my function:
I already tried using the parameter secondary_y=True in the second plot, but then I lose the information on the second line in the legend and the scaling between the two plots is wrong.
Many sources on the Internet seem to suggest that plotting the two series like I am doing should be correct, but then why is the third plot incorrect?
Thank you very much for your help.

For the data you have, the 3rd plot is correct. Look at the scale of the y axis on your two plots: one goes up to 70,000 and the other to 60,000,000.
I suspect what you actually want is .rolling(...).mean(), which should give values in a range comparable to your original data.
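To make the scale difference concrete, here is a minimal sketch on made-up data (the series below is synthetic, not the asker's, and win_type='gaussian' needs SciPy installed). A Gaussian-weighted rolling sum scales the signal by the sum of the window weights, while the weighted mean stays on the scale of the input.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for dailyFlow (made-up values).
idx = pd.date_range("2020-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
daily = pd.Series(50_000 + 10_000 * rng.standard_normal(len(idx)), index=idx)

window = daily.rolling(31, center=True, win_type="gaussian")
smooth_sum = window.sum(std=6)    # weighted sum: inflated by the sum of the weights
smooth_mean = window.mean(std=6)  # weighted mean: same scale as the input

# The ratio is identical at every valid point: it is the sum of the window weights.
ratio = (smooth_sum / smooth_mean).dropna()
print("max of raw data:     ", daily.max())
print("max of rolling mean: ", smooth_mean.max())
print("max of rolling sum:  ", smooth_sum.max())
print("sum / mean ratio:    ", ratio.iloc[0])
```

With the mean, both lines share one y-axis comfortably, so neither looks flat.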

If you would like to make both plots bigger, you could try something like this:
fig, ax1 = plt.subplots()
ax1.set_ylim([0, 75000])
# plot first graph
ax2 = ax1.twinx() # second axes that shares the same x-axis
ax2.set_ylim([0, 60000000])
#plot the second graph
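A fuller sketch of the twinx approach, assuming two made-up series whose scales differ by a large factor (the data and the labels are placeholders). Collecting the line handles from both axes into a single legend call keeps both entries in the legend, which was the problem reported with secondary_y=True.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to display the figure
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up series on very different scales.
idx = pd.date_range("2020-01-01", periods=120, freq="D")
small = pd.Series(np.linspace(0, 70_000, len(idx)), index=idx)
large = small * 850

fig, ax1 = plt.subplots(figsize=(12, 5))
line1, = ax1.plot(small.index, small.values, "-b", label="dailyFlow")

ax2 = ax1.twinx()  # second axes that shares the same x-axis
line2, = ax2.plot(large.index, large.values, "-r", label="smooth")

# One legend that lists the lines from both axes.
ax1.legend(handles=[line1, line2], loc="upper right")
```

Each axes autoscales to its own series here; set_ylim can still be called on either one to pin the limits.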

Related

How to overlay a scatter plot on a line plot in python where x axis data is string?

I have data as follows:
I have three dataframes of the same kind, with the same number of columns. I want to plot two of them as line plots and one (a smaller dataframe than the others) as a scatter plot, all on the same axes. The code I have used is as follows:
fig, axs = plt.subplots(figsize = (16,8))
df1.plot(ax=axs,x='day-month', y='2015Data_Value', kind = 'scatter')
df2.plot(ax=axs, x='day-month', y='Data_Value', linewidth=2)
df3.plot(ax=axs, x='day-month', y='Data_Value', linewidth=2)
The scatter plot raises an error because it cannot handle the x-axis values; the error always points at 'day-month'. The line plots run fine and plot correctly when I comment out the scatter plot. How can I solve this?
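One possible workaround, sketched with made-up frames that reuse the column names from the question: call matplotlib's plot and scatter directly on the axes, since matplotlib treats string x values as categories. The values and legend labels below are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to display the figure
import matplotlib.pyplot as plt
import pandas as pd

# Made-up frames with the column names used in the question.
days = ["01-01", "01-02", "01-03", "01-04", "01-05"]
df2 = pd.DataFrame({"day-month": days, "Data_Value": [5, 7, 6, 8, 9]})
df3 = pd.DataFrame({"day-month": days, "Data_Value": [1, 0, 2, 1, 3]})
df1 = pd.DataFrame({"day-month": ["01-02", "01-05"], "2015Data_Value": [10, -1]})

fig, axs = plt.subplots(figsize=(16, 8))
# Draw the lines first so the string categories are registered in the full order.
axs.plot(df2["day-month"], df2["Data_Value"], linewidth=2, label="df2")
axs.plot(df3["day-month"], df3["Data_Value"], linewidth=2, label="df3")
# The scatter reuses the same categorical x-axis.
axs.scatter(df1["day-month"], df1["2015Data_Value"], color="k", label="df1")
axs.legend()
```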

Add an additional axis with different ticks in matplotlib

I have a plot that plots iteration vs. progress for an optimization problem. What I want to do is add an additional axis (at the top of the plot) that uses the same data - but also marks wall time. Thus there are two x-axes in 1-to-1 correspondence with each other, on top and bottom, and one data series. I've created the second axis as:
ax2 = ax.twiny()
ax2.set_xlabel('Wall Time (s)')
But now I don't know how to add the new ticks. I'm alternatively open to having two x-data series for each y series, but I don't know how to do this either.
I figured it out:
ax2 = ax.twiny()
ax2.set_xlabel('Wall Time (s)')
ax2.set_xlim(0.0, np.max(all_data, axis=0)[0] * scale_amt)
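A self-contained sketch of the same idea, assuming a constant, hypothetical conversion factor between iterations and seconds (seconds_per_iteration plays the role of scale_amt above, and the data is made up). The top axis gets the bottom axis limits multiplied by that factor, so the two sets of ticks stay in 1-to-1 correspondence.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to display the figure
import matplotlib.pyplot as plt
import numpy as np

# Made-up optimization trace.
iterations = np.arange(0, 101)
progress = 1.0 - np.exp(-iterations / 25.0)
seconds_per_iteration = 0.4  # hypothetical conversion factor

fig, ax = plt.subplots()
ax.plot(iterations, progress)
ax.set_xlabel("Iteration")
ax.set_xlim(0, iterations.max())

ax2 = ax.twiny()  # second x-axis on top, sharing the y-axis
ax2.set_xlabel("Wall Time (s)")
low, high = ax.get_xlim()
# Scale both ends of the bottom limits so the two axes line up.
ax2.set_xlim(low * seconds_per_iteration, high * seconds_per_iteration)
```

This only lines up when wall time is proportional to the iteration count; uneven iteration times would need explicit tick positions instead.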

Overlay two separate histograms in python

I have two separate dataframes that I made into histograms and I want to know how I can overlay them so for each category in the x axis the bar is a different color for each dataframe. This is the code I have for the separate bar graphs.
df1.plot.bar(x='brand', y='desc')
df2.groupby(['brand']).count()['desc'].plot(kind='bar')
I tried this code:
previous = df1.plot.bar(x='brand', y='desc')
current= df2.groupby(['brand']).count()['desc'].plot(kind='bar')
bins = np.linspace(1, 4)
plt.hist(x, bins, alpha=0.9,normed=1, label='Previous')
plt.hist(y, bins, alpha=0.5, normed=0,label='Current')
plt.legend(loc='upper right')
plt.show()
This code is not overlaying the graphs properly. The problem is that dataframe 2 doesn't have numeric values, so I need to use the count method. Appreciate the help!
You might have to use axes objects in matplotlib. In simple terms, you create a figure with some axes object associated with it, then you can call hist from it. Here's one way you can do it:
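fig, ax = plt.subplots(1, 1)
# density replaces the normed argument, which newer matplotlib versions no longer accept
ax.hist(x, bins, alpha=0.9, density=True, label='Previous')
ax.hist(y, bins, alpha=0.5, density=False, label='Current')
ax.legend(loc='upper right')
plt.show()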
Make use of seaborn's histogram with several variables. In your case it would be:
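import seaborn as sns
# pass the underlying values, not the Axes object returned by .plot.bar()
previous = df1['desc']
current = df2.groupby(['brand']).count()['desc']
# note: distplot is deprecated in recent seaborn releases in favour of histplot
sns.distplot(previous, color="skyblue", label="previous")
sns.distplot(current, color="red", label="Current")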
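Since the goal is one bar per dataframe for each brand, another option is to put both sets of values in a single DataFrame and let pandas draw grouped bars. This is a minimal sketch with made-up frames that assume the column names brand and desc from the question; it swaps the histogram calls for a plain grouped bar chart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to display the figure
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: df1 already holds a number per brand, df2 holds text rows to count.
df1 = pd.DataFrame({"brand": ["a", "b", "c"], "desc": [3, 1, 2]})
df2 = pd.DataFrame({"brand": ["a", "a", "b", "c", "c", "c"],
                    "desc": ["x", "y", "z", "u", "v", "w"]})

# Align both on the brand index; brands missing from one side get 0.
combined = pd.DataFrame({
    "Previous": df1.set_index("brand")["desc"],
    "Current": df2.groupby("brand")["desc"].count(),
}).fillna(0)

# One group of bars per brand, one colour per dataframe.
ax = combined.plot.bar(color=["skyblue", "red"])
ax.legend(loc="upper right")
```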

How to add axes for subplot in matplotlib

I need to fit a function to a large number of datasets stored in several files and compare the fits. I open a file, read the columns and plot each fit as a subplot after fitting. Eventually I have a figure with lot of subplots showing all the fits. However, I need to see the fit and also the residual for each subplot like in the figure.
So far, I have the following. I thought I could add axes to subplot but it does not work. The function that I have works. But I do not know how to add axes to subplot to plot the residual with the fit as a subplot to the subplot.
def plotall(args):
    # fig1, a1, b1, xlow and xhi are expected to be defined at module level
    x = args[0]
    ydata = args[1]
    chisq = args[2]
    fit = args[3]
    g1 = args[4]
    a = args[5]
    ptitle = args[6]
    axi = fig1.add_subplot(a1, b1, a + 1)
    axi.plot(x, ydata, 'ko', markersize=2, label='Data')
    axi.plot(x, fit, 'm-', label='Fit')
    axi.text(0.75, 0.8, 'T=%4.1f(K)' % ptitle, fontsize=7, transform=axi.transAxes)
    axi.text(0.05, 0.45, r'$\chi^2$=%3.1f' % chisq, fontsize=7, transform=axi.transAxes)
    ytlist = np.linspace(min(ydata), max(ydata), 4)
    axi.set_yticks(ytlist)
    axi.set_xlim([xlow, xhi])
    xtlist = np.linspace(xlow, xhi, 6)
    axi.set_xticks(xtlist)
    for label in (axi.get_xticklabels() + axi.get_yticklabels()):
        label.set_fontname('Arial')
        label.set_fontsize(5)
    # pass loc by keyword; a second positional argument is read as labels, not a location
    axi.legend(('Data', 'Fit'), loc='upper left', shadow=False, fancybox=False, numpoints=1,
               frameon=False, labelspacing=0.01, handlelength=0.5, handletextpad=0.5, fontsize=6)
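One way to get a residual panel under each fit is make_axes_locatable from mpl_toolkits.axes_grid1, which takes a strip from an existing axes and returns a new axes in it. The sketch below uses made-up data and a fixed 2x2 grid in place of the asker's files and fit function; only the append_axes step is the point.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to display the figure
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import make_axes_locatable

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)

fig1, axes = plt.subplots(2, 2, figsize=(8, 6))
residual_axes = []
for axi in axes.flat:
    # Made-up "fit" and noisy "data" standing in for the real datasets.
    fit = np.sin(x)
    ydata = fit + 0.1 * rng.standard_normal(x.size)
    axi.plot(x, ydata, "ko", markersize=2, label="Data")
    axi.plot(x, fit, "m-", label="Fit")
    axi.legend(loc="upper left", fontsize=6, frameon=False)

    # Add a short residual panel below this subplot, sharing its x-axis.
    divider = make_axes_locatable(axi)
    ax_res = divider.append_axes("bottom", size="25%", pad=0.05, sharex=axi)
    ax_res.plot(x, ydata - fit, "k.", markersize=2)
    ax_res.axhline(0.0, color="m", linewidth=0.8)
    axi.tick_params(labelbottom=False)  # x tick labels only on the residual panel
    residual_axes.append(ax_res)
```

Inside plotall, the same three lines (make_axes_locatable, append_axes, plot of ydata - fit) could follow the existing axi.plot calls.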

How to use seaborn pointplot and violinplot in the same figure? (change xticks and marker of pointplot)

I am trying to create violinplots that shows confidence intervals for the mean. I thought an easy way to do this would be to plot a pointplot on top of the violinplot, but this is not working since they seem to be using different indices for the xaxis as in this example:
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset("titanic")
titanic.dropna(inplace=True)
fig, (ax1,ax2,ax3) = plt.subplots(1,3, sharey=True, figsize=(12,4))
#ax1
sns.pointplot("who", "age", data=titanic, join=False,n_boot=10, ax=ax1)
#ax2
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax2)
#ax3
sns.pointplot("who", "age", data=titanic, join=False, n_boot=10, ax=ax3)
sns.violinplot(titanic.age, groupby=titanic.who, ax=ax3)
ax3.set_xlim([-0.5,4])
print(ax1.get_xticks(), ax2.get_xticks())
gives: [0 1 2] [1 2 3]
Why are these plots not assigning the same xtick numbers to the 'who'-variable and is there any way I can change this?
I also wonder if there is any way to change the marker for pointplot because, as you can see in the figure, the point is so big that it covers the entire confidence interval. I would like just a horizontal line if possible.
I'm posting my final solution here. The reason I wanted to do this kind of plot to begin with, was to display information about the distribution shape, shift in means, and outliers in the same figure. With mwaskom's pointers and some other tweaks I finally got what I was looking for.
The left-hand figure is there as a comparison, with all data points plotted as lines, and the right-hand one is my final figure. The thick grey line in the middle of the violin is the bootstrapped 99% confidence interval of the mean, and the mean itself is the white horizontal line; both come from pointplot. The three dotted lines are the standard 25th, 50th and 75th percentiles, and the lines outside those are the caps of the whiskers of a boxplot I plotted on top of the violin plot. Individual data points are plotted as lines beyond these points, since my data usually has a few extreme ones that I need to remove manually, like the two points in the violin below.
For now, I am going to continue making histograms and boxplots in addition to these enhanced violins, but I hope to find that all the information is accurately captured in the violinplot, so that I can start to rely on it as my main initial data exploration plot. Here is the final code to produce the plots in case someone else finds them useful (or finds something that can be improved). The boxplot needed a lot of tweaking.
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
#change the linewidth to get a thicker confidence interval line
mpl.rc("lines", linewidth=3)
df = sns.load_dataset("titanic")
df.dropna(inplace=True)
x = 'who'
y = 'age'
fig, (ax1,ax2) = plt.subplots(1,2, sharey=True, figsize=(12,6))
#Left hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax1, inner='stick')
#Right hand plot
sns.violinplot(df[y], groupby=df[x], ax=ax2, positions=0)
sns.pointplot(df[x],df[y], join=False, ci=99, n_boot=1000, ax=ax2, color=[0.3,0.3,0.3], markers=' ')
df.boxplot(y, by=x, sym='_', ax=ax2, showbox=False, showmeans=True, whiskerprops={'linewidth':0},
medianprops={'linewidth':0}, flierprops={'markeredgecolor':'k', 'markeredgewidth':1},
meanprops={'marker':'_', 'color':'w', 'markersize':6, 'markeredgewidth':1.5},
capprops={'linewidth':1, 'color':[0.3,0.3,0.3]}, positions=[0,1,2])
#One could argue that this is not beautiful
labels = [item.get_text() + '\nn=' + str(df.groupby(x).size().loc[item.get_text()]) for item in ax2.get_xticklabels()]
ax2.set_xticklabels(labels)
#Clean up
fig.suptitle('')
ax2.set_title('')
fig.set_facecolor('w')
Edit: Added 'n='
violinplot takes a positions argument that you can use to put the violins somewhere else (they currently just inherit the default matplotlib boxplot positions).
pointplot takes a markers argument that you can use to change how the point estimate is rendered.
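The positions idea can also be sketched with matplotlib alone, whose Axes.violinplot accepts a positions argument. This is not the seaborn code from the answers: the data is made up, and bootstrap_ci is a hypothetical helper that approximates the confidence interval of the mean by resampling with NumPy.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to display the figure
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Made-up age samples per group, loosely imitating the 'who' categories.
groups = {
    "child": rng.normal(8, 4, 60),
    "man": rng.normal(33, 12, 200),
    "woman": rng.normal(32, 11, 150),
}
positions = np.arange(len(groups))  # put the violins at 0, 1, 2


def bootstrap_ci(values, n_boot=1000, level=99):
    """Hypothetical helper: percentile bootstrap interval for the mean."""
    samples = rng.choice(values, size=(n_boot, len(values)), replace=True)
    half = (100 - level) / 2
    return np.percentile(samples.mean(axis=1), [half, 100 - half])


fig, ax = plt.subplots(figsize=(6, 6))
parts = ax.violinplot(list(groups.values()), positions=positions, showextrema=False)
for pos, values in zip(positions, groups.values()):
    low, high = bootstrap_ci(values)
    ax.vlines(pos, low, high, color="0.3", linewidth=3)  # confidence interval of the mean
    ax.hlines(values.mean(), pos - 0.1, pos + 0.1, color="w", linewidth=1.5)  # mean as a short line
ax.set_xticks(positions)
ax.set_xticklabels([f"{name}\nn={len(values)}" for name, values in groups.items()])
```

Because the violins and the interval markers use the same explicit positions, they line up regardless of any plotting defaults.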
