Pandas: plotting two histograms on the same plot - python

I would like to have two histograms appear on the same plot (with different colors, and possibly different alphas). I tried:
import random
import pandas as pd
import matplotlib.pyplot as plt
x = pd.DataFrame([random.gauss(3,1) for _ in range(400)])
y = pd.DataFrame([random.gauss(4,2) for _ in range(400)])
x.hist( alpha=0.5, label='x')
y.hist(alpha=0.5, label='y')
x.plot(kind='kde', style='k--')
y.plot(kind='kde', style='k--')
plt.legend(loc='upper right')
plt.show()
This produces the result in 4 different plots. How can I have them on the same one?

If I understood correctly, both hists should go into the same subplot. So it should be
fig = plt.figure()
ax = fig.add_subplot(111)
_ = ax.hist(x.values)
_ = ax.hist(y.values, color='red', alpha=.3)
You can also pass the pandas plot method an axis object, so if you want both kde's in another plot do:
fig = plt.figure()
ax = fig.add_subplot(111)
x.plot(kind='kde', ax=ax)
y.plot(kind='kde', ax=ax, color='red')
To get everything into a single plot you need two different y-scales since kde is density and histogram is frequency. For that you use the axes.twinx() command.
fig = plt.figure()
ax = fig.add_subplot(111)
_ = ax.hist(x.values)
_ = ax.hist(y.values, color='red', alpha=.3)
ax1 = ax.twinx()
x.plot(kind='kde', ax=ax1)
y.plot(kind='kde', ax=ax1, color='red')
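If a second y-axis is not wanted, one alternative (not part of the answer above, so treat it as a sketch) is to normalise both histograms with density=True so that they sit on a density scale. The data below are synthetic stand-ins for x and y, and the shared bins array is an assumption made so the two histograms line up bar for bar.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for the question's x and y (assumed, not the original data)
rng = np.random.default_rng(0)
x = pd.Series(rng.normal(3, 1, 400), name="x")
y = pd.Series(rng.normal(4, 2, 400), name="y")

# Shared bin edges covering both samples, so the bars of x and y coincide
bins = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), 31)

fig, ax = plt.subplots()
# density=True rescales each histogram so that its total bar area is 1
n_x, _, _ = ax.hist(x, bins=bins, density=True, alpha=0.5, label="x")
n_y, _, _ = ax.hist(y, bins=bins, density=True, alpha=0.5, color="red", label="y")
ax.set_ylabel("density")
ax.legend(loc="upper right")
# call plt.show() or fig.savefig(...) to view the figure
```

Because the bars now integrate to 1, a call such as x.plot(kind='kde', ax=ax) can be drawn on the same ax without twinx().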

You can use plt.figure() and the function add_subplot(): the first 2 arguments are the number of rows and cols you want in your plot, the last is the position of the subplot in the plot.
fig = plt.figure()
# .ix was removed in pandas 1.0; .iloc selects the first column by position
subplot = fig.add_subplot(1, 2, 1)
subplot.hist(x.iloc[:, 0], alpha=0.5)
subplot = fig.add_subplot(1, 2, 2)
subplot.hist(y.iloc[:, 0], alpha=0.5)

Related

Combine Binned barplot with lineplot

I'd like to represent two datasets on the same plot, one as a line and one as a binned barplot. I can do each individually:
tobar = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
tobar["bins"] = pd.qcut(tobar.index, 20)
bp = sns.barplot(data=tobar, x="bins", y="value")
toline = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
lp = sns.lineplot(data=toline, x=toline.index, y="value")
But when I try to combine them, of course the x axis gets messed up:
fig, ax = plt.subplots()
ax2 = ax.twinx()
bp = sns.barplot(data=tobar, x="bins", y="value", ax=ax)
lp = sns.lineplot(data=toline, x=toline.index, y="value", ax=ax2)
bp.set(xlabel=None)
I also can't seem to get rid of the bin labels.
How can I get both sets of information on one plot?
This answer explains why it's better to plot the bars with matplotlib.axes.Axes.bar instead of sns.barplot or pandas.DataFrame.bar.
In short, the xtick locations correspond to the actual numeric value of the label, whereas the xticks for seaborn and pandas are 0 indexed, and don't correspond to the numeric value.
This answer shows how to add bar labels.
ax2 = ax.twinx() can be used for the line plot if needed
Works the same if the line plot is different data.
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
Imports and DataFrame
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# test data
np.random.seed(2022)
df = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
# create the bins
df["bins"] = pd.qcut(df.index, 20)
# add a column for the mid point of the interval
df['mid'] = df.bins.apply(lambda row: row.mid.round().astype(int))
# pivot the dataframe to calculate the mean of each interval
pt = df.pivot_table(index='mid', values='value', aggfunc='mean').reset_index()
Plot 1
# create the figure
fig, ax = plt.subplots(figsize=(30, 7))
# add a horizontal line at y=0
ax.axhline(0, color='black')
# add the bar plot
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
# set the labels on the xticks - if desired
ax.set_xticks(ticks=pt.mid, labels=pt.mid)
# add the intervals as labels on the bars - if desired
ax.bar_label(ax.containers[0], labels=df.bins.unique(), weight='bold')
# add the line plot
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 2
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 3
The bar width is the width of the interval
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=50, alpha=0.5, ec='k')
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
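The point about tick locations can be checked on a tiny made-up example. This is only an illustrative sketch: the demo frame and its values are invented, and DataFrame.plot.bar is used to show the 0-indexed behaviour that the answer attributes to pandas and seaborn bar plots.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented bin midpoints and bar heights (hypothetical values)
mids = np.array([25, 75, 125, 175])
heights = np.array([1.0, -0.5, 2.0, 0.7])
demo = pd.DataFrame({"mid": mids, "value": heights})

fig, (ax_pd, ax_mpl) = plt.subplots(ncols=2)

# pandas treats the x column as categories and draws the bars at 0, 1, 2, ...
demo.plot.bar(x="mid", y="value", ax=ax_pd, legend=False)

# matplotlib draws each bar at the numeric x value it is given
ax_mpl.bar(mids, heights, width=50, alpha=0.5, ec="k")

# Bar centers in data coordinates for both axes
pandas_centers = [p.get_x() + p.get_width() / 2 for p in ax_pd.patches]
mpl_centers = [p.get_x() + p.get_width() / 2 for p in ax_mpl.patches]
print(pandas_centers)
print(mpl_centers)
```

The pandas bars should sit at consecutive integers, while the matplotlib bars sit at the midpoints themselves. That is why a line plotted against the original index only lines up with the bars drawn by Axes.bar.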

How to create one plot that contains all my plots

fig = plt.figure()
ax = fig.add_subplot(111)
scatter = ax.scatter(wh1['area'],wh1['rain'],
c=kmeans[0],s=50)
ax.set_title('K-Means Clustering')
ax.set_xlabel('area')
ax.set_ylabel('rain')
plt.colorbar(scatter)
fig = plt.figure()
ax1 = fig.add_subplot(111)
scatter = ax.scatter(wh1['area'],wh1['wind'],
c=kmeans[0],s=50)
ax1.set_title('K-Means Clustering')
ax1.set_xlabel('area')
ax1.set_ylabel('wind')
plt.colorbar(scatter)
plt.show()
This code creates two separate plots. I want to create one plot that contains both of them. I left an image of how the plots appear. Help would be appreciated, thanks.
A suggested solution was to avoid plotting twice and to use subplots instead, but this causes the two graphs to bisect each other. Any suggested fixes?
fig = plt.figure()
ax = fig.add_subplot(121)
scatter = ax.scatter(wh1['area'],wh1['rain'],
c=kmeans[0],s=50)
ax.set_title('K-Means Clustering')
ax.set_xlabel('area')
ax.set_ylabel('rain')
plt.colorbar(scatter)
ax1 = fig.add_subplot(122)
scatter = ax.scatter(wh1['area'],wh1['wind'],
c=kmeans[0],s=50)
ax1.set_title('K-Means Clustering')
ax1.set_xlabel('area')
ax1.set_ylabel('wind')
plt.colorbar(scatter)
You can use subplots. Instead of making different figures you can call add_subplot on the same figure.
You make a figure by the following code and get a handle to a figure:
fig = plt.figure()
Then you set the number of rows and columns of plots inside that figure with the number that you pass to the add_subplot function. For example, if you want a layout of one row and two columns, the first two digits of the argument are 12 and the third digit selects the cell:
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)
So, your code will be like this:
fig = plt.figure()
ax = fig.add_subplot(121)
scatter = ax.scatter(wh1['area'],wh1['rain'],
c=kmeans[0],s=50)
ax.set_title('K-Means Clustering')
ax.set_xlabel('area')
ax.set_ylabel('rain')
plt.colorbar(scatter)
ax1 = fig.add_subplot(122)
scatter = ax1.scatter(wh1['area'],wh1['wind'],
c=kmeans[0],s=50)
ax1.set_title('K-Means Clustering')
ax1.set_xlabel('area')
ax1.set_ylabel('wind')
plt.colorbar(scatter)
plt.show()
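A slightly more compact variant of the same idea is sketched below. It is not the answerer's code: wh1 and labels are random stand-ins for the asker's data and kmeans[0], and constrained_layout=True together with fig.colorbar(..., ax=ax) are my additions, intended to keep the two panels and their colorbars from running into each other.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-ins for the asker's wh1 frame and kmeans[0] labels (assumed)
rng = np.random.default_rng(0)
wh1 = pd.DataFrame({
    "area": rng.random(50),
    "rain": rng.random(50),
    "wind": rng.random(50),
})
labels = rng.integers(0, 3, size=50)

# One figure, one row, two columns; constrained layout leaves room for colorbars
fig, axes = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)

for ax, column in zip(axes, ["rain", "wind"]):
    scatter = ax.scatter(wh1["area"], wh1[column], c=labels, s=50)
    ax.set_title("K-Means Clustering")
    ax.set_xlabel("area")
    ax.set_ylabel(column)
    # Passing ax= states explicitly which subplot each colorbar belongs to
    fig.colorbar(scatter, ax=ax)
# call plt.show() or fig.savefig(...) to view the figure
```

Looping over the axes also avoids the slip in the question, where the second scatter was drawn on ax instead of ax1.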

Creating a 2x2 subplot from one dataset as different graphs

I have a large census dataset I am working with. I am taking different data from it and want to represent it all as a single .png in the end. I have created the graphs individually, but when I try to add them to the subplots they get distorted or the axes get messed up.
Current code:
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)
ax1.pie(df.data.value_counts(normalize=True), labels=None, startangle=240)
ax1.legend(['a','b','c','d','e'])
ax1.axis('equal')
data2=df[['A']].dropna().values
kde=df.A.plot.kde()
binss = np.logspace(0.01,7.0)
ax2=plt.hist(hincp, normed=True, bins=binss)
ax2=plt.xscale('log')
ax3 = df.replace(np.nan,0)
ax3 = (df.groupby(['G'])['R'].sum()/1000)
ax3.plot.bar(width=0.9, color='red',title='Gs').set_ylabel('Rs')
ax3.set_ylabel('Rs')
ax3.set_xlabel('# G')
t = df[['p','o','s','y']]
ax4=plt.scatter(t.o,t.p,s=t.s,c=t.y, marker = 'o', alpha = 0.2)
plt.ylim(0, 10000)
plt.xlim(0,1200000)
cbar=plt.colorbar()
plt.title("this vs that", loc = 'center')
plt.xlabel('this')
plt.ylabel('that')
All four types of graphs should be displayed and not overlap.
You create Axes for each subplot but then you don't use them.
ax1.pie(...) looks correct but later you don't use ax2,ax3,ax4.
If you are going to use the DataFrame plotting methods, just call plt.subplot before each new plot, like this:
df = pd.DataFrame(np.random.random((6,3)))
plt.subplot(3,1,1)
df.loc[:,0].plot()
plt.subplot(3,1,2)
df.loc[:,1].plot()
plt.subplot(3,1,3)
df.loc[:,2].plot()
plt.show()
plt.close()
Or use the Axes that you create.
df = pd.DataFrame(np.random.random((6,3)))
fig = plt.figure()
ax1 = fig.add_subplot(3,1,1)
ax2 = fig.add_subplot(3,1,2)
ax3 = fig.add_subplot(3,1,3)
ax1.plot(df.loc[:,0])
ax2.plot(df.loc[:,1])
ax3.plot(df.loc[:,2])
plt.show()
plt.close()
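Applied to a 2x2 layout with mixed chart types, the second approach might look like the sketch below. The DataFrame and all of its column names are invented stand-ins for the census data. Everything is drawn either through an Axes method or through a pandas plotting method that is given ax=.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented stand-in for the census data; every column name is hypothetical
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "group": np.tile(list("abcde"), n // 5),
    "income": rng.lognormal(mean=10, sigma=1, size=n),
    "x": rng.random(n) * 1_200_000,
    "y": rng.random(n) * 10_000,
    "size": rng.random(n) * 100,
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Top left: pie chart through pandas, directed to a specific Axes with ax=
df["group"].value_counts(normalize=True).plot.pie(ax=axes[0, 0], startangle=240)
axes[0, 0].set_ylabel("")

# Top right: histogram on logarithmic bins, drawn with the Axes method
axes[0, 1].hist(df["income"], bins=np.logspace(2, 7, 30), density=True)
axes[0, 1].set_xscale("log")

# Bottom left: grouped sums as a bar chart
sums = df.groupby("group")["income"].sum() / 1000
sums.plot.bar(ax=axes[1, 0], width=0.9, color="red")
axes[1, 0].set_ylabel("income (thousands)")

# Bottom right: scatter with its own colorbar attached to the same Axes
sc = axes[1, 1].scatter(df["x"], df["y"], s=df["size"], c=df["income"], alpha=0.2)
fig.colorbar(sc, ax=axes[1, 1])
axes[1, 1].set_title("this vs that")

fig.tight_layout()
# fig.savefig("census.png") would write all four panels to one image
```

None of the panels rely on the "current axes" that plt.hist, plt.scatter and plt.title act on, which is what sent the asker's later plots to the wrong place.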

How can I plot one colorbar for all my subplots using Gridspec?

I am trying to plot a single colorbar (range: 0, max) for all the subplots that I have. I have tried the solutions in "Matplotlib 2 Subplots, 1 Colorbar", but they require using plt.subplots, which I am not using.
Here is my current code which displays 3 individual colorbars.
fig = plt.figure(figsize=(15,15))
G = gridspec.GridSpec(2, 2)
# Top
top = plt.subplot(G[0,0], projection='polar')
phi2D_grid, rho2D_grid = np.meshgrid(phi_grid, rho_grid)
plt.pcolormesh(phi2D_grid, rho2D_grid, compteur_top, cmap='jet', vmin=0, vmax=max)
plt.colorbar()
# Sides
lat = plt.subplot(G[1,:])
phi2D_grid, z2D_grid = np.meshgrid(phi_grid, z_grid)
plt.pcolormesh(phi2D_grid, z2D_grid, compteur_lat, cmap='jet', vmin=0, vmax=max)
plt.colorbar()
# Bottom
bot = plt.subplot(G[0, 1], projection='polar')
phi2D_grid, rho2D_grid = np.meshgrid(phi_grid, rho_grid)
plt.pcolormesh(phi2D_grid, rho2D_grid, compteur_bot, cmap='jet', vmin=0, vmax=max)
plt.colorbar()
I think I do need to use Gridspec here, given that I need to display my figures as follows:
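No answer is recorded for this question here. One approach that should work with GridSpec is to keep a handle to one of the meshes and pass all three Axes to fig.colorbar. The sketch below is a guess at the setup, not the asker's code: the counts_* arrays are random stand-ins for compteur_top, compteur_lat and compteur_bot, and vmax replaces the variable named max so the Python builtin is not shadowed.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import gridspec

# Random stand-ins for the asker's grids and counters (assumed shapes)
rng = np.random.default_rng(0)
phi_grid = np.linspace(0, 2 * np.pi, 37)  # 37 edges give 36 cells
rho_grid = np.linspace(0, 1, 11)          # 11 edges give 10 cells
z_grid = np.linspace(0, 2, 21)            # 21 edges give 20 cells
counts_top = rng.integers(0, 10, size=(10, 36))
counts_bot = rng.integers(0, 10, size=(10, 36))
counts_lat = rng.integers(0, 10, size=(20, 36))
vmax = max(counts_top.max(), counts_bot.max(), counts_lat.max())

fig = plt.figure(figsize=(8, 8))
G = gridspec.GridSpec(2, 2, figure=fig)
top = fig.add_subplot(G[0, 0], projection="polar")
bot = fig.add_subplot(G[0, 1], projection="polar")
lat = fig.add_subplot(G[1, :])

phi2d, rho2d = np.meshgrid(phi_grid, rho_grid)
phi2d_z, z2d = np.meshgrid(phi_grid, z_grid)

# Identical cmap, vmin and vmax, so one colorbar is valid for all three panels
style = dict(cmap="jet", vmin=0, vmax=vmax)
top.grid(False)
bot.grid(False)
mesh = top.pcolormesh(phi2d, rho2d, counts_top, **style)
bot.pcolormesh(phi2d, rho2d, counts_bot, **style)
lat.pcolormesh(phi2d_z, z2d, counts_lat, **style)

# A list of Axes makes the colorbar take its space from all of them
cbar = fig.colorbar(mesh, ax=[top, bot, lat])
# call plt.show() or fig.savefig(...) to view the figure
```

The single colorbar is only meaningful because every pcolormesh call uses the same vmin and vmax, as in the question.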

Plotting grids across the subplots Python matplotlib

I have tried the following:
d = [1,2,3,4,5,6,7,8,9]
f = [0,1,0,0,1,0,1,1,0]
fig = plt.figure()
fig.set_size_inches(30,10)
ax1 = fig.add_subplot(211)
line1 = ax1.plot(d,marker='.',color='b',label="1 row")
ax2 = fig.add_subplot(212)
line1 = ax2.plot(f,marker='.',color='b',label="1 row")
ax1.grid()
ax2.grid()
plt.show()
I got the following output:
But I was expecting the following output:
How can I get the grid lines to run across the two plots?
There is no built-in option to create inter-subplot grids. In this case I'd say an easy option is to create a third axes in the background with the same grid in x direction, such that the gridline can be seen in between the two subplots.
import matplotlib.pyplot as plt
d = [1,2,3,4,5,6,7,8,9]
f = [0,1,0,0,1,0,1,1,0]
fig, (ax1,ax2) = plt.subplots(nrows=2, sharex=True)
ax3 = fig.add_subplot(111, zorder=-1)
for _, spine in ax3.spines.items():
    spine.set_visible(False)
ax3.tick_params(labelleft=False, labelbottom=False, left=False, right=False)
# share the x-axis with ax1 (get_shared_x_axes().join() is deprecated since matplotlib 3.6)
ax3.sharex(ax1)
ax3.grid(axis="x")
line1 = ax1.plot(d, marker='.', color='b', label="1 row")
line1 = ax2.plot(f, marker='.', color='b', label="1 row")
ax1.grid()
ax2.grid()
plt.show()
Here is my solution:
import matplotlib.pyplot as plt
x1 = [1,2,3,4,5,6,7,8,9]
x2= [0,1,0,0,1,0,1,1,0]
x3= range(-10,0)
# frameon=False removes frames
# fig, (ax1,ax2, ax3) = plt.subplots(nrows=3, sharex=True, subplot_kw=dict(frameon=False))
fig, (ax1,ax2, ax3) = plt.subplots(nrows=3, sharex=True)
# remove vertical gap between subplots
plt.subplots_adjust(hspace=.0)
ax1.grid()
ax2.grid()
ax3.grid()
ax1.plot(x1)
ax2.plot(x2)
ax3.plot(x3)
Without frames, using subplot_kw=dict(frameon=False):
Another option is to create a single plot and offset the data, so that one series is plotted above the other.
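A minimal sketch of that offset idea, using the d and f lists from the question. The gap value and the way the offset is computed are my own assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
f = np.array([0, 1, 0, 0, 1, 0, 1, 1, 0])

# Shift d upward so its lowest point sits `gap` above the highest point of f
gap = 1  # assumed spacing between the two series
offset = f.max() - d.min() + gap
d_shifted = d + offset

fig, ax = plt.subplots()
ax.plot(f, marker=".", color="b", label="f")
ax.plot(d_shifted, marker=".", color="g", label=f"d (shifted by {offset})")
# A single Axes means the grid lines are continuous across both series
ax.grid()
ax.legend()
# call plt.show() or fig.savefig(...) to view the figure
```

The trade-off is that the y tick labels no longer show the original values of d, so they would need relabelling or a note in the legend, as done here.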
