Superimposition of histogram and density in Pandas/Matplotlib in Python


I've got a Pandas dataframe named clean which contains a column v for which I would like to draw a histogram and superimpose a density plot. I know I can plot one under the other this way:
import pandas as pd
import matplotlib.pyplot as plt
Maxv=200
plt.subplot(211)
plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")
plt.subplot(212)
ax=clean['v'].plot(kind='density')
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)
But when I try to superimpose them, the y scales don't match (and I lose the y-axis ticks and labels):
yhist, xhist, _hist = plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")
ax=clean['v'].plot(kind='density') #I would like to insert here a normalization to max(yhist)/max(ax)
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)
Any hints? (Additional question: how can I change the width of the density smoothing?)

Based on your code, this should work:
ax = clean.v.plot(kind='hist', bins=40, density=True)  # density=True (formerly normed=True) scales the bars to a probability density
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
ax.set(xlim=[0, Maxv])
You might not even need the secondary_y anymore.
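A minimal runnable sketch of that idea, under stated assumptions: the asker's `clean` DataFrame isn't available, so a hypothetical gamma-distributed column `v` stands in for it, and the histogram is drawn with Matplotlib's `ax.hist` directly so that `range=` is honoured. With `density=True` both curves share one y scale, so `secondary_y` should not be needed. The `bw_method` argument of the pandas KDE plot (forwarded to `scipy.stats.gaussian_kde`, so SciPy must be installed) addresses the smoothing-width question.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt

# Hypothetical stand-in for the asker's `clean` DataFrame (velocities in km/s)
rng = np.random.default_rng(0)
clean = pd.DataFrame({"v": rng.gamma(shape=4.0, scale=20.0, size=2000)})
Maxv = 200

fig, ax = plt.subplots()
# density=True rescales the bars so their total area is 1, the same scale as a KDE
ax.hist(clean["v"], bins=40, range=(0, Maxv), density=True, color="g", alpha=0.5)
# Draw the KDE on the same axes; a scalar bw_method is used as the kernel width
# factor, so smaller values mean less smoothing
clean["v"].plot(kind="kde", ax=ax, bw_method=0.2, color="k")
ax.set_xlim(0, Maxv)
ax.set_xlabel("Orbital velocity (km/s)")
ax.set_ylabel("Density")
```

Trying values such as 0.1 and 0.5 for `bw_method` shows the curve sharpening or smoothing out.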

Now I tried this:
ax = clean.v.plot(kind='hist', bins=40, range=(0, Maxv))
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
But the range part doesn't work, and there's still the left y-axis problem.
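If the left axis should keep showing counts (the normalization hinted at in the question's code comment), another option is to scale the density rather than the bars: a density f(x) times the number of samples N times the bin width Δ approximates the expected count in a bin. The sketch below assumes hypothetical data and uses a small hand-written helper, `gaussian_kde_1d` (not a library function), whose default bandwidth is intended to match SciPy's default Scott factor in one dimension.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt

# Hypothetical stand-in for clean['v'] (velocities in km/s)
rng = np.random.default_rng(1)
v = rng.gamma(shape=4.0, scale=20.0, size=2000)
Maxv = 200
nbins = 40
bin_width = Maxv / nbins  # 5 km/s per bin


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples` evaluated on `grid`.

    Hypothetical helper. The default bandwidth is std(ddof=1) * n**(-1/5),
    i.e. Scott's factor in one dimension.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


fig, ax = plt.subplots()
counts, edges, _ = ax.hist(v, bins=nbins, range=(0, Maxv), color="g")
xs = np.linspace(0, Maxv, 400)
# density * N * bin width ~ expected number of samples per bin
expected_counts = gaussian_kde_1d(v, xs) * v.size * bin_width
ax.plot(xs, expected_counts, color="k")
ax.set_xlim(0, Maxv)
ax.set_xlabel("Orbital velocity (km/s)")
ax.set_ylabel("Number")
```

Because the curve is rescaled to counts, a single y-axis with tick labels is kept and no second axis is required.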

Seaborn makes this easy (note that in seaborn 0.11 and later, distplot is deprecated in favour of histplot):
import seaborn as sns
sns.distplot(df['numeric_column'],bins=25)

Related

How to plot a paired histogram using seaborn

I would like to make a paired histogram like the one shown here using the seaborn distplot.
This kind of plot can also be referred to as the back-to-back histogram shown here, or a bihistogram inverted/mirrored along the x-axis as discussed here.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
green = np.random.normal(20,10,1000)
blue = np.random.poisson(60,1000)
fig, ax = plt.subplots(figsize=(8,6))
sns.distplot(blue, hist=True, kde=True, hist_kws={'edgecolor':'black'}, kde_kws={'linewidth':2}, bins=10, color='blue')
sns.distplot(green, hist=True, kde=True, hist_kws={'edgecolor':'black'}, kde_kws={'linewidth':2}, bins=10, color='green')
ax.set_xticks(np.arange(-20,121,20))
ax.set_yticks(np.arange(0.0,0.07,0.01))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()
Here is the output:
When I use the method discussed here (plt.barh), I get the bar plot shown just below, which is not what I am looking for.
Or maybe I haven't understood the workaround well enough...
A simple/short implementation with seaborn's distplot, similar to these kinds of plots, would be perfect. I edited the figure of my first plot above to show the kind of plot I hope to achieve (though with the y-axis not upside down):
Any leads would be greatly appreciated.

You could use two subplots and invert the y-axis of the lower one and plot with the same bins.
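import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'a': np.random.normal(0, 5, 1000), 'b': np.random.normal(20, 5, 1000)})
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
bins = np.arange(-20, 40)  # identical bins for both histograms
ax.hist(df['a'], bins=bins)
ax2.hist(df['b'], color='orange', bins=bins)
ax2.invert_yaxis()  # mirror the lower histogram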
Edit: improvements suggested by @mwaskom:
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(5, 5))
bins = np.arange(-20, 40)
for ax, column, color, invert in zip(axes.ravel(), df.columns, ['teal', 'orange'], [False, True]):
    ax.hist(df[column], bins=bins, color=color)
    if invert:
        ax.invert_yaxis()
plt.subplots_adjust(hspace=0)  # remove the gap so both histograms meet at the shared x-axis
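A single-axes variant, sketched with Matplotlib only (no seaborn and no KDE curves): compute both histograms with `np.histogram` on shared bin edges and draw the second one with negated heights. The data mirrors the question's `green`/`blue` arrays, and the tick formatter prints absolute values so the lower half reads as positive densities.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

rng = np.random.default_rng(2)
green = rng.normal(20, 10, 1000)
blue = rng.poisson(60, 1000)

bins = np.arange(-20, 121, 10)  # shared bin edges for both series
green_density, _ = np.histogram(green, bins=bins, density=True)
blue_density, _ = np.histogram(blue, bins=bins, density=True)
widths = np.diff(bins)

fig, ax = plt.subplots(figsize=(8, 6))
# Upper half: blue bars with positive heights
ax.bar(bins[:-1], blue_density, width=widths, align="edge",
       color="blue", edgecolor="black")
# Lower half: green bars mirrored below zero
ax.bar(bins[:-1], -green_density, width=widths, align="edge",
       color="green", edgecolor="black")
ax.axhline(0, color="black", linewidth=1)
# Label ticks with absolute values so both halves read as densities
ax.yaxis.set_major_formatter(FuncFormatter(lambda t, pos: f"{abs(t):.2f}"))
```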

Here is a possible approach using seaborn's distplot.
Seaborn doesn't return the created graphical elements, but the ax can be interrogated. To make sure the ax only contains the elements you want upside down, those elements can be drawn first. Then all the patches (the rectangular bars) and the lines (the KDE curve) can have their heights negated. Optionally, the x-axis can be placed at y == 0 using ax.spines['bottom'].set_position('zero').
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
green = np.random.normal(20, 10, 1000)
blue = np.random.poisson(60, 1000)
fig, ax = plt.subplots(figsize=(8, 6))
sns.distplot(green, hist=True, kde=True, hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 2}, bins=10,
             color='green')
for p in ax.patches: # turn the histogram upside down
p.set_height(-p.get_height())
for l in ax.lines: # turn the kde curve upside down
l.set_ydata(-l.get_ydata())
sns.distplot(blue, hist=True, kde=True, hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 2}, bins=10,
             color='blue')
ax.set_xticks(np.arange(-20, 121, 20))
ax.set_yticks(np.arange(0.0, 0.07, 0.01))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
pos_ticks = np.array([t for t in ax.get_yticks() if t > 0])
ticks = np.concatenate([-pos_ticks[::-1], [0], pos_ticks])
ax.set_yticks(ticks)
ax.set_yticklabels([f'{abs(t):.2f}' for t in ticks])
ax.spines['bottom'].set_position('zero')
plt.show()

Matplotlib graph expand the x axis

I have the following code:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
x_values = [2**6,2**7,2**8,2**9,2**10,2**12]
y_values_ST = [7.3,15,29,58,117,468]
y_values_S3 = [2.3,4.6,9.1,19,39,156]
xticks=['2^6','2^7','2^8','2^9','2^10','2^12']
plt.plot(x_values, y_values_ST,'-gv')
plt.plot(x_values, y_values_S3,'-r+')
plt.legend(['ST','S^3'], loc='upper left')
plt.xticks(x_values,xticks)
fig.suptitle('Encrypted Query Size Overhead')
plt.xlabel('Query size')
plt.ylabel('Size in KB')
plt.grid()
fig.savefig('token_size_plot.pdf')
plt.show()
1) How can I remove the gap after 2^12?
2) How can I spread out the values on the x-axis so that the first two values don't overlap?

1) How can I remove the gap after 2^12?
Set the limits explicitly, e.g.:
plt.xlim(2**5.8, 2**12.2)
2) How can I spread out the values on the x-axis so that the first two values don't overlap?
You seem to want a log plot. Use pyplot.semilogx(), or set a log scale on the x-axis (base 2 seems appropriate in your case):
plt.xscale('log', base=2)  # Matplotlib < 3.3 spelled this basex=2
Note that in this case you don't even have to set the 2^* ticks manually; they are created automatically.
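Putting both suggestions together, a sketch using the question's data with a base-2 log x-axis and explicit limits (the `base=` keyword assumes Matplotlib 3.3 or newer):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt

x_values = [2**6, 2**7, 2**8, 2**9, 2**10, 2**12]
y_values_ST = [7.3, 15, 29, 58, 117, 468]
y_values_S3 = [2.3, 4.6, 9.1, 19, 39, 156]

fig, ax = plt.subplots()
ax.plot(x_values, y_values_ST, '-gv', label='ST')
ax.plot(x_values, y_values_S3, '-r+', label='S^3')
# Base-2 log scale: powers of two are evenly spaced and ticked as 2^n
ax.set_xscale('log', base=2)
# Pad just slightly past the first and last points instead of the default margins
ax.set_xlim(2**5.8, 2**12.2)
ax.legend(loc='upper left')
fig.suptitle('Encrypted Query Size Overhead')
ax.set_xlabel('Query size')
ax.set_ylabel('Size in KB')
ax.grid(True)
```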

1. Use autoscale with tight=True on the x-axis (or, alternatively, plt.axis('tight') for both axes). 2. Use a log-scaled x-axis. Code below:
import matplotlib.pyplot as plt
fig = plt.figure()
x_values = [2**6,2**7,2**8,2**9,2**10,2**12]
y_values_ST = [7.3,15,29,58,117,468]
y_values_S3 = [2.3,4.6,9.1,19,39,156]
xticks=['2^6','2^7','2^8','2^9','2^10','2^12']
ax = plt.gca()
ax.set_xscale('log')
plt.plot(x_values, y_values_ST,'-gv')
plt.plot(x_values, y_values_S3,'-r+')
plt.legend(['ST','S^3'], loc='upper left')
plt.xticks(x_values,xticks)
fig.suptitle('Encrypted Query Size Overhead')
plt.xlabel('Query size')
plt.ylabel('Size in KB')
plt.autoscale(enable=True, axis='x', tight=True)#plt.axis('tight')
plt.grid()
fig.savefig('token_size_plot.pdf')
plt.show()

No color when I make python scatter color plot using third variable to define color

I am trying to make a colorful scatter plot using a third variable to define the color. It is simple with the following code:
plt.scatter(mH, mA, s=1, c=mHc)
plt.colorbar()
plt.show()
But that leaves me few options for adjusting the frame of the plot. So I am trying the following code, which makes the colored scatter plot and at the same time customizes the frame:
import numpy as np
import math
from matplotlib import rcParams
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
fig, ax = plt.subplots()
cax = ax.scatter(mH,mA,s=0.5,c=mHc) ### mH, mA, mHC are the dataset
fig.colorbar(cax)
minor_locator1 = AutoMinorLocator(6)
minor_locator2 = AutoMinorLocator(6)
ax.xaxis.set_minor_locator(minor_locator1)
ax.yaxis.set_minor_locator(minor_locator2)
ax.tick_params('both', length=10, width=2, which='major')
ax.tick_params('both', length=5, width=2, which='minor')
ax.set_xlabel(r'$m_H$')
ax.set_ylabel(r'$m_A$')
ax.set_xticks([300,600,900,1200,1500])
ax.set_yticks([300,600,900,1200,1500])
plt.savefig('mH_mA.png',bbox_inches='tight')
plt.show()
But the plot I get is black and white. The problem seems to lie in the marker size argument, but I do not know how to correct it, and I want small markers. Can anyone suggest how to approach this? Thanks.

s=0.5 is extremely small; probably all you are seeing is the marker outlines. I would suggest you increase the size a bit, and perhaps pass edgecolors="none" to turn off the marker edge stroke:
import numpy as np
from matplotlib import pyplot as plt
n = 10000
x, y = np.random.randn(2, n)
z = -(x**2 + y**2)**0.5
fig, ax = plt.subplots(1, 1)
ax.scatter(x, y, s=5, c=z, cmap="jet", edgecolors="none")
You might also want to experiment with making the points semi-transparent using the alpha= parameter:
ax.scatter(x, y, s=20, c=z, alpha=0.1, cmap="jet", edgecolors="none")
It can be difficult to get scatter plots to look nice when you have such a massive number of overlapping points. I would be tempted to plot your data as a 2D histogram or contour plot instead, or perhaps even a combination of a scatter plot and a contour plot:
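density, xe, ye = np.histogram2d(x, y, bins=20, density=True)  # density= replaces the deprecated normed=
# ax.hold(True) is no longer needed (hold was removed in Matplotlib 3.0); artists accumulate by default
ax.scatter(x, y, s=5, c=z, cmap="jet", edgecolors="none")
# histogram2d returns counts indexed [x_bin, y_bin]; contour expects [row=y, col=x], hence .T
ax.contour(0.5*(xe[:-1] + xe[1:]), 0.5*(ye[:-1] + ye[1:]), density.T,
           colors='k')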
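For the 2D-histogram option, a minimal sketch with `ax.hist2d` on synthetic data. Note that it colours cells by point count rather than by a third variable such as `mHc`, so it shows where the points are rather than what their values are.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt

n = 10000
x, y = np.random.default_rng(3).standard_normal((2, n))

fig, ax = plt.subplots()
# Each cell is coloured by how many points fall in it, so overplotting is not an issue
counts, xedges, yedges, image = ax.hist2d(x, y, bins=50, cmap="viridis")
fig.colorbar(image, ax=ax, label="points per bin")
```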

Use matplotlib: plot error bars on two y axes

I'd like to plot a series with x and y error bars, then plot a second series with x and y error bars on a second y axis all on the same subplot. Can this be done with matplotlib?
import matplotlib.pyplot as plt
plt.figure()
ax1 = plt.errorbar(voltage, dP, xerr=voltageU, yerr=dPU)
ax2 = plt.errorbar(voltage, current, xerr=voltageU, yerr=currentU)
plt.show()
Basically, I'd like to put ax2 on a second axis and have the scale on the right side.
Thanks!

twinx() is your friend for adding a secondary y-axis, e.g.:
import matplotlib.pyplot as pl
import numpy as np
pl.figure()
ax1 = pl.gca()
ax1.errorbar(np.arange(10), np.arange(10), xerr=np.random.random(10), yerr=np.random.random(10), color='g')
ax2 = ax1.twinx()
ax2.errorbar(np.arange(10), np.arange(10)+5, xerr=np.random.random(10), yerr=np.random.random(10), color='r')
There is not a lot of documentation except for:
matplotlib.pyplot.twinx(ax=None)
Make a second axes that shares the x-axis. The new axes will overlay ax (or the current axes if ax is None). The ticks for ax2 will be placed on the right, and the ax2 instance is returned.
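Applied to the question's variables, a sketch might look like this; the arrays below are hypothetical stand-ins for `voltage`, `dP`, `current` and their uncertainties. Colouring each y-axis label to match its series makes it clear which scale belongs to which data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt

# Hypothetical measurements standing in for the asker's arrays
rng = np.random.default_rng(4)
voltage = np.linspace(1, 10, 10)
voltageU = np.full_like(voltage, 0.1)
dP = 2.0 * voltage + rng.normal(0, 0.5, voltage.size)
dPU = np.full_like(voltage, 0.5)
current = 0.05 * voltage**2
currentU = 0.1 * current

fig, ax1 = plt.subplots()
ax1.errorbar(voltage, dP, xerr=voltageU, yerr=dPU, fmt='o', color='g')
ax1.set_xlabel('Voltage')
ax1.set_ylabel('dP', color='g')
ax1.tick_params(axis='y', colors='g')

ax2 = ax1.twinx()  # shares the x-axis; its y-axis is drawn on the right
ax2.errorbar(voltage, current, xerr=voltageU, yerr=currentU, fmt='s', color='r')
ax2.set_ylabel('Current', color='r')
ax2.tick_params(axis='y', colors='r')
```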

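I was struggling to get a second x-axis sharing the y-axis, but thank you @Bart, you saved me!
The simple solution is to use twiny instead of twinx:
import matplotlib.pyplot as plt
# layers, hidden_dim, scores_means, scores_stds, epoch, h and l come from the training code (not shown)
fig, ax1 = plt.subplots()
ax1.errorbar(layers, scores_means[str(epoch)][h,:],np.array(scores_stds[str(epoch)][h,:]))
# Make the x-axis label, ticks and tick labels match the line color.
ax1.set_xlabel('depth', color='b')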
ax1.tick_params('x', colors='b')
ax2 = ax1.twiny()
ax2.errorbar(hidden_dim, scores_means[str(epoch)][:,l], np.array(scores_stds[str(epoch)][:,l]))
ax2.set_xlabel('width', color='r')
ax2.tick_params('x', colors='r')
fig.tight_layout()
plt.show()

python: scatter plot logarithmic scale

In my code, I take the logarithm of two data series and plot them. I would like each tick value on the x-axis to show e raised to that value (the anti-log of the natural logarithm).
In other words, I want to graph the logarithms of both series but have the x-axis labelled in levels.
Here is the code that I'm using.
from pylab import scatter
import pylab
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
file_name = '/Users/joedanger/Desktop/Python/scatter_python.csv'
data = DataFrame(pd.read_csv(file_name))
y = np.log(data['o_value'], dtype='float64')
x = np.log(data['time_diff_day'], dtype='float64')
fig = plt.figure()
plt.scatter(x, y, c='blue', alpha=0.05, edgecolors='none')
fig.suptitle('test title', fontsize=20)
plt.xlabel('time_diff_day', fontsize=18)
plt.ylabel('o_value', fontsize=16)
plt.xticks([-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4])
plt.grid(True)
pylab.show()

Let matplotlib take the log for you (plot the raw values and set log scales; x is time_diff_day and y is o_value, as in the question):
fig = plt.figure()
ax = plt.gca()
ax.scatter(data['time_diff_day'], data['o_value'], c='blue', alpha=0.05, edgecolors='none')
ax.set_yscale('log')
ax.set_xscale('log')
If all the markers have the same size and color, it is faster to use plot:
fig = plt.figure()
ax = plt.gca()
ax.plot(data['time_diff_day'], data['o_value'], 'o', c='blue', alpha=0.05, markeredgecolor='none')
ax.set_yscale('log')
ax.set_xscale('log')
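Alternatively, to keep plotting the already-logged values as in the question and only relabel the ticks, a `FuncFormatter` can display e raised to each tick position. A sketch with hypothetical lognormal data in place of the CSV:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Hypothetical data standing in for the CSV columns
rng = np.random.default_rng(5)
time_diff_day = rng.lognormal(mean=0.0, sigma=2.0, size=500)
o_value = rng.lognormal(mean=1.0, sigma=1.0, size=500)

x = np.log(time_diff_day)
y = np.log(o_value)

fig, ax = plt.subplots()
ax.scatter(x, y, c='blue', alpha=0.05, edgecolors='none')
ax.set_xticks(np.arange(-8, 5))
# Ticks stay at the log positions, but each label shows e**tick (the level)
ax.xaxis.set_major_formatter(FuncFormatter(lambda t, pos: f"{np.exp(t):.3g}"))
```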

The accepted answer is a bit out of date: pandas (at least since 0.25) natively supports log axes:
# logarithmic X
df.plot.scatter(..., logx=True)
# logarithmic Y
df.plot.scatter(..., logy=True)
# both
df.plot.scatter(..., loglog=True)
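A concrete version of the `loglog` call, using a hypothetical DataFrame with the question's column names:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in an interactive session
import matplotlib.pyplot as plt

# Hypothetical data with the question's column names
rng = np.random.default_rng(6)
df = pd.DataFrame({
    "time_diff_day": rng.lognormal(0.0, 2.0, 500),
    "o_value": rng.lognormal(1.0, 1.0, 500),
})

# loglog=True puts both axes on a log scale; the raw values are plotted
ax = df.plot.scatter(x="time_diff_day", y="o_value", loglog=True, alpha=0.05)
```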
