I am plotting from a pandas dataframe with commands like
fig1 = plt.hist(dataset_1[dataset_1>-1.0],bins=bins,alpha=0.75,label=label1,density=True)
and the plots comprise multiple histograms on one canvas. Since each histogram is normalised to its own integral (hence the histograms have the same area, because the purpose of the histograms is to illustrate the shape of the datasets rather than their relative sizes), the numbers on the y axis are not meaningful. For now, I am suppressing y axis labelling using
axes.set_ylabel("(Normalised to unity)")
axes.get_yaxis().set_ticks([])
Is there a way of adjusting the scaling of the y axis such that "1" corresponds to the highest value on any histogram? This would display a vertical scale to guide the eye and with which to judge the relative values of different bins. In essence, I mean re-normalising the maximum displayed y value without affecting the scaling of the histograms (i.e. decoupling the axis scale from what it represents).
You have two options:
Draw the histogram, then adjust the y-axis ticks.
You can place a y tick at the height of the tallest bar and label it 1 afterwards.
import numpy as np; np.random.seed(1)
import matplotlib.pyplot as plt
a = np.random.rayleigh(scale=3, size=2000)
hist, edges,_ = plt.hist(a, ec="k")
plt.yticks([0,hist.max()], [0,1])
plt.show()
Normalize the histogram, then draw it to scale.
You can normalize the histogram the way you want by first calculating it, dividing it by its maximum, and then drawing the result as a bar plot.
import numpy as np; np.random.seed(1)
import matplotlib.pyplot as plt
a = np.random.rayleigh(scale=3, size=2000)
hist, edges = np.histogram(a)
hist = hist/float(hist.max())
plt.bar(edges[:-1], hist, width=np.diff(edges), align="edge", ec="k")  # left edges, so bars line up with the bins
plt.yticks([0,1])
plt.show()
The output in both cases would be the same:
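The question involves several histograms on one canvas, so the tick relabelling from the first option can be extended to use the tallest bar across all of them. Here is a minimal sketch, assuming two made-up datasets and shared bin edges (dataset_1, dataset_2 and bins below are illustrative stand-ins, not the asker's data):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Illustrative stand-ins for the asker's datasets
dataset_1 = rng.rayleigh(scale=3, size=2000)
dataset_2 = rng.normal(loc=6, scale=2, size=500)
bins = np.linspace(0, 15, 31)

fig, ax = plt.subplots()
peaks = []
for data, label in [(dataset_1, "dataset 1"), (dataset_2, "dataset 2")]:
    # density=True normalises each histogram to unit area
    heights, _, _ = ax.hist(data, bins=bins, alpha=0.75, label=label, density=True)
    peaks.append(heights.max())

# Relabel the axis so that "1" sits at the tallest bar of any histogram
global_max = max(peaks)
tick_fractions = np.linspace(0, 1, 5)
ax.set_yticks(tick_fractions * global_max)
ax.set_yticklabels([f"{f:g}" for f in tick_fractions])
ax.set_ylabel("Relative to tallest bin")
ax.legend()
```

The bars themselves are untouched; only the tick positions and labels change. Call plt.show() or fig.savefig(...) to render the figure.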
I want to create a 2D joint plot with the following data and from what I've read Seaborn is the best solution for this
I have completed a desired 1_D line plot, and have attempted to create the joint plot in Seaborn by putting the equations for each plot in the respective axes.
I am expecting the plot on the x axis to look similar to the plot I created using matplotlib and therefore the jointplot should have some vertical lines through the circular region.
However the plot output from seaborn on the x axis appears to have smoothed out many of the data points desired giving a smooth curve.
From reading about Seaborn it may not fit my needs for this kind of data, I have attempted using a matrix also but it did not seem to work with Seaborn.
This is the code I used
#imported as required
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
#Set limits for y values (x - axis)
ymin=-.6
ymax= .6
#Set up an array of angle values between defined y values in mm
angle = np.linspace(np.deg2rad(ymin), np.deg2rad(ymax), 1000)
# Define known values
L = 480
a = 0.09
d = 0.4
lam = 670e-6
# Calculate values for position y, alpha and beta
y = np.tan(angle)*L
alpha = (np.pi*a/lam)*np.sin(angle)
beta = (np.pi*d/lam)*np.sin(angle)
I = ((np.sin(alpha)/alpha)**2)*((np.cos(beta))**2)
# Plot the graph of intensity versus displacement
plt.plot(y, I)
import seaborn as sns
p = ((np.sin(alpha)/alpha)**2)*((np.cos(beta))**2) # Interference term and decaying term
q = (np.sin(alpha)/alpha)**2 # Decaying term
sns.jointplot(x=p, y=q, kind='kde',marginal_kws=dict(bw=0.6),bw=0.8)
plt.show()
You might recognize this as the famous double-slit experiment.
These are the outputs. Note the smooth plot on Seaborn x axis
edit: I have used JointGrid as follows to plot on the axes in an attempt to solve the problem
g = sns.JointGrid(x=p, y=q)
g.plot_joint(sns.kdeplot)
g.plot_marginals(sns.kdeplot)
I am not familiar with Seaborn syntax, so this simple snippet is all I could get to give an output, which had the same problem as my initial attempt.
When I draw a distplot for discrete variables, the distribution might not look the way I expect. For example:
We can see that there are gaps in the bar plot, so the KDE curve sits "lower" on the y axis.
In my work, it was even worse:
I think this may be because the "width" or "weight" was not 1 for each bar, but I didn't find any parameter that can adjust it.
I'd like to draw a curve like this (it should be smoother)
One way to deal with this problem might be to adjust the "bandwidth" of the KDE (see the documentation for seaborn.kdeplot())
n = np.round(np.random.normal(5,2,size=(10000,)))
sns.distplot(n, kde_kws={'bw':1})
EDIT Here is an alternative with a different scale for the bars and the KDE
n = np.round(np.random.normal(5,2,size=(10000,)))
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
sns.distplot(n, kde=False, ax=ax1)
sns.distplot(n, hist=False, ax=ax2, kde_kws={'bw':1})
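To see why the bandwidth matters for rounded data, here is a minimal hand-rolled Gaussian KDE in NumPy. It is a sketch for illustration only, not seaborn's implementation: gaussian_kde_1d is a hypothetical helper, and its bandwidth is an absolute kernel width that may not map one-to-one onto seaborn's bw setting.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels of width `bandwidth` centred on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
n = np.round(rng.normal(5, 2, size=10000))  # integer-valued data, as in the answer

# Evaluate the estimate on two integers and the midpoint between them
grid = np.array([5.0, 5.5, 6.0])
narrow = gaussian_kde_1d(n, grid, bandwidth=0.2)  # kernel much narrower than the spacing
wide = gaussian_kde_1d(n, grid, bandwidth=1.0)    # kernel comparable to the spacing
```

With the narrow kernel the estimate at 5.5 falls well below its neighbours, because almost no kernel mass reaches the gap between integers. With the wider kernel the dip disappears, which is the effect the larger bw has in the plots above.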
If the problem is that there are some empty bins in the histogram, it probably makes sense to specify the bins to match the data. In this case, use bins=np.arange(0,16) to get one bin for each integer in the data.
import numpy as np; np.random.seed(1)
import matplotlib.pyplot as plt
import seaborn as sns
n = np.random.randint(0,15,10000)
sns.distplot(n, bins=np.arange(0,16), hist_kws=dict(ec="k"))
plt.show()
It seems sns.distplot (or displot https://seaborn.pydata.org/generated/seaborn.displot.html) is for plotting histograms, not bar plots. Both the histogram and the KDE (which is an approximation of the probability density function) make sense only for continuous random variables.
So in your case, since you'd like to plot the distribution of a discrete random variable, you should use a bar plot and plot the probability mass function (PMF) instead.
import numpy as np
import matplotlib.pyplot as plt
array = np.random.randint(15, size=10000)
unique, counts = np.unique(array, return_counts=True)
freq = counts / array.size # convert counts to relative frequencies
# plotting the points
plt.bar(unique, freq)
# naming the x axis
plt.xlabel('Value')
# naming the y axis
plt.ylabel('Frequency')
#Title
plt.title("Discrete uniform distribution")
# function to show the plot
plt.show()
Here is the histogram
To generate this plot, I did:
bins = np.array([0.03, 0.3, 2, 100])
plt.hist(m, bins = bins, weights=np.zeros_like(m) + 1. / m.size)
However, as you noticed, I want to plot the histogram of the relative frequency of each data point with only 3 bins that have different sizes:
bin1 = 0.03 -> 0.3
bin2 = 0.3 -> 2
bin3 = 2 -> 100
The histogram looks ugly since the size of the last bin is extremely large relative to the other bins. How can I fix the histogram? I want to change the width of the bins but I do not want to change the range of each bin.
As @cel pointed out, this is no longer a histogram, but you can do what you are asking using plt.bar and np.histogram. You then just need to set the xticklabels to a string describing the bin edges. For example:
import numpy as np
import matplotlib.pyplot as plt
bins = [0.03,0.3,2,100] # your bins
data = [0.04,0.07,0.1,0.2,0.2,0.8,1,1.5,4,5,7,8,43,45,54,56,99] # random data
hist, bin_edges = np.histogram(data,bins) # make the histogram
fig,ax = plt.subplots()
# Plot the histogram heights against integers on the x axis
ax.bar(range(len(hist)),hist,width=1)
# Set the ticks to the middle of the bars
ax.set_xticks([0.5+i for i,j in enumerate(hist)])
# Set the xticklabels to a string that tells us what the bin edges were
ax.set_xticklabels(['{} - {}'.format(bins[i],bins[i+1]) for i,j in enumerate(hist)])
plt.show()
EDIT
If you update to matplotlib v1.5.0, you will find that bar now takes a kwarg tick_label, which can make this plotting even easier (see here):
hist, bin_edges = np.histogram(data,bins)
ax.bar(range(len(hist)),hist,width=1,align='center',tick_label=
['{} - {}'.format(bins[i],bins[i+1]) for i,j in enumerate(hist)])
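The question asked for relative frequencies rather than raw counts. A minimal sketch of the same equal-width bar approach with the heights divided by the number of data points, reusing the example data above (the variable names rel_freq and labels are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

bins = [0.03, 0.3, 2, 100]
data = [0.04, 0.07, 0.1, 0.2, 0.2, 0.8, 1, 1.5, 4, 5, 7, 8, 43, 45, 54, 56, 99]

counts, _ = np.histogram(data, bins)
# Divide by the total number of points, mirroring the 1/m.size weights in the question
rel_freq = counts / len(data)
labels = [f"{lo} - {hi}" for lo, hi in zip(bins[:-1], bins[1:])]

fig, ax = plt.subplots()
# Equal-width bars at integer positions, labelled with the real bin ranges
ax.bar(range(len(rel_freq)), rel_freq, width=1, align="center",
       tick_label=labels, ec="k")
ax.set_ylabel("Relative frequency")
```

Because every value in this example falls inside the bins, the bar heights sum to 1. Values outside the outer edges would be dropped by np.histogram and the sum would fall below 1.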
If the actual values of your bin edges are not important but you want a histogram of values spanning very different orders of magnitude, you can use logarithmic scaling along the x axis. This gives you bars of equal width:
import numpy as np
import matplotlib.pyplot as plt
data = [0.04,0.07,0.1,0.2,0.2,0.8,1,1.5,4,5,7,8,43,45,54,56,99]
plt.hist(data,bins=10**np.linspace(-2,2,5))
plt.xscale('log')
plt.show()
If you have to use your own bin edges, you can do:
import numpy as np
import matplotlib.pyplot as plt
data = [0.04,0.07,0.1,0.2,0.2,0.8,1,1.5,4,5,7,8,43,45,54,56,99]
bins = [0.03,0.3,2,100]
plt.hist(data,bins=bins)
plt.xscale('log')
plt.show()
However, in this case the widths are not perfectly equal, though the plot is still readable. If the widths must be equal and you have to use your bins, I recommend @tom's solution.
How can I bin 3d points into 3d bins? Is there a multi dimensional version for np.digitize?
I can use np.digitize separately for each dimension, like here. Is there a better solution?
Thanks!
You can do this with numpy.histogramdd(sample), where the number of bins in each direction and the physical range can be adjusted as with a 1D histogram. More info on the reference page. For more general statistics, such as the mean of another variable over the points in each bin, you can use SciPy's scipy.stats.binned_statistic_dd function; see the docs.
For your case with an array of three dimensional points, you would use this in the following way,
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from scipy import stats
#Setup some dummy data
points = np.random.randn(1000,3)
hist, binedges = np.histogramdd(points, density=False)
#Setup a 3D figure and plot points as well as a series of slices
fig = plt.figure()
ax1 = fig.add_subplot(111, projection='3d')
ax1.plot(points[:,0],points[:,1],points[:,2],'k.',alpha=0.3)
#Use one less than bin edges to give rough bin location
X, Y = np.meshgrid(binedges[0][:-1],binedges[1][:-1])
#Loop over range of slice locations (default histogram uses 10 bins)
for ct in [0,2,5,7,9]:
    # Transpose so rows follow y and columns follow x, matching np.meshgrid
    cs = ax1.contourf(X,Y,hist[:,:,ct].T,
                      zdir='z',
                      offset=binedges[2][ct],
                      levels=100,
                      cmap=plt.cm.RdYlBu_r,
                      alpha=0.5)
ax1.set_xlim(-3, 3)
ax1.set_ylim(-3, 3)
ax1.set_zlim(-3, 3)
plt.colorbar(cs)
plt.show()
which gives a series of histogram slices of occupancy at each location,
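If what you need is the bin index of each point (the multi-dimensional analogue of np.digitize) rather than the counts, one option is to digitize each axis against the edges that histogramdd returns. A minimal sketch, assuming dummy data and an arbitrary bin count of (4, 5, 6); the names idx, flat and rebuilt are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 3))  # dummy data

counts, edges = np.histogramdd(points, bins=(4, 5, 6))

# Digitize each axis against the interior edges so indices run 0..nbins-1
idx = np.stack(
    [np.digitize(points[:, d], edges[d][1:-1]) for d in range(points.shape[1])],
    axis=1,
)

# Cross-check: counting the per-point indices reproduces the histogram
flat = np.ravel_multi_index(tuple(idx.T), counts.shape)
rebuilt = np.bincount(flat, minlength=counts.size).reshape(counts.shape)
```

Each row of idx holds the (x, y, z) bin of the corresponding point. Leaving out the outermost edges means the maximum value along each axis lands in the last bin, which matches how histogramdd treats its closed right edge.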
I have a problem changing my axis labels in Matplotlib. I want to change the radial axis options in my Polar Plot.
Basically, I'm computing the distortion of a cylinder, which is nothing but how much the radius deviates from the original (perfectly circular) cylinder. Some of the distortion values are negative, while some are positive due to tensile and compressive forces. I'm looking for a way to represent this in cylindrical coordinates graphically, so I thought that a polar plot was my best bet. Excel gives me a 'radar chart' option which is flexible enough to let me specify minimum and maximum radial axis values. I want to replicate this in Python using Matplotlib.
My Python script for plotting on polar coordinates is as follows.
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
offset = 2.0
R1 = [-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358]
fig1 = plt.figure()
ax1 = fig1.add_axes([0.1,0.1,0.8,0.8],polar=True)
ax1.set_rmax(1)
ax1.plot(theta,R1,lw=2.5)
My plot looks as follows:
But this is not how I want to present it. I want to vary my radial axis, so that I can show the data as a deviation from some reference value, say -2. How do I ask Matplotlib in polar coordinates to change the minimum axis label? I can do this VERY easily in Excel. I choose a minimum radial value of -2, to get the following Excel radar chart:
In Python, I can easily offset my input data by a magnitude of 2. My new dataset is called R2, as shown:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
offset = 2.0
R2 = [1.642,1.517,1.521,1.654,1.879,2.137,2.358,2.483,2.479,2.346,2.121,1.863,\
1.642,1.517,1.521,1.654,1.879,2.137,2.358,2.483,2.479,2.346,2.121,1.863,1.642,\
1.517,1.521,1.654,1.879,2.137,2.358,2.483,2.479,2.346,2.121,1.863,1.642]
fig2 = plt.figure()
ax2 = fig2.add_axes([0.1,0.1,0.8,0.8],polar=True)
ax2.plot(theta,R2,lw=2.5)
ax2.set_rmax(1.5*offset)
plt.show()
The plot is shown below:
Once I get this, I can MANUALLY add axis labels and hard-code it into my script. But this is a really ugly way. Is there any way I can directly get a Matplotlib equivalent of the Excel radar chart and change my axis labels without having to manipulate my input data?
You can just use the normal way of setting axis limits:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
offset = 2.0
R1 = [-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358,-0.483,-0.479,-0.346,-0.121,0.137,0.358,0.483,0.479,0.346,0.121,\
-0.137,-0.358]
fig1 = plt.figure()
ax1 = fig1.add_axes([0.1,0.1,0.8,0.8],polar=True)
ax1.set_ylim(-2,2)
ax1.set_yticks(np.arange(-2,2,0.5))
ax1.plot(theta,R1,lw=2.5)
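Polar axes also have radial-specific setters that do the same job as set_ylim. A minimal sketch, assuming a reasonably recent Matplotlib and using a synthetic three-lobed curve as a stand-in for the measured R1 values:

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.deg2rad(np.arange(-180.0, 190.0, 10))
# Synthetic stand-in for the measured distortion (three lobes, about +/-0.48)
r = 0.48 * np.sin(3 * theta)

fig = plt.figure()
ax = fig.add_subplot(111, projection="polar")
ax.plot(theta, r, lw=2.5)
ax.set_rmin(-2.0)  # radius shown at the centre of the plot
ax.set_rmax(1.0)   # radius shown at the outer edge
ax.set_rticks([-2, -1, 0, 1])
```

With the radial minimum at -2, negative distortions are drawn inside the zero circle and positive ones outside it, without offsetting the input data.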