I'm trying to plot my data (a 1D array) as a histogram with Matplotlib.
num_bins = [1,2,3,4,5,6,7,8,9]
n, bins, patches = plt.hist(
cluster_1bo,
num_bins,
density=False,
facecolor='g',
alpha=0.75
)
The problem is that I would like to divide the height of each bin by a given number, say 100, in order to get the average value and plot that in the histogram.
Is it possible to do this with .hist, without first counting how many times each number appears, dividing those counts by 100, and then plotting them?
Hi, you could try this:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(10)
values = np.random.randint(0, 100, 1000)
fig, ax = plt.subplots()
ax.hist(values, 50)
# change the y-axis labels: the bars stay the same, only the tick text is rescaled
y_values = ax.get_yticks()  # current y tick positions
ax.set_yticklabels(['{:}'.format(x / 100) for x in y_values])  # divide each y value by 100
plt.show()
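Another option, sketched here as an assumption rather than as part of the answer above, is to let the histogram do the division itself through the weights argument: if every sample gets a weight of 1/100, each bin height becomes count / 100. The sketch uses np.histogram so the numbers can be checked directly; plt.hist accepts the same weights argument. The sample data and the name divisor are made up for illustration.

```python
import numpy as np

# Hypothetical sample data standing in for cluster_1bo
rng = np.random.default_rng(10)
data = rng.integers(1, 9, size=1000)

num_bins = [1, 2, 3, 4, 5, 6, 7, 8, 9]
divisor = 100  # the number each bin height should be divided by

# One weight per sample: each sample contributes 1/divisor instead of 1
weights = np.full(len(data), 1.0 / divisor)

raw_counts, edges = np.histogram(data, bins=num_bins)
scaled, _ = np.histogram(data, bins=num_bins, weights=weights)

print(raw_counts)
print(scaled)  # same bins, every height divided by divisor

# To draw it: plt.hist(data, num_bins, weights=weights, facecolor='g', alpha=0.75)
```

Unlike relabeling the ticks, this changes the bar heights themselves, so the values returned by hist are already the divided ones.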
I am trying to make a normalized histogram in matplotlib, however I want it normalized such that the total area will be 1000. Is there a way to do this?
I know to get it normalized to 1, you just have to include density=True,stacked=True in the argument of plt.hist(). An equivalent solution would be to do this and multiply the height of each column by 1000, if that would be more doable than changing what the histogram is normalized to.
Thank you very much in advance!
The following approach uses np.histogram to calculate the counts for each histogram bin. With 1000 / total_count / bin_width as the normalization factor, the total area will be 1000. If you instead want the sum of all bar heights to be 1000, use a factor of 1000 / total_count.
plt.bar is used to display the end result.
The example code also draws the same combined histogram with density=True, so you can compare it with the new histogram whose total area is 1000.
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.randn(100) * 5 + 10, np.random.randn(300) * 4 + 14, np.random.randn(100) * 3 + 17]
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 4))
ax1.hist(data, stacked=True, density=True)
ax1.set_title('Histogram with density=True')
xmin = min([min(d) for d in data])
xmax = max([max(d) for d in data])
bins = np.linspace(xmin, xmax, 11)
bin_width = bins[1] - bins[0]
counts = [np.histogram(d, bins=bins)[0] for d in data]
total_count = sum([sum(c) for c in counts])
# factor = 1000 / total_count # to sum to 1000
factor = 1000 / total_count / bin_width # for an area of 1000
thousands = [c * factor for c in counts]
bottom = 0
for t in thousands:
    ax2.bar(bins[:-1], t, bottom=bottom, width=bin_width, align='edge')
    bottom += t
ax2.set_title('Histogram with total area of 1000')
plt.show()
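As a quick numeric check of the two normalization factors (a sketch with made-up data, not part of the answer above), the bar heights can be summed directly. It also confirms the equivalence mentioned in the question: multiplying the density=True heights by 1000 gives the same bars.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(10, 5, size=500)  # made-up sample

counts, bins = np.histogram(data, bins=10)
bin_width = bins[1] - bins[0]
total_count = counts.sum()

heights_area = counts * 1000 / total_count / bin_width  # total area of 1000
heights_sum = counts * 1000 / total_count               # heights sum to 1000

# Area = sum of (height * width) over all bars
area = (heights_area * bin_width).sum()

# Same bars obtained by scaling the density=True histogram
density, _ = np.histogram(data, bins=bins, density=True)
scaled_density = density * 1000

print(area, heights_sum.sum())
```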
An easy way to do this is to set up a second y-axis whose tick labels are the original multiplied by 1000, then hide the original axis' ticks:
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.randn(5000)]
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
#hist returns a tuple that contains a list of y values at its 0 index:
y,_,_ = ax1.hist(data, density=True, bins=10, edgecolor = 'black')
#scale the second y-axis to exactly 1000 times the first one, so its ticks line up with the bars:
ymin, ymax = ax1.get_ylim()
ax2.set_ylim(ymin * 1000, ymax * 1000)
#hide original y-axis ticks:
ax1.axes.yaxis.set_ticks([])
plt.show()
I'm new to Python and I'm trying to plot the degree distribution for some data. So I wrote the following function:
import collections
import matplotlib.pyplot as plt

def plotDegDistLogLog(G):
    degree_sequence = sorted([d for n, d in G.degree()], reverse=True)  # degree sequence
    degreeCount = collections.Counter(degree_sequence)
    deg, cnt = zip(*degreeCount.items())
    frac = [n/G.number_of_nodes() for n in cnt]
    fig, ax = plt.subplots()
    plt.plot(deg, frac, 'o')
    ax.set_yscale('log')
    ax.set_xscale('log')
    plt.ylabel("Fraction of nodes")
    plt.xlabel("Degree")
    plt.show()
I want to ask:
How can I create bins that grow exponentially in size?
How can I, in each bin, divide the sum of counts by the bin length?
I want to plot a line.
With hist, bins = numpy.histogram(x, bins, density=True) you can specify the bin edges explicitly, so you can choose whatever you want (for example bins = numpy.exp(numpy.arange(10))). The density argument normalizes the histogram by the total count and by the width of each bin. Alternatively, leave density off and divide each value of hist by the corresponding bin width, numpy.diff(bins).
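A minimal sketch of both questions, assuming made-up heavy-tailed data in place of the real degree sequence: the bin edges grow exponentially, and the counts in each bin are divided by the bin length. The variable names and the choice of 15 edges are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up heavy-tailed "degrees" in place of a real graph
degrees = rng.zipf(2.5, size=5000)

# Bin edges that grow exponentially: equal width on a log axis
bins = np.logspace(0, np.log10(degrees.max() + 1), 15)

counts, edges = np.histogram(degrees, bins=bins)
widths = np.diff(edges)

# Counts in each bin divided by the bin length, then by the sample size
per_width = counts / widths / counts.sum()

# density=True performs the same normalization
density, _ = np.histogram(degrees, bins=bins, density=True)

# Geometric bin centers are a natural x position for a line on log-log axes,
# e.g. plt.loglog(centers, per_width, '-')
centers = np.sqrt(edges[:-1] * edges[1:])
```

Empty bins have a value of zero, which cannot be drawn on a log axis, so you may want to mask them before plotting the line.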
I have a 2D data set and I would like to plot a 2D histogram, with each cell on the histogram representing the probability of the data point. Hence to obtain the probability, I need to normalize the histogram data so they sum to 1. Here is what I have for an example, from the 2Dhistogram documentation:
xedges = [0,1,3,5]
yedges = [0,2,3,4,6]
#create edges of bins
#create random data points
x=np.random.normal(2,1,100)
y=np.random.normal(1,1,100)
H,xedges,yedges = np.histogram2d(x,y,bins=(xedges,yedges))
#setting normed=True in histogram2d doesn't seem to do what I need
H=H.T
#weirdly histogram2d swaps the x,y axis, so transpose to restore it.
fig = plt.figure(figsize=(7,3))
plt.imshow(H,interpolation='nearest',origin='lower',extent=[xedges[0], xedges[-1],yedges[0],yedges[-1]])
plt.show()
Resulting plot
First, np.sum(H) gives something like 86. I would like each cell to represent the probability of a data point lying in that bin, so the cells should all sum to 1. Additionally, how do you plot a legend mapping the color intensity to its value with imshow?
Thank you!
Try using the density argument (called normed in older NumPy versions). Per the docs, the values in H will then be calculated as bin_count / sample_count / bin_area. So we calculate the areas of the bins and multiply them by H to get the probability for each bin.
import numpy as np
import matplotlib.pyplot as plt
# create edges of bins
xedges = [0,1,3,5]
yedges = [0,2,3,4,6]
x = np.random.normal(2, 1, 100) # create random data points
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
# area of every bin: outer product of the bin widths along x and along y
areas = np.matmul(np.array([np.diff(xedges)]).T, np.array([np.diff(yedges)]))
# density * area = probability per bin; transpose so that x runs along the horizontal axis
prob = (H*areas).T
fig = plt.figure(figsize=(7, 3))
im = plt.imshow(prob, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar(im)
plt.show()
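If only the per-cell probabilities are needed, a simpler route (a sketch, not part of the answer above) is to divide the raw counts by their total. The check at the end shows that this matches density multiplied by bin area. Note that both versions normalize over the points that fall inside the bin range, so samples outside the edges are ignored.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(2, 1, 100)  # random data points, as in the question
y = rng.normal(1, 1, 100)
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]

# Raw counts per cell
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))

# Fraction of the binned points that fall in each cell
prob = H / H.sum()

# The same values via density * bin area
D, _, _ = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
areas = np.outer(np.diff(xedges), np.diff(yedges))
prob_from_density = D * areas

print(prob.sum())
```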
The Y limits on my imshow subplot are stuck on a seemingly arbitrary range.
In this example, I'm trying to show the mean of N trials and then plot all the N trials over time as a 2d plot.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
N = 20 # number of trials
M = 3000 # number of samples in each trial
data = np.random.randn(N, M)
x = np.linspace(0,1,M) # the M samples occur in the range 0-1
# ie the sampling rate is 3000 samples per second
f, (ax1, ax2) = plt.subplots(2,1, sharex=True)
ax1.plot(x, np.mean(data, 0))
ax2.imshow(data, cmap="inferno", interpolation="nearest", extent=[0, 1, 0, N])
ax2.set_ylim(0, N)
ax1.set_ylabel("mean over trials")
ax2.set_ylabel("trial")
ax2.set_xlabel("time")
Are there any tricks to set the Y limits correctly?
By default, imshow uses an equal aspect ratio.
Since your x-axis is fixed to the extent of the plot above (ax1), which is 1, the y-axis can only extend to a fraction of 1.
The solution is actually quite simple: You just need to add
ax2.set_aspect('auto')
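The same setting can also be passed directly to imshow through its aspect keyword. The sketch below reuses the setup from the question; the Agg backend is an assumption made only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

N, M = 20, 3000  # number of trials, samples per trial
data = np.random.default_rng(42).standard_normal((N, M))
x = np.linspace(0, 1, M)

f, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(x, data.mean(axis=0))

# aspect="auto" lets the image stretch to fill the axes instead of
# forcing equally sized data units along x and y
ax2.imshow(data, cmap="inferno", interpolation="nearest",
           extent=[0, 1, 0, N], aspect="auto")

f.canvas.draw()
print(ax2.get_ylim())
```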
Say I want to build up a histogram of particle data which is smoothed over some bin range, nbin. Now I have 5 data sets with particles of different mass (each set of x,y has a different mass). Ordinarily, a histogram of particle positions is a simple case (using numpy):
heatmap, xedges, yedges = np.histogram2d(x, y, bins=nbin)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
heatmap = np.flipud(np.rot90(heatmap))
ax.imshow(heatmap, extent=extent)
However, if I want to add the next lot of particles, they have different masses and so the density will be different. Is there a way to weight the histogram by some constant such that the plotted heatmap will be a true representation of the density rather than just a binning of the total number of particles?
I know 'weights' is a feature, but is it a case of just setting weights = m_i where m_i is the mass of the particle for each dataset 1-5?
The weights parameter of np.histogram2d expects an array of the same length as x and y. It will not broadcast a constant value, so even though the mass is the same for every particle within one call to np.histogram2d, you still must use something like
weights=np.ones_like(x)*mass
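As a small check of what the weights do (a sketch with made-up positions and a made-up mass), each cell of the weighted histogram is the particle count in that cell multiplied by the mass:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)  # made-up particle positions
y = rng.uniform(0, 10, 200)
mass = 2.5                   # made-up mass shared by this data set

edges = np.linspace(0, 10, 6)     # fixed bin edges, 5 bins per axis
weights = np.full(x.shape, mass)  # one weight per particle

counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])
mass_map, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=weights)

print(mass_map.sum())  # total mass that landed in the grid
```

np.full(x.shape, mass) is used here instead of np.ones_like(x)*mass; both produce one weight per particle.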
Now, one problem you may run into if you use bins=nbin is that the bin edges, xedges and yedges, may change depending on the values of x and y that you pass to np.histogram2d. If you naively add the heatmaps together, the final result will accumulate particle density in the wrong places.
So if you want to call np.histogram2d more than once and add partial heatmaps together, you must determine in advance where you want the bin edges.
For example:
import numpy as np
import matplotlib.pyplot as plt
N = 50
nbin = 10
xs = [np.array([i,i,i+1,i+1]) for i in range(N)]
ys = [np.array([i,i+1,i,i+1]) for i in range(N)]
masses = np.arange(N)
heatmap = 0
# fix the bin edges up front so every partial histogram uses the same grid
xedges = np.linspace(0, N, nbin)
yedges = np.linspace(0, N, nbin)
for x, y, mass in zip(xs, ys, masses):
    hist, xedges, yedges = np.histogram2d(
        x, y, bins=[xedges, yedges], weights=np.ones_like(x)*mass)
    heatmap += hist
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
heatmap = np.flipud(np.rot90(heatmap))
fig, ax = plt.subplots()
ax.imshow(heatmap, extent=extent, interpolation='nearest')
plt.show()
yields