Plot degree distribution in python - python

I'm new to Python and I'm trying to plot the degree distribution for some data, so I wrote the following function:
import collections
import matplotlib.pyplot as plt

def plotDegDistLogLog(G):
    # degree of every node, highest first
    degree_sequence = sorted([d for n, d in G.degree()], reverse=True)
    degreeCount = collections.Counter(degree_sequence)
    deg, cnt = zip(*degreeCount.items())
    # fraction of nodes that have each degree
    frac = [n / G.number_of_nodes() for n in cnt]
    fig, ax = plt.subplots()
    ax.plot(deg, frac, 'o')
    ax.set_yscale('log')
    ax.set_xscale('log')
    ax.set_ylabel("Fraction of nodes")
    ax.set_xlabel("Degree")
    plt.show()
I want to ask:
How can I create bins that grow exponentially in size?
How can I, in each bin, divide the sum of counts by the bin length?
I want to plot a line.

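With hist, bins = numpy.histogram(x, bins, density=True) you can specify the bin edges explicitly, so you can choose whatever spacing you want (for example bins = numpy.exp(numpy.arange(10)) for exponentially growing bins). The density argument normalizes the histogram, so that each value is the count divided by the total number of samples and by the bin width. Alternatively, keep the raw counts and divide each entry of hist by the width of its bin, numpy.diff(bins).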
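As a rough sketch of the idea above, the binning could look like the following. The helper name log_binned_distribution, the n_bins default and the synthetic sample_degrees are made up for illustration; in the question, the degrees would come from [d for n, d in G.degree()].

```python
import numpy as np

def log_binned_distribution(degrees, n_bins=10):
    """Return (bin centers, density) using logarithmically spaced bins."""
    degrees = np.asarray(degrees, dtype=float)
    degrees = degrees[degrees > 0]  # log-spaced bins cannot hold degree 0
    # Edges grow by a constant factor from the smallest to the largest degree
    edges = np.geomspace(degrees.min(), degrees.max(), n_bins + 1)
    counts, edges = np.histogram(degrees, bins=edges)
    widths = np.diff(edges)
    # Count in each bin divided by the bin length and by the number of samples kept
    density = counts / widths / degrees.size
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric mean of each bin's edges
    return centers, density

# Hypothetical heavy-tailed sample standing in for the real degree sequence
rng = np.random.default_rng(0)
sample_degrees = np.floor(rng.pareto(1.5, size=2000) + 1).astype(int)
centers, density = log_binned_distribution(sample_degrees, n_bins=12)
```

To draw a line rather than markers, something like plt.loglog(centers[density > 0], density[density > 0], '-') should work; empty bins are dropped because zero cannot be shown on a log axis.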

Related

Matplotlib Histograms

I'm trying to plot my data (a 1D array) as a histogram with Matplotlib.
num_bins = [1,2,3,4,5,6,7,8,9]
n, bins, patches = plt.hist(
cluster_1bo,
num_bins,
density=False,
facecolor='g',
alpha=0.75
)
The problem is that I would like to divide the height of each bin by a given number, say 100, in order to get the average value and plot that in the histogram.
Is it possible to do this with .hist, without first counting how many times each number appears, dividing that count by 100 and then plotting?
Hi, you could try this:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(10)
values = np.random.randint(0, 100, 1000)
fig, ax = plt.subplots()
ax.hist(values, 50)
# relabel the y axis: the bars keep their height, only the tick text changes
y_values = ax.get_yticks()
ax.set_yticklabels(['{:}'.format(y / 100) for y in y_values])  # divide each label by 100
plt.show()
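Another option, shown here only as a sketch, is to scale the bars themselves through the weights argument instead of relabeling the ticks. The variable names and the divisor of 100 are illustrative; np.histogram is used so the numbers can be checked without drawing anything.

```python
import numpy as np

rng = np.random.default_rng(10)
values = rng.integers(0, 100, 1000)
divisor = 100  # the "given number" from the question

# Each sample contributes 1/divisor instead of 1, so every bin height is count/divisor
weights = np.full(values.shape, 1.0 / divisor)
scaled, edges = np.histogram(values, bins=50, weights=weights)
counts, _ = np.histogram(values, bins=50)
```

plt.hist accepts the same weights argument, so plt.hist(values, bins=50, weights=weights) should draw the scaled bars directly.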

rayleigh distribution curve on histogram

I have an array of velocity data in directions V_x and V_y. I've plotted a histogram for the velocity norm using the code below,
plt.hist(V_norm_hist, bins=60, density=True, rwidth=0.95)
which gives the following figure:
Now I also want to add a Rayleigh distribution curve on top of this, but I can't get it to work. I've been trying different combinations using scipy.stats.rayleigh, but the SciPy documentation isn't really intuitive, so I can't get it to work properly.
What exactly do the lines
mean, var, skew, kurt = rayleigh.stats(moments='mvsk')
and
x = np.linspace(rayleigh.ppf(0.01),rayleigh.ppf(0.99), 100)
ax.plot(x, rayleigh.pdf(x),'r-', lw=5, alpha=0.6, label='rayleigh pdf')
do?
You might need to first follow the link to rv_continuous, from which rayleigh is subclassed, and from there to ppf, to find out that ppf is the 'percent point function' (the inverse of the cumulative distribution function). x0 = ppf(0.01) gives the point below which 1% of the total 'weight' has accumulated, and similarly x1 = ppf(0.99) is the point below which 99% of the 'weight' has accumulated. np.linspace(x0, x1, 100) returns 100 evenly spaced points from x0 to x1. As the support of a continuous distribution can be infinite, these x0 and x1 limits are needed to show only the interesting interval.
rayleigh.pdf(x) gives the probability density at x, an indication of how likely values near each x are.
In rayleigh.stats(moments='mvsk'), moments is composed of the letters 'mvsk' and defines which moments to compute: 'm' = mean, 'v' = variance, 's' = (Fisher's) skew, 'k' = (Fisher's) kurtosis.
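To make the meaning of ppf and pdf concrete, here is a small sketch that uses the closed-form expressions for the Rayleigh distribution with loc 0 instead of scipy.stats. The function names rayleigh_cdf, rayleigh_ppf and rayleigh_pdf are hand-written stand-ins, not SciPy's API.

```python
import numpy as np

def rayleigh_cdf(x, scale=1.0):
    # Fraction of the probability that lies below x
    return 1.0 - np.exp(-x**2 / (2.0 * scale**2))

def rayleigh_ppf(q, scale=1.0):
    # Inverse of the CDF: the x below which a fraction q of the probability lies
    return scale * np.sqrt(-2.0 * np.log(1.0 - q))

def rayleigh_pdf(x, scale=1.0):
    return x / scale**2 * np.exp(-x**2 / (2.0 * scale**2))

x0, x1 = rayleigh_ppf(0.01), rayleigh_ppf(0.99)
x = np.linspace(x0, x1, 100)
pdf_values = rayleigh_pdf(x)
# Trapezoid rule: area under the pdf between the 1% and 99% points
area = np.sum((pdf_values[:-1] + pdf_values[1:]) / 2 * np.diff(x))
```

The area between x0 and x1 comes out close to 0.98, which is why plotting only that interval shows nearly all of the distribution.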
To plot both the histogram and the distribution on the same plot, we need to know the parameters of the Rayleigh distribution that correspond to your sample (loc and scale). Furthermore, the pdf and the histogram need to share the same x range and the same y scale. For x, we can take the limits of the histogram bins. For y, we can scale up the pdf: the total area under the pdf is 1, while the total area of the histogram bars is the number of entries times the bin width, so the pdf is multiplied by that product.
If you know that loc is 0 but don't know the scale, the Wikipedia article gives a formula that connects the scale to the mean of your samples:
estimated_rayleigh_scale = samples.mean() / np.sqrt(np.pi / 2)
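As a quick sanity check of that formula, one can draw samples with a known scale and see whether the estimate recovers it. This is only a sketch; the seed, the sample size and the value 0.08 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
true_scale = 0.08  # illustrative value
samples = rng.rayleigh(scale=true_scale, size=100_000)

# The mean of a Rayleigh distribution with loc 0 is scale * sqrt(pi / 2)
estimated_rayleigh_scale = samples.mean() / np.sqrt(np.pi / 2)
```

With this many samples the estimate should agree with true_scale to within a fraction of a percent.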
Supposing a loc of 0 and a scale of 0.08, the code would look like:
from matplotlib import pyplot as plt
import numpy as np
from scipy.stats import rayleigh
N = 1000
# V = np.random.uniform(0, 0.1, 2*N).reshape((N,2))
# V_norm = (np.linalg.norm(V, axis=1))
scale = 0.08
V_norm_hist = scale * np.sqrt( -2* np.log (np.random.uniform(0, 1, N)))
fig, ax = plt.subplots(1, 1)
num_bins = 60
_binvalues, bins, _patches = plt.hist(V_norm_hist, bins=num_bins, density=False, rwidth=1, ec='white', label='Histogram')
x = np.linspace(bins[0], bins[-1], 100)
binwidth = (bins[-1] - bins[0]) / num_bins
scale = V_norm_hist.mean() / np.sqrt(np.pi / 2)
plt.plot(x, rayleigh(loc=0, scale=scale).pdf(x)*len(V_norm_hist)*binwidth, lw=5, alpha=0.6, label=f'Rayleigh pdf (s={scale:.3f})')
plt.legend()
plt.show()

2D Histogram normalized for probabilities

I have a 2D data set and I would like to plot a 2D histogram, with each cell on the histogram representing the probability of the data point. Hence to obtain the probability, I need to normalize the histogram data so they sum to 1. Here is what I have for an example, from the 2Dhistogram documentation:
import numpy as np
import matplotlib.pyplot as plt

# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
# create random data points
x = np.random.normal(2, 1, 100)
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))
# setting normed=True in histogram2d doesn't seem to do what I need
H = H.T
# weirdly histogram2d swaps the x,y axis, so transpose to restore it.
fig = plt.figure(figsize=(7, 3))
plt.imshow(H, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.show()
Resulting plot
Firstly, np.sum(H) gives something like 86. I would like each cell to represent the probability of the data lying in that binned cell, so they should all sum to 1. Additionally, how do you plot a legend mapping the color intensity to its value with imshow?
Thank you!
Try using the density argument (named normed in older NumPy versions). Per the docs, the values in H are then calculated as bin_count / sample_count / bin_area, so we calculate the areas of the bins and multiply them by H to get the probability for each bin.
import numpy as np
import matplotlib.pyplot as plt

# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
x = np.random.normal(2, 1, 100)  # create random data points
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
# area of each bin, same shape as H: areas[i, j] = x width i * y width j
areas = np.outer(np.diff(xedges), np.diff(yedges))
# probability per bin, transposed so that imshow puts x on the horizontal axis
P = (H * areas).T
fig = plt.figure(figsize=(7, 3))
im = plt.imshow(P, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar(im)
plt.show()
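If it helps, the same probabilities can also be obtained by dividing the raw counts by their sum. The sketch below compares the two routes on the question's bin edges; the seed and the variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
x = rng.normal(2, 1, 100)
y = rng.normal(1, 1, 100)

counts, _, _ = np.histogram2d(x, y, bins=(xedges, yedges))
density, _, _ = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
areas = np.outer(np.diff(xedges), np.diff(yedges))

prob_from_counts = counts / counts.sum()  # normalize the raw counts directly
prob_from_density = density * areas       # undo the division by bin area
```

Both arrays sum to 1. Points that fall outside the given edges are not counted at all, which is why np.sum(H) in the question is below 100; the resulting probabilities are therefore relative to the points inside the binned region.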

Set a radial offset on a polar projection in matplotlib

I have some simulated data in a 2D numpy array with a size like (512, 768).
This data is simulated from rmin = 1 to rmax = 100, with phi from 0 to 2pi.
I try to plot this on a polar plot, but without an offset in the radial direction this looks really odd. Note: the images come from a radial density distribution, so the plots should be radially symmetric.
Without xlim/ylim set:
fig = plt.figure()
ax = fig.add_subplot(111, projection='polar')
rho = ...  # 2D numpy array
ax.pcolormesh(rho)
fig.show()
With xlim/ylim set:
fig = plt.figure()
ax = fig.add_subplot(111, projection='polar')
rho = ...  # 2D numpy array
print(rho.shape)
ax.axis([x_scale[0], x_scale[-1], y_scale[0], y_scale[-1]])
ax.pcolormesh(rho)
fig.show()
With a manual axis + X/Y values.
fig = plt.figure()
ax = fig.add_subplot(111, projection='polar')
rho = ...  # 2D numpy array
print(rho.shape)
ax.axis([x_scale[0], x_scale[-1], 0, y_scale[-1]])
y_scale_with_offset = np.insert(y_scale, 0, 0)
ax.pcolormesh(x_scale, y_scale_with_offset, rho)
ax.pcolormesh(rho)
Is there a trick to add a radial offset from 1?
I believe you can use ax.set_rmin() with polar plots; a negative value will give you the effect you're looking for.
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='polar')
c = np.ones((50, 50)) + np.arange(50).reshape(50, 1)
aP = ax.pcolormesh(c)
plt.colorbar(aP)
ax.set_rmin(-10.0)  # negative lower radial limit leaves a gap around the center
plt.show()
It's worth including a scale so you know you're not just removing data from the plot (I assume this is not what you intended).
On a side note, if you haven't already, you should check out the IPython notebook. You may have been able to find the solution to your problem there, as you can press tab after typing ax. and it will pop up a list of all the attributes and methods you could use. Since matplotlib is nicely labeled, set_rmin is a fairly obvious choice.

Create a stacked 2D histogram using different weights

Say I want to build up a histogram of particle data which is smoothed over some bin range, nbin. Now I have 5 data sets with particles of different mass (each set of x,y has a different mass). Ordinarily, a histogram of particle positions is a simple case (using numpy):
heatmap, xedges, yedges = np.histogram2d(x, y, bins=nbin)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
heatmap = np.flipud(np.rot90(heatmap))
ax.imshow(heatmap, extent=extent)
However, if I want to add the next lot of particles, they have different masses and so the density will be different. Is there a way to weight the histogram by some constant such that the plotted heatmap will be a true representation of the density rather than just a binning of the total number of particles?
I know 'weights' is a feature, but is it a case of just setting weights = m_i where m_i is the mass of the particle for each dataset 1-5?
The weights parameter of np.histogram2d expects an array of the same length as x and y. It will not broadcast a constant value, so even though the mass is the same for every particle in a given call to np.histogram2d, you still must use something like
weights=np.ones_like(x)*mass
Now, one problem you may run into if you use bins=nbin is that the bin edges, xedges and yedges, may change depending on the values of x and y that you pass to np.histogram2d. If you naively add heatmaps together, the final result will accumulate particle density in the wrong places.
So if you want to call np.histogram2d more than once and add partial heatmaps together, you must determine in advance where you want the bin edges.
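A small sketch of why this matters, using two made-up particle sets (the ranges, sizes and masses are purely illustrative): with an integer bins argument each call picks its own edges, whereas shared edges make the partial histograms addable.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two hypothetical particle sets with different masses and different spatial extents
x1, y1, m1 = rng.uniform(0, 5, 200), rng.uniform(0, 5, 200), 1.0
x2, y2, m2 = rng.uniform(3, 10, 300), rng.uniform(3, 10, 300), 2.5

# With bins=<int>, the edges follow each data set's own range
_, auto_edges1, _ = np.histogram2d(x1, y1, bins=10)
_, auto_edges2, _ = np.histogram2d(x2, y2, bins=10)

# Shared edges chosen in advance put every partial histogram on the same grid
edges = np.linspace(0, 10, 11)
total = np.zeros((10, 10))
for x, y, m in [(x1, y1, m1), (x2, y2, m2)]:
    hist, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=np.full(x.shape, m))
    total += hist
```

Here auto_edges1 and auto_edges2 differ, so adding those two histograms cell by cell would mix up positions, while total holds the summed mass per cell on one common grid.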
For example:
import numpy as np
import matplotlib.pyplot as plt

N = 50
nbin = 10
xs = [np.array([i, i, i+1, i+1]) for i in range(N)]
ys = [np.array([i, i+1, i, i+1]) for i in range(N)]
masses = np.arange(N)
heatmap = 0
# fix the bin edges in advance so every partial histogram uses the same grid
xedges = np.linspace(0, N, nbin)
yedges = np.linspace(0, N, nbin)
for x, y, mass in zip(xs, ys, masses):
    hist, xedges, yedges = np.histogram2d(
        x, y, bins=[xedges, yedges], weights=np.ones_like(x)*mass)
    heatmap += hist
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
heatmap = np.flipud(np.rot90(heatmap))
fig, ax = plt.subplots()
ax.imshow(heatmap, extent=extent, interpolation='nearest')
plt.show()
yields
