2D Histogram normalized for probabilities - python

I have a 2D data set and would like to plot a 2D histogram in which each cell represents the probability of a data point falling in that cell. To get probabilities, I need to normalize the histogram so that the cells sum to 1. Here is my example, adapted from the np.histogram2d documentation:
import numpy as np
import matplotlib.pyplot as plt
# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
# create random data points
x = np.random.normal(2, 1, 100)
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))
# setting normed=True in histogram2d doesn't seem to do what I need
# histogram2d puts x along the first axis (rows), so transpose for imshow
H = H.T
fig = plt.figure(figsize=(7, 3))
plt.imshow(H, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.show()
Resulting plot
First, np.sum(H) gives something like 86 rather than 100, since points outside the bin edges are not counted. I would like each cell to represent the probability of the data falling in that bin, so the cells should sum to 1. Additionally, how do you add a legend that maps color intensity to value with imshow?
Thank you!

Try the density argument of np.histogram2d (called normed in older NumPy releases; that name has since been removed). Per the docs, the values in H are then computed as bin_count / sample_count / bin_area, so multiplying H by the bin areas gives the probability for each bin. plt.colorbar adds the color legend.
import numpy as np
import matplotlib.pyplot as plt
# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
# create random data points
x = np.random.normal(2, 1, 100)
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
# area of each bin: outer product of the bin widths, same shape as H
areas = np.outer(np.diff(xedges), np.diff(yedges))
# density * area = probability per bin; transpose so x runs along the horizontal axis
P = (H * areas).T
fig = plt.figure(figsize=(7, 3))
im = plt.imshow(P, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar(im)  # legend mapping color to probability
plt.show()
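If only the per-bin probabilities are needed, one alternative is to skip the density step and divide the raw counts by their total. The following is a minimal sketch of that idea; the helper name bin_probabilities is made up for illustration. Note that np.histogram2d drops points outside the edges, so the result is the probability among the points that landed in some bin. The last lines cross-check it against the normalize-then-multiply-by-area approach above.

```python
import numpy as np

def bin_probabilities(x, y, xedges, yedges):
    """Return per-bin probabilities that sum to 1 (hypothetical helper)."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))
    total = counts.sum()
    if total == 0:
        raise ValueError("no points fall inside the given bin edges")
    return counts / total, xedges, yedges

rng = np.random.default_rng(0)
x = rng.normal(2, 1, 100)
y = rng.normal(1, 1, 100)
P, xedges, yedges = bin_probabilities(x, y, [0, 1, 3, 5], [0, 2, 3, 4, 6])

# Cross-check: density * bin area should give the same numbers.
D, _, _ = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
areas = np.outer(np.diff(xedges), np.diff(yedges))
print(P.sum())
print(np.allclose(P, D * areas))
```

Because the bins here have unequal widths, plt.pcolormesh(xedges, yedges, P.T) draws each cell at its true size, whereas imshow with extent stretches a uniform grid over the data range.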


Plot degree distribution in python

I'm new to Python and I'm trying to plot the degree distribution for some data. So I wrote the following function:
import collections
import matplotlib.pyplot as plt
def plotDegDistLogLog(G):
    degree_sequence = sorted([d for n, d in G.degree()], reverse=True)  # degree sequence
    degreeCount = collections.Counter(degree_sequence)
    deg, cnt = zip(*degreeCount.items())
    frac = [n / G.number_of_nodes() for n in cnt]  # fraction of nodes with each degree
    fig, ax = plt.subplots()
    plt.plot(deg, frac, 'o')
    ax.set_yscale('log')
    ax.set_xscale('log')
    plt.ylabel("Fraction of nodes")
    plt.xlabel("Degree")
    plt.show()
I want to ask:
How can I create bins that grow exponentially in size?
How can I, in each bin, divide the sum of counts by the bin length?
I want to plot a line.
With hist, bins = numpy.histogram(x, bins, density=True) you can specify the bins explicitly, so you can choose whatever you want (for example bins = numpy.exp(numpy.arange(10)) for exponentially growing bins). The density argument normalizes the histogram, which already divides each count by its bin width. Alternatively, with density=False you can do this by hand: divide hist by the bin widths, numpy.diff(bins).
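As a rough sketch of the by-hand route, the snippet below builds geometrically growing bins and divides the per-bin fraction by the bin width. The degree data is synthetic (a Zipf sample standing in for the degrees of G), and np.geomspace is used here instead of numpy.exp(numpy.arange(10)) so the bins span exactly the observed range; both are assumptions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic heavy-tailed "degrees" standing in for [d for n, d in G.degree()].
degrees = rng.zipf(2.0, size=2000)

# Bin edges that grow geometrically from the smallest to just past the largest degree.
edges = np.geomspace(degrees.min(), degrees.max() + 1, num=12)

counts, edges = np.histogram(degrees, bins=edges)
widths = np.diff(edges)

# Manual normalization: fraction of samples per bin, divided by the bin width.
density = counts / counts.sum() / widths

# Geometric bin centers are one common choice for x positions on a log axis.
centers = np.sqrt(edges[:-1] * edges[1:])
```

To draw the line the question asks for, something like plt.loglog(centers, density, '-') should work; bins with zero counts have zero density and simply drop out on a log axis.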

Matplotlib Histograms

I'm trying to plot my data (a 1D array) as a histogram with Matplotlib.
num_bins = [1,2,3,4,5,6,7,8,9]
n, bins, patches = plt.hist(
    cluster_1bo,
    num_bins,
    density=False,
    facecolor='g',
    alpha=0.75
)
The problem is that I would like to divide the height of each bin by a given number, say 100, to get the average value and plot that in the histogram.
Is it possible to do this with .hist directly, without first counting how many times each number appears, dividing that count by 100, and then plotting?
You could try this, which relabels the y-axis ticks rather than changing the bar heights:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(10)
values = np.random.randint(0, 100, 1000)
fig, ax = plt.subplots()
ax.hist(values, 50)
# change the y-axis labels: read the tick positions and divide each by 100
y_values = ax.get_yticks()
ax.set_yticklabels(['{:}'.format(y / 100) for y in y_values])
plt.show()
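Another option, not part of the answer above, is to scale the bar heights themselves through the weights argument: if every sample carries the weight 1/100, each bin height is its count divided by 100. A small sketch with np.histogram, using made-up data and bin edges:

```python
import numpy as np

rng = np.random.default_rng(10)
values = rng.integers(0, 100, size=1000)
bins = np.arange(0, 101, 2)   # 50 bins of width 2
divisor = 100                 # the number from the question

# Giving every sample the weight 1/divisor scales each bin height by that factor.
weights = np.full(values.shape, 1.0 / divisor)
scaled, edges = np.histogram(values, bins=bins, weights=weights)

# Compare with the plain counts divided afterwards.
counts, _ = np.histogram(values, bins=bins)
print(np.allclose(scaled, counts / divisor))
```

plt.hist accepts the same weights argument, so ax.hist(values, bins=bins, weights=weights) should draw the bars at the scaled heights directly.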

How do I get a white border in a figure that has been plotted in python?

I have written the following code that calculates the orientation of a blob using eigenvalues. When the orientation is determined, the function "straighten_up" straightens the blob out.
The only thing I'm missing to be fully satisfied, is a 1px white border in the second output figure between the black area and the green area. How can I do this?
I'm using a mask image as input:
code:
import numpy as np
import matplotlib.pyplot as plt
import cv2
img = cv2.imread('input_image.png',100)
edges = cv2.Canny(img,0,255) #searching for a border
# compute the orientation of a blob
img = edges
y, x = np.nonzero(img) # Find the index of the white pixels
x = x - np.mean(x) #The average of an array of elements
y = y - np.mean(y)
coords = np.vstack([x, y])
cov = np.cov(coords) #determine covariance matrix
evals, evecs = np.linalg.eig(cov) #eigenvectors
sort_indices = np.argsort(evals)[::-1] #Sort Eigenvalues in decreasing order
x_v1, y_v1 = evecs[:, sort_indices[0]]
x_v2, y_v2 = evecs[:, sort_indices[1]]
scale = 30
plt.plot([x_v1*-scale*2, x_v1*scale*2], #plot to show the eigenvectors
[y_v1*-scale*2, y_v1*scale*2], color='red')
plt.plot([x_v2*-scale, x_v2*scale],
[y_v2*-scale, y_v2*scale], color='blue')
plt.plot(x, y, 'k.')
plt.axis('equal')
plt.gca().invert_yaxis()
plt.show()
def straighten_up(x_v1,y_v1,coords):
    theta = np.arctan((x_v1)/(y_v1))
    rotation_mat =np.matrix([[np.cos(theta), -np.sin(theta)],[np.sin(theta),np.cos(theta)]])
    transformed_mat = rotation_mat*coords
    x_transformed, y_transformed = transformed_mat.A
    fig, ax = plt.subplots(nrows=1, ncols=1)
    ax = fig.add_subplot(1, 1, 1) # nrows, ncols, index
    ax.set_facecolor((1.0, 0.47, 0.42))
    plt.plot(x_transformed,y_transformed,"black")
straighten_up(x_v1,y_v1,coords)
plt.show()
with output:
Your x_transformed and y_transformed are the x and y coordinates of the rotated border. So you can draw them e.g. with plt.scatter. This draws dots (the third parameter is the size) on these x,y positions. Use zorder to make sure the scatter dots are not hidden by the previous parts of the plot.
Following code does just that:
fig, ax = plt.subplots(nrows=1, ncols=1)
ax.set_facecolor('fuchsia')
plt.axis('equal')
plt.plot(x_transformed, y_transformed, c="lime")
plt.scatter(x_transformed, y_transformed, 1, c="white", zorder=3)
plt.show()
As you notice, there is another problem: the plot of the filled figure isn't similar to your input image. What is happening is that plot draws lines from (x[0], y[0]) to (x[1], y[1]) to (x[2], y[2]), and so on. As your x and y are only the border points, not ordered as a polygon, it is more complicated to get a correctly filled polygon. For an arbitrary input image, you can have many borders, which can form polygons with holes and islands and can touch the image borders.
To properly get the interior points, you might get y, x = np.nonzero(img) from the original image (instead of only the edges), then apply the same shift (subtracting the mean of the edge coordinates) and the same rotation matrix.
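The interior-point suggestion could look roughly like the sketch below. It makes several assumptions: the mask is a synthetic tilted ellipse rather than the question's image, the border is found with a plain NumPy neighbour test instead of cv2.Canny, np.linalg.eigh replaces np.linalg.eig (the covariance matrix is symmetric), and np.arctan2 replaces np.arctan to avoid dividing by zero.

```python
import numpy as np

# Hypothetical stand-in for the mask image: a filled ellipse tilted by 0.5 rad.
yy, xx = np.mgrid[0:200, 0:200]
u = (xx - 100) * np.cos(0.5) + (yy - 100) * np.sin(0.5)
v = -(xx - 100) * np.sin(0.5) + (yy - 100) * np.cos(0.5)
mask = (u / 70) ** 2 + (v / 25) ** 2 <= 1

# Border pixels: mask pixels with at least one 4-neighbour outside the mask.
padded = np.pad(mask, 1)
interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
            & padded[1:-1, :-2] & padded[1:-1, 2:])
edge = mask & ~interior

# Orientation from the border points, as in the question.
ey, ex = np.nonzero(edge)
cx, cy = ex.mean(), ey.mean()
edge_coords = np.vstack([ex - cx, ey - cy])
evals, evecs = np.linalg.eigh(np.cov(edge_coords))
x_v1, y_v1 = evecs[:, np.argmax(evals)]     # principal axis

# Rotation that maps the principal axis onto the y axis.
theta = np.arctan2(x_v1, y_v1)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])

# Apply the same shift and the same rotation to every filled pixel.
fy, fx = np.nonzero(mask)
x_t, y_t = rotation @ np.vstack([fx - cx, fy - cy])
bx_t, by_t = rotation @ edge_coords
```

Plotting the filled pixels with plt.scatter(x_t, y_t, s=1, c="lime") and then the border with plt.scatter(bx_t, by_t, 1, c="white", zorder=3) should give a filled shape with a thin white outline.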

matplotlib - randomly pick N points from 2D array, and plot spatial scatter plot

I have plots like the following:
Left Plot : original 100 * 100 numpy data
Right Plot : What I want - randomly choose N data from the original data, and plot them on a surface plot
How can I randomly choose N number of data from the left plot, and plot the chosen data on a scatter plot like the right plot?
I used ax.imshow(data) to generate the surface plot on the left. data is a 2D numpy array.
If you want to colorize the randomly chosen points according to the image you can use the same colormap and normalization for the scatter as you have for the image.
import numpy as np
import matplotlib.pyplot as plt
original_data = np.random.rand(100,100)
fig, (ax, ax2) = plt.subplots(ncols=2)
im = ax.imshow(original_data, cmap="summer")
N = 89
x = np.random.randint(0, 100, size=N)  # column indices
y = np.random.randint(0, 100, size=N)  # row indices
random_sample = original_data[y, x]  # imshow draws data[row, col] at (x=col, y=row)
sc = ax2.scatter(x, y, c=random_sample, cmap=im.cmap, norm=im.norm)
ax2.set_aspect("equal")
ax2.set(xlim=ax.get_xlim(), ylim=ax.get_ylim())
fig.colorbar(sc, ax=[ax,ax2], orientation="horizontal")
plt.show()
You just need to choose N numbers from the 10,000 (100 x 100) unique points of the 2D array. I assume you want to sample without replacement. Then you can "unravel" the flat indices into row and column coordinates.
random_choices = np.random.choice(10000, size=N, replace=False)
y, x = np.unravel_index(random_choices, (100, 100))  # returns (row, column)
You can use these indices to create your scatter plot and size the points by the data values:
data = np.random.random((100, 100))
plt.scatter(x, y, s=data[y, x])
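Putting the sampling step together, a minimal sketch might look like this. The variable names are illustrative, and np.random.default_rng is used here in place of the legacy np.random functions:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 100))   # stand-in for the original 100 x 100 array
N = 89

# Pick N distinct flat indices, then convert them to (row, column) pairs.
flat = rng.choice(data.size, size=N, replace=False)
rows, cols = np.unravel_index(flat, data.shape)

# Values at the sampled cells; on an imshow plot, x is the column and y is the row.
values = data[rows, cols]
x, y = cols, rows
```

From here, plt.scatter(x, y, c=values) colors each sampled point by its data value; passing the image's cmap and norm, as in the first answer, keeps the colors consistent with imshow(data).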

Create a stacked 2D histogram using different weights

Say I want to build up a histogram of particle data which is smoothed over some bin range, nbin. I have 5 data sets with particles of different mass (each set of x, y has a different mass). Ordinarily, a histogram of particle positions is a simple case (using numpy):
heatmap, xedges, yedges = np.histogram2d(x, y, bins=nbin)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
heatmap = np.flipud(np.rot90(heatmap))
ax.imshow(heatmap, extent=extent)
However, if I want to add the next lot of particles, they have different masses and so the density will be different. Is there a way to weight the histogram by some constant such that the plotted heatmap will be a true representation of the density rather than just a binning of the total number of particles?
I know 'weights' is a feature, but is it a case of just setting weights = m_i where m_i is the mass of the particle for each dataset 1-5?
The weights parameter of np.histogram2d expects an array of the same length as x and y. It will not broadcast a constant value, so even though the mass is the same for every particle in a given call to np.histogram2d, you still must use something like
weights=np.ones_like(x)*mass
Now, one problem you may run into if you use bins=nbin is that the bin edges, xedges and yedges, may change depending on the values of x and y that you pass to np.histogram2d. If you naively add heatmaps together, the final result will accumulate particle density in the wrong places.
So if you want to call np.histogram2d more than once and add partial heatmaps together, you must determine in advance where you want the bin edges.
For example:
import numpy as np
import matplotlib.pyplot as plt
N = 50
nbin = 10
xs = [np.array([i,i,i+1,i+1]) for i in range(N)]
ys = [np.array([i,i+1,i,i+1]) for i in range(N)]
masses = np.arange(N)
heatmap = 0
# fix the bin edges up front so every partial histogram uses the same grid
# (nbin edges, i.e. nbin - 1 bins per axis)
xedges = np.linspace(0, N, nbin)
yedges = np.linspace(0, N, nbin)
for x, y, mass in zip(xs, ys, masses):
    hist, xedges, yedges = np.histogram2d(
        x, y, bins=[xedges, yedges], weights=np.ones_like(x)*mass)
    heatmap += hist
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
heatmap = np.flipud(np.rot90(heatmap))
fig, ax = plt.subplots()
ax.imshow(heatmap, extent=extent, interpolation='nearest')
plt.show()
This yields the combined, mass-weighted heatmap.
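As a cross-check on the loop above, the same heatmap should come out of a single np.histogram2d call if all particles are concatenated and each one carries its own mass as its weight. The sketch below assumes the same toy data as the example; the names summed, single and all_w are illustrative.

```python
import numpy as np

N = 50
edges = np.linspace(0, N, 10)   # 10 edges -> 9 bins per axis, fixed up front
xs = [np.array([i, i, i + 1, i + 1]) for i in range(N)]
ys = [np.array([i, i + 1, i, i + 1]) for i in range(N)]
masses = np.arange(N)

# Route 1: one histogram per data set, summed on the shared grid.
summed = np.zeros((len(edges) - 1, len(edges) - 1))
for x, y, mass in zip(xs, ys, masses):
    hist, _, _ = np.histogram2d(x, y, bins=[edges, edges],
                                weights=np.full(x.shape, mass, dtype=float))
    summed += hist

# Route 2: concatenate everything and weight each particle by its own mass.
all_x = np.concatenate(xs)
all_y = np.concatenate(ys)
all_w = np.repeat(masses, [len(x) for x in xs]).astype(float)
single, _, _ = np.histogram2d(all_x, all_y, bins=[edges, edges], weights=all_w)

print(np.allclose(summed, single))
```

The single-call route avoids the bin-edge problem entirely, at the cost of holding all particles in memory at once.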
