Say I want to build up a histogram of particle data, smoothed over some bin range, nbin. Now I have 5 data sets with particles of different mass (each set of x, y has a different mass). Ordinarily, a histogram of particle positions is a simple case (using numpy):
heatmap, xedges, yedges = np.histogram2d(x, y, bins=nbin)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
heatmap = np.flipud(np.rot90(heatmap))
ax.imshow(heatmap, extent=extent)
However, if I want to add the next set of particles, they have different masses, so the density will be different. Is there a way to weight the histogram by some constant such that the plotted heatmap is a true representation of the density, rather than just a binning of the total number of particles?
I know 'weights' is a feature, but is it just a case of setting weights = m_i, where m_i is the mass of the particles in each of the datasets 1-5?
The weights parameter of np.histogram2d expects an array of the same length as x and y. It will not broadcast a constant value, so even though the mass is the same for every particle in a single call to np.histogram2d, you still must use something like
weights=np.ones_like(x)*mass
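(Equivalently, weights=np.full_like(x, mass, dtype=float) builds the same constant-weight array; the dtype argument matters if x is an integer array.)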
Now, one problem you may run into if you use bins=nbin is that the bin edges, xedges and yedges, may change depending on the values of x and y that you pass to np.histogram2d. If you naively add such heatmaps together, the final result accumulates particle density in the wrong places.
So if you want to call np.histogram2d more than once and add partial heatmaps together, you must determine in advance where you want the bin edges.
For example:
import numpy as np
import matplotlib.pyplot as plt
N = 50
nbin = 10
xs = [np.array([i,i,i+1,i+1]) for i in range(N)]
ys = [np.array([i,i+1,i,i+1]) for i in range(N)]
masses = np.arange(N)
heatmap = 0
# nbin bins need nbin + 1 edges; fixing the edges in advance keeps every
# partial histogram on the same grid
xedges = np.linspace(0, N, nbin + 1)
yedges = np.linspace(0, N, nbin + 1)
for x, y, mass in zip(xs, ys, masses):  # itertools.izip on Python 2
    hist, xedges, yedges = np.histogram2d(
        x, y, bins=[xedges, yedges], weights=np.ones_like(x) * mass)
    heatmap += hist
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
heatmap = heatmap.T  # np.flipud(np.rot90(heatmap)) is equivalent to a transpose
fig, ax = plt.subplots()
ax.imshow(heatmap, extent=extent, origin='lower', interpolation='nearest')
plt.show()
yields
Related
I am trying to get a heatmap-like color map from known values at points in the plane that are spread non-uniformly (i.e., not on a regular grid).
import numpy as np
x = np.random.rand(100)
y = np.random.rand(100)
z = np.random.rand(100)
Here, using x, y and z, I want a plot showing the intensity z[i] at the point coordinates (x[i], y[i]).
I tried matplotlib's pcolormesh as follows:
import matplotlib.pyplot as plt
plt.figure()
plt.pcolormesh(x.reshape(10, 10), y.reshape(10, 10), z.reshape(10, 10))
plt.colorbar()
plt.show()
But this is giving the following error,
UserWarning: The input coordinates to pcolormesh are interpreted as cell centers, but are not monotonically increasing or decreasing. This may lead to incorrectly calculated cell edges, in which case, please supply explicit cell edges to pcolormesh.
I'd rather use tricontourf:
In [11]: x = np.random.rand(10000)
...: y = np.random.rand(10000)
...: z = np.random.rand(10000)+3*x+2*y
...: plt.tricontourf(x, y, z)
...: plt.colorbar()
...: plt.show()
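If you prefer a rectangular mesh to triangles, one alternative is to interpolate the scattered points onto a regular grid first and then call pcolormesh on the gridded result. This is only a sketch, assuming scipy is available and linear interpolation is acceptable for your data:
from scipy.interpolate import griddata
# build a regular grid spanning the scattered points
xi = np.linspace(x.min(), x.max(), 50)
yi = np.linspace(y.min(), y.max(), 50)
Xi, Yi = np.meshgrid(xi, yi)
# interpolate z onto the grid; cells outside the convex hull become NaN
Zi = griddata((x, y), z, (Xi, Yi), method='linear')
plt.pcolormesh(Xi, Yi, Zi, shading='auto')
plt.colorbar()
plt.show()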
I have measured the positions of different products at different angular positions (6 values in steps of 60 deg. over a complete rotation). Instead of representing my values on a Cartesian graph, where 0 and 360 are the same point, I want to use a polar graph.
With matplotlib, I got a spider-chart-type graph, but I want to avoid the straight lines between points and instead display interpolated values between them. I have a solution that is kind of OK, but I was hoping there is a nice "one liner" I could use to get a more realistic representation or better tangent handling at some points.
Does anyone have an idea to improve my code below?
# Libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Some data to play with
df = pd.DataFrame({'measure': [10, -5, 15, 20, 20, 20, 15, 5, 10], 'angle': [0, 45, 90, 135, 180, 225, 270, 315, 360]})
# The few lines I would like to avoid...
angles = [y/180*np.pi for x in [np.arange(x, x+45,5) for x in df.angle[:-1]] for y in x]
values = [y for x in [np.linspace(x, df.measure[i+1], 10)[:-1] for i, x in enumerate(df.measure[:-1])] for y in x]
angles.append(360/180*np.pi)
values.append(values[0])
# Initialise the spider plot
ax = plt.subplot(polar=True)
# Plot data
ax.plot(df.angle/180*np.pi, df['measure'], linewidth=1, linestyle='solid', label="Spider chart")
ax.plot(angles, values, linewidth=1, linestyle='solid', label='what I want')
ax.legend()
# Fill area
ax.fill(angles, values, 'b', alpha=0.1)
plt.show()
The result is below; I want something similar to the orange line, but with some kind of spline to avoid the sharp corners I currently get.
I have a solution that is a patchwork of other solutions. It needs to be cleaned up and optimized, but it does the job!
Comments and improvements are always welcome; see below.
# https://stackoverflow.com/questions/33962717/interpolating-a-closed-curve-using-scipy
from scipy import interpolate
x=df.measure[:-1] * np.cos(df.angle[:-1]/180*np.pi)
y=df.measure[:-1] * np.sin(df.angle[:-1]/180*np.pi)
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
# fit splines to x=f(u) and y=g(u), treating both as periodic. also note that s=0
# is needed in order to force the spline fit to pass through all the input points.
tck, u = interpolate.splprep([x, y], s=0, per=True)
# evaluate the spline fits for 1000 evenly spaced distance values
xi, yi = interpolate.splev(np.linspace(0, 1, 1000), tck)
def cart2pol(x, y):
    rho = np.sqrt(x**2 + y**2)
    phi = np.arctan2(y, x)
    return rho, phi
# Initialise the spider plot
plt.figure(figsize=(12,8))
ax = plt.subplot(polar=True)
# Plot data
ax.plot(df.angle/180*np.pi, df['measure'], linewidth=1, linestyle='solid', label="Spider chart")
ax.plot(angles, values, linewidth=1, linestyle='solid', label='Interval linearisation')
rho, phi = cart2pol(xi, yi)
ax.plot(phi, rho, linewidth=1, linestyle='solid', label='Smooth interpolation')
ax.legend()
# Fill area
ax.fill(angles, values, 'b', alpha=0.1)
plt.show()
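For what it's worth, here is a shorter sketch of the same idea using a periodic cubic spline directly in polar coordinates. Note the assumption: it interpolates the radius as a function of angle, which is not quite the same curve as the Cartesian parametric fit above, but it is close to the one-liner you were hoping for:
from scipy.interpolate import CubicSpline
theta = np.deg2rad(df.angle.values)  # strictly increasing, 0 to 2*pi
# bc_type='periodic' requires the first and last measure values to match
spline = CubicSpline(theta, df.measure.values, bc_type='periodic')
theta_fine = np.linspace(0, 2*np.pi, 1000)
ax = plt.subplot(polar=True)
ax.plot(theta_fine, spline(theta_fine), label='periodic CubicSpline')
ax.legend()
plt.show()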
As a minimal reproducible example, suppose I have the following multivariate normal distribution:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from scipy.stats import multivariate_normal, gaussian_kde
# Choose mean vector and variance-covariance matrix
mu = np.array([0, 0])
sigma = np.array([[2, 0], [0, 3]])
# Create surface plot data
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
rv = multivariate_normal(mean=mu, cov=sigma)
Z = np.array([rv.pdf(pair) for pair in zip(X.ravel(), Y.ravel())])
Z = Z.reshape(X.shape)
# Plot it
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
pos = ax.plot_surface(X, Y, Z)
plt.show()
This gives the following surface plot:
My goal is to marginalize this and use Kernel Density Estimation to get a nice and smooth 1D Gaussian. I am running into 2 problems:
Not sure my marginalization technique makes sense.
After marginalizing I am left with a barplot, but gaussian_kde requires actual data (not frequencies) in order to fit a KDE, so I am unable to use this function.
Here is how I marginalize it:
# find marginal distribution over y by summing over all x
y_distribution = Z.sum(axis=1) / Z.sum() # Do I need to normalize?
# plot bars
plt.bar(y, y_distribution)
plt.show()
and this is the barplot that I obtain:
Next, I follow this StackOverflow question to find the KDE only from "histogram" data. To do this, we resample the histogram and fit KDE on the resamples:
# sample the histogram
resamples = np.random.choice(y, size=1000, p=y_distribution)
kde = gaussian_kde(resamples)
# plot bars
fig, ax = plt.subplots(nrows=1, ncols=2)
ax[0].bar(y, y_distribution)
ax[1].plot(y, kde.pdf(y))
plt.show()
This produces the following plot:
which looks "okay-ish" but the two plots are clearly not on the same scale.
Coding Issue
How come the KDE is coming out on a different scale? Or rather, why is the barplot on a different scale than the KDE?
To further highlight this, I've changed the variance-covariance matrix so that we know the marginal distribution over y is a normal distribution centered at 0 with variance 3. At this point we can compare the KDE with the actual normal density as follows:
from scipy.stats import norm

plt.plot(y, norm.pdf(y, loc=0, scale=np.sqrt(3)), label='norm')
plt.plot(y, kde.pdf(y), label='kde')
plt.legend()
plt.show()
This gives:
This suggests the bar plot is on the wrong scale. What coding issue put the barplot on the wrong scale?
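A likely explanation (my reading, not part of the original thread): y_distribution is a probability mass per bin, while kde.pdf returns a probability density, so the two differ by a factor of the bin width. A minimal sketch of the rescaling, assuming the uniform y grid from np.linspace above:
# y_distribution sums to 1 across the 100 bins (a probability mass);
# dividing by the bin width converts it into a density comparable to the KDE
dy = y[1] - y[0]
plt.bar(y, y_distribution / dy, width=dy, alpha=0.5, label='marginal (rescaled)')
plt.plot(y, kde.pdf(y), label='kde')
plt.legend()
plt.show()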
I have plots like the following:
Left plot: the original 100 x 100 numpy data.
Right plot: what I want, N points randomly chosen from the original data, shown as a scatter plot.
How can I randomly choose N data points from the left plot and plot the chosen data on a scatter plot like the right one?
I used ax.imshow(data) to generate the surface plot on the left. data is a 2D numpy array.
If you want to colorize the randomly chosen points according to the image, you can use the same colormap and normalization for the scatter as for the image.
import numpy as np
import matplotlib.pyplot as plt
original_data = np.random.rand(100,100)
fig, (ax, ax2) = plt.subplots(ncols=2)
im = ax.imshow(original_data, cmap="summer")
N = 89
x = np.random.randint(0,100,size=N)
y = np.random.randint(0,100,size=N)
# index rows with y and columns with x, so the sampled values match the pixels
random_sample = original_data[y, x]
sc = ax2.scatter(x,y,c=random_sample, cmap=im.cmap, norm=im.norm)
ax2.set_aspect("equal")
ax2.set(xlim=ax.get_xlim(), ylim=ax.get_ylim())
fig.colorbar(sc, ax=[ax,ax2], orientation="horizontal")
plt.show()
You just need to choose N numbers from the 10,000 (100 x 100) unique points on the 2D plot; I assume you want them without replacement. Then you can "unravel" the chosen flat indices into x, y coordinates:
random_choices = np.random.choice(10000, size=N, replace=False)
x, y = np.unravel_index(random_choices, (100, 100))
You can use these indices to create your scatter plot and size points appropriately:
data = np.random.random((100, 100))
plt.scatter(x, y, s=data[y, x])
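Note the data[y, x] ordering: imshow treats the first array axis as rows (the vertical direction), so the row index comes from y and the column index from x.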
I have a 2D data set and I would like to plot a 2D histogram with each cell representing the probability of a data point falling in it. To obtain probabilities, I need to normalize the histogram data so that it sums to 1. Here is an example, adapted from the histogram2d documentation:
xedges = [0,1,3,5]
yedges = [0,2,3,4,6]
#create edges of bins
#create random data points
x=np.random.normal(2,1,100)
y=np.random.normal(1,1,100)
H,xedges,yedges = np.histogram2d(x,y,bins=(xedges,yedges))
#setting normed=True in histogram2d doesn't seem to do what I need
H=H.T
#weirdly histogram2d swaps the x,y axis, so transpose to restore it.
fig = plt.figure(figsize=(7,3))
plt.imshow(H, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.show()
Resulting plot
Firstly, np.sum(H) gives something like 86. I would like each cell to represent the probability of the data lying in that bin, so all cells should sum to 1. Additionally, how do you plot a legend mapping the color intensity to its value with imshow?
Thank you!
Try using the normed argument (renamed density in newer NumPy). Per the docs, the values in H are then calculated as bin_count / sample_count / bin_area, so we compute the area of each bin and multiply H by it to get the probability for each bin.
xedges = [0,1,3,5]
yedges = [0,2,3,4,6]
# create edges of bins
x = np.random.normal(2, 1, 100) # create random data points
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
# density=True gives bin_count / sample_count / bin_area; multiplying by the
# bin areas converts each cell back into a probability
areas = np.outer(np.diff(xedges), np.diff(yedges))
fig = plt.figure(figsize=(7, 3))
# transpose so rows correspond to y, and use origin='lower' to match the extent
im = plt.imshow((H * areas).T, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar(im)
plt.show()
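As a quick sanity check (my addition, not part of the original answer), the rescaled cells should now sum to 1, since density=True normalizes by the in-range sample count and the bin areas:
probs = H * areas
print(probs.sum())  # expect 1.0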