I need to plot a heatmap in Python using x, y, z data from an Excel file.
All the values of z are 1 except at (x=5, y=5). The plot should be red at (5, 5) and blue elsewhere, but I am getting false alarms which need to be removed. The colormap I have used is 'jet'.
X=[0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9]
Y=[0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9]
Z=[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,9,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
The code I have used is:
import matplotlib.pyplot as plt
import numpy as np
from numpy import ravel
from scipy.interpolate import interp2d
import pandas as pd
import matplotlib as mpl
excel_data_df = pd.read_excel('test.xlsx')
X= excel_data_df['x'].tolist()
Y= excel_data_df['y'].tolist()
Z= excel_data_df['z'].tolist()
x_list = np.array(X)
y_list = np.array(Y)
z_list = np.array(Z)
# f will be a function with two arguments (x and y coordinates),
# but those can be array_like structures too, in which case the
# result will be a matrix representing the values in the grid
# specified by those arguments
f = interp2d(x_list,y_list,z_list,kind="linear")
x_coords = np.arange(min(x_list),max(x_list))
y_coords = np.arange(min(y_list),max(y_list))
z= f(x_coords,y_coords)
fig = plt.imshow(z,
extent=[min(x_list),max(x_list),min(y_list),max(y_list)],
origin="lower", interpolation='bicubic', cmap= 'jet', aspect='auto')
# Show the positions of the sample points, just to have some reference
fig.axes.set_autoscale_on(False)
#plt.scatter(x_list,y_list,400, facecolors='none')
plt.xlabel('X Values', fontsize = 15, va="center")
plt.ylabel('Y Values', fontsize = 15,va="center")
plt.title('Heatmap', fontsize = 20)
plt.tight_layout()
plt.show()
For your convenience, you can also use the X, Y, Z arrays above instead of reading the Excel file.
The result that I am getting is:
Here you can see dark blue regions at (5,0) and (0,5). These are the false alarms I am getting, and I need to remove them.
I am probably making some beginner's mistake. I would be grateful to anyone who points it out. Regards
There are at least three problems in your example:
x_coords and y_coords are not properly resampled;
the interpolation z does not fill the whole grid, leading to incorrect output;
the output is then forced to be plotted on the original grid (extent), which adds to the confusion.
This leads to the following interpolated result:
On top of which you have applied extra smoothing with imshow.
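For the first and third points, here is a minimal sketch of the fix applied to your own variables (assuming x_list, y_list and the interpolant f from the question code):
# np.arange(min, max) stops one step before max and keeps the coarse 1-unit spacing,
# so the interpolant is evaluated on a grid that is both too coarse and too short.
x_coords = np.linspace(x_list.min(), x_list.max(), 101)  # 101 samples covering the full 0..9 range
y_coords = np.linspace(y_list.min(), y_list.max(), 101)
z = f(x_coords, y_coords)  # now fills the whole extent passed to imshow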
Let's create your artificial input:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
x = np.arange(0, 11)
y = np.arange(0, 11)
X, Y = np.meshgrid(x, y)
Z = np.ones(X.shape)
Z[5,5] = 9
Depending on how you want to proceed, you can simply let imshow smooth your signal by interpolation:
fig, axe = plt.subplots()
axe.imshow(Z, origin="lower", cmap="jet", interpolation='bicubic')
And you are done, simple and efficient!
If you aim to do it yourself, then choose the interpolant that suits you best and resample on a grid with a higher resolution:
interpolant = interpolate.interp2d(x, y, Z.ravel(), kind="linear")
xlin = np.linspace(0, 10, 101)
ylin = np.linspace(0, 10, 101)
zhat = interpolant(xlin, ylin)
fig, axe = plt.subplots()
axe.imshow(zhat, origin="lower", cmap="jet")
Have a deeper look at the scipy.interpolate module to pick the interpolant that best suits your needs. Notice that not all methods expose the same interface for their input parameters; you may need to reshape your data to use other objects.
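As one example of those interface differences, here is a minimal sketch with RectBivariateSpline (an alternative not used above; it assumes the x, y, Z, xlin, ylin arrays already defined), which takes the gridded 2-D value array instead of flattened coordinate lists:
from scipy.interpolate import RectBivariateSpline

# RectBivariateSpline takes the grid axes and the 2-D value array directly;
# Z from meshgrid is indexed as Z[y, x], hence the argument order.
spline = RectBivariateSpline(y, x, Z)
zhat_spline = spline(ylin, xlin)  # shape (len(ylin), len(xlin)), ready for imshow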
MCVE
Here is a complete example using the trial data generated above. Just bind it to your Excel columns:
# Flatten trial data to meet your requirement:
x = X.ravel()
y = Y.ravel()
z = Z.ravel()
# Resampling on a square grid with a given resolution:
resolution = 11
xlin = np.linspace(x.min(), x.max(), resolution)
ylin = np.linspace(y.min(), y.max(), resolution)
Xlin, Ylin = np.meshgrid(xlin, ylin)
# Nearest-neighbour multi-dimensional interpolation:
interpolant = interpolate.NearestNDInterpolator([r for r in zip(x, y)], z)
Zhat = interpolant(Xlin.ravel(), Ylin.ravel()).reshape(Xlin.shape)
# Render and interpolate again if necessary:
fig, axe = plt.subplots()
axe.imshow(Zhat, origin="lower", cmap="jet", interpolation='bicubic')
Which renders as expected:
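To bind it to your file instead of the trial data, you would only replace the flattened arrays with the Excel columns from the question, roughly like this (a sketch assuming the column names x, y, z of test.xlsx):
import pandas as pd

# Hypothetical binding to the spreadsheet from the question (columns x, y, z):
excel_data_df = pd.read_excel('test.xlsx')
x = excel_data_df['x'].to_numpy()
y = excel_data_df['y'].to_numpy()
z = excel_data_df['z'].to_numpy()
# ...then reuse the resampling and NearestNDInterpolator steps above unchanged.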
So I have used scikit-learn's Gaussian mixture models (http://scikit-learn.org/stable/modules/mixture.html) to fit my data, and now I want to use the model. How can I do it? Specifically:
How can I plot the probability density distribution?
How can I calculate the mean square error of the fitting model?
Here is the code you may need:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from sklearn import mixture
import matplotlib as mpl
from matplotlib.patches import Ellipse
%matplotlib inline
n_samples = 300
# generate a random sample: a single shifted Gaussian
np.random.seed(0)
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 5])
sample= shifted_gaussian
# fit a Gaussian Mixture Model with two components
clf = mixture.GMM(n_components=2, covariance_type='full')
clf.fit(sample)
# plot sample scatter
plt.scatter(sample[:, 0], sample[:, 1])
# 1. Plot the probability density distribution
# 2. Calculate the mean square error of the fitting model
UPDATE:
I can plot the distribution by:
x = np.linspace(-20.0, 30.0)
y = np.linspace(-20.0, 40.0)
X, Y = np.meshgrid(x, y)
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -clf.score_samples(XX)[0]
Z = Z.reshape(X.shape)
CS = plt.contour(X, Y, Z, norm=LogNorm(vmin=1.0, vmax=1000.0),
levels=np.logspace(0, 3, 10))
CB = plt.colorbar(CS, shrink=0.8, extend='both')
But isn't it quite strange? Is there a better way to do it? Can I plot something like this?
I think the result is reasonable, if you adjust the xlim and ylim a little bit:
# plot sample scatter
plt.scatter(sample[:, 0], sample[:, 1], marker='+', alpha=0.5)
# 1. Plot the probability density distribution
# 2. Calculate the mean square error of the fitting model
x = np.linspace(-20.0, 30.0, 100)
y = np.linspace(-20.0, 40.0, 100)
X, Y = np.meshgrid(x, y)
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -clf.score_samples(XX)[0]
Z = Z.reshape(X.shape)
CS = plt.contour(X, Y, Z, norm=LogNorm(vmin=1.0, vmax=10.0),
levels=np.logspace(0, 1, 10))
CB = plt.colorbar(CS, shrink=0.8, extend='both')
plt.xlim((10,30))
plt.ylim((-5, 15))
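If you want to show the density itself rather than the negative log-likelihood, here is a minimal sketch (assuming the clf, X, Y, XX objects above and the older mixture.GMM API used in the question, whose score_samples returns a (log-likelihood, responsibilities) tuple):
# Exponentiate the per-sample log-likelihoods to get actual density values;
# the newer GaussianMixture API returns the log-likelihoods directly from score_samples.
logprob = clf.score_samples(XX)[0]
pdf = np.exp(logprob).reshape(X.shape)
plt.contourf(X, Y, pdf, levels=10, cmap='jet')
plt.colorbar(label='probability density')
plt.xlim((10, 30))
plt.ylim((-5, 15))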
I have a point cloud of magnetization directions with azimuth (declination between 0° and 360°) and inclination between 0° and 90°. I display these points in a polar azimuthal equidistant projection (using matplotlib Basemap). That means 90° inclination points directly at the center of the plot and the declination runs clockwise.
My problem is that I also want to plot isolines around these point clouds, which should represent where the highest density of points/directions is located. What is the easiest way to do this? It would be nice to mark the isoline which encircles 50% of my data. If I am not mistaken, this would be the median.
So far I've fiddled around with gaussian_kde and the outlier detection of sklearn (1 and 2), but the results are not as expected.
Any ideas?
Edit #1:
First, gaussian_kde:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from mpl_toolkits.basemap import Basemap
m = Basemap(projection='spaeqd',boundinglat=0,lon_0=180,resolution='l',round=True)
m.drawparallels(np.arange(-80.,1.,10.),labels=[False,True,True,False])
m.drawmeridians(np.arange(-180.,181.,30.),labels=[True,False,False,True])
#data
x, y = m(m1,-m2) # m2 is negative because I want to plot in the southern hemisphere!
#set up the grid for evaluation of the KDE
yi = np.arange(0,360.1,1)
xi = np.arange(-90,1,1)
xx,yy = np.meshgrid(xi,yi)
X, Y = m(xx,yy) # to have it in my basemap projection
#setup the gaussian kde and evaluate it
#pretty much similar to the scipy.stats docs
positions = np.vstack([X.ravel(), Y.ravel()])
values = np.vstack([x, y])
kernel = stats.gaussian_kde(values)
Z = np.reshape(kernel(positions).T, X.shape)
#plot original points and probability density function
ax = plt.gca()
ax.scatter(x,y,c = 'Crimson')
TOT = ax.contour(X,Y,Z,cmap=plt.cm.Reds)
plt.show()
Then sklearn:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from mpl_toolkits.basemap import Basemap
from sklearn import svm
from sklearn.covariance import EllipticEnvelope
m = Basemap(projection='spaeqd',boundinglat=0,lon_0=180,resolution='l',round=True)
m.drawparallels(np.arange(-80.,1.,10.),labels=[False,True,True,False])
m.drawmeridians(np.arange(-180.,181.,30.),labels=[True,False,False,True])
#data
x, y = m(m1,-m2) # m2 is negative because I want to plot in the southern hemisphere!
#Similar to examples in sklearn docs
outliers_fraction = 0.5
oneclass_svm = svm.OneClassSVM(nu=0.95 * outliers_fraction + 0.05,\
kernel="rbf", gamma=0.1,verbose=True)
#set up grid
yi = np.arange(0,360.1,1)
xi = np.arange(-90,1,1)
R,T = np.meshgrid(xi,yi)
xx, yy = m(T,R)
x, y = m(m1,-m2)
#standardize data as suggested by docs
x_std = (x-x.mean())/x.std()
y_std = (y-y.mean())/y.std()
values = np.vstack([x_std, y_std])
#fit data and calculate threshold - this should mark my median, according to the value of outliers_fraction
oneclass_svm.fit(values.T)
y_pred = oneclass_svm.decision_function(values.T).ravel()
threshold = stats.scoreatpercentile(y_pred, 100 * outliers_fraction)
y_pred = y_pred > threshold
#Target vector for evaluation
TV = np.c_[xx.ravel(), yy.ravel()]
TV = (TV-TV.mean(axis=0))/TV.std(axis=0) #must be standardized as well
# evaluation - this is now shifted in the plot and does not fit my point cloud anymore, because of the standardization
Z = oneclass_svm.decision_function(TV)
Z = Z.reshape(xx.shape)
#plotting - very similar to the example in the docs
ax = plt.gca()
ax.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7), \
cmap=plt.cm.Blues_r)
ax.contour(xx, yy, Z, levels=[threshold],
linewidths=2, colors='red')
ax.contourf(xx, yy, Z, levels=[threshold, Z.max()],
colors='orange')
ax.scatter(x, y,s=30, marker='s',c = 'RoyalBlue',label = 'Mr')
plt.show()
The EllipticEnvelope works, but it is not what I want.
OK, I think I might have found a solution, but it will not work in every case. In my opinion it should fail when the data is multimodally distributed.
Nevertheless, here is my thought process:
The probability density function (PDF) is essentially the same as a continuous histogram. So I used np.percentile to calculate the lower and upper 25% percentiles of both vectors. Then I searched for the value of the PDF at these percentiles, and this should be the isoline that I want.
Of course this should also work in the polar azimuthal equidistant (or any other) projection.
Here is a little example with two gamma-distributed data sets in a crossplot:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.interpolate import LinearNDInterpolator, RegularGridInterpolator
#generate some data
x = np.random.gamma(10, 0.8, 10000)  # size must be an integer
y = np.random.gamma(4, 0.3, 10000)
#set up the data and grid for the 2D PDF
values = np.vstack([x,y])
pdf_x = np.linspace(x.min(), x.max(), 100)  # number of samples must be an integer
pdf_y = np.linspace(y.min(), y.max(), 100)
X,Y = np.meshgrid(pdf_x,pdf_y)
kernel = stats.gaussian_kde(values)
#evaluate the PDF at every grid location
positions = np.vstack([X.ravel(), Y.ravel()])
Z = np.reshape(kernel(positions).T, X.shape)
#upper and lower quartiles of x and y data
xql = np.percentile(x,25)
xqu = np.percentile(x,75)
yql = np.percentile(y,25)
yqu = np.percentile(y,75)
#set up the interpolator - I could also use RegularGridInterpolator, which should be faster (see the sketch after this block)
Interp = LinearNDInterpolator((X.flatten(),Y.flatten()),Z.flatten())
#1D example to illustrate what I mean
plt.figure()
kernel2 = stats.gaussian_kde(x)
plt.hist(x, 30, density=True)  # 'normed' was removed from matplotlib; use density=True
plt.plot(pdf_x,kernel2(pdf_x),'r--',linewidth=2)
#plot vertical lines at the upper and lower quartiles
plt.vlines(np.percentile(x,25),0,0.2,color='red')
plt.vlines(np.percentile(x,75),0,0.2,color='red')
#Scatterplot / Crossplot with PDF and 25 and 75% isolines
plt.figure()
plt.scatter(x,y)
#search for the isolines defining the upper and lower quartiles
#the lower quartile's isoline should encircle 75% of the data
levels = sorted([Interp(xql, yql), Interp(xqu, yqu)])  # contour expects increasing levels
plt.contour(X,Y,Z,levels=levels,colors='orange')
plt.show()
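And here is a minimal sketch of the RegularGridInterpolator alternative mentioned in the comment above (assuming pdf_x, pdf_y, X, Y, Z and the quartiles from the block just shown); it takes the grid axes plus the 2-D value array instead of flattened coordinates:
# RegularGridInterpolator expects the axes of the regular grid and values of shape
# (len(pdf_y), len(pdf_x)); query points are given as (y, x) pairs in the same order.
InterpRG = RegularGridInterpolator((pdf_y, pdf_x), Z)
levels_rg = sorted([float(InterpRG((yql, xql))), float(InterpRG((yqu, xqu)))])
plt.contour(X, Y, Z, levels=levels_rg, colors='orange')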
To finish up, I will give a quick example of what it looks like in a polar azimuthal equidistant projection:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.interpolate import LinearNDInterpolator
from mpl_toolkits.basemap import Basemap
#set up the coordinate projection
m = Basemap(projection='spaeqd',boundinglat=0,lon_0=180,\
resolution='l',round=True,suppress_ticks=True)
parallelGrid = np.arange(-80.,1.,10.)
meridianGrid = np.arange(-180.0,180.1,30)
m.drawparallels(parallelGrid,labels=[False,False,False,False])
m.drawmeridians(meridianGrid,labels=[False,False,False,False],labelstyle='+/-',fmt='%i')
#Found this on stackoverflow - labels it exactly how I want it
ax = plt.gca()
ax.text(0.5,1.025,'N',transform=ax.transAxes,\
horizontalalignment='center',verticalalignment='bottom',size=25)
for para in np.arange(30,360,30):
x= (1.1*0.5*np.sin(np.deg2rad(para)))+0.5
y= (1.1*0.5*np.cos(np.deg2rad(para)))+0.5
ax.text(x,y,u'%i\N{DEGREE SIGN}'%para,transform=ax.transAxes,\
horizontalalignment='center',verticalalignment='center')
#generate some data
x = np.random.randint(180,225,size=15)
y = np.random.randint(30,40,size=15)
#into projection
x,y = m(x,-y)
values = np.vstack([x,y])
pdf_x = np.arange(0,361,1)
pdf_y = np.arange(0,91,1)
#into projection
X,Y = np.meshgrid(pdf_x,pdf_y)
X,Y = m(X,-Y)
kernel = stats.gaussian_kde(values)
positions = np.vstack([X.ravel(), Y.ravel()])
Z = np.reshape(kernel(positions).T, X.shape)
xql = np.percentile(x,25)
xqu = np.percentile(x,75)
yql = np.percentile(y,25)
yqu = np.percentile(y,75)
Interp = LinearNDInterpolator((X.flatten(),Y.flatten()),Z.flatten())
ax = plt.gca()
ax.scatter(x,y)
levels = [Interp(xql,yql),Interp(xqu,yqu)]
ax.contour(X,Y,Z,levels=levels,colors='red')
plt.show()