I'm trying to plot the predicted mean from a Gaussian process regression as a 3-D contour. I've followed the Plot 3D Contour from an Image using extent with Matplotlib
and mplot3d example code: contour3d_demo3.py threads. Here is my code:
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
from matplotlib import cm
x_train = np.array([[0,0],[2,2],[3,3]])
y_train = np.array([[200,321,417]])
xvalues = np.array([0,1,2,3])
yvalues = np.array([0,1,2,3])
a,b = np.meshgrid(xvalues,yvalues)
positions = np.vstack([a.ravel(), b.ravel()])
x_test = (np.array(positions)).T
kernel = C(1.0, (1e-3, 1e3)) * RBF(10)
gp = GaussianProcessRegressor(kernel=kernel)
gp.fit(x_train, y_train)
y_pred_test = gp.predict(x_test)
fig = plt.figure()
ax = fig.add_subplot(projection = '3d')
x=y=np.arange(0,3,1)
X, Y = np.meshgrid(x,y)
Z = y_pred_test
cset = ax.contour(X, Y, Z, cmap=cm.coolwarm)
ax.clabel(cset, fontsize=9, inline=1)
plt.show()
After running the above code, I get the following error on the console:
I want the x- and y-axis as the 2-D plane and the predicted values on the z-axis. The sample plot is as follows:
What is wrong with my code?
Thank you!
The specific error you've mentioned comes from your y_train, which might be a typo. It should be:
y_train_ : array-like, shape = (n_samples, [n_output_dims])
According to your x_train, you have 3 samples. So your y_train should have shape (3, 1) rather than (1, 3).
You also have other bugs in the plotting part:
1. add_subplot should have a position before projection = '3d'.
2. Z should have the same shape as X and Y for a contour plot.
3. Because of 2, your x and y should match xvalues and yvalues.
Taken together, you might need to make the following changes:
...
y_train = np.array([200,321,417])
...
ax = fig.add_subplot(111, projection = '3d')
x=y=np.arange(0,4,1)
...
Z = y_pred_test.reshape(X.shape)
...
Just to mention two things:
The plot you will get after these changes won't match the figure you've shown. The figure in your question is a surface plot instead of a contour plot. You can use ax.plot_surface to get that type of plot.
I think you already know this, but just in case: your plot won't be as smooth as your sample plot since your np.meshgrid is sparse.
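Putting the changes above together, the corrected script would look roughly like this (still on the same sparse 4x4 grid, so the contours will look coarse):
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
import matplotlib.pyplot as plt
from matplotlib import cm
#three training points, one target value each (shape (3,) instead of (1, 3))
x_train = np.array([[0, 0], [2, 2], [3, 3]])
y_train = np.array([200, 321, 417])
#4x4 grid of test points
xvalues = np.array([0, 1, 2, 3])
yvalues = np.array([0, 1, 2, 3])
a, b = np.meshgrid(xvalues, yvalues)
x_test = np.vstack([a.ravel(), b.ravel()]).T
kernel = C(1.0, (1e-3, 1e3)) * RBF(10)
gp = GaussianProcessRegressor(kernel=kernel)
gp.fit(x_train, y_train)
y_pred_test = gp.predict(x_test)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X, Y = np.meshgrid(xvalues, yvalues)   #x and y now match xvalues and yvalues
Z = y_pred_test.reshape(X.shape)       #Z has the same shape as X and Y
cset = ax.contour(X, Y, Z, cmap=cm.coolwarm)
ax.clabel(cset, fontsize=9, inline=1)
plt.show()
To get something like your sample figure, build x_test from a denser grid (e.g. np.linspace(0, 3, 50) for both axes) and call ax.plot_surface(X, Y, Z) instead of ax.contour.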
I am trying to plot the joint distribution of two normally distributed variables.
The code below plots one normally distributed variable. What would the code be for plotting two normally distributed variables?
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab
import math
mu = 0
variance = 1
sigma = math.sqrt(variance)
x = np.linspace(-3, 3, 100)
plt.plot(x,mlab.normpdf(x, mu, sigma))
plt.show()
It sounds like what you're looking for is a Multivariate Normal Distribution. This is implemented in scipy as scipy.stats.multivariate_normal. It's important to remember that you are passing a covariance matrix to the function, so to keep things simple keep the off-diagonal elements as zero:
[X variance,     0     ]
[     0,     Y variance]
Here is an example using this function and generating a 3D plot of the resulting distribution. I added the colormap to make the curves easier to see, but feel free to remove it.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
from mpl_toolkits.mplot3d import Axes3D
#Parameters to set
mu_x = 0
variance_x = 3
mu_y = 0
variance_y = 15
#Create grid and multivariate normal
x = np.linspace(-10,10,500)
y = np.linspace(-10,10,500)
X, Y = np.meshgrid(x,y)
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X; pos[:, :, 1] = Y
rv = multivariate_normal([mu_x, mu_y], [[variance_x, 0], [0, variance_y]])
#Make a 3D plot
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in newer Matplotlib
ax.plot_surface(X, Y, rv.pdf(pos),cmap='viridis',linewidth=0)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
plt.show()
Giving you this plot:
Edit: the method used below was deprecated in Matplotlib v2.2 and removed in v3.1.
A simpler version is available through matplotlib.mlab.bivariate_normal
It takes the following arguments, so you don't need to worry about covariance matrices:
matplotlib.mlab.bivariate_normal(X, Y, sigmax=1.0, sigmay=1.0, mux=0.0, muy=0.0, sigmaxy=0.0)
Here X, and Y are again the result of a meshgrid so using this to recreate the above plot:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.mlab import bivariate_normal
from mpl_toolkits.mplot3d import Axes3D
#Parameters to set
mu_x = 0
sigma_x = np.sqrt(3)
mu_y = 0
sigma_y = np.sqrt(15)
#Create grid and multivariate normal
x = np.linspace(-10,10,500)
y = np.linspace(-10,10,500)
X, Y = np.meshgrid(x,y)
Z = bivariate_normal(X,Y,sigma_x,sigma_y,mu_x,mu_y)
#Make a 3D plot
fig = plt.figure()
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, Z,cmap='viridis',linewidth=0)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
plt.show()
Giving:
The following adaptation of @Ianhi's code above returns a contour plot version of the 3D plot above.
import matplotlib.pyplot as plt
from matplotlib import style
style.use('fivethirtyeight')
import numpy as np
from scipy.stats import multivariate_normal
#Parameters to set
mu_x = 0
variance_x = 3
mu_y = 0
variance_y = 15
x = np.linspace(-10,10,500)
y = np.linspace(-10,10,500)
X,Y = np.meshgrid(x,y)
pos = np.array([X.flatten(),Y.flatten()]).T
rv = multivariate_normal([mu_x, mu_y], [[variance_x, 0], [0, variance_y]])
fig = plt.figure(figsize=(10,10))
ax0 = fig.add_subplot(111)
ax0.contour(X, Y, rv.pdf(pos).reshape(500,500))
plt.show()
While the other answers are great, I wanted to achieve similar results while also illustrating the distribution with a scatter plot of the sample.
More details can be found here: Python 3d plot of multivariate gaussian distribution
The result looks like:
And is generated using the following code:
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy.stats import multivariate_normal
# Sample parameters
mu = np.array([0, 0])
sigma = np.array([[0.7, 0.2], [0.2, 0.3]])
rv = multivariate_normal(mu, sigma)
sample = rv.rvs(500)
# Bounds parameters
x_abs = 2.5
y_abs = 2.5
x_grid, y_grid = np.mgrid[-x_abs:x_abs:.02, -y_abs:y_abs:.02]
pos = np.empty(x_grid.shape + (2,))
pos[:, :, 0] = x_grid
pos[:, :, 1] = y_grid
levels = np.linspace(0, 1, 40)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in newer Matplotlib
# Removes the grey panes in 3d plots
ax.xaxis.set_pane_color((1.0, 1.0, 1.0, 0.0))
ax.yaxis.set_pane_color((1.0, 1.0, 1.0, 0.0))
ax.zaxis.set_pane_color((1.0, 1.0, 1.0, 0.0))
# The heatmap
ax.contourf(x_grid, y_grid, 0.1 * rv.pdf(pos),
            zdir='z', levels=0.1 * levels, alpha=0.9)
# The wireframe
ax.plot_wireframe(x_grid, y_grid, rv.pdf(pos),
                  rstride=10, cstride=10, color='k')
# The scatter. Note that the altitude is defined based on the pdf of the
# random variable
ax.scatter(sample[:, 0], sample[:, 1], 1.05 * rv.pdf(sample), c='k')
ax.legend()
ax.set_title("Gaussian sample and pdf")
ax.set_xlim3d(-x_abs, x_abs)
ax.set_ylim3d(-y_abs, y_abs)
ax.set_zlim3d(0, 1)
plt.show()
I want to plot a probability density function z = f(x,y).
I found the code to plot a surface in Color matplotlib plot_surface command with surface gradient.
But I don't know how to convert the z values into a grid so I can plot them.
The example code and my modification are below.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
%matplotlib inline
n_samples = 1000
# generate random sample, two components
np.random.seed(0)
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 5])
sample = shifted_gaussian
# fit a Gaussian Mixture Model with two components
clf = mixture.GaussianMixture(n_components=3, covariance_type='full')  # mixture.GMM was replaced by GaussianMixture in newer scikit-learn
clf.fit(sample)
# Plot it
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in newer Matplotlib
X = np.arange(-5, 5, .25)
Y = np.arange(-5, 5, .25)
X, Y = np.meshgrid(X, Y)
## In the example code, Z is generated from the grid:
# R = np.sqrt(X**2 + Y**2)
# Z = np.sin(R)
# In my case,
# for each point [x,y], the probability value is
# z = clf.score([x,y])
# but how can I generate a grid Z?
Gx, Gy = np.gradient(Z) # gradients with respect to x and y
G = (Gx**2+Gy**2)**.5 # gradient magnitude
N = G/G.max() # normalize 0..1
surf = ax.plot_surface(
X, Y, Z, rstride=1, cstride=1,
facecolors=cm.jet(N),
linewidth=0, antialiased=False, shade=False)
plt.show()
The original approach is to generate z on a mesh grid. But in my case, the fitted model cannot return results in a grid-like style, so the problem is: how can I generate the grid-style z values and plot them?
If I understand correctly, you basically have a function z that takes two scalar values x,y in a list and returns another scalar z_val. In other words z_val = z([x,y]), right?
If that's the case, then you could do the following (note that this is not written with efficiency in mind, but with a focus on readability):
import numpy as np
from itertools import product
X = np.arange(15)  # or whatever values for x
Y = np.arange(5)   # or whatever values for y
N, M = len(X), len(Y)
Z = np.zeros((N, M))
for i, (x, y) in enumerate(product(X, Y)):
    Z[np.unravel_index(i, (N, M))] = z([x, y])
If you want to use plot_surface, then follow that with this:
X, Y = np.meshgrid(X, Y)
ax.plot_surface(X, Y, Z.T)
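For the GMM case in your question, the same idea plugs straight into the grid you already built with np.meshgrid. A minimal sketch, assuming the current scikit-learn API (where mixture.GMM has become mixture.GaussianMixture and per-sample log-densities come from score_samples), and reusing X, Y and clf from your code:
import numpy as np
#stack the grid into an (n_points, 2) array, one row per [x, y] point
grid_points = np.column_stack([X.ravel(), Y.ravel()])
#score_samples returns one log-density per row; exponentiate and reshape to the grid
Z = np.exp(clf.score_samples(grid_points)).reshape(X.shape)
Z now has the same shape as X and Y, so it can be fed directly into your plot_surface call.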
I have a point cloud of magnetization directions with azimuth (declination between 0° and 360°) and inclination between 0° and 90°. I display these points in a polar azimuthal equidistant projection (using matplotlib basemap). That means 90° inclination will point directly at the center of the plot and the declination runs clockwise.
My problem is that I also want to plot isolines around these point clouds, which should represent where the highest density of points/directions is located. What is the easiest way to do this? It would be nice to mark the isoline which encircles 50% of my data. If I am not mistaken, this would be the median.
So far I've fiddled around with gaussian_kde and the outlier detection of sklearn (1 and 2), but the results are not as expected.
Any ideas?
Edit #1:
First, gaussian_kde:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from mpl_toolkits.basemap import Basemap
m = Basemap(projection='spaeqd',boundinglat=0,lon_0=180,resolution='l',round=True)
m.drawparallels(np.arange(-80.,1.,10.),labels=[False,True,True,False])
m.drawmeridians(np.arange(-180.,181.,30.),labels=[True,False,False,True])
#data
x, y = m(m1,-m2) #m2 is negative because I plot in the southern hemisphere!
#set up the grid for evaluation of the KDE
yi = np.arange(0,360.1,1)
xi = np.arange(-90,1,1)
xx,yy = np.meshgrid(xi,yi)
X, Y = m(xx,yy) # to have it in my basemap projection
#set up the gaussian kde and evaluate it
#pretty much similar to the scipy.stats docs
positions = np.vstack([X.ravel(), Y.ravel()])
values = np.vstack([x, y])
kernel = stats.gaussian_kde(values)
Z = np.reshape(kernel(positions).T, X.shape)
#plot original points and probability density function
ax = plt.gca()
ax.scatter(x,y,c = 'Crimson')
TOT = ax.contour(X,Y,Z,cmap=plt.cm.Reds)
plt.show()
Then sklearn:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from mpl_toolkits.basemap import Basemap
from sklearn import svm
from sklearn.covariance import EllipticEnvelope
m = Basemap(projection='spaeqd',boundinglat=0,lon_0=180,resolution='l',round=True)
m.drawparallels(np.arange(-80.,1.,10.),labels=[False,True,True,False])
m.drawmeridians(np.arange(-180.,181.,30.),labels=[True,False,False,True])
#data
x, y = m(m1,-m2) #m2 is negative because I plot in the southern hemisphere!
#Similar to examples in sklearn docs
outliers_fraction = 0.5
oneclass_svm = svm.OneClassSVM(nu=0.95 * outliers_fraction + 0.05,
                               kernel="rbf", gamma=0.1, verbose=True)
#set up grid
yi = np.arange(0,360.1,1)
xi = np.arange(-90,1,1)
R,T = np.meshgrid(xi,yi)
xx, yy = m(T,R)
x, y = m(m1,-m2)
#standardize data as suggested by docs
x_std = (x-x.mean())/x.std()
y_std = (y-y.mean())/y.std()
values = np.vstack([x_std, y_std])
#fit data and calculate threshold - this should mark my median - according to value of outliers_fraction
oneclass_svm.fit(values.T)
y_pred = oneclass_svm.decision_function(values.T).ravel()
threshold = stats.scoreatpercentile(y_pred, 100 * outliers_fraction)
y_pred = y_pred > threshold
#Target vector for evaluation
TV = np.c_[xx.ravel(), yy.ravel()]
TV = (TV-TV.mean(axis=0))/TV.std(axis=0) #must be standardized as well
# evaluation - this is now shifted in the plot and does not fit my point cloud anymore - because of the standardization
Z = oneclass_svm.decision_function(TV)
Z = Z.reshape(xx.shape)
#plotting - very similar to the example in the docs
ax = plt.gca()
ax.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7),
            cmap=plt.cm.Blues_r)
ax.contour(xx, yy, Z, levels=[threshold],
           linewidths=2, colors='red')
ax.contourf(xx, yy, Z, levels=[threshold, Z.max()],
            colors='orange')
ax.scatter(x, y,s=30, marker='s',c = 'RoyalBlue',label = 'Mr')
plt.show()
The EllipticEnvelope works, but it is not what I want.
Ok, I think I might have found a solution. But it should not work in every case; in my opinion it should fail when the data is multimodally distributed.
Nevertheless, here is my thought process:
The probability density function (PDF) is essentially the same as a continuous histogram. So I used np.percentile to calculate the upper and lower quartiles (25th and 75th percentiles) of both vectors. Then I searched for the value of the PDF at these percentiles, and this should be the isoline that I want.
Of course this should also work in the polar stereographic (or any other) projection.
Here is a little example with two gamma-distributed data sets in a crossplot:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.interpolate import LinearNDInterpolator, RegularGridInterpolator
#generate some data
x = np.random.gamma(10, 0.8, 10000)  #size must be an integer in newer numpy
y = np.random.gamma(4, 0.3, 10000)
#set up the data and grid for the 2D PDF
values = np.vstack([x,y])
pdf_x = np.linspace(x.min(), x.max(), 100)  #num must be an integer in newer numpy
pdf_y = np.linspace(y.min(), y.max(), 100)
X,Y = np.meshgrid(pdf_x,pdf_y)
kernel = stats.gaussian_kde(values)
#evaluate the PDF at every grid location
positions = np.vstack([X.ravel(), Y.ravel()])
Z = np.reshape(kernel(positions).T, X.shape)
#upper and lower quartiles of x and y data
xql = np.percentile(x,25)
xqu = np.percentile(x,75)
yql = np.percentile(y,25)
yqu = np.percentile(y,75)
#set up the interpolator - I could also use RegularGridInterpolator - should be faster
Interp = LinearNDInterpolator((X.flatten(),Y.flatten()),Z.flatten())
#1D example to illustrate what I mean
plt.figure()
kernel2 = stats.gaussian_kde(x)
plt.hist(x, 30, density=True)  #normed was removed in newer Matplotlib
plt.plot(pdf_x,kernel2(pdf_x),'r--',linewidth=2)
#plot vertical lines at the upper and lower quartiles
plt.vlines(np.percentile(x,25),0,0.2,color='red')
plt.vlines(np.percentile(x,75),0,0.2,color='red')
#Scatterplot / Crossplot with PDF and 25 and 75% isolines
plt.figure()
plt.scatter(x,y)
#search for the isolines defining the upper and lower quartiles
#the lower quartiles isoline should encircle 75% of the data
levels = [Interp(xql,yql),Interp(xqu,yqu)]
plt.contour(X,Y,Z,levels=levels,colors='orange')
plt.show()
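For the 50% isoline asked about in the question, there is also a more direct variant (not part of the quartile approach above, just a sketch): evaluate the KDE at the sample points themselves and take a percentile of those densities, so the resulting contour encloses roughly the chosen fraction of the samples. Reusing kernel, values, X, Y and Z from above:
#density level whose contour encloses roughly `fraction` of the samples
def density_level(kernel, values, fraction=0.5):
    sample_densities = kernel(values)   #KDE evaluated at each sample point
    return np.percentile(sample_densities, 100 * (1 - fraction))
level_50 = density_level(kernel, values, 0.5)
plt.contour(X, Y, Z, levels=[level_50], colors='red')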
To finish up, I will give a quick example of what it looks like in a polar stereographic projection:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.interpolate import LinearNDInterpolator
from mpl_toolkits.basemap import Basemap
#set up the coordinate projection
m = Basemap(projection='spaeqd', boundinglat=0, lon_0=180,
            resolution='l', round=True, suppress_ticks=True)
parallelGrid = np.arange(-80.,1.,10.)
meridianGrid = np.arange(-180.0,180.1,30)
m.drawparallels(parallelGrid,labels=[False,False,False,False])
m.drawmeridians(meridianGrid,labels=[False,False,False,False],labelstyle='+/-',fmt='%i')
#Found this on stackoverflow - labels it exactly how I want it
ax = plt.gca()
ax.text(0.5, 1.025, 'N', transform=ax.transAxes,
        horizontalalignment='center', verticalalignment='bottom', size=25)
for para in np.arange(30, 360, 30):
    x = (1.1*0.5*np.sin(np.deg2rad(para)))+0.5
    y = (1.1*0.5*np.cos(np.deg2rad(para)))+0.5
    ax.text(x, y, u'%i\N{DEGREE SIGN}' % para, transform=ax.transAxes,
            horizontalalignment='center', verticalalignment='center')
#generate some data
x = np.random.randint(180,225,size=15)
y = np.random.randint(30,40,size=15)
#into projection
x,y = m(x,-y)
values = np.vstack([x,y])
pdf_x = np.arange(0,361,1)
pdf_y = np.arange(0,91,1)
#into projection
X,Y = np.meshgrid(pdf_x,pdf_y)
X,Y = m(X,-Y)
kernel = stats.gaussian_kde(values)
positions = np.vstack([X.ravel(), Y.ravel()])
Z = np.reshape(kernel(positions).T, X.shape)
xql = np.percentile(x,25)
xqu = np.percentile(x,75)
yql = np.percentile(y,25)
yqu = np.percentile(y,75)
Interp = LinearNDInterpolator((X.flatten(),Y.flatten()),Z.flatten())
ax = plt.gca()
ax.scatter(x,y)
levels = [Interp(xql,yql),Interp(xqu,yqu)]
ax.contour(X,Y,Z,levels=levels,colors='red')
plt.show()
Most pyplot examples out there use linear data, but what if data is scattered?
x = 3,7,9
y = 1,4,5
z = 20,3,7
A better meshgrid for contourf:
xi = np.linspace(min(x)-1, max(x)+1, 9)
yi = np.linspace(min(y)-1, max(y)+1, 9)
X, Y = np.meshgrid(xi, yi)
Now "z" data got to be interpolated onto the meshgrid.
numpy.interp does little help here, while both linear and nn interpolaton of
zi = matplotlib.mlab.griddata(x,y,z,xi,yi,interp="linear")
returns rather strange results
scipy.interpolate.griddata cubic from second answer below needs something else to return data rather than nils
With custom levels data expected be looking something like this
This is what happens:
Although contour requires gridded data, we can cast scattered data onto a grid and then, using masked arrays, mask out the blank regions. I simulate this in the code below by creating a random array, then using it to mask a test dataset (shown at the bottom). The bulk of the code is taken from this matplotlib demo page.
import matplotlib
import numpy as np
from scipy.stats import multivariate_normal  # matplotlib.mlab.bivariate_normal was removed in v3.1
import matplotlib.pyplot as plt
matplotlib.rcParams['xtick.direction'] = 'out'
matplotlib.rcParams['ytick.direction'] = 'out'
delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-2.0, 2.0, delta)
X, Y = np.meshgrid(x, y)
# bivariate_normal took standard deviations; multivariate_normal takes a covariance
# matrix, so the sigmas are squared here
pos = np.dstack((X, Y))
Z1 = multivariate_normal([0.0, 0.0], [[1.0**2, 0.0], [0.0, 1.0**2]]).pdf(pos)
Z2 = multivariate_normal([1.0, 1.0], [[1.5**2, 0.0], [0.0, 0.5**2]]).pdf(pos)
# difference of Gaussians
Z = 10.0 * (Z2 - Z1)
from numpy.random import random_sample
import numpy.ma as ma
J = random_sample(X.shape)
mask = J > 0.7
X = ma.masked_array(X, mask=mask)
Y = ma.masked_array(Y, mask=mask)
Z = ma.masked_array(Z, mask=mask)
plt.figure()
CS = plt.contour(X, Y, Z, 20)
plt.clabel(CS, inline=1, fontsize=10)
plt.title('Simplest default with labels')
plt.savefig('cat.png')
plt.show()
contourf will only work with a grid of data. If your data is scattered, then you'll need to create an interpolated grid matching your data, like this (note you'll need scipy to perform the interpolation):
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
# your data
x = [3,7,9]
y = [1,4,5]
z = [20,3,7]
# define grid.
xi = np.linspace(0,10,300)
yi = np.linspace(0,6,300)
# grid the data.
zi = griddata((x, y), z, (xi[None,:], yi[:,None]), method='cubic')
# contour the gridded data, plotting dots at the randomly spaced data points.
CS = plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
CS = plt.contourf(xi,yi,zi,15,cmap=plt.cm.jet)
plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=5)
plt.xlim(min(x),max(x))
plt.ylim(min(y),max(y))
plt.title('griddata test (%d points)' % len(x))
plt.show()
See here for the origin of that code.