I have recently been working with GPflow, in particular Gaussian process regression, to model a process for which I have access to approximated moments for each input. I have a vector of input values X of size (N,1) and a vector of responses Y of size (N,1). However, I also know, for each (x,y) pair, an approximation of the associated variance, skewness, kurtosis and so on for the particular y value.
From this, I know properties that inform me of appropriate likelihoods to use for each data point.
In the simplest case, I just assume all likelihoods are Gaussian and specify the variance at each point. I've created a minimal example of my code by adapting the tutorial at https://nbviewer.jupyter.org/github/GPflow/GPflow/blob/develop/doc/source/notebooks/advanced/varying_noise.ipynb#Demo-2:-grouped-noise-variances.
import numpy as np
import gpflow

def generate_data(N=100):
    X = np.random.rand(N)[:, None] * 10 - 5   # Inputs, shape N x 1
    F = 2.5 * np.sin(6 * X) + np.cos(3 * X)   # Mean function values
    groups = np.arange(0, N, 1).reshape(-1, 1)
    NoiseVar = np.array([i / 100.0 for i in range(N)])[groups]
    Y = F + np.random.randn(N, 1) * np.sqrt(NoiseVar)  # Noisy data
    return X, Y, groups, NoiseVar
# Get data
X, Y, groups, NoiseVar = generate_data()
Y_data = np.hstack([Y, groups])
# Generate one likelihood per data-point
likelihood = gpflow.likelihoods.SwitchedLikelihood( [gpflow.likelihoods.Gaussian(variance=NoiseVar[i]) for i in range(Y.shape[0])])
# model construction (notice that num_latent is 1)
kern = gpflow.kernels.Matern52(input_dim=1, lengthscales=0.5)
model = gpflow.models.VGP(X, Y_data, kern=kern, likelihood=likelihood, num_latent=1)
# Specify the likelihood as non-trainable
model.likelihood.set_trainable(False)
# build the natural gradients optimiser
natgrad_optimizer = gpflow.training.NatGradOptimizer(gamma=1.)
natgrad_tensor = natgrad_optimizer.make_optimize_tensor(model, var_list=[(model.q_mu, model.q_sqrt)])
session = model.enquire_session()
session.run(natgrad_tensor)
# update the cache of the variational parameters in the current session
model.anchor(session)
# Stop Adam from optimising the variational parameters
model.q_mu.trainable = False
model.q_sqrt.trainable = False
# Create Adam tensor
adam_tensor = gpflow.train.AdamOptimizer(learning_rate=0.1).make_optimize_tensor(model)
for i in range(200):
    session.run(natgrad_tensor)
    session.run(adam_tensor)

# update the cache of the parameters in the current session
model.anchor(session)
print(model)
The above code works for a Gaussian likelihood with known variances. Inspecting my real data, I see that it is very often skewed, and as a result I want to use non-Gaussian likelihoods to model it, but I am unsure how to specify the parameters of these other likelihoods given what I know.
So my question is: given this setup, how can I adapt my code so far to include non-Gaussian likelihoods at each data point, in particular specifying and fixing their parameters based on the known variance, skewness, kurtosis and so on associated with each individual y value?
Firstly, you will need to choose which non-Gaussian likelihood you use. GPflow includes various ones in likelihoods.py. You then need to adapt the line
likelihood = gpflow.likelihoods.SwitchedLikelihood(
    [gpflow.likelihoods.Gaussian(variance=NoiseVar[i]) for i in range(Y.shape[0])]
)
to give a list of your non-Gaussian likelihoods.
Which likelihood can take advantage of your skewness and kurtosis information is a statistical question. Depending on what you come up with, you may need to implement your own likelihood class, which can be done by inheriting from Likelihood. You should be able to follow some other examples from likelihoods.py.
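For example, if heavy tails (excess kurtosis) turn out to be the main issue, one option is a Student-t likelihood per data point. Below is a minimal sketch written against the GPflow 1.x API used in the question; the exact parameter names of StudentT (here assumed to be scale and df) can differ between GPflow versions, so check likelihoods.py for your install:

# Sketch only: df is a modelling choice, e.g. informed by your kurtosis estimates.
# A Student-t with df > 2 has variance scale**2 * df / (df - 2), so the known
# per-point variance is converted into the corresponding fixed scale.
df = 4.0
likelihood = gpflow.likelihoods.SwitchedLikelihood([
    gpflow.likelihoods.StudentT(scale=np.sqrt(NoiseVar[i] * (df - 2.0) / df), df=df)
    for i in range(Y.shape[0])
])
# then fix the per-point parameters after model construction, exactly as before:
# model.likelihood.set_trainable(False)

Note also that with a non-conjugate likelihood, a natural-gradient step size of gamma=1.0 is often no longer stable, so you may need to reduce it (for example to 0.1).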
Related
I have a data set X which I need to use to estimate the parameters by maximum likelihood (MLE). I have the log-likelihood function
def llh(alpha, beta):
    a = [0] * 999
    for i in range(1, 1000):
        a[i-1] = (-0.5) * (((1 / beta) * (X[i] - np.sin(alpha * X[i-1])))**2)
    return sum(a)
I need to maximise this but I have no idea how. I can only think of plotting 3D graphs to find the maximum point, but that gives me weird answers that are not what I want.
This is the plot I got.
Is there any other possible way to get my maximum-likelihood parameters, or am I going about this the wrong way? My data set follows the model X_k = sin(alpha * X_{k-1}) + beta * W_k, where W_k is normally distributed with mean 0 and sigma 1. Any help would be appreciated. Thank you!
You have to find the maximum of your likelihood numerically. In practice this is done by computing the negative log-likelihood and using numerical minimization to find the most likely parameters of your model to describe your data. Use scipy.optimize.minimize to minimize the negative log-likelihood.
I implemented a short example for normally distributed data.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_llh(popt, X):
    return -np.log(norm.pdf(X, loc=popt[0], scale=popt[1])).sum()

# example data
X = np.random.normal(loc=5, scale=2, size=1000)

# minimize the negative log-likelihood
res = minimize(neg_llh, x0=[2, 2], args=(X,))
print(res.x)
array([5.10023503, 2.01174199])
Since you are using sum, the llh you defined above is already a log-likelihood (a sum of per-observation log-density terms), so you want to minimize its negative:
def neg_llh(popt, X):
    alpha = popt[0]
    beta = popt[1]
    # negative of your log-likelihood; note the lag (X[i] vs X[i-1]) from your loop
    return -np.sum((-0.5) * (((1 / beta) * (X[1:] - np.sin(alpha * X[:-1])))**2))
Try minimizing this negative log-likelihood. Using your plot you can make a good initial guess (x0) for the values of alpha and beta.
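For example (a sketch; alpha0 and beta0 below are hypothetical starting values read off your plot):

alpha0, beta0 = 1.0, 1.0                    # hypothetical initial guesses from the plot
res = minimize(neg_llh, x0=[alpha0, beta0], args=(X,))
print(res.x)                                # estimated alpha and beta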
I have an array of 100x100 data points, where I'm trying to perform a Gaussian fit to each column of 100 values in the array. I then want the parameters found by fitting the first column to be used as the initial parameters for fitting the next column. Let's say I start with the initial parameters of 1000, 0, and 1, and the fit finds values of 800, 3, and 1.5. I then want the fitter to use these three parameters as the initial values for the next column.
My code is:
x = np.linspace(-50, 50, 100)
Gauss_Model = models.Gaussian1D(amplitude=1000., mean=0, stddev=1.)
Fitting_Model = fitting.LevMarLSQFitter()

Fit_Data = []
for i in range(0, Data_Array.shape[0]):
    Fit_Data.append(Fitting_Model(Gauss_Model, x, Data_Array[:, i]))
Right now it uses the same initial values for every fit. Does anyone know how to carry the fitted parameters forward like this (a kind of running update of the initial guesses) for a Gaussian fitting method? I would really appreciate any help or being pointed in the right direction, thanks!
I'm not familiar with the specific library you are using, but if you can get your fitted parameters out with something like fit_data[-1].amplitude or fit_data[-1].mean, then you could modify your loop to use something like:
for i in range(0, data_array.shape[0]):
    if fit_data:  # true if not an empty list
        Gauss_Model = models.Gaussian1D(amplitude=fit_data[-1].amplitude,
                                        mean=fit_data[-1].mean,
                                        stddev=fit_data[-1].stddev)
    fit_data.append(Fitting_Model(Gauss_Model, x, Data_Array[:, i]))
basically checking whether you have already fit a model, and if you have, use the most recent fitted amplitude, mean, and standard deviation as the starting point for your next Gauss_Model.
A thought: this might speed up your fitting, but it shouldn't result in a "better" fit to the 100 data points in each fit operation; each resulting model is probably already the best fit to the data it was presented. If you want to estimate the error in the parameters of your model, you can use the fact that, for two independent normal distributions A ~ N(m_a, v_a) and B ~ N(m_b, v_b), the distribution of A + B is N(m_a + m_b, v_a + v_b). Thus, the average of your n fitted means is distributed as N(sum(means)/n, sum(variances)/n^2). Basically you can say that your true mean is centered at the mean of your means, with a standard deviation of roughly stddev/sqrt(n) if the fitted standard deviations are all about equal.
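As a rough illustration of that pooling (a sketch, assuming each entry of fit_data is a fitted astropy Gaussian1D whose parameters expose a .value attribute):

import numpy as np

# hypothetical post-processing of the per-column fits
means = np.array([m.mean.value for m in fit_data])
stddevs = np.array([m.stddev.value for m in fit_data])

pooled_mean = means.mean()
# standard deviation of the pooled mean, assuming independent columns
pooled_sigma = np.sqrt(np.sum(stddevs**2)) / len(means)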
I also cannot tell what library you are using, and the details of how to do this probably depend on the details of how that library stores the fitted values. I can say that for lmfit (https://lmfit.github.io/lmfit-py/) we struggled with this sort of usage and arrived at a design that makes what you are trying to do pretty easy. With lmfit, you might compose this problem as:
import numpy as np
from lmfit.models import GaussianModel

x = np.linspace(-50, 50, 100)

# get Data_Array from somewhere....

# create a model for a Gaussian
Gauss_Model = GaussianModel()

# make a set of parameters, setting initial values
params = Gauss_Model.make_params(amplitude=1000, center=0, sigma=1.0)

Fit_Results = []
for i in range(Data_Array.shape[1]):
    result = Gauss_Model.fit(Data_Array[:, i], params, x=x)
    Fit_Results.append(result)
    # update `params` with the current best-fit params for the next column
    params = result.params
Note that this works because lmfit is careful that Model.fit() will not alter the input parameters, and will put the resulting best-fit parameters for each fit in result.params.
And, if you decide you do want to have all columns use the original initial values, just comment out that last params = result.params.
Lmfit has a lot more bells and whistles, but I hope that helps you do what you need.
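For example, once the loop has run, pulling the fitted values back out is straightforward (a small usage sketch; GaussianModel's parameters are named amplitude, center, and sigma):

# collect the best-fit centers and widths from each column's fit
centers = [r.params['center'].value for r in Fit_Results]
sigmas = [r.params['sigma'].value for r in Fit_Results]

# each result can also print a full fit report
print(Fit_Results[0].fit_report())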
I have some 2D data (GPS data) with clusters (stop locations) that I know resemble Gaussians with a characteristic standard deviation (proportional to the inherent noise of GPS samples). The figure below visualizes a sample that I expect has two such clusters. The image is 25 meters wide and 13 meters tall.
The sklearn module has a class, sklearn.mixture.GaussianMixture, which allows you to fit a mixture of Gaussians to data. It has a parameter, covariance_type, that lets you assume different things about the shape of the Gaussians. For example, you can force all components to share the same covariance matrix by passing 'tied'.
However, it does not appear directly possible to assume the covariance matrices to remain constant. From the sklearn source code it seems trivial to make a modification that enables this but it feels a bit excessive to make a pull request with an update that allows this (also I don't want to accidentally add bugs in sklearn). Is there a better way to fit a mixture to data where the covariance matrix of each Gaussian is fixed?
I want to assume that the SD should remain constant at around 3 meters for each component, since that is roughly the noise level of my GPS samples.
It is simple enough to write your own implementation of the EM algorithm. It would also give you a good intuition of the process. I assume that the covariance is known and that the prior probabilities of the components are equal, and fit only the means.
The class would look like this (in Python 3):
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

class FixedCovMixture:
    """ The model to estimate a Gaussian mixture with fixed covariance matrix. """
    def __init__(self, n_components, cov, max_iter=100, random_state=None, tol=1e-10):
        self.n_components = n_components
        self.cov = cov
        self.random_state = random_state
        self.max_iter = max_iter
        self.tol = tol

    def fit(self, X):
        # initialize the process:
        np.random.seed(self.random_state)
        n_obs, n_features = X.shape
        self.mean_ = X[np.random.choice(n_obs, size=self.n_components)]
        # run the EM loop until convergence
        i = 0
        for i in range(self.max_iter):
            new_centers = self.updated_centers(X)
            if np.sum(np.abs(new_centers - self.mean_)) < self.tol:
                break
            else:
                self.mean_ = new_centers
        self.n_iter_ = i

    def updated_centers(self, X):
        """ A single iteration """
        # E-step: estimate probability of each cluster given cluster centers
        cluster_posterior = self.predict_proba(X)
        # M-step: update cluster centers as weighted average of observations
        weights = (cluster_posterior.T / cluster_posterior.sum(axis=1)).T
        new_centers = np.dot(weights, X)
        return new_centers

    def predict_proba(self, X):
        likelihood = np.stack([multivariate_normal.pdf(X, mean=center, cov=self.cov)
                               for center in self.mean_])
        cluster_posterior = likelihood / likelihood.sum(axis=0)
        return cluster_posterior

    def predict(self, X):
        return np.argmax(self.predict_proba(X), axis=0)
On data like yours, the model converges quickly:
np.random.seed(1)
X = np.random.normal(size=(100,2), scale=3)
X[50:] += (10, 5)
model = FixedCovMixture(2, cov=[[3,0],[0,3]], random_state=1)
model.fit(X)
print(model.n_iter_, 'iterations')
print(model.mean_)
plt.scatter(X[:,0], X[:,1], s=10, c=model.predict(X))
plt.scatter(model.mean_[:,0], model.mean_[:,1], s=100, c='k')
plt.axis('equal')
plt.show();
and output
11 iterations
[[9.92301067 4.62282807]
[0.09413883 0.03527411]]
You can see that the estimated centers ((9.9, 4.6) and (0.09, 0.03)) are close to the true centers ((10, 5) and (0, 0)).
I think the best option would be to "roll your own" GMM model by defining a new scikit-learn class that inherits from GaussianMixture and overrides the methods to get the behavior you want. This way you keep the implementation on your side and don't have to change the scikit-learn code (or create a pull request).
Another option that might work is to look at the Bayesian version of GMM in scikit-learn. You might be able to set the prior for the covariance matrix so that the covariance is fixed. It seems to use the Wishart distribution as a prior for the covariance. However I'm not familiar enough with this distribution to help you out more.
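If you do want to experiment with that route, here is a hedged sketch; note that the prior only pulls the fitted covariances towards the given matrix rather than fixing them exactly, and I have not verified how strongly it pins them in practice:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# GPS noise of ~3 m standard deviation -> variance of ~9 m^2 on each axis
prior_cov = np.array([[9.0, 0.0],
                      [0.0, 9.0]])

bgm = BayesianGaussianMixture(
    n_components=2,
    covariance_type='full',
    covariance_prior=prior_cov,
    degrees_of_freedom_prior=1000,  # a large value keeps the fitted covariances near the prior
    random_state=0,
)
bgm.fit(X)
print(bgm.means_)
print(bgm.covariances_)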
First, you can use the 'spherical' covariance_type option, which will give you a single variance value for each component. This way you can check yourself: if the fitted variances are too different from each other, then something went wrong.
In the case where you want to preset the variance, the problem reduces to finding only the best centers for your components. You can do that with k-means, for example (see the sketch below). If you don't know the number of components, you may sweep over all plausible values (say 1 to 20) and evaluate the decrease in the fitting error. Or you can implement your own EM function, to find the centers and the number of components simultaneously.
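A minimal k-means sketch of that idea (assuming X is the (n_samples, 2) array of GPS points):

from sklearn.cluster import KMeans

# with the covariance fixed and spherical, only the centers are left to estimate
km = KMeans(n_clusters=2, random_state=0).fit(X)
print(km.cluster_centers_)
# within-cluster sum of squares, useful when sweeping over the number of components
print(km.inertia_)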
I've been working on getting a hierarchical model of some psychophysical behavioral data up and running in pymc3. I'm incredibly impressed with things overall, but after trying to get up to speed with Theano and pymc3 I have a model that mostly works, but has a couple of problems.
The code is built to fit a parameterized version of a Weibull to seven sets of data. Each trial is modeled as a binary Bernoulli outcome, while the thresholds (the output of thresh) act as the y values which are used to fit a Gaussian function for height, width, and elevation (a, c, and d of a typical Gaussian).
Using the parameterized Weibull seems to be working nicely, and is now hierarchical for the slope of the Weibull while the thresholds are fit separately for each chunk of data. However - the output I'm getting from k and y_est leads me to believe they may not be the correct size, and unlike the probability distributions, it doesn't look like I can specify shape (unless there's a theano way to do this that I haven't found - though from what I've read specifying shape in theano is tricky).
Ultimately, I'd like to use y_est to estimate the gaussian height or width, however the output right now results in an incredible mess that I think originates with size problems in y_est and k. Any help would be fantastic - the code below should simulate some data and is followed by the model. The model does a nice job fitting each individual threshold and getting the slopes, but falls apart when dealing with the rest.
Thanks for having a look - I'm super impressed with pymc3 so far!
EDIT: Okay, so the shape output by y_est.tag.test_value.shape looks like this
y_est.tag.test_value.shape
(101, 7)
k.tag.test_value.shape
(7,)
I think this is where I'm running into trouble, though it may just be poorly constructed on my part. k has the right shape (one k value per unique_xval). y_est is outputting an entire set of data (101x7) instead of a single estimate (one y_est per unique_xval) for each difficulty level. Is there some way to specify that y_est gets specific subsets of df_y_vals to control this?
#Import necessary modules and define our weibull function
import numpy as np
import pylab as pl
from scipy.stats import bernoulli
# x: stimulus intensity
# g: chance (0.5 for 2AFC)
# m: slope
# t: threshold
# a: performance level defining threshold
def weib(x, g, a, m, t):
    k = -np.log(((1 - a) / (1 - g))**(1 / t))
    return 1 - (1 - g) * np.exp(-(k * x / t)**m)
#Output values from weibull function
xit=101
xvals=np.linspace(0.05,1,xit)
out_weib=weib(xvals, 0.5, 0.8, 3, 0.6)
#Okay, fitting the perfect output of a Weibull should be easy, contaminate with some noise
#Slope of 3, threshold of 0.6
#How about 5% noise!
noise=0.05*np.random.randn(np.size(out_weib))
out=out_weib+noise
#Let's make this more like a typical experiment -
#i.e. no percent correct, just one or zero
#Randomly pick based on the probability at each point whether they got the trial right or wrong
trial=np.zeros_like(out)
for i in np.arange(out.size):
    p = out_weib[i]
    trial[i] = bernoulli.rvs(p)
#Iterate for 6 sets of data, similar slope (from a normal dist), different thresh (output from gaussian)
#Gauss parameters=
true_gauss_height = 0.3
true_gauss_width = 0.01
true_gauss_elevation = 0.2
#What thresholds will we get then? 6 discrete points along that gaussian, from 0 to 180 degree mask
x_points=[0, 30, 60, 90, 120, 150, 180]
x_points=np.asarray(x_points)
gauss_points=true_gauss_height*np.exp(- ((x_points**2)/2*true_gauss_width**2))+true_gauss_elevation
import pymc as pm2
import pymc3 as pm
import pandas as pd
slopes=pm2.rnormal(3, 3, size=7)
out_weib=np.zeros([xvals.size,x_points.size])
for i in np.arange(x_points.size):
    out_weib[:, i] = weib(xvals, 0.5, 0.8, slopes[i], gauss_points[i])
#Let's make this more like a typical experiment - i.e. no percent correct, just one or zero
#Randomly pick based on the probability at each point whether they got the trial right or wrong
trials=np.zeros_like(out_weib)
for i in np.arange(len(trials)):
    for ii in np.arange(gauss_points.size):
        p = out_weib[i, ii]
        trials[i, ii] = bernoulli.rvs(p)
#Let's make that data into a DataFrame for pymc3
y_vals=np.tile(xvals, [7, 1])
df_correct = pd.DataFrame(trials, columns=x_points)
df_y_vals = pd.DataFrame(y_vals.T, columns=x_points)
unique_xvals=x_points
import theano as th
with pm.Model() as hierarchical_model:
    # Hyperpriors for group node
    mu_slope = pm.Normal('mu_slope', mu=3, sd=1)
    sigma_slope = pm.Uniform('sigma_slope', lower=0.1, upper=2)

    # Priors for the overall Gaussian function - 3 params: the height of the Gaussian,
    # its width, and its elevation
    gauss_width = pm.HalfNormal('gauss_width', sd=1)
    gauss_elevation = pm.HalfNormal('gauss_elevation', sd=1)

    slope = pm.Normal('slope', mu=mu_slope, sd=sigma_slope, shape=unique_xvals.size)
    thresh = pm.Uniform('thresh', upper=1, lower=0.1, shape=unique_xvals.size)

    k = -th.tensor.log(((1 - 0.8) / (1 - 0.5))**(1 / thresh))
    y_est = 1 - (1 - 0.5) * th.tensor.exp(-(k * df_y_vals / thresh)**slope)

    # We want our model to predict either height or width... height would be easier.
    # Our Gaussian function has y values estimated by y_est as the 82% thresholds
    # and x values based on where each of those psychometric functions was taken.
    # height_est = pm.Deterministic('height_est', (y_est / (th.tensor.exp((-unique_xvals**2) / 2 * gauss_width))) + gauss_elevation)
    height_est = pm.Deterministic('height_est', (y_est - gauss_elevation) * th.tensor.exp((unique_xvals**2) / 2 * gauss_width**2))

    # Define likelihood as Bernoulli for each binary trial
    likelihood = pm.Bernoulli('likelihood', p=y_est, shape=unique_xvals.size, observed=df_correct)

    # Find start
    start = pm.find_MAP()
    step = pm.NUTS(state=start)

    # Do MCMC
    trace = pm.sample(5000, step, njobs=1, progressbar=True)  # draw 5000 posterior samples using NUTS sampling
I'm not sure exactly what you want to do when you say "Is there some way to specify that y_est gets specific subsets of df_y_vals to control this". Can you describe, for each y_est value, which values of df_y_vals it is supposed to use? What's the shape of df_y_vals? What's the shape of y_est supposed to be? (7,)?
I suspect what you want is to index into df_y_vals using numpy advanced indexing, which works the same in PyMC as in numpy. It's hard to say exactly without more information.
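As a purely hypothetical sketch of what that indexing could look like (the flattening, the idx array, and the reshaped observed data below are my assumptions about how you might map each trial to its difficulty level, not a drop-in fix for your model):

import numpy as np

# hypothetical: one row per trial, with idx saying which of the 7 difficulty
# levels (columns of df_y_vals) each trial came from
y_flat = df_y_vals.values.T.reshape(-1)       # shape (7*101,)
obs_flat = df_correct.values.T.reshape(-1)    # matching binary outcomes
idx = np.repeat(np.arange(7), 101)            # trial -> difficulty level

# inside the model, advanced indexing then picks one parameter per trial:
#   k_per_trial = k[idx]
#   thresh_per_trial = thresh[idx]
#   slope_per_trial = slope[idx]
#   y_est = 1 - 0.5 * th.tensor.exp(-(k_per_trial * y_flat / thresh_per_trial) ** slope_per_trial)
#   likelihood = pm.Bernoulli('likelihood', p=y_est, observed=obs_flat)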
I'm trying to use polyfit to find the best fitting straight line to a set of data, but I also need to know the uncertainty on the parameters, so I want the covariance matrix too. The online documentation suggests I write:
polyfit(x, y, 2, cov=True)
but this gives the error:
TypeError: polyfit() got an unexpected keyword argument 'cov'
And sure enough help(polyfit) shows no keyword argument 'cov'.
So does the online documentation refer to a previous release of numpy? (I have 1.6.1, the newest one). I could write my own polyfit function, but has anyone got any suggestions for why I don't have a covariance option on my polyfit?
Thanks
For a solution that comes from a library, I find that using scikits.statsmodels is a convenient choice. In statsmodels, regression objects have callable attributes that return the parameters and standard errors. I put an example of how this would work for you below:
# Imports, I assume NumPy for forming your data.
import numpy as np
import scikits.statsmodels.api as sm

# Form the data here
(X, Y) = ....
deg = 2  # desired polynomial degree (as in the usage example further below)

reg_x_data = np.ones(X.shape)  # 0th degree term.
for ii in range(1, deg + 1):
    reg_x_data = np.hstack((reg_x_data, X**(ii)))  # Append the ii^th degree term.

# Store OLS regression results into `result`
result = sm.OLS(Y, reg_x_data).fit()

# Print the estimated coefficients
print(result.params)

# Print the basic OLS standard error in the coefficients
print(result.bse)

# Print the estimated basic OLS covariance matrix
print(result.cov_params())  # <-- Notice, this one is a function by convention.

# Print the heteroskedasticity-consistent standard error
print(result.HC0_se)

# Print the heteroskedasticity-consistent covariance matrix
print(result.cov_HC0)
There are additional robust covariance attributes in the result object as well. You can see them by printing out dir(result). Also, by convention, the covariance matrices for the heteroskedasticity-consistent estimators are only available after you call the corresponding standard error; for example, you must call result.HC0_se before result.cov_HC0, because the first reference causes the second one to be computed and stored.
Pandas is another library that probably provides more advanced support for these operations.
Non-library function
This might be useful when you don't want to / can't rely on an extra library function.
Below is a function that I wrote to return the OLS regression coefficients, as well as a bunch of other stuff. It returns the residuals, the estimated regression variance and standard error (the square root of the residual variance), the asymptotic formula for the large-sample variance, the OLS covariance matrix, the heteroskedasticity-consistent "robust" covariance estimate (which is the OLS covariance but weighted according to the residuals), and the "White" or "bias-corrected" heteroskedasticity-consistent covariance.
import numpy as np

###
# Regression and standard error estimation functions
###
def ols_linreg(X, Y):
    """ ols_linreg(X,Y)

        Ordinary least squares regression estimator given explanatory variables
        matrix X and observations matrix Y. The length of the first dimension of
        X and Y must be the same (equal to the number of samples in the data set).

        Note: these methods should be adapted if you need to use this for large data.
        This is mostly for illustrating what to do for calculating the different
        classical standard errors. You would never really want to compute the inverse
        matrices for large problems.

        This was developed with NumPy 1.5.1.
    """
    (N, K) = X.shape
    t1 = np.linalg.inv((np.transpose(X)).dot(X))
    t2 = (np.transpose(X)).dot(Y)

    beta = t1.dot(t2)
    residuals = Y - X.dot(beta)
    sig_hat = (1.0 / (N - K)) * np.sum(residuals**2)
    sig_hat_asymptotic_variance = 2 * sig_hat**2 / N
    conv_st_err = np.sqrt(sig_hat)

    sum1 = 0.0
    for ii in range(N):
        sum1 = sum1 + np.outer(X[ii, :], X[ii, :])
    sum1 = (1.0 / N) * sum1
    ols_cov = (sig_hat / N) * np.linalg.inv(sum1)

    PX = X.dot(np.linalg.inv(np.transpose(X).dot(X)).dot(np.transpose(X)))
    robust_se_mat1 = np.linalg.inv(np.transpose(X).dot(X))
    robust_se_mat2 = np.transpose(X).dot(np.diag(residuals[:, 0]**(2.0)).dot(X))
    robust_se_mat3 = np.transpose(X).dot(np.diag(residuals[:, 0]**(2.0) / (1.0 - np.diag(PX))).dot(X))

    v_robust = robust_se_mat1.dot(robust_se_mat2.dot(robust_se_mat1))
    v_modified_robust = robust_se_mat1.dot(robust_se_mat3.dot(robust_se_mat1))

    """ Returns:
        beta -- The vector of coefficient estimates, ordered on the columns of X.
        residuals -- The vector of residuals, Y - X.beta
        sig_hat -- The sample variance of the residuals.
        conv_st_error -- The 'standard error of the regression', sqrt(sig_hat).
        sig_hat_asymptotic_variance -- The analytic formula for the large-sample variance.
        ols_cov -- The covariance matrix under the basic OLS assumptions.
        v_robust -- The "robust" covariance matrix, weighted to account for the residuals and heteroskedasticity.
        v_modified_robust -- The bias-corrected and heteroskedasticity-consistent covariance matrix.
    """
    return beta, residuals, sig_hat, conv_st_err, sig_hat_asymptotic_variance, ols_cov, v_robust, v_modified_robust
For your problem, you would use it like this:
import numpy as np

# Define or load your data:
(Y, X) = ....

# Desired polynomial degree
deg = 2

reg_x_data = np.ones(X.shape)  # 0th degree term.
for ii in range(1, deg + 1):
    reg_x_data = np.hstack((reg_x_data, X**(ii)))  # Append the ii^th degree term.

# Get all of the regression data.
beta, residuals, sig_hat, conv_st_error, sig_hat_asymptotic_variance, ols_cov, v_robust, v_modified_robust = ols_linreg(reg_x_data, Y)

# Print the covariance matrix:
print(ols_cov)
If you spot any bugs in my computations (especially the heteroskedasticity-consistent estimators) please let me know and I'll fix it asap.