I have been trying to do a pixel-by-pixel fit of a set of images, i.e. I have data at different wavelengths in different images and I am trying to fit a function for each pixel individually. I have done the fitting using lmfit and obtained the values of the unknown parameters for each pixel. Now I want to obtain the chi-squared value for each fit. I know that lmfit has an attribute called chisqr which should give me this, but what is confusing me is this line from the lmfit GitHub site:
"Note that the calculation of chi-square and reduced chi-square assume that the returned residual function is scaled properly to the uncertainties in the data. For these statistics to be meaningful, the person writing the function to be minimized must scale them properly."
I suspect that the values I am getting from the chisqr attribute are not exactly right and that some scaling needs to be done. Can somebody please explain how lmfit calculates the chi-square value and what scaling I am required to do?
This is a sample of my fitting function:

def fcn2fit(params, freq, F, sigma):
    # unpack the free parameters
    colden = params['colden'].value
    tk = params['tk'].value
    # evaluate the model and return the residuals scaled by the uncertainties
    model = greybodyfit(np.array(freq), colden, tk)
    return (model - F) / sigma
colden and tk are the free parameters, freq is the independent variable, F is the dependent variable, and sigma is the error in F. Is returning (model - F)/sigma the right way of scaling the residuals so that the chisqr attribute gives the correct chi-square value?
The value reported for chi-square is the sum of the squares of the residual array returned for the fit. lmfit cannot tell whether that residual function is properly scaled by the standard error of the data; this scaling must be done in the objective function if you are using lmfit.minimize, or passed in as weights if you are using lmfit.Model.
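So yes, returning (model - F)/sigma is the right scaling: chisqr is then sum(((model - F)/sigma)**2), the usual chi-square. A minimal sketch of both interfaces, assuming the greybodyfit function, data arrays and parameter names from the question (starting values here are purely illustrative):

import numpy as np
import lmfit

# lmfit.minimize: the objective already returns the weighted residuals,
# so result.chisqr is the properly scaled chi-square.
params = lmfit.Parameters()
params.add('colden', value=1e21, min=0)   # hypothetical starting value
params.add('tk', value=20.0, min=0)       # hypothetical starting value
result = lmfit.minimize(fcn2fit, params, args=(freq, F, sigma))
print(result.chisqr, result.redchi)

# lmfit.Model: the same scaling is passed as weights = 1/sigma
# (this assumes greybodyfit's first argument is named freq).
gmodel = lmfit.Model(greybodyfit, independent_vars=['freq'])
model_result = gmodel.fit(F, freq=freq, colden=1e21, tk=20.0, weights=1.0/sigma)
print(model_result.chisqr)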
I'm currently trying to train a GP regression model in GPflow which will predict precipitation values given some meteorological inputs. I'm using a Linear+RBF+WhiteNoise kernel, which seems appropriate given the set of predictors I'm using.
My problem at the moment is that when I get the model to predict new values, it has a tendency to predict negative precipitation - see attached figure.
How can I "enforce" physical constraints when building the model? The training data doesn't contain any negative precipitation values, but it does contain a lot of values close to zero, which I assume means the GPR model isn't learning the "precipitation must be >=0" constraint very well.
If there's a way of explicitly enforcing a constraint like this it'd be perfect, but I'm not sure how that would work. Would this require a different optimization algorithm? Or is it possible to somehow build this constraint into the kernel structure?
This is more of a question for CrossValidated... A Gaussian process is essentially a distribution over functions with Gaussian marginals: the predictive distribution of f(x) at any point is by construction Gaussian, and a Gaussian has support on the whole real line, so it is not constrained. E.g. if you have lots of observations close to zero, your model expects that values just below zero must also be quite likely.
If your observations are strictly positive, you could use a different likelihood, e.g. Exponential (gpflow.likelihoods.Exponential) or Beta (gpflow.likelihoods.Beta). Note that model.predict_y() always returns mean and variance, and for non-Gaussian likelihoods the variance may not actually be what you want. In practice, you're more likely to care about quantiles (e.g. 10%-90% confidence interval); there is an open issue on the GPflow github that relates to this. Which likelihood you use is part of your modelling choice, and depends on your data.
The simplest practical answer to your problem is to model the log-precipitation: if your original dataset is X and Y (with Y > 0 for all entries), compute logY = np.log(Y) and create your GP model e.g. using gpflow.models.GPR((X, logY), kernel). You then predict logY at test points and convert the predictions back from log-precipitation into precipitation space. (This is equivalent to a LogNormal likelihood, which isn't currently implemented in GPflow, though it would be straightforward to add.)
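A rough sketch of that log-transform approach, assuming the GPflow 2 API and that X, Y (strictly positive) and Xtest already exist:

import numpy as np
import gpflow

logY = np.log(Y)  # well defined because Y > 0

kernel = gpflow.kernels.Linear() + gpflow.kernels.SquaredExponential()
model = gpflow.models.GPR(data=(X, logY), kernel=kernel)
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Predict in log space; if log-precipitation at a test point is N(mu, var),
# precipitation itself is log-normal, so for example:
mu, var = model.predict_f(Xtest)
median_precip = np.exp(mu)             # predictive median, always positive
mean_precip = np.exp(mu + var / 2.0)   # log-normal mean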
I've been starting to use the minimizer in scipy.optimize and for most parameters I've tried to fit, the default method BFGS has worked just fine. The method helpfully reports the inverse of the Hessian matrix from which I can extract the errors on the fitted parameters from the diagonals of the matrix. However, for some new parameters that I'm trying to fit, the values are quite small and I run into precision errors using BFGS.
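(For reference, the BFGS error extraction described above looks roughly like this; negloglike and x0 are placeholders, and interpreting the inverse Hessian as a covariance matrix assumes the objective is a properly scaled negative log-likelihood.)

import numpy as np
from scipy.optimize import minimize

res = minimize(negloglike, x0, method='BFGS')
cov = res.hess_inv                   # approximate inverse Hessian at the minimum
param_errors = np.sqrt(np.diag(cov))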
Switching to Nelder-Mead does the job, but I don't know how to extract the uncertainties from the fitted parameters using this method.
How can I extract the uncertainties on the fitted parameters when using Nelder-Mead in scipy.optimize?
Comparing a parametric statistical modelling method (Gaussian mixture modelling, GMM) with a nonparametric statistical modelling method (kernel density estimation, KDE), I wanted to use the Akaike Information Criterion (AIC) to take into account the parametric components that are used in Gaussian mixture modelling.
Scikit-learn's GaussianMixture class has an .aic(X) method that returns the AIC directly. The KernelDensity class does not have this method. While programming the algorithm I noticed that the .score(X) method works differently for the two classes.
GaussianMixture.score(X) returns:
log_likelihood : float
Log likelihood of the Gaussian mixture given X.
KernelDensity.score(X) returns:
logprob : float
Total log-likelihood of the data in X.
While not directly a problem for my code, I was wondering if this was done on purpose, and if so, what that purpose would be. It did lead to misleading results initially, before I found the difference, which is why I am putting this up on Stack Overflow in case other people hit the same issue.
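A quick way to see the discrepancy, assuming a recent scikit-learn where GaussianMixture.score is the per-sample average log-likelihood while KernelDensity.score is the total log-likelihood:

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

X = np.random.randn(200, 1)

gmm = GaussianMixture(n_components=2).fit(X)
kde = KernelDensity(bandwidth=0.5).fit(X)

# GaussianMixture.score averages the per-sample log-likelihoods ...
print(gmm.score(X), gmm.score_samples(X).mean())
# ... while KernelDensity.score sums them.
print(kde.score(X), kde.score_samples(X).sum())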
I'm trying to do a Negative Binomial regression using Python's statsmodels package. The model estimates fine when using the GLM routine i.e.
model = smf.glm(formula="Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed", data=df, family=sm.families.NegativeBinomial()).fit()
model.summary()
However, the GLM routine doesn't estimate alpha, the dispersion term. I tried to use the Negative Binomial routine directly (which does estimate alpha) i.e.
nb = smf.negativebinomial(formula="Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed", data=df).fit()
nb.summary()
But this doesn't converge. Instead I get the message:
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: nan
Iterations: 0
Function evaluations: 1
Gradient evaluations: 1
My question is:
Do the two routines use different methods of estimation? Is there a way to make the smf.negativebinomial routine use the same estimation method as the GLM routine?
discrete.NegativeBinomial uses either a Newton method (the default in statsmodels) or the scipy optimizers. The main problem is that the exponential mean function can easily result in overflow, or in problems from large gradients and Hessians, while we are still far away from the optimum. There are some attempts in the fit method to get good starting values, but this does not always work.
A few possibilities that I usually try:
check that no regressor has large values, e.g. rescale to have max below 10
use method='nm' (Nelder-Mead) as the initial optimizer and switch to newton or bfgs after some iterations or after convergence (see the sketch after this list)
try to come up with better starting values (see for example about GLM below)
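A minimal sketch of that two-stage strategy, reusing the formula from the question (maxiter and the choice of bfgs are just illustrative):

import statsmodels.formula.api as smf

formula = "Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed"

# Stage 1: a robust but slow simplex run to get near the optimum.
nb_nm = smf.negativebinomial(formula=formula, data=df).fit(method='nm', maxiter=500, disp=0)

# Stage 2: restart a gradient-based optimizer from the Nelder-Mead solution.
nb = smf.negativebinomial(formula=formula, data=df).fit(start_params=nb_nm.params, method='bfgs', disp=0)
print(nb.summary())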
GLM uses iteratively reweighted least squares (IRLS) by default, which is only standard for one-parameter families, i.e. it takes the dispersion parameter as given. So the same method cannot be used directly for the full MLE in discrete NegativeBinomial.
GLM negative binomial still specifies the full log-likelihood, so it is possible to do a grid search over the dispersion parameter, using GLM.fit() to estimate the mean parameters for each value of the dispersion parameter. This should be equivalent to the corresponding discrete NegativeBinomial version (nb2, if I remember correctly). It could also be used to obtain start_params for the discrete version.
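A sketch of that grid-search idea (the alpha grid here is purely illustrative):

import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

formula = "Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed"
alphas = np.linspace(0.05, 3.0, 60)

# Profile the GLM log-likelihood over the NB dispersion parameter alpha.
fits = [smf.glm(formula=formula, data=df,
                family=sm.families.NegativeBinomial(alpha=a)).fit()
        for a in alphas]
llfs = [f.llf for f in fits]
best = int(np.argmax(llfs))
print(alphas[best], fits[best].params)

# fits[best].params together with alphas[best] can then be used to build
# start_params for the discrete NegativeBinomial model.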
In the statsmodels master version, there is now a connection that allows arbitrary scipy optimizers instead of the ones that were hardcoded. SciPy recently gained trust-region Newton methods, and will get more in the future, which should work in more cases than the simple Newton method in statsmodels.
(However, most likely that does not currently work for discrete NegativeBinomial; I just found out about a possible problem: https://github.com/statsmodels/statsmodels/issues/3747 )
I'd like to know ways to determine how well a Gaussian function is fitting my data.
Here are a few plots I've been testing methods against. Currently, I'm just using the RMSE of the fit versus the sample (red is fit, blue is sample).
For instance, here are 2 good fits:
And here are 2 terrible fits that should be flagged as bad data:
In general, I'm looking for suggestions of additional metrics to measure the goodness of fit. Additionally, as you can see in the second 'good' fit, there can sometimes be other peaks outside the data. Currently, these are penalized by the RMSE method, though they should not be.
I'm looking for suggestions of additional metrics to measure the goodness of fit.
The one-sample Kolmogorov-Smirnov (KS) test would be a good starting point.
I'd suggest the Wikipedia article as an introduction.
The test is available in SciPy as scipy.stats.kstest. The function computes and returns both the KS test statistic and the p-value.
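For example, to compare the sample against the fitted Gaussian (sample, mu and sigma here are placeholders for your data and fitted parameters):

from scipy import stats

# KS statistic and p-value for the sample against a normal(mu, sigma) distribution
statistic, pvalue = stats.kstest(sample, 'norm', args=(mu, sigma))

Note that the p-value is only strictly valid when mu and sigma were not estimated from the same sample, but the statistic is still a useful relative measure for flagging bad fits.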
You can try quantile-quantile (QQ) plots using probplot from scipy.stats:

import pylab
from scipy.stats import probplot

# QQ plot of the sample data against a normal distribution
plot = probplot(data, dist='norm', plot=pylab)
pylab.show()
From the scipy.stats.probplot docstring:

Calculate quantiles for a probability plot, and optionally show the plot.

Generates a probability plot of sample data against the quantiles of a specified theoretical distribution (the normal distribution by default). probplot optionally calculates a best-fit line for the data and plots the results using Matplotlib or a given plot function.
There are other ways of evaluating a good fit, but most of them are not robust to outliers.
There is MSE (mean squared error), which you already know, and RMSE, which is its square root.
You can also measure the fit using MAE (mean absolute error) and MAPE (mean absolute percentage error).
There is also the Kolmogorov-Smirnov test, which is more involved and for which you would probably want a library, whereas MAE, MAPE and MSE you can implement yourself quite easily, as sketched below.
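A minimal sketch of those error metrics, where y is the sample and y_fit is the fitted Gaussian evaluated at the same points:

import numpy as np

def fit_metrics(y, y_fit):
    err = y - y_fit
    mse = np.mean(err ** 2)                  # mean squared error
    rmse = np.sqrt(mse)                      # root mean squared error
    mae = np.mean(np.abs(err))               # mean absolute error
    mape = np.mean(np.abs(err / y)) * 100.0  # mean absolute percentage error (requires y != 0)
    return mse, rmse, mae, mape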
(If you were dealing with unsupervised data and/or classification, which is apparently not your case, ROC curves and the confusion matrix would also be relevant accuracy metrics.)