Is there a function for linear-link NB1 regression in Python?

I would like to build a linear-link negative binomial 1 (NB1) GLM for some data, where NB1 means the variance is of the form (1 + a)*mean, i.e. linear in the mean. This means I need to simultaneously control both the link function and the form of the variance.
Is there a library/function I can use to control both?
Python's statsmodels negative binomial regression implementation is an NB2 model, which assumes the variance is of the form mean + a*mean^2 (see https://www.statsmodels.org/stable/glm.html), and I don't see a way to change this.
I am also aware of NegativeBinomialP, which lets me specify the form of the variance (via the power parameter p), but not the link function, which defaults to log.

Related

Chi-square value from lmfit

I have been trying to do a pixel-by-pixel fit of a set of images, i.e. I have data at different wavelengths in different images and I am trying to fit a function for each pixel individually. I have done the fitting using lmfit and obtained the values of the unknown parameters for each pixel. Now I want to obtain the chi-square value for each fit. I know that lmfit has an attribute called chisqr which can give me this, but what is confusing me is this line from the lmfit GitHub site:
"Note that the calculation of chi-square and reduced chi-square assume that the returned residual function is scaled properly to the uncertainties in the data. For these statistics to be meaningful, the person writing the function to be minimized must scale them properly."
I suspect that the values I am getting from the chisqr attribute are not exactly right and that some scaling needs to be done. Can somebody please explain how lmfit calculates the chi-square value and what scaling I am required to do?
This is a sample of my fitting function:

def fcn2fit(params, freq, F, sigma):
    colden = params['colden'].value
    tk = params['tk'].value
    model = greybodyfit(np.array(freq), colden, tk)
    return (model - F) / sigma
colden and tk are the free parameters, freq is the independent variable, F is the dependent variable, and sigma is the error in F. Is returning (model - F)/sigma the right way of scaling the residuals so that the chisqr attribute gives the correct chi-square value?
The value reported for chi-square is the sum of the squares of the residuals for the fit. Lmfit cannot tell whether that residual function is properly scaled by the standard error of the data - this scaling must be done in the objective function if you are using lmfit.minimize, or passed in as weights if using lmfit.Model.
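In other words, with residuals returned as (model - F)/sigma, chisqr is the standard chi-square statistic. A quick numpy check of what that sum works out to (no lmfit required; the toy numbers below are illustrative):

```python
import numpy as np

def chisqr(model, data, sigma):
    # sum of squared residuals; dividing each residual by its
    # uncertainty makes this the standard chi-square statistic
    resid = (model - data) / sigma
    return np.sum(resid**2)

data = np.array([1.0, 2.0, 3.0])
model = np.array([1.1, 1.9, 3.2])
sigma = np.array([0.1, 0.1, 0.2])
print(chisqr(model, data, sigma))  # approx. 3.0
```

Each scaled residual here is exactly one sigma, so the three points contribute 1.0 each.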

Python statsmodels get_prediction function formula

What formula does this function use after computing a simple linear regression (OLS) on data? There's many different prediction interval formulas, some using RMSE (root mean square error), some using standard deviation, etc.
http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLSResults.get_prediction.html#statsmodels.regression.linear_model.OLSResults.get_prediction
In particular, I want to know if it's using this formula or something else:

yhat_h ± t(alpha/2, n-2) * sqrt( MSE * (1 + 1/n + (x_h - xbar)^2 / ((n - 1) * s_x^2)) )

Note the standard deviation of x (s_x) in the formula.
It does use the same formula as shown above. The standard error of the prediction is calculated using the formula sqrt(variance of predicted mean + variance of residuals). This can be simplified as shown in this link.

Modified negative binomial GLM in Python

Packages pymc3 and statsmodels can handle negative binomial GLMs in Python as shown here:
E(Y) = exp(beta_0 + sum_i X_i * beta_i)
where the X_i are my predictor variables and Y is my dependent variable. Is there a way to force one of my variables (for example X_1) to have beta_1 = 1, so that the algorithm optimizes the other coefficients? I am open to using both pymc3 and statsmodels. Thanks.
GLM and the count models in statsmodels.discrete include an optional keyword offset which is exactly for this use case. It is added to the linear prediction part, and so corresponds to an additional variable with a fixed coefficient equal to 1.
http://www.statsmodels.org/devel/generated/statsmodels.genmod.generalized_linear_model.GLM.html
http://www.statsmodels.org/devel/generated/statsmodels.discrete.discrete_model.NegativeBinomial.html
Aside: GLM with family NegativeBinomial takes the negative binomial dispersion parameter as fixed, while the discrete model NegativeBinomial estimates the dispersion parameter by MLE jointly with the mean parameters.
Another aside: GLM has a fit_constrained method for linear or affine restrictions on the parameters. This works by transforming the design matrix and using offset for the constant part. In the simple case of a fixed parameter as in the question, this reduces to using offset in the same way as described above (although fit_constrained has to go through the more costly general case).

Scikit-learn: why does KernelDensity.score() provide a different result than GaussianMixture.score()?

Comparing a parametric statistical modelling method (Gaussian mixture modelling, GMM) with a nonparametric one (kernel density estimation, KDE), I wanted to use the Akaike Information Criterion (AIC) to take into account the parametric components used in Gaussian mixture modelling.
Scikit-learn's GaussianMixture class contains an .aic(X) method that returns the AIC directly; the KernelDensity class does not. While programming the algorithm I noticed that the .score(X) method works differently for the two classes.
GaussianMixture.score(X) returns:
log_likelihood : float
Log likelihood of the Gaussian mixture given X.
KernelDensity.score(X) returns:
logprob : float
Total log-likelihood of the data in X.
While not directly a problem for my code I was wondering if this was done on purpose, and if so, what that purpose would be. It did lead to initial misleading results before I found the difference, which is why I am putting this up on stackoverflow, in case other people found the same issue.
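The difference is easy to confirm: GaussianMixture.score returns the average per-sample log-likelihood, while KernelDensity.score returns the total. A small check (assuming current scikit-learn behavior):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))

gmm = GaussianMixture(n_components=1, random_state=0).fit(X)
kde = KernelDensity(bandwidth=0.5).fit(X)

# GaussianMixture.score averages the per-sample log-likelihoods,
# KernelDensity.score sums them
print(np.isclose(gmm.score(X), gmm.score_samples(X).mean()))  # True
print(np.isclose(kde.score(X), kde.score_samples(X).sum()))   # True
```

So to compare the two on the same footing, multiply the GMM score by the number of samples (or average the KDE's score_samples).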

Statsmodels - Negative Binomial doesn't converge while GLM does converge

I'm trying to do a negative binomial regression using Python's statsmodels package. The model estimates fine when using the GLM routine, i.e.

import statsmodels.api as sm
import statsmodels.formula.api as smf

model = smf.glm(formula="Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed",
                data=df, family=sm.families.NegativeBinomial()).fit()
model.summary()

However, the GLM routine doesn't estimate alpha, the dispersion term. I tried to use the negative binomial routine directly (which does estimate alpha), i.e.

nb = smf.negativebinomial(formula="Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed",
                          data=df).fit()
nb.summary()
But this doesn't converge. Instead I get the message:
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: nan
Iterations: 0
Function evaluations: 1
Gradient evaluations: 1
My question is:
Do the two routines use different estimation methods? Is there a way to make the smf.negativebinomial routine use the same estimation method as the GLM routine?
discrete.NegativeBinomial uses either a Newton method (the default in statsmodels) or the scipy optimizers. The main problem is that the exponential mean function can easily result in overflow problems, or in problems from large gradients and Hessians, when we are still far away from the optimum. There are some attempts in the fit method to find good starting values, but this does not always work.
A few possibilities that I usually try:
check that no regressor has large values, e.g. rescale to have a maximum below 10
use method='nm' (Nelder-Mead) as the initial optimizer and switch to newton or bfgs after some iterations or after convergence
try to come up with better starting values (see for example the point about GLM below)
GLM uses iteratively reweighted least squares (IRLS) by default, which is only standard for one-parameter families, i.e. it takes the dispersion parameter as given. So the same method cannot be used directly for the full MLE in discrete NegativeBinomial.
GLM negative binomial still specifies the full log-likelihood, so it is possible to do a grid search over the dispersion parameter, using GLM.fit() to estimate the mean parameters at each value of the dispersion parameter. This should be equivalent to the corresponding discrete NegativeBinomial version (nb2? I don't remember). It could also be used to produce start_params for the discrete version.
In the statsmodels master version, there is now a connection that allows arbitrary scipy optimizers instead of the hardcoded ones. scipy recently gained trust-region Newton methods, and will get more in the future, which should work for more cases than the simple Newton method in statsmodels.
(However, most likely this does not currently work for discrete NegativeBinomial; I just found out about a possible problem: https://github.com/statsmodels/statsmodels/issues/3747)
