Can scipy.optimize.least_squares be converted to handle IRLS? - python

I typically fit GLMs using statsmodels. The output always mentions that the method fits using IRLS.
I then fit a regression problem using scipy.optimize.least_squares but notice that the coefficients are somewhat different. Is this because scipy.optimize.least_squares employs ordinary least squares? If so, is there any workaround to implement IRLS with the scipy.optimize.least_squares function?
Note: I cannot use ordinary statsmodels GLM as my model requires random parameters.
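For reference, IRLS repeatedly solves a weighted least-squares problem, recomputing the weights from the current fit at every iteration, whereas a single call to scipy.optimize.least_squares minimizes one fixed sum of squared residuals. A minimal numpy sketch of the idea, assuming a Poisson GLM with log link (the family and variable names are illustrative only):

import numpy as np

def irls_poisson(X, y, n_iter=25, tol=1e-8):
    # at each step, solve a weighted least-squares problem with
    # working weights w and working response z
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)                 # inverse of the log link
        w = mu                           # IRLS weights for Poisson / log link
        z = eta + (y - mu) / mu          # working response
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

The same reweighting loop can be wrapped around scipy.optimize.least_squares by multiplying the residuals by np.sqrt(w) and refitting until the weights stabilise.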

Related

Statsmodels - Negative Binomial doesn't converge while GLM does converge

I'm trying to do a Negative Binomial regression using Python's statsmodels package. The model estimates fine when using the GLM routine i.e.
model = smf.glm(formula="Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed", data=df, family=sm.families.NegativeBinomial()).fit()
model.summary()
However, the GLM routine doesn't estimate alpha, the dispersion term. I tried to use the Negative Binomial routine directly (which does estimate alpha) i.e.
nb = smf.negativebinomial(formula="Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed", data=df).fit()
nb.summary()
But this doesn't converge. Instead I get the message:
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: nan
Iterations: 0
Function evaluations: 1
Gradient evaluations: 1
My question is:
Do the two routines use different methods of estimation? Is there a way to make the smf.NegativeBinomial routine use the same estimation methods as the GLM routine?
discrete.NegativeBinomial uses either a Newton method (the default in statsmodels) or the scipy optimizers. The main problem is that the exponential mean function can easily result in overflow problems, or in problems from large gradients and Hessians while we are still far away from the optimum. There are some attempts in the fit method to get good starting values, but this does not always work.
A few possibilities that I usually try:
check that no regressor has large values, e.g. rescale so that the maximum is below 10
use method='nm' (Nelder-Mead) as the initial optimizer and switch to newton or bfgs after some iterations or after convergence (see the sketch below)
try to come up with better starting values (see, for example, the point about GLM below)
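A rough sketch of the Nelder-Mead-then-refine idea, reusing the formula from the question (df is assumed to be the same data frame; the maxiter values are arbitrary):

import statsmodels.formula.api as smf

formula = "Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed"
mod = smf.negativebinomial(formula, data=df)

# first pass with the slow but robust Nelder-Mead optimizer
res_nm = mod.fit(method='nm', maxiter=5000, disp=False)

# then refine from that solution with a gradient-based optimizer
res = mod.fit(start_params=res_nm.params, method='bfgs', maxiter=1000)
print(res.summary())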
GLM by default uses iteratively reweighted least squares, IRLS, which is only standard for one-parameter families, i.e. it takes the dispersion parameter as given. So the same method cannot be used directly for the full MLE in discrete NegativeBinomial.
GLM negative binomial still specifies the full loglikelihood. So it is possible to do a grid search over the dispersion parameter, using GLM.fit() to estimate the mean parameters for each value of the dispersion parameter. This should be equivalent to the corresponding discrete NegativeBinomial version (nb2? I don't remember). It could also be used as start_params for the discrete version.
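A hedged sketch of that grid search (y and X stand for the response and design matrix; the grid bounds are arbitrary):

import numpy as np
import statsmodels.api as sm

best = None
for alpha in np.linspace(0.01, 2.0, 50):
    # fit the mean parameters with the dispersion alpha held fixed
    res = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=alpha)).fit()
    if best is None or res.llf > best[0]:
        best = (res.llf, alpha, res)

llf_max, alpha_hat, res_best = best
# possible start_params for the discrete version: mean parameters followed by alpha
start_params = np.append(res_best.params, alpha_hat)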
In the statsmodels master version, there is now a connection that allows arbitrary scipy optimizers instead of the ones that were hardcoded. scipy recently gained trust-region Newton methods, and will get more in the future, which should work in more cases than the simple Newton method in statsmodels.
(However, that most likely does not currently work for discrete NegativeBinomial; I just found out about a possible problem: https://github.com/statsmodels/statsmodels/issues/3747)

Does sklearn have group lasso?

I'm interested in using group lasso for a problem I have. Here is a link to the algorithm. I know R has a slick implementation, but am curious to see if python has something similar.
I think sklearn.linear_model.MultiTaskLasso might be kind of similar, but am not sure. Can anyone shed some light on this?
Whether or not to implement the Group Lasso in sklearn is discussed in this issue in the sklearn repo, where the conclusion so far is that it is too much of a niche model to justify the maintenance it would need if included in master.
Therefore, I have implemented a GroupLasso class which passes sklearn's check_estimator() in my Python/Cython package celer, which acts as a drop-in replacement for sklearn's Lasso, MultiTaskLasso and sparse logistic regression, with faster solvers.
The solver uses coordinate descent, working set methods and extrapolation, which should allow it to scale to problems with millions of features.
It supports sparse and dense data, along with centering and normalization (centering sparse data is not trivial as it breaks the sparsity of the design matrix), and comes with a GroupLassoCV class to perform cross-validation.
In celer's documentation, there is an example showing how to use it.
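A short usage sketch, assuming the constructor takes the group structure via a groups argument (an int for contiguous groups of equal size, or a list of index lists); check celer's documentation for the exact signature in your version:

import numpy as np
from celer import GroupLasso   # pip install celer

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 60))             # 60 features in 12 contiguous groups of 5
y = X[:, :5] @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)

clf = GroupLasso(groups=5, alpha=0.1)          # groups of 5 consecutive features
clf.fit(X, y)
print(np.abs(clf.coef_).reshape(12, 5).sum(axis=1))   # per-group coefficient mass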
I've also looked into this; as far as I know, scikit-learn does not provide this implementation.
The MultiTaskLasso does something else. From the documentation:
"The MultiTaskLasso is a linear model that estimates sparse coefficients for multiple regression problems jointly: y is a 2D array, of shape (n_samples, n_tasks). The constraint is that the selected features are the same for all the regression problems, also called tasks."
In other words, the MultiTaskLasso is an implementation of the Lasso that can predict multiple targets at the same time (hence y is a 2D array). This problem is also known as 'multi-output regression' or 'multi-target regression'. If the tasks are related, such methods can outperform approaches that model every task or target separately.
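To make the distinction concrete, here is a toy use of MultiTaskLasso with a 2D target (standard scikit-learn API; the data are made up):

import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
W = np.zeros((3, 10))
W[:, :2] = rng.standard_normal((3, 2))               # only the first two features are active
Y = X @ W.T + 0.01 * rng.standard_normal((50, 3))    # y is 2D: (n_samples, n_tasks)

mtl = MultiTaskLasso(alpha=0.1).fit(X, Y)
print(mtl.coef_.shape)                               # (n_tasks, n_features); same support across tasks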

Lasso Generalized linear model in Python

I would like to fit a generalized linear model with negative binomial link function and L1 regularization (lasso) in python.
Matlab provides the nice function:
lassoglm(X,y, distr)
where distr can be poisson, binomial etc.
I had a look at both statsmodels and scikit-learn but I did not find any ready-to-use function or example that could direct me towards a solution.
In Matlab it seems they minimize this:
min (1/N * Deviance(β0,β) + λ * sum(abs(β)) )
where deviance depends on the link function.
Is there a way to implement this easily with scikit or statsmodels or I should go for cvxopt?
statsmodels has had, for some time, a fit_regularized method for the discrete models, including NegativeBinomial.
http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.NegativeBinomial.fit_regularized.html
which doesn't show the docstring (I just noticed). The docstring for Poisson has the same information: http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.Poisson.fit_regularized.html
and there should be some examples available in the documentation or unit tests.
It uses an interior-point algorithm with either scipy's slsqp or, optionally, cvxopt if it is installed. Compared to steepest descent or coordinate descent methods, this is only appropriate for cases where the number of features/explanatory variables is not too large.
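A hedged sketch of the call, assuming y (counts) and X (design matrix including a constant) are already defined; alpha is the L1 penalty weight:

import statsmodels.api as sm

mod = sm.NegativeBinomial(y, X)
# method='l1' uses scipy's slsqp; method='l1_cvxopt_cp' uses cvxopt if it is installed
res = mod.fit_regularized(method='l1', alpha=0.1, maxiter=1000)
print(res.params)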
Coordinate descent with elastic net for GLM is in a work-in-progress pull request and will most likely be available in statsmodels 0.8.

Regression using PYMC3

I posted an IPython notebook here: http://nbviewer.ipython.org/gist/dartdog/9008026
I worked through both standard statsmodels OLS and then something similar with PyMC3, with the data provided via pandas; that part works great, by the way.
I can't see how to get the more standard parameters out of PyMC3. The examples seem to just use OLS to plot the base regression line. It seems that the PyMC3 model should be able to give the parameters for the regression line, in addition to the traces, i.e. what is the highest-probability line?
Any further explanation of how to interpret alpha, beta and sigma is welcome!
Also, how do I use the PyMC3 model to estimate a future value of y given a new x, i.e. prediction with some probability?
And lastly, PyMC3 has a newish GLM wrapper which I tried, and it seemed to get messed up (it could well be me, though).
The glm submodule sets some default priors which might very well not be appropriate for every case, and yours is one of them. You can change them by using the family argument, e.g.:
pm.glm.glm('y ~ x', data,
           family=pm.glm.families.Normal(priors={'sd': ('sigma', pm.Uniform.dist(0, 12000))}))
Unfortunately this isn't very well documented yet and requires some good examples.
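A fuller sketch using the same (old) pm.glm.glm interface as the snippet above; later PyMC3 releases renamed it to pm.glm.GLM.from_formula, so treat this as version-dependent. Here data is assumed to be a DataFrame with columns y and x:

import pymc3 as pm

with pm.Model() as model:
    pm.glm.glm('y ~ x', data,
               family=pm.glm.families.Normal(
                   priors={'sd': ('sigma', pm.Uniform.dist(0, 12000))}))
    trace = pm.sample(2000)

pm.summary(trace)   # posterior estimates for Intercept, x and sigma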

How can I set, rather than fit, the coefficients of a spline interpolation using scipy?

I am trying to train a predictive model and want to use a spline-like interpolation to represent some function that forms part of the model. However, this is not a simple case of fitting some x, y data over a region to find the spline coefficients; rather, the function being approximated by the spline forms part of a non-linear generative model. To find the coefficients I need to use non-linear minimisation algorithms to optimise against a training data set. This means I need to be able to directly specify a set of coefficients rather than using the fitting methods in scipy.interpolate (such as scipy.interpolate.UnivariateSpline).
Is there a way to specify spline coefficients and then use the resulting object as a function within the model? If this isn't possible with scipy, is there another Python library that supports this functionality?
If I understand you correctly, you basically want to specify your own spline coefficients as independent parameters and then evaluate them using the built-in spline functionality.
See http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.splev.html
splev takes knots and coefficients typically generated by splrep or splprep, but you should be able to bypass those routines and modify the coefficients yourself.
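A minimal sketch of that approach: build the (t, c, k) tuple by hand and pass it to splev (the knot vector and coefficient values below are arbitrary; the only constraint is len(c) == len(t) - k - 1):

import numpy as np
from scipy.interpolate import splev

k = 3                                                   # cubic B-spline
t = np.array([0., 0., 0., 0., 0.5, 1., 1., 1., 1.])     # clamped knot vector on [0, 1]
c = np.array([0.0, 1.0, -1.0, 2.0, 0.5])                # your own coefficients, len(t) - k - 1 of them

x = np.linspace(0., 1., 11)
y = splev(x, (t, c, k))                                 # evaluate the spline at x

During optimisation, c can be treated as the free parameter vector while t and k stay fixed.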
