How to specify constraints in statsmodels OLS? - python

I am using statsmodels OLS to run some linear regressions on my data. My problem is that I would like the coefficients to add up to 1 (I plan not to use a constant parameter).
Is it possible to specify at least one constraint on the coefficients in statsmodels OLS? I see no option to do so.
Thanks in advance.
Coder.
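For the sum-to-one case specifically, one standard workaround (a sketch added here for illustration, not from the original thread) is to substitute the constraint into the model and fit an ordinary, unconstrained OLS on transformed variables:

# A minimal sketch: enforce sum(beta) == 1 by reparametrizing. Synthetic
# data stands in for the asker's; no intercept, as stated in the question.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([0.2, 0.5, 0.3])        # sums to 1
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Substituting beta_3 = 1 - beta_1 - beta_2 into y = X @ beta gives
#   y - x3 = beta_1 * (x1 - x3) + beta_2 * (x2 - x3)
x_last = X[:, -1]
X_star = X[:, :-1] - x_last[:, None]
res = sm.OLS(y - x_last, X_star).fit()       # no constant added

beta_hat = np.append(res.params, 1 - res.params.sum())
print(beta_hat)                              # adds up to 1 by construction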

Related

Is it possible to do a restricted VAR-X model using python?

I have seen some similar questions but they didn't work for my situation.
Here is the model I am trying to implement (posted as an image captioned "VAR model").
I suppose I would need to be able to set the coefficient of stockxSign to 0 when calculating Stock, and the same for CDSxSign when calculating CDS.
Does someone have any idea how I could do this?
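Since a VAR(-X) without restrictions is just equation-by-equation OLS, one generic way to impose such zero restrictions (a sketch added for illustration, independent of the package-based answer below; the column names come from the question, the data are synthetic stand-ins) is to fit each equation separately and drop the restricted regressor from that equation's design matrix:

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=['Stock', 'CDS', 'stockxSign', 'CDSxSign'])

def fit_equation(df, target, drop):
    # One VAR equation with lag-1 regressors; excluding `drop` is
    # equivalent to restricting its coefficient to 0
    X = sm.add_constant(df.drop(columns=[drop]).shift(1).dropna())
    y = df[target].iloc[1:]
    return sm.OLS(y, X).fit()

res_stock = fit_equation(df, 'Stock', drop='stockxSign')
res_cds = fit_equation(df, 'CDS', drop='CDSxSign')
print(res_stock.params)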
It is possible now with the package that I just wrote.
https://github.com/fstroes/ridge-varx
You can fit coefficients for specific lags only, by supplying a list of lags to fit coefficient matrices for. Providing lags=[1,12], for instance, would only use the variables at lags 1 and 12.
In addition you can use Ridge regularization if you are not sure what lags should be included. If Ridge is not used, the model is fitted using the usual multivariate least squares approach.

python automatic statistical linear regression

Are there Python packages that help with statistical linear regression? For example, I would hope such a program could automatically perform different types of statistical tests (t-test, F-test, etc.), automatically remove redundant variables, correct for heteroskedasticity, and so on. Or is LASSO just the best option?
You can perform and visualize linear regression in Python with a wide array of packages, such as scipy, statsmodels, and seaborn. LASSO is available through statsmodels, as described here. When it comes to automated approaches to linear regression analysis, you could start with forward selection with statsmodels, which is described in an answer to the post Stepwise Regression in Python; a sketch follows below.
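In the spirit of that linked answer, a minimal forward-selection sketch (the adjusted-R² stopping rule is one common choice, not the only one; the column names are whatever your DataFrame contains):

import statsmodels.formula.api as smf

def forward_selected(data, response):
    # Greedily add predictors while adjusted R^2 keeps improving
    remaining = set(data.columns) - {response}
    selected, current_score = [], 0.0
    while remaining:
        scores = []
        for cand in remaining:
            formula = f"{response} ~ {' + '.join(selected + [cand])}"
            scores.append((smf.ols(formula, data).fit().rsquared_adj, cand))
        best_score, best_cand = max(scores)
        if best_score <= current_score:
            break
        remaining.remove(best_cand)
        selected.append(best_cand)
        current_score = best_score
    final = f"{response} ~ {' + '.join(selected) or '1'}"
    return smf.ols(final, data).fit()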

Linear Regression with positive coefficients in Python

I'm trying to find a way to fit a linear regression model with positive coefficients.
The only way I found is sklearn's Lasso model, which has a positive=True argument, but which is not recommended for use with alpha=0 (i.e., with no regularization at all).
Do you know of another model/method/way to do it?
IIUC, this is a problem that can be solved by scipy.optimize.nnls, which does non-negative least squares:
Solve argmin_x || Ax - b ||_2 for x >= 0.
In your case, b is your y, A is your X, and x is the β (the coefficients); otherwise it's the same problem, no?
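A minimal sketch with toy data (stand-ins for your X and y):

import numpy as np
from scipy.optimize import nnls

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

coef, residual_norm = nnls(X, y)   # argmin_x ||Xx - y||_2 subject to x >= 0
print(coef)                        # every entry is >= 0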
Several approaches can fit a linear regression model with positive coefficients.
scipy.optimize.nnls can solve the above problem directly.
scikit-learn's LinearRegression can be given the parameter positive=True to do this. Under the hood, sklearn also uses scipy.optimize.nnls; interestingly, its source code shows how to handle multiple target outputs.
Additionally, if you want to solve linear least squares with general bounds on the variables, see lsq_linear, sketched below.
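A minimal lsq_linear sketch (toy data again); bounds=(0, np.inf) reproduces the non-negative case, but any box bounds work:

import numpy as np
from scipy.optimize import lsq_linear

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

res = lsq_linear(A, b, bounds=(0, np.inf))   # box-constrained least squares
print(res.x)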
As of version 0.24, scikit-learn LinearRegression includes a similar argument positive, which does exactly that; from the docs:
positive : bool, default=False
When set to True, forces the coefficients to be positive. This option is only supported for dense arrays.
New in version 0.24.
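A minimal usage sketch of that argument (toy data as stand-ins for yours):

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

model = LinearRegression(positive=True).fit(X, y)
print(model.coef_)   # coefficients constrained to be non-negative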

Lasso Generalized linear model in Python

I would like to fit a generalized linear model with negative binomial link function and L1 regularization (lasso) in python.
Matlab provides the nice function:
lassoglm(X,y, distr)
where distr can be poisson, binomial etc.
I had a look at both statsmodels and scikit-learn but I did not find any ready-to-use function or example that could point me towards a solution.
In matlab it seems they minimize this:
min (1/N * Deviance(β0,β) + λ * sum(abs(β)) )
where deviance depends on the link function.
Is there a way to implement this easily with scikit or statsmodels or I should go for cvxopt?
statsmodels has had a fit_regularized method for the discrete models, including NegativeBinomial, for some time:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.NegativeBinomial.fit_regularized.html
which is currently missing its docstring (I just noticed). The docstring for Poisson has the same information: http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.Poisson.fit_regularized.html
and there should be some examples available in the documentation or unit tests.
It uses an interior-point algorithm with either scipy's slsqp or, optionally, cvxopt if it is installed. Compared to steepest descent or coordinate descent methods, this is only appropriate for cases where the number of features/explanatory variables is not too large.
Coordinate descent with elastic net for GLM is in a work-in-progress pull request and will most likely be available in statsmodels 0.8.
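A minimal sketch of the L1-penalized fit (synthetic count data; check the linked docs for the current signature, and note that alpha here is the penalty weight, a scalar or a per-parameter array):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = rng.poisson(lam=np.exp(X @ np.array([0.5, 1.0, 0.0, -0.5])))

model = sm.NegativeBinomial(y, X)
res = model.fit_regularized(method='l1', alpha=0.1)   # L1 penalty weight
print(res.params)   # some coefficients are shrunk exactly to zero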

Regression using PYMC3

I posted an IPython notebook here: http://nbviewer.ipython.org/gist/dartdog/9008026
I worked through both standard statsmodels OLS and then something similar with PyMC3, with the data provided via Pandas; that part works great, by the way.
What I can't see is how to get the more standard parameters out of PyMC3. The examples seem to just use OLS to plot the base regression line. It seems the PyMC3 model data should be able to give the parameters for the regression line, in addition to the probable traces, i.e. what is the highest-probability line?
Any further explanation of the interpretation of alpha, beta, and sigma is welcome!
Also, how do I use the PyMC3 model to estimate a future value of y given a new x, i.e. prediction with some probability?
And lastly, PyMC3 has a newish GLM wrapper which I tried, and it seemed to get messed up? (It could well be me, though.)
The glm submodule sets some default priors which might very well not be appropriate for every case, yours being one of them. You can change them via the family argument, e.g.:
pm.glm.glm('y ~ x', data,
           family=pm.glm.families.Normal(
               priors={'sd': ('sigma', pm.Uniform.dist(0, 12000))}))
Unfortunately this isn't very well documented yet and requires some good examples.
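As for pulling the standard parameters and a point prediction out of the trace, a minimal sketch (PyMC3-era API; the exact sampling calls vary by version, and the data here are hypothetical stand-ins for the notebook's DataFrame):

import numpy as np
import pandas as pd
import pymc3 as pm

data = pd.DataFrame({'x': np.linspace(0, 1, 50)})
data['y'] = 1.0 + 2.0 * data['x'] + np.random.normal(0, 0.1, 50)

with pm.Model():
    pm.glm.glm('y ~ x', data)
    start = pm.find_MAP()              # MAP estimate as a starting point
    step = pm.NUTS(scaling=start)
    trace = pm.sample(2000, step, start=start)

# Posterior means play the role of the OLS point estimates
intercept = trace['Intercept'].mean()
slope = trace['x'].mean()
print(intercept, slope)

# Rough point prediction for a new x (ignores parameter uncertainty; for a
# full predictive distribution, push each trace sample through the model)
x_new = 0.5
print(intercept + slope * x_new)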
