Linear Regression with positive coefficients in Python - python

I'm trying to find a way to fit a linear regression model with positive coefficients.
The only way I found is sklearn's Lasso model, which has a positive=True argument, but the docs advise against using it with alpha=0 (i.e., with no regularization, which is plain linear regression).
Do you know of another model/method/way to do it?

IIUC, this is exactly the problem solved by scipy.optimize.nnls, which does non-negative least squares.
Solve argmin_x || Ax - b ||_2 for x>=0.
In your case, b is your y, A is your X, and x is the β (the coefficients); otherwise it's the same problem, no?
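For example, something along these lines (made-up data, just to show the mapping; note that nnls has no intercept term, so if you need one you'd have to append a column of ones to X, and the intercept would then also be constrained to be non-negative):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # plays the role of A
true_coef = np.array([1.5, 0.0, 2.0])
y = X @ true_coef + rng.normal(scale=0.1, size=100)    # plays the role of b

coef, residual_norm = nnls(X, y)                       # coef >= 0 elementwise
print(coef)
```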

Several functions can fit a linear regression model with non-negative coefficients.
scipy.optimize.nnls solves exactly the problem above.
scikit-learn's LinearRegression can be given positive=True to do the same; internally it also uses scipy.optimize.nnls, and its source code is a useful reference for how multiple target outputs are handled.
Additionally, if you want to solve linear least squares with general bounds on the variables, see scipy.optimize.lsq_linear.
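If you do need custom bounds, a minimal sketch with lsq_linear could look like this (placeholder data; bounds=(0, np.inf) reproduces the non-negative case):

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.5, 2.0]) + rng.normal(scale=0.1, size=100)

# bounds can be scalars (applied to every coefficient) or per-coefficient arrays
res = lsq_linear(X, y, bounds=(0, np.inf))
print(res.x)   # non-negative least-squares coefficients
```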

As of version 0.24, scikit-learn LinearRegression includes a similar argument positive, which does exactly that; from the docs:
positive : bool, default=False
When set to True, forces the coefficients to be positive. This option is only supported for dense arrays.
New in version 0.24.
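A short example of what that looks like (the data here is made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, 0.0, 2.0]) + rng.normal(scale=0.1, size=100)

reg = LinearRegression(positive=True).fit(X, y)
print(reg.coef_)       # every coefficient is >= 0
print(reg.intercept_)  # the intercept itself is not constrained
```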

Related

Is it possible to do a restricted VAR-X model using python?

I have seen some similar questions but they didn't work for my situation.
Here is the model I am trying to implement.
[image: VAR model]
I suppose I would need to be able to force the coefficient of stockxSign to 0 in the equation for Stock, and likewise for CDSxSign in the equation for CDS.
Does anyone have an idea how I could do this?
It is possible now with the package that I just wrote.
https://github.com/fstroes/ridge-varx
You can fit coefficient matrices for specific lags only by supplying a list of lags; providing lags=[1,12], for instance, would only use the variables at lags 1 and 12.
In addition you can use Ridge regularization if you are not sure what lags should be included. If Ridge is not used, the model is fitted using the usual multivariate least squares approach.
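If you would rather not depend on the package, here is a rough sketch of the same idea without Ridge: regress each variable on the lagged values at a chosen set of lags via ordinary least squares (the function name and data layout are my own, not taken from ridge-varx):

```python
import numpy as np

def fit_var_selected_lags(Y, lags):
    """Least-squares VAR using only the given lags.

    Y    : (T, k) array of observations
    lags : list of positive lag orders to include, e.g. [1, 12]
    Returns the (k,) intercept and a dict {lag: (k, k) coefficient matrix}.
    """
    T, k = Y.shape
    p = max(lags)
    # Regressors: a constant column plus Y shifted back by each selected lag.
    X = np.column_stack(
        [np.ones(T - p)] + [Y[p - lag:T - lag] for lag in lags]
    )
    target = Y[p:]
    # Solve all k equations jointly by least squares.
    B, *_ = np.linalg.lstsq(X, target, rcond=None)
    intercept = B[0]
    coef = {lag: B[1 + i * k:1 + (i + 1) * k].T for i, lag in enumerate(lags)}
    return intercept, coef
```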

linear and nonlinear regression combined in python

I am working on a data set with four predictors. There is a good linear relationship with one of the predictors, but for the other three I think a polynomial would fit better. Is there any method in Python where I can predict a single variable by combining linear regression on one predictor with polynomial or other non-linear regression on the other three predictors?
Please help.
You can fit one polynomial expansion over all of the features, which takes care of the linear one as well; the only difference is that the linear predictor is captured by its degree-1 coefficient.
You could also try np.polyfit on a single predictor to see a potential trend in your data.
Documentation: https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html
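A minimal sketch of that approach using PolynomialFeatures from scikit-learn (the degree and data are placeholder choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # four predictors
y = 2 * X[:, 0] + X[:, 1] ** 2 - X[:, 2] ** 3 + rng.normal(size=200)

# One polynomial expansion over all features; the linear predictor is
# simply captured by its degree-1 term.
model = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                      LinearRegression())
model.fit(X, y)
print(model.predict(X[:5]))
```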

Feeding a seed value to solver in Python Logistic Regression

I am using scikit-learn's linear_model.LogisticRegression to perform multinomial logistic regression. I would like to seed the solver, i.e. give it an initial guess for the coefficient values.
Does anyone know how to do that? I have looked online and sifted through the code too, but haven't found an answer.
Thanks!
You can use the warm_start option (with any solver except liblinear) and manually set coef_ and intercept_ prior to fitting.
warm_start : bool, default=False
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver.
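A rough sketch of that trick (the shapes and the initial guess below are just illustrative; with warm_start=True, fit reuses whatever coef_ and intercept_ are already set on the estimator as its starting point):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

clf = LogisticRegression(warm_start=True, solver='lbfgs', max_iter=200)

# Manually set the attributes that warm_start picks up as the initial guess.
clf.coef_ = 0.01 * np.ones((3, 4))   # shape (n_classes, n_features)
clf.intercept_ = np.zeros(3)         # shape (n_classes,)

clf.fit(X, y)
print(clf.coef_)
```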

SGDClassifier with constraints

I am trying to do logistic regression on a huge data set using scikit-learn's SGDClassifier (with partial_fit, to be precise). The coefficients I obtain have mixed signs, whereas I would like to force the classifier to use only positive values (I know this may not be the best approach methodologically, but it is what works for me for now).
My question is:
Is there any way to impose constraints on coefficients using SGDClassifier?
Thanks for your time
This is not possible with SGDClassifier in its current implementation.
If you wanted to implement this, you would have to add a penalty, call it e.g. 'positivity', that enforces the constraint by placing infinite cost on negative values.
It may be possible to implement this following e.g. this paper (Duchi 2009), though there are likely follow-ups in newer literature better suited to the job. What you need to do at every mini-batch is project onto the positive orthant: simply set to 0 any coefficients that are negative after a gradient step on the logistic loss.
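A rough sketch of that projection idea with partial_fit (batch size and data are placeholders; use loss='log' instead of 'log_loss' on older scikit-learn versions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
classes = np.unique(y)

clf = SGDClassifier(loss='log_loss', random_state=0)

batch_size = 500
for start in range(0, len(X), batch_size):
    sl = slice(start, start + batch_size)
    clf.partial_fit(X[sl], y[sl], classes=classes)
    # Projection step: clip the weights back onto the non-negative orthant.
    np.clip(clf.coef_, 0, None, out=clf.coef_)

print(clf.coef_.min())   # >= 0 after the final projection
```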

Using l1 penalty with LogisticRegressionCV() in scikit-learn

I am using python scikit-learn library for classification.
As a feature selection step, I want to use RandomizedLogisticRegression().
So for finding best value of C by cross-validation, I used LogisticRegressionCV(penalty='l1', solver='liblinear').
However, all coefficients were 0 in this case.
Using the l2 penalty works without problems, and a single run of LogisticRegression() with the l1 penalty seems to give proper coefficients.
I am using RandomizedLasso and LassoCV() as a workaround, but I am not sure whether it is proper to use LASSO with a binary class label.
So my questions are:
Is there some problem with using LogisticRegressionCV() in my case?
Is there another way to find the best value of C for logistic regression besides GridSearchCV()?
Is it possible to use LASSO for binary (not continuous) classification?
From what you describe, it sounds like the l1 regularisation in your case is too strong and needs to be weakened.
When the regularisation coefficient is very high, the regularisation term dominates the error term, so the model just becomes very sparse and doesn't predict anything.
I checked LogisticRegressionCV: when Cs is an integer, it searches that many values on a log scale between 1e-4 and 1e4. Since C is the inverse of the regularisation strength, lower regularisation means higher C values; alternatively, you can provide the list of C values to try yourself.
So play with the Cs parameter and try to lower the regularisation strength.
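For example, you can pass the grid of C values explicitly and bias it toward weaker regularisation (made-up data, arbitrary grid):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)

clf = LogisticRegressionCV(
    Cs=np.logspace(-1, 4, 20),     # larger C = weaker l1 penalty
    penalty='l1',
    solver='liblinear',
    cv=5,
)
clf.fit(X, y)
print(clf.C_)                       # best C found by cross-validation
print(np.count_nonzero(clf.coef_))  # number of non-zero coefficients
```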
