SGDClassifier with constraints - python

I am trying to do logistic regression on a huge data set using scikit-learn's SGDClassifier (with partial_fit, to be precise). The coefficients I obtain have mixed signs, whereas I would like to force the classifier to use only positive coefficients (I know this may not be the best approach methodologically, but it is what I need for now).
My question is:
Is there any way to impose constraints on coefficients using SGDClassifier?
Thanks for your time

This is not possible with SGDClassifier in its current implementation.
If you wanted to implement this, you would have to add a penalty, call it e.g. 'positivity', that enforces the constraint by placing infinite cost on negative values.
It may be possible to implement this using e.g. this paper, Duchi 2009 (though I think there are follow-ups in newer literature that are better suited to the job). What you need to do at every mini-batch is project onto the positive orthant, which is done by simply setting to 0 any negative values that occur after a gradient step on the logistic loss.
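For illustration, here is a minimal sketch of that projection idea using partial_fit; the clipping step is my addition, not a built-in SGDClassifier feature:

import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in for the huge data set.
rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = (X @ np.array([1.0, 2.0, 0.5, 1.5, 1.0]) > 0).astype(int)

clf = SGDClassifier(loss="log")  # spelled "log_loss" in newer scikit-learn versions

classes = np.array([0, 1])
for start in range(0, len(X), 100):  # mini-batches of 100 rows
    X_batch = X[start:start + 100]
    y_batch = y[start:start + 100]
    clf.partial_fit(X_batch, y_batch, classes=classes)
    # Projection onto the positive orthant: zero out negative coefficients.
    np.clip(clf.coef_, 0, None, out=clf.coef_)

print(clf.coef_)  # every entry is now >= 0

Mutating coef_ between calls is a hack, but partial_fit continues from the current weights, so the next gradient step starts from the projected point.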

Is it possible to do a restricted VAR-X model using python?

I have seen some similar questions but they didn't work for my situation.
Here is the model I am trying to implement.
VAR model
I suppose I would need to be able to set the coefficient of stockxSign to 0 when calculating Stock, and do the same for CDSxSign when calculating CDS.
Does someone have any idea how I could do this?
This is now possible with a package I just wrote:
https://github.com/fstroes/ridge-varx
You can fit coefficient matrices for specific lags only by supplying a list of the lags to use. Providing lags=[1,12], for instance, would only use the variables at lags 1 and 12.
In addition, you can use Ridge regularization if you are not sure which lags should be included. If Ridge is not used, the model is fitted using the usual multivariate least squares approach.
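If you would rather see the idea without the package, here is a minimal sketch built directly on scikit-learn's Ridge. It mimics the description above (coefficients only at chosen lags, optional shrinkage); it does not use ridge-varx's actual API, and the helper function is my own:

import numpy as np
from sklearn.linear_model import Ridge

def fit_var_selected_lags(Y, lags, alpha=0.0):
    """Fit a VAR on a (T, k) series using only the given lags.

    A coefficient matrix is fitted only for the lags listed in `lags`,
    with optional Ridge shrinkage (alpha=0 gives plain least squares).
    """
    T, k = Y.shape
    p = max(lags)
    # Design matrix: for each usable time step t, concatenate Y[t - lag].
    X = np.hstack([Y[p - lag:T - lag] for lag in lags])
    target = Y[p:]
    model = Ridge(alpha=alpha, fit_intercept=True)
    model.fit(X, target)
    return model  # model.coef_ has shape (k, k * len(lags))

# Example: two series, coefficients only at lags 1 and 12.
rng = np.random.RandomState(0)
Y = rng.randn(200, 2).cumsum(axis=0)
model = fit_var_selected_lags(Y, lags=[1, 12], alpha=1.0)
print(model.coef_.shape)  # (2, 4): two targets, two variables x two lags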

n_neighbors parameter of Local Outlier Factor affects ROC-AUC

I am trying to solve an outlier detection problem with several algorithms. When I use scikit-learn's Local Outlier Factor API, I have to set a very important parameter, n_neighbors. However, with different n_neighbors I get different ROC-AUC scores: for example, with n_neighbors=5 the ROC-AUC is 56, with n_neighbors=6 it is 85, with n_neighbors=7 it is 94, and so on. In short, the ROC-AUC is very high whenever n_neighbors >= 6.
I want to ask three questions:
(1) Why does the n_neighbors parameter of Local Outlier Factor affect ROC-AUC?
(2) How do I choose an appropriate n_neighbors in an unsupervised learning setting?
(3) Should I choose a high n_neighbors to get a high ROC-AUC?
If the results would not be affected, the parameter would not be needed, right?
Considering more neighbors is more costly. But it also means more data is used, so I'm not surprised that results improve. Did you read the paper that explains what the parameter does?
If you choose the parameter based on this evaluation, you are cheating: LOF is an unsupervised method, and you are not supposed to have such labels in a real use case.
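As a minimal sketch on synthetic data of how the score moves with n_neighbors (note that computing ROC-AUC at all requires ground-truth labels, which is exactly the cheating described above):

import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
inliers = rng.randn(200, 2)
outliers = rng.uniform(-6, 6, size=(10, 2))
X = np.vstack([inliers, outliers])
y = np.array([0] * 200 + [1] * 10)  # 1 = outlier (ground truth)

for n in [5, 6, 7, 20]:
    lof = LocalOutlierFactor(n_neighbors=n)
    lof.fit_predict(X)
    # Higher score = more abnormal, so negate the (negative) factor.
    scores = -lof.negative_outlier_factor_
    print(n, roc_auc_score(y, scores))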

What are the initial estimates taken in Logistic regression in Scikit-learn for the first iteration?

I am trying out logistic regression from scratch in Python (finding probability estimates and the cost function, and applying gradient descent to maximize the likelihood), but I am confused about which estimates I should take for the first iteration. I took all the estimates as 0 (including the intercept), but the results differ from what scikit-learn gives. I want to know which initial estimates scikit-learn takes for logistic regression.
First of all, scikit-learn's LogisticRegression uses regularization, so unless you apply that too it is unlikely you will get exactly the same estimates. If you really want to test your method against scikit-learn's, it is better to use their gradient descent implementation of logistic regression, which is SGDClassifier. Make sure you set loss='log' for logistic regression and alpha=0 to remove regularization, but again you will need to adjust the number of iterations and eta, since their implementation is likely to differ slightly from yours.
To answer the question about the initial estimates specifically: I don't think it matters much, but most commonly you set everything to 0 (including the intercept), and it should converge just fine.
Also bear in mind that GD (gradient descent) models can be hard to tune, and you may need to apply some scaling (like StandardScaler) to your data beforehand, as very large feature values can easily destabilize the gradient steps. Scikit-learn's implementation adjusts for that.
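A sketch of the setup described above (one caveat: with alpha=0 you must pick a learning-rate schedule other than the default 'optimal' one, which divides by alpha; and newer scikit-learn versions spell the loss 'log_loss'):

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# loss='log' gives logistic regression; alpha=0 removes regularization
# so the comparison against a from-scratch implementation is fair.
clf = make_pipeline(
    StandardScaler(),  # scaling keeps the gradient steps well behaved
    SGDClassifier(loss="log", alpha=0.0, max_iter=1000, tol=1e-4,
                  learning_rate="constant", eta0=0.01, random_state=0),
)
clf.fit(X, y)
print(clf.score(X, y))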

Polynomial Regression of a Noisy Dataset

I was wondering if I could get some help on a problem.
I am creating a tool for a former lab of mine that uses data from a physics-based machine (with a lot of noise) that produces simple x, y coordinates. I want to identify local maxima in the dataset; however, since there is so much noise in the set, you cannot just check the slope between points to determine a peak.
To solve this, I was thinking of using polynomial regression to somewhat "smooth out" the data set, then determine local maxima from the resulting model.
I've run through this link:
http://scikit-learn.org/stable/auto_examples/linear_model/plot_polynomial_interpolation.html, but it only tells you how to create a model that is a close fit. It doesn't tell you whether there is a built-in metric for measuring which model is best. Should I do this through chi-squared? Or is there some other metric that works better or is integrated into scikit-learn?
The link provided essentially shows you how to build a ridge regression on top of polynomial features. Consequently this is not just a "tight fit", since you can control it through regularization (the alpha parameter), which acts as a prior over the parameters. Now, what do you mean by "best model"? There are infinitely many possible criteria for being the best regression, each tested through a different measure. You need to answer that for yourself: what is the measure you are interested in? Should it be some kind of "golden ratio" between smoothness and closeness of fit? Or maybe you want a model of at most some smoothness that minimizes some error measure (mean squared distance to the points?)? Yet another option is to test how well it captures the underlying process, through some kind of typical validation (like cross-validation), where you repeatedly build the model on a subset of the data and check the error on the held-out part. There are many possible (and completely valid!) approaches; everything depends on the exact question you want to answer. "What is the best model?" is not a well-posed question, unfortunately.
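For instance, if held-out mean squared error is the measure you settle on, a minimal sketch of the cross-validation route could look like this (synthetic data standing in for the machine output):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy (x, y) coordinates, like the instrument would produce.
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 200)[:, None]
y = np.sin(x).ravel() + rng.normal(scale=0.5, size=200)

# Score each candidate degree by cross-validated MSE on held-out points.
for degree in [3, 5, 7, 9]:
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1.0))
    mse = -cross_val_score(model, x, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(degree, mse)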

Logistic Regression function on sklearn

I am learning Logistic Regression from sklearn and came across this : http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
I have created an implementation that shows me the accuracy scores for training and testing. However, it is very unclear how this was achieved. My questions are: What is the maximum likelihood estimate? How is it calculated? What is the error measure? What is the optimization algorithm used?
I know all of the above in theory, but I am not sure where, when, and how scikit-learn calculates them, or whether it's something I need to implement myself at some point. I have an accuracy rate of 83%, which was what I was aiming for, but I am very confused about how scikit-learn achieved it.
Would anyone be able to point me in the right direction?
I recently started studying LR myself; I still don't get many steps of the derivation, but I think I understand which formulas are being used.
First of all, let's assume that you are using the latest version of scikit-learn and that the solver being used is solver='lbfgs' (which is the default, I believe).
The code is here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/logistic.py
What is the Maximum likelihood estimate? How is this being calculated?
The function to compute the likelihood estimate is this one https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/logistic.py#L57
The interesting line is:
# Logistic loss is the negative of the log of the logistic function.
out = -np.sum(sample_weight * log_logistic(yz)) + .5 * alpha * np.dot(w, w)
which is formula 7 of this tutorial. The function also computes the gradient of the likelihood, which is then passed to the minimization function (see below). One important thing is that the intercept is w0 in the tutorial's formulas, but that is only valid if fit_intercept is True.
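As a self-contained sketch of that line (using scipy's expit where sklearn uses log_logistic, which is just a numerically safer way to compute the same value; labels must be in {-1, +1}):

import numpy as np
from scipy.special import expit  # the logistic sigmoid

def logistic_loss(w, X, y, alpha, sample_weight=None):
    """Regularized logistic loss, mirroring the quoted line; y in {-1, +1}."""
    if sample_weight is None:
        sample_weight = np.ones(X.shape[0])
    yz = y * (X @ w)
    # np.log(expit(yz)) plays the role of log_logistic(yz).
    return -np.sum(sample_weight * np.log(expit(yz))) + 0.5 * alpha * np.dot(w, w)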
What is the error measure?
I'm sorry I'm not sure.
What is the optimisation algorithm used?
See the following lines in the code: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/logistic.py#L389
It's this function http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html
One very important thing is that the classes are +1 or -1! (This is for the binary case; in the literature 0 and 1 are common, but these formulas won't work with them.)
Also notice that numpy broadcasting rules are used in all the formulas, which is why you don't see any explicit iteration over samples.
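Putting those pieces together, here is a minimal reconstruction (not sklearn's actual code) that minimizes the same loss with fmin_l_bfgs_b, with labels in {-1, +1} and broadcasting instead of explicit loops:

import numpy as np
from scipy.optimize import fmin_l_bfgs_b
from scipy.special import expit

rng = np.random.RandomState(0)
X = rng.randn(500, 3)
y = np.where(X @ np.array([1.0, -2.0, 0.5]) > 0, 1, -1)  # labels in {-1, +1}

def loss_and_grad(w, X, y, alpha):
    yz = y * (X @ w)
    loss = -np.sum(np.log(expit(yz))) + 0.5 * alpha * np.dot(w, w)
    # Gradient via broadcasting: d/dw of -log sigmoid(yz) is y*x*(sigmoid(yz) - 1).
    z = expit(yz)
    grad = X.T @ (y * (z - 1)) + alpha * w
    return loss, grad

w0 = np.zeros(X.shape[1])  # start from all-zero weights
w_opt, final_loss, info = fmin_l_bfgs_b(loss_and_grad, w0, args=(X, y, 1.0))
print(w_opt, final_loss)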
This was my attempt at understanding the code. I slowly went mad to the point of ripping apart the scikit-learn code (it only works for the binary case). This also served as inspiration.
Hope it helps.
Check out Prof. Andrew Ng's machine learning notes on Logistic Regression (starting from page 16): http://cs229.stanford.edu/notes/cs229-notes1.pdf
In logistic regression you minimize the cross entropy (which in turn maximizes the likelihood of y given x). To do this, the gradient of the cross entropy (cost) function is computed and used to update the weights assigned to each input. In simple terms, logistic regression comes up with the line that best discriminates your two binary classes by adjusting its parameters so that the cross entropy keeps going down. The 83% accuracy (I'm not sure which accuracy that is; you should be dividing your data into training/validation/test sets) means the line logistic regression uses for classification can correctly separate the classes 83% of the time.
I would have a look at the following on GitHub:
https://github.com/scikit-learn/scikit-learn/blob/965b109bf2ac3a61dcbd02bc29dd8c9598c2b54c/sklearn/linear_model/logistic.py
The link is to the implementation of sklearn's logistic regression. It contains the optimization algorithms used, which include Newton conjugate gradient (newton-cg) and L-BFGS (Broyden-Fletcher-Goldfarb-Shanno); newton-cg requires the Hessian of the loss function (_logistic_loss), while L-BFGS approximates it from gradients. _logistic_loss is (the negative log of) your likelihood function.
