In sklearn LogisticRegression, what does fit_intercept=False mean? - python

I tried to compare the logistic regression results from statsmodels with those from sklearn's LogisticRegression. I also compared them with the result from R.
I set C=1e6 (effectively no penalty) and got almost the same coefficients, except for the intercept.
model = sm.Logit(Y, X).fit()
print(model.summary())
==> intercept = 5.4020
model = LogisticRegression(C=1e6,fit_intercept=False)
model = model.fit(X, Y)
===> intercept = 2.4508
So I read the user guide, which says: "Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function."
What does this mean? Is this why sklearn's LogisticRegression gave a different intercept value?
Please help me.

LogisticRegression is in some aspects similar to the Perceptron Model and LinearRegression.
You multiply your weights with the data points and compare the result to a threshold value b:
w_1 * x_1 + ... + w_n*x_n > b
This can be rewritten as:
-b + w_1 * x_1 + ... + w_n*x_n > 0
or
w_0 * 1 + w_1 * x_1 + ... + w_n*x_n > 0
For linear regression we keep this as is; for the perceptron we feed it to a chosen activation function; and for logistic regression we pass it to the logistic function.
Instead of learning n parameters, n+1 are now learned. For the perceptron the extra parameter is called the bias; for regression it is called the intercept.
For linear regression it's easy to understand geometrically. In the 2D case you can think of this as shifting the decision boundary by w_0 in the y direction,
or y = m*x vs y = m*x + c
So now the decision boundary does not go through (0,0) anymore.
For the logistic function it is similar: the intercept shifts the boundary away from the origin.
Implementation-wise, you add one more weight and a constant 1 to the X values, and then you proceed as normal.
if fit_intercept:
    intercept = np.ones((X_train.shape[0], 1))
    X_train = np.hstack((intercept, X_train))
weights = np.zeros(X_train.shape[1])
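To see what the extra column of ones changes, here is a minimal sketch in plain Python. The helper names `sigmoid` and `fit_logistic`, the toy data, and the learning-rate settings are all made up for illustration, and the fit uses simple batch gradient steps rather than anything scikit-learn's solvers do internally. Without an intercept the model is forced to predict 0.5 at x = 0; with the constant column it can move that point.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(X, y, fit_intercept=True, l_rate=0.1, n_iter=5000):
    """Batch gradient ascent on the log-likelihood (illustrative only)."""
    # Prepend a constant 1 to every row so that weights[0] acts as the intercept.
    if fit_intercept:
        rows = [[1.0] + list(r) for r in X]
    else:
        rows = [list(r) for r in X]
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(weights)
        for r, target in zip(rows, y):
            p = sigmoid(sum(w * v for w, v in zip(weights, r)))
            for j, v in enumerate(r):
                grad[j] += (target - p) * v
        weights = [w + l_rate * g / n for w, g in zip(weights, grad)]
    return weights


# Toy 1-D data (hypothetical), deliberately not perfectly separable.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 1, 0, 1, 1]

with_b = fit_logistic(X, y, fit_intercept=True)      # [intercept, slope]
without_b = fit_logistic(X, y, fit_intercept=False)  # [slope]
print("with intercept:   ", with_b)
print("without intercept:", without_b)
```

When comparing libraries, it is worth checking whether X already contains a column of ones: statsmodels' `sm.Logit` does not add a constant on its own, while scikit-learn adds one unless `fit_intercept=False`.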

Related

Why can't I get the result I got with the sklearn LogisticRegression with the coefficients_sgd method?

from math import exp
import numpy as np
from sklearn.linear_model import LogisticRegression
I used the code below from "How To Implement Logistic Regression From Scratch in Python":
def predict(row, coefficients):
    yhat = coefficients[0]
    for i in range(len(row)-1):
        yhat += coefficients[i + 1] * row[i]
    return 1.0 / (1.0 + exp(-yhat))
def coefficients_sgd(train, l_rate, n_epoch):
    coef = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            yhat = predict(row, coef)
            error = row[-1] - yhat
            sum_error += error**2
            coef[0] = coef[0] + l_rate * error * yhat * (1.0 - yhat)
            for i in range(len(row)-1):
                coef[i + 1] = coef[i + 1] + l_rate * error * yhat * (1.0 - yhat) * row[i]
    return coef
dataset = [[2.7810836,2.550537003,0],
[1.465489372,2.362125076,0],
[3.396561688,4.400293529,0],
[1.38807019,1.850220317,0],
[3.06407232,3.005305973,0],
[7.627531214,2.759262235,1],
[5.332441248,2.088626775,1],
[6.922596716,1.77106367,1],
[8.675418651,-0.242068655,1],
[7.673756466,3.508563011,1]]
l_rate = 0.3
n_epoch = 100
coef = coefficients_sgd(dataset, l_rate, n_epoch)
print(coef)
[-0.39233141593823756, 1.4791536027917747, -2.316697087065274]
x = np.array(dataset)[:,:2]
y = np.array(dataset)[:,2]
model = LogisticRegression(penalty="none")
model.fit(x,y)
print(model.intercept_.tolist() + model.coef_.ravel().tolist())
[-3.233238244349982, 6.374828107647225, -9.631487530388092]
What should I change to get the same or closer coefficients? How can I establish the initial coefficients, learning rate, and n_epoch?
Well, there are many nuances here 🙂
First, recall that estimating coefficients of logistic regression with (negative) log-likelihood is possible using various optimization methods, including SGD you implemented, but there is no exact, closed-form solution. So even if you implement an exact copy of scikit-learn's LogisticRegression, you will need to set the same hyperparameters (number of epochs, learning rate, etc.) and random state to obtain the same coefficients.
Second, LogisticRegression offers five different optimization methods (solver parameter). You run LogisticRegression(penalty="none") with its default parameters and the default for solver is 'lbfgs', not SGD; so depending on your data and hyperparameters, you may get significantly different results.
What should I change to get the same or closer coefficients ?
I would suggest comparing your implementation with SGDClassifier(loss='log') first, since LogisticRegression does not offer an SGD solver. Keep in mind, though, that scikit-learn's implementation is more sophisticated; in particular, it has more hyperparameters for early stopping, such as tol.
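One concrete difference worth checking before comparing numbers: the update in the question multiplies the error by `yhat * (1.0 - yhat)`, which is the gradient of the squared error, whereas the log-loss has the simpler per-sample gradient `error * x`. Below is a minimal sketch of that variant, reusing the question's row layout (features first, label last). The name `coefficients_sgd_logloss` is hypothetical, and this is not scikit-learn's implementation.

```python
from math import exp


def sigmoid(z):
    # Numerically stable form: never calls exp() on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def predict(row, coefficients):
    yhat = coefficients[0]
    for i in range(len(row) - 1):
        yhat += coefficients[i + 1] * row[i]
    return sigmoid(yhat)


def coefficients_sgd_logloss(train, l_rate, n_epoch):
    coef = [0.0] * len(train[0])
    for _ in range(n_epoch):
        for row in train:
            error = row[-1] - predict(row, coef)
            # Gradient of the log-likelihood: no yhat * (1 - yhat) factor.
            coef[0] += l_rate * error
            for i in range(len(row) - 1):
                coef[i + 1] += l_rate * error * row[i]
    return coef


dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]

coef = coefficients_sgd_logloss(dataset, l_rate=0.3, n_epoch=100)
print(coef)
```

Two caveats. This toy dataset is linearly separable (every class-0 row has a first feature below 4, every class-1 row above 5), so the unpenalized log-likelihood has no finite maximizer and the coefficients any solver reports depend on when it stops. Also, depending on your scikit-learn version, the logistic loss of SGDClassifier may be spelled `loss='log_loss'` rather than `'log'`.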
How can I establish initial coefficients, learning rate, n_epoch?
Typically, coefficients for SGD are initialized randomly (e.g., uniform(-1/(2n), 1/(2n))), using some data statistics (e.g., dot(y, w)/dot(w, w) for every coefficient w), or with a pre-trained model's parameters. By contrast, there is no golden rule for the learning rate or the number of epochs. Usually, we set a large number of epochs plus some other stopping criterion (e.g., whether the norm of the difference between the current and previous coefficients is smaller than some small tol) and a moderate learning rate; then at every iteration we reduce the learning rate following some rule (see the learning_rate parameter of SGDClassifier or the User Guide) and check the stopping criterion.
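The recipe above (many epochs, a shrinking learning rate, and a tolerance-based stop) could be sketched like this. Everything specific here is an assumption for illustration: the inverse-scaling rule `eta0 / (1 + decay * epoch)`, the names `sgd_with_stopping`, `eta0`, `decay` and `tol`, the zero initialization, and the toy data. It is not SGDClassifier's actual schedule.

```python
from math import exp, sqrt


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def sgd_with_stopping(train, eta0=0.5, decay=0.1, tol=1e-4, max_epoch=10000):
    """Logistic-regression SGD; rows are (features..., label)."""
    coef = [0.0] * len(train[0])  # coef[0] is the intercept
    for epoch in range(max_epoch):
        previous = list(coef)
        # Shrink the step size a little every epoch.
        l_rate = eta0 / (1.0 + decay * epoch)
        for row in train:
            features, label = row[:-1], row[-1]
            z = coef[0] + sum(c * x for c, x in zip(coef[1:], features))
            error = label - sigmoid(z)
            coef[0] += l_rate * error
            for i, x in enumerate(features):
                coef[i + 1] += l_rate * error * x
        # Stop once an entire epoch barely moves the coefficients.
        change = sqrt(sum((c - p) ** 2 for c, p in zip(coef, previous)))
        if change < tol:
            return coef, epoch + 1
    return coef, max_epoch


# Hypothetical 1-D data that is not perfectly separable.
toy = [[1.0, 0], [2.0, 0], [3.0, 1], [4.0, 0], [5.0, 1], [6.0, 1]]
coef, n_epochs = sgd_with_stopping(toy)
print(coef, n_epochs)
```

The tolerance check is done once per epoch rather than once per sample, which keeps the noise of individual updates from triggering an early stop.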

What is coef_ in Lasso Regression in scikit learn

What does coef_ store?
coef_ originates from the Lasso regression method when trying to accomplish feature selection
It stores the parameter vector, describing the weights you multiply each feature by to get your predicted values. Essentially, coef_ are the parameters of your model (excluding the regularisation and the intercept (w0) term - see below).
When using Lasso regression you are performing linear regression with regularisation.
Without regularisation your predicted value for a given instance is of the form:
y = w0 + w1 * x1 + w2 * x2 + … + wn * xn
where y is your predicted value and your parameter vector is w = [w0, w1, w2, ..., wn] and your feature vector for the training instance is x = [x1, x2, ..., xn]. When you are performing your regression you are changing the values of the parameter vector (or 'weight vector') to get the predicted values which minimize the differences between the predicted values and the target values. This is achieved by minimizing a cost function (a measure of how far your predictions vary from the true values).
With (lasso) regularisation you simply add the l1 norm of the parameter vector to the cost function (which you minimize), which helps keep the values of the parameter vector as small as possible.
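As a rough sketch of that cost function in plain Python: the function name `lasso_cost` and the toy numbers are made up, the `1 / (2 * n_samples)` scaling follows the objective documented for scikit-learn's Lasso, and the intercept w0 is left out of the penalty, in line with coef_ not including it.

```python
def lasso_cost(X, y, w0, w, alpha):
    """Squared-error term plus alpha times the l1 norm of w (w0 is not penalized)."""
    n_samples = len(X)
    squared_error = 0.0
    for row, target in zip(X, y):
        prediction = w0 + sum(wi * xi for wi, xi in zip(w, row))
        squared_error += (target - prediction) ** 2
    l1_norm = sum(abs(wi) for wi in w)
    return squared_error / (2 * n_samples) + alpha * l1_norm


# Hypothetical data: 3 samples, 2 features.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [5.0, 4.0, 7.0]

print(lasso_cost(X, y, w0=1.0, w=[2.0, 0.0], alpha=0.5))  # penalized cost
print(lasso_cost(X, y, w0=1.0, w=[2.0, 0.0], alpha=0.0))  # plain squared-error term
```

Because the penalty grows with the absolute value of every weight, the minimizer tends to set some entries of coef_ exactly to zero, which is why Lasso is often used for feature selection.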

Linear Regression model (using Gradient Descent) does not converge on Boston Housing Dataset

I've been trying to find out why my linear regression model performs poorly when compared to sklearn's linear regression model.
My linear regression model (update rules based on gradient descent)
w0 = 0
w1 = 0
alpha = 0.001
N = len(xTrain)
for i in range(1000):
    yPred = w0 + w1*xTrain
    w0 = w0 - (alpha/N)* sum(yPred - yTrain)
    w1 = w1 - (alpha/N)*sum((yPred - yTrain) * xTrain)
Code for plotting the values of x from the training set and the predicted values of y
#Scatter plot between x and y
plot.scatter(xTrain,yTrain, c='black')
plot.plot(xTrain, w0+w1*xTrain, color='r')
plot.xlabel('Number of rooms')
plot.ylabel('Median value in 1000s')
plot.show()
I get the output as shown here https://i.stack.imgur.com/jvOfM.png
On running the same code using sklearn's inbuilt linear regression, I get this
https://i.stack.imgur.com/jvOfM.png
Can anyone help me find where my model is going wrong? I have tried changing the number of iterations and the learning rate, but there were no significant changes.
Here's the ipython notebook on colab if it helps: https://colab.research.google.com/drive/1c3lWKkv2lJfZAc19LiDW7oTuYuacQ3nd
Any help is highly appreciated
You can set a larger learning rate, such as 0.01, and run more iterations, such as 500000. Then you will get a similar result.
Alternatively, you can initialize w1 with a larger value, such as 5.
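A small experiment on made-up data (not the Boston set; the values in `x` and `y` and the function name `gradient_descent` are assumptions for illustration) suggests why the original settings fall short. When the x values sit far from zero, as room counts do, the intercept converges very slowly, so 1000 steps at alpha = 0.001 barely move w0 towards its optimum.

```python
def gradient_descent(x, y, alpha, n_iter, w0=0.0, w1=0.0):
    """Same update rule as in the question, written for plain Python lists."""
    n = len(x)
    for _ in range(n_iter):
        residuals = [w0 + w1 * xi - yi for xi, yi in zip(x, y)]
        grad0 = sum(residuals) / n
        grad1 = sum(r * xi for r, xi in zip(residuals, x)) / n
        w0 -= alpha * grad0
        w1 -= alpha * grad1
    return w0, w1


# Synthetic data lying exactly on y = 9 * x - 33.
x = [4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.0, 12.0, 21.0, 30.0, 39.0]

short_run = gradient_descent(x, y, alpha=0.001, n_iter=1000)
long_run = gradient_descent(x, y, alpha=0.01, n_iter=50000)
print("1000 steps, alpha=0.001:", short_run)   # w0 still far from -33
print("50000 steps, alpha=0.01:", long_run)    # close to (-33, 9)
```

On this data the short run has already found a line that passes near the points, but with the wrong intercept and slope; only the long run recovers the least-squares solution.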

Python Logistic regression and future samples

I'm learning Logistic regression in Python and have managed to fit a model to my existing data (stock market data), and the predictions produce a nice result.
But I do not know how to convert that predictive model into a form I can apply to future data. I.e., is there a y=ax+b formula I can use to input future samples? How do I use the 'model'? How does one use the prediction for subsequent data? Or am I off track here, and logistic regression is not applied in this manner?
When you train a logistic regression model, you learn the parameters a and b in z = ax + b; the predicted probability is then sigmoid(z). So, after training, a and b are known and can be plugged into that equation for any new x.
I don't know exactly which Python packages you used to train your model or how many classes you have, but if it were, let's say, numpy and 2 classes, the prediction function could look like this:
import numpy as np
def sigmoid(z):
    """
    Compute the sigmoid of z.
    Arguments:
    z: a scalar or numpy array
    Return:
    s: sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic
    regression parameters (w, b).
    Arguments:
    w: weights
    b: bias
    X: data to predict, one column per instance
    Returns:
    Y_pred: a numpy array (vector) containing all predictions (0/1)
    for the examples in X
    '''
    m = X.shape[1]  # number of instances in X
    Y_pred = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    # Apply the same activation function which you applied during
    # training, in this case it is a sigmoid
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert probabilities A[0, i] to actual predictions Y_pred[0, i]
        if A[0, i] > 0.5:
            Y_pred[0, i] = 1
        else:
            Y_pred[0, i] = 0
    return Y_pred
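In the single-feature case from the question this boils down to a few lines. The sketch below assumes you already have a fitted slope `a` and intercept `b` (the numbers here are invented, not taken from any real model) and shows how a new sample is turned into a probability and then a 0/1 prediction. The function names are hypothetical.

```python
from math import exp


def predict_proba(x, a, b):
    """Probability of class 1 for a single feature value x."""
    z = a * x + b
    return 1.0 / (1.0 + exp(-z))


def predict_label(x, a, b, threshold=0.5):
    """Turn the probability into a hard 0/1 decision."""
    return 1 if predict_proba(x, a, b) > threshold else 0


a, b = 1.5, -3.0              # hypothetical parameters from an earlier fit
new_samples = [0.5, 2.0, 4.0]  # future data the model has not seen

labels = [predict_label(x, a, b) for x in new_samples]
print(labels)
```

If the model was trained with scikit-learn, the fitted estimator's `predict` and `predict_proba` methods do the same thing for you, using the parameters stored in `coef_` and `intercept_`.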

How to set the learning rate in scikit-learn's ridge regression?

I'm using scikit-learn's ridge regression:
regr = linear_model.Ridge (alpha = 0.5)
# Train the model using the training sets
regr.fit(X_train, Y_train)
#bias:
print('bias: \n', regr.intercept_)
# The coefficients
print('Coefficients: \n', regr.coef_)
I found (here) the different options for the linear_model.Ridge function, but there is a specific option that I didn't find in the list: How could I set the learning rate (or learning step) of the update function?
By learning rate, I mean:
w_{t+1} = w_t + (learning_rate) * (partial derivative of objective function)
I refer to learning rate as step size.
Your code is not using the sag (stochastic average gradient) solver. The default parameter for solver is set to auto, which will choose a solver depending on the data type. A description of the other solvers and which to use is here.
To use the sag solver:
regr = linear_model.Ridge (alpha = 0.5, solver = 'sag')
However, for this solver you do not set the step size because the solver computes the step size based on your data and alpha. Here is the code for sag solver used for ridge regression, where they explain how the step size is computed.
The step size is set to 1 / (alpha_scaled + L + fit_intercept) where L is
the max sum of squares for over all samples.
Line 401 shows how sag_solver is used for ridge regression.
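To make the quoted formula concrete, here is a small sketch that evaluates it by hand. It assumes that `alpha_scaled` means `alpha / n_samples` (an assumption on my part, not something stated in the quote), and `sag_step_size` is a hypothetical helper, not a scikit-learn function.

```python
def sag_step_size(X, alpha, fit_intercept=True):
    """Evaluate 1 / (alpha_scaled + L + fit_intercept) for a list-of-rows X."""
    n_samples = len(X)
    # L: the largest sum of squares (squared Euclidean norm) over all samples.
    max_squared_sum = max(sum(v * v for v in row) for row in X)
    alpha_scaled = alpha / n_samples  # assumption, see the text above
    return 1.0 / (alpha_scaled + max_squared_sum + int(fit_intercept))


# Hypothetical training data: 3 samples, 2 features.
X_train = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]

print(sag_step_size(X_train, alpha=0.5))
print(sag_step_size(X_train, alpha=0.5, fit_intercept=False))
```

Since L depends on the largest row norm, rescaling the features changes the step size the solver picks, which is one more reason to standardize the data before using sag.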
