from math import exp
import numpy as np
from sklearn.linear_model import LogisticRegression
I used the code below from How To Implement Logistic Regression From Scratch in Python:
def predict(row, coefficients):
    # Linear combination of the inputs, passed through a sigmoid to get a probability
    yhat = coefficients[0]
    for i in range(len(row) - 1):
        yhat += coefficients[i + 1] * row[i]
    return 1.0 / (1.0 + exp(-yhat))
def coefficients_sgd(train, l_rate, n_epoch):
    # Estimate the coefficients with stochastic gradient descent
    coef = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            yhat = predict(row, coef)
            error = row[-1] - yhat
            sum_error += error**2
            coef[0] = coef[0] + l_rate * error * yhat * (1.0 - yhat)
            for i in range(len(row) - 1):
                coef[i + 1] = coef[i + 1] + l_rate * error * yhat * (1.0 - yhat) * row[i]
    return coef
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]
l_rate = 0.3
n_epoch = 100
coef = coefficients_sgd(dataset, l_rate, n_epoch)
print(coef)
[-0.39233141593823756, 1.4791536027917747, -2.316697087065274]
x = np.array(dataset)[:,:2]
y = np.array(dataset)[:,2]
model = LogisticRegression(penalty="none")
model.fit(x,y)
print(model.intercept_.tolist() + model.coef_.ravel().tolist())
[-3.233238244349982, 6.374828107647225, -9.631487530388092]
What should I change to get the same or closer coefficients? How can I establish the initial coefficients, learning rate, and n_epoch?
Well, there are many nuances here 🙂
First, recall that estimating the coefficients of logistic regression by optimizing the (negative) log-likelihood is possible with various optimization methods, including the SGD you implemented, but there is no exact, closed-form solution. So even if you implemented an exact copy of scikit-learn's LogisticRegression, you would need to set the same hyperparameters (number of epochs, learning rate, etc.) and random state to obtain the same coefficients.
Second, LogisticRegression offers five different optimization methods (solver parameter). You run LogisticRegression(penalty="none") with its default parameters and the default for solver is 'lbfgs', not SGD; so depending on your data and hyperparameters, you may get significantly different results.
What should I change to get the same or closer coefficients?
I would suggest comparing your implementation with SGDClassifier(loss='log') first, since LogisticRegression does not offer an SGD solver. Keep in mind, though, that scikit-learn's implementation is more sophisticated, in particular offering more hyperparameters for early stopping, such as tol.
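A minimal sketch of that comparison, assuming the x and y arrays defined above (note that recent scikit-learn versions spell the log-loss as loss='log_loss' and accept penalty=None, while older versions use loss='log'):
from sklearn.linear_model import SGDClassifier

# Plain SGD on the log-loss, no regularization, constant learning rate,
# to stay as close as possible to the hand-rolled loop above.
sgd = SGDClassifier(loss="log_loss", penalty=None, learning_rate="constant",
                    eta0=0.3, max_iter=100, tol=None, shuffle=False,
                    random_state=0)
sgd.fit(x, y)
print(sgd.intercept_.tolist() + sgd.coef_.ravel().tolist())
The coefficients will still not be identical, since the gradient and update rule differ from your implementation, but the comparison is more like-for-like than against lbfgs.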
How can I establish initial coefficients, learning rate, n_epoch?
Typically, coefficients for SGD are initialized randomly (e.g., uniform(-1/(2n), 1/(2n))), from some data statistics (e.g., dot(y, w)/dot(w, w) for every coefficient w), or with a pre-trained model's parameters. In contrast, there is no golden rule for the learning rate or the number of epochs. Usually, we set a large number of epochs plus some other stopping criterion (e.g., whether the norm of the difference between the current and previous coefficients is smaller than some small tol), start with a moderate learning rate, and on every iteration reduce the learning rate following some rule (see the learning_rate parameter of SGDClassifier or the User Guide) and check the stopping criterion.
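As an illustration only (the decay factor and tol threshold are arbitrary choices, not anything scikit-learn prescribes), your training loop could be extended with a decaying learning rate and a tol-based stopping criterion, reusing the predict function defined earlier:
import math

def coefficients_sgd_decay(train, l_rate, n_epoch, tol=1e-6, decay=0.99):
    coef = [0.0 for _ in range(len(train[0]))]
    for epoch in range(n_epoch):
        prev = list(coef)
        for row in train:
            yhat = predict(row, coef)
            error = row[-1] - yhat
            update = l_rate * error * yhat * (1.0 - yhat)
            coef[0] += update
            for i in range(len(row) - 1):
                coef[i + 1] += update * row[i]
        l_rate *= decay  # reduce the learning rate every epoch
        # stop once the coefficients barely move between epochs
        if math.sqrt(sum((c - p) ** 2 for c, p in zip(coef, prev))) < tol:
            break
    return coef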
What does coef_ store?
Here coef_ comes from the Lasso regression method, used when trying to accomplish feature selection.
It stores the parameter vector, i.e. the weights you multiply each feature by to get your predicted values. Essentially, coef_ contains the parameters of your model (excluding the regularisation and the intercept (w0) term - see below).
When using Lasso regression you are performing linear regression with regularisation.
Without regularisation your predicted value for a given instance is of the form:
y = w0 + w1 * x1 + w2 * x2 + … + wn * xn
where y is your predicted value and your parameter vector is w = [w0, w1, w2, ..., wn] and your feature vector for the training instance is x = [x1, x2, ..., xn]. When you are performing your regression you are changing the values of the parameter vector (or 'weight vector') to get the predicted values which minimize the differences between the predicted values and the target values. This is achieved by minimizing a cost function (a measure of how far your predictions vary from the true values).
With (Lasso) regularisation you simply add the L1 norm of the parameter vector to the cost function (which you minimise), which helps keep the values of the parameters as small as possible.
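A minimal sketch on toy data, just to show where coef_ and intercept_ live (the data and alpha here are made up for illustration):
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: y depends mostly on the first feature
X = np.array([[1.0, 0.1], [2.0, 0.2], [3.0, 0.1], [4.0, 0.3]])
y = np.array([2.0, 4.1, 5.9, 8.2])

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(lasso.coef_)       # [w1, w2] - the weight for each feature
print(lasso.intercept_)  # w0 - the intercept term, stored separately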
I've been trying to find out why my linear regression model performs poorly when compared to sklearn's linear regression model.
My linear regression model (update rules based on gradient descent)
w0 = 0
w1 = 0
alpha = 0.001
N = len(xTrain)
for i in range(1000):
    yPred = w0 + w1 * xTrain
    w0 = w0 - (alpha / N) * sum(yPred - yTrain)
    w1 = w1 - (alpha / N) * sum((yPred - yTrain) * xTrain)
Code for plotting the values of x from the training set and the predicted values of y
import matplotlib.pyplot as plot  # assumed import, given the plot.* calls below

# Scatter plot between x and y
plot.scatter(xTrain, yTrain, c='black')
plot.plot(xTrain, w0 + w1 * xTrain, color='r')
plot.xlabel('Number of rooms')
plot.ylabel('Median value in 1000s')
plot.show()
I get the output as shown here https://i.stack.imgur.com/jvOfM.png
On running the same code using sklearn's inbuilt linear regression, I get this
https://i.stack.imgur.com/jvOfM.png
Can anyone help me where my model is going wrong? I have tried changing a number of iterations and learning rates, but there were no significant changes.
Here's the ipython notebook on colab if it helps: https://colab.research.google.com/drive/1c3lWKkv2lJfZAc19LiDW7oTuYuacQ3nd
Any help is highly appreciated
You can set a bigger learning rate, such as 0.01, and run more iterations, such as 500000. Then you will get a similar result.
Or you can initialize w1 with a bigger number, such as 5.
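A minimal sketch of the first suggestion, assuming xTrain and yTrain are 1-D NumPy arrays as in the question (the learning rate and iteration count are just the values suggested above):
import numpy as np

def fit_line(xTrain, yTrain, alpha=0.01, n_iter=500000):
    # Plain batch gradient descent on the mean squared error
    w0, w1 = 0.0, 0.0
    N = len(xTrain)
    for _ in range(n_iter):
        yPred = w0 + w1 * xTrain
        w0 -= (alpha / N) * np.sum(yPred - yTrain)
        w1 -= (alpha / N) * np.sum((yPred - yTrain) * xTrain)
    return w0, w1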
I'm learning Logistic regression in Python and have managed to fit a model to my existing data (stock market data), and the predictions produce a nice result.
But I do not know how to use that predictive model on future data. I.e., is there a y = ax + b formula I can feed future samples into? How do I use the 'model'? How does one apply the prediction to subsequent data? Or am I off track here - is logistic regression not applied in this manner?
When you train the logistic regression model, you learn the parameters a and b in y = ax + b (with a sigmoid applied to that value to turn it into a probability). So, after training, a and b are known and can be used to evaluate the model on new data.
I don't know which exact Python packages you used to train your model or how many classes you have, but if it were, say, numpy and 2 classes, the prediction function could look like this:
import numpy as np

def sigmoid(z):
    """
    Compute the sigmoid of z.

    Arguments:
    z: a scalar or numpy array

    Return:
    s: sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic
    regression parameters (w, b).

    Arguments:
    w: weights
    b: bias
    X: data to predict

    Returns:
    Y_pred: a numpy array (vector) containing all predictions (0/1)
            for the examples in X
    '''
    m = X.shape[1]  # number of instances in X
    Y_pred = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    # Apply the same activation function which you applied during
    # training, in this case it is a sigmoid
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert probabilities A[0, i] to actual predictions Y_pred[0, i]
        if A[0, i] > 0.5:
            Y_pred[0, i] = 1
        else:
            Y_pred[0, i] = 0
    return Y_pred
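A hypothetical usage example with two features and made-up parameters w and b (in practice these come from your trained model); note that X is laid out with one column per sample, as the function above expects:
w = np.array([0.8, -1.2])          # weights from training (made up here)
b = 0.3                            # bias from training (made up here)
X_future = np.array([[0.5, 2.0],   # feature 1 for two future samples
                     [1.5, 0.2]])  # feature 2 for two future samples
print(predict(w, b, X_future))     # e.g. [[0. 1.]]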
I'm using scikit-learn's ridge regression:
from sklearn import linear_model

regr = linear_model.Ridge(alpha=0.5)

# Train the model using the training sets
regr.fit(X_train, Y_train)

# bias:
print('bias: \n', regr.intercept_)

# The coefficients
print('Coefficients: \n', regr.coef_)
I found (here) the different options for the linear_model.Ridge function, but there is a specific option that I didn't find in the list: How could I set the learning rate (or learning step) of the update function?
By learning rate, I mean:
w_{t+1} = w_t - (learning_rate) * (partial derivative of the objective function)
I refer to learning rate as step size.
Your code is not using the sag (stochastic average gradient) solver. The default for the solver parameter is 'auto', which chooses a solver depending on the data type. A description of the other solvers and which to use is here.
To use the sag solver:
regr = linear_model.Ridge (alpha = 0.5, solver = 'sag')
However, for this solver you do not set the step size because the solver computes the step size based on your data and alpha. Here is the code for sag solver used for ridge regression, where they explain how the step size is computed.
The step size is set to 1 / (alpha_scaled + L + fit_intercept), where L is the max sum of squares over all samples.
Line 401 shows how sag_solver is used for ridge regression.
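Just as a rough illustration of that formula (an approximation, not scikit-learn's actual internal code; alpha_scaled is assumed here to be alpha divided by the number of samples):
import numpy as np

def approx_sag_step_size(X, alpha, fit_intercept=True):
    # L: maximum sum of squared feature values over all samples
    max_squared_sum = np.max(np.sum(X ** 2, axis=1))
    alpha_scaled = alpha / X.shape[0]  # assumption: alpha scaled by n_samples
    return 1.0 / (max_squared_sum + alpha_scaled + int(fit_intercept))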