While implementing the gradient descent algorithm for linear regression, the predictions my algorithm makes and the resulting regression line come out wrong. Could anyone please have a look at my implementation and help me out? Also, please guide me on how to choose the value of the learning rate and the number of iterations for a specific regression problem.
import matplotlib.pyplot as plt

# X and Y are NumPy arrays holding the training data (their construction is not shown here)
theta0 = 0 # first parameter
theta1 = 0 # second parameter
alpha = 0.001 # learning rate (denoted by alpha)
num_of_iterations = 100 # total number of iterations performed by gradient descent
m = float(len(X)) # total number of training examples

for i in range(num_of_iterations):
    y_predicted = theta0 + theta1 * X
    derivative_theta0 = (1 / m) * sum(y_predicted - Y)
    derivative_theta1 = (1 / m) * sum(X * (y_predicted - Y))
    temp0 = theta0 - alpha * derivative_theta0
    temp1 = theta1 - alpha * derivative_theta1
    theta0 = temp0
    theta1 = temp1

print(theta0, theta1)

y_predicted = theta0 + theta1 * X
plt.scatter(X, Y)
plt.plot(X, y_predicted, color='red')
plt.show()
[Image: resulting regression line about which I need some help]
Your learning rate is too high. I got it working by reducing the learning rate to alpha = 0.0001.
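As for the second part of the question, choosing the learning rate and the number of iterations: a common practical approach is to record the cost after every iteration and try a few candidate values of alpha. If the cost blows up or oscillates, alpha is too large; if the cost is still clearly decreasing at the last iteration, increase the number of iterations (or alpha). A minimal sketch along those lines, assuming X and Y are the NumPy arrays from the question (compute_cost, fit and cost_history are illustrative names, not part of the original code):

import numpy as np

def compute_cost(X, Y, theta0, theta1):
    # Mean squared error cost for the linear hypothesis theta0 + theta1 * x
    y_predicted = theta0 + theta1 * X
    return np.sum((y_predicted - Y) ** 2) / (2 * len(X))

def fit(X, Y, alpha, num_of_iterations):
    theta0, theta1 = 0.0, 0.0
    m = float(len(X))
    cost_history = []
    for _ in range(num_of_iterations):
        y_predicted = theta0 + theta1 * X
        theta0 -= alpha * (1 / m) * np.sum(y_predicted - Y)
        theta1 -= alpha * (1 / m) * np.sum(X * (y_predicted - Y))
        cost_history.append(compute_cost(X, Y, theta0, theta1))
    return theta0, theta1, cost_history

# Compare several learning rates and keep the one whose cost curve
# decreases steadily without diverging.
for alpha in (0.01, 0.001, 0.0001):
    *_, history = fit(X, Y, alpha, 100)
    print(alpha, history[-1])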
I need to implement gradient descent on my own. My task is to create an arbitrary function, add noise to it, and then find the coefficient values for that function. So first, I created a function and generated some random values:
# Preprocessing input data
import random

function = lambda x: x ** 2 + x + 1

X = []
Y = []
for i in range(-100, 100):
    X.append(i)
    Y.append(function(i) + random.randrange(-10, 10))
Then I normalized the values:

import numpy as np

maxVal = np.max(np.hstack((X, Y)))   # single scale shared by both axes
X = X / maxVal
Y = Y / maxVal
X = np.asarray(X)
Y = np.asarray(Y)
This is my code for gradient descent, using the derivatives to find the coefficients:
# w1, w2, b are the coefficients being learned; L is the learning rate,
# epochs is the number of iterations, and n = len(X)
# (the initialisations are not shown in the question)
w1Arr = []
w2Arr = []
bArr = []
lossArr = []

for i in range(epochs):
    Y_pred = w1 * np.square(X) + w2 * X + b
    D_w1 = (-2 / n) * sum(np.square(X) * (Y - Y_pred)) # derivative for w1
    D_w2 = (-2 / n) * sum(X * (Y - Y_pred)) # derivative for w2
    D_b = (-2 / n) * sum(Y - Y_pred) # derivative for b
    w1 = w1 - L * D_w1 # update w1
    w2 = w2 - L * D_w2 # update w2
    b = b - L * D_b # update b
    loss = sum((Y - Y_pred) * (Y - Y_pred)) # MSE
    w1Arr.append(w1)
    w2Arr.append(w2)
    bArr.append(b)
    lossArr.append(loss)
When I try to plot the results:

import matplotlib.pyplot as plt

# Making predictions
Y_pred = w1 * np.square(X) + w2 * X + b
# print(Y_pred)
plt.scatter(X, Y, label='data')
plt.plot(X, Y_pred, label='predicted')
plt.legend()
plt.show()
I see that the coefficients come out pretty much the same, and the fit just looks like a straight line. I'm pretty much stuck and don't know what is wrong with my code or how to fix it. I looked online for solutions but couldn't find any. Any help would be appreciated.
Found the problem!
The normalization you applied messed up the relation between x and y; in particular, you skewed the domain with respect to the codomain:
maxVal = np.max(np.hstack((X,Y)))
X = X/maxVal
Y = Y/maxVal
Just remove the normalization and you will find that you can learn the coefficients.
If you really want to normalize, you can scale both axes, but each one by its own value:
X = X / np.max(X)
Y = Y / np.max(Y)
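If you do normalize each axis by its own maximum, keep in mind that the learned w1, w2, b then describe the scaled data, so they have to be mapped back to the original scale before comparing them with the true coefficients or plotting against the raw data. A hedged sketch of that unscaling step, assuming the quadratic model from the question and that w1, w2, b come out of the training loop above (this back-transformation is my addition, not part of the original answer):

import numpy as np

def unscale_coefficients(w1, w2, b, a, c):
    # Coefficients were learned on Xs = X / a, Ys = Y / c, i.e.
    #   Y / c = w1 * (X / a)**2 + w2 * (X / a) + b
    # so on the original scale:
    #   Y = (c * w1 / a**2) * X**2 + (c * w2 / a) * X + c * b
    return c * w1 / a ** 2, c * w2 / a, c * b

# Example usage (a and c are the per-axis maxima used for normalization):
# a, c = np.max(X), np.max(Y)
# train the loop above on X / a and Y / c, then:
# W1, W2, B = unscale_coefficients(w1, w2, b, a, c)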
I am writing this algorithm for my final-year project. I used gradient descent to find the best-fit line, and I also solved the same problem in Excel using multiple regression; the values come out different.
The csv file is attached here: https://drive.google.com/file/d/1-UaU34w3c5-VunYrVz9fD7vRb0c-XDqk/view?usp=sharing. The first 3 columns are independent variables (x1, x2, x3) and the last is the dependent variable (y).
It may be a separate question, but could you explain why the answer is different from the Excel values?
import numpy as np
import random
import pandas as pd

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        gradient = np.dot(xTrans, loss) / m
        theta = theta - alpha * gradient
    return theta

df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', delimiter=",", header=None)
df.columns = ['x0', 'Speed', 'Feed', 'DOC', 'Roughness']
print(df)

y = np.array(df['Roughness'])
# x = np.array(d)
x = np.array(df.drop(['Roughness'], axis=1))
# x[:, 2:3] = 1.0
print(x)
print(y)

m, n = np.shape(x)
print(m, n)

numIterations = 50000
alpha = 0.000001
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
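Regarding the difference from Excel: Excel's multiple-regression tools compute the exact ordinary-least-squares solution in one shot, whereas gradient descent with alpha = 0.000001 only approaches that solution gradually, so after a finite number of iterations the two can still disagree. A quick way to check is to compare theta against the closed-form solution on the same x and y; a hedged diagnostic sketch (not part of the original code):

import numpy as np

# Exact least-squares solution for the same design matrix and target.
# Assuming the 'x0' column in the CSV is a constant column of ones (as its
# name suggests), the intercept is already included; otherwise add a column
# of ones first, since Excel's regression fits an intercept by default.
theta_exact, *_ = np.linalg.lstsq(x, y, rcond=None)
print("closed-form theta:", theta_exact)

# If gradient descent has converged, theta should be close to theta_exact.
# A large gap usually means alpha is too small or numIterations too few
# (or the features should be rescaled), not that the algorithm is wrong.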
For the past few days, I have been trying to code this application of gradient descent for my final-year project in Mechanical Engineering. The HTML file with the results is attached here: https://drive.google.com/open?id=1tIGqZ2Lb0sN4GEpgYEZLFvtmhigXnot0. If you look at the results, there are only 3 values in theta, whereas x has 3 independent variables, so theta should have 4 values (including the intercept).
The code is as follows. The result it prints is theta = [-0.03312393 0.94409351 0.99853041].
import numpy as np
import random
import pandas as pd

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here,
        # but to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', delimiter=",", header=None)
x = df.loc[:, 0:2]   # with header=None the columns are the integers 0..3
y = df[3]
print(x)

m, n = np.shape(x)
numIterations = 200
alpha = 0.000001
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
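Note that theta has exactly one entry per column of x, and df.loc[:, 0:2] selects three columns, so gradient descent can only return three coefficients; no intercept term is being fitted. If the expected fourth value is the intercept, a column of ones has to be added to x before training. A hedged sketch reusing the gradientDescent function above (the to_numpy conversion and column_stack call are my additions):

import numpy as np

x = df.loc[:, 0:2].to_numpy(dtype=float)
y = df[3].to_numpy(dtype=float)

# Prepend a bias column of ones so theta gains a fourth entry (the intercept).
x = np.column_stack([np.ones(len(x)), x])   # shape becomes (m, 4)

m, n = x.shape                              # n is now 4
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)                                # four values: [intercept, w1, w2, w3]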
I am trying to port the gradient descent algorithm for linear regression from Andrew Ng's Machine Learning course to Python, but for some reason my implementation is not working correctly.
Here's my implementation in Octave; it works correctly:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        prediction = X * theta;
        margin_error = prediction - y;
        gradient = 1/m * (alpha * (X' * margin_error));
        theta = theta - gradient;
        J_history(iter) = computeCost(X, y, theta);
    end
end
However, when I translate this to Python, for some reason it does not give me accurate results. The cost seems to go up rather than down.
Here's my implementation in Python:
def gradientDescent(x, y, theta, alpha, iters):
    m = len(y)
    J_history = np.matrix(np.zeros((iters, 1)))
    for i in range(iters):
        prediction = x * theta.T
        margin_error = prediction - y
        gradient = 1/m * (alpha * (x.T * margin_error))
        theta = theta - gradient
        J_history[i] = computeCost(x, y, theta)
    return theta, J_history
My code runs without raising any errors. Please note this is theta:
theta = np.matrix(np.array([0,0]))
Alpha and iters are set to this:
alpha = 0.01
iters = 1000
When I run opt_theta, cost = gradientDescent(x, y, theta, alpha, iters) and print out opt_theta, I get this:
matrix([[ 2.36890383e+16, -1.40798902e+16],
[ 2.47503758e+17, -2.36890383e+16]])
when I should get this:
matrix([[-3.24140214, 1.1272942 ]])
What am I doing wrong?
Edit:
Cost function
def computeCost(x, y, theta):
    # Get length of data set
    m = len(y)
    # We take theta transpose because we are working with a numpy array [0,0] for example
    prediction = x * theta.T
    J = 1/(2*m) * np.sum(np.power((prediction - y), 2))
    return J
Look there:
>>> A = np.matrix([3,3,3])
>>> B = np.matrix([[1,1,1], [2,2,2]])
>>> A-B
matrix([[2, 2, 2],
[1, 1, 1]])
The matrices are broadcast together.
"It's because np.matrix inherits from np.ndarray. np.matrix overrides multiplication, but not addition and subtraction."
In your situation, theta (1x2) minus gradient (2x1) broadcasts to a 2x2 result. Try transposing the gradient before subtracting:
theta = theta - gradient.T
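Alternatively, you can sidestep the np.matrix broadcasting pitfall entirely by working with plain NumPy arrays and keeping every operand 2-D with matching shapes. A hedged sketch of the same routine written that way (this is an alternative formulation, not the original fix):

import numpy as np

def gradient_descent(x, y, theta, alpha, iters):
    # x: (m, 2) design matrix, y: (m, 1) targets, theta: (2, 1) column vector
    m = len(y)
    J_history = np.zeros(iters)
    for i in range(iters):
        prediction = x @ theta                          # (m, 1)
        margin_error = prediction - y                   # (m, 1)
        gradient = (alpha / m) * (x.T @ margin_error)   # (2, 1), same shape as theta
        theta = theta - gradient                        # no silent broadcast to 2x2
        J_history[i] = np.sum((x @ theta - y) ** 2) / (2 * m)
    return theta, J_history

# Keeping theta as an explicit (2, 1) column vector makes the shapes of
# theta and the gradient line up by construction.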
Hey, I am trying to understand this algorithm for a linear hypothesis. I can't figure out whether my implementation is correct or not. I think it is not, but I can't work out what I am missing.
theta0 = 1
theta1 = 1
alpha = 0.01

# x and y are the training data and le is their length (not shown in the question)
for i in range(0, le * 10):
    for j in range(0, le):
        temp0 = theta0 - alpha * (theta1 * x[j] + theta0 - y[j])
        temp1 = theta1 - alpha * (theta1 * x[j] + theta0 - y[j]) * x[j]
        theta0 = temp0
        theta1 = temp1

print("Values of slope and y intercept derived using gradient descent ", theta1, theta0)
It gives me the correct answer to the 4th decimal place, but when I compare it with other programs on the net I get confused by the differences.
Thanks in advance!
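One source of confusion is that this loop updates theta0 and theta1 after every single training example (stochastic gradient descent), while many examples online average the gradient over the whole data set before each update (batch gradient descent), so the intermediate values differ even though both should end up at essentially the same line here. For comparison, a hedged sketch of the batch version, assuming x and y are NumPy arrays and le = len(x) as in the question:

import numpy as np

theta0, theta1 = 1.0, 1.0
alpha = 0.01

for _ in range(le * 10):
    error = theta1 * x + theta0 - y              # residuals for all examples at once
    temp0 = theta0 - alpha * np.mean(error)      # average gradient w.r.t. the intercept
    temp1 = theta1 - alpha * np.mean(error * x)  # average gradient w.r.t. the slope
    theta0, theta1 = temp0, temp1

print("Batch gradient descent estimates:", theta1, theta0)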
Implementation of the Gradient Descent algorithm:
import numpy as np

cur_x = 1 # initial value
gamma = 1e-2 # step size multiplier
precision = 1e-10
prev_step_size = cur_x

# test function
def foo_func(x):
    y = (np.sin(x) + x**2)**2
    return y

# derivative of the test function; each step moves along -gamma * f'(x)
def foo_grad(x):
    return 2 * (np.sin(x) + x**2) * (np.cos(x) + 2*x)

# Iterate until the step size is smaller than a maximal error
while prev_step_size > precision:
    prev_x = cur_x
    cur_x += -gamma * foo_grad(prev_x)
    prev_step_size = abs(cur_x - prev_x)

print("The local minimum occurs at %f" % cur_x)