What's wrong with my ridge regression gradient descent function? - python

I wrote this Python function, but its predictions don't match reality: the price it predicts is negative. However, I can't find where it goes wrong. Am I computing derivative[i] and weights[i] correctly? Please help.
The following is a helper function that the function in the picture uses:
def feature_derivative_ridge(errors, feature, weight, l2_penalty, feature_is_constant):
    # If feature_is_constant is True, derivative is twice the dot product of errors and feature
    if feature_is_constant == True:
        derivative = 2*np.dot(errors, feature)
    # Otherwise, derivative is twice the dot product plus 2*l2_penalty*weight
    else:
        derivative = (2*np.dot(errors, feature) + 2*l2_penalty*weight)
    return derivative

Oh, I have found the answer.
First: errors = output - predictions
should be: errors = predictions - output
Then: weights[i] = (1-....
should be: weights[i] = weights[i] - step_size*derivative[i]
(recall the formula).
Finally, the output is right.
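A minimal sketch of how the corrected sign and update rule might fit together. The driver `ridge_gradient_descent`, its `max_iterations` parameter, and the synthetic data are assumptions made for illustration (the original outer function was only shown in a picture). It also assumes that column 0 of the feature matrix is the constant feature, and it compares the result against the closed-form ridge solution as a sanity check.

```python
import numpy as np

def feature_derivative_ridge(errors, feature, weight, l2_penalty, feature_is_constant):
    # Derivative of RSS + l2_penalty * ||w||^2 with respect to one weight.
    # The constant (intercept) feature is not regularized.
    derivative = 2 * np.dot(errors, feature)
    if not feature_is_constant:
        derivative += 2 * l2_penalty * weight
    return derivative

def ridge_gradient_descent(feature_matrix, output, initial_weights,
                           step_size, l2_penalty, max_iterations=2000):
    # Hypothetical driver: assumes column 0 of feature_matrix is the constant feature.
    weights = np.array(initial_weights, dtype=float)
    for _ in range(max_iterations):
        predictions = feature_matrix.dot(weights)
        errors = predictions - output  # this sign matches the derivative above
        for i in range(len(weights)):
            derivative = feature_derivative_ridge(
                errors, feature_matrix[:, i], weights[i], l2_penalty, i == 0)
            weights[i] = weights[i] - step_size * derivative
    return weights

# Made-up data, only for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
feature_matrix = np.column_stack([np.ones(50), x])
output = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=50)

weights = ridge_gradient_descent(feature_matrix, output, [0.0, 0.0],
                                 step_size=0.005, l2_penalty=1.0)

# Closed-form ridge solution with an unpenalized intercept, for comparison.
penalty_matrix = np.diag([0.0, 1.0])
closed_form = np.linalg.solve(
    feature_matrix.T.dot(feature_matrix) + 1.0 * penalty_matrix,
    feature_matrix.T.dot(output))
print(weights, closed_form)
```

With `errors = output - predictions` the same update would step uphill instead of downhill, which is one plausible way to end up with diverging weights and negative predicted prices.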

Related

Build Linear Regression model in Python without SK Learn

I am trying to build a linear regression model without using the scikit-learn package. For a single independent variable I have the code below (given by my professor):
import numpy as np

def get_gradient(w, x, y):
    y_estimate = (np.power(x,1).dot(w)).flatten() # hypothesis
    error = (y.flatten() - y_estimate)
    mse = (1.0/len(x))*np.sum(np.power(error,2)) # mse
    gradient = -(1.0/len(x)) * error.dot(np.power(x,1)) # gradient
    return gradient, mse

w = np.random.randn(2) # Random initialization
alpha = 0.5 # learning rate
tolerance = 1e-3 # param for stopping the loop
print("Initial values of Weights:")
print(w[1], w[0])
# Perform Gradient Descent
iterations = 1
while True:
    gradient, error = get_gradient(w, train_x, train_y)
    new_w = w - alpha * gradient
    # Stopping Condition
    if np.sum(abs(new_w - w)) < tolerance:
        print ("Converged")
        break
    # Print error every 10 iterations
    if iterations % 10 == 0:
        print ("Iteration: %d - Error: %.4f" %(iterations, error))
        print ("Updated Weights : {:f} , {:f}".format(w[1], w[0]))
    iterations += 1
    w = new_w
print ("Final Weights : {:f} , {:f}".format(w[1], w[0]))
print ("Test Cost =", get_gradient(w, test_x, test_y)[1])
This works fine with one variable (plus the intercept). Note: I created a NumPy array for each of x and y for the purpose of this code.
But now I wish to use this on the Boston Housing Dataset which has 13 Independent variables/features.
I cannot figure out how to use this for multiple variables. I cannot figure out how to:
Have each weight work with only the values in the column of one variable - apply gradient descent and recalculate the weight. If I try to run the gradient descent function with 13 weights inside the weights array - it ends up (I think) multiplying all 13 weights with ALL rows in EACH variable. I think this is what is happening because initially, I got 'error' as infinity after only a few iterations. Also, it got thrown into an infinite loop.
I tried running a for loop over each weight in the weights variable (13 variables) and THEN calculating the gradient for each of the 13 variables. I then put each weight in a list (later converted to a numpy array) before calling the 'get gradient' function in the while loop. This worked at the time, but the error reached infinity and the program went into an infinite loop just like before. Anyway, this should not even be a solution, since by doing a for loop I am making the first variable get all the relevant weight first.
I also tried to divide all the weights and errors by 13 - to at least get the program to run and then I would tweak it. But no success there either.
Strangely, I cannot replicate the for loop anymore. I keep getting one error or another: shape errors, value errors, etc.
Please help.
If any further information is required - please do let me know.
Thank you so much!
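One possible way to extend the single-variable code to many features, sketched under assumptions: the same vectorized gradient already handles any number of columns, as long as x has one column per feature plus a column of ones and w has one entry per column. The "error goes to infinity" symptom is often caused by features on very different scales combined with a large learning rate, so this sketch standardizes the features first. The function `fit_linear_regression`, its parameters, and the synthetic 13-feature data are hypothetical stand-ins, not the Boston Housing data.

```python
import numpy as np

def get_gradient(w, x, y):
    # x: (n_samples, n_columns), w: (n_columns,), y: (n_samples,)
    error = y - x.dot(w)
    mse = np.mean(error ** 2)
    gradient = -(1.0 / len(x)) * error.dot(x)
    return gradient, mse

def fit_linear_regression(features, targets, alpha=0.1, tolerance=1e-6,
                          max_iterations=10000):
    # Standardize each column so one learning rate suits every weight.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    scaled = (features - mean) / std
    # Prepend a column of ones for the intercept.
    x = np.column_stack([np.ones(len(scaled)), scaled])
    w = np.zeros(x.shape[1])
    for _ in range(max_iterations):
        gradient, _ = get_gradient(w, x, targets)
        new_w = w - alpha * gradient
        converged = np.sum(np.abs(new_w - w)) < tolerance
        w = new_w
        if converged:
            break
    return w, mean, std

# Synthetic stand-in data: 13 features on very different scales.
rng = np.random.default_rng(0)
scales = 10.0 ** rng.integers(-2, 3, size=13)
features = rng.normal(size=(200, 13)) * scales
true_w = rng.normal(size=13) / scales
targets = 5.0 + features.dot(true_w) + rng.normal(scale=0.1, size=200)

w, mean, std = fit_linear_regression(features, targets)
x_design = np.column_stack([np.ones(len(features)), (features - mean) / std])
mse = np.mean((x_design.dot(w) - targets) ** 2)
print("weights:", w.shape, "train MSE:", mse)
```

New data has to be scaled with the same `mean` and `std` before predicting. All weights are updated together in one vector operation, so no per-weight for loop is needed.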

Return an array containing the squared errors between all predicted_prices and the actual prices (from the dataset)

I'm doing an exercise and I'm stuck.
Here's what I have to do:
I've been given a function to implement which has 4 arguments.
def squared_errors(slope, intercept, surfaces, prices):
A friend and I tried to get that function to work, but neither of us found the solution.
Basically, I have been given a dataset, and to make sure that our estimator line is the best possible one, we need to compute the Mean Squared Error between price and predicted_price (slope * surface + intercept). The dataset is a vector of shape (1000, 1).
For each row, we should evaluate the squared error (predicted_price - price)**2.
But my brain is just numb and I can't come up with a solution. Any help would be greatly appreciated!
Given the slope and the intercept, for any given data point x you can get its prediction as slope*x+intercept (or, more generally, as slope^T.X+intercept when it is vectorized).
Now that we have the predictions, if we have the actual ground truth we can measure how good or bad our predictions are with the root mean squared error (RMSE): the square root of the mean of the squared differences between each prediction and the corresponding ground truth. The per-row squared error is the term inside that mean, (prediction - truth)**2.
Sample (documented inline)
import numpy as np
# Actual slope
slope = 2
# Actual intercept
intercept = 0
# Some data
X = np.random.rand(10,1)
# The ground truth
prices = slope*X + intercept
# Loss
def squared_errors(slope, intercept, surfaces, prices):
    y_hat = slope*surfaces + intercept
    return np.sqrt(np.mean((y_hat - prices)**2))
# Perfect prediction
print (squared_errors(2, 0, X, prices))
# Imperfect predictions
print (squared_errors(2, 0.5, X, prices))
print (squared_errors(1, 0, X, prices))
Output:
0.0
0.5
0.6286343914881158
As you can see, for the perfect prediction the error is 0, and it is non-zero for the rest, depending on how far away the predictions are from the ground truth on average.
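The question title asks for an array of squared errors rather than a single number, so here is a small sketch of that variant. The helper `mean_squared_error` and the three sample data points are made up for illustration.

```python
import numpy as np

def squared_errors(slope, intercept, surfaces, prices):
    # One squared error per row; the output has the same shape as the input.
    predicted_prices = slope * surfaces + intercept
    return (predicted_prices - prices) ** 2

def mean_squared_error(slope, intercept, surfaces, prices):
    # Hypothetical helper: the MSE is the mean of the per-row squared errors.
    return np.mean(squared_errors(slope, intercept, surfaces, prices))

# Made-up data with the same column-vector layout as the exercise.
surfaces = np.array([[50.0], [80.0], [120.0]])
prices = 2.0 * surfaces + 10.0

errors = squared_errors(2.0, 10.5, surfaces, prices)
print(errors.shape)
print(mean_squared_error(2.0, 10.5, surfaces, prices))
```

Because NumPy applies the arithmetic elementwise, the same function should work unchanged on a vector of shape (1000, 1).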

Why are the gradients unconnected in the following function?

I am implementing a custom operation whose gradients must be calculated. The following is the function:
def difference(prod,box):
    result = tf.Variable(tf.zeros((prod.shape[0],box.shape[1]),dtype=tf.float16))
    for i in tf.range(0,prod.shape[0]):
        for j in tf.range(0,box.shape[1]):
            result[i,j].assign((tf.reduce_prod(box[:,j])-tf.reduce_prod(prod[i,:]))/tf.reduce_prod(box[:,j]))
    return result
I am unable to calculate the gradients with respect to box; tape.gradient() returns None. Here is the code I have written for calculating the gradients:
prod = tf.constant([[3,4,5],[4,5,6],[1,3,3]],dtype=tf.float16)
box = tf.Variable([[4,5],[5,6],[5,7]],dtype=tf.float16)
with tf.GradientTape() as tape:
    tape.watch(box)
    loss = difference(prod,box)
print(tape.gradient(loss,box))
I am not able to find the reason for unconnected gradients. Is the result variable causing it? Kindly suggest an alternative implementation.
Yes, in order to calculate gradients we need a set of (differentiable) operations on your variables.
You should re-write difference as a function of the 2 input tensors. I think (though happy to confess I am not 100% sure!) that it is the use of 'assign' that makes the gradient tape fall over.
Perhaps something like this:
def difference(prod, box):
    box_red = tf.reduce_prod(box, axis=0)
    prod_red = tf.reduce_prod(prod, axis=1)
    return (tf.expand_dims(box_red, 0) - tf.expand_dims(prod_red, 1)) / tf.expand_dims(box_red, 0)
would get you the desired result

Gradient Descent Algorithm in Python

I am trying to write a gradient descent function in python as part of a multivariate linear regression exercise. It runs, but does not compute the correct answer. My code is below. I've been trying for weeks to finish this problem but have made zero progress.
I believe that I understand the concept of gradient descent to optimize a multivariate linear regression function and also that the 'math' is correct. I believe that the error is in my code, but I am still learning python. Your help is very much appreciated.
def regression_gradient_descent(feature_matrix,output,initial_weights,step_size,tolerance):
    from math import sqrt
    converged = False
    weights = np.array(initial_weights)
    while not converged:
        predictions = np.dot(feature_matrix,weights)
        errors = predictions - output
        gradient_sum_squares = 0
        for i in range(len(weights)):
            derivative = -2 * np.dot(errors[i],feature_matrix[i])
            gradient_sum_squares = gradient_sum_squares + np.dot(derivative, derivative)
            weights[i] = weights[i] - step_size * derivative[i]
        gradient_magnitude = sqrt(gradient_sum_squares)
        print(gradient_magnitude)
        if gradient_magnitude < tolerance:
            converged = True
    return(weights)
Feature matrix is:
sales = gl.SFrame.read_csv('kc_house_data.csv',column_type_hints = {'bathrooms':float, 'waterfront':int, 'sqft_above':int, 'sqft_living15':float,'grade':int, 'yr_renovated':int, 'price':float, 'bedrooms':float, 'zipcode':str,'long':float, 'sqft_lot15':float, 'sqft_living':float, 'floors':str, 'condition':int,'lat':float, 'date':str, 'sqft_basement':int, 'yr_built':int, 'id':str, 'sqft_lot':int,'view':int})
I'm calling the function as:
train_data,test_data = sales.random_split(.8,seed=0)
simple_features = ['sqft_living']
my_output= 'price'
(simple_feature_matrix, output) = get_numpy_data(train_data, simple_features, my_output)
initial_weights = np.array([-47000., 1.])
step_size = 7e-12
tolerance = 2.5e7
simple_weights = regression_gradient_descent(simple_feature_matrix, output,initial_weights,step_size,tolerance)
**get_numpy_data is just a function to convert everything into arrays and works as intended
Update: I fixed the formula to:
derivative = 2 * np.dot(errors,feature_matrix)
and it seems to have worked. The derivation of this formula in my online course used
-2 * np.dot(errors,feature_matrix)
and I'm not sure why this formula did not provide the correct answer.
The step size seems too small, and the tolerance unusually big. Perhaps you meant to use them the other way around?
In general, the step size is determined by trial and error: the "natural" step size α=1 might lead to divergence, so one could try lowering the value (e.g. taking α=1/2, α=1/4, etc.) until convergence is achieved. Don't start with a very small step size.
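On the sign question raised in the update: with errors defined as predictions - output, the gradient of the residual sum of squares is +2 * np.dot(errors, feature_matrix). The -2 form goes with errors = output - predictions, which may be the convention the course derivation used. The sketch below checks the sign numerically against finite differences; the helper names and the random data are assumptions for illustration only.

```python
import numpy as np

def rss(weights, feature_matrix, output):
    # Residual sum of squares for a linear model.
    residuals = feature_matrix.dot(weights) - output
    return np.dot(residuals, residuals)

def rss_gradient(weights, feature_matrix, output):
    # errors = predictions - output, so the gradient carries a +2.
    errors = feature_matrix.dot(weights) - output
    return 2 * np.dot(errors, feature_matrix)

def numerical_gradient(f, weights, eps=1e-6):
    # Central finite differences, one weight at a time.
    grad = np.zeros_like(weights)
    for i in range(len(weights)):
        step = np.zeros_like(weights)
        step[i] = eps
        grad[i] = (f(weights + step) - f(weights - step)) / (2 * eps)
    return grad

# Made-up data, only for the check.
rng = np.random.default_rng(0)
feature_matrix = np.column_stack([np.ones(20), rng.normal(size=20)])
output = rng.normal(size=20)
weights = np.array([0.5, -1.0])

analytic = rss_gradient(weights, feature_matrix, output)
numeric = numerical_gradient(lambda w: rss(w, feature_matrix, output), weights)
print(analytic, numeric)
```

Note also that np.dot(errors, feature_matrix) returns the whole gradient vector at once, whereas errors[i] and feature_matrix[i] in the original loop index a single data row rather than the column of feature i.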

Compute the gradient of the SVM loss function

I am trying to implement the SVM loss function and its gradient.
I found some example projects that implement these two, but I could not figure out how they can use the loss function when computing the gradient.
Here is the formula of loss function:
What I cannot understand is how I can use the loss function's result while computing the gradient.
The example project computes the gradient as follows:
for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
        if j == y[i]:
            continue
        margin = scores[j] - correct_class_score + 1 # note delta = 1
        if margin > 0:
            loss += margin
            dW[:,j] += X[i]
            dW[:,y[i]] -= X[i]
dW holds the gradient result, and X is the array of training data.
But I don't understand how the derivative of the loss function results in this code.
The gradient in this case is computed with calculus (analytically, NOT numerically!). So we differentiate the loss function with respect to W(yi) like this:
and with respect to W(j) when j != yi:
The 1 is just the indicator function: it equals 1 when the condition is true, so it drops out, and the term vanishes when the condition is false. Written as code, this is exactly the example you provided.
Since you are using the cs231n example, you should definitely check the course notes and videos if needed.
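For reference, assuming the standard multiclass SVM (hinge) loss from cs231n, which is what the code above implements with Δ = 1, the per-example loss and the two derivatives can be written as follows. Here w_j denotes column j of W and x_i is training example i; this notation is an assumption, since the original formulas were images.

```latex
L_i = \sum_{j \neq y_i} \max\left(0,\; w_j^\top x_i - w_{y_i}^\top x_i + \Delta\right)

\nabla_{w_{y_i}} L_i = -\left(\sum_{j \neq y_i} \mathbb{1}\left(w_j^\top x_i - w_{y_i}^\top x_i + \Delta > 0\right)\right) x_i

\nabla_{w_j} L_i = \mathbb{1}\left(w_j^\top x_i - w_{y_i}^\top x_i + \Delta > 0\right) x_i, \qquad j \neq y_i
```

Each class j with a positive margin adds x_i to column j of dW and subtracts x_i from column y_i, which corresponds to the two dW lines in the loop.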
Hope this helps!
If the margin (the subtraction) is less than zero, the loss is zero, so the gradient with respect to W is also zero. If the margin is larger than zero, the gradient with respect to W is the partial derivative of the loss.
If we leave out these two lines of code:
dW[:,j] += X[i]
dW[:,y[i]] -= X[i]
we get only the loss value.
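To see that the loop really computes the derivative of the loss, one option is to wrap it in a function and compare dW with a finite-difference estimate. This is a sketch under assumptions: the wrapper `svm_loss_naive`, the `delta` parameter, and the random data are made up here, and unlike a full training setup it neither averages over examples nor adds regularization.

```python
import numpy as np

def svm_loss_naive(W, X, y, delta=1.0):
    # W: (num_features, num_classes), X: (num_train, num_features), y: class indices
    num_train = X.shape[0]
    num_classes = W.shape[1]
    loss = 0.0
    dW = np.zeros_like(W)
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + delta
            if margin > 0:
                loss += margin
                dW[:, j] += X[i]      # derivative with respect to column j
                dW[:, y[i]] -= X[i]   # derivative with respect to the correct class column
    return loss, dW

# Made-up data for the check.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
y = rng.integers(0, 3, size=5)
W = 0.1 * rng.normal(size=(4, 3))

loss, dW = svm_loss_naive(W, X, y)

# Central finite differences on every entry of W.
eps = 1e-5
numeric_dW = np.zeros_like(W)
for row in range(W.shape[0]):
    for col in range(W.shape[1]):
        step = np.zeros_like(W)
        step[row, col] = eps
        loss_plus, _ = svm_loss_naive(W + step, X, y)
        loss_minus, _ = svm_loss_naive(W - step, X, y)
        numeric_dW[row, col] = (loss_plus - loss_minus) / (2 * eps)

print(np.max(np.abs(dW - numeric_dW)))
```

The analytic dW and the numerical estimate should agree closely as long as no margin sits exactly at the hinge's kink, where the loss is not differentiable.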
