Issue Implementing Custom Gradient Descent Function - python

I am implementing my own custom gradient descent algorithm in Python, but the weights and bias returned by my algorithm have 10 values each (shape=(10,)). My input data has only 1 column, so I am expecting it to return 1 weight and 1 bias.
Code:
import numpy as np
import matplotlib.pyplot as plt

def SGD(X, y, learning_rate=0.01, max_iter=1000):
    w = np.random.randn(X.shape[1])
    b = np.random.randn(1,)
    print(w, b)
    n = len(X)
    loss_list = []
    for i in range(max_iter):
        y_pred = w*X + b
        Lw = -(2/n)*sum(X*(y - y_pred))
        Lb = -(2/n)*sum(y - y_pred)
        w = w - learning_rate*Lw
        b = b - learning_rate*Lb
        loss = np.square(np.subtract(y, y_pred)).mean()
        loss_list.append(loss)
        print(f"Epoch: {i}, loss: {loss}")
    return w, b

x = list(range(1, 11))
y = []
for i in x:
    y.append(i**2)
x, y = np.array(x).reshape(-1, 1), np.array(y)

w, b = SGD(x, y)

print("\n\n\n\n")
print(w)
print(b)
Loss at the last iteration:
Epoch: 999, loss: 0.11521764208740602
Returned weight and bias, respectively:
w: [0.00149535 0.00777379 0.01823786 0.03288755 0.05172286 0.07474381
0.10195038 0.13334257 0.1689204 0.20868384] # giving 10 values
b: [ 0.98958964 3.94588026 8.87303129 15.77104274 24.63991461 35.47964689
48.29023958 63.07169269 79.82400621 98.54718014] # giving 10 values
I don't understand the cause. How is this happening?
Thanks!

I think this is because your y is a 1-D array of shape (n,), while y_pred is a column array of shape (n, 1), so subtracting them broadcasts to an n x n matrix, which you don't want. The fix is to just reshape y before you call your function, like so:
import numpy as np
import matplotlib.pyplot as plt

def SGD(X, y, learning_rate=0.01, max_iter=1000):
    w = np.random.randn(X.shape[1])
    b = np.random.randn(1,)
    print(w, b)
    n = len(X)
    loss_list = []
    for i in range(max_iter):
        y_pred = w*X + b
        Lw = -(2/n)*sum(X*(y - y_pred))
        Lb = -(2/n)*sum(y - y_pred)
        w = w - learning_rate*Lw
        b = b - learning_rate*Lb
        loss = np.square(np.subtract(y, y_pred)).mean()
        loss_list.append(loss)
        print(f"Epoch: {i}, loss: {loss}")
    return w, b

x = list(range(1, 11))
y = []
for i in x:
    y.append(i**2)
x, y = np.array(x).reshape(-1, 1), np.array(y).reshape((-1, 1))  # Change is here

w, b = SGD(x, y)

print("\n\n\n\n")
print(w)
print(b)
and then w, b are:
[10.94655101]
[-21.6278976]
respectively
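To make the shape issue concrete, here is a small standalone demonstration of the broadcasting behaviour (the array values are arbitrary and only for illustration):

import numpy as np

y = np.arange(10)                         # shape (10,), like the original y
y_pred = np.ones((10, 1))                 # shape (10, 1), like w*X + b

print((y - y_pred).shape)                 # (10, 10): broadcasting, not element-wise
print((y.reshape(-1, 1) - y_pred).shape)  # (10, 1): element-wise, as intended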

Related

PyTorch's augmented assignment and requires_grad

Why does:

with torch.no_grad():
    w = w - lr*w.grad
    print(w)

result in:

tensor(0.9871)

and

with torch.no_grad():
    w -= lr*w.grad
    print(w)

result in:

tensor(0.9871, requires_grad=True)

Aren't both operations the same?
Here is some test code:
import numpy as np
import torch

def test_stack():
    np.random.seed(0)
    n = 50
    feat1 = np.random.randn(n, 1)
    feat2 = np.random.randn(n, 1)
    X = torch.tensor(feat1).view(-1, 1)
    Y = torch.tensor(feat2).view(-1, 1)
    w = torch.tensor(1.0, requires_grad=True)
    epochs = 1
    lr = 0.001
    for epoch in range(epochs):
        for i in range(len(X)):
            y_pred = w*X[i]
            loss = (y_pred - Y[i])**2
            loss.backward()
            with torch.no_grad():
                #w = w - lr*w.grad  # DOESN'T WORK!!!!
                #print(w); return
                w -= lr*w.grad
                print(w); return
            w.grad.zero_()
Uncomment those two lines (and comment out the in-place update below them) and you'll see requires_grad disappear. Could this be a bug?
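The difference can be reproduced in isolation: inside torch.no_grad(), the out-of-place form builds a brand-new tensor while gradient tracking is disabled, whereas the in-place form mutates the existing leaf tensor, which keeps its requires_grad flag. A minimal sketch (variable names are illustrative, not taken from the test code above):

import torch

w = torch.tensor(1.0, requires_grad=True)
(w * 2).backward()                 # populate w.grad
lr = 0.001

with torch.no_grad():
    w_new = w - lr * w.grad        # out-of-place: creates a new tensor
print(w_new.requires_grad)         # False (built while grad tracking was off)

with torch.no_grad():
    w -= lr * w.grad               # in-place: mutates the existing leaf tensor
print(w.requires_grad)             # True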

Logistic regression from scratch: error keeps increasing

I have implemented logistic regression from scratch, but when I run the script the algorithm always predicts the wrong label.
I've tried changing the training output and test_output by switching all 1s to 0s and vice versa, but it always predicts the wrong label.
I also noticed that changing the "-" sign to "+" when updating the weights and the bias makes the script predict the label correctly.
What am I doing wrong?
This is the code I've written:
# IMPORTS
import numpy as np

# HYPERPARAMETERS
EPOCHS = 1000
LEARNING_RATE = 0.1

# FUNCTIONS
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(y_pred, training_outputs, m):
    j = - np.sum(training_outputs * np.log(y_pred) + (1 - training_outputs) * np.log(1 - y_pred)) / m
    return j

# ENTRY
if __name__ == "__main__":
    # Training input and output
    x = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1]])
    training_outputs = np.array([1, 0, 1])
    # Test input and output
    test_input = np.array([[0, 1, 1]])
    test_output = np.array([0])
    # Weights
    w = np.array([0.3, 0.3, 0.3])
    # Bias
    b = 0
    m = 3
    # Training
    for iteration in range(EPOCHS):
        print("Iteration n.", iteration, end="\r")
        # Compute log odds
        z = np.dot(x, w) + b
        # Compute predicted probability
        y_pred = sigmoid(z)
        # Back propagation
        dz = y_pred - training_outputs
        dw = np.dot(x, dz) / m
        db = np.sum(dz) / m
        # Update weights and bias according to the gradient descent algorithm
        w = w - LEARNING_RATE * dw
        b = b - LEARNING_RATE * db
    print("Model trained. Proceeding with model evaluation...")
    # Test
    # Compute log odds
    z = np.dot(test_input, w) + b
    # Compute predicted probability
    y_pred = sigmoid(z)
    print(y_pred)
    # Compute cost
    cost = cost(y_pred, test_output, m)
    print(cost)
There was an incorrect assumption, as pointed out by @J_H:
>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> x = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1]])
>>> y = np.array([1, 0, 1])
>>> clf = LogisticRegression().fit(x, y)
>>> clf.predict([[0, 1, 1]])
array([1])
scikit-learn appears to believe that test_output should be a 1 rather than a 0.
A few more recommendations:
- m should be fine to remove (it's a constant, so it can be folded into LEARNING_RATE)
- w should be initialized with one weight per column of x (i.e., with length x.shape[1])
- dw = np.dot(x, dz) should be np.dot(dz, x)
- prediction in logistic regression depends on a threshold, usually 0.5
Taking this into account would look something like the following.
# Initialize weights and bias
w, b = np.zeros(x.shape[1]), 0

for _ in range(EPOCHS):
    # Compute log odds
    z = np.dot(x, w) + b
    # Compute predicted probability
    y_pred = sigmoid(z)
    # Back propagation
    dz = y_pred - training_outputs
    dw = np.dot(dz, x)
    db = np.sum(dz)
    # Update
    w = w - LEARNING_RATE * dw
    b = b - LEARNING_RATE * db

# Test
z = np.dot(test_input, w) + b
test_pred = sigmoid(z) >= 0.5
print(test_pred)
And a complete example on random train/test sets created with sklearn.datasets.make_classification could look like the following, which usually gets within a few decimal places of the scikit-learn implementation as well:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

EPOCHS = 100
LEARNING_RATE = 0.01

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

if __name__ == "__main__":
    X, y = make_classification(n_samples=1000, n_features=5)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    # Initialize `w` and `b`
    w, b = np.zeros(X.shape[1]), 0
    for _ in range(EPOCHS):
        z = np.dot(X_train, w) + b
        y_pred = sigmoid(z)
        dz = y_pred - y_train
        dw = np.dot(dz, X_train)
        db = np.sum(dz)
        w = w - LEARNING_RATE * dw
        b = b - LEARNING_RATE * db
    # Test
    z = np.dot(X_test, w) + b
    test_pred = sigmoid(z) >= 0.5
    print(accuracy_score(y_test, test_pred))
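To check the claim about matching scikit-learn, one possible follow-up (not part of the original answer) is to fit LogisticRegression on the same split and compare test accuracies:

from sklearn.linear_model import LogisticRegression

# Reuses X_train, X_test, y_train, y_test from the example above
clf = LogisticRegression().fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy of scikit-learn's own implementation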

Why doesn't my logistic regression algorithm work?

I'm trying to compute the gradient for a multiclass classification model with logistic regression, and it doesn't seem to be working properly.
This is the data that I am using for this model.
import pandas as pd
import numpy as np
from sklearn.preprocessing import normalize
# Create x and y datasets
path = '/kaggle/input/digit-recognizer'
# Set train and test sets
data = pd.read_csv(path + '/train.csv', nrows=6000)
x_train, y_train = data.iloc[:4800, 1:].values, data.iloc[:4800, 0].values
x_test, y_test = data.iloc[4800:, 1:].values, data.iloc[4800:, 0].values
# Normalize and expand dims
x_train, x_test = x_train / 255, x_test / 255
y_train, y_test = np.expand_dims(y_train, 1), np.expand_dims(y_test, 1)
assert len(x_train) == len(y_train)
assert len(x_test) == len(y_test)
x_train.shape, y_train.shape
((4800, 784), (4800, 1))
Here is the code where I try to implement gradient descent:
def Sigmoid(z):
    from math import e
    return 1 / (1 + e**-z)

def CostFunction(h, y, m):
    j = -(1/m) * (y @ np.log(h) + (1-y) @ np.log(1-h))
    return j

def GradientDescent(X, y, theta, n_classes):
    import numpy as np
    # Useful variables
    m = len(y)
    theta0, theta1 = [x.copy() for x in theta]
    grad0, grad1 = [np.zeros(x.shape) for x in [theta0, theta1]]
    y_vec = np.zeros((m, n_classes))
    j = 0
    for i in range(m):
        y_vec[i, y[i]] = 1
        ### Forward propagation
        a0 = np.concatenate(([1], X[i]))
        a1 = np.concatenate(([1], Sigmoid(theta0 @ a0)))
        a2 = Sigmoid(theta1 @ a1)
        h = a2
        j += CostFunction(h, y_vec[i], m)
        ### Backpropagation
        delta2 = a2 - y_vec[i]
        delta1 = theta1.T @ delta2 * (a1 * (1 - a1))
        grad0 += np.expand_dims(delta1[1:], 1) @ np.expand_dims(a0, 0)
        grad1 += np.expand_dims(delta2, 1) @ np.expand_dims(a1, 0)
    grad0 = grad0 / m
    grad1 = grad1 / m
    return j, [grad0, grad1]
Now comes the training process.
### Create theta parameters
n_layers = 3
n_classes = 10
# Weight matrix dims (i, j) = (number of nodes, input shape + bias)
theta0 = np.random.uniform(0, 0.01, (24, x_train.shape[1] + 1))
theta1 = np.random.uniform(0, 0.01, (n_classes, len(theta0) + 1))
theta_params = [theta0, theta1]

### Train parameters
%%time
epochs = 200
alpha = 0.001
j, t = np.zeros(epochs), theta_params.copy()
for i in range(epochs):
    print("Iteration: {}/{}".format(i + 1, epochs))
    j[i], g = GradientDescent(x_train, y_train, t, n_classes)
    print(j[i])
    t[0] = t[0] - alpha * g[0]
    t[1] = t[1] - alpha * g[1]
The cost starts from J=7.2583 and goes down to approximately J=3.5223, where it gets stuck.
Then, whenever I try to predict any of the samples from the training or test sets it outputs the same approximate probability for all classes.
def Predict(X, theta):
    import numpy as np
    # Useful variables
    m = len(X)
    theta0, theta1 = [x for x in theta]
    h = np.zeros(m)
    for i in range(m):
        ### Forward propagation
        a0 = np.concatenate(([1], X[i]))
        a1 = np.concatenate(([1], Sigmoid(theta0 @ a0)))
        a2 = Sigmoid(theta1 @ a1)
        print(a2)
        h[i] = np.argmax(a2)
    return h
Predict(x_train[:1], t)
[0.20078521 0.19842413 0.20535222 0.1953332 0.19425315 0.19302124
0.20107485 0.19589331 0.19688894 0.19526526]
array([2.])
Notice that I am printing the hypothesis probability for each node in the last layer inside the Predict function.
Could anyone point me in the right direction or share some tips?

How can I complete this gradient descent algorithm code?

I am a freshman and a beginner.
I am studying machine learning with open tutorials.
I am having trouble writing a gradient descent algorithm.
I have to complete the body of "for _ in range(max_iter):", but I don't know NumPy well, so I don't know what code I should add.
Could you please help me fill in the blank?
I know this type of question is rude... sorry, but I need your help :(
Thank you in advance.
from sklearn import datasets
import numpy as np
import random
from sklearn.metrics import accuracy_score

X, y = datasets.make_classification(
    n_samples=200, n_features=2, random_state=333,
    n_informative=2, n_redundant=0, n_clusters_per_class=1)

def sigmoid(s):
    return 1 / (1 + np.exp(-s))

def loss(y, h):
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

def gradient(X, y, w):
    return -(y * X) / (1 + np.exp(-y * np.dot(X, w)))

X_bias = np.append(np.ones((X.shape[0], 1)), X, axis=1)
y = np.array([[1] if label == 0 else [0] for label in y])
w = np.array([[random.uniform(-1, 1)] for _ in range(X.shape[1]+1)])

max_iter = 100
learning_rate = 0.1
threshold = 0.5

for _ in range(max_iter):
    # fill in the blank
    # what code should I add ????

probabilities = sigmoid(np.dot(X_bias, w))
predictions = [[1] if p > threshold else [0] for p in probabilities]

print("loss: %.2f, accuracy: %.2f" %
      (loss(y, probabilities), accuracy_score(y, predictions)))
Inside the for loop, we have to first compute the probabilities. Then find the gradients and then update the weights.
For computing probabilities, you can use the code below
probs=sigmoid(np.dot(X_bias,w))
np.dot is the NumPy function for matrix multiplication. Then we will calculate the loss and its gradients.
J=loss(y,probs)
dJ=gradient(X_bias,y,w)
Now we will update the weights.
w=w-learning_rate*dJ
So the final code will be:
from sklearn import datasets
import numpy as np
from sklearn.metrics import accuracy_score

X, y = datasets.make_classification(
    n_samples=200, n_features=2, random_state=333,
    n_informative=2, n_redundant=0, n_clusters_per_class=1)

def sigmoid(s):
    return 1 / (1 + np.exp(-s))

def loss(y, h):
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

def gradient(X, y, w):
    return -(y * X) / (1 + np.exp(-y * np.dot(X, w)))

X_bias = np.append(np.ones((X.shape[0], 1)), X, axis=1)
y = np.array([[1] if label == 0 else [0] for label in y])
w = np.array([[np.random.uniform(-1, 1)] for _ in range(X.shape[1]+1)])

max_iter = 100
learning_rate = 0.1
threshold = 0.5

for _ in range(max_iter):
    probs = sigmoid(np.dot(X_bias, w))
    J = loss(y, probs)
    dJ = gradient(X_bias, y, w)
    w = w - learning_rate*dJ

probabilities = sigmoid(np.dot(X_bias, w))
predictions = [[1] if p > threshold else [0] for p in probabilities]

print("loss: %.2f, accuracy: %.2f" %
      (loss(y, probabilities), accuracy_score(y, predictions)))
Note: in the for loop there is no need to compute probs and the loss, as we only need the gradients to update the weights. I did that because it makes the code easier to understand.
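For reference, the trimmed loop described in the note would look something like this (same variables as above):

for _ in range(max_iter):
    dJ = gradient(X_bias, y, w)    # only the gradient is needed for the update
    w = w - learning_rate*dJ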

TensorFlow program got stuck on variables

I am studying TensorFlow and ran into some problems. I want to minimize a loss function while approximating 2x + 2z - 3t = y (that is, to recover a=2, b=2, c=-3), but it doesn't work. Where is my mistake?
This is my output:
a: [ 0.51013279] b: [ 0.51013279] c: [ 1.00953674] loss: 2.72952e+10
I need a:2 b:2 c:-3 and loss close to 0
import tensorflow as tf
import numpy as np

a = tf.Variable([1], dtype=tf.float32)
b = tf.Variable([1], dtype=tf.float32)
c = tf.Variable([0], dtype=tf.float32)
x = tf.placeholder(tf.float32)
z = tf.placeholder(tf.float32)
t = tf.placeholder(tf.float32)

linear_model = a * x + b * z + c * t
y = tf.placeholder(tf.float32)

loss = tf.reduce_sum(tf.square(linear_model - y))  # sum of the squares
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

x_train = np.arange(0, 5000, 1)
z_train = np.arange(0, 10000, 2)
t_train = np.arange(0, 5000, 1)
y_train = list(map(lambda x, z, t: 2 * x + 2 * z - 3 * t, x_train, z_train, t_train))

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(10000):
    sess.run(train, {x: x_train, z: z_train, t: t_train, y: y_train})

curr_a, curr_b, curr_c, curr_loss = sess.run([a, b, c, loss],
                                             {x: x_train, z: z_train, t: t_train, y: y_train})
print("a: %s b: %s c: %s loss: %s" % (curr_a, curr_b, curr_c, curr_loss))
I changed Maxim's code a bit to see values of a,b,c like this:
_, loss_val, curr_a, curr_b, curr_c, model_val = sess.run(
    [optimizer, loss, a, b, c, linear_model],
    {x: x_train, z: z_train, t: t_train, y: y_train})
So my output is:
10 2.04454e-11 1.83333 0.666667 -0.166667
20 2.04454e-11 1.83333 0.666667 -0.166667
30 2.04454e-11 1.83333 0.666667 -0.166667
I expected a=2,b=2,c=-3
First up, there is no single solution, so the optimizer can converge to any one of the minima. The exact values depend heavily on the initialization of your variables.
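One way to see this concretely (this check is not part of the original answer): in the training data, z_train = 2*x_train and t_train = x_train, so the model collapses to (a + 2b + c)*x while the targets equal 3*x. Any combination with a + 2b + c = 3 fits the data exactly, including the values printed above:

# Quick check that the printed solution lies on the a + 2b + c = 3 plane
a_val, b_val, c_val = 1.83333, 0.666667, -0.166667
print(a_val + 2*b_val + c_val)   # ~3.0, i.e. an exact fit to y = 3x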
Short answer concerning your bug: be careful with the learning rate. Check out my version of your code:
a = tf.Variable(2, dtype=tf.float32)
b = tf.Variable(1, dtype=tf.float32)
c = tf.Variable(0, dtype=tf.float32)
x = tf.placeholder(shape=[None, 1], dtype=tf.float32)
z = tf.placeholder(shape=[None, 1], dtype=tf.float32)
t = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y = tf.placeholder(shape=[None, 1], dtype=tf.float32)

linear_model = a * x + b * z + c * t
loss = tf.reduce_mean(tf.square(linear_model - y))  # mean of the squares
optimizer = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)

n = 50
x_train = np.arange(0, n, 1).reshape([-1, 1])
z_train = np.arange(0, 2*n, 2).reshape([-1, 1])
t_train = np.arange(0, n, 1).reshape([-1, 1])
y_train = np.array(list(map(lambda x, z, t: 2 * x + 2 * z - 3 * t,
                            x_train, z_train, t_train))).reshape([-1, 1])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(101):
        _, loss_val = sess.run([optimizer, loss], {x: x_train, z: z_train, t: t_train, y: y_train})
        if i % 10 == 0:
            a_val, b_val, c_val = sess.run([a, b, c])
            print('iteration %2i, loss=%f a=%.5f b=%.5f c=%.5f' % (i, loss_val, a_val, b_val, c_val))
If you run it, you'll notice that it converges very fast, in fewer than 10 iterations. However, if you increase the training size n from 50 to 75, the model is going to diverge. Decreasing the learning rate to 0.00001 will make it converge again, though not as fast as before. The more data you push to the optimizer, the more important an appropriate learning rate becomes.
You've tried a training size of 5000: I can't even imagine how small the learning rate would have to be to process that many points at once correctly.
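One common workaround, not from the original answer, is to rescale the inputs so the gradients stay in a manageable range even for a larger training size; a minimal NumPy-side sketch, assuming the same x_train, z_train, t_train, y_train arrays as above:

# Rescale features and targets to roughly [0, 1] before feeding the placeholders.
# The fitted a, b, c then describe the rescaled problem, and a learning rate on
# the order of 0.01 is usually workable even for larger n.
x_train_s = x_train / x_train.max()
z_train_s = z_train / z_train.max()
t_train_s = t_train / t_train.max()
y_train_s = y_train / np.abs(y_train).max()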
