I am a freshman and a beginner.
I am studying machine learning with open tutorials.
I am having trouble implementing a gradient descent algorithm.
I have to complete the body of "for _ in range(max_iter):", but I don't know NumPy well, so I don't know what code I should add.
Could you please help me fill in the blank?
I know this type of question is rude... sorry, but I need your help :(
Thank you in advance.
from sklearn import datasets
import numpy as np
from sklearn.metrics import accuracy_score
X, y = datasets.make_classification(
n_samples = 200, n_features = 2, random_state = 333,
n_informative =2, n_redundant = 0 , n_clusters_per_class= 1)
def sigmoid(s):
return 1 / (1 + np.exp(-s))
def loss(y, h):
return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
def gradient(X, y, w):
return -(y * X) / (1 + np.exp(-y * np.dot(X, w)))
X_bias = np.append(np.ones((X.shape[0], 1)), X, axis=1)
y = np.array([[1] if label == 0 else [0] for label in y])
w = np.array([[np.random.uniform(-1, 1)] for _ in range(X.shape[1]+1)])
max_iter = 100
learning_rate = 0.1
threshold = 0.5
for _ in range(max_iter):
# fill in the blank
# what code should I add ????
probabilities = sigmoid(np.dot(X_bias, w))
predictions = [[1] if p > threshold else [0] for p in probabilities]
print("loss: %.2f, accuracy: %.2f" %
(loss(y, probabilities), accuracy_score(y, predictions)))
Inside the for loop, we first have to compute the probabilities, then find the gradients, and then update the weights.
To compute the probabilities, you can use the code below:
probs = sigmoid(np.dot(X_bias, w))
np.dot is the NumPy function for matrix multiplication. Next we calculate the loss and its gradient.
J = loss(y, probs)
dJ = gradient(X_bias, y, w)
Now we update the weights.
w = w - learning_rate * dJ
So the final code will be:
from sklearn import datasets
import numpy as np
from sklearn.metrics import accuracy_score
X, y = datasets.make_classification(
n_samples = 200, n_features = 2, random_state = 333,
n_informative =2, n_redundant = 0 , n_clusters_per_class= 1)
def sigmoid(s):
return 1 / (1 + np.exp(-s))
def loss(y, h):
return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
def gradient(X, y, w):
return -(y * X) / (1 + np.exp(-y * np.dot(X, w)))
X_bias = np.append(np.ones((X.shape[0], 1)), X, axis=1)
y = np.array([[1] if label == 0 else [0] for label in y])
w = np.array([[np.random.uniform(-1, 1)] for _ in range(X.shape[1]+1)])
max_iter = 100
learning_rate = 0.1
threshold = 0.5
for _ in range(max_iter):
probs=sigmoid(np.dot(X_bias,w))
J=loss(y,probs)
dJ=gradient(X_bias,y,w)
w=w-learning_rate*dJ
probabilities = sigmoid(np.dot(X_bias, w))
predictions = [[1] if p > threshold else [0] for p in probabilities]
print("loss: %.2f, accuracy: %.2f" %
(loss(y, probabilities), accuracy_score(y, predictions)))
Note: Inside the for loop there is no need to compute probs and the loss, as we only need the gradient to update the weights. I included them because it makes the code easier to understand.
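For reference, the loop trimmed down to just the weight update would be:
for _ in range(max_iter):
    dJ = gradient(X_bias, y, w)
    w = w - learning_rate * dJ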
I have implemented logistic regression from scratch, however when I run the script the algorithm always predicts the wrong label.
I've tried changing the training output and test_output by switching all 1s to 0s and vice versa, but it always predicts the wrong label.
I also noticed that if I change the "-" sign to "+" when updating the weights and the bias, the script correctly predicts the label.
What am I doing wrong?
This is the code I've written:
# IMPORTS
import numpy as np
# HYPERPARAMETERS
EPOCHS = 1000
LEARNING_RATE = 0.1
# FUNCTIONS
def sigmoid(z):
return 1 / (1 + np.exp(-z))
def cost(y_pred, training_outputs, m):
j = - np.sum(training_outputs * np.log(y_pred) + (1 - training_outputs) * np.log(1 - y_pred)) / m
return j
# ENTRY
if __name__ == "__main__":
# Training input and output
x = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1]])
training_outputs = np.array([1, 0, 1])
# Test input and output
test_input = np.array([[0, 1, 1]])
test_output = np.array([0])
# Weights
w = np.array([0.3, 0.3, 0.3])
# Biases
b = 0
m = 3
# Training
for iteration in range(EPOCHS):
print("Iteration n.", iteration, end= "\r")
# Compute log odds
z = np.dot(x, w) + b
# Compute predicted probability
y_pred = sigmoid(z)
# Back propagation
dz = y_pred - training_outputs
dw = np.dot(x, dz) / m
db = np.sum(dz) / m
# Update weights and bias according to the gradient descent algorithm
w = w - LEARNING_RATE * dw
b = b - LEARNING_RATE * db
print("Model trained. Proceeding with model evaluation...")
# Test
# Compute log odds
z = np.dot(test_input, w) + b
# Compute predicted probability
y_pred = sigmoid(z)
print(y_pred)
# Compute cost
cost = cost(y_pred, test_output, m)
print(cost)
There was an incorrect assumption, as pointed out by @J_H:
>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> x = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1]])
>>> y = np.array([1, 0, 1])
>>> clf = LogisticRegression().fit(x, y)
>>> clf.predict([[0, 1, 1]])
array([1])
scikit-learn appears to believe that test_output should be a 1 rather than a 0.
A few more recommendations:
m should be fine to remove (it's a constant, so it could be included in the LEARNING_RATE)
w should be initialized proportional to the number of columns in x (i.e., x.shape[1])
dw = np.dot(x, dz) should be np.dot(dz, x)
Prediction in logistic regression depends on a threshold, usually 0.5
Taking this into account would look something like the following.
# Initialize weights and bias
w, b = np.zeros(X.shape[1]), 0
for _ in range(EPOCHS):
# Compute log odds
z = np.dot(x, w) + b
# Compute predicted probability
y_pred = sigmoid(z)
# Back propagation
dz = y_pred - training_outputs
dw = np.dot(dz, x)
db = np.sum(dz)
# Update
w = w - LEARNING_RATE * dw
b = b - LEARNING_RATE * db
# Test
z = np.dot(test_input, w) + b
test_pred = sigmoid(z) >= 0.5
print(test_pred)
And a complete example on random train/test sets created with sklearn.datasets.make_classification could look like this, which usually gets within a few decimals of the scikit-learn implementation as well:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
EPOCHS = 100
LEARNING_RATE = 0.01
def sigmoid(z):
return 1 / (1 + np.exp(-z))
if __name__ == "__main__":
X, y = make_classification(n_samples=1000, n_features=5)
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Initialize `w` and `b`
w, b = np.zeros(X.shape[1]), 0
for _ in range(EPOCHS):
z = np.dot(X_train, w) + b
y_pred = sigmoid(z)
dz = y_pred - y_train
dw = np.dot(dz, X_train)
db = np.sum(dz)
w = w - LEARNING_RATE * dw
b = b - LEARNING_RATE * db
# Test
z = np.dot(X_test, w) + b
test_pred = sigmoid(z) >= 0.5
print(accuracy_score(y_test, test_pred))
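If you want to double-check the result, one option is to fit scikit-learn's LogisticRegression on the same split and compare the accuracies (a rough comparison only, since the solver and regularization differ):
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression().fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))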
I'm trying to compute my gradient for a multiclass classification model with logistic regression and it seems not to be working properly.
This is the data that I am using for this model.
import numpy as np
import pandas as pd
from sklearn.preprocessing import normalize
# Create x and y datasets
path = '/kaggle/input/digit-recognizer'
# Set train and test sets
data = pd.read_csv(path + '/train.csv', nrows=6000)
x_train, y_train = data.iloc[:4800, 1:].values, data.iloc[:4800, 0].values
x_test, y_test = data.iloc[4800:, 1:].values, data.iloc[4800:, 0].values
# Normalize and expand dims
x_train, x_test = x_train / 255, x_test / 255
y_train, y_test = np.expand_dims(y_train, 1), np.expand_dims(y_test, 1)
assert len(x_train) == len(y_train)
assert len(x_test) == len(y_test)
x_train.shape, y_train.shape
((4800, 784), (4800, 1))
Here is the following code where I try to implement gradient descent:
def Sigmoid(z):
from math import e
return 1 / (1 + e**-z)
def CostFunction(h, y, m):
j = -(1/m) * (y @ np.log(h) + (1-y) @ np.log(1-h))
return j
def GradientDescent(X, y, theta, n_classes):
import numpy as np
# Useful variables
m = len(y)
theta0, theta1 = [x.copy() for x in theta]
grad0, grad1 = [np.zeros(x.shape) for x in [theta0, theta1]]
y_vec = np.zeros((m, n_classes))
j = 0
for i in range(m):
y_vec[i, y[i]] = 1
### Forward propagation
a0 = np.concatenate(([1], X[i]))
a1 = np.concatenate(([1], Sigmoid(theta0 @ a0)))
a2 = Sigmoid(theta1 @ a1)
h = a2
j += CostFunction(h, y_vec[i], m)
### Backpropagation
delta2 = a2 - y_vec[i]
delta1 = theta1.T @ delta2 * (a1 * (1 - a1))
grad0 += np.expand_dims(delta1[1:], 1) @ np.expand_dims(a0, 0)
grad1 += np.expand_dims(delta2, 1) @ np.expand_dims(a1, 0)
grad0 = grad0 / m
grad1 = grad1 / m
return j, [grad0, grad1]
Now comes the training process.
### Create theta parameters
n_layers = 3
n_classes = 10
# Weight matrix dims (i, j) = (number of nodes, input shape + bias)
theta0 = np.random.uniform(0, 0.01, (24, x_train.shape[1] + 1))
theta1 = np.random.uniform(0, 0.01, (n_classes, len(theta0) + 1))
theta_params = [theta0, theta1]
### Train parameters
%%time
epochs = 200
alpha = 0.001
j, t = np.zeros(epochs), theta_params.copy()
for i in range(epochs):
print("Iterarion: {}/{}".format(i + 1, epochs))
j[i], g = GradientDescent(x_train, y_train, t, n_classes)
print(j[i])
t[0] = t[0] - alpha * g[0]
t[1] = t[1] - alpha * g[1]
The cost starts from J=7.2583 and goes down to approximately J=3.5223, where it gets stuck.
Then, whenever I try to predict any of the samples from the training or test sets it outputs the same approximate probability for all classes.
def Predict(X, theta):
import numpy as np
# Useful variables
m = len(X)
theta0, theta1 = [x for x in theta]
h = np.zeros(m)
for i in range(m):
### Forward propagation
a0 = np.concatenate(([1], X[i]))
a1 = np.concatenate(([1], Sigmoid(theta0 @ a0)))
a2 = Sigmoid(theta1 @ a1)
print(a2)
h[i] = np.argmax(a2)
return h
Predict(x_train[:1], t)
[0.20078521 0.19842413 0.20535222 0.1953332 0.19425315 0.19302124
0.20107485 0.19589331 0.19688894 0.19526526]
array([2.])
Notice that I'm printing the hypothesis probability for each node in the last layer inside the Predict function.
Could anyone point me in the right direction by sharing some tips?
I need to build a function that gives the a posteriori covariance of a Gaussian process. The idea is to train a GP using GPyTorch, then take the learned hyperparameters and pass them into my own kernel function (for several reasons I can't use GPyTorch directly).
Now the problem is that I can't reproduce the prediction. Here is the code I wrote. I have been working on it the whole day but I can't find the problem. Do you know what I am doing wrong?
from gpytorch.mlls import ExactMarginalLogLikelihood
import numpy as np
import gpytorch
import torch
train_x1 = torch.linspace(0, 0.95, 50) + 0.05 * torch.rand(50)
train_y1 = torch.sin(train_x1 * (2 * np.pi)) + 0.2 * torch.randn_like(train_x1)
n_datapoints = train_x1.shape[0]
def kernel_rbf(x1, x2, c, l):
# my RBF function
if x1.shape == ():
x1 = np.atleast_2d(x1)
if x2.shape == ():
x2 = np.atleast_2d(x2)
return c * np.exp(- np.matmul((x1 - x2).T, (x1 - x2)) / (2 * l ** 2))
class ExactGPModel(gpytorch.models.ExactGP):
def __init__(self, train_x, train_y, likelihood):
super().__init__(train_x, train_y, likelihood)
lengthscale_prior = gpytorch.priors.GammaPrior(3.0, 6.0)
outputscale_prior = gpytorch.priors.GammaPrior(2.0, 0.15)
self.mean_module = gpytorch.means.ConstantMean()
self.covar_module = gpytorch.kernels.ScaleKernel(
gpytorch.kernels.RBFKernel(lengthscale_prior=lengthscale_prior),
outputscale_prior=outputscale_prior)
def forward(self, x):
mean_x = self.mean_module(x)
covar_x = self.covar_module(x)
return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x1, train_y1, likelihood)
# Find optimal model hyperparameters
model.train()
likelihood.train()
mll = ExactMarginalLogLikelihood(likelihood, model)
# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1) # Includes GaussianLikelihood parameters
training_iterations = 50
for i in range(training_iterations):
optimizer.zero_grad()
output = model(*model.train_inputs)
loss = -mll(output, model.train_targets)
loss.backward()
print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
optimizer.step()
# Get the learned hyperparameters
outputscale = model.covar_module.outputscale.item()
lengthscale = model.covar_module.base_kernel.lengthscale.item()
noise = likelihood.noise_covar.noise.item()
train_x1 = train_x1.numpy()
train_y1 = train_y1.numpy()
# Get covariance train points
K = np.zeros((n_datapoints, n_datapoints))
for i in range(n_datapoints):
for j in range(n_datapoints):
K[i, j] = kernel_rbf(train_x1[i], train_x1[j], outputscale, lengthscale)
# Add noise
K += noise ** 2 * np.eye(n_datapoints)
# Get covariance train-test points
x_test = torch.rand(1, 1)
Ks = np.zeros((n_datapoints, 1))
for i in range(n_datapoints):
Ks[i] = kernel_rbf(train_x1[i], x_test.numpy(), outputscale, lengthscale)
# Get variance test points
Kss = kernel_rbf(x_test.numpy(), x_test.numpy(), outputscale, lengthscale)
L = np.linalg.cholesky(K)
v = np.linalg.solve(L, Ks)
var = Kss - np.matmul(v.T, v)
model.eval()
likelihood.eval()
with gpytorch.settings.fast_pred_var():
y_preds = likelihood(model(x_test))
print(f"Predicted variance with gpytorch:{y_preds.variance.item()}")
print(f"Predicted variance with my kernel:{var}")
I found the errors:
The noise is not squared, so it is K += noise * np.eye(n_datapoints) and not K += noise**2 * np.eye(n_datapoints)
I forgot to add the noise term in $$K_{**}$$ (the test-test covariance), i.e. Kss += noise
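For reference, applying both fixes to the covariance computation above gives (same variable names as in the question; noise comes from likelihood.noise_covar.noise and is already a variance):
# Train-train covariance: add the noise variance directly, without squaring it
K += noise * np.eye(n_datapoints)
# Test-test covariance: include the noise variance here as well
Kss = kernel_rbf(x_test.numpy(), x_test.numpy(), outputscale, lengthscale) + noise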
I am trying to create a basic Linear Regression Model implementing Coordinate Descent (I have made it inherit from OrdinaryLinearRegression, because it implements the same predict and score functions).
Using the residual sum of squares as the loss function:
$$ L_{RSS}(w) = \frac{1}{N} \lVert Xw - y \rVert^2 $$
our gradient descent update should be:
$$ w' = w - \eta \, \frac{2}{N} \, X^T (Xw - y) $$
Implementing the code:
import numpy as np
from sklearn.preprocessing import StandardScaler
def scalingfeatures(X):
scaler = StandardScaler()
scaler.fit(X)
return scaler.transform(X)
class OrdinaryLinearRegressionCoordinateDescent(OrdinaryLinearRegression):
def __init__(self,lr,num_iter):
self.lr = lr
self.num_iter = num_iter
def lossfunction(self,X,y,w):
m = np.size(y)
#Cost function in vectorized form
y_pred = X @ w
# J = 1/N * Sum((ŷ - y)**2)
J = float((1./(2*m)) * (y_pred - y).T @ (y_pred - y))
return J
def fit(self,X,y):
X = scalingfeatures(X)
X = np.concatenate((np.ones((X.shape[0],1)),X),axis=1)
m,n = X.shape
np.random.seed(42)
w = np.random.randn(n,1)
y = y.reshape(-1,1)
for iter in range(self.num_iter):
for j in range(n):
#Coordinate descent in vectorized form
X_j = X[:,j].reshape(-1,1)
y_pred = X @ w
gradient = X_j.T @ (y_pred-y)
w[j] = w[j] - self.lr * (2/n) * gradient
loss = self.lossfunction(X,y,w)
print(loss)
self.w = w
return self
OLRCD = OrdinaryLinearRegressionCoordinateDescent(lr=0.05,num_iter=500)
train = OLRCD.fit(X,y)
print("The training MSE for ORLGD is: ",train.score(X,y))
When I run the code, I get that with every iteration the loss only increases...
I am currently taking the Stanford CS231n course. When completing the softmax_loss function, I found it is not easy to write in a fully vectorized form, especially the dW term. Below is my code. Could somebody optimize it? It would be appreciated.
def softmax_loss_vectorized(W, X, y, reg):
loss = 0.0
dW = np.zeros_like(W)
num_train = X.shape[0]
num_classes = W.shape[1]
scores = X.dot(W)
scores -= np.max(scores, axis = 1)[:, np.newaxis]
exp_scores = np.exp(scores)
sum_exp_scores = np.sum(exp_scores, axis = 1)
correct_class_score = scores[range(num_train), y]
loss = np.sum(np.log(sum_exp_scores)) - np.sum(correct_class_score)
exp_scores = exp_scores / sum_exp_scores[:,np.newaxis]
# maybe this part can be rewritten using matrix operations
for i in range(num_train):
dW += exp_scores[i] * X[i][:,np.newaxis]
dW[:, y[i]] -= X[i]
loss /= num_train
loss += 0.5 * reg * np.sum( W*W )
dW /= num_train
dW += reg * W
return loss, dW
Here's a vectorized implementation below, but I suggest you spend a little more time and get to the solution yourself. The idea is to construct a matrix with all the softmax values and subtract 1 from the entries of the correct classes.
def softmax_loss_vectorized(W, X, y, reg):
num_train = X.shape[0]
scores = X.dot(W)
scores -= np.max(scores)
correct_scores = scores[np.arange(num_train), y]
# Compute the softmax per correct scores in bulk, and sum over its logs.
exponents = np.exp(scores)
sums_per_row = np.sum(exponents, axis=1)
softmax_array = np.exp(correct_scores) / sums_per_row
information_array = -np.log(softmax_array)
loss = np.mean(information_array)
# Compute the softmax per whole scores matrix, which gives the matrix for X rows coefficients.
# Their linear combination is algebraically dot product X transpose.
all_softmax_matrix = (exponents.T / sums_per_row).T
grad_coeff = np.zeros_like(scores)
grad_coeff[np.arange(num_train), y] = -1
grad_coeff += all_softmax_matrix
dW = np.dot(X.T, grad_coeff) / num_train
# Regularization
loss += 0.5 * reg * np.sum(W * W)
dW += reg * W
return loss, dW
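As a quick sanity check (not part of the original code), you could call it on small random inputs, where X is (N, D), W is (D, C) and y holds integer class labels, and confirm the loss is a scalar and the gradient has the shape of W:
import numpy as np
np.random.seed(0)
X = np.random.randn(5, 4)         # 5 samples, 4 features
W = 0.01 * np.random.randn(4, 3)  # 4 features, 3 classes
y = np.array([0, 2, 1, 1, 0])     # one class label per sample
loss, dW = softmax_loss_vectorized(W, X, y, reg=0.1)
print(loss, dW.shape)             # scalar loss and a (4, 3) gradient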