Multiclass logistic regression - implementation question

Multiclass logistic regression - implementation question - python

This is my try to implement multi-class logistic regression in python using softmax as activation function and mnist digit data set as training and test set.
import numpy as np
def softmax(z):
return np.array([(np.exp(el)/np.sum(np.exp(el))) for el in z])
def cost(W,F,L):
m = F.shape[0] #get number of rows
mul = np.dot(F, W)
sm_mul_T = softmax(mul)
return -(1/m) * np.sum(L * np.log(sm_mul_T))
def gradient(W,F,L):
m = F.shape[0] # get number of rows
mul = np.dot(F, W)
sm_mul_T = softmax(mul)
return -(1 / m) * np.dot(F.T , (L - sm_mul_T))
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("./datasets/MNIST_data/", one_hot=True)
W = np.zeros((785, 10)) #784 features + 1 bias
for _ in range(10000):
F, L = mnist.train.next_batch(100)
F = np.insert(F,0, values=1, axis=1)
total_cost = cost(W,F,L)
print("Total cost is {}".format(total_cost))
gradients = gradient(W,F,L)
W = W - (0.1 * gradients)
FU = mnist.test.images
FU = np.insert(FU,0, values=1, axis=1)
LU = mnist.test.labels
mulU = np.dot(FU, W)
sm_mulU = softmax(mulU)
OK=0
NOK=0
for i in range(10000):
a1 = np.argmax(sm_mulU[i])
a2 = np.argmax(LU[i])
if a1 == a2:
OK = OK + 1
else:
NOK = NOK + 1
print("{} OK vs {} NOK".format(OK, NOK))
print("accur {}%".format((OK/(NOK+OK))*100))
What I wanted to do is basically to try to implement it myself and try to get similar results as using Tensor flow. But the problem is that tensor flow implementation gets in the end 91% accuracy, while I can get only around 70%. Also, it seems like my model diverges and cost starts to increase pretty fast.
Is my implementation wrong, or is it due to more advanced algorithm inside TensorFlow implementation?

Related

Neural network built from scratch in python to classify digits stuck at 11.35 percent accuracy. I am using the MNIST dataset

My neural network is stuck at 11.35 percent accuracy and i am unable to trace the error.
low accuracy at 11.35 percent
I am following this code https://github.com/MLForNerds/DL_Projects/blob/main/mnist_ann.ipynb which I found in a youtube video.
Here is my code for the neural network(I have defined Xavier weight initialization in a module called nn):
"""1. 784 neurons in input layer
2. 128 neurons in hidden layer 1
3. 64 neurons in hidden layer 2
4. 10 neurons in output layer"""
def softmax(input):
y = np.exp(input - input.max())
activated = y/ np.sum(y, axis=0)
return activated
def softmax_grad(x):
exps = np.exp(x-x.max())
return exps / np.sum(exps,axis = 0) * (1 - exps /np.sum(exps,axis = 0))
def sigmoid(input):
activated = 1/(1 + np.exp(-input))
return activated
def sigmoid_grad(input):
grad = input*(1-input)
return grad
class DenseNN:
def __init__(self,d0,d1,d2,d3):
self.params = {'w1': nn.Xavier.initialize(d0, d1),
'w2': nn.Xavier.initialize(d1, d2),
'w3': nn.Xavier.initialize(d2, d3)}
def forward(self,a0):
params = self.params
params['a0'] = a0
params['z1'] = np.dot(params['w1'],params['a0'])
params['a1'] = sigmoid(params['z1'])
params['z2'] = np.dot(params['w2'],params['a1'])
params['a2'] = sigmoid(params['z2'])
params['z3'] = np.dot(params['w3'],params['a2'])
params['a3'] = softmax(params['z3'])
return params['a3']
def backprop(self,y_true,y_pred):
params = self.params
w_change = {}
error = softmax_grad(params['z3'])*((y_pred - y_true)/y_true.shape[0])
w_change['w3'] = np.outer(error,params['a2'])
error = np.dot(params['w3'].T,error)*sigmoid_grad(params['a2'])
w_change['w2'] = np.outer(error,params['a1'])
error = np.dot(params['w2'].T,error)*sigmoid_grad(params['a1'])
w_change['w1'] = np.outer(error,params['a0'])
return w_change
def update_weights(self,learning_rate,w_change):
self.params['w1'] -= learning_rate*w_change['w1']
self.params['w2'] -= learning_rate*w_change['w2']
self.params['w3'] -= learning_rate*w_change['w3']
def train(self,epochs,lr):
for epoch in range(epochs):
for i in range(60000):
a0 = np.array([x_train[i]]).T
o = np.array([y_train[i]]).T
y_pred = self.forward(a0)
w_change = self.backprop(o,y_pred)
self.update_weights(lr,w_change)
# print(self.compute_accuracy()*100)
# print(calc_mse(a3, o))
print((self.compute_accuracy())*100)
def compute_accuracy(self):
'''
This function does a forward pass of x, then checks if the indices
of the maximum value in the output equals the indices in the label
y. Then it sums over each prediction and calculates the accuracy.
'''
predictions = []
for i in range(10000):
idx = i
a0 = x_test[idx]
a0 = np.array([a0]).T
#print("acc a1",np.shape(a1))
o = y_test[idx]
o = np.array([o]).T
#print("acc o",np.shape(o))
output = self.forward(a0)
pred = np.argmax(output)
predictions.append(pred == np.argmax(o))
return np.mean(predictions)
Here is the code for loading the data:
#load dataset csv
train_data = pd.read_csv('../Datasets/MNIST/mnist_train.csv')
test_data = pd.read_csv('../Datasets/MNIST/mnist_test.csv')
#train data
x_train = train_data.drop('label',axis=1).to_numpy()
y_train = pd.get_dummies(train_data['label']).values
#test data
x_test = test_data.drop('label',axis=1).to_numpy()
y_test = pd.get_dummies(test_data['label']).values
fac = 0.99 / 255
x_train = np.asfarray(x_train) * fac + 0.01
x_test = np.asfarray(x_test) * fac + 0.01
# train_labels = np.asfarray(train_data[:, :1])
# test_labels = np.asfarray(test_data[:, :1])
#printing dimensions
print(np.shape(x_train)) #(60000,784)
print(np.shape(y_train)) #(60000,10)
print(np.shape(x_test)) #(10000,784)
print(np.shape(y_test)) #(10000,10)
print((x_train))
Kindly help
I am a newbie in machine learning so any help would be appreciated.I am unable to figure out where i am going wrong.Most of the code is almost similar to https://github.com/MLForNerds/DL_Projects/blob/main/mnist_ann.ipynb but it manages to get 60 percent accuracy.
EDIT
I found the mistake :
Thanks to Bartosz Mikulski.
The problem was with how the weights were initialized in my Xavier weights initialization algorithm.
I changed the code for weights initialization to this:
self.params = {
'w1':np.random.randn(d1, d0) * np.sqrt(1. / d1),
'w2':np.random.randn(d2, d1) * np.sqrt(1. / d2),
'w3':np.random.randn(d3, d2) * np.sqrt(1. / d3),
'b1':np.random.randn(d1, 1) * np.sqrt(1. / d1),
'b2':np.random.randn(d2, 1) * np.sqrt(1. / d2),
'b3':np.random.randn(d3, 1) * np.sqrt(1. / d3),
}
then i got the output:
After changing weights initialization
after adding the bias parameters i got the output:
After changing weights initialization and adding bias
3: After changing weights initialization and adding bias

The one problem that I can see is that you are using only weights but no biases. They are very important because they allow your model to change the position of the decision plane (boundary) in the solution space. If you only have weights you can only angle the solution.
I guess that basically, this is the best fit you can get without biases. The dense layer is basically a linear function: w*x + b and you are missing the b. See the PyTorch documentation for the example: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#linear.
Also, can you show your Xavier initialization? In your case, even the simple normal distributed values would be enough as initialization, no need to rush into more advanced topics.
I would also suggest you start from the smaller problem (for example Iris dataset) and no hidden layers (just a simple linear regression that learns by using gradient descent). Then you can expand it by adding hidden layers, and then by trying harder problems with the code you already have.

Why does regularization in pytorch and scratch code does not match and what is the formula used for regularization in pytorch?

I have been trying to do L2 regularization on a binary classification model in PyTorch but when I match the results of PyTorch and scratch code it doesn't match,
Pytorch code:
class LogisticRegression(nn.Module):
def __init__(self,n_input_features):
super(LogisticRegression,self).__init__()
self.linear=nn.Linear(4,1)
self.linear.weight.data.fill_(0.0)
self.linear.bias.data.fill_(0.0)
def forward(self,x):
y_predicted=torch.sigmoid(self.linear(x))
return y_predicted
model=LogisticRegression(4)
criterion=nn.BCELoss()
optimizer=torch.optim.SGD(model.parameters(),lr=0.05,weight_decay=0.1)
dataset=Data()
train_data=DataLoader(dataset=dataset,batch_size=1096,shuffle=False)
num_epochs=1000
for epoch in range(num_epochs):
for x,y in train_data:
y_pred=model(x)
loss=criterion(y_pred,y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
Scratch Code:
def sigmoid(z):
s = 1/(1+ np.exp(-z))
return s
def yinfer(X, beta):
return sigmoid(beta[0] + np.dot(X,beta[1:]))
def cost(X, Y, beta, lam):
sum = 0
sum1 = 0
n = len(beta)
m = len(Y)
for i in range(m):
sum = sum + Y[i]*(np.log( yinfer(X[i],beta)))+ (1 -Y[i])*np.log(1-yinfer(X[i],beta))
for i in range(0, n):
sum1 = sum1 + beta[i]**2
return (-sum + (lam/2) * sum1)/(1.0*m)
def pred(X,beta):
if ( yinfer(X, beta) > 0.5):
ypred = 1
else :
ypred = 0
return ypred
beta = np.zeros(5)
iterations = 1000
arr_cost = np.zeros((iterations,4))
print(beta)
n = len(Y_train)
for i in range(iterations):
Y_prediction_train=np.zeros(len(Y_train))
Y_prediction_test=np.zeros(len(Y_test))
for l in range(len(Y_train)):
Y_prediction_train[l]=pred(X[l,:],beta)
for l in range(len(Y_test)):
Y_prediction_test[l]=pred(X_test[l,:],beta)
train_acc = format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100)
test_acc = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100
arr_cost[i,:] = [i,cost(X,Y_train,beta,lam),train_acc,test_acc]
temp_beta = np.zeros(len(beta))
''' main code from below '''
for j in range(n):
temp_beta[0] = temp_beta[0] + yinfer(X[j,:], beta) - Y_train[j]
temp_beta[1:] = temp_beta[1:] + (yinfer(X[j,:], beta) - Y_train[j])*X[j,:]
for k in range(0, len(beta)):
temp_beta[k] = temp_beta[k] + lam * beta[k] #regularization here
temp_beta= temp_beta / (1.0*n)
beta = beta - alpha*temp_beta
graph of the losses
graph of training accuracy
graph of testing accuracy
Can someone please tell me why this is happening?
L2 value=0.1

Great question. I dug a lot through PyTorch documentation and found the answer. The answer is very tricky. Basically there are two ways to calculate regulalarization. (For summery jump to the last section).
The PyTorch uses the first type (in which regularization factor is not divided by batch size).
Here's a sample code which demonstrates that:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import torch.optim as optim
class model(nn.Module):
def __init__(self):
super().__init__()
self.linear = nn.Linear(1, 1)
self.linear.weight.data.fill_(1.0)
self.linear.bias.data.fill_(1.0)
def forward(self, x):
return self.linear(x)
model = model()
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1.0)
input = torch.tensor([[2], [4]], dtype=torch.float32)
target = torch.tensor([[7], [11]], dtype=torch.float32)
optimizer.zero_grad()
pred = model(input)
loss = F.mse_loss(pred, target)
print(f'input: {input[0].data, input[1].data}')
print(f'prediction: {pred[0].data, pred[1].data}')
print(f'target: {target[0].data, target[1].data}')
print(f'\nMSEloss: {loss.item()}\n')
loss.backward()
print('Before updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data, gradient]: {model.linear.weight.data, model.linear.weight.grad}')
print(f'bias [data, gradient]: {model.linear.bias.data, model.linear.bias.grad}')
print('--------------------------------------------------------------------------')
optimizer.step()
print('After updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data]: {model.linear.weight.data}')
print(f'bias [data]: {model.linear.bias.data}')
print('--------------------------------------------------------------------------')
which outputs:
input: (tensor([2.]), tensor([4.]))
prediction: (tensor([3.]), tensor([5.]))
target: (tensor([7.]), tensor([11.]))
MSEloss: 26.0
Before updation:
--------------------------------------------------------------------------
weight [data, gradient]: (tensor([[1.]]), tensor([[-32.]]))
bias [data, gradient]: (tensor([1.]), tensor([-10.]))
--------------------------------------------------------------------------
After updation:
--------------------------------------------------------------------------
weight [data]: tensor([[4.1000]])
bias [data]: tensor([1.9000])
--------------------------------------------------------------------------
Here m = batch size = 2, lr = alpha = 0.1, lambda = weight_decay = 1.
Now consider tensor weight which has value = 1 and grad = -32
case1(type1 regularization):
weight = weight - lr(grad + weight_decay.weight)
weight = 1 - 0.1(-32 + 1(1))
weight = 4.1
case2(type2 regularization):
weight = weight - lr(grad + (weight_decay/batch size).weight)
weight = 1 - 0.1(-32 + (1/2)(1))
weight = 4.15
From the output we can see that updated weight = 4.1000. That concludes PyTorch uses type1 regularization.
So finally In your code you are following type2 regularization. So just change some last lines to this:
# for k in range(0, len(beta)):
# temp_beta[k] = temp_beta[k] + lam * beta[k] #regularization here
temp_beta= temp_beta / (1.0*n)
beta = beta - alpha*(temp_beta + lam * beta)
And also PyTorch loss functions doesn't include regularization term(implemented inside optimizers) so also remove regularization terms inside your custom cost function.
In summary:
Pytorch use this Regularization function:
Regularization is implemented inside Optimizers (weight_decay parameter).
PyTorch Loss functions doesn't include Regularization term.
Bias is also regularized if Regularization is used.
To use Regularization try:
torch.nn.optim.optimiser_name(model.parameters(), lr, weight_decay=lambda).

Scipy fails to minimize cost function

Currently I'm learning from Andrew Ng course on Coursera called "Machine Learning". In exercise 5, we built a model that can predict digits, trained by the MNIST dataset. This task was completed successfully in Matlab by me, but I wanted to migrate that code to Python, just to see how different things are and maybe continue to play around with the model.
I managed to implement the cost function and the back propagation algorithm correctly. I know that because I compared the metrics with my working model in Matlab and it emits the same numbers.
Now, because in the course we train the model using fmincg, I tried to do the same using Scipy fmin_cg
function.
My problem is, the cost function takes extra small steps and fails to converge.
Here is my code for the network:
import numpy as np
import utils
import scipy.optimize as op
class Network:
def __init__(self, layers):
self.layers = layers
self.weights = self.generate_params()
# Function for generating theta multidimensional matrix
def generate_params(self):
theta = []
epsilon = 0.12
for i in range(len(self.layers) - 1):
current_layer_units = self.layers[i]
next_layer_units = self.layers[i + 1]
theta_i = np.multiply(
np.random.rand(next_layer_units, current_layer_units + 1),
2 * epsilon - epsilon
)
# Appending the params to the theta matrix
theta.append(theta_i)
return theta
# Function to append bias row/column to matrix X
def append_bias(self, X, d):
m = X.shape[0]
n = 1 if len(X.shape) == 1 else X.shape[1]
if (d == 'column'):
ones = np.ones((m, n + 1))
ones[:, 1:] = X.reshape((m, n))
elif (d == 'row'):
ones = np.ones((m + 1, n))
ones[1:, :] = X.reshape((m, n))
return ones
# Function for computing the gradient for 1 training example
def back_prop(self, y, feed, theta):
activations = feed["activations"]
weighted_layers = feed["weighted_layers"]
delta_output = activations[-1] - y.reshape(len(y), 1)
current_delta = delta_output
# Initializing gradients
gradients = []
for i, theta_i in enumerate(theta):
gradients.append(np.zeros(theta_i.shape))
# Peforming delta calculations.
# Here, we continue to propagate the delta values backwards
# until we arrive to the second layer.
for i in reversed(range(len(theta))):
theta_i = theta[i]
if (i > 0):
i_weighted_inputs = self.append_bias(weighted_layers[i - 1], 'row')
t_theta_i = np.transpose(theta_i)
delta_i = np.multiply(np.dot(t_theta_i, current_delta), utils.sigmoidGradient(i_weighted_inputs))
delta_i = delta_i[1:]
gradients[i] = current_delta * np.transpose(activations[i])
# Setting current delta for the next layer
current_delta = delta_i
else:
gradients[i] = current_delta * np.transpose(activations[i])
return gradients
# Function for computing the cost and the derivatives
def compute_cost(self, theta, X, y, r12n = 0):
m = len(X)
num_labels = self.layers[-1]
costs = np.zeros(m)
# Initializing gradients
gradients = []
for i, theta_i in enumerate(theta):
gradients.append(np.zeros(theta_i.shape))
# Iterating over the training set
for i in range(m):
inputs = X[i]
observed = utils.create_output_vector(y[i], num_labels)
feed = self.feed_forward(inputs)
predicted = feed["activations"][-1]
total_cost = 0
for k, o in enumerate(observed):
if (o == 1):
total_cost += np.log(predicted[k])
else:
total_cost += np.log(1 - predicted[k])
cost = -1 * total_cost
# Storing the cost for the i-th training example
costs[i] = cost
# Calculating the gradient for this training example
# using back propagation algorithm
gradients_i = self.back_prop(observed, feed, theta)
for i, gradient in enumerate(gradients_i):
gradients[i] += gradient
# Calculating the avg regularization term for the cost
sum_of_theta = 0
for i, theta_i in enumerate(theta):
squared_theta = np.power(theta_i[:, 1:], 2)
sum_of_theta += np.sum(squared_theta)
r12n_avg = r12n * sum_of_theta / (2 * m)
total_cost = np.sum(costs) / m + r12n_avg
# Applying regularization terms to the gradients
for i, theta_i in enumerate(theta):
lambda_i = np.copy(theta_i)
lambda_i[:, 0] = 0
lambda_i = np.multiply((r12n / m), lambda_i)
# Adding the r12n matrix to the gradient
gradients[i] = gradients[i] / m + lambda_i
return total_cost, gradients
# Function for training the neural network using conjugate gradient algorithm
def train_cg(self, X, y, r12n = 0, iterations = 50):
weights = self.weights
def Cost(theta, X, y):
theta = utils.roll_theta(theta, self.layers)
cost, _ = self.compute_cost(theta, X, y, r12n)
print(cost);
return cost
def Gradient(theta, X, y):
theta = utils.roll_theta(theta, self.layers)
_, gradient = self.compute_cost(theta, X, y, r12n)
return utils.unroll_theta(gradient)
unrolled_theta = utils.unroll_theta(weights)
result = op.fmin_cg(f = Cost,
x0 = unrolled_theta,
args=(X, y),
fprime=Gradient,
maxiter = iterations)
self.weights = utils.roll_theta(result, self.layers)
# Function for feeding forward the network
def feed_forward(self, X):
# Useful variables
activations = []
weighted_layers = []
weights = self.weights
currentActivations = self.append_bias(X, 'row')
activations.append(currentActivations)
for i in range(len(self.layers) - 1):
layer_weights = weights[i]
weighted_inputs = np.dot(layer_weights, currentActivations)
# Storing the weighted inputs
weighted_layers.append(weighted_inputs)
activation_nodes = []
# If the next layer is not the output layer, we'd like to add a bias unit to it
# (Excluding the input and the output layer)
if (i < len(self.layers) - 2):
activation_nodes = self.append_bias(utils.sigmoid(weighted_inputs), 'row')
else:
activation_nodes = utils.sigmoid(weighted_inputs)
# Appending the layer of nodes to the activations array
activations.append(activation_nodes)
currentActivations = activation_nodes
data = {
"activations": activations,
"weighted_layers": weighted_layers
}
return data
def predict(self, X):
data = self.feed_forward(X)
output = data["activations"][-1]
# Finding the max index in the output layer
return np.argmax(output, axis=0)
Here is the invocation of the code:
import numpy as np
from network import Network
# %% Load data
X = np.genfromtxt('data/mnist_data.csv', delimiter=',')
y = np.genfromtxt('data/mnist_outputs.csv', delimiter=',').astype(int)
# %% Create network
num_labels = 10
input_layer = 400
hidden_layer = 25
output_layer = num_labels
layers = [input_layer, hidden_layer, output_layer]
# Create a new neural network
network = Network(layers)
# %% Train the network and save the weights
network.train_cg(X, y, r12n = 1, iterations = 20)
This is what the code emits after each iteration:
15.441233231650283
15.441116436313076
15.441192262452514
15.44122384651483
15.441231216030646
15.441232804294314
15.441233141284435
15.44123321255294
15.441233227614855
As you can see, the changes to the cost are very small.
I checked for the shapes of the vectors and gradient and they both seem fine, just like in my Matlab implementation. I'm not sure what I do wrong here.
If you guys could help me, that'd be great :)

Neural network can learn |sin(x)| for [0,pi] but not [0,2pi] or [0, 4pi]

My neural network can learn |sin(x)| for [0,pi], but not larger intervals than that. I tried changing the quantity and widths of hidden layers in various ways, but none of the changes leads to a good result.
I train the NN on thousands of random values from a uniform distribution in the chosen interval. using back propagation with gradient descent.
I am starting to think there is a fundamental problem in my network.
For the following examples I used a 1-10-10-1 layer structure:
[0, pi]:
[0, 2pi]:
[0, 4pi]:
Here is the code for the neural network:
import math
import numpy
import random
import copy
import matplotlib.pyplot as plt
def sigmoid(x):
return 1.0/(1+ numpy.exp(-x))
def sigmoid_derivative(x):
return x * (1.0 - x)
class NeuralNetwork:
def __init__(self, weight_dimensions, x=None, y=None):
self.weights = []
self.layers = [[]] * len(weight_dimensions)
self.weight_gradients = []
self.learning_rate = 1
self.layers[0] = x
for i in range(len(weight_dimensions) - 1):
self.weights.append(numpy.random.rand(weight_dimensions[i],weight_dimensions[i+1]) - 0.5)
self.y = y
def feed_forward(self):
# calculate an output using feed forward layer-by-layer
for i in range(len(self.layers) - 1):
self.layers[i + 1] = sigmoid(numpy.dot(self.layers[i], self.weights[i]))
def print_loss(self):
loss = numpy.square(self.layers[-1] - self.y).sum()
print(loss)
def get_weight_gradients(self):
return self.weight_gradients
def apply_weight_gradients(self):
for i in range(len(self.weight_gradients)):
self.weights[i] += self.weight_gradients[i] * self.learning_rate
if self.learning_rate > 0.001:
self.learning_rate -= 0.0001
def back_prop(self):
# find derivative of the loss function with respect to weights
self.weight_gradients = []
deltas = []
output_error = (self.y - self.layers[-1])
output_delta = output_error * sigmoid_derivative(self.layers[-1])
deltas.append(output_delta)
self.weight_gradients.append(self.layers[-2].T.dot(output_delta))
for i in range(len(self.weights) - 1):
i_error = deltas[i].dot(self.weights[-(i+1)].T)
i_delta = i_error * sigmoid_derivative(self.layers[-(i+2)])
self.weight_gradients.append(self.layers[-(i+3)].T.dot(i_delta))
deltas.append(copy.deepcopy(i_delta))
# Unreverse weight gradient list
self.weight_gradients = self.weight_gradients[::-1]
def get_output(self, inp):
self.layers[0] = inp
self.feed_forward()
return self.layers[-1]
def sin_test():
interval = numpy.random.uniform(0, 2*math.pi, int(1000*(2*math.pi)))
x_values = []
y_values = []
for i in range(len(interval)):
y_values.append([abs(math.sin(interval[i]))])
x_values.append([interval[i]])
x = numpy.array(x_values)
y = numpy.array(y_values)
nn = NeuralNetwork([1, 10, 10, 1], x, y)
for i in range(10000):
tmp_input = []
tmp_output = []
mini_batch_indexes = random.sample(range(0, len(x)), 10)
for j in mini_batch_indexes:
tmp_input.append(x[j])
tmp_output.append(y[j])
nn.layers[0] = numpy.array(tmp_input)
nn.y = numpy.array(tmp_output)
nn.feed_forward()
nn.back_prop()
nn.apply_weight_gradients()
nn.print_loss()
nn.layers[0] = numpy.array(numpy.array(x))
nn.y = numpy.array(numpy.array(y))
nn.feed_forward()
axis_1 = []
axis_2 = []
for i in range(len(nn.layers[-1])):
axis_1.append(nn.layers[0][i][0])
axis_2.append(nn.layers[-1][i][0])
true_axis_2 = []
for x in axis_1:
true_axis_2.append(abs(math.sin(x)))
axises = []
for i in range(len(axis_1)):
axises.append([axis_1[i], axis_2[i], true_axis_2[i]])
axises.sort(key=lambda x: x[0], reverse=False)
axis_1_new = []
axis_2_new = []
true_axis_2_new = []
for elem in axises:
axis_1_new.append(elem[0])
axis_2_new.append(elem[1])
true_axis_2_new.append(elem[2])
plt.plot(axis_1_new, axis_2_new, label="nn")
plt.plot(axis_1_new, true_axis_2_new, 'k--', label="sin(x)")
plt.grid()
plt.axis([0, 2*math.pi, -1, 2.5])
plt.show()
sin_test()

The main issue with your network seem to be that you apply the activation function to the final "layer" of your network. The final output of your network should be a linear combination without any sigmoid applied.
As a warning though, do not expect the model to generalize outside of the region included in the training data.
Here is an example in PyTorch:
import torch
import torch.nn as nn
import math
import numpy as np
import matplotlib.pyplot as plt
N = 1000
p = 2.5
x = 2 * p * math.pi * torch.rand(N, 1)
y = np.abs(np.sin(x))
with torch.no_grad():
plt.plot(x.numpy(), y.numpy(), '.')
plt.savefig("training_data.png")
inner = 20
model = nn.Sequential(
nn.Linear(1, inner, bias=True),
nn.Sigmoid(),
nn.Linear(inner, 1, bias=True)#,
#nn.Sigmoid()
)
loss_fn = nn.MSELoss()
learning_rate = 1e-3
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500000):
y_pred = model(x)
loss = loss_fn(y_pred, y)
if t % 1000 == 0:
print("MSE: {}".format(t), loss.item())
model.zero_grad()
loss.backward()
optimizer.step()
with torch.no_grad():
X = torch.arange(0, p * 2 * math.pi, step=0.01).reshape(-1, 1)
Y = model(X)
Y_TRUTH = np.abs(np.sin(X))
print(Y.shape)
print(Y_TRUTH.shape)
loss = loss_fn(Y, Y_TRUTH)
plt.clf()
plt.plot(X.numpy(), Y_TRUTH.numpy())
plt.plot(X.numpy(), Y.numpy())
plt.title("MSE: {}".format(loss.item()))
plt.savefig("output.png")
The output is available here: Image showing neural network prediction and ground truth. The yellow line is the predicted line by the neural network and the blue line is the ground truth.

First and foremost, you've chosen a topology suited for a different class of problems. A simple, fully-connected NN such as this is great with trivial classification (e.g. Boolean operators) or functions with at least two continuous derivatives. You've tried to apply it to a function that is simply one step beyond its capabilities.
Try your model on sin(x) and see how it performs at larger ranges. Try it on max(sin(x), 0). Do you see how the model has trouble with certain periodicity and irruptions? These are an emergent feature of the many linear equations struggling to predict the proper functional value: the linear combinations have trouble emulating non-linearities past a simple level.

Implementing back propagation using numpy and python for cleveland dataset

I wanted to predict heart disease using backpropagation algorithm for neural networks. For this I used UCI heart disease data set linked here: processed cleveland. To do this, I used the cde found on the following blog: Build a flexible Neural Network with Backpropagation in Python and changed it little bit according to my own dataset. My code is as follows:
import numpy as np
import csv
reader = csv.reader(open("cleveland_data.csv"), delimiter=",")
x = list(reader)
result = np.array(x).astype("float")
X = result[:, :13]
y0 = result[:, 13]
y1 = np.array([y0])
y = y1.T
# scale units
X = X / np.amax(X, axis=0) # maximum of X array
class Neural_Network(object):
def __init__(self):
# parameters
self.inputSize = 13
self.outputSize = 1
self.hiddenSize = 13
# weights
self.W1 = np.random.randn(self.inputSize, self.hiddenSize)
self.W2 = np.random.randn(self.hiddenSize, self.outputSize)
def forward(self, X):
# forward propagation through our network
self.z = np.dot(X, self.W1)
self.z2 = self.sigmoid(self.z) # activation function
self.z3 = np.dot(self.z2, self.W2)
o = self.sigmoid(self.z3) # final activation function
return o
def sigmoid(self, s):
# activation function
return 1 / (1 + np.exp(-s))
def sigmoidPrime(self, s):
# derivative of sigmoid
return s * (1 - s)
def backward(self, X, y, o):
# backward propgate through the network
self.o_error = y - o # error in output
self.o_delta = self.o_error * self.sigmoidPrime(o) # applying derivative of sigmoid to error
self.z2_error = self.o_delta.dot(
self.W2.T) # z2 error: how much our hidden layer weights contributed to output error
self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2) # applying derivative of sigmoid to z2 error
self.W1 += X.T.dot(self.z2_delta) # adjusting first set (input --> hidden) weights
self.W2 += self.z2.T.dot(self.o_delta) # adjusting second set (hidden --> output) weights
def train(self, X, y):
o = self.forward(X)
self.backward(X, y, o)
NN = Neural_Network()
for i in range(100): # trains the NN 100 times
print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n" + str(NN.forward(X)))
print("Loss: \n" + str(np.mean(np.square(y - NN.forward(X))))) # mean sum squared loss
print("\n")
NN.train(X, y)
But when I run this code, my all predicted outputs become = 1 after few iterations and then stays the same for up to all 100 iterations. what is the problem in the code?

Few mistakes that I've noticed:
The output of your network is a sigmoid, i.e. a value between [0, 1] -- suits for predicting probabilities. But the target seems to be a value between [0, 4]. This explains the desire of the network to maximize the output to get as close as possible to large labels. But it can't go more than 1.0 and gets stuck.
You should either get rid of the final sigmoid or pre-process the label and scale it to [0, 1]. Both options will make it learn better.
You don't use the learning rate (effectively setting it to 1.0), which is probably a bit high, so it's possible for the NN to diverge. My experiments showed that 0.01 is a good learning rate, but you can play around with that.
Other than this, your backprop seems working right.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Multiclass logistic regression - implementation question - python

Related

Neural network built from scratch in python to classify digits stuck at 11.35 percent accuracy. I am using the MNIST dataset

Why does regularization in pytorch and scratch code does not match and what is the formula used for regularization in pytorch?

Scipy fails to minimize cost function

Neural network can learn |sin(x)| for [0,pi] but not [0,2pi] or [0, 4pi]

Implementing back propagation using numpy and python for cleveland dataset

Categories

Resources