Here is a link to my project: https://github.com/aaronnoyes/neural-network/blob/master/nn.py
I have implemented a basic neural network in Python. By default it uses a sigmoid activation function, and that works great. I'm trying to compare changes in learning rate between activation functions, so I tried implementing an option for using ReLU. When it runs, however, the weights all immediately drop to 0.
if (self.activation == 'relu'):
    d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * self.relu(self.output, True)))
    d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) * self.relu(self.output, True), self.weights2.T) * self.relu(self.layer1, True)))
I'm almost sure the issue is in lines 54-56 of my program (shown above) when I try to apply gradient descent. How can I fix this so the program will actually update weights appropriately? My relu implementation is as follows:
def relu(self, x, derivative=False):
    if derivative:
        return 1. * (x > 0)
    else:
        return x * (x > 0)
There are two problems with your code:
1. You are applying a relu to the output layer as well. The standard approach is to use the identity as the output-layer activation for regression and sigmoid/softmax for classification.
2. You are using a learning rate of 1, which is way too high. (Usual test values are 1e-2 and smaller.)
Edit: I changed the output activation to sigmoid even when using relu activation in the hidden layer:
def feedforward(self):
    ...
    if (self.activation == 'relu'):
        self.layer1 = self.relu(np.dot(self.input, self.weights1))
        self.output = self.sigmoid(np.dot(self.layer1, self.weights2))
    return self.output
def backprop(self):
    ...
    if (self.activation == 'relu'):
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * self.sigmoid(self.output, True)))
        d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) * self.sigmoid(self.output, True), self.weights2.T) * self.relu(self.layer1, True)))
and used a smaller learning rate
# update the weights with the derivative (slope) of the loss function
self.weights1 += .01 * d_weights1
self.weights2 += .01 * d_weights2
and this is the result:
Actual Output : [[ 0.00000] [ 1.00000] [ 1.00000] [ 0.00000]]
Predicted Output: [[ 0.10815] [ 0.92762] [ 0.94149] [ 0.05783]]
I developed a neural network representing an XOR gate, using the sigmoid function as the activation function and the squared error as the loss function.
The task is to let the training continue while the training error is above 0.3.
import numpy as np

np.random.seed(0)

def sigmoid(x):
    # compute and return the sigmoid activation value for a
    # given input value
    return 1.0/(1 + np.exp(-x))

def sigmoid_derivative(x):
    # compute the derivative of the sigmoid function
    return x * (1 - x)

def loss(residual):
    # compute the loss function
    return residual * residual

def loss_derivative(residual):
    # compute the derivative of the loss function
    return 2 * residual

# Define the inputs and structure of the neural network
# XOR inputs
inputs = np.array([[0,0],[0,1],[1,0],[1,1]])
# XOR output
expected_output = np.array([[0],[1],[1],[0]])

# Learning rate
lr = 0.1

# Number of neurons
n_x = 2
n_h = 2
n_y = 1

# Random weights initialization
hidden_weights = np.random.rand(n_x,n_h)
output_weights = np.random.rand(n_h,n_y)

print("Initial hidden weights: ",end='')
print(*hidden_weights)
print("Initial output weights: ",end='')
print(*output_weights)

# Training algorithm:
# loop over each individual data point and train
# the network on it
count = 0
while(True):
    # FEEDFORWARD:
    # feed forward the activation at the current layer by
    # taking the dot product between the activation and the weight matrix
    hidden_layer_activation = np.dot(inputs,hidden_weights)
    hidden_layer_output = sigmoid(hidden_layer_activation)
    output_layer_activation = np.dot(hidden_layer_output,output_weights)
    predicted_output = loss(output_layer_activation)

    # BACKPROPAGATION:
    # the first phase of backpropagation is to compute the
    # difference between the *prediction* and the true target value y - ŷ
    error = predicted_output - expected_output
    if (error < 0.3).any(): break

    d_predicted_output = error * loss_derivative(predicted_output)
    error_hidden_layer = d_predicted_output.dot(output_weights.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

    # Updating weights
    output_weights += hidden_layer_output.T.dot(d_predicted_output) * lr
    hidden_weights += inputs.T.dot(d_hidden_layer) * lr

print("Final hidden weights: ",end='')
print(*hidden_weights)
print("Final output weights: ",end='')
print(*output_weights)

print("\nOutput from neural network after learning: ",end='')
print(*predicted_output)
But when I write a condition like

if (error < 0.3).any(): break

the program does only one iteration and then stops. Can anyone tell me what the problem is with my code?
You need to use the absolute error; if the network underpredicts, the error is negative:

(np.abs(error) < 0.3).any()
However, you also probably want the mean error:
np.mean(np.abs(error)) < 0.3
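Putting it together, a minimal sketch of the corrected stopping check inside the training loop (reusing the variable names from the question):

error = predicted_output - expected_output
# stop once the mean absolute error drops below the threshold
if np.mean(np.abs(error)) < 0.3:
    break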
I followed an article here: TowardsDataScience.
I wrote out the math for the network and everything made sense.
However, after writing the code, the results are pretty strange, like it is always predicting the same class...
I spent a lot of time on it and changed many things, but I still cannot understand what I did wrong.
Here is the code:
# coding: utf-8
from mnist import MNIST
import numpy as np
import math
import os
import pdb

DATASETS_PREFIX = '../Datasets/MNIST'
mndata = MNIST(DATASETS_PREFIX)
TRAINING_IMAGES, TRAINING_LABELS = mndata.load_training()
TESTING_IMAGES , TESTING_LABELS  = mndata.load_testing()

### UTILS
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    return x.T * (1 - x)
    #return np.dot(x.T, 1.0 - x)

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def d_softmax(x):
    #This function has not yet been tested.
    return x.T * (1 - x)

def tanh(x):
    return np.tanh(x)

def d_tanh(x):
    return 1 - x.T * x

def normalize(image):
    return image / (255.0 * 0.99 + 0.01)
### !UTILS

class NeuralNetwork(object):
    """
    This is a 3-layer neural network (1 hidden layer).
    #_input   : input layer
    #_weights1: weights between input layer and hidden layer (matrix shape (input.shape[1], 4))
    #_weights2: weights between hidden layer and output layer (matrix shape (4, 1))
    #_y       : output
    #_output  : computed output
    #_alpha   : learning rate
    """
    def __init__(self, xshape, yshape):
        self._neurones_nb = 20
        self._input = None
        self._weights1 = np.random.randn(xshape, self._neurones_nb)
        self._weights2 = np.random.randn(self._neurones_nb, yshape)
        self._y = np.mat(np.zeros(yshape))
        self._output = np.mat(np.zeros(yshape))
        self._alpha1 = 0.1
        self._alpha2 = 0.1
        self._function = sigmoid
        self._derivative = d_sigmoid
        self._epoch = 1

    def Train(self, xs, ys):
        for j in range(self._epoch):
            for i in range(len(xs)):
                self._input = normalize(np.mat(xs[i]))
                self._y[0, ys[i]] = 1
                self.feedforward()
                self.backpropagation()
                self._y[0, ys[i]] = 0

    def Predict(self, image):
        self._input = normalize(image)
        out = self.feedforward()
        return out

    def feedforward(self):
        self._layer1 = self._function(np.dot(self._input, self._weights1))
        self._output = self._function(np.dot(self._layer1, self._weights2))
        return self._output

    def backpropagation(self):
        d_weights2 = np.dot(
            self._layer1.T,
            2 * (self._y - self._output) * self._derivative(self._output)
        )
        d_weights1 = np.dot(
            self._input.T,
            np.dot(
                2 * (self._y - self._output) * self._derivative(self._output),
                self._weights2.T
            ) * self._derivative(self._layer1)
        )
        self._weights1 += self._alpha1 * d_weights1
        self._weights2 += self._alpha2 * d_weights2

if __name__ == '__main__':
    neural_network = NeuralNetwork(len(TRAINING_IMAGES[0]), 10)
    print('* training neural network')
    neural_network.Train(TRAINING_IMAGES, TRAINING_LABELS)
    print('* testing neural network')
    count = 0
    for i in range(len(TESTING_IMAGES)):
        image = np.mat(TESTING_IMAGES[i])
        expected = TESTING_LABELS[i]
        prediction = neural_network.Predict(image)
        if i % 100 == 0: print(expected, prediction)
    #print(f'* results: {count} / {len(TESTING_IMAGES)}')
Thank you for your help, really appreciated.
Julien
Well, I don't see any error in the implementation, so considering your network, it could be improved by doing two things:
One epoch is not enough. Not at all! You need to pass over your data multiple times (an absolute minimum is about 10 passes; an average run might be around 100 epochs, and this can go up to 5000 or more).
Your network is a shallow network, i.e. really simple. To detect difficult things (like images), you could implement a CNN (Convolutional Neural Network), or first try to deepen your network and make it more complex.
=> Try adding layers (3, 4, 5, etc.) and then adding neurons to each layer (50, 60, ...) depending on the size of your input. You can go up to 800, 900 or more.
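As a minimal sketch of those two suggestions applied to the class above (the exact values are illustrative assumptions, not tuned settings):

# inside NeuralNetwork.__init__
self._neurones_nb = 100   # a wider hidden layer (more neurons)
self._epoch = 100         # many passes over the training data instead of 1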
I'm trying to implement an autoencoder using this resource, which implements the backpropagation algorithm. I'm using the same feed-forward algorithm implemented there, but it gives me a large error. In autoencoders, the sigmoid function is applied to the hidden layer for encoding and again to the output layer for decoding.
def feedForwardPropagation(network, row, output=False):
    currentInput = row
    if not output:
        layer = network[0]
    else:
        layer = network[1]
    layer_output = []
    for neuron in layer:
        activation = neuron_activation(neuron['weights'], currentInput)
        neuron['output'] = neuron_transfer(activation)
        layer_output.append(neuron['output'])
    currentInput = layer_output
    return currentInput
def backPropagationNetworkErrorUpdate(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            # Hidden layers: weight error compute
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:  # It starts with computing the weight error of the output neuron.
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            # Output layer error compute
            for j in range(len(layer)):
                neuron = layer[j]
                error = expected[j] - neuron['output']
                errors.append(error)
        for j in range(len(layer)):
            neuron = layer[j]
            transfer = neuron['output'] * (1.0 - neuron['output'])
            neuron['delta'] = errors[j] * transfer
def updateWeights(network, row, l_rate, momentum=0.5):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['velocity'][j] = momentum * neuron['velocity'][j] + l_rate * neuron['delta'] * inputs[j]
                neuron['weights'][j] += neuron['velocity'][j]
            neuron['velocity'][-1] = momentum * neuron['velocity'][-1] + l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += neuron['velocity'][-1]
def trainNetwork(network, train, l_rate, n_epoch, n_outputs, test_set):
    hitrate = list()
    errorRate = list()
    epoch_step = list()
    for epoch in range(n_epoch):
        sum_error = 0
        np.random.shuffle(train)
        for row in train:
            outputs = feedForwardPropagation(network, row)
            outputs = feedForwardPropagation(network, outputs)
            expected = row
            sum_error += sum([(expected[i] - outputs[i]) ** 2 for i in range(len(expected))])
            backPropagationNetworkErrorUpdate(network, expected)
            updateWeights(network, row, l_rate)
        if epoch % 10 == 0:
            errorRate.append(sum_error)
            epoch_step.append(epoch)
            log = '>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error)
            print(log, n_epoch, len(network[1][0]['weights']) - 1, l_rate)
    return epoch_step, errorRate
For autoencoding I use one hidden layer, n inputs and n outputs. I believe I have gone wrong with the feedforward implementation. Any suggestions will be greatly appreciated.
Edit: I tried computing the weights after the first layer (the commented-out continuation in the feedforward method) and then decoding the output using the sigmoid function commented out in the trainNetwork method. However, the error didn't change after 100 epochs.
The characteristics of your problem (the error barely changing over 100 epochs, and remaining large) suggest that the problem might be (and probably is) caused by the scale of your input data combined with the fact that you use sigmoids as the activation function. I will give you a simple example:
Suppose I want to reconstruct the value x=100.
If I train an autoencoder with a single neuron on it, the reconstructed output is given by r = sigmoid(w*x), where the error is the difference between the actual input and the reconstruction, i.e. e = x - r. Note that since the sigmoid function is bounded between 0 and 1, the minimum error you can get in this case is e = 100 - 1 = 99. No matter how well you train the weight w, r = sigmoid(w*x) will always be bounded by one.
This means that the sigmoid activation function is not able to represent your data in this case.
To solve this problem, either:
downscale or normalize your input data to values between 0 and 1 (see the sketch after this list), or
change the sigmoid to another activation function that can actually reconstruct the scale of your data.
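A minimal sketch of the first option, assuming the inputs live in a NumPy array X (the array contents here are hypothetical):

import numpy as np

X = np.array([[100.0, 20.0], [40.0, 80.0]])   # hypothetical raw inputs
# min-max rescale to [0, 1] so a sigmoid output can reconstruct the values
X_norm = (X - X.min()) / (X.max() - X.min())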
Hope this helps.
I've implemented the following neural network to solve the XOR problem in Python. My neural network consists of an input layer of 2 neurons, 1 hidden layer of 2 neurons, and an output layer of 1 neuron. I am using the sigmoid function as the activation function for both the hidden layer and the output layer. Can someone please explain what I have done wrong?
import numpy
import scipy.special

class NeuralNetwork:

    def __init__(self, inputNodes, hiddenNodes, outputNodes, learningRate):
        self.iNodes = inputNodes
        self.hNodes = hiddenNodes
        self.oNodes = outputNodes
        self.wIH = numpy.random.normal(0.0, pow(self.iNodes, -0.5), (self.hNodes, self.iNodes))
        self.wOH = numpy.random.normal(0.0, pow(self.hNodes, -0.5), (self.oNodes, self.hNodes))
        self.lr = learningRate
        self.activationFunction = lambda x: scipy.special.expit(x)

    def train(self, inputList, targetList):
        inputs = numpy.array(inputList, ndmin=2).T
        targets = numpy.array(targetList, ndmin=2).T
        #print(inputs, targets)
        hiddenInputs = numpy.dot(self.wIH, inputs)
        hiddenOutputs = self.activationFunction(hiddenInputs)
        finalInputs = numpy.dot(self.wOH, hiddenOutputs)
        finalOutputs = self.activationFunction(finalInputs)
        outputErrors = targets - finalOutputs
        hiddenErrors = numpy.dot(self.wOH.T, outputErrors)
        self.wOH += self.lr * numpy.dot((outputErrors * finalOutputs * (1.0 - finalOutputs)), numpy.transpose(hiddenOutputs))
        self.wIH += self.lr * numpy.dot((hiddenErrors * hiddenOutputs * (1.0 - hiddenOutputs)), numpy.transpose(inputs))

    def query(self, inputList):
        inputs = numpy.array(inputList, ndmin=2).T
        hiddenInputs = numpy.dot(self.wIH, inputs)
        hiddenOutputs = self.activationFunction(hiddenInputs)
        finalInputs = numpy.dot(self.wOH, hiddenOutputs)
        finalOutputs = self.activationFunction(finalInputs)
        return finalOutputs

nn = NeuralNetwork(2, 2, 1, 0.01)
data = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
epochs = 10

for e in range(epochs):
    for record in data:
        inputs = numpy.asfarray(record[1:])
        targets = record[0]
        #print(targets)
        #print(inputs, targets)
        nn.train(inputs, targets)

print(nn.query([0, 0]))
print(nn.query([1, 0]))
print(nn.query([0, 1]))
print(nn.query([1, 1]))
Several reasons.
I don't think you should be taking the activation function of everything, especially in your query function. I think you have muddled up the idea of neuron-to-neuron weightings (wIH and wOH) with the activation values.
Because of that muddle you have missed the idea of re-using your query function as part of your training. You should think of it as feeding the activation levels forward to the output, comparing the result with the target output to get an array of errors, and then feeding those errors backwards, using the derivative of the sigmoid function, to adjust the weightings; a sketch of that refactor follows below.
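As a minimal sketch of that refactor (the helper name _forward is a hypothetical addition, not part of your class), the feed-forward pass can be factored out so train and query share it:

def _forward(self, inputs):
    # shared feed-forward pass; returns hidden and final activations
    hiddenOutputs = self.activationFunction(numpy.dot(self.wIH, inputs))
    finalOutputs = self.activationFunction(numpy.dot(self.wOH, hiddenOutputs))
    return hiddenOutputs, finalOutputs

def query(self, inputList):
    inputs = numpy.array(inputList, ndmin=2).T
    return self._forward(inputs)[1]

train would then call self._forward(inputs) as well and keep both return values for the weight updates.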
I would write the function and its derivative in directly rather than importing from scipy, as they are so simple. Also, "it's recommended" to use tanh and its derivative for the hidden-layer function (I can't remember why; it's probably not needed for this simple net):
import numpy as np

# transfer functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# derivative of sigmoid (takes the sigmoid output y, not the raw input)
def dsigmoid(y):
    return y * (1.0 - y)

# using tanh over logistic sigmoid for the hidden layer is recommended
def tanh(x):
    return np.tanh(x)

# derivative of tanh (takes the tanh output y, not the raw input)
def dtanh(y):
    return 1 - y*y
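A small usage sketch (the weight shapes below are illustrative assumptions for a 2-2-1 XOR net), showing where each function and its derivative belong:

x = np.array([[0, 1]])            # one XOR input row
w_ih = np.random.randn(2, 2)      # input -> hidden weights
w_ho = np.random.randn(2, 1)      # hidden -> output weights
hidden = tanh(x @ w_ih)           # tanh for the hidden layer
output = sigmoid(hidden @ w_ho)   # sigmoid for the output layer
# backprop would scale the errors by dsigmoid(output) and dtanh(hidden)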
Finally, you might be able to figure out what I did a while ago with a neural net using just numpy here https://github.com/paddywwoof/Machine-Learning/blob/master/perceptron.py
For the past few days I have been debugging my NN, but I can't find the issue.
I've created a totally raw implementation of a multi-layer perceptron for identifying MNIST dataset images.
The network seems to learn: after the training cycle, accuracy on the test data is above 94%. My problem is with the loss function: it starts increasing after a while, when test/validation accuracy reaches ~76%.
Can someone please check my forward/backprop math and tell me if my loss function is properly implemented, or suggest what might be wrong?
NN structure:
input layer: 758 nodes (1 node per pixel)
hidden layer 1: 300 nodes
hidden layer 2: 75 nodes
output layer: 10 nodes
NN activation functions:
input layer -> hidden layer 1: ReLU
hidden layer 1 -> hidden layer 2: ReLU
hidden layer 2 -> output layer: Softmax
NN loss function:
Categorical Cross-Entropy
The full, clean code is available here as a Jupyter Notebook.
Neural Network forward/backward pass:
def train(self, features, targets):
    n_records = features.shape[0]

    # placeholders for weight and bias change values
    delta_weights_i_h1 = np.zeros(self.weights_i_to_h1.shape)
    delta_weights_h1_h2 = np.zeros(self.weights_h1_to_h2.shape)
    delta_weights_h2_o = np.zeros(self.weights_h2_to_o.shape)
    delta_bias_i_h1 = np.zeros(self.bias_i_to_h1.shape)
    delta_bias_h1_h2 = np.zeros(self.bias_h1_to_h2.shape)
    delta_bias_h2_o = np.zeros(self.bias_h2_to_o.shape)

    for X, y in zip(features, targets):
        ### forward pass
        # input to hidden 1
        inputs_to_h1_layer = np.dot(X, self.weights_i_to_h1) + self.bias_i_to_h1
        inputs_to_h1_layer_activated = self.activation_ReLU(inputs_to_h1_layer)

        # hidden 1 to hidden 2
        h1_to_h2_layer = np.dot(inputs_to_h1_layer_activated, self.weights_h1_to_h2) + self.bias_h1_to_h2
        h1_to_h2_layer_activated = self.activation_ReLU(h1_to_h2_layer)

        # hidden 2 to output
        h2_to_output_layer = np.dot(h1_to_h2_layer_activated, self.weights_h2_to_o) + self.bias_h2_to_o
        h2_to_output_layer_activated = self.softmax(h2_to_output_layer)

        # output
        final_outputs = h2_to_output_layer_activated

        ### backpropagation
        # output to hidden 2
        error = y - final_outputs
        output_error_term = error.dot(self.dsoftmax(h2_to_output_layer_activated))
        h2_error = np.dot(output_error_term, self.weights_h2_to_o.T)
        h2_error_term = h2_error * self.activation_dReLU(h1_to_h2_layer_activated)

        # hidden 2 to hidden 1
        h1_error = np.dot(h2_error_term, self.weights_h1_to_h2.T)
        h1_error_term = h1_error * self.activation_dReLU(inputs_to_h1_layer_activated)

        # weight & bias step (input to hidden 1)
        delta_weights_i_h1 += h1_error_term * X[:, None]
        delta_bias_i_h1 = np.sum(h1_error_term, axis=0)

        # weight & bias step (hidden 1 to hidden 2)
        delta_weights_h1_h2 += h2_error_term * inputs_to_h1_layer_activated[:, None]
        delta_bias_h1_h2 = np.sum(h2_error_term, axis=0)

        # weight & bias step (hidden 2 to output)
        delta_weights_h2_o += output_error_term * h1_to_h2_layer_activated[:, None]
        delta_bias_h2_o = np.sum(output_error_term, axis=0)

    # update the weights and biases
    self.weights_i_to_h1 += self.lr * delta_weights_i_h1 / n_records
    self.weights_h1_to_h2 += self.lr * delta_weights_h1_h2 / n_records
    self.weights_h2_to_o += self.lr * delta_weights_h2_o / n_records
    self.bias_i_to_h1 += self.lr * delta_bias_i_h1 / n_records
    self.bias_h1_to_h2 += self.lr * delta_bias_h1_h2 / n_records
    self.bias_h2_to_o += self.lr * delta_bias_h2_o / n_records
Activation function implementation:
def activation_ReLU(self, x):
    return x * (x > 0)

def activation_dReLU(self, x):
    return 1. * (x > 0)

def softmax(self, x):
    z = x - np.max(x)
    return np.exp(z) / np.sum(np.exp(z))

def dsoftmax(self, x):
    # TODO: vectorise math
    vec_len = len(x)
    J = np.zeros((vec_len, vec_len))
    for i in range(vec_len):
        for j in range(vec_len):
            if i == j:
                J[i][j] = x[i] * (1 - x[j])
            else:
                J[i][j] = -x[i] * x[j]
    return J
Loss function implementation:
def categorical_cross_entropy(pred, target):
    return (1/len(pred)) * -np.sum(target * np.log(pred))
I managed to find the problem.
The neural network is large, so I couldn't fit everything into this question. But if you check my Jupyter Notebook, you can see the implementation of my Softmax activation function and how I use it in the training cycle.
The loss miscalculation was caused by the fact that my Softmax implementation only worked for ndarrays with dim == 1.
During the training step I only passed ndarrays with dim 1 to the activation function, so the NN learned well, but my run() function was returning wrong predictions because I fed the whole test set to it at once, not a single row at a time in a for loop. Because of that, it calculated Softmax "matrix-wise" rather than "row-wise".
This is a very quick fix for it:
def softmax(self, x):
    # TODO: vectorise math to speed up computation
    softmax_result = None
    if x.ndim == 1:
        z = x - np.max(x)
        softmax_result = np.exp(z) / np.sum(np.exp(z))
        return softmax_result
    else:
        softmax_result = []
        for row in x:
            z = row - np.max(row)
            row_softmax_result = np.exp(z) / np.sum(np.exp(z))
            softmax_result.append(row_softmax_result)
        return np.array(softmax_result)
Yet this code should still be vectorised to avoid the for loop and the if branch, because it is currently ugly and takes too many resources; a possible vectorised version is sketched below.
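A possible vectorised sketch (an assumption on my part, not taken from the notebook) that handles both the 1-D and 2-D cases by normalising along the last axis:

def softmax(self, x):
    # subtract the max along the last axis for numerical stability
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    # row-wise normalisation for 2-D input, plain sum for 1-D input
    return e / np.sum(e, axis=-1, keepdims=True)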