Can ReLU replace a Sigmoid Activation Function in Neural Network

I'm new to this and I'm trying to replace the sigmoid activation function in the following simple NN with ReLU. Can I do that? I've tried replacing the sigmoid function, but it's not working. The output should be the AND gate (e.g. input (0,0) -> output 0).
import numpy as np
# sigmoid function
# (with deriv=True, x is expected to be the already-activated output)
def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
# input dataset
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
# output dataset
y = np.array([[0, 0, 0, 1]]).T
# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)
# initialize weights randomly with mean 0
syn0 = 2*np.random.random((2, 1)) - 1
# range replaces the Python 2 xrange
for iter in range(10000):
    # forward propagation
    l0 = X
    l1 = nonlin(np.dot(l0, syn0))
    # how much did we miss?
    l1_error = y - l1
    l1_delta = l1_error * nonlin(l1, True)
    syn0 += np.dot(l0.T, l1_delta)
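A minimal sketch of a ReLU drop-in with the same calling convention as nonlin, for illustration only. The name relu, the bias term and the hand-picked weights are assumptions that are not in the question's code. A bias is needed here: a single ReLU unit without one cannot output 0 for (0,1) and (1,0) while outputting 1 for (1,1). The sketch only shows that ReLU can represent AND; gradient descent from a random start is not guaranteed to reach such weights, since a ReLU unit with a negative pre-activation passes back zero gradient.

```python
import numpy as np

def relu(x, deriv=False):
    # Same convention as nonlin: with deriv=True, x is the already-activated
    # output, and the slope is 1 wherever that output is positive.
    if deriv:
        return (x > 0).astype(float)
    return np.maximum(0, x)

X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

# Hand-picked parameters (an assumption for illustration, not trained values)
weights = np.array([[1.0], [1.0]])
bias = -1.0

l1 = relu(np.dot(X, weights) + bias)
print(l1.ravel())              # [0. 0. 0. 1.]
print(relu(l1, True).ravel())  # [0. 0. 0. 1.]
```

In a training loop, the bias would be updated alongside the weights, for example from the column sum of l1_delta.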

Related

Neural Network from scratch using Python

I am trying to implement a NN from scratch in Python. It has 2 layers: an input layer and an output layer. The input layer has 4 neurons and the output layer has a single node (plus biases). I have the following code, but I get the error message: ValueError: shapes (4,2) and (4,1) not aligned: 2 (dim 1) != 4 (dim 0). Can someone help me?
import numpy as np
# Step 1: Define input and output data
X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])
y = np.array([[0, 1, 0, 1]])
# Step 2: Define the number of input neurons, hidden neurons, and output neurons
input_neurons = 4
output_neurons = 1
# Step 3: Define the weights and biases for the network
weights = np.random.rand(input_neurons, output_neurons)
biases = np.random.rand(output_neurons, 1)
# Step 4: Define the sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Step 5: Define the derivative of the sigmoid function
def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
# Step 6: Define the forward propagation function
def forward_propagation(X, weights, biases):
    output = sigmoid(np.dot(X.T, weights) + biases)
    return output
# Step 7: Define the backward propagation function
def backward_propagation(X, y, output, weights, biases):
    error = output - y
    derivative = sigmoid_derivative(output)
    delta = error * derivative
    weights_derivative = np.dot(X, delta.T)
    biases_derivative = np.sum(delta, axis=1, keepdims=True)
    return delta, weights_derivative, biases_derivative
# Step 8: Define the train function
def train(X, y, weights, biases, epochs, learning_rate):
    for i in range(epochs):
        output = forward_propagation(X, weights, biases)
        delta, weights_derivative, biases_derivative = backward_propagation(X, y, output, weights, biases)
        weights -= learning_rate * weights_derivative
        biases -= learning_rate * biases_derivative
        error = np.mean(np.abs(delta))
        print("Epoch ", i, " error: ", error)
# Step 9: Train the network
epochs = 5000
learning_rate = 0.1
train(X, y, weights, biases, epochs, learning_rate)

You have an output layer with one neuron, so each output should have one dimension.
Your y assumes that the output has 4 dimensions:
y = np.array([[0, 1, 0, 1]])
Since you are giving two inputs (a pair of 4-dimensional inputs) like this:
X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])
you also need to give two outputs (each of one dimension), for example:
y = np.array([[0],[1]])
Hope this helps.
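A shape-consistent sketch of one forward and backward step under that reading (2 samples, 4 features, 1 output). Multiplying X directly instead of X.T, the (1, 1) bias shape, the fixed seed and the names weights_grad and biases_grad are assumptions made for illustration, not part of the question or the answer. The sketch also takes the sigmoid slope from the activated output as output * (1 - output), rather than applying sigmoid a second time as sigmoid_derivative(output) would.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)  # (2 samples, 4 features)
y = np.array([[0], [1]], dtype=float)      # (2 samples, 1 output)

weights = rng.random((4, 1))  # (4 features, 1 output)
biases = rng.random((1, 1))   # broadcast over the sample axis

# Forward pass: (2, 4) @ (4, 1) -> (2, 1)
output = sigmoid(X @ weights + biases)

# Backward pass for squared error through the sigmoid
delta = (output - y) * output * (1 - output)    # (2, 1)
weights_grad = X.T @ delta                      # (4, 1), matches weights
biases_grad = delta.sum(axis=0, keepdims=True)  # (1, 1), matches biases

print(output.shape, weights_grad.shape, biases_grad.shape)  # (2, 1) (4, 1) (1, 1)
```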

Trying to understand cross_entropy loss in PyTorch

This is a very newbie question but I'm trying to wrap my head around cross_entropy loss in Torch so I created the following code:
import torch
x = torch.FloatTensor([
    [1., 0., 0.],
    [0., 1., 0.],
    [0., 0., 1.],
])
print(x.argmax(dim=1))
y = torch.LongTensor([0, 1, 2])
loss = torch.nn.functional.cross_entropy(x, y)
print(loss)
which outputs the following:
tensor([0, 1, 2])
tensor(0.5514)
What I don't understand is: given that my input matches the expected output, why is the loss not 0?

That is because the input you give to the cross-entropy function is treated not as probabilities, as you assumed, but as logits, which are transformed into probabilities with this formula:
probas = np.exp(logits)/np.sum(np.exp(logits), axis=1, keepdims=True)
So the matrix of probabilities PyTorch will use in your case is:
[0.5761168847658291, 0.21194155761708547, 0.21194155761708547]
[0.21194155761708547, 0.5761168847658291, 0.21194155761708547]
[0.21194155761708547, 0.21194155761708547, 0.5761168847658291]
The loss is the mean of -ln(0.5761...) over the three rows, which is about 0.5514.
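The same numbers can be reproduced in plain NumPy. This is a sketch for checking the arithmetic, not PyTorch's actual implementation; the variable names are made up for illustration.

```python
import numpy as np

logits = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.]])
targets = np.array([0, 1, 2])

# Row-wise softmax; keepdims=True keeps the row sums as a column so that
# each row is divided by its own sum.
exp = np.exp(logits)
probas = exp / exp.sum(axis=1, keepdims=True)

# Probability assigned to the correct class of each row, then the mean
# negative log of those probabilities.
picked = probas[np.arange(len(targets)), targets]
loss = -np.log(picked).mean()

print(f"{picked[0]:.4f}")  # 0.5761
print(f"{loss:.4f}")       # 0.5514
```

To drive the loss toward 0, the logit of the correct class has to be much larger than the others, for example 100 instead of 1.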

The torch.nn.functional.cross_entropy function combines log_softmax (softmax followed by a logarithm) and nll_loss (negative log likelihood loss) in a single function, i.e. it is equivalent to F.nll_loss(F.log_softmax(x, 1), y).
Code:
import torch
import torch.nn.functional as F
x = torch.FloatTensor([[1., 0., 0.],
                       [0., 1., 0.],
                       [0., 0., 1.]])
y = torch.LongTensor([0, 1, 2])
print(torch.nn.functional.cross_entropy(x, y))
print(F.softmax(x, 1).log())
print(F.log_softmax(x, 1))
print(F.nll_loss(F.log_softmax(x, 1), y))
output:
tensor(0.5514)
tensor([[-0.5514, -1.5514, -1.5514],
[-1.5514, -0.5514, -1.5514],
[-1.5514, -1.5514, -0.5514]])
tensor([[-0.5514, -1.5514, -1.5514],
[-1.5514, -0.5514, -1.5514],
[-1.5514, -1.5514, -0.5514]])
tensor(0.5514)
For more details, see the PyTorch documentation for torch.nn.functional.cross_entropy.

Complete, copy/paste runnable example showing a categorical cross-entropy loss calculation via:
- paper + pencil + calculator
- NumPy
- PyTorch
Other than minor rounding differences, all 3 come out the same:
import torch
import torch.nn.functional as F
import numpy as np
def main():
    ### paper + pencil + calculator calculation #################
    """
    predictions before softmax:
                      columns
                   (4 categories)
        rows         1, 4, 1, 1
    (3 samples)      5, 1, 2, 1
                     1, 2, 5, 1
    ground truths (NOT one hot encoded)
    1, 0, 2
    preds softmax calculation:
    (e^1/(e^1+e^4+e^1+e^1)), (e^4/(e^1+e^4+e^1+e^1)), (e^1/(e^1+e^4+e^1+e^1)), (e^1/(e^1+e^4+e^1+e^1))
    (e^5/(e^5+e^1+e^2+e^1)), (e^1/(e^5+e^1+e^2+e^1)), (e^2/(e^5+e^1+e^2+e^1)), (e^1/(e^5+e^1+e^2+e^1))
    (e^1/(e^1+e^2+e^5+e^1)), (e^2/(e^1+e^2+e^5+e^1)), (e^5/(e^1+e^2+e^5+e^1)), (e^1/(e^1+e^2+e^5+e^1))
    preds after softmax:
    0.04332, 0.87005, 0.04332, 0.04332
    0.92046, 0.01686, 0.04583, 0.01686
    0.01686, 0.04583, 0.92046, 0.01686
    categorical cross-entropy loss calculation:
    (-ln(0.87005) + -ln(0.92046) + -ln(0.92046)) / 3 = 0.10166
    Note the loss ends up relatively low because all 3 predictions are correct
    """
    ### calculation via NumPy ###################################
    # predictions from model (just made up example data in this case)
    # rows = 3 samples, cols = 4 categories
    preds = np.array([[1, 4, 1, 1],
                      [5, 1, 2, 1],
                      [1, 2, 5, 1]], dtype=np.float32)
    # ground truths, NOT one hot encoded
    gndTrs = np.array([1, 0, 2], dtype=np.int64)
    preds = softmax(preds)
    loss = calcCrossEntropyLoss(preds, gndTrs)
    print('\n' + 'NumPy loss = ' + str(loss) + '\n')
    ### calculation via PyTorch #################################
    # predictions from model (just made up example data in this case)
    # rows = 3 samples, cols = 4 categories
    preds = torch.tensor([[1, 4, 1, 1],
                          [5, 1, 2, 1],
                          [1, 2, 5, 1]], dtype=torch.float32)
    # ground truths, NOT one hot encoded
    gndTrs = torch.tensor([1, 0, 2], dtype=torch.int64)
    loss = F.cross_entropy(preds, gndTrs)
    print('PyTorch loss = ' + str(loss) + '\n')
# end function
def softmax(x: np.ndarray) -> np.ndarray:
    numSamps = x.shape[0]
    for i in range(numSamps):
        x[i] = np.exp(x[i]) / np.sum(np.exp(x[i]))
    # end for
    return x
# end function
def calcCrossEntropyLoss(preds: np.ndarray, gndTrs: np.ndarray) -> np.ndarray:
    assert len(preds.shape) == 2
    assert len(gndTrs.shape) == 1
    assert preds.shape[0] == gndTrs.shape[0]
    numSamps = preds.shape[0]
    mySum = 0.0
    for i in range(numSamps):
        # Note: in numpy, "log" is actually natural log (ln)
        mySum += -1 * np.log(preds[i, gndTrs[i]])
    # end for
    crossEntLoss = mySum / numSamps
    return crossEntLoss
# end function
if __name__ == '__main__':
    main()
program output:
NumPy loss = 0.10165966302156448
PyTorch loss = tensor(0.1017)
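Note that the softmax helper above overwrites the rows of the array passed to it. A vectorized variant that leaves its input untouched could look like the sketch below. The names softmax_rows and cross_entropy are hypothetical, and subtracting the row maximum is a common stability trick that this sketch adds; it is not part of the answer's code.

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax that returns a new array instead of modifying logits."""
    # Subtracting each row's maximum leaves the result unchanged
    # but keeps np.exp from overflowing on large logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean negative log-probability of the correct class of each row."""
    probs = softmax_rows(logits)
    picked = probs[np.arange(len(targets)), targets]
    return float(-np.log(picked).mean())

preds = np.array([[1, 4, 1, 1],
                  [5, 1, 2, 1],
                  [1, 2, 5, 1]], dtype=np.float64)
gnd_trs = np.array([1, 0, 2])

loss = cross_entropy(preds, gnd_trs)
print(f"{loss:.4f}")  # 0.1017
print(preds[0])       # still the raw logits: [1. 4. 1. 1.]
```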

Errors from building a one hidden neural network

I'm currently building a 3-4-1 neural network from scratch using NumPy (I avoided Keras and TensorFlow in order to learn and to demonstrate my knowledge, rather than have pre-built libraries do all the work). The problems I find when I run the program are:
1/ I get "nan" values in the updated weights after a certain number of iterations; lowering the learning rate only delays the problem and doesn't solve it.
2/ The prediction accuracy is very low.
I would like to know what causes these bugs in my program and would appreciate any help.
Here is the code:
# Import our dependencies
from numpy import exp, array, random, dot, ones_like, where
# Create our Artificial Neural Network class
class ArtificialNeuralNetwork():
    # initializing the class
    def __init__(self):
        # generating the same synaptic weights every time the program runs
        random.seed(1)
        # synaptic weights (3 × 4 Matrix) of the hidden layer
        self.w_ij = 2 * random.rand(3, 4) - 1
        # synaptic weights (4 × 1 Matrix) of the output layer
        self.w_jk = 2 * random.rand(4, 1) - 1
    def LeakyReLU(self, x):
        # The Leaky ReLU (short for Rectified Linear Unit) activation function will be applied to the inputs of the hidden layer
        # The activation function will return the same value of x if x is positive
        # while it will multiply the negative values of x by the alpha parameter
        # we used in this example the Leaky ReLU instead of the standard ReLU activation function to avoid the dying ReLU problem
        return where(x > 0, x, x * 0.01)
    def LeakyReLUDerivative(self, x, α = 0.01):
        # The Leaky ReLU Derivative will return 1 for every positive value in the x array
        # while returning the value of the parameter alpha for every negative value
        x[x > 0] = 1 # returns 1 for every positive value in the x array
        x[x <= 0] = α # returns α for every negative value in the x array
        return x
    def Sigmoid(self, x):
        # The Sigmoid activation function will turn every input value into probabilities between 0 and 1
        # the probabilistic values help us assert which class x belongs to
        return 1 / (1 + exp(-x))
    def SigmoidDerivative(self, x):
        # The derivative of the Sigmoid activation function will be used to calculate the gradient during the backpropagation process
        # and help optimize the random starting synaptic weights
        return x * (1 - x)
    def train(self, x, y, learning_rate, iterations):
        # x: training set of data
        # y: the actual output of the training data
        for i in range(iterations):
            z_ij = dot(x, self.w_ij) # the dot product of the weights of the hidden layer and the inputs
            a_ij = self.LeakyReLU(z_ij) # using the Leaky ReLU activation function to introduce non-linearity to our Neural Network
            z_jk = dot(a_ij, self.w_jk) # the same precedent process will be applied to find the last input of the output layer
            a_jk = self.Sigmoid(z_jk) # this time the Sigmoid activation function will be used instead of Leaky ReLU
            dl_jk = -y/a_jk + (1 - y)/(1 - a_jk) # calculating the derivative of the cross entropy loss wrt output
            da_jk = self.SigmoidDerivative(a_jk) # calculating the derivative of the Sigmoid activation function wrt the input (before activation) of the output layer
            dz_jk = a_ij # calculating the derivative of the inputs of the hidden layer (before activation) wrt weights of the output layer
            dl_ij = dot(da_jk * dl_jk, self.w_jk.T) # calculating the derivative of the cross entropy loss wrt activated input of the hidden layer
            # to do so we multiply the derivative of the cross entropy loss wrt output by the derivative of the Sigmoid activation function wrt the input (before activation) of the output layer by the derivative of the inputs of the hidden layer (before activation) wrt weights of the output layer
            da_ij = self.LeakyReLUDerivative(z_ij) # calculating the derivative of the Leaky ReLU activation function wrt the inputs of the hidden layer (before activation)
            dz_ij = x # calculating the derivative of the inputs of the hidden layer (before activation) wrt weights of the hidden layer
            # calculating the gradient using the chain rule
            gradient_ij = dot(dz_ij.T , dl_ij * da_ij)
            gradient_jk = dot(dz_jk.T , dl_jk * da_jk)
            # calculating the new optimal weights
            self.w_ij = self.w_ij - learning_rate * gradient_ij
            self.w_jk = self.w_jk - learning_rate * gradient_jk
    def predict(self, inputs):
        # predicting the class of the input data after weights optimization
        output_from_layer1 = self.LeakyReLU(dot(inputs, self.w_ij)) # the output of the hidden layer
        output_from_layer2 = self.Sigmoid(dot(output_from_layer1, self.w_jk)) # the output of the output layer
        return output_from_layer1, output_from_layer2
    # the function will print the initial starting weights before training
    def SynapticWeights(self):
        print("Layer 1 (4 neurons, each with 3 inputs): ")
        print("w_ij: ", self.w_ij)
        print("Layer 2 (1 neuron, with 4 inputs): ")
        print("w_jk: ", self.w_jk)
def main():
    ANN = ArtificialNeuralNetwork()
    ANN.SynapticWeights()
    # the training inputs
    x = array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]])
    # the training outputs
    y = array([[0, 1, 1, 1, 1, 0, 0]]).T
    ANN.train(x, y, 1, 10000)
    # Printing the new synaptic weights after training
    print("New synaptic weights after training: ")
    print("w_ij: ", ANN.w_ij)
    print("w_jk: ", ANN.w_jk)
    # Our prediction after feeding the ANN with new set of data
    print("Considering new situation [1, 1, 0] -> ?: ")
    print(ANN.predict(array([[1, 1, 0]])))
if __name__=="__main__":
    main()

So, I changed a few things. (Disclaimer: I didn't check the correctness of the code.)
Weight initialization: initialize to much smaller weights.
# synaptic weights (3 × 4 Matrix) of the hidden layer
self.w_ij = (2 * random.rand(3, 4) - 1)*0.1
# synaptic weights (4 × 1 Matrix) of the output layer
self.w_jk = (2 * random.rand(4, 1) - 1)*0.1
Weight initialization really matters.
I reduced the learning rate to 0.1.
ANN.train(x, y, .1, 500000)
I see the neural network fitting your data perfectly and not producing NaN even after 500,000 iterations.
print(ANN.predict(array([[0, 0, 1],
                         [0, 1, 1],
                         [1, 0, 1],
                         [0, 1, 0],
                         [1, 0, 0],
                         [1, 1, 1],
                         [0, 0, 0]])))
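Separately from the initialization, one plausible source of the NaN values is the division in dl_jk once the sigmoid output saturates at exactly 0 or 1. The answer above does not confirm this; it is offered as an assumption. The product of the cross-entropy derivative and the sigmoid derivative simplifies algebraically to a_jk - y, which needs no division. The sketch below, with hypothetical function names, compares the two forms.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_delta_naive(a, y):
    # Cross-entropy derivative times sigmoid derivative, as two separate factors
    return (-y / a + (1 - y) / (1 - a)) * (a * (1 - a))

def output_delta_simplified(a, y):
    # Algebraically the same quantity, with no division
    return a - y

# For moderate activations both forms agree.
z = np.array([[-2.0], [0.5], [3.0]])
y = np.array([[0.0], [1.0], [1.0]])
a = sigmoid(z)
print(np.allclose(output_delta_naive(a, y), output_delta_simplified(a, y)))  # True

# sigmoid(40) rounds to exactly 1.0 in float64, so the naive form hits 0/0.
a_sat = sigmoid(np.array([[40.0]]))
y_sat = np.array([[1.0]])
with np.errstate(divide="ignore", invalid="ignore"):
    naive_sat = output_delta_naive(a_sat, y_sat)
print(np.isnan(naive_sat).any())               # True
print(output_delta_simplified(a_sat, y_sat))   # [[0.]]
```

In the train method, that would mean using a_jk - y in place of dl_jk * da_jk in both gradient expressions.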

Softmax activation with cross entropy loss results in the outputs converging to exactly 0 and 1 for both classes, respectively

I have implemented a simple Neural Net with just a single sigmoid hidden layer, with the choice of a sigmoid or softmax output layer and squared error or cross entropy loss function, respectively. After much research on the softmax activation function, the cross entropy loss, and their derivatives (and with following this blog) I believe that my implementation seems correct.
When attempting to learn the simple XOR function, the NN with the sigmoid output learns to a very small loss very quickly when using single binary outputs of 0 and 1. However, when changing the labels to one-hot encodings of [1, 0] = 0 and [0, 1] = 1, the softmax implementation does not work. The loss consistently increases as the network's outputs converge to exactly [0, 1] for the two outputs on every input, yet the labels of the data set are perfectly balanced between [0, 1] and [1, 0].
My code is below, where the choice of using sigmoid or softmax at the output layer can be chosen by uncommenting the necessary two lines near the bottom of the code. I cannot figure out why the softmax implementation is not working.
import numpy as np
class MLP:
    def __init__(self, numInputs, numHidden, numOutputs, activation):
        self.numInputs = numInputs
        self.numHidden = numHidden
        self.numOutputs = numOutputs
        self.activation = activation.upper()
        self.IH_weights = np.random.rand(numInputs, numHidden) # Input -> Hidden
        self.HO_weights = np.random.rand(numHidden, numOutputs) # Hidden -> Output
        self.IH_bias = np.zeros((1, numHidden))
        self.HO_bias = np.zeros((1, numOutputs))
        # Gradients corresponding to weight matrices computed during backprop
        self.IH_w_gradients = np.zeros_like(self.IH_weights)
        self.HO_w_gradients = np.zeros_like(self.HO_weights)
        # Gradients corresponding to biases computed during backprop
        self.IH_b_gradients = np.zeros_like(self.IH_bias)
        self.HO_b_gradients = np.zeros_like(self.HO_bias)
        # Input, hidden and output layer neuron values
        self.I = np.zeros(numInputs) # Inputs
        self.L = np.zeros(numOutputs) # Labels
        self.H = np.zeros(numHidden) # Hidden
        self.O = np.zeros(numOutputs) # Output
    # ##########################################################################
    # ACTIVATION FUNCTIONS
    # ##########################################################################
    def sigmoid(self, x, derivative=False):
        if derivative:
            return x * (1 - x)
        return 1 / (1 + np.exp(-x))
    def softmax(self, prediction, label=None, derivative=False):
        if derivative:
            return prediction - label
        return np.exp(prediction) / np.sum(np.exp(prediction))
    # ##########################################################################
    # LOSS FUNCTIONS
    # ##########################################################################
    def squaredError(self, prediction, label, derivative=False):
        if derivative:
            return (-2 * prediction) + (2 * label)
        return (prediction - label) ** 2
    def crossEntropy(self, prediction, label, derivative=False):
        if derivative:
            return [-(y / x) for x, y in zip(prediction, label)] # NOT NEEDED ###############################
        return - np.sum([y * np.log(x) for x, y in zip(prediction, label)])
    # ##########################################################################
    def forward(self, inputs):
        self.I = np.array(inputs).reshape(1, self.numInputs) # [numInputs, ] -> [1, numInputs]
        self.H = self.I.dot(self.IH_weights) + self.IH_bias
        self.H = self.sigmoid(self.H)
        self.O = self.H.dot(self.HO_weights) + self.HO_bias
        if self.activation == 'SIGMOID':
            self.O = self.sigmoid(self.O)
        elif self.activation == 'SOFTMAX':
            self.O = self.softmax(self.O) + 1e-10 # avoids log(0)
        return self.O
    def backward(self, labels):
        self.L = np.array(labels).reshape(1, self.numOutputs) # [numOutputs, ] -> [1, numOutputs]
        if self.activation == 'SIGMOID':
            self.O_error = self.squaredError(self.O, self.L)
            self.O_delta = self.squaredError(self.O, self.L, derivative=True) * self.sigmoid(self.O, derivative=True)
        elif self.activation == 'SOFTMAX':
            self.O_error = self.crossEntropy(self.O, self.L)
            self.O_delta = self.softmax(self.O, self.L, derivative=True)
        self.H_error = self.O_delta.dot(self.HO_weights.T)
        self.H_delta = self.H_error * self.sigmoid(self.H, derivative=True)
        self.IH_w_gradients += self.I.T.dot(self.H_delta)
        self.HO_w_gradients += self.H.T.dot(self.O_delta)
        self.IH_b_gradients += self.H_delta
        self.HO_b_gradients += self.O_delta
        return self.O_error
    def updateWeights(self, learningRate):
        self.IH_weights += learningRate * self.IH_w_gradients
        self.HO_weights += learningRate * self.HO_w_gradients
        self.IH_bias += learningRate * self.IH_b_gradients
        self.HO_bias += learningRate * self.HO_b_gradients
        self.IH_w_gradients = np.zeros_like(self.IH_weights)
        self.HO_w_gradients = np.zeros_like(self.HO_weights)
        self.IH_b_gradients = np.zeros_like(self.IH_bias)
        self.HO_b_gradients = np.zeros_like(self.HO_bias)
sigmoidData = [
    [[0, 0], 0],
    [[0, 1], 1],
    [[1, 0], 1],
    [[1, 1], 0]
]
softmaxData = [
    [[0, 0], [1, 0]],
    [[0, 1], [0, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [1, 0]]
]
sigmoidMLP = MLP(2, 10, 1, 'SIGMOID')
softmaxMLP = MLP(2, 10, 2, 'SOFTMAX')
# SIGMOID #######################
# data = sigmoidData
# mlp = sigmoidMLP
# ###############################
# SOFTMAX #######################
data = softmaxData
mlp = softmaxMLP
# ###############################
numEpochs = 5000
for epoch in range(numEpochs):
    losses = []
    for i in range(len(data)):
        print(mlp.forward(data[i][0])) # Print outputs
        # mlp.forward(data[i][0]) # Don't print outputs
        loss = mlp.backward(data[i][1])
        losses.append(loss)
    mlp.updateWeights(0.001)
    # if epoch % 1000 == 0 or epoch == numEpochs - 1: # Print loss every 1000 epochs
    print(np.mean(losses)) # Print loss every epoch

Contrary to all the information online, simply changing the derivative of the softmax cross entropy from prediction - label to label - prediction solved the problem. Perhaps I have something else backwards somewhere since every source I have come across has it as prediction - label.
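A possible explanation, offered as a reading of the code above rather than something the answer confirms: updateWeights adds learningRate times the accumulated gradients, and the squaredError derivative is written as 2 * label - 2 * prediction, which is the negative of the true derivative. The sigmoid path therefore already carries a flipped sign, while prediction - label is the true gradient, so adding it moves the loss uphill. The sketch below checks this on made-up logits for a single sample; all names and values are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def cross_entropy(z, label):
    return float(-np.sum(label * np.log(softmax(z))))

z = np.array([0.3, -0.2])     # made-up logits for one sample
label = np.array([1.0, 0.0])  # one-hot target
grad = softmax(z) - label     # prediction - label: the true dLoss/dz

step = 0.1
loss_before = cross_entropy(z, label)
loss_descent = cross_entropy(z - step * grad, label)  # subtract the gradient
loss_ascent = cross_entropy(z + step * grad, label)   # add it, as "+=" does

print(loss_descent < loss_before < loss_ascent)  # True
```

Under this reading, both fixes are equivalent: keep prediction - label and update with -=, or keep += and use label - prediction.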

Trouble implementing softmax activation and cross-entropy loss, and their derivatives in a neural net

I have implemented a simple multi-layer perceptron (with just 1 hidden layer) which can learn regression problems. I have written it so that the choice between sigmoid, tanh and relu activations can be specified. The squared error is then implemented as the loss function with each of these.
I now want to allow the choice to use the same model to learn multi-class classification problems, and so would like to implement the choice to use the softmax activation along with the cross-entropy loss. In my code below, the only changes that would need to be made (I hope) are to implement these in the activation() and loss() functions, and this should then work out of the box in both the forward pass and the backprop. This code runs a simulation of my model learning the XOR function, where the chosen activation function should be uncommented at the top.
However, I am really lost with implementing both of these functions, and even more so their derivatives. Any help and guidance is appreciated.
import sys
import numpy as np
activation = 'sigmoid'
# activation = 'tanh'
# activation = 'relu'
# activation = 'softmax'
numEpochs = 10000
class DataSet:
    def __init__(self, data, trainSplit=1):
        self.size = len(data)
        self.trainSize = int(self.size * trainSplit)
        self.testSize = self.size - self.trainSize
        self.inputs, self.labels = [], []
        for i in range(len(data)):
            self.inputs.append(data[i][0])
            self.labels.append(data[i][1])
        self.trainInputs = self.inputs[:self.trainSize]
        self.trainLabels = self.labels[:self.trainSize]
        self.testInputs = self.inputs[self.trainSize:]
        self.testLabels = self.labels[self.trainSize:]
        try:
            self.numInputs = len(self.inputs[0])
        except TypeError:
            self.numInputs = 1
        try:
            self.numOutputs = len(self.labels[0])
        except TypeError:
            self.numOutputs = 1
class MLP:
    def __init__(self, numInputs, numHidden, numOutputs, activationFunction):
        # MLP architecture sizes
        self.numInputs = numInputs
        self.numHidden = numHidden
        self.numOutputs = numOutputs
        self.activationFunction = activationFunction.lower()
        # MLP weights
        self.IH_weights = np.random.rand(numInputs, numHidden) # Input -> Hidden
        self.HO_weights = np.random.rand(numHidden, numOutputs) # Hidden -> Output
        # MLP biases
        self.IH_bias = np.zeros((1, numHidden))
        self.HO_bias = np.zeros((1, numOutputs))
        # Gradients corresponding to weight matrices computed during backprop
        self.IH_w_gradients = np.zeros_like(self.IH_weights)
        self.HO_w_gradients = np.zeros_like(self.HO_weights)
        # Gradients corresponding to biases computed during backprop
        self.IH_b_gradients = np.zeros_like(self.IH_bias)
        self.HO_b_gradients = np.zeros_like(self.HO_bias)
        # Input, hidden and output layer neuron values
        self.I = np.zeros(numInputs) # Inputs
        self.L = np.zeros(numOutputs) # Labels
        self.H = np.zeros(numHidden) # Hidden
        self.O = np.zeros(numOutputs) # Output
    def activation(self, x, derivative=False):
        if self.activationFunction == 'sigmoid':
            if derivative:
                return x * (1 - x)
            return 1 / (1 + np.exp(-x))
        if self.activationFunction == 'tanh':
            if derivative:
                return 1 - np.tanh(x) ** 2
            return np.tanh(x)
        if self.activationFunction == 'relu':
            if derivative:
                return (x > 0).astype(float)
            return np.maximum(0, x)
        # TO DO ################################################################
        if self.activationFunction == 'softmax':
            if derivative:
                return 0
            return 0
        print("ERROR: Activation function not found.")
        sys.exit()
    def loss(self, labels, predictions, derivative=False):
        # TO DO ################################################################
        # Cross-Entropy
        if self.activationFunction == 'softmax':
            if derivative:
                return 0
            return 0
        # Squared Error
        else:
            if derivative:
                return (-2 * labels) + (2 * predictions)
            return (labels - predictions) ** 2
    def forward(self, inputs):
        # Ensure that inputs is a list
        try:
            len(inputs)
        except TypeError:
            inputs = [inputs]
        self.I = np.array(inputs).reshape(1, self.numInputs)
        self.H = self.I.dot(self.IH_weights) + self.IH_bias
        self.H = self.activation(self.H)
        self.O = self.H.dot(self.HO_weights) + self.HO_bias
        self.O = self.activation(self.O)
    def backwards(self, labels):
        # Ensure that labels is a list
        try:
            len(labels)
        except TypeError:
            labels = [labels]
        self.L = np.array(labels)
        self.O_error = self.loss(self.O, self.L)
        self.O_delta = self.loss(self.O, self.L, derivative=True) * self.activation(self.O, derivative=True)
        self.H_error = self.O_delta.dot(self.HO_weights.T)
        self.H_delta = self.H_error * self.activation(self.H, derivative=True)
        self.IH_w_gradients += self.I.T.dot(self.H_delta)
        self.HO_w_gradients += self.H.T.dot(self.O_delta)
        self.IH_b_gradients += self.H_delta
        self.HO_b_gradients += self.O_delta
        return self.O_error
    def updateWeights(self, learningRate):
        self.IH_weights += learningRate * self.IH_w_gradients
        self.HO_weights += learningRate * self.HO_w_gradients
        self.IH_bias += learningRate * self.IH_b_gradients
        self.HO_bias += learningRate * self.HO_b_gradients
        self.IH_w_gradients = np.zeros_like(self.IH_weights)
        self.HO_w_gradients = np.zeros_like(self.HO_weights)
        self.IH_b_gradients = np.zeros_like(self.IH_bias)
        self.HO_b_gradients = np.zeros_like(self.HO_bias)
    def process(self, data, train=False, learningRate=0):
        if train:
            size = data.trainSize
            inputs = data.trainInputs
            labels = data.trainLabels
        else:
            size = data.testSize
            inputs = data.testInputs
            labels = data.testLabels
        errors = []
        for i in range(size):
            self.forward(inputs[i])
            errors.append(self.backwards(labels[i]))
        if train:
            self.updateWeights(learningRate)
        return np.mean(errors)
data1 = DataSet([
    [[0, 0], 0],
    [[0, 1], 1],
    [[1, 0], 1],
    [[1, 1], 0]
])
data2 = DataSet([
    [[0, 0], -1],
    [[0, 1], 1],
    [[1, 0], 1],
    [[1, 1], -1]
])
data3 = DataSet([
    [[0, 0], [1, 0]],
    [[0, 1], [0, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [1, 0]]
])
if activation == 'sigmoid':
    data = data1
    mlp = MLP(data.numInputs, 2, data.numOutputs, 'sigmoid')
    learningRate = 1
if activation == 'tanh':
    data = data2
    mlp = MLP(data.numInputs, 2, data.numOutputs, 'tanh')
    learningRate = 0.1
if activation == 'relu':
    data = data1
    mlp = MLP(data.numInputs, 2, data.numOutputs, 'relu')
    learningRate = 0.001
if activation == 'softmax':
    data = data3
    mlp = MLP(data.numInputs, 2, data.numOutputs, 'softmax')
    learningRate = 0.01
################################################################################
# TO DO: UPDATE WEIGHTS AT INTERVALS, NOT EVERY EPOCH
################################################################################
losses = []
for epoch in range(numEpochs):
    epochLoss = mlp.process(data, train=True, learningRate=learningRate)
    losses.append(epochLoss)
    if epoch % 1000 == 0 or epoch == numEpochs - 1:
        print("EPOCH:", epoch)
        print("LOSS: ", epochLoss, "\n")

Unfortunately, softmax is not as easy as the other activation functions you have posted. For the activation function, you must calculate exp(y_i) and then divide by the sum of exp(y_k) over every y_k in Y. For the derivative, you must calculate every combination (n^2 combinations) of partial derivatives of every output with respect to every input of the neuron. Luckily, the loss is a little easier to understand: you can think of softmax as giving you probabilities (so the output resembles a probability distribution), and you calculate the cross-entropy between the returned values and the target ones.
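A minimal NumPy sketch of those pieces for a single sample, under the assumption of one-hot labels. The function names and the example values are hypothetical, and this is not wired into the MLP class above. It builds the n x n matrix of partial derivatives and checks that chaining it with the cross-entropy derivative collapses to prediction minus label.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def softmax_jacobian(p: np.ndarray) -> np.ndarray:
    # J[i, k] = d p_i / d z_k = p_i * (delta_ik - p_k): the n^2 partials
    return np.diag(p) - np.outer(p, p)

def cross_entropy(p: np.ndarray, label: np.ndarray) -> float:
    return float(-np.sum(label * np.log(p)))

z = np.array([1.0, 2.0, 0.5])      # made-up pre-activations
label = np.array([0.0, 1.0, 0.0])  # one-hot target
p = softmax(z)

# Chain rule: dLoss/dz = J^T @ dLoss/dp, with dLoss/dp = -label / p
dloss_dp = -label / p
dloss_dz = softmax_jacobian(p).T @ dloss_dp

print(np.allclose(dloss_dz, p - label))  # True
```

Because of this simplification, implementations often skip the full Jacobian and use p - label directly as the output delta when softmax is paired with cross-entropy.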
