I am working on a simple neural network in Keras with Tensorflow. There is a significant jump in loss value from the last mini-batch of epoch L-1 to the first mini-batch of epoch L.
I am aware that the loss should decrease as the number of iterations increases, but a significant jump in loss after each epoch does look strange. Here is the code snippet:
tf.keras.initializers.he_uniform(seed=None)
initializer = tf.keras.initializers.he_uniform()
from tensorflow.keras import backend as K

def my_loss(y_true, y_pred):
    epsilon = 1e-30  # epsilon is added to avoid inf/nan
    y_pred = K.cast(y_pred, K.floatx())
    y_true = K.cast(y_true, K.floatx())
    loss = y_true * K.log(y_pred + epsilon) + (1 - y_true) * K.log(1 - y_pred + epsilon)
    loss = K.mean(loss, axis=-1)
    loss = K.mean(loss)
    loss = -1 * loss
    return loss
inputs = tf.keras.Input(shape=(140,))
x = tf.keras.layers.Dense(1000,kernel_initializer=initializer)(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dense(1000,kernel_initializer=initializer)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(1000,kernel_initializer=initializer)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(100, kernel_initializer=initializer)(x)
outputs = tf.keras.activations.sigmoid(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
opt = tf.keras.optimizers.Adam()
recall1 = tf.keras.metrics.Recall(top_k = 8)
c_entropy = tf.keras.losses.BinaryCrossentropy()
model.compile(loss=c_entropy, optimizer= opt , metrics = [recall1,my_loss], run_eagerly=True)
model.fit(X_train_test, Y_train_test, epochs=epochs, batch_size=batch, shuffle=True, verbose = 1)
When I searched online, I found this article, which suggests that Keras calculates a moving average over the mini-batches. I also read somewhere that the array used for this moving average is reset after each epoch, which is why we get a very smooth curve within an epoch but a jump after it.
In order to avoid the moving average, I implemented my own loss function, which should output the loss value of the current mini-batch instead of the moving average over the batches. Since each mini-batch is different, the corresponding losses should also differ, so I expected an arbitrary (non-smooth) loss value for each mini-batch from my implementation. Instead, I obtain exactly the same values as the built-in Keras loss function.
I am unclear about:
Is Keras calculating a moving average over the mini-batches, whose array is reset after each epoch, causing the jump? If not, what is causing the jump in the loss value?
Is my implementation of the per-mini-batch loss correct? If not, how can I obtain the loss value of each mini-batch during training?
Keras does in fact show the moving average instead of the "raw" loss values. The moving-average accumulator is reset after each epoch, which is why we see a huge jump after each epoch. To acquire the raw loss values, one should implement a callback as shown below:
import keras

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        # initialize a list at the beginning of training
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
mycallback = LossHistory()
Then pass it to model.fit:
model.fit(X, Y, epochs=epochs, batch_size=batch, shuffle=True, verbose = 0, callbacks=[mycallback])
print(mycallback.losses)
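To visualize the raw per-batch losses (and the jump at each epoch boundary), a short plotting sketch, assuming matplotlib is installed:

import matplotlib.pyplot as plt

plt.plot(mycallback.losses)
plt.xlabel('mini-batch')
plt.ylabel('loss')
plt.show()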
I tested with the following configuration
Keras 2.3.1
Tensorflow 2.1.0
Python 3.7.9
For some reason, it didn't work with the following configuration
Keras 2.4.3
Tensorflow 2.2.0
Python 3.8.5
To answer the second question: the implementation of the loss function my_loss is correct, and the values obtained are very close to the values generated by the built-in function
tf.keras.losses.BinaryCrossentropy()
In TensorFlow version 2.2 and newer, the loss provided to on_train_batch_end is now the average loss of all batches up until the current batch within the given epoch. This is also the case for additional metrics, and applies to the built-in losses/metrics as well as any custom losses/metrics.
Fortunately, the loss for the current batch can be calculated from the average loss as follows:
from tensorflow.keras.callbacks import Callback

class CustomCallback(Callback):
    '''This callback converts the average loss (default behavior in TF >= 2.2)
    into the loss for only the current batch.
    '''
    def on_epoch_begin(self, epoch, logs={}):
        self.previous_loss_sum = 0

    def on_train_batch_end(self, batch, logs={}):
        # calculate the loss of the current batch:
        current_loss_sum = (batch + 1) * logs['loss']
        current_loss = current_loss_sum - self.previous_loss_sum
        self.previous_loss_sum = current_loss_sum

        # use current_loss:
        # ...
This code can be added to any custom callback that needs the loss for the current batch instead of the average loss, including the LossHistory callback provided in Doc Jazzy's answer.
Also, if you are using TensorFlow 1 or a TensorFlow 2 version <= 2.1, do not include this code in your callback, as those versions already provide the current loss instead of the average loss.
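For convenience, here is a minimal sketch that combines the conversion above with a loss-history list (the class name BatchLossHistory is just illustrative):

from tensorflow.keras.callbacks import Callback

class BatchLossHistory(Callback):
    '''Stores per-batch losses, undoing the running average reported in TF >= 2.2.'''
    def on_train_begin(self, logs={}):
        self.batch_losses = []

    def on_epoch_begin(self, epoch, logs={}):
        self.previous_loss_sum = 0.0

    def on_train_batch_end(self, batch, logs={}):
        current_loss_sum = (batch + 1) * logs['loss']
        self.batch_losses.append(current_loss_sum - self.previous_loss_sum)
        self.previous_loss_sum = current_loss_sum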
Related
I am trying to implement a neural network for predicting h1_hemoglobin in PyTorch. After creating the model, I kept 1 unit in the output layer since this is regression. But I got the error below, and I'm not able to understand the mistake. Keeping a large value like 100 in the output layer removes the error, but renders the model useless as I am trying to implement regression.
Data:
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

##### Creating Tensors
X_train = torch.tensor(X_train)
X_test = torch.tensor(X_test)
y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)
class ANN_Model(nn.Module):
    def __init__(self, input_features=4, hidden1=20, hidden2=20, out_features=1):
        super().__init__()
        self.f_connected1 = nn.Linear(input_features, hidden1)
        self.f_connected2 = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)

    def forward(self, x):
        x = F.relu(self.f_connected1(x))
        x = F.relu(self.f_connected2(x))
        x = self.out(x)
        return x
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

epochs = 500
final_losses = []
for i in range(epochs):
    i = i + 1
    y_pred = model.forward(X_train.float())
    loss = loss_function(y_pred, y_train)
    final_losses.append(loss.item())
    if i % 10 == 1:
        print("Epoch number: {} and the loss: {}".format(i, loss.item()))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Error:
Since you are performing regression: CrossEntropyLoss() internally applies the NLLLoss() function. CrossEntropyLoss() expects C logits per prediction for C classes, but you have specified only one. NLLLoss() tries to index into the prediction logits based on the ground-truth value. E.g., in your case the ground truth is a single value, 14. The loss step tries to index into the 14th logit of your prediction to get its corresponding value, so that it can compute the negative log-likelihood on it, which is essentially -log(probability_k), where k is the index given by the ground truth. Since you have only one logit in your prediction, it throws an error: index out of bounds.
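A minimal reproduction of that failure, where the single logit and the target value 14 mirror the situation described above:

import torch
import torch.nn as nn

logits = torch.randn(1, 1)              # one sample with a single logit, as the model above produces
target = torch.tensor([14])             # ground-truth value 14 is treated as a class index
nn.CrossEntropyLoss()(logits, target)   # raises an out-of-bounds indexing error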
For regression problems, you should consider using distance based losses such as MSELoss().
Try replacing your loss function, loss_function = nn.CrossEntropyLoss(), with loss_function = nn.MSELoss().
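A minimal sketch of the corresponding changes, reusing the y_train/y_test arrays from the question and assuming they hold continuous hemoglobin values (the targets are also cast to float and reshaped to match the model's (N, 1) output):

loss_function = nn.MSELoss()
y_train = torch.FloatTensor(y_train).view(-1, 1)   # instead of torch.LongTensor(y_train)
y_test = torch.FloatTensor(y_test).view(-1, 1)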
Your response variable h1_hemoglobin looks like a continuous response variable. If that's the case, please change the torch tensor type for y_train and y_test from LongTensor to FloatTensor or DoubleTensor.
According to the PyTorch docs, CrossEntropyLoss is intended for classification problems with a number of classes. Try changing your loss_function from CrossEntropyLoss to one more suitable for your continuous response variable h1_hemoglobin.
In your case, the following might do it.
y_train=torch.DoubleTensor(y_train)
y_test=torch.DoubleTensor(y_test)
...
...
loss_function = nn.MSELoss()
Pytorch MSELoss
Pytorch CrossEntropyLoss
I am training a cnn using pytorch and have created a training loop. As I am performing optimisation and experimenting with hyper-parameter tuning, I want to separate my training, validation and testing into different functions. I need to be able to record my accuracy and loss for each function in order to plot graphs. For this I want to create a function which returns the accuracy.
I am pretty new to coding and was wondering the best way to go about this. I feel like my code is a bit messy at the moment. I need to be able to feed various hyper-parameters into my training function for experimentation. Could anyone offer any advice? Below is what I have so far:
def train_model(model, optimizer, data_loader, num_epochs, criterion=criterion):
    total_epochs = notebook.tqdm(range(num_epochs))
    for epoch in total_epochs:
        model.train()
        train_correct = 0.0
        train_running_loss = 0.0
        train_total = 0.0
        for i, (img, label) in enumerate(data_loader['train']):
            # uploading images and labels to GPU
            img = img.to(device)
            label = label.to(device)
            # forward pass
            outputs = model(img)
            # computing loss
            loss = criterion(outputs, label)
            # propagating the loss backwards
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_running_loss += loss.item()
            _, predicted = outputs.max(1)
            train_total += label.size(0)
            train_correct += predicted.eq(label).sum().item()
        train_loss = train_running_loss / len(data_loader['train'])
        train_accu = 100. * train_correct / train_total
        print('Train Loss: %.3f | Train Accuracy: %.3f' % (train_loss, train_accu))
I have also experimented with making a function to record the accuracy:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))
First, note that:
Unless you have some specific motivation, validation (and testing) should be performed on a different dataset than the training set, so you should use a different DataLoader. The computation time will increase because of an additional for loop at every epoch.
Always call model.eval() before validation/testing.
That said, the signature of the validation function is pretty much the same as that of train_model:
# criterion is passed if you want to record the validation loss too
def validate_model(model, eval_loader, criterion):
    ...
Then, in train_model, after each epoch, you can call the function validate_model and store the returned metrics in some data structure (list, tensor, etc.) that will be used later for plotting.
At the end of the training, you can then use the same validate_model function for testing.
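A minimal sketch of such a validate_model, assuming the same torch, device and DataLoader setup as in the question (it mirrors the inner loop of train_model, with torch.no_grad() disabling gradient tracking during evaluation):

import torch

def validate_model(model, eval_loader, criterion):
    model.eval()
    running_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for img, label in eval_loader:
            img, label = img.to(device), label.to(device)
            outputs = model(img)
            running_loss += criterion(outputs, label).item()
            _, predicted = outputs.max(1)
            total += label.size(0)
            correct += predicted.eq(label).sum().item()
    return running_loss / len(eval_loader), 100. * correct / total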
Instead of coding the accuracy by yourself, you can use Accuracy from TorchMetrics
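For example, a minimal sketch using TorchMetrics; the task and num_classes arguments below are illustrative (recent torchmetrics versions require them, older versions accept Accuracy() with no arguments):

import torchmetrics

metric = torchmetrics.Accuracy(task="multiclass", num_classes=10).to(device)
for img, label in data_loader['val']:
    outputs = model(img.to(device))
    metric.update(outputs, label.to(device))   # accepts logits and integer labels
print(metric.compute())                        # accuracy over everything seen since the last reset
metric.reset()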
Finally, if you feel the need to level up, you can use DL training frameworks like PyTorch Lightning or FastAI. Also take a look at a hyperparameter-tuning library such as Ray Tune.
I'm kinda new to pytorch and trying to wrap my head around it.
I've read about custom loss functions and, as far as I've seen, they cannot be decoupled from the internal computational graph. This means the loss function consumes tensors, does operations on them that are implemented in PyTorch, and outputs a tensor. Is there any way to have a decoupled loss calculation and plug it back in somehow?
Use case
I'm trying to train an encoder where the latent space will be optimized towards some statistical quality. This means I don't train in batches, and I calculate a single loss value for the whole epoch and the whole data set. Is it even possible to teach a net that way?
class Encoder(nn.Module):
    def __init__(self, genome_size: int):
        super(Encoder, self).__init__()
        self.fc1 = nn.Linear(genome_size, genome_size)
        self.fc2 = nn.Linear(genome_size, genome_size)
        self.fc3 = nn.Linear(genome_size, genome_size)
        self.genome_size = genome_size

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
def train_encoder(
    net: nn.Module,
    optimizer: Optimizer,
    epochs: int,
    population: Tensor,
    fitness: Tensor,
):
    running_loss = 0.0
    for epoch in range(epochs):
        optimizer.zero_grad()
        outputs = net(population)
        # encoder_loss is computationally heavy and cannot be done only on tensors
        # I need to unwrap those tensors to numpy arrays and use them as an input to another model
        loss = encoder_loss(outputs, fitness)
        running_loss += loss
        running_loss.backward()
        optimizer.step()
        print('Encoder loss:', loss)
I've seen some examples with accumulated running_loss, but my encoder is unable to learn anything. Convergence plot just jumps all over the place.
Thanks for your time <3
I'm training a model with tensorflow 2.0. The images in my training set are of different resolutions. The Model I've built can handle variable resolutions (conv layers followed by global averaging). My training set is very small and I want to use full training set in a single batch.
Since my images are of different resolutions, I can't use model.fit(). So, I'm planning to pass each sample through the network individually, accumulate the errors/gradients and then apply one optimizer step. I'm able to compute loss values, but I don't know how to accumulate the losses/gradients. How can I accumulate the losses/gradients and then apply a single optimizer step?
Code:
for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0
    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        gradients = tape.gradient(loss_value, self.model.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        total_loss += loss_value
    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')
If I understand correctly from this statement:
How can I accumulate the losses/gradients and then apply a single optimizer step?
@Nagabhushan is trying to accumulate gradients and then apply the optimization on the (mean) accumulated gradient. The answer provided by @TensorflowSupport does not answer it.
In order to perform the optimization only once, and accumulate the gradient from several tapes, you can do the following:
for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0

    # get trainable variables
    train_vars = self.model.trainable_variables
    # Create empty gradient list (not a tf.Variable list)
    accum_gradient = [tf.zeros_like(this_var) for this_var in train_vars]

    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        total_loss += loss_value

        # get gradients of this tape
        gradients = tape.gradient(loss_value, train_vars)
        # Accumulate the gradients
        accum_gradient = [(accum_grad + grad) for accum_grad, grad in zip(accum_gradient, gradients)]

    # Now, after executing all the tapes you needed, we apply the optimization step
    # (but first we take the average of the gradients)
    accum_gradient = [this_grad / num_samples for this_grad in accum_gradient]
    # apply optimization step
    self.optimizer.apply_gradients(zip(accum_gradient, train_vars))

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')
Using tf.Variable() should be avoided inside the training loop, since it will produce errors when trying to execute the code as a graph. If you use tf.Variable() inside your training function and then decorate it with @tf.function, or apply tf.function(my_train_fcn) to obtain a graph function (i.e. for improved performance), the execution will raise an error.
This happens because tracing the tf.Variable creation results in a different behaviour than that observed in eager execution (re-utilization vs. creation, respectively). You can find more info on this in the TensorFlow documentation.
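A small sketch of the failure mode described above; the unguarded variable creation inside the decorated function is the problem:

import tensorflow as tf

@tf.function
def train_step(x):
    # a new tf.Variable would be created on every trace of this function,
    # which tf.function rejects with a ValueError about variable creation
    acc = tf.Variable(tf.zeros_like(x))
    acc.assign_add(x)
    return acc

train_step(tf.ones(3))   # raises ValueError

The fix is to create such accumulator variables once, outside the decorated function, and only assign to them inside it.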
In line with the Stack Overflow answer and the explanation provided on the TensorFlow website, below is the code for accumulating gradients in TensorFlow 2.0:
def train(epochs):
    for epoch in range(epochs):
        for (batch, (images, labels)) in enumerate(dataset):
            with tf.GradientTape() as tape:
                logits = mnist_model(images, training=True)
                tvs = mnist_model.trainable_variables
                accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
                zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
                loss_value = loss_object(labels, logits)

            loss_history.append(loss_value.numpy().mean())
            grads = tape.gradient(loss_value, tvs)
            #print(grads[0].shape)
            #print(accum_vars[0].shape)
            accum_ops = [accum_vars[i].assign_add(grad) for i, grad in enumerate(grads)]
            optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))
        print('Epoch {} finished'.format(epoch))
# Call the above function
train(epochs = 3)
Complete code can be found in this Github Gist.
I am experimenting with TensorFlow 2.0 (alpha). I want to implement a simple feed-forward network with two output nodes for binary classification (it's a 2.0 version of this model).
This is a simplified version of the script. After I defined a simple Sequential() model, I set:
# import layers + dropout & activation
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.activations import elu, softmax

# Neural Network Architecture
n_input = X_train.shape[1]
n_hidden1 = 15
n_hidden2 = 10
n_output = y_train.shape[1]

model = tf.keras.models.Sequential([
    Dense(n_input, input_shape=(n_input,), activation=elu),   # Input layer
    Dropout(0.2),
    Dense(n_hidden1, activation=elu),                          # hidden layer 1
    Dropout(0.2),
    Dense(n_hidden2, activation=elu),                          # hidden layer 2
    Dropout(0.2),
    Dense(n_output, activation=softmax)                        # Output layer
])
# define loss and accuracy
bce_loss = tf.keras.losses.BinaryCrossentropy()
accuracy = tf.keras.metrics.BinaryAccuracy()

# define optimizer
optimizer = tf.optimizers.Adam(learning_rate=0.001)

# save training progress in lists
loss_history = []
accuracy_history = []

# loop over 1000 epochs
for epoch in range(1000):

    with tf.GradientTape() as tape:
        # take binary cross-entropy (bce_loss)
        current_loss = bce_loss(model(X_train), y_train)

        # Update weights based on the gradient of the loss function
        gradients = tape.gradient(current_loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    # save in history vectors
    current_loss = current_loss.numpy()
    loss_history.append(current_loss)

    accuracy.update_state(model(X_train), y_train)
    current_accuracy = accuracy.result().numpy()
    accuracy_history.append(current_accuracy)

    # print loss and accuracy scores each 100 epochs
    if (epoch + 1) % 100 == 0:
        print(str(epoch+1) + '.\tTrain Loss: ' + str(current_loss) + ',\tAccuracy: ' + str(current_accuracy))

    accuracy.reset_states()

print('\nTraining complete.')
Training runs without errors; however, strange things happen:
Sometimes the network doesn't learn anything. All loss and accuracy scores remain constant throughout all the epochs.
Other times the network is learning, but very, very badly. Accuracy never went beyond 0.4 (while in TensorFlow 1.x I got an effortless 0.95+). Such low performance suggests to me that something went wrong in the training.
Other times, the accuracy is very slowly improving, while the loss remains constant all the time.
What can cause these problems? Please help me understand my mistakes.
UPDATE:
After some corrections, I can make the network learn. However, its performance is extremely poor. After 1000 epochs it reaches about 40% accuracy, which clearly means something is still wrong. Any help is appreciated.
tf.GradientTape records every operation that happens inside its scope.
You don't want to record the gradient computation in the tape; you only want to record the forward computation of the loss.
with tf.GradientTape() as tape:
    # take binary cross-entropy (bce_loss)
    current_loss = bce_loss(model(df), classification)
# End of tape scope

# Update weights based on the gradient of the loss function
gradients = tape.gradient(current_loss, model.trainable_variables)
# The tape is now consumed
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
More importantly, I don't see the loop over the training set, therefore I suppose the complete code looks something like:
for epoch in range(n_epochs):
    for df, classification in dataset:
        # your code that computes loss and trains
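If the training data lives in NumPy arrays (X_train, y_train as in the question), that dataset can be built with tf.data, for instance (the batch size of 32 here is an arbitrary choice):

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
dataset = dataset.shuffle(buffer_size=len(X_train)).batch(32)

for epoch in range(n_epochs):
    for df, classification in dataset:
        pass  # compute the loss inside the tape and apply the gradients, as shown above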
Moreover, the usage of the metrics is wrong.
You want to accumulate, and thus update, the internal state of the accuracy operation at every training step, and measure the overall accuracy at the end of every epoch.
Thus you have to:
# Measure the accuracy inside the training loop
accuracy.update_state(classification, model(df))  # update_state expects y_true first, then y_pred
And call accuracy.result() only at the end of the epoch, when all the accuracy values have been accumulated into the metric.
Remember to call the .reset_states() method to clear the variable state, resetting it to zero at the end of every epoch.
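Putting those three pieces together, a minimal sketch of the per-epoch metric handling (dataset, model and accuracy are the same objects as in the loop above):

for epoch in range(n_epochs):
    for df, classification in dataset:
        # ... forward pass, loss and optimizer step as above ...
        accuracy.update_state(classification, model(df))   # accumulate per batch
    print('Epoch accuracy:', accuracy.result().numpy())    # overall accuracy for this epoch
    accuracy.reset_states()                                # start fresh for the next epoch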