How do I get a loss per epoch and not per batch?

How do I get a loss per epoch and not per batch? - python

In my understanding an epoch is an arbitrarily often repeated run over the whole dataset, which in turn is processed in parts, so called batches. After each train_on_batch a loss is calculated, the weights are updated and the next batch will get better results. These losses are indicators of the quality and learning state of my to NNs.
In several sources the loss is calculated (and printed) per epoch. Therefore I am not sure if I am doing this right.
At the moment my GAN looks like this:
for epoch:
for batch:
fakes = generator.predict_on_batch(batch)
dlc = discriminator.train_on_batch(batch, ..)
dlf = discriminator.train_on_batch(fakes, ..)
dis_loss_total = 0.5 * np.add(dlc, dlf)
g_loss = gan.train_on_batch(batch,..)
# save losses to array to work with later
These losses are for each batch. How do I get them for an epoch? As an aside: Do I need losses for an epoch, what for?

There is no direct way to compute the loss for an epoch. Actually, the loss of an epoch is usually defined as the average of the loss of batches in that epoch. So you can accumulate the loss values during an epoch and at the end divide it by the number of batches in the epoch:
epoch_loss = []
for epoch in range(n_epochs):
acc_loss = 0.
for batch in range(n_batches):
# do the training
loss = model.train_on_batch(...)
acc_loss += loss
epoch_loss.append(acc_loss / n_batches)
As for the other question, one usage of epoch loss might be to use it as an indicator to stop the training (however, the validation loss is usually used for that, not the training loss).

I'll expand on #today answer a bit. There is a certain balance to strike in how to report loss over an epoch and how to use it to determine when training should stop.
If you only look at the loss of the most recent batch, it will be a very noisy estimate of your dataset loss because maybe that batch happened to store all the samples your model has trouble with, or all the samples that are trivial to succeed on.
If you look at the averaged loss over all batches in the epoch, you may get a skewed response because, like you indicated, the model has been (hopefully) improving over the epoch, so the performance on the initial batches aren't as meaningfully compared to the performance on the later batches.
The only way to accurately report your epoch loss is to take your model out of training mode, i.e. fix all the model parameters, and run your model on the whole dataset. That will be an unbiased computation of your epoch loss. However, in general that's a terrible idea because if you have a complex model or a lot of training data, you will waste a lot of time doing this.
So, I think it's most common to balance these factors by reporting an averaged loss over N mini-batches, where N is large enough to smooth out the noise of individual batches but not so large that the model performance is not comparable between the first and last batches.
I know you're in Keras but here is a PyTorch example that illustrates this concept clearly, replicated here:
for epoch in range(2): # loop over the dataset multiple times
running_loss = 0.0
for i, data in enumerate(trainloader, 0):
# get the inputs; data is a list of [inputs, labels]
inputs, labels = data
# zero the parameter gradients
optimizer.zero_grad()
# forward + backward + optimize
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
# print statistics
running_loss += loss.item()
if i % 2000 == 1999: # print every 2000 mini-batches
print('[%d, %5d] loss: %.3f' %
(epoch + 1, i + 1, running_loss / 2000))
running_loss = 0.0
print('Finished Training')
You can see they accumulate the loss over N=2000 batches, report the averaged loss over those 2000 batches, then zero out the running loss and keep going.

Related

Python - neural network training loss and validation loss values

I have a question about my training loss and validation loss for a neural network in python using pytorch. I am using bert to classify labels for some given text.
I have about 14k text records with 20 unique labels - where some labels are more frequent than others.
I use about 25% as my validation set and use strafification when performing train_test_splits.
My learning rate is 1e-6
attention_probs_dropout_prob=0.2
hidden_dropout_prob=0.2
There is no data leakeage as I did not impute any values
While training my model I notice few things.
training loss remains higher than validation loss
with each epoch both losses go down but training loss never goes below the validation loss even though they are close
Example
epoch 8
epoch 50
epoch 100
epoch 200
training loss
2.9
1.18
.98
.75
validation loss
2.4
1.0
.67
.45
F1 score - weighted
.55
.75
.86
.90
As noticed we see that the training loss decreases a bit at first but then slows down, but validation loss keeps decreasing with bigger increments
can someone explain to me what is going on with how this model is learning? My understanding is that it is not performing well based on the training loss and validation loss values. Usually my values should be lower with the training loss below validation loss
any input is appreciated
Thank you

How to seperate code into train, val and test functions for pytorch cnn?

I am training a cnn using pytorch and have created a training loop. As I am performing optimisation and experimenting with hyper-parameter tuning, I want to separate my training, validation and testing into different functions. I need to be able to record my accuracy and loss for each function in order to plot graphs. For this I want to create a function which returns the accuracy.
I am pretty new to coding and was wondering the best way to go about this. I feel like my code is a bit messy at the moment. I need to be able to feed in various hyper-parameters for experimentation in my training function. Could anyone offer any advice? Below is what I can so far:
def train_model(model, optimizer, data_loader, num_epochs, criterion=criterion):
total_epochs = notebook.tqdm(range(num_epochs))
for epoch in total_epochs:
model.train()
train_correct = 0.0
train_running_loss=0.0
train_total=0.0
for i, (img, label) in enumerate(data_loader['train']):
#uploading images and labels to GPU
img = img.to(device)
label = label.to(device)
#training model
outputs = model(img)
#computing losss
loss = criterion(outputs, label)
#propagating the loss backwards
optimizer.zero_grad()
loss.backward()
optimizer.step()
train_running_loss += loss.item()
_, predicted = outputs.max(1)
train_total += label.size(0)
train_correct += predicted.eq(label).sum().item()
train_loss=train_running_loss/len(data_loader['train'])
train_accu=100.*correct/total
print('Train Loss: %.3f | Train Accuracy: %.3f'%(train_loss,train_accu))
I have also experimented with making a functions to record accuracy:
def accuracy(outputs, labels):
_, preds = torch.max(outputs, dim = 1)
return torch.tensor(torch.sum(preds == labels).item() / len(preds))

First, note that:
Unless you have some specific motivation, validation (and testing) should be performed on a different dataset than the training set, so you should use a different DataLoader. The computation time will increase because of an additional for loop at every epoch.
Always call model.eval() before validation/testing.
That said, The signature of the validation function is pretty much similar to that of train_model
# criterion is passed if you want to register the validation loss too
def validate_model(model, eval_loader, criterion):
...
Then, in train_model, after each epoch, you can call the function validate_model and store the returned metrics in some data structure (list, tensor, etc.) that will be used later for plotting.
At the end of the training, you can then use the same validate_model function for testing.
Instead of coding the accuracy by yourself, you can use Accuracy from TorchMetrics
Finally, if you feel the need to level up, you can use DL training frameworks like PyTorch Lightning or FastAI. Give also a look at some hyperparameter tuning library such as Ray Tune.

PyTorch - Computing and printing the training accuracy of the QuickStart Tutorial

to learn PyTorch, I started with the Quickstart Tutorial. In the train() method, I noticed that they don't print the training accuracy during the training session. Only the training loss is printed.
Coming from Keras, this was very unusual for me, since the training accuracy automatically printed when you call fit().
So, I decided to modify the tutorial code like the following to print the training accuracy:
def train(dataloader, model, optimizer, loss_fn):
model.train()
size = len(dataloader.dataset)
num_batches = len(dataloader)
training_loss = 0.0
correct = 0.0
for batch, (imgs, labels) in enumerate(dataloader):
imgs = imgs.to(device=device)
labels = labels.to(device=device)
predictions = model(imgs)
loss = loss_fn(predictions, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# accumulate the training loss - each batch's loss will be added to trainin_loss
training_loss += loss.item()
# determines the number of correct predictions
correct += (predictions.argmax(1) == labels).type(torch.float).sum().item()
# end of for loop - all batches are processed
# after all batches are processed, determine the average training loss
training_loss = training_loss / num_batches
# this would be the training accuracy: number of correct predictions / number of samples in dataset
correct = correct / size
print(f"{datetime.datetime.now()} Training Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {training_loss:>8f} \n")
Is this ok ? As a beginner to PyTorch, I wanted to make sure that is correct before I start training my neural networks.

It all looks correct. Doing things like this should not influence training. loss.backward() computes your gradients and anything not connect to that can not change them. By the way, just run the training, you can't break anything :) (Yet. Just wait when you start building self driving cars.).
I thought, in Keras/TensorFlow fit() does not compute accuracy automatically, you still have to specify this metric for example when compiling the model or as a parameter to fit(), e.g.:
model.compile(optimizer='sgd',
loss='mse',
metrics=[tf.keras.metrics.Accuracy()])

training by batches leads to more over-fitting

I'm training a sequence to sequence (seq2seq) model and I have different values to train on for the input_sequence_length.
For values 10 and 15, I get acceptable results but when I try to train with 20, I get memory errors so I switched the training to train by batches but the model over-fit and the validation loss explodes, and even with the accumulated gradient I get the same behavior, so I'm looking for hints and leads to more accurate ways to do the update.
Here is my training function (only with batch section) :
if batch_size is not None:
k=len(list(np.arange(0,(X_train_tensor_1.size()[0]//batch_size-1), batch_size )))
for epoch in range(num_epochs):
optimizer.zero_grad()
epoch_loss=0
for i in list(np.arange(0,(X_train_tensor_1.size()[0]//batch_size-1), batch_size )): # by using equidistant batch till the last one it becomes much faster than using the X.size()[0] directly
sequence = X_train_tensor[i:i+batch_size,:,:].reshape(-1, sequence_length, input_size).to(device)
labels = y_train_tensor[i:i+batch_size,:,:].reshape(-1, sequence_length, output_size).to(device)
# Forward pass
outputs = model(sequence)
loss = criterion(outputs, labels)
epoch_loss+=loss.item()
# Backward and optimize
loss.backward()
optimizer.step()
epoch_loss=epoch_loss/k
model.eval
validation_loss,_= evaluate(model,X_test_hard_tensor_1,y_test_hard_tensor_1)
model.train()
training_loss_log.append(epoch_loss)
print ('Epoch [{}/{}], Train MSELoss: {}, Validation : {} {}'.format(epoch+1, num_epochs,epoch_loss,validation_loss))
EDIT:
here are the parameters that I'm training with :
batch_size = 1024
num_epochs = 25000
learning_rate = 10e-04
optimizer=torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss(reduction='mean')

Batch size affects regularization. Training on a single example at a time is quite noisy, which makes it harder to overfit. Training on batches smoothes everything out, which makes it easier to overfit. Translating back to regularization:
Smaller batches add regularization.
Larger batches reduce regularization.
I am also curious about your learning rate. Every call to loss.backward() will accumulate the gradient. If you have set your learning rate to expect a single example at a time, and not reduced it to account for batch accumulation, then one of two things will happen.
The learning rate will be too high for the now-accumulated gradient, training will diverge, and both training and validation errors will explode.
The learning rate won't be too high, and nothing will diverge. The model will just train more quickly and effectively. If the model is too large for the data being fit, then training error will go to 0 but validation error will explode due to overfitting.
Update
Here is a bit more detail regarding the gradient accumulation.
Every call to loss.backward() will accumulate gradient, until you reset it with optimizer.zero_grad(). It will be acted on when you call optimizer.step(), based on whatever it has accumulated.
The way your code is written, you call loss.backward() for every pass through the inner loop, then you call optimizer.step() in the outer loop before resetting. So the gradient has been accumulated, that is summed, over all examples in the batch and not just one example at a time.
Under most assumptions, that will make the batch-accumulated gradient larger than the gradient for a single example. If the gradients are all aligned, for B batches, it will be larger by B times. If the gradients are i.i.d. then it will be more like sqrt(B) times larger.
If you do not account for this, then you have effectively increased your learning rate by that factor. Some of that will be mitigated by the smoothing effect of larger batches, which can then tolerate a higher learning rate. Larger batches reduce regularization, larger learning rates add it back. But that will not be a perfect match to compensate, so you will still want to adjust accordingly.
In general, whenever you change your batch size you will also want to re-tune your learning rate to compensate.
Leslie N. Smith has written some excellent papers on a methodical approach to hyperparameter tuning. A great place to start is A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. He recommends you start by reading the diagrams, which are very well done.

How can we perform early stopping with train_on_batch?

I manually run the epochs in a loop, as well as further nested mini-batches in the loop. At each mini-batch, I need to call train_on_batch, to enable the training of a customized model.
Is there a manual way to restore the functionality of early stopping, i.e. to break the loop?

In practice, 'early stopping' is largely done via: (1) train for X epochs, (2) save the model each time it achieves a new best performance, (3) select the best model. "Best performance" defined as achieving the highest (e.g. accuracy) or lowest (e.g. loss) validation metric - example script below:
best_val_loss = 999 # arbitrary init - should be high if 'best' is low, and vice versa
num_epochs = 5
epoch = 0
while epoch < num_epochs:
model.train_on_batch(x_train, y_train) # get x, y somewhere in the loop
val_loss = model.evaluate(x_val, y_val)
if val_loss < best_val_loss:
model.save(best_model_path) # OR model.save_weights()
print("Best model w/ val loss {} saved to {}".format(val_loss, best_model_path))
# ...
epoch += 1
See saving Keras models. If you rather early-stop directly, then define some metric - i.e. condition - that'll end the train loop. For example,
while True:
loss = model.train_on_batch(...)
if loss < .02:
break

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.