I have a question about my training loss and validation loss for a neural network in Python using PyTorch. I am using BERT to classify labels for some given text.
I have about 14k text records with 20 unique labels - where some labels are more frequent than others.
I use about 25% as my validation set and use stratification when performing train_test_split.
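For context, a minimal sketch of such a stratified 75/25 split with scikit-learn (variable names here are illustrative, not taken from my actual code):
from sklearn.model_selection import train_test_split

# texts and labels are assumed arrays holding the ~14k records and their 20 classes
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)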
My learning rate is 1e-6
attention_probs_dropout_prob=0.2
hidden_dropout_prob=0.2
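For reference, a rough sketch of how these settings might be wired up with the Hugging Face transformers library (the model name and the AdamW optimizer are assumptions, not necessarily what I use):
import torch
from transformers import BertConfig, BertForSequenceClassification

# dropout values and label count from above; "bert-base-uncased" is illustrative
config = BertConfig.from_pretrained(
    "bert-base-uncased",
    num_labels=20,
    attention_probs_dropout_prob=0.2,
    hidden_dropout_prob=0.2,
)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)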
There is no data leakage, as I did not impute any values.
While training my model I notice a few things:
The training loss remains higher than the validation loss.
With each epoch both losses go down, but the training loss never drops below the validation loss, even though they stay close.
Example:

Epoch | Training loss | Validation loss | F1 score (weighted)
8     | 2.9           | 2.4             | 0.55
50    | 1.18          | 1.0             | 0.75
100   | 0.98          | 0.67            | 0.86
200   | 0.75          | 0.45            | 0.90
As you can see, the training loss decreases a bit at first and then slows down, while the validation loss keeps decreasing in bigger steps.
Can someone explain what is going on with how this model is learning? My understanding is that it is not performing well, based on the training and validation loss values; usually I would expect both values to be lower, with the training loss below the validation loss.
Any input is appreciated.
Thank you
I'm trying to build a model in Python to predict an operational parameter (ROP, Rate of Penetration) while drilling an oil well. I'm working with a neural network trained with PSO using the pyswarms library. The input layer consists of 11 neurons and the output layer of just 1 neuron (ROP). I'm still searching for the "right" number of hidden layers. I don't have enough knowledge about machine learning, so any suggestion will be accepted. The loss function to minimize is MAE, because it is not affected by outliers.
To track the performance of the model, I'm not sure which loss function I have to use. That's why after every run I print MAE, RMSE, MSE, R2 and R. The problem is that the values for the training data are "high" (the loss functions) or "low" (R or R2), and the values for the validation data are quite close.
I would like your opinion about my "work". I'm not really sure whether the model is overfitting, underfitting, or whether the data quality is low.
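For reference, these metrics could be computed with scikit-learn and NumPy roughly like this (a sketch, not my exact code; y_true and y_pred are placeholders for actual and predicted ROP):
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# y_true / y_pred are placeholders for the actual and predicted ROP values
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)
r = np.corrcoef(y_true, y_pred)[0, 1]
print(f"MAE={mae:.4f}  MSE={mse:.4f}  RMSE={rmse:.4f}  R2={r2:.4f}  R={r:.4f}")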
The whole dataset consists of 6 wells (F-1A, F-1B, F-1C, F-11A, F-11B, F-11T2); for each well we have 12 parameters (including ROP, which is the target). The number of samples for each well is different.
For instance:
Well F-1A: 60,000 samples (approx.)
Well F-1B: 20,000 samples (approx.)
Well F-1C: 25,000 samples (approx.)
So I consider that this is enough to train my model on one well, for example Well F-11A, and then validate on another, such as Well F-1B.
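For illustration only, the per-well split could look something like this, assuming the data sits in a single pandas DataFrame with a 'well' column (the column names are hypothetical):
import pandas as pd

# df is assumed to hold all wells, with a 'well' column plus the 12 parameters
train_df = df[df["well"] == "F-11A"]
val_df = df[df["well"] == "F-1B"]
X_train, y_train = train_df.drop(columns=["well", "ROP"]), train_df["ROP"]
X_val, y_val = val_df.drop(columns=["well", "ROP"]), val_df["ROP"]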
On one of those runs I got this result:
Input layer: 11
Hidden layers: 2 (8 neurons and 10 neurons)
Output layer: 1
Options : {'c1': 0.68, 'c2': 0.7, 'w': 0.73}
n_particles = 100
iters = 100
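For reference, a minimal sketch of how those options feed into pyswarms (the objective function f and the weight-vector length n_weights are placeholders for my own code):
import pyswarms as ps

# options taken from the run above; f is a placeholder objective that returns
# the MAE of the network for each particle's flattened weight vector
options = {'c1': 0.68, 'c2': 0.7, 'w': 0.73}
optimizer = ps.single.GlobalBestPSO(n_particles=100, dimensions=n_weights, options=options)
best_cost, best_pos = optimizer.optimize(f, iters=100)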
The results for loss functions, R2 and R for each dataset are:
Metric | Train     | Validation
r^2    | 0.4955    | 0.5169
r      | 0.7039    | 0.719
MAE    | 3.272725  | 10.755544
MSE    | 19.528535 | 124.405781
RMSE   | 4.41911   | 11.153734
I don't really know how to interpret these values. What should I do next? I have noticed that on the right plot, the curve of the predicted validation data (green curve) follows the trend of the actual validation data (blue curve), but the predicted values seem to be lower (as if they had been shifted down).
To learn PyTorch, I started with the Quickstart tutorial. In the train() function, I noticed that they don't print the training accuracy during the training session; only the training loss is printed.
Coming from Keras, this was very unusual for me, since the training accuracy is printed automatically when you call fit().
So, I decided to modify the tutorial code like the following to print the training accuracy:
def train(dataloader, model, optimizer, loss_fn):
    model.train()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    training_loss = 0.0
    correct = 0.0
    for batch, (imgs, labels) in enumerate(dataloader):
        imgs = imgs.to(device=device)
        labels = labels.to(device=device)
        predictions = model(imgs)
        loss = loss_fn(predictions, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # accumulate the training loss - each batch's loss is added to training_loss
        training_loss += loss.item()
        # count the number of correct predictions in this batch
        correct += (predictions.argmax(1) == labels).type(torch.float).sum().item()
    # end of for loop - all batches are processed
    # after all batches are processed, determine the average training loss
    training_loss = training_loss / num_batches
    # training accuracy: number of correct predictions / number of samples in the dataset
    correct = correct / size
    print(f"{datetime.datetime.now()} Training Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {training_loss:>8f} \n")
Is this OK? As a beginner to PyTorch, I wanted to make sure this is correct before I start training my neural networks.
It all looks correct. Doing things like this should not influence training: loss.backward() computes your gradients, and anything not connected to that cannot change them. By the way, just run the training, you can't break anything :) (Yet. Just wait until you start building self-driving cars.)
I thought that in Keras/TensorFlow, fit() does not compute accuracy automatically either; you still have to specify this metric, for example when compiling the model or as a parameter to fit(), e.g.:
model.compile(optimizer='sgd',
              loss='mse',
              metrics=[tf.keras.metrics.Accuracy()])
I am trying to optimize a given neural network (e.g. a multilayer perceptron with 2 hidden layers) by finding the number of epochs and the batch size that give the highest accuracy.
for epoch from 10 to 200 (in steps of 10):
    for batch from 40 to 200 (in steps of 20):
        modele.fit(X_train, Y_train, epochs=epoch, batch_size=batch)
        save batch, epoch, accuracy
Afterwards I kept the smallest epoch count, with the smallest corresponding batch size, that gave the highest accuracy.
e.g. best_params: epoch = 10, batch = 150 => accuracy = 94%
My problem is that when I re-run my model with the best_params, it doesn't give me the same results (loss, accuracy); sometimes the accuracy is even very low (e.g. 10%).
I tried fixing the seed, but it did not give better results.
Regards
Djam75
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['Nb_Batch', 'Nb_Epoch', 'Accuracy'])
i = 0
lst_loss = []
lst_accuracy = []
lst_epoch = list(np.arange(10, 200, 10))
lst_batch = list(np.arange(100, 400, 20))
for epoch in lst_epoch:
    print('---------------- Epoch ' + str(epoch) + ' ------------------')
    for batch in lst_batch:
        # note: in current Keras the argument is `epochs`, not `nb_epoch`
        modelSimple.fit(X_train, Y_train, epochs=epoch, batch_size=batch, verbose=0)
        score = modelSimple.evaluate(X_test, Y_test)
        df.loc[i, "Nb_Batch"] = batch
        df.loc[i, "Nb_Epoch"] = epoch
        df.loc[i, "Accuracy"] = score[1] * 100
        i = i + 1
This might be happening due to random parameter initialization. If you are building an end-to-end model without transferring pre-trained weights, every time you train, the architecture gets new random values for its parameters.
In this case, a good practice is to use batch normalization layers after some layers, depending on your architecture (see the TensorFlow and PyTorch implementations of batch normalization).
Extra idea:
Do not use any 'for' or 'while' loops in the model implementation; you can follow the templates in TensorFlow or PyTorch.
Or, if you build a complete model from scratch, vectorize operations by using a matrix-operation library like NumPy.
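As a side note, since the question mentions trying to fix the seed: a minimal sketch of seeding everything up front (assuming TensorFlow/Keras 2.x) would be:
import random
import numpy as np
import tensorflow as tf

# fix all seeds before building and training the model so runs are repeatable
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)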
Thanks for the update.
I resolved my problem by saving the model and loading it afterwards.
Thanks for the idea (batch normalization) and the extra idea: not using any 'for' loops ;-)
Regards
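For reference, a minimal sketch of that save/load approach in Keras (the file name is illustrative):
# save the trained model (architecture, weights and optimizer state)
modelSimple.save("best_model.h5")

# later, reload it instead of retraining from scratch
from tensorflow.keras.models import load_model
modelSimple = load_model("best_model.h5")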
I think you might not be updating the weight matrix after completing the training for certain batch sizes and epochs.
Please include the code as well so we can see the problem.
I'm working on building an LSTM model to binary classify price movements.
My training data is data I simulated, it's a 2,000 rows * 3,780 columns dataframe of price movements.
I have a separate labels file that classifies price movements as either 1 or 2 (due to memory).
From what I've read, it appears as though two loss functions are the most appropriate for binary classification:
Binary Cross-Entropy
Hinge Loss
I've implemented two separate LSTM models in Google Colab which run as expected.
I have the same code for both models, with just the loss function being changed from a Squared Hinge loss in the former to a Binary Cross Entropy in the latter.
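For concreteness, the difference between the two runs is roughly the following compile call (a sketch assuming Keras; the optimizer and metric choices are illustrative, not necessarily what my notebook uses):
# squared hinge variant: hinge-family losses assume targets encoded as -1/+1
# (Keras converts 0/1 labels automatically, but not other encodings)
model.compile(optimizer="adam", loss="squared_hinge", metrics=["accuracy", "mse"])

# binary cross-entropy variant: expects targets in {0, 1} and a sigmoid output
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy", "mse"])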
My issue is deciding which is the better model, as the two models give conflicting outputs.
Hinge Loss Output:
Training Output:
The loss starts at 0.3, then drops to 0.20 and stays pretty much constant for the remaining 98 epochs.
The MSE does decrease marginally across the epochs from 2.8 to 1.68 at the end. Average MSE = 1.72.
The accuracy is 0.00 on every epoch (which I don't understand).
Validation Output:
The Validation loss starts at 0.0117 and goes to 9.8264e-06 by the end.
The Validation MSE starts at 2.4 and ends at 1.54. Average Validation MSE = 1.31.
The Validation accuracy is 0.00 on every epoch (which again I don't understand).
Binary Cross Entropy Loss Output:
Training Output:
The loss starts at 8.3095, then drops to 3.83 and stays pretty much constant for the remaining 97 epochs.
The MSE does decrease marginally across the epochs from 2.8 to 1.68 at the end. Average MSE = 1.69.
The accuracy starts at 0.00 and increases to roughly 0.8 by the end.
Validation Output:
The Validation loss starts at -0.82 and goes to -.89 by the end.
The Validation MSE starts at 1.56 and ends at 1.53. Average Validation MSE = 1.30.
The Validation accuracy starts at 0.00 and increases to roughly 0.997 by the end.
So, I have a question now:
Why is the accuracy of the SHL model 0.00? Is there an error in my model?
My code is saved here:
https://nbviewer.jupyter.org/github/Ianfm94/Financial_Analysis/blob/master/LSTM_Workings/LSTM_Model.ipynb
The training data* and labels data are saved at the below location:
https://github.com/Ianfm94/Financial_Analysis/tree/master/LSTM_Workings
*Training data here is split into two separate files due to GitHub limiting file size to 25 MB.
Any help would be greatly appreciated.
Thanks.
In my understanding, an epoch is an arbitrarily often repeated run over the whole dataset, which in turn is processed in parts, so-called batches. After each train_on_batch a loss is calculated, the weights are updated, and the next batch should get better results. These losses are indicators of the quality and learning state of my two NNs.
In several sources the loss is calculated (and printed) per epoch. Therefore I am not sure if I am doing this right.
At the moment my GAN looks like this:
for epoch:
    for batch:
        fakes = generator.predict_on_batch(batch)
        dlc = discriminator.train_on_batch(batch, ..)
        dlf = discriminator.train_on_batch(fakes, ..)
        dis_loss_total = 0.5 * np.add(dlc, dlf)
        g_loss = gan.train_on_batch(batch, ..)
        # save losses to an array to work with later
These losses are per batch. How do I get them per epoch? As an aside: do I need losses per epoch at all, and if so, what for?
There is no direct way to compute the loss for an epoch. Actually, the loss of an epoch is usually defined as the average of the losses of the batches in that epoch. So you can accumulate the loss values during an epoch and, at the end, divide by the number of batches in the epoch:
epoch_loss = []
for epoch in range(n_epochs):
    acc_loss = 0.
    for batch in range(n_batches):
        # do the training
        loss = model.train_on_batch(...)
        acc_loss += loss
    epoch_loss.append(acc_loss / n_batches)
As for the other question, one usage of epoch loss might be to use it as an indicator to stop the training (however, the validation loss is usually used for that, not the training loss).
I'll expand on @today's answer a bit. There is a certain balance to strike in how to report the loss over an epoch and how to use it to determine when training should stop.
If you only look at the loss of the most recent batch, it will be a very noisy estimate of your dataset loss, because maybe that batch happened to contain all the samples your model has trouble with, or all the samples that are trivial to get right.
If you look at the averaged loss over all batches in the epoch, you may get a skewed estimate because, as you indicated, the model has (hopefully) been improving over the epoch, so the performance on the initial batches isn't meaningfully comparable to the performance on the later batches.
The only way to accurately report your epoch loss is to take your model out of training mode, i.e. fix all the model parameters, and run your model on the whole dataset. That will be an unbiased computation of your epoch loss. However, in general that's a terrible idea because if you have a complex model or a lot of training data, you will waste a lot of time doing this.
So, I think it's most common to balance these factors by reporting an averaged loss over N mini-batches, where N is large enough to smooth out the noise of individual batches but not so large that the model performance is not comparable between the first and last batches.
I know you're using Keras, but here is a PyTorch example that illustrates this concept clearly, replicated here:
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
You can see they accumulate the loss over N=2000 batches, report the averaged loss over those 2000 batches, then zero out the running loss and keep going.