Keras LSTM training loss is increasing instantaneously in one epoch - python

I am training an LSTM to predict a price chart. To do so, I am arranging the data into a 3D array using a fixed time window, e.g. 10 days of prices to predict the next day.
I am not shuffling the data before splitting it into training and testing sets, so the most recent data is used for testing. Is that a good idea with a stateless LSTM?
Secondly, during one epoch the training loss first decreases, but halfway through the epoch it suddenly increases by an order of magnitude. What are the possible reasons for that happening? Is this normal behavior, i.e. is it a misconception of mine that the loss within one epoch should constantly decrease?
I used Bayesian optimization to find the right hyperparameters, I am using the Adam optimizer with the default learning rate, and I am normalizing the data between 0 and 1.
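For reference, a minimal sketch of this kind of windowing and chronological split (the window length, array shapes and 80/20 split below are only illustrative, not the exact setup from the question):

import numpy as np

def make_windows(series, window=10):
    # Turn a 1-D price series into (samples, timesteps, features) plus next-day targets.
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    X = np.array(X)[..., np.newaxis]   # shape: (samples, window, 1)
    return X, np.array(y)

prices = np.random.rand(500)           # placeholder for the real normalized price chart
X, y = make_windows(prices, window=10)

# Chronological split (no shuffling), so the most recent data becomes the test set.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]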

Related

CNN model validation accuracy is not improving

I am currently working on a CNN model for classification; I have to predict words from a wav file. I encountered a problem with my validation accuracy, which stays (almost) the same. At first I was thinking of overfitting, but that does not seem to be the problem. Below you can see a plot with the results at the different epochs:
I am building a CNN model with Keras, using the 'adam' optimizer and 'categorical_crossentropy' for the loss. I have already tried increasing the number of epochs up to 1000 and changing the batch size.
Your training loss seems to be decreasing, but val_loss is increasing while val_accuracy stays approximately the same. This is a standard case of overfitting. Why do you think that's not the case?
Increasing the number of training epochs or the batch size is not helpful, as you're just changing the number of times the model sees the data or the quantity of data it sees in one epoch.
For the current scenario, the best model is the one saved up to the point where both val_loss and train_loss are still decreasing, before they saturate.
To address the problem, you can add noise to the training data so that the model generalizes better, and make sure the categories are balanced in terms of training data volume.
Secondly, you can increase your validation dataset to see if it continues to have the same issue; if it does, then the model is definitely overfitting. Also, please update your question with what kind of validation set and technique you're using. If possible, add the code snippets of your validation set and loss function.
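As a rough sketch of the noise idea, one simple option is a GaussianNoise layer that is only active at training time; the input shape, layer sizes and number of classes below are assumed placeholders, not the asker's actual model:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 44, 1)),          # e.g. a spectrogram of the wav file (assumed shape)
    layers.GaussianNoise(0.1),                 # injects noise only during training
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),    # number of word classes is assumed
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])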

What is causing large jumps in training accuracy and loss between epochs?

In training a neural network in Tensorflow 2.0 in python, I'm noticing that training accuracy and loss change dramatically between epochs. I'm aware that the metrics printed are an average over the entire epoch, but accuracy seems to drop significantly after each epoch, despite the average always increasing.
The loss also exhibits this behavior, dropping significantly each epoch but the average increases. Here is an image of what I mean (from Tensorboard):
I've noticed this behavior on all of the models I've implemented myself, so it could be a bug, but I want a second opinion on whether this is normal behavior and if so what does it mean?
Also, I'm using a fairly large dataset (roughly 3 million examples). The batch size is 32 and each dot in the accuracy/loss graphs represents 50 batches (2k on the graph = 100k batches). The learning rate graph is 1:1 for batches.
It seems this phenomenon comes from the fact that the model has a high batch-to-batch variance in terms of accuracy and loss. This is illustrated if I take a graph of the model with the actual metrics per step as opposed to the average over the epoch:
Here you can see that the model can vary widely. (This graph is just for one epoch, but the fact remains).
Since the average metrics were being reported per epoch, at the beginning of the next epoch it is highly likely that the average metrics will be lower than the previous average, leading to a dramatic drop in the running average value, illustrated in red below:
If you imagine the discontinuities in the red graph as being epoch transitions, you can see why you would observe the phenomenon in the question.
TL;DR: The model has a very high variance in its output with respect to each batch.
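To make that batch-to-batch variance visible yourself, here is a small sketch of a custom callback that records the loss reported at every batch (the fit call is commented out because it assumes your own compiled model and data):

import tensorflow as tf

class BatchLogger(tf.keras.callbacks.Callback):
    def __init__(self):
        super().__init__()
        self.batch_losses = []

    def on_train_batch_end(self, batch, logs=None):
        # Note: logs["loss"] is still Keras' running mean over the epoch, but
        # recording it per batch already shows the resets at epoch boundaries.
        self.batch_losses.append(logs.get("loss"))

logger = BatchLogger()
# model.fit(x_train, y_train, epochs=5, batch_size=32, callbacks=[logger])
# Plotting logger.batch_losses then shows the discontinuities described above.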
I recently experienced this kind of issue while working on an object localization project. In my case, there were three main candidates.
I was not shuffling my training data. That can create a loss jump after each epoch.
I had defined a new loss function calculated using IOU. It was something like:
import tensorflow as tf

def new_loss(y_true, y_pred):
    # Combine mean squared error with an IOU penalty (calculate_iou is my own helper).
    mse = tf.losses.mean_squared_error(y_true, y_pred)
    iou = calculate_iou(y_true, y_pred)
    return mse + (1 - iou)
I also suspected this loss might be a cause of the increase in loss after each epoch; however, I was not able to replace it.
I was using the Adam optimizer, so a possible thing to do was to change it and see how the training was affected.
Conclusion
I changed Adam to SGD and shuffled my data during training. There was still a jump in the loss, but it was minimal compared to before the changes: my loss spike was ~0.3 before the changes and became ~0.02 after.
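A minimal sketch of those two changes (the learning rate, momentum and epoch count are placeholder values):

import tensorflow as tf

# `model`, `x_train`, `y_train` and `new_loss` are assumed from the setup above.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss=new_loss)
model.fit(x_train, y_train, epochs=50, batch_size=32,
          shuffle=True)                 # shuffle the training data between epochs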
Note
I should add that there are lots of discussions about this topic; I tried the solutions that seemed like plausible candidates for my model.

Reason for keras validation zig-zagging

I am training a NN and getting this result on loss and validation loss:
These are 200 epochs, a batch size of 16, 500 training samples and 200 validation samples.
As you can see, after about 20 epochs, the validation loss begins to do a very exaggerated zig-zagging.
Do you know which could be the reason for that behavior?
I tried to increase the number of validation samples but that just increased the zig-zagging and made it more exaggerated.
Also, I added a decay value to the optimizer, but the loss and validation loss did not look so good.
I was looking for another way to improve it.
Any idea on which is the zig-zagging reason and how could I minimize it?
This might be a case of overfitting:
Overfitting refers to a model that models the “training data” too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.
Basically, you have a very small training sample (500), but are training for a very long time (200 epochs!).
The network will start learning your training data by heart and won't learn to generalise. It will thus seem to be very good during training, but will fail miserably on the test set.
Early stopping is a nice way to avoid overfitting: basically, stop as soon as the validation loss becomes erratic or starts increasing. Another way to lower the chances of overfitting is to use techniques such as dropout, or simply to increase the amount of training data.
TL;DR: you are overfitting. To avoid this issue there are many possibilities: drastically reduce the number of epochs, use a dev set and a stopping criterion, gather more training data, ...
For alternative explanations, see also this question on QUORA.
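A rough sketch of two of these suggestions (dropout plus early stopping) for a small dataset; the input shape, layer sizes and loss are assumed placeholders, not the asker's actual network:

import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

model = models.Sequential([
    layers.Input(shape=(20,)),                 # assumed feature dimension
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                       # randomly drops units to reduce overfitting
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

stop = callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=200, batch_size=16, callbacks=[stop])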
I would suggest that you don't worry about the zigzag pattern of the validation loss or validation accuracy. As training of the neural network goes on, it makes mistakes and updates the weights, right? (if you know the math behind it). So it is natural that the validation data will produce a zigzag, because the model is still in training mode (the learning stage). Once the model is fully trained, you will notice that the zigzag decreases (if you have chosen the correct number of epochs).
So don't worry about this.

Number of epochs to be used in a Keras sequential model

I'm building a Keras sequential model to do binary image classification. When I use around 70 to 80 epochs I start getting good validation accuracy (81%). But I was told that this is a very big number of epochs, which would affect the performance of the network.
My question is: is there a limit on the number of epochs that I shouldn't exceed? Note that I have 2000 training images and 800 validation images.
If the number of epochs is very high, your model may overfit and your training accuracy will reach 100%. A common approach is to plot the error rate on the training and validation data, with the number of epochs on the horizontal axis and the error rate on the vertical axis. You should stop training when the error rate on the validation data is at its minimum.
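A minimal sketch of that plot, using the History object returned by model.fit (the model and the training/validation arrays are assumed, and the epoch count is a placeholder):

import matplotlib.pyplot as plt

# `model`, `x_train`, `y_train`, `x_val`, `y_val` are assumed to exist already.
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=100, batch_size=32)

plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
# Stop (or keep the checkpoint) around the epoch where the validation loss is lowest.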
You need to find a trade-off with your regularization parameters. The major problem in deep learning is overfitting the model. Various regularization techniques are used, such as:
i) Reducing the batch size
ii) Data augmentation (only if your data is not diverse)
iii) Batch normalization
iv) Reducing the complexity of the architecture (mainly the convolutional layers)
v) Introducing dropout layers (only if you are using dense layers)
vi) Reducing the learning rate
vii) Transfer learning
The batch-size vs. epochs trade-off is quite important. It also depends on your data and varies from application to application, so you have to play with your data a little to find the exact figure. Normally a batch size of 32 medium-sized images requires about 10 epochs for good feature extraction from the convolutional layers. Again, it is relative.
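As a rough illustration of two of the techniques listed above (data augmentation and a reduced learning rate) for a binary image classifier; the model, the arrays and all of the values here are assumed placeholders:

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=15,
                             horizontal_flip=True,
                             zoom_range=0.1)

opt = tf.keras.optimizers.Adam(learning_rate=1e-4)   # lower than the default 1e-3
# The model and arrays below are assumed from your own setup:
# model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(datagen.flow(x_train, y_train, batch_size=32),
#           validation_data=(x_val, y_val), epochs=30)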
There's an EarlyStopping callback that Keras supplies, which you simply define.
from tensorflow.keras.callbacks import EarlyStopping
EarlyStopping(patience=self.patience, verbose=self.verbose, monitor=self.monitor)
Let's say that the epochs parameter equals 80, like you said before. When you use the EarlyStopping callback, the number of epochs becomes the maximum number of epochs.
You can define the EarlyStopping callback to monitor the validation loss, for example; whenever this loss stops improving, it gives it a few last chances (the number you put in the patience parameter), and if after those last chances the monitored value still hasn't improved, the training process stops.
The best practice, in my opinion, is to use both EarlyStopping and ModelCheckpoint, another callback supplied in Keras' API that simply saves your best model so far (you decide what best means: best loss or any other value that you test your results with).
This is the Keras solution for the problem you're trying to deal with. In addition, there is a lot of online material that you can read about how to deal with overfitting.
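A short sketch of combining the two callbacks as described (the patience value and file path are just examples):

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, verbose=1),
    ModelCheckpoint("best_model.h5", monitor="val_loss",
                    save_best_only=True, verbose=1),   # keeps only the best model so far
]
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=80, callbacks=callbacks)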
Yes! There is a solution for your problem: select a large number of epochs, e.g. 1k or 2k, and just use early stopping on your neural net.
Early stopping:
Keras supports early stopping of training via a callback called EarlyStopping.
This callback allows you to specify the performance measure to monitor and the trigger, and once triggered, it will stop the training process. For example, you could apply a trigger that stops training if accuracy has not increased over the previous 5 epochs. Keras will then look at the previous 5 epochs through the callback and stop training if your accuracy is not increasing.
Early Stopping link :

Is this an overfitted network?

I have obtained this result after training a neural network in keras and I was wondering if this is overfitting or not.
I'm having doubts because I have read that overfitting occurs when a net is overtrained, and that it shows up when the validation loss INCREASES.
But in this case it doesn't increase; it stays the same, while the training loss DECREASES.
EXTRA INFO
A single dataset, split this way:
70% of the dataset used as training data
30% of the dataset used as validation data
500 EPOCHS TRAINING
2000 EPOCHS TRAINING
Training loss: 3.1711e-05
Validation loss: 0.0036
There is a slight overfit in the sense that your training loss keeps decreasing while the validation loss has stopped decreasing.
However, I wouldn't consider this harmful because the validation loss isn't increasing. That is, if I read the graph correctly; if there were a small increase, then it would be getting bad.
A harmful overfit is when your validation loss starts increasing. The validation loss is your true measure of the performance of the network. If it goes up then your model is starting to do bad things and you should stop there.
All in all this seems pretty decent. The training loss will almost always end up lower than the validation loss at some point; this is an optimization process over the training set.
Training loss does indeed appear to continue decreasing further than validation loss (it still looks to me like it hadn't finished decreasing yet at the 500th epoch; it would be good to continue for more epochs and see what happens). The difference doesn't appear to be large, though.
It may be overfitting slightly, but it may also be possible that the distribution of your validation data is simply a bit different from the distribution of the training data.
I'd recommend testing the following:
Continue for more than 500 epochs, to see if the training loss keeps on decreasing even further, or if it stabilizes close to the validation loss. If it keeps on decreasing much further, and the validation loss stays the same, it's safe to say that the network is overfitting.
Try creating different splits of training and validation sets. How did you determine training and validation sets actually? Were you given two separate sets, one for training and one for validation? Or were you given a single large training set, and did you split it up yourself? In the first case, the distributions may be different, so a difference in training vs validation loss wouldn't be strange. In the second case, try randomly creating different splits and repeating the experiments to see if you always consistently get the same difference in training vs validation loss, or if they're sometimes also closer together.
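A small sketch of that second suggestion: repeat the experiment over several random splits and compare the gaps. It assumes a build_model() helper that returns a freshly compiled Keras model, plus the full arrays X and y:

from sklearn.model_selection import train_test_split

for seed in range(5):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = build_model()                       # assumed helper returning a compiled model
    history = model.fit(X_tr, y_tr, validation_data=(X_val, y_val),
                        epochs=500, verbose=0)
    print(f"split {seed}: train loss {history.history['loss'][-1]:.5f}, "
          f"val loss {history.history['val_loss'][-1]:.5f}")
# If the train/validation gap is similar across splits, it is more likely genuine
# overfitting than a distribution mismatch in one particular split.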
