Is this TF training curve overfitting or underfitting?

Is this TF training curve overfitting or underfitting? - python

In the case of overfitting, to my knowledge the val_loss has to soar as opposed to the train_loss.
But how about the case below (val_loss remains low)? Is this model underfitting horribly? Or is it some completely different case?
Previously my models would overfit badly so I added the dropout of 0.3 (4 CuDNNGRU layers with 64 neurons and one Dense layer and batchsize of 64), so should I reduce the dropout?

This is neither overfitting nor underfitting. Some people refer to it as Unknown fit. Validation << training loss happens when you apply regularization (L1, L2, Dropout, ...) in keras because they are applied to training only and not on testing (validating). So it makes sense that your training loss is bigger (not all neurons are available for feed forward due to dropout for example).
But what is clear is that your model is not being optimized for your validation set, (almost a flat line). This can be due to many things:
Your validation set is badly representative of your dataset, has a very easy predictions or very small.
decrease learning rate or add more regularization (recurrent_regularization since you are using CuDNNGRU)
Your loss function is not appropriate for the problem you are trying to solve.
Hope these tips help you out.

Related

Cannot improve the accuracy of my Multilayer Perceptron (MLP) model in classification

I am trying to predict labels for building performance: {1, 0}. Since this is a binary classification, I tried sigmoid and identity activation functions with Xavier initialization. However, I cannot improve the accuracy of my models as the loss and accuracy stay still after training each epoch. This is a very imbalanced dataset where the ones have the 90% majority. So, I assume this might be due to the initial bias. Can you help me with this one? You can see the setup of the training process and the other relevant images attached.model definition, hyperparameters, results

Here are several suggestions which may help:
Use activation after each hidden layer
Learning rate of 0.1 is too high for Adam. Try smaller (3e-4 for example)
You are printing loss value incorrectly. Currently loss value is taken for the last iteration only. Calculate mean epoch loss instead.
Minor suggestion: since the task is binary classification it's better to apply torch.nn.BCELoss or torch.nn.BCEWithLogitsLoss if you don't use sigmoid on last layer. Last linear layer must have output_size=1 in this case.
Best model checkpoint may be missed with code you provided. That's because you calculate accuracy each 10 epochs, however accuracy > best_accuracy is done on each epoch which is inconsistent.

Training Accuracy Increasing but Validation Accuracy Remains as Chance of Each Class (1/number of classes)

I am training a classifier using CNNs in Pytorch. My classifier has 6 labels. There are 700 training images for each label and 10 validation images for each label. The batch size is 10 and the learning rate is 0.000001. Each class has 16.7% of the whole dataset images. I have trained 60 epochs and the architecture has 3 main layers:
Conv2D->ReLU->BatchNorm2D->MaxPool2D>Dropout2D
Conv2D->ReLU->BatchNorm2D->Flattening->Dropout2D
Linear->ReLU->BatchNorm1D->Dropout And finally a fully connected and
a softmax.
My optimizer is AdamW and the loss function is crossentropy. The network is training well as the training accuracy is increasing but the validation accuracy remains almost fixed and equal as the chance of each class(1/number of classes). The accuracy is shown in the image below:
Accuracy of training and test
And the loss is shown in:
Loss for training and validation
Is there any idea why this is happening?How can I improve the validation accuracy? I have used L1 and L2 Regularization as well and also the Dropout Layers. I have also tried adding more data but these didn't help.

Problem Solved: First, I looked at this problem as overfitting and spend so much time on methods to solve this such as regularization and augmentation. Finally, after trying different methods, I couldn't improve the validation accuracy. Thus, I went through the data. I found a bug in my data preparation which was resulting in similar tensors being generated under different labels. I generated the correct data and the problem was solved to some extent (The validation accuracy increased around 60%). Then finally I improved the validation accuracy to 90% by adding more "conv2d + maxpool" layers.

This is not so much a programming related question so maybe ask it again in cross-validated
and it would be easier if you would post your architecture code.
But here are things that I would suggest:
you wrote that you "tried adding more data", if you can, always use all data you have. If thats still not enough (and even if it is) use augmentation (e.g. flip, crop, add noise to the image)
your learning rate should not be so small, start with 0.001 and decay while training or try ~ 0.0001 without decaying
remove the dropout after the conv layers and the batchnorm after the dense layers and see if that helps, it is not so common to use cropout after conv but normally that shouldnt have a negative effect. try it anyways

how will I be able to choose the right parameters and the structure of a neural networks during the training phase?

I'm trying to train a neural network in a supervised learning which has as input x_train a list of 100 list each containing 2000 column ....... and a target output y_train which has a list of 100 list also but contains each 20 column.
This is what x_train and y_train look like:
here is the neural networks that I created :
dnnmodel = tf.keras.models.Sequential()
dnnmodel.add(tf.keras.layers.Dense(40, input_dim = len(id2word), activation='relu'))
dnnmodel.add(tf.keras.layers.Dense(20, activation='relu'))
dnnmodel. compile ( loss = tf.keras.losses.MeanSquaredLogarithmicError(), optimizer = 'adam' , metrics = [ 'accuracy' ])
during the training phase I cannot choose the right number of neurons, layers and the activation and loss functions, since the accurency and loss values are not at all reasonable. .... can someone help me please?
Here is the display after the execution:

There is no correct method or formula to decide the correct number of layers or neurons or any other functions you use in your model. It all comes down experimentation and what works best for your data and the problem that you are trying to solve.
Here are some tips:
sigmoid, tanh = These activations are generally not used in hidden layers as their computed slopes or gradient is very small. So the model can take a long to converge.
Relu, elu, leaky relu - These activations can be used used in hidden layers as they have steep slope compared to others so the training process is fast. Relu is commonly used.
Layers: The more layers you add the deeper you make your neural network. Deeper neural networks are able to learn complex features about your data but they are prone to overfitting. Also, Deep Neural Network suffers from problems like vanishing gradient or exploding gradients. Fewer layers mean fewer params to learn and prone to underfitting.
Loss Function - Loss function depends on the problem you are trying to solve.
For classification
If y_label is categorical go for categorical_cross_entropy
If y_label is discreet go for sparse_categorical_cross_entropy
For regression problems
Use Rmse or MSE
Coming to the training logs. Your model is training as you can see the loss at each epoch less than the previous one. You should train your model for more epochs in order to see improvements in your accuracy.

Why does the accuracy of my convolutional neural network increase after removing the fully connected layer before the final softmax layer?

I have designed convolutional neural network(tf. Keras) which has few parallel convolutional units with different kernal sizes. Then, each output results of that convolution layers are fed into another convolutional units which are in parallel. Then all the outputs are concatenated. Next flattening is done. After that I added fully connected layer and connected to the final softmax layer for multi class classification. I trained it and had good results in validation test.
However I remove the fully connected layer and accuracy was higher than the previous.
Please someone can explain, how does it happen, it will be very helpful.
Thank you for your valuable time.
Parameters as follows.

When you remove a layer, your model will have less chance of over-fitting the training set. Consequently, by making the network shallower, you make your model more robust to unknown examples and the validation accuracy increases.
Since your training accuracy is also increasing, it can be an indication that -
Exploding or vanishing gradients. You can try solving this problem using careful weight initialization, proper regularization, adding shortcuts, or gradient clipping.
You are not training for enough epochs to learn a deeper network. You can try few more epochs.
You do not have enough data to train a deeper network.

Eliminating the dense layer reduces the tendency for over fitting. Therefore your validation loss should improve as long as your model's training accuracy remains high. Alternatively you can add an additional dropout layer. You can also reduce the tendency for over fitting by using regularizers. Documentation for that is here.

My LSTM model overfits over validation data

Here's my LSTM model to classify hand gesture. Initially, I had 1960 training data of shape(num_sequences, num_joints, 3) that I reshape to shape(num_sequences, num_joints*3).
Here's my model:
input_shape = (trainx.shape[1], trainx.shape[2])
print("Build LSTM RNN model ...")
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(171, 66)))
model.add(Bidirectional(LSTM(units=256, activation='tanh', return_sequences=True, input_shape=input_shape)))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Bidirectional(LSTM(units=128, activation='tanh', return_sequences=True)))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Bidirectional(LSTM(units=128, activation='tanh', return_sequences=False)))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Dense(units=trainy.shape[1], activation="softmax"))
print("Compiling ...")
# Keras optimizer defaults:
# Adam : lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8, decay=0.
# RMSprop: lr=0.001, rho=0.9, epsilon=1e-8, decay=0.
# SGD : lr=0.01, momentum=0., decay=0.
opt = Adam()
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
I get a 90% accuracy on train and 50% on test

Overfitting is quite common in deep learning.
To circumvent over fitting with your LSTM architecture try the following things in this order:
Decrease the learning rate from 0.1 or 0.01 to 0.001,0.0001,0.00001.
Reduce the number of epochs. You can try to plot the training and validation accuracy as a function of the number of epochs and see when the training accuracy becomes larger than the validation accuracy. That is the number of epochs that you should use. Combine this with the 1st step decreasing the learning rate.
Then you can try to modify the architecture of the LSTM, here you already added dropout (maximum value 0.5), I would suggest to try 0.2, 0.3. You have 3 cells which is better than 2, the size of the nodes look reasonable. What is the embedding dimension you are currently using? Since you are overfitting it is worth a try to reduce the number of cells from 3 to 2 and keeping the same number of nodes.
The batch size might be important as well as the distribution of subclasses in your dataset. Is the dataset equally distributed and equally balanced between training and validation sets? What I mean by this is that if one hand gesture is over represented in the training set compared to the validation set that might be a problem. A good strategy to overcome this is to keep some part of the data as a test set. Then do a train/split cross validation using sklearn (5 times). Then train your architecture on each train/split model separately (5 times) and compare the training and validation accuracy. If there is a big bias in the split or among the sets you will be able to identify it in this manner.
Last, you can try augmentation, specifically rotation and horizontal/vertical flip. This library might help https://github.com/aleju/imgaug
Hope this helps!

How do you know the network is over fitting versus some kind of error in your data set. Does the validation loss improve initially up to some epoch then plateau or start to increase? Then it is over fitting. If it starts at 50% and stays there it is not an over fitting problem. With the amount of drop out you have over fitting does not look very likely. How did you select your validation set? Was it randomly selected from the overall data set or did you do the selection? It is always better to randomly select the data so that its probability distribution mirrors that of the training data. As said in the comments please show your code for model.fit there could be a problem there. How do you input the data? Did you use generators? A 50% validation accuracy leads me to suspect some error in how your validation data is provided to the network or some error in labeling of the validation data. I would also recommend you consider the use of dynamically adjusting your learning rate based on monitoring of validation loss. Keras has a callback for this
called ReduceLROnPlateau. Documentation is here. Set it up to monitor validation loss. I set the parameters patience=3 and factor=.5 which seems to work well. You can think of training as descending into a valley. As you descend the valley gets narrower. If the learning rate is to large and remains fixed you won't be able to reach further down toward the minimum. This should improve your training accuracy which should result in improvement of validation accuracy. As I said with the level of drop out you have I do not think it is over fitting but if it is you can also use Keras regularizes to help avoid over training. Documentation is here.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.