Training loss started far lower than validation loss. What is happening? - python

My training loss started very low at 0.0181 whereas the validation loss started at 2.4625, roughly a 136-fold difference. The validation loss did improve as the model learned, yet it never reached the level of the training loss, while the training loss became extremely small over further epochs.
The model is trained on ~31K data instances (train + validation) for a binary classification problem. It uses a Conv1D layer fed into bidirectional GRU units, whose output passes through two fully connected layers.
Total params: 294,737
Trainable params: 294,417
Non-trainable params: 320
Model architecture (figure)
Training model....
(31639, 121, 4)
Epoch 1/100
learning rate: 0.01
15/15 [==============================] - 6s 190ms/step - loss: 0.0181 - accuracy: 0.0926 - val_loss: 2.4625 - val_accuracy: 0.0506
Epoch 2/100
learning rate: 0.01
15/15 [==============================] - 2s 148ms/step - loss: 0.0104 - accuracy: 0.0478 - val_loss: 2.1587 - val_accuracy: 0.0506
.....
Full model loss training history (figure)
I trained it for 100 epochs with several callback modifications, and binary cross-entropy was used as the loss function. The dataset is highly imbalanced, with 5% positive and 95% negative labels.
model = Model(inputs=[f_input, rc_input], outputs=predict)
adam = Adam(lr=hyperparam_dict['initial_lr'], beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
Loss history (figure): model training/validation loss vs. epochs
Accuracy history (figure): model training/validation accuracy vs. epochs

Related

Keras metrics during model.fit

Are these metrics, displayed while model.fit(x_train, y_train, epochs=3, batch_size=128) is running
945792/1424460 [==================>...........] - ETA: 5:22 - loss: 0.7029 - accuracy: 0.7312
945920/1424460 [==================>...........] - ETA: 5:22 - loss: 0.7029 - accuracy: 0.7312
946048/1424460 [==================>...........] - ETA: 5:22 - loss: 0.7029 - accuracy: 0.7312
information about:
the average loss on the current mini batch? (i.e. on the last 128 training inputs)
the average loss since the beginning of the epoch?
the average loss since the beginning of all epochs, i.e. the start of the training process?
The same question applies to the accuracy: on what is it computed exactly?
The average loss since the beginning of the epoch.
In general it works like this (pseudocode):
for epoch in range(epochs):
    total_batches = 0
    total_loss = 0
    for batch in range(batches):
        total_loss += calculate_loss(batch)
        total_batches += 1
        # the progress bar shows the running average of the epoch so far
        displayed_loss = total_loss / total_batches
The same goes for every metric, although metrics can be written however the user likes; usually they are an average over the batch.
For each batch, the batch loss is taken as calculated (most losses are an average over the samples, but a loss may return whatever you code).
When the loss is evolving (it's not evolving in your example, it's frozen), you will notice a significant difference between the loss displayed at the end of the last epoch and the initial loss of the current epoch. The difference exists because the previous epoch's running average also included the early batches, computed back when the model wasn't as well trained.
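If you want to see this behaviour yourself, here is a minimal sketch (assuming TensorFlow 2.x / tf.keras): a custom callback that prints the value Keras reports after each training batch. Inside on_train_batch_end, logs['loss'] is the aggregated running-average loss since the start of the current epoch, and it resets when a new epoch begins.

import tensorflow as tf

class RunningLossLogger(tf.keras.callbacks.Callback):
    """Print the loss Keras reports after each training batch."""
    def on_epoch_begin(self, epoch, logs=None):
        print(f"--- epoch {epoch}: running average resets here ---")

    def on_train_batch_end(self, batch, logs=None):
        # logs['loss'] is the mean loss over all batches seen so far this epoch
        print(f"batch {batch}: displayed loss = {logs['loss']:.4f}")

# usage (hypothetical model/data, matching the question's fit call):
# model.fit(x_train, y_train, epochs=3, batch_size=128,
#           callbacks=[RunningLossLogger()], verbose=0)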

How do I know if my Neural Network model is overfitting or not (Keras)

I'm using Keras to predict if I'll get an output of 1 or 0. The data looks like this:
funded_amnt emp_length avg_cur_bal num_actv_rev_tl loan_status
10000 5.60088 19266 2 1
13750 5.60088 2802 6 0
26100 10.0000 19241 17 1
The target is loan_status and the features are the remaining. I've normalized the data before starting to build a Neural Network model.
Here's the shape of my training and testing data:
print(X_train.shape,Y_train.shape)
# Output: (693, 4) (693,)
print(X_test.shape,Y_test.shape)
# Output: (149, 4) (149,)
The process I followed to build the Neural Network is:
# define the keras model
model = Sequential()
model.add(Dense(4, input_dim=4,activation='relu'))
model.add(Dense(4 ,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
hist = model.fit(X_train, Y_train, epochs=10, batch_size=2)
The output after running hist:
Epoch 1/10
693/693 [==============================] - 2s 2ms/step - loss: 0.6379 - acc: 0.7013
Epoch 2/10
693/693 [==============================] - 0s 611us/step - loss: 0.5207 - acc: 0.7951
Epoch 3/10
693/693 [==============================] - 0s 605us/step - loss: 0.5126 - acc: 0.7951
Epoch 4/10
693/693 [==============================] - 0s 621us/step - loss: 0.5109 - acc: 0.7951
Epoch 5/10
693/693 [==============================] - 0s 611us/step - loss: 0.5105 - acc: 0.7951
Epoch 6/10
693/693 [==============================] - 0s 636us/step - loss: 0.5091 - acc: 0.7951
Epoch 7/10
693/693 [==============================] - 0s 644us/step - loss: 0.5090 - acc: 0.7951
Epoch 8/10
693/693 [==============================] - 0s 659us/step - loss: 0.5086 - acc: 0.7951
Epoch 9/10
693/693 [==============================] - 0s 668us/step - loss: 0.5083 - acc: 0.7951
Epoch 10/10
693/693 [==============================] - 0s 656us/step - loss: 0.5076 - acc: 0.7951
It's all almost the same and doesn't change after the second epoch. I've tried changing the number of epochs and the batch size, but I keep getting the same results.
Is this normal, or is it a sign of overfitting that means I need to change some parameters?
Your test data is meant to be used for monitoring whether the model overfits the training data:
hist = model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=10, batch_size=2)
During training you will reach a point where the training loss continues to decrease but your test loss stops decreasing. That is the point where your model starts to overfit.
In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".
As an extreme example, if the number of parameters is the same as or greater than the number of observations, then a model can perfectly predict the training data simply by memorizing the data in its entirety. Such a model, though, will typically fail severely when making predictions.
Usually a learning algorithm is trained using some set of "training data": exemplary situations for which the desired output is known. The goal is that the algorithm will also perform well on predicting the output when fed "validation data" that was not encountered during its training. Overfitting is especially likely in cases where learning was performed too long or where training examples are rare, causing the learner to adjust to very specific random features of the training data, that have no causal relation to the target function. In this process of overfitting, the performance on the training examples still increases while the performance on unseen data becomes worse.
(This refers to a figure not reproduced here:) the green line represents an overfitted model and the black line represents a regularized model. While the green line best follows the training data, it is too dependent on that data and is likely to have a higher error rate on new, unseen data than the black line.
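To see that divergence point in practice, here is a minimal sketch (assuming matplotlib is installed and the fit call above with validation_data): plot the per-epoch training and validation losses stored in hist.history.

import matplotlib.pyplot as plt

hist = model.fit(X_train, Y_train, validation_data=(X_test, Y_test),
                 epochs=10, batch_size=2)

# hist.history holds one value per epoch for each metric
plt.plot(hist.history['loss'], label='train loss')
plt.plot(hist.history['val_loss'], label='test loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.legend()
plt.show()
# The epoch where val_loss stops decreasing while loss keeps falling is
# where the model starts to overfit.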
Overfitting is not your problem right now; it can appear in models with very high accuracy (>95%), so you should try training your model more. If you want to check whether your model is overfitting, try predicting on the validation data. If that accuracy looks low while the training accuracy is high, then it is probably overfitting.
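A quick way to do that check is to compare the accuracy on the training data with the accuracy on the held-out data; a sketch using the variables from the question:

# evaluate() returns [loss, accuracy] because the model was compiled with metrics=['accuracy']
train_loss, train_acc = model.evaluate(X_train, Y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, Y_test, verbose=0)
print('train accuracy:', train_acc, '| test accuracy:', test_acc)
# a high training accuracy paired with a much lower test accuracy suggests overfitting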
If you are overfitting, your training loss will continue decreasing while the validation accuracy doesn't improve. The problem in your case is that your network doesn't have enough capacity to fit the data, or that the features you are using don't carry enough information to perfectly predict the loan status.
You can address this either by increasing the capacity of your network (adding layers, adjusting dropout and regularization, etc.) or by adding more training data and more features if possible.
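As a sketch of the capacity suggestion (the layer sizes and dropout rate here are arbitrary illustrative choices, not values from the question):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# a wider, deeper network than the original 4-4-1 architecture
model = Sequential()
model.add(Dense(32, input_dim=4, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.2))   # regularization in case the larger network starts to overfit
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

hist = model.fit(X_train, Y_train, validation_data=(X_test, Y_test),
                 epochs=10, batch_size=2)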

Testing a saved Convolutional autoencoder

I have trained and saved a convolutional autoencoder in Keras. Before saving the model in .h5 format, it had a training loss of 0.2394 and a validation loss of 0.2586. When testing the saved model, I get a loss that is more than double the validation loss: 0.6707. I am actually testing it with a sample from the training data, just to see whether I get the same loss, or close to it, as during training.
Here is how I calculate the loss, where 'total' is the total number of images I am passing to test the model:
score = np.sqrt(metrics.mean_squared_error(predicteds,images))
print ('Loss:',score/total)
Am I making a mistake on how I calculate the test loss?
Here is the model compile call:
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
and the training verbose output:
Epoch 18/20 167/167 [==============================] - 462s 3s/step - loss: 0.2392 - val_loss: 0.2585
Epoch 19/20 167/167 [==============================] - 461s 3s/step - loss: 0.2399 - val_loss: 0.2609
Epoch 20/20 167/167 [==============================] - 475s 3s/step - loss: 0.2394 - val_loss: 0.2586
I think you are confusing the metrics and the loss function.
Based on your model.compile(), you are using the binary_crossentropy loss function. This means that the loss values in the verbose output (both loss, the training loss, and val_loss, the validation loss) are binary cross-entropy values.
You are scoring the model using RMSE, but then comparing that RMSE to a binary cross-entropy loss.
To train using MSE or to make use of other comparable metrics, you need to compile the model with the MSE loss or make use of MSE as a metric. For more information on keras.losses and keras.metrics, have a look at the documentation.
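As a sketch of that suggestion, you could keep binary cross-entropy as the loss and simply report MSE alongside it, so the number you compute at test time is produced the same way as during training (names as in the question):

autoencoder.compile(optimizer='adadelta',
                    loss='binary_crossentropy',
                    metrics=['mse'])
# ... train as before ...

# evaluate() then returns [binary_crossentropy, mse] for the test images
print(autoencoder.evaluate(images, images, verbose=0))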
Assuming you had compiled with autoencoder.compile(optimizer='adam', loss='mean_squared_error'): it seems that you use the root mean squared error, np.sqrt(metrics.mean_squared_error(predicteds, images)), to calculate the loss rather than the mean squared error. When a number is less than 1, its square root is larger than the number itself; maybe this is why your test loss is strangely larger than your training loss. By the way, you can use autoencoder.evaluate(images, images) to get the test loss.
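Putting that together, a minimal sketch of a test evaluation that is directly comparable to the training output (assuming the model is still compiled with loss='binary_crossentropy' only, and that images is the test array scaled to [0, 1] as during training):

import numpy as np

# evaluate() applies the compiled loss, so this number is comparable to the
# loss/val_loss values printed during training
test_loss = autoencoder.evaluate(images, images, verbose=0)
print('test binary cross-entropy:', test_loss)

# if you still want an RMSE-style score, average per pixel and skip the
# extra division by the number of images
predicteds = autoencoder.predict(images)
rmse = np.sqrt(np.mean((predicteds - images) ** 2))
print('test RMSE:', rmse)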

Accuracy metric fails with trivial custom Keras loss function

I am playing around with custom loss functions on Keras models. My "custom" loss seems to fail (in terms of accuracy score), even though I am only using a wrapper that returns an original keras loss.
As a toy example, I am using the "Basic classification" Tensorflow/Keras tutorial that uses a simple NN on the fashion-MNIST data set and I am following the related Keras documentation and this SO post.
This is the model:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
Now, if I leave sparse_categorical_crossentropy as a string argument to the compile() function, training results in ~87% accuracy, which is fine:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)
But when I just create a trivial wrapper function that calls Keras' cross-entropy, I get ~10% accuracy on both the training and test sets:
from tensorflow.keras import losses

def my_loss(y_true, y_pred):
    return losses.sparse_categorical_crossentropy(y_true, y_pred)

model.compile(optimizer='adam',
              loss=my_loss,
              metrics=['accuracy'])
Epoch 1/10 60000/60000 [==============================] - 3s 51us/sample - loss: 0.5030 - accuracy: 0.1032
Epoch 2/10 60000/60000 [==============================] - 3s 45us/sample - loss: 0.3766 - accuracy: 0.1035
...
Test accuracy: 0.1013
By plotting a few images and checking their predicted labels, it doesn't look like the results differ between the two cases, but the printed accuracies are very different. So, is it the case that the default metrics do not play nicely with custom losses? Could what I see be the error rather than the accuracy? Am I missing something from the documentation?
Edit: The values of the loss functions in both cases end up roughly the same, so training indeed takes place. The accuracy is the point of failure.
Here's the reason:
When you use the built-in loss via the string loss='sparse_categorical_crossentropy', the accuracy metric used is sparse_categorical_accuracy. But when you use a custom loss function, the accuracy metric used is categorical_accuracy.
Example:
model.compile(optimizer='adam',
              loss=losses.sparse_categorical_crossentropy,
              metrics=['categorical_accuracy', 'sparse_categorical_accuracy'])
model.fit(train_images, train_labels, epochs=1)
'''
Train on 60000 samples
60000/60000 [==============================] - 5s 86us/sample - loss: 0.4955 - categorical_accuracy: 0.1045 - sparse_categorical_accuracy: 0.8255
'''
model.compile(optimizer='adam',
              loss=my_loss,
              metrics=['accuracy', 'sparse_categorical_accuracy'])
model.fit(train_images, train_labels, epochs=1)
'''
Train on 60000 samples
60000/60000 [==============================] - 5s 87us/sample - loss: 0.4956 - acc: 0.1043 - sparse_categorical_accuracy: 0.8256
'''
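So if you want to keep the custom wrapper, a minimal fix (sketch, reusing the names from the question) is to request the matching accuracy metric explicitly instead of the generic 'accuracy' string:

model.compile(optimizer='adam',
              loss=my_loss,
              metrics=['sparse_categorical_accuracy'])
model.fit(train_images, train_labels, epochs=10)

# the reported accuracy now matches the integer labels
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)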

Linear Classifier Tensorflow2 not training(1 neuron model)

I'm currently working on the CIFAR-10 dataset, which is an image classification problem with 10 classes.
I have started to develop a linear classifier with TensorFlow 2, without the LinearClassifier object.
X corresponds to 10,000 images of 32*32 RGB pixels, with shape (10000, 3072).
Y_one_hot is a one-hot vector of shape (10000, 10).
model creation code:
model = tf.keras.Sequential()
model.add(Dense(1, activation="linear", input_dim=32*32*3))
model.add(Dense(10, activation="softmax", input_dim=1))
model.compile(optimizer="adam", loss="mean_squared_error", metrics=["accuracy"])
training code:
model.fit(X, Y_one_hot, batch_size=10000, verbose=1, epochs=100)
predict code:
img = X[0].reshape(1, 3072) # Select image 0
res = np.argmax((model.predict(img))) # select the max in output
Problem:
res value is always the same. It seems my model is not learning.
Model.summary() displays:
Layer (type)       Output Shape    Param #
dense (Dense)      (None, 1)       3073
dense_1 (Dense)    (None, 10)      20
Total params: 3,093
Trainable params: 3,093
Non-trainable params: 0
Accuracy & loss:
Epoch 1/100
10000/10000 [==============================] - 2s 184us/sample - loss: 0.0949 - accuracy: 0.1005
Epoch 50/100
10000/10000 [==============================] - 0s 10us/sample - loss: 0.0901 - accuracy: 0.1000
Epoch 100/100
10000/10000 [==============================] - 0s 8us/sample - loss: 0.0901 - accuracy: 0.1027
Do you have any idea why my model is always predicting the same value?
Thanks,
One remark:
The loss you used, loss="mean_squared_error", is not meant for classification; it is meant for regression. These are two very different problems. Try a cross-entropy loss, for example:
model.compile(optimizer=AdamOpt,
              loss='categorical_crossentropy', metrics=['accuracy'])
You can find an example here: https://github.com/michelucci/oreilly-london-ai/blob/master/day1/Beginner%20friendly%20networks/First_Example_of_a_CNN_(CIFAR10).ipynb. It is a notebook I used for a training session I gave. The network is a CNN, but you can replace it with yours.
Try that...
Best of luck, Umberto
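For reference, here is a sketch of that suggestion applied to the model from the question (AdamOpt is assumed to be an Adam optimizer instance, and X and Y_one_hot are the arrays described in the question; layer sizes and fit settings are kept as they were):

import tensorflow as tf
from tensorflow.keras.layers import Dense

AdamOpt = tf.keras.optimizers.Adam()   # assumed optimizer instance

model = tf.keras.Sequential()
model.add(Dense(1, activation="linear", input_dim=32*32*3))
model.add(Dense(10, activation="softmax"))

# cross-entropy compares the softmax output with the one-hot labels directly
model.compile(optimizer=AdamOpt,
              loss="categorical_crossentropy",
              metrics=["accuracy"])

model.fit(X, Y_one_hot, batch_size=10000, verbose=1, epochs=100)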
