Keras metrics during model.fit - python

Are these metrics, displayed while model.fit(x_train, y_train, epochs=3, batch_size=128) is running,
945792/1424460 [==================>...........] - ETA: 5:22 - loss: 0.7029 - accuracy: 0.7312
945920/1424460 [==================>...........] - ETA: 5:22 - loss: 0.7029 - accuracy: 0.7312
946048/1424460 [==================>...........] - ETA: 5:22 - loss: 0.7029 - accuracy: 0.7312
information about:
the average loss on the current mini-batch (i.e. on the last 128 training inputs)?
the average loss since the beginning of the epoch?
the average loss since the beginning of all epochs, i.e. since the start of the training process?
The same question applies to the accuracy: what exactly is it computed on?

The average loss since the beginning of the epoch.
In general it works like this (pseudocode):
for epoch in range(epochs):
    total_batches = 0
    total_loss = 0
    for batch in range(batches):
        total_loss += calculate_loss(batch)
        total_batches += 1
        # this running average is what the progress bar displays after each batch
        displayed_loss = total_loss / total_batches
The same goes for every metric, although metrics may be written however the user likes; usually they are an average over the batch.
For each batch: get the batch loss as calculated (most losses are an average over the samples, but a loss may return whatever you code).
When the loss is evolving (it's not evolving in your example, it's frozen), you will notice a significant difference between the loss displayed for the last epoch and the initial loss of the current epoch. This difference exists because the previous epoch's average also included the initial batches, from a time when the model wasn't as well trained.
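As a quick way to see this running-average behaviour for yourself, here is a minimal sketch of a custom callback (an illustration, assuming TensorFlow 2.x, where the built-in metrics are stateful and logs['loss'] at batch end already holds the average since the start of the epoch):

import tensorflow as tf

class RunningAverageLogger(tf.keras.callbacks.Callback):
    # In TF 2.x, logs['loss'] at batch end is the running average over the
    # epoch so far, i.e. exactly what the progress bar displays.
    def on_train_batch_end(self, batch, logs=None):
        if batch % 1000 == 0:  # don't flood the console
            print(f"batch {batch}: running average loss = {logs['loss']:.4f}")

    def on_epoch_end(self, epoch, logs=None):
        print(f"epoch {epoch}: final average loss = {logs['loss']:.4f}")

# hypothetical usage with the fit call from the question:
# model.fit(x_train, y_train, epochs=3, batch_size=128,
#           callbacks=[RunningAverageLogger()], verbose=0)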

Related

Training loss started extremely lower than validation loss. What is happening?

My training loss started very low at 0.0181, whereas the validation loss started at 2.4625, more than a 150-fold difference. The validation loss did improve as the model tried to learn generalizable features, yet it never got down to the level of the training loss, and the training loss became extremely small over more epochs.
The model is trained on ~31K data instances (train + val) for a binary classification problem. It uses a Conv1D layer fed into Bidirectional GRU units, whose output goes into two fully connected layers.
Total params: 294,737
Trainable params: 294,417
Non-trainable params: 320
Model architecture (figure).
Training model....
(31639, 121, 4)
Epoch 1/100
learning rate: 0.01
15/15 [==============================] - 6s 190ms/step - loss: 0.0181 - accuracy: 0.0926 - val_loss: 2.4625 - val_accuracy: 0.0506
Epoch 2/100
learning rate: 0.01
15/15 [==============================] - 2s 148ms/step - loss: 0.0104 - accuracy: 0.0478 - val_loss: 2.1587 - val_accuracy: 0.0506
.....
Full model loss training history (figure).
I trained it for 100 epochs with several callback modifications, and binary cross-entropy was used as the loss function. The dataset was highly imbalanced with 5% positive labels and 95% negative labels.
model = Model(inputs=[f_input, rc_input], outputs=predict)
adam = Adam(lr=hyperparam_dict['initial_lr'], beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
Loss history (figure): model training/validation loss vs. epochs.
Accuracy history (figure): model training/validation accuracy vs. epochs.

Optimal batch size and epochs for large models

I know there are a number of related questions but I was hoping someone could provide some advice specific to the model I am trying to build.
It is an image classification model. At the moment I am trying to classify 40 different classes (40 different types of animals). Within each class there are between 120 and 220 images. My training set is 4708 images and my validation set is 2512 images.
I ran a sequential model (code below) with a batch size of 64 and 30 epochs. The code took a long time to run. The accuracy after 30 epochs was about 67% on the validation set and about 70% on the training set. The loss was about 1.2 on the validation set and about 1.0 on the training set (I have included the last 12 epoch results below). It appears to be tapering off after about 25 epochs.
My questions are about batch size and epochs. Is there value in using larger or smaller batch sizes (than 64), and should I be using more epochs? I read that between 50 and 100 epochs is common practice, but if my results are tapering off after 25, is there value in adding more?
Model
history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=4708 // batch_size,
    epochs=30,
    validation_data=val_data_gen,
    validation_steps=2512 // batch_size
)
Results
Epoch 18/30
73/73 [==============================] - 416s 6s/step - loss: 1.0982 - accuracy: 0.6843 - val_loss: 1.3010 - val_accuracy: 0.6418
Epoch 19/30
73/73 [==============================] - 414s 6s/step - loss: 1.1215 - accuracy: 0.6712 - val_loss: 1.2761 - val_accuracy: 0.6454
Epoch 20/30
73/73 [==============================] - 414s 6s/step - loss: 1.0848 - accuracy: 0.6809 - val_loss: 1.2918 - val_accuracy: 0.6442
Epoch 21/30
73/73 [==============================] - 413s 6s/step - loss: 1.0276 - accuracy: 0.7013 - val_loss: 1.2581 - val_accuracy: 0.6430
Epoch 22/30
73/73 [==============================] - 415s 6s/step - loss: 1.0985 - accuracy: 0.6854 - val_loss: 1.2626 - val_accuracy: 0.6575
Epoch 23/30
73/73 [==============================] - 413s 6s/step - loss: 1.0621 - accuracy: 0.6949 - val_loss: 1.3168 - val_accuracy: 0.6346
Epoch 24/30
73/73 [==============================] - 415s 6s/step - loss: 1.0718 - accuracy: 0.6869 - val_loss: 1.1658 - val_accuracy: 0.6755
Epoch 25/30
73/73 [==============================] - 419s 6s/step - loss: 1.0368 - accuracy: 0.6957 - val_loss: 1.1962 - val_accuracy: 0.6739
Epoch 26/30
73/73 [==============================] - 419s 6s/step - loss: 1.0231 - accuracy: 0.7067 - val_loss: 1.3491 - val_accuracy: 0.6426
Epoch 27/30
73/73 [==============================] - 434s 6s/step - loss: 1.0520 - accuracy: 0.6919 - val_loss: 1.2039 - val_accuracy: 0.6683
Epoch 28/30
73/73 [==============================] - 417s 6s/step - loss: 0.9810 - accuracy: 0.7151 - val_loss: 1.2047 - val_accuracy: 0.6711
Epoch 29/30
73/73 [==============================] - 436s 6s/step - loss: 0.9915 - accuracy: 0.7140 - val_loss: 1.1737 - val_accuracy: 0.6711
Epoch 30/30
73/73 [==============================] - 424s 6s/step - loss: 1.0006 - accuracy: 0.7087 - val_loss: 1.2213 - val_accuracy: 0.6619
You should only interrupt the training process when the model doesn't "learn" anymore, meaning that the loss and accuracy on the validation data no longer improve. To do this, you can set an arbitrarily high number of epochs and use tf.keras.callbacks.EarlyStopping (documentation). This will interrupt the training process when a certain condition is met, for instance when val_loss hasn't decreased over 10 epochs.
from tensorflow.keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss', patience=10)
model.fit_generator(..., callbacks=[es])
This will ensure that the learning process isn't interrupted while the model is still learning, and also that the model won't overfit.
A batch size of 32 is standard, but that question is more relevant for another site because it's about statistics (and it's very hotly debated).
Yes, if you can, go for as large a batch size as you can.
A large batch size almost always results in faster convergence and a shorter training time. If you have a GPU with plenty of memory, just go as high as you can.
As for epochs, it is hard to decide. As far as I can see, your model is still improving at epochs 28-29, so you may have to train for more epochs to get a better model; also look at val_accuracy, which seems to be improving too, suggesting the model needs more training.
You can use ModelCheckpoint to store the model after each epoch and keep the best version of your model: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint
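A minimal sketch of what that could look like with the generators from the question (the filename pattern and monitored quantity here are illustrative assumptions):

from tensorflow.keras.callbacks import ModelCheckpoint

# save a checkpoint after each epoch, keeping only the best model seen so far
ckpt = ModelCheckpoint(
    filepath='model_{epoch:02d}_{val_loss:.3f}.h5',  # hypothetical filename pattern
    monitor='val_loss',
    save_best_only=True
)

history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=4708 // batch_size,
    epochs=30,
    validation_data=val_data_gen,
    validation_steps=2512 // batch_size,
    callbacks=[ckpt]
)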
There are three reasons to choose a batch size.
Speed. If you are using a GPU then larger batches are often nearly as fast to process as smaller batches. That means individual cases are much faster, which means each epoch is faster too.
Regularization. Smaller batches add regularization, similar to increasing dropout, increasing the learning rate, or adding weight decay. Larger batches will reduce regularization.
Memory constraints. This one is a hard limit. At a certain point your GPU just won't be able to fit all the data in memory, and you can't increase batch size any more.
That suggests that larger batch sizes are better until you run out of memory. Unless you are having trouble with overfitting, a larger and still-working batch size will (1) speed up training and (2) allow a larger learning rate, which also speeds up the training process.
That second point comes about because of regularization. If you increase batch size, the reduced regularization gives back some "regularization budget" to spend on an increased learning rate, which will add that regularization back.
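As a rough illustration of that trade-off, one common heuristic (not from this answer, and only a starting point to re-tune around) is to scale the learning rate linearly with the batch size:

# linear-scaling heuristic: if you multiply the batch size by k,
# multiply the learning rate by k as well, then re-tune around that value
base_batch_size = 64   # batch size used in the question
base_lr = 1e-3         # hypothetical baseline learning rate

new_batch_size = 256
new_lr = base_lr * (new_batch_size / base_batch_size)  # -> 4e-3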
Regularization, by the way, is just a way to think about how noisy or smooth your training process is.
Low regularization means that training is very smooth, which means that it is easy for training to converge but also easy for training to overfit.
High regularization means that training is more noisy or difficult, but validation results are better because the noisy training process reduces overfitting and the resulting generalization error.
If you are familiar with the Bias-Variance Tradeoff, adding regularization is a way of adding a bit of bias in order to reduce the variance. Here is one of many good write ups on the subject: Regularization: the path to bias-variance trade-off.
On the broader topic of regularization, training schedules, and hyper-parameter tuning, I highly recommend two papers on the subject by Leslie N. Smith.
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
The first paper, on Super-Convergence, will also address some of your questions on how many epochs to use.
After that there are no correct answers for how many epochs to use, only guidance. What I do is:
Keep the training schedule as fast as possible for as long as possible while you are working on the model. Faster training means you can try more ideas and tune your hyper-parameters more finely.
When you are ready to fine-tune results for some reason (submitting to Kaggle, deploying a model to production) then you can increase epochs and do some final hyper-parameter tuning until validation results stop improving "enough", where "enough" is a combination of your patience and the need for better results.

How do I know if my Neural Network model is overfitting or not (Keras)

I'm using Keras to predict if I'll get an output of 1 or 0. The data looks like this:
funded_amnt  emp_length  avg_cur_bal  num_actv_rev_tl  loan_status
10000        5.60088     19266        2                1
13750        5.60088     2802         6                0
26100        10.0000     19241        17               1
The target is loan_status and the features are the remaining columns. I normalized the data before starting to build the neural network model.
Here's the shape of my training and testing data:
print(X_train.shape,Y_train.shape)
# Output: (693, 4) (693,)
print(X_test.shape,Y_test.shape)
# Output: (149, 4) (149,)
The process I followed to build the Neural Network is:
from keras.models import Sequential
from keras.layers import Dense

# define the keras model
model = Sequential()
model.add(Dense(4, input_dim=4, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
hist = model.fit(X_train, Y_train, epochs=10, batch_size=2)
The output after running hist:
Epoch 1/10
693/693 [==============================] - 2s 2ms/step - loss: 0.6379 - acc: 0.7013
Epoch 2/10
693/693 [==============================] - 0s 611us/step - loss: 0.5207 - acc: 0.7951
Epoch 3/10
693/693 [==============================] - 0s 605us/step - loss: 0.5126 - acc: 0.7951
Epoch 4/10
693/693 [==============================] - 0s 621us/step - loss: 0.5109 - acc: 0.7951
Epoch 5/10
693/693 [==============================] - 0s 611us/step - loss: 0.5105 - acc: 0.7951
Epoch 6/10
693/693 [==============================] - 0s 636us/step - loss: 0.5091 - acc: 0.7951
Epoch 7/10
693/693 [==============================] - 0s 644us/step - loss: 0.5090 - acc: 0.7951
Epoch 8/10
693/693 [==============================] - 0s 659us/step - loss: 0.5086 - acc: 0.7951
Epoch 9/10
693/693 [==============================] - 0s 668us/step - loss: 0.5083 - acc: 0.7951
Epoch 10/10
693/693 [==============================] - 0s 656us/step - loss: 0.5076 - acc: 0.7951
The loss and accuracy are almost the same and don't change after the second epoch. I've tried changing the number of epochs and the batch size, but I keep getting the same results.
Is this normal, or is it a sign of overfitting that means I need to change some parameters?
Your test data is meant to be used for monitoring the model's overfitting on the training data:
hist = model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=10, batch_size=2)
During training you will reach a point where the training loss continues to decrease but your test loss stops decreasing. That is the point where your model starts to overfit.
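A minimal sketch of how you might visualize that point from the returned History object (assuming matplotlib is available; the loss keys are 'loss'/'val_loss', while the accuracy keys are 'acc'/'val_acc' in older standalone Keras and 'accuracy'/'val_accuracy' in recent tf.keras):

import matplotlib.pyplot as plt

hist = model.fit(X_train, Y_train, validation_data=(X_test, Y_test),
                 epochs=10, batch_size=2)

# diverging curves (training loss falling, validation loss flat or rising)
# are the classic sign of overfitting
plt.plot(hist.history['loss'], label='training loss')
plt.plot(hist.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy loss')
plt.legend()
plt.show()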
In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".
As an extreme example, if the number of parameters is the same as or greater than the number of observations, then a model can perfectly predict the training data simply by memorizing the data in its entirety. Such a model, though, will typically fail severely when making predictions.
Usually a learning algorithm is trained using some set of "training data": exemplary situations for which the desired output is known. The goal is that the algorithm will also perform well on predicting the output when fed "validation data" that was not encountered during its training. Overfitting is especially likely in cases where learning was performed too long or where training examples are rare, causing the learner to adjust to very specific random features of the training data, that have no causal relation to the target function. In this process of overfitting, the performance on the training examples still increases while the performance on unseen data becomes worse.
In the illustration accompanying that article, the green line represents an overfitted model and the black line represents a regularized model. While the green line best follows the training data, it is too dependent on that data and is likely to have a higher error rate on new, unseen data compared to the black line.
Overfitting is not your problem right now; it can appear in models with a high accuracy (>95%), so you should try training your model more. If you want to check whether your model is overfitting, try predicting on the validation data. If that accuracy looks too low while the training accuracy is high, then it is probably overfitting.
If you are overfitting, your training loss will continue decreasing but the validation accuracy won't improve. The problem in your case is that your network doesn't have enough capacity to fit the data, or the features you are using don't carry enough information to perfectly predict the loan status.
You can solve this either by increasing the capacity of your network (adding more layers, dropout, regularization, etc.) or by adding more training data and more features if possible.
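A minimal sketch of what a higher-capacity version of the model from the question might look like (the layer sizes and dropout rate here are illustrative assumptions, not tuned values):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(32, input_dim=4, activation='relu'))  # wider than the original 4-unit layers
model.add(Dropout(0.2))                               # light dropout as regularization
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])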

Testing a saved Convolutional autoencoder

I have trained and saved a convolutional autoencoder in Keras. Before saving the model in .h5 format, it had a training loss of 0.2394 and a validation loss of 0.2586. When testing the saved model, I get a loss that is more than double the validation loss: 0.6707. I am actually testing it with a sample from the training data, just to see whether I will get the same loss, or close to it, as during training.
Here is how I calculate the loss, where 'total' is the total number of images I am passing to test the model:
score = np.sqrt(metrics.mean_squared_error(predicteds,images))
print ('Loss:',score/total)
Am I making a mistake on how I calculate the test loss?
Here is model compile
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
and train verbose
Epoch 18/20
167/167 [==============================] - 462s 3s/step - loss: 0.2392 - val_loss: 0.2585
Epoch 19/20
167/167 [==============================] - 461s 3s/step - loss: 0.2399 - val_loss: 0.2609
Epoch 20/20
167/167 [==============================] - 475s 3s/step - loss: 0.2394 - val_loss: 0.2586
I think you are confusing the metrics and the loss function.
Based on your model.compile(), you are using the binary_crossentropy loss function. This means that the loss mentioned in the verbose is related to the binary crossentropy (both loss - training loss and val_loss - validation loss).
You are scoring the model using RMSE, but then comparing the RMSE to binary crossentropy loss.
To train using MSE or to make use of other comparable metrics, you need to compile the model with the MSE loss or make use of MSE as a metric. For more information on keras.losses and keras.metrics, have a look at the documentation.
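For instance, here is a minimal sketch of making the numbers directly comparable: keep the original binary cross-entropy loss, also track MSE as a metric, and let Keras compute the test loss itself (images here stands for the test images from the question):

# recompile with MSE tracked as an additional metric (the weights are unchanged)
autoencoder.compile(optimizer='adadelta',
                    loss='binary_crossentropy',
                    metrics=['mse'])

# evaluate() returns [binary_crossentropy, mse] averaged over the images,
# so these values are directly comparable to the loss/val_loss printed during training
test_bce, test_mse = autoencoder.evaluate(images, images)
print('test binary cross-entropy:', test_bce)
print('test MSE:', test_mse)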
Assuming that you compiled with autoencoder.compile(optimizer='adam', loss='mean_squared_error'): it seems that you use the root mean squared error, np.sqrt(metrics.mean_squared_error(predicteds, images)), to calculate the loss rather than the mean squared error. When a number is less than 1, its square root is larger than the number itself. Maybe this is why your test loss is strangely larger than your training loss. By the way, you can use autoencoder.evaluate(images, images) to get the test loss.

What does acc mean in the Keras model.fit output? The accuracy of the final iteration in an epoch or the average accuracy over the epoch?

I want to know whether the printed accuracy is the accuracy of the final iteration in an epoch or the average accuracy over the epoch.
code:
history = model.fit(data, label, nb_epoch=1, batch_size=32, validation_data=(X_test, Y_test))
the printed log:
Epoch 1/1
128/128 [==============================] - 17s - loss: 2.3152 - acc: 0.0859 - val_loss: 2.3010 - val_acc: 0.1157
According to the callback and History documentation:
acc represents the average training accuracy at the end of an epoch.
val_acc represents the accuracy on the validation set at the end of an epoch.
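A minimal sketch of reading those values back from the returned History object (note the key names: 'acc'/'val_acc' in older standalone Keras, 'accuracy'/'val_accuracy' in recent tf.keras):

history = model.fit(data, label, nb_epoch=1, batch_size=32,
                    validation_data=(X_test, Y_test))

# one value per epoch, matching what was printed in the progress bar
print(history.history['acc'])      # average training accuracy per epoch
print(history.history['val_acc'])  # validation accuracy per epoch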
