I know there are a number of related questions but I was hoping someone could provide some advice specific to the model I am trying to build.
It is an image classification model. At the moment I am trying to classify 40 different classes (40 different types of animals). Within each class there are between 120 and 220 images. My training set is 4708 images and my validation set is 2512 images.
I ran a sequential model (code below) with a batch size of 64 and 30 epochs. The code took a long time to run. The accuracy after 30 epochs was about 67% on the validation set and about 70% on the training set. The loss was about 1.2 on the validation set and about 1.0 on the training set (I have included the last 13 epochs' results below). It appears to be tapering off after about 25 epochs.
My questions are around batch size and epochs. Is there value in using larger or smaller batch sizes (than 64), and should I be using more epochs? I read that somewhere between 50 and 100 epochs is common practice, but if my results are tapering off after 25, is there value in adding more?
Model
history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=4708 // batch_size,
    epochs=30,
    validation_data=val_data_gen,
    validation_steps=2512 // batch_size
)
Results
Epoch 18/30
73/73 [==============================] - 416s 6s/step - loss: 1.0982 - accuracy: 0.6843 - val_loss: 1.3010 - val_accuracy: 0.6418
Epoch 19/30
73/73 [==============================] - 414s 6s/step - loss: 1.1215 - accuracy: 0.6712 - val_loss: 1.2761 - val_accuracy: 0.6454
Epoch 20/30
73/73 [==============================] - 414s 6s/step - loss: 1.0848 - accuracy: 0.6809 - val_loss: 1.2918 - val_accuracy: 0.6442
Epoch 21/30
73/73 [==============================] - 413s 6s/step - loss: 1.0276 - accuracy: 0.7013 - val_loss: 1.2581 - val_accuracy: 0.6430
Epoch 22/30
73/73 [==============================] - 415s 6s/step - loss: 1.0985 - accuracy: 0.6854 - val_loss: 1.2626 - val_accuracy: 0.6575
Epoch 23/30
73/73 [==============================] - 413s 6s/step - loss: 1.0621 - accuracy: 0.6949 - val_loss: 1.3168 - val_accuracy: 0.6346
Epoch 24/30
73/73 [==============================] - 415s 6s/step - loss: 1.0718 - accuracy: 0.6869 - val_loss: 1.1658 - val_accuracy: 0.6755
Epoch 25/30
73/73 [==============================] - 419s 6s/step - loss: 1.0368 - accuracy: 0.6957 - val_loss: 1.1962 - val_accuracy: 0.6739
Epoch 26/30
73/73 [==============================] - 419s 6s/step - loss: 1.0231 - accuracy: 0.7067 - val_loss: 1.3491 - val_accuracy: 0.6426
Epoch 27/30
73/73 [==============================] - 434s 6s/step - loss: 1.0520 - accuracy: 0.6919 - val_loss: 1.2039 - val_accuracy: 0.6683
Epoch 28/30
73/73 [==============================] - 417s 6s/step - loss: 0.9810 - accuracy: 0.7151 - val_loss: 1.2047 - val_accuracy: 0.6711
Epoch 29/30
73/73 [==============================] - 436s 6s/step - loss: 0.9915 - accuracy: 0.7140 - val_loss: 1.1737 - val_accuracy: 0.6711
Epoch 30/30
73/73 [==============================] - 424s 6s/step - loss: 1.0006 - accuracy: 0.7087 - val_loss: 1.2213 - val_accuracy: 0.6619
You should only interrupt the training process when the model doesn't "learn" anymore, meaning that the loss and accuracy on the validation data don't improve. To do this, you can set an arbitrarily high number of epochs and use tf.keras.callbacks.EarlyStopping (documentation). This will interrupt the training process when a certain condition is met, for instance when val_loss hasn't decreased in 10 epochs.
from tensorflow.keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss', patience=10)
model.fit_generator(..., callbacks=[es])
This will ensure that the learning process isn't interrupted while the model is still learning, and also that the model won't overfit.
A batch size of 32 is standard, but that question is more relevant for another site, because it's about statistics (and it's very hotly debated).
Yes, if you can, go for as large a batch size as you can.
A high batch size almost always results in faster convergence and a shorter training time. If you have a GPU with plenty of memory, just go as high as you can.
As for epochs, it is hard to decide. As far as I can see, your model is still improving at epochs 28-29, so you may have to train for more epochs to get a better model. Also look at val_accuracy: it seems to be improving too, which suggests the model needs more training.
You can use ModelCheckpoint to store the model after each epoch, so you keep the best version of your model. https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint
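For example, a minimal sketch reusing the generators from the question (the filepath, the monitored metric, and the epoch count here are illustrative, not from the original post):

from tensorflow.keras.callbacks import ModelCheckpoint

# keep only the weights with the best validation accuracy seen so far
mc = ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                     mode='max', save_best_only=True, verbose=1)
history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=4708 // batch_size,
    epochs=100,
    validation_data=val_data_gen,
    validation_steps=2512 // batch_size,
    callbacks=[mc]
)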
There are three considerations when choosing a batch size.
Speed. If you are using a GPU, then larger batches are often nearly as fast to process as smaller batches, so each individual sample is processed much faster, which means each epoch is faster too.
Regularization. Smaller batches add regularization, similar to increasing dropout, increasing the learning rate, or adding weight decay. Larger batches will reduce regularization.
Memory constraints. This one is a hard limit. At a certain point your GPU just won't be able to fit all the data in memory, and you can't increase batch size any more.
That suggests that larger batch sizes are better until you run out of memory. Unless you are having trouble with overfitting, a larger and still-working batch size will (1) speed up training and (2) allow a larger learning rate, which also speeds up the training process.
That second point comes about because of regularization. If you increase batch size, the reduced regularization gives back some "regularization budget" to spend on an increased learning rate, which will add that regularization back.
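As a rough numeric illustration of that budget, here is the linear-scaling rule of thumb; this is a common heuristic, not something claimed in this answer:

# common heuristic: scale the learning rate linearly with the batch size
base_lr, base_batch_size = 1e-3, 32           # illustrative reference values
batch_size = 256
lr = base_lr * batch_size / base_batch_size   # -> 8e-3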
Regularization, by the way, is just a way to think about how noisy or smooth your training process is.
Low regularization means that training is very smooth, which means that it is easy for training to converge but also easy for training to overfit.
High regularization means that training is more noisy or difficult, but validation results are better because the noisy training process reduces overfitting and the resulting generalization error.
If you are familiar with the bias-variance tradeoff, adding regularization is a way of adding a bit of bias in order to reduce the variance. Here is one of many good write-ups on the subject: Regularization: the path to bias-variance trade-off.
On the broader topic of regularization, training schedules, and hyper-parameter tuning, I highly recommend two papers on the subject by Leslie N. Smith.
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
The first paper, on Super-Convergence, will also address some of your questions on how many epochs to use.
After that there are no correct answers for how many epochs to use, only guidance. What I do is:
Keep the training schedule as fast as possible for as long as possible while you are working on the model. Faster training means you can try more ideas and tune your hyper-parameters more finely.
When you are ready to fine-tune results for some reason (submitting to Kaggle, deploying a model to production) then you can increase epochs and do some final hyper-parameter tuning until validation results stop improving "enough", where "enough" is a combination of your patience and the need for better results.
Related
How can I get the accuracy calculated each epoch in a Keras Sequence?
The accuracy after each epoch is printed to the console like this:
Epoch 14/500
90/90 [==============================] - 17s 184ms/step - loss: 0.6935 - sparse_categorical_accuracy: 0.5174 - val_loss: 0.6927 - val_sparse_categorical_accuracy: 0.5146 - lr: 0.0010
I am using tf.keras.utils.Sequence to change the dataset every epoch via on_epoch_end.
I want to use this accuracy in the Sequence to decide whether to change the dataset or not.
How can I get this accuracy from a callback into the Sequence?
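One way to wire this up, as a sketch only (the class names, the threshold, and the dataset-switching logic are all hypothetical), is to give the Sequence an attribute that a small callback fills in at the end of each epoch:

import numpy as np
import tensorflow as tf

class SwitchingSequence(tf.keras.utils.Sequence):
    def __init__(self, x, y, batch_size=32, threshold=0.9):
        self.x, self.y = x, y
        self.batch_size = batch_size
        self.threshold = threshold    # hypothetical switching condition
        self.last_accuracy = 0.0      # written by AccuracyToSequence below

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        s = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[s], self.y[s]

    def on_epoch_end(self):
        if self.last_accuracy > self.threshold:
            pass  # swap in the next dataset here

class AccuracyToSequence(tf.keras.callbacks.Callback):
    def __init__(self, sequence):
        super().__init__()
        self.sequence = sequence

    def on_epoch_end(self, epoch, logs=None):
        # push the epoch-level accuracy from the logs into the Sequence
        self.sequence.last_accuracy = (logs or {}).get('accuracy', 0.0)

# usage: model.fit(seq, epochs=..., callbacks=[AccuracyToSequence(seq)])

One caveat: depending on the Keras version, the Sequence's on_epoch_end may run before the callbacks' on_epoch_end, in which case the value read there lags by one epoch.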
I trained a model in keras using EarlyStopping in callbacks with patience=2:
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
          validation_split=validation_split,
          callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0,
                                                   patience=2, verbose=0, mode='min'),
                     keras.callbacks.ModelCheckpoint('best_model.h5', monitor='val_loss',
                                                     mode='min', verbose=1,
                                                     save_best_only=True)],
          class_weight=class_weights)
Epoch 1/20
3974/3975 [============================>.] - ETA: 0s - loss: 0.3499 - accuracy: 0.7683
Epoch 00001: val_loss improved from inf to 0.30331, saving model to best_model.h5
3975/3975 [==============================] - 15s 4ms/step - loss: 0.3499 - accuracy: 0.7683 - val_loss: 0.3033 - val_accuracy: 0.8134
Epoch 2/20
3962/3975 [============================>.] - ETA: 0s - loss: 0.2821 - accuracy: 0.8041
Epoch 00002: val_loss improved from 0.30331 to 0.25108, saving model to best_model.h5
3975/3975 [==============================] - 14s 4ms/step - loss: 0.2819 - accuracy: 0.8043 - val_loss: 0.2511 - val_accuracy: 0.8342
Epoch 3/20
3970/3975 [============================>.] - ETA: 0s - loss: 0.2645 - accuracy: 0.8157
Epoch 00003: val_loss did not improve from 0.25108
3975/3975 [==============================] - 14s 4ms/step - loss: 0.2645 - accuracy: 0.8157 - val_loss: 0.2687 - val_accuracy: 0.8338
Epoch 4/20
3962/3975 [============================>.] - ETA: 0s - loss: 0.2553 - accuracy: 0.8223
Epoch 00004: val_loss did not improve from 0.25108
3975/3975 [==============================] - 15s 4ms/step - loss: 0.2553 - accuracy: 0.8224 - val_loss: 0.2836 - val_accuracy: 0.8336
Wall time: 58.4 s
Obviously the model's val_loss didn't improve after epoch 2, and patience=2 led the algorithm to terminate after 4 epochs. When I run model.evaluate now, does it use the model trained for 4 epochs or the model trained for 2 epochs?
Do I then need to save and load the best model with ModelCheckpoint in order to evaluate it?
In your specific case, it retains the model after 4 epochs:
As you can see, the training accuracy and loss did indeed keep improving through all 4 epochs:
accuracy
0.7683 - 0.8043 - 0.8157 - 0.8224
loss
0.3499 - 0.2821 - 0.2645 - 0.2553
Early stopping would have stopped your training after the first two epochs had there been no improvement at all. In your run, the monitored val_loss improved up to epoch 2, so early stopping was not triggered at the end of epoch 2; training continued until val_loss failed to improve for patience=2 consecutive epochs (epochs 3 and 4), and the model returned is the one from the end of those 4 epochs.
From the TensorFlow documentation:
Assuming the goal of a training is to minimize the loss. With this, the metric to be monitored would be 'loss', and mode would be 'min'. A model.fit() training loop will check at end of every epoch whether the loss is no longer decreasing, considering the min_delta and patience if applicable. Once it's found no longer decreasing, model.stop_training is marked True and the training terminates.
If your model had not improved after the first two epochs, the returned model would have been the one from after those first two epochs.
The restore_best_weights parameter mentioned here is not directly connected to early stopping.
It simply restores the weights of the model from the epoch with the best detected performance. Again from the TensorFlow docs:
Whether to restore model weights from the epoch with the best value of the monitored quantity. If False, the model weights obtained at the last step of training are used.
As an example, you can use restore_best_weights even without early stopping. The training would then end only after your set number of epochs, but the model returned at the end would be the one with the best performance. This can be useful when the loss bounces around because of a wrong learning rate and/or optimizer.
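A minimal sketch combining the two, reusing the fit call from the question above (restore_best_weights is the only addition):

from tensorflow.keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss', mode='min', patience=2,
                   restore_best_weights=True)  # roll back to the best-val_loss weights
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
          validation_split=validation_split, callbacks=[es])
# model.evaluate(...) now uses the best epoch's weights directly,
# with no need to reload them from best_model.h5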
refs: early stopping tf doc
I am training a model using TensorFlow. I have a custom callback which, in on_epoch_end, prints out some data, specifically training accuracy or validation loss. With verbose=1 in model.fit, TensorFlow prints out the training loss, training accuracy, val_loss and val_accuracy as shown below; apparently it is also using tqdm to print the progress bar. The problem is that if I print out the training accuracy captured from acc = logs.get('accuracy') in the callback at the end of the epoch, it is different from the value model.fit prints out. It is as if the model.fit accuracy value is not from the last batch of the epoch but from the next-to-last batch. You can also see that my printout from the callback interrupts the model.fit printout. Note that for epoch 1 I have an accuracy of 0.9172 while the model.fit value is either 0.8805 or 0.8808. Does anyone know why this is or how to fix it? The data is shown below:
Epoch 1/30
129/129 [==============================] - ETA: 0s - loss: 3.1791 - accuracy: 0.8805
training accuracy improved from 0.0000 to 0.9172 learning rate held at 0.002000 # callback printed data
129/129 [==============================] - 50s 389ms/step - loss: 3.1723 - accuracy: 0.8808 - val_loss: 3.1509 - val_accuracy: 0.5783
Epoch 2/30
129/129 [==============================] - ETA: 0s - loss: 0.9478 - accuracy: 0.9671
training accuracy improved from 0.9172 to 0.9661 learning rate held at 0.002000 # callback printed data
129/129 [==============================] - 43s 333ms/step - loss: 0.9466 - accuracy: 0.9671 - val_loss: 0.9385 - val_accuracy: 0.8000
Epoch 3/30
129/129 [==============================] - ETA: 0s - loss: 0.4501 - accuracy: 0.9787
OK, it's a long story, but I am using TensorFlow 2.1. I had suppressed the annoying TensorFlow warnings, so I did not see any. Out of suspicion I enabled warnings and got the one below.
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch
time (batch time: 0.0350s vs `on_train_batch_end` time: 0.0910s). Check your callbacks.
After much searching I found that the warning shows up if you run model.fit with verbose=1. Apparently this was a bug in that TensorFlow release which caused the accuracy data printed out by model.fit to be incorrect. If you run model.fit with verbose=2 you do not get the warning, and the accuracy data printed is correct. Problem solved.
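In other words, the workaround is simply the following (train_data and val_data stand in for the asker's unnamed datasets):

# verbose=2 prints one line per epoch and avoids the buggy progress-bar path
history = model.fit(train_data, epochs=30, validation_data=val_data, verbose=2)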
I'm using Keras to predict if I'll get an output of 1 or 0. The data looks like this:
funded_amnt  emp_length  avg_cur_bal  num_actv_rev_tl  loan_status
10000        5.60088     19266        2                1
13750        5.60088     2802         6                0
26100        10.0000     19241        17               1
The target is loan_status and the features are the remaining. I've normalized the data before starting to build a Neural Network model.
Here's the shape of my training and testing data:
print(X_train.shape,Y_train.shape)
# Output: (693, 4) (693,)
print(X_test.shape,Y_test.shape)
# Output: (149, 4) (149,)
The process I followed to build the Neural Network is:
from keras.models import Sequential
from keras.layers import Dense

# define the keras model
model = Sequential()
model.add(Dense(4, input_dim=4,activation='relu'))
model.add(Dense(4 ,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
hist = model.fit(X_train, Y_train, epochs=10, batch_size=2)
The output from the fit:
Epoch 1/10
693/693 [==============================] - 2s 2ms/step - loss: 0.6379 - acc: 0.7013
Epoch 2/10
693/693 [==============================] - 0s 611us/step - loss: 0.5207 - acc: 0.7951
Epoch 3/10
693/693 [==============================] - 0s 605us/step - loss: 0.5126 - acc: 0.7951
Epoch 4/10
693/693 [==============================] - 0s 621us/step - loss: 0.5109 - acc: 0.7951
Epoch 5/10
693/693 [==============================] - 0s 611us/step - loss: 0.5105 - acc: 0.7951
Epoch 6/10
693/693 [==============================] - 0s 636us/step - loss: 0.5091 - acc: 0.7951
Epoch 7/10
693/693 [==============================] - 0s 644us/step - loss: 0.5090 - acc: 0.7951
Epoch 8/10
693/693 [==============================] - 0s 659us/step - loss: 0.5086 - acc: 0.7951
Epoch 9/10
693/693 [==============================] - 0s 668us/step - loss: 0.5083 - acc: 0.7951
Epoch 10/10
693/693 [==============================] - 0s 656us/step - loss: 0.5076 - acc: 0.7951
It's almost all the same and doesn't change after the second epoch. I've tried changing the number of epochs and the batch size but keep getting the same results.
Is this normal? Or is it a sign of overfitting that means I need to change some parameters?
Your test data is meant for monitoring the model's overfitting on the training data:
hist = model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=10, batch_size=2)
During training you will reach a point where the training loss continues to decrease but your test loss stops decreasing. That is the point where your model starts to overfit.
In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".
As an extreme example, if the number of parameters is the same as or greater than the number of observations, then a model can perfectly predict the training data simply by memorizing the data in its entirety. Such a model, though, will typically fail severely when making predictions.
Usually a learning algorithm is trained using some set of "training data": exemplary situations for which the desired output is known. The goal is that the algorithm will also perform well on predicting the output when fed "validation data" that was not encountered during its training. Overfitting is especially likely in cases where learning was performed too long or where training examples are rare, causing the learner to adjust to very specific random features of the training data, that have no causal relation to the target function. In this process of overfitting, the performance on the training examples still increases while the performance on unseen data becomes worse.
(In the classic illustration from the Wikipedia article, the green line represents an overfitted model and the black line a regularized model. While the green line best follows the training data, it is too dependent on that data and is likely to have a higher error rate on new, unseen data than the black line.)
Overfitting is not your problem right now; it can appear in models with a high accuracy (>95%), so you should try training your model more. If you want to check whether your model is suffering from overfitting, try forecasting using the validation data. If that accuracy looks too low while the training accuracy is high, then it is probably overfitting.
If you are overfitting, your training loss will continue decreasing but the validation accuracy won't improve. The problem in your case, though, is that your network doesn't have enough capacity to fit the data, or that the features you are using don't carry enough information to perfectly predict the loan status.
You can try to solve this either by increasing the capacity of your network, adding some layers (plus dropout, regularization, etc. to keep the extra capacity in check), or by adding more training data and more features if possible.
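For instance, a sketch of a higher-capacity variant of the model above (the layer widths and dropout rate are illustrative, not a recommendation from this answer):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, input_dim=4, activation='relu'))
model.add(Dropout(0.2))  # a little regularization to offset the added capacity
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])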
I am training a neural network in batches with Keras 2.0 package for Python.
Below is some information about the data and the training parameters:
#samples in train: 414934
#features: 590093
#classes: 2 (binary classification problem)
batch size: 1024
#batches = 406 (414934 / 1024 = 405.2)
Below are some logs from the following code:
for i in range(epochs):
    print("train_model:: starting epoch {0}/{1}".format(i + 1, epochs))
    model.fit_generator(generator=batch_generator(data_train, target_train, batch_size),
                        steps_per_epoch=num_of_batches,
                        epochs=1,
                        verbose=1)
(partial) Logs:
train_model:: starting epoch 1/3
Epoch 1/1
1/406 [..............................] - ETA: 11726s - loss: 0.7993 - acc: 0.5996
2/406 [..............................] - ETA: 11237s - loss: 0.7260 - acc: 0.6587
3/406 [..............................] - ETA: 14136s - loss: 0.6619 - acc: 0.7279
404/406 [============================>.] - ETA: 53s - loss: 0.3542 - acc: 0.8917
405/406 [============================>.] - ETA: 26s - loss: 0.3541 - acc: 0.8917
406/406 [==============================] - 10798s - loss: 0.3539 - acc: 0.8918
train_model:: starting epoch 2/3
Epoch 1/1
1/406 [..............................] - ETA: 15158s - loss: 0.2152 - acc: 0.9424
2/406 [..............................] - ETA: 14774s - loss: 0.2109 - acc: 0.9419
3/406 [..............................] - ETA: 16132s - loss: 0.2097 - acc: 0.9408
404/406 [============================>.] - ETA: 64s - loss: 0.2225 - acc: 0.9329
405/406 [============================>.] - ETA: 32s - loss: 0.2225 - acc: 0.9329
406/406 [==============================] - 13127s - loss: 0.2225 - acc: 0.9329
train_model:: starting epoch 3/3
Epoch 1/1
1/406 [..............................] - ETA: 22631s - loss: 0.1145 - acc: 0.9756
2/406 [..............................] - ETA: 24469s - loss: 0.1220 - acc: 0.9688
3/406 [..............................] - ETA: 23475s - loss: 0.1202 - acc: 0.9691
404/406 [============================>.] - ETA: 60s - loss: 0.1006 - acc: 0.9745
405/406 [============================>.] - ETA: 31s - loss: 0.1006 - acc: 0.9745
406/406 [==============================] - 11147s - loss: 0.1006 - acc: 0.9745
My question is: what happens after each epoch that improves the accuracy like that? For example, the accuracy at the end of the first epoch is 0.8918, but at the beginning of the second epoch an accuracy of 0.9424 is observed. Similarly, the accuracy at the end of the second epoch is 0.9329, but the third epoch starts with an accuracy of 0.9756.
I would expect to find an accuracy of ~0.8918 at the beginning of the second epoch, and ~0.9329 at the beginning of the third epoch.
I know that in each batch there is one forward pass and one backward pass of training samples in the batch. Thus, in each epoch there is one forward pass and one backward pass of all training samples.
Also, from Keras documentation:
Epoch: an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation.
Why is the accuracy improvement within each epoch smaller than the accuracy improvement between the end of epoch X and the beginning of epoch X+1?
This has nothing to do with your model or your dataset; the reason for this "jump" lies in how metrics are calculated and displayed in Keras.
As Keras processes batch after batch, it saves accuracies at each one of them, and what it displays to you is not the accuracy on the latest processed batch, but the average over all batches in the current epoch. And, as the model is being trained, accuracies over successive batches tend to improve.
Now consider: in the first epoch there are, let's say, 50 batches, and the network went from 0% to 90% accuracy over these 50 batches. Then at the end of the epoch Keras will show an accuracy of, e.g., (0 + 0.1 + 0.5 + ... + 90) / 50 %, which is obviously much less than 90%! But because your actual accuracy is 90%, the first batch of the second epoch will show 90%, giving the impression of a sudden "jump" in quality. The same, obviously, goes for loss or any other metric.
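A toy illustration of that averaging (the numbers are made up to mirror the 50-batch example above):

import numpy as np

batch_acc = np.linspace(0.0, 0.9, 50)  # per-batch accuracy rising over one epoch
print(batch_acc.mean())   # ~0.45: the running average Keras displays at epoch end
print(batch_acc[-1])      # 0.90: roughly what the first batch of the next epoch shows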
Now, if you want a more realistic and trustworthy estimate of accuracy, loss, or any other metric, I would suggest using the validation_data parameter in model.fit[_generator] to provide validation data, which will not be used for training but only to evaluate the network at the end of each epoch, without averaging over various points in time.
The accuracy displayed at the end of an epoch is the average over all batches in that epoch, while the accuracy after each batch is the average over the batches processed so far. It could be the case that your first batch is predicted very well and the following batches have a lower accuracy; in that case the epoch-level average will be low compared to the accuracy of your first batch.