I am training a neural network with a constant learning rate for 45 epochs. I observed that accuracy is highest around epoch 35, after which it wiggles around and decreases. I think I need to reduce the learning rate at epoch 35. Is there any way to resume training from epoch 35 after all the epochs have completed? My code is shown below:
model_nn = keras.Sequential()
model_nn.add(Dense(352, input_dim=28, activation='relu', kernel_regularizer=l2(0.001)))
model_nn.add(Dense(384, activation='relu', kernel_regularizer=l2(0.001)))
model_nn.add(Dense(288, activation='relu', kernel_regularizer=l2(0.001)))
model_nn.add(Dense(448, activation='relu', kernel_regularizer=l2(0.001)))
model_nn.add(Dense(320, activation='relu', kernel_regularizer=l2(0.001)))
model_nn.add(Dense(1, activation='sigmoid'))
auc_score = tf.keras.metrics.AUC()
model_nn.compile(loss='binary_crossentropy',
                 optimizer=keras.optimizers.Adam(learning_rate=0.0001),
                 metrics=['accuracy', auc_score])
history = model_nn.fit(X_train1, y_train1,
                       validation_data=(X_test, y_test),
                       epochs=45,
                       batch_size=250,
                       verbose=1)
# evaluate returns [loss, accuracy, auc] because two metrics were compiled in
_, accuracy, _ = model_nn.evaluate(X_test, y_test)
# Saving the full model (architecture + weights)
model_nn.save('mymodel.h5')
You can do two useful things:
Use the ModelCheckpoint callback with save_best_only=True. It saves the model only when it is the "best" so far according to the monitored quantity, so the saved checkpoint always holds the best model seen during training.
Use the ReduceLROnPlateau and EarlyStopping callbacks. ReduceLROnPlateau reduces the learning rate when the monitored metric has stopped improving on the validation subset. EarlyStopping stops training when the monitored metric has stopped improving at all.
In simple words: ReduceLROnPlateau helps the optimizer settle into a better minimum, EarlyStopping takes care of the number of epochs, and ModelCheckpoint saves the best model.
The code might look like this:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

early_stopping = EarlyStopping(monitor='val_loss', patience=5, min_delta=0.0001)
reduce_lr_loss = ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=1, min_delta=0.0001, factor=0.65)
# ModelCheckpoint requires a filepath as its first argument
model_checkpoint = ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True)
history = model_nn.fit(X_train1, y_train1,
                       validation_data=(X_test, y_test),
                       epochs=100,
                       batch_size=250,
                       verbose=1,
                       callbacks=[early_stopping, reduce_lr_loss, model_checkpoint])
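To answer the original question directly: once the best epoch's weights are on disk, you can reload that checkpoint and continue training at a lower learning rate. A minimal sketch, assuming the 'best_model.h5' checkpoint path used above and the variable names from the question:
from tensorflow.keras.models import load_model

# Reload the best checkpoint written by ModelCheckpoint
model_nn = load_model('best_model.h5')

# Recompile with a smaller learning rate before resuming
model_nn.compile(loss='binary_crossentropy',
                 optimizer=keras.optimizers.Adam(learning_rate=0.00001),
                 metrics=['accuracy'])

# Resume training; initial_epoch only affects the epoch numbering in the logs,
# and epochs is the index of the final epoch, so this trains 25 more epochs
history = model_nn.fit(X_train1, y_train1,
                       validation_data=(X_test, y_test),
                       initial_epoch=35,
                       epochs=60,
                       batch_size=250,
                       verbose=1)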
I'm new to ML and would like to know what I'm missing or doing incorrectly.
I'm trying to figure out why my model underfits the data when I apply early stopping and dropout; when I don't use early stopping or dropout, the fit seems to be okay.
Dataset I'm using:
https://www.kaggle.com/datasets/kanths028/usa-housing
Model Parameters:
The dataset has 5 features to train on and the target is the price
I chose 4 layers arbitrarily
Epochs at 600 (way too many) because I want to test early stopping
I chose the optimizer and loss because those seemed to give me the most consistent results when compared to scikit-learn's LinearRegression (MAE is about 81K)
Data pre-processing:
X = df[df.columns[:-2]].values
y = df['Price'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
Fit looks okay:
model = Sequential()
model.add(Dense(5, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mae')
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=600)
The model looks underfit with early stopping and dropout combined:
model = Sequential()
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=25)
model.compile(optimizer='adam', loss='mae')
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=600, callbacks=[early_stopping])
I'm trying to figure out why early stopping would stop when the results are so far off. I would have guessed that the model would continue until the end of the 600 epochs; however, early stopping pulls the plug around epoch 300.
I'm probably doing something wrong, but I can't figure it out, so any insights would be appreciated. Thank you in advance :)
EarlyStopping defines a performance measure and specifies whether to maximize or minimize it.
Keras then stops training at the appropriate epoch. When verbose=1 is set, Keras prints a message on screen when training is stopped.
es = EarlyStopping(monitor='val_loss', mode='min')
It may not be effective to stop right away the first time performance fails to improve. patience defines how many consecutive non-improving epochs to tolerate. Patience is a rather subjective criterion; the optimal value depends on the data and the design of the model used.
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=50)
When training is stopped by the EarlyStopping object, the model's state will generally have a higher validation error than the best model seen earlier. Early stopping halts training at the point where the validation error stops decreasing, but the stopped state is not the best model. It is therefore necessary to store the model with the best validation performance, and for this purpose Keras provides the ModelCheckpoint object. It monitors the validation error and saves the parameters whenever validation performance is better than in any previous epoch. When training is stopped, the model with the best validation performance can then be restored.
from keras.callbacks import ModelCheckpoint
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True)
Both callbacks are then passed in the callbacks parameter, allowing the best model to be stored:
hist = model.fit(train_x, train_y, epochs=10,
                 batch_size=10, verbose=2, validation_split=0.2,
                 callbacks=[early_stopping, mc])
In your case, patience=25 means training ends when the monitored value fails to improve for 25 consecutive epochs.
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import EarlyStopping, ModelCheckpoint
model = Sequential()
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=25, verbose=1)
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True)
model.compile(optimizer='adam', loss='mae')
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=600, callbacks=[early_stopping, mc])
I recommend two things. In the EarlyStopping callback, set the parameter
restore_best_weights=True
This way, if the early stopping callback activates, your model is set to the weights of the epoch with the lowest validation loss. To reach a lower validation loss, I recommend the ReduceLROnPlateau callback. My recommended code for these callbacks is shown below.
estop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
                                         verbose=1, restore_best_weights=True)
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=2, verbose=1)
callbacks = [estop, rlronp]
In model.fit, set the parameter callbacks=callbacks. Set epochs to a large number so that the estop callback is likely to be activated, as in the sketch below.
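A minimal sketch of the fit call under these assumptions (the data names are placeholders from the question):
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=600,  # deliberately large; early stopping ends training sooner
                    callbacks=callbacks)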
I am doing text classification; my dataset size is 16,000 KB. My problem is that I get 95% accuracy in training and 90% in testing. Can I increase the testing accuracy, and how?
Here is my code:
model = Sequential()
model.add(Conv1D(filters=256, kernel_size=5, activation='relu', input_shape=(7, 1)))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(Dense(11, activation='softmax'))
model.summary()
model.compile(Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    epochs=200,
                    verbose=True,
                    validation_data=(X_test, y_test),
                    batch_size=128)
loss, accuracy = model.evaluate(X_train, y_train, verbose=True)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy: {:.4f}".format(accuracy))
The first step in debugging the model is to plot the training/validation curves, like the example below (a plotting sketch follows the list).
[Image: typical training/validation curves]
Now, based on how the curves behave, there are the following possible inferences and solutions.
The two curves diverge as the model is trained: training keeps improving while testing either gets worse or saturates far earlier than training.
Cause: the model is overfitting the training data and needs regularisation, e.g. dropout, weight decay, etc.
The two curves stick close together at the end and no further improvement happens.
Cause: the model is saturated or stuck in a local minimum. Try increasing the learning rate to push out of the minimum; if there are still no major improvements, try adding more complexity to the model.
The two curves have saturated at the end but are a small distance apart, and no major changes happen with further training.
Cause: the model has learned what it could from the available data and will not improve further; try data transformations to generate new data, or get more data.
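A minimal sketch for producing such curves, assuming `history` is the object returned by model.fit and the model was fit with validation_data:
import matplotlib.pyplot as plt

# history = model.fit(..., validation_data=(X_test, y_test))
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()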
I am using tensorflow and keras for a binary classification problem.
I have only 121 samples but 20,000 features. I know that's too few samples for so many features, but it's a biological problem (gene-expression data), so I have to deal with it.
My question: why does accuracy (train and test) go up to 100%, then drop, and then increase again, while loss decreases the whole time?
[Image: accuracy plot]
[Image: validation plot]
Since my dataset is only 118 samples big, I have only 24 test data points. See the confusion matrix:
[Image: confusion matrix]
This is my neural network architecture:
with current settings:
{'ann__dropout_rate': 0.4, 'ann__learning_rate': 0.01, 'ann__n_neurons': 16, 'ann__num_hidden': 1, 'ann__regularization_rate': 0.6}
# (wrapped in a function, reconstructed from the dangling `return model`; the name is hypothetical)
def create_ann(input_shape, n_neurons, dropout_rate, learning_rate, regularization_rate):
    model = Sequential()
    model.add(Dense(input_shape, activation="relu",
                    input_dim=input_shape))  # first layer
    model.add(Dense(n_neurons, activation="relu",
                    kernel_regularizer=tf.keras.regularizers.l1(regularization_rate)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation="sigmoid"))
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(loss="binary_crossentropy",
                  optimizer=optimizer, metrics=['accuracy'])
    return model
Thank you!
Try shuffling your training data if you are not doing so already. You might also try a larger batch size. I also recommend using the ReduceLROnPlateau callback in model.fit; documentation is here. Set it up to monitor validation loss and to reduce the learning rate by a factor < 1 if the loss fails to decrease after patience epochs.
I implemented @Gerry P's ideas (shuffle=True) and ReduceLROnPlateau (batch size is 64). My callbacks are now:
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, min_lr=1e-6, verbose=1)
early_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=20, mode='auto')
My accuracy and loss now look like this:
[Image: accuracy and loss plots]
I would say it's still overfitted.
Confusion matrix:
[Image: confusion matrix]
I'm trying to change the learning rate of my model after it has been trained with a different learning rate.
I read here, here, here, and in some other places I can't even find anymore.
I tried:
model.optimizer.learning_rate.set_value(0.1)
model.optimizer.lr = 0.1
model.optimizer.learning_rate = 0.1
K.set_value(model.optimizer.learning_rate, 0.1)
K.set_value(model.optimizer.lr, 0.1)
model.optimizer.lr.assign(0.1)
... but none of them worked!
I don't understand how there could be such confusion around such a simple thing. Am I missing something?
EDIT: Working example
Here is a working example of what I'd like to do:
from keras.models import Sequential
from keras.layers import Dense
import keras
import numpy as np
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse',
              optimizer=optimizer)
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=50)
# Change learning rate to 0.001 and train for 50 more epochs
# (epochs is the final epoch index, so it must be 100 when initial_epoch=50)
model.fit(np.random.randn(50,10), np.random.randn(50), initial_epoch=50, epochs=100)
You can change the learning rate as follows:
from keras import backend as K
K.set_value(model.optimizer.learning_rate, 0.001)
Included into your complete example it looks as follows:
from keras.models import Sequential
from keras.layers import Dense
from keras import backend as K
import keras
import numpy as np
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse', optimizer=optimizer)
print("Learning rate before first fit:", model.optimizer.learning_rate.numpy())
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=50, verbose=0)
# Change learning rate to 0.001 and train for 50 more epochs
K.set_value(model.optimizer.learning_rate, 0.001)
print("Learning rate before second fit:", model.optimizer.learning_rate.numpy())
model.fit(np.random.randn(50,10),
          np.random.randn(50),
          initial_epoch=50,
          epochs=100,  # epochs is the final epoch index, so this trains 50 more epochs
          verbose=0)
I've just tested this with keras 2.3.1. Not sure why the approach didn't seem to work for you.
There is another way: find the variable that holds the learning rate and assign it another value.
optimizer = tf.keras.optimizers.Adam(0.001)
optimizer.learning_rate.assign(0.01)
print(optimizer.learning_rate)
output:
<tf.Variable 'learning_rate:0' shape=() dtype=float32, numpy=0.01>
You can change the learning rate during training with a LearningRateScheduler callback:
from keras.callbacks import LearningRateScheduler

# This is a sample of a scheduler I used in the past
def lr_scheduler(epoch, lr):
    decay_rate = 0.85
    decay_step = 1
    # Multiply the incoming (already-decayed) lr by decay_rate once per
    # decay_step epochs, giving exponential decay overall
    if epoch % decay_step == 0 and epoch:
        return lr * decay_rate
    return lr
Apply the scheduler to your model:
callbacks = [LearningRateScheduler(lr_scheduler, verbose=1)]
model = build_model(pretrained_model=ka.InceptionV3, input_shape=(224, 224, 3))
history = model.fit(train, callbacks=callbacks, epochs=EPOCHS, verbose=1)
You should define it in the compile step:
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse',
              optimizer=optimizer,
              metrics=['categorical_accuracy'])
Looking at your comment, if you want to change the learning rate after training has begun, you need to use a scheduler: link
Edit with your code and scheduler:
from keras.models import Sequential
from keras.layers import Dense
import keras
import numpy as np
def lr_scheduler(epoch, lr):
    # Drop the learning rate to 0.001 after epoch 50
    if epoch > 50:
        return 0.001
    return lr
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse',
              optimizer=optimizer)
callbacks = [keras.callbacks.LearningRateScheduler(lr_scheduler, verbose=1)]
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=100, callbacks=callbacks)
Suppose you use the Adam optimizer in Keras; you'd want to define your optimizer before you compile your model with it.
For example, you can define
myadam = keras.optimizers.Adam(learning_rate=0.1)
Then, you compile your model with this optimizer.
In case you want to change your optimizer (to a different type of optimizer or to a different learning rate), you can define a new optimizer and compile your existing model with the new optimizer.
Hope this helps!
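A minimal sketch of that recompile approach, with placeholder data names; note that recompiling keeps the trained weights but gives the new optimizer fresh state (Adam's moment estimates restart):
myadam = keras.optimizers.Adam(learning_rate=0.1)
model.compile(loss='mse', optimizer=myadam)
model.fit(X_train, y_train, epochs=50)

# Switch to a smaller learning rate by recompiling with a new optimizer;
# the weights are preserved, the optimizer state is reset
new_adam = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='mse', optimizer=new_adam)
model.fit(X_train, y_train, initial_epoch=50, epochs=100)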
Some time ago I had a project for which I needed something similar. My idea for changing the learning rate was to compile a new model with the new rate, then load the parameter weights from the old model into the new one.
For your example:
from keras.models import Sequential
from keras.layers import Dense
import keras
import numpy as np
# Initial model
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse', optimizer=optimizer)
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=50)
# Change learning rate to 0.001 and train for 50 more epochs
new_model = Sequential()
new_model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.001)
new_model.compile(loss='mse', optimizer=optimizer)
new_model.set_weights(model.get_weights())
model = new_model
model.fit(np.random.randn(50,10), np.random.randn(50), initial_epoch=50, epochs=100)
With this, you may see a worse fit in the first few epochs, because Adam uses statistics from previous steps to optimize (its moment estimates), and you will lose them.
Hope it helps someone!
I am training a neural network with Keras. I set num_epochs to a high number and let EarlyStopping terminate training.
model = Sequential()
model.add(Dense(1, input_shape=(nFeatures,), activation='linear'))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse', 'mae'])
early_stopping_monitor = EarlyStopping(monitor='val_loss', patience=15, verbose=1, mode='auto')
checkpointer = ModelCheckpoint(filepath = fname_saveWeights, verbose=1, save_best_only=True)
seqModel = model.fit(X_train, y_train, batch_size=4, epochs=num_epochs,
                     validation_data=(X_test, y_test), shuffle=True,
                     callbacks=[early_stopping_monitor, checkpointer], verbose=2)
This works fine. However, I then attempt to plot the loss function:
val_loss = seqModel.history['val_loss']
xc = range(num_epochs)
plt.figure()
plt.plot(xc, val_loss)
plt.show()
I am attempting to plot over the range num_epochs (xc), but EarlyStopping ends training much earlier, so I get a shape mismatch error.
How can I detect at what epoch EarlyStopping ended, to resolve the mismatch?
The verbose setting prints the ending epoch to the screen, but I cannot figure out how to access that value for use in the plot.
It is set as a field inside the callback (see the Keras source):
early_stopping_monitor.stopped_epoch
will give you the epoch it stopped at after training, or 0 if it didn't stop early.
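A minimal sketch of the plotting fix under the question's variable names (plotting against len(val_loss) also works without touching the callback):
import matplotlib.pyplot as plt

val_loss = seqModel.history['val_loss']

# Plot against the number of epochs actually run, not num_epochs
xc = range(len(val_loss))
plt.figure()
plt.plot(xc, val_loss)
plt.show()

# Epoch at which EarlyStopping fired (0 if it never fired)
print('stopped at epoch:', early_stopping_monitor.stopped_epoch)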