Keras: change learning rate - python

I'm trying to change the learning rate of my model after it has been trained with a different learning rate.
I read here, here, here and some other places i can't even find anymore.
I tried:
model.optimizer.learning_rate.set_value(0.1)
model.optimizer.lr = 0.1
model.optimizer.learning_rate = 0.1
K.set_value(model.optimizer.learning_rate, 0.1)
K.set_value(model.optimizer.lr, 0.1)
model.optimizer.lr.assign(0.1)
... but none of them worked!
I don't understand how there could be such confusion around such a simple thing. Am I missing something?
EDIT: Working example
Here is a working example of what I'd like to do:
from keras.models import Sequential
from keras.layers import Dense
import keras
import numpy as np
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse',
optimizer=optimizer)
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=50)
# Change learning rate to 0.001 and train for 50 more epochs
model.fit(np.random.randn(50,10), np.random.randn(50), initial_epoch=50, epochs=50)

You can change the learning rate as follows:
from keras import backend as K
K.set_value(model.optimizer.learning_rate, 0.001)
Included into your complete example it looks as follows:
from keras.models import Sequential
from keras.layers import Dense
from keras import backend as K
import keras
import numpy as np
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse', optimizer=optimizer)
print("Learning rate before first fit:", model.optimizer.learning_rate.numpy())
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=50, verbose=0)
# Change learning rate to 0.001 and train for 50 more epochs
K.set_value(model.optimizer.learning_rate, 0.001)
print("Learning rate before second fit:", model.optimizer.learning_rate.numpy())
model.fit(np.random.randn(50,10),
np.random.randn(50),
initial_epoch=50,
epochs=50,
verbose=0)
I've just tested this with keras 2.3.1. Not sure why the approach didn't seem to work for you.

There is another way, you have to find the variable that holds the learning rate and assign it another value.
optimizer = tf.keras.optimizers.Adam(0.001)
optimizer.learning_rate.assign(0.01)
print(optimizer.learning_rate)
output:
<tf.Variable 'learning_rate:0' shape=() dtype=float32, numpy=0.01>

You can change lr during training with
from keras.callbacks import LearningRateScheduler
# This is a sample of a scheduler I used in the past
def lr_scheduler(epoch, lr):
decay_rate = 0.85
decay_step = 1
if epoch % decay_step == 0 and epoch:
return lr * pow(decay_rate, np.floor(epoch / decay_step))
return lr
Apply scheduler to your model
callbacks = [LearningRateScheduler(lr_scheduler, verbose=1)]
model = build_model(pretrained_model=ka.InceptionV3, input_shape=(224, 224, 3))
history = model.fit(train, callbacks=callbacks, epochs=EPOCHS, verbose=1)

You should define it in the compile function :
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse',
optimizer=optimizer,
metrics=['categorical_accuracy'])
Looking at your comment, if you want to change the learning rate after the beginning you need to use a scheduler : link
Edit with your code and scheduler:
from keras.models import Sequential
from keras.layers import Dense
import keras
import numpy as np
def lr_scheduler(epoch, lr):
if epoch > 50:
lr = 0.001
return lr
return lr
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse',
optimizer=optimizer)
callbacks = [keras.callbacks.LearningRateScheduler(lr_scheduler, verbose=1)]
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=100, callbacks=callbacks)

Suppose that you use Adam optimizer in keras, you'd want to define your optimizer before you compile your model with it.
For example, you can define
myadam = keras.optimizers.Adam(learning_rate=0.1)
Then, you compile your model with this optimizer.
I case you want to change your optimizer (with different type of optimizer or with different learning rate), you can define a new optimizer and compile your existing model with the new optimizer.
Hope this helps!

Some time ago I had a project for which I needed something similar. My idea to change the learning rate was to compile a new model with the new rate, then load the parameter weights from de old model to the new one.
For your example:
from keras.models import Sequential
from keras.layers import Dense
import keras
import numpy as np
# Initial model
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse', optimizer=optimizer)
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=50)
# Change learning rate to 0.001 and train for 50 more epochs
new_model = Sequential()
new_model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.001)
new_model.compile(loss='mse', optimizer=optimizer)
new_model.set_weights(model.get_weights())
model = new_model
model.fit(np.random.randn(50,10), np.random.randn(50), initial_epoch=50, epochs=50)
With this you could see a worse fit of your model in the first epochs because ADAM uses previous steps to optimize and you will lose them.
Hope it helps someone!

Related

Keras Resnet-50 image classification overfitting

Hello I am getting overfitting with resnet-50 pretrained weights. I am trying to train RGB images of files and the dataset I am using comes with training and validation sets. I have 26 classes and about 14k images, 9 k training and 5k testing.
The name of data set is maleviz
My validation accuracy is very low and my training accuracy reaches 1.000. My validation doesn't go over 0.50-0.55 so seems to be overfitting I think.. Is there something wrong with data like per class samples or is there something wrong with my model?
I expect resnet to perform well on this...
Here is my code:
import tensorflow as tf
import keras
from keras import backend as K
from keras.preprocessing.image import ImageDataGenerator
import keras
from keras.models import Sequential, Model, load_model
from tensorflow.keras.optimizers import Adam
from keras.callbacks import EarlyStopping,ModelCheckpoint
from keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization,Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D,MaxPool2D
from keras.preprocessing import image
from keras.initializers import glorot_uniform
from keras.applications.resnet import ResNet50
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
samples = ImageDataGenerator().flow_from_directory(directory='malevis_train_val_300x300/train', target_size=(300,300))
imgs, labels = next(samples)
print(imgs.shape, labels.shape)
samples2 = ImageDataGenerator().flow_from_directory(directory='malevis_train_val_300x300/val', target_size=(300,300))
imgs2, labels2 = next(samples2)
classes = samples.class_indices.keys()
y = (sum(labels)/labels.shape[0])*100
plt.xticks(rotation='vertical')
plt.bar(classes,y)
plt.show()
X_train, y_train = imgs,labels
X_val, y_val = imgs2,labels2
def define_model():
model = ResNet50(weights = 'imagenet', pooling = 'avg', include_top = False, input_shape =(300,300,3))
for layer in model.layers:
layer.trainable = False
flat1 = Flatten()(model.layers[-1].output)
class1 = Dense(256,activation='relu',)(flat1)
output = Dense(26,activation='softmax')(class1)
model = Model(inputs = model.inputs, outputs=output)
opt = Adam(lr =0.001)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
return model
model = define_model()
model.summary()
history1 = model.fit(X_train,y_train, validation_data=(X_val,y_val), epochs = 200,batch_size = 20, steps_per_epoch = 4,shuffle=True)
scores = model.evaluate(X_val,y_val)
print('Final accuracy:', scores[1])
acc = history1.history['accuracy']
val_acc = history1.history['val_accuracy']
loss = history1.history['loss']
val_loss = history1.history['val_loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend(loc=0)
plt.figure()
plt.show()
I have tried different optimizers, loss functions, target size, and added epochs per step.. Nothing really makes much different it still overfits. I am using softmax activation and freezing the layers and removing top. I just then add dense layer and output layer for 26 classes.I have tried with shuffling true and false
I would like to suggest you a few things, one of them might be helpful:
You didn't provide any classes parameter inside flow_from_directory() make sure you have the proper folder structure as the documentation requires: flow_from_directory
Try changing the loss from categorical_crossentropy to sparse_categorical_crossentropy if your output labels are not one-hot encoded. Ref: Probabilistic losses | SparseCategoricalCrossentropy

Transfer learning on MobileNetV3 reaches plateau and I can't move past it

I'm trying to do transfer learning on MobileNetV3-Small using Tensorflow 2.5.0 to predict dog breeds (133 classes) and since it got reasonable accuracy on the ImageNet dataset (1000 classes) I thought it should have no problem adapting to my problem.
I've tried a multitude of training variations and recently had a breakthrough but now my training stagnates at about 60% validation accuracy with minor fluctuations in validation loss (accuracy and loss curves for training and validation below).
I tried using ReduceLROnPlateau in the 3rd graph below, but it didn't help to improve matters. Can anyone suggest how I could improve the training?
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import GlobalMaxPooling2D, Dense, Dropout, BatchNormalization
from tensorflow.keras.applications import MobileNetV3Large, MobileNetV3Small
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True # needed for working with this dataset
# define generators
train_datagen = ImageDataGenerator(vertical_flip=True, horizontal_flip=True,
rescale=1.0/255, brightness_range=[0.5, 1.5],
zoom_range=[0.5, 1.5], rotation_range=90)
test_datagen = ImageDataGenerator(rescale=1.0/255)
train_gen = train_datagen.flow_from_directory(train_dir, target_size=(224,224),
batch_size=32, class_mode="categorical")
val_gen = test_datagen.flow_from_directory(val_dir, target_size=(224,224),
batch_size=32, class_mode="categorical")
test_gen = test_datagen.flow_from_directory(test_dir, target_size=(224,224),
batch_size=32, class_mode="categorical")
pretrained_model = MobileNetV3Small(input_shape=(224,224,3), classes=133,
weights="imagenet", pooling=None, include_top=False)
# set all layers trainable because when I froze most of the layers the model didn't learn so well
for layer in pretrained_model.layers:
layer.trainable = True
last_output = pretrained_model.layers[-1].output
x = GlobalMaxPooling2D()(last_output)
x = BatchNormalization()(x)
x = Dense(512, activation='relu')(x)
x = Dense(133, activation='softmax')(x)
model = Model(pretrained_model.input, x)
model.compile(optimizer=Adam(learning_rate=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
# val_acc with min_delta 0.003; val_loss with min_delta 0.01
plateau = ReduceLROnPlateau(monitor="val_loss", mode="min", patience=5,
min_lr=1e-8, factor=0.3, min_delta=0.01,
verbose=1)
checkpointer = ModelCheckpoint(filepath=savepath, verbose=1, save_best_only=True,
monitor="val_accuracy", mode="max",
save_weights_only=True)
Your code looks good, but it seems to have one issue - you might be rescaling the inputs twice. According to the docs for MobilenetV3:
The preprocessing logic has been included in the mobilenet_v3 model implementation. Users are no longer required (...) to normalize the input data.
Now, in your code, there is:
test_datagen = ImageDataGenerator(rescale=1.0/255)
which essentially, makes the first model layers to rescale, already rescaled values.
The same applies for train_datagen.
You could try removing the rescale argument from both train and test loaders, or setting rescale=None.
This could also explain why the model did not learn well with the backbone frozen.

LOSS not changeing in very simple KERAS binary classifier

I'm trying to get a very (over) simplified Keras binary classifier neural network running without success. The LOSS just stays constant. I've played around with Optimizers (SGD, Adam, RMSProp), Learningrates, Weight-Initializations, Batch Size and input data normalization so far.
Nothing changes at all. Am I doing something fundamentally wrong? Here is the code:
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
data = np.array(
[
[100,35,35,12,0],
[101,46,35,21,0],
[130,56,46,3412,1],
[131,58,48,3542,1]
]
)
x = data[:,1:-1]
y_target = data[:,-1]
x = x / np.linalg.norm(x)
model = Sequential()
model.add(Dense(3, input_shape=(3,), activation='softmax', kernel_initializer='lecun_normal',
bias_initializer='lecun_normal'))
model.add(Dense(1, activation='softmax', kernel_initializer='lecun_normal',
bias_initializer='lecun_normal'))
model.compile(optimizer=SGD(learning_rate=0.1),
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(x, y_target, batch_size=2, epochs=10,
verbose=1)
Softmax definition is:
exp(a) / sum(exp(a)
so when you use with a single neuron you will get:
exp(a) / exp(a) = 1
That is why your classifier doesn't work with a single neuron.
You can use sigmoid instead in this special case:
exp(a) / (exp(a) + 1)
Furthermore sigmoid function is for two class classifiers. Softmax is an extension of sigmoid for multiclass classifers.
For the first layer you should use relu or sigmoid function instead of softmax.
This is the working solution based on the feedback I got
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.utils import to_categorical
data = np.array(
[
[100,35,35,12,0],
[101,46,35,21,0],
[130,56,46,3412,1],
[131,58,48,3542,1]
]
)
x = data[:,1:-1]
y_target = data[:,-1]
x = x / np.linalg.norm(x)
model = Sequential()
model.add(Dense(3, input_shape=(3,), activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=SGD(learning_rate=0.1),
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(x, y_target, epochs=1000,
verbose=1)

Strange behavior of keras v1.2.2 vs. keras v2+ (HUGE differences in accuracy)

Today I've ran into some very strange behavior of Keras. When I try to do a classification run on the iris-dataset with a simple model, keras version 1.2.2 gives me +- 95% accuracy, whereas a keras version of 2.0+ predicts the same class for every training example (leading to an accuracy of +- 35%, as there are three types of iris). The only thing that makes my model predict +-95% accuracy is downgrading keras to a version below 2.0:
I think it is a problem with Keras, as I have tried the following things, all do not make a difference;
Switching activation function in the last layer (from Sigmoid to softmax).
Switching backend (Theano and Tensorflow both give roughly same performance).
Using a random seed.
Varying the number of neurons in the hidden layer (I only have 1 hidden layer in this simple model).
Switching loss-functions.
As the model is very simple and it runs on it's own (You just need the easy-to-obtain iris.csv dataset) I decided to include the entire code;
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
#Load data
data_frame = pd.read_csv("iris.csv", header=None)
data_set = data_frame.values
X = data_set[:, 0:4].astype(float)
Y = data_set[:, 4]
#Encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
def baseline_model():
#Create & Compile model
model = Sequential()
model.add(Dense(8, input_dim=4, init='normal', activation='relu'))
model.add(Dense(3, init='normal', activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
#Create Wrapper For Neural Network Model For Use in scikit-learn
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
#Create kfolds-cross validation
kfold = KFold(n_splits=10, shuffle=True)
#Evaluate our model (Estimator) on dataset (X and dummy_y) using a 10-fold cross-validation procedure (kfold).
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Accuracy: {:2f}% ({:2f}%)".format(results.mean()*100, results.std()*100))
if anyone wants to replicate the error here are the dependencies I used to observe the problem:
numpy=1.16.4
pandas=0.25.0
sk-learn=0.21.2
theano=1.0.4
tensorflow=1.14.0
In Keras 2.0, many parameters changed names, there is compatibility layer to keep things working, but somehow it did not apply when using KerasClassifier.
In this part of the code:
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
You are using the old name nb_epoch instead of the modern name of epochs. The default value is epochs=1, meaning that your model was only being trained for one epoch, producing very low quality predictions.
Also note that here:
model.add(Dense(3, init='normal', activation='sigmoid'))
You should be using a softmax activation instead of sigmoid, as you are using the categorical cross-entropy loss:
model.add(Dense(3, init='normal', activation='softmax'))
I've managed to isolate the issue, if you change nb_epoch to epochs, (All else being exactly equal) the model predicts very good again, in keras 2 as well. I don't know if this is intended behavior or a bug.

Tensor math with tensorflow backend

I was trying add custom metrics while training my LSTM using keras. See code below:
from keras.models import Sequential
from keras.layers import Dense, LSTM, Masking, Dropout
from keras.optimizers import SGD, Adam, RMSprop
import keras.backend as K
import numpy as np
_Xtrain = np.random.rand(1000,21,47)
_ytrain = np.random.randint(2, size=1000)
_Xtest = np.random.rand(200,21,47)
_ytest = np.random.randint(1, size=200)
def t1(y_pred, y_true):
return K.tf.count_nonzero((1 - y_true))
def t2(y_pred, y_true):
return K.tf.count_nonzero(y_true)
def build_model():
model = Sequential()
model.add(Masking(mask_value=0, input_shape=(21, _Xtrain[0].shape[1])))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(64, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))
rms = RMSprop(lr=.001, decay=.001)
model.compile(loss='binary_crossentropy', optimizer=rms, metrics=[t1, t2])
return model
model = build_model()
hist = model.fit(_Xtrain, _ytrain, epochs=1, batch_size=5, validation_data=(_Xtest, _ytest), shuffle=True)
The output of the above code is as follows:
Train on 1000 samples, validate on 200 samples
Epoch 1/1
1000/1000 [==============================] - 5s - loss: 0.6958 - t1: 5.0000 - t2: 5.0000 - val_loss: 0.6975 - val_t1: 5.0000 - val_t2: 5.0000
So it appears that both methods t1 and t2 are producing the exact same output and it is baffling me. What could be going wrong and how could I get the complementary tensor to y_true?
Backstory: I was trying to write custom metrics (F1 score) in particular for my model. Keras does not seems to have those readily available. If anyone knows a better way, please help me get pointed to the right direction.
One easy way to handle this issue is to use a callback instead. Following the logic from this issue, you could specify a metrics call back that calculates any metric using sci-kit learn. For example, if you wanted to calculate f1, you could do the following:
from keras.models import Sequential
from keras.layers import Dense, LSTM, Masking, Dropout
from keras.optimizers import SGD, Adam, RMSprop
import keras.backend as K
from keras.callbacks import Callback
import numpy as np
from sklearn.metrics import f1_score
_Xtrain = np.random.rand(1000,21,47)
_ytrain = np.random.randint(2, size=1000)
_Xtest = np.random.rand(200,21,47)
_ytest = np.random.randint(2, size=200)
class MetricsCallback(Callback):
def __init__(self, train_data, validation_data):
super().__init__()
self.validation_data = validation_data
self.train_data = train_data
self.f1_scores = []
self.cutoff = .5
def on_epoch_end(self, epoch, logs={}):
X_val = self.validation_data[0]
y_val = self.validation_data[1]
preds = self.model.predict(X_val)
f1 = f1_score(y_val, (preds > self.cutoff).astype(int))
self.f1_scores.append(f1)
def build_model():
model = Sequential()
model.add(Masking(mask_value=0, input_shape=(21, _Xtrain[0].shape[1])))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(64, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))
rms = RMSprop(lr=.001, decay=.001)
model.compile(loss='binary_crossentropy', optimizer=rms, metrics=['acc'])
return model
model = build_model()
hist = model.fit(_Xtrain, _ytrain, epochs=2, batch_size=5, validation_data=(_Xtest, _ytest), shuffle=True,
callbacks=[MetricsCallback((_Xtrain, _ytrain), (_Xtest, _ytest))])

Categories