Different training and validation results with same input Keras - python

I am trying to re-train MobileNet for a different multi-class classification task, as follows:
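from tensorflow.keras.applications.mobilenet import MobileNet, preprocess_input
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
    preprocessing_function = preprocess_input)
training_generator = train_datagen.flow_from_directory(
    directory = train_data_dir,
    target_size = (parameters["img_width"], parameters["img_height"]),
    batch_size = parameters["batch_size"],
    class_mode = "categorical",
    subset = "training",
    color_mode = "rgb",
    seed = 42)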
# Define the Model
base_model = MobileNet(weights='imagenet',
                       include_top=False, input_shape = (128, 128, 3)) # imports the MobileNet model and discards the last 1000-neuron layer.
# Freeze every layer of the pretrained base; only the new head below is trained
for layer in base_model.layers:
    layer.trainable = False
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(800, activation='relu')(x) # we add dense layers so that the model can learn more complex functions and classify better.
x = Dense(600, activation='relu')(x) # dense layer 2
x = Dropout(0.8)(x)
x = Dense(256, activation='relu')(x) # dense layer 3
x = Dropout(0.2)(x)
preds = Dense(N_classes, activation='softmax')(x) # final layer with softmax activation
model = Model(inputs = base_model.input, outputs = preds)
model.compile(optimizer = "Adam", loss='categorical_crossentropy', metrics=['accuracy'])
I then train it, using the training set itself as the validation set:
history = model.fit_generator(
    training_generator,
    steps_per_epoch = training_generator.n // parameters["batch_size"],
    epochs = parameters["epochs"],
    ##### VALIDATION SET = TRAINING
    validation_data = training_generator,
    validation_steps = training_generator.n // parameters["batch_size"],
    callbacks=[
        EarlyStopping(monitor = "acc", patience = 8, restore_best_weights=False),
        ReduceLROnPlateau(patience = 3)]
)
However, during training I get significant differences between TRAINING and VALIDATION accuracy, even though they are computed on the same dataset. What could be causing this?

Training a neural network involves randomly shuffling the data in the training database. Because of this, the results are not reproducible. If you're getting significant differences in accuracy, you may try to:
get a bigger training database;
retrain the network;
get a database with more consistent results.
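If the goal is repeatable runs, the random parts can be pinned with a seed. A minimal sketch, assuming TensorFlow 2.7 or newer (where tf.keras.utils.set_random_seed exists); build_model and the tiny architecture are hypothetical. Seeding fixes weight initialization and shuffling order, but some GPU kernels can still be nondeterministic; recent TensorFlow versions also offer tf.config.experimental.enable_op_determinism() for that case.

```python
import numpy as np
import tensorflow as tf


def build_model(seed: int) -> tf.keras.Model:
    """Hypothetical helper: seed every RNG, then build a tiny classifier."""
    # Seeds Python's `random`, NumPy and TensorFlow with a single call.
    tf.keras.utils.set_random_seed(seed)
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])


def same_weights(a: tf.keras.Model, b: tf.keras.Model) -> bool:
    """True if two models hold identical weight tensors."""
    return all(np.array_equal(x, y) for x, y in zip(a.get_weights(), b.get_weights()))


first = build_model(seed=42)
second = build_model(seed=42)
print("identical initial weights:", same_weights(first, second))
```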
LE (later edit): it doesn't matter if you get significant differences in accuracy during training. Training is an iterative optimization process that minimizes the loss function (categorical cross-entropy in this model), and it takes a while until this goal is achieved.

I do not know the EXACT reason, but I reproduced your problem. It happens because you are using the SAME generator for training and then again for validation. If you create a SEPARATE generator for validation that takes the same training data as input, then once you run enough epochs for the training accuracy to reach the 90% range, you will see the validation accuracy stabilize and converge toward the training accuracy.
[Figure: Train-Valid Acc vs Epochs]
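The separate-generator setup described above might look like the sketch below. It reuses the question's legacy ImageDataGenerator API (deprecated in recent TensorFlow releases, so this assumes a TF 2.x version that still ships it). make_generators is a hypothetical helper, and shuffle=False on the evaluation iterator is an added choice, not something the answer specifies.

```python
from tensorflow.keras.applications.mobilenet import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def make_generators(train_data_dir, img_size=(128, 128), batch_size=32, seed=42):
    """Hypothetical helper: two independent iterators over the same training images."""
    datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

    # Iterator consumed by fit() for the weight updates.
    training_generator = datagen.flow_from_directory(
        directory=train_data_dir,
        target_size=img_size,
        batch_size=batch_size,
        class_mode="categorical",
        color_mode="rgb",
        shuffle=True,
        seed=seed,
    )

    # A second, separate iterator over the SAME directory, used only for validation.
    # shuffle=False keeps the evaluation order fixed from epoch to epoch.
    evaluation_generator = datagen.flow_from_directory(
        directory=train_data_dir,
        target_size=img_size,
        batch_size=batch_size,
        class_mode="categorical",
        color_mode="rgb",
        shuffle=False,
    )
    return training_generator, evaluation_generator
```

The second iterator would then be passed as validation_data=evaluation_generator (with validation_steps=len(evaluation_generator)) instead of validation_data=training_generator.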

Related

Low accuracy after testing hyperparameters

I am using a VGG19 model pre-trained on ImageNet to do transfer learning on 4 classes with Keras. However, I do not know whether there really is a difference between these 4 classes, and I'd like to find out. The goal is to discover whether these classes make sense or whether there is no difference between these image classes.
These classes are made up of abstract paintings from the same individual.
I tried different models with different hyperparameters (Adam/SGD, learning rate, dropout, L2 regularization, FC layer size, batch size, unfreeze), and also weighted classes, as the data is a little imbalanced:
import numpy as np
from sklearn.utils import class_weight
from tensorflow.keras import optimizers, regularizers
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.layers import (Activation, BatchNormalization, Dense,
                                     Dropout, GlobalMaxPooling2D)
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 32
unfreeze = 17
dropout = 0.2
fc = 256
lr = 1e-4
l2_reg = 0.1
train_datagen = ImageDataGenerator(
    preprocessing_function = preprocess_input,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest'
)
test_datagen = ImageDataGenerator(preprocessing_function = preprocess_input)
train_generator = train_datagen.flow_from_directory(
    'C:/Users/train',
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
    'C:/Users/test',
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')
base_model = VGG19(
    weights="imagenet",
    input_shape=(224, 224, 3),
    include_top=False,
)
last_layer = base_model.get_layer('block5_pool')
last_output = last_layer.output
# x = Flatten()(last_output)  # unused: it would be overwritten by the next line
x = GlobalMaxPooling2D()(last_output)
x = Dense(fc)(x)
x = Activation('relu')(x)
x = BatchNormalization()(x)
x = Dropout(dropout)(x)
x = Dense(fc, activation='relu', kernel_regularizer = regularizers.l2(l2=l2_reg))(x)
x = Dense(4, activation='softmax')(x)
model = Model(base_model.input, x)
for layer in model.layers:
    layer.trainable = False
for layer in model.layers[unfreeze:]:
    layer.trainable = True
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(learning_rate = lr),
              metrics=['accuracy'])
class_weights = class_weight.compute_class_weight(class_weight='balanced',
                                                  classes=np.unique(train_generator.classes),
                                                  y=train_generator.classes)
class_weights_dict = dict(enumerate(class_weights))
history = model.fit(train_generator, epochs=epochs, validation_data=validation_generator,
                    validation_steps=392//batch_size,
                    steps_per_epoch=907//batch_size,
                    class_weight=class_weights_dict)
plot_model_history(history)
I also extracted features at every layer and fed them to an SVM (one per layer); the accuracy of these SVMs was about 40%, which is higher than this model's (30 to 33%). So I may be wrong, but I think this model could achieve a higher accuracy.
I have a few questions about my model.
First, is my code correct, or am I doing something wrong?
If the validation accuracy for a 4-class classification task is ~30% (assuming the data are balanced or weighted), is it likely or very unlikely that it can be improved to something significantly better with other hyperparameters?
What else can I try to have a better accuracy ?
When and how can I conclude that these classes do not make sense ?
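One way to make "is ~30% meaningfully better than guessing?" concrete is a one-sided binomial test against chance. A stdlib sketch, assuming a validation set of 392 images (read off validation_steps=392//batch_size) and balanced classes, so chance is 25%. Because the data is slightly imbalanced, the fraction of the largest class is a stricter baseline to plug in for chance.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing the pmf in log space for stability."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


n_val = 392                    # assumption: validation images (validation_steps=392//batch_size)
chance = 0.25                  # assumption: 4 balanced classes, so guessing is right 25% of the time
correct = round(0.30 * n_val)  # ~30% validation accuracy -> 118 correct predictions
p_value = binomial_tail(correct, n_val, chance)
print(f"{correct}/{n_val} correct, one-sided p-value vs. chance: {p_value:.4f}")
```

A small p-value only says the model beats guessing on this particular validation set; after trying many hyperparameter settings against the same set, the best-looking result overstates the evidence.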

ResNet50 Model Always Predicts 1 Class

I am working on a ResNet50 model to predict covid/non-covid presence in chest x-rays. However, my model currently only predicts class label 1... I have tried 3 different optimizers, 2 different loss functions, changing the learning rate multiple times from 1e-6 to 0.5, and changing the weights on the class labels...
Does anyone have any ideas what the issue could be? Why does it always predict class label 1?
Here is the code:
# import data
# train_ds = tf.keras.utils.image_dataset_from_directory(
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
DATASET_PATH+"Covid/",
labels="inferred",
batch_size=64,
image_size=(256, 256),
shuffle=True,
seed=COVID_SEED,
validation_split=0.2,
subset="training",
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
DATASET_PATH+"Covid/",
labels="inferred",
batch_size=64,
image_size=(256, 256),
shuffle=True,
seed=COVID_SEED,
validation_split=0.2,
subset="validation",
)
# split data
train_X = list()
train_y = list()
test_X = list()
test_y = list()
for image_batch_train, labels_batch_train in train_ds:
    for index in range(0, len(image_batch_train)):
        train_X.append(image_batch_train[index])
        train_y.append(labels_batch_train[index])
for image_batch, labels_batch in val_ds:
    for index in range(0, len(image_batch)):
        test_X.append(image_batch[index])
        test_y.append(labels_batch[index])
Conv_Base = ResNet50(weights=None, input_shape=(256, 256, 3), classes=2)
# The Convolutional Base of the Pre-Trained Model will be added as a Layer in this Model
for layer in Conv_Base.layers[:-8]:
    layer.trainable = False
model = Sequential()
model.add(Conv_Base)
model.add(Flatten())
model.add(Dense(units = 1024, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(units = 1, activation = 'sigmoid'))
model.summary()
opt = Adadelta(learning_rate=0.3)
model.compile(optimizer = opt, loss = 'BinaryCrossentropy', metrics = ['accuracy'])
# try to add class weights to make it predict 0, since we currently only predict class label 1
class_weight = {0: 50.,
1: 1.}
r=model.fit(x = train_ds, validation_data = val_ds, epochs = COVID_EPOCHS, class_weight=class_weight)
#print the class labels of prediction
predictions = model.predict(val_ds)
predictions = np.ndarray.flatten(predictions)
predictions = np.where(predictions < 0, 0, 1) # Convert to 0 and 1.
np.set_printoptions(threshold=np.inf)
print(predictions)
Well done! I'll leave an answer here as well because I think you need to do more besides normalization.
When the weights are None, the ResNet weights are randomly initialized. You are using a large convolutional feature extractor (the first layers of a ResNet), but this extractor was not trained on anything. You may achieve decent performance because the Dense layers that follow compensate for this random initialization, but chances are it's not what you're aiming for. Keep in mind that most of your ResNet layers are frozen (all but the last 8), so the feature extraction will stay largely random.
The reason I suggested ImageNet weights is that you're working with images, so it's reasonable to assume that your convolutional feature extractor needs to extract important image features such as colors, shapes and edges. The fact that the ImageNet ResNet was trained on roughly 1000 classes is irrelevant as long as you chop off the classification head (include_top=False), because that output layer is where the class-count bottleneck occurs. I would pursue weights='imagenet'.
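A sketch of the setup this answer recommends, assuming tf.keras: ImageNet weights, include_top=False so the 1000-class head is dropped, and a single sigmoid unit for covid vs. non-covid. build_covid_model and to_labels are hypothetical helpers, and the Adam optimizer is a placeholder choice. The images should also go through tf.keras.applications.resnet50.preprocess_input, for example with train_ds.map(lambda x, y: (preprocess_input(x), y)). Separately, sigmoid outputs lie in (0, 1), so comparing them against 0 (as np.where(predictions < 0, 0, 1) does in the question's code) labels every image as class 1; thresholding at 0.5 is the usual choice.

```python
import numpy as np
import tensorflow as tf


def build_covid_model(weights="imagenet", input_shape=(256, 256, 3)):
    """Hypothetical helper: frozen ImageNet ResNet50 features + a binary head."""
    base = tf.keras.applications.ResNet50(
        weights=weights,
        include_top=False,   # drop the 1000-class ImageNet classifier
        pooling="avg",       # one 2048-dim feature vector per image
        input_shape=input_shape,
    )
    base.trainable = False  # keep the pretrained feature extractor fixed

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # run the BatchNorm layers in inference mode
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def to_labels(probabilities, threshold=0.5):
    """Map sigmoid outputs in (0, 1) to hard 0/1 labels."""
    return (np.asarray(probabilities).ravel() >= threshold).astype(int)
```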

Keras LSTM model overfitting

I am using an LSTM model in Keras. During the fitting stage, I added the validation_data parameter. When I plot my training vs. validation loss, there seem to be major overfitting issues: my validation loss just won't decrease.
My full data is a sequence with shape (50,). The first 20 records are used for training and the remaining ones as test data.
I have tried adding dropout and reducing the model complexity as much as I can and still no luck.
# transform data to be stationary
raw_values = series.values
diff_values = difference_series(raw_values, 1)
# transform data to be supervised learning
# using a sliding window
supervised = timeseries_to_supervised(diff_values, 1)
supervised_values = supervised.values
# split data into train and test-sets
train, test = supervised_values[:20], supervised_values[20:]
# transform the scale of the data
# scale function uses MinMaxScaler(feature_range=(-1,1)) and fit via training set and is applied to both train and test.
scaler, train_scaled, test_scaled = scale(train, test)
batch_size = 1
nb_epoch = 1000
neurons = 1
X, y = train_scaled[:, 0:-1], train_scaled[:, -1]
X = X.reshape(X.shape[0], 1, X.shape[1])
testX, testY = test_scaled[:, 0:-1].reshape(-1,1,1), test_scaled[:, -1]
model = Sequential()
model.add(LSTM(units=neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]),
stateful=True))
model.add(Dropout(0.1))
model.add(Dense(1, activation="linear"))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(X, y, epochs=nb_epoch, batch_size=batch_size, verbose=0, shuffle=False,
validation_data=(testX, testY))
This is what it looks like when changing the number of neurons. I even tried using Keras Tuner (Hyperband) to find the optimal parameters.
def fit_model(hp):
    batch_size = 1
    model = Sequential()
    model.add(LSTM(units=hp.Int("units", min_value=1,
                                max_value=20, step=1),
                   batch_input_shape=(batch_size, X.shape[1], X.shape[2]),
                   stateful=True))
    model.add(Dense(units=hp.Int("units", min_value=1, max_value=10),
                    activation="linear"))
    model.compile(loss='mse', metrics=["mse"],
                  optimizer=keras.optimizers.Adam(
                      hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])))
    return model
X, y = train_scaled[:, 0:-1], train_scaled[:, -1]
X = X.reshape(X.shape[0], 1, X.shape[1])
tuner = kt.Hyperband(
fit_model,
objective='mse',
max_epochs=100,
hyperband_iterations=2,
overwrite=True)
tuner.search(X, y, epochs=100, validation_split=0.2)
When evaluating the model against X_test and y_test, I get the same loss and accuracy score. But when fitting the "best model", I get this:
However, my predictions look very reasonable against my true values. What should I do to get a better fit?
20 records of training data is too few. There won't be enough variation in the training data for the model to approximate a function accurately, so your validation data, which is likely much smaller than 20 records, will probably contain an example wildly different from the 20 training examples (i.e. one unlike anything the model saw during training), resulting in a much higher loss.
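To make the sample counts concrete, here is a plain-Python sketch of the question's pipeline applied to a 50-value series. The helpers difference and to_supervised are hypothetical stand-ins (the question does not show difference_series or timeseries_to_supervised), and the series values are placeholders because only the length matters.

```python
def difference(values, lag=1):
    """Hypothetical stand-in for difference_series: value[t] - value[t - lag]."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


def to_supervised(values, lag=1):
    """Hypothetical stand-in for timeseries_to_supervised: (input, target) pairs.

    Rows without a full lag window are dropped here; a helper that pads them
    instead would keep one more row.
    """
    return [(values[i - lag], values[i]) for i in range(lag, len(values))]


series = [float(v) for v in range(50)]  # placeholder data: only the length matters
supervised = to_supervised(difference(series), lag=1)
train, test = supervised[:20], supervised[20:]

# tuner.search(..., validation_split=0.2) holds out the last 20% of the training rows.
n_val = round(len(train) * 0.2)
n_fit = len(train) - n_val
print(len(supervised), len(train), len(test), n_fit, n_val)
```

During the tuner search this leaves only about 16 rows for fitting and 4 for validation, so a single unusual point can dominate the validation loss.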

Fine-tuning ResNet50 with Keras - val_loss keeps increasing

I am trying to customize ResNet50 using Keras with a TensorFlow backend. However, during training my val_loss keeps increasing. Trying different learning rates and batch sizes does not resolve the problem.
Using different preprocessing methods, such as rescaling or using the preprocess_input function for ResNet50 inside the ImageDataGenerator, did not solve the problem either.
This is the code I am using
Importing and preprocessing data:
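import os
from os import listdir
import keras
from keras.layers import Dense
from keras.models import Model
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import preprocess_input, decode_predictions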
IMAGE_SIZE = 224
BATCH_SIZE = 32
num_classes = 27
main_path = "C:/Users/aaron/Desktop/DATEN/data"
gesamt_path = os.path.join(main_path, "ML_DATA")
labels = listdir(gesamt_path)
data_generator = ImageDataGenerator(#rescale=1./255,
validation_split=0.20,
preprocessing_function=preprocess_input)
train_generator = data_generator.flow_from_directory(gesamt_path, target_size=(IMAGE_SIZE, IMAGE_SIZE), shuffle=True, seed=13,
class_mode='categorical', batch_size=BATCH_SIZE, subset="training")
validation_generator = data_generator.flow_from_directory(gesamt_path, target_size=(IMAGE_SIZE, IMAGE_SIZE), shuffle=False, seed=13,
class_mode='categorical', batch_size=BATCH_SIZE, subset="validation")
Defining and training the model
img_width = 224
img_height = 224
model = keras.applications.resnet50.ResNet50()
classes = list(iter(train_generator.class_indices))
model.layers.pop()
for layer in model.layers:
    layer.trainable=False
last = model.layers[-1].output
x = Dense(len(classes), activation="softmax")(last)
finetuned_model = Model(model.input, x)
finetuned_model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
for c in train_generator.class_indices:
    classes[train_generator.class_indices[c]] = c
finetuned_model.classes = classes
earlystopCallback = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=8, verbose=1, mode='auto')
tbCallBack = keras.callbacks.TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=True)
history = finetuned_model.fit_generator(train_generator,
validation_data=validation_generator,
epochs=85, verbose=1,callbacks=[tbCallBack,earlystopCallback])
You need to match the preprocessing used for the pretrained network, not come up with your own. Double-check the network's input tensor, i.e. whether the channel-wise average of your input matches that of the data used to train the pretrained network.
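A minimal sketch of that check, assuming the tf.keras version of ResNet50's preprocess_input (caffe-style: it converts RGB to BGR and subtracts the ImageNet channel means, without scaling to [0, 1]); channel_means_after_preprocessing is a hypothetical helper name. On a batch of your own images, means far from zero point to a preprocessing mismatch; for instance, combining rescale=1./255 with this preprocess_input would push every channel mean strongly negative.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import preprocess_input


def channel_means_after_preprocessing(images):
    """Hypothetical helper: per-channel mean of a batch after ResNet50 preprocessing.

    `images` is an (N, H, W, 3) RGB array with values in [0, 255].
    """
    # preprocess_input can modify float arrays in place, so work on a copy.
    batch = np.array(images, dtype="float32", copy=True)
    processed = preprocess_input(batch)  # RGB -> BGR, then subtract the ImageNet channel means
    return processed.mean(axis=(0, 1, 2))  # one value per (B, G, R) channel


# A flat image whose colour equals the ImageNet mean should map to ~0 in every channel.
imagenet_mean_rgb = np.array([123.68, 116.779, 103.939], dtype="float32")
flat = np.broadcast_to(imagenet_mean_rgb, (1, 4, 4, 3))
print(channel_means_after_preprocessing(flat))
```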
It could be that your new data is very different from the data used for the pretrained network. In that case, all BN layers will shift their pretrained mean/variance to new values, so an increasing loss is also possible (but eventually the loss should decrease).
In your training you are using a pretrained model (ResNet50) and changing only the last layer, because you want to predict only a few classes instead of the 1000 classes the pretrained model was trained on (that's the meaning of transfer learning).
You are freezing all the weights and not letting your model train. Try:
model = keras.applications.resnet50.ResNet50(include_top=False, pooling='avg')
for layer in model.layers:
    layer.trainable=False
last = model.output
x = Dense(512, activation='relu')(last)
x = Dropout(0.5)(x)
#x = BatchNormalization()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
#x = BatchNormalization()(x)
x = Dense(len(classes), activation="softmax")(x)
You can modify the code above: change the number of neurons (512), add or remove dropout/batch normalization, use as many dense layers as you want, and so on.
There is a known "problem" (a design quirk) regarding BN in Keras, and your bad result may be related to this issue.
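Assuming the "problem" refers to BatchNormalization behaving differently in training and inference (as commonly reported, older Keras versions kept using per-batch statistics in frozen BN layers during training), here is a minimal tf.keras sketch of how far the two modes can disagree on the same input. The data is synthetic and chosen only to differ from the layer's initial statistics.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
# A batch whose features have mean ~5 and std ~2, unlike the layer's initial statistics.
batch = rng.normal(loc=5.0, scale=2.0, size=(128, 4)).astype("float32")

bn = tf.keras.layers.BatchNormalization()
# Training mode: normalizes with this batch's own mean/variance (and nudges the moving stats).
train_mode = np.asarray(bn(batch, training=True))
# Inference mode: normalizes with the moving statistics, still close to their initial 0 and 1.
infer_mode = np.asarray(bn(batch, training=False))

print("mean in training mode: ", train_mode.mean())
print("mean in inference mode:", infer_mode.mean())
```

In TF2 Keras, setting trainable=False on a BatchNormalization layer makes it run in inference mode, and the usual transfer-learning pattern is to call the frozen base as base_model(inputs, training=False) so its BN layers keep the ImageNet statistics.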

How to use InceptionV3 bottlenecks as input in Keras 2.0

I want to use bottlenecks for transfer learning using InceptionV3 in Keras.
I've used some of the tips on creating, loading and using bottlenecks from
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
My problem is that I don't know how to use a bottleneck (numpy array) as input to an InceptionV3 with a new top layer.
I get the following error:
ValueError: Error when checking input: expected input_3 to have shape
(None, None, None, 3) but got array with shape (248, 8, 8, 2048)
248 refers to the total number of images in this case.
I know that this line is wrong, but I don't know how to correct it:
model = Model(inputs=base_model.input, outputs=predictions)
What is the correct way to input the bottleneck into InceptionV3?
Creating the InceptionV3 bottlenecks:
def create_bottlenecks():
    datagen = ImageDataGenerator(rescale=1. / 255)
    model = InceptionV3(include_top=False, weights='imagenet')
    # Generate bottlenecks for all training images
    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
    nb_train_samples = len(generator.filenames)
    bottlenecks_train = model.predict_generator(generator, int(math.ceil(nb_train_samples / float(batch_size))), verbose=1)
    # np.save writes binary data, so the file must be opened in 'wb' mode
    with open(train_bottlenecks_file, 'wb') as f:
        np.save(f, bottlenecks_train)
    # Generate bottlenecks for all validation images
    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
    nb_validation_samples = len(generator.filenames)
    bottlenecks_validation = model.predict_generator(generator, int(math.ceil(nb_validation_samples / float(batch_size))), verbose=1)
    with open(validation_bottlenecks_file, 'wb') as f:
        np.save(f, bottlenecks_validation)
Loading the bottlenecks:
def load_bottlenecks(src_dir, bottleneck_file):
    datagen = ImageDataGenerator(rescale=1. / 255)
    generator = datagen.flow_from_directory(
        src_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='categorical',
        shuffle=False)
    num_classes = len(generator.class_indices)
    # load the bottleneck features saved earlier
    bottleneck_data = np.load(bottleneck_file)
    # get the class labels for the training data, in the original order
    bottleneck_class_labels = generator.classes
    # convert the training labels to categorical vectors
    bottleneck_class_labels = to_categorical(bottleneck_class_labels, num_classes=num_classes)
    return bottleneck_data, bottleneck_class_labels
Starting training:
def start_training():
    global nb_train_samples, nb_validation_samples
    create_bottlenecks()
    train_data, train_labels = load_bottlenecks(train_data_dir, train_bottlenecks_file)
    validation_data, validation_labels = load_bottlenecks(validation_data_dir, validation_bottlenecks_file)
    nb_train_samples = len(train_data)
    nb_validation_samples = len(validation_data)
    base_model = InceptionV3(weights='imagenet', include_top=False)
    # add a global spatial average pooling layer
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    # let's add a fully-connected layer
    x = Dense(1024, activation='relu')(x)
    # and a logistic layer -- let's say we have 2 classes
    predictions = Dense(2, activation='softmax')(x)
    # What is the correct input? Obviously not base_model.input.
    model = Model(inputs=base_model.input, outputs=predictions)
    # first: train only the top layers (which were randomly initialized)
    # i.e. freeze all convolutional InceptionV3 layers
    for layer in base_model.layers:
        layer.trainable = False
    model.compile(optimizer=optimizers.SGD(lr=0.01, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])
    # train the model on the new data for a few epochs
    history = model.fit(train_data, train_labels,
                        epochs=epochs,
                        batch_size=batch_size,
                        validation_data=(validation_data, validation_labels),
                        )
Any help would be appreciated!
This error happens when you try to train your model with input data in a different shape from the shape your model supports.
Your model supports (None, None, None, 3), meaning:
Any number of images
Any height
Any width
3 channels
So, you must make sure that train_data (and validation_data) matches this shape.
The error tells you that train_data.shape = (248, 8, 8, 2048).
I see that train_data comes from load_bottlenecks. Is it really supposed to come from there? What is train_data supposed to be? An image? Something else? What is a bottleneck?
Your model starts in the Inception model, and the Inception model takes images.
But if the bottlenecks are already outputs of the Inception model, and you want to feed only bottlenecks, then the Inception model should not participate at all.
Start from:
inputTensor = Input((8,8,2048)) #Use (None,None,2048) if bottlenecks vary in size
x = GlobalAveragePooling2D()(inputTensor)
.....
Create the model with:
model = Model(inputTensor, predictions)
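Filling in the elided part with the question's own head (global average pooling, a 1024-unit ReLU layer, a 2-way softmax) gives a sketch like the one below, assuming tf.keras. build_top_model is a hypothetical name, and the SGD settings are copied from the question with lr spelled as learning_rate.

```python
import numpy as np
import tensorflow as tf


def build_top_model(bottleneck_shape=(8, 8, 2048), num_classes=2):
    """Hypothetical helper: a classifier whose input is an InceptionV3 feature map."""
    inputs = tf.keras.Input(shape=bottleneck_shape)  # bottlenecks, not images
    x = tf.keras.layers.GlobalAveragePooling2D()(inputs)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

With this model, model.fit(train_data, train_labels, ...) accepts the (248, 8, 8, 2048) bottleneck array directly, because the Inception layers are no longer part of the graph.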
The idea is:
Inception model: Image -> Inception -> Bottlenecks
Your model: Bottlenecks -> Model -> Labels
The combination of the two models is only necessary when you don't have the bottlenecks preloaded, but you have your own images for which you want to predict the bottlenecks first. (Of course you can work with separate models as well)
Then you're going to input only images (the bottlenecks will be created by Inception and passed to your model, everything internally):
Combined model: Image -> Inception ->(bottlenecks)-> Model -> Labels
For that:
inputImage = Input((None,None,3))
bottleNecks = base_model(inputImage)
predictions = model(bottleNecks)
fullModel = Model(inputImage, predictions)
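On the question's loading code: load_bottlenecks pairs the saved features with generator.classes, which only lines up because both generators use shuffle=False over the same directory. A hedged alternative is to store features and integer labels in a single file. The helper names below are hypothetical, and np.eye indexing stands in for to_categorical.

```python
import numpy as np


def save_bottlenecks(path, features, labels):
    """Hypothetical helper: store bottleneck features and integer labels in one .npz file."""
    np.savez(path, features=features, labels=np.asarray(labels))


def load_bottlenecks_npz(path, num_classes):
    """Hypothetical helper: load features plus one-hot labels in the saved order."""
    with np.load(path) as data:
        features = data["features"]
        labels = data["labels"]
    one_hot = np.eye(num_classes, dtype="float32")[labels]  # same result as to_categorical
    return features, one_hot
```

In create_bottlenecks, the labels would come from the same generator that produced the features, e.g. save_bottlenecks(path, bottlenecks_train, generator.classes), which is still only valid because that generator uses shuffle=False. Note that np.savez appends .npz to the path if it is missing.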
