I'm trying to solve a problem in my CNN model. My dataset is organized as follows:
train
parasitized
uninfected
test
parasitized
uninfected
validation
parasitized
uninfected
My dataset is too large to fit in memory, so I'm using ImageDataGenerator to preprocess the images and load them in batches (to reduce computational cost). At first I configured the ImageDataGenerator as follows:
from keras.preprocessing.image import ImageDataGenerator
# Define an ImageDataGenerator for each dataset.
# This preprocessing step only rescales each image by 1/255
datagen_train = ImageDataGenerator(rescale=1./255)
datagen_test = ImageDataGenerator(rescale=1./255)
datagen_valid = ImageDataGenerator(rescale=1./255)
#Define a batch_size parameter
batch_size=32
# Here .flow_from_directory is used to read the images from the directories in batches
train_generator = datagen_train.flow_from_directory(
'content/cell_images/train', #Train folder path
target_size=(150,150), #all images will be resized to 150x150
batch_size=batch_size,
class_mode='categorical') # we use categorical_crossentropy loss,
# so we need categorical labels
test_generator = datagen_test.flow_from_directory(
'content/cell_images/test', #Test folder path
target_size=(150,150), #all images will be resized to 150x150
batch_size=batch_size,
class_mode='categorical')
valid_generator = datagen_valid.flow_from_directory(
'content/cell_images/valid',
target_size=(150,150),
batch_size=batch_size,
class_mode='categorical')
To fit the model, fit_generator was used with a checkpointer to save the best weights (note that, with no monitor argument, ModelCheckpoint monitors the validation loss by default):
from keras.callbacks import ModelCheckpoint
# Define epochs number
epochs = 10
# Create a checkpointer to save only the best params
checkpointer = ModelCheckpoint(filepath='cnn_model.weights.best.hdf5',
verbose=1, save_best_only=True)
model.fit_generator(train_generator,
steps_per_epoch=train_generator.samples//batch_size,
epochs=epochs,
callbacks=[checkpointer],
validation_data=valid_generator,
validation_steps=valid_generator.samples//batch_size)
Finally, the best weights were loaded into the model, and the model was evaluated on the test set:
# load the weights that yielded the best validation loss
model.load_weights('cnn_model.weights.best.hdf5')
#evaluate and print test accuracy
score = model.evaluate_generator(test_generator,
test_generator.samples//batch_size)
print('\n', 'Test accuracy:', score[1])
But here is my problem: each time I run only model.evaluate_generator, without training the model again (i.e. keeping the same weights), it returns a different accuracy score.
I've been looking for a solution, reading a lot of posts for insight, and recently I made some progress.
I discovered, based on this post, that if I set shuffle=False and batch_size=1 in test_generator:
test_generator = datagen_test.flow_from_directory(
'content/cell_images/test', #Test folder path
target_size=(150,150), #all images will be resized to 150x150
batch_size=1,
class_mode='categorical',
shuffle=False)
and pass steps = test_generator.samples to evaluate_generator:
score = model.evaluate_generator(test_generator, test_generator.samples)
the values don't change anymore.
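(As an aside, a minimal sketch of an alternative that avoids batch_size=1, assuming the generators and model defined above: keep the original batch size, disable shuffling, and compute the step count with a ceiling division so every image is evaluated exactly once.)
import math
# Deterministic evaluation: fixed ordering, no shuffling
test_generator = datagen_test.flow_from_directory(
    'content/cell_images/test',
    target_size=(150,150),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False)
# Ceiling division so the last, smaller batch is not dropped
steps = math.ceil(test_generator.samples / batch_size)
score = model.evaluate_generator(test_generator, steps)
print('Test accuracy:', score[1])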
I was also investigating the effect of the rescale=1./255, based on this post. For this, I used callbacks with a checkpointer to save the weights only for the best validation score. Afterwards, I loaded the best weights into the model and evaluated it with model.evaluate_generator, as described above. To check the scores' consistency, I also used the validation set: I compared the value reported by the callbacks for the best weights against the value returned by evaluate_generator. Before running evaluate_generator on the validation set, I used the same parameters as for the test set:
valid_generator = datagen_valid.flow_from_directory(
'content/cell_images/valid',
target_size=(150,150),
batch_size=1,
class_mode='categorical',
shuffle=False)
# evaluate and print validation accuracy
score = model.evaluate_generator(valid_generator,
valid_generator.samples)
print('\n', 'Valid accuracy:', score[1])
#evaluate and print test accuracy
score = model.evaluate_generator(test_generator,
test_generator.samples)
print('\n', 'Test accuracy:', score[1])
Curiously, I noticed that:
When I don't use the rescale (1./255):
datagen_train = ImageDataGenerator()
datagen_test = ImageDataGenerator()
datagen_valid = ImageDataGenerator()
the validation score displayed by the callbacks (0.5) is exactly the same as the one obtained from model.evaluate_generator (0.5); the test set also returns an accuracy score of 0.5 (chance level for the two classes).
When I use the rescale (1./255):
datagen_train = ImageDataGenerator(rescale=1./255)
datagen_test = ImageDataGenerator(rescale=1./255)
datagen_valid = ImageDataGenerator(rescale=1./255)
the difference between the validation score displayed by the callbacks (0.9515):
Epoch 7/10
688/688 [==============================] - 67s 97ms/step - loss:
0.2017 - acc: 0.9496 - val_loss: 0.1767 - val_acc: 0.9515
Epoch 00007: val_loss improved from 0.19304 to 0.17671, saving model
to cnn_model.weights.best.hdf5
and the score obtained from model.evaluate_generator (Valid accuracy: 0.9466618287373004) is very small. Using the test set:
Test accuracy: 0.9078374455732946
Based on this small difference between the validation scores, can I infer that evaluate_generator is working correctly? And can I infer that the accuracy score on the test set is also correct? Or is there another approach to this problem?
I'm getting frustrated with this problem.
Sorry for the long post; I'm trying to be as didactic as I can.
Thanks!
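(One sanity check, sketched under the assumption that the generators above use shuffle=False and batch_size=1: compute the accuracy manually from the raw predictions and compare it with the value returned by evaluate_generator.)
import numpy as np
# Manual accuracy as a cross-check against evaluate_generator
preds = model.predict_generator(test_generator, test_generator.samples)
pred_labels = np.argmax(preds, axis=1)
true_labels = test_generator.classes  # directory order, valid when shuffle=False
print('Manual test accuracy:', np.mean(pred_labels == true_labels))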
I trained a model on initial data and got some good scores. Now, after receiving more data, I want to load the pre-trained model and continue training it.
Here is a snippet of what I did:
(1) I followed this post, which says to save the model in 'tf' format:
# saving initial model
model.save(path2initial, save_format='tf')
# load pre-trained model
clf = tf.keras.models.load_model(path2initial)
# create new data generators
train_gen = generators.create(generator_config, 'train')
val_gen = generators.create(generator_config, 'val')
# create metrics, loss, optimizer and callbacks
loss = losses.create(loss_config)
callback_list = callbacks.create(callback_config)
optimizer = optimizers.create(optimizer_config)
metrics = metrics.create(metrics_config)
# compile model
clf.compile(optimizer=optimizer, loss=loss, metrics=metrics)
# train
clf.fit(x=train_gen,
epochs=NB_EPOCHS,
validation_data=val_gen,
steps_per_epoch=math.ceil(len(train_steps) / BATCH_SIZE),
validation_steps=math.ceil(len(val_steps) / BATCH_SIZE),
callbacks=callback_list,
use_multiprocessing=True,
workers=16,
max_queue_size=8,
verbose=1
)
I should note that two of my callbacks are
EarlyStopping(monitor='val_loss', restore_best_weights=True,
min_delta=0.001, patience=10, mode='min', verbose=1)
ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True,
mode='min', save_freq='epoch')
Note that train_gen consists of both initial_data and new_data.
This method trained for only 4 epochs and didn't improve during the remaining 10 'patience' epochs. Moreover, the results were way worse than the initial model's results.
(2) The second method I tried was to save the model in the default format (that's the only change):
model.save(path2initial)
.
.
.
This model trained for 71/200 epochs, but it seems to have ignored my EarlyStopping() callback. In some epochs the val_loss changed by 1e-4 or even less and the training still continued (weird), and then it stopped (via EarlyStopping()) at epoch 71 even though the val_loss had changed! Moreover, the results had barely changed.
For comparison, I trained a model from scratch on all the data (both initial and new data) and got way better results:
               Initial data   Method (1)   Method (2)   New model on all data
mean F1 score:    0.735          0.422        0.74              0.803
Is there a proven way to continue training a keras model?
When loading the model, does the optimizer state reset?
When loading the model, do I need to define all the callbacks, loss, optimizer, and metrics all over again? Do I need to compile it again?
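(Not an authoritative answer, but a minimal sketch of the usual pattern, assuming the model was compiled before saving: model.save stores the optimizer state along with the weights, tf.keras.models.load_model restores it, and calling compile() again resets that state, so you only recompile when you deliberately want a fresh optimizer, loss, or metrics. The names are the ones from the snippets above.)
import tensorflow as tf
# Save the compiled model; the optimizer state is included
model.save(path2initial, save_format='tf')
# Restore it; the model comes back already compiled,
# with the optimizer state it had when it was saved
clf = tf.keras.models.load_model(path2initial)
# Continue training directly -- no compile() call, so the optimizer
# (e.g. Adam moment estimates) picks up where it left off
clf.fit(x=train_gen,
        epochs=NB_EPOCHS,
        validation_data=val_gen,
        callbacks=callback_list)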
After years of reading, it is finally time for my first question:
Using tensorflow and keras in a jupyter notebook, I trained a VGG16 model on 20k sound spectrograms (my own dataset), with a bit of data augmentation via a data generator, to do 4-class multiclass classification. Below is my code:
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16
model = VGG16(include_top=True,
weights=None,
input_tensor=None,
pooling=None,
classes=len(labels),
classifier_activation="softmax")
from keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import optimizers
# Rescale by 1/255, add data augmentation:
train_datagen = ImageDataGenerator(
rescale=1./255,
width_shift_range=0.2,
brightness_range=[0.8,1.2],
fill_mode='nearest')
# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
# This is the target directory
train_dir,
# All images will be resized to 224x224
target_size=(224, 224),
batch_size=20,
# one hot label for multiclass
class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
validation_dir,
target_size=(224, 224),
batch_size=20,
class_mode='categorical')
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.RMSprop(learning_rate=2e-5),
metrics=[tf.keras.metrics.CategoricalAccuracy(),
tf.keras.metrics.Precision(),
tf.keras.metrics.Recall()])
# Train the model:
history = model.fit(
train_generator,
steps_per_epoch=100,
epochs=100,
validation_data=validation_generator,
validation_steps=50,
verbose=2)
To evaluate the training process I plotted acc, loss, precision, recall and f1-score. All of them look good and indicate the training went well.
When I use model.evaluate on my test set, I get an acc of 91%.
test_generator = test_datagen.flow_from_directory(
test_dir,
target_size=(224, 224),
batch_size=20,
class_mode='categorical')
test_loss, test_acc, test_precision, test_recall = model.evaluate(test_generator, steps=50)
print('test_acc:' + str(test_acc))
Found 4724 images belonging to 4 classes.
50/50 [==============================] - 2s 49ms/step - loss: 0.2739 - categorical_accuracy: 0.9120 - precision: 0.9244 - recall: 0.9050
test_acc:0.9120000004768372
But when I try to plot a confusion matrix in the following way, it looks horrible, and when I calculate the acc manually from the data I created the confusion matrix from, I get an acc of 25%. With 4 classes, that would mean my model learned absolutely nothing…
import numpy as np
import sklearn.metrics
# Print confusion matrix for test set:
test_pred_raw = model.predict(test_generator)
print('raw predictions:')
print(test_pred_raw)
test_pred = np.argmax(test_pred_raw, axis=1)
print('prediction:')
print(test_pred)
test_labels = test_generator.classes
print('labels')
print(test_labels)
# Calculate accuracy manually:
my_test_acc = sum(test_pred == test_labels) / len(test_labels)
print('My_acc:')
print(my_test_acc)
# Calculate the confusion matrix using sklearn.metrics
cm = sklearn.metrics.confusion_matrix(test_labels, test_pred)
figure = plot_confusion_matrix(cm, class_names=labels)
raw predictions:
[[2.9204198e-12 2.8631955e-09 1.0000000e+00 7.3386294e-16]
[1.1940503e-11 8.0026985e-11 1.0000000e+00 7.3565399e-16]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 0.0000000e+00]
...
[2.2919695e-03 3.8061540e-07 9.9770677e-01 8.1024604e-07]
[5.7501338e-35 1.0000000e+00 0.0000000e+00 0.0000000e+00]
[0.0000000e+00 1.0000000e+00 4.0776377e-37 2.6318860e-38]]
prediction:
[2 2 1 ... 2 1 1]
labels
[0 0 0 ... 3 3 3]
My_acc:
0.2491532599491956
My question now is, which of the metrics can I trust and what is wrong with the other one?
Okay. I think I got it!
Setting shuffle=False in test_datagen.flow_from_directory() seems to solve the problem. Now the confusion matrix looks way better and my_acc = 89% looks fine too.
It seems that with shuffle=True the generator yields the images in a different, random order each time it is iterated, while test_generator.classes always lists the labels in the original directory order; this basically makes the predictions and labels not match because they refer to different orderings.
Can someone confirm I got this right?
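(For reference, a minimal sketch of the fixed evaluation, assuming the directories, generators and model from the question; the key points are shuffle=False and resetting the generator before predicting.)
import numpy as np
import sklearn.metrics
# Deterministic order: predictions line up with test_generator.classes
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(224, 224),
    batch_size=20,
    class_mode='categorical',
    shuffle=False)
test_generator.reset()  # start from the first batch
test_pred_raw = model.predict(test_generator)
test_pred = np.argmax(test_pred_raw, axis=1)
test_labels = test_generator.classes  # fixed directory order
my_test_acc = np.mean(test_pred == test_labels)
cm = sklearn.metrics.confusion_matrix(test_labels, test_pred)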
The problem might be:
my_test_acc = sum(test_pred == test_labels) / len(test_labels)
Maybe you should add a rounding step to be sure the predicted values are really 1.0 and not 0.99.
I am building a CNN model for image classification using keras. I am unable to save the best model using ModelCheckpoint, as it throws this warning:
WARNING:tensorflow:Can save best model only with val_accuracy
available, skipping.
I have searched stackoverflow for all the related questions, but nothing has worked so far. [1] [2] [3] [4] and more
Here's my code:
model.compile(loss='categorical_crossentropy',metrics=['accuracy', Precision(), Recall()],optimizer='adam')
train_datagen = ImageDataGenerator(rescale = 1/255.,
rotation_range =40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range =0.2,
horizontal_flip=True,
fill_mode ='nearest')
test_datagen = ImageDataGenerator(rescale = 1/255.)
train_generator = train_datagen.flow_from_directory(r"./datasetcrop512/train", target_size=(512,512), batch_size=32, class_mode='categorical')
test_generator = test_datagen.flow_from_directory(r"./datasetcrop512/test", target_size=(512,512), batch_size=32, class_mode='categorical')
After augmentation, the model checkpoint:
from keras.callbacks import ModelCheckpoint
filepath = 'weights_best_model3_6.hdf5'
checkpoint = ModelCheckpoint(filepath,monitor = 'val_accuracy',verbose = 1, save_best_only=True, mode='max')
Fit the model:
history = model.fit(train_generator, steps_per_epoch = stepsPerEpoch,
epochs = 15, validation_data=test_generator, validation_steps = stepsPerEpoch,
callbacks = [ PlotLossesKeras(),checkpoint])
After running, the validation accuracy is calculated for the 1st epoch, but from the 2nd epoch onward it gives the aforementioned warning and doesn't save the best model.
This seems similar to another issue I saw before: the use of validation_steps is only allowed under some conditions, as stated in the docs:
validation_steps : Only relevant if validation_data is provided and is a tf.data dataset. ...
The data you provided is not of the required type for this to work. Validating this way requires the given dataset either to be very large or to use the repeat method so it can supply the number of validation batches you request; I believe that's why validation only worked the first time and then stopped for the rest.
Check also this better explanation of the parameters steps_per_epoch and validation_steps and when to use them.
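(A minimal sketch of a workaround, under the assumption that the validation generator should simply cover the whole test set once per epoch; note that stepsPerEpoch belongs to the training set, so the validation step count is derived from the test generator's own sample count instead.)
import math
# Derive each step count from its own generator, not from the train set
steps_per_epoch = math.ceil(train_generator.samples / train_generator.batch_size)
validation_steps = math.ceil(test_generator.samples / test_generator.batch_size)
history = model.fit(train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=15,
                    validation_data=test_generator,
                    validation_steps=validation_steps,
                    callbacks=[checkpoint])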
This is rather a popular error, but I couldn't find a proper answer given my setup.
I found this tutorial code, but when running, I get this error:
val_acc = history.history['val_acc']
KeyError: 'val_acc'
The fit_generator() function, unlike fit(), doesn't allow a validation split. So how can I fix this?
Here is the code:
def plot_training(history):
    print(history.history.keys())
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(len(acc))
    plt.plot(epochs, acc, 'r.')
    plt.plot(epochs, val_acc, 'r')
    plt.title('Training and validation accuracy')
    # plt.figure()
    # plt.plot(epochs, loss, 'r.')
    # plt.plot(epochs, val_loss, 'r-')
    # plt.title('Training and validation loss')
    plt.savefig('acc_vs_epochs.png')  # save before show, or the saved figure is blank
    plt.show()
#....
finetune_model = build_finetune_model(base_model, dropout=dropout, fc_layers=FC_LAYERS, num_classes=len(class_list))
adam = Adam(lr=0.00001)
finetune_model.compile(adam, loss='categorical_crossentropy', metrics=['accuracy'])
filepath="./checkpoints/" + "ResNet50" + "_model_weights.h5"
checkpoint = ModelCheckpoint(filepath, monitor='acc', verbose=1, mode='max')
callbacks_list = [checkpoint]
history = finetune_model.fit_generator(train_generator, epochs=NUM_EPOCHS, workers=8,
steps_per_epoch=steps_per_epoch,
shuffle=True, callbacks=callbacks_list)
plot_training(history)
Hi, writing my suggestions here because I'm not able to comment yet.
You are right, the function fit_generator() doesn't have a validation_split argument.
Therefore you need to make your own validation dataset and feed it to fit_generator through validation_data=(val_X, val_y), e.g.:
history = finetune_model.fit_generator(train_generator, epochs=NUM_EPOCHS, workers=8, validation_data=(val_X, val_y),
steps_per_epoch=steps_per_epoch,
shuffle=True, callbacks=callbacks_list)
Hope this helps.
EDIT
To get a validation dataset from your data you can use the method train_test_split() from sklearn. For example, a split with 67% train and 33% validation data:
X_train, val_X, y_train, val_y= train_test_split(
X, y, test_size=0.33, random_state=42)
Look here for more information.
Alternatively you could write your own split methode :)
Edit 2
If you can't use train_test_split, and assuming you have a pandas dataframe called train_data with the features and labels together:
val_data = train_data.sample(frac=0.33, random_state=1)
train_data = train_data.drop(val_data.index)  # keep the remaining rows for training
This creates a validation dataset with 33% of the data and a train dataset with the remaining 67%.
Edit 3
It turns out you are using ImageDataGenerator() to create your data. This is quite handy because you can set your validation percentage via validation_split= while you initialize the ImageDataGenerator() as seen in the documentation (here). This should look something like this:
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
validation_split=0.33)
After this you need two "generated" datasets: one for training and one for validation. This should look as follows:
train_generator = train_datagen.flow_from_directory(TRAIN_DIR,
target_size=(HEIGHT, WIDTH),
batch_size=BATCH_SIZE,subset="training")
validation_generator = train_datagen.flow_from_directory(TRAIN_DIR,
target_size=(HEIGHT, WIDTH),
batch_size=BATCH_SIZE,subset="validation")
Finally you can use both sets in your fit_generator as following:
history = finetune_model.fit_generator(train_generator,epochs=NUM_EPOCHS, workers=8,
validation_data=validation_generator, validation_steps=validation_generator.samples // BATCH_SIZE, steps_per_epoch=steps_per_epoch,
shuffle=True, callbacks=callbacks_list)
Let me know if this solves your problem :)
I am trying to train a VGG19 network on the Fine-Grained Visual Classification of Aircraft benchmark dataset. I chose 3 aircraft families. Using the text files that are given in the dataset, I managed to construct a dataframe in order to use the flow_from_dataframe generator.
df = pd.read_csv("./train_family.csv")
df_test = pd.read_csv("./test_family.csv")
datagen=ImageDataGenerator(rescale=1./255.,validation_split=0.25,rotation_range=10, horizontal_flip=True, vertical_flip=True)
train_generator=datagen.flow_from_dataframe(dataframe=df,
directory="./data/images",
x_col="train", y_col="labels",
class_mode="categorical", target_size=(224,224), batch_size=8, seed = 19, shuffle = True, color_mode = "rgb",
subset = "training")
valid_generator=datagen.flow_from_dataframe(dataframe=df,
directory="./data/images",
x_col="train", y_col="labels",
class_mode="categorical", target_size=(224,224), batch_size=8, seed = 19, shuffle = True, color_mode = "rgb",
subset = "validation")
test_generator=datagen.flow_from_dataframe(dataframe=df_test,
directory="./data/images",
x_col="test", y_col="labels",
class_mode="categorical", target_size=(224,224), batch_size=8, seed = 19, shuffle = False, color_mode = "rgb")
While fitting the model, the algorithm always gives the same validation accuracy, which is 1:
loss: 5.5803 - acc: 0.6538 - val_loss: 1.1921e-07 - val_acc: 1.0000
I did some digging; maybe the problem is caused by the amount of data in my train/val/test sets. Below you can see the output of the generator for the train, validation and test sets respectively.
Found 550 images belonging to 3 classes.
Found 183 images belonging to 3 classes.
Found 367 images belonging to 3 classes.
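(Before resizing the splits, it may be worth checking what each subset actually contains; a quick sketch, assuming the generators defined above, that prints the label mapping and the per-class counts. The family names in the comment are placeholders.)
from collections import Counter
# Mapping from class name to label index, e.g. {'family_a': 0, ...}
print(train_generator.class_indices)
# Per-class label counts in each subset
print(Counter(train_generator.classes))
print(Counter(valid_generator.classes))
print(Counter(test_generator.classes))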
Should I increase the train set by decreasing the test set?
Thanks