I am trying to train a VGG19 network on the Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft) benchmark dataset. I chose 3 aircraft families. Using the text files provided with the dataset, I managed to construct a dataframe in order to use the flow_from_dataframe generator.
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator

df = pd.read_csv("./train_family.csv")
df_test = pd.read_csv("./test_family.csv")

datagen = ImageDataGenerator(rescale=1./255., validation_split=0.25,
                             rotation_range=10, horizontal_flip=True,
                             vertical_flip=True)

train_generator = datagen.flow_from_dataframe(dataframe=df,
    directory="./data/images",
    x_col="train", y_col="labels",
    class_mode="categorical", target_size=(224,224), batch_size=8,
    seed=19, shuffle=True, color_mode="rgb",
    subset="training")

valid_generator = datagen.flow_from_dataframe(dataframe=df,
    directory="./data/images",
    x_col="train", y_col="labels",
    class_mode="categorical", target_size=(224,224), batch_size=8,
    seed=19, shuffle=True, color_mode="rgb",
    subset="validation")

test_generator = datagen.flow_from_dataframe(dataframe=df_test,
    directory="./data/images",
    x_col="test", y_col="labels",
    class_mode="categorical", target_size=(224,224), batch_size=8,
    seed=19, shuffle=False, color_mode="rgb")
While fitting the model, the algorithm always gives the same validation accuracy, which is 1:
loss: 5.5803 - acc: 0.6538 - val_loss: 1.1921e-07 - val_acc: 1.0000
I did some digging; maybe the problem is caused by the amount of data in my train/validation/test sets. Below you can see the generator output for the train, validation and test sets, respectively.
Found 550 images belonging to 3 classes.
Found 183 images belonging to 3 classes.
Found 367 images belonging to 3 classes.
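One way to dig further is to count how many images of each class end up in each subset; a minimal sketch, assuming the generators defined above:

import collections

for name, gen in [("train", train_generator),
                  ("validation", valid_generator),
                  ("test", test_generator)]:
    # gen.classes lists the integer label of every sample in that subset
    counts = collections.Counter(gen.classes)
    # map integer labels back to class names via gen.class_indices
    print(name, {cls: counts[idx] for cls, idx in gen.class_indices.items()})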
Should I increase the train set by decreasing the test set?
Thanks
Related
After years of reading, it is finally time for my first question:
Using TensorFlow and Keras in a Jupyter notebook, I trained a VGG16 model on 20k sound spectrograms (my own dataset), with a bit of data augmentation using a data generator, to do a 4-class multiclass classification. Below is my code:
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16

model = VGG16(include_top=True,
              weights=None,
              input_tensor=None,
              pooling=None,
              classes=len(labels),
              classifier_activation="softmax")

from keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import optimizers

# Rescale by 1/255, add data augmentation:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=0.2,
    brightness_range=[0.8, 1.2],
    fill_mode='nearest')

# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    # This is the target directory
    train_dir,
    # All images will be resized to 224x224
    target_size=(224, 224),
    batch_size=20,
    # one hot label for multiclass
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(224, 224),
    batch_size=20,
    class_mode='categorical')

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(learning_rate=2e-5),
              metrics=[tf.keras.metrics.CategoricalAccuracy(),
                       tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])

# Train the model:
history = model.fit(
    train_generator,
    steps_per_epoch=100,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50,
    verbose=2)
To evaluate the training process I plotted accuracy, loss, precision, recall and F1 score. All of them look good and indicate that training went well.
When I use model.evaluate on my test set, I get an accuracy of 91%.
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(224, 224),
    batch_size=20,
    class_mode='categorical')

test_loss, test_acc, test_precision, test_recall = model.evaluate(test_generator, steps=50)
print('test_acc:' + str(test_acc))
Found 4724 images belonging to 4 classes.
50/50 [==============================] - 2s 49ms/step - loss: 0.2739 - categorical_accuracy: 0.9120 - precision: 0.9244 - recall: 0.9050
test_acc:0.9120000004768372
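As an aside, steps=50 with batch_size=20 only covers 1000 of the 4724 test images; a minimal sketch of evaluating on the whole test set, assuming the generator above:

import math

# Evaluate over enough steps to see every test image exactly once.
steps_full = math.ceil(test_generator.samples / test_generator.batch_size)
test_loss, test_acc, test_precision, test_recall = model.evaluate(test_generator, steps=steps_full)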
But when I try to plot a confusion matrix in the following way, it looks horrible, and when I calculate the accuracy manually from the data I created the confusion matrix from, I get an accuracy of 25%. With 4 classes that would mean my model learned absolutely nothing…
import numpy as np
import sklearn.metrics

# Print confusion matrix for test set:
test_pred_raw = model.predict(test_generator)
print('raw predictions:')
print(test_pred_raw)

test_pred = np.argmax(test_pred_raw, axis=1)
print('prediction:')
print(test_pred)

test_labels = test_generator.classes
print('labels')
print(test_labels)

# Calculate accuracy manually:
my_test_acc = sum(test_pred == test_labels) / len(test_labels)
print('My_acc:')
print(my_test_acc)

# Calculate the confusion matrix using sklearn.metrics
cm = sklearn.metrics.confusion_matrix(test_labels, test_pred)
figure = plot_confusion_matrix(cm, class_names=labels)
raw predictions:
[[2.9204198e-12 2.8631955e-09 1.0000000e+00 7.3386294e-16]
[1.1940503e-11 8.0026985e-11 1.0000000e+00 7.3565399e-16]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 0.0000000e+00]
...
[2.2919695e-03 3.8061540e-07 9.9770677e-01 8.1024604e-07]
[5.7501338e-35 1.0000000e+00 0.0000000e+00 0.0000000e+00]
[0.0000000e+00 1.0000000e+00 4.0776377e-37 2.6318860e-38]]
prediction:
[2 2 1 ... 2 1 1]
labels
[0 0 0 ... 3 3 3]
My_acc:
0.2491532599491956
My question now is, which of the metrics can I trust and what is wrong with the other one?
Okay, I think I got it!
Setting shuffle = False in test_datagen.flow_from_directory() seems to solve the problem. Now the confusion matrix looks way better and my_acc = 89% looks fine too.
It seems that with shuffling enabled the generator yields the images in a different order each time it is iterated, so the predictions from model.predict(test_generator) no longer line up with the labels in test_generator.classes, basically making the labels and predictions not match.
Can someone confirm I got this right?
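For reference, a minimal sketch of the fixed evaluation, assuming the same model and directories as above; the only change is shuffle=False, so the order of the predictions matches test_generator.classes:

# Sketch only: disable shuffling so prediction order matches test_generator.classes
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(224, 224),
    batch_size=20,
    class_mode='categorical',
    shuffle=False)

test_pred = np.argmax(model.predict(test_generator), axis=1)
my_test_acc = np.mean(test_pred == test_generator.classes)
print('My_acc:', my_test_acc)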
The problem might be:
my_test_acc = sum(test_pred == test_labels) / len(test_labels)
Maybe you should add a rounding up step to be sure the predicted values are really 1.0 and not 0.99.
I am building a CNN model for image classification using Keras. I am unable to save the best model using ModelCheckpoint as it throws this warning:
WARNING:tensorflow:Can save best model only with val_accuracy
available, skipping.
I have researched all the related questions on Stack Overflow, but nothing has worked so far. [1] [2] [3] [4] and more.
Here's my code:
from tensorflow.keras.metrics import Precision, Recall
from keras.preprocessing.image import ImageDataGenerator

model.compile(loss='categorical_crossentropy',
              metrics=['accuracy', Precision(), Recall()],
              optimizer='adam')

train_datagen = ImageDataGenerator(rescale=1/255.,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')
test_datagen = ImageDataGenerator(rescale=1/255.)

train_generator = train_datagen.flow_from_directory(r"./datasetcrop512/train", target_size=(512,512), batch_size=32, class_mode='categorical')
test_generator = test_datagen.flow_from_directory(r"./datasetcrop512/test", target_size=(512,512), batch_size=32, class_mode='categorical')
After augmentation, the model checkpoint:
from keras.callbacks import ModelCheckpoint
filepath = 'weights_best_model3_6.hdf5'
checkpoint = ModelCheckpoint(filepath,monitor = 'val_accuracy',verbose = 1, save_best_only=True, mode='max')
Fit the model:
history = model.fit(train_generator, steps_per_epoch=stepsPerEpoch,
                    epochs=15, validation_data=test_generator, validation_steps=stepsPerEpoch,
                    callbacks=[PlotLossesKeras(), checkpoint])
After running, validation accuracy is calculated for the 1st epoch, but from the 2nd epoch onwards it starts giving the aforementioned warning and doesn't save the best model.
This seems similar to another issue I saw before: the use of validation_steps is only allowed under some conditions, as stated in the docs:
validation_steps: Only relevant if validation_data is provided and is a tf.data dataset. ...
The data you provided is not of the required type to make this work. Validating this way requires the given dataset either to be very large or to use the repeat method so it can accommodate the number of validation batches you want; I believe that's why validation only worked the first time and then stopped for the rest.
Also check this better explanation of the parameters steps_per_epoch and validation_steps and when to use them.
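Under that reading, a minimal sketch of one possible fix (assuming the generators and callbacks defined in the question) is to size validation_steps from the validation generator itself instead of reusing stepsPerEpoch, so the validation data is not exhausted after the first epoch:

# Sketch only: derive validation_steps from the validation generator's own size
validation_steps = test_generator.samples // test_generator.batch_size

history = model.fit(train_generator, steps_per_epoch=stepsPerEpoch,
                    epochs=15, validation_data=test_generator,
                    validation_steps=validation_steps,
                    callbacks=[PlotLossesKeras(), checkpoint])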
I trained my first categorical image classifier, but since I'm new to deep learning, I can't tell what is happening with my validation set. I'm doing 5-fold CV, and in each fold I split the data set into train/validation/test.
After the split, I apply data augmentation on x_train:
datagen = ImageDataGenerator(
rotation_range=90,
horizontal_flip=True)
Then I fit my model with the generator:
history = model.fit_generator(datagen.flow(x_train, y_train, batch_size=25),
                              validation_data=(x_validation, y_validation),
                              steps_per_epoch=math.ceil(len(x_train) / batch),
                              epochs=epochs)
Params: len(x_train) = 31 and batch = 1, because I want SGD with a batch size of 1.
The data set has 76 images split into test = 15, train = 31 and validation = 30.
After running this for 5 folds it gives me this graphic:
What is happening?
Thanks in advance!
I have trained VGG as a 10-class classifier for 100 epochs, and this is the train/validation accuracy.
Also, I wanted to test the model on a hold-out test set, so I evaluated it like so:
test_datagen = ImageDataGenerator(
    rescale=1./255,
)

test_generator = test_datagen.flow_from_directory(
    '/content/drive/My Drive/Colab Notebooks/domat/solo-dataset/test/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    shuffle=False
)

steps = 3616 // 32

loss, accuracy = model_vgg_imagenet_dropout.evaluate_generator(test_generator,
                                                               steps=steps,
                                                               workers=4,
                                                               use_multiprocessing=True)
When I print the results I get (1.4021655139801776, 0.802820796460177), which is similar to what I expected.
However, when I try to manually evaluate it through the model.predict_generator, I only get 13% accuracy.
Following is the code for manually evaluating it (the generator is the same object):
import numpy as np

predictions = model_vgg_imagenet_dropout.predict_generator(test_generator,
                                                           steps=steps,
                                                           workers=4,
                                                           use_multiprocessing=True)

y_pred = np.zeros(len(predictions))
for i, p in enumerate(predictions):
    max_index = np.argmax(p)
    y_pred[i] = max_index

# the y_pred array should contain the class index of each sample, as defined by test_generator.class_indices
y_true = test_generator.classes

from sklearn.metrics import accuracy_score
print(accuracy_score(y_true, y_pred))
I don't understand where I'm making a mistake; it seems correct to me.
Edit: when I manually look at the results from model.predict_generator() and map the softmax values to class indices, it literally outputs only 3 or 4 of the classes most of the time.
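One thing worth checking (a sketch under assumptions, not a confirmed diagnosis): the same test_generator object was already advanced by evaluate_generator, and workers/use_multiprocessing can affect batch ordering with a plain directory iterator, so resetting the generator and predicting in a single process keeps the prediction order aligned with test_generator.classes:

# Sketch: reset the iterator so prediction starts from the first image again,
# and predict in a single process so the batch order stays deterministic.
test_generator.reset()
steps_full = int(np.ceil(test_generator.samples / test_generator.batch_size))
predictions = model_vgg_imagenet_dropout.predict_generator(test_generator, steps=steps_full)

y_pred = np.argmax(predictions, axis=1)
y_true = test_generator.classes
print(accuracy_score(y_true, y_pred))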
I'm trying to solve a problem in my CNN model. I'm using a dataset that is organized as follows:
train
    parasitized
    uninfected
test
    parasitized
    uninfected
validation
    parasitized
    uninfected
My dataset is quite large, and I'm using ImageDataGenerator to preprocess the images and also load them in batches (to reduce computational cost). At first I configured the ImageDataGenerator as follows:
from keras.preprocessing.image import ImageDataGenerator

# Define an ImageDataGenerator for each dataset.
# This preprocessing step only rescales each image by 1/255
datagen_train = ImageDataGenerator(rescale=1./255)
datagen_test = ImageDataGenerator(rescale=1./255)
datagen_valid = ImageDataGenerator(rescale=1./255)

# Define a batch_size parameter
batch_size = 32

# Here .flow_from_directory is used to load the images
train_generator = datagen_train.flow_from_directory(
    'content/cell_images/train',    # Train folder path
    target_size=(150, 150),         # all images will be resized to 150x150
    batch_size=batch_size,
    class_mode='categorical')       # we use categorical_crossentropy loss,
                                    # so we need categorical labels

test_generator = datagen_test.flow_from_directory(
    'content/cell_images/test',     # Test folder path
    target_size=(150, 150),         # all images will be resized to 150x150
    batch_size=batch_size,
    class_mode='categorical')

valid_generator = datagen_valid.flow_from_directory(
    'content/cell_images/valid',
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical')
To fit the model, fit_generator was used with a checkpointer to save the best weights based on the validation accuracy:
from keras.callbacks import ModelCheckpoint

# Define epochs number
epochs = 10

# Create a checkpointer to save only the best params
checkpointer = ModelCheckpoint(filepath='cnn_model.weights.best.hdf5',
                               verbose=1, save_best_only=True)

model.fit_generator(train_generator,
                    steps_per_epoch=train_generator.samples//batch_size,
                    epochs=epochs,
                    callbacks=[checkpointer],
                    validation_data=valid_generator,
                    validation_steps=valid_generator.samples//batch_size)
And finally, the best weights were loaded into the model, and the model was evaluated on the test set:
# load the weights that yielded the best validation accuracy
model.load_weights('cnn_model.weights.best.hdf5')
#evaluate and print test accuracy
score = model.evaluate_generator(test_generator,
test_generator.samples//batch_size)
print('\n', 'Test accuracy:', score[1])
But here is my problem: each time I run only model.evaluate_generator, without training the model again (i.e. keeping the same weights), it returns different accuracy scores.
I've been looking for a solution, reading a lot of posts to get some insights, and recently I made some progress.
I discovered, based on this post, that if I set shuffle=False and batch_size=1 in test_generator:
test_generator = datagen_test.flow_from_directory(
    'content/cell_images/test',     # Test folder path
    target_size=(150, 150),         # all images will be resized to 150x150
    batch_size=1,
    class_mode='categorical',
    shuffle=False)
and steps = test_generator.samples in evaluate_generator:
score = model.evaluate_generator(test_generator, test_generator.samples)
the values don't change any more.
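A compact way to express the same idea for an arbitrary batch size (a minimal sketch, assuming the generators above): with shuffling disabled, evaluate over exactly enough steps to cover every test image once, so every run sees the same samples.

import math

# Sketch: cover every test image exactly once per evaluation run
steps = math.ceil(test_generator.samples / test_generator.batch_size)
score = model.evaluate_generator(test_generator, steps)
print('\n', 'Test accuracy:', score[1])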
I was also investigating the effect of the 1./255 rescale, based on this post. For this, I used callbacks with a checkpointer to save the weights only for the best validation score. Afterwards, I loaded the best weights into the model and evaluated it using model.evaluate_generator, as mentioned above. To check the score consistency, I also used the validation set to check whether the value returned by the callbacks for the best weights is the same as the one returned by evaluate_generator. Before running evaluate_generator on the validation set, I used the same params as for the test set:
valid_generator = datagen_valid.flow_from_directory(
    'content/cell_images/valid',
    target_size=(150, 150),
    batch_size=1,
    class_mode='categorical',
    shuffle=False)
# evaluate and print validation accuracy
score = model.evaluate_generator(valid_generator,
                                 valid_generator.samples)
print('\n', 'Valid accuracy:', score[1])

# evaluate and print test accuracy
score = model.evaluate_generator(test_generator,
                                 test_generator.samples)
print('\n', 'Test accuracy:', score[1])
Curiously, I noticed that:
When I don't use the rescale (1./255):
datagen_train = ImageDataGenerator()
datagen_test = ImageDataGenerator()
datagen_valid = ImageDataGenerator()
the validation score displayed by the callbacks (0.5) is exactly the same as the one obtained from model.evaluate_generator (0.5); the test set also returns an accuracy score of 0.5.
When I use the rescale (1./255):
datagen_train = ImageDataGenerator(rescale=1./255)
datagen_test = ImageDataGenerator(rescale=1./255)
datagen_valid = ImageDataGenerator(rescale=1./255)
the difference between the validation score displayed by the callbacks (0.9515):
Epoch 7/10
688/688 [==============================] - 67s 97ms/step - loss:
0.2017 - acc: 0.9496 - val_loss: 0.1767 - val_acc: 0.9515
Epoch 00007: val_loss improved from 0.19304 to 0.17671, saving model
to cnn_model.weights.best.hdf5
and the score obtained from model.evaluate_generator (Valid accuracy: 0.9466618287373004) is very small; using the test set:
Test accuracy: 0.9078374455732946
Based on this small difference between validation scores, can I infer that evaluate_generator is working correctly? And can I infer that the accuracy score on the test set is also correct? Or is there another approach to solve this problem?
I'm frustrated with this problem.
Sorry for the long post; I'm trying to be as didactic as I can.
Thanks!