Tensorflow model.evaluate gives different result from that obtained from training - python

I am using TensorFlow to do multi-class classification.
I load the training and validation datasets in the following way:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
image_size=(img_height, img_width),
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
image_size=(img_height, img_width),
Then when I train the model using model.fit()
history = model.fit(
I get validation accuracy around 95%.
But when I load the same validation set and use model.evaluate()
I get very low accuracy (around 10%).
Why am I getting such different results? Am I using the model.evaluate function incorrectly?
Note : In the model.compile() I am specifying the following,
Optimizer - Adam,
Loss - SparseCategoricalCrossentropy,
Metric - Accuracy
Model.evaluate() output
41/41 [==============================] - 5s 118ms/step - loss: 0.3037 - accuracy: 0.1032
Test Loss - 0.3036555051803589
Test Acc - 0.10315627604722977
Model.fit() output for last three epochs
Epoch 8/10
41/41 [==============================] - 3s 80ms/step - loss: 0.6094 - accuracy: 0.8861 - val_loss: 0.4489 - val_accuracy: 0.9483
Epoch 9/10
41/41 [==============================] - 3s 80ms/step - loss: 0.5377 - accuracy: 0.8953 - val_loss: 0.3868 - val_accuracy: 0.9554
Epoch 10/10
41/41 [==============================] - 3s 80ms/step - loss: 0.4663 - accuracy: 0.9092 - val_loss: 0.3404 - val_accuracy: 0.9590

Try saving your model as .h5 instead of .tf - this worked for me in Keras 2.4.0 and tensorflow 2.4.0
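A loss of about 0.30 next to an accuracy of about 10% suggests that the loss is computed as during training while the accuracy metric is not. One possible explanation, which is an assumption and not confirmed in this thread, is that the generic 'accuracy' metric is resolved differently once the model has been saved and reloaded. The NumPy sketch below only illustrates how two definitions of accuracy can disagree on identical predictions; the data is made up and the function names are hypothetical.

```python
import numpy as np

# Made-up predictions for 4 samples and 3 classes (each row sums to 1).
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.2, 0.6],
    [0.1, 0.8, 0.1],
])
labels = np.array([0, 1, 2, 1])  # integer (sparse) class labels


def sparse_categorical_accuracy(y_true, y_prob):
    # Compare each label with the index of the highest probability.
    return float(np.mean(np.argmax(y_prob, axis=-1) == y_true))


def elementwise_accuracy(y_true, y_prob):
    # Compare each label with the raw probability values themselves,
    # which is what a mismatched metric effectively does.
    return float(np.mean(y_prob == y_true[:, None]))


print("argmax-based accuracy:", sparse_categorical_accuracy(labels, probs))
print("element-wise accuracy:", elementwise_accuracy(labels, probs))
```

If this turns out to be the cause, passing an explicit metric such as tf.keras.metrics.SparseCategoricalAccuracy() to model.compile() is a common way to remove the ambiguity.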

Why am I getting such different results? Am I using the model.evaluate
function incorrectly?
I suspect that overfitting is causing this issue. You can check it this way.
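Extract the history of the model:
history_dict = history.history
Visualize the history:
import matplotlib.pyplot as plt
# Key names assume the model was compiled with metrics=['accuracy'].
loss, acc = history_dict['loss'], history_dict['accuracy']
val_loss, val_acc = history_dict['val_loss'], history_dict['val_accuracy']
epochs = range(1, len(loss) + 1)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(epochs, loss, 'bo', label='Training loss')
ax1.plot(epochs, acc, 'ro', label='Training acc')
ax1.set_title('loss and acc of Training')
ax1.legend()
ax2.plot(epochs, val_acc, 'r', label='Validation acc')
ax2.plot(epochs, val_loss, 'b', label='Validation loss')
ax2.set_title('loss and acc of Validation')
ax2.legend()
plt.show()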
Your results may look like this:
During training, accuracy and loss keep changing with the epochs.
But on the validation set, accuracy and loss seem to reach a peak after 20 epochs.
If overfitting is occurring, training for fewer epochs can avoid the problem.
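As a rough way to decide how many epochs are enough, you can look for the epoch at which the validation loss is lowest. The sketch below works on a plain dictionary shaped like history.history; the helper name best_epoch and the numbers are made up for illustration.

```python
def best_epoch(history_dict, monitor="val_loss"):
    """Return the 1-based epoch at which the monitored value is lowest."""
    values = history_dict[monitor]
    best_index = min(range(len(values)), key=lambda i: values[i])
    return best_index + 1


# Made-up values for illustration only: validation loss bottoms out at
# epoch 3 and then rises while training loss keeps falling.
example_history = {
    "loss": [0.90, 0.60, 0.45, 0.35, 0.28],
    "val_loss": [0.80, 0.62, 0.55, 0.58, 0.66],
}

print("lowest validation loss at epoch", best_epoch(example_history))
```

Keras can also do this during training: the tf.keras.callbacks.EarlyStopping callback stops once the monitored value no longer improves, and its restore_best_weights option keeps the weights from the best epoch.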


training model for different batch sizes in keras

I want to train my model with different batch sizes, e.g. [64, 128].
I am doing it with a for loop like below:
batch_sizes = [128,256]
for i in range(len(batch_sizes)):
    history = model.fit(x_train, y_train, batch_sizes[i], epochs=epochs,
                        callbacks=[early_stopping, chk], validation_data=(x_test, y_test))
For the above code, my model produces the following results:
Epoch 1/2
311/311 [==============================] - 157s 494ms/step - loss: 0.2318 - f1: 0.0723
Epoch 2/2
311/311 [==============================] - 152s 488ms/step - loss: 0.1402 - f1: 0.4360
Epoch 1/2
156/156 [==============================] - 137s 877ms/step - loss: 0.1197 - f1: 0.5450
Epoch 2/2
156/156 [==============================] - 136s 871ms/step - loss: 0.1132 - f1: 0.5756
It looks like the model continues training after completing training for batch size 64, whereas I want the model to be trained from scratch for the next batch size. How can I do that? Kindly guide me.
P.S. What I have tried:
batch_sizes = [128,256]
for i in range(len(batch_sizes)):
    history = model.fit(x_train, y_train, batch_sizes[i], epochs=epochs,
                        callbacks=[early_stopping, chk], validation_data=(x_test, y_test))
It also did not work.
You can write a function that defines the model and call it before each subsequent fit call. If your model is held in the variable model, its weights are updated during training and stay that way after the fit call returns, so the next fit call continues from them. That is why you need to rebuild the model. This can help you:
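from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

X = np.random.rand(1000, 5)
Y = np.random.rand(1000, 1)
epochs = 2

def build_model():
    # Build and compile a fresh model so every run starts from new random weights.
    # The layer sizes, optimizer and loss below are illustrative placeholders.
    model = Sequential()
    model.add(Dense(16, activation='relu', input_shape=(5,)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    return model

batch_sizes = [128, 256]
for batch_size in batch_sizes:
    model = build_model()  # re-create the model for each batch size
    history = model.fit(X, Y, batch_size=batch_size, epochs=epochs, verbose=2)
    model.save('Model_' + str(batch_size) + '.h5')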
Then, the output looks like:
Epoch 1/2
8/8 - 0s - loss: 0.3164
Epoch 2/2
8/8 - 0s - loss: 0.1367
Epoch 1/2
4/4 - 0s - loss: 0.7221
Epoch 2/2
4/4 - 0s - loss: 0.4787
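The step counts in this output (8/8 and 4/4) follow from the batch size: there is one step per batch, and the last batch may be smaller than the rest. A small sketch of that arithmetic, assuming the final partial batch is kept (the helper name steps_per_epoch is only for illustration):

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


# 1000 samples, as in the random X and Y used above.
for batch_size in (128, 256):
    print(batch_size, "->", steps_per_epoch(1000, batch_size), "steps")
```

Seeing the step count change between runs is therefore a quick check that the new batch size was actually applied.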

Why is my model overfitting on the second epoch?

I'm a beginner in deep learning and I'm trying to train a deep learning model to classify different ASL hand signs using Mobilenet_v2 and Inception.
Here is my code, which creates an ImageDataGenerator for the training and validation sets.
# Reformat Images and Create Batches
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
validation_split = 0.4
train_generator = datagen.flow_from_directory(
target_size = (IMAGE_RES,IMAGE_RES),
batch_size = BATCH_SIZE,
subset = 'training'
val_generator = datagen.flow_from_directory(
target_size= (IMAGE_RES, IMAGE_RES),
batch_size = BATCH_SIZE,
subset = 'validation'
Here is the code to train the model:
# Do transfer learning with Tensorflow Hub
URL = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
feature_extractor = hub.KerasLayer(URL,
input_shape=(IMAGE_RES, IMAGE_RES, 3))
# Freeze pre-trained model
feature_extractor.trainable = False
# Attach a classification head
model = tf.keras.Sequential([
layers.Dense(5, activation='softmax')
# Train the model
history = model.fit(train_generator,
validation_data = val_generator,
Epoch 1/5
94/94 [==============================] - 19s 199ms/step - loss: 0.7333 - accuracy: 0.7730 - val_loss: 0.6276 - val_accuracy: 0.7705
Epoch 2/5
94/94 [==============================] - 18s 190ms/step - loss: 0.1574 - accuracy: 0.9893 - val_loss: 0.5118 - val_accuracy: 0.8145
Epoch 3/5
94/94 [==============================] - 18s 191ms/step - loss: 0.0783 - accuracy: 0.9980 - val_loss: 0.4850 - val_accuracy: 0.8235
Epoch 4/5
94/94 [==============================] - 18s 196ms/step - loss: 0.0492 - accuracy: 0.9997 - val_loss: 0.4541 - val_accuracy: 0.8395
Epoch 5/5
94/94 [==============================] - 18s 193ms/step - loss: 0.0349 - accuracy: 0.9997 - val_loss: 0.4590 - val_accuracy: 0.8365
I've tried using data augmentation but the model still overfits so I'm wondering if I've done something wrong in my code.
Your dataset is very small. Try splitting it with different random seeds and check whether the problem persists.
If it does, use regularization and decrease the complexity of the neural network.
Also experiment with different optimizers and a smaller learning rate (try a learning-rate scheduler).
It seems your dataset is very small, with inputs that lie close together mapping to different true outputs. That is why the model fits those points so easily.
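As one way to try the learning-rate scheduler suggestion, a schedule can be written as a plain function of the epoch index and the current rate. The sketch below is a step decay with made-up settings (halve every 2 epochs, starting at 0.001); none of these values come from the question.

```python
def step_decay(epoch, lr, drop=0.5, every=2):
    """Multiply the learning rate by `drop` every `every` epochs.

    `epoch` is 0-based; the rate is left unchanged at epoch 0.
    """
    if epoch > 0 and epoch % every == 0:
        return lr * drop
    return lr


# Simulate five epochs starting from an assumed learning rate of 0.001.
lr = 0.001
schedule = []
for epoch in range(5):
    lr = step_decay(epoch, lr)
    schedule.append(lr)

print(schedule)
```

A function with this (epoch, lr) shape can be passed to tf.keras.callbacks.LearningRateScheduler and added to the callbacks of model.fit.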

How to get accuracy of model using keras?

After fitting the model (which ran for a couple of hours), I wanted to get the accuracy of the trained model with the code shown in the traceback below, but I got an error, which is caused by the deprecated methods I was using.
KeyError Traceback (most recent call last)
<ipython-input-233-081ed5e89aa4> in <module>()
3 train_loss=hist.history['loss']
4 val_loss=hist.history['val_loss']
----> 5 train_acc=hist.history['acc']
6 val_acc=hist.history['val_acc']
7 xc=range(nb_epoch)
KeyError: 'acc'
The code I used to fit the model before trying to read the accuracy, is the following:
hist = model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
verbose=1, validation_data=(X_test, Y_test))
hist = model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
verbose=1, validation_split=0.2)
Which produces this output when running it:
Epoch 1/20
237/237 [==============================] - 104s 440ms/step - loss: 6.2802 - val_loss: 2.4209
Epoch 19/20
189/189 [==============================] - 91s 480ms/step - loss: 0.0590 - val_loss: 0.2193
Epoch 20/20
189/189 [==============================] - 85s 451ms/step - loss: 0.0201 - val_loss: 0.2312
I've noticed that I was running deprecated methods & arguments.
So how can I read the accuracy and val_accuracy without having to fit again, and waiting for a couple of hours again? I tried to replace train_acc=hist.history['acc'] with train_acc=hist.history['accuracy'] but it didn't help.
You probably didn't add "acc" as a metric when compiling the model.
model.compile(optimizer=..., loss=..., metrics=['accuracy',...])
You can get the metrics and loss from any data without training again with:
model.evaluate(X, Y)
Add metrics=['accuracy'] when you compile the model.
Then simply get the accuracy of the last epoch: hist.history.get('acc')[-1]
What I would actually do is use GridSearchCV and then read its best_score_ attribute to print the best metric.
Just tried it in tensorflow==2.0.0. With the following result:
Given a training call like:
history = model.fit(train_data, train_labels, epochs=100,
validation_data=(test_images, test_labels))
The final accuracy for the above call can be read out as follows:
Printing the entire dict history.history gives you an overview of all the contained values.
You will find that all the values reported in a line such as:
7570/7570 [==============================] - 42s 6ms/sample - loss: 1.1612 - accuracy: 0.5715 - val_loss: 0.5541 - val_accuracy: 0.8300
can be read out from that dict.
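A minimal sketch of reading the final value, written against a plain dictionary that stands in for history.history (the helper name final_metric is hypothetical). It tries both 'accuracy' and the older 'acc' key, and fails with a readable message when no accuracy metric was recorded, as happens when the model was compiled without one.

```python
def final_metric(history_dict, names=("accuracy", "acc")):
    """Return the last-epoch value of the first key in `names` that exists."""
    for name in names:
        if name in history_dict:
            return history_dict[name][-1]
    raise KeyError(
        f"none of {names} found; available keys: {sorted(history_dict)}"
    )


# Stand-in for history.history, using the values from the log line above.
example_history = {
    "loss": [1.1612],
    "accuracy": [0.5715],
    "val_loss": [0.5541],
    "val_accuracy": [0.8300],
}

print(final_metric(example_history))
print(final_metric(example_history, names=("val_accuracy", "val_acc")))
```

If only 'loss' and 'val_loss' are present, the history contains no accuracy to read, and model.evaluate on a model compiled with an accuracy metric is the way to get it without refitting.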
For the sake of completeness, I created the model as follows:
You can get the accuracy of the best-performing model by adding a callback such as ModelCheckpoint to serialize that model, and then extracting from the history the accuracy at the epoch with the lowest loss:
import numpy as np
best_model_accuracy = history.history['acc'][np.argmin(history.history['loss'])]

0% accuracy with evaluate_generator but 75% accuracy during training with same data - what is going on?

I'm encountering a very strange problem with a Keras model using ImageDataGenerator, fit_generator, and evaluate_generator.
I'm creating the model like so:
classes = <list of classes>
num_classes = len(classes)
pretrained_model = Sequential()
pretrained_model.add(ResNet50(include_top=False, weights='imagenet', pooling='avg'))
pretrained_model.add(Dense(num_classes, activation='softmax'))
pretrained_model.layers[0].trainable = False
And I'm training it like this:
idg_final = ImageDataGenerator(
width_shift_range = 0.2,
height_shift_range = 0.2,
traing_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224), class_mode='categorical')
pretrained_model.fit_generator(traing_gen, epochs=1, verbose=1)
fit_generator prints loss: 1.0297 - acc: 0.7546.
Then, I am trying to evaluate the model on the exact same data it was trained on.
debug_gen = idg_final.flow_from_directory('./train', target_size=(224, 224), class_mode='categorical', classes=classes, shuffle=True)
print(pretrained_model.evaluate_generator(debug_gen, steps=100))
Which prints [10.278913383483888, 0.0].
Why is the accuracy so different on the same exact data?
Edit: I also wanted to point out that sometimes the accuracy is above 0.0. For example, when I use a model trained for five epochs, evaluate_generator returns 6% accuracy.
Edit 2: Based on the answers below I made sure to train for more epochs and that the ImageDataGenerator for evaluation did not have random shifts and rotations. I'm still getting very high accuracy during training and extremely low accuracy during evaluation on the same dataset.
I'm training like
idg_final = ImageDataGenerator(
width_shift_range = 0.2,
height_shift_range = 0.2,
traing_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224), class_mode='categorical')
pretrained_model.fit_generator(traing_gen, epochs=10, verbose=1)
Which prints the following:
Found 9850 images belonging to 4251 classes.
Epoch 1/10
308/308 [==============================] - 3985s 13s/step - loss: 8.9218 - acc: 0.0860
Epoch 2/10
308/308 [==============================] - 3555s 12s/step - loss: 3.2710 - acc: 0.3403
Epoch 3/10
308/308 [==============================] - 3594s 12s/step - loss: 1.8597 - acc: 0.5836
Epoch 4/10
308/308 [==============================] - 3656s 12s/step - loss: 1.2712 - acc: 0.7058
Epoch 5/10
308/308 [==============================] - 3667s 12s/step - loss: 0.9556 - acc: 0.7795
Epoch 6/10
308/308 [==============================] - 3689s 12s/step - loss: 0.7665 - acc: 0.8207
Epoch 7/10
308/308 [==============================] - 3693s 12s/step - loss: 0.6581 - acc: 0.8498
Epoch 8/10
308/308 [==============================] - 3618s 12s/step - loss: 0.5874 - acc: 0.8636
Epoch 9/10
308/308 [==============================] - 3823s 12s/step - loss: 0.5144 - acc: 0.8797
Epoch 10/10
308/308 [==============================] - 4334s 14s/step - loss: 0.4835 - acc: 0.8854
And I'm evaluating like this on the exact same dataset
idg_debug = ImageDataGenerator(
debug_gen = idg_debug.flow_from_directory('./train', target_size=(224, 224), class_mode='categorical', classes=classes)
Which prints the following very low accuracy: [10.743386410747084, 0.0001015228426395939]
The full code is here.
Two things I suspect.
1 - No, your data is not the same.
You're using three types of augmentation in ImageDataGenerator, and it seems there isn't a random seed being set. So, test data is not equal to training data.
And as it seems, you're also training for only one epoch, which is very little (unless you really have tons of data, but since you're using augmentation, maybe that's not the case). (PS: I don't see the steps_per_epoch argument in your fit_generator call...)
So, if you want to see good results, here are some solutions:
remove the augmentation arguments from the generator for this test (for both training and test data). This means removing width_shift_range, height_shift_range and rotation_range;
if not, train for a really long time, long enough for your model to get used to all kinds of augmented images (five epochs still seem to be far too few);
or set a random seed to guarantee that the test data is equal to the training data (argument seed in flow_from_directory)
2 - (This may happen if you're very new to Keras/programming, so please ignore if it's not the case) You might be running the code that defines the model again when testing.
If you run the code that defines the model again, it will replace all your previous training with random weights.
3 - Since we're out of suggestions:
Maybe save the weights instead of saving the model. I usually do this instead of saving the models. (For some reason I don't understand, I've never been able to load a model like that)
def createModel():
model = createModel()
model = createModel()
It's not related to the bug, but make sure that the base model layer is really layer 0. If I remember correctly, sequential models have an input layer and you should actually be making layer 1 untrainable instead.
Use model.summary() to confirm the number of non-trainable parameters.
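The point about random seeds in suggestion 1 can be illustrated without Keras: random augmentation parameters drawn from a seeded generator repeat exactly, while a different seed gives different images. This is only the principle, using Python's random module and a hypothetical helper; whether two Keras generators apply identical transforms to each image also depends on their iteration order.

```python
import random


def random_shifts(n, max_shift=0.2, seed=None):
    """Draw n random shift fractions in [-max_shift, max_shift]."""
    rng = random.Random(seed)
    return [rng.uniform(-max_shift, max_shift) for _ in range(n)]


same_a = random_shifts(5, seed=42)
same_b = random_shifts(5, seed=42)
other = random_shifts(5, seed=7)

print("same seed, same shifts:", same_a == same_b)
print("different seed, same shifts:", same_a == other)
```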

Printing out the validation accuracy to the console for every batch or epoch (Keras)

I'm using ImageDataGenerator and flow_from_directory to generate my data, and
using model.fit_generator to fit the data.
This defaults to outputting the accuracy for training data set only.
There doesn't seem to be an option to output validation accuracy to the terminal.
Here is the relevant portion of my code:
#train data generator
print('Starting Preprocessing')
train_datagen = ImageDataGenerator(preprocessing_function = preprocess)
train_generator = train_datagen.flow_from_directory(
target_size = (img_height, img_width),
batch_size = batch_size,
class_mode = 'categorical') #class_mode = 'categorical'
#same for validation
val_datagen = ImageDataGenerator(preprocessing_function = preprocess)
validation_generator = val_datagen.flow_from_directory(
target_size = (img_height, img_width),
########################Model Creation###################################
#create the base pre-trained model
print('Finished Preprocessing, starting model creating \n')
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(12, activation='softmax')(x)
model = Model(input=base_model.input, output=predictions)
for layer in model.layers[:-34]:
layer.trainable = False
for layer in model.layers[-34:]:
layer.trainable = True
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.001, momentum=0.92),
metrics = ['accuracy'])
#############SAVE Model #######################################
file_name = str(datetime.datetime.now()).split(' ')[0] + '_{epoch:02d}.hdf5'
filepath = os.path.join(save_dir, file_name)
checkpoints =ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
save_best_only=False, save_weights_only=False,
mode='auto', period=2)
###############Fit Model #############################
steps_per_epoch =total_samples//batch_size,
epochs = epochs,
callbacks = [checkpoints],
shuffle= True)
Throughout training, I'm only getting the training accuracy as output,
but at the end of training, I'm getting both training and validation accuracy.
Epoch 1/10
1/363 [..............................] - ETA: 1:05:58 - loss: 2.4976 - acc: 0.0640
2/363 [..............................] - ETA: 51:33 - loss: 2.4927 - acc: 0.0760
3/363 [..............................] - ETA: 48:55 - loss: 2.5067 - acc: 0.0787
4/363 [..............................] - ETA: 47:26 - loss: 2.5110 - acc: 0.0770
5/363 [..............................] - ETA: 46:30 - loss: 2.5021 - acc: 0.0824
6/363 [..............................] - ETA: 45:56 - loss: 2.5063 - acc: 0.0820
The idea is that you go through your validation set after each epoch, not after each batch.
If you had to evaluate the model's performance on the whole validation set after every batch, you would lose a lot of time.
After each epoch, you will have the corresponding losses and accuracies for both training and validation. But during an epoch, you will only have access to the training loss and accuracy.
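The ordering described above can be sketched as a toy loop. This is not how Keras is implemented; run_training, fake_train_step and fake_validate are hypothetical stand-ins that only show where the training and validation numbers come from.

```python
def run_training(num_epochs, steps_per_epoch, train_step, validate):
    """Toy loop: a training metric per batch, a validation metric per epoch."""
    log = []
    for epoch in range(1, num_epochs + 1):
        for step in range(1, steps_per_epoch + 1):
            train_loss = train_step(epoch, step)
            log.append(("batch", epoch, step, train_loss))
        # The whole validation set is evaluated once, after the last batch.
        val_loss = validate(epoch)
        log.append(("epoch_end", epoch, val_loss))
    return log


def fake_train_step(epoch, step):
    # Placeholder for one optimizer step on one batch.
    return 1.0 / (epoch * step)


def fake_validate(epoch):
    # Placeholder for a full pass over the validation data.
    return 1.0 / epoch


log = run_training(2, 3, fake_train_step, fake_validate)
num_validations = sum(1 for entry in log if entry[0] == "epoch_end")
print("batches logged:", len(log) - num_validations)
print("validation passes:", num_validations)
```

With 2 epochs of 3 steps each there are 6 training updates but only 2 validation passes, which is why validation numbers appear only at the end of each epoch's progress bar.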
Validation loss and validation accuracy get printed for every epoch once you specify validation_split.
model.fit(X, Y, epochs=1000, batch_size=10, validation_split=0.2)
I have used the above in my code, and val_loss and val_acc are getting printed for every epoch, but not after every batch.
Hope that answers your question.
Epoch 1/500
1267/1267 [==============================] - 0s 376us/step - loss: 0.6428 - acc: 0.6409 - val_loss: 0.5963 - val_acc: 0.6656
In fit_generator,
fit_generator(generator, steps_per_epoch=None, epochs=1, verbose=1, callbacks=None, validation_data=None, validation_steps=None, validation_freq=1, class_weight=None, max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True, initial_epoch=0)
since there is no validation_split parameter, you can create two different ImageDataGenerator flows, one for training and one for validation, and pass the validation generator as validation_data (optionally with validation_steps). Keras will then print the validation loss and accuracy.
