I'm a beginner in deep learning, and I'm trying to train a model to classify different ASL hand signs using MobileNetV2 and Inception.
Here is my code to create an ImageDataGenerator for the training and validation sets:
# Reformat images and create batches
import tensorflow as tf

IMAGE_RES = 224
BATCH_SIZE = 32

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0.4
)

train_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_RES, IMAGE_RES),
    batch_size=BATCH_SIZE,
    subset='training'
)

val_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_RES, IMAGE_RES),
    batch_size=BATCH_SIZE,
    subset='validation'
)
Here is the code to train the model:
# Do transfer learning with TensorFlow Hub
import tensorflow_hub as hub
from tensorflow.keras import layers

URL = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
feature_extractor = hub.KerasLayer(URL,
                                   input_shape=(IMAGE_RES, IMAGE_RES, 3))

# Freeze the pre-trained model
feature_extractor.trainable = False

# Attach a classification head
model = tf.keras.Sequential([
    feature_extractor,
    layers.Dense(5, activation='softmax')
])
model.summary()

# Train the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'])

EPOCHS = 5
history = model.fit(train_generator,
                    steps_per_epoch=len(train_generator),
                    epochs=EPOCHS,
                    validation_data=val_generator,
                    validation_steps=len(val_generator))
Epoch 1/5
94/94 [==============================] - 19s 199ms/step - loss: 0.7333 - accuracy: 0.7730 - val_loss: 0.6276 - val_accuracy: 0.7705
Epoch 2/5
94/94 [==============================] - 18s 190ms/step - loss: 0.1574 - accuracy: 0.9893 - val_loss: 0.5118 - val_accuracy: 0.8145
Epoch 3/5
94/94 [==============================] - 18s 191ms/step - loss: 0.0783 - accuracy: 0.9980 - val_loss: 0.4850 - val_accuracy: 0.8235
Epoch 4/5
94/94 [==============================] - 18s 196ms/step - loss: 0.0492 - accuracy: 0.9997 - val_loss: 0.4541 - val_accuracy: 0.8395
Epoch 5/5
94/94 [==============================] - 18s 193ms/step - loss: 0.0349 - accuracy: 0.9997 - val_loss: 0.4590 - val_accuracy: 0.8365
I've tried using data augmentation, but the model still overfits, so I'm wondering if I've done something wrong in my code.
Your dataset is very small. Try splitting it with different random seeds and check whether the problem persists.
If it does, add regularization and decrease the complexity of the neural network.
Also experiment with different optimizers and a smaller learning rate (try a learning-rate scheduler).
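For example, a minimal sketch of those ideas applied to the code above; the dropout rate, L2 strength, and learning rate are illustrative assumptions, not tuned values:
from tensorflow.keras import layers, regularizers

# Smaller, regularized classification head on top of the frozen feature extractor
model = tf.keras.Sequential([
    feature_extractor,
    layers.Dropout(0.3),
    layers.Dense(5, activation='softmax',
                 kernel_regularizer=regularizers.l2(1e-4))
])

# Reduce the learning rate whenever validation loss stops improving
lr_schedule = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_generator,
                    epochs=EPOCHS,
                    validation_data=val_generator,
                    callbacks=[lr_schedule])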
It seems like your dataset is very small, with some true outputs separated by only a small distance in input space on the input-output curve; that is why the model fits those points so easily.
I'm trying to train a simple model for the Yelp binary classification task.
Load BERT encoder:
gs_folder_bert = "gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12"
bert_config_file = os.path.join(gs_folder_bert, "bert_config.json")
config_dict = json.loads(tf.io.gfile.GFile(bert_config_file).read())
bert_config = bert.configs.BertConfig.from_dict(config_dict)

_, bert_encoder = bert.bert_models.classifier_model(
    bert_config, num_labels=2)

checkpoint = tf.train.Checkpoint(model=bert_encoder)
checkpoint.restore(
    os.path.join(gs_folder_bert, 'bert_model.ckpt')).assert_consumed()
Load data:
data, info = tfds.load('yelp_polarity_reviews', with_info=True, batch_size=-1, as_supervised=True)
train_x_orig, train_y_orig = tfds.as_numpy(data['train'])
train_x = encode_examples(train_x_orig)
train_y = train_y_orig
Use BERT to embed the data:
encoder_output = bert_encoder.predict(train_x)
Set up the model:
inputs = keras.Input(shape=(768,))
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(8, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
sgd = SGD(lr=0.0001)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
Train:
model.fit(encoder_output[0], train_y, batch_size=64, epochs=3)
# encoder_output[0].shape === (10000, 1, 768)
# y_train.shape === (100000,)
Training results:
Epoch 1/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6921 - accuracy: 0.5455
Epoch 2/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6918 - accuracy: 0.5455
Epoch 3/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6915 - accuracy: 0.5412
Epoch 4/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6913 - accuracy: 0.5407
Epoch 5/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6911 - accuracy: 0.5358
I tried different learning rates, but the main issue seems to be that training takes only 1 second per epoch and the accuracy stays at ~0.5. Am I not setting up the inputs/model correctly?
Your BERT model is not training. It has to be placed before the dense layers and trained as part of the model: the input layer has to take not BERT vectors, but the sequence of tokens cropped to max_length and padded. Here is example code: https://keras.io/examples/nlp/text_extraction_with_bert/; see the beginning of the create_model function.
Alternatively, you can use the Trainer from transformers.
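A minimal sketch of that idea, assuming the HuggingFace transformers library as in the linked example; the model name, max_length, and learning rate are illustrative assumptions:
import tensorflow as tf
from transformers import TFBertModel

max_length = 128
bert = TFBertModel.from_pretrained("bert-base-uncased")  # trainable BERT encoder inside the model

input_ids = tf.keras.Input(shape=(max_length,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(max_length,), dtype=tf.int32, name="attention_mask")

# Sequence output has shape (batch, max_length, 768); take the [CLS] token embedding
sequence_output = bert(input_ids, attention_mask=attention_mask)[0]
cls_token = sequence_output[:, 0, :]

x = tf.keras.layers.Dense(64, activation="relu")(cls_token)
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(3e-5),
              loss="binary_crossentropy",
              metrics=["accuracy"])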
I am trying to build an image classifier that differentiates images into pump, turbine, and PCB classes. I am using transfer learning from Inception V3.
Below is my code to initialize InceptionV3:
import os
from tensorflow.keras import layers
from tensorflow.keras import Model

!wget --no-check-certificate \
    https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \
    -O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5

from tensorflow.keras.applications.inception_v3 import InceptionV3

local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'

pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights=None)
pre_trained_model.load_weights(local_weights_file)

for layer in pre_trained_model.layers:
    layer.trainable = False

# pre_trained_model.summary()

last_layer = pre_trained_model.get_layer('mixed7')
print('last layer output shape: ', last_layer.output_shape)
last_output = last_layer.output
Next, I connect my DNN to the pre-trained model:
from tensorflow.keras.optimizers import RMSprop

# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
x = layers.Dense(3, activation='softmax')(x)

model = Model(pre_trained_model.input, x)

model.compile(optimizer=RMSprop(lr=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
I feed in my images using ImageDataGenerator and train the model as below:
history = model.fit(
    train_generator,
    validation_data=validation_generator,
    steps_per_epoch=100,
    epochs=20,
    validation_steps=50,
    verbose=2)
However, the validation accuracy is not printed/generated after the first epoch:
Epoch 1/20
/usr/local/lib/python3.6/dist-packages/PIL/TiffImagePlugin.py:788: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
/usr/local/lib/python3.6/dist-packages/PIL/Image.py:932: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
"Palette images with Transparency expressed in bytes should be "
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 50 batches). You may need to use the repeat() function when building your dataset.
100/100 - 43s - loss: 0.1186 - accuracy: 0.9620 - val_loss: 11.7513 - val_accuracy: 0.3267
Epoch 2/20
100/100 - 41s - loss: 0.1299 - accuracy: 0.9630
Epoch 3/20
100/100 - 39s - loss: 0.0688 - accuracy: 0.9840
Epoch 4/20
100/100 - 39s - loss: 0.0826 - accuracy: 0.9785
Epoch 5/20
100/100 - 39s - loss: 0.0909 - accuracy: 0.9810
Epoch 6/20
100/100 - 39s - loss: 0.0523 - accuracy: 0.9845
Epoch 7/20
100/100 - 38s - loss: 0.0976 - accuracy: 0.9835
Epoch 8/20
100/100 - 39s - loss: 0.0802 - accuracy: 0.9795
Epoch 9/20
100/100 - 39s - loss: 0.0612 - accuracy: 0.9860
Epoch 10/20
100/100 - 40s - loss: 0.0729 - accuracy: 0.9825
Epoch 11/20
100/100 - 39s - loss: 0.0601 - accuracy: 0.9870
Epoch 12/20
100/100 - 39s - loss: 0.0976 - accuracy: 0.9840
Epoch 13/20
100/100 - 39s - loss: 0.0591 - accuracy: 0.9815
Epoch 14/20
I don't understand what is stopping the validation accuracy from being printed/generated. I also get an error if I plot a graph of accuracy vs. validation accuracy, with the message:
ValueError: x and y must have same first dimension, but have shapes (20,) and (1,)
What am I missing here?
It worked finally; posting my changes here in case anybody faces issues like these.
I changed the "weights" parameter in InceptionV3 from None to 'imagenet' and calculated my steps per epoch and validation steps as follows:
steps_per_epoch = np.ceil(no_of_training_images / batch_size)
validation_steps = np.ceil(no_of_validation_images / batch_size)
As you can see in the warning, WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 50 batches). You may need to use the repeat() function when building your dataset.
To make sure that you have at least steps_per_epoch * epochs batches, set steps_per_epoch to:
steps_per_epoch = X_train.shape[0] // batch_size
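Equivalently, a brief sketch assuming the generators come from flow_from_directory (the generator names follow the question); each generator already knows how many images it found and its batch size:
import numpy as np

steps_per_epoch = int(np.ceil(train_generator.samples / train_generator.batch_size))
validation_steps = int(np.ceil(validation_generator.samples / validation_generator.batch_size))

history = model.fit(
    train_generator,
    validation_data=validation_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=20,
    validation_steps=validation_steps,
    verbose=2)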
I'm trying to train a model with transfer learning (MobileNetV2), but my validation accuracy stops increasing at around 0.60. I've tried training only the top layers that I built, and after that I also tried training some of MobileNet's layers. Same result. How can I fix it? I have to mention that I am new to deep learning and I am not sure that the top layers I built are right. Feel free to correct me.
import tensorflow as tf

IMAGE_SIZE = 224
BATCH_SIZE = 64

train_data_dir = "/content/FER2013/Training"
validation_data_dir = "/content/FER2013/PublicTest"

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0)

train_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    class_mode='categorical',
    batch_size=BATCH_SIZE)

val_generator = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    class_mode='categorical',
    batch_size=BATCH_SIZE)

IMG_SHAPE = (IMAGE_SIZE, IMAGE_SIZE, 3)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(7, activation='softmax')
])

model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-5),  # I've tried with .Adam as well
              metrics=['accuracy'])

from keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=3)
early_stopper = EarlyStopping(monitor='val_accuracy', min_delta=0, patience=6, mode='auto')
checkpointer = ModelCheckpoint('/content/weights.hd5', monitor='val_loss', verbose=1, save_best_only=True)

epochs = 50
learning_rate = 0.004  # I've tried other values as well

history_fine = model.fit(train_generator,
                         steps_per_epoch=len(train_generator),
                         epochs=epochs,
                         callbacks=[lr_reducer, checkpointer, early_stopper],
                         validation_data=val_generator,
                         validation_steps=len(val_generator))
Epoch 1/50
448/448 [==============================] - ETA: 0s - loss: 1.7362 - accuracy: 0.2929
Epoch 00001: val_loss improved from inf to 1.58818, saving model to /content/weights.hd5
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py:111: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py:111: Layer.updates (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 166s 370ms/step - loss: 1.7362 - accuracy: 0.2929 - val_loss: 1.5882 - val_accuracy: 0.4249
Epoch 2/50
448/448 [==============================] - ETA: 0s - loss: 1.3852 - accuracy: 0.4664
Epoch 00002: val_loss improved from 1.58818 to 1.31690, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 165s 368ms/step - loss: 1.3852 - accuracy: 0.4664 - val_loss: 1.3169 - val_accuracy: 0.4827
Epoch 3/50
448/448 [==============================] - ETA: 0s - loss: 1.2058 - accuracy: 0.5277
Epoch 00003: val_loss improved from 1.31690 to 1.21979, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 165s 368ms/step - loss: 1.2058 - accuracy: 0.5277 - val_loss: 1.2198 - val_accuracy: 0.5271
Epoch 4/50
448/448 [==============================] - ETA: 0s - loss: 1.0828 - accuracy: 0.5861
Epoch 00004: val_loss improved from 1.21979 to 1.18972, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 166s 370ms/step - loss: 1.0828 - accuracy: 0.5861 - val_loss: 1.1897 - val_accuracy: 0.5533
Epoch 5/50
448/448 [==============================] - ETA: 0s - loss: 0.9754 - accuracy: 0.6380
Epoch 00005: val_loss improved from 1.18972 to 1.13336, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 165s 368ms/step - loss: 0.9754 - accuracy: 0.6380 - val_loss: 1.1334 - val_accuracy: 0.5743
Epoch 6/50
448/448 [==============================] - ETA: 0s - loss: 0.8761 - accuracy: 0.6848
Epoch 00006: val_loss did not improve from 1.13336
448/448 [==============================] - 153s 342ms/step - loss: 0.8761 - accuracy: 0.6848 - val_loss: 1.1348 - val_accuracy: 0.5882
Epoch 7/50
448/448 [==============================] - ETA: 0s - loss: 0.7783 - accuracy: 0.7264
Epoch 00007: val_loss did not improve from 1.13336
448/448 [==============================] - 153s 341ms/step - loss: 0.7783 - accuracy: 0.7264 - val_loss: 1.1392 - val_accuracy: 0.5893
Epoch 8/50
448/448 [==============================] - ETA: 0s - loss: 0.6832 - accuracy: 0.7638
Epoch 00008: val_loss did not improve from 1.13336
448/448 [==============================] - 153s 342ms/step - loss: 0.6832 - accuracy: 0.7638 - val_loss: 1.1542 - val_accuracy: 0.6052
Since your validation loss is increasing while your training loss decreases, I think you may have a problem of overfitting to your training set. Some things that could help are:
Use fewer dense layers. I think there are too many, but I could be wrong since I don't know what problem you are solving.
Add a dropout layer after every dense layer.
Increase the dropout rates.
Use data augmentation on your training set (since you are already using ImageDataGenerator, it won't be that hard).
Reduce the number of neurons in the dense layers.
Use regularization.
You can apply any one of them, or several at the same time. Tweaking a model is a lot of trial and error: do some experiments and keep the model that achieves the best performance.
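For instance, a sketch of a slimmer, regularized head plus training-time augmentation; the exact sizes, rates, and augmentation parameters are illustrative assumptions, not tuned values:
from tensorflow.keras import layers, regularizers

# Fewer, smaller dense layers with dropout and L2 regularization
model = tf.keras.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),
    layers.Dense(7, activation='softmax')
])

# Augment only the training data
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)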
With your training loss decreasing and your validation loss increasing, it appears that you are in an overfitting situation. The more complex your model, the higher the chance of this happening, so I suggest you try to simplify your model. Try removing all the dense layers except for the top layer that produces your classifications. Run that and see how it does. My experience with MobileNet is that it should work well. If not, add an additional dense layer with about 128 neurons followed by a dropout layer, and use the dropout rate to adjust for overfitting. If you are still overfitting, you might want to consider adding regularizers to this dense layer; the documentation is here.
It could also be the case that you need more training samples. Use the ImageDataGenerator to augment your dataset, for example by setting horizontal_flip=True. I also notice you did not include the MobileNetV2 preprocessing function in the ImageDataGenerator. I believe this is necessary, so modify the code as shown below. Finally, I have found that I get the best results with MobileNet when I train the entire model, so before you compile your model add the code below.
datagen = ImageDataGenerator(preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input)

# Make the whole model (including the MobileNetV2 base) trainable
for layer in model.layers:
    layer.trainable = True
As an aside, you can delete the GlobalAveragePooling2D layer in your model and instead set the parameter pooling='max' within MobileNetV2. I also see you start out with a very small learning rate; try something like 0.005. Since you have the ReduceLROnPlateau callback, it will lower the rate if it is too large, but a larger starting value will let you converge faster.
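Putting those suggestions together, a sketch of the revised setup might look like this; the head size, dropout rate, and 0.005 learning rate follow the advice above and should be treated as starting points rather than tuned values:
# MobileNetV2 base with built-in max pooling, trained end to end
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               pooling='max',
                                               weights='imagenet')
base_model.trainable = True

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(7, activation='softmax')
])

model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
              metrics=['accuracy'])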
I am working on an image classification problem. My training and testing split uses the same random_state, and the model definition is the same. However, when I run the model multiple times, three out of four times the model does not learn: the loss does not go down. One out of four times the model does learn and I get good classification results. I suspect the randomness comes from ImageDataGenerator(), but I cannot figure out how to make the model learn every time.
I have a relatively small labeled dataset and no way to increase its size.
I tried different optimizers and different batch sizes; it doesn't help. I found that when I reduce the number of trainable layers and make the later fully-connected layers smaller (down to 256 units), the model starts to learn every time. But why does the big network not learn well, even on the training data? My understanding is that a bigger model should overfit, so why in this case is it not learning at all?
IMAGE_WIDTH = 128
IMAGE_HEIGHT = 128
IMAGE_SIZE = (IMAGE_WIDTH, IMAGE_HEIGHT)
IMAGE_CHANNELS = 3  # RGB color

os.chdir(r"XXX")
filenames = os.listdir(r"XXX")
ref_db = pd.read_csv(r"XXX")
ref_db['obj_id'] = [str(i) + '.tif' for i in ref_db.OBJECTID.values]
ref_db2 = ref_db[['label', 'obj_id']]
ref_db2['label'] = ref_db2['label'].apply(str)

train_df, validate_df = train_test_split(ref_db2, test_size=0.20, random_state=42)
train_df = train_df.reset_index(drop=True)
validate_df = validate_df.reset_index(drop=True)
total_train = train_df.shape[0]
total_validate = validate_df.shape[0]

batch_size = 64
train_datagen = ImageDataGenerator(
    rotation_range=15,
    rescale=1./255,
    shear_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1
)
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    r"XXX",
    x_col='obj_id',
    y_col='Green_Roof',
    target_size=IMAGE_SIZE,
    class_mode='binary',
    batch_size=batch_size
)

inputs = Input(shape=(IMAGE_WIDTH, IMAGE_HEIGHT, 3))
base_model = VGG19(weights='imagenet', include_top=False)
for layer in base_model.layers[:-3]:
    layer.trainable = False

x = base_model(inputs)
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
# x = Dropout(0.5)(x)
x = Dense(512, activation="relu")(x)
predictions = Dense(1, activation="sigmoid")(x)

model_vgg = Model(inputs=inputs, outputs=predictions)
model_vgg.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])

#########################
history = model_vgg.fit_generator(
    train_generator,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=total_validate // batch_size,
    steps_per_epoch=total_train // batch_size,
    verbose=2
)
This is the unwanted behavior: the model is not learning, all observations are predicted as 1, and the loss is not dropping.
Found 756 validated image filenames belonging to 2 classes.
Found 190 validated image filenames belonging to 2 classes.
Epoch 1/50
- 4s - loss: 4.0464 - acc: 0.6203 - val_loss: 4.9820 - val_acc: 0.6875
Epoch 2/50
- 2s - loss: 4.3811 - acc: 0.7252 - val_loss: 4.8856 - val_acc: 0.6935
Epoch 3/50
- 2s - loss: 5.0209 - acc: 0.6851 - val_loss: 5.3556 - val_acc: 0.6641
Epoch 4/50
- 2s - loss: 4.3583 - acc: 0.7266 - val_loss: 4.1142 - val_acc: 0.7419
Epoch 5/50
- 2s - loss: 4.9317 - acc: 0.6907 - val_loss: 4.7329 - val_acc: 0.7031
Epoch 6/50
- 2s - loss: 4.6275 - acc: 0.7097 - val_loss: 5.3998 - val_acc: 0.6613
Epoch 7/50
This is the expected behavior: the model is learning, both 1 and 0 are predicted, and the loss is dropping.
Found 756 validated image filenames belonging to 2 classes.
Found 190 validated image filenames belonging to 2 classes.
Epoch 1/50
- 4s - loss: 2.1181 - acc: 0.6484 - val_loss: 0.8013 - val_acc: 0.6562
Epoch 2/50
- 2s - loss: 0.6609 - acc: 0.7096 - val_loss: 0.5670 - val_acc: 0.7581
Epoch 3/50
- 2s - loss: 0.6539 - acc: 0.6912 - val_loss: 0.5923 - val_acc: 0.6953
Epoch 4/50
- 2s - loss: 0.5695 - acc: 0.7083 - val_loss: 0.5426 - val_acc: 0.6774
Epoch 5/50
- 2s - loss: 0.5262 - acc: 0.7176 - val_loss: 0.5386 - val_acc: 0.6875
I'm encountering a very strange issue with a Keras model using ImageDataGenerator, fit_generator, and evaluate_generator.
I'm creating the model like so:
classes = <list of classes>
num_classes = len(classes)

pretrained_model = Sequential()
pretrained_model.add(ResNet50(include_top=False, weights='imagenet', pooling='avg'))
pretrained_model.add(Dense(num_classes, activation='softmax'))
pretrained_model.layers[0].trainable = False

pretrained_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
And I'm training it like this:
idg_final = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=15,
)
traing_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224), class_mode='categorical')
pretrained_model.fit_generator(traing_gen, epochs=1, verbose=1)
fit_generator prints loss: 1.0297 - acc: 0.7546.
Then, I am trying to evaluate the model on the exact same data it was trained on.
debug_gen = idg_final.flow_from_directory('./train', target_size=(224, 224), class_mode='categorical', classes=classes, shuffle=True)
print(pretrained_model.evaluate_generator(debug_gen, steps=100))
Which prints [10.278913383483888, 0.0].
Why is the accuracy so different on the same exact data?
Edit: I also wanted to point out that sometimes the accuracy is above 0.0. For example, when I use a model trained for five epochs, evaluate_generator returns 6% accuracy.
Edit 2: Based on the answers below I made sure to train for more epochs and that the ImageDataGenerator for evaluation did not have random shifts and rotations. I'm still getting very high accuracy during training and extremely low accuracy during evaluation on the same dataset.
I'm training like this:
idg_final = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=15,
)
traing_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224), class_mode='categorical')
pretrained_model.fit_generator(traing_gen, epochs=10, verbose=1)
Which prints the following:
Found 9850 images belonging to 4251 classes.
Epoch 1/10
308/308 [==============================] - 3985s 13s/step - loss: 8.9218 - acc: 0.0860
Epoch 2/10
308/308 [==============================] - 3555s 12s/step - loss: 3.2710 - acc: 0.3403
Epoch 3/10
308/308 [==============================] - 3594s 12s/step - loss: 1.8597 - acc: 0.5836
Epoch 4/10
308/308 [==============================] - 3656s 12s/step - loss: 1.2712 - acc: 0.7058
Epoch 5/10
308/308 [==============================] - 3667s 12s/step - loss: 0.9556 - acc: 0.7795
Epoch 6/10
308/308 [==============================] - 3689s 12s/step - loss: 0.7665 - acc: 0.8207
Epoch 7/10
308/308 [==============================] - 3693s 12s/step - loss: 0.6581 - acc: 0.8498
Epoch 8/10
308/308 [==============================] - 3618s 12s/step - loss: 0.5874 - acc: 0.8636
Epoch 9/10
308/308 [==============================] - 3823s 12s/step - loss: 0.5144 - acc: 0.8797
Epoch 10/10
308/308 [==============================] - 4334s 14s/step - loss: 0.4835 - acc: 0.8854
And I'm evaluating like this, on the exact same dataset:
idg_debug = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
)
debug_gen = idg_debug.flow_from_directory('./train', target_size=(224, 224), class_mode='categorical', classes=classes)
print(pretrained_model.evaluate_generator(debug_gen))
Which prints the following very low accuracy: [10.743386410747084, 0.0001015228426395939]
The full code is here.
Two things I suspect.
1 - No, your data is not the same.
You're using three types of augmentation in ImageDataGenerator, and it seems there isn't a random seed being set. So the test data is not equal to the training data.
It also seems you're training for only one epoch, which is very little (unless you really have tons of data, but since you're using augmentation, maybe that's not the case). (PS: I don't see the steps_per_epoch argument in your fit_generator call...)
So, if you want to see good results, here are some solutions:
remove the augmentation arguments from the generator for this test (for both training and test data); this means removing width_shift_range, height_shift_range, and rotation_range;
if not, train for a really long time, long enough for your model to get used to all kinds of augmented images (five epochs still seems to be way too little);
or set a random seed and guarantee that the test data is equal to the training data (the seed argument in flow_from_directory), as in the sketch below.
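A minimal sketch of that last option, following the suggestion above (the seed value itself is arbitrary):
idg_final = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=15,
)

# Per the suggestion above: same directory, same classes, same seed for both generators
traing_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224),
                                           class_mode='categorical', seed=42)
debug_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224),
                                          class_mode='categorical', seed=42)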
2 - (This may happen if you're very new to Keras/programming, so please ignore if it's not the case) You might be running the code that defines the model again when testing.
If you run the code that defines the model again, it will replace all your previous training with random weights.
3 - Since we're out of suggestions:
Maybe save the weights instead of saving the model. I usually do this instead of saving whole models. (For some reason I don't understand, I've never been able to load a model saved that way.)
def createModel():
    ....

model = createModel()
...
model.fit_generator(....)
np.save('model_weights.npy', model.get_weights())

model = createModel()
model.set_weights(np.load('model_weights.npy'))
...
model.evaluate_generator(...)
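Keras also has built-in helpers for the same round trip; a brief sketch using the names from the question:
model.save_weights('model_weights.h5')   # after training

model = createModel()                    # rebuild the same architecture
model.load_weights('model_weights.h5')   # restore the trained weights
model.evaluate_generator(debug_gen)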
Hint:
It's not related to the bug, but make sure that the base model is really layer 0. If I remember correctly, Sequential models have an input layer, and you might actually need to make layer 1 untrainable instead.
Use model.summary() to confirm the number of non-trainable parameters.
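For example, a quick check along those lines (using the pretrained_model from the question):
pretrained_model.summary()   # compare the trainable vs. non-trainable parameter counts

# Print each layer's index, name, and trainable flag to find the ResNet50 base
for i, layer in enumerate(pretrained_model.layers):
    print(i, layer.name, layer.trainable)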