Keras Resnet-50 image classification overfitting - python

Hello, I am getting overfitting with ResNet-50 pretrained weights. I am training on RGB images of files, and the dataset I am using comes with training and validation sets. I have 26 classes and about 14k images: 9k for training and 5k for testing.
The name of the dataset is maleviz.
My training accuracy reaches 1.000, but my validation accuracy is very low and doesn't go above 0.50-0.55, so the model seems to be overfitting. Is there something wrong with the data, such as the number of samples per class, or is there something wrong with my model?
I expected ResNet to perform well on this.
Here is my code:
import tensorflow as tf
import keras
from keras import backend as K
from keras.preprocessing.image import ImageDataGenerator
import keras
from keras.models import Sequential, Model, load_model
from tensorflow.keras.optimizers import Adam
from keras.callbacks import EarlyStopping,ModelCheckpoint
from keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization,Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D,MaxPool2D
from keras.preprocessing import image
from keras.initializers import glorot_uniform
from keras.applications.resnet import ResNet50
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
samples = ImageDataGenerator().flow_from_directory(directory='malevis_train_val_300x300/train', target_size=(300,300))
imgs, labels = next(samples)
print(imgs.shape, labels.shape)
samples2 = ImageDataGenerator().flow_from_directory(directory='malevis_train_val_300x300/val', target_size=(300,300))
imgs2, labels2 = next(samples2)
classes = samples.class_indices.keys()
y = (sum(labels)/labels.shape[0])*100
X_train, y_train = imgs,labels
X_val, y_val = imgs2,labels2
def define_model():
    model = ResNet50(weights='imagenet', pooling='avg', include_top=False, input_shape=(300,300,3))
    for layer in model.layers:
        layer.trainable = False
    flat1 = Flatten()(model.layers[-1].output)
    class1 = Dense(256, activation='relu')(flat1)
    output = Dense(26, activation='softmax')(class1)
    model = Model(inputs=model.inputs, outputs=output)
    opt = Adam(lr=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model
model = define_model()
history1 = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=200, batch_size=20, steps_per_epoch=4, shuffle=True)
scores = model.evaluate(X_val,y_val)
print('Final accuracy:', scores[1])
acc = history1.history['accuracy']
val_acc = history1.history['val_accuracy']
loss = history1.history['loss']
val_loss = history1.history['val_loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
I have tried different optimizers, loss functions, and target sizes, and I have increased the number of steps per epoch. Nothing makes much difference; it still overfits. I am using softmax activation, freezing the layers, and removing the top. I then just add a dense layer and an output layer for the 26 classes. I have tried with shuffling set to both True and False.

I would like to suggest a few things; one of them might help:
You didn't pass a classes parameter to flow_from_directory(), so make sure you have the folder structure that the documentation requires: flow_from_directory
Try changing the loss from categorical_crossentropy to sparse_categorical_crossentropy if your output labels are not one-hot encoded. Ref: Probabilistic losses | SparseCategoricalCrossentropy
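To make the second suggestion concrete, here is a minimal NumPy sketch (not Keras code) of how the two losses relate. The function names mirror the Keras loss names but are plain re-implementations written for illustration, and the probabilities are made up. Note that flow_from_directory defaults to class_mode='categorical', which yields one-hot labels, so the print(imgs.shape, labels.shape) line in the question tells you which case applies: a label shape of (batch, 26) means one-hot, (batch,) means integer class ids.

```python
import numpy as np

def categorical_crossentropy(one_hot, probs):
    """Mean cross-entropy for one-hot targets of shape (n, classes)."""
    return float(np.mean(-np.sum(one_hot * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(class_ids, probs):
    """Mean cross-entropy for integer targets of shape (n,)."""
    picked = probs[np.arange(len(class_ids)), class_ids]
    return float(np.mean(-np.log(picked)))

# Three samples, four classes; each row of probs plays the role of a softmax output.
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])
class_ids = np.array([0, 1, 3])      # integer labels
one_hot = np.eye(4)[class_ids]       # the same labels, one-hot encoded

loss_one_hot = categorical_crossentropy(one_hot, probs)
loss_sparse = sparse_categorical_crossentropy(class_ids, probs)
print(one_hot.shape, class_ids.shape)
print(loss_one_hot, loss_sparse)
```

Both losses compute the same quantity; they only differ in the label format they expect, so the loss has to match whatever the generator produces.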


What is the problem in this keras input shape?

from matplotlib import units
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import optimizers
from tensorflow import keras
x_test = np.array([[0,0,0,0,0],[1,1,1,1,1]])
x_data = np.array([[[0,0,0,0,0],[1,1,1,1,1]],
y_data = np.array([[0],[1]])
model = Sequential([
sgd = optimizers.SGD(learning_rate = 0.001)
model.compile(loss='mse', optimizer=sgd, metrics=['mse'])
# model fit
history = model.fit(x_data, y_data, batch_size=1, epochs=400, shuffle=False, verbose=1)
# prediction
print (model.predict(x_test))
I want to get a single output per sample, but when I run the program the output looks like [[[~~],[~~]]].
For example, when x is [[[1,1,1,1,1],[0,0,0,0,0]]] the output should be [0],
and when x is [[[0,0,0,0,0],[1,1,1,1,1]]] the output should be [1]
(this case is just an example).
What's the problem?
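The layer list of the model did not survive in the post, so the following is only a sketch under an assumption: that the model consists of Dense layers fed samples of shape (2, 5). A Dense layer acts on the last axis of its input, which would explain a nested output with one value per row instead of one value per sample. The NumPy code below imitates that behaviour with a hypothetical dense helper; it is not the asker's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """Affine map on the last axis, which is how a Dense layer treats its input."""
    return x @ w + b

# One sample made of two rows of five values, as in the question.
x = np.array([[[0, 0, 0, 0, 0], [1, 1, 1, 1, 1]]], dtype=np.float32)  # shape (1, 2, 5)

# Dense(1) applied directly: the weights have shape (5, 1), so every row gets its own output.
w_rows = rng.normal(size=(5, 1))
out_rows = dense(x, w_rows, np.zeros(1))
print(out_rows.shape)   # (1, 2, 1)

# Flatten first, then Dense(1): the weights have shape (10, 1), giving one output per sample.
x_flat = x.reshape(len(x), -1)            # shape (1, 10)
w_flat = rng.normal(size=(10, 1))
out_flat = dense(x_flat, w_flat, np.zeros(1))
print(out_flat.shape)   # (1, 1)
```

If that assumption holds, putting a Flatten layer in front of the first Dense layer (or reshaping x_data to (n, 10)) should give one prediction per sample. Also note that x_test in the question has shape (2, 5), which the model would read as two samples of five values rather than one sample of shape (2, 5).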

Transfer learning on MobileNetV3 reaches plateau and I can't move past it

I'm trying to do transfer learning on MobileNetV3-Small using Tensorflow 2.5.0 to predict dog breeds (133 classes) and since it got reasonable accuracy on the ImageNet dataset (1000 classes) I thought it should have no problem adapting to my problem.
I've tried a multitude of training variations and recently had a breakthrough but now my training stagnates at about 60% validation accuracy with minor fluctuations in validation loss (accuracy and loss curves for training and validation below).
I tried using ReduceLROnPlateau in the 3rd graph below, but it didn't help to improve matters. Can anyone suggest how I could improve the training?
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import GlobalMaxPooling2D, Dense, Dropout, BatchNormalization
from tensorflow.keras.applications import MobileNetV3Large, MobileNetV3Small
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True # needed for working with this dataset
# define generators
train_datagen = ImageDataGenerator(vertical_flip=True, horizontal_flip=True,
rescale=1.0/255, brightness_range=[0.5, 1.5],
zoom_range=[0.5, 1.5], rotation_range=90)
test_datagen = ImageDataGenerator(rescale=1.0/255)
train_gen = train_datagen.flow_from_directory(train_dir, target_size=(224,224),
batch_size=32, class_mode="categorical")
val_gen = test_datagen.flow_from_directory(val_dir, target_size=(224,224),
batch_size=32, class_mode="categorical")
test_gen = test_datagen.flow_from_directory(test_dir, target_size=(224,224),
batch_size=32, class_mode="categorical")
pretrained_model = MobileNetV3Small(input_shape=(224,224,3), classes=133,
weights="imagenet", pooling=None, include_top=False)
# set all layers trainable because when I froze most of the layers the model didn't learn so well
for layer in pretrained_model.layers:
    layer.trainable = True
last_output = pretrained_model.layers[-1].output
x = GlobalMaxPooling2D()(last_output)
x = BatchNormalization()(x)
x = Dense(512, activation='relu')(x)
x = Dense(133, activation='softmax')(x)
model = Model(pretrained_model.input, x)
model.compile(optimizer=Adam(learning_rate=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
# val_acc with min_delta 0.003; val_loss with min_delta 0.01
plateau = ReduceLROnPlateau(monitor="val_loss", mode="min", patience=5,
min_lr=1e-8, factor=0.3, min_delta=0.01,
checkpointer = ModelCheckpoint(filepath=savepath, verbose=1, save_best_only=True,
monitor="val_accuracy", mode="max",
Your code looks good, but it seems to have one issue - you might be rescaling the inputs twice. According to the docs for MobilenetV3:
The preprocessing logic has been included in the mobilenet_v3 model implementation. Users are no longer required (...) to normalize the input data.
Now, in your code, there is:
test_datagen = ImageDataGenerator(rescale=1.0/255)
which essentially makes the model's first layers rescale values that have already been rescaled.
The same applies to train_datagen.
You could try removing the rescale argument from both the train and test generators, or setting rescale=None.
This could also explain why the model did not learn well with the backbone frozen.
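To see why double rescaling hurts, here is a small NumPy sketch. It assumes the model's built-in preprocessing maps raw pixels from [0, 255] to [-1, 1] via x / 127.5 - 1; the exact constants may differ between TensorFlow versions, so treat builtin_preprocess as a hypothetical stand-in rather than the library's code.

```python
import numpy as np

def generator_rescale(pixels, rescale=1.0 / 255):
    """What ImageDataGenerator(rescale=1/255) does: multiply every pixel by the factor."""
    return pixels * rescale

def builtin_preprocess(pixels):
    """ASSUMED stand-in for the model's internal rescaling: [0, 255] -> [-1, 1]."""
    return pixels / 127.5 - 1.0

raw = np.array([0.0, 64.0, 128.0, 255.0])   # raw 8-bit pixel intensities

once = builtin_preprocess(raw)                       # raw pixels fed straight to the model
twice = builtin_preprocess(generator_rescale(raw))   # pixels already rescaled by the generator

print("rescaled once :", once)
print("rescaled twice:", twice)
print("spread once :", once.max() - once.min())
print("spread twice:", twice.max() - twice.min())
```

Under this assumption, the doubly rescaled pixels all land in a tiny interval just above -1, so the pretrained filters see almost no contrast between images.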

Keras: change learning rate

I'm trying to change the learning rate of my model after it has been trained with a different learning rate.
I read here, here, here and some other places I can't even find anymore.
I tried:
model.optimizer.learning_rate.set_value(0.1)
model.optimizer.lr = 0.1
model.optimizer.learning_rate = 0.1
K.set_value(model.optimizer.learning_rate, 0.1)
K.set_value(model.optimizer.lr, 0.1)
... but none of them worked!
I don't understand how there could be such confusion around such a simple thing. Am I missing something?
EDIT: Working example
Here is a working example of what I'd like to do:
from keras.models import Sequential
from keras.layers import Dense
import keras
import numpy as np
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse', optimizer=optimizer)
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=50)
# Change learning rate to 0.001 and train for 50 more epochs
model.fit(np.random.randn(50,10), np.random.randn(50), initial_epoch=50, epochs=50)
You can change the learning rate as follows:
from keras import backend as K
K.set_value(model.optimizer.learning_rate, 0.001)
Included into your complete example it looks as follows:
from keras.models import Sequential
from keras.layers import Dense
from keras import backend as K
import keras
import numpy as np
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse', optimizer=optimizer)
print("Learning rate before first fit:", model.optimizer.learning_rate.numpy())
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=50, verbose=0)
# Change learning rate to 0.001 and train for 50 more epochs
K.set_value(model.optimizer.learning_rate, 0.001)
print("Learning rate before second fit:", model.optimizer.learning_rate.numpy())
model.fit(np.random.randn(50,10), np.random.randn(50), initial_epoch=50, epochs=50, verbose=0)
I've just tested this with keras 2.3.1. Not sure why the approach didn't seem to work for you.
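One detail that can make it look as if nothing changed: the Keras documentation describes epochs, when combined with initial_epoch, as the index of the final epoch rather than the number of additional epochs. The sketch below uses a hypothetical helper, epochs_to_run, which is not a Keras function; it only mimics that counting rule in plain Python.

```python
def epochs_to_run(initial_epoch, epochs):
    """Epoch indices a fit call would execute under the 'final epoch' rule."""
    return list(range(initial_epoch, epochs))

first_run = epochs_to_run(initial_epoch=0, epochs=50)
resume_same = epochs_to_run(initial_epoch=50, epochs=50)    # the second call as written in the question
resume_more = epochs_to_run(initial_epoch=50, epochs=100)   # 50 further epochs

print(len(first_run), len(resume_same), len(resume_more))
```

So to actually train for 50 more epochs after changing the learning rate, the second call would need epochs=100 together with initial_epoch=50.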
There is another way: find the variable that holds the learning rate and assign it another value.
optimizer = tf.keras.optimizers.Adam(0.001)
optimizer.learning_rate.assign(0.01)
print(optimizer.learning_rate)
<tf.Variable 'learning_rate:0' shape=() dtype=float32, numpy=0.01>
You can change lr during training with
from keras.callbacks import LearningRateScheduler
import numpy as np
# This is a sample of a scheduler I used in the past
def lr_scheduler(epoch, lr):
    decay_rate = 0.85
    decay_step = 1
    if epoch % decay_step == 0 and epoch:
        return lr * pow(decay_rate, np.floor(epoch / decay_step))
    return lr
Apply scheduler to your model
callbacks = [LearningRateScheduler(lr_scheduler, verbose=1)]
model = build_model(pretrained_model=ka.InceptionV3, input_shape=(224, 224, 3))
history = model.fit(..., callbacks=callbacks, epochs=EPOCHS, verbose=1)
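It may help to simulate what this scheduler produces. LearningRateScheduler calls the schedule with the optimizer's current learning rate, so each epoch starts from the previous epoch's result. The sketch below re-implements the same rule in plain Python (using math.floor instead of np.floor) and feeds the output back in; simulate is a hypothetical helper written for illustration, and the starting rate of 0.01 is an arbitrary choice.

```python
import math

def lr_scheduler(epoch, lr, decay_rate=0.85, decay_step=1):
    """Same rule as the scheduler above, written with math.floor instead of np.floor."""
    if epoch % decay_step == 0 and epoch:
        return lr * pow(decay_rate, math.floor(epoch / decay_step))
    return lr

def simulate(schedule, initial_lr, n_epochs):
    """Feed each epoch's result back in, as LearningRateScheduler does."""
    lr, history = initial_lr, []
    for epoch in range(n_epochs):
        lr = schedule(epoch, lr)
        history.append(lr)
    return history

compounded = simulate(lr_scheduler, initial_lr=0.01, n_epochs=5)
plain_exponential = [0.01 * 0.85 ** epoch for epoch in range(5)]

for epoch, (a, b) in enumerate(zip(compounded, plain_exponential)):
    print(epoch, a, b)
```

Because the already-decayed rate is multiplied by 0.85 ** epoch again, the rate falls faster than a plain 0.85 ** epoch schedule. If a plain exponential decay is wanted, returning lr * decay_rate on each decay step would be one way to get it.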
You should define it when compiling the model:
optimizer = keras.optimizers.Adam(lr=0.01)
Looking at your comment, if you want to change the learning rate after training has started, you need to use a scheduler (the LearningRateScheduler callback).
Edit with your code and a scheduler:
from keras.models import Sequential
from keras.layers import Dense
import keras
import numpy as np
def lr_scheduler(epoch, lr):
    if epoch > 50:
        lr = 0.001
        return lr
    return lr
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse', optimizer=optimizer)
callbacks = [keras.callbacks.LearningRateScheduler(lr_scheduler, verbose=1)]
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=100, callbacks=callbacks)
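A small point about the threshold: Keras passes a 0-based epoch index to the schedule, so epoch > 50 first fires on the 52nd epoch. The plain-Python sketch below compares that rule with an epoch >= 50 variant; lr_scheduler_after_50 and first_switch are hypothetical names introduced only for this illustration.

```python
def lr_scheduler(epoch, lr):
    """The rule from the answer above: switch once the epoch index exceeds 50."""
    if epoch > 50:
        return 0.001
    return lr

def lr_scheduler_after_50(epoch, lr):
    """Variant that switches as soon as 50 epochs have completed."""
    if epoch >= 50:
        return 0.001
    return lr

def first_switch(schedule, initial_lr=0.01, n_epochs=100):
    """Return the 0-based epoch index at which the rate first changes."""
    lr = initial_lr
    for epoch in range(n_epochs):
        new_lr = schedule(epoch, lr)
        if new_lr != lr:
            return epoch
        lr = new_lr
    return None

print(first_switch(lr_scheduler), first_switch(lr_scheduler_after_50))
```

Which variant is right depends on whether the first 50 or the first 51 epochs should use the initial rate.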
Suppose that you use the Adam optimizer in Keras. You'd want to define your optimizer before you compile your model with it.
For example, you can define
myadam = keras.optimizers.Adam(learning_rate=0.1)
Then, you compile your model with this optimizer.
In case you want to change your optimizer (to a different type of optimizer or to a different learning rate), you can define a new optimizer and compile your existing model with the new optimizer.
Hope this helps!
Some time ago I had a project for which I needed something similar. My idea for changing the learning rate was to compile a new model with the new rate, then load the weights from the old model into the new one.
For your example:
from keras.models import Sequential
from keras.layers import Dense
import keras
import numpy as np
# Initial model
model = Sequential()
model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(loss='mse', optimizer=optimizer)
model.fit(np.random.randn(50,10), np.random.randn(50), epochs=50)
# Change learning rate to 0.001 and train for 50 more epochs
new_model = Sequential()
new_model.add(Dense(1, input_shape=(10,)))
optimizer = keras.optimizers.Adam(lr=0.001)
new_model.compile(loss='mse', optimizer=optimizer)
# Copy the trained weights from the old model into the new one
new_model.set_weights(model.get_weights())
model = new_model
model.fit(np.random.randn(50,10), np.random.randn(50), initial_epoch=50, epochs=50)
With this you could see a worse fit in the first epochs, because Adam uses statistics accumulated over previous steps to optimize, and you will lose them.
Hope it helps someone!

Value error while accessing 10,000 .png image files from a folder

I'm trying to train on a large dataset of about 10,000 images using the pre-trained VGG16 network. I wrote the code below, but it raises
ValueError: too many values to unpack (expected 2).
path= "C:/Users/52/.spyder-py3/IAM/train_patches/*.png"
(X_train,y_train),(X_test,y_test) = path # The error is occurring here
Initially, when I was using it in a simpler form, it worked, but now that I'm using the datagen function it no longer works. Please help me correct this code.
from keras.models import model_from_json
from keras.applications import VGG16
import numpy as np
import glob
import os
import keras
from keras.utils import to_categorical
from keras import backend as K
from PIL import Image
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from sklearn.model_selection import train_test_split
from keras import optimizers
import matplotlib.pyplot as plt
from keras.preprocessing.image import ImageDataGenerator
path= "C:/Users/52/.spyder-py3/IAM/train_patches/*.png"
(X_train,y_train) == path # The error is occurring here
sample_image = X_train[1,:,:,:]
plt.imshow(sample_image), plt.axis('off')
classes = 651
Y_train = to_categorical(y_train,classes)
X_train = X_train.astype('float32')
X_train = X_train/255
img_rows, img_cols = 500,500
#Include_top=False, Does not load the last two fully connected layers which act as the classifier.
#We are just loading the convolutional layers.
vgg_conv = VGG16(weights='imagenet',include_top=False,input_shape=(img_rows,img_cols,3))
# freeze the layer except the last 4 layers
for layer in vgg_conv.layers[:-4]:
    layer.trainable = False
model = Sequential()
# Add the vgg convolutional base model
model.add(vgg_conv)
# Add new layers
model.add(Dense(1024, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
# Show a summary of the model. Check the number of trainable parameters
model.summary()
datagen = ImageDataGenerator(rotation_range=40,
print("Size is: ",X_train.shape[0])
history = model.fit_generator(datagen.flow(X_train,Y_train,batch_size=128),
epochs = 2,
acc = history.history['acc']
loss = history.history['loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'b', label='Training acc')
plt.title('Training accuracy')
model_json = model.to_json()
The error means that the left-hand side of the = expects exactly two items, but the right-hand side yields more than two. Here path is just a string, so Python tries to unpack its characters; no images or labels are ever loaded.
I think it's because you're trying to read the data and labels and make the training/testing split in a single step.
Try to read all the images and labels into only two variables and then make the split.
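Here is a minimal standard-library sketch of both points: why unpacking the path string fails, and how the file list could be collected first and split afterwards. The names label_from_name and split_paths are hypothetical helpers, and the labelling rule (text before the first underscore in the file name) is purely an assumption for the demo, since the post does not say how labels are stored. Loading the pixel data with PIL is left out.

```python
import glob
import os
import random
import tempfile

# Reproduce the error: a string is unpacked character by character.
pattern = "train_patches/*.png"
try:
    (X_train, y_train), (X_test, y_test) = pattern
except ValueError as err:
    message = str(err)
print(message)

def label_from_name(path):
    """HYPOTHETICAL labelling rule: the text before the first '_' in the file name."""
    return os.path.basename(path).split("_")[0]

def split_paths(paths, test_fraction=0.2, seed=0):
    """Shuffle the paths reproducibly and cut them into train and test lists."""
    paths = sorted(paths)
    random.Random(seed).shuffle(paths)
    n_test = int(len(paths) * test_fraction)
    return paths[n_test:], paths[:n_test]

# Demo on empty placeholder files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    for i in range(10):
        open(os.path.join(folder, f"writer{i % 2}_{i}.png"), "w").close()
    all_paths = glob.glob(os.path.join(folder, "*.png"))
    train_paths, test_paths = split_paths(all_paths)
    train_labels = [label_from_name(p) for p in train_paths]
    test_labels = [label_from_name(p) for p in test_paths]

print(len(train_paths), len(test_paths))
```

With real data, each path would then be opened and converted to an array before being passed to datagen.flow. Since sklearn's train_test_split is already imported in the question, it could replace split_paths.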

How to train a CNN model on 2 classes of 100 samples each and then test it on 200 new samples?

I've got 2 classes in my training set: bird (100 samples) and no_bird (100 samples). The test set is unlabelled and consists of 200 test samples (a mix of birds and no_birds). For every sample in the test set I intend to classify it as bird or no_bird using a CNN with Keras.
import numpy as np
import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import Activation
from keras.layers.core import Dense, Flatten
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import *
from sklearn.metrics import confusion_matrix
import itertools
import matplotlib.pyplot as plt
train_path = 'dataset/train_set'
test_path = 'dataset/test_set'
train_batches = ImageDataGenerator().flow_from_directory(train_path, target_size=(224,224), classes=['bird', 'no_bird'], batch_size=10) # bird directory consisting of 100
test_batches = ImageDataGenerator().flow_from_directory(test_path, target_size=(224,224), classes=['unknown'], batch_size=10)
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(224,224,3)),
    Flatten(),
    Dense(2, activation='softmax'),
])
model.compile(Adam(lr=.0001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(train_batches, steps_per_epoch=20, validation_data=test_batches, validation_steps=20, epochs=10, verbose=2)
Error I'm getting at the last step is this:
ValueError: Error when checking target: expected dense_1 to have shape (2,) but got array with shape (1,)
Now, I know it could be probably because of test_set having only 1 directory, since it's unlabelled. Correct me if I'm wrong. What should I do to make this work?
It seems your test set is unlabelled. Remove the validation arguments from model.fit_generator. It should be:
model.fit_generator(train_batches, steps_per_epoch=20, epochs=10, verbose=2)
You can't validate without labels.
The line test_batches = ImageDataGenerator().flow_from_directory(test_path, target_size=(224,224), classes=['unknown'], batch_size=10) is wrong.
You should still use test_batches = ImageDataGenerator().flow_from_directory(test_path, target_size=(224,224), classes=['bird', 'no_bird'], batch_size=10). That way you can score your predictions.
Follow-up information:
When you look at the documentation for model.fit, it says:
validation_data: tuple (x_val, y_val) or tuple (x_val, y_val, val_sample_weights) on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. This will override validation_split.
Your test data must have the same shape as your training data. You'll have to organize the test data directory so that it is structured the same way as the training data.
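As a rough illustration of that directory layout, the sketch below creates matching class subfolders for both splits. make_class_dirs is a hypothetical helper, and a temporary directory stands in for the real dataset folder so the code can run anywhere. Sorting the 200 test images into bird and no_bird still has to be done by hand, since that is exactly the labelling step the validation needs.

```python
import tempfile
from pathlib import Path

def make_class_dirs(root, split, class_names):
    """Create root/split/<class_name>/ for every class and return the created paths."""
    created = []
    for name in class_names:
        folder = Path(root) / split / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created

class_names = ["bird", "no_bird"]

# A temporary directory stands in for the real 'dataset' folder.
with tempfile.TemporaryDirectory() as root:
    make_class_dirs(root, "train_set", class_names)
    make_class_dirs(root, "test_set", class_names)
    train_layout = sorted(p.name for p in (Path(root) / "train_set").iterdir())
    test_layout = sorted(p.name for p in (Path(root) / "test_set").iterdir())

print(train_layout, test_layout)
```

Once both splits have the same subfolders, the same classes=['bird', 'no_bird'] argument works for train_batches and test_batches, and the label arrays have the same width.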
