My model trains fine on a CPU machine, but I am running into an issue when trying to rerun it on our cluster (using a single GPU and the same dataset). When training on a GPU machine, validation loss and accuracy do not improve from epoch to epoch (see below). This was not the case on a CPU machine, where I was able to reach a validation accuracy of ~0.8 after 20 epochs.
Details:
Keras 2.1.3
TensorFlow backend
70/20/10 train/dev/test
~ 7000 images
model is based on ResNet50
Code
import sys
import math
import os
import glob
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Flatten, Dense
from keras import backend as k
from keras.callbacks import ModelCheckpoint, CSVLogger, EarlyStopping
############ Training parameters ##################
img_width, img_height = 224, 224
batch_size = 32
epochs = 100
############ Define the data ##################
train_data_dir = '/mnt/data/train'
validation_data_dir = '/mnt/data/validate'
train_data_dir_class1 = os.path.join(train_data_dir,'class1', '*.jpg')
train_data_dir_class2 = os.path.join(train_data_dir, 'class2', '*.jpg')
validation_data_dir_class1 = os.path.join(validation_data_dir, 'class1', '*.jpg')
validation_data_dir_class2 = os.path.join(validation_data_dir, 'class2', '*.jpg')
# number of training and validation samples
nb_train_samples = len(glob.glob(train_data_dir_class1)) + len(glob.glob(train_data_dir_class2))
nb_validation_samples = len(glob.glob(validation_data_dir_class1)) + len(glob.glob(validation_data_dir_class2))
############ Define the model ##################
model = applications.resnet50.ResNet50(weights = "imagenet",
                                       include_top = False,
                                       input_shape = (img_width, img_height, 3))
for layer in model.layers:
    layer.trainable = False
# Adding a FC layer
x = model.output
x = Flatten()(x)
predictions = Dense(1, activation = "sigmoid")(x)
# creating the final model
model_final = Model(inputs = model.input, outputs = predictions)
# compile the model
model_final.compile(loss = "binary_crossentropy",
optimizer = optimizers.Adam(lr = 0.001,
beta_1 = 0.9,
beta_2 = 0.999,
epsilon = 1e-10),
metrics = ["accuracy"])
# train and test generators
train_datagen = ImageDataGenerator(rescale = 1./255,
horizontal_flip = True,
fill_mode = "nearest",
zoom_range = 0.3,
width_shift_range = 0.3,
height_shift_range = 0.3,
rotation_range = 30)
test_datagen = ImageDataGenerator(rescale = 1./255)
train_generator = train_datagen.flow_from_directory(train_data_dir,
target_size = (img_height, img_width),
batch_size = batch_size,
class_mode = "binary",
seed = 2018)
validation_generator = test_datagen.flow_from_directory(validation_data_dir,
target_size = (img_height, img_width),
class_mode = "binary",
seed = 2018)
early = EarlyStopping(monitor = 'val_loss', min_delta = 10e-5, patience = 10, verbose = 1, mode = 'auto')
performance_log = CSVLogger('/mnt/results/vanilla_model_log.csv', separator = ',', append = False)
# Train the model
model_final.fit_generator(generator = train_generator,
steps_per_epoch = math.ceil(train_generator.samples / batch_size),
epochs = epochs,
validation_data = validation_generator,
validation_steps = math.ceil(validation_generator.samples / batch_size),
callbacks = [early, performance_log])
# Save the model
model_final.save('/mnt/results/vanilla_model.h5')
Training Log
Epoch 1/100
151/151 [==============================] - 237s 2s/step - loss: 0.7234 - acc: 0.5240 - val_loss: 0.9899 - val_acc: 0.5425
Epoch 2/100
151/151 [==============================] - 65s 428ms/step - loss: 0.6491 - acc: 0.6228 - val_loss: 1.0248 - val_acc: 0.5425
Epoch 3/100
151/151 [==============================] - 65s 429ms/step - loss: 0.6091 - acc: 0.6648 - val_loss: 1.0377 - val_acc: 0.5425
Epoch 4/100
151/151 [==============================] - 64s 426ms/step - loss: 0.5829 - acc: 0.6968 - val_loss: 1.0459 - val_acc: 0.5425
Epoch 5/100
151/151 [==============================] - 64s 427ms/step - loss: 0.5722 - acc: 0.7070 - val_loss: 1.0472 - val_acc: 0.5425
Epoch 6/100
151/151 [==============================] - 64s 427ms/step - loss: 0.5582 - acc: 0.7166 - val_loss: 1.0501 - val_acc: 0.5425
Epoch 7/100
151/151 [==============================] - 64s 424ms/step - loss: 0.5535 - acc: 0.7188 - val_loss: 1.0492 - val_acc: 0.5425
Epoch 8/100
151/151 [==============================] - 64s 426ms/step - loss: 0.5377 - acc: 0.7287 - val_loss: 1.0209 - val_acc: 0.5425
Epoch 9/100
151/151 [==============================] - 64s 425ms/step - loss: 0.5328 - acc: 0.7368 - val_loss: 1.0062 - val_acc: 0.5425
Epoch 10/100
151/151 [==============================] - 65s 432ms/step - loss: 0.5296 - acc: 0.7381 - val_loss: 1.0016 - val_acc: 0.5425
Epoch 11/100
151/151 [==============================] - 65s 430ms/step - loss: 0.5231 - acc: 0.7419 - val_loss: 1.0021 - val_acc: 0.5425
Since I was able to get good results on a CPU machine, I hypothesized that validation loss/accuracy must be calculated incorrectly at the end of each epoch. To test this theory, I used the training set as the validation set: if validation loss/accuracy were calculated correctly, we should see roughly the same values for training and validation loss and accuracy. As you can see below, the validation loss values are not the same as the training loss values, which makes me believe validation loss is calculated incorrectly at the end of each epoch. Why does this happen? What are the possible solutions?
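For reference, the same comparison can also be made directly with evaluate_generator after fitting; below is a minimal sketch (not part of the original run) that reuses model_final, the generators and batch_size defined in the code above. Note that train_generator still applies augmentation, so its numbers will not match a clean pass exactly.
# Sketch: re-evaluate both generators with the final weights and compare
# against the last epoch-end line of the training log.
train_scores = model_final.evaluate_generator(
    train_generator,
    steps = math.ceil(train_generator.samples / batch_size))
val_scores = model_final.evaluate_generator(
    validation_generator,
    steps = math.ceil(validation_generator.samples / batch_size))
print("train loss/acc:", train_scores)
print("val   loss/acc:", val_scores)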
Modified Code
import sys
import math
import os
import glob
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Flatten, Dense
from keras import backend as k
from keras.callbacks import ModelCheckpoint, CSVLogger, EarlyStopping
############ Training parameters ##################
img_width, img_height = 224, 224
batch_size = 32
epochs = 100
############ Define the data ##################
train_data_dir = '/mnt/data/train'
validation_data_dir = '/mnt/data/train' # redefined validation set to test accuracy of validation loss/accuracy calculations
train_data_dir_class1 = os.path.join(train_data_dir,'class1', '*.jpg')
train_data_dir_class2 = os.path.join(train_data_dir, 'class2', '*.jpg')
validation_data_dir_class1 = os.path.join(validation_data_dir, 'class1', '*.jpg')
validation_data_dir_class2 = os.path.join(validation_data_dir, 'class2', '*.jpg')
# number of training and validation samples
nb_train_samples = len(glob.glob(train_data_dir_class1)) + len(glob.glob(train_data_dir_class2))
nb_validation_samples = len(glob.glob(validation_data_dir_class1)) + len(glob.glob(validation_data_dir_class2))
############ Define the model ##################
model = applications.resnet50.ResNet50(weights = "imagenet",
                                       include_top = False,
                                       input_shape = (img_width, img_height, 3))
for layer in model.layers:
    layer.trainable = False
# Adding a FC layer
x = model.output
x = Flatten()(x)
predictions = Dense(1, activation = "sigmoid")(x)
# creating the final model
model_final = Model(inputs = model.input, outputs = predictions)
# compile the model
model_final.compile(loss = "binary_crossentropy",
optimizer = optimizers.Adam(lr = 0.001,
beta_1 = 0.9,
beta_2 = 0.999,
epsilon = 1e-10),
metrics = ["accuracy"])
# train and test generators
train_datagen = ImageDataGenerator(rescale = 1./255,
horizontal_flip = True,
fill_mode = "nearest",
zoom_range = 0.3,
width_shift_range = 0.3,
height_shift_range = 0.3,
rotation_range = 30)
test_datagen = ImageDataGenerator(rescale = 1./255)
train_generator = train_datagen.flow_from_directory(train_data_dir,
target_size = (img_height, img_width),
batch_size = batch_size,
class_mode = "binary",
seed = 2018)
validation_generator = test_datagen.flow_from_directory(validation_data_dir,
target_size = (img_height, img_width),
class_mode = "binary",
seed = 2018)
early = EarlyStopping(monitor = 'val_loss', min_delta = 10e-5, patience = 10, verbose = 1, mode = 'auto')
performance_log = CSVLogger('/mnt/results/vanilla_model_log.csv', separator = ',', append = False)
# Train the model
model_final.fit_generator(generator = train_generator,
steps_per_epoch = math.ceil(train_generator.samples / batch_size),
epochs = epochs,
validation_data = validation_generator,
validation_steps = math.ceil(validation_generator.samples / batch_size),
callbacks = [early, performance_log])
# Save the model
model_final.save('/mnt/results/vanilla_model.h5')
Training log for the modified code:
Epoch 1/100
151/151 [==============================] - 251s 2s/step - loss: 0.6804 - acc: 0.5910 - val_loss: 0.6923 - val_acc: 0.5469
Epoch 2/100
151/151 [==============================] - 87s 578ms/step - loss: 0.6258 - acc: 0.6523 - val_loss: 0.6938 - val_acc: 0.5469
Epoch 3/100
151/151 [==============================] - 88s 580ms/step - loss: 0.5946 - acc: 0.6874 - val_loss: 0.7001 - val_acc: 0.5469
Epoch 4/100
151/151 [==============================] - 88s 580ms/step - loss: 0.5718 - acc: 0.7086 - val_loss: 0.7036 - val_acc: 0.5469
Epoch 5/100
151/151 [==============================] - 87s 578ms/step - loss: 0.5634 - acc: 0.7157 - val_loss: 0.7067 - val_acc: 0.5469
Epoch 6/100
151/151 [==============================] - 87s 578ms/step - loss: 0.5467 - acc: 0.7243 - val_loss: 0.7099 - val_acc: 0.5469
Epoch 7/100
151/151 [==============================] - 87s 578ms/step - loss: 0.5392 - acc: 0.7317 - val_loss: 0.7096 - val_acc: 0.5469
Epoch 8/100
151/151 [==============================] - 87s 578ms/step - loss: 0.5287 - acc: 0.7387 - val_loss: 0.7083 - val_acc: 0.5469
Epoch 9/100
151/151 [==============================] - 87s 575ms/step - loss: 0.5306 - acc: 0.7385 - val_loss: 0.7088 - val_acc: 0.5469
Epoch 10/100
151/151 [==============================] - 87s 577ms/step - loss: 0.5303 - acc: 0.7318 - val_loss: 0.7111 - val_acc: 0.5469
Epoch 11/100
151/151 [==============================] - 87s 578ms/step - loss: 0.5157 - acc: 0.7474 - val_loss: 0.7143 - val_acc: 0.5469
A very quick idea that might help.
I think the image labels are assigned independently by the two image data generators, and the two generators can end up with different label mappings. That's why training accuracy goes up while validation accuracy stays around 50%. I haven't checked the ImageDataGenerator documentation in full detail, but I hope this helps.
The classes argument of flow_from_directory() describes how the training labels are set up (a small check is sketched after the quoted documentation below):
classes: optional list of class subdirectories (e.g. ['dogs',
'cats']). Default: None. If not provided, the list of classes will be
automatically inferred from the subdirectory names/structure under
directory, where each subdirectory will be treated as a different
class (and the order of the classes, which will map to the label
indices, will be alphanumeric). The dictionary containing the mapping
from class names to class indices can be obtained via the attribute
class_indices.
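A quick way to check whether the two generators in the question agree on that mapping is to print class_indices on each, or to pin the order explicitly; a minimal sketch reusing train_datagen, test_datagen and the directory variables from the question:
# Sketch: both iterators should report the identical name-to-index mapping.
print(train_generator.class_indices)       # e.g. {'class1': 0, 'class2': 1}
print(validation_generator.class_indices)  # must match the line above

# Optionally pin the order instead of relying on alphanumeric inference:
train_generator = train_datagen.flow_from_directory(train_data_dir,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = "binary",
    classes = ['class1', 'class2'],
    seed = 2018)
validation_generator = test_datagen.flow_from_directory(validation_data_dir,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = "binary",
    classes = ['class1', 'class2'],
    seed = 2018)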
Related
I have made a model that tries to predict the chances of every piano key playing in a time step, given all the time steps before it. I tried making a GRU network with 88 outputs (one for every piano key).
input shape = (600,88,)
desired output/ label shape = (88, )
import numpy as np
import midi_processer
from keras import models
from keras import layers
x_train, x_test = np.load("samples.npy", mmap_mode='r'), np.load("test_samples.npy", mmap_mode='r')
y_train, y_test = np.load("labels.npy", mmap_mode='r'), np.load("test_labels.npy", mmap_mode='r')
def build_model():
    model = models.Sequential()
    model.add(layers.Input(shape=(600, 88)))
    model.add(layers.GRU(512, activation='tanh', recurrent_activation='hard_sigmoid'))
    model.add(layers.RepeatVector(600))
    model.add(layers.GRU(512, activation='tanh', recurrent_activation='hard_sigmoid'))
    model.add(layers.Dense(88, activation='sigmoid'))
    return model
x_partial, x_val = x_train[:13000], x_train[13000:]
y_partial, y_val = y_train[:13000], y_train[13000:]
model = build_model()
model.compile(optimizer = 'adam',
loss = 'binary_crossentropy',
metrics = ['accuracy'])
history = model.fit(x_partial, y_partial, batch_size = 50, epochs = 15, validation_data = (x_val, y_val))
Instead of learning normally, my algorithm stayed at a constant accuracy throughout all of the epochs:
Epoch 1/15
260/260 [==============================] - 998s 4s/step - loss: -0.1851 - accuracy: 0.0298 - val_loss: -8.8735 - val_accuracy: 0.0310
Epoch 2/15
260/260 [==============================] - 827s 3s/step - loss: -33.6520 - accuracy: 0.0382 - val_loss: -56.0122 - val_accuracy: 0.0310
Epoch 3/15
260/260 [==============================] - 844s 3s/step - loss: -78.6130 - accuracy: 0.0382 - val_loss: -98.2798 - val_accuracy: 0.0310
Epoch 4/15
260/260 [==============================] - 906s 3s/step - loss: -121.0963 - accuracy: 0.0382 - val_loss: -139.3440 - val_accuracy: 0.0310
Epoch 5/15
I am running tf.keras.callbacks.ModelCheckpoint with the accuracy metric, but loss is used to save the best checkpoints. I have tested this in different places (my computer and Colab) and with two different pieces of code, and faced the same issue. Here is an example code and the results:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import os
import shutil
def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name="digits")
    x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
    x = layers.Dense(64, activation="relu", name="dense_2")(x)
    outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(
        optimizer="rmsprop",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
ckpt_folder = os.path.join(os.getcwd(), 'ckpt')
if os.path.exists(ckpt_folder):
    shutil.rmtree(ckpt_folder)
ckpt_path = os.path.join(r'D:\deep_learning\tf_keras\semantic_segmentation\logs', 'mymodel_{epoch}')
callbacks = [
tf.keras.callbacks.ModelCheckpoint(
# Path where to save the model
# The two parameters below mean that we will overwrite
# the current checkpoint if and only if
# the `val_loss` score has improved.
# The saved model name will include the current epoch.
filepath=ckpt_path,
montior="val_accuracy",
# save the model weights with best validation accuracy
mode='max',
save_best_only=True, # only save the best weights
save_weights_only=False,
# only save model weights (not whole model)
verbose=1
)
]
model = get_compiled_model()
model.fit(
x_train, y_train, epochs=3, batch_size=1, callbacks=callbacks, validation_split=0.2, steps_per_epoch=1
)
1/1 [==============================] - ETA: 0s - loss: 2.6475 - accuracy: 0.0000e+00
Epoch 1: val_loss improved from -inf to 2.32311, saving model to D:\deep_learning\tf_keras\semantic_segmentation\logs\mymodel_1
1/1 [==============================] - 6s 6s/step - loss: 2.6475 - accuracy: 0.0000e+00 - val_loss: 2.3231 - val_accuracy: 0.1142
Epoch 2/3
1/1 [==============================] - ETA: 0s - loss: 1.9612 - accuracy: 1.0000
Epoch 2: val_loss improved from 2.32311 to 2.34286, saving model to D:\deep_learning\tf_keras\semantic_segmentation\logs\mymodel_2
1/1 [==============================] - 5s 5s/step - loss: 1.9612 - accuracy: 1.0000 - val_loss: 2.3429 - val_accuracy: 0.1187
Epoch 3/3
1/1 [==============================] - ETA: 0s - loss: 2.8378 - accuracy: 0.0000e+00
Epoch 3: val_loss did not improve from 2.34286
1/1 [==============================] - 5s 5s/step - loss: 2.8378 - accuracy: 0.0000e+00 - val_loss: 2.2943 - val_accuracy: 0.1346
In your code, you wrote montior instead of monitor. Since the function has no parameter with that name, the default value (val_loss) is used instead. If you write it as below, you get what you want:
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        filepath=ckpt_path,
        monitor="val_accuracy",
        mode='max',
        save_best_only=True,
        save_weights_only=False,
        verbose=1
    )
]
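As an extra sanity check (a sketch, not part of the original answer), the attribute that the callback will actually watch can be printed right after construction; with the montior typo it would still show the default val_loss:
ckpt_cb = tf.keras.callbacks.ModelCheckpoint(filepath=ckpt_path,
                                             monitor="val_accuracy",
                                             mode='max',
                                             save_best_only=True,
                                             verbose=1)
# Prints 'val_accuracy' here; with the misspelled keyword it prints 'val_loss'.
print(ckpt_cb.monitor)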
I tried to basically copy this tutorial: https://keras.io/examples/vision/image_classification_from_scratch/
But I can't seem to improve my val_accuracy score. I have two kinds of images, dogs (Hunde) and cats (Katzen), but only 95 samples of each, stored in class subfolders under an "upper" folder "Hunde und Katzen". I probably have to tune some parameters because my sample size is so low, but I have already tried adjusting some parts of the code.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import os
num_skipped = 0
for folder_name in ("Hund", "Katze"):
    folder_path = os.path.join("Hund und Katze", folder_name)
    for fname in os.listdir(folder_path):
        fpath = os.path.join(folder_path, fname)
        try:
            fobj = open(fpath, "rb")
            is_jfif = tf.compat.as_bytes("JFIF") in fobj.peek(10)
        finally:
            fobj.close()
        if not is_jfif:
            num_skipped += 1
            # Delete corrupted image
            os.remove(fpath)
print("Deleted %d images" % num_skipped)
image_size = (180, 180)
batch_size = 16
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
"Hund und Katze",
validation_split=0.5,
subset="training",
seed=9,
image_size=image_size,
batch_size=batch_size,
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
"Hund und Katze",
validation_split=0.5,
subset="validation",
seed=9,
image_size=image_size,
batch_size=batch_size,
)
#Found 190 files belonging to 2 classes.
#Using 95 files for training.
#Found 190 files belonging to 2 classes.
#Using 95 files for validation.
data_augmentation = keras.Sequential(
[
layers.RandomFlip("horizontal"),
layers.RandomRotation(0.1),
]
)
train_ds = train_ds.prefetch(buffer_size=8)
val_ds = val_ds.prefetch(buffer_size=8)
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = data_augmentation(inputs)
    # Entry block
    x = layers.Rescaling(1.0 / 255)(x)
    x = layers.Conv2D(16, 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(32, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    previous_block_activation = x  # Set aside residual
    for size in [128, 256, 512, 728]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
        # Project residual
        residual = layers.Conv2D(size, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual
    x = layers.SeparableConv2D(1024, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    if num_classes == 2:
        activation = "sigmoid"
        units = 1
    else:
        activation = "softmax"
        units = num_classes
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(units, activation=activation)(x)
    return keras.Model(inputs, outputs)
model = make_model(input_shape=image_size + (3,), num_classes=2)
keras.utils.plot_model(model, show_shapes=True)
#('You must install pydot (`pip install pydot`) and install graphviz (see instructions at
#https://graphviz.gitlab.io/download/) ', 'for plot_model/model_to_dot to work.')
epochs = 10
callbacks = [
keras.callbacks.ModelCheckpoint("save_at_{epoch}.h5"),
]
model.compile(
optimizer=keras.optimizers.Adam(0.001),
loss="binary_crossentropy",
metrics=["accuracy"],
)
model.fit(
train_ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds,
)
Output: Epoch 1/10
6/6 [==============================] - 8s 1s/step - loss: 0.7691 - accuracy: 0.6421 - val_loss: 0.6935 - val_accuracy: 0.4632
E:\anacondaBI\lib\site-packages\keras\engine\functional.py:1410: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
layer_config = serialize_layer_fn(layer)
Epoch 2/10
6/6 [==============================] - 6s 995ms/step - loss: 0.7747 - accuracy: 0.6526 - val_loss: 0.6917 - val_accuracy: 0.5368
Epoch 3/10
6/6 [==============================] - 6s 1s/step - loss: 0.6991 - accuracy: 0.7053 - val_loss: 0.6905 - val_accuracy: 0.5368
Epoch 4/10
6/6 [==============================] - 6s 1s/step - loss: 0.5411 - accuracy: 0.7368 - val_loss: 0.6935 - val_accuracy: 0.5368
Epoch 5/10
6/6 [==============================] - 6s 1s/step - loss: 0.3949 - accuracy: 0.8316 - val_loss: 0.7023 - val_accuracy: 0.5368
Epoch 6/10
6/6 [==============================] - 6s 1s/step - loss: 0.4440 - accuracy: 0.8526 - val_loss: 0.7199 - val_accuracy: 0.5368
Epoch 7/10
6/6 [==============================] - 6s 1s/step - loss: 0.3515 - accuracy: 0.8842 - val_loss: 0.7470 - val_accuracy: 0.5368
Epoch 8/10
6/6 [==============================] - 6s 1s/step - loss: 0.3249 - accuracy: 0.8526 - val_loss: 0.7955 - val_accuracy: 0.5368
Epoch 9/10
6/6 [==============================] - 6s 994ms/step - loss: 0.3953 - accuracy: 0.8421 - val_loss: 0.8570 - val_accuracy: 0.5368
Epoch 10/10
6/6 [==============================] - 6s 989ms/step - loss: 0.4363 - accuracy: 0.7789 - val_loss: 0.9189 - val_accuracy: 0.5368
<keras.callbacks.History at 0x2176ec764c0>
95 samples per class is too few to achieve decent accuracy.
Decrease your validation_split to 0.05 (5% for validation), since you have very few data points.
If the first step does not help, use transfer learning, i.e. architectures that have good accuracy on ImageNet, e.g. MobileNets, ResNets and EfficientNets (a sketch follows below).
If the above two steps still do not give you good accuracy, try increasing your data size and tuning your hyperparameters.
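A minimal transfer-learning sketch in the spirit of the transfer-learning suggestion above, assuming the train_ds, val_ds and image_size objects from the question and picking MobileNetV2 as one of the suggested backbones (all hyperparameters here are only illustrative):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Frozen ImageNet backbone plus a small binary classification head.
base = keras.applications.MobileNetV2(input_shape=image_size + (3,),
                                      include_top=False,
                                      weights="imagenet")
base.trainable = False

inputs = keras.Input(shape=image_size + (3,))
x = keras.applications.mobilenet_v2.preprocess_input(inputs)  # scales to [-1, 1]
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)

model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)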
I'm a beginner in deep learning and I'm trying to train a model to classify different ASL hand signs using Mobilenet_v2 and Inception.
Here is the code that creates an ImageDataGenerator for the training and validation sets.
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers

# Reformat Images and Create Batches
IMAGE_RES = 224
BATCH_SIZE = 32
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rescale=1./255,
validation_split = 0.4
)
train_generator = datagen.flow_from_directory(
base_dir,
target_size = (IMAGE_RES,IMAGE_RES),
batch_size = BATCH_SIZE,
subset = 'training'
)
val_generator = datagen.flow_from_directory(
base_dir,
target_size= (IMAGE_RES, IMAGE_RES),
batch_size = BATCH_SIZE,
subset = 'validation'
)
Here is the code to train the model:
# Do transfer learning with Tensorflow Hub
URL = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
feature_extractor = hub.KerasLayer(URL,
input_shape=(IMAGE_RES, IMAGE_RES, 3))
# Freeze pre-trained model
feature_extractor.trainable = False
# Attach a classification head
model = tf.keras.Sequential([
feature_extractor,
layers.Dense(5, activation='softmax')
])
model.summary()
# Train the model
model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
EPOCHS = 5
history = model.fit(train_generator,
steps_per_epoch=len(train_generator),
epochs=EPOCHS,
validation_data = val_generator,
validation_steps=len(val_generator)
)
Epoch 1/5
94/94 [==============================] - 19s 199ms/step - loss: 0.7333 - accuracy: 0.7730 - val_loss: 0.6276 - val_accuracy: 0.7705
Epoch 2/5
94/94 [==============================] - 18s 190ms/step - loss: 0.1574 - accuracy: 0.9893 - val_loss: 0.5118 - val_accuracy: 0.8145
Epoch 3/5
94/94 [==============================] - 18s 191ms/step - loss: 0.0783 - accuracy: 0.9980 - val_loss: 0.4850 - val_accuracy: 0.8235
Epoch 4/5
94/94 [==============================] - 18s 196ms/step - loss: 0.0492 - accuracy: 0.9997 - val_loss: 0.4541 - val_accuracy: 0.8395
Epoch 5/5
94/94 [==============================] - 18s 193ms/step - loss: 0.0349 - accuracy: 0.9997 - val_loss: 0.4590 - val_accuracy: 0.8365
I've tried using data augmentation, but the model still overfits, so I'm wondering if I've done something wrong in my code.
Your data is very small. Try splitting with different random seeds and check whether the problem still persists.
If it does, then use regularization and decrease the complexity of the neural network.
Also experiment with different optimizers and a smaller learning rate (try a learning-rate scheduler; a sketch follows below).
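For the learning-rate part, a minimal sketch with a smaller starting rate and a plateau-based scheduler, reusing the model, generators and EPOCHS from the question (the exact values are only illustrative):
from tensorflow import keras

# Lower starting learning rate, halved whenever val_loss stops improving.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                              factor=0.5,
                                              patience=2,
                                              min_lr=1e-6,
                                              verbose=1)

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_generator,
                    steps_per_epoch=len(train_generator),
                    epochs=EPOCHS,
                    validation_data=val_generator,
                    validation_steps=len(val_generator),
                    callbacks=[reduce_lr])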
It seems like your dataset is very small, with some true outputs separated by only a small distance in input space on the input-output curve. That is why the model fits easily to those points.
I am working on an image classification problem. My training and testing split uses the same random_state, and the model definition is the same. However, when I run the model multiple times, three out of four times the model does not learn: the loss does not go down. One out of four times the model learns and I get good classification results. I suspect the randomness comes from ImageDataGenerator(), but I cannot figure out how to make the model learn every time.
I have a relatively small labeled dataset and no way to increase its size.
I tried different optimizers and different batch sizes; it doesn't help. I found that when I reduce the number of trainable layers and make the later fully-connected layers smaller (down to 256 units), the model starts to learn every time. But why does the big network not learn well even on the training set? My understanding is that it should overfit, so why is it not learning at all?
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.layers import Input, Flatten, Dense
from keras.applications import VGG19

IMAGE_WIDTH=128
IMAGE_HEIGHT=128
IMAGE_SIZE=(IMAGE_WIDTH, IMAGE_HEIGHT)
IMAGE_CHANNELS=3 # RGB color
os.chdir(r"XXX")
filenames = os.listdir(r"XXX")
ref_db= pd.read_csv(r"XXX")
ref_db['obj_id']= [str(i)+ '.tif' for i in ref_db.OBJECTID.values ]
ref_db2= ref_db[['label', 'obj_id' ]]
ref_db2['label'] = ref_db2['label'].apply(str)
train_df, validate_df = train_test_split(ref_db2, test_size=0.20, random_state=42)
train_df = train_df.reset_index(drop=True)
validate_df = validate_df.reset_index(drop=True)
total_train = train_df.shape[0]
total_validate = validate_df.shape[0]
batch_size=64
train_datagen = ImageDataGenerator(
rotation_range=15,
rescale=1./255,
shear_range=0.1,
zoom_range=0.2,
horizontal_flip=True,
width_shift_range=0.1,
height_shift_range=0.1
)
train_generator = train_datagen.flow_from_dataframe(
train_df,
r"XXX",
x_col='obj_id',
y_col='label',
target_size=IMAGE_SIZE,
class_mode='binary',
batch_size=batch_size
)
inputs= Input(shape=(IMAGE_WIDTH, IMAGE_HEIGHT, 3))
base_model = VGG19(weights='imagenet', include_top=False,)
for layer in base_model.layers[:-3]:
    layer.trainable = False
x = base_model(inputs)
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
#x = Dropout(0.5)(x)
x = Dense(512, activation="relu")(x)
predictions = Dense(1, activation="sigmoid")(x)
model_vgg= Model(inputs=inputs , outputs=predictions)
model_vgg.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
#########################
history = model_vgg.fit_generator(
train_generator,
epochs=50,
validation_data=validation_generator,
validation_steps=total_validate//batch_size,
steps_per_epoch=total_train//batch_size,
verbose=2
)
This is the unwanted behavior: the model is not learning, all observations are predicted as 1, and the loss is not dropping.
Found 756 validated image filenames belonging to 2 classes.
Found 190 validated image filenames belonging to 2 classes.
Epoch 1/50
- 4s - loss: 4.0464 - acc: 0.6203 - val_loss: 4.9820 - val_acc: 0.6875
Epoch 2/50
- 2s - loss: 4.3811 - acc: 0.7252 - val_loss: 4.8856 - val_acc: 0.6935
Epoch 3/50
- 2s - loss: 5.0209 - acc: 0.6851 - val_loss: 5.3556 - val_acc: 0.6641
Epoch 4/50
- 2s - loss: 4.3583 - acc: 0.7266 - val_loss: 4.1142 - val_acc: 0.7419
Epoch 5/50
- 2s - loss: 4.9317 - acc: 0.6907 - val_loss: 4.7329 - val_acc: 0.7031
Epoch 6/50
- 2s - loss: 4.6275 - acc: 0.7097 - val_loss: 5.3998 - val_acc: 0.6613
Epoch 7/50
This is the expected behavior: the model is learning, both 1 and 0 are predicted, and the loss is dropping.
Found 756 validated image filenames belonging to 2 classes.
Found 190 validated image filenames belonging to 2 classes.
Epoch 1/50
- 4s - loss: 2.1181 - acc: 0.6484 - val_loss: 0.8013 - val_acc: 0.6562
Epoch 2/50
- 2s - loss: 0.6609 - acc: 0.7096 - val_loss: 0.5670 - val_acc: 0.7581
Epoch 3/50
- 2s - loss: 0.6539 - acc: 0.6912 - val_loss: 0.5923 - val_acc: 0.6953
Epoch 4/50
- 2s - loss: 0.5695 - acc: 0.7083 - val_loss: 0.5426 - val_acc: 0.6774
Epoch 5/50
- 2s - loss: 0.5262 - acc: 0.7176 - val_loss: 0.5386 - val_acc: 0.6875