Training and validation loss increases after 10 epochs - python

I am training an image captioning model. The model consists of two sub-models, a BERT text encoder and an Xception vision encoder, which I train in parallel. Training looks fine for the first 10 epochs, then the loss starts increasing. The code and parameters of this model are as follows.
num_epochs = 20 # In practice, train for at least 30 epochs
batch_size = 1
vision_encoder = create_vision_encoder(num_projection_layers=1, projection_dims=256, dropout_rate=0.1)
text_encoder = create_text_encoder(num_projection_layers=1, projection_dims=256, dropout_rate=0.1)
dual_encoder = DualEncoder(text_encoder, vision_encoder, temperature=0.05)
dual_encoder.compile(optimizer=tfa.optimizers.AdamW(learning_rate=0.001, weight_decay=0.001))  # run_eagerly=True
from tensorflow.keras.callbacks import LearningRateScheduler
import math
def step_decay(epoch):
    initial_lrate = 0.001
    drop = 0.005
    epochs_drop = 10.0
    lrate = initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
    return lrate
lrate = LearningRateScheduler(step_decay)
callbacks_list = [lrate]
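# Sanity check: print what this schedule yields per epoch. With drop = 0.005,
# the first decay step multiplies the rate by 0.005 in a single jump:
# epochs 0-8 -> 0.001, epochs 9+ -> 5e-06 (matching the lr logged at epoch 10).
for epoch in range(12):
    print(epoch, step_decay(epoch))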
print(f"Number of GPUs: {len(tf.config.list_physical_devices('GPU'))}")
print(f"Number of examples (caption-image pairs): {train_example_count}")
print(f"Batch size: {batch_size}")
print(f"Steps per epoch: {int(np.ceil(train_example_count / batch_size))}")
train_dataset = get_dataset(os.path.join(tfrecords_dir, "train-*.tfrecord"), batch_size)
valid_dataset = get_dataset(os.path.join(tfrecords_dir, "valid-*.tfrecord"), batch_size)
# Create a learning rate scheduler callback.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=3)
# Create an early stopping callback.
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
history = dual_encoder.fit(
    train_dataset,
    epochs=num_epochs,
    validation_data=valid_dataset,
    callbacks=[reduce_lr, early_stopping] + callbacks_list,  # flatten the callback lists
)
print("Training completed. Saving vision and text encoders...")
vision_encoder.save("/content/drive/MyDrive/vision_encoder")
text_encoder.save("/content/drive/MyDrive/text_encoder")
print("Models are saved.")
Number of GPUs: 1
Number of examples (caption-image pairs): 3500
Batch size: 1
Steps per epoch: 3500
Epoch 1/20
3500/3500 [==============================] - 217s 62ms/step - loss: 5.1028e-04 - val_loss: 1.9643e-04 - lr: 0.0010
Epoch 2/20
3500/3500 [==============================] - 218s 62ms/step - loss: 8.8274e-05 - val_loss: 3.3228e-05 - lr: 0.0010
Epoch 3/20
3500/3500 [==============================] - 220s 63ms/step - loss: 0.3582 - val_loss: 4.2012e-04 - lr: 0.0010
Epoch 4/20
3500/3500 [==============================] - 216s 62ms/step - loss: 9.6259e-04 - val_loss: 3.7130e-05 - lr: 0.0010
Epoch 5/20
3500/3500 [==============================] - 213s 61ms/step - loss: 1.7488e-05 - val_loss: 6.3365e-06 - lr: 2.0000e-04
Epoch 6/20
3500/3500 [==============================] - 208s 59ms/step - loss: 2.9985e-06 - val_loss: 1.0982e-06 - lr: 0.0010
Epoch 7/20
3500/3500 [==============================] - 207s 59ms/step - loss: 1.0761 - val_loss: 0.0212 - lr: 0.0010
Epoch 8/20
3500/3500 [==============================] - 211s 60ms/step - loss: 0.0062 - val_loss: 4.6654e-05 - lr: 2.0000e-04
Epoch 9/20
3499/3500 [============================>.] - ETA: 0s - loss: 2.2375e-05
Epoch 10/20
3500/3500 [==============================] - 210s 60ms/step - loss: 234.2512 - val_loss: 309.9704 - lr: 5.0000e-06
Epoch 11/20
3500/3500 [==============================] - 211s 60ms/step - loss: 310.0370 - val_loss: 309.7400 - lr: 1.0000e-06
Training completed. Saving vision and text encoders...
WARNING:absl:Found untraced functions such as restored_function_body, restored_function_body, restored_function_body, restored_function_body, restored_function_body while saving (showing 5 of 124). These functions will not be directly callable after loading.
Models are saved.

Hi, your loss is high because of the learning rate, which has dropped to 5e-06 and 1e-06 in those epochs. Try adjusting it, and consider keeping it somewhere between 1e-4 and 0.01: if the learning rate is very small and the model starts from a point where the loss is huge, it will be difficult for the model to adjust its weights.
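To make this concrete: the step_decay schedule posted in the question multiplies the rate by drop = 0.005 at epoch 10, which is exactly what sends the lr to 5e-06. A minimal sketch of a gentler schedule (drop = 0.5 here is an assumed value, not taken from the question):
import math
from tensorflow.keras.callbacks import LearningRateScheduler

def gentle_step_decay(epoch):
    # Same shape as the question's schedule, but halve the rate
    # every 10 epochs instead of multiplying it by 0.005.
    initial_lrate = 0.001
    drop = 0.5          # assumed value; tune for your setup
    epochs_drop = 10.0
    return initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))

# epochs 0-8 -> 1e-3, epochs 9-18 -> 5e-4, epochs 19-28 -> 2.5e-4, ...
lrate = LearningRateScheduler(gentle_step_decay)
Also note that combining LearningRateScheduler with ReduceLROnPlateau means the scheduler overwrites whatever the plateau callback set at the start of every epoch, which is why the logged lr bounces between 2.0000e-04 and 0.0010 in the output above.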

Related

TensorFlow model has zero accuracy

I am currently training a model using the Cars196 dataset from Stanford. However, with the dataset correctly imported and recognized by TensorFlow, my accuracy is still 0. I used a similar approach to train the model on other datasets and it works. Did I do anything wrong?
Here is my code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import csv
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
car_dir = './src/'
test_dir = './src/cars_test/'
train_dir = './src/cars_train/'
train_labels_file = './src/labels-train.csv'
test_labels_file = './src/labels-test.csv'
IMG_SIZE = (150,150)
def read_labels(label_file: str):
    pathAndClass = list()
    with open(label_file) as csv_file:
        reader = csv.reader(csv_file)
        next(reader)  # skip header row
        for row in reader:
            pathAndClass.append([row[5].lower(), row[4]])
    return pd.DataFrame(pathAndClass, columns=['path', 'class'])
pathAndClass = read_labels(train_labels_file)
n_classes = np.size(np.unique(pathAndClass['class']))
pathAndClass['path'] = pathAndClass['path'].astype(str)
pathAndClass['class'] = pathAndClass['class'].astype(str)
data_gen = ImageDataGenerator(rescale = 1.0/255.0, validation_split=0.25)
BATCH_SIZE = 32
index_list = []
for i in range(0, n_classes):
    index_list.append(str(i))
train_flow = data_gen.flow_from_dataframe(
    dataframe=pathAndClass,
    x_col='path',
    y_col='class',
    directory=train_dir,
    subset="training",
    seed=42,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    shuffle=True,
    classes=index_list,
    class_mode='categorical')
valid_flow = data_gen.flow_from_dataframe(
    dataframe=pathAndClass,
    x_col='path',
    y_col='class',
    directory=train_dir,
    subset="validation",
    seed=42,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    shuffle=True,
    classes=index_list,
    class_mode='categorical')
model_nn = Sequential()
model_nn.add(Flatten(input_shape=(150,150, 3)))
model_nn.add(Dense(300, activation="relu"))
model_nn.add(Dense(n_classes, activation="softmax"))
model_nn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model_nn.summary())
training = model_nn.fit(
    train_flow,
    steps_per_epoch=train_flow.n // train_flow.batch_size,
    epochs=10,
    validation_data=valid_flow,
    validation_steps=valid_flow.n // valid_flow.batch_size)
print(model_nn.evaluate(train_flow))
plt.plot(training.history['accuracy'])
plt.plot(training.history['val_accuracy'])
plt.plot(training.history['loss'])
plt.plot(training.history['val_loss'])
plt.title('Model accuracy/loss')
plt.ylabel('accuracy/loss')
plt.xlabel('epoch')
plt.legend(['accuracy', 'val_accuracy', 'loss', 'val_loss'])
plt.show()
The output I got
Found 6078 validated image filenames belonging to 196 classes.
Found 2026 validated image filenames belonging to 196 classes.
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_1 (Flatten) (None, 67500) 0
_________________________________________________________________
dense_2 (Dense) (None, 300) 20250300
_________________________________________________________________
dense_3 (Dense) (None, 196) 58996
=================================================================
Total params: 20,309,296
Trainable params: 20,309,296
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/10
189/189 [==============================] - 68s 361ms/step - loss: 9.6809 - accuracy: 0.0036 - val_loss: 5.2785 - val_accuracy: 0.0030
Epoch 2/10
189/189 [==============================] - 58s 307ms/step - loss: 5.2770 - accuracy: 0.0055 - val_loss: 5.2785 - val_accuracy: 0.0089
Epoch 3/10
189/189 [==============================] - 58s 307ms/step - loss: 5.2743 - accuracy: 0.0083 - val_loss: 5.2793 - val_accuracy: 0.0104
Epoch 4/10
189/189 [==============================] - 58s 306ms/step - loss: 5.2728 - accuracy: 0.0089 - val_loss: 5.2800 - val_accuracy: 0.0089
Epoch 5/10
189/189 [==============================] - 58s 307ms/step - loss: 5.2710 - accuracy: 0.0084 - val_loss: 5.2806 - val_accuracy: 0.0089
Epoch 6/10
189/189 [==============================] - 57s 305ms/step - loss: 5.2698 - accuracy: 0.0086 - val_loss: 5.2815 - val_accuracy: 0.0089
Epoch 7/10
189/189 [==============================] - 58s 307ms/step - loss: 5.2695 - accuracy: 0.0083 - val_loss: 5.2822 - val_accuracy: 0.0089
Epoch 8/10
189/189 [==============================] - 58s 310ms/step - loss: 5.2681 - accuracy: 0.0086 - val_loss: 5.2834 - val_accuracy: 0.0089
Epoch 9/10
189/189 [==============================] - 58s 306ms/step - loss: 5.2679 - accuracy: 0.0083 - val_loss: 5.2840 - val_accuracy: 0.0089
Epoch 10/10
189/189 [==============================] - 58s 308ms/step - loss: 5.2669 - accuracy: 0.0083 - val_loss: 5.2848 - val_accuracy: 0.0089
1578/Unknown - 339s 215ms/step - loss: 5.2657 - accuracy: 0.0085
Update 1
I increased the number of weight updates per epoch by decreasing the batch size to 8, and trained the model again. However, the accuracy is still nearly 0.
Epoch 1/10
759/759 [==============================] - 112s 147ms/step - loss: 7.6876 - accuracy: 0.0051 - val_loss: 5.2779 - val_accuracy: 0.0089
Epoch 2/10
759/759 [==============================] - 112s 148ms/step - loss: 5.2728 - accuracy: 0.0086 - val_loss: 5.2792 - val_accuracy: 0.0089
Epoch 3/10
759/759 [==============================] - 112s 148ms/step - loss: 5.2695 - accuracy: 0.0087 - val_loss: 5.2808 - val_accuracy: 0.0089
Epoch 4/10
759/759 [==============================] - 109s 143ms/step - loss: 5.2671 - accuracy: 0.0087 - val_loss: 5.2828 - val_accuracy: 0.0089
Epoch 5/10
759/759 [==============================] - 111s 146ms/step - loss: 5.2661 - accuracy: 0.0086 - val_loss: 5.2844 - val_accuracy: 0.0089
Epoch 6/10
759/759 [==============================] - 114s 151ms/step - loss: 5.2648 - accuracy: 0.0089 - val_loss: 5.2862 - val_accuracy: 0.0089
Epoch 7/10
759/759 [==============================] - 118s 156ms/step - loss: 5.2646 - accuracy: 0.0086 - val_loss: 5.2881 - val_accuracy: 0.0089
Epoch 8/10
759/759 [==============================] - 117s 155ms/step - loss: 5.2639 - accuracy: 0.0087 - val_loss: 5.2891 - val_accuracy: 0.0089
Epoch 9/10
759/759 [==============================] - 115s 151ms/step - loss: 5.2635 - accuracy: 0.0087 - val_loss: 5.2903 - val_accuracy: 0.0089
Epoch 10/10
759/759 [==============================] - 112s 147ms/step - loss: 5.2634 - accuracy: 0.0086 - val_loss: 5.2915 - val_accuracy: 0.0089
2390/Unknown - 141s 59ms/step - loss: 5.2611 - accuracy: 0.0088
Indeed, the last dataset I used had fewer classes but more samples. Maybe there is another model that fits my dataset; any suggestions?
For computer vision problems, you want to look at Convolutional Neural Networks. If you're unfamiliar with them, they learn to identify features in images. Examples could be edges and textures in early layers, and then wheels, windows, doors, etc in later layers.
For this problem, I would suggest using an existing, pretrained network such as MobileNet V2 or InceptionNetV3 as a backbone, and then building your own classifier on top. This tutorial on the Tensorflow website will get you started https://www.tensorflow.org/tutorials/images/transfer_learning#create_the_base_model_from_the_pre-trained_convnets
Here's an excerpt from this tutorial:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
Then adding your model code from above, you could try:
model = tf.keras.Sequential([
    base_model,
    Flatten(input_shape=(150, 150, 3)),
    Dense(300, activation="relu"),
    Dense(n_classes, activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
This is the model I've used on similar datasets and got reasonable accuracy with it:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
model = tf.keras.Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(n_classes, activation='softmax')
])
In your current model, you are not trying to extract any features from the images. A single hidden layer with 300 neurons is nowhere near enough to learn the features in images and give meaningful results.
You also need to check your input image size. MobileNet V2 works well with 224x224 colour images.
As per other comments, you will need to use the full dataset; you are not going to get any meaningful results with a few hundred images.
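Putting those pieces together, a minimal end-to-end sketch could look like this (IMG_SHAPE = (224, 224, 3) and the frozen backbone follow the linked tutorial; they are assumptions, not code from the question):
import tensorflow as tf
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

IMG_SHAPE = (224, 224, 3)  # MobileNetV2 works well at 224x224

base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False  # freeze the pretrained backbone initially

model = tf.keras.Sequential([
    base_model,
    GlobalAveragePooling2D(),                 # pool features instead of flattening
    Dense(n_classes, activation='softmax')    # n_classes = 196 for Cars196
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Remember to also set target_size=(224, 224) in flow_from_dataframe so the generators match the model input.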
I would suggest that ~6000 training samples for almost 200 classes is simply way too little for the model to work well.
The model did ~2000 weight updates (about 200 in each epoch), which is way too few for it to learn to distinguish between ~200 classes.
Maybe you had fewer classes and more training data in the other training sets?

TensorFlow tf.data.Dataset.cache does not seem to have the expected effect

I am trying to improve my model training performance following the Better performance with the tf.data API guideline. However, I have observed that the performance using .cache() is almost the same as, or even worse than, the same settings without .cache().
datafile_list = load_my_files()
RAW_BYTES = 403*4
BATCH_SIZE = 32
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=datafile_list, record_bytes=RAW_BYTES, num_parallel_reads=10, buffer_size=1024*RAW_BYTES)
raw_dataset = raw_dataset.map(tf.autograph.experimental.do_not_convert(decode_and_prepare),
                              num_parallel_calls=tf.data.AUTOTUNE)
raw_dataset = raw_dataset.cache()
raw_dataset = raw_dataset.shuffle(buffer_size=1024)
raw_dataset = raw_dataset.batch(BATCH_SIZE)
raw_dataset = raw_dataset.prefetch(tf.data.AUTOTUNE)
The files in datafile_list hold 9.92 GB in total, which fits comfortably in the system's physical RAM (100 GB). System swap is disabled.
By training the model using the dataset:
model = build_model()
model.fit(raw_dataset, epochs=5, verbose=2)
results in:
Epoch 1/5
206247/206247 - 126s - loss: 0.0043 - mae: 0.0494 - mse: 0.0043
Epoch 2/5
206247/206247 - 125s - loss: 0.0029 - mae: 0.0415 - mse: 0.0029
Epoch 3/5
206247/206247 - 129s - loss: 0.0027 - mae: 0.0397 - mse: 0.0027
Epoch 4/5
206247/206247 - 125s - loss: 0.0025 - mae: 0.0386 - mse: 0.0025
Epoch 5/5
206247/206247 - 125s - loss: 0.0024 - mae: 0.0379 - mse: 0.0024
This result is frustrating. By the docs:
The first time the dataset is iterated over, its elements will be cached either in the specified file or in memory. Subsequent iterations will use the cached data.
And from this guide:
When iterating over this dataset, the second iteration will be much faster than the first one thanks to the caching.
However, the elapsed time taken by each epoch is almost the same. In addition, both CPU and GPU usage are very low during training (see the attachments below).
By commenting out the line raw_dataset = raw_dataset.cache() the results do not show any notable difference:
Epoch 1/5
206067/206067 - 129s - loss: 0.0042 - mae: 0.0492 - mse: 0.0042
Epoch 2/5
206067/206067 - 127s - loss: 0.0028 - mae: 0.0412 - mse: 0.0028
Epoch 3/5
206067/206067 - 134s - loss: 0.0026 - mae: 0.0393 - mse: 0.0026
Epoch 4/5
206067/206067 - 127s - loss: 0.0024 - mae: 0.0383 - mse: 0.0024
Epoch 5/5
206067/206067 - 126s - loss: 0.0023 - mae: 0.0376 - mse: 0.0023
As pointed out in the docs, my expectation was that using cache would result in a much faster training time. I would like to know what I am doing wrong.
Attachments (screenshots, not reproduced here): GPU usage during training with and without cache; system stats (memory, CPU, etc.) during training with and without cache.
Just a small observation using Google Colab. According to the docs:
Note: For the cache to be finalized, the input dataset must be iterated through in its entirety. Otherwise, subsequent iterations will not use cached data.
And
Note: cache will produce exactly the same elements during each iteration through the dataset. If you wish to randomize the iteration order, make sure to call shuffle after calling cache.
I did notice a few differences when using caching and iterating over the dataset beforehand. Here is an example.
Prepare data:
import random
import struct
import tensorflow as tf
import numpy as np
RAW_N = 2 + 20*20 + 1
bytess = random.sample(range(1, 5000), RAW_N*4)
with open('mydata.bin', 'wb') as f:
    f.write(struct.pack('1612i', *bytess))

def decode_and_prepare(register):
    register = tf.io.decode_raw(register, out_type=tf.float32)
    inputs = register[2:402]
    label = tf.random.uniform(()) + register[402:]
    return inputs, label
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['/content/mydata.bin']*7000, record_bytes=RAW_N*4)
raw_dataset = raw_dataset.map(decode_and_prepare)
Training the model without caching and without iterating beforehand:
total_data_entries = len(list(raw_dataset.map(lambda x, y: (x, y))))
train_ds = raw_dataset.shuffle(buffer_size=total_data_entries).batch(32).prefetch(tf.data.AUTOTUNE)
inputs = tf.keras.layers.Input((400,))
x = tf.keras.layers.Dense(200, activation='relu', kernel_initializer='normal')(inputs)
x = tf.keras.layers.Dense(100, activation='relu', kernel_initializer='normal')(x)
outputs = tf.keras.layers.Dense(1, kernel_initializer='normal')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(train_ds, epochs=5)
Epoch 1/5
875/875 [==============================] - 4s 3ms/step - loss: 0.1425
Epoch 2/5
875/875 [==============================] - 4s 3ms/step - loss: 0.0841
Epoch 3/5
875/875 [==============================] - 4s 3ms/step - loss: 0.0840
Epoch 4/5
875/875 [==============================] - 4s 3ms/step - loss: 0.0840
Epoch 5/5
875/875 [==============================] - 4s 3ms/step - loss: 0.0840
<keras.callbacks.History at 0x7fc41be037d0>
Training the model with caching but without iterating beforehand:
total_data_entries = len(list(raw_dataset.map(lambda x, y: (x, y))))
train_ds = raw_dataset.shuffle(buffer_size=total_data_entries).cache().batch(32).prefetch(tf.data.AUTOTUNE)
inputs = tf.keras.layers.Input((400,))
x = tf.keras.layers.Dense(200, activation='relu', kernel_initializer='normal')(inputs)
x = tf.keras.layers.Dense(100, activation='relu', kernel_initializer='normal')(x)
outputs = tf.keras.layers.Dense(1, kernel_initializer='normal')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(train_ds, epochs=5)
Epoch 1/5
875/875 [==============================] - 4s 2ms/step - loss: 0.1428
Epoch 2/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0841
Epoch 3/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0840
Epoch 4/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0840
Epoch 5/5
875/875 [==============================] - 2s 3ms/step - loss: 0.0840
<keras.callbacks.History at 0x7fc41fa87810>
Training the model with caching and iterating beforehand:
total_data_entries = len(list(raw_dataset.map(lambda x, y: (x, y))))
train_ds = raw_dataset.shuffle(buffer_size=total_data_entries).cache().batch(32).prefetch(tf.data.AUTOTUNE)
_ = list(train_ds.as_numpy_iterator()) # iterate dataset beforehand
inputs = tf.keras.layers.Input((400,))
x = tf.keras.layers.Dense(200, activation='relu', kernel_initializer='normal')(inputs)
x = tf.keras.layers.Dense(100, activation='relu', kernel_initializer='normal')(x)
outputs = tf.keras.layers.Dense(1, kernel_initializer='normal')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(train_ds, epochs=5)
Epoch 1/5
875/875 [==============================] - 3s 3ms/step - loss: 0.1427
Epoch 2/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0841
Epoch 3/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0840
Epoch 4/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0840
Epoch 5/5
875/875 [==============================] - 2s 2ms/step - loss: 0.0840
<keras.callbacks.History at 0x7fc41ac9c850>
Conclusion: The caching and the prior iteration of the dataset do seem to have an effect on training time, but note that in this example only 7000 (identical) files were used.
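For completeness, here is a sketch of the pipeline from the question rearranged according to both doc notes (cache stays before shuffle, and one full pass finalizes the cache before training); names like build_model and BATCH_SIZE are reused from the question:
# raw_dataset here is the mapped dataset from the question.
ds = raw_dataset.cache()              # cache right after the expensive map step
ds = ds.shuffle(buffer_size=1024)     # shuffle AFTER cache, per the docs
ds = ds.batch(BATCH_SIZE)
ds = ds.prefetch(tf.data.AUTOTUNE)

# Iterate once in full so the cache is finalized before training.
for _ in ds:
    pass

model = build_model()
model.fit(ds, epochs=5, verbose=2)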

Regarding the accuracy of the Siamese CNN

# We have 2 inputs, 1 for each picture
left_input = Input(img_size)
right_input = Input(img_size)
# We will use 2 instances of 1 network for this task
convnet = MobileNetV2(weights='imagenet', include_top=False, input_shape=img_size,input_tensor=None)
convnet.trainable=True
x=convnet.output
x=tf.keras.layers.GlobalAveragePooling2D()(x)
x=Dense(320,activation='relu')(x)
x=Dropout(0.2)(x)
preds = Dense(101, activation='sigmoid')(x) # Apply sigmoid
convnet = Model(inputs=convnet.input, outputs=preds)
# Connect each 'leg' of the network to each input
# Remember, they have the same weights
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)
# Getting the L1 Distance between the 2 encodings
L1_layer = Lambda(lambda tensor:K.abs(tensor[0] - tensor[1]))
# Add the distance function to the network
L1_distance = L1_layer([encoded_l, encoded_r])
prediction = Dense(1,activation='sigmoid')(L1_distance)
siamese_net = Model(inputs=[left_input,right_input],outputs=prediction)
optimizer = Adam(lr, decay=2.5e-4)
# TODO: get layerwise learning rates and momentum annealing scheme described in paper working
siamese_net.compile(loss=keras.losses.binary_crossentropy,optimizer=optimizer,metrics=['accuracy'])
siamese_net.summary()
and the result of training is as follows
Epoch 1/10
126/126 [==============================] - 169s 1s/step - loss: 0.5683 - accuracy: 0.6840 - val_loss: 0.4644 - val_accuracy: 0.8044
Epoch 2/10
126/126 [==============================] - 163s 1s/step - loss: 0.2032 - accuracy: 0.9795 - val_loss: 0.2117 - val_accuracy: 0.9681
Epoch 3/10
126/126 [==============================] - 163s 1s/step - loss: 0.1110 - accuracy: 0.9925 - val_loss: 0.1448 - val_accuracy: 0.9840
Epoch 4/10
126/126 [==============================] - 164s 1s/step - loss: 0.0844 - accuracy: 0.9950 - val_loss: 0.1384 - val_accuracy: 0.9820
Epoch 5/10
126/126 [==============================] - 163s 1s/step - loss: 0.0634 - accuracy: 0.9990 - val_loss: 0.0829 - val_accuracy: 1.0000
Epoch 6/10
126/126 [==============================] - 165s 1s/step - loss: 0.0526 - accuracy: 0.9995 - val_loss: 0.0729 - val_accuracy: 1.0000
Epoch 7/10
126/126 [==============================] - 164s 1s/step - loss: 0.0465 - accuracy: 0.9995 - val_loss: 0.0641 - val_accuracy: 1.0000
Epoch 8/10
126/126 [==============================] - 163s 1s/step - loss: 0.0463 - accuracy: 0.9985 - val_loss: 0.0595 - val_accuracy: 1.0000
The model predicts with good accuracy when I compare two dissimilar images, and it also performs really well on images of the same class.
But when I compare image1 with image1 itself, it predicts that they are similar with a probability of only 0.5.
In the other case, if I compare image1 with image2 (where image1 and image2 belong to the same class), it predicts correctly with a probability of 0.8.
When I compare distinct images it predicts correctly; I have tried different alternatives, but they did not work out.
May I know what the error might be?
The L1 distance between two equal vectors is always zero.
When you pass the same image, the encodings generated are equal (encoded_l is equal to encoded_r). Hence, the input to your final sigmoid layer is a zero vector.
And, sigmoid(0) = 0.5.
This is the reason providing identical inputs to your model gives 0.5 as the output.
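A quick numeric sketch of the effect (the weights here are made up, and the final layer's bias is assumed to be near zero):
import numpy as np

# Identical inputs give identical encodings, so the L1 distance
# fed to the final Dense layer is the all-zeros vector.
enc = np.array([0.3, -1.2, 0.8])
l1 = np.abs(enc - enc)                # -> [0., 0., 0.]

w = np.array([1.5, -0.7, 2.1])        # hypothetical learned weights
b = 0.0                               # assumed near-zero bias
logit = l1 @ w + b                    # zero input -> logit == bias
prob = 1.0 / (1.0 + np.exp(-logit))   # sigmoid(0) = 0.5
print(prob)                           # 0.5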

Loss and accuracy do not improve

I'm trying to train a product detection model with approximately 100,000 training images and 10,000 test images. However, no matter what optimizer I use in my model (I have tried Adam, and SGD with multiple learning rates), my loss and accuracy do not improve. Below is my code.
First I read the training images:
for x in train_data.category.tolist():
    if x < 10:
        x = "0" + str(x)
        path = os.path.join(train_DATADIR, x)
    else:
        x = str(x)
        path = os.path.join(train_DATADIR, x)
    img_array = cv2.imread(os.path.join(path, str(train_data.filename[idx])), cv2.IMREAD_GRAYSCALE)
    new_array = cv2.resize(img_array, (100, 100))
    train_images.append(new_array)
    idx += 1
    print(f'{idx}/105392 - {(idx/105392)*100:.2f}%')
narray = np.array(train_images)
Then I save the train_images data into a binary file:
np.save(DIR_PATH + 'train_images_bitmap.npy', narray)
Then I divide the pixel values by 255.0 (operating on the NumPy array narray; dividing the Python list train_images directly would raise a TypeError):
train_images = narray / 255.0
and declare my model with a 100x100 input, as the images were resized to 100x100:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(100, 100)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(42)
])
Then I compile the model. I tried adam, and SGD with lr=0.01 up to 0.2 and as low as 0.001:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
Next I fit the model with a checkpoint callback (the callback has to be created first and passed to fit, otherwise it is never used):
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, monitor='val_acc',
                                                 mode='max', save_best_only=True,
                                                 save_weights_only=True, verbose=1)
# note: monitoring 'val_acc' only works if validation data is passed to fit
model.fit(train_images, train_labels, epochs=2000, callbacks=[cp_callback])
But the output over the epochs is not improving. How can I improve the loss and accuracy? Below is the output of the epochs:
Epoch 6/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7210 - accuracy: 0.0249
Epoch 7/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7210 - accuracy: 0.0248
Epoch 8/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7209 - accuracy: 0.0255
Epoch 9/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7209 - accuracy: 0.0251
Epoch 10/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7210 - accuracy: 0.0254
Epoch 11/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7209 - accuracy: 0.0254
Epoch 12/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7210 - accuracy: 0.0243
Epoch 13/2000
3294/3294 [==============================] - 12s 3ms/step - loss: 3.7210 - accuracy: 0.0238
Epoch 14/2000
3294/3294 [==============================] - 11s 3ms/step - loss: 3.7210 - accuracy: 0.0251
Epoch 15/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7209 - accuracy: 0.0253
Epoch 16/2000
3294/3294 [==============================] - 11s 3ms/step - loss: 3.7210 - accuracy: 0.0243
Epoch 17/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7210 - accuracy: 0.0247
Epoch 18/2000
3294/3294 [==============================] - 12s 3ms/step - loss: 3.7210 - accuracy: 0.0247
I don't think the choice of optimizer is the main problem. With only a little experience on the matter, I can only suggest some things:
For images I would try using a 2D convolution layer before the dense layer.
Try adding a dropout layer to reduce the possibility of overfitting.
The input is 100*100, and a reduction to 128 units is perhaps too aggressive (I don't know, but that's at least my intuition). Try increasing 128 to a larger number, or even add an intermediate layer.
Perhaps something like:
model = Sequential()
model.add(Conv2D(...))
model.add(MaxPooling2D(...))
model.add(Dropout(...))
model.add(Flatten())
model.add(Dense(...))
model.compile(...)
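Filling in that skeleton, a minimal concrete version for the 100x100 grayscale inputs and 42 classes from the question might be (all filter and unit counts here are assumptions, not values from the answer):
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = keras.Sequential([
    # Grayscale images need an explicit channel axis: (100, 100, 1).
    Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 1)),
    MaxPooling2D((2, 2)),
    Dropout(0.25),                    # assumed rate
    Flatten(),
    Dense(128, activation='relu'),
    Dense(42)                         # 42 classes, logits as in the question
])
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
The training array would also need a channel axis, e.g. train_images = narray[..., np.newaxis] / 255.0.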

Validation loss and validation accuracy are both higher than training loss and accuracy, and fluctuating

I am trying to train my model using transfer learning. For this I am using a VGG16 model with the top layers stripped, and I froze the first 2 layers to keep the ImageNet initial weights. For fine-tuning I am using learning rate 0.0001, activation softmax, dropout 0.5, loss categorical crossentropy, optimizer SGD, and 46 classes.
I am just unable to understand the behavior during training. Train loss and accuracy are both fine (loss is decreasing, accuracy is increasing). Validation loss is decreasing and accuracy is increasing as well, BUT they are always higher than the train loss and accuracy.
Assuming it's overfitting, I made the model less complex, increased the dropout rate, and added more samples to the validation data, but nothing seemed to work. I am a newbie, so any kind of help is appreciated.
Epoch 1/50
26137/26137 [==============================] - 7446s 285ms/step - loss: 1.1200 - accuracy: 0.3810 - val_loss: 3.1219 - val_accuracy: 0.4467
Epoch 2/50
26137/26137 [==============================] - 7435s 284ms/step - loss: 0.9944 - accuracy: 0.4353 - val_loss: 2.9348 - val_accuracy: 0.4694
Epoch 3/50
26137/26137 [==============================] - 7532s 288ms/step - loss: 0.9561 - accuracy: 0.4530 - val_loss: 1.6025 - val_accuracy: 0.4780
Epoch 4/50
26137/26137 [==============================] - 7436s 284ms/step - loss: 0.9343 - accuracy: 0.4631 - val_loss: 1.3032 - val_accuracy: 0.4860
Epoch 5/50
26137/26137 [==============================] - 7358s 282ms/step - loss: 0.9185 - accuracy: 0.4703 - val_loss: 1.4461 - val_accuracy: 0.4847
Epoch 6/50
26137/26137 [==============================] - 7396s 283ms/step - loss: 0.9083 - accuracy: 0.4748 - val_loss: 1.4093 - val_accuracy: 0.4908
Epoch 7/50
26137/26137 [==============================] - 7424s 284ms/step - loss: 0.8993 - accuracy: 0.4789 - val_loss: 1.4617 - val_accuracy: 0.4939
Epoch 8/50
26137/26137 [==============================] - 7433s 284ms/step - loss: 0.8925 - accuracy: 0.4822 - val_loss: 1.4257 - val_accuracy: 0.4978
Epoch 9/50
26137/26137 [==============================] - 7445s 285ms/step - loss: 0.8868 - accuracy: 0.4851 - val_loss: 1.5568 - val_accuracy: 0.4953
Epoch 10/50
26137/26137 [==============================] - 7387s 283ms/step - loss: 0.8816 - accuracy: 0.4874 - val_loss: 1.4534 - val_accuracy: 0.4970
Epoch 11/50
26137/26137 [==============================] - 7374s 282ms/step - loss: 0.8779 - accuracy: 0.4894 - val_loss: 1.4605 - val_accuracy: 0.4912
Epoch 12/50
26137/26137 [==============================] - 7411s 284ms/step - loss: 0.8733 - accuracy: 0.4915 - val_loss: 1.4694 - val_accuracy: 0.5030
Yes, you are facing an over-fitting issue. To mitigate it, you can try to implement the steps below.
1. Shuffle the data by using shuffle=True in VGG16_model.fit. Code is shown below:
history = VGG16_model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1,
                          validation_data=(x_validation, y_validation), shuffle=True)
2. Use early stopping. Code is shown below:
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
3. Use regularization. Code for regularization is shown below (you can try l1 regularization or l1_l2 regularization as well):
from tensorflow.keras.regularizers import l2
Regularizer = l2(0.001)
VGG16_model.add(Conv2D(96, (11, 11), input_shape=(227, 227, 3), strides=(4, 4), padding='valid',
                       activation='relu', data_format='channels_last',
                       activity_regularizer=Regularizer, kernel_regularizer=Regularizer))
VGG16_model.add(Dense(units=2, activation='sigmoid',
                      activity_regularizer=Regularizer, kernel_regularizer=Regularizer))
4. You can try using BatchNormalization.
5. Perform image data augmentation using ImageDataGenerator; see the Keras ImageDataGenerator documentation for details.
6. If the pixels are not normalized, dividing the pixel values by 255 also helps.
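A minimal sketch combining suggestions 1-3 for the 46-class VGG16 setup from the question (the classifier head and the l2 strength are assumptions, not the asker's exact model):
import tensorflow as tf
from tensorflow.keras.layers import GlobalAveragePooling2D, Dropout, Dense
from tensorflow.keras.regularizers import l2

base = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze the backbone; unfreeze selected layers to fine-tune

VGG16_model = tf.keras.Sequential([
    base,
    GlobalAveragePooling2D(),
    Dropout(0.5),                          # dropout 0.5 as in the question
    Dense(46, activation='softmax',        # 46 classes
          kernel_regularizer=l2(0.001)),   # assumed l2 strength
])
VGG16_model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.0001),
                    loss='categorical_crossentropy', metrics=['accuracy'])

callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
history = VGG16_model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                          validation_data=(x_validation, y_validation),
                          shuffle=True, callbacks=[callback], verbose=1)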
