Overfitted model performs poorly on training data

What can cause the accuracy to be above 90% while the model predicts a single class in 100% of cases in a multiclass classification problem? I would expect an overfitted model with high training accuracy to predict well on the training data.
Model:
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(scale=1./255., offset=0.0),
    mobilenet_custom,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])
Here mobilenet_custom is a tf.keras.applications.MobileNetV2 model with random weights whose input is modified to have 7 channels.
I am trying to classify the frames of a clip into three classes. The training data set is roughly balanced:
Total count: 15849
Labels[ A ]: 3906
Labels[ B ]: 5955
Labels[ O ]: 5988
Batch Shape X: (32, 224, 224, 7)
Batch Shape y: (32, 3)
The training accuracy is above 90% after 2 epochs (val_accuracy is around 35%).
However, I also record confusion matrices at the end of each epoch with a Callback subclass, using the following function to collect the data for the confusion matrix:
def _collect_validation_data(self, arg_datagen):
    predicts = []
    truths = []
    print("\nCollecting data for epoch evaluation...")
    batch_num_per_epoch = len(arg_datagen)
    for batch_i in range(batch_num_per_epoch):
        X_batch, y_batch = arg_datagen[batch_i]
        y_pred = self.model.predict(X_batch, verbose=0)
        y_true = y_batch
        # Convert one-hot / softmax rows to class indices
        predicts.extend(np.argmax(y_pred, axis=1))
        truths.extend(np.argmax(y_true, axis=1))
        print("Batch: {}/{}".format(batch_i+1, batch_num_per_epoch), end='\r')
    print("\nDone!\n")
    return truths, predicts
Every time, the confusion matrix shows all predictions falling into a single class (the image of the matrix was not preserved).
When I saved the y_pred and y_true values that are passed to the accuracy computation, they confirmed that the accuracy is calculated correctly.
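To make the contradiction concrete, here is a minimal plain-Python sketch of how the truths and predicts lists returned by the function above could be tallied. The labels are hypothetical and the helper names are mine, not part of the original callback. A model that always predicts one class cannot score higher than the share of that class, which for the counts above is about 38% (5988 / 15849), so a 90% training accuracy and a single-column confusion matrix cannot both come from the same forward pass.

```python
from collections import Counter

def confusion_matrix(truths, predicts, num_classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(truths, predicts))
    return [[counts[(t, p)] for p in range(num_classes)]
            for t in range(num_classes)]

def accuracy(truths, predicts):
    """Fraction of positions where prediction and truth agree."""
    correct = sum(1 for t, p in zip(truths, predicts) if t == p)
    return correct / len(truths)

# Hypothetical labels: the model collapses onto class 2 for every input.
truths = [0, 0, 1, 1, 1, 2, 2, 2]
predicts = [2] * len(truths)

matrix = confusion_matrix(truths, predicts, num_classes=3)
for row in matrix:
    print(row)
print("accuracy:", accuracy(truths, predicts))
```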

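The problem was batch normalization. Batch normalization layers normalize with the statistics of the current batch during training but with stored moving averages during prediction, so the two passes can behave very differently. When I tried a model without batch normalization layers, such as VGG16, the results on the training data were as expected: the values of the confusion matrix built from predictions on the training data were mostly on the diagonal (when the training accuracy was high).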
More detailed discussion here: https://github.com/keras-team/keras/issues/6977
A possible solution is described in the question "fit() works as expected but then during evaluate() model performs at chance".
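The following is a minimal sketch of the mechanism, in plain Python rather than Keras, under the assumption that the layer keeps an exponential moving average of the batch mean and variance. The function names, the momentum and epsilon values, and the activations are all illustrative. It shows that after only one update the moving averages are still far from the true statistics, so the same batch is normalized very differently in the two modes.

```python
import math

def batchnorm_train(batch, moving_mean, moving_var, momentum=0.99, eps=1e-3):
    """Training mode: normalize with the batch's own statistics and update the moving averages."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    out = [(x - mean) / math.sqrt(var + eps) for x in batch]
    new_mean = momentum * moving_mean + (1 - momentum) * mean
    new_var = momentum * moving_var + (1 - momentum) * var
    return out, new_mean, new_var

def batchnorm_infer(batch, moving_mean, moving_var, eps=1e-3):
    """Inference mode: normalize with the stored moving averages."""
    return [(x - moving_mean) / math.sqrt(moving_var + eps) for x in batch]

# Hypothetical activations with a large offset; moving statistics start at 0 and 1.
batch = [100.0, 102.0, 98.0, 101.0, 99.0]
moving_mean, moving_var = 0.0, 1.0

train_out, moving_mean, moving_var = batchnorm_train(batch, moving_mean, moving_var)
infer_out = batchnorm_infer(batch, moving_mean, moving_var)

print("training mode :", [round(v, 2) for v in train_out])
print("inference mode:", [round(v, 2) for v in infer_out])
```

In training mode the outputs are centred around zero, while in inference mode they are all large and positive, which is the kind of shift that can push every prediction into one class.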

Related

Bert prediction shape not equal to num_samples

I have a text classification task that I am trying to solve using BERT. Below is the code I am using. The model training code (below) works fine, but I am facing an issue with the prediction part.
import numpy as np
import tensorflow as tf
from transformers import TFBertForSequenceClassification
# recommended learning rates for Adam: 5e-5, 3e-5, 2e-5
learning_rate = 5e-5
nlabels = 26
# we will do just 1 epoch for illustration, though multiple epochs might be better as long as we do not overfit the model
number_of_epochs = 1
# model initialization
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=nlabels,
                                                        output_attentions=False,
                                                        output_hidden_states=False)
# optimizer Adam
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=1e-08)
# we do not have one-hot vectors, so we can use sparse categorical cross entropy and accuracy
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
bert_history = model.fit(ds_tr_encoded, epochs=number_of_epochs)
I am getting the output using the following
preds = model.predict(ds_te_encoded)
pred_labels_idx = np.argmax(preds['logits'], axis=1)
The issue is that the length of pred_labels_idx does not match the cardinality of ds_te_encoded:
len(pred_labels_idx) #426820
tf.data.experimental.cardinality(ds_te_encoded) #<tf.Tensor: shape=(), dtype=int64, numpy=21341>
Not sure why this is happening.

Since ds_te_encoded is a tf.data.Dataset, cardinality(...) returns the number of elements in the dataset. Your dataset is batched, so each element is one batch, and the cardinality is the number of batches rather than the number of samples. I assume you are using a batch size of 20, because 426820 / 20 = 21341. That is probably what is causing the confusion.
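The arithmetic can be checked with a few lines of plain Python. This is only a sketch: num_batches is a hypothetical helper, and the batch size of 20 is the assumption made above, not a value stated in the question.

```python
import math

def num_batches(num_samples, batch_size, drop_remainder=False):
    """Number of elements a batched dataset yields for a given sample count."""
    if drop_remainder:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

num_samples = 426820   # len(pred_labels_idx) from the question
batch_size = 20        # assumed; not stated in the question

print(num_batches(num_samples, batch_size))
```

If the sample count is needed from the dataset itself, one option is to iterate over it once and sum the batch sizes, or to count the elements after calling unbatch() on the dataset.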

LSTM Model not having any variance during evaluation

I have a question regarding the evaluation of an LSTM model. I trained an LSTM model and stored it with model.save(...). Now I want to load it with load_model and evaluate it on the validation dataset. Since neural networks are stochastic, I run the evaluation several times and compute the mean and the variance of the metrics I am interested in.
Now I am shocked that after the first run all consecutive runs have the same performance on every metric. I don't think that is right, but I don't know where the error occurs.
So my question is:
what is my mistake in setting up the validation of my model?
and how can I fix that?
Here are the code snippets that should explain what I am doing:
Compile and fit the Model
def compile_and_fit(hparams, MAX_EPOCHS, model_path):
    window = WindowGenerator(input_width=hparams[HP_WINDOW_SIZE],
                             label_width=hparams[HP_WINDOW_SIZE], shift=1,
                             label_columns=['q_MARI'], batch_size=hparams[HP_BATCH_SIZE])
    model = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(hparams[HP_NUM_UNITS], return_sequences=True, name="LSTM_1"),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT], name="Dropout_1"),
        tf.keras.layers.LSTM(hparams[HP_NUM_UNITS], return_sequences=True, name="LSTM_2"),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
    ])
    learning_rate = hparams[HP_LEARNING_RATE]
    model.compile(loss=tf.losses.MeanSquaredError(),
                  optimizer=tf.optimizers.Adam(learning_rate=learning_rate),
                  metrics=get_metrics())
    history = model.fit(window.train,
                        epochs=MAX_EPOCHS,
                        validation_data=window.val,
                        callbacks=get_callbacks(model_path))
    _, a, _, _, _, _ = model.evaluate(window.val)
    return a, model, history
Train and save it
a, model, history = compile_and_fit( hparams = hparams, MAX_EPOCHS = MAX_EPOCHS, model_path = run_path)
model.save(run_path)
Load and evaluate it
model = tf.keras.models.load_model(os.path.join(hparam_path, model_name),
                                   custom_objects={"max_error": max_error, "median_absolute_error": median_absolute_error, "rev_metric": rev_metric, "nse_metric": nse_metric})
model.compile(loss=tf.losses.MeanSquaredError(), optimizer="adam", metrics=get_metrics())
metric_values = np.empty(shape=(nr_runs, len(metrics)), dtype=float)
for j in range(nr_runs):
    window = WindowGenerator(input_width=hparam_vals[i], label_width=hparam_vals[i], shift=1,
                             label_columns=['q_MARI'])
    metric_values[j] = np.array(model.evaluate(window.val))
means = metric_values.mean(axis=0)
varis = metric_values.var(axis=0)
print(f'means: {means}, varis: {varis}')
The results I am getting
To set up the training I followed these two guides:
https://www.tensorflow.org/tutorials/structured_data/time_series
https://www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams

An LSTM is not stochastic at inference time. Evaluation results should be the same for the same data and the same weights.

There are two steps. During training, randomness influences the model you end up with. Once you have saved the model, however, the prediction results are the same every time you use that same model on the same data.
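A toy illustration of the two steps, in plain Python rather than Keras: the one-unit "model", its weight, and the data are all hypothetical. Randomness (here, dropout) only enters when the training flag is set, so repeated evaluation with fixed weights has zero variance.

```python
import random
import statistics

def forward(x, weight, dropout_rate, training, rng):
    """Toy one-unit 'model': dropout is applied only when training is True."""
    if training and rng.random() < dropout_rate:
        return 0.0                                  # unit dropped for this sample
    scale = 1.0 / (1.0 - dropout_rate) if training else 1.0
    return weight * x * scale

def mse(data, weight, dropout_rate, training, rng):
    """Mean squared error of the toy model over the data set."""
    errors = [(forward(x, weight, dropout_rate, training, rng) - y) ** 2
              for x, y in data]
    return sum(errors) / len(errors)

rng = random.Random(0)
data = [(x, 2.0 * x) for x in range(1, 11)]         # hypothetical validation set
weight = 1.9                                        # stands in for the saved weights

eval_runs = [mse(data, weight, 0.5, training=False, rng=rng) for _ in range(5)]
train_runs = [mse(data, weight, 0.5, training=True, rng=rng) for _ in range(5)]

print("evaluation variance:", statistics.pvariance(eval_runs))
print("training-mode variance:", statistics.pvariance(train_runs))
```

To measure the variability of the whole procedure, the training itself would have to be repeated with different seeds and each resulting model evaluated once.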

Regression with LSTM - python and Keras

I am trying to use an LSTM network in Keras to predict time series data one step into the future. The data has 5 dimensions, and I am trying to use the previous 3 periods of readings to predict a value in the next period. I have normalised the data and removed all NaNs, and this is the code I am using to train the network:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def Network_ii(IN, OUT, TIME_PERIOD, EPOCHS, BATCH_SIZE, LTSM_SHAPE):
    length = len(OUT)
    # 90/10 split into training and validation data
    train_x = IN[:int(0.9 * length)]
    validation_x = IN[int(0.9 * length):]
    train_y = OUT[:int(0.9 * length)]
    validation_y = OUT[int(0.9 * length):]
    # Reshape to (samples, timesteps, features) = (N, 3, 5)
    train_x = train_x.reshape(train_x.shape[0], 3, 5)
    validation_x = validation_x.reshape(validation_x.shape[0], 3, 5)
    # Define network; input_shape must match (timesteps, features) of the reshaped data
    model = Sequential()
    model.add(LSTM(units=128, return_sequences=True, input_shape=(train_x.shape[1], train_x.shape[2])))
    model.add(LSTM(units=128))
    model.add(Dense(units=1))
    model.compile(optimizer='adam', loss='mean_squared_error')
    train_y = np.asarray(train_y)
    validation_y = np.asarray(validation_y)
    history = model.fit(train_x, train_y, batch_size=BATCH_SIZE, epochs=EPOCHS, validation_data=(validation_x, validation_y))
    # Score model
    score = model.evaluate(validation_x, validation_y, verbose=0)
    print('Test loss:', score)
    # Save model
    model.save("models/new_model")
I am roughly following the steps outlined here: https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/
However, no matter how I change the number of dimensions used to train the network or the length of the time period, the model only ever predicts 1 or 0, even though the target data in the array OUT is continuous on [0, 1].
I think there may be something wrong with how I am setting up the .Sequential() function, but I cannot see what to adjust. I am relatively new to this so any help would be greatly appreciated.

You are probably using a non-standard prediction function. Maybe you are using predict_classes, which returns class labels rather than continuous values?
The standard, well-documented one is model.predict.
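The difference can be sketched with plain-Python stand-ins. These are not the Keras functions themselves; the 0.5 threshold reflects my understanding of how a class-prediction helper treats a single output unit, and the output values are hypothetical.

```python
def predict(outputs):
    """Stand-in for model.predict: return the raw continuous outputs."""
    return list(outputs)

def predict_classes(outputs, threshold=0.5):
    """Stand-in for a class-prediction helper: threshold a single output into 0 or 1."""
    return [1 if value > threshold else 0 for value in outputs]

raw_outputs = [0.12, 0.48, 0.51, 0.97]   # hypothetical regression outputs in [0, 1]

print(predict(raw_outputs))
print(predict_classes(raw_outputs))
```

For a regression target on [0, 1], only the first form preserves the information the network was trained to produce.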

Why is accuracy from fit_generator different to that from evaluate_generator in Keras?

What I do:
I am training a pre-trained CNN with Keras fit_generator(). This produces evaluation metrics (loss, acc, val_loss, val_acc) after each epoch. After training the model, I produce evaluation metrics (loss, acc) with evaluate_generator().
What I expect:
If I train the model for one epoch, I would expect that the metrics obtained with fit_generator() and evaluate_generator() are the same. They both should derive the metrics based on the entire dataset.
What I observe:
Both loss and acc differ between fit_generator() and evaluate_generator().
What I don't understand:
Why the accuracy from fit_generator() is different from that of evaluate_generator().
My code:
def generate_data(path, imagesize, nBatches):
    datagen = ImageDataGenerator(rescale=1./255)
    generator = datagen.flow_from_directory(
        directory=path,  # path to the target directory
        target_size=(imagesize, imagesize),  # dimensions to which all images found will be resized
        color_mode='rgb',  # whether the images will be converted to have 1, 3, or 4 channels
        classes=None,  # optional list of class subdirectories
        class_mode='categorical',  # type of label arrays that are returned
        batch_size=nBatches,  # size of the batches of data
        shuffle=True)  # whether to shuffle the data
    return generator
[...]
def train_model(model, nBatches, nEpochs, trainGenerator, valGenerator, resultPath):
    history = model.fit_generator(
        generator=trainGenerator,
        steps_per_epoch=trainGenerator.samples//nBatches,  # total number of steps (batches of samples)
        epochs=nEpochs,  # number of epochs to train the model
        verbose=2,  # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch
        callbacks=None,  # keras.callbacks.Callback instances to apply during training
        validation_data=valGenerator,  # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
        validation_steps=valGenerator.samples//nBatches,  # number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch
        class_weight=None,  # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function
        max_queue_size=10,  # maximum size for the generator queue
        workers=32,  # maximum number of processes to spin up when using process-based threading
        use_multiprocessing=True,  # whether to use process-based threading
        shuffle=False,  # whether to shuffle the order of the batches at the beginning of each epoch
        initial_epoch=0)  # epoch at which to start training
    print("%s: Model trained." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    # Save model
    modelPath = os.path.join(resultPath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '_modelArchitecture.h5')
    weightsPath = os.path.join(resultPath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '_modelWeights.h5')
    model.save(modelPath)
    model.save_weights(weightsPath)
    print("%s: Model saved." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    return history, model
[...]
def evaluate_model(model, generator):
    score = model.evaluate_generator(
        generator=generator,  # Generator yielding tuples
        steps=generator.samples//nBatches)  # number of steps (batches of samples) to yield from generator before stopping
    print("%s: Model evaluated:"
          "\n\t\t\t\t\t\t Loss: %.3f"
          "\n\t\t\t\t\t\t Accuracy: %.3f" %
          (datetime.now().strftime('%Y-%m-%d_%H-%M-%S'),
           score[0], score[1]))
[...]
def main():
    # Create model
    modelUntrained = create_model(imagesize, nBands, nClasses)
    # Prepare training and validation data
    trainGenerator = generate_data(imagePathTraining, imagesize, nBatches)
    valGenerator = generate_data(imagePathValidation, imagesize, nBatches)
    # Train and save model
    history, modelTrained = train_model(modelUntrained, nBatches, nEpochs, trainGenerator, valGenerator, resultPath)
    # Evaluate on validation data
    print("%s: Model evaluation (valX, valY):" % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    evaluate_model(modelTrained, valGenerator)
    # Evaluate on training data
    print("%s: Model evaluation (trainX, trainY):" % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    evaluate_model(modelTrained, trainGenerator)
Update
I found some sites that report on this issue:
The Batch Normalization layer of Keras is broken
Strange behaviour of the loss function in keras model, with pretrained convolutional base
model.evaluate() gives a different loss on training data from the one in training process
Got different accuracy between history and evaluate
ResNet: 100% accuracy during training, but 33% prediction accuracy with the same data
I tried following some of their suggested solutions without success so far. acc and loss are still different from fit_generator() and evaluate_generator(), even when using the exact same data generated with the same generator for training and validation. Here is what I tried:
statically setting the learning_phase for the entire script or before adding new layers to the pre-trained ones
K.set_learning_phase(0) # testing
K.set_learning_phase(1) # training
unfreezing all batch normalization layers from the pre-trained model
for layer in model.layers:
    if layer.name.startswith('bn'):
        layer.trainable = True
not adding dropout or batch normalization as untrained layers
# Create pre-trained base model
basemodel = ResNet50(
    include_top=False,  # exclude final pooling and fully connected layer in the original model
    weights='imagenet',  # pre-training on ImageNet
    input_tensor=None,  # optional tensor to use as image input for the model
    input_shape=(imagesize, imagesize, nBands),  # shape tuple
    pooling=None,  # output of the model will be the 4D tensor output of the last convolutional layer
    classes=nClasses)  # number of classes to classify images into
# Create new untrained layers
x = basemodel.output
x = GlobalAveragePooling2D()(x)  # global spatial average pooling layer
x = Dense(1024, activation='relu')(x)  # fully-connected layer
y = Dense(nClasses, activation='softmax')(x)  # logistic layer making sure that probabilities sum up to 1
# Create model combining pre-trained base model and new untrained layers
model = Model(inputs=basemodel.input,
              outputs=y)
# Freeze weights on pre-trained layers
for layer in basemodel.layers:
    layer.trainable = False
# Define learning optimizer
learningRate = 0.01
optimizerSGD = optimizers.SGD(
    lr=learningRate,  # learning rate
    momentum=0.9,  # parameter that accelerates SGD in the relevant direction and dampens oscillations
    decay=learningRate/nEpochs,  # learning rate decay over each update
    nesterov=True)  # whether to apply Nesterov momentum
# Compile model
model.compile(
    optimizer=optimizerSGD,  # stochastic gradient descent optimizer
    loss='categorical_crossentropy',  # objective function
    metrics=['accuracy'],  # metrics to be evaluated by the model during training and testing
    loss_weights=None,  # scalar coefficients to weight the loss contributions of different model outputs
    sample_weight_mode=None,  # sample-wise weights
    weighted_metrics=None,  # metrics to be evaluated and weighted by sample_weight or class_weight during training and testing
    target_tensors=None)  # tensor model's target, which will be fed with the target data during training
using different pre-trained CNNs as base model (VGG19, InceptionV3, InceptionResNetV2, Xception)
from keras.applications.vgg19 import VGG19
basemodel = VGG19(
    include_top=False,  # exclude final pooling and fully connected layer in the original model
    weights='imagenet',  # pre-training on ImageNet
    input_tensor=None,  # optional tensor to use as image input for the model
    input_shape=(imagesize, imagesize, nBands),  # shape tuple
    pooling=None,  # output of the model will be the 4D tensor output of the last convolutional layer
    classes=nClasses)  # number of classes to classify images into
Please let me know if there are other solutions around that I am missing.

I have now managed to get matching evaluation metrics. I changed the following:
I set seed in flow_from_directory() as suggested by @Anakin
def generate_data(path, imagesize, nBatches):
    datagen = ImageDataGenerator(rescale=1./255)
    generator = datagen.flow_from_directory(
        directory=path,  # path to the target directory
        target_size=(imagesize, imagesize),  # dimensions to which all images found will be resized
        color_mode='rgb',  # whether the images will be converted to have 1, 3, or 4 channels
        classes=None,  # optional list of class subdirectories
        class_mode='categorical',  # type of label arrays that are returned
        batch_size=nBatches,  # size of the batches of data
        shuffle=True,  # whether to shuffle the data
        seed=42)  # random seed for shuffling and transformations
    return generator
I set use_multiprocessing=False in fit_generator() according to the warning: use_multiprocessing=True and multiple workers may duplicate your data
history = model.fit_generator(
    generator=trainGenerator,
    steps_per_epoch=trainGenerator.samples//nBatches,  # total number of steps (batches of samples)
    epochs=nEpochs,  # number of epochs to train the model
    verbose=2,  # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch
    callbacks=callback,  # keras.callbacks.Callback instances to apply during training
    validation_data=valGenerator,  # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
    validation_steps=valGenerator.samples//nBatches,  # number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch
    class_weight=None,  # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function
    max_queue_size=10,  # maximum size for the generator queue
    workers=1,  # maximum number of processes to spin up when using process-based threading
    use_multiprocessing=False,  # whether to use process-based threading
    shuffle=False,  # whether to shuffle the order of the batches at the beginning of each epoch
    initial_epoch=0)  # epoch at which to start training
I unified my python setup as suggested in the keras documentation on how to obtain reproducible results using Keras during development
import numpy as np
import tensorflow as tf
import random as rn
from keras import backend as K
# Fix the seeds of NumPy, Python and TensorFlow and force single-threaded execution
np.random.seed(42)
rn.seed(12345)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
Instead of rescaling input images with datagen = ImageDataGenerator(rescale=1./255), I now generate my data with:
from keras.applications.resnet50 import preprocess_input
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
With this, I obtained similar accuracy and loss from fit_generator() and evaluate_generator(). Using the same data for training and testing now also results in similar metrics. Reasons for the remaining differences are given in the Keras documentation.
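One of those remaining differences, as I understand the Keras FAQ, is that the training metrics reported for an epoch are averaged over batches while the weights are still changing, whereas evaluation uses the final weights throughout. A minimal plain-Python sketch of that effect, using a hypothetical one-parameter model and made-up data:

```python
def batch_loss(weight, batch):
    """Mean squared error of the model y = weight * x on one batch."""
    return sum((weight * x - y) ** 2 for x, y in batch) / len(batch)

def batch_grad(weight, batch):
    """Gradient of the batch MSE with respect to weight."""
    return sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)

# Hypothetical data for y = 3x, split into four batches of two samples.
data = [(x / 10, 3 * x / 10) for x in range(1, 9)]
batches = [data[i:i + 2] for i in range(0, len(data), 2)]

weight, learning_rate = 0.0, 0.5
losses_during_epoch = []
for batch in batches:
    losses_during_epoch.append(batch_loss(weight, batch))   # measured before the update
    weight -= learning_rate * batch_grad(weight, batch)

reported_by_fit = sum(losses_during_epoch) / len(losses_during_epoch)
reported_by_evaluate = sum(batch_loss(weight, b) for b in batches) / len(batches)

print("epoch-average loss during training:", round(reported_by_fit, 4))
print("loss of the final weights:", round(reported_by_evaluate, 4))
```

The two numbers differ even though they are computed on exactly the same samples, so a small gap between the training log and a later evaluation is expected.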

Setting use_multiprocessing=False at the fit_generator level fixes the problem, but at the cost of slowing down training significantly. A better but still imperfect workaround is to set use_multiprocessing=False for the validation generator only, as in the code below, which is modified from Keras' fit_generator function.
...
try:
    if do_validation:
        if val_gen and workers > 0:
            # Create an Enqueuer that can be reused
            val_data = validation_data
            if isinstance(val_data, Sequence):
                val_enqueuer = OrderedEnqueuer(val_data,
                                               use_multiprocessing=False)  # modified line
                validation_steps = len(val_data)
            else:
                val_enqueuer = GeneratorEnqueuer(val_data,
                                                 use_multiprocessing=False)  # modified line
            val_enqueuer.start(workers=workers,
                               max_queue_size=max_queue_size)
            val_enqueuer_gen = val_enqueuer.get()
...

Training for one epoch might not be informative enough in this case. Also, your train and test data may not be exactly the same, since you are not passing a random seed to the flow_from_directory method.
You could set a seed, remove augmentations (if any), and save the trained model weights so that you can load them later to check.
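To illustrate why the seed can matter, here is a plain-Python sketch that mimics a shuffled generator run for samples // batch_size steps, as in the question's code. The helper and the sizes are hypothetical, and the actual Keras generator may reshuffle differently between epochs. The floor division leaves a remainder of samples unvisited, and without a fixed seed the unvisited samples can change from run to run.

```python
import random

def evaluated_indices(num_samples, batch_size, seed=None):
    """Indices seen when a shuffled generator runs for num_samples // batch_size steps."""
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)
    steps = num_samples // batch_size          # the remainder is never visited
    return sorted(order[:steps * batch_size])

# Hypothetical sizes: 10 samples, batches of 4, so 2 samples are skipped per pass.
first = evaluated_indices(10, 4, seed=42)
second = evaluated_indices(10, 4, seed=42)

print(first == second)
print(len(first))
```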

Keras Deep Learning and Financial Returns

I am experimenting with TensorFlow via the Keras library, and before diving into predicting uncertainty I thought it would be a good idea to predict something certain. So I tried to predict weekly returns from daily price-level data. My input has shape (1000, 5, 2), i.e. 1000 matrices of the form:
Stock A    Stock B
110        100
95         101
90         100
89         99
100        110
For Stock A the price at day t=0 is 110, 95 at t-1, and 100 at the start of the window (the last row). Thus, the weekly return for Stock A would be 110/100 - 1 = 10%, and roughly -9% (100/110 - 1) for Stock B. Because I focus only on predicting Stock A's return for now, my y for this input matrix would just be the scalar 0.10. Furthermore, I want to make it a classification problem and thus create a one-hot encoded vector via to_categorical, with class 1 if y is above 5%, class 2 if it is below -5%, and class 0 if it is in between. Hence my classification output for the aforementioned matrix would be:
0 1 0
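The labelling rule can be written out in plain Python as a check. This is a sketch with helper names of my own choosing, taking the first row as the most recent price, so that Stock A's return is 110/100 - 1 = 10% and falls into class 1.

```python
def weekly_return(prices):
    """Return of the newest price (first row) relative to the oldest (last row)."""
    return prices[0] / prices[-1] - 1.0

def to_class(ret, threshold=0.05):
    """1 if the return is above +5%, 2 if it is below -5%, otherwise 0."""
    if ret > threshold:
        return 1
    if ret < -threshold:
        return 2
    return 0

def one_hot(index, num_classes=3):
    """One-hot vector with a 1 at position index."""
    return [1 if i == index else 0 for i in range(num_classes)]

stock_a = [110, 95, 90, 89, 100]     # column "Stock A" from the example matrix
stock_b = [100, 101, 100, 99, 110]   # column "Stock B"

ret_a = weekly_return(stock_a)
print(round(ret_a, 4), one_hot(to_class(ret_a)))
print(round(weekly_return(stock_b), 4))
```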
To simplify: I want my model to learn to calculate returns, i.e. divide the first value in the input matrix by the last value of the input matrix for stock A and ignore the input for stock B. This would give the y. It is just a practice task for me before I get to more difficult tasks and the model should achieve a loss of zero because there is no uncertainty. What model do you propose to do that? I tried the following and it does not converge at all. Training and validation weights are calculated via compute_sample_weight('balanced', ).
Earlystop = EarlyStopping(monitor='val_loss', patience=150, mode='min', verbose=1, min_delta=0.0002, restore_best_weights=True)
checkpoint = ModelCheckpoint('nn', monitor='val_loss', verbose=1, save_best_only=True, mode='min', save_weights_only=False)
Plateau = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=30, verbose=1)
optimizer = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, amsgrad=True)
input_ = Input(batch_shape=(batch_size, 1, 5, 2))
model = LocallyConnected2D(16, kernel_size=(5, 1), padding='valid', data_format="channels_first")(input_)
model = LeakyReLU(alpha=0.01)(model)
model = Dense(128)(model)
model = LeakyReLU(alpha=0.01)(model)
model = Flatten()(model)
x1 = Dense(3, activation='softmax', name='0')(model)
final_model = Model(inputs=input_, outputs=[x1])
final_model.compile(loss='categorical_crossentropy' , optimizer=optimizer, metrics=['accuracy'])
history = final_model.fit(X_train, y_train, epochs=1000, batch_size=batch_size, verbose=2, shuffle=False, validation_data=[X_valid, y_valid, valid_weight], sample_weight=train_weight, callbacks=[Earlystop, checkpoint, Plateau])
I thought convolution might be good for this, and because every return is calculated individually I decided to go for a LocallyConnected layer. Do I need to add more layers for such a simple task?
EDIT: I transformed my input matrix to returns and the model converges successfully. So the input must be correct, but the model fails to find the division function. Are there any layers that would be suited to do that?
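One possible direction, which is not from the original post: division becomes subtraction after taking logarithms, since log(a/b) = log(a) - log(b), and a difference is something a single linear unit can represent. The sketch below only demonstrates the identity in plain Python; the weights (1, -1) are what such a unit would have to learn, and feeding log-prices to the network is an assumption about how one might use it.

```python
import math

def log_return(first_price, last_price):
    """log(first / last) written as a weighted sum of log-prices."""
    weights = (1.0, -1.0)                       # what a single linear unit would need to learn
    log_inputs = (math.log(first_price), math.log(last_price))
    return sum(w * v for w, v in zip(weights, log_inputs))

first_price, last_price = 110.0, 100.0          # Stock A values from the example matrix
simple_return = first_price / last_price - 1.0
recovered = math.exp(log_return(first_price, last_price)) - 1.0

print(round(simple_return, 6), round(recovered, 6))
```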
