I have come across some odd behaviours when training CNNs with TensorFlow 2.0 and would appreciate any help in solving them.
I am doing transfer learning (just training the classification head) using the pre-trained networks available in 'tensorflow.keras.applications' and have noticed the following:
For the first epoch, validation metrics are always zero, no matter what I do.
When training after the first epoch, the training metrics improve as you would expect, but the validation metrics are essentially random guesses, even when the EXACT same dataset is used as both the training and the validation dataset. It is as if it isn't using the model being trained to do its evaluation.
I have tried VGG16, MobileNetV2, and ResNet50V2, and they all exhibit the same behaviours.
The configurations I am able to reproduce this on are:
Ubuntu 18.04LTS, Nvidia RTX2080ti with driver version 430.50, CUDA10.0, TensorFlow-gpu==2.0.0
MacBook Pro, TensorFlow==2.0.0 (cpu)
Both are running in Conda environments and I have installed TensorFlow with pip. I have put some sample code below to show the essence of my workflow, just in case I am doing anything obviously stupid. Any help would be much appreciated, as I am at a loss as to how to fix it.
import os
import tensorflow as tf

def parse_function(example_proto):
    image_feature_description = {
        'label': tf.io.FixedLenFeature([], tf.int64),
        'image_raw': tf.io.FixedLenFeature([], tf.string)
    }
    parsed_example = tf.io.parse_single_example(example_proto, image_feature_description)
    image = tf.io.decode_image(
        parsed_example['image_raw'],
        channels=3,
        dtype=tf.float32,
        expand_animations=False
    )
    image = tf.image.per_image_standardization(image)
    label = tf.one_hot(parsed_example['label'], 24, dtype=tf.float32)
    return (image, label)

def load_dataset(TFRecord_dir, record_name):
    record_files = tf.io.matching_files(os.path.join(TFRecord_dir, record_name + '.tfrecords-????'))
    shards = tf.data.TFRecordDataset(record_files)
    shards = shards.shuffle(tf.cast(tf.shape(record_files)[0], tf.int64))
    dataset = shards.map(map_func=parse_function)
    dataset = dataset.batch(batch_size=16, drop_remainder=True)
    dataset = dataset.prefetch(16)
    return dataset
base_model = tf.keras.applications.ResNet50V2(
    input_shape=(224, 224, 3),
    weights='imagenet',
    include_top=False
)
base_model.trainable = False

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(24, activation='softmax')
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[
        tf.keras.metrics.CategoricalAccuracy(),
        tf.keras.metrics.TopKCategoricalAccuracy(),
        tf.keras.metrics.Precision(),
        tf.keras.metrics.Recall()
    ])

train_dataset = load_dataset(train_dir, 'train')

model.fit(train_dataset,
          verbose=1,
          epochs=5,
          validation_data=train_dataset)

model.evaluate(train_dataset)
When training after the first epoch, the training metrics improve as you would expect, but the validation metrics are essentially random guesses, even when the EXACT same dataset is used as both the training and the validation dataset. It is as if it isn't using the model being trained to do its evaluation.
This suggests that your network is not able to generalize and is just overfitting. Random guessing means you get roughly 1/n accuracy, where n is the number of classes.
You may want to start with a much lower learning_rate (e.g. 1e-5), and then also unfreeze some of the base model's layers close to your GAP + Dropout + Dense head.
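For illustration, a minimal sketch of that suggestion, reusing the base_model and model from the question (the choice of how many layers to unfreeze is an assumption, not something from the original post):

# Sketch only: fine-tune with a lower learning rate and a partially unfrozen base model.
base_model.trainable = True
for layer in base_model.layers[:-10]:   # keep all but the last ~10 layers frozen (arbitrary cut-off)
    layer.trainable = False

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),   # much lower LR for fine-tuning
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.CategoricalAccuracy()]
)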
I no longer have this problem since I started using the provided Docker images instead. Something must have been installed incorrectly, but I don't know what.
Also, for anyone in the same position: I found out during debugging that if you are normalising your images with image = (image/127.5) - 1, as in the transfer learning with a pre-trained CNN tutorial, you should change it to image = tf.image.per_image_standardization(image). The former exhibited the same behaviour even in the Docker container, i.e. the training metrics improved but the validation metrics remained random on the same dataset used for training.
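For clarity, a minimal sketch of that change as it would appear in a normalisation step (the helper function name is mine, not from the original code):

import tensorflow as tf

def normalize(image):
    # Tutorial-style scaling that showed the problem for me:
    # return (image / 127.5) - 1
    # Per-image standardization that worked:
    return tf.image.per_image_standardization(image)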
Related
I have a text classification task that I am trying to do using BERT. Below is the code I am using. The model training code (below) works fine, but I am facing an issue with the prediction part.
from transformers import TFBertForSequenceClassification
import tensorflow as tf

# recommended learning rates for Adam: 5e-5, 3e-5, 2e-5
learning_rate = 5e-5
nlabels = 26

# we will do just 1 epoch for illustration, though multiple epochs might be better as long as we do not overfit the model
number_of_epochs = 1

# model initialization
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased',
                                                        num_labels=nlabels,
                                                        output_attentions=False,
                                                        output_hidden_states=False)

# optimizer Adam
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=1e-08)

# we do not have one-hot vectors, so we can use sparse categorical cross entropy and accuracy
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')

model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
bert_history = model.fit(ds_tr_encoded, epochs=number_of_epochs)
I am getting the output using the following
preds = model.predict(ds_te_encoded)
pred_labels_idx = np.argmax(preds['logits'], axis=1)
The issue I am facing is that the length of pred_labels_idx is not the same as the cardinality of ds_te_encoded:
len(pred_labels_idx) #426820
tf.data.experimental.cardinality(ds_te_encoded) #<tf.Tensor: shape=(), dtype=int64, numpy=21341>
Not sure why this is happening.
Since ds_te_encoded is of type tf.data.Dataset and you call cardinality(...) on it, the cardinality in your case is simply the number of batches (rounded up), not the number of samples. So I am assuming you are using a batch size of 20, because 426820 / 20 = 21341. That is probably what is causing the confusion.
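A small, self-contained check of this behaviour (the dataset below is a stand-in, not your data):

import tensorflow as tf

ds = tf.data.Dataset.range(426820).batch(20)

# cardinality() counts batches once the dataset is batched ...
print(tf.data.experimental.cardinality(ds).numpy())   # 21341

# ... while the number of individual samples can be counted explicitly (slow for big datasets):
print(sum(1 for _ in ds.unbatch()))                    # 426820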
I would like to save the loss data while training my Mask R-CNN, but I seem to be missing something. The training is working, but I'm getting the error:
AttributeError: 'NoneType' object has no attribute 'history'
history = model.train(dataset_train, dataset_val,
                      learning_rate=config.LEARNING_RATE,
                      epochs=EPOCH_NUMBER,
                      augmentation=augmentation,
                      layers='heads', custom_callbacks=custom_callbacks)

df = pd.DataFrame(history.history).to_excel(LOSS_DIR)
I'm not even sure if this is the right approach but it seemed easy enough.
This part of the code calls the following function in model.py. The only things I added, at the very end, are the history = ... assignment and return history; it used to just call fit_generator(...):
def train(self, train_dataset, val_dataset, learning_rate, epochs, layers,
          augmentation=None, custom_callbacks=None, no_augmentation_sources=None):
    """Train the model.
    train_dataset, val_dataset: Training and validation Dataset objects.
    learning_rate: The learning rate to train with
    epochs: Number of training epochs. Note that previous training epochs
            are considered to be done already, so this actually determines
            the epochs to train in total rather than in this particular
            call.
    layers: Allows selecting which layers to train. It can be:
        - A regular expression to match layer names to train
        - One of these predefined values:
          heads: The RPN, classifier and mask heads of the network
          all: All the layers
          3+: Train Resnet stage 3 and up
          4+: Train Resnet stage 4 and up
          5+: Train Resnet stage 5 and up
    augmentation: Optional. An imgaug (https://github.com/aleju/imgaug)
        augmentation. For example, passing imgaug.augmenters.Fliplr(0.5)
        flips images right/left 50% of the time. You can pass complex
        augmentations as well. This augmentation applies 50% of the
        time, and when it does it flips images right/left half the time
        and adds a Gaussian blur with a random sigma in range 0 to 5.
            augmentation = imgaug.augmenters.Sometimes(0.5, [
                imgaug.augmenters.Fliplr(0.5),
                imgaug.augmenters.GaussianBlur(sigma=(0.0, 5.0))
            ])
    custom_callbacks: Optional. Add custom callbacks to be called
        with the keras fit_generator method. Must be list of type keras.callbacks.
    no_augmentation_sources: Optional. List of sources to exclude for
        augmentation. A source is string that identifies a dataset and is
        defined in the Dataset class.
    """
    assert self.mode == "training", "Create model in training mode."

    # Pre-defined layer regular expressions
    layer_regex = {
        # all layers but the backbone
        "heads": r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
        # From a specific Resnet stage and up
        "3+": r"(res3.*)|(bn3.*)|(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
        "4+": r"(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
        "5+": r"(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
        # All layers
        "all": ".*",
    }
    if layers in layer_regex.keys():
        layers = layer_regex[layers]

    # Data generators
    train_generator = data_generator(train_dataset, self.config, shuffle=True,
                                     augmentation=augmentation,
                                     batch_size=self.config.BATCH_SIZE,
                                     no_augmentation_sources=no_augmentation_sources)
    val_generator = data_generator(val_dataset, self.config, shuffle=True,
                                   batch_size=self.config.BATCH_SIZE)

    # Create log_dir if it does not exist
    if not os.path.exists(self.log_dir):
        os.makedirs(self.log_dir)

    # Callbacks
    callbacks = [
        keras.callbacks.TensorBoard(log_dir=self.log_dir,
                                    histogram_freq=0, write_graph=True, write_images=False),
        keras.callbacks.ModelCheckpoint(self.checkpoint_path,
                                        verbose=0, save_weights_only=True),
    ]

    # Add custom callbacks to the list
    if custom_callbacks:
        callbacks += custom_callbacks

    # Train
    log("\nStarting at epoch {}. LR={}\n".format(self.epoch, learning_rate))
    log("Checkpoint Path: {}".format(self.checkpoint_path))
    self.set_trainable(layers)
    self.compile(learning_rate, self.config.LEARNING_MOMENTUM)

    # Work-around for Windows: Keras fails on Windows when using
    # multiprocessing workers. See discussion here:
    # https://github.com/matterport/Mask_RCNN/issues/13#issuecomment-353124009
    if os.name is 'nt':
        workers = 1
    else:
        workers = multiprocessing.cpu_count()

    history = self.keras_model.fit_generator(
        train_generator,
        initial_epoch=self.epoch,
        epochs=epochs,
        steps_per_epoch=self.config.STEPS_PER_EPOCH,
        callbacks=callbacks,
        validation_data=val_generator,
        validation_steps=self.config.VALIDATION_STEPS,
        max_queue_size=100,
        workers=0,
        use_multiprocessing=False,
    )
    self.epoch = max(self.epoch, epochs)
    return history
In the documentation of fit_generator() it says that it returns a History object, but it looks like it doesn't? I'm very new to machine learning and to working on projects like this in general, so I'm sorry if this is a stupid question or if I forgot some crucial information.
I believe that model.fit_generator is deprecated; in TensorFlow 2.2 and higher you can just use model.fit, which now supports generators.
https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit_generator
As of TensorFlow 2.1, if you call fit_generator you will get a warning like this one:
Model.fit_generator (from tensorflow.python.keras.engine.training) is
deprecated and will be removed in a future version.
Instructions for updating:
Please use Model.fit, which supports generators.
It prompts you to use model.fit instead, which now supports generators.
https://www.codesofinterest.com/2020/09/model-fit-vs-fit-generator-tfkeras.html
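As a small, self-contained sketch of the switch (the toy model and generator below are placeholders, not the Mask R-CNN code): Model.fit accepts a Python generator directly and returns a History object.

import numpy as np
import tensorflow as tf

def toy_generator(batch_size=8):
    # yields (inputs, labels) batches forever, like a Keras data generator
    while True:
        x = np.random.rand(batch_size, 4).astype("float32")
        y = np.random.randint(0, 2, size=(batch_size,))
        yield x, y

model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="softmax", input_shape=(4,))])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

history = model.fit(toy_generator(), steps_per_epoch=5, epochs=1)  # no fit_generator needed
print(history.history)  # History object with per-epoch metrics, e.g. {'loss': [...]}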
I'm trying to prune a base model that consists of several layers on top of a VGG network. It also contains a user-defined layer named instance_normalization. For pruning to be successful, I've defined the get_prunable_weights function of this layer as follows:
### defined for model pruning
def get_prunable_weights(self):
    return self.weights
I used the following function to obtain a to-be-pruned model structure using a base model named model:
def define_prune_model(self, model, img_shape, epochs, batch_size, validation_split=0.1):
    num_images = img_shape[0] * (1 - validation_split)
    end_step = np.ceil(num_images / batch_size).astype(np.int32) * epochs

    # Define model for pruning.
    pruning_params = {
        'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.5,
                                                                 final_sparsity=0.80,
                                                                 begin_step=0,
                                                                 end_step=end_step)
    }
    model_for_pruning = prune_low_magnitude(model, **pruning_params)

    model_for_pruning.compile(optimizer='adam',
                              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                              metrics=['accuracy'])
    model_for_pruning.summary()

    return model_for_pruning
Then, I wrote the following function to perform training on this pruning model:
def train_prune_model(self, model_for_pruning, train_images, train_labels,
                      epochs, batch_size, validation_split=0.1):
    callbacks = [
        tfmot.sparsity.keras.UpdatePruningStep(),
        tfmot.sparsity.keras.PruningSummaries(log_dir='./models/pruned'),
    ]
    model_for_pruning.fit(train_images, train_labels,
                          batch_size=batch_size, epochs=epochs, validation_split=validation_split,
                          callbacks=callbacks)
    return model_for_pruning
However, when training, I found that the training and validation losses were all nan, and the final model's predictions were all zero. The base model passed to define_prune_model, on the other hand, had trained successfully and predicted correctly.
How can I solve this? Thank you in advance.
It is difficult to pinpoint the issue without more information. In particular, can you please give more detail (preferably as code) about your custom instance_normalization layer?
Assuming that the code is fine: since you mentioned that the model trains correctly without pruning, could it be that those pruning parameters are too harsh? After all, they set 50% of the weights to zero right from the first learning step.
Here is what I would try (a rough sketch follows this list):
Experiment with a lower level of sparsity (especially initial_sparsity).
Start applying pruning later during the training (the begin_step argument of the pruning schedule). Some even prefer to train the model once without applying pruning at all, and then re-train it with prune_low_magnitude().
Only prune at some steps, giving the model time to recover between prunings (the frequency argument).
Finally, should it still fail, try the usual cures for nan losses: reduce the learning rate, use regularization or gradient clipping, ...
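A minimal sketch of a gentler schedule along those lines (the concrete numbers are placeholders, and model / end_step are assumed to come from your define_prune_model):

import tensorflow_model_optimization as tfmot

pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,     # start from no sparsity instead of 0.5
        final_sparsity=0.5,       # end lower than 0.8
        begin_step=1000,          # placeholder: let the model train a bit before pruning starts
        end_step=end_step,
        frequency=200             # placeholder: prune every 200 steps so the model can recover
    )
}
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)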
I have a question regarding the evaluation of an LSTM model. I have trained an LSTM model and stored it with model.save(...). Now I want to load the model and evaluate it on the validation dataset. Since neural networks are stochastic, I run the evaluation several times and compute the mean and the variance of the different metrics I am interested in.
Now I am shocked that after the first run, all consecutive runs have the same performance on every metric. I don't think that is right, but I don't know where the error occurs.
So my question is:
what is my mistake in setting up the validation of my model?
and how can I fix that?
Here are the code snippets that should explain what I am doing:
Compile and fit the Model
def compile_and_fit(hparams,
                    MAX_EPOCHS,
                    model_path):

    window = WindowGenerator(input_width=hparams[HP_WINDOW_SIZE],
                             label_width=hparams[HP_WINDOW_SIZE], shift=1,
                             label_columns=['q_MARI'], batch_size=hparams[HP_BATCH_SIZE])

    model = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(hparams[HP_NUM_UNITS], return_sequences=True, name="LSTM_1"),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT], name="Dropout_1"),
        tf.keras.layers.LSTM(hparams[HP_NUM_UNITS], return_sequences=True, name="LSTM_2"),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
    ])

    learning_rate = hparams[HP_LEARNING_RATE]

    model.compile(loss=tf.losses.MeanSquaredError(),
                  optimizer=tf.optimizers.Adam(learning_rate=learning_rate),
                  metrics=get_metrics())

    history = model.fit(window.train,
                        epochs=MAX_EPOCHS,
                        validation_data=window.val,
                        callbacks=get_callbacks(model_path))

    _, a, _, _, _, _ = model.evaluate(window.val)

    return a, model, history
Train and save it
a, model, history = compile_and_fit( hparams = hparams, MAX_EPOCHS = MAX_EPOCHS, model_path = run_path)
model.save(run_path)
Load and evaluate it
model = tf.keras.models.load_model(os.path.join(hparam_path, model_name),
                                   custom_objects={"max_error": max_error,
                                                   "median_absolute_error": median_absolute_error,
                                                   "rev_metric": rev_metric,
                                                   "nse_metric": nse_metric})
model.compile(loss=tf.losses.MeanSquaredError(), optimizer="adam", metrics=get_metrics())

metric_values = np.empty(shape=(nr_runs, len(metrics)), dtype=float)
for j in range(nr_runs):
    window = WindowGenerator(input_width=hparam_vals[i], label_width=hparam_vals[i], shift=1,
                             label_columns=['q_MARI'])
    metric_values[j] = np.array(model.evaluate(window.val))

means = metric_values.mean(axis=0)
varis = metric_values.var(axis=0)
print(f'means: {means}, varis: {varis}')
The results I am getting
For setting up the training, I followed these two guides:
https://www.tensorflow.org/tutorials/structured_data/time_series
https://www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams
An LSTM is not stochastic at evaluation time. Evaluation results should be the same for the same model and the same data.
There are two stages: when you train the model, randomness influences the weights you end up with. However, once you have saved that model, its predictions will be identical every time you evaluate it on the same data.
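So to get a meaningful mean and variance, the randomness has to come from retraining, not from re-evaluating one saved model. A rough sketch, reusing your compile_and_fit and WindowGenerator (the per-run seeding is my assumption, not part of your code):

import numpy as np
import tensorflow as tf

results = []
for run in range(nr_runs):
    tf.random.set_seed(run)  # a different seed (or none at all) per run -> different trained weights
    _, model, _ = compile_and_fit(hparams=hparams, MAX_EPOCHS=MAX_EPOCHS, model_path=run_path)
    window = WindowGenerator(input_width=hparams[HP_WINDOW_SIZE], label_width=hparams[HP_WINDOW_SIZE],
                             shift=1, label_columns=['q_MARI'], batch_size=hparams[HP_BATCH_SIZE])
    results.append(model.evaluate(window.val))

results = np.array(results)
print(f'means: {results.mean(axis=0)}, varis: {results.var(axis=0)}')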
I have followed this EMNIST tutorial to create an image classification experiment (7 classes), with the aim of training a classifier on 3 silos of data with the TFF framework.
Before training begins, I convert the model to a tf.keras model using tff.learning.assign_weights_to_keras_model(model, state.model) so I can evaluate it on my validation set. Regardless of the label, the model only predicts one class. This is to be expected, as no training of the model has occurred yet. However, I repeat this step after each federated averaging round and the problem persists: all validation images are predicted as one class. I also save the tf.keras model weights after each round and make predictions on the test set - no changes.
Some of the steps I have taken to check the source of the issue:
Checked if the tf keras model weights are updating when the FL model is converted after each round - they are updating.
Ensured that the buffer size is greater than the training dataset size for each client.
Compared the predictions to the class distribution in the training datasets. There is a class imbalance but the one class that the model predicts is not necessarily the majority class. Also, it is not always the same class. For the most part, it predicts only class 0.
Increased the number of rounds to 5 and epochs per round to 10. This is computationally very intensive as it is quite a large model being trained with approx 1500 images per client.
Investigated the TensorBoard logs from each training attempt. The training loss is decreasing as the round progresses.
Tried a much simpler model - a basic CNN with 2 conv layers. This allowed me to greatly increase the number of epochs and rounds. When evaluating this model on the test set, it predicted 4 different classes, but the performance remained very poor. This would indicate that I just need to increase the number of rounds and epochs for my original model to increase the variation in predictions, which is difficult due to the large training time that would result.
Model details:
The model uses the XceptionNet as the base model with the weights unfrozen. This performs well on the classification task when all the training images are pooled into a global dataset. Our aim is to hopefully achieve a comparable performance with FL.
base_model = Xception(include_top=False,
                      weights=weights,
                      pooling='max',
                      input_shape=input_shape)
x = GlobalAveragePooling2D()( x )
predictions = Dense( num_classes, activation='softmax' )( x )
model = Model( base_model.input, outputs=predictions )
Here is my training code:
def fit(self):
    """Train FL model"""
    # self.load_data()
    summary_writer = tf.summary.create_file_writer(
        self.logs_dir
    )

    federated_averaging = self._construct_iterative_process()
    state = federated_averaging.initialize()

    tfkeras_model = self._convert_to_tfkeras_model(state)
    print(np.argmax(tfkeras_model.predict(self.val_data), axis=-1))
    val_loss, val_acc = tfkeras_model.evaluate(self.val_data, steps=100)

    with summary_writer.as_default():
        for round_num in tqdm(range(1, self.num_rounds), ascii=True, desc="FedAvg Rounds"):
            print("Beginning fed avg round...")

            # Round of federated averaging
            state, metrics = federated_averaging.next(
                state,
                self.training_data
            )
            print("Fed avg round complete")

            # Saving logs
            for name, value in metrics._asdict().items():
                tf.summary.scalar(
                    name,
                    value,
                    step=round_num
                )
            print("round {:2d}, metrics={}".format(round_num, metrics))

            tff.learning.assign_weights_to_keras_model(
                tfkeras_model,
                state.model
            )
            # tfkeras_model = self._convert_to_tfkeras_model(
            #     state
            # )

            val_metrics = {}
            val_metrics["val_loss"], val_metrics["val_acc"] = tfkeras_model.evaluate(
                self.val_data,
                steps=100
            )
            for name, metric in val_metrics.items():
                tf.summary.scalar(
                    name=name,
                    data=metric,
                    step=round_num
                )

            self._checkpoint_tfkeras_model(
                tfkeras_model,
                round_num,
                self.checkpoint_dir
            )

def _checkpoint_tfkeras_model(self,
                              model,
                              round_number,
                              checkpoint_dir):
    # Obtaining model dir path
    model_dir = os.path.join(
        checkpoint_dir,
        f'round_{round_number}',
    )
    # Creating directory
    pathlib.Path(
        model_dir
    ).mkdir(
        parents=True
    )
    model_path = os.path.join(
        model_dir,
        f'model_file_round{round_number}.h5'
    )
    # Saving model
    model.save(
        model_path
    )

def _convert_to_tfkeras_model(self, state):
    """Converts the global TFF model to a TF Keras model.

    Takes the weights of the global model
    and pushes them back into a standard
    Keras model.

    Args:
        state: The state of the FL server
            containing the model and
            optimization state

    Returns:
        (model): TF Keras model
    """
    model = self._load_tf_keras_model()
    model.compile(
        loss=self.loss,
        metrics=self.metrics
    )
    tff.learning.assign_weights_to_keras_model(
        model,
        state.model
    )
    return model

def _load_tf_keras_model(self):
    """Loads tf keras models

    Raises:
        KeyError: A model name was not defined
            correctly

    Returns:
        (model): TF keras model object
    """
    model = create_models(
        model_type=self.model_type,
        input_shape=[self.img_h, self.img_w, 3],
        freeze_base_weights=self.freeze_weights,
        num_classes=self.num_classes,
        compile_model=False
    )
    return model

def _define_model(self):
    """Model creation function"""
    model = self._load_tf_keras_model()
    tff_model = tff.learning.from_keras_model(
        model,
        dummy_batch=self.sample_batch,
        loss=self.loss,
        # Using self.metrics throws an error
        metrics=[tf.keras.metrics.CategoricalAccuracy()]
    )
    return tff_model

def _construct_iterative_process(self):
    """Constructing federated averaging process"""
    iterative_process = tff.learning.build_federated_averaging_process(
        self._define_model,
        client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
        server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0)
    )
    return iterative_process
Increased the number of rounds to 5 ...
Running only a few rounds of federated learning sounds insufficient. One of the earliest Federated Averaging papers (McMahan 2016) required running for hundreds of rounds when the MNIST data had non-IID splits. More recent work (Reddi 2020) required thousands of rounds for CIFAR-100. One thing to note is that each "round" is one "step" of the global model. That step may be larger with more client epochs, but the client updates are averaged, and diverging clients may reduce the magnitude of the global step.
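As an illustration only (the constant is a placeholder, and the process/data names follow the question's code), the change is usually just to run the same next() loop for many more rounds:

NUM_ROUNDS = 500  # placeholder: hundreds or thousands of rounds, not 5

state = federated_averaging.initialize()
for round_num in range(1, NUM_ROUNDS + 1):
    state, metrics = federated_averaging.next(state, training_data)
    print('round {:3d}, metrics={}'.format(round_num, metrics))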
I also save the tf keras model weights after each round and make predictions on the test set - no changes.
This can be concerning. It will be easier to debug if you could share the code used in the FL training loop.
Not sure this is an answer, more of a related observation.
I've been trying to characterize the learning process (accuracy and loss) on the Federated Learning for Image Classification notebook tutorial with TFF.
I'm seeing major improvements in speed of convergence by modifying the epoch hyperparameter, changing epochs to 5, 10, 20, etc. But I'm also seeing a major increase in training accuracy. I suspect overfitting is occurring, though when I then evaluate on the test set, accuracy is still high.
Wondering what is going on?
My understanding is that the epoch parameter controls the number of forward/backward passes on each client per round of training. Is this correct? So, e.g., 10 rounds of training on 10 clients with 10 epochs would be 10 epochs x 10 clients x 10 rounds. I realise a larger range of clients is needed, etc., but I was expecting to see poorer accuracy on the test set.
What can I do to see what's going on? Could I use the evaluation check with something like learning curves to see if overfitting is occurring?
test_metrics = evaluation(state.model, federated_test_data) only appears to give a single data point; how can I get the individual test accuracy for each test example evaluated?
Appreciate any thoughts you may have on the matter, Colin . . .