I am working on a text classification problem. My model looks like this:
Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_6 (Embedding) (None, 100, 50) 676050
_________________________________________________________________
lstm_6 (LSTM) (None, 16) 4288
_________________________________________________________________
dropout_1 (Dropout) (None, 16) 0
_________________________________________________________________
dense_6 (Dense) (None, 3) 51
=================================================================
Total params: 680,389
Trainable params: 680,389
Non-trainable params: 0
_________________________________________________________________
The dataset contains around 5,300 sentences, and I am using validation_split=0.33.
The model behaves abnormally: the validation loss keeps increasing while the validation accuracy stays roughly constant. I am attaching the graph.
Please guide me on how to solve this issue.
The code for the model is:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, GlobalMaxPool1D, Dense

model = Sequential()
model.add(Embedding(
    num_words,
    EMBEDDING_DIM,
    input_length=MAX_SEQUENCE_LENGTH
))
model.add(LSTM(32, return_sequences=True))
model.add(Dropout(0.5))
model.add(GlobalMaxPool1D())
model.add(Dense(len(possible_labels), activation="softmax"))
I am also attaching the accuracy graph.
Increase dropout.
Train for fewer epochs.
Try Conv1D instead of LSTM to see whether the overfitting goes away; a sketch of that variant is below.
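For the Conv1D suggestion, here is a minimal sketch of what such a variant could look like, assuming the same num_words, EMBEDDING_DIM, MAX_SEQUENCE_LENGTH and possible_labels used in the question; it is a starting point, not a definitive fix:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPool1D, Dropout, Dense

# Sketch only: the vocabulary/embedding settings from the question are assumed to exist.
model = Sequential()
model.add(Embedding(num_words, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH))
model.add(Conv1D(32, kernel_size=3, activation="relu"))  # replaces the LSTM layer
model.add(GlobalMaxPool1D())
model.add(Dropout(0.5))
model.add(Dense(len(possible_labels), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])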
I am predicting network intrusion attacks on the CICIDS2017 dataset using a trained LSTM model with the following architecture:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 30) 13080
dropout (Dropout) (None, 30) 0
softmax (Dense) (None, 2) 62
=================================================================
Total params: 13,142
Trainable params: 13,142
Non-trainable params: 0
_________________________________________________________________
I have applied MinMaxScaler() for normalization and SMOTE to fix the class imbalance in the dataset.
Now, when I predict for a single sample, I always get label "1", which corresponds to the Attack category.
I have already tried checking and verifying the features, but it didn't help.
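For reference, a minimal sketch of the preprocessing described above; the arrays X_train, y_train and the raw sample x_new are assumptions (they are not shown in the post), and the final reshape has to match the model's actual (timesteps, features) input:

from sklearn.preprocessing import MinMaxScaler
from imblearn.over_sampling import SMOTE

# Fit the scaler on the training data only, then oversample with SMOTE.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_res, y_res = SMOTE().fit_resample(X_train_scaled, y_train)

# A single sample at prediction time must go through the SAME fitted scaler
# and be reshaped to (1, timesteps, n_features) before calling predict.
x_scaled = scaler.transform(x_new.reshape(1, -1))
pred = model.predict(x_scaled.reshape(1, 1, -1))  # assumes timesteps=1; adjust to the model's input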
I'm trying to get some heatmaps from a computer vision model that already classifies images correctly, but I'm running into some difficulties.
This is the model summary:
model.summary()
Model: "model_4"
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) [(None, 512, 512, 1)] 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 512, 512, 3) 30
_________________________________________________________________
densenet121 (Functional) (None, 1024) 7037504
_________________________________________________________________
dense_4 (Dense) (None, 100) 102500
_________________________________________________________________
dropout_4 (Dropout) (None, 100) 0
_________________________________________________________________
predictions (Dense) (None, 2) 202
=================================================================
Total params: 7,140,236
Trainable params: 7,056,588
Non-trainable params: 83,648
As part of the standard process for creating a heatmap, I know I have to access the last convolutional layer in the model, which in this case is a layer inside the DenseNet121, but I cannot find a way to access the layers belonging to densenet121.
Right now I've been using the conv2d_4 layer to run some tests, but I feel that is not the right way, because that layer comes before all the transfer-learning work done by DenseNet.
I also looked for "Functional" layers in the official Keras documentation but couldn't find them, so I guess it's not really a layer; it's the whole DenseNet model embedded there, but I cannot find a way to access it.
By the way, here is the model-construction code, since it may help to answer this:
from tensorflow.keras.applications.densenet import DenseNet121
from tensorflow.keras.layers import Input, Conv2D, Dense, Dropout
from tensorflow.keras.models import Model

num_classes = 2

input_tensor = Input(shape=(IMG_SIZE, IMG_SIZE, 1))
x = Conv2D(3, (3, 3), padding='same')(input_tensor)  # map the 1-channel input to 3 channels for DenseNet
x = DenseNet121(include_top=False, classes=2, pooling="avg", weights="imagenet")(x)
x = Dense(100)(x)
x = Dropout(0.45)(x)
predictions = Dense(num_classes, activation='softmax', name="predictions")(x)
model = Model(inputs=input_tensor, outputs=predictions)
I found you can use
.get_layer()
twice to access layers inside the functional DenseNet model embedded in the "main" model.
In this case I can use model.get_layer('densenet121').summary() to check all the layers inside the embedded model, and then access them with: model.get_layer('densenet121').get_layer('xxxxx')
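A minimal sketch of that approach; the inner layer name below is only a placeholder, pick a real one from the printed summary:

# List every sub-layer of the DenseNet121 model embedded in the main model.
model.get_layer('densenet121').summary()

# Then reach an inner layer by chaining get_layer(); 'conv5_block16_concat' is
# just an example name, use one of the names printed by the summary above.
inner_layer = model.get_layer('densenet121').get_layer('conv5_block16_concat')
print(inner_layer.output_shape)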
I have a simple multi-layer perceptron for the MNIST classification problem.
model = tf.keras.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
When printing the summary, I receive the following output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_8 (Flatten) (None, 784) 0
_________________________________________________________________
dense_16 (Dense) (None, 128) 100480
_________________________________________________________________
dense_17 (Dense) (None, 10) 1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
How do I interpret the output shapes printed in the summary? Why is there a None term in the output shape tuple? Why is it not just (784) for the first layer?
The "None" value refers to the number of input samples (the batch size). To allow you to train on different sized training sets, this value is None. If it were a number, let's say 50 for example, that means you can only train on exactly 50 samples which is usually not very useful (but does occasionally have applications).
I want to fine-tune EfficientNet using tf.keras (TensorFlow 2.3), but I cannot change the training status of the layers properly. My model looks like this:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import EfficientNetB3

data_augmentation_layers = tf.keras.Sequential([
    keras.layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),
    keras.layers.experimental.preprocessing.RandomRotation(0.8)])

efficientnet = EfficientNetB3(weights="imagenet", include_top=False,
                              input_shape=(*img_size, 3))

# Setting to not trainable, as described in the standard Keras FAQ
efficientnet.trainable = False

inputs = keras.layers.Input(shape=(*img_size, 3))
augmented = data_augmentation_layers(inputs)
base = efficientnet(augmented, training=False)  # keep the base (e.g. BatchNorm) in inference mode
pooling = keras.layers.GlobalAveragePooling2D()(base)
outputs = keras.layers.Dense(5, activation="softmax")(pooling)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer=keras_opt, metrics=["categorical_accuracy"])
This is done so that the randomly initialized weights of my custom top won't destroy the pretrained weights right away. The resulting summary is:
Model: "functional_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 512, 512, 3)] 0
_________________________________________________________________
sequential (Sequential) (None, 512, 512, 3) 0
_________________________________________________________________
efficientnetb3 (Functional) (None, 16, 16, 1536) 10783535
_________________________________________________________________
global_average_pooling2d (Gl (None, 1536) 0
_________________________________________________________________
dense (Dense) (None, 5) 7685
=================================================================
Total params: 10,791,220
Trainable params: 7,685
Non-trainable params: 10,783,535
Everything seems to work up to this point. I train the model for 2 epochs, and then I want to start fine-tuning the EfficientNet base. So I call:
for l in model.get_layer("efficientnetb3").layers:
    if not isinstance(l, keras.layers.BatchNormalization):
        l.trainable = True

model.compile(loss="categorical_crossentropy", optimizer=keras_opt, metrics=["categorical_accuracy"])
I recompile and print the summary again, only to see that the number of non-trainable weights has stayed the same. Fitting also does not give better results than keeping the base frozen.
dense (Dense) (None, 5) 7685
=================================================================
Total params: 10,791,220
Trainable params: 7,685
Non-trainable params: 10,783,535
PS: I also tried efficientnet.trainable = True, but this also had no effect.
Could it have something to do with the fact that I'm using a Sequential and a functional model at the same time?
For me the problem was using the Sequential API for part of the model. When I switched to the functional API, model.summary() displayed all the sublayers, and it was possible to set some of them as trainable and others not.
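Another thing worth checking, as a hedged sketch rather than a confirmed fix for this exact case: in tf.keras the trainable flag of the wrapper model overrides its children, so the nested EfficientNet model itself has to be unfrozen first, the BatchNormalization layers re-frozen, and the model recompiled for the change to show up:

base = model.get_layer("efficientnetb3")
base.trainable = True  # unfreeze the wrapper itself (this also unfreezes its children)

# Re-freeze only the BatchNormalization layers.
for l in base.layers:
    if isinstance(l, keras.layers.BatchNormalization):
        l.trainable = False

# The change only takes effect after recompiling.
model.compile(loss="categorical_crossentropy", optimizer=keras_opt,
              metrics=["categorical_accuracy"])
model.summary()  # trainable params should now include the EfficientNet weights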
I would much appreciate your help.
I am new to RNNs and I am trying to implement an RNN architecture to classify protein sequences; essentially they are one-hot encoded NumPy arrays.
One issue is that the data is very imbalanced:
Examples:
Total: 34909
Positive: 282 (0.81% of total)
Therefore I am planning to weight the different classes by passing the class_weight=class_weight parameter when the model is fitted.
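As a sketch of that step (assuming the 0/1 training labels live in an array y_train), the class weights could be computed like this:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# y_train is assumed to hold the integer labels for the training split.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weight = {int(c): w for c, w in zip(classes, weights)}

# Later passed to fit, as described above:
# model.fit(X_train, y_train, class_weight=class_weight, ...)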
I am also planning to use the F1 score on the validation set as a metric instead of accuracy or loss, as I am not interested in the true negatives.
Moreover, I am planning to use transfer learning: I have datasets with more positive data and datasets with only a few points, so I plan to pretrain a general model and use its weights to further train on the specific problem.
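For that transfer-learning step, a minimal sketch of carrying the pretrained weights over; build_model(), the file name and the dataset variables are placeholders, and the only requirement is that both models share the same architecture:

# Pretrain a general model on the larger dataset, then save its weights.
general_model = build_model()  # hypothetical helper returning the architecture below
general_model.fit(X_general, y_general, epochs=10)
general_model.save_weights("general_pretrained.h5")  # placeholder file name

# Build the specific model with the same architecture, load the weights,
# and continue training on the small, problem-specific dataset.
specific_model = build_model()
specific_model.load_weights("general_pretrained.h5")
specific_model.fit(X_specific, y_specific, epochs=10, class_weight=class_weight)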
I have come up with the following architecture, but I am not sure whether adding 4 bidirectional LSTM layers is a wise choice:
from keras import regularizers
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dropout, Dense
from keras.initializers import Constant
from keras.optimizers import Adam
from keras.losses import BinaryCrossentropy

if output_bias is not None:
    output_bias = Constant(output_bias)

model = Sequential()
# First LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=0.1), input_shape=(timesteps, features)))
model.add(Dropout(0.5))
# Second LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
model.add(Dropout(0.5))
# Third LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
model.add(Dropout(0.5))
# Fourth LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=False)))
model.add(Dropout(0.5))
# First Dense layer
model.add(Dense(units=128, kernel_initializer='he_normal', activation='relu'))
model.add(Dropout(0.5))
# Output layer
if output_bias is None:
    model.add(Dense(units=1, activation='sigmoid', kernel_regularizer=regularizers.l2(0.001)))
else:
    model.add(Dense(units=1, activation='sigmoid',
                    bias_initializer=output_bias, kernel_regularizer=regularizers.l2(0.001)))

model.compile(optimizer=Adam(lr=1e-3), loss=BinaryCrossentropy(), metrics=metrics)
model.build()
How do I know how many LSTM layers I should add? Is it just trial and error?
Is there anything else I should include in the layers?
model.summary():
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_13 (Bidirectio (None, 5, 100) 28400
_________________________________________________________________
dropout_16 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_14 (Bidirectio (None, 5, 100) 60400
_________________________________________________________________
dropout_17 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_15 (Bidirectio (None, 5, 100) 60400
_________________________________________________________________
dropout_18 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_16 (Bidirectio (None, 100) 60400
_________________________________________________________________
dropout_19 (Dropout) (None, 100) 0
_________________________________________________________________
dense_7 (Dense) (None, 128) 12928
_________________________________________________________________
dropout_20 (Dropout) (None, 128) 0
_________________________________________________________________
dense_8 (Dense) (None, 1) 129
=================================================================
Total params: 222,657
Trainable params: 222,657
Non-trainable params: 0
I have built this model by going through multiple tutorials, such as https://www.tensorflow.org/tutorials/text/text_classification_rnn and https://www.tensorflow.org/tutorials/structured_data/imbalanced_data.
I would appreciate it if you could point me in the right direction.
Thanks!