BatchNormalization layer in keras leads to oscillating validation accuracy - python

I am trying to use BatchNorm layers for an image classification task but I am having trouble with the Keras implementation. When I run the same network with BatchNormalization layers I get better training and validation accuracy, but during training the validation accuracy oscillates strongly. Here is what the training curves look like.
Training accuracy with batchnorm
Training accuracy without batchnorm
I have tried changing the batch size (from 128 to 1024) and changing the momentum parameter in the batchnorm layer, but it doesn't change much.
I am using batchnorm between conv/Dense layers and their activation layers.
I have also checked that the normalization axis is correct for conv layers (axis=1 with Theano).
Has anybody had similar issues? There are a lot of issues posted about the Keras implementation of batchnorm, but I have not found a solution to this problem yet.
Thanks for any pointers or references to similar issues.
EDIT:
Here is the Keras code I used to build the model; I have tried different architectures and different momentum values:
# create model
model = Sequential()
model.add(Conv2D(20, (4, 4), input_shape=(input_channels, s, s), activation='relu'))
model.add(MaxPooling2D(pool_size=(5, 5)))
model.add(Conv2D(30, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(GlobalAveragePooling2D())
model.add(Dense(128))
if use_batch_norm:
    model.add(BatchNormalization(axis=1, momentum=0.6))
model.add(Activation('relu'))
model.add(Dropout(dropout_rate))
model.add(Dense(nb_classes))
if use_batch_norm:
    model.add(BatchNormalization(axis=1, momentum=0.6))
model.add(Activation('softmax'))
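For reference, the conv-side placement described above (BatchNormalization between a convolution and its activation, with axis=1 for Theano's channels-first layout) would look something like the following sketch; the filter count and momentum are simply the values from the question, not a recommendation:
# Hedged illustration of BN between a conv layer and its activation
# (axis=1 assumes Theano / channels_first ordering, as in the question)
model.add(Conv2D(20, (4, 4)))                          # no activation here
model.add(BatchNormalization(axis=1, momentum=0.6))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(5, 5)))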

Related

Any suggestions to improve my CNN model (always the same low test accuracy)?

I am working on a project to detect the presence of a person in a painting. I have 4000 training images and 1000 test images, resized to (256, 256, 3).
I tried a CNN model with three (Conv layer, MaxPool, BatchNormalization) blocks and two fully connected layers.
model = Sequential()
model.add(Conv2D(32, kernel_size = (7, 7), activation='relu', input_shape=shape))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(7,7), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(Conv2D(96, kernel_size=(5,5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation = 'sigmoid'))
The train accuracy always converges to 1 (with just 20-50 epochs) and the test accuracy always remains constant around 0.67.
I tried the following:
I tried changing the size of the layers and adding more layers.
I tried data augmentation
I tried smaller images 128x128x3.
But I always have the same results.
I don't know if this is due to the few images I have, or if the architecture isn't big enough to learn from complex paintings.
I thought of trying Transfer Learning (but I don't know if this will help, because it is my first time trying it). Also, do you have any idea where I can find trained models?
So I am asking for some suggestions to improve my model.
It might be that you are overfitting on your training data; in that case you can use dropout.
The other thing is, if you have not already normalized your data, you can do that. I am not sure whether it would be much help, but give it a try with something like:
X_training = X_training / X_training.max()
I tried using VGG16 (frozen) with 4 fully connected layers and the validation accuracy went up to 0.83. Also, I am using ImageDataGenerator.
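For what it's worth, a minimal sketch of the frozen-VGG16 setup mentioned in that comment might look like the following; the head layer sizes, target size, augmentation settings and directory layout are illustrative assumptions, not the poster's exact code:
# Hedged sketch of a frozen VGG16 base with a small fully connected head;
# layer sizes, paths and augmentation settings are illustrative assumptions.
from keras.applications.vgg16 import VGG16
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator

base = VGG16(include_top=False, weights='imagenet', input_shape=(256, 256, 3))
for layer in base.layers:
    layer.trainable = False            # freeze the convolutional base

model = Sequential()
model.add(base)
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))   # person present / not present
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Data augmentation with ImageDataGenerator (hypothetical directory layout)
train_gen = ImageDataGenerator(rescale=1./255, horizontal_flip=True)
model.fit_generator(
    train_gen.flow_from_directory('train/', target_size=(256, 256),
                                  class_mode='binary', batch_size=32),
    epochs=20)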

Training and Testing accuracy not increasing for a CNN followed by an RNN for signature verification

I'm currently working on online signature verification. The dataset has a variable shape of (x, 7) where x is the number of points a person used to sign their signature. I have the following model:
model = Sequential()
#CNN
model.add(Conv1D(filters=64, kernel_size=3, activation='sigmoid', input_shape=(None, 7)))
model.add(MaxPooling1D(pool_size=3))
model.add(Conv1D(filters=64, kernel_size=2, activation='sigmoid'))
#RNN
model.add(Masking(mask_value=0.0))
model.add(LSTM(8))
model.add(Dense(2, activation='softmax'))
opt = Adam(lr=0.0001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
print(model.fit(x_train, y_train, epochs=100, verbose=2, batch_size=50))
score, accuracy = model.evaluate(x_test,y_test, verbose=2)
print(score, accuracy)
I know it may not be the best model, but this is the first time I'm building a neural network. I have to use a CNN and an RNN, as it is required for my honours project. At the moment, I achieve 0.5142 as the highest training accuracy and 0.54 testing accuracy. I have tried increasing the number of epochs, changing the activation function, adding more layers, moving the layers around, changing the learning rate and changing the optimizer.
Please share some advice on changing my model or dataset. Any help is much appreciated.
For CNN-RNN, some promising things to try:
Conv1D layers: activation='relu', kernel_initializer='he_normal'
LSTM layer: activation='tanh', and recurrent_dropout=.1, .2, .3
Optimizer: Nadam, lr=2e-4 (Nadam may significantly outperform all other optimizers for RNNs)
batch_size: lower it. Unless you have 200+ batches in total, set batch_size=32; a lower batch size better exploits the stochastic mechanism of the optimizer and can improve generalization
Dropout: right after the second Conv1D, with a rate of .1 or .2 - or after the first Conv1D, with a rate of .25 or .3, but only if you use SqueezeExcite (see below), else MaxPooling won't work as well
SqueezeExcite: shown to enhance CNN performance across a large variety of tasks; a Keras implementation you can use is below
BatchNormalization: while your model isn't large, it's still deep, and may benefit from one BN layer right after second Conv1D
L2 weight decay: on first Conv1D, to prevent it from memorizing the input; try 1e-5, 1e-4, e.g. kernel_regularizer=l2(1e-4) # from keras.regularizers import l2
Preprocessing: make sure all data is normalized (or standardized if time-series), and batches are shuffled each epoch
# from keras.layers import GlobalAveragePooling1D, Reshape, Dense, Conv1D, multiply
def SqueezeExcite(_input):
    filters = _input._keras_shape[-1]
    se = GlobalAveragePooling1D()(_input)
    se = Reshape((1, filters))(se)
    se = Dense(filters // 16, activation='relu',
               kernel_initializer='he_normal', use_bias=False)(se)
    se = Dense(filters, activation='sigmoid',
               kernel_initializer='he_normal', use_bias=False)(se)
    return multiply([_input, se])

# Example usage
x = Conv1D(filters=64, kernel_size=4, activation='relu', kernel_initializer='he_normal')(x)
x = SqueezeExcite(x)  # place after EACH Conv1D
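Putting several of those suggestions together (ReLU convolutions with he_normal, L2 on the first Conv1D, BatchNormalization and Dropout after the second Conv1D, recurrent_dropout on the LSTM, and the Nadam optimizer), a hedged sketch of a revised model could look like this; SqueezeExcite is omitted for brevity, and the layer sizes simply mirror the question's model:
# Illustrative sketch only; the dropout and regularization values are
# assumptions chosen within the suggested ranges above.
from keras.layers import Input, Conv1D, MaxPooling1D, BatchNormalization, Dropout, LSTM, Dense
from keras.models import Model
from keras.optimizers import Nadam
from keras.regularizers import l2

inp = Input(shape=(None, 7))
x = Conv1D(64, 3, activation='relu', kernel_initializer='he_normal',
           kernel_regularizer=l2(1e-4))(inp)       # L2 weight decay on the first Conv1D
x = MaxPooling1D(pool_size=3)(x)
x = Conv1D(64, 2, activation='relu', kernel_initializer='he_normal')(x)
x = BatchNormalization()(x)                        # BN right after the second Conv1D
x = Dropout(0.2)(x)                                # dropout after the second Conv1D
x = LSTM(8, activation='tanh', recurrent_dropout=0.2)(x)
out = Dense(2, activation='softmax')(x)

model = Model(inp, out)
model.compile(optimizer=Nadam(lr=2e-4),
              loss='categorical_crossentropy', metrics=['accuracy'])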

Training CNN with transfer learning in Keras - image input doesn't work but vector input does

I'm trying to do transfer learning in Keras. I set up a ResNet50 network set to not trainable with some extra layers:
# Image input
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg')) # output is 2048
model.add(Dropout(0.05))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(512, activation='relu'))
model.add(Dense(7, activation='softmax'))
model.layers[0].trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
Then I create the input data x_batch using the ResNet50 preprocess_input function, along with the one-hot encoded labels y_batch, and do the fitting like so:
model.fit(x_batch,
          y_batch,
          epochs=nb_epochs,
          batch_size=64,
          shuffle=True,
          validation_split=0.2,
          callbacks=[lrate])
Training accuracy gets close to 100% after ten or so epochs, but validation accuracy actually decreases from around 50% to 30% with validation loss steadily increasing.
However if I instead create a network with just the last layers:
# Vector input
model2 = Sequential()
model2.add(Dropout(0.05, input_shape=(2048,)))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()
and feed in the output of the ResNet50 prediction:
resnet = ResNet50(include_top=False, pooling='avg')
x_batch = resnet.predict(x_batch)
Then validation accuracy gets up to around 85%... What is going on? Why won't the image input method work?
Update:
This problem is really bizarre. If I change ResNet50 to VGG19 it seems to work ok.
After a lot of googling I found that the problem is to do with the Batch Normalisation layers in ResNet. There are no batch normalisation layers in VGGNet which is why it works for that topology.
There is a pull request to fix this in Keras here, which explains in more detail:
Assume we use one of the pre-trained CNNs of Keras and we want to fine-tune it. Unfortunately, we get no guarantees that the mean and variance of our new dataset inside the BN layers will be similar to the ones of the original dataset. As a result, if we fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference the top layers will receive data which are scaled using the mean/variance of the original dataset. This discrepancy can lead to reduced accuracy.
This means that the BN layers are adjusting to the training data, however when validation is performed, the original parameters of the BN layers are used. From what I can tell, the fix is to allow the frozen BN layers to use the updated mean and variance from training.
A workaround is to pre-compute the ResNet output. In fact, this decreases training time considerably, since we are not repeating that part of the calculation.
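A minimal sketch of that precompute workaround, assuming x_batch has already been run through preprocess_input as described and model2 is the vector-input head defined above:
# The ResNet forward pass is done once, outside the training loop, so its
# BN statistics can no longer interfere with training or validation.
from keras.applications.resnet50 import ResNet50

resnet = ResNet50(include_top=False, pooling='avg')    # fixed feature extractor
features = resnet.predict(x_batch, batch_size=64)      # shape (n_samples, 2048)

model2.fit(features,
           y_batch,
           epochs=nb_epochs,
           batch_size=64,
           shuffle=True,
           validation_split=0.2)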
You can try:
Res = keras.applications.resnet.ResNet50(include_top=False,
                                          weights='imagenet',
                                          input_shape=(IMG_SIZE, IMG_SIZE, 3))
# Freeze all the ResNet layers
for layer in Res.layers:
    layer.trainable = False
# Check the trainable status of the individual layers
for layer in Res.layers:
    print(layer, layer.trainable)
# Image input with the frozen ResNet base
model2 = Sequential()
model2.add(Res)
model2.add(Flatten())
model2.add(Dropout(0.05))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()

What to look for when initial loss is high in Neural network training?

I am trying to train my neural network for image classification using Conv3D. While training, I see that the initial loss is more than 2, so I was wondering what I could do to reduce this initial loss.
Here is my model code :
model = Sequential()
model.add(Conv3D(2, (3, 3, 3), padding='same',
                 input_shape=[num_of_frame, img_rows, img_cols, img_channels]))
model.add(Activation('relu'))
model.add(Conv3D(64, (3,3,3)))
model.add(Activation('relu'))
model.add(MaxPooling3D(pool_size=(2, 2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(32))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
I am using 30 as my batch size, and the image dimensions are 120*90, with the Adam optimizer.
Your model's first layer is having a hard time detecting basic features, since you have only 2 convolution kernels in the first layer, which isn't a very good idea. Also, using 0.25 as a dropout rate is not very common (values between 0.5 and 0.7 are more commonly used).
The main reason for a very high loss in the first iteration is the weight and bias initialization. Loss is calculated after each forward pass, and the forward pass is a function of the inputs, weights, biases and non-linearities.
As the only nonlinearity in your network is in the output layer, I suspect this is due to the weight and bias initialization.
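As a quick sanity check on what counts as "high" (an addition, not part of the original answer): with a softmax output and categorical cross-entropy, a freshly initialized network that predicts roughly uniform probabilities starts with a loss of about ln(nb_classes), so an initial loss above 2 is expected once there are more than about eight classes.
# ln(nb_classes) gives the expected initial cross-entropy for near-uniform
# softmax predictions; the class counts below are purely illustrative.
import numpy as np

for nb_classes in (5, 8, 10, 20):
    print(nb_classes, round(float(np.log(nb_classes)), 3))
# 5 -> 1.609, 8 -> 2.079, 10 -> 2.303, 20 -> 2.996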

Validation accuracy constant in Keras CNN for multiclass image classification

I'm performing a multiclass image classification task. While training my CNN the validation accuracy remains constant across all epochs. I've tried different model architectures and different hyperparameter values but no change. Any ideas would be greatly appreciated. Here are my current results:
Train and Validation Loss and Accuracy
Here is my CNN:
model = models.Sequential()
model.add(Conv2D(32, (3, 3), activation = 'relu',
          input_shape=(img_width, img_height, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation = 'relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation = 'relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.2))
model.add(layers.Dense(64, activation = 'relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(8, activation = 'softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999,
                                        epsilon=1e-08, decay=0.0001),
              metrics=['acc'])
model.summary()
There are a variety of possible underlying factors that can potentially cause this phenomenon - below is a list, by no means exhaustive, of some preliminary fixes you could try:
If you're using the Adam optimizer (or any other adaptive learning rate optimizer such as RMSprop or Adadelta), try a significantly smaller initial learning rate than the default, somewhere on the order of 1e-6. Alternatively, try Stochastic Gradient Descent with an initial learning rate somewhere in the range of 1e-2 to 1e-3. You could also set a large initial learning rate and anneal it over the course of several training epochs by employing Keras's LearningRateScheduler callback and defining a custom learning rate schedule (for SGD); see the sketch after this list.
If the above doesn't work, try decreasing the complexity of your network (e.g. the number of layers) and increasing the size of the training set. Also, while you're inspecting your training data, ensure that the training set doesn't suffer from class imbalance - if it does, you can artificially weight the losses associated with the underrepresented classes' training examples using the class_weight parameter that can be passed to the model's fit() method.
If the issue still persists, you may have to confront the possibility that the constant validation metric is essentially an artifact of fitting on noise, and that any (even somewhat plausible) predictions the model emits may be spurious. At that point you may want to try extracting more informative features, a larger variety of features, or performing extensive data augmentation on your training set.
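Here is a rough sketch of the learning-rate and class-weight suggestions above (SGD with a LearningRateScheduler, plus class_weight passed to fit()); the schedule, the weights and the variable names x_train/y_train/x_val/y_val are illustrative assumptions, not the poster's code:
# Hedged sketch: SGD with a step-decay LearningRateScheduler and class weights.
from keras import optimizers
from keras.callbacks import LearningRateScheduler

def schedule(epoch):
    # start at 1e-2 and halve the learning rate every 10 epochs (illustrative)
    return 1e-2 * (0.5 ** (epoch // 10))

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=1e-2, momentum=0.9),
              metrics=['acc'])

class_weight = {c: 1.0 for c in range(8)}   # 8 classes, as in the model above
class_weight[3] = 2.5                       # up-weight an underrepresented class (assumed)

model.fit(x_train, y_train,
          epochs=50,
          batch_size=32,
          callbacks=[LearningRateScheduler(schedule)],
          class_weight=class_weight,
          validation_data=(x_val, y_val))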
Have a look at this GitHub issue for further suggestions that may help resolve your problem:
https://github.com/keras-team/keras/issues/1597
