I'm performing a multiclass image classification task. While training my CNN the validation accuracy remains constant across all epochs. I've tried different model architectures and different hyperparameter values but no change. Any ideas would be greatly appreciated. Here are my current results:
Train and Validation Loss and Accuracy
Here is my CNN:
model = models.Sequential()
model.add(Conv2D(32, (3, 3), activation = 'relu', input_shape= .
(img_width, img_height, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation = 'relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation = 'relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.2))
model.add(layers.Dense(64, activation = 'relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(8, activation = 'softmax'))
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999,
epsilon=1e-08, decay=0.0001),metrics = ['acc'])
model.summary()
There are a variety of possible underlying factors that can potentially cause this phenomenon - below is a list, by no means exhaustive, of some preliminary fixes you could try:
If you're using the Adam optimizer(or any other adaptive learning rate optimizer such as RMSprop or Adadelta), try a significantly smaller initial learning rate than the default, somewhere on the order of 10E-6. Alternatively, try Stochastic Gradient Descent with an initial learning rate somewhere in the regime of 10E-2 to 10E-3. You could also set a large initial learning rate and anneal it over the course of several training epochs by employing Keras's LearningRateScheduler callback and defining a custom learning rate schedule(for SGD).
If the above doesn't work, try decreasing the complexity of your network (e.g. the number of layers) and increasing the size of the training set. Also, while you're inspecting your training dataset, ensure that your training set doesn't suffer from class imbalance - if it does, you can artificially weight the losses associated with the underrepresented class's training examples using the class_weights parameter that can be passed to the model's fit() method.
If the issue still persists, you may have to confront the possibility that a constant validation loss is possibly an artifact of essentially fitting on noise and any (even somewhat plausible) predictions the model emits may be spurious. You may want to try extracting more informative features, a larger variety of features or perform extensive data augmentation on your training set at this point.
Have a look at this GitHub issue for further suggestions that may help resolve your problem:
https://github.com/keras-team/keras/issues/1597
Related
I'm currently working on online signature verification. The dataset has a variable shape of (x, 7) where x is the number of points a person used to sign their signature. I have the following model:
model = Sequential()
#CNN
model.add(Conv1D(filters=64, kernel_size=3, activation='sigmoid', input_shape=(None, 7)))
model.add(MaxPooling1D(pool_size=3))
model.add(Conv1D(filters=64, kernel_size=2, activation='sigmoid'))
#RNN
model.add(Masking(mask_value=0.0))
model.add(LSTM(8))
model.add(Dense(2, activation='softmax'))
opt = Adam(lr=0.0001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
print(model.fit(x_train, y_train, epochs=100, verbose=2, batch_size=50))
score, accuracy = model.evaluate(x_test,y_test, verbose=2)
print(score, accuracy)
I know it may not be the best model but this is the first time I'm building a neural network. I have to use a CNN and RNN as it is required for my honours project. At the moment, I achieve 0.5142 as the highest training accuracy and 0.54 testing accuracy. I have tried increasing the number of epochs, changing the activation function, add more layers, moving the layers around, changing the learning rate and changing the optimizer.
Please share some advice on changing my model or dataset. Any help is much appreciated.
For CNN-RNN, some promising things to try:
Conv1D layers: activation='relu', kernel_initializer='he_normal'
LSTM layer: activation='tanh', and recurrent_dropout=.1, .2, .3
Optimizer: Nadam, lr=2e-4 (Nadam may significantly outperform all other optimizers for RNNs)
batch_size: lower it. Unless you have 200+ batches in total, set batch_size=32; lower batch size better exploits the Stochastic mechanism of the optimizer and can improve generalization
Dropout: right after second Conv1D, with a rate .1, .2 - or, after first Conv1D, with a rate .25, .3, but only if you use SqueezeExcite (see below), else MaxPooling won't work as well
SqueezeExcite: shown to enhance all CNN performance across a large variety of tasks; Keras implementation you can use below
BatchNormalization: while your model isn't large, it's still deep, and may benefit from one BN layer right after second Conv1D
L2 weight decay: on first Conv1D, to prevent it from memorizing the input; try 1e-5, 1e-4, e.g. kernel_regularizer=l2(1e-4) # from keras.regularizers import l2
Preprocessing: make sure all data is normalized (or standardized if time-series), and batches are shuffled each epoch
def SqueezeExcite(_input):
filters = _input._keras_shape[-1]
se = GlobalAveragePooling1D()(_input)
se = Reshape((1, filters))(se)
se = Dense(filters//16,activation='relu',
kernel_initializer='he_normal', use_bias=False)(se)
se = Dense(filters, activation='sigmoid',
kernel_initializer='he_normal', use_bias=False)(se)
return multiply([_input, se])
# Example usage
x = Conv1D(filters=64, kernel_size=4, activation='relu', kernel_initializer='he_normal')(x)
x = SqueezeExcite(x) # place after EACH Conv1D
My model is experiencing wild and big fluctuations in the validation loss and does not converge.
I am doing an image recognition project with my three dogs i.e. classifying the dog in the image. Two dogs are very similar and the 3rd is very different. I took 10 minute videos of each dog, separately. Frames were extracted as images at each second. My dataset consists of about 1800 photos, 600 of each dog.
This block of code is responsible for augmenting and creating the data to feed the model.
randomize = np.arange(len(imArr)) # imArr is the numpy array of all the images
np.random.shuffle(randomize) # Shuffle the images and labels
imArr = imArr[randomize]
imLab= imLab[randomize] # imLab is the array of labels of the images
lab = to_categorical(imLab, 3)
gen = ImageDataGenerator(zoom_range = 0.2,horizontal_flip = True , vertical_flip = True,validation_split = 0.25)
train_gen = gen.flow(imArr,lab,batch_size = 64, subset = 'training')
test_gen = gen.flow(imArr,lab,batch_size =64,subset = 'validation')
This picture is the result of the model below.
model = Sequential()
model.add(Conv2D(16, (11, 11),strides = 1, input_shape=(imgSize,imgSize,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (5, 5),strides = 1))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(BatchNormalization(axis=-1))
model.add(Dropout(0.3))
#Fully connected layer
model.add(Dense(256))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(3))
model.add(Activation('softmax'))
sgd = SGD(lr=0.004)
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
batch_size = 64
epochs = 100
model.fit_generator(train_gen, steps_per_epoch=(len(train_gen)), epochs=epochs, validation_data=test_gen, validation_steps=len(test_gen),shuffle = True)
Things I have tried.
High/low Learning rate ( 0.01 -> 0.0001)
Increase Dropout to 0.5 in both Dense layers
Increase/Decrease size of both Dense Layers ( 128 min -> 4048 max)
Increased number of CNN layers
Introduced Momentum
Increased/Decreased Batch Size
Things I have not tried
I have not used any other loss or metric
I have not used any other optimiser.
Have not adjusted any parameters of the CNN layers
It seems that there is some form of randomness or too many parameters in my model. I am aware that it is currently overfitting, but that should not be the cause of the volatility(?).
I am not too worried about the performance of the model. I would like to achieve about a 70% accuracy. All I want to do now is to stabilise the validation accuracy and to converge.
Note:
At some epochs, the training loss is very low ( <0.1 ) but validation
loss is very high ( > 3 ).
The videos are taken on different backgrounds, but +- the same amount on each background for each dog.
Some images are a bit blurry.
Change the optimizer to Adam, definitely better. In your code you are using it but with default parameters, you are creating an SGD optimizer but in the compile line you introduce an Adam with no parameters. Play with the actual parameters of your optimizer.
I encourage you to take out the dropout first, see what is happening and the if you manage to overfit, start with low dropout and go up.
Also it might be due to some of your test samples are very hard to detect and thus increase the loss, maybe take out the shuffle in the validation set and watch for any peridiocities to try to find out if there are validation samples hard to detect.
Hope it helps!
I see you have tried a lot of different things. Few suggestions:
I see you use large filters in your Conv2D eg. 11x11 and 5x5. If your image dimensions are not very big, you should definitely go for lower filter dimensions like 3x3.
Try different optimizers, try Adam with varying lr if you haven't.
Otherwise, I don't see much problems. Maybe you need more data for the network to learn better.
I am trying to train my neural network for image classification using conv3d. While training I see the initial loss is more than 2. So I was wondering what could I do to reduce this initial loss ?
Here is my model code :
model = Sequential()
model.add(Conv3D(2, (3,3,3), padding = 'same', input_shape= [num_of_frame,
img_rows,img_cols, img_channels] ))
model.add(Activation('relu'))
model.add(Conv3D(64, (3,3,3)))
model.add(Activation('relu'))
model.add(MaxPooling3D(pool_size=(2, 2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(32))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
I am using 30 as my batch size and image dimension is 120*90 with adam optimizer.
Your model's first layer is having a hard time detecting basic features since you have only 2 convolution kernels on the first layer which isn't a very good idea. also using 0.25 as a dropout rate is not very common.(values between 0.5 and 0.7 are more commonly used.)
The main reason for a very high loss in the first iteration is due to weights and bias initialization. Loss is calculated after each forward pass and the forward pass is a function of inputs, weight, bias and non-linearities.
As, the only nonlinearity in your network is in the output layer. I suspect this is due to the weights and bias initialization.
I am trying to use BatchNorm layers for a image classification task but I am having trouble with the keras implementation.When I run the same network with BatchNormalization layers I get better training accuracy and validation accuracy but during training the validation accuracy oscillates strongly. Here are what the training surves look like.
Training accuracy with batchnorm
Training accuracy without batchnorm
I have tried changing the batch size (from 128 to 1024), and changing the momentum parameter i the batchnorm layer but it doesn't change much.
I am using batchnorm inbetween conv/Dense layers and their activation layers.
I have also checked that the normalization axis is correct for conv layers (axis=1 with Theano).
Has anybody had similar issues? There are a look of issues posted about the keras implementation of batchnorm but I have not found a solution to this problem yet.
Thanks for any pointers or references to similar issues.
EDIT:
Here is the keras code I have used to build the mdoel but I have tried different architectures and different momentums:
# create model
model = Sequential()
model.add(Conv2D(20, (4, 4), input_shape=(input_channels,s,s), activation='relu'))
model.add(MaxPooling2D(pool_size=(5, 5)))
model.add(Conv2D(30, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(GlobalAveragePooling2D())
model.add(Dense(128))
if use_batch_norm:
model.add(BatchNormalization(axis=1, momentum=0.6))
model.add(Activation('relu'))
model.add(Dropout(dropout_rate))
model.add(Dense(nb_classes))
if use_batch_norm:
model.add(BatchNormalization(axis=1, momentum=0.6))
model.add(Activation('softmax'))
I'm trying to make an autoencoder using Keras with a tensorflow backend. In particular, I have data of a vector of n_components (i.e. 200) sampled n_times (i.e. 20000). It is key that when I train time t, that I compare it only to time t. It appears that it is shuffling the sampling times. I removed the bottleneck and find that the network is doing a pretty bad job of predicting the n_components, instead representing something more like the mean of the input scaled by each component.
Here is my network with the bottleneck commented out:
model = keras.models.Sequential()
# Make a 7-layer autoencoder network
model.add(keras.layers.Dense(n_components, activation='relu', input_shape=(n_components,)))
model.add(keras.layers.Dense(n_components, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
# model.add(keras.layers.Dense(3, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
# act is a numpy matrix of size (n_components, n_times)
model.fit(act.T, act.T, epochs=15, batch_size=100, shuffle=False)
newact = model.predict(act.T).T
I have tested shuffling the second component of act, n_times, and passing it as model.fit(act.T, act_shuffled.T) and see no difference from model.fit(act.T, act.T). Am I doing something wrong? How can I force it to learn from the specific time?
Many thanks,
Arthur
I believe that I have solved the problem, but more knowledgeable users of Keras might be able to correct me. I had tried many different values for the argument batch_size of fit, but I didn't try a value of 1. When I changed it to 1, it did a good job of reproducing the input data.
I believe that the batch size, even if shuffle is set to False, allows the autoencoder to train one input time against an unrelated input time.
So, I have ammended my code to:
model.fit(act.T, act.T, epochs=15, batch_size=1, shuffle=False)