I am new to Keras and deep learning. When I create a basic sample model and fit it, the model's log loss is always the same.
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same', init='he_normal',
input_shape=(color_type, img_rows, img_cols)))
model.add(MaxPooling2D(pool_size=(2, 2), dim_ordering="th"))
model.add(Dropout(0.5))
model.add(Convolution2D(64, 3, 3, border_mode='same', init='he_normal'))
model.add(MaxPooling2D(pool_size=(2, 2), dim_ordering="th")) #this part is wrong
model.add(Dropout(0.5))
model.add(Convolution2D(128, 3, 3, border_mode='same', init='he_normal'))
model.add(MaxPooling2D(pool_size=(2, 2), dim_ordering="th"))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(Adam(lr=1e-3), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=64, nb_epoch=200,
verbose=1, validation_data=(x_valid,y_valid))
Train on 17939 samples, validate on 4485 samples
Epoch 1/200
17939/17939 [==============================] - 8s - loss: 99.8137 - acc: 0.3096 - val_loss: 99.9626 - val_acc: 0.0000e+00
Epoch 2/200
17939/17939 [==============================] - 8s - loss: 99.8135 - acc: 0.2864 - val_loss: 99.9626 - val_acc: 0.0000e+00
Epoch 3/200
17939/17939 [==============================] - 8s - loss: 99.8135 - acc: 0.3120 - val_loss: 99.9626 - val_acc: 1.0000
Epoch 4/200
17939/17939 [==============================] - 10s - loss: 99.8135 - acc: 0.3315 - val_loss: 99.9626 - val_acc: 1.0000
Epoch 5/200
17939/17939 [==============================] - 10s - loss: 99.8138 - acc: 0.3435 - val_loss: 99.9626 - val_acc: 0.4620
...
It continues like this.
Do you know which part I got wrong?
One reason for such behavior might be a learning rate that is too small. Try increasing it with Adam(lr=1e-2) or Adam(lr=1e-1). Also, wait a couple more epochs and see whether it improves. If it doesn't, you may try decreasing the dropout. In addition, I would suggest normalizing your input data if you haven't done so yet.
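Input normalization, as suggested above, can be sketched in NumPy. This assumes channels-first image arrays of shape `(samples, channels, rows, cols)`, matching the question's `dim_ordering="th"` model, with raw pixel values in 0..255; the helper name `standardize_images` is hypothetical. Statistics come from the training set only and are reused for the validation set.

```python
import numpy as np


def standardize_images(x_train, x_valid, eps=1e-7):
    """Scale to [0, 1], then standardize each channel with training-set statistics.

    Assumes channels-first arrays of shape (samples, channels, rows, cols).
    """
    x_train = x_train.astype("float32") / 255.0
    x_valid = x_valid.astype("float32") / 255.0
    # Per-channel mean and std over samples, rows and cols (axes 0, 2, 3).
    mean = x_train.mean(axis=(0, 2, 3), keepdims=True)
    std = x_train.std(axis=(0, 2, 3), keepdims=True)
    # Reuse the training statistics for validation data to avoid leakage.
    return (x_train - mean) / (std + eps), (x_valid - mean) / (std + eps)


rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(8, 3, 4, 4))
x_valid = rng.integers(0, 256, size=(2, 3, 4, 4))
x_train_n, x_valid_n = standardize_images(x_train, x_valid)
print(x_train_n.mean(axis=(0, 2, 3)))  # approximately 0 per channel
print(x_train_n.std(axis=(0, 2, 3)))   # approximately 1 per channel
```

Note that tf.keras spells the optimizer argument `learning_rate`, e.g. `Adam(learning_rate=1e-2)`; `lr` is the older name used in the question.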
Related
I'm training a classification model (cifar-100) using a convolutional neural network to classify 60000 images into 100 classes (600 images per class).
When I trained the model for the first time, without using any regularization technique:
cifar100_model = Sequential()
cifar100_model.add(Conv2D(filters = 32, kernel_size=(3, 3), input_shape=(32, 32, 3),
activation='relu'))
cifar100_model.add(MaxPooling2D(pool_size=(2, 2)))
cifar100_model.add(Conv2D(filters = 64, kernel_size=(3, 3), input_shape=(32, 32, 3),
activation='relu'))
cifar100_model.add(MaxPooling2D(pool_size=(2, 2)))
cifar100_model.add(Conv2D(filters = 128, kernel_size=(3, 3), input_shape=(32, 32, 3),
activation='relu'))
cifar100_model.add(MaxPooling2D(pool_size=(2, 2)))
cifar100_model.add(Flatten())
cifar100_model.add(Dense(256, activation='relu'))
cifar100_model.add(Dense(128, activation='relu'))
#Output Layer
cifar100_model.add(Dense(100, activation="softmax"))
I got the following result:
Training accuracy — 52%
Validation accuracy — 37%
This suggests that the model is overfitting.
I retrained the model, and this time I used dropout as a regularization technique:
cifar100_model = Sequential()
cifar100_model.add(Conv2D(filters = 32, kernel_size=(3, 3), input_shape=(32, 32, 3), activation='relu'))
cifar100_model.add(MaxPooling2D(pool_size=(2, 2)))
cifar100_model.add(Dropout(0.25))
cifar100_model.add(Conv2D(filters = 64, kernel_size=(3, 3), input_shape=(32, 32, 3), activation='relu'))
cifar100_model.add(MaxPooling2D(pool_size=(2, 2)))
cifar100_model.add(Dropout(0.25))
cifar100_model.add(Conv2D(filters = 128, kernel_size=(3, 3), input_shape=(32, 32, 3), activation='relu'))
cifar100_model.add(MaxPooling2D(pool_size=(2, 2)))
cifar100_model.add(Dropout(0.25))
cifar100_model.add(Flatten())
#Fully connected layers
cifar100_model.add(Dense(256, activation='relu'))
cifar100_model.add(Dense(128, activation='relu'))
#Output Layer (Fully connected)
cifar100_model.add(Dense(100, activation="softmax"))
I got the following results:
Training accuracy — 37%
Validation accuracy — 35%
How can I improve the model?
(Considering that the number of images per class is very low and the quality of images is poor)
Instead of using Dropout layers, try using a BatchNormalization() layer with each Conv2D layer. You can also change the optimizer or increase the number of epochs to get better accuracy.
Here is what I replicated, which gave better accuracy:
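import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
cifar100_model = Sequential()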
cifar100_model.add(Conv2D(filters = 32, kernel_size=(2, 2), input_shape=(32, 32, 3), activation='relu'))
cifar100_model.add(MaxPooling2D(pool_size=(2, 2)))
cifar100_model.add(BatchNormalization())
cifar100_model.add(Conv2D(filters = 64, kernel_size=(2, 2), activation='relu'))
cifar100_model.add(MaxPooling2D(pool_size=(2, 2)))
cifar100_model.add(BatchNormalization())
cifar100_model.add(Conv2D(filters = 128, kernel_size=(2, 2),activation='relu'))
cifar100_model.add(MaxPooling2D(pool_size=(2, 2)))
cifar100_model.add(BatchNormalization())
cifar100_model.add(Flatten())
#Fully connected layers
cifar100_model.add(Dense(256, activation='relu'))
cifar100_model.add(Dropout(0.2))
cifar100_model.add(Dense(128, activation='relu'))
cifar100_model.add(Dropout(0.2))
#Output Layer (Fully connected)
cifar100_model.add(Dense(100, activation="softmax"))
cifar100_model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(),metrics=['accuracy'])
history = cifar100_model.fit(train_images, train_labels, epochs=70, validation_data=(test_images, test_labels))
Output:
Epoch 60/70
1563/1563 [==============================] - 8s 5ms/step - loss: 0.5905 - accuracy: 0.8195 - val_loss: 3.0412 - val_accuracy: 0.4311
Epoch 61/70
1563/1563 [==============================] - 9s 6ms/step - loss: 0.5958 - accuracy: 0.8159 - val_loss: 3.1160 - val_accuracy: 0.4278
Epoch 62/70
1563/1563 [==============================] - 9s 6ms/step - loss: 0.5824 - accuracy: 0.8207 - val_loss: 3.0850 - val_accuracy: 0.4341
Epoch 63/70
1563/1563 [==============================] - 9s 6ms/step - loss: 0.5749 - accuracy: 0.8209 - val_loss: 3.2525 - val_accuracy: 0.4323
Epoch 64/70
1563/1563 [==============================] - 9s 6ms/step - loss: 0.5797 - accuracy: 0.8205 - val_loss: 3.0821 - val_accuracy: 0.4366
Epoch 65/70
1563/1563 [==============================] - 8s 5ms/step - loss: 0.5664 - accuracy: 0.8236 - val_loss: 3.1441 - val_accuracy: 0.4333
Epoch 66/70
1563/1563 [==============================] - 9s 6ms/step - loss: 0.5639 - accuracy: 0.8271 - val_loss: 3.0761 - val_accuracy: 0.4368
Epoch 67/70
1563/1563 [==============================] - 9s 6ms/step - loss: 0.5646 - accuracy: 0.8266 - val_loss: 3.2371 - val_accuracy: 0.4368
Epoch 68/70
1563/1563 [==============================] - 9s 6ms/step - loss: 0.5604 - accuracy: 0.8263 - val_loss: 3.1636 - val_accuracy: 0.4271
Epoch 69/70
1563/1563 [==============================] - 9s 5ms/step - loss: 0.5554 - accuracy: 0.8269 - val_loss: 3.2159 - val_accuracy: 0.4296
Epoch 70/70
1563/1563 [==============================] - 9s 6ms/step - loss: 0.5439 - accuracy: 0.8329 - val_loss: 3.2745 - val_accuracy: 0.4291
You can refer to this link to learn more about other methods for overcoming overfitting.
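As a rough illustration of what the suggested `BatchNormalization()` layers compute during training, here is a NumPy sketch of the forward pass for channels-last feature maps (as with the `(32, 32, 3)` CIFAR inputs). Each channel is normalized with the current batch's mean and variance, then scaled by a learnable `gamma` and shifted by a learnable `beta`. The real Keras layer also tracks moving averages of these statistics for use at inference time; this sketch omits that, and the function name is hypothetical.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization for channels-last input (N, H, W, C)."""
    # One mean and variance per channel, taken over batch and spatial axes.
    mean = x.mean(axis=(0, 1, 2))
    var = x.var(axis=(0, 1, 2))
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learnable per-channel scale and shift (typically initialized to 1 and 0).
    return gamma * x_hat + beta


rng = np.random.default_rng(42)
feature_maps = rng.normal(loc=5.0, scale=3.0, size=(16, 8, 8, 4))
out = batch_norm_train(feature_maps, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=(0, 1, 2)))  # close to 0 for every channel
print(out.std(axis=(0, 1, 2)))   # close to 1 for every channel
```

Because every channel is re-centred and re-scaled per batch, the following layers see inputs on a stable scale, which is one reason it can help training converge.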
I'm trying to implement a CNN using Keras on a scikit-learn dataset for handwritten digit recognition (load_digits). I have got the model to run, but the accuracy does not improve from epoch to epoch. I'm guessing it's because my labels are incorrect. I have tried encoding my y values with to_categorical, but it raises the following error:
C:\Users\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\keras\backend.py:4979 binary_crossentropy
return nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)
C:\Users\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\util\dispatch.py:201 wrapper
return target(*args, **kwargs)
C:\Users\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\ops\nn_impl.py:173 sigmoid_cross_entropy_with_logits
raise ValueError("logits and labels must have the same shape (%s vs %s)" %
ValueError: logits and labels must have the same shape ((None, 1) vs (None, 10))
When I run my code without encoding the y values, it runs through the CNN model, but the accuracy is low and doesn't increase. This is my code:
import tensorflow as tf
from sklearn import datasets
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
#from keras.utils.np_utils import to_categorical
X,y = datasets.load_digits(return_X_y = True)
X = X/16
#X = X.reshape(1797,8,8,1)
train_x, test_x, train_y, test_y = train_test_split(X, y)
train_x = train_x.reshape(1347,8,8,1)
#test_x = test_x.reshape()
#train_y = to_categorical(train_y, num_classes = 10)
model = Sequential()
model.add(Conv2D(32, (2, 2), input_shape=( 8, 8, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (2, 2)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(train_x, train_y, batch_size=32, epochs=6, validation_split=0.3)
print(train_x[0])
And this gives me the following output:
Epoch 1/6
1/30 [>.............................] - ETA: 13s - loss: 1.1026 - accuracy: 0.0938
6/30 [=====>........................] - ETA: 0s - loss: 0.2949 - accuracy: 0.0652
30/30 [==============================] - 1s 33ms/step - loss: -5.4832 - accuracy: 0.0893 - val_loss: -49.9462 - val_accuracy: 0.1012
Epoch 2/6
1/30 [>.............................] - ETA: 0s - loss: -52.2145 - accuracy: 0.0625
30/30 [==============================] - 0s 3ms/step - loss: -120.6972 - accuracy: 0.0961 - val_loss: -513.0211 - val_accuracy: 0.1012
Epoch 3/6
1/30 [>.............................] - ETA: 0s - loss: -638.2873 - accuracy: 0.1250
30/30 [==============================] - 0s 3ms/step - loss: -968.3621 - accuracy: 0.1006 - val_loss: -2804.1062 - val_accuracy: 0.1012
Epoch 4/6
1/30 [>.............................] - ETA: 0s - loss: -3427.3135 - accuracy: 0.0000e+00
30/30 [==============================] - 0s 3ms/step - loss: -4571.7894 - accuracy: 0.0934 - val_loss: -10332.9727 - val_accuracy: 0.1012
Epoch 5/6
1/30 [>.............................] - ETA: 0s - loss: -12963.2559 - accuracy: 0.0625
30/30 [==============================] - 0s 3ms/step - loss: -15268.3010 - accuracy: 0.0887 - val_loss: -29262.1191 - val_accuracy: 0.1012
Epoch 6/6
1/30 [>.............................] - ETA: 0s - loss: -30990.6758 - accuracy: 0.1562
30/30 [==============================] - 0s 3ms/step - loss: -40321.9540 - accuracy: 0.0960 - val_loss: -68548.6094 - val_accuracy: 0.1012
Any guidance is greatly appreciated. Thanks!
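When you have a CNN classifier, you want the last layer to have as many nodes as there are classes. So with 10 digits, the last layer should have an output size of 10. It usually uses the "softmax" activation, which turns the 10 outputs into probabilities between 0 and 1 that sum to 1; the predicted digit is the one with the highest probability. Pair it with loss='categorical_crossentropy' for one-hot labels (from to_categorical) or loss='sparse_categorical_crossentropy' for integer labels, instead of binary_crossentropy.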
model.add(Dense(10))
model.add(Activation('softmax'))
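To see why the original model's loss went negative, and why the labels must match the output layer, here is a small NumPy sketch of the three loss formulas involved. They are written out by hand, not taken from the Keras implementations, and the helper names are hypothetical. Binary cross-entropy, -(y log p + (1 - y) log(1 - p)), assumes y is 0 or 1; with integer digit labels such as y = 9 the (1 - y) term flips sign, so the "loss" can drop below zero and keep falling without the model learning anything. With a 10-unit softmax output, categorical cross-entropy on one-hot labels and sparse categorical cross-entropy on the raw integer labels give the same value.

```python
import numpy as np


def softmax(logits):
    # Subtract the row max for numerical stability; each row sums to 1.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def binary_crossentropy(y, p):
    # Only meaningful for y in {0, 1}.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def categorical_crossentropy(y_onehot, probs):
    return -np.sum(y_onehot * np.log(probs), axis=1)


def sparse_categorical_crossentropy(y_int, probs):
    return -np.log(probs[np.arange(len(y_int)), y_int])


# A digit label of 9 fed to a single sigmoid unit: the "loss" is negative.
print(binary_crossentropy(9, 0.9))  # about -17.47

labels = np.array([3, 9, 0])
logits = np.random.default_rng(1).normal(size=(3, 10))
probs = softmax(logits)
one_hot = np.eye(10)[labels]  # same encoding as to_categorical(labels, 10)
print(one_hot.shape)  # (3, 10), matching a Dense(10) softmax output
print(categorical_crossentropy(one_hot, probs))
print(sparse_categorical_crossentropy(labels, probs))  # identical values
```

In the question's code this means either `tf.keras.utils.to_categorical(train_y, num_classes=10)` with `loss='categorical_crossentropy'`, or leaving `train_y` as integers with `loss='sparse_categorical_crossentropy'`.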
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I'm training a deep learning model and get very low validation accuracy. I used L2 regularization to stop overfitting and get higher accuracy, but it didn't solve the problem. What could cause this very low accuracy, and how can I fix it?
The training accuracy is almost perfect (>90%), whereas the validation accuracy is very low (<51%), as shown below:
Epoch 1/15
2601/2601 - 38s - loss: 1.6510 - accuracy: 0.5125 - val_loss: 1.6108 - val_accuracy: 0.4706
Epoch 2/15
2601/2601 - 38s - loss: 1.1733 - accuracy: 0.7009 - val_loss: 1.5660 - val_accuracy: 0.4971
Epoch 3/15
2601/2601 - 38s - loss: 0.9169 - accuracy: 0.8147 - val_loss: 1.6223 - val_accuracy: 0.4948
Epoch 4/15
2601/2601 - 38s - loss: 0.7820 - accuracy: 0.8551 - val_loss: 1.7773 - val_accuracy: 0.4683
Epoch 5/15
2601/2601 - 38s - loss: 0.6539 - accuracy: 0.8989 - val_loss: 1.7968 - val_accuracy: 0.4937
Epoch 6/15
2601/2601 - 38s - loss: 0.5691 - accuracy: 0.9204 - val_loss: 1.8743 - val_accuracy: 0.4844
Epoch 7/15
2601/2601 - 38s - loss: 0.5090 - accuracy: 0.9327 - val_loss: 1.9348 - val_accuracy: 0.5029
Epoch 8/15
2601/2601 - 40s - loss: 0.4465 - accuracy: 0.9500 - val_loss: 1.9566 - val_accuracy: 0.4787
Epoch 9/15
2601/2601 - 38s - loss: 0.3931 - accuracy: 0.9596 - val_loss: 2.0824 - val_accuracy: 0.4764
Epoch 10/15
2601/2601 - 41s - loss: 0.3786 - accuracy: 0.9596 - val_loss: 2.1185 - val_accuracy: 0.4925
Epoch 11/15
2601/2601 - 38s - loss: 0.3471 - accuracy: 0.9604 - val_loss: 2.1972 - val_accuracy: 0.4879
Epoch 12/15
2601/2601 - 38s - loss: 0.3169 - accuracy: 0.9669 - val_loss: 2.1091 - val_accuracy: 0.4948
Epoch 13/15
2601/2601 - 38s - loss: 0.3018 - accuracy: 0.9685 - val_loss: 2.2073 - val_accuracy: 0.5006
Epoch 14/15
2601/2601 - 38s - loss: 0.2629 - accuracy: 0.9746 - val_loss: 2.2086 - val_accuracy: 0.4971
Epoch 15/15
2601/2601 - 38s - loss: 0.2700 - accuracy: 0.9650 - val_loss: 2.2178 - val_accuracy: 0.4879
I tried increasing the number of epochs, but it only increases the training accuracy and lowers the validation accuracy.
Any advice on how to overcome this issue?
My code:
def createModel():
    input_shape=(11, 3840,1)
    model = Sequential()
    #C1
    model.add(Conv2D(16, (5, 5), strides=( 2, 2), padding='same',activation='relu', input_shape=input_shape))
    model.add(keras.layers.MaxPooling2D(pool_size=( 2, 2), padding='same'))
    model.add(BatchNormalization())
    #C2
    model.add(Conv2D(32, ( 3, 3), strides=(1,1), padding='same', activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'))
    model.add(BatchNormalization())
    #C3
    model.add(Conv2D(64, (3, 3), strides=( 1,1), padding='same', activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'))
    model.add(BatchNormalization())
    model.add(Dense(64, input_dim=64,kernel_regularizer=regularizers.l2(0.01)))
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(256, activation='sigmoid'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))
    opt_adam = keras.optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
    model.compile(loss='categorical_crossentropy', optimizer=opt_adam, metrics=['accuracy'])
    return model
def getFilesPathWithoutSeizure(indexSeizure, indexPat):
    filesPath=[]
    print(indexSeizure)
    print(indexPat)
    for i in range(0, nSeizure):
        if(i==indexSeizure):
            filesPath.extend(interictalSpectograms[i])
            filesPath.extend(preictalSpectograms[i])
    shuffle(filesPath)
    return filesPath
def generate_arrays_for_training(indexPat, paths, start=0, end=100):
    while True:
        from_=int(len(paths)/100*start)
        to_=int(len(paths)/100*end)
        for i in range(from_, int(to_)):
            f=paths[i]
            x = np.load(PathSpectogramFolder+f)
            x = np.expand_dims(np.expand_dims(x, axis=0), axis = 0)
            x = x.transpose(0, 2, 3, 1)
            if('P' in f):
                y = np.repeat([[0,1]],x.shape[0], axis=0)
            else:
                y =np.repeat([[1,0]],x.shape[0], axis=0)
            yield(x,y)
filesPath=getFilesPathWithoutSeizure(i, indexPat)
history=model.fit_generator(generate_arrays_for_training(indexPat, filesPath, end=75),  # takes the first 75%
                            validation_data=generate_arrays_for_training(indexPat, filesPath, start=75),  # takes the last 25%
                            steps_per_epoch=int((len(filesPath)-int(len(filesPath)/100*25))),
                            validation_steps=int((len(filesPath)-int(len(filesPath)/100*75))),
                            verbose=2,class_weight = {0:1, 1:1},
                            epochs=15, max_queue_size=2, shuffle=True)
You seem to have implemented shuffling in the function getFilesPathWithoutSeizure(), though you could verify whether the shuffling is actually working or not by printing out the filenames multiple times.
filesPath=getFilesPathWithoutSeizure(i, indexPat) - is the i getting updated?
As per your code, the condition if(i==indexSeizure): in getFilesPathWithoutSeizure means that only the files of a single seizure are returned: those at the index where the loop counter i equals indexSeizure.
If you are not changing the i argument passed in the function call, only that one seizure's files end up in the filesPath variable, and your whole training is done on that small subset instead of 75% of the 3467 files.
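One quick way to check how much data the selection loop actually keeps is to run it on toy lists. The sketch below uses hypothetical stand-ins for `interictalSpectograms`, `preictalSpectograms` and `nSeizure`. It contrasts the question's `==` condition, which keeps only the files of the seizure at `indexSeizure`, with a `!=` condition, which keeps every other seizure's files. The `!=` variant is only an assumption about the intent suggested by the function name (leave one seizure out); the question does not state it.

```python
import random


def select_files(interictal, preictal, index_seizure, keep_only_index):
    """Collect file names per seizure index.

    keep_only_index=True mirrors the question's `if(i==indexSeizure)`;
    False keeps every seizure except index_seizure (hypothetical `!=` variant).
    """
    files = []
    for i in range(len(interictal)):  # stands in for range(0, nSeizure)
        selected = (i == index_seizure) if keep_only_index else (i != index_seizure)
        if selected:
            files.extend(interictal[i])
            files.extend(preictal[i])
    random.shuffle(files)
    return files


# Hypothetical toy data: 3 seizures, 2 interictal and 2 preictal files each.
interictal = [[f"I{s}_{k}.npy" for k in range(2)] for s in range(3)]
preictal = [[f"P{s}_{k}.npy" for k in range(2)] for s in range(3)]

only = select_files(interictal, preictal, index_seizure=0, keep_only_index=True)
others = select_files(interictal, preictal, index_seizure=0, keep_only_index=False)
print(len(only), len(others))  # 4 8
print(only)  # re-run and compare the order to confirm shuffling happens
```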
--
If, after confirming that shuffling works and that your function call puts all the data into your filesPath variable, the problem persists, try the following:
Data augmentation could help reduce overfitting by increasing the diversity of your dataset through random but realistic transformations such as rotation, shearing, horizontal and vertical flips, zooming, de-centering, etc.
More importantly, though, you would need to manually look into your data and understand how similar your training samples are.
Another option would be to simply get more, and more diverse, data to train on.
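The random transformations listed above can be sketched in plain NumPy. This minimal version applies a random horizontal flip and a small random shift; the shift wraps pixels around the border via `np.roll`, which is a simplification (Keras' own augmentation utilities fill edges differently). The function name and parameters are hypothetical. For spectrogram inputs like these, flips and rotations may not be realistic because the axes are time and frequency, so choose transformations that keep the label meaningful.

```python
import numpy as np


def augment(image, rng, max_shift=2, flip_prob=0.5):
    """Randomly flip left-right and shift an (H, W, C) array.

    np.roll wraps pixels around the border; this is a simplification.
    """
    out = image
    if rng.random() < flip_prob:
        out = out[:, ::-1, :]  # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))  # random de-centering
    return out


rng = np.random.default_rng(0)
image = np.arange(5 * 6, dtype=float).reshape(5, 6, 1)
batch = np.stack([augment(image, rng) for _ in range(4)])
print(batch.shape)  # (4, 5, 6, 1)
```

Each call produces a new variant of the same sample, so the model rarely sees exactly the same input twice.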
Your model is overfitting and not generalizing properly. If your training set is completely different from your validation set (you are splitting 75%/25%, but the 75% could be completely different from the 25%), your model will have a hard time generalizing.
Shuffle your data before you split into training and validation. That should improve your results.
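A minimal sketch of shuffling once and then splitting, assuming the file list fits in memory; the function name and file names are hypothetical. The question's generators could then be given `train_paths` and `val_paths` instead of the first 75% and last 25% of one list.

```python
import numpy as np


def shuffled_split(paths, val_fraction=0.25, seed=0):
    """Shuffle once, then split into disjoint training and validation lists."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(paths))
    n_val = int(round(len(paths) * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return [paths[i] for i in train_idx], [paths[i] for i in val_idx]


paths = [f"file_{i:04d}.npy" for i in range(3467)]  # hypothetical file names
train_paths, val_paths = shuffled_split(paths)
print(len(train_paths), len(val_paths))  # 2600 867
```

Fixing the seed keeps the split reproducible across runs, so validation results stay comparable between experiments.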
I've learned some deep learning with TensorFlow and Keras, so I wanted to do some practical experiments.
I want to train a model on the CASIA-FingerprintV5 dataset (20,000 fingerprint images in total), but during training the training accuracy reaches 97% after 120 epochs while the validation accuracy stays at about 45%.
Here are the results:
Epoch 109/200
150/150 [==============================] - 23s 156ms/step - loss: 0.6971 - accuracy: 0.9418 - val_loss: 4.1766 - val_accuracy: 0.4171
Epoch 110/200
150/150 [==============================] - 23s 155ms/step - loss: 0.6719 - accuracy: 0.9492 - val_loss: 4.1447 - val_accuracy: 0.4379
Epoch 111/200
150/150 [==============================] - 24s 162ms/step - loss: 0.7003 - accuracy: 0.9388 - val_loss: 4.1439 - val_accuracy: 0.4396
Epoch 112/200
150/150 [==============================] - 24s 157ms/step - loss: 0.7010 - accuracy: 0.9377 - val_loss: 4.1577 - val_accuracy: 0.4425
Epoch 113/200
150/150 [==============================] - 24s 160ms/step - loss: 0.6699 - accuracy: 0.9494 - val_loss: 4.1242 - val_accuracy: 0.4371
Epoch 114/200
150/150 [==============================] - 25s 167ms/step - loss: 0.6814 - accuracy: 0.9456 - val_loss: 4.1966 - val_accuracy: 0.4288
Epoch 115/200
150/150 [==============================] - 24s 160ms/step - loss: 0.6440 - accuracy: 0.9590 - val_loss: 4.1586 - val_accuracy: 0.4354
Epoch 116/200
150/150 [==============================] - 23s 157ms/step - loss: 0.7877 - accuracy: 0.9212 - val_loss: 4.0408 - val_accuracy: 0.4246
Epoch 117/200
150/150 [==============================] - 23s 156ms/step - loss: 0.6728 - accuracy: 0.9504 - val_loss: 3.9317 - val_accuracy: 0.4567
Epoch 118/200
150/150 [==============================] - 25s 167ms/step - loss: 0.5710 - accuracy: 0.9874 - val_loss: 3.9505 - val_accuracy: 0.4483
Epoch 119/200
150/150 [==============================] - 24s 158ms/step - loss: 0.5616 - accuracy: 0.9873 - val_loss: 4.0607 - val_accuracy: 0.4542
Epoch 120/200
150/150 [==============================] - 23s 156ms/step - loss: 0.5948 - accuracy: 0.9716 - val_loss: 4.1531 - val_accuracy: 0.4238
Epoch 121/200
150/150 [==============================] - 23s 155ms/step - loss: 0.7453 - accuracy: 0.9150 - val_loss: 4.0798 - val_accuracy: 0.4154
Epoch 122/200
150/150 [==============================] - 26s 172ms/step - loss: 0.7232 - accuracy: 0.9256 - val_loss: 3.9307 - val_accuracy: 0.4425
Epoch 123/200
150/150 [==============================] - 24s 158ms/step - loss: 0.6277 - accuracy: 0.9632 - val_loss: 3.9988 - val_accuracy: 0.4408
Epoch 124/200
150/150 [==============================] - 23s 156ms/step - loss: 0.6367 - accuracy: 0.9581 - val_loss: 4.0837 - val_accuracy: 0.4358
I searched the Internet and found that overfitting may explain this, so I tried simplifying the layers, adding dropout and regularizers, and using batch normalization. But those methods improve the validation accuracy very little.
I have also shuffled the data and normalized it to floating-point values between 0.0 and 1.0. The original resolution of the images is 328 x 356; they were resized to 400 x 400 before being fed into the autoencoder.
Here is part of my code:
def encoder(input_img):
    #encoder
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    conv1 = BatchNormalization()(conv1)
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv1)
    conv1 = BatchNormalization()(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = BatchNormalization()(conv2)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv2)
    conv2 = BatchNormalization()(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool2)
    conv3 = BatchNormalization()(conv3)
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv3)
    conv3 = BatchNormalization()(conv3)
    return conv3
def fc(enco):
    pool = keras.layers.MaxPooling2D(pool_size = (2, 2))(enco)
    keras.layers.BatchNormalization()
    den1 = keras.layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(1e-3))(pool)
    keras.layers.BatchNormalization()
    pool1 = keras.layers.MaxPooling2D(pool_size = (2, 2))(den1)
    keras.layers.Dropout(0.4)
    den2 = keras.layers.Dense(256, activation = 'relu', kernel_regularizer=regularizers.l2(1e-3))(pool1)
    keras.layers.BatchNormalization()
    pool2 = keras.layers.MaxPooling2D(pool_size = (2, 2))(den2)
    keras.layers.Dropout(0.4)
    den3 = keras.layers.Dense(512, activation = 'relu', kernel_regularizer=regularizers.l2(1e-4))(pool2)
    keras.layers.BatchNormalization()
    pool3 = keras.layers.AveragePooling2D(pool_size = (2, 2))(den3)
    keras.layers.Dropout(0.4)
    flat = keras.layers.Flatten()(pool3)
    keras.layers.Dropout(0.4)
    keras.layers.BatchNormalization()
    den4 = keras.layers.Dense(256, activation = 'relu', kernel_regularizer=regularizers.l2(1e-3))(flat)
    keras.layers.Dropout(0.4)
    keras.layers.BatchNormalization()
    out = keras.layers.Dense(num, activation='softmax',kernel_regularizer=regularizers.l2(1e-4))(den4)
    return out
encode = encoder(input_img)
full_model = Model(input_img,fc(encode))
for l1,l2 in zip(full_model.layers[0:15],autoencoder_model.layers[0:15]):
l1.set_weights(l2.get_weights())
for layer in full_model.layers[0:15]:
layer.trainable = False
full_model.summary()
full_model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Nadam(),metrics=['accuracy'])
batch_size = 64
The autoencoder_model has already been trained and performs well with a loss lower than 3e-4.
So I'm wondering what causes the low validation accuracy and what I can do to improve it.
The most obvious conclusion would be overfitting, but given that you tried the standard methods to correct it (model simplification, dropout and regularization) without any improvement, it may be a different problem. For validation accuracy to be high, the probability distribution of the validation data must mirror that of the data the model was trained on. So the question is: how was the validation data selected? One thing I would try as a test is to make the validation data an identical subset of the training data. In that case the validation accuracy should approach 100%. If it does not, that may point to something in how you process the validation data. I also noticed you chose not to train some layers in the model. Try making all layers trainable and see if that helps. I have seen cases where freezing the weights in a model results in lower validation accuracy. I am not sure why, but I believe that if the non-trainable layers include dropout, then with the weights frozen the dropout has no effect, which can lead to overfitting. I am not a great fan of early stopping; it is a crutch for not effectively addressing overfitting issues.
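One more thing worth checking in the question's `fc()` function: lines such as `keras.layers.Dropout(0.4)` and `keras.layers.BatchNormalization()` on their own only construct layer objects. In the Keras functional API a layer becomes part of the model only when it is called on a tensor, e.g. `den1 = keras.layers.Dropout(0.4)(den1)`, so as written those dropout and batch normalization layers never appear in `full_model` (which `full_model.summary()` should confirm). For reference, the NumPy sketch below shows what an applied dropout layer does: during training it zeroes a random fraction `rate` of activations and scales the rest by 1 / (1 - rate) (inverted dropout), and at inference it passes inputs through unchanged. The function name is hypothetical.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units in training, rescale the rest."""
    if not training or rate == 0.0:
        return x  # inference: identity
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones((1000, 256))
train_out = dropout(activations, rate=0.4, training=True, rng=rng)
print((train_out == 0).mean())  # roughly 0.4 of the units are dropped
print(train_out.mean())         # roughly 1.0: the expected value is preserved
print(np.array_equal(dropout(activations, 0.4, False, rng), activations))  # True
```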
This is my CNN model structure.
def make_dcnn_model():
    model = models.Sequential()
    model.add(layers.Conv2D(5, (5, 5), input_shape=(9, 128,1), padding='same', strides = (1,2), activity_regularizer=tf.keras.regularizers.l1(0.001)))
    model.add(layers.LeakyReLU())
    model.add(BatchNormalization())
    model.add(layers.AveragePooling2D((4, 4), strides = (2,4)))
    model.add(layers.Conv2D(10, (5, 5), padding='same', activity_regularizer=tf.keras.regularizers.l1(0.001)))
    model.add(layers.LeakyReLU())
    model.add(BatchNormalization())
    model.add(layers.AveragePooling2D((2, 2), strides = (1,2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(50, activity_regularizer=tf.keras.regularizers.l1(0.001)))
    model.add(layers.LeakyReLU())
    model.add(BatchNormalization())
    model.add(layers.Dense(6, activation='softmax'))
    return model
The results show that this model fits the training data well, but the validation accuracy fluctuates greatly:
Train on 7352 samples, validate on 2947 samples
Epoch 1/3000
7352/7352 [==============================] - 3s 397us/sample - loss: 0.1016 - accuracy: 0.9698 - val_loss: 4.0896 - val_accuracy: 0.5816
Epoch 2/3000
7352/7352 [==============================] - 2s 214us/sample - loss: 0.0965 - accuracy: 0.9727 - val_loss: 1.2296 - val_accuracy: 0.7384
Epoch 3/3000
7352/7352 [==============================] - 1s 198us/sample - loss: 0.0930 - accuracy: 0.9727 - val_loss: 0.9901 - val_accuracy: 0.7855
Epoch 4/3000
7352/7352 [==============================] - 2s 211us/sample - loss: 0.1013 - accuracy: 0.9701 - val_loss: 0.5319 - val_accuracy: 0.9114
Epoch 5/3000
7352/7352 [==============================] - 1s 201us/sample - loss: 0.0958 - accuracy: 0.9721 - val_loss: 0.6938 - val_accuracy: 0.8388
Epoch 6/3000
7352/7352 [==============================] - 2s 205us/sample - loss: 0.0925 - accuracy: 0.9743 - val_loss: 1.4033 - val_accuracy: 0.7472
Epoch 7/3000
7352/7352 [==============================] - 1s 203us/sample - loss: 0.0948 - accuracy: 0.9740 - val_loss: 0.8375 - val_accuracy: 0.7998
Reducing overfitting is a matter of trial and error. There are many ways to deal with it.
Try adding more data to the model, or augmenting your data if you're dealing with images (very helpful).
Try reducing the complexity of the model by tweaking the parameters of the layers.
Try stopping the training earlier.
Regularization and batch normalization are very helpful, but it may be that your model would overfit even more without them. Try different types of regularization (maybe dropout).
My guess is that by adding more variety to the data, your model will overfit less.
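Stopping earlier can be automated. In tf.keras this is the `tf.keras.callbacks.EarlyStopping` callback (for example `monitor='val_loss', patience=3, restore_best_weights=True`). The pure-Python sketch below reproduces only the patience logic on a list of validation losses, so the behaviour can be checked by hand; the function name is hypothetical and it is a simplified model of the callback, not its implementation.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-based, for a val_loss history.

    Training stops once val_loss has not improved by more than min_delta
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# val_loss values from the first seven epochs in the question's log.
history = [4.0896, 1.2296, 0.9901, 0.5319, 0.6938, 1.4033, 0.8375]
print(early_stopping_epoch(history, patience=3))  # (4, 7)
```

With a fluctuating validation curve like this one, restoring the best weights keeps the epoch-4 model rather than whatever the last epoch happened to produce.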