I am trying to train a CNN to classify images into 9 classes. Each class has 1000 training images. I trained VGG16 and VGG19, and both reach a validation accuracy of about 90%. But when I train an InceptionResNetV2 model, the validation accuracy seems to be stuck between 20% and 30%. Below are my code for InceptionResNetV2 and the training log. What can I do to improve the training?
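import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout, Flatten, LeakyReLU

# Pretrained convolutional base, frozen so that only the new head is trained.
base_model = tf.keras.applications.InceptionResNetV2(
    input_shape=(IMG_HEIGHT, IMG_WIDTH, 3), weights='imagenet', include_top=False)
base_model.trainable = False

model = tf.keras.Sequential([
    base_model,
    Flatten(),
    Dense(1024, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    # Note: LeakyReLU after a ReLU activation has no effect, its input is already non-negative.
    LeakyReLU(alpha=0.4),
    Dropout(0.5),
    BatchNormalization(),
    Dense(1024, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    LeakyReLU(alpha=0.4),
    Dense(9, activation='softmax')])

optimizer_model = tf.keras.optimizers.Adam(learning_rate=0.0001, name='Adam', decay=0.00001)
# Note: loss_model is defined but never passed to compile(); the string loss below is what is used.
loss_model = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer_model, loss="categorical_crossentropy", metrics=['accuracy'])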
Epoch 1/10
899/899 [==============================] - 255s 283ms/step - loss: 4.3396 - acc: 0.3548 - val_loss: 4.2744 - val_acc: 0.3874
Epoch 2/10
899/899 [==============================] - 231s 257ms/step - loss: 3.5856 - acc: 0.4695 - val_loss: 3.9151 - val_acc: 0.3816
Epoch 3/10
899/899 [==============================] - 225s 250ms/step - loss: 3.1451 - acc: 0.4959 - val_loss: 4.8801 - val_acc: 0.2425
Epoch 4/10
899/899 [==============================] - 227s 252ms/step - loss: 2.7771 - acc: 0.5124 - val_loss: 3.7167 - val_acc: 0.3023
Epoch 5/10
899/899 [==============================] - 231s 257ms/step - loss: 2.4993 - acc: 0.5260 - val_loss: 3.7276 - val_acc: 0.3770
Epoch 6/10
899/899 [==============================] - 227s 252ms/step - loss: 2.3148 - acc: 0.5251 - val_loss: 3.7677 - val_acc: 0.3115
Epoch 7/10
899/899 [==============================] - 234s 260ms/step - loss: 2.1381 - acc: 0.5379 - val_loss: 3.4867 - val_acc: 0.2862
Epoch 8/10
899/899 [==============================] - 230s 256ms/step - loss: 2.0091 - acc: 0.5367 - val_loss: 4.1032 - val_acc: 0.3080
Epoch 9/10
899/899 [==============================] - 225s 251ms/step - loss: 1.9155 - acc: 0.5399 - val_loss: 4.1270 - val_acc: 0.2954
Epoch 10/10
899/899 [==============================] - 232s 258ms/step - loss: 1.8349 - acc: 0.5508 - val_loss: 4.3918 - val_acc: 0.2276
VGG16/VGG19 have a depth of 23/26 layers, whereas InceptionResNetV2 has a depth of 572 layers. There is also little domain similarity between medical images and the ImageNet dataset. Because VGG is shallow, the features it produces are not very complex, and the network can classify them using the features learned by the Dense layers. InceptionResNetV2, however, is much deeper, so the features coming out of its last layers are far more complex and more specific to ImageNet (picture them as object-like patterns from the ImageNet classes). The Dense layers cannot map these features onto your classes, and the result is overfitting. I hope that makes the point clear.
Check out my answer to a very similar question at this link: Link. It should help improve your accuracy.
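One way to make the overfitting argument above concrete is to count the weights in the first Dense layer of the head. This is a rough sketch, not a measurement of the asker's model: it assumes a 299x299 input, for which InceptionResNetV2 with include_top=False outputs an 8x8x1536 feature map, and a 224x224 input for VGG16 (7x7x512). The helper name dense_params is made up for illustration.

```python
def dense_params(in_features: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * units + units

# Assumed feature-map sizes of the frozen bases (height * width * channels).
irv2_flat = 8 * 8 * 1536   # InceptionResNetV2 at 299x299, after Flatten()
vgg16_flat = 7 * 7 * 512   # VGG16 at 224x224, after Flatten()
irv2_pooled = 1536         # InceptionResNetV2 after global average pooling

irv2_flat_params = dense_params(irv2_flat, 1024)
vgg16_flat_params = dense_params(vgg16_flat, 1024)
irv2_pooled_params = dense_params(irv2_pooled, 1024)

print(irv2_flat_params)    # 100664320
print(vgg16_flat_params)   # 25691136
print(irv2_pooled_params)  # 1573888
```

Under these assumptions the first Dense layer alone has roughly 100 million trainable weights for 9000 training images, so replacing Flatten() with GlobalAveragePooling2D() may be worth trying.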
Related
I'm new to machine learning and I'm building an RNN classifier for a problem similar to Named Entity Recognition (NER), but with only two tags.
I followed a tutorial to build the classifier, and now when fitting the model I get a constant validation accuracy for all the epochs, and part of me thinks this may be a mistake. Is it normal to have a constant val_accuracy?
This is my model:
import numpy as np
from tensorflow import keras as k  # assumption: `k` is an alias for Keras
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, TimeDistributed, Dense
from tensorflow.keras.models import Model

input = Input(shape=(66,))
word_embedding_size = 66
model = Embedding(input_dim=n_words, output_dim=word_embedding_size, input_length=66)(input)
model = Bidirectional(LSTM(units=word_embedding_size,
                           return_sequences=True,
                           dropout=0.5,
                           recurrent_dropout=0.5,
                           kernel_initializer=k.initializers.he_normal()))(model)
model = LSTM(units=word_embedding_size * 2,
             return_sequences=True,
             dropout=0.5,
             recurrent_dropout=0.5,
             kernel_initializer=k.initializers.he_normal())(model)
model = TimeDistributed(Dense(n_tags, activation="sigmoid"))(model)
out = model
model = Model(input, out)
adam = k.optimizers.Adam(learning_rate=0.0005, beta_1=0.9, beta_2=0.999)
# Note: the string 'adam' below uses the default settings, so the `adam` object above is not used.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
history = model.fit(X, np.array(Y), batch_size=256, epochs=10, validation_split=0.3, verbose=1)
And this is how the epochs look:
Epoch 1/10
2/2 [==============================] - 2s 801ms/step - loss: 0.6990 - accuracy: 0.3123 - val_loss: 0.5732 - val_accuracy: 0.9675
Epoch 2/10
2/2 [==============================] - 1s 334ms/step - loss: 0.5552 - accuracy: 0.9713 - val_loss: 0.4202 - val_accuracy: 0.9675
Epoch 3/10
2/2 [==============================] - 1s 310ms/step - loss: 0.3997 - accuracy: 0.9723 - val_loss: 0.2377 - val_accuracy: 0.9675
Epoch 4/10
2/2 [==============================] - 1s 303ms/step - loss: 0.2260 - accuracy: 0.9723 - val_loss: 0.1168 - val_accuracy: 0.9675
Epoch 5/10
2/2 [==============================] - 1s 312ms/step - loss: 0.1126 - accuracy: 0.9723 - val_loss: 0.0851 - val_accuracy: 0.9675
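A constant val_accuracy like the 0.9675 above is what you would see if the model predicted the same tag for every token and one tag dominated the validation labels. That is only a hypothesis here, since the label distribution is not shown. A minimal sketch for checking it, using made-up labels and a hypothetical helper named majority_baseline_accuracy:

```python
from collections import Counter

def majority_baseline_accuracy(label_sequences):
    """Accuracy obtained by always predicting the most frequent tag."""
    flat = [tag for seq in label_sequences for tag in seq]
    counts = Counter(flat)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(flat)

# Made-up example: 3 sequences of 4 tokens each, where tag 1 is rare.
toy_labels = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
baseline = majority_baseline_accuracy(toy_labels)
print(round(baseline, 4))  # 0.9167
```

If the baseline computed on your own validation labels matches the reported val_accuracy, accuracy is probably not informative for this task, and per-tag precision and recall would tell you more.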
I am trying to train my model using transfer learning. I am using the VGG16 model with the top layers stripped, and I froze the first 2 layers to keep their initial ImageNet weights. For fine-tuning I am using a learning rate of 0.0001, softmax activation, dropout of 0.5, categorical cross-entropy loss, the SGD optimizer, and 46 classes.
I just cannot understand the behavior during training. Train loss and accuracy both look fine (loss is decreasing, accuracy is increasing). Validation loss is decreasing and validation accuracy is increasing as well, but both are always higher than the train loss and accuracy.
Assuming it is overfitting, I made the model less complex, increased the dropout rate, and added more samples to the validation data, but nothing seemed to work. I am a newbie, so any kind of help is appreciated.
Epoch 1/50
26137/26137 [==============================] - 7446s 285ms/step - loss: 1.1200 - accuracy: 0.3810 - val_loss: 3.1219 - val_accuracy: 0.4467
Epoch 2/50
26137/26137 [==============================] - 7435s 284ms/step - loss: 0.9944 - accuracy: 0.4353 - val_loss: 2.9348 - val_accuracy: 0.4694
Epoch 3/50
26137/26137 [==============================] - 7532s 288ms/step - loss: 0.9561 - accuracy: 0.4530 - val_loss: 1.6025 - val_accuracy: 0.4780
Epoch 4/50
26137/26137 [==============================] - 7436s 284ms/step - loss: 0.9343 - accuracy: 0.4631 - val_loss: 1.3032 - val_accuracy: 0.4860
Epoch 5/50
26137/26137 [==============================] - 7358s 282ms/step - loss: 0.9185 - accuracy: 0.4703 - val_loss: 1.4461 - val_accuracy: 0.4847
Epoch 6/50
26137/26137 [==============================] - 7396s 283ms/step - loss: 0.9083 - accuracy: 0.4748 - val_loss: 1.4093 - val_accuracy: 0.4908
Epoch 7/50
26137/26137 [==============================] - 7424s 284ms/step - loss: 0.8993 - accuracy: 0.4789 - val_loss: 1.4617 - val_accuracy: 0.4939
Epoch 8/50
26137/26137 [==============================] - 7433s 284ms/step - loss: 0.8925 - accuracy: 0.4822 - val_loss: 1.4257 - val_accuracy: 0.4978
Epoch 9/50
26137/26137 [==============================] - 7445s 285ms/step - loss: 0.8868 - accuracy: 0.4851 - val_loss: 1.5568 - val_accuracy: 0.4953
Epoch 10/50
26137/26137 [==============================] - 7387s 283ms/step - loss: 0.8816 - accuracy: 0.4874 - val_loss: 1.4534 - val_accuracy: 0.4970
Epoch 11/50
26137/26137 [==============================] - 7374s 282ms/step - loss: 0.8779 - accuracy: 0.4894 - val_loss: 1.4605 - val_accuracy: 0.4912
Epoch 12/50
26137/26137 [==============================] - 7411s 284ms/step - loss: 0.8733 - accuracy: 0.4915 - val_loss: 1.4694 - val_accuracy: 0.5030
Yes, you are facing an overfitting issue. To mitigate it, you can try the steps below:
1. Shuffle the data by using shuffle=True in VGG16_model.fit. The code is shown below:
history = VGG16_model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1,
                          validation_data=(x_validation, y_validation), shuffle=True)
2. Use early stopping. The code is shown below:
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
# Pass it to training with VGG16_model.fit(..., callbacks=[callback])
3. Use regularization. The code for regularization is shown below (you can try l1 or l1_l2 regularization as well):
from tensorflow.keras.regularizers import l2
Regularizer = l2(0.001)
# The kernel size must be a single tuple; a third positional argument would clash with strides.
VGG16_model.add(Conv2D(96, (11, 11), input_shape=(227, 227, 3), strides=(4, 4), padding='valid', activation='relu', data_format='channels_last',
                       activity_regularizer=Regularizer, kernel_regularizer=Regularizer))
VGG16_model.add(Dense(units=2, activation='sigmoid',
                      activity_regularizer=Regularizer, kernel_regularizer=Regularizer))
4. You can try using BatchNormalization.
5. Perform image data augmentation using ImageDataGenerator. Refer to this link for more info about that.
6. If the pixels are not normalized, dividing the pixel values by 255 also helps.
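To see what the patience argument in step 2 does, here is a simplified pure-Python sketch of the stopping rule, applied to the val_loss values from the question's log. It is not the Keras implementation: it assumes min_delta=0 and ignores options such as restore_best_weights, and the function name stopping_epoch is hypothetical.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best value: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: count the epoch and stop once patience is used up.
            wait += 1
            if wait >= patience:
                return epoch
    return None

# val_loss for epochs 1-12, copied from the training log in the question.
val_losses = [3.1219, 2.9348, 1.6025, 1.3032, 1.4461, 1.4093,
              1.4617, 1.4257, 1.5568, 1.4534, 1.4605, 1.4694]

print(stopping_epoch(val_losses, patience=5))   # 9
print(stopping_epoch(val_losses, patience=15))  # None
```

With patience=15 the 12 logged epochs would not trigger a stop. A smaller patience would have stopped around epoch 9, because val_loss has not improved since epoch 4.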
I am training a convolutional neural network from scratch on my own dataset with Keras and TensorFlow.
learning rate = 0.0001,
5 classes to sort,
no Dropout used,
dataset checked twice, no wrong labels found
Model:
from tensorflow.keras import layers, models, optimizers  # assumed import; not shown in the original

model = models.Sequential()
model.add(layers.Conv2D(16, (2, 2), activation='relu', input_shape=(75, 75, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(16, (2, 2), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(32, (2, 2), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(5, activation='sigmoid'))
model.compile(optimizer=optimizers.Adam(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['acc'])
# In TF 2.x, model.fit accepts generators directly (fit_generator in older Keras).
history = model.fit(train_generator,
                    steps_per_epoch=100,
                    epochs=50,
                    validation_data=val_generator,
                    validation_steps=25)
Every time the model reaches epochs 25-35 (80-90% accuracy), this happens:
Epoch 31/50
100/100 [==============================] - 3s 34ms/step - loss: 0.3524 - acc: 0.8558 - val_loss: 0.4151 - val_acc: 0.7992
Epoch 32/50
100/100 [==============================] - 3s 34ms/step - loss: 0.3393 - acc: 0.8700 - val_loss: 0.4384 - val_acc: 0.7951
Epoch 33/50
100/100 [==============================] - 3s 34ms/step - loss: 0.3321 - acc: 0.8702 - val_loss: 0.4993 - val_acc: 0.7620
Epoch 34/50
100/100 [==============================] - 3s 33ms/step - loss: 1.5444 - acc: 0.3302 - val_loss: 1.6062 - val_acc: 0.1704
Epoch 35/50
100/100 [==============================] - 3s 34ms/step - loss: 1.6094 - acc: 0.2935 - val_loss: 1.6062 - val_acc: 0.1724
There are some similar questions with answers, but they mostly recommend lowering the learning rate, and that doesn't help at all.
UPD: almost all weights and biases in the network became NaN. The network somehow died inside.
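A side note on the numbers in the log: the loss of 1.6094 in epoch 35 is the value categorical cross-entropy takes when a model assigns equal probability to each of the 5 classes, which is consistent with a network that has stopped discriminating between classes. A quick check of that arithmetic (the helper name is made up):

```python
import math

def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy of a uniform prediction against a one-hot label."""
    # -log(1 / num_classes) simplifies to log(num_classes).
    return math.log(num_classes)

loss = uniform_cross_entropy(5)
print(round(loss, 4))  # 1.6094
```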
Solution in this case:
I changed the sigmoid function in the last layer to a softmax function, and the drops are gone.
Why did this work?
The sigmoid activation function is used for binary (two-class) classification.
In multiclass problems we should use the softmax function, a generalization of the sigmoid function to more than two classes.
More information: Sigmoid vs Softmax
Special thanks to #desertnaut and #Shubham Panchal for pointing out the error.
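The difference can be seen on a small example. The sketch below uses plain Python with made-up logits for 5 classes: element-wise sigmoid scores are independent of each other and need not sum to 1, while softmax always yields a probability distribution, which is what categorical cross-entropy expects.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0, 3.0]  # made-up scores for 5 classes

sigmoid_scores = [sigmoid(v) for v in logits]
softmax_probs = softmax(logits)

print(sum(sigmoid_scores))  # greater than 1: not a distribution
print(sum(softmax_probs))   # 1.0 up to floating-point rounding
```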
I have about 0.3 million images in my training set (male/female) and around 50K images in the test set (male/female). I am using the model below, and I have also tried adding a few more layers and more units. I am also doing data augmentation and other steps taken from the Keras docs.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

targetSize = 64
classifier = Sequential()  # assumed: the original snippet does not show how classifier is created
classifier.add(Conv2D(filters=32, kernel_size=(3, 3), input_shape=(targetSize, targetSize, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Flatten())
classifier.add(Dropout(rate=0.6))
classifier.add(Dense(units=64, activation='relu'))
classifier.add(Dropout(rate=0.5))
classifier.add(Dense(units=64, activation='relu'))
classifier.add(Dropout(rate=0.2))
classifier.add(Dense(units=1, activation='sigmoid'))  # closing parenthesis was missing
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Part 2 - Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
shear_range = 0.2,
zoom_range = 0.2,
height_shift_range = 0.2,
width_shift_range = 0.2,
horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('<train_folder_loc>',
target_size = (img_size, img_size),
batch_size = batch_size_train,
class_mode = 'binary')
test_set = test_datagen.flow_from_directory('<test_folder_loc>',
target_size = (img_size, img_size),
batch_size = batch_size_test,
class_mode = 'binary')
classifier.fit_generator(training_set,
steps_per_epoch = <train_image_count>/batch_size_train,
epochs = n_epoch,
validation_data = test_set,
validation_steps = <test_image_count>/batch_size_test,
use_multiprocessing = True,
workers=<mycpu>)
But with the many combinations I have tried, I get results like the ones below: train accuracy and validation accuracy are not improving. I trained for up to 100 epochs and the result is almost the same.
Epoch 1/25
11112/11111 [==============================] - 156s 14ms/step - loss: 0.5628 - acc: 0.7403 - val_loss: 0.6001 - val_acc: 0.6967
Epoch 2/25
11112/11111 [==============================] - 156s 14ms/step - loss: 0.5516 - acc: 0.7403 - val_loss: 0.6096 - val_acc: 0.6968
Epoch 3/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5472 - acc: 0.7404 - val_loss: 0.5837 - val_acc: 0.6967
Epoch 4/25
11112/11111 [==============================] - 155s 14ms/step - loss: 0.5437 - acc: 0.7408 - val_loss: 0.5850 - val_acc: 0.6978
Epoch 5/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5409 - acc: 0.7417 - val_loss: 0.5844 - val_acc: 0.6991
Epoch 6/25
11112/11111 [==============================] - 155s 14ms/step - loss: 0.5386 - acc: 0.7420 - val_loss: 0.5828 - val_acc: 0.7011
Epoch 7/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5372 - acc: 0.7427 - val_loss: 0.5856 - val_acc: 0.6984
Epoch 8/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5347 - acc: 0.7437 - val_loss: 0.5847 - val_acc: 0.7017
Epoch 9/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5331 - acc: 0.7444 - val_loss: 0.5770 - val_acc: 0.7017
Epoch 10/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5323 - acc: 0.7443 - val_loss: 0.5803 - val_acc: 0.7037
Epoch 11/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5309 - acc: 0.7453 - val_loss: 0.5877 - val_acc: 0.7018
Epoch 12/25
11112/11111 [==============================] - 155s 14ms/step - loss: 0.5294 - acc: 0.7454 - val_loss: 0.5774 - val_acc: 0.7037
Epoch 13/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5282 - acc: 0.7464 - val_loss: 0.5807 - val_acc: 0.7024
Epoch 14/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5276 - acc: 0.7467 - val_loss: 0.5815 - val_acc: 0.7033
Epoch 15/25
11112/11111 [==============================] - 156s 14ms/step - loss: 0.5269 - acc: 0.7474 - val_loss: 0.5753 - val_acc: 0.7038
Epoch 16/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5263 - acc: 0.7477 - val_loss: 0.5825 - val_acc: 0.7039
Epoch 17/25
11112/11111 [==============================] - 155s 14ms/step - loss: 0.5249 - acc: 0.7485 - val_loss: 0.5821 - val_acc: 0.7037
I need your suggestions on this, or any snippet to try.
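One detail in the log above: the progress bar reads 11112/11111, which suggests that steps_per_epoch was passed as the non-integer result of a division. A small sketch of computing whole step counts instead, with hypothetical image counts and batch size (the real values are not given in the question):

```python
import math

def steps_for(num_images: int, batch_size: int) -> int:
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)

# Hypothetical values, for illustration only.
train_steps = steps_for(300_000, 27)
val_steps = steps_for(50_000, 27)

print(train_steps)  # 11112
print(val_steps)    # 1852
```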
Make sure you can overfit a small sample before trying to extend the network.
I would remove some/all of the Dropout layers and see if it improves performance. I think 3 Dropout layers is quite high.
Try reducing the learning rate.
Try to understand some of the basic principles of CNNs and how they are constructed; implement a simple one that works before arbitrarily plugging in your own parameters.
For example, the number of filters in successive convolutions typically increases in powers of two (e.g. 32, 64, 128, etc.). Your use of dropout is also questionable: 0.6 is very high, and stacking three dropout layers the way you have doesn't make sense.
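To put the remarks about filters and dropout in numbers, the sketch below traces the feature-map size through the question's four Conv2D/MaxPooling2D blocks. It assumes the Keras defaults (padding='valid' and stride 1 for the 3x3 convolutions, 2x2 pooling that rounds down) and a 64x64 input; the function name flattened_features is made up.

```python
def flattened_features(input_size: int, filters):
    """Size of the Flatten() output after conv(3x3, valid) + maxpool(2x2) blocks."""
    size = input_size
    for _ in filters:
        size = size - 2    # 3x3 convolution without padding
        size = size // 2   # 2x2 max pooling, rounded down
    return size * size * filters[-1]

constant = flattened_features(64, [32, 32, 32, 32])
doubling = flattened_features(64, [32, 64, 128, 256])

print(constant)  # 128
print(doubling)  # 1024
```

Under these assumptions the original network flattens to only 128 features, and Dropout(0.6) then zeroes about 60% of them on each training step, which may be part of why the accuracy barely moves.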
Hmm, if you look at it closely, it's not that it isn't moving; it is moving a bit. Sometimes a model only improves up to a certain point, no matter how long you train it or how many layers you add. When that happens, it all boils down to the data. I think it would be best to determine what is keeping your model from improving. Also, my friend, training a good model doesn't happen overnight, especially with real-world data, and even more so with complex data such as images of humans.
I guess if you are following a tutorial that achieved a better score than yours, you could check the versions of the packages they're using, the data you have, the steps they took and, more importantly, re-run the model. Models can get different scores on different training runs.
I suggest you try playing with the layers more, or even use a different type of neural network. If not, try playing with your data more. 300k images is a lot, but image classification can still be really hard.
Finally, I guess you could look into transfer learning with TensorFlow. You can read about it there. It works by retraining pre-made image recognition models. Keras has a tutorial on transfer learning too.
I want to retrain Google's Inception v3 for my own problem with Keras and imagenet weights. The problem is that, when you load the Inception v3 imagenet weights, you must specify that the number of classes is 1000, as follows:
base_network=inception_v3.InceptionV3(include_top=False, weights='imagenet',classes=1000)
If I pass the number of classes in my own dataset (which is not 1000), it raises an error saying that when using imagenet weights, classes must be set to 1000.
In order to customize the top layer of Inception, I've read that you can use a bottleneck. This just means not using the standard Inception top layer and building your own instead, so I can pass the include_top=False parameter and program my own top layer.
If I do so as follows:
x = base_network.output
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation='relu')(x)
predictions = Dense(globals_params.num_classes, activation='softmax')(x)
network = Model(inputs=base_network.input, outputs=predictions)
It works (it trains), but validation loss and accuracy never change, as you can see below (the Inception layers are set to trainable=False in order to keep the imagenet weights).
...
Epoch 00071: acc did not improve
Epoch 72/300
2741/2741 [==============================] - 12s 4ms/step - loss: 0.0471 - acc: 0.9810 - val_loss: 8.5221 - val_acc: 0.4643
Epoch 00072: acc did not improve
Epoch 73/300
2741/2741 [==============================] - 12s 4ms/step - loss: 0.0354 - acc: 0.9872 - val_loss: 8.4629 - val_acc: 0.4718
Epoch 00073: acc did not improve
Epoch 74/300
2741/2741 [==============================] - 12s 4ms/step - loss: 0.0277 - acc: 0.9891 - val_loss: 8.2515 - val_acc: 0.4881
Epoch 00074: acc did not improve
Epoch 75/300
2741/2741 [==============================] - 12s 4ms/step - loss: 0.0330 - acc: 0.9880 - val_loss: 8.5953 - val_acc: 0.4618
Epoch 00075: acc did not improve
Epoch 76/300
2741/2741 [==============================] - 12s 4ms/step - loss: 0.0402 - acc: 0.9854 - val_loss: 8.3820 - val_acc: 0.4793
Epoch 00076: acc did not improve
Epoch 77/300
2741/2741 [==============================] - 12s 4ms/step - loss: 0.0337 - acc: 0.9880 - val_loss: 8.1831 - val_acc: 0.4906
Epoch 00077: acc did not improve
Epoch 78/300
2741/2741 [==============================] - 12s 4ms/step - loss: 0.0381 - acc: 0.9858 - val_loss: 8.4118 - val_acc: 0.4756
...
The question is: how can I program a top layer for Inception that lets me train on my own dataset and actually improves my validation accuracy? I've searched all over the internet and didn't find anything.
Thanks in advance!