I'm a beginner with CNNs, and for a university assignment I've been tasked with creating an image classifier for food items. The dataset I'm using is Recipes5k, which has 101 classes of food.
I'm using Google Colab with TensorFlow and have been following TensorFlow's beginner image classification tutorial.
So far, everything has been clear and easy to understand, but I've run into a problem when training my model: the validation accuracy is extremely low (10-11%) compared to the training accuracy (90%+). I suspect this is due to overfitting. So far, I've tried image augmentation and adding dropout to the model. This did not work as well as expected and only boosted the accuracy by about 5%. I have posted the relevant code snippets below:
Data Augmentation layer:
data_augmentation = keras.Sequential(
    [
        layers.experimental.preprocessing.RandomFlip("horizontal",
                                                     input_shape=(img_height,
                                                                  img_width,
                                                                  3)),
        layers.experimental.preprocessing.RandomRotation(0.1),
        layers.experimental.preprocessing.RandomZoom(0.1),
    ]
)
Model:
model = Sequential([
    data_augmentation,
    layers.experimental.preprocessing.Rescaling(1./255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
Model Summary:
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential_1 (Sequential) (None, 224, 224, 3) 0
_________________________________________________________________
rescaling_2 (Rescaling) (None, 224, 224, 3) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 224, 224, 16) 448
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 112, 112, 16) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 112, 112, 32) 4640
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 56, 56, 32) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 56, 56, 64) 18496
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 28, 28, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 28, 28, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 50176) 0
_________________________________________________________________
dense_2 (Dense) (None, 128) 6422656
_________________________________________________________________
dense_3 (Dense) (None, 101) 13029
=================================================================
Total params: 6,459,269
Trainable params: 6,459,269
Non-trainable params: 0
_________________________________________________________________
Results after training with 250 epochs
Epoch 250/250
121/121 [==============================] - 3s 25ms/step - loss: 0.2564 - accuracy: 0.9270 - val_loss: 17.6184 - val_accuracy: 0.1202
What other techniques can I use to improve the accuracy of my model?
Update: I followed Gerry P's suggestion and changed my last dense layer to use softmax activation. After 1250 epochs of training, the training accuracy increased more slowly and the validation accuracy was around 5-6% higher. This improved my model, but the accuracy is still very low.
Change your last dense layer to
layers.Dense(num_classes, activation='softmax')
In model.compile(), use
loss='categorical_crossentropy'
if your labels are one-hot encoded. If they are integers, use
loss='sparse_categorical_crossentropy'
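To make the difference between the two losses concrete, here is a minimal NumPy sketch (not Keras code; the helper functions and the toy logits are made up for illustration). It suggests that both losses compute the same quantity and only differ in the label format they expect:

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(one_hot_labels, probs):
    # Batch mean of -sum(y * log(p)) for one-hot labels.
    return float(np.mean(-np.sum(one_hot_labels * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(int_labels, probs):
    # Pick the predicted probability of the true class in each row.
    picked = probs[np.arange(len(int_labels)), int_labels]
    return float(np.mean(-np.log(picked)))

# Toy batch: 2 samples, 3 classes (illustrative values only).
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
int_labels = np.array([0, 2])
one_hot_labels = np.eye(3)[int_labels]

probs = softmax(logits)
loss_one_hot = categorical_crossentropy(one_hot_labels, probs)
loss_sparse = sparse_categorical_crossentropy(int_labels, probs)
print(loss_one_hot, loss_sparse)
```

Both print statements should show the same value, so the choice between the two loss names depends only on how the labels are stored.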
I have a custom model that was initially trained on VGG16 using transfer learning. However, it was trained on images with a smaller input size. Now I am using bigger images, so I'd like to take the first model and reuse what it has learned, but with a new dataset.
More specifically:
Layer (type) Output Shape Param #
=================================================================
block1_conv1 (Conv2D) (None, 128, 160, 64) 1792
block1_conv2 (Conv2D) (None, 128, 160, 64) 36928
block1_pool (MaxPooling2D) (None, 64, 80, 64) 0
block2_conv1 (Conv2D) (None, 64, 80, 128) 73856
block2_conv2 (Conv2D) (None, 64, 80, 128) 147584
block2_pool (MaxPooling2D) (None, 32, 40, 128) 0
block3_conv1 (Conv2D) (None, 32, 40, 256) 295168
block3_conv2 (Conv2D) (None, 32, 40, 256) 590080
block3_conv3 (Conv2D) (None, 32, 40, 256) 590080
block3_pool (MaxPooling2D) (None, 16, 20, 256) 0
block4_conv1 (Conv2D) (None, 16, 20, 512) 1180160
block4_conv2 (Conv2D) (None, 16, 20, 512) 2359808
block4_conv3 (Conv2D) (None, 16, 20, 512) 2359808
block4_pool (MaxPooling2D) (None, 8, 10, 512) 0
block5_conv1 (Conv2D) (None, 8, 10, 512) 2359808
block5_conv2 (Conv2D) (None, 8, 10, 512) 2359808
block5_conv3 (Conv2D) (None, 8, 10, 512) 2359808
block5_pool (MaxPooling2D) (None, 4, 5, 512) 0
flatten (Flatten) (None, 10240) 0
dense (Dense) (None, 16) 163856
output (Dense) (None, 1) 17
The problem is that this model already includes an input layer of 128x160, and I'd like to change it to 384x288 for transfer learning.
The above is my first model. I would now like to do transfer learning again, but with a different dataset whose inputs are 384x288, and I'd like to use a softmax over two classes instead.
So what I want is to do transfer learning from the custom model on a different dataset, which means I need to change the input size and retrain the new model on my own data.
How can I do transfer learning on the model above with a new dataset and a different classification layer at the output?
You can follow these steps:
Build another instance of the model; don't forget to change its input shape.
Copy the weights of the shared convolutional layers from the loaded model, and set them to be non-trainable.
for new_layer, layer in zip(new_model.layers[0:-4], model.layers[0:-4]):
    new_layer.set_weights(layer.get_weights())
    new_layer.trainable = False
Add new dense layers and train the whole model.
Further reading:
This answer and this question explain how you can change the input shape.
The Keras guides show how you can do transfer learning with Keras. Under this question are some useful code snippets.
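As a rough check of why only the convolutional weights can be copied, the arithmetic below reproduces numbers from the summary in the question. It is a plain-Python sketch; the helper names are made up, and it assumes five 2x2 poolings and 512 final channels as in the listed VGG16 blocks:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    # Kernel weights plus one bias per filter; image size does not appear.
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

def flatten_size(height, width, n_pools=5, channels=512):
    # Each 2x2 max-pool halves height and width (integer division).
    for _ in range(n_pools):
        height, width = height // 2, width // 2
    return height * width * channels

block1_conv1 = conv2d_params(3, 3, 3, 64)   # 1792, as in the summary
old_flat = flatten_size(128, 160)           # 10240, as in the summary
new_flat = flatten_size(384, 288)           # size for the new 384x288 input
old_dense_params = old_flat * 16 + 16       # 163856, as in the summary
print(block1_conv1, old_flat, new_flat, old_dense_params)
```

The convolution kernels keep the same shape for any input size, while the flattened vector grows from 10240 to 55296 values, so the old dense layer's weight matrix no longer fits and has to be replaced.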
There are many possible solutions.
A very simple one, as many have suggested:
Downscale the images to the input size of the pretrained model.
Change the final layer of the pretrained model and freeze the rest of the layers.
Train the model [transfer learning].
Once the model converges, you can unfreeze the full model and train it again at a very low learning rate [fine-tuning].
However, with the above approach you cannot take advantage of the higher-resolution images you have.
Using the pretrained model as a feature extractor
Another approach is to use the pretrained model just as a feature extractor and train a separate model on the high-resolution images. Finally, use the features from both the pretrained model and your trained model to make the final predictions.
Sample code:
import numpy as np
import tensorflow as tf
from tensorflow import keras
low_res_image_size = (150, 150, 3)
hig_res_image_size = (320, 240, 3)
n_classes = 4
# Load your pretrained model, trained on low-resolution images
base_model = tf.keras.applications.VGG16(
    include_top=False, weights='imagenet', input_shape=low_res_image_size)
# Freeze the pretrained model
base_model.trainable = False
# Unfrozen model to be trained on high-resolution images
model = tf.keras.applications.VGG19(
    include_top=False, weights='imagenet', input_shape=hig_res_image_size)
model.trainable = True
# Downscale images
downscale_layer = tf.keras.layers.Resizing(
    low_res_image_size[0], low_res_image_size[1],
    interpolation='bilinear', crop_to_aspect_ratio=False)
# Create model
inputs = keras.Input(shape=hig_res_image_size)
downscaled_inputs = downscale_layer(inputs)
# Features from the frozen low-resolution branch
features = base_model(downscaled_inputs, training=False)
features = keras.layers.GlobalAveragePooling2D()(features)
# Features from the trainable high-resolution branch
x = model(inputs, training=True)
x = keras.layers.GlobalAveragePooling2D()(x)
# Combine both feature vectors
concatted = tf.keras.layers.Concatenate()([features, x])
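# Softmax output, so the loss below receives probabilities rather than raw logits
outputs = keras.layers.Dense(n_classes, activation='softmax')(concatted)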
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss='sparse_categorical_crossentropy')
# Train on some random data
model.fit(
    np.random.random((100, *hig_res_image_size)),
    np.random.randint(0, n_classes, 100), epochs=3)
Output:
Epoch 1/3
4/4 [==============================] - 4s 553ms/step - loss: 8.7033
Epoch 2/3
4/4 [==============================] - 2s 554ms/step - loss: 9.0746
Epoch 3/3
4/4 [==============================] - 2s 553ms/step - loss: 9.0746
<keras.callbacks.History at 0x7f559a104650>
As an added step, after the model converges you can also unfreeze all the layers and train the full model again using a very low learning rate. Just keep an eye on overfitting.
I found a very simple solution to my problem, and I am now able to train it with different data and different classification layers:
import tensorflow as tf
from tensorflow.keras.models import load_model, Model, Sequential
old_model = load_model("/content/drive/MyDrive/old_model.h5")
# Remove the classification, dense and flatten layers
old_model = Model(old_model.input, old_model.layers[-4].output)
# Create a new model from the 2nd layer onwards (all the convolutional blocks)
base_model = Sequential()
for layer in old_model.layers[1:]:
    base_model.add(layer)
# List the reused layers and whether they are trainable
for layer_number, layer in enumerate(base_model.layers):
    print(layer_number, layer.name, layer.trainable)
# Perform transfer learning
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(384, 288, 3)),
    base_model,
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(units=2, activation='softmax')
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Copy your model into another model (transfer learning), then modify the new model however you want to use it: change the input size, change the activation functions, whatever you need.
I am working on a music genre classification project, using the GTZAN dataset to build a simple CNN that classifies the genre of an audio file.
My code for model training, validation and testing is below:
input_shape = (genre_features.train_X.shape[1], genre_features.train_X.shape[2],1)
print("Build CNN model ...")
model = Sequential()
model.add(Conv2D(24, (5, 5), strides=(1, 1), input_shape=input_shape))
model.add(AveragePooling2D((2, 2), strides=(2,2)))
model.add(Activation('relu'))
model.add(Conv2D(48, (5, 5), padding="same"))
model.add(AveragePooling2D((2, 2), strides=(2,2)))
model.add(Activation('relu'))
model.add(Conv2D(48, (5, 5), padding="same"))
model.add(AveragePooling2D((2, 2), strides=(2,2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dropout(rate=0.5))
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
print("Compiling ...")
opt = Adam()
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
model.summary()
print("Training ...")
batch_size = 35 # num of training examples per minibatch
num_epochs = 400
model.fit(
    genre_features.train_X,
    genre_features.train_Y,
    batch_size=batch_size,
    epochs=num_epochs
)
print("\nValidating ...")
score, accuracy = model.evaluate(
    genre_features.dev_X, genre_features.dev_Y, batch_size=batch_size, verbose=1
)
print("Dev loss: ", score)
print("Dev accuracy: ", accuracy)
print("\nTesting ...")
score, accuracy = model.evaluate(
    genre_features.test_X, genre_features.test_Y, batch_size=batch_size, verbose=1
)
print("Test loss: ", score)
print("Test accuracy: ", accuracy)
# Creates an HDF5 file 'lstm_genre_classifier_lstm.h5'
model_filename = "lstm_genre_classifier_lstm.h5"
print("\nSaving model: " + model_filename)
model.save(model_filename)
When I try to train the model I get the following error (I also printed the train, validation and test shapes before compiling the model):
Training X shape: (700, 128, 33)
Training Y shape: (700, 10)
Dev X shape: (200, 128, 33)
Dev Y shape: (200, 10)
Test X shape: (100, 128, 33)
Test Y shape: (100, 10)
Build CNN model ...
2020-12-25 15:46:58.410663: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Compiling ...
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 124, 29, 24) 624
_________________________________________________________________
average_pooling2d_1 (Average (None, 62, 14, 24) 0
_________________________________________________________________
activation_1 (Activation) (None, 62, 14, 24) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 62, 14, 48) 28848
_________________________________________________________________
average_pooling2d_2 (Average (None, 31, 7, 48) 0
_________________________________________________________________
activation_2 (Activation) (None, 31, 7, 48) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 31, 7, 48) 57648
_________________________________________________________________
average_pooling2d_3 (Average (None, 15, 3, 48) 0
_________________________________________________________________
activation_3 (Activation) (None, 15, 3, 48) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 2160) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 2160) 0
_________________________________________________________________
dense_1 (Dense) (None, 64) 138304
_________________________________________________________________
activation_4 (Activation) (None, 64) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 64) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 650
_________________________________________________________________
activation_5 (Activation) (None, 10) 0
=================================================================
Total params: 226,074
Trainable params: 226,074
Non-trainable params: 0
_________________________________________________________________
Training ...
Traceback (most recent call last):
File "cnn.py", line 82, in <module>
epochs=400
File "C:\Users\Bharat.000\miniconda3\lib\site-packages\keras\engine\training.py", line 1154, in fit
batch_size=batch_size)
File "C:\Users\Bharat.000\miniconda3\lib\site-packages\keras\engine\training.py", line 579, in _standardize_user_data
exception_prefix='input')
File "C:\Users\Bharat.000\miniconda3\lib\site-packages\keras\engine\training_utils.py", line 135, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (700, 128, 33)
I tried a few solutions from similar questions, but I could not understand much since I am new to this topic. Any help on what I should change to get this working would be appreciated.
Your input has the wrong number of dimensions. Are you sure your data is 2D (like images) and not 1D (like sound waves)? If your data is 1D, you should be using 1-dimensional convolutions. The error occurs because your training data has the shape (700, 128, 33), where 700 is the number of data points. Conv2D in Keras needs (batch size, image_height, image_width, channels); channels can come first or last, but that is not really relevant here. In other words, Conv2D expects four dimensions and your array only has three, so the channel axis is missing. If each sample really is a 128x33 2D map, add a channel axis of size 1 to match your input_shape; otherwise, a 1-dimensional convolution may be what you're looking for.
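If the 128x33 arrays are meant to be treated as single-channel 2D inputs, which is what the input_shape in the question suggests, one way to add the missing axis is sketched below. The zero array is only a stand-in for genre_features.train_X:

```python
import numpy as np

# Stand-in for genre_features.train_X: 700 samples of 128 x 33 features.
train_X = np.zeros((700, 128, 33), dtype=np.float32)

# Append a trailing channel axis so every sample becomes (128, 33, 1).
train_X_4d = train_X[..., np.newaxis]

print(train_X.shape)     # (700, 128, 33)
print(train_X_4d.shape)  # (700, 128, 33, 1)
```

The same reshaping would have to be applied to the dev and test arrays before calling model.evaluate().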
I have a model as follows:
from tensorflow import keras
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPooling2D
model = keras.Sequential([
    Conv2D(16, (3, 3), padding='same', activation='relu', input_shape=(480, 640, 3), data_format="channels_last"),
    MaxPooling2D(),
    Conv2D(32, (3, 3), padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, (3, 3), padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(480, activation='relu'),
    Dense(1, activation="relu")
])
model.compile(optimizer='adam',
              loss=keras.losses.MeanSquaredError(),
              metrics=['accuracy'])
epochs = 3
model.fit(
    x=train_images,
    y=train_values,
    epochs=epochs
)
The variable train_images is an array of PNG images (640x480 pixels) and train_values is an array of floats (e.g. [1.11842, -17.894, 2.03, ...]).
My goal is to predict the float value (or at least some approximate value), so I suppose that MSE should be the loss function in this case.
However, after training the model, I only get zeros, not only with model.predict(test_images) but also with model.predict(train_images).
Note: I should point out that my training batch contains only 37 images, and my test sample contains 14. I know that the size is ridiculous, but this script is just a proof of concept for something bigger.
If it helps, here is the result of model.summary():
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 480, 640, 16) 448
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 240, 320, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 240, 320, 32) 4640
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 120, 160, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 120, 160, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 60, 80, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 307200) 0
_________________________________________________________________
dense (Dense) (None, 480) 147456480
_________________________________________________________________
dense_1 (Dense) (None, 1) 481
=================================================================
Total params: 147,480,545
Trainable params: 147,480,545
Non-trainable params: 0
To start with, change the activation function of your output layer: ReLU clips every value below 0 to 0, which is not desirable here.
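A small NumPy sketch of the clipping, using the sample targets from the question (the relu helper is written out here only for illustration):

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and maps everything else to 0.
    return np.maximum(x, 0.0)

# Sample targets taken from the question.
targets = np.array([1.11842, -17.894, 2.03])

# The closest value a ReLU output could ever produce for each target.
closest_reachable = relu(targets)
print(closest_reachable)
```

The negative target can never be matched. In addition, once the pre-activation of the single output unit is negative, its gradient through ReLU is zero, which is one plausible reason for a model that predicts only zeros.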
Secondly, normalize your y values. As it stands, they could be anywhere from -inf to +inf; range-normalize them and store the normalization parameters. At run time you can always reverse this to get the actual values.
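A minimal sketch of range normalization and its inverse, assuming simple min-max scaling to [0, 1]; the variable names are illustrative and the three values are the samples from the question:

```python
import numpy as np

# Sample targets taken from the question.
train_values = np.array([1.11842, -17.894, 2.03])

# Store the normalization parameters computed on the training targets.
y_min = train_values.min()
y_max = train_values.max()

# Scale the targets to [0, 1] before training.
y_scaled = (train_values - y_min) / (y_max - y_min)

# At prediction time, map the network outputs back to real units.
# Here the scaled targets stand in for model predictions.
y_restored = y_scaled * (y_max - y_min) + y_min
print(y_scaled, y_restored)
```

The parameters should come from the training targets only and be reused unchanged for test predictions.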
Also increase the number of epochs. With a training set that small, I suggest trying to overfit the network first; if a network can overfit, it will most likely train well.
For now, try these suggestions out; I think normalization is quite important.
Also, I suggest making the network much deeper. You need to extract the shapes and textures in the image, and your network might not be deep enough (or, as a matter of fact, dense enough) to do that. I suggest using Keras to load a pretrained model like VGG16, stripping off the head, adding regression layers, and transfer learning it onto your dataset. That could work better.
I want to train a model to predict a person's emotion from physical signals. I have one physical signal that I use as the input feature: ECG (electrocardiography).
I want to use a CNN architecture to extract features from the data, and then feed these extracted features to a classical "Decision Tree Classifier". Below you can see my CNN approach without the decision tree:
model = Sequential()
model.add(Conv1D(15,60,padding='valid', activation='relu',input_shape=(18000,1), strides = 1, kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1)))
model.add(MaxPooling1D(2,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Conv1D(30, 60, padding='valid', activation='relu',kernel_regularizer = regularizers.l1_l2(l1=0.1, l2=0.1), strides=1))
model.add(MaxPooling1D(4,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(3, activation = 'softmax'))
I want to edit this code so that the output layer is a working decision tree instead of model.add(Dense(3, activation = 'softmax')). I have tried to save the outputs of the last convolutional layer like this:
output = model.layers[-6].output
When I printed the output variable, the result was this:
THE OUTPUT: Tensor("conv1d_56/Relu:0", shape=(?, 8971, 30), dtype=float32)
I guess the output variable holds the extracted features. Now, how can I feed my decision tree classifier with the data stored in the output variable? Here is the decision tree from scikit-learn:
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier(criterion = 'entropy')
dtc.fit()
What should I pass to the fit() method? Thanks in advance.
To extract a vector of features that you can pass on to another algorithm, you need a fully connected layer before your softmax layer. Something like this will add in a 128 dimensional layer just before your softmax layer:
model = Sequential()
model.add(Conv1D(15,60,padding='valid', activation='relu',input_shape=(18000,1), strides = 1, kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1)))
model.add(MaxPooling1D(2,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Conv1D(30, 60, padding='valid', activation='relu',kernel_regularizer = regularizers.l1_l2(l1=0.1, l2=0.1), strides=1))
model.add(MaxPooling1D(4,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation = 'softmax'))
If you then run model.summary() you can see the name of the layers:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_9 (Conv1D) (None, 17941, 15) 915
_________________________________________________________________
max_pooling1d_9 (MaxPooling1 (None, 8970, 15) 0
_________________________________________________________________
dropout_10 (Dropout) (None, 8970, 15) 0
_________________________________________________________________
batch_normalization_9 (Batch (None, 8970, 15) 60
_________________________________________________________________
conv1d_10 (Conv1D) (None, 8911, 30) 27030
_________________________________________________________________
max_pooling1d_10 (MaxPooling (None, 2227, 30) 0
_________________________________________________________________
dropout_11 (Dropout) (None, 2227, 30) 0
_________________________________________________________________
batch_normalization_10 (Batc (None, 2227, 30) 120
_________________________________________________________________
flatten_6 (Flatten) (None, 66810) 0
_________________________________________________________________
dense_7 (Dense) (None, 128) 8551808
_________________________________________________________________
dropout_12 (Dropout) (None, 128) 0
_________________________________________________________________
dense_8 (Dense) (None, 3) 387
=================================================================
Total params: 8,580,320
Trainable params: 8,580,230
Non-trainable params: 90
_________________________________________________________________
Once your network has been trained you can create a new model where the output layer becomes 'dense_7' and it'll generate 128 dimensional feature vectors:
feature_vectors_model = Model(model.input, model.get_layer('dense_7').output)
dtc_features = feature_vectors_model.predict(your_X_data) # fit your decision tree on this data
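The last step, fitting the tree, could look roughly like the sketch below. The random arrays are stand-ins for the real dtc_features and labels, and it assumes the labels were one-hot encoded for the softmax network, which the original post does not confirm:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in for feature_vectors_model.predict(your_X_data):
# 60 samples, each a 128-dimensional feature vector.
dtc_features = rng.normal(size=(60, 128))

# Stand-in for one-hot labels with 3 emotion classes (assumption).
one_hot_labels = np.eye(3)[rng.integers(0, 3, size=60)]

# A single-label tree is simplest to use with integer class indices,
# so undo the one-hot encoding first.
class_labels = one_hot_labels.argmax(axis=1)

dtc = DecisionTreeClassifier(criterion='entropy', random_state=0)
dtc.fit(dtc_features, class_labels)

predictions = dtc.predict(dtc_features)
print(predictions.shape)  # (60,)
```

For new data, run it through feature_vectors_model.predict() first and pass the resulting 2-D array to dtc.predict().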
I'm trying to train a neural network with the following structure:
model = Sequential()
model.add(Conv1D(filters = 300, kernel_size = 5, activation='relu', input_shape=(4000, 1)))
model.add(Conv1D(filters = 300, kernel_size = 5, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(filters = 320, kernel_size = 5, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Dropout(0.5))
model.add(Dense(num_labels, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
return model
And I'm getting this error:
expected dense_1 to have shape (442, 3) but got array with shape (3, 1)
My input is a set of phrases (12501 total) that have been tokenized for the 4000 most relevant words, and there are 3 possible classes. Therefore my input is train_x.shape = (12501, 4000). I reshaped this to (12501, 4000, 1) for the Conv1D layer. My train_y.shape is (12501, 3), and I reshaped that into (12501, 3, 1).
I'm using the fit function as follows:
model.fit(train_x, train_y, batch_size=32, epochs=10, verbose=1, validation_split=0.2, shuffle=True)
What am I doing wrong?
There's no need to reshape the labels for classification; keep train_y as (12501, 3). Take a look at your network structure:
print(model.summary())
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_1 (Conv1D) (None, 3996, 300) 1800
_________________________________________________________________
conv1d_2 (Conv1D) (None, 3992, 300) 450300
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1330, 300) 0
_________________________________________________________________
conv1d_3 (Conv1D) (None, 1326, 320) 480320
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 442, 320) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 442, 320) 0
_________________________________________________________________
dense_1 (Dense) (None, 442, 3) 963
=================================================================
Total params: 933,383
Trainable params: 933,383
Non-trainable params: 0
_________________________________________________________________
The final output of the model is (None, 442, 3), but the shape of your labels is (None, 3, 1). You should end the convolutional part with either a global pooling layer, GlobalMaxPooling1D(), or a Flatten layer, Flatten(), which turns the 3D outputs into 2D outputs for classification or regression.
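The shapes in the summary can be reproduced with a little arithmetic, and a NumPy sketch shows what a global max pooling step changes. The helper functions and random arrays are illustrative stand-ins, not Keras code:

```python
import numpy as np

def conv1d_valid(length, kernel_size):
    # 'valid' padding with stride 1 shortens the sequence by kernel_size - 1.
    return length - kernel_size + 1

def max_pool1d(length, pool_size):
    # Non-overlapping pooling windows; any remainder is dropped.
    return length // pool_size

length = 4000
length = conv1d_valid(length, 5)  # 3996
length = conv1d_valid(length, 5)  # 3992
length = max_pool1d(length, 3)    # 1330
length = conv1d_valid(length, 5)  # 1326
length = max_pool1d(length, 3)    # 442

rng = np.random.default_rng(0)
features = rng.normal(size=(2, length, 320))  # (batch, steps, channels)
dense_kernel = rng.normal(size=(320, 3))      # stand-in for the Dense weights

# Dense applied directly: one 3-way output per time step.
per_step = features @ dense_kernel
# Global max pooling over the steps first: one 3-way output per sample.
pooled = features.max(axis=1) @ dense_kernel

print(per_step.shape)  # (2, 442, 3)
print(pooled.shape)    # (2, 3)
```

With the pooled variant the output has shape (batch, 3), which matches labels of shape (12501, 3) without any reshaping.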