I have tried many combinations in the values for this model.
Can 2D Convolutions be used instead of 1D for the following case?
How can accuracy be improved for the training dataset?
shape of original dataset: (343889, 80)
shape of training dataset: (257916, 80)
shape of training labels: (257916,)
shape of testing dataset: (85973, 80)
shape of testing labels: (85973,)
The model is
inputShape = (80,1,)
model = Sequential()
model.add(Input(shape=inputShape))
model.add(Conv1D(filters=80, kernel_size=30, activation='relu'))
model.add(MaxPooling1D(40))
model.add(Dense(60))
model.add(Dense(9))
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
Model's summary
Model: "sequential_11"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_11 (Conv1D) (None, 51, 80) 2480
max_pooling1d_9 (MaxPooling1D) (None, 1, 80) 0
dense_8 (Dense) (None, 1, 60) 4860
dense_9 (Dense) (None, 1, 9) 549
=================================================================
Total params: 7,889
Trainable params: 7,889
Non-trainable params: 0
_________________________________________________________________
The training is given below.
Epoch 1/5
8060/8060 [==============================] - 56s 7ms/step - loss: -25.7724 - accuracy: 0.0015
Epoch 2/5
8060/8060 [==============================] - 44s 5ms/step - loss: -26.7578 - accuracy: 0.0011
Epoch 3/5
8060/8060 [==============================] - 43s 5ms/step - loss: -26.7578 - accuracy: 0.0011
You can try a couple of things to adjust your model's performance.
First, try using Conv2D layers instead of Conv1D (a sketch follows below).
Modify the kernel size to (3,3).
Change the optimizer to SGD and the loss to sparse categorical crossentropy.
Then run the model for more epochs and let's see how that goes.
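A hedged sketch of those suggestions: the 80 features form a 1D vector with no natural 2D structure, so using Conv2D requires reshaping each row into a small grid first. The 8x10 layout below is purely an assumption for illustration, as are the filter count and pooling size.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

# Assumption: reshape each 80-feature row into an 8x10 single-channel grid,
# e.g. X_train2d = X_train.reshape(-1, 8, 10, 1)
model = Sequential([
    Input(shape=(8, 10, 1)),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),  # (3,3) kernel as suggested
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(60, activation='relu'),
    Dense(9, activation='softmax'),  # 9 classes
])
model.compile(optimizer='sgd',  # SGD as suggested
              loss='sparse_categorical_crossentropy',  # integer labels of shape (n,)
              metrics=['accuracy'])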
Since you want to classify something, your model should be set up for classification, and right now it is not (at least not directly).
The problems I can see at first sight are:
Your Dense layers use no activation functions (especially important in the last layer).
You use 9 output neurons, but a binary crossentropy loss.
First of all, in your shoes, I would review how classification problems are set up with neural networks.
About your model, a starting point could be this edit:
inputShape = (80,1,)
model = Sequential()
model.add(Conv1D(filters=80, kernel_size=30, activation='relu', input_shape=inputShape))
model.add(MaxPooling1D(40))
model.add(Flatten())                        # collapse the (1, 80) output to (80,)
model.add(Dense(60, activation='relu'))     # note the activation function
model.add(Dense(9, activation='softmax'))   # note the activation function
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])         # note the loss function
I am not saying this is going to solve your problem (without knowing the data it is impossible to say), but it is a start; then you have to work on fighting overfitting, hyperparameter tuning, etc.
Related
I am working on sign language detection using the VGG16 pre-trained model with grayscale images. When I try to run the model.fit command, I get the following error.
CLARIFICATION
I already have the images in RGB form, but I want to use them as grayscale to check whether they would work. The reason: with color images I am not getting the accuracy I expect; the test accuracy tops out at about 40% and the model overfits the dataset.
Also, this is my model command
vgg = VGG16(input_shape=[128, 128] + [3], weights='imagenet', include_top=False)
This is my model.fit command
history = model.fit(
    train_x,
    train_y,
    epochs=15,
    validation_data=(test_x, test_y),
    callbacks=[early_stop, checkpoint],
    batch_size=32, shuffle=True)
I am new to working with pre-trained models. When I run the code with color images (3 channels), the model overfits and val_accuracy does not rise above 40%, even though I have added many data augmentation techniques. So I want to try grayscale images. Any leads are welcome, as I have been stuck on this for a long time now.
The simplest (and likely fastest) solution I can think of is to just convert your images to RGB. You can do this as part of your model.
model = Sequential([
    tf.keras.layers.Lambda(tf.image.grayscale_to_rgb),
    vgg
])
This will fix your issue with VGG. I also see that you're missing the last dimensionality for your images. Images in grayscale are expected to be of shape [height, width, 1], but you simply have [height, width]. You can fix this using tf.expand_dims:
model = Sequential([
    tf.keras.layers.Lambda(
        lambda x: tf.image.grayscale_to_rgb(tf.expand_dims(x, -1))
    ),
    vgg,
])
Note that this solution solves the problem in the graph, so it runs online. Meaning, at runtime, you can feed data exactly the same way you have it now (in the shape [128, 128], without a channels dimension) and it will still functionally work. If this is your expected dimensionality during runtime, this will be faster than manipulating your data before throwing it into the model.
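For instance, a quick shape check of what that Lambda does (random tensors standing in for real images):
import tensorflow as tf

x = tf.random.uniform((2, 128, 128))                  # grayscale batch, no channel axis
y = tf.image.grayscale_to_rgb(tf.expand_dims(x, -1))  # add a channel, replicate it to 3
print(y.shape)                                        # (2, 128, 128, 3), ready for VGG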
By the way, none of this is ideal, given that VGG was trained specifically to work best with color images. Just thought I should add that.
Why are you overfitting?
Possibly for several reasons:
Your images and labels are not distributed equally across the train, validation, and test sets (maybe you have classes in train that are absent from test), or your train/validation/test split is not stratified correctly, so you train your model on a narrow region of your data and features.
Your dataset is very small and you need more data.
Maybe you have noise in your dataset; first make sure to remove it (if there is noise, the model fits the noise).
How can you input grayscale images to VGG16?
To use VGG16, you need three-channel input images. For this reason, you need to stack your grayscale images as below to obtain three-channel images:
image = tf.concat([image, image, image], -1)
Example of training VGG16 on grayscale images from fashion_mnist dataset:
from tensorflow.keras.applications.vgg16 import VGG16
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
train, val, test = tfds.load(
    'fashion_mnist',
    shuffle_files=True,
    as_supervised=True,
    split=['train[:85%]', 'train[85%:]', 'test']
)

def resize_preprocess(image, label):
    image = tf.image.resize(image, (32, 32))
    image = tf.concat([image, image, image], -1)
    image = tf.keras.applications.vgg16.preprocess_input(image)  # VGG16-specific preprocessing
    return image, label
train = train.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
test = test.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
val = val.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
train = train.repeat(15).batch(64).prefetch(tf.data.AUTOTUNE)
test = test.batch(64).prefetch(tf.data.AUTOTUNE)
val = val.batch(64).prefetch(tf.data.AUTOTUNE)
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(32,32,3))
base_model.trainable = False ## Not trainable weights
model = tf.keras.Sequential()
model.add(base_model)
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='softmax'))  # softmax for 10 mutually exclusive classes
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='Adam',
              metrics=['accuracy'])
model.summary()
fit_callbacks = [tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy', patience=4, restore_best_weights=True)]
history = model.fit(train, steps_per_epoch=150, epochs=5, batch_size=64, validation_data=val, callbacks=fit_callbacks)
model.evaluate(test)
Output:
Model: "sequential_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Functional) (None, 1, 1, 512) 14714688
flatten_3 (Flatten) (None, 512) 0
dense_9 (Dense) (None, 1024) 525312
dropout_6 (Dropout) (None, 1024) 0
dense_10 (Dense) (None, 256) 262400
dropout_7 (Dropout) (None, 256) 0
dense_11 (Dense) (None, 10) 2570
=================================================================
Total params: 15,504,970
Trainable params: 790,282
Non-trainable params: 14,714,688
_________________________________________________________________
Epoch 1/5
150/150 [==============================] - 6s 35ms/step - loss: 0.8056 - accuracy: 0.7217 - val_loss: 0.5433 - val_accuracy: 0.7967
Epoch 2/5
150/150 [==============================] - 4s 26ms/step - loss: 0.5560 - accuracy: 0.7965 - val_loss: 0.4772 - val_accuracy: 0.8224
Epoch 3/5
150/150 [==============================] - 4s 26ms/step - loss: 0.5287 - accuracy: 0.8080 - val_loss: 0.4698 - val_accuracy: 0.8234
Epoch 4/5
150/150 [==============================] - 5s 32ms/step - loss: 0.5012 - accuracy: 0.8149 - val_loss: 0.4334 - val_accuracy: 0.8329
Epoch 5/5
150/150 [==============================] - 4s 25ms/step - loss: 0.4791 - accuracy: 0.8315 - val_loss: 0.4312 - val_accuracy: 0.8398
157/157 [==============================] - 2s 15ms/step - loss: 0.4457 - accuracy: 0.8325
[0.44566288590431213, 0.8324999809265137]
I'm having trouble interpreting the output of the Keras model.fit() method.
The setting
print(tf.version.VERSION) # 2.3.0
print(keras.__version__) # 2.4.0
I have a simple feedforward model for a 3-class classification problem:
def get_baseline_mlp(signal_length):
    input_tensor = keras.layers.Input(signal_length, name="input")
    dense_1 = keras.layers.Flatten()(input_tensor)
    dense_1 = keras.layers.Dense(name='dense_1', activation='relu', units=500)(dense_1)
    dense_1 = keras.layers.Dense(name='dense_2', activation='relu', units=500)(dense_1)
    dense_1 = keras.layers.Dense(name='dense_3', activation='relu', units=500)(dense_1)
    dense_1 = keras.layers.Dense(name='dense_4', activation='softmax', units=3, bias_initializer='zeros')(dense_1)
    model = keras.models.Model(inputs=input_tensor, outputs=[dense_1])
    model.summary()
    return model
My training data are univariate time series, and my output is a one-hot encoded vector of length 3 (I have 3 classes in my classification problem).
The model is compiled as follows:
mlp_base.compile(optimizer=optimizer,
                 loss='categorical_crossentropy',
                 metrics=['categorical_accuracy'])
I have a function to manually calculate accuracy of my prediction with two methods:
def get_accuracy(model, true_x, true_y):
    res = model.predict(true_x)
    res = np.rint(res)
    right = 0
    for i in range(len(true_y[:, 0])):
        if np.array_equal(res[i, :], true_y[i, :]):
            # print(res[i,:], true_y[i,:])
            right += 1
        else:
            pass
    tot = len(true_y[:, 0])
    print('True - total', right, tot)
    print('acc: {}'.format(right / tot))
    print()

    print(' method 2 - categorical')
    res = model.predict(true_x)
    res = np.argmax(res, axis=-1)
    true_y = np.argmax(true_y, axis=-1)
    right = 0
    for i in range(len(true_y)):
        if res[i] == true_y[i]:
            right += 1
        else:
            pass
    tot = len(true_y)
    print('True - total', right, tot)
    print('acc: {}'.format(right / tot))
The Problem
At the end of training, the reported categorical accuracy does not match the one I get using my custom function.
Training output:
Model: "functional_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, 9000)] 0
_________________________________________________________________
flatten_8 (Flatten) (None, 9000) 0
_________________________________________________________________
dense_1 (Dense) (None, 500) 4500500
_________________________________________________________________
dense_2 (Dense) (None, 500) 250500
_________________________________________________________________
dense_3 (Dense) (None, 500) 250500
_________________________________________________________________
dense_4 (Dense) (None, 3) 1503
=================================================================
Total params: 5,003,003
Trainable params: 5,003,003
Non-trainable params: 0
-------------------------------------------------------------------
Fit model on training data
Epoch 1/2
20/20 [==] - 0s 14ms/step - loss: 1.3796 categorical_accuracy: 0.3250 - val_loss: 0.9240 -
Epoch 2/2
20/20 [==] - 0s 8ms/step - loss: 0.8131 categorical_accuracy: 0.6100 - val_loss: 1.2811
Output of accuracy function:
True / total 169 200
acc: 0.845
method 2
True / total 182 200
acc: 0.91
Why am I getting wrong results? Is my accuracy implementation wrong?
Update
Correcting the settings as desertnaut suggested still does not resolve the issue.
Output of fit:
Epoch 1/3
105/105 [===] - 1s 9ms/step - loss: 1.7666 - categorical_accuracy: 0.2980
Epoch 2/3
105/105 [===] - 1s 6ms/step - loss: 1.2380 - categorical_accuracy: 0.4432
Epoch 3/3
105/105 [===] - 1s 5ms/step - loss: 1.0318 - categorical_accuracy: 0.5989
If I use the categorical accuracy function from Keras, I still get different results.
cat_acc = keras.metrics.CategoricalAccuracy()
cat_acc.update_state(tr_y2, y_pred)
print(cat_acc.result().numpy()) # outputs : 0.7211079
Interestingly, if I compute the validation accuracy with the methods above, I get consistent output.
Not quite sure about your accuracy calculation (it seems unnecessarily convoluted, and we always prefer vectorized calculations over for loops), but there are two issues with your code that may impact the results (or even render them meaningless).
The first issue is that, since you are in a multi-class setting, you should compile your model with loss='categorical_crossentropy', and not 'binary_crossentropy'; check my own answer in Why binary_crossentropy and categorical_crossentropy give different performances for the same problem? to see what may happen when you mix losses & accuracies that way (plus, a 'binary_accuracy' here is absolutely meaningless).
The second issue is that you erroneously use activation='sigmoid' for your last layer: since you are in a multi-class (not multi-label) setting with your labels one-hot encoded, the activation in your last layer should be softmax, not sigmoid.
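As an aside, a vectorized version of your accuracy computation could look like this (a sketch assuming one-hot labels and per-class probability outputs):
import numpy as np

def get_accuracy_vectorized(model, true_x, true_y):
    pred = np.argmax(model.predict(true_x), axis=-1)  # predicted class index per sample
    true = np.argmax(true_y, axis=-1)                 # true class index per sample
    return np.mean(pred == true)                      # fraction of matching predictions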
I would love some insight on this. I'm working on a regression problem in Keras with a simple neural network. I have train and test data; the training data consists of 33230 samples with 20020 features (which is a ton of features for this amount of data, but that's another story; the features are just various measurements). The test set is 8308 samples with the same number of features. My data is in a pandas dataframe, and I convert it into numpy arrays, which look as expected:
X_train = np.array(X_train_df)
X_train.shape
(33230, 20020)
X_test = np.array(X_test_df)
X_test.shape
(8308, 20020)
If I pass this into the following fully connected model, it trains very quickly, and produces terrible results on the test set:
Model:
model = Sequential()
model.add(Dense(300, activation="relu", input_shape=(20020,)))
model.add(Dense(300, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
Fit:
model.fit(x=X_train, y=y_train, validation_data=(X_test, y_test), batch_size=128, shuffle=True, epochs=100)
Results after 5 epochs (they don't change substantially after this; training loss goes down, validation loss shoots up):
Train on 33230 samples, validate on 8308 samples
Epoch 1/100
33230/33230 [==============================] - 11s 322us/sample - loss: 217.6460 - mean_absolute_error: 9.6896 - val_loss: 92.2517 - val_mean_absolute_error: 7.6400
Epoch 2/100
33230/33230 [==============================] - 10s 308us/sample - loss: 70.0501 - mean_absolute_error: 7.0170 - val_loss: 90.1813 - val_mean_absolute_error: 7.5721
Epoch 3/100
33230/33230 [==============================] - 10s 309us/sample - loss: 62.5253 - mean_absolute_error: 6.6401 - val_loss: 104.1333 - val_mean_absolute_error: 8.0131
Epoch 4/100
33230/33230 [==============================] - 11s 335us/sample - loss: 55.6250 - mean_absolute_error: 6.2346 - val_loss: 142.8665 - val_mean_absolute_error: 9.3112
Epoch 5/100
33230/33230 [==============================] - 10s 311us/sample - loss: 51.7378 - mean_absolute_error: 5.9570 - val_loss: 208.8995 - val_mean_absolute_error: 11.4158
However if I reshape the data:
X_test = X_test.reshape(8308, 20020, 1)
X_train = X_train.reshape(33230, 20020, 1)
And then use the same model with a Flatten() after the first layer:
model = Sequential()
model.add(Dense(300, activation="relu", input_shape=(20020,1)))
model.add(Flatten())
model.add(Dense(300, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
Then my results look very different, and much better:
Train on 33230 samples, validate on 8308 samples
Epoch 1/100
33230/33230 [==============================] - 1117s 34ms/sample - loss: 112.4860 - mean_absolute_error: 7.5939 - val_loss: 59.3871 - val_mean_absolute_error: 6.2453
Epoch 2/100
33230/33230 [==============================] - 1112s 33ms/sample - loss: 4.7877 - mean_absolute_error: 1.6323 - val_loss: 23.8041 - val_mean_absolute_error: 3.8226
Epoch 3/100
33230/33230 [==============================] - 1116s 34ms/sample - loss: 2.3945 - mean_absolute_error: 1.1755 - val_loss: 14.9597 - val_mean_absolute_error: 2.8702
Epoch 4/100
33230/33230 [==============================] - 1113s 33ms/sample - loss: 1.5722 - mean_absolute_error: 0.9616 - val_loss: 15.0566 - val_mean_absolute_error: 2.9075
Epoch 5/100
33230/33230 [==============================] - 1117s 34ms/sample - loss: 1.4161 - mean_absolute_error: 0.9179 - val_loss: 11.5235 - val_mean_absolute_error: 2.4781
It also takes 1000x longer, but performs well on the test set. I don't understand why this happens. Can someone shed light on this? I'm guessing I'm missing something really basic, but I can't figure out what.
A very good question. First of all, you have to understand how the network actually works. A Dense layer is a fully connected layer, so each neuron has a connection to every neuron in the previous layer. The 1000x slowdown you mention has nothing to do with your training data, but with your network itself. Your second network is so big that I was unable to fit it in my RAM, nor in Google Colab. So for demonstration purposes I will assume your training data has shape (500, 100).
For the first network you posted, with the shape mentioned above, your model looks something like this:
model = Sequential()
model.add(Dense(300, activation="relu", input_shape=(100,)))
model.add(Dense(300, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 300) 30300
_________________________________________________________________
dense_3 (Dense) (None, 300) 90300
_________________________________________________________________
dense_4 (Dense) (None, 100) 30100
_________________________________________________________________
dense_5 (Dense) (None, 1) 101
=================================================================
Total params: 150,801
Trainable params: 150,801
Non-trainable params: 0
_________________________________________________________________
Take note of the total params: 150,801.
Now let's take your second example.
model1 = Sequential()
model1.add(Dense(300, activation="relu", input_shape=(100,1)))
model1.add(Flatten())
model1.add(Dense(300, activation="relu"))
model1.add(Dense(100, activation="relu"))
model1.add(Dense(1, activation='linear'))
model1.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_14 (Dense) (None, 100, 300) 600
_________________________________________________________________
flatten_2 (Flatten) (None, 30000) 0
_________________________________________________________________
dense_15 (Dense) (None, 300) 9000300
_________________________________________________________________
dense_16 (Dense) (None, 100) 30100
_________________________________________________________________
dense_17 (Dense) (None, 1) 101
=================================================================
Total params: 9,031,101
Trainable params: 9,031,101
Non-trainable params: 0
_________________________________________________________________
Your total params increase to 9,031,101. You can imagine what happens when you use your actual data with length 20020: the model blows up, and I was unable to fit it in my RAM at all.
So to conclude, your second model has a huge number of parameters compared to the first model. This is the reason for the slow training, and possibly for the better performance: more parameters can make the learning better. I can't say what makes it better without actually looking at your data, but more parameters can contribute to better performance.
Note: if you remove the Flatten layer, your network parameters will decrease; here is the example.
model1 = Sequential()
model1.add(Dense(300, activation="relu", input_shape=(100,1)))
model1.add(Dense(300, activation="relu"))
model1.add(Dense(100, activation="relu"))
model1.add(Dense(1, activation='linear'))
model1.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_18 (Dense) (None, 100, 300) 600
_________________________________________________________________
dense_19 (Dense) (None, 100, 300) 90300
_________________________________________________________________
dense_20 (Dense) (None, 100, 100) 30100
_________________________________________________________________
dense_21 (Dense) (None, 100, 1) 101
=================================================================
Total params: 121,101
Trainable params: 121,101
Non-trainable params: 0
_________________________________________________________________
I hope my answer helped you understand what is happening and what the difference between the two models is.
UPDATE : 20/07
For your comment, I thought it would be better to update the answer for more clarity. Your question is: how does the number of parameters relate to the shape of the network?
To be honest, I do not clearly understand what you mean by this, but I will still try to answer it. The more layers or neurons you add, the larger the network and its number of trainable parameters.
So your actual question is why the Flatten layer increases your parameter count. For that, you need to understand how parameters are calculated.
model.add(Dense(300, activation="relu", input_shape=(100,)))
Consider this as your first layer: the number of parameters will be units * (input_size + 1), which comes to 300 * (100 + 1) = 30300. Now, when you add the Flatten layer, it does not increase your parameters by itself, but the output of the Flatten layer is the input to the following layer. So consider the following example.
_________________________________________________________________
flatten_2 (Flatten) (None, 30000) 0
_________________________________________________________________
dense_15 (Dense) (None, 300) 9000300
_________________________________________________________________
Here you can see that the output size of the Flatten layer is 30000. Using the formula above, 300 * (30000 + 1) results in 9,000,300 parameters, which is a huge deal in itself. More parameters can help to learn more features and might help to achieve better results. But it always depends on the data; you will have to experiment with it.
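If it helps, the formula is easy to check with a couple of lines (plain arithmetic, no Keras needed):
def dense_params(units, input_size):
    # weights (units * input_size) plus one bias per unit
    return units * (input_size + 1)

print(dense_params(300, 100))    # 30300: first Dense on a (100,) input
print(dense_params(300, 30000))  # 9000300: Dense right after the Flatten layer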
I hope the above explanations have cleared your doubts.
I have built a model, and when I train it, the validation loss is smaller than the training loss and the validation accuracy is higher than the training accuracy. Is the model overfitting? Am I doing something wrong? Can someone please look at my model and see if there is anything wrong with it? Thank you.
input_text = Input(shape=(200,), dtype='int32', name='input_text')
meta_input = Input(shape=(2,), name='meta_input')
embedding = Embedding(input_dim=len(tokenizer.word_index) + 1,
                      output_dim=300,
                      input_length=200)(input_text)
lstm = Bidirectional(LSTM(units=128,
                          dropout=0.5,
                          recurrent_dropout=0.5,
                          return_sequences=True),
                     merge_mode='concat')(embedding)
pool = GlobalMaxPooling1D()(lstm)
dropout = Dropout(0.5)(pool)
text_output = Dense(n_codes, activation='sigmoid', name='aux_output')(dropout)
output = concatenate([text_output, meta_input])
output = Dense(n_codes, activation='relu')(output)
main_output = Dense(n_codes, activation='softmax', name='main_output')(output)
model = Model(inputs=[input_text,meta_input], outputs=[output])
optimer = Adam(lr=.001)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
model.fit([X1_train, X2_train], [y_train],
          validation_data=([X1_valid, X2_valid], [y_valid]),
          batch_size=64, epochs=20, verbose=1)
Here is the output:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_text (InputLayer) [(None, 200)] 0
__________________________________________________________________________________________________
embedding (Embedding) (None, 200, 300) 889500 input_text[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional) (None, 200, 256) 439296 embedding[0][0]
__________________________________________________________________________________________________
global_max_pooling1d (GlobalMax (None, 256) 0 bidirectional[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 256) 0 global_max_pooling1d[0][0]
__________________________________________________________________________________________________
aux_output (Dense) (None, 545) 140065 dropout[0][0]
__________________________________________________________________________________________________
meta_input (InputLayer) [(None, 2)] 0
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 547) 0 aux_output[0][0]
meta_input[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 545) 298660 concatenate[0][0]
==================================================================================================
Total params: 1,767,521
Trainable params: 1,767,521
Non-trainable params: 0
__________________________________________________________________________________________________
Train on 11416 samples, validate on 2035 samples
Epoch 1/20
11416/11416 [==============================] - 158s 14ms/sample - loss: 0.0955 - accuracy: 0.9929 -
val_loss: 0.0559 - val_accuracy: 0.9964
Epoch 2/20
11416/11416 [==============================] - 152s 13ms/sample - loss: 0.0562 - accuracy: 0.9963 -
val_loss: 0.0559 - val_accuracy: 0.9964
Epoch 3/20
11416/11416 [==============================] - 209s 18ms/sample - loss: 0.0562 - accuracy: 0.9963 -
val_loss: 0.0559 - val_accuracy: 0.9964
Epoch 4/20
11416/11416 [==============================] - 178s 16ms/sample - loss: 0.0562 - accuracy: 0.9963 -
val_loss: 0.0559 - val_accuracy: 0.9964
Epoch 5/20
11416/11416 [==============================] - 211s 18ms/sample - loss: 0.0562 - accuracy: 0.9963 -
val_loss: 0.0559 - val_accuracy: 0.9964
Epoch 6/20
Overfitting would be when acc is higher than val_acc and loss is lower than val_loss.
However, it looks to me like your validation dataset is not representative of the overall distribution of your data. For whatever reason, the results on your validation dataset are constant, and even constantly higher.
You are doing binary classification. Be aware of class imbalance!
E.g., if 99% of your samples are class 0 and 1% are class 1,
then even if your model doesn't learn anything, it will have 99% accuracy if it always predicts 0 without ever once predicting a 1.
Imagine your (mostly random) split of the data created a validation set in which 99.5% of the samples are class 0 and 0.5% are class 1.
Now imagine, in the worst case, that your model doesn't learn anything and always spits out ("predicts") a 0. Then the train accuracy will be constantly 0.99 with a certain loss, and the val accuracy will be constantly 0.995.
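To make this concrete, a two-line check with synthetic labels:
import numpy as np

y_true = np.array([0] * 995 + [1] * 5)   # 99.5% of samples are class 0
y_pred = np.zeros_like(y_true)           # a "model" that always predicts 0
print((y_true == y_pred).mean())         # 0.995 accuracy despite learning nothing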
What puzzles me is that your performance measures are constant. That is ALWAYS bad, because usually, if the model learns something, there is stochastic noise, even when it overfits.
No book tells you the following (no beginner book, anyway); I learned it by experience: put shuffle=True in your model.fit().
It seems you are training in a way that presents the model first only samples of one class and then samples of the other. Mixing samples of both classes perturbs the model well enough and prevents it from getting stuck in some local minimum.
Sometimes I got such constant results even when shuffling.
In that case, I just try another random split, which then works better. (So: try other splits!)
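A minimal sketch of both suggestions (shuffling during fit and a fresh stratified split); the names X and y and the split parameters are assumptions:
from sklearn.model_selection import train_test_split

# stratify=y keeps the class ratio identical in the train and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)

model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          batch_size=64, epochs=20,
          shuffle=True)  # mix the classes within every epoch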
The difference is marginal so I would not worry.
In general, what might be happening is that, by chance, the random split between training and validation sets put examples in the validation set that are "easier" to guess than the ones in the training set.
You could overcome this by developing a cross-validation strategy as follows (see the sketch after this list):
Take 10% of the dataset out (holdout) and consider it your test set.
With the remaining data, make an 80%-20% split for the training and validation sets.
Repeat the 80-20 training/validation split 5 times.
Train 5 models on your 5 different train-valid datasets and see what the results are.
You can even compare all 5 models on the test set, just to see what the "real" or "closer to reality" accuracy would be. That might help to see which model generalizes better.
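A sketch of that strategy; build_model is a hypothetical helper returning a freshly compiled copy of your model, and X, y are the full arrays:
from sklearn.model_selection import train_test_split

# 10% holdout test set
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.10, random_state=0)

test_scores = []
for seed in range(5):
    # a different 80-20 train/validation split each iteration
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_rest, y_rest, test_size=0.20, random_state=seed)
    m = build_model()  # hypothetical: returns a new, compiled model
    m.fit(X_tr, y_tr, validation_data=(X_val, y_val), epochs=20, verbose=0)
    test_scores.append(m.evaluate(X_test, y_test, verbose=0))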
In the end, you might even consider stacking them together:
https://machinelearningmastery.com/stacking-ensemble-for-deep-learning-neural-networks/
The fact that both the training and validation accuracy look similar and do not change during training indicates that the model might be stuck in a local minimum.
It is worth training for more epochs (at least 20) to see if the model can "jump" out of the local minimum with the current learning rate.
If this does not solve the problem, I would change the learning rate from .001 to .0001 or .00001. This should help the model converge, hopefully to a global minimum.
If this does not solve the problem either, there are many other parameters/hyperparameters which might be useful to check further: the number of nodes in the layers, the number of layers, the optimizer strategy, the size and distribution (generality and variance) of the training set...
No, there is nothing wrong; this effect (validation metrics being better than training ones) is common when using Dropout, as your network does.
Dropout adds noise during training, and this noise is not present during validation/testing, so it's natural that the training metrics get a bit worse. The validation metrics do not have this noise and come out a bit better, thanks to the improved generalization Dropout produces.
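You can see this directly: the same Dropout layer behaves differently in training and inference mode (a minimal sketch):
import tensorflow as tf

x = tf.ones((1, 4))
drop = tf.keras.layers.Dropout(0.5)
print(drop(x, training=True))   # roughly half the units zeroed, the rest scaled by 1/(1-0.5)
print(drop(x, training=False))  # identity: all ones, no noise at validation/test time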
I'm currently working on the CIFAR-10 dataset, which is an image classification problem with 10 classes.
Using TensorFlow 2, I have started to implement a linear classification without the LinearClassifier object.
X's shape corresponds to 10000 images of 32*32 RGB pixels = (10000, 3072)
Y_one_hot is a one-hot vector = (10000, 10)
model creation code:
model = tf.keras.Sequential()
model.add(Dense(1, activation="linear", input_dim=32*32*3))
model.add(Dense(10, activation="softmax", input_dim=1))
model.compile(optimizer="adam", loss="mean_squared_error", metrics=["accuracy"])
training code:
model.fit(X, Y_one_hot, batch_size=10000, verbose=1, epochs=100)
predict code:
img = X[0].reshape(1, 3072) # Select image 0
res = np.argmax((model.predict(img))) # select the max in output
Problem:
res value is always the same. It seems my model is not learning.
Model.summary
The summary displays:
dense (Dense) (None, 1) 3073
dense_1 (Dense) (None, 10) 20
Total params: 3,093
Trainable params: 3,093
Non-trainable params: 0
Accuracy & loss:
Epoch 1/100
10000/10000 [==============================] - 2s 184us/sample - loss: 0.0949 - accuracy: 0.1005
Epoch 50/100
10000/10000 [==============================] - 0s 10us/sample - loss: 0.0901 - accuracy: 0.1000
Epoch 100/100
10000/10000 [==============================] - 0s 8us/sample - loss: 0.0901 - accuracy: 0.1027
Do you have any idea why my model is always predicting the same value?
Thanks,
One remark:
The loss you used, loss="mean_squared_error", is not meant for classification; it is meant for regression. These are two very different problems. Try a cross-entropy loss instead, for example:
model.compile(optimizer=AdamOpt,
              loss='categorical_crossentropy', metrics=['accuracy'])
You can find an example here: https://github.com/michelucci/oreilly-london-ai/blob/master/day1/Beginner%20friendly%20networks/First_Example_of_a_CNN_(CIFAR10).ipynb. It is a notebook I used for a training session I gave. The network is a CNN, but you can replace it with yours.
Try that...
Best of luck, Umberto