Train accuracy decreases with train loss - python

I wrote this very simple code
model = keras.models.Sequential()
model.add(layers.Dense(13000, input_dim=X_train.shape[1], activation='relu', trainable=False))
model.add(layers.Dense(1, input_dim=13000, activation='linear'))
model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000000, verbose=1)
The data is MNIST but only for digits '0' and '1'.
I have a very strange issue, where the loss is monotonically decreasing to zero, as expected, yet the accuracy instead of increasing, is also decreasing.
Here is a sample output
12665/12665 [==============================] - 0s 11us/step - loss: 0.0107 - accuracy: 0.2355
Epoch 181/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0114 - accuracy: 0.2568
Epoch 182/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0128 - accuracy: 0.2726
Epoch 183/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0133 - accuracy: 0.2839
Epoch 184/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0134 - accuracy: 0.2887
Epoch 185/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0110 - accuracy: 0.2842
Epoch 186/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0101 - accuracy: 0.2722
Epoch 187/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0094 - accuracy: 0.2583
Since we only have two classes, the benchmark for lowest possible accuracy should be 0.5, and furthermore we are monitoring accuracy on the training set, so it should very going up to 100%, I expect overfitting and I am overfitting according to the loss function.
At the final epoch, this is the situation
12665/12665 [==============================] - 0s 11us/step - loss: 9.9710e-06 - accuracy: 0.0758
a 7% accuracy when the worst theoretical possibility if you guess randomly is 50%. This is no accident. Something is going on here.
Can anyone see the problem?
Entire code
from tensorflow import keras
import numpy as np
from matplotlib import pyplot as plt
import keras
from keras.callbacks import Callback
from keras import layers
import warnings
class EarlyStoppingByLossVal(Callback):
def __init__(self, monitor='val_loss', value=0.00001, verbose=0):
super(Callback, self).__init__()
self.monitor = monitor
self.value = value
self.verbose = verbose
def on_epoch_end(self, epoch, logs={}):
current = logs.get(self.monitor)
if current is None:
warnings.warn("Early stopping requires %s available!" % self.monitor, RuntimeWarning)
if current < self.value:
if self.verbose > 0:
print("Epoch %05d: early stopping THR" % epoch)
self.model.stop_training = True
def load_mnist():
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = np.reshape(train_images, (train_images.shape[0], train_images.shape[1] * train_images.shape[2]))
test_images = np.reshape(test_images, (test_images.shape[0], test_images.shape[1] * test_images.shape[2]))
train_labels = np.reshape(train_labels, (train_labels.shape[0],))
test_labels = np.reshape(test_labels, (test_labels.shape[0],))
train_images = train_images[(train_labels == 0) | (train_labels == 1)]
test_images = test_images[(test_labels == 0) | (test_labels == 1)]
train_labels = train_labels[(train_labels == 0) | (train_labels == 1)]
test_labels = test_labels[(test_labels == 0) | (test_labels == 1)]
train_images, test_images = train_images / 255, test_images / 255
return train_images, train_labels, test_images, test_labels
X_train, y_train, X_test, y_test = load_mnist()
train_acc = []
train_errors = []
test_acc = []
test_errors = []
width_list = [13000]
for width in width_list:
print(width)
model = keras.models.Sequential()
model.add(layers.Dense(width, input_dim=X_train.shape[1], activation='relu', trainable=False))
model.add(layers.Dense(1, input_dim=width, activation='linear'))
model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])
callbacks = [EarlyStoppingByLossVal(monitor='loss', value=0.00001, verbose=1)]
model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000000, verbose=1, callbacks=callbacks)
train_errors.append(model.evaluate(X_train, y_train)[0])
test_errors.append(model.evaluate(X_test, y_test)[0])
train_acc.append(model.evaluate(X_train, y_train)[1])
test_acc.append(model.evaluate(X_test, y_test)[1])
plt.plot(width_list, train_errors, marker='D')
plt.xlabel("width")
plt.ylabel("train loss")
plt.show()
plt.plot(width_list, test_errors, marker='D')
plt.xlabel("width")
plt.ylabel("test loss")
plt.show()
plt.plot(width_list, train_acc, marker='D')
plt.xlabel("width")
plt.ylabel("train acc")
plt.show()
plt.plot(width_list, test_acc, marker='D')
plt.xlabel("width")
plt.ylabel("test acc")
plt.show()

A linear activation in the last layer for a (binary) classification problem is meaningless; change your last layer to:
model.add(layers.Dense(1, input_dim=width, activation='sigmoid'))
Linear activations for the last layer are used for regression problems and not for classification ones.

Related

K fold cross validation not working properly

I have created a transfer learning model with RESNET-50. I am using K-fold cross-validation in my model. However, my model is not performing properly. Although I didn't get any error messages, my accuracy values are very weird. They are the same across all the folds. Please find my code below:
root_path = 'D:/regionGrowing_MLT/NewSavedRGBImages'
datasetFolderName=root_path
MODEL_FILENAME=root_path+"model_cv.h5"
sourceFiles=[]
classLabels=['Benign', 'Malignant']
X=[]
Y=[]
img_rows, img_cols = 224, 224 # input image dimensions
train_path=datasetFolderName+'/Training/'
validation_path=datasetFolderName+'/validation/'
test_path=datasetFolderName+'/test/'
def transferBetweenFolders(source, dest, splitRate):
global sourceFiles
sourceFiles=os.listdir(source)
if(len(sourceFiles)!=0):
transferFileNumbers=int(len(sourceFiles)*splitRate)
transferIndex=random.sample(range(0, len(sourceFiles)), transferFileNumbers)
for eachIndex in transferIndex:
shutil.move(source+str(sourceFiles[eachIndex]), dest+str(sourceFiles[eachIndex]))
else:
print("No file moved. Source empty!")
def transferAllClassBetweenFolders(source, dest, splitRate):
for label in classLabels:
transferBetweenFolders(datasetFolderName+'/'+source+'/'+label+'/',
datasetFolderName+'/'+dest+'/'+label+'/',
splitRate)
def my_metrics(y_true, y_pred):
accuracy=accuracy_score(y_true, y_pred)
precision=precision_score(y_true, y_pred,average='weighted')
f1Score=f1_score(y_true, y_pred, average='weighted')
print("Accuracy : {}".format(accuracy))
print("Precision : {}".format(precision))
print("f1Score : {}".format(f1Score))
cm=confusion_matrix(y_true, y_pred)
print(cm)
return accuracy, precision, f1Score
transferAllClassBetweenFolders('Training', 'test', 0.20)
def prepareNameWithLabels(folderName):
sourceFiles=os.listdir(datasetFolderName+'/Training/'+folderName)
for val in sourceFiles:
X.append(val)
for i in range(len(classLabels)):
if(folderName==classLabels[i]):
Y.append(i)
# Organize file names and class labels in X and Y variables
for i in range(len(classLabels)):
prepareNameWithLabels(classLabels[i])
X=np.asarray(X)
Y=np.asarray(Y)
batch_size = 32
epoch=100
activationFunction='relu'
def getModel():
model = Sequential()
model.add(Flatten())
model.add(Dense(256, activation='relu', name='fc1'))
model.add(Dense(128, activation='relu', name='fc2'))
model.add(layers.Dropout(0.5)) #### used for regularization (to aviod overfitting)
model.add(Dense(2, activation='softmax'))
# model.summary()
model.compile(optimizer=optimizers.Adam(learning_rate=2e-5),
loss='binary_crossentropy',
metrics=['accuracy'])
return model
model=getModel()
# ===============Stratified K-Fold======================
skf = StratifiedKFold(n_splits=5, shuffle=True)
skf.get_n_splits(X, Y)
foldNum=0
for train_index, val_index in skf.split(X, Y):
#First cut all images from validation to train (if any exists)
transferAllClassBetweenFolders('validation', 'Training', 1.0)
foldNum+=1
print("Results for fold",foldNum)
X_train, X_val = X[train_index], X[val_index]
Y_train, Y_val = Y[train_index], Y[val_index]
# Move validation images of this fold from train folder to the validation folder
for eachIndex in range(len(X_val)):
classLabel=''
for i in range(len(classLabels)):
if(Y_val[eachIndex]==i):
classLabel=classLabels[i]
#Then, copy the validation images to the validation folder
shutil.move(datasetFolderName+'/Training/'+classLabel+'/'+X_val[eachIndex],
datasetFolderName+'/validation/'+classLabel+'/'+X_val[eachIndex])
train_datagen = ImageDataGenerator(
rescale=1./255,
horizontal_flip=True,
rotation_range=40,
shear_range=0.2,
width_shift_range=0.2,
height_shift_range=0.2,
zoom_range=0.20,
fill_mode="nearest")
validation_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
#Start ImageClassification Model
train_generator = train_datagen.flow_from_directory(
train_path,
target_size=(img_rows, img_cols),
batch_size=batch_size,
class_mode='categorical',
subset='training')
validation_generator = validation_datagen.flow_from_directory(
validation_path,
target_size=(img_rows, img_cols),
batch_size=batch_size,
class_mode=None, # only data, no labels
shuffle=False)
# fit model
history=model.fit_generator(train_generator,
epochs=epoch)
predictions = model.predict_generator(validation_generator, verbose=1)
yPredictions = np.argmax(predictions, axis=1)
true_classes = validation_generator.classes
# evaluate validation performance
print("***Performance on Validation data***")
valAcc, valPrec, valFScore = my_metrics(true_classes, yPredictions)
My results are below:
Epoch 1/100
72/72 [==============================] - 25s 350ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 2/100
72/72 [==============================] - 25s 349ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 3/100
72/72 [==============================] - 25s 348ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 4/100
72/72 [==============================] - 25s 348ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 5/100
72/72 [==============================] - 25s 352ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 6/100
72/72 [==============================] - 25s 351ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 7/100
72/72 [==============================] - 25s 353ms/step - loss: 0.6931 - accuracy: 0.5013
As can be seen above, the accuracy values are the same for all epochs which shouldn't be the case. I am not sure exactly where I am making the error. Any suggestions would be appreciated. Thank you.

tf.keras.callbacks.ModelCheckpoint ignores the montior parameter and always use loss

I am running tf.keras.callbacks.ModelCheckpoint with the accuracy metric but loss is used to save the best checkpoints. I have tested this in different places (my computer and collab) and two different code and faced the same issue. Here is an example code and the results:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import os
import shutil
def get_uncompiled_model():
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
return model
def get_compiled_model():
model = get_uncompiled_model()
model.compile(
optimizer="rmsprop",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"],
)
return model
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
ckpt_folder = os.path.join(os.getcwd(), 'ckpt')
if os.path.exists(ckpt_folder):
shutil.rmtree(ckpt_folder)
ckpt_path = os.path.join(r'D:\deep_learning\tf_keras\semantic_segmentation\logs', 'mymodel_{epoch}')
callbacks = [
tf.keras.callbacks.ModelCheckpoint(
# Path where to save the model
# The two parameters below mean that we will overwrite
# the current checkpoint if and only if
# the `val_loss` score has improved.
# The saved model name will include the current epoch.
filepath=ckpt_path,
montior="val_accuracy",
# save the model weights with best validation accuracy
mode='max',
save_best_only=True, # only save the best weights
save_weights_only=False,
# only save model weights (not whole model)
verbose=1
)
]
model = get_compiled_model()
model.fit(
x_train, y_train, epochs=3, batch_size=1, callbacks=callbacks, validation_split=0.2, steps_per_epoch=1
)
1/1 [==============================] - ETA: 0s - loss: 2.6475 - accuracy: 0.0000e+00
Epoch 1: val_loss improved from -inf to 2.32311, saving model to D:\deep_learning\tf_keras\semantic_segmentation\logs\mymodel_1
1/1 [==============================] - 6s 6s/step - loss: 2.6475 - accuracy: 0.0000e+00 - val_loss: 2.3231 - val_accuracy: 0.1142
Epoch 2/3
1/1 [==============================] - ETA: 0s - loss: 1.9612 - accuracy: 1.0000
Epoch 2: val_loss improved from 2.32311 to 2.34286, saving model to D:\deep_learning\tf_keras\semantic_segmentation\logs\mymodel_2
1/1 [==============================] - 5s 5s/step - loss: 1.9612 - accuracy: 1.0000 - val_loss: 2.3429 - val_accuracy: 0.1187
Epoch 3/3
1/1 [==============================] - ETA: 0s - loss: 2.8378 - accuracy: 0.0000e+00
Epoch 3: val_loss did not improve from 2.34286
1/1 [==============================] - 5s 5s/step - loss: 2.8378 - accuracy: 0.0000e+00 - val_loss: 2.2943 - val_accuracy: 0.1346
In your code, You write montior instead of monitor, and the function doesn't have this word as param then use the default value, If you write like below, You get what you want:
callbacks = [
tf.keras.callbacks.ModelCheckpoint(
filepath=ckpt_path,
monitor="val_accuracy",
mode='max',
save_best_only=True,
save_weights_only=False,
verbose=1
)
]

Keras Model IndexError

Code:
import tensorflow.keras as tfk
import pandas as pd
import numpy as np
dataset = pd.read_csv("translator.csv")
x_train, x_test = dataset[["Afrikaans Woorde", "English Words"]]
y_train, y_test = dataset[["Total Letter Amount", "Incommon Letters"]]
x_train = np.array(x_train)
y_train = np.array(y_train)
x_test = np.array(x_test)
y_test = np.array(y_test)
model = tfk.models.Sequential()
input_layer = model.add(tfk.layers.Flatten())
hidden_layer1 = model.add(tfk.layers.Dense(128, activation="relu"))
hidden_layer2 = model.add(tfk.layers.Dense(128, activation="relu"))
output_layer = model.add(tfk.layers.Dense(1))
compiler = model.compile(optimizer="adam", loss="spare_categorical_crossentropy", metrics=["accuracy"])
fitter = model.fit(x_train, y_train, epochs=10)
val_loss, val_acc = model.evaluate(x_test, y_test)
print(f"Percentage loss {val_loss * 100}%", f"Percentage accuracy {val_acc * 100}%")
Error:
IndexError: list index out of range
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_5408/3773670894.py in <module>
22 compiler = model.compile(optimizer="adam", loss="spare_categorical_crossentropy", metrics=["accuracy"])
23
---> 24 fitter = model.fit(x_train, y_train, epochs=10)
25
26 val_loss, val_acc = model.evaluate(x_test, y_test)
Question:
I have tried everything, I am not sure what to do? I have, even converted the dataset to an numpy array, yet it still gives me the error.
This specific model is to see if I can build a Translator just from a couple of words.
I tried with random input, your model architecture outputs 1, which means binary classification.
Working sample code
import tensorflow.keras as tfk
import numpy as np
import tensorflow as tf
X_train = np.random.random((1512,18))
y_train = np.random.random((1512,1))
dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_data = dataset.shuffle(len(X_train)).batch(32)
train_data = train_data.prefetch(
buffer_size=tf.data.experimental.AUTOTUNE)
model = tfk.models.Sequential()
input = model.add(tfk.layers.Dense(15, activation=tf.nn.relu, input_shape=(18,)))
input_layer = model.add(tfk.layers.Flatten())
hidden_layer1 = model.add(tfk.layers.Dense(128, activation="relu"))
hidden_layer2 = model.add(tfk.layers.Dense(128, activation="relu"))
output_layer = model.add(tfk.layers.Dense(1))
model.compile(optimizer='adam',
loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy'])
fitter = model.fit(train_data, epochs=5, batch_size=5, verbose=1)
Output
Epoch 1/5
48/48 [==============================] - 4s 5ms/step - loss: 5.9153e-08 - accuracy: 0.0000e+00
Epoch 2/5
48/48 [==============================] - 0s 4ms/step - loss: 5.9153e-08 - accuracy: 0.0000e+00
Epoch 3/5
48/48 [==============================] - 0s 5ms/step - loss: 5.9153e-08 - accuracy: 0.0000e+00
Epoch 4/5
48/48 [==============================] - 0s 6ms/step - loss: 5.9153e-08 - accuracy: 0.0000e+00
Epoch 5/5
48/48 [==============================] - 0s 5ms/step - loss: 5.9153e-08 - accuracy: 0.0000e+00

0 accuracy with LSTM

I trained LSTM classification model, but got weird results (0 accuracy). Here is my dataset with preprocessing steps:
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
import numpy as np
url = 'https://raw.githubusercontent.com/MislavSag/trademl/master/trademl/modeling/random_forest/X_TEST.csv'
X_TEST = pd.read_csv(url, sep=',')
url = 'https://raw.githubusercontent.com/MislavSag/trademl/master/trademl/modeling/random_forest/labeling_info_TEST.csv'
labeling_info_TEST = pd.read_csv(url, sep=',')
# TRAIN TEST SPLIT
X_train, X_test, y_train, y_test = train_test_split(
X_TEST.drop(columns=['close_orig']), labeling_info_TEST['bin'],
test_size=0.10, shuffle=False, stratify=None)
### PREPARE LSTM
x = X_train['close'].values.reshape(-1, 1)
y = y_train.values.reshape(-1, 1)
x_test = X_test['close'].values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)
train_val_index_split = 0.75
train_generator = keras.preprocessing.sequence.TimeseriesGenerator(
data=x,
targets=y,
length=30,
sampling_rate=1,
stride=1,
start_index=0,
end_index=int(train_val_index_split*X_TEST.shape[0]),
shuffle=False,
reverse=False,
batch_size=128
)
validation_generator = keras.preprocessing.sequence.TimeseriesGenerator(
data=x,
targets=y,
length=30,
sampling_rate=1,
stride=1,
start_index=int((train_val_index_split*X_TEST.shape[0] + 1)),
end_index=None, #int(train_test_index_split*X.shape[0])
shuffle=False,
reverse=False,
batch_size=128
)
test_generator = keras.preprocessing.sequence.TimeseriesGenerator(
data=x_test,
targets=y_test,
length=30,
sampling_rate=1,
stride=1,
start_index=0,
end_index=None,
shuffle=False,
reverse=False,
batch_size=128
)
# convert generator to inmemory 3D series (if enough RAM)
def generator_to_obj(generator):
xlist = []
ylist = []
for i in range(len(generator)):
x, y = train_generator[i]
xlist.append(x)
ylist.append(y)
X_train = np.concatenate(xlist, axis=0)
y_train = np.concatenate(ylist, axis=0)
return X_train, y_train
X_train_lstm, y_train_lstm = generator_to_obj(train_generator)
X_val_lstm, y_val_lstm = generator_to_obj(validation_generator)
X_test_lstm, y_test_lstm = generator_to_obj(test_generator)
# test for shapes
print('X and y shape train: ', X_train_lstm.shape, y_train_lstm.shape)
print('X and y shape validate: ', X_val_lstm.shape, y_val_lstm.shape)
print('X and y shape test: ', X_test_lstm.shape, y_test_lstm.shape)
and here is my model with resuslts:
### MODEL
model = keras.models.Sequential([
keras.layers.LSTM(124, return_sequences=True, input_shape=[None, 1]),
keras.layers.LSTM(258),
keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train_lstm, y_train_lstm, epochs=10, batch_size=128,
validation_data=[X_val_lstm, y_val_lstm])
# history = model.fit_generator(train_generator, epochs=40, validation_data=validation_generator, verbose=1)
score, acc = model.evaluate(X_val_lstm, y_val_lstm,
batch_size=128)
historydf = pd.DataFrame(history.history)
historydf.head(10)
Why do I get 0 accuracy?
You're using sigmoid activation, which means your labels must be in range 0 and 1. But in your case, the labels are 1. and -1.
Just replace -1 with 0.
for i, y in enumerate(y_train_lstm):
if y == -1.:
y_train_lstm[i,:] = 0.
for i, y in enumerate(y_val_lstm):
if y == -1.:
y_val_lstm[i,:] = 0.
for i, y in enumerate(y_test_lstm):
if y == -1.:
y_test_lstm[i,:] = 0.
Sidenote:
The signals are very close, it would be hard to distinguish them. So, probably accuracy won't be high with simple models.
After training with 0. and 1. labels,
model = keras.models.Sequential([
keras.layers.LSTM(124, return_sequences=True, input_shape=(30, 1)),
keras.layers.LSTM(258),
keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train_lstm, y_train_lstm, epochs=5, batch_size=128,
validation_data=(X_val_lstm, y_val_lstm))
# history = model.fit_generator(train_generator, epochs=40, validation_data=validation_generator, verbose=1)
score, acc = model.evaluate(X_val_lstm, y_val_lstm,
batch_size=128)
historydf = pd.DataFrame(history.history)
historydf.head(10)
Epoch 1/5
12/12 [==============================] - 5s 378ms/step - loss: 0.7386 - accuracy: 0.4990 - val_loss: 0.6959 - val_accuracy: 0.4896
Epoch 2/5
12/12 [==============================] - 4s 318ms/step - loss: 0.6947 - accuracy: 0.5133 - val_loss: 0.6959 - val_accuracy: 0.5104
Epoch 3/5
12/12 [==============================] - 4s 318ms/step - loss: 0.6941 - accuracy: 0.4895 - val_loss: 0.6930 - val_accuracy: 0.5104
Epoch 4/5
12/12 [==============================] - 4s 332ms/step - loss: 0.6946 - accuracy: 0.5269 - val_loss: 0.6946 - val_accuracy: 0.5104
Epoch 5/5
12/12 [==============================] - 4s 334ms/step - loss: 0.6931 - accuracy: 0.4901 - val_loss: 0.6929 - val_accuracy: 0.5104
3/3 [==============================] - 0s 73ms/step - loss: 0.6929 - accuracy: 0.5104
loss accuracy val_loss val_accuracy
0 0.738649 0.498980 0.695888 0.489583
1 0.694708 0.513256 0.695942 0.510417
2 0.694117 0.489463 0.692987 0.510417
3 0.694554 0.526852 0.694613 0.510417
4 0.693118 0.490143 0.692936 0.510417
Source code in colab: https://colab.research.google.com/drive/10yRf4TfGDnp_4F2HYoxPyTlF18no-8Dr?usp=sharing

Training accuracy is less than validation accuracy

I have created a CNN model for classifying text data. Please help me interpret my result and tell me why is my training accuracy less than validation accuracy?
I have a total of 2619 Data, all of them are text data. There are two different classes. Here is a sample of my dataset.
The validation set has 34 data. Rest of 2619 data are training data.
I have done RepeatedKfold cross-validation. Here is my code.
from sklearn.model_selection import RepeatedKFold
kf = RepeatedKFold(n_splits=75, n_repeats=1, random_state= 42)
for train_index, test_index in kf.split(X,Y):
#print("Train:", train_index, "Validation:",test_index)
x_train, x_test = X.iloc[train_index], X.iloc[test_index]
y_train, y_test = Y.iloc[train_index], Y.iloc[test_index]
I have used CNN. Here is my model.
model = Sequential()
model.add(Embedding(2900,2 , input_length=1))
model.add(Conv1D(filters=2, kernel_size=3, kernel_regularizer=l2(0.0005 ), bias_regularizer=l2(0.0005 ), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1, kernel_regularizer=l2(0.0005 ), bias_regularizer=l2(0.0005 ), activation='sigmoid'))
model.add(Dropout(0.25))
adam = optimizers.Adam(lr = 0.0005, beta_1 = 0.9, beta_2 = 0.999, epsilon = None, decay = 0.0, amsgrad = False)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
print(model.summary())
history = model.fit(x_train, y_train, epochs=300,validation_data=(x_test, y_test), batch_size=128, shuffle=False)
# Final evaluation of the model
scores = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
And here is the result.
Epoch 295/300
2585/2585 [==============================] - 0s 20us/step - loss: 1.6920 - acc: 0.7528 - val_loss: 0.5839 - val_acc: 0.8235
Epoch 296/300
2585/2585 [==============================] - 0s 20us/step - loss: 1.6532 - acc: 0.7617 - val_loss: 0.5836 - val_acc: 0.8235
Epoch 297/300
2585/2585 [==============================] - 0s 27us/step - loss: 1.5328 - acc: 0.7551 - val_loss: 0.5954 - val_acc: 0.8235
Epoch 298/300
2585/2585 [==============================] - 0s 20us/step - loss: 1.6289 - acc: 0.7524 - val_loss: 0.5897 - val_acc: 0.8235
Epoch 299/300
2585/2585 [==============================] - 0s 21us/step - loss: 1.7000 - acc: 0.7582 - val_loss: 0.5854 - val_acc: 0.8235
Epoch 300/300
2585/2585 [==============================] - 0s 25us/step - loss: 1.5475 - acc: 0.7451 - val_loss: 0.5934 - val_acc: 0.8235
Accuracy: 82.35%
Please help me with my problem. Thank you.
You may have too much regularization for your model causing it to underfit your data.
A good way to start is begin with no regularization at all (no Dropout, no weights decay, ..) and look if it's overfitting:
If not, regularization is useless
If it's overfitting, add regularization little by little, start by small dropout / weights decay, and then icrease it if it's continue to overfit
Moroever, don't put Dropout as final layer, and don't put two Dropout layers successively.
Your training accuracy is less than validation accuracy likely because of using dropout: it "turns off" some neurons during training to prevent overfitting. During validation dropout is off, so your network uses all its neurons, thus making (in that particular case) more accurate predictions.
In general, I agree with advice by Thibault Bacqueyrisses and want to add that it's also usually a bad practice to put dropout before batch normalization (which is not about this particular case anyway).

Categories