I have created a CNN model for classifying text data. Please help me interpret my result and tell me why is my training accuracy less than validation accuracy?
I have a total of 2619 Data, all of them are text data. There are two different classes. Here is a sample of my dataset.
The validation set has 34 data. Rest of 2619 data are training data.
I have done RepeatedKfold cross-validation. Here is my code.
from sklearn.model_selection import RepeatedKFold
kf = RepeatedKFold(n_splits=75, n_repeats=1, random_state= 42)
for train_index, test_index in kf.split(X,Y):
#print("Train:", train_index, "Validation:",test_index)
x_train, x_test = X.iloc[train_index], X.iloc[test_index]
y_train, y_test = Y.iloc[train_index], Y.iloc[test_index]
I have used CNN. Here is my model.
model = Sequential()
model.add(Embedding(2900,2 , input_length=1))
model.add(Conv1D(filters=2, kernel_size=3, kernel_regularizer=l2(0.0005 ), bias_regularizer=l2(0.0005 ), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1, kernel_regularizer=l2(0.0005 ), bias_regularizer=l2(0.0005 ), activation='sigmoid'))
model.add(Dropout(0.25))
adam = optimizers.Adam(lr = 0.0005, beta_1 = 0.9, beta_2 = 0.999, epsilon = None, decay = 0.0, amsgrad = False)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
print(model.summary())
history = model.fit(x_train, y_train, epochs=300,validation_data=(x_test, y_test), batch_size=128, shuffle=False)
# Final evaluation of the model
scores = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
And here is the result.
Epoch 295/300
2585/2585 [==============================] - 0s 20us/step - loss: 1.6920 - acc: 0.7528 - val_loss: 0.5839 - val_acc: 0.8235
Epoch 296/300
2585/2585 [==============================] - 0s 20us/step - loss: 1.6532 - acc: 0.7617 - val_loss: 0.5836 - val_acc: 0.8235
Epoch 297/300
2585/2585 [==============================] - 0s 27us/step - loss: 1.5328 - acc: 0.7551 - val_loss: 0.5954 - val_acc: 0.8235
Epoch 298/300
2585/2585 [==============================] - 0s 20us/step - loss: 1.6289 - acc: 0.7524 - val_loss: 0.5897 - val_acc: 0.8235
Epoch 299/300
2585/2585 [==============================] - 0s 21us/step - loss: 1.7000 - acc: 0.7582 - val_loss: 0.5854 - val_acc: 0.8235
Epoch 300/300
2585/2585 [==============================] - 0s 25us/step - loss: 1.5475 - acc: 0.7451 - val_loss: 0.5934 - val_acc: 0.8235
Accuracy: 82.35%
Please help me with my problem. Thank you.
You may have too much regularization for your model causing it to underfit your data.
A good way to start is begin with no regularization at all (no Dropout, no weights decay, ..) and look if it's overfitting:
If not, regularization is useless
If it's overfitting, add regularization little by little, start by small dropout / weights decay, and then icrease it if it's continue to overfit
Moroever, don't put Dropout as final layer, and don't put two Dropout layers successively.
Your training accuracy is less than validation accuracy likely because of using dropout: it "turns off" some neurons during training to prevent overfitting. During validation dropout is off, so your network uses all its neurons, thus making (in that particular case) more accurate predictions.
In general, I agree with advice by Thibault Bacqueyrisses and want to add that it's also usually a bad practice to put dropout before batch normalization (which is not about this particular case anyway).
Related
I'm trying to build and train LSTM Neural Network.
Here is my code (summary version):
X_train, X_test, y_train, y_test = train_test_split(np.array(sequences), to_categorical(labels).astype(int), test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2)
log_dir = os.path.join('Logs')
tb_callback = TensorBoard(log_dir=log_dir)
model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='tanh', input_shape=(60,1662)))
model.add(LSTM(128, return_sequences=True, activation='tanh', dropout=0.31))
model.add(LSTM(64, return_sequences=False, activation='tanh'))
model.add(Dense(32, activation='relu'))
model.add(Dense(len(actions), activation='softmax'))
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
val_dataset = tf.data.Dataset.from_tensor_slices((X_val, y_val)) # default slice percentage check
val_dataset = val_dataset.batch(256)
model.fit(X_train, y_train, batch_size=256, epochs=250, callbacks=[tb_callback], validation_data=val_dataset)
And model fit result:
Epoch 248/250
8/8 [==============================] - 2s 252ms/step - loss: 0.4563 - categorical_accuracy: 0.8641 - val_loss: 2.1406 - val_categorical_accuracy: 0.6104
Epoch 249/250
8/8 [==============================] - 2s 255ms/step - loss: 0.4542 - categorical_accuracy: 0.8672 - val_loss: 2.2365 - val_categorical_accuracy: 0.5667
Epoch 250/250
8/8 [==============================] - 2s 234ms/step - loss: 0.4865 - categorical_accuracy: 0.8562 - val_loss: 2.1668 - val_categorical_accuracy: 0.5875
I wanna reduce value gap between categorical_accuracy and val_categorical_accuracy.
Can I know how to do it?
Thank you for reading my article.
When there is so much difference between your train and validation data this mean your model is overfitting.
So look for how prevent from overfitting. Usually what you have to do is add more data to your dataset.
It won’t work every time, but training with more data can help algorithms detect the signal better.
Try to stop before overfit
another aspect is to try stop the model and reduce the learning rate
I am running tf.keras.callbacks.ModelCheckpoint with the accuracy metric but loss is used to save the best checkpoints. I have tested this in different places (my computer and collab) and two different code and faced the same issue. Here is an example code and the results:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import os
import shutil
def get_uncompiled_model():
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
return model
def get_compiled_model():
model = get_uncompiled_model()
model.compile(
optimizer="rmsprop",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"],
)
return model
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
ckpt_folder = os.path.join(os.getcwd(), 'ckpt')
if os.path.exists(ckpt_folder):
shutil.rmtree(ckpt_folder)
ckpt_path = os.path.join(r'D:\deep_learning\tf_keras\semantic_segmentation\logs', 'mymodel_{epoch}')
callbacks = [
tf.keras.callbacks.ModelCheckpoint(
# Path where to save the model
# The two parameters below mean that we will overwrite
# the current checkpoint if and only if
# the `val_loss` score has improved.
# The saved model name will include the current epoch.
filepath=ckpt_path,
montior="val_accuracy",
# save the model weights with best validation accuracy
mode='max',
save_best_only=True, # only save the best weights
save_weights_only=False,
# only save model weights (not whole model)
verbose=1
)
]
model = get_compiled_model()
model.fit(
x_train, y_train, epochs=3, batch_size=1, callbacks=callbacks, validation_split=0.2, steps_per_epoch=1
)
1/1 [==============================] - ETA: 0s - loss: 2.6475 - accuracy: 0.0000e+00
Epoch 1: val_loss improved from -inf to 2.32311, saving model to D:\deep_learning\tf_keras\semantic_segmentation\logs\mymodel_1
1/1 [==============================] - 6s 6s/step - loss: 2.6475 - accuracy: 0.0000e+00 - val_loss: 2.3231 - val_accuracy: 0.1142
Epoch 2/3
1/1 [==============================] - ETA: 0s - loss: 1.9612 - accuracy: 1.0000
Epoch 2: val_loss improved from 2.32311 to 2.34286, saving model to D:\deep_learning\tf_keras\semantic_segmentation\logs\mymodel_2
1/1 [==============================] - 5s 5s/step - loss: 1.9612 - accuracy: 1.0000 - val_loss: 2.3429 - val_accuracy: 0.1187
Epoch 3/3
1/1 [==============================] - ETA: 0s - loss: 2.8378 - accuracy: 0.0000e+00
Epoch 3: val_loss did not improve from 2.34286
1/1 [==============================] - 5s 5s/step - loss: 2.8378 - accuracy: 0.0000e+00 - val_loss: 2.2943 - val_accuracy: 0.1346
In your code, You write montior instead of monitor, and the function doesn't have this word as param then use the default value, If you write like below, You get what you want:
callbacks = [
tf.keras.callbacks.ModelCheckpoint(
filepath=ckpt_path,
monitor="val_accuracy",
mode='max',
save_best_only=True,
save_weights_only=False,
verbose=1
)
]
Working on IMDB Dataset Google Colab, the model accuracy refuses to go above 50%.
The dataset has already been tokenized and cleaned before this.
Any suggestions on how the accuracy can be improved are welcome.
le=LabelEncoder()
df['sentiment']= le.fit_transform(df['sentiment'])
labels=to_categorical(df['sentiment'],num_classes=2) # this is output
max_len = 400
embeddings=256
sequences = tokenizer.texts_to_sequences(df['review'])
sequences_padded=pad_sequences(sequences,maxlen=max_len,padding='post',truncating='post')
num words = 10000
embeddings = 256
max_len=400
X_train,X_test,y_train,y_test=train_test_split(sequences_padded,labels,test_size=0.20,random_state=42)
model= keras.Sequential()
model.add(Embedding(num_words,embeddings,input_length=max_len))
model.add(Conv1D(256,10,activation='relu'))
model.add(keras.layers.Bidirectional(LSTM(128,return_sequences=True,kernel_regularizer=tf.keras.regularizers.l1(0.01),activity_regularizer=tf.keras.regularizers.l2(0.01))))
model.add(LSTM(64))
model.add(keras.layers.Dropout(0.4))
model.add(Dense(2,activation='softmax'))
model.summary()
history=model.fit(X_train,y_train,epochs=3, batch_size=128, verbose=1)
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
Epoch 1/3
310/310 [==============================] - 157s 398ms/step - loss: 35.7756 - accuracy: 0.5007
Epoch 2/3
310/310 [==============================] - 123s 395ms/step - loss: 1.0212 - accuracy: 0.5003
Epoch 3/3
310/310 [==============================] - 123s 397ms/step - loss: 1.0211 - accuracy: 0.5015
Update:
Model accuracy started improving when I changed from post to pre padding. Any leads on why this happens would be highly appreciated.
You use binary_crossentropy + softmax instead of categorical_crossentropy + softmax.
Change to:
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
I'm a beginner in deep learning and I'm trying to train a deep learning model to classify different ASL hand signs using Mobilenet_v2 and Inception.
Here are my codes create an ImageDataGenerator for creating the training and validation set.
# Reformat Images and Create Batches
IMAGE_RES = 224
BATCH_SIZE = 32
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rescale=1./255,
validation_split = 0.4
)
train_generator = datagen.flow_from_directory(
base_dir,
target_size = (IMAGE_RES,IMAGE_RES),
batch_size = BATCH_SIZE,
subset = 'training'
)
val_generator = datagen.flow_from_directory(
base_dir,
target_size= (IMAGE_RES, IMAGE_RES),
batch_size = BATCH_SIZE,
subset = 'validation'
)
Here are the codes to train the models:
# Do transfer learning with Tensorflow Hub
URL = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
feature_extractor = hub.KerasLayer(URL,
input_shape=(IMAGE_RES, IMAGE_RES, 3))
# Freeze pre-trained model
feature_extractor.trainable = False
# Attach a classification head
model = tf.keras.Sequential([
feature_extractor,
layers.Dense(5, activation='softmax')
])
model.summary()
# Train the model
model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
EPOCHS = 5
history = model.fit(train_generator,
steps_per_epoch=len(train_generator),
epochs=EPOCHS,
validation_data = val_generator,
validation_steps=len(val_generator)
)
Epoch 1/5
94/94 [==============================] - 19s 199ms/step - loss: 0.7333 - accuracy: 0.7730 - val_loss: 0.6276 - val_accuracy: 0.7705
Epoch 2/5
94/94 [==============================] - 18s 190ms/step - loss: 0.1574 - accuracy: 0.9893 - val_loss: 0.5118 - val_accuracy: 0.8145
Epoch 3/5
94/94 [==============================] - 18s 191ms/step - loss: 0.0783 - accuracy: 0.9980 - val_loss: 0.4850 - val_accuracy: 0.8235
Epoch 4/5
94/94 [==============================] - 18s 196ms/step - loss: 0.0492 - accuracy: 0.9997 - val_loss: 0.4541 - val_accuracy: 0.8395
Epoch 5/5
94/94 [==============================] - 18s 193ms/step - loss: 0.0349 - accuracy: 0.9997 - val_loss: 0.4590 - val_accuracy: 0.8365
I've tried using data augmentation but the model still overfits so I'm wondering if I've done something wrong in my code.
Your data is very small. Try splitting with random seeds and check if the problem still persists.
If it does, then use regularizations and decrease the complexity of neural network.
Also experiment with different optimizers and smaller learning rate (try lr scheduler)
It seems like your dataset is very small with some true outputs separated only by a small distance of inputs in the input-output curve. That is why it is fitting easily to those points.
I have an unbalanced dataset (only 0.06% of data is labeled 1 and the rest are labeled 0). As I researched, I had to resample the data, so I used imblearn package to randomUnserSample my dataset. Then I used a Keras Sequential to create a neural network. While training, F1Score increases to around 75% (1000th epoch result is: loss: 0.5691 - acc: 0.7543 - f1_m: 0.7525 - precision_m: 0.7582 - recall_m: 0.7472), but on test set, the result is disappointing (loss: 55.35181%, acc: 79.25248%, f1_m: 0.39789%, precision_m: 0.23259%, recall_m: 1.54982%).
What I assume is that on train set, because the number of 1s and 0s are the same and therefore class_wights are both set to 1, so the network is not costing much for wrong predictions of 1s.
I have used some techniques like reducing the number of layers, reducing number of neurons, using regularization and dropouts, but the test set f1Score is never more than 0.5%. What should I do. Thanks
my neural network:
def neural_network(X, y, epochs_count=3, handle_overfit=False):
# create model
model = Sequential()
model.add(Dense(12, input_dim=len(X_test.columns), activation='relu'))
if (handle_overfit):
model.add(Dropout(rate = 0.5))
model.add(Dense(8, activation='relu', kernel_regularizer=regularizers.l1(0.1)))
if (handle_overfit):
model.add(Dropout(rate = 0.1))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['acc', f1_m, precision_m, recall_m])
# change weights of the classes '0' and '1' and set weights automatically
class_weights = class_weight.compute_class_weight('balanced', [0, 1], y)
print("---------------------- \n chosen class_wieghts are: ", class_weights, " \n ---------------------")
# Fit the model
model.fit(X, y, epochs=epochs_count, batch_size=512, class_weight=class_weights)
return model
defining train and test set:
vtrain_set, test_set = train_test_split(data, test_size=0.35, random_state=0)
X_train = train_set[['..... some columns ....']]
y_train = train_set[['success']]
print('Initial dataset shape: ', X_train.shape)
rus = RandomUnderSampler(random_state=42)
X_undersampled, y_undersampled = rus.fit_sample(X_train, y_train)
print('undersampled dataset shape: ', X_undersampled.shape)
and result is:
Initial dataset shape: (1625843, 11)
undersampled dataset shape: (1970, 11)
and finally the neural network calling:
print (X_undersampled.shape, y_undersampled.shape)
print (X_test.shape, y_test.shape)
model = neural_network(X_undersampled, y_undersampled, 1000, handle_overfit=True)
# evaluate the model
print("\n---------------\nEvaluated on test set:")
scores = model.evaluate(X_test, y_test)
for i in range(len(model.metrics_names)):
print("%s: %.5f%%" % (model.metrics_names[i], scores[i]*100))
and result is:
(1970, 11) (1970,)
(875454, 11) (875454, 1)
----------------------
chosen class_wieghts are: [1. 1.]
---------------------
Epoch 1/1000
1970/1970 [==============================] - 4s 2ms/step - loss: 4.5034 - acc: 0.5147 - f1_m: 0.3703 - precision_m: 0.5291 - recall_m: 0.2859
.
.
.
.
Epoch 999/1000
1970/1970 [==============================] - 0s 6us/step - loss: 0.5705 - acc: 0.7538 - f1_m: 0.7471 - precision_m: 0.7668 - recall_m: 0.7296
Epoch 1000/1000
1970/1970 [==============================] - 0s 6us/step - loss: 0.5691 - acc: 0.7543 - f1_m: 0.7525 - precision_m: 0.7582 - recall_m: 0.7472
---------------
Evaluated on test set:
875454/875454 [==============================] - 49s 56us/step
loss: 55.35181%
acc: 79.25248%
f1_m: 0.39789%
precision_m: 0.23259%
recall_m: 1.54982%