I am trying to build a deep learning model in Keras for a test, and I am not very good at this. I have a scaled dataset with 128 features, and the samples belong to 6 different classes.
I have already tried adding/deleting layers and using regularisation such as dropout and L1/L2. The model learns and training accuracy climbs very high, but accuracy on the test set stays around 15%.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
model = Sequential()
model.add(Dense(128, activation='tanh', input_shape=(128,)))
model.add(Dropout(0.5))
model.add(Dense(60, activation='tanh'))
model.add(Dropout(0.5))
model.add(Dense(20, activation='tanh'))
model.add(Dropout(0.5))
model.add(Dense(6, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy'])
model.fit(train_X, train_y, epochs=20, batch_size=32, verbose=1)
Epoch 1/20
6955/6955 [==============================] - 1s 109us/sample - loss: 1.5805 - acc: 0.3865
Epoch 2/20
6955/6955 [==============================] - 0s 71us/sample - loss: 1.1512 - acc: 0.6505
Epoch 3/20
6955/6955 [==============================] - 0s 71us/sample - loss: 0.9191 - acc: 0.7307
Epoch 4/20
6955/6955 [==============================] - 0s 67us/sample - loss: 0.7819 - acc: 0.7639
Epoch 5/20
6955/6955 [==============================] - 0s 66us/sample - loss: 0.6939 - acc: 0.7882
Epoch 6/20
6955/6955 [==============================] - 0s 69us/sample - loss: 0.6284 - acc: 0.8099
Epoch 7/20
6955/6955 [==============================] - 0s 70us/sample - loss: 0.5822 - acc: 0.8240
Epoch 8/20
6955/6955 [==============================] - 1s 73us/sample - loss: 0.5305 - acc: 0.8367
Epoch 9/20
6955/6955 [==============================] - 1s 75us/sample - loss: 0.5130 - acc: 0.8441
Epoch 10/20
6955/6955 [==============================] - 1s 75us/sample - loss: 0.4703 - acc: 0.8591
Epoch 11/20
6955/6955 [==============================] - 1s 73us/sample - loss: 0.4679 - acc: 0.8650
Epoch 12/20
6955/6955 [==============================] - 1s 77us/sample - loss: 0.4399 - acc: 0.8705
Epoch 13/20
6955/6955 [==============================] - 1s 80us/sample - loss: 0.4055 - acc: 0.8904
Epoch 14/20
6955/6955 [==============================] - 1s 77us/sample - loss: 0.3965 - acc: 0.8874
Epoch 15/20
6955/6955 [==============================] - 1s 77us/sample - loss: 0.3964 - acc: 0.8877
Epoch 16/20
6955/6955 [==============================] - 1s 77us/sample - loss: 0.3564 - acc: 0.9048
Epoch 17/20
6955/6955 [==============================] - 1s 80us/sample - loss: 0.3517 - acc: 0.9087
Epoch 18/20
6955/6955 [==============================] - 1s 78us/sample - loss: 0.3254 - acc: 0.9133
Epoch 19/20
6955/6955 [==============================] - 1s 78us/sample - loss: 0.3367 - acc: 0.9116
Epoch 20/20
6955/6955 [==============================] - 1s 76us/sample - loss: 0.3165 - acc: 0.9192
The result I am receiving is 39%, while with other models like GBM or XGBoost I can reach up to 85%.
What am I doing wrong? Any suggestions?
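For reference, here is a minimal sketch of the more conventional setup for single-label, six-class data: a softmax output paired with categorical cross-entropy, and a validation split so the gap between training and held-out accuracy is visible while training. The layer sizes are kept from the code above; train_X/train_y and one-hot-encoded labels are assumptions carried over from the question.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Same stack as above, but with a softmax output: the six classes are
# mutually exclusive, so the last layer should produce one probability
# distribution over them rather than six independent sigmoids.
model = Sequential([
    Dense(128, activation='tanh', input_shape=(128,)),
    Dropout(0.5),
    Dense(60, activation='tanh'),
    Dropout(0.5),
    Dense(20, activation='tanh'),
    Dropout(0.5),
    Dense(6, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='nadam', metrics=['accuracy'])

# validation_split holds out 20% of the training data, so overfitting shows up
# as a growing gap between accuracy and val_accuracy during training.
model.fit(train_X, train_y, epochs=20, batch_size=32, validation_split=0.2, verbose=1)
If the training accuracy keeps climbing while val_accuracy stalls near the test-set figure, the network is memorising the training data rather than generalising; the gradient-boosting results suggest the tabular structure of this dataset may simply favour tree-based models.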
Related
I am writing a neural network for translating text from Russian to English, but I have run into the problem that the loss stays very large and the output is far from the correct answer.
Below is the LSTM model that I build using Keras:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, RepeatVector, Dense
from tensorflow.keras import optimizers

def make_model(in_vocab, out_vocab, in_timesteps, out_timesteps, n):
    model = Sequential()
    model.add(Embedding(in_vocab, n, input_length=in_timesteps, mask_zero=True))
    model.add(LSTM(n))
    model.add(Dropout(0.3))
    model.add(RepeatVector(out_timesteps))
    model.add(LSTM(n, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(Dense(out_vocab, activation='softmax'))
    model.compile(optimizer=optimizers.RMSprop(lr=0.001),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
The training output looks like this:
Epoch 1/10
3/3 [==============================] - 5s 1s/step - loss: 8.3635 - accuracy: 0.0197 - val_loss: 8.0575 - val_accuracy: 0.0563
Epoch 2/10
3/3 [==============================] - 2s 806ms/step - loss: 7.9505 - accuracy: 0.0334 - val_loss: 8.2927 - val_accuracy: 0.0743
Epoch 3/10
3/3 [==============================] - 2s 812ms/step - loss: 7.7977 - accuracy: 0.0349 - val_loss: 8.2959 - val_accuracy: 0.0571
Epoch 4/10
3/3 [==============================] - 3s 825ms/step - loss: 7.6700 - accuracy: 0.0389 - val_loss: 8.5628 - val_accuracy: 0.0751
Epoch 5/10
3/3 [==============================] - 3s 829ms/step - loss: 7.5595 - accuracy: 0.0411 - val_loss: 8.5854 - val_accuracy: 0.0743
Epoch 6/10
3/3 [==============================] - 3s 807ms/step - loss: 7.4604 - accuracy: 0.0406 - val_loss: 8.7633 - val_accuracy: 0.0743
Epoch 7/10
3/3 [==============================] - 2s 815ms/step - loss: 7.3475 - accuracy: 0.0436 - val_loss: 8.9103 - val_accuracy: 0.0743
Epoch 8/10
3/3 [==============================] - 3s 825ms/step - loss: 7.2548 - accuracy: 0.0455 - val_loss: 9.0493 - val_accuracy: 0.0721
Epoch 9/10
3/3 [==============================] - 2s 814ms/step - loss: 7.1751 - accuracy: 0.0449 - val_loss: 9.0740 - val_accuracy: 0.0788
Epoch 10/10
3/3 [==============================] - 3s 831ms/step - loss: 7.1132 - accuracy: 0.0479 - val_loss: 9.2443 - val_accuracy: 0.0773
And these are the arguments I pass for training:
model = make_model(russian_vocab_size,            # the sizes of the tokenized vocabularies
                   english_vocab_size,
                   max_russian_sequence_length,   # maximum sentence lengths
                   max_english_sequence_length,
                   512)

model.fit(preproc_russian_sentences,   # all tokenized Russian sentences, passed with shape (X, Y)
          preproc_english_sentences,   # all tokenized English sentences, passed with shape (X, Y, 1)
          epochs=10,
          batch_size=1024,
          validation_split=0.2,
          callbacks=None,
          verbose=1)
Thank you in advance.
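As a side note on the model above, a quick way to check that the encoder-decoder output lines up with the targets is to build a tiny instance of the same function and print its summary; every size below is a made-up toy value, not the question's real data.
# Toy shape check for make_model(); all sizes here are arbitrary assumptions.
toy = make_model(in_vocab=50, out_vocab=40, in_timesteps=12, out_timesteps=10, n=32)
toy.summary()
# The last layer should report an output shape of (None, 10, 40), i.e.
# (batch, out_timesteps, out_vocab), which is what sparse_categorical_crossentropy
# expects when the targets are shaped (num_samples, out_timesteps, 1).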
I have a classification model that is clearly overfitting and the validation accuracy doesn't change.
I've tried using feature selection and feature extraction methods but they didn't help.
feature selection method:
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

fs = SelectKBest(f_classif, k=10)
fs.fit(x, y)
feature_index = fs.get_support(True)
feature_index = feature_index.tolist()

best_features = []
# build a list of the selected feature columns
for index in feature_index:
    best_features.append(x[:, index])
x = (np.array(best_features, dtype=float)).T
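Incidentally, the fitted selector can produce the same reduced array directly, which avoids the manual loop and the transpose:
# Equivalent to the best_features loop above: keep only the k selected columns.
x = fs.transform(x).astype(float)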
model:
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.constraints import max_norm as maxnorm
from tensorflow.keras import optimizers

def model_and_print(x, y, Epochs, Batch_Size, loss, opt, class_weight, callback):
    # fix random seed for reproducibility
    seed = 7
    np.random.seed(seed)
    # define 10-fold cross-validation test harness
    kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    # K-fold cross-validation model evaluation
    fold_no = 1
    for train, test in kfold.split(x, y):
        # create model
        model = Sequential()
        model.add(Dropout(0.2, input_shape=(len(x[0]),)))
        model.add(Dense(6, activation='relu', kernel_constraint=maxnorm(3)))
        model.add(Dropout(0.2))
        model.add(Dense(8, activation='relu', kernel_constraint=maxnorm(3)))
        model.add(Dropout(0.4))
        model.add(Dense(1, activation=tf.nn.sigmoid))
        # compile model
        model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
        # note: the class_weight and callback arguments are not used here
        history = model.fit(x[train], y[train], validation_data=(x[test], y[test]),
                            epochs=Epochs, batch_size=Batch_Size, verbose=1)
def main():
    data = ["data.pkl", "data_list.pkl", "data_mean.pkl"]
    df = pd.read_pickle(data[2])
    x, y = data_frame_to_feature_and_target_arrays(df)

    # hyperparameters
    Epochs = 200
    Batch_Size = 1
    learning_rate = 0.003
    optimizer = optimizers.Adam(learning_rate=learning_rate)
    loss = "binary_crossentropy"

    # class_weight and es_callback are defined elsewhere in the original script (not shown)
    model_and_print(x, y, Epochs, Batch_Size, loss, optimizer, class_weight, es_callback)

if __name__ == "__main__":
    main()
output for part of one fold:
1/73 [..............................] - ETA: 0s - loss: 0.6470 - accuracy: 1.0000
62/73 [========================>.....] - ETA: 0s - loss: 0.5665 - accuracy: 0.7097
73/73 [==============================] - 0s 883us/step - loss: 0.5404 - accuracy: 0.7534 - val_loss: 0.5576 - val_accuracy: 0.5000
Epoch 100/200
1/73 [..............................] - ETA: 0s - loss: 0.4743 - accuracy: 1.0000
69/73 [===========================>..] - ETA: 0s - loss: 0.6388 - accuracy: 0.6522
73/73 [==============================] - 0s 806us/step - loss: 0.6316 - accuracy: 0.6575 - val_loss: 0.5592 - val_accuracy: 0.5000
Epoch 101/200
1/73 [..............................] - ETA: 0s - loss: 0.6005 - accuracy: 1.0000
69/73 [===========================>..] - ETA: 0s - loss: 0.5656 - accuracy: 0.7101
73/73 [==============================] - 0s 806us/step - loss: 0.5641 - accuracy: 0.7123 - val_loss: 0.5629 - val_accuracy: 0.5000
Epoch 102/200
1/73 [..............................] - ETA: 0s - loss: 0.2126 - accuracy: 1.0000
65/73 [=========================>....] - ETA: 0s - loss: 0.5042 - accuracy: 0.8000
73/73 [==============================] - 0s 847us/step - loss: 0.5340 - accuracy: 0.7671 - val_loss: 0.5608 - val_accuracy: 0.5000
Epoch 103/200
1/73 [..............................] - ETA: 0s - loss: 0.8801 - accuracy: 0.0000e+00
68/73 [==========================>...] - ETA: 0s - loss: 0.5754 - accuracy: 0.6471
73/73 [==============================] - 0s 819us/step - loss: 0.5780 - accuracy: 0.6575 - val_loss: 0.5639 - val_accuracy: 0.5000
Epoch 104/200
1/73 [..............................] - ETA: 0s - loss: 0.0484 - accuracy: 1.0000
70/73 [===========================>..] - ETA: 0s - loss: 0.5711 - accuracy: 0.7571
73/73 [==============================] - 0s 806us/step - loss: 0.5689 - accuracy: 0.7534 - val_loss: 0.5608 - val_accuracy: 0.5000
Epoch 105/200
1/73 [..............................] - ETA: 0s - loss: 0.1237 - accuracy: 1.0000
69/73 [===========================>..] - ETA: 0s - loss: 0.5953 - accuracy: 0.7101
73/73 [==============================] - 0s 820us/step - loss: 0.5922 - accuracy: 0.7260 - val_loss: 0.5672 - val_accuracy: 0.5000
Epoch 106/200
1/73 [..............................] - ETA: 0s - loss: 0.3360 - accuracy: 1.0000
67/73 [==========================>...] - ETA: 0s - loss: 0.5175 - accuracy: 0.7313
73/73 [==============================] - 0s 847us/step - loss: 0.5320 - accuracy: 0.7397 - val_loss: 0.5567 - val_accuracy: 0.5000
Epoch 107/200
1/73 [..............................] - ETA: 0s - loss: 0.1384 - accuracy: 1.0000
67/73 [==========================>...] - ETA: 0s - loss: 0.5435 - accuracy: 0.6866
73/73 [==============================] - 0s 833us/step - loss: 0.5541 - accuracy: 0.6575 - val_loss: 0.5629 - val_accuracy: 0.5000
Epoch 108/200
1/73 [..............................] - ETA: 0s - loss: 0.2647 - accuracy: 1.0000
69/73 [===========================>..] - ETA: 0s - loss: 0.6047 - accuracy: 0.6232
73/73 [==============================] - 0s 820us/step - loss: 0.5948 - accuracy: 0.6301 - val_loss: 0.5660 - val_accuracy: 0.5000
Epoch 109/200
1/73 [..............................] - ETA: 0s - loss: 0.8837 - accuracy: 0.0000e+00
66/73 [==========================>...] - ETA: 0s - loss: 0.5250 - accuracy: 0.7576
73/73 [==============================] - 0s 861us/step - loss: 0.5357 - accuracy: 0.7397 - val_loss: 0.5583 - val_accuracy: 0.5000
Epoch 110/200
final accuracy:
Score for fold 10: loss of 0.5600861310958862; accuracy of 50.0%
My question is: what can I do about the overfitting, given that I have already tried feature extraction and dropout layers, and why is the validation accuracy not changing?
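One wiring detail worth noting in the code above: model_and_print() accepts class_weight and callback but never passes them to model.fit(), so whatever es_callback main() supplies never actually runs. A minimal sketch of how they would normally be hooked in inside the fold loop, with an EarlyStopping callback as an example of what es_callback might be (the patience value is an arbitrary assumption):
from tensorflow.keras.callbacks import EarlyStopping

# Example of what es_callback could be; patience=20 is an arbitrary choice.
es_callback = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)

# Inside model_and_print(), forward both arguments to fit:
history = model.fit(x[train], y[train],
                    validation_data=(x[test], y[test]),
                    epochs=Epochs, batch_size=Batch_Size,
                    class_weight=class_weight,
                    callbacks=[callback],
                    verbose=1)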
I am building a DNN with Keras to classify between background and signal events (HEP). Nevertheless, the loss and the accuracy are not changing.
I have already tried changing the optimizer parameters, normalizing the data, changing the number of layers, neurons, and epochs, initializing the weights, etc.
Here's the model:
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

epochs = 20
num_features = 2
num_classes = 2
batch_size = 32

# model
print("\n Building model...")
model = Sequential()
model.add(Dropout(0.2))
model.add(Dense(128, input_shape=(2,), activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation=tf.nn.softmax))

print("\n Compiling model...")
opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0,
           amsgrad=False)

# compile model
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

print("\n Fitting model...")
history = model.fit(x_train, y_train, epochs=epochs,
                    batch_size=batch_size, validation_data=(x_test, y_test))
I expect the loss to decrease, but it stays stuck around 0.69, which is roughly ln 2, i.e. the cross-entropy of a model that is only guessing between the two classes.
The epoch-by-epoch output:
Building model...
Compiling model...
Fitting model...
Train on 18400 samples, validate on 4600 samples
Epoch 1/20
18400/18400 [==============================] - 1s 71us/step - loss: 0.6939 - acc: 0.4965 - val_loss: 0.6933 - val_acc: 0.5000
Epoch 2/20
18400/18400 [==============================] - 1s 60us/step - loss: 0.6935 - acc: 0.5045 - val_loss: 0.6933 - val_acc: 0.5000
Epoch 3/20
18400/18400 [==============================] - 1s 69us/step - loss: 0.6937 - acc: 0.4993 - val_loss: 0.6934 - val_acc: 0.5000
Epoch 4/20
18400/18400 [==============================] - 1s 65us/step - loss: 0.6939 - acc: 0.4984 - val_loss: 0.6932 - val_acc: 0.5000
Epoch 5/20
18400/18400 [==============================] - 1s 58us/step - loss: 0.6936 - acc: 0.5000 - val_loss: 0.6936 - val_acc: 0.5000
Epoch 6/20
18400/18400 [==============================] - 1s 57us/step - loss: 0.6937 - acc: 0.4913 - val_loss: 0.6932 - val_acc: 0.5000
Epoch 7/20
18400/18400 [==============================] - 1s 58us/step - loss: 0.6935 - acc: 0.5008 - val_loss: 0.6932 - val_acc: 0.5000
Epoch 8/20
18400/18400 [==============================] - 1s 63us/step - loss: 0.6936 - acc: 0.5013 - val_loss: 0.6936 - val_acc: 0.5000
Epoch 9/20
18400/18400 [==============================] - 1s 67us/step - loss: 0.6936 - acc: 0.4924 - val_loss: 0.6932 - val_acc: 0.5000
Epoch 10/20
18400/18400 [==============================] - 1s 61us/step - loss: 0.6933 - acc: 0.5067 - val_loss: 0.6934 - val_acc: 0.5000
Epoch 11/20
18400/18400 [==============================] - 1s 64us/step - loss: 0.6938 - acc: 0.4972 - val_loss: 0.6931 - val_acc: 0.5000
Epoch 12/20
18400/18400 [==============================] - 1s 64us/step - loss: 0.6936 - acc: 0.4991 - val_loss: 0.6934 - val_acc: 0.5000
Epoch 13/20
18400/18400 [==============================] - 1s 70us/step - loss: 0.6937 - acc: 0.4960 - val_loss: 0.6935 - val_acc: 0.5000
Epoch 14/20
18400/18400 [==============================] - 1s 63us/step - loss: 0.6935 - acc: 0.4992 - val_loss: 0.6932 - val_acc: 0.5000
Epoch 15/20
18400/18400 [==============================] - 1s 61us/step - loss: 0.6937 - acc: 0.4940 - val_loss: 0.6931 - val_acc: 0.5000
Epoch 16/20
18400/18400 [==============================] - 1s 68us/step - loss: 0.6933 - acc: 0.5067 - val_loss: 0.6936 - val_acc: 0.5000
Epoch 17/20
18400/18400 [==============================] - 1s 58us/step - loss: 0.6938 - acc: 0.4997 - val_loss: 0.6935 - val_acc: 0.5000
Epoch 18/20
18400/18400 [==============================] - 1s 56us/step - loss: 0.6936 - acc: 0.4972 - val_loss: 0.6941 - val_acc: 0.5000
Epoch 19/20
18400/18400 [==============================] - 1s 57us/step - loss: 0.6934 - acc: 0.5061 - val_loss: 0.6954 - val_acc: 0.5000
Epoch 20/20
18400/18400 [==============================] - 1s 58us/step - loss: 0.6936 - acc: 0.5037 - val_loss: 0.6939 - val_acc: 0.5000
Update: My data preparation contains this
np.random.shuffle(x_train)
np.random.shuffle(y_train)
np.random.shuffle(x_test)
np.random.shuffle(y_test)
And I suspect this is mismatching the labels with the data points, because each array is shuffled separately.
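If the four arrays really are shuffled independently like this, the features and labels no longer correspond to each other, which on its own is enough to pin accuracy at chance level. A minimal sketch of shuffling each split with one shared permutation instead:
import numpy as np

# One permutation per split, applied to features and labels together,
# keeps every (x, y) pair intact while still randomising the order.
perm = np.random.permutation(len(x_train))
x_train, y_train = x_train[perm], y_train[perm]

perm = np.random.permutation(len(x_test))
x_test, y_test = x_test[perm], y_test[perm]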
I want to fit ideal linear data (the identity function):
import numpy as np

data = np.asarray(range(100), dtype=np.float32)
For this I use a simple linear model:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(1, input_shape=(1,))
])
model.compile(optimizer='sgd', loss='mse')
model.fit(data, data, epochs=10, batch_size=100)
but the loss keeps growing with every epoch. What is wrong with this simple code?
Epoch 1/10
100/100 [==============================] - 1s 7ms/step - loss: 3559.4075
Epoch 2/10
100/100 [==============================] - 0s 20us/step - loss: 14893056.0000
Epoch 3/10
100/100 [==============================] - 0s 170us/step - loss: 62314639360.0000
Epoch 4/10
100/100 [==============================] - 0s 30us/step - loss: 260733187129344.0000
Epoch 5/10
100/100 [==============================] - 0s 70us/step - loss: 1090944439330799616.0000
Epoch 6/10
100/100 [==============================] - 0s 20us/step - loss: 4564665060617919397888.0000
Epoch 7/10
100/100 [==============================] - 0s 30us/step - loss: 19099198494067630815576064.0000
Epoch 8/10
100/100 [==============================] - 0s 30us/step - loss: 79913699011849558249925771264.0000
Epoch 9/10
100/100 [==============================] - 0s 50us/step - loss: 334370041805433555342669660553216.0000
Epoch 10/10
100/100 [==============================] - 0s 20us/step - loss: 1399051141583436919510296595359858688.0000
You need to standardize the input features; see How and why do normalization and feature scaling work? for the background. As an example, let me just use (x - mean(x)) / std(x) here.
import numpy as np
from keras.layers import Dense
from keras.models import Sequential
data = np.asarray(range(100),dtype=np.float32)
model = Sequential([
Dense(1, input_shape=(1,))
])
model.compile(optimizer='sgd', loss='mse')
model.fit((data-np.mean(data))/np.std(data), data, epochs=200, batch_size=100)
Epoch 1/200
100/100 [==============================] - 3s 26ms/step - loss: 3284.6235
Epoch 2/200
100/100 [==============================] - 0s 25us/step - loss: 3154.5522
Epoch 3/200
100/100 [==============================] - 0s 22us/step - loss: 3029.6318
...
100/100 [==============================] - 0s 27us/step - loss: 1.1016
Epoch 200/200
100/100 [==============================] - 0s 28us/step - loss: 1.0579
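The same standardization is often delegated to scikit-learn's StandardScaler, which also makes it easy to reuse the training-set mean and standard deviation on new data later; a brief sketch under that assumption:
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.asarray(range(100), dtype=np.float32).reshape(-1, 1)  # sklearn expects 2-D input

scaler = StandardScaler()
scaled = scaler.fit_transform(data)   # learn mean/std from the data and apply them
# later, scaler.transform(new_data) reuses the same mean/std

model.fit(scaled, data, epochs=200, batch_size=100)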
I am trying to train a Siamese CNN in Keras on about 200 pairs of images, and I notice that the validation accuracy doesn't change over the epochs.
Train on 144 samples, validate on 16 samples
Epoch 1/20
144/144 [==============================] - 51s 352ms/step - loss: 0.3041 - accuracy: 0.4375 - val_loss: 0.4816 - val_accuracy: 0.5000
Epoch 2/20
144/144 [==============================] - 56s 387ms/step - loss: 0.2819 - accuracy: 0.5208 - val_loss: 0.4816 - val_accuracy: 0.5000
Epoch 3/20
144/144 [==============================] - 47s 325ms/step - loss: 0.2784 - accuracy: 0.4861 - val_loss: 0.4816 - val_accuracy: 0.5000
Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0001500000071246177.
Epoch 4/20
144/144 [==============================] - 50s 349ms/step - loss: 0.2865 - accuracy: 0.4306 - val_loss: 0.4816 - val_accuracy: 0.5000
Epoch 5/20
144/144 [==============================] - 54s 377ms/step - loss: 0.2936 - accuracy: 0.4375 - val_loss: 0.4815 - val_accuracy: 0.5000
Epoch 00005: ReduceLROnPlateau reducing learning rate to 4.500000213738531e-05.
Epoch 6/20
144/144 [==============================] - 50s 349ms/step - loss: 0.2980 - accuracy: 0.4097 - val_loss: 0.4815 - val_accuracy: 0.5000
Epoch 7/20
144/144 [==============================] - 47s 324ms/step - loss: 0.2824 - accuracy: 0.4931 - val_loss: 0.4815 - val_accuracy: 0.5000
Epoch 00007: ReduceLROnPlateau reducing learning rate to 1.3500000204658135e-05.
Epoch 8/20
144/144 [==============================] - 48s 336ms/step - loss: 0.2888 - accuracy: 0.4722 - val_loss: 0.4815 - val_accuracy: 0.5000
Epoch 9/20
144/144 [==============================] - 45s 315ms/step - loss: 0.2572 - accuracy: 0.5417 - val_loss: 0.4815 - val_accuracy: 0.5000
Epoch 00009: ReduceLROnPlateau reducing learning rate to 4.050000006827758e-06.
Epoch 10/20
144/144 [==============================] - 45s 313ms/step - loss: 0.2827 - accuracy: 0.5139 - val_loss: 0.4815 - val_accuracy: 0.5000
Epoch 11/20
144/144 [==============================] - 46s 318ms/step - loss: 0.2660 - accuracy: 0.5764 - val_loss: 0.4815 - val_accuracy: 0.5000
Epoch 00011: ReduceLROnPlateau reducing learning rate to 1.2149999747634864e-06.
Epoch 12/20
144/144 [==============================] - 58s 401ms/step - loss: 0.2869 - accuracy: 0.4583 - val_loss: 0.4815 - val_accuracy: 0.5000
Epoch 13/20
144/144 [==============================] - 60s 417ms/step - loss: 0.2779 - accuracy: 0.5486 - val_loss: 0.4815 - val_accuracy: 0.5000
Epoch 00013: ReduceLROnPlateau reducing learning rate to 3.644999992502562e-07.
Epoch 14/20
144/144 [==============================] - 51s 357ms/step - loss: 0.2959 - accuracy: 0.4722 - val_loss: 0.4815 - val_accuracy: 0.5000
Epoch 15/20
144/144 [==============================] - 49s 343ms/step - loss: 0.2729 - accuracy: 0.5069 - val_loss: 0.4815 - val_accuracy: 0.5000
My network looks like the following:
from keras.models import Sequential, Model
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Input, Lambda

input_shape = X_train.shape[1:]

model = Sequential()
model.add(Conv2D(nb_filters, nb_conv, border_mode='valid', input_shape=(1, img_rows, img_cols), data_format='channels_first'))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes, activation='sigmoid'))

left_input = Input(input_shape)
right_input = Input(input_shape)
processed_a = model(left_input)
processed_b = model(right_input)

# euclidean_distance and eucl_dist_output_shape are helper functions defined elsewhere (not shown)
distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([processed_a, processed_b])
siamese_net = Model([left_input, right_input], distance)
I have tried different optimizers, different learning rates and regularization (dropout), but there is no change in validation accuracy or loss.
How can I improve it?
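For context, a Siamese network whose output is a Euclidean distance, like the one above, is commonly trained with a contrastive loss rather than an off-the-shelf classification loss, so that matching pairs are pulled toward small distances and non-matching pairs are pushed beyond a margin. A minimal sketch of that loss in Keras (the margin of 1.0 is an assumption, and the convention here is y_true = 1 for a matching pair):
from keras import backend as K
from keras.optimizers import RMSprop

def contrastive_loss(y_true, y_pred, margin=1.0):
    # Hadsell-style contrastive loss on a distance output:
    # matching pairs (y_true = 1) are penalised by the squared distance,
    # non-matching pairs by how far the distance falls inside the margin.
    y_true = K.cast(y_true, y_pred.dtype)
    square_dist = K.square(y_pred)
    margin_dist = K.square(K.maximum(margin - y_pred, 0.0))
    return K.mean(y_true * square_dist + (1.0 - y_true) * margin_dist)

siamese_net.compile(loss=contrastive_loss, optimizer=RMSprop())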