I trained an LSTM classification model but got weird results (0 accuracy). Here is my dataset with the preprocessing steps:
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
import numpy as np
url = 'https://raw.githubusercontent.com/MislavSag/trademl/master/trademl/modeling/random_forest/X_TEST.csv'
X_TEST = pd.read_csv(url, sep=',')
url = 'https://raw.githubusercontent.com/MislavSag/trademl/master/trademl/modeling/random_forest/labeling_info_TEST.csv'
labeling_info_TEST = pd.read_csv(url, sep=',')
# TRAIN TEST SPLIT
X_train, X_test, y_train, y_test = train_test_split(
X_TEST.drop(columns=['close_orig']), labeling_info_TEST['bin'],
test_size=0.10, shuffle=False, stratify=None)
### PREPARE LSTM
x = X_train['close'].values.reshape(-1, 1)
y = y_train.values.reshape(-1, 1)
x_test = X_test['close'].values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)
train_val_index_split = 0.75
train_generator = keras.preprocessing.sequence.TimeseriesGenerator(
data=x,
targets=y,
length=30,
sampling_rate=1,
stride=1,
start_index=0,
end_index=int(train_val_index_split*X_TEST.shape[0]),
shuffle=False,
reverse=False,
batch_size=128
)
validation_generator = keras.preprocessing.sequence.TimeseriesGenerator(
data=x,
targets=y,
length=30,
sampling_rate=1,
stride=1,
start_index=int((train_val_index_split*X_TEST.shape[0] + 1)),
end_index=None, #int(train_test_index_split*X.shape[0])
shuffle=False,
reverse=False,
batch_size=128
)
test_generator = keras.preprocessing.sequence.TimeseriesGenerator(
data=x_test,
targets=y_test,
length=30,
sampling_rate=1,
stride=1,
start_index=0,
end_index=None,
shuffle=False,
reverse=False,
batch_size=128
)
# convert generator to inmemory 3D series (if enough RAM)
def generator_to_obj(generator):
    xlist = []
    ylist = []
    for i in range(len(generator)):
        # index the generator passed in, not the global train_generator
        x, y = generator[i]
        xlist.append(x)
        ylist.append(y)
    X_out = np.concatenate(xlist, axis=0)
    y_out = np.concatenate(ylist, axis=0)
    return X_out, y_out
X_train_lstm, y_train_lstm = generator_to_obj(train_generator)
X_val_lstm, y_val_lstm = generator_to_obj(validation_generator)
X_test_lstm, y_test_lstm = generator_to_obj(test_generator)
# test for shapes
print('X and y shape train: ', X_train_lstm.shape, y_train_lstm.shape)
print('X and y shape validate: ', X_val_lstm.shape, y_val_lstm.shape)
print('X and y shape test: ', X_test_lstm.shape, y_test_lstm.shape)
And here is my model with results:
### MODEL
model = keras.models.Sequential([
keras.layers.LSTM(124, return_sequences=True, input_shape=[None, 1]),
keras.layers.LSTM(258),
keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train_lstm, y_train_lstm, epochs=10, batch_size=128,
validation_data=[X_val_lstm, y_val_lstm])
# history = model.fit_generator(train_generator, epochs=40, validation_data=validation_generator, verbose=1)
score, acc = model.evaluate(X_val_lstm, y_val_lstm,
batch_size=128)
historydf = pd.DataFrame(history.history)
historydf.head(10)
Why do I get 0 accuracy?
You're using a sigmoid activation, which means your labels must be 0 or 1. But in your case, the labels are 1. and -1.
Just replace -1 with 0.
for i, y in enumerate(y_train_lstm):
    if y == -1.:
        y_train_lstm[i,:] = 0.
for i, y in enumerate(y_val_lstm):
    if y == -1.:
        y_val_lstm[i,:] = 0.
for i, y in enumerate(y_test_lstm):
    if y == -1.:
        y_test_lstm[i,:] = 0.
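The three loops above can also be written as a single vectorized remap. This is only a minimal sketch: the array `y` is a made-up stand-in for `y_train_lstm`, and it assumes the labels are exactly -1. and 1.

```python
import numpy as np

# Toy stand-in for y_train_lstm: a column of -1. / 1. labels (illustrative data only).
y = np.array([[1.0], [-1.0], [-1.0], [1.0]])

# Map -1 to 0 and leave every other value unchanged.
y01 = np.where(y == -1.0, 0.0, y)

print(y01.ravel())
```

The same call could be applied to the validation and test label arrays in turn.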
Sidenote:
The signals are very close, so it will be hard to distinguish them. Accuracy probably won't be high with simple models.
After training with 0. and 1. labels,
model = keras.models.Sequential([
keras.layers.LSTM(124, return_sequences=True, input_shape=(30, 1)),
keras.layers.LSTM(258),
keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train_lstm, y_train_lstm, epochs=5, batch_size=128,
validation_data=(X_val_lstm, y_val_lstm))
# history = model.fit_generator(train_generator, epochs=40, validation_data=validation_generator, verbose=1)
score, acc = model.evaluate(X_val_lstm, y_val_lstm,
batch_size=128)
historydf = pd.DataFrame(history.history)
historydf.head(10)
Epoch 1/5
12/12 [==============================] - 5s 378ms/step - loss: 0.7386 - accuracy: 0.4990 - val_loss: 0.6959 - val_accuracy: 0.4896
Epoch 2/5
12/12 [==============================] - 4s 318ms/step - loss: 0.6947 - accuracy: 0.5133 - val_loss: 0.6959 - val_accuracy: 0.5104
Epoch 3/5
12/12 [==============================] - 4s 318ms/step - loss: 0.6941 - accuracy: 0.4895 - val_loss: 0.6930 - val_accuracy: 0.5104
Epoch 4/5
12/12 [==============================] - 4s 332ms/step - loss: 0.6946 - accuracy: 0.5269 - val_loss: 0.6946 - val_accuracy: 0.5104
Epoch 5/5
12/12 [==============================] - 4s 334ms/step - loss: 0.6931 - accuracy: 0.4901 - val_loss: 0.6929 - val_accuracy: 0.5104
3/3 [==============================] - 0s 73ms/step - loss: 0.6929 - accuracy: 0.5104
loss accuracy val_loss val_accuracy
0 0.738649 0.498980 0.695888 0.489583
1 0.694708 0.513256 0.695942 0.510417
2 0.694117 0.489463 0.692987 0.510417
3 0.694554 0.526852 0.694613 0.510417
4 0.693118 0.490143 0.692936 0.510417
Source code in colab: https://colab.research.google.com/drive/10yRf4TfGDnp_4F2HYoxPyTlF18no-8Dr?usp=sharing
Related
I have created a transfer learning model with RESNET-50. I am using K-fold cross-validation in my model. However, my model is not performing properly. Although I didn't get any error messages, my accuracy values are very weird. They are the same across all the folds. Please find my code below:
root_path = 'D:/regionGrowing_MLT/NewSavedRGBImages'
datasetFolderName=root_path
MODEL_FILENAME=root_path+"model_cv.h5"
sourceFiles=[]
classLabels=['Benign', 'Malignant']
X=[]
Y=[]
img_rows, img_cols = 224, 224 # input image dimensions
train_path=datasetFolderName+'/Training/'
validation_path=datasetFolderName+'/validation/'
test_path=datasetFolderName+'/test/'
def transferBetweenFolders(source, dest, splitRate):
global sourceFiles
sourceFiles=os.listdir(source)
if(len(sourceFiles)!=0):
transferFileNumbers=int(len(sourceFiles)*splitRate)
transferIndex=random.sample(range(0, len(sourceFiles)), transferFileNumbers)
for eachIndex in transferIndex:
shutil.move(source+str(sourceFiles[eachIndex]), dest+str(sourceFiles[eachIndex]))
else:
print("No file moved. Source empty!")
def transferAllClassBetweenFolders(source, dest, splitRate):
for label in classLabels:
transferBetweenFolders(datasetFolderName+'/'+source+'/'+label+'/',
datasetFolderName+'/'+dest+'/'+label+'/',
splitRate)
def my_metrics(y_true, y_pred):
accuracy=accuracy_score(y_true, y_pred)
precision=precision_score(y_true, y_pred,average='weighted')
f1Score=f1_score(y_true, y_pred, average='weighted')
print("Accuracy : {}".format(accuracy))
print("Precision : {}".format(precision))
print("f1Score : {}".format(f1Score))
cm=confusion_matrix(y_true, y_pred)
print(cm)
return accuracy, precision, f1Score
transferAllClassBetweenFolders('Training', 'test', 0.20)
def prepareNameWithLabels(folderName):
sourceFiles=os.listdir(datasetFolderName+'/Training/'+folderName)
for val in sourceFiles:
X.append(val)
for i in range(len(classLabels)):
if(folderName==classLabels[i]):
Y.append(i)
# Organize file names and class labels in X and Y variables
for i in range(len(classLabels)):
prepareNameWithLabels(classLabels[i])
X=np.asarray(X)
Y=np.asarray(Y)
batch_size = 32
epoch=100
activationFunction='relu'
def getModel():
model = Sequential()
model.add(Flatten())
model.add(Dense(256, activation='relu', name='fc1'))
model.add(Dense(128, activation='relu', name='fc2'))
model.add(layers.Dropout(0.5)) # used for regularization (to avoid overfitting)
model.add(Dense(2, activation='softmax'))
# model.summary()
model.compile(optimizer=optimizers.Adam(learning_rate=2e-5),
loss='binary_crossentropy',
metrics=['accuracy'])
return model
model=getModel()
# ===============Stratified K-Fold======================
skf = StratifiedKFold(n_splits=5, shuffle=True)
skf.get_n_splits(X, Y)
foldNum=0
for train_index, val_index in skf.split(X, Y):
#First cut all images from validation to train (if any exists)
transferAllClassBetweenFolders('validation', 'Training', 1.0)
foldNum+=1
print("Results for fold",foldNum)
X_train, X_val = X[train_index], X[val_index]
Y_train, Y_val = Y[train_index], Y[val_index]
# Move validation images of this fold from train folder to the validation folder
for eachIndex in range(len(X_val)):
classLabel=''
for i in range(len(classLabels)):
if(Y_val[eachIndex]==i):
classLabel=classLabels[i]
#Then, copy the validation images to the validation folder
shutil.move(datasetFolderName+'/Training/'+classLabel+'/'+X_val[eachIndex],
datasetFolderName+'/validation/'+classLabel+'/'+X_val[eachIndex])
train_datagen = ImageDataGenerator(
rescale=1./255,
horizontal_flip=True,
rotation_range=40,
shear_range=0.2,
width_shift_range=0.2,
height_shift_range=0.2,
zoom_range=0.20,
fill_mode="nearest")
validation_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
#Start ImageClassification Model
train_generator = train_datagen.flow_from_directory(
train_path,
target_size=(img_rows, img_cols),
batch_size=batch_size,
class_mode='categorical',
subset='training')
validation_generator = validation_datagen.flow_from_directory(
validation_path,
target_size=(img_rows, img_cols),
batch_size=batch_size,
class_mode=None, # only data, no labels
shuffle=False)
# fit model
history=model.fit_generator(train_generator,
epochs=epoch)
predictions = model.predict_generator(validation_generator, verbose=1)
yPredictions = np.argmax(predictions, axis=1)
true_classes = validation_generator.classes
# evaluate validation performance
print("***Performance on Validation data***")
valAcc, valPrec, valFScore = my_metrics(true_classes, yPredictions)
My results are below:
Epoch 1/100
72/72 [==============================] - 25s 350ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 2/100
72/72 [==============================] - 25s 349ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 3/100
72/72 [==============================] - 25s 348ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 4/100
72/72 [==============================] - 25s 348ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 5/100
72/72 [==============================] - 25s 352ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 6/100
72/72 [==============================] - 25s 351ms/step - loss: 0.6931 - accuracy: 0.5013
Epoch 7/100
72/72 [==============================] - 25s 353ms/step - loss: 0.6931 - accuracy: 0.5013
As can be seen above, the accuracy values are the same for all epochs which shouldn't be the case. I am not sure exactly where I am making the error. Any suggestions would be appreciated. Thank you.
I am running tf.keras.callbacks.ModelCheckpoint with the accuracy metric, but the loss is used to save the best checkpoints. I have tested this in different places (my computer and Colab) and with two different scripts, and faced the same issue. Here is an example code and the results:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import os
import shutil
def get_uncompiled_model():
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
return model
def get_compiled_model():
model = get_uncompiled_model()
model.compile(
optimizer="rmsprop",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"],
)
return model
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
ckpt_folder = os.path.join(os.getcwd(), 'ckpt')
if os.path.exists(ckpt_folder):
shutil.rmtree(ckpt_folder)
ckpt_path = os.path.join(r'D:\deep_learning\tf_keras\semantic_segmentation\logs', 'mymodel_{epoch}')
callbacks = [
tf.keras.callbacks.ModelCheckpoint(
# Path where to save the model
# The two parameters below mean that we will overwrite
# the current checkpoint if and only if
# the `val_loss` score has improved.
# The saved model name will include the current epoch.
filepath=ckpt_path,
montior="val_accuracy",
# save the model weights with best validation accuracy
mode='max',
save_best_only=True, # only save the best weights
save_weights_only=False,
# only save model weights (not whole model)
verbose=1
)
]
model = get_compiled_model()
model.fit(
x_train, y_train, epochs=3, batch_size=1, callbacks=callbacks, validation_split=0.2, steps_per_epoch=1
)
1/1 [==============================] - ETA: 0s - loss: 2.6475 - accuracy: 0.0000e+00
Epoch 1: val_loss improved from -inf to 2.32311, saving model to D:\deep_learning\tf_keras\semantic_segmentation\logs\mymodel_1
1/1 [==============================] - 6s 6s/step - loss: 2.6475 - accuracy: 0.0000e+00 - val_loss: 2.3231 - val_accuracy: 0.1142
Epoch 2/3
1/1 [==============================] - ETA: 0s - loss: 1.9612 - accuracy: 1.0000
Epoch 2: val_loss improved from 2.32311 to 2.34286, saving model to D:\deep_learning\tf_keras\semantic_segmentation\logs\mymodel_2
1/1 [==============================] - 5s 5s/step - loss: 1.9612 - accuracy: 1.0000 - val_loss: 2.3429 - val_accuracy: 0.1187
Epoch 3/3
1/1 [==============================] - ETA: 0s - loss: 2.8378 - accuracy: 0.0000e+00
Epoch 3: val_loss did not improve from 2.34286
1/1 [==============================] - 5s 5s/step - loss: 2.8378 - accuracy: 0.0000e+00 - val_loss: 2.2943 - val_accuracy: 0.1346
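In your code, you wrote montior instead of monitor. The callback has no parameter with that name, so the misspelled argument is ignored and monitor keeps its default value, val_loss. Together with mode='max', this is why the log reports val_loss "improving" from 2.32311 to 2.34286. If you write it like below, you get what you want: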
callbacks = [
tf.keras.callbacks.ModelCheckpoint(
filepath=ckpt_path,
monitor="val_accuracy",
mode='max',
save_best_only=True,
save_weights_only=False,
verbose=1
)
]
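To illustrate how a misspelled keyword can go unnoticed, here is a small sketch. `ToyCheckpoint` is a hypothetical stand-in, not the Keras source; it only assumes, as the log above suggests, that the constructor accepts extra keyword arguments without complaining.

```python
class ToyCheckpoint:
    """Hypothetical, simplified stand-in for a callback that accepts **kwargs."""

    def __init__(self, filepath, monitor="val_loss", mode="auto", **kwargs):
        self.filepath = filepath
        self.monitor = monitor   # stays at the default unless spelled correctly
        self.mode = mode
        self.ignored = kwargs    # unknown keywords end up here, silently


# The typo "montior" is swallowed by **kwargs, so monitor is never changed.
cb = ToyCheckpoint("ckpt", montior="val_accuracy", mode="max")

print(cb.monitor)
print(cb.ignored)
```

Checking the "Epoch N: ... improved from ..." line in the training log, as done above, is a quick way to confirm which metric a callback is really tracking.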
After training the below model and plotting the train and validation accuracy I'm getting two straight horizontal lines (picture attached).
These are the parameters
Params:
mid_units: 256.0
activation: relu
dropout: 0.34943936277356535
optimizer: adam
batch_size: 64.0
for cls in os.listdir(path):
for sound in tqdm(os.listdir(os.path.join(path, cls))):
wav = librosa.load(os.path.join(os.path.join(path, cls, sound)), sr=16000)[0].astype(np.float32)
tmp_samples.append(wav)
tmp_labels.append(cls)
X_train, X_test, y_train , y_test = train_test_split( tmp_samples, tmp_labels , test_size=0.60,shuffle=True)
X_test,X_valid, y_test , y_valid = train_test_split( X_test, y_test , test_size=0.50,shuffle=True)
for x,y in zip(X_train,y_train):
extract_features_with_aug(x, y, model, samples , labels )
for x,y in zip(X_test,y_test):
extract_features(x, y, model, plain_samples , plain_labels )
for x,y in zip(X_valid,y_valid):
extract_features(x, y, model, valid_sample,valid_label)
X_train = np.asarray(samples)
y_train = np.asarray(labels)
X_test = np.asarray(plain_samples)
y_test=np.asarray(plain_labels)
X_valid = np.asarray(valid_sample)
y_valid=np.asarray(valid_label)
X_train = shuffle(samples)
y_train = shuffle(labels)
X_test = shuffle(plain_samples)
y_test=shuffle(plain_labels)
X_valid = shuffle(valid_sample)
y_valid=shuffle(valid_label)
return X_train, y_train , X_test , y_test ,X_valid,y_valid
Model:
input = layers.Input( batch_shape=(None,1024,1),dtype=tf.float32,name='audio')
drop=layers.Dropout( dropout_rate ) (input)
fl= layers.Flatten() (drop)
l= layers.Dense( mid_units , activation= activation )(fl)
ba=layers.BatchNormalization() (l)
drop2=layers.Dropout( dropout_rate ) (ba)
net=layers.Dense( 5, activation= activation )(drop2)
model = Model(inputs=input, outputs=net)
model.summary()
return model
def train_model(
X_train, y_train , X_test , y_test , X_valid,y_valid,
fname, # Path where to save the model
mid_units,
activation ,
dropout ,
batch_size ,
optimizer
):
# Generate the model
general_model = create_model( mid_units, activation , dropout )
general_model.compile(optimizer= optimizer , loss='categorical_crossentropy',
metrics=['accuracy'])
# Create some callbacks
callbacks = [tf.keras.callbacks.ModelCheckpoint(filepath=fname, monitor='val_loss', save_best_only=True),
tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.95, patience=5, verbose=1,
min_lr=0.000001)]
################
history = general_model.fit(X_train, y_train, epochs=EPOCHS, validation_data = ( X_valid,y_valid ), batch_size= batch_size ,
callbacks=callbacks, verbose=1)
For the training history I'm getting fixed values
3027/3027 [==============================] - 29s 9ms/step - loss: nan - accuracy: 0.2150 - val_loss: nan - val_accuracy: 0.2266
Epoch 97/100
3027/3027 [==============================] - 31s 10ms/step - loss: nan - accuracy: 0.2150 - val_loss: nan - val_accuracy: 0.2266
Epoch 98/100
3027/3027 [==============================] - 41s 14ms/step - loss: nan - accuracy: 0.2150 - val_loss: nan - val_accuracy: 0.2266
Epoch 99/100
3027/3027 [==============================] - 32s 11ms/step - loss: nan - accuracy: 0.2150 - val_loss: nan - val_accuracy: 0.2266
Epoch 100/100
Code:
import tensorflow.keras as tfk
import pandas as pd
import numpy as np
dataset = pd.read_csv("translator.csv")
x_train, x_test = dataset[["Afrikaans Woorde", "English Words"]]
y_train, y_test = dataset[["Total Letter Amount", "Incommon Letters"]]
x_train = np.array(x_train)
y_train = np.array(y_train)
x_test = np.array(x_test)
y_test = np.array(y_test)
model = tfk.models.Sequential()
input_layer = model.add(tfk.layers.Flatten())
hidden_layer1 = model.add(tfk.layers.Dense(128, activation="relu"))
hidden_layer2 = model.add(tfk.layers.Dense(128, activation="relu"))
output_layer = model.add(tfk.layers.Dense(1))
compiler = model.compile(optimizer="adam", loss="spare_categorical_crossentropy", metrics=["accuracy"])
fitter = model.fit(x_train, y_train, epochs=10)
val_loss, val_acc = model.evaluate(x_test, y_test)
print(f"Percentage loss {val_loss * 100}%", f"Percentage accuracy {val_acc * 100}%")
Error:
IndexError: list index out of range
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_5408/3773670894.py in <module>
22 compiler = model.compile(optimizer="adam", loss="spare_categorical_crossentropy", metrics=["accuracy"])
23
---> 24 fitter = model.fit(x_train, y_train, epochs=10)
25
26 val_loss, val_acc = model.evaluate(x_test, y_test)
Question:
I have tried everything and I am not sure what to do. I have even converted the dataset to a NumPy array, yet it still gives me the error.
This specific model is to see if I can build a Translator just from a couple of words.
I tried with random input. Your model architecture has a single output unit, which means binary classification.
Working sample code
import tensorflow.keras as tfk
import numpy as np
import tensorflow as tf
X_train = np.random.random((1512,18))
y_train = np.random.random((1512,1))
dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_data = dataset.shuffle(len(X_train)).batch(32)
train_data = train_data.prefetch(
buffer_size=tf.data.experimental.AUTOTUNE)
model = tfk.models.Sequential()
input = model.add(tfk.layers.Dense(15, activation=tf.nn.relu, input_shape=(18,)))
input_layer = model.add(tfk.layers.Flatten())
hidden_layer1 = model.add(tfk.layers.Dense(128, activation="relu"))
hidden_layer2 = model.add(tfk.layers.Dense(128, activation="relu"))
output_layer = model.add(tfk.layers.Dense(1))
model.compile(optimizer='adam',
loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy'])
fitter = model.fit(train_data, epochs=5, batch_size=5, verbose=1)
Output
Epoch 1/5
48/48 [==============================] - 4s 5ms/step - loss: 5.9153e-08 - accuracy: 0.0000e+00
Epoch 2/5
48/48 [==============================] - 0s 4ms/step - loss: 5.9153e-08 - accuracy: 0.0000e+00
Epoch 3/5
48/48 [==============================] - 0s 5ms/step - loss: 5.9153e-08 - accuracy: 0.0000e+00
Epoch 4/5
48/48 [==============================] - 0s 6ms/step - loss: 5.9153e-08 - accuracy: 0.0000e+00
Epoch 5/5
48/48 [==============================] - 0s 5ms/step - loss: 5.9153e-08 - accuracy: 0.0000e+00
I wrote this very simple code
model = keras.models.Sequential()
model.add(layers.Dense(13000, input_dim=X_train.shape[1], activation='relu', trainable=False))
model.add(layers.Dense(1, input_dim=13000, activation='linear'))
model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000000, verbose=1)
The data is MNIST but only for digits '0' and '1'.
I have a very strange issue, where the loss is monotonically decreasing to zero, as expected, yet the accuracy instead of increasing, is also decreasing.
Here is a sample output
12665/12665 [==============================] - 0s 11us/step - loss: 0.0107 - accuracy: 0.2355
Epoch 181/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0114 - accuracy: 0.2568
Epoch 182/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0128 - accuracy: 0.2726
Epoch 183/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0133 - accuracy: 0.2839
Epoch 184/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0134 - accuracy: 0.2887
Epoch 185/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0110 - accuracy: 0.2842
Epoch 186/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0101 - accuracy: 0.2722
Epoch 187/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0094 - accuracy: 0.2583
Since we only have two classes, the benchmark for the lowest possible accuracy should be 0.5. Furthermore, we are monitoring accuracy on the training set, so it should be going up to 100%. I expect overfitting, and I am overfitting according to the loss function.
At the final epoch, this is the situation
12665/12665 [==============================] - 0s 11us/step - loss: 9.9710e-06 - accuracy: 0.0758
That is about 7% accuracy, when random guessing should give about 50%. This is no accident. Something is going on here.
Can anyone see the problem?
Entire code
from tensorflow import keras
import numpy as np
from matplotlib import pyplot as plt
import keras
from keras.callbacks import Callback
from keras import layers
import warnings
class EarlyStoppingByLossVal(Callback):
    def __init__(self, monitor='val_loss', value=0.00001, verbose=0):
        super().__init__()
        self.monitor = monitor
        self.value = value
        self.verbose = verbose
    def on_epoch_end(self, epoch, logs=None):
        current = (logs or {}).get(self.monitor)
        if current is None:
            warnings.warn("Early stopping requires %s available!" % self.monitor, RuntimeWarning)
            return  # nothing to compare against
        if current < self.value:
            if self.verbose > 0:
                print("Epoch %05d: early stopping THR" % epoch)
            self.model.stop_training = True
def load_mnist():
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = np.reshape(train_images, (train_images.shape[0], train_images.shape[1] * train_images.shape[2]))
test_images = np.reshape(test_images, (test_images.shape[0], test_images.shape[1] * test_images.shape[2]))
train_labels = np.reshape(train_labels, (train_labels.shape[0],))
test_labels = np.reshape(test_labels, (test_labels.shape[0],))
train_images = train_images[(train_labels == 0) | (train_labels == 1)]
test_images = test_images[(test_labels == 0) | (test_labels == 1)]
train_labels = train_labels[(train_labels == 0) | (train_labels == 1)]
test_labels = test_labels[(test_labels == 0) | (test_labels == 1)]
train_images, test_images = train_images / 255, test_images / 255
return train_images, train_labels, test_images, test_labels
X_train, y_train, X_test, y_test = load_mnist()
train_acc = []
train_errors = []
test_acc = []
test_errors = []
width_list = [13000]
for width in width_list:
print(width)
model = keras.models.Sequential()
model.add(layers.Dense(width, input_dim=X_train.shape[1], activation='relu', trainable=False))
model.add(layers.Dense(1, input_dim=width, activation='linear'))
model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])
callbacks = [EarlyStoppingByLossVal(monitor='loss', value=0.00001, verbose=1)]
model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000000, verbose=1, callbacks=callbacks)
train_errors.append(model.evaluate(X_train, y_train)[0])
test_errors.append(model.evaluate(X_test, y_test)[0])
train_acc.append(model.evaluate(X_train, y_train)[1])
test_acc.append(model.evaluate(X_test, y_test)[1])
plt.plot(width_list, train_errors, marker='D')
plt.xlabel("width")
plt.ylabel("train loss")
plt.show()
plt.plot(width_list, test_errors, marker='D')
plt.xlabel("width")
plt.ylabel("test loss")
plt.show()
plt.plot(width_list, train_acc, marker='D')
plt.xlabel("width")
plt.ylabel("train acc")
plt.show()
plt.plot(width_list, test_acc, marker='D')
plt.xlabel("width")
plt.ylabel("test acc")
plt.show()
A linear activation in the last layer for a (binary) classification problem is meaningless; change your last layer to:
model.add(layers.Dense(1, input_dim=width, activation='sigmoid'))
Linear activations for the last layer are used for regression problems and not for classification ones.
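A small NumPy sketch of the difference follows. The `logits` values are made up for illustration, and the 0.5 cut-off is the usual convention for binary accuracy rather than something taken from the question. A linear output can be any real number, while a sigmoid squashes it into (0, 1), so it can be read as a probability and thresholded.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Made-up raw outputs of a linear last layer.
logits = np.array([-3.0, -0.2, 0.0, 0.4, 5.0])

# With a sigmoid last layer the same values become probabilities.
probs = sigmoid(logits)

# Predicted class: 1 where the probability exceeds 0.5, else 0.
pred = (probs > 0.5).astype(int)

print(probs)
print(pred)
```

Without the sigmoid, values such as -3.0 or 5.0 would be passed to the loss and the metric as if they were probabilities, which they are not.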