I am getting zero validation accuracy - python

I am getting zero validation accuracy on my LSTM model.
As my model is a many to one model, I am using one unit in the last dense layer. But, it is giving me this accuracy.
536/536 [==============================] - 6s 8ms/step - loss: nan -
accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
<keras.callbacks.History at 0x7efd6b9bc5d0>
My model is:
classifier1 = Sequential()
classifier1.add(CuDNNLSTM(100, input_shape = (x1_train.shape[1], x1_train.shape[2]), return_sequences= True))
# classifier1.add(Dropout(0.02))
classifier1.add(CuDNNLSTM(100))
classifier1.add(Dropout(0.02))
classifier1.add(Dense(100, activation = 'sigmoid'))
# classifier1.add(Dense(300))
classifier1.add(Dropout(0.02))
classifier1.add(Dense(1, activation='softmax'))
# classifier1.add(Dropout(0.02))
# classifier1.add(Dense(1))
early_stopping = EarlyStopping(monitor='val_loss',patience=3, verbose = 1)
callback = [early_stopping]
classifier1.compile(
loss='sparse_categorical_crossentropy',
# loss = 'mean_squared_error',
# optimizer=Adam(learning_rate=0.05, decay= 1e-6),
optimizer='rmsprop',
metrics=['accuracy'])
classifier1.fit(x1_train, y1_train, epochs=1 ,
validation_data=(x1_test, y1_test),
batch_size=50
# class_weight= 'balanced'
# callbacks = callback)
)

The output of your network is of dimension 1, so you should use in the last layer a sigmoid activation instead of a softmax and especially replace the categorical cross entropy by a binary cross entropy, because it is a two class problem. Then it should work.

Related

Kaggle problem for dogs and cats with kernal using ANN is not working for me

I have written Artificial Neural network code to solve Keggale Dog and Cats Kernal problem but somehow during training, it shows loss=nan and bad accuracy. My code can be found at https://www.kaggle.com/dilipkumar2k6/dogs-vs-cats-with-new-kernel/notebook
Following are details on error
from tensorflow import keras
# First apply Artificial neural network (ANN)
ann = keras.Sequential([
keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE, 3)), # Flaten 3d to 1d
keras.layers.Dense(3000, activation='relu'), # more hidden layer gives better perf
keras.layers.Dense(1000, activation='relu'), # more hidden layer gives better perf
keras.layers.Dense(100, activation='relu'), # more hidden layer gives better perf
keras.layers.Dense(2, activation='sigmoid')
])
ann.compile(optimizer='SGD', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
ann.fit(train_X, train_y, epochs=10)
Error
Epoch 1/10
438/438 [==============================] - 2s 2ms/step - loss: nan - accuracy: 5.0000e-04
Epoch 2/10
438/438 [==============================] - 1s 2ms/step - loss: nan - accuracy: 0.0000e+00
Using a sigmoid activation function in your output layer seems a bit strange to me when using sparse_categorical_crossentropy (although it could also work). Anyway, I think you should consider changing this line:
keras.layers.Dense(2, activation='sigmoid')
to
keras.layers.Dense(1, activation='sigmoid')
and use tf.keras.losses.BinaryCrossentropy().
Or change your activation function to softmax and leave the rest as it is.
You should also consider redesigning your model and using at least one tf.keras.layers.Conv2D layer before flattening the data. Here is a working example:
import tensorflow_datasets as tfds
import tensorflow as tf
ds, ds_info = tfds.load('cats_vs_dogs', split='train', with_info=True)
normalization_layer = tf.keras.layers.Rescaling(1./255)
def resize_inputs(data):
images, labels = data['image'], data['label']
images = tf.image.resize(normalization_layer(images),[64, 64], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
return images, labels
ds = ds.map(resize_inputs).batch(64)
ann = tf.keras.Sequential([
tf.keras.layers.Conv2D(64, kernel_size=3, input_shape=(64, 64, 3)),
tf.keras.layers.Flatten(), # Flaten 3d to 1d
tf.keras.layers.Dense(200, activation='relu'), # more hidden layer gives better perf
tf.keras.layers.Dense(100, activation='relu'), # more hidden layer gives better perf
tf.keras.layers.Dense(50, activation='relu'), # more hidden layer gives better perf
tf.keras.layers.Dense(1, activation='sigmoid')
])
ann.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
ann.fit(ds, epochs=10)
Epoch 1/10
364/364 [==============================] - 58s 140ms/step - loss: 0.8692 - accuracy: 0.5902
Epoch 2/10
364/364 [==============================] - 51s 141ms/step - loss: 0.6155 - accuracy: 0.6559
Epoch 3/10
364/364 [==============================] - 51s 141ms/step - loss: 0.5708 - accuracy: 0.7009
Epoch 4/10
364/364 [==============================] - 51s 140ms/step - loss: 0.5447 - accuracy: 0.7262
...
You can experiment with this example and find out which combination of activation function, loss function, and number of output nodes works best for you.

fitting model several times in keras

I am using model.fit() several times, each time is responsible for training a block of layers where other layers are freezed
CODE
# create the base pre-trained model
base_model = efn.EfficientNetB0(input_tensor=input_tensor,weights='imagenet', include_top=False)
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# add a fully-connected layer
x = Dense(x.shape[1], activation='relu',name='first_dense')(x)
x=Dropout(0.5)(x)
x = Dense(x.shape[1], activation='relu',name='output')(x)
x=Dropout(0.5)(x)
no_classes=10
predictions = Dense(no_classes, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional layers
for layer in base_model.layers:
layer.trainable = False
#FIRST COMPILE
model.compile(optimizer='Adam', loss=loss_function,
metrics=['accuracy'])
#FIRST FIT
model.fit(features[train], labels[train],
batch_size=batch_size,
epochs=top_epoch,
verbose=verbosity,
validation_split=validation_split)
# Generate generalization metrics
scores = model.evaluate(features[test], labels[test], verbose=1)
print(scores)
#Let all layers be trainable
for layer in model.layers:
layer.trainable = True
from tensorflow.keras.optimizers import SGD
#FIRST COMPILE
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss=loss_function,
metrics=['accuracy'])
#SECOND FIT
model.fit(features[train], labels[train],
batch_size=batch_size,
epochs=no_epochs,
verbose=verbosity,
validation_split=validation_split)
What is weird is that in the second fit, accuracy resulted from first epoch is much lower that the accuracy of the last epoch of the first fit.
RESULT
Epoch 40/40
6286/6286 [==============================] - 14s 2ms/sample - loss: 0.2370 - accuracy: 0.9211 - val_loss: 1.3579 - val_accuracy: 0.6762
874/874 [==============================] - 2s 2ms/sample - loss: 0.4122 - accuracy: 0.8764
Train on 6286 samples, validate on 1572 samples
Epoch 1/40
6286/6286 [==============================] - 60s 9ms/sample - loss: 5.9343 - accuracy: 0.5655 - val_loss: 2.4981 - val_accuracy: 0.5115
I think the weights of the second fit are not taken from the first fit
Thanks in advance!!!
I think this is the result of using a different optimizer. You used Adam in the first fit and SGD in the second fit. Try using Adam in the second fit and see if it works correctly
I solved this by removing the second compiler.

CNN Acc and Loss constant with custom image data

I am trying to train a simple CNN model for a binary classification task in Keras with a dataset of images I mined. The problem is that I am getting constant accuracy, val_accuracy and loss after a couple of epochs. Am I processing the data the wrong way? Or is it something in the model settings?
At the beginning I was using softmax as the final activation function and categorical cossentropy, I was also using the to_categorical function on the labels.
After reading up on what usually causes this to happen I decided to use sigmoid and binary_crossentropy instead and not use to_categorical. Still the problem persists and I am starting to wonder whether it's my data the problem (the two classes are too similar) or the way I am feeding the image arrays.
conkeras1 = []
pics = os.listdir("/Matrices/")
# I do this for the images of both classes, just keeping it short.
for x in range(len(pics)):
img = image.load_img("Matrices/"+pics[x])
conkeras1.append(img)
conkeras = conkeras1+conkeras2
conkeras = np.array([image.img_to_array(x) for x in conkeras]).astype("float32")
conkeras = conkeras / 255 # I also tried normalizing with a z-score with no success
yecs1 = [1]*len(conkeras1)
yecs2 = [0]*len(conkeras2)
y_train = yecs1+yecs2
y_train = np.array(y_train).astype("float32")
model = Sequential([
Conv2D(64, (3, 3), input_shape=conkeras.shape[1:], padding="same", activation="relu"),
Conv2D(32, (3, 3), activation="relu", padding="same"),
Flatten(),
Dense(500, activation="relu"),
#Dense(4096, activation="relu"),
Dense(1, activation="sigmoid")
])
model.compile(loss=keras.losses.binary_crossentropy,
optimizer=keras.optimizers.Adam(lr=0.001),
metrics=['accuracy'])
history = model.fit(conkeras, y_train,
batch_size=32,
epochs=32, shuffle=True,
verbose=1,
callbacks=[tensorboard])
The output I get is this:
975/975 [==============================] - 107s 110ms/step - loss: 8.0022 - acc: 0.4800
Epoch 2/32
975/975 [==============================] - 99s 101ms/step - loss: 8.1756 - acc: 0.4872
Epoch 3/32
975/975 [==============================] - 97s 100ms/step - loss: 8.1756 - acc: 0.4872
Epoch 4/32
975/975 [==============================] - 97s 99ms/step - loss: 8.1756 - acc: 0.4872
and these are the shapes of the traning set and labels
>>> conkeras.shape
(975, 100, 100, 3)
>>> y_train.shape
(975,)

Deep learning: Training set tends to be good and Validation set is bad

I am facing to a problem for which I have difficulties to understand why I have such behaviour.
I am trying to use a pre-trained resnet 50 (keras) model for a binary image classification, I also built a simple cnn. I have about 8k balanced RGB images of size 200x200 and I divided this set into three sub-sets (train 70%, validation 15%, test 15%).
I built a generator to feed data to my models based on keras.utils.Sequence.
The problem that I have is my models tends to learn on the training set but on validation set I have poor results on pre-trained resnet50 and on simple cnn.
I tried several things to solve this problem but Not improvement at all.
With and without Data augmentation on training set (rotation)
Images are normalised between [0,1]
With and without Regularizers
Variation of the learning rate
This is an example of results obtained:
Epoch 1/200
716/716 [==============================] - 320s 447ms/step - loss: 8.6096 - acc: 0.4728 - val_loss: 8.6140 - val_acc: 0.5335
Epoch 00001: val_loss improved from inf to 8.61396, saving model to ../models_saved/resnet_adam_best.h5
Epoch 2/200
716/716 [==============================] - 287s 401ms/step - loss: 8.1217 - acc: 0.5906 - val_loss: 10.9314 - val_acc: 0.4632
Epoch 00002: val_loss did not improve from 8.61396
Epoch 3/200
716/716 [==============================] - 249s 348ms/step - loss: 7.5357 - acc: 0.6695 - val_loss: 11.1432 - val_acc: 0.4657
Epoch 00003: val_loss did not improve from 8.61396
Epoch 4/200
716/716 [==============================] - 284s 397ms/step - loss: 7.5092 - acc: 0.6828 - val_loss: 10.0665 - val_acc: 0.5351
Epoch 00004: val_loss did not improve from 8.61396
Epoch 5/200
716/716 [==============================] - 261s 365ms/step - loss: 7.0679 - acc: 0.7102 - val_loss: 4.2205 - val_acc: 0.5351
Epoch 00005: val_loss improved from 8.61396 to 4.22050, saving model to ../models_saved/resnet_adam_best.h5
Epoch 6/200
716/716 [==============================] - 285s 398ms/step - loss: 6.9945 - acc: 0.7161 - val_loss: 10.2276 - val_acc: 0.5335
....
This is classes used to load data into my models.
class DataGenerator(keras.utils.Sequence):
def __init__(self, inputs,
labels, img_size,
input_shape,
batch_size, num_classes,
validation=False):
self.inputs = inputs
self.labels = labels
self.img_size = img_size
self.input_shape = input_shape
self.batch_size = batch_size
self.num_classes = num_classes
self.validation = validation
self.indexes = np.arange(len(self.inputs))
self.inc = 0
def __getitem__(self, index):
"""Generate one batch of data
Parameters
----------
index :the index from which batch will be taken
Returns
-------
out : a tuple that contains (inputs and labels associated)
"""
batch_inputs = np.zeros((self.batch_size, *self.input_shape))
batch_labels = np.zeros((self.batch_size, self.num_classes))
# Generate data
for i in range(self.batch_size):
# choose random index in features
if self.validation:
index = self.indexes[self.inc]
self.inc += 1
if self.inc == len(self.inputs):
self.inc = 0
else:
index = random.randint(0, len(self.inputs) - 1)
batch_inputs[i] = self.rgb_processing(self.inputs[index])
batch_labels[i] = to_categorical(self.labels[index], num_classes=self.num_classes)
return batch_inputs, batch_labels
def __len__(self):
"""Denotes the number of batches per epoch
Returns
-------
out : number of batches per epochs
"""
return int(np.floor(len(self.inputs) / self.batch_size))
def rgb_processing(self, path):
img = load_img(path)
rgb = img.get_rgb_array()
if not self.validation:
if random.choice([True, False]):
rgb = random_rotation(rgb)
return rgb/np.max(rgb)
class Models:
def __init__(self, input_shape, classes):
self.input_shape = input_shape
self.classes = classes
pass
def simpleCNN(self, optimizer):
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=self.input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(self.classes), activation='softmax'))
model.compile(loss=keras.losses.binary_crossentropy,
optimizer=optimizer,
metrics=['accuracy'])
return model
def resnet50(self, optimizer):
model = keras.applications.resnet50.ResNet50(include_top=False,
input_shape=self.input_shape,
weights='imagenet')
model.summary()
model.layers.pop()
model.summary()
for layer in model.layers:
layer.trainable = False
output = Flatten()(model.output)
#I also tried to add dropout layers here with batch normalization but it does not change results
output = Dense(len(self.classes), activation='softmax')(output)
finetuned_model = Model(inputs=model.input,
outputs=output)
finetuned_model.compile(optimizer=optimizer,
loss=keras.losses.binary_crossentropy,
metrics=['accuracy'])
return finetuned_model
This is how these functions are called:
train_batches = DataGenerator(inputs=train.X.values,
labels=train.y.values,
img_size=img_size,
input_shape=input_shape,
batch_size=batch_size,
num_classes=len(CLASSES))
validate_batches = DataGenerator(inputs=validate.X.values,
labels=validate.y.values,
img_size=img_size,
input_shape=input_shape,
batch_size=batch_size,
num_classes=len(CLASSES),
validation=True)
if model_name == "cnn":
model = models.simpleCNN(optimizer=Adam(lr=0.0001))
elif model_name == "resnet":
model = models.resnet50(optimizer=Adam(lr=0.0001))
early_stopping = EarlyStopping(patience=15)
checkpointer = ModelCheckpoint(output_name + '_best.h5', verbose=1, save_best_only=True)
history = model.fit_generator(train_batches, steps_per_epoch=num_train_steps, epochs=epochs,
callbacks=[early_stopping, checkpointer], validation_data=validate_batches,
validation_steps=num_valid_steps)
I finally found the principal element that causes this over-fitting. Since I use a pre-trained model. I was set layers as non-trainable. Thus I tried to put them as trainable and It seems that it solves the problem.
for layer in model.layers:
layer.trainable = False
My hypothesis is that my images are too far away from data used to train the model.
I also added some dropouts and batch normalization at the end of the resnet model.

Negative Loss function when using cross_val_score in sklearn/keras. Works when not using k fold

I am trying to implement a model which takes in an array of 167 categorical variables (0 or 1), and outputs an estimated value between 0 and 1. Over 300 datapoints are available.
The model works when using a basic model, below:
classifier = Sequential()
classifier.add(Dense(units = 80, kernel_initializer = 'uniform', activation = 'relu', input_dim = 167))
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.fit(X_train, y_train, batch_size = 10, epochs = 200)
y_pred = classifier.predict(X_test)
Output is similar to:
Epoch 105/200
253/253 [==============================] - 0s - loss: 0.5582 - acc: 0.0079
Epoch 106/200
253/253 [==============================] - 0s - loss: 0.5583 - acc: 0.0079
Unfortunately, when I try to use cross validation, model stops working, and loss function becomes large and negative. Code is below:
def build_classifier():
classifier = Sequential()
classifier.add(Dense(units = 80, kernel_initializer = 'uniform', activation = 'relu', input_dim = 167))
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
return classifier
classifier = KerasClassifier(build_fn = build_classifier, batch_size = 10, epochs = 100)
accuracies = cross_val_score(estimator = classifier, X=X_train, y=y_train, cv=3,n_jobs=1)
Output looks like:
Epoch 59/100
168/168 [==============================] - 0s - loss: -1106.9519 - acc: 0.0060
Epoch 60/100
168/168 [==============================] - 0s - loss: -1106.9519 - acc: 0.0060
I have toyed with different parameters, but I cannot seem to find what is causing the issue. Still learning, so any help is very appreciated.
It can happen if the data is sparse. A lot NaNs and Infs can cause this problem. If you are doing a 3 fold validation, it may be possible that in one of the folds, the data selected doesn't contain enough info. Possible solutions can be:
Change the random seed.
Increase the data set.

Categories