I'm training a neural network that reaches approximately 93% accuracy on the training data but 0% accuracy on the validation data. My first thought was overfitting, but the model isn't saved between training runs, and I get these results in the first epoch. I'm using Keras in Python with the following model code:
model = Sequential(
[
Conv1D(320, 8, input_shape=(560, 560), activation="relu"),
# Conv1D(320, 8, activation="relu"),
# Conv1D(320, 8, activation="relu"),
# Dense(750, activation="relu"),
# Dropout(0.6),
Dense(1500, activation="relu"),
Dropout(0.6),
Dense(750, activation="relu"),
Dropout(0.6),
GlobalMaxPooling1D(keepdims=True),
Dense(1, activation='softmax')
]
)
model.compile(optimizer=Adam(learning_rate=0.00001), loss="binary_crossentropy", metrics=['accuracy'])
earlystopping = callbacks.EarlyStopping(monitor="val_accuracy",
mode="max", patience=2,
restore_best_weights=True)
model1 = model.fit(x=training_x, y=training_y, batch_size=150, epochs=5, shuffle=True, verbose=1, callbacks=[earlystopping], validation_data=(val_x, val_y))
The results I'm getting look like this:
Epoch 1/5
167/167 [==============================] - 1266s 8s/step - loss: 6.4154 - accuracy: 0.9262 - val_loss: 0.0054 - val_accuracy: 0.0000e+00
I've tried changing almost all of the hyperparameters and changing the model's architecture but I keep getting similar results. Does this have anything to do with the data? The data I'm using is a 3d NumPy array containing pixel data from a bunch of images. Any help here would be greatly appreciated.
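For binary classification with a single output unit, you need to use activation='sigmoid' instead of 'softmax': a softmax over one unit always outputs 1.0, so the model predicts the same class for every sample. You can also try optimizers.RMSprop(learning_rate=1e-4) (the argument is called lr in older Keras versions).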
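To see why the activation matters for a Dense(1) output, here is a minimal sketch in plain Python (no Keras; the softmax and sigmoid helpers are written out by hand for illustration). It shows that a softmax over a single logit is always 1.0, whatever the logit is, while a sigmoid still varies with it.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function, maps a single logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# With one output unit, softmax normalizes the logit against itself,
# so the result is 1.0 no matter what the network computed.
for z in (-5.0, 0.0, 5.0):
    print(z, softmax([z]), sigmoid(z))
```

If the output is always 1.0, accuracy simply equals the fraction of samples labelled 1. That would be consistent with the numbers in the question if roughly 93% of the training labels are 1 and the validation labels are all 0, but this is only a guess about the data.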
I have written artificial neural network code to solve the Kaggle Dogs vs. Cats kernel problem, but during training it shows loss=nan and bad accuracy. My code can be found at https://www.kaggle.com/dilipkumar2k6/dogs-vs-cats-with-new-kernel/notebook
The details of the error are as follows:
from tensorflow import keras
# First apply Artificial neural network (ANN)
ann = keras.Sequential([
keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE, 3)), # Flatten 3D to 1D
keras.layers.Dense(3000, activation='relu'), # more hidden layer gives better perf
keras.layers.Dense(1000, activation='relu'), # more hidden layer gives better perf
keras.layers.Dense(100, activation='relu'), # more hidden layer gives better perf
keras.layers.Dense(2, activation='sigmoid')
])
ann.compile(optimizer='SGD', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
ann.fit(train_X, train_y, epochs=10)
Error
Epoch 1/10
438/438 [==============================] - 2s 2ms/step - loss: nan - accuracy: 5.0000e-04
Epoch 2/10
438/438 [==============================] - 1s 2ms/step - loss: nan - accuracy: 0.0000e+00
Using a sigmoid activation function in your output layer seems a bit strange to me when using sparse_categorical_crossentropy (although it could also work). Anyway, I think you should consider changing this line:
keras.layers.Dense(2, activation='sigmoid')
to
keras.layers.Dense(1, activation='sigmoid')
and use tf.keras.losses.BinaryCrossentropy().
Or change your activation function to softmax and leave the rest as it is.
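The two options above (one sigmoid unit with binary cross-entropy, or two softmax units with sparse categorical cross-entropy) describe the same model for a two-class problem. Here is a small sketch in plain Python, with hand-written loss helpers that are only illustrative and not the Keras implementations, showing that both give the same per-sample loss when the two softmax logits are (0, z) and the sigmoid logit is z.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_crossentropy(label, p):
    """Loss for one sample with a single sigmoid output p = P(class 1)."""
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample with one softmax probability per class.
    The label is an integer class index, not a one-hot vector."""
    return -math.log(probs[label])

z = 1.3  # arbitrary logit chosen for the illustration
for label in (0, 1):
    loss_binary = binary_crossentropy(label, sigmoid(z))
    loss_sparse = sparse_categorical_crossentropy(label, softmax([0.0, z]))
    print(label, loss_binary, loss_sparse)
```

So the choice is mostly about keeping the pieces consistent: Dense(1) + sigmoid + binary cross-entropy, or Dense(2) + softmax + sparse categorical cross-entropy with integer labels.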
You should also consider redesigning your model and using at least one tf.keras.layers.Conv2D layer before flattening the data. Here is a working example:
import tensorflow_datasets as tfds
import tensorflow as tf
ds, ds_info = tfds.load('cats_vs_dogs', split='train', with_info=True)
normalization_layer = tf.keras.layers.Rescaling(1./255)
def resize_inputs(data):
    images, labels = data['image'], data['label']
    images = tf.image.resize(normalization_layer(images),[64, 64], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    return images, labels
ds = ds.map(resize_inputs).batch(64)
ann = tf.keras.Sequential([
tf.keras.layers.Conv2D(64, kernel_size=3, input_shape=(64, 64, 3)),
tf.keras.layers.Flatten(), # Flatten 3D to 1D
tf.keras.layers.Dense(200, activation='relu'), # more hidden layer gives better perf
tf.keras.layers.Dense(100, activation='relu'), # more hidden layer gives better perf
tf.keras.layers.Dense(50, activation='relu'), # more hidden layer gives better perf
tf.keras.layers.Dense(1, activation='sigmoid')
])
ann.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
ann.fit(ds, epochs=10)
Epoch 1/10
364/364 [==============================] - 58s 140ms/step - loss: 0.8692 - accuracy: 0.5902
Epoch 2/10
364/364 [==============================] - 51s 141ms/step - loss: 0.6155 - accuracy: 0.6559
Epoch 3/10
364/364 [==============================] - 51s 141ms/step - loss: 0.5708 - accuracy: 0.7009
Epoch 4/10
364/364 [==============================] - 51s 140ms/step - loss: 0.5447 - accuracy: 0.7262
...
You can experiment with this example and find out which combination of activation function, loss function, and number of output nodes works best for you.
I have a 9-class dataset with 7000 images. I train on it with MobileNetV2 and ImageDataGenerator, which gives 82% validation accuracy. But when I predict my test images, it almost always predicts the wrong class. I have no idea what is wrong with it. Here is my code:
My ImageGenerator:
image_gen = ImageDataGenerator(rotation_range = 20,
width_shift_range=0.12,
height_shift_range=0.12,
shear_range=0.1,
zoom_range = 0.06,
horizontal_flip=True,
fill_mode='nearest',
rescale=1./255)
My model:
Model = Sequential()
Model.add(Conv2D(filters=32,kernel_size=(3,3),input_shape=image_shape,activation='relu'))
Model.add(MaxPool2D(pool_size=(2,2)))
Model.add(Conv2D(filters=64,kernel_size=(3,3),input_shape=image_shape,activation='relu'))
Model.add(MaxPool2D(pool_size=(2,2)))
Model.add(Conv2D(filters=64,kernel_size=(3,3),input_shape=image_shape,activation='relu'))
Model.add(MaxPool2D(pool_size=(2,2)))
Model.add(Conv2D(filters=64,kernel_size=(3,3),input_shape=image_shape,activation='relu'))
Model.add(MaxPool2D(pool_size=(2,2)))
Model.add(Flatten())
Model.add(Dense(256,activation='relu'))
Model.add(Dense(9,activation='softmax'))
MobilenetV2:
height=224
width=224
img_shape=(height, width, 3)
dropout=.3
lr=.001
class_count=9 # number of classes
img_shape=(height, width, 3)
base_model=tf.keras.applications.MobileNetV2( include_top=False, input_shape=img_shape, pooling='max', weights='imagenet')
x=base_model.output
x=keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001 )(x)
x = Dense(512, kernel_regularizer = regularizers.l2(l = 0.016),activity_regularizer=regularizers.l1(0.006),
bias_regularizer=regularizers.l1(0.006) ,activation='relu', kernel_initializer= tf.keras.initializers.GlorotUniform(seed=123))(x)
x=Dropout(rate=dropout, seed=123)(x)
output=Dense(class_count, activation='softmax',kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123))(x)
Model = keras.models.Model(inputs=base_model.input, outputs=output)
Model.compile( loss='categorical_crossentropy', metrics=['accuracy'],optimizer='Adamax')
My Rlronp:
rlronp=tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=1, verbose=1, mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)
My train_image_gen:
train_image_gen = image_gen.flow_from_directory(train_path,
target_size=image_shape[:2],
color_mode='rgb',
batch_size=batch_size,
class_mode='categorical')
My test_image_gen:
test_image_gen = image_gen.flow_from_directory(test_path,
target_size=image_shape[:2],
color_mode='rgb',
batch_size=batch_size,
class_mode='categorical',shuffle=False)
My earlystop:
early_stop = EarlyStopping(monitor='val_loss',patience=4)
My Model fit:
results = Model.fit(train_image_gen,epochs=20,
validation_data=test_image_gen,callbacks=[rlronp,early_stop],class_weight=class_weight
)
Training and accuracy:
Epoch 20/20 200/200 [==============================] - 529s 3s/step -
loss: 0.3995 - accuracy: 0.9925 - val_loss: 0.8637 - val_accuracy: 0.8258
My problem is that when I predict an image from the test set, it predicts the wrong class about 90% of the time.
For example, here the prediction should be the 3rd class, but the maximum is on the 2nd class.
array([[0.08064549, 0.04599327, 0.27055973, 0.05219262, 0.055945 ,
0.25723988, 0.07608379, 0.10404343, 0.05729679]], dtype=float32)
I tried collecting my own dataset with 156 classes and 2.5k images, but it was even worse.
My loss on 20 epochs:
accuracy: 0.9925; val_accuracy: 0.8258
Clearly the model is overfitted.
Try using regularization techniques such as L1, L2, or Dropout; they often help.
Try to collect more data (or use data augmentation).
Or try other neural network architectures.
The best approach is to plot val_loss against loss:
r = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=15)
import matplotlib.pyplot as plt
plt.plot(r.history['loss'], label='loss')
plt.plot(r.history['val_loss'], label='val_loss')
plt.legend()
Then find the point where the loss and val_loss curves meet and start to diverge (val_loss flattens or rises while loss keeps falling), note the number of epochs at that point (say x), and train the model for only x epochs.
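Reading the epoch off the plot can also be done in code. A minimal sketch, assuming the histories look like the lists in r.history; the numbers below are made up for illustration, and best_epoch is a hypothetical helper, not a Keras function. It picks the epoch with the lowest val_loss, which is one reasonable reading of "the point where the curves part ways".

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss.
    On ties, the earliest epoch wins."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1

# Hypothetical training curves shaped like r.history
history = {
    "loss":     [0.90, 0.60, 0.45, 0.35, 0.28, 0.22],
    "val_loss": [0.95, 0.70, 0.62, 0.60, 0.66, 0.75],
}

x = best_epoch(history["val_loss"])
print("train for", x, "epochs")
```

In Keras, the EarlyStopping callback with restore_best_weights=True does much the same thing automatically during training.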
Hope you will find this useful.
The model is overfitted. Try adding a dropout layer; I think it will help.
Model.add(Dropout(0.2))
I am training a model using TensorFlow. I was getting weird results when looking at my model performance. I built two models to classify images, one using a CNN and the other using a traditional ANN. Below is the code setup for each of them.
#CNN model
model = Sequential()
model.add(Reshape((20, 60, 3)))
#model.add(Conv2D(128, (5, 5), (2, 2), activation='elu'))
#model.add(Conv2D(64, (4, 4), (2, 2), activation='elu'))
#model.add(Flatten())
#model.add(Dense(1, activation = 'elu'))
#model.add(Dense(25, activation = 'elu'))
#model.add(Dense(10, activation = 'elu'))
#model.add(Dense(1))
opt = keras.optimizers.RMSprop(lr=0.0009, decay=1e-6)
model.compile(Adam(lr = 0.0001), loss='mse', metrics = ['mae'])
history = model.fit(X_train, y_train, epochs = 20, validation_data=(X_val, y_val), batch_size= 32)
#ANN model
model = Sequential()
model.add(Reshape((20, 60, 3)))
#model.add(Flatten())
#model.add(Dense(10, activation = 'elu'))
#model.add(Dense(1))
opt = keras.optimizers.RMSprop(lr=0.0009, decay=1e-6)
model.compile(Adam(lr = 0.0001), loss='mse', metrics = ['mae'])
history = model.fit(X_train, y_train, epochs = 20, validation_data=(X_val, y_val), batch_size= 32)
However, the problem is that I am getting nearly identical loss and mean absolute error metrics with both of these models, when I expected the MAE to be MUCH higher for the 2nd model. Does anyone know why this is happening? Could it be something wrong with my input data?
P.S. This network is trying to do regression to predict the steering angle for a self-driving RC car from an image.
EDIT:
Here is the ending error with the CNN:
Epoch 20/20 113/113 [==============================] - 1s 5ms/step - loss: 0.0382 - mae: 0.1582 - val_loss: 0.0454 - val_mae: 0.1727 dict_keys(['loss', 'mae', 'val_loss', 'val_mae'])
Here is the ending error with the ANN:
Epoch 20/20 113/113 [==============================] - 0s 3ms/step - loss: 0.0789 - mae: 0.2187 - val_loss: 0.0854 - val_mae: 0.2300 dict_keys(['loss', 'mae', 'val_loss', 'val_mae'])
I think the issue comes from your training data. Try using another dataset and check the results again.
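One quick way to check whether the data is the limiting factor is to compare both models against a trivial baseline that always predicts the mean training target. This is only a sketch in plain Python; the steering angles below are made up, and constant_baseline_mae is a hypothetical helper. If both val_mae values sit close to this baseline, neither model is learning much from the images.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def constant_baseline_mae(y_train, y_val):
    """MAE on y_val of a 'model' that always predicts the mean of y_train."""
    mean = sum(y_train) / len(y_train)
    return mean_absolute_error(y_val, [mean] * len(y_val))

# Hypothetical steering angles, for illustration only
y_train = [-0.4, -0.1, 0.0, 0.1, 0.4]
y_val = [-0.2, 0.0, 0.2]

baseline = constant_baseline_mae(y_train, y_val)
print("baseline MAE:", baseline)
```

With the real y_train and y_val, the baseline can be compared directly with the val_mae values printed at the end of training.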
I'm performing multi-class classification with three class labels in Keras. During training, both the training and validation losses were decreasing and accuracies were increasing. After training, I tested out the model on the training set as a sanity check and there seems to be a huge discrepancy between model.evaluate and model.predict. I did find some solutions that seemed to indicate this was an issue with BatchNorm and Dropout layers, but that shouldn't result in such a huge difference. The relevant code is as shown below.
model=Sequential()
model.add(Conv2D(32, (3, 3), padding="same",input_shape=input_shape))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Dropout(0.25))
.
.
model.add(Dense(n_classes))
model.add(Activation("softmax"))
optimizer=Adam()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['categorical_accuracy'])
datagen = ImageDataGenerator(horizontal_flip=True, fill_mode='nearest')
train_datagen = datagen.flow(X_train, y_train, batch_size=batch_size)
val_datagen = ImageDataGenerator().flow(X_val, y_val, batch_size=batch_size)
history=model.fit(train_datagen, steps_per_epoch=math.ceil(nb_train_samples/batch_size), verbose=2, epochs=50, validation_data=val_datagen, validation_steps=math.ceil(nb_validation_samples/batch_size), class_weight=d_class_weights)
print('model.evaluate accuracy: ', model.evaluate(X_train, y_train, batch_size=batch_size)[1])
test_pred = model.predict(ImageDataGenerator().flow(X_train, y=None, batch_size=batch_size), steps=math.ceil(nb_train_samples/batch_size))
test_result=np.array(test_pred)
test_result = np.zeros(test_result.shape)
test_result[np.arange(len(test_pred)), test_pred.argmax(1)] = 1
total=0
count=0
for i in range(test_result.shape[0]):
    total+=1
    count+=(test_result[i]==y_train[i]).all()
print('model.predict accuracy: ', count/total)
The output I get is as follows:
66/66 [==============================] - 12s 177ms/step - loss: 0.0010 - categorical_accuracy: 1.0000
model.evaluate accuracy: 1.0
model.predict accuracy: 0.42138063279002874
I've been trying to solve this for a while now and have failed to find anything. I'm already using categorical_crossentropy, categorical_accuracy, and softmax activation in the last layer, so I have no idea what's wrong. Any help would be greatly appreciated!
I finally found the solution. It turns out that I was passing only X_train to the generator used by predict, and the shuffle parameter of flow is True by default, so the predictions came back in a different order from y_train and didn't correspond to the ground truth. Setting shuffle=False solved the problem.
test_pred = model.predict(ImageDataGenerator().flow(X_train, y=None, batch_size=batch_size, shuffle=False), steps=math.ceil(nb_train_samples/batch_size))
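The effect is easy to reproduce without Keras. In this sketch (plain Python, made-up data, and a stand-in perfect_model function that is purely hypothetical), a model that is always right still scores poorly when its inputs are shuffled but the labels it is compared against are not.

```python
import random

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: each sample equals its own label (three balanced classes),
# so a "perfect model" just returns its input.
labels = [i % 3 for i in range(300)]
samples = list(labels)

def perfect_model(batch):
    """Stand-in for model.predict followed by argmax; always correct."""
    return list(batch)

# shuffle=False: predictions come back in the same order as the labels
preds_in_order = perfect_model(samples)

# shuffle=True: the generator feeds samples in random order,
# but the comparison still uses the original label order
shuffled = samples[:]
random.Random(0).shuffle(shuffled)
preds_shuffled = perfect_model(shuffled)

acc_in_order = accuracy(labels, preds_in_order)
acc_shuffled = accuracy(labels, preds_shuffled)
print(acc_in_order, acc_shuffled)
```

The shuffled score drops towards chance level even though every individual prediction is correct, which is the same pattern as the gap between model.evaluate and the manual model.predict accuracy in the question.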
I am trying to train a simple CNN model for a binary classification task in Keras with a dataset of images I mined. The problem is that I am getting constant accuracy, val_accuracy and loss after a couple of epochs. Am I processing the data the wrong way? Or is it something in the model settings?
At the beginning I was using softmax as the final activation function with categorical crossentropy, and I was also using the to_categorical function on the labels.
After reading up on what usually causes this, I decided to use sigmoid and binary_crossentropy instead and to stop using to_categorical. The problem still persists, and I am starting to wonder whether the problem is my data (the two classes are too similar) or the way I am feeding the image arrays.
conkeras1 = []
pics = os.listdir("/Matrices/")
# I do this for the images of both classes, just keeping it short.
for x in range(len(pics)):
    img = image.load_img("Matrices/"+pics[x])
    conkeras1.append(img)
conkeras = conkeras1+conkeras2
conkeras = np.array([image.img_to_array(x) for x in conkeras]).astype("float32")
conkeras = conkeras / 255 # I also tried normalizing with a z-score with no success
yecs1 = [1]*len(conkeras1)
yecs2 = [0]*len(conkeras2)
y_train = yecs1+yecs2
y_train = np.array(y_train).astype("float32")
model = Sequential([
Conv2D(64, (3, 3), input_shape=conkeras.shape[1:], padding="same", activation="relu"),
Conv2D(32, (3, 3), activation="relu", padding="same"),
Flatten(),
Dense(500, activation="relu"),
#Dense(4096, activation="relu"),
Dense(1, activation="sigmoid")
])
model.compile(loss=keras.losses.binary_crossentropy,
optimizer=keras.optimizers.Adam(lr=0.001),
metrics=['accuracy'])
history = model.fit(conkeras, y_train,
batch_size=32,
epochs=32, shuffle=True,
verbose=1,
callbacks=[tensorboard])
The output I get is this:
975/975 [==============================] - 107s 110ms/step - loss: 8.0022 - acc: 0.4800
Epoch 2/32
975/975 [==============================] - 99s 101ms/step - loss: 8.1756 - acc: 0.4872
Epoch 3/32
975/975 [==============================] - 97s 100ms/step - loss: 8.1756 - acc: 0.4872
Epoch 4/32
975/975 [==============================] - 97s 99ms/step - loss: 8.1756 - acc: 0.4872
and these are the shapes of the training set and labels:
>>> conkeras.shape
(975, 100, 100, 3)
>>> y_train.shape
(975,)