I am having trouble increasing accuracy and reducing loss in my CNN.
Here are some initial parameters:
batch_size = 32
image_shape = 150 # Sizes input to 150x150
EPOCHS = 250
STEPS_PER_EPOCH = 7
IMAGES_IN_CLASS_FOLDERS > 100
I use the same images for both the training and validation sets, but I preprocess the training images so that the validation images are not identical:
# Imports needed by the snippets below
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import SGD

# Image formatting - Preprocessing images into floating point tensors before being fed into the network
# Generator for our training data - Rescales the image, Flips Images Horizontally, Rotates it
train_image_generator = ImageDataGenerator(rescale=1./255, horizontal_flip=True, rotation_range=45)
# Generator for our validation data - Rescales the image
validation_image_generator = ImageDataGenerator(rescale=1./255)
# Applies scaling and resizing
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
directory=training_Images,
shuffle=True,
target_size=(image_shape,image_shape), #(100,100)
class_mode='categorical')
val_data_gen = validation_image_generator.flow_from_directory(batch_size=batch_size,
directory=validate_Images,
shuffle=True ,
target_size=(image_shape, image_shape),
class_mode='categorical')
Further, I have a Sequential model for which I have tried various parameters, such as an input of Conv2D(32) -> Conv2D(64) -> Conv2D(128), but I am currently testing this model with no success:
# Defining our model
model = tf.keras.models.Sequential([
# Old Method #
tf.keras.layers.Conv2D(8 , (2,2) , activation='LeakyReLU', input_shape=(image_shape, image_shape, 3)),
tf.keras.layers.Conv2D(16, (2,2) , activation='LeakyReLU'),
tf.keras.layers.Conv2D(32, (2,2) , activation='LeakyReLU'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(40, (2,2) , activation='LeakyReLU'),
tf.keras.layers.Conv2D(56, (2,2) , activation='LeakyReLU'),
tf.keras.layers.Conv2D(64, (2,2) , activation='LeakyReLU'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(96, (2,2) , activation='LeakyReLU'),
tf.keras.layers.Conv2D(128, (2,2), activation='LeakyReLU'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(16, activation='softmax'),
#tf.keras.layers.Dense(128, activation='relu'),
#tf.keras.layers.Dense(512, activation='relu'),
tf.keras.layers.Dense(120)
# End Old Method #
])
I have tried various Conv2D layers and various activation functions. Here is the model.compile:
model.compile(optimizer=SGD(lr=0.01),
loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy'])
I am using an SGD optimizer; I have tried Adam, but with similar results. Loss decreases over time, but it seems to reach a certain range and stagnate with no increase in accuracy.
model.fit:
history = model.fit(
train_data_gen,
steps_per_epoch= stepForEpoch,
epochs=EPOCHS,
validation_data=val_data_gen,
validation_steps=stepForEpoch
)
Can anyone offer some tips or point me in the right direction on how to increase the accuracy and reduce loss even further? Thank you!
Image of Results
Final Update
As of 06/23/2021 my model is improving significantly, not only with more EPOCHS but with more STEPS_PER_EPOCH:
Dividing the number of images by the batch size (IMAGES_IN_DATASET(20700) / BATCH_SIZE(32) = 677 STEPS_PER_EPOCH) and testing with 100 EPOCHS, I am getting an accuracy increase of about 10%, an ever-decreasing loss, and an improvement in MSE.
ACCURACY_INCREASE = 10%
MSE_IMPROVEMENT = -0.0004
ACCURACY_LOSS_IMPROVEMENT = -1.1
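For reference, here is a minimal sketch of deriving the step counts from the generators instead of hard-coding them (it assumes the train_data_gen and val_data_gen iterators from above):

import math
# One epoch should cover the whole dataset once:
# steps = number of samples / batch size, rounded up
stepForEpoch = math.ceil(train_data_gen.samples / batch_size)
validationSteps = math.ceil(val_data_gen.samples / batch_size)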
Thank you to users
@Reda El Hail
@Dr. Snoopy
To sum up the discussion in the comments: the error comes from the last layer, where no activation function is set (tf.keras.layers.Dense(120)).
For a classification task, it should be tf.keras.layers.Dense(120, activation = 'softmax').
As @Snoopy pointed out, there is no sense in using softmax in hidden layers; it should only be used in the output layer.
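A minimal sketch of what the corrected end of the model could look like (the 128-unit hidden layer is illustrative, not part of the original code; only the final 120-way output layer uses softmax):

tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),   # hidden layers keep ReLU-style activations
tf.keras.layers.Dense(120, activation='softmax') # softmax only on the output layer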
Related
I created a CNN using Tensorflow to identify pneumonia and sometimes it returns a very small number as a prediction. why is this happening?
I have attached the link for the dataset
Here is how I process and load the data.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator( rescale = 1.0/255. )
val_datagen = ImageDataGenerator( rescale = 1.0/255. )
test_datagen = ImageDataGenerator( rescale = 1.0/255. )
train_generator = train_datagen.flow_from_directory('/kaggle/input/chest-xray-pneumonia/chest_xray/chest_xray/train/',
batch_size=20,
class_mode='binary',
target_size=(350, 350))
validation_generator = val_datagen.flow_from_directory('/kaggle/input/chest-xray-pneumonia/chest_xray/chest_xray/val/',
batch_size=20,
class_mode = 'binary',
target_size = (350, 350))
test_generator = test_datagen.flow_from_directory('/kaggle/input/chest-xray-pneumonia/chest_xray/chest_xray/test/',
batch_size=20,
class_mode = 'binary',
target_size = (350, 350))
And here are the model, compile, and fit functions.
import tensorflow as tf
model = tf.keras.models.Sequential([
# Note the input shape is the desired size of the image: 350x350 with 3 color channels
tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(350, 350, 3)),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# Flatten the results to feed into a DNN
tf.keras.layers.Flatten(),
# 1024-neuron hidden layer
tf.keras.layers.Dense(1024, activation='relu'),
# Only 1 output neuron. It will contain a value from 0-1, where 0 is one class ('normal') and 1 is the other ('pneumonia')
tf.keras.layers.Dense(1, activation='sigmoid')
])
Compile the model:
from tensorflow.keras.optimizers import RMSprop
model.compile(optimizer=RMSprop(learning_rate=0.001),
loss='binary_crossentropy',
metrics = ['accuracy'])
Model fit:
history = model.fit(train_generator,
validation_data=validation_generator,
steps_per_epoch=200,
epochs=2000,
validation_steps=200,
callbacks=[callbacks],
verbose=2)
The evaluation metrics are as follows: loss: 0.2351 - accuracy: 0.9847
The prediction shows a very small number for negative pneumonia, and for positive it shows more than 0.50.
I have two questions:
Why do I get a very small number such as 2.xxxx * 10e-20?
Why can't I get the following values as null?
val_acc = history.history[ 'val_accuracy' ]
val_loss = history.history['val_loss' ]
I see no problem with your code, nor with the results you get.
This is a binary classification problem (2 classes: positive or negative pneumonia), and the output of your model is one neuron giving values between 0 and 1.
So if the output is higher than 0.5, this means positive pneumonia; when you have a very small value like 2 * 10e-20, it means negative pneumonia.
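As a small sketch of that interpretation (the 0.5 threshold is the usual convention for a sigmoid output; the generator name is the one from the question):

probs = model.predict(test_generator)       # values between 0 and 1
labels = (probs > 0.5).astype('int32')      # 1 = pneumonia, 0 = normal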
For your second question: you should not expect the accuracy and loss values to be null, simply because the model is well trained and reaches 98% accuracy on the training data.
I've been playing with Numer.ai data, mostly as a way to improve my understanding of neural nets, but I'm running into a problem that I can't seem to get past. No matter the configuration of my dense neural net, the output comes out in a tight range.
The input is 300 scaled feature columns (0 to 1) and the target is between 0 and 1 (values of 0, 0.25, 0.5, 0.75, and 1).
Here is my fully reproducible code:
import pandas as pd
# load data
training_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_training_data.csv.xz")
tournament_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_tournament_data.csv.xz")
feature_cols = training_data.columns[training_data.columns.str.startswith('feature')]
# select those columns out of the training dataset
X_train = training_data[feature_cols].to_numpy()
# select target variables
y_train = training_data.loc[:,'target'].to_numpy()
#same thing on validation data
val_data = tournament_data[tournament_data.data_type=='validation']
X_val = val_data[feature_cols]
y_val= val_data.loc[:,'target']
I've tried a number of different configurations in my neural network:
- different optimizers: Adam and SGD
- different learning rates, from 0.01 down to 0.0001
- different neuron sizes
- adding dropout (although I didn't expect this to work, because it seems to be a problem with bias, not variance)
- linear, softmax, and sigmoid final-layer activation functions (softmax produces negative values, so that was an immediate non-starter)
- different batch sizes, as small as 16 and as large as 256
- adding or removing batch normalization
- shuffling the input data
- training for different numbers of epochs
Ultimately, the results are one of two things:
1. Predicted values are all the same number, usually somewhere in the 0.45 to 0.55 area.
2. Predicted values are in a very narrow range, usually not more than 0.05 wide, e.g. 0.45 to 0.55.
I can't figure out what configuration changes I need to make to get this neural network to output predictions across a broader area of the 0 to 1 range.
from tensorflow.keras import models, layers
dropout_rate = 0.15
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1028, activation = 'relu', kernel_regularizer='l2'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
loss='binary_crossentropy',metrics=['mae', 'mse'])
history = model.fit(X_train, y_train,
validation_data=(X_val, y_val),
batch_size=64,
epochs=200,
verbose=1)
# Prediction output
predictions_df = model.predict(X_val)
predictions_df = predictions_df.reshape(len(predictions_df))
pred_max = predictions_df.max()
pred_min = predictions_df.min()
pred_range = pred_max - pred_min
print(pred_max, pred_min, pred_range)
# example output: 0.51895267 0.47968164 0.039271027
EDIT:
There is an impact on the predictions when the following changes are made (tests run with a batch size of 512 and 5 epochs; the results below are on training data only):
Loss set to mse instead of binary_crossentropy
Batch size 512 (for quick prototyping)
Epochs set to 5 (loss flattens after that)
Remove l2 regularization, and increase dropout
Set output activation -
With sigmoid -> Max: 0.60, Min: 0.36
Without activation -> Max: 0.69, Min: 0.29
With relu -> Max: 0.73, Min: 0.10
Here is the code for testing purposes -
from tensorflow.keras import models, layers
dropout_rate = 0.50
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1024, activation = 'relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1, activation='relu'))
model.compile(optimizer='adam',
loss='mse',metrics=['mae'])
history = model.fit(X_train, y_train,
#validation_data=(X_val, y_val),
batch_size=512,
epochs=5,
verbose=1)
# Prediction output
predictions_df = model.predict(X_train)
predictions_df = predictions_df.reshape(len(predictions_df))
pred_max = predictions_df.max()
pred_min = predictions_df.min()
pred_range = pred_max - pred_min
print(pred_max, pred_min, pred_range)
0.73566914 0.1063129 0.62935627
Proposed solutions
You are trying to solve a regression problem of predicting an arbitrary value between 0 and 1 (values of 0, 0.25, 0.5, 0.75, and 1), but trying to solve it as a binary classification problem using a sigmoid activation and a binary_crossentropy loss.
What you may want to try is using mse and/or removing any output activation (or better, use relu as suggested by @desertnaut). You could simply be underfitting, as suggested by @xdurch0. Try with and without the regularization as well.
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1028, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1))
model.compile(optimizer='adam', loss='mse')
Check this table to help you with how to use losses and activations for different types of problem settings.
On a side note, given the discrete nature of the values in your dependent variable y, you could also consider reframing the problem as a multi-class, single-label classification problem, if the downstream task allows it, for example:
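Below is a rough sketch of that reframing, assuming the five target levels 0, 0.25, 0.5, 0.75, 1 map cleanly onto integer classes 0 to 4 (the layer sizes are illustrative):

from tensorflow.keras import models, layers
# Map targets {0, 0.25, 0.5, 0.75, 1} to integer classes {0, 1, 2, 3, 4}
y_train_cls = (y_train * 4).astype(int)
clf = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(X_train.shape[1],)),
    layers.Dense(5, activation='softmax')
])
clf.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])
clf.fit(X_train, y_train_cls, batch_size=64, epochs=5)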
How can I improve the validation accuracy? Besides that, my test accuracy is also low. I am trying to do categorical image classification of pictures for weed detection in agricultural fields.
Dataset: the total number of images is 5539 across 12 classes, split into a 70% training set (3870 images), a 15% validation set (837 images), and a 15% test set (832 images).
#data augmentation by applying Augmentor
train_aug = Augmentor.Pipeline(source_directory="/content/dataset/train",
output_directory="/content/dataset/train")
# Defining augmentation parameters and generating 17600 samples
train_aug.flip_left_right(probability=0.4)
train_aug.flip_top_bottom(probability=0.8)
train_aug.rotate(probability=0.5, max_left_rotation=5, max_right_rotation=10)
train_aug.skew(0.4, 0.5)
train_aug.zoom(probability = 0.2, min_factor = 1.1, max_factor = 1.5)
train_aug.sample(17600)
def cnn_model():
Model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(filters = 32, kernel_size = (3,3) , activation ='relu',input_shape=(224,224,3)),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(filters = 96, kernel_size = (3,3) , activation ='relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(filters = 150, kernel_size = (3,3) , activation ='relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(filters = 64, kernel_size = (3,3) , activation ='relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Flatten() ,
tf.keras.layers.Dense(512,activation='relu') ,
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(416, activation='relu') ,
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(12,activation='softmax') ,
])
Model.summary()
return Model
Model = cnn_model()
from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint(filepath, monitor='accuracy', verbose=1, save_best_only=False, save_weights_only=True, mode='auto')
callbacks_list = [checkpoint]
Model.compile(tf.keras.optimizers.Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
History = Model.fit_generator(generator= train_data, steps_per_epoch= 3333//BATCH_SIZE , epochs= NO_OF_EPOCHS , validation_data= valid_data, validation_steps=1 ,callbacks=callbacks_list)
How can I increase my validation accuracy, given that my training accuracy is 98% and my validation accuracy is 71%?
I would adjust the number of filters to 32, then 64, 128, 256. Then I would replace the Flatten layer with
tf.keras.layers.GlobalAveragePooling2D()
I would also remove the checkpoint callback and replace with
es=tf.keras.callbacks.EarlyStopping( monitor="val_loss", patience=3,
verbose=1, restore_best_weights=True)
rlronp=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.5, patience=1,
verbose=1)
callback_list=[es, rlronp]
The early stopping callback will monitor validation loss and, if it fails to improve for 3 consecutive epochs, will halt training and restore the weights from the best epoch to the model. The ReduceLROnPlateau callback will monitor validation loss and reduce the learning rate by a factor of 0.5 if the loss does not improve at the end of an epoch. Run this, and if it does not do much better you can try to use a class_weight dictionary to compensate for the class imbalance. To calculate the dictionary, find the class that has the HIGHEST number of samples; then the weight for each class is weight_for_class = highest_number_of_samples / samples_in_class. Create a dictionary of the form {class_integer: weight}, and be careful to keep the order of the classes correct, for example:
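Here is a small sketch of that computation; counts_per_class is a hypothetical dict of class index to sample count (the keys must follow the generator's class ordering), and the numbers are illustrative:

# class index -> number of training samples (illustrative values)
counts_per_class = {0: 1200, 1: 310, 2: 520}
max_count = max(counts_per_class.values())
class_weight = {c: max_count / n for c, n in counts_per_class.items()}
# then pass it to training, e.g.:
# Model.fit(..., class_weight=class_weight)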
Also, to help with the imbalance, you can try image augmentation. If you use ImageDataGenerator.flow_from_directory to read in your data, you can use the generator to provide image augmentation like horizontal flips. If not, you can use the Keras augmentation layers directly in your model; see the documentation.
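A minimal sketch of such in-model augmentation layers (these live in tf.keras.layers in recent TensorFlow versions; in older versions they are under tf.keras.layers.experimental.preprocessing):

import tensorflow as tf
# Augmentation block placed at the start of a model; these layers are
# only active during training (model.fit), not at inference time.
augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.1)
])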
I am new to Deep Learning and just have a question if the method I am using is correct.
Also, if anybody has suggestions on what to change on the model creation it would also be appreciated.
graphs look similar
I am using a CNN model trained on candlestick charts labelled 'buy', 'sell', and 'no trade', with pictures that look similar to the attached picture (I tried a different number of bars, but results were similar).
I based the code of this post:
https://towardsdatascience.com/making-a-i-that-looks-into-trade-charts-62e7d51edcba
I have made a few changes but kept the model training code similar (small changes did not produce a significant change in accuracy).
# Input the size of your sample images
img_width, img_height = 150, 150
nb_filters1 = 32
nb_filters2 = 32
nb_filters3 = 64
conv1_size = 3
conv2_size = 2
conv3_size = 5
pool_size = 2
# We have 3 classes: buy, sell, and no trade
classes_num = 3
batch_size = 128
lr = 0.001
chanDim =3
model = Sequential()
model.add(Convolution2D(nb_filters1, conv1_size, conv1_size, border_mode ='same', input_shape=(img_height, img_width , 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(pool_size, pool_size)))
model.add(Convolution2D(nb_filters2, conv2_size, conv2_size, border_mode ="same"))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(pool_size, pool_size), dim_ordering='th'))
model.add(Convolution2D(nb_filters3, conv3_size, conv3_size, border_mode ='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(pool_size, pool_size), dim_ordering='th'))
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(classes_num, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.rmsprop(),
metrics=['accuracy'])
train_datagen = ImageDataGenerator(
#rescale=1. / 255,
horizontal_flip=False)
test_datagen = ImageDataGenerator(
#rescale=1. / 255,
horizontal_flip=False)
train_generator = train_datagen.flow_from_directory(
train_data_dir,
target_size=(img_height, img_width),
#shuffle=True,
batch_size=batch_size,
class_mode='categorical'
)
validation_generator = test_datagen.flow_from_directory(
validation_data_dir,
target_size=(img_height, img_width),
batch_size=batch_size,
#shuffle=True,
class_mode='categorical')
With this, I get an accuracy of 38% and if I remove the 'no trade' option, I get an accuracy of 52%.
Accuracy before and after training does not differ drastically, which is why I am assuming the settings are not quite right.
When predicting, the results always lean to one side (52% buy, 48% sell) and don't change much after a few hundred images.
Any suggestions?
I assume your three options are "buy", "sell", and "no trade". The reason it jumps to 52% is that it's differentiating between 2 options instead of 3 (chance level is 50% for two classes versus roughly 33% for three).
With regard to the lower-than-expected accuracy, I recommend changing the optimizer to Adam. Also consider moving the dropout into the middle of the network; I have found success adding a Dropout(0.2) after each pooling layer. This way, nodes are dropped throughout the network, which allows for more "diversity" in the node paths taken.
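A rough sketch of those suggestions, assuming a three-class categorical setup like the one in the question (the layer sizes are illustrative, not the original tutorial's):

from tensorflow.keras import layers, models, optimizers
sketch = models.Sequential([
    layers.Conv2D(32, (3, 3), padding='same', activation='relu',
                  input_shape=(150, 150, 3)),
    layers.MaxPooling2D(2, 2),
    layers.Dropout(0.2),                    # dropout after each pooling layer
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(3, activation='softmax')   # buy / sell / no trade
])
sketch.compile(loss='categorical_crossentropy',
               optimizer=optimizers.Adam(), # Adam instead of RMSprop
               metrics=['accuracy'])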
This is not what I had expected!
I have trained a CNN on SVHN. The accuracy is close to 0.93, and overall it works really well when tested on single-number images. So if I test the model with images that contain a single number, such as the following:
it works great, with the expected class probability close to 1. But if I supply the model with random images, like a house or a lion, it will still predict a class with a probability close to 1. I cannot understand the reason for this; it should have predicted very low probabilities for each class.
Here is how I created the network.
import math
import tensorflow.keras as keras
model = keras.Sequential()
# First Conv Layer
model.add(keras.layers.Conv2D(filters = 96, kernel_size = (11,11), strides = (4,4), padding = "same", input_shape=(227,227,3)))
model.add(keras.layers.Activation("relu"))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.MaxPooling2D(pool_size = (3,3), strides = (2,2), padding="same"))
# .. More Convolution Layer ...
# .. SOME Fully Connected Layers ..
# Final Fully Connected Layer
model.add(keras.layers.Dense(10))
model.add(keras.layers.Activation("softmax"))
model.compile(loss="categorical_crossentropy", optimizer=keras.optimizers.RMSprop(lr=0.0001), metrics=['accuracy'])
data_generator = keras.preprocessing.image.ImageDataGenerator(rescale = 1./255)
train_generator = data_generator.flow_from_directory(
'train',
target_size=(227, 227),
batch_size=batch_size,
color_mode='rgb',
class_mode='categorical'
)
model.fit_generator(
train_generator,
epochs = 12,
steps_per_epoch = math.ceil(num_train_samples / batch_size),
verbose = 2
)
As could also be seen from the code I have shared above, I have used:
Loss function as categorical_crossentropy
Final layer activation function as softmax
There are 10 classes, 0 to 9. Would I also need an 11th class that contains some random images? That sounds very weird. Did I choose incorrect loss / activation functions?
It may help to train your network on the number images and also include some random other images (of houses or lions), all labelled as "not numbers". A convolutional neural network does not look at your entire image at once, but at bits of it at a time, so it can easily find sub-shapes that resemble numbers in unrelated images.
Your loss and activation are fine.
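A minimal sketch of that idea, reusing the model from the question (it assumes an extra folder of random "not a number" images is added to the 'train' directory, so flow_from_directory picks up 11 classes):

# Final Fully Connected Layer: 10 digit classes + 1 "not a number" class
model.add(keras.layers.Dense(11))
model.add(keras.layers.Activation("softmax"))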