My model's validation loss fluctuates wildly and does not converge.
I am doing an image recognition project with my three dogs, i.e. classifying which dog appears in the image. Two of the dogs are very similar and the third is very different. I took separate 10-minute videos of each dog and extracted one frame per second as an image. My dataset consists of about 1800 photos, 600 of each dog.
This block of code is responsible for augmenting and creating the data to feed the model.
randomize = np.arange(len(imArr))  # imArr is the numpy array of all the images
np.random.shuffle(randomize)       # Shuffle the images and labels together
imArr = imArr[randomize]
imLab = imLab[randomize]           # imLab is the array of labels of the images
lab = to_categorical(imLab, 3)
gen = ImageDataGenerator(zoom_range=0.2, horizontal_flip=True, vertical_flip=True, validation_split=0.25)
train_gen = gen.flow(imArr, lab, batch_size=64, subset='training')
test_gen = gen.flow(imArr, lab, batch_size=64, subset='validation')
This picture is the result of the model below.
model = Sequential()
model.add(Conv2D(16, (11, 11),strides = 1, input_shape=(imgSize,imgSize,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (5, 5),strides = 1))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(BatchNormalization(axis=-1))
model.add(Dropout(0.3))
#Fully connected layer
model.add(Dense(256))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(3))
model.add(Activation('softmax'))
sgd = SGD(lr=0.004)
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
batch_size = 64
epochs = 100
model.fit_generator(train_gen, steps_per_epoch=(len(train_gen)), epochs=epochs, validation_data=test_gen, validation_steps=len(test_gen),shuffle = True)
Things I have tried.
High/low Learning rate ( 0.01 -> 0.0001)
Increase Dropout to 0.5 in both Dense layers
Increase/Decrease size of both Dense Layers ( 128 min -> 4048 max)
Increased number of CNN layers
Introduced Momentum
Increased/Decreased Batch Size
Things I have not tried
I have not used any other loss or metric
I have not used any other optimiser.
Have not adjusted any parameters of the CNN layers
It seems that there is some source of randomness, or simply too many parameters, in my model. I am aware that it is currently overfitting, but I don't think that alone explains this much volatility.
I am not too worried about the performance of the model; I would like to achieve about 70% accuracy. All I want to do now is stabilise the validation accuracy and get the model to converge.
Note:
At some epochs, the training loss is very low (< 0.1) but the validation loss is very high (> 3).
The videos are taken against different backgrounds, but roughly the same amount on each background for each dog.
Some images are a bit blurry.
Change the optimizer to Adam; it is definitely better here. In your code you are already using it, but with default parameters: you create an SGD optimizer, yet in the compile line you pass an Adam with no arguments. Play with the actual parameters of your optimizer.
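A minimal sketch of passing explicit parameters (the learning rate below is just an arbitrary starting point to tune from, not a recommended value):
from keras.optimizers import Adam

# Explicitly configured Adam instead of the default-constructed one
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['accuracy'])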
I encourage you to take out the dropout first and see what happens; then, if you manage to overfit, start with a low dropout rate and work your way up.
It might also be that some of your validation samples are simply very hard to classify and drive the loss up. Try taking out the shuffle in the validation set and watch for any periodicities, to figure out whether particular validation samples are hard to classify.
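A minimal sketch of that, assuming the ImageDataGenerator.flow split from the question is kept:
test_gen = gen.flow(imArr, lab, batch_size=64, subset='validation', shuffle=False)  # keep validation order fixed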
Hope it helps!
I see you have tried a lot of different things. A few suggestions:
You use large kernels in your Conv2D layers, e.g. 11x11 and 5x5. If your image dimensions are not very big, you should definitely go for smaller kernels such as 3x3 (see the sketch after this list).
Try different optimizers; try Adam with a varying learning rate if you haven't.
Otherwise, I don't see many problems. Maybe you just need more data for the network to learn better.
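A sketch of what the first convolutional layers might look like with smaller kernels (filter counts kept from the question's model; this is an illustration, not an exact prescription):
model = Sequential()
model.add(Conv2D(16, (3, 3), strides=1, input_shape=(imgSize, imgSize, 3)))  # 3x3 instead of 11x11
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (3, 3), strides=1))  # 3x3 instead of 5x5
model.add(Activation('relu'))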
Related
I'm currently working on online signature verification. The dataset has a variable shape of (x, 7) where x is the number of points a person used to sign their signature. I have the following model:
model = Sequential()
#CNN
model.add(Conv1D(filters=64, kernel_size=3, activation='sigmoid', input_shape=(None, 7)))
model.add(MaxPooling1D(pool_size=3))
model.add(Conv1D(filters=64, kernel_size=2, activation='sigmoid'))
#RNN
model.add(Masking(mask_value=0.0))
model.add(LSTM(8))
model.add(Dense(2, activation='softmax'))
opt = Adam(lr=0.0001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
print(model.fit(x_train, y_train, epochs=100, verbose=2, batch_size=50))
score, accuracy = model.evaluate(x_test,y_test, verbose=2)
print(score, accuracy)
I know it may not be the best model, but this is the first time I'm building a neural network. I have to use a CNN and an RNN as it is required for my honours project. At the moment, I achieve 0.5142 as the highest training accuracy and 0.54 testing accuracy. I have tried increasing the number of epochs, changing the activation function, adding more layers, moving the layers around, changing the learning rate and changing the optimizer.
Please share some advice on changing my model or dataset. Any help is much appreciated.
For CNN-RNN, some promising things to try:
Conv1D layers: activation='relu', kernel_initializer='he_normal'
LSTM layer: activation='tanh', and recurrent_dropout=.1, .2, .3
Optimizer: Nadam, lr=2e-4 (Nadam may significantly outperform all other optimizers for RNNs); a compile sketch follows the SqueezeExcite code below
batch_size: lower it. Unless you have 200+ batches in total, set batch_size=32; a lower batch size better exploits the stochastic mechanism of the optimizer and can improve generalization
Dropout: right after second Conv1D, with a rate .1, .2 - or, after first Conv1D, with a rate .25, .3, but only if you use SqueezeExcite (see below), else MaxPooling won't work as well
SqueezeExcite: shown to enhance all CNN performance across a large variety of tasks; Keras implementation you can use below
BatchNormalization: while your model isn't large, it's still deep, and may benefit from one BN layer right after second Conv1D
L2 weight decay: on first Conv1D, to prevent it from memorizing the input; try 1e-5, 1e-4, e.g. kernel_regularizer=l2(1e-4) # from keras.regularizers import l2
Preprocessing: make sure all data is normalized (or standardized if time-series), and batches are shuffled each epoch
from keras.layers import GlobalAveragePooling1D, Reshape, Dense, multiply

def SqueezeExcite(_input):
    # _keras_shape is available on Keras tensors in older Keras versions
    filters = _input._keras_shape[-1]
    se = GlobalAveragePooling1D()(_input)
    se = Reshape((1, filters))(se)
    se = Dense(filters // 16, activation='relu',
               kernel_initializer='he_normal', use_bias=False)(se)
    se = Dense(filters, activation='sigmoid',
               kernel_initializer='he_normal', use_bias=False)(se)
    return multiply([_input, se])
# Example usage
x = Conv1D(filters=64, kernel_size=4, activation='relu', kernel_initializer='he_normal')(x)
x = SqueezeExcite(x) # place after EACH Conv1D
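For the other points, a rough sketch of how they might be wired into the question's Sequential model (the hyperparameter values are just the ranges listed above, not tested settings; SqueezeExcite itself would require switching to the functional API; standard keras.layers imports are omitted, matching the style of the question's code):
from keras.optimizers import Nadam
from keras.regularizers import l2

model = Sequential()
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', kernel_initializer='he_normal',
                 kernel_regularizer=l2(1e-4), input_shape=(None, 7)))  # L2 weight decay on the first Conv1D
model.add(MaxPooling1D(pool_size=3))
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', kernel_initializer='he_normal'))
model.add(Dropout(0.2))              # dropout right after the second Conv1D
model.add(BatchNormalization())      # one BN layer after the second Conv1D
model.add(Masking(mask_value=0.0))
model.add(LSTM(8, activation='tanh', recurrent_dropout=0.2))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer=Nadam(lr=2e-4), loss='categorical_crossentropy', metrics=['accuracy'])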
I'm trying to get a convolutional neural network up and running to play the old NES game Ice Climber. Right now I use OpenCV to capture the screen as input, and the output is the Ice Climber's action, such as walk left, walk right, or jump. The problem I'm running into is that, judging by validation, the trained model either doesn't learn properly or overfits.
I've tried reducing the number of outputs by excluding the jump command. I've tried different batch sizes, epoch counts, and different test data. I've also tried changing the optimizer and the dimensions, but nothing had a significant impact.
Here is the code for capturing the screen and training my model on that data. My training data is 900 sequential screen captures with the respective inputs I pressed while playing; I have around 10k sequences saved from playing for the training data.
def screen_record():
    global last_time
    printscreen = np.array(ImageGrab.grab(bbox=(0, 130, 800, 640)))
    last_time = time.time()
    processed = greycode(printscreen)
    processed = cv2.resize(processed, (80, 60))
    cv2.imshow('AIBOX', processed)
    cv2.moveWindow("AIBOX", 500, 150)
    # training.append([processed, check_input()])
    processed = np.array(processed).reshape(-1, 80, 60, 1)
    result = AI.predict(processed, batch_size=1)
    print(result)
    AI_Control_Access(result)

def greycode(screen):
    greymap = cv2.cvtColor(screen, cv2.COLOR_BGR2GRAY)
    greymap = cv2.Canny(greymap, threshold1=200, threshold2=300)
    return greymap
def network_train():
    train_data = np.load('ICE_Train5.npy')
    train = train_data[::7]
    test = train_data[-3::]
    x_train = np.array([i[0] for i in train]).reshape(-1, 80, 60, 1)
    x_test = np.array([i[0] for i in test]).reshape(-1, 80, 60, 1)
    y_train = np.asarray([i[1] for i in train])
    y_test = np.asarray([i[1] for i in test])

    model = Sequential()
    model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(80, 60, 1)))
    model.add(Convolution2D(16, (5, 5), activation='relu', strides=4))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))

    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer='sgd')

    model.fit(x_train, y_train, batch_size=450, epochs=50, verbose=1, callbacks=None,
              validation_split=0, validation_data=None, shuffle=False,
              class_weight=None, sample_weight=None, initial_epoch=0,
              steps_per_epoch=None, validation_steps=None)
When I run it against the test data for validation, the highest accuracy I could get was around 16%. Even when I use that model to actually play the game, it always predicts the same button press, so I think the model is either overfitting or not learning at all. Since this is my first time using a convolutional network, I'm unsure how to tweak it to be more responsive to training.
The general setup sounds more like a good fit for reinforcement learning.
If you want to stick with the supervised learning setup, you should first check whether your different classes have the same number of training examples (a quick check is sketched below). If that's the case, you could experiment with the learning rate, more regularization (dropout), the network architecture, etc.
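A minimal sketch of such a check (assuming y_train is the one-hot label array built in the question's code):
import numpy as np

# Count how many training examples fall into each class
classes, counts = np.unique(y_train.argmax(axis=1), return_counts=True)
print(dict(zip(classes, counts)))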
I am using TensorFlow with Keras to perform regression on some historical data. The data looks as follows:
id,timestamp,ratio
"santalucia","2018-07-04T16:55:59.020000",21.8
"santalucia","2018-07-04T16:50:58.043000",22.2
"santalucia","2018-07-04T16:45:56.912000",21.9
"santalucia","2018-07-04T16:40:56.572000",22.5
"santalucia","2018-07-04T16:35:56.133000",22.5
"santalucia","2018-07-04T16:30:55.767000",22.5
I am reformulating it as a time-series problem (25 time steps) so that I can regress the next values of the series (the variance should not be high). I am also using sklearn.preprocessing.MinMaxScaler to scale the data to the range (-1, 1) or (0, 1), depending on whether I use the LSTM or the Dense model, respectively.
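For reference, a minimal sketch of that preprocessing (the array name ratio_values and the window construction are my assumptions about the setup, not code from the question):
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(-1, 1))                 # use (0, 1) for the Dense model
scaled = scaler.fit_transform(ratio_values.reshape(-1, 1))   # ratio_values: 1-D array of the 'ratio' column

# Build sliding windows of 25 time steps, each predicting the next value
window = 25
X = np.array([scaled[i:i + window] for i in range(len(scaled) - window)])
y = scaled[window:]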
I am training with two different architectures:
Dense is as follows:
def get_model(self, layers, activation='relu'):
    model = Sequential()
    # Input arrays of shape (*, layers[1])
    # Output = arrays of shape (*, layers[1] * 16)
    model.add(Dense(units=int(64), input_shape=(layers[1],), activation=activation))
    model.add(Dense(units=int(64), activation=activation))
    # model.add(Dropout(0.2))
    model.add(Dense(units=layers[3], activation='linear'))
    # activation=activation))
    # opt = optimizers.Adagrad(lr=self.learning_rate, epsilon=None, decay=self.decay_lr)
    opt = optimizers.rmsprop(lr=0.001)
    model.compile(optimizer=opt, loss=self.loss_fn, metrics=['mae'])
    model.summary()
    return model
This more or less provides good results (it is the same architecture as in TensorFlow's tutorial for predicting house prices).
However, the LSTM is not giving good results; it usually ends up stuck around a value (for example 40: 40.0123123, 40.123123, 41.09090, ...), and I do not see why, or how to improve it. The architecture is:
def get_model(self, layers, activation='tanh'):
    model = Sequential()
    # Shape = (Samples, Timesteps, Features)
    model.add(LSTM(units=128, input_shape=(layers[1], layers[2]),
                   return_sequences=True, activation=activation))
    model.add(LSTM(64, return_sequences=True, activation=activation))
    model.add(LSTM(layers[2], return_sequences=False, activation=activation))
    model.add(Dense(units=layers[3], activation='linear'))
    # activation=activation))
    opt = optimizers.Adagrad(lr=0.001, decay=self.decay_lr)
    model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy'])
    model.summary()
    return model
I currently train with a batch size of 200 that increases by a factor of 1.5 on every fit. Each fit consists of 50 epochs, and I use a Keras EarlyStopping callback with a patience of at least 20 epochs.
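For reference, a minimal sketch of that callback setup (assuming the "20 epochs" refers to the patience argument; X, y and validation_split are illustrative names, not code from my project):
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=20)
model.fit(X, y, epochs=50, batch_size=200, validation_split=0.2, callbacks=[early_stop])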
I have tried adding more layers and more units, reducing the layers and units, and increasing and decreasing the learning rate, but every time it gets stuck around a value. Any idea why?
Also, do you know any good practices that can be applied to this problem?
Cheers
Have you tried holding back a validation set and checking how well the model's performance on the training set tracks its performance on the validation set? This is often how I catch myself overfitting.
A simple function (adapted from here) can help you do that:
import matplotlib.pyplot as plt

hist = model.fit_generator(...)

def gen_graph(history, title):
    # The history keys depend on the metrics passed to compile() (e.g. 'acc'/'val_acc')
    plt.plot(history.history['categorical_accuracy'])
    plt.plot(history.history['val_categorical_accuracy'])
    plt.title(title)
    plt.show()

gen_graph(hist, "Accuracy, training vs. validation scores")
Also, do you have enough samples? If you're really, really sure that you have done as much as you can in terms of preprocessing, and in terms of hyperparameter tuning... generating some synthetic data or doing some data augmentation has occasionally helped me.
I'm performing a multiclass image classification task. While training my CNN the validation accuracy remains constant across all epochs. I've tried different model architectures and different hyperparameter values but no change. Any ideas would be greatly appreciated. Here are my current results:
Train and Validation Loss and Accuracy
Here is my CNN:
model = models.Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(img_width, img_height, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.2))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(8, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999,
                                        epsilon=1e-08, decay=0.0001),
              metrics=['acc'])
model.summary()
There are a variety of possible underlying factors that can potentially cause this phenomenon - below is a list, by no means exhaustive, of some preliminary fixes you could try:
If you're using the Adam optimizer (or any other adaptive learning rate optimizer such as RMSprop or Adadelta), try a significantly smaller initial learning rate than the default, somewhere on the order of 10E-6. Alternatively, try Stochastic Gradient Descent with an initial learning rate somewhere in the regime of 10E-2 to 10E-3. You could also set a large initial learning rate and anneal it over the course of several training epochs by employing Keras's LearningRateScheduler callback and defining a custom learning rate schedule (for SGD); a sketch follows this list.
If the above doesn't work, try decreasing the complexity of your network (e.g. the number of layers) and increasing the size of the training set. Also, while you're inspecting your training dataset, ensure that it doesn't suffer from class imbalance; if it does, you can artificially weight the losses associated with the underrepresented classes' training examples using the class_weight argument that can be passed to the model's fit() method (also sketched below).
If the issue still persists, you may have to confront the possibility that the constant validation loss is an artifact of essentially fitting to noise, and that any (even somewhat plausible) predictions the model emits may be spurious. At that point you may want to try extracting more informative features, a larger variety of features, or performing extensive data augmentation on your training set.
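A rough sketch of the learning rate schedule and class weighting mentioned above (x_train, y_train, the schedule and the weights are all illustrative assumptions, not tuned values):
from keras.optimizers import SGD
from keras.callbacks import LearningRateScheduler

def lr_schedule(epoch):
    # Start at 1e-2 and halve the learning rate every 10 epochs (arbitrary example)
    return 1e-2 * (0.5 ** (epoch // 10))

# Hypothetical example: 8 classes, with class 3 underrepresented
weights = {i: 1.0 for i in range(8)}
weights[3] = 2.5

model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=1e-2), metrics=['acc'])
model.fit(x_train, y_train, epochs=50,
          callbacks=[LearningRateScheduler(lr_schedule)],
          class_weight=weights)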
Have a look at this GitHub issue for further suggestions that may help resolve your problem:
https://github.com/keras-team/keras/issues/1597
I'm trying to build an autoencoder using Keras with a TensorFlow backend. In particular, my data is a vector of n_components (i.e. 200) sampled n_times (i.e. 20000). It is key that when I train on time t, I compare the output only to time t. It appears that the sampling times are being shuffled. I removed the bottleneck and found that the network does a pretty bad job of predicting the n_components, instead producing something more like the mean of the input scaled per component.
Here is my network with the bottleneck commented out:
model = keras.models.Sequential()
# Make a 7-layer autoencoder network
model.add(keras.layers.Dense(n_components, activation='relu', input_shape=(n_components,)))
model.add(keras.layers.Dense(n_components, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
# model.add(keras.layers.Dense(3, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
# act is a numpy matrix of size (n_components, n_times)
model.fit(act.T, act.T, epochs=15, batch_size=100, shuffle=False)
newact = model.predict(act.T).T
I have tested shuffling the second component of act, n_times, and passing it as model.fit(act.T, act_shuffled.T) and see no difference from model.fit(act.T, act.T). Am I doing something wrong? How can I force it to learn from the specific time?
Many thanks,
Arthur
I believe that I have solved the problem, but more knowledgeable users of Keras might be able to correct me. I had tried many different values for the argument batch_size of fit, but I didn't try a value of 1. When I changed it to 1, it did a good job of reproducing the input data.
I believe that the batch size, even if shuffle is set to False, allows the autoencoder to train one input time against an unrelated input time.
So, I have amended my code to:
model.fit(act.T, act.T, epochs=15, batch_size=1, shuffle=False)