Keras LSTM model overfitting - python

I am using an LSTM model in Keras. During the fitting stage, I added the validation_data paramater. When I plot my training vs validation loss, it seems there are major overfitting issues. My validation loss just won't decrease.
My full data is a sequence with shape [50,]. The first 20 records are used as training and the remaining used for the test data.
I have tried adding dropout and reducing the model complexity as much as I can and still no luck.
# transform data to be stationary
raw_values = series.values
diff_values = difference_series(raw_values, 1)
# transform data to be supervised learning
# using a sliding window
supervised = timeseries_to_supervised(diff_values, 1)
supervised_values = supervised.values
# split data into train and test-sets
train, test = supervised_values[:20], supervised_values[20:]
# transform the scale of the data
# scale function uses MinMaxScaler(feature_range=(-1,1)) and fit via training set and is applied to both train and test.
scaler, train_scaled, test_scaled = scale(train, test)
batch_size = 1
nb_epoch = 1000
neurons = 1
X, y = train_scaled[:, 0:-1], train_scaled[:, -1]
X = X.reshape(X.shape[0], 1, X.shape[1])
testX, testY = test_scaled[:, 0:-1].reshape(-1,1,1), test_scaled[:, -1]
model = Sequential()
model.add(LSTM(units=neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]),
stateful=True))
model.add(Dropout(0.1))
model.add(Dense(1, activation="linear"))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(X, y, epochs=nb_epoch, batch_size=batch_size, verbose=0, shuffle=False,
validation_data=(testX, testY))
This what it looks like when changing the amount of neurons. I even tried using Keras Tuner (hyperband) to find the optimal parameters.
def fit_model(hp):
batch_size = 1
model = Sequential()
model.add(LSTM(units=hp.Int("units", min_value=1,
max_value=20, step=1),
batch_input_shape=(batch_size, X.shape[1], X.shape[2]),
stateful=True))
model.add(Dense(units=hp.Int("units", min_value=1, max_value=10),
activation="linear"))
model.compile(loss='mse', metrics=["mse"],
optimizer=keras.optimizers.Adam(
hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])))
return model
X, y = train_scaled[:, 0:-1], train_scaled[:, -1]
X = X.reshape(X.shape[0], 1, X.shape[1])
tuner = kt.Hyperband(
fit_model,
objective='mse',
max_epochs=100,
hyperband_iterations=2,
overwrite=True)
tuner.search(X, y, epochs=100, validation_split=0.2)
When evaluating the model against X_test and y_test, I get the same loss and accuracy score. But when fitting the "best model", I get this:
However, my predictions looks very reasonable against my true values. What should I do to get a better fit?

20 records as training data is too small. There won't be enough variation in the training data for the model to approximate a function accurately, and so your validation data, which is likely much smaller than 20, will likely contain an example wildly different from just those 20 in the training data (i.e. it hasn't seen an example of that nature during training) resulting in a loss that is much higher.

Related

Need to increase the accuracy of my LSTM model

This is my model
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(df)
train_size = int(len(dataset) * 0.8)
test_size = len(dataset) - train_size
train = dataset[0:train_size,:]
test = dataset[train_size:len(dataset),:]
def create_dataset(dataset, look_back=1):
dataX, dataY = [], []
for i in range(len(dataset)-look_back-1):
a = dataset[i:(i+look_back), 0]
dataX.append(a)
dataY.append(dataset[i + look_back, 0])
return np.array(dataX), np.array(dataY)
# reshape into X=t and Y=t+1
look_back = 15
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
print(trainX.shape)
# reshape input to be [samples, time steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
from keras.layers import Dropout
from keras.layers import Bidirectional
model=Sequential()
model.add(LSTM(50,activation='relu',return_sequences=True,input_shape=(look_back,1)))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(LSTM(50, activation='sigmoid', return_sequences=False))
model.add(Dense(50))
model.add(Dense(50))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(optimizer='adam',loss='mean_squared_error',metrics=['accuracy'])
model.optimizer.learning_rate = 0.0001
Xdata_train=[]
Ydata_train=[]
Xdata_train, Ydata_train = create_dataset(train, look_back)
Xdata_train = np.reshape(Xdata_train, (Xdata_train.shape[0], Xdata_train.shape[1], 1))
#training for all data
history = model.fit(Xdata_train,Ydata_train,batch_size=1,epochs=10,shuffle=False)
RMSE value is around 35 and accuracy is very low. When I icrese the epochs there is no any variation. What are the changes should I do to get the accuracy at high value.
Here i attached the graphical results to get an idea.
How could I fix this?
Just with a once-over on your code, I can think of a few of the changes. Try using Bidirectional LSTM, binary_cross_entropy (assuming it's a binary classification) for the loss, and shuffle = True on training. Also, try adding Dropout between LSTM layers.
Here are couple of suggestions:
First of all, never fit normalizer on the entire dataset. First
partition your data into train/test parts, fit the scaler on train
data and then transform both train/test using that scaler. Otherwise
you are leaking the information from your test data into training
when doing the normalization (such as min/max values or std/mean
when using standard scaler).
You seem to be normalizing your y data, but never reverting the
normalization, as a result you end up with an output on lower scale
(as we can see on plots). You can redo normalization using
scaler.inverse_transform().
Finally, you may want to remove sigmoid activation function from the
LSTM layer, its generally not a good idea to use sigmoid anywhere
else besides the output layer as it may cause vanishing gradient.

Regression Neural Network returning low range values for prediction

I am trying to predict a value that ranges from 3 to 7 with a regression nn model. I currently have 1000 samples with 1000 points each, corresponding to physiological data. However, predictions are always in the range of 5.6 to 5.9. I have tried to change the number of layers, adjust parameters etc.. but it only changes the basal value, not the range (i.e. predicts, for example, in the range of 6.6 to 6.9).
Here's the basic core of the algorithm (data is already normalized):
X_train, X_test, y_train, y_test = train_test_split(x_0, y_0, test_size=0.2, shuffle = False, stratify = None)
K.clear_session()
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(1000,)))
#model.add(Dropout(0.2))
model.add(Dense(128,activation='relu'))
model.add(Dense(128,activation='relu'))
model.add(Dense(128,activation='relu'))
model.add(Dense(1,activation='linear'))
model.compile(loss="mean_absolute_error", optimizer=Adam(lr=0.001), metrics=['mean_absolute_error'])
model.summary()
history = model.fit(X_train,y_train,batch_size=30, epochs=10,verbose=1, validation_split=0.2)
score = model.evaluate(X_test,y_test)
print("Test score:", score)
y_pred = model.predict(X_test)
Thank you for any help!
I'm guessing that your input is sorted, and I can see that you set shuffle=False in train_test_split. This means that validation values have never been seen by you neural network. And so it will not extrapolate during validation. Setting shuffle=True should work.

RNN model predicting only one class?

I am trying to use GloVe embeddings to train a rnn model based on this article.
I have a labeled data: text(tweets) on one column, labels on another (hate, offensive or neither).
However the model seems to predict only one class in the result.
This is the LSTM model:
model = Sequential()
hidden_layer = 3
gru_node = 32
# model embedding matrix here....
for i in range(0,hidden_layer):
model.add(GRU(gru_node,return_sequences=True, recurrent_dropout=0.2))
model.add(Dropout(dropout))
model.add(GRU(gru_node, recurrent_dropout=0.2))
model.add(Dropout(dropout))
model.add(Dense(64, activation='softmax'))
model.add(Dense(nclasses, activation='softmax'))
start=time.time()
model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
fitting the model:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1)
X_train_Glove,X_test_Glove, word_index, embeddings_index = loadData_Tokenizer(X_train, X_test)
model_RNN = Build_Model_RNN_Text(word_index,embeddings_index, 20)
model_RNN.fit(X_train_Glove,y_train,
validation_data=(X_test_Glove, y_test),
epochs=4,
batch_size=128,
verbose=2)
y_preds = model_RNN.predict_classes(X_test_Glove)
print(metrics.classification_report(y_test, y_preds))
Results:
classification report
Confusion matrix
Am I missing something here?
Update:
this is what the distribution looks like
and the model summary, more or less
How the distribution of your data looks like? The first suggestion is to stratify train/test split (here is the link for the documentation).
The second question is how much data do you have in comparison with the complexity of the model? Maybe, your model is so complex, that just do overfitting. You can use the command model.summary() to see the number of trainable parameters.

Different training and validation results with same input Keras

I am trying to re-train MobileNet for a different multiclassification purpose as:
train_datagen = ImageDataGenerator(
preprocessing_function = preprocess_input
training_generator = train_datagen.flow_from_directory(
directory = train_data_dir,
target_size=(parameters["img_width"], parameters["img_height"]),
batch_size = parameters["batch_size"],
class_mode= "categorical",
subset = "training",
color_mode = "rgb",
seed = 42)
# Define the Model
base_model = MobileNet(weights='imagenet',
include_top=False, input_shape = (128, 128, 3)) #imports the mobilenet model and discards the last 1000 neuron layer.
# Let only the last n layers as trainable
for layer in base_model.layers:
layer.trainable = False
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(800,activation='relu')(x) #we add dense layers so that the model can learn more complex functions and classify for better results.
x = Dense(600,activation='relu')(x) #dense layer 2
x = Dropout(0.8)(x)
x = Dense(256,activation='relu')(x) #dense layer 3
x = Dropout(0.2)(x)
preds = Dense(N_classes, activation='softmax')(x) #final layer with softmax activation
model= Model(inputs = base_model.input, outputs = preds)
model.compile(optimizer = "Adam", loss='categorical_crossentropy', metrics=['accuracy'])
And performing training setting as validation dataset, the training set as:
history = model.fit_generator(
training_generator,
steps_per_epoch= training_generator.n // parameters["batch_size"],
epochs = parameters["epochs"]
,
##### VALIDATION SET = TRAINING
validation_data = training_generator,
validation_steps = training_generator.n // parameters["batch_size"],
callbacks=[
EarlyStopping(monitor = "acc", patience = 8, restore_best_weights=False),
ReduceLROnPlateau(patience = 3)]
)
However, I do get significant differences in accuracy, between TRAINING AND VALIDATION ACCURACY, even if they are the same dataset, while training; what could it be due to?
Training a neural network involves random distribution of the data in the training database. Because of this, the results are not reproducible. If you're getting significant differences in accuracy, you may try:
get a bigger training database;
retrain the network;
get a database with more consistent results.
LE: it doesn't matter if you get significant differences in accuracy while training. Training is an iterative optimization process, which minimizes the mean square error objective function. It takes a while until this goal is achieved.
I do not know the EXACT reason but I duplicated your problem. The problem happens because you are using the SAME generator which runs for training and then again for validation. If your create a SEPERATE generator for validation that takes the same training data as input then once you run enough epochs for the training accuracy to get into the 90% range you will see the validation accuracy stabilize and converge toward the training accuracy
Train-Valid Acc vs Epochs

Tensorflow training: why two successive training is better than one training

I'm working on a regression problem using Keras+Tensorflow. And I've found something interesting.
1) here are the two models, which are actually the same except that the first model is using a globally defined 'optimizer'.
optimizer = Adam() #as a global variable
def OneHiddenLayer_Model():
model = Sequential()
model.add(Dense(300 * inputDim, input_dim=inputDim, kernel_initializer='normal', activation=activationFunc))
model.add(Dense(1, kernel_initializer='normal'))
model.compile(loss='mean_squared_error', optimizer=optimizer)
return model
def OneHiddenLayer_Model2():
model = Sequential()
model.add(Dense(300 * inputDim, input_dim=inputDim, kernel_initializer='normal', activation=activationFunc))
model.add(Dense(1, kernel_initializer='normal'))
model.compile(loss='mean_squared_error', optimizer=Adam())
return model
2) Then, I use two schemes to train the datasets (training set(scaleX, Y); testing set(scaleTestX, testY)).
2.1) Scheme1. two successive fitting with the first model
numpy.random.seed(seed)
model = OneHiddenLayer_Model()
model.fit(scaleX, Y, validation_data=(scaleTestX, testY), epochs=250, batch_size=numBatch, verbose=0)
numpy.random.seed(seed)
model = OneHiddenLayer_Model()
history = model.fit(scaleX, Y, validation_data=(scaleTestX, testY), epochs=500, batch_size=numBatch, verbose=0)
predictY = model.predict(scaleX)
predictTestY = model.predict(scaleTestX)
2.2) Scheme2. one fitting with the second model
numpy.random.seed(seed)
model = OneHiddenLayer_Model2()
history = model.fit(scaleX, Y, validation_data=(scaleTestX, testY), epochs=500, batch_size=numBatch, verbose=0)
predictY = model.predict(scaleX)
predictTestY = model.predict(scaleTestX)
3). Finally, the results are plotted for each scheme, as shown below, (model loss history --> predict on scaleX --> predict on scaleTestX),
3.1) Scheme1
3.2) Scheme2 (with 500 epochs)
3.3) add one more test with Scheme2 and set epochs = 1000
From the images above, I've found that Scheme1 is better than Scheme2, even if Scheme2 is set with more epochs.
Can anyone help to explain why Scheme1 is better? Thanks a lot!!!

Categories