I'm new to ML and would like to know what I'm missing or doing incorrectly.
I'm trying to figure out why my model underfits when I apply early stopping and dropout, while the fit seems okay when I use neither.
Dataset I'm using:
https://www.kaggle.com/datasets/kanths028/usa-housing
Model Parameters:
The dataset has 5 features to train on and the target is the price
I chose 4 layers arbitrarily
Epochs at 600 (way too many) because I want to test early stopping
Optimizer and loss were chosen because they gave me the most consistent results compared to scikit-learn's LinearRegression (MAE is about 81K)
Data preprocessing:
X = df[df.columns[:-2]].values
y = df['Price'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
Fit looks okay:
model = Sequential()
model.add(Dense(5, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mae')
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=600)
The model looks underfit with early stopping and dropout combined:
model = Sequential()
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=25)
model.compile(optimizer='adam', loss='mae')
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=600, callbacks=[early_stopping])
I'm trying to figure out why early stopping would stop when the results are so far off. I would have guessed that the model would continue until the end of the 600 epochs, but early stopping pulls the plug at around epoch 300.
I'm probably doing something wrong but I can't figure it out so any insights would be appreciated. Thank you in advance :)
EarlyStopping takes the performance measure to monitor and whether that measure should be minimized or maximized.
Keras then stops training at the appropriate epoch. With verbose=1, Keras prints a message when training is stopped.
es = EarlyStopping(monitor='val_loss', mode='min')
Stopping the moment performance fails to improve may not be effective. The patience parameter defines how many epochs without improvement to tolerate before stopping. Patience is a rather subjective criterion; the best value depends on the data and the design of the model.
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=50)
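To make the effect of patience concrete, here is a minimal pure-Python sketch of patience-based stopping. It is a simplified model of the rule described above, not the Keras source; the function name simulate_early_stopping and the loss values are made up for illustration.

```python
def simulate_early_stopping(val_losses, patience=0, min_delta=0.0):
    """Return (stopped_epoch, best_epoch) for a list of per-epoch validation losses.

    Simplified model of patience-based stopping (an assumption, not Keras code).
    stopped_epoch is None when training runs through every epoch.
    """
    best = float("inf")
    best_epoch = None
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best value: remember it and reset the counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation losses, one per epoch.
losses = [5.0, 4.0, 4.2, 4.1, 3.9, 4.0, 4.05, 4.3]
print(simulate_early_stopping(losses, patience=1))
print(simulate_early_stopping(losses, patience=3))
```

With these made-up losses, patience=1 stops at the first epoch that fails to improve, while patience=3 rides out the short plateau and reaches a lower loss before stopping.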
When training is stopped by the EarlyStopping object, the model's final state will generally have a higher validation error than an earlier epoch had. Early stopping halts training at the point where the validation error no longer decreases, but the state at that point is not the best model. It is therefore necessary to store the model with the best validation performance, and Keras provides the ModelCheckpoint object for this purpose. With save_best_only=True, this object monitors the validation error and saves the model only when the validation performance is better than the best seen so far. This way, when training is stopped, the model with the best validation performance can be loaded back.
from keras.callbacks import ModelCheckpoint
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True)
Pass both callback objects in the callbacks parameter of fit so that the best model is stored:
hist = model.fit(train_x, train_y, epochs=10,
                 batch_size=10, verbose=2, validation_split=0.2,
                 callbacks=[es, mc])
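The idea behind save_best_only can be sketched without Keras: keep a copy of the weights whenever the validation loss reaches a new minimum. The helper track_best_weights is hypothetical and the weights and losses are toy values; ModelCheckpoint itself writes the model to disk rather than keeping it in memory.

```python
import copy


def track_best_weights(epoch_results):
    """epoch_results: iterable of (weights, val_loss) pairs, one per epoch.

    Returns (last_weights, best_weights, best_loss).
    """
    best_loss = float("inf")
    best_weights = None
    last_weights = None
    for weights, val_loss in epoch_results:
        last_weights = weights
        if val_loss < best_loss:
            # Snapshot the weights only when the validation loss improves.
            best_loss = val_loss
            best_weights = copy.deepcopy(weights)
    return last_weights, best_weights, best_loss


# Toy stand-ins for the state of a model after each epoch (assumed values).
toy_epochs = [
    ({"w": 0.10}, 5.0),
    ({"w": 0.35}, 3.2),
    ({"w": 0.50}, 3.6),
    ({"w": 0.55}, 3.9),
]
last_weights, best_weights, best_loss = track_best_weights(toy_epochs)
print(last_weights, best_weights, best_loss)
```

The weights left in the model when training ends are those of the last epoch, which is why the snapshot from the best epoch has to be kept separately.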
In your case, patience=25 means training ends once val_loss has failed to improve for 25 consecutive epochs.
from keras.callbacks import EarlyStopping, ModelCheckpoint
model = Sequential()
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=25, verbose=1)
mc = ModelCheckpoint ('best_model.h5', monitor='val_loss', mode='min', save_best_only=True)
model.compile(optimizer='adam', loss='mae')
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=600, callbacks=[early_stopping, mc])
I recommend 2 things. In the early stop callback set the parameter
restore_best_weights=True
This way, if the early stopping callback activates, your model is set to the weights from the epoch with the lowest validation loss. To reach a lower validation loss, I recommend you use the ReduceLROnPlateau callback. My recommended code for these callbacks is shown below.
estop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
                                         verbose=1, restore_best_weights=True)
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=2, verbose=1)
callbacks = [estop, rlronp]
In model.fit set parameter callbacks=callbacks. Set epochs to a large number so it is likely the estop callback will be activated.
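As a rough illustration of what factor=0.5 and patience=2 do to the learning rate, here is a pure-Python sketch of a reduce-on-plateau schedule. It is a simplified model, not the Keras implementation (which also has cooldown and min_delta options); the function name, the starting rate of 1e-3 and the loss values are assumptions.

```python
def simulate_reduce_lr(val_losses, initial_lr=1e-3, factor=0.5, patience=2, min_lr=0.0):
    """Return the learning rate in effect after each epoch's update."""
    best = float("inf")
    wait = 0  # consecutive epochs without a new best loss
    lr = initial_lr
    schedule = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the learning rate and start counting again.
                lr = max(lr * factor, min_lr)
                wait = 0
        schedule.append(lr)
    return schedule


# Hypothetical validation losses, one per epoch.
plateau_losses = [1.0, 0.9, 0.95, 0.92, 0.8, 0.85, 0.9, 0.95, 0.99]
schedule = simulate_reduce_lr(plateau_losses)
print(schedule)
```

Each time the loss goes two epochs without a new minimum, the rate is halved, which gives the optimizer a chance to settle into a lower loss before early stopping triggers.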
Related
I am learning neural networks. I get 98% accuracy with classical ML methods, so I think I made a coding error. The neural networks model is not learning.
Things I tried:
Changing X and y to float64 or float32
Normalizing data
Changing the activation to "linear" or "relu"
Removing Flatten()
Adding hidden layers
Using stochastic gradient descent as optimizer, instead of "adam".
Changing the y label with another label
There are 9 features in X_train and 8 different classes in y_train.
X_train:
y_train:
Code:
model = keras.models.Sequential()
model.add(keras.layers.Input(shape=(9,)))
model.add(keras.layers.Dense(8, activation='softmax'))
model.add(layers.Flatten())
model.compile(optimizer= 'adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
Fitting:
I tried these lines by changing the target label. None of them help training the model. Some give "nan" loss, some go slightly up and down, but all of them are below 0.1% accuracy:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=24)
or this:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(3, activation='relu', name='relu1'))
model.add(layers.Dense(16, activation='relu', name='relu2'))
model.add(layers.Dense(16, activation='relu', name='relu3'))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(x=X_train, y=y_train, epochs=20)
My dataset (a network traffic dataset on which we do binary classification):
Number of features is 25 and I have normalized the dataset.
My model:
verbose=1
epoch_number=1000
batch_size = 32
n_outputs = 1
model = Sequential()
model.add(Conv1D(filters=200, kernel_size=4, strides=3,activation='relu', input_shape=(25,1)))
model.add(Dropout(0.05))
model.add(BatchNormalization())
model.add(Conv1D(filters=200, kernel_size=5, strides=1,activation='relu', input_shape=(25,1)))
model.add(Dropout(0.05))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.05))
model.add(Flatten())
model.add(Dense(200, activation='relu'))
model.add(Dropout(0.05))
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.05))
model.add(Dense(50, activation='relu'))
model.add(Dropout(0.05))
model.add(Dense(n_outputs, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['acc',f1_m,precision_m, recall_m])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1)
# fit network
model.fit(X_train, y_train,validation_data=(X_test, y_test),epochs=epoch_number, batch_size=batch_size, verbose=1,callbacks=[es])
# evaluate model
loss, accuracy, f1_score, precision, recall = model.evaluate(X_test, y_test, batch_size=batch_size, verbose=0)
print(loss,accuracy,f1_score,precision,recall)
My model stops after one epoch when I add the Keras EarlyStopping callback, even though the loss decreases after every epoch when I remove it.
If you had posted the training logs from a run without early stopping, this would have been easier to diagnose.
Now let's look at the possibilities. You have set EarlyStopping as shown below.
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1)
This means your early stopping callback uses the default values for every other parameter, as shown below.
tf.keras.callbacks.EarlyStopping(
monitor="val_loss",
min_delta=0,
patience=0,
verbose=1,
mode="min",
baseline=None,
restore_best_weights=False,
)
Here patience=0, mode='min', min_delta=0 and monitor='val_loss'.
This simply means that training stops as soon as the validation loss fails to decrease from one epoch to the next.
In other words, if the validation loss is the same as or greater than in the previous epoch, training stops.
I would recommend increasing the patience parameter.
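If a log of val_loss from a run without early stopping is available, one rough way to pick patience is to measure the longest streak of non-improving epochs that occurred before the best epoch. The helper below is a hypothetical sketch with made-up loss values, and it assumes the simplified rule that training stops once the streak length reaches patience.

```python
def longest_plateau_before_best(val_losses):
    """Longest run of consecutive non-improving epochs before the best epoch."""
    # Index of the lowest validation loss (first one if there are ties).
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    best = float("inf")
    run = 0
    longest = 0
    for loss in val_losses[: best_epoch + 1]:
        if loss < best:
            best, run = loss, 0
        else:
            run += 1
            longest = max(longest, run)
    return longest


# Hypothetical log from a run without early stopping.
val_log = [0.9, 0.7, 0.72, 0.71, 0.6, 0.65, 0.62, 0.61, 0.5, 0.55]
plateau = longest_plateau_before_best(val_log)
print("longest plateau:", plateau, "-> patience of at least", plateau + 1)
```

Here the longest streak is 3 epochs, so under that rule patience would need to be at least 4 for training to reach the best epoch.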
I am having a time series prediction problem and building an LSTM like below :
def create_model():
model = Sequential()
model.add(LSTM(50,kernel_regularizer=l2(0.01), recurrent_regularizer=l2(0.01), bias_regularizer=l2(0.01), input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.591))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
return model
When I train the model on 5 splits like below :
tss = TimeSeriesSplit(n_splits = 5)
X = data.drop(labels=['target_prediction'], axis=1)
y = data['target_prediction']
for train_index, test_index in tss.split(X):
train_X, test_X = X.iloc[train_index, :].values, X.iloc[test_index,:].values
train_y, test_y = y.iloc[train_index].values, y.iloc[test_index].values
model=create_model()
history = model.fit(train_X, train_y, epochs=10, batch_size=64,validation_data=(test_X, test_y), verbose=0, shuffle=False)
I get an overfitting problem. The graph of loss is attached
I am not sure why there is overfitting when I use regularizers in my Keras model. Any help is appreciated .
EDIT:
Tried the architectures
def create_model():
model = Sequential()
model.add(LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
return model
def create_model(x,y):
# define LSTM
model = Sequential()
model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(x,y)))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='mean_squared_error', optimizer='adam')
return model
but still it is overfitting.
First of all, remove all your regularizers and dropout. You are literally spamming every trick out there, and 0.5 dropout is too high.
Reduce the number of units in your LSTM and start from there, until you reach a point where your model stops overfitting.
Then, add dropout if required.
After that, the next step is to add tf.keras.layers.Bidirectional. If you are still not satisfied, increase the number of layers. Remember to keep return_sequences=True for every LSTM layer except the last one.
I seldom come across networks that use layer regularization, despite its availability, because dropout and layer regularization have a similar effect and people usually go with dropout (0.3 is the maximum I have seen used).
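For a sense of what a dropout rate means in practice, here is a small pure-Python sketch of inverted dropout, the scheme the Keras Dropout layer applies at training time: each unit is zeroed with probability rate and the survivors are scaled by 1 / (1 - rate). The function name and the 50-unit layer of ones are illustrative assumptions.

```python
import random


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the rest by 1 / (1 - rate)."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so repeated runs give the same masks
units = [1.0] * 50      # stand-in for the output of a 50-unit layer

for rate in (0.2, 0.591):
    dropped = inverted_dropout(units, rate, rng)
    zeroed = sum(1 for value in dropped if value == 0.0)
    print(f"rate={rate}: {zeroed} of {len(units)} units zeroed on this step")
```

At a rate of 0.591, about 30 of the 50 units are silenced on an average step, which leaves the layer with little capacity to learn from.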
I am learning how to use Keras and I keep running into problems. I will try to be as specific as possible.
My task: I am trying to create a neural network to predict opening status for a domestic residence.
I have a dataset with 524729 examples. I use 70% as training set and 30% as test set. I am reaching 70+% of acc in my tests but for some reason every time that I try to predict an output I get the same values.
Right now, I have the following topology:
model = Sequential()
model.add(Dense(15, input_shape=(13, ), kernel_initializer='random_normal'))
model.add(Dense(15, activation='softplus'))
model.add(Dense(15, activation='softplus'))
model.add(Dense(10, activation='sigmoid'))
model.summary()
sgd = optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.3, nesterov=True)
model.compile(optimizer=sgd, loss='mean_squared_error', metrics=['mae', 'acc'])
model.fit(X_training, Y_training, validation_data=(X_test, Y_test), epochs=1, batch_size=32)
and I use:
model.predict(np_inputRN, verbose=0)
to predict the output but for some reason I keep getting the same values.
0.0172018650919,0.498908281326,0.984391093254,0.485811322927,0.480756670237,0.984736263752,0.536143004894,0.475958675146,0.494080305099,0.488458126783
Can some one help me?
==========================================================================
#Aiven :
Data Set: 524729
Test Set[30%]: 157418
Training Set [70%]: 367310
X_training.shape: (367311, 13)
Y_training.shape: (367311, 10)
X_test.shape: (157419, 13)
Y_test.shape: (157419, 10)
np_inputRN.shape: (1, 13)
This is my NN in Keras. The model compiled and trained, but when I try to plot the learning curve from history, only an empty window appears.
model = Sequential()
model.add(Dense(64, input_dim=30,
activity_regularizer=regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dropout(0.5))
model.add(Dense(16,
activity_regularizer=regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(2))
model.add(Activation('softmax'))
opt = Nadam(lr=0.001)
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.9, patience=25, min_lr=0.000001, verbose=1)
checkpointer = ModelCheckpoint(filepath="test.hdf5", verbose=1, save_best_only=False)
model.compile(optimizer=opt,
loss='categorical_crossentropy',
metrics=['accuracy'])
history = model.fit(X_train, Y_train,
nb_epoch = 1,
batch_size = 128,
verbose=1,
validation_data=(X_test, Y_test),
callbacks=[reduce_lr, checkpointer],
shuffle=True)
plt.plot(history.history['acc'])
When I print history.history['acc'], it's just one number. Not a list.
I'd be happy, if you can help
Try increasing the number of epochs. With nb_epoch = 1, history.history['acc'] holds only one value, so there is no curve to draw.
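As a sketch of why one epoch gives nothing to plot: history.history is a dict that maps each metric name to a list with one entry per epoch. The dicts below are hand-written stand-ins for that attribute with made-up numbers, and accuracy_curve is a hypothetical helper; it checks both 'acc' and 'accuracy' because the key name depends on the Keras version.

```python
def accuracy_curve(history_dict):
    """Return the per-epoch accuracy list from a history-style dict."""
    for key in ("acc", "accuracy"):
        if key in history_dict:
            return list(history_dict[key])
    raise KeyError("no accuracy metric found in history")


# Hand-written stand-ins for history.history after 1 and 5 epochs (assumed values).
one_epoch = {"loss": [0.69], "acc": [0.71]}
five_epochs = {
    "loss": [0.69, 0.60, 0.55, 0.52, 0.50],
    "accuracy": [0.71, 0.75, 0.78, 0.80, 0.81],
}

for name, hist in (("one_epoch", one_epoch), ("five_epochs", five_epochs)):
    curve = accuracy_curve(hist)
    status = "enough points for a line" if len(curve) >= 2 else "single point, no line to draw"
    print(name, len(curve), status)
```

A line plot needs at least two points, so a run of several epochs is required before the learning curve becomes visible.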