I am using TensorFlow and Keras for a binary classification problem.
I have only 121 samples but 20,000 features. I know that is too few samples for so many features, but it is a biological problem (gene-expression data), so I have to deal with it.
My question: why does accuracy (train and test) climb to 100%, then drop, then increase again, while the loss keeps decreasing the whole time?
Accuracy plot:
Validation plot:
Since my dataset is only 118 samples, I have only 24 test data points. See the confusion matrix:
This is my neural network architecture:
with current settings:
{'ann__dropout_rate': 0.4, 'ann__learning_rate': 0.01, 'ann__n_neurons': 16, 'ann__num_hidden': 1, 'ann__regularization_rate': 0.6}
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def build_model(input_shape, n_neurons, dropout_rate, learning_rate, regularization_rate):
    model = Sequential()
    # First layer: one unit per input feature
    model.add(Dense(input_shape, activation="relu",
                    input_dim=input_shape))
    # Hidden layer with L1 weight regularization, followed by dropout
    model.add(Dense(n_neurons, activation="relu",
                    kernel_regularizer=tf.keras.regularizers.l1(regularization_rate)))
    model.add(Dropout(dropout_rate))
    # Sigmoid output for binary classification
    model.add(Dense(1, activation="sigmoid"))
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(loss="binary_crossentropy",
                  optimizer=optimizer, metrics=["accuracy"])
    return model
Thank you!
Try shuffling your training data if you are not doing so already. You might also try a larger batch size. I also recommend using the ReduceLROnPlateau callback in model.fit; documentation is here. Set it up to monitor the validation loss and to reduce the learning rate by a factor < 1 if the loss fails to improve after patience epochs.
I implemented your ideas, @Gerry P: shuffle=True and ReduceLROnPlateau (batch size is 64). My callbacks are now as follows (how they are passed to fit is sketched just after):
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, min_lr=1e-6, verbose=1)
early_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=20, mode='auto')
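For completeness, a minimal sketch of how these callbacks and shuffling might be wired into training, assuming direct access to the Keras model rather than the sklearn pipeline wrapper; X_train, y_train, X_val, y_val and epochs=200 are placeholders, not values from the original setup:
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200,
                    batch_size=64,          # batch size mentioned above
                    shuffle=True,           # reshuffle the training data each epoch
                    callbacks=[reduce_lr, early_stop],
                    verbose=1)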
My accuracy and loss now look like this:
I would say it is still overfitting.
Confusion matrix:
Related
I am training a neural network with a constant learning rate for 45 epochs. I observed that the accuracy is highest at epoch 35 and then wiggles around and decreases. I think I need to reduce the learning rate at epoch 35. Is there any way I can resume training from epoch 35 after all the epochs have completed? My code is shown below:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

model_nn = keras.Sequential()
model_nn.add(Dense(352, input_dim=28, activation='relu', kernel_regularizer=l2(0.001)))
model_nn.add(Dense(384, activation='relu', kernel_regularizer=l2(0.001)))
model_nn.add(Dense(288, activation='relu', kernel_regularizer=l2(0.001)))
model_nn.add(Dense(448, activation='relu', kernel_regularizer=l2(0.001)))
model_nn.add(Dense(320, activation='relu', kernel_regularizer=l2(0.001)))
model_nn.add(Dense(1, activation='sigmoid'))

auc_score = tf.keras.metrics.AUC()
model_nn.compile(loss='binary_crossentropy',
                 optimizer=keras.optimizers.Adam(learning_rate=0.0001),
                 metrics=['accuracy', auc_score])

history = model_nn.fit(X_train1, y_train1,
                       validation_data=(X_test, y_test),
                       epochs=45,
                       batch_size=250,
                       verbose=1)

# evaluate returns [loss, accuracy, auc] because two metrics were compiled in
_, accuracy, _ = model_nn.evaluate(X_test, y_test)

# Saving the full model (architecture + weights)
model_nn.save('mymodel.h5')
You can do two useful things:
Use the ModelCheckpoint callback with save_best_only=True. It saves the model only when it is considered the "best" so far according to the monitored quantity, so the latest best model is never overwritten by a worse one; you can later reload this checkpoint to continue training (see the sketch after the code below).
Use the ReduceLROnPlateau and EarlyStopping callbacks. ReduceLROnPlateau reduces the learning rate when the monitored metric has stopped improving on the validation subset. EarlyStopping stops training when the monitored metric has stopped improving at all.
In simple words, ReduceLROnPlateau helps training keep improving once it plateaus, EarlyStopping takes care of the number of epochs, and ModelCheckpoint saves the best model.
The code might look like this:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

early_stoping = EarlyStopping(patience=5, min_delta=0.0001)
reduce_lr_loss = ReduceLROnPlateau(patience=2, verbose=1, min_delta=0.0001, factor=0.65)
# ModelCheckpoint requires a filepath to save to; 'best_model.h5' is a placeholder
model_checkpoint = ModelCheckpoint('best_model.h5', save_best_only=True)
history = model_nn.fit(X_train1, y_train1,
                       validation_data=(X_test, y_test),
                       epochs=100,
                       batch_size=250,
                       verbose=1,
                       callbacks=[early_stoping, reduce_lr_loss, model_checkpoint])
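And, to the question of picking training back up from the best epoch: once the checkpoint exists you can reload it and keep training. A sketch, assuming the placeholder path used above; the lower learning rate and epochs=20 are only illustrative:
from tensorflow import keras
from tensorflow.keras.models import load_model

# Reload the best checkpoint written by ModelCheckpoint above
best_model = load_model('best_model.h5')

# Optionally recompile with a lower learning rate before continuing training
best_model.compile(loss='binary_crossentropy',
                   optimizer=keras.optimizers.Adam(learning_rate=1e-5),
                   metrics=['accuracy'])

best_model.fit(X_train1, y_train1,
               validation_data=(X_test, y_test),
               epochs=20,
               batch_size=250)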
I am trying to build a regression model, but the MSE and MAE are very high. I filter and normalize the data (both input and output, for both the train and test sets). I think the problem arises because I have very high values in one column: the minimum is 1 and the maximum is 9,100,000 (before normalizing), but I actually need to predict these high values.
The model looks like this (I have 6 input columns and 800,000 rows). I have tried more neurons and layers, and changing the sigmoid activation, but the loss and error stay around 0.8 for MSE and 0.3 for MAE. The predictions are also much lower than they should be, never reaching the high values.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(7, input_dim=num_input, activation='relu'))
model.add(Dense(7, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # sigmoid output (see the answer below)
model.compile(loss='mse', optimizer='rmsprop', metrics=['mse', 'mae'])

history = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
                    validation_data=(x_val, y_val))
A few remarks and suggestions:
RMSProp is generally not used with fully connected layers; I recommend switching to Adam or SGD.
If you have a skewed distribution with many large values, you might consider using the log of these values instead.
First try a shallow model with few neurons, then gradually increase the number of neurons in order to overfit the dataset. You should be able to reach a perfect score on the train set. At that point you can start decreasing the number of neurons and add layers with dropout to improve generalisation.
As already mentioned in the comments, the output activation for regression should be "linear"; sigmoid is for binary classification. A sketch combining the log transform and a linear output follows below.
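For illustration only, a minimal sketch of the log-target plus linear-output suggestion, not the asker's actual pipeline; the names x_train, y_train, x_val, y_val and num_input follow the question, while the layer sizes, optimizer, epochs and batch size are assumptions:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Train on log-transformed targets so the huge range (1 .. 9,100,000) is compressed;
# log1p is safe for non-negative values.
y_train_log = np.log1p(y_train)
y_val_log = np.log1p(y_val)

model = Sequential()
model.add(Dense(64, input_dim=num_input, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='linear'))  # linear output for regression
model.compile(loss='mse', optimizer='adam', metrics=['mae'])

model.fit(x_train, y_train_log, epochs=50, batch_size=256,
          validation_data=(x_val, y_val_log))

# Undo the log transform to get predictions back on the original scale
preds = np.expm1(model.predict(x_val))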
I am doing text classification; my dataset is 16,000 KB. My problem is that I get 95% accuracy on training and only 90% on testing. Can I increase the testing accuracy, and how?
Here is my code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Conv1D(filters=256, kernel_size=5, activation='relu', input_shape=(7, 1)))
model.add(GlobalMaxPooling1D())
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(11, activation='softmax'))
model.summary()

model.compile(Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    epochs=200,
                    verbose=True,
                    validation_data=(X_test, y_test),
                    batch_size=128)

loss, accuracy = model.evaluate(X_train, y_train, verbose=True)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy: {:.4f}".format(accuracy))
The first step in debugging the model is to plot the training/validation curves, as in the example below; a short snippet for producing such a plot is given after the list of cases.
Typical training validation curve
Based on how the curves behave, there are the following possible inferences and solutions.
The two curves diverge as the model is trained: training keeps improving while testing either gets worse or saturates much earlier than training.
Cause: the model is overfitting the training set and needs regularisation, e.g. dropout, weight decay, etc.
The two curves stick close together at the end and no further improvements happen.
Cause: the model is saturated or stuck in a local minimum; try increasing the learning rate to push it out. If there are still no major improvements, try adding more complexity to the model.
The two curves have saturated at the end but remain a small distance apart, and no major changes happen with further training.
Cause: the model has learned what it could from the available data and will not improve further; try data transformations to generate new data, or get more data.
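A minimal sketch for producing such a curve, assuming a History object named history as returned by model.fit (in recent TF/Keras the keys are 'accuracy'/'val_accuracy'; older versions use 'acc'/'val_acc'):
import matplotlib.pyplot as plt

# Plot training vs. validation accuracy from the History returned by model.fit
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()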
I need to estimate a continuous value and have a dataset with 11 fields plus the one to be estimated. I "cleaned" the data, scaled it with MinMaxScaler, shuffled the rows, and proceeded to create my model. By the way, it is a 127K-row dataset and my testing split is 20%. Here is a snippet of the model:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1))

model.compile(tf.keras.optimizers.Adam(
                  learning_rate=0.01,
                  beta_1=0.9,
                  beta_2=0.999,
                  epsilon=1e-07,
                  amsgrad=False,
                  name='Adam'),
              loss='mse')
The fit:
model.fit(x=X_train, y=y_train.values,
          validation_data=(X_test, y_test.values),
          batch_size=128, epochs=100)
But the outcome:
import pandas as pd

losses = pd.DataFrame(model.history.history)
losses.plot()
The history shows a big gap between the training loss and the validation loss:
I've tried adding a dropout layer between the input layer and the middle layer (placement sketched below), but it only makes things worse.
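For reference, a sketch of the dropout placement being described, using the same two hidden layers as above; the 0.2 rate is only illustrative, not the value actually tried:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))   # dropout between the input-facing layer and the middle layer
model.add(Dense(64, activation='relu'))
model.add(Dense(1))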
I'm training a text classification model where the input data consists of 4096 term frequency–inverse document frequencies.
My output consists of 416 possible categories. Each piece of data belongs to 3 categories, so each label vector has 3 ones among 413 zeros (a multi-hot encoding).
My model looks like this:
import keras
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(2048, activation="relu", input_dim=X.shape[1]))
model.add(Dense(512, activation="relu"))
model.add(Dense(416, activation="sigmoid"))
When I train it with the binary_crossentropy loss, it has a loss of 0.185 and an accuracy of 96% after one epoch. After 5 epochs, the loss is at 0.037 and the accuracy at 99.3%. I suspect this is misleading, since there are a lot of 0s in my labels which it classifies correctly.
When I train it with the categorical_crossentropy loss, it has a loss of 15.0 and an accuracy of below 5% in the first few epochs, before it gets stuck at a loss of 5.0 and an accuracy of 12% after several (over 50) epochs.
Which one of these would be right for my situation (large multi-hot label vectors with multiple 1s)? And what do these scores tell me?
EDIT: These are the model.compile() statements:
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
and
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
In short: the (high) accuracy reported when you use loss='binary_crossentropy' is not the correct one, as you have already guessed. For your problem, the recommended loss is categorical_crossentropy.
In long:
The underlying reason for this behavior is a rather subtle and undocumented issue in how Keras actually guesses which accuracy to use, depending on the loss function you have selected, when you simply include metrics=['accuracy'] in your model compilation, as you have done. In other words, while your first compilation option
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
is valid, your second one:
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
will not produce what you expect, but the reason is not the use of binary cross entropy (which, at least in principle, is an absolutely valid loss function).
Why is that? If you check the metrics source code, Keras does not define a single accuracy metric, but several different ones, among them binary_accuracy and categorical_accuracy. What happens under the hood is that, since you have selected loss='binary_crossentropy' and have not specified a particular accuracy metric, Keras (wrongly...) infers that you are interested in the binary_accuracy, and this is what it returns - while in fact you are interested in the categorical_accuracy.
Let's verify that this is the case, using the MNIST CNN example in Keras, with the following modification:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # WRONG way
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=2,  # only 2 epochs, for demonstration purposes
          verbose=1,
          validation_data=(x_test, y_test))
# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0)
score[1]
# 0.9975801164627075
# Actual accuracy calculated manually:
import numpy as np
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98780000000000001
score[1]==acc
# False
Arguably, the verification of the above behavior with your own data should be straightforward.
And just for completeness of the discussion: if, for whatever reason, you insist on using binary cross entropy as your loss function (as I said, there is nothing wrong with this, at least in principle) while still getting the categorical accuracy required by the problem at hand, you should ask explicitly for categorical_accuracy in the model compilation, as follows:
from keras.metrics import categorical_accuracy
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=[categorical_accuracy])
In the MNIST example, after training, scoring, and predicting the test set as I show above, the two metrics now are the same, as they should be:
# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0)
score[1]
# 0.98580000000000001
# Actual accuracy calculated manually:
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98580000000000001
score[1]==acc
# True
System setup:
Python version 3.5.3
Tensorflow version 1.2.1
Keras version 2.0.4