I am currently building a multi-output classification model. The model has two outputs, and I define the compile step as follows:
model.compile(RMSprop(lr=0.0003, decay=1e-6),
              loss=["categorical_crossentropy", "categorical_crossentropy"],
              metrics=["accuracy"])
The problem is that if I train the two models separately, each of them reaches over 80% accuracy. However, when I combine them into a single multi-output model, the accuracy is always around 50-60%. I tried using loss_weights as well, but it hasn't improved.
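For reference, the weighted variant I tried looks roughly like the sketch below; the output names "out1" and "out2" are placeholders for my actual output-layer names:

# Sketch: same optimizer as above, but with named per-output losses and explicit
# loss_weights ("out1"/"out2" are placeholder names for the two output layers).
model.compile(RMSprop(lr=0.0003, decay=1e-6),
              loss={"out1": "categorical_crossentropy",
                    "out2": "categorical_crossentropy"},
              loss_weights={"out1": 1.0, "out2": 1.0},
              metrics=["accuracy"])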
How can I improve that?
I have a simple model in TensorFlow which is being trained on the first 1000 images in the MNIST dataset. From my previous experience, the learning rates I used were on the order of 0.001; however, for my model to converge the learning rate needs to be far higher, at least larger than 1. The model is shown below.
import tensorflow as tf

def gen_model():
    return tf.keras.models.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='sigmoid'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

model = gen_model()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=5), loss='mean_squared_error')
model.summary()
model.fit(x_train, y_train, batch_size=1000, epochs=10000)
Is it expected for models of this form to require an extremely high learning rate, or is there something I have missed? When I use a learning rate of around 0.001 the loss changes incredibly slowly.
The dataset was created with the following code:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_train = x_train.reshape(60000, 28, 28)[:1000]
y_train = y_train[:1000]
y_train = tf.one_hot(y_train, 10)
Generally speaking, models that require learning rates larger than 1 raise a red flag for me. It seems like your model is a vanilla multilayer perceptron, so there's nothing overly complicated about that, but there are a couple things about your setup that stand out:
The output from your model uses a softmax, which is normally used to represent values from a categorical distribution (i.e., 1-of-k) -- this is typical for a classification model. But the loss you're using is typically used for optimizing Gaussian or regression outputs. You might want to try using a cross-entropy loss to see if that helps (see the sketch after these points).
The output from your model is in probability space, so the values you get out of your model are in [0, 1]. The loss you're using averages the squared differences between the model output and the target one-hot vector (whose values are in {0, 1}). The value you'll get for this loss is always smaller than 1, so the gradients are small too, and with a learning rate less than 1 the delta applied to your model weights at each step is always going to be small. Sometimes that's a good thing, but my guess is that in this case -- and particularly at the start of training, when the model weights aren't near their optimal values -- training is going to be quite slow.
Related to the above point, you might try initializing your model weights with a larger range of values than the default. This would help make the gradient values larger, but could also make the model more likely to diverge.
You could also try to replace your softmax output activation with a plain linear activation, in effect converting your model's output to (unnormalized) log-probability space. Then you'd need to change your dataset labels to also represent target log-probability values, which isn't possible exactly, but you could get close with something like -1e8 * (1 - one_hot). But if you wanted to go this route, you'd effectively be implementing a cross-entropy loss yourself; see the first point.
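As a rough sketch of the first point (the values here are only illustrative): since the labels are already one-hot encoded via tf.one_hot, swapping the mean-squared-error loss for categorical cross-entropy should let the model train with a much more conventional learning rate.

model = gen_model()
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # illustrative value; no longer needs to be > 1
    loss=tf.keras.losses.CategoricalCrossentropy(),         # matches the softmax output and one-hot targets
    metrics=['accuracy'])
# For the third point, a wider-than-default initializer could be tried on the Dense layers,
# e.g. kernel_initializer=tf.keras.initializers.RandomUniform(-0.5, 0.5).
model.fit(x_train, y_train, batch_size=1000, epochs=100)    # far fewer epochs than 10000 should be needed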
I have a text classification task that I am trying to do using BERT. Below is the code I am using. The model training code (below) works fine, but I am facing an issue with the prediction part.
from transformers import TFBertForSequenceClassification
import tensorflow as tf

# recommended learning rates for Adam: 5e-5, 3e-5, 2e-5
learning_rate = 5e-5
nlabels = 26

# we will do just 1 epoch for illustration, though multiple epochs might be better as long as we do not overfit the model
number_of_epochs = 1

# model initialization
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased',
                                                         num_labels=nlabels,
                                                         output_attentions=False,
                                                         output_hidden_states=False)

# optimizer: Adam
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=1e-08)

# we do not have one-hot vectors, so we can use sparse categorical cross-entropy and accuracy
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')

model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
bert_history = model.fit(ds_tr_encoded, epochs=number_of_epochs)
I am getting the predictions using the following:
import numpy as np

preds = model.predict(ds_te_encoded)
pred_labels_idx = np.argmax(preds['logits'], axis=1)
The issue I am facing is that the length of pred_labels_idx does not match the cardinality of ds_te_encoded:
len(pred_labels_idx) #426820
tf.data.experimental.cardinality(ds_te_encoded) #<tf.Tensor: shape=(), dtype=int64, numpy=21341>
Not sure why this is happening.
Since ds_te_encoded is of type tf.data.Dataset and you call cardinality(...) on it, the cardinality in your case is simply the (rounded) number of batches, not the number of samples. So I am assuming you are using a batch size of 20, because 426820 / 20 = 21341. That is probably what is causing the confusion.
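A quick way to check this (a sketch, assuming ds_te_encoded is a batched tf.data.Dataset as in your code) is to compare the batch count with the number of individual samples after unbatching:

import tensorflow as tf

# cardinality() on a batched dataset counts batches, not samples
num_batches = tf.data.experimental.cardinality(ds_te_encoded).numpy()   # 21341 in your case

# unbatch() yields one element per sample, so counting them gives the sample total
num_samples = sum(1 for _ in ds_te_encoded.unbatch())                   # 426820 in your case

print(num_batches, num_samples, num_samples / num_batches)              # last value ~ batch size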
I am looking at these two questions and documentation:
Whats the output for Keras categorical_accuracy metrics?
Categorical crossentropy need to use categorical_accuracy or accuracy as the metrics in keras?
https://keras.io/api/metrics/probabilistic_metrics/#categoricalcrossentropy-class
For classification of X-ray images (15 classes) I do:
# Compile a model
model1.compile(optimizer='adam', loss='categorical_crossentropy',
               metrics=['accuracy'])

# Fit the model
history1 = model1.fit_generator(train_generator, epochs=10,
                                steps_per_epoch=10, verbose=1,
                                validation_data=valid_generator)
My model works and I can see the training output. But I am not sure how to add validation accuracy here to compare results and avoid over/underfitting.
I hope the following can help you:
The use of "categorical_crossentropy" tells me that your labels are a one hot encoding over different classes.
Let's say you have 15 classes, the correct prediction would be a vector with 14 zeros, and a one at the corresponding index. In this context "accuracy" will be very high as your model will be correctly predicting mostly zero everywhere, so the accuracy should easily be at least 13/15 = 0.86.
A more suitable metric would be "categorical_accuracy" which will give you 1 if the model predicts the correct index, and else 0.
If you have a validation "categorical_accuracy" better than 1/15 = 0.067 (assuming your class are correctly balanced), your model is better than random.
You can find a list of metrics in the Keras metrics documentation.
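A minimal sketch with your compile/fit calls (keeping your model1, train_generator and valid_generator): once a metric is compiled in, its validation counterpart is recorded per epoch in the history object, which is what you would compare against the training curve to spot over/underfitting.

model1.compile(optimizer='adam',
               loss='categorical_crossentropy',
               metrics=['categorical_accuracy'])

# fit() accepts generators in TF 2.x; fit_generator() behaves the same on older versions
history1 = model1.fit(train_generator, epochs=10, steps_per_epoch=10,
                      verbose=1, validation_data=valid_generator)

# Per-epoch training vs. validation accuracy
print(history1.history['categorical_accuracy'])
print(history1.history['val_categorical_accuracy'])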
I save my Keras model in two ways:
1. "ModelCheckpoint"
2. "save_weights" after training the model
But the performance of the two differs when I load the trained models and call "predict".
My code is as follows
Train & Save Model
model_checkpoint = ModelCheckpoint("Model_weights.hdf5", verbose=1, save_best_only=True)
early_stopping = EarlyStopping(monitor='val_loss', patience=20, verbose=1, restore_best_weights=True)
hist = Model.fit(x=train_dict, y=train_label,
batch_size=batch_size, epochs=epochs,
validation_data=(valid_dict, valid_label),
callbacks=[csv_logger, early_stopping, model_checkpoint])
Model.save_weights("Model_weights.h5")
Load Trained Model and Test
hdf5_model = load_model("Model_weights.hdf5")   # full model saved by ModelCheckpoint

h5_model = create_model()                       # construct model skeleton
h5_model.load_weights("Model_weights.h5")       # load the weights saved with save_weights
There is a difference between the outputs of "hdf5_model.predict(train_dict)" and "h5_model.predict(train_dict)".
First, you need to understand what ModelCheckpoint actually does: with save_best_only=True it saves only the best weights. You can see the loss and accuracy for each epoch during training; they change on each epoch, sometimes increasing and sometimes decreasing as the model continuously updates its weights.
Let's assume a situation: you're training your model for 50 epochs. It's possible that you get loss = 0.25 on the 45th epoch and loss = 0.37 on the 50th epoch. That's very normal. ModelCheckpoint will keep only the 45th epoch's weights; it won't update them on the 50th epoch, because it saves the weights only when the monitored loss improves (you can change this logic via its parameters). But if you save the weights after training has completed, they will correspond to the final loss of 0.37, which is higher.
So it's very normal that the model saved via ModelCheckpoint has a lower loss value and the final model has a higher one. That's why you're getting different predictions from the two models.
If you take a look at the graph below, you can see that the best loss value was achieved on the 98th epoch. So your ModelCheckpoint saved the weights at the 98th epoch and never updated them after that.
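A sketch of how the two files relate (using the file names and create_model() from the question): ModelCheckpoint with save_best_only=True writes the best-epoch model, while save_weights writes whatever the weights are at the end of training, and each file has to be loaded accordingly.

from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import load_model

# Keeps only the epoch with the lowest monitored val_loss
model_checkpoint = ModelCheckpoint("Model_weights.hdf5", monitor="val_loss",
                                   save_best_only=True, verbose=1)

# ... Model.fit(..., callbacks=[model_checkpoint]) ...

# The .hdf5 file is a full saved model, so load_model() is enough
best_model = load_model("Model_weights.hdf5")

# The .h5 file holds weights only, so rebuild the architecture and load into it
final_model = create_model()
final_model.load_weights("Model_weights.h5")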
I have trained an RNN/LSTM model. I would like to interpret my model results after plotting the graphs for loss and accuracy (between the training and validation data sets).
My objective is to classify the labels (either 0 or 1) when I provide only a partial input to the model; I have performed the training accordingly.
Train_Validate_Test_Split
Train 80%; Validation 10%; Test 10%
X_train_shape : (243, 100, 5)
Y_train_shape : (243,)
X_validate_shape : (31, 100, 5)
Y_validate_shape : (31,)
X_test_shape : (28, 100, 5)
Y_test_shape : (28,)
Model Summary
Model Graph
Questions and interpretation of the model results
Q1: What can I understand/interpret from the loss and accuracy graphs? How can I confirm whether the model trained properly for my data set or not?
Q2: Do the oscillations in both loss and accuracy have some effect on model training, or is that normal behavior? If not, how can I regularize my model so that it trains without oscillations?
Q3: What can I interpret or understand from my metrics table? My Y_test accuracy is higher than the train and validation accuracy; what can I interpret from this behavior?
Model Metrics