Keras save_weights and ModelCheckpoint Difference - python

I save my Keras model in two ways:
1. "ModelCheckpoint"
2. "save_weights" after training the model
But the performance of the two differs when I load the trained model using "load_weights" and "predict".
My code is as follows
Train & Save Model
model_checkpoint = ModelCheckpoint("Model_weights.hdf5", verbose=1, save_best_only=True)
early_stopping = EarlyStopping(monitor='val_loss', patience=20, verbose=1, restore_best_weights=True)
hist = Model.fit(x=train_dict, y=train_label,
                 batch_size=batch_size, epochs=epochs,
                 validation_data=(valid_dict, valid_label),
                 callbacks=[csv_logger, early_stopping, model_checkpoint])
Model.save_weights("Model_weights.h5")
Load Trained Model and Test
Model = create_model() # Construct model skeleton
hdf5_model = load_model("Model_weights.hdf5")
h5_model = load_model("Model_weights.h5")
There is a difference between "hdf5_model.predict(train)" and "h5_model.predict(train)".

First, you need to understand what ModelCheckpoint actually does: it saves only the best weights. You can see the loss and accuracy for each epoch during training; they change from epoch to epoch, sometimes increasing and sometimes decreasing, as the model continuously updates its weights.
Let's assume a situation. You're training your model for 50 epochs. It's entirely possible to get loss = 0.25 on the 45th epoch and loss = 0.37 on the 50th epoch. That's very normal. ModelCheckpoint will save only the 45th epoch's weights and will not overwrite them on the 50th epoch, because with save_best_only=True it saves only when the monitored loss improves (you can change this behaviour via its parameters). But if you save the weights after training has completed, you save the model with the higher loss of 0.37.
So it's entirely normal that the model saved via ModelCheckpoint has a lower loss value and the final model a higher one. That's why you're getting different predictions from these two models.
If you take a look at the graph below, you can see the best loss value was achieved on the 98th epoch, so ModelCheckpoint saved the weights at the 98th epoch and never updated them afterwards.
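A minimal sketch of the difference, assuming a generic create_model() and placeholder data names (x_train, y_train, x_val, y_val are illustrative, not from the question): the checkpoint file holds the best-epoch weights, while save_weights called after fit() holds the last-epoch weights.
from tensorflow.keras.callbacks import ModelCheckpoint

# Hypothetical sketch: create_model(), x_train, y_train, x_val, y_val are placeholders.
model = create_model()
checkpoint = ModelCheckpoint("best.h5", monitor="val_loss",
                             save_best_only=True, save_weights_only=True)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=50, callbacks=[checkpoint])

model.save_weights("last.h5")        # weights from the final (50th) epoch

best_model = create_model()
best_model.load_weights("best.h5")   # weights from the epoch with the lowest val_loss

# best_model.predict(x_val) and model.predict(x_val) will generally differ,
# because the weights come from different epochs.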

Related

Tensorflow model pruning gives 'nan' for training and validation losses

I'm trying to prune a base model that consists of several layers on top of a VGG network. It also contains a user-defined layer named instance_normalization. For pruning to be successful, I've defined the get_prunable_weights function of this layer as follows:
### defined for model pruning
def get_prunable_weights(self):
    return self.weights
I used the following function to obtain a to-be-pruned model structure using a base model named model:
def define_prune_model(self, model, img_shape, epochs, batch_size, validation_split=0.1):
    num_images = img_shape[0] * (1 - validation_split)
    end_step = np.ceil(num_images / batch_size).astype(np.int32) * epochs

    # Define model for pruning.
    pruning_params = {
        'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.5,
                                                                 final_sparsity=0.80,
                                                                 begin_step=0,
                                                                 end_step=end_step)
    }
    model_for_pruning = prune_low_magnitude(model, **pruning_params)

    model_for_pruning.compile(optimizer='adam',
                              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                              metrics=['accuracy'])
    model_for_pruning.summary()
    return model_for_pruning
Then, I wrote the following function to perform training on this pruning model:
def train_prune_model(self, model_for_pruning, train_images, train_labels,
                      epochs, batch_size, validation_split=0.1):
    callbacks = [
        tfmot.sparsity.keras.UpdatePruningStep(),
        tfmot.sparsity.keras.PruningSummaries(log_dir='./models/pruned'),
    ]
    model_for_pruning.fit(train_images, train_labels,
                          batch_size=batch_size, epochs=epochs,
                          validation_split=validation_split,
                          callbacks=callbacks)
    return model_for_pruning
When training, however, I found that the training and validation losses were all nan and the final model's predictions were all zero, even though the base model passed to define_prune_model had trained successfully and predicted correctly.
How can I solve this? Thank you in advance.
It is difficult to pinpoint the issue without more information. In particular, can you give more detail (preferably as code) about your custom instance_normalization layer?
Assuming that the code is fine: since you mention that the model trains correctly without pruning, could it be that the pruning parameters are too harsh? After all, those options set 50% of the weights to zero right from the first training step.
Here is what I would try (a sketch of a gentler setup follows the list):
Experiment with a lower level of sparsity (especially initial_sparsity).
Start applying pruning later during training (the begin_step argument of the pruning schedule). Some even prefer to train the model once without any pruning at all, then re-train it with prune_low_magnitude().
Only prune at some steps, giving the model time to recover between prunings (the frequency argument).
Finally, should it still fail, try the usual cures for nan losses: reduce the learning rate, use regularization or gradient clipping, ...
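A sketch of such a gentler setup, using the same tfmot.sparsity.keras API as the question; the concrete numbers (warm-up steps, sparsity targets, frequency, learning rate) are illustrative assumptions, not tuned values:
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Illustrative numbers only: start from 0% sparsity, begin pruning after a
# warm-up period, and prune only every `frequency` steps so the model can
# recover between pruning events.
warmup_steps = 1000      # assumption: roughly a few epochs of ordinary training
end_step = 10000         # assumption: num_images / batch_size * epochs in the real code

pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=0.5,
        begin_step=warmup_steps,
        end_step=end_step,
        frequency=200),
}
# `model` is the already-built base model from the question.
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
model_for_pruning.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),   # lower LR as a nan safeguard
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])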

PyTorch - Creating Federated CIFAR-10 Dataset

I'm training a neural network (it doesn't matter which one) on the CIFAR-10 dataset, using federated learning:
I have 10 models, each with access to its own part of the dataset. At every time step, each model makes a step using its own data, and then the global model is the average of the client models (this version is based on this, but I tried a lot of options):
def server_aggregate(server_model, client_models):
    global_dict = server_model.state_dict()
    for k in global_dict.keys():
        global_dict[k] = torch.stack([client_models[i].state_dict()[k].float()
                                      for i in range(len(client_models))], 0).mean(0)
    server_model.load_state_dict(global_dict)
    for model in client_models:
        model.load_state_dict(server_model.state_dict())
To be specific, each machine only has access to the data corresponding to a single class, i.e. machine 0 has only samples of class 0, etc. I'm doing it the following way:
def split_into_classes(full_ds, batch_size, num_classes=10):
    class2indices = [[] for _ in range(num_classes)]
    for i, y in enumerate(full_ds.targets):
        class2indices[y].append(i)
    datasets = [torch.utils.data.Subset(full_ds, indices) for indices in class2indices]
    return [DataLoader(ds, batch_size=batch_size, shuffle=True) for ds in datasets]
Problem. During training, I can see that my federated training loss decreases. However, I never see my test loss/accuracy improve (acc is always around 10%).
Moreover, when I check accuracy on train/test datasets:
For the federated dataset, the accuracy improves.
For the testing dataset, the accuracy doesn't improve.
(Most surprising) for the training dataset, the accuracy doesn't improve. Note that this dataset is essentially the same as federated dataset, but not split into classes. The checking code is the following:
def epoch_summary(model, fed_loaders, true_train_loader, test_loader, frac):
    with torch.no_grad():
        train_len = 0
        train_loss, train_acc = 0, 0
        for train_loader in fed_loaders:
            cur_loss, cur_acc, cur_len = true_results(model, train_loader, frac)
            train_loss += cur_len * cur_loss
            train_acc += cur_len * cur_acc
            train_len += cur_len
        train_loss /= train_len
        train_acc /= train_len
        true_train_loss, true_train_acc, true_train_len = true_results(model, true_train_loader, frac)
        test_loss, test_acc, test_len = true_results(model, test_loader, frac)
        print("TrainLoss: {:.4f} TrainAcc: {:.2f} TrueLoss: {:.4f} TrueAcc: {:.2f} TestLoss: {:.4f} TestAcc: {:.2f}".format(
            train_loss, train_acc, true_train_loss, true_train_acc, test_loss, test_acc
        ), flush=True)
The full code can be found here. Things which don't seem to matter:
Model. I got the same problem for Resnet models and for some other models.
How I aggregate the models. I tried using state_dict or directly manipulating model.parameters(); no effect.
How I train the models. I tried using optim.SGD or directly updating param.data -= learning_rate * param.grad; no effect.
Computational graph. I've tried adding .detach().clone() and with torch.no_grad() into all possible places, no effect.
So I'm suspecting that the problem is somehow with the federated data itself (especially given strange accuracy results). What can be a problem?
10% on CIFAR-10 is basically random: your model outputs labels at random and gets 10% of them right.
I think the problem lies in your "federated training" strategy: you cannot expect your sub-models to learn anything meaningful when all they see is a single label. This is why training data is shuffled.
Think of it: if each of your sub-models learns all weights to be zero apart from the bias vector of the last classification layer, with a 1 in the entry corresponding to the one class that sub-model sees, then each sub-model's training is perfect (it gets every training sample it sees right), but the averaged model is meaningless.
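A toy sketch of this degenerate case (purely illustrative, not the question's actual model): each client can fit its single-class data perfectly with a large bias on its own class, yet averaging those biases yields a model with no class preference at all.
import torch

num_classes = 10

# Each "client" fits its one-class data perfectly with a bias favouring only that class.
client_biases = []
for c in range(num_classes):
    b = torch.zeros(num_classes)
    b[c] = 10.0                      # client c only ever needs to predict class c
    client_biases.append(b)

# Federated averaging of these degenerate solutions:
avg_bias = torch.stack(client_biases, 0).mean(0)
print(avg_bias)                      # every entry equals 1.0 -> no class is preferred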

Tensorflow neural network doesn’t learn

I built a neural network for a university project. The goal is to find out if sensor data (temperature, humidity and light) can predict if the sunrise happened during a given time frame. So, it is a binary classification.
The problem is that the network does not learn. The accuracy converges towards about 0.8 and does not change after about 5 epochs. Same with the loss, which sits at about 0.4921 after a few epochs. I tried several things like changing the activation function or the number of hidden layers, but nothing worked.
I also created a dataset with an equal amount of "sunrise = 1" and "sunrise = 0" data points. The accuracy ended up at exactly 0.5. Therefore I think that there is something wrong with the network setup itself.
Do you have any idea what could be wrong?
Here is my code:
def build_network():
    input = keras.Input(shape=(4, 25), name="input")
    hidden = layers.Dense(1000, activation="sigmoid", name="dense1")(input)
    hidden = layers.Dense(1000, activation="sigmoid", name="dense2")(hidden)
    hidden = layers.Flatten()(hidden)
    hidden = layers.Dense(500, activation="sigmoid", name="dense3")(hidden)
    hidden = layers.Dense(500, activation="sigmoid", name="dense4")(hidden)
    hidden = layers.Dense(10, activation="sigmoid", name="dense5")(hidden)
    output = layers.Dense(1, activation="sigmoid", name="output")(hidden)
    model = keras.Model(inputs=input, outputs=output, name="sunrise_model")
    return model
def train_model():
    training_files = r'data/training'
    test_files = r'data/test'

    print('reading files...')
    train_x, train_y = load_data(training_files)
    test_x, test_y = load_data(test_files)

    print("training network")

    # compile model
    model = build_network()
    model.compile(
        loss=keras.losses.BinaryCrossentropy(from_logits=False),
        optimizer=keras.optimizers.RMSprop(),
        metrics=["accuracy"],
    )

    # Train / fit
    model.fit(train_x, train_y, batch_size=100, epochs=200)

    # evaluate
    test_scores = model.evaluate(test_x, test_y, verbose=2)
    print("Test loss:", test_scores[0])
    print("Test accuracy:", test_scores[1])
Here is the output:
loss: 0.4921 - accuracy: 0.8225
Test loss: 0.4921109309196472
Test accuracy: 0.8225
And here is an example of the data: https://hastebin.com/hazipagija.json
I would use ReLU instead of sigmoid as the activation function for the hidden layers. What learning rate did you use? Try a smaller one. Actually, I find I get the best results using a variable learning rate; the Keras callback ReduceLROnPlateau makes this easy to do (documentation is here). I also recommend the Keras callback ModelCheckpoint to save the model with the lowest validation loss, then use that model to make predictions on the test set (documentation is here). I also think your model has too many parameters and will overfit. Add dropout layers to help reduce this problem, or, as a good alternative, reduce the model complexity: take out one of the layers with 1000 nodes and one of the layers with 500 nodes and see what results you get. I also prefer the Adamax optimizer (documentation is here); use the default values.
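A sketch of those suggestions applied to the model above; the layer sizes, dropout rate, and callback settings are illustrative assumptions rather than tuned values, and the fit call is commented out because the data loading is not shown here.
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import ReduceLROnPlateau, ModelCheckpoint

def build_network():
    inputs = keras.Input(shape=(4, 25), name="input")
    hidden = layers.Dense(500, activation="relu", name="dense1")(inputs)   # fewer, ReLU-activated layers
    hidden = layers.Flatten()(hidden)
    hidden = layers.Dense(250, activation="relu", name="dense2")(hidden)
    hidden = layers.Dropout(0.3)(hidden)                                   # dropout against overfitting
    hidden = layers.Dense(10, activation="relu", name="dense3")(hidden)
    outputs = layers.Dense(1, activation="sigmoid", name="output")(hidden)
    return keras.Model(inputs=inputs, outputs=outputs, name="sunrise_model")

model = build_network()
model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.Adamax(),      # default values
              metrics=["accuracy"])

callbacks = [
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    ModelCheckpoint("best_sunrise.h5", monitor="val_loss", save_best_only=True),
]
# model.fit(train_x, train_y, validation_split=0.2,
#           batch_size=100, epochs=200, callbacks=callbacks)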

Training with keras using fragments of data

I train a sequential model (20 dense layers) in Keras (Python), using default settings and just 1 epoch.
All layers are activated with relu, except the last one, which uses sigmoid.
METHOD A:
Feed model with 1,000,000 records of labeled training data.
METHOD B:
Train model with 50,000 records
Save the model
Do some stuff
Load saved model
Train with another 50,000 records
Repeat until all 1,000,000 records are used
Why is there a discrepancy between the above 2 methods?
I always get better accuracy using all the data at once than using it in groups.
What is the reason for that?
model = Sequential()
model.add(Dense(30, input_dim = 27, activation = 'relu'))
...
model.add(Dense(1, input_dim = 10, activation = 'sigmoid'))
model.compile(loss = 'binary_crossentropy', optimizer = 'sgd', metrics = ['accuracy'])
model.load_weights(PreviousWeightsFile)
model.fit(X, Y, verbose = 0)
model.save_weights(WeightsFile)
(exit python and do some stuff)
From the documentation, here are the crucial fit() parameters for your question:
initial_epoch: Integer. Epoch at which to start training (useful for
resuming a previous training run).
and
epochs: Integer. Number of epochs to train the model. An epoch is an
iteration over the entire x and y data provided. Note that in
conjunction with initial_epoch, epochs is to be understood as "final
epoch". The model is not trained for a number of iterations given by
epochs, but merely until the epoch of index epochs is reached.
You are not using these parameters, therefore you are overwriting your weights and are not resuming training the way you could with the initial_epoch and epochs parameters. That's the reason why your model always performs worse with method B.
With all the data present, the interactions between features and the resulting backpropagation updates are more accurate; this allows the features and the architecture of the model to build upon additional epochs.
When you save and reload, you essentially restart this.
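A minimal sketch of what resuming with these parameters could look like in method B, reusing the variable names from the question's snippet; chunk_index is a hypothetical counter you would persist between script invocations (e.g. in a small text file):
# Hypothetical continuation of method B, keeping the epoch counter across runs.
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.load_weights(PreviousWeightsFile)

# chunk_index is assumed to be loaded from disk and incremented on every run.
model.fit(X, Y,
          verbose=0,
          initial_epoch=chunk_index,     # epoch at which to resume
          epochs=chunk_index + 1)        # "final epoch": train exactly one more epoch

model.save_weights(WeightsFile)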

Keras: Optimal epoch selection

I'm trying to write some logic that selects the best epoch to run a neural network in Keras. My code saves the training loss and the test loss for a set number of epochs and then picks the best fitting epoch according to some logic. The code looks like this:
ini_epochs = 100

df_train_loss = DataFrame(data=history.history['loss'], columns=['Train_loss'])
df_test_loss = DataFrame(data=history.history['val_loss'], columns=['Test_loss'])
df_loss = concat([df_train_loss, df_test_loss], axis=1)

Min_loss = max(df_loss['Test_loss'])
for i in range(ini_epochs):
    Test_loss = df_loss['Test_loss'][i]
    Train_loss = df_loss['Train_loss'][i]
    if Test_loss > Train_loss and Test_loss < Min_loss:
        Min_loss = Test_loss
The idea behind the logic is this: to get the best model, the selected epoch should be the one with the lowest test loss value, but that value must stay above the training loss to avoid overfitting.
In general, this epoch selection method works OK. However, if the test loss value is below the train loss from the start, this method picks an epoch of zero (see below).
Now I could add another if statement assessing whether the difference between the test and train losses is positive or negative, and then write logic for each case, but what happens if the difference starts positive and then ends up negative? I get confused and haven't been able to write effective code.
So, my questions are:
1) Can you show me what code you would write to account for the situation shown in the graph (and for the case where the test and train loss curves cross)? I'd say the strategy would be to take the epoch with the minimum difference.
2) There is a good chance that I'm going about this the wrong way. I know Keras has a callbacks feature, but I don't like the idea of using the save_best_only option because it can save overfitted models. Any advice on a more efficient epoch selection method would be great.
Use EarlyStopping, which is available in Keras. Early stopping basically stops training once your validation loss starts to increase (or, in other words, validation accuracy starts to decrease). Use ModelCheckpoint to save the model wherever you want.
from keras.callbacks import EarlyStopping, ModelCheckpoint

STAMP = 'simple_lstm_glove_vectors_%.2f_%.2f' % (rate_drop_lstm, rate_drop_dense)

early_stopping = EarlyStopping(monitor='val_loss', patience=5)
bst_model_path = STAMP + '.h5'
model_checkpoint = ModelCheckpoint(bst_model_path, save_best_only=True, save_weights_only=True)

hist = model.fit(data_train, labels_train,
                 validation_data=(data_val, labels_val),
                 epochs=50, batch_size=256, shuffle=True,
                 callbacks=[early_stopping, model_checkpoint])

model.load_weights(bst_model_path)
Refer to this link for more info.
Here is a simple example illustrating how to use early stopping in Keras.
First, the necessary imports:
from keras.callbacks import EarlyStopping, ModelCheckpoint
Set up early stopping:
# Set callback functions to early stop training and save the best model so far
callbacks = [EarlyStopping(monitor='val_loss', patience=2),
             ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=True)]
Train the neural network:
history = network.fit(train_features,        # Features
                      train_target,          # Target vector
                      epochs=20,             # Number of epochs
                      callbacks=callbacks,   # Early stopping
                      verbose=0,             # Print description after each epoch
                      batch_size=100,        # Number of observations per batch
                      validation_data=(test_features, test_target))  # Data for evaluation
See the full example here.
Please also check Stop Keras Training when the network has fully converged; see the answer by Daniel there.
