How to get accuracy from a callback function into a Keras Sequence - Python

How can I get the accuracy calculated at each epoch inside a Keras Sequence?
The accuracy after each epoch is printed to the console like this:
Epoch 14/500
90/90 [==============================] - 17s 184ms/step - loss: 0.6935 - sparse_categorical_accuracy: 0.5174 - val_loss: 0.6927 - val_sparse_categorical_accuracy: 0.5146 - lr: 0.0010
I am using tf.keras.utils.Sequence and change the dataset every epoch in on_epoch_end.
I want to use this accuracy inside the Sequence to decide whether to change the dataset or not.
How can I pass (or read) this accuracy from a callback into the Sequence?
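A minimal sketch of one way to wire this up (the class names, the accuracy key, and the 0.8 threshold are all assumptions for illustration, not from the original post): give a small custom callback a reference to the Sequence and copy each epoch's logged metric into it, then let the Sequence consult that value in its own on_epoch_end.

import numpy as np
import tensorflow as tf

class SwitchingSequence(tf.keras.utils.Sequence):
    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size
        self.last_accuracy = None  # written by the callback below

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[batch], self.y[batch]

    def on_epoch_end(self):
        # Use the metric the callback stored to decide whether to swap data.
        if self.last_accuracy is not None and self.last_accuracy > 0.8:
            pass  # replace self.x / self.y with the next dataset here

class MetricToSequence(tf.keras.callbacks.Callback):
    def __init__(self, sequence):
        super().__init__()
        self.sequence = sequence

    def on_epoch_end(self, epoch, logs=None):
        # Copy the epoch's logged accuracy into the Sequence.
        self.sequence.last_accuracy = (logs or {}).get('sparse_categorical_accuracy')

seq = SwitchingSequence(x_train, y_train, batch_size=32)
model.fit(seq, epochs=500, callbacks=[MetricToSequence(seq)])

One caveat: depending on the TF version, the Sequence's on_epoch_end may fire before the callback's, so the stored value may only influence the swap one epoch later.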

Related

Why are the results on the same val set different between Model.fit() and Model.evaluate()?

I want to use a ResNet50 for a regression task, trained with a custom loss. I want to use checkpoints to save the best model, i.e. the one with the minimum loss on the test data. The code for the model's training is as follows:
import tensorflow as tf
from tensorflow import keras

# EWC_loss, fisher_1, prior_weights_1 and the data arrays are defined elsewhere.
input_shape = (32, 32, 1)
inputs = keras.Input(shape=input_shape)
outputs = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_tensor=None,
    input_shape=input_shape, pooling='max'
)(inputs)
outputs = keras.layers.Dense(1, activation=None)(outputs)
model = keras.Model(inputs, outputs)
model.compile(optimizer='adam',
              loss=EWC_loss(model, fisher_1, prior_weights_1, Lambda=1),
              metrics='mse')

checkpoint_filepath_3 = 'F:/NTU_PyCode/CL_regression_mnist/saved_resnet/resnet50_task2_epoch=5(1).h5'
model_checkpoint_callback_2 = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath_3,
    save_weights_only=True,
    monitor='val_loss',
    mode='min',
    save_best_only=True)

model.fit(x_train_2, y_train_2, batch_size=32, shuffle=True,
          validation_data=(x_test_2, y_test_2), epochs=5,
          callbacks=[model_checkpoint_callback_2])
And here are the training results. My expectation is that the model's weights after the 3rd epoch will be saved to checkpoint_filepath_3, because that epoch has the minimum val_loss (val_mse is not at its minimum there because the custom loss involves other terms).
Epoch 1/5
2/1875 [..............................] - ETA: 1:07 - loss: 8.4497 - mse: 8.4489WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0239s vs `on_train_batch_end` time: 0.0449s). Check your callbacks.
1875/1875 [==============================] - 136s 73ms/step - loss: 2.6100 - mse: 2.5062 - val_loss: 5.5797 - val_mse: 5.4108
Epoch 2/5
1875/1875 [==============================] - 129s 69ms/step - loss: 1.2896 - mse: 1.1265 - val_loss: 1.6604 - val_mse: 1.4745
Epoch 3/5
1875/1875 [==============================] - 128s 68ms/step - loss: 0.9861 - mse: 0.7998 - val_loss: 1.4171 - val_mse: 1.2161
Epoch 4/5
1875/1875 [==============================] - 128s 68ms/step - loss: 1.1695 - mse: 0.8958 - val_loss: 1.4705 - val_mse: 1.2034
Epoch 5/5
1875/1875 [==============================] - 129s 69ms/step - loss: 1.0095 - mse: 0.7305 - val_loss: 11.7203 - val_mse: 11.4236
But when I load the weights and use the evaluate function on the same test data, the problem appears. The loss here is not the custom loss, but the metric is still mse, so I assumed the mse reported by evaluate should match the result from fit (i.e. the val_mse of the 3rd epoch). But the MSEs are very different!
model.compile(optimizer='adam',
              loss=tf.keras.losses.mse,
              metrics='mse')
print("EWC model on Task 2")
model.load_weights(checkpoint_filepath_3)
model.evaluate(x_test_2, y_test_2)
EWC model on Task 2
313/313 [==============================] - 4s 13ms/step - loss: 9.1384 - mse: 9.1384
What causes this? Were the weights not saved to the checkpoint, or is something else going on? Thank you in advance~
After more experiments, I found a puzzling phenomenon. If I run the training and evaluation code together, the results are correct! The results for 2 epochs of training plus evaluation are shown below, and the MSEs are the same.
Epoch 1/2
2/1875 [..............................] - ETA: 59s - loss: 15.2813 - mse: 15.2805WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0190s vs `on_train_batch_end` time: 0.0439s). Check your callbacks.
1875/1875 [==============================] - 137s 73ms/step - loss: 2.0093 - mse: 1.9253 - val_loss: 1.8885 - val_mse: 1.7217
Epoch 2/2
1875/1875 [==============================] - 129s 69ms/step - loss: 1.1946 - mse: 1.0230 - val_loss: 1.1102 - val_mse: 0.9254
EWC model on Task 2
313/313 [==============================] - 4s 13ms/step - loss: 0.9254 - mse: 0.9254
But if I train and evaluate separately (run the training code first, then just load the saved weights into the model and evaluate), the results are different.
EWC model on Task 2
313/313 [==============================] - 4s 14ms/step - loss: 9.0702 - mse: 9.0702
Why is that? It's really confusing. Is there any difference between training and evaluating in one run versus doing them separately?
I don't understand all the details, but when you use ModelCheckpoint to save the weights only (or even the whole model), model.load_weights performs a complex restoration process, as described here. When you recompile before loading the weights, that restoration process apparently gets disrupted: I found a note saying that changing model.compile can cause the restoration to fail.
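Based on that note, a likely workaround (a sketch under that assumption, reusing the question's variable names) is to load the checkpointed weights before recompiling with the evaluation loss:

model = keras.Model(inputs, outputs)
model.load_weights(checkpoint_filepath_3)  # restore first, before compile changes anything
model.compile(optimizer='adam',
              loss=tf.keras.losses.mse,
              metrics='mse')
model.evaluate(x_test_2, y_test_2)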

Does EarlyStopping keep the model from the last epoch or the one with the best score?

I trained a model in keras using EarlyStopping in callbacks with patience=2:
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
          validation_split=validation_split,
          callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0,
                                                   patience=2, verbose=0, mode='min'),
                     keras.callbacks.ModelCheckpoint('best_model.h5', monitor='val_loss',
                                                     mode='min', verbose=1,
                                                     save_best_only=True)],
          class_weight=class_weights)
Epoch 1/20
3974/3975 [============================>.] - ETA: 0s - loss: 0.3499 - accuracy: 0.7683
Epoch 00001: val_loss improved from inf to 0.30331, saving model to best_model.h5
3975/3975 [==============================] - 15s 4ms/step - loss: 0.3499 - accuracy: 0.7683 - val_loss: 0.3033 - val_accuracy: 0.8134
Epoch 2/20
3962/3975 [============================>.] - ETA: 0s - loss: 0.2821 - accuracy: 0.8041
Epoch 00002: val_loss improved from 0.30331 to 0.25108, saving model to best_model.h5
3975/3975 [==============================] - 14s 4ms/step - loss: 0.2819 - accuracy: 0.8043 - val_loss: 0.2511 - val_accuracy: 0.8342
Epoch 3/20
3970/3975 [============================>.] - ETA: 0s - loss: 0.2645 - accuracy: 0.8157
Epoch 00003: val_loss did not improve from 0.25108
3975/3975 [==============================] - 14s 4ms/step - loss: 0.2645 - accuracy: 0.8157 - val_loss: 0.2687 - val_accuracy: 0.8338
Epoch 4/20
3962/3975 [============================>.] - ETA: 0s - loss: 0.2553 - accuracy: 0.8223
Epoch 00004: val_loss did not improve from 0.25108
3975/3975 [==============================] - 15s 4ms/step - loss: 0.2553 - accuracy: 0.8224 - val_loss: 0.2836 - val_accuracy: 0.8336
Wall time: 58.4 s
Obviously the model didn't improve after epoch 2, but patience=2 led the algorithm to terminate after 4 epochs. When I run model.evaluate now, does it take the model trained after 4 epochs or does it take the model trained after 2 epochs?
Is there any need to save and load the best model with ModelCheckpoint then in order to evaluate?
In your specific case, it retains the model after 4 epochs:
As you can see, the model did indeed keep improving after the first 2 epochs:
accuracy
0.7683 - 0.8043 - 0.8157 - 0.8224
loss
0.3499 - 0.2821 - 0.2645 - 0.2553
Early stopping would have stopped your training after the first two epochs had there been no improvement. In this situation your loss wasn't stuck after the first two epochs, so early stopping was not triggered at the end of epoch 2; training continued until the patience ran out, and the model returned is the one from the end of all 4 epochs.
From tensorflow:
Assuming the goal of a training is to minimize the loss. With this, the metric to be monitored would be 'loss', and mode would be 'min'. A model.fit() training loop will check at end of every epoch whether the loss is no longer decreasing, considering the min_delta and patience if applicable. Once it's found no longer decreasing, model.stop_training is marked True and the training terminates.
If your model had not improved for two epochs, the returned model would have been the one after the first two epochs.
The restore_best_weights parameter you point to is not directly connected to early stopping. It simply restores the weights of the model from the best detected performance. Again from tf:
Whether to restore model weights from the epoch with the best value of the monitored quantity. If False, the model weights obtained at the last step of training are used.
To give an example, you can use restore_best_weights even without early stopping. The training would then end only at your set number of epochs, but the model returned at the end would be the one with the best performance. This can be useful when the loss bounces around due to a wrong learning rate and/or optimizer.
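A minimal sketch of that combination (reusing the question's variable names; X_val/y_val are an assumed held-out set): with restore_best_weights=True the model returned by fit already carries the best epoch's weights, so a separate ModelCheckpoint is not strictly needed just to evaluate the best model:

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=2, mode='min',
    restore_best_weights=True)  # roll back to the best epoch's weights on stop
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
          validation_split=validation_split,
          callbacks=[early_stop], class_weight=class_weights)
model.evaluate(X_val, y_val)  # evaluates the restored (best) weights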
refs: early stopping tf doc

Printing from a callback disrupts data printed by model.fit and data values are different

I am training a model using TensorFlow. I have a custom callback which, on epoch end, prints out some data, specifically training accuracy or validation loss. With verbose=1 in model.fit, TensorFlow prints out the training loss, training accuracy, val_loss and val_accuracy as shown below. Apparently it is also using tqdm to print the progress bar. The problem is that if I print out the training accuracy captured from acc = logs.get('accuracy') in the callback at the end of the epoch, it is different from the value model.fit prints out. It is as if the model.fit accuracy value is not from the last batch of the epoch but from the next-to-last batch. You can also see that my printout from the callback interrupts the model.fit printout. Note that for epoch 1 I have accuracy 0.9172 while the model.fit value is either 0.8805 or 0.8808.
Does anyone know why this is or how to fix it? The data is shown below:
Epoch 1/30
129/129 [==============================] - ETA: 0s - loss: 3.1791 - accuracy: 0.8805
training accuracy improved from 0.0000 to 0.9172 learning rate held at 0.002000 # callback printed data
129/129 [==============================] - 50s 389ms/step - loss: 3.1723 - accuracy: 0.8808 - val_loss: 3.1509 - val_accuracy: 0.5783
Epoch 2/30
129/129 [==============================] - ETA: 0s - loss: 0.9478 - accuracy: 0.9671
training accuracy improved from 0.9172 to 0.9661 learning rate held at 0.002000 # calback printed data
129/129 [==============================] - 43s 333ms/step - loss: 0.9466 - accuracy: 0.9671 - val_loss: 0.9385 - val_accuracy: 0.8000
Epoch 3/30
129/129 [==============================] - ETA: 0s - loss: 0.4501 - accuracy: 0.9787
OK, it's a long story, but I am using TensorFlow 2.1. I had suppressed the annoying TensorFlow warnings, so I did not see any. Out of suspicion I re-enabled warnings and got the one below.
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch
time (batch time: 0.0350s vs `on_train_batch_end` time: 0.0910s). Check your callbacks.
After much searching I found that this warning shows up if you run model.fit with verbose=1. Apparently this was a bug in that TensorFlow release that caused the accuracy data printed by model.fit to be incorrect. If you run model.fit with verbose=2 you do not get the warning and the printed accuracy data is correct. Problem solved.
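For reference, a sketch of that workaround (the data and callback names are assumed placeholders, not from the original post):

model.fit(x_train, y_train, epochs=30,
          validation_data=(x_val, y_val),
          callbacks=[my_callback],  # the custom printing callback from above
          verbose=2)  # one summary line per epoch, no progress bar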

In Neural Networks: accuracy improvement after each epoch is GREATER than accuracy improvement after each batch. Why?

I am training a neural network in batches with the Keras 2.0 package for Python.
Below is some information about the data and the training parameters:
#samples in train: 414934
#features: 590093
#classes: 2 (binary classification problem)
batch size: 1024
#batches = 406 (414934 / 1024 = 405.2)
Below are some logs of the following code:
for i in range(epochs):
    print("train_model:: starting epoch {0}/{1}".format(i + 1, epochs))
    model.fit_generator(generator=batch_generator(data_train, target_train, batch_size),
                        steps_per_epoch=num_of_batches,
                        epochs=1,
                        verbose=1)
(partial) Logs:
train_model:: starting epoch 1/3
Epoch 1/1
1/406 [..............................] - ETA: 11726s - loss: 0.7993 - acc: 0.5996
2/406 [..............................] - ETA: 11237s - loss: 0.7260 - acc: 0.6587
3/406 [..............................] - ETA: 14136s - loss: 0.6619 - acc: 0.7279
404/406 [============================>.] - ETA: 53s - loss: 0.3542 - acc: 0.8917
405/406 [============================>.] - ETA: 26s - loss: 0.3541 - acc: 0.8917
406/406 [==============================] - 10798s - loss: 0.3539 - acc: 0.8918
train_model:: starting epoch 2/3
Epoch 1/1
1/406 [..............................] - ETA: 15158s - loss: 0.2152 - acc: 0.9424
2/406 [..............................] - ETA: 14774s - loss: 0.2109 - acc: 0.9419
3/406 [..............................] - ETA: 16132s - loss: 0.2097 - acc: 0.9408
404/406 [============================>.] - ETA: 64s - loss: 0.2225 - acc: 0.9329
405/406 [============================>.] - ETA: 32s - loss: 0.2225 - acc: 0.9329
406/406 [==============================] - 13127s - loss: 0.2225 - acc: 0.9329
train_model:: starting epoch 3/3
Epoch 1/1
1/406 [..............................] - ETA: 22631s - loss: 0.1145 - acc: 0.9756
2/406 [..............................] - ETA: 24469s - loss: 0.1220 - acc: 0.9688
3/406 [..............................] - ETA: 23475s - loss: 0.1202 - acc: 0.9691
404/406 [============================>.] - ETA: 60s - loss: 0.1006 - acc: 0.9745
405/406 [============================>.] - ETA: 31s - loss: 0.1006 - acc: 0.9745
406/406 [==============================] - 11147s - loss: 0.1006 - acc: 0.9745
My question is: what happens after each epoch that improves the accuracy like that? For example, the accuracy at the end of the first epoch is 0.8918, but at the beginning of the second epoch accuracy of 0.9424 is observed. Similarly, the accuracy at the end of the second epoch is 0.9329, but the third epoch starts with accuracy of 0.9756.
I would expect to find an accuracy of ~0.8918 at the beginning of the second epoch, and ~0.9329 at the beginning of the third epoch.
I know that in each batch there is one forward pass and one backward pass of training samples in the batch. Thus, in each epoch there is one forward pass and one backward pass of all training samples.
Also, from Keras documentation:
Epoch: an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation.
Why is the accuracy improvement within each epoch smaller than the accuracy improvement between the end of epoch X and the beginning of epoch X+1?
This has nothing to do with your model or your dataset; the reason for this "jump" lies in how metrics are calculated and displayed in Keras.
As Keras processes batch after batch, it saves accuracies at each one of them, and what it displays to you is not the accuracy on the latest processed batch, but the average over all batches in the current epoch. And, as the model is being trained, accuracies over successive batches tend to improve.
Now consider: in the first epoch there are, let's say, 50 batches, and the network went from 0% to 90% accuracy during these 50 batches. Then at the end of the epoch Keras will show an accuracy of, e.g., (0 + 0.1 + 0.5 + ... + 90) / 50 %, which is obviously much less than 90%! But because your actual accuracy is now 90%, the first batch of the second epoch will show 90%, giving the impression of a sudden "jump" in quality. The same obviously goes for loss or any other metric.
Now, if you want more realistic and trustworthy calculation of accuracy, loss, or any other metric you may find yourself using, I would suggest using validation_data parameter in model.fit[_generator] to provide validation data, which will not be used for training, but will be used only to evaluate the network at the end of each epoch, without averaging over various points in time.
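A toy numeric illustration of this averaging (just the arithmetic, not Keras internals; the per-batch values are made up):

batch_accuracies = [0.60, 0.70, 0.80, 0.90]  # hypothetical per-batch accuracies in epoch 1
running_means = [sum(batch_accuracies[:i + 1]) / (i + 1)
                 for i in range(len(batch_accuracies))]
print(running_means)  # [0.6, 0.65, ~0.7, 0.75] -- the epoch ends displaying 0.75
# The first batch of epoch 2 reflects the model's current ~0.9 accuracy,
# so the log shows an apparent "jump" from 0.75 to 0.9.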
The accuracy at the end of an epoch is the accuracy over the full dataset. The accuracy after each batch is the accuracy over all batches that are used for training at that moment. It could be the case that your first batch is predicted very well and the following batches have a lower accuracy. In that case the accuracy over your full dataset will be low compared to the accuracy of your first batch.

Python: Issues training and predicting regression on Keras

I'm working on a simple time series regression problem using Keras: I want to predict the next closing price using the last 20 closing prices. I have the following code, based on some examples I found.
I write my Sequential model in a separate function, as required by the build_fn parameter:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers, losses

def modelcreator():
    model = Sequential()
    model.add(Dense(500, input_shape=(20,), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(250, activation='relu'))
    model.add(Dense(1, activation='linear'))
    model.compile(optimizer=optimizers.Adam(),
                  loss=losses.mean_squared_error)
    return model
I create the KerasRegressor object, passing the model-creator function and the desired fit parameters:
estimator = KerasRegressor(build_fn=modelcreator,nb_epoch=100, batch_size=32)
I train the model through the KerasRegressor object with 592 samples:
self.estimator.fit(X_train, Y_train)
And the issues start to show up: although nb_epoch=100, my model only trains for 10 epochs:
Epoch 1/10
592/592 [==============================] - 0s - loss: 6.9555e-05
Epoch 2/10
592/592 [==============================] - 0s - loss: 1.2777e-05
Epoch 3/10
592/592 [==============================] - 0s - loss: 1.0596e-05
Epoch 4/10
592/592 [==============================] - 0s - loss: 8.8115e-06
Epoch 5/10
592/592 [==============================] - 0s - loss: 7.4438e-06
Epoch 6/10
592/592 [==============================] - 0s - loss: 8.4615e-06
Epoch 7/10
592/592 [==============================] - 0s - loss: 6.4859e-06
Epoch 8/10
592/592 [==============================] - 0s - loss: 6.9010e-06
Epoch 9/10
592/592 [==============================] - 0s - loss: 5.8951e-06
Epoch 10/10
592/592 [==============================] - 0s - loss: 7.2253e-06
When I try to get a prediction using a data sample:
prediction = self.estimator.predict(test)
The prediction value should be close to the 0.02-0.04 range, but when I print it I get 0.000980315962806344.
Q1: How can I set the training epochs to the desired value?
Q2: How can I generate predictions with my NN?
The first thing is that you are most likely using Keras 2.0, in which the parameter nb_epoch was renamed to epochs.
The second thing is that you have to normalize your inputs and outputs to the [0, 1] range; it won't work well without normalization. Also, to match the normalized output range, it would be best to use a sigmoid activation at the output layer.
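A sketch of both fixes together (a minimal example assuming the question's variable names and NumPy arrays; scaling via scikit-learn is one common choice, not the only one):

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.wrappers.scikit_learn import KerasRegressor

x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
X_train_s = x_scaler.fit_transform(X_train)
Y_train_s = y_scaler.fit_transform(np.asarray(Y_train).reshape(-1, 1))

# Keras 2 expects `epochs`, not `nb_epoch`.
estimator = KerasRegressor(build_fn=modelcreator, epochs=100, batch_size=32)
estimator.fit(X_train_s, Y_train_s)

# Predictions come back in the scaled space; invert to recover prices.
raw = estimator.predict(x_scaler.transform(test))
prediction = y_scaler.inverse_transform(np.asarray(raw).reshape(-1, 1))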
Your network is not converging. Try changing the hyperparameters; the loss should decrease consistently. Also, initialize the weights properly.
