Since the data dimension is large for my task, 32 samples already consume nearly 9% of the memory on the server, which has about 105 GB free in total. So I have to make consecutive calls to fit() in a loop, and I also want to do early stopping across these consecutive fit() calls.
However, the callback mechanism described in the Keras documentation only applies to a single fit() call.
How can I do early stopping in this case?
Following is my code snippet:
for sen_batch, cls_batch in train_data_gen:
    sen_batch = np.array(sen_batch).reshape(-1, WORD_LENGTH, 50, 1)
    cls_batch = np.array(cls_batch)
    model.fit(x=sen_batch, y=cls_batch)
    num_iterations += 1
Use fit_generator: since you have a generator, you can use generator training instead of the classical fit. This method supports callbacks, so you can use keras.callbacks.EarlyStopping.
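For example, a minimal sketch (train_steps, val_steps and val_data_gen are placeholders for whatever your pipeline provides; EarlyStopping monitors val_loss by default, so it needs some validation data):

from keras.callbacks import EarlyStopping

# placeholders: train_steps / val_steps / val_data_gen depend on your pipeline
early_stopping = EarlyStopping(monitor='val_loss', patience=4)

model.fit_generator(train_data_gen,
                    steps_per_epoch=train_steps,
                    validation_data=val_data_gen,
                    validation_steps=val_steps,
                    epochs=50,
                    callbacks=[early_stopping])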
When you cannot use fit_generator:
So, first of all, you need to use the train_on_batch method, as a fit call resets many model states (e.g. optimizer states).
train_on_batch returns a loss value, but it doesn't accept callbacks, so you need to implement early stopping on your own. You can do it e.g. like this:
from six import next

patience = 4
best_loss = 1e6
rounds_without_improvement = 0

for epoch_nb in range(nb_of_epochs):
    losses_list = list()
    for batch in range(nb_of_batches):
        x, y = next(train_data_gen)
        losses_list.append(model.train_on_batch(x, y))
    mean_loss = sum(losses_list) / len(losses_list)
    if mean_loss < best_loss:
        best_loss = mean_loss
        rounds_without_improvement = 0
    else:
        rounds_without_improvement += 1
    if rounds_without_improvement == patience:
        break
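A possible variation (my suggestion, not part of the original recipe): base the stopping criterion on a held-out set rather than the training loss, by averaging model.test_on_batch over a validation generator at the end of each epoch. Sketch, with hypothetical names val_data_gen and nb_of_val_batches:

# inside the epoch loop, in place of the training mean_loss above
val_losses = [model.test_on_batch(*next(val_data_gen)) for _ in range(nb_of_val_batches)]
mean_loss = sum(val_losses) / len(val_losses)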
The tf.keras.Model I am training has the following primary performance indicators:
escape rate: (#samples with predicted label 0 AND true label 1) / (#samples with true label 1)
false call rate: (#samples with predicted label 1 AND true label 0) / (#samples with true label 0)
The target escape rate is predefined, which means the decision threshold has to be set accordingly. To calculate the resulting false call rate, I would like to implement a custom metric along the lines of the following pseudo code:
# separate predicted probabilities by their true label
all_ok_probabilities = all_probabilities.filter(true_label == 0)
all_nok_probabilities = all_probabilities.filter(true_label == 1)
# sort NOK samples
sorted_nok_probabilities = all_nok_probabilities.sort(ascending)
# determine decision threshold
threshold_idx = round(target_escape_rate * num_samples) - 1
threshold = sorted_nok_probabilities[threshold_idx]
# calculate false call rate
false_calls = count(all_ok_probabilities > threshold)
false_call_rate = false_calls / num_ok_samples
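(For concreteness, a plain NumPy version of this pseudo code could look like the sketch below; probs and labels are stand-ins for the class-1 probabilities and true labels of the whole validation set.)

import numpy as np

# standalone sketch of the pseudo code above (not distribution-aware)
def false_call_rate(probs, labels, target_escape_rate):
    ok_probs = probs[labels == 0]                      # true label 0
    nok_probs = np.sort(probs[labels == 1])            # true label 1, ascending
    threshold_idx = max(int(round(target_escape_rate * len(nok_probs))) - 1, 0)
    threshold = nok_probs[threshold_idx]
    false_calls = np.count_nonzero(ok_probs > threshold)
    return false_calls / len(ok_probs)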
My issue is that, in a MirroredStrategy environment, tf.keras automatically distributes the metric calculation across all replicas, each of them getting (batch_size / n_replicas) samples per update, and finally sums the results. My algorithm, however, only works correctly if ALL labels and predictions are combined (the final summing could probably be compensated for by dividing by the number of replicas).
My idea is to concatenate all y_true and y_pred into sequences in my metric's update_state() method and to run the evaluation in result(). The first step already seems impossible, though: tf.Variable only provides suitable aggregation modes for numeric scalars, not for sequences. tf.VariableAggregation.ONLY_FIRST_REPLICA makes me lose all data from the 2nd to the nth replica, SUM silently locks up the fit() call, and MEAN does not make any sense in my application (and might hang just as well).
I already tried to instantiate the metric outside of the MirroredStrategy scope, but tf.keras.Model.compile() does not accept that.
Any hints/ideas?
P.S.: Let me know if you need a minimal code example, I am working on it. :)
I solved this myself by implementing it as a callback instead of a metric. I run fit() without validation_data and instead have all validation set metrics calculated in the callback. This avoids two redundant validation set predictions.
To inject the resulting metric values back into the training procedure, I used the rather hackish approach from Access variables of caller function in Python.
import inspect

import numpy as np
import tensorflow as tf


class ValidationCallback(tf.keras.callbacks.Callback):
    """helper class to calculate validation set metrics after each epoch"""

    def __init__(self, val_data, escape_rate, **kwargs):
        # call parent constructor
        super(ValidationCallback, self).__init__(**kwargs)
        # save parameters
        self.val_data = val_data
        self.escape_rate = escape_rate
        # declare batch_size - we will get that later
        self.batch_size = 0

    def on_epoch_end(self, epoch, logs=None):
        # initialize empty arrays
        y_pred = np.empty((0, 2))
        y_true = np.empty(0)
        # iterate over validation set batches
        for batch in self.val_data:
            # save batch size, if not yet done
            if self.batch_size == 0:
                self.batch_size = batch[1].shape[0]
            # concat all batch labels & predictions
            # need to do predict()[0] due to several model outputs
            y_pred = np.concatenate([y_pred, self.model.predict(batch[0])[0]], axis=0)
            y_true = np.concatenate([y_true, batch[1]], axis=0)
        # calculate classical accuracy for threshold 0.5
        acc = ((y_pred[:, 1] >= 0.5) == y_true).sum() / y_true.shape[0]
        # calculate cross-entropy loss
        cce = tf.keras.losses.SparseCategoricalCrossentropy(reduction=tf.keras.losses.Reduction.SUM)
        loss = cce(y_true, y_pred).numpy() / self.batch_size
        # calculate false call rate
        y_pred_nok = np.sort(y_pred[y_true == 1, 1])
        idx = int(np.round(self.escape_rate * y_pred_nok.shape[0]))
        threshold = y_pred_nok[idx]
        false_calls = y_pred[(y_true == 0) & (y_pred[:, 1] >= threshold), 1].shape[0]
        fcr = false_calls / y_true[y_true == 0].shape[0]
        # add metrics to the 'logs' dict of our caller (tf.keras.callbacks.CallbackList.on_epoch_end()),
        # so that they become available to following callbacks
        for f in inspect.stack():
            if 'logs' in f[0].f_locals:
                f[0].f_locals['logs'].update({'val_accuracy': acc,
                                              'val_loss': loss,
                                              'val_false_call_rate': fcr})
        return
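A possible usage sketch (the dataset names, escape rate and patience below are made up): because the callback injects val_false_call_rate into the logs seen by following callbacks, it has to come before e.g. an EarlyStopping callback that monitors that metric.

# hypothetical datasets / parameters
val_cb = ValidationCallback(val_dataset, escape_rate=0.01)
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_false_call_rate',
                                              mode='min', patience=5)

model.fit(train_dataset,
          epochs=100,
          callbacks=[val_cb, early_stop])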
I'm working on the RSNA challenge, which consists of predicting bone age from X-ray scans of kids between 1 and 19 years of age. We're also given the gender of each kid, so I wanted to incorporate it into the model. The image is fed through an InceptionV3; the output is then concatenated (after being max-pooled) with the gender info (which is inserted as an array of all 1s (boys) or all 0s (girls), with a dimension chosen by the user).
Therefore the model takes (image,gender) as input. In order to accomplish this I'm using the following generator:
from functools import reduce

import pandas as pd

def myCustomGen(data_gen=None, dff=None, train=True, batch_size=None, img_size=None, embeddings=32):
    flow = data_gen.flow_from_dataframe(
        dataframe=dff,
        # directory = 'train-dataset/boneage-training-dataset',
        directory='/content/train-dataset-compress/boneage-training-dataset',
        x_col='id',
        y_col='boneage',
        batch_size=batch_size,
        shuffle=True,
        class_mode='raw',
        target_size=(img_size, img_size),
        color_mode='grayscale',
    )
    for x, y in flow:
        indices, filenames = get_indices_from_keras_generator(flow, batch_size)
        genders = reduce(pd.DataFrame.append, map(lambda i: dff[dff.id == i], filenames)).gender_01.values
        genders = create_embeddings2(genders, embeddings)  # just creates an array of 0s or 1s with the embedding dimension (see the sketch further below)
        if len(x) != len(genders):
            yield [x, genders[-len(y):]], y
        else:
            yield [x, genders], y
Where the function get_indices_from_keras_generator() serves the purpose of getting the ids from the unordered batches, which are then used to extract the gender for each batch:
def get_indices_from_keras_generator(gen, batch_size):
    idx_left = (gen.batch_index - 1) * batch_size
    idx_right = idx_left + gen.batch_size if idx_left >= 0 else None
    indices = gen.index_array[idx_left:idx_right]
    filenames = [gen.filenames[i] for i in indices]
    return indices, filenames
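(create_embeddings2 is not shown here; roughly, it builds, for each sample, an array of the chosen embedding dimension filled with the sample's 0/1 gender flag, e.g.:)

import numpy as np

# sketch of what create_embeddings2 is described to do: repeat the 0/1 gender
# flag embedding_dim times, giving an array of shape (batch_size, embedding_dim)
def create_embeddings2(genders, embedding_dim):
    return np.repeat(np.asarray(genders).reshape(-1, 1), embedding_dim, axis=1)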
Now, here's the thing: the loss is decreasing (on both the training and the validation set), but it follows a strange pattern: at the start of every epoch the loss increases a bit, and then, after some iterations within the same epoch, it decreases again, so that the epoch (generally) ends at a lower value than it started with. Here is a plot just to illustrate this saw-tooth behavior:
[plot: saw-tooth loss pattern]
Please note that the plot above is not the actual loss; it is only meant to convey the trend of the loss over each epoch.
Why does this happen? Is it expected? If it's unusual, could it be caused by bad shuffling of the training data? (Although, regarding this last point, the shuffling should be working fine, since I use the default Keras shuffle in flow_from_dataframe.)
Any help is highly appreciated. If you think the generator doesn't work properly, you're welcome to suggest a better one.
I'm encountering weird behaviour when using fit_generator, predict_generator, and evaluate_generator, and I would like to ask the following questions, for which I could find no answer in the documentation:
Is it ok to have batches of different sizes when using fit_generator?
My batches are defined time-wise: they group the events that took place in the same hour, so each batch can contain a different number of events. For clarity, this is what my generators look like (following the logic in this thread):
def grouper(g, x, y):
    while True:
        for gr in g.unique():
            # g assigns a batch id to every row; this mask
            # selects all the rows for which g == gr
            indices = g == gr
            yield (x[indices], y[indices])

all_data_generator = grouper(df['batch_id'], X, Y)
train_generator = grouper(df.loc[df['set'] == 'train', 'batch_id'], X_train, Y_train)
validation_generator = grouper(df.loc[df['set'] == 'val', 'batch_id'], X_val, Y_val)
test_generator = grouper(df.loc[df['set'] == 'test', 'batch_id'], X_test, Y_test)
Is it ok to have different number of batches in train_generator and validation_generator?
For clarity, I pass those two (different) numbers explicitly to fit_generator in the call:
train_batches = df.loc[df['set'] == 'train', 'batch_id'].nunique()
val_batches = df.loc[df['set'] == 'val', 'batch_id'].nunique()

history = fmodel.fit_generator(train_generator,
                               steps_per_epoch=train_batches,
                               validation_data=validation_generator,
                               validation_steps=val_batches,
                               epochs=20, verbose=0)
Predictions are wildly different depending on whether I use predict_classes or predict_generator, which baffles me.
Here's the code:
df['pred'] = fmodel.predict_classes(X)
# returns different results from
total_batches = df['batch_id'].nunique()
df['pred_gen'] = fmodel.predict_generator(all_data_generator, steps = total_batches)
Similarly, evaluate and evaluate_generator return different results.
The code:
scores = fmodel.evaluate(X_test, Y_test, verbose=0)
# returns different results from
scores_generator = fmodel.evaluate_generator(test_generator, steps=test_batches)
I know there are already many issues referring to my points 3. and 4. (e.g., 3477, 6499), but the main takeaways there seem to refer to:
using ImageDataGenerator with/without rescaling;
shuffling data, which according to fit_generator documentation "Has no effect when steps_per_epoch is not None."
using workers > 1, which by default is not the case.
So I'm wondering whether points 1. and 2. might be the culprits here.
1 and 2
Yes, totally ok.
It's even expected that 2 be true.
3
predict_classes is not documented. What does it do exactly? I think it returns class indices (the argmax of the predicted probabilities), while all the other prediction methods return the model's actual outputs, right?
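Roughly, the conversion it applies looks like this (a sketch of how Keras' Sequential.predict_classes behaves, reusing fmodel and X from the question):

proba = fmodel.predict(X)
if proba.shape[-1] > 1:
    # multi-class output: index of the highest probability
    classes = proba.argmax(axis=-1)
else:
    # single sigmoid output: threshold at 0.5
    classes = (proba > 0.5).astype('int32')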
4
This is sensible...
Are you pretty sure your generator is outputting exactly what you want?
You may try to see a few batches to compare them with x and y:
for i in range(aFewBatches):
    print(next(train_generator))
    # or create some comparisons
Even if the generators are correct, you are definitely reshuffling (actually sorting) your data when you select the batches for the generator.
While evaluate takes the entire x, y data as it is, usually in batches of 32, evaluate_generator takes your selected batches. So the metrics per batch will certainly vary, and the final result, which is a mean of the batch metrics, will also be different. So, unless the difference is too big, it's ok.
PS: I'm not entirely sure whether evaluate gives you mean batch metrics or entire-data metrics, but evaluate_generator brings mean batch metrics, which is enough for a difference.
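To illustrate the batch-averaging point with made-up numbers: a plain mean of per-batch accuracies differs from the accuracy over all samples as soon as the batches have unequal sizes, as with hourly batches.

import numpy as np

batch_accs = np.array([1.0, 0.5])    # accuracy computed per batch (toy values)
batch_sizes = np.array([10, 2])      # events per hour -> unequal batch sizes

print(batch_accs.mean())                                      # 0.75  (mean of batch metrics)
print((batch_accs * batch_sizes).sum() / batch_sizes.sum())   # ~0.92 (metric over all samples)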
Let's say I defined a network Net and the example code below runs well.
# ... input processing using TFRecord ... # reading from TFRecord
x, y = tf.train.batch([image, label]) # encode batch
net = Net(x,y) # connect to network
# ... initialize and session ...
for iteration:
    loss, _ = sess.run([net.loss, net.train_op])
The Net does not have a tf.placeholder, since the input is provided by tensors from the TFRecord provider. What if I would like to run the validation set as well, e.g. every 500 steps? How can I switch the input flow?
x, y = tf.train.batch([image, label], ...) # training set
vx, vy = tf.train.batch([vimage, vlabel], ...) # validation set
net = Net(x,y)
for iteration:
    loss, _ = sess.run([net.loss, net.train_op])
    if iteration % 500 == 0:
        # graph is already defined from input to loss.
        # how can I run net.loss with vx and vy??
The only thing I can imagine is modifying Net to have placeholders and running it every time like:
sess.run([...], feed_dict = {Net.x:sess.run(x), Net.y:sess.run(y)})
sess.run([...], feed_dict = {Net.x:sess.run(vx), Net.y:sess.run(vy)})
However, it seems to me that this way I lose the benefits of using TFRecord (e.g. full TF integration). In the middle of the computation flow, I have to stop the flow, run sess.run to fetch the data, and continue (doesn't this lower the speed by forcing the data through the CPU in the middle?)
I am wondering:
whether there is a better way;
whether my solution is really as bad as I imagine.
Thanks in advance.
There is a better way (than placeholders). I ran into this issue with the CIFAR-10 tutorial in TensorFlow, which I adjusted to check accuracy on the test set every 500 batches or so while training. This is where sharing variables comes in handy.
x, y = tf.train.batch([image, label], ...) # training set
vx, vy = tf.train.batch([vimage, vlabel], ...) # validation set
with tf.variable_scope("model") as scope:
net = Net(x,y)
scope.reuse_variables()
vnet = Net(vx,vy)
for iteration:
loss, _ = sess.run([net.loss, net.train_op])
if step % 500 == 0:
loss, acc = sess.run([vnet.loss, vnet.accuracy])
By setting the scope to reuse variables on the second call to Net(), you will use the same tensors and values created in the first call, but with a different set of inputs. Just make sure that vimage and vlabel aren't reusing tensors from image and label (which could possibly be solved by creating their own variable scopes).
I am trying to implement a crude method based on the Mixture-of-Experts paper (https://arxiv.org/abs/1701.06538) in TensorFlow.
There would be n models defined:
model_1:
    var_11
    var_12
    loss_1
    optimizer_1

model_2:
    var_21
    var_22
    loss_2
    optimizer_2

model_3:
    var_31
    var_32
    loss_3
    optimizer_3
At every iteration, I want to train only the model with the least loss, while keeping the other variables constant. Is it possible to place a switch to execute only one of the optimizers?
P.S.: The basis of this problem is similar to one I asked previously: http://stackoverflow.com/questions/42073239/tf-get-collection-to-extract-variables-of-one-scope/42074009?noredirect=1#comment71359330_42074009
Since the suggestion there did not work, I am trying to approach the problem differently.
Thanks in advance!
This seems to be doable with tf.cond:
import tensorflow as tf
def make_conditional_train_op(
        should_update, optimizers, variable_lists, losses):
    """Conditionally trains variables.

    Each argument is a Python list of Tensors, and each list must have the same
    length. Variables are updated based on their optimizer only if the
    corresponding `should_update` boolean Tensor is True at a given step.

    Returns a single train op which performs the conditional updates.
    """
    assert len(optimizers) == len(variable_lists)
    assert len(variable_lists) == len(losses)
    assert len(should_update) == len(variable_lists)
    conditional_updates = []
    for model_number, (update_boolean, optimizer, variables, loss) in enumerate(
            zip(should_update, optimizers, variable_lists, losses)):
        conditional_updates.append(
            tf.cond(update_boolean,
                    lambda: tf.group(
                        optimizer.minimize(loss, var_list=variables),
                        tf.Print(0, ["Model {} updating".format(model_number), loss])),
                    lambda: tf.no_op()))
    return tf.group(*conditional_updates)
The basic strategy is to make sure the optimizer's variable updates are defined in the lambda of one of the cond branches, in which case there is true conditional op execution, meaning that the assignment to variables (and optimizer accumulators) only happens if that branch of the cond is triggered.
As an example, we can construct some models:
def make_model_and_optimizer():
    scalar_variable = tf.get_variable("scalar", shape=[])
    vector_variable = tf.get_variable("vector", shape=[3])
    loss = tf.reduce_sum(scalar_variable * vector_variable)
    optimizer = tf.train.AdamOptimizer(0.1)
    return optimizer, [scalar_variable, vector_variable], loss

# Construct each model
optimizers = []
variable_lists = []
losses = []
for i in range(10):
    with tf.variable_scope("model_{}".format(i)):
        optimizer, variables, loss = make_model_and_optimizer()
    optimizers.append(optimizer)
    variable_lists.append(variables)
    losses.append(loss)
Then determine a conditional update strategy, in this case only training the model with the maximum loss (just because that results in more switching; the output is rather boring if only one model ever updates):
# Determine which model should be updated (in this case, the one with the
# maximum loss)
integer_one_hot = tf.one_hot(
    tf.argmax(tf.stack(losses),
              axis=0),
    depth=len(losses))
is_max = tf.equal(
    integer_one_hot,
    tf.ones_like(integer_one_hot))
Finally, we can call the make_conditional_train_op function to create the train op, then do some training iterations:
train_op = make_conditional_train_op(
    tf.unstack(is_max), optimizers, variable_lists, losses)

# Repeatedly call the conditional train op
with tf.Session():
    tf.global_variables_initializer().run()
    for i in range(20):
        print("Iteration {}".format(i))
        train_op.run()
This prints, at each iteration, the index of the model being updated together with its loss, confirming the conditional execution:
Iteration 0
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][2.7271919]
Iteration 1
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][2.1755948]
Iteration 2
I tensorflow/core/kernels/logging_ops.cc:79] [Model 2 updating][1.9858969]
Iteration 3
I tensorflow/core/kernels/logging_ops.cc:79] [Model 6 updating][1.6859927]