I am going through this tutorial on how to customize the training loop
https://colab.research.google.com/github/tensorflow/docs/blob/snapshot-keras/site/en/guide/keras/customizing_what_happens_in_fit.ipynb#scrollTo=46832f2077ac
The last example shows a GAN implemented with a custom training loop, where only the __init__, train_step, and compile methods are defined:
class GAN(keras.Model):
    def __init__(self, discriminator, generator, latent_dim):
        super(GAN, self).__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super(GAN, self).compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn

    def train_step(self, real_images):
        if isinstance(real_images, tuple):
            real_images = real_images[0]
        ...
What happens if my model also has a custom call() method? Does train_step() override call()?
Aren't call() and train_step() both called by fit()? What is the difference between the two?
Below is another piece of code "I" wrote, where I wonder which one fit() calls, call() or train_step():
class MyModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, rnn_units):
        super().__init__()  # super() already binds self; do not pass it again
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(rnn_units,
                                       return_sequences=True,
                                       return_state=True,
                                       reset_after=True
                                       )
        self.dense = tf.keras.layers.Dense(vocab_size)

    def call(self, inputs, states=None, return_state=False, training=False):
        x = inputs
        x = self.embedding(x, training=training)
        if states is None:
            states = self.gru.get_initial_state(x)
        x, states = self.gru(x, initial_state=states, training=training)
        x = self.dense(x, training=training)
        if return_state:
            return x, states
        else:
            return x

    @tf.function
    def train_step(self, inputs):
        # Unpack the data
        inputs, labels = inputs
        with tf.GradientTape() as tape:
            predictions = self(inputs, training=True)  # forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(labels, predictions, regularization_losses=self.losses)
        # Compute the gradients (use self, not a global `model`)
        grads = tape.gradient(loss, self.trainable_variables)
        # Update weights
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(labels, predictions)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}
These are different concepts, and they are used like this:
train_step is called by fit. Basically, fit loops over the dataset and provides each batch to train_step (and then handles metrics, bookkeeping, etc., of course).
call is used when you, well, call the model. To be precise, writing model(inputs), or in your case self(inputs), uses the function __call__, but the Model class defines that function such that it will in turn use call.
Those are the technical aspects. Intuitively:
call should define the forward pass of your model, i.e. how the input is transformed to the output.
train_step defines the logic of a training step, usually with gradient descent. It will often make use of call, since the training step tends to include a forward pass of the model to compute gradients.
As for the GAN tutorial you linked, I would say that can actually be considered incomplete. It works without defining call because the custom train_step explicitly calls the generator/discriminator fields (as these are predefined models, they can be called as usual). If you tried to call the GAN model like gan(inputs), I would assume you get an error message (I did not test this). So you would always have to call gan.generator(inputs) to generate, for example.
Finally (this part may be a bit confusing), note that you can subclass a Model to define a custom training step, but then initialize it via the functional API (like model = Model(inputs, outputs)), in which case you can make use of call in the training step without ever defining it yourself because the functional API takes care of that.
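To make the division of labour concrete, here is a minimal pure-Python sketch of the dispatch described above. ToyModel is a hypothetical stand-in written for illustration only; it is not how keras.Model is actually implemented, and the one-weight "network" and hand-written gradient are assumptions chosen to keep it runnable without TensorFlow.

```python
class ToyModel:
    """Toy stand-in for keras.Model (hypothetical, NOT the real Keras code)."""

    def __init__(self, weight=0.0, lr=0.1):
        self.weight = weight
        self.lr = lr

    def __call__(self, x):
        # model(x) or self(x) lands here and delegates to the forward pass.
        return self.call(x)

    def call(self, x):
        # Forward pass: how the input is transformed to the output.
        return self.weight * x

    def train_step(self, batch):
        # One training step: forward pass, loss, gradient, weight update.
        x, y = batch
        pred = self(x)                    # goes through __call__ -> call
        loss = (pred - y) ** 2
        grad = 2.0 * (pred - y) * x       # d(loss)/d(weight), derived by hand
        self.weight -= self.lr * grad
        return {"loss": loss}

    def fit(self, dataset, epochs=1):
        # fit only loops over the batches and hands each one to train_step.
        logs = {}
        for _ in range(epochs):
            for batch in dataset:
                logs = self.train_step(batch)
        return logs


model = ToyModel()
dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2 * x
logs = model.fit(dataset, epochs=20)
print(model.weight, logs["loss"])
```

So fit never touches call directly in this sketch; it only reaches it because train_step chooses to run a forward pass via self(x).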
I'm trying to make a custom loss function for a Keras NN model.
Normally, loss functions take y_true and y_prediction as arguments.
But I need to use the model inside the custom loss function, like
y_prediction = model(X_train), in order to use tf.GradientTape.
So what I want to know is how to use the latest model (as it is being fitted) in the custom loss function.
If you have an idea about that, tell me, please.
(Sorry for my bad English)
You can create a model class and implement the train_step method:
class YourModel(Model):
    def __init__(self):
        super(YourModel, self).__init__()
        # define your model architecture here as an attribute of the class

    def train_step(self, data):
        with tf.GradientTape() as tape:
            # forward pass data through the architecture
            # compute loss (y_true, y_pred, any other param)
            ...
        # weight update
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        return {
            'loss': loss
            # other losses
        }

    def call(self, x):
        # your forward pass implementation
        return  # output
More information can be found here: https://www.tensorflow.org/tutorials/quickstart/advanced
Tensorflow 2 takes about 15 minutes to make its static graph (or whatever it's doing before the first pass). The training time after this is normal, but obviously it's hard to experiment with 15 mins of waiting for any feedback.
The generator encoder and discriminator are RNNs (not unrolled) with GRU cells in a Keras model.
The generator decoder is defined and called like this:
class GeneratorDecoder(tf.keras.layers.Layer):
    def __init__(self, feature_dim):
        super(GeneratorDecoder, self).__init__()
        self.cell = tf.keras.layers.GRUCell(
            GRUI_DIM, activation='tanh', recurrent_activation='sigmoid',
            dropout=DROPOUT, recurrent_dropout=DROPOUT)
        self.batch_normalization = tf.keras.layers.BatchNormalization()
        self.dense = tf.keras.layers.Dense(
            feature_dim, activation='tanh')

    @tf.function
    def __call__(self, z, timesteps, training):
        # z has shape (batch_size, features)
        outputs = []
        output, state = z, z
        for i in range(timesteps):
            output, state = self.cell(inputs=output, states=state,
                                      training=training)
            dense_output = self.dense(
                self.batch_normalization(output))
            outputs.append(dense_output)
        return outputs
Here is my training loop (the mask_gt and missing_data variables are cast using tf.cast and should therefore already be tensors):
for it in tqdm(range(NO_ITERATIONS)):
    print(it)
    train_step()

@tf.function
def train_step():
    with tf.GradientTape(persistent=True) as tape:
        generator_output = generator(missing_data, training=True)
        imputed_data = get_imputed_data(missing_data, generator_output)
        mask_pred = discriminator(imputed_data)
        D_loss = discriminator.loss(mask_pred, mask_gt)
        G_loss = generator.loss(missing_data, mask_gt,
                                generator_output, mask_pred)
    gen_enc_grad = tape.gradient(
        G_loss, generator.encoder.trainable_variables)
    gen_dec_grad = tape.gradient(
        G_loss, generator.decoder.trainable_variables)
    disc_grad = tape.gradient(
        D_loss, discriminator.model.trainable_variables)
    del tape
    generator.optimizer.apply_gradients(
        zip(gen_enc_grad, generator.encoder.trainable_variables))
    generator.optimizer.apply_gradients(
        zip(gen_dec_grad, generator.decoder.trainable_variables))
    discriminator.optimizer.apply_gradients(
        zip(disc_grad, discriminator.model.trainable_variables))
Note that "0" is printed within a few seconds, so the slow part is definitely not earlier.
And this is the get_imputed_data function that is called:
def get_imputed_data(incomplete_series, generator_output):
    return tf.where(tf.math.is_nan(incomplete_series), generator_output, incomplete_series)
Thanks for any answers! Hope I provided just enough code to give a sense of where the problem lies. This is my first time posting here after reading for at least five years :)
I use Python 3.6 and Tensorflow 2.1.
The problem was solved by removing the tf.function decorator from the calling functions of the generator and discriminator. I was using a single global Python scalar (the iteration number) in two of the tf.function-decorated functions. This caused a new graph to be created every time (see the caution in the tf.function docs).
The solution is to drop the Python variables used or convert them to TensorFlow variables.
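The retracing effect can be pictured with a small pure-Python sketch. ToyTracedFunction and ToyTensor are hypothetical names made up for this illustration; this is not TensorFlow's real cache, and it assumes the iteration number reaches the decorated function as an argument. The only point is that a plain Python number is treated as part of the cache key by value, while a tensor is keyed by its type and shape.

```python
class ToyTensor:
    """Hypothetical stand-in for a scalar tensor."""

    def __init__(self, value):
        self.value = value
        self.shape = ()  # scalar


class ToyTracedFunction:
    """Toy model of a tracing cache (NOT TensorFlow's implementation)."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = {}
        self.trace_count = 0

    def _key(self, arg):
        # Python numbers are keyed by value, "tensors" only by type and shape.
        if isinstance(arg, ToyTensor):
            return ("tensor", arg.shape)
        return ("python", arg)

    def __call__(self, *args):
        key = tuple(self._key(a) for a in args)
        if key not in self.cache:
            self.trace_count += 1  # stands in for the expensive graph build
            self.cache[key] = self.fn
        return self.cache[key](*args)


def step(iteration):
    return iteration


python_step = ToyTracedFunction(step)
for it in range(5):
    python_step(it)             # a new key on every call

tensor_step = ToyTracedFunction(step)
for it in range(5):
    tensor_step(ToyTensor(it))  # same key on every call

print(python_step.trace_count, tensor_step.trace_count)  # 5 1
```

With a 15-minute graph build, five "traces" instead of one is exactly the kind of slowdown described in the question.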
I want to develop a neural network with three inputs (pos, anc, neg) and three outputs (pos_out, anc_out, neg_out). While calculating the loss in my custom loss function in Keras, I want to access pos_out, anc_out and neg_out in y_pred. I can access y_pred as a whole, but how do I access the individual parts pos_out, anc_out and neg_out?
I have applied the max function to y_pred, and it calculates the max value correctly. If I pass only one output to Model, as in Model(input=[pos,anc,neg], output=pos_out), it also calculates the max value correctly. But when it comes to accessing the max values from pos_out, anc_out and neg_out separately in the custom function, it does not work.
def testmodel(input_shape):
    pos = Input(shape=(14,300))
    anc = Input(shape=(14,300))
    neg = Input(shape=(14,300))
    model = Sequential()
    model.add(Flatten(batch_input_shape=(1,14,300)))
    pos_out = model(pos)
    anc_out = model(anc)
    neg_out = model(neg)
    model = Model(input=[pos,anc,neg], output=[pos_out,anc_out,neg_out])
    return model

def customloss(y_true,y_pred):
    print((K.int_shape(y_pred)[1]))
    #loss = K.max((y_pred))
    loss = K.max[pos_out]
    return loss
You can create a loss function that contains a closure that lets you access the model and thus the targets and the model layer outputs.
class ExampleCustomLoss(object):
    """ The loss function can access model.inputs, model.targets and the outputs
    of specific layers. These are all tensors and will have the expected results
    for the batch.
    """
    def __init__(self, model):
        self.model = model

    def loss(self, y_true, y_pred, **kwargs):
        ...
        return loss

model = Model(..., ...)
loss_calculator = ExampleCustomLoss(model)
model.compile('adam', loss_calculator.loss)
However, it may be simpler to do the inverse, i.e. have a single model output:
out = Concatenate(axis=1)([pos_out, anc_out, neg_out])
And then slice y_true and y_pred in the loss function.
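As a rough sketch of the slicing idea, here it is in plain Python on a single concatenated row. The helper names, the margin value, and the choice of a squared-distance triplet loss are assumptions for illustration; in Keras the same slicing would be done on the y_pred tensor along axis 1 rather than on a list.

```python
def split_triplet(row, dim):
    """Slice one concatenated output row back into (pos, anc, neg)."""
    return row[:dim], row[dim:2 * dim], row[2 * dim:3 * dim]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(row, dim, margin=1.0):
    # Hypothetical triplet loss: pull anc towards pos, push it away from neg.
    pos, anc, neg = split_triplet(row, dim)
    gap = squared_distance(anc, pos) - squared_distance(anc, neg) + margin
    return max(gap, 0.0)


# Each row is pos + anc + neg laid side by side, with dim = 2 per part.
easy_row = [1.0, 0.0, 1.0, 1.0, 4.0, 5.0]  # neg is far from anc
hard_row = [3.0, 3.0, 0.0, 0.0, 1.0, 0.0]  # neg is closer to anc than pos is

print(triplet_loss(easy_row, dim=2))
print(triplet_loss(hard_row, dim=2))
```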
From the names of variables, it looks as if you are trying to use a triplet loss. You may find this other question useful:
How to deal with triplet loss when at time of input i have only two files i.e. at time of testing
Your loss function gets 2 arguments: the model output and the true label. Your model output will have the shape that you define when you define the net. Your loss function needs to output a single difference value between your model's output and the true value of the label while training.
Also please add some trainable layers to your model, because your custom loss function will be useless otherwise.
I am trying to define custom loss and accuracy functions for each output in a two output neural network model using Keras. Let's call the two outputs: A and B.
My objectives are:
Give the accuracy/loss functions for one of the outputs names such that they can be reported on the same graphs in tensorboard as the corresponding output from older/existing models I have lying around. So for example, accuracies and losses for output A in this two output network should be viewable in the same graph in tensorboard as output A of some older model that I have. More specifically, these older models all output A_output_acc, val_A_output_acc, A_output_loss and val_A_output_loss. So I want the corresponding metric readouts for the A output in this new model to have those names as well, so that they are viewable/comparable on the same graph in tensorboard.
Allow for easy configuration of accuracy/loss functions so that I can swap at whim different losses/accuracies for each output without hard coding them.
I have a Modeler class that constructs and compiles a network. The relevant code follows.
class Modeler(BaseModeler):
    def __init__(self, loss=None, accuracy=None, ...):
        """
        Returns compiled keras model.
        """
        self.loss = loss
        self.accuracy = accuracy
        model = self.build()
        ...
        model.compile(
            loss={  # we are explicit here and name the outputs even though in this case it's not necessary
                "A_output": self.A_output_loss(),  # loss,
                "B_output": self.B_output_loss()  # loss
            },
            optimizer=optimus,
            metrics={  # we need to tie each output to a specific list of metrics
                "A_output": [self.A_output_acc()],
                # self.A_output_loss()], # redundant since it's already reported via `loss` param,
                # ends up showing up as `A_output_loss_1` since keras
                # already reports `A_output_loss` via loss param
                "B_output": [self.B_output_acc()]
                # self.B_output_loss()] # redundant since it's already reported via `loss` param
                # ends up showing up as `B_output_loss_1` since keras
                # already reports `B_output_loss` via loss param
            })
        self._model = model

    def A_output_acc(self):
        """
        Allows us to output custom train/test accuracy/loss metrics to desired names e.g. 'A_output_acc' and
        'val_A_output_acc' respectively so that they may be plotted on same tensorboard graph as the accuracies from
        other models that have the same outputs.
        :return: accuracy metric
        """
        acc = None
        if self.accuracy == TypedAccuracies.BINARY:
            def acc(y_true, y_pred):
                return self.binary_accuracy(y_true, y_pred)
        elif self.accuracy == TypedAccuracies.DICE:
            def acc(y_true, y_pred):
                return self.dice_coef(y_true, y_pred)
        elif self.accuracy == TypedAccuracies.JACARD:
            def acc(y_true, y_pred):
                return self.jacard_coef(y_true, y_pred)
        else:
            logger.debug('ERROR: undefined accuracy specified: {}'.format(self.accuracy))
        return acc

    def A_output_loss(self):
        """
        Allows us to output custom train/test accuracy/loss metrics to desired names e.g. 'A_output_acc' and
        'val_A_output_acc' respectively so that they may be plotted on same tensorboard graph as the accuracies from
        other models that have the same outputs.
        :return: loss metric
        """
        loss = None
        if self.loss == TypedLosses.BINARY_CROSSENTROPY:
            def loss(y_true, y_pred):
                return self.binary_crossentropy(y_true, y_pred)
        elif self.loss == TypedLosses.DICE:
            def loss(y_true, y_pred):
                return self.dice_coef_loss(y_true, y_pred)
        elif self.loss == TypedLosses.JACARD:
            def loss(y_true, y_pred):
                return self.jacard_coef_loss(y_true, y_pred)
        else:
            # report the loss setting here, not the accuracy setting
            logger.debug('ERROR: undefined loss specified: {}'.format(self.loss))
        return loss

    def B_output_acc(self):
        """
        Allows us to output custom train/test accuracy/loss metrics to desired names e.g. 'A_output_acc' and
        'val_A_output_acc' respectively so that they may be plotted on same tensorboard graph as the accuracies from
        other models that have the same outputs.
        :return: accuracy metric
        """
        acc = None
        if self.accuracy == TypedAccuracies.BINARY:
            def acc(y_true, y_pred):
                return self.binary_accuracy(y_true, y_pred)
        elif self.accuracy == TypedAccuracies.DICE:
            def acc(y_true, y_pred):
                return self.dice_coef(y_true, y_pred)
        elif self.accuracy == TypedAccuracies.JACARD:
            def acc(y_true, y_pred):
                return self.jacard_coef(y_true, y_pred)
        else:
            logger.debug('ERROR: undefined accuracy specified: {}'.format(self.accuracy))
        return acc

    def B_output_loss(self):
        """
        Allows us to output custom train/test accuracy/loss metrics to desired names e.g. 'A_output_acc' and
        'val_A_output_acc' respectively so that they may be plotted on same tensorboard graph as the accuracies from
        other models that have the same outputs.
        :return: loss metric
        """
        loss = None
        if self.loss == TypedLosses.BINARY_CROSSENTROPY:
            def loss(y_true, y_pred):
                return self.binary_crossentropy(y_true, y_pred)
        elif self.loss == TypedLosses.DICE:
            def loss(y_true, y_pred):
                return self.dice_coef_loss(y_true, y_pred)
        elif self.loss == TypedLosses.JACARD:
            def loss(y_true, y_pred):
                return self.jacard_coef_loss(y_true, y_pred)
        else:
            # report the loss setting here, not the accuracy setting
            logger.debug('ERROR: undefined loss specified: {}'.format(self.loss))
        return loss

    def load_model(self, model_path=None):
        """
        Returns built model from model_path assuming using the default architecture.
        :param model_path: str, path to model file
        :return: defined model with weights loaded
        """
        custom_objects = {'A_output_acc': self.A_output_acc(),
                          'A_output_loss': self.A_output_loss(),
                          'B_output_acc': self.B_output_acc(),
                          'B_output_loss': self.B_output_loss()}
        self.model = load_model(filepath=model_path, custom_objects=custom_objects)
        return self

    def build(self, stuff...):
        """
        Returns model architecture. Instead of just one task, it performs two: A and B.
        :return: model
        """
        ...
        A_conv_final = Conv2D(1, (1, 1), activation="sigmoid", name="A_output")(up_conv_224)
        B_conv_final = Conv2D(1, (1, 1), activation="sigmoid", name="B_output")(up_conv_224)
        model = Model(inputs=[input], outputs=[A_conv_final, B_conv_final], name="my_model")
        return model
The training works fine. However, when I go to load the model for inference later, using the above load_model() function, Keras complains that it doesn't know about the custom metrics I have given it:
ValueError: Unknown loss function:loss
What seems to be happening is that Keras is appending the returned function created in each of the custom metric functions above (def loss(...), def acc(...)) to the dictionary key given in the metrics parameter of the model.compile() call.
So, for example the key is A_output and we call the custom accuracy function, A_output_acc() for it, which returns a function called acc. So the result is A_output + acc = A_output_acc. This means that I can't name those returned functions: acc/loss something else, because that will mess up the reporting/graphs.
This is all fine and well, BUT I don't know how to write my load function with a properly defined custom_objects parameter (or define/name my custom metrics functions for that matter) so that Keras knows which custom accuracy/loss functions are to be loaded with each output head.
More to the point, it seems to be wanting a custom_objects dictionary of the following form in load_model() (which won't work for obvious reasons):
custom_objects = {'acc': self.A_output_acc(),
'loss': self.A_output_loss(),
'acc': self.B_output_acc(),
'loss': self.B_output_loss()}
instead of:
custom_objects = {'A_output_acc': self.A_output_acc(),
'A_output_loss': self.A_output_loss(),
'B_output_acc': self.B_output_acc(),
'B_output_loss': self.B_output_loss()}
Any insights or work-arounds?
Thanks!
EDIT:
I've confirmed the reasoning above about key/function name concatenation IS correct for the metrics parameter of Keras' model.compile() call. HOWEVER, for the loss parameter in model.compile(), Keras just concatenates the key with the word loss, yet expects the name of the custom loss function in the custom_objects parameter of model.load_model()...go figure.
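The naming collision described above can be reproduced in plain Python, without Keras. The sketch below uses a hypothetical make_loss factory in the style of A_output_loss(); it only demonstrates that every returned closure carries the inner function's name and that a dict cannot hold two entries under the same key. The renaming helper at the end is an untested assumption as far as Keras is concerned, not a confirmed fix.

```python
def make_loss(kind):
    # Factory in the style of A_output_loss(): the inner function is named `loss`.
    def loss(y_true, y_pred):
        if kind == "abs":
            return abs(y_true - y_pred)
        return (y_true - y_pred) ** 2
    return loss


a_loss = make_loss("abs")
b_loss = make_loss("squared")
print(a_loss.__name__, b_loss.__name__)  # loss loss

# A dict literal with a repeated key keeps only the last value.
custom_objects = {"loss": a_loss, "loss": b_loss}
print(len(custom_objects))  # 1


def make_named_loss(kind, name):
    # Hypothetical helper: give each closure its own __name__.
    fn = make_loss(kind)
    fn.__name__ = name
    return fn


a_named = make_named_loss("abs", "A_loss_fn")
b_named = make_named_loss("squared", "B_loss_fn")
named_objects = {a_named.__name__: a_named, b_named.__name__: b_named}
print(sorted(named_objects))  # ['A_loss_fn', 'B_loss_fn']
```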
Remove the () at the end of your losses and metrics and that should be it. It'll look like this instead:
loss={
    "A_output": self.A_output_loss,
    "B_output": self.B_output_loss
}
I am using pytorch and trying to understand how a simple linear regression model works.
I'm using a simple LinearRegressionModel class:
class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        out = self.linear(x)
        return out
model = LinearRegressionModel(1, 1)
Next I instantiate a loss criterion and an optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
Finally to train the model I use the following code:
for epoch in range(epochs):
    if torch.cuda.is_available():
        inputs = Variable(torch.from_numpy(x_train).cuda())
    if torch.cuda.is_available():
        labels = Variable(torch.from_numpy(y_train).cuda())
    # Clear gradients w.r.t. parameters
    optimizer.zero_grad()
    # Forward to get output
    outputs = model(inputs)
    # Calculate Loss
    loss = criterion(outputs, labels)
    # Getting gradients w.r.t. parameters
    loss.backward()
    # Updating parameters
    optimizer.step()
My question is how does the optimizer get the loss gradient, computed by loss.backward(), to update the parameters using the step() method? How are the model, the loss criterion and the optimizer tied together?
PyTorch has this concept of tensors and variables. When you use nn.Linear, the function creates 2 variables, namely W and b. In PyTorch a variable is a wrapper that encapsulates a tensor, its gradient, and information about the function that created it. You can directly access the gradients with
w.grad
If you try this before calling loss.backward(), you get None. Once you call loss.backward(), it will contain the gradients. Now you can update the parameters manually with the simple step below.
w.data -= learning_rate * w.grad.data
When you have a complex network, the above simple step can grow complex, so optimisers like SGD and Adam take care of this. When you create the object for one of these optimisers, you pass in the parameters of your model. nn.Module provides the parameters() function, which returns all the learnable parameters to the optimiser. This is done with the step below.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss.backward()
calculates the gradients and stores them in the parameters.
And you pass in the parameters that need to be tuned here:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
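The link between the three pieces is that they all hold references to the same parameter objects. Here is a toy pure-Python sketch of that idea. Param, ToyLinear, ToySGD and backward_mse are hypothetical names written for this illustration and are not PyTorch's classes; the gradient is written out by hand where PyTorch would use autograd.

```python
class Param:
    """Toy stand-in for a learnable tensor: a value plus a .grad slot."""

    def __init__(self, value):
        self.value = value
        self.grad = None  # stays None until a backward pass has run


class ToyLinear:
    def __init__(self):
        self.w = Param(0.0)
        self.b = Param(0.0)

    def parameters(self):
        return [self.w, self.b]

    def __call__(self, x):
        return self.w.value * x + self.b.value


class ToySGD:
    def __init__(self, params, lr):
        self.params = list(params)  # references to the SAME Param objects
        self.lr = lr

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

    def step(self):
        # Reads the gradients that the backward pass stored on each Param.
        for p in self.params:
            p.value -= self.lr * p.grad


def backward_mse(model, x, y):
    """Hand-written gradient of (model(x) - y) ** 2, accumulated into .grad."""
    err = model(x) - y
    model.w.grad += 2.0 * err * x
    model.b.grad += 2.0 * err
    return err ** 2


model = ToyLinear()
optimizer = ToySGD(model.parameters(), lr=0.05)
grad_before = model.w.grad  # None: no backward pass yet

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # samples of y = 2 * x + 1
for epoch in range(500):
    for x, y in data:
        optimizer.zero_grad()
        loss = backward_mse(model, x, y)  # "loss.backward()" fills .grad
        optimizer.step()                  # the optimizer consumes .grad

print(model.w.value, model.b.value)
```

The optimizer never sees the loss itself; it only sees whatever the backward pass left in the .grad slot of the parameters it was given at construction time.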