Tensorflow training - print multiple losses for one output - python

I would like to print all the different losses I have for one output separately.
At the moment it looks like:
1/1 [==============================] - 1s 1s/sample - loss: 4.2632
The goal is to have a history like:
1/1 [==============================] - 1s 1s/sample - loss1: 2.1, loss2: 2.1632
I have one output layer out1 and two loss functions loss1 and loss2.
def loss1(y_true, y_pred):
    ...
    return ...

def loss2(y_true, y_pred):
    ...
    return ...
When I do
model.compile(...)
I can either choose to have a single loss function,
model.compile(loss=lambda y_true, y_pred: loss1(y_true, y_pred) + loss2(y_true, y_pred))
or define a loss for each output in a dictionary,
model.compile(loss={'out1': loss1, 'out2': loss2})
Since I have only one output, this isn't an option for me.
Does anyone know how to print the losses separately when having only one output?

Just use the metrics argument:
model.compile(optimizer='adam', loss='mae', metrics=['mse'])
You will still need to choose one loss to minimize.
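Applied to the question, a minimal sketch (reusing the question's own loss1 and loss2) would sum the two terms into the loss that is minimized and pass each function as a metric, so both values get their own column in the progress bar:

def combined_loss(y_true, y_pred):
    # the quantity that is actually minimized
    return loss1(y_true, y_pred) + loss2(y_true, y_pred)

model.compile(optimizer='adam',
              loss=combined_loss,
              metrics=[loss1, loss2])

The printed names come from the function names, so the log line should look roughly like: loss: 4.2632 - loss1: 2.1000 - loss2: 2.1632.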

One workaround is to artificially create two identical outputs and then combine their losses with weights equal to 1. For the sake of concreteness, here is an example:
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.losses import mse, mae
import numpy as np

if __name__ == '__main__':
    train_x = np.random.rand(10000, 200)
    train_y = np.random.rand(10000, 1)

    x_input = Input(shape=(200,))
    x = Dense(64)(x_input)
    x = Dense(64)(x)
    x = Dense(1)(x)
    x1 = Lambda(lambda x: x, name='out1')(x)
    x2 = Lambda(lambda x: x, name='out2')(x)

    model = Model(inputs=x_input, outputs=[x1, x2])
    model.compile(optimizer='adam', loss={'out1': mse, 'out2': mae}, loss_weights={'out1': 1, 'out2': 1})
    model.fit(train_x, train_y, epochs=10)
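Depending on the Keras version, fit may want one target array per named output; passing the same array for both outputs keeps the example equivalent (this is an assumption about the exact API behaviour, not part of the original answer):

model.fit(train_x, {'out1': train_y, 'out2': train_y}, epochs=10)

The history and progress bar then report loss, out1_loss and out2_loss separately.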

Related

Keras - Adding loss to intermediate layer while ignoring the last layer

I've created the following Keras custom model:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyModel(tf.keras.Model):
    def __init__(self, num_classes):
        super(MyModel, self).__init__()
        self.dense_layer = tf.keras.layers.Dense(num_classes, activation='softmax')
        self.lambda_layer = tf.keras.layers.Lambda(lambda x: tf.math.argmax(x, axis=-1))

    def call(self, inputs):
        x = self.dense_layer(inputs)
        x = self.lambda_layer(x)
        return x

    # A convenient way to get model summary
    # and plot in subclassed api
    def build_graph(self, raw_shape):
        x = tf.keras.layers.Input(shape=raw_shape)
        return tf.keras.Model(inputs=[x], outputs=self.call(x))
The task is multi-class classification.
The model consists of a dense layer with softmax activation and a lambda layer as a post-processing unit that converts the dense output vector to a single value (the predicted class).
The training targets are a one-hot encoded matrix like so:
[
[0,0,0,0,1]
[0,0,1,0,0]
[0,0,0,1,0]
[0,0,0,0,1]
]
It would be nice if I could define a categorical_crossentropy loss over the dense layer and ignore the lambda layer while still maintaining the functionality and outputting a single value when I call model.predict(x).
Please note: my workspace environment doesn't allow me to use a custom training loop, as suggested in #alonetogether's excellent answer.
You can try using a custom training loop, which is pretty straightforward IMO:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyModel(tf.keras.Model):
    def __init__(self, num_classes):
        super(MyModel, self).__init__()
        self.dense_layer = tf.keras.layers.Dense(num_classes, activation='softmax')
        self.lambda_layer = tf.keras.layers.Lambda(lambda x: tf.math.argmax(x, axis=-1))

    def call(self, inputs):
        x = self.dense_layer(inputs)
        x = self.lambda_layer(x)
        return x

    # A convenient way to get model summary
    # and plot in subclassed api
    def build_graph(self, raw_shape):
        x = tf.keras.layers.Input(shape=raw_shape)
        return tf.keras.Model(inputs=[x], outputs=self.call(x))

n_classes = 5
model = MyModel(n_classes)

labels = tf.keras.utils.to_categorical(tf.random.uniform((50, 1), maxval=5, dtype=tf.int32))
train_dataset = tf.data.Dataset.from_tensor_slices((tf.random.normal((50, 1)), labels)).batch(2)

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.CategoricalCrossentropy()
epochs = 2

for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model.layers[0](x_batch_train)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
And prediction:
print(model.predict(tf.random.normal((1, 1))))
[3]
I think there is a predict_classes method (on Sequential models; it has been removed in newer TF versions) that would replace the need for that lambda layer. But if it doesn't work:
There doesn't seem to be a way to do that without using one of these hacks:
Two inputs (one is the ground truth values Y)
Two outputs
Two models
I'm quite convinced there is no other workaround for this.
So, I believe the "two models" version is the best for your case where you seem to "need" a model with single input, single output and fit.
Then I'd do this:
inputs = tf.keras.layers.Input(input_shape_without_batch_size)
loss_outputs = tf.keras.layers.Dense(num_classes,activation='softmax')(inputs)
final_outputs = tf.keras.layers.Lambda(lambda x: tf.math.argmax(x, axis=-1))(loss_outputs)
training_model = tf.keras.models.Model(inputs, loss_outputs)
final_model = tf.keras.models.Model(inputs, final_outputs)
training_model.compile(.....)
training_model.fit(....)
results = final_model.predict(...)
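For completeness, a minimal end-to-end sketch of the two-model idea; the data, input shape and compile arguments below are illustrative assumptions, not part of the original answer:

import numpy as np
import tensorflow as tf

num_classes = 5
# hypothetical data just to make the sketch runnable
x_train = np.random.rand(100, 20).astype('float32')
y_train = tf.keras.utils.to_categorical(np.random.randint(0, num_classes, size=100), num_classes)

inputs = tf.keras.layers.Input(shape=(20,))
loss_outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(inputs)
final_outputs = tf.keras.layers.Lambda(lambda x: tf.math.argmax(x, axis=-1))(loss_outputs)

training_model = tf.keras.models.Model(inputs, loss_outputs)  # trained on one-hot targets
final_model = tf.keras.models.Model(inputs, final_outputs)    # shares the same weights, returns class ids

training_model.compile(optimizer='adam', loss='categorical_crossentropy')
training_model.fit(x_train, y_train, epochs=2)

print(final_model.predict(x_train[:3]))  # e.g. [4 1 4]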

Custom Keras loss function with the output's gradient [duplicate]

I am using a TF2 (2.3.0) NN to approximate the function y that solves the ODE y' + 3y = 0.
I have defined a custom loss class and function in which I am trying to differentiate the single output with respect to the single input so that the equation holds, provided that y_true is zero:
from tensorflow.keras.losses import Loss
import tensorflow as tf

class CustomLossOde(Loss):
    def __init__(self, x, model, name='ode_loss'):
        super().__init__(name=name)
        self.x = x
        self.model = model

    def call(self, y_true, y_pred):
        with tf.GradientTape() as tape:
            tape.watch(self.x)
            y_p = self.model(self.x)
        dy_dx = tape.gradient(y_p, self.x)
        loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
        return loss
but running the following NN:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
from custom_loss_ode import CustomLossOde
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
loss = CustomLossOde(model.input, model)
model.compile(optimizer=Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99),loss=loss)
model.run_eagerly = True
model.fit(x_train, y_train, batch_size=16, epochs=30)
For now I am getting 0 loss from the first epoch, which doesn't make any sense.
I have printed both y_true and y_pred from within the function and they seem OK, so I suspect that the problem is in the gradient, which I didn't succeed in printing.
Appreciate any help.
Defining a custom loss with the high-level Keras API is a bit difficult in this case. I would instead write the training loop from scratch, as it allows finer-grained control over what you can do.
I took inspiration from those two guides :
Advanced Automatic Differentiation
Writing a training loop from scratch
Basically, I used the fact that multiple tapes can be nested and interact seamlessly. I use one to compute the loss function and the other to calculate the gradients to be propagated by the optimizer.
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input

num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))

inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)

# using the high-level tf.data API for data handling
x_train = tf.reshape(x_train, (-1, 1))
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(1)

opt = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99)

for step, (x, y_true) in enumerate(dataset):
    # we need to convert x to a variable if we want the tape to be
    # able to compute the gradient according to x
    x_variable = tf.Variable(x)
    with tf.GradientTape() as model_tape:
        with tf.GradientTape() as loss_tape:
            loss_tape.watch(x_variable)
            y_pred = model(x_variable)
        dy_dx = loss_tape.gradient(y_pred, x_variable)
        loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
    grad = model_tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grad, model.trainable_variables))
    if step % 20 == 0:
        print(f"Step {step}: loss={loss.numpy()}")
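If per-step Python overhead matters, the same nested-tape step can also be wrapped in a tf.function. This is only a sketch assuming the model, opt and dataset defined above; it watches the input tensor directly instead of creating a tf.Variable:

@tf.function
def train_step(x, y_true):
    with tf.GradientTape() as model_tape:
        with tf.GradientTape() as loss_tape:
            loss_tape.watch(x)  # watch the input tensor itself
            y_pred = model(x)
        dy_dx = loss_tape.gradient(y_pred, x)
        loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
    grad = model_tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grad, model.trainable_variables))
    return loss

for step, (x, y_true) in enumerate(dataset):
    loss = train_step(x, y_true)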

Join metrics of every output in Keras (in multiple output)

I am working on a multiple output model in Keras. I've implemented two custom metrics, auroc and auprc, that are passed to the compile method of the Keras model:
def auc(y_true, y_pred, curve='PR'):
    score, up_opt = tf.compat.v1.metrics.auc(y_true, y_pred, curve=curve, summation_method="careful_interpolation")
    K.get_session().run(tf.local_variables_initializer())
    with tf.control_dependencies([up_opt]):
        score = tf.identity(score)
    return score

def auprc(y_true, y_pred):
    return auc(y_true, y_pred, curve='PR')

def auroc(y_true, y_pred):
    return auc(y_true, y_pred, curve='ROC')

mlp_model.compile(loss=...,
                  optimizer=...,
                  metrics=[auprc, auroc])
Using this method I obtain auprc/auroc values for every output, but to optimize my hyperparameters with a Bayesian optimizer I need a single metric (e.g. the average or the sum of the auprc over all outputs). I can't figure out how to join my metrics into a single one.
EDIT: here is an example of the desired results.
Currently, for every epoch the following metrics are printed:
out1_auprc: 0.0267 - out2_auprc: 0.0277 - out3_auprc: 0.0294
where out1, out2, out3 are my neural network outputs. I would like to obtain something like:
average_auprc: 0.0279 - out1_auprc: 0.0267 - out2_auprc: 0.0277 - out3_auprc: 0.0294
I am using Keras Tuner for Bayesian Optimization.
Any help is appreciated, thank you.
I worked around the problem by creating a custom callback:
from tensorflow.keras.callbacks import Callback

class MergeMetrics(Callback):

    def __init__(self, **kargs):
        super(MergeMetrics, self).__init__(**kargs)

    def on_epoch_begin(self, epoch, logs={}):
        return

    def on_epoch_end(self, epoch, logs={}):
        logs['merge_metrics'] = 0.5 * logs["y1_mse"] + 0.5 * logs["y2_mse"]
I use this callback to merge two metrics coming from two different outputs. I use a simple problem as an example, but you can easily integrate it into your own problem and extend it to a validation set.
This is the dummy example:
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

X = np.random.uniform(0, 1, (1000, 10))
y1 = np.random.uniform(0, 1, 1000)
y2 = np.random.uniform(0, 1, 1000)

inp = Input((10,))
x = Dense(32, activation='relu')(inp)
out1 = Dense(1, name='y1')(x)
out2 = Dense(1, name='y2')(x)

m = Model(inp, [out1, out2])
m.compile('adam', 'mae', metrics='mse')

checkpoint = MergeMetrics()
m.fit(X, [y1, y2], epochs=10, callbacks=[checkpoint])
The printed output:
loss: ..... y1_mse: 0.0863 - y2_mse: 0.0875 - merge_metrics: 0.0869
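Adapted to the question's metrics, the callback might look like the sketch below; the log keys out1_auprc, out2_auprc and out3_auprc are assumed from the printed line in the question:

class MergeAuprc(Callback):

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # average the per-output AUPRC values into a single number
        logs['average_auprc'] = (logs['out1_auprc'] + logs['out2_auprc'] + logs['out3_auprc']) / 3.0

Keras Tuner should then be able to use 'average_auprc' (or its val_ counterpart) as the optimization objective.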

Why does training using tf.GradientTape in tensorflow 2 have different behavior to training using fit API?

I am new to using TensorFlow 2.
I am familiar with using Keras in TensorFlow 1, and I usually use the fit API to train models. But TensorFlow 2 introduced eager execution, so I implemented and compared a simple image classifier on the CIFAR-10 dataset trained with both fit and tf.GradientTape, for 20 epochs each.
After several runs, the results are as follows:
Model trained with fit API
Training dataset, loss is around 0.61-0.65 with accuracy of 76% - 80%
Validation dataset, loss is around 0.8 with accuracy of 72% - 75%
Model trained with tf.GradientTape
Training dataset, loss is around 0.15-0.2 with accuracy of 91% - 94%
Validation dataset, loss is around 1.8-2 with accuracy of 64% - 67%
I am not sure why the model exhibits such different behavior. I think I might have implemented something wrong. It also seems weird that with tf.GradientTape the model starts to overfit the training dataset more quickly.
Here are some snippets
Using fit API
model = SimpleClassifier(10)
model.compile(
    optimizer=Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.CategoricalAccuracy()]
)
model.fit(X[:split_idx, :, :, :], y[:split_idx, :], batch_size=256, epochs=20, validation_data=(X[split_idx:, :, :, :], y[split_idx:, :]))
Using tf.GradientTape
with tf.GradientTape() as tape:
    y_pred = model(tf.stop_gradient(train_X))
    loss = loss_fn(train_y, y_pred)

gradients = tape.gradient(loss, model.trainable_weights)
model.optimizer.apply_gradients(zip(gradients, model.trainable_weights))
The full code can be found here in Colab
References
https://www.tensorflow.org/guide/effective_tf2
https://www.tensorflow.org/api_docs/python/tf/GradientTape?version=stable
There are a few things in the tf.GradientTape code that should be fixed:
1) Use trainable_variables, not trainable_weights. You want to apply the gradients to all trainable variables, not only the model weights:
# gradients = tape.gradient(loss, model.trainable_weights)
gradients = tape.gradient(loss, model.trainable_variables)
# and
# model.optimizer.apply_gradients(zip(gradients, model.trainable_weights))
model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
2) Remove tf.stop_gradient from the input tensor.
with tf.GradientTape() as tape:
    # y_pred = model(tf.stop_gradient(train_X))
    y_pred = model(train_X, training=True)
Note that I also added the training parameter. It should also be included in the model definition, so that layers which depend on the phase (like BatchNormalization and Dropout) behave correctly:
def call(self, X, training=None):
    X = self.cnn_1(X)
    X = self.bn_1(X, training=training)
    X = self.cnn_2(X)
    X = self.max_pool_2d(X)
    X = self.dropout_1(X)
    X = self.cnn_3(X)
    X = self.bn_2(X, training=training)
    X = self.cnn_4(X)
    X = self.bn_3(X, training=training)
    X = self.cnn_5(X)
    X = self.max_pool_2d(X)
    X = self.dropout_2(X)
    X = self.flatten(X)
    X = self.dense_1(X)
    X = self.dropout_3(X, training=training)
    X = self.dense_2(X)
    return self.out(X)
With these few changes I managed to get slightly better scores that are more comparable to the keras.fit results:
[19/20] loss: 0.64020, acc: 0.76965, val_loss: 0.71291, val_acc: 0.75318: 100%|██████████| 137/137 [00:12<00:00, 11.25it/s]
[20/20] loss: 0.62999, acc: 0.77649, val_loss: 0.77925, val_acc: 0.73219: 100%|██████████| 137/137 [00:12<00:00, 11.30it/s]
The answer:
The difference was probably the fact that Keras fit does most of these things under the hood.
Finally, just for clarity and reproducibility, the partial training/eval code I used:
for bIdx, (train_X, train_y) in enumerate(train_batch):
    if bIdx < epoch_max_iter:
        with tf.GradientTape() as tape:
            y_pred = model(train_X, training=True)
            loss = loss_fn(train_y, y_pred)
            total_loss += (np.sum(loss.numpy()) * train_X.shape[0])
            total_num += train_X.shape[0]
        # gradients = tape.gradient(loss, model.trainable_weights)
        gradients = tape.gradient(loss, model.trainable_variables)
        total_acc += (metrics(train_y, y_pred) * train_X.shape[0])
        running_loss = (total_loss / total_num)
        running_acc = (total_acc / total_num)
        # model.optimizer.apply_gradients(zip(gradients, model.trainable_weights))
        model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        pbar.set_description("[{}/{}] loss: {:.5f}, acc: {:.5f}".format(e, epochs, running_loss, running_acc))
        pbar.refresh()
        pbar.update()
and the evaluation one:
# Eval loop
# Calculate something wrong here
val_total_loss = 0
val_total_acc = 0
total_val_num = 0
for bIdx, (val_X, val_y) in enumerate(val_batch):
    if bIdx >= max_val_iterations:
        break
    y_pred = model(val_X, training=False)

Custom Keras loss function that conditionally creates a zero gradient

My problem is that I don't want the weights to be adjusted if y_true takes certain values. I do not want to simply remove those examples from the training data because of the nature of the RNN I am trying to use.
Is there a way to write a conditional loss function in Keras with this behavior?
For example: if y_true is negative, apply a zero gradient so that the parameters in the model do not change; if y_true is positive, use loss = losses.mean_squared_error(y_true, y_pred).
You can define a custom loss function and simply use K.switch to conditionally get zero loss:
from keras import backend as K
from keras import losses

def custom_loss(y_true, y_pred):
    loss = losses.mean_squared_error(y_true, y_pred)
    return K.switch(K.flatten(K.equal(y_true, 0.)), K.zeros_like(loss), loss)
Test:
import numpy as np
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(1, input_shape=(1,)))
model.compile(loss=custom_loss, optimizer='adam')

weights, bias = model.layers[0].get_weights()

x = np.array([1, 2, 3])
y = np.array([0, 0, 0])
model.train_on_batch(x, y)

# check that the parameters have not changed after training on the batch
>>> (weights == model.layers[0].get_weights()[0]).all()
True
>>> (bias == model.layers[0].get_weights()[1]).all()
True
Since the y's come in batches, you need to select the non-zero ones from the batch inside the custom loss function:
import tensorflow as tf
from tensorflow.keras import losses

def myloss(y_true, y_pred):
    idx = tf.not_equal(y_true, 0)
    y_true = tf.boolean_mask(y_true, idx)
    y_pred = tf.boolean_mask(y_pred, idx)
    return losses.mean_squared_error(y_true, y_pred)
Then it can be used as such:
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.Sequential([Dense(32, input_shape=(2,)), Dense(1)])
model.compile('adam', loss=myloss)
x = np.random.randn(2, 2)
y = np.array([1, 0])
model.fit(x, y)
But you might need extra logic in the loss function in case all the y_true values in the batch are zero; in that case, the loss function can be modified as follows:
def myloss2(y_true, y_pred):
    idx = tf.not_equal(y_true, 0)
    y_true = tf.boolean_mask(y_true, idx)
    y_pred = tf.boolean_mask(y_pred, idx)
    loss = tf.cond(tf.equal(tf.shape(y_pred)[0], 0),
                   lambda: tf.constant(0, dtype=tf.float32),
                   lambda: losses.mean_squared_error(y_true, y_pred))
    return loss
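A quick sanity check of the all-zero case (a small sketch; it assumes myloss2 and the imports above are in scope):

import tensorflow as tf

y_true = tf.constant([[0.], [0.]])
y_pred = tf.constant([[0.5], [1.5]])
print(myloss2(y_true, y_pred).numpy())  # 0.0, because every y_true in the batch is zero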
