Keras/Tensorflow: Combined Loss function for single output - python

I have only one output for my model, but I would like to combine two different loss functions:
def get_model():
# create the model here
model = Model(inputs=image, outputs=output)
alpha = 0.2
model.compile(loss=[mse, gse],
loss_weights=[1-alpha, alpha]
, ...)
but it complains that I need to have two outputs because I defined two losses:
ValueError: When passing a list as loss, it should have one entry per model outputs.
The model has 1 outputs, but you passed loss=[<function mse at 0x0000024D7E1FB378>, <function gse at 0x0000024D7E1FB510>]
Can I possibly write my final loss function without having to create another loss function (because that would restrict me from changing the alpha outside the loss function)?
How do I do something like (1-alpha)*mse + alpha*gse?
Update:
Both my loss functions are equivalent to the function signature of any builtin keras loss function, takes in y_true and y_pred and gives a tensor back for loss (which can be reduced to a scalar using K.mean()), but I believe, how these loss functions are defined shouldn't affect the answer as long as they return valid losses.
def gse(y_true, y_pred):
# some tensor operation on y_pred and y_true
return K.mean(K.square(y_pred - y_true), axis=-1)

Specify a custom function for the loss:
model = Model(inputs=image, outputs=output)
alpha = 0.2
model.compile(
loss=lambda y_true, y_pred: (1 - alpha) * mse(y_true, y_pred) + alpha * gse(y_true, y_pred),
...)
Or if you don't want an ugly lambda make it into an actual function:
def my_loss(y_true, y_pred):
return (1 - alpha) * mse(y_true, y_pred) + alpha * gse(y_true, y_pred)
model = Model(inputs=image, outputs=output)
alpha = 0.2
model.compile(loss=my_loss, ...)
EDIT:
If your alpha is not some global constant, you can have a "loss function factory":
def make_my_loss(alpha):
def my_loss(y_true, y_pred):
return (1 - alpha) * mse(y_true, y_pred) + alpha * gse(y_true, y_pred)
return my_loss
model = Model(inputs=image, outputs=output)
alpha = 0.2
my_loss = make_my_loss(alpha)
model.compile(loss=my_loss, ...)

Yes, define your own custom loss function and pass that to the loss argument upon compiling:
def custom_loss(y_true, y_pred):
return (1-alpha) * K.mean(K.square(y_true-y_pred)) + alpha * gse
(Not sure what you mean with gse). It can be helpful to have a look at how the vanilla losses are implemented in Keras: https://github.com/keras-team/keras/blob/master/keras/losses.py

loss function should be one function.You are giving your model a list of two functions
try:
def mse(y_true, y_pred):
return K.mean(K.square(y_pred - y_true), axis=-1)
model.compile(loss= (mse(y_true, y_pred)*(1-alpha) + gse(y_true, y_pred)*alpha),
, ...)

Not that this answer particularly addresses the original question, I thought of writing it because the same error occurs when trying to load a keras model that has a custom loss using keras.models.load_model, and it's not been properly answered anywhere. Specifically, following the VAE example code in keras github repository, this error occurs when loading the VAE model after been saved with model.save.
The solution is to save only the weights using vae.save_weights('file.h5') instead of saving the full model. However, you would have to build and compile the model again before loading the weights using vae.load_weights('file.h5').
Following is an example implementation.
class VAE():
def build_model(self): # latent_dim and intermediate_dim can be passed as arguments
def sampling(args):
"""Reparameterization trick by sampling from an isotropic unit Gaussian.
# Arguments
args (tensor): mean and log of variance of Q(z|X)
# Returns
z (tensor): sampled latent vector
"""
z_mean, z_log_var = args
batch = K.shape(z_mean)[0]
dim = K.int_shape(z_mean)[1]
# by default, random_normal has mean = 0 and std = 1.0
epsilon = K.random_normal(shape=(batch, dim))
return z_mean + K.exp(0.5 * z_log_var) * epsilon
# original_dim = self.no_features
# intermediate_dim = 256
latent_dim = 8
inputs = Input(shape=(self.no_features,))
x = Dense(256, activation='relu')(inputs)
x = Dense(128, activation='relu')(x)
x = Dense(64, activation='relu')(x)
z_mean = Dense(latent_dim, name='z_mean')(x)
z_log_var = Dense(latent_dim, name='z_log_var')(x)
# use reparameterization trick to push the sampling out as input
# note that "output_shape" isn't necessary with the TensorFlow backend
z = Lambda(sampling, output_shape=(latent_dim,), name='z')([z_mean, z_log_var])
# instantiate encoder model
encoder = Model(inputs, [z_mean, z_log_var, z], name='encoder')
# build decoder model
latent_inputs = Input(shape=(latent_dim,), name='z_sampling')
x = Dense(32, activation='relu')(latent_inputs)
x = Dense(48, activation='relu')(x)
x = Dense(64, activation='relu')(x)
outputs = Dense(self.no_features, activation='linear')(x)
# instantiate decoder model
decoder = Model(latent_inputs, outputs, name='decoder')
# instantiate VAE model
outputs = decoder(encoder(inputs)[2])
VAE = Model(inputs, outputs, name='vae_mlp')
reconstruction_loss = mse(inputs, outputs)
reconstruction_loss *= self.no_features
kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5
vae_loss = K.mean(reconstruction_loss + kl_loss)
VAE.add_loss(vae_loss)
VAE.compile(optimizer='adam')
return VAE
Now,
vae_cls = VAE()
vae = vae_cls.build_model()
# vae.fit()
vae.save_weights('file.h5')
Load model and predict (if in a different script, you need to import the VAE class),
vae_cls = VAE()
vae = vae_cls.build_model()
vae.load_weights('file.h5')
# vae.predict()
Finally, The Difference: [ref]
Keras model.save saves,
Model weights
Model architecture
Model compilation details (loss function(s) and metrics)
Model optimizer and regularizer states
Keras model.save_weights saves only the model weights. Keras model.to_json() saves the model architecture.
Hope this helps someone experimenting with variational autoencoders.

Combine MAE and RMSE together:
import tensorflow as tf
from tensorflow import keras
def loss_fn_mae_rmse(y_true, y_pred, alpha=0.8):
mae = keras.losses.MeanAbsoluteError()
mse = keras.losses.MeanSquaredError()
return alpha * mae(y_true, y_pred) + (1 - alpha) * tf.sqrt(mse(y_true, y_pred))
model = keras.Model(inputs=..., outputs=...)
opt = keras.optimizers.Adam(learning_rate=1e-4)
model.compile(optimizer=opt, loss=loss_fn_mae_rmse, metrics=['mae'])
At the same time, if you want to load this model after training and saved to disk:
model = keras.models.load_model('path/to/model.h5', custom_objects={'loss_fn_mae_rmse': loss_fn_mae_rmse})

Related

Custom loss function for out of distribution detection using CNN in Tensorflow 2.0+

My question is in reference to the paper Learning Confidence for Out-of-Distribution Detection in Neural Networks.
I need help in creating a custom loss function in tensorflow 2.0+ as per the paper to get confident prediction from the CNN on a in distribution (if the image belongs to train categories) image while a low prediction for an out of distribution (any random image) image. The paper suggests adding a confidence estimation branch to any conventional feedforward architecture in parallel with the original class prediction branch (refer to image below)
In order to define the loss function, the softmax prediction probabilities are adjusted by interpolating between the original predictions(pi) and the target probability distribution y, where the degree of interpolation is indicated by the network’s confidence(c):
pi'= c · pi + (1 − c)yi and the final loss is :
I need help in implementing this along with the loss function in Tensorflow 2.0+, below is what I could think of, from my knowledge:
import tensorflow.keras.backend as k
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50
#Defining custom loss function
def custom_loss(c):
def loss(y_true, y_pred):
interpolated_p = c*y_pred+ (1-c)*y_true
return -k.reduce_sum((k.log(interpolated_p) * y_true), axis=-1) - k.log(c)
return loss
#Defining model strcuture using resnet50
basemodel = ResNet50(weights = "imagenet",include_top = False)
headmodel = basemodel.output
headmodel = layers.AveragePooling2D(pool_size = (7,7))(headmodel)
#Add a sigmoid layer to the pooling output
conf_branch = layers.Dense(1,activation = "sigmoid",name = "confidence_branch")(headmodel)
# Add a softmax layer after the pooling output
softmax_branch = layers.Dense(10,activation = "softmax",name = "softmax_branch")(headmodel)
# Instantiate an end-to-end model predicting both confidence and class prediction
model = keras.Model(
inputs=basemodel.input,
outputs=[softmax_branch, conf_branch],
)
model.compile(loss=custom_loss(c=conf_branch.output), optimizer='rmsprop')
Appreciate any help on this ! Thanks !
The following is the code I wrote for the keras implementation:
num_classes = 10
basemodel = ResNet50(weights = "imagenet",include_top = False)
headmodel = basemodel.output
headmodel = layers.AveragePooling2D(pool_size = (7,7))(headmodel)
conf_branch = layers.Dense(1,activation = "sigmoid",name="confidence_branch")(headmodel)
softmax_branch = layers.Dense(num_classes,activation = "softmax",name = "softmax_branch")(headmodel)
output = Concatenate(axis=-1)([softmax_branch , conf_branch])
def custom_loss(y_true, y_pred, budget=0.3):
with tf.compat.v1.variable_scope("LAMBDA", reuse=tf.compat.v1.AUTO_REUSE):
LAMBDA = tf.compat.v1.get_variable("LAMBDA", dtype=tf.float32, initializer=tf.constant(0.1))
pred_original = y_pred[:, 0:num_classes]
confidence = y_pred[:, num_classes]
eps = 1e-12
pred_original = tf.clip_by_value(pred_original, 0. + eps, 1. - eps)
confidence = tf.clip_by_value(confidence, 0. + eps, 1. - eps)
b = np.random.uniform(size=y_true.shape[0], low=0.0, high=1.0)
conf = confidence * b + (1 - b)
conf = tf.expand_dims(conf, axis=-1)
pred_new = pred_original * conf + y_true * (1 - conf)
xentropy_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.math.log(pred_new), axis=-1))
confidence_loss = tf.reduce_mean(-tf.math.log(confidence))
total_loss = xentropy_loss + LAMBDA * confidence_loss
def true_func():
return LAMBDA / 1.01
def false_func():
return LAMBDA / 0.99
LAMBDA_NEW = tf.cond(budget > confidence_loss, true_func, false_func)
LAMBDA.assign(LAMBDA_NEW)
# tf.print(LAMBDA)
return total_loss
def accuracy(y_true, y_pred):
y_pred = y_pred[:, :num_classes]
correct_pred = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
return accuracy
model = Model(inputs=basemodel.input, outputs=output)
optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss=custom_loss, optimizer=optimizer, metrics=[accuracy])

How to apply a loss metric that will penalize predicting all zeros in multilabel classification problem?

Say I have a classification problem that has 30 potential binary labels. These labels are not mutually exclusive. The labels tend to be sparse--there is, on average, 1 positive label per all 30 labels but sometimes more than only 1. In the following code, how can I penalize the model from predicting all zeros? The accuracy will be high, but recall will be awful!
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
OUTPUT_NODES = 30
np.random.seed(0)
def get_dataset():
"""
Get a dataset of X and y. This is a learnable problem as there is some signal in the features. 10% of the time, a
positive-output's index will also have a positive feature for that index
:return: X and y data for training
"""
n_observations = 30000
y = np.random.rand(n_observations, OUTPUT_NODES)
y = (y <= (1 / OUTPUT_NODES)).astype(int) # Makes a sparse output where there is roughly 1 positive label: ((1 / OUTPUT_NODES) * OUTPUT_NODES ≈ 1)
X = np.zeros((n_observations, OUTPUT_NODES))
for i in range(len(y)):
for j, feature in enumerate(y[i]):
if feature == 1:
X[i][j] = 1 if np.random.rand(1) > 0.9 else 0 # Makes the input features more noisy
# X[i][j] = 1 # Using this instead will make the model perform very well
return X, y
def create_model():
input_layer = Input(shape=(OUTPUT_NODES, ))
dense1 = Dense(100, activation='relu')(input_layer)
dense2 = Dense(100, activation='relu')(dense1)
output_layer = Dense(30, activation='sigmoid')(dense2)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['Recall'])
return model
def main():
X, y = get_dataset()
model = create_model()
model.fit(X, y, epochs=10, batch_size=10)
X_pred = np.random.randint(0, 2, (100, OUTPUT_NODES))
y_pred = model.predict(X_pred)
print(X_pred)
print(y_pred.round(1))
if __name__ == '__main__':
main()
I believe I read here that I could use:
weighted_cross_entropy_with_logits
to address this issue. How would that affect my final output layer's activation functions? Would I have to have an activation function? How do I specify a penalty to misclassifications of a true positive class?
Ok, it is an interesting problem
First you need to define a weighted cross entropy loss wrapper:
def wce_logits(positive_class_weight=1.):
def mylossw(y_true, logits):
cross_entropy = tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(logits=logits, labels=tf.cast(y_true, dtype=tf.float32), pos_weight=positive_class_weight))
return cross_entropy
return mylossw
The positive_class_weight is applied to the positive class data. You need this wrapper for tf.nn.weighted_cross_entropy_with_logits to get a loss function that takes y_true and y_pred (only) as inputs.
Note that you must cast y_true to float32.
Second, you can not use the predefined Recall, because it does not work with logits. I found a workaround in this discussion
class Recall(tf.keras.metrics.Recall):
def __init__(self, from_logits=False, *args, **kwargs):
super().__init__(*args, **kwargs)
self._from_logits = from_logits
def update_state(self, y_true, y_pred, sample_weight=None):
if self._from_logits:
super(Recall, self).update_state(y_true, tf.nn.sigmoid(y_pred), sample_weight)
else:
super(Recall, self).update_state(y_true, y_pred, sample_weight)
Finally, you need to remove the sigmoid activation from the last layer as you are using logits
def create_model():
input_layer = Input(shape=(OUTPUT_NODES, ))
dense1 = Dense(100, activation='relu')(input_layer)
dense2 = Dense(100, activation='relu')(dense1)
output_layer = Dense(30)(dense2)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam', loss=wce_logits(positive_class_weight=27.), metrics=[Recall(from_logits=True)])
return model
Note that the positive weight is set to 27 here. You can read a discussion on how to correctly calculate the weight

How to write this custom loss function so it produces a loss for each sample?

I'm using this custom loss function for ccc
def ccc(y_true, y_pred):
ccc = ((ccc_v(y_true, y_pred) + ccc_a(y_true, y_pred)) / 2)
return 1 - ccc
def ccc_v(y_true, y_pred):
x = y_true[:,0]
y = y_pred[:,0]
x_mean = K.mean(x, axis=0)
y_mean = K.mean(y, axis=0)
covar = K.mean( (x - x_mean) * (y - y_mean) )
x_var = K.var(x)
y_var = K.var(y)
ccc = (2.0 * covar) / (x_var + y_var + (x_mean + y_mean)**2)
return ccc
def ccc_a(y_true, y_pred):
x = y_true[:,1]
y = y_pred[:,1]
x_mean = K.mean(x, axis=0)
y_mean = K.mean(y, axis=0)
covar = K.mean( (x - x_mean) * (y - y_mean) )
x_var = K.var(x)
y_var = K.var(y)
ccc = (2.0 * covar) / (x_var + y_var + (x_mean + y_mean)**2)
return ccc
Currently the loss function ccc returns a scalar. The loss function is split into 2 different functions (ccc_v and ccc_a) because I use them as metrics as well.
I've read from Keras doc and this question that a custom loss function should return a list of losses, one for each sample.
First question: my model trains even if the loss function returns a scalar. Is it that bad? How is training different if I use a loss function whose output is a scalar instead of a list of scalars?
Second question: how can I rewrite my loss function to return a list of losses? I know I should avoid means and sums but in my case I think it's not possible because there's not a global mean but different ones, one a the numerator for the covariance and a couple at the denominator for the variances.
if your using tensorflow there are automatic apis for calculating loss
tf.keras.losses.mse()
tf.keras.losses.mae()
tf.keras.losses.Huber()
# Define the loss function
def loss_function(w1, b1, w2, b2, features = borrower_features, targets = default):
predictions = model(w1, b1, w2, b2)
# Pass targets and predictions to the cross entropy loss
return keras.losses.binary_crossentropy(targets, predictions)
#if your using categorical_crossentropy than return the losses for it.
#convert your image into a single np.array for input
#build your SoftMax model
# Define a sequential model
model=keras.Sequential()
# Define a hidden layer
model.add(keras.layers.Dense(16, activation='relu', input_shape=(784,)))
# Define the output layer
model.add(keras.layers.Dense(4,activation='softmax'))
# Compile the model
model.compile('SGD', loss='categorical_crossentropy',metrics=['accuracy'])
# Complete the fitting operation
train_data=train_data.reshape((50,784))
# Fit the model
model.fit(train_data, train_labels, validation_split=0.2, epochs=3)
# Reshape test data
test_data = test_data.reshape(10, 784)
# Evaluate the model
model.evaluate(test_data, test_labels)

Keras my_layer.output returning KerasTensor object instead of Tensor object (in custom loss function)

I'm trying to build a custom loss function in Keras v2.4.3:
(as explained in this answer)
def vae_loss(x: tf.Tensor, x_decoded_mean: tf.Tensor,
original_dim=original_dim):
z_mean = encoder.get_layer('mean').output
z_log_var = encoder.get_layer('log-var').output
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(
1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)
return vae_loss
But I think it's behaving much different than expected (perhaps because of my Keras version?), I'm getting this error:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
And I think that's because encoder.get_layer('mean').output is returning a KerasTensor object instead of a tf.Tensor object (as the other answer indicates).
What am I doing wrong here? How can I access the output of a given layer from inside a custom loss function?
I think it's very simple using model.add_loss(). this functionality enables you to pass multiple inputs to your custom loss.
To make a reliable example I produce a simple VAE where I add the VAE loss using model.add_loss()
The full model structure is like below:
def sampling(args):
z_mean, z_log_sigma = args
batch_size = tf.shape(z_mean)[0]
epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0., stddev=1.)
return z_mean + K.exp(0.5 * z_log_sigma) * epsilon
def vae_loss(x, x_decoded_mean, z_log_var, z_mean):
xent_loss = original_dim * K.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var))
vae_loss = K.mean(xent_loss + kl_loss)
return vae_loss
def get_model():
### encoder ###
inp = Input(shape=(n_features,))
enc = Dense(64)(inp)
z = Dense(32, activation="relu")(enc)
z_mean = Dense(latent_dim)(z)
z_log_var = Dense(latent_dim)(z)
encoder = Model(inp, [z_mean, z_log_var])
### decoder ###
inp_z = Input(shape=(latent_dim,))
dec = Dense(64)(inp_z)
out = Dense(n_features)(dec)
decoder = Model(inp_z, out)
### encoder + decoder ###
z_mean, z_log_sigma = encoder(inp)
z = Lambda(sampling)([z_mean, z_log_var])
pred = decoder(z)
vae = Model(inp, pred)
vae.add_loss(vae_loss(inp, pred, z_log_var, z_mean)) # <======= add_loss
vae.compile(loss=None, optimizer='adam')
return vae, encoder, decoder
The running notebook is available here: https://colab.research.google.com/drive/18day9KMEbH8FeYNJlCum0xMLOtf1bXn8?usp=sharing

Knowledge Distillation loss with Tensorflow 2 + Keras

I am trying to implement a very simple keras model that uses Knowledge Distillation [1] from another model.
Roughly, I need to replace the original loss L(y_true, y_pred) by L(y_true, y_pred)+L(y_teacher_pred, y_pred) where y_teacher_pred is the prediction of another model.
I've tried to do
def create_student_model_with_distillation(teacher_model):
inp = tf.keras.layers.Input(shape=(21,))
model = tf.keras.models.Sequential()
model.add(inp)
model.add(...)
model.add(tf.keras.layers.Dense(units=1))
teacher_pred = teacher_model(inp)
def my_loss(y_true,y_pred):
loss = tf.keras.losses.mean_squared_error(y_true, y_pred)
loss += tf.keras.losses.mean_squared_error(teacher_pred, y_pred)
return loss
model.compile(loss=my_loss, optimizer='adam')
return model
However, when I try to call fit on my model, I am getting
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
How can I solve this issue ?
Refs
[1] https://arxiv.org/abs/1503.02531
Actually, this blogpost is answer to your question: keras blog
But in short - you should use new TF2 API and call teacher's predict before the tf.GradientTape() block:
def train_step(self, data):
# Unpack data
x, y = data
# Forward pass of teacher
teacher_predictions = self.teacher(x, training=False)
with tf.GradientTape() as tape:
# Forward pass of student
student_predictions = self.student(x, training=True)
# Compute losses
student_loss = self.student_loss_fn(y, student_predictions)
distillation_loss = self.distillation_loss_fn(
tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
tf.nn.softmax(student_predictions / self.temperature, axis=1),
)
loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss

Categories