Part of custom loss function not differentiable - python

For my problem, I want to predict customer review scores ranging from 1 to 5. I thought it would be good to implement this as a regression problem because a predicted 1 from the model while 5 being the true value should be a "worse" prediction than 4. It is also wished, that the model performs somehow equally good for all review score classes. Because my dataset is highly unbalanced I want to create a own loss function. With a normal MSE implemenation the model will only be good at predicting the majority class. Therefore I want to estimate the MSE for every review class (1,2,3,4 and 5 star reviews) and then take the mean of these five MSE scores. That's how the model should be incentivized to also perform good at predicting the smaller classes.
The following code shows my tensorflow loss function:
def custom_loss_function(y_true, y_pred):
y_true_t = tf.cast(tf.Variable(y_true, validate_shape=False),dtype="float16")
y_pred_t = tf.cast(tf.Variable(y_pred, validate_shape=False),dtype="float16")
one = tf.cast(tf.Variable([1]),dtype="float16")
two = tf.cast(tf.Variable([2]),dtype="float16")
three = tf.cast(tf.Variable([3]),dtype="float16")
four = tf.cast(tf.Variable([4]),dtype="float16")
five = tf.cast(tf.Variable([5]),dtype="float16")
# index for a given review class
i_one = tf.where(tf.equal(y_true_t, one))
i_two = tf.where(tf.equal(y_true_t, two))
i_three = tf.where(tf.equal(y_true_t, three))
i_four = tf.where(tf.equal(y_true_t, four))
i_five = tf.where(tf.equal(y_true_t, five))
# predictions for the found indicies for a review class
y_pred_one=tf.gather(y_pred_t,i_one)
y_pred_two=tf.gather(y_pred_t,i_two)
y_pred_three=tf.gather(y_pred_t,i_three)
y_pred_four=tf.gather(y_pred_t,i_four)
y_pred_five=tf.gather(y_pred_t,i_five)
y_true_one=tf.gather(y_true_t,i_one)
y_true_two=tf.gather(y_true_t,i_two)
y_true_three=tf.gather(y_true_t,i_three)
y_true_four=tf.gather(y_true_t,i_four)
y_true_five=tf.gather(y_true_t,i_five)
mse1 = tf.reduce_mean(tf.cast(tf.square(y_true_one-y_pred_one),dtype="float"))
mse2 = tf.reduce_mean(tf.cast(tf.square(y_true_two-y_pred_two),dtype="float"))
mse3 = tf.reduce_mean(tf.cast(tf.square(y_true_three-y_pred_three),dtype="float"))
mse4 = tf.reduce_mean(tf.cast(tf.square(y_true_four-y_pred_four),dtype="float"))
mse5 = tf.reduce_mean(tf.cast(tf.square(y_true_five-y_pred_five),dtype="float"))
mse = (mse1+mse2+mse3+mse4+mse5)/5
return mse
When I try to use this loss function in my NN I get the following message:
ValueError: An operation has None for gradient. Please make sure
that all of your ops have a gradient defined (i.e. are
differentiable). Common ops without gradient: K.argmax, K.round,
K.eval.
I couldn't figure out why this error occurs. I thought that all tf operations should be differentiable but I could also be wrong.
Thanks for any help.

Related

pytorch cross-entropy-loss weights not working

I was playing around with some code and and it behaved differently than what i expected. So i dumbed it down to a minimally working example:
import torch
test_act = torch.tensor([[2.,0.]])
test_target = torch.tensor([0])
loss_function_test = torch.nn.CrossEntropyLoss()
loss_test = loss_function_test(test_act, test_target)
print(loss_test)
> tensor(0.1269)
weights=torch.tensor([0.1,0.5])
loss_function_test = torch.nn.CrossEntropyLoss(weight=weights)
loss_test = loss_function_test(test_act, test_target)
print(loss_test)
> tensor(0.1269)
As you can see the outputs are the same regardless if there are weights present or not. But i would expect the second output to be 0.0127
Is there some normalization going on that I dont know about? Or is it possibly bugged?
In this example, I add a second dataum with a different target class, and the effect of weights is visible.
import torch
test_act = torch.tensor([[2.,1.],[1.,4.]])
test_target = torch.tensor([0,1])
loss_function_test = torch.nn.CrossEntropyLoss()
loss_test = loss_function_test(test_act, test_target)
print(loss_test)
>>> tensor(0.1809)
weights=torch.tensor([0.1,0.5])
loss_function_test = torch.nn.CrossEntropyLoss(weight=weights)
loss_test = loss_function_test(test_act, test_target)
print(loss_test)
>>> tensor(0.0927)
This effect is because "The losses are averaged across observations for each minibatch. If the weight argument is specified then this is a weighted average" but only across the minibatch.
Personally I find this a bit strange and would think it would be useful to apply the weights globally (i.e. even if all classes are not present in each minibatch). One of the prominent uses of the weight parameter would ostensibly be to give more weight to classes that are under-represented in the dataset, but by this formulation the minority classes are only given higher weights for the minibatches in which they are present (which, of course, is a low percentage because they are a minority class).
In any case that is how Pytorch defines this operation.

Tensorflow Custom Regularization Term comparing the Prediction to the True value

Hello I am in need of a custom regularization term to add to my (binary cross entropy) Loss function. Can somebody help me with the Tensorflow syntax to implement this?
I simplified everything as much as possible so it could be easier to help me.
The model takes a dataset 10000 of 18 x 18 binary configurations as input and has a 16x16 of a configuration set as output. The neural network consists only of 2 Convlutional layer.
My model looks like this:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
EPOCHS = 10
model = models.Sequential()
model.add(layers.Conv2D(1,2,activation='relu',input_shape=[18,18,1]))
model.add(layers.Conv2D(1,2,activation='sigmoid',input_shape=[17,17,1]))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),loss=tf.keras.losses.BinaryCrossentropy())
model.fit(initial.reshape(10000,18,18,1),target.reshape(10000,16,16,1),batch_size = 1000, epochs=EPOCHS, verbose=1)
output = model(initial).numpy().reshape(10000,16,16)
Now I wrote a function which I'd like to use as an aditional regularization terme to have as a regularization term. This function takes the true and the prediction. Basically it multiplies every point of both with its 'right' neighbor. Then the difference is taken. I assumed that the true and prediction term is 16x16 (and not 10000x16x16). Is this correct?
def regularization_term(prediction, true):
order = list(range(1,4))
order.append(0)
deviation = (true*true[:,order]) - (prediction*prediction[:,order])
deviation = abs(deviation)**2
return 0.2 * deviation
I would really appreciate some help with adding something like this function as a regularization term to my loss for helping the neural network to train better to this 'right neighbor' interaction. I'm really struggling with using the customizable Tensorflow functionalities a lot.
Thank you, much appreciated.
It is quite simple. You need to specify a custom loss in which you define your adding regularization term. Something like this:
# to minimize!
def regularization_term(true, prediction):
order = list(range(1,4))
order.append(0)
deviation = (true*true[:,order]) - (prediction*prediction[:,order])
deviation = abs(deviation)**2
return 0.2 * deviation
def my_custom_loss(y_true, y_pred):
return tf.keras.losses.BinaryCrossentropy()(y_true, y_pred) + regularization_term(y_true, y_pred)
model.compile(optimizer='Adam', loss=my_custom_loss)
As stated by keras:
Any callable with the signature loss_fn(y_true, y_pred) that returns
an array of losses (one of sample in the input batch) can be passed to
compile() as a loss. Note that sample weighting is automatically
supported for any such loss.
So be sure to return an array of losses (EDIT: as I can see now it is possible to return also a simple scalar. It doesn't matter if you use for example the reduce function). Basically y_true and y_predicted have as first dimension the batch size.
here details: https://keras.io/api/losses/

How to create weighted cross entropy loss?

I have to deal with highly unbalanced data. As I understand, I need to use weighted cross entropy loss.
I tried this:
import tensorflow as tf
weights = np.array([<values>])
def loss(y_true, y_pred):
# weights.shape = (63,)
# y_true.shape = (64, 63)
# y_pred.shape = (64, 63)
return tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(y_true, y_pred, weights))
model.compile('adam', loss=loss, metrics=['acc'])
But there's an error:
ValueError: Creating variables on a non-first call to a function decorated with tf.function
How can I create this kind of loss?
I suggest in the first instance to resort to using class_weight from Keras.
class_weight
is a dictionary with {label:weight}
For example, if you have 20 times more examples in label 1 than in label 0, then you can write
# Assign 20 times more weight to label 0
model.fit(..., class_weight = {0:20, 1:0})
In this way you don't need to worry implementing weighted CCE on your own.
Additional note : in your model.compile() do not forget to use weighted_metrics=['accuracy'] in order to have a relevant reflection of your accuracy.
model.fit(..., class_weight = {0:20, 1:0}, weighted_metrics = ['accuracy'])
class weights is a dictionary that compensates for the imbalance in the data set. For example if you had a data set of 1000 dog images and 100 cat images your classifier be biased toward the dog class. If it predicted dog each time it would be correct 90 percent of the time. To compensate for the imbalance the class_weights dictionary enables you to weight samples of cats 10 times higher than that of dogs when calculating loss. One way is to use the class_weight method from sklearn as shown below
from sklearn.utils import class_weight
import numpy as np
class_weights = class_weight.compute_class_weight(
'balanced',
np.unique(train_generator.classes),
train_generator.classes)
If you are working with imbalance classes, you should use the class weights. For example if you have two classes where class 0 has twice as more data than class 1 :
class_weight = {0 :1, 1: 2}
When you compile, make use the weighted_metrics instead of just metrics or else the model won't take into account the class weights when calculating the accuracy and it will be unrealistically high.
model.compile(loss="binary_crossentropy",optimizer='adam', weighted_metrics=['accuracy'])
hist = model.fit_generator(train,validation_split=0.2,epochs=20,class_weight=class_weight)

custom class-wise loss function in tensorflow

For my problem, I want to predict customer review scores ranging from 1 to 5.
I thought it would be good to implement this as a regression problem because a predicted 1 from the model while 5 being the true value should be a "worse" prediction than 4.
It is also wished, that the model performs somehow equally good for all review score classes.
Because my dataset is highly unbalanced I want to create a metric/loss that is capable of capturing this (I think just as F1 for classification).
Therefore I created following metric (for now just mse is relevant):
def custom_metric(y_true, y_pred):
df = pd.DataFrame(np.column_stack([y_pred, y_true]), columns=["Predicted", "Truth"])
class_mse = 0
#class_mae = 0
print("MAE for Classes:")
for i in df.Truth.unique():
temp = df[df["Truth"]==i]
mse = mean_squared_error(temp.Truth, temp.Predicted)
#mae = mean_absolute_error(temp.Truth, temp.Predicted)
print("Class {}: {}".format(i, mse))
class_mse += mse
#class_mae += mae
print()
print("AVG MSE over Classes {}".format(class_mse/len(df.Truth.unique())))
#print("AVG MAE over Classes {}".format(class_mae/len(df.Truth.unique())))
Now an example prediction:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, mean_absolute_error
# sample predictions: "model" messed up at class 2 and 3
y_true = np.array((1,1,1,2,2,2,3,3,3,4,4,4,5,5,5))
y_pred = np.array((1,1,1,2,2,3,5,4,3,4,4,4,5,5,5))
custom_metric(y_true, y_pred)
Now my question: Is it able to create a custom tensorflow loss function which is able to act in a similar behaviour? I also worked on this implementation which is not yet ready for tensorflow but maybe more alike:
def custom_metric(y_true, y_pred):
mse_class = 0
num_classes = len(np.unique(y_true))
stacked = np.vstack((y_true, y_pred))
for i in np.unique(stacked[0]):
y_true_temp = stacked[0][np.where(stacked[0]==i)]
y_pred_temp = stacked[1][np.where(stacked[0]==i)]
mse = np.mean(np.square(y_pred_temp - y_true_temp))
mse_class += mse
return mse_class/num_classes
But still, I am not sure how to work around the for loop for a tensorflow like definition.
Thanks in advance for any help!
The for loop should be dealt with exactly by means of numpy/tensorflow operations on a tensor.
A custom metric example would be:
from keras import backend as K
def custom_mean_squared_error(y_true, y_pred):
return K.mean(K.square(y_pred - y_true), axis=-1)
where y_true is the ground truth label, y_pred are your predictions. You can see there are not explicit for-loops.
The motivation for not using for loops is that vectorized operations (which are present both in numpy and tensorflow) take advantage of the modern CPU architectures, turning multiple iterative operations into matrix ones. Consider that a dot-product implementation in numpy takes approximately 30 times less than a regular for-loop in Python.

keras combining two losses with adjustable weights where the outputs do not have the same dimensionality

My question is similar to the one posed here:
keras combining two losses with adjustable weights
However, the outputs have a different dimensionality resulting in the outputs not being able to be concatenated. Hence, the solution is not applicable, is there another way to solve this problem?
The question:
I have a keras functional model with two layers with outputs x1 and x2.
x1 = Dense(1,activation='relu')(prev_inp1)
x2 = Dense(2,activation='relu')(prev_inp2)
I need to use these x1 and x2 use them in a weighted loss function like in the attached image. Propagate the 'same loss' into both branches. Alpha is flexible to vary with iterations.
For this question, a more elaborated solution is necessary. Since we're going to use a trainable weight, we will need a custom layer.
Also, we will be needing a different form of training, since our loss doesn't work like the others taking only y_true and y_pred and considers joining two different outputs.
Thus, we're going to create two versions of the same model, one for prediction, another for training, and the training version will contain the loss in itself, using a dummy keras loss function in compilation.
The prediction model
Let's use a very basic example of model with two outputs and one input:
#any input your true model takes
inp = Input((5,5,2))
#represents the localization output
outImg = Conv2D(1,3,activation='sigmoid')(inp)
#represents the classification output
outClass = Flatten()(inp)
outClass = Dense(2,activation='sigmoid')(outClass)
#the model
predictionModel = Model(inp, [outImg,outClass])
You use this one regularly for predictions. It's not necessary to compile this one.
The losses for each branch
Now, let's create custom loss functions for each branch, one for LossCls and another for LossLoc.
Using dummy examples here, you can elaborate these losses better if necessary. The most important is that they output batches shaped like (batch, 1) or (batch,). Both output the same shape so they can be summed later.
def calcImgLoss(x):
true,pred = x
loss = binary_crossentropy(true,pred)
return K.mean(loss, axis=[1,2])
def calcClassLoss(x):
true,pred = x
return binary_crossentropy(true,pred)
These will be used in Lambda layers in the training model.
The loss weighting layer - (WARNING! EDITED! - See explanation at the end)
Now, let's weight the losses with the trainable alpha. Trainable parameters need custom layers to be implemented.
class LossWeighter(Layer):
def __init__(self, **kwargs): #kwargs can have 'name' and other things
super(LossWeighter, self).__init__(**kwargs)
#create the trainable weight here, notice the constraint between 0 and 1
def build(self, inputShape):
self.weight = self.add_weight(name='loss_weight',
shape=(1,),
initializer=Constant(0.5),
constraint=Between(0,1),
trainable=True)
super(LossWeighter,self).build(inputShape)
def call(self,inputs):
#old answer: will always tend to completely ignore the biggest loss
#return (self.weight * firstLoss) + ((1-self.weight)*secondLoss)
#problem: alpha tends to 0 or 1, eliminating the biggest of the two losses
#proposal of working alpha optimization
#return K.square((self.weight * firstLoss) - ((1-self.weight)*secondLoss))
#problem: might not train any of the losses, and even increase one of them
#in order to minimize the difference between the two losses
#new answer - a mix between the two, applying gradients to the right weights
loss1, loss2 = inputs #trainable
static_loss1 = K.stop_gradient(loss1) #non_trainable
static_loss2 = K.stop_gradient(loss2) #non_trainable
a1 = self.weight #trainable
a2 = 1 - a1 #trainable
static_a1 = K.stop_gradient(a1) #non_trainable
static_a2 = 1 - static_a1 #non_trainable
#this trains only alpha to minimize the difference between both losses
alpha_loss = K.square((a1 * static_loss1) - (a2 * static_loss2))
#or K.abs (.....)
#this trains only the original model weights to minimize both original losses
model_loss = (static_a1 * loss1) + (static_a2 * loss2)
return alpha_loss + model_loss
def compute_output_shape(self,inputShape):
return inputShape[0]
Notice that there is a custom constraint to keep this weight between 0 and 1. This constraint is implemented with:
class Between(Constraint):
def __init__(self,min_value,max_value):
self.min_value = min_value
self.max_value = max_value
def __call__(self,w):
return K.clip(w,self.min_value, self.max_value)
def get_config(self):
return {'min_value': self.min_value,
'max_value': self.max_value}
The training model
This model will take the prediction model as base, add the loss calculations and loss weighter at the end and output only the loss value. Because it outputs only a loss, we will use the true targets as inputs, and a dummy loss function defined like:
def ignoreLoss(true,pred):
return pred #this just tries to minimize the prediction without any extra computation
Model inputs:
#true targets
trueImg = Input((3,3,1))
trueClass = Input((2,))
#predictions from the prediction model
predImg = predictionModel.outputs[0]
predClass = predictionModel.outputs[1]
Model outputs = losses:
imageLoss = Lambda(calcImgLoss, name='loss_loc')([trueImg, predImg])
classLoss = Lambda(calcClassLoss, name='loss_cls')([trueClass, predClass])
weightedLoss = LossWeighter(name='weighted_loss')([imageLoss,classLoss])
Model:
trainingModel = Model([predictionModel.input, trueImg, trueClass], weightedLoss)
trainingModel.compile(optimizer='sgd', loss=ignoreLoss)
Dummy training
inputImages = np.zeros((7,5,5,2))
outputImages = np.ones((7,3,3,1))
outputClasses = np.ones((7,2))
dummyOut = np.zeros((7,))
trainingModel.fit([inputImages,outputImages,outputClasses], dummyOut, epochs = 50)
predictionModel.predict(inputImages)
Necessary imports
from keras.layers import *
from keras.models import Model
from keras.constraints import Constraint
from keras.initializers import Constant
from keras.losses import binary_crossentropy #or another you need
(EDIT) Explaining the problem with the old answer:
The formula used in the old answer would make alpha always go to 0 or 1, meaning only the smallest of the two losses would be ever trained. (Useless)
A new formula leads alpha to make both losses have the same value. Alpha would be trained properly and not tend to 0 or 1. But, still, the losses would not be properly trained because "increasing one loss to reach the other" would be a possibility for the model, and once both losses were equal, the model would stop training.
The new solution is a mix of the two proposals above, while the first actually trains the losses but with wrong alpha; and the second trains alpha with wrong losses. The mixed solution adds both, but uses K.stop_gradient to prevent the wrong part of the training from happening.
The result of this will be: the "easiest" loss (not the biggest) will be more trained than the hardest. We may use K.abs or K.square, as compared to "mae" or "mse" between the two losses. The best option is up to experiment.
See this table comparing the old and new proposals:
This does not guarantee the best optimization though!!!
Training the easiest loss will not always have the best result, though. It may be better than favoring a huge loss just because it's formula is different. But the expected result might still need some manual weighting of the losses.
I fear there is no automatic training for this weight. If you have a target metric, you can try to train this metric (when possible, but metrics that depend on sorting, getting an index, rounding or anything that breaks backpropagation may not be possible to be transformed in losses).
There is no need to concatenate your outputs. To pass multiple arguments to a loss function, you can wrap it as follows:
def custom_loss(x1, x2, y1, y2, alpha):
def loss(y_true, y_pred):
return (1-alpha) * loss_cls(y1, x1) + alpha * loss_loc(y2, x2)
return loss
And then compile your functional model as:
x1 = Dense(1, activation='relu')(prev_inp1)
x2 = Dense(2, activation='relu')(prev_inp2)
y1 = Input((1,))
y2 = Input((2,))
model.compile('sgd',
loss=custom_loss(x1, x2, y1, y2, 0.5),
target_tensors=[y1, y2])
NOTE: Not tested.

Categories