custom class-wise loss function in tensorflow - python

For my problem, I want to predict customer review scores ranging from 1 to 5.
I thought it would be good to implement this as a regression problem, because a prediction of 1 when the true value is 5 should be a "worse" prediction than a 4.
I also want the model to perform roughly equally well across all review score classes.
Because my dataset is highly unbalanced, I want to create a metric/loss that is capable of capturing this (much as F1 does for classification).
Therefore I created the following metric (for now just MSE is relevant):
def custom_metric(y_true, y_pred):
    df = pd.DataFrame(np.column_stack([y_pred, y_true]), columns=["Predicted", "Truth"])
    class_mse = 0
    #class_mae = 0
    print("MSE for Classes:")
    for i in df.Truth.unique():
        temp = df[df["Truth"]==i]
        mse = mean_squared_error(temp.Truth, temp.Predicted)
        #mae = mean_absolute_error(temp.Truth, temp.Predicted)
        print("Class {}: {}".format(i, mse))
        class_mse += mse
        #class_mae += mae
    print()
    print("AVG MSE over Classes {}".format(class_mse/len(df.Truth.unique())))
    #print("AVG MAE over Classes {}".format(class_mae/len(df.Truth.unique())))
Now an example prediction:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, mean_absolute_error
# sample predictions: "model" messed up at class 2 and 3
y_true = np.array((1,1,1,2,2,2,3,3,3,4,4,4,5,5,5))
y_pred = np.array((1,1,1,2,2,3,5,4,3,4,4,4,5,5,5))
custom_metric(y_true, y_pred)
Now my question: Is it possible to create a custom TensorFlow loss function that behaves in a similar way? I also worked on this implementation, which is not yet ready for TensorFlow but is maybe closer:
def custom_metric(y_true, y_pred):
    mse_class = 0
    num_classes = len(np.unique(y_true))
    stacked = np.vstack((y_true, y_pred))
    for i in np.unique(stacked[0]):
        y_true_temp = stacked[0][np.where(stacked[0]==i)]
        y_pred_temp = stacked[1][np.where(stacked[0]==i)]
        mse = np.mean(np.square(y_pred_temp - y_true_temp))
        mse_class += mse
    return mse_class/num_classes
But I am still not sure how to replace the for loop with a TensorFlow-style definition.
Thanks in advance for any help!

The for loop should indeed be replaced by vectorized numpy/tensorflow operations on tensors.
A custom metric example would be:
from keras import backend as K

def custom_mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
where y_true is the ground truth label and y_pred are your predictions. You can see there are no explicit for loops.
The motivation for avoiding for loops is that vectorized operations (which are present in both numpy and tensorflow) take advantage of modern CPU architectures, turning many iterative operations into matrix ones. Consider that a dot product implemented with numpy's vectorized operations takes roughly 30 times less time than a regular for loop in Python.
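To make the class-wise MSE itself loop-free, one option is to build a one-hot mask per class and reduce over it. Here is a minimal sketch (my own, not from the question), assuming the labels are the integer review scores 1 to 5 passed in as floats:

import tensorflow as tf

def class_wise_mse(y_true, y_pred):
    classes = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0])
    y_true = tf.cast(tf.reshape(y_true, (-1,)), tf.float32)
    y_pred = tf.cast(tf.reshape(y_pred, (-1,)), tf.float32)
    # mask[i, c] == 1 where sample i belongs to class c; shape (batch, 5)
    mask = tf.cast(tf.equal(tf.expand_dims(y_true, 1), classes), tf.float32)
    sq_err = tf.expand_dims(tf.square(y_true - y_pred), 1)
    per_class_sum = tf.reduce_sum(mask * sq_err, axis=0)    # shape (5,)
    per_class_count = tf.reduce_sum(mask, axis=0)           # shape (5,)
    per_class_mse = tf.math.divide_no_nan(per_class_sum, per_class_count)
    # average only over the classes actually present in the batch
    present = tf.cast(per_class_count > 0, tf.float32)
    return tf.math.divide_no_nan(tf.reduce_sum(per_class_mse),
                                 tf.reduce_sum(present))

Because everything here is a differentiable tensor operation, it can be passed directly to model.compile(loss=class_wise_mse).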

Related

Tensorflow Custom Regularization Term comparing the Prediction to the True value

Hello, I need a custom regularization term to add to my (binary cross entropy) loss function. Can somebody help me with the TensorFlow syntax to implement this?
I simplified everything as much as possible so that it is easier to help me.
The model takes a dataset of 10000 18x18 binary configurations as input and outputs a 16x16 configuration for each. The neural network consists of only 2 convolutional layers.
My model looks like this:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
EPOCHS = 10
model = models.Sequential()
model.add(layers.Conv2D(1,2,activation='relu',input_shape=[18,18,1]))
model.add(layers.Conv2D(1,2,activation='sigmoid',input_shape=[17,17,1]))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),loss=tf.keras.losses.BinaryCrossentropy())
model.fit(initial.reshape(10000,18,18,1),target.reshape(10000,16,16,1),batch_size = 1000, epochs=EPOCHS, verbose=1)
output = model(initial).numpy().reshape(10000,16,16)
Now I wrote a function which I'd like to use as an additional regularization term. It takes the true values and the predictions. Basically, it multiplies every point of each with its 'right' neighbor and then takes the difference between the two results. I assumed that the true and prediction tensors are 16x16 (and not 10000x16x16). Is this correct?
def regularization_term(prediction, true):
    order = list(range(1,4))
    order.append(0)
    deviation = (true*true[:,order]) - (prediction*prediction[:,order])
    deviation = abs(deviation)**2
    return 0.2 * deviation
I would really appreciate some help with adding something like this function as a regularization term to my loss, so that the neural network learns this 'right neighbor' interaction better. I'm really struggling with TensorFlow's customization facilities.
Thank you, much appreciated.
It is quite simple. You need to specify a custom loss in which you add your regularization term. Something like this:
# to minimize!
def regularization_term(true, prediction):
    order = list(range(1,4))
    order.append(0)
    deviation = (true*true[:,order]) - (prediction*prediction[:,order])
    deviation = abs(deviation)**2
    return 0.2 * deviation

def my_custom_loss(y_true, y_pred):
    return tf.keras.losses.BinaryCrossentropy()(y_true, y_pred) + regularization_term(y_true, y_pred)

model.compile(optimizer='Adam', loss=my_custom_loss)
As stated by Keras:
Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one per sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.
So be sure to return an array of losses (EDIT: as I can see now, it is also possible to return a simple scalar; it doesn't matter if you use, for example, a reduce function). Basically, y_true and y_pred have the batch size as their first dimension.
Details here: https://keras.io/api/losses/

How to create weighted cross entropy loss?

I have to deal with highly unbalanced data. As I understand, I need to use weighted cross entropy loss.
I tried this:
import tensorflow as tf
import numpy as np

weights = np.array([<values>])

def loss(y_true, y_pred):
    # weights.shape = (63,)
    # y_true.shape = (64, 63)
    # y_pred.shape = (64, 63)
    return tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(y_true, y_pred, weights))

model.compile('adam', loss=loss, metrics=['acc'])
But there's an error:
ValueError: Creating variables on a non-first call to a function decorated with tf.function
How can I create this kind of loss?
I suggest, in the first instance, resorting to class_weight from Keras.
class_weight is a dictionary with {label: weight}.
For example, if you have 20 times more examples of label 1 than of label 0, then you can write:
# Assign 20 times more weight to label 0
model.fit(..., class_weight={0: 20, 1: 1})
In this way you don't need to worry about implementing weighted CCE on your own.
Additional note: in your model.compile(), do not forget to use weighted_metrics=['accuracy'] in order to have a relevant reflection of your accuracy:
model.compile(..., weighted_metrics=['accuracy'])
model.fit(..., class_weight={0: 20, 1: 1})
class_weight is a dictionary that compensates for the imbalance in the data set. For example, if you had a data set of 1000 dog images and 100 cat images, your classifier would be biased toward the dog class: if it predicted dog every time, it would be correct about 90 percent of the time. To compensate for the imbalance, the class_weight dictionary enables you to weight samples of cats 10 times higher than those of dogs when calculating the loss. One way is to use the compute_class_weight method from sklearn, as shown below.
from sklearn.utils import class_weight
import numpy as np

class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes)
# Keras expects a dictionary mapping class index to weight
class_weights = dict(enumerate(class_weights))
If you are working with imbalanced classes, you should use class weights. For example, if you have two classes where class 0 has twice as much data as class 1:
class_weight = {0: 1, 1: 2}
When you compile, make sure to use weighted_metrics instead of just metrics, or else the model won't take the class weights into account when calculating the accuracy and it will be unrealistically high.
model.compile(loss="binary_crossentropy", optimizer='adam', weighted_metrics=['accuracy'])
hist = model.fit(train, validation_split=0.2, epochs=20, class_weight=class_weight)
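If you do want the weighted cross entropy as an actual loss function rather than going through class_weight, a minimal sketch (my own, assuming one-hot targets and softmax probabilities coming out of the model) could look like this:

import tensorflow as tf

def make_weighted_cce(class_weights):
    # class_weights: per-class weights, shape (num_classes,)
    w = tf.constant(class_weights, dtype=tf.float32)
    def loss(y_true, y_pred):
        # clip to avoid log(0)
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0)
        # weighted categorical cross entropy, one value per sample
        return -tf.reduce_sum(w * y_true * tf.math.log(y_pred), axis=-1)
    return loss

# usage (hypothetical two-class weights):
model.compile(optimizer='adam', loss=make_weighted_cce([1.0, 2.0]))

The weights are captured as a constant inside the closure, so nothing is created on repeated calls to the traced function.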

Implementation of weigted binary cross entropy keras

I'm currently working on an image segmentation project with Keras.
I am using an implementation of Unet.
I have 2 classes identified by pixel value: 0 = background, 1 = the object I'm looking for.
I have a single output with a sigmoid activation function.
I'm using binary cross-entropy as a loss function.
Here is the problem: I have a very unbalanced data set, approximately 1 white pixel for every 100 black pixels. From what I understand, binary cross-entropy does not cope well with an unbalanced data set.
So I tried to implement the weighted cross entropy, following the formula loss = -(w0 * y * log(p) + w1 * (1 - y) * log(1 - p)).
This is my code:
from keras import backend as K

def weighted_cross_entropy(y_true, y_pred):
    w = [0.99, 0.01]
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    val = - (w[0] * y_true_f * K.log(y_pred_f) + w[1] * (1-y_true_f) * K.log(1-y_pred_f))
    return K.mean(val, axis=-1)
I'm using the F1 score/Dice to measure the result at the end of each epoch.
But past 5 epochs, the loss is equal to NaN and the F1 score stays very low (0.02).
It seems my network is not learning, but I don't understand why. Maybe my formula is wrong?
I have also tried to invert the weight values, but the result is the same.
After some research I noticed that it is possible to pass predefined weights directly to the fit function, like this:
from sklearn.utils import class_weight
import numpy as np

w = class_weight.compute_class_weight('balanced', classes=np.unique(y_train.ravel()), y=y_train.ravel())
model.fit(epochs=1000, ..., class_weight=w)
By doing this with the basic binary cross entropy function, the network learns correctly and gives better results compared to not using predefined weights.
So I don't understand the difference between the two methods. Is it really necessary to implement the weighted cross entropy function myself?
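As an aside (my own note, not from the thread): one plausible cause of the NaN in the hand-written loss is K.log receiving an exact 0 or 1 from the sigmoid output. A numerically safer sketch clips the predictions first:

from keras import backend as K

def weighted_cross_entropy(y_true, y_pred):
    w = [0.99, 0.01]
    y_true_f = K.flatten(y_true)
    # keep predictions strictly inside (0, 1) so K.log never sees 0
    y_pred_f = K.clip(K.flatten(y_pred), K.epsilon(), 1 - K.epsilon())
    val = -(w[0] * y_true_f * K.log(y_pred_f) + w[1] * (1 - y_true_f) * K.log(1 - y_pred_f))
    return K.mean(val)

class_weight, by contrast, reweights each sample's contribution to the standard loss, which amounts to the same idea but lets Keras handle the bookkeeping.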

converting Sklearn based custom metric function -> to use as Keras' metric for callbacks

I have already been given custom metric code on which my model is going to be evaluated, but they've used sklearn's metrics. I know that if I have a metric, I can use it in callbacks like:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy', custom_metric])

ModelCheckpoint(monitor='val_custom_metric',
                save_best_only=True,
                save_weights_only=True,
                mode='max',
                verbose=1)
It is a multi-output problem with 3 labels.
Submissions are evaluated using a hierarchical macro-averaged recall. First, a standard macro-averaged recall is calculated for each component (label_1, label_2 or label_3). The final score is the weighted average of those three scores, with label_1 given double weight. You can replicate the metric with the following python snippet:
But I am unable to figure out how to implement the code given below in Keras:
import numpy as np
import sklearn.metrics

scores = []
for component in ['label_1', 'label_2', 'label_3']:
    y_true_subset = solution[solution[component] == component]['target'].values
    y_pred_subset = submission[submission[component] == component]['target'].values
    scores.append(sklearn.metrics.recall_score(
        y_true_subset, y_pred_subset, average='macro'))
final_score = np.average(scores, weights=[2, 1, 1])
How can I convert this into a form usable as a metric? More precisely, how can I implement this code with keras.backend?
You can only implement the metric itself; the rest (the dataframe filtering) is specific to the evaluation script and will certainly not be part of what Keras runs.
from keras import backend as K

threshold = 0.5  #you can tune this threshold for better results

#considering y_true is made of 0 and 1 only
#considering output shape is (batch, 3)
def custom_metric(y_true, y_pred):
    weights = K.constant([2,1,1])                              #shape (3,)
    y_pred = K.cast(K.greater(y_pred, threshold), K.floatx())  #shape (batch, 3)
    true_positives = K.sum(y_pred * y_true, axis=0)            #shape (3,)
    false_negatives = K.sum((1-y_pred) * y_true, axis=0)       #shape (3,)
    # epsilon guards against division by zero when a label is absent from the batch
    recall = true_positives / (true_positives + false_negatives + K.epsilon())
    # weighted average, matching np.average(scores, weights=[2,1,1])
    return K.sum(recall * weights) / K.sum(weights)
Notice that this will be calculated batchwise, and since the denominator differs depending on the results, the metric calculated batchwise will differ from the metric computed over the entire dataset.
You may need big batch sizes to avoid metric instability. And it might be interesting to apply the metric to the entire data with a callback to get the exact result.
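Such a callback could look like the following sketch (my own; FullDataRecall, x_val and y_val are hypothetical names, and it assumes binary 0/1 targets of shape (samples, 3)):

import numpy as np
from sklearn.metrics import recall_score
from tensorflow.keras.callbacks import Callback

class FullDataRecall(Callback):
    def __init__(self, x_val, y_val, threshold=0.5):
        super().__init__()
        self.x_val, self.y_val, self.threshold = x_val, y_val, threshold

    def on_epoch_end(self, epoch, logs=None):
        # predict on the whole validation set, then threshold
        y_pred = (self.model.predict(self.x_val) > self.threshold).astype(int)
        # positive-class recall per label, matching the batchwise metric above
        scores = [recall_score(self.y_val[:, i], y_pred[:, i]) for i in range(3)]
        score = np.average(scores, weights=[2, 1, 1])
        if logs is not None:
            logs['val_custom_metric'] = score
        print(' - val_custom_metric: {:.4f}'.format(score))

Place it before ModelCheckpoint in the callbacks list so that the checkpoint sees the logged value.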

Part of custom loss function not differentiable

For my problem, I want to predict customer review scores ranging from 1 to 5. I thought it would be good to implement this as a regression problem, because a prediction of 1 when the true value is 5 should be a "worse" prediction than a 4. I also want the model to perform roughly equally well for all review score classes. Because my dataset is highly unbalanced, I want to create my own loss function: with a normal MSE implementation, the model will only be good at predicting the majority class. Therefore I want to estimate the MSE for every review class (1, 2, 3, 4 and 5 star reviews) and then take the mean of these five MSE scores. That way the model is incentivized to also perform well at predicting the smaller classes.
The following code shows my tensorflow loss function:
def custom_loss_function(y_true, y_pred):
    y_true_t = tf.cast(tf.Variable(y_true, validate_shape=False), dtype="float16")
    y_pred_t = tf.cast(tf.Variable(y_pred, validate_shape=False), dtype="float16")
    one   = tf.cast(tf.Variable([1]), dtype="float16")
    two   = tf.cast(tf.Variable([2]), dtype="float16")
    three = tf.cast(tf.Variable([3]), dtype="float16")
    four  = tf.cast(tf.Variable([4]), dtype="float16")
    five  = tf.cast(tf.Variable([5]), dtype="float16")
    # index for a given review class
    i_one   = tf.where(tf.equal(y_true_t, one))
    i_two   = tf.where(tf.equal(y_true_t, two))
    i_three = tf.where(tf.equal(y_true_t, three))
    i_four  = tf.where(tf.equal(y_true_t, four))
    i_five  = tf.where(tf.equal(y_true_t, five))
    # predictions at the found indices for a review class
    y_pred_one   = tf.gather(y_pred_t, i_one)
    y_pred_two   = tf.gather(y_pred_t, i_two)
    y_pred_three = tf.gather(y_pred_t, i_three)
    y_pred_four  = tf.gather(y_pred_t, i_four)
    y_pred_five  = tf.gather(y_pred_t, i_five)
    y_true_one   = tf.gather(y_true_t, i_one)
    y_true_two   = tf.gather(y_true_t, i_two)
    y_true_three = tf.gather(y_true_t, i_three)
    y_true_four  = tf.gather(y_true_t, i_four)
    y_true_five  = tf.gather(y_true_t, i_five)
    # per-class MSE
    mse1 = tf.reduce_mean(tf.cast(tf.square(y_true_one - y_pred_one), dtype="float"))
    mse2 = tf.reduce_mean(tf.cast(tf.square(y_true_two - y_pred_two), dtype="float"))
    mse3 = tf.reduce_mean(tf.cast(tf.square(y_true_three - y_pred_three), dtype="float"))
    mse4 = tf.reduce_mean(tf.cast(tf.square(y_true_four - y_pred_four), dtype="float"))
    mse5 = tf.reduce_mean(tf.cast(tf.square(y_true_five - y_pred_five), dtype="float"))
    mse = (mse1 + mse2 + mse3 + mse4 + mse5) / 5
    return mse
When I try to use this loss function in my NN I get the following message:
ValueError: An operation has None for gradient. Please make sure
that all of your ops have a gradient defined (i.e. are
differentiable). Common ops without gradient: K.argmax, K.round,
K.eval.
I couldn't figure out why this error occurs. I thought that all tf operations should be differentiable but I could also be wrong.
Thanks for any help.
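A hedged note (my own, not from the thread): a likely culprit is wrapping y_true and y_pred in tf.Variable, which disconnects them from the graph that gradients flow through; tf.reduce_mean over an empty per-class slice can also yield NaN. A sketch of the same idea operating on the incoming tensors directly, with a plain Python loop over the fixed class list (which is simply unrolled when the graph is traced):

import tensorflow as tf

def classwise_mse(y_true, y_pred):
    y_true = tf.cast(tf.reshape(y_true, (-1,)), tf.float32)
    y_pred = tf.cast(tf.reshape(y_pred, (-1,)), tf.float32)
    total = tf.constant(0.0)
    for c in (1.0, 2.0, 3.0, 4.0, 5.0):
        idx = tf.where(tf.equal(y_true, c))[:, 0]
        diff = tf.gather(y_true, idx) - tf.gather(y_pred, idx)
        # divide_no_nan returns 0 instead of NaN when a class is absent from the batch
        total += tf.math.divide_no_nan(tf.reduce_sum(tf.square(diff)),
                                       tf.cast(tf.size(idx), tf.float32))
    return total / 5.0

tf.gather is differentiable with respect to its params argument, so gradients reach y_pred without any None entries.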
