keras apply threshold for loss function - python

I am developing a Keras model. My dataset is badly imbalanced, so I want to set a threshold for training and testing. If I'm not mistaken, during backpropagation the network compares the predicted values with the true ones, calculates the error, and updates the weights of the neurons based on that error.
As far as I know, Keras uses 0.5 as the threshold. I know there are ways to apply custom metrics (such as recall and precision) with a custom threshold, but that threshold is only used for calculating the metric; it is not applied in the loss function. To be clearer: if I set 0.85 as my threshold, the network would still use 0.5 as the threshold for the loss and 0.85 only for the recall.
Is there any way to set this threshold for training as well?

There is no such thing as a threshold for the loss.
A loss function must be differentiable, so it must be a continuous function; a hard threshold would make the gradient zero almost everywhere, and nothing could be learned from it.
The best you can do is set class weights, as in this example: Higher loss penalty for true non-zero predictions

In addition to class weights:
you can use a metric function with a threshold parameter:
model.compile(..., metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.5)])
or you can use a sigmoid activation in the last layer and apply the threshold manually afterwards:
pred_labels = np.where(y_pred > 0.5, 1, 0)
score = sklearn.metrics.accuracy_score(labels, pred_labels)
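Putting the two pieces together: the usual workaround for an imbalanced dataset is to weight the classes during training and apply the decision threshold only at prediction time. A minimal sketch, assuming a binary model with a sigmoid output (the weight values and variable names are placeholders, not from the question):
import numpy as np
# penalize mistakes on the rare positive class more heavily during training
class_weight = {0: 1.0, 1: 5.0}
model.fit(x_train, y_train, epochs=10, class_weight=class_weight)
# the threshold only enters after training, on the sigmoid outputs
y_pred = model.predict(x_test)
pred_labels = np.where(y_pred > 0.85, 1, 0)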

Related

Why did I get 2 different losses for sparse_categorical_crossentropy and categorical_crossentropy?

I trained a model for multiclass classification with three classes. In the first approach, I converted the classes into one-hot vectors and trained with categorical crossentropy as the loss function, reaching a loss of 0.07 in 1000 epochs. But when I used the same approach without converting the classes into one-hot vectors and used sparse_categorical_crossentropy as the loss function, I reached a loss of 0.05 in 1000 epochs. Does this mean that sparse_categorical_crossentropy is better than categorical_crossentropy?
Thank You!
You can't compare two loss functions in terms of raw loss values, since the definition of the loss itself changed; you can only compare performance on the same test dataset.
In general, use sparse_categorical_crossentropy when your classes are mutually exclusive (i.e. each sample belongs to exactly one class) and categorical_crossentropy when one sample can have multiple classes or the labels are soft probabilities (like [0.5, 0.3, 0.2]).
You got different losses because the representation of the labels changed; in Keras, sparse_categorical_crossentropy is simply defined as categorical crossentropy with integer targets.
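A quick way to see this equivalence, as a self-contained sketch (the toy labels and predictions below are made up for illustration):
import numpy as np
import tensorflow as tf

y_true_int = np.array([0, 2, 1])                 # integer targets
y_true_onehot = tf.one_hot(y_true_int, depth=3)  # same targets, one-hot encoded
y_pred = tf.constant([[0.8, 0.1, 0.1],
                      [0.2, 0.2, 0.6],
                      [0.1, 0.7, 0.2]])
# both calls yield identical per-sample losses
print(tf.keras.losses.sparse_categorical_crossentropy(y_true_int, y_pred).numpy())
print(tf.keras.losses.categorical_crossentropy(y_true_onehot, y_pred).numpy())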

The return value of model.evaluate_generator

This is a neural network model based on Keras. To evaluate the performance of the model, I need to calculate the average loss and metrics and their standard deviation.
score = model.evaluate_generator(evaluateGene, test_images, verbose=1)
print('%.3f' % score[0], '%.3f' % score[1], '%.3f' % score[2])
I don't understand: since the model is evaluated on a group of images, not a single image, I would expect the score to be the average of the loss and metrics over the group. I want to calculate the mean loss, the mean metrics, and their standard deviations, but this function doesn't seem to do that. Are there good solutions for returning a mean value and std? Thanks a lot!
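One workaround is to evaluate batch by batch and reduce the per-batch scores yourself. A minimal sketch, assuming evaluateGene yields (images, labels) tuples and steps is the number of batches covering the test set (both names are assumptions):
import numpy as np

batch_scores = []
for _ in range(steps):
    x_batch, y_batch = next(evaluateGene)
    # model.evaluate returns [loss, metric1, metric2, ...] for each batch
    batch_scores.append(model.evaluate(x_batch, y_batch, verbose=0))
batch_scores = np.array(batch_scores)

print('mean:', batch_scores.mean(axis=0))
print('std:', batch_scores.std(axis=0))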

Multiple losses for imbalanced dataset with Keras

My Model:
I built a siamese network that takes two inputs and has three outputs, so my loss function is:
total_loss = alpha * loss1 + alpha * loss2 + (1 - alpha) * loss3
loss1 and loss2 are categorical cross-entropy losses, used to classify the class identity out of a total of 8 classes.
loss3 is a similarity loss (Euclidean distance), used to verify whether the two inputs come from the same class or from different classes.
My questions are as follow:
I have different losses and I want to weight them using a variable alpha whose value depends on the epoch number, so I have to set the value of alpha through a callback. My question: is it possible to pass this alpha variable (whose value changes with the epoch number, i.e. is not a scalar constant) through loss_weights in model.compile? The documentation says:
loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. If a list, it is expected to have a 1:1 mapping to the model's outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.
Example
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import LambdaCallback

alpha = K.variable(0., dtype=tf.float32)

def changeAlpha(epoch, logs):
    # some_function is the user's schedule mapping the epoch number to a new weight
    new_alpha = some_function(epoch)
    K.set_value(alpha, new_alpha)

alphaChanger = LambdaCallback(on_epoch_end=changeAlpha)
model.compile(loss=[loss1, loss2, loss3], loss_weights=[alpha, alpha, (1 - alpha)])
My dataset is imbalanced, so I want to use class_weight in model.fit(). For the same model with three losses, I want to apply the class weights only to the categorical cross-entropy losses (loss1 and loss2). If I pass class_weight to model.fit, will it be applied to those two losses but not to the third one, given that the third loss is a custom loss function?
If I want to classify the classes for the siamese network, would my metric be model.compile(metrics={'out1': 'accuracy', 'out2': 'accuracy'})? The final accuracy needs to be the average of both; I can solve that by building my own custom metric, but is there any way to take a weighted sum of both metrics?
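For the last point, one pattern is to let Keras log the per-output accuracies and combine them in a callback. A sketch under assumed output names 'out1' and 'out2' and equal weights (the logged metric names vary slightly across Keras versions):
from tensorflow.keras.callbacks import Callback

class AveragedAccuracy(Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # weighted sum of the two per-output accuracies logged by Keras
        logs['avg_accuracy'] = (0.5 * logs.get('out1_accuracy', 0.0)
                                + 0.5 * logs.get('out2_accuracy', 0.0))

model.compile(optimizer='adam', loss=[loss1, loss2, loss3],
              metrics={'out1': 'accuracy', 'out2': 'accuracy'})
model.fit(x, y, callbacks=[AveragedAccuracy()])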

How to set loss weight in chainer?

First of all, let me describe my question and situation.
I want to do multi-label classification in Chainer, and my class imbalance problem is very serious.
In this case I must slice the vector in order to calculate the loss function. For example, in multi-label classification most elements of the ground-truth label vector are 0 and only a few of them are 1. In this situation, directly applying F.sigmoid_cross_entropy to all the 0/1 elements may cause training not to converge, so I decided to use a slice a[[xx,xxx,...,xxx]] (where a is the chainer.Variable output by the last FC layer) to pick out specific elements for the loss calculation.
In this case, label imbalance may cause low classification performance on rare classes, so I want to give variables with rare ground-truth labels a high loss weight during backpropagation, and give variables with frequent labels (those that occur too often in the ground truth) a low weight.
How should I do this? What do you suggest for training with imbalanced classes in multi-label classification in Chainer?
You can use sigmoid_cross_entropy() in no-reduce mode (by passing reduce='no') to obtain a loss value at each spatial location, and then use the average function for weighted averaging.
sigmoid_cross_entropy() first computes the loss value at each spatial location and for each data point along the batch dimension, and then takes the mean or the sum over the spatial dimensions and the batch dimension (depending on the normalize option). You can disable the reduction part by passing reduce='no'. If you want a weighted average, you should specify that option so that you get the loss value at each location and can reduce it yourself.
After that, the simplest way to do the weighted averaging manually is average(), which accepts a weight argument indicating the weights for averaging. It first does a weighted summation of the input and the weights, and then divides the result by the sum of the weights. You can build an appropriate weight array with the same shape as the input and pass it to average() along with the raw (unreduced) loss values obtained from sigmoid_cross_entropy(..., reduce='no'). It is also fine to multiply a weight array manually and take the sum, like F.sum(score * weight), if weight is appropriately scaled (e.g. summing up to 1).
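A minimal sketch of that recipe (the positive-class weight of 10.0 and the names y and t are assumptions, not from the question):
import numpy as np
import chainer.functions as F

# y: raw scores from the last FC layer, t: int32 array of 0/1 labels
loss_map = F.sigmoid_cross_entropy(y, t, reduce='no')  # per-element losses
# weight the rare positive labels more heavily than the frequent negatives
weight = np.where(t == 1, 10.0, 1.0).astype(np.float32)
loss = F.average(loss_map, weights=weight)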
If you work on multi-label classification, how about using the softmax_cross_entropy loss?
softmax_cross_entropy can take the class imbalance into account through its class_weight argument.
https://github.com/chainer/chainer/blob/v3.0.0rc1/chainer/functions/loss/softmax_cross_entropy.py#L57
https://docs.chainer.org/en/stable/reference/generated/chainer.functions.softmax_cross_entropy.html
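A sketch of the class_weight usage (the three classes and their weights are illustrative only; x are the logits and t the int32 class indices):
import numpy as np
import chainer.functions as F

# give the rare class a larger weight than the frequent ones
class_weight = np.array([1.0, 5.0, 0.5], dtype=np.float32)
loss = F.softmax_cross_entropy(x, t, class_weight=class_weight)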

Unbalanced data and weighted cross entropy

I'm trying to train a network on unbalanced data. I have A (198 samples), B (436 samples), C (710 samples) and D (272 samples). I have read about weighted_cross_entropy_with_logits, but all the examples I found are for binary classification, so I'm not very confident about how to set those weights.
Total samples: 1616
A_weight: 198/1616 = 0.12?
The idea, if I understood correctly, is to penalize the errors of the majority class and to value the hits on the minority class more positively, right?
My piece of code:
weights = tf.constant([0.12, 0.26, 0.43, 0.17])
cost = tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(logits=pred, targets=y, pos_weight=weights))
I have read this one and other examples with binary classification, but it's still not very clear.
Note that weighted_cross_entropy_with_logits is the weighted variant of sigmoid_cross_entropy_with_logits. Sigmoid cross entropy is typically used for binary classification. Yes, it can handle multiple labels, but sigmoid cross entropy basically makes a (binary) decision on each of them -- for example, for a face recognition net, those (not mutually exclusive) labels could be "Does the subject wear glasses?", "Is the subject female?", etc.
In binary classification(s), each output channel corresponds to a binary (soft) decision. Therefore, the weighting needs to happen within the computation of the loss. This is what weighted_cross_entropy_with_logits does, by weighting one term of the cross-entropy over the other.
In multiclass classification (mutually exclusive classes), we use softmax_cross_entropy_with_logits, which behaves differently: each output channel corresponds to the score of a class candidate. The decision comes afterwards, by comparing the respective outputs of each channel.
Weighting before the final decision is therefore simply a matter of modifying the scores before comparing them, typically by multiplication with weights. For example, for a ternary classification task,
# your class weights
class_weights = tf.constant([[1.0, 2.0, 3.0]])
# deduce weights for batch samples based on their true label
weights = tf.reduce_sum(class_weights * onehot_labels, axis=1)
# compute your (unweighted) softmax cross entropy loss
unweighted_losses = tf.nn.softmax_cross_entropy_with_logits(labels=onehot_labels, logits=logits)
# apply the weights, relying on broadcasting of the multiplication
weighted_losses = unweighted_losses * weights
# reduce the result to get your final loss
loss = tf.reduce_mean(weighted_losses)
You could also rely on tf.losses.softmax_cross_entropy to handle the last three steps.
In your case, where you need to tackle data imbalance, the class weights could indeed be inversely proportional to their frequency in your train data. Normalizing them so that they sum up to one or to the number of classes also makes sense.
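For the sample counts in the question, inverse-frequency weights normalized to sum to the number of classes would look like this (one reasonable choice, not the only one):
import numpy as np

counts = np.array([198, 436, 710, 272], dtype=np.float32)  # A, B, C, D
inv_freq = 1.0 / counts
class_weights = inv_freq / inv_freq.sum() * len(counts)
# approximately [1.63, 0.74, 0.45, 1.18]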
Note that in the above, we penalized the loss based on the true label of the samples. We could also have penalized the loss based on the estimated labels by simply defining
weights = class_weights
and the rest of the code need not change thanks to broadcasting magic.
In the general case, you would want weights that depend on the kind of error you make. In other words, for each pair of labels X and Y, you could choose how to penalize choosing label X when the true label is Y. You end up with a whole prior weight matrix, which makes weights above a full (num_samples, num_classes) tensor. This goes a bit beyond what you want, but it might be useful to know nonetheless that only your definition of the weight tensor needs to change in the code above.
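To make the shapes concrete, such a prior matrix and the resulting per-sample weights could be built as follows (the penalty values are purely hypothetical):
import tensorflow as tf

# penalty[y, x] = cost of predicting class x when the true class is y
penalty = tf.constant([[1.0, 2.0, 3.0],
                       [4.0, 1.0, 5.0],
                       [6.0, 7.0, 1.0]])
# select each sample's penalty row by its true label: shape (num_samples, num_classes)
weights = tf.matmul(onehot_labels, penalty)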
See this answer for an alternate solution which works with sparse_softmax_cross_entropy:
import tensorflow as tf
import numpy as np
np.random.seed(123)
sess = tf.InteractiveSession()
# let's say we have the logits and labels of a batch of size 6 with 5 classes
logits = tf.constant(np.random.randint(0, 10, 30).reshape(6, 5), dtype=tf.float32)
labels = tf.constant(np.random.randint(0, 5, 6), dtype=tf.int32)
# specify some class weightings
class_weights = tf.constant([0.3, 0.1, 0.2, 0.3, 0.1])
# specify the weights for each sample in the batch (without having to compute the onehot label matrix)
weights = tf.gather(class_weights, labels)
# compute the loss
tf.losses.sparse_softmax_cross_entropy(labels, logits, weights).eval()
Tensorflow 2.0 Compatible Answer: Migrating the Code specified in P-Gn's Answer to 2.0, for the benefit of the community.
# your class weights
class_weights = tf.compat.v2.constant([[1.0, 2.0, 3.0]])
# deduce weights for batch samples based on their true label
weights = tf.compat.v2.reduce_sum(class_weights * onehot_labels, axis=1)
# compute your (unweighted) softmax cross entropy loss
unweighted_losses = tf.compat.v2.nn.softmax_cross_entropy_with_logits(labels=onehot_labels, logits=logits)
# apply the weights, relying on broadcasting of the multiplication
weighted_losses = unweighted_losses * weights
# reduce the result to get your final loss
loss = tf.reduce_mean(weighted_losses)
For more information about migrating code from Tensorflow 1.x to 2.x, please refer to this Migration Guide.
