Binary accuracy using tensorflow tensors - python

I have a time-series model in tensorflow predicting vectors of size N with binary variables, e.g. [1 0 0 1 0].
For these vectors i have recall metric which is calculated correclty as seen below:
def recall_m(y_true, y_pred):
true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
recall = true_positives / (possible_positives + K.epsilon())
return recall
What i would like is given y_true and y_pred also calculate a metric as follows:
if y_true contains at least one '1' and y_pred also contains at least one '1' output 1
if y_true contains has all variables to '0' and y_pred also has all variables to '0' output 1
otherwise output 0
1's are considered as attack instances while 0 normal cases so this metric predicts the existence or not of an attack in the next N time steps.
What i have done so far:
def AttackAcc(y_true, y_pred):
has_one = K.sum(K.round(K.clip(y_true, 0, 1)))
has_one_pred = K.sum(K.round(K.clip(y_pred, 0, 1)))
if has_one >tf.constant(0.0):
has_one = tf.constant(1.0)
else:
has_one = tf.constant(0.0)
if has_one_pred >tf.constant(0.0):
has_one_pred = tf.constant(1.0)
else:
has_one_pred = tf.constant(0.0)
return tf.math.equal(has_one , has_one_pred )
However, this produces results that are unreasonable, e.g. outputing 1 when the conditions described above are not met.
Any idea about what am i doing wrong?

Related

Asymmetric Function for Loss

I'm using LightGBM and I need to realize a loss function that during the training give a penalty when the prediction is lower than the target. In other words I assume that underestimates are much worse than overestimates. I've found this suggestion that do exactly the opposite:
def custom_asymmetric_train(y_true, y_pred):
residual = (y_true - y_pred).astype("float")
grad = np.where(residual<0, -2*10.0*residual, -2*residual)
hess = np.where(residual<0, 2*10.0, 2.0)
return grad, hess
def custom_asymmetric_valid(y_true, y_pred):
residual = (y_true - y_pred).astype("float")
loss = np.where(residual < 0, (residual**2)*10.0, residual**2)
return "custom_asymmetric_eval", np.mean(loss), False
https://towardsdatascience.com/custom-loss-functions-for-gradient-boosting-f79c1b40466d)
How can I modify it for my purpose?
I believe this function is where you want to make a change.
def custom_asymmetric_valid(y_true, y_pred):
residual = (y_true - y_pred).astype("float")
loss = np.where(residual < 0, (residual**2)*10.0, residual**2)
return "custom_asymmetric_eval", np.mean(loss), False
The line where loss is worked out has a comparison.
loss = np.where(residual < 0, (residual**2)*10.0, residual**2)
When residual is less that 0, loss is residual^2 * 10
where whne about 0, loss is just redisual^2.
So if we change this less than to a greater than. This will flip the skew.
loss = np.where(residual > 0, (residual**2)*10.0, residual**2)
I think this would be helpful. Originated from Custom loss function with Keras to penalise more negative prediction
def customLoss(true,pred):
diff = pred - true
greater = K.greater(diff,0)
greater = K.cast(greater, K.floatx()) #0 for lower, 1 for greater
greater = greater + 1 #1 for lower, 2 for greater
#use some kind of loss here, such as mse or mae, or pick one from keras
#using mse:
return K.mean(greater*K.square(diff))
model.compile(optimizer = 'adam', loss = customLoss)

How to create Hybrid loss consisting from dice loss and focal loss [Python]

I'm trying to implement the Multiclass Hybrid loss function in Python from following article https://arxiv.org/pdf/1808.05238.pdf for my semantic segmentation problem using an imbalanced dataset. I managed to get my implementation correct enough to start while training the model, but the results are very poor. Model architecture - U-net, learning rate in Adam optimizer is 1e-5. Mask shape is (None, 512, 512, 3), with 3 classes (in my case forest, deforestation, other). The formula I used to implement my loss:
The code I created:
def build_hybrid_loss(_lambda_=1, _alpha_=0.5, _beta_=0.5, smooth=1e-6):
def hybrid_loss(y_true, y_pred):
C = 3
tversky = 0
# Calculate Tversky Loss
for index in range(C):
inputs_fl = tf.nest.flatten(y_pred[..., index])
targets_fl = tf.nest.flatten(y_true[..., index])
#True Positives, False Positives & False Negatives
TP = tf.reduce_sum(tf.math.multiply(inputs_fl, targets_fl))
FP = tf.reduce_sum(tf.math.multiply(inputs_fl, 1-targets_fl[0]))
FN = tf.reduce_sum(tf.math.multiply(1-inputs_fl[0], targets_fl))
tversky_i = (TP + smooth) / (TP + _alpha_ * FP + _beta_ * FN + smooth)
tversky += tversky_i
tversky += C
# Calculate Focal loss
loss_focal = 0
for index in range(C):
f_loss = - (y_true[..., index] * (1 - y_pred[..., index])**2 * tf.math.log(y_pred[..., index]))
# Average over each data point/image in batch
axis_to_reduce = range(1, 3)
f_loss = tf.math.reduce_mean(f_loss, axis=axis_to_reduce)
loss_focal += f_loss
result = tversky + _lambda_ * loss_focal
return result
return hybrid_loss
The prediction of the model after the end of an epoch (I have a problem with swapped colors, so the red in the prediction is actually green, which means forest, so the prediction is mostly forest and not deforestation):
The question is what is wrong with my hybrid loss implementation, what needs to be changed to make it work?
To simplify things a little, I have divided the Hybrid loss into four separate functions: Tversky's loss, Dice coefficient, Dice loss, Hybrid loss. You can see the code below.
def TverskyLoss(targets, inputs, alpha=0.5, beta=0.5, smooth=1e-16, numLabels=3):
tversky = 0
for index in range(numLabels):
inputs_fl = tf.nest.flatten(inputs[..., index])
targets_fl = tf.nest.flatten(targets[..., index])
#True Positives, False Positives & False Negatives
TP = tf.reduce_sum(tf.math.multiply(inputs_fl, targets_fl))
FP = tf.reduce_sum(tf.math.multiply(inputs_fl, 1-targets_fl[0]))
FN = tf.reduce_sum(tf.math.multiply(1-inputs_fl[0], targets_fl))
tversky_i = (TP + smooth) / (TP + alpha*FP + beta*FN + smooth)
tversky += tversky_i
return numLabels - tversky
def dice_coef(y_true, y_pred, smooth=1e-16):
y_true_f = tf.nest.flatten(y_true)
y_pred_f = tf.nest.flatten(y_pred)
intersection = tf.math.reduce_sum(tf.math.multiply(y_true_f, y_pred_f))
return (2. * intersection + smooth) / (tf.math.reduce_sum(y_true_f) + tf.math.reduce_sum(y_pred_f) + smooth)
def dice_coef_multilabel(y_true, y_pred, numLabels=3):
dice=0
for index in range(numLabels):
dice -= dice_coef(y_true[..., index], y_pred[..., index])
return numLabels + dice
def build_hybrid_loss(_lambda_=0.5, _alpha_=0.5, _beta_=0.5, smooth=1e-16, C=3):
def hybrid_loss(y_true, y_pred):
tversky = TverskyLoss(y_true, y_pred, alpha=_alpha_, beta=_beta_)
dice = dice_coef_multilabel(y_true, y_pred)
result = tversky + _lambda_ * dice
return result
return hybrid_loss
Adding the loss=build_hybrid_loss() during model compilation will add Hybrid loss as the loss function of the model.
After a short research, I came to the conclusion that in my particular case, a Hybrid loss with _lambda_ = 0.2, _alpha_ = 0.5, _beta_ = 0.5 would not be much better than a single Dice loss or a single Tversky loss. Neither IoU (intersection over union) nor the standard accuracy metric are much better with Hybrid loss. But I believe it is not a rule of thumb that such a Hybrid loss will be worser or at the same level of performance as single loss at all cases.
link to Accuracy graph
link to IoU graph

Keras custom loss function to ignore false negatives of a specific class during semantic segmentation?

See EDIT below, the initial post almost has no meaning now but the question still remains.
I developing a neural network to semantically segment imagery. I have worked through various loss functions (categorical cross entropy (CCE), weight CCE, focal loss, tversky loss, jaccard loss, focal tversky loss, etc) which attempt to handle highly skewed class representation, though none are producing the desired effect. My advisor mentioned attempting to create a custom loss function which ignores false negatives for a specific class (but still penalizes false positives).
I have a 6 class problem and my network is setup to work in/with one-hot encoded truth data. As a result my loss function will accept two tensors, y_true, y_pred, of shape (batch, row, col, class) (which is currently (8, 128, 128, 6)). To be able to utilize the losses I have already explored I would like to alter y_pred to set the predicted value for the specific class (the 0th class) to always be correct. That is where y_true == class 0 set y_pred == class 0, otherwise do nothing.
I have spent way too much time attempting to create this loss function as a result of tensorflow tensors being immutable. My first attempt (which I was led to through my experience with numpy)
def weighted_categorical_crossentropy_ignore(weights):
weights = K.variable(weights)
def loss(y_true, y_pred):
y_pred[tf.where(y_true == [1, 0, 0, 0, 0, 0])] = [1, 0, 0, 0, 0, 0]
# Scale predictions so that the class probs of each sample sum to 1
y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
# Clip to prevent NaN's and Inf's
y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
loss = y_true * K.log(y_pred) * weights
loss = -K.sum(loss, -1)
return loss
return loss
Though obviously I cannot alter y_pred so this attempt failed. I ended up creating a few monstrosities attempting to "build" a tensor by iterating over [batch, row, col] and performing comparisons. While this(ese) attempts did not technically fail, they never actually began training. I assume it was taking on the order of minutes to compute the loss.
After many more failed efforts I started attempting to perform the requisite computation in pure numpy in a SSCCE. But keeping cognizant I was essentially limited to instantiating "simple" tensors (ie ones, zeros) and only performing "simple" operations like element-wise multiply, addition, and reshaping. Thus I arrived at this SSCCE
import numpy as np
from tensorflow.keras.utils import to_categorical
# Generate the "images" at random
true_flat = np.argmax(np.random.rand(1, 2, 2, 4), axis=3).astype('int')
true = to_categorical(true_flat, num_classes=4).astype('int')
pred_flat = np.argmax(np.random.rand(1, 2, 2, 4), axis=3).astype('int')
pred = to_categorical(pred_flat, num_classes=4).astype('int')
print('True:\n', true_flat)
print('Pred:\n', pred_flat)
# Create a mask representing an all "class 0" image
class_zero_label = np.array([1, 0, 0, 0])
czl_all = class_zero_label * np.ones(true.shape).astype('int')
# Mask both the truth and pred to locate class 0 pixels
czl_true_locs = czl_all * true
czl_pred_locs = czl_all * pred
# Subtract to create "addition" matrix
a = (czl_true_locs - czl_pred_locs) * czl_true_locs
print('a:\n', a)
# Do this
m = ((a + 1) - (a * 2))
print('m - ', m.shape, ':\n', m)
# Pull the front entry from 'm' and "expand" its value
#x = (m[:, :, :, 0].flatten() * np.ones(pred.shape).astype('int')).T.reshape(pred.shape)
m_front = m[:, :, :, 0]
print('m_front - ', m_front.shape, ':\n', m_front)
#m_flat = m_front.flatten()
m_flat = m_front.reshape(m_front.shape[0], m_front.shape[1]*m_front.shape[2])
print('m_flat - ', m_flat.shape, ':\n', m_flat)
m_expand = m_flat * np.ones(pred.shape).astype('int')
print('m_expand - ', m_expand.shape, ':\n', m_expand)
m_trans = m_expand.T
m_fixT = m_trans.reshape(pred.shape)
print('m_fixT - ', m_fixT.shape, ':\n', m_fixT)
m = m_fixT
print('m:\n', m.shape)
# Perform the math as described
pred = (pred * m) + a
print('Pred:\n', np.argmax(pred, axis=3))
This SSCCE, is well, terrible and complex. Essentially my goal here was to create two matrices, the "addition" and "multiplication" matrices. The multiplication matrix is meant to "zero out" every pixel in the predicted values where the truth value was equal to class 0. That is no matter the pixel value (ie a one-hot encoded vector) zero it out to be equal to [0, 0, 0, 0, 0, 0]. The addition matrix is then meant to add the vector [1, 0, 0, 0, 0, 0] to each of the zero'ed out locations. In the end this would achieve the goal of setting the predicted value of every truly class 0 pixel to correct.
The issue is that this SSCCE does not translate fully to tensorflow operations. The first issue is with the generation of the multiplication matrix, it is not defined correctly for when batch_size > 1. I thought no matter, just to see if it work I will break down and tf.unstack the y_true and y_pred tensors and iteration over them. Which has led me to the current instantiation of my loss function
def weighted_categorical_crossentropy_ignore(weights):
weights = K.variable(weights)
def loss(y_true, y_pred):
y_true_un = tf.unstack(y_true)
y_pred_un = tf.unstack(y_pred)
y_pred_new = []
for i in range(0, y_true.shape[0]):
yt = y_true_un[i]
yp = y_pred_un[i]
# Pred:
# [[[0 3] * [[[1 0] + [[[0 1] = [[[0 0]
# [3 1]]] [[1 1]]] [[0 0]]] [[3 1]]]
# If we multiple pred by a tensor which zeros out only incorrect class 0 labelleling
# Then add class zero to those zero'd out locations
# We can negate the effect of mis-classified class 0 pixels but still punish for
# incorrectly predicted class 0 labels for other classes.
# Create a mask respresenting an all "class 0" image
class_zero_label = K.variable([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
czl_all = class_zero_label * K.ones(yt.shape)
# Mask both true and pred to locate class 0 pixels
czl_true = czl_all * yt
czl_pred = czl_all * yp
# Subtract to create "addition matrix"
a = czl_true - czl_pred
# Do this.
m = ((a + 1) - (a * 2.))
# And this.
x = K.flatten(m[:, :, 0])
x = x * K.ones(yp.shape)
x = K.transpose(x)
x = K.reshape(x, yp.shape)
# Voila.
ypnew = (yp * x) + a
y_pred_new.append(ypnew)
y_pred_new = tf.concat(y_pred_new, 0)
# Continue calculating weighted categorical crossentropy
# -------------------------------------------------------
# Scale predictions so that the class probs of each sample sum to 1
y_pred_new /= K.sum(y_pred_new, axis=-1, keepdims=True)
# Clip to prevent NaN's and Inf's
y_pred_new = K.clip(y_pred_new, K.epsilon(), 1 - K.epsilon())
loss = y_true * K.log(y_pred_new) * weights
loss = -K.sum(loss, -1)
return loss
return loss
The current issue with this loss function lies in the apparent difference in the behavior between numpy and tensorflow when performing the operation
x = K.flatten(m[:, :, 0])
x = x * K.ones(yp.shape)
Which is meant to represent the behavior
m_flat = m_front.flatten()
m_expand = m_flat * np.ones(pred.shape).astype('int')
from the SSCCE.
So at this point I feel like I have delved so far into caveman coding I can't get out of it. I have to image there is some simple way akin to my initial attempt to perform the described behavior.
So, I guess my direct question is How do I implement
y_pred[tf.where(y_true == [1, 0, 0, 0, 0, 0])] = [1, 0, 0, 0, 0, 0]
in a custom tensorflow loss function?
EDIT: After fumbling around quite a bit more I have finally determined how to call .numpy() on the y_true, y_pred tensors to utilize numpy operations (Apparently setting tf.compat.v1.enable_eager_execution at the start of the program "doesn't work". I had to pass run_eagerly=True to Model().compile(...)).
This has allowed me to implement essentially the first attempt outlined
def weighted_categorical_crossentropy_ignore(weights):
weights = K.variable(weights)
def loss(y_true, y_pred):
yp = y_pred.numpy()
yt = y_true.numpy()
yp[np.nonzero(np.all(yt == [1, 0, 0, 0, 0, 0], axis=3))] = [1, 0, 0, 0, 0, 0]
# Continue calculating weighted categorical crossentropy
# -------------------------------------------------------
# Scale predictions so that the class probs of each sample sum to 1
yp /= K.sum(yp, axis=-1, keepdims=True)
# Clip to prevent NaN's and Inf's
yp = K.clip(yp, K.epsilon(), 1 - K.epsilon())
loss = y_true * K.log(yp) * weights
loss = -K.sum(loss, -1)
return loss
return loss
Though it seems by calling y_pred.numpy() (or the use of it thereafter) I have apparently "destroyed" the path/flow through the network. Based on the error when attempting to .fit
ValueError: No gradients provided for any variable: ['conv3d/kernel:0', <....>
I assume I somehow need to "remarshall" the tensor back to GPU memory? I have tried
yp = tf.convert_to_tensor(yp)
to no avail; same error. So I guess the same question still lies, but from a different motivation..
EDIT2: Well it seems from this SO Answer that I can't actually use numpy() to marshall the y_true, y_pred to use vanilla numpy operations. This necessarily "destroys" the network path and thus gradients cannot be calculated.
As I result I had realized with run_eagerly=True I can tf.Variable my y_true/y_pred and perform assignment. So in pure tensorflow I attempted to recreate the same code again
def weighted_categorical_crossentropy_ignore(weights):
weights = K.variable(weights)
def loss(y_true, y_pred):
# yp = y_pred.numpy().copy()
# yt = y_true.numpy().copy()
# yp[np.nonzero(np.all(yt == [1, 0, 0, 0, 0, 0], axis=3))] = [1, 0, 0, 0, 0, 0]
yp = K.variable(y_pred)
yt = K.variable(y_true)
#np.all
x = K.all(yt == [1, 0, 0, 0, 0, 0], axis=3)
#np.nonzero
ne = tf.not_equal(x, tf.constant(False))
y = tf.where(ne)
# Perform the desired operation
yp[y] = [1, 0, 0, 0, 0, 0]
# Continue calculating weighted categorical crossentropy
# -------------------------------------------------------
# Scale predictions so that the class probs of each sample sum to 1
#yp /= K.sum(yp, axis=-1, keepdims=True) # Cannot use \= on tf.var, must use var = var /
yp = yp / K.sum(yp, axis=-1, keepdims=True)
# Clip to prevent NaN's and Inf's
yp = K.clip(yp, K.epsilon(), 1 - K.epsilon())
loss = y_true * K.log(yp) * weights
loss = -K.sum(loss, -1)
return loss
return loss
But alas, this apparently creates the same issue as when calling .numpy(); no gradients can be computed. So I am again seemingly back at square 1.
EDIT3: Using the solution proposed by gobrewers14 in the answer posted below but modified based on my knowledge of the problem I have produced this loss function
def weighted_categorical_crossentropy_ignore(weights):
weights = K.variable(weights)
def loss(y_true, y_pred):
print('y_true.shape: ', y_true.shape)
print('y_pred.shape: ', y_pred.shape)
# Generate modified y_pred where all truly class0 pixels are correct
y_true_class0_indicies = tf.where(tf.math.equal(y_true, [1., 0., 0., 0., 0., 0.]))
y_pred_updates = tf.repeat([
[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
repeats=y_true_class0_indicies.shape[0],
axis=0)
yp = tf.tensor_scatter_nd_update(y_pred, y_true_class0_indicies, y_pred_updates)
# Continue calculating weighted categorical crossentropy
# -------------------------------------------------------
# Scale predictions so that the class probs of each sample sum to 1
yp /= K.sum(yp, axis=-1, keepdims=True)
# Clip to prevent NaN's and Inf's
yp = K.clip(yp, K.epsilon(), 1 - K.epsilon())
loss = y_true * K.log(yp) * weights
loss = -K.sum(loss, -1)
return loss
return loss
Provided the original answer assumed y_true to be of shape [8, 128, 128] (ie a "flat" class representation, versus a one-hot encoded representation [8, 128, 128, 6]) I first print the shapes of the y_true and y_pred input tensors for sanity
y_true.shape: (8, 128, 128, 6)
y_pred.shape: (8, 128, 128, 6)
For further sanity, the output shape of the network, provided by the tail of model.summary is
conv2d_18 (Conv2D) (None, 128, 128, 6) 1542 dropout_5[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 128, 128, 6) 0 conv2d_18[0][0]
==================================================================================================
Total params: 535,551,494
Trainable params: 535,529,478
Non-trainable params: 22,016
__________________________________________________________________________________________________
I then follow "the pattern" in the proposed solution and replace the original tf.math.equal(y_true, 0) with tf.math.equal(y_true, [1., 0., 0., 0., 0., 0.]) to handle the one-hot encoded case. From my understanding of the proposed solution currently (after ~10min of inspecting it) I assumed this should work. Though when attempting to train a model the following exception is thrown
InvalidArgumentError: Inner dimensions of output shape must match inner dimensions of updates shape. Output: [8,128,128,6] updates: [684584,6] [Op:TensorScatterUpdate]
Thus it seems as if the production of the (as I have named them) y_pred_updates produces a "collapsed" tensor with "too many" elements. I understand the motivation of the use of tf.repeat but its specific use seems to be incorrect. I assume it should produce a tensor with shape (8, 128, 128, 6) based on what I understand tf.tensor_scatter_nd_update to do. I assume this most likely is just based on the selection of the repeats and axis during the call to tf.repeat.
If I understand your question correctly, you are looking for something like this:
import tensorflow as tf
# batch of true labels
y_true = tf.constant([5, 0, 1, 3, 4, 0, 2, 0], dtype=tf.int64)
# batch of class probabilities
y_pred = tf.constant(
[
[0.34670502, 0.04551039, 0.14020428, 0.14341979, 0.21430719, 0.10985339],
[0.25681055, 0.14013883, 0.19890164, 0.11124421, 0.14526634, 0.14763844],
[0.09199252, 0.21889475, 0.1170236 , 0.1929019 , 0.20311192, 0.17607528],
[0.3246354 , 0.23257554, 0.15549366, 0.17282239, 0.00000001, 0.11447308],
[0.16502093, 0.13163856, 0.14371352, 0.19880624, 0.23360236, 0.12721846],
[0.27362782, 0.21408406, 0.10917682, 0.13135742, 0.10814326, 0.16361059],
[0.20697299, 0.23721898, 0.06455399, 0.11071447, 0.18990229, 0.19063729],
[0.10320242, 0.22173141, 0.2547973 , 0.2314068 , 0.07063974, 0.11822232]
], dtype=tf.float32)
# find the indices in the batch where the true label is the class 0
indices = tf.where(tf.math.equal(y_true, 0))
# create a tensor with the number of updates you want to replace in `y_pred`
updates = tf.repeat(
[[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
repeats=indices.shape[0],
axis=0)
# insert the updates into `y_pred` at the specified indices
modified_y_pred = tf.tensor_scatter_nd_update(y_pred, indices, updates)
print(modified_y_pred)
# tf.Tensor(
# [[0.34670502, 0.04551039, 0.14020428, 0.14341979, 0.21430719, 0.10985339],
# [1.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000],
# [0.09199252, 0.21889475, 0.1170236 , 0.1929019 , 0.20311192, 0.17607528],
# [0.3246354 , 0.23257554, 0.15549366, 0.17282239, 0.00000001, 0.11447308],
# [0.16502093, 0.13163856, 0.14371352, 0.19880624, 0.23360236, 0.12721846],
# [1.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000],
# [0.20697299, 0.23721898, 0.06455399, 0.11071447, 0.18990229, 0.19063729],
# [1.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000]],
# shape=(8, 6), dtype=tf.float32)
This final tensor, modified_y_pred, can be used in differentiation.
EDIT:
It might be easier to do this with masks.
Example:
# these arent normalized to 1 but you get the point
probs = tf.random.normal([2, 4, 4, 6])
# raw labels per pixel
labels = tf.random.uniform(
shape=[2, 4, 4],
minval=0,
maxval=6,
dtype=tf.int64)
# your labels are already one-hot encoded
labels = tf.one_hot(labels, 6)
# boolean mask where classes are `0`
# converting back to int labels with argmax for purposes of
# using `tf.math.equal`. Matching on `[1, 0, 0, 0, 0, 0]` is
# potentially buggy; matching on an integer is a lot more
# explicit.
mask = tf.math.equal(tf.math.argmax(labels, -1), 0)[..., None]
# flip the mask to zero out the pixels across channels where
# labels are zero
probs *= tf.cast(tf.math.logical_not(mask), tf.float32)
# multiply the mask by the one-hot labels, and add back
# to the already masked probabilities.
probs += labels * tf.cast(mask, tf.float32)

Keras replace log(0) in custom loss function

I am trying to use Poisson unscaled deviance as a loss function for my neural network, but there's a major flow with this : y_true can take (and will take very often) the value 0.
Unscaled deviance works like this for Poisson case :
If y_true = 0, then loss = 2 * d * y_pred
If y_true > 0, then loss = 2 * d *y_pred * (y_true * log(y_true)-y_true * log(y_pred)-y_true+y_pred
Note that as soon as log(0) is computed, the loss becomes -inf so my goal is to prevent this to happen.
I tried using the switch function to solve this but here's the trick:
If I have the value log(0), I don't want to replace it by 0 (with K.zeros()) because it would be considering that y_true = 1 since log(1) = 0.
Therefore I want to try using a large negative value in this case (-10000 for example) but I don't know how to do this since K.variable(-10000) gives the error:
ValueError: Rank of `condition` should be less than or equal to rank of `then_expression` and `else_expression`. ndim(condition)=1, ndim(then_expression)=0
Using K.zeros_like(y_true) instead of K.variable(-10000) will work for keras but it is mathematically incorrect and the optimisation doesn't work properly because of this.
I'd like to know how to replace the log by a large negative value in the switch function. Here's my attempt:
def custom_loss3(data, y_pred):
y_true = data[:, 0]
d = data[:, 1]
# condition
loss_value = KB.switch(KB.less_equal(y_true, 0),
2 * d * y_pred, 2 * d * (y_true * KB.switch(KB.less_equal(y_true, 0),
KB.variable(-10000), KB.log(y_true)) - y_true * KB.switch(KB.less_equal(y_pred, 0.), KB.variable(-10000), KB.log(y_pred)) - y_true + y_pred))
return loss_value

Keras - Precision and Recall is greater than 1 (Multi classification)

I am working on a multi classification problem using CNN's in keras. My precision and recall score is always over 1 which does not make any sense at all. Attached below is my code, what am I doing wrong?
def recall(y_true, y_pred):
true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
recall = true_positives / (possible_positives + K.epsilon())
return recall
def precision(y_true, y_pred):
true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
precision = true_positives / (predicted_positives + K.epsilon())
return precision
model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy',recall,precision])
I was able to figure this out. The above code works perfectly once you one-hot encode all the categorical labels. Also, make sure you do NOT have sparse_categorical_crossentropy as your loss function, and instead just use categorical_crossentropy.
If you wish to convert your categorical values to one-hot encoded values in Keras, you can just use this code:
from keras.utils import to_categorical
y_train = to_categorical(y_train)
The reason you have to do the above is noted in Keras documentation:
"when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample). In order to convert integer targets into categorical targets, you can use the Keras utility to_categorical"

Categories