I'm using LightGBM and I need to implement a loss function that, during training, penalizes predictions that fall below the target. In other words, I assume that underestimates are much worse than overestimates. I've found this suggestion, which does exactly the opposite:
def custom_asymmetric_train(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    grad = np.where(residual < 0, -2 * 10.0 * residual, -2 * residual)
    hess = np.where(residual < 0, 2 * 10.0, 2.0)
    return grad, hess
def custom_asymmetric_valid(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    loss = np.where(residual < 0, (residual**2) * 10.0, residual**2)
    return "custom_asymmetric_eval", np.mean(loss), False
(Source: https://towardsdatascience.com/custom-loss-functions-for-gradient-boosting-f79c1b40466d)
How can I modify it for my purpose?
I believe this function is where you want to make a change.
def custom_asymmetric_valid(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    loss = np.where(residual < 0, (residual**2) * 10.0, residual**2)
    return "custom_asymmetric_eval", np.mean(loss), False
The asymmetry comes from the comparison in the line where the loss is computed.
loss = np.where(residual < 0, (residual**2)*10.0, residual**2)
When residual is less than 0, the loss is residual^2 * 10, whereas when it is at or above 0, the loss is just residual^2. So if we change this less-than to a greater-than, the skew will flip:
loss = np.where(residual > 0, (residual**2)*10.0, residual**2)
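Note that the training objective in custom_asymmetric_train encodes the same asymmetry through its gradient and hessian, so it needs the same flip. A minimal sketch of the flipped objective (assuming numpy is imported as np):
import numpy as np

def custom_asymmetric_train(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    # now positive residuals (underestimates, y_pred < y_true) get the 10x weight
    grad = np.where(residual > 0, -2 * 10.0 * residual, -2 * residual)
    hess = np.where(residual > 0, 2 * 10.0, 2.0)
    return grad, hess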
I think this would be helpful. It originates from Custom loss function with Keras to penalise more negative prediction:
from keras import backend as K

def customLoss(true, pred):
    diff = pred - true
    greater = K.greater(diff, 0)
    greater = K.cast(greater, K.floatx())  # 0 for lower, 1 for greater
    greater = greater + 1                  # 1 for lower, 2 for greater
    # use some kind of loss here, such as mse or mae, or pick one from keras
    # using mse:
    return K.mean(greater * K.square(diff))

model.compile(optimizer='adam', loss=customLoss)
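Note that, as written, this weights overestimates (diff = pred - true > 0) twice as heavily. For the original goal of penalizing underestimates, the comparison would be flipped; a minimal sketch of that variant:
def customLossUnder(true, pred):
    diff = pred - true
    # weight 2 where pred < true (underestimate), weight 1 elsewhere
    weights = K.cast(K.less(diff, 0), K.floatx()) + 1
    return K.mean(weights * K.square(diff))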
Thank you for reading my post.
I’m currently developing a peak detection algorithm using a CNN, to determine the ideal convolution kernel, which is representable as the ideal mother wavelet function that will maximize the peak detection accuracy.
To begin with, I created my own IoU loss function and a simple model and tried to run the training. The execution itself worked without any errors, but somehow the learning failed:
the parameters of the model with the custom loss function are not updated through learning over the epochs.
My loss function is described below.
def IoU(inputs: torch.Tensor, labels: torch.Tensor,
        smooth: float = 0.1, threshold: float = 0.5, alpha: float = 1.0):
    '''
    - alpha: a parameter that sharpens the thresholding.
      If alpha = 1, the thresholded input is the same as the raw input.
    '''
    thresholded_inputs = inputs**alpha / (inputs**alpha + (1 - inputs)**alpha)
    inputs = torch.where(thresholded_inputs < threshold, 0, 1)
    batch_size = inputs.shape[0]
    intersect_tensor = (inputs * labels).view(batch_size, -1)
    intersect = intersect_tensor.sum(-1)
    union_tensor = torch.max(inputs, labels).view(batch_size, -1)
    union = union_tensor.sum(-1)
    iou = (intersect + smooth) / (union + smooth)  # we smooth our division to avoid 0/0
    iou_score = iou.mean()
    return 1 - iou_score
and my training model is,
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 1, kernel_size=32, stride=1, padding=16),
            nn.Linear(257, 256),
            nn.LogSoftmax(1)
        )

    def forward(self, x):
        return self.net(x)
model = MLP()
opt = optim.Adadelta(model.parameters())

# initialization of the kernel of Conv1d
def init_kernel(m):
    if type(m) == nn.Conv1d:
        nn.init.kaiming_normal_(m.weight)
        print(m.weight)
        plt.plot(m.weight[0][0].detach().numpy())

model.apply(init_kernel)
def step(x, y, is_train=True):
    opt.zero_grad()
    y_pred = model(x)
    y_pred = y_pred.reshape(-1, 256)
    loss = IoU(y_pred, y)
    loss.requires_grad = True
    loss.retain_grad = True
    if is_train:
        loss.backward()
        opt.step()
    return loss, y_pred
and lastly, the execution code is,
from torch.autograd.grad_mode import F

train_loss_arr, val_loss_arr = [], []
valbose = 10
epochs = 200

for e in range(epochs):
    train_loss, val_loss, acc = 0., 0., 0.
    for x, y in train_set.as_numpy_iterator():
        x = torch.from_numpy(x)
        y = torch.from_numpy(y)
        model.train()
        loss, y_pred = step(x, y, is_train=True)
        train_loss += loss.item()
    train_loss /= len(train_set)

    for x, y in val_set.as_numpy_iterator():
        x = torch.from_numpy(x)
        y = torch.from_numpy(y)
        model.eval()
        with torch.no_grad():
            loss, y_pred = step(x, y, is_train=False)
            val_loss += loss.item()
    val_loss /= len(val_set)

    train_loss_arr.append(train_loss)
    val_loss_arr.append(val_loss)

    # visualize the current kernel to check whether the learning is progressing safely
    if e % valbose == 0:
        print(f"Epoch[{e}]({(e*100/epochs):0.2f}%): train_loss: {train_loss:0.4f}, val_loss: {val_loss:0.4f}")
        fig, axs = plt.subplots(1, 4, figsize=(12, 4))
        print(y_pred[0], y_pred[0].shape)
        axs[0].plot(x[0][0])
        axs[0].set_title("spectra")
        axs[1].plot(y_pred[0])
        axs[1].set_title("y pred")
        axs[2].plot(y[0])
        axs[2].set_title("y true")
        axs[3].plot(model.state_dict()["net.0.weight"][0][0].numpy())
        axs[3].set_title("kernel1")
        plt.show()
With these programs I tried to evaluate this simple model; however, the model parameters didn't change at all over the epochs.
(Figures omitted: prediction and kernel at epoch 0, and prediction and kernel at epoch 30.)
As you can see, the kernel has not been modified through learning over the epochs. I spent hours investigating the cause of this problem, but I'm still not sure how to make my loss function and model trainable.
Thank you.
Try printing the gradient after loss.backward() with:
print(y_pred.grad)
I suspect what you'll find is that, after a backward pass, the gradient of y_pred is zero. This means that either (a) gradient tracking is not enabled for one or more of the variables at which the computation graph has a node, or (b), more likely, you are using an operation which is not differentiable.
In your case, at a minimum the torch.where call (which produces hard 0/1 values) is not differentiable, so you'll need to replace it. Thresholding operations are non-differentiable and are generally replaced with "soft" thresholding operations (see Softmax instead of max function for classification) so that gradient computation still works. Try replacing this with a soft threshold or no threshold at all.
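To make that suggestion concrete, here is a minimal sketch (an illustration, not the asker's final code) of the IoU loss above with the hard threshold dropped, assuming the inputs are probabilities in [0, 1]; the union uses inclusion-exclusion instead of an element-wise max, so gradients can flow everywhere:
import torch

def soft_iou_loss(inputs: torch.Tensor, labels: torch.Tensor,
                  smooth: float = 0.1, alpha: float = 1.0):
    # "soft" sharpening in place of the hard 0/1 threshold
    soft = inputs**alpha / (inputs**alpha + (1 - inputs)**alpha)
    batch_size = soft.shape[0]
    soft = soft.reshape(batch_size, -1)
    labels = labels.reshape(batch_size, -1)
    intersect = (soft * labels).sum(-1)
    # soft union via inclusion-exclusion: |A| + |B| - |A∩B|
    union = (soft + labels - soft * labels).sum(-1)
    iou = (intersect + smooth) / (union + smooth)
    return 1 - iou.mean()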
I'm trying to implement the Multiclass Hybrid loss function in Python, following the article https://arxiv.org/pdf/1808.05238.pdf, for my semantic segmentation problem using an imbalanced dataset. I managed to get my implementation correct enough to start training the model, but the results are very poor. Model architecture: U-Net; the learning rate in the Adam optimizer is 1e-5. The mask shape is (None, 512, 512, 3), with 3 classes (in my case: forest, deforestation, other). The formula I used to implement my loss combines a Tversky term and a focal term: L = L_Tversky + lambda * L_focal, where L_Tversky = C - sum_c TI_c over the C classes and TI_c = (TP_c + s) / (TP_c + alpha * FP_c + beta * FN_c + s).
The code I created:
def build_hybrid_loss(_lambda_=1, _alpha_=0.5, _beta_=0.5, smooth=1e-6):
    def hybrid_loss(y_true, y_pred):
        C = 3
        tversky = 0
        # Calculate Tversky Loss
        for index in range(C):
            inputs_fl = tf.nest.flatten(y_pred[..., index])
            targets_fl = tf.nest.flatten(y_true[..., index])
            # True Positives, False Positives & False Negatives
            TP = tf.reduce_sum(tf.math.multiply(inputs_fl, targets_fl))
            FP = tf.reduce_sum(tf.math.multiply(inputs_fl, 1 - targets_fl[0]))
            FN = tf.reduce_sum(tf.math.multiply(1 - inputs_fl[0], targets_fl))
            tversky_i = (TP + smooth) / (TP + _alpha_ * FP + _beta_ * FN + smooth)
            tversky += tversky_i
        tversky += C
        # Calculate Focal loss
        loss_focal = 0
        for index in range(C):
            f_loss = -(y_true[..., index] * (1 - y_pred[..., index])**2 * tf.math.log(y_pred[..., index]))
            # Average over each data point/image in batch
            axis_to_reduce = range(1, 3)
            f_loss = tf.math.reduce_mean(f_loss, axis=axis_to_reduce)
            loss_focal += f_loss
        result = tversky + _lambda_ * loss_focal
        return result
    return hybrid_loss
The prediction of the model after the end of an epoch (image omitted): I have a problem with swapped colors, so the red in the prediction is actually green, which means forest; the prediction is mostly forest and not deforestation.
The question is: what is wrong with my hybrid loss implementation, and what needs to be changed to make it work?
To simplify things a little, I have divided the Hybrid loss into four separate functions: Tversky loss, Dice coefficient, multi-label Dice loss, and Hybrid loss. You can see the code below.
def TverskyLoss(targets, inputs, alpha=0.5, beta=0.5, smooth=1e-16, numLabels=3):
    tversky = 0
    for index in range(numLabels):
        # flatten each class channel to a 1-D tensor
        inputs_fl = tf.reshape(inputs[..., index], [-1])
        targets_fl = tf.reshape(targets[..., index], [-1])
        # True Positives, False Positives & False Negatives
        TP = tf.reduce_sum(tf.math.multiply(inputs_fl, targets_fl))
        FP = tf.reduce_sum(tf.math.multiply(inputs_fl, 1 - targets_fl))
        FN = tf.reduce_sum(tf.math.multiply(1 - inputs_fl, targets_fl))
        tversky_i = (TP + smooth) / (TP + alpha * FP + beta * FN + smooth)
        tversky += tversky_i
    return numLabels - tversky
def dice_coef(y_true, y_pred, smooth=1e-16):
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.math.reduce_sum(tf.math.multiply(y_true_f, y_pred_f))
    return (2. * intersection + smooth) / (tf.math.reduce_sum(y_true_f) + tf.math.reduce_sum(y_pred_f) + smooth)

def dice_coef_multilabel(y_true, y_pred, numLabels=3):
    dice = 0
    for index in range(numLabels):
        dice -= dice_coef(y_true[..., index], y_pred[..., index])
    return numLabels + dice
def build_hybrid_loss(_lambda_=0.5, _alpha_=0.5, _beta_=0.5, smooth=1e-16, C=3):
    def hybrid_loss(y_true, y_pred):
        tversky = TverskyLoss(y_true, y_pred, alpha=_alpha_, beta=_beta_)
        dice = dice_coef_multilabel(y_true, y_pred)
        result = tversky + _lambda_ * dice
        return result
    return hybrid_loss
Passing loss=build_hybrid_loss() during model compilation sets the Hybrid loss as the loss function of the model.
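For instance, a minimal compile sketch (the Adam learning rate mirrors the one mentioned in the question; the metric is an assumption):
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss=build_hybrid_loss(),
              metrics=['accuracy'])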
After some research, I came to the conclusion that in my particular case a Hybrid loss with _lambda_ = 0.2, _alpha_ = 0.5, _beta_ = 0.5 is not much better than a single Dice loss or a single Tversky loss. Neither IoU (intersection over union) nor the standard accuracy metric improves much with the Hybrid loss. But I don't believe it is a rule that such a Hybrid loss will always perform worse than, or merely on par with, a single loss.
(Figures omitted: accuracy graph and IoU graph.)
I am trying to use the Poisson unscaled deviance as a loss function for my neural network, but there is a major flaw: y_true can take (and very often will take) the value 0.
The unscaled deviance works like this for the Poisson case:
If y_true = 0, then loss = 2 * d * y_pred
If y_true > 0, then loss = 2 * d * (y_true * log(y_true) - y_true * log(y_pred) - y_true + y_pred)
Note that as soon as log(0) is computed the loss becomes -inf (in floating point, log(0) = -inf and 0 * -inf = NaN), so my goal is to prevent this from happening.
I tried using the switch function to solve this, but here's the catch: if the value log(0) comes up, I don't want to replace it by 0 (with K.zeros()), because that would amount to assuming y_true = 1, since log(1) = 0.
Therefore I want to use a large negative value instead (-10000, for example), but I don't know how to do this, since K.variable(-10000) gives the error:
ValueError: Rank of `condition` should be less than or equal to rank of `then_expression` and `else_expression`. ndim(condition)=1, ndim(then_expression)=0
Using K.zeros_like(y_true) instead of K.variable(-10000) works in Keras, but it is mathematically incorrect, and the optimisation doesn't work properly because of it. I'd like to know how to substitute a large negative value for the log inside the switch function. Here's my attempt:
def custom_loss3(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1]
    # condition
    loss_value = KB.switch(KB.less_equal(y_true, 0),
                           2 * d * y_pred,
                           2 * d * (y_true * KB.switch(KB.less_equal(y_true, 0),
                                                       KB.variable(-10000), KB.log(y_true))
                                    - y_true * KB.switch(KB.less_equal(y_pred, 0.),
                                                         KB.variable(-10000), KB.log(y_pred))
                                    - y_true + y_pred))
    return loss_value
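One way to sidestep the rank error (a sketch, not necessarily the final solution) is to build the -10000 constant as a tensor of the same shape as the condition, e.g. with KB.ones_like, so that KB.switch sees matching ranks:
# sketch: -10000 tensors shaped like the tensors they guard,
# so the ranks of condition and branches agree inside KB.switch
big_neg_true = -10000.0 * KB.ones_like(y_true)
big_neg_pred = -10000.0 * KB.ones_like(y_pred)
safe_log_true = KB.switch(KB.less_equal(y_true, 0.), big_neg_true, KB.log(y_true))
safe_log_pred = KB.switch(KB.less_equal(y_pred, 0.), big_neg_pred, KB.log(y_pred))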
This is the loss function of WGAN-GP
gen_sample = model.generator(input_gen)
disc_real = model.discriminator(real_image, reuse=False)
disc_fake = model.discriminator(gen_sample, reuse=True)
disc_concat = tf.concat([disc_real, disc_fake], axis=0)

# Gradient penalty
alpha = tf.random_uniform(
    shape=[BATCH_SIZE, 1, 1, 1],
    minval=0.,
    maxval=1.)
differences = gen_sample - real_image
interpolates = real_image + (alpha * differences)
gradients = tf.gradients(model.discriminator(interpolates, reuse=True), [interpolates])[0]  # why [0]?
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1]))
gradient_penalty = tf.reduce_mean((slopes - 1.)**2)

d_loss_real = tf.reduce_mean(disc_real)
d_loss_fake = tf.reduce_mean(disc_fake)
disc_loss = -(d_loss_real - d_loss_fake) + LAMBDA * gradient_penalty
gen_loss = -d_loss_fake
This is the training loss (plot omitted). The generator loss is oscillating, and its value is very large.
My question is:
is the generator loss normal or abnormal?
One thing to note is that your gradient penalty calculation is wrong. The following line:
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1]))
should actually be:
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1,2,3]))
You are reducing only over the first non-batch axis, but the gradient is taken with respect to an image (as the shape of alpha shows), so you have to reduce over the axes [1,2,3].
Another issue in your code is the generator loss, which should be:
gen_loss = d_loss_real - d_loss_fake
For the gradient calculation this makes no difference, because the parameters of the generator are only contained in d_loss_fake. However, for the value of the generator loss it makes all the difference in the world, and it is the reason why the loss oscillates this much.
At the end of the day, you should look at the actual performance metric you care about, such as the Inception Score or the Fréchet Inception Distance (FID), to determine the quality of your GAN, because the discriminator and generator losses are only mildly descriptive.
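Putting the two corrections together, the relevant lines of the snippet above become (a sketch of the combined fix, keeping the original TF1-style API):
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1, 2, 3]))
gradient_penalty = tf.reduce_mean((slopes - 1.)**2)
disc_loss = -(d_loss_real - d_loss_fake) + LAMBDA * gradient_penalty
gen_loss = d_loss_real - d_loss_fake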
I want my model to increase the loss for a false positive prediction when training by creating a custom loss function.
The class_weight parameter in model.fit() does not solve this issue. class_weight is already set to {0: 1, 1: 23}, as I have skewed training data with 23 times as many non-true labels as true labels.
I am not too experienced when working with the keras backend. I have mostly worked with the functional model.
What I want to create is:
def weighted_binary_crossentropy(y_true, y_pred):
    # where y_true == 0 and y_pred == 1:
    #     weight this loss and make it 50 times larger
    # return loss
I can do simple things with the tensors, such as computing the mean squared error, but I have no idea how to do logical operations. I have tried a hacky solution, which doesn't work and feels totally wrong:
def weighted_binary_crossentropy(y_true, y_pred):
    false_positive_weight = 50
    thresh = 0.5
    y_pred_true = K.greater_equal(thresh, y_pred)
    y_not_true = K.less_equal(thresh, y_true)
    false_positive_tensor = K.equal(y_pred_true, y_not_true)
    loss_weights = K.ones_like(y_pred) + false_positive_weight * false_positive_tensor
    return K.binary_crossentropy(y_true, y_pred) * loss_weights
I am using python 3 with keras 2 and tensorflow as backend.
Thanks in advance!
I think you're almost there...
from keras.losses import binary_crossentropy

def weighted_binary_crossentropy(y_true, y_pred):
    false_positive_weight = 50
    thresh = 0.5
    y_pred_true = K.greater_equal(thresh, y_pred)
    y_not_true = K.less_equal(thresh, y_true)
    false_positive_tensor = K.equal(y_pred_true, y_not_true)

    # changing from here

    # first let's transform the bool tensor into numbers - you may need float64
    # depending on your configuration
    false_positive_tensor = K.cast(false_positive_tensor, 'float32')

    # and let's create its complement (the non false positives)
    complement = 1 - false_positive_tensor

    # now we're going to separate the two groups
    falsePosGroupTrue = y_true * false_positive_tensor
    falsePosGroupPred = y_pred * false_positive_tensor

    nonFalseGroupTrue = y_true * complement
    nonFalseGroupPred = y_pred * complement

    # let's calculate one crossentropy loss for each group
    # (directly from the keras loss functions imported above)
    falsePosLoss = binary_crossentropy(falsePosGroupTrue, falsePosGroupPred)
    nonFalseLoss = binary_crossentropy(nonFalseGroupTrue, nonFalseGroupPred)

    # return them weighted:
    return (false_positive_weight * falsePosLoss) + nonFalseLoss
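As with any custom Keras loss, it is then passed at compile time; a minimal usage sketch (the optimizer choice is an assumption):
model.compile(optimizer='adam', loss=weighted_binary_crossentropy)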