Is there a differentiable alternative to K.cast? - python

For a custom Keras loss function, I need to create a float tensor from a bool tensor. Unfortunately, K.cast() is not differentiable and therefore can't be used. Is there an alternative way to do this that is differentiable?
less_than_tau = y_pred < tau
less_than_tau = K.cast(less_than_tau, 'float32')

Dr. Snoopy is right.
The way you solve for this in deep learning is "soft" functions, such as softmax instead of max.
In your case, if you want to minimize y_pred relative to y_tau, you'd do something like:
switch = sigmoid(y_pred - y_tau)
loss = switch * true_case + (1. - switch) * false_case
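As a minimal sketch of this soft switch, the snippet below uses NumPy to stay framework-agnostic; in a Keras loss the same expression could be written with K.sigmoid. The function name soft_less_than and the sharpness parameter are assumptions added for illustration and are not part of the answer. The argument is written as tau - y_pred so that the result approaches 1 where y_pred < tau, matching the bool tensor in the question.

```python
import numpy as np


def soft_less_than(y_pred, tau, sharpness=10.0):
    """Smooth stand-in for (y_pred < tau) cast to float.

    `sharpness` is a hypothetical tuning knob: larger values give a
    steeper transition, closer to the hard step.
    """
    z = sharpness * (tau - np.asarray(y_pred, dtype=np.float64))
    return 1.0 / (1.0 + np.exp(-z))


y_pred = np.array([0.0, 0.5, 1.0])
switch = soft_less_than(y_pred, tau=0.5)
# switch is close to 1 below tau, 0.5 at tau, and close to 0 above tau
```

A steeper sharpness tracks the hard comparison more closely but shrinks the gradient away from tau, so it is usually a value to tune.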

Related

tensorflow MDA custom loss and ValueError: No gradients provided for any variable

I would like to use the MDA (mean direction accuracy) as a custom loss function for a tensorflow neural network.
I am trying to implement this as described here:
Custom Mean Directional Accuracy loss function in Keras
def mda(y_true, y_pred):
    s = K.equal(K.sign(y_true[1:] - y_true[:-1]),
                K.sign(y_pred[1:] - y_pred[:-1]))
    return K.mean(K.cast(s, K.floatx()))
The network works fine but when I try to fit my data I am getting this error:
ValueError: No gradients provided for any variable
I think this is because I am losing the gradient information from my pred tensor, but I don't know how to implement this correctly, or whether it makes any sense at all. Ultimately, what I want to predict is whether a numeric series is going up or down, which is why this function made sense to me.
The problem is that with K.equal and K.cast, you change numbers into bools. As a result, no gradient can be calculated.
You could replace them with a calculation; using the fact that when two numbers are equal, their difference is zero, and that since sign returns only [-1, 0, 1], the absolute difference can only be 0, 1 or 2:
def mda(y_true, y_pred):
    d = K.abs(K.sign(y_true[1:] - y_true[:-1]) - (K.sign(y_pred[1:] - y_pred[:-1])))
    s = (1. - d) * (d - 1.) * (d - 2.) / 2.
    return K.mean(s)
s equals 1 when your K.equal is true, and 0 otherwise.
Thanks Reda and AndrzejO for your answers to my question. As AndrzejO mentioned, equal transforms the data to boolean, so there is no gradient there.
I implemented this other solution as an alternative to AndrzejO's solution:
def mda_custom_loss(y_true, y_pred):
    res = tf.math.sign(y_true[1:] - y_true[:-1]) - tf.math.sign(y_pred[1:] - y_pred[:-1])
    s = tf.math.abs(tf.math.sign(res))
    return 1 - tf.math.reduce_mean(tf.math.sign(s))
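Both versions above still pass y_pred through sign, which is piecewise constant, so its derivative is zero wherever it is defined and the gradient reaching the weights carries no signal. One possible workaround is a smooth surrogate that swaps sign for tanh on the predicted change. The sketch below is written in NumPy to show the arithmetic only; the name soft_mda_loss and the sharpness parameter are assumptions for illustration and do not come from either answer. In a real loss the same expression would be written with backend ops such as K.tanh.

```python
import numpy as np


def soft_mda_loss(y_true, y_pred, sharpness=5.0):
    """1 - (soft directional accuracy); lower is better.

    tanh(sharpness * delta) is a smooth stand-in for sign(delta).
    `sharpness` is a hypothetical parameter, not from the answers above.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # Direction of the true series: a fixed target, no gradient needed.
    true_dir = np.sign(y_true[1:] - y_true[:-1])
    # Smooth direction of the predicted series.
    pred_dir = np.tanh(sharpness * (y_pred[1:] - y_pred[:-1]))
    # Near 1 when directions match, near 0 when they oppose.
    agreement = 0.5 * (1.0 + true_dir * pred_dir)
    return 1.0 - agreement.mean()


series = np.array([1.0, 2.0, 1.5, 3.0])
loss_same = soft_mda_loss(series, series)      # directions agree
loss_flipped = soft_mda_loss(series, -series)  # directions disagree
```

Because the loss is 1 minus the soft accuracy, minimizing it pushes the predicted direction toward the true one.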

pytorch custom loss function nn.CrossEntropyLoss

After studying autograd, I tried to write a loss function myself.
Here is my loss:
def myCEE(outputs, targets):
    exp = torch.exp(outputs)
    A = torch.log(torch.sum(exp, dim=1))
    hadamard = F.one_hot(targets, num_classes=10).float() * outputs
    B = torch.sum(hadamard, dim=1)
    return torch.sum(A - B)
I compared it with torch.nn.CrossEntropyLoss.
Here are the results:
for i, j in train_dl:
    inputs = i
    targets = j
    break
outputs = model(inputs)
myCEE(outputs,targets) : tensor(147.5397, grad_fn=<SumBackward0>)
loss_func = nn.CrossEntropyLoss(reduction='sum') : tensor(147.5397, grad_fn=<NllLossBackward>)
The values were the same.
I assumed that grad_fn differs only because these are different functions, and that this would not cause any problems.
But something happened!
After 4 epochs, the loss values turned to nan.
In contrast to myCEE, training with nn.CrossEntropyLoss went well.
So I wonder whether there is a problem with my function.
After reading some posts about nan problems, I stacked more convolutions onto the model.
As a result, 39 epochs of training ran without an error.
Nevertheless, I'd like to know the difference between myCEE and nn.CrossEntropyLoss.
torch.nn.CrossEntropyLoss differs from your implementation because it uses a trick to counter the unstable computation of the exponential for numerically large values. Given the logits output {l_1, ... l_j, ..., l_n}, the softmax is defined as:
softmax(l_i) = exp(l_i) / sum_j(exp(l_j))
The trick is to multiply both the numerator and denominator by exp(-β):
softmax(l_i) = exp(l_i)*exp(-β) / [sum_j(exp(l_j))*exp(-β)]
= exp(l_i-β) / sum_j(exp(l_j-β))
Then the log-softmax comes down to:
logsoftmax(l_i) = l_i - β - log[sum_j(exp(l_j-β))]
In practice β is chosen as the highest logit value i.e. β = max_j(l_j).
You can read more about it on this question: Numerically Stable Softmax.
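The shift by β can be sketched in a few lines. The snippet below is a NumPy illustration of the formula above, not the actual PyTorch implementation; the function name stable_cross_entropy_sum is an assumption for illustration. It mirrors the sum reduction used in the question, and the first row of logits is chosen so that a naive exp(1000) would overflow.

```python
import numpy as np


def stable_cross_entropy_sum(logits, targets):
    """Sum over the batch of -log_softmax(logits)[target], computed stably."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    beta = logits.max(axis=1, keepdims=True)            # beta = max_j(l_j), per row
    shifted = logits - beta                             # largest entry becomes 0
    log_sum_exp = np.log(np.exp(shifted).sum(axis=1))   # log[sum_j exp(l_j - beta)]
    # l_target - beta for each row
    target_shifted = shifted[np.arange(len(targets)), targets]
    return float((log_sum_exp - target_shifted).sum())


logits = np.array([[1000.0, 0.0], [0.0, 0.0]])  # naive exp(1000) overflows to inf
targets = np.array([0, 1])
loss = stable_cross_entropy_sum(logits, targets)
```

In myCEE, the equivalent change would be to replace torch.log(torch.sum(torch.exp(outputs), dim=1)) with torch.logsumexp(outputs, dim=1), which performs this kind of stabilized computation.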

Do I need to use backend function for a custom Keras loss

I would like to implement a custom loss function shown in this paper with Keras.
My loss is not going down and I have the feeling that it is because of the implementation of the loss: It doesn't use Keras' backend for everything but rather a combination of some K functions, simple operations and numpy:
def l1_matrix_norm(M):
    return K.cast(K.max(K.sum(K.abs(M), axis=0)), 'float32')
def reconstruction_loss(patch_size, mask, center_weight=0.9):
    mask = mask.reshape(1, *mask.shape).astype('float32')
    mask_inv = 1 - mask
    def loss(y_true, y_pred):
        diff = y_true - y_pred
        center_part = mask * diff
        center_part_normed = l1_matrix_norm(center_part)
        surr_part = mask_inv * diff
        surr_part_normed = l1_matrix_norm(surr_part)
        num_pixels = np.prod(patch_size).astype('float32')
        numerator = center_weight * center_part_normed + (1 - center_weight) * surr_part_normed
        return numerator / num_pixels
    return loss
Is it necessary to use Keras functions? If so, for which types of operations do I need them? (I saw some code where simple operations such as addition don't use K.)
Also, if I have to use a Keras backend function, can I use TensorFlow functions instead?
NN training depends on being able to compute the derivatives of all functions in the graph, including the loss function. Keras backend functions and TensorFlow functions are annotated such that TensorFlow (or another backend) automatically knows how to compute gradients. That is not the case for NumPy functions. It is possible to use non-TF functions if you know how to compute their gradients manually (see tf.custom_gradient). In general, I would recommend sticking with backend functions where possible and using TensorFlow functions when necessary.

A differentiable tensor operation approximating less than or equal to?

I'm trying to maximize the number of predictions that are close to the true value, even if this results in crazy outliers that may otherwise skew a median (which I already have a working loss for) or mean.
So, I try this custom loss function:
def lossMetricPercentGreaterThanTenPercentError(y_true, y_pred):
    """
    CURRENTLY DOESN'T WORK AS LOSS: NOT DIFFERENTIABLE
    ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
    See https://keras.io/losses/
    """
    from keras import backend as K
    import tensorflow as tf
    diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true), K.epsilon(), None))
    withinTenPct = tf.reduce_sum(tf.cast(K.less_equal(diff, 0.1), tf.int32), axis=-1) / tf.size(diff, out_type=tf.int32)
    return 100 * (1 - tf.cast(withinTenPct, tf.float32))
I understand that at least the less_equal function isn't differentiable (I'm not sure if it's also throwing a fit over tf.size); is there some tensor operation that can approximate "less than or equal to"?
I'm on Tensorflow 1.12.3 and cannot upgrade, so even if tf.numpy_function(lambda x: np.sum(x <= 0.1) / len(x), diff, tf.float32) would work as a wrapper I can't use tf.numpy_function.
From the error message it looks like some gradient operation has not been implemented in Keras.
You could try to use Tensorflow operations to achieve the same result (Untested!):
diff = tf.abs(y_true - y_pred) / tf.clip_by_value(tf.abs(y_true), 1e-12, 1e12)
# Cast the bool mask to float32 (not int32) so that reduce_mean returns a fraction
withinTenPct = tf.reduce_mean(tf.cast(tf.less_equal(diff, 0.1), tf.float32))
return 100.0 * (1.0 - withinTenPct)
Alternatively, you can try tf.keras.losses.logcosh(y_true, y_pred), as it seems to fit your use case. See the TF docs.
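Since any comparison op yields a bool tensor, another option is to replace the hard "less than or equal to" with a sigmoid around the threshold. The sketch below shows the arithmetic in NumPy; the function name and the tol, sharpness, and eps parameters are assumptions for illustration, not part of the answer. The sigmoid is written through tanh, using the identity sigmoid(z) = 0.5 * (1 + tanh(z / 2)), so that extreme outliers cannot overflow an exponential. In TensorFlow 1.x the same expression could be built from tf.abs, tf.clip_by_value, tf.sigmoid and tf.reduce_mean.

```python
import numpy as np


def soft_pct_outside_tolerance(y_true, y_pred, tol=0.1, sharpness=50.0, eps=1e-7):
    """Smooth version of 100 * (1 - fraction of predictions within `tol`).

    `tol`, `sharpness` and `eps` are hypothetical parameters chosen for
    illustration; `sharpness` controls how closely the sigmoid tracks
    the hard threshold.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    rel_err = np.abs(y_true - y_pred) / np.clip(np.abs(y_true), eps, None)
    z = sharpness * (tol - rel_err)
    # sigmoid(z) expressed via tanh to avoid overflow for huge errors
    soft_within = 0.5 * (1.0 + np.tanh(0.5 * z))
    return 100.0 * (1.0 - soft_within.mean())


y_true = np.array([1.0, 1.0, 1.0, 1.0])
y_pred = np.array([1.0, 1.01, 2.0, 100.0])  # two close predictions, two outliers
loss = soft_pct_outside_tolerance(y_true, y_pred)
```

The two outliers contribute almost the same amount regardless of how far off they are, which matches the stated goal of ignoring extreme errors.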

Differences between F.relu(X) and torch.max(X, 0)

I am trying to implement the following loss function
To me, the most straightforward implementation would be using torch.max:
losses = torch.max(ap_distances - an_distances + margin, torch.Tensor([0]))
However, I saw other implementations on github using F.relu
losses = F.relu(ap_distances - an_distances + margin)
They give essentially the same output, but I wonder if there's any fundamental difference between the two methods.
torch.max is not differentiable according to this discussion.
A loss function needs to be continuous and differentiable (at least almost everywhere) for backprop to work. ReLU is not differentiable at exactly 0, but autograd assigns it a subgradient there, which is why it can be used in a loss function.
If you are trying to limit the output value like in ReLU6 (https://pytorch.org/docs/stable/generated/torch.nn.ReLU6.html), you can use
import torch.nn.functional as F
x1 = F.hardtanh(x, min_value, max_value)
This preserves the differentiability of the model.
This will produce a result like the one in the figure (not shown here; the min and max values will be different).
