I'm trying to maximize the number of predictions that are close to the true value, even if this results in crazy outliers that may otherwise skew a median (which I already have a working loss for) or mean.
So, I try this custom loss function:
def lossMetricPercentGreaterThanTenPercentError(y_true, y_pred):
    """
    CURRENTLY DOESN'T WORK AS LOSS: NOT DIFFERENTIABLE
    ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
    See https://keras.io/losses/
    """
    from keras import backend as K
    import tensorflow as tf
    diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true), K.epsilon(), None))
    withinTenPct = tf.reduce_sum(tf.cast(K.less_equal(diff, 0.1), tf.int32), axis=-1) / tf.size(diff, out_type=tf.int32)
    return 100 * (1 - tf.cast(withinTenPct, tf.float32))
I understand that at least the less_equal function isn't differentiable (I'm not sure if it's also throwing a fit over tf.size); is there some tensor operation that can approximate "less than or equal to"?
I'm on Tensorflow 1.12.3 and cannot upgrade, so even if tf.numpy_function(lambda x: np.sum(x <= 0.1) / len(x), diff, tf.float32) would work as a wrapper I can't use tf.numpy_function.
From the error message it looks like some gradient operation has not been implemented in Keras.
You could try to use Tensorflow operations to achieve the same result (Untested!):
diff = tf.abs(y_true - y_pred) / tf.clip_by_value(tf.abs(y_true), 1e-12, 1e12)
# Cast to float before averaging; an int32 mean would truncate the fraction.
# Caveat: tf.less_equal returns a bool tensor, which still carries no gradient.
withinTenPct = tf.reduce_mean(tf.cast(tf.less_equal(diff, 0.1), tf.float32))
return 100.0 * (1.0 - withinTenPct)
Alternatively, you can try tf.keras.losses.logcosh(y_true, y_pred), as it seems to fit your use case. See the TensorFlow documentation.
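For reference, log-cosh is the mean of log(cosh(y_pred - y_true)). The plain-Python sketch below only illustrates that formula; the helper name `logcosh_loss` is made up here and this is not the Keras implementation. It shows why the loss is less sensitive to outliers than squared error: it behaves roughly like x**2 / 2 for small errors and like |x| - log(2) for large ones.

```python
import math

def logcosh_loss(y_true, y_pred):
    """Mean of log(cosh(error)) over two equal-length sequences (illustrative helper)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        x = abs(p - t)
        # Stable form of log(cosh(x)): x + log(1 + exp(-2x)) - log(2)
        total += x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)
    return total / len(y_true)

small = logcosh_loss([0.0], [0.1])    # close to 0.1**2 / 2
large = logcosh_loss([0.0], [50.0])   # close to 50 - log(2)
```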
Related
For a custom Keras loss function, I need to create a float tensor from a bool tensor. Unfortunately, K.cast() is not differentiable and therefore can't be used. Is there an alternative way to do this that is differentiable?
less_than_tau = y_pred < tau
less_than_tau = K.cast(less_than_tau, 'float32')
Dr. Snoopy is right.
The way you solve for this in deep learning is "soft" functions, such as softmax instead of max.
In your case, if you want to switch on y_pred relative to tau, you'd do something like
switch = K.sigmoid(tau - y_pred)  # soft version of (y_pred < tau): near 1 when y_pred is well below tau
loss = switch * true_case + (1. - switch) * false_case
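To make the "soft" idea concrete for a thresholded count, here is a minimal plain-Python sketch, not a tested Keras loss. The hard indicator `diff <= threshold` is replaced by `sigmoid((threshold - diff) / temperature)`, which has a nonzero slope everywhere. The function names and the `temperature` parameter are assumptions introduced for illustration; in a Keras loss the same expression could be built from `K.sigmoid` and `K.mean`.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def hard_fraction_within(diffs, threshold=0.1):
    """Exact fraction of errors <= threshold (piecewise constant, so no useful gradient)."""
    return sum(1.0 for d in diffs if d <= threshold) / len(diffs)

def soft_fraction_within(diffs, threshold=0.1, temperature=0.01):
    """Smooth surrogate: each element contributes sigmoid((threshold - d) / temperature)."""
    return sum(sigmoid((threshold - d) / temperature) for d in diffs) / len(diffs)

diffs = [0.01, 0.05, 0.30, 2.00]    # hypothetical relative errors
hard = hard_fraction_within(diffs)  # two of four errors are within 10%
soft = soft_fraction_within(diffs)  # close to the hard fraction, but smooth in diffs
```

A smaller `temperature` tracks the hard count more closely, but the slope then vanishes for errors far from the threshold, so the value is a trade-off to tune.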
I am creating a custom loss function with Keras and TF as the backend. The loss function computes the proportion of elements with errors that are greater than some threshold.
def custom_loss(y_true, y_pred, thresh=1e-6):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    div_result = Lambda(
        lambda x: x[0] / x[1])(
        [y_true_f - y_pred_f, y_true_f])
    greater = K.greater(div_result, thresh)
    return K.sum(K.cast(greater, 'float32'))
Not surprisingly, I run into the error "An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval." I think this is because the K.greater operation is not differentiable.
Is there any way around this?
Thanks.
I'm trying to segment data where the label can be quite sparse. Therefore I want to only calculate gradients in columns that have at least one nonzero value.
I've tried some methods where I apply an extra input which is the mask of these nonzero columns, but given that all the necessary information already is contained in y_true, a method which only looks at y_true to find the mask would definitely be preferable.
If I would implement it with numpy, it would probably look something like this:
def loss(y_true, y_pred):
    indices = np.where(np.sum(y_true, axis=1) > 0)
    return binary_crossentropy(y_true[indices], y_pred[indices])
y_true and y_pred are in this example vectorized 2D images.
How could this be "translated" to a differentiable Keras loss function?
Use tf-compatible operations, via tf and keras.backend:
import tensorflow as tf
import keras.backend as K
from keras.losses import binary_crossentropy

def custom_loss(y_true, y_pred):
    # tf.where returns shape (n, 1); K.squeeze needs an explicit axis to drop
    indices = K.squeeze(tf.where(K.sum(y_true, axis=1) > 0), axis=-1)
    y_true_sparse = K.cast(K.gather(y_true, indices), dtype='float32')
    y_pred_sparse = K.cast(K.gather(y_pred, indices), dtype='float32')
    return binary_crossentropy(y_true_sparse, y_pred_sparse)  # returns a tensor
I'm unsure about the exact dimensionality specs of your problem, but loss must evaluate to a single value - which above doesn't, since you're passing multi-dimensional predictions and labels. To reduce dims, wrap the return above with e.g. K.mean. Example:
import numpy as np

y_true = np.random.randint(0, 2, (10, 2))
y_pred = np.abs(np.random.randn(10, 2))
y_pred /= np.max(y_pred)  # scale between 0 and 1
print(K.get_value(custom_loss(y_true, y_pred)))  # get_value evaluates returned tensor
print(K.get_value(K.mean(custom_loss(y_true, y_pred))))
>> [1.1489482 1.2705883 0.76229745 5.101402 3.1309896] # sparse; 5 / 10 results
>> 2.28284 # single value, as required
(Lastly, note that this sparsity will bias the loss by excluding all-zero columns from the total label/pred count; if undesired, you can average via K.sum and K.shape or K.size)
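The selection and the final averaging can be checked on small inputs without a backend. The sketch below is plain Python with hypothetical helper names and mirrors the logic only: keep the rows whose label sum is positive, compute the mean binary cross-entropy of each kept row, then reduce to a single number.

```python
import math

def row_bce(y_true_row, y_pred_row, eps=1e-7):
    """Mean binary cross-entropy over one row; predictions are clipped to [eps, 1 - eps]."""
    total = 0.0
    for t, p in zip(y_true_row, y_pred_row):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true_row)

def masked_bce(y_true, y_pred):
    """Average row_bce over rows that contain at least one nonzero label."""
    kept = [i for i, row in enumerate(y_true) if sum(row) > 0]
    if not kept:  # no labelled rows: this sketch defines the loss as 0
        return 0.0
    return sum(row_bce(y_true[i], y_pred[i]) for i in kept) / len(kept)

y_true = [[0, 0], [1, 0], [0, 1]]
y_pred = [[0.9, 0.9], [0.5, 0.5], [0.5, 0.5]]
loss_value = masked_bce(y_true, y_pred)  # the all-zero first row is ignored
```

Dividing by `len(y_true)` instead of `len(kept)` would keep the excluded rows in the denominator, which is the alternative averaging mentioned above.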
I would like to implement a custom loss function shown in this paper with Keras.
My loss is not going down and I have the feeling that it is because of the implementation of the loss: It doesn't use Keras' backend for everything but rather a combination of some K functions, simple operations and numpy:
def l1_matrix_norm(M):
    return K.cast(K.max(K.sum(K.abs(M), axis=0)), 'float32')

def reconstruction_loss(patch_size, mask, center_weight=0.9):
    mask = mask.reshape(1, *mask.shape).astype('float32')
    mask_inv = 1 - mask

    def loss(y_true, y_pred):
        diff = y_true - y_pred
        center_part = mask * diff
        center_part_normed = l1_matrix_norm(center_part)
        surr_part = mask_inv * diff
        surr_part_normed = l1_matrix_norm(surr_part)
        num_pixels = np.prod(patch_size).astype('float32')
        numerator = center_weight * center_part_normed + (1 - center_weight) * surr_part_normed
        return numerator / num_pixels

    return loss
Is it necessary to use Keras functions? If so, for which types of operations do I need them? (I saw some code where simple operations such as addition don't use K.)
Also, if I have to use a Keras backend function, can I use the corresponding TensorFlow function instead?
NN training depends on being able to compute the derivatives of all functions in the graph, including the loss function. Keras backend functions and TensorFlow functions are annotated so that TensorFlow (or another backend) automatically knows how to compute their gradients. That is not the case for NumPy functions. It is possible to use non-TensorFlow functions if you know how to compute their gradients manually (see tf.custom_gradient). In general, I would recommend sticking with backend functions where possible, and using TensorFlow functions when necessary.
I want to create a custom metric for pearson correlation as defined here
I'm not sure how exactly to apply it to batches of y_pred and y_true
What I did:
def pearson_correlation_f(y_true, y_pred):
    y_true, _ = tf.split(y_true[:, 1:], 2, axis=1)
    y_pred, _ = tf.split(y_pred[:, 1:], 2, axis=1)
    fsp = y_pred - K.mean(y_pred, axis=-1, keepdims=True)
    fst = y_true - K.mean(y_true, axis=-1, keepdims=True)
    corr = K.mean(K.sum(fsp * fst, axis=-1)) / K.mean(
        K.sqrt(K.sum(K.square(y_pred - K.mean(y_pred, axis=-1, keepdims=True)), axis=-1) *
               K.sum(K.square(y_true - K.mean(y_true, axis=-1, keepdims=True)), axis=-1)))
    return corr
Is it necessary for me to use keepdims, handle the batch dimension manually, and then take the mean over it? Or does Keras somehow do this automatically?
When you use K.mean without an axis, Keras automatically calculates the mean for the entire batch.
And the backend already has standard deviation functions, so it might be cleaner (and perhaps faster) to use them.
If your true data is shaped like (BatchSize, 1), I'd say keepdims is unnecessary. Otherwise I'm not sure, and it would be good to test the results.
(I don't understand why you use split, but it seems also unnecessary).
So, I'd try something like this:
fsp = y_pred - K.mean(y_pred)  # K.mean is a scalar here, so it is subtracted from every element of y_pred
fst = y_true - K.mean(y_true)
devP = K.std(y_pred)
devT = K.std(y_true)
return K.mean(fsp*fst)/(devP*devT)
If it's relevant to have the loss for each feature instead of putting them all in the same group:
# original shapes: (batch, 10)
fsp = y_pred - K.mean(y_pred, axis=0)  # mean over the batch, keeping the features separate
fst = y_true - K.mean(y_true, axis=0)
# mean shape: (10,), broadcast against (batch, 10)
# fsp and fst keep shape (batch, 10)
devP = K.std(y_pred, axis=0)
devT = K.std(y_true, axis=0)
# dev shape: (10,)
return K.sum(K.mean(fsp * fst, axis=0) / (devP * devT))
# every tensor in the expression has shape (10,)
# the sum is only necessary because we need a single loss value
Summing the results of the ten features or taking their mean is equivalent up to a factor of 10. (This matters little for Keras models: it only rescales the effective learning rate, and many optimizers quickly adapt to that.)
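As a sanity check for the formula above, the same computation can be written in plain Python for a single feature. This is a sketch with a made-up function name; it divides by N (population standard deviation), which is the convention assumed here for K.std.

```python
import math

def pearson(y_true, y_pred):
    """Pearson correlation: mean of centered products over the product of population stds."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    fst = [t - mean_t for t in y_true]   # centered targets
    fsp = [p - mean_p for p in y_pred]   # centered predictions
    cov = sum(a * b for a, b in zip(fsp, fst)) / n
    dev_t = math.sqrt(sum(a * a for a in fst) / n)
    dev_p = math.sqrt(sum(b * b for b in fsp) / n)
    return cov / (dev_p * dev_t)

perfect = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])  # linear, same direction
inverse = pearson([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])  # linear, opposite direction
```

A constant input makes a standard deviation zero and the division fail, so a real metric would likely add a small epsilon to the denominator.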