I am creating a custom loss function with Keras and TF as the backend. The loss function computes the proportion of elements with errors that are greater than some threshold.
def custom_loss(y_true, y_pred, thresh=1e-6):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    div_result = Lambda(lambda x: x[0] / x[1])([y_true_f - y_pred_f, y_true_f])
    greater = K.greater(div_result, thresh)
    return K.sum(K.cast(greater, 'float32'))
Not surprisingly, I run into the error "An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval." I think this is because the K.greater operation is not differentiable.
Is there any way around this?
Thanks.
I am using a Keras neural network inside a system of ODEs. Here is my model:
model = Sequential()
model.add(Dense(10, input_dim=3, activation='relu'))
model.add(Dense(1))
And here is a function that describes my differential equations. That Keras model is used in the calculation of ODEs.
def dxdt_new(t, x, *args):
    N, beta, gamma, delta = args
    deltaInfected = beta * x[0] * x[1] / N
    quarantine = model.predict(np.expand_dims(x[:3], axis=0)) / N
    recoveredQ = delta * x[3]
    recoveredNoQ = gamma * x[1]
    S = -deltaInfected
    I = deltaInfected - recoveredNoQ - quarantine
    R = recoveredNoQ + recoveredQ
    Q = quarantine - recoveredQ
    return [S, I, R, Q]
And I need to use a custom loss function for training. Inside my loss function, I cannot use the values predicted by the neural network, since I do not have real data for them. Instead, I am trying to use values that are affected by the predicted value, so I do not use y_true and y_pred.
def my_loss(y_true, y_pred):
    infected = K.constant(INFECTED)
    recovered = K.constant(RECOVERED)
    dead = K.constant(DEAD)
    pred = K.constant(predicted)
    loss = K.sum((K.log(infected) - K.log(pred[1][:] + pred[3][:]))**2)
    loss += K.sum((K.log(recovered + dead) - K.log(pred[2][:]))**2)
    return loss
But when I try to train my neural network, I get the following error:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
So it seems like this loss function does not work properly. How can I organize my code to get it to work? Is there any other way to construct a loss function?
"I cannot use the values predicted by a neural network since I do not have real data on it"
For a customized loss function to work with the backpropagation algorithm, it needs to be defined in terms of y_true and y_pred. When you do not have this data, or when your loss function is non-differentiable, you have to use another algorithm to optimize the weights of your neural network. Examples include a Genetic Algorithm or Particle Swarm Optimization.
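As a rough illustration of that gradient-free route, here is a minimal random-search sketch in plain NumPy (not the asker's setup: the tiny fixed architecture, the toy squared-error objective, and the mutation scale sigma are all assumptions made for demonstration). The key point is that the loss is treated as a black box whose value alone drives the search, so differentiability never comes into play:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(w, x):
    # Tiny fixed-architecture net (3 -> 10 -> 1), weights packed in one flat vector.
    W1 = w[:30].reshape(3, 10)
    b1 = w[30:40]
    W2 = w[40:50].reshape(10, 1)
    b2 = w[50]
    h = np.maximum(x @ W1 + b1, 0.0)          # ReLU hidden layer
    return h @ W2 + b2

def black_box_loss(w, x, target):
    # Stand-in for any non-differentiable objective: only its value is used.
    return float(np.mean((forward(w, x) - target) ** 2))

x = rng.normal(size=(32, 3))
target = rng.normal(size=(32, 1))

w = rng.normal(scale=0.1, size=51)
init_loss = black_box_loss(w, x, target)
best = init_loss
sigma = 0.05                                   # mutation scale (assumed)
for _ in range(500):
    cand = w + rng.normal(scale=sigma, size=w.shape)
    loss = black_box_loss(cand, x, target)
    if loss < best:                            # keep only improving candidates
        w, best = cand, loss
```

Dedicated GA/PSO libraries (e.g. DEAP or pyswarm) follow the same pattern at scale: evaluate the loss as a black box over a population of candidate weight vectors and never call backpropagation.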
So I am relatively new to the ML/AI game in Python, and I'm currently working on a problem surrounding the implementation of a custom objective function for XGBoost.
My differential calculus is pretty rusty, so I've created a custom objective function with a gradient and Hessian that models the mean squared error function run as the default objective in XGBRegressor, to make sure that I am doing all of this correctly. The problem is, the model's error outputs are close but not identical for the most part (and way off for some points). I don't know what I'm doing wrong, or how that could be possible if I am computing things correctly. If you could look at this and maybe provide insight into where I am wrong, that would be awesome!
The original code without a custom function is:
import xgboost as xgb

reg = xgb.XGBRegressor(n_estimators=150,
                       max_depth=2,
                       objective="reg:squarederror",
                       n_jobs=-1)
reg.fit(X_train, y_train)
y_pred_test = reg.predict(X_test)
and my custom objective function for MSE is as follows:
def gradient_se(y_true, y_pred):
    # Compute the gradient of the squared error.
    return (-2 * y_true) + (2 * y_pred)

def hessian_se(y_true, y_pred):
    # Compute the Hessian for the squared error.
    return 0 * (y_true + y_pred) + 2

def custom_se(y_true, y_pred):
    # Squared error objective. A simplified version of MSE used as
    # the objective function.
    grad = gradient_se(y_true, y_pred)
    hess = hessian_se(y_true, y_pred)
    return grad, hess
The documentation reference is here.
Thanks!
According to the documentation, the library passes the predicted values (y_pred in your case) and the ground truth values (y_true in your case), in this order.
You pass the y_true and y_pred values in reversed order in your custom_se(y_true, y_pred) function to both the gradient_se and hessian_se functions. For the Hessian it doesn't make a difference, since the Hessian should return 2 for all x values, and you've done that correctly.
For the gradient_se function, this swap means the gradient comes out with the wrong sign for y_true and y_pred.
The correct implementation is as follows:
def gradient_se(y_pred, y_true):
    # Compute the gradient of the squared error.
    return 2 * (y_pred - y_true)

def hessian_se(y_pred, y_true):
    # Compute the Hessian for the squared error.
    return 0 * y_true + 2

def custom_se(y_pred, y_true):
    # Squared error objective. A simplified version of MSE used as
    # the objective function.
    grad = gradient_se(y_pred, y_true)
    hess = hessian_se(y_pred, y_true)
    return grad, hess
Update: Please keep in mind that the native XGBoost implementation and the implementation of the sklearn wrapper for XGBoost use a different ordering of the arguments. The native implementation takes predictions first and true labels (dtrain) second, while the sklearn implementation takes the true labels (dtrain) first and the predictions second.
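As a quick sanity check on the corrected implementation (a small NumPy sketch, not part of the original answer; the toy arrays are made up for illustration), the analytic gradient can be compared against a central finite difference of the per-element squared error, and the Hessian against its known constant value:

```python
import numpy as np

def gradient_se(y_pred, y_true):
    # Analytic gradient of (y_pred - y_true) ** 2 with respect to y_pred.
    return 2 * (y_pred - y_true)

def hessian_se(y_pred, y_true):
    # The second derivative of the squared error is the constant 2.
    return 0 * y_true + 2

y_true = np.array([1.0, -0.5, 3.0])   # toy ground truth (assumed)
y_pred = np.array([0.8, 0.0, 2.5])    # toy predictions (assumed)
eps = 1e-6

# Central finite difference of the per-element squared error.
num_grad = ((y_pred + eps - y_true) ** 2 - (y_pred - eps - y_true) ** 2) / (2 * eps)
```

The two gradients agree to within the finite-difference tolerance, which is exactly the check the asker was after before plugging custom_se into XGBRegressor.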
I would like to implement a custom loss function shown in this paper with Keras.
My loss is not going down, and I have the feeling that it is because of the implementation of the loss: it doesn't use the Keras backend for everything, but rather a combination of some K functions, simple operations, and NumPy:
def l1_matrix_norm(M):
    return K.cast(K.max(K.sum(K.abs(M), axis=0)), 'float32')

def reconstruction_loss(patch_size, mask, center_weight=0.9):
    mask = mask.reshape(1, *mask.shape).astype('float32')
    mask_inv = 1 - mask

    def loss(y_true, y_pred):
        diff = y_true - y_pred

        center_part = mask * diff
        center_part_normed = l1_matrix_norm(center_part)

        surr_part = mask_inv * diff
        surr_part_normed = l1_matrix_norm(surr_part)

        num_pixels = np.prod(patch_size).astype('float32')

        numerator = center_weight * center_part_normed + (1 - center_weight) * surr_part_normed
        return numerator / num_pixels

    return loss
Is it necessary to use Keras functions, and if so, for which types of operations do I need them? (I saw some code where simple operations such as addition don't use K.)
Also, if I do have to use a Keras backend function, can I use TensorFlow's functions instead?
NN training depends on being able to compute the derivatives of all functions in the graph, including the loss function. Keras backend functions and TensorFlow functions are annotated such that TensorFlow (or another backend) automatically knows how to compute their gradients. That is not the case for NumPy functions. It is possible to use non-TF functions if you know how to compute their gradients manually (see tf.custom_gradient). In general, I would recommend sticking with backend functions where possible, and TensorFlow functions when necessary.
I'm trying to maximize the number of predictions that are close to the true value, even if this results in crazy outliers that may otherwise skew a median (for which I already have a working loss) or a mean.
So, I try this custom loss function:
def lossMetricPercentGreaterThanTenPercentError(y_true, y_pred):
    """
    CURRENTLY DOESN'T WORK AS LOSS: NOT DIFFERENTIABLE
    ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
    See https://keras.io/losses/
    """
    from keras import backend as K
    import tensorflow as tf

    diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true), K.epsilon(), None))
    withinTenPct = tf.reduce_sum(tf.cast(K.less_equal(diff, 0.1), tf.int32), axis=-1) / tf.size(diff, out_type=tf.int32)
    return 100 * (1 - tf.cast(withinTenPct, tf.float32))
I understand that at least the less_equal function isn't differentiable (I'm not sure if it's also throwing a fit over tf.size); is there some tensor operation that can approximate "less than or equal to"?
I'm on Tensorflow 1.12.3 and cannot upgrade, so even if tf.numpy_function(lambda x: np.sum(x <= 0.1) / len(x), diff, tf.float32) would work as a wrapper I can't use tf.numpy_function.
From the error message it looks like some gradient operation has not been implemented in Keras.
You could try to use Tensorflow operations to achieve the same result (Untested!):
diff = tf.abs(y_true - y_pred) / tf.clip_by_value(tf.abs(y_true), 1e-12, 1e12)
withinTenPct = tf.reduce_mean(tf.cast(tf.less_equal(diff, 0.1), tf.float32))
return 100.0 * (1.0 - withinTenPct)
Alternatively, you can try tf.keras.losses.logcosh(y_true, y_pred), as it seems to fit your use case. See the TF docs.
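If a differentiable stand-in for the hard comparison is acceptable, one common trick (a sketch, not part of the original answer; the steepness k is an assumed tuning knob) is to replace the step function `diff <= 0.1` with a steep sigmoid, through which gradients can flow. Shown here in plain NumPy; the same expression works with K.exp or tf.exp inside a Keras loss:

```python
import numpy as np

def soft_less_equal(x, threshold, k=50.0):
    # Smooth approximation of (x <= threshold): approaches 1 well below the
    # threshold and 0 well above it; larger k gives a sharper transition.
    return 1.0 / (1.0 + np.exp(k * (x - threshold)))

diff = np.array([0.01, 0.09, 0.11, 0.50])   # toy relative errors (assumed)
soft = soft_less_equal(diff, 0.1)
hard = (diff <= 0.1).astype(float)
# soft tracks hard away from the threshold but is differentiable everywhere,
# so its mean gives a trainable "fraction within 10% error" surrogate.
```

The trade-off is that values near the threshold get partial credit; increasing k narrows that band at the cost of steeper (potentially unstable) gradients.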
I am implementing my own code using Keras to do semantic segmentation. My test images have shape (10, 512, 512, 5), where 10 is the number of images, 512 is their size, and 5 is the number of classes I want to segment. As the last activation function I use softmax, and as the loss I want to use the dice loss (https://arxiv.org/abs/1606.04797) in order to improve the segmentation results. My code is:
eps = 1e-3

def dice(y_true, y_pred):
    y_pred = K.one_hot(K.argmax(y_pred, axis=-1), Nclasses)
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    num = 2 * K.sum(y_true_f * y_pred_f)
    den = K.sum(K.square(y_true_f)) + K.sum(K.square(y_pred_f)) + eps
    return num / den

def dice_loss(y_true, y_pred):
    return 1 - dice(y_true, y_pred)
I use K.one_hot(K.argmax(...)) because in this way my y_pred is binary and not made of probabilities (right?).
Anyway, when the training process starts, I receive this error:
"ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval."
Try using this code snippet for your dice coefficient.
Important observation: if your masks are one-hot-encoded, this code should also work for multi-class segmentation.
smooth = 1.

def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)
This post seems to indicate that since argmax does not have a gradient in Keras, you will not be able to use it in your custom loss function.
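To illustrate why dropping the argmax works (a plain-NumPy sketch of the soft dice idea, with made-up toy tensors; the smoothing constant mirrors the snippet above), the coefficient is computed on the raw softmax probabilities, so every operation stays smooth:

```python
import numpy as np

smooth = 1.0

def dice_coef(y_true, y_pred):
    # Soft dice: uses the softmax probabilities directly, so the whole
    # expression is differentiable (no argmax / one_hot rounding step).
    y_true_f = y_true.ravel()
    y_pred_f = y_pred.ravel()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

# Toy example: 2 pixels, 3 classes; one-hot ground truth vs. softmax output.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1]])

score = dice_coef(y_true, y_pred)   # high for a confident correct prediction
loss = -score                        # minimized during training
```

At evaluation time you can still binarize predictions with argmax for a hard dice metric; the restriction only applies to the loss that gradients flow through.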