Data from tensors in custom loss function from keras - python

Given a custom loss function in keras:
def my_custom_loss_func(self, y_true, y_pred):
    # some code
Is it possible to get the values in y_pred to do some calculations for the loss function? I tried, but someone told me that y_pred is just a placeholder and that its actual values cannot be extracted: you can process y_pred with Keras backend functions, but you cannot read individual values from it, such as y_pred[1].
What I want to do is something like: if "the top 10 values in y_pred are negative and the bottom 10 are negative", then "return a very high cost because I do not want the network to be optimized this way".
Yes, no worries. Here is some sample code.
def my_custom_loss_func(self, y_true, y_pred):
    position_vector_initial = x[0, 0:3]  # global variable
    position_vector_now = y_pred[0, 0:3]
    angle = angle_between(position_vector_initial, position_vector_now)
    if (angle < 0):
        return high_loss
    else:
        return kb.mean(kb.sum(kb.square(y_true - y_pred)))
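Since y_pred is symbolic, the branch has to be expressed with backend operations so it stays inside the graph rather than being a Python if on concrete values. Below is a minimal sketch of that pattern using K.switch; initial_position, HIGH_LOSS, and the cosine-based test are hypothetical stand-ins for the angle_between logic above, not the asker's actual code.

import keras.backend as K

HIGH_LOSS = 1e6                                  # assumed penalty value
initial_position = K.constant([1.0, 0.0, 0.0])   # stand-in for the global x[0, 0:3]

def my_custom_loss_func(y_true, y_pred):
    # Cosine of the angle between the first three predicted components and the
    # initial position vector, built entirely from backend ops.
    v = K.l2_normalize(y_pred[:, 0:3], axis=-1)
    u = K.l2_normalize(initial_position, axis=-1)
    cos_angle = K.sum(u * v, axis=-1)            # shape (batch,)
    mse = K.mean(K.sum(K.square(y_true - y_pred), axis=-1))
    # K.switch picks one of two tensors based on a boolean tensor, so the
    # conditional is part of the graph instead of a Python if on real values.
    return K.switch(K.any(cos_angle < 0), K.constant(HIGH_LOSS), mse)

Note that a hard switch like this gives no gradient information about the angle itself; a smooth penalty term added to the MSE usually trains better.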

Related

Tensorflow / Keras custom loss function

I am currently trying to write my own loss function in Keras, which checks whether the values of my prediction exist in the labels, in any order.
Here is a code example written in Python:
def my_loss(y_true, y_pred):
    n_values = 5
    loss = 0
    for i in range(n_values):
        if y_pred[i] not in y_true:
            loss += 1
    return loss
I have no idea how to write this with keras.backend. I cannot even find the documentation for functions like backend.sum() or backend.flatten().
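The thread has no answer for this one, but the counting itself can be translated to tensor operations. The sketch below assumes y_true and y_pred are 1-D per-sample tensors, as in the Python loop above, and that exact equality is a meaningful test; note that an exact-match count is piecewise constant, so its gradient is zero almost everywhere and it cannot drive training on its own.

import tensorflow as tf

def my_loss(y_true, y_pred):
    n_values = 5
    # matches[i, j] is True where the i-th prediction equals the j-th label.
    matches = tf.equal(tf.expand_dims(y_pred[:n_values], axis=1),
                       tf.expand_dims(y_true, axis=0))
    # A prediction is "missing" if it matches no label at all.
    missing = tf.logical_not(tf.reduce_any(matches, axis=1))
    # Count the missing predictions, mirroring loss += 1 in the loop.
    return tf.reduce_sum(tf.cast(missing, tf.float32))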

How to customize an LSTM loss function to only consider a given index range of prediction and target sequence?

I am currently working with an LSTM sequence-to-sequence model for time-domain signal prediction. Because of domain knowledge, I know that the first part of the prediction (about 20%) can never be predicted correctly, since the required information is not available in the given input sequence. The remaining 80% of the predicted sequence is usually predicted quite well. In order to exclude the first 20% from the training optimization, it would be nice to define a loss function that operates on a given index range, like the numpy code below:
start = int(0.2 * sequence_length)
stop = sequence_length

def mse(pred, target):
    """Mean squared error between two time series np.arrays."""
    return 1 / target.shape[0] * np.sum((pred - target)**2)

def range_mse_loss(y_pred, y):
    return mse(y_pred[start:stop], y[start:stop])
How do I have to write this loss function so that it works with my preexisting Keras code, where the loss is simply given by model.compile(loss='mse')?
You can slice your tensors to keep just the last 80% of the data.
size = int(y_true.shape[0] * 0.8)  # for a 2D vector, e.g., (100, 1)
loss_fn = tf.keras.losses.MeanSquaredError(name='mse')
loss_fn(y_true[-size:], y_pred[-size:])  # keep only the last 80% of the rows
You can also use the sample_weight argument of tf.keras.losses.MeanSquaredError(), passing an array of weights in which the first 20% are zero.
size = int(y_true.shape[0] * 0.8)  # for a 2D vector, e.g., (100, 1)
zeros = tf.zeros((y_true.shape[0] - size), dtype=tf.float32)
ones = tf.ones((size), dtype=tf.float32)
weights = tf.concat([zeros, ones], 0)
loss_fn = tf.keras.losses.MeanSquaredError(name='mse')
loss_fn(y_true, y_pred, sample_weight=weights)
One caveat with the second solution: the final loss will be lower than with the first one, because the zero-weighted predictions contribute nothing to the sum but are not removed from n in the formula MSE = 1/n * sum((y - y_hat)^2).
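If you would rather keep the model.compile(loss=...) workflow from the question, the same idea can be written as a custom loss that slices the time axis directly. This is only a sketch, assuming y_true and y_pred have shape (batch, sequence_length, features) and that the first 20% of the timesteps should be ignored:

import tensorflow as tf

def range_mse_loss(y_true, y_pred):
    # Drop the first 20% of the timesteps (axis 1) from both tensors,
    # then compute a plain MSE over what remains.
    seq_len = tf.shape(y_true)[1]
    start = tf.cast(tf.cast(seq_len, tf.float32) * 0.2, tf.int32)
    return tf.reduce_mean(tf.square(y_pred[:, start:, :] - y_true[:, start:, :]))

# model.compile(optimizer='adam', loss=range_mse_loss)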
One workaround would be to mark those observations as None/NaN and then override the train_step method. Following TensorFlow's tutorial on customizing train_step, you would do something like this:
@tf.function
def train_step(keras_model, data):
    print('custom train_step')
    # Unpack the data. Its structure depends on your model and
    # on what you pass to `fit()`.
    x, y = data
    with tf.GradientTape() as tape:
        y_pred = keras_model(x, training=True)  # Forward pass
        # Mask NaN values in the observations, also assuming that targets are > 0.0
        mask = tf.greater(y, 0.0)
        true_y = tf.boolean_mask(y, mask)
        pred_y = tf.boolean_mask(y_pred, mask)
        # Compute the loss value
        # (the loss function is configured in `compile()`)
        loss = keras_model.compiled_loss(true_y, pred_y, regularization_losses=keras_model.losses)
    # Compute gradients
    trainable_vars = keras_model.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)
    # Update weights
    keras_model.optimizer.apply_gradients(zip(gradients, trainable_vars))
    # Update metrics (includes the metric that tracks the loss)
    keras_model.compiled_metrics.update_state(true_y, pred_y)
    # Return a dict mapping metric names to current value
    return {m.name: m.result() for m in keras_model.metrics}
This will work for all the performance metrics you are tracking. An alternative would be to mask the NaNs inside the loss function, but that would be limited to that one loss function and would not cover other loss functions or performance metrics.
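For completeness, one way to make fit() actually pick up this function is to subclass keras.Model and delegate to it from the overridden method; a minimal sketch, assuming the train_step function above is in scope and the model is built with hypothetical functional-API tensors:

import tensorflow as tf

class MaskedLossModel(tf.keras.Model):
    # Keras calls this once per batch during fit(); we simply delegate
    # to the standalone train_step defined above, passing the model itself.
    def train_step(self, data):
        return train_step(self, data)

# model = MaskedLossModel(inputs, outputs)   # hypothetical inputs/outputs tensors
# model.compile(optimizer='adam', loss='mse')
# model.fit(x_train, y_train, epochs=10)     # hypothetical training data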

converting Sklearn based custom metric function -> to use as Keras' metric for callbacks

I have already been given custom metric code on which my model is going to be evaluated, but it uses sklearn's metrics. I know that if I have a metric I can use it in callbacks like:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy', custom_metric])

ModelCheckpoint(monitor='val_custom_metric',
                save_best_only=True,
                save_weights_only=True,
                mode='max',
                verbose=1)
It is a multi-output problem with 3 labels,
Submissions are evaluated using a hierarchical macro-averaged recall. First, a standard macro-averaged recall is calculated for each component (label_1,label_2 or label_3). The final score is the weighted average of those three scores, with the label_1 given double weight. You can replicate the metric with the following python snippet:
and I am unable to work out how to implement the code given below in Keras:
import numpy as np
import sklearn.metrics

scores = []
for component in ['label_1', 'label_2', 'label_3']:
    y_true_subset = solution[solution[component] == component]['target'].values
    y_pred_subset = submission[submission[component] == component]['target'].values
    scores.append(sklearn.metrics.recall_score(
        y_true_subset, y_pred_subset, average='macro'))
final_score = np.average(scores, weights=[2,1,1])
How can I convert it into a form I can use as a metric? Or, more precisely, how can I implement this with keras.backend?
You can only implement the metric itself; the rest of the snippet (the dataframe filtering) is specific to the submission format and will not be part of the Keras code.
threshold = 0.5  # you can tune this threshold for better results

# considering y_true is made of 0 and 1 only
# considering output shape is (batch, 3)

def custom_metric(y_true, y_pred):
    weights = K.constant([2,1,1])                               # shape (3,)
    y_pred = K.cast(K.greater(y_pred, threshold), K.floatx())   # shape (batch, 3)
    true_positives = K.sum(y_pred * y_true, axis=0)             # shape (3,)
    false_negatives = K.sum((1 - y_pred) * y_true, axis=0)      # shape (3,)
    # K.epsilon() avoids division by zero when a label has no positives in the batch
    recall = true_positives / (true_positives + false_negatives + K.epsilon())
    return K.mean(recall * weights)
Notice that this will be calculated batchwise, and since the denominator differs depending on the results, the metric computed batchwise will differ from the value you would get by applying it to the entire dataset.
You may need big batch sizes to avoid metric instability, and it might be interesting to apply the metric to the entire data with a callback to get the exact result.
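A sketch of such a callback, assuming validation data (x_val, y_val) with y_val of shape (n, 3) containing 0/1 labels, the same 0.5 threshold, and the sklearn recall from the original snippet applied to the full set. If it is listed before ModelCheckpoint in the callbacks list, the value it writes into logs can be monitored as val_custom_metric:

import numpy as np
import sklearn.metrics
import tensorflow as tf

class FullDataRecall(tf.keras.callbacks.Callback):
    def __init__(self, x_val, y_val, threshold=0.5):
        super().__init__()
        self.x_val, self.y_val, self.threshold = x_val, y_val, threshold

    def on_epoch_end(self, epoch, logs=None):
        y_pred = (self.model.predict(self.x_val) > self.threshold).astype(int)
        # Macro recall per label column, then a weighted average with label_1 doubled.
        scores = [sklearn.metrics.recall_score(self.y_val[:, i], y_pred[:, i], average='macro')
                  for i in range(3)]
        if logs is not None:
            logs['val_custom_metric'] = np.average(scores, weights=[2, 1, 1])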

Creating a Custom Objective Function in for XGBoost.XGBRegressor

So I am relatively new to the ML/AI game in python, and I'm currently working on a problem surrounding the implementation of a custom objective function for XGBoost.
My differential equation knowledge is pretty rusty, so I've created a custom objective function with a gradient and Hessian that models the mean squared error function that is run as the default objective in XGBRegressor, to make sure that I am doing all of this correctly. The problem is, the results of the model (the error outputs) are close but not identical for the most part (and way off for some points). I don't know what I'm doing wrong, or how that could be possible if I am computing things correctly. If you all could look at this and maybe provide insight into where I am wrong, that would be awesome!
The original code without a custom function is:
import xgboost as xgb

reg = xgb.XGBRegressor(n_estimators=150,
                       max_depth=2,
                       objective="reg:squarederror",
                       n_jobs=-1)

reg.fit(X_train, y_train)
y_pred_test = reg.predict(X_test)
and my custom objective function for MSE is as follows:
def gradient_se(y_true, y_pred):
    # Compute the gradient squared error.
    return (-2 * y_true) + (2 * y_pred)

def hessian_se(y_true, y_pred):
    # Compute the hessian for squared error
    return 0 * (y_true + y_pred) + 2

def custom_se(y_true, y_pred):
    # squared error objective. A simplified version of MSE used as
    # objective function.
    grad = gradient_se(y_true, y_pred)
    hess = hessian_se(y_true, y_pred)
    return grad, hess
the documentation reference is here
Thanks!
According to the documentation, the library passes the predicted values (y_pred in your case) and the ground truth values (y_true in your case) in this order.
Because your custom_se(y_true, y_pred) declares the arguments in the opposite order, you end up passing y_true and y_pred swapped to both the gradient_se and hessian_se functions. For the Hessian it doesn't make a difference, since it should return 2 for all values, and you've done that correctly.
For gradient_se, however, the swap flips the roles of y_true and y_pred, so the gradient comes out with the wrong sign.
The correct implementation is as follows:
def gradient_se(y_pred, y_true):
    # Compute the gradient squared error.
    return 2 * (y_pred - y_true)

def hessian_se(y_pred, y_true):
    # Compute the hessian for squared error
    return 0 * y_true + 2

def custom_se(y_pred, y_true):
    # squared error objective. A simplified version of MSE used as
    # objective function.
    grad = gradient_se(y_pred, y_true)
    hess = hessian_se(y_pred, y_true)
    return grad, hess
Update: Please keep in mind that the native XGBoost implementation and the implementation of the sklearn wrapper for XGBoost use a different ordering of the arguments. The native implementation takes predictions first and true labels (dtrain) second, while the sklearn implementation takes the true labels (dtrain) first and the predictions second.
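Following that update, here is a sketch of how this might look with the sklearn wrapper from the question: keep the (y_true, y_pred) ordering and pass the callable as objective. This assumes a recent xgboost version that accepts a callable objective in XGBRegressor; X_train and y_train are the asker's data.

import numpy as np
import xgboost as xgb

def custom_se(y_true, y_pred):
    # sklearn-wrapper ordering: true labels first, predictions second.
    grad = 2 * (y_pred - y_true)                   # gradient of squared error w.r.t. prediction
    hess = np.full_like(y_pred, 2.0, dtype=float)  # constant Hessian
    return grad, hess

reg = xgb.XGBRegressor(n_estimators=150, max_depth=2, objective=custom_se, n_jobs=-1)
# reg.fit(X_train, y_train)
# y_pred_test = reg.predict(X_test)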

Custom binary crossentropy loss in keras that ignores columns with no non-zero values

I'm trying to segment data where the label can be quite sparse. Therefore I want to only calculate gradients in columns that have at least one nonzero value.
I've tried some methods where I apply an extra input which is the mask of these nonzero columns, but given that all the necessary information already is contained in y_true, a method which only looks at y_true to find the mask would definitely be preferable.
If I would implement it with numpy, it would probably look something like this:
def loss(y_true, y_pred):
    indices = np.where(np.sum(y_true, axis=1) > 0)
    return binary_crossentropy(y_true[indices], y_pred[indices])
y_true and y_pred are in this example vectorized 2D images.
How could this be "translated" to a differentiable Keras loss function?
Use tf-compatible operations, via tf and keras.backend:
import numpy as np
import tensorflow as tf
import keras.backend as K
from keras.losses import binary_crossentropy

def custom_loss(y_true, y_pred):
    indices = K.squeeze(tf.where(K.sum(y_true, axis=1) > 0), axis=-1)
    y_true_sparse = K.cast(K.gather(y_true, indices), dtype='float32')
    y_pred_sparse = K.cast(K.gather(y_pred, indices), dtype='float32')
    return binary_crossentropy(y_true_sparse, y_pred_sparse)  # returns a tensor
I'm unsure about the exact dimensionality of your problem, but the loss must ultimately evaluate to a single value - which the above doesn't, since you're passing multi-dimensional predictions and labels. To reduce the dimensions, wrap the return above with e.g. K.mean. Example:
y_true = np.random.randint(0, 2, (10, 2))
y_pred = np.abs(np.random.randn(10, 2))
y_pred /= np.max(y_pred)  # scale between 0 and 1

print(K.get_value(custom_loss(y_true, y_pred)))           # get_value evaluates the returned tensor
print(K.get_value(K.mean(custom_loss(y_true, y_pred))))

>> [1.1489482 1.2705883 0.76229745 5.101402 3.1309896]    # sparse; 5 / 10 results
>> 2.28284                                                 # single value, as required
(Lastly, note that this sparsity will bias the loss by excluding all-zero columns from the total label/pred count; if undesired, you can average via K.sum and K.shape or K.size)
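A sketch of that variant: sum the per-row losses from the kept rows but divide by the total row count, so the excluded all-zero rows still contribute zero loss instead of being dropped from the average. The function name is hypothetical; it reuses the same imports and masking as custom_loss above.

import tensorflow as tf
import keras.backend as K
from keras.losses import binary_crossentropy

def custom_loss_unbiased(y_true, y_pred):
    indices = K.squeeze(tf.where(K.sum(y_true, axis=1) > 0), axis=-1)
    y_true_sparse = K.cast(K.gather(y_true, indices), dtype='float32')
    y_pred_sparse = K.cast(K.gather(y_pred, indices), dtype='float32')
    # Sum over the kept rows, but normalize by the total number of rows,
    # so the masked-out rows count as zero loss in the average.
    total = K.cast(K.shape(y_true)[0], 'float32')
    return K.sum(binary_crossentropy(y_true_sparse, y_pred_sparse)) / total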
