I want to create a loss function where the MSE is only calculated on a subset of the outputs. The subset depends on the input data. I used the answer to this question to figure out how to create a custom function based on the input data:
Custom loss function in Keras based on the input data
However, I'm having trouble getting the custom loss function to work.
Here is what I've put together.
def custom_loss(input_tensor):
    def loss(y_true, y_pred):
        board = input_tensor[:81]
        answer_vector = board == .5
        #assert np.sum(answer_vector) > 0
        return K.mean(K.square(y_pred * answer_vector - y_true), axis=-1)
    return loss
def build_model(input_size, output_size):
    learning_rate = .001
    a = Input(shape=(input_size,))
    b = Dense(60, activation='relu')(a)
    b = Dense(60, activation='relu')(b)
    b = Dense(60, activation='relu')(b)
    b = Dense(output_size, activation='linear')(b)
    model = Model(inputs=a, outputs=b)
    model.compile(loss=custom_loss(a), optimizer=Adam(lr=learning_rate))
    return model
model = build_model(83, 81)
I want the MSE to treat the output as 0 wherever the board is not equal to 0.5. (The true value is one-hot encoded, with the one falling within the subset.) For some reason my output is treated as always zero. In other words, the custom loss function doesn't seem to be finding any places where the board is equal to 0.5.
I can't tell if I'm misinterpreting the dimensions, if the comparisons are failing because they operate on tensors, or if there is simply a much easier way to do what I'm trying to do.
The problem is that answer_vector = board == .5 is not what you think it is. It is not a tensor, but the boolean value False, since board is a tensor and 0.5 is a number:
a = tf.constant([0.5, 0.5])
print(a == 0.5) # False
Now, a * False is a vector of zeros:
with tf.Session() as sess:
    print(sess.run(a * False))  # [0.0, 0.0]
You need to use tf.equal instead of ==. Another possible pitfall is that comparing floats with equality is dangerous, see e.g. What's wrong with using == to compare floats in Java?
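Putting both points together, a minimal sketch of the corrected loss might look like the following. Note two assumptions on my part that go beyond the answer above: the board is taken to be the first 81 features of each input row (i.e. input_tensor[:, :81] rather than input_tensor[:81]), and the exact-equality test is replaced by a small tolerance to sidestep the float-equality pitfall:

import tensorflow as tf
from tensorflow.keras import backend as K

def custom_loss(input_tensor):
    def loss(y_true, y_pred):
        board = input_tensor[:, :81]  # first 81 features of each sample (assumed layout)
        # float mask: 1.0 where the board entry is (approximately) 0.5, else 0.0
        answer_mask = K.cast(tf.abs(board - 0.5) < 1e-6, K.floatx())
        return K.mean(K.square(y_pred * answer_mask - y_true), axis=-1)
    return loss

If the board values really are written as exactly 0.5, tf.equal(board, 0.5) would work the same way; the tolerance form just avoids relying on exact float equality.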
The tf.keras.Model I am training has the following primary performance indicators:
escape rate: (#samples with predicted label 0 AND true label 1) / (#samples with true label 1)
false call rate: (#samples with predicted label 1 AND true label 0) / (#samples with true label 0)
The targeted escape rate is predefined, which means the decision threshold will have to be set appropriately. To calculate the resulting false call rate, I would like to implement a custom metric somewhere along the lines of the following pseudo code:
# separate predicted probabilities by their true label
all_ok_probabilities = all_probabilities.filter(true_label == 0)
all_nok_probabilities = all_probabilities.filter(true_label == 1)
# sort NOK samples
sorted_nok_probabilities = all_nok_probabilities.sort(ascending)
# determine decision threshold
threshold_idx = round(target_escape_rate * num_samples) - 1
threshold = sorted_nok_probabilities[threshold_idx]
# calculate false call rate
false_calls = count(all_ok_probabilities > threshold)
false_call_rate = false_calls / num_ok_samples
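For reference, the same pseudocode written as a plain NumPy function over the full validation set might look like this (function and variable names, and the exact rounding of the threshold index, are my own choices, not from the post):

import numpy as np

def false_call_rate(y_true, y_prob, target_escape_rate):
    # y_true: 0/1 labels, y_prob: predicted probability of label 1
    ok_probs = y_prob[y_true == 0]
    nok_probs = np.sort(y_prob[y_true == 1])
    # pick the threshold so that ~target_escape_rate of NOK samples fall below it
    threshold_idx = int(round(target_escape_rate * nok_probs.shape[0])) - 1
    threshold = nok_probs[max(threshold_idx, 0)]
    # false calls: OK samples whose predicted probability exceeds the threshold
    return (ok_probs > threshold).sum() / ok_probs.shape[0]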
My issue is that, in a MirroredStrategy environment, tf.keras automatically distributes metric calculation across all replicas, each of them getting (batch_size / n_replicas) samples per update, and finally sums the results. My algorithm however only works correctly if ALL labels & predictions are combined (final summing could probably be overcome by dividing by the number of replicas).
My idea is to concatenate all y_true and y_pred in my metric's update_state() method into sequences, and to run the evaluation in result(). The first step already seems impossible, however; tf.Variable only provides suitable aggregation methods for numeric scalars, not for sequences: tf.VariableAggregation.ONLY_FIRST_REPLICA makes me lose all data from the 2nd to the nth replica, SUM silently locks up the fit() call, and MEAN does not make any sense in my application (and might hang just as well).
I already tried to instantiate the metric outside of the MirroredStrategy scope, but tf.keras.Model.compile() does not accept that.
Any hints/ideas?
P.S.: Let me know if you need a minimal code example, I am working on it. :)
Solved it myself by implementing it as a callback instead of a metric. I run fit() without validation_data and instead have all validation-set metrics calculated in the callback. This avoids predicting on the validation set twice.
In order to inject the resulting metric values back into the training procedure, I used the rather hackish approach from Access variables of caller function in Python.
import inspect

import numpy as np
import tensorflow as tf

class ValidationCallback(tf.keras.callbacks.Callback):
    """helper class to calculate validation set metrics after each epoch"""

    def __init__(self, val_data, escape_rate, **kwargs):
        # call parent constructor
        super(ValidationCallback, self).__init__(**kwargs)
        # save parameters
        self.val_data = val_data
        self.escape_rate = escape_rate
        # declare batch_size - we will get that later
        self.batch_size = 0

    def on_epoch_end(self, epoch, logs=None):
        # initialize empty arrays
        y_pred = np.empty((0, 2))
        y_true = np.empty(0)
        # iterate over validation set batches
        for batch in self.val_data:
            # save batch size, if not yet done
            if self.batch_size == 0:
                self.batch_size = batch[1].shape[0]
            # concat all batch labels & predictions
            # need to do predict()[0] due to several model outputs
            y_pred = np.concatenate([y_pred, self.model.predict(batch[0])[0]], axis=0)
            y_true = np.concatenate([y_true, batch[1]], axis=0)
        # calculate classical accuracy for threshold 0.5
        acc = ((y_pred[:, 1] >= 0.5) == y_true).sum() / y_true.shape[0]
        # calculate cross-entropy loss
        cce = tf.keras.losses.SparseCategoricalCrossentropy(reduction=tf.keras.losses.Reduction.SUM)
        loss = cce(y_true, y_pred).numpy() / self.batch_size
        # calculate false call rate
        y_pred_nok = np.sort(y_pred[y_true == 1, 1])
        idx = int(np.round(self.escape_rate * y_pred_nok.shape[0]))
        threshold = y_pred_nok[idx]
        false_calls = y_pred[(y_true == 0) & (y_pred[:, 1] >= threshold), 1].shape[0]
        fcr = false_calls / y_true[y_true == 0].shape[0]
        # add metrics to 'logs' dict of our caller (tf.keras.callbacks.CallbackList.on_epoch_end()),
        # so that they become available to following callbacks
        for f in inspect.stack():
            if 'logs' in f[0].f_locals:
                f[0].f_locals['logs'].update({'val_accuracy': acc,
                                              'val_loss': loss,
                                              'val_false_call_rate': fcr})
                return
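A hypothetical usage, assuming a tf.data validation dataset named val_dataset and a target escape rate of 1% (both placeholders, not from the post):

# run fit() without validation_data; the callback computes the validation metrics
val_cb = ValidationCallback(val_data=val_dataset, escape_rate=0.01)
model.fit(train_dataset, epochs=20, callbacks=[val_cb])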
I want to build a custom accuracy metric with tolerance. Instead of counting elements that are exactly equal in y_true and y_pred, this accuracy regards two elements as consistent if their difference is within a given tolerance value. For example, if the difference between a predicted degree and the true degree is smaller than 5 degrees, we consider the result correct and calculate the accuracy based on this rule. I want to use this metric in model.compile, so it should be a callable function.
I wrote a function as follows.
def accuracy_with_tolerence(y_true, y_pred):
    """
    y_true/y_pred: batch of samples; (BatchSize, 1)
    """
    threshold = 5
    differnece = tf.abs(tf.subtract(y_true, y_pred)) - threshold
    boolean_results = [True if i < 0 else False for i in differnece]
    return K.mean(math_ops.cast(boolean_results, K.floatx()))
It can return the correct accuracy value.
x = tf.constant([1, 2, 3], dtype=tf.float32)
y = tf.constant([5, 8, 10], dtype=tf.float32)
acc = accuracy_with_tolerence(x,y)
print(acc)
tf.Tensor(0.33333334, shape=(), dtype=float32)
But when I want to use it in compile, there is an error:
# Initialize ResNet50
model = resnet50()
model.compile(optimizer='adam',loss='mse',metrics=[accuracy_with_tolerence])
model.load_weights(checkpoint_filepath_0)
model.evaluate(x_test,y_test)
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
It seems I cannot iterate the Tensor. So how can I get element-wise boolean comparison results in the metric function? How can I realize this accuracy function?
Thank you in advance.
You can't iterate over a tensor in a list comprehension. The operation you're looking for is tf.where, and you can use it as follows:
def accuracy_with_tolerence(y_true, y_pred):
    threshold = 5
    differnece = tf.abs(tf.subtract(y_true, y_pred)) - threshold
    boolean_results = tf.where(differnece < 0, True, False)
    return K.mean(math_ops.cast(boolean_results, K.floatx()))
Note that you can simplify the code further:
...
    boolean_results = tf.where(tf.abs(tf.subtract(y_true, y_pred)) - threshold < 0, 1., 0.)
    return K.mean(boolean_results)
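As a quick sanity check, the example from the question gives the same value with this version:

x = tf.constant([1, 2, 3], dtype=tf.float32)
y = tf.constant([5, 8, 10], dtype=tf.float32)
print(accuracy_with_tolerence(x, y))  # tf.Tensor(0.33333334, shape=(), dtype=float32)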
I am new to machine learning, Python and TensorFlow. I am used to coding in C++ or C#, and it is difficult for me to use tf.backend.
I am trying to write a custom loss function for an LSTM network that tries to predict if the next element of a time series will be positive or negative. My code runs nicely with the binary_crossentropy loss function. I now want to improve my network with a loss function that adds the value of the next time series element if the predicted probability is greater than 0.5 and subtracts it if the probability is less than or equal to 0.5.
I tried something like this:
def customLossFunction(y_true, y_pred):
    temp = 0.0
    for i in range(0, len(y_true)):
        if y_pred[i] > 0:
            temp += y_true[i]
        else:
            temp -= y_true[i]
    return temp
Obviously, the dimensions are wrong, but since I cannot step into my function while debugging, it is very hard to get a grasp of the dimensions here.
Can you please tell me if I can use an element-by-element function? If yes, how? And if not, could you help me with tf.backend?
Thanks a lot
Among the Keras backend functions there is greater, which you can use:
import keras.backend as K

def customLossFunction(yTrue, yPred):
    greater = K.greater(yPred, 0.5)
    greater = K.cast(greater, K.floatx())  # has zeros and ones
    multiply = (2 * greater) - 1           # has -1 and 1
    modifiedTrue = multiply * yTrue

    # here, it's important to know which dimension you want to sum
    return K.sum(modifiedTrue, axis=?)
The axis parameter should be used according to what you want to sum.
axis=0 -> batch or sample dimension (number of sequences)
axis=1 -> time steps dimension (if you're using return_sequences = True until the end)
axis=2 -> predictions for each step
Now, if you have only a 2D target:
axis=0 -> batch or sample dimension (number of sequences)
axis=1 -> predictions for each sequence
If you simply want to sum everything for every sequence, then just don't put the axis parameter.
Important note about this function:
Since it contains only values from yTrue, it cannot backpropagate to change the weights. This will lead to a "none values not supported" error or something very similar.
Although yPred (the one that is connected to the model's weights) is used in the function, it is only used to get a true/false condition, which is not differentiable.
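If you do need gradients to flow through yPred, one option (my own addition, not part of the answer above) is to replace the hard condition with a smooth surrogate such as a steep sigmoid; a minimal sketch:

import keras.backend as K

def softCustomLossFunction(yTrue, yPred, sharpness=10.0):
    # smooth stand-in for the hard (yPred > 0.5) test; stays differentiable
    gate = K.sigmoid(sharpness * (yPred - 0.5))  # ~0 below 0.5, ~1 above
    multiply = 2.0 * gate - 1.0                  # roughly -1 or +1
    return K.sum(multiply * yTrue, axis=-1)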
I have an LSTM predicting time series values in tensorflow.
The model is working using an MSE as a loss function.
However, I'd like to be able to create a custom loss function where one of the error values is multiplied by two (therefore producing a higher error value).
In my batch of size 10, I want the 3rd value of the first input to be multiplied by 2, but because this is time series, this corresponds to the second value in the second input and the first value in the third input.
The error I get is:
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients
How do I make the gradients?
def loss_function(y_true, y_pred, peak_value=3, weight=2):
    # peak_value is where the multiplication happens on the first line
    # weight is how much the error is multiplied by
    all_dif = tf.squared_difference(y_true, y_pred)  # should be shape=[10, 10]
    peak = [peak_value] * 10
    listy = range(0, 10)
    c = [(i - j) % 10 for i, j in zip(peak, listy)]
    for i in range(0, 10):
        indices = [[i, c[i]]]
        values = [1.0]
        shape = [10, 10]
        delta = tf.SparseTensor(indices, values, shape)
        all_dif = all_dif + tf.sparse_tensor_to_dense(delta)
    return tf.reduce_sum(all_dif)
I believe the pseudo code would look something like this:
@tf.custom_gradient
def loss_function(y_true, y_pred, peak_value=3, weight=2):
    ## your code
    def grad(dy):
        return dy * partial_derivative
    return loss, grad
Where partial_derivative is the analytically evaluated partial derivative of your loss function. If your loss is a function of more than one variable, it will require a partial derivative with respect to each variable, I believe.
If you need more information, the documentation is good: https://www.tensorflow.org/api_docs/python/tf/custom_gradient
And I've yet to find an example of this functionality embedded in a model that's not a toy.
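For what it's worth, a minimal self-contained sketch of the @tf.custom_gradient pattern (plain MSE here, not the weighted loss from the question) could look like this:

import tensorflow as tf

@tf.custom_gradient
def mse_with_manual_grad(y_true, y_pred):
    diff = y_pred - y_true
    loss = tf.reduce_mean(tf.square(diff))

    def grad(dy):
        n = tf.cast(tf.size(diff), diff.dtype)
        # analytic partial derivatives of the mean squared error
        # w.r.t. y_true and y_pred, scaled by the upstream gradient dy
        return -2.0 * diff / n * dy, 2.0 * diff / n * dy

    return loss, grad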
I want to create a custom metric for Pearson correlation as defined here.
I'm not sure how exactly to apply it to batches of y_pred and y_true
What I did:
def pearson_correlation_f(y_true, y_pred):
    y_true, _ = tf.split(y_true[:, 1:], 2, axis=1)
    y_pred, _ = tf.split(y_pred[:, 1:], 2, axis=1)

    fsp = y_pred - K.mean(y_pred, axis=-1, keepdims=True)
    fst = y_true - K.mean(y_true, axis=-1, keepdims=True)

    corr = K.mean(K.sum(fsp * fst, axis=-1)) / K.mean(
        K.sqrt(K.sum(K.square(y_pred - K.mean(y_pred, axis=-1, keepdims=True)), axis=-1) *
               K.sum(K.square(y_true - K.mean(y_true, axis=-1, keepdims=True)), axis=-1)))

    return corr
Is it necessary for me to use keepdims, handle the batch dimension manually, and then take the mean over it? Or does Keras somehow do this automatically?
When you use K.mean without an axis, Keras automatically calculates the mean for the entire batch.
And the backend already has standard deviation functions, so it might be cleaner (and perhaps faster) to use them.
If your true data is shaped like (BatchSize,1), I'd say keep_dims is unnecessary. Otherwise I'm not sure and it would be good to test the results.
(I don't understand why you use split, but it seems also unnecessary).
So, I'd try something like this:
fsp = y_pred - K.mean(y_pred) #being K.mean a scalar here, it will be automatically subtracted from all elements in y_pred
fst = y_true - K.mean(y_true)
devP = K.std(y_pred)
devT = K.std(y_true)
return K.mean(fsp*fst)/(devP*devT)
If it's relevant to have the loss for each feature instead of putting them all in the same group:
#original shapes: (batch, 10)
fsp = y_pred - K.mean(y_pred,axis=0) #you take the mean over the batch, keeping the features separate.
fst = y_true - K.mean(y_true,axis=0)
#mean shape: (1,10)
#fst shape keeps (batch,10)
devP = K.std(y_pred,axis=0)
devT = K.std(y_true,axis=0)
#dev shape: (1,10)
return K.sum(K.mean(fsp*fst,axis=0)/(devP*devT))
#mean shape: (1,10), making all tensors in the expression be (1,10).
#sum is only necessary because we need a single loss value
Summing the result over the ten features or taking their mean is essentially the same, one being just 10 times the other (that is not very relevant to Keras models, since it only affects the effective learning rate, and many optimizers quickly find their way around this).
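As a quick check (my own snippet, not from the answer), the simplified per-batch version can be compared against np.corrcoef on random data; the two should agree up to floating-point precision:

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

def pearson_metric(y_true, y_pred):
    fsp = y_pred - K.mean(y_pred)
    fst = y_true - K.mean(y_true)
    return K.mean(fsp * fst) / (K.std(y_pred) * K.std(y_true))

y_t = np.random.rand(32, 1).astype("float32")
y_p = np.random.rand(32, 1).astype("float32")

print(float(pearson_metric(tf.constant(y_t), tf.constant(y_p))))
print(np.corrcoef(y_t.ravel(), y_p.ravel())[0, 1])  # should be nearly identical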