I'm working on a multi-label classification problem where instead of each target index representing a distinct class, it represents some amount of time into the future. On top of wanting my predicted label to match the target label, I want an extra term to enforce some temporal aspect of the learning.
E.g.:
y_true = [1., 1., 1., 0.]
y_pred = [0.75, 0.81, 0.93, 0.65]
Above, the truth label implies something occurring during the first three indices.
I want to easily be able to mix and match loss functions.
I have a couple custom loss functions for overall accuracy, each wrapped within functions for adjustable arguments:
def weighted_binary_crossentropy(pos_weight=1):
    def weighted_binary_crossentropy_(Y_true, Y_pred):
        ...
        return tf.reduce_mean(loss, axis=-1)
    return weighted_binary_crossentropy_

def mean_squared_error(threshold=0.5):
    def mean_squared_error_(Y_true, Y_pred):
        ...
        return tf.reduce_mean(loss, axis=-1)
    return mean_squared_error_  # return the inner function, not the factory itself
I also have a custom loss function to enforce the predicted label ending at the same time as the truth label (I haven't made use of the threshold argument here yet):
def end_time_error(threshold=0.5):
    def end_time_error_(Y_true, Y_pred):
        _, n_times = K.int_shape(Y_true)
        # weights 1..n_times make argmax pick the last active time step
        weights = K.arange(1, n_times + 1, dtype=float)
        Y_true = tf.multiply(Y_true, weights)
        argmax_true = K.argmax(Y_true, axis=1)
        argmax_pred = K.argmax(Y_pred, axis=1)
        loss = tf.math.squared_difference(argmax_true, argmax_pred)
        return tf.reduce_mean(loss, axis=-1)
    return end_time_error_
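To make the index-weighting trick in end_time_error_ concrete, here is a minimal NumPy-only sketch run on the example vectors from above. The helper name last_active_index is hypothetical, and NumPy is used only so the snippet runs without a Keras backend.

```python
import numpy as np

def last_active_index(y_true):
    """Index of the last 1 in each row, using the same weighting trick."""
    n_times = y_true.shape[-1]
    weights = np.arange(1, n_times + 1, dtype=float)  # 1, 2, ..., n_times
    # Later active steps get larger values, so argmax picks the last one.
    return np.argmax(y_true * weights, axis=-1)

y_true = np.array([[1., 1., 1., 0.]])
y_pred = np.array([[0.75, 0.81, 0.93, 0.65]])

end_true = last_active_index(y_true)   # last active step of the label
end_pred = np.argmax(y_pred, axis=-1)  # most confident predicted step
squared_error = (end_true - end_pred) ** 2
```

Because argmax is piecewise constant, a term like this is probably easier to use as a metric than as a trainable loss.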
Sometimes I might want to combine end_time_error with weighted_binary_crossentropy, sometimes with mean_squared_error, and I have plenty of other loss functions to experiment with. I don't want to have to code a new combined loss function for each pair.
Attempt at solution 1
I've tried making a meta-loss function that combines loss functions (globally defined in the same script).
def combination_loss(loss_dict, combine='add', weights=[]):
    losses = []
    if not weights:
        weights = [1] * len(loss_dict)
    for (loss_func, loss_args), weight in zip(loss_dict.items(), weights):
        assert loss_func in globals().keys()
        loss_func = eval(loss_func)
        loss = loss_func(loss_args)
        losses.append(loss * weight)
    if combine == 'add':
        loss = sum(losses)
    elif combine == 'multiply':
        loss = np.prod(losses)
    return loss
To use this:
loss_args = {'loss_dict':
                 {'weighted_binary_crossentropy': {'pos_weight': 1},
                  'end_time_error': {}},
             'combine': 'add',
             'weights': [0.75, 0.25]}
model.compile(loss=combination_loss(**loss_args), ...)
Error:
File "C:\...\losses.py", line 165, in combination_loss
losses.append(loss * weight)
TypeError: unsupported operand type(s) for *: 'function' and 'float'
I'm playing loose with functions, so I'm not surprised this failed. But I'm not sure how to get what I want.
How can I combine functions with weights in combination_loss?
Or should I be doing that directly in the model.compile() call using a lambda function?
--EDIT
Attempt at solution 2
Ditching combination_loss:
losses = []
for loss_, loss_args_ in loss_args['loss_dict'].items():
    losses.append(get_loss(loss_)(**loss_args_))
loss = lambda y_true, y_pred: [l(y_true, y_pred) * w for l, w
                               in zip(losses, loss_args['weights'])]
model.compile(loss=loss, ...)
Error:
File "C:\...\losses.py", line 139, in end_time_error_
weights = K.arange(1, n_times + 1, dtype=float)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
Probably because y_true, y_pred won't work as arguments for wrapped loss functions.
Let's simplify your use case for only two losses:
loss = alpha * loss1 + (1-alpha) * loss2
Then you can do:
def generate_loss(alpha):
    def combination_loss(y_true, y_pred):
        return alpha * loss1(y_true, y_pred) + (1-alpha) * loss2(y_true, y_pred)
    return combination_loss
Obviously, loss1 and loss2 would be your respective loss functions.
You can use this to generate different loss functions for different alphas:
alpha = 0.7
combination_loss = generate_loss(alpha)
model.compile(loss=combination_loss, ...)
If alpha is supposed to be static, you can also get rid of the outer function generate_loss.
Finally, you can also define this as a lambda function:
model.compile(loss=lambda y_true, y_pred: alpha * loss1(y_true, y_pred) + (1-alpha) * loss2(y_true, y_pred), ...)
I'm not sure where your bug is (I assume it's the eval but I can't debug it) but if you simplify it enough like this or use this as a working example to introduce your losses and weights, it should work.
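The same closure idea extends to any number of losses and weights. The following is a minimal sketch, not a definitive implementation: combine_losses, mse and mae are hypothetical names, and the two stand-in losses are written in NumPy only so the snippet runs on its own. The key point is that the combined function calls each loss with (y_true, y_pred) first and only then applies the weight, instead of multiplying a function object by a number.

```python
import numpy as np

def combine_losses(loss_fns, weights=None, combine="add"):
    """Build one loss(y_true, y_pred) from several loss callables."""
    loss_fns = list(loss_fns)
    if not loss_fns:
        raise ValueError("need at least one loss function")
    if weights is None:
        weights = [1.0] * len(loss_fns)
    if len(weights) != len(loss_fns):
        raise ValueError("need exactly one weight per loss function")
    if combine not in ("add", "multiply"):
        raise ValueError("combine must be 'add' or 'multiply'")

    def combined(y_true, y_pred):
        # Call every loss on the same inputs, then weight each result.
        terms = [w * fn(y_true, y_pred) for fn, w in zip(loss_fns, weights)]
        total = terms[0]
        for term in terms[1:]:
            total = total + term if combine == "add" else total * term
        return total

    return combined

# Stand-in losses (assumptions for illustration only).
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2, axis=-1)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred), axis=-1)

loss = combine_losses([mse, mae], weights=[0.75, 0.25])
y_true = np.array([[1., 1., 1., 0.]])
y_pred = np.array([[0.75, 0.81, 0.93, 0.65]])
value = loss(y_true, y_pred)
```

Assuming each of your own loss factories returns a function of (y_true, y_pred), you would build the list as something like [weighted_binary_crossentropy(pos_weight=1), end_time_error()] and pass the result to model.compile(loss=...).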
Related
I need to build a custom loss method based on BLEU. I'm passing my LabelEncoder in the constructor to reverse labels and predictions and calculate the bleu distance.
Here is my Loss class
class CIMCodeSuccessiveLoss(Loss):
    def __init__(self, labelEncoder: LabelEncoder):
        super().__init__()
        self.le = labelEncoder

    def bleu_score(self, true_label, pred_label):
        cim_true_label = self.le.inverse_transform(true_label.numpy())
        cim_pred_label = self.le.inverse_transform(pred_label.numpy())
        bleu_scores = [sentence_bleu(list(one_true_label),
                                     list(one_pred_label),
                                     weights=(0.5, 0.25, 0.125, 0.125))
                       for one_true_label, one_pred_label in
                       zip(cim_true_label, cim_pred_label)]
        return np.float32(bleu_scores)

    def call(self, y_true, y_pred):
        labeled_y_pred = tf.cast(tf.argmax(y_pred, axis=-1), tf.int32)
        bleu = tf.py_function(self.bleu_score, (tf.reshape(y_true, [-1]), labeled_y_pred), tf.float32)
        return tf.reduce_sum(tf.square(1 - bleu))
The bleu_score method calculates the correct scores and returns a NumPy array.
When I try to return the squared sum, I get this error:
raise ValueError(f"No gradients provided for any variable: {variable}.
I'm also providing the model:
inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
x = vectorize_layer(inputs)
x = Embedding(vocab_size, embedding_dim, name="embedding")(x)
x = LSTM(units=32, name="lstm")(x)
outputs = Dense(classes_number, name="classification")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="first_cim_classifier")
model.summary()
# we add early stopping for our model.
early_stopping = EarlyStopping(monitor='loss', patience=2)
model.compile(
    loss=CIMCodeSuccessiveLoss(le),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy", "crossentropy"],
    run_eagerly=True)
trained_model = model.fit(np.array(x_train), np.array(y_train), batch_size=64, epochs=10,
                          validation_data=(np.array(x_val), np.array(y_val)),
                          callbacks=[early_stopping])
Any help is appreciated. Thanks in advance.
Your loss function calls tf.argmax(y_pred, axis=-1). argmax is not differentiable, so automatic differentiation cannot compute gradients through it. You have to remove this call; for example (depending on your data), you can change the output layer to softmax and use one-hot labels.
The issue is that the argmax function is not differentiable, which is problematic when it appears in a loss function:
labeled_y_pred = tf.cast(tf.argmax(y_pred, axis=-1), tf.int32)
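One way to work around this is to use a differentiable approximation of the argmax function, similar to the smooth maximum function, in which each index i is weighted by the softmax of the scaled inputs:
\operatorname{softargmax}_\beta(x) = \sum_i i \, \frac{e^{\beta x_i}}{\sum_j e^{\beta x_j}}
As β approaches infinity, this approaches the index of the true maximum. For your purposes, β=10 or β=100 should accomplish your goals.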
In Tensorflow, this could be accomplished as follows:
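def differentiable_argmax_approx(x, beta=10, axis=-1):
    weights = tf.nn.softmax(beta * x, axis=axis)  # stable exp(beta*x) / sum(exp(beta*x))
    positions = tf.cumsum(tf.ones_like(x), axis=axis) - 1  # 0, 1, 2, ... along `axis`
    return tf.reduce_sum(positions * weights, axis=axis)
Then change the original line to the following. Keep in mind that the cast to tf.int32 is itself non-differentiable, so gradients only flow through the parts of the loss that use the float result directly:
labeled_y_pred = tf.cast(differentiable_argmax_approx(y_pred, axis=-1), tf.int32)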
We can verify the functionality with a simple test case:
beta = 10
x = np.array([1, 2, 3, 10, 4, 5], dtype=np.float64)  # np.float was removed from recent NumPy
y = differentiable_argmax_approx(x, beta)
assert np.isclose(y, x.argmax())
One caveat to this approach: if the maximum value is not unique along the axis that we're applying the function to, the result will be the arithmetic mean of the indices. Providing another test case to illustrate:
beta = 10
x = np.array([1, 2, 10, 3, 10], dtype=np.float64)
y = differentiable_argmax_approx(x, beta)
assert np.isclose(y, 3)
The result is 3 here, because we have two occurrences of the maximum value (10): one at index 2, and the other at index 4. In contrast, the regular argmax function returns the first index of the maximum argument.
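To show how β controls the sharpness of the approximation, here is a small NumPy-only sketch of the same idea. The name soft_argmax is hypothetical, and this is a stand-in for illustration rather than the TensorFlow function above; it handles a batch along the last axis and subtracts the row maximum before exponentiating to avoid overflow.

```python
import numpy as np

def soft_argmax(x, beta=10.0, axis=-1):
    """Softmax-weighted average of the indices along `axis`."""
    x = np.asarray(x, dtype=np.float64)
    z = beta * x
    z = z - z.max(axis=axis, keepdims=True)   # avoid overflow in exp
    w = np.exp(z)
    w = w / w.sum(axis=axis, keepdims=True)   # softmax weights
    shape = [1] * x.ndim
    shape[axis] = x.shape[axis]
    idx = np.arange(x.shape[axis], dtype=np.float64).reshape(shape)
    return (idx * w).sum(axis=axis)

batch = np.array([[1., 2., 3., 10., 4., 5.],    # unique maximum at index 3
                  [1., 2., 10., 3., 10., 0.]])  # tied maximum at indices 2 and 4
sharp = soft_argmax(batch, beta=10.0)
blurry = soft_argmax(np.array([1., 2., 3.]), beta=1.0)
```

With a large β both rows come out close to 3 (the tied row averages indices 2 and 4), while a small β on closely spaced values gives a result noticeably below the true argmax of 2.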
Another improvement would be moving more computation into Tensorflow functions. To start, instead of using sklearn's LabelEncoder, to apply a mapping in the loss function, you could use a tf.lookup.StaticHashTable to accomplish the same objective with the Tensorflow API. To convert from a LabelEncoder to a tf.lookup.StaticHashTable, you can use the following function:
def convert_label_encoder_to_static_hash_table(le: LabelEncoder,
                                               default_value: int = -1) -> tf.lookup.StaticHashTable:
    static_hash_table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            tf.convert_to_tensor(le.classes_),
            tf.convert_to_tensor(le.transform(le.classes_))), default_value=default_value)
    return static_hash_table
Or, for your purposes, since you're applying the inverse mapping (to go from integers -> string), you may want to swap the key and the values:
def convert_label_encoder_to_static_hash_table(le: LabelEncoder,
                                               default_value: str = "") -> tf.lookup.StaticHashTable:
    # keys are the integer codes, values are the original string labels
    static_hash_table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            tf.convert_to_tensor(le.transform(le.classes_)),
            tf.convert_to_tensor(le.classes_)), default_value=default_value)
    return static_hash_table
and, in the initializer:
def __init__(self, labelEncoder: LabelEncoder):
    super().__init__()
    self.table = convert_label_encoder_to_static_hash_table(labelEncoder)
By operating on tf.Tensor objects, you can use tf.map_fn instead of a for-loop over NumPy arrays or lists. Your loss function would become:
def bleu_score(self, true_label, pred_label):
    cim_true_label = self.table.lookup(true_label)
    cim_pred_label = self.table.lookup(pred_label)

    def one_bleu(pair):
        # sentence_bleu is plain Python, so it is called through tf.py_function
        return tf.py_function(
            lambda t, p: sentence_bleu([t.numpy().decode()], [p.numpy().decode()],
                                       weights=(0.5, 0.25, 0.125, 0.125)),
            [pair[0], pair[1]], tf.float32)

    bleu_scores = tf.map_fn(one_bleu, elems=(cim_true_label, cim_pred_label),
                            fn_output_signature=tf.float32)
    return bleu_scores
This moves the label lookup into TensorFlow operations. Note that sentence_bleu itself is still a native Python function, so that one call remains wrapped in tf.py_function.
I am trying to use Poisson unscaled deviance as a loss function for my neural network, but there's a major flaw with this: y_true can take (and will very often take) the value 0.
Unscaled deviance works like this in the Poisson case:
If y_true = 0, then loss = 2 * d * y_pred
If y_true > 0, then loss = 2 * d * (y_true * log(y_true) - y_true * log(y_pred) - y_true + y_pred)
Note that as soon as log(0) is computed, the loss becomes -inf, so my goal is to prevent this from happening.
I tried using the switch function to solve this but here's the trick:
If I have the value log(0), I don't want to replace it by 0 (with K.zeros()) because it would be considering that y_true = 1 since log(1) = 0.
Therefore I want to try using a large negative value in this case (-10000 for example) but I don't know how to do this since K.variable(-10000) gives the error:
ValueError: Rank of `condition` should be less than or equal to rank of `then_expression` and `else_expression`. ndim(condition)=1, ndim(then_expression)=0
Using K.zeros_like(y_true) instead of K.variable(-10000) will work for keras but it is mathematically incorrect and the optimisation doesn't work properly because of this.
I'd like to know how to replace the log by a large negative value in the switch function. Here's my attempt:
def custom_loss3(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1]
    # condition
    loss_value = KB.switch(
        KB.less_equal(y_true, 0),
        2 * d * y_pred,
        2 * d * (y_true * KB.switch(KB.less_equal(y_true, 0), KB.variable(-10000), KB.log(y_true))
                 - y_true * KB.switch(KB.less_equal(y_pred, 0.), KB.variable(-10000), KB.log(y_pred))
                 - y_true + y_pred))
    return loss_value
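One possible way around both problems, sketched here in NumPy so it runs without a backend: instead of substituting a large negative value for log(0), give the logarithm a harmless argument (1.0) wherever y_true is 0. The term y_true * log(y_true) is then exactly 0 for those entries, which matches the usual convention 0 * log(0) = 0 and reduces the formula to 2 * d * y_pred. The function name and the eps floor on y_pred are assumptions for illustration. In Keras, the rank error suggests that the replacement value needs the same shape as the condition (for example something built from K.ones_like(y_true)) rather than a scalar K.variable.

```python
import numpy as np

def poisson_unscaled_deviance(y_true, y_pred, d, eps=1e-7):
    """Per-sample Poisson unscaled deviance that never evaluates log(0)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Floor the prediction so log(y_pred) stays finite (eps is an assumption).
    y_pred = np.maximum(np.asarray(y_pred, dtype=np.float64), eps)
    positive = y_true > 0
    # Feed log() a harmless 1.0 wherever y_true == 0.
    safe_true = np.where(positive, y_true, 1.0)
    log_term = y_true * (np.log(safe_true) - np.log(y_pred))  # exactly 0 where y_true == 0
    return 2.0 * d * (log_term - y_true + y_pred)

y_true = np.array([0.0, 2.0])
y_pred = np.array([0.5, 1.5])
d = np.array([1.0, 2.0])
losses = poisson_unscaled_deviance(y_true, y_pred, d)
```

For the first sample (y_true = 0) this gives 2 * d * y_pred, and for the second it gives the full expression from the y_true > 0 case, with no separate switch needed.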
I've recently been trying to implement a model that can be described as follows: given an input matrix and a set of targets, let the model simultaneously learn the matrix representation and the targets via a custom loss function.
The architecture (simplified):
input_matrix = Input(shape=(i_shape,))
layer1 = Dense(100)(input_matrix)
output = Dense(3)(layer1)
autoencoder_mid = Dense(100)(input_matrix)
autoencoder_output = Dense(i_shape)(autoencoder_mid)
My idea of a loss function:
def customLoss(true_matrix,pred_matrix):
    def combined_loss(y_true,y_pred):
        return K.abs(y_true-y_pred)
        a = K.mean( K.square(y_pred - y_true) * K.exp(-K.log(1.7) * (K.log(1. + K.exp((y_true - 3)/5 )))),axis=-1 )
        b = K.mean( K.square(pred_matrix - true_matrix) * K.exp(-K.log(1.7) * (K.log(1. + K.exp((true_matrix - 3)/5 )))),axis=-1)
        return a+b
    return combined_loss
I compile the model as:
net = Model(input_matrix, [output,autoencoder_output])
net = net.compile(optimizer='adam', loss=customLoss(true_matrix=X,pred_matrix=autoencoder_output))
Where I try to fit the network with a standard:
net.fit(X,
        target,
        epochs=10,
        batch_size=10)
The error I get is:
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float64: 'Tensor("loss/dense_4_loss/Log_3:0", shape=(389, 3890), dtype=float64, device=/device:GPU:0)'
My question is, is there any other way of doing this? If so, could you please point me towards a possible solution. Thank you very much.
You can try this:
def customLoss(true_matrix):
    def combined_loss(y_true,y_pred):
        y_pred, pred_matrix = y_pred
        ...
    return combined_loss
net = Model(input_matrix, [output,autoencoder_output])
net.compile(optimizer='adam', loss=customLoss(X))
This works because the original y_pred is expected to be a tuple (output, autoencoder_output).
Concerning the double return, the function will only return the first one, so I'd remove one of the two return lines or combine the two outputs such as:
alpha = 0.5
beta = 0.5
...
loss1, loss2 = K.abs(y_true-y_pred), a+b
return alpha*loss1 + beta*loss2
Changing alpha and beta upon convenience.
Thus, the whole thing could be:
def customLoss(true_matrix, alpha = 0.5, beta = 0.5):
    def combined_loss(y_true,y_pred):
        y_pred, pred_matrix = y_pred
        a = K.mean( K.square(y_pred - y_true) * K.exp(-K.log(1.7) * (K.log(1. + K.exp((y_true - 3)/5 )))),axis=-1 )
        b = K.mean( K.square(pred_matrix - true_matrix) * K.exp(-K.log(1.7) * (K.log(1. + K.exp((true_matrix - 3)/5 )))),axis=-1)
        # average over the last axis so loss1 has the same shape as a+b
        loss1, loss2 = K.mean(K.abs(y_true-y_pred), axis=-1), a+b
        return alpha*loss1 + beta*loss2
    return combined_loss
net = Model(input_matrix, [output,autoencoder_output])
net.compile(optimizer='adam', loss=customLoss(X))
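To check the arithmetic of the two terms outside of Keras, here is a NumPy-only sketch. The factor K.exp(-K.log(1.7) * K.log(1. + K.exp((y - 3)/5))) is the same as 1.7 raised to minus the softplus of (y - 3)/5, which the sketch computes with np.logaddexp for numerical stability. The helper names and the random test data are assumptions for illustration.

```python
import numpy as np

def softplus_weight(y, base=1.7, shift=3.0, scale=5.0):
    """base ** -softplus((y - shift) / scale), same value as the K.exp(...) factor."""
    softplus = np.logaddexp(0.0, (y - shift) / scale)  # stable log(1 + exp(z))
    return base ** (-softplus)

def weighted_mse(y_true, y_pred):
    # Squared error, down-weighted for large target values.
    return np.mean((y_pred - y_true) ** 2 * softplus_weight(y_true), axis=-1)

def combined_loss(y_true, y_pred, true_matrix, pred_matrix, alpha=0.5, beta=0.5):
    loss1 = np.mean(np.abs(y_true - y_pred), axis=-1)  # one value per sample
    loss2 = weighted_mse(y_true, y_pred) + weighted_mse(true_matrix, pred_matrix)
    return alpha * loss1 + beta * loss2

rng = np.random.default_rng(0)
y_true, y_pred = rng.random((4, 3)), rng.random((4, 3))
true_matrix, pred_matrix = rng.random((4, 6)), rng.random((4, 6))
per_sample = combined_loss(y_true, y_pred, true_matrix, pred_matrix)
```

Every term is reduced to one value per sample before the weighted sum, so alpha * loss1 and beta * loss2 have matching shapes.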
I want my model to increase the loss for a false positive prediction when training by creating a custom loss function.
The class_weight parameter in model.fit() does not work for this issue. The class_weight is already set to { 0: 1, 1:23 } as I have skewed training data where there are 23 times as many non-true labels as there are true labels.
I am not too experienced when working with the keras backend. I have mostly worked with the functional model.
What I want to create is:
def weighted_binary_crossentropy(y_true, y_pred):
    # where y_true == 0 and y_pred == 1:
    #     weight this loss and make it 50 times larger
    # return loss
I can do simple stuff with the tensors such as getting the mean squared error but I have no idea how to do logical stuff.
I have tried a hacky solution which doesn't work and feels totally wrong:
def weighted_binary_crossentropy(y_true, y_pred):
    false_positive_weight = 50
    thresh = 0.5
    y_pred_true = K.greater_equal(thresh,y_pred)
    y_not_true = K.less_equal(thresh,y_true)
    false_positive_tensor = K.equal(y_pred_true,y_not_true)
    loss_weights = K.ones_like(y_pred) + false_positive_weight*false_positive_tensor
    return K.binary_crossentropy(y_true, y_pred)*loss_weights
I am using python 3 with keras 2 and tensorflow as backend.
Thanks in advance!
I think you're almost there...
from keras.losses import binary_crossentropy

def weighted_binary_crossentropy(y_true, y_pred):
    false_positive_weight = 50
    thresh = 0.5
    # predicted positive: y_pred at or above the threshold
    y_pred_true = K.greater_equal(y_pred, thresh)
    # actually negative: y_true below the threshold
    y_not_true = K.less(y_true, thresh)

    # changing from here
    # first let's transform the bool tensors into numbers - maybe you need float64 depending on your configuration
    # a false positive is "predicted positive AND actually negative", so multiply the two masks
    false_positive_tensor = K.cast(y_pred_true, 'float32') * K.cast(y_not_true, 'float32')

    # and let's create its complement (the non false positives)
    complement = 1 - false_positive_tensor

    # now we're going to separate two groups
    falsePosGroupTrue = y_true * false_positive_tensor
    falsePosGroupPred = y_pred * false_positive_tensor

    nonFalseGroupTrue = y_true * complement
    nonFalseGroupPred = y_pred * complement

    # let's calculate one crossentropy loss for each group
    # (directly from the keras loss functions imported above)
    falsePosLoss = binary_crossentropy(falsePosGroupTrue, falsePosGroupPred)
    nonFalseLoss = binary_crossentropy(nonFalseGroupTrue, nonFalseGroupPred)

    # return them weighted:
    return (false_positive_weight*falsePosLoss) + nonFalseLoss
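A more compact variant of the same idea is to compute the cross-entropy per element and multiply it by a weight mask before averaging. The sketch below uses NumPy so it can be checked without a backend; the function name, the eps clipping and the sample values are assumptions for illustration, not part of the Keras API.

```python
import numpy as np

def fp_weighted_bce(y_true, y_pred, fp_weight=50.0, thresh=0.5, eps=1e-7):
    """Binary cross-entropy with false-positive elements scaled by fp_weight."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip predictions so neither log() receives 0.
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    bce = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    # False positive: predicted positive while the label is negative.
    false_positive = (y_pred >= thresh) & (y_true < thresh)
    weights = np.where(false_positive, fp_weight, 1.0)
    return np.mean(bce * weights, axis=-1)

y_true = np.array([[0., 0., 1., 1.]])
y_pred = np.array([[0.9, 0.1, 0.2, 0.8]])  # only the first element is a false positive
loss = fp_weighted_bce(y_true, y_pred)
```

Here the false negative at the third position keeps weight 1, so only the false positive is penalised more heavily.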
I'm trying to implement a new loss function of my own.
When I tried to debug it (or print in it) I've noticed it is called only once at the model creating section of the code.
How can I know what y_pred and y_true contain (shapes, data, etc.) if I cannot step into this function while fitting the model?
I wrote this loss function:
def my_loss(y_true, y_pred):
    # run over the sequence, jump by 3
    # calc the label
    # if the label incorrect punish
    y_pred = K.reshape(y_pred, (1, 88, 3))
    y_pred = K.argmax(y_pred, axis=1)
    zero_count = K.sum(K.clip(y_pred, 0, 0))
    one_count = K.sum(K.clip(y_pred, 1, 1))
    two_count = K.sum(K.clip(y_pred, 2, 2))
    zero_punish = 1 - zero_count / K.count_params(y_true)
    one_punish = 1 - one_count / K.count_params(y_true)
    two_punish = 1 - two_count / K.count_params(y_true)
    false_arr = K.not_equal(y_true, y_pred)
    mask0 = K.equal(y_true, K.zeros_like(y_pred))
    mask0_miss = K.dot(false_arr, mask0) * zero_punish
    mask1 = K.equal(y_true, K.ones_like(y_pred))
    mask1_miss = K.dot(false_arr, mask1) * one_punish
    mask2 = K.equal(y_true, K.zeros_like(y_pred)+2)
    mask2_miss = K.dot(false_arr, mask2) * two_punish
    return K.sum(mask0_miss) + K.sum(mask1_miss) + K.sum(mask2_miss)
It fails on:
theano.gof.fg.MissingInputError: A variable that is an input to the graph was
neither provided as an input to the function nor given a value. A chain of
variables leading from this input to an output is [/dense_1_target, Shape.0].
This chain may not be unique
Backtrace when the variable is created:
How can I fix it?
You have to understand that Theano is a symbolic language. For example, when we define the following loss function in Keras:
def myLossFn(y_true, y_pred):
return K.mean(K.abs(y_pred - y_true), axis=-1)
Theano is just making a symbolic rule in a computational graph, which would be executed when it gets values i.e. when you train the model with some mini-batches.
As far as your question on how to debug your model goes, you can use theano.function for that. Now, you want to know if your loss calculation is correct. You do the following.
You can implement the python/numpy version of your loss function. Pass two random vectors to your numpy-loss-function and get a number. To verify if theano gives nearly identical result, define something as follows.
import numpy as np
import theano
from theano import tensor as T
from keras import backend as K

# float32 vectors, matching the 1-D test arrays below
Y_true = T.fvector('Y_true')
Y_pred = T.fvector('Y_pred')
out = K.mean(K.abs(Y_pred - Y_true), axis=-1)
f = theano.function([Y_true, Y_pred], out)
# creating some values
y_true = np.random.random((10,)).astype('float32')
y_pred = np.random.random((10,)).astype('float32')
numpy_loss_result = np.mean(np.abs(y_true - y_pred))
theano_loss_result = f(y_true, y_pred)
# check if both are close enough
print(abs(numpy_loss_result - theano_loss_result))  # should be less than 1e-5
Basically, theano.function is a way to put values and evaluate those symbolic expressions. I hope this helps.