In Keras (with Tensorflow backend), is the current input pattern available to my custom loss function?
The current input pattern is defined as the input vector used to produce the prediction. For example, consider the following: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, shuffle=False). Then the current input pattern is the current X_train vector associated with the y_train (which is termed y_true in the loss function).
When designing a custom loss function, I intend to optimize/minimize a value that requires access to the current input pattern, not just the current prediction.
I've taken a look through https://github.com/fchollet/keras/blob/master/keras/losses.py
I've also looked through "Cost function that isn't just y_pred, y_true?"
I am also familiar with previous examples of producing a customized loss function:
import keras.backend as K
def customLoss(y_true,y_pred):
return K.sum(K.log(y_true) - K.log(y_pred))
Presumably (y_true,y_pred) are defined elsewhere. I've taken a look through the source code without success and I'm wondering whether I need to define the current input pattern myself or whether this is already accessible to my loss function.
You can wrap the loss function as an inner function and pass your input tensor to it (as is commonly done when passing additional arguments to a loss function).
def custom_loss_wrapper(input_tensor):
def custom_loss(y_true, y_pred):
return K.binary_crossentropy(y_true, y_pred) + K.mean(input_tensor)
return custom_loss
input_tensor = Input(shape=(10,))
hidden = Dense(100, activation='relu')(input_tensor)
out = Dense(1, activation='sigmoid')(hidden)
model = Model(input_tensor, out)
model.compile(loss=custom_loss_wrapper(input_tensor), optimizer='adam')
You can verify that input_tensor, and therefore the loss value (specifically the K.mean(input_tensor) part), will change as different X is passed to the model.
X = np.random.rand(1000, 10)
y = np.random.randint(2, size=1000)
model.test_on_batch(X, y) # => 1.1974642
X *= 1000
model.test_on_batch(X, y) # => 511.15466
You can use add_loss to pass additional tensors to your loss, in your case the input tensor.
Here is an example:
def CustomLoss(y_true, y_pred, input_tensor):
return K.binary_crossentropy(y_true, y_pred) + K.mean(input_tensor)
X = np.random.uniform(0,1, (1000,10))
y = np.random.randint(0,2, 1000)
inp = Input(shape=(10,))
hidden = Dense(100, activation='relu')(inp)
out = Dense(1, activation='sigmoid')(hidden)
target = Input((1,))
model = Model([inp,target], out)
model.add_loss(CustomLoss(target, out, inp))
model.compile(loss=None, optimizer='adam')
model.fit(x=[X,y], y=None, epochs=3)
To use the model in inference mode (removing the target from the inputs):
final_model = Model(model.input[0], model.output)
final_model.predict(X)
Related
Say I have a classification problem with 30 potential binary labels. These labels are not mutually exclusive. The labels tend to be sparse: there is, on average, 1 positive label out of all 30, but sometimes more than 1. In the following code, how can I penalize the model for predicting all zeros? The accuracy will be high, but recall will be awful!
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
OUTPUT_NODES = 30
np.random.seed(0)
def get_dataset():
"""
Get a dataset of X and y. This is a learnable problem as there is some signal in the features. 10% of the time, a
positive-output's index will also have a positive feature for that index
:return: X and y data for training
"""
n_observations = 30000
y = np.random.rand(n_observations, OUTPUT_NODES)
y = (y <= (1 / OUTPUT_NODES)).astype(int) # Makes a sparse output where there is roughly 1 positive label: ((1 / OUTPUT_NODES) * OUTPUT_NODES ≈ 1)
X = np.zeros((n_observations, OUTPUT_NODES))
for i in range(len(y)):
for j, feature in enumerate(y[i]):
if feature == 1:
X[i][j] = 1 if np.random.rand(1) > 0.9 else 0 # Makes the input features more noisy
# X[i][j] = 1 # Using this instead will make the model perform very well
return X, y
def create_model():
input_layer = Input(shape=(OUTPUT_NODES, ))
dense1 = Dense(100, activation='relu')(input_layer)
dense2 = Dense(100, activation='relu')(dense1)
output_layer = Dense(30, activation='sigmoid')(dense2)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['Recall'])
return model
def main():
X, y = get_dataset()
model = create_model()
model.fit(X, y, epochs=10, batch_size=10)
X_pred = np.random.randint(0, 2, (100, OUTPUT_NODES))
y_pred = model.predict(X_pred)
print(X_pred)
print(y_pred.round(1))
if __name__ == '__main__':
main()
I believe I read here that I could use:
weighted_cross_entropy_with_logits
to address this issue. How would that affect my final output layer's activation function? Would I need an activation function at all? How do I specify a penalty for misclassifications of the true positive class?
OK, it is an interesting problem.
First, you need to define a weighted cross-entropy loss wrapper:
def wce_logits(positive_class_weight=1.):
def mylossw(y_true, logits):
cross_entropy = tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(logits=logits, labels=tf.cast(y_true, dtype=tf.float32), pos_weight=positive_class_weight))
return cross_entropy
return mylossw
The positive_class_weight is applied to the positive class data. You need this wrapper for tf.nn.weighted_cross_entropy_with_logits to get a loss function that takes y_true and y_pred (only) as inputs.
Note that you must cast y_true to float32.
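For intuition about what the wrapped loss does, tf.nn.weighted_cross_entropy_with_logits only scales the positive-class term of the binary cross-entropy by pos_weight (roughly labels * -log(sigmoid(logits)) * pos_weight + (1 - labels) * -log(1 - sigmoid(logits))). A quick check on a toy batch (the values below are arbitrary, just for illustration):
import tensorflow as tf

labels = tf.constant([[1., 0., 0.]])
logits = tf.constant([[0.5, -1.0, 2.0]])
# Only the first (positive) entry is scaled by pos_weight; the two negative entries are unchanged.
print(tf.nn.weighted_cross_entropy_with_logits(labels=labels, logits=logits, pos_weight=27.))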
Second, you cannot use the predefined Recall metric, because it does not work with logits. I found a workaround in this discussion:
class Recall(tf.keras.metrics.Recall):
def __init__(self, from_logits=False, *args, **kwargs):
super().__init__(*args, **kwargs)
self._from_logits = from_logits
def update_state(self, y_true, y_pred, sample_weight=None):
if self._from_logits:
super(Recall, self).update_state(y_true, tf.nn.sigmoid(y_pred), sample_weight)
else:
super(Recall, self).update_state(y_true, y_pred, sample_weight)
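A quick sanity check of the subclass (a toy example of my own, not from the linked discussion), using the Recall class defined above:
import tensorflow as tf

m = Recall(from_logits=True)
# Logits [2., -2.] become probabilities of roughly [0.88, 0.12] after the sigmoid,
# so the single positive label is recovered and recall is 1.0.
m.update_state([[1., 0.]], [[2.0, -2.0]])
print(m.result().numpy())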
Finally, you need to remove the sigmoid activation from the last layer, as the loss works on logits:
def create_model():
input_layer = Input(shape=(OUTPUT_NODES, ))
dense1 = Dense(100, activation='relu')(input_layer)
dense2 = Dense(100, activation='relu')(dense1)
output_layer = Dense(30)(dense2)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam', loss=wce_logits(positive_class_weight=27.), metrics=[Recall(from_logits=True)])
return model
Note that the positive weight is set to 27 here. You can read a discussion on how to correctly calculate the weight.
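As a rough rule of thumb (my own addition, not from the answer above), the positive weight is often taken as the ratio of negative to positive labels in the training data; for this dataset that comes out near 29, close to the 27 used here:
import numpy as np

def estimate_pos_weight(y):
    # Hypothetical helper: ratio of negative labels to positive labels.
    n_pos = y.sum()
    n_neg = y.size - n_pos
    return n_neg / n_pos

X, y = get_dataset()           # get_dataset() as defined in the question
print(estimate_pos_weight(y))  # roughly (1 - 1/30) / (1/30) ≈ 29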
While my code runs without any problems with Keras Tuner and standard loss functions like 'mse', I am trying to figure out how to write a custom loss function that accepts an external argument, in addition to the true and forecasted y, for use inside Keras Tuner for LSTM model selection. I am looking for the easiest and least painful way, and I didn't find a working solution in old posts.
One approach I followed is this one. Let's say I have these variables:
# external vector needed in custom loss function
ex_loss= np.logical_not(klines_backtest.loc[i_sel,['d']].to_numpy(dtype=np.float32)[:sample_start])
# create data sequences for x and vector to forecast y
x_train, y_train = lstm_data_sequence(dataset[:sample_start,:-1], dataset[:sample_start,-1], lstm_sequence)
# concatenate external vector to y so y shape is Nx2
y_train = np.vstack((y_train, ex_loss[lstm_sequence:,0])).T
I have defined the following loss function
def bande_loss(y_true, y_pred):
mse = K.square(y_pred - y_true[:,0])
i_loss = K.equal(y_true[:,1], 1) and K.greater_equal(y_pred, y_true[:,0])
i_loss = K.cast(~i_loss, 'float32')
return K.mean(mse*i_loss)
Basically, I tried to avoid overriding the loss function's signature by packing the additional variable (of the same size as y_true) that I need in the loss function into y_train, where I expect to find y_true and the corresponding external variable, correctly sized for each batch.
The LSTM for model selection is
def lstm_model(hp):
model = Sequential()
model.add(InputLayer(input_shape=(48*3, 13)))
num_layers = hp.Int('num_layers', min_value=4, max_value=8, step=2)
num_units = hp.Choice('units', values=[50, 100, 250, 500])
n_dropout = hp.Choice('n_dropout', values=[float(0), 0.10, 0.20])
n_rec_dropout = hp.Choice('n_rec_dropout', values=[float(0), 0.10, 0.20])
learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4, 1e-5, 1e-6])
for i in range(num_layers):
if i < num_layers - 1:
r_sequence = True
else:
r_sequence = False
model.add(LSTM(
units=num_units,
dropout=n_dropout,
recurrent_dropout=n_rec_dropout,
return_sequences=r_sequence))
model.add(Dense(1))
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
loss=bande_loss,
metrics=[bande_loss])
return model
Executing this code
tuner = Hyperband(
hypermodel=lstm_model,
objective=Objective("bande_loss", direction="min"),
max_epochs=50,
hyperband_iterations=2,
executions_per_trial=1,
overwrite=True,
project_name='hyperband_tuner')
stop_early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, verbose=1)
tuner.search(x_train, y_train, epochs=30, validation_split=p_train, callbacks=[stop_early],
shuffle=False, verbose=1)
I get this error
The second input must be a scalar, but it has shape [32]
[[{{node bande_loss/cond/switch_pred/_2736}}]] [Op:__inference_train_function_45266]
Function call stack:
train_function
Note that 32 is the (default) batch size.
Also running the same code with
def bande_loss(y_true, y_pred):
mse = K.square(y_pred - y_true[:,0])
return K.mean(mse)
seems to work fine while running with
def bande_loss(y_true, y_pred):
mse = K.square(y_pred - y_true[:,1])
return K.mean(mse)
gives me the same error and I cannot understand why.
I also tried overriding the loss function in this way:
def lstm_model(hp):
model = Sequential()
model.add(InputLayer(input_shape=(48*3, 13)))
num_layers = hp.Int('num_layers', min_value=4, max_value=8, step=2)
num_units = hp.Choice('units', values=[50, 100, 250, 500])
n_dropout = hp.Choice('n_dropout', values=[float(0), 0.10, 0.20])
n_rec_dropout = hp.Choice('n_rec_dropout', values=[float(0), 0.10, 0.20])
learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4, 1e-5, 1e-6])
for i in range(num_layers):
if i < num_layers - 1:
r_sequence = True
else:
r_sequence = False
model.add(LSTM(
units=num_units,
dropout=n_dropout,
recurrent_dropout=n_rec_dropout,
return_sequences=r_sequence))
model.add(Dense(1))
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
loss=bande_loss(ex_loss),
metrics=[bande_loss(ex_loss)])
return model
def bande_loss(ex_loss):
def loss(y_true, y_pred):
mse = K.square(y_pred - y_true)
i_loss = K.equal(ex_loss, True) and K.greater_equal(y_pred, y_true)
i_loss = K.cast(~i_loss, 'float32')
return K.mean(mse*i_loss)
return loss
...
# external vector needed in custom loss function
ex_loss= np.logical_not(klines_backtest.loc[i_sel,['d']].to_numpy(dtype=np.float32)[:sample_start])
# create data sequences for x and vector to forecast y
x_train, y_train = lstm_data_sequence(dataset[:sample_start,:-1], dataset[:sample_start,-1], lstm_sequence)
ex_loss = K.variable(ex_loss[lstm_sequence:], dtype=bool)
tuner = Hyperband(
hypermodel=lstm_model,
objective=Objective("bande_loss(ex_loss)", direction="min"),
max_epochs=50,
hyperband_iterations=2,
executions_per_trial=1,
overwrite=True,
project_name='hyperband_tuner')
stop_early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, verbose=1)
tuner.search(x_train, y_train, epochs=30, validation_split=p_train, callbacks=[stop_early],
shuffle=False, verbose=1)
But I get this error
tensorflow.python.framework.errors_impl.InvalidArgumentError: The second input must be a scalar, but it has shape [4176]
[[{{node cond/switch_pred/_12}}]] [Op:__inference_train_function_34471]
Function call stack:
train_function
Can anyone help me, or suggest a simpler and more effective way to implement custom loss functions with external parameters inside Keras Tuner?
I know how to write a custom loss function in Keras with an additional input beyond the standard y_true, y_pred pair; see below. My issue is feeding the loss function a trainable variable (a few of them) that is part of the loss gradient and should therefore be updated.
My workaround is:
Feed the network a dummy input of size N×V, where N is the number of observations and V the number of additional variables
Add a Dense() layer dummy_output so that Keras will track my V "weights"
Use this layer's V weights in my custom loss function for my true output layer
Use a dummy loss function (which simply returns 0.0 and/or has weight 0.0) for this dummy_output layer, so my V "weights" are only updated via my custom loss function
My question is: is there a more natural Keras/TF-like way of doing this? Because it feels so contrived, not to mention prone to bugs.
Example of my workaround:
(Yes, I know this is a very silly custom loss function; in reality things are much more complex.)
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
import tensorflow.keras.backend as K
from tensorflow.keras.layers import Input
from tensorflow.keras import Model
n_col = 10
n_row = 1000
X = np.random.normal(size=(n_row, n_col))
beta = np.arange(10)
y = X @ beta
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# my custom loss function accepting my dummy layer with 2 variables
def custom_loss_builder(dummy_layer):
def custom_loss(y_true, y_pred):
var1 = dummy_layer.trainable_weights[0][0]
var2 = dummy_layer.trainable_weights[0][1]
return var1 * K.mean(K.square(y_true-y_pred)) + var2 ** 2 # so var2 should get to zero, var1 should get to minus infinity?
return custom_loss
# my dummy loss function
def dummy_loss(y_true, y_pred):
return 0.0
# my dummy input, N X V, where V is 2 for 2 vars
dummy_x_train = np.random.normal(size=(X_train.shape[0], 2))
# model
inputs = Input(shape=(X_train.shape[1],))
dummy_input = Input(shape=(dummy_x_train.shape[1],))
hidden1 = Dense(10)(inputs) # here only 1 hidden layer in the "real" network, assume whatever network is built here
output = Dense(1)(hidden1)
dummy_output = Dense(1, use_bias=False)(dummy_input)
model = Model(inputs=[inputs, dummy_input], outputs=[output, dummy_output])
# compilation, notice zero loss for the dummy_output layer
model.compile(
loss=[custom_loss_builder(model.layers[-1]), dummy_loss],
loss_weights=[1.0, 0.0], optimizer= 'adam')
# run, notice y_train repeating for dummy_output layer, it will not be used, could have created dummy_y_train as well
history = model.fit([X_train, dummy_x_train], [y_train, y_train],
batch_size=32, epochs=100, validation_split=0.1, verbose=0,
callbacks=[EarlyStopping(monitor='val_loss', patience=5)])
It seems to work: indeed, whatever the starting values for var1 and var2 (the initialization of the dummy_output layer), they head towards minus infinity and 0 respectively:
(this plot comes from fitting the model iteratively and saving those two weights, as below)
var1_list = []
var2_list = []
for i in range(100):
if i % 10 == 0:
print('step %d' % i)
model.fit([X_train, dummy_x_train], [y_train, y_train],
batch_size=32, epochs=1, validation_split=0.1, verbose=0)
var1, var2 = model.layers[-1].get_weights()[0]
var1_list.append(var1.item())
var2_list.append(var2.item())
plt.plot(var1_list, label='var1')
plt.plot(var2_list, 'r', label='var2')
plt.legend()
plt.show()
Answering my own question here: after days of struggling I got it to work without the dummy input. I think this is much better and should be the "canonical" way until Keras/TF simplify the process. This is how the Keras/TF docs do it here.
The key to using a loss function with an external trainable variable is a custom loss/output Layer that calls self.add_loss(...) in its call() implementation, like so:
class MyLoss(Layer):
def __init__(self, var1, var2):
super(MyLoss, self).__init__()
self.var1 = K.variable(var1) # or tf.Variable(var1) etc.
self.var2 = K.variable(var2)
def get_vars(self):
return self.var1, self.var2
def custom_loss(self, y_true, y_pred):
return self.var1 * K.mean(K.square(y_true-y_pred)) + self.var2 ** 2
def call(self, y_true, y_pred):
self.add_loss(self.custom_loss(y_true, y_pred))
return y_pred
Now notice that the MyLoss layer needs two inputs: the actual y_true and the y predicted up to that point:
inputs = Input(shape=(X_train.shape[1],))
y_input = Input(shape=(1,))
hidden1 = Dense(10)(inputs)
output = Dense(1)(hidden1)
my_loss = MyLoss(0.5, 0.5)(y_input, output) # here can also initialize those var1, var2
model = Model(inputs=[inputs, y_input], outputs=my_loss)
model.compile(optimizer= 'adam')
Finally, as the TF docs mention, in this case you do not have to specify the loss or y in the fit() call:
history = model.fit([X_train, y_train], None,
batch_size=32, epochs=100, validation_split=0.1, verbose=0,
callbacks=[EarlyStopping(monitor='val_loss', patience=5)])
Again, notice that y_train comes into fit() as one of the inputs.
Now it works:
var1_list = []
var2_list = []
for i in range(100):
if i % 10 == 0:
print('step %d' % i)
model.fit([X_train, y_train], None,
batch_size=32, epochs=1, validation_split=0.1, verbose=0)
var1, var2 = model.layers[-1].get_vars()
var1_list.append(var1.numpy())
var2_list.append(var2.numpy())
plt.plot(var1_list, label='var1')
plt.plot(var2_list, 'r', label='var2')
plt.legend()
plt.show()
(I should also mention that this specific behavior of var1 and var2 depends strongly on their initial values; if var1's initial value is higher than 1, it will not in fact decrease towards minus infinity.)
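One practical follow-up, as a sketch with the names defined above (my addition, not part of the original answer): since y_input only feeds the loss, predictions can come from a separate inference model that reuses the trained layers:
# y_input is only needed to compute the training loss,
# so an inference model can omit it entirely.
pred_model = Model(inputs, output)
y_hat = pred_model.predict(X_test)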
I am using a multiple-output model in TensorFlow 2.0:
input_layer = layers.Input(shape=(INP_MAX_LENGTH), name="input")
embed_layer = layers.Embedding(EMBED_INP_DIM, embedding_size, name="embeddings", weights=[embedding_matrix], trainable=False, input_length=INP_MAX_LENGTH)(input_layer)
lstm_layer1 = layers.LSTM(1024, name="lstm1")(embed_layer)
lstm_layer2 = layers.LSTM(1024, name="lstm2")(embed_layer)
output_layer1 = layers.Dense(1, name="output1", activation='relu')(lstm_layer1)
output_layer2 = layers.Dense(1, name="output2", activation='relu')(lstm_layer2)
concat_layer = layers.Concatenate()([output_layer1, output_layer2])
output_layer3 = layers.Dense(1, name="output3", activation='relu')(concat_layer)
model = Model(inputs=input_layer, outputs=[output_layer1, output_layer2, output_layer3])
model.compile(optimizer='adagrad', loss={'output1': loss1, 'output2': loss1, 'output3': loss2})
I'm using quantile loss function as my loss1 and it is working fine.
I want my loss2 function to behave something like this
import tensorflow.keras.backend as K
def loss2(y_true, y_pred):
# y_true1 = y from output1
# y_true2 = y from output2
# y_pred1 = y from output_layer1
# y_pred2 = y from output_layer2
# loss = K.mean(y_true - K.sqrt(K.square(y_true2 - y_pred2) + K.square(y_true1 - y_pred1)), axis=-1)
return loss
In short, I'm trying to implement the distance formula as my loss function and drive the distance between the two points to 0.
Can I pass y_true and y_pred from output1 and output2 to the loss2 function? I tried using Concatenate to at least pass the y_preds, but somehow it's not working.
Yes, when you have multiple output nodes you can have loss functions that take in multiple predictions.
The basic idea is to always treat your output as a vector: when you pass it from your generator function it should be a list, and when it is passed to the loss function it should remain a list.
Rather than having different loss functions for each output node you should ideally have one loss function for all the outputs combined.
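As an illustration of that idea (a sketch that reuses the layer names from the question; the stacking convention and the exact loss are my assumptions, not the asker's code): concatenate the two per-head predictions into one output, stack the corresponding targets the same way, and slice them apart inside a single loss:
import tensorflow.keras.backend as K
from tensorflow.keras import layers, Model

# One combined head instead of separate outputs:
# column 0 comes from branch 1, column 1 from branch 2.
combined_out = layers.Concatenate(name="combined")([output_layer1, output_layer2])
model = Model(inputs=input_layer, outputs=combined_out)

def combined_loss(y_true, y_pred):
    # y_true is stacked the same way: y_true[:, 0] for head 1, y_true[:, 1] for head 2.
    d1 = K.square(y_true[:, 0] - y_pred[:, 0])
    d2 = K.square(y_true[:, 1] - y_pred[:, 1])
    return K.mean(K.sqrt(d1 + d2))  # distance between the predicted and true points

model.compile(optimizer='adagrad', loss=combined_loss)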
I am trying to use a custom loss-function which depends on some arguments that the model does not have.
The model has two inputs (mel_specs and pred_inp) and expects a labels tensor for training:
def to_keras_example(example):
# Preparing inputs
return (mel_specs, pred_inp), labels
# A tf.data.Dataset used with model.fit(train_data, ...)
train_data = load_dataset(fp, 'train').map(to_keras_example).repeat()
In my loss function I need to calculate the lengths of mel_specs and pred_inp. This means my loss looks like this:
def rnnt_loss_wrapper(y_true, y_pred, mel_specs_inputs_):
input_lengths = get_padded_length(mel_specs_inputs_[:, :, 0])
label_lengths = get_padded_length(y_true)
return rnnt_loss(
acts=y_pred,
labels=tf.cast(y_true, dtype=tf.int32),
input_lengths=input_lengths,
label_lengths=label_lengths
)
However, no matter which approach I choose, I run into an issue.
Option 1) Setting the loss-function in model.compile()
If I wrap the loss function such that it returns a function taking y_true and y_pred, like this:
def rnnt_loss_wrapper(mel_specs_inputs_):
def inner_(y_true, y_pred):
input_lengths = get_padded_length(mel_specs_inputs_[:, :, 0])
label_lengths = get_padded_length(y_true)
return rnnt_loss(
acts=y_pred,
labels=tf.cast(y_true, dtype=tf.int32),
input_lengths=input_lengths,
label_lengths=label_lengths
)
return inner_
model = create_model(hparams)
model.compile(
optimizer=optimizer,
loss=rnnt_loss_wrapper(model.inputs[0])
)
Here I get a _SymbolicException after calling model.fit():
tensorflow.python.eager.core._SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [...]
Option 2) Using model.add_loss()
The documentation of add_loss() states:
[Adds a..] loss tensor(s), potentially dependent on layer inputs.
..
Arguments:
losses: Loss tensor, or list/tuple of tensors. Rather than tensors, losses
may also be zero-argument callables which create a loss tensor.
inputs: Ignored when executing eagerly. If anything ...
So I tried to do the following:
def rnnt_loss_wrapper(y_true, y_pred, mel_specs_inputs_):
input_lengths = get_padded_length(mel_specs_inputs_[:, :, 0])
label_lengths = get_padded_length(y_true)
return rnnt_loss(
acts=y_pred,
labels=tf.cast(y_true, dtype=tf.int32),
input_lengths=input_lengths,
label_lengths=label_lengths
)
model = create_model(hparams)
model.add_loss(
rnnt_loss_wrapper(
y_true=model.inputs[2],
y_pred=model.outputs[0],
mel_specs_inputs_=model.inputs[0],
),
inputs=True
)
model.compile(
optimizer=optimizer
)
However, calling model.fit() throws a ValueError:
ValueError: No gradients provided for any variable: [...]
Is any of the above options supposed to work?
I have used the add_loss method as follows:
def custom_loss(y_true, y_pred, input_):
# custom loss function
y_estim = input_[...,0]*y_pred
shape = tf.cast(tf.shape(y_true)[1], dtype='float32')
return tf.reduce_mean(1/shape*tf.reduce_sum(tf.pow(y_true-y_estim, 2), axis=1))
mix_input = layers.Input(shape=(301, 257, 4)) # input 1
ref_input = layers.Input(shape=(301, 257, 1)) # input 2
target = layers.Input(shape=(301, 257)) # output target
smss_model = Model(inputs=[mix_input, ref_input], outputs=smss) # my model that accepts two inputs
model = Model(inputs=[mix_input, ref_input, target], outputs=smss) # this one used just to train the model, with the additional paramters
model.add_loss(custom_loss(target, smss, mix_input)) # the add_loss where to pass the custom loss function
model.summary()
model.compile(loss=None, optimizer='sgd')
model.fit([mix, ref, y], epochs=1, batch_size=1, verbose=1)
Even though I have used this method and it works, I am still looking for another approach that does not involve creating a separate training model.
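For completeness, a small sketch with the names defined above (my addition): because smss_model shares its layers with the training model, it can be used for prediction once training finishes, without the target input:
# smss_model was built on the same layers as the training model,
# so it already holds the trained weights and only needs the two real inputs.
estimates = smss_model.predict([mix, ref])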
Did using a lambda function work? (https://www.w3schools.com/python/python_lambda.asp)
loss = lambda x1, x2: rnnt_loss(x1, x2, acts, labels, input_lengths,
label_lengths, blank_label=0)
In this way your loss function is a function accepting the parameters x1 and x2, while rnnt_loss also remains aware of acts, labels, input_lengths, label_lengths and blank_label.