I am creating a custom loss in Keras. Let's assume we have the following:
def a_loss(X):
    a, b = X
    loss = . . .
    return loss

def mean_loss(y_true, y_pred):
    return K.mean(y_pred - 0 * y_true)
And the model goes something like:
...
z1 = Dense(shape1, activation="linear")(conv_something)
z2 = Dense(shape1, activation="linear")(conv_something2)
loss = a_loss([z1, z2])

model = Model(
    inputs=[input1, input2, ..],
    outputs=[loss])
model.compile(loss=mean_loss, optimizer=Adam())
Now this hypothetical model compiles normally. But when I have to use the trained model to predict something, I am using:
model.predict(X_dictionary)
I am assuming that the output of the above is the loss (the output of the a_loss function). Right? If not, correct me.
What I want the output of model.predict to be is z2. Searching the API, you can use multiple outputs:
model = Model(
    inputs=[sequence_input_desc, sequence_input_title_positive, sequence_input_title_negative],
    outputs=[loss, z2]
)
But the above will train to minimize both loss and z2. What I want is to train only to minimize loss, and for the predict function to output z2. One option, checking the docs, is to use loss_weights=[1.0, 0.0] in compile, but it doesn't work. It raises the error: The model expects 2 target arrays, but only received one array. Found: array with shape ..
Any idea how to do it?
After training is done you can simply create a new model that uses the same layers but has a different output:
model = Model(
    inputs=[input1, input2, ..],
    outputs=[z2])
It will re-use the learned weights as they are stored in the layers, not in the model (it is just a container).
You can then use model.predict to get the results as you would normally.
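For instance, reusing the hypothetical X_dictionary from the question:
z2_values = model.predict(X_dictionary)  # returns z2, not the loss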
Related
Is there a way to pass a feature to a Keras model as an input that is only accessed by a custom loss function, without affecting the model as an input feature? I only need the feature to calculate the loss, not to feed forward through the hidden layers of the network. (Basically, I want to feed the feature in as an input and extract it as-is as an output, alongside y_pred, so it can be accessed in the loss function.)
A worked example would be much appreciated.
If you are writing your own custom loss, you could pass the feature as an input and then, using a Lambda layer, make it bypass the network and concatenate it directly at the end. Something like the following:
from tensorflow.keras import layers, Model, utils

inp = layers.Input((11,))
x = layers.Lambda(lambda x: x[:, :-1])(inp)   # first 10 features go through the network
o2 = layers.Lambda(lambda x: x[:, -1:])(inp)  # last feature bypasses the network
x = layers.Dense(20)(x)
x = layers.Dense(20)(x)
o1 = layers.Dense(1)(x)
out = layers.concatenate([o1, o2])
model = Model(inp, out)

def custom_loss(y_true, y_pred):
    ...

utils.plot_model(model, show_shapes=True, show_layer_names=False)
Here the first 10 features are the ones you want to pass through the network, and the last feature is the one you want kept as-is for the custom loss. The final output is simply the concatenation of the network's prediction for the first 10 features and the untouched feature.
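As a minimal sketch of what the loss side could then look like (my illustration, not part of the original answer): the loss splits the two-column output back apart. I'm assuming y_true is padded to the same two-column shape with the real target in the first column, and the bypassed feature is used as a per-sample weight purely as an example:

import tensorflow as tf

def custom_loss(y_true, y_pred):
    pred = y_pred[:, :1]      # the network's prediction (o1)
    feature = y_pred[:, 1:]   # the feature that bypassed the network (o2)
    target = y_true[:, :1]    # the actual target
    # Hypothetical use of the feature, e.g. as a per-sample weight.
    return tf.reduce_mean(feature * tf.square(target - pred))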
If you want to know how to write a custom loss, please check this excellent SO post that explains it.
I have a CNN that I built using TensorFlow 2.0. I need to access the outputs of the intermediate layers. I was going over other Stack Overflow questions that were similar, but they all had solutions involving the Keras Sequential model.
I have tried using model.layers[index].output but I get
Layer conv2d has no inbound nodes.
I can post my code here (which is super long), but I am sure that even without it someone can point me to how this can be done using just TensorFlow 2.0 in eager mode.
I stumbled onto this question while looking for an answer, and it took me some time to figure out, as I use the model subclassing API in TF 2.0 by default (as in https://www.tensorflow.org/tutorials/quickstart/advanced).
If somebody is in a similar situation, all you need to do is assign the intermediate output you want as an attribute of the class. Then keep test_step without the @tf.function decorator and create a decorated copy of it, say val_step, for efficient internal computation of validation performance during training. As a short example, I have modified a few functions from the tutorial at the link accordingly. I'm assuming we need to access the output after flattening.
def call(self, x):
    x = self.conv1(x)
    x = self.flatten(x)
    self.intermediate = x  # assign it as an object attribute for accessing later
    x = self.d1(x)
    return self.d2(x)

# Remove the @tf.function decorator from test_step for prediction
def test_step(images, labels):
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)
    return

# Create a decorated val_step for the object's internal use during training
@tf.function
def val_step(images, labels):
    return test_step(images, labels)
Now when you run model.predict() after training, using the un-decorated test step, you can access the intermediate output using model.intermediate, which will be an EagerTensor whose value is obtained simply by model.intermediate.numpy(). However, if you don't remove the @tf.function decorator from test_step, this will return a Tensor whose value is not so straightforward to obtain.
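For context, a minimal version of the subclassed model with this change, using the layer names from the quickstart tutorial (a sketch, not the full tutorial code):

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, Flatten

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv1 = Conv2D(32, 3, activation='relu')
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10)

    def call(self, x):
        x = self.conv1(x)
        x = self.flatten(x)
        self.intermediate = x  # stored for access after a forward pass
        x = self.d1(x)
        return self.d2(x)

model = MyModel()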
Thanks for answering my earlier question. I wrote this simple example to illustrate how what you're trying to do might be done in TensorFlow 2.x, using the MNIST dataset as the example problem.
The gist of the approach:
Build an auxiliary model (aux_model in the example below), which is a so-called "functional model" with multiple outputs. The first output is the output of the original model and will be used for loss calculation and backprop, while the remaining output(s) are the intermediate-layer outputs that you want to access.
Use tf.GradientTape() to write a custom training loop and expose the detailed gradient values on each individual variable of the model. Then you can pick out the gradients that are of interest to you. This requires that you know the ordering of the model's variables. But that should be relatively easy for a sequential model.
import tensorflow as tf

(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
# Reshape to NHWC and scale to [0, 1] so the data matches the model's input.
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# This is the original model.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28, 1]),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

# Make an auxiliary model that exposes the output from the intermediate layer
# of interest, which is the first Dense layer in this case.
aux_model = tf.keras.Model(inputs=model.inputs,
                           outputs=model.outputs + [model.layers[1].output])

# Define a custom training loop using `tf.GradientTape()`, to make it easier
# to access gradients on specific variables (the kernel and bias of the first
# Dense layer in this case).
cce = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.optimizers.Adam()

with tf.GradientTape() as tape:
    # Do a forward pass on the model, retrieving the intermediate layer's output.
    y_pred, intermediate_output = aux_model(x_train)
    print(intermediate_output)  # Now you can access the intermediate layer's output.

    # Compute loss, to enable backprop.
    loss = cce(tf.one_hot(y_train, 10), y_pred)

# Do backprop. `gradients` here are for all variables of the model.
# But we know we want the gradients on the kernel and bias of the first
# Dense layer, which happens to be the first two variables of the model.
gradients = tape.gradient(loss, aux_model.variables)

# This is the gradient on the first Dense layer's kernel.
intermediate_layer_kernel_gradients = gradients[0]
print(intermediate_layer_kernel_gradients)

# This is the gradient on the first Dense layer's bias.
intermediate_layer_bias_gradients = gradients[1]
print(intermediate_layer_bias_gradients)

# Update the variables of the model.
optimizer.apply_gradients(zip(gradients, aux_model.variables))
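In practice you would run the tape block once per mini-batch rather than on the whole training set at once; a minimal sketch of that loop, reusing the names defined above:

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
for x_batch, y_batch in dataset:
    with tf.GradientTape() as tape:
        y_pred, intermediate_output = aux_model(x_batch)
        loss = cce(tf.one_hot(y_batch, 10), y_pred)
    gradients = tape.gradient(loss, aux_model.variables)
    optimizer.apply_gradients(zip(gradients, aux_model.variables))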
The most straightforward solution would go like this:
mid_layer = model.get_layer("layer_name")
A layer on its own cannot predict, but you can wrap it in a model that runs from the original input up to that layer's output, and then use it for prediction:
mid_model = Model(model.input, mid_layer.output)
mid_model.predict(X)
Oh, also, to get the name of a hidden layer, you can use this:
model.summary()
This will give you some insight into the layer inputs and outputs as well.
I am struggling to train a model in Keras by minimizing the loss between the correct data and "input * output", but I do not know how to approach it.
Given that
X: model input (training data)
Y: model output
T: correct data
model = Model(inputs=X, outputs=Y)
Then, in my understanding,
model.fit(X,T) trains the model to minimize the distance between Y(=model(X)) and T, according to the user-defined loss function.
My question is:
What if I want to minimize the distance between Y*X and T?
I thought writing something like model.fit(X * model.predict(X), T) would work, but it did not.
I wonder how to write the code to do that.
Thank you for the advice in advance.
Make a functional API model:
inputs = Input(input_shape)
outputs = SomeLayer(...)(inputs)
outputs = SomeLayer(...)(outputs)
outputs = SomeLayer(...)(outputs)
....
# The final output is the element-wise product of the input and the network's
# output, so fitting against T minimizes the distance between Y * X and T.
outputs = Multiply()([inputs, outputs])

model = Model(inputs, outputs)
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
model.fit(X, T, ...)
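A concrete, self-contained version of this pattern (the layer sizes and the random data here are placeholders I made up for illustration):

import numpy as np
from tensorflow.keras.layers import Input, Dense, Multiply
from tensorflow.keras.models import Model

inputs = Input((10,))
h = Dense(32, activation="relu")(inputs)
y = Dense(10)(h)                    # Y: the network's raw output
outputs = Multiply()([inputs, y])   # the model now outputs Y * X

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

X = np.random.rand(256, 10)
T = np.random.rand(256, 10)   # dummy "correct data"
model.fit(X, T, epochs=2, batch_size=32)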
I am trying to implement an autoencoder in Keras that not only minimizes the reconstruction error but its constructed features should also maximize a measure I define. I don't really have an idea of how to do this.
Here's a snippet of what I have so far:
corrupt_data = self._corrupt(self.data, 0.1)

# define encoder-decoder network structure
# create input layer
input_layer = Input(shape=(corrupt_data.shape[1],))
encoded = Dense(self.encoding_dim, activation="relu")(input_layer)
decoded = Dense(self.data.shape[1], activation="sigmoid")(encoded)

# create autoencoder
dae = Model(input_layer, decoded)

# define custom multitask loss with wlm measure
def multitask_loss(y_true, y_pred):
    # extract learned features from hidden layer
    learned_fea = Model(input_layer, encoded).predict(self.data)
    # additional measure I want to optimize from an external function
    wlm_measure = wlm.measure(learned_fea, self.labels)
    cross_entropy = losses.binary_crossentropy(y_true, y_pred)
    return wlm_measure + cross_entropy

# compile and train
dae.compile(optimizer=self.optimizer, loss=multitask_loss)
dae.fit(corrupt_data, self.data,
        epochs=self.epochs, batch_size=20, shuffle=True,
        callbacks=[tensorboard])

# separately create an encoder model
encoder = Model(input_layer, encoded)
Currently this does not work properly... When I viewed the training history, the model seemed to ignore the additional measure and train only on the cross-entropy loss. Also, if I change the loss function to consider only the wlm measure, I get the error "numpy.float64" object has no attribute "get_shape" (I don't know if changing my wlm function's return type to a tensor would help).
There are a few places where I think things may have gone wrong. I don't know if I am extracting the outputs of the hidden layer correctly in my custom loss function. I also don't know if my wlm.measure function is outputting correctly: whether it should output a numpy.float32 or a 1-dimensional tensor of type float32.
Basically a conventional loss function only cares about the output layer's predicted labels and the true labels. In my case, I also need to consider the hidden layer's output (activation), which is not that straightforward to implement in Keras.
Thanks for the help!
You don't want to define your learned_fea Model inside your custom loss function. Rather, you should define a single model upfront with two outputs: the output of the decoder (the reconstruction) and the output of the encoder (the feature representation):
multi_output_model = Model(inputs=input_layer, outputs=[decoded, encoded])
Now you can write a custom loss function that only applies to the output of the encoder:
def custom_loss(y_true, y_pred):
    return wlm.measure(y_pred, y_true)
Upon compiling the model, you pass a list of loss functions (or a dictionary if you name your tensors):
multi_output_model.compile(loss=['binary_crossentropy', custom_loss], optimizer=...)
And fit the model by passing a list of outputs:
multi_output_model.fit(x=X, y=[data_to_be_reconstructed, labels_for_wlm_measure])
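If you prefer the dictionary form mentioned above, name the output layers and key the losses and targets by those names. A sketch, assuming you add name="decoded" and name="encoded" to the respective Dense layers (the names are my choice for illustration):

multi_output_model.compile(
    optimizer="adam",
    loss={"decoded": "binary_crossentropy", "encoded": custom_loss})

multi_output_model.fit(
    x=X,
    y={"decoded": data_to_be_reconstructed, "encoded": labels_for_wlm_measure})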
I'm wondering if it's possible to add a custom model to a loss function in keras. For example:
from keras.layers import Input, Dense, Flatten
from keras.models import Model
from keras import backend as K

def model_loss(y_true, y_pred):
    inp = Input(shape=(128, 128, 1))
    x = Dense(2)(inp)
    x = Flatten()(x)
    model = Model(inputs=[inp], outputs=[x])

    a = model(y_pred)
    b = model(y_true)

    # calculate MSE
    mse = K.mean(K.square(a - b))
    return mse
This is a simplified example. I'll actually be using a VGG net in the loss, so I'm just trying to understand the mechanics of Keras.
The usual way of doing that is appending your VGG to the end of your model, making sure all its layers have trainable=False before compiling.
Then you recalculate your Y_train.
Suppose you have these models:
mainModel - the one you want to apply a loss function
lossModel - the one that is part of the loss function you want
Create a new model appending one to another:
from keras.models import Model
lossOut = lossModel(mainModel.output)  # pass the output of one model into the other
fullModel = Model(mainModel.input, lossOut)  # create a training model that follows this path in the graph
This model will have the exact same weights as mainModel and lossModel, and training it will affect the other models.
Make sure lossModel is not trainable before compiling:
lossModel.trainable = False
for l in lossModel.layers:
    l.trainable = False

fullModel.compile(loss='mse', optimizer=....)
Now adjust your data for training:
fullYTrain = lossModel.predict(originalYTrain)
And finally do the training:
fullModel.fit(xTrain, fullYTrain, ....)
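Once training is done, you can use mainModel on its own for prediction: its layers hold the updated weights, since fullModel merely chains the two existing models together.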
This is old, but I'm going to answer it because no one did directly. You definitely can call another model in a custom loss, and I actually think it's much easier than appending the model to the end of your main model, creating a whole new one, and a whole new set of training labels.
Here is an example that calls both a model and an outside function that we define:
import tensorflow as tf

def normalize_tensor(in_feat):
    norm_factor = tf.math.sqrt(tf.keras.backend.sum(in_feat**2, axis=-1, keepdims=True))
    return in_feat / (norm_factor + 1e-10)

def VGGLoss(y_true, y_pred):
    true = vgg(preprocess_input(y_true * 255))
    pred = vgg(preprocess_input(y_pred * 255))

    t = normalize_tensor(true)
    p = normalize_tensor(pred)

    vggLoss = tf.math.reduce_mean(tf.math.square(t - p))
    return vggLoss
vgg() just calls the vgg16 model with no head.
preprocess_input is a keras function that normalizes inputs to be used in the vgg model (here we are assuming your model outputs an image in 0-1 range, then we multiply by 255 to get 0-255 range for vgg).
normalize_tensor takes the vgg activations and makes them have a magnitude of 1 for each channel, otherwise your loss will be massive.
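For completeness, here is one way vgg and preprocess_input might be set up; the input shape and frozen weights are my assumptions, not part of the original answer:

from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# VGG16 without its classification head; frozen so the loss network
# is not updated during training.
vgg = VGG16(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
vgg.trainable = False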