I am using Keras with TensorFlow and I have a model with 3 outputs, of which I only want to train 2.
model = Model(input=input, output=[out1,out2,out3])
model.compile(loss=[loss1, loss2, loss3], optimizer=my_optimizer)
def loss1(y_true, y_pred):
    return calculate_loss1(y_true, y_pred)
def loss2(y_true, y_pred):
    return calculate_loss2(y_true, y_pred)
def loss3(y_true, y_pred):
    # Zero-weighted term: contributes nothing to the total loss
    return 0.0*K.mean(y_pred)
I tried to do it with the code above, but I am not sure it does what I want. I think it adds up the losses and trains each output with the combined loss, whereas I do not want to train out3 at all. (I need out3 because it is used in testing.) Could anybody tell me how to achieve this, or reassure me that the code actually does what I want?
You have to create 2 different models like this
model1 = Model(input=input, output=[out1,out2])
model2 = Model(input=input, output=[out1,out2,out3])
You compile both but only fit the first. They share layers, so model2, even though it was never trained, will have the weights learned through model1. But if there is a trainable layer feeding out3 that is not on the path between input and out1 or out2 in the graph, that layer won't be trained and will keep its initial values.
Does that help? :-)
You can set one of the losses to None:
model = Model(input=input, output=[out1,out2,out3])
model.compile(loss=[loss1, loss2, None], optimizer=my_optimizer)
def loss1(y_true, y_pred):
    return calculate_loss1(y_true, y_pred)
def loss2(y_true, y_pred):
    return calculate_loss2(y_true, y_pred)
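As a rough illustration of why a zero-weighted third loss (the approach in the question) leaves the out3-only weights untouched, here is a minimal sketch in plain Python so that it runs without TensorFlow. The scalar "network" is hypothetical: `w` stands for a shared layer, `a1` for a head that only feeds out1, and `a3` for a head that only feeds out3. It checks the gradients numerically rather than reproducing what Keras does internally.

```python
def total_loss(w, a1, a3, x=2.0, t=1.0, weight3=0.0):
    """Toy two-headed model: a shared weight w, then heads a1 and a3."""
    out1 = a1 * w * x           # trained output
    out3 = a3 * w * x           # output we only need at test time
    loss1 = (out1 - t) ** 2     # real loss on out1
    loss3 = weight3 * out3      # zero-weighted loss on out3
    return loss1 + loss3


def numeric_grad(f, value, eps=1e-6):
    """Central finite difference of f at value."""
    return (f(value + eps) - f(value - eps)) / (2 * eps)


w, a1, a3 = 0.5, 1.5, -0.7

# Gradient with respect to the head that only feeds out3
grad_a3 = numeric_grad(lambda v: total_loss(w, a1, v), a3)
# Gradient with respect to the shared weight, with and without the loss3 term
grad_w = numeric_grad(lambda v: total_loss(v, a1, a3), w)
grad_w_loss1_only = numeric_grad(lambda v: (a1 * v * 2.0 - 1.0) ** 2, w)

print(grad_a3, grad_w, grad_w_loss1_only)
```

The head that only feeds out3 receives a zero gradient, and the shared weight receives exactly the gradient of loss1. Note that this only covers the gradient itself; an optimizer with weight decay or leftover momentum could still move a weight whose gradient is zero, which is one reason passing None or using a separate model is the cleaner option.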
Related
I want to train a classification model with two losses as follows:
model.compile(optimizer=adam)
#tf.function
def train(model, inputs_data_1, inputs_data_2, y):
    with tf.GradientTape(persistent=True) as tape:
        logits1, features1 = model(inputs_data_1)  # logits: output of fully-connected layer
        logits2, features2 = model(inputs_data_2)  # features: output of feature extractor
        loss_fn1 = cross_entropy(y, logits1)
        loss_fn2 = euclidean_dist(features1 - features2)
        losses = loss_fn1 + loss_fn2
    optim.apply_gradients(zip(tape.gradient(losses, model.trainable_weights), model.trainable_variables))
When I try this, it just stops without an error.
I didn't change the input data with tf.split or tf.reshape.
How can I compile the model and train it with two losses?
Please give me some suggestions or a code implementation I can use as a reference for this problem. Thank you.
I am struggling to train the model in Keras, by minimizing the loss between the correct data and "input*output", but do not know how to deal with it.
Given that
X: model input (training data)
Y: model output
T: correct data
model = Model(inputs=X, outputs=Y)
Then, in my understanding,
model.fit(X,T) trains the model to minimize the distance between Y(=model(X)) and T, according to the user-defined loss function.
My question is:
What if I want to minimize the distance between Y*X and T?
I thought writing something like "model.fit(X * model.predict(X), T)" would work. (It did not, actually.)
I wonder how to write the code to do that.
Thank you for the advice in advance.
Make a functional API model:
inputs = Input(input_shape)
outputs = SomeLayer(...)(inputs)
outputs = SomeLayer(...)(outputs)
outputs = SomeLayer(...)(outputs)
....
outputs = Multiply()([inputs, outputs])
model = Model(inputs, outputs)
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
model.fit(X, T, ...)
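To make the idea concrete without depending on a Keras install, here is a minimal plain-Python sketch of what the Multiply layer achieves: the loss is computed on X * f(X) rather than on f(X), and the gradient flows through the product. The one-dimensional linear "network" f(x) = w*x + b, the data, the learning rate and the step count are all assumptions made for illustration.

```python
def fit_product_model(xs, ts, lr=0.05, steps=5000):
    """Fit f(x) = w*x + b so that x * f(x) matches t (mean squared error)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, ts):
            y = w * x + b        # network output Y = f(X)
            err = x * y - t      # the loss compares X * Y with T
            grad_w += 2.0 * err * x * x / n   # d(err^2)/dw
            grad_b += 2.0 * err * x / n       # d(err^2)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Targets generated from a known f(x) = 2x + 1, then multiplied by x
xs = [0.5, 1.0, 1.5, 2.0]
ts = [x * (2.0 * x + 1.0) for x in xs]

w, b = fit_product_model(xs, ts)
print(w, b)
```

The fitted parameters should approach w = 2 and b = 1, which is the f that makes x * f(x) equal to the targets. This is also why model.fit(X * model.predict(X), T) cannot work: predict returns plain numbers, so no gradient reaches the model through the product.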
Suppose we have a simple Keras model that uses BatchNormalization:
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(1,)),
    tf.keras.layers.BatchNormalization()
])
How do I actually use it with GradientTape? The following doesn't seem to work, as it doesn't update the moving averages:
# model training... we want the output values to be close to 150
for i in range(1000):
    x = np.random.randint(100, 110, 10).astype(np.float32)
    with tf.GradientTape() as tape:
        y = model(np.expand_dims(x, axis=1))
        loss = tf.reduce_mean(tf.square(y - 150))
    grads = tape.gradient(loss, model.variables)
    opt.apply_gradients(zip(grads, model.variables))
In particular, if you inspect the moving averages, they remain the same (inspect model.variables; the averages are always 0 and 1). I know one can use .fit() and .predict(), but I would like to use GradientTape and I'm not sure how to do this. Some versions of the documentation suggest running update_ops, but that doesn't seem to work in eager mode.
In particular, the following code will not output anything close to 150 after the above training.
x = np.random.randint(200, 210, 100).astype(np.float32)
print(model(np.expand_dims(x, axis=1)))
In GradientTape mode, the BatchNormalization layer should be called with the argument training=True.
Example:
inp = KL.Input( (64,64,3) )
x = inp
x = KL.Conv2D(3, kernel_size=3, padding='same')(x)
x = KL.BatchNormalization()(x, training=True)
model = KM.Model(inp, x)
Then the moving variables are properly updated:
>>> model.layers[2].weights[2]
<tf.Variable 'batch_normalization/moving_mean:0' shape=(3,) dtype=float32, numpy=array([-0.00062087, 0.00015137, -0.00013239], dtype=float32)>
I just give up. I spent quite a bit of time trying to make sense of a model that looks like:
model = tf.keras.Sequential([
    tf.keras.layers.BatchNormalization(),
])
And I do give up because that thing looks like that:
My intuition was that BatchNorm these days is not as straightforward as it used to be, and that is why it scales the original distribution but not so much a new distribution (which is a shame), but ain't nobody got time for that.
Edit: the reason for that behavior is that BN only computes batch moments and normalizes with them during training. During training it also maintains running averages of the mean and variance, and once you switch to evaluation those running averages are used as constants. Evaluation should not depend on batch statistics, because it may be run on a single input. Since the constants were estimated on a different distribution, you get a higher error during evaluation.
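The train/evaluation split described above can be sketched in plain Python. This is a simplified stand-in, not the Keras implementation: `TinyBatchNorm` is a hypothetical class with no learnable scale or offset, and it assumes the Keras defaults of momentum 0.99 and epsilon 0.001, with the moving mean starting at 0 and the moving variance at 1.

```python
def batch_stats(batch):
    """Mean and (population) variance of a list of floats."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return mean, var


class TinyBatchNorm:
    """Simplified batch normalization over a 1-D batch (no gamma/beta)."""

    def __init__(self, momentum=0.99, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, batch, training):
        if training:
            # Normalize with batch statistics and update the running averages
            mean, var = batch_stats(batch)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1.0 - m) * mean
            self.moving_var = m * self.moving_var + (1.0 - m) * var
        else:
            # Normalize with the stored running averages; nothing is updated
            mean, var = self.moving_mean, self.moving_var
        return [(v - mean) / (var + self.epsilon) ** 0.5 for v in batch]


batch = [float(v) for v in range(100, 110)]
frozen = TinyBatchNorm()
updated = TinyBatchNorm()
for _ in range(1000):
    frozen(batch, training=False)   # like calling the layer in inference mode
    updated(batch, training=True)   # like calling it with training=True

print(frozen.moving_mean, frozen.moving_var)
print(updated.moving_mean, updated.moving_var)
```

The instance that is only ever called in inference mode keeps its moving averages at 0 and 1, which matches what the question observed, while the one called with training=True drifts toward the batch mean and variance.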
With Gradient Tape mode, you would usually find gradients like:
with tf.GradientTape() as tape:
    y_pred = model(features)
    loss = your_loss_function(y_pred, y_true)
gradients = tape.gradient(loss, model.trainable_variables)
train_op = model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
However, if your model contains a BatchNormalization or Dropout layer (or any layer that has different train/test phases), then tf will fail building the graph.
A good practice is to pass the training argument explicitly when calling the model. When optimizing, use model(features, training=True), and when predicting, use model(features, training=False), so that the train/test phase is chosen explicitly for such layers.
For PREDICT and EVAL phase, use
training = (mode == tf.estimator.ModeKeys.TRAIN)
y_pred = model(features, training=training)
For TRAIN phase, use
with tf.GradientTape() as tape:
    y_pred = model(features, training=training)
    loss = your_loss_function(y_pred, y_true)
gradients = tape.gradient(loss, model.trainable_variables)
train_op = model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
Note that, iperov's answer works as well, except that you will need to set the training phase manually for those layers.
x = BatchNormalization()(x, training=True)
x = Dropout(rate=0.25)(x, training=True)
x = BatchNormalization()(x, training=False)
x = Dropout(rate=0.25)(x, training=False)
I'd recommend having one get_model function that returns the model, and switching the phase with the training parameter when calling the model.
Note:
If you use model.variables when finding gradients, you'll get this warning
Gradients do not exist for variables
['layer_1_bn/moving_mean:0',
'layer_1_bn/moving_variance:0',
'layer_2_bn/moving_mean:0',
'layer_2_bn/moving_variance:0']
when minimizing the loss.
This can be resolved by computing gradients only against trainable variables. Replace model.variables with model.trainable_variables
I want to use BERT model to do multi-label classification with Tensorflow.
To do so, I want to adapt the example run_classifier.py from BERT github repository, which is an example on how to use BERT to do simple classification, using the pre-trained weights given by Google Research. (For example with BERT-Base, Cased)
I have X different labels, each with a value of either 0 or 1, so I want to add to the original BERT model a new Dense layer of size X and use the sigmoid_cross_entropy_with_logits loss function.
So, for the theoretical part I think I am OK.
The problem is that I don't know how I can append a new output layer and retrain only this new layer with my dataset, using the existing BertModel class.
Here is the original create_model() function from run_classifier.py where I guess I have to do my modifications. But I am a bit lost on what to do.
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
    """Creates a classification model."""
    model = modeling.BertModel(
        config=bert_config,
        is_training=is_training,
        input_ids=input_ids,
        input_mask=input_mask,
        token_type_ids=segment_ids,
        use_one_hot_embeddings=use_one_hot_embeddings)
    output_layer = model.get_pooled_output()
    hidden_size = output_layer.shape[-1].value
    output_weights = tf.get_variable(
        "output_weights", [num_labels, hidden_size],
        initializer=tf.truncated_normal_initializer(stddev=0.02))
    output_bias = tf.get_variable(
        "output_bias", [num_labels], initializer=tf.zeros_initializer())
    with tf.variable_scope("loss"):
        if is_training:
            # I.e., 0.1 dropout
            output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
        logits = tf.matmul(output_layer, output_weights, transpose_b=True)
        logits = tf.nn.bias_add(logits, output_bias)
        probabilities = tf.nn.softmax(logits, axis=-1)
        log_probs = tf.nn.log_softmax(logits, axis=-1)
        one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
        per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
        loss = tf.reduce_mean(per_example_loss)
        return (loss, per_example_loss, logits, probabilities)
And here is the same function with some of my modifications, but some things are missing (and some are probably wrong too?)
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids, labels, num_labels):
    """Creates a classification model."""
    model = modeling.BertModel(
        config=bert_config,
        is_training=is_training,
        input_ids=input_ids,
        input_mask=input_mask,
        token_type_ids=segment_ids)
    output_layer = model.get_pooled_output()
    hidden_size = output_layer.shape[-1].value
    output_weights = tf.get_variable("output_weights", [num_labels, hidden_size], initializer=tf.truncated_normal_initializer(stddev=0.02))
    output_bias = tf.get_variable("output_bias", [num_labels], initializer=tf.zeros_initializer())
    with tf.variable_scope("loss"):
        if is_training:
            # I.e., 0.1 dropout
            output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
        logits = tf.matmul(output_layer, output_weights, transpose_b=True)
        logits = tf.nn.bias_add(logits, output_bias)
        probabilities = tf.nn.softmax(logits, axis=-1)
        log_probs = tf.nn.log_softmax(logits, axis=-1)
        per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
        loss = tf.reduce_mean(per_example_loss)
        return (loss, per_example_loss, logits, probabilities)
The other things I have adapted in the code, and with which I had no problem:
DataProcessor to load and parse my custom dataset
Changing the type of labels variable from numerical values to arrays everywhere it is used
So, if anyone knows what I should do to resolve my problem, or can even point out some obvious mistake I may have made, I would be glad to hear it.
Notes:
I found this article that corresponds pretty well to what I am trying to do, but it uses PyTorch, and I cannot translate it into TensorFlow.
You want to replace the softmax that models a single distribution over possible outputs (all scores sum up to one) with sigmoid which models an independent distribution for each class (there is yes/no distribution for each output).
So, you correctly change the loss function, but you also need to change how you compute the probabilities. It should be:
probabilities = tf.sigmoid(logits)
In this case, you don't need the log_probs.
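The difference between the two ways of turning logits into probabilities can be checked with a small plain-Python sketch. The logit values are made up for illustration, and `sigmoid_cross_entropy` is a hand-written scalar version of the numerically stable formula given in the TensorFlow documentation for sigmoid_cross_entropy_with_logits, not a call into the library.

```python
import math


def softmax(logits):
    """One distribution over all classes: the scores sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    """Independent yes/no probability for a single class."""
    return 1.0 / (1.0 + math.exp(-v))


def sigmoid_cross_entropy(label, logit):
    """Stable form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


logits = [2.0, 1.5, -3.0]   # hypothetical logits for three labels
softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(v) for v in logits]

print(softmax_probs, sum(softmax_probs))
print(sigmoid_probs, sum(sigmoid_probs))
```

With softmax the first two labels compete for the same probability mass, so they cannot both be close to 1. With sigmoid each label is scored on its own, so both can exceed 0.5 and the scores no longer sum to 1, which is what multi-label classification needs.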
I am trying to use Huber loss in a Keras model (writing a DQN), but I am getting bad results; I think I am doing something wrong. My code is below.
model = Sequential()
model.add(Dense(output_dim=64, activation='relu', input_dim=state_dim))
model.add(Dense(output_dim=number_of_actions, activation='linear'))
loss = tf.losses.huber_loss(delta=1.0)
model.compile(loss=loss, opt='sgd')
return model
I came here with the exact same question. The accepted answer uses logcosh which may have similar properties, but it isn't exactly Huber Loss. Here's how I implemented Huber Loss for Keras (note that I'm using Keras from Tensorflow 1.5).
import numpy as np
import tensorflow as tf

# Huber loss.
# https://jaromiru.com/2017/05/27/on-using-huber-loss-in-deep-q-learning/
# https://en.wikipedia.org/wiki/Huber_loss
def huber_loss(y_true, y_pred, clip_delta=1.0):
    error = y_true - y_pred
    cond = tf.keras.backend.abs(error) < clip_delta
    squared_loss = 0.5 * tf.keras.backend.square(error)
    linear_loss = clip_delta * (tf.keras.backend.abs(error) - 0.5 * clip_delta)
    return tf.where(cond, squared_loss, linear_loss)

# Same as above but returns the mean loss.
def huber_loss_mean(y_true, y_pred, clip_delta=1.0):
    return tf.keras.backend.mean(huber_loss(y_true, y_pred, clip_delta))
Depending on whether you want the element-wise loss or the mean of the loss, use the corresponding function above.
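For a quick sanity check of the piecewise definition used above, here is a scalar version in plain Python. `huber_scalar` is a hypothetical helper written for this illustration; it follows the same quadratic and linear branches as the TensorFlow code but works on a single error value.

```python
def huber_scalar(error, delta=1.0):
    """Huber loss for one error value: quadratic near zero, linear beyond delta."""
    if abs(error) < delta:
        return 0.5 * error ** 2
    return delta * (abs(error) - 0.5 * delta)


small = huber_scalar(0.5)      # inside the quadratic region
large = huber_scalar(3.0)      # inside the linear region
negative = huber_scalar(-3.0)  # the loss is symmetric in the error
edge = huber_scalar(1.0)       # the two branches agree at |error| == delta

print(small, large, negative, edge)
```

The linear branch is what keeps large TD errors in DQN from producing very large gradients, while small errors are still penalized quadratically.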
You can wrap Tensorflow's tf.losses.huber_loss in a custom Keras loss function and then pass it to your model.
The reason for the wrapper is that Keras will only pass y_true, y_pred to the loss function, and you likely want to also use some of the many parameters to tf.losses.huber_loss. So, you'll need some kind of closure like:
def get_huber_loss_fn(**huber_loss_kwargs):
    def custom_huber_loss(y_true, y_pred):
        return tf.losses.huber_loss(y_true, y_pred, **huber_loss_kwargs)
    return custom_huber_loss

# Later...
model.compile(
    loss=get_huber_loss_fn(delta=0.1)
    ...
)
I was looking through the Keras losses. Apparently logcosh has the same properties as Huber loss. More details of their similarity can be seen here.
How about:
loss=tf.keras.losses.Huber(delta=100.0)