I am trying to train an image denoiser network with Keras (TensorFlow 2). For the loss function, I want to use something like (a1 * L1_loss + a2 * L2_loss), where a1 and a2 are trainable, meaning that after I give them initial values, they get updated in each training iteration. But I have been stuck on this for a while and don't know how I should implement it.
Here is some example code:
model_input = Input(shape=self.input_shape)
l1_weight = tf.Variable(0.5, trainable=True, name='L1_Loss_weight')
l2_weight = tf.Variable(0.5, trainable=True, name='L2_Loss_weight')

model_output = Conv3D(filters=self.filters, kernel_size=self.kernel_size, padding='same')(model_input)
self.model = Model(inputs=model_input,
                   outputs=model_output)

optimizer = tf.keras.optimizers.SGD()
model_loss = mixed_loss(L1_weight=l1_weight, L2_weight=l2_weight)
self.model.compile(optimizer=optimizer,
                   loss=model_loss)
where my loss function is defined as
def mixed_loss(L1_weight, L2_weight):
    def mixed(y_true, y_pred):
        return L1_weight * mean_absolute_error(y_true, y_pred) + L2_weight * mean_squared_error(y_true, y_pred)
    return mixed
I then call fit() with a tf.data.Dataset containing the training data to do the training.
Although I can add the two weight parameters this way, they are untrainable and don't change during training. I would really appreciate hints or examples if anyone has ideas about this problem. Any help is appreciated!
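(One possible direction, shown here only as a rough, unverified sketch: subclass keras.Model and override train_step so that the two weights are attributes of the model and therefore included in trainable_variables. Note that without a constraint, e.g. a softmax over the weights, the optimizer can simply push both weights toward zero to shrink the loss.)

import tensorflow as tf

class WeightedLossModel(tf.keras.Model):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # the loss weights live on the model, so they show up in trainable_variables
        self.l1_weight = tf.Variable(0.5, trainable=True, name='L1_Loss_weight')
        self.l2_weight = tf.Variable(0.5, trainable=True, name='L2_Loss_weight')

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            # a1 * L1 + a2 * L2, computed directly from the variables above
            loss = (self.l1_weight * tf.reduce_mean(tf.abs(y - y_pred))
                    + self.l2_weight * tf.reduce_mean(tf.square(y - y_pred)))
        # gradients flow into the network weights and the two loss weights
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {'loss': loss}

# built like the functional model above, then trained without a compiled loss:
# self.model = WeightedLossModel(inputs=model_input, outputs=model_output)
# self.model.compile(optimizer=tf.keras.optimizers.SGD())
# self.model.fit(dataset)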
I am working on a multiple-output model in Keras. I've implemented two custom metrics, auroc and auprc, that are passed to the compile method of the Keras model:
def auc(y_true, y_pred, curve='PR'):
    score, up_opt = tf.compat.v1.metrics.auc(y_true, y_pred, curve=curve, summation_method="careful_interpolation")
    K.get_session().run(tf.local_variables_initializer())
    with tf.control_dependencies([up_opt]):
        score = tf.identity(score)
    return score

def auprc(y_true, y_pred):
    return auc(y_true, y_pred, curve='PR')

def auroc(y_true, y_pred):
    return auc(y_true, y_pred, curve='ROC')

mlp_model.compile(loss=...,
                  optimizer=...,
                  metrics=[auprc, auroc])
Using this method, I obtain auprc/auroc values for every output, but to optimize my hyperparameters with a Bayesian optimizer I need a single metric (e.g. the average or the sum of the auprc over all outputs). I can't figure out how to join my metrics into a single one.
EDIT: here is an example of the desired result.
Currently, for every epoch the following metrics are printed:
out1_auprc: 0.0267 - out2_auprc: 0.0277 - out3_auprc: 0.0294
where out1, out2, out3 are my neural network outputs. I would like to obtain something like:
average_auprc: 0.0279 - out1_auprc: 0.0267 - out2_auprc: 0.0277 - out3_auprc: 0.0294
I am using Keras Tuner for Bayesian Optimization.
Any help is appreciated, thank you.
I worked around the problem by creating a custom callback:
class MergeMetrics(Callback):

    def __init__(self, **kargs):
        super(MergeMetrics, self).__init__(**kargs)

    def on_epoch_begin(self, epoch, logs={}):
        return

    def on_epoch_end(self, epoch, logs={}):
        logs['merge_metrics'] = 0.5*logs["y1_mse"] + 0.5*logs["y2_mse"]
I use this callback to merge the 2 metrics coming from the 2 different outputs. I use a simple problem as an example, but you can easily integrate it into your problem and extend it to a validation set.
Here is the dummy example:
X = np.random.uniform(0,1, (1000,10))
y1 = np.random.uniform(0,1, 1000)
y2 = np.random.uniform(0,1, 1000)
inp = Input((10))
x = Dense(32, activation='relu')(inp)
out1 = Dense(1, name='y1')(x)
out2 = Dense(1, name='y2')(x)
m = Model(inp, [out1,out2])
m.compile('adam','mae', metrics='mse')
checkpoint = MergeMetrics()
m.fit(X, [y1,y2], epochs=10, callbacks=[checkpoint])
The printed output:
loss: ..... y1_mse: 0.0863 - y2_mse: 0.0875 - merge_metrics: 0.0869
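If you also pass validation data to fit(), the same trick extends to the val_-prefixed entries that Keras adds to logs; a possible extension of the callback (untested sketch):

from tensorflow.keras.callbacks import Callback

class MergeMetrics(Callback):

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        logs['merge_metrics'] = 0.5*logs["y1_mse"] + 0.5*logs["y2_mse"]
        # the val_ keys only exist when validation_data / validation_split is used
        if "val_y1_mse" in logs:
            logs['val_merge_metrics'] = 0.5*logs["val_y1_mse"] + 0.5*logs["val_y2_mse"]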
I am trying to implement a very simple keras model that uses Knowledge Distillation [1] from another model.
Roughly, I need to replace the original loss L(y_true, y_pred) by L(y_true, y_pred)+L(y_teacher_pred, y_pred) where y_teacher_pred is the prediction of another model.
I've tried to do
def create_student_model_with_distillation(teacher_model):
    inp = tf.keras.layers.Input(shape=(21,))
    model = tf.keras.models.Sequential()
    model.add(inp)
    model.add(...)
    model.add(tf.keras.layers.Dense(units=1))

    teacher_pred = teacher_model(inp)

    def my_loss(y_true, y_pred):
        loss = tf.keras.losses.mean_squared_error(y_true, y_pred)
        loss += tf.keras.losses.mean_squared_error(teacher_pred, y_pred)
        return loss

    model.compile(loss=my_loss, optimizer='adam')
    return model
However, when I try to call fit on my model, I am getting
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
How can I solve this issue?
Refs
[1] https://arxiv.org/abs/1503.02531
Actually, this blog post is the answer to your question: keras blog
But in short: you should use the new TF2 API and run the teacher's forward pass before the tf.GradientTape() block:
def train_step(self, data):
    # Unpack data
    x, y = data

    # Forward pass of teacher
    teacher_predictions = self.teacher(x, training=False)

    with tf.GradientTape() as tape:
        # Forward pass of student
        student_predictions = self.student(x, training=True)

        # Compute losses
        student_loss = self.student_loss_fn(y, student_predictions)
        distillation_loss = self.distillation_loss_fn(
            tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
            tf.nn.softmax(student_predictions / self.temperature, axis=1),
        )
        loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss
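For completeness, the rest of that train_step (roughly following the linked blog post; treat it as a sketch, not your exact code) computes gradients only with respect to the student's weights, outside the GradientTape block, and applies them:

    # Back outside the tf.GradientTape() block:
    # compute gradients w.r.t. the student's weights only
    trainable_vars = self.student.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)

    # update the student's weights
    self.optimizer.apply_gradients(zip(gradients, trainable_vars))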
I want to write a custom loss function in Keras which depends on an attribute of a (custom) layer in the network.
The idea is the following:
I have a custom layer which modifies the input in each epoch based on a random variable
The output labels should be modified based on the same variable
Some example code to make it more clear:
import numpy as np
from keras import losses, layers, models

class MyLayer(layers.Layer):

    def call(self, x):
        a = np.random.rand()
        self.a = a  # <-- does this work as expected?
        return x + a

def my_loss(layer):
    def modified_loss(y_true, y_pred):
        a = layer.a
        y_true = y_true + a
        return losses.mse(y_true, y_pred)
    return modified_loss

input_layer = layers.Input()
my_layer = MyLayer(input_layer, name="my_layer")
output_layer = layers.Dense(4)(my_layer)

model = models.Model(inputs=input_layer, outputs=output_layer)
model.compile('adam', my_loss(model.get_layer("my_layer")))
I expect that a is changing for every batch and that the same a is used in the layer and loss function.
Right now, it is not working the way I intended. It seems like the a in the loss function is never updated (and maybe not even in the layer).
How do I change the attribute/value of a in the layer at every call and access it in the loss function?
I'm not quite sure I follow the purpose of this (and I am bothered by the call to np inside the call() of your custom layer: could you not use the tf.random functions instead?), but you can certainly access the a property inside your loss function.
Perhaps something like:
class MyLayer(layers.Layer):

    def call(self, x):
        a = np.random.rand()  # FIXME --> use tf.random
        self.a = a
        return x + a

input_layer = layers.Input()
my_layer = MyLayer(input_layer, name="my_layer")
output_layer = layers.Dense(4)(my_layer)

model = models.Model(inputs=input_layer, outputs=output_layer)

def my_loss(y_true, y_pred):
    y_true = y_true + my_layer.a
    return losses.mse(y_true, y_pred)

model.compile('adam', loss=my_loss)
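A rough sketch of what that FIXME could look like (this is an assumption on my part: keep a in a non-trainable tf.Variable so the value drawn in call() is also visible to the loss function; the input shape (4,) is made up for illustration, and whether the loss is guaranteed to read the value from the same forward pass is worth double-checking):

import tensorflow as tf
from tensorflow.keras import layers, losses, models

class MyLayer(layers.Layer):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # holds the value drawn during the most recent forward pass
        self.a = tf.Variable(0.0, trainable=False)

    def call(self, x):
        # draw a new value inside the graph instead of using np.random
        self.a.assign(tf.random.uniform([]))
        return x + self.a

input_layer = layers.Input((4,))
my_layer = MyLayer(name="my_layer")            # keep a handle on the layer object
output_layer = layers.Dense(4)(my_layer(input_layer))
model = models.Model(inputs=input_layer, outputs=output_layer)

def my_loss(y_true, y_pred):
    # reads the variable set during the forward pass
    return losses.mse(y_true + my_layer.a, y_pred)

model.compile('adam', loss=my_loss)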
I am trying to implement a Siamese Network, as in this paper
In this paper, they use cross entropy for the loss function.
I am using the STL-10 dataset for training, and instead of the 3-layer network used in the paper, I replaced it with a VGG-13 CNN, except for the last logit layer.
Here is my loss function code:
def loss(pred, true_pred):
    cross_entropy_loss = tf.multiply(-1.0, tf.reduce_mean(
        tf.add(tf.multiply(true_pred, tf.log(pred)),
               tf.multiply((1 - true_pred), tf.log(tf.subtract(1.0, pred))))))
    total_loss = tf.add(tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)),
                        cross_entropy_loss, name='total_loss')
    return cross_entropy_loss, total_loss
with tf.device('/gpu:0'):
    h1 = siamese(feed_image1)
    h2 = siamese(feed_image2)
    l1_dist = tf.abs(tf.subtract(h1, h2))

    with tf.variable_scope('pred') as scope:
        predictions = tf.contrib.layers.fully_connected(
            l1_dist, 1, activation_fn=tf.sigmoid,
            weights_initializer=tf.contrib.layers.xavier_initializer(uniform=False),
            weights_regularizer=tf.contrib.layers.l2_regularizer(tf.constant(0.001, dtype=tf.float32)))

    celoss, cost = loss(predictions, feed_labels)

    with tf.variable_scope('adam_optimizer') as scope:
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
        opt = optimizer.minimize(cost)
However, when I run the training, the cost remains almost constant at 0.6932
I have used Adam Optimizer here.
But previously I used Momentum Optimizer.
I have tried changing the learning rate but the cost still behaves the same.
And all the prediction values converge to 0.5 after a few iterations.
After taking the output for two batches of images (input1 and input2), I take their L1 distance and to that I have connected a fully connected layer with a single output and sigmoid activation function.
[h1 and h2 contains the output of the last fully connected layer(not the logit layer) of the VGG-13 network]
Since the output activation function is sigmoid, and since the prediction values are around 0.5, we can infer that the weighted sum of the L1 distances between the outputs of the two networks is close to zero.
I can't understand where I am going wrong.
A little help will be very much appreciated.
I think the non-convergence may be caused by vanishing gradients. You can trace the gradients using tf.contrib.layers.optimize_loss and TensorBoard. You can refer to this answer for more details.
Several possible improvements:
1) Don't write the cross entropy yourself.
You can employ the sigmoid cross entropy with logits API, since it ensures stability as documented:
max(x, 0) - x * z + log(1 + exp(-abs(x)))
2) Doing some weight normalization may help.
3) Keep the regularization loss small.
You can read this answer for more information.
4) I don't see the necessity of applying tf.abs to the L1 distance.
And here is the code I modified. Hope it helps.
mode = "training"
rl_rate = .1
with tf.device('/gpu:0'):
h1 = siamese(feed_image1)
h2 = siamese(feed_image2)
l1_dist = tf.subtract(h1, h2)
# is it necessary to use abs?
l1_dist_norm = tf.layers.batch_normalization(l1_dist, training=(mode=="training"))
with tf.variable_scope('logits') as scope:
w = tf.get_variable('fully_connected_weights', [tf.shape(l1_dist)[-1], 1],
weights_initializer = tf.contrib.layers.xavier_initializer(uniform=False), weights_regularizer = tf.contrib.layers.l2_regularizer(tf.constant(0.001, dtype=tf.float32))
)
logits = tf.tensordot(l1_dist_norm, w, axis=1)
xent_loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=feed_labels)
total_loss = tf.add(tf.reduce_sum(rl_rate * tf.abs(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))), (1-rl_rate) * xent_loss, name='total_loss')
# or:
# weights = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
# l1_regularizer = tf.contrib.layers.l1_regularizer()
# regularization_loss = tf.contrib.layers.apply_regularization(l1_regularizer, weights)
# total_loss = xent_loss + regularization_loss
with tf.variable_scope('adam_optimizer') as scope:
optimizer = tf.train.AdamOptimizer(learning_rate=0.0005)
opt = tf.contrib.layers.optimize_loss(total_loss, global_step, learning_rate=learning_rate, optimizer="Adam", clip_gradients=max_grad_norm, summaries=["gradients"])
Problem statement
I am trying to train a dynamic RNN in TensorFlow v1.0.1 on Linux RedHat 7.3 (problem also manifests on Windows 7), and no matter what I try, I get the exact same training and validation error at every epoch, i.e. my weights are not updating.
I appreciate any help you can offer.
Example
I tried to reduce this to a minimum example that shows my issue, but the minimum example is still pretty large. I based the network structure largely on this gist.
Network definition
import functools
import numpy as np
import tensorflow as tf

def lazy_property(function):
    attribute = '_' + function.__name__

    @property
    @functools.wraps(function)
    def wrapper(self):
        if not hasattr(self, attribute):
            setattr(self, attribute, function(self))
        return getattr(self, attribute)

    return wrapper

class MyNetwork:
    """
    Class defining an RNN for labeling a time series.
    """

    def __init__(self, data, target, num_hidden=64):
        self.data = data
        self.target = target
        self._num_hidden = num_hidden
        self._num_steps = int(self.target.get_shape()[1])
        self._num_classes = int(self.target.get_shape()[2])
        self._weight_and_bias()  # create weight and bias tensors
        self.prediction
        self.error
        self.optimize

    @lazy_property
    def prediction(self):
        """Defines the recurrent neural network prediction scheme."""

        # Dynamic LSTM.
        network = tf.contrib.rnn.BasicLSTMCell(self._num_hidden)
        output, _ = tf.nn.dynamic_rnn(network, data, dtype=tf.float32)

        # Flatten and apply same weights to all time steps.
        output = tf.reshape(output, [-1, self._num_hidden])
        prediction = tf.nn.softmax(tf.matmul(output, self.weight) + self.bias)
        prediction = tf.reshape(prediction,
                                [-1, self._num_steps, self._num_classes])
        return prediction

    @lazy_property
    def cost(self):
        """Defines the cost function for the network."""

        cross_entropy = -tf.reduce_sum(self.target * tf.log(self.prediction),
                                       axis=[1, 2])
        cross_entropy = tf.reduce_mean(cross_entropy)
        return cross_entropy

    @lazy_property
    def optimize(self):
        """Defines the optimization scheme."""

        learning_rate = 0.003
        optimizer = tf.train.RMSPropOptimizer(learning_rate)
        return optimizer.minimize(self.cost)

    @lazy_property
    def error(self):
        """Defines a measure of prediction error."""

        mistakes = tf.not_equal(tf.argmax(self.target, 2),
                                tf.argmax(self.prediction, 2))
        return tf.reduce_mean(tf.cast(mistakes, tf.float32))

    def _weight_and_bias(self):
        """Returns appropriately sized weight and bias tensors for the output layer."""

        self.weight = tf.Variable(tf.truncated_normal(
            [self._num_hidden, self._num_classes],
            mean=0.0,
            stddev=0.01,
            dtype=tf.float32))
        self.bias = tf.Variable(tf.constant(0.1, shape=[self._num_classes]))
Training
Here is my training process. The all_data class just holds my data and labels, and uses a batch generator class to spit out batches for training when I call all_data.train.next() and all_data.train_labels.next(). You can reproduce with any batch generation scheme you like, and I can add the code if you think it is relevant; I felt like this was getting too long as it is.
tf.reset_default_graph()

data = tf.placeholder(tf.float32,
                      [None, all_data.num_steps, all_data.num_features])
target = tf.placeholder(tf.float32,
                        [None, all_data.num_steps, all_data.num_outputs])
model = MyNetwork(data, target, NUM_HIDDEN)

print('Training the model...')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('Initialized.')
    for epoch in range(3):
        print('Epoch {} |'.format(epoch), end='', flush=True)
        for step in range(all_data.train_size // BATCH_SIZE):

            # Generate the next training batch and train.
            d = all_data.train.next()
            t = all_data.train_labels.next()
            sess.run(model.optimize,
                     feed_dict={data: d, target: t})

            # Update the user periodically.
            if step % summary_frequency == 0:
                print('.', end='', flush=True)

        # Show training and validation error at the end of each epoch.
        print('|', flush=True)
        train_error = sess.run(model.error,
                               feed_dict={data: d, target: t})
        valid_error = sess.run(model.error,
                               feed_dict={
                                   data: all_data.valid,
                                   target: all_data.valid_labels
                               })
        print('Training error: {}%'.format(100 * train_error))
        print('Validation error: {}%'.format(100 * valid_error))

    # Check testing error after everything.
    test_error = sess.run(model.error,
                          feed_dict={
                              data: all_data.test,
                              target: all_data.test_labels
                          })
    print('Testing error after {} epochs: {}%'.format(epoch + 1, 100 * test_error))
For a simple example, I generated random data and labels, where data has shape [num_samples, num_steps, num_features], and each sample has a single label associated with the whole thing:
data = np.random.rand(5000, 1000, 2)
labels = np.random.randint(low=0, high=2, size=[5000])
I then converted my labels to one-hot vectors and tiled them so that the resulting labels tensor was the same size as the data tensor.
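For reference, that label construction looks roughly like this (a reconstruction; the exact code is not shown here):

num_classes = 2
one_hot = np.eye(num_classes)[labels]                        # shape [5000, 2]
labels = np.tile(one_hot[:, np.newaxis, :], (1, 1000, 1))    # shape [5000, 1000, 2], same as data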
Results
No matter what I do, I get results like this:
Training the model...
Initialized.
Epoch 0 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Epoch 1 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Epoch 2 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Testing error after 3 epochs: 49.000000953674316%
I have exactly the same error at every epoch. Even if my weights were just randomly walking around, this should change. For the example shown here, I used random data with random labels, so I do not expect much improvement, but I do expect some change, and I am getting exactly the same results every epoch. When I do this with my actual data set, I get the same behavior.
Insight
I hesitate to include this in case it proves to be a red herring, but I believe that my optimizer is calculating cost function gradients of None. When I tried a different optimizer and attempted to clip the gradients, I went ahead and used tf.Print to output the gradients as well. The network crashed with an error that tf.Print could not handle None-type values.
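(For reference, one direct way to check for None gradients in TF 1.x, sketched here only to show the idea rather than the exact code I ran, is to inspect the output of compute_gradients:)

check_optimizer = tf.train.RMSPropOptimizer(0.003)
grads_and_vars = check_optimizer.compute_gradients(model.cost)
for grad, var in grads_and_vars:
    # a gradient of None means TensorFlow found no path from the cost to this variable
    print(var.name, 'OK' if grad is not None else 'gradient is None')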
Attempted fixes
I have tried the following things, and the problem persists in all cases:
Using different optimizers, e.g. AdamOptimizer with and without modifications to the gradients (clipping).
Adjusting batch sizes.
Using many more and many fewer hidden nodes.
Running for more epochs.
Initializing my weights with different values assigned to stddev.
Initializing my biases to zeros (using tf.zeros) and to different constants.
Using weights and biases that are defined within the prediction method and are not member variables of the class, and a _weight_and_bias method that is defined as a @staticmethod like in this gist.
Determining logits in the prediction function instead of softmax predictions, i.e. predictions = tf.matmul(output, self.weights) + self.bias, and then using tf.nn.softmax_cross_entropy_with_logits. This requires some reshaping because the method wants its labels and targets given with shape [batch_size, num_classes], so the cost method becomes:
@lazy_property
def cost(self):
    """Defines the cost function for the network."""
    targs = tf.reshape(self.target, [-1, self._num_classes])
    logits = tf.reshape(self.predictions, [-1, self._num_classes])
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=targs, logits=logits)
    cross_entropy = tf.reduce_mean(cross_entropy)
    return cross_entropy
Changing which size dimension I leave as None when I create my placeholders as suggested in this answer, which requires a bit of rewriting in the network definition. Basically setting size = [all_data.batch_size, -1, all_data.num_features] and size = [all_data.batch_size, -1, all_data.num_classes].
Using tf.contrib.rnn.DropoutWrapper in my network definition and passing a dropout value set to 0.5 in training and 1.0 in validation and testing.
The problem went away when I used
output = tf.contrib.layers.flatten(output)
logits = tf.contrib.layers.fully_connected(output, some_size, activation_fn=None)
instead of flattening my network output, defining weights, and performing the tf.matmul(output, weight) + bias manually. I then used logits (instead of predictions in the question) in my cost function with
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=target,
                                                        logits=logits)
If you want to get the network prediction, you will still need to do prediction = tf.nn.softmax(logits).
I have no idea why this helped, but the network would not train even on random made-up data until I made these changes.