How to override gradient for the nonlinearity functions in lasagne? - python

I have a model, for which i need to compute the gradients of output w.r.t the model's input. But I want to apply some custom gradients for some of the nonlinearity functions applied on some of the model's layers. So i tried the idea explained here, which computes the nonlinear rectifier (RELU) in the forward pass but modifies the gradients of Relu in the backward pass. I added the following two classes:
The helper class that allows us to replace a nonlinearity with an Op
that has the same output, but a custom gradient
class ModifiedBackprop(object):
def __init__(self, nonlinearity):
self.nonlinearity = nonlinearity
self.ops = {} # memoizes an OpFromGraph instance per tensor type
def __call__(self, x):
# OpFromGraph is oblique to Theano optimizations, so we need to move
# things to GPU ourselves if needed.
if theano.sandbox.cuda.cuda_enabled:
maybe_to_gpu = theano.sandbox.cuda.as_cuda_ndarray_variable
else:
maybe_to_gpu = lambda x: x
# We move the input to GPU if needed.
x = maybe_to_gpu(x)
# We note the tensor type of the input variable to the nonlinearity
# (mainly dimensionality and dtype); we need to create a fitting Op.
tensor_type = x.type
# If we did not create a suitable Op yet, this is the time to do so.
if tensor_type not in self.ops:
# For the graph, we create an input variable of the correct type:
inp = tensor_type()
# We pass it through the nonlinearity (and move to GPU if needed).
outp = maybe_to_gpu(self.nonlinearity(inp))
# Then we fix the forward expression...
op = theano.OpFromGraph([inp], [outp])
# ...and replace the gradient with our own (defined in a subclass).
op.grad = self.grad
# Finally, we memoize the new Op
self.ops[tensor_type] = op
# And apply the memoized Op to the input we got.
return self.ops[tensor_type](x)
The subclass that does guided backpropagation through a nonlinearity:
class GuidedBackprop(ModifiedBackprop):
def grad(self, inputs, out_grads):
(inp,) = inputs
(grd,) = out_grads
dtype = inp.dtype
print('It works')
return (grd * (inp > 0).astype(dtype) * (grd > 0).astype(dtype),)
Then i used them in my code as follows:
import lasagne as nn
model_in = T.tensor3()
# model_in = net['input'].input_var
nn.layers.set_all_param_values(net['l_out'], model['param_values'])
relu = nn.nonlinearities.rectify
relu_layers = [layer for layer in
nn.layers.get_all_layers(net['l_out']) if getattr(layer,
'nonlinearity', None) is relu]
modded_relu = GuidedBackprop(relu)
for layer in relu_layers:
layer.nonlinearity = modded_relu
prop = nn.layers.get_output(
net['l_out'], model_in, deterministic=True)
for sample in range(ini, batch_len):
model_out = prop[sample, 'z'] # get prop for label 'z'
gradients = theano.gradient.jacobian(model_out, wrt=model_in)
# gradients = theano.grad(model_out, wrt=model_in)
get_gradients = theano.function(inputs=[model_in],
outputs=gradients)
grads = get_gradients(X_batch) # gradient dimension: X_batch == model_in(64, 20, 32)
grads = np.array(grads)
grads = grads[sample]
Now when i run the code, it works without any error, and the shape of the output is also correct. But that's because it executes the default theano.grad function and not the one supposed to override it. In other words, the grad() function in the class GuidedBackprop never been invoked.
I can't understand what is the issue?
is there's a solution?
If this is an unresolved issue, is there's an implementation for a Theano Op that can achieve such a functionality or some other way to override gradient for specific nonlinearity functions applied on some of the model's layers?

Are you try to set it back the value of model output into model layer input, all gradients calculation
group_1_ShoryuKen_Left = tf.constant([ 0,0,0,0,0,1,0,0,0,0,0,0, 0,0,0,0,0,1,0,1,0,0,0,0, 0,0,0,0,0,0,0,1,0,0,0,0, 0,0,0,0,0,0,0,0,0,1,0,0 ], shape=(1, 1, 48), dtype=tf.float32)
## layer_2 = tf.keras.layers.Dense(256, kernel_initializer=tf.constant_initializer(1.))
layer_2 = tf.keras.layers.LSTM(32, kernel_initializer=tf.constant_initializer(1.))
b_out = layer_2(group_1_ShoryuKen_Left)
layer_2.set_weights(layer_1.get_weights())

Related

Tensorflow gradients do not exist for bias in custom layer

I've built an input convex neural network in Tensorflow following this ArXiv paper that is a scalar output feed-forward model. The first hidden layer is dense and subsequent layers are custom that takes two inputs: the output from the previous layer (kernel) and the model input (passthrough). Separate weights are applied to each. This allows a positive weights regularizer to be applied to kernel weights but not the passthrough. I calculate the regularizer and add it using self.add_loss in the call method of the custom layer. I'm also using custom activation functions that are squared leaky ReLU and leaky ReLU.
When I am training this network I am able to calculate a gradient for the bias in the first dense layer but I get a warning that no gradient exists for the bias in the custom layer. When I add #tf.function to my activation functions the warning goes away but the gradient is 0. Furthermore, loss.numpy() throws an error when I use #tf.function and run in a local Jupyter notebook (but not in Colab).
Any ideas why the bias gradient exists for the dense but not the custom layer and how to calculate the bias gradient for all layers? A minimal working example is provided in this Colab notebook. Much appreciated!
Below is my custom layer. It's very similar to the standard dense layer.
class DensePartiallyConstrained(Layer):
'''
A custom layer inheriting from `tf.keras.layers.Layers` class.
This class is a fully-connected layer with two inputs. This allows
for different constraints on the weights of each input. This enables
a passthrough of the inputs to each hidden layer to have no
weight constraints while the input from the previous layer can have
a positive constraint. It also allows for different initializations
of the weight values for each input.
Most of this code and documentation was borrowed from the
`tf.keras.layers.Dense` documentation on Github (thanks!).
'''
def __init__(self,
units,
activation = None,
use_bias = True,
kernel_initializer = 'glorot_uniform',
passthrough_initializer = 'glorot_uniform',
bias_initializer = 'zeros',
kernel_constraint = None,
passthrough_constraint = None,
bias_constraint = None,
activity_regularizer = None,
regularizer_constant = 1.0,
**kwargs):
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super(DensePartiallyConstrained, self).__init__(
activity_regularizer = regularizers.get(activity_regularizer), **kwargs)
self.units = int(units)
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.passthrough_initializer = initializers.get(passthrough_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.passthrough_constraint = constraints.get(passthrough_constraint)
self.bias_constraint = constraints.get(bias_constraint)
# This is for add_loss in call() method
self.regularizer_constant = regularizer_constant
# What does this do?
self.supports_masking = True
self.kernel_input_spec = InputSpec(min_ndim=2)
self.passthrough_input_spec = InputSpec(min_ndim=2)
def build(self, input_shape):
# Input shapes provided as list [kernel, passthrough]
kernel_input_shape, passthrough_input_shape = input_shape
# Check for proper datatype
dtype = dtypes.as_dtype(self.dtype or K.floatx())
if not (dtype.is_floating or dtype.is_complex):
raise TypeError('Unable to build `DensePartiallyConstrained` layer with non-floating point '
'dtype %s' % (dtype,))
# Check kernel input dimensions
kernel_input_shape = tensor_shape.TensorShape(kernel_input_shape)
if tensor_shape.dimension_value(kernel_input_shape[-1]) is None:
raise ValueError('The last dimension of the inputs to `DensePartiallyConstrained` '
'should be defined. Found `None`.')
kernel_last_dim = tensor_shape.dimension_value(kernel_input_shape[-1])
self.kernel_input_spec = InputSpec(min_ndim=2,
axes={-1: kernel_last_dim})
# Check passthrough input dimensions
passthrough_input_shape = tensor_shape.TensorShape(passthrough_input_shape)
if tensor_shape.dimension_value(passthrough_input_shape[-1]) is None:
raise ValueError('The last dimension of the inputs to `DensePartiallyConstrained` '
'should be defined. Found `None`.')
passthrough_last_dim = tensor_shape.dimension_value(passthrough_input_shape[-1])
self.passthrough_input_spec = InputSpec(min_ndim=2,
axes={-1: passthrough_last_dim})
# Add weights to kernel (between layer connections)
self.kernel = self.add_weight(name = 'kernel',
shape = [kernel_last_dim, self.units],
initializer = self.kernel_initializer,
constraint = self.kernel_constraint,
dtype = self.dtype,
trainable = True)
# Add weight to input passthrough
self.passthrough = self.add_weight(name = 'passthrough',
shape = [passthrough_last_dim, self.units],
initializer = self.passthrough_initializer,
constraint = self.passthrough_constraint,
dtype = self.dtype,
trainable = True)
# Add weights to bias
if self.use_bias:
self.bias = self.add_weight(name = 'bias',
shape = [self.units,],
initializer = self.bias_initializer,
constraint = self.bias_constraint,
dtype = self.dtype,
trainable = True)
else:
self.bias = None
self.built = True
super(DensePartiallyConstrained, self).build(input_shape)
def call(self, inputs):
# Inputs provided as list [kernel, passthrough]
kernel_input, passthrough_input = inputs
# Calculate weights regularizer
self.add_loss(self.regularizer_constant * tf.reduce_sum(tf.square(tf.math.maximum(tf.negative(self.kernel), 0.0))))
# Calculate layer output
outputs = tf.add(tf.matmul(kernel_input, self.kernel), tf.matmul(passthrough_input, self.passthrough))
if self.use_bias:
outputs = tf.add(outputs, self.bias)
if self.activation is not None:
return self.activation(outputs)
return outputs
And my activation functions:
##tf.function
def squared_leaky_ReLU(x, alpha = 0.2):
return tf.square(tf.maximum(x, alpha * x))
##tf.function
def leaky_ReLU(x, alpha = 0.2):
return tf.maximum(x, alpha * x)
Edit:
With a tensorflow update I can now access loss.numpy() when using #tf.function with my activation functions. This returns 0 gradients for the bias in all of my custom layers.
I'm beginning to think that the lack of gradient for the bias terms in the custom layer might have something to do with my loss function:
minimax loss
where
regularizer
is regularization for the weights in the custom layer kernel only. The loss for g(x) is based on the gradient with respect to the inputs, so it doesn't contain any information about the bias (the bias in f(x) update normally). Still though, if this is the case I don't understand why the bias in the first hidden dense layer of g(y) is updated? The networks are identical other than f(x) has a positive constraint on the kernel weights.

Pass non-symbolic tensor to Keras Lambda layer

I am trying to pass a RNNCell object to a Keras lambda layer so that I can use the Tensorflow layer within a Keras model, as follows.
conv_cell = ConvGRUCell(shape = [14, 14],
filters = 32,
kernel = [3,3],
padding = 'SAME')
def convGRU(inputs, cell, length):
output, final = tf.nn.bidirectional_dynamic_rnn(
cell, cell, x, length, dtype=tf.float32)
output = tf.concat(output, -1)
final = tf.concat(final, -1)
return [output, final]
lm = Lambda(lambda x: convGRU(x[0], x[1], x[2])([input, conv_cell, length])
However, I get an error that conv_cell is not a symbolic tensor (it is a custom layer based on Tensorflow's GRUCell).
Is there any way to pass the cell to the lambda layer? I got it to work with functools.partial but it fails to save/load the model because it cannot access the function inside the model.
def convGRU(cell, length): # if length is produced by the model, use it with the inputs
def inner_func(inputs):
code...
return inner_func
lm = Lambda(convGRU(cell, length))(input)
For save/load you need to use custom_objects = {'convGRU': convGRU, 'cell':cell, 'length': length}, etc. Whatever Keras doesn't know automatically needs to be in custom_objects for loading a saved model.

How to restore the function defined in the graph?

I defined a funciton in tensorflow as follows:
def generator(keep_prob, z, out_channel_dim, alphag1, is_train=True):
"""
Create the generator network
:param z: Input z
:param out_channel_dim: The number of channels in the output image
:param is_train: Boolean if generator is being used for training
:return: The tensor output of the generator
"""
# TODO: Implement Function
# when it is training reuse=False
# when it is not training reuse=True
alpha=alphag1
with tf.variable_scope('generator',reuse=not is_train):
layer = tf.layers.dense(z, 3*3*512,activation=None,\
kernel_initializer=tf.contrib.layers.xavier_initializer(uniform=False))
layer = tf.reshape(layer, [-1, 3,3,512])
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.0001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, 256, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.00001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, 128, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.000001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, 64, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.0000001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, out_channel_dim, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.00000001, dtype=tf.float32)
layer = tf.tanh(layer)
return layer
This is complicated such that to track each variable in each layer is difficult.
I later used tf.train.Saver() and saver.save to save everything after training.
Now I would like to restore this function so that I can use it to do further manipulations while keeping the trained weigts of each layer unchanged.
I found online that most function like tf.get_default_graph().get_tensor_by_name or some other functions were limited to restore only the values of the variables but not this function.
For example the input z of this function generator(keep_prob, z, out_channel_dim, alphag1, is_train=True) is a tensor from another function.
I want to restore this function so that I can use two new tensors z1 and z2 with she same shape as z.
layer1 = generator(keep_prob, z1, out_channel_dim, alphag1, is_train=False)
layer2 = generator(keep_prob, z2, out_channel_dim, alphag1, is_train=False)
layer = layer1 - layer2
and I can put this new tensor layer into another function.
Here layer1 and layer2 use the function with the saved weights.
The thing that is difficlut is that when I use the function generator I have to specifiy it with the trianed weights which was stored using Saver(). I find it difficult to specify this function with its weights. For, 1. too many layers to track off and 2. I don't know how to specify weights for tf.layers.conv2().
So are there anyone who know how to solve this issue?
This is a general question:
I save the whole model into file and need to restore part of the model into part of new model.
Here name_map is a dict:the key is new name in graph and value is name in ckpt file.
def get_restore_saver(self, name_map, restore_optimise_var=True):
var_grp = {_.op.name:_ for _ in tf.global_variables()}
varm = {}
for main_name in var_grp:
if main_name in name_map:
varm[name_map[main_name]] = var_grp[main_name]
elif restore_optimise_var: # I use adam to optimise
var_arr = main_name.split('/')
tail = var_arr[-1]
_ = '/'.join(var_arr[: -1])
if tail in ['Adam', 'Adam_1', 'ExponentialMovingAverage'] and _ in name_map:
varm[name_map[_] + "/" + tail] = var_grp[main_name]
return tf.train.Saver(varm)
Why do you need to restore the function and what does that even mean? If you need to use a model, you have to restore the corresponding graph. What your function does is defining nodes of the graph. You may use your function to build or rebuild that graph again and then load weights stored somewhere using Saver() or you may restore graph from the protobuf file.
In order to rebuild the graph, try to invoke invoke your function somewhere output_layer=generator(keep_prob, z, out_channel_dim, alphag1, is_train=True) and than use Saver class as usual to restore weights. Your function does not compute, it defines a part or whole of the graph. All computations are performed by the graph.
In the last case you will find useful the following thread. Usually, you will need to know names of the input and output layers. That can be obtained by the code:
[n.name for n in tf.get_default_graph().as_graph_def().node]
After a long time of searching, it seems that maybe the following is a solution.
Define all the variables in advance,i.e.layer1 = generator(keep_prob, z1,
out_channel_dim, alphag1, is_train=False)
layer2 = generator(keep_prob, z2, out_channel_dim, alphag1, is_train=False)
layer = layer1 - layer2.
Now you can use tf.get_collection to find the operators.
It seems that tensorflow will not give you the pre defined functions. It keeps the graph and values only but not in the form of function. One needs to set everything needed in the furture in the graph or one should keep track of every weights, even too many.

Inverting Gradients in Keras

I'm trying to port the BoundingLayer function from this file to the DDPG.py agent in keras-rl but I'm having some trouble with the implementation.
I modified the get_gradients(loss, params) method in DDPG.py to add this:
action_bounds = [-30, 50]
inverted_grads = []
for g,p in zip(modified_grads, params):
is_above_upper_bound = K.greater(p, K.constant(action_bounds[1], dtype='float32'))
is_under_lower_bound = K.less(p, K.constant(action_bounds[0], dtype='float32'))
is_gradient_positive = K.greater(g, K.constant(0, dtype='float32'))
is_gradient_negative = K.less(g, K.constant(0, dtype='float32'))
invert_gradient = tf.logical_or(
tf.logical_and(is_above_upper_bound, is_gradient_negative),
tf.logical_and(is_under_lower_bound, is_gradient_positive)
)
inverted_grads.extend(K.switch(invert_gradient, -g, g))
modified_grads = inverted_grads[:]
But I get an error about the shape:
ValueError: Shape must be rank 0 but is rank 2 for 'cond/Switch' (op: 'Switch') with input shapes: [2,400], [2,400].
keras-rl "get_gradients" function uses gradients calculated with a combined actor-critic model, but you need the gradient of the critic output wrt the action input to apply the inverting gradients feature.
I've recently implemented it on a RDPG prototype I'm working on, using keras-rl. Still testing, the code can be optimized and is not bug free for sure, but I've put the inverting gradient to work by modifying some keras-rl lines of code. In order to modify the gradient of the critic output wrt the action input, I've followed the original formula to compute the actor gradient, with the help of this great post from Patrick Emami: http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html.
I'm putting here the entire "compile" function, redefined in a class that inherits from "DDPAgent", where the inverting gradient feature is implemented.
def compile(self, optimizer, metrics=[]):
metrics += [mean_q]
if type(optimizer) in (list, tuple):
if len(optimizer) != 2:
raise ValueError('More than two optimizers provided. Please only provide a maximum of two optimizers, the first one for the actor and the second one for the critic.')
actor_optimizer, critic_optimizer = optimizer
else:
actor_optimizer = optimizer
critic_optimizer = clone_optimizer(optimizer)
if type(actor_optimizer) is str:
actor_optimizer = optimizers.get(actor_optimizer)
if type(critic_optimizer) is str:
critic_optimizer = optimizers.get(critic_optimizer)
assert actor_optimizer != critic_optimizer
if len(metrics) == 2 and hasattr(metrics[0], '__len__') and hasattr(metrics[1], '__len__'):
actor_metrics, critic_metrics = metrics
else:
actor_metrics = critic_metrics = metrics
def clipped_error(y_true, y_pred):
return K.mean(huber_loss(y_true, y_pred, self.delta_clip), axis=-1)
# Compile target networks. We only use them in feed-forward mode, hence we can pass any
# optimizer and loss since we never use it anyway.
self.target_actor = clone_model(self.actor, self.custom_model_objects)
self.target_actor.compile(optimizer='sgd', loss='mse')
self.target_critic = clone_model(self.critic, self.custom_model_objects)
self.target_critic.compile(optimizer='sgd', loss='mse')
# We also compile the actor. We never optimize the actor using Keras but instead compute
# the policy gradient ourselves. However, we need the actor in feed-forward mode, hence
# we also compile it with any optimzer and
self.actor.compile(optimizer='sgd', loss='mse')
# Compile the critic.
if self.target_model_update < 1.:
# We use the `AdditionalUpdatesOptimizer` to efficiently soft-update the target model.
critic_updates = get_soft_target_model_updates(self.target_critic, self.critic, self.target_model_update)
critic_optimizer = AdditionalUpdatesOptimizer(critic_optimizer, critic_updates)
self.critic.compile(optimizer=critic_optimizer, loss=clipped_error, metrics=critic_metrics)
clipnorm = getattr(actor_optimizer, 'clipnorm', 0.)
clipvalue = getattr(actor_optimizer, 'clipvalue', 0.)
critic_gradients_wrt_action_input = tf.gradients(self.critic.output, self.critic_action_input)
critic_gradients_wrt_action_input = [g / float(self.batch_size) for g in critic_gradients_wrt_action_input] # since TF sums over the batch
action_bounds = [(-1.,1.) for i in range(self.nb_actions)]
def calculate_inverted_gradient():
"""
Applies "inverting gradient" feature to the action-value gradients.
"""
gradient_wrt_action = -critic_gradients_wrt_action_input[0]
inverted_gradients = []
for n in range(self.batch_size):
inverted_gradient = []
for i in range(gradient_wrt_action[n].shape[0].value):
action = self.critic_action_input[n][i]
is_gradient_negative = K.less(gradient_wrt_action[n][i], K.constant(0, dtype='float32'))
adjust_for_upper_bound = gradient_wrt_action[n][i] * ((action_bounds[i][1] - action) / (action_bounds[i][1] - action_bounds[i][0]))
adjust_for_lower_bound = gradient_wrt_action[n][i] * ((action - action_bounds[i][0]) / (action_bounds[i][1] - action_bounds[i][0]))
modified_gradient = K.switch(is_gradient_negative, adjust_for_upper_bound, adjust_for_lower_bound)
inverted_gradient.append( modified_gradient )
inverted_gradients.append(inverted_gradient)
gradient_wrt_action = tf.stack(inverted_gradients)
return gradient_wrt_action
actor_gradients_wrt_weights = tf.gradients(self.actor.output, self.actor.trainable_weights, grad_ys=calculate_inverted_gradient())
actor_gradients_wrt_weights = [g / float(self.batch_size) for g in actor_gradients_wrt_weights] # since TF sums over the batch
def get_gradients(loss, params):
""" Used by the actor optimizer.
Returns the gradients to train the actor.
These gradients are obtained by multiplying the gradients of the actor output w.r.t. its weights
with the gradients of the critic output w.r.t. its action input. """
# Aplly clipping if defined
modified_grads = [g for g in actor_gradients_wrt_weights]
if clipnorm > 0.:
norm = K.sqrt(sum([K.sum(K.square(g)) for g in modified_grads]))
modified_grads = [optimizers.clip_norm(g, clipnorm, norm) for g in modified_grads]
if clipvalue > 0.:
modified_grads = [K.clip(g, -clipvalue, clipvalue) for g in modified_grads]
return modified_grads
actor_optimizer.get_gradients = get_gradients
# get_updates is the optimizer function that changes the weights of the network
updates = actor_optimizer.get_updates(self.actor.trainable_weights, self.actor.constraints, None)
if self.target_model_update < 1.:
# Include soft target model updates.
updates += get_soft_target_model_updates(self.target_actor, self.actor, self.target_model_update)
updates += self.actor.updates # include other updates of the actor, e.g. for BN
# Finally, combine it all into a callable function.
# The inputs will be all the necessary placeholders to compute the gradients (actor and critic inputs)
inputs = self.actor.inputs[:] + [self.critic_action_input, self.critic_history_input]
self.actor_train_fn = K.function(inputs, [self.actor.output], updates=updates)
self.actor_optimizer = actor_optimizer
self.compiled = True
When training the actor, you should now pass 3 inputs instead of 2: the observation inputs + the action input (with a prediction from the actor network), so you must also modify the "backward" function. In my case:
...
if self.episode > self.nb_steps_warmup_actor:
action = self.actor.predict_on_batch(history_batch)
inputs = [history_batch, action, history_batch]
actor_train_result = self.actor_train_fn(inputs)
action_values = actor_train_result[0]
assert action_values.shape == (self.batch_size, self.nb_actions)
...
After that you can have your actor with a linear activation in the output.

How to add regularizations in TensorFlow?

I found in many available neural network code implemented using TensorFlow that regularization terms are often implemented by manually adding an additional term to loss value.
My questions are:
Is there a more elegant or recommended way of regularization than doing it manually?
I also find that get_variable has an argument regularizer. How should it be used? According to my observation, if we pass a regularizer to it (such as tf.contrib.layers.l2_regularizer, a tensor representing regularized term will be computed and added to a graph collection named tf.GraphKeys.REGULARIZATOIN_LOSSES. Will that collection be automatically used by TensorFlow (e.g. used by optimizers when training)? Or is it expected that I should use that collection by myself?
As you say in the second point, using the regularizer argument is the recommended way. You can use it in get_variable, or set it once in your variable_scope and have all your variables regularized.
The losses are collected in the graph, and you need to manually add them to your cost function like this.
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
reg_constant = 0.01 # Choose an appropriate one.
loss = my_normal_loss + reg_constant * sum(reg_losses)
A few aspects of the existing answer were not immediately clear to me, so here is a step-by-step guide:
Define a regularizer. This is where the regularization constant can be set, e.g.:
regularizer = tf.contrib.layers.l2_regularizer(scale=0.1)
Create variables via:
weights = tf.get_variable(
name="weights",
regularizer=regularizer,
...
)
Equivalently, variables can be created via the regular weights = tf.Variable(...) constructor, followed by tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, weights).
Define some loss term and add the regularization term:
reg_variables = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
reg_term = tf.contrib.layers.apply_regularization(regularizer, reg_variables)
loss += reg_term
Note: It looks like tf.contrib.layers.apply_regularization is implemented as an AddN, so more or less equivalent to sum(reg_variables).
I'll provide a simple correct answer since I didn't find one. You need two simple steps, the rest is done by tensorflow magic:
Add regularizers when creating variables or layers:
tf.layers.dense(x, kernel_regularizer=tf.contrib.layers.l2_regularizer(0.001))
# or
tf.get_variable('a', regularizer=tf.contrib.layers.l2_regularizer(0.001))
Add the regularization term when defining loss:
loss = ordinary_loss + tf.losses.get_regularization_loss()
Another option to do this with the contrib.learn library is as follows, based on the Deep MNIST tutorial on the Tensorflow website. First, assuming you've imported the relevant libraries (such as import tensorflow.contrib.layers as layers), you can define a network in a separate method:
def easier_network(x, reg):
""" A network based on tf.contrib.learn, with input `x`. """
with tf.variable_scope('EasyNet'):
out = layers.flatten(x)
out = layers.fully_connected(out,
num_outputs=200,
weights_initializer = layers.xavier_initializer(uniform=True),
weights_regularizer = layers.l2_regularizer(scale=reg),
activation_fn = tf.nn.tanh)
out = layers.fully_connected(out,
num_outputs=200,
weights_initializer = layers.xavier_initializer(uniform=True),
weights_regularizer = layers.l2_regularizer(scale=reg),
activation_fn = tf.nn.tanh)
out = layers.fully_connected(out,
num_outputs=10, # Because there are ten digits!
weights_initializer = layers.xavier_initializer(uniform=True),
weights_regularizer = layers.l2_regularizer(scale=reg),
activation_fn = None)
return out
Then, in a main method, you can use the following code snippet:
def main(_):
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
# Make a network with regularization
y_conv = easier_network(x, FLAGS.regu)
weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'EasyNet')
print("")
for w in weights:
shp = w.get_shape().as_list()
print("- {} shape:{} size:{}".format(w.name, shp, np.prod(shp)))
print("")
reg_ws = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 'EasyNet')
for w in reg_ws:
shp = w.get_shape().as_list()
print("- {} shape:{} size:{}".format(w.name, shp, np.prod(shp)))
print("")
# Make the loss function `loss_fn` with regularization.
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
loss_fn = cross_entropy + tf.reduce_sum(reg_ws)
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss_fn)
To get this to work you need to follow the MNIST tutorial I linked to earlier and import the relevant libraries, but it's a nice exercise to learn TensorFlow and it's easy to see how the regularization affects the output. If you apply a regularization as an argument, you can see the following:
- EasyNet/fully_connected/weights:0 shape:[784, 200] size:156800
- EasyNet/fully_connected/biases:0 shape:[200] size:200
- EasyNet/fully_connected_1/weights:0 shape:[200, 200] size:40000
- EasyNet/fully_connected_1/biases:0 shape:[200] size:200
- EasyNet/fully_connected_2/weights:0 shape:[200, 10] size:2000
- EasyNet/fully_connected_2/biases:0 shape:[10] size:10
- EasyNet/fully_connected/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0
- EasyNet/fully_connected_1/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0
- EasyNet/fully_connected_2/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0
Notice that the regularization portion gives you three items, based on the items available.
With regularizations of 0, 0.0001, 0.01, and 1.0, I get test accuracy values of 0.9468, 0.9476, 0.9183, and 0.1135, respectively, showing the dangers of high regularization terms.
If anyone's still looking, I'd just like to add on that in tf.keras you may add weight regularization by passing them as arguments in your layers. An example of adding L2 regularization taken wholesale from the Tensorflow Keras Tutorials site:
model = keras.models.Sequential([
keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
activation=tf.nn.relu),
keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
There's no need to manually add in the regularization losses with this method as far as I know.
Reference: https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#add_weight_regularization
I tested tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) and tf.losses.get_regularization_loss() with one l2_regularizer in the graph, and found that they return the same value. By observing the value's quantity, I guess reg_constant has already make sense on the value by setting the parameter of tf.contrib.layers.l2_regularizer.
If you have CNN you may do the following:
In your model function:
conv = tf.layers.conv2d(inputs=input_layer,
filters=32,
kernel_size=[3, 3],
kernel_initializer='xavier',
kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-5),
padding="same",
activation=None)
...
In your loss function:
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=num_classes)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
regularization_losses = tf.losses.get_regularization_losses()
loss = tf.add_n([loss] + regularization_losses)
cross_entropy = tf.losses.softmax_cross_entropy(
logits=logits, onehot_labels=labels)
l2_loss = weight_decay * tf.add_n(
[tf.nn.l2_loss(tf.cast(v, tf.float32)) for v in tf.trainable_variables()])
loss = cross_entropy + l2_loss
Some answers make me more confused.Here I give two methods to make it clearly.
#1.adding all regs by hand
var1 = tf.get_variable(name='v1',shape=[1],dtype=tf.float32)
var2 = tf.Variable(name='v2',initial_value=1.0,dtype=tf.float32)
regularizer = tf.contrib.layers.l1_regularizer(0.1)
reg_term = tf.contrib.layers.apply_regularization(regularizer,[var1,var2])
#here reg_term is a scalar
#2.auto added and read,but using get_variable
with tf.variable_scope('x',
regularizer=tf.contrib.layers.l2_regularizer(0.1)):
var1 = tf.get_variable(name='v1',shape=[1],dtype=tf.float32)
var2 = tf.get_variable(name='v2',shape=[1],dtype=tf.float32)
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
#here reg_losses is a list,should be summed
Then,it can be added into the total loss
tf.GraphKeys.REGULARIZATION_LOSSES will not be added automatically, but there is a simple way to add them:
reg_loss = tf.losses.get_regularization_loss()
total_loss = loss + reg_loss
tf.losses.get_regularization_loss() uses tf.add_n to sum the entries of tf.GraphKeys.REGULARIZATION_LOSSES element-wise. tf.GraphKeys.REGULARIZATION_LOSSES will typically be a list of scalars, calculated using regularizer functions. It gets entries from calls to tf.get_variable that have the regularizer parameter specified. You can also add to that collection manually. That would be useful when using tf.Variable and also when specifying activity regularizers or other custom regularizers. For instance:
#This will add an activity regularizer on y to the regloss collection
regularizer = tf.contrib.layers.l2_regularizer(0.1)
y = tf.nn.sigmoid(x)
act_reg = regularizer(y)
tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, act_reg)
(In this example it would presumably be more effective to regularize x, as y really flattens out for large x.)

Categories