Inverting Gradients in Keras

I'm trying to port the BoundingLayer function from this file to the DDPG.py agent in keras-rl but I'm having some trouble with the implementation.
I modified the get_gradients(loss, params) method in DDPG.py to add this:
action_bounds = [-30, 50]

inverted_grads = []
for g, p in zip(modified_grads, params):
    is_above_upper_bound = K.greater(p, K.constant(action_bounds[1], dtype='float32'))
    is_under_lower_bound = K.less(p, K.constant(action_bounds[0], dtype='float32'))
    is_gradient_positive = K.greater(g, K.constant(0, dtype='float32'))
    is_gradient_negative = K.less(g, K.constant(0, dtype='float32'))

    invert_gradient = tf.logical_or(
        tf.logical_and(is_above_upper_bound, is_gradient_negative),
        tf.logical_and(is_under_lower_bound, is_gradient_positive)
    )

    inverted_grads.extend(K.switch(invert_gradient, -g, g))
modified_grads = inverted_grads[:]
But I get an error about the shape:
ValueError: Shape must be rank 0 but is rank 2 for 'cond/Switch' (op: 'Switch') with input shapes: [2,400], [2,400].
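(A note on the error itself: in this Keras/TensorFlow version, K.switch expects a scalar, rank-0 boolean condition, while invert_gradient above is an element-wise boolean tensor of shape [2, 400]. An element-wise select would be needed instead; a one-line sketch against the loop above:

    inverted_grads.append(tf.where(invert_gradient, -g, g))

That removes the shape error, though, as the answer below explains, these are gradients w.r.t. the weights rather than w.r.t. the action, so the bounding logic would still be applied to the wrong tensors.)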

keras-rl's get_gradients function uses gradients calculated with a combined actor-critic model, but you need the gradient of the critic output w.r.t. the action input to apply the inverting-gradients feature.
I've recently implemented it on an RDPG prototype I'm working on, using keras-rl. I'm still testing; the code can be optimized and is certainly not bug-free, but I've put the inverting gradient to work by modifying some lines of keras-rl code. In order to modify the gradient of the critic output w.r.t. the action input, I followed the original formula for computing the actor gradient, with the help of this great post from Patrick Emami: http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html.
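For reference, the inverting-gradients rule itself (from Hausknecht & Stone, "Deep Reinforcement Learning in Parameterized Action Space", which the bounding layer implements) scales each action gradient by the remaining room to the bound, and flips its sign once the action leaves the range. A minimal per-component sketch of the rule (names are mine, not from keras-rl):

def invert_gradient(dq_dp, p, p_min, p_max):
    """Inverting-gradients rule: updates slow down as the action
    approaches a bound and reverse once it passes the bound."""
    width = p_max - p_min
    if dq_dp > 0:  # gradient suggests increasing the action p
        return dq_dp * (p_max - p) / width  # shrinks toward 0 near the upper bound, flips sign past it
    else:          # gradient suggests decreasing the action p
        return dq_dp * (p - p_min) / width  # shrinks toward 0 near the lower bound, flips sign past it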
Below is the entire "compile" function, redefined in a class that inherits from "DDPGAgent", where the inverting-gradients feature is implemented.
def compile(self, optimizer, metrics=[]):
    metrics += [mean_q]

    if type(optimizer) in (list, tuple):
        if len(optimizer) != 2:
            raise ValueError('More than two optimizers provided. Please only provide a maximum of two optimizers, the first one for the actor and the second one for the critic.')
        actor_optimizer, critic_optimizer = optimizer
    else:
        actor_optimizer = optimizer
        critic_optimizer = clone_optimizer(optimizer)
    if type(actor_optimizer) is str:
        actor_optimizer = optimizers.get(actor_optimizer)
    if type(critic_optimizer) is str:
        critic_optimizer = optimizers.get(critic_optimizer)
    assert actor_optimizer != critic_optimizer

    if len(metrics) == 2 and hasattr(metrics[0], '__len__') and hasattr(metrics[1], '__len__'):
        actor_metrics, critic_metrics = metrics
    else:
        actor_metrics = critic_metrics = metrics

    def clipped_error(y_true, y_pred):
        return K.mean(huber_loss(y_true, y_pred, self.delta_clip), axis=-1)

    # Compile target networks. We only use them in feed-forward mode, hence we can pass any
    # optimizer and loss since we never use it anyway.
    self.target_actor = clone_model(self.actor, self.custom_model_objects)
    self.target_actor.compile(optimizer='sgd', loss='mse')
    self.target_critic = clone_model(self.critic, self.custom_model_objects)
    self.target_critic.compile(optimizer='sgd', loss='mse')

    # We also compile the actor. We never optimize the actor using Keras but instead compute
    # the policy gradient ourselves. However, we need the actor in feed-forward mode, hence
    # we also compile it with any optimizer and never use it.
    self.actor.compile(optimizer='sgd', loss='mse')

    # Compile the critic.
    if self.target_model_update < 1.:
        # We use the `AdditionalUpdatesOptimizer` to efficiently soft-update the target model.
        critic_updates = get_soft_target_model_updates(self.target_critic, self.critic, self.target_model_update)
        critic_optimizer = AdditionalUpdatesOptimizer(critic_optimizer, critic_updates)
    self.critic.compile(optimizer=critic_optimizer, loss=clipped_error, metrics=critic_metrics)

    clipnorm = getattr(actor_optimizer, 'clipnorm', 0.)
    clipvalue = getattr(actor_optimizer, 'clipvalue', 0.)

    critic_gradients_wrt_action_input = tf.gradients(self.critic.output, self.critic_action_input)
    critic_gradients_wrt_action_input = [g / float(self.batch_size) for g in critic_gradients_wrt_action_input]  # since TF sums over the batch

    action_bounds = [(-1., 1.) for i in range(self.nb_actions)]

    def calculate_inverted_gradient():
        """
        Applies the "inverting gradients" feature to the action-value gradients.
        """
        gradient_wrt_action = -critic_gradients_wrt_action_input[0]

        inverted_gradients = []
        for n in range(self.batch_size):
            inverted_gradient = []

            for i in range(gradient_wrt_action[n].shape[0].value):
                action = self.critic_action_input[n][i]
                is_gradient_negative = K.less(gradient_wrt_action[n][i], K.constant(0, dtype='float32'))
                adjust_for_upper_bound = gradient_wrt_action[n][i] * ((action_bounds[i][1] - action) / (action_bounds[i][1] - action_bounds[i][0]))
                adjust_for_lower_bound = gradient_wrt_action[n][i] * ((action - action_bounds[i][0]) / (action_bounds[i][1] - action_bounds[i][0]))
                modified_gradient = K.switch(is_gradient_negative, adjust_for_upper_bound, adjust_for_lower_bound)
                inverted_gradient.append(modified_gradient)
            inverted_gradients.append(inverted_gradient)

        gradient_wrt_action = tf.stack(inverted_gradients)
        return gradient_wrt_action

    actor_gradients_wrt_weights = tf.gradients(self.actor.output, self.actor.trainable_weights, grad_ys=calculate_inverted_gradient())
    actor_gradients_wrt_weights = [g / float(self.batch_size) for g in actor_gradients_wrt_weights]  # since TF sums over the batch

    def get_gradients(loss, params):
        """ Used by the actor optimizer.
        Returns the gradients to train the actor.
        These gradients are obtained by multiplying the gradients of the actor output w.r.t. its weights
        with the gradients of the critic output w.r.t. its action input. """
        # Apply clipping if defined
        modified_grads = [g for g in actor_gradients_wrt_weights]

        if clipnorm > 0.:
            norm = K.sqrt(sum([K.sum(K.square(g)) for g in modified_grads]))
            modified_grads = [optimizers.clip_norm(g, clipnorm, norm) for g in modified_grads]
        if clipvalue > 0.:
            modified_grads = [K.clip(g, -clipvalue, clipvalue) for g in modified_grads]

        return modified_grads

    actor_optimizer.get_gradients = get_gradients

    # get_updates is the optimizer function that changes the weights of the network
    updates = actor_optimizer.get_updates(self.actor.trainable_weights, self.actor.constraints, None)

    if self.target_model_update < 1.:
        # Include soft target model updates.
        updates += get_soft_target_model_updates(self.target_actor, self.actor, self.target_model_update)
    updates += self.actor.updates  # include other updates of the actor, e.g. for BN

    # Finally, combine it all into a callable function.
    # The inputs will be all the necessary placeholders to compute the gradients (actor and critic inputs)
    inputs = self.actor.inputs[:] + [self.critic_action_input, self.critic_history_input]
    self.actor_train_fn = K.function(inputs, [self.actor.output], updates=updates)
    self.actor_optimizer = actor_optimizer
    self.compiled = True
When training the actor, you now pass 3 inputs instead of 2: the observation inputs plus the action input (filled with a prediction from the actor network), so you must also modify the "backward" function. In my case:
...
if self.episode > self.nb_steps_warmup_actor:
    action = self.actor.predict_on_batch(history_batch)
    inputs = [history_batch, action, history_batch]
    actor_train_result = self.actor_train_fn(inputs)
    action_values = actor_train_result[0]
    assert action_values.shape == (self.batch_size, self.nb_actions)
...
After that, you can use a linear activation in the actor's output layer.

Related

How to override gradient for the nonlinearity functions in lasagne?

I have a model for which I need to compute the gradients of the output w.r.t. the model's input. But I want to apply custom gradients for some of the nonlinearity functions applied on some of the model's layers. So I tried the idea explained here, which computes the nonlinear rectifier (ReLU) in the forward pass but modifies the gradients of ReLU in the backward pass. I added the following two classes:
The helper class that allows us to replace a nonlinearity with an Op that has the same output, but a custom gradient:
class ModifiedBackprop(object):
    def __init__(self, nonlinearity):
        self.nonlinearity = nonlinearity
        self.ops = {}  # memoizes an OpFromGraph instance per tensor type

    def __call__(self, x):
        # OpFromGraph is oblique to Theano optimizations, so we need to move
        # things to GPU ourselves if needed.
        if theano.sandbox.cuda.cuda_enabled:
            maybe_to_gpu = theano.sandbox.cuda.as_cuda_ndarray_variable
        else:
            maybe_to_gpu = lambda x: x
        # We move the input to GPU if needed.
        x = maybe_to_gpu(x)
        # We note the tensor type of the input variable to the nonlinearity
        # (mainly dimensionality and dtype); we need to create a fitting Op.
        tensor_type = x.type
        # If we did not create a suitable Op yet, this is the time to do so.
        if tensor_type not in self.ops:
            # For the graph, we create an input variable of the correct type:
            inp = tensor_type()
            # We pass it through the nonlinearity (and move to GPU if needed).
            outp = maybe_to_gpu(self.nonlinearity(inp))
            # Then we fix the forward expression...
            op = theano.OpFromGraph([inp], [outp])
            # ...and replace the gradient with our own (defined in a subclass).
            op.grad = self.grad
            # Finally, we memoize the new Op
            self.ops[tensor_type] = op
        # And apply the memoized Op to the input we got.
        return self.ops[tensor_type](x)
The subclass that does guided backpropagation through a nonlinearity:
class GuidedBackprop(ModifiedBackprop):
    def grad(self, inputs, out_grads):
        (inp,) = inputs
        (grd,) = out_grads
        dtype = inp.dtype
        print('It works')
        return (grd * (inp > 0).astype(dtype) * (grd > 0).astype(dtype),)
Then I used them in my code as follows:
import lasagne as nn

model_in = T.tensor3()
# model_in = net['input'].input_var
nn.layers.set_all_param_values(net['l_out'], model['param_values'])

relu = nn.nonlinearities.rectify
relu_layers = [layer for layer in nn.layers.get_all_layers(net['l_out'])
               if getattr(layer, 'nonlinearity', None) is relu]
modded_relu = GuidedBackprop(relu)

for layer in relu_layers:
    layer.nonlinearity = modded_relu

prop = nn.layers.get_output(net['l_out'], model_in, deterministic=True)

for sample in range(ini, batch_len):
    model_out = prop[sample, 'z']  # get prop for label 'z'
    gradients = theano.gradient.jacobian(model_out, wrt=model_in)
    # gradients = theano.grad(model_out, wrt=model_in)
    get_gradients = theano.function(inputs=[model_in], outputs=gradients)
    grads = get_gradients(X_batch)  # gradient dimension: X_batch == model_in(64, 20, 32)
    grads = np.array(grads)
    grads = grads[sample]
Now when I run the code, it runs without any error, and the shape of the output is also correct. But that's because it executes the default theano.grad function, not the one that is supposed to override it. In other words, the grad() function in the GuidedBackprop class is never invoked.
I can't understand what the issue is. Is there a solution?
If this is an unresolved issue, is there an implementation of a Theano Op that can achieve such functionality, or some other way to override the gradient for specific nonlinearity functions applied on some of the model's layers?
Are you trying to set the value of the model output back into a model layer's input, bypassing the gradient calculation entirely? Something like:

group_1_ShoryuKen_Left = tf.constant([ 0,0,0,0,0,1,0,0,0,0,0,0, 0,0,0,0,0,1,0,1,0,0,0,0, 0,0,0,0,0,0,0,1,0,0,0,0, 0,0,0,0,0,0,0,0,0,1,0,0 ], shape=(1, 1, 48), dtype=tf.float32)

## layer_2 = tf.keras.layers.Dense(256, kernel_initializer=tf.constant_initializer(1.))
layer_2 = tf.keras.layers.LSTM(32, kernel_initializer=tf.constant_initializer(1.))
b_out = layer_2(group_1_ShoryuKen_Left)
layer_2.set_weights(layer_1.get_weights())  # layer_1 is assumed to be defined earlier with compatible weight shapes

One of the variables modified by an inplace operation

I am relatively new to PyTorch. I want to use this model to generate some images; however, it was written before PyTorch 1.5, and the gradient calculation has been fixed since then, so I get this error message:
RuntimeError: one of the variables needed for gradient computation has been
modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]]
is at version 2; expected version 1 instead.
Hint: enable anomaly detection to find the operation that
failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
I have looked at past examples and am not sure what the problem is here. I believe it is happening within this region, but I don't know where! Any help would be greatly appreciated!
def process(self, images, edges, masks):
    self.iteration += 1

    # zero optimizers
    self.gen_optimizer.zero_grad()
    self.dis_optimizer.zero_grad()

    # process outputs
    outputs = self(images, edges, masks)
    gen_loss = 0
    dis_loss = 0

    # discriminator loss
    dis_input_real = torch.cat((images, edges), dim=1)
    dis_input_fake = torch.cat((images, outputs.detach()), dim=1)
    dis_real, dis_real_feat = self.discriminator(dis_input_real)  # in: (grayscale(1) + edge(1))
    dis_fake, dis_fake_feat = self.discriminator(dis_input_fake)  # in: (grayscale(1) + edge(1))
    dis_real_loss = self.adversarial_loss(dis_real, True, True)
    dis_fake_loss = self.adversarial_loss(dis_fake, False, True)
    dis_loss += (dis_real_loss + dis_fake_loss) / 2

    # generator adversarial loss
    gen_input_fake = torch.cat((images, outputs), dim=1)
    gen_fake, gen_fake_feat = self.discriminator(gen_input_fake)  # in: (grayscale(1) + edge(1))
    gen_gan_loss = self.adversarial_loss(gen_fake, True, False)
    gen_loss += gen_gan_loss

    # generator feature matching loss
    gen_fm_loss = 0
    for i in range(len(dis_real_feat)):
        gen_fm_loss += self.l1_loss(gen_fake_feat[i], dis_real_feat[i].detach())
    gen_fm_loss = gen_fm_loss * self.config.FM_LOSS_WEIGHT
    gen_loss += gen_fm_loss

    # create logs
    logs = [
        ("l_d1", dis_loss.item()),
        ("l_g1", gen_gan_loss.item()),
        ("l_fm", gen_fm_loss.item()),
    ]

    return outputs, gen_loss, dis_loss, logs

def forward(self, images, edges, masks):
    edges_masked = (edges * (1 - masks))
    images_masked = (images * (1 - masks)) + masks
    inputs = torch.cat((images_masked, edges_masked, masks), dim=1)
    outputs = self.generator(inputs)  # in: [grayscale(1) + edge(1) + mask(1)]
    return outputs

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward()
        self.dis_optimizer.step()

    if gen_loss is not None:
        gen_loss.backward()
        self.gen_optimizer.step()
Thank you!
You can't compute the loss for the discriminator and for the generator in one go and then run both back-propagations back-to-back like this:
if dis_loss is not None:
    dis_loss.backward()
    self.dis_optimizer.step()

if gen_loss is not None:
    gen_loss.backward()
    self.gen_optimizer.step()
Here's the reason why: when you call self.dis_optimizer.step(), you modify the parameters of the discriminator in place, the very same parameters that were used to compute gen_loss, which you are trying to backpropagate on. This is not possible.
You have to compute dis_loss, backpropagate it, update the weights of the discriminator, and clear the gradients. Only then can you compute gen_loss, with the newly updated discriminator weights. Finally, backpropagate on the generator.
This tutorial is a good walkthrough over a typical GAN training.
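To make the ordering concrete, here is a minimal sketch of process() restructured along those lines, reusing the variable names from the question; this is a sketch of the idea, not a drop-in patch (logging is omitted):

def process(self, images, edges, masks):
    # 1) discriminator pass and update first
    self.dis_optimizer.zero_grad()
    outputs = self(images, edges, masks)
    dis_real, dis_real_feat = self.discriminator(torch.cat((images, edges), dim=1))
    dis_fake, _ = self.discriminator(torch.cat((images, outputs.detach()), dim=1))
    dis_loss = (self.adversarial_loss(dis_real, True, True)
                + self.adversarial_loss(dis_fake, False, True)) / 2
    dis_loss.backward()
    self.dis_optimizer.step()

    # 2) generator update: re-run the (now updated) discriminator on the fake
    self.gen_optimizer.zero_grad()
    gen_fake, gen_fake_feat = self.discriminator(torch.cat((images, outputs), dim=1))
    gen_loss = self.adversarial_loss(gen_fake, True, False)
    for gf, rf in zip(gen_fake_feat, dis_real_feat):
        # feature-matching targets are detached, so they act as constants
        gen_loss = gen_loss + self.config.FM_LOSS_WEIGHT * self.l1_loss(gf, rf.detach())
    gen_loss.backward()
    self.gen_optimizer.step()

    return outputs, gen_loss, dis_loss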
This might not be exactly an answer to your question, but I got this error when trying to use a "custom" distributed optimizer, e.g. I was using Cherry's optimizer and accidentally moving the model to a DDP model at the same time. Once I only moved the model to the device according to how Cherry worked, I stopped getting this issue.
context: https://github.com/learnables/learn2learn/issues/263
This worked for me. For more details, please see here.
def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)  # modified here
        self.dis_optimizer.step()

    if gen_loss is not None:
        gen_loss.backward()
        self.gen_optimizer.step()

Why PyTorch optimizer might fail to update its parameters?

I am trying to do a simple loss-minimization for a specific variable coeff using PyTorch optimizers. This variable is supposed to be used as an interpolation coefficient for two vectors w_foo and w_bar to find a third vector, w_target.
w_target = w_foo + coeff * (w_bar - w_foo)
With w_foo and w_bar set as constant, at each optimization step I calculate w_target for the given coeff. Loss is determined from w_target using a fairly complex process beyond the scope of this question.
# w_foo.shape = [1, 16, 512]
# w_bar.shape = [1, 16, 512]
# num_layers = 16
# num_steps = 10000

vgg_loss = VGGLoss()

coeff = torch.randn([num_layers, ])
optimizer = torch.optim.Adam([coeff], lr=initial_learning_rate)

for step in range(num_steps):
    w_target = w_foo + torch.matmul(coeff, (w_bar - w_foo))

    optimizer.zero_grad()
    target_image = generator.synthesis(w_target)
    processed_target_image = process(target_image)
    loss = vgg_loss(processed_target_image, source_image)
    loss.backward()
    optimizer.step()
However, when running this optimizer, query_opt does not change from one step to the next, making the optimizer essentially useless. I would like to ask for some advice on what I am doing wrong here.
Edit:
As suggested, I will try to elaborate on the loss function. Essentially, w_target is used to generate an image, and VGGLoss uses VGG feature extractor to compare this synthetic image with a certain exemplar source image.
class VGGLoss(torch.nn.Module):
    def __init__(self, device, vgg):
        super().__init__()
        for param in self.parameters():
            param.requires_grad = True
        self.vgg = vgg  # VGG16 in eval mode

    def forward(self, source, target):
        loss = 0
        source_features = self.vgg(source, resize_images=False, return_lpips=True)
        target_features = self.vgg(target, resize_images=False, return_lpips=True)
        loss += (source_features - target_features).square().sum()
        return loss
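Not a complete diagnosis, but one thing stands out in the snippet above: coeff is created with torch.randn(...), which defaults to requires_grad=False, and PyTorch optimizers silently skip parameters whose .grad is None, which produces exactly this "nothing changes" behavior. A minimal sketch of the change, assuming the rest of the pipeline (generator.synthesis, process, vgg_loss) is differentiable:

# coeff must be a leaf tensor that records gradients; otherwise loss.backward()
# leaves coeff.grad as None and optimizer.step() updates nothing
coeff = torch.randn([num_layers], requires_grad=True)
optimizer = torch.optim.Adam([coeff], lr=initial_learning_rate)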

Using multiple loss functions in pytorch

I was working on an image restoration task and considered multiple loss functions. My plan was to consider 3 routes:
1: Use multiple losses for monitoring, but use only a few of them for training itself.
2: Out of those loss functions that are used for training, give each a weight; currently I am specifying the weights manually. I would like to make that parameter adaptive.
3: If, in between training, I observe a saturation, I would like to change the loss function or its components. Currently I considered re-training the network (if the model saturated in the first training) so that it trains with a particular loss function for, say, the first M epochs, after which I change the loss.
Except for the last case, I developed code that computes these losses, but I am not sure whether it will work, i.e. whether it will backpropagate (code given below).
Is it possible to set the weights adaptively when using a combination of loss functions, i.e. can we train the network so that these weights are also learned?
Can this implementation be used for the above-mentioned case 3 of changing loss functions?
Sorry if anything given here is unclear or wrong. Please let me know if I have to improve the question. (I am kinda new to PyTorch.)
criterion = _criterion()  # note: the class must be instantiated before being called

# -- training
prediction = model(input)
loss = criterion(prediction, target)
loss.backward()

class _criterion(nn.Module):
    def __init__(self, model_type="CNN"):
        super(_criterion, self).__init__()  # note: super(_criterion).__init__() would skip nn.Module's __init__
        self.model_type = model_type

    def forward(self, pred, ref):
        loss_1 = lambda x, y: nn.MSELoss(size_average=False)(x, y)
        loss_2 = lambda x, y: nn.L1Loss(size_average=False)(x, y)
        loss_3 = lambda x, y: nn.SmoothL1Loss(size_average=False)(x, y)
        loss_4 = lambda x, y: L1_Charbonnier_loss_()(x, y)  # user-defined

        if opt.loss_function_order == 1:
            loss_function_1 = get_loss_function(opt.loss_function_1)
            loss = lambda x, y: 1 * loss_function_1(x, y)
        elif opt.loss_function_order == 2:
            loss_function_1 = get_loss_function(opt.loss_function_1)
            loss_function_2 = get_loss_function(opt.loss_function_2)
            weight_1 = opt.loss_function_1_weight
            weight_2 = opt.loss_function_2_weight
            loss = lambda x, y: weight_1 * loss_function_1(x, y) + weight_2 * loss_function_2(x, y)
        elif opt.loss_function_order == 3:
            loss_function_1 = get_loss_function(opt.loss_function_1)
            loss_function_2 = get_loss_function(opt.loss_function_2)
            loss_function_3 = get_loss_function(opt.loss_function_3)
            weight_1 = opt.loss_function_1_weight
            weight_2 = opt.loss_function_2_weight
            weight_3 = opt.loss_function_3_weight
            loss = lambda x, y: weight_1 * loss_function_1(x, y) + weight_2 * loss_function_2(x, y) + weight_3 * loss_function_3(x, y)
        elif opt.loss_function_order == 4:
            loss_function_1 = get_loss_function(opt.loss_function_1)
            loss_function_2 = get_loss_function(opt.loss_function_2)
            loss_function_3 = get_loss_function(opt.loss_function_3)
            loss_function_4 = get_loss_function(opt.loss_function_4)
            weight_1 = opt.loss_function_1_weight
            weight_2 = opt.loss_function_2_weight
            weight_3 = opt.loss_function_3_weight
            weight_4 = opt.loss_function_4_weight
            loss = lambda x, y: weight_1 * loss_function_1(x, y) + weight_2 * loss_function_2(x, y) + weight_3 * loss_function_3(x, y) + weight_4 * loss_function_4(x, y)
        else:
            raise Exception("_criterion : unable to interpret loss_function_order")

        return loss(ref, pred), loss_1(ref, pred), loss_2(ref, pred), loss_3(ref, pred), loss_4(ref, pred)

def get_loss_function(loss):
    if loss == "MSE":
        criterion = nn.MSELoss(size_average=False)
    elif loss == "MAE":
        criterion = nn.L1Loss(size_average=False)
    elif loss == "Smooth-L1":
        criterion = nn.SmoothL1Loss(size_average=False)
    elif loss == "Charbonnier":
        criterion = L1_Charbonnier_loss_()
    else:
        raise Exception("not implemented")
    return criterion

class L1_Charbonnier_loss_(nn.Module):
    def __init__(self):
        super(L1_Charbonnier_loss_, self).__init__()
        self.eps = 1e-6

    def forward(self, X, Y):
        diff = torch.add(X, -Y)
        error = self.eps * ((torch.sqrt(1 + ((diff * diff) / self.eps))) - 1)
        loss = torch.sum(error)
        return loss
As far as I understand your question: your loss computation will backpropagate, but you need to be careful when using the error functions, as they work differently for each situation.
Regarding the weights, you need to save the weights of this network and then load them again into another one using PyTorch's transfer learning, so that you can use the weights from other runs.
Here is the link on how to use transfer learning in PyTorch.
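On the question of learning the weights themselves (case 2), the code above keeps them fixed. One common approach, not from the answer above but in the spirit of uncertainty-based weighting (Kendall et al., 2018), is to register the weights as parameters and add a regularizing term so they don't collapse to zero. A minimal sketch with hypothetical names:

import torch
import torch.nn as nn

class AdaptiveWeightedLoss(nn.Module):
    """Combines several losses with learnable weights (a sketch, not the
    answer's method). exp(-log_var) keeps each weight positive, and the
    additive log_var term penalizes driving all weights to zero."""
    def __init__(self, loss_fns):
        super().__init__()
        self.loss_fns = loss_fns
        # one learnable log-variance per loss term
        self.log_vars = nn.Parameter(torch.zeros(len(loss_fns)))

    def forward(self, pred, ref):
        total = 0
        for i, fn in enumerate(self.loss_fns):
            weight = torch.exp(-self.log_vars[i])
            total = total + weight * fn(pred, ref) + self.log_vars[i]
        return total

# usage: the optimizer must also see the weight parameters
# criterion = AdaptiveWeightedLoss([nn.MSELoss(), nn.L1Loss()])
# optimizer = torch.optim.Adam(list(model.parameters()) + list(criterion.parameters()))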

tensorflow 2 : loss using hidden layers output

I am trying to implement the OSME MAMC model described in the article https://arxiv.org/abs/1806.05372.
I'm stuck where I have to add a cost that doesn't depend on y_true and y_pred, but on the hidden layers and y_true.
It can't be written as a standard tensorflow custom loss, for which we only get y_true and y_pred.
I wrote the model as a class, then tried to use a gradient tape to add NPairLoss to the softmax output loss, but the gradient is NaN during training.
I think my approach isn't good, but I have no idea how to design / write it.
Here is my model:
class OSME_network(tf.keras.Model):
    def __init__(self, nbrclass=10, weight="imagenet", input_tensor=(32, 32, 3)):
        super(OSME_network, self).__init__()
        self.nbrclass = nbrclass
        self.weight = weight
        self.input_tensor = input_tensor

        self.Resnet_50 = ResNet50(include_top=False, weights=self.weight, input_shape=self.input_tensor)
        self.Resnet_50.trainable = False

        self.split = Lambda(lambda x: tf.split(x, num_or_size_splits=2, axis=-1))
        self.s_1 = OSME_Layer(ch=1024, ratio=16)
        self.s_2 = OSME_Layer(ch=1024, ratio=16)
        self.fl1 = tf.keras.layers.Flatten()
        self.fl2 = tf.keras.layers.Flatten()
        self.d1 = tf.keras.layers.Dense(1024, name='fc1')
        self.d2 = tf.keras.layers.Dense(1024, name='fc2')
        self.fc = Concatenate()
        self.preds = tf.keras.layers.Dense(self.nbrclass, activation='softmax')

    #tf.function
    def call(self, x):  # builds the model sequentially
        x = self.Resnet_50(x)
        x_1, x_2 = self.split(x)
        xx_1 = self.s_1(x_1)
        xx_2 = self.s_2(x_2)
        xxx_1 = self.d1(xx_1)
        xxx_2 = self.d2(xx_2)
        xxxx_1 = self.fl1(xxx_1)
        xxxx_2 = self.fl2(xxx_2)
        fc = self.fc([xxxx_1, xxxx_2])  # fc1 + fc2
        ret = self.preds(fc)
        return xxxx_1, xxxx_2, ret

class OSME_Layer(tf.keras.layers.Layer):
    def __init__(self, ch, ratio):
        super(OSME_Layer, self).__init__()
        self.GloAvePool2D = GlobalAveragePooling2D()
        self.Dense1 = Dense(ch // ratio, activation='relu')
        self.Dense2 = Dense(ch, activation='sigmoid')
        self.Mult = Multiply()
        self.ch = ch

    def call(self, inputs):
        squeeze = self.GloAvePool2D(inputs)
        se_shape = (1, 1, self.ch)
        se = Reshape(se_shape)(squeeze)
        excitation = self.Dense1(se)
        excitation = self.Dense2(excitation)
        scale = self.Mult([inputs, excitation])
        return scale
class NPairLoss():
    def __init__(self):
        self._inputs = None
        self._y = None

    #tf.function
    def __call__(self, inputs, y):
        targets = tf.argmax(y, axis=1)
        b, p, _ = inputs.shape
        n = b * p

        inputs = tf.reshape(inputs, [n, -1])
        targets = tf.repeat(targets, repeats=p)
        parts = tf.tile(tf.range(p), [b])

        prod = tf.linalg.matmul(inputs, inputs, transpose_a=False, transpose_b=True)

        same_class_mask = tf.math.equal(tf.broadcast_to(targets, [n, n]), tf.transpose(tf.broadcast_to(targets, (n, n))))
        same_atten_mask = tf.math.equal(tf.broadcast_to(parts, [n, n]), tf.transpose(tf.broadcast_to(parts, (n, n))))

        s_sasc = same_class_mask & same_atten_mask
        s_sadc = (~same_class_mask) & same_atten_mask
        s_dasc = same_class_mask & (~same_atten_mask)
        s_dadc = (~same_class_mask) & (~same_atten_mask)

        loss_sasc = 0
        loss_sadc = 0
        loss_dasc = 0

        for i in range(n):
            # loss_sasc
            pos = prod[i][s_sasc[i]]
            neg = prod[i][s_sadc[i] | s_dasc[i] | s_dadc[i]]
            n_pos = tf.shape(pos)[0]
            n_neg = tf.shape(neg)[0]
            pos = tf.transpose(tf.broadcast_to(pos, [n_neg, n_pos]))
            neg = tf.broadcast_to(neg, [n_pos, n_neg])
            exp = tf.clip_by_value(tf.math.exp(neg - pos), clip_value_min=0, clip_value_max=9e6)  # need to clip value, else inf
            loss_sasc += tf.reduce_sum(tf.math.log(1 + tf.reduce_sum(exp, axis=1)))

            # loss_sadc
            pos = prod[i][s_sadc[i]]
            neg = prod[i][s_dadc[i]]
            n_pos = tf.shape(pos)[0]
            n_neg = tf.shape(neg)[0]
            pos = tf.transpose(tf.broadcast_to(pos, [n_neg, n_pos]))  # np.transpose(np.tile(pos, [n_neg, 1]))
            neg = tf.broadcast_to(neg, [n_pos, n_neg])  # np.tile(neg, [n_pos, 1])
            exp = tf.clip_by_value(tf.math.exp(neg - pos), clip_value_min=0, clip_value_max=9e6)
            loss_sadc += tf.reduce_sum(tf.math.log(1 + tf.reduce_sum(exp, axis=1)))

            # loss_dasc
            pos = prod[i][s_dasc[i]]
            neg = prod[i][s_dadc[i]]
            n_pos = tf.shape(pos)[0]
            n_neg = tf.shape(neg)[0]
            pos = tf.transpose(tf.broadcast_to(pos, [n_neg, n_pos]))  # np.transpose(np.tile(pos, [n_neg, 1]))
            neg = tf.broadcast_to(neg, [n_pos, n_neg])  # np.tile(neg, [n_pos, 1])
            exp = tf.clip_by_value(tf.math.exp(neg - pos), clip_value_min=0, clip_value_max=9e6)
            loss_dasc += tf.reduce_sum(tf.math.log(1 + tf.reduce_sum(exp, axis=1)))

        return (loss_sasc + loss_sadc + loss_dasc) / n
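An aside on the NaN mentioned above: each term has the form log(1 + Σ exp(neg − pos)), and tf.math.exp can overflow to inf before the clip is applied; clipping makes the forward value finite, but the backward pass then multiplies a zero clip-gradient by an inf exp-gradient, and 0 · inf = NaN. A numerically stabler sketch of one such term, using the identity log(1 + Σ_j exp(d_j)) = logsumexp([0, d_1, ..., d_k]) (same pos/neg tensors as in the loop above):

diff = neg - pos                         # shape [n_pos, n_neg]
padded = tf.pad(diff, [[0, 0], [1, 0]])  # prepend a zero column, since exp(0) = 1
loss_sasc += tf.reduce_sum(tf.reduce_logsumexp(padded, axis=1))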
then, for training:

#tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        fc1, fc2, y_pred = model(x, training=True)
        stacked = tf.stack([fc1, fc2], axis=1)
        layerLoss = npair(stacked, y)
        loss = cce(y, y_pred) + 0.001 * layerLoss
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss

model = OSME_network(weight="imagenet", nbrclass=10, input_tensor=(32, 32, 3))
model.compile(optimizer=opt, loss=categorical_crossentropy, metrics=["acc"])
model.build(input_shape=(None, 32, 32, 3))

# note: the model's last layer already applies a softmax, so from_logits=True is inconsistent here
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True, name='categorical_crossentropy')
npair = NPairLoss()

and for each batch:

x = tf.Variable(x_train[start:end])
y = tf.Variable(y_train[start:end])
train_loss = train_step(x, y)
Thanks for any help :)
You can use tensorflow's add_loss.
model.compile() loss functions in Tensorflow always take two parameters y_true and y_pred. Using model.add_loss() has no such restriction and allows you to write much more complex losses that depend on many other tensors, but it has the inconvenience of being more dependent on the model, whereas the standard loss functions work with just any model.
You can find the official documentation of add_loss here. It "adds loss tensor(s), potentially dependent on layer inputs". This method can be used inside a subclassed layer or model's call function, in which case losses should be a Tensor or a list of Tensors. There are a few examples in the documentation explaining add_loss.
This method can also be called directly on a Functional Model during construction. In this case, any loss Tensors passed to this Model must be symbolic and be able to be traced back to the model's Inputs. These losses become part of the model's topology and are tracked in get_config.
Example :
inputs = tf.keras.Input(shape=(10,))
x = tf.keras.layers.Dense(10)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
# Activity regularization.
model.add_loss(tf.abs(tf.reduce_mean(x)))
You can call self.add_loss(loss_value) from inside the call method of a custom layer. Here's a simple example that adds activity regularization.
Example:
class ActivityRegularizationLayer(layers.Layer):
    def call(self, inputs):
        self.add_loss(tf.reduce_sum(inputs) * 0.1)
        return inputs  # Pass-through layer.

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# The displayed loss will be much higher than before
# due to the regularization component.
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1)
You can find good examples of using add_loss here and here, with explanations.
Hope this answers your question. Happy Learning.
