I want to multiply a Keras layer with my own Variable.
Then, I want to compute the gradients of some loss relative to the variables I have defined.
Here is a simplified MWE of what I am trying to do:
import tensorflow as tf
x = input_shape = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(x)
s = tf.Variable(tf.ones((5,)))
x = x*s
model = tf.keras.models.Model(input_shape, x)
X = tf.random.normal((50, 10)) # random sample
with tf.GradientTape() as tape:
tape.watch(s)
y = model(X)
loss = y**2
print(tape.gradient(loss, s)) # why None ??
The print prints None... why?
Notice that I am using eager-execution (TF version 2.0.0).
I managed to fix my problem by sub-classing Model and creating my variable inside the model:
class MyModel(tf.keras.Model):
def __init__(self):
super().__init__()
self.dense = tf.keras.layers.Dense(5)
self.s = tf.Variable(tf.ones((5,)))
def call(self, inputs):
x = self.dense(inputs)
x = x * self.s
return x
Alternatively, defining my own custom layer also works.
There must be some magic going on whereby variables not inside a model are not backpropagated (like in PyTorch).
I will leave the question open because I am curious as to why my code was not working and what a simpler fix would look like.
This might be the explanation. Based on reviewing the documentation, I'm suspecting that the issue is the differentiation with respect to the model layer "s" (or any other layer say "x") might not be a meaningful calculation. For example, it is possible to do this:
print(tape.gradient(loss, model.variables))
and obtain the gradients with respect to the model weights/parameters, but differentiating the model with respect to a "layer" is not appropriate. This is my speculation at this point. I hope this helps.
Related
I am trying to make a model in tensorflow using the keras subclasses method.
Q1) I am correctly calling layers as layers = [] and then using layers.append(GTLayer....) ?
Q2) calling GTLayer in init of GTN will run class GTLayer and will it call self.conv1 (which will return a tensor A from GTNconv) and self.conv2 (which will again return a tensor A from GTNconv)and then start the call mrthod of GTLayer to H,W , Am I right?
Q3) What happens to the returned H and W from 'Q2' will it store in layers[] list ? and then when we further call the GTNs call method it will bring up those layer? Am I correct?
Q4)Later in the GTNs call method I had to implement linear layers and thus I defined model = tf.keras.models.Sequential() and after theat initialised self.linear1 and self.linear2, this way I have implemented subclassing and sequential both! Is that correct?
Q5) I will finally get loss, y, Ws from calling GTN , now if I assign my model = GTN(arguments..) how will I do the training and back-propagation steps? using an optimiser and loss function? will it follow model.compile() and model.fit ? Or can we make it any different in the sub-classing method of keras?
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
class GTN(layers.Layer):
def __init__(self, num_edge, num_channels,num_layers,norm):
super(GTN, self).__init__()
self.num_edge = num_edge
self.num_channels = num_channels
self.num_layers = num_layers
self.is_norm = norm
layers = []
for i in tf.range(num_layers):
if i == 0:
layers.append(GTLayer(num_edge, num_channels, first=True))
else:
layers.append(GTLayer(num_edge, num_channels, first=False))
model = tf.keras.models.Sequential()
self.loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
self.linear1 = model.add(tf.keras.layers.Dense(self.w_out, input_shape=(self.w_out*self.num_channels,), activation=None))
self.linear2 = model.add(tf.keras.layers.Dense(self.num_class, input_shape=(self.w_out,), activation=None))
def gcn_conv(self,X,H):
X = tf.matmul(X, self.weight)
H = self.norm(H, add=True)
return tf.matmul(tf.transpose(H),X)
def call(self, A, X, target_x, target):
A = tf.expand_dims(A, 0)
Ws = []
for i in range(self.num_layers):
H = self.normalization(H)
H, W = self.layers[i](A, H)
Ws.append(W)
for i in range(self.num_channels):
X_tmp = tf.nn.relu(self.gcn_conv(X,H[i])).numpy()
X_ = tf.concat((X_,X_tmp), dim=1)
X_ = self.linear1(X_)
X_ = tf.nn.relu(X_).numpy()
y = self.linear2(X_[target_x])
loss = self.loss(y, target)
return loss, y, Ws
class GTLayer(keras.layers.Layer):
def __init__(self, in_channels, out_channels, first=True):
super(GTLayer, self).__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.conv1 = GTConv(in_channels, out_channels)
self.conv2 = GTConv(in_channels, out_channels)
def call(self, A, H_=None):
a = self.conv1(A)
b = self.conv2(A)
H = tf.matmul( a, b)
W = [tf.stop_gradient(tf.nn.softmax(self.conv1.weight, axis=1).numpy()),
tf.stop_gradient(tf.nn.softmax(self.conv2.weight, axis=1).numpy()) ]
return H,W
class GTConv(keras.layers.Layer):
def __init__(self, in_channels, out_channels):
super(GTConv, self).__init__()
def call(self, A):
A = tf.add_n(tf.nn.softmax(self.weight))
return A
Q1
No. There are two possibilities here
1 - If you want to access a standard layers property of Keras models:
Only Model has a layers property, a keras.layers.Layer doesn't have this property
You are not supposed to mess with the layers property of a Model, you should just read it
The variable you are creating named layers is not a property of your class because you did not use self.layers.
2 - If you just want a list named layers for personal use in your class:
I recommend you don't use a standard name like this and change it to myLayers or something like that to avoid confusion.
The variable layers you created is not being used anywhere else in your code, you just created it and never used.
Remember that layers = [] just creates a local variable, while self.layers = [] creates a property in your class that can be used in other methods inside your class
Q2
You are not "calling" GTLayer, you are "creating" GTLayer. This means that you are running GTLayer.__init__().
This distinction is important in Keras:
This is "creating" a layer: layer_instance = GTLayer(...), which runs __init__
This is "calling" a layer: layer_instance(input_tensors), which runs __call__ (which will eventually run call as defined by you)
You can do both in the same line as output_tensors = GTLayer(...)(input_tensors)
So, this is happening in GTN.__init__:
You are "creating" two instances of the GTLayer.
This runs GTLayer.__init__() for each instance
This hits the lines self.conv1 = GTConv(in_channels, out_channels) and self.conv2 = GTConv(in_channels, out_channels)
This is also "creating" (not "calling") GTConv.
self.conv1 and self.conv2 are "Layer" instances, not tensors.
Q3
No tensor is produced here because you never "called" any layer in GTN.__init__().
(And this is ok. Usually, you "create" layers inside __init__() and "call" layers inside call.)
Your layers local variable will have "instances of GTLayer".
Q4
You mixed two approaches in a strange way.
You can, of course, use a Sequential model if you want, but it's not necessary, and you're not using it correcly.
If in call you are calling each layer (that is X_ = self.linear1(X_) and y = self.linear2(X_[target_x])), you don't need a Sequential model at all, and you can just have the following in GTN.__init__() (this is the best approach for subclassing):
self.linear1 = tf.keras.layers.Dense(self.w_out, input_shape=(self.w_out*self.num_channels,), activation=None)
self.linear2 = tf.keras.layers.Dense(self.num_class, input_shape=(self.w_out,), activation=None)
But you could have self.submodel = Sequential(...) and then use self.submodel in GTN.call(). But having a Model inside a layer sounds weird and might cause some strange behavior in specific cases. And, of course, the ReLUs should be a part of this submodel.
Q5
I will finally get loss, y, Ws from calling GTN
That loss and weights coming from call is a "very very" strange thing. I never saw this and I don't understand why you're doing it this way. This is not standard use of Keras and only in very specific and otherwise unsolvable cases you'd try something like this. I cannot say it will work.
How will I do the training and back-propagation steps?
You should have implemented a keras.models.Model, not a keras.layers.Layer. Only models have the ability to compile and train.
Usually, you'd not create a loss in call, you'd create a loss in model.compile, unless you're dealing with unconventional losses, like weight or activity regularization, things that really depend on the layer and not on the model's inputs/outputs.
Extra tips
There is no need to create custom layers if you're not going to create custom trainable weights. It's not wrong, of course, but also not necessary. It can help organize your code, or just add extra complication.
You are trying to use weight from your layers, but you never defined any weight anywhere.
I'm pretty sure there is a better way to achieve what you want, but I don't know what you want (and that would be something for another question, I think...)
This might be a good reading for subclassing: https://www.tensorflow.org/guide/keras/custom_layers_and_models?hl=en
I am building up a cascade of neural networks and I would like to backpropagate the main loss back to the DNNs and also compute an auxillary loss back to each DNN.
I am trying to figure out what is the best practice when building such a model and how to make sure that my losses are computed properly. Do I build a single torch.nn.Module and a single optimizer, or do I have to create separate modules and optimizers for each network? Also I am likely to have more than three cascaded DNNs.
Approach a)
import torch
from torch import nn, optim
class MasterNetwork(nn.Module):
def init(self):
super(MasterNetwork, self).__init__()
dnn1 = nn.ModuleList()
dnn2 = nn.ModuleList()
dnn3 = nn.ModuleList()
def forward(self, x, z1, z2):
out1 = dnn1(x)
out2 = dnn2(out1 + z1)
out3 = dnn3(out2 + z2)
return [out1, out2, out3]
def LossFunction(in):
# do stuff
return loss # loss is a scalar value
def ac_loss_1_fn(in):
# do stuff
return loss # loss is a scalar value
def ac_loss_2_fn(in):
# do stuff
return loss # loss is a scalar value
def ac_loss_3_fn(in):
# do stuff
return loss # loss is a scalar value
model = MasterNetwork()
optimizer = optim.Adam(model.parameters())
input = torch.tensor()
z1 = torch.tensor()
z2 = torch.tensor()
outputs = model(input, z1, z2)
main_loss = LossFunction(outputs[2])
ac1_loss = ac_loss_1_fn(outputs[0])
ac2_loss = ac_loss_2_fn(outputs[1])
ac3_loss = ac_loss_3_fn(outputs[2])
optimizer.zero_grad()
'''
This is where I am uncertain about how to backpropagate the AC losses for each DNN
in addition to the main loss.
'''
optimizer.step()
Approach b)
This would creating a nn.Module class and optimizer for each DNN and then forwarding the loss to the next DNN.
I would prefer to have a solution for approach a) since it is less tedious and I don't have to deal with tuning multiple optimizers. However, I am not sure if this is possible. There was a similar question about backpropagating multiple losses, however, I was not able to understand how combining the losses would work for the distinct components.
the solution you are looking for is likely to use some form of the following:
y = torch.tensor([main_loss, ac1_loss, ac2_loss, ac3_loss])
y.backward(gradient=torch.tensor([1.0,1.0,1.0,1.0]))
See https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#gradients for confirmation.
A similar question exists but this one uses a different phrasing and was the question which I found first when hitting the issue. The similar question can be found at Pytorch. Can autograd be used when the final tensor has more than a single value in it?
I want to approximate the function g(x) = exp(x) with a linear combination of functions, h(x) = sum_i(a_i * f_i(x)) using a neural network with TF2.
Now, the network input is just x and the outputs are the functions f_i.
The custom loss function is simple: the mean squared difference |g(x) - h(x)|^2.
My problem is that I don't understand how to define/use a_i?
First, all a_i are just scalars, In addition they don't depend on x.
Defining a_i as inputs doesn't make sense since I want them to be optimized.
Defining as outputs makes them depend on x which I don't want.
How can I add these variables as scalars to the network and make them optimized by the optimization process ?.
During training, Tensorflow tracks all tf.Variable objects defined in the model. If you define your constant as a tf.Variable, it will be ajusted via backpropagation.
Let's say we have a dataset wi X, and y which is y = X * 2:
import tensorflow as tf
x = tf.random.uniform((10_000, 1))
y = x * 2
We will create a model which has a constant inside, which will need to replicate the relationship between X and y. We will of course initialize this value as something other than 2. The relationship between X and y is 2 so the training should make the constant converge towards 2. So let's define a model that is nothing but a constant.
class CustomModel(tf.keras.models.Model):
def __init__(self):
super(CustomModel, self).__init__()
self.constant = tf.Variable(initial_value=0.1, dtype=tf.float32, trainable=True)
def call(self, x, **kwargs):
x = self.constant * x
return x
model = CustomModel()
Now just compile and train the model:
model.compile(loss='mae')
history = model.fit(x, y, epochs=25, verbose=0)
Now look at the weights. The constant, which was initialized with value 0.1, is now 2. It was optimized, as it understood the relationship between X and y, which is 2.
model.weights
[<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>]
I am trying to use a manually calculate a gradient using the output of my network, I will then use this in a loss function. I have managed to get an example working in keras, but converting it to PyTorch has proven more difficult
I have a model like:
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.fc1 = nn.Linear(1, 50)
self.fc2 = nn.Linear(50, 10)
self.fc3 = nn.Linear(10, 1)
def forward(self, x):
x = F.sigmoid(self.fc1(x))
x = F.sigmoid(self.fc2(x))
x = self.fc3(x)
return x
and some data:
x = torch.unsqueeze(torch.linspace(-1, 1, 101), dim=1)
x = Variable(x)
I can then try find a gradient like:
output = net(x)
grad = torch.autograd.grad(outputs=output, inputs=x, retain_graph=True)[0]
I want to be able to find the gradient of each point, then do something like:
err_sqr = (grad - x)**2
loss = torch.mean(err_sqr)**2
However, at the moment if I try to do this I get the error:
grad can be implicitly created only for scalar outputs
I have tried changing the shape of my network output to fix this, but if I change it to much it says its not part of the graph. I can get rid of that error by allowing that, but then it says my gradient is None. I've managed to get this working in keras, so I'm confident that its possible here too, I just need a hand!
My questions are:
Is there a way to "fix" what I have to allow me to calculate the gradient
PyTorch expects an upstream gradient in the grad call. For usual (scalar) loss functions, the upstream gradient is implicitly assumed to be 1.
You can do a similar thing by passing ones as the upstream gradient:
grad = torch.autograd.grad(outputs=output, inputs=x, grad_outputs=torch.ones_like(output), retain_graph=True)[0]
I am trying to perform the most basic function minimisation possible in TensorFlow 2.0, exactly as in the question Tensorflow 2.0: minimize a simple function, however I cannot get the solution described there to work. Here is my attempt, mostly copy-pasted but with some bits that seemed to be missing added in.
import tensorflow as tf
x = tf.Variable(2, name='x', trainable=True, dtype=tf.float32)
with tf.GradientTape() as t:
y = tf.math.square(x)
# Is the tape that computes the gradients!
trainable_variables = [x]
#### Option 2
# To use minimize you have to define your loss computation as a funcction
def compute_loss():
y = tf.math.square(x)
return y
opt = tf.optimizers.Adam(learning_rate=0.001)
train = opt.minimize(compute_loss, var_list=trainable_variables)
print("x:", x)
print("y:", y)
Output:
x: <tf.Variable 'x:0' shape=() dtype=float32, numpy=1.999>
y: tf.Tensor(4.0, shape=(), dtype=float32)
So it says the minimum is at x=1.999, but obviously that is wrong. So what happened? I suppose it only performed one loop of the minimiser or something? If so then "minimize" seems like a terrible name for the function. How is this supposed to work?
On a side note, I also need to know the values of intermediate variables that are calculated in the loss function (the example only has y, but imagine that it took several steps to compute y and I want all those numbers). I don't think I am using the gradient tape correctly either, it is not obvious to me that it has anything to do with the computations in the loss function (I just copied this stuff from the other question).
You need to call minimize multiple times, because minimize only performs a single step of your optimisation.
Following should work
import tensorflow as tf
x = tf.Variable(2, name='x', trainable=True, dtype=tf.float32)
# Is the tape that computes the gradients!
trainable_variables = [x]
# To use minimize you have to define your loss computation as a funcction
class Model():
def __init__(self):
self.y = 0
def compute_loss(self):
self.y = tf.math.square(x)
return self.y
opt = tf.optimizers.Adam(learning_rate=0.01)
model = Model()
for i in range(1000):
train = opt.minimize(model.compute_loss, var_list=trainable_variables)
print("x:", x)
print("y:", model.y)