Understanding Keras subclass method in Tensorflow's deep learning pipeline - python

I am trying to make a model in tensorflow using the keras subclasses method.
Q1) I am correctly calling layers as layers = [] and then using layers.append(GTLayer....) ?
Q2) calling GTLayer in init of GTN will run class GTLayer and will it call self.conv1 (which will return a tensor A from GTNconv) and self.conv2 (which will again return a tensor A from GTNconv)and then start the call mrthod of GTLayer to H,W , Am I right?
Q3) What happens to the returned H and W from 'Q2' will it store in layers[] list ? and then when we further call the GTNs call method it will bring up those layer? Am I correct?
Q4)Later in the GTNs call method I had to implement linear layers and thus I defined model = tf.keras.models.Sequential() and after theat initialised self.linear1 and self.linear2, this way I have implemented subclassing and sequential both! Is that correct?
Q5) I will finally get loss, y, Ws from calling GTN , now if I assign my model = GTN(arguments..) how will I do the training and back-propagation steps? using an optimiser and loss function? will it follow model.compile() and model.fit ? Or can we make it any different in the sub-classing method of keras?
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
class GTN(layers.Layer):
def __init__(self, num_edge, num_channels,num_layers,norm):
super(GTN, self).__init__()
self.num_edge = num_edge
self.num_channels = num_channels
self.num_layers = num_layers
self.is_norm = norm
layers = []
for i in tf.range(num_layers):
if i == 0:
layers.append(GTLayer(num_edge, num_channels, first=True))
else:
layers.append(GTLayer(num_edge, num_channels, first=False))
model = tf.keras.models.Sequential()
self.loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
self.linear1 = model.add(tf.keras.layers.Dense(self.w_out, input_shape=(self.w_out*self.num_channels,), activation=None))
self.linear2 = model.add(tf.keras.layers.Dense(self.num_class, input_shape=(self.w_out,), activation=None))
def gcn_conv(self,X,H):
X = tf.matmul(X, self.weight)
H = self.norm(H, add=True)
return tf.matmul(tf.transpose(H),X)
def call(self, A, X, target_x, target):
A = tf.expand_dims(A, 0)
Ws = []
for i in range(self.num_layers):
H = self.normalization(H)
H, W = self.layers[i](A, H)
Ws.append(W)
for i in range(self.num_channels):
X_tmp = tf.nn.relu(self.gcn_conv(X,H[i])).numpy()
X_ = tf.concat((X_,X_tmp), dim=1)
X_ = self.linear1(X_)
X_ = tf.nn.relu(X_).numpy()
y = self.linear2(X_[target_x])
loss = self.loss(y, target)
return loss, y, Ws
class GTLayer(keras.layers.Layer):
def __init__(self, in_channels, out_channels, first=True):
super(GTLayer, self).__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.conv1 = GTConv(in_channels, out_channels)
self.conv2 = GTConv(in_channels, out_channels)
def call(self, A, H_=None):
a = self.conv1(A)
b = self.conv2(A)
H = tf.matmul( a, b)
W = [tf.stop_gradient(tf.nn.softmax(self.conv1.weight, axis=1).numpy()),
tf.stop_gradient(tf.nn.softmax(self.conv2.weight, axis=1).numpy()) ]
return H,W
class GTConv(keras.layers.Layer):
def __init__(self, in_channels, out_channels):
super(GTConv, self).__init__()
def call(self, A):
A = tf.add_n(tf.nn.softmax(self.weight))
return A

Q1
No. There are two possibilities here
1 - If you want to access a standard layers property of Keras models:
Only Model has a layers property, a keras.layers.Layer doesn't have this property
You are not supposed to mess with the layers property of a Model, you should just read it
The variable you are creating named layers is not a property of your class because you did not use self.layers.
2 - If you just want a list named layers for personal use in your class:
I recommend you don't use a standard name like this and change it to myLayers or something like that to avoid confusion.
The variable layers you created is not being used anywhere else in your code, you just created it and never used.
Remember that layers = [] just creates a local variable, while self.layers = [] creates a property in your class that can be used in other methods inside your class
Q2
You are not "calling" GTLayer, you are "creating" GTLayer. This means that you are running GTLayer.__init__().
This distinction is important in Keras:
This is "creating" a layer: layer_instance = GTLayer(...), which runs __init__
This is "calling" a layer: layer_instance(input_tensors), which runs __call__ (which will eventually run call as defined by you)
You can do both in the same line as output_tensors = GTLayer(...)(input_tensors)
So, this is happening in GTN.__init__:
You are "creating" two instances of the GTLayer.
This runs GTLayer.__init__() for each instance
This hits the lines self.conv1 = GTConv(in_channels, out_channels) and self.conv2 = GTConv(in_channels, out_channels)
This is also "creating" (not "calling") GTConv.
self.conv1 and self.conv2 are "Layer" instances, not tensors.
Q3
No tensor is produced here because you never "called" any layer in GTN.__init__().
(And this is ok. Usually, you "create" layers inside __init__() and "call" layers inside call.)
Your layers local variable will have "instances of GTLayer".
Q4
You mixed two approaches in a strange way.
You can, of course, use a Sequential model if you want, but it's not necessary, and you're not using it correcly.
If in call you are calling each layer (that is X_ = self.linear1(X_) and y = self.linear2(X_[target_x])), you don't need a Sequential model at all, and you can just have the following in GTN.__init__() (this is the best approach for subclassing):
self.linear1 = tf.keras.layers.Dense(self.w_out, input_shape=(self.w_out*self.num_channels,), activation=None)
self.linear2 = tf.keras.layers.Dense(self.num_class, input_shape=(self.w_out,), activation=None)
But you could have self.submodel = Sequential(...) and then use self.submodel in GTN.call(). But having a Model inside a layer sounds weird and might cause some strange behavior in specific cases. And, of course, the ReLUs should be a part of this submodel.
Q5
I will finally get loss, y, Ws from calling GTN
That loss and weights coming from call is a "very very" strange thing. I never saw this and I don't understand why you're doing it this way. This is not standard use of Keras and only in very specific and otherwise unsolvable cases you'd try something like this. I cannot say it will work.
How will I do the training and back-propagation steps?
You should have implemented a keras.models.Model, not a keras.layers.Layer. Only models have the ability to compile and train.
Usually, you'd not create a loss in call, you'd create a loss in model.compile, unless you're dealing with unconventional losses, like weight or activity regularization, things that really depend on the layer and not on the model's inputs/outputs.
Extra tips
There is no need to create custom layers if you're not going to create custom trainable weights. It's not wrong, of course, but also not necessary. It can help organize your code, or just add extra complication.
You are trying to use weight from your layers, but you never defined any weight anywhere.
I'm pretty sure there is a better way to achieve what you want, but I don't know what you want (and that would be something for another question, I think...)
This might be a good reading for subclassing: https://www.tensorflow.org/guide/keras/custom_layers_and_models?hl=en

Related

Access to layer weights from a tf.keras model

I am trying to replicate a tensorflow subclassed model, but I'm having problems accessing to the weights of a layer included in the model. Here's a summarized definition of the model:
class model():
def __init__(self, dims, size):
self._dims = dims
self.size = size
self.autoencoder= None
self.encoder = None
self.decoder = None
self.model = None
def initialize(self):
self.autoencoder, self.encoder, self.decoder = mlp_autoencoder(self.dims)
output = MyLayer(self.size, name= 'MyLayer')(self.encoder.output)
self.model = Model(inputs= self.autoencoder.input,
outputs= [self.autoencoder.output, output])
mlp_autoencoder defines as many encoder and decoder layers as introduced in dims.
MyLayer's trainable weights are learnt in the encoder's latent space and are then used to return the second output.
There are no issues accessing to the autoencoder weights, the problem is when trying to get MyLayer's weights. The first time it crashes is in the following part of the code:
#property
def layer_weights(self):
return self.model.get_layer(name= 'MyLayer').get_weights()
# ValueError: No such layer: MyLayer.
By building the model this way a different TFOpLambda Layer is created for each transformation made to the encoder.output in the custom layer. I tried getting the weights through the last TFOpLambda layer (the second output of the model) but get_weights returns an empty list. In summary, these weights are never stored in the model.
I checked if MyLayer is well defined by using it separately, and it creates and stores the variables just fine, I had no issues accessing them. The problem appears when trying to use this layer in model.
Can someone more knowledgable in subclassing tell if there is something wrong in the definition of the model? I've considered using build and call as it seems to be the 'standard' way, but there's gotta be a simpler way...
I can provide more details of the program if needed.
Thanks in advance!
A (not very elegant) way to solve it is calling the custom layer in the __init__ method. By doing this, the layer is created as a model attribute making its weights accesible.
def __init__(self, dims, size):
self.dims = dims
self.size = size
self.autoencoder = None
self.encoder = None
self.decoder = None
self.model = None
self.custom_layer = MyLayer(self.size, name= 'MyLayer')
def initialize(self):
self.autoencoder, self.encoder, self.decoder = mlp_autoencoder(self.dims)
h = self.custom_layer(self.encoder.output)
self.model = Model(inputs= self.autoencoder.input,
outputs= [self.autoencoder.output, h])
Getting weights:
def layer_weights(self):
return self.custom_layer.get_weights()[0]

The difference of loading model parameters between load_state_dict and nn.Parameter in pytorch

When I wanna assign part of pre-trained model parameters to another module defined in a new model of PyTorch, I got two different outputs using two different methods.
The Network is defined as follows:
class Net:
def __init__(self):
super(Net, self).__init__()
self.resnet = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
self.resnet = nn.Sequential(*list(self.resnet.children())[:-1])
self.freeze_model(self.resnet)
self.classifier = nn.Sequential(
nn.Dropout(),
nn.Linear(512, 256),
nn.ReLU(),
nn.Linear(256, 3),
)
def forward(self, x):
out = self.resnet(x)
out = out.flatten(start_dim=1)
out = self.classifier(out)
return out
What I want is to assign pre-trained parameters to classifier in the net module. Two different ways were used for this task.
# First way
net.load_state_dict(torch.load('model_CNN_pretrained.ptl'))
# Second way
params = torch.load('model_CNN_pretrained.ptl')
net.classifier[1].weight = nn.Parameter(params['classifier.1.weight'], requires_grad =False)
net.classifier[1].bias = nn.Parameter(params['classifier.1.bias'], requires_grad =False)
net.classifier[3].weight = nn.Parameter(params['classifier.3.weight'], requires_grad =False)
net.classifier[3].bias = nn.Parameter(params['classifier.3.bias'], requires_grad =False)
The parameters were assigned correctly but got two different outputs from the same input data. The first method works correctly, but the second doesn't work well. Could some guys point what the difference of these two methods?
Finally, I find out where is the problem.
During the pre-trained process, buffer parameters in BatchNorm2d Layer of ResNet18 model were changed even if we set require_grad of parameters False. Buffer parameters were calculated by the input data after model.train() was processed, and unchanged after model.eval().
There is a link about how to freeze the BN layer.
How to freeze BN layers while training the rest of network (mean and var wont freeze)

Differentiating user-defined Variables when using Keras layers

I want to multiply a Keras layer with my own Variable.
Then, I want to compute the gradients of some loss relative to the variables I have defined.
Here is a simplified MWE of what I am trying to do:
import tensorflow as tf
x = input_shape = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(x)
s = tf.Variable(tf.ones((5,)))
x = x*s
model = tf.keras.models.Model(input_shape, x)
X = tf.random.normal((50, 10)) # random sample
with tf.GradientTape() as tape:
tape.watch(s)
y = model(X)
loss = y**2
print(tape.gradient(loss, s)) # why None ??
The print prints None... why?
Notice that I am using eager-execution (TF version 2.0.0).
I managed to fix my problem by sub-classing Model and creating my variable inside the model:
class MyModel(tf.keras.Model):
def __init__(self):
super().__init__()
self.dense = tf.keras.layers.Dense(5)
self.s = tf.Variable(tf.ones((5,)))
def call(self, inputs):
x = self.dense(inputs)
x = x * self.s
return x
Alternatively, defining my own custom layer also works.
There must be some magic going on whereby variables not inside a model are not backpropagated (like in PyTorch).
I will leave the question open because I am curious as to why my code was not working and what a simpler fix would look like.
This might be the explanation. Based on reviewing the documentation, I'm suspecting that the issue is the differentiation with respect to the model layer "s" (or any other layer say "x") might not be a meaningful calculation. For example, it is possible to do this:
print(tape.gradient(loss, model.variables))
and obtain the gradients with respect to the model weights/parameters, but differentiating the model with respect to a "layer" is not appropriate. This is my speculation at this point. I hope this helps.

How to create composable Keras layers while maintaining access to child layers?

I'm trying to create a custom Keras Layer that is composed of a variable number of other Keras layers. I want access to both the grouped layers treated as a single layer with an input and output as well as access to the individual sublayers.
I'm just wondering, what is the "proper" method for performing this form of layer abstraction in Keras?
I'm not sure if this should be done through Keras's Layer API, or if I can work with name scopes and wrapping layers in a function and then select all layers under some name scope. (e.g. model.get_layer('custom1/*'), but I don't think that works, and name_scope doesn't appear to apply to layer names)
One of the other issues with using Keras Layers for this is that you need to assign the child variables as direct attributes (I assume it makes use of the descriptor API) to the class in order for the trainable weights to be added to the model (which is messier when we don't have a fixed number of layers, meaning we'd need to hack something together with setattr(), which, ew).
Btw, this isn't my actual code, but is a barebones simplification of what I'm trying to accomplish.
class CustomLayer(Layer):
def __init__(self, *a, n_layers=2, **kw):
self.n_layers = n_layers
super().__init__(*a, **kw)
def call(self, x):
for i in range(self.n_layers):
x = Dense(64, activation='relu')(x)
x = Dropout(0.2)(x)
return x
input_shape, n_classes = (None, 100), 10
x = input = Input(input_shape)
x = CustomLayer()(x)
x = CustomLayer()(x)
x = Dense(n_classes, activation='softmax')(x)
model = Model([input], [x])
'''
So this works fine
'''
print([l.name for l in model.layers])
print(model.layers[1].name)
print(model.layers[1].output)
# Output: ['input', 'custom_1', 'custom_2', 'dense']
# Output: 'custom_1'
# Output: the custom_1 output tensor
'''
But I'm not sure how to do something to the effect of this ...
'''
print(model.layers[1].layers[0].name)
print(model.layers[1].layers[0].output)
# Output: custom_1/dense
# Output: custom_1/dense output tensor

Is keras based on closures in python?

While working with keras and tensorflow, I found the following lines of code confusing.
w_init = tf.random_normal_initializer()
self.w = tf.Variable(initial_value=w_init(shape=(input_dim, units),
dtype='float32'),trainable=True)
Also, I have seen something like:
Dense(64, activation='relu')(x)
Therefore, if Dense(...) will create the object for me, then how can I follow that with with (x)?
Likewise for w_init above. How can I say such thing:
tf.random_normal_initializer()(shape=(input_dim, units), dtype='float32'),trainable=True)
Do we have such thing in python "ClassName()" followed by "()" while creating an object such as a layer?
While I was looking into Closures in python, I found that a function can return another function. Hence, is this what really happens in Keras?
Any help is much appreciated!!
These are two totally different ways to define models.
Keras
Keras works with the concept of layers. Each line defines a full layer of your network. What you are referring to in specific is keras' functional API. The concept is to combine layers like this:
inp = Input(shape=(28, 28, 1))
x = Conv2D((6,6), strides=(1,1), activation='relu')(inp)
# ... etc ...
x = Flatten()(x)
x = Dense(10, activation='softmax')(x)
model = Model(inputs=[inp], outputs=[x])
This way you've created a full CNN in just a few lines. Note that you never had to manually input the shape of the weight vectors or the operations that are performed. These are inferred automatically by keras.
Now, this just needs to be compiled through model.compile(...) and then you can train it through model.fit(...).
Tensorflow
On the other hand TensorFlow is a bit more low-level. This means that you have do define the variables and operations by hand. So in order to write a fully-connected layer you'd have to do the following:
# Input placeholders
x = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))
y = tf.placeholder(tf.float32, shape=(None, 10))
# Convolution layer
W1 = tf.Variable(tf.truncated_normal([6, 6, 1, 32], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, tf.float32, [32]))
z1 = tf.nn.conv2d(x_2d, W1, strides=[1, 1, 1, 1], padding='SAME') + b1
c1 = tf.nn.relu(z1)
# ... etc ...
# Flatten
flat = tf.reshape(p2, [-1, ...]) # need to calculate the ... by ourselves
# Dense
W3 = tf.Variable(tf.truncated_normal([..., 10], stddev=0.1)) # same size as before
b3 = tf.Variable(tf.constant(0.1, tf.float32, [10]))
fc1 = tf.nn.relu(tf.matmul(flat, W3) + b3)
Two things to note here. There is no explicit definition of a model here and this has to be trained through a tf.Session with a feed_dict feeding the data to the placeholders. If you're interested you'll find several guides online.
Closing notes...
TensorFlow has a much friendlier and easier way to define and train models through eager execution, which will be default in TF 2.0! So the code you posted is in a sense the old way of doing things in tensorflow. It's worth taking a look into TF 2.0, which actually recommends doing things the keras way!
Edit (after comment by OP):
No a layer is not a clojure. A keras layer is a class that implements a __call__ method which also makes it callable. The way they did it was so that it is a wrapper to the call method that users typically write.
You can take a look at the implementation here
Basically how this works is:
class MyClass:
def __init__(self, param):
self.p = param
def call(self, x):
print(x)
If you try to write c = MyClass(1)(3), you'll get a TypeError saying that MyClass is not callable. But if you write it like this:
class MyClass:
def __init__(self, param):
self.p = param
def __call__(self, x):
print(x)
It works now. Essentially keras does it like this:
class MyClass:
def __init__(self, param):
self.p = param
def call(self, x):
print(x)
def __call__(self, x):
self.call(x)
So that when you write your own layer you can implement your own call method and the __call__ method that wraps your one will get inherited from keras' base Layer class.
Just from the syntax, I would say that Dense() returns a function (or more accurately a callable). Similarly w_init is a callable as well.

Categories