I am trying to build a model in TensorFlow using the Keras subclassing method.
Q1) Am I correctly collecting the layers by writing layers = [] and then using layers.append(GTLayer....)?
Q2) Will calling GTLayer in the __init__ of GTN run the GTLayer class, so that it calls self.conv1 (which returns a tensor A from GTConv) and self.conv2 (which again returns a tensor A from GTConv), and then runs the call method of GTLayer to produce H and W? Am I right?
Q3) What happens to the H and W returned in Q2? Are they stored in the layers list, so that when GTN's call method is later invoked it picks up those layers? Am I correct?
Q4) Later, in GTN's call method, I had to use linear layers, so I defined model = tf.keras.models.Sequential() and then initialised self.linear1 and self.linear2. This way I have used both subclassing and Sequential. Is that correct?
Q5) Calling GTN will finally give me loss, y, Ws. If I assign model = GTN(arguments..), how do I do the training and back-propagation steps with an optimiser and a loss function? Will it follow model.compile() and model.fit()? Or can it be done differently with the Keras subclassing method?
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class GTN(layers.Layer):
    def __init__(self, num_edge, num_channels,num_layers,norm):
        super(GTN, self).__init__()
        self.num_edge = num_edge
        self.num_channels = num_channels
        self.num_layers = num_layers
        self.is_norm = norm
        layers = []
        for i in tf.range(num_layers):
            if i == 0:
                layers.append(GTLayer(num_edge, num_channels, first=True))
            else:
                layers.append(GTLayer(num_edge, num_channels, first=False))
        model = tf.keras.models.Sequential()
        self.loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
        self.linear1 = model.add(tf.keras.layers.Dense(self.w_out, input_shape=(self.w_out*self.num_channels,), activation=None))
        self.linear2 = model.add(tf.keras.layers.Dense(self.num_class, input_shape=(self.w_out,), activation=None))
    def gcn_conv(self,X,H):
        X = tf.matmul(X, self.weight)
        H = self.norm(H, add=True)
        return tf.matmul(tf.transpose(H),X)
    def call(self, A, X, target_x, target):
        A = tf.expand_dims(A, 0)
        Ws = []
        for i in range(self.num_layers):
            H = self.normalization(H)
            H, W = self.layers[i](A, H)
            Ws.append(W)
        for i in range(self.num_channels):
            X_tmp = tf.nn.relu(self.gcn_conv(X,H[i])).numpy()
            X_ = tf.concat((X_,X_tmp), dim=1)
        X_ = self.linear1(X_)
        X_ = tf.nn.relu(X_).numpy()
        y = self.linear2(X_[target_x])
        loss = self.loss(y, target)
        return loss, y, Ws

class GTLayer(keras.layers.Layer):
    def __init__(self, in_channels, out_channels, first=True):
        super(GTLayer, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.conv1 = GTConv(in_channels, out_channels)
        self.conv2 = GTConv(in_channels, out_channels)
    def call(self, A, H_=None):
        a = self.conv1(A)
        b = self.conv2(A)
        H = tf.matmul( a, b)
        W = [tf.stop_gradient(tf.nn.softmax(self.conv1.weight, axis=1).numpy()),
             tf.stop_gradient(tf.nn.softmax(self.conv2.weight, axis=1).numpy()) ]
        return H,W

class GTConv(keras.layers.Layer):
    def __init__(self, in_channels, out_channels):
        super(GTConv, self).__init__()
    def call(self, A):
        A = tf.add_n(tf.nn.softmax(self.weight))
        return A
Q1
No. There are two possibilities here:
1 - If you want to access the standard layers property of Keras models:
Only Model has a layers property; a keras.layers.Layer doesn't have it.
You are not supposed to modify the layers property of a Model; you should only read it.
The variable you are creating named layers is not a property of your class because you did not use self.layers.
2 - If you just want a list named layers for personal use in your class:
I recommend you don't use a standard name like this; change it to myLayers or something similar to avoid confusion.
The variable layers you created is not used anywhere else in your code; you just created it and never used it.
Remember that layers = [] just creates a local variable, while self.layers = [] creates a property in your class that can be used in other methods inside your class.
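As a minimal, framework-free sketch of that last point (the class names LocalOnly and StoredOnSelf and the attribute my_layers are made up for illustration and are not part of Keras): a list bound to a plain local name is gone once __init__ returns, while a list stored on self remains available to other methods.

```python
class LocalOnly:
    def __init__(self, num_layers):
        # Local variable: discarded as soon as __init__ returns.
        my_layers = []
        for i in range(num_layers):
            my_layers.append(f"layer_{i}")


class StoredOnSelf:
    def __init__(self, num_layers):
        # Attribute: kept on the instance, visible to every other method.
        self.my_layers = []
        for i in range(num_layers):
            self.my_layers.append(f"layer_{i}")

    def call(self):
        # Other methods can read what __init__ stored on self.
        return list(self.my_layers)


print(hasattr(LocalOnly(2), "my_layers"))   # False
print(StoredOnSelf(2).call())               # ['layer_0', 'layer_1']
```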
Q2
You are not "calling" GTLayer, you are "creating" GTLayer. This means that you are running GTLayer.__init__().
This distinction is important in Keras:
This is "creating" a layer: layer_instance = GTLayer(...), which runs __init__
This is "calling" a layer: layer_instance(input_tensors), which runs __call__ (which will eventually run call as defined by you)
You can do both in the same line as output_tensors = GTLayer(...)(input_tensors)
So, this is happening in GTN.__init__:
You are "creating" num_layers instances of GTLayer (one per loop iteration).
This runs GTLayer.__init__() for each instance
This hits the lines self.conv1 = GTConv(in_channels, out_channels) and self.conv2 = GTConv(in_channels, out_channels)
This is also "creating" (not "calling") GTConv.
self.conv1 and self.conv2 are "Layer" instances, not tensors.
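A rough pure-Python stand-in for the creating/calling distinction (the FakeLayer class and its log list are hypothetical; a real Keras layer does much more inside __call__): constructing the object runs only __init__, and only calling the instance reaches call.

```python
class FakeLayer:
    """Hypothetical stand-in for keras.layers.Layer, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.log = ["__init__"]      # records which methods have run

    def __call__(self, inputs):
        # layer_instance(inputs) goes through __call__, which then
        # invokes the user-defined call().
        self.log.append("__call__")
        return self.call(inputs)

    def call(self, inputs):
        self.log.append("call")
        return [2 * x for x in inputs]


layer = FakeLayer("demo")            # "creating": only __init__ has run
log_after_create = list(layer.log)

outputs = layer([1, 2, 3])           # "calling": __call__, then call
print(log_after_create)              # ['__init__']
print(layer.log)                     # ['__init__', '__call__', 'call']
print(outputs)                       # [2, 4, 6]
```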
Q3
No tensor is produced here because you never "called" any layer in GTN.__init__().
(And this is ok. Usually, you "create" layers inside __init__() and "call" layers inside call.)
Your layers local variable will have "instances of GTLayer".
Q4
You mixed two approaches in a strange way.
You can, of course, use a Sequential model if you want, but it's not necessary, and you're not using it correctly.
If in call you are calling each layer (that is X_ = self.linear1(X_) and y = self.linear2(X_[target_x])), you don't need a Sequential model at all, and you can just have the following in GTN.__init__() (this is the best approach for subclassing):
self.linear1 = tf.keras.layers.Dense(self.w_out, input_shape=(self.w_out*self.num_channels,), activation=None)
self.linear2 = tf.keras.layers.Dense(self.num_class, input_shape=(self.w_out,), activation=None)
But you could have self.submodel = Sequential(...) and then use self.submodel in GTN.call(). But having a Model inside a layer sounds weird and might cause some strange behavior in specific cases. And, of course, the ReLUs should be a part of this submodel.
Q5
I will finally get loss, y, Ws from calling GTN
That loss and weights coming from call is a "very very" strange thing. I never saw this and I don't understand why you're doing it this way. This is not standard use of Keras and only in very specific and otherwise unsolvable cases you'd try something like this. I cannot say it will work.
How will I do the training and back-propagation steps?
You should have implemented a keras.models.Model, not a keras.layers.Layer. Only models have the ability to compile and train.
Usually, you'd not create a loss in call, you'd create a loss in model.compile, unless you're dealing with unconventional losses, like weight or activity regularization, things that really depend on the layer and not on the model's inputs/outputs.
Extra tips
There is no need to create custom layers if you're not going to create custom trainable weights. It's not wrong, of course, but also not necessary. It can help organize your code, or just add extra complication.
You are trying to use weight from your layers, but you never defined any weight anywhere.
I'm pretty sure there is a better way to achieve what you want, but I don't know what you want (and that would be something for another question, I think...)
This might be a good reading for subclassing: https://www.tensorflow.org/guide/keras/custom_layers_and_models?hl=en
When I assign part of a pre-trained model's parameters to a module defined in a new PyTorch model, I get two different outputs depending on which of two methods I use.
The Network is defined as follows:
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.resnet = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
        # Drop the final fully connected layer and keep the feature extractor.
        self.resnet = nn.Sequential(*list(self.resnet.children())[:-1])
        # freeze_model is a helper method that is not shown in the question.
        self.freeze_model(self.resnet)
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, x):
        out = self.resnet(x)
        out = out.flatten(start_dim=1)
        out = self.classifier(out)
        return out
What I want is to assign pre-trained parameters to classifier in the net module. Two different ways were used for this task.
# First way
net.load_state_dict(torch.load('model_CNN_pretrained.ptl'))

# Second way
params = torch.load('model_CNN_pretrained.ptl')
net.classifier[1].weight = nn.Parameter(params['classifier.1.weight'], requires_grad=False)
net.classifier[1].bias = nn.Parameter(params['classifier.1.bias'], requires_grad=False)
net.classifier[3].weight = nn.Parameter(params['classifier.3.weight'], requires_grad=False)
net.classifier[3].bias = nn.Parameter(params['classifier.3.bias'], requires_grad=False)
The parameters were assigned correctly, but I got two different outputs from the same input data. The first method works correctly, but the second does not. Could someone point out the difference between these two methods?
I finally found the problem.
During pre-training, the buffers (running_mean and running_var) in the BatchNorm2d layers of the ResNet18 model changed, even though requires_grad was set to False for its parameters. These buffers are updated from the input data on every forward pass after model.train() is called, and stay unchanged after model.eval(). load_state_dict restores these buffers along with the weights, whereas the second way copies only the classifier parameters, so the ResNet keeps its original statistics.
For how to freeze the BN layers, see the discussion "How to freeze BN layers while training the rest of network (mean and var wont freeze)".
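To illustrate the mechanism, here is a simplified pure-Python sketch of a batch-norm running mean. It is not PyTorch code: the RunningMean class is hypothetical and only mimics the update rule running = (1 - momentum) * running + momentum * batch_mean, with the momentum of 0.1 that BatchNorm2d uses by default. The update is plain arithmetic done during the forward pass in training mode, with no gradient involved, which is why requires_grad=False does not prevent it.

```python
class RunningMean:
    """Hypothetical, simplified stand-in for the running_mean buffer of a BN layer."""

    def __init__(self, momentum=0.1):
        self.momentum = momentum
        self.running_mean = 0.0
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def forward(self, batch):
        batch_mean = sum(batch) / len(batch)
        if self.training:
            # Buffer update: no gradient and no optimizer step involved.
            self.running_mean = (
                (1 - self.momentum) * self.running_mean
                + self.momentum * batch_mean
            )
            return [x - batch_mean for x in batch]
        # Evaluation mode: use the stored statistic and leave it untouched.
        return [x - self.running_mean for x in batch]


bn = RunningMean()
bn.train()
bn.forward([10.0, 10.0])
mean_after_train = bn.running_mean      # moved from 0.0 towards 10.0

bn.eval()
bn.forward([50.0, 50.0])
mean_after_eval = bn.running_mean       # unchanged by the eval-mode pass
```

In real PyTorch, one common way to keep such buffers fixed is to call .eval() on the frozen sub-module after switching the rest of the model to training mode; whether that fits a given training loop is something to check case by case.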
I built a Keras model that uses another model as a layer, but the weights in the inner model are not training. How do I get around this?
For more details, I am using a transformer to encode sentences individually, then combining the set of sentences with another transformer.
Here is the pseudo code:
Class:
    def build_context_encoder(self):
        a = Input(sentences shape)
        #function stuff
        b = #transformer structure
        context_encoder = Model(inputs=[a], outputs=b)
        return context_encoder
    def build_model(self):
        list_of_contexts = Input(list of contexts shape)
        context_embs = Lambda(lambda x: K.map_fn(fn=self.context_encoder, elems=x, dtype=tf.float32))(list_of_contexts)
        c = #rest of the model (context_embs)
        model = Model(inputs=[list_of_contexts], outputs=c)
        model.compile(loss='mean_squared_error', optimizer='adam', metrics=[])
        return model
    def __init__(self):
        self.context_encoder = self.build_context_encoder()
        self.model = self.build_model()
Why don't the weights in context_encoder update when I call fit? Is it due to the map_fn, or because I'm calling the model? How do I fix this?
For the call method of my custom layer I need the weights of some preceding layers. I don't need to modify them, only to read their values.
I obtained the values as suggested in "How do I get the weights of a layer in Keras?",
but this returns the weights as NumPy arrays.
So I converted them to tensors (using tf.convert_to_tensor from the Keras backend), but when the model is created I get this error: "'NoneType' object has no attribute '_inbound_nodes'".
How can I fix this problem?
Thank you.
TensorFlow (1.x) provides graph collections that group variables. To access the variables that were trained, call tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) or its shorthand tf.trainable_variables(). To get all variables (including some used for statistics), use tf.get_collection(tf.GraphKeys.VARIABLES) or its shorthand tf.all_variables(); later 1.x releases renamed these to tf.GraphKeys.GLOBAL_VARIABLES and tf.global_variables().
tvars = tf.trainable_variables()
tvars_vals = sess.run(tvars)  # sess is an already-open tf.Session
for var, val in zip(tvars, tvars_vals):
    print(var.name, val)  # Prints the name of the variable alongside its value.
You can pass the preceding layer to your custom layer class when initializing it.
Custom Layer:
class CustomLayer(Layer):
    def __init__(self, reference_layer):
        super(CustomLayer, self).__init__()
        self.ref_layer = reference_layer # preceding layer
    def call(self, inputs):
        weights = self.ref_layer.get_weights()
        ''' do something with these weights '''
        return something
Now add this layer to your model using the functional API.
inp = Input(shape=(5,))
dense = Dense(5)
custom_layer = CustomLayer(dense) # pass layer here
#model
x = dense(inp)
x = custom_layer(x)
model = Model(inputs=inp, outputs=x)
Here custom_layer can access the weights of the layer dense.
I am working with the keras-capsnet implementation of Capsule Networks, and am trying to apply the same layer to 30 images per sample.
The weights are initialized within the __init__ and build methods of the class, shown below. I have successfully shared the weights between the primary routing layers, which just use tf.layers.conv2d, where I can assign them the same name and set reuse = True.
Does anyone know how to initialize weights in a Keras custom layer so that they may be reused? I am much more familiar with the TensorFlow API than with the Keras one!
def __init__(self, num_capsule, dim_capsule, routings=3,
             kernel_initializer='glorot_uniform',
             **kwargs):
    super(CapsuleLayer, self).__init__(**kwargs)
    self.num_capsule = num_capsule
    self.dim_capsule = dim_capsule
    self.routings = routings
    self.kernel_initializer = initializers.get(kernel_initializer)

def build(self, input_shape):
    assert len(input_shape) >= 3, "The input Tensor should have shape=[None, input_num_capsule, input_dim_capsule]"
    self.input_num_capsule = input_shape[1]
    self.input_dim_capsule = input_shape[2]
    # Weights are initialized here each time the layer is called
    self.W = self.add_weight(shape=[self.num_capsule, self.input_num_capsule,
                                    self.dim_capsule, self.input_dim_capsule],
                             initializer=self.kernel_initializer,
                             name='W')
    self.built = True
The answer was simple. Create the layer once without calling it on any input, and then call that same layer instance on each input individually, so that every call reuses the same weights.
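A framework-free sketch of that idea (the SharedLayer class is hypothetical and only imitates the way a layer builds its weights lazily on the first call and reuses them afterwards): one instance applied to many inputs shares a single weight, while each new instance gets its own.

```python
import random


class SharedLayer:
    """Hypothetical stand-in for a layer with lazily built weights."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.built = False
        self.build_count = 0

    def build(self):
        # Runs once per instance, on the first call only.
        self.w = self._rng.uniform(-1.0, 1.0)
        self.built = True
        self.build_count += 1

    def __call__(self, x):
        if not self.built:
            self.build()
        return self.w * x


shared = SharedLayer(seed=0)                 # create once, do not call yet
images = [float(i) for i in range(30)]       # 30 inputs per sample
outputs = [shared(img) for img in images]    # same instance, same weight

separate = [SharedLayer(seed=i) for i in range(1, 4)]
separate_weights = [layer(1.0) for layer in separate]  # one weight per instance
```

With this pattern the weight is built exactly once for the shared instance, whereas constructing a new layer per input would create independent weights.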