We're trying to add a custom layer inside a pre-trained ImageNet model. For a Sequential or non-sequential model we can do that easily, but here there are some extra requirements.
First of all, we don't want to unpack the whole ImageNet model; we only want to work with the desired inner layers. Let's say that for DenseNet we need the following layers, and we also want their output shapes so we can connect them to some custom layers.
vision_model = tf.keras.applications.DenseNet121(
    input_shape=(224,224,3),
    include_top = False,
    weights='imagenet')
for i, layer in enumerate(vision_model.layers):
    if layer.name in ['conv3_block12_concat', 'conv4_block24_concat']:
        print(i,'\t',layer.trainable,'\t :',layer.name)
    if layer.name == 'conv3_block12_concat':
        print(layer.get_output_shape_at(0)[1:]) # (28, 28, 512)
    if layer.name == 'conv4_block24_concat':
        print(layer.get_output_shape_at(0)[1:]) # (14, 14, 1024)
The whole requirement can be illustrated as follows:
The green marker is the transition layer of DenseNet.
In the diagram above, the DenseNet model has (let's say) 5 blocks. Among them, we want to pick block 3 and block 4, add some custom layers to each, and then merge them to produce the final output.
Also, the DenseNet blocks (block 1 to 5) should be as accessible as possible, with their pre-trained ImageNet weights. We'd like to be able to freeze and unfreeze the pre-trained layers when we need to.
How can we achieve this efficiently with tf.keras? Or, if you think there is a better approach to do the same thing, please suggest it.
Let's say a custom block is something like this:
import tensorflow as tf

class MLPBlock(tf.keras.layers.Layer):
    def __init__(self, kernel_num=32, kernel_size=(3,3), strides=(1,1), padding='same'):
        # the class is MLPBlock, so super() must refer to MLPBlock (not ConvModule)
        super(MLPBlock, self).__init__()
        # conv layer
        self.conv = tf.keras.layers.Conv2D(kernel_num,
                                           kernel_size=kernel_size,
                                           strides=strides, padding=padding)
        # batch norm layer
        self.bn = tf.keras.layers.BatchNormalization()

    def call(self, input_tensor, training=False):
        x = self.conv(input_tensor)
        x = self.bn(x, training=training)
        return tf.nn.relu(x)
Motivation
I'm trying to implement this paper, where they did something like this. Initially the paper was free to access, but now it's not. Below is the main block diagram of their approach.
I don't have access to the paper, so I just built an example like the one you drew:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models

class ConvBlock(layers.Layer):
    def __init__(self, kernel_num=32, kernel_size=(3,3), strides=(1,1), padding='same'):
        super(ConvBlock, self).__init__()
        # conv layer
        self.conv = layers.Conv2D(kernel_num,
                                  kernel_size=kernel_size,
                                  strides=strides, padding=padding)
        # batch norm layer
        self.bn = layers.BatchNormalization()

    def call(self, input_tensor, training=False):
        x = self.conv(input_tensor)
        x = self.bn(x, training=training)
        return tf.nn.relu(x)

vision_model = keras.applications.DenseNet121(
    input_shape=(224,224,3),
    include_top = False,
    weights='imagenet')

# Control freeze and unfreeze over blocks
def set_freeze(block, unfreeze):
    for layer in block:
        layer.trainable = unfreeze

block_1 = vision_model.layers[:7]
block_2 = vision_model.layers[7:53]
block_3 = vision_model.layers[53:141]
block_4 = vision_model.layers[141:313]
block_5 = vision_model.layers[313:]
set_freeze(block_1, unfreeze=False)
set_freeze(block_2, unfreeze=False)

for i, layer in enumerate(vision_model.layers):
    print(i,'\t',layer.trainable,'\t :',layer.name)

layer_names = ['conv3_block12_concat', 'conv4_block24_concat', 'conv5_block16_concat']
vision_model_outputs = [vision_model.get_layer(name).output for name in layer_names]
custom_0 = ConvBlock()(vision_model_outputs[0])
custom_1 = ConvBlock()(layers.UpSampling2D(2)(vision_model_outputs[1]))
cat_layer = layers.concatenate([custom_0, custom_1])
last_conv_num = 2
custom_2 = layers.UpSampling2D(4)(vision_model_outputs[2])
outputs = layers.concatenate([ConvBlock()(cat_layer) for i in range(last_conv_num)] + [custom_2])
model = models.Model(vision_model.input, outputs)
keras.utils.plot_model(model, "./Model_structure.png", show_shapes=True)
Run the code and you will see that block_1 and block_2 are frozen.
Because the plot of the full model is long, I only posted a few snippets of it.
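The index slices above (7, 53, 141, 313) depend on the exact layer order. A minimal sketch of a name-based alternative, assuming the standard Keras DenseNet121 layer names (conv2_* and pool2_* for block 2, and so on, with the final layers named bn and relu, and everything before conv2 treated as block 1): derive each layer's block from its name, freeze blocks by number, and build one multi-output feature extractor. densenet_block and set_blocks_trainable are hypothetical helpers, and weights=None is used only to avoid the download; use weights='imagenet' in practice.

```python
import re
import tensorflow as tf

def densenet_block(layer_name):
    """Map a DenseNet121 layer name to block 1..5 (block 1 = stem before conv2_*)."""
    match = re.match(r"(?:conv|pool)([2-5])_", layer_name)
    if match:
        return int(match.group(1))
    # The final BatchNorm/ReLU after the conv5 dense block are named "bn" and "relu".
    return 5 if layer_name in ("bn", "relu") else 1

def set_blocks_trainable(model, blocks, trainable):
    """Set `trainable` on every layer whose block number is in `blocks`."""
    for layer in model.layers:
        if densenet_block(layer.name) in blocks:
            layer.trainable = trainable

# weights=None only to skip the download here; use weights="imagenet" in practice.
vision_model = tf.keras.applications.DenseNet121(
    input_shape=(224, 224, 3), include_top=False, weights=None)

# Freeze block 1 (stem) and block 2, as in the answer above.
set_blocks_trainable(vision_model, blocks={1, 2}, trainable=False)

# One model with several outputs: the chosen intermediate feature maps.
layer_names = ["conv3_block12_concat", "conv4_block24_concat"]
feature_extractor = tf.keras.Model(
    vision_model.input,
    [vision_model.get_layer(name).output for name in layer_names])

for name, tensor in zip(layer_names, feature_extractor.outputs):
    print(name, tuple(tensor.shape[1:]))
```

Keras only picks up changed trainable flags when the model is compiled, so compile (or recompile) the full model after freezing or unfreezing blocks.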
Related
The code below is just pseudo-code.
I have this network architecture, which is composed of multiple subclassed tf.keras Models. The basic layout of the entire network architecture is as follows:
class Main(tf.keras.Model):
    def __init__(self, block):
        super(Main, self).__init__(dynamic=True)
        self.conv_1 = Conv2D(ncOut, kernel_size=(3,3), strides=(1,1),
                             padding="same", use_bias=False)
        self.conv_2 = Conv2D(ncOut, kernel_size=(3,3), strides=(1,1),
                             padding="same", use_bias=False)
        self.block1 = block(...)
        self.block2 = block(next=self.block1)
        self.block3 = block(next=self.block2)

    def call(self, inputs):
        x = inputs[0]
        out = self.conv_1(x)
        out = self.block3(out)
        # ...
        return out

class Block(tf.keras.Model):
    def __init__(self, next):
        super(Block, self).__init__(dynamic=True)
        self.conv_1 = Conv2D(ncOut, kernel_size=(3,3), strides=(1,1),
                             padding="same", use_bias=False)
        self.conv_2 = Conv2D(ncOut, kernel_size=(3,3), strides=(1,1),
                             padding="same", use_bias=False)
        self.next = next

    def call(self, inputs):
        x = inputs[0]
        out = self.conv_1(x)
        out = self.next(out)
        # ...
        return out
This should give the gist of the architecture.
That is, we have a subclassed model whose entries are other subclassed models, which are then called.
After the network has trained, I save the weights. I then want to make some changes to the network, i.e. remove some layers or update the structure. Obviously, it is then no longer possible to load the weights, as there would be a mismatch.
What I tried next was to give each layer a unique name: during the creation of the blocks I also pass a unique name, and each layer inside (convolutions and so forth) gets a name argument built from it (plus an identifier such as conv_1), in the hope that I could then use load_weights(by_name=True).
Now suppose the structure changes, e.g. I remove conv_2 from the Block structure. When I load the weights, it complains that layer # block expected ____ weights but received ___, even with by_name=True enabled. This tells me that it tries to load the weights for the entire block, not for each named layer (e.g. the named convolution) inside it.
How can I load the weights by name for each layer of a nested subclassed model after the structure of the network has changed?
I ended up doing it by recursion.
def recursion(input, i=0):
    try:
        if input.layers and not input.layers[i:]:
            # we have exhausted the entire list
            return
        if input.layers:
            # descend into the i-th child, then continue with the next sibling
            recursion(input.layers[i], 0)
            recursion(input, i + 1)
    except AttributeError:
        # a leaf layer (e.g. Conv2D) has no .layers attribute
        if input.name in layers_dict:
            print(input.name)
            # take the weights and set them on the matching layer
            weights = input.get_weights()
            layers_dict[input.name].set_weights(weights)
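This seems to get and set the weights for each layer as I wanted, and for now it works. If anyone has a better solution, do not hesitate to suggest it.
Here layers_dict is {"name_block_conv1": stripped_model.block.conv_1 ...}, i.e. it maps each unique layer name to the corresponding layer of the modified (stripped) model.
The function's input is the original model with the trained weights.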
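A sketch of the same idea without index bookkeeping, assuming every leaf layer has a unique name and both models have already been built (called once). iter_leaf_layers and copy_weights_by_name are hypothetical helpers: like the recursion above, they descend through the .layers of nested models and copy weights only for names present in both models. The toy Block and Main classes below are stand-ins for the real architecture. To use this with saved weights, you would rebuild the old architecture, call load_weights on it, and then copy into the modified model.

```python
import numpy as np
import tensorflow as tf

def iter_leaf_layers(layer):
    """Depth-first walk yielding layers that have no child `.layers` (e.g. Conv2D)."""
    children = getattr(layer, "layers", None)
    if children:
        for child in children:
            yield from iter_leaf_layers(child)
    else:
        yield layer

def copy_weights_by_name(src_model, dst_model):
    """Copy weights for leaf layers whose names match; return the copied names."""
    targets = {layer.name: layer for layer in iter_leaf_layers(dst_model)}
    copied = []
    for layer in iter_leaf_layers(src_model):
        weights = layer.get_weights()
        if weights and layer.name in targets:
            targets[layer.name].set_weights(weights)
            copied.append(layer.name)
    return copied

# Toy stand-ins for the question's Block/Main, with unique layer names.
class Block(tf.keras.Model):
    def __init__(self, prefix, with_conv2=True):
        super().__init__()
        self.conv_1 = tf.keras.layers.Conv2D(
            4, 3, padding="same", use_bias=False, name=f"{prefix}_conv1")
        self.conv_2 = (tf.keras.layers.Conv2D(
            4, 3, padding="same", use_bias=False, name=f"{prefix}_conv2")
            if with_conv2 else None)

    def call(self, x):
        x = self.conv_1(x)
        return self.conv_2(x) if self.conv_2 is not None else x

class Main(tf.keras.Model):
    def __init__(self, with_conv2=True):
        super().__init__()
        self.block1 = Block("block1", with_conv2)
        self.block2 = Block("block2", with_conv2)

    def call(self, x):
        return self.block2(self.block1(x))

x = np.zeros((1, 8, 8, 4), dtype="float32")
trained, modified = Main(with_conv2=True), Main(with_conv2=False)
trained(x)
modified(x)  # build both models so their weights exist
copied = copy_weights_by_name(trained, modified)
print(copied)
```

Layers that exist only in the old model (here the conv_2 layers) are simply skipped, which is the behaviour the question asks for.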
The Keras docs say that to get an intermediate layer's output from a model (Sequential or functional), all we need to do is the following:
model = ... # create the original model
layer_name = 'my_layer'
intermediate_layer_model = keras.Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model(data)
So here we get two models: intermediate_layer_model is a sub-model of its parent model. The two can be called independently, but they share the same layer objects and therefore the same weights. Likewise, if we take the intermediate output feature maps of the parent (or base) model, do some operation on them and get new output feature maps, we can also feed these feature maps back into the parent model.
input = tf.keras.Input(shape=(size,size,3))
model = tf.keras.applications.DenseNet121(input_tensor = input)
layer_name = "conv1_block1" # for example
output_feat_maps = SomeOperationLayer()(model.get_layer(layer_name).output)
# assume, they're able to add up
base = Add()([model.output, output_feat_maps])
# bind all
imputed_model = tf.keras.Model(inputs=[model.input], outputs=base)
In this way we get one modified model, and it's quite easy with the functional API. All the Keras ImageNet models are (mostly) written with the functional API, and we can also use them inside the model subclassing API. My concern is what to do when we need the intermediate output feature maps of these functional models inside a subclassed model's call method.
class Subclass(tf.keras.Model):
    def __init__(self, dim):
        super(Subclass, self).__init__()
        self.dim = dim
        self.base = DenseNet121(input_shape=self.dim)
        # building new model with the desired output layer of base model
        self.mid_layer_model = tf.keras.Model(self.base.inputs,
                                              self.base.get_layer(layer_name).output)
        # create the extra layer once here, not on every call
        self.op = SomeOperationLayer()

    def call(self, inputs):
        # forward with base model
        x = self.base(inputs)
        # forward with mid_layer_model
        mid_feat = self.mid_layer_model(inputs)
        # do some op with it
        mid_x = self.op(mid_feat)
        # assume, they're able to add up
        out = tf.keras.layers.add([x, mid_x])
        return out
The issue is that here we technically have two models working jointly. Instead of building a separate model like this, we simply want to take the intermediate output feature maps of the base model during its forward pass (for some inputs), use them somewhere else, and get some output, like this:
mid_x = SomeOperationLayer()(self.base.get_layer(layer_name).output)
But it gives ValueError: Graph disconnected. So, currently, we have to build a new model from the base model that ends at our desired intermediate layer. In the __init__ method we create a new self.mid_layer_model model that gives the desired output feature maps: mid_feat = self.mid_layer_model(inputs). Next, we take mid_feat, do some operation on it to get some output, and finally add the two with tf.keras.layers.add([x, mid_x]). Creating a new model with the desired intermediate output works, but at the same time we repeat the same computation twice, once in the base model and once in its sub-model. Maybe I'm missing something obvious; please add anything you can. Is this just how it is, or are there strategies we can adopt? I've asked on the forum here, with no response yet.
Update
Here is a working example. Let's say we have a custom layer like this
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Add
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
class ConvBlock(tf.keras.layers.Layer):
    def __init__(self, kernel_num=32, kernel_size=(3,3), strides=(1,1), padding='same'):
        super(ConvBlock, self).__init__()
        # conv layer
        self.conv = tf.keras.layers.Conv2D(kernel_num,
                                           kernel_size=kernel_size,
                                           strides=strides, padding=padding)
        # batch norm layer
        self.bn = tf.keras.layers.BatchNormalization()

    def call(self, input_tensor, training=False):
        x = self.conv(input_tensor)
        x = self.bn(x, training=training)
        return tf.nn.relu(x)
And we want to insert this layer into an ImageNet model and construct a model like this:
input = tf.keras.Input(shape=(32, 32, 3))
base = DenseNet121(weights=None, input_tensor = input)
# get the output feature maps at a certain layer, i.e. conv2_block1_0_relu
cb = ConvBlock()(base.get_layer("conv2_block1_0_relu").output)
flat = Flatten()(cb)
dense = Dense(1000)(flat)
# adding up
adding = Add()([base.output, dense])
model = tf.keras.Model(inputs=[base.input], outputs=adding)
from tensorflow.keras.utils import plot_model
plot_model(model,
show_shapes=True, show_dtype=True,
show_layer_names=True,expand_nested=False)
Here, the computation from the input to layer conv2_block1_0_relu is done only once. Next, to translate this functional API model to the subclassing API, we first had to build a model from the base model's input to layer conv2_block1_0_relu, like this:
class ModelWithMidLayer(tf.keras.Model):
    def __init__(self, dim=(32, 32, 3)):
        super().__init__()
        self.dim = dim
        self.base = DenseNet121(input_shape=self.dim, weights=None)
        # building sub-model from self.base which gives
        # desired output feature maps: i.e. conv2_block1_0_relu
        self.mid_layer = tf.keras.Model(self.base.inputs,
                                        self.base.get_layer("conv2_block1_0_relu").output)
        self.flat = Flatten()
        self.dense = Dense(1000)
        self.add = Add()
        self.cb = ConvBlock()

    def call(self, x):
        # forward with base model
        bx = self.base(x)
        # forward with mid layer
        mx = self.mid_layer(x)
        # apply the custom block (as in the functional version), then make same shape
        mx = self.dense(self.flat(self.cb(mx)))
        # combine
        out = self.add([bx, mx])
        return out

    def build_graph(self):
        x = tf.keras.layers.Input(shape=(self.dim))
        return tf.keras.Model(inputs=[x], outputs=self.call(x))

mwml = ModelWithMidLayer()
plot_model(mwml.build_graph(),
           show_shapes=True, show_dtype=True,
           show_layer_names=True, expand_nested=False)
Here model_1 is a sub-model of DenseNet, which probably makes the whole model (ModelWithMidLayer) compute the same operations twice. If this observation is correct, it is a concern.
I thought it might be much more complex, but it's actually quite simple. We just need to build one model with all the desired output layers (the intermediate layer and the final output) in the __init__ method and use it in the call method.
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Add
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten

class ConvBlock(tf.keras.layers.Layer):
    def __init__(self, kernel_num=32, kernel_size=(3,3), strides=(1,1), padding='same'):
        super(ConvBlock, self).__init__()
        # conv layer
        self.conv = tf.keras.layers.Conv2D(kernel_num,
                                           kernel_size=kernel_size,
                                           strides=strides, padding=padding)
        # batch norm layer
        self.bn = tf.keras.layers.BatchNormalization()

    def call(self, input_tensor, training=False):
        x = self.conv(input_tensor)
        x = self.bn(x, training=training)
        return tf.nn.relu(x)

class ModelWithMidLayer(tf.keras.Model):
    def __init__(self, dim=(32, 32, 3)):
        super().__init__()
        self.dim = dim
        self.base = DenseNet121(input_shape=self.dim, weights=None)
        # one sub-model of self.base with two outputs:
        # the intermediate feature maps (conv2_block1_0_relu) and the final output
        self.mid_layer = tf.keras.Model(
            inputs=self.base.inputs,
            outputs=[
                self.base.get_layer("conv2_block1_0_relu").output,
                self.base.output])
        self.flat = Flatten()
        self.dense = Dense(1000)
        self.add = Add()
        self.cb = ConvBlock()

    def call(self, x):
        # a single forward pass returns both outputs, so nothing is computed twice
        mx, bx = self.mid_layer(x)  # conv2_block1_0_relu output, self.base.output
        # apply the custom block, then make same shape or do whatever
        mx = self.dense(self.flat(self.cb(mx)))
        # combine
        out = self.add([bx, mx])
        return out

    def build_graph(self):
        x = tf.keras.layers.Input(shape=(self.dim))
        return tf.keras.Model(inputs=[x], outputs=self.call(x))

mwml = ModelWithMidLayer()
tf.keras.utils.plot_model(mwml.build_graph(),
                          show_shapes=True, show_dtype=True,
                          show_layer_names=True, expand_nested=False)
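A small sketch to check the two claims behind this answer: the multi-output sub-model reuses the base model's layer objects (so its weights are shared, not copied), and one call yields both the intermediate and the final tensor. It uses a hypothetical toy base model instead of DenseNet121 so it runs quickly; the layer name mid_relu and the class name OnePass are made up for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical toy base model standing in for DenseNet121.
inp = tf.keras.Input(shape=(16, 16, 3))
h = layers.Conv2D(8, 3, padding="same")(inp)
h = layers.Activation("relu", name="mid_relu")(h)
h = layers.GlobalAveragePooling2D()(h)
base = tf.keras.Model(inp, layers.Dense(10)(h))

class OnePass(tf.keras.Model):
    def __init__(self, base, mid_name):
        super().__init__()
        # Same layer objects as `base`, just exposed with two outputs.
        self.features = tf.keras.Model(
            base.inputs, [base.get_layer(mid_name).output, base.output])
        self.flat = layers.Flatten()
        self.dense = layers.Dense(10)
        self.combine = layers.Add()

    def call(self, x):
        mid, out = self.features(x)  # a single forward pass gives both tensors
        return self.combine([out, self.dense(self.flat(mid))])

model = OnePass(base, "mid_relu")
x = np.random.rand(2, 16, 16, 3).astype("float32")
print(model(x).shape)
```

Because the sub-model holds the same layers, training OnePass also updates the weights seen through `base`.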
I am working on a modified resnet, and want to insert dropout after activation layers.
I tried the following, but because the model is not sequential, it did not work:
def add_dropouts(model, probability = 0.5):
    print("Adding Dropouts")
    updated_model = tf.keras.models.Sequential()
    for layer in model.layers:
        print("layer = ", layer)
        updated_model.add(layer)
        if isinstance(layer, tf.keras.layers.Activation):
            updated_model.add(tf.keras.layers.Dropout(probability))
    print("updated model Summary = ", updated_model.summary)
    print("model Summary = ", model.summary)
    model = updated_model
    return model
base_model = tf.keras.applications.ResNet50V2(include_top=False, input_shape=input_img_shape, pooling='avg')
base_model = add_dropouts(base_model, probability = 0.5)
Then I tried my own version using the functional API, but it doesn't work either; it raises a ValueError saying the Tensor doesn't have output.
prev_layer = base_model.layers[0]
for layer in base_model.layers:
    next_layer = layer(prev_layer.output)
    if isinstance(layer, tf.keras.layers.Activation):
        next_layer = Dropout(0.5)(next_layer.output)
    prev_layer = next_layer
Does anyone know how to add dropout layers to ResNet or any other pretrained network?
Eventually I figured out how to do it, but it's very hacky. Go to:
C:\ProgramData\Anaconda3\envs\<your env name>\Lib\site-packages\tensorflow\python\keras\applications
Open resnet.py. This also changes ResNetV2 instances, because ResNetV2 is built on the original ResNet code. Press Ctrl+F to search for activation, and wherever you see an activation layer (usually in the form x = Layer(x), building the model one layer at a time), add:
x = Dropout(prob)(x)
Here is an example:
if not preact:
    x = layers.BatchNormalization(
        axis=bn_axis, epsilon=1.001e-5, name='conv1_bn')(x)
    x = layers.Activation('relu', name='conv1_relu')(x)  # insert a layer after each of these
    x = layers.Dropout(prob)(x)  # added dropout
Do this for all similar search results for 'activation'.
Then you will see the dropout added in your model summary.
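As a less invasive alternative to editing the installed package, here is a sketch that rebuilds a functional model with tf.keras.models.clone_model and a clone_function that replaces every Activation layer with a small wrapper applying the same activation followed by Dropout. ActivationDropout and this add_dropouts are hypothetical names, not part of Keras. The wrappers add no weights, so the original (for example pretrained) weights can be copied back with set_weights. Saving the resulting model would additionally require a get_config method on the wrapper, which is not shown.

```python
import numpy as np
import tensorflow as tf

class ActivationDropout(tf.keras.layers.Layer):
    """Hypothetical wrapper: the original activation followed by Dropout."""
    def __init__(self, activation, rate, **kwargs):
        super().__init__(**kwargs)
        self.act = tf.keras.layers.Activation(activation)
        self.drop = tf.keras.layers.Dropout(rate)

    def call(self, inputs, training=None):
        return self.drop(self.act(inputs), training=training)

def add_dropouts(model, rate=0.5):
    """Clone a functional model, swapping each Activation layer for ActivationDropout."""
    def clone_fn(layer):
        if isinstance(layer, tf.keras.layers.Activation):
            return ActivationDropout(layer.activation, rate, name=layer.name)
        # default behaviour: a fresh copy of the layer built from its config
        return layer.__class__.from_config(layer.get_config())

    new_model = tf.keras.models.clone_model(model, clone_function=clone_fn)
    # The wrappers hold no weights, so the weight lists line up one-to-one.
    new_model.set_weights(model.get_weights())
    return new_model

# Toy functional model; for the question you would pass the ResNet50V2 base_model.
inp = tf.keras.Input(shape=(4,))
h = tf.keras.layers.Dense(8)(inp)
h = tf.keras.layers.Activation("relu")(h)
toy = tf.keras.Model(inp, tf.keras.layers.Dense(2)(h))

toy_with_dropout = add_dropouts(toy, rate=0.5)
print([type(layer).__name__ for layer in toy_with_dropout.layers])
```

For the question's case the call would be base_model = add_dropouts(base_model, rate=0.5). In inference mode (training=False) the Dropout layers are inactive, so the cloned model should reproduce the original outputs.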
I want to build a network that can verify images (e.g. human faces). As I understand it, the best solution for that is a Siamese network with a triplet loss. I didn't find any ready-made implementations, so I decided to create my own.
But I have a question about Keras. For example, here's the structure of the network:
And the code is something like that:
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Concatenate, Dense, Flatten, Input, Lambda
from tensorflow.keras.models import Model, Sequential

embedding = Sequential([
    Flatten(),
    Dense(1024, activation='relu'),
    Dense(64),
    Lambda(lambda x: K.l2_normalize(x, axis=-1))
])
input_a = Input(shape=shape, name='anchor')
input_p = Input(shape=shape, name='positive')
input_n = Input(shape=shape, name='negative')
emb_a = embedding(input_a)
emb_p = embedding(input_p)
emb_n = embedding(input_n)
out = Concatenate()([emb_a, emb_p, emb_n])
model = Model([input_a, input_p, input_n], out)
model.compile(optimizer='adam', loss=<triplet_loss>)
I defined only one embedding model. Does this mean that once the model starts training, the weights would be the same for each input?
If it is, how can I extract embedding weights from the model?
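Yes. With a triplet loss, the weights should be shared across all three branches, i.e. Anchor, Positive and Negative.
In TensorFlow 1.x you can achieve weight sharing with reuse=True in tf.layers.
In TensorFlow 2.x, tf.layers has been moved to tf.keras.layers and the reuse functionality has been removed; instead, calling the same layer or model instance on several inputs shares its weights, which is what the single embedding model in the question does.
If you need to reuse a layer's weights inside a different operation, you can write a custom layer that takes the parent layer and reuses its weights.
Below is a sample example that applies one convolution kernel with two different dilation rates.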
import tensorflow as tf
from tensorflow.keras.layers import Activation, Conv2D

class SharedConv(tf.keras.layers.Layer):
    def __init__(
        self,
        filters,
        kernel_size,
        strides=(1, 1),
        padding="same",
        dilation_rates=(1, 2),
        activation=None,
        use_bias=True,
        **kwargs
    ):
        # super().__init__ must run before any attributes are set
        super().__init__(**kwargs)
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides
        self.padding = padding
        self.dilation_rates = dilation_rates
        self.activation = activation
        self.use_bias = use_bias

    def build(self, input_shape):
        # the "parent" convolution; its kernel and bias are reused in call()
        self.conv = Conv2D(
            self.filters,
            self.kernel_size,
            strides=self.strides,
            padding=self.padding,
            dilation_rate=self.dilation_rates[0],
            use_bias=self.use_bias,
        )
        self.act1 = Activation(self.activation)
        self.act2 = Activation(self.activation)

    def call(self, inputs, **kwargs):
        # branch 1: the regular convolution (this call also creates its weights)
        x1 = self.act1(self.conv(inputs))
        # branch 2: the same kernel with a different dilation rate
        x2 = tf.nn.conv2d(
            inputs,
            self.conv.kernel,
            strides=self.strides,
            padding=self.padding.upper(),  # tf.nn.conv2d expects "SAME"/"VALID"
            dilations=self.dilation_rates[1],
        )
        if self.use_bias:
            x2 = x2 + self.conv.bias
        x2 = self.act2(x2)
        return x1, x2
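For the question's own setup, a sketch showing that calling the one embedding model on all three inputs already shares its weights, and that the trained embedding network can be pulled back out by name. It assumes a 28x28x1 input, a hypothetical triplet_loss with margin 0.2 computed on the concatenated [anchor, positive, negative] output, and uses UnitNormalization (TF 2.9 or later) in place of the Lambda/l2_normalize layer.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model, Sequential, layers

SHAPE = (28, 28, 1)  # assumed input shape, for illustration only

# One embedding network; calling it three times reuses (shares) its weights.
embedding = Sequential([
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(64),
    layers.UnitNormalization(axis=-1),  # L2-normalizes, like the Lambda in the question
], name="embedding")

inputs = [layers.Input(shape=SHAPE, name=n) for n in ("anchor", "positive", "negative")]
out = layers.Concatenate()([embedding(x) for x in inputs])
model = Model(inputs, out)

def triplet_loss(y_true, y_pred, margin=0.2):
    """Hypothetical triplet loss; y_true is ignored."""
    a, p, n = tf.split(y_pred, 3, axis=-1)
    pos = tf.reduce_sum(tf.square(a - p), axis=-1)  # anchor-positive distance
    neg = tf.reduce_sum(tf.square(a - n), axis=-1)  # anchor-negative distance
    return tf.maximum(pos - neg + margin, 0.0)

model.compile(optimizer="adam", loss=triplet_loss)

# After training, the shared embedding network is available on its own:
encoder = model.get_layer("embedding")
print(len(model.trainable_weights), len(encoder.trainable_weights))
```

The full model has exactly as many trainable weights as the embedding network (two Dense kernels and two biases), which confirms that the three branches share one set of weights.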
I will answer how to extract the embeddings (adapted from my GitHub post):
My trained Siamese model looked like this:
siamese_model.summary()
Note that my newly redefined model is basically the same as the part highlighted in yellow.
I then redefined the model I wanted to use for extracting embeddings. It should be the same embedding model you defined, except that it no longer has the multiple inputs of the Siamese model. It looked like this:
siamese_embeddings_model = build_siamese_model(input_shape)
siamese_embeddings_model.summary()
Then I extracted the weights from my trained Siamese model and set them on my new model:
embeddings_weights = siamese_model.layers[-3].get_weights()
siamese_embeddings_model.set_weights(embeddings_weights)
Then you can feed a new image to the new model to extract its embedding:
vector = siamese_embeddings_model.predict(image)
len(vector[0]) is 150, the number of units in my final Dense layer (the output vector).
My question is about restoring a trained denoising model.
I have my network defined in the following way:
Conv1->relu1->Conv2->relu2->Conv3->relu3->Deconv1
Each layer's tf.variable_scope(name) uses the names above.
My loss, optimizer and accuracy are defined inside tf.name_scope blocks.
When I try to restore and run the loss, it also asks for the labels (which I don't have):
feed_dict={x:input, y:labels}
sess.run('loss',feed_dict)
Can anyone help me understand how to test this? Which operation should I restore?
Do I have to call all the layers, pass the input and check the loss (MSE)?
I checked many examples, but they all seem to be classification problems, where defining a softmax with logits at the end works.
Edit:
Below is my code; now it is easy to see how tf.name_scope and tf.variable_scope are used. I feel I may have to rebuild the whole network to test a new image. Is that right?
def new_conv_layer(input, num_input_channels, filter_size, num_filters, name):
    with tf.variable_scope(name):
        # Shape of the filter-weights for the convolution
        shape = [filter_size, filter_size, num_input_channels, num_filters]
        # Create new weights (filters) with the given shape
        weights = tf.Variable(tf.truncated_normal(shape, stddev=0.5))
        # Create new biases, one for each filter
        biases = tf.Variable(tf.constant(0.05, shape=[num_filters]))
        # TensorFlow operation for convolution (uses the weights created above)
        layer = tf.nn.conv2d(input=input, filter=weights, strides=[1,1,1,1], padding='SAME')
        # Add the biases to the results of the convolution.
        layer += biases
        return layer, weights

def new_relu_layer(input, name):
    with tf.variable_scope(name):
        # TensorFlow operation for the ReLU activation
        layer = tf.nn.relu(input)
        return layer

def new_pool_layer(input, name):
    with tf.variable_scope(name):
        # TensorFlow operation for max pooling
        layer = tf.nn.max_pool(value=input, ksize=[1, 1, 1, 1], strides=[1, 1, 1, 1], padding='SAME')
        return layer

def new_layer(inputs, filters, kernel_size, strides, padding, name):
    with tf.variable_scope(name):
        layer = tf.layers.conv2d_transpose(inputs=inputs, filters=filters, kernel_size=kernel_size,
                                           strides=strides, padding=padding, data_format='channels_last')
        return layer

layer_conv1, weights_conv1 = new_conv_layer(input=yTraininginput, num_input_channels=1, filter_size=5, num_filters=32, name="conv1")
layer_relu1 = new_relu_layer(layer_conv1, name="relu1")
layer_conv2, weights_conv2 = new_conv_layer(input=layer_relu1, num_input_channels=32, filter_size=5, num_filters=64, name="conv2")
layer_relu2 = new_relu_layer(layer_conv2, name="relu2")
layer_conv3, weights_conv3 = new_conv_layer(input=layer_relu2, num_input_channels=64, filter_size=5, num_filters=128, name="conv3")
layer_relu3 = new_relu_layer(layer_conv3, name="relu3")
layer_deconv1 = new_layer(inputs=layer_relu3, filters=1, kernel_size=[5,5], strides=[1,1], padding='same', name='deconv1')
layer_relu4 = new_relu_layer(layer_deconv1, name="relu4")
layer_conv4, weights_conv4 = new_conv_layer(input=layer_relu4, num_input_channels=1, filter_size=5, num_filters=128, name="conv4")
layer_relu5 = new_relu_layer(layer_conv4, name="relu5")
layer_deconv2 = new_layer(inputs=layer_relu5, filters=1, kernel_size=[5,5], strides=[1,1], padding='same', name='deconv2')
layer_relu6 = new_relu_layer(layer_deconv2, name="relu6")

# Mean squared error loss (despite the variable name cross_entropy)
with tf.name_scope("loss"):
    cross_entropy = tf.losses.mean_squared_error(labels=xTraininglabel, predictions=layer_relu6)
# Use Adam Optimizer
with tf.name_scope("optimizer"):
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-6).minimize(loss=cross_entropy)
# PSNR, used here as the "accuracy" metric
with tf.name_scope("accuracy"):
    accuracy = tf.image.psnr(a=layer_relu6, b=xTraininglabel, max_val=1.0)
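View the graph of your code in TensorBoard and get the name of the last layer's operation (in your code the last layer is relu6, so the tensor is likely named relu6/Relu:0), as in the image below.
Then load that tensor with the code below:
operation = graph.get_tensor_by_name("<operation_name>:0")
This should work because your layers are all connected in one graph. To run this tensor you only need to feed the input it depends on, not the labels, which only the loss needs.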
Let me know if this worked!
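A minimal TF1-style sketch (run through tf.compat.v1) of the approach in this answer, on a hypothetical one-layer graph: save a checkpoint, re-import the graph from the .meta file, look up the input and output tensors by name, and run only the output. The names x, y, w and output are assumptions for this toy graph, not names from the question's code.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

v1 = tf.compat.v1
ckpt = os.path.join(tempfile.mkdtemp(), "denoiser")

# 1) "Training" side: toy graph with named input, labels, output and loss; save it.
with tf.Graph().as_default():
    x = v1.placeholder(tf.float32, [None, 8, 8, 1], name="x")
    y = v1.placeholder(tf.float32, [None, 8, 8, 1], name="y")  # labels: only the loss needs them
    w = v1.get_variable("w", [3, 3, 1, 1],
                        initializer=v1.random_normal_initializer(stddev=0.5))
    output = tf.nn.relu(tf.nn.conv2d(x, w, strides=1, padding="SAME"), name="output")
    loss = tf.reduce_mean(tf.square(output - y), name="loss")
    sample = np.random.rand(1, 8, 8, 1).astype(np.float32)
    with v1.Session() as sess:
        sess.run(v1.global_variables_initializer())
        expected = sess.run(output, feed_dict={x: sample})
        v1.train.Saver().save(sess, ckpt)

# 2) Inference side: restore the graph and fetch only the output tensor by name.
with tf.Graph().as_default():
    with v1.Session() as sess:
        saver = v1.train.import_meta_graph(ckpt + ".meta")
        saver.restore(sess, ckpt)
        graph = sess.graph
        x_in = graph.get_tensor_by_name("x:0")
        out = graph.get_tensor_by_name("output:0")
        restored = sess.run(out, feed_dict={x_in: sample})  # no labels fed

print(np.abs(restored - expected).max())
```

Fetching "loss:0" instead would require feeding y, which is the error described in the question.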