I want to fine-tune EfficientNet using tf.keras (TensorFlow 2.3), but I cannot change the trainable status of the layers properly. My model looks like this:
data_augmentation_layers = tf.keras.Sequential([
    keras.layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),
    keras.layers.experimental.preprocessing.RandomRotation(0.8)])
efficientnet = EfficientNetB3(weights="imagenet", include_top=False,
                              input_shape=(*img_size, 3))
# Set the base to non-trainable, as described in the standard Keras FAQ
efficientnet.trainable = False
inputs = keras.layers.Input(shape=(*img_size, 3))
augmented = data_augmentation_layers(inputs)
base = efficientnet(augmented, training=False)
pooling = keras.layers.GlobalAveragePooling2D()(base)
outputs = keras.layers.Dense(5, activation="softmax")(pooling)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer=keras_opt, metrics=["categorical_accuracy"])
This is done so that the randomly initialized weights of the custom top won't immediately destroy the pretrained weights.
Model: "functional_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 512, 512, 3)] 0
_________________________________________________________________
sequential (Sequential) (None, 512, 512, 3) 0
_________________________________________________________________
efficientnetb3 (Functional) (None, 16, 16, 1536) 10783535
_________________________________________________________________
global_average_pooling2d (Gl (None, 1536) 0
_________________________________________________________________
dense (Dense) (None, 5) 7685
=================================================================
Total params: 10,791,220
Trainable params: 7,685
Non-trainable params: 10,783,535
Everything seems to work up to this point. I train my model for 2 epochs and then I want to start fine-tuning the EfficientNet base. So I call:
for l in model.get_layer("efficientnetb3").layers:
    if not isinstance(l, keras.layers.BatchNormalization):
        l.trainable = True
model.compile(loss="categorical_crossentropy", optimizer=keras_opt, metrics=["categorical_accuracy"])
I recompiled and printed the summary again, only to see that the number of non-trainable weights remained the same. Fitting also does not bring better results than keeping the base frozen.
dense (Dense) (None, 5) 7685
=================================================================
Total params: 10,791,220
Trainable params: 7,685
Non-trainable params: 10,783,535
PS: I also tried efficientnet3.trainable = True, but this also had no effect.
Could it have something to do with the fact that I'm using a sequential and a functional model at the same time?
For me the problem was using the sequential API for part of the model. When I switched to the functional API throughout, model.summary() displayed all the sublayers and it was possible to set some of them as trainable and others not.
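A possible explanation for the unchanged parameter counts, offered as an assumption about how Keras gathers weights rather than a confirmed diagnosis: a layer or nested model whose own trainable flag is False reports no trainable weights, whatever flags its sublayers carry. The sketch below is a plain-Python toy (ToyLayer and its methods are hypothetical stand-ins, not Keras classes) that mimics this gating.

```python
class ToyLayer:
    """Plain-Python stand-in for a layer or nested model; NOT a Keras class."""

    def __init__(self, name, n_params=0, children=()):
        self.name = name
        self.n_params = n_params
        self.children = list(children)
        self.trainable = True

    def set_trainable(self, value):
        # Mimics assigning `model.trainable = value`: the flag is pushed
        # down to every sublayer as well.
        self.trainable = value
        for child in self.children:
            child.set_trainable(value)

    def trainable_count(self):
        # A frozen parent reports nothing as trainable, whatever its
        # children say.
        if not self.trainable:
            return 0
        return self.n_params + sum(c.trainable_count() for c in self.children)


conv = ToyLayer("conv", n_params=100)
bn = ToyLayer("bn", n_params=8)
base = ToyLayer("base", children=[conv, bn])

# Pattern from the question: freeze the base, then flip sublayers only.
base.set_trainable(False)
conv.trainable = True
count_child_only = base.trainable_count()

# Alternative order: unfreeze the base first, then re-freeze batch norm.
base.set_trainable(True)
bn.trainable = False
count_parent_first = base.trainable_count()

print(count_child_only, count_parent_first)
```

If the same rule applies to the model in the question, the order would be to set efficientnet.trainable = True first, then set trainable = False on the BatchNormalization sublayers, and recompile before checking model.summary() again.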
Given the following code:
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Input, Dense, Lambda, Add, Conv2D, Flatten
from tensorflow.keras.optimizers import RMSprop
# input_shape, action_space, ppo_loss and lr are defined elsewhere
X_input = Input(input_shape)
X = Flatten(input_shape=input_shape)(X_input)
X = Dense(512, activation="elu", kernel_initializer='he_uniform')(X)
action = Dense(action_space, activation="softmax", kernel_initializer='he_uniform')(X)
value = Dense(1, kernel_initializer='he_uniform')(X)
Actor = Model(inputs = X_input, outputs = action)
Actor.compile(loss=ppo_loss, optimizer=RMSprop(learning_rate=lr))
Critic = Model(inputs = X_input, outputs = value)
Critic.compile(loss='mse', optimizer=RMSprop(learning_rate=lr))
Actor.fit(...)
Critic.predict(...)
Are Actor and Critic separate networks, or do I partially fit Critic with Actor.fit()?
Critic and Actor share most of their network; only the last layer differs, where Actor has the action layer and Critic has the value layer. You can see this by comparing Actor.summary() with Critic.summary(), shown below.
Actor.summary()
Model: "model_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) [(None, 784)] 0
flatten_6 (Flatten) (None, 784) 0
dense_16 (Dense) (None, 512) 401920
dense_17 (Dense) (None, 1) 513
=================================================================
Total params: 402,433
Trainable params: 402,433
Non-trainable params: 0
Critic.summary()
Model: "model_9"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) [(None, 784)] 0
flatten_6 (Flatten) (None, 784) 0
dense_16 (Dense) (None, 512) 401920
dense_18 (Dense) (None, 1) 513
=================================================================
Total params: 402,433
Trainable params: 402,433
Non-trainable params: 0
You can see that the first three layers have the same names in both summaries, so they are the same objects in memory. You can verify this by calling layers[n].get_weights() on both models: identical layers return the same weights.
So when you fit Actor, the weights of the shared layers are adjusted, which also affects Critic. Critic's own last layer, however, is not trained by this, so Critic.predict() still runs an untrained value layer on top of the updated shared layers.
So yes, you partially fit Critic with Actor.fit().
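The sharing described above can be sketched without TensorFlow. This is a plain-Python toy, assuming only that both models hold references to the same layer objects; ToyDense and the weight values are hypothetical stand-ins, not Keras code.

```python
class ToyDense:
    """Stand-in for a layer holding a single weight; not a Keras class."""

    def __init__(self, weight):
        self.weight = weight


shared = ToyDense(0.5)       # plays the role of the shared Dense(512) layer
action_head = ToyDense(0.1)  # Actor-only output layer
value_head = ToyDense(0.2)   # Critic-only output layer

actor_layers = [shared, action_head]
critic_layers = [shared, value_head]

# "Fitting" the actor: nudge every weight reachable from the actor.
for layer in actor_layers:
    layer.weight += 1.0

same_object = actor_layers[0] is critic_layers[0]
print(same_object, critic_layers[0].weight, critic_layers[1].weight)
```

With the real models, a comparable identity check would be Actor.layers[2] is Critic.layers[2], assuming the layer order shown in the two summaries.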
I'm trying to get some heatmaps from a computer vision model that already works for classifying images, but I'm running into some difficulties.
This is the model summary:
model.summary()
Model: "model_4"
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) [(None, 512, 512, 1)] 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 512, 512, 3) 30
_________________________________________________________________
densenet121 (Functional) (None, 1024) 7037504
_________________________________________________________________
dense_4 (Dense) (None, 100) 102500
_________________________________________________________________
dropout_4 (Dropout) (None, 100) 0
_________________________________________________________________
predictions (Dense) (None, 2) 202
=================================================================
Total params: 7,140,236
Trainable params: 7,056,588
Non-trainable params: 83,648
As part of the standard procedure for creating a heatmap, I know I have to access the last convolutional layer in the model, which in this case is a layer inside DenseNet121, but I cannot find a way to access the layers belonging to densenet121.
Right now I've been using the conv2d_4 layer to run some tests, but I feel this is not the right way, because that layer comes before all the transfer-learning work done by DenseNet.
I also looked for "Functional" layers in the official Keras documentation but couldn't find them, so I guess it's not a layer; it's the whole DenseNet model embedded there, but I cannot find a way to access it.
By the way, here is the model construction, in case it helps:
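from tensorflow.keras.applications.densenet import DenseNet121
from tensorflow.keras.layers import Input, Conv2D, Dense, Dropout
from tensorflow.keras.models import Model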
num_classes = 2
input_tensor = Input(shape=(IMG_SIZE,IMG_SIZE,1))
x = Conv2D(3,(3,3), padding='same')(input_tensor)
x = DenseNet121(include_top=False, classes=2, pooling="avg", weights="imagenet")(x)
x = Dense(100)(x)
x = Dropout(0.45)(x)
predictions = Dense(num_classes, activation='softmax', name="predictions")(x)
model = Model(inputs=input_tensor, outputs=predictions)
I found that you can call .get_layer() twice to access layers inside the functional DenseNet model embedded in the "main" model.
In this case, model.get_layer('densenet121').summary() lists all the layers inside the embedded model, and you can then access one of them with model.get_layer('densenet121').get_layer('xxxxx').
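To avoid looking up the inner layer name by hand, one option is to walk the nested structure and keep the last layer that matches a test. The helper below is a hedged sketch: find_last is a hypothetical function, it only assumes that nested models expose a .layers list, and it is demonstrated on stand-in objects with made-up layer names rather than on the real DenseNet121.

```python
from types import SimpleNamespace


def find_last(model, predicate):
    """Return the last layer, in order, for which predicate(layer) is true.

    Anything exposing a `.layers` attribute is treated as a nested model
    and searched recursively. Returns None when nothing matches.
    """
    found = None
    for layer in model.layers:
        if hasattr(layer, "layers"):
            inner = find_last(layer, predicate)
            if inner is not None:
                found = inner
        elif predicate(layer):
            found = layer
    return found


def fake(name, kind):
    # Stand-in for a layer; `kind` replaces an isinstance check.
    return SimpleNamespace(name=name, kind=kind)


# Hypothetical inner layer names, for illustration only.
densenet = SimpleNamespace(
    name="densenet121",
    layers=[fake("conv_a", "conv"), fake("conv_b", "conv"), fake("avg_pool", "pool")],
)
main = SimpleNamespace(
    name="model_4",
    layers=[fake("conv2d_4", "conv"), densenet, fake("dense_4", "dense"),
            fake("predictions", "dense")],
)

last_conv = find_last(main, lambda layer: layer.kind == "conv")
print(last_conv.name)
```

On a real Keras model the predicate would presumably be something like lambda layer: isinstance(layer, tf.keras.layers.Conv2D), and the returned layer's name could then be passed to the two get_layer() calls above.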
Suppose I have a model
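from tensorflow.keras.applications import DenseNet201
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Input, Dense, BatchNormalization, ReLU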
base_model = DenseNet201(input_tensor=Input(shape=basic_shape))
model = Sequential()
model.add(base_model)
model.add(Dense(400))
model.add(BatchNormalization())
model.add(ReLU())
model.add(Dense(50, activation='softmax'))
model.save('test.hdf5')
Then I load the saved model and try to make the last 40 layers of DenseNet201 trainable and the first 161 non-trainable:
saved_model = load_model('test.hdf5')
cnt = 44
saved_model.trainable = False
while cnt > 0:
    saved_model.layers[-cnt].trainable = True
    cnt -= 1
But this does not actually work, because DenseNet201 is treated as a single layer, so I just get an index-out-of-range error.
Layer (type) Output Shape Param #
=================================================================
densenet201 (Functional) (None, 1000) 20242984
_________________________________________________________________
dense (Dense) (None, 400) 400400
_________________________________________________________________
batch_normalization (BatchNo (None, 400) 1600
_________________________________________________________________
re_lu (ReLU) (None, 400) 0
_________________________________________________________________
dense_1 (Dense) (None, 50) 20050
=================================================================
Total params: 20,665,034
Trainable params: 4,490,090
Non-trainable params: 16,174,944
The question is how can I actually make the first 161 layers of DenseNet non-trainable and the last 40 layers trainable on a loaded model?
densenet201 (Functional) is a nested model, therefore you can access its layers the same way you access the layers of your 'topmost' model.
saved_model.layers[0].layers
where saved_model.layers[0] is a model with its own layers.
In your loop, you need to access the layers like this
saved_model.layers[0].layers[-cnt].trainable = True
Update
By default, the loaded model's layers are trainable (trainable=True), therefore you will need to set the bottom layers' trainable attribute to False instead.
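The counting loop can also be written with list slices. The following is a minimal sketch, assuming the target is any list of objects with a trainable attribute; freeze_all_but_last is a hypothetical helper, and the 201 stand-in objects only mirror the layer count used in the question, which may differ from the number of layer objects Keras actually reports.

```python
from types import SimpleNamespace


def freeze_all_but_last(layers, n_trainable):
    """Freeze every layer except the last `n_trainable`; return the frozen count."""
    layers = list(layers)
    # Computing the split index explicitly avoids the `layers[-0:]` pitfall
    # when n_trainable is 0.
    split = max(len(layers) - n_trainable, 0)
    for layer in layers[:split]:
        layer.trainable = False
    for layer in layers[split:]:
        layer.trainable = True
    return split


# Stand-ins for the nested model's layers (not real Keras layers).
toy_layers = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(201)]
n_frozen = freeze_all_but_last(toy_layers, 40)
n_trainable = sum(layer.trainable for layer in toy_layers)
print(n_frozen, n_trainable)
```

Applied to the loaded model, the call would be freeze_all_but_last(saved_model.layers[0].layers, 40), leaving the nested model's own trainable flag set to True.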
I am trying to train an autoencoder in TensorFlow using the Keras layers API. This API is quite nice and makes it easy to set up the deep learning layers.
Just to review quickly: an autoencoder (in my mind) is a function $f(x) = z$ and its pseudo-inverse $\hat{x} = f^{-1}(z)$ such that $f^{-1}(f(x)) \approx x$. In a neural network model, you would set up a network with a bottleneck layer that tries to predict its own input $x$ using $f^{-1}(f(x))$. Once the training error is minimized, you have two components: $z = f(x)$ is the prediction up to and including the bottleneck layer, and $f^{-1}(z)$ runs from the bottleneck layer to the end.
So I set up the encoder:
SZ = 6
model = tf.keras.Sequential()
model.add(layers.InputLayer(input_shape=(SZ,)))
model.add(layers.Dense(SZ))
model.add(layers.Dense(1))
model.add(layers.Dense(SZ))
model.summary()
model.compile('sgd','mse',metrics = ['accuracy'])
history= model.fit(returns.values,returns.values,epochs=100)
My difficulty is that the weights of both components ($f$ being input + Dense(SZ) + Dense(1), $f^{-1}$ being Dense(1) + Dense(SZ)) are trained, but I do not know how to disentangle them. Is there some way to split the network at the bottleneck and treat the two parts as separate models?
import tensorflow as tf
SZ=6
encoder_input = tf.keras.layers.Input(shape=(SZ,))
x = tf.keras.layers.Dense(SZ)(encoder_input)
x = tf.keras.layers.Dense(1)(x)
encoder_model = tf.keras.Model(inputs=encoder_input, outputs=x, name='encoder')
decoder_input = tf.keras.layers.Input(shape=(1,))
x2 = tf.keras.layers.Dense(SZ)(decoder_input)
decoder_model = tf.keras.Model(inputs=decoder_input, outputs=x2, name='decoder')
encoder_output = encoder_model(encoder_input)
decoder_output = decoder_model(encoder_output)
encoder_decoder_model = tf.keras.Model(inputs=encoder_input , outputs=decoder_output, name='encoder-decoder')
encoder_decoder_model.summary()
Here is the summary:
Model: "encoder-decoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) [(None, 6)] 0
_________________________________________________________________
encoder (Model) (None, 1) 49
_________________________________________________________________
decoder (Model) (None, 6) 12
=================================================================
Total params: 61
Trainable params: 61
Non-trainable params: 0
You can train the encoder-decoder model, and the separate encoder_model and decoder_model will be trained automatically, because they share its layers. You can also retrieve them from encoder_decoder_model as follows:
retrieved_encoder = encoder_decoder_model.get_layer('encoder')
retrieved_encoder.summary()
it prints:
Model: "encoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) [(None, 6)] 0
_________________________________________________________________
dense_11 (Dense) (None, 6) 42
_________________________________________________________________
dense_12 (Dense) (None, 1) 7
=================================================================
Total params: 49
Trainable params: 49
Non-trainable params: 0
and the decoder:
retrieved_decoder = encoder_decoder_model.get_layer('decoder')
retrieved_decoder.summary()
which prints:
Model: "decoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) [(None, 1)] 0
_________________________________________________________________
dense_13 (Dense) (None, 6) 12
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
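As a quick sanity check on the summaries above, the parameter counts can be reproduced by hand: a Dense layer with a bias has n_in * n_out weights plus n_out biases. The helper name dense_params is introduced here for illustration only.

```python
def dense_params(n_in, n_out):
    """Parameter count of a fully connected layer with a bias term."""
    return n_in * n_out + n_out


SZ = 6
# Encoder: Dense(SZ) on SZ inputs, then the Dense(1) bottleneck.
encoder_params = dense_params(SZ, SZ) + dense_params(SZ, 1)
# Decoder: Dense(SZ) on the single bottleneck value.
decoder_params = dense_params(1, SZ)
total_params = encoder_params + decoder_params
print(encoder_params, decoder_params, total_params)
```

These values agree with the 49, 12 and 61 parameters reported for the encoder, the decoder and the combined model.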
I have a problem with my current attempt to build a sequential model for time series classification in Keras. I want to work with channels_first data, because it is more convenient from a preprocessing perspective (I only work with one channel, though). This works fine for the Convolution1D layers I'm using, as I can specify data_format='channels_first', but somehow this won't work for MaxPooling1D, which does not seem to have this option.
The model I want to build is structured as follows:
model = Sequential()
model.add(Convolution1D(filters=16, kernel_size=35, activation='relu', input_shape=(1, window_length), data_format='channels_first'))
model.add(MaxPooling1D(pool_size=5))
model.add(Convolution1D(filters=16, kernel_size=10, activation='relu', data_format='channels_first'))
[...] #several other layers here
With window_length = 5000 I get the following summary after all three layers are added:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_1 (Conv1D) (None, 32, 4966) 1152
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 4, 4966) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 16, 4957) 656
=================================================================
Total params: 1,808
Trainable params: 1,808
Non-trainable params: 0
Now, I wonder if this is correct, as I would expect the third dimension (i.e. the number of neurons in a feature map) and not the second (i.e. the number of filters) to be reduced by the pooling layer? As I see it, MaxPooling1D does not recognize the channels_first ordering and while the Keras documentation says there exists a keyword data_format for MaxPooling2D, there's no such keyword for MaxPooling1D.
I tested the whole setup with a channels_last data format, and it worked as I expected. But since the conversion from channels_first to channels_last takes quite some time for me, I'd really rather have this work with channels_first. And I have the feeling that I'm simply missing something.
If you need any more information, let me know.
Update: as mentioned by @HSK in the comments, the data_format argument is now supported in MaxPooling layers as a result of this PR.
Well, one alternative is to use the Permute layer (and remove the channels_first for the second conv layer):
model = Sequential()
model.add(Convolution1D(filters=16, kernel_size=35, activation='relu', input_shape=(1, 100), data_format='channels_first'))
model.add(Permute((2, 1)))
model.add(MaxPooling1D(pool_size=5))
model.add(Convolution1D(filters=16, kernel_size=10, activation='relu'))
model.summary()
Model summary:
Layer (type) Output Shape Param #
=================================================================
conv1d_7 (Conv1D) (None, 16, 66) 576
_________________________________________________________________
permute_1 (Permute) (None, 66, 16) 0
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 13, 16) 0
_________________________________________________________________
conv1d_8 (Conv1D) (None, 4, 16) 2096
=================================================================
Total params: 2,672
Trainable params: 2,672
Non-trainable params: 0
_________________________________________________________________
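The sequence lengths in this summary follow from the usual output-length arithmetic, sketched below under the assumption of 'valid' padding, a convolution stride of 1, and a pooling stride equal to pool_size (the Keras defaults, as far as I know). The two helper names are made up for this example.

```python
def conv1d_out_len(length, kernel_size):
    """Output length of a 1D convolution with 'valid' padding and stride 1."""
    return length - kernel_size + 1


def maxpool1d_out_len(length, pool_size):
    """Output length of 1D max pooling with 'valid' padding and stride == pool_size."""
    return (length - pool_size) // pool_size + 1


# Answer's example: window of 100 samples.
after_conv1 = conv1d_out_len(100, 35)
after_pool = maxpool1d_out_len(after_conv1, 5)
after_conv2 = conv1d_out_len(after_pool, 10)

# Question's example: window_length = 5000, kernel_size = 35.
question_conv1 = conv1d_out_len(5000, 35)

print(after_conv1, after_pool, after_conv2, question_conv1)
```

The pooling step now shrinks the time axis (66 to 13) instead of the filter axis, which is the behaviour the question was after.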