Tensorflow: Weights of non-trainable model layers are updated - python

I have a trained model that was created using Keras. I want to apply transfer learning to this model by freezing all but the last convolutional layer. However, when I fit the model after freezing the layers, I notice that some of the frozen layers have different weights. How can I avoid this?
I tried to freeze the entire model with model.trainable = False but this also didn't work out.
I am using python 3.5.0, tensorflow 1.0.1 and Keras 2.0.3
Example script
import os
import timeit
import datetime
import numpy as np
from keras.layers.core import Activation, Reshape, Permute
from keras.layers.convolutional import Convolution2D, MaxPooling2D, UpSampling2D, ZeroPadding2D
from keras.layers.normalization import BatchNormalization
from keras.optimizers import Adam
from keras import models
from keras import backend as K
K.set_image_dim_ordering('th')
def conv_model(input_shape, data_shape, kern_size, filt_size, pad_size,\
               maxpool_size, n_classes, compile_model=True):
    """
    Create a small conv neural network
    input_shape: input shape of the images
    data_shape: 1d shape of the data
    kern_size: Kernel size used in all convolutional2d layers
    filt_size: Filter size of the first and last convolutional2d layer
    pad_size: size of padding
    maxpool_size: Pool size of all maxpooling2d and upsampling2d layers
    n_classes: number of output classes
    compile_model: True if the model should be compiled
    output: Keras deep learning model
    """
    #keep track of compilation time
    start_time = timeit.default_timer()
    model = models.Sequential()
    # Add a noise layer to get a denoising autoencoder. This helps avoid overfitting
    model.add(ZeroPadding2D(padding=(pad_size, pad_size), input_shape=input_shape))
    #Encoding layers
    model.add(Convolution2D(filt_size, kern_size, kern_size, border_mode='valid'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))
    model.add(UpSampling2D(size=(maxpool_size, maxpool_size)))
    model.add(ZeroPadding2D(padding=(pad_size, pad_size)))
    model.add(Convolution2D(filt_size, kern_size, kern_size, border_mode='valid'))
    model.add(BatchNormalization())
    model.add(Convolution2D(n_classes, 1, 1, border_mode='valid'))
    model.add(Reshape((n_classes, data_shape), input_shape=(n_classes,)+input_shape[1:]))
    model.add(Permute((2, 1)))
    model.add(Activation('softmax'))
    if compile_model:
        model.compile(loss="categorical_crossentropy", optimizer='adam', metrics=["accuracy"])
        print('Model compiled in {0} seconds'.format(datetime.timedelta(seconds=round(\
            timeit.default_timer() - start_time))))
    return model
if __name__ == '__main__':
    #Create some random training data
    train_data = np.random.randint(0, 10, 3*512*512*20, dtype='uint8').reshape(-1, 3, 512, 512)
    train_labels = np.random.randint(0, 1, 7*512*512*20, dtype='uint8').reshape(-1, 512*512, 7)
    #Get dims of the data
    data_dims = train_data.shape[2:]
    data_shape = np.prod(data_dims)
    #Create initial model
    initial_model = conv_model((train_data.shape[1], train_data.shape[2], train_data.shape[3]),\
                               data_shape, 3, 4, 1, 2, train_labels.shape[-1])
    #Train initial model on first part of the training data
    initial_model.fit(train_data[0:10], train_labels[0:10], verbose=2)
    #Store initial weights
    initial_weights = initial_model.get_weights()
    #Create transfer learning model
    transf_model = conv_model((train_data.shape[1], train_data.shape[2], train_data.shape[3]),\
                              data_shape, 3, 4, 1, 2, train_labels.shape[-1], False)
    #Set transfer model weights
    transf_model.set_weights(initial_weights)
    #Set all layers trainable to False (except final conv layer)
    for layer in transf_model.layers:
        layer.trainable = False
    transf_model.layers[9].trainable = True
    print(transf_model.layers[9])
    #Compile model
    transf_model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=1e-4),\
                         metrics=["accuracy"])
    #Train model on second part of the data
    transf_model.fit(train_data[10:20], train_labels[10:20], verbose=2)
    #Store transfer model weights
    transf_weights = transf_model.get_weights()
    #Check where the weights have changed
    for i in range(len(initial_weights)):
        update_w = np.sum(initial_weights[i] != transf_weights[i])
        if update_w != 0:
            print(str(update_w)+' updated weights for layer '+str(transf_model.layers[i]))

Once you compile your model, you lose your previous weights, because they are resampled. You need to transfer the weights first, then set the layers to be non-trainable, and only then compile:
#Set transfer model weights
transf_model.set_weights(initial_weights)
#Set all layers trainable to False (except final conv layer)
for layer in transf_model.layers:
    layer.trainable = False
transf_model.layers[9].trainable = True
#Compile model
transf_model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=1e-4),\
                     metrics=["accuracy"])
Otherwise the weights would change, as they are resampled.
EDIT:
The model should be compiled after these changes, because during compilation Keras collects the trainable and non-trainable weights into lists that are not updated afterwards.

You should upgrade Keras to v2.1.3.
This issue has just been solved, and freezing BatchNormalization layers is now available in the recent release:
trainable attribute in BatchNormalization now disables the updates of the batch statistics (i.e. if trainable == False the layer will now run 100% in inference mode).
The reason for the error:
In previous versions, the mean and variance parameters of BatchNormalization layers could not be made non-trainable; they kept being updated even though you set layer.trainable = False.
Now it works!
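To illustrate why the "frozen" BatchNormalization weights changed, here is a minimal NumPy sketch of the moving-average update that a batch normalization layer applies to its stored statistics in training mode. It is not Keras code: the function name `update_moving_stats` is hypothetical, and the momentum of 0.99 is assumed to match the Keras default. The point is that this update involves no gradient or optimizer, so marking the weights as non-trainable does not stop it by itself.

```python
import numpy as np

def update_moving_stats(moving_mean, moving_var, batch, momentum=0.99):
    """Exponential moving average of the per-channel batch statistics.

    Hypothetical helper mimicking what a batch normalization layer does
    to its stored mean/variance during a training step.
    """
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    new_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    new_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return new_mean, new_var

rng = np.random.RandomState(0)
batch = rng.normal(loc=5.0, scale=2.0, size=(32, 4))  # 32 samples, 4 channels

# Typical initial values of the stored statistics
moving_mean = np.zeros(4)
moving_var = np.ones(4)

new_mean, new_var = update_moving_stats(moving_mean, moving_var, batch)

# No gradient step happened, yet the stored statistics differ from before,
# which is what the weight comparison in the question picks up.
changed = int(np.sum(new_mean != moving_mean))
print(changed, "of", moving_mean.size, "moving-mean entries changed")
```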

Related

How do I force my training data to match the output shape of my neural network?

I am trying to use the transfer learning example on keras.applications using VGG19. I am trying to train on the cifar10 dataset, so 10 classes. My model is (conceptually) simple as it's just VGG 19 minus the top three layers and then some extra layers that are trainable.
import tensorflow as tf
from keras.utils import to_categorical
from keras.applications import VGG19
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Input
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
#%%
# Specify input and number of classes
input_tensor = Input(shape=(32, 32, 3))
num_classes=10
#Load the data (cifar10, 10 classes)
(X_train,y_train),(X_test,y_test)=tf.keras.datasets.cifar10.load_data()
#One_Hot_encode y data
y_test=to_categorical(y_test,num_classes=num_classes,dtype='int32')
y_train=to_categorical(y_train,num_classes=num_classes,dtype='int32')
#%%
# create the base pre-trained model
base_model = VGG19(weights='imagenet', include_top=False,
                   input_tensor=input_tensor)
# Add a fully connected layer and then a logistic layer
x = base_model.output
# let's add a fully-connected layer
x = Dense(1024, activation='relu',name='Fully_Connected')(x)
# and a logistic layer with num_classes (here 10) outputs
predictions = Dense(num_classes, activation='softmax',name='Logistic')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional VGG19 layers
for layer in base_model.layers:
    layer.trainable = False
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])
# train the model on the new data for a few epochs
model.fit(X_train,y_train,epochs=10)
#%%
model.evaluate(X_test,y_test)
Now when I try to train using X_train [dimensions of (50000, 32, 32, 3)] and y_train [dimensions of (50000, 10)],
I get an error:
ValueError: Error when checking target: expected Logistic to have 4
dimensions, but got array with shape (50000, 10)
So for some reason the model isn't realizing that its output shape should be a 1x10 vector with one-hot encoding for the 10 classes.
How can I make it so that the dimensions agree? I don't fully understand the dimensions of output that keras is expecting here. When I do model.summary() the Logistic layer yields that the output shape should be (None, 1, 1, 10), which when flattened should just give a
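VGG19 without the top layers does not end in a fully-connected layer; it returns a spatial feature map (the output of a Conv2D/MaxPooling2D layer, I believe), so every Dense layer you stack on it keeps the extra spatial dimensions. You'll probably want to place a Flatten layer after the VGG output, before your Dense layers. That would be the best practical choice, as it makes your output shape (None, 10).
Otherwise, you could reshape the labels to match the current output: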
y_train = np.reshape(y_train, (50000,1,1,10))
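The two options can be compared on shapes alone. The sketch below uses plain NumPy arrays as stand-ins for the model output and the one-hot labels (8 samples instead of 50000, purely for illustration); it only shows how the shapes line up and does not build the Keras model.

```python
import numpy as np

num_samples, num_classes = 8, 10  # small stand-in for the 50000 CIFAR-10 samples

# One-hot labels as produced by to_categorical: shape (N, 10)
y = np.eye(num_classes, dtype='int32')[np.arange(num_samples) % num_classes]

# Stand-in for the current model output (no Flatten): shape (N, 1, 1, 10)
pred_4d = np.zeros((num_samples, 1, 1, num_classes))

# Option 1: flatten the network output so it matches the labels
pred_2d = pred_4d.reshape(num_samples, -1)
print(pred_2d.shape)  # (8, 10)

# Option 2: reshape the labels so they match the unflattened output
y_4d = np.reshape(y, (num_samples, 1, 1, num_classes))
print(y_4d.shape)  # (8, 1, 1, 10)
```

Option 1 is usually preferable because the labels stay in the shape that metrics and later evaluation code expect.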

Can CNN do better than pretrained CNN?

As far as I know, a pretrained CNN should do much better than a CNN trained from scratch. I have a dataset of 855 images. I trained a CNN on it and got 94% accuracy. Then I applied pretrained models (VGG16, ResNet50, Inception_V3, MobileNet), also with fine-tuning, but the best I got was 60%, and two of them do very badly on classification. Can a CNN really do better than a pretrained model, or is my implementation wrong? I resized my images to 100 by 100 and followed the approach of the Keras applications. What is the issue?
Naive CNN approach :
def cnn_model():
    size = (100,100,1)
    num_cnn_layers =2
    NUM_FILTERS = 32
    KERNEL = (3, 3)
    MAX_NEURONS = 120
    model = Sequential()
    for i in range(1, num_cnn_layers+1):
        if i == 1:
            model.add(Conv2D(NUM_FILTERS*i, KERNEL, input_shape=size,
                             activation='relu', padding='same'))
        else:
            model.add(Conv2D(NUM_FILTERS*i, KERNEL, activation='relu',
                             padding='same'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Flatten())
    model.add(Dense(int(MAX_NEURONS), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(int(MAX_NEURONS/2), activation='relu'))
    model.add(Dropout(0.4))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model
VGG16 approach:
def vgg():
    vgg_model = keras.applications.vgg16.VGG16(weights='imagenet',include_top=False,input_shape = (100,100,3))
    model = Sequential()
    for layer in vgg_model.layers:
        model.add(layer)
    # Freeze the layers
    for layer in model.layers:
        layer.trainable = False
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(3, activation='softmax'))
    model.compile(optimizer=keras.optimizers.Adam(lr=1e-5),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
What you refer to as a CNN is the same thing in both cases: a type of neural network model. The only difference is that the pre-trained model has been trained on some other data instead of the dataset you're working on and trying to classify.
What is usually used here is called transfer learning. Instead of freezing all the layers, try leaving the last few layers unfrozen so they can be retrained on your own data; that way the pretrained model can adjust its weights and biases to match your needs as well. It could be that the dataset you're trying to classify is foreign to the pretrained models.
Here's an example from my own work. There are additional pieces of code, but you can make it work with your own code; the logic remains the same.
#You extract the layer which you want to manipulate, usually one of the last few.
last_layer = pre_trained_model.get_layer(name_of_layer)
# Take the output tensor of that layer
last_output = last_layer.output
# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024,activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1,activation='sigmoid')(x)
#Here we combine your newly added layers and the pre-trained model.
model = Model(pre_trained_model.input, x)
model.compile(optimizer = RMSprop(lr=0.0001),
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])
Adding to what @Ilknur Mustafa mentioned: as your dataset may be foreign to the images used for pre-training, you can try to retrain the last few layers of the pre-trained model instead of adding whole new layers. The example code below doesn't add any trainable layer other than the output layer. This way, you benefit from retraining the last few layers starting from the existing weights, rather than training from scratch. This may be beneficial if you don't have a large dataset to train on.
# load model without classifier layers
vgg = VGG16(include_top=False, input_shape=(100, 100, 3), weights='imagenet', pooling='avg')
# make only last 2 conv layers trainable
for layer in vgg.layers[:-4]:
    layer.trainable = False
# add output layer
out_layer = Dense(3, activation='softmax')(vgg.layers[-1].output)
model_pre_vgg = Model(vgg.input, out_layer)
# compile model
opt = SGD(lr=1e-5)
model_pre_vgg.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])
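To see what the slice `vgg.layers[:-4]` leaves trainable, the sketch below applies the same slice to a plain list of layer names. The names are written out by hand as an assumption about how Keras names the VGG16 layers with include_top=False and pooling='avg' (the input and pooling layer names in particular can differ between versions); printing `[l.name for l in vgg.layers]` on the real model is the reliable check.

```python
# Assumed layer names of VGG16 (include_top=False, pooling='avg').
# Hand-written for illustration; verify against the real model.
layer_names = [
    'input_1',
    'block1_conv1', 'block1_conv2', 'block1_pool',
    'block2_conv1', 'block2_conv2', 'block2_pool',
    'block3_conv1', 'block3_conv2', 'block3_conv3', 'block3_pool',
    'block4_conv1', 'block4_conv2', 'block4_conv3', 'block4_pool',
    'block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool',
    'global_average_pooling2d',
]

frozen = layer_names[:-4]    # same slice as in the loop above
unfrozen = layer_names[-4:]  # everything the loop does not touch

# Pooling layers have no weights, so only the conv layers actually train.
trainable_convs = [name for name in unfrozen if 'conv' in name]

print(unfrozen)
print(trainable_convs)  # ['block5_conv2', 'block5_conv3']
```

Under these assumptions, the slice keeps four layers unfrozen, but only two of them carry weights, which matches the "last 2 conv layers" comment in the answer.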

How to train model to add new classes?

My trained model has 10 classes ( ie. output layer has 10 classes). I want to add 3 more classes to it without training the whole model again.
I want to use the old trained model and add new classes to it.
This is the code I had already tried but it shows an error.
from keras.models import load_model
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
base_model = load_model('hand_gest.h5')
new_model = Sequential()
for layer in base_model.layers[:-2]:
    new_model.add(layer)
for layer in new_model.layers:
    layer.trainable = False
weights_training = base_model.layers[-2].get_weights()
new_model.layers[-2].set_weights(weights_training)
new_model.add(Dense(units = 3, activation = 'softmax'))
But when I train this model it shows the following error.
ValueError: You called `set_weights(weights)` on layer "max_pooling2d_2" with a weight list of length 2, but the layer was expecting 0 weights. Provided weights: [array([[-0.01650696, 0.01082378, 0.0149541 , .....
As the number of classes is changed from 10 to 13, the last layer of the previous network needs to be changed.
base_model = load_model('hand_gest.h5')
base_model.pop() #remove the last layer - 'Dense' layer with 10 units
for layer in base_model.layers:
    layer.trainable = False
base_model.add(Dense(units = 13, activation = 'softmax'))
base_model.summary() #Check architecture before starting the fine-tuning
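As for the error in the question, it appears to come from an index mismatch: after copying `base_model.layers[:-2]`, the index `-2` refers to a different layer in the new model than in the old one. The sketch below reproduces this with a plain list of names. Only "max_pooling2d_2" is taken from the error message; the rest of the architecture is a guess made for illustration.

```python
# Hypothetical layer stack of the saved model; only 'max_pooling2d_2'
# is known from the error message, the other names are assumptions.
base_layers = ['conv2d_1', 'max_pooling2d_1', 'conv2d_2', 'max_pooling2d_2',
               'flatten_1', 'dense_1', 'dense_2']

# The question copies everything except the last two layers
new_layers = base_layers[:-2]

source = base_layers[-2]  # layer whose weights are read
target = new_layers[-2]   # layer that receives them

print(source)  # dense_1
print(target)  # max_pooling2d_2
```

A Dense layer has two weight arrays (kernel and bias) while a pooling layer has none, which would explain the "weight list of length 2, but the layer was expecting 0 weights" message. Layers added to the new model keep their trained weights anyway, so the `set_weights` call is not needed.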

How to get output of hidden layer given an input, weights and biases of the hidden layer in keras?

Suppose I have trained the model below for an epoch:
model = Sequential([
    Dense(32, input_dim=784), # first number is output_dim
    Activation('relu'),
    Dense(10), # output_dim, input_dim is taken for granted from above
    Activation('softmax'),
])
And I got the weights dense1_w, biases dense1_b of first hidden layer (named it dense1) and a single data sample sample.
How do I use these to get the output of dense1 on the sample in keras?
Thanks!
The easiest way is to use the keras backend. With the keras backend you can define a function that gives you the intermediate output of a keras model as defined here (https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer).
So in essence:
get_1st_layer_output = K.function([model.layers[0].input],
[model.layers[1].output])
layer_output = get_1st_layer_output([X])
Just recreate the first part of the model up until the layer for which you would like the output (in your case only the first dense layer). Afterwards you can load the trained weights of the first part in your newly created model and compile it.
The output of the prediction with this new model will be the output of the layer (in your case the first dense layer).
from keras.models import Sequential
from keras.layers import Dense, Activation
import numpy as np
model = Sequential([
    Dense(32, input_dim=784), # first number is output_dim
    Activation('relu'),
    Dense(10), # output_dim, input_dim is taken for granted from above
    Activation('softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
#create some random data
n_features = 5
samples = np.random.randint(0, 10, 784*n_features).reshape(-1,784)
labels = np.arange(10*n_features).reshape(-1, 10)
#train your sample model
model.fit(samples, labels)
#create new model
new_model = Sequential([
    Dense(32, input_dim=784), # first number is output_dim
    Activation('relu')])
#set weights of the first layer
new_model.set_weights(model.layers[0].get_weights())
#compile it after setting the weights
new_model.compile(optimizer='adam', loss='categorical_crossentropy')
#get output of the first dense layer (after the relu activation)
output = new_model.predict(samples)
As for the weights: I had a non-Sequential model. I used model.summary() to get the name of the desired layer and then model.get_layer("layer_name").get_weights() to get its weights.
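Since the question already has the weights and biases in hand, the output can also be computed directly, without Keras. A Dense layer without activation computes the input times the kernel plus the bias, so a minimal NumPy sketch looks like the following. The random arrays are hypothetical stand-ins with the shapes of the question's first layer (kernel (784, 32), bias (32,)), and the helper names `dense_forward` and `relu` are made up for this example.

```python
import numpy as np

def dense_forward(sample, weights, biases):
    """Output of a Dense layer without activation: sample . W + b."""
    return np.dot(sample, weights) + biases

def relu(x):
    """Element-wise ReLU, matching the separate Activation('relu') layer."""
    return np.maximum(x, 0.0)

# Hypothetical stand-ins for the trained values from the question
rng = np.random.RandomState(0)
dense1_w = rng.normal(size=(784, 32))  # kernel of the first Dense layer
dense1_b = rng.normal(size=(32,))      # bias of the first Dense layer
sample = rng.randint(0, 10, size=(1, 784)).astype('float32')  # one data sample

dense1_out = dense_forward(sample, dense1_w, dense1_b)  # output of Dense(32)
relu_out = relu(dense1_out)                             # output after the ReLU

print(dense1_out.shape)  # (1, 32)
```

With a real model, the two arrays would come from `model.layers[0].get_weights()`, which returns the kernel and the bias in that order.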

Fine tuning pretrained model in keras

I want to use a pretrained imagenet VGG16 model in keras and add my own small convnet on top. I am only interested in the features, not the predictions
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np
import os
from keras.models import Model
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
load images from directory (the dir contains 4 images)
IF = '/home/ubu/files/png/'
files = os.listdir(IF)
imgs = [img_to_array(load_img(IF + p, target_size=[224,224])) for p in files]
im = np.array(imgs)
load the base model, preprocess input and get the features
base_model = VGG16(weights='imagenet', include_top=False)
x = preprocess_input(im)
features = base_model.predict(x)
this works, and I get the features for my images on the pretrained VGG.
I now want to finetune the model and add some convolutional layers.
I read https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html and https://keras.io/applications/ but cannot quite bring them together.
adding my model on top:
x = base_model.output
x = Convolution2D(32, 3, 3)(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Convolution2D(32, 3, 3)(x)
x = Activation('relu')(x)
feat = MaxPooling2D(pool_size=(2, 2))(x)
building the complete model
model_complete = Model(input=base_model.input, output=feat)
stop base layers from being learned
for layer in base_model.layers:
    layer.trainable = False
new model
model_complete.compile(optimizer='rmsprop',
loss='binary_crossentropy')
now fit the new model; the data is 4 images and [1,0,1,0] are the class labels.
But this is obviously wrong:
model_complete.fit_generator((x, [1,0,1,0]), samples_per_epoch=100, nb_epoch=2)
ValueError: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
How is this done?
How would I do it if I only wanted to replace the last convolutional block (conv block5 in VGG16) instead of adding something?
How would I only train the bottleneck features?
The output features has shape (4, 512, 7, 7). There are four images, but what is in the other dimensions? How would I reduce that to a (1,x) array?
Fitting model
The problem with your generator code is that the fit_generator method expects a generator that yields the data for fitting, which you don't provide: you pass a plain tuple instead.
You can either define a generator as done in the tutorial that you have linked to or create the data and labels yourself and fit your model yourself:
model_complete.fit(images, labels, batch_size=100, nb_epoch=2)
where images are your generated training images and labels are the corresponding labels.
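If you do want to go the generator route, a generator for Keras is just a Python generator that keeps yielding (x, y) tuples; Keras expects it to loop over the data indefinitely. The sketch below is a minimal version under assumptions: the function name `batch_generator` is made up, and the zero-filled array stands in for the four preprocessed images (channels-first, to match the feature shape in the question).

```python
import numpy as np

def batch_generator(images, labels, batch_size):
    """Yield (x, y) batches forever, restarting once the data is exhausted."""
    n = len(images)
    while True:
        for start in range(0, n, batch_size):
            stop = start + batch_size
            yield images[start:stop], labels[start:stop]

# Stand-ins for the 4 preprocessed images and their class labels
images = np.zeros((4, 3, 224, 224), dtype='float32')
labels = np.array([1, 0, 1, 0])

gen = batch_generator(images, labels, batch_size=2)
x_batch, y_batch = next(gen)
print(x_batch.shape, y_batch.shape)  # (2, 3, 224, 224) (2,)
```

The object `gen` is what would be passed as the first argument to fit_generator in place of the tuple from the question.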
Removing last layer
Assuming you have a model variable and the "pop" method described below, you can do model = pop(model) to remove the last layer.
Training only specific layers
As you have done in your code, you can do:
for layer in base_model.layers:
    layer.trainable = False
Then you can "unfreeze" any layer that you want by changing its trainable property to True.
Changing dimensions
To change the output to a 1D array you can use the Flatten layer
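On the question of what the dimensions mean: with channels-first ordering, (4, 512, 7, 7) reads as (images, channels, height, width), i.e. 512 feature maps of 7x7 per image. Flattening keeps the first axis and merges the rest, as this NumPy sketch shows on a zero-filled stand-in for the features array.

```python
import numpy as np

# Stand-in for the VGG16 features from the question:
# (images, channels, height, width)
features = np.zeros((4, 512, 7, 7), dtype='float32')

# Keep the image axis, merge channels and spatial positions into one axis
flat = features.reshape(features.shape[0], -1)
print(flat.shape)  # (4, 25088)

# Feature vector of a single image as a (1, x) array
single = flat[0:1]
print(single.shape)  # (1, 25088)
```

A Flatten layer inside the model performs the same merge per sample, so x is 512 * 7 * 7 = 25088 here.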
The pop method
def pop(model):
    '''Removes a layer instance on top of the layer stack.
    This code is thanks to @joelthchao https://github.com/fchollet/keras/issues/2371#issuecomment-211734276
    '''
    if not model.outputs:
        raise Exception('Sequential model cannot be popped: model is empty.')
    else:
        model.layers.pop()
        if not model.layers:
            model.outputs = []
            model.inbound_nodes = []
            model.outbound_nodes = []
        else:
            model.layers[-1].outbound_nodes = []
            model.outputs = [model.layers[-1].output]
        model.built = False
    return model
Use model.fit(X, y) to train on your dataset as explained here: https://keras.io/models/model/#fit
Additionally, you should add a Flatten layer and a Dense layer with an output shape of 1 to get the correct result shape.
