I want to build an LSTM on top of pre-trained CNN (VGG) to classify a video sequence. The LSTM will be fed with the features extracted by the last FC layer of VGG.
The architecture is something like:
I wrote the code:
def build_LSTM_CNN_net():
    from keras.applications.vgg16 import VGG16
    from keras.models import Model
    from keras.layers import Dense, Input, Flatten
    from keras.layers.pooling import GlobalAveragePooling2D, GlobalAveragePooling1D
    from keras.layers.recurrent import LSTM
    from keras.layers.wrappers import TimeDistributed
    from keras.optimizers import Nadam

    num_classes = 5
    frames = Input(shape=(5, 224, 224, 3))
    base_in = Input(shape=(224,224,3))
    base_model = VGG16(weights='imagenet',
                       include_top=False,
                       input_shape=(224,224,3))
    x = Flatten()(base_model.output)
    x = Dense(128, activation='relu')(x)
    x = TimeDistributed(Flatten())(x)
    x = LSTM(units = 256, return_sequences=False, dropout=0.2)(x)
    x = Dense(self.nb_classes, activation='softmax')(x)
lstm_cnn = build_LSTM_CNN_net()
keras.utils.plot_model(lstm_cnn, "lstm_cnn.png", show_shapes=True)
But got the error:
ValueError: `TimeDistributed` Layer should be passed an `input_shape ` with at least 3 dimensions, received: [None, 128]
Why is this happening, how can I fix it?
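The error occurs because TimeDistributed expects an input with a time axis, i.e. (batch, frames, ...), but here it receives the (None, 128) output of a network that only ever sees a single image. Here is the correct way to build a model to classify video sequences. Note that I wrap a model instance in TimeDistributed. This model is built beforehand to extract features from each frame individually. In the second part, we deal with the frame sequences.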
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Input, Dense, LSTM, TimeDistributed, GlobalAveragePooling2D

frames, channels, rows, columns = 5,3,224,224

video = Input(shape=(frames,
                     rows,
                     columns,
                     channels))
cnn_base = VGG16(input_shape=(rows,
                              columns,
                              channels),
                 weights="imagenet",
                 include_top=False)
cnn_base.trainable = False

# per-frame feature extractor
cnn_out = GlobalAveragePooling2D()(cnn_base.output)
cnn = Model(cnn_base.input, cnn_out)

# apply the extractor to every frame, then model the sequence
encoded_frames = TimeDistributed(cnn)(video)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(10, activation="softmax")(hidden_layer)
model = Model(video, outputs)
model.summary()
If you want to use the VGG 1x4096 embedding representation, you can simply do:
frames, channels, rows, columns = 5,3,224,224

video = Input(shape=(frames,
                     rows,
                     columns,
                     channels))
cnn_base = VGG16(input_shape=(rows,
                              columns,
                              channels),
                 weights="imagenet",
                 include_top=True) #<=== include_top=True
cnn_base.trainable = False

# layers[-3] is fc1 (4096 units); layers[-2] is fc2, which also has 4096 units
cnn = Model(cnn_base.input, cnn_base.layers[-3].output)

encoded_frames = TimeDistributed(cnn)(video)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(10, activation="softmax")(hidden_layer)
model = Model(video, outputs)
model.summary()
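To see what wrapping a per-frame model in TimeDistributed does to the tensor shapes, here is a minimal sketch. It assumes a tiny, randomly initialised CNN as a stand-in for VGG16 so that nothing has to be downloaded; the name frame_encoder and all layer sizes are illustrative and not part of the answer above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

frames, rows, columns, channels = 5, 32, 32, 3

# Stand-in per-frame encoder (assumption: replaces VGG16 to keep the sketch light).
frame_in = layers.Input(shape=(rows, columns, channels))
x = layers.Conv2D(8, 3, activation="relu")(frame_in)
x = layers.GlobalAveragePooling2D()(x)                          # (batch, 8)
frame_encoder = Model(frame_in, x)

# TimeDistributed applies the same encoder to every frame along axis 1.
video = layers.Input(shape=(frames, rows, columns, channels))
encoded_frames = layers.TimeDistributed(frame_encoder)(video)   # (batch, 5, 8)
encoded_sequence = layers.LSTM(16)(encoded_frames)              # (batch, 16)
outputs = layers.Dense(5, activation="softmax")(encoded_sequence)
model = Model(video, outputs)

# Run a dummy batch of 2 videos through both stages and record the shapes.
batch = np.zeros((2, frames, rows, columns, channels), dtype="float32")
per_frame_shape = tuple(Model(video, encoded_frames)(batch).shape)
prediction_shape = tuple(model(batch).shape)
print(per_frame_shape, prediction_shape)
```

The LSTM only works because the encoded frames keep their time axis; a tensor of shape (batch, features) has no such axis, which is what the original error complains about.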
This is my code to join the ResNet50 model with my own model (which I want to train on my dataset). I want to freeze the layers of the ResNet50 model (see trainable = False in the code).
Here I import the ResNet50 model:
```
import tensorflow.keras
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

resnet50_imagnet_model = tensorflow.keras.applications.resnet.ResNet50(weights = "imagenet",
                           include_top=False,
                           input_shape = (150, 150, 3),
                           pooling='max')
```
Here I create my model
```
# freeze feature layers and rebuild model
for l in resnet50_imagnet_model.layers:
    l.trainable = False

# build the model
model5 = [
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(12, activation='softmax')
]

# join the two models
model_using_pre_trained_resnet50 = tf.keras.Sequential(resnet50_imagnet_model.layers + model5 )
```
The last line doesn't work and I get this error:
Input 0 of layer conv2_block1_3_conv is incompatible with the layer: expected axis -1 of input shape to have value 64 but received input with shape [None, 38, 38, 256
Thanks for your help.
You can also use Keras' functional API, as below:
from tensorflow.keras.applications.resnet50 import ResNet50
import tensorflow as tf
resnet50_imagenet_model = ResNet50(include_top=False, weights='imagenet', input_shape=(150, 150, 3))
#Flatten output layer of Resnet
flattened = tf.keras.layers.Flatten()(resnet50_imagenet_model.output)
#Fully connected layer 1
fc1 = tf.keras.layers.Dense(128, activation='relu', name="AddedDense1")(flattened)
#Fully connected layer, output layer
fc2 = tf.keras.layers.Dense(12, activation='softmax', name="AddedDense2")(fc1)
model = tf.keras.models.Model(inputs=resnet50_imagenet_model.input, outputs=fc2)
Also refer to this question.
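As an alternative to the functional API, the pre-trained network can be kept as a single block inside a Sequential model instead of being unpacked into its individual layers. ResNet50 is not a linear stack (it has skip connections), so chaining its layers one after another feeds some of them inputs with the wrong number of channels, which is most likely what the error above reports. The sketch below assumes weights=None and a 64x64 input purely to stay light and avoid a download; in practice you would pass weights="imagenet" and your real input shape.

```python
import numpy as np
import tensorflow as tf

# Assumption: weights=None and a small input keep the sketch cheap to run.
base = tf.keras.applications.ResNet50(weights=None, include_top=False,
                                      input_shape=(64, 64, 3), pooling="max")
base.trainable = False   # freeze the whole pre-trained block at once

model = tf.keras.Sequential([
    base,                                             # ResNet50 used as one layer
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(12, activation="softmax"),
])

# One dummy image in, 12 class probabilities out.
out_shape = tuple(model(np.zeros((1, 64, 64, 3), dtype="float32")).shape)
# Only the two new Dense layers (kernel + bias each) remain trainable.
n_trainable = len(model.trainable_weights)
print(out_shape, n_trainable)
```

Because pooling="max" already returns a flat vector per image, no Flatten layer is needed before the first Dense layer in this variant.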
I have a sequence of 5 images that I want to pass through a CNN sequentially. A single input will have size: (5, width, height, channels) and I want to pass each image in the sequence in order to a 2D CNN, concatenate all 5 outputs at some layer and then feed to an LSTM. My model looks something like this:
from keras.models import Model
from keras.layers import Dense, Input, LSTM, Flatten, Conv2D, MaxPooling2D
# Feed images in sequential order here
inputs = Input(shape=(128, 128, 3))
x = Conv2D(16, 3, activation='relu')(inputs)
x = MaxPooling2D((2, 2))(x)
...
# Concatenate sequence outputs here
x = LSTM(8)(x)
x = Flatten()(x)
outputs = Dense(5, activation='sigmoid')(x)
model = Model(inputs=inputs, outputs=outputs)
Eventually I want to concatenate all 5 outputs at some point in the network and feed them to an LSTM, but I am having trouble figuring out how to feed a sequence of images, in order, to a 2D convolutional layer. I have looked into 3D convolutional layers and the ConvLSTM2D layer, but I want to figure out how I can do it this way instead.
I'm loading the pretrained VGG16 model, adding a couple of dense layers and fine-tuning the last 5 layers of the base VGG16. I'm training my model on multiple GPUs. I saved the model before and after training. The weights are the same in spite of setting layer.trainable = True.
Please help!
Here's the code:
from keras import applications
from keras import Model
<import other relevant Keras layers, etc.>
model = applications.VGG16(weights = "imagenet", include_top = False, input_shape = (224,224,3))
model.save('./before_training')
for layer in model.layers:
    layer.trainable = False
for layer in model.layers[-5:]:
    layer.trainable = True
x = model.output
x = Flatten()(x)
x = Dense(1024, activation = "relu")(x)
x = Dropout(0.5)(x)
x = Dense(1024, activation = "relu")(x)
predictions = Dense(2, activation = "softmax")(x)
model_final = Model(input = model.input, output = predictions)
from keras.utils import multi_gpu_model
parallel_model = multi_gpu_model(model_final, gpus = 4)
parallel_model.compile(loss = "categorical_crossentropy" ..... )
datagen = ImageDataGenerator(....)
early = EarlyStopping(...)
train_generator = datagen.flow_from_directory(train_data_dir,...)
validation_generator = datagen.flow_from_directory(test_data_dir,...)
parallel_model.fit_generator(train_generator, validation_data = validation_generator,...)
model_final.save('./after_training')
Weights in before_training and after_training models are the same!!! Which is not what I expected!
I combined two VGG nets in Keras for a classification task. When I run the program, it shows an error:
RuntimeError: The name "predictions" is used 2 times in the model. All layer names should be unique.
I am confused because I only use the prediction layer once in my code:
from keras.layers import Dense
import keras
from keras.models import Model
model1 = keras.applications.vgg16.VGG16(include_top=True, weights='imagenet',
                                        input_tensor=None, input_shape=None,
                                        pooling=None,
                                        classes=1000)
model1.layers.pop()
model2 = keras.applications.vgg16.VGG16(include_top=True, weights='imagenet',
                                        input_tensor=None, input_shape=None,
                                        pooling=None,
                                        classes=1000)
model2.layers.pop()
for layer in model2.layers:
    layer.name = layer.name + str("two")
model1.summary()
model2.summary()
featureLayer1 = model1.output
featureLayer2 = model2.output
combineFeatureLayer = keras.layers.concatenate([featureLayer1, featureLayer2])
prediction = Dense(1, activation='sigmoid', name='main_output')(combineFeatureLayer)
model = Model(inputs=[model1.input, model2.input], outputs= prediction)
model.summary()
Thanks to @putonspectacles for the help. I followed his instructions and found something interesting. If you use model2.layers.pop() and combine the last layers of the two models using keras.layers.concatenate([model1.output, model2.output]), you will find that the popped layers still show up in model.summary(), even though they are no longer in the layers list. So instead, you can use keras.layers.concatenate([model1.layers[-1].output, model2.layers[-1].output]). It looks tricky, but it works. I think the summary and the actual structure are out of sync.
First, based on the code you posted, none of the layers you define has the name 'predictions', so this error has nothing to do with your Dense layer prediction, i.e.:
prediction = Dense(1, activation='sigmoid',
name='main_output')(combineFeatureLayer)
The VGG16 model itself has a Dense layer named predictions. In particular, this line in its source:
x = Dense(classes, activation='softmax', name='predictions')(x)
And since you're using two of these models you have layers with duplicate names.
What you could do is rename the layer in the second model to something other than predictions, maybe predictions_1, like so:
model2 = keras.applications.vgg16.VGG16(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None,
pooling=None,
classes=1000)
# now change the name of the layer inplace.
model2.get_layer(name='predictions').name='predictions_1'
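To check which names clash before building the combined model, a small helper along these lines may be useful. It is a sketch only: duplicate_layer_names and fake_model are hypothetical names, and the stand-in objects below mimic just the .layers and .name attributes of a Keras model so the snippet runs without Keras. It only inspects each model's layers list, so a layer removed with layers.pop() but still connected in the graph would not be reported.

```python
from collections import Counter
from types import SimpleNamespace

def duplicate_layer_names(*models):
    """Return the layer names that occur more than once across the given models."""
    counts = Counter(layer.name for m in models for layer in m.layers)
    return sorted(name for name, n in counts.items() if n > 1)

# Stand-ins for two VGG16 instances (assumption: only .layers and .name are used).
def fake_model(names):
    return SimpleNamespace(layers=[SimpleNamespace(name=n) for n in names])

m1 = fake_model(["input_1", "fc1", "fc2", "predictions"])
m2 = fake_model(["input_2", "fc1_two", "fc2_two", "predictions"])

clashes = duplicate_layer_names(m1, m2)
print(clashes)   # ['predictions']
```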
You can change a layer's name in Keras; just don't use 'tensorflow.python.keras'.
Here is my sample code:
from keras.layers import Dense, concatenate
from keras.models import Model
from keras.applications import vgg16
num_classes = 10
model = vgg16.VGG16(include_top=False, weights='imagenet', input_tensor=None, input_shape=(64,64,3), pooling='avg')
inp = model.input
out = model.output
model2 = vgg16.VGG16(include_top=False,weights='imagenet', input_tensor=None, input_shape=(64,64,3), pooling='avg')
for layer in model2.layers:
    layer.name = layer.name + str("_2")
inp2 = model2.input
out2 = model2.output
merged = concatenate([out, out2])
merged = Dense(1024, activation='relu')(merged)
merged = Dense(num_classes, activation='softmax')(merged)
model_fusion = Model([inp, inp2], merged)
model_fusion.summary()
Example:
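from keras.applications.mobilenet import MobileNet
from keras.layers import Input, MaxPooling2D, add
from keras.models import Model
from keras.optimizers import Adadelta

# config and mae_loss_masked are defined elsewhere in the project

# Network for affine transform estimation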
affine_transform_estimator = MobileNet(
input_tensor=None,
input_shape=(config.IMAGE_H // 2, config.IMAGE_W //2, config.N_CHANNELS),
alpha=1.0,
depth_multiplier=1,
include_top=False,
weights='imagenet'
)
affine_transform_estimator.name = 'affine_transform_estimator'
for layer in affine_transform_estimator.layers:
    layer.name = layer.name + str("_1")
# Network for landmarks regression
landmarks_regressor = MobileNet(
input_tensor=None,
input_shape=(config.IMAGE_H // 2, config.IMAGE_W // 2, config.N_CHANNELS),
alpha=1.0,
depth_multiplier=1,
include_top=False,
weights='imagenet'
)
landmarks_regressor.name = 'landmarks_regressor'
for layer in landmarks_regressor.layers:
    layer.name = layer.name + str("_2")
input_image = Input(shape=(config.IMAGE_H, config.IMAGE_W, config.N_CHANNELS))
downsampled_image = MaxPooling2D(pool_size=(2,2))(input_image)
x1 = affine_transform_estimator(downsampled_image)
x2 = landmarks_regressor(downsampled_image)
x3 = add([x1,x2])
model = Model(inputs=input_image, outputs=x3)
optimizer = Adadelta()
model.compile(optimizer=optimizer, loss=mae_loss_masked)
I want to use a pretrained imagenet VGG16 model in keras and add my own small convnet on top. I am only interested in the features, not the predictions
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np
import os
from keras.models import Model
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
load images from directory (the dir contains 4 images)
IF = '/home/ubu/files/png/'
files = os.listdir(IF)
imgs = [img_to_array(load_img(IF + p, target_size=[224,224])) for p in files]
im = np.array(imgs)
load the base model, preprocess input and get the features
base_model = VGG16(weights='imagenet', include_top=False)
x = preprocess_input(im)
features = base_model.predict(x)
this works, and I get the features for my images on the pretrained VGG.
I now want to finetune the model and add some convolutional layers.
I read https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html and https://keras.io/applications/ but cannot quite bring them together.
adding my model on top:
x = base_model.output
x = Convolution2D(32, 3, 3)(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Convolution2D(32, 3, 3)(x)
x = Activation('relu')(x)
feat = MaxPooling2D(pool_size=(2, 2))(x)
building the complete model
model_complete = Model(input=base_model.input, output=feat)
stop base layers from being learned
for layer in base_model.layers:
    layer.trainable = False
new model
model_complete.compile(optimizer='rmsprop',
loss='binary_crossentropy')
Now fit the new model. The input is the 4 images and [1,0,1,0] are the class labels.
But this is obviously wrong:
model_complete.fit_generator((x, [1,0,1,0]), samples_per_epoch=100, nb_epoch=2)
ValueError: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
How is this done?
How would I do it if I only wanted to replace the last convolutional block (conv block5 in VGG16) instead of adding something?
How would I only train the bottleneck features?
The output features has shape (4, 512, 7, 7). There are four images, but what is in the other dimensions? How would I reduce that to a (1,x) array?
Fitting model
The problem with your generator code is that the fit_generator method expects a generator, i.e. an object that yields (x, y) batches, and you passed it a plain tuple instead.
You can either define a generator as done in the tutorial that you linked to, or create the data and labels yourself and fit the model directly:
model_complete.fit(images, labels, batch_size=100, nb_epoch=2)
where images are your generated training images and labels are the corresponding labels.
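If you do want to go the generator route, a hand-written generator might look like the sketch below. The name batch_generator and the dummy arrays are assumptions for illustration; the arrays stand in for the four preprocessed images and their labels.

```python
import numpy as np

def batch_generator(images, labels, batch_size):
    """Yield (x, y) batches forever, looping over the data again once it is exhausted."""
    n = len(images)
    while True:
        for start in range(0, n, batch_size):
            yield images[start:start + batch_size], labels[start:start + batch_size]

# Dummy data standing in for the 4 preprocessed images and their labels.
images = np.zeros((4, 224, 224, 3), dtype="float32")
labels = np.array([1, 0, 1, 0])

gen = batch_generator(images, labels, batch_size=2)
x_batch, y_batch = next(gen)
print(x_batch.shape, y_batch.shape)   # (2, 224, 224, 3) (2,)
```

An object like gen is what fit_generator expects as its first argument. Note that the labels still have to match the shape of the model's output for training to work.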
Removing last layer
Assuming you have a model variable and the "pop" method described below, you can do model = pop(model) to remove the last layer.
Training only specific layers
As you have done in your code, you can do:
for layer in base_model.layers:
    layer.trainable = False
Then you can "unfreeze" any layer that you want by changing its trainable property to True (do this before compiling the model, or compile again afterwards).
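As a quick check that freezing and unfreezing behave as expected, the sketch below counts the trainable weights of a tiny stand-in network. The three Dense layers and their names are assumptions made to keep the example light; the same loop applies to the layers of VGG16.

```python
import tensorflow as tf

# Tiny stand-in network (assumption: replaces VGG16 to keep the sketch light).
base_model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu", name="block1"),
    tf.keras.layers.Dense(16, activation="relu", name="block2"),
    tf.keras.layers.Dense(4, name="block3"),
])

# Freeze everything.
for layer in base_model.layers:
    layer.trainable = False
frozen_count = len(base_model.trainable_weights)       # nothing left to train

# Unfreeze only the last layer.
base_model.layers[-1].trainable = True
unfrozen_names = [l.name for l in base_model.layers if l.trainable]
unfrozen_count = len(base_model.trainable_weights)     # kernel + bias of block3
print(frozen_count, unfrozen_names, unfrozen_count)
```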
Changing dimensions
To change the output to a 1D array you can use the Flatten layer
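For the array returned by predict, the same flattening can be done directly in NumPy. The shape (4, 512, 7, 7) reads as 4 images, each with 512 feature maps of size 7x7 (channels-first ordering). The sketch below uses a dummy array of that shape as a stand-in for the real features.

```python
import numpy as np

# Dummy array with the shape reported in the question:
# (images, channels, height, width).
features = np.zeros((4, 512, 7, 7), dtype="float32")

# Keep the image axis and flatten everything else:
# 512 * 7 * 7 = 25088 values per image.
flat = features.reshape(features.shape[0], -1)
print(flat.shape)   # (4, 25088)
```

Each row of flat is then the 1D feature vector of one image.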
The pop method
def pop(model):
    '''Removes a layer instance on top of the layer stack.
    This code is thanks to @joelthchao https://github.com/fchollet/keras/issues/2371#issuecomment-211734276
    '''
    if not model.outputs:
        raise Exception('Sequential model cannot be popped: model is empty.')
    else:
        model.layers.pop()
        if not model.layers:
            model.outputs = []
            model.inbound_nodes = []
            model.outbound_nodes = []
        else:
            model.layers[-1].outbound_nodes = []
            model.outputs = [model.layers[-1].output]
        model.built = False
    return model
Use model.fit(X, y) to train on your dataset as explained here: https://keras.io/models/model/#fit
Additionally, you should add a Flatten layer and a Dense layer with an output shape of 1 so that the model output matches the shape of your labels.