I built and trained a network based on the VGG16 network. In the original network I froze all the VGG16 layers and trained only the last 4 layers, which I added at the end of VGG16. Now I want to load this model and re-train it with a different set of trainable layers, using my own weights instead of the ImageNet weights. Initially I tried to rebuild the same model, changing the trainable layers of VGG16 and the model weights, with the following code.
# Load the VGG model
vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))

# Freeze all layers except the last 8
for layer in vgg_conv.layers[:-8]:
    layer.trainable = False

# Check the trainable status of the individual layers
for layer in vgg_conv.layers:
    print(layer, layer.trainable)

# Create and compile the model
model = createModel()
trained_model = keras.models.load_model(trained_dir)
model.set_weights(trained_model.get_weights())
model.compile(loss='categorical_crossentropy', optimizer=optimizers.RMSprop(lr=lr), metrics=['acc'])
But this gives me the following error:
ValueError: Cannot feed value of shape (3, 3, 3, 64) for Tensor 'Placeholder_869:0', which has shape '(3, 3, 256, 512)'
When I check the weights of the original and new networks, I see that the shapes of some weights are different. I also tried to change the trainable layers of the original network, but for layer in trained_model.layers: print(layer, layer.trainable) shows only the last layers that I added. So how can I change the trainable layers of my own trained_model? Or is there another way to get the same result?
This might be a possible solution. I created a VGG16-based model with the code above. Then I changed the weights of the last layers by running model.layers[1].set_weights(trained_model.layers[1].get_weights()). Since I added 4 layers to VGG16, I executed this line once for each layer index from 1 to 4, as sketched below. I have not tried the model yet. If this is not a correct solution, I would be glad to read your answers.
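Concretely, the loop version of what I ran (assuming the four added layers sit at indices 1 through 4 in both models):

# copy only the weights of the four layers added after the VGG16 base;
# this assumes both models keep those layers at indices 1..4
for i in range(1, 5):
    model.layers[i].set_weights(trained_model.layers[i].get_weights())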
I need to see how I would initialize all layers of a Sequential model with data from a same-sized Sequential model.
E.g., how would I initialize the weights for every layer of the following Sequential model?
model = tf.keras.Sequential([
    Dense(2000, activation='relu', input_shape=(11,)),
    Dense(1, activation='relu'),
    Dropout(0.5),
    Dense(400, activation='relu'),
    Dropout(0.5),
    Dense(150, activation='relu'),
    BatchNormalization(),
    Dense(y_max+1, activation='softmax')
])
I am fairly new to CNN training and have managed to make the above code work through trial and error and extensive research.
The data types involved are list and np.array() of dtype np.float64.
The idea is that I grab the weights from one model (same as above) and return them to another model (also same as above). I just need to be able to visualize how I can initialize the weights and biases of all layers using the following:
weights = model.get_weights()[0]
biases = model.get_weights()[1]
return weights, biases
I have attempted the set_weights() method, but I keep getting the following error message (the code that triggers it is shown before the TypeError):
if iteration == 1:
    for layer in model.layers:
        layer.set_weights(None, None)

TypeError: set_weights() takes 2 positional arguments but 3 were given
I'd be very appreciative of any help, thank you.
In the Sequential example above, each layer's parameters can be accessed and assigned new weights as shown below:
# example: the first layer
model.layers[0]

# weights of the first layer (kernel and bias of the Dense layer in this case)
model.layers[0].weights

# assign new weights
model.layers[0].kernel.assign(tf.Variable(new_kernel_weights))
model.layers[0].bias.assign(tf.Variable(new_bias_weights))
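If both models were built with the identical architecture, a shorter route is to copy everything in one pass. A sketch, where source_model and target_model are hypothetical names for two instances of the Sequential model above:

# per-layer copy between two identically built models
for src, dst in zip(source_model.layers, target_model.layers):
    dst.set_weights(src.get_weights())

# or, equivalently, one call on the whole model
target_model.set_weights(source_model.get_weights())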
I am working on a project in which I am trying to implement transfer learning to classify ECG signals (1-dimensional). I have a pretrained model with pretty good accuracy, but it was trained on a different dataset, with an input shape of (4096, 12) and an output shape of (6). I want to fine-tune this pre-trained model on my data, which has an input shape of (350, 5).
To do so, I have added some layers before the input of the pretrained model in order to reach the shape (4096, 12), and added an output dense layer with shape (5). The code of my model is below:
from tensorflow.keras.layers import (Dense, Input, Conv1D, BatchNormalization,
                                     Activation, Flatten, Reshape, Dropout)
from tensorflow.keras.models import Model
#layer to get the desired shape for pre-trained model
new_inp = Input(shape=(300,5))
net = Flatten()(new_inp)
net = Dense(1000, activation='relu')(net)
net = Dropout(0.3)(net)
net = Dense(4096, activation='relu')(net)
net = Dropout(0.3)(net)
net = Reshape([4096,1])(net)
net = Conv1D(filters=64, kernel_size=11, strides=1, padding='same')(net)
net = BatchNormalization()(net)
net = Activation('relu')(net)
net = Conv1D(filters=12, kernel_size=9, strides=1, padding='same')(net)
net = BatchNormalization()(net)
net = Activation('relu')(net)

# pre-trained model
net = mod(net)

# output layer
ll = Dense(4, activation="softmax")(net)
newModel = Model(new_inp, ll)
My training and validation accuracy are not improving; they improve only up to about 55% and then plateau. Any idea what the problem is?
Thank you.
The idea behind transfer learning is that you append new trainable layers to the end of a pre-trained model, freeze the pre-trained layers, and train only the new layers. When you add new layers to the beginning of the pre-trained model and train the whole network, you are essentially overwriting the pre-trained coefficients.
It is possible to add preprocessing layers (or any layer that does not require back-propagation) to the beginning, but you have added a whole DNN.
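Whatever architecture you settle on, the pre-trained part should be frozen so its coefficients survive training. A minimal sketch, using mod from your code and recompiling afterwards:

# freeze every layer of the pre-trained model; only the newly
# added layers around it will be updated during training
mod.trainable = False

# rebuild newModel exactly as in the question, then compile;
# trainable changes only take effect after compiling
newModel.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])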
I am using an LSTM for fake news detection and added an embedding layer to my model.
It works fine without any input_shape passed to the LSTM layer, but I thought the input_shape parameter was mandatory. Could someone explain why there is no error even without defining input_shape? Is it because the embedding layer implicitly defines the input_shape?
Following is the code:
model = Sequential()
embedding_layer = Embedding(total_words, embedding_dim, weights=[embedding_matrix], input_length=max_length)
model.add(embedding_layer)
model.add(LSTM(64))
model.add(Dense(1, activation='sigmoid'))
opt = SGD(learning_rate=0.01, decay=1e-6)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=['accuracy'])
model.fit(data, train['label'], epochs=30, verbose=1)
You only need to provide an input_length to the Embedding layer. Furthermore, if you use a sequential model, you do not need to provide an input layer. Avoiding an input layer essentially means that your model's weights are only created when you pass real data, as you did in model.fit(...). If you wanted to see the weights of your model before providing real data, you would have to define an input layer before your Embedding layer like this:
embedding_input = tf.keras.layers.Input(shape=(max_length,))
And yes, as you mentioned, your model infers the input_shape implicitly when you provide the real data. Your LSTM layer does not need an input_shape as it is also derived based on the output of your Embedding layer. If the LSTM layer were the first layer of your model, it would be best to specify an input_shape for clarity. For example:
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(32, input_shape=(10, 5)))
model.add(tf.keras.layers.Dense(1))
where 10 represents the number of time steps and 5 the number of features. In your example, the input to the LSTM layer has the shape (max_length, embedding_dim). Here too, if you do not specify the input_shape, your model will infer the shape based on your input data.
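As a small sketch of the explicit-input variant (reusing total_words, embedding_dim, embedding_matrix and max_length from your code), with an Input layer the weights are created immediately and can be inspected before training:

model = Sequential()
model.add(tf.keras.layers.Input(shape=(max_length,)))
model.add(Embedding(total_words, embedding_dim, weights=[embedding_matrix], input_length=max_length))
model.add(LSTM(64))
model.add(Dense(1, activation='sigmoid'))
model.summary()  # the weights already exist at this point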
For more information check out the Keras documentation.
I am attempting to add another LSTM layer to my model but I am only a beginner and I am not very good. I am using the (Better) - Donal Trump Tweets! dataset on Kaggle for LSTM text generation.
I am struggling to get it to run, as it returns an error:
ValueError: Input 0 of layer lstm_16 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 128]
My model is:
print('Building model...')
model2 = Sequential()
model2.add(LSTM(128, input_shape=(maxlen, len(chars)),return_sequences=True))
model2.add(Dropout(0.5))
model2.add(LSTM(128))
model2.add(Dropout(0.5))
model2.add(LSTM(128))
model2.add(Dropout(0.2))
model2.add(Dense(len(chars), activation='softmax'))
# optimizer = RMSprop(lr=0.01)
optimizer = Adam()
model2.compile(loss='categorical_crossentropy', optimizer=optimizer)
print('model built')
The model works with only two LSTM layers, two Dropout layers, and one Dense layer. I think something is wrong with my setup for input_shape, but I could be wrong. My model is based on a notebook for the dataset above.
In order to stack RNNs, you will have to use return_sequences=True.
From the error it can be seen that the layer was expecting a 3-dimensional tensor but received a 2-dimensional one. Here you can read that the return_sequences=True flag will make the layer output a 3-dimensional tensor:
If True the full sequences of successive outputs for each timestep is returned (a 3D tensor of shape (batch_size, timesteps, output_features)).
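A quick standalone check of the difference, with made-up sizes:

import tensorflow as tf

x = tf.random.normal((2, 10, 8))  # (batch_size, timesteps, features)
print(tf.keras.layers.LSTM(4, return_sequences=True)(x).shape)  # (2, 10, 4): 3D, stackable
print(tf.keras.layers.LSTM(4)(x).shape)                         # (2, 4): 2D, last step only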
Assuming that there are no issues with your input layer and that the input data is passed on correctly, I propose trying the following model.
print('Building model...')
model2 = Sequential()
model2.add(LSTM(128, input_shape=(maxlen, len(chars)),return_sequences=True))
model2.add(Dropout(0.5))
model2.add(LSTM(128, return_sequences=True))
model2.add(Dropout(0.5))
model2.add(LSTM(128))
model2.add(Dropout(0.2))
model2.add(Dense(len(chars), activation='softmax'))
# optimizer = RMSprop(lr=0.01)
optimizer = Adam()
model2.compile(loss='categorical_crossentropy', optimizer=optimizer)
print('model built')
I have the following code, which works on the pre-trained VGG model but fails on the ResNet and Inception models.
vgg_model = keras.applications.vgg16.VGG16(weights='imagenet')
type(vgg_model)
vgg_model.summary()

model = Sequential()
for layer in vgg_model.layers:
    model.add(layer)
Now, changing the model to ResNet as follows:
resnet_model = keras.applications.resnet50.ResNet50(weights='imagenet')
type(resnet_model)
resnet_model.summary()

model = Sequential()
for layer in resnet_model.layers:
    model.add(layer)
gives the following error:
ValueError: Input 0 is incompatible with layer res2a_branch1: expected axis -1 of input shape to have value 64 but got shape (None, 56, 56, 256)
The problem is due to the fact that, unlike VGG, ResNet does not have a sequential architecture (e.g. some layers are connected to more than one layer, there are skip connections, etc.). Therefore you cannot iterate over the layers in the model one after another and connect each layer to the previous one (i.e. sequentially). You can plot the architecture of the model using plot_model() to get a better picture of this point, as sketched below.
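For example, a sketch (the to_file name is arbitrary):

from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import plot_model

# visualize the branching, non-sequential topology of ResNet50
resnet_model = ResNet50(weights='imagenet')
plot_model(resnet_model, to_file='resnet50.png', show_shapes=True)

# if you need ResNet50 inside another model, add it as a single
# building block instead of re-adding its layers one by one
model = Sequential()
model.add(resnet_model)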