Should the inference model in a chatbot built with Keras LSTMs have the same number of layers as the main model, or doesn't it matter?
I don't know exactly what you mean by inference model.
The number of layers in a model is a hyperparameter that you tune during training. Let's say you train an LSTM model with 3 layers: the model used for inference must then have the same number of layers and use the weights resulting from training.
Otherwise, if you add an untrained layer at inference time, the results won't make any sense.
Hope this helps
Related
If a pretrained model such as ResNet101 was trained on the ImageNet dataset and I then change some layers inside it, can I still use the pretrained weights on a different dataset, ABC?
Let's say this is the ResNet34 model,
It is pretrained on ImageNet and saved as a ResNet.pt file.
Suppose I change some layers inside it, say I make it deeper by introducing some layers in conv4_x (check image)
import torch
import torch.optim as optim
model = Resnet34()  # some layers inside this ResNet34 have been changed
optimizer = optim.Adam(model.parameters(), lr=0.00005)
checkpoint = torch.load('Resnet.pt')  # pretrained ResNet checkpoint, saved before the changes
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
Can I do this? Or is there another method?
You can do anything you like - the question is: would it be better than training from scratch?
Here are a few issues you might encounter:
1. A mismatch between the weights saved in ResNet.pt (the trained weights of the original ResNet34) and the state_dict of your modified model.
You would probably need to manually make sure that the old weights are correctly assigned to the original layers and that only the new layers are left without pretrained weights.
2. Initializing the weights of the new layer.
Since you are training a ResNet, you can take advantage of the residual connections and initialize the weights of the new layer so that it initially makes no contribution to the predicted value and only passes the input directly to the output via the residual link.
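Both points can be sketched on a toy network. This is only an illustration under stated assumptions: `TinyResNet` and `ResidualBlock` are hypothetical stand-ins for the real ResNet34, and the in-memory `pretrained_state` stands in for `torch.load('Resnet.pt')['state_dict']`. The sketch loads the old weights with `strict=False`, which reports the keys the checkpoint did not provide, and zero-initializes the last convolution of the new block so that the block starts out as an identity mapping.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Toy residual block: out = x + conv2(relu(conv1(x)))."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))


class TinyResNet(nn.Module):
    """Hypothetical stand-in for a ResNet, small enough to run anywhere."""

    def __init__(self, num_blocks, channels=4, num_classes=3):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.Sequential(
            *[ResidualBlock(channels) for _ in range(num_blocks)]
        )
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        return self.head(x.mean(dim=(2, 3)))  # global average pooling


torch.manual_seed(0)
original = TinyResNet(num_blocks=2)
pretrained_state = original.state_dict()  # stands in for the saved checkpoint

# The modified model has one extra block appended after the original ones.
modified = TinyResNet(num_blocks=3)
result = modified.load_state_dict(pretrained_state, strict=False)
missing = result.missing_keys        # parameters the checkpoint did not provide
unexpected = result.unexpected_keys  # checkpoint entries the model did not use

# Zero the last conv of the new block so it contributes nothing at first:
# out = x + 0, i.e. the block passes its input through the residual link.
new_block = modified.blocks[2]
nn.init.zeros_(new_block.conv2.weight)
nn.init.zeros_(new_block.conv2.bias)

original.eval()
modified.eval()
x = torch.randn(2, 3, 8, 8)
with torch.no_grad():
    same = torch.allclose(original(x), modified(x))
print(missing, unexpected, same)
```

Note that this only works without manual remapping because the new block was appended after the existing ones, so the old parameter names are unchanged. If layers are inserted in the middle, the keys shift and have to be renamed by hand. The saved optimizer state will likely not fit the modified model either, since the parameter list has changed, so creating a fresh optimizer is the safer assumption.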
I am training a CNN model for image classification using Keras. I am using the VGG19 model and a custom dataset with 200 classes: 90000 training images distributed uniformly across the classes, 10000 validation images and 10000 test images. Even after 200 epochs of training, the accuracy stays constant at 0.0050, and the loss stays at 5.2988. I am using Kaggle's TPU instance to run this model.
How can I make the model more accurate? Can you suggest any different pretrained models for this purpose?
Your CNN model is behaving like a random model.
I know this because, with 200 classes, the probability of picking the correct class at random is 1/200 = 0.0050, which is exactly the accuracy you are getting.
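The chance-level figures can be checked with a few lines of arithmetic. The sketch below assumes the reported loss is categorical cross-entropy, which the question does not state; under that assumption, a model that assigns equal probability to every class has a loss of ln(200), very close to the reported 5.2988.

```python
import math

num_classes = 200

# Accuracy of guessing uniformly at random among balanced classes.
chance_accuracy = 1 / num_classes

# Cross-entropy of a uniform prediction: -ln(1 / num_classes) = ln(num_classes).
uniform_loss = -math.log(1 / num_classes)

print(f"chance accuracy: {chance_accuracy:.4f}")  # 0.0050
print(f"uniform loss:    {uniform_loss:.4f}")     # 5.2983
```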
This can happen when the model is built with the TensorFlow/Keras functional API instead of Sequential() and is not assembled as intended.
Since you are using VGG19, if you are trying to use transfer learning, then maybe you have frozen the wrong layer.
If you are using the functional API, you have to create the model explicitly:
model = Model(inputs=input_layer, outputs=output_layer)  # not required with Sequential()
print(model.layers)  # works with both the functional API and Sequential(); use it to check your layers
Then freeze the required layer with:
model.layers[index_of_freeze_layer].trainable = False
If you are not freezing any layers, try a lower learning rate, since VGG19 is very sensitive to the learning rate (0.00001 or less, depending on the setup).
My Keras model is made up of multiple models. Each "sub-model" has multiple layers. How do I access the layers in a "sub-model" and freeze specific layers or set their trainability?
I'll use an example of the VGG19 convolutional neural network in Keras, although it applies to any neural network architecture:
from keras.applications.vgg19 import VGG19
model = VGG19(weights='imagenet')
You can visualise the layers using:
model.summary()
The summary shows the number of trainable parameters in the network. To freeze certain layers, e.g. all layers except the last 5 in the network:
for layer in model.layers[:-5]:
    layer.trainable = False
Calling the summary again, you'll see that the number of trainable parameters has decreased.
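For a model composed of sub-models, as in the question, the same idea applies one level down: a sub-model is itself a layer of the outer model, so it can be fetched with `get_layer` and its own layers accessed from there. The sketch below is a minimal illustration; the sub-model names `encoder` and `decoder`, the layer names, and the shapes are all made up for the example.

```python
from tensorflow import keras
from tensorflow.keras import layers


def make_submodel(name):
    """Build a small two-layer sub-model (hypothetical, for illustration)."""
    inputs = keras.Input(shape=(8,))
    x = layers.Dense(16, activation="relu", name=f"{name}_dense1")(inputs)
    outputs = layers.Dense(8, name=f"{name}_dense2")(x)
    return keras.Model(inputs, outputs, name=name)


# Outer model that chains two sub-models.
inputs = keras.Input(shape=(8,))
x = make_submodel("encoder")(inputs)
outputs = make_submodel("decoder")(x)
model = keras.Model(inputs, outputs, name="combined")

n_before = len(model.trainable_weights)  # 4 Dense layers x (kernel, bias)

# Freeze a single layer inside one sub-model.
encoder = model.get_layer("encoder")
encoder.get_layer("encoder_dense1").trainable = False
n_after_layer = len(model.trainable_weights)

# Freeze an entire sub-model; the flag propagates to its layers.
model.get_layer("decoder").trainable = False
n_after_submodel = len(model.trainable_weights)

print(n_before, n_after_layer, n_after_submodel)
```

If the model was already compiled, call `compile()` again after changing `trainable` so that the change takes effect during training.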
I am using a pre-trained Keras model (a convolutional network) and I am retraining it on my dataset.
Now I need to get the output of some layers to visualize the gradient activation. I just found out that every trained model names its layers differently. For example, the input layer in one model is input_7 (InputLayer) and in another model it is input_5 (InputLayer).
Do you know how to prevent this behavior? How can I keep the same naming without having to name all the layers manually, as I have more than 53 convolutional layers?
I am trying to build a multi-task CNN in TensorFlow which has two dense layers in parallel, one for age prediction and the other for gender prediction. How can I train each dense layer for a different number of epochs, since one can converge before the other and training both for the same number of epochs would overfit one of them?
Also, if I propagate the gradients of both age and gender to the CNN, would it overfit, since its weights are being updated at twice the rate of the dense layers?
I asked a similar question and finally found the answer: LINK
SOLUTION: You can define two different train_steps, each with its own learning rate. Each train_step can be called a chosen number of times. In addition, you can define dependencies if you want some variables to be trainable only for a selected train_step. (See the documentation.)
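The idea of two train steps can be sketched in TensorFlow 2 with `tf.GradientTape`. This is an assumption-laden illustration, not the code from the linked answer: the tiny dense `backbone` standing in for the CNN, the random data, the losses, the learning rates and the epoch counts are all invented for the example. Each step has its own optimizer and its own variable list; here the gender step updates only its head, so the shared backbone is driven by the age task alone.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical shared backbone (stands in for the CNN) and two parallel heads.
backbone = tf.keras.Sequential([tf.keras.layers.Dense(8, activation="relu")])
age_head = tf.keras.layers.Dense(1)
gender_head = tf.keras.layers.Dense(1)

# One optimizer per train step, each with its own learning rate.
age_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
gender_opt = tf.keras.optimizers.Adam(learning_rate=1e-4)


def age_step(x, y):
    """Regression step: updates the backbone and the age head."""
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(age_head(backbone(x)) - y))
    variables = backbone.trainable_variables + age_head.trainable_variables
    grads = tape.gradient(loss, variables)
    age_opt.apply_gradients(zip(grads, variables))
    return loss


def gender_step(x, y):
    """Binary classification step: updates only the gender head."""
    with tf.GradientTape() as tape:
        logits = gender_head(backbone(x))
        loss = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)
        )
    variables = gender_head.trainable_variables
    grads = tape.gradient(loss, variables)
    gender_opt.apply_gradients(zip(grads, variables))
    return loss


# Random placeholder data: 16 samples with 4 features each.
x = tf.random.normal((16, 4))
age = tf.random.normal((16, 1))
gender = tf.cast(tf.random.uniform((16, 1)) > 0.5, tf.float32)

AGE_EPOCHS, GENDER_EPOCHS = 5, 2  # each task trains for its own number of epochs
for epoch in range(max(AGE_EPOCHS, GENDER_EPOCHS)):
    if epoch < AGE_EPOCHS:
        age_loss = age_step(x, age)
    if epoch < GENDER_EPOCHS:
        gender_loss = gender_step(x, gender)
```

Whether the backbone should also receive gradients from the second task is a design choice; adding `backbone.trainable_variables` to the variable list in `gender_step` would make both tasks update it.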