Size mismatch error during VGG finetuning - python

I have been following the ants and bees transfer learning tutorial from the official PyTorch Docs (http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). I am trying to finetune a VGG19 model by changing the final layer to predict one of two classes. I am able to modify the last fc layer using the following code.
But I get an error when executing the train_model function: "size mismatch at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/THC/generic/THCTensorMathBlas.cu:243". Any idea what the issue is?
model_conv = torchvision.models.vgg19(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

model_conv = nn.Sequential(*list(model_conv.classifier.children())[:-1] +
                           [nn.Linear(in_features=4096, out_features=2)])

if use_gpu:
    model_conv = model_conv.cuda()
criterion = nn.CrossEntropyLoss()
optimizer_conv = optim.SGD(model_conv._modules['6'].parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
model_conv = train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=25)

When you define your model this way, you keep only the classifier, i.e. the fully connected part of the network. Then, when the 224*224*3 image is fed to the model, it goes straight into a linear layer that expects 25,088 input features (the flattened output of the convolutional part), while the flattened image has 150,528 values, hence the size mismatch. To solve it you need to put the convolutional part back in front; redefine the model like this:
class newModel(nn.Module):
    def __init__(self, old_model):
        super(newModel, self).__init__()
        self.features = old_model.features
        self.classifier = nn.Sequential(*list(old_model.classifier.children())[:-1] +
                                        [nn.Linear(in_features=4096, out_features=2)])

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

model_conv = newModel(model_conv)
Now you also need to tell the optimizer which parameters to update; if you only want to train the newly added last layer, do:
optimizer_conv = optim.SGD(model_conv.classifier._modules['6'].parameters(), lr=0.001, momentum=0.9)
The rest of the code remains the same.
Hope it helps!
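As a side note, an alternative that keeps the convolutional part without writing a wrapper class is to swap out only the last classifier layer in place. This is a minimal sketch, assuming the same tutorial setup (use_gpu, criterion, scheduler and train_model as above):

model_conv = torchvision.models.vgg19(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False  # freeze all pretrained weights

# replace only the final 4096 -> 1000 layer with a 4096 -> 2 layer;
# the convolutional features and the rest of the classifier stay intact
model_conv.classifier[6] = nn.Linear(in_features=4096, out_features=2)

if use_gpu:
    model_conv = model_conv.cuda()

# optimize only the newly added layer
optimizer_conv = optim.SGD(model_conv.classifier[6].parameters(), lr=0.001, momentum=0.9)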

Related

How to load a fine tuned pytorch huggingface bert model from a checkpoint file?

I fine-tuned a BERT model in PyTorch and saved its checkpoint via torch.save(model.state_dict(), 'model.pt').
Now, when I want to reload the model, I have to define the whole network again, reload the weights, and then push it to the device.
Can anyone tell me how I can save the BERT model directly and load it directly for use in production/deployment?
Following is the training code; you can try running it in Colab itself. After training completes, you will see a checkpoint file in the file system, but I want to save the model itself.
LINK TO COLAB NOTEBOOK FOR SAMPLE TRAINING
Following is the current inference code I have written.
import torch
import torch.nn as nn
from transformers import AutoModel, BertTokenizerFast
import numpy as np
import json

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
device = torch.device("cpu")

class BERT_Arch(nn.Module):
    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = bert
        # dropout layer
        self.dropout = nn.Dropout(0.1)
        # relu activation function
        self.relu = nn.ReLU()
        # dense layer 1
        self.fc1 = nn.Linear(768, 512)
        # dense layer 2 (output layer)
        self.fc2 = nn.Linear(512, 2)
        # softmax activation function
        self.softmax = nn.LogSoftmax(dim=1)

    # define the forward pass
    def forward(self, sent_id, mask):
        # pass the inputs to the model
        _, cls_hs = self.bert(sent_id, attention_mask=mask, return_dict=False)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        # output layer
        x = self.fc2(x)
        # apply softmax activation
        x = self.softmax(x)
        return x

bert = AutoModel.from_pretrained('bert-base-uncased')
model = BERT_Arch(bert)

path = './models/saved_weights_new_data.pt'
model.load_state_dict(torch.load(path, map_location=device))
model.to(device)

def inference(comment):
    tokens_test = tokenizer.batch_encode_plus(
        list([comment]),
        max_length=75,
        pad_to_max_length=True,
        truncation=True,
        return_token_type_ids=False
    )
    test_seq = torch.tensor(tokens_test['input_ids'])
    test_mask = torch.tensor(tokens_test['attention_mask'])

    predictions = model(test_seq.to(device), test_mask.to(device))
    predictions = predictions.detach().cpu().numpy()
    predictions = np.argmax(predictions, axis=1)
    return predictions
I simply want to save a model from this notebook in a way such that I can use it for inferencing anywhere.
Just save your model using model.save_pretrained; here is an example:
model.save_pretrained("<path_to_dummy_folder>")
You can download the model from Colab and save it on your Google Drive or at any other location of your choice. At inference time, you just point to this model's path (you may have to upload it first) and load it.
To load the model
model = AutoModel.from_pretrained("<path_to_saved_pretrained_model>")
#Note: Instead of AutoModel class, you may use the task specific class as well.
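If your model is the BERT_Arch wrapper from the question, note that only the inner transformer has save_pretrained. A minimal sketch under that assumption, saving the matching tokenizer alongside it (the custom fc1/fc2 head is not captured this way and still needs torch.save / load_state_dict if you keep that architecture):

# save the inner HF transformer and the tokenizer into one folder
model.bert.save_pretrained("<path_to_dummy_folder>")
tokenizer.save_pretrained("<path_to_dummy_folder>")

# later, at inference time, reload both from that folder
bert = AutoModel.from_pretrained("<path_to_dummy_folder>")
tokenizer = BertTokenizerFast.from_pretrained("<path_to_dummy_folder>")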

Updating weights of a part of a model (nn.Module)

I encountered an issue when building a network that is loosely based on a CycleGAN architecture. I made all of its components fit inside one nn.Module:
from torch import nn
from classes.EncoderDecoder import EncoderDecoder
from classes.Discriminator import Discriminator

class CycleGAN(nn.Module):
    def __init__(self):
        super(CycleGAN, self).__init__()
        self.encdec1 = EncoderDecoder(encoder_in_channels=3)
        self.encdec2 = EncoderDecoder(encoder_in_channels=3)
        self.disc = Discriminator()

    def forward(self, images, images_bw):
        disc_color = self.disc(images)      # I want the Discriminator to be trained here
        disc_bw = self.disc(images_bw)      # I want the Discriminator to be trained here
        decoded1 = self.encdec1(images_bw)  # EncoderDecoder forward pass
        decoded2 = self.encdec2(decoded1)
        decoded_disc = self.disc(decoded1)  # I don't want to train the Discriminator here,
                                            # only the EncoderDecoder should be trained based
                                            # on this Discriminator's result
        return [disc_color, disc_bw, decoded1, decoded2, decoded_disc]
This is how I initialize this module, loss functions and the optimizer
c_gan = CycleGAN().to('cuda', dtype=float32, non_blocking=True)
l2_loss = MSELoss().to('cuda', dtype=float32).train()
bce_loss = BCELoss().to('cuda', dtype=float32).train()
optimizer_gan = Adam(c_gan.parameters(), lr=0.00001)
This is how I train the network inside the training loop
c_gan.zero_grad()
optimizer_gan.zero_grad()
disc_color, disc_bw, decoded1, decoded2, decoded_disc = c_gan(images, images_bw)
loss_true = bce_loss(disc_color, label_true)
loss_false = bce_loss(disc_bw, label_false)
disc_loss = loss_true + loss_false
disc_loss.backward()
decoded_loss = l2_loss(decoded2, images_bw)
decoded_disc_loss = bce_loss(decoded_disc, label_true) # This is where the loss for that Discriminator forward pass is calculated
both_decoded_losses = decoded_loss + decoded_disc_loss
both_decoded_losses.backward()
optimizer_gan.step()
The issue
I don't want to train the Discriminator module based on the EncoderDecoder -> Discriminator forward pass. I do however want to train it based on images -> Discriminator and images_bw -> Discriminator forward passes.
Is it possible to achieve this using only one optimizer for my CycleGAN module?
Can I freeze the Discriminator during the optimizer's .step()?
I would appreciate any help.
From a PyTorch example on freezing part of a network (including fine-tuning), available as a GitHub gist:
class CycleGan:
    ...

c_gan = CycleGan()

# freeze every layer of the discriminator
# c_gan.disc.{layer}.weight.requires_grad = False
# c_gan.disc.{layer}.bias.requires_grad = False
...
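As a minimal sketch of what the gist's per-layer lines amount to (assuming the CycleGAN module from the question), you can toggle requires_grad for the whole discriminator with a small helper and call it around the steps where you do or do not want it updated:

def set_requires_grad(module, flag):
    # freeze or unfreeze every parameter of a submodule
    for param in module.parameters():
        param.requires_grad = flag

set_requires_grad(c_gan.disc, False)  # no new gradients are accumulated for the discriminator
# ... run the passes where the discriminator must stay fixed ...
set_requires_grad(c_gan.disc, True)   # train it again on the images / images_bw passes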

Gradcam with guided backprop for transfer learning in Tensorflow 2.0

I get an error using gradient visualization with transfer learning in TF 2.0. The gradient visualization works on a model that does not use transfer learning.
When I run my code I get the error:
assert str(id(x)) in tensor_dict, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("block5_conv3/Identity:0", shape=(None, 14, 14, 512), dtype=float32)
When I run the code below, it errors out. I think there's an issue with the naming conventions, or with connecting the inputs and outputs of the base model, vgg16, to the layers I'm adding. I'd really appreciate your help!
"""
Broken example when grad_model is created.
"""
!pip uninstall tensorflow
!pip install tensorflow==2.0.0
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
import matplotlib.pyplot as plt
IMAGE_PATH = '/content/cat.3.jpg'
LAYER_NAME = 'block5_conv3'
model_layer = 'vgg16'
CAT_CLASS_INDEX = 281
imsize = (224,224,3)
img = tf.keras.preprocessing.image.load_img(IMAGE_PATH, target_size=(224, 224))
plt.figure()
plt.imshow(img)
img = tf.io.read_file(IMAGE_PATH)
img = tf.image.decode_jpeg(img)
img = tf.cast(img, dtype=tf.float32)
# img = tf.keras.preprocessing.image.img_to_array(img)
img = tf.image.resize(img, (224,224))
img = tf.reshape(img, (1, 224,224,3))
input = layers.Input(shape=(imsize[0], imsize[1], imsize[2]))
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
                                         input_shape=(imsize[0], imsize[1], imsize[2]))
# base_model.trainable = False
flat = layers.Flatten()
dropped = layers.Dropout(0.5)
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
fc1 = layers.Dense(16, activation='relu', name='dense_1')
fc2 = layers.Dense(16, activation='relu', name='dense_2')
fc3 = layers.Dense(128, activation='relu', name='dense_3')
prediction = layers.Dense(2, activation='softmax', name='output')
for layr in base_model.layers:
    if ('block5' in layr.name):
        layr.trainable = True
    else:
        layr.trainable = False
x = base_model(input)
x = global_average_layer(x)
x = fc1(x)
x = fc2(x)
x = prediction(x)
model = tf.keras.models.Model(inputs = input, outputs = x)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])
This portion of the code is where the error lies. I'm not sure what is the correct way to label inputs and outputs.
# Create a graph that outputs target convolution and output
grad_model = tf.keras.models.Model(inputs=[model.input, model.get_layer(model_layer).input],
                                   outputs=[model.get_layer(model_layer).get_layer(LAYER_NAME).output,
                                            model.output])
print(model.get_layer(model_layer).get_layer(LAYER_NAME).output)
# Get the score for target class
with tf.GradientTape() as tape:
    conv_outputs, predictions = grad_model(img)
    loss = predictions[:, 1]
The section below is for plotting a heatmap of gradcam.
print('Prediction shape:', predictions.get_shape())
# Extract filters and gradients
output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]
# Apply guided backpropagation
gate_f = tf.cast(output > 0, 'float32')
gate_r = tf.cast(grads > 0, 'float32')
guided_grads = gate_f * gate_r * grads
# Average gradients spatially
weights = tf.reduce_mean(guided_grads, axis=(0, 1))
# Build a ponderated map of filters according to gradients importance
cam = np.ones(output.shape[0:2], dtype=np.float32)
for index, w in enumerate(weights):
    cam += w * output[:, :, index]
# Heatmap visualization
cam = cv2.resize(cam.numpy(), (224, 224))
cam = np.maximum(cam, 0)
heatmap = (cam - cam.min()) / (cam.max() - cam.min())
cam = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET)
output_image = cv2.addWeighted(cv2.cvtColor(img.astype('uint8'), cv2.COLOR_RGB2BGR), 0.5, cam, 1, 0)
plt.figure()
plt.imshow(output_image)
plt.show()
I also asked the TensorFlow team about this on GitHub at https://github.com/tensorflow/tensorflow/issues/37680.
I figured it out. If you set up the model by extending the vgg16 base model with your own layers, rather than inserting the base model into a new model like a layer, then it works.
First set up the model and be sure to declare the input_tensor.
inp = layers.Input(shape=(imsize[0], imsize[1], imsize[2]))
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet', input_tensor=inp,
                                         input_shape=(imsize[0], imsize[1], imsize[2]))
This way we don't have to include a line like x=base_model(inp) to show what input we want to put in. That's already included in tf.keras.applications.VGG16(...).
Instead of putting this vgg16 base model inside another model, it's easier to do gradcam by adding layers to the base model itself. I grab the output of the last layer of VGG16 (with the top removed), which is the pooling layer.
block5_pool = base_model.get_layer('block5_pool')
x = global_average_layer(block5_pool.output)
x = fc1(x)
x = prediction(x)
model = tf.keras.models.Model(inputs = inp, outputs = x)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])
Now, I grab the layer for visualization, LAYER_NAME='block5_conv3'.
# Create a graph that outputs target convolution and output
grad_model = tf.keras.models.Model(inputs=[model.input],
                                   outputs=[model.output, model.get_layer(LAYER_NAME).output])
print(model.get_layer(LAYER_NAME).output)
# Get the score for target class
with tf.GradientTape() as tape:
    predictions, conv_outputs = grad_model(img)
    loss = predictions[:, 1]
print('Prediction shape:', predictions.get_shape())
# Extract filters and gradients
output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]
We (several team members working on a project) ran into a similar problem with Grad-CAM code that we found in a tutorial.
That code didn't work with a model consisting of a VGG19 base model plus a few extra layers added on top of it. The problem was that the VGG19 base model was inserted as a "layer" inside our model, and apparently the Grad-CAM code didn't know how to deal with it; we were getting a "Graph disconnected..." error. After some debugging (carried out by another team member, not me) we managed to modify the original code so it works for this kind of model that contains another model inside it. The idea is to pass the inner model as an extra argument to the GradCAM class. Since this may be helpful to others, I am including the modified code below (we also renamed the GradCAM class to My_GradCAM).
class My_GradCAM:
    def __init__(self, model, classIdx, inner_model=None, layerName=None):
        self.model = model
        self.classIdx = classIdx
        self.inner_model = inner_model
        if self.inner_model == None:
            self.inner_model = model
        self.layerName = layerName

[...]

gradModel = tensorflow.keras.models.Model(inputs=[self.inner_model.inputs],
                                          outputs=[self.inner_model.get_layer(self.layerName).output,
                                                   self.inner_model.output])
Then the class can be instantiated by adding the inner model as the extra argument, e.g.:
cam = My_GradCAM(model, None, inner_model=model.get_layer("vgg19"), layerName="block5_pool")
I hope this helps.
Edit: Credit to Mirtha Lucas for doing the debugging and finding the solution.
After a lot of struggle, here is a condensed way to draw the heat map when you are using transfer learning. Here is the Keras official tutorial.
The issue I encountered is that when I try to draw the heat map from my model, the DenseNet shows up only as a single functional layer inside my model (in my case the 5th layer of the model summary), so make_gradcam_heatmap cannot find the layers nested inside that functional layer.
Therefore, to mirror the Keras official document, I need to use only the DenseNet as the model for visualization. Here are the steps.
First, take the inner DenseNet model out of your model:
dense_model = dense_model.get_layer('densenet121')
Copy the weights from the DenseNet model into your newly initialized model:
inputs = tf.keras.Input(shape=(224, 224, 3))
model = model_builder(weights="imagenet", include_top=True, input_tensor=inputs)
for layer, dense_layer in zip(model.layers[1:], dense_model.layers[1:]):
    layer.set_weights(dense_layer.get_weights())
relu = model.get_layer('relu')
x = tf.keras.layers.GlobalAveragePooling2D()(relu.output)
outputs = tf.keras.layers.Dense(5)(x)
model = tf.keras.models.Model(inputs = inputs, outputs = outputs)
Draw the heat map
preprocess_input = keras.applications.densenet.preprocess_input
img_array = preprocess_input(get_img_array(img_path, size=(224, 224)))
heatmap = make_gradcam_heatmap(img_array, model, 'bn')
plt.matshow(heatmap)
plt.show()
get_img_array, make_gradcam_heatmap and save_and_display_gradcam are kept unchanged. Follow the Keras tutorial and you are good to go.

TypeError: __init__() takes at least 3 arguments (2 given) when subclassing Model class

I want to create a simple neural network using Tensorflow and Keras.
When I try to instantiate a Model by subclassing the Model class
class TwoLayerFC(tf.keras.Model):
    def __init__(self, hidden_size, num_classes):
        super(TwoLayerFC, self).__init__()
        self.fc1 = keras.layers.Dense(hidden_size, activation=tf.nn.relu)
        self.fc2 = keras.layers.Dense(num_classes)

    def call(self, x, training=None):
        x = tf.layers.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x
This is how I test the network
def test_TwoLayerFC():
    tf.reset_default_graph()
    input_size, hidden_size, num_classes = 50, 42, 10
    model = TwoLayerFC(hidden_size, num_classes)
    with tf.device(device):
        x = tf.zeros((64, input_size))
        scores = model(x)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        scores_np = sess.run(scores)
        print(scores_np.shape)
I get an error:
TypeError: __init__() takes at least 3 arguments (2 given)
I followed this tutorial, and it seems that there should be two parameters.
Reading your code, I see a PyTorch-style model being created, including the mistake of passing two numbers to the second Dense layer.
Keras models should not follow the same logic as PyTorch models.
This model should be created like this:
input_tensor = Input(input_shape)
output_tensor = Flatten()(input_tensor)
output_tensor = Dense(hidden_size, activation='relu')(output_tensor)
output_tensor = Dense(num_classes)(output_tensor)

model = keras.models.Model(input_tensor, output_tensor)
This model instance is ready to be compiled and trained:
model.compile(optimizer=..., loss = ..., metrics=[...])
model.fit(x_train, y_train, epochs=..., batch_size=..., ...)
There is no reason in Keras to subclass Model, unless you're a really advanced user trying some very unconventional things.
By the way, be careful not to mix tf.keras.anything with keras.anything. The first is the version of Keras maintained directly by TensorFlow, while the second is the original Keras. They're not exactly the same; TensorFlow's version seems more buggy, and mixing the two in the same code sounds like a bad idea.

PyTorch n-to-1 LSTM does not learn anything

I am new to PyTorch and LSTMs, and I am trying to train a classification model that takes a sentence in which each word is encoded via word2vec (pre-trained vectors) and outputs one class after it has seen the full sentence. I have four different classes. The sentences have variable length.
My code is running without errors, but it always predicts the same class, no matter how many epochs I train my model. So I think the gradients are not properly backpropagated. Here is my code:
class LSTM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, tagset_size):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # The axes semantics are (num_layers, minibatch_size, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim).to(device),
                torch.zeros(1, 1, self.hidden_dim).to(device))

    def forward(self, sentence):
        lstm_out, self.hidden = self.lstm(sentence.view(len(sentence), 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
EMBEDDING_DIM = len(training_data[0][0][0])
HIDDEN_DIM = 256
model = LSTM(EMBEDDING_DIM, HIDDEN_DIM, 4)
model.to(device)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
for epoch in tqdm(range(n_epochs)):
    for sentence, tag in tqdm(training_data):
        model.zero_grad()
        model.hidden = model.init_hidden()

        sentence_in = torch.tensor(sentence, dtype=torch.float).to(device)
        targets = torch.tensor([label_to_idx[tag]], dtype=torch.long).to(device)

        tag_scores = model(sentence_in)
        res = torch.tensor(tag_scores[-1], dtype=torch.float).view(1, -1).to(device)
        # I THINK THIS IS WRONG???
        print(res)      # tensor([[-10.6328, -10.6783, -10.6667, -0.0001]], device='cuda:0', grad_fn=<CopyBackwards>)
        print(targets)  # tensor([3], device='cuda:0')

        loss = loss_function(res, targets)
        loss.backward()
        optimizer.step()
The code is largely inspired by https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html
The difference is that they have a sequence-to-sequence model and I have a sequence-to-ONE model.
I am not sure what the problem is, but I guess that the scores returned by the model contain a score for each tag and my ground truth only contains the index of the correct class? How would this be handled correctly?
Or is the loss function maybe not the correct one for my use case? Also I am not sure if this is done correctly:
res = torch.tensor(tag_scores[-1], dtype=torch.float).view(1,-1).to(device)
By taking tag_scores[-1] I want to get the scores after the last word has been given to the network because tag_scores contains the scores after each step, if I understand correctly.
And this is how I evaluate:
with torch.no_grad():
    preds = []
    gts = []
    for sentence, tag in tqdm(test_data):
        inputs = torch.tensor(sentence, dtype=torch.float).to(device)
        tag_scores = model(inputs)

        # find index with max value (this is the class to be predicted)
        pred = [j for j, v in enumerate(tag_scores[-1]) if v == max(tag_scores[-1])][0]
        print(pred, idx_to_label[pred], tag)
        preds.append(pred)
        gts.append(label_to_idx[tag])

print(f1_score(gts, preds, average='micro'))
print(classification_report(gts, preds))
EDIT:
When I shuffle the data before training, it seems to work. But why?
EDIT 2:
I think the reason shuffling is needed is that my training data contains the samples for each class grouped together. When training on them one after the other, the model only sees the same class in the last N iterations and therefore only predicts that class. Another reason might be that I am currently using mini-batches of only one sample, because I haven't figured out yet how to use other sizes.
Because you are using the whole sentence to predict a single class, the following line:
self.hidden2tag(lstm_out.view(len(sentence), -1))
should only pass the features of the final timestep to the classifier, for example:
self.hidden2tag(lstm_out[-1])
But I am not entirely sure, since I am not that familiar with LSTMs.
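For reference, here is a minimal sketch of a sequence-to-one forward pass under the question's setup (batch size 1, LogSoftmax over the 4 classes), returning only the scores of the final timestep so the training loop can feed the result straight into nn.NLLLoss without taking tag_scores[-1] afterwards:

def forward(self, sentence):
    # sentence: (seq_len, embedding_dim) -> LSTM input (seq_len, batch=1, embedding_dim)
    lstm_out, self.hidden = self.lstm(sentence.view(len(sentence), 1, -1), self.hidden)
    last_hidden = lstm_out[-1]                # (1, hidden_dim): features after the last word
    tag_space = self.hidden2tag(last_hidden)  # (1, tagset_size)
    return F.log_softmax(tag_space, dim=1)    # shape (1, 4), matches a (1,) class-index target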
