During training, I initialize my embedding matrix using the pretrained embeddings picked for the words in the training-set vocabulary:
import torchtext as tt

contexts = tt.data.Field(lower=True, sequential=True, tokenize=tokenizer, use_vocab=True)
# build the vocabulary from the training data and attach pretrained fastText vectors
contexts.build_vocab(data, vectors="fasttext.en.300d",
                     vectors_cache=config["vectors_cache"])
In my model I pass contexts.vocab as a parameter and initialize the embeddings:
embedding_dim = vocab.vectors.shape[1]
self.embeddings = nn.Embedding(len(vocab), embedding_dim)
self.embeddings.weight.data.copy_(vocab.vectors)
self.embeddings.weight.requires_grad = False
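Equivalently, nn.Embedding.from_pretrained can do the copy and the freeze in one call:

# same effect: copies vocab.vectors and sets requires_grad=False
self.embeddings = nn.Embedding.from_pretrained(vocab.vectors, freeze=True)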
I train my model, and during training I save its 'best' state via torch.save(model, f).
Then I want to test/demo the model in a separate file for evaluation, so I load the model via torch.load. How do I extend the embedding matrix to contain the test vocabulary? I tried to replace the embedding matrix:
# data is a TabularDataset with the test data
contexts.build_vocab(data, vectors="fasttext.en.300d",
                     vectors_cache=config["vectors_cache"])
model.embeddings = torch.nn.Embedding(len(contexts.vocab), contexts.vocab.vectors.shape[1])
model.embeddings.weight.data.copy_(contexts.vocab.vectors)
model.embeddings.weight.requires_grad = False
But the results are terrible (almost 0% accuracy), even though the model was doing well during training. What is the 'correct' way of doing this?
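One plausible explanation, offered here as a sketch rather than a confirmed fix: calling build_vocab on the test data creates a brand-new word-to-index mapping, so the indices fed to the trained model no longer line up with the embedding rows it learned against. With the legacy torchtext API (assumed below), the training vocabulary can instead be extended, which appends unseen words after the existing indices:

# build a vocab over the test data in a throwaway Field, then extend the
# training vocab with it; extend() appends new words after existing indices
test_field = tt.data.Field(lower=True, sequential=True, tokenize=tokenizer, use_vocab=True)
test_field.build_vocab(data)  # data is the test TabularDataset, as above
contexts.vocab.extend(test_field.vocab)

# reload pretrained vectors for the enlarged vocabulary, in vocab order
contexts.vocab.load_vectors("fasttext.en.300d", cache=config["vectors_cache"])

# rebuild the embedding layer; rows for training words keep their old indices
model.embeddings = torch.nn.Embedding(len(contexts.vocab), contexts.vocab.vectors.shape[1])
model.embeddings.weight.data.copy_(contexts.vocab.vectors)
model.embeddings.weight.requires_grad = False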
Related
I created a word embedding layer outside the model and used it as input before fitting my model. Now I need to predict new sentences with this model. How can I save the pre-trained embedding layer and apply it to my new sentences?
Code example:
Before input to the model and fitting:
embedding_sentence = tf.keras.layers.Embedding(vocab_size, model_dimension, trainable=True)
embedded_sentence = embedding_sentence(vectorized_sentence)
Model fitting:
model = tf.keras.Sequential()
model.add(tf.keras.layers.GlobalAveragePooling1D())
...
Now I need to predict new sentences. How can I apply the trained embedding to them?
The above information is insufficient to answer this question accurately, but I will still give it a try. In TensorFlow, you can use get_weights to read the weights of a trained embedding layer and save them in a NumPy/HDF5 file, which can later be loaded into an embedding layer in a new architecture.
import numpy as np
import tensorflow as tf

# get_weights() returns a list of numpy arrays (one per weight tensor)
weights = embedding_sentence.get_weights()
np.save('embedding_weights.npy', weights)

# Now load the weights into the embedding layer again
new_embedding_sentence = tf.keras.layers.Embedding(vocab_size, model_dimension, trainable=True)
new_embedding_sentence.build((None,))  # required before set_weights can be called
new_embedding_sentence.set_weights(weights)

# A raw string cannot be fed to an Embedding layer; it must first be turned into
# integer ids with the same vectorization used during training
new_sentence = "This is a dummy sentence"
new_sentence_vectorized = vectorize(new_sentence)  # vectorize() is a placeholder for your own step
new_sentence_embedding = new_embedding_sentence(new_sentence_vectorized)
predictions = model(new_sentence_embedding)
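In a separate inference script, the saved file can be read back before calling set_weights (a small sketch; allow_pickle=True because a Python list was saved):

import numpy as np

# read the list of weight arrays back from disk
weights = list(np.load('embedding_weights.npy', allow_pickle=True))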
I was wondering what the correct way is to load saved weights for a trained model at inference time.
As an FYI, I train my model using pretrained COCO weights and pretrained ImageNet weights:
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, pretrained_backbone=True)
num_classes = 2
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model.to(device)
I then save the model in a checkpoint dictionary, where the state is stored like:
'state_dict': model.state_dict()
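For completeness, a minimal sketch of what that save might look like (the file name and the single key are illustrative):

checkpoint = {'state_dict': model.state_dict()}
torch.save(checkpoint, 'best_model.pth')  # hypothetical file name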
Once trained, I load the weights like:
# set up model
model_test = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, pretrained_backbone=True)
num_classes = 2
# get number of input features for the classifier
in_features = model_test.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model_test.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model_test.to(device)
bestmodel = get_best_model(args.best)
bestmodel = torch.load(bestmodel)
model_test.load_state_dict(bestmodel['state_dict'])
As you can see, I also load the pretrained weights at inference (test) time, and then load the weights from my saved model (bestmodel) on top of them.
I thought that loading my saved weights would override the initial pretrained weights. However, when I set the test model to:
model_test = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)
and still load my best model afterwards, I get slightly (marginally) worse performance.
Is there a correct way to load these weights? And if the initialization shouldn't matter, why do I see a difference?
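One way to narrow it down (a sketch, reusing the variables above): loading with strict=False reports which keys the checkpoint did not override. If both lists come back empty, the pretrained initialization is fully replaced and the difference must come from elsewhere, e.g. forgetting eval mode:

incompatible = model_test.load_state_dict(bestmodel['state_dict'], strict=False)
print('missing keys:', incompatible.missing_keys)        # params the checkpoint did not cover
print('unexpected keys:', incompatible.unexpected_keys)  # checkpoint keys with no match in the model
model_test.eval()  # batch-norm/dropout must be in eval mode at test time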
I am trying to get the embeddings from pre-trained wav2vec2 models (e.g., from jonatasgrosman/wav2vec2-large-xlsr-53-german) using my own dataset.
My aim is to use these features for a downstream task (not specifically speech recognition). Since the dataset is relatively small, I would train an SVM on these embeddings for the final classification.
So far I have tried this:
from transformers import Wav2Vec2Model, Wav2Vec2Processor

model_name = "facebook/wav2vec2-large-xlsr-53-german"
feature_extractor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name)

input_values = feature_extractor(train_dataset[:10]["speech"], return_tensors="pt", padding=True,
                                 feature_size=1, sampling_rate=16000).input_values
Then, I am not sure whether the embeddings here correspond to the sequence of last_hidden_states:
hidden_states = model(input_values).last_hidden_state
or to the sequence of features of the last conv layer of the model:
features_last_cnn_layer = model(input_values).extract_features
Also, is this the correct way to extract features from a pre-trained model?
How can one get embeddings from a specific layer?
P.S.: Posting here as the HuggingFace forum seems to be less active.
Just check the documentation:
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.

extract_features (torch.FloatTensor of shape (batch_size, sequence_length, conv_dim[-1])) – Sequence of extracted feature vectors of the last convolutional layer of the model.
The last_hidden_state vector represents the so-called contextualized embeddings (i.e. every feature (CNN output) has a vector representation that is to some extent influenced by the other tokens of the sequence).
The extract_features vector represents the embeddings of your input (after the CNNs).
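For a downstream classifier such as the SVM mentioned in the question, a common recipe (a sketch under that assumption, not something the model prescribes) is to mean-pool the contextualized embeddings over the time axis so that each utterance becomes one fixed-size vector:

import torch

with torch.no_grad():
    hidden_states = model(input_values).last_hidden_state  # (batch, time, hidden)

# one fixed-size vector per utterance, usable directly as SVM features
utterance_embeddings = hidden_states.mean(dim=1).cpu().numpy()  # (batch, hidden)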
Also, is this the correct way to extract features from a pre-trained model?
Yes.
How can one get embeddings from a specific layer?
Set output_hidden_states=True:
o = model(input_values, output_hidden_states=True)
o.keys()
Output:
odict_keys(['last_hidden_state', 'extract_features', 'hidden_states'])
The hidden_states value contains the embeddings and the contextualized embeddings of each attention layer.
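Since hidden_states is a tuple with one tensor per layer, a specific layer can be indexed directly (a short sketch; layer 9 is an arbitrary example):

o = model(input_values, output_hidden_states=True)

# hidden_states[0] is (roughly) the projected CNN output fed into the transformer;
# hidden_states[i] for i >= 1 is the output of the i-th transformer layer
layer_9 = o.hidden_states[9]
print(len(o.hidden_states), layer_9.shape)  # num_layers + 1, (batch, time, hidden)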
P.S.: the jonatasgrosman/wav2vec2-large-xlsr-53-german model was trained with feat_extract_norm==layer. That means you should also pass an attention mask to the model:
from transformers import Wav2Vec2Model, Wav2Vec2Processor

model_name = "jonatasgrosman/wav2vec2-large-xlsr-53-german"
feature_extractor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name)

# the processor returns both input_values and attention_mask;
# unpacking the whole dict forwards the attention mask to the model
i = feature_extractor(train_dataset[:10]["speech"], return_tensors="pt", padding=True,
                      feature_size=1, sampling_rate=16000)
model(**i)
I trained a text classifier following this guide: https://developers.google.com/machine-learning/guides/text-classification/step-4
and saved the model as
model.save('~./output/model.h5')
In this case, how do I use this model to classify texts in another, new dataset?
Thank you
import tensorflow as tf

# Recreate the exact same model, including its weights and the optimizer
new_model = tf.keras.models.load_model('~./output/model.h5')

# Show the model architecture
new_model.summary()

# Apply the same data-preparation process that was used while training the model.
# Let's say that after preprocessing you have stored the processed data in test_data.

# Check the model's accuracy on the unseen/new dataset
loss, acc = new_model.evaluate(test_data, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100 * acc))
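To classify texts that have no labels, predict can be used instead of evaluate (a sketch, assuming test_data was preprocessed exactly like the training data):

probabilities = new_model.predict(test_data)

# for a softmax multi-class head; a sigmoid binary head would threshold instead,
# e.g. (probabilities > 0.5)
predicted_classes = probabilities.argmax(axis=-1)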
You can use TensorFlow's text tokenization utility class (Tokenizer) to deal with unknown words in the test data.
num_words is the vocabulary size (it keeps only the most frequent words).
Assign oov_token = 'some string'; it is used for all tokens/words outside the vocabulary (new words in the test data will be mapped to the oov_token string).
Fit on the training data, then generate token sequences for both the train and test data, as in the sketch after the signature below.
tf.keras.preprocessing.text.Tokenizer(
    num_words=None, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True,
    split=' ', char_level=False, oov_token=None, document_count=0, **kwargs
)
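A toy end-to-end sketch of that workflow (the sentences and sizes are made up for illustration):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

train_texts = ["the cat sat on the mat", "dogs are great"]
test_texts = ["the unicorn sat"]  # "unicorn" was never seen during training

tokenizer = Tokenizer(num_words=1000, oov_token="<OOV>")
tokenizer.fit_on_texts(train_texts)  # fit on the training data only

train_seqs = tokenizer.texts_to_sequences(train_texts)
test_seqs = tokenizer.texts_to_sequences(test_texts)  # "unicorn" maps to the <OOV> id

# pad to the same fixed length used during training
test_padded = pad_sequences(test_seqs, maxlen=10)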
I want to create an image classifier using transfer learning on a model already trained on ImageNet.
How do I replace the final layer of a torchvision.models ImageNet classifier with my own custom classifier?
Get a pre-trained ImageNet model (of the models listed there, resnet152 has the best accuracy):
from torchvision import models
# https://pytorch.org/docs/stable/torchvision/models.html
model = models.resnet152(pretrained=True)
Print out its structure so we can compare to the final state:
print(model)
Remove the last module (generally a single fully connected layer) from the model:
classifier_name, old_classifier = model._modules.popitem()
Freeze the parameters of the feature detector part of the model so that they are not adjusted by back-propagation:
for param in model.parameters():
    param.requires_grad = False
Create a new classifier:
from collections import OrderedDict
import torch.nn as nn

hidden_layer_size = 512   # example value; choose to suit your task
output_layer_size = 10    # example value; the number of target classes

classifier_input_size = old_classifier.in_features
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(classifier_input_size, hidden_layer_size)),
    ('activation', nn.SELU()),
    ('dropout', nn.Dropout(p=0.5)),
    ('fc2', nn.Linear(hidden_layer_size, output_layer_size)),
    ('output', nn.LogSoftmax(dim=1))
]))
The module name for our classifier needs to be the same as the one which was removed. Add our new classifier to the end of the feature detector:
model.add_module(classifier_name, classifier)
Finally, print out the structure of the new network:
print(model)
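Since everything except the new classifier is frozen, only the parameters that still require gradients need to be handed to the optimizer (a sketch; the optimizer choice and learning rate are arbitrary examples):

import torch.optim as optim

# only the new classifier's parameters have requires_grad=True at this point
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.Adam(trainable_params, lr=1e-3)  # example learning rate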