I'm trying to use the pretrained bert-base-uncased model from Hugging Face's transformers library, but I want to increase dropout. The from_pretrained documentation doesn't mention this, but Colab ran the object instantiation below without any problem. I saw these dropout parameters in the transformers.BertConfig documentation.
Am I using bert-base-uncased AND changing dropout in the correct way?
model = BertForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path='bert-base-uncased',
    num_labels=2,
    output_attentions=False,
    output_hidden_states=False,
    attention_probs_dropout_prob=0.5,
    hidden_dropout_prob=0.5
)
As Elidor00 already said, your assumption is correct. I would still suggest using a separate config object because it is easier to export and less error-prone. Additionally, someone in the comments asked how to use it via from_pretrained:
from transformers import BertModel, AutoConfig
configuration = AutoConfig.from_pretrained('bert-base-uncased')
configuration.hidden_dropout_prob = 0.5
configuration.attention_probs_dropout_prob = 0.5
bert_model = BertModel.from_pretrained(pretrained_model_name_or_path='bert-base-uncased',
                                       config=configuration)
Yes, this is correct, but note that there are two dropout parameters and that you are using a specific BERT model, namely BertForSequenceClassification.
Also, as suggested by the documentation, you could first define the configuration and then build the model from it, in the following way:
from transformers import BertModel, BertConfig
# Initializing a BERT bert-base-uncased style configuration
configuration = BertConfig()
# Initializing a model from the bert-base-uncased style configuration
model = BertModel(configuration)
# Accessing the model configuration
configuration = model.config
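One way to confirm that the overrides actually reach the model is to inspect the nn.Dropout modules after construction. The sketch below is not part of the answers above and makes some assumptions: it builds a deliberately tiny, randomly initialized BertModel (the small sizes are arbitrary, chosen so that nothing is downloaded) and collects the dropout probabilities it finds. With the real checkpoint you would pass the same kind of configuration to from_pretrained instead.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny, arbitrary sizes so the model builds quickly without downloading weights.
config = BertConfig(
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    hidden_dropout_prob=0.5,
    attention_probs_dropout_prob=0.5,
)
model = BertModel(config)  # randomly initialized, NOT pretrained

# Collect the probability of every nn.Dropout module in the model.
dropout_probs = {m.p for m in model.modules() if isinstance(m, torch.nn.Dropout)}
print(dropout_probs)
```

If the set contains only the value you configured, the setting was picked up. Remember that dropout is only active in training mode (model.train()).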
import tensorflow as tf
import keras
import tensorflow.keras.layers as tfl
from tensorflow.keras.layers.experimental.preprocessing import RandomFlip, RandomRotation
I am trying to figure out which I should use for Data Augmentation. In the documentation, there is:
tf.keras.layers.RandomFlip and RandomRotation
Then tf.keras.layers.experimental.preprocessing has the same things, RandomFlip and RandomRotation.
Which should I use? I've seen guides that use both.
This is my current code:
def data_augmenter():
    data_augmentation = tf.keras.Sequential([
        tfl.RandomFlip(),
        tfl.RandomRotation(0.2)
    ])
    return data_augmentation
and this is a part of my model:
def ResNet50(image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
    input_shape = image_shape + (3,)
    # Remove top layer in order to put mine with the correct classification labels, get weights for imageNet
    base_model = tf.keras.applications.resnet_v2.ResNet50V2(input_shape=input_shape, include_top=False, weights='imagenet')
    # Freeze base model
    base_model.trainable = False
    # Define input layer
    inputs = tf.keras.Input(shape=input_shape)
    # Apply Data Augmentation
    x = data_augmentation(inputs)
I am a bit confused here..
If you find something in an experimental module and something in the same package by the same name, these will typically be aliases of one another. For the sake of backwards compatibility, they don't remove the experimental one (at least not for a few iterations.)
You should generally use the non-experimental one if it exists, since this is considered stable and should not be removed or changed later.
The following page documents the experimental Keras preprocessing module. If it redirects to the preprocessing module, it's an alias. https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/preprocessing
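Besides checking the documentation, you can test the alias claim locally by comparing object identity. The helper names below (resolve, is_alias) are made up for this sketch, and the demonstration uses a standard-library alias so it runs without TensorFlow. Whether the two TensorFlow paths in the trailing comment are importable depends on your installed version, so treat that part as an assumption.

```python
import importlib


def resolve(dotted_path):
    """Import the module part of 'pkg.mod.Name' and return the attribute Name."""
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def is_alias(path_a, path_b):
    """Return True if both dotted paths resolve to the very same object."""
    return resolve(path_a) is resolve(path_b)


# Standard-library demonstration: os.error is an alias of the built-in OSError.
print(is_alias("os.error", "builtins.OSError"))

# With TensorFlow installed (and a version that still ships the experimental module):
# is_alias("tensorflow.keras.layers.RandomFlip",
#          "tensorflow.keras.layers.experimental.preprocessing.RandomFlip")
```

If the call returns True for the two layer paths, both names point at the same class and it makes no difference which one the guides use.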
I have recently been using a RobertaLarge model, which I fine-tune on a downstream task using the "Trainer" class.
All goes well: I see the loss going down, and I manually compared some results against the validation dataset.
The problem arises when I try to save the model and reload it afterwards.
I keep seeing this warning when trying to reload the model:
Some weights of the model checkpoint at Roberta_trained_1epoch were not used when initializing RobertaPreTrainedModel: ['module.roberta.encoder.layer.10.output.dense.bias', [........................................340_LAYERS_..................................]
'module.roberta.encoder.layer.6.attention.self.key.bias', 'module.roberta.encoder.layer.22.output.dense.weight', 'module.roberta.encoder.layer.3.attention.self.key.bias', 'module.roberta.encoder.layer.15.attention.self.value.bias', 'module.roberta.encoder.layer.15.attention.self.query.bias', 'module.roberta.encoder.layer.2.attention.self.value.bias']
I looked extensively for an explanation of this problem and so far couldn't find a solution. Some claim this is just a warning and nothing is wrong; however, being suspicious, I did some manual checks, and indeed the model seems... virgin.
I'm using Trainer.save_model('save_here') after training, and RobertaForTokenClassification.from_pretrained('save_here', local_files_only=True) to reload it.
However, the results show me that the model is clearly not loading correctly.
training code:
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=ds_train,
    eval_dataset=ds_valid,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
trainer.evaluate()
trainer.save_model('save_here')
This results in an evaluation loss of 0.002.
Reloading and re-evaluation:
model = RobertaForTokenClassification.from_pretrained('save_here', local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained('tokenizers_saved')
dl_valid = DataLoader(ds_valid, batch_size=Config.batch_size, shuffle=True)
with torch.no_grad():
    for index, data in enumerate(dl_valid):
        batch_input_ids = data['input_ids'].to(device, dtype=torch.long)
        batch_att_mask = data['attention_mask'].to(device, dtype=torch.long)
        batch_target = data['label_ids'].to(device, dtype=torch.long)
        output = model(batch_input_ids, token_type_ids=None, attention_mask=batch_att_mask, labels=batch_target)
        step_loss, eval_prediction = output['loss'], output['logits']
        eval_prediction = np.argmax(eval_prediction.detach().to('cpu').numpy(), axis=2)
        predictions.append(eval_prediction)
        reals.append(batch_target)
        eval_loss += step_loss
print(eval_loss)
This results in loss: 1.2 - 0.9 (randomly after loading)
I found out what was wrong.
I'll share it, since others may have the same issue.
My problem was that I had wrapped my model in DataParallel: model = nn.DataParallel(model)
So it seems that Trainer can't save it properly and load it back the usual way; note the module. prefix on every key in the warning above.
As a workaround:
model = trainer.model
model.module.save_pretrained('save_here')
....
# afterwards in another machine
....
model = RobertaForTokenClassification.from_pretrained('save_here')
I still think this should be handled differently.
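If a checkpoint has already been written with the module. prefix and retraining is not an option, one possible alternative is to rename the keys before loading. This is only a sketch, assuming every affected key starts with exactly module.; the helper name strip_module_prefix is made up here, and the stand-in values below take the place of real tensors.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in values; in practice these would be tensors from a loaded checkpoint.
saved = {
    "module.roberta.encoder.layer.0.output.dense.bias": 0.1,
    "module.classifier.weight": 0.2,
}
print(list(strip_module_prefix(saved)))
```

The cleaned dictionary could then be passed to load_state_dict on an unwrapped model. How you obtain the raw state dict depends on how and where the checkpoint file was saved, which is not shown here.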
model.summary() gives me this output
Now how can I check the layers of sequential_1 and sequential_3?
I want a summary of the whole model, but it shows two Sequential entries, which means two models are combined. How can I get the summary of both models?
I only have the model.h5 file, nothing else.
Models saved in .h5 format include everything about the model.
To inspect the layers of a model nested inside another model, as in your case, you can extract the layers and call the summary method on each of them.
i.e.
layer_summary = [layer.summary() for layer in loaded_model.layers]
Here is the complete code I used in reproducing your scenario.
import tensorflow as tf
print('Running Tensorflow version {}'.format(tf.__version__)) # Tensorflow 2.1.0
model_path = '/content/keras_model.h5'
loaded_model = tf.keras.models.load_model(model_path)
loaded_model.summary()
inp = loaded_model.input
layer_summary = [layer.summary() for layer in loaded_model.layers]
I've also used the model.h5 file you uploaded.
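The list comprehension above works when every top-level layer is itself a model, as with sequential_1 and sequential_3. Plain layers such as Dense have no summary method, so a more defensive variant only visits layers that contain layers of their own. The sketch below assumes nested models can be recognized by having a .layers attribute; the helper name iter_nested_models is made up for illustration.

```python
def iter_nested_models(model):
    """Yield, depth-first, every sub-layer that itself exposes a .layers list."""
    for layer in getattr(model, "layers", []):
        if hasattr(layer, "layers"):
            yield layer
            yield from iter_nested_models(layer)


# Usage with the model loaded above:
# for sub_model in iter_nested_models(loaded_model):
#     sub_model.summary()
```

Because the helper recurses, it also reaches models nested more than one level deep.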
Whenever I export a fastai model and reload it, I get this error (or a very similar one) when I try and use the reloaded model to generate predictions on a new test set:
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
Minimal reproducible code example below; you just need to update the FILES_DIR variable to wherever the MNIST data gets deposited on your system:
from fastai import *
from fastai.vision import *
# download data for reproducible example
untar_data(URLs.MNIST_SAMPLE)
FILES_DIR = '/home/mepstein/.fastai/data/mnist_sample' # this is where command above deposits the MNIST data for me
# Create FastAI databunch for model training
tfms = get_transforms()
tr_val_databunch = ImageDataBunch.from_folder(path=FILES_DIR,  # location of downloaded data shown in log of prev command
                                              train='train',
                                              valid_pct=0.2,
                                              ds_tfms=tfms).normalize()
# Create Model
conv_learner = cnn_learner(tr_val_databunch,
                           models.resnet34,
                           metrics=[error_rate]).to_fp16()
# Train Model
conv_learner.fit_one_cycle(4)
# Export Model
conv_learner.export() # saves model as 'export.pkl' in path associated with the learner
# Reload Model and use it for inference on new hold-out set
reloaded_model = load_learner(path=FILES_DIR,
                              test=ImageList.from_folder(path=f'{FILES_DIR}/valid'))
preds = reloaded_model.get_preds(ds_type=DatasetType.Test)
Output:
"RuntimeError: Input type (torch.cuda.FloatTensor) and weight type
(torch.cuda.HalfTensor) should be the same"
Stepping through the code statement by statement, everything works fine until the last line, preds = ..., which is where the torch error above pops up.
Relevant software versions:
Python 3.7.3
fastai 1.0.57
torch 1.2.0
torchvision 0.4.0
So the answer to this ended up being relatively simple:
1) As noted in my comment, training in mixed precision mode (setting conv_learner to_fp16()) caused the error with the exported/reloaded model
2) To train in mixed precision mode (which is faster than regular training) and enable export/reload of the model without errors, simply set the model back to default precision before exporting.
...In code, simply changing the example above:
# Export Model
conv_learner.export()
to:
# Export Model (after converting back to default precision for safe export/reload)
conv_learner = conv_learner.to_fp32()
conv_learner.export()
...and now the full (reproducible) code example above runs without errors, including the prediction after model reload.
Your model is in half precision if you call .to_fp16(), which is the same as calling model.half() in PyTorch.
In fact, if you trace the code, .to_fp16() calls model.half().
But there is a problem: if you also convert the batch norm layers to half precision, you may run into convergence problems.
This is why you would typically do this in PyTorch:
model.half()  # convert to half precision
for layer in model.modules():
    if isinstance(layer, torch.nn.modules.batchnorm._BatchNorm):
        layer.float()  # keep batch norm layers in float32
This will convert every layer other than batch norm to half precision.
Note that the code from the PyTorch forum is also OK, but it only covers nn.BatchNorm2d.
Then make sure your input is in half precision using to() like this:
import torch
t = torch.tensor(10.)
print(t)
print(t.dtype)
t=t.to(dtype=torch.float16)
print(t)
print(t.dtype)
# tensor(10.)
# torch.float32
# tensor(10., dtype=torch.float16)
# torch.float16
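The convergence concern about batch norm mentioned earlier is usually attributed to float16's limited precision; that is a commonly given rationale rather than something spelled out in the answer above. The standard-library sketch below is independent of PyTorch and uses a made-up helper, to_half, to round-trip Python floats through IEEE 754 half precision so you can see how coarse the format is.

```python
import struct


def to_half(value):
    """Round-trip a Python float through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("e", struct.pack("e", value))[0]


print(to_half(0.1))     # nearest float16 value to 0.1
print(to_half(1.0001))  # spacing near 1.0 is 2**-10, so the small increment is lost
print(to_half(2049.0))  # integers above 2048 are no longer all representable
```

Small increments that vanish like this are one reason to keep numerically sensitive layers in float32.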
Is it possible to create a hub module from existing checkpoints without changing the training code?
Yes, absolutely. You need a session with (1) a Module and (2) the proper values in its variables. It doesn't matter if those come from actual training or merely restoring a checkpoint. Given a Python library for model building that knows nothing about TensorFlow Hub, you can have a tool on the side for export to a Hub Module that looks like:
import tensorflow as tf
import tensorflow_hub as hub
from your_library import build_model_body  # placeholder for your own model-building code

def module_fn():
  inputs = tf.placeholder(...)
  logits = build_model_body(inputs)
  hub.add_signature(inputs=inputs, outputs=logits)

def main(_):
  spec = hub.create_module_spec(module_fn)
  # Supply a checkpoint trained on a model from the same Python code.
  checkpoint_path = "..."
  # Output will be written here:
  export_path = "..."
  with tf.Graph().as_default():
    module = hub.Module(spec)
    # Map checkpoint variable names onto the module's variables.
    init_fn = tf.contrib.framework.assign_from_checkpoint_fn(
        checkpoint_path, module.variable_map)
    with tf.Session() as session:
      init_fn(session)
      module.export(export_path, session=session)
Fine points to note:
build_model_body() should transform inputs to outputs (say, pixels to feature vectors) as suitable for a Hub module, but not include data reading, or loss and optimizers. For transfer learning, these are best left to the consumer of the module. Some refactoring may be required.
Supplying the module.variable_map is essential, to translate from the plain variable names created by running build_model_body() by itself to the variable names created by instantiating the Module, which live in scope module/state.