How to mix tensorflow keras model and transformers - python

I am trying to import a pretrained model from Huggingface's transformers library and extend it with a few layers for classification using tensorflow keras. When I use the transformers model directly (Method 1), the model trains well and reaches a validation accuracy of 0.93 after 1 epoch. However, when I use the model as a layer within a tf.keras model (Method 2), it can't get above 0.32 accuracy. As far as I can tell from the documentation, the two approaches should be equivalent. My goal is to get Method 2 working so that I can add more layers to it instead of directly using the logits produced by Huggingface's classifier head, but I'm stuck at this stage.
import tensorflow as tf
from transformers import TFRobertaForSequenceClassification
Method 1:
model = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
Method 2:
input_ids = tf.keras.Input(shape=(128,), dtype='int32')
attention_mask = tf.keras.Input(shape=(128, ), dtype='int32')
transformer = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
encoded = transformer([input_ids, attention_mask])
logits = encoded[0]
model = tf.keras.models.Model(inputs = [input_ids, attention_mask], outputs = logits)
The rest of the code is identical for both methods:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])
I am using Tensorflow 2.3.0 and have tried with transformers versions 3.5.0 and 4.0.0.

Answering my own question here. I posted a bug report on HuggingFace GitHub and they fixed this in the new dev version (4.1.0.dev0 as of December 2020). The snippet below now works as expected:
input_ids = tf.keras.Input(shape=(128,), dtype='int32')
attention_mask = tf.keras.Input(shape=(128, ), dtype='int32')
transformer = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
encoded = transformer({"input_ids": input_ids, "attention_mask": attention_mask})
logits = encoded[0]
model = tf.keras.models.Model(inputs = {"input_ids": input_ids, "attention_mask": attention_mask}, outputs = logits)

Related

Training a TensorFlow-Keras model with extra layer which gets also the labels as input

I want to train a model that creates a feature vector for an RGB image (a 2D array with 3 channels). Using that feature vector, a classifier will decide what to do (e.g. person recognition from an image: assign a label by choosing the "closest" pre-trained feature vector among the people enrolled in the system). To do so I use categorical cross-entropy. In the training phase I apply a softmax to the feature vector as an extra layer and get as output the probability of each label or class; I then use the softmax output and the training label to compute the loss.
So, in the working or testing phase, the model receives just one input, the image, and outputs a feature vector, while for training the model receives pairs: the image and its label.
I want to train such a model, with pair [image,label] input in the training phase, and [image] input in the testing or working phase.
I use TensorFlow 2.8 and Keras 2.8 with Python 3.9.5.
The code (with a toy model and some random data):
# ==============================================================================
# Imports
# ==============================================================================
import numpy as np
import tensorflow as tf
import keras
import keras.backend as K
from keras import layers as tfl
from keras import Model
# ==============================================================================
# Switch case layer, behaves differently for training and testing
# ==============================================================================
class Switch(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
    def call(self, inputs, training=None):
        x = tf.identity(inputs)
        if training:
            y = tfl.Input(shape=(2,), name="label")
            output_tensor = tf.nn.softmax_cross_entropy_with_logits(y, x)
            return output_tensor
        else:
            output_tensor = tf.identity(x, name="output")
            return output_tensor
# ==============================================================================
# Define model
# ==============================================================================
inputs = keras.Input(shape=(4, 4, 3))
conv = keras.layers.Conv2D(filters=2, kernel_size=2)(inputs)
pooling = keras.layers.GlobalAveragePooling2D()(conv)
feature = keras.layers.Dense(10)(pooling)
outputs = Switch()(feature)
# output = tf.identity(feature)
model = keras.Model(inputs, outputs)
# ==============================================================================
# Training data
# ==============================================================================
tf.random.set_seed(42)
x_train = tf.random.normal((5, 4, 4, 3))
y_train = tf.constant([1, 1, 0, 0, 2])
# ==============================================================================
# Train model
# ==============================================================================
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics='accuracy',)
model.fit(
    x=x_train,
    y=y_train,
    epochs=3,
    verbose='auto',
    shuffle=True,
    initial_epoch=0,
    max_queue_size=10
)
The Switch layer is based on: Is it possible to add different behavior for training and testing in keras Functional API
If I understand correctly, when using model.fit, the model's call is automatically invoked with training=True.
However, when I run the model I get the following error:
TypeError: You are passing KerasTensor(type_spec=TensorSpec(shape=(),
dtype=tf.float32, name=None), name='Placeholder:0',
description="created by layer 'tf.cast_4'"), an intermediate Keras
symbolic input/output, to a TF API that does not allow registering
custom dispatchers, such as tf.cond, tf.function, gradient tapes,
or tf.map_fn. Keras Functional model construction only supports TF
API calls that do support dispatching, such as tf.math.add or
tf.reshape. Other APIs cannot be called directly on symbolic
Kerasinputs/outputs. You can work around this limitation by putting
the operation in a custom Keras layer call and calling that layer on
this symbolic input/output.
When I pass:
model.fit(
x=[x_train, y_train],
y=y_train,
I receive the following error:
ValueError: Layer "model" expects 1 input(s), but it received 2 input
tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None,
4, 4, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(None,)
dtype=int32>]
The problem is probably due to the Switch layer.
How do I solve it, and how do I train a model whose inputs and outputs in the training phase differ from those in the working phase (where it gets an image and outputs a feature vector)?
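Not an answer from the thread, but one common pattern that gives this behaviour is to drop the switching layer and build two models from the same layers: a training model that ends in a classification head, and a second model that stops at the feature vector. The sketch below assumes integer labels in three classes (as in y_train above) and a sparse cross-entropy loss computed from logits. The layer names `features` and `logits` and the names `train_model` and `feature_model` are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy data shaped like the question's: 5 RGB images of 4x4, labels in {0, 1, 2}.
rng = np.random.default_rng(42)
x_train = rng.normal(size=(5, 4, 4, 3)).astype("float32")
y_train = np.array([1, 1, 0, 0, 2], dtype="int32")

inputs = tf.keras.Input(shape=(4, 4, 3))
x = tf.keras.layers.Conv2D(filters=2, kernel_size=2)(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
features = tf.keras.layers.Dense(10, name="features")(x)
# Classification head used only for training (3 classes assumed).
logits = tf.keras.layers.Dense(3, name="logits")(features)

train_model = tf.keras.Model(inputs, logits)      # image -> class logits
feature_model = tf.keras.Model(inputs, features)  # image -> feature vector

train_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# The labels go in as y, not as a second model input.
train_model.fit(x_train, y_train, epochs=3, verbose=0)

# At inference time only the image is needed. Both models are built from the
# same layer objects, so feature_model uses the trained weights.
feature_vectors = feature_model.predict(x_train, verbose=0)
print(feature_vectors.shape)
```

With this layout the loss function receives the labels through `fit(x, y)`, so the model itself never needs a label input.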

Bert prediction shape not equal to num_samples

I have a text classification task that I am trying to solve using BERT. Below is the code I am using. The model training code (below) works fine, but I am facing an issue with the prediction part.
from transformers import TFBertForSequenceClassification
import tensorflow as tf
# recommended learning rate for Adam 5e-5, 3e-5, 2e-5
learning_rate = 5e-5
nlabels = 26
# we will do just 1 epoch for illustration, though multiple epochs might be better as long as we will not overfit the model
number_of_epochs = 1
# model initialization
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=nlabels,
output_attentions=False,
output_hidden_states=False)
# optimizer Adam
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=1e-08)
# we do not have one-hot vectors, so we can use sparse categorical cross entropy and accuracy
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
bert_history = model.fit(ds_tr_encoded, epochs=number_of_epochs)
I get the predictions as follows:
import numpy as np
preds = model.predict(ds_te_encoded)
pred_labels_idx = np.argmax(preds['logits'], axis=1)
The issue I am facing is that the shape of pred_labels_idx is not the same as ds_te_encoded
len(pred_labels_idx) #426820
tf.data.experimental.cardinality(ds_te_encoded) #<tf.Tensor: shape=(), dtype=int64, numpy=21341>
Not sure why this is happening.
Since ds_te_encoded is a tf.data.Dataset that has already been batched, cardinality(...) returns the number of batches, not the number of samples. I assume you are using a batch size of 20, because 426820 / 20 = 21341. That is probably what is causing the confusion.
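To illustrate the point about batches versus samples, here is a small sketch on a toy dataset. The sizes (103 samples, batch size 20) are made up for the example and are not the asker's data.

```python
import numpy as np
import tensorflow as tf

num_samples = 103
batch_size = 20

ds = tf.data.Dataset.from_tensor_slices(np.arange(num_samples)).batch(batch_size)

# On a batched dataset, cardinality counts batches (the last one may be partial).
num_batches = int(tf.data.experimental.cardinality(ds).numpy())

# Undo the batching to count individual samples.
num_counted = sum(1 for _ in ds.unbatch())

print(num_batches, num_counted)
```

The prediction array has one row per sample, so its length should be compared with the sample count rather than with the cardinality of the batched dataset.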

LSTM Model not having any variance during evaluation

I have a question regarding the evaluation of an LSTM model. I have trained an LSTM model and stored it with model.save(...). Now I want to load it with load_model and evaluate it on the validation dataset. Since neural networks are stochastic, I run the evaluation several times and compute the mean and the variance of the different metrics I am interested in.
Now I am shocked that after the first run all consecutive runs have the same performance on every metric. I don't think that is right, but I don't know where the error occurs.
So my question is:
what is my mistake in setting up the validation of my model?
and how can I fix that?
Here are the code snippets that should explain what I am doing:
Compile and fit the Model
def compile_and_fit(hparams,
                    MAX_EPOCHS,
                    model_path):
    window = WindowGenerator(input_width=hparams[HP_WINDOW_SIZE],
                             label_width=hparams[HP_WINDOW_SIZE], shift=1,
                             label_columns=['q_MARI'], batch_size=hparams[HP_BATCH_SIZE])
    model = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(hparams[HP_NUM_UNITS], return_sequences=True, name="LSTM_1"),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT], name="Dropout_1"),
        tf.keras.layers.LSTM(hparams[HP_NUM_UNITS], return_sequences=True, name="LSTM_2"),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
    ])
    learning_rate = hparams[HP_LEARNING_RATE]
    model.compile(loss=tf.losses.MeanSquaredError(),
                  optimizer=tf.optimizers.Adam(learning_rate=learning_rate),
                  metrics=get_metrics())
    history = model.fit(window.train,
                        epochs=MAX_EPOCHS,
                        validation_data=window.val,
                        callbacks=get_callbacks(model_path))
    _, a, _, _, _, _ = model.evaluate(window.val)
    return a, model, history
Train and save it
a, model, history = compile_and_fit( hparams = hparams, MAX_EPOCHS = MAX_EPOCHS, model_path = run_path)
model.save(run_path)
Load and evaluate it
model = tf.keras.models.load_model(os.path.join(hparam_path, model_name),
                                   custom_objects={"max_error": max_error, "median_absolute_error": median_absolute_error, "rev_metric": rev_metric, "nse_metric": nse_metric})
model.compile(loss=tf.losses.MeanSquaredError(), optimizer="adam", metrics=get_metrics())
metric_values = np.empty(shape=(nr_runs, len(metrics)), dtype=float)
for j in range(nr_runs):
    window = WindowGenerator(input_width=hparam_vals[i], label_width=hparam_vals[i], shift=1,
                             label_columns=['q_MARI'])
    metric_values[j] = np.array(model.evaluate(window.val))
means = metric_values.mean(axis=0)
varis = metric_values.var(axis=0)
print(f'means: {means}, varis: {varis}')
The results I am getting
For setting up the Training I follow those two guides:
https://www.tensorflow.org/tutorials/structured_data/time_series
https://www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams
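An LSTM is not stochastic at evaluation time. Evaluation results should be the same for the same data.
There are two stages. During training, randomness (weight initialization, shuffling, dropout) influences the model you end up with. Once you have saved that model, however, its predictions are the same every time you feed it the same data, because layers such as Dropout are inactive during model.evaluate and model.predict.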
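To make this concrete, here is a small sketch on random data with a made-up toy model (not the asker's WindowGenerator setup). It evaluates the same model twice, then calls it with `training=True` to show where run-to-run variation would actually come from.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
x = np.random.RandomState(0).normal(size=(16, 5, 3)).astype("float32")
y = np.random.RandomState(1).normal(size=(16, 5, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(4, return_sequences=True),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="adam")

# evaluate() runs the layers in inference mode, so Dropout is switched off
# and repeated runs on the same data give the same loss.
first = model.evaluate(x, y, verbose=0)
second = model.evaluate(x, y, verbose=0)

# Forcing training mode keeps Dropout active, which makes the outputs vary.
stochastic_a = model(x, training=True).numpy()
stochastic_b = model(x, training=True).numpy()

print(first, second)
print(np.allclose(stochastic_a, stochastic_b))
```

If a spread over runs is what you are after, the variation has to come from retraining with different seeds (or from deliberately keeping dropout active), not from re-evaluating one saved model.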

PyTorch transfer learning with pre-trained ImageNet model

I want to create an image classifier using transfer learning on a model already trained on ImageNet.
How do I replace the final layer of a torchvision.models ImageNet classifier with my own custom classifier?
Get a pre-trained ImageNet model (resnet152 has the best accuracy):
from collections import OrderedDict
from torch import nn
from torchvision import models
# https://pytorch.org/docs/stable/torchvision/models.html
model = models.resnet152(pretrained=True)
Print out its structure so we can compare to the final state:
print(model)
Remove the last module (generally a single fully connected layer) from model:
classifier_name, old_classifier = model._modules.popitem()
Freeze the parameters of the feature detector part of the model so that they are not adjusted by back-propagation:
for param in model.parameters():
    param.requires_grad = False
Create a new classifier:
# hidden_layer_size and output_layer_size are values you choose for your task
classifier_input_size = old_classifier.in_features
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(classifier_input_size, hidden_layer_size)),
    ('activation', nn.SELU()),
    ('dropout', nn.Dropout(p=0.5)),
    ('fc2', nn.Linear(hidden_layer_size, output_layer_size)),
    ('output', nn.LogSoftmax(dim=1))
]))
The module name for our classifier needs to be the same as the one which was removed. Add our new classifier to the end of the feature detector:
model.add_module(classifier_name, classifier)
Finally, print out the structure of the new network:
print(model)
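The steps above can be tried end to end without downloading any weights. The sketch below swaps resnet152 for a tiny stand-in network whose last module is also called `fc` (the name torchvision's ResNets use and reference in their forward pass). The layer sizes (12, 8, 16, 3) are arbitrary choices for illustration.

```python
from collections import OrderedDict

import torch
from torch import nn

# Stand-in for a pretrained network whose final module is named "fc".
model = nn.Sequential(OrderedDict([
    ("features", nn.Linear(12, 8)),
    ("relu", nn.ReLU()),
    ("fc", nn.Linear(8, 1000)),
]))

# Remove the last module and freeze everything that is left.
classifier_name, old_classifier = model._modules.popitem()
for param in model.parameters():
    param.requires_grad = False

# New head, created after freezing so its parameters stay trainable.
classifier = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(old_classifier.in_features, 16)),
    ("activation", nn.SELU()),
    ("fc2", nn.Linear(16, 3)),
    ("output", nn.LogSoftmax(dim=1)),
]))
model.add_module(classifier_name, classifier)

# Only the new head should be handed to the optimizer.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

out = model(torch.randn(4, 12))
print(trainable)
print(out.shape)
```

Because the head ends in `LogSoftmax`, it pairs with `nn.NLLLoss` during training rather than with `nn.CrossEntropyLoss`, which applies log-softmax itself.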

how to get data from within Keras model for visualisation?

I am using Tensorflow 1.12, which has Keras integrated, together with Python 3.6.x.
I wish to use Keras for its simplicity of model building, but would also like to use the data from intermediate layers to visualize feature maps and kernels, to better understand how machine learning works (even though this is admittedly not so evident).
I am using the MNIST database and a very basic Keras model to try to do what I want to do.
Here is the code
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow import keras
print(tf.VERSION)
print(tf.keras.__version__)
tf.keras.backend.clear_session()
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train_shaped = np.expand_dims(x_train, axis=3) / 255.0
x_test_shaped = np.expand_dims(x_test, axis=3) / 255.0
def create_model():
    model = tf.keras.models.Sequential([
        keras.layers.Conv2D(32, kernel_size=(4, 4), strides=(1, 1), activation='relu', input_shape=(28, 28, 1)),
        keras.layers.Dropout(0.5),
        keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
        keras.layers.Conv2D(24, kernel_size=(8, 8), strides=(1, 1)),
        keras.layers.Flatten(),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(128, activation=tf.nn.relu),
        keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.sparse_categorical_crossentropy,
                  metrics=['accuracy'])
    return model
The above sets up the dataset and the model.
Next I define my session for Tensorflow and do the training.
This all works fine, but now I want to get the output of, for example, the first layer, ideally as a numpy array on which I can do the visualization.
My model.layers[0].output gives me a Tensor of shape (?,25,25,32) as expected, and I then try to call eval() and thereafter .numpy() on it to get my result.
The error message is
You must feed a value for placeholder tensor 'conv2d_6_input' with dtype float and shape [?,28,28,1]
I am looking for help on how to get my data (32 feature maps of 25x25 pixels) out as numpy array for visualization.
sess = tf.Session(graph=tf.get_default_graph())
tf.keras.backend.set_session(sess)
with sess.as_default():
    model = create_model()
    model.summary()
    model.fit(x_train_shaped[:10000], y_train[:10000], epochs=2,
              batch_size=64, validation_split=.2,)
    model.layers[0].output
    print(model.layers[0].output.shape)
    my_array = model.layers[0].output
    my_array.eval()
tf.keras.backend.clear_session()
sess.close()
First of all, you must note that getting the output of a model or a layer only makes sense when you feed the input layers with some data. You give the model something (i.e. input data), and you get something in return (i.e. an output, feature map or activation map). That's why it produces the following error:
You must feed a value for placeholder tensor 'conv2d_6_input'
You haven't fed the baby, so it would cry :)
Now, the idea of building a new Keras model is counterproductive. When you have a large model in the first place, one would like to plug in some kind of ready-made code that can get the output of the feature maps and visualize them. So this route seems not really interesting.
I think you are mistakenly thinking that when you construct a new model out of the layers of another model, a whole new model is cloned. That's not the case since the parameters of the layers would be shared.
Concretely, what you are looking for can be achieved like this:
from tensorflow.keras.models import Model
viz_conv = Model(model.input, model.layers[0].output)
conv_active = viz_conv.predict(my_input_data) # my_input_data is a numpy array of shape `(num_samples,28,28,1)`
All the parameters of viz_conv are shared with model and they have not been copied either. Under the hood they are using the same weight Tensors.
Alternatively, you could define a backend function to do this:
from tensorflow.keras import backend as K
viz_func = K.function([model.input], [model.layers[0].output]) # list the output(s) of any layer(s) you would like
output = viz_func([my_input_data])[0] # viz_func returns one array per requested output
This is covered in the Keras documentation, and I highly recommend reading that as well.
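For reference, here is a self-contained sketch of the sub-model approach. It is written for TensorFlow 2.x with eager execution, which is an assumption and not the asker's TF 1.12 setup, and it uses random arrays in place of MNIST images. The layer name `conv1` is made up for the example.

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(28, 28, 1))
x = tf.keras.layers.Conv2D(32, kernel_size=(4, 4), activation="relu", name="conv1")(inputs)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# A second model that stops at the first convolution. It reuses the layer
# objects of `model`, so no weights are copied.
viz_conv = tf.keras.Model(model.input, model.get_layer("conv1").output)

images = np.random.rand(3, 28, 28, 1).astype("float32")  # stand-in for MNIST
feature_maps = viz_conv.predict(images, verbose=0)
print(feature_maps.shape)
```

Each `feature_maps[i, :, :, k]` is a plain numpy array (one 25x25 map per filter) that can be passed to a plotting function such as matplotlib's `imshow`.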
