I have a trained TF model which has the following architecture:
Inputs:
word_a, one-hot representation, vocab-size: 50000
word_b, one-hot representation, vocab-size: 50
Output:
probs, size: 1x10000
The network consists of an embedding lookup for word_a, producing a 1x100 dense vector (dense_word_a) from an embedding matrix. word_b is transformed by a character CNN into a 1x250 dense vector. The two vectors are concatenated into a 1x350 vector, which a decoder layer with a sigmoid maps to the 1x10000 output.
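Roughly, the combined model looks like the following tf.keras sketch (just for illustration; my real model is built with TF placeholders, and the character CNN is replaced by a stand-in dense layer here):
import tensorflow as tf

word_a = tf.keras.Input(shape=(50000,), name="word_a")  # one-hot
word_b = tf.keras.Input(shape=(50,), name="word_b")  # one-hot
# an embedding lookup on a one-hot input is equivalent to a bias-free dense layer
dense_word_a = tf.keras.layers.Dense(100, use_bias=False, name="word_a_embedding")(word_a)  # 1x100
dense_word_b = tf.keras.layers.Dense(250, name="char_cnn_stub")(word_b)  # stand-in for the 1x250 char-CNN output
concat = tf.keras.layers.Concatenate()([dense_word_a, dense_word_b])  # 1x350
probs = tf.keras.layers.Dense(10000, activation="sigmoid", name="decoder")(concat)  # 1x10000
model = tf.keras.Model([word_a, word_b], probs)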
I need to run this model on the client, and for this I'm converting it to TFLite.
However, I also need to break the model into two sub-models with the following inputs and outputs:
Model 1:
Inputs:
word_a: one-hot representation (1x50000), vocab-size: 50000
Output:
dense_word_a: dense word-embedding looked up from embedding matrix (1x100)
Network:
Simple embedding lookup for word_a from embedding matrix.
Model 2:
Inputs:
dense_word_a: embedding for word_a received from Model 1 (1x100)
word_b: one-hot representation (1x50), vocab-size: 50
Output:
probs, size: 1x10000
In Model 1, the input word_a is a placeholder and dense_word_a is a variable. In Model 2, dense_word_a is a placeholder and its value is concatenated with word_b's embedding.
The embeddings in my model are not pre-trained; they are trained as part of the model training process itself. So I need to train the model as a combined model, but during inference I want to break it up into Model 1 and Model 2 as described above.
The idea is to run Model 1 on the server side and pass the embedding values to the client so it can perform inference using word_b without having a 5MB embedding matrix on the client. So I'm not constrained on the size of Model 1, but since Model 2 runs on the client I need it to be small.
Here's what I've tried:
1. During model freezing, I freeze the full model but in the output-nodes list I also add the variable name dense_word_a along with probs. I then convert the model to TFLite. During inference, I'm able to see the dense_word_a output as a 1x100 vector, and this seems to work fine. I'm also getting probs as output.
2. For generating Model 2, I just remove the dense_word_a variable and convert it into a placeholder (using tf.placeholder), remove the placeholder for word_a, and freeze the graph again.
However, I'm not able to match the probs values: the probs vector generated by the full model doesn't match the one generated by Model 2.
How can I go about breaking the model into sub-models and also match the results?
I think what you described should work.
Is it easy to reproduce the problem that you're seeing? If you can isolate the reproducible steps and you believe there's a bug, could you file a bug on GitHub? Thanks!
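For reference, here is a minimal, untested sketch of one way to perform the split from a single frozen graph, assuming TF 1.x, the node names word_a, word_b, dense_word_a and probs from the question, and a placeholder file name frozen_full_model.pb. Instead of manually rewriting dense_word_a as a placeholder, it is passed to the converter as an input array so the graph is cut at that node:
import tensorflow as tf

# Model 1 (server side): one-hot word_a -> dense_word_a
converter1 = tf.lite.TFLiteConverter.from_frozen_graph(
    "frozen_full_model.pb",
    input_arrays=["word_a"],
    output_arrays=["dense_word_a"])
open("model1.tflite", "wb").write(converter1.convert())

# Model 2 (client side): dense_word_a + word_b -> probs
# dense_word_a is declared as an input here, so the embedding matrix stays out of this graph
converter2 = tf.lite.TFLiteConverter.from_frozen_graph(
    "frozen_full_model.pb",
    input_arrays=["dense_word_a", "word_b"],
    output_arrays=["probs"])
open("model2.tflite", "wb").write(converter2.convert())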
I am trying to get the embeddings from pre-trained wav2vec2 models (e.g., from jonatasgrosman/wav2vec2-large-xlsr-53-german) using my own dataset.
My aim is to use these features for a downstream task (not specifically speech recognition). Namely, since the dataset is relatively small, I would train an SVM with these embeddings for the final classification.
So far I have tried this:
model_name = "facebook/wav2vec2-large-xlsr-53-german"
feature_extractor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name)
input_values = feature_extractor(train_dataset[:10]["speech"], return_tensors="pt", padding=True,
feature_size=1, sampling_rate=16000 ).input_values
Then, I am not sure whether the embeddings here correspond to the sequence of last_hidden_states:
hidden_states = model(input_values).last_hidden_state
or to the sequence of features of the last conv layer of the model:
features_last_cnn_layer = model(input_values).extract_features
Also, is this the correct way to extract features from a pre-trained model?
How can one get embeddings from a specific layer?
P.S.: Posting here as the HuggingFace forum seems to be less active.
Just check the documentation:
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) – Sequence of hidden-states at the output of the last layer of the model.
extract_features (torch.FloatTensor of shape (batch_size, sequence_length, conv_dim[-1])) – Sequence of extracted feature vectors of the last convolutional layer of the model.
The last_hidden_state vector represents so-called contextualized embeddings (i.e. every feature (CNN output) has a vector representation that is, to some extent, influenced by the other tokens of the sequence).
The extract_features vector represents the embeddings of your input (after the CNNs).
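For example, reusing model and input_values from the question (the shapes assume the large XLSR-53 checkpoint, where hidden_size is 1024 and the last convolutional dimension is 512):
out = model(input_values)
print(out.last_hidden_state.shape)  # (batch, seq_len, 1024): contextualized transformer output
print(out.extract_features.shape)   # (batch, seq_len, 512): raw CNN features before the transformer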
Also, is this the correct way to extract features from a pre-trained model?
Yes.
How can one get embeddings from a specific layer?
Set output_hidden_states=True:
o = model(input_values, output_hidden_states=True)
o.keys()
Output:
odict_keys(['last_hidden_state', 'extract_features', 'hidden_states'])
The hidden_states value contains the embeddings and the contextualized embeddings of each attention layer.
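For example, to grab one specific layer (index 0 holds the pre-transformer features and indices 1..num_layers the transformer layer outputs; layer 12 below is just an illustration):
o = model(input_values, output_hidden_states=True)
all_layers = o.hidden_states  # tuple of (batch, seq_len, hidden_size) tensors
layer_12 = all_layers[12]     # embeddings after the 12th transformer layer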
P.S.: The jonatasgrosman/wav2vec2-large-xlsr-53-german model was trained with feat_extract_norm==layer. That means you should also pass an attention mask to the model:
model_name = "facebook/wav2vec2-large-xlsr-53-german"
feature_extractor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name)
i = feature_extractor(train_dataset[:10]["speech"], return_tensors="pt", padding=True,
                      feature_size=1, sampling_rate=16000)
model(**i)
I have the following models that I want to train (see image below):
The overall model has an input of size 20. Model A takes the first 10 elements of the input, model B takes the last 10 elements, and the input of model C is the concatenation of the outputs of models A and B.
How can I train these 3 models at the same time in Keras? Can I merge them into one big model? (I only have data to train the big model.)
Can I merge them into one big model?
Yes!
How can I train these 3 models at the same time in Keras?
I will give you pointers:
Use the functional API. Want to know how it differs from the Sequential API? See the Keras functional API guide.
Use a concatenate layer (see the Keras Concatenate layer reference).
Let's assume that you have your three models defined and named model_A, model_B and model_C. You can now define your complete model somewhat like this (I did not check the exact code):
import tensorflow as tf
from tensorflow.keras import layers, losses
from tensorflow.keras.models import Model

def complete_model(model_A, model_B, model_C):
    input_1 = layers.Input(shape=(10,))
    input_2 = layers.Input(shape=(10,))
    model_A_output = model_A(input_1)
    model_B_output = model_B(input_2)
    concatenated = tf.concat([model_A_output, model_B_output], axis=-1)
    model_C_output = model_C(concatenated)
    model = Model(inputs=[input_1, input_2], outputs=model_C_output)
    model.compile(loss=losses.MSE)
    model.summary()
    return model
This requires you to feed two separate inputs, so you have to do some NumPy slicing to split your 20-element input into two 10-element arrays.
If you still want a single 20-element input, you can just define a single input layer with shape (20,) and then use the tf.split function to split it in half and feed the two halves into the sub-networks, as sketched below.
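A rough, untested sketch of that single-input variant, reusing the imports and the model_A, model_B and model_C definitions from above:
single_input = layers.Input(shape=(20,))
first_half, second_half = tf.split(single_input, num_or_size_splits=2, axis=-1)
merged = tf.concat([model_A(first_half), model_B(second_half)], axis=-1)
full_model = Model(inputs=single_input, outputs=model_C(merged))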
I am trying to predict a single image. But my model returns a prediction array with the shape (1,1,1,2048) when it should be (1,10). Any idea what I am doing wrong? My x input shape is correct at (1,32,32,3).
import tensorflow as tf

def ResNet50V2():
    IMG_SHAPE = (32, 32, 3)
    return tf.keras.applications.ResNet50V2(input_shape=IMG_SHAPE, include_top=False, weights=None, classes=10)

model = ResNet50V2()
x = x[None, :]
predictions = model.predict(x)
You are loading your Keras model with the parameter
include_top=False
which cuts off the fully-connected projection layer that is responsible for projecting the model output to your expected number of classes. Change the parameter to True.
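For example (weights=None is kept here, because pre-trained ImageNet weights expect the default 224x224 input when include_top=True):
model = tf.keras.applications.ResNet50V2(input_shape=(32, 32, 3), include_top=True, weights=None, classes=10)
predictions = model.predict(x)  # now shaped (1, 10)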
That's because you are disabling the top with include_top=False, which removes the final classification layer. You need to either add your own head with 10 classes, or remove the include_top parameter (it defaults to True) and retrain the network with the desired inputs.
An image classification network usually works in 2 steps of processing.
The first one is feature extraction; we call that the "base", and it consists of a stack of layers that find and reinforce patterns in the image (Conv2D, ReLU and MaxPool).
The second one is the "head", and it is used to classify the features extracted in the previous step.
Your code is getting the raw output of the "base", with no classification, and as the other answers stated, the solution is adding a custom "head" or changing the include_top parameter to True.
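A minimal sketch of the custom "head" option (GlobalAveragePooling2D followed by a 10-class Dense layer is one common choice, not the only one):
import tensorflow as tf

base = tf.keras.applications.ResNet50V2(input_shape=(32, 32, 3), include_top=False, weights=None)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),         # (1, 1, 1, 2048) -> (1, 2048)
    tf.keras.layers.Dense(10, activation="softmax"),  # (1, 2048) -> (1, 10)
])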
I'm trying to create document context vectors from sentence vectors via an LSTM using Keras (so each document consists of a sequence of sentence vectors).
My goal is to replicate the following blog post using keras: https://andriymulyar.com/blog/bert-document-classification
I have a (toy) tensor that looks like this: X = np.array(features).reshape(5, 200, 768). So 5 documents, each consisting of a sequence of 200 sentence vectors, with each sentence vector having 768 features.
So to get an embedding from my sentence vectors, I encoded my documents as one-hot vectors to train an LSTM:
import numpy as np
from tensorflow.keras.utils import to_categorical

y = [1, 2, 3, 4, 5]  # 5 documents in the toy tensor
y = np.array(y)
yy = to_categorical(y)
yy = yy[0:5, 1:6]
So far, my code looks like this:
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

inputs1 = Input(shape=(200, 768))
lstm1, states_h, states_c = LSTM(5, dropout=0.3, recurrent_dropout=0.2, return_state=True)(inputs1)
model1 = Model(inputs1, lstm1)
model1.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model1.summary()
model1.fit(x=X, y=yy, batch_size=100, epochs=10, verbose=1, shuffle=True, validation_split=0.2)
When I print states_h I get a tensor of shape=(?, 5), and I don't really know how to access the vectors inside the tensor, which should represent my documents.
print(states_h)
Tensor("lstm_51/while/Exit_3:0", shape=(?, 5), dtype=float32)
Or am I doing something wrong? To my understanding, there should be 5 document vectors, e.g. doc1=[...], ..., doc5=[...], so that I can reuse the document vectors for a classification task.
Well, printing a tensor shows exactly this: it's a tensor, it has that shape and that type.
If you want to see data, you need to feed data.
States are not weights, they are not persistent, they only exist with input data, just as any other model output.
You should create a model that outputs this information (yours doesn't) in order to grab it. You can have two models:
#this is the model you compile and train - exactly as you are already doing
training_model = Model(inputs1,lstm1)
#this is just for getting the states, nothing else, don't compile, don't train
state_getting_model = Model(inputs1, [lstm1, states_h, states_c])
(Don't worry, these two models will share the same weights and be updated together, even if you only train the training_model)
Now you can:
With eager mode off (and probably "on" too):
lstm_out, states_h_out, states_c_out = state_getting_model.predict(X)
print(states_h_out)
print(states_c_out)
With eager mode on:
lstm_out, states_h_out, states_c_out = state_getting_model(X)
print(states_h_out.numpy())
print(states_c_out.numpy())
TF 1.x with tf.keras (Tested with TF 1.15)
Keras does operations using symbolic tensors. Therefore, print(states_h) won't give you anything unless you pass data to the placeholders states_h depends on (in this case inputs1). You can do that as follows.
import tensorflow.keras.backend as K
inputs1 = Input(shape=(200, 768))
lstm1, states_h, states_c = LSTM(5, dropout=0.3, recurrent_dropout=0.2, return_state=True)(inputs1)
model1 = Model(inputs1, lstm1)
model1.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model1.summary()
model1.fit(x=X, y=yy, batch_size=100, epochs=10, verbose=1, shuffle=True, validation_split=0.2)
sess = K.get_session()
out = sess.run(states_h, feed_dict={inputs1:X})
Then out will be a (batch_size, 5) sized output.
TF 2.x with tf.keras
The above code won't work as it is, and I still haven't found how to get this to work with TF 2.x (even though TF 2.0 will still produce a placeholder according to the docs). I will edit my answer when I find out how to fix this for TF 2.x.
I am using Keras to build my architecture.
The regression problem I am trying to solve has outputs of different lengths for different training samples.
For instance, for the 1st training sample the output is [16, 3], while for the 2nd it is [6]. I am unable to find a solution for how to choose the number of units of the output dense layer based on this type of output. You can interpret the output as y_train having shape [number of samples, columns], where the number of columns depends on how many outputs a specific training sample has.
I have tried to fetch each individual entry from y_train so that I could use its length as the number of units of the output dense layer. For example, the 1st training sample has [16, 3] as output (the entry in y_train for that sample), so I plan to set the output dense layer's units to 2, and so on. But I don't even know how to fetch this and assign it to the output dense layer's units.
Can anybody help me with this variable-length output problem?