Parameter transfer in TensorFlow from one model to another - python

I have a neural net model with multiple sub-models. Model2 includes Model1 in its topology. I want to train Model1 first, separately from Model2, and then pass the trained parameters of Model1 to Model2. Does an approach like the one below pass the parameters automatically, or do I need to explicitly get the weights and biases from Model1 and pass them to Model2 during initialization?
Model1:
main_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='main_input')
x = embedding_layer(main_input)
x = CuDNNLSTM(KG_EMBEDDING_DIM, return_sequences=True)(x)
x = Avg(x)
x = Dense(KG_EMBEDDING_DIM)(x)
x = Activation('relu')(x)
# entity_extraction = Reshape([KG_EMBEDDING_DIM])(x)
entity_extraction = Transpose(x)
final_output = Dense(units=len(unique_labels), activation='softmax')(Transpose(entity_extraction))
optimizer = Adam(lr=LEARNING_RATE, clipvalue=0.25)
m1 = Model(inputs=main_input, outputs=final_output)
m1.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
m1.summary()
Model2:
x = embedding_layer(main_input)
x = CuDNNLSTM(LSTM_HIDDEN_SIZE, return_sequences=True)(x)
Avg = keras.layers.core.Lambda(lambda x: K.mean(x, axis=1), output_shape=(LSTM_HIDDEN_SIZE,))
x = Avg(x)
x = Dense(LSTM_HIDDEN_SIZE)(x)
main_lstm_out = Activation('relu')(x)
lstm_hidden_and_entity = Concatenate(axis=0)([Transpose(main_lstm_out), entity_extraction])
print("lstm_hidden_and_entity", K.int_shape(lstm_hidden_and_entity))
# input("continue?")
final_output = Dense(units=len(unique_labels), activation='softmax')(Transpose(lstm_hidden_and_entity))
optimizer = Adam(lr=LEARNING_RATE, clipvalue=0.25)
m2 = Model(inputs=main_input, outputs=final_output)
m2.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
Notice that Model2 uses entity_extraction in its structure, which is a layer output from Model1 that is trained before Model2 is built. So with this kind of approach, are the trained parameters transferred?
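For reference, a minimal sketch of the explicit alternative, assuming you rebuild the shared layer inside Model2 (with a fixed name) rather than reusing the tensor; the layer name 'entity_dense' is a placeholder and is not defined in the code above:
# Hypothetical explicit transfer: give the shared Dense layer a name when
# building each model, e.g. x = Dense(KG_EMBEDDING_DIM, name='entity_dense')(x),
# then copy the trained weights across by name after training m1.
m2.get_layer('entity_dense').set_weights(m1.get_layer('entity_dense').get_weights())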

Related

TensorFlow Deep Learning Model Training Error --- loss: nan - accuracy: 0.0000e+00

I'm working on a sports analytics project to predict the outcome of MLB matchups with deep learning. When training the neural net with TensorFlow, I am getting 'NaN' for the loss and consistent 0's for the accuracy. Here is the full code for my model:
import tensorflow as tf
import pandas as pd
import numpy as np
matchups = pd.read_csv("data\\matchups\\model_matchups.csv", index_col=0)
y = matchups['outcome']
x_train = matchups.drop(columns=['outcome','game_code','batter_game_code','pitcher_game_code','batter_id','pitcher_id','b_pos'])
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(y)
labels_enc = le.transform(y)
labels = tf.keras.utils.to_categorical(labels_enc)
ss = preprocessing.StandardScaler()
x_standardized = ss.fit_transform(x_train)
p = .1
inputs = tf.keras.layers.Input((74,), name='numeric_inputs')
x = tf.keras.layers.Dropout(p)(inputs)
x = tf.keras.layers.Dense(500, activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(p)(x)
x = tf.keras.layers.Dense(250, activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(p)(x)
x = tf.keras.layers.Dense(100, activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(p)(x)
x = tf.keras.layers.Dense(50, activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(p)(x)
out = tf.keras.layers.Dense(55, activation='softmax', name='output')(x)
model = tf.keras.models.Model(inputs=inputs, outputs=out)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
def bootstrap_sample_generator(batch_size):
    while True:
        batch_idx = np.random.choice(x_standardized.shape[0], batch_size)
        yield ({'numeric_inputs': x_standardized[batch_idx]},
               {'output': labels[batch_idx]})
batch_size = 128
model.fit(
    bootstrap_sample_generator(batch_size),
    steps_per_epoch=10_000 // batch_size,
    epochs=5,
    max_queue_size=10
)
After researching this for a bit, it seems this issue is commonly caused by fitting the model incorrectly for categorical data. I've attempted to resolve it by transforming the labels into one-hot vectors, but I am still getting the training error.
How can I resolve this issue?
For anyone wondering, I forgot to remove null values from the training data.
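A minimal sketch of that fix, assuming the nulls live in the matchups DataFrame loaded above (the exact columns affected are not shown): a single NaN in the features or labels propagates through StandardScaler and the network and turns the loss into NaN.
# Hypothetical cleanup step before the encoding/scaling above.
print(matchups.isna().sum())        # inspect which columns contain nulls
matchups = matchups.dropna()        # then rebuild y, x_train, x_standardized as above
assert not np.isnan(x_standardized).any()   # final sanity check before training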

Performing Differentiation wrt input within a keras model for use in loss

Is there any layer in Keras which calculates the derivative with respect to the input? For example, if x is the input and the first layer is f(x), then the next layer's output should be f'(x). There are multiple questions here on this topic, but all of them involve computing the derivative outside the model. In essence, I want to create a neural network whose loss function involves both the Jacobian and the Hessian with respect to the inputs.
I've tried the following
import numpy as np
import keras
from keras.layers import Dense
import keras.backend as K
def create_model():
    x = keras.Input(shape = (10,))
    layer = Dense(1, activation = "sigmoid")
    output = layer(x)
    jac = K.gradients(output, x)
    model = keras.Model(inputs=x, outputs=jac)
    return model
model = create_model()
X = np.random.uniform(size = (3, 10))
This gives the error tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
So I tried using that
def create_model2():
    with tf.GradientTape() as tape:
        x = keras.Input(shape = (10,))
        layer = Dense(1, activation = "sigmoid")
        output = layer(x)
        jac = tape.gradient(output, x)
    model = keras.Model(inputs=x, outputs=jac)
    return model
model = create_model2()
X = np.random.uniform(size = (3, 10))
but this tells me 'KerasTensor' object has no attribute '_id'.
Both these methods work fine outside the model. My end goal is to use the Jacobian and Hessian in the loss function, so alternative approaches would also be appreciated.
Not sure what exactly you want to do, but maybe try a custom Keras layer with tf.gradients:
import tensorflow as tf
tf.random.set_seed(111)
class GradientLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(GradientLayer, self).__init__()
        self.dense = tf.keras.layers.Dense(1, activation = "sigmoid")
    @tf.function
    def call(self, inputs):
        outputs = self.dense(inputs)
        return tf.gradients(outputs, inputs)
def create_model2():
    gradient_layer = GradientLayer()
    inputs = tf.keras.layers.Input(shape = (10,))
    outputs = gradient_layer(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model
model = create_model2()
X = tf.random.uniform((3, 10))
print(model(X))
tf.Tensor(
[[-0.07935508 -0.12471244 -0.0702782 -0.06729251 0.14465885 -0.0818079
-0.08996294 0.07622238 0.11422144 -0.08126545]
[-0.08666676 -0.13620329 -0.07675356 -0.07349276 0.15798753 -0.08934557
-0.09825202 0.08324542 0.12474566 -0.08875315]
[-0.08661086 -0.13611545 -0.07670406 -0.07344536 0.15788564 -0.08928795
-0.09818865 0.08319173 0.12466521 -0.08869591]], shape=(3, 10), dtype=float32)
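Since the stated end goal is to use the Jacobian (and Hessian) inside the loss, here is an alternative minimal sketch, offered as an assumption about what you might want rather than as part of the answer above: compute the gradient of the prediction with respect to the input using tf.GradientTape inside a custom loss, and add it as a penalty term. The names base_model and the 0.01 weight are hypothetical.
import tensorflow as tf
# Hypothetical standalone sketch: a loss mixing a data term with a penalty on
# the Jacobian of the model output w.r.t. the inputs. A second GradientTape
# could be nested the same way to get Hessian terms.
base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation="sigmoid", input_shape=(10,))
])
def loss_with_jacobian_penalty(x, y_true):
    with tf.GradientTape() as tape:
        tape.watch(x)                      # watch the inputs, not just the variables
        y_pred = base_model(x)
    jac = tape.gradient(y_pred, x)         # shape (batch, 10)
    data_loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    return data_loss + 0.01 * tf.reduce_mean(tf.square(jac))
x = tf.random.uniform((3, 10))
y_true = tf.cast(tf.random.uniform((3, 1)) > 0.5, tf.float32)
print(loss_with_jacobian_penalty(x, y_true))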

Method to transfer weights between nested keras models

I'm trying to successively build up mixture models, iteratively adding sub-models.
I start by building and training a simple model. I then build a slightly more complex model that contains all of the original model but has more layers. I want to move the trained weights from the first model into the new model. How can I do this? The first model is nested in the second model.
Here's a dummy MWE:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import (concatenate, Conv1D, Dense, LSTM)
from tensorflow.keras import Model, Input, backend
# data
x = np.random.normal(size = 100)
y = np.sin(x) + np.random.normal(size = 100)
# model 1
def make_model_1():
    inp = Input(1)
    l1 = Dense(5, activation = 'relu')(inp)
    out1 = Dense(1)(l1)
    model1 = Model(inp, out1)
    return model1
model1 = make_model_1()
model1.compile(optimizer = tf.keras.optimizers.SGD(),
               loss = tf.keras.losses.mean_squared_error)
model1.fit(x, y, epochs = 3, batch_size = 10)
# make model 2
def make_model_2():
    inp = Input(1)
    l1 = Dense(5, activation = 'relu')(inp)
    out1 = Dense(1)(l1)
    l2 = Dense(15, activation = 'sigmoid')(inp)
    out2 = Dense(1)(l2)
    bucket = tf.stack([out1, out2], axis=2)
    out = backend.squeeze(Dense(1)(bucket), axis = 2)
    model2 = Model(inp, out)
    return model2
model2 = make_model_2()
How can I transfer the weights from model1 to model2, in a way that's automatic and completely agnostic about the nature of the two models, except that they are nested?
You can simply load the trained weights into the specific part of the new model you are interested in. I do this by creating a new instance of model1 inside model2 and then loading the trained weights into it.
Here is the full example:
# data
x = np.random.normal(size = 100)
y = np.sin(x) + np.random.normal(size = 100)
# model 1
def make_model_1():
    inp = Input(1)
    l1 = Dense(5, activation = 'relu')(inp)
    out1 = Dense(1)(l1)
    model1 = Model(inp, out1)
    return model1
model1 = make_model_1()
model1.compile(optimizer = tf.keras.optimizers.SGD(),
               loss = tf.keras.losses.mean_squared_error)
model1.fit(x, y, epochs = 3, batch_size = 10)
# make model 2
def make_model_2(trained_model):
    inp = Input(1)
    m = make_model_1()
    m.set_weights(trained_model.get_weights())
    out1 = m(inp)
    l2 = Dense(15, activation = 'sigmoid')(inp)
    out2 = Dense(1)(l2)
    bucket = tf.stack([out1, out2], axis=2)
    out = tf.keras.backend.squeeze(Dense(1)(bucket), axis = 2)
    model2 = Model(inp, out)
    return model2
model2 = make_model_2(model1)
model2.summary()
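A quick sanity check, as an optional follow-up that is not part of the answer above: confirm that the nested copy of model1 inside model2 really carries the trained weights.
# Hypothetical verification: the nested model1 instance appears as a layer of
# model2, so its weights can be compared directly to the trained model1.
nested = [l for l in model2.layers if isinstance(l, Model)][0]
for w_old, w_new in zip(model1.get_weights(), nested.get_weights()):
    assert np.allclose(w_old, w_new)
If the transferred sub-model should keep its trained weights while model2 trains, setting m.trainable = False inside make_model_2 before calling it on inp would freeze it.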

load_model and Lambda layer in Keras

How do I load a model that has a Lambda layer?
Here is the code to reproduce behaviour:
MEAN_LANDMARKS = np.load('data/mean_shape_68.npy')
def add_mean_landmarks(x):
    mean_landmarks = np.array(MEAN_LANDMARKS, np.float32)
    mean_landmarks = mean_landmarks.flatten()
    mean_landmarks_tf = tf.convert_to_tensor(mean_landmarks)
    x = x + mean_landmarks_tf
    return x
def get_model():
    inputs = Input(shape=(8, 128, 128, 3))
    cnn = VGG16(include_top=False, weights='imagenet', input_shape=(128, 128, 3))
    x = TimeDistributed(cnn)(inputs)
    x = TimeDistributed(Flatten())(x)
    x = LSTM(256)(x)
    x = Dense(68 * 2, activation='linear')(x)
    x = Lambda(add_mean_landmarks)(x)
    model = Model(inputs=inputs, outputs=x)
    optimizer = Adadelta()
    model.compile(optimizer=optimizer, loss='mae')
    return model
The model compiles and I can save it, but when I try to load it with the load_model function I get an error:
in add_mean_landmarks
mean_landmarks = np.array(MEAN_LANDMARKS, np.float32)
NameError: name 'MEAN_LANDMARKS' is not defined
As I understand it, MEAN_LANDMARKS is not incorporated into the graph as a constant tensor. This is also related to this question: How to add constant tensor in Keras?
You need to pass the custom_objects argument to the load_model function:
model = load_model('model_file_name.h5', custom_objects={'MEAN_LANDMARKS': MEAN_LANDMARKS})
Look for more info in the Keras docs: Handling custom layers (or other custom objects) in saved models.
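A minimal loading-side sketch, under the assumption that the loading script has access to the same landmarks file and redefines the Lambda's function (the file name 'model_file_name.h5' is a placeholder):
import numpy as np
import tensorflow as tf
from keras.models import load_model
# The loading script must define the same globals the Lambda relies on.
MEAN_LANDMARKS = np.load('data/mean_shape_68.npy')
def add_mean_landmarks(x):   # same function as at training time
    mean_landmarks_tf = tf.convert_to_tensor(np.array(MEAN_LANDMARKS, np.float32).flatten())
    return x + mean_landmarks_tf
model = load_model('model_file_name.h5', custom_objects={'MEAN_LANDMARKS': MEAN_LANDMARKS})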

Reduce the size of Keras LSTM model

Essentially, I am training an LSTM model using Keras, but when I save it, its size is about 100 MB. However, my purpose for the model is to deploy it to a web server to serve as an API, and my web server is not able to run it since the model size is too big. After analyzing all the parameters in my model, I figured out that it has 20,000,000 parameters, but 15,000,000 of them are untrained since they are word embeddings. Is there any way to minimize the size of the model by removing those 15,000,000 parameters while still preserving the performance of the model?
Here is my code for the model:
def LSTModel(input_shape, word_to_vec_map, word_to_index):
    sentence_indices = Input(input_shape, dtype="int32")
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)
    X = LSTM(256, return_sequences=True)(embeddings)
    X = Dropout(0.5)(X)
    X = LSTM(256, return_sequences=False)(X)
    X = Dropout(0.5)(X)
    X = Dense(NUM_OF_LABELS)(X)
    X = Activation("softmax")(X)
    model = Model(inputs=sentence_indices, outputs=X)
    return model
Define the layers you want to save outside the function and name them. Then create two functions, foo() and bar(). foo() will have the original pipeline, including the embedding layer. bar() will have only the part of the pipeline after the embedding layer; instead, you will define a new Input() layer in bar() with the dimensions of your embeddings:
lstm1 = LSTM(256, return_sequences=True, name='lstm1')
lstm2 = LSTM(256, return_sequences=False, name='lstm2')
dense = Dense(NUM_OF_LABELS, name='Susie Dense')
def foo(...):
    sentence_indices = Input(input_shape, dtype="int32")
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)
    X = lstm1(embeddings)
    X = Dropout(0.5)(X)
    X = lstm2(X)
    X = Dropout(0.5)(X)
    X = dense(X)
    X = Activation("softmax")(X)
    return Model(inputs=sentence_indices, outputs=X)
def bar(...):
    embeddings = Input(embedding_shape, dtype="float32")
    X = lstm1(embeddings)
    X = Dropout(0.5)(X)
    X = lstm2(X)
    X = Dropout(0.5)(X)
    X = dense(X)
    X = Activation("softmax")(X)
    return Model(inputs=embeddings, outputs=X)
foo_model = foo(...)
bar_model = bar(...)
foo_model.fit(...)
bar_model.save_weights(...)
Now, you will train the original foo() model. Then you can save the weights of the reduced bar() model; since it shares its layers with foo(), those weights are already trained. When loading the model, don't forget to specify the by_name=True parameter:
foo_model.load_weights('bar_model.h5', by_name=True)
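As a follow-up sketch, under the assumption that the server receives embedding vectors computed outside the model (the names 'bar_model.h5' and precomputed_embeddings are placeholders, and the "..." follows the same elided-argument convention as the answer above):
# Hypothetical serving-side usage: rebuild the reduced model, load the small
# weight file, and feed it precomputed embeddings.
serving_model = bar(...)   # same arguments used when building bar_model
serving_model.load_weights('bar_model.h5', by_name=True)
preds = serving_model.predict(precomputed_embeddings)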
