I am training an LSTM model with Keras, but when I save it the file is about 100 MB. I want to deploy the model to a web server and serve it as an API, and the server cannot run it because the model is too big. After analyzing the parameters, I found that the model has 20,000,000 parameters, of which 15,000,000 are untrained because they are the word embeddings. Is there any way to shrink the saved model by removing those 15,000,000 parameters while still preserving the performance of the model?
Here is my code for the model:
def LSTModel(input_shape, word_to_vec_map, word_to_index):
    sentence_indices = Input(input_shape, dtype="int32")
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)

    X = LSTM(256, return_sequences=True)(embeddings)
    X = Dropout(0.5)(X)
    X = LSTM(256, return_sequences=False)(X)
    X = Dropout(0.5)(X)
    X = Dense(NUM_OF_LABELS)(X)
    X = Activation("softmax")(X)

    model = Model(inputs=sentence_indices, outputs=X)
    return model
Define the layers you want to keep outside the functions and give them names. Then create two functions, foo() and bar(): foo() builds the original pipeline including the embedding layer, while bar() builds only the part of the pipeline after the embedding layer. Instead of the embedding, bar() defines a new Input() layer whose shape matches the dimensions of your embeddings:
lstm1 = LSTM(256, return_sequences=True, name='lstm1')
lstm2 = LSTM(256, return_sequences=False, name='lstm2')
dense = Dense(NUM_OF_LABELS, name='dense')  # avoid spaces in layer names
def foo(...):
    sentence_indices = Input(input_shape, dtype="int32")
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)

    X = lstm1(embeddings)
    X = Dropout(0.5)(X)
    X = lstm2(X)
    X = Dropout(0.5)(X)
    X = dense(X)
    X = Activation("softmax")(X)

    return Model(inputs=sentence_indices, outputs=X)
def bar(...):
    embeddings = Input(embedding_shape, dtype="float32")

    X = lstm1(embeddings)
    X = Dropout(0.5)(X)
    X = lstm2(X)
    X = Dropout(0.5)(X)
    X = dense(X)
    X = Activation("softmax")(X)

    return Model(inputs=embeddings, outputs=X)
foo_model = foo(...)
bar_model = bar(...)
foo_model.fit(...)
bar_model.save_weights(...)
Now train the original foo() model. Because the LSTM and Dense layers are shared objects, the reduced bar() model already holds the trained weights, so you can save them from bar_model and deploy that much smaller file. If you ever need to load the reduced weights back into the full model, pass by_name=True so layers are matched by their names and the embedding layer is simply left untouched:
foo_model.load_weights('bar_model.h5', by_name=True)
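On the server you would then rebuild the reduced model and load the saved weights. A minimal sketch of the serving side, assuming precomputed_embeddings is a float array of shape (batch, max_len, embedding_dim) produced by looking up the word vectors for each tokenized sentence outside the model:
serving_model = bar(...)                                  # same bar() as above
serving_model.load_weights('bar_model.h5', by_name=True)

predictions = serving_model.predict(precomputed_embeddings)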
I have a neural-net model with multiple sub-models. Model2 includes Model1 in its topology. I want to train Model1 first, separately from Model2, and then pass the trained parameters of Model1 to Model2. Does an approach like the one below pass the parameters automatically, or do I need to explicitly get the weights and biases from Model1 and pass them to Model2 during initialization?
Model1:
main_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='main_input')
x = embedding_layer(main_input)
x = CuDNNLSTM(KG_EMBEDDING_DIM, return_sequences=True)(x)
x = Avg(x)
x = Dense(KG_EMBEDDING_DIM)(x)
x = Activation('relu')(x)
# entity_extraction = Reshape([KG_EMBEDDING_DIM])(x)
entity_extraction = Transpose(x)
final_output = Dense(units=len(unique_labels), activation='softmax')(Transpose(entity_extraction))
optimizer = Adam(lr=LEARNING_RATE, clipvalue=0.25)
m1 = Model(inputs=main_input, outputs=final_output)
m1.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
m1.summary()
Model2:
x = embedding_layer(main_input)
x = CuDNNLSTM(LSTM_HIDDEN_SIZE, return_sequences=True)(x)
Avg = keras.layers.core.Lambda(lambda x: K.mean(x, axis=1), output_shape=(LSTM_HIDDEN_SIZE,))
x = Avg(x)
x = Dense(LSTM_HIDDEN_SIZE)(x)
main_lstm_out = Activation('relu')(x)
lstm_hidden_and_entity = Concatenate(axis=0)([Transpose(main_lstm_out), entity_extraction])
print("lstm_hidden_and_entity", K.int_shape(lstm_hidden_and_entity))
# input("continue?")
final_output = Dense(units=len(unique_labels), activation='softmax')(Transpose(lstm_hidden_and_entity))
optimizer = Adam(lr=LEARNING_RATE, clipvalue=0.25)
m2 = Model(inputs=main_input, outputs=final_output)
m2.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
Notice that Model2 uses entity_extraction in its structure, which is a layer output from Model1 trained before Model2 is initialized. With this kind of approach, does it transfer the parameters?
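A quick way to check whether the parameters are actually shared (a sketch, not from the original post; X_train and y_train are hypothetical training data): any layer object that appears in both models is literally the same object, so its weight variables are shared and training m1 updates what m2 sees.
# Layers reused across the two models share their weight variables.
shared_layers = [layer for layer in m1.layers if layer in m2.layers]
print([layer.name for layer in shared_layers])

# Spot-check: a shared layer's weights move when m1 (alone) is trained.
probe = next(layer for layer in reversed(shared_layers) if layer.get_weights())
w_before = [w.copy() for w in probe.get_weights()]
m1.fit(X_train, y_train, epochs=1)        # hypothetical data
changed = any((a != b).any() for a, b in zip(w_before, probe.get_weights()))
print(probe.name, "updated:", changed)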
Is there any layer in Keras that calculates the derivative with respect to the input? For example, if x is the input and the first layer is f(x), the next layer's output should be f'(x). There are multiple questions here about this topic, but all of them compute the derivative outside the model. In essence, I want to create a neural network whose loss function involves both the Jacobian and the Hessian with respect to the inputs.
I've tried the following:
import numpy as np
import keras
from keras.layers import Dense
import keras.backend as K

def create_model():
    x = keras.Input(shape=(10,))
    layer = Dense(1, activation="sigmoid")
    output = layer(x)
    jac = K.gradients(output, x)

    model = keras.Model(inputs=x, outputs=jac)
    return model
model = create_model()
X = np.random.uniform(size = (3, 10))
This gives the error: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
So I tried using that:
import tensorflow as tf

def create_model2():
    with tf.GradientTape() as tape:
        x = keras.Input(shape=(10,))
        layer = Dense(1, activation="sigmoid")
        output = layer(x)
    jac = tape.gradient(output, x)

    model = keras.Model(inputs=x, outputs=jac)
    return model
model = create_model2()
X = np.random.uniform(size = (3, 10))
but this tells me 'KerasTensor' object has no attribute '_id'.
Both of these methods work fine outside the model. My end goal is to use the Jacobian and Hessian in the loss function, so alternative approaches would also be appreciated.
I'm not sure exactly what you want to do, but maybe try a custom Keras layer with tf.gradients:
import tensorflow as tf
tf.random.set_seed(111)
class GradientLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(GradientLayer, self).__init__()
        self.dense = tf.keras.layers.Dense(1, activation="sigmoid")

    @tf.function
    def call(self, inputs):
        outputs = self.dense(inputs)
        return tf.gradients(outputs, inputs)

def create_model2():
    gradient_layer = GradientLayer()
    inputs = tf.keras.layers.Input(shape=(10,))
    outputs = gradient_layer(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model
model = create_model2()
X = tf.random.uniform((3, 10))
print(model(X))

Output:

tf.Tensor(
[[-0.07935508 -0.12471244 -0.0702782 -0.06729251 0.14465885 -0.0818079
-0.08996294 0.07622238 0.11422144 -0.08126545]
[-0.08666676 -0.13620329 -0.07675356 -0.07349276 0.15798753 -0.08934557
-0.09825202 0.08324542 0.12474566 -0.08875315]
[-0.08661086 -0.13611545 -0.07670406 -0.07344536 0.15788564 -0.08928795
-0.09818865 0.08319173 0.12466521 -0.08869591]], shape=(3, 10), dtype=float32)
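If the end goal is to use the input Jacobian inside the loss rather than as a model output, one alternative (a sketch under TF 2.x assumptions, not part of the answer above) is a custom train_step with nested GradientTapes: the inner tape differentiates the prediction with respect to the inputs, and the outer tape differentiates the combined loss with respect to the weights.
import tensorflow as tf

class JacobianPenaltyModel(tf.keras.Model):
    """Toy model whose loss adds a penalty on d(prediction)/d(input)."""

    def __init__(self, penalty=0.1):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1, activation="sigmoid")
        self.penalty = penalty

    def call(self, inputs):
        return self.dense(inputs)

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as outer_tape:
            with tf.GradientTape() as inner_tape:
                inner_tape.watch(x)                  # x is a tensor, not a Variable
                y_pred = self(x, training=True)
            jac = inner_tape.gradient(y_pred, x)     # shape (batch, n_features)
            loss = self.compiled_loss(y, y_pred)
            loss += self.penalty * tf.reduce_mean(tf.square(jac))
        grads = outer_tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}

model = JacobianPenaltyModel()
model.compile(optimizer="adam", loss="mse")
X = tf.random.uniform((3, 10))
Y = tf.random.uniform((3, 1))
model.fit(X, Y, epochs=1, verbose=0)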
I'm trying to successively build up mixture models, iteratively adding sub-models.
I start by building and training a simple model. I then build a slightly more complex model that contains all of the original model but has more layers. I want to move the trained weights from the first model into the new model. How can I do this? The first model is nested in the second model.
Here's a dummy MWE:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import (concatenate, Conv1D, Dense, LSTM)
from tensorflow.keras import Model, Input, backend
# data
x = np.random.normal(size = 100)
y = np.sin(x)+np.random.normal(size = 100)
# model 1
def make_model_1():
    inp = Input(1)
    l1 = Dense(5, activation='relu')(inp)
    out1 = Dense(1)(l1)
    model1 = Model(inp, out1)
    return model1

model1 = make_model_1()
model1.compile(optimizer=tf.keras.optimizers.SGD(),
               loss=tf.keras.losses.mean_squared_error)
model1.fit(x, y, epochs=3, batch_size=10)
# make model 2
def make_model_2():
    inp = Input(1)
    l1 = Dense(5, activation='relu')(inp)
    out1 = Dense(1)(l1)
    l2 = Dense(15, activation='sigmoid')(inp)
    out2 = Dense(1)(l2)
    bucket = tf.stack([out1, out2], axis=2)
    out = backend.squeeze(Dense(1)(bucket), axis=2)
    model2 = Model(inp, out)
    return model2
model2 = make_model_2()
How can I transfer the weights from model1 to model2, in a way that's automatic and completely agnostic about the nature of the two models, except that they are nested?
You can simply load the trained weights into the specific part of the new model you are interested in. I do this by creating a new instance of model1 inside model2 and then loading the trained weights into it.
Here is the full example:
# data
x = np.random.normal(size = 100)
y = np.sin(x)+np.random.normal(size = 100)
# model 1
def make_model_1():
    inp = Input(1)
    l1 = Dense(5, activation='relu')(inp)
    out1 = Dense(1)(l1)
    model1 = Model(inp, out1)
    return model1

model1 = make_model_1()
model1.compile(optimizer=tf.keras.optimizers.SGD(),
               loss=tf.keras.losses.mean_squared_error)
model1.fit(x, y, epochs=3, batch_size=10)
# make model 2
def make_model_2(trained_model):
    inp = Input(1)
    m = make_model_1()
    m.set_weights(trained_model.get_weights())
    out1 = m(inp)
    l2 = Dense(15, activation='sigmoid')(inp)
    out2 = Dense(1)(l2)
    bucket = tf.stack([out1, out2], axis=2)
    out = tf.keras.backend.squeeze(Dense(1)(bucket), axis=2)
    model2 = Model(inp, out)
    return model2
model2 = make_model_2(model1)
model2.summary()
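If you want to confirm the transfer worked (a quick sanity check, not part of the original answer), the nested model1 instance appears as a layer of model2, so its weights can be compared with the trained model1 directly; you can also freeze it if the copied sub-model should stay fixed while the new branch trains:
# The nested functional model shows up in model2.layers.
nested = [layer for layer in model2.layers if isinstance(layer, Model)][0]

# Its weights should equal model1's trained weights.
for w_src, w_copied in zip(model1.get_weights(), nested.get_weights()):
    assert (w_src == w_copied).all()

# Optional: keep the transferred sub-model fixed while training the new branch.
nested.trainable = False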
I have a model like the one below. I want to add a matrix of learnable weights in the end, which is initialized to the variable matrix that I pass to the function create_model.
To get an intuitive idea of what I want to do: imagine the matrix is supposed to be the one I pass to the model, but I suspect it can still be fine-tuned during training. Therefore, I want it to be initialized to the values I pass and then refined during training.
The code below works, but as you can see from the model.summary() output, the matrix multiplication contains no learnable weights, which makes me think the matrix weights are not being fine-tuned.
What am I doing wrong?
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers as tfl
from tensorflow.keras import backend as K

def create_model(num_columns, matrix):
    inp_layer = tfl.Input((num_columns,))
    dense = tfl.Dense(512, activation='relu')(inp_layer)
    dense = tfl.Dense(256, activation='relu')(dense)
    dense = tfl.Dense(128, activation='relu')(dense)
    va = tf.Variable(matrix, dtype=tf.float32)
    dense = K.dot(dense, va)
    model = tf.keras.Model(inputs=inp_layer, outputs=dense)
    model.compile(optimizer='adam', loss=['binary_crossentropy'])
    model.summary()
    return model
matrix = np.random.randint(0,2,(128, 206)) # In reality, this is not random, but it has sensed values
num_columns = 750
model = create_model(num_columns,matrix)
You can simply use a Dense layer with no bias to do this multiplication. After the model is built, I replace the weights of that layer with the matrix you provided:
def create_model(num_columns, matrix):
    inp_layer = Input((num_columns,))
    x = Dense(512, activation='relu')(inp_layer)
    x = Dense(256, activation='relu')(x)
    x = Dense(128, activation='relu')(x)
    dense = Dense(206, use_bias=False)(x)
    model = Model(inputs=inp_layer, outputs=dense)
    model.compile(optimizer='adam', loss=['binary_crossentropy'])
    model.set_weights(model.get_weights()[:-1] + [matrix])
    model.summary()
    return model
matrix = np.random.randint(0,2,(128, 206)) # In reality, this is not random, but it has sensed values
num_columns = 750
model = create_model(num_columns,matrix)
Check that the last weight matrix was set:
(model.get_weights()[-1] == matrix).all() # True
In this way, the weights are initialized to your matrix and can still be fine-tuned during training.
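An equivalent alternative (a sketch, assuming TF 2.x and import tensorflow as tf, not part of the answer above): initialize the kernel directly from the matrix instead of overwriting the weights after the model is built. tf.constant_initializer accepts an N-dimensional array whose size matches the kernel shape, and the kernel stays trainable, so it is still fine-tuned during training:
# Drop-in replacement for the Dense line above; model.set_weights(...) is then unnecessary.
dense = Dense(206, use_bias=False,
              kernel_initializer=tf.constant_initializer(matrix))(x)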
I'm trying to train a neural network in Keras, but I'm getting an error that there are no gradients for any variable, which may imply that the graph is disconnected.
I'm copying here a stripped-down version of the code with only the bits related to the model definition.
The model accepts two inputs that are fed, one at a time, to the same shared sub-model: the encoder.
The two outputs of the encoder are then concatenated and sent to a dense layer to compute the final output.
I don't get what's wrong; it looks like instantiating the encoder creates additional trainable variables that are not used anywhere.
For the network layout I was getting inspiration from the official keras docs:
https://keras.io/guides/functional_api/#all-models-are-callable-just-like-layers
def _get_encoder(self, model_input_shape):
    encoder_input = Input(shape=model_input_shape)
    x = encoder_input
    x = Conv2D(32, (3, 3), strides=1, padding="same")(x)
    x = BatchNormalization(axis=-1)(x)
    x = LeakyReLU(alpha=0.1)(x)
    latent_z = Flatten()(x)
    latent_z = Dense(self.latent_dim)(latent_z)

    encoder = Model(
        encoder_input,
        latent_z,
        name='encoder'
    )
    return encoder
def build_model(self):
    model_input_shape = (self.height, self.width, self.depth)
    model_input_1 = Input(shape=model_input_shape)
    model_input_2 = Input(shape=model_input_shape)

    self.encoder = self._get_encoder(model_input_shape)

    z_1 = self.encoder(model_input_1)
    z_2 = self.encoder(model_input_2)

    x = concatenate([z_1, z_2])
    prediction = Dense(1, activation='sigmoid')(x)

    self.network = Model(
        inputs=[model_input_1, model_input_2],
        outputs=[prediction],
        name='network'
    )
network.network.compile(
    optimizer='rmsprop',
    loss='mse',
    metrics=['mae'])

H = network.network.fit(
    x=train_gen,
    validation_data=test_gen,
    epochs=EPOCHS,
    steps_per_epoch=STEPS,
    validation_steps=STEPS)
I found the problem. My custom data generator was returning a list [x, y] instead of a tuple (x, y), where x is the input and y is the target. A simple mistake that was causing totally unrelated errors.
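For reference, a minimal sketch of what the corrected generator looks like (the names and batching logic are illustrative, not from the original code): each step must yield a tuple (inputs, targets), where inputs is the list of the two image batches.
import numpy as np

def pair_generator(x1, x2, y, batch_size=32):
    """Yield (inputs, targets) tuples, where inputs is [batch_1, batch_2]."""
    n = len(y)
    i = 0
    while True:
        idx = np.arange(i, i + batch_size) % n
        # A tuple (inputs, targets), not a list [inputs, targets].
        yield [x1[idx], x2[idx]], y[idx]
        i = (i + batch_size) % n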