I'm trying to successively build up mixture models, iteratively adding sub-models.
I start by building and training a simple model. I then build a slightly more complex model that contains all of the original model but has more layers. I want to move the trained weights from the first model into the new model. How can I do this? The first model is nested in the second model.
Here's a dummy MWE:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import (concatenate, Conv1D, Dense, LSTM)
from tensorflow.keras import Model, Input, backend
# data
x = np.random.normal(size = 100)
y = np.sin(x)+np.random.normal(size = 100)
# model 1
def make_model_1():
    inp = Input(1)
    l1 = Dense(5, activation = 'relu')(inp)
    out1 = Dense(1)(l1)
    model1 = Model(inp, out1)
    return model1

model1 = make_model_1()
model1.compile(optimizer = tf.keras.optimizers.SGD(),
               loss = tf.keras.losses.mean_squared_error)
model1.fit(x, y, epochs = 3, batch_size = 10)
# make model 2
def make_model_2():
    inp = Input(1)
    l1 = Dense(5, activation = 'relu')(inp)
    out1 = Dense(1)(l1)
    l2 = Dense(15, activation = 'sigmoid')(inp)
    out2 = Dense(1)(l2)
    bucket = tf.stack([out1, out2], axis=2)
    out = backend.squeeze(Dense(1)(bucket), axis = 2)
    model2 = Model(inp, out)
    return model2

model2 = make_model_2()
How can I transfer the weights from model1 to model2, in a way that's automatic and completely agnostic about the nature of the two models, except that model1 is nested inside model2?
You can simply load the trained weights into the specific part of the new model you are interested in. I do this by creating a new instance of model1 inside model2 and then loading the trained weights into it.
Here is the full example:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras import Model, Input

# data
x = np.random.normal(size = 100)
y = np.sin(x) + np.random.normal(size = 100)

# model 1
def make_model_1():
    inp = Input(1)
    l1 = Dense(5, activation = 'relu')(inp)
    out1 = Dense(1)(l1)
    model1 = Model(inp, out1)
    return model1

model1 = make_model_1()
model1.compile(optimizer = tf.keras.optimizers.SGD(),
               loss = tf.keras.losses.mean_squared_error)
model1.fit(x, y, epochs = 3, batch_size = 10)

# make model 2
def make_model_2(trained_model):
    inp = Input(1)
    # rebuild model1 and copy the trained weights into it
    m = make_model_1()
    m.set_weights(trained_model.get_weights())
    out1 = m(inp)
    l2 = Dense(15, activation = 'sigmoid')(inp)
    out2 = Dense(1)(l2)
    bucket = tf.stack([out1, out2], axis=2)
    out = tf.keras.backend.squeeze(Dense(1)(bucket), axis = 2)
    model2 = Model(inp, out)
    return model2

model2 = make_model_2(model1)
model2.summary()
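One caveat: after the transfer, the copied weights will still be updated when model2 is fitted. If you want the nested sub-model to keep the weights it learned in step one, freeze it before wiring it in; a minimal sketch, where the freezing line is the only change to make_model_2 above:

m = make_model_1()
m.set_weights(trained_model.get_weights())
m.trainable = False  # freeze: fitting model2 will no longer update these weights
out1 = m(inp)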
Related
I have a neural net model with multiple sub-models. Model2 includes model1 in its topology. I want to train model1 first, separately from model2, and then pass the trained parameters of model1 to model2. Does an approach similar to the one below pass the parameters automatically, or do I need to explicitly get the weights and biases from model1 and pass them to model2 during initialization?
Model1:
main_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='main_input')
x = embedding_layer(main_input)
x = CuDNNLSTM(KG_EMBEDDING_DIM, return_sequences=True)(x)
x = Avg(x)
x = Dense(KG_EMBEDDING_DIM)(x)
x = Activation('relu')(x)
# entity_extraction = Reshape([KG_EMBEDDING_DIM])(x)
entity_extraction = Transpose(x)
final_output = Dense(units=len(unique_labels), activation='softmax')(Transpose(entity_extraction))
optimizer = Adam(lr=LEARNING_RATE, clipvalue=0.25)
m1 = Model(inputs=main_input, outputs=final_output)
m1.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
m1.summary()
Model2:
x = embedding_layer(main_input)
x = CuDNNLSTM(LSTM_HIDDEN_SIZE, return_sequences=True)(x)
Avg = keras.layers.core.Lambda(lambda x: K.mean(x, axis=1), output_shape=(LSTM_HIDDEN_SIZE,))
x = Avg(x)
x = Dense(LSTM_HIDDEN_SIZE)(x)
main_lstm_out = Activation('relu')(x)
lstm_hidden_and_entity = Concatenate(axis=0)([Transpose(main_lstm_out), entity_extraction])
print("lstm_hidden_and_entity", K.int_shape(lstm_hidden_and_entity))
# input("continue?")
final_output = Dense(units=len(unique_labels), activation='softmax')(Transpose(lstm_hidden_and_entity))
optimizer = Adam(lr=LEARNING_RATE, clipvalue=0.25)
m2 = Model(inputs=main_input, outputs=final_output)
m2.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
Notice that Model2 uses entity_extraction in its structure, which is a layer output from Model1 and is trained before the initialization of Model2. So with this kind of approach, does it transfer the parameters?
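For context: in Keras the weights live in the layer objects, so if Model2 is built from tensors produced by Model1's layers (as with entity_extraction above), those layers, including their already-trained parameters, are part of Model2's graph and no explicit copying is needed; note they will also keep training when m2 is fitted unless you freeze them. A minimal sketch of the sharing behaviour, with hypothetical names:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import Model

shared = Dense(4, name='shared')          # one layer object ...
inp = Input(shape=(3,))
m1 = Model(inp, shared(inp))              # ... used by model 1
m2 = Model(inp, Dense(1)(shared(inp)))    # ... and reused by model 2

# both models reference the very same layer, hence the same weights
assert m1.get_layer('shared') is m2.get_layer('shared')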
I'm new to TensorFlow/Python and have created a model that classifies text by toxicity level (obscene, toxic, threat, etc.). This is what I have so far; it does produce the summary, so I know the model is loading correctly. How do I pass text to the model to get a prediction back? Any help would be much appreciated.
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
checkpoint_path = "tf_model/the_model/saved_model.pb"
checkpoint_dir = os.path.dirname(checkpoint_path)
new_model = tf.keras.models.load_model(checkpoint_dir)
# Check its architecture
new_model.summary()
inputs = [
    "tenserflow seems like it fits the bill but there are zero tutorials that outline how to reuse a model in a production environment "
]
predictions = new_model.predict(inputs)
print(predictions)
I get many error messages; some of the long-winded ones are as follows:
WARNING:tensorflow:Model was constructed with shape (None, 150) for input KerasTensor(type_spec=TensorSpec(shape=(None, 150), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'"), but it was called on an input with incompatible shape (None, 1).
ValueError: Negative dimension size caused by subtracting 3 from 1 for '{{node model/conv1d/conv1d}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](model/conv1d/conv1d/ExpandDims, model/conv1d/conv1d/ExpandDims_1)' with input shapes: [?,1,1,256], [1,3,256,64].
This is the Python code used to create, train, and test the model (the prediction at the end works perfectly):
import tensorflow as tf
import numpy as np
import pandas as pd
import os
TRAIN_DATA = "datasets/train.csv"
GLOVE_EMBEDDING = "embedding/glove.6B.100d.txt"
train = pd.read_csv(TRAIN_DATA)
train["comment_text"] = train["comment_text"].fillna("fillna")  # assign back: fillna() is not in-place by default
x_train = train["comment_text"].str.lower()
y_train = train[["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]].values
max_words = 100000
max_len = 150
embed_size = 100
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=max_words, lower=True)
tokenizer.fit_on_texts(x_train)
x_train = tokenizer.texts_to_sequences(x_train)
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
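# note: pad_sequences left-pads with zeros by default (padding='pre'), so every sample ends up with length max_len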
embeddings_index = {}
with open(GLOVE_EMBEDDING, encoding='utf8') as f:
    for line in f:
        values = line.rstrip().rsplit(' ')
        word = values[0]
        embed = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = embed
word_index = tokenizer.word_index
num_words = min(max_words, len(word_index) + 1)
embedding_matrix = np.zeros((num_words, embed_size), dtype='float32')
for word, i in word_index.items():
    if i >= max_words:
        continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words without a GloVe vector keep the all-zeros row initialized above
        embedding_matrix[i] = embedding_vector
input = tf.keras.layers.Input(shape=(max_len,))
x = tf.keras.layers.Embedding(max_words, embed_size, weights=[embedding_matrix], trainable=False)(input)
x = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128, return_sequences=True, dropout=0.1,
                                                      recurrent_dropout=0.1))(x)
x = tf.keras.layers.Conv1D(64, kernel_size=3, padding="valid", kernel_initializer="glorot_uniform")(x)
avg_pool = tf.keras.layers.GlobalAveragePooling1D()(x)
max_pool = tf.keras.layers.GlobalMaxPooling1D()(x)
x = tf.keras.layers.concatenate([avg_pool, max_pool])
preds = tf.keras.layers.Dense(6, activation="sigmoid")(x)
model = tf.keras.Model(input, preds)
model.summary()
model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(lr=1e-3), metrics=['accuracy'])
batch_size = 128
checkpoint_path = "tf_model/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=5, monitor='val_loss'),
    tf.keras.callbacks.TensorBoard(log_dir='logs'),
    cp_callback
]
model.fit(x_train, y_train, validation_split=0.2, batch_size=batch_size,
          epochs=1, callbacks=callbacks, verbose=1)
latest = tf.train.latest_checkpoint(checkpoint_dir)
model.load_weights(latest)
# Save the entire model as a SavedModel.
model.save('tf_model/the_model')
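# predict() expects a batch, so np.expand_dims adds a leading batch axis -> shape (1, max_len)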
predictions = model.predict(np.expand_dims(x_train[42], 0))
print(tokenizer.sequences_to_texts([x_train[42]]))
print(y_train[42])
print(predictions)
Final solution:
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
checkpoint_path = "tf_model/the_model/saved_model.pb"
checkpoint_dir = os.path.dirname(checkpoint_path)
new_model = tf.keras.models.load_model(checkpoint_dir)
max_words = 100000
max_len = 150
# Check its architecture
# new_model.summary()
inputs = ["tenserflow seems like it fits the bill but there are zero tutorials that outline how to reuse a model in a production environment."]
# use the same tokenizer used to build the model
# caution: this re-fits a NEW tokenizer on the inference text, so the word indices
# will not match the ones learned from the training data (see the note below)
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=max_words, lower=True)
tokenizer.fit_on_texts(inputs)
# pass the string through the tokenizer and feed the resulting array to predict
sequence = tokenizer.texts_to_sequences(inputs)
sequence = tf.keras.preprocessing.sequence.pad_sequences(sequence, maxlen = max_len)
predictions = new_model.predict(sequence)
print(predictions)
# [[0.0365479 0.01275077 0.02102855 0.00647011 0.02302513 0.00406089]]
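A caveat on this final solution: fitting a fresh Tokenizer on the inference text cannot reproduce the word-to-index mapping learned from the training data, so the indices fed to the model generally will not line up with the embedding rows it was trained on. The usual fix is to persist the training tokenizer and reload it at inference time; a minimal sketch using pickle (the file name is illustrative):

import pickle

# at training time, right after tokenizer.fit_on_texts(x_train)
with open('tf_model/tokenizer.pickle', 'wb') as f:
    pickle.dump(tokenizer, f)

# at inference time, instead of fitting a new tokenizer
with open('tf_model/tokenizer.pickle', 'rb') as f:
    tokenizer = pickle.load(f)
sequence = tokenizer.texts_to_sequences(inputs)
sequence = tf.keras.preprocessing.sequence.pad_sequences(sequence, maxlen=max_len)
predictions = new_model.predict(sequence)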
The input text needs to be processed in the same way as the training data. This can be done with:
inputs = ["tenserflow seems like it fits the bill but there are zero tutorials that outline how to reuse a model in a production environment"]
sequence = tokenizer.texts_to_sequences(inputs)  # the same tokenizer which was used on the train data
sequence = pad_sequences(sequence, maxlen = max_len)
predictions = new_model.predict(sequence)
I'm new to machine learning and trying to fit a sample data set with a neural network in Python using TensorFlow. After implementing the neural network in Dymola, I want to compare the outputs of the underlying function with those of the neural network.
The sample data set is:
import tensorflow as tf
from keras import metrics
import numpy as np
from keras.models import *
from keras.layers import Dense, Dropout, Input  # Input is needed for the architecture below
from keras import optimizers
from keras.callbacks import *
import scipy.io as sio
import mat4py as m4p
inputs = np.linspace(0, 15, num=3000)
outputs = 1/7 * ((inputs/5)**3 - (inputs/3)**2 + 5)  # ** for exponentiation; ^ is bitwise XOR and fails on float arrays
Inputs and outputs are then scaled into the interval [0; 0.9]:
inputs_max = np.max(inputs)
inputs_min = np.min(inputs)
outputs_max = np.max(outputs)
outputs_min = np.min(outputs)
upper_bound = 0.9
lower_bound = 0
m_in = (upper_bound - lower_bound) / (inputs_max - inputs_min)
c_in = upper_bound - (m_in * inputs_max)
scaled_in = m_in * inputs + c_in
m_out = (upper_bound - lower_bound) / (outputs_max - outputs_min)
c_out = upper_bound - (m_out * outputs_max)
scaled_out = m_out * outputs + c_out
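A quick sanity check that the affine map really lands both series in [0, 0.9]:

# the map sends each series' minimum to lower_bound and its maximum to upper_bound
assert np.isclose(scaled_in.min(), lower_bound) and np.isclose(scaled_in.max(), upper_bound)
assert np.isclose(scaled_out.min(), lower_bound) and np.isclose(scaled_out.max(), upper_bound)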
After that, the neural network is trained with:
# shuffle values
def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = np.empty(a.shape, dtype=a.dtype)
    shuffled_b = np.empty(b.shape, dtype=b.dtype)
    permutation = np.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b
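# Aside (not in the original): the same shuffle can be written without the helper
# by indexing both arrays with one shared permutation:
#   perm = np.random.permutation(len(X))
#   X, Y = X[perm], Y[perm]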
tf_features_64 = scaled_in
tf_labels_64 = scaled_out
tf_features_32 = tf_features_64.astype(np.float32)
tf_labels_32 = tf_labels_64.astype(np.float32)
X = tf_features_32
Y = tf_labels_32
X, Y = shuffle_in_unison(X, Y)  # capture the return values; the function does not shuffle in place
# define callbacks
filepath = "weights-improvement-{epoch:02d}-{val_loss:.2f}.hdf5"
savebestCallBack = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
                                   save_best_only=True, save_weights_only=False,
                                   mode='auto', period=1)
tbCallBack = TensorBoard(log_dir='./Graph',
                         histogram_freq=5,
                         write_graph=True,
                         write_images=True)
esCallback = EarlyStopping(monitor='val_loss',
                           min_delta=0,
                           patience=500,
                           verbose=0,
                           mode='min')
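# note: savebestCallBack is defined above but not included in the callbacks list passed to fit() below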
# neural network architecture
visible = Input(shape=(1,))
x = Dense(40, activation='tanh')(visible)
x = Dense(39, activation='tanh')(x)
x = Dense(38, activation='tanh')(x)
x = Dense(30, activation='tanh')(x)
output = Dense(1)(x)
# setup optimizer
Optimizer = optimizers.adam(lr=0.0007, amsgrad=True)
model = Model(inputs=visible, outputs=output)
model.compile(optimizer=Optimizer,
              loss=['mse'],
              metrics=['mae', 'mse'])
model.fit(X, Y, epochs=1000, batch_size=1, verbose=1,
          shuffle=True, validation_split=0.05, callbacks=[tbCallBack, esCallback])
# return weights
weights1 = model.layers[1].get_weights()[0]
biases1 = model.layers[1].get_weights()[1]
print('Layer1---------------------------------------------------------------------------------------------------------')
print('weights1:')
print(repr(weights1.transpose()))
print('biases1:')
print(repr(biases1))
w1 = weights1.transpose()
b1 = biases1.transpose()
we1 = {'w1' : w1.tolist()}
bi1 = {'b1' : b1.tolist()}
.........
......
Later on, I implemented the trained neural network in the program "Dymola" by loading the weights and biases in pre-configured "neural network base classes" (which have been used several times and are working).
// Modelica code for Dymola:
Real inputs;
Real outputs;
Real scaled_outputs;
Real scaled_inputs(start=0);
Real scaled_outputsfunc;
der(scaled_inputs) = 0.9;
//part of the neural network implementation in Dymola
NeuralNetwork.BaseClasses.NeuralNetworkLayer neuralNetworkLayer1(
NeuronActivationFunction=NeuralNetwork.Types.ActivationFunction.TanSig,
numInputs=1,
numNeurons=40,
weightTable=[-0.367953330278397; ......])
annotation (Placement(transformation(extent={{-76,22},{-56,42}})));
//scaled inputs
neuralNetworkLayer1.u[1] = scaled_inputs;
//scaled outputs
neuralNetworkLayer5.y[1]= scaled_outputs;
//scaled_inputs = 0.06 * inputs
inputs = 1/0.06 * (scaled_inputs);
outputs = 1/875 * inputs^3 - 1/63 * inputs^2 + 5/7;
scaled_outputsfunc = 1.2173139581825052 * outputs - 0.3173139581825052;
When plotting and comparing the scaled outputs of the function and the returned (scaled) values of the neural network I noticed that the approximation is very good in the interval from [0.5; 0.8], but the closer the inputs reach the boundaries the worse the approximation becomes.
Unfortunately, I have no clue why this is happening and how to fix this issue. I'd be very glad if someone could help me.
I want to answer my own question: I forgot to specify the activation function in the output layer in my Python code, so Keras set it to a linear function by default, see also:
https://keras.io/layers/core/
In Dymola, where my ANN was implemented, 'tanh' was the activation function in the last layer, which led to a divergence near the boundaries.
The correct Python code for this application must be:
visible = Input(shape=(1,))
x = Dense(40, activation='tanh')(visible)
x = Dense(39, activation='tanh')(x)
x = Dense(38, activation='tanh')(x)
x = Dense(30, activation='tanh')(x)
output = Dense(1, activation='tanh')(x)
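To catch a mismatch like this early, the output activation can be checked directly on the Keras side; a small sketch:

model = Model(inputs=visible, outputs=output)
print(model.layers[-1].get_config()['activation'])  # should print 'tanh' to match the Dymola layer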
How can I load a model that has a Lambda layer?
Here is the code to reproduce the behaviour:
# imports assumed to make the snippet self-contained
import numpy as np
import tensorflow as tf
from keras.layers import Input, Dense, Flatten, LSTM, Lambda, TimeDistributed
from keras.models import Model
from keras.optimizers import Adadelta
from keras.applications.vgg16 import VGG16

MEAN_LANDMARKS = np.load('data/mean_shape_68.npy')

def add_mean_landmarks(x):
    mean_landmarks = np.array(MEAN_LANDMARKS, np.float32)
    mean_landmarks = mean_landmarks.flatten()
    mean_landmarks_tf = tf.convert_to_tensor(mean_landmarks)
    x = x + mean_landmarks_tf
    return x

def get_model():
    inputs = Input(shape=(8, 128, 128, 3))
    cnn = VGG16(include_top=False, weights='imagenet', input_shape=(128, 128, 3))
    x = TimeDistributed(cnn)(inputs)
    x = TimeDistributed(Flatten())(x)
    x = LSTM(256)(x)
    x = Dense(68 * 2, activation='linear')(x)
    x = Lambda(add_mean_landmarks)(x)
    model = Model(inputs=inputs, outputs=x)
    optimizer = Adadelta()
    model.compile(optimizer=optimizer, loss='mae')
    return model
The model compiles and I can save it, but when I try to load it with the load_model function I get an error:
in add_mean_landmarks
mean_landmarks = np.array(MEAN_LANDMARKS, np.float32)
NameError: name 'MEAN_LANDMARKS' is not defined
As I understand it, MEAN_LANDMARKS is not incorporated into the graph as a constant tensor. This is also related to this question: How to add constant tensor in Keras?
You need to pass the custom_objects argument to the load_model function:
model = load_model('model_file_name.h5', custom_objects={'MEAN_LANDMARKS': MEAN_LANDMARKS})
Look for more info in the Keras docs: Handling custom layers (or other custom objects) in saved models.
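Alternatively, the dependence on the module-level global can be avoided by handing the constant to the Lambda layer through its arguments parameter, so it is stored with the layer config; a hedged sketch, assuming numpy and tensorflow are imported as in the question (if your Keras version does not serialize NumPy arrays in arguments, convert the array to a plain list first):

def add_mean_landmarks(x, mean_landmarks):
    # the constant now arrives as an argument instead of a global lookup
    return x + tf.convert_to_tensor(np.array(mean_landmarks, np.float32).flatten())

x = Lambda(add_mean_landmarks, arguments={'mean_landmarks': MEAN_LANDMARKS})(x)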
Essentially, I am training an LSTM model using Keras, but when I save it, its size is up to 100 MB. My purpose for the model is to deploy it to a web server to serve as an API, but the server is not able to run it since the model is too big. After analyzing all the parameters in my model, I found that it has 20,000,000 parameters, of which 15,000,000 are untrained since they are pre-trained word embeddings. Is there any way to minimize the size of the model by removing those 15,000,000 parameters while still preserving its performance?
Here is my code for the model:
def LSTModel(input_shape, word_to_vec_map, word_to_index):
    sentence_indices = Input(input_shape, dtype="int32")
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)
    X = LSTM(256, return_sequences=True)(embeddings)
    X = Dropout(0.5)(X)
    X = LSTM(256, return_sequences=False)(X)
    X = Dropout(0.5)(X)
    X = Dense(NUM_OF_LABELS)(X)
    X = Activation("softmax")(X)
    model = Model(inputs=sentence_indices, outputs=X)
    return model
Define the layers you want to save outside the function and name them. Then create two functions, foo() and bar(). foo() will have the original pipeline, including the embedding layer; bar() will have only the part of the pipeline AFTER the embedding layer. Instead of the embedding layer, you will define a new Input() layer in bar() with the dimensions of your embeddings:
lstm1 = LSTM(256, return_sequences=True, name='lstm1')
lstm2 = LSTM(256, return_sequences=False, name='lstm2')
dense = Dense(NUM_OF_LABELS, name='Susie Dense')

def foo(...):
    sentence_indices = Input(input_shape, dtype="int32")
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)
    X = lstm1(embeddings)
    X = Dropout(0.5)(X)
    X = lstm2(X)
    X = Dropout(0.5)(X)
    X = dense(X)
    X = Activation("softmax")(X)
    return Model(inputs=sentence_indices, outputs=X)

def bar(...):
    embeddings = Input(embedding_shape, dtype="float32")
    X = lstm1(embeddings)
    X = Dropout(0.5)(X)
    X = lstm2(X)
    X = Dropout(0.5)(X)
    X = dense(X)
    X = Activation("softmax")(X)
    return Model(inputs=embeddings, outputs=X)  # input is the embeddings placeholder, not sentence_indices

foo_model = foo(...)
bar_model = bar(...)

foo_model.fit(...)
bar_model.save_weights(...)
Now, you will train the original foo() model; because the named layers are shared, this also trains the layers inside bar(). Then you can save the weights of the reduced bar() model. When loading weights, don't forget to specify the by_name=True parameter:
foo_model.load_weights('bar_model.h5', by_name=True)
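For serving, the same by_name loading can be applied to the reduced model, so the web server only ever builds the small graph; a hedged sketch using the names above (precomputed_embeddings is hypothetical: the raw token sequences must first be mapped through the embedding matrix outside the model):

# on the server: build only the post-embedding graph and restore its weights
bar_model = bar(...)
bar_model.load_weights('bar_model.h5', by_name=True)
predictions = bar_model.predict(precomputed_embeddings)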