I have a model like the one below. I want to add a matrix of learnable weights at the end, initialized to the matrix argument that I pass to create_model.
Intuitively, the values I pass are already meaningful, but I expect they can still be fine-tuned during training. So I want the matrix to be initialized to the values I pass and then refined as the model trains.
The code below runs, but as you can see from the model.summary() output, the matrix multiplication contributes no trainable weights, which makes me think the matrix is not being fine-tuned.
What am I doing wrong?
def create_model(num_columns, matrix):
    inp_layer = tfl.Input((num_columns,))
    dense = tfl.Dense(512, activation='relu')(inp_layer)
    dense = tfl.Dense(256, activation='relu')(dense)
    dense = tfl.Dense(128, activation='relu')(dense)
    va = tf.Variable(matrix, dtype=tf.float32)
    dense = K.dot(dense, va)
    model = tf.keras.Model(inputs=inp_layer, outputs=dense)
    model.compile(optimizer='adam', loss=['binary_crossentropy'])
    model.summary()
    return model
matrix = np.random.randint(0, 2, (128, 206))  # in reality not random, but sensible values
num_columns = 750
model = create_model(num_columns, matrix)
You can simply use a Dense layer with no bias to do this multiplication. After the model is built, I replace the weights of that layer with the matrix you provided:
def create_model(num_columns, matrix):
    inp_layer = Input((num_columns,))
    x = Dense(512, activation='relu')(inp_layer)
    x = Dense(256, activation='relu')(x)
    x = Dense(128, activation='relu')(x)
    dense = Dense(206, use_bias=False)(x)
    model = Model(inputs=inp_layer, outputs=dense)
    model.compile(optimizer='adam', loss=['binary_crossentropy'])
    model.set_weights(model.get_weights()[:-1] + [matrix])
    model.summary()
    return model
matrix = np.random.randint(0, 2, (128, 206))  # in reality not random, but sensible values
num_columns = 750
model = create_model(num_columns, matrix)
Check that the weights were set:
(model.get_weights()[-1] == matrix).all()  # True
This way, the weights are trainable and will be fine-tuned during training.
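An alternative sketch (my addition, not from the answer above), assuming the same shapes as in the example: pass the matrix in through a custom kernel initializer, so no set_weights call is needed after building:

import numpy as np
import tensorflow as tf

matrix = np.random.randint(0, 2, (128, 206)).astype("float32")  # stand-in for the real matrix

def init_from_matrix(shape, dtype=None):
    # Keras calls this with the kernel shape, which must equal matrix.shape here
    return tf.constant(matrix, dtype=dtype)

def create_model(num_columns):
    inp_layer = tf.keras.layers.Input((num_columns,))
    x = tf.keras.layers.Dense(512, activation='relu')(inp_layer)
    x = tf.keras.layers.Dense(256, activation='relu')(x)
    x = tf.keras.layers.Dense(128, activation='relu')(x)
    out = tf.keras.layers.Dense(206, use_bias=False,
                                kernel_initializer=init_from_matrix)(x)
    model = tf.keras.Model(inputs=inp_layer, outputs=out)
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model

model = create_model(750)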
I have a custom ResNet model that I define through the Keras Functional API. My model also has multiple outputs. The last element of the output array is the fully connected Dense layer with num_class nodes. I want to be able to increase the number of nodes of this layer. This is the relevant code for the creation of my network:
from tensorflow.keras import layers, models, Input, regularizers
res = []
inputs = Input(shape=(height, width, channels), name='data')
x = MyLayer()(inputs)
# ... other layers
x = MyLayer()(x)
res.append(x)
# ... other layers
x = layers.Dense(num_class, name='fc1', use_bias=True)(x)
res.append(x)
model = models.Model(inputs=inputs, outputs=[res[-2], res[-3], res[-4], res[-1]])
In the question Adding new nodes to output layer in Keras I found an answer similar to what I'm looking for, which I'm quoting below:
def add_outputs(self, n_new_outputs):
    # Increment the number of outputs
    self.n_outputs += n_new_outputs
    weights = self.model.get_layer('fc8').get_weights()
    # Adding new weights; biases will be 0 and the connections random
    shape = weights[0].shape[0]
    weights[1] = np.concatenate((weights[1], np.zeros(n_new_outputs)), axis=0)
    weights[0] = np.concatenate((weights[0], -0.0001 * np.random.random_sample((shape, n_new_outputs)) + 0.0001), axis=1)
    # Deleting the old output layer
    self.model.layers.pop()
    last_layer = self.model.get_layer('batchnormalization_1').output
    # New output layer
    out = Dense(self.n_outputs, activation='softmax', name='fc8')(last_layer)
    self.model = Model(inputs=self.model.input, outputs=out)
    # Set weights to the layer
    self.model.get_layer('fc8').set_weights(weights)
    print(weights[0])
However, in that question there was only one output layer, and I'm not sure how to replicate the same with my architecture.
This is the solution I've come up with. I assigned the layers that I wanted to keep as output to variables:
from tensorflow.keras import layers, models, Input, regularizers
inputs = Input(shape=(height, width, channels), name='data')
x = MyLayer()(inputs)
# ... other layers
a = MyLayer(name="a")(x)
b = MyLayer(name="b")(a)
c = MyLayer(name="c")(b)
x = layers.Dense(num_class, name='fc1', use_bias=True)(c)
model = models.Model(inputs=inputs, outputs=[c, b, a, x])
Then, to increase the number of nodes in the last layer, I just call the function increment_classes. The total number of nodes will be the sum of old_num_class and num_class.
def increment_classes(model, old_num_class, num_class):
    weights = model.get_layer("fc1").get_weights()
    new_num_class = old_num_class + num_class
    # Adding new weights; biases will be 0 and the connections random
    shape = weights[0].shape[0]
    weights[1] = np.concatenate((weights[1], np.zeros(num_class)), axis=0)
    weights[0] = np.concatenate((weights[0], -0.0001 * np.random.random_sample((shape, num_class)) + 0.0001), axis=1)
    # Deleting the old dense output layer
    model.layers.pop()
    # Get the output layers
    a = model.get_layer("a").output
    b = model.get_layer("b").output
    c = model.get_layer('c').output
    # Replace dense output layer (x)
    out = layers.Dense(new_num_class, name='fc1', use_bias=True)(c)
    model = models.Model(inputs=model.input, outputs=[c, b, a, out])
    # Set weights to the layer
    model.get_layer('fc1').set_weights(weights)
    return model
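A hedged usage sketch (the class counts here are made up for illustration): growing the head from 10 to 15 classes while keeping the trained weights:

# hypothetical numbers: the model was originally built with num_class=10
model = increment_classes(model, old_num_class=10, num_class=5)
model.summary()  # 'fc1' now has 15 output nodes; the original weights are preserved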
Is there any layer in Keras which calculates the derivative with respect to the input? For example, if x is the input and the first layer is, say, f(x), then the next layer's output should be f'(x). There are multiple questions here about this topic, but all of them involve computing the derivative outside the model. In essence, I want to create a neural network whose loss function involves both the Jacobian and Hessian with respect to the inputs.
I've tried the following
import keras.backend as K

def create_model():
    x = keras.Input(shape=(10,))
    layer = Dense(1, activation="sigmoid")
    output = layer(x)
    jac = K.gradients(output, x)
    model = keras.Model(inputs=x, outputs=jac)
    return model

model = create_model()
X = np.random.uniform(size=(3, 10))
This gives the error tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
So I tried using that
def create_model2():
    with tf.GradientTape() as tape:
        x = keras.Input(shape=(10,))
        layer = Dense(1, activation="sigmoid")
        output = layer(x)
        jac = tape.gradient(output, x)
    model = keras.Model(inputs=x, outputs=jac)
    return model

model = create_model2()
X = np.random.uniform(size=(3, 10))
But this tells me 'KerasTensor' object has no attribute '_id'.
Both of these methods work fine outside a model. My end goal is to use the Jacobian and Hessian in the loss function, so alternative approaches would also be appreciated.
Not sure what exactly you want to do, but maybe try a custom Keras layer with tf.gradients:
import tensorflow as tf

tf.random.set_seed(111)

class GradientLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(GradientLayer, self).__init__()
        self.dense = tf.keras.layers.Dense(1, activation="sigmoid")

    @tf.function  # tf.gradients requires graph mode
    def call(self, inputs):
        outputs = self.dense(inputs)
        return tf.gradients(outputs, inputs)

def create_model2():
    gradient_layer = GradientLayer()
    inputs = tf.keras.layers.Input(shape=(10,))
    outputs = gradient_layer(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model

model = create_model2()
X = tf.random.uniform((3, 10))
print(model(X))
tf.Tensor(
[[-0.07935508 -0.12471244 -0.0702782 -0.06729251 0.14465885 -0.0818079
-0.08996294 0.07622238 0.11422144 -0.08126545]
[-0.08666676 -0.13620329 -0.07675356 -0.07349276 0.15798753 -0.08934557
-0.09825202 0.08324542 0.12474566 -0.08875315]
[-0.08661086 -0.13611545 -0.07670406 -0.07344536 0.15788564 -0.08928795
-0.09818865 0.08319173 0.12466521 -0.08869591]], shape=(3, 10), dtype=float32)
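Since the end goal mentions Hessians too, here is a sketch of one possible extension (my addition, following the same pattern as the answer; not tested against the asker's setup): tf.hessians can be nested into the same graph-mode call:

class JacobianHessianLayer(tf.keras.layers.Layer):
    # Sketch: emits the output plus its first and second derivatives
    # w.r.t. the input, for use in a custom loss.
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1, activation="sigmoid")

    @tf.function  # graph mode is required for tf.gradients / tf.hessians
    def call(self, inputs):
        outputs = self.dense(inputs)
        jac = tf.gradients(outputs, inputs)[0]
        # Hessian of sum(outputs) w.r.t. inputs; shape (batch, n, batch, n)
        hess = tf.hessians(outputs, inputs)[0]
        return outputs, jac, hess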
I have been playing around with Variational Autoencoders for a few days. I am trying to fit a small toy function with a small model.
I first implemented the model using the Keras Functional API, with the following code:
def define_tfp_encoder(latent_dim, n_inputs=2, kl_weight=1):
    prior = tfd.MultivariateNormalDiag(loc=tf.zeros(latent_dim))
    input_x = Input((n_inputs,))
    input_c = Input((1,))
    dense = Dense(25, activation='relu', name='tfpenc/dense_1')(input_x)
    dense = Dense(32, activation='relu', name='tfpenc/dense_2')(dense)
    dense_z_params = Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim), name='tfpenc/z_params')(dense)
    dense_z = tfpl.MultivariateNormalTriL(latent_dim, name='tfpenc/z')(dense_z_params)
    # activity_regularizer=tfpl.KLDivergenceRegularizer(prior)  # weight=kl_weight
    kld = tfpl.KLDivergenceAddLoss(prior, name='tfpenc/kld_add')(dense_z)
    model = Model(inputs=input_x, outputs=kld)
    return model

def define_tfp_decoder(latent_dim, n_inputs=2):
    input_c = Input((1,), name='tfpdec/cond_input')
    input_n = Input((latent_dim,))
    dense = Dense(15, activation='relu', name='tfpdec/dense_1')(input_n)
    dense = Dense(32, activation='relu', name='tfpdec/dense_2')(dense)
    dense = Dense(tfpl.IndependentNormal.params_size(n_inputs), name='tfpdec/output')(dense)
    output = tfpl.IndependentNormal((n_inputs,))(dense)
    model = Model(input_n, output)
    return model

def get_custom_unconditional_vae():
    latent_size = 5
    encoder = define_tfp_encoder(latent_dim=latent_size)
    decoder = define_tfp_decoder(latent_dim=latent_size)
    encoder.trainable = True
    decoder.trainable = True
    x = encoder.input
    z = encoder.output
    out = decoder(z)
    vae = Model(inputs=x, outputs=out)
    vae.compile(loss=lambda x, pred: -pred.log_prob(x), optimizer='adam')
    return encoder, decoder, vae
The VAE model was then fitted and trained for 3000 epochs.
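For reference, a minimal sketch of that fitting step (the toy quadratic data is an assumption, since the exact dataset isn't shown):

import numpy as np

x = np.linspace(-1, 1, 1000).astype("float32")
data = np.stack([x, x ** 2], axis=1)  # assumed toy quadratic, shape (1000, 2)

encoder, decoder, vae = get_custom_unconditional_vae()
vae.fit(data, data, epochs=3000, batch_size=64, verbose=0)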
However, it produced only garbage, even for a very simple quadratic target function.
Here is the strange part: when I create the exact same model using the Sequential API, it works as expected and the desired function is approximated nicely.
And it gets even stranger: after running tf.random.set_seed(None), the model created with the Functional API also works as expected. What am I missing or not understanding correctly? I assume there are some differences regarding tf.random.set_seed between the Sequential and the Functional API, but...?
Thanks in advance,
codax
EDIT: I forgot to mention that setting a seed (e.g. tf.random.set_seed(123)) leads to identical results for both models, with neither fitting the desired function.
The model is for binary classification.
This is my model:
im_input = layers.Input(shape=[160, 160, 3])
x = layers.Conv2D(30, (3, 3), strides=(2, 2), padding='same')(im_input)
z = layers.DepthwiseConv2D((3, 3), strides=2, padding='same', depth_multiplier=10)(im_input)
x = layers.ReLU()(x)
z = layers.ReLU()(z)
x = layers.Conv2D(60, (3, 3), strides=(2, 2), padding='same')(x)
z = layers.Conv2D(60, (3, 3), strides=2, padding='same')(z)
x = layers.ReLU()(x)
z = layers.ReLU()(z)
x = layers.Concatenate()([x, z])
x = layers.Conv2D(120, (3, 3), strides=2, padding='same')(x)
x = layers.ReLU()(x)
x = layers.Conv2D(200, (3, 3), strides=2, padding='same')(x)
x = layers.ReLU()(x)
x = layers.Conv2D(400, (3, 3), strides=1, padding='same')(x)
x = layers.ReLU()(x)
x = layers.Conv2D(900, (3, 3), strides=1, padding='same')(x)
x = layers.Flatten()(x)
# x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(100, activation='relu')(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(20, activation='relu')(x)
out = layers.Dense(1, activation='sigmoid')(x)
smodel = tf.keras.Model(inputs=im_input, outputs=out, name="myModel2")
smodel.summary()
And this is the loss function:
cross_entropy = tf.keras.losses.BinaryCrossentropy()
And the optimizer:
optimizer = tf.keras.optimizers.SGD(0.001)
Any suggestions for the optimizer?
Why is this model's loss not decreasing? Is there something wrong with the model? Someone please help.
Instead of SGD, you should try the Adam optimizer.
Also, increase the units in the Dense layer, as this is the final representation of the data.
Finally, use fewer filters; cap them at 512 at most.
If your input size is small, reduce the number of layers as well.
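A minimal sketch of the suggested optimizer swap, reusing smodel from the question:

# swap SGD for Adam; everything else stays the same
smodel.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy'],
)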
Try changing the optimizer to Adam; I don't think there is anything wrong with the code.
Also try changing the dense layers: after Flatten, use a single Dense layer with 512 units and then your final output layer directly.
You don't need so many dense layers.
Also, can you post your loss value? If it's too large, then maybe there is something wrong with your training labels.
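A sketch of the suggested head (my reading of the advice above, with the 512 units as suggested):

# replace Dense(100) -> Dropout -> Dense(20) with a single wider layer
x = layers.Flatten()(x)
x = layers.Dense(512, activation='relu')(x)
out = layers.Dense(1, activation='sigmoid')(x)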
Essentially, I am training an LSTM model with Keras, but when I save it, its size is about 100 MB. My purpose is to deploy the model to a web server to serve as an API, but the server cannot run it because the model is too big. After analyzing all the parameters in my model, I found that it has 20,000,000 parameters, of which 15,000,000 are non-trainable pretrained word embeddings. Is there any way to shrink the model by removing those 15,000,000 parameters while still preserving its performance?
Here is my code for the model:
def LSTModel(input_shape, word_to_vec_map, word_to_index):
    sentence_indices = Input(input_shape, dtype="int32")
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)
    X = LSTM(256, return_sequences=True)(embeddings)
    X = Dropout(0.5)(X)
    X = LSTM(256, return_sequences=False)(X)
    X = Dropout(0.5)(X)
    X = Dense(NUM_OF_LABELS)(X)
    X = Activation("softmax")(X)
    model = Model(inputs=sentence_indices, outputs=X)
    return model
Define the layers you want to save outside the functions and name them. Then create two functions, foo() and bar(). foo() will have the original pipeline, including the embedding layer. bar() will have only the part of the pipeline AFTER the embedding layer; instead of the embedding layer, it defines a new Input() layer with the dimensions of your embeddings:
lstm1 = LSTM(256, return_sequences=True, name='lstm1')
lstm2 = LSTM(256, return_sequences=False, name='lstm2')
dense = Dense(NUM_OF_LABELS, name='Susie Dense')

def foo(...):
    sentence_indices = Input(input_shape, dtype="int32")
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)
    X = lstm1(embeddings)
    X = Dropout(0.5)(X)
    X = lstm2(X)
    X = Dropout(0.5)(X)
    X = dense(X)
    X = Activation("softmax")(X)
    return Model(inputs=sentence_indices, outputs=X)

def bar(...):
    embeddings = Input(embedding_shape, dtype="float32")
    X = lstm1(embeddings)
    X = Dropout(0.5)(X)
    X = lstm2(X)
    X = Dropout(0.5)(X)
    X = dense(X)
    X = Activation("softmax")(X)
    return Model(inputs=embeddings, outputs=X)  # the input here is the embeddings tensor, not sentence_indices

foo_model = foo(...)
bar_model = bar(...)

foo_model.fit(...)
bar_model.save_weights(...)
Now, you train the original foo() model; because the named layers are shared objects, bar() holds the same trained weights. Then you can save the weights of the reduced bar() model. When loading the weights, don't forget to specify the by_name=True parameter:
foo_model.load_weights('bar_model.h5', by_name=True)
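A hedged sketch of the serving side (my addition; the layer name 'embedding' and the batch variable are assumptions): compute the embeddings with a small lookup model and feed them to the reduced bar_model:

# extract the embedding sub-model from the full pipeline
# ('embedding' is an assumed layer name; check foo_model.summary() for the real one)
embedding_model = Model(inputs=foo_model.input,
                        outputs=foo_model.get_layer('embedding').output)

bar_model.load_weights('bar_model.h5', by_name=True)

embeddings = embedding_model.predict(sentence_indices_batch)  # hypothetical input batch
predictions = bar_model.predict(embeddings)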