TF2, Tensorflow Probability random seed generator and VAE - python

I have been playing around with Variational Autoencoders for a few days, trying to fit a small toy function with a small model.
I first implemented the model using the Keras Functional API, with the following code:
# Imports assumed by the snippet (tfd and tfpl are the usual TFP aliases)
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
tfd = tfp.distributions
tfpl = tfp.layers
def define_tfp_encoder(latent_dim, n_inputs=2, kl_weight=1):
    prior = tfd.MultivariateNormalDiag(loc=tf.zeros(latent_dim))
    input_x = Input((n_inputs,))
    input_c = Input((1,))
    dense = Dense(25, activation='relu', name='tfpenc/dense_1')(input_x)
    dense = Dense(32, activation='relu', name='tfpenc/dense_2')(dense)
    dense_z_params = Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim), name='tfpenc/z_params')(dense)
    dense_z = tfpl.MultivariateNormalTriL(latent_dim, name='tfpenc/z')(dense_z_params)
    #activity_regularizer=tfpl.KLDivergenceRegularizer(prior) # weight=kl_weight
    kld = tfpl.KLDivergenceAddLoss(prior, name='tfpenc/kld_add')(dense_z)
    model = Model(inputs=input_x, outputs=kld)
    return model
def define_tfp_decoder(latent_dim, n_inputs=2):
    input_c = Input((1,), name='tfpdec/cond_input')
    input_n = Input((latent_dim,))
    dense = Dense(15, activation='relu', name='tfpdec/dense_1')(input_n)
    dense = Dense(32, activation='relu', name='tfpdec/dense_2')(dense)
    dense = Dense(tfpl.IndependentNormal.params_size(n_inputs), name='tfpdec/output')(dense)
    output = tfpl.IndependentNormal((n_inputs,))(dense)
    model = Model(input_n, output)
    return model
def get_custom_unconditional_vae():
    latent_size = 5
    encoder = define_tfp_encoder(latent_dim=latent_size)
    decoder = define_tfp_decoder(latent_dim=latent_size)
    encoder.trainable = True
    decoder.trainable = True
    x = encoder.input
    z = encoder.output
    out = decoder(z)
    vae = Model(inputs=x, outputs=out)
    # Loss is the negative log-likelihood of the data under the decoder's output distribution
    vae.compile(loss=lambda x, pred: -pred.log_prob(x), optimizer='adam')
    return encoder, decoder, vae
The VAE model was then trained for 3000 epochs.
However, it produced only garbage, even though the target was a very simple quadratic function.
Here is the strange part:
When I create the exact same model using the Sequential API, it works as expected and the desired function is approximated nicely.
And it gets even stranger:
After running tf.random.set_seed(None), the model created with the Functional API also works as expected. What am I missing or misunderstanding? I assume there are some differences in how tf.random.set_seed behaves with the Sequential vs. the Functional API, but which?
Thanks in advance,
codax
EDIT: I forgot to mention that setting a seed (e.g. tf.random.set_seed(123)) leads to identical results for both models, and neither fits the desired function.
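For reference, the quantity that the compile loss (-pred.log_prob(x)) and the KL layer add up to can be sketched in plain NumPy. This is only an illustration under simplifying assumptions: the posterior here is diagonal, whereas the question uses a full-covariance MultivariateNormalTriL, and the helper names gaussian_nll and kl_diag_to_standard_normal are made up for this sketch.

```python
import numpy as np

def gaussian_nll(x, mean, std):
    """Negative log-likelihood of x under independent normals, summed over features."""
    var = std ** 2
    return np.sum(0.5 * np.log(2 * np.pi * var) + (x - mean) ** 2 / (2 * var), axis=-1)

def kl_diag_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * np.log(sigma), axis=-1)

# One sample with two features, reconstructed perfectly with unit standard deviation.
x = np.array([[0.5, 0.25]])
nll = gaussian_nll(x, mean=x, std=np.ones_like(x))

# A posterior that equals the prior has zero KL divergence (latent size 5 as in the question).
kl = kl_diag_to_standard_normal(np.zeros((1, 5)), np.ones((1, 5)))

# The training objective is the sum of both terms (the negative ELBO).
negative_elbo = nll + kl
```

Both terms depend on a random draw of z during training, which is one reason the seed can influence the result.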

Related

Performing Differentiation wrt input within a keras model for use in loss

Is there any layer in Keras that calculates the derivative with respect to the input? For example, if x is the input and the first layer computes f(x), then the next layer's output should be f'(x). There are multiple questions here about this topic, but all of them compute the derivative outside the model. In essence, I want to create a neural network whose loss function involves both the Jacobian and the Hessian with respect to the inputs.
I've tried the following:
import keras.backend as K
def create_model():
    x = keras.Input(shape = (10,))
    layer = Dense(1, activation = "sigmoid")
    output = layer(x)
    jac = K.gradients(output, x)
    model = keras.Model(inputs=x, outputs=jac)
    return model
model = create_model()
X = np.random.uniform(size = (3, 10))
This gives the error: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
So I tried using that:
def create_model2():
    with tf.GradientTape() as tape:
        x = keras.Input(shape = (10,))
        layer = Dense(1, activation = "sigmoid")
        output = layer(x)
    jac = tape.gradient(output, x)
    model = keras.Model(inputs=x, outputs=jac)
    return model
model = create_model2()
X = np.random.uniform(size = (3, 10))
but this tells me: 'KerasTensor' object has no attribute '_id'
Both of these methods work fine outside the model. My end goal is to use the Jacobian and Hessian in the loss function, so alternative approaches would also be appreciated.
I am not sure exactly what you want to do, but you could try a custom Keras layer with tf.gradients:
import tensorflow as tf
tf.random.set_seed(111)
class GradientLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(GradientLayer, self).__init__()
        self.dense = tf.keras.layers.Dense(1, activation = "sigmoid")
    # tf.gradients only works in graph mode, so the call is wrapped in tf.function
    @tf.function
    def call(self, inputs):
        outputs = self.dense(inputs)
        return tf.gradients(outputs, inputs)
def create_model2():
    gradient_layer = GradientLayer()
    inputs = tf.keras.layers.Input(shape = (10,))
    outputs = gradient_layer(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model
model = create_model2()
X = tf.random.uniform((3, 10))
print(model(X))
tf.Tensor(
[[-0.07935508 -0.12471244 -0.0702782 -0.06729251 0.14465885 -0.0818079
-0.08996294 0.07622238 0.11422144 -0.08126545]
[-0.08666676 -0.13620329 -0.07675356 -0.07349276 0.15798753 -0.08934557
-0.09825202 0.08324542 0.12474566 -0.08875315]
[-0.08661086 -0.13611545 -0.07670406 -0.07344536 0.15788564 -0.08928795
-0.09818865 0.08319173 0.12466521 -0.08869591]], shape=(3, 10), dtype=float32)
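To make clear what such a gradient layer returns for a single sigmoid unit, here is a small NumPy sketch (not part of the original answer) that computes the same derivative analytically and checks it against finite differences. The weights, bias, and helper names are arbitrary stand-ins chosen for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_sigmoid(x, w, b):
    """Forward pass of a single-unit dense layer with sigmoid activation."""
    return sigmoid(x @ w + b)                      # shape (batch,)

def dense_sigmoid_grad(x, w, b):
    """Analytic gradient of each sample's output with respect to its own input."""
    s = dense_sigmoid(x, w, b)                     # shape (batch,)
    return (s * (1.0 - s))[:, None] * w[None, :]   # shape (batch, n_features)

rng = np.random.default_rng(0)
w = rng.normal(size=10)     # hypothetical kernel
b = 0.1                     # hypothetical bias
x = rng.uniform(size=(3, 10))

analytic = dense_sigmoid_grad(x, w, b)

# Central finite differences as an independent check.
eps = 1e-6
numeric = np.zeros_like(x)
for j in range(x.shape[1]):
    step = np.zeros(x.shape[1])
    step[j] = eps
    numeric[:, j] = (dense_sigmoid(x + step, w, b) - dense_sigmoid(x - step, w, b)) / (2 * eps)

max_abs_error = np.max(np.abs(analytic - numeric))
```

Each row of the result is the weight vector scaled by s(1 - s) for that sample, which is consistent with the rows of the printed output above being close to multiples of one another.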

Tensorflow: Use model inside another model as layer

I want to use a classification model inside another model as a layer, since I thought that Keras models can also be used as layers. This is the code of the first model:
cencoder_inputs = keras.layers.Input(shape=[pad_len], dtype=np.int32)
ccondi_input = keras.layers.Input(shape=[1], dtype=np.int32)
ccondi_layer = tf.keras.layers.concatenate([cencoder_inputs, ccondi_input], axis=1)
cembeddings = keras.layers.Embedding(vocab_size, 4)
cencoder_embeddings = cembeddings(ccondi_layer)
clstm = keras.layers.LSTM(128)(cencoder_embeddings)
cout_layer = keras.layers.Dense(16, activation="softmax")(clstm)
classification_model = keras.Model(inputs=[cencoder_inputs, ccondi_input], outputs=[cout_layer])
classification_model.compile(optimizer="Nadam", loss="sparse_categorical_crossentropy", metrics=["accuracy"], experimental_run_tf_function=False)
I train this model, save it, reload it as class_model, and set trainable=False.
This is the code of my model, which should use the model above as a layer:
encoder_inputs = keras.layers.Input(shape=[pad_len], dtype=np.int32)
decoder_inputs = keras.layers.Input(shape=[pad_len], dtype=np.int32)
condi_input = keras.layers.Input(shape=[1], dtype=np.int32)
class_layer = class_model((encoder_inputs, condi_input))
# That's how I use the class model. Compilation goes fine so far
class_pred_layer = keras.layers.Lambda(lambda x: tf.reshape(tf.cast(tf.keras.backend.argmax(x, axis=1), dtype=tf.int32),shape=(tf.shape(encoder_inputs)[0],1)))(class_layer)
# Lambda and reshape layer, so I get 1 prediction per batch as integer
condi_layer = tf.keras.layers.concatenate([encoder_inputs, condi_input, class_pred_layer], axis=1)
embeddings = keras.layers.Embedding(vocab_size, 2)
encoder_embeddings = embeddings(condi_layer)
decoder_embeddings = embeddings(decoder_inputs)
encoder_1 = keras.layers.LSTM(64, return_sequences=True, return_state=True)
encoder_lstm_bidirectional_1 = keras.layers.Bidirectional(encoder_1)
encoder_output, state_h1, state_c1, state_h2, state_c2 = encoder_lstm_bidirectional_1(encoder_embeddings)
encoder_state = [Concatenate()([state_h1, state_h2]), Concatenate()([state_c1, state_c2])]
decoder_lstm = keras.layers.LSTM(64*2, return_sequences=True, return_state=True, name="decoder_lstm")
print(encoder_output.shape)
decoder_outputs,decoder_fwd_state, decoder_back_state = decoder_lstm(decoder_embeddings,initial_state=encoder_state)
print(decoder_outputs.shape)
attn_layer = AttentionLayer(name="attention_layer")
attn_out, attn_states = attn_layer([encoder_output, decoder_outputs])
decoder_concat_input = Concatenate(axis=-1, name="decoder_concat_layer")([decoder_outputs, attn_out])
decoder_dense_out = keras.layers.TimeDistributed(keras.layers.Dense(vocab_size, activation="softmax"))
decoder_outputs = decoder_dense_out(decoder_concat_input)
model = keras.Model(inputs=[encoder_inputs, decoder_inputs, condi_input], outputs=[decoder_outputs])
When I execute model.fit(), I receive the following error:
Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor 'input_21:0' shape=(None, 35) dtype=int32>]
I thought trained models could easily be used as layers. What am I doing wrong?
I also already looked into this post, but it didn't help me either.
Thanks for your help!
OK, I will do two things: (1) give you a working example in which I had to call a model inside another model, and (2) try to give you a hint about what your problem could be (I can't really understand the code, but I had the same error in the past).
1.
This is an example of a model that uses another model as a hidden layer:
def model_test(input_shape, sub_model):
    inputs = Input(input_shape)
    eblock_1_1 = dense_convolve(inputs, n_filters=growth_rate)
    eblock_1_2 = dense_convolve(eblock_1_1, n_filters=growth_rate)
    dblock_1_1 = dense_convolve(eblock_1_2, n_filters=growth_rate)
    dblock_1_2 = dense_convolve(dblock_1_1, n_filters=growth_rate)
    final_convolution = Conv3D(2, (1, 1, 1), padding='same', activation='relu')(dblock_1_2)
    intermedio = sub_model(final_convolution)
    layer = LeakyReLU(alpha=0.3)(intermedio)
    model = Model(inputs=inputs, outputs=layer)
    return model
I call it like this:
with strategy.scope():
    sub_model = tf.keras.models.load_model('link_to_the_model')
    sub_model.trainable = False
    model = model_test(INPUT_SIZE, sub_model)
    model.compile(optimizer=Adam(lr=0.1),
                  loss=tf.keras.losses.MeanSquaredError(),
                  metrics=None)
I just tested this on Google Colab with Keras.
2.
I had the same error some time ago when I tried to call a function that runs eagerly inside a model. The problem is that training is executed in graph mode (you can find some information online at https://towardsdatascience.com/eager-execution-vs-graph-execution-which-is-better-38162ea4dbf6).
If the problem is the call of the model, try what I did: pass the model as a parameter, call it inside the function on the output of a layer, and use it like any other layer.
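One detail in the question that may be worth checking, although the thread does not confirm it as the cause: the Lambda layer reads the batch size from encoder_inputs, a tensor defined outside the lambda, instead of from its own argument. The NumPy sketch below shows the same argmax-and-reshape step written so that it depends only on its input; the function name and the sample arrays are made up for illustration.

```python
import numpy as np

def class_prediction_column(probs):
    """Turn (batch, n_classes) probabilities into a (batch, 1) int32 column of class ids."""
    # reshape(-1, 1) infers the batch size from the array itself,
    # so no outside tensor is needed to build the target shape.
    return np.argmax(probs, axis=1).astype(np.int32).reshape(-1, 1)

probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])
tokens = np.array([[4, 5, 6],
                   [7, 8, 9]], dtype=np.int32)
condition = np.array([[1], [0]], dtype=np.int32)

pred = class_prediction_column(probs)

# Same concatenation along axis 1 as in the question's condi_layer.
combined = np.concatenate([tokens, condition, pred], axis=1)
```

In TensorFlow the analogous reshape would use a shape of (-1, 1) on the lambda's own argument.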

Why isn't my loss function zero when using predictions as training examples?

I am building a neural network to control a pan-tilt gimbal using ROS, OpenCV and Keras. The state input consists of position errors and their derivatives, and my approach is to add to the output command the position error that resulted from the movement. So my examples are collected as
input(t) -> output(t) + error(t+1), and my understanding is that updates to the network will stop when the error is small.
However, after the network failed to converge, I tried removing the error term to verify that the loss is then zero, thinking that whatever the current state of the network, I am passing it training examples that it already predicts. To my surprise, this still gives me a non-zero loss. I feel there is something fundamental that I am missing about how the network trains.
I have put together a minimal working example with a random input, removing the tilt axis for simplicity but otherwise keeping the basic neural network architecture in case that is where my problem lies.
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Dropout, Activation
from tensorflow.keras.models import Model
from tensorflow.keras import optimizers
from tensorflow.keras.regularizers import l2
import numpy as np
input_dims = 4
batch_size = 4
state_input = Input(shape=input_dims)
initializer = 'glorot_uniform'
x = Dense(128, kernel_initializer = initializer, kernel_regularizer=l2(0.1),
bias_regularizer=l2(0.01), name='neural_inputLayer')(state_input)
x = BatchNormalization()(x)
x = Activation('softplus')(x)
x = Dropout(0.3)(x)
x_pan = Dense(128, activation='softplus', kernel_initializer = initializer,
kernel_regularizer=l2(0.1), bias_regularizer=l2(0.01), name='neural_PantLayer')(x)
mu_pan = Dense(1, activation='linear',kernel_initializer = initializer, kernel_regularizer=l2(0.1),
bias_regularizer=l2(0.01), name='neural_ctrl_output_mu_pan')(x_pan)
model = Model(inputs=state_input, outputs=mu_pan)
OPTIMIZER = optimizers.Nadam(learning_rate=0.01, beta_1=0.9, beta_2=0.999, clipnorm=0.5,
clipvalue=0.1)
model.compile(optimizer=OPTIMIZER, loss='mse')
model.summary()
test_inputs = np.array([])
test_outputs = np.array([])
for i in range(batch_size):
    test_input = np.random.rand(input_dims)
    test_output = model.predict(test_input.reshape(1,input_dims))
    #save IO pairs
    if i == 0:
        test_inputs = test_input
    else:
        test_inputs = np.vstack((test_inputs, test_input))
    test_outputs = np.append(test_outputs,test_output)
test_loss = model.fit(test_inputs,
                      test_outputs,batch_size=batch_size,verbose=True,shuffle=True,epochs=10)
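The thread contains no answer to this question. As a hedged illustration, two general Keras behaviors would each keep the reported loss above zero in a setup like this: kernel and bias regularizers add a penalty to the loss shown during fit, and Dropout (like BatchNormalization) behaves differently during training than during predict. The NumPy sketch below mimics both effects with a hypothetical one-layer linear model; it is not the network from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-layer linear model standing in for the network in the question.
w = rng.normal(size=(4, 1))
x = rng.uniform(size=(4, 4))

def predict(x, w):
    """Inference-mode forward pass: no dropout is applied."""
    return x @ w

def train_forward(x, w, rate, rng):
    """Training-mode forward pass with inverted dropout on the inputs."""
    mask = rng.uniform(size=x.shape) >= rate
    return (x * mask / (1.0 - rate)) @ w

targets = predict(x, w)                        # targets are the model's own predictions

mse_inference = np.mean((predict(x, w) - targets) ** 2)
l2_penalty = 0.1 * np.sum(w ** 2)              # same form as l2(0.1) on a kernel
mse_training = np.mean((train_forward(x, w, 0.3, rng) - targets) ** 2)

# Even with a perfect data fit, the penalty keeps the total loss positive.
total_inference_loss = mse_inference + l2_penalty
```

Here mse_inference is zero, while both the penalized loss and the training-mode error are positive.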

Keras model graph is disconnected when trying to use a shared model

I'm trying to train a neural network in Keras, but I'm getting an error saying that there are no gradients for any variable, which may imply that the graph is disconnected.
Below is a stripped-down version of the code with only the part related to the model definition.
The model accepts two inputs that are fed, one at a time, to the same shared model: the encoder.
The two outputs of the encoder are then concatenated and sent to a dense layer to compute the final output.
I don't understand what's wrong; it looks as if instantiating the encoder creates additional trainable variables that are not used anywhere.
For the network layout I took inspiration from the official Keras docs:
https://keras.io/guides/functional_api/#all-models-are-callable-just-like-layers
def _get_encoder(self, model_input_shape):
    encoder_input = Input(shape=model_input_shape)
    x = encoder_input
    x = Conv2D(32, (3, 3), strides=1, padding="same")(x)
    x = BatchNormalization(axis=-1)(x)
    x = LeakyReLU(alpha=0.1)(x)
    latent_z = Flatten()(x)
    latent_z = Dense(self.latent_dim)(latent_z)
    encoder = Model(
        encoder_input,
        latent_z,
        name='encoder'
    )
    return encoder
def build_model(self):
    model_input_shape = (self.height, self.width, self.depth)
    model_input_1 = Input(shape=model_input_shape)
    model_input_2 = Input(shape=model_input_shape)
    self.encoder = self._get_encoder(model_input_shape)
    z_1 = self.encoder(model_input_1)
    z_2 = self.encoder(model_input_2)
    x = concatenate([z_1, z_2])
    prediction = Dense(1, activation='sigmoid')(x)
    self.network = Model(
        inputs=[model_input_1, model_input_2],
        outputs=[prediction],
        name = 'network'
    )
network.network.compile(
    optimizer='rmsprop',
    loss='mse',
    metrics=['mae'])
H = network.network.fit(
    x=train_gen,
    validation_data=test_gen,
    epochs=EPOCHS,
    steps_per_epoch=STEPS,
    validation_steps=STEPS)
I found the problem. My custom data generator was returning a list [x, y] instead of a tuple (x, y), where x is the input and y the target. A simple mistake that caused seemingly unrelated errors.
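A minimal sketch of a generator with the shape the fix describes, assuming two image inputs and one binary target as in the model above. The generator name, batch size, and image shape are placeholders, and the data is random.

```python
import numpy as np

def pair_batch_generator(batch_size, image_shape, rng):
    """Yield ((x1, x2), y) batches: the outer container is a tuple, not a list."""
    while True:
        x1 = rng.uniform(size=(batch_size, *image_shape)).astype(np.float32)
        x2 = rng.uniform(size=(batch_size, *image_shape)).astype(np.float32)
        y = rng.integers(0, 2, size=(batch_size, 1)).astype(np.float32)
        # Inputs are grouped together; the target is the second element of the tuple.
        yield (x1, x2), y

gen = pair_batch_generator(batch_size=8, image_shape=(32, 32, 3),
                           rng=np.random.default_rng(0))
batch = next(gen)
inputs, targets = batch
```

The outer tuple is what separates inputs from targets; the two input arrays are grouped inside the first element.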

Gradcam with guided backprop for transfer learning in Tensorflow 2.0

I get an error using gradient visualization with transfer learning in TF 2.0. The gradient visualization works on a model that does not use transfer learning.
When I run my code I get the error:
assert str(id(x)) in tensor_dict, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("block5_conv3/Identity:0", shape=(None, 14, 14, 512), dtype=float32)
The code below reproduces the error. I think there's an issue with the naming conventions or with connecting the inputs and outputs of the base model, vgg16, to the layers I'm adding. I really appreciate your help!
"""
Broken example when grad_model is created.
"""
!pip uninstall tensorflow
!pip install tensorflow==2.0.0
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
import matplotlib.pyplot as plt
IMAGE_PATH = '/content/cat.3.jpg'
LAYER_NAME = 'block5_conv3'
model_layer = 'vgg16'
CAT_CLASS_INDEX = 281
imsize = (224,224,3)
img = tf.keras.preprocessing.image.load_img(IMAGE_PATH, target_size=(224, 224))
plt.figure()
plt.imshow(img)
img = tf.io.read_file(IMAGE_PATH)
img = tf.image.decode_jpeg(img)
img = tf.cast(img, dtype=tf.float32)
# img = tf.keras.preprocessing.image.img_to_array(img)
img = tf.image.resize(img, (224,224))
img = tf.reshape(img, (1, 224,224,3))
input = layers.Input(shape=(imsize[0], imsize[1], imsize[2]))
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
input_shape=(imsize[0], imsize[1], imsize[2]))
# base_model.trainable = False
flat = layers.Flatten()
dropped = layers.Dropout(0.5)
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
fc1 = layers.Dense(16, activation='relu', name='dense_1')
fc2 = layers.Dense(16, activation='relu', name='dense_2')
fc3 = layers.Dense(128, activation='relu', name='dense_3')
prediction = layers.Dense(2, activation='softmax', name='output')
for layr in base_model.layers:
    if ('block5' in layr.name):
        layr.trainable = True
    else:
        layr.trainable = False
x = base_model(input)
x = global_average_layer(x)
x = fc1(x)
x = fc2(x)
x = prediction(x)
model = tf.keras.models.Model(inputs = input, outputs = x)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
loss='binary_crossentropy',
metrics=['accuracy'])
This portion of the code is where the error occurs. I'm not sure what the correct way to specify the inputs and outputs is.
# Create a graph that outputs target convolution and output
grad_model = tf.keras.models.Model(inputs = [model.input, model.get_layer(model_layer).input],
outputs=[model.get_layer(model_layer).get_layer(LAYER_NAME).output,
model.output])
print(model.get_layer(model_layer).get_layer(LAYER_NAME).output)
# Get the score for target class
with tf.GradientTape() as tape:
    conv_outputs, predictions = grad_model(img)
    loss = predictions[:, 1]
The section below plots the Grad-CAM heatmap.
print('Prediction shape:', predictions.get_shape())
# Extract filters and gradients
output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]
# Apply guided backpropagation
gate_f = tf.cast(output > 0, 'float32')
gate_r = tf.cast(grads > 0, 'float32')
guided_grads = gate_f * gate_r * grads
# Average gradients spatially
weights = tf.reduce_mean(guided_grads, axis=(0, 1))
# Build a ponderated map of filters according to gradients importance
cam = np.ones(output.shape[0:2], dtype=np.float32)
for index, w in enumerate(weights):
    cam += w * output[:, :, index]
# Heatmap visualization
cam = cv2.resize(cam.numpy(), (224, 224))
cam = np.maximum(cam, 0)
heatmap = (cam - cam.min()) / (cam.max() - cam.min())
cam = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET)
output_image = cv2.addWeighted(cv2.cvtColor(img.astype('uint8'), cv2.COLOR_RGB2BGR), 0.5, cam, 1, 0)
plt.figure()
plt.imshow(output_image)
plt.show()
I also asked the TensorFlow team about this on GitHub at https://github.com/tensorflow/tensorflow/issues/37680.
I figured it out. If you build the model by extending the VGG16 base model with your own layers, rather than inserting the base model into a new model as if it were a layer, then it works.
First set up the model, and be sure to pass input_tensor.
inp = layers.Input(shape=(imsize[0], imsize[1], imsize[2]))
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet', input_tensor=inp,
input_shape=(imsize[0], imsize[1], imsize[2]))
This way we don't need a line like x = base_model(inp) to specify the input; that is already handled by tf.keras.applications.VGG16(...).
Instead of putting the VGG16 base model inside another model, it's easier to do Grad-CAM by adding layers to the base model itself. I take the output of the last layer of VGG16 (with the top removed), which is the pooling layer.
block5_pool = base_model.get_layer('block5_pool')
x = global_average_layer(block5_pool.output)
x = fc1(x)
x = prediction(x)
model = tf.keras.models.Model(inputs = inp, outputs = x)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
loss='binary_crossentropy',
metrics=['accuracy'])
Now, I grab the layer for visualization, LAYER_NAME='block5_conv3'.
# Create a graph that outputs target convolution and output
grad_model = tf.keras.models.Model(inputs = [model.input],
outputs=[model.output, model.get_layer(LAYER_NAME).output])
print(model.get_layer(LAYER_NAME).output)
# Get the score for target class
with tf.GradientTape() as tape:
    predictions, conv_outputs = grad_model(img)
    loss = predictions[:, 1]
print('Prediction shape:', predictions.get_shape())
# Extract filters and gradients
output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]
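Once conv_outputs and grads are available, the remaining heatmap step can be sketched in NumPy as below. This is plain Grad-CAM without the guided-backprop gating used in the question, the function name is made up, and the arrays are random stand-ins for the real activations and gradients of block5_conv3.

```python
import numpy as np

def grad_cam_heatmap(conv_output, grads):
    """Combine feature maps (H, W, C) and their gradients (H, W, C) into a heatmap in [0, 1]."""
    weights = grads.mean(axis=(0, 1))                           # one weight per channel, shape (C,)
    cam = np.tensordot(conv_output, weights, axes=([2], [0]))   # weighted sum over channels, (H, W)
    cam = np.maximum(cam, 0.0)                                  # keep positive evidence only
    span = cam.max() - cam.min()
    if span == 0:
        return np.zeros_like(cam)                               # avoid division by zero on a flat map
    return (cam - cam.min()) / span

rng = np.random.default_rng(0)
conv_output = rng.uniform(size=(14, 14, 512)).astype(np.float32)   # stand-in activations
grads = rng.normal(size=(14, 14, 512)).astype(np.float32)          # stand-in gradients
heatmap = grad_cam_heatmap(conv_output, grads)
```

The resulting 14x14 map would then be resized to the image size and overlaid, as in the plotting code of the question.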
We (I and several team members working on a project) ran into a similar problem with Grad-CAM code that we found in a tutorial.
That code didn't work with a model consisting of the VGG19 base model plus a few extra layers added on top of it. The problem was that the VGG19 base model was inserted as a "layer" inside our model, and apparently the GradCAM code didn't know how to deal with it: we were getting a "Graph disconnected..." error. After some debugging (carried out by another team member, not me), we managed to modify the original code to make it work for this kind of model that contains another model inside it. The idea is to add the inner model as an extra argument of the GradCAM class. Since this may be helpful to others, I am including the modified code below (we also renamed the GradCAM class to My_GradCAM).
class My_GradCAM:
    def __init__(self, model, classIdx, inner_model=None, layerName=None):
        self.model = model
        self.classIdx = classIdx
        self.inner_model = inner_model
        if self.inner_model is None:
            self.inner_model = model
        self.layerName = layerName
[...]
        gradModel = tensorflow.keras.models.Model(inputs=[self.inner_model.inputs],
            outputs=[self.inner_model.get_layer(self.layerName).output,
                     self.inner_model.output])
Then the class can be instantiated by adding the inner model as the extra argument, e.g.:
cam = My_GradCAM(model, None, inner_model=model.get_layer("vgg19"), layerName="block5_pool")
I hope this helps.
Edit: Credit to Mirtha Lucas for doing the debugging and finding the solution.
After a lot of struggle, I have condensed the way to draw the heat map when you are using transfer learning. It follows the official Keras tutorial.
The issue I encountered is that when I try to draw the heat map from my model, the DenseNet is visible only as a single functional layer in my model, so make_gradcam_heatmap cannot find the layers inside that functional layer, as the 5th layer shows.
Therefore, to follow the official Keras documentation, I need to use only the DenseNet as the model for visualization. Here are the steps:
Take the inner model out of your model
dense_model = dense_model.get_layer('densenet121')
Copy the weights from the dense model into a newly instantiated model
inputs = tf.keras.Input(shape=(224, 224, 3))
model = model_builder(weights="imagenet", include_top=True, input_tensor=inputs)
for layer, dense_layer in zip(model.layers[1:], dense_model.layers[1:]):
    layer.set_weights(dense_layer.get_weights())
relu = model.get_layer('relu')
x = tf.keras.layers.GlobalAveragePooling2D()(relu.output)
outputs = tf.keras.layers.Dense(5)(x)
model = tf.keras.models.Model(inputs = inputs, outputs = outputs)
Draw the heat map
preprocess_input = keras.applications.densenet.preprocess_input
img_array = preprocess_input(get_img_array(img_path, size=(224, 224)))
heatmap = make_gradcam_heatmap(img_array, model, 'bn')
plt.matshow(heatmap)
plt.show()
get_img_array, make_gradcam_heatmap and save_and_display_gradcam are left unchanged. Follow the Keras tutorial and you are good to go.
