How to get attention weights in hierarchical model

How to get attention weights in hierarchical model - python

Model :
sequence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
words = embedding_layer(sequence_input)
h_words = Bidirectional(GRU(200, return_sequences=True,dropout=0.2,recurrent_dropout=0.2))(words)
sentence = Attention()(h_words) #with return true
#sentence = Dropout(0.2)(sentence)
sent_encoder = Model(sequence_input, sentence[0])
print(sent_encoder.summary())
document_input = Input(shape=(None, MAX_SENT_LENGTH), dtype='int32')
document_enc = TimeDistributed(sent_encoder)(document_input)
h_sentences = Bidirectional(GRU(100, return_sequences=True))(document_enc)
preds = Dense(7, activation='softmax')(h_sentences)
model = Model(document_input, preds)
Attention layer used:
https://gist.github.com/cbaziotis/6428df359af27d58078ca5ed9792bd6d
with return_attention=True
How can I visualise attention weights for a new input once the model is trained.
What I am trying:
get_3rd_layer_output = K.function([model.layers[0].input,K.learning_phase()],
[model.layers[1].layer.layers[3].output])
and passing a new input but it is giving me error.
Possible reasons:
model.layers() only gives me last layers. I want ot get weights from the Timedistributed part.

You can use the following to display all the layers in your model:
print(model.layers)
Once you know what index number is your Time Distributed layer, say, 3, then use the following to get the config and the layer weights.
g = model_name.layers[3].get_config()
h = model_name.layers[3].get_weights()
print(g)
print(h)

Related

Add new nodes to one of the output layers in a Keras model

I have a custom ResNet model that I define through the Keras Functional API. Also my model has multiple outputs. The last element of the output array is the fully connected dense layer with num_class nodes. I want to be able to increment the number of nodes of this layer. This is the relevant code for the creation of my network:
from tensorflow.keras import layers, models, Input, regularizers
res = []
inputs = Input(shape=(height, width, channels), name='data')
x = MyLayer()(inputs)
# ... other layers
x = MyLayer()(x)
res.append(x)
# ... other layers
x = layers.Dense(num_class, name='fc1', use_bias=True)(x)
res.append(x)
model = models.Model(inputs=inputs, outputs=[res[-2], res[-3], res[-4], res[-1]])
At the question Adding new nodes to output layer in Keras I found an answer similar to what I'm searching for that I'm adding here below:
def add_outputs(self, n_new_outputs):
#Increment the number of outputs
self.n_outputs += n_new_outputs
weights = self.model.get_layer('fc8').get_weights()
#Adding new weights, weights will be 0 and the connections random
shape = weights[0].shape[0]
weights[1] = np.concatenate((weights[1], np.zeros(n_new_outputs)), axis=0)
weights[0] = np.concatenate((weights[0], -0.0001 * np.random.random_sample((shape, n_new_outputs)) + 0.0001), axis=1)
#Deleting the old output layer
self.model.layers.pop()
last_layer = self.model.get_layer('batchnormalization_1').output
#New output layer
out = Dense(self.n_outputs, activation='softmax', name='fc8')(last_layer)
self.model = Model(input=self.model.input, output=out)
#set weights to the layer
self.model.get_layer('fc8').set_weights(weights)
print(weights[0])
However in this question there was only one layer as output and I'm not sure how to replicate the same with my architecture.

This is the solution I've come up with. I assigned the layers that I wanted to keep as output to variables:
from tensorflow.keras import layers, models, Input, regularizers
inputs = Input(shape=(height, width, channels), name='data')
x = MyLayer()(inputs)
# ... other layers
a = MyLayer(name="a")(x)
b = MyLayer(name="b")(a)
c = MyLayer(name="c")(b)
x = layers.Dense(num_class, name='fc1', use_bias=True)(c)
model = models.Model(inputs=inputs, outputs=[c, b, a, x])
And then in order to increase the number of nodes in the last layer just call the function increment_classes. The total number of nodes will be the sum of old_num_class and num_class.
def increment_classes(model, old_num_class, num_class):
weights = model.get_layer("fc1").get_weights()
new_num_class = old_num_class + num_class
# Adding new weights, weights will be 0 and the connections random
shape = weights[0].shape[0]
weights[1] = np.concatenate((weights[1], np.zeros(num_class)), axis=0)
weights[0] = np.concatenate((weights[0], -0.0001 * np.random.random_sample((shape, num_class)) + 0.0001), axis=1)
# Deleting the old dense output layer
model.layers.pop()
# get the output layers
a = model.get_layer("a").output
b = model.get_layer("b").output
c = model.get_layer('c').output
# Replace dense output layer (x)
out = layers.Dense(new_num_class, name='fc1', use_bias=True)(c)
model = models.Model(inputs=model.input, outputs=[c, b, a, out])
# set weights to the layer
model.get_layer('fc1').set_weights(weights)
return model

Error overlapping layer name when loading a keras model

I have trained a composed model on keras, with one training on images using transfer learning from inception_v3 and one training on numerical feature, I had to rename the layers of the two models when creating the composed model to prevent overlapping names and it worked.
i=0
for layer in model.layers:
i+=1
layer._name = 'layer'+str(i) + str("_image")
i=0
for layer in model_structure.layers:
i+=1
layer._name = 'layer'+str(i) + str("_structure")
inputs1 = model.inputs
inputs2 = model_structure.inputs
outputs1 = model.layers[-2].output
outputs2 = model_structure.layers[-2].output
z = Concatenate()([outputs1,outputs2])
z = Dense(256, activation="relu")(z)
z = tf.keras.layers.Dropout(0.2)(z)
outputs = Dense(NUM_CLASSES)(z)
The problem is that once the model is trained and saved, I can't load it because it detects two layers with the same name which shouldn't be the case :
combined_model = load_model('/tf/pvc_data/train_checkpoints/combined_model.h5')
ValueError: The name "tf_op_layer_" is used 2 times in the model. All layer names should be unique.
Any suggestions on how could I load the model ?
Thank you

A suggestion would be to change the model names while loading the models. The following code has not been tested.
model_1 = keras.models.load_model('tf/pvc_data/train_checkpoints/combined_model.h5')
model_1._name = 'inceptionv3_images'
model_2 = keras.models.load_model('tf/pvc_data/train_checkpoints/combined_model.h5')
model_2._name = 'inceptionv3_numerical'
inputs1 = model.inputs
inputs2 = model_structure.inputs
outputs1 = model.layers[-2].output
outputs2 = model_structure.layers[-2].output
merged_output = keras.layers.Add()([outputs1,outputs2])
newModel = Model(inputs=[inputs1 , inputs2], merged_output)
I hope it gives you a starting point to solve your problem.

Gradcam with guided backprop for transfer learning in Tensorflow 2.0

I get an error using gradient visualization with transfer learning in TF 2.0. The gradient visualization works on a model that does not use transfer learning.
When I run my code I get the error:
assert str(id(x)) in tensor_dict, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("block5_conv3/Identity:0", shape=(None, 14, 14, 512), dtype=float32)
When I run the code below it errors. I think there's an issue with the naming conventions or connecting inputs and outputs from the base model, vgg16, to the layers I'm adding. Really appreciate your help!
"""
Broken example when grad_model is created.
"""
!pip uninstall tensorflow
!pip install tensorflow==2.0.0
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
import matplotlib.pyplot as plt
IMAGE_PATH = '/content/cat.3.jpg'
LAYER_NAME = 'block5_conv3'
model_layer = 'vgg16'
CAT_CLASS_INDEX = 281
imsize = (224,224,3)
img = tf.keras.preprocessing.image.load_img(IMAGE_PATH, target_size=(224, 224))
plt.figure()
plt.imshow(img)
img = tf.io.read_file(IMAGE_PATH)
img = tf.image.decode_jpeg(img)
img = tf.cast(img, dtype=tf.float32)
# img = tf.keras.preprocessing.image.img_to_array(img)
img = tf.image.resize(img, (224,224))
img = tf.reshape(img, (1, 224,224,3))
input = layers.Input(shape=(imsize[0], imsize[1], imsize[2]))
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
input_shape=(imsize[0], imsize[1], imsize[2]))
# base_model.trainable = False
flat = layers.Flatten()
dropped = layers.Dropout(0.5)
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
fc1 = layers.Dense(16, activation='relu', name='dense_1')
fc2 = layers.Dense(16, activation='relu', name='dense_2')
fc3 = layers.Dense(128, activation='relu', name='dense_3')
prediction = layers.Dense(2, activation='softmax', name='output')
for layr in base_model.layers:
if ('block5' in layr.name):
layr.trainable = True
else:
layr.trainable = False
x = base_model(input)
x = global_average_layer(x)
x = fc1(x)
x = fc2(x)
x = prediction(x)
model = tf.keras.models.Model(inputs = input, outputs = x)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
loss='binary_crossentropy',
metrics=['accuracy'])
This portion of the code is where the error lies. I'm not sure what is the correct way to label inputs and outputs.
# Create a graph that outputs target convolution and output
grad_model = tf.keras.models.Model(inputs = [model.input, model.get_layer(model_layer).input],
outputs=[model.get_layer(model_layer).get_layer(LAYER_NAME).output,
model.output])
print(model.get_layer(model_layer).get_layer(LAYER_NAME).output)
# Get the score for target class
# Get the score for target class
with tf.GradientTape() as tape:
conv_outputs, predictions = grad_model(img)
loss = predictions[:, 1]
The section below is for plotting a heatmap of gradcam.
print('Prediction shape:', predictions.get_shape())
# Extract filters and gradients
output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]
# Apply guided backpropagation
gate_f = tf.cast(output > 0, 'float32')
gate_r = tf.cast(grads > 0, 'float32')
guided_grads = gate_f * gate_r * grads
# Average gradients spatially
weights = tf.reduce_mean(guided_grads, axis=(0, 1))
# Build a ponderated map of filters according to gradients importance
cam = np.ones(output.shape[0:2], dtype=np.float32)
for index, w in enumerate(weights):
cam += w * output[:, :, index]
# Heatmap visualization
cam = cv2.resize(cam.numpy(), (224, 224))
cam = np.maximum(cam, 0)
heatmap = (cam - cam.min()) / (cam.max() - cam.min())
cam = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET)
output_image = cv2.addWeighted(cv2.cvtColor(img.astype('uint8'), cv2.COLOR_RGB2BGR), 0.5, cam, 1, 0)
plt.figure()
plt.imshow(output_image)
plt.show()
I also asked this to the tensorflow team on github at https://github.com/tensorflow/tensorflow/issues/37680.

I figured it out. If you set up the model extending the vgg16 base model with your own layers, rather than inserting the base model into a new model like a layer, then it works.
First set up the model and be sure to declare the input_tensor.
inp = layers.Input(shape=(imsize[0], imsize[1], imsize[2]))
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet', input_tensor=inp,
input_shape=(imsize[0], imsize[1], imsize[2]))
This way we don't have to include a line like x=base_model(inp) to show what input we want to put in. That's already included in tf.keras.applications.VGG16(...).
Instead of putting this vgg16 base model inside another model, it's easier to do gradcam by adding layers to the base model itself. I grab the output of the last layer of VGG16 (with the top removed), which is the pooling layer.
block5_pool = base_model.get_layer('block5_pool')
x = global_average_layer(block5_pool.output)
x = fc1(x)
x = prediction(x)
model = tf.keras.models.Model(inputs = inp, outputs = x)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
loss='binary_crossentropy',
metrics=['accuracy'])
Now, I grab the layer for visualization, LAYER_NAME='block5_conv3'.
# Create a graph that outputs target convolution and output
grad_model = tf.keras.models.Model(inputs = [model.input],
outputs=[model.output, model.get_layer(LAYER_NAME).output])
print(model.get_layer(LAYER_NAME).output)
# Get the score for target class
# Get the score for target class
with tf.GradientTape() as tape:
predictions, conv_outputs = grad_model(img)
loss = predictions[:, 1]
print('Prediction shape:', predictions.get_shape())
# Extract filters and gradients
output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]

We (I plus a number of team members developing a project) found a similar problem with a code implementing Grad-CAM that we found in a tutorial.
That code didn't work with a model consisting of the base model of VGG19 plus a few extra layers added on top of it. The problem was that the VGG19 base model was inserted as a "layer" inside our model, and apparently the GradCAM code didn't know how to deal with it - we were getting a "Graph disconnected..." error. Then after some debugging (carried out by another team member, not me) we managed to modify the original code to make it work for this kind of model that contains another model inside it. The idea is to add the inner model as an extra argument of the class GradCAM. Since this may be helpful to others I am including the modified code below (we also renamed the GradCAM class as My_GradCAM).
class My_GradCAM:
def __init__(self, model, classIdx, inner_model=None, layerName=None):
self.model = model
self.classIdx = classIdx
self.inner_model = inner_model
if self.inner_model == None:
self.inner_model = model
self.layerName = layerName
[...]
gradModel = tensorflow.keras.models.Model(inputs=[self.inner_model.inputs],
outputs=[self.inner_model.get_layer(self.layerName).output,
self.inner_model.output])
Then the class can be instantiated by adding the inner model as the extra argument, e.g.:
cam = My_GradCAM(model, None, inner_model=model.get_layer("vgg19"), layerName="block5_pool")
I hope this helps.
Edit: Credit to Mirtha Lucas for doing the debugging and finding the solution.

After a lot of struggle, I condense the way to draw the heat map when you are using transfer learning. Here is the keras official tutorial
The issue I encounter is that when I'm trying to draw the heat map
from my model, the densenet can be only seen as functional layer in my
model. So the make_gradcam_heatmap can not figure out the layer that
inside functional layer. As the 5th layer shows.
Therefore, to simulate the Keras official document, I need to only use the densenet as the model for visualization. Here is the step
Only Take out the model from your model
dense_model = dense_model.get_layer('densenet121')
Copy the weight from dense model to your new initiated model
inputs = tf.keras.Input(shape=(224, 224, 3))
model = model_builder(weights="imagenet", include_top=True, input_tensor=inputs)
for layer, dense_layer in zip(model.layers[1:], dense_model.layers[1:]):
layer.set_weights(dense_layer.get_weights())
relu = model.get_layer('relu')
x = tf.keras.layers.GlobalAveragePooling2D()(relu.output)
outputs = tf.keras.layers.Dense(5)(x)
model = tf.keras.models.Model(inputs = inputs, outputs = outputs)
Draw the heat map
preprocess_input = keras.applications.densenet.preprocess_input
img_array = preprocess_input(get_img_array(img_path, size=(224, 224)))
heatmap = make_gradcam_heatmap(img_array, model, 'bn')
plt.matshow(heatmap)
plt.show()
get_img_array, make_gradcam_heatmap and save_and_display_gradcam are kept in still. Follow the keras tutorial then you are good to go.

How to combine two layers in Keras? One with embedding, one with "other" feature

I am trying to combine an embedding layer with a numeric feature layer. I did like:
tensor_feature = Input(shape=(MAX_LENGTH, 3))
tensor_embed = Input(shape=(MAX_LENGTH, ))
tensor_embed = Embedding(len(word2index), 128)(tensor_embed)
merged_tensor = concatenate([tensor_embed, tensor_feature])
model = Bidirectional(LSTM(256, return_sequences=True))(merged_tensor)
model = Bidirectional(LSTM(128, return_sequences=True))(model)
model = TimeDistributed(Dense(len(tag2index)))(model)
model = Activation('softmax')(model)
model = Model(inputs=[tensor_embed,tensor_feature],outputs=model)
Noted that MAX_LENGTH is 82. Unfortunately, I got an error like this:
ValueError: Graph disconnected: cannot obtain value for tensor
Tensor("input_2:0", shape=(?, 82), dtype=float32) at layer "input_2".
The following previous layers were accessed without issue: []
while combining input and output. Please help.

You are overwriting tensor_embed which is an input layer to embedding output and using same again as input in the model. Change your code to
tensor_feature = Input(shape=(MAX_LENGTH, 3))
tensor_embed_feature = Input(shape=(MAX_LENGTH, ))
tensor_embed = Embedding(len(word2index), 128)(tensor_embed_feature)
merged_tensor = concatenate([tensor_embed, tensor_feature])
model = Bidirectional(LSTM(256, return_sequences=True))(merged_tensor)
model = Bidirectional(LSTM(128, return_sequences=True))(model)
model = TimeDistributed(Dense(len(tag2index)))(model)
model = Activation('softmax')(model)
model = Model(inputs=[tensor_embed_feature,tensor_feature],outputs=model)

seq2seq prediction for time series

I want to make a Seq2Seq model for reconstruction purpose. I want a model trained to reconstruct the normal time-series and it is assumed that such a model would do badly to reconstruct the anomalous time-series having not seen them during training.
I have some gaps in my code and also in the understanding. I took this as an orientation and did so far:
traindata: input_data.shape(1000,60,1) and target_data.shape(1000,50,1) with target data being the same training data only in reversed order as sugested in the paper here.
for inference: I want to predict another time series data with the trained model having the shape (3000,60,1). T Now 2 points are open: how do I specify the input data for my training model and how do I build the inference part with the stop condition ? Please correct any mistakes.
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from keras.layers import Dense
num_encoder_tokens = 1#number of features
num_decoder_tokens = 1#number of features
encoder_seq_length = None
decoder_seq_length = None
batch_size = 50
epochs = 40
# same data for training
input_seqs=()#shape (1000,60,1) with sliding windows
target_seqs=()#shape(1000,60,1) with sliding windows but reversed
x= #what has x to be ?
#data for inference
# how do I specify the input data for my other time series ?
# Define training model
encoder_inputs = Input(shape=(encoder_seq_length,
num_encoder_tokens))
encoder = LSTM(128, return_state=True, return_sequences=True)
encoder_outputs = encoder(encoder_inputs)
_, encoder_states = encoder_outputs[0], encoder_outputs[1:]
decoder_inputs = Input(shape=(decoder_seq_length,
num_decoder_tokens))
decoder = LSTM(128, return_sequences=True)
decoder_outputs = decoder(decoder_inputs, initial_state=encoder_states)
decoder_outputs = TimeDistributed(
Dense(num_decoder_tokens, activation='tanh'))(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# Training
model.compile(optimizer='adam', loss='mse')
model.fit([input_seqs,x], target_seqs,
batch_size=batch_size, epochs=epochs)
# Define sampling models for inference
encoder_model = Model(encoder_inputs, encoder_states)
decoder_state_input_h = Input(shape=(100,))
decoder_state_input_c = Input(shape=(100,))
decoder_states = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs = decoder(decoder_inputs,
initial_state=decoder_states)
decoder_model = Model([decoder_inputs] + decoder_states,
decoder_outputs)
# Sampling loop for a batch of sequences
states_values = encoder_model.predict(input_seqs)
stop_condition = False
while not stop_condition:
output_tokens = decoder_model.predict([target_seqs] + states_values)
#what else do I need to include here ?
break

def predict_sequence(infenc, infdec, source, n_steps, cardinality):
# encode
state = infenc.predict(source)
# start of sequence input
target_seq = array([0.0 for _ in range(cardinality)]).reshape(1, 1, cardinality)
# collect predictions
output = list()
for t in range(n_steps):
# predict next char
yhat, h, c = infdec.predict([target_seq] + state)
# store prediction
output.append(yhat[0,0,:])
# update state
state = [h, c]
# update target sequence
target_seq = yhat
return array(output)
You can see that the output from every timestep is fed back to the LSTM cell externally.

You can refer the blog and find how it is done during inference.
https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/

During training, we give the data in a one shot manner. I think you understand that part.
But during the inference time, we can't do like that. We have to give the data at every time step and then return the cell states, hidden states and the loop should continue till the last word is generated

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to get attention weights in hierarchical model - python

Related

Add new nodes to one of the output layers in a Keras model

Error overlapping layer name when loading a keras model

Gradcam with guided backprop for transfer learning in Tensorflow 2.0

How to combine two layers in Keras? One with embedding, one with "other" feature

seq2seq prediction for time series

Categories

Resources