Testing with Concept Activation Vectors (TCAV) for PyTorch - python

I am trying to compute Concept Activation Vectors (TCAV, as described here) for different concepts for my classification model. So far I haven't found working code online for PyTorch models, so I have decided to rewrite it myself. The code I am trying to port is:
def compute_tcav(input_tensor, model, AA, layer_name, filter_indices, optimizer, seed_input=None, wrt_tensor=None, backprop_modifier=None, grad_modifier='absolute'):
    layer_AA = AA[layer_name]
    losses = [
        (ActivationMaximization(layer_AA, filter_indices), -1)
    ]
    opt = optimizer(input_tensor, losses, wrt_tensor=wrt_tensor, norm_grads=False)
    # grads = opt.minimize(seed_input=seed_input, max_iter=1, grad_modifier=grad_modifier, verbose=False)[1]
    return losses  # utils.normalize(grads)[0]
source: https://github.com/maragraziani/iMIMIC-RCVs/blob/master/rcv_utils.py
This is what I have so far:
def ActivationMaximizationLoss(input_AA):
    loss = torch.mean(input_AA)
    return loss

def compute_tcav_pytorch(model, layer_predictions):
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
    optimizer.zero_grad()
    input_AA = torch.from_numpy(layer_predictions['_blocks.6._project_conv'])  # middle layer
    input_AA.requires_grad = True
    loss = ActivationMaximizationLoss(input_AA)
    loss.backward()
    optimizer.step()
    img = input_AA.grad
    return img[0][0]
I am trying to maximise a layer activation from the model and from that get the TCAV vector.
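For reference, here is a minimal sketch of my own (under the assumption that a TCAV score is the gradient of a class logit with respect to a chosen layer's activations, dotted with a concept vector) of how those layer gradients can be captured in PyTorch with a forward hook. The helper name, layer and class_idx are hypothetical, not part of the iMIMIC code:

import torch

def layer_gradients(model, layer, inputs, class_idx):
    """Gradient of the class logit w.r.t. the activations of `layer` (assumes 2-D logits output)."""
    captured = {}

    def hook(module, inp, out):
        out.retain_grad()          # keep the gradient on this non-leaf tensor
        captured['act'] = out

    handle = layer.register_forward_hook(hook)
    logits = model(inputs)                     # forward pass records the activation
    logits[:, class_idx].sum().backward()      # backprop from the chosen class logit
    handle.remove()
    return captured['act'].grad                # same shape as the layer activation

# The TCAV sensitivity would then be something like:
# grads = layer_gradients(model, model._blocks[6]._project_conv, x, class_idx=0)
# score = torch.tensordot(grads.flatten(1), cav, dims=1)  # cav: concept activation vector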

Related

Why might a PyTorch optimizer fail to update its parameters?

I am trying to do a simple loss-minimization for a specific variable coeff using PyTorch optimizers. This variable is supposed to be used as an interpolation coefficient for two vectors w_foo and w_bar to find a third vector, w_target.
w_target = w_foo + coeff * (w_bar - w_foo)
With w_foo and w_bar set as constant, at each optimization step I calculate w_target for the given coeff. Loss is determined from w_target using a fairly complex process beyond the scope of this question.
# w_foo.shape = [1, 16, 512]
# w_bar.shape = [1, 16, 512]
# num_layers = 16
# num_steps = 10000
vgg_loss = VGGLoss()
coeff = torch.randn([num_layers, ])
optimizer = torch.optim.Adam([coeff], lr=initial_learning_rate)
for step in range(num_steps):
    w_target = w_foo + torch.matmul(coeff, (w_bar - w_foo))
    optimizer.zero_grad()
    target_image = generator.synthesis(w_target)
    processed_target_image = process(target_image)
    loss = vgg_loss(processed_target_image, source_image)
    loss.backward()
    optimizer.step()
However, when running this optimizer, coeff does not change from one step to the next, so the optimization is essentially useless. I would like to ask for some advice on what I am doing wrong here.
Edit:
As suggested, I will try to elaborate on the loss function. Essentially, w_target is used to generate an image, and VGGLoss uses a VGG feature extractor to compare this synthetic image with a given exemplar source image.
class VGGLoss(torch.nn.Module):
    def __init__(self, device, vgg):
        super().__init__()
        for param in self.parameters():
            param.requires_grad = True
        self.vgg = vgg  # VGG16 in eval mode

    def forward(self, source, target):
        loss = 0
        source_features = self.vgg(source, resize_images=False, return_lpips=True)
        target_features = self.vgg(target, resize_images=False, return_lpips=True)
        loss += (source_features - target_features).square().sum()
        return loss
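By way of comparison, here is a minimal self-contained sketch (my own toy example, not the asker's setup) of optimizing a single standalone tensor with Adam. The key detail is that the tensor must be created with requires_grad=True; otherwise loss.backward() never populates its .grad and the optimizer silently has nothing to apply:

import torch

# Toy problem: fit an interpolation coefficient so the interpolated vector
# matches a target. All names and shapes here are illustrative.
w_foo = torch.randn(1, 16, 512)
w_bar = torch.randn(1, 16, 512)
w_goal = torch.randn(1, 16, 512)

coeff = torch.randn(16, requires_grad=True)   # leaf tensor the optimizer updates
optimizer = torch.optim.Adam([coeff], lr=1e-2)

for step in range(100):
    optimizer.zero_grad()
    # broadcast coeff over the feature dimension: (1, 16, 1) * (1, 16, 512)
    w_target = w_foo + coeff.view(1, -1, 1) * (w_bar - w_foo)
    loss = (w_target - w_goal).square().mean()
    loss.backward()
    optimizer.step()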

Training Graph Neural Network (GNN) to create Embeddings using spektral

I am working on a Graph Neural Network (GNN) that creates embeddings of an input graph for use in other applications, such as reinforcement learning.
I started from the spektral library example TUDataset classification with GIN and modified it to split the network into two parts: the first part produces the embeddings and the second part produces the classification. My goal is to train this network with supervised learning on a dataset with graph labels (e.g. TUDataset) and, once trained, reuse the first part (embedding generation) in other applications.
I am getting different results from this approach on two datasets: TUDataset shows improved loss and accuracy, whereas another, local dataset shows a significant increase in loss.
Can I get feedback on whether my approach to creating embeddings is appropriate, or suggestions for further improvement?
Here is the code I use to generate graph embeddings:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from spektral.data import DisjointLoader
from spektral.datasets import TUDataset
from spektral.layers import GINConv, GlobalAvgPool
################################################################################
# PARAMETERS
################################################################################
learning_rate = 1e-3 # Learning rate
channels = 128 # Hidden units
layers = 3 # GIN layers
epochs = 300 # Number of training epochs
batch_size = 32 # Batch size
################################################################################
# LOAD DATA
################################################################################
dataset = TUDataset("PROTEINS", clean=True)
# Parameters
F = dataset.n_node_features # Dimension of node features
n_out = dataset.n_labels # Dimension of the target
# Train/test split
idxs = np.random.permutation(len(dataset))
split = int(0.9 * len(dataset))
idx_tr, idx_te = np.split(idxs, [split])
dataset_tr, dataset_te = dataset[idx_tr], dataset[idx_te]
loader_tr = DisjointLoader(dataset_tr, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(dataset_te, batch_size=batch_size, epochs=1)
################################################################################
# BUILD MODEL
################################################################################
class GIN0(Model):
    def __init__(self, channels, n_layers):
        super().__init__()
        self.conv1 = GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
        self.convs = []
        for _ in range(1, n_layers):
            self.convs.append(
                GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
            )
        self.pool = GlobalAvgPool()
        self.dense1 = Dense(channels, activation="relu")

    def call(self, inputs):
        x, a, i = inputs
        x = self.conv1([x, a])
        for conv in self.convs:
            x = conv([x, a])
        x = self.pool([x, i])
        return self.dense1(x)
# Build model
model = GIN0(channels, layers)
model_op = Sequential()
model_op.add(Dropout(0.5, input_shape=(channels,)))
model_op.add(Dense(n_out, activation="softmax"))
opt = Adam(lr=learning_rate)
loss_fn = CategoricalCrossentropy()
################################################################################
# FIT MODEL
################################################################################
#tf.function(input_signature=loader_tr.tf_signature(), experimental_relax_shapes=True)
def train_step(inputs, target):
    with tf.GradientTape(persistent=True) as tape:
        node2vec = model(inputs, training=True)
        predictions = model_op(node2vec, training=True)
        loss = loss_fn(target, predictions)
        loss += sum(model.losses)
    gradients = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))
    gradients2 = tape.gradient(loss, model_op.trainable_variables)
    opt.apply_gradients(zip(gradients2, model_op.trainable_variables))
    acc = tf.reduce_mean(categorical_accuracy(target, predictions))
    return loss, acc
print("Fitting model")
current_batch = 0
model_lss = model_acc = 0
for batch in loader_tr:
    lss, acc = train_step(*batch)
    model_lss += lss.numpy()
    model_acc += acc.numpy()
    current_batch += 1
    if current_batch == loader_tr.steps_per_epoch:
        model_lss /= loader_tr.steps_per_epoch
        model_acc /= loader_tr.steps_per_epoch
        print("Loss: {}. Acc: {}".format(model_lss, model_acc))
        model_lss = model_acc = 0
        current_batch = 0
################################################################################
# EVALUATE MODEL
################################################################################
def tolist(predictions):
    result = []
    for item in predictions:
        result.append((float(item[0]), float(item[1])))
    return result
loss_data = []
print("Testing model")
model_lss = model_acc = 0
for batch in loader_te:
    inputs, target = batch
    node2vec = model(inputs, training=False)
    predictions = model_op(node2vec, training=False)
    predictions_list = tolist(predictions)
    loss_data.append(zip(target, predictions_list))
    model_lss += loss_fn(target, predictions)
    model_acc += tf.reduce_mean(categorical_accuracy(target, predictions))
model_lss /= loader_te.steps_per_epoch
model_acc /= loader_te.steps_per_epoch
print("Done. Test loss: {}. Test acc: {}".format(model_lss, model_acc))
for batchi in loss_data:
    for item in batchi:
        print(list(item), '\n')
Your approach to generating graph embeddings is correct: the GIN0 model will return a vector for a given graph.
This code here, however, seems weird:
gradients = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(gradients, model.trainable_variables))
gradients2 = tape.gradient(loss, model_op.trainable_variables)
opt.apply_gradients(zip(gradients2, model_op.trainable_variables))
What you're doing here is computing the gradient of the same loss twice from the same tape and applying the updates in two separate calls, once for model and once for model_op. That is unnecessary: when you compute the loss in the context of a tf.GradientTape, all computations that went into the final value are tracked, so if you call loss = foo(bar(x)) you can take a single gradient of that loss with respect to the combined variable list, and the weights of both foo and bar will be updated (you then no longer need persistent=True).
Besides this, I don't see any issues with the code, so the difference in results will mostly come down to the local dataset you are using.
Cheers
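To make that last point concrete, here is a sketch (reusing the names from the question) of a train_step that computes one gradient over both sub-models and applies it once:

def train_step(inputs, target):
    # A single (non-persistent) tape and a single gradient call over the
    # concatenated variable lists of the two sub-models.
    with tf.GradientTape() as tape:
        node2vec = model(inputs, training=True)
        predictions = model_op(node2vec, training=True)
        loss = loss_fn(target, predictions) + sum(model.losses)
    variables = model.trainable_variables + model_op.trainable_variables
    gradients = tape.gradient(loss, variables)
    opt.apply_gradients(zip(gradients, variables))
    acc = tf.reduce_mean(categorical_accuracy(target, predictions))
    return loss, acc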

How to employ a learning rate schedule while training a TensorFlow model from scratch

This question is a possible duplicate of another question on Stack Overflow.
However, that answer doesn't show how to modify the optimizer's learning rate using a scheduler (one that could be implemented in plain Python).
I'm training a TensorFlow model from scratch, as explained here. The optimizer is defined as optimizer = keras.optimizers.SGD(learning_rate=1e-3), so the learning rate is fixed at the start. However, I'd like to use a learning rate schedule such as tf.keras.optimizers.schedules.ExponentialDecay. How can I change the optimizer's learning rate from within the training loop?
Please note that I am not using the model.fit in this case.
Any help is much appreciated.
Try this code:
import tensorflow as tf
class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, drop=0.5):
        super(CustomSchedule, self).__init__()
        self.drop = drop

    def __call__(self, step):
        return tf.math.pow(self.drop, step)

def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = tf.losses.mse(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

learning_rate = CustomSchedule(0.5)
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10)
])
input = tf.random.uniform([10, 10])
labels = tf.random.uniform([10, 10], 0, 10, dtype=tf.int32)
train_step(input, labels)
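If you want the built-in schedule mentioned in the question instead, tf.keras.optimizers.schedules.ExponentialDecay can be passed to the optimizer in the same way; the optimizer evaluates it at its own iteration counter, which advances on every apply_gradients call. A small sketch:

import tensorflow as tf

# Exponential decay: lr = 1e-3 * 0.96 ** (step / 1000)
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=1000,
    decay_rate=0.96)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)
# Each optimizer.apply_gradients(...) inside the custom training loop
# increments optimizer.iterations, which the schedule uses as the step.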

Cost remains the same at 0.6932 in Siamese network

I am trying to implement a Siamese network, as in this paper, which uses cross entropy as the loss function.
I am using the STL-10 dataset for training, and instead of the 3-layer network used in the paper I use a VGG-13 CNN, without the final logit layer.
Here is my loss function code:
def loss(pred, true_pred):
    cross_entropy_loss = tf.multiply(-1.0, tf.reduce_mean(tf.add(tf.multiply(true_pred, tf.log(pred)), tf.multiply((1 - true_pred), tf.log(tf.subtract(1.0, pred))))))
    total_loss = tf.add(tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)), cross_entropy_loss, name='total_loss')
    return cross_entropy_loss, total_loss

with tf.device('/gpu:0'):
    h1 = siamese(feed_image1)
    h2 = siamese(feed_image2)
    l1_dist = tf.abs(tf.subtract(h1, h2))
    with tf.variable_scope('pred') as scope:
        predictions = tf.contrib.layers.fully_connected(l1_dist, 1, activation_fn=tf.sigmoid, weights_initializer=tf.contrib.layers.xavier_initializer(uniform=False), weights_regularizer=tf.contrib.layers.l2_regularizer(tf.constant(0.001, dtype=tf.float32)))
    celoss, cost = loss(predictions, feed_labels)
    with tf.variable_scope('adam_optimizer') as scope:
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
        opt = optimizer.minimize(cost)
However, when I run the training, the cost remains almost constant at 0.6932
I am using the Adam optimizer here (previously I used the Momentum optimizer), and I have tried changing the learning rate, but the cost still behaves the same. All of the prediction values converge to 0.5 after a few iterations.
After taking the outputs for the two batches of images (input1 and input2), I take their L1 distance and feed it into a fully connected layer with a single output and a sigmoid activation function. [h1 and h2 contain the output of the last fully connected layer (not the logit layer) of the VGG-13 network.]
Since the output activation is a sigmoid and the predictions sit around 0.5, the weighted sum of the L1 distances between the two networks' outputs must be close to zero.
I can't understand where I am going wrong.
A little help will be very much appreciated.
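As a side note (my own arithmetic, not part of the post): if the weighted L1 distance fed into the sigmoid collapses to roughly zero, the prediction is sigmoid(0) = 0.5 and the binary cross entropy is -log(0.5) ≈ 0.6931 for either label, which matches the plateau value observed:

import math

# sigmoid(0) = 0.5, so the cross entropy is -log(0.5) regardless of the label
pred = 1 / (1 + math.exp(-0.0))
bce = -(1 * math.log(pred) + 0 * math.log(1 - pred))
print(pred, bce)  # 0.5 0.6931471805599453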
I think the non-convergence may be caused by vanishing gradients. You can trace the gradients using tf.contrib.layers.optimize_loss and TensorBoard; you can refer to this answer for more details.
Several possible optimizations:
1) Don't write the cross entropy yourself. You can use the sigmoid cross entropy with logits API, since it ensures numerical stability, as documented (a quick numerical check of this identity follows the list):
max(x, 0) - x * z + log(1 + exp(-abs(x)))
2) Some weight normalization may help.
3) Keep the regularization loss small.
You can read this answer for more information.
4) I don't see the necessity of applying tf.abs to the L1 distance.
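Here is that check (my own, purely illustrative), comparing the naive binary cross entropy with the stable form used by tf.nn.sigmoid_cross_entropy_with_logits:

import numpy as np

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])   # logits
z = np.array([0.0, 1.0, 0.0, 1.0, 1.0])     # labels

# naive form: -z*log(sigmoid(x)) - (1-z)*log(1 - sigmoid(x))
p = 1.0 / (1.0 + np.exp(-x))
naive = -z * np.log(p) - (1 - z) * np.log(1 - p)

# numerically stable form from the documentation
stable = np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

print(np.allclose(naive, stable))  # True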
And here is the code I modified. Hope it helps.
mode = "training"
rl_rate = .1
with tf.device('/gpu:0'):
h1 = siamese(feed_image1)
h2 = siamese(feed_image2)
l1_dist = tf.subtract(h1, h2)
# is it necessary to use abs?
l1_dist_norm = tf.layers.batch_normalization(l1_dist, training=(mode=="training"))
with tf.variable_scope('logits') as scope:
w = tf.get_variable('fully_connected_weights', [tf.shape(l1_dist)[-1], 1],
weights_initializer = tf.contrib.layers.xavier_initializer(uniform=False), weights_regularizer = tf.contrib.layers.l2_regularizer(tf.constant(0.001, dtype=tf.float32))
)
logits = tf.tensordot(l1_dist_norm, w, axis=1)
xent_loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=feed_labels)
total_loss = tf.add(tf.reduce_sum(rl_rate * tf.abs(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))), (1-rl_rate) * xent_loss, name='total_loss')
# or:
# weights = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
# l1_regularizer = tf.contrib.layers.l1_regularizer()
# regularization_loss = tf.contrib.layers.apply_regularization(l1_regularizer, weights)
# total_loss = xent_loss + regularization_loss
with tf.variable_scope('adam_optimizer') as scope:
optimizer = tf.train.AdamOptimizer(learning_rate=0.0005)
opt = tf.contrib.layers.optimize_loss(total_loss, global_step, learning_rate=learning_rate, optimizer="Adam", clip_gradients=max_grad_norm, summaries=["gradients"])

Why is the global step not incrementing in a custom TensorFlow model function

I've been trying to update this tutorial to the latest version of TensorFlow (i.e. 0.12), and I've hit a snag in the custom model function used to train the LSTM model.
def _lstm_model(X, y):
    stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(
        lstm_cells(rnn_layers),
        state_is_tuple=True)
    global_step = tf.Variable(0, trainable=False)
    X = tf.cast(X, tf.float32)
    y = tf.cast(y, tf.float32)
    x_ = tf.unpack(X, axis=1, num=time_steps)
    output, layers = tf.nn.rnn(stacked_lstm, x_, dtype=dtypes.float32)
    output = dnn_layers(output[-1], dense_layers)
    (predictions, loss) = learn.models.linear_regression(output, y)
    if optim == 'Adagrad':
        print("using AdagradOptimizer")
        optimizer = tf.train.AdagradOptimizer(learning_rate)
    else:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(
        loss,
        global_step=global_step)
    return (predictions, loss, train_op)
I've tried both specifying the global step and not specifying it, and I've ended up with the same result: the step remains at 0 while the loss is being optimized, and this continues until I stop the entire script. The code below is what I use to create an Estimator and try to fit the model.
regressor = learn.SKCompat(learn.Estimator(
    model_fn=lstm_model(TIMESTEPS,
                        RNN_LAYERS,
                        DENSE_LAYERS,
                        optim='Adagrad',
                        learning_rate=0.03)))
regressor.fit(x=X['train'],
              y=y['train'],
              batch_size=BATCH_SIZE,
              steps=TRAINING_STEPS)
I realized that the optimizer was not to be used on its own, but together with the neural network I had created. The following edit to my code helped me achieve what I wanted:
# create model and features
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(
    lstm_cells(params['rnn_layers']),
    state_is_tuple=True)
features = tf.cast(features, tf.float32)
targets = tf.cast(targets, tf.float32)
features = tf.unpack(features, axis=1, num=params['time_steps'])
output, layers = tf.nn.rnn(stacked_lstm, features, dtype=dtypes.float32)
output = dnn_layers(output[-1], params['dense_layers'])

# Define Loss
(predictions, loss) = learn.models.linear_regression(output, targets)

# train_op
train_op = tf.contrib.layers.optimize_loss(
    loss=loss,
    global_step=contrib.framework.get_global_step(),
    learning_rate=params['learning_rate'],
    optimizer='Adagrad'
)
Here I created the neural network layers first and then set up the optimization on top of them; building the step counter and optimizer separately inside the model function is what seemed to cause my problems.
The TensorFlow documentation helped me out quite a bit, and I was able to run the code on TensorFlow v0.12.
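For completeness, a minimal hypothetical fragment (my own sketch of the same idea, not code from the tutorial) showing the train op wired to the framework-managed global step rather than a locally created tf.Variable(0):

import tensorflow as tf

# Sketch: reuse the global step that the Estimator already manages, so the
# reported step advances every time the train op runs.
def make_train_op(loss, learning_rate):
    global_step = tf.contrib.framework.get_global_step()
    optimizer = tf.train.AdagradOptimizer(learning_rate)
    return optimizer.minimize(loss, global_step=global_step)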
