RBF Layer - difficulty in understanding - python

I wanted to implement an RBFN and found this code on StackOverflow itself. While I understand some of the code, I do not understand what gamma and kwargs are, or what the call function does.
Can someone please explain it to me?
from keras.layers import Layer
from keras import backend as K

class RBFLayer(Layer):
    def __init__(self, units, gamma, **kwargs):
        super(RBFLayer, self).__init__(**kwargs)
        self.units = units
        self.gamma = K.cast_to_floatx(gamma)

    def build(self, input_shape):
        self.mu = self.add_weight(name='mu',
                                  shape=(int(input_shape[1]), self.units),
                                  initializer='uniform',
                                  trainable=True)
        super(RBFLayer, self).build(input_shape)

    def call(self, inputs):
        diff = K.expand_dims(inputs) - self.mu
        l2 = K.sum(K.pow(diff, 2), axis=1)
        res = K.exp(-1 * self.gamma * l2)
        return res

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.units)

Gamma: According to the doc: the gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. The behavior of the model is very sensitive to the gamma parameter. When gamma is very small, the model is too constrained and cannot capture the complexity or “shape” of the data. It's a hyper-parameter.
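A quick numeric illustration of that sensitivity (the points and gamma values below are arbitrary):

import numpy as np

x = np.array([0.0, 0.0])
mu = np.array([1.0, 1.0])
l2 = np.sum((x - mu) ** 2)      # squared distance = 2
for gamma in (0.1, 10.0):
    print(gamma, np.exp(-gamma * l2))
# gamma = 0.1 -> ~0.82  (the centre's influence reaches far)
# gamma = 10  -> ~2e-9  (the influence is very local)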
kwargs: **kwargs lets a function accept an arbitrary number of keyword arguments. Here they are simply forwarded to the parent Layer constructor (super().__init__(**kwargs)), so arguments like name or input_shape can still be passed to RBFLayer.
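A minimal illustration of the forwarding (the function and argument names below are made up for the example):

def parent_init(name=None, dtype=None):
    print('parent got:', name, dtype)

def child_init(units, gamma, **kwargs):
    # `units` and `gamma` are consumed here; everything else
    # (e.g. name=..., dtype=...) is passed through untouched.
    parent_init(**kwargs)

child_init(10, 0.5, name='rbf', dtype='float32')   # parent got: rbf float32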
Call: In the call function, you compute the radial basis function (RBF) kernel between the inputs and the learned centres mu. For an input x and a centre μ it is

K(x, μ) = exp(-γ ‖x − μ‖²) = exp(-‖x − μ‖² / (2σ²))

The squared distance ‖x − μ‖² (the numerator inside the exponential) is computed by:
diff = K.expand_dims(inputs) - self.mu
l2 = K.sum(K.pow(diff, 2), axis=1)
The exponential with the γ factor (which stands in for the 2σ² denominator) is computed by:
res = K.exp(-1 * self.gamma * l2)
so self.gamma plays the role of γ = 1 / (2σ²).
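For context, a minimal sketch of how this layer could be dropped into a model; the layer sizes, the gamma value, and the 4-feature input are arbitrary assumptions:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    RBFLayer(10, gamma=0.5, input_shape=(4,)),  # input_shape travels through **kwargs to Layer
    Dense(3, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')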

Related

Error when defining custom gradients in Keras

I have been trying to define a custom layer in Keras with a custom discrete gradient, as the activation function is discrete.
The layer looks like this:
class DiffLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(DiffLayer, self).__init__()

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(15, 1),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(1, 1), initializer="random_normal", trainable=True
        )

    def call(self, x):
        z = tf.matmul(Flatten()(x), self.w) + self.b
        a = custom_op(z)
        self.a = a
        if K.greater(a, 0.5):
            return x - 1
        else:
            return x
And the custom_op function:
@tf.custom_gradient
def custom_op(x):
    a = 1. / (1. + K.exp(-x))
    def custom_grad(dy):
        if K.greater(a, 0.5):
            grad = K.exp(x)
        else:
            grad = 0
        return grad
    return a, custom_grad
I have followed the tutorials from this post but when I try to fit the network that I am working with I get the following warning:
WARNING:tensorflow:Gradients do not exist for variables ['diff_layer_10/Variable:0', 'diff_layer_10/Variable:0'] when minimizing the loss.
My guess is that Keras is not detecting the defined gradient because of the way it is defined but I cannot think of a different way of defining it.
Is this the case or am I missing something in my code?
EDIT
As suggested by one of the comments, I am going to further explain what I am trying to do. I want a to be a parameter that decides what happens to the input data: if a is greater than 0.5, I want 1 subtracted from the input data; otherwise the layer should return the input data unchanged.
I do not know if that is possible to do in Keras.
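For reference, a gradient function under @tf.custom_gradient is expected to return the gradient with respect to its input (same shape as x), scaled by the upstream gradient dy, and a Python if on a tensor condition will not work once the function is traced. A minimal sketch of that pattern using the sigmoid surrogate from the question (an illustration of the decorator's contract, not a verified fix for the warning above):

import tensorflow as tf

@tf.custom_gradient
def custom_op(x):
    a = tf.sigmoid(x)

    def custom_grad(dy):
        # Element-wise choice instead of a Python `if`; the result keeps
        # the shape of `x` and is scaled by the upstream gradient `dy`.
        surrogate = tf.where(a > 0.5, tf.exp(x), tf.zeros_like(x))
        return dy * surrogate

    return a, custom_grad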

Custom keras callbacks and changing weight (beta) of regularization term in variational autoencoder loss function

The variational autoencoder loss function is: Loss = Loss_reconstruction + Beta * Loss_kld. I am trying to efficiently implement Kullback-Leibler divergence cyclic annealing, that is, changing the weight beta dynamically during training. I subclass the tf.keras.callbacks.Callback class as a start, but I don't know how I can update a tf.keras.Model variable from a custom Keras callback.
Furthermore, I would like to track how the betas change at the end of each training step (on_train_batch_end). Right now I have a list in the callback class, but I know Python lists don't play well with TensorFlow: when I fit the model, I get a warning that my on_train_batch_end function is slower than the processing of the batch itself. I think I should use a tf.TensorArray instead of Python lists, but then the tf.TensorArray method write cannot use a tf.Variable for the index (i.e., as the number of steps changes, the index in the tf.TensorArray to which a new beta for that step should be written changes)... Is there a better way to store value changes?
It looks like this github shows a solution that doesn't involve a custom tf.keras.Model and that uses a different kind of KL annealing. Below is a callback function and dummy VAE.
class CyclicAnnealing(tf.keras.callbacks.Callback):
    """Cyclic annealing from https://arxiv.org/abs/1903.10145

    Requires that the model tracks training iterations and
    the total number of training iterations. It also requires
    that the model has hyperparameters `M` and `R`.
    """
    def __init__(self, schedule_fxn='sigmoid', **kwargs):
        super().__init__(**kwargs)
        # INEFFICIENT WAY OF LOGGING `betas` AND THE TRAIN STEPS...
        # The `train_iterations` list could be removed because, in principle,
        # if I have a list of betas, I know that it is of length
        # (number of samples // batch size) * number of epochs,
        # which is the total number of steps for the model.
        self.betas = []
        self.train_iterations = []

        if schedule_fxn == 'sigmoid':
            self.schedule_fxn = self.sigmoid
        elif schedule_fxn == 'linear':
            self.schedule_fxn = self.linear
        else:
            raise ValueError('Invalid arg: `schedule_fxn`')

    def on_epoch_end(self, epoch, logs=None):
        print('\nCurrent anneal weight B =', self.beta)

    def on_train_batch_end(self, batch, logs=None):
        """Computes beta and updates the lists."""
        # Compute beta
        self.beta = self.beta_tau_cyclic_annealing(self.compute_tau())

        ###################################
        # HOW TO UPDATE BETA IN THE MODEL???
        ###################################

        # Update the lists for logging
        self.betas.append(self.beta)
        self.train_iterations.append(self.model._train_counter)

    def get_annealing_data(self):
        return {'betas': self.betas, 'training_iterations': self.train_iterations}

    def sigmoid(self, x):
        """Monotonically increasing function.

        :return: tf.constant float32
        """
        return 1 / (1 + tf.keras.backend.exp(-x))

    def linear(self, x):
        return x / self.model._R

    def compute_tau(self):
        """Used to determine kld_beta.

        :return: tf.constant float32
        """
        t = tf.identity(self.model._train_counter)
        T = self.model._total_training_iterations
        M = self.model._M
        numerator = tf.cast(tf.math.floormod(tf.subtract(t, 1), tf.math.floordiv(T, M)), dtype=tf.float32)
        denominator = tf.cast(tf.math.floordiv(T, M), dtype=tf.float32)
        return tf.math.divide(numerator, denominator)

    def beta_tau_cyclic_annealing(self, tau):
        """Compute change for kld_beta.

        :param tau: Increases beta_tau.
        :param R: Proportion used to increase beta within a cycle.
        :return: tf.constant float32
        """
        R = self.model._R
        if tau <= R:
            return self.schedule_fxn(tau)
        else:
            return tf.constant(1.0)
Dummy VAE:
class VAE(tf.keras.Model):
    def __init__(self, num_samples, batch_size, epochs, features, units, latent_size, kld_beta, M, R, **kwargs):
        """Defines state for model.

        :param num_samples: <class 'int'>
        :param batch_size: <class 'int'>
        :param epochs: <class 'int'>
        :param features: <class 'int'> If the input is (n, m), then `features` is the `m` dimension. This param is used with the decoder.
        :param units: <class 'int'> Number of hidden units.
        :param latent_size: <class 'int'> Dimension of latent space z.
        :param kld_beta: <tf.Variable??> for dynamic weight.
        :param M: <class 'int'> Hyperparameter for cyclic annealing.
        :param R: <class 'float'> Hyperparameter for cyclic annealing.
        """
        super().__init__(**kwargs)

        # NEED TO UPDATE THIS SOMEHOW -- I think it should be a tf.Variable?
        self.kld_beta = kld_beta

        # Hyperparameters for CyclicAnnealing
        self._M = M
        self._R = R
        self._total_training_iterations = (num_samples // batch_size) * epochs

        # Encoder and Decoder not defined, but typically
        # encoder = inputs -> dense -> dense mu and dense log var -> z
        # while decoder = z -> dense -> reconstructions
        self.encoder = Encoder(units, latent_size)
        self.decoder = Decoder(features)

    def call(self, inputs):
        z, mus, log_vars = self.encoder(inputs)
        reconstructions = self.decoder(z)
        kl_loss = self.compute_kl_loss(mus, log_vars)
        # THE BETA WEIGHT NEEDS TO BE DYNAMIC
        weighted_kl_loss = self.kld_beta * kl_loss
        self.add_loss(weighted_kl_loss)
        return reconstructions

    def compute_kl_loss(self, mus, log_vars):
        return -0.5 * tf.reduce_mean(1. + log_vars - tf.exp(log_vars) - tf.pow(mus, 2))
Concerning your first question: it depends on how you plan to update your gradients with your optimizer (e.g. Adam). When training a VAE with TensorFlow / Keras, I usually use the @tf.function decorator to calculate the loss of my model and, based on that, update my model's parameters:
@tf.function
def train_step(self, model, batch, gamma, capacity):
    with tf.GradientTape() as tape:
        x, c = batch
        loss = compute_loss(model, x, c, gamma, capacity)
        tf.print('Total loss: ', loss)
    gradients = tape.gradient(loss, model.trainable_variables)
    self.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
Note the variables gamma and capacity. They are defined as terms which influence the loss function. I update them after a certain number of epochs as follows:
new_weight = min(tf.keras.backend.get_value(capacity) + (20. / capacity_annealtime), 20.)
tf.keras.backend.set_value(capacity, new_weight)
At this point you can easily save the new_weight for logging purposes, or you can define a custom TensorFlow logger to log into a file. If you really want to use an array, you could simply define a TF array as:
this_array = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
and update it after a given number of steps (write returns the updated array, so reassign it):
this_array = this_array.write(this_array.size(), new_beta_weight)
You could also use a second array and update it simultaneously in order to record the epoch or batch at which your new_beta_weight was updated.
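Putting those pieces together for the callback in the question, on_train_batch_end could assign the new value straight to the model's variable and log it into TensorArrays. This is only a sketch: it assumes kld_beta is created in the model as a non-trainable tf.Variable, and that the callback's __init__ builds the two TensorArrays instead of Python lists.

# Assumed in CyclicAnnealing.__init__:
#   self.betas = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
#   self.steps = tf.TensorArray(tf.int64, size=0, dynamic_size=True)
def on_train_batch_end(self, batch, logs=None):
    self.beta = self.beta_tau_cyclic_annealing(self.compute_tau())
    # Assign directly to the model's variable; equivalent to the
    # K.set_value call above (assumes kld_beta is a tf.Variable).
    self.model.kld_beta.assign(self.beta)
    # TensorArray.write returns the updated array, so reassign it.
    self.betas = self.betas.write(self.betas.size(), self.beta)
    self.steps = self.steps.write(self.steps.size(),
                                  tf.cast(self.model._train_counter, tf.int64))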
Finally, the loss function itself looks like this:
def compute_loss(model, x, c, gamma_weight, capacity_weight):
    mean, logvar = model.encode(x, c)
    z = model.reparameterize(mean, logvar)
    reconstruction = model.decode(z, c)

    total_reconstruction_loss = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=x, logits=reconstruction)
    total_reconstruction_loss = tf.reduce_sum(total_reconstruction_loss, 1)

    kl_loss = 1 + logvar - tf.square(mean) - tf.exp(logvar)
    kl_loss = tf.reduce_mean(kl_loss)
    kl_loss *= -0.5

    total_loss = tf.reduce_mean(total_reconstruction_loss * 3 +
                                (gamma_weight * tf.abs(kl_loss - capacity_weight)))
    return total_loss
Note that model is of type tf.keras.Model. This should hopefully give you some different insights into this specific topic.
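As a usage illustration only, a train_step like the one above is typically driven from a manual training loop. Everything named below (trainer, dataset, epochs, anneal_every, capacity_annealtime) is an assumption for the sketch, not part of the answer:

# `trainer` stands for whatever object defines `train_step` and holds the optimizer.
for epoch in range(epochs):
    for batch in dataset:
        trainer.train_step(model, batch, gamma, capacity)
    # Anneal the capacity term every `anneal_every` epochs, as described above.
    if (epoch + 1) % anneal_every == 0:
        new_weight = min(tf.keras.backend.get_value(capacity) + (20. / capacity_annealtime), 20.)
        tf.keras.backend.set_value(capacity, new_weight)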

Implementing Neural Network using pure Numpy (Softmax + CrossEntropy)

I am trying a simple implementation of a multi-layer perceptron (MLP) using pure NumPy. My previous implementation, using RMSE and sigmoid activation at the output (single output), works perfectly with appropriate data. However, a multi-output version (needed for one-hot encoding) with a cross-entropy loss function and softmax activation always fails.
I believe I am doing something wrong with my implementation of the gradient calculation but am unable to figure it out. So I am here for help.
For the current implementation, I use the Iris dataset for testing the model.
The Iris data is obtained as follows:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import minmax_scale

def one_hot_encoder(y):
    y_oh = np.zeros((len(y), np.max(y)+1))
    for t in np.unique(y):
        y_oh[y==t, t] = 1
    return y_oh

data = load_iris().data
target = load_iris().target
data_scaled = minmax_scale(data)
target_oh = one_hot_encoder(target)
A NeuralNetwork class is defined with a simple one-hidden-layer network as follows:
class NeuralNetwork:
    def __init__(self, x, y):
        self.x = x
        # hidden layer with 16 nodes
        self.weights1 = np.random.rand(self.x.shape[1], 16)
        self.bias1 = np.random.rand(16)
        # output layer with 3 nodes (for 3 outputs - one-hot encoded)
        self.weights2 = np.random.rand(16, 3)
        self.bias2 = np.random.rand(3)
        self.y = y
        self.pred = np.zeros(y.shape)
        self.lr = 0.001

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.x, self.weights1) + self.bias1)
        self.layer2 = softmax(np.dot(self.layer1, self.weights2) + self.bias2)
        # print(self.layer2.shape)
        return self.layer2.clip(min=1e-8, max=None)

    def backprop(self):
        dloss = cross_entropy_derivative(self.pred, self.y)  # 2*(self.y - self.pred)
        d_weights2 = np.dot(self.layer1.T, dloss*softmax_derivative(self.pred))
        d_bias2 = np.dot(np.ones([self.x.shape[0]]), dloss*softmax_derivative(self.pred))
        d_weights1 = np.dot(self.x.T, np.dot(dloss*softmax_derivative(self.pred), self.weights2.T)*sigmoid_derivative(self.layer1))
        d_bias1 = np.dot(np.ones([self.x.shape[0]]), np.dot(dloss*softmax_derivative(self.pred), self.weights2.T)*sigmoid_derivative(self.layer1))
        self.weights1 += self.lr*d_weights1
        self.weights2 += self.lr*d_weights2
        self.bias1 += self.lr*d_bias1
        self.bias2 += self.lr*d_bias2

    def train(self, X, y):
        self.x = X
        self.y = y
        self.pred = self.feedforward()
        self.backprop()

    def predict(self, X):
        self.x = X
        self.pred = self.feedforward()
        return self.pred

    def evaluate(self, y, pred):
        self.y = y
        self.pred = pred
        # self.loss = np.sqrt(np.mean((self.pred-self.y)**2))
        self.loss = cross_entropy(self.pred, self.y)
        return self.loss
The activation functions and their derivatives are computed as follows (I feel there is something wrong here)
# Activation functions
def sigmoid(t):
    return 1/(1+np.exp(-t))

# Derivative of sigmoid
def sigmoid_derivative(p):
    return p * (1 - p)

# softmax activation
def softmax(X):
    exps = np.exp(X - np.max(X, axis=1).reshape(-1, 1))
    return exps / np.sum(exps, axis=1)[:, None]

# derivative of softmax
def softmax_derivative(pred):
    return pred * (1 - (1 * pred).sum(axis=1)[:, None])
The cross-entropy loss function and its derivatives are as shown below:
def cross_entropy(X, y):
    X = X.clip(min=1e-8, max=None)
    # print('\n\nCE: ', (np.where(y==1, -np.log(X), 0)).sum(axis=1))
    return (np.where(y==1, -np.log(X), 0)).sum(axis=1)

def cross_entropy_derivative(X, y):
    X = X.clip(min=1e-8, max=None)
    # print('\n\nCED: ', np.where(y==1, -1/X, 0))
    return np.where(y==1, -1/X, 0)
The main function call:
NN = NeuralNetwork(data_scaled, target_oh)
loss = []                       # holds the loss from each iteration
for i in range(10000):          # trains the NN 10,000 times
    NN.train(data_scaled, target_oh)
    loss.append(NN.evaluate(NN.y, NN.pred))

y_pred = NN.predict(data_scaled)
The output is approximately constant, always predicting a single class. What am I doing wrong? I would appreciate any help on the code or pointers on what to look at. Thanks.
Subtract the gradient, and take the derivative with respect to the unactivated output instead of the activated output, as sketched below. Read this piece of code that I wrote for more info, and watch Sebastian Lague's video about neural networks for help with this topic.
P.S. The video is not in Python, but it explains exactly what three years of college tries to explain.
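To make that concrete: when softmax is combined with cross-entropy, the gradient of the loss with respect to the unactivated output (the logits) collapses to pred - y, so no separate softmax_derivative is needed, and gradient descent subtracts the update. A sketch of a backprop along those lines for the class above (not tested against the asker's full setup):

def backprop(self):
    # With softmax + cross-entropy, the gradient of the loss w.r.t. the
    # unactivated output (the logits) collapses to (prediction - target),
    # so no separate softmax_derivative is needed.
    dz2 = self.pred - self.y                                   # (n, 3)
    d_weights2 = np.dot(self.layer1.T, dz2)
    d_bias2 = dz2.sum(axis=0)
    dz1 = np.dot(dz2, self.weights2.T) * sigmoid_derivative(self.layer1)
    d_weights1 = np.dot(self.x.T, dz1)
    d_bias1 = dz1.sum(axis=0)
    # Gradient descent: subtract the gradient rather than add it.
    self.weights1 -= self.lr * d_weights1
    self.weights2 -= self.lr * d_weights2
    self.bias1 -= self.lr * d_bias1
    self.bias2 -= self.lr * d_bias2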

Keras: How to create a custom layer with weights when the input shape is unknown during compilation?

I want to define a preprocessing layer just after my input layer, i.e. one that uses the mean and variance of a scaler that was computed beforehand and applies them to my inputs before passing them to the Dense network.
Lambda layers do not work in my case because I want to save the model; the objective is that when the model is applied to data, there is no need to preprocess the inputs, since that is done in the early stage of the network.
Using K.variables for the mean and var works, but I would like to use weights instead and set trainable=False. This way they will be saved in the weights of the network and I don't have to provide them each time.
class PreprocessLayer(Layer):
    """
    Defines a layer that applies the preprocessing from a scaler
    Needed because lambda layers are too fragile to be saved in a model
    """
    def __init__(self, batch_size, mean, var, **kwargs):
        self.b = batch_size
        self.m = mean
        self.v = var
        super(PreprocessLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.mean = self.add_weight(name='mean',
                                    shape=(self.b, input_shape[1]),
                                    initializer=tf.constant_initializer(self.m),
                                    trainable=False)
        self.var = self.add_weight(name='var',
                                   shape=(self.b, input_shape[1]),
                                   initializer=tf.constant_initializer(self.v),
                                   trainable=False)
        super(PreprocessLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return (x - self.mean) / self.var

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[1])

    def get_config(self):
        config = super(PreprocessLayer, self).get_config()
        config['mean'] = self.m
        config['var'] = self.v
        return config
And I call this layer with
L0 = PreprocessLayer(batch_size=20,mean=scaler.mean_,var=scaler.scale_)(IN)
The problem arises at
shape=(self.b,input_shape[1]),
which gives me the following error (when batch_size is 20):
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [32,15] vs. [20,15]
[[Node: preprocess_layer_1/sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_IN_0_0, preprocess_layer_1/mean/read)]]
From what I understand, since my weights (mean and var) need to have the same shape as the input x, the first axis poses problems when the batch_size is not a divisor of the training size because it will have different values during the training. That causes the crash because the shape has to be determined at compilation time and I cannot leave it blank.
Is there any way to have a dynamic value for the first dimension of shape? If not, is there a workaround for this problem?
For anyone having the same issue - a leftover batch smaller than batch_size at the end of the epoch (because the training and testing sizes are not multiples of the batch size) that results in an InvalidArgumentError: Incompatible shapes - here is my fix.
Since this remainder will always be smaller than batch_size, what I did in the call function is slice the weights like this:
def call(self, x):
    mean = self.mean[:K.shape(x)[0], :]
    std = self.std[:K.shape(x)[0], :]
    return (x - mean) / std
This works, but it means that if a batch size larger than the one that initialized the layer is used to evaluate the model, the error will pop up again.
This is why, in __init__, I put:
self.b = max(32, batch_size)
because predict() uses batch_size = 32 by default.
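A further workaround, sketched here as an alternative rather than as the accepted fix: since the mean and variance are per-feature values, the weights can be declared without the batch dimension at all, and broadcasting then handles any batch size, including the last partial batch.

class PreprocessLayer(Layer):
    def __init__(self, mean, var, **kwargs):
        self.m = mean
        self.v = var
        super(PreprocessLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # One value per feature; broadcasts over any batch size.
        self.mean = self.add_weight(name='mean',
                                    shape=(input_shape[1],),
                                    initializer=tf.constant_initializer(self.m),
                                    trainable=False)
        self.var = self.add_weight(name='var',
                                   shape=(input_shape[1],),
                                   initializer=tf.constant_initializer(self.v),
                                   trainable=False)
        super(PreprocessLayer, self).build(input_shape)

    def call(self, x):
        return (x - self.mean) / self.var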
I do not think you need to add mean and var as weights. You can calculate them in your call function. I also do not exactly understand why you want to use this instead of BatchNormalization, but anyway, maybe you can try this code:
class PreprocessLayer(Layer):
    def __init__(self, eps=1e-6, **kwargs):
        self.eps = eps
        super(PreprocessLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        super(PreprocessLayer, self).build(input_shape)

    def call(self, x):
        mean = K.mean(x, axis=-1, keepdims=True)
        std = K.std(x, axis=-1, keepdims=True)
        return (x - mean) / (std + self.eps)

    def compute_output_shape(self, input_shape):
        return input_shape
eps is to avoid division by 0.
I do not guarantee this will work, but maybe give it a try.

How do I correctly implement a custom activity regularizer in Keras?

I am trying to implement sparse autoencoders according to Andrew Ng's lecture notes, as shown here.
It requires that a sparsity constraint be applied on an autoencoder layer by introducing a penalty term (the K-L divergence). I tried to implement this using the directions provided here, after some minor changes.
Here are the K-L divergence and the sparsity penalty term, implemented by the SparseActivityRegularizer class shown below.
def kl_divergence(p, p_hat):
    return (p * K.log(p / p_hat)) + ((1-p) * K.log((1-p) / (1-p_hat)))

class SparseActivityRegularizer(Regularizer):
    sparsityBeta = None

    def __init__(self, l1=0., l2=0., p=-0.9, sparsityBeta=0.1):
        self.p = p
        self.sparsityBeta = sparsityBeta

    def set_layer(self, layer):
        self.layer = layer

    def __call__(self, loss):
        # p_hat needs to be the average activation of the units in the hidden layer.
        p_hat = T.sum(T.mean(self.layer.get_output(True), axis=0))
        loss += self.sparsityBeta * kl_divergence(self.p, p_hat)
        return loss

    def get_config(self):
        return {"name": self.__class__.__name__,
                "p": self.l1}
The model was built like so
X_train = np.load('X_train.npy')
X_test = np.load('X_test.npy')

autoencoder = Sequential()
encoder = containers.Sequential([Dense(250, input_dim=576, init='glorot_uniform', activation='tanh',
                                       activity_regularizer=SparseActivityRegularizer(p=-0.9, sparsityBeta=0.1))])
decoder = containers.Sequential([Dense(576, input_dim=250)])
autoencoder.add(AutoEncoder(encoder=encoder, decoder=decoder, output_reconstruction=True))
autoencoder.layers[0].build()
autoencoder.compile(loss='mse', optimizer=SGD(lr=0.001, momentum=0.9, nesterov=True))
loss = autoencoder.fit(X_train_tmp, X_train_tmp, nb_epoch=200, batch_size=800, verbose=True, show_accuracy=True, validation_split=0.3)
autoencoder.save_weights('SparseAutoEncoder.h5', overwrite=True)
result = autoencoder.predict(X_test)
When I call the fit() function I get negative loss values and the output does not resemble the input at all. I want to know where I am going wrong. What is the correct way to calculate the average activation of a layer and use this custom sparsity regularizer? Any sort of help will be greatly appreciated. Thanks!
I am using Keras 0.3.1 with Python 2.7 as the latest Keras (1.0.1) build does not have the Autoencoder layer.
You have defined self.p = -0.9 instead of the 0.05 value that both the original poster and the lecture notes you referred to are using.
I corrected some errors:
class SparseRegularizer(keras.regularizers.Regularizer):

    def __init__(self, rho=0.01, beta=1):
        """
        rho  : Desired average activation of the hidden units
        beta : Weight of sparsity penalty term
        """
        self.rho = rho
        self.beta = beta

    def __call__(self, activation):
        rho = self.rho
        beta = self.beta
        # sigmoid because we need the probability distributions
        activation = tf.nn.sigmoid(activation)
        # average over the batch samples
        rho_bar = K.mean(activation, axis=0)
        # Avoid division by 0
        rho_bar = K.maximum(rho_bar, 1e-10)
        KLs = rho * K.log(rho / rho_bar) + (1 - rho) * K.log((1 - rho) / (1 - rho_bar))
        return beta * K.sum(KLs)  # sum over the layer units

    def get_config(self):
        return {
            'rho': self.rho,
            'beta': self.beta
        }
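For completeness, a minimal sketch of how this regularizer could be attached to an encoder layer with the functional API. The layer sizes, beta value, and optimizer are assumptions; rho=0.05 follows the value mentioned in the previous answer, and the Dense layer is left linear because the regularizer applies its own sigmoid:

import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(576,))
encoded = keras.layers.Dense(
    250,
    activity_regularizer=SparseRegularizer(rho=0.05, beta=3))(inputs)
decoded = keras.layers.Dense(576)(encoded)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')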
