I have been trying to define a custom layer in Keras with a custom gradient, because the activation function is discrete.
The layer looks like this:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Flatten

class DiffLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(DiffLayer, self).__init__()

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(15, 1),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(1, 1), initializer="random_normal", trainable=True
        )

    def call(self, x):
        z = tf.matmul(Flatten()(x), self.w) + self.b
        a = custom_op(z)
        self.a = a
        if K.greater(a, 0.5):
            return x - 1
        else:
            return x
And the custom_op function:
@tf.custom_gradient
def custom_op(x):
    a = 1. / (1. + K.exp(-x))
    def custom_grad(dy):
        if K.greater(a, 0.5):
            grad = K.exp(x)
        else:
            grad = 0
        return grad
    return a, custom_grad
I have followed the tutorials from this post, but when I try to fit the network I am working with, I get the following warning:
WARNING:tensorflow:Gradients do not exist for variables ['diff_layer_10/Variable:0', 'diff_layer_10/Variable:0'] when minimizing the loss.
My guess is that Keras is not detecting the custom gradient because of the way it is defined, but I cannot think of a different way of defining it.
Is this the case or am I missing something in my code?
EDIT
As suggested by one of the comments, I am going to explain further what I am trying to do. I want a to be a parameter that decides what happens to the input data: if a is greater than 0.5, I want 1 subtracted from the input data; otherwise the layer should return the input data unchanged.
I do not know if that is possible to do in Keras.
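It is possible; here is a minimal sketch of one standard workaround, a straight-through-style surrogate gradient. This is my own example, not a confirmed fix for the code above, and it assumes the input is already flat with shape (batch, features). The forward pass makes the hard 0/1 decision (sigmoid(z) > 0.5 is equivalent to z > 0), while the backward pass substitutes the sigmoid derivative so w and b keep receiving gradients:

import tensorflow as tf

@tf.custom_gradient
def hard_gate(z):
    # Forward: hard 0/1 decision (discrete, no useful gradient of its own).
    g = tf.cast(z > 0.0, z.dtype)
    def grad(dy):
        # Backward: surrogate gradient, here the derivative of the sigmoid.
        s = tf.sigmoid(z)
        return dy * s * (1.0 - s)
    return g, grad

class DiffLayerSketch(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], 1), initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(1, 1), initializer="random_normal", trainable=True)

    def call(self, x):
        z = tf.matmul(x, self.w) + self.b  # shape (batch, 1)
        # Subtract 1 where the gate fires (a > 0.5), pass x through otherwise.
        return x - hard_gate(z)

A Python if on a tensor, as in the original call, does not translate into per-element graph ops, which is part of why the gradient never reaches w and b; expressing the branch as arithmetic (or tf.where) keeps everything differentiable end to end.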
Related
I'm trying to recreate results from a paper and I'm having a lot of issues running a TensorFlow sub-classed model without resorting to eager execution. Even when using eager execution it seems that it isn't training properly (it shows zero trainable parameters and the loss metric is nearly constant). Any time an error is thrown, the input arguments received all have 'None' shapes (as shown in the error below), leading me to believe this is somehow an issue with the symbolic tensors used on execution?
This model uses the Spektral package with a dataset of labelled graphs as input. Here 'x' is the Spektral graph features, 'a' is the adjacency matrix, 'e' is the (unused) edge features matrix. 'C' is just a constant. The 'x' node features contain one-hot encodings and node coordinates which are split and treated separately throughout the model.
This is the current state of the model.
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import MeanSquaredError

class egcl(Model):
    def __init__(self):
        super().__init__()

    def call(self, inputs):
        x, a, C = InputParser()(inputs)
        h_feat, x_feat = NodeFeatureSplitter()(x)
        distance = NodeDistance()(x_feat)
        # Message matrices
        m_ij_input = tf.keras.layers.Concatenate()([tf.squeeze(h_feat), distance, tf.squeeze(a)])
        m_ij = tf.keras.layers.Dense(64, activation='swish')(m_ij_input)
        m_ij = tf.keras.layers.Dense(9, activation='swish')(m_ij)
        m_ij = tf.reshape(m_ij, shape=(3, 3, 3))
        m_i = tf.reduce_sum(m_ij, axis=2)
        # Update node features
        h_i_next_input = tf.keras.layers.concatenate([distance, m_i])
        h_i_next = tf.keras.layers.Dense(64, activation='swish')(h_i_next_input)
        h_i_next = tf.keras.layers.Dense(3, activation='swish')(h_i_next)
        # Update coordinate features
        x_i_next = tf.keras.layers.Dense(64, activation='swish')(m_i)
        x_i_next = tf.keras.layers.Dense(3, activation='swish')(x_i_next)
        x_i_next = x_feat + (C * distance * x_i_next)
        x_i_next = tf.squeeze(x_i_next)
        # Fit with graph labels
        out = tf.keras.layers.concatenate([h_i_next, x_i_next])
        out = tf.keras.layers.Dense(64, activation='swish')(out)
        out = tf.keras.layers.Dense(9, activation='swish')(out)
        out = tf.math.reduce_mean(out, axis=0)
        return out
# Model call
epochs = 5
model = egcl()
model.compile(optimizer=Adam(learning_rate=1e-03), loss=MeanSquaredError(), run_eagerly=False)
history = model.fit(loader.load(), steps_per_epoch=loader.steps_per_epoch, epochs=epochs)
And the custom classes called are defined as follows.
from tensorflow.keras.layers import Layer

class InputParser(Layer):
    def __init__(self):
        super(InputParser, self).__init__()

    def call(self, inputs):
        x, a, e = inputs
        C = tf.cast(1 / len(x), tf.float32)
        return x, a, C

class NodeFeatureSplitter(Layer):
    def __init__(self):
        super(NodeFeatureSplitter, self).__init__()

    def call(self, x):
        h_feat = x[..., :2]
        x_feat = x[..., -3:]
        return h_feat, x_feat

class NodeDistance(Layer):
    def __init__(self):
        super(NodeDistance, self).__init__()

    def call(self, x_feat):
        norm = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
        for i in range(len(x_feat[0])):
            for j in range(len(x_feat[0][i])):
                norm = norm.write(norm.size(), tf.math.reduce_euclidean_norm([x_feat[0][i] - x_feat[0][j]]))
        norm = norm.stack()
        norm = tf.reshape(norm, shape=(len(x_feat[0]), len(x_feat[0])))
        return norm
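As an aside, the Python loops in NodeDistance call len() on symbolic tensors, which fails once shapes are unknown. A vectorized sketch of the same pairwise distance computation (my rewrite, assuming x_feat has shape (1, N, 3) as above):

class NodeDistanceVectorized(Layer):
    def call(self, x_feat):
        coords = x_feat[0]                               # (N, 3)
        # Pairwise differences via broadcasting: (N, 1, 3) - (1, N, 3) -> (N, N, 3)
        diff = coords[:, None, :] - coords[None, :, :]
        return tf.norm(diff, axis=-1)                    # (N, N) Euclidean distances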
The current issue is the first Concatenate layer throwing the following error:
ValueError: as_list() is not defined on an unknown TensorShape.
Call arguments received:
• inputs=('tf.Tensor(shape=(None, None, None), dtype=float32)', 'tf.Tensor(shape=(None, None, None), dtype=float32)', 'tf.Tensor(shape=(None, None, None, None), dtype=float32)')
Am I incorrectly sub-classing Model? Is this setup configured to train correctly even in eager execution mode? Please don't hesitate to ask for any clarifications or further info.
Thanks for any and all input <3
Edit: I tried Loris' suggestion of adding my layers into __init__ and calling them with self. ... This gets past the previous error, however there is now another issue:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[1], expected a dimension of 1, got 9
[[node mean_squared_error/remove_squeezable_dimensions/Squeeze
W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
How are the loss function and the squeeze causing this?
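For reference, a minimal sketch of the pattern from Loris' suggestion (the layer attributes are illustrative): sublayers are created once in __init__, so Keras tracks their weights and reuses them on every call, instead of instantiating fresh Dense layers inside call.

class egclSketch(Model):
    def __init__(self):
        super().__init__()
        # Created once: weights are registered as trainable variables of the model.
        self.parser = InputParser()
        self.splitter = NodeFeatureSplitter()
        self.distance = NodeDistance()
        self.dense_m1 = tf.keras.layers.Dense(64, activation='swish')
        self.dense_m2 = tf.keras.layers.Dense(9, activation='swish')
        # ... the remaining Dense layers, declared the same way ...

    def call(self, inputs):
        x, a, C = self.parser(inputs)
        h_feat, x_feat = self.splitter(x)
        d = self.distance(x_feat)
        m_ij = self.dense_m2(self.dense_m1(tf.concat([tf.squeeze(h_feat), d, tf.squeeze(a)], axis=-1)))
        # ... rest of the forward pass, using the self.* layers ...
        return m_ij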
I wanted to implement an RBFN and found this code on Stack Overflow itself. While I do understand some of the code, I do not understand what gamma and kwargs are, or what the entire call function does.
Can someone please explain it to me?
from keras.layers import Layer
from keras import backend as K
class RBFLayer(Layer):
    def __init__(self, units, gamma, **kwargs):
        super(RBFLayer, self).__init__(**kwargs)
        self.units = units
        self.gamma = K.cast_to_floatx(gamma)

    def build(self, input_shape):
        self.mu = self.add_weight(name='mu',
                                  shape=(int(input_shape[1]), self.units),
                                  initializer='uniform',
                                  trainable=True)
        super(RBFLayer, self).build(input_shape)

    def call(self, inputs):
        diff = K.expand_dims(inputs) - self.mu
        l2 = K.sum(K.pow(diff, 2), axis=1)
        res = K.exp(-1 * self.gamma * l2)
        return res

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.units)
Gamma: According to the doc: the gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. The behavior of the model is very sensitive to the gamma parameter. When gamma is very small, the model is too constrained and cannot capture the complexity or “shape” of the data. It's a hyper-parameter.
kwargs: **kwargs is used to let the function take an arbitrary number of keyword arguments; here they are forwarded to the base Layer class.
Call: In the call function, you're calculating the radial basis function (RBF) kernel, defined as

K(x, μ) = exp(−γ ‖x − μ‖²)
The squared distance ‖x − μ‖² (the numerator of the classical Gaussian form) is computed by:
diff = K.expand_dims(inputs) - self.mu
l2 = K.sum(K.pow(diff,2), axis=1)
The exponential, with γ absorbing the denominator, is computed by:
res = K.exp(-1 * self.gamma * l2)
Here self.gamma plays the role of γ = 1/(2σ²) in the equivalent Gaussian form K(x, μ) = exp(−‖x − μ‖² / (2σ²)).
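For illustration, a usage sketch (the layer sizes, gamma value, and input dimension are made up); note how input_shape travels through **kwargs to the base Layer class:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    RBFLayer(10, gamma=0.5, input_shape=(4,)),  # 10 RBF centres over 4-dim inputs
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
# model.fit(X, y, ...) with X of shape (n_samples, 4)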
I am setting up a custom layer with a custom gradient. The inputs are a single 2-D tensor of shape (?, 2). The outputs are also a single 2-D tensor with shape (?, 2).
I am struggling with understanding how these objects behave. What I've gathered from the documentation is that for a given input, the gradient will have the same shape as the output and that I need to return a list of gradients for each input. I've been assuming that since my inputs look like (?, 2) and my outputs look like (?, 2), then the grad function should return a length-2 list: [input_1_grad, input_2_grad], where both list items are tensors with the shape of the output, (?, 2).
This is not working, which is why I'm hoping someone here could help.
Here is my error (appears to occur at compile time):
ValueError: Num gradients 3 generated for op name:
"custom_layer/IdentityN" op: "IdentityN" input:
"custom_layer_2/concat" input: "custom_layer_1/concat" attr { key:
"T" value {
list {
type: DT_FLOAT
type: DT_FLOAT
} } } attr { key: "_gradient_op_type" value {
s: "CustomGradient-28729" } } do not match num inputs 2
The other wrinkle is that the input to the custom layer is itself the output of another custom layer (though one without a custom gradient). I will provide the code for both layers, in case it's helpful.
Also, note that the network compiles and runs if I don't try to specify a custom gradient. But, since my functions need help differentiating themselves, I need to manually intervene, so having a working custom gradient is critical.
First Custom Layer (no custom gradient):
class custom_layer_1(tensorflow.keras.layers.Layer):
    def __init__(self):
        super(custom_layer_1, self).__init__()

    def build(self, input_shape):
        self.term_1 = self.add_weight('term_1', trainable=True)
        self.term_2 = self.add_weight('term_2', trainable=True)

    def call(self, x):
        self.term_1 = formula in terms of x
        self.term_2 = another formula in terms of x
        return tf.concat([self.term_1, self.term_2], axis=1)
Second Custom Layer (with the custom gradient):
class custom_layer_2(tensorflow.keras.layers.Layer):
    ### the inputs
    # x is the concatenation of term_1 and term_2
    def __init__(self):
        super(custom_layer_2, self).__init__()

    def build(self, input_shape):
        # self.weight_1 = self.add_weight('weight_1', trainable=True)
        # self.weight_2 = self.add_weight('weight_2', trainable=True)
        pass

    def call(self, x):
        return custom_function(x)
The Custom Function:
@tf.custom_gradient
def custom_function(x):
    ### the inputs
    # x is a concatenation of term_1 and term_2
    weight_1 = function in terms of x
    weight_2 = another function in terms of x

    ### the gradient
    def grad(dy):
        # assuming dy has the output shape of (?, 2). could be wrong.
        d_weight_1 = K.reshape(dy[:, 0], shape=(K.shape(x)[0], 1))
        d_weight_2 = K.reshape(dy[:, 1], shape=(K.shape(x)[0], 1))
        term_1 = K.reshape(x[:, 0], shape=(K.shape(x)[0], 1))
        term_2 = K.reshape(x[:, 1], shape=(K.shape(x)[0], 1))
        d_weight_1_d_term_1 = tf.where(K.equal(term_1, K.zeros_like(term_1)), K.zeros_like(term_1), -term_2 / term_1) * d_weight_1
        d_weight_1_d_term_2 = tf.where(K.equal(term_1, K.zeros_like(term_1)), K.zeros_like(term_1), 1 / term_1) * d_weight_1
        d_weight_2_d_term_1 = tf.where(K.equal(term_2, K.zeros_like(term_2)), K.zeros_like(term_1), 2 * term_1 / term_2) * d_weight_2
        d_weight_2_d_term_2 = tf.where(K.equal(term_2, K.zeros_like(term_2)), K.zeros_like(term_1), -K.square(term_1 / term_2)) * d_weight_2
        return tf.concat([d_weight_1_d_term_1, d_weight_1_d_term_2], axis=1), tf.concat([d_weight_2_d_term_1, d_weight_2_d_term_2], axis=1)

    return tf.concat([weight_1, weight_2], axis=1), grad
Any help would be much appreciated!
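In case it helps to see the shape contract tf.custom_gradient expects here: with a single input x of shape (?, 2), grad must return exactly one tensor of x's shape, not one tensor per output column. A toy sketch with made-up formulas (y1 = t1·t2, y2 = t1 + t2; not the formulas from the question):

import tensorflow as tf

@tf.custom_gradient
def toy_function(x):
    t1, t2 = x[:, 0:1], x[:, 1:2]
    y = tf.concat([t1 * t2, t1 + t2], axis=1)  # output shape (?, 2)

    def grad(dy):
        # dy has the output shape (?, 2). Fold both output columns into one
        # gradient per input column (chain rule), then concat back to (?, 2).
        d_t1 = dy[:, 0:1] * t2 + dy[:, 1:2] * 1.0  # dy1/dt1 = t2, dy2/dt1 = 1
        d_t2 = dy[:, 0:1] * t1 + dy[:, 1:2] * 1.0  # dy1/dt2 = t1, dy2/dt2 = 1
        return tf.concat([d_t1, d_t2], axis=1)     # ONE tensor for ONE input

    return y, grad

Returning a 2-tuple of gradients for a single input, as in the grad above, is the likely source of the "Num gradients ... do not match num inputs" mismatch.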
I want to encode the following function into a TF layer. Let x be a d-dimensional vector:
x -> tf.linalg.diag(x) * A + b,
where A is a trainable d×d matrix and b is a trainable d-dimensional vector.
If A and b were not there, I would have used a Lambda layer, but since they are... how would I go about it?
P.S.: for educational purposes, I don't want to feed the Lambda layer
Lambda(lambda x: tf.linalg.diag(x))
into a fully-connected layer with "identity" activation. (I know this works, but it doesn't really help me learn how to address the problem :) )
You can create a custom layer and put your function in its call method.
import tensorflow as tf
from tensorflow import keras

class Custom_layer(keras.layers.Layer):
    def __init__(self, dim):
        super(Custom_layer, self).__init__()
        self.dim = dim
        # add a trainable weight matrix
        self.weight = self.add_weight(shape=(dim, dim), trainable=True)
        # add a trainable bias
        self.bias = self.add_weight(shape=(dim,), trainable=True)

    def call(self, input):
        # your function
        return (tf.linalg.diag(input) * self.weight) + self.bias

    def get_config(self):
        config = super(Custom_layer, self).get_config()
        config['dim'] = self.dim
        return config
Use it just like a normal layer, passing the dimension argument when you create it:
my_layer = Custom_layer(desired_dimension)
output = my_layer(input)
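A quick shape check of the layer above (the input values are made up):

import tensorflow as tf

layer = Custom_layer(3)
x = tf.constant([[1.0, 2.0, 3.0]])  # batch of one 3-dim vector
y = layer(x)
print(y.shape)  # (1, 3, 3): tf.linalg.diag(x) is (1, 3, 3); A and b broadcast onto it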
I want to define a pre-processing layer just after my input layer, i.e. it will use the mean and variance of a scaler that was computed beforehand and apply them to my inputs before passing them to the Dense network.
Lambda layers do not work in my case because I want to save the model; the objective is that when the model is applied to data, there is no need to pre-process the inputs, since that is done in the early stage of the network.
Using K.variable for the mean and var works, but I would like to use weights instead and set trainable=False. This way they will be saved with the weights of the network and I will not have to provide them each time.
import tensorflow as tf
from tensorflow.keras.layers import Layer

class PreprocessLayer(Layer):
    """
    Defines a layer that applies the preprocessing from a scaler.
    Needed because Lambda layers are too fragile to be saved in a model.
    """
    def __init__(self, batch_size, mean, var, **kwargs):
        self.b = batch_size
        self.m = mean
        self.v = var
        super(PreprocessLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.mean = self.add_weight(name='mean',
                                    shape=(self.b, input_shape[1]),
                                    initializer=tf.constant_initializer(self.m),
                                    trainable=False)
        self.var = self.add_weight(name='var',
                                   shape=(self.b, input_shape[1]),
                                   initializer=tf.constant_initializer(self.v),
                                   trainable=False)
        super(PreprocessLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return (x - self.mean) / self.var

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[1])

    def get_config(self):
        config = super(PreprocessLayer, self).get_config()
        config['mean'] = self.m
        config['var'] = self.v
        return config
And I call this layer with
L0 = PreprocessLayer(batch_size=20,mean=scaler.mean_,var=scaler.scale_)(IN)
The problem arises at
shape=(self.b,input_shape[1]),
which gives me this error (when batch_size is 20):
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [32,15] vs. [20,15]
[[Node: preprocess_layer_1/sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_IN_0_0, preprocess_layer_1/mean/read)]]
From what I understand, since my weights (mean and var) need to have the same shape as the input x, the first axis poses a problem whenever batch_size does not divide the training size, because it will take different values during training. That causes the crash, because the shape has to be determined at compile time and I cannot leave it blank.
Is there any way to have a dynamic value for the first value of shape ? If not, a work around for this problem ?
For anyone having the same issue, namely a final batch smaller than batch_size at the end of each epoch (because the training and testing sizes are not multiples of the batch size), which results in InvalidArgumentError: Incompatible shapes: here is my fix.
Since this remainder always has a size smaller than batch_size, what I did in the call function is to slice the weights, like this:
def call(self, x):
    mean = self.mean[:K.shape(x)[0], :]
    var = self.var[:K.shape(x)[0], :]
    return (x - mean) / var
This works, but it means that if a batch size larger than the one that initialized the layer is used to evaluate the model, the error will pop up again. This is why in __init__ I put:
self.b = max(32, batch_size)
because predict() uses batch_size=32 by default.
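An alternative sketch that sidesteps the batch-size coupling entirely (my suggestion, not part of the fix above): give the weights shape (n_features,) so they broadcast over whatever batch arrives, making both the slicing and the max(32, batch_size) workaround unnecessary:

import tensorflow as tf
from tensorflow.keras.layers import Layer

class PreprocessLayerBroadcast(Layer):
    def __init__(self, mean, var, **kwargs):
        self.m = mean
        self.v = var
        super(PreprocessLayerBroadcast, self).__init__(**kwargs)

    def build(self, input_shape):
        # Shape (n_features,) broadcasts against (batch, n_features) for any batch size.
        self.mean = self.add_weight(name='mean', shape=(input_shape[1],),
                                    initializer=tf.constant_initializer(self.m),
                                    trainable=False)
        self.var = self.add_weight(name='var', shape=(input_shape[1],),
                                   initializer=tf.constant_initializer(self.v),
                                   trainable=False)
        super(PreprocessLayerBroadcast, self).build(input_shape)

    def call(self, x):
        return (x - self.mean) / self.var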
I do not think you need to add mean and var as weights; you can calculate them in your call function. I also do not exactly understand why you want this instead of BatchNormalization, but anyway, maybe you can try this code:
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class PreprocessLayer(Layer):
    def __init__(self, eps=1e-6, **kwargs):
        self.eps = eps
        super(PreprocessLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        super(PreprocessLayer, self).build(input_shape)

    def call(self, x):
        mean = K.mean(x, axis=-1, keepdims=True)
        std = K.std(x, axis=-1, keepdims=True)
        return (x - mean) / (std + self.eps)

    def compute_output_shape(self, input_shape):
        return input_shape
eps is to avoid division by 0.
I do not guarantee this will work, but maybe give it a try.
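For completeness, a usage sketch of this variant (the feature count of 15 is taken from the error message earlier); note that, unlike the scaler-based layer, it computes the statistics per sample at run time instead of storing the pre-fitted mean and variance:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

IN = Input(shape=(15,))
x = PreprocessLayer()(IN)  # normalizes each sample on the fly
OUT = Dense(1)(x)
model = Model(IN, OUT)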