I am setting up a custom layer with a custom gradient. The input is a single 2-D tensor of shape (?, 2). The output is also a single 2-D tensor of shape (?, 2).
I am struggling with understanding how these objects behave. What I've gathered from the documentation is that for a given input, the gradient will have the same shape as the output and that I need to return a list of gradients for each input. I've been assuming that since my inputs look like (?, 2) and my outputs look like (?, 2), then the grad function should return a length-2 list: [input_1_grad, input_2_grad], where both list items are tensors with the shape of the output, (?, 2).
This is not working, which is why I'm hoping someone here could help.
Here is my error (appears to occur at compile time):
ValueError: Num gradients 3 generated for op name:
"custom_layer/IdentityN" op: "IdentityN" input:
"custom_layer_2/concat" input: "custom_layer_1/concat" attr { key:
"T" value {
list {
type: DT_FLOAT
type: DT_FLOAT
} } } attr { key: "_gradient_op_type" value {
s: "CustomGradient-28729" } } do not match num inputs 2
The other wrinkle is that the input to the custom layer is itself also a custom layer (though without a custom gradient). I will provide the code for both layers, in case it's helpful.
Also, note that the network compiles and runs if I don't try to specify a custom gradient. But, since my functions need help differentiating themselves, I need to manually intervene, so having a working custom gradient is critical.
First Custom Layer (no custom gradient):
class custom_layer_1(tensorflow.keras.layers.Layer):
    def __init__(self):
        super(custom_layer_1, self).__init__()
    def build(self, input_shape):
        self.term_1 = self.add_weight('term_1', trainable=True)
        self.term_2 = self.add_weight('term_2', trainable=True)
    def call(self, x):
        self.term_1 = formula in terms of x
        self.term_2 = another formula in terms of x
        return tf.concat([self.term_1, self.term_2], axis=1)
Second Custom Layer (with the custom gradient):
class custom_layer_2(tensorflow.keras.layers.Layer):
    ### the inputs
    # x is the concatenation of term_1 and term_2
    def __init__(self):
        super(custom_layer_2, self).__init__()
    def build(self, input_shape):
        #self.weight_1 = self.add_weight('weight_1', trainable=True)
        #self.weight_2 = self.add_weight('weight_2', trainable=True)
        pass
    def call(self, x):
        return custom_function(x)
The Custom Function:
@tf.custom_gradient
def custom_function(x):
    ### the inputs
    # x is a concatenation of term_1 and term_2
    weight_1 = function in terms of x
    weight_2 = another function in terms of x
    ### the gradient
    def grad(dy):
        # assuming dy has the output shape of (?, 2). could be wrong.
        d_weight_1 = K.reshape(dy[:, 0], shape=(K.shape(x)[0], 1))
        d_weight_2 = K.reshape(dy[:, 1], shape=(K.shape(x)[0], 1))
        term_1 = K.reshape(x[:, 0], shape=(K.shape(x)[0], 1))
        term_2 = K.reshape(x[:, 1], shape=(K.shape(x)[0], 1))
        d_weight_1_d_term_1 = tf.where(K.equal(term_1, K.zeros_like(term_1)), K.zeros_like(term_1), -term_2 / term_1) * d_weight_1
        d_weight_1_d_term_2 = tf.where(K.equal(term_1, K.zeros_like(term_1)), K.zeros_like(term_1), 1 / term_1) * d_weight_1
        d_weight_2_d_term_1 = tf.where(K.equal(term_2, K.zeros_like(term_2)), K.zeros_like(term_1), 2 * term_1 / term_2) * d_weight_2
        d_weight_2_d_term_2 = tf.where(K.equal(term_2, K.zeros_like(term_2)), K.zeros_like(term_1), -K.square(term_1 / term_2)) * d_weight_2
        return tf.concat([d_weight_1_d_term_1, d_weight_1_d_term_2], axis=1), tf.concat([d_weight_2_d_term_1, d_weight_2_d_term_2], axis=1)
    return tf.concat([weight_1, weight_2], axis=1), grad
Any help would be much appreciated!
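One way to read the error: tf.custom_gradient expects grad to return exactly one gradient per input of the decorated function, each with the shape of that input. Since custom_function takes a single tensor x of shape (?, 2), grad would return a single (?, 2) tensor in which each column sums the contributions of both outputs. A minimal sketch, assuming TF 2.x; the name pairwise_op and the formulas w1 = t1 * t2 and w2 = t1 + t2 are hypothetical stand-ins, not the formulas from the question:

```python
import tensorflow as tf

@tf.custom_gradient
def pairwise_op(x):
    # Hypothetical formulas, chosen only so the gradient is easy to check by hand.
    t1, t2 = x[:, 0:1], x[:, 1:2]
    w1 = t1 * t2
    w2 = t1 + t2

    def grad(dy):
        # dy has the shape of the output, (batch, 2).
        dy1, dy2 = dy[:, 0:1], dy[:, 1:2]
        # Chain rule: each input column collects contributions from both outputs.
        dx1 = dy1 * t2 + dy2   # dw1/dt1 = t2, dw2/dt1 = 1
        dx2 = dy1 * t1 + dy2   # dw1/dt2 = t1, dw2/dt2 = 1
        # One input (x), so exactly one gradient, shaped like x.
        return tf.concat([dx1, dx2], axis=1)

    return tf.concat([w1, w2], axis=1), grad


x = tf.constant([[2.0, 3.0], [4.0, 5.0]])
with tf.GradientTape() as tape:
    tape.watch(x)
    loss = tf.reduce_sum(pairwise_op(x))
dx = tape.gradient(loss, x)
```

With an upstream gradient of ones, the first row gives dx1 = 3 + 1 and dx2 = 2 + 1, which is what autodiff would produce for the same formulas.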
Related
I have been trying to define a custom layer in Keras with a custom discrete gradient, as the activation function is discrete.
The layer looks like this:
class DiffLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(DiffLayer, self).__init__()
    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(15, 1),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(1, 1), initializer="random_normal", trainable=True
        )
    def call(self, x):
        z = tf.matmul(Flatten()(x), self.w) + self.b
        a = custom_op(z)
        self.a = a
        if K.greater(a,0.5):
            return x-1
        else:
            return x
And the custom_op function:
@tf.custom_gradient
def custom_op(x):
    a = 1. / (1. + K.exp(-x))
    def custom_grad(dy):
        if K.greater(a, 0.5):
            grad = K.exp(x)
        else:
            grad = 0
        return grad
    return a, custom_grad
I have followed the tutorials from this post but when I try to fit the network that I am working with I get the following warning:
WARNING:tensorflow:Gradients do not exist for variables ['diff_layer_10/Variable:0', 'diff_layer_10/Variable:0'] when minimizing the loss.
My guess is that Keras is not detecting the defined gradient because of the way it is defined but I cannot think of a different way of defining it.
Is this the case or am I missing something in my code?
EDIT
As suggested by one of the comments, I am going to further explain what I am trying to do. I want a to be a parameter that decides what happens to the input data. If a is greater than 0.5, then I want 1 subtracted from the input data; otherwise the layer should return the input data unchanged.
I do not know if that is possible to do in Keras.
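A possible reason for the warning is that a Python if on a tensor, combined with a comparison such as K.greater, leaves autodiff no path from the layer output back to w and b. One hedged sketch is to keep the hard decision in the forward pass and supply a surrogate gradient through tf.custom_gradient, multiplied by the upstream gradient dy. The name hard_gate, the use of z > 0 as the equivalent of sigmoid(z) > 0.5, the 2-D input of shape (batch, 15), and the per-sample subtraction are assumptions; the surrogate exp(z) is taken from the question's custom_grad:

```python
import tensorflow as tf

@tf.custom_gradient
def hard_gate(z):
    # Forward pass: 1.0 where sigmoid(z) > 0.5 (equivalently z > 0), else 0.0.
    on = z > 0.0
    gate = tf.cast(on, z.dtype)

    def grad(dy):
        # Surrogate gradient from the question: exp(z) where the gate is on, 0 elsewhere.
        # It is scaled by the upstream gradient dy so the chain rule holds.
        return dy * tf.where(on, tf.exp(z), tf.zeros_like(z))

    return gate, grad


class DiffLayer(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], 1), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(
            shape=(1,), initializer="random_normal", trainable=True
        )

    def call(self, x):
        z = tf.matmul(x, self.w) + self.b   # shape (batch, 1)
        # Tensor arithmetic instead of a Python `if`: subtract 1 where the gate is on.
        return x - hard_gate(z)


layer = DiffLayer()
x = tf.ones((4, 15))
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(layer(x))
grads = tape.gradient(loss, layer.trainable_variables)
```

Because the gate enters the output through ordinary tensor arithmetic, the tape returns a gradient tensor for both variables instead of None.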
My hare-brained idea is to create a custom layer that allows me to programmatically add features to a model's output.
EDIT - My "O" output values (see image below) are ASCII values. I want the "F"eature nodes to be 1 if the corresponding "O" nodes are alphabetic and 0 otherwise. In a previous experiment, the additional information made training much much better.
class Unpack_and_Categorize(keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super(Unpack_and_Categorize, self).__init__(units, **kwargs)
        self.units = units
        self.trainable = False
    def build(self, input_shape):
        self.weight = self.add_weight(
            shape=(input_shape[-1], self.units),
            trainable=True,
        )
        self.bias = self.add_weight(
            shape=(self.units,), trainable=True, dtype="float32"
        )
    def call(self, inputs):
        batch_size = inputs.shape[0]
        one_hot_size = self.units
        c = tf.constant([0] * (one_hot_size * batch_size), shape=(batch_size, one_hot_size))
        base_out = tf.tensordot(inputs, self.weight, axes = 1) + self.bias
        return tf.concat(base_out, c, shape=(batch_size, 2*one_hot_size))
This image shows what I am trying to accomplish. My custom layer (right side) has 3 values that are densely connected to the previous layer. But now I want to add three more output values that are derived entirely from O1..3. For example, I might set Fx to 1 if Ox was an even number. This would be done in the call method.
So the challenge is that I don't want to hardcode the number of outputs. That is, if the input layer has 10 inputs, then the custom layer will have 20 values. (The challenge that follows is whether it will back-prop or simply explode.)
Here is an example where we see the "A" is categorized with a 1, while the punctuation and numeric are categorized with a 0.
It's a bit difficult to say exactly what you want to do without seeing all your code, but you need to make sure that all dimensions except the one you want to concatenate are the same:
import tensorflow as tf
def call(inputs):
    w_init = tf.random_normal_initializer()
    w = tf.Variable(
        initial_value=w_init(shape=(10, 10), dtype="float32"),
        trainable=True,
    )
    batch_size = tf.shape(inputs)[0]
    # Constant block of ones with the same batch dimension as the dense output
    c = tf.ones((batch_size, 15))
    print('Inputs: ', inputs.shape)
    print('C: ', c.shape)
    # Concatenate along axis 1; all other dimensions must match
    return tf.concat((tf.tensordot(inputs, w, axes=1), c), 1)
batch_size = 5
print('Result: ', call(tf.random.normal((batch_size, 10))).shape)
Inputs: (5, 10)
C: (5, 15)
Result: (5, 25)
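Building on the concatenation above, the derived "F" values can be computed from the dense output inside call, so the number of outputs follows units rather than being hardcoded. A rough sketch, assuming TF 2.x; the helper name alphabetic_flags, the class name UnpackAndCategorize, the rounding step, and the use of tf.stop_gradient are my own assumptions, not part of the question's code:

```python
import tensorflow as tf

def alphabetic_flags(codes):
    """Return 1.0 where the rounded value is an ASCII letter (A-Z or a-z), else 0.0."""
    codes = tf.round(codes)
    upper = tf.logical_and(codes >= 65.0, codes <= 90.0)
    lower = tf.logical_and(codes >= 97.0, codes <= 122.0)
    return tf.cast(tf.logical_or(upper, lower), codes.dtype)


class UnpackAndCategorize(tf.keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs):
        base_out = tf.matmul(inputs, self.w) + self.b      # the "O" values
        # The "F" values are a hard 0/1 function of the "O" values and carry no
        # useful gradient; stop_gradient makes that explicit.
        features = tf.stop_gradient(alphabetic_flags(base_out))
        return tf.concat([base_out, features], axis=-1)    # (batch, 2 * units)


flags = alphabetic_flags(tf.constant([[65.0, 33.0, 49.0]]))   # "A", "!", "1"
out = UnpackAndCategorize(units=3)(tf.random.normal((5, 10)))
```

Back-propagation still reaches the weights through the base_out half of the output; the flag half only adds information for later layers.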
I have a custom layer and I want to print intermediate tensors that are not linked to the tensor returned by the layer's call() method (shown in the code). The code I used is:
class Similarity(Layer):
    def __init__(self, num1, num2):
        super(Similarity, self).__init__()
        self.num1 = num1
        self.num2 = num2
        # self.total = tf.Variable(initial_value=tf.zeros((16,self.num1, 1)), trainable=False)
    def build(self, input_shape):
        super(Similarity, self).build((None, self.num1, 1))
    def compute_mask(self, inputs, mask=None):
        # Just pass the received mask from previous layer, to the next layer or
        # manipulate it if this layer changes the shape of the input
        return mask
    def call(self, inputs, mask=None):
        print(">>", type(inputs), inputs.shape, inputs)
        normalized = tf.nn.l2_normalize(inputs, axis = 2)
        print("norm", normalized)
        # multiply row i with row j using transpose
        # element wise product
        similarity = tf.matmul(normalized, normalized,
                               adjoint_b = True # transpose second matrix
                               )
        print("SIM", similarity)
        z=tf.linalg.band_part(similarity, 0, -1)*3 + tf.linalg.band_part(similarity, -1, 0)*2 - tf.linalg.band_part(similarity,0,0)*6 + tf.linalg.band_part(similarity,0,0)
        # z = K.print_tensor(tf.reduce_sum(z, 2, keepdims=True))
        z = tf.reduce_sum(z, 2, keepdims=True)
        z = tf.argsort(z) # <----------- METHOD2: Reassigned the Z to the tensor I want to print temporarily
        z = K.print_tensor(z)
        print(z)
        z=tf.linalg.band_part(similarity, 0, -1)*3 + tf.linalg.band_part(similarity, -1, 0)*2 - tf.linalg.band_part(similarity,0,0)*6 + tf.linalg.band_part(similarity,0,0)
        z = K.print_tensor(tf.reduce_sum(z, 2, keepdims=True)) #<------------- THIS LINE WORKS/PRINTS AS Z is returned
        # z = tf.reduce_sum(z, 2, keepdims=True)
        #<------------- METHOD1: Want to print RANKT tensor but this DID NOT WORK
        @tf.function
        def f(z):
            rankt = K.print_tensor(tf.argsort(z))
            # rankt = tf.reshape(rankt, (-1, self.num1))
            # rankt = K.print_tensor(rankt)
            return rankt
        pt = f(z)
        return z # <--------- The returned tensor
    def compute_output_shape(self, input_shape):
        print("IS", (None, self.num1, 1))
        return (None, self.num1, 1)
To be more clear:
In method 1, I used @tf.function to print the rankt tensor, but it didn't work.
In method 2, I temporarily reassigned z (the tensor returned by call()) so that it is executed in backprop and I get the printed values. After this, I reassigned z to the original operations.
To summarize, I don't want the value of z; I want to print the value of some variable which depends on z, but I am not able to print any variable other than z.
There is tf.print function for this.
In eager mode, it returns nothing and just prints the tensors. When used while building a computation graph, it returns a TF operator that prints the tensor values as a side effect when it is executed.
I have searched a lot but I couldn't find anything to print intermediate tensors. It turns out that we can only print the tensors which are linked to the executed tensor (here z). So what I did was print z using K.print_tensor() and then, later on, use the printed values (now in list form) to perform my computation (it was a side computation, not part of the model logic).
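As far as I know, in TF 2.x a tf.print call placed directly inside call runs as a side effect both eagerly and inside a traced tf.function, even when the printed tensor does not feed the returned tensor. A minimal sketch under that assumption; the class name SimilarityWithPrint is hypothetical, the band_part weighting from the question is left out for brevity, and sorting along axis 1 is my guess at the intended ranking:

```python
import tensorflow as tf

class SimilarityWithPrint(tf.keras.layers.Layer):
    def call(self, inputs):
        normalized = tf.nn.l2_normalize(inputs, axis=2)
        similarity = tf.matmul(normalized, normalized, adjoint_b=True)
        z = tf.reduce_sum(similarity, axis=2, keepdims=True)   # (batch, num1, 1)
        # Side computation: it depends on z but is not part of the return value.
        rankt = tf.argsort(z, axis=1)
        tf.print("rankt:", rankt, summarize=-1)
        return z


out = SimilarityWithPrint()(tf.random.normal((2, 4, 8)))
```

Note that tf.print writes to stderr by default, so the values may not show up where ordinary print output goes.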
I need to rearrange a tensor's values and then reshape it in Keras, but I am struggling to find the proper way to rearrange a tensor in Keras with the TensorFlow backend.
This custom layer/function will iterate through the values, and then rearrange the values via a stride formula
This doesn't seem to have weights, so I am assuming stateless and won't affect back propagation.
It requires list slicing though:
out_array[b,channel_out, row_out, column_out] = in_array[b,i,j,k]
and this is just one of the components I am struggling with.
Here is the function/layer
def reorg(tensor, stride):
    batch,channel, height, width = (tensor.get_shape())
    out_channel = channel * (stride * stride)
    out_len = length//stride
    out_width = width//stride
    #create new empty tensor
    out_array = K.zeros((batch, out_channel, out_len, out_width))
    for b in batch:
        for i in range(channel):
            for j in range(height):
                for k in range(width):
                    channel_out = i + (j % stride) * (channel * stride) + (k % stride) * channel
                    row_out = j//stride
                    column_out = k//stride
                    out_array[b,channel_out, row_out, column_out] = K.slice(in_array,[b,i,j,k], size = (1,1,1,1))
    return out_array.astype("int")
I don't have much experience creating custom functions/layers in Keras, so I am not quite sure if I am on the right track.
Here is what the code bit is doing depending on the stride (here it's 2):
https://towardsdatascience.com/training-object-detection-yolov2-from-scratch-using-cyclic-learning-rates-b3364f7e4755
When you say re-arrange, do you mean change the order of your axes? There is a function called tf.transpose which you can use inside a custom layer. There is also tf.keras.layers.Permute which can be used without any custom code to re-order a tensor.
If you are asking how you can create a custom layer, there are some methods you'll need to implement. The docs explain it pretty well here: Custom Layers
from tensorflow.keras import layers
import tensorflow as tf
class Linear(layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units
    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
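For the reorg function itself, tensors do not support item assignment, so the element-by-element loop cannot be used as written. The same index formula can be expressed with one reshape, one transpose, and a final reshape. A sketch, assuming channels-first (batch, channel, height, width) input whose channel, height, and width are statically known and whose height and width are divisible by stride; the helper reorg_loop is a hypothetical NumPy reference that follows the question's loop and is only there to check the result:

```python
import numpy as np
import tensorflow as tf

def reorg(tensor, stride):
    """Rearrange (B, C, H, W) into (B, C*stride*stride, H//stride, W//stride)."""
    _, channel, height, width = tensor.shape
    out_h, out_w = height // stride, width // stride
    # Split j into (j // stride, j % stride) and k into (k // stride, k % stride).
    x = tf.reshape(tensor, (-1, channel, out_h, stride, out_w, stride))
    # Reorder to (B, j % stride, k % stride, C, j // stride, k // stride).
    x = tf.transpose(x, (0, 3, 5, 1, 2, 4))
    # Merging the three middle axes gives
    # channel_out = (j % stride) * (C * stride) + (k % stride) * C + i.
    return tf.reshape(x, (-1, channel * stride * stride, out_h, out_w))


def reorg_loop(in_array, stride):
    """Slow NumPy reference that applies the question's index formula literally."""
    batch, channel, height, width = in_array.shape
    out = np.zeros(
        (batch, channel * stride * stride, height // stride, width // stride),
        dtype=in_array.dtype,
    )
    for b in range(batch):
        for i in range(channel):
            for j in range(height):
                for k in range(width):
                    channel_out = i + (j % stride) * (channel * stride) + (k % stride) * channel
                    out[b, channel_out, j // stride, k // stride] = in_array[b, i, j, k]
    return out


x = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
fast = reorg(tf.constant(x), 2).numpy()
slow = reorg_loop(x, 2)
```

Since the function has no weights and only moves values around, gradients pass through it unchanged apart from the reordering.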
Since the Keras wrapper does not support attention model yet, I'd like to refer to the following custom attention.
https://github.com/datalogue/keras-attention/blob/master/models/custom_recurrents.py
But the problem is, when I run the code above, it returns the following error:
ImportError: cannot import name '_time_distributed_dense'
It looks like _time_distributed_dense is no longer supported in Keras versions after 2.0.0.
The only part that uses _time_distributed_dense is the one below:
def call(self, x):
    # store the whole sequence so we can "attend" to it at each timestep
    self.x_seq = x
    # apply a dense layer over the time dimension of the sequence
    # do it here because it doesn't depend on any previous steps
    # therefore we can save computation time:
    self._uxpb = _time_distributed_dense(self.x_seq, self.U_a, b=self.b_a,
                                         input_dim=self.input_dim,
                                         timesteps=self.timesteps,
                                         output_dim=self.units)
    return super(AttentionDecoder, self).call(x)
How should I change the _time_distributed_dense(self ... ) part?
I am reposting An Chen's answer from the GitHub issue (the page or his answer might be deleted in the future):
def _time_distributed_dense(x, w, b=None, dropout=None,
                            input_dim=None, output_dim=None,
                            timesteps=None, training=None):
    """Apply `y . w + b` for every temporal slice y of x.
    # Arguments
        x: input tensor.
        w: weight matrix.
        b: optional bias vector.
        dropout: whether to apply dropout (same dropout mask
            for every temporal slice of the input).
        input_dim: integer; optional dimensionality of the input.
        output_dim: integer; optional dimensionality of the output.
        timesteps: integer; optional number of timesteps.
        training: training phase tensor or boolean.
    # Returns
        Output tensor.
    """
    if not input_dim:
        input_dim = K.shape(x)[2]
    if not timesteps:
        timesteps = K.shape(x)[1]
    if not output_dim:
        output_dim = K.shape(w)[1]
    if dropout is not None and 0. < dropout < 1.:
        # apply the same dropout pattern at every timestep
        ones = K.ones_like(K.reshape(x[:, 0, :], (-1, input_dim)))
        dropout_matrix = K.dropout(ones, dropout)
        expanded_dropout_matrix = K.repeat(dropout_matrix, timesteps)
        x = K.in_train_phase(x * expanded_dropout_matrix, x, training=training)
    # collapse time dimension and batch dimension together
    x = K.reshape(x, (-1, input_dim))
    x = K.dot(x, w)
    if b is not None:
        x = K.bias_add(x, b)
    # reshape to 3D tensor
    if K.backend() == 'tensorflow':
        x = K.reshape(x, K.stack([-1, timesteps, output_dim]))
        x.set_shape([None, None, output_dim])
    else:
        x = K.reshape(x, (-1, timesteps, output_dim))
    return x
You can simply add this function to your own Python code.
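On the TensorFlow backend, the no-dropout path of this helper can also be written without the reshapes. A minimal sketch, assuming TF 2.x and a 3-D input of shape (batch, timesteps, input_dim); the function name time_distributed_dense (no leading underscore) and the use of tf.einsum are my own choices and are not part of Keras:

```python
import numpy as np
import tensorflow as tf

def time_distributed_dense(x, w, b=None):
    """Apply `y . w + b` to every temporal slice y of x (no dropout)."""
    # b: batch, t: timestep, i: input_dim, o: output_dim
    y = tf.einsum("bti,io->bto", x, w)
    if b is not None:
        y = y + b
    return y


x = tf.random.normal((2, 5, 3))
w = tf.random.normal((3, 4))
b = tf.random.normal((4,))
out = time_distributed_dense(x, w, b)

# Reference: collapse batch and time, multiply, then restore the 3-D shape,
# which mirrors the reshape/dot/reshape steps of the original helper.
ref = tf.reshape(tf.matmul(tf.reshape(x, (-1, 3)), w) + b, (-1, 5, 4))
```

Both versions should agree up to floating-point rounding; the einsum form simply lets TensorFlow handle the batch and time axes.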