I am trying to make some changes to the built-in dropout function in TensorFlow. What is the best procedure to do so?
I'd like to make some changes to the forward and backpropagation steps. In the TensorFlow implementation I can only find the forward pass, not the backward pass.
I'd like to modify both the forward and backward pass.
You can use tf.custom_gradient to define your own forward and backprop step in a single method. Here is a simple example:
import tensorflow as tf
tf.InteractiveSession()

@tf.custom_gradient
def custom_multiply(a, x):
    # Define your own forward step
    y = a * x
    # Define your own backward step
    def grads(dy):
        return dy * x, dy * a + 100
    # Return the forward result and the backward function
    return y, grads

a, x = tf.constant(2), tf.constant(3)
y = custom_multiply(a, x)
dy_dx = tf.gradients(y, x)[0]
# Prints `dy/dx = 102`; without the customized gradient it would print 2
print('dy/dx =', dy_dx.eval())
If you want to customize your own layer, simply replace the core function used in tf.layers.Dropout.call with your own.
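For instance, here is a minimal sketch of a dropout-like op with a customized backward step (the fixed keep probability and the mask handling are illustrative, not the built-in implementation):
@tf.custom_gradient
def my_dropout(x):
    keep_prob = 0.5
    # Forward step: standard inverted dropout
    mask = tf.cast(tf.random.uniform(tf.shape(x)) < keep_prob, x.dtype)
    y = x * mask / keep_prob
    # Backward step: reuse the same mask here, or change this to whatever you need
    def grads(dy):
        return dy * mask / keep_prob
    return y, grads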
Related
I want to approximate the function g(x) = exp(x) with a linear combination of functions, h(x) = sum_i(a_i * f_i(x)) using a neural network with TF2.
Now, the network input is just x and the outputs are the functions f_i.
The custom loss function is simple: the mean squared difference |g(x) - h(x)|^2.
My problem is that I don't understand how to define/use a_i?
First, all a_i are just scalars; in addition, they don't depend on x.
Defining a_i as inputs doesn't make sense since I want them to be optimized.
Defining as outputs makes them depend on x which I don't want.
How can I add these variables as scalars to the network and make them optimized by the optimization process?
During training, TensorFlow tracks all tf.Variable objects defined in the model. If you define your constant as a tf.Variable, it will be adjusted via backpropagation.
Let's say we have a dataset with X, and y, where y = X * 2:
import tensorflow as tf
x = tf.random.uniform((10_000, 1))
y = x * 2
We will create a model with a constant inside, which will need to replicate the relationship between X and y. We will of course initialize this value to something other than 2. Since the relationship between X and y is 2, training should make the constant converge towards 2. So let's define a model that is nothing but a constant.
class CustomModel(tf.keras.models.Model):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.constant = tf.Variable(initial_value=0.1, dtype=tf.float32, trainable=True)

    def call(self, x, **kwargs):
        x = self.constant * x
        return x
model = CustomModel()
Now just compile and train the model:
model.compile(loss='mae')
history = model.fit(x, y, epochs=25, verbose=0)
Now look at the weights. The constant, which was initialized with value 0.1, is now 2. It was optimized, as it understood the relationship between X and y, which is 2.
model.weights
[<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>]
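The same idea applies to the original question: make the coefficients a_i a trainable tf.Variable inside the model. A rough sketch (the hidden layer size and the number of basis functions n are made up for illustration):
class LinearCombination(tf.keras.models.Model):
    def __init__(self, n=10):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(32, activation='tanh')
        self.f = tf.keras.layers.Dense(n)                    # the functions f_i(x)
        self.a = tf.Variable(tf.ones((n,)), trainable=True)  # the scalar coefficients a_i

    def call(self, x, **kwargs):
        f = self.f(self.hidden(x))
        # h(x) = sum_i a_i * f_i(x)
        return tf.reduce_sum(self.a * f, axis=-1, keepdims=True)

model = LinearCombination()
model.compile(optimizer='adam', loss='mse')  # mean squared |g(x) - h(x)|^2
# model.fit(x_train, tf.exp(x_train), ...)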
I am learning custom gradients in TensorFlow 1.14. I am testing them out by defining a custom gradient for a simple ReLU function as follows:
import numpy as np
import tensorflow as tf

@tf.custom_gradient
def rateFunction(v_):
    z_ = tf.nn.relu(v_)
    def grad(dy):
        dz_dv = tf.where(tf.greater(v_, 0.), tf.ones_like(v_), tf.zeros_like(v_))
        dv = dy * dz_dv
        return [dv]
    return z_, grad

# define test input
vv = tf.random.normal((32,100))
# output from customized gradient
z1 = rateFunction(vv)
and I expect the gradient computed using the custom gradient to match the gradient of the actual ReLU, but it does not:
# output of actual relu
z2 = tf.nn.relu(vv)
# Compute the gradient
sess = tf.Session()
dzdv1=sess.run(tf.gradients(z1, vv)[0])
dzdv2=sess.run(tf.gradients(z2, vv)[0])
# Expect to match, i.e. difference to be 0
print(np.mean(np.abs(dzdv1-dzdv2)))
but the difference between the expected and actual gradients is not zero. I got a mean absolute difference of about 0.49. Can someone please explain to me why this is happening? Thanks a lot!
The problem comes from
vv = tf.random.normal((32,100))
a different random input is generated on each sess.run call, so the two gradients are evaluated on different samples of vv.
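One way to verify this (a sketch): evaluate both gradients in a single sess.run call, so they share the same draw of vv:
g1 = tf.gradients(z1, vv)[0]
g2 = tf.gradients(z2, vv)[0]
# Within one run the random op is sampled once, so both gradients see the same input
dzdv1, dzdv2 = sess.run([g1, g2])
print(np.mean(np.abs(dzdv1 - dzdv2)))  # 0.0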
I want to multiply a Keras layer with my own Variable.
Then, I want to compute the gradients of some loss relative to the variables I have defined.
Here is a simplified MWE of what I am trying to do:
import tensorflow as tf
x = input_shape = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(x)
s = tf.Variable(tf.ones((5,)))
x = x*s
model = tf.keras.models.Model(input_shape, x)
X = tf.random.normal((50, 10)) # random sample
with tf.GradientTape() as tape:
    tape.watch(s)
    y = model(X)
    loss = y**2

print(tape.gradient(loss, s))  # why None ??
The print statement prints None... why?
Note that I am using eager execution (TF version 2.0.0).
I managed to fix my problem by sub-classing Model and creating my variable inside the model:
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(5)
        self.s = tf.Variable(tf.ones((5,)))

    def call(self, inputs):
        x = self.dense(inputs)
        x = x * self.s
        return x
Alternatively, defining my own custom layer also works.
There must be some magic going on whereby variables not inside a model are not backpropagated (like in PyTorch).
I will leave the question open because I am curious as to why my code was not working and what a simpler fix would look like.
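For reference, here is a sketch of the custom-layer alternative mentioned above (the name ScaleLayer is made up for illustration):
class ScaleLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.s = tf.Variable(tf.ones((5,)), trainable=True)

    def call(self, inputs):
        return inputs * self.s

inputs = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(inputs)
scale = ScaleLayer()
x = scale(x)
model = tf.keras.models.Model(inputs, x)

X = tf.random.normal((50, 10))
with tf.GradientTape() as tape:
    loss = model(X) ** 2

print(tape.gradient(loss, scale.s))  # no longer None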
This might be the explanation. Based on reviewing the documentation, I suspect that the issue is that differentiating with respect to the model layer "s" (or any other layer, say "x") might not be a meaningful calculation. For example, it is possible to do this:
print(tape.gradient(loss, model.variables))
and obtain the gradients with respect to the model weights/parameters, but differentiating the model with respect to a "layer" is not appropriate. This is my speculation at this point. I hope this helps.
I am trying to perform the most basic function minimisation possible in TensorFlow 2.0, exactly as in the question Tensorflow 2.0: minimize a simple function; however, I cannot get the solution described there to work. Here is my attempt, mostly copy-pasted but with some bits that seemed to be missing added in.
import tensorflow as tf
x = tf.Variable(2, name='x', trainable=True, dtype=tf.float32)

with tf.GradientTape() as t:
    y = tf.math.square(x)

# Is the tape that computes the gradients!
trainable_variables = [x]

#### Option 2
# To use minimize you have to define your loss computation as a function
def compute_loss():
    y = tf.math.square(x)
    return y

opt = tf.optimizers.Adam(learning_rate=0.001)
train = opt.minimize(compute_loss, var_list=trainable_variables)

print("x:", x)
print("y:", y)
Output:
x: <tf.Variable 'x:0' shape=() dtype=float32, numpy=1.999>
y: tf.Tensor(4.0, shape=(), dtype=float32)
So it says the minimum is at x=1.999, but obviously that is wrong. So what happened? I suppose it only performed one loop of the minimiser or something? If so then "minimize" seems like a terrible name for the function. How is this supposed to work?
On a side note, I also need to know the values of intermediate variables that are calculated in the loss function (the example only has y, but imagine that it took several steps to compute y, and I want all those numbers). I don't think I am using the gradient tape correctly either; it is not obvious to me that it has anything to do with the computations in the loss function (I just copied this stuff from the other question).
You need to call minimize multiple times, because minimize only performs a single step of your optimisation.
The following should work:
import tensorflow as tf

x = tf.Variable(2, name='x', trainable=True, dtype=tf.float32)

# Is the tape that computes the gradients!
trainable_variables = [x]

# To use minimize you have to define your loss computation as a function
class Model():
    def __init__(self):
        self.y = 0

    def compute_loss(self):
        self.y = tf.math.square(x)
        return self.y

opt = tf.optimizers.Adam(learning_rate=0.01)
model = Model()

for i in range(1000):
    train = opt.minimize(model.compute_loss, var_list=trainable_variables)

print("x:", x)
print("y:", model.y)
Say I have some custom operation binarizer used in a neural network. The operation takes a Tensor and constructs a new Tensor. I would like to modify that operation such that it is only used in the forward pass. In the backward pass, when gradients are calculated, it should just pass through the gradients reaching it.
More concretely, say binarizer is:
def binarizer(input):
    prob = tf.truediv(tf.add(1.0, input), 2.0)
    bernoulli = tf.contrib.distributions.Bernoulli(p=prob, dtype=tf.float32)
    return 2 * bernoulli.sample() - 1
and I set up my network:
# ...
h1_before_b = tf.nn.tanh(tf.matmul(x, W) + bias_h1)
h1 = binarizer(h1_before_b)
# ...
loss = tf.reduce_mean(tf.square(y - y_true))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
How do I tell TensorFlow to skip gradient calculation in the backward pass?
I tried defining a custom operation as described in this answer; however, py_func cannot return Tensors (that's not what it is made for), and I get:
UnimplementedError (see above for traceback): Unsupported object type Tensor
You're looking for tf.stop_gradient(input, name=None):
Stops gradient computation.
When executed in a graph, this op outputs its input tensor as-is.
h1 = binarizer(h1_before_b)
h1 = tf.stop_gradient(h1)
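Note that tf.stop_gradient on its own blocks the gradient entirely. If the goal is for gradients to pass through the binarizer unchanged (a straight-through estimator), a common pattern, sketched here rather than taken from the original answer, combines it with the pre-binarization activations:
# Forward pass uses the binarized value; backward pass acts as the identity,
# because the term wrapped in stop_gradient contributes no gradient.
h1 = h1_before_b + tf.stop_gradient(binarizer(h1_before_b) - h1_before_b)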