Tensorflow custom gradient not giving expected answer - python

I am learning custom gradients in TensorFlow 1.14. I am testing them by defining a custom gradient for a simple ReLU function as follows:
import numpy as np
import tensorflow as tf
@tf.custom_gradient
def rateFunction(v_):
    z_ = tf.nn.relu(v_)
    def grad(dy):
        # ReLU derivative: 1 where v_ > 0, else 0
        dz_dv = tf.where(tf.greater(v_, 0.), tf.ones_like(v_), tf.zeros_like(v_))
        dv = dy * dz_dv
        return [dv]
    return z_, grad
# define test input
vv = tf.random.normal((32,100))
# output from customized gradient
z1 = rateFunction(vv)
and I expect the gradient computed using the custom gradient to match the gradient of the actual ReLU, but it does not:
# output of actual relu
z2 = tf.nn.relu(vv)
# Compute the gradient
sess = tf.Session()
dzdv1=sess.run(tf.gradients(z1, vv)[0])
dzdv2=sess.run(tf.gradients(z2, vv)[0])
# Expect to match, i.e. difference to be 0
print(np.mean(np.abs(dzdv1-dzdv2)))
but the difference between the expected and actual gradients is not zero; I get a mean absolute difference of about 0.49. Can someone please explain why this is happening? Thanks a lot!

The problem comes from
vv = tf.random.normal((32,100))
a different random input is generated on each call to sess.run, so the two gradients are evaluated at different inputs.
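For what it's worth, a minimal sketch of the fix: pin the input to concrete values (or evaluate both gradients in a single sess.run) so both graphs see the same tensor.

# Fix: use a fixed input so both gradients are evaluated at the same values
vv_np = np.random.normal(size=(32, 100)).astype(np.float32)
vv = tf.constant(vv_np)
z1 = rateFunction(vv)
z2 = tf.nn.relu(vv)
sess = tf.Session()
# Running both gradient ops in one sess.run also guarantees a single sample
dzdv1, dzdv2 = sess.run([tf.gradients(z1, vv)[0], tf.gradients(z2, vv)[0]])
print(np.mean(np.abs(dzdv1 - dzdv2)))  # 0.0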

Related

Keras Adam minimize function: no gradients provided

I need to optimize a function with the Adam optimizer (no neural network involved). I made a dummy example to understand how it works, using the minimize function, but it seems like I'm not getting it. It's a simple function that returns the dot product of two arrays (as tf variables). Code below:
import numpy as np
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

np.random.seed(1)
phi = tf.Variable(initial_value=np.random.rand(32))
theta = tf.Variable(initial_value=np.random.rand(32))
loss = lambda: tf.Variable(np.dot(phi, theta))
optimizer = Adam(learning_rate=0.1)
niter = 5
for _ in range(niter):
    optimizer.minimize(loss, [phi, theta])
    print(phi[:5].numpy(), theta[:5].numpy())
I'm getting the following error in return:
ValueError: No gradients provided for any variable: (['Variable:0', 'Variable:0'],).
Can anyone tell me what I'm doing wrong?
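For what it's worth, the likely culprit is that np.dot leaves the TensorFlow graph and wrapping its result in a fresh tf.Variable severs the gradient path, so the optimizer sees no gradients. A minimal sketch of a loss closure that stays in TensorFlow ops, assuming the goal is simply the dot product of phi and theta:

# Keep the computation in TensorFlow ops so gradients can flow to phi and theta
loss = lambda: tf.tensordot(phi, theta, axes=1)
optimizer = Adam(learning_rate=0.1)
for _ in range(5):
    optimizer.minimize(loss, [phi, theta])
    print(phi[:5].numpy(), theta[:5].numpy())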

How do I use autograd for a separate function independent of backpropagate in PyTorch?

I have two variables, x and theta. I am trying to minimise my loss with respect to theta only, but as part of my loss function I need the derivative of a different function (f) with respect to x. This derivative itself is not relevant to the minimisation, only its output. However, when implementing this in PyTorch I am getting a Runtime error.
A minimal example is as follows:
# minimal example of two different autograds
import torch
from torch.autograd.functional import jacobian
def f(theta, x):
    return torch.sum(theta * x ** 2)

def df(theta, x):
    J = jacobian(lambda x: f(theta, x), x)
    return J
# example evaluations of the autograd gradient
x = torch.tensor([1., 2.])
theta = torch.tensor([1., 1.], requires_grad = True)
# derivative should be 2*theta*x (same as an analytical)
with torch.no_grad():
    print(df(theta, x))
    print(2*theta*x)
tensor([2., 4.])
tensor([2., 4.])
# define some arbitrary loss as a fn of theta
loss = torch.sum(df(theta, x)**2)
loss.backward()
gives the following error
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
If I provide an analytic derivative (2*theta*x), it works fine:
loss = torch.sum((2*theta*x)**2)
loss.backward()
Is there a way to do this in PyTorch? Or am I limited in some way?
Let me know if anyone needs any more details.
PS
I am imagining the solution is something similar to the way that JAX does autograd, as that is what I am more familiar with. What I mean here is that in JAX I believe you would just do:
from jax import grad
df = grad(lambda x: f(theta, x))
and then df would just be a function that can be called at any point. But is PyTorch the same? Or is there some conflict within .backward() that causes this error?
PyTorch's jacobian does not create a computation graph unless you explicitly ask for it via the create_graph argument:

J = jacobian(lambda x: f(theta, x), x, create_graph=True)

The documentation is quite clear about it:

create_graph (bool, optional) – If True, the Jacobian will be computed in a differentiable manner
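With that change, a sketch of the original example backpropagates through the Jacobian as intended:

def df(theta, x):
    # create_graph=True keeps the Jacobian differentiable with respect to theta
    return jacobian(lambda x: f(theta, x), x, create_graph=True)

loss = torch.sum(df(theta, x)**2)
loss.backward()  # now populates theta.grad instead of raising
print(theta.grad)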

Why does Tensorflow's automatic differentiation fail when .numpy() is used in the loss function?

I've noticed that Tensorflow's automatic differentiation does not give the same values as finite differences when the loss function converts the input to a numpy array to calculate the output value. Here's a minimum working example of the problem:
import tensorflow as tf
import numpy as np
def lossFn(inputTensor):
    # Input is a rank-2 square tensor
    return tf.linalg.trace(inputTensor @ inputTensor)

def lossFnWithNumpy(inputTensor):
    # Same function, but converts the input to a numpy array before computing the trace
    inputArray = inputTensor.numpy()
    return tf.linalg.trace(inputArray @ inputArray)
N = 2
tf.random.set_seed(0)
randomTensor = tf.random.uniform([N, N])
# Prove that the two functions give the same output; evaluates to exactly zero
print(lossFn(randomTensor) - lossFnWithNumpy(randomTensor))
theoretical, numerical = tf.test.compute_gradient(lossFn, [randomTensor])
# These two values match
print(theoretical[0])
print(numerical[0])
theoretical, numerical = tf.test.compute_gradient(lossFnWithNumpy, [randomTensor])
# The theoretical value is [0 0 0 0]
print(theoretical[0])
print(numerical[0])
The function tf.test.compute_gradient computes the 'theoretical' gradient using automatic differentiation, and the numerical gradient using finite differences. As the code shows, if I use .numpy() in the loss function, automatic differentiation does not calculate the gradient.
Could anybody explain the reason for this?
From the guide Introduction to Gradients and Automatic Differentiation:
The tape can't record the gradient path if the calculation exits TensorFlow. For example:
x = tf.Variable([[1.0, 2.0],
                 [3.0, 4.0]], dtype=tf.float32)

with tf.GradientTape() as tape:
    x2 = x**2
    # This step is calculated with NumPy
    y = np.mean(x2, axis=0)
    # Like most ops, reduce_mean will cast the NumPy array to a constant tensor
    # using `tf.convert_to_tensor`.
    y = tf.reduce_mean(y, axis=0)

print(tape.gradient(y, x))
outputs None
The numpy value is cast back to a constant tensor in the call to tf.linalg.trace, and TensorFlow cannot compute gradients through a constant.
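A sketch of the fix for the guide's example: keep every step inside TensorFlow ops so the tape can record it.

with tf.GradientTape() as tape:
    x2 = x**2
    y = tf.reduce_mean(x2, axis=0)  # stays inside TensorFlow
    y = tf.reduce_mean(y, axis=0)

print(tape.gradient(y, x))  # a real gradient now, not None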

Basic function minimisation and variable tracking in TensorFlow 2.0

I am trying to perform the most basic function minimisation possible in TensorFlow 2.0, exactly as in the question Tensorflow 2.0: minimize a simple function, however I cannot get the solution described there to work. Here is my attempt, mostly copy-pasted but with some bits that seemed to be missing added in.
import tensorflow as tf

x = tf.Variable(2, name='x', trainable=True, dtype=tf.float32)

with tf.GradientTape() as t:
    y = tf.math.square(x)

# Is the tape that computes the gradients!
trainable_variables = [x]

#### Option 2
# To use minimize you have to define your loss computation as a function
def compute_loss():
    y = tf.math.square(x)
    return y

opt = tf.optimizers.Adam(learning_rate=0.001)
train = opt.minimize(compute_loss, var_list=trainable_variables)
print("x:", x)
print("y:", y)
Output:
x: <tf.Variable 'x:0' shape=() dtype=float32, numpy=1.999>
y: tf.Tensor(4.0, shape=(), dtype=float32)
So it says the minimum is at x=1.999, but obviously that is wrong. So what happened? I suppose it only performed one loop of the minimiser or something? If so then "minimize" seems like a terrible name for the function. How is this supposed to work?
On a side note, I also need to know the values of intermediate variables that are calculated in the loss function (the example only has y, but imagine that it took several steps to compute y and I want all those numbers). I don't think I am using the gradient tape correctly either; it is not obvious to me that it has anything to do with the computations in the loss function (I just copied this stuff from the other question).
You need to call minimize multiple times, because minimize only performs a single step of your optimisation.
The following should work:
import tensorflow as tf

x = tf.Variable(2, name='x', trainable=True, dtype=tf.float32)
trainable_variables = [x]

# To use minimize you have to define your loss computation as a function
class Model():
    def __init__(self):
        self.y = 0

    def compute_loss(self):
        self.y = tf.math.square(x)
        return self.y

opt = tf.optimizers.Adam(learning_rate=0.01)
model = Model()
for i in range(1000):
    train = opt.minimize(model.compute_loss, var_list=trainable_variables)

print("x:", x)
print("y:", model.y)

matrix determinant differentiation in tensorflow

I am interested in computing the derivative of a matrix determinant using TensorFlow. I can see from experimentation that TensorFlow has not implemented a method of differentiating through a determinant:
LookupError: No gradient defined for operation 'MatrixDeterminant'
(op type: MatrixDeterminant)
A little further investigation revealed that it is actually possible to compute the derivative; see for example Jacobi's formula. I determined that in order to implement this means of differentiating through a determinant, I need to use the gradient-registration decorator:

@tf.RegisterGradient("MatrixDeterminant")
def _sub_grad(op, grad):
    ...

However, I am not familiar enough with TensorFlow to understand how this can be accomplished. Does anyone have any insight on this matter?
Here's an example where I run into this issue:
x = tf.Variable(tf.ones(shape=[1]))
y = tf.Variable(tf.ones(shape=[1]))
A = tf.reshape(
    tf.pack([tf.sin(x), tf.zeros([1, ]), tf.zeros([1, ]), tf.cos(y)]), (2, 2)
)
loss = tf.square(tf.matrix_determinant(A))
optimizer = tf.train.GradientDescentOptimizer(0.001)
train = optimizer.minimize(loss)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for step in xrange(100):
    sess.run(train)
    print sess.run(x)
Please check "Implement Gradient in Python" section here
In particular, you can implement it as follows
@ops.RegisterGradient("MatrixDeterminant")
def _MatrixDeterminantGrad(op, grad):
    """Gradient for MatrixDeterminant. Uses formula 2.2.4 from
    "An extended collection of matrix derivative results for forward and reverse
    mode algorithmic differentiation" by Mike Giles
    -- http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf
    """
    A = op.inputs[0]
    C = op.outputs[0]
    Ainv = tf.matrix_inverse(A)
    return grad * C * tf.transpose(Ainv)
Then a simple training loop to check that it works:
a0 = np.array([[1, 2], [3, 4]]).astype(np.float32)
a = tf.Variable(a0)
b = tf.square(tf.matrix_determinant(a))
init_op = tf.initialize_all_variables()
sess = tf.InteractiveSession()
init_op.run()

minimization_steps = 50
learning_rate = 0.001
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(b)

losses = []
for i in range(minimization_steps):
    train_op.run()
    losses.append(b.eval())
Then you can visualize your loss over time
import matplotlib.pyplot as plt
plt.ylabel("Determinant Squared")
plt.xlabel("Iterations")
plt.plot(losses)
You should see the squared determinant decrease toward zero over the iterations.
I think you are confused about what the derivative of a matrix determinant is.
The matrix determinant is a function calculated over the elements of the matrix by a fixed formula. If all the elements of the matrix are numbers, the determinant is just one number and its derivative is 0. When some of the elements are variables, you get an expression in those variables. For example, for the matrix

x    x^2
1    sin(x)

the determinant is x*sin(x) - x^2 and its derivative with respect to x is sin(x) + x*cos(x) - 2x. Jacobi's formula, d/dt det A(t) = tr(adj(A(t)) dA(t)/dt), simply connects the derivative of the determinant with the adjugate matrix.
In your example, your matrix A consists only of numbers, so the determinant is just a number and the loss is just a number as well. GradientDescentOptimizer needs some free variables to minimize, and it has none because your loss is just a number.
For those who are interested, I discovered the solution that works on my problems:
@tf.RegisterGradient("MatrixDeterminant")
def _MatrixDeterminant(op, grad):
    """Gradient for MatrixDeterminant."""
    # Jacobi's formula, scaled by the incoming gradient: grad * det(A) * inv(A)^T
    return grad * op.outputs[0] * tf.transpose(tf.matrix_inverse(op.inputs[0]))
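As a quick sanity check (a sketch using the same TF 0.x-era API as the code above), the registered gradient should reproduce Jacobi's formula:

A = tf.Variable(np.array([[1., 2.], [3., 4.]], dtype=np.float32))
det = tf.matrix_determinant(A)
dA = tf.gradients(det, A)[0]
sess = tf.InteractiveSession()
tf.initialize_all_variables().run()
# Jacobi's formula: d(det A)/dA = det(A) * inv(A)^T = [[4, -3], [-2, 1]] here
print(dA.eval())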
