Why are the gradients unconnected in the following function? (Python)

I am implementing a custom operation whose gradients must be calculated. The following is the function:
def difference(prod, box):
    result = tf.Variable(tf.zeros((prod.shape[0], box.shape[1]), dtype=tf.float16))
    for i in tf.range(0, prod.shape[0]):
        for j in tf.range(0, box.shape[1]):
            result[i, j].assign((tf.reduce_prod(box[:, j]) - tf.reduce_prod(prod[i, :])) / tf.reduce_prod(box[:, j]))
    return result
I am unable to calculate the gradients with respect to box: tape.gradient() returns None. Here is the code I have written for calculating the gradients:
prod = tf.constant([[3, 4, 5], [4, 5, 6], [1, 3, 3]], dtype=tf.float16)
box = tf.Variable([[4, 5], [5, 6], [5, 7]], dtype=tf.float16)
with tf.GradientTape() as tape:
    tape.watch(box)
    loss = difference(prod, box)
print(tape.gradient(loss, box))
I am not able to find the reason for the unconnected gradients. Is the result variable causing it? Kindly suggest an alternative implementation.

Yes: in order to calculate gradients, we need a chain of (differentiable) operations on your variables.
You should rewrite difference as a function of the two input tensors. I think (though I'm happy to confess I am not 100% sure!) that it is the use of assign that makes the gradient tape fall over.
Perhaps something like this:
def difference(prod, box):
    box_red = tf.reduce_prod(box, axis=0)
    prod_red = tf.reduce_prod(prod, axis=1)
    return (tf.expand_dims(box_red, 0) - tf.expand_dims(prod_red, 1)) / tf.expand_dims(box_red, 0)
would get you the desired result.
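As a sanity check, the broadcasting in the vectorized rewrite reproduces the original double loop; a minimal NumPy sketch using the same values as in the question:

```python
import numpy as np

prod = np.array([[3, 4, 5], [4, 5, 6], [1, 3, 3]], dtype=np.float64)
box = np.array([[4, 5], [5, 6], [5, 7]], dtype=np.float64)

# Vectorized: result[i, j] = (prod of box[:, j] - prod of prod[i, :]) / prod of box[:, j]
box_red = np.prod(box, axis=0)    # shape (2,): product of each column of box
prod_red = np.prod(prod, axis=1)  # shape (3,): product of each row of prod
vectorized = (box_red[None, :] - prod_red[:, None]) / box_red[None, :]

# Reference: the explicit double loop from the question
looped = np.empty((prod.shape[0], box.shape[1]))
for i in range(prod.shape[0]):
    for j in range(box.shape[1]):
        looped[i, j] = (np.prod(box[:, j]) - np.prod(prod[i, :])) / np.prod(box[:, j])

assert np.allclose(vectorized, looped)
```

Since every operation in the vectorized form (reduce_prod, expand_dims, subtraction, division) has a registered gradient, the TensorFlow version of the same expression stays connected to the tape.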

Related

Tensorflow giving a ValueError: No gradients provided for any variable

I'm trying to implement a loss function that increases the loss as the model ranks images from worst to best. To do this I've come up with an algorithm that sorts the predicted score array according to the true scores of the image batch. Then, starting from the largest predicted score, it checks how far that score is from the first position in the array and assigns a loss based on that distance; for the second largest, it checks how far it is from the second array position and assigns a loss based on that, and so on.
To do this, I'm using tf.nn.top_k and other functions that, to my knowledge, are all differentiable, but I still get the "No gradients provided" error.
Can someone please tell me which part I am doing wrong?
Please note that the global sub_tensor is a workaround (replacing correct_indices) to avoid using a range, which I know is non-differentiable: it is a fixed array from outside the function holding the range of the batch length [0-32]. This still didn't work:
sub_tensor = tf.constant(np.array([np.arange(32)], dtype='int32'))

def get_ranking_loss(y_true, y_pred):
    global sub_tensor
    _, y_true_ind_k = tf.nn.top_k(y_true, y_true.shape[1])
    sorted_y_pred = tf.gather(y_pred, y_true_ind_k)
    _, y_pred_ind_k = tf.nn.top_k(sorted_y_pred, sorted_y_pred.shape[1])
    # correct_indices = tf.range(0, sorted_y_pred.shape[1])
    subtracted = tf.math.subtract(y_pred_ind_k, sub_tensor)
    absolute = tf.abs(subtracted)
    absolute = tf.cast(absolute, tf.float64)
    return tf.reduce_sum(absolute)
I tried changing almost all the functions to tf functions only, but no luck.
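For context on why no gradients are provided here: the indices returned by tf.nn.top_k are integer-valued and piecewise constant in the scores, so wherever a derivative exists it is zero, and TensorFlow reports no gradient at all for integer tensors. A small NumPy sketch of the piecewise-constant behaviour (np.argsort stands in for the index output of top_k):

```python
import numpy as np

scores = np.array([0.3, 0.9, 0.1, 0.5])
eps = 1e-6

# Ranking indices before and after an infinitesimal perturbation of the scores.
idx_before = np.argsort(-scores)
idx_after = np.argsort(-(scores + eps * np.array([1.0, -1.0, 1.0, -1.0])))

# The indices do not move, so d(indices)/d(scores) == 0: a loss built purely
# from the indices provides no gradient signal to the model.
assert np.array_equal(idx_before, idx_after)
```

A loss built only from ranks therefore gives the optimizer nothing to follow; differentiable ranking surrogates instead replace the hard indices with smooth functions of the scores themselves.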

Is there any method that can accelerate the Python gradient calculation?

I am working on an algorithm which needs gradient information.
I tried numdifftools.Gradient; this function works well, but the time cost is unsustainable.
Briefly, what I'm doing is initializing a multi-dimensional (say, d-dimensional) vector t, then using the t vector to parameterize a matrix A; the matrix, along with other information, gives an energy value, which is a scalar output.
I need the gradient of the energy with respect to t, element-wise, so that I can update the t parameters and continue the loop.
My code looks like this:
def initialize(d):
    ...
    return t

def A(t):
    ...
    return A, result_2, result_3, ...

def energy(t, A, para_2, para_3, ...):
    ...
    # some matrix calculation, including kron etc.
    ...
    return e

grad = numdifftools.Gradient(energy)(t)
# this returns an array with the same shape as t, representing
# the element-wise gradient of the energy function w.r.t. t
t -= grad * learning_rate
This does exactly what I want; however, when the dimension grows, the gradient calculation can take several minutes for a single iteration, while I need to perform thousands of iterations.
I tried Google's JAX, but it seems that JAX only works when the output is a single scalar, while here I need matrix results.
Actually, you don't need to know exactly what I am doing; this is just a time-cost optimization problem about gradients.
Is there any better way to do this?
Have you tried the cache decorator? It might help:
from functools import cache

@cache
def my_function():
    """do things"""

How to use TensorFlow to approximate the Hessian matrix's norm

I wonder whether there is any method to recompute gradients with updated weights within a graph, or whether there is a better way to do this. For example, to estimate the Hessian norm, we need to compute:
delta ~ N(0, I)
hessian_norm ≈ (1/M) * sum_{m=1}^{M} ||gradient(f(x + delta_m)) - gradient(f(x - delta_m))|| / (2 * ||delta_m||)
We need the gradient value at x + delta; currently we get None if we use tf.gradients on var + delta directly.
More specifically, if we define
a = tf.Variable(...)
b = some_function(a)
grad = tf.gradients(b, a)
that is a normal gradient computation, but if we do
grad_delta = tf.gradients(b, a + delta)
it returns None. This seems to make it impossible to approximate the Hessian norm using the above method.
b is not a function of a + delta, so you get None. You either need to create a new value b2 which depends on a + delta, or just move your variable a by delta and evaluate again to get the second value.
This is similar to how you do a line search in TensorFlow.
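The numerics behind the estimator can be sketched without a graph at all: for a quadratic f, the central difference of the gradient recovers the Hessian-vector product exactly, since gradient(f(x + delta)) - gradient(f(x - delta)) = 2 * H @ delta. A minimal NumPy sketch with a toy f whose Hessian is known (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy function with a known Hessian: f(x) = 0.5 * x^T A x, so grad f(x) = A x
# and the Hessian is the constant matrix A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def grad_f(x):
    return A @ x

x = np.array([1.0, -1.0])
M = 200
ratios = []
for _ in range(M):
    delta = 1e-3 * rng.normal(size=2)                    # delta ~ N(0, eps^2 I)
    hvp = (grad_f(x + delta) - grad_f(x - delta)) / 2.0  # = H @ delta (exact here)
    ratios.append(np.linalg.norm(hvp) / np.linalg.norm(delta))

# Each ratio ||H delta|| / ||delta|| lies between the extreme singular values
# of H (here about 0.79 and 2.21), so the average gives a scale estimate.
est = float(np.mean(ratios))
assert 0.79 < est < 2.21
```

In TensorFlow the same pattern means computing grad_f at the shifted point via a new node (b2 = some_function(a + delta)), exactly as the answer describes.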

Alternative to tf.floor

One of my operations needs an integer, but the output of the convolution is a float.
That means I need tf.floor, tf.ceil, tf.cast, etc. to handle it.
But these operations cause None gradients, since operations like tf.floor are not differentiable.
So I tried the approaches below.
First: a detour
out1 = tf.subtract(vif, tf.subtract(vif, tf.floor(vif)))
But the output of test.compute_gradient_error is 500 or 0; I don't think this is a reasonable gradient.
Second: override the gradient function of floor

@ops.RegisterGradient("CustomFloor")
def _custom_floor_grad(op, grads):
    return [grads]

A, B = 50, 7
shape = [A, B]
f = np.ones(shape, dtype=np.float32)
vif = tf.constant(f, dtype=tf.float32)
# out1 = tf.subtract(vif, tf.subtract(vif, tf.floor(vif)))
with tf.get_default_graph().gradient_override_map({"Floor": "CustomFloor"}):
    out1 = tf.floor(vif)
with tf.Session() as sess:
    err1 = tf.test.compute_gradient_error(vif, shape, out1, shape)
    print(err1)
The output of test.compute_gradient_error is 500 or 1; this doesn't work either.
Question: is there a way to get integers and keep backpropagation working fine (values like 2.0, 5.0 are OK)?
In general, it is not advisable to solve a discrete problem with gradient descent. You should be able to express integer solvers in TF to some extent, but you're more or less on your own.
FWIW, the floor function is a staircase, and its complement, the fractional part x - floor(x), looks like a saw: its derivative is constant at 1 with little holes at every integer. At these positions you have a Dirac functional pointing downwards, like a rake if you wish. The Dirac functional has finite energy but no finite value.
The canonical way to tackle such problems is to relax them: replace the hard floor constraint with something that is (at least once) differentiable (smooth).
There are multiple ways to do this. Perhaps the most popular are:
Hack up a function that looks like what you want, for instance a piece-wise linear function that slopes down quickly, but not vertically.
Replace step functions with sigmoids.
Use a filter approximation, which is well understood if the signal is a time series.
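The sigmoid relaxation can be sketched in a few lines: approximate the staircase as a sum of steep sigmoids, one per integer step, so the forward value is close to floor(x) while the slope stays finite everywhere. A minimal NumPy sketch (n_steps and temperature are illustrative choices, not library parameters):

```python
import numpy as np

def sigmoid(z):
    ez = np.exp(-np.abs(z))  # exponent always <= 0: no overflow
    return np.where(z >= 0, 1.0 / (1.0 + ez), ez / (1.0 + ez))

def smooth_floor(x, n_steps=10, temperature=0.02):
    """Differentiable approximation of floor(x) for x in (0, n_steps).

    Each sigmoid contributes one unit step centred on an integer; lowering
    the temperature sharpens the steps toward the true staircase.
    """
    ks = np.arange(1, n_steps + 1)
    return np.sum(sigmoid((x - ks) / temperature))

# Close to the true floor away from the jump points...
assert abs(smooth_floor(2.7) - 2.0) < 1e-3
# ...but with a finite, nonzero slope near the jumps, so gradients can flow.
h = 1e-4
slope = (smooth_floor(3.0 + h) - smooth_floor(3.0 - h)) / (2 * h)
assert slope > 1.0
```

The trade-off is the usual one for relaxations: a lower temperature gives values closer to true integers but steeper (and eventually vanishing-then-exploding) gradients near the steps.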

Tilted loss in Theano

I am trying to calculate the tilted loss, which in turn will be used in Keras. However, I must be doing something wrong, since I am getting negative loss values (which ought to be impossible). Can anyone point out what I've done wrong? I'm assuming it's the Theano syntax that I have got wrong.
The loss is defined in terms of $\xi_i = y_i - f_i$, where $y_i$ is the observation and $f_i$ is the prediction. Furthermore, I am after the mean loss, so I have defined my loss function as:
$$
\mathcal{L} = \frac{\alpha \sum_i \xi_i - \sum_i I(\xi_i < 0)\,\xi_i}{N}
$$
where $I(\cdot)$ is the indicator function, taking the value 1 when its argument is true.
Hence my loss function is defined as follows:
def tilted_loss2(y, f):
    q = 0.05
    e = y - f
    return (q * tt.sum(e) - tt.sum(e[e < 0])) / e.shape[0]
However, when I run my network I get negative values. Is there something wrong with the Theano syntax here? My biggest suspicion is tt.sum(e[e < 0]): can you slice it like this?
Any thoughts would be appreciated.
You cannot slice like this; see this answer.
You need to change your loss function as follows:
def tilted_loss2(y, f):
    q = 0.05
    e = y - f
    return (q * tt.sum(e) - tt.sum(e[(e < 0).nonzero()])) / e.shape[0]
You can also try this workaround, which uses abs instead of the more complex slicing syntax:
def tilted_loss2(y, f):
    q = 0.05
    e = y - f
    return (q * tt.sum(e) - tt.sum(e - abs(e)) / 2.0) / e.shape[0]
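Both fixes can be checked numerically: (e - |e|)/2 equals e where e < 0 and 0 elsewhere, and each element contributes q·e (for e ≥ 0) or (q − 1)·e (for e < 0), both nonnegative when 0 < q < 1, so the mean loss can never be negative. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=1000)  # observations
f = rng.normal(size=1000)  # predictions
q = 0.05
e = y - f

# Masked-sum form of the tilted (quantile) loss...
loss_masked = (q * e.sum() - e[e < 0].sum()) / e.size
# ...and the abs workaround: (e - |e|)/2 is e where e < 0 and 0 elsewhere.
loss_abs = (q * e.sum() - (e - np.abs(e)).sum() / 2.0) / e.size

assert np.isclose(loss_masked, loss_abs)
# Per element: q*e for e >= 0 and (q - 1)*e for e < 0, both nonnegative
# for 0 < q < 1, so the mean loss is nonnegative as expected.
assert loss_masked >= 0
```

If negative values still appear in training, the cause is likely the original unguarded boolean slice rather than the formula itself.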
