Is tf.GradientTape in TF 2.0 equivalent to tf.gradients? - python

I am migrating my training loop to Tensorflow 2.0 API. In eager execution mode, tf.GradientTape replaces tf.gradients. The question is, do they have the same functionality? Specifically:
In function gradient():
Is the parameter output_gradients equivalent to grad_ys in the old API?
What about parameters colocate_gradients_with_ops. aggregation_method, gate_gradients of tf.gradients? Are they deprecated due to lack of use? Can they be replaced by using other methods in 2.0 API? Are they needed in Eager Execution at all?
Is function jacobian() equivalent to tf.python.ops.parallel_for.gradients?

Please find the response below.
Regarding Output Gradients and grad_ys: Yes, they can be considered same.
Detailed Explanation: Info about Output Gradients is mentioned in Github -> imperative_grad.py as shown below.
output_gradients: if not None, a list of gradient provided for each
Target,
or None if we are to use the target's computed downstream gradient,
Info about grad_ys is mentioned in TF Site as shown below:
grad_ys: is a list of tensors of the same length as ys that holds the
initial gradients for each y in ys. When grad_ys is None, we fill in a
tensor of '1's of the shape of y for each y in ys. A user can provide
their own initial grad_ys to compute the derivatives using a different
initial gradient for each y (e.g., if one wanted to weight the
gradient differently for each value in each y).
From the above explanations, and from the below code, mentioned in page 394 of the book, Hands on ML using Scikit-Learn & Tensorflow,
we can conclude that initial value of Theta can be a Random Value and we can pass that using the parameters, output_gradients or grad_ys.
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)
Regarding colocate_gradients_with_ops: Yes, it is not needed for Eager Execution as it is related to Control Flow Context of Graphs.
Detailed Explanation: colocate_gradients_with_ops points to the below code mentioned in Github -> ops.py. Control flow Context is related to the concept of Context, which is related to Graphs, as explained in TF Site -> Graphs
def _colocate_with_for_gradient(self, op, gradient_uid,
ignore_existing=False):
with self.colocate_with(op, ignore_existing):
if gradient_uid is not None and self._control_flow_context is not None:
self._control_flow_context.EnterGradientColocation(op, gradient_uid)
try:
yield
finally:
self._control_flow_context.ExitGradientColocation(op, gradient_uid)
else:
yield
Regarding aggregation_method: The equivalent of this parameter has been implemented in 2.0, named _aggregate_grads as shown in Github link
Regarding gate_gradients: Not needed for Eager as this also is related to Graph Context.
Detailed Explanation: As shown in the below code from Github -> gradients_utils.py, if gate_gradients is True, then some operations are added to graph using the function, _colocate_with_for_gradient, which in turn depends on Control Flow Context of Graphs.
if gate_gradients and len([x for x in in_grads
if x is not None]) > 1:
with ops.device(None):
with ops._colocate_with_for_gradient( # pylint: disable=protected-access
None,
gradient_uid,
ignore_existing=True):
in_grads = control_flow_ops.tuple(in_grads)
Regarding jacobian: Yes they are same.

Related

Initializing variables, variable scope and import_graph_def in tensorflow

I have a number of related questions about tensorflow behavior when attempting to do graph surgery using import_graph_def. 2 different graph surgeries
In the image above, I represent with bold red arrows 2 different graph surgeries. On the left, there are 2 graphs, g1 and g2, and the surgery consists of replacing a node in graph g2 by a node - and everything below it - from graph g1. How to do that is explained in this post. The surgery on the right, which involves replacing nodes that belong to the same graph, I haven't been able to figure out how to perform, or even if it is at all possible. I ended up with this minimal example
with tf.Graph().as_default() as g1:
with tf.variable_scope('foo', reuse=tf.AUTO_REUSE):
x = tf.placeholder(dtype=tf.float64, shape=[2], name='x')
c = tf.get_variable('c', initializer=tf.cast(1.0, tf.float64))
y = tf.identity(2*x, 'y')
z = tf.identity(3*x*c, 'z')
g1_def = g1.as_graph_def()
z1, = tf.import_graph_def(g1_def, input_map={'foo/x:0' : y}, return_elements=["foo/z:0"],
name='z1')
init_op = tf.global_variables_initializer()
print(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='foo'))
with tf.Session(graph=g1) as sess:
sess.run(init_op)
print(sess.run(z, feed_dict={'foo/x:0' : np.array([1.0, 2.0])}) )
print(sess.run(tf.report_uninitialized_variables()))
# z1 = sess.run(z1, feed_dict={'foo/x:0' : np.array([1.0, 2.0])})
This code runs as it is. The 3 prints yield respectively:
[<tf.Variable 'foo/c:0' shape=() dtype=float64_ref>]
[ 3. 6.]
[]
In particular, the last print informs that there are no unintialized variables. However, uncommenting the last line, yields the error
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value foo/z1/foo/c
Note that if I remove c from the definition of z above, this would also work. However, I would like to understand this error. To begin with, why is the variable reported as foo/z1/foo/c? Why does the scope foo appear twice? Why is nothing reported when I print the uninitialized variables? Why is only foo/c reported when I print the GLOBAL_VARIABLES collection under the scope foo?
PS: I guess that there is a simpler way to ask the question which is, what is the tensorflow analogue of
theano.clone(some_tensor, replace={input_var : replace_var})
To begin with, why is the variable reported as foo/z1/foo/c?
Why does the scope foo appear twice?
After you've called tf.import_graph_def(...), the graph got duplicated. The first graph is defined in foo score. The second subgraph has been imported under the scope foo/z1 (because name='z1', plus foo is preserved from the scope above). So the graph g1 now contains the following tensors:
foo/x
foo/y
foo/c
...
foo/z1/foo/x
foo/z1/foo/y
foo/z1/foo/c
...
The first foo/c is initialized, but the second foo/z1/foo/c is not (see below).
Why is nothing reported when I print the uninitialized variables? Why is only foo/c reported when I print the GLOBAL_VARIABLES collection under the scope foo?
Since report_uninitialized_variables() scans LOCAL_VARIABLES and GLOBAL_VARIABLES by default, this is basically the same question.
And it probably is a bug: GLOBAL_VARIABLES collection isn't updated after tf.import_graph_def call. I say probably because GLOBAL_VARIABLES was designed as a mere convenience collection. Tensorflow tries to keep it up do date, but probably doesn't guarantee it always has all variables. The fact that tf.add_to_collection exists publicly supports this idea -- one can add any value to any collection if they want it. Bottom line: this behavior may or may not change in future versions, but as of 1.5 the client is responsible to update the global variables after graph import.
In particular, the last print informs that there are no unintialized variables. However, uncommenting the last line, yields the error
To fix this error, you simply need to run the initializer for the z1 subgraph. Like this:
# note that it's defined before `g1.as_graph_def()` to be a part of graph def
init_op = tf.global_variables_initializer()
g1_def = g1.as_graph_def()
z1, = tf.import_graph_def(g1_def, input_map={'foo/x:0': y}, return_elements=["foo/z:0"],
name='z1')
# find the init op
z1_init_op = tf.get_default_graph().get_operation_by_name('foo/z1/foo/init')
...
sess.run(z1_init_op)
And voila! You have the duplicated graphs, just like you wanted to.
I faced a similar issue but simply running the init operation didn't work.
I fixed it by manually running all "Assign" ops of the global variables of the imported graph.
In my scenario I want to run an encoding op 'z' with input 'patch:0' using two different input tensors.
with tf.Session(graph=tf.get_default_graph()).as_default() as sess:
g = tf.Graph()
saved_model = predictor.from_saved_model(args.export_dir, graph=g)
variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)]
fetch_ops = ['z:0','init']
fetch_ops.extend([v.name.strip(":0") + "/Assign" for v in variables)
image_graph = tf.graph_util.import_graph_def(
g.as_graph_def(),
input_map={'patch:0': image},
return_elements=fetch_ops,
name='image')
warped_graph = tf.graph_util.import_graph_def(
g.as_graph_def(),
input_map={'patch:0': warped_image},
return_elements=fetch_ops,
name='warp')
loss = tf.reduce_sum(tf.math.squared_difference(image_graph[0], warped_graph[0]))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0001)
compute_gradients = optimizer.compute_gradients(
loss,
var_list=[dest_control_point_locations])
apply_gradients = optimizer.apply_gradients(compute_gradients, global_step=step)
sess.run(image_graph[1:])
sess.run(warped_graph[1:])
sess.run(tf.global_variables_initializer())
gradients = sess.run(compute_gradients)
When extracting the operation and running it by feeding my tensors with feed_dict, gradient_computation doesn't work, that's why I used tf.graph_util.import_graph_def(...).
Hope this might help anyone facing the same issue.

Using op inputs when defining custom gradients in TensorFlow

I'm trying to define a gradient method for my custom TF operation. Most of the solutions I have found online seem to based on a gist by harpone. I'm reluctant to use that approach as it uses py_func which won't run on GPU. I found another solution here that uses tf.identity() that looks more elegant and I think will run on GPU. However, I have some problems accessing inputs of the ops in my custom gradient function. Here's my code:
#tf.RegisterGradient('MyCustomGradient')
def _custom_gradient(op, gradients):
x = op.inputs[0]
return(x)
def my_op(w):
return tf.pow(w,3)
var_foo = tf.Variable(5, dtype=tf.float32)
bar = my_op(var_foo)
g = tf.get_default_graph()
with g.gradient_override_map({'Identity': 'MyCustomGradient'}):
bar = tf.identity(bar)
g = tf.gradients(bar, var_foo)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
print(sess.run(g))
I was expecting _custom_gradient() to return the input to the op (5 in this example) but instead it seems to return op output x gradient. My custom my_op will have non-differentiable operations like tf.sign and I'd like to define my custom gradient based on the inputs. What am I doing wrong?
There is no problem with your code:
Let's first do the forward pass:
var_foo = 5 -> bar = 125 -> tf.identity(bar) = 125
Now let's backpropagate:
The gradient of tf.identity(bar) with respect to its argument bar equals (by your definition) to bar, that is, 125. The gradient of bar with respect to var_foo equals 3 times the square of var_foo which is 75. Multiply, and you get 9375, which is indeed the output of your code.
op.inputs[0] contains the forward-pass value of the op. In this case, the forward pass of the identity op is 125.

TensorFlow (v1.1.0) Multi-RNN BasicLSTMCell Error ('reuse' parameter) Python 3.5

An expansion on: What is the use of a "reuse" parameter of tf.contrib.layers functions?.
The question: Although this issue has been brought up on github and will likely be addressed in another release of TensorFlow, I have found no existing solution for the time being; is there a stop-gap measure that might work in the meantime?
Code:
state_size = 4
def lstm_cell():
if 'reuse' in inspect.getargspec(tf.contrib.rnn.BasicLSTMCell.__init__).args:
return tf.contrib.rnn.BasicLSTMCell(state_size, forget_bias=0.0, state_is_tuple=True, reuse=tf.get_variable_scope().reuse)
else:
return tf.contrib.rnn.BasicLSTMCell(state_size, forget_bias=0.0, state_is_tuple=True)
cell = lstm_cell()
cell = rnn.DropoutWrapper(cell, output_keep_prob=0.5)
cell = rnn.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
states_series, current_state = tf.nn.dynamic_rnn(cell, tf.expand_dims(batchX_placeholder, -1), initial_state=rnn_tuple_state)
states_series = tf.reshape(states_series, [-1, state_size])
The function lstm_cell() is a suggestion from https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py. It explains that the newest version of tensorflow has includes the 'reuse' parameter for BasicLSTMCell().
In this code, if I set reuse to False, tf.nn.dynamic_rnn line produces the error:
"ValueError: Variable
rnn/multi_rnn_cell/cell_0/basic_lstm_cell/weights already exists,
disallowed. Did you mean to set reuse=True in VarScope? Originally
defined at:..."
If I set reuse to True, the error is:
"ValueError: Attempt to reuse RNNCell
with a different variable scope than its first
use. First use of cell was with scope
'rnn/multi_rnn_cell/cell_0/basic_lstm_cell', this attempt is with
scope 'rnn/multi_rnn_cell/cell_1/basic_lstm_cell'. Please create a
new instance of the cell if you would like it to use a different set
of weights. If before you were using:
MultiRNNCell([BasicLSTMCell(...)] * num_layers), change to:
MultiRNNCell([BasicLSTMCell(...) for _ in range(num_layers)]). If
before you were using the same cell instance as both the forward and
reverse cell of a bidirectional RNN, simply create two instances (one
for forward, one for reverse). In May 2017, we will start
transitioning this cell's behavior to use existing stored weights, if
any, when it is called with scope=None (which can lead to silent
model degradation, so this error will remain until then.)"
Lastly, adding in 'scope=None' to dynamic_rnn provides no difference either.
Have you considered trying what the 'reuse to True'-error is suggesting?
If before you were using: MultiRNNCell([BasicLSTMCell(...)] *
num_layers), change to: MultiRNNCell([BasicLSTMCell(...) for _ in
range(num_layers)]).
following code snipped works for me (already answered here)
def lstm_cell():
cell = tf.contrib.rnn.NASCell(state_size, reuse=tf.get_variable_scope().reuse)
return tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=0.8)
rnn_cells = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(num_layers)], state_is_tuple = True)
outputs, current_state = tf.nn.dynamic_rnn(rnn_cells, x, initial_state=rnn_tuple_state)

Tensorflow: How to write op with gradient in python?

I would like to write a TensorFlow op in python, but I would like it to be differentiable (to be able to compute a gradient).
This question asks how to write an op in python, and the answer suggests using py_func (which has no gradient): Tensorflow: Writing an Op in Python
The TF documentation describes how to add an op starting from C++ code only: https://www.tensorflow.org/versions/r0.10/how_tos/adding_an_op/index.html
In my case, I am prototyping so I don't care about whether it runs on GPU, and I don't care about it being usable from anything other than the TF python API.
Yes, as mentionned in #Yaroslav's answer, it is possible and the key is the links he references: here and here. I want to elaborate on this answer by giving a concret example.
Modulo opperation: Let's implement the element-wise modulo operation in tensorflow (it already exists but its gradient is not defined, but for the example we will implement it from scratch).
Numpy function: The first step is to define the opperation we want for numpy arrays. The element-wise modulo opperation is already implemented in numpy so it is easy:
import numpy as np
def np_mod(x,y):
return (x % y).astype(np.float32)
The reason for the .astype(np.float32) is because by default tensorflow takes float32 types and if you give it float64 (the numpy default) it will complain.
Gradient Function: Next we need to define the gradient function for our opperation for each input of the opperation as tensorflow function. The function needs to take a very specific form. It need to take the tensorflow representation of the opperation op and the gradient of the output grad and say how to propagate the gradients. In our case, the gradients of the mod opperation are easy, the derivative is 1 with respect to the first argument and
with respect to the second (almost everywhere, and infinite at a finite number of spots, but let's ignore that, see https://math.stackexchange.com/questions/1849280/derivative-of-remainder-function-wrt-denominator for details). So we have
def modgrad(op, grad):
x = op.inputs[0] # the first argument (normally you need those to calculate the gradient, like the gradient of x^2 is 2x. )
y = op.inputs[1] # the second argument
return grad * 1, grad * tf.neg(tf.floordiv(x, y)) #the propagated gradient with respect to the first and second argument respectively
The grad function needs to return an n-tuple where n is the number of arguments of the operation. Notice that we need to return tensorflow functions of the input.
Making a TF function with gradients: As explained in the sources mentioned above, there is a hack to define gradients of a function using tf.RegisterGradient [doc] and tf.Graph.gradient_override_map [doc].
Copying the code from harpone we can modify the tf.py_func function to make it define the gradient at the same time:
import tensorflow as tf
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
# Need to generate a unique name to avoid duplicates:
rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
tf.RegisterGradient(rnd_name)(grad) # see _MySquareGrad for grad example
g = tf.get_default_graph()
with g.gradient_override_map({"PyFunc": rnd_name}):
return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
The stateful option is to tell tensorflow whether the function always gives the same output for the same input (stateful = False) in which case tensorflow can simply the tensorflow graph, this is our case and will probably be the case in most situations.
Combining it all together: Now that we have all the pieces, we can combine them all together:
from tensorflow.python.framework import ops
def tf_mod(x,y, name=None):
with ops.op_scope([x,y], name, "mod") as name:
z = py_func(np_mod,
[x,y],
[tf.float32],
name=name,
grad=modgrad) # <-- here's the call to the gradient
return z[0]
tf.py_func acts on lists of tensors (and returns a list of tensors), that is why we have [x,y] (and return z[0]).
And now we are done. And we can test it.
Test:
with tf.Session() as sess:
x = tf.constant([0.3,0.7,1.2,1.7])
y = tf.constant([0.2,0.5,1.0,2.9])
z = tf_mod(x,y)
gr = tf.gradients(z, [x,y])
tf.initialize_all_variables().run()
print(x.eval(), y.eval(),z.eval(), gr[0].eval(), gr[1].eval())
[ 0.30000001 0.69999999 1.20000005 1.70000005] [ 0.2 0.5 1. 2.9000001] [ 0.10000001 0.19999999 0.20000005 1.70000005] [ 1. 1. 1. 1.] [ -1. -1. -1. 0.]
Success!
Here's an example of adding gradient to a specific py_func
https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342
Here's the issue discussion

How to define General deterministic function in PyMC

In my model, I need to obtain the value of my deterministic variable from a set of parent variables using a complicated python function.
Is it possible to do that?
Following is a pyMC3 code which shows what I am trying to do in a simplified case.
import numpy as np
import pymc as pm
#Predefine values on two parameter Grid (x,w) for a set of i values (1,2,3)
idata = np.array([1,2,3])
size= 20
gridlength = size*size
Grid = np.empty((gridlength,2+len(idata)))
for x in range(size):
for w in range(size):
# A silly version of my real model evaluated on grid.
Grid[x*size+w,:]= np.array([x,w]+[(x**i + w**i) for i in idata])
# A function to find the nearest value in Grid and return its product with third variable z
def FindFromGrid(x,w,z):
return Grid[int(x)*size+int(w),2:] * z
#Generate fake Y data with error
yerror = np.random.normal(loc=0.0, scale=9.0, size=len(idata))
ydata = Grid[16*size+12,2:]*3.6 + yerror # ie. True x= 16, w= 12 and z= 3.6
with pm.Model() as model:
#Priors
x = pm.Uniform('x',lower=0,upper= size)
w = pm.Uniform('w',lower=0,upper =size)
z = pm.Uniform('z',lower=-5,upper =10)
#Expected value
y_hat = pm.Deterministic('y_hat',FindFromGrid(x,w,z))
#Data likelihood
ysigmas = np.ones(len(idata))*9.0
y_like = pm.Normal('y_like',mu= y_hat, sd=ysigmas, observed=ydata)
# Inference...
start = pm.find_MAP() # Find starting value by optimization
step = pm.NUTS(state=start) # Instantiate MCMC sampling algorithm
trace = pm.sample(1000, step, start=start, progressbar=False) # draw 1000 posterior samples using NUTS sampling
print('The trace plot')
fig = pm.traceplot(trace, lines={'x': 16, 'w': 12, 'z':3.6})
fig.show()
When I run this code, I get error at the y_hat stage, because the int() function inside the FindFromGrid(x,w,z) function needs integer not FreeRV.
Finding y_hat from a pre calculated grid is important because my real model for y_hat does not have an analytical form to express.
I have earlier tried to use OpenBUGS, but I found out here it is not possible to do this in OpenBUGS. Is it possible in PyMC ?
Update
Based on an example in pyMC github page, I found I need to add the following decorator to my FindFromGrid(x,w,z) function.
#pm.theano.compile.ops.as_op(itypes=[t.dscalar, t.dscalar, t.dscalar],otypes=[t.dvector])
This seems to solve the above mentioned issue. But I cannot use NUTS sampler anymore since it needs gradient.
Metropolis seems to be not converging.
Which step method should I use in a scenario like this?
You found the correct solution with as_op.
Regarding the convergence: Are you using pm.Metropolis() instead of pm.NUTS() by any chance? One reason this could not converge is that Metropolis() by default samples in the joint space while often Gibbs within Metropolis is more effective (and this was the default in pymc2). Having said that, I just merged this: https://github.com/pymc-devs/pymc/pull/587 which changes the default behavior of the Metropolis and Slice sampler to be non-blocked by default (so within Gibbs). Other samplers like NUTS that are primarily designed to sample the joint space still default to blocked. You can always explicitly set this with the kwarg blocked=True.
Anyway, update pymc with the most recent master and see if convergence improves. If not, try the Slice sampler.

Categories