I'm new here, studying TensorFlow, and I've run into a problem.
import model_method
fittt(model_method.build(self,...),...parameters...)
The above is in the main.py importing model_method.py. Function fittt in main.py:
def fittt(model,...):
    model.fit(...)
build() in model_method.py:
def build(self,...):
    self.op_C,self.op_A = self.function_A(...)
    self.op_B = self.function_B(self.op_C,...)
fit() in model_method.py:
def fit(self,...):
    sess = tf.Session(graph=self.graph,config=config)
    BB,AA = sess.run([self.op_B,self.op_A],feed_dict)
To check the execution flow, I added pdb.set_trace() at the beginning of function_A() and function_B() in model_method.py as follows:
def function_A(self,...):
    pdb.set_trace()
    ......
def function_B(self,...):
    pdb.set_trace()
    ......
The two pdb.set_trace() calls only stopped when build() was called and did not trigger when sess.run([self.op_B,self.op_A],feed_dict) was called. So it seems that sess.run() didn't actually run function_A() and function_B(). I wonder why, and how can I make the two functions run?
By calling the model_method.build() function you create a computation graph. In this call every line of code is executed (hence why pdb stopped).
However, tf.Session.run(...) executes only those parts of computational graph which are necessary to compute the fetched values (self.op_A, self.op_B in your example). The function does not execute the entire build() function again.
So pdb.set_trace() did not fire during sess.run(...) because it is an ordinary Python statement, not a Tensor or Operation, and is therefore not part of the computational graph. It runs only while build() is executing the Python code that constructs the graph.
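To make the build-time versus run-time distinction concrete, here is a minimal sketch in plain Python. It is a toy model of deferred execution, not TensorFlow's implementation: Node, run and the calls counter are hypothetical names made up for this illustration.

```python
# Toy model of deferred execution (NOT TensorFlow's implementation).
calls = {"function_A": 0}


class Node:
    """A graph node that stores a computation to be executed later."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs


def run(node):
    """Evaluate a node after evaluating its inputs (a tiny stand-in for sess.run)."""
    return node.fn(*(run(i) for i in node.inputs))


def function_A():
    # This Python body executes while the graph is being built, so a
    # breakpoint placed here would fire once, at build time.
    calls["function_A"] += 1
    const = Node(lambda: 2)
    return Node(lambda v: v * 10, const)


op_A = function_A()                       # build: the Python function is called once
results = [run(op_A) for _ in range(3)]   # run: only the stored nodes execute

print(calls["function_A"])  # 1
print(results)              # [20, 20, 20]
```

Running the node three times never re-enters function_A(), which mirrors why the breakpoints were hit only during build().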
UPDATE
Consider the following:
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session)

class My_Model:
    def __init__(self):
        self.np_input = np.random.normal(size=(10, 2))  # 10x2

    def build(self):
        self._in = tf.placeholder(dtype=tf.float32, shape=[10, None])  # matrix 10xN
        W_exception = tf.random_normal(dtype=tf.float32, shape=[3, 3])  # matrix 3x3
        W_success = tf.random_normal(dtype=tf.float32, shape=[2, 3])  # matrix 2x3
        self.op_exception = tf.matmul(self._in, W_exception)  # [10x2] x [3x3] = ERROR
        self.op_success = tf.matmul(self._in, W_success)  # [10x2] x [2x3] = [10x3]
        print('Computational Graph Built')

    def fit_success(self):
        with tf.Session() as sess:
            res = sess.run(self.op_success, feed_dict={self._in: self.np_input})
            print('Result shape: {}'.format(res.shape))

    def fit_exception(self):
        with tf.Session() as sess:
            res = sess.run(self.op_exception, feed_dict={self._in: self.np_input})
            print('Result shape: {}'.format(res.shape))
and then calling:
m = My_Model()
m.build()
#> Computational Graph Built
m.fit_success()
#> Result shape: (10, 3)
m.fit_exception()
#> InvalidArgumentError: Matrix size-incompatible: In[0]: [10,2], In[1]: [3,3]
To explain what you see there: we first define the computational graph in the build() function. _in is our input tensor; None means that dimension 1 (the number of columns) is determined dynamically, that is, once we feed a tensor with concrete values.
Then we define two matrices, W_exception and W_success, which have all dimensions specified; their values are randomly generated.
Then we define two operations, both matrix multiplications, each of which returns a tensor.
We called the build() function and created the computational graph. The print() function is also executed but NOT added to the graph. Nothing is computed here. In fact, nothing can be, because the values of _in are not specified yet.
Now, to show that only the parts required for the computation are evaluated, we call the fit_success() function, which simply multiplies the input tensor _in by the W_success tensor (with correct dimensions). We receive a tensor with the correct shape: [10x3]. Note that we receive no error saying that op_exception cannot be computed due to mismatched dimensions. That's because we do not need it to evaluate op_success.
Lastly, I show that an exception is indeed thrown when we try to evaluate op_exception with the same input tensor.
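The "only what is needed gets evaluated" behavior can also be sketched without TensorFlow. The following is a toy illustration under stated assumptions: Node, run and matmul_shape are hypothetical helpers, and the nodes carry only shapes rather than real tensors.

```python
# Toy illustration of graph pruning (NOT TensorFlow's implementation).
class Node:
    """A named graph node holding a function and its input nodes."""

    def __init__(self, name, fn, *inputs):
        self.name, self.fn, self.inputs = name, fn, inputs


def run(node, evaluated):
    """Evaluate `node` and its ancestors only, recording which nodes ran."""
    args = [run(i, evaluated) for i in node.inputs]
    evaluated.append(node.name)
    return node.fn(*args)


def matmul_shape(a, b):
    """Return the result shape of a @ b, or raise if the inner dimensions differ."""
    if a[1] != b[0]:
        raise ValueError("Matrix size-incompatible: {} vs {}".format(a, b))
    return (a[0], b[1])


_in = Node("_in", lambda: (10, 2))
w_success = Node("W_success", lambda: (2, 3))
w_exception = Node("W_exception", lambda: (3, 3))
op_success = Node("op_success", matmul_shape, _in, w_success)
op_exception = Node("op_exception", matmul_shape, _in, w_exception)

ran = []
shape = run(op_success, ran)
print(shape)  # (10, 3)
print(ran)    # ['_in', 'W_success', 'op_success']

try:
    run(op_exception, [])
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```

Fetching op_success never touches W_exception or op_exception, so the mismatch surfaces only when op_exception itself is requested.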
Related
I am using PyTorch 1.6.0 to learn a tensor (let's say x) with autograd.
After x is learnt, how can I reset .requires_grad of every tensor that was a node in the autograd computation graph to False?
I know about torch.detach() and about setting .requires_grad to False manually. I am searching for a one-shot instruction.
PS: I want to do that because I still want to use these tensors after the part of my code that learns x has run. Plus, some are to be converted to numpy.
There is no "one shot instruction" to switch .requires_grad for all tensors in graph.
Usually parameters are kept in torch.nn.Module instances, but in case they are elsewhere, you can always add them to some list and iterate over it. I'd do something like this:
import torch

class Leafs:
    def __init__(self):
        self.leafs = []

    def add(self, tensor):
        self.leafs.append(tensor)
        return tensor

    def clear(self):
        for leaf in self.leafs:
            leaf.requires_grad_(False)

keeper = Leafs()
x = keeper.add(torch.tensor([1.2], requires_grad=True))
y = keeper.add(torch.tensor([1.3], requires_grad=True))
print(x.requires_grad, y.requires_grad)
keeper.clear()
print(x.requires_grad, y.requires_grad)
Usually there is no need for that. Also, if you don't want gradients for some part of the computation, you can always use the with torch.no_grad() context manager.
Inside my custom loss function I need to call a pure python function passing in the computed TD errors and some indexes. The function doesn't need to return anything or be differentiated. Here's the function I want to call:
def update_priorities(self, traces_idxs, td_errors):
    """Updates the priorities of the traces with specified indexes."""
    self.priorities[traces_idxs] = td_errors + eps
I've tried using tf.py_function to call a wrapper function but it only gets called if it's embedded in the graph i.e. if it has inputs and outputs and the outputs are used. Therefore I tried to pass through some of the tensors without performing any operations on them and the function now gets called. Here's my entire custom loss function:
def masked_q_loss(data, y_pred):
    """Computes the MSE between the Q-values of the actions that were taken and the cumulative
    discounted rewards obtained after taking those actions. Updates trace priorities.
    """
    action_batch, target_qvals, traces_idxs = data[:,0], data[:,1], data[:,2]
    seq = tf.cast(tf.range(0, tf.shape(action_batch)[0]), tf.int32)
    action_idxs = tf.transpose(tf.stack([seq, tf.cast(action_batch, tf.int32)]))
    qvals = tf.gather_nd(y_pred, action_idxs)

    def update_priorities(_qvals, _target_qvals, _traces_idxs):
        """Computes the TD error and updates memory priorities."""
        td_error = _target_qvals - _qvals
        _traces_idxs = tf.cast(_traces_idxs, tf.int32)
        mem.update_priorities(_traces_idxs, td_error)
        return _qvals

    qvals = tf.py_function(func=update_priorities, inp=[qvals, target_qvals, traces_idxs], Tout=[tf.float32])
    return tf.keras.losses.mse(qvals, target_qvals)
However, I get the following error due to the call mem.update_priorities(_traces_idxs, td_error):
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I don't need to compute gradients for update_priorities, I just want to call it at a specific point in the graph computation and forget about it. How can I do that?
Using .numpy() on the tensors inside the wrapper function fixed the problem:
def update_priorities(_qvals, _target_qvals, _traces_idxs):
    """Computes the TD error and updates memory priorities."""
    td_error = np.abs((_target_qvals - _qvals).numpy())
    _traces_idxs = (tf.cast(_traces_idxs, tf.int32)).numpy()
    mem.update_priorities(_traces_idxs, td_error)
    return _qvals
I have a number of related questions about TensorFlow behavior when attempting to do graph surgery using import_graph_def.
[Image: 2 different graph surgeries]
In the image above, I represent with bold red arrows 2 different graph surgeries. On the left, there are 2 graphs, g1 and g2, and the surgery consists of replacing a node in graph g2 by a node - and everything below it - from graph g1. How to do that is explained in this post. The surgery on the right, which involves replacing nodes that belong to the same graph, I haven't been able to figure out how to perform, or even if it is at all possible. I ended up with this minimal example
with tf.Graph().as_default() as g1:
    with tf.variable_scope('foo', reuse=tf.AUTO_REUSE):
        x = tf.placeholder(dtype=tf.float64, shape=[2], name='x')
        c = tf.get_variable('c', initializer=tf.cast(1.0, tf.float64))
        y = tf.identity(2*x, 'y')
        z = tf.identity(3*x*c, 'z')
        g1_def = g1.as_graph_def()
        z1, = tf.import_graph_def(g1_def, input_map={'foo/x:0' : y}, return_elements=["foo/z:0"],
                                  name='z1')
        init_op = tf.global_variables_initializer()
        print(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='foo'))

with tf.Session(graph=g1) as sess:
    sess.run(init_op)
    print(sess.run(z, feed_dict={'foo/x:0' : np.array([1.0, 2.0])}) )
    print(sess.run(tf.report_uninitialized_variables()))
    # z1 = sess.run(z1, feed_dict={'foo/x:0' : np.array([1.0, 2.0])})
This code runs as it is. The 3 prints yield respectively:
[<tf.Variable 'foo/c:0' shape=() dtype=float64_ref>]
[ 3. 6.]
[]
In particular, the last print shows that there are no uninitialized variables. However, uncommenting the last line yields the error
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value foo/z1/foo/c
Note that if I remove c from the definition of z above, this would also work. However, I would like to understand this error. To begin with, why is the variable reported as foo/z1/foo/c? Why does the scope foo appear twice? Why is nothing reported when I print the uninitialized variables? Why is only foo/c reported when I print the GLOBAL_VARIABLES collection under the scope foo?
PS: I guess that there is a simpler way to ask the question which is, what is the tensorflow analogue of
theano.clone(some_tensor, replace={input_var : replace_var})
To begin with, why is the variable reported as foo/z1/foo/c?
Why does the scope foo appear twice?
After you've called tf.import_graph_def(...), the graph got duplicated. The first graph is defined in the foo scope. The second subgraph has been imported under the scope foo/z1 (because name='z1', plus foo is preserved from the scope above). So the graph g1 now contains the following tensors:
foo/x
foo/y
foo/c
...
foo/z1/foo/x
foo/z1/foo/y
foo/z1/foo/c
...
The first foo/c is initialized, but the second foo/z1/foo/c is not (see below).
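The naming rule described above (enclosing scope, then the name passed to the import, then the node's original name) can be sketched with plain string handling. imported_name is a hypothetical helper written for this illustration; it is not a TensorFlow function.

```python
def imported_name(enclosing_scope, import_name, original_name):
    """Join the non-empty name parts with '/' (hypothetical helper, not a TF API)."""
    parts = [p for p in (enclosing_scope, import_name, original_name) if p]
    return "/".join(parts)


# The import happens inside variable_scope('foo') with name='z1'.
copy_of_c = imported_name("foo", "z1", "foo/c")
copy_of_z = imported_name("foo", "z1", "foo/z")

print(copy_of_c)  # foo/z1/foo/c
print(copy_of_z)  # foo/z1/foo/z
```

This is why foo shows up twice: once from the scope that was active during the import and once from the node's original name.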
Why is nothing reported when I print the uninitialized variables? Why is only foo/c reported when I print the GLOBAL_VARIABLES collection under the scope foo?
Since report_uninitialized_variables() scans LOCAL_VARIABLES and GLOBAL_VARIABLES by default, this is basically the same question.
And it probably is a bug: the GLOBAL_VARIABLES collection isn't updated after the tf.import_graph_def call. I say probably because GLOBAL_VARIABLES was designed as a mere convenience collection. TensorFlow tries to keep it up to date, but probably doesn't guarantee that it always contains all variables. The fact that tf.add_to_collection exists publicly supports this idea: anyone can add any value to any collection. Bottom line: this behavior may or may not change in future versions, but as of 1.5 the client is responsible for updating the global variables after a graph import.
In particular, the last print shows that there are no uninitialized variables. However, uncommenting the last line yields the error
To fix this error, you simply need to run the initializer for the z1 subgraph. Like this:
# note that it's defined before `g1.as_graph_def()` to be a part of graph def
init_op = tf.global_variables_initializer()
g1_def = g1.as_graph_def()
z1, = tf.import_graph_def(g1_def, input_map={'foo/x:0': y}, return_elements=["foo/z:0"],
name='z1')
# find the init op
z1_init_op = tf.get_default_graph().get_operation_by_name('foo/z1/foo/init')
...
sess.run(z1_init_op)
And voila! You have the duplicated graphs, just like you wanted to.
I faced a similar issue but simply running the init operation didn't work.
I fixed it by manually running all "Assign" ops of the global variables of the imported graph.
In my scenario I want to run an encoding op 'z' with input 'patch:0' using two different input tensors.
with tf.Session(graph=tf.get_default_graph()).as_default() as sess:
    g = tf.Graph()
    saved_model = predictor.from_saved_model(args.export_dir, graph=g)
    variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
    fetch_ops = ['z:0', 'init']
    # v.op.name is the variable name without the ':0' output suffix
    fetch_ops.extend([v.op.name + "/Assign" for v in variables])
    image_graph = tf.graph_util.import_graph_def(
        g.as_graph_def(),
        input_map={'patch:0': image},
        return_elements=fetch_ops,
        name='image')
    warped_graph = tf.graph_util.import_graph_def(
        g.as_graph_def(),
        input_map={'patch:0': warped_image},
        return_elements=fetch_ops,
        name='warp')
    loss = tf.reduce_sum(tf.math.squared_difference(image_graph[0], warped_graph[0]))
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0001)
    compute_gradients = optimizer.compute_gradients(
        loss,
        var_list=[dest_control_point_locations])
    apply_gradients = optimizer.apply_gradients(compute_gradients, global_step=step)
    # run the init op and every Assign op of both imported copies
    sess.run(image_graph[1:])
    sess.run(warped_graph[1:])
    sess.run(tf.global_variables_initializer())
    gradients = sess.run(compute_gradients)
When extracting the operation and running it by feeding my tensors with feed_dict, the gradient computation doesn't work. That's why I used tf.graph_util.import_graph_def(...).
Hope this might help anyone facing the same issue.
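One detail to watch when building op names such as `<variable>/Assign` from tensor names like `w:0`: str.strip(":0") removes any of the characters ':' and '0' from both ends, not the literal suffix, so a name that ends in a zero gets mangled. A small sketch in plain Python (op_name_from_tensor_name is a hypothetical helper, not a TensorFlow function):

```python
def op_name_from_tensor_name(tensor_name):
    """Drop the trailing ':<output index>' from a tensor name (hypothetical helper)."""
    return tensor_name.rsplit(":", 1)[0]


name = "conv0:0"
stripped = name.strip(":0")                          # strips characters, not a suffix
safe = op_name_from_tensor_name(name)

print(stripped)           # conv   (the trailing '0' of 'conv0' is lost)
print(safe)               # conv0
print(safe + "/Assign")   # conv0/Assign
```

Splitting on the last ':' keeps the variable name intact regardless of which characters it ends with.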
I've tried the following code, but I can't find anything that is not feedable in TensorFlow. Could anybody show me what is not feedable?
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:
import tensorflow as tf
x = tf.Variable(3)
y = tf.constant(3)
z = tf.add(1, 2)
with tf.Session() as sess:
    print(sess.graph.is_feedable(x))
    print(sess.graph.is_feedable(y))
    print(sess.graph.is_feedable(z))
All tensors are feedable (including the constants, as you can see), unless they are explicitly prevented from being fed via the tf.Graph.prevent_feeding method. One can call this method directly or indirectly; for example, that's what the tf.contrib.util.constant_value function does:
NOTE: If constant_value(tensor) returns a non-None result, it will no longer be possible to feed a different value for tensor. This allows the result of this function to influence the graph that is constructed, and permits static shape optimizations.
Sample code:
y = tf.constant(3)
tf.contrib.util.constant_value(y) # 3
with tf.Session() as sess:
    print(sess.graph.is_feedable(y))  # False!
I'm trying to define a gradient method for my custom TF operation. Most of the solutions I have found online seem to be based on a gist by harpone. I'm reluctant to use that approach as it uses py_func, which won't run on GPU. I found another solution here that uses tf.identity(), which looks more elegant and I think will run on GPU. However, I have some problems accessing the inputs of the ops in my custom gradient function. Here's my code:
import tensorflow as tf

@tf.RegisterGradient('MyCustomGradient')
def _custom_gradient(op, gradients):
    x = op.inputs[0]
    return(x)

def my_op(w):
    return tf.pow(w,3)

var_foo = tf.Variable(5, dtype=tf.float32)
bar = my_op(var_foo)

g = tf.get_default_graph()
with g.gradient_override_map({'Identity': 'MyCustomGradient'}):
    bar = tf.identity(bar)
    g = tf.gradients(bar, var_foo)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(g))
I was expecting _custom_gradient() to return the input to the op (5 in this example) but instead it seems to return op output x gradient. My custom my_op will have non-differentiable operations like tf.sign and I'd like to define my custom gradient based on the inputs. What am I doing wrong?
There is no problem with your code:
Let's first do the forward pass:
var_foo = 5 -> bar = 125 -> tf.identity(bar) = 125
Now let's backpropagate:
The gradient of tf.identity(bar) with respect to its argument bar is, by your definition, bar itself, that is, 125. The gradient of bar with respect to var_foo is 3 times the square of var_foo, which is 75. Multiply the two and you get 9375, which is indeed the output of your code.
op.inputs[0] holds the forward-pass input of the op the gradient is attached to. Here that op is the identity, whose input is bar = 125, not var_foo = 5.
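The same arithmetic can be checked in plain Python. This sketch only reproduces the chain rule by hand and does not call TensorFlow; the variable names are chosen for this illustration.

```python
var_foo = 5.0

# Forward pass
bar = var_foo ** 3            # 125.0, the output of my_op
identity_out = bar            # the identity op passes the value through unchanged

# Backward pass
identity_input = bar                  # what op.inputs[0] refers to for the identity op
custom_grad = identity_input          # the custom gradient returns it as-is: 125.0
d_bar_d_var_foo = 3 * var_foo ** 2    # derivative of w**3 at w = 5: 75.0

result = custom_grad * d_bar_d_var_foo
print(result)  # 9375.0
```

To get a gradient that depends on var_foo itself, the override would have to be attached to an op whose input is var_foo rather than to the identity applied to bar.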