So if I run this code in Pytorch:
x = torch.ones(2,2, requires_grad=True)
x.add_(1)
I will get the error:
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
I understand that Pytorch does not allow inplace operations on leaf variables and I also know that there are ways to get around this restrictions. What I don't understand is the philosophy behind this rule. Why is it wrong to change a leaf variable with inplace operations?
As I understand it, any time you do a non-traditional operation on a tensor that was initialized with requires_grad=True, Pytorch throws an error to make sure it was intentional. For example, you normally would only update a weight tensor using optimizer.step().
For another example, I ran into this issue when trying to update the values in a backprop-able tensor during network initialization.
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=True)
self.weight_layer[true_ids == 1] = -1.2
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
The problem is that, because requires_grad=True, the network doesn't know that I'm still initializing the values. If this is what you are trying to do, wrapping the update in a torch.no_grad block is one solution:
with torch.no_grad()
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=True)
self.weight_layer[true_ids == 1] = -1.2
Otherwise, you could just set requires_grad=True after you finish initializing the Tensor:
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length))
self.weight_layer[true_ids == 1] = -1.2
self.weight_layer.requires_grad = True
The simple answer to this is that, once autograd creates the graph, and we have created our tensors which are a input to the graph, autograd will construct and track each operation on your created tensor. Now, since this tensor has requires_grad=True, and let's say I do a weight update after loss.backward(), autograd is probably considering as a part of the already created graph, which requires gradients. This leads to the RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
Autograd is confused that why is a tensor which is a part of the computational graph being used outside/being used for some other operations/initialization of it inplace and this is an an problem
if we simply place the code under a
with torch.no_grad(): code here
we disable the gradients, and hence there is essentially it signals to autograd that this operation is not a part of our dynamic graph updates.
PS: I will expand this answer by writing a blog of this on Medium
Related
For my task, I do not need to compute gradients. I am simply replacing nn.L1Loss with a numpy function (corrcoef) in my loss evaluation but I get the following error:
RuntimeError: Can’t call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
I couldn’t figure out how exactly I should detach the graph (I tried torch.Tensor.detach(np.corrcoef(x, y)) but I still get the same error. I eventually wrapped everything using with torch.no_grad as follow:
with torch.no_grad():
predFeats = self.forward(x)
targetFeats = self.forward(target)
loss = torch.from_numpy(np.corrcoef(predFeats.cpu().numpy().astype(np.float32), targetFeats.cpu().numpy().astype(np.float32))[1][1])
But this time I get the following error:
TypeError: expected np.ndarray (got numpy.float64)
I wonder, what am I doing wrong?
TL;DR
with torch.no_grad():
predFeats = self(x)
targetFeats = self(target)
loss = torch.tensor(np.corrcoef(predFeats.cpu().numpy(),
targetFeats.cpu().numpy())[1][1]).float()
You would avoid the first RuntimeError by detaching the tensors (predFeats and targetFeats) from the computational graph.
i.e. Getting a copy of the tensor data without the gradients and the gradient function (grad_fn).
So, instead of
torch.Tensor.detach(np.corrcoef(x.numpy(), y.numpy())) # Detaches a newly created tensor!
# x and y still may have gradients. Hence the first error.
which does nothing, do
# Detaches x and y properly
torch.Tensor(np.corrcoef(x.detach().numpy(), y.detach().numpy()))
But let's not bother with all the detachments.
Like you rightfully fixed, it, let's disable the gradients.
torch.no_grad()
Now, compute the features.
predFeats = self(x) # No need for the explicit .forward() call
targetFeats = self(target)
I found it helpful to break your last line up.
loss = np.corrcoef(predFeats.numpy(), targetFeats.numpy()) # We don't need to detach
# Notice that we don't need to cast the arguments to fp32
# since the `corrcoef` casts them to fp64 anyway.
print(loss.shape, loss.dtype) # A 2-dimensional fp64 matrix
loss = loss[1][1]
print(type(loss)) # Output: numpy.float64
# Loss now just a simple fp64 number
And that is the problem!
Because, when we do
loss = torch.from_numpy(loss)
we're passing in a number (numpy.float64) while it expects a numpy tensor (np.ndarray).
If you're using PyTorch 0.4 or up, there's inbuilt support for scalars.
Simply replace the from_numpy() method with the universal tensor() creation method.
loss = torch.tensor(loss)
P.S. You might also want to look at setting rowvar=False in corrcoef since the rows in PyTorch tensors usually represent the observations.
I need to modify the weight values during the execution, more specifically between the compute_gradients() and apply_gradients() functions. I was able to modify the gradients themselves, but i could not change the weights.
I'm using the tutorial for the Iris NN in tensorflow:
https://github.com/tensorflow/models/blob/master/samples/core/get_started/custom_estimator.py , the only difference being that i changed the minimize() function for the compute_gradients() and the apply_gradients() function.
grads_and_vars = optimizer.compute_gradients(loss)
// some way to change the weights
train_op = optimizer.apply_gradients(grads_and_vars, global_step=tf.train.get_global_step())
Thanks in advance.
My best guess is that you are looking for tf.assign (from here) to assign values to your Variable tensors.
According to the docs:
Update 'ref' by assigning 'value' to it.
This operation outputs a Tensor that holds the new value of 'ref' after the value has been assigned. This makes it easier to chain operations that need to use the reset value.
I'm new to Tensorflow.
Is it necessary to use tensorflow's function, such as using tf.constant() to replace int32, float32 or else?
Also during computation, using tf.mul() instead of normal Python multiplication *?
Also the print function tf.Print() instead of print()?
As noted here,
A Tensor is a symbolic handle to one of the outputs of an Operation. It does not hold the values of that operation's output, but instead provides a means of computing those values in a TensorFlow Session
So tensor variables aren't like python variables. Rather they specify the relationship between operations in the computational graph. The python variables that you use to describe the graph are for the programmer's convenience, but it might be easier to think about the python variables and the tensor variables as being in parallel namespaces. This example may help:
with tf.Session() as sess:
a = tf.constant([1, 2, 3])
b = tf.Variable([])
b = 2 * a
sess.run(tf.initialize_all_variables())
print(b) # Tensor("mul:0", shape=(3,), dtype=int32)
print(b.eval()) # [2, 4, 6]
b = tf.Print(b, [b]) # [2, 4, 6] (at the command line)
From this you can see:
print(b) returns information about the operation that 'b' refers to as well as the variable shape and data type, but not the value.
b.eval() (or sess.run(b)) returns the value of b as a numpy array which can be printed by a python print()
tf.Print() allows you to see the value of b during graph execution.
Note that the syntax of tf.Print() might seem a little strange to a newbie. As described in the documentation here, tf.Print() is an identity operation that only has the side effect of printing to the command line. The first parameter is just passed through. The second parameter is the list of tensors that are printed and can be different than the first parameter. Also note that in order for tf.Print() to do something, a variable used in a subsequent call to sess.run() needs to be dependent on the tensor output by tf.Print(), otherwise this portion of the graph will not be executed.
Finally with respect to math ops such as tf.mul() vs * many of the normal python ops are overloaded with the equivalent tensorflow ops, as described here.
Because tensorflow is built upon a computation graph. When you construct the graph in python, you are just building is a description of computations (not actually doing the computation). To compute anything, a graph must be launched in a Session. So it's best to do the computation with the tensorflow ops.
https://www.tensorflow.org/versions/r0.11/get_started/basic_usage.html
This is a generic question. I found that in the tensorflow, after we build the graph, fetch data into the graph, the output from graph is a tensor. but in many cases, we need to do some computation based on this output (which is a tensor), which is not allowed in tensorflow.
for example, I'm trying to implement a RNN, which loops times based on data self property. That is, I need use a tensor to judge whether I should stop (I am not using dynamic_rnn since in my design, the rnn is highly customized). I find tf.while_loop(cond,body.....) might be a candidate for my implementation. But the official tutorial is too simple. I don't know how to add more functionalities into the 'body'. Can anyone give me few more complex example?
Also, in such case that if the future computation is based on the tensor output (ex: the RNN stop based on the output criterion), which is very common case. Is there an elegant way or better way instead of dynamic graph?
What is stopping you from adding more functionality to the body? You can build whatever complex computational graph you like in the body and take whatever inputs you like from the enclosing graph. Also, outside of the loop, you can then do whatever you want with whatever outputs you return. As you can see from the amount of 'whatevers', TensorFlow's control flow primitives were built with much generality in mind. Below is another 'simple' example, in case it helps.
import tensorflow as tf
import numpy as np
def body(x):
a = tf.random_uniform(shape=[2, 2], dtype=tf.int32, maxval=100)
b = tf.constant(np.array([[1, 2], [3, 4]]), dtype=tf.int32)
c = a + b
return tf.nn.relu(x + c)
def condition(x):
return tf.reduce_sum(x) < 100
x = tf.Variable(tf.constant(0, shape=[2, 2]))
with tf.Session():
tf.global_variables_initializer().run()
result = tf.while_loop(condition, body, [x])
print(result.eval())
TensorFlow graph is usually built gradually from inputs to outputs, and then executed. Looking at the Python code, the inputs lists of operations are immutable which suggests that the inputs should not be modified. Does that mean that there is no way to update/modify an existing graph?
The TensorFlow tf.Graph class is an append-only data structure, which means that you can add nodes to the graph after executing part of the graph, but you cannot remove or modify existing nodes. Since TensorFlow executes only the necessary subgraph when you call Session.run(), there is no execution-time cost to having redundant nodes in the graph (although they will continue to consume memory).
To remove all nodes in the graph, you can create a session with a new graph:
with tf.Graph().as_default(): # Create a new graph, and make it the default.
with tf.Session() as sess: # `sess` will use the new, currently empty, graph.
# Build graph and execute nodes in here.
Yes, tf.Graph are build in an append-only fashion as #mrry puts it.
But there's workaround:
Conceptually you can modify an existing graph by cloning it and perform the modifications needed along the way. As of r1.1, Tensorflow provides a module named tf.contrib.graph_editor which implements the above idea as a set of convinient functions.
In addition to what #zaxily and #mrry says, I want to provide an example of how to actually do a modification to the graph. In short:
one can not modify existing operations, all ops are final and non-mutable
one may copy an op, modify it's inputs or attributes and add new op back to the graph
all downstream ops that depend on the new/copied op have to be recreated. Yes, a signifficant portion of the graph would be copied copied, which is not a problem
The code:
import tensorflow
import copy
import tensorflow.contrib.graph_editor as ge
from copy import deepcopy
a = tf.constant(1)
b = tf.constant(2)
c = a+b
def modify(t):
# illustrate operation copy&modification
new_t = deepcopy(t.op.node_def)
new_t.name = new_t.name+"_but_awesome"
new_t = tf.Operation(new_t, tf.get_default_graph())
# we got a tensor, let's return a tensor
return new_t.outputs[0]
def update_existing(target, updated):
# illustrate how to use new op
related_ops = ge.get_backward_walk_ops(target, stop_at_ts=updated.keys(), inclusive=True)
new_ops, mapping = ge.copy_with_input_replacements(related_ops, updated)
new_op = mapping._transformed_ops[target.op]
return new_op.outputs[0]
new_a = modify(a)
new_b = modify(b)
injection = new_a+39 # illustrate how to add another op to the graph
new_c = update_existing(c, {a:injection, b:new_b})
with tf.Session():
print(c.eval()) # -> 3
print(new_c.eval()) # -> 42
For tensorflow v>=2.6, using Graph directly have been depcreated
A tf.Graph can be constructed and used directly without a tf.function, as was required in TensorFlow 1, but this is deprecated and it is recommended to use a tf.function instead. If a graph is directly used, other deprecated TensorFlow 1 classes are also required to execute the graph, such as a tf.compat.v1.Session.
That being said, I think your question can be still relevant, I think the kind of problem you are facing might be solved when using tensorflow eager execution. While running tf in eager mode, you can run, modify the graph before building it test it before building it...
TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later. This makes it easy to get started with TensorFlow and debug models, and it reduces boilerplate as well. To follow along with this guide, run the code samples below in an interactive python interpreter.
However be careful eager mode trade debugging/flexbillity with performance/speed, so for production you might consider turning it off.
Lastly, there is other feature of tensorflow, that might be relvant for this probelm, which is tensor slicing, tf.slice.