A TensorFlow graph is usually built gradually from inputs to outputs, and then executed. Looking at the Python code, the input lists of operations are immutable, which suggests that the inputs should not be modified. Does that mean that there is no way to update/modify an existing graph?
The TensorFlow tf.Graph class is an append-only data structure, which means that you can add nodes to the graph after executing part of the graph, but you cannot remove or modify existing nodes. Since TensorFlow executes only the necessary subgraph when you call Session.run(), there is no execution-time cost to having redundant nodes in the graph (although they will continue to consume memory).
To remove all nodes in the graph, you can create a session with a new graph:
with tf.Graph().as_default():  # Create a new graph, and make it the default.
    with tf.Session() as sess:  # `sess` will use the new, currently empty, graph.
        # Build graph and execute nodes in here.
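For illustration, here is a minimal sketch of the append-only behavior described above (assuming TF1-style graph mode; tf.compat.v1 is used so it also runs under TF 2.x): part of the graph is executed, and new nodes are appended afterwards.

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()  # assume TF1-style graph mode

a = tf.constant(1)
b = a + 2

with tf.Session() as sess:
    print(sess.run(b))  # 3 -- only the subgraph needed for `b` is executed
    c = b * 10          # appending a new node after running part of the graph is fine
    print(sess.run(c))  # 30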
Yes, tf.Graph objects are built in an append-only fashion, as @mrry puts it.
But there's a workaround:
Conceptually, you can modify an existing graph by cloning it and performing the needed modifications along the way. As of r1.1, TensorFlow provides a module named tf.contrib.graph_editor which implements the above idea as a set of convenient functions.
In addition to what @zaxily and @mrry say, I want to provide an example of how to actually do a modification to the graph. In short:
one cannot modify existing operations; all ops are final and non-mutable
one may copy an op, modify its inputs or attributes, and add the new op back to the graph
all downstream ops that depend on the new/copied op have to be recreated. Yes, a significant portion of the graph would be copied, which is not a problem
The code:
import tensorflow as tf
import tensorflow.contrib.graph_editor as ge
from copy import deepcopy

a = tf.constant(1)
b = tf.constant(2)
c = a + b

def modify(t):
    # illustrate operation copy & modification
    new_t = deepcopy(t.op.node_def)
    new_t.name = new_t.name + "_but_awesome"
    new_t = tf.Operation(new_t, tf.get_default_graph())
    # we got an op; let's return its output tensor
    return new_t.outputs[0]

def update_existing(target, updated):
    # illustrate how to use the new op
    related_ops = ge.get_backward_walk_ops(target, stop_at_ts=updated.keys(), inclusive=True)
    new_ops, mapping = ge.copy_with_input_replacements(related_ops, updated)
    new_op = mapping._transformed_ops[target.op]
    return new_op.outputs[0]

new_a = modify(a)
new_b = modify(b)
injection = new_a + 39  # illustrate how to add another op to the graph
new_c = update_existing(c, {a: injection, b: new_b})

with tf.Session():
    print(c.eval())      # -> 3
    print(new_c.eval())  # -> 42
For TensorFlow v>=2.6, using a Graph directly has been deprecated:
A tf.Graph can be constructed and used directly without a tf.function, as was required in TensorFlow 1, but this is deprecated and it is recommended to use a tf.function instead. If a graph is directly used, other deprecated TensorFlow 1 classes are also required to execute the graph, such as a tf.compat.v1.Session.
That being said, I think your question is still relevant. The kind of problem you are facing might be solved by using TensorFlow eager execution: while running TF in eager mode, you can run and modify the computation, and test it, before ever building a graph.
TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later. This makes it easy to get started with TensorFlow and debug models, and it reduces boilerplate as well. To follow along with this guide, run the code samples below in an interactive python interpreter.
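As a minimal illustration (assuming TF 2.x, where eager execution is enabled by default), the toy computation from the earlier answer can be built and changed directly in Python:

import tensorflow as tf  # assumes TF 2.x

a = tf.constant(1)
b = tf.constant(2)
c = a + b
print(c.numpy())      # 3 -- evaluated immediately, no graph or Session involved

# "Modifying" the computation is just ordinary Python: build a new expression.
new_a = a + 39
new_c = new_a + b
print(new_c.numpy())  # 42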
However, be careful: eager mode trades performance/speed for debuggability/flexibility, so for production you might consider turning it off.
Lastly, there is another feature of TensorFlow that might be relevant for this problem, which is tensor slicing, tf.slice.
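For instance (a small sketch, assuming TF 2.x with eager execution):

import tensorflow as tf  # assumes TF 2.x

t = tf.constant([[1, 2, 3],
                 [4, 5, 6]])
# Take a 2x2 window starting at row 0, column 1.
print(tf.slice(t, begin=[0, 1], size=[2, 2]).numpy())  # [[2 3] [5 6]]
# The same thing with Python-style slicing:
print(t[:, 1:3].numpy())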
Related
So if I run this code in PyTorch:
x = torch.ones(2,2, requires_grad=True)
x.add_(1)
I will get the error:
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
I understand that PyTorch does not allow in-place operations on leaf variables, and I also know that there are ways to get around this restriction. What I don't understand is the philosophy behind this rule. Why is it wrong to change a leaf variable with in-place operations?
As I understand it, any time you perform an out-of-the-ordinary in-place update on a tensor that was initialized with requires_grad=True, PyTorch throws an error to make sure it was intentional. For example, you normally would only update a weight tensor using optimizer.step().
For another example, I ran into this issue when trying to update the values in a backprop-able tensor during network initialization.
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=True)
self.weight_layer[true_ids == 1] = -1.2
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
The problem is that, because requires_grad=True, the network doesn't know that I'm still initializing the values. If this is what you are trying to do, wrapping the update in a torch.no_grad block is one solution:
with torch.no_grad():
    self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=True)
    self.weight_layer[true_ids == 1] = -1.2
Otherwise, you could just set requires_grad=True after you finish initializing the Tensor:
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length))
self.weight_layer[true_ids == 1] = -1.2
self.weight_layer.requires_grad = True
The simple answer to this is that, once autograd creates the graph and we have created the tensors that are inputs to the graph, autograd will construct and track each operation on the created tensor. Now, since this tensor has requires_grad=True, if I do, say, a manual weight update after loss.backward(), autograd probably considers it part of the already created graph, which requires gradients. This leads to the RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
Autograd is confused about why a tensor that is part of the computational graph is being mutated in place outside of the tracked operations (for example during initialization), and it treats this as a problem.
If we simply place the code under a with torch.no_grad(): block, we disable gradient tracking, which essentially signals to autograd that this operation is not part of our dynamic graph updates.
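For example, a minimal sketch reusing the question's tensor from above:

import torch

x = torch.ones(2, 2, requires_grad=True)
# x.add_(1)  # RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

with torch.no_grad():
    x.add_(1)   # allowed: autograd does not record this in-place update

print(x)        # all twos; x still has requires_grad=True for later use in the graph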
PS: I will expand this answer by writing a blog post about it on Medium.
Similar results can be obtained via tf.function and autograph.to_graph.
However, this seems to be version dependent.
For example, the function (taken from the official guide):
def square_if_positive(x):
    if x > 0:
        x = x * x
    else:
        x = 0.0
    return x
Can be evaluated in graph mode using:
autograph.to_graph in TF 1.14
tf_square_if_positive = autograph.to_graph(square_if_positive)

with tf.Graph().as_default():
    g_out = tf_square_if_positive(tf.constant(9.0))
    with tf.Session() as sess:
        print(sess.run(g_out))
tf.function in TF2.0
@tf.function
def square_if_positive(x):
    if x > 0:
        x = x * x
    else:
        x = 0.0
    return x

square_if_positive(tf.constant(9.0))
So:
What is the relationship between tf.function and autograph.to_graph? One can assume tf.function is using autograph.to_graph (as well as autograph.to_code) internally, but this is far from obvious.
Is the autograph.to_graph snippet still supported in TF2.0 (since it requires tf.Session)? It is present in the autograph doc in TF 1.14, but not in the corresponding doc of TF 2.0
I covered and answered all your questions in a three-part article: "Analyzing tf.function to discover AutoGraph strengths and subtleties": part 1, part 2, part 3.
To summarize and answer your questions:
What is the relationship between tf.function and autograph.to_graph?
tf.function uses AutoGraph by default. What happens the first time you invoke a tf.function-decorated function is that:
The function body is executed (in a TensorFlow 1.x-like way, thus without eager mode) and its execution is traced (now tf.function knows which nodes are present, which branch of the if to keep, and so on).
At the same time, AutoGraph kicks in and tries to convert the Python statements it knows into tf.* calls (while -> tf.while_loop, if -> tf.cond, ...).
Merging the information from points 1 and 2, a new graph is built and, based on the function name and the types of the parameters, it is cached in a map (see the articles for a better understanding); a minimal sketch of this trace-and-cache behavior follows below.
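Here is a minimal sketch of that trace-and-cache behavior (assuming TF 2.x); the print call is a Python side effect, so it only fires while a new graph is being traced:

import tensorflow as tf  # assumes TF 2.x

@tf.function
def square_if_positive(x):
    print("tracing")   # Python side effect: runs only while a new graph is traced
    if x > 0:          # converted by AutoGraph into a tf.cond
        x = x * x
    else:
        x = 0.0
    return x

print(square_if_positive(tf.constant(9.0)))   # first call: prints "tracing", returns 81.0
print(square_if_positive(tf.constant(-3.0)))  # same input signature: cached graph, no re-trace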
Is the autograph.to_graph snippet still supported in TF2.0?
Yes, tf.autograph.to_graph is still present and it creates a session internally for you (in TF2 you don't have to worry about them).
At any rate, I suggest you read the three articles linked since they cover in detail this and other peculiarities of tf.function.
@nessuno's answer is excellent and helped me a lot. That said, the doc for tf.autograph.to_graph actually explains the relationship between AutoGraph and tf.function directly:
Unlike tf.function, to_graph is a low-level transpiler that converts Python code to TensorFlow graph code. It does not implement any caching, variable management or create any actual ops, and is best used where greater control over the generated TensorFlow graph is desired. Another difference from tf.function is that to_graph will not wrap the graph into a TensorFlow function or a Python callable. Internally, tf.function uses to_graph.
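As a small illustration of that difference (assuming TF 2.x), to_code/to_graph only transpile the function; no ops are created and nothing is cached, unlike tf.function:

import tensorflow as tf  # assumes TF 2.x

def square_if_positive(x):
    if x > 0:
        x = x * x
    else:
        x = 0.0
    return x

# Print the graph-compatible Python that AutoGraph generates.
print(tf.autograph.to_code(square_if_positive))

# to_graph returns the converted callable itself, for use in a graph context.
converted_fn = tf.autograph.to_graph(square_if_positive)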
I wrote a function using TensorFlow ops. I know that when I run the function, it will add many ops to the graph. But I am confused about how to get access to these ops.
for example:
def assign_weights():
    with tf.name_scope('zheng'):
        v = tf.Variable(0, 'v', dtype=tf.float32)
        b = tf.placeholder(tf.float32, shape=())
        z = tf.assign(v, b)
    return z, b
I can use feed_dict to pass a value to b only if I set b as a return value. Otherwise, there is no way to access b. If we want to access many ops in the function scope, we have to set many return values. This is very ugly.
I want to know what happens under the hood when I run functions using TensorFlow, and how to get access to the ops in the function scope.
Thank you!
Obviously, it's true that to access an op (or tensor) we need some reference to it. IMHO, one standard workaround is to build your graph in a class and make certain tensors attributes of the class and access them through the object.
Alternatively, if you're more inclined to the functional approach, a better way than returning all relevant ops and tensors separately would be to return a dict (or namedtuple).
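For example, a tiny sketch of the dict-returning variant, mirroring the question's assign_weights() (TF1-style graph mode assumed):

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

def assign_weights():
    with tf.name_scope('zheng'):
        v = tf.Variable(0.0, name='v')
        b = tf.placeholder(tf.float32, shape=(), name='b')
        z = tf.assign(v, b)
    return {'v': v, 'b': b, 'assign': z}

tensors = assign_weights()
# later, e.g.: sess.run(tensors['assign'], feed_dict={tensors['b']: 3.0})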
Additionally, there are also specialized functions that return ops by name: e.g. get_operation_by_name.
As an aside to this question, you might also want to try out eager execution, which is imperative.
Three things happen when you use an op function:
create and add a compute node to the default graph
set your inputs as the node's input tensors
set the node's output tensor as the return value
For example, a = tf.add(b, c, name='add'):
adds a node with op Add to the default graph, with name 'add'
sets b and c as the node's input tensors
sets the node's output tensor, with name 'add:0', to a
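A quick runnable illustration of those three points (TF1-style graph mode via tf.compat.v1; the constant names shown assume a fresh graph):

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

b = tf.constant(1.0)
c = tf.constant(2.0)
a = tf.add(b, c, name='add')

print(a.op.name)                      # 'add'   -- the node added to the default graph
print([t.name for t in a.op.inputs])  # e.g. ['Const:0', 'Const_1:0'] -- b and c as inputs
print(a.name)                         # 'add:0' -- the node's output tensor, bound to `a`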
So you can access nodes via sess.graph; there are many functions to access nodes, for example get_operation_by_name.
Also, you can work with the graph via sess.graph_def, which is the graph serialized with protobuf; you can find the protobuf definitions in the TensorFlow source code under tensorflow/core/framework, in the .proto files there.
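Putting this together with the question's code, here is a minimal TF1-style sketch; the names 'zheng/b:0' and 'zheng/Assign' follow from the name_scope and the default op names, so adjust them if your graph differs:

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

def assign_weights():
    with tf.name_scope('zheng'):
        v = tf.Variable(0.0, name='v')
        b = tf.placeholder(tf.float32, shape=(), name='b')
        z = tf.assign(v, b)
    return z

z = assign_weights()
g = tf.get_default_graph()

b = g.get_tensor_by_name('zheng/b:0')                # refetch the placeholder by name
assign_op = g.get_operation_by_name('zheng/Assign')  # or fetch the op itself

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(z, feed_dict={b: 3.0}))           # 3.0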
I am trying to implement an Optimizer in TensorFlow and have been looking at the optimizer code from an old version of TensorFlow. I want to understand what the function _get_variable_for does. It is the first function in the optimizer file.
Any help would be appreciated.
Thank you.
I see that this function checks two conditions.
ResourceVariable and VarHandleOp
This is how a ResourceVariable behaves, according to the comments in the code:
"For example, if there is more than one assignment to a ResourceVariable in
a single session.run call there is a well-defined value for each operation
which uses the variable's value if the assignments and the read are connected
by edges in the graph. Consider the following example, in which two writes
can cause tf.Variable and tf.ResourceVariable to behave differently:"
a = tf.Variable(1.0, use_resource=True)
a.initializer.run()

assign = a.assign(2.0)
with tf.control_dependencies([assign]):
    b = a.read_value()
with tf.control_dependencies([b]):
    other_assign = a.assign(3.0)
with tf.control_dependencies([other_assign]):
    # Will print 2.0 because the value was read before other_assign ran. If
    # `a` was a tf.Variable instead, 2.0 or 3.0 could be printed.
    tf.Print(b, [b]).eval()
VarHandleOp seems to have deeper semantics as per this
"A common approach to managing where variables are placed, is to create a method to determine where each Op is to be placed and use that method in place of a specific device name when calling with tf.device(): Consider a scenario where a model is being trained on 2 GPUs and the variables are to be placed on the CPU. There would be a loop for creating and placing the "towers" on each of the 2 GPUs. A custom device placement method would be created that watches for Ops of type Variable, VariableV2, and VarHandleOp and indicates that they are to be placed on the CPU. All other Ops would be placed on the target GPU."
It explains this scenario further with sample code.
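A rough sketch of such a custom device-placement method is below (the helper name assign_to_device is illustrative, not from the source, and TF1-style graph mode via tf.compat.v1 is assumed):

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

VARIABLE_OP_TYPES = ('Variable', 'VariableV2', 'VarHandleOp')

def assign_to_device(worker_device, ps_device='/cpu:0'):
    """Return a device function that pins variable ops to ps_device."""
    def _device_fn(op):
        node_def = op if isinstance(op, tf.NodeDef) else op.node_def
        if node_def.op in VARIABLE_OP_TYPES:
            return ps_device
        return worker_device
    return _device_fn

with tf.device(assign_to_device('/gpu:0', ps_device='/cpu:0')):
    w = tf.get_variable('w', shape=[10, 10])  # variable op -> placed on /cpu:0
    y = tf.matmul(w, w)                       # compute op  -> placed on /gpu:0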
I am trying to perform two tasks (A and B) which have inputs inp_A and inp_B and corresponding outputs out_A, out_B.
Task A is to be achieved first by a graph g_A. After Task A is finished, I wish to use the weights of g_A in a new graph g_B, which is a bigger graph (a superset of g_A).
I am unsure how to do this in tensorflow.
I am using this kind of split for training and validation purposes, where I create dedicated input and output pipelines but share the inception part of the graph. I am using the same graph (as in a single tf.Graph()), but different (unconnected) subgraphs within it.
Within one tf.Graph(), the general concept is variable sharing, which you can achieve by using tf.variable_scope() to group your variables by concept, and then creating and refetching them with tf.get_variable() (instead of using tf.Variable() directly). The first time it is called it will create the variables; the second time it will reuse them, provided the name of the variable stays the same.
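For example, a minimal sketch of that pattern (TF1-style graph mode assumed; the layer and scope names are illustrative):

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

def dense_layer(x):
    # created on the first call, refetched on later calls with reuse=True
    w = tf.get_variable('w', shape=[3, 3])
    return tf.matmul(x, w)

inp_train = tf.placeholder(tf.float32, [None, 3])
inp_valid = tf.placeholder(tf.float32, [None, 3])

with tf.variable_scope('model'):
    out_train = dense_layer(inp_train)       # creates model/w
with tf.variable_scope('model', reuse=True):
    out_valid = dense_layer(inp_valid)       # reuses model/w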
However I found it much easier to use tf.make_template() instead, which will wrap a function that creates a subgraph entirely and on every call creates a new instance of the graph while sharing all of its variables.
The documentation example for that is
def my_op(x, scalar_name):
    var1 = tf.get_variable(scalar_name,
                           shape=[],
                           initializer=tf.constant_initializer(1))
    return x * var1

create_instance = tf.make_template('scale_by_y', my_op, scalar_name='y')

z = create_instance(input1)
w = create_instance(input2)
Here, each call to create_instance will create a new node called scale_by_y in the graph that performs the operation defined by my_op() while sharing its internal variables. (In the example, the parameter scalar_name is statically bound to the value of y, resulting in variable scale_by_y/y to be created (and reused) in the graph. I find that to be more confusing than helpful.)
It does not care about parent scopes, so
with tf.variable_scope('training'):
    z1 = create_instance(input1)

with tf.variable_scope('validation'):
    z2 = create_instance(input2)
works. It also might or might not work across different tf.Graph() instances, though I doubt it.