I am trying to implement an Optimizer in tensorflow, and have been looking at optimizer code from old version of tensorflow, and want to understand what does this function _get_variable_for do? It is the first function in the optimizer file.
Any help would be appreciated.
Thanking You.
I see that this function checks two conditions.
ResourceVariable and VarHandleOp
This is a ResourceVariable according to the comments in the code
"For example, if there is more than one assignment to a ResourceVariable in
a single session.run call there is a well-defined value for each operation
which uses the variable's value if the assignments and the read are connected
by edges in the graph. Consider the following example, in which two writes
can cause tf.Variable and tf.ResourceVariable to behave differently:"
a = tf.Variable(1.0, use_resource=True)
a.initializer.run()
assign = a.assign(2.0)
with tf.control_dependencies([assign]):
b = a.read_value()
with tf.control_dependencies([b]):
other_assign = a.assign(3.0)
with tf.control_dependencies([other_assign]):
# Will print 2.0 because the value was read before other_assign ran. If
# `a` was a tf.Variable instead, 2.0 or 3.0 could be printed.
tf.Print(b, [b]).eval()
VarHandleOp seems to have deeper semantics as per this
"A common approach to managing where variables are placed, is to create a method to determine where each Op is to be placed and use that method in place of a specific device name when calling with tf.device(): Consider a scenario where a model is being trained on 2 GPUs and the variables are to be placed on the CPU. There would be a loop for creating and placing the "towers" on each of the 2 GPUs. A custom device placement method would be created that watches for Ops of type Variable, VariableV2, and VarHandleOp and indicates that they are to be placed on the CPU. All other Ops would be placed on the target GPU."
It explains this scenario further with sample code.
Related
The tf.Variable documentation
contains the following warning:
WARNING: tf.Variable objects by default have a non-intuitive memory model. A Variable is represented internally as a mutable Tensor which can non-deterministically alias other Tensors in a graph. The set of operations which consume a Variable and can lead to aliasing is undetermined and can change across TensorFlow versions. Avoid writing code which relies on the value of a Variable either changing or not changing as other operations happen. For example, using Variable objects or simple functions thereof as predicates in a tf.cond is dangerous and error-prone:
v = tf.Variable(True)
tf.cond(v, lambda: v.assign(False), my_false_fn) # Note: this is broken.
I don't quite understand what time means and why the example above is broken. What does it mean that one cannot rely on the value of a Variable? Is it possible to have an example where the code above works not as expected?
As the documentation of TensorFlow you should use:
v = tf.Variable(True, use_resource=True)
tf.cond(v, lambda: v.assign(False), my_false_fn)
See here for more information:
tf.Variable
I write a function using tensorflow ops. I know the fact when I run the function, it will add many ops to the graph. But I am confused with how to get access of these ops.
for example:
def assign_weights():
with tf.name_scope('zheng'):
v = tf.Variable(0, 'v', dtype=tf.float32)
b = tf.placeholder(tf.float32, shape=())
z = tf.assign(v, b)
return z, b
I can use feed_dict to pass a value to b, only if I set b as a return value. Otherwise, there is no way to access b. If we want to access many ops in the function scope, we should set many return values. This is very ugly.
I want to know what happens under the hood when I run functions using tensorflow and how to get access of the ops in the function scope.
Thank you!
Obviously, it's true that to access an op (or tensor) we need some reference to it. IMHO, one standard workaround is to build your graph in a class and make certain tensors attributes of the class and access them through the object.
Alternatively, if you're more inclined to the functional approach, a better way than returning all relevant ops and tensors separately would be to return a dict (or namedtuple).
Additionally, there are also specialized functions that return ops by name: e.g. get_operation_by_name.
As an aside to this question, you might also want to try out eager execution, which is imperative.
3 things happen when you use op function:
create and add a compute node to default graph
set your input as the node input tensor
set node output tensor as return value
for example, a = tf.add(b, c, name='add'),
add a node with op Add to default graph, with name 'add'
set b and c as node input tensor
set node output, with name 'add:0', to a
So you can access nodes via sess.graph, there are many functions to access nodes, say, get_operation_by_name.
Also, you can operate the graph via sess.graph_def, which is serialized graph with protobuf, you can find the protobuf definition in the tensorflow source code, tensorflow/core/framework, some .proto files there.
As the title says I'm using tensorflow version 1.2 built from source for my machine. I don't believe that affects my question though.
What is the difference between these two chunks of code?
The top one causes me to never get values assigned while training but the bottom does. I am copying all my epoch data over to the gpu and then grabbing the data for each batch as I need it, so this code runs at the beginning of every batch inside the same session.
The code is in python and all of this is defined inside my model class.
All of the self.data objects are 3D float32 tensors.
## the index i.e the current step in the epoch
index = tf.to_int32(self.step, name="step_to_int")
## code that doesn't work
tf.assign(self.input_data, self.all_input_data[index])
tf.assign(self.targets, self.all_target_data[index])
## code that works
self.input_data = self.all_input_data[index]
self.targets = self.all_target_data[index]
Remember that pretty much everything is an operation in TensorFlow. I believe the issue in your code is that you never run the assignment operation (you just evaluate the input_data tensor as it has been initialised).
You then need to assign the return of the assignment method to a variable:
self.input_data = tf.assign(self.input_data, self.all_input_data[index])
This variable will hold both the new value and the reassignment operation, and so whenever you will evaluate it, it will update its value.
Quoting the doc string:
Returns:
A Tensor that will hold the new value of 'ref' after
the assignment has completed.
I am trying to perform two tasks (A and B) which have inputs inp_A and inp_B and corresponding outputs out_A, out_B.
Task A is to be first achieved by a Graph g_A. After Task A is finished, I wish to use the weights of g_A into a new graph g_B which is a bigger graph ( a superset of g_A).
I am unsure how to do this in tensorflow.
I am using this kind of split for training and validation purposes, where I create dedicated input and output pipelines but share the inception part of the graph, although I'm using the same graph (as of tf.Graph()), but different (unconnected) subgraphs within it.
Within one tf.Graph() the general concept is variable sharing which you can achieve by using tf.variable_scope() to group your variables by concept and them creating and refetching them by using tf.get_variable() (instead of using tf.Variable() directly). The first time it's called it will create the variables, the second time it will reuse them - provided the name of the variable stays the same.
However I found it much easier to use tf.make_template() instead, which will wrap a function that creates a subgraph entirely and on every call creates a new instance of the graph while sharing all of its variables.
The documentation example for that is
def my_op(x, scalar_name):
var1 = tf.get_variable(scalar_name,
shape=[],
initializer=tf.constant_initializer(1))
return x * var1
create_instance = tf.make_template('scale_by_y', my_op, scalar_name='y')
z = create_instance(input1)
w = create_instance(input2)
Here, each call to create_instance will create a new node called scale_by_y in the graph that performs the operation defined by my_op() while sharing its internal variables. (In the example, the parameter scalar_name is statically bound to the value of y, resulting in variable scale_by_y/y to be created (and reused) in the graph. I find that to be more confusing than helpful.)
It does not care about parent scopes, so
with tf.variable_scope('training'):
z1 = create_instance(input1)
with tf.variable_scope('validation'):
z2 = create_instance(input2)
works. It also might or might not work across different tf.Graph() instances, though I doubt it.
I'm finally using my LSTM model to predict things. However, I've run into a new problem that I don't quite understand. If I try to predict something using
sess.run(pred, feed_dict={x: xs})
It works great for the first prediction, but any subsequent predictions throw the error:
ValueError: Variable weight_def/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?
Now, there are a TON of topics on this - and most of them are easily solved by doing what it asks - just create a variable scope around the offending line and make variable reuse true. Now, if I do that I get the following error:
ValueError: Variable rnn_def/RNN/BasicLSTMCell/Linear/Matrix does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
This is causing me quite the headache. I've read the Tensorflow Variable Sharing documentation over and over, and I can't for the life of me figure out what I am doing wrong. Here the offending lines
with tf.variable_scope("rnn_def"):
outputs, states = rnn.rnn(self.__lstm_cell,
self.__x,
dtype=tf.float32)
self.__outputs = outputs
self.__states = states
I have this code nested in a larger class that just contains the remainder of the graph. To train it, I just call my "train" method over and over again. Which seems to work fine, the problem ends up being prediction.
So my question is two fold:
Why do I require some sort of variable sharing only after the first prediction but the first call doesn't fail? What do I need to fix this code so I can predict more than once without causing an error?
When is variable sharing useful, and why is Tensorflow creating new variables each time I run it? How can I prevent this (do I want to prevent it?)?
Thank you!
Add a print statement to that block of code. I suspect it is being called multiple times. Or maybe you are creating multiple instances of the class in which each class should have its own scope name.
To answer your questions.
Why do I require some sort of variable sharing only after the first
prediction but the first call doesn't fail? What do I need to fix this
code so I can predict more than once without causing an error?
No you don't. That block of code creating the RNN is probably being accidentally called multiple times.
When is variable sharing useful, and why is Tensorflow creating new
variables each time I run it? How can I prevent this (do I want to
prevent it?)?
It is useful in the following case where I have different input sources for part of my graph depending on whether is is training or predicting.
x_train, h_train = ops.sharpen_cell(x_train, h_train, 2, n_features, conv_size, n_convs, conv_activation, 'upsampler')
self.cost += tf.reduce_mean((x_train - y_train) ** 2)
level_scope.reuse_variables()
x_gen, h_gen = ops.sharpen_cell(x_gen, h_gen, 2, n_features, conv_size, n_convs, conv_activation, 'upsampler')
self.generator_outputs.append(tf.clip_by_value(x_gen, -1, 1))
In this example is reuses the variables for the generator which were trained with the trainer. It is also useful if you want to unroll and RNN in a loop. Such as in this case...
y = #initial value
state = #initial state
rnn = #some sort of RNN cell
with tf.variable_scope("rnn") as scope:
for t in range(10):
y, state = rnn(y, state)
scope.reuse_variabled()
In this case it will reuse the rnn weights between time steps which is the desired behavior for an RNN.