TensorFlow Variables and Ops vs. Python Equivalents

I'm new to TensorFlow.
Is it necessary to use TensorFlow's own functions, for example tf.constant() in place of plain int32 or float32 values?
During computation, should I use tf.mul() instead of the normal Python multiplication operator *?
And should I use tf.Print() instead of print()?

As noted here,
A Tensor is a symbolic handle to one of the outputs of an Operation. It does not hold the values of that operation's output, but instead provides a means of computing those values in a TensorFlow Session
So tensor variables aren't like python variables. Rather they specify the relationship between operations in the computational graph. The python variables that you use to describe the graph are for the programmer's convenience, but it might be easier to think about the python variables and the tensor variables as being in parallel namespaces. This example may help:
import tensorflow as tf

with tf.Session() as sess:
    a = tf.constant([1, 2, 3])
    b = tf.Variable([])
    b = 2 * a  # rebinds the Python name b to the output of a new "mul" op
    sess.run(tf.initialize_all_variables())
    print(b) # Tensor("mul:0", shape=(3,), dtype=int32)
    print(b.eval()) # [2, 4, 6]
    b = tf.Print(b, [b]) # [2, 4, 6] (at the command line)
From this you can see:
print(b) returns information about the operation that 'b' refers to as well as the variable shape and data type, but not the value.
b.eval() (or sess.run(b)) returns the value of b as a numpy array which can be printed by a python print()
tf.Print() allows you to see the value of b during graph execution.
Note that the syntax of tf.Print() might seem a little strange to a newbie. As described in the documentation here, tf.Print() is an identity operation that only has the side effect of printing to the command line. The first parameter is just passed through. The second parameter is the list of tensors that are printed and can be different than the first parameter. Also note that in order for tf.Print() to do something, a variable used in a subsequent call to sess.run() needs to be dependent on the tensor output by tf.Print(), otherwise this portion of the graph will not be executed.
Finally, with respect to math ops such as tf.mul() vs. *: many of the normal Python operators are overloaded with the equivalent TensorFlow ops, as described here.
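To make the "symbolic handle" idea concrete, here is a toy sketch in plain Python. It is not TensorFlow code and not how TensorFlow is implemented; the names Node, as_node, evaluate and printed are made up for illustration. It only shows how an overloaded * can record an operation instead of computing it, and why a print-style identity node does nothing until something evaluates it.

```python
printed = []  # values emitted by "print" nodes, kept so the side effect is observable


class Node:
    """A symbolic handle: it records an operation and holds no computed value."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value  # only used by "const" nodes

    def __mul__(self, other):
        return Node("mul", (self, as_node(other)))

    def __rmul__(self, other):
        # Reached for expressions such as `2 * a`, where the left operand is a plain number.
        return Node("mul", (as_node(other), self))

    def __repr__(self):
        return f"Node(op={self.op!r})"


def as_node(x):
    """Wrap plain Python values as constant nodes; pass existing nodes through."""
    return x if isinstance(x, Node) else Node("const", value=x)


def _mul(x, y):
    """Multiply scalars, or lists element-wise, with scalar broadcasting."""
    if isinstance(x, list) and isinstance(y, list):
        return [_mul(p, q) for p, q in zip(x, y)]
    if isinstance(x, list):
        return [_mul(item, y) for item in x]
    if isinstance(y, list):
        return [_mul(x, item) for item in y]
    return x * y


def evaluate(node):
    """Plays the role of running the graph: walk the nodes and compute values."""
    if node.op == "const":
        return node.value
    args = [evaluate(n) for n in node.inputs]
    if node.op == "mul":
        return _mul(args[0], args[1])
    if node.op == "print":
        # Identity op whose only side effect is printing its input.
        printed.append(args[0])
        print(args[0])
        return args[0]
    raise ValueError(f"unknown op {node.op!r}")


a = as_node([1, 2, 3])
b = 2 * a                 # builds a "mul" node; nothing is computed yet
print(b)                  # Node(op='mul')
print(evaluate(b))        # [2, 4, 6]

c = Node("print", (b,))   # creating the node prints nothing
evaluate(c)               # the side effect fires only when c is evaluated
```

In this toy, print(b) shows a description of the operation, evaluate(b) plays the part of b.eval(), and the "print" node mirrors the point about tf.Print(): it has no effect unless something downstream is actually run.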

TensorFlow is built upon a computation graph. When you construct the graph in Python, you are only building a description of the computations, not actually performing them. To compute anything, the graph must be launched in a Session. So it's best to do the computation with the TensorFlow ops.
https://www.tensorflow.org/versions/r0.11/get_started/basic_usage.html

Related

TensorFlow 2 - do NumPy numerical values and arrays cause new graphs for a TF Function?

Do the "numerical Python values" mentioned in Hands on ML 2 include NumPy ints, floats, and arrays? Do we need to explicitly create a TF Tensor or a TF Dataset from a NumPy construct before passing it as the argument of a TF Function?
Hands on ML 2 Chapter 12 Auto Graph and Tracing:
By default, a TF Function generates a new graph for every unique set
of input shapes and data types and caches it for subsequent calls.
... However this is only true for tensor arguments: if you pass
numerical Python values to a TF Function, a new graph will be
generated for every distinct value. ...
if you pass numerical Python values to a TF function, a new graph will
be generated for every distinct value. If you call a TF function many
times with different numerical Python values, then many graphs will be
generated, slowing down your program and using up a lot of RAM (you
must delete the TF Function to release it).
Python values should be reserved for arguments that will have few
unique values, such as hyper parameters like the number of neurons per
layer. This allows TensorFlow to better optimize each variant of your
model.
The 3rd rule stated in the TensorFlow document Rules of tracing corresponds to Python int, float, boolean, str, etc., each distinct value of which will cause a new graph. But I am not sure whether the 5th rule (all other Python types) applies to NumPy constructs.
A Function determines whether to reuse a traced ConcreteFunction by computing a cache key from an input's args and kwargs. A cache key is a key that identifies a ConcreteFunction based on the
input args and kwargs of the Function call, according to the following
rules (which may change):
The key generated for a tf.Tensor is its shape and dtype.
The key generated for a tf.Variable is a unique variable id.
The key generated for a Python primitive (like int, float, str) is its value.
The key generated for nested dicts, lists, tuples, namedtuples, and attrs is the flattened tuple of leaf-keys (see nest.flatten). (As a
result of this flattening, calling a concrete function with a
different nesting structure than the one used during tracing will
result in a TypeError).
For all other Python types the key is unique to the object. This way a
function or method is traced independently for each instance it is
called with.
I suppose the fact that tf.numpy_function exists suggests that TF Function tracing will generate a new graph, but I need definite confirmation.
I don't think changing the values inside NumPy arrays will generate a new graph. Consider the following minimal code example:
@tf.function
def test(input):
    print("Tracing with input= ", input)
    tf.print("Executing with input = ", input)
The first print is only executed during tracing, the second for every call. Calling it with a list leads to:
test([1,2])
test([3,4])
>>> Tracing with input = [1, 2]
>>> Executing with input [1, 2]
>>> Tracing with input = [3, 4]
>>> Executing with input [3, 4]
whereas calling it with numpy-arrays leads to:
test(np.array([1,2]))
test(np.array([3,4]))
>>> Tracing with input = Tensor("input:0", shape=(2,), dtype=int32)
>>> Executing with input [1 2]
>>> Executing with input [3, 4]
Here no tracing is done for the second call. This suggests at least that numpy-arrays are handled the same way as tensorflow tensors.
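The cache-key rules quoted in the question can be modeled with a small toy in plain Python. This is only a sketch of the quoted rules, not TensorFlow's implementation: FakeTensor, cache_key and toy_function are hypothetical names, dicts and namedtuples are left out for brevity, and the toy says nothing about how TensorFlow classifies NumPy arrays (the experiment above is the evidence for that).

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor: only shape and dtype feed the key."""

    def __init__(self, values, dtype="int32"):
        self.values = list(values)
        self.shape = (len(self.values),)
        self.dtype = dtype


def cache_key(arg):
    """Toy version of the quoted rules (dicts and namedtuples omitted)."""
    if isinstance(arg, FakeTensor):
        return ("tensor", arg.shape, arg.dtype)      # shape and dtype, not values
    if isinstance(arg, (bool, int, float, str)):
        return ("primitive", arg)                    # the value itself
    if isinstance(arg, (list, tuple)):
        return ("nested", tuple(cache_key(item) for item in arg))  # leaf keys
    return ("object", id(arg))                       # unique to the object


def toy_function(fn):
    """Count how often a new key appears, i.e. how often a trace would happen."""
    seen = set()

    def wrapper(*args):
        key = tuple(cache_key(a) for a in args)
        if key not in seen:
            seen.add(key)            # a real tracer would build a graph here
            wrapper.trace_count += 1
        return fn(*args)

    wrapper.trace_count = 0
    return wrapper


@toy_function
def identity(x):
    return x


identity(1)
identity(2)
int_traces = identity.trace_count                    # one trace per distinct value

identity(FakeTensor([1, 2]))
identity(FakeTensor([3, 4]))
tensor_traces = identity.trace_count - int_traces    # same shape and dtype: one trace

identity([1, 2])
identity([3, 4])
list_traces = identity.trace_count - int_traces - tensor_traces  # leaf values differ

print(int_traces, tensor_traces, list_traces)        # 2 1 2
```

Under these rules, two Python lists with different values produce different keys, while two tensor-like inputs with the same shape and dtype share one, which is the same pattern as the list versus array output above.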

Why doesn't pytorch allow inplace operations on leaf variables?

So if I run this code in Pytorch:
x = torch.ones(2,2, requires_grad=True)
x.add_(1)
I will get the error:
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
I understand that PyTorch does not allow in-place operations on leaf variables, and I also know that there are ways to get around this restriction. What I don't understand is the philosophy behind this rule. Why is it wrong to change a leaf variable with in-place operations?
As I understand it, any time you do a non-traditional operation on a tensor that was initialized with requires_grad=True, Pytorch throws an error to make sure it was intentional. For example, you normally would only update a weight tensor using optimizer.step().
For another example, I ran into this issue when trying to update the values in a backprop-able tensor during network initialization.
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=True)
self.weight_layer[true_ids == 1] = -1.2
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
The problem is that, because requires_grad=True, the network doesn't know that I'm still initializing the values. If this is what you are trying to do, wrapping the update in a torch.no_grad block is one solution:
with torch.no_grad():
    self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=True)
    self.weight_layer[true_ids == 1] = -1.2
Otherwise, you could just set requires_grad=True after you finish initializing the Tensor:
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=False)  # nn.Parameter defaults to requires_grad=True
self.weight_layer[true_ids == 1] = -1.2
self.weight_layer.requires_grad = True
The simple answer is that once you create a tensor with requires_grad=True, it becomes an input to the graph, and autograd tracks every operation performed on it. If you then modify that tensor in place while gradient tracking is enabled (say, a manual weight update after loss.backward()), autograd presumably treats the modification as part of the already created graph, which requires gradients. This leads to the RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
In other words, autograd cannot tell why a tensor that is part of the computational graph is being modified in place for some other purpose, such as initialization, and it reports this as a problem.
If we simply place the code under
with torch.no_grad(): code here
we disable gradient tracking, which signals to autograd that this operation is not part of our dynamic graph updates.
PS: I will expand this answer by writing a blog of this on Medium

Explicit vs implicit type definition in TensorFlow

I'm just beginning to learn TensorFlow. Quoting from the documentation:
Let's build a simple computational graph. The most basic operation is a constant. The Python function that builds the operation takes a tensor value as input. The resulting operation takes no inputs. When run, it outputs the value that was passed to the constructor. We can create two floating point constants a and b as follows:
a = tf.constant(3.0, dtype=tf.float32)
b = tf.constant(4.0) # also tf.float32 implicitly
total = a + b
print(a)
print(b)
print(total)
The second constant is implicitly typed as a float32. Is that based on the explicit typing of the first constant? And does that imply that the first dtype is required? tf.constant documentation would imply that it does not:
If the argument dtype is not specified, then the type is inferred from the type of value.
But then it would be unnecessary to explicitly type the 3.0 constant above.
I'm just looking for some clarification on this, since, like I said, I'm just starting out.
But then it would be unnecessary to explicitly type the 3.0 constant
above.
Absolutely correct.
a = tf.constant(3.0, dtype=tf.float32)
is equivalent to:
a = tf.constant(3.0)
The documentation is just demonstrating the different overloads. We might choose to explicitly provide the type if we want a different numerical precision (or even just to aid human readability) but if you want the default data type TF infers, then it's entirely unnecessary.

what happens when I write a function using tensorflow ops

I wrote a function using TensorFlow ops. I know that when I run the function, it adds many ops to the graph, but I am confused about how to access these ops.
For example:
def assign_weights():
    with tf.name_scope('zheng'):
        # name must be passed by keyword; the second positional argument is trainable
        v = tf.Variable(0, name='v', dtype=tf.float32)
        b = tf.placeholder(tf.float32, shape=())
        z = tf.assign(v, b)
    return z, b
I can use feed_dict to pass a value to b only if I return b from the function. Otherwise, there is no way to access b. If we want to access many ops created inside the function, we have to return all of them, which is very ugly.
I want to know what happens under the hood when I run functions that use TensorFlow ops, and how to access the ops created in the function scope.
Thank you!
Obviously, it's true that to access an op (or tensor) we need some reference to it. IMHO, one standard workaround is to build your graph in a class and make certain tensors attributes of the class and access them through the object.
Alternatively, if you're more inclined to the functional approach, a better way than returning all relevant ops and tensors separately would be to return a dict (or namedtuple).
Additionally, there are also specialized functions that return ops by name: e.g. get_operation_by_name.
As an aside to this question, you might also want to try out eager execution, which is imperative.
Three things happen when you call an op function:
a compute node is created and added to the default graph
your inputs are set as the node's input tensors
the node's output tensor is returned
For example, a = tf.add(b, c, name='add'):
adds a node with op Add to the default graph, with name 'add'
sets b and c as the node's input tensors
sets the node's output, with name 'add:0', to a
So you can access nodes via sess.graph, which offers many functions for accessing nodes, such as get_operation_by_name.
You can also work with the graph via sess.graph_def, which is the graph serialized as a protobuf. The protobuf definitions are in the .proto files under tensorflow/core/framework in the TensorFlow source code.
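As a rough mental model of those three steps, here is a toy graph registry in plain Python. It is not TensorFlow: ToyGraph, Operation and add_op are hypothetical names, and the scoped names such as "zheng/b" are chosen by hand for illustration rather than produced by a real name scope. It shows why an op stays reachable by name even when the function that created it did not return it.

```python
class Operation:
    """A node in the toy graph: a name, an op type, and input tensor names."""

    def __init__(self, name, op_type, inputs):
        self.name = name
        self.type = op_type
        self.inputs = list(inputs)
        self.output_name = f"{name}:0"  # first output of the node


class ToyGraph:
    def __init__(self):
        self._ops = {}

    def add_op(self, op_type, inputs=(), name=None):
        name = name or f"{op_type}_{len(self._ops)}"
        op = Operation(name, op_type, inputs)   # the inputs become the node's inputs
        self._ops[name] = op                    # the node is added to the graph
        return op.output_name                   # the caller gets the output handle

    def get_operation_by_name(self, name):
        return self._ops[name]


def assign_weights(graph):
    v = graph.add_op("Variable", name="zheng/v")
    b = graph.add_op("Placeholder", name="zheng/b")
    z = graph.add_op("Assign", inputs=[v, b], name="zheng/assign")
    return z  # only z is returned


graph = ToyGraph()
z = assign_weights(graph)

# b was never returned, yet the graph still holds it under its name.
b_op = graph.get_operation_by_name("zheng/b")
print(z)                                                  # zheng/assign:0
print(b_op.type)                                          # Placeholder
print(graph.get_operation_by_name("zheng/assign").inputs)
```

The Python variables inside assign_weights disappear when the function returns, but the nodes they pointed to remain registered in the graph, which is why a lookup by name works.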

How to use tf.while_loop() in tensorflow

This is a generic question. In TensorFlow, after we build the graph and feed data into it, the output of the graph is a tensor. But in many cases we need to do further computation that depends on this output tensor, which does not seem to be allowed in TensorFlow.
For example, I'm trying to implement an RNN whose number of loop iterations depends on a property of the data itself. That is, I need to use a tensor to decide whether to stop (I am not using dynamic_rnn because my RNN design is highly customized). tf.while_loop(cond, body, ...) might be a candidate for my implementation, but the official tutorial is too simple and I don't know how to add more functionality to the 'body'. Can anyone give me a few more complex examples?
Also, the case where later computation depends on a tensor output (e.g., the RNN stops based on an output criterion) is very common. Is there an elegant or better way to handle it than a dynamic graph?
What is stopping you from adding more functionality to the body? You can build whatever complex computational graph you like in the body and take whatever inputs you like from the enclosing graph. Also, outside of the loop, you can then do whatever you want with whatever outputs you return. As you can see from the amount of 'whatevers', TensorFlow's control flow primitives were built with much generality in mind. Below is another 'simple' example, in case it helps.
import tensorflow as tf
import numpy as np

def body(x):
    a = tf.random_uniform(shape=[2, 2], dtype=tf.int32, maxval=100)
    b = tf.constant(np.array([[1, 2], [3, 4]]), dtype=tf.int32)
    c = a + b
    return tf.nn.relu(x + c)

def condition(x):
    return tf.reduce_sum(x) < 100

x = tf.Variable(tf.constant(0, shape=[2, 2]))
with tf.Session():
    tf.global_variables_initializer().run()
    result = tf.while_loop(condition, body, [x])
    print(result.eval())
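If it helps to see the contract of cond and body without any graph machinery, here is a plain-Python sketch of the loop semantics. It is an analogy, not TensorFlow code: this while_loop is a hypothetical helper, and the step, state and history variables are invented for illustration. It shows a stopping rule that depends on a computed value, with several loop variables carried along.

```python
def while_loop(cond, body, loop_vars):
    """Plain-Python analogue: keep applying body while cond holds."""
    loop_vars = tuple(loop_vars)
    while cond(*loop_vars):
        result = body(*loop_vars)
        # body must return the same number of values that it received
        loop_vars = result if isinstance(result, tuple) else (result,)
    return loop_vars


def cond(step, state, history):
    # Stop as soon as the computed state crosses a threshold,
    # with a step limit as a safety net.
    return state < 100 and step < 50


def body(step, state, history):
    new_state = 2 * state + 1          # stand-in for one customized RNN step
    return step + 1, new_state, history + [new_state]


steps, final_state, history = while_loop(cond, body, (0, 1, []))
print(steps)         # 6
print(final_state)   # 127
print(history)       # [3, 7, 15, 31, 63, 127]
```

The number of iterations is decided by the data rather than fixed in advance. In graph code the same shape applies, with the caveat that a growing Python list cannot serve as a loop variable there, so per-step outputs would need a tensor-based container instead.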
