When is it safe to cache tf.Tensors? - python

Let's say we have some method foo that we call during graph construction time and that returns some tf.Tensors, or a nested structure of them, every time it is called, plus multiple other methods that make use of foo's result. For efficiency, and to avoid spamming the TF graph with unnecessary repeated operations, it might be tempting to make foo cache its result the first time it is called (to reuse the subgraph it produces). However, that will fail if foo is ever used in the context of a control flow construct, like tf.cond, tf.map_fn or tf.while_loop.
My questions are:
When is it safe to cache tf.Tensor objects in a way that does not cause problems with control flows? Is there perhaps some way to retrieve the control flow context under which a tf.Tensor was created (if any), store it, and compare it later to see whether a cached result can be reused?
How would the answer to the question above apply to tf.Operations?
(Question text updated to make it clearer that foo creates a new set of tensors every time it is called.)

TL;DR: TF already caches what it needs to, don't bother with it yourself.
Every time you call sess.run([some_tensors]), TF's engine finds the minimum subgraph needed to compute all tensors in [some_tensors] and runs it from top to bottom (possibly on new data, if you're not feeding it the same data).
That means that caching results between sess.run calls is useless for saving computation, because they will be recomputed anyway.
If, instead, you're concerned about multiple tensors using the same data as input within one sess.run call, don't worry, TF is smart enough. If you have an input A and B = 2*A, C = A + 1, then as long as you make a single call sess.run([B, C]), A will be evaluated only once (and then implicitly cached by the TF engine).
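As a minimal sketch of that point (TF 1.x style API; the placeholder name and fed values are made up for illustration):

import tensorflow as tf  # TF 1.x style API

A = tf.placeholder(tf.float32, shape=[None], name="A")
B = 2 * A      # uses A
C = A + 1      # also uses A

with tf.Session() as sess:
    # One run call: A is evaluated once, and both B and C reuse that value.
    b_val, c_val = sess.run([B, C], feed_dict={A: [1.0, 2.0]})
    print(b_val, c_val)  # [2. 4.] [2. 3.]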

Related

What is tracing with regard to tf.function

The word "tracing" is mentioned frequently in TensorFlow's guide like Better performance with tf.function
What is "tracing" exactly, does it refer to generating the graph as a result of
calling the tf.function for the first time (and subsequently
depending on the arguments)?
What happens when only part of the computation is annotated with
#tf.function, will it mix eager execution with graph execution?
Yes, "tracing" means to run a Python function and "record" its TensorFlow operations in a graph. Note the traced code may not exactly correspond to the written Python code, if Autograph has performed some transformation. Tracing is ideally only done once, the first time the function is called, so subsequent calls can directly use the traced graph and save the Python code execution. As you say, though, future calls may require retracing the function depending on the given arguments, as explained in the link you posted.
You can call a @tf.function from a function that works in eager mode, in which case, yes, it will sort of "mix" both modes. But if you call an unannotated function from a @tf.function, then its code will also be traced - that is, you cannot temporarily go back to eager/Python mode from within a @tf.function. That is the reason why, at some point, there was the suggestion that you only needed to annotate higher-level functions, because the lower-level ones would be "graphed" too anyway - although it's not so clear-cut when one should or should not annotate a function; see Should I use @tf.function for all functions? and this GitHub discussion.
EDIT: When I say "you cannot temporarily go back to eager/Python mode from within a @tf.function", I mean a @tf.function cannot go out of "traced" mode. Of course, using tf.numpy_function or tf.py_function you can have a traced function that uses eager/Python mode, which will be encapsulated in an operation as part of the traced graph.
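As a small illustration (a minimal sketch; the function and values are made up), a Python-side print only runs while the function is being traced, so it shows when tracing and retracing happen:

import tensorflow as tf

@tf.function
def double(x):
    print("Tracing with", x)  # Python side effect: executes only during tracing
    return x * 2

double(tf.constant(1))    # first call: traces for int32 tensors
double(tf.constant(2))    # same input signature: reuses the traced graph, no print
double(tf.constant(1.5))  # new dtype (float32): triggers a retrace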

Opportunistic caching with reusable custom graphs in Dask

Dask supports defining custom computational graphs as well as opportunistic caching. The question is how they can be used together.
For instance, let's define a very simple computational graph that computes the x+1 operation,
import dask
def compute(x):
    graph = {'step1': (sum, [x, 1])}
    return dask.get(graph, 'step1')
print('Cache disabled:', compute(1), compute(2))
this yields 2 and 3 as expected.
Now we enable opportunistic caching,
from dask.cache import Cache
cc = Cache(1e9)
cc.register()
print('Cache enabled: ', compute(1), compute(2))
print(cc.cache.data)
we incorrectly get a result of 2 in both cases, because cc.cache.data is {'step1': 2} irrespective of the input.
I imagine this means that the input needs to be hashed (e.g. with dask.base.tokenize) and appended to all the keys in the graph. Is there a simpler way of doing it, particularly since the tokenize function is not part of the public API?
The issue is that in complex graphs, a random step name would need to account for the hash of all the inputs provided to its child steps, which means that full graph resolution would be necessary.
It's important that key names in dask graphs are unique (as you found above). Additionally, we'd like identical computations to have the same key so we can avoid computing them multiple times - this isn't necessary for dask to work though, it just provides some opportunities for optimization.
In dask's internals we make use of dask.base.tokenize to compute a "hash" of the inputs, resulting in deterministic key names. You are free to make use of this function as well. In the issue you linked above we say the function is public, just that the implementation might change (not the signature).
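For example, here is a sketch of the question's compute function with tokenize-derived keys (the 'step1-' prefix is just one possible naming convention):

import dask
from dask.base import tokenize

def compute(x):
    # The key now depends on the input, so different inputs map to different cache entries.
    key = 'step1-' + tokenize(x)
    graph = {key: (sum, [x, 1])}
    return dask.get(graph, key)

print(compute(1), compute(2))  # 2 3, even with the opportunistic cache registered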
Also note that for many use cases, we recommend using dask.delayed now instead of custom graphs for generating custom computations. This will do the deterministic hashing for you behind the scenes.
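For instance, a minimal sketch of the same computation with dask.delayed (here pure=True asks delayed to derive the key deterministically from the function and its inputs, so equal inputs share a cache entry):

import dask
from dask import delayed

def compute(x):
    # delayed builds the graph for us; with pure=True the key is a hash of sum and [x, 1].
    return delayed(sum, pure=True)([x, 1]).compute()

print(compute(1), compute(2))  # 2 3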

How does Tensorflow manage graphs?

I have realized that there is some funky stuff going on with the way Tensorflow seems to be managing graphs.
Since building (and rebuilding) models is so tedious, I decided to wrap my custom model in a class so I could easily re-instantiate it elsewhere.
When I was training and testing the code (in the original place) it would work fine, however in the code where I loaded the graph's variables I would get all sorts of weird errors - variable redefinitions and everything else. This (from my last question about a similar thing) was the hint that everything was being called twice.
After doing a TON of tracing, it came down to the way I was using the loaded code. It was being used from within a class that had a structure like so
class MyModelUser(object):
    def forecast(self):
        # .. build the model in the same way as in the training code
        # load the model checkpoint
        # call the "predict" function on the model
        # manipulate the prediction and return it
And then, in some code that uses MyModelUser, I had
def test_the_model(self):
    model_user = MyModelUser()
    print(model_user.forecast())  # 1
    print(model_user.forecast())  # 2
and I (obviously) expected to see two forecasts when this was called. Instead, the first call worked as expected, but the second call threw a TON of variable-reuse ValueErrors; an example of one of these was:
ValueError: Variable weight_def/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?
I managed to quell the errors by adding a series of try/except blocks that used get_variable to create the variable and, on exception, called reuse_variables on the scope and then get_variable with nothing but the name. This brought on a new set of nasty errors, one of which was:
tensorflow.python.framework.errors.NotFoundError: Tensor name "weight_def/weights/Adam_1" not found in checkpoint files
On a whim I said "what if I move the model-building code to __init__ so it's only built once?"
My new model user:
class MyModelUser(object):
    def __init__(self):
        # ... build the model in the same way as in the training code
        # load the model checkpoint

    def forecast(self):
        # call the "predict" function on the model
        # manipulate the prediction and return it
and now:
def test_the_model(self):
    model_user = MyModelUser()
    print(model_user.forecast())  # 1
    print(model_user.forecast())  # 2
Works as expected, printing two forecasts with no errors. This leads me to believe I can also get rid of the variable reuse stuff.
My question is this:
Why did this fix it? In theory, the graph should be re-instantiated every single time in the original forecast method, so it shouldn't be creating more than one graph. Does Tensorflow persist the graph even after the function completes? Is this why moving the creation code to __init__ worked? This has left me hopelessly confused.
By default, TensorFlow uses a single global tf.Graph instance that is created when you first call a TensorFlow API. If you do not create a tf.Graph explicitly, all operations, tensors, and variables will be created in that default instance. This means that each call in your code to model_user.forecast() will be adding operations to the same global graph, which is somewhat wasteful.
There are (at least) two possible courses of action here:
The ideal action would be to restructure your code so that MyModelUser.__init__() constructs an entire tf.Graph with all of the operations needed to perform forecasting, and MyModelUser.forecast() simply performs sess.run() calls on the existing graph (a rough sketch of this structure follows the code below). Ideally, you would also create only a single tf.Session, because TensorFlow caches information about the graph in the session, so execution will be more efficient.
The less invasive—but probably less efficient—change would be to create a new tf.Graph for every call to MyModelUser.forecast(). It's unclear from the question how much state is created in the MyModelUser.__init__() method, but you could do something like the following to put the two calls in different graphs:
def test_the_model(self):
    with tf.Graph().as_default():  # Create a local graph
        model_user_1 = MyModelUser()
        print(model_user_1.forecast())
    with tf.Graph().as_default():  # Create another local graph
        model_user_2 = MyModelUser()
        print(model_user_2.forecast())
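Going back to the first option, here is a rough sketch of that structure; build_model and the checkpoint path "model.ckpt" are hypothetical stand-ins for the question's model-building and loading code:

import tensorflow as tf

class MyModelUser(object):
    def __init__(self):
        self.graph = tf.Graph()
        with self.graph.as_default():
            # Build the model once, in the same way as in the training code.
            self.prediction = build_model()          # hypothetical model-building helper
            self.saver = tf.train.Saver()
        self.sess = tf.Session(graph=self.graph)
        self.saver.restore(self.sess, "model.ckpt")  # hypothetical checkpoint path

    def forecast(self, feed_dict=None):
        # No new operations are created here; we only run the existing graph.
        return self.sess.run(self.prediction, feed_dict=feed_dict)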
TF has a default graph that new operations etc. get added to. When you call your function twice, you will add the same things twice to the same graph. So either build the graph once and evaluate it multiple times (as you have done, which is also the "normal" approach), or, if you want to change things, use tf.reset_default_graph (https://www.tensorflow.org/versions/r0.11/api_docs/python/framework.html#reset_default_graph) to reset the default graph and start from a fresh state.
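A minimal, self-contained sketch of that reset pattern (TF 1.x style; the single variable here is just a stand-in for the question's model):

import tensorflow as tf

def build_and_forecast():
    tf.reset_default_graph()                      # start from an empty default graph
    w = tf.get_variable("weights", initializer=0.0)
    bump = tf.assign_add(w, 1.0)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        return sess.run(bump)

print(build_and_forecast())  # 1.0
print(build_and_forecast())  # 1.0 again: no "already exists" error, each call builds a fresh graph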

Get iterable Tensor without running eval

Is there a way to make a Tensor iterable without running eval() to get its numpy array?
I am trying to iterate through two parts of a tensor after using split() on it, but it happens within the construction of the hidden layers of my neural network, so it needs to happen before I am able to start a session.
import tensorflow as tf
x = tf.placeholder('float', [None, nbits])
layer = [x]
for i in range(1, numbits):
    layer.append(tf.add(tf.matmul(weights[i-1], layer[i-1]), biases[i-1]))
    aes, bes = tf.split(1, 2, layer[-1])
    if i % 2 == 1:
        for am, a, b in zip(add_layer, aes, bes):
            layer.append(am.ex(a, b))
The problem is that layer[-1] is a tf.placeholder at this point, so aes and bes are both tensors, and I can't iterate through them with zip().
Any ideas would be appreciated.
No, there isn't; not directly.
It's easiest to think about Tensorflow programs as being split into two phases: a building phase, in Python, that builds a computation graph, and an execution phase that runs the computation graph. Nothing actually runs during the building phase; all computation happens during the execution phase. The building phase can't depend on the results of the execution phase, except by running the graph (session.run(), .eval(), etc.)
You can't iterate over a Tensor while building the graph, because it doesn't actually get evaluated to a specific set of values until you call session.run(). Instead it's just a reference to a node in the computation graph.
In general, you have to use Tensorflow functions to manipulate Tensors, not Python primitives (like zip). One way I like to think of it is that it's almost like a Tensor is a radioactive object in a sealed box, and you can only handle it indirectly using a robot that can perform a certain set of actions (Tensorflow library functions) :-) So you likely need to find a way to express your task using Tensorflow primitives.
If you gave a complete example of what you're trying to do, it might be possible to say more (it's not clear to me from your code fragment). One possibility might be to use tf.split to split the tensors up into Python lists of subtensors, and then use something like zip on the lists.
I hope that helps!
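For example, a minimal sketch of that last idea using the current tf.split signature (the shapes here are made up): tf.split returns a plain Python list of sub-tensors, and that list can be zipped during graph construction:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 4])

# Each call returns a Python list of two tensors of shape [None, 2].
a_parts = tf.split(x, num_or_size_splits=2, axis=1)
b_parts = tf.split(y, num_or_size_splits=2, axis=1)

# zip iterates over the Python lists (not over the tensors), so this works at graph-build time.
combined = [a + b for a, b in zip(a_parts, b_parts)]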

Return a value coming directly from a function call vs. an intermediate variable

I have a function f(x), which does something and returns values (a tuple).
I have another function that calls this function after processing its parameters (the whole function operation is irrelevant to the question); and now I would like to know whether there is any harm in returning the result of the function call directly, versus running the function, dumping the output into a variable, and returning the variable.
A variable has a cost, and assigning a value to a variable has a cost; but besides that, is there any sorcery happening behind the scenes that would make one better than the other?
def myfunction(self):
    [do something]
    return f(x)
is the same as
def myfunction(self):
    [do something]
    b = f(x)
    return b
or is one preferable to the other (and why)? I am talking purely from an OOP perspective, without considering that creating variables and assigning to them has a cost in terms of memory and CPU cycles.
That doesn't return the function. Returning the function would look like return f. You're returning the result of the function call. Generally speaking, the only reason to save that result before returning it is if you plan to do some other kind of processing on it before the return, in which case it's faster to just refer to a saved value rather than recomputing it. Another reason to save it would be for clarity, turning what might be a long one-liner with extensive chaining into several steps.
There's a possibility that those two functions might produce different results if you have some kind of asynchronous process that modifies your data in the background between saving the reference and returning it, but that's something you'll have to keep in mind based on your program's situation.
In a nutshell, save it if you want to refer to it, or just return it directly otherwise.
Those are practically identical; use whichever one you think is more readable. If the performance of one versus the other actually matters to you, perhaps Python is not the best choice ;).
The cost difference between these is utterly negligible: in the worst case, one extra dictionary store, one extra dictionary lookup, and one extra string in memory. In practice it won't even be that bad, since CPython stores local variables in a C array, so it's more like two C-level pointer indirections.
As a matter of style, I would usually avoid the unnecessary variable, but it's possible that it might be better in particular cases. As a guideline, think about things like whether the amalgamated version leads to an excessively long line of code, whether the extra variable has a better name than e.g. result, and how clear it is that the function call is the result you need (and if it isn't, whether/how much a variable helps).
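To make the bytecode difference concrete, here is a small sketch using the dis module (f is just a stand-in for any function returning a tuple); the only extra work in the second variant is a store/load pair for the local b:

import dis

def f(x):
    return x, x + 1

def direct():
    return f(1)

def via_variable():
    b = f(1)
    return b

dis.dis(direct)        # call f, then return the result directly
dis.dis(via_variable)  # same, plus a STORE_FAST and a LOAD_FAST for b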
