My use-case of TensorFlow requires me to build a new computation graph for each instance that needs to be processed. This ends up blowing up the memory requirements.
Apart from a few tf.Variables that are model parameters, I'd like to delete all other nodes. Other people with similar problems have found tf.reset_default_graph() to be useful, but this would get rid of the model parameters that I need to persist.
What can I use to delete all but these nodes?
Edit:
The instance-specific computation actually just means I am adding a lot of new operations. I believe these operations are the reason behind the memory issues.
UPDATE:
See the recently released TensorFlow Fold (https://github.com/tensorflow/fold), which allows dynamic construction of computation graphs.
The tf.Graph data structure is designed to be append-only. It is therefore not possible to remove or modify existing nodes. Usually this is not a problem, as only the necessary subgraph is processed when running a session.
What you can try is to copy the Variables of your graph into a new graph and delete the old one. To achieve this, run:
old_graph = tf.get_default_graph()  # Keep the old graph for later iteration
new_graph = tf.Graph()  # Create an empty graph
with new_graph.as_default():  # Make the new graph the default inside this block
    pass  # Re-create the variables here
If you want to iterate over all nodes in the old graph use:
for node in old_graph.get_operations():
    if node.type in ('Variable', 'VariableV2'):  # the op type depends on the TF version
        pass  # read the value of the variable and copy it into the new graph
Alternatively you can use:
for node in old_graph.get_collection('trainable_variables'):
    # iterates over all trainable Variables
    pass  # read the value and create a new variable
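The copy step described above could look roughly like the sketch below. It assumes the TF 1.x graph API (imported here through tensorflow.compat.v1), and the variable name w and the throwaway reduce_sum op are made up for illustration. The idea is to read the variable values out of the old graph with a session, then re-create only the variables in a fresh graph.

```python
import tensorflow.compat.v1 as tf

# Old graph: one model parameter plus an instance-specific op we want to drop.
old_graph = tf.Graph()
with old_graph.as_default():
    w = tf.Variable([1.0, 2.0, 3.0], name="w")  # hypothetical model parameter
    _ = tf.reduce_sum(w * 2.0)                  # hypothetical per-instance op
    init = tf.global_variables_initializer()

with tf.Session(graph=old_graph) as sess:
    sess.run(init)
    # Read the current value of every trainable variable, keyed by op name.
    variables = old_graph.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
    values = {v.op.name: sess.run(v) for v in variables}

# New graph: only the variables are re-created, initialized from the saved values.
new_graph = tf.Graph()
with new_graph.as_default():
    new_vars = {name: tf.Variable(value, name=name) for name, value in values.items()}
    new_init = tf.global_variables_initializer()

with tf.Session(graph=new_graph) as sess:
    sess.run(new_init)
    copied = sess.run(new_vars["w"])
```

Once nothing references old_graph any more, Python can reclaim it like any other object; the new graph contains the parameter but none of the per-instance operations.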
Also have a look at python/framework/ops.py : 1759 for more ways of manipulating nodes in a graph.
However, before you mess around with tf.Graph, I would strongly recommend considering whether this is really required. Usually one can generalize the computation and use shared variables to build a single graph, so that each instance you want to process is a subgraph of this graph.
As I understand it, tf.reset_default_graph() only creates a new graph and makes it the default graph. So the previously created tensors would just be lying around occupying memory. I have also read that unreferenced tensors are not garbage collected (the way normal variables in Python are).
If I am running a cross-validation to search for a set of hyperparameters and thus creating the same graph, again and again, how do I get rid of the previously created tensors?
I had the same problem when designing experiments. After researching it, the only solution that worked for me is this one. As you can read in that link, it seems to be a design flaw, and the TF team doesn't seem to care about fixing it.
The solution is to create a new process for each cross-validation iteration. When the process finishes, the system kills it and releases its resources automatically.
import multiprocessing

def evaluate(...):
    import tensorflow as tf
    # Your logic

for ... in cross_validation_loop:
    process_eval = multiprocessing.Process(target=evaluate, args=(...))
    process_eval.start()
    process_eval.join()
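If you also need the result of each iteration back in the parent process, a queue is one way to get it. The following is only a sketch: the fold numbers, the run_cross_validation helper, and the arithmetic standing in for the real model are all hypothetical, and it assumes the "fork" start method is available (Linux/macOS).

```python
import multiprocessing


def evaluate(fold, results):
    # In real use, import tensorflow *inside* this function so that all graph
    # and session state lives and dies with the child process.
    # The arithmetic below is only a stand-in for building and scoring a model.
    score = fold * fold
    results.put((fold, score))


def run_cross_validation(folds):
    # Assumption: "fork" is available; on other platforms use "spawn" and
    # protect the entry point with `if __name__ == "__main__":`.
    ctx = multiprocessing.get_context("fork")
    results = ctx.Queue()
    scores = {}
    for fold in folds:
        process_eval = ctx.Process(target=evaluate, args=(fold, results))
        process_eval.start()
        # Read the result before join() so the child never blocks on a full pipe.
        fold_id, score = results.get(timeout=60)
        process_eval.join()
        scores[fold_id] = score
    return scores
```

Calling run_cross_validation([1, 2, 3]) runs one short-lived process per fold and collects the scores in a dict keyed by fold.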
Let's say we have some method foo that we call at graph construction time, which returns some tf.Tensors (or a nested structure of them) every time it is called, and multiple other methods that make use of foo's result. For efficiency, and to avoid spamming the TF graph with unnecessary repeated operations, it might be tempting to make foo cache its result the first time it is called, so that the subgraph it produces is reused. However, that will fail if foo is ever used inside a control flow construct such as tf.cond, tf.map_fn or tf.while_loop.
My questions are:
When is it safe to cache tf.Tensor objects in a way that does not cause problems with control flow? Is there perhaps some way to retrieve the control flow context under which a tf.Tensor was created (if any), store it, and compare it later to see whether a cached result can be reused?
How would the answer to the question above apply to tf.Operations?
(Question text updated to make it clearer that foo creates a new set of tensors every time it is called)
TL;DR: TF already caches what it needs to, don't bother with it yourself.
Every time you call sess.run([some_tensors]), TF's engine finds the minimum subgraph needed to compute all tensors in [some_tensors] and runs it from top to bottom (possibly on new data, if you're not feeding it the same data).
That means caching results between sess.run calls does nothing to save computation, because they will be recomputed anyway.
If, instead, you're concerned about multiple tensors using the same data as input in one call of sess.run, don't worry, TF is smart enough. If you have an input A and B = 2*A, C = A + 1, then as long as you make a single call sess.run([B, C]), A will be evaluated only once (and then implicitly cached by the TF engine).
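One way to see this for yourself is to make A random, so that every evaluation of A yields a different value. This is a small sketch assuming the TF 1.x graph API (via tensorflow.compat.v1); the names a, b and c mirror A, B and C above.

```python
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    a = tf.random.uniform([])  # a fresh random scalar on every evaluation
    b = 2.0 * a
    c = a + 1.0

with tf.Session(graph=graph) as sess:
    # One call: `a` is evaluated once and shared by `b` and `c`.
    b_val, c_val = sess.run([b, c])
    # Two separate calls: `a` is re-evaluated for each of them,
    # so these two values come from different draws of `a`.
    b_alone = sess.run(b)
    c_alone = sess.run(c)

# Within the single call, both results are consistent with the same `a`.
same_a = abs(b_val / 2.0 - (c_val - 1.0)) < 1e-6
```

Here same_a holds for the joint call, whereas b_alone and c_alone are generally not consistent with each other, since nothing is carried over between sess.run calls.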
I am trying to simulate my decentralized algorithm in TensorFlow, so I want to create copies of my Model object, which includes variables, placeholders and constants, in each of my Worker objects. For example, a Model contains
self.w = tf.Variable(tf.zeros([10, 784]))
self.X = tf.placeholder(shape=(BATCH_SIZE, 784), dtype=tf.float32)
Now I want to create copies of these things for all Workers so that I can initialize, train and test them separately. Practically, I could use explicit for loops to create them for each worker, but I am imagining some Distributor object that copies its own dummy model to all workers instead of going deep and manipulating the Model objects myself.
I have tried
tf.identity, but it converts Variables to Tensors.
copy.deepcopy simply gives errors.
record everything the variable has and use tf.Variable to re-create them. It's cumbersome and not comprehensive.
Any ideas will be appreciated! Thank you!
Create a python function which builds your model and call that function multiple times. Be careful about the variable reuse story.
There is in general no way to replicate all state of a graph inside a graph many times safely.
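A rough sketch of that suggestion, assuming the TF 1.x graph API (via tensorflow.compat.v1). The build_model function, the worker_N scope names, the logits op and the value of BATCH_SIZE are illustrative and not taken from the question; each call builds an independent copy under its own variable scope, so no variables are shared by accident.

```python
import tensorflow.compat.v1 as tf

BATCH_SIZE = 32  # assumed value; the question does not give one


def build_model(scope_name):
    # A distinct variable scope per worker keeps the variable names apart.
    with tf.variable_scope(scope_name):
        w = tf.get_variable("w", shape=[10, 784],
                            initializer=tf.zeros_initializer())
        x = tf.placeholder(shape=(BATCH_SIZE, 784), dtype=tf.float32, name="X")
        logits = tf.matmul(x, w, transpose_b=True)  # shape (BATCH_SIZE, 10)
    return {"w": w, "X": x, "logits": logits}


graph = tf.Graph()
with graph.as_default():
    # One independent model per worker, all living in the same graph.
    workers = [build_model("worker_%d" % i) for i in range(3)]

worker_variable_names = [m["w"].name for m in workers]
```

Each entry of workers can then be initialized, trained and tested separately. If the workers should share weights instead, call the function under the same scope with reuse enabled.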
I have a model where I need to assign to the weights (trainable variables) new external values every N iterations.
I can think of a few solutions:
Save and restore
Not good, as I would need serialization, file system calls, etc. (even if I use something like tmpfs).
Using placeholders and assign operations
I would create a placeholder and an assign op for each trainable variable. Every time I want to assign something to the weights, I run the assign ops.
However, I understand that this means I will be forced to consider these placeholders in every feed_dict and pass dummy values every time I run any operation in my graph.
In addition, I would be using much more memory than necessary.
Use a feed_dict for trainable variable and trigger ops that assign each variable to itself?
Does this work? Is there any drawback?
Before coding something, I thought it was a good idea to ask.
What is the recommended way to assign new external values to variables efficiently (memory/timewise)?
Your third option sounds like the best one.
You can feed values to tensors that aren’t placeholders.
TensorFlow's feed mechanism lets you inject data into any Tensor in a
computation graph. A python computation can thus feed data directly
into the graph.
Any tensors that are feedable can be fed. To check if a tensor is feedable or not, use: tf.Graph.is_feedable(tensor).
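As a small illustration of feeding a tensor that is not a placeholder, here is a sketch assuming the TF 1.x graph API (via tensorflow.compat.v1). The tensors a, b and c are made up for the example; it overrides an intermediate tensor rather than a variable.

```python
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    a = tf.constant(3.0)
    b = a * 2.0  # an ordinary intermediate tensor, not a placeholder
    c = b + 1.0
    feedable = graph.is_feedable(b)

with tf.Session(graph=graph) as sess:
    normal = sess.run(c)                           # b is computed from a
    overridden = sess.run(c, feed_dict={b: 10.0})  # b is replaced by the fed value
```

With these numbers normal is 7.0 and overridden is 11.0: the fed value takes the place of b for that one call only.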
In recent versions of TensorFlow, the Variable class has a load method. It does exactly what you want.
https://www.tensorflow.org/api_docs/python/tf/Variable#load
You can use the assign operations with placeholders.
I will be forced to consider these placeholders in every feed_dict and pass dummy values everytime I run any operation in my graph
In addition I would be using much more memory than necessary..
No. You would only need to feed values to the placeholders when you run the assign operations. Don't make the assign operation part of your training graph and only run them when you want to assign new values.
If the assigning turns out to be a bottleneck (for small N it might slow down your program) you can consider other methods of getting data into TensorFlow.
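To make the placeholder-plus-assign approach concrete, here is a minimal sketch assuming the TF 1.x graph API (via tensorflow.compat.v1). The names w, x, y, w_new and assign_w are illustrative. Note that the ordinary run of y never mentions w_new in its feed_dict.

```python
import numpy as np
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    w = tf.Variable(tf.zeros([2, 3]), name="w")
    x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
    y = tf.matmul(x, w, transpose_b=True)

    # The assign op is kept separate from the rest of the graph: nothing
    # that computes `y` depends on `w_new`.
    w_new = tf.placeholder(tf.float32, shape=[2, 3], name="w_new")
    assign_w = w.assign(w_new)
    init = tf.global_variables_initializer()

inputs = np.ones([1, 3], dtype=np.float32)
with tf.Session(graph=graph) as sess:
    sess.run(init)
    # Ordinary step: only `x` is fed, no dummy value for `w_new` is needed.
    y_before = sess.run(y, feed_dict={x: inputs})
    # Every N iterations: push external values into the variable.
    sess.run(assign_w, feed_dict={w_new: np.ones([2, 3], dtype=np.float32)})
    y_after = sess.run(y, feed_dict={x: inputs})
```

Because sess.run only evaluates the subgraph needed for the requested fetches, the placeholder w_new has to be fed only when assign_w itself is run.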
I have realized that there is some funky stuff going on with the way Tensorflow seems to be managing graphs.
Since building (and rebuilding) models is so tedious, I decided to wrap my custom model in a class so I could easily re-instantiate it elsewhere.
When I was training and testing the code (in the original place) it would work fine, however in the code where I loaded the graph's variables I would get all sorts of weird errors - variable redefinitions and everything else. This (from my last question about a similar thing) was the hint that everything was being called twice.
After doing a TON of tracing, it came down to the way I was using the loaded code. It was being used from within a class that had a structure like so
class MyModelUser(object):
    def forecast(self):
        # .. build the model in the same way as in the training code
        # load the model checkpoint
        # call the "predict" function on the model
        # manipulate the prediction and return it
And then in some code that uses MyModelUser I had
def test_the_model(self):
    model_user = MyModelUser()
    print(model_user.forecast()) # 1
    print(model_user.forecast()) # 2
and I (obviously) expected to see two forecasts when this was called. Instead, the first forecast call worked as expected, but the second call threw a TON of variable reuse ValueErrors. An example of one of these was:
ValueError: Variable weight_def/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?
I managed to quell the errors by adding a series of try/except blocks that used get_variable to create the variable, and then on exception, called reuse_variables on the scope and then get_variable without anything but the name. This brought on a new set of nasty errors, one of which was:
tensorflow.python.framework.errors.NotFoundError: Tensor name "weight_def/weights/Adam_1" not found in checkpoint files
On a whim I said "what if I move the model building code to __init__ so it's only built once?"
My new model user:
class MyModelUser(object):
    def __init__(self):
        # ... build the model in the same way as in the training code
        # load the model checkpoint
    def forecast(self):
        # call the "predict" function on the model
        # manipulate the prediction and return it
and now:
def test_the_model(self):
    model_user = MyModelUser()
    print(model_user.forecast()) # 1
    print(model_user.forecast()) # 2
Works as expected, printing two forecasts with no errors. This leads me to believe I can also get rid of the variable reuse stuff.
My question is this:
Why did this fix it? In theory, the graph should be reinstanced every single time in the original predict method, so it shouldn't be creating more than one graph. Does Tensorflow persist the graph even after the function completes? Is this why moving the creation code to __init__ worked? This has left me hopelessly confused.
By default, TensorFlow uses a single global tf.Graph instance that is created when you first call a TensorFlow API. If you do not create a tf.Graph explicitly, all operations, tensors, and variables will be created in that default instance. This means that each call in your code to model_user.forecast() will be adding operations to the same global graph, which is somewhat wasteful.
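The effect is easy to observe by counting operations. This is a sketch assuming the TF 1.x graph API (via tensorflow.compat.v1); build_model is a hypothetical stand-in for the model-building code, and an explicit graph is used so the example does not depend on global state.

```python
import tensorflow.compat.v1 as tf


def build_model():
    # Hypothetical stand-in for "build the model as in the training code".
    return tf.Variable(tf.ones([2, 1]), name="weights")


graph = tf.Graph()
with graph.as_default():
    first = build_model()
    ops_after_first = len(graph.get_operations())
    second = build_model()  # adds a second copy of everything to the same graph
    ops_after_second = len(graph.get_operations())

names = (first.name, second.name)
```

The second call does not replace anything: the op count grows, and because tf.Variable makes names unique, the second variable ends up with a suffixed name (weights_1:0) that no longer matches the name used at training time.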
There are (at least) two possible courses of action here:
The ideal action would be to restructure your code so that MyModelUser.__init__() constructs an entire tf.Graph with all of the operations needed to perform forecasting, and MyModelUser.forecast() simply performs sess.run() calls on the existing graph. Ideally, you would only create a single tf.Session as well, because TensorFlow caches information about the graph in the session, and the execution would be more efficient.
The less invasive—but probably less efficient—change would be to create a new tf.Graph for every call to MyModelUser.forecast(). It's unclear from the question how much state is created in the MyModelUser.__init__() method, but you could do something like the following to put the two calls in different graphs:
def test_the_model(self):
    with tf.Graph().as_default():  # Create a local graph
        model_user_1 = MyModelUser()
        print(model_user_1.forecast())
    with tf.Graph().as_default():  # Create another local graph
        model_user_2 = MyModelUser()
        print(model_user_2.forecast())
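For the first (restructuring) option, a sketch might look like this. It assumes the TF 1.x graph API (via tensorflow.compat.v1); the tiny one-layer model, the inputs argument to forecast, and the initializer standing in for a checkpoint restore are all placeholders for the real code.

```python
import tensorflow.compat.v1 as tf


class MyModelUser(object):
    def __init__(self):
        # Build the whole graph exactly once, in a graph owned by this object.
        self.graph = tf.Graph()
        with self.graph.as_default():
            self.x = tf.placeholder(tf.float32, shape=[None, 2], name="x")
            with tf.variable_scope("weight_def"):
                weights = tf.get_variable("weights", shape=[2, 1],
                                          initializer=tf.ones_initializer())
            self.prediction = tf.matmul(self.x, weights)
            init = tf.global_variables_initializer()
        # One session for the lifetime of the object.
        self.sess = tf.Session(graph=self.graph)
        self.sess.run(init)  # stand-in for restoring the checkpoint

    def forecast(self, inputs):
        # No graph construction here, only a run on the existing graph.
        return self.sess.run(self.prediction, feed_dict={self.x: inputs})


model_user = MyModelUser()
ops_before = len(model_user.graph.get_operations())
first = model_user.forecast([[1.0, 2.0]])
second = model_user.forecast([[1.0, 2.0]])
ops_after = len(model_user.graph.get_operations())
```

Since forecast adds nothing to the graph, ops_before and ops_after are equal, and repeated calls cannot run into variable reuse errors.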
TF has a default graph that new operations etc. get added to. When you call your function twice, you add the same things twice to the same graph. So either build the graph once and evaluate it multiple times (as you have done, which is also the "normal" approach), or, if you want to change things, use reset_default_graph (https://www.tensorflow.org/versions/r0.11/api_docs/python/framework.html#reset_default_graph) to reset the graph and start from a fresh state.