Suppose I have two placeholder quantities in tensorflow: placeholder_1 and placeholder_2. Essentially I would like the following computational functionality: "if placeholder_1 is defined (i.e., it is given a value in the feed_dict of sess.run()), compute X as f(placeholder_1); otherwise, compute X as g(placeholder_2)." Think of X as a hidden layer in a neural network that can optionally be computed in these two different ways. Eventually I would use X to produce an output, and I'd like to backpropagate error to the parameters of f or g depending on which placeholder I used.
One could accomplish this using the tf.where(condition, x, y) function if there were a way to make the condition "placeholder_1 has a value", but after looking through the tensorflow documentation on booleans and asserts I couldn't find anything that looked applicable.
Any ideas? I have a vague idea of how I could accomplish this basically by copying part of the network, sharing parameters and syncing the networks after updates, but I'm hoping for a cleaner way to do it.
You can create a third placeholder variable of type boolean to select which branch to use and feed that in at run time.
The logic behind this is that since you are feeding in the placeholders at runtime anyway, you can determine outside of tensorflow which placeholders will be fed.
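For instance, here is a minimal sketch of that pattern using tf.cond (the layer sizes and names below are made up for illustration):

import numpy as np
import tensorflow as tf

p1 = tf.placeholder(tf.float32, [None, 10], name='placeholder_1')
p2 = tf.placeholder(tf.float32, [None, 20], name='placeholder_2')
use_p1 = tf.placeholder(tf.bool, [], name='use_p1')  # the extra boolean selector

# f and g are hypothetical one-layer stand-ins for your two sub-networks.
# tf.cond only executes the taken branch at run time, so gradients reach
# f's or g's parameters depending on the selector you feed.
X = tf.cond(use_p1,
            lambda: tf.layers.dense(p1, 32, name='f'),
            lambda: tf.layers.dense(p2, 32, name='g'))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The unused placeholder may still need a dummy value, since the
    # placeholder op itself sits outside the cond; zeros work fine.
    x_val = sess.run(X, feed_dict={use_p1: True,
                                   p1: np.random.randn(4, 10),
                                   p2: np.zeros((4, 20))})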
Is it possible to reuse tensors in multiple tf-graphs, even after they are reset?
Problem:
I have a large dataset that I want to evaluate with many different tf-graphs.
For each evaluation, tensorflow is reset with tf.compat.v1.reset_default_graph() and initialized completely from scratch.
IMHO, it seems wasteful and slow to run the data-to-tensor conversion every time, so I thought I could just define the data tensor once and reuse it for all future evaluations.
Unfortunately, reusing tensors does not seem to be possible, as I get the error 'Tensor must be from the same graph as Tensor':
ValueError: Tensor("Const:0", shape=(1670,), dtype=float32, device=/device:GPU:0) must be from the same graph as Tensor("Const_1:0", shape=(1670,), dtype=float32).
Is it possible to reuse these tensors somehow?
Check out this answer to another question: https://stackoverflow.com/a/42616834/13514201
TensorFlow records every operation on a computation graph. This graph defines which outputs feed into which operations, linking everything together so that TensorFlow can follow the steps you have set up to produce your final output. If you try to connect a Tensor or operation from one graph to a Tensor or operation on another graph, it will fail. Everything must live on the same execution graph.
Try removing with tf.Graph().as_default():
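A minimal sketch of both the failure and the keep-one-graph workaround (the constants and scope names here are illustrative):

import tensorflow as tf

g1 = tf.Graph()
with g1.as_default():
    data = tf.constant([1.0, 2.0, 3.0])   # lives in g1

g2 = tf.Graph()
with g2.as_default():
    other = tf.constant([4.0, 5.0, 6.0])  # lives in g2
    # tf.add(data, other)  # ValueError: must be from the same graph

# Instead, build every evaluation in the same default graph, giving each
# model its own variable scope rather than resetting the graph:
data = tf.constant([1.0, 2.0, 3.0])
for i in range(3):
    with tf.variable_scope('model_%d' % i):
        w = tf.get_variable('w', shape=(), initializer=tf.ones_initializer())
        result = data * w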
I am encountering something a bit strange (to me) in tensorflow and was hoping someone could shed some light on the situation.
I have a simple neural network that processes images. The cost function I am minimizing is the simple MSE.
At first I implemented the following:
cost = tf.square(DECONV - Y)
which I then passed to my optimizer as follows:
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)
I was able to obtain great results with this implementation. However, as I tried to add a regularizer, I realized that I wasn't passing a scalar to optimizer.minimize() but in fact a tensor of shape [batch, dim_x, dim_y].
I changed my implementation to the following:
cost = tf.losses.mean_squared_error(Y, DECONV)
as well as many variations of this like:
cost = tf.reduce_mean(tf.square(tf.subtract(DECONV, Y)))
etc.
My issue is that with these new implementations of the MSE I am not able to even come close to the results I obtained using the original "wrong" implementation.
Is the original way a valid way to train? If so, how can I implement regularizers? If not, what am I doing wrong with the new implementations? Why can't I replicate the results?
Can you clarify what you mean by
I was able to obtain great results [..]
I assume that you have a metric other than cost (this time an actual scalar), which enables you to compare the models trained by each method.
Also, have you tried adjusting the learning rate for the second method? I ask because my intuition is that when you ask tensorflow to minimize a tensor (which has no mathematical meaning as far as I know), it minimizes the scalar obtained by summing over all axes of the tensor. This is how tf.gradients works, and the reason why I think this is happening. So in the second method, if you multiply the learning rate by batch*dim_x*dim_y, you may get the same behavior as in the first method.
Even if this works, I don't think passing a tensor to the minimize function is a good idea: minimizing a d-dimensional value is not well defined, since there is no order relation in such spaces.
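To see this in action, here is a small sketch (my own, not from the original answer) showing that tf.gradients treats a non-scalar objective as the sum of its elements:

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])
elementwise = tf.square(x)            # shape [3], not a scalar
summed = tf.reduce_sum(tf.square(x))  # scalar

# tf.gradients implicitly sums a non-scalar ys over all its axes,
# so both gradients below come out identical.
g1 = tf.gradients(elementwise, x)[0]
g2 = tf.gradients(summed, x)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([g1, g2]))  # both [2., 4., 6.]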
Here's my use-case
I am trying to implement the Model Agnostic Meta Learning algorithm. At some phase of the training process I need to calculate the gradients of some variables without actually updating the variables and at a later step I would like to do certain things ONLY if the compute gradient operations are complete.
A simple way to do this is to use tf.control_dependencies()
# In this step I would like to COMPUTE gradients
optimizer = tf.train.AdamOptimizer()
# let's assume that I already have loss and var_list
gradients = optimizer.compute_gradients(loss, var_list)
# In this step I would like to do some things ONLY if the gradients are computed
with tf.control_dependencies([gradients]):
    # do some stuff
Problem
Unfortunately the above snippet throws an error, since tf.control_dependencies expects a list of tf.Operation or tf.Tensor objects but compute_gradients returns a list of (gradient, variable) tuples.
Error message:
TypeError: Can not convert a list into a Tensor or Operation.
What I would like
I would like one of two things:
A way for me to get either a tf.Operation or a tf.Tensor from the optimizer.compute_gradients function that I can use in tf.control_dependencies.
Or any other reliable way to check that the gradients from optimizer.compute_gradients have actually been computed.
Since gradients is the list of (gradient, variable) pairs you'd like to make sure are calculated, you can convert it to a flat list of tensors/variables and use that as the control_inputs:
with tf.control_dependencies([t for tup in gradients for t in tup]):
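Spelled out in context, a minimal sketch (the loss and variable below are hypothetical stand-ins):

import tensorflow as tf

x = tf.Variable(1.0)
loss = tf.square(x - 3.0)  # hypothetical loss for illustration
optimizer = tf.train.AdamOptimizer()
gradients = optimizer.compute_gradients(loss, var_list=[x])

# Flatten the (gradient, variable) pairs into plain tensors; skip the
# None gradients that appear for unconnected variables.
control = [t for tup in gradients for t in tup if t is not None]

with tf.control_dependencies(control):
    # Ops created here run only after all gradients have been computed.
    after_grads = tf.identity(loss, name='after_gradients')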
When I read TensorFlow codes, I see people specify placeholders for the input arguments of the functions and then feed the input data in a session.run. A trivial example can be like:
def sigmoid(z):
    x = tf.placeholder(tf.float32, name='x')
    sigmoid = tf.sigmoid(x)
    with tf.Session() as session:
        result = session.run(sigmoid, feed_dict={x: z})
    return result
I wonder why they don't directly feed z into tf.sigmoid(z) and get rid of the placeholder x.
If this is a best practice, what is the reason behind it?
In your example method sigmoid, you basically build a small computation graph and run it with session.run (in the same method). Yes, it does not add any benefit to use a placeholder in your case.
However, usually people just build the computation graph (and execute the graph with data later). At graph-building time, the data is not needed. That's why we use a placeholder: to hold the place of the data. In other words, it allows us to create our computing operations without needing any data.
This should also explain why we want to use tf.placeholder instead of tf.Variable for holding training data. In short:
tf.Variable is for trainable parameters of the model.
tf.placeholder is for training data, which does not change as the model trains.
No initial values are needed for placeholders.
The first dimension of the fed data can be None, thus supporting any batch_size.
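As a small sketch of the build-once, run-many pattern (shapes here are illustrative):

import numpy as np
import tensorflow as tf

# Build the graph once, with no data in sight...
x = tf.placeholder(tf.float32, shape=[None], name='x')  # None allows any batch size
y = tf.sigmoid(x)

# ...then execute it as many times as you like with different data.
with tf.Session() as session:
    print(session.run(y, feed_dict={x: np.array([0.0])}))
    print(session.run(y, feed_dict={x: np.array([1.0, -1.0, 2.0])}))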
I am trying to understand the difference between a placeholder and a variable in TensorFlow:
X = tf.placeholder("float")
W = tf.Variable(rng.randn(), name="weight")
I also read the Stack Overflow question below. I understand the difference when they are the input of a model.
InvalidArgumentError: You must feed a value for placeholder tensor Placeholder
However, in general, if we are not building a model, is there still a difference between tf.placeholder() and tf.Variable()?
Placeholder
A placeholder is used for feeding external data into a Tensorflow computation (stuff outside the graph). Here's some documentation: (https://www.tensorflow.org/versions/r0.10/how_tos/reading_data/#feeding)
TensorFlow's feed mechanism lets you inject data into any Tensor in a
computation graph. A python computation can thus feed data directly
into the graph.
I personally would draw an analogy from placeholders to reading from standard input.
x = raw_input()
X = tf.placeholder("float")
When you read from standard input, you need to "inject data" from an external source. Same with a placeholder. It lets you "inject data" that's external to the computation graph.
If you're training a learning algorithm, the clear use case of placeholder is to feed in your training data. The training data isn't stored in the computation graph. How are you going to get it into the graph? By injecting it through a placeholder. A placeholder is basically you telling the graph "I don't have this for you yet. But I'll have it for you when I ask you to run."
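To make that concrete, here's a minimal sketch of injecting a training batch through placeholders (the shapes, names, and model are made up for illustration):

import numpy as np
import tensorflow as tf

features = tf.placeholder(tf.float32, shape=[None, 2], name='features')
labels = tf.placeholder(tf.float32, shape=[None, 1], name='labels')

weights = tf.Variable(tf.zeros([2, 1]))  # model parameters live in the graph
predictions = tf.matmul(features, weights)
loss = tf.reduce_mean(tf.square(predictions - labels))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # "I'll have it for you when I ask you to run": inject a batch now.
    sess.run(train_op, feed_dict={features: np.random.randn(8, 2),
                                  labels: np.random.randn(8, 1)})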
Variable
A variable is used to store state in your graph. It requires an initial value. One use case could be representing weights of a neural network or something similar. Here's documentation: (https://www.tensorflow.org/api_docs/python/tf/Variable)
A variable maintains state in the graph across calls to run(). You add
a variable to the graph by constructing an instance of the class
Variable.
The Variable() constructor requires an initial value for the variable,
which can be a Tensor of any type and shape. The initial value defines
the type and shape of the variable. After construction, the type and
shape of the variable are fixed. The value can be changed using one of
the assign methods.
I personally would draw an analogy between Tensorflow Variables and assigning a variable in Python to anything that is not dependent on external stuff. For example,
# Tensorflow:
W = tf.Variable(rng.randn(), name="weight")
# Standard python:
w = 5
w = "hello"
w = [1, 2, 3, 4, 5]
W represents some sort of result of your computation. Just as you must assign a variable in Python before using it (you can't just reference x; you have to say x = ...something... first), you have to initialize all Variable objects in Tensorflow.
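A tiny sketch (my own, for illustration) of a Variable maintaining state across run() calls:

import tensorflow as tf

counter = tf.Variable(0, name='counter')  # requires an initial value
increment = tf.assign_add(counter, 1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        print(sess.run(increment))  # 1, 2, 3: the state persists across run() calls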
Variable vs. Placeholder
There's not much relation between tf.Variable and tf.placeholder in my opinion. You use a Variable if you need to store state. You use a placeholder if you need to input external data.
If you are not building a model, you should still use tf.placeholder if you want to insert external data that you don't necessarily have while you're defining the graph. If you are not building a model, you still need tf.Variable if you want to store some kind of result of your computation while the graph is being run.
Why have both?
I'm not an expert in Tensorflow, so I can only speculate as to why the design has both.
A big difference between placeholders and variables is that placeholders can have variable size, but the shape of a tf.Variable must be specified while constructing the graph.
Variable-size placeholders make sense: maybe I only want to input a training batch of size 5 right now, but I may want to increase the batch size later on. Maybe I don't know ahead of time how many training examples I'm going to get.
Variable-size variables don't make sense: tf.Variable holds the learned parameters of your model, and the number of parameters shouldn't change. Furthermore, Tensorflow extends to distributed computation. If you had Variables whose shape changed throughout the computation, it would be very difficult to keep them properly distributed among 1000 computers. See the sketch below for the shape contrast.
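As a rough sketch of that contrast (my own illustration, not from the docs):

import numpy as np
import tensorflow as tf

# The placeholder leaves its batch dimension open...
x = tf.placeholder(tf.float32, shape=[None, 3])
# ...while the Variable's shape is fixed at construction time.
w = tf.Variable(tf.zeros([3, 1]))
y = tf.matmul(x, w)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: np.ones((5, 3))}).shape)   # (5, 1)
    print(sess.run(y, feed_dict={x: np.ones((32, 3))}).shape)  # (32, 1)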
Usually, you build a model and all parameters are known ahead of time, so that's what tf.Variable is probably used to represent. tf.placeholder is probably for everything else outside of your model (or computation graph) and so that can be more flexible.
The most obvious difference between tf.Variable and tf.placeholder is that
you use variables to hold and update parameters. Variables are
in-memory buffers containing tensors. They must be explicitly
initialized and can be saved to disk during and after training. You
can later restore saved values to exercise or analyze the model.
Initialization of the variables is done with sess.run(tf.global_variables_initializer()). Also, while creating a variable, you need to pass a Tensor as its initial value to the Variable() constructor, and when you create a variable you always know its shape.
On the other hand, you can't update a placeholder. Placeholders are not initialized at all; since they are a promise to provide a tensor later, you feed a value into them at run time: sess.run(<op>, {a: <some_val>}). Finally, in contrast to a variable, a placeholder's shape may be only partially known: you can provide some of the dimensions, or provide no shape at all.
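For instance (a quick sketch of the shape rules, with made-up shapes):

import numpy as np
import tensorflow as tf

a = tf.placeholder(tf.float32, shape=[None, 3])  # partially known: any row count
b = tf.placeholder(tf.float32)                   # no shape specified at all

with tf.Session() as sess:
    print(sess.run(tf.shape(a), feed_dict={a: np.zeros((7, 3))}))     # [7 3]
    print(sess.run(tf.shape(b), feed_dict={b: np.zeros((2, 2, 2))}))  # [2 2 2]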
There are other differences:
the values inside the variable can be updated during optimizations
variables can be shared, and can be non-trainable
the values inside the variable can be stored after training
when the variable is created, 3 ops are added to a graph (variable op, initializer op, ops for the initial value)
placeholder is a function, Variable is a class (hence the uppercase name)
when you use TF in a distributed environment, variables are stored in a special place (parameter server) and are shared between the workers.
An interesting point is that placeholders are not the only things that can be fed: you can feed a value to a Variable and even to a constant.
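A quick sketch of that (my own example; the values are arbitrary):

import tensorflow as tf

w = tf.Variable(2.0)
c = tf.constant(3.0)
y = w * c

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y))                       # 6.0, using the graph's own values
    print(sess.run(y, feed_dict={w: 10.0}))  # 30.0, the Variable is overridden
    print(sess.run(y, feed_dict={c: 5.0}))   # 10.0, even the constant is overridden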