This is a generic question. In TensorFlow, after we build the graph and feed data into it, the output of the graph is a tensor. In many cases, though, we need to do further computation that depends on this output tensor, and that does not seem to be allowed in TensorFlow.
For example, I'm trying to implement an RNN whose number of loop iterations depends on a property of the data itself. That is, I need to use a tensor to decide whether to stop (I am not using dynamic_rnn because the RNN in my design is highly customized). tf.while_loop(cond, body, ...) looks like a candidate for my implementation, but the official tutorial is too simple, and I don't know how to add more functionality to the body. Can anyone give me a few more complex examples?
Also, the case where later computation depends on a tensor output (e.g. an RNN that stops based on an output criterion) is very common. Is there an elegant or better way to handle it than a dynamic graph?
What is stopping you from adding more functionality to the body? You can build whatever complex computational graph you like in the body and take whatever inputs you like from the enclosing graph. Also, outside of the loop, you can then do whatever you want with whatever outputs you return. As you can see from the amount of 'whatevers', TensorFlow's control flow primitives were built with much generality in mind. Below is another 'simple' example, in case it helps.
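import tensorflow as tf
import numpy as np

def body(x):
    # Any graph can be built here; this one adds random and constant offsets.
    a = tf.random_uniform(shape=[2, 2], dtype=tf.int32, maxval=100)
    b = tf.constant(np.array([[1, 2], [3, 4]]), dtype=tf.int32)
    c = a + b
    return tf.nn.relu(x + c)

def condition(x):
    # Keep looping while the sum of all entries is below 100.
    return tf.reduce_sum(x) < 100

x = tf.Variable(tf.constant(0, shape=[2, 2]))
result = tf.while_loop(condition, body, [x])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(result))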
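To make the contract between cond, body and the loop variables concrete, here is a minimal pure-Python sketch of what tf.while_loop computes. It only illustrates the semantics and is not how TensorFlow implements the op; the names while_loop_sketch, step, state and max_steps are hypothetical. The point is that body receives every loop variable and must return one value per variable, so you can carry a step counter, the RNN state and anything else the stopping test needs.

```python
def while_loop_sketch(cond, body, loop_vars):
    """Pure-Python stand-in for the semantics of tf.while_loop (illustration only)."""
    loop_vars = tuple(loop_vars)
    while cond(*loop_vars):
        # body must return one value per loop variable, in the same order
        loop_vars = tuple(body(*loop_vars))
    return loop_vars


max_steps = 50  # hypothetical safety cap on the number of iterations


def cond(step, state):
    # Continue while the data-dependent criterion holds and the cap is not reached.
    return state < 100 and step < max_steps


def body(step, state):
    # Arbitrary computation can go here; this toy update doubles the state and adds 1.
    return step + 1, 2 * state + 1


final_step, final_state = while_loop_sketch(cond, body, (0, 0))
print(final_step, final_state)
```

In real TensorFlow code the same shape applies: pass a list of loop variables, and have both cond and body accept all of them.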
What does the following line of code do, and how should I interpret it?
model.add(tf.keras.layers.Lambda(lambda x: x * 200))
My interpretation:
Lambda is like a function.
>>> f = lambda x: x + 1
>>> f(3)
In the second example the function is called using f(3). But what is the purpose of model.add?
The model.add method adds a layer to the associated Keras model. Now, the argument of this method usually is a Keras layer. In your case, it is a special kind of layer called Lambda. You are right that lambda is a function. In principle, lambda is common syntactic sugar that allows you to declare a simple function without naming it. It would be just like:
def my_func(x):
    return x * 200
As you can see, this is way more code for a very basic functionality. Coming back to the Lambda layer, this just applies the given function to all of the nodes of the previous layer. If you don't understand what a Keras model is or how machine learning works, at least in a broad sense, you may want to start with some tutorials on that instead of looking into what the individual lines of code do. This way you could become productive way faster.
I bet it is used as the last layer. Normally, you can just have a Dense layer as the output. However, you can help training by scaling the output up to roughly the same magnitude as your labels. Whether this is needed depends on the activation functions used in your model: LSTM and SimpleRNN use tanh by default, which has an output range of [-1, 1]. The Lambda() layer scales the output by 200 before the loss is computed, so the layer weights are adjusted against the scaled values.
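As a rough illustration of the scaling argument, the sketch below uses plain Python in place of Keras: a tanh output is confined to [-1, 1], and multiplying by 200 stretches that range to [-200, 200]. The function names and the sample pre-activation values are hypothetical; only the factor 200 comes from the question.

```python
import math


def tanh_unit(z):
    """Stand-in for a recurrent unit's output; tanh keeps it in [-1, 1]."""
    return math.tanh(z)


def scale_by_200(x):
    """The same function that is passed to the Lambda layer in the question."""
    return x * 200


pre_activations = [-5.0, -0.5, 0.0, 0.5, 5.0]  # hypothetical values
raw = [tanh_unit(z) for z in pre_activations]
scaled = [scale_by_200(r) for r in raw]

for z, r, s in zip(pre_activations, raw, scaled):
    print(f"z={z:+.1f}  tanh={r:+.4f}  scaled={s:+.2f}")
```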
So if I run this code in Pytorch:
x = torch.ones(2,2, requires_grad=True)
I will get the error:
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
I understand that PyTorch does not allow in-place operations on leaf variables, and I also know that there are ways to get around this restriction. What I don't understand is the philosophy behind this rule. Why is it wrong to change a leaf variable with in-place operations?
As I understand it, any time you do a non-traditional operation on a tensor that was initialized with requires_grad=True, Pytorch throws an error to make sure it was intentional. For example, you normally would only update a weight tensor using optimizer.step().
For another example, I ran into this issue when trying to update the values in a backprop-able tensor during network initialization.
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=True)
self.weight_layer[true_ids == 1] = -1.2
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
The problem is that, because requires_grad=True, the network doesn't know that I'm still initializing the values. If this is what you are trying to do, wrapping the update in a torch.no_grad block is one solution:
with torch.no_grad():
    self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=True)
    self.weight_layer[true_ids == 1] = -1.2
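Otherwise, you could set requires_grad=True only after you finish initializing the tensor. Note that nn.Parameter defaults to requires_grad=True, so it has to be switched off explicitly first:
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=False)
self.weight_layer[true_ids == 1] = -1.2
self.weight_layer.requires_grad = True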
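For reference, here is a small self-contained sketch of the failure and the torch.no_grad() workaround outside of a module class. The values of seq_length and true_ids are made up for illustration, and the exact wording of the error message may differ between PyTorch versions.

```python
import torch
import torch.nn as nn

seq_length = 5                             # hypothetical size
true_ids = torch.tensor([1, 0, 1, 0, 0])   # hypothetical mask

# nn.Parameter has requires_grad=True by default, so this is a leaf that requires grad.
weight = nn.Parameter(torch.zeros(seq_length))

try:
    weight[true_ids == 1] = -1.2           # in-place write while autograd is tracking
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

with torch.no_grad():
    weight[true_ids == 1] = -1.2           # same write, not tracked by autograd

print(weight)
```

After the no_grad block the parameter still has requires_grad=True and is still a leaf, so it can be trained as usual.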
The simple answer is that once we have created tensors that are inputs to the graph, autograd tracks every operation on them and builds the graph as it goes. Now, since this tensor has requires_grad=True, suppose I do a weight update after loss.backward(): autograd probably treats that update as part of the already created graph, which requires gradients. This leads to the RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
Autograd cannot tell why a tensor that is part of the computational graph is being modified in place outside of it, for some other operation or for initialization, and this is the problem.
If we simply place the code under a
with torch.no_grad(): code here
block, we disable gradient tracking, which signals to autograd that this operation is not part of our dynamic graph.
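A minimal sketch of the weight-update case mentioned above, assuming a toy loss and a hand-written gradient step (the tensor w and the learning rate 0.1 are illustrative, not from the question):

```python
import torch

w = torch.ones(2, 2, requires_grad=True)

loss = (w * 3).sum()
loss.backward()              # w.grad is now a 2x2 tensor filled with 3.0

with torch.no_grad():
    # In-place update of a leaf tensor; autograd does not record it.
    w -= 0.1 * w.grad

w.grad.zero_()               # clear the gradient before the next backward pass
print(w)
```

This is roughly the pattern a manual gradient descent step follows; outside the no_grad block the same in-place subtraction raises the RuntimeError quoted above.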
I want to use the external optimizer interface in TensorFlow in order to use Newton-type optimizers, as tf.train only has first-order gradient descent optimizers. At the same time, I want to build my network using tf.keras.layers, as it is much easier than using tf.Variables when building large, complex networks. I will show my issue with the following simple 1D linear regression example:
import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np
#generate data
no = 100
data_x = np.linspace(0,1,no)
data_y = 2 * data_x + 2 + np.random.uniform(-0.5,0.5,no)
data_y = data_y.reshape(no,1)
data_x = data_x.reshape(no,1)
# Make model using keras layers and train
x = tf.placeholder(dtype=tf.float32, shape=[None,1])
y = tf.placeholder(dtype=tf.float32, shape=[None,1])
output = tf.keras.layers.Dense(1, activation=None)(x)
loss = tf.losses.mean_squared_error(data_y, output)
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, method="L-BFGS-B")
sess = K.get_session()
tf_dict = {x : data_x, y : data_y}
optimizer.minimize(sess, feed_dict = tf_dict, fetches=[loss], loss_callback=lambda x: print("Loss:", x))
When running this, the loss does not change at all. With any other optimizer from tf.train, it works fine. Also, when using tf.layers.Dense() instead of tf.keras.layers.Dense(), it does work with the ScipyOptimizerInterface. So really the question is: what is the difference between tf.keras.layers.Dense() and tf.layers.Dense()? I saw that the Variables created by tf.layers.Dense() are of type tf.float32_ref, while the Variables created by tf.keras.layers.Dense() are of type tf.float32. As far as I know, _ref indicates that the tensor is mutable. So maybe that's the issue? But then again, any other optimizer from tf.train works fine with Keras layers.
After a lot of digging I was able to find a possible explanation.
ScipyOptimizerInterface uses feed_dicts to simulate the updates of your variables during the optimization process. It only does an assign operation at the very end. In contrast, tf.train optimizers always do assign operations. The code of ScipyOptimizerInterface is not that complex so you can verify this easily.
Now the problem is that assigning variables with feed_dict works mostly by accident. Here is a link where I learnt about this. In other words, assigning variables via the feed dict, which is what ScipyOptimizerInterface does, is a hacky way of doing updates.
Now this hack mostly works, except when it does not. tf.keras.layers.Dense uses ResourceVariables to model the weights of the model. This is an improved version of simple Variables that has cleaner read/write semantics. The problem is that under the new semantics the feed dict update happens after the loss calculation. The link above gives some explanations.
Now tf.layers is currently a thin wrapper around tf.keras.layers, so I am not sure why it would work. Maybe there is some compatibility check somewhere in the code.
There are two fairly simple ways to address this.
Either avoid using components that use ResourceVariables. This can be kind of difficult.
Or patch ScipyOptimizerInterface so that it always does assignments for variables. This is relatively easy since all the required code is in one file.
There was some effort to make the interface work with eager execution (which uses ResourceVariables by default). Check out this link.
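To see why the optimizer never needs to assign variables during the search, it may help to look at the same regression solved by calling SciPy directly. This is not the ScipyOptimizerInterface code, just a sketch of the pattern it wraps: SciPy repeatedly hands a flat parameter vector to a function that returns the loss and gradient, and the fitted values are only read out at the end. The function name loss_and_grad and the fixed random seed are my own choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
no = 100
data_x = np.linspace(0, 1, no)
data_y = 2 * data_x + 2 + rng.uniform(-0.5, 0.5, no)


def loss_and_grad(params):
    """Mean squared error of y = w * x + b and its gradient with respect to (w, b)."""
    w, b = params
    residual = w * data_x + b - data_y
    loss = np.mean(residual ** 2)
    grad = np.array([2 * np.mean(residual * data_x), 2 * np.mean(residual)])
    return loss, grad


# SciPy only ever sees a flat parameter vector; nothing is written back
# anywhere until we read result.x after the optimization has finished.
result = minimize(loss_and_grad, x0=np.zeros(2), jac=True, method="L-BFGS-B")
w_fit, b_fit = result.x
print(w_fit, b_fit, result.fun)
```

In the TensorFlow wrapper, the role of loss_and_grad is played by a session run with the candidate parameters fed in, which is where the variable semantics described above start to matter.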
I think the problem is with the line
output = tf.keras.layers.Dense(1, activation=None)(x)
In this format, output is not a layer but rather the output of a layer, which might be preventing the wrapper from collecting the layer's weights and biases and feeding them to the optimizer. Try writing it in two lines, e.g.
output = tf.keras.layers.Dense(1, activation=None)
res = output(x)
If you want to keep the original format, then you might have to manually collect all trainable variables and pass them to the optimizer via the var_list option:
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, var_list = [Trainables], method="L-BFGS-B")
Hope this helps.
How do I implement product AND in Theano? Mathematically, that is equivalent to multiplying all of the outputs of the previous layer (no weights). I think my code works for a batch size of 1, but I want it to work for batches.
Here is what I have tried. Note, I have no clue what I am doing.
Product AND function
def prod_and(result, k):
    # Earlier branches for other input widths appear to be missing from the post.
    if k[0][0] == 7:
        return theano.tensor.stack([
            [result[i][0] * result[i][1] *
             result[i][2] * result[i][3] *
             result[i][4] * result[i][5] *
             result[i][6]] for i in np.arange(1)
        ])
Product AND Layer
class ProdAnd(layers.BaseLayer):
    # Begin by initializing.
    def initialize(self,):
        super(ProdAnd, self).initialize()

    # Create output from input.
    def output(self, *input_values):
        return prod_and(input_values[0], self.input_shape)
I think that my problem arises from my inability to understand the relationship between neupy types (for example, what is connecting neupy layers {layers.0}). I also don't think I understand how Theano handles a batch differently from a single (stochastic) sample.
Ideally, the best answer would include the fix to the product AND function as well as an explanation of how inputs and outputs work in this particular example.
An example where you would use this is a fuzzy-neuro network that utilizes Takagi-Sugeno style fuzzy inference (look at layer 5).
I'm trying to make the problem easy, so I broke it down into simply multiplying all of the inputs from the previous layer.
The equation is given by:
$$y = \prod_{i} x_i$$
where x is the input layer and y is the output layer. Differentiating with respect to any one input leaves the product of the other inputs from the previous layer (so it is not a difficult computation).
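For what it's worth, the batched version of this product is a single reduction over the feature axis. The sketch below uses NumPy rather than Theano so that it runs on its own; I believe the Theano equivalent is theano.tensor.prod(x, axis=1), but treat that as an assumption and check it against your installed version. The function names and the example batch are made up for illustration.

```python
import numpy as np


def prod_and(x):
    """Multiply all features of each sample; x has shape (batch, n_inputs)."""
    return np.prod(x, axis=1, keepdims=True)


def prod_and_grad(x):
    """dy/dx_j for each sample: the product of every input except x_j."""
    n_inputs = x.shape[1]
    cols = [np.prod(np.delete(x, j, axis=1), axis=1) for j in range(n_inputs)]
    return np.stack(cols, axis=1)


batch = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
print(prod_and(batch))        # one product per row
print(prod_and_grad(batch))   # same shape as the input batch
```

Because the reduction runs along axis 1, the same function works for a batch of one sample or many, which is what the hard-coded np.arange(1) loop in the question prevents.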
I'm new to Tensorflow.
Is it necessary to use TensorFlow's functions, such as tf.constant(), in place of plain int32, float32 and other values?
Also, during computation, should I use tf.mul() instead of normal Python multiplication *?
And the print function tf.Print() instead of print()?
As noted here,
A Tensor is a symbolic handle to one of the outputs of an Operation. It does not hold the values of that operation's output, but instead provides a means of computing those values in a TensorFlow Session
So tensor variables aren't like python variables. Rather they specify the relationship between operations in the computational graph. The python variables that you use to describe the graph are for the programmer's convenience, but it might be easier to think about the python variables and the tensor variables as being in parallel namespaces. This example may help:
with tf.Session() as sess:
    a = tf.constant([1, 2, 3])
    b = tf.Variable([])
    b = 2 * a
    print(b)  # Tensor("mul:0", shape=(3,), dtype=int32)
    print(b.eval())  # [2, 4, 6]
    b = tf.Print(b, [b])  # [2, 4, 6] (at the command line)
From this you can see:
print(b) returns information about the operation that 'b' refers to as well as the variable shape and data type, but not the value.
b.eval() (or sess.run(b)) returns the value of b as a numpy array, which can be printed with Python's print().
tf.Print() allows you to see the value of b during graph execution.
Note that the syntax of tf.Print() might seem a little strange to a newbie. As described in the documentation here, tf.Print() is an identity operation whose only side effect is printing to the command line. The first parameter is just passed through. The second parameter is the list of tensors that are printed, and it can be different from the first parameter. Also note that in order for tf.Print() to do something, a tensor evaluated in a subsequent call to sess.run() needs to depend on the tensor output by tf.Print(); otherwise this portion of the graph will not be executed.
Finally, with respect to math ops such as tf.mul() vs *, many of the normal Python operators are overloaded with the equivalent TensorFlow ops, as described here.
Because TensorFlow is built upon a computation graph. When you construct the graph in Python, you are just building a description of the computations (not actually doing the computation). To compute anything, the graph must be launched in a Session. So it's best to do the computation with the TensorFlow ops.
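The idea of "describing a computation instead of doing it", together with operator overloading, can be sketched in a few lines of plain Python. This is a toy model for intuition only and has nothing to do with TensorFlow's actual classes; Node, as_node and the "const"/"mul" op names are all hypothetical.

```python
class Node:
    """Toy symbolic value: records an operation instead of computing it."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __mul__(self, other):
        # Overloaded *: build a new graph node, do not multiply anything yet.
        return Node("mul", (self, as_node(other)))

    __rmul__ = __mul__  # lets 2 * a work as well as a * 2

    def __repr__(self):
        return f"Node(op={self.op!r})"

    def eval(self):
        # Only here is the recorded computation actually carried out.
        if self.op == "const":
            return self.value
        left, right = (node.eval() for node in self.inputs)
        if isinstance(left, list) and isinstance(right, list):
            return [x * y for x, y in zip(left, right)]
        if isinstance(left, list):
            return [x * right for x in left]
        if isinstance(right, list):
            return [left * y for y in right]
        return left * right


def as_node(x):
    """Wrap plain Python values as constant nodes."""
    return x if isinstance(x, Node) else Node("const", value=x)


a = as_node([1, 2, 3])
b = 2 * a
print(b)         # shows the operation, not the numbers
print(b.eval())  # runs the computation
```

Printing b shows only which operation it stands for, much like print(b) on a tensor, while b.eval() plays the role of running the graph in a session.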