Inconsistency between GRU and RNN implementation - python

I'm trying to implement some custom GRU cells using TensorFlow. I need to stack those cells, and I wanted to inherit from tensorflow.keras.layers.GRU. However, when looking at the source code, I noticed that you can only pass a units argument to the __init__ of GRU, while RNN takes a cell argument that can be a list of RNN cells, and leverages it to stack those cells by calling StackedRNNCells. Meanwhile, GRU only creates one GRUCell.
For the paper I'm trying to implement, I actually need to stack GRUCells. Why are the implementations of RNN and GRU different?

While searching for the documentation for these classes to add links, I noticed something that may be tripping you up: there are (currently, just before the official TF 2.0 release) two GRUCell implementations in TensorFlow! There is a tf.nn.rnn_cell.GRUCell and a tf.keras.layers.GRUCell. It looks like the one from tf.nn.rnn_cell is deprecated, and the Keras one is the one you should use.
From what I can tell, the GRUCell has the same __call__() method signature as tf.keras.layers.LSTMCell and tf.keras.layers.SimpleRNNCell, and they all inherit from Layer. The RNN documentation gives some requirements on what the __call__() method of the objects you pass to its cell argument must do, but my guess is that all three of these should meet those requirements. You should be able to just use the same RNN framework and pass it a list of GRUCell objects instead of LSTMCell or SimpleRNNCell.
I can't test this right now, so I'm not sure if you pass a list of GRUCell objects or just GRU objects into RNN, but I think one of those should work.
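If it helps, here's an untested sketch of the GRUCell-list option (cell count and layer sizes are arbitrary):

import tensorflow as tf

# Stack GRU cells by passing a list of tf.keras.layers.GRUCell objects to
# tf.keras.layers.RNN, which should wrap them in StackedRNNCells internally.
cells = [tf.keras.layers.GRUCell(64) for _ in range(3)]
stacked_gru = tf.keras.layers.RNN(cells, return_sequences=True)

x = tf.random.uniform((8, 10, 32))  # (batch, timesteps, features)
y = stacked_gru(x)                  # -> (8, 10, 64)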

train_graph = tf.Graph()
with train_graph.as_default():
    # Initialize input placeholders
    input_text = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')

    # Calculate text attributes
    vocab_size = len(int_to_vocab)
    input_text_shape = tf.shape(input_text)

    # Build the RNN cell. Note: each layer needs its own cell object;
    # reusing one cell via [cell] * num_layers would share weights
    # across layers and raises an error in recent TF 1.x versions.
    cells = []
    for _ in range(num_layers):
        lstm = tf.contrib.rnn.BasicLSTMCell(num_units=rnn_size)
        cells.append(tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob))
    cell = tf.contrib.rnn.MultiRNNCell(cells)

    # Set the initial state
    initial_state = cell.zero_state(input_text_shape[0], tf.float32)
    initial_state = tf.identity(initial_state, name='initial_state')

    # Create word embedding as input to RNN
    embed = tf.contrib.layers.embed_sequence(input_text, vocab_size, embed_dim)

    # Build RNN
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed, dtype=tf.float32)
    final_state = tf.identity(final_state, name='final_state')

    # Take RNN output and make logits
    logits = tf.contrib.layers.fully_connected(outputs, vocab_size, activation_fn=None)

    # Calculate the probability of generating each word
    probs = tf.nn.softmax(logits, name='probs')

    # Define loss function
    cost = tf.contrib.seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_text_shape[0], input_text_shape[1]])
    )

    # Learning rate optimizer (uses the learning-rate placeholder defined above)
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient clipping to avoid exploding gradients
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var)
                        for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)

Related

For loop with GRUCell in call method of subclassed tf.keras.Model

I have subclassed tf.keras.Model and I use tf.keras.layers.GRUCell in a for loop to compute sequences 'y_t' (n, timesteps, hidden_units) and final hidden states 'h_t' (n, hidden_units). For my loop to output 'y_t', I update a tf.Variable after each iteration of the loop. Calling the model with model(input) is not a problem, but when I fit the model with the for loop in the call method I get either a TypeError or a ValueError.
Please note, I cannot simply use tf.keras.layers.GRU because I am trying to implement this paper. Instead of just passing x_t to the next cell in the RNN, the paper performs some computation as a step in the for loop (their implementation is in PyTorch) and passes the result of that computation to the RNN cell. They end up essentially doing this: h_t = f(special_x_t, h_{t-1}).
Please see the model below that causes the error:
class CustomGruRNN(tf.keras.Model):
    def __init__(self, batch_size, timesteps, hidden_units, features, **kwargs):
        # Inheritance
        super().__init__(**kwargs)
        # Args
        self.batch_size = batch_size
        self.timesteps = timesteps
        self.hidden_units = hidden_units
        # Stores y_t
        self.rnn_outputs = tf.Variable(tf.zeros(shape=(batch_size, timesteps, hidden_units)), trainable=False)
        # To be used in for loop in call
        self.gru_cell = tf.keras.layers.GRUCell(units=hidden_units)
        # Reshape to match input dimensions
        self.dense = tf.keras.layers.Dense(units=features)

    def call(self, inputs):
        """Inputs is a rank-3 tensor of shape (n, timesteps, features)."""
        # Initial state for gru cell
        h_t = tf.zeros(shape=(self.batch_size, self.hidden_units))
        for timestep in tf.range(self.timesteps):
            # Get the t-th timestep of the inputs
            x_t = tf.gather(inputs, timestep, axis=1)  # Same as x_t = inputs[:, timestep, :]
            # Compute outputs and hidden states
            y_t, h_t = self.gru_cell(x_t, h_t)
            # Update y_t at the t-th timestep
            self.rnn_outputs = self.rnn_outputs[:, timestep, :].assign(y_t)
        # Outputs need to have same last dimension as inputs
        outputs = self.dense(self.rnn_outputs)
        return outputs
An example that would throw the error:
# Arbitrary values for dataset
num_samples = 128
batch_size = 4
timesteps = 5
features = 10
# Arbitrary dataset
x = tf.random.uniform(shape=(num_samples, timesteps, features))
y = tf.random.uniform(shape=(num_samples, timesteps, features))
train_data = tf.data.Dataset.from_tensor_slices((x, y))
train_data = train_data.shuffle(batch_size).batch(batch_size, drop_remainder=True)
# Model with arbitrary hidden units
model = CustomGruRNN(batch_size, timesteps, hidden_units=5, features=features)
model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam())
When running eagerly:
model.fit(train_data, epochs=2, run_eagerly=True)
Epoch 1/2
WARNING:tensorflow:Gradients do not exist for variables
['stack_overflow_gru_rnn/gru_cell/kernel:0',
'stack_overflow_gru_rnn/gru_cell/recurrent_kernel:0',
'stack_overflow_gru_rnn/gru_cell/bias:0'] when minimizing the loss.
ValueError: substring not found
When not running eagerly:
model.fit(train_data, epochs=2, run_eagerly=False)
Epoch 1/2
TypeError: in user code:
TypeError: Can not convert a NoneType into a Tensor or Operation.
Edit:
While the TensorFlow guide answer suffices, I think my self-answered question involving custom cells for RNNs is a much better option. Please see this answer. Using a custom RNN cell removes the need for tf.transpose and tf.TensorArray, and thus lowers the complexity of the code while simultaneously improving readability.
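A rough sketch of that idea (my illustration, not the linked answer's exact code): wrap a GRUCell in a custom cell that applies the extra per-timestep computation, and let tf.keras.layers.RNN drive the loop. The Dense layer is a stand-in for the paper's function f:

import tensorflow as tf

# Custom cell computing h_t = GRU(f(x_t), h_{t-1}); Dense stands in for f.
class PreprocessedGRUCell(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.cell = tf.keras.layers.GRUCell(units)
        self.pre = tf.keras.layers.Dense(units)  # placeholder for the paper's f
        # RNN reads these attributes off the cell it wraps
        self.state_size = self.cell.state_size
        self.output_size = self.cell.output_size

    def call(self, inputs, states):
        special_x_t = self.pre(inputs)  # custom per-timestep computation
        return self.cell(special_x_t, states)

# RNN handles the time loop, so no tf.transpose or tf.TensorArray is needed.
rnn = tf.keras.layers.RNN(PreprocessedGRUCell(5), return_sequences=True)
y = rnn(tf.random.uniform((4, 5, 10)))  # -> (batch, timesteps, units)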
Original Self-Answer:
The use of the DynamicRNN described near the bottom of TensorFlow's Guide to Effective TensorFlow 2 solves my problem.
To expand briefly on the DynamicRNN's conceptual use: an RNN cell is defined (in my case, GRU), and then any number of custom steps can be defined within the tf.range loop. Per-timestep outputs should be tracked using tf.TensorArray objects created outside the loop but inside the call method itself, and the sizes of such arrays can be determined by simply inspecting the .shape of the (input) tensors. Notably, the DynamicRNN object works in model.fit, where the default execution mode is graph mode, as opposed to the slower eager execution mode.
Lastly, one might require the use of a DynamicRNN because, by default, the tf.keras.layers.GRU computation is loosely described by the following recurrent logic (assume that 'f' defines a GRU cell):
# Numpy is used here for ease of indexing, but in general you should use
# tensors and transpose them accordingly (see the previously linked guide)
inputs = np.random.randn(batch, total_timesteps, features)

# List for tracking outputs -- just for simple demonstration; please see the guide for more details
outputs = []

# Initialize the 'hidden state' (often referred to as h_naught and denoted h_0) of the RNN cell
state_at_t_minus_1 = tf.zeros(shape=(batch, hidden_cell_units))

# Iterate through the input until all timesteps in the sequence have been 'seen' by the GRU cell function 'f'
for timestep_t in range(total_timesteps):
    # This is of shape (batch, features)
    input_at_t = inputs[:, timestep_t, :]
    # output_at_t of shape (batch, hidden_units_of_cell) and state_at_t (batch, hidden_units_of_cell)
    output_at_t, state_at_t = f(input_at_t, state_at_t_minus_1)
    outputs.append(output_at_t)
    # When the loop restarts, this variable will be used in the next GRU cell function call 'f'
    state_at_t_minus_1 = state_at_t
One might wish to add other steps inside the for loop of the recurrent logic (e.g., dense layers, other layers, etc.) to modify the inputs and states passed to the GRU cell function 'f'. This is one motivation for the DynamicRNN; a sketch of the pattern follows.
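Here is a rough, untested paraphrase of the guide's DynamicRNN pattern (class and variable names are mine):

import tensorflow as tf

class DynamicGRU(tf.keras.Model):
    def __init__(self, units):
        super().__init__()
        self.cell = tf.keras.layers.GRUCell(units)

    def call(self, inputs):
        # (batch, timesteps, features) -> (timesteps, batch, features)
        inputs = tf.transpose(inputs, [1, 0, 2])
        timesteps = tf.shape(inputs)[0]
        batch = tf.shape(inputs)[1]
        # Track per-timestep outputs in a TensorArray, not a tf.Variable
        outputs = tf.TensorArray(tf.float32, size=timesteps)
        state = [tf.zeros((batch, self.cell.units))]
        for t in tf.range(timesteps):
            # Any custom per-timestep computation could be added here
            output, state = self.cell(inputs[t], state)
            outputs = outputs.write(t, output)
        # Back to (batch, timesteps, units)
        return tf.transpose(outputs.stack(), [1, 0, 2])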

Access output of intermediate layers in TensorFlow 2.0 in eager mode

I have a CNN that I built using TensorFlow 2.0, and I need to access the outputs of the intermediate layers. I went over other Stack Overflow questions that were similar, but the solutions all involved the Keras Sequential model.
I have tried using model.layers[index].output, but I get
Layer conv2d has no inbound nodes.
I can post my code here (which is super long), but I am sure that even without it someone can point me to how this can be done using just TensorFlow 2.0 in eager mode.
I stumbled onto this question while looking for an answer, and it took me some time to figure out, as I use the model subclassing API in TF 2.0 by default (as in here: https://www.tensorflow.org/tutorials/quickstart/advanced).
If somebody is in a similar situation, all you need to do is assign the intermediate output you want as an attribute of the class. Then keep test_step without the @tf.function decorator and create a decorated copy of it, say val_step, for efficient internal computation of validation performance during training. As a short example, I have modified a few functions of the tutorial from the link accordingly. I'm assuming we need to access the output after flattening.
def call(self, x):
    x = self.conv1(x)
    x = self.flatten(x)
    self.intermediate = x  # assign it as an object attribute for accessing later
    x = self.d1(x)
    return self.d2(x)

# Remove the @tf.function decorator from test_step for prediction
def test_step(images, labels):
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)
    return

# Create a decorated val_step for the object's internal use during training
@tf.function
def val_step(images, labels):
    return test_step(images, labels)
Now when you run model.predict() after training, using the un-decorated test_step, you can access the intermediate output via model.intermediate, which will be an EagerTensor whose value is obtained simply by model.intermediate.numpy(). However, if you don't remove the @tf.function decorator from test_step, this will return a Tensor whose value is not so straightforward to obtain.
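For example (hypothetical usage; x_batch stands for any input batch):

# Run a forward pass with the un-decorated path, then read the attribute.
_ = model(x_batch, training=False)
flattened = model.intermediate.numpy()  # EagerTensor -> NumPy array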
Thanks for answering my earlier question. I wrote this simple example to illustrate how what you're trying to do might be done in TensorFlow 2.x, using the MNIST dataset as the example problem.
The gist of the approach:
Build an auxiliary model (aux_model in the example below), which is so-called "functional model" with multiple outputs. The first output is the output of the original model and will be used for loss calculation and backprop, while the remaining output(s) are the intermediate-layer outputs that you want to access.
Use tf.GradientTape() to write a custom training loop and expose the detailed gradient values on each individual variable of the model. Then you can pick out the gradients that are of interest to you. This requires that you know the ordering of the model's variables. But that should be relatively easy for a sequential model.
import tensorflow as tf

(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
# Cast to float and add a channels dimension to match the model's input shape.
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# This is the original model.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28, 1]),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

# Make an auxiliary model that exposes the output from the intermediate layer
# of interest, which is the first Dense layer in this case.
aux_model = tf.keras.Model(inputs=model.inputs,
                           outputs=model.outputs + [model.layers[1].output])

# Define a custom training loop using `tf.GradientTape()`, to make it easier
# to access gradients on specific variables (the kernel and bias of the first
# Dense layer in this case).
cce = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.optimizers.Adam()
with tf.GradientTape() as tape:
    # Do a forward pass on the model, retrieving the intermediate layer's output.
    y_pred, intermediate_output = aux_model(x_train)
    print(intermediate_output)  # Now you can access the intermediate layer's output.
    # Compute loss, to enable backprop.
    loss = cce(tf.one_hot(y_train, 10), y_pred)

# Do backprop. `gradients` here are for all variables of the model.
# But we know we want the gradients on the kernel and bias of the first
# Dense layer, which happen to be the first two variables of the model.
gradients = tape.gradient(loss, aux_model.variables)

# This is the gradient on the first Dense layer's kernel.
intermediate_layer_kernel_gradients = gradients[0]
print(intermediate_layer_kernel_gradients)
# This is the gradient on the first Dense layer's bias.
intermediate_layer_bias_gradients = gradients[1]
print(intermediate_layer_bias_gradients)

# Update the variables of the model.
optimizer.apply_gradients(zip(gradients, aux_model.variables))
The most straightforward solution would go like this (wrapping the layer's output in a new model, since a layer by itself has no predict method):
mid_layer = tf.keras.Model(inputs=model.input,
                           outputs=model.get_layer("layer_name").output)
You can now treat mid_layer as a model, and for instance:
mid_layer.predict(X)
Oh, also, to get the name of a hidden layer, you can use this:
model.summary()
This will give you some insights about the layer inputs/outputs as well.

How to get the states for each step and for each layer in a multilayer RNN using dynamic_rnn

I am building a multi-layer RNN with the same setup as in the question below (using MultiRNNCell to wrap the cells and then calling dynamic_rnn):
Outputs and State of MultiRNNCell in Tensorflow
As described in that question, dynamic_rnn returns
outputs, state = tf.nn.dynamic_rnn(...)
The outputs, I guess, only contains the outputs from the top layer (because its shape is batch_size x steps x state_size). However, state returns the states from each layer (a tuple with num_layer elements, each containing the last state of that layer).
(1) Is there any way to access the outputs from all time steps for each layer (not just the last layer returned by dynamic_rnn) in a simple way, without running a one-step RNN recursively and reading the state at each step?
(2) Is the output returned only from the last (top) layer?
Based on the documentation of tf.nn.rnn_cell.MultiRNNCell, you should be safe doing the following (chaining two dynamic_rnn calls, so that each layer's full output sequence is exposed):
import tensorflow as tf

# X is assumed here to be some (batch, steps, features) input tensor
X = tf.random_normal([4, 10, 3])

cell_1 = tf.nn.rnn_cell.GRUCell(7, name="gru1")
cell_2 = tf.nn.rnn_cell.GRUCell(7, name="gru2")

outputs_1, states_1 = tf.nn.dynamic_rnn(cell_1, X, dtype=tf.float32)
outputs_2, states_2 = tf.nn.dynamic_rnn(cell_2, outputs_1, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    first_layer_outputs = sess.run(outputs_1)
    second_layer_outputs = sess.run(outputs_2)
As for the outputs returned by tf.nn.dynamic_rnn, they are indeed from the top layer if the cell provided is tf.nn.rnn_cell.MultiRNNCell.

Not fully connected layer in tensorflow

I want to create a network where the nodes in the input layer are connected to only some of the nodes in the next layer. Here is a small example:
My solution so far is to set the weight of the edge between i1 and h1 to zero, and after every optimization step I multiply the weights with a matrix (which I call the mask matrix) in which every entry is 1 except the entry for the weight of the edge between i1 and h1.
(See the code below.)
Is this approach right? Or does it affect gradient descent? Is there another approach to create this kind of network in TensorFlow?
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np

tf.enable_eager_execution()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation=tf.sigmoid, input_shape=(2,)),  # input shape required
    tf.keras.layers.Dense(2, activation=tf.sigmoid)
])

# set the weights
weights = [np.array([[0, 0.25], [0.2, 0.3]]), np.array([0.35, 0.35]),
           np.array([[0.4, 0.5], [0.45, 0.55]]), np.array([0.6, 0.6])]
model.set_weights(weights)
model.get_weights()

features = tf.convert_to_tensor([[0.05, 0.10]])
labels = tf.convert_to_tensor([[0.01, 0.99]])
mask = np.array([[0, 1], [1, 1]])

# define the loss function
def loss(model, x, y):
    y_ = model(x)
    return tf.losses.mean_squared_error(labels=y, predictions=y_)

# define the gradient calculation
def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

# create optimizer and global step
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
global_step = tf.train.get_or_create_global_step()

# optimization step
loss_value, grads = grad(model, features, labels)
optimizer.apply_gradients(zip(grads, model.variables), global_step)

# masking the optimized weights; set_weights expects the full list of
# weights, so mask only the first kernel and write everything back
all_weights = model.get_weights()
all_weights[0] = all_weights[0] * mask
model.set_weights(all_weights)
If you are looking for a solution for the specific example you provided, you can simply use tf.keras Functional API and define two Dense layers where one is connected to both neurons in the previous layer and the other one is only connected to one of the neurons:
from tensorflow.keras.layers import Input, Lambda, Dense, concatenate
from tensorflow.keras.models import Model
inp = Input(shape=(2,))
inp2 = Lambda(lambda x: x[:,1:2])(inp) # get the second neuron
h1_out = Dense(1, activation='sigmoid')(inp2) # only connected to the second neuron
h2_out = Dense(1, activation='sigmoid')(inp) # connected to both neurons
h_out = concatenate([h1_out, h2_out])
out = Dense(2, activation='sigmoid')(h_out)
model = Model(inp, out)
# simply train it using `fit`
model.fit(...)
The problem with your solution, and some others suggested by other answers in this post, is that they do not prevent training of this weight. They allow gradient descent to train the non-existent weight and then overwrite it retrospectively. This will result in a network that has a zero in this location as desired, but will negatively affect your training process, as the backpropagation calculation will not see the masking step (it is not part of the TensorFlow graph), and so gradient descent will follow a path which includes the assumption that this weight does have an effect on the outcome (it does not).
A better solution would be to include the masking step as part of your TensorFlow graph, so that it can be factored into the gradient descent. Since the masking step is simply an element-wise multiplication by your sparse, binary mask matrix, you could just include the mask matrix as an element-wise matrix multiplication in the graph definition using tf.multiply.
Sadly this means saying goodbye to the user-friendly keras.layers methods and embracing a more nuts-and-bolts approach to TensorFlow. I can't see an obvious way to do it using the layers API.
See the implementation below; I have tried to provide comments explaining what is happening at each stage.
import tensorflow as tf

## Graph definition for model

# set up tf.placeholders for inputs x, and outputs y_
# these remain fixed during training and can have values fed to them during the session
with tf.name_scope("Placeholders"):
    x = tf.placeholder(tf.float32, shape=[None, 2], name="x")    # input layer
    y_ = tf.placeholder(tf.float32, shape=[None, 2], name="y_")  # output layer

# set up tf.Variables for the weights at each layer from l1 to l3, and set up feeding of initial values
# also set up mask as a variable and set it to be un-trainable
with tf.name_scope("Variables"):
    w_l1_values = [[0, 0.25], [0.2, 0.3]]
    w_l1 = tf.Variable(w_l1_values, name="w_l1")
    w_l2_values = [[0.4, 0.5], [0.45, 0.55]]
    w_l2 = tf.Variable(w_l2_values, name="w_l2")
    mask_values = [[0., 1.], [1., 1.]]
    mask = tf.Variable(mask_values, trainable=False, name="mask")

# link each set of weights as matrix multiplications in the graph. Include an elementwise multiplication by mask.
# Sequence takes us from inputs x to output final_out, which will be compared to labels fed to placeholder y_
l1_out = tf.nn.relu(tf.matmul(x, tf.multiply(w_l1, mask)), name="l1_out")
final_out = tf.nn.relu(tf.matmul(l1_out, w_l2), name="output")

## define loss function and training operation
with tf.name_scope("Loss"):
    # some loss defined as a function of graph output: final_out and labels: y_
    loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=final_out, labels=y_, name="loss")

with tf.name_scope("Train"):
    # some optimisation strategy, arbitrary learning rate
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001, name="optimizer_adam")
    train_op = optimizer.minimize(loss, name="train_op")

# create session, initialise variables and train according to inputs and corresponding labels
# This should show that the values of the first layer weights change, but the one set to 0 remains at 0
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    initial_l1_weights = sess.graph.get_tensor_by_name("Variables/w_l1:0")
    print(initial_l1_weights.eval())

    inputs = [[0.05, 0.10]]
    labels = [[0.01, 0.99]]
    ans = sess.run(train_op, feed_dict={"Placeholders/x:0": inputs, "Placeholders/y_:0": labels})

    train_steps = 1
    for i in range(train_steps):
        initial_l1_weights = sess.graph.get_tensor_by_name("Variables/w_l1:0")
        print(initial_l1_weights.eval())
Or use the answer provided by "today" above for a Keras-friendly option.
You have multiple options here.
First, you could use the dynamic masking approach from your example. I believe this will work as expected, since the gradients w.r.t. the masked-out parameters will be zero (the output is constant when you change the unused parameters). This approach is simple, and it can be used even when your mask is not constant during training.
Second, if you know beforehand which weights will be always zero, you can compose your weight matrix using tf.get_variable to get a submatrix, and then concatenate it with a tf.constant tensor, e.g.:
weights_sub = tf.get_variable("w", [dim_in, dim_out - 1])
zeros = tf.zeros([dim_in, 1])
weights = tf.concat([weights_sub, zeros], axis=1)
This example will make one column of your weight matrix always be zero.
Finally, if your mask is more complex, you can use tf.get_variable on a flattened vector and then compose a tf.SparseTensor with the variable values on the used indices:
weights_used = tf.get_variable("w", [num_used_vars])
indices = ... # get your indices in a 2-D matrix of shape [num_used_vars, 2]
dense_shape = tf.constant([dim_in, dim_out]) # this is the final shape of the weight matrix
weights = tf.SparseTensor(indices, weights_used, dense_shape)
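Note that a tf.SparseTensor can't be passed to tf.matmul directly. A hedged usage sketch (my addition, assuming the shapes above and indices in canonical row-major order):

# Computes x @ W for dense x of shape [batch, dim_in], as (W^T @ x^T)^T.
output = tf.transpose(
    tf.sparse_tensor_dense_matmul(weights, x, adjoint_a=True, adjoint_b=True))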
EDIT: This probably won't work in combination with Keras' set_weights method, as it expects Numpy arrays, not Tensors.

Having trouble understanding lstm use in tensorflow code sample

Why is the pred variable being calculated before any of the training iterations occur? I would expect that a pred would be generated (through the RNN() function) during each pass through the data on every iteration.
There must be something I am missing. Is pred something like a function object? I have looked at the docs for tf.matmul(), and it returns a tensor, not a function.
Full source: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/recurrent_network.py
Here is the code:
def RNN(x, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)
    # Unstack to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, n_steps, 1)
    # Define a lstm cell with tensorflow
    lstm_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Get lstm cell output
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

pred = RNN(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()
Tensorflow code has two distinct phases. First, you build a "dependency graph", which contains all of the operations that you will use. Note that during this phase you are not processing any data. Instead, you are simply defining the operations you want to occur. Tensorflow takes note of the dependencies between the operations.
For example, in order to compute the accuracy, you'll need to first compute correct_pred, and to compute correct_pred you'll need to first compute pred, and so on.
So all you have done in the code shown is to tell tensorflow what operations you want. You've saved those in a "graph" data structure (that's a tensorflow data structure that basically is a bucket that contains all the mathematical operations and tensors).
Later you will run operations on the data using calls to sess.run([ops], feed_dict={inputs}).
When you call sess.run notice that you have to tell it what you want from the graph. If you ask for accuracy:
sess.run(accuracy, feed_dict={inputs})
Tensorflow will try to compute accuracy. It will see that accuracy depends on correct_pred, so it will try to compute that, and so on through the dependency graph that you defined.
The error you're making is that you think pred in the code you listed is computing something. It's not. The line:
pred = RNN(x, weights, biases)
only defines the operation and its dependencies.
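A minimal self-contained sketch of the two phases (my own illustration, not from the example above):

import tensorflow as tf

# Phase 1: build the graph -- nothing is computed here.
a = tf.placeholder(tf.float32, name="a")
b = a * 2.0  # like `pred`, this line only defines an operation

# Phase 2: run the graph -- computation happens only now.
with tf.Session() as sess:
    print(sess.run(b, feed_dict={a: 3.0}))  # prints 6.0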
