I am working on an RNN controller that takes the current state of the plant as its input and generates the control signal as its output. After the control is executed, the updated plant state is fed back to the RNN as the input for the next time step. In this loop, the input sequence is built up step by step rather than being given all in advance.
For now, no training is involved; only the single-step forward simulation is needed.
So what I'm looking for is a TensorFlow RNN operation that computes this one-step RNN output.
I defined two kinds of input: input_data for the batch_size sequences of input, and input_single for the input of the current time step.
input_data = tf.placeholder(tf.float32, [batch_size, len_seq, 8])
input_single = tf.placeholder(tf.float32, [1, 1, 8])
action_gradient = tf.placeholder(tf.float32, [batch_size, len_seq, dimAction])
num_hidden = 24
cell = tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True)
state_single = cell.zero_state(batch_size, tf.float32)
(output_single, state_single) = cell(input_single, state_single)
weight = tf.Variable(tf.truncated_normal([num_hidden, dimAction]))
bias = tf.Variable(tf.constant(0.1, shape=[dimAction]))
y_single = tf.nn.tanh(tf.matmul(output_single, weight) + bias)
The network is read out in two ways: y_single for each time step, and y_seq for the whole minibatch of the input.
outputs, states = tf.nn.dynamic_rnn(cell, input_data, dtype=tf.float32)
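# outputs is [batch_size, len_seq, num_hidden]; tensordot contracts its last axis with weight
y_seq = tf.nn.tanh(tf.tensordot(outputs, weight, axes=1) + bias)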
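You can achieve this by simply calling your tf.nn.rnn_cell.LSTMCell object once. Make sure you pass arguments of the correct shape: for a single step the input is rank 2, [batch_size, input_size] (not [1, 1, 8]), and the state must have the same batch size. Something like this: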
cell = tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True)
input_single = tf.ones([batch_size, input_size])
state_single = cell.zero_state(batch_size, tf.float32)
(output_single, state_single) = cell(input_single, state_single)
Have a look at the documentation for RNNCell.__call__() for details on the expected shapes of input_single and state_single, in case you have a good reason not to use cell.zero_state().
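To make the closed loop explicit, here is a framework-free sketch of the pattern: the cell is stepped once per control cycle, its state is carried over to the next call, and the plant's new state becomes the next input. It assumes a plain tanh RNN cell instead of an LSTM and a made-up linear plant (`plant_step`); all weights and function names are hypothetical and only illustrate the data flow.

```python
import math
import random

def rnn_step(x, h, W_x, W_h, b):
    """One step of a plain tanh RNN cell: h_new = tanh(W_x x + W_h h + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_x[j], x))
            + sum(w * hi for w, hi in zip(W_h[j], h))
            + b[j]
        )
        for j in range(len(b))
    ]

def readout(h, W_y, b_y):
    """Map the hidden state to a bounded control signal in (-1, 1)."""
    return [math.tanh(sum(w * hi for w, hi in zip(W_y[k], h)) + b_y[k])
            for k in range(len(b_y))]

def plant_step(s, u):
    """Hypothetical plant: a stable linear system driven by the control u."""
    return [0.9 * si + 0.1 * u[0] for si in s]

def simulate(n_steps, state_dim=2, num_hidden=4, seed=0):
    rng = random.Random(seed)
    W_x = [[rng.gauss(0, 0.5) for _ in range(state_dim)] for _ in range(num_hidden)]
    W_h = [[rng.gauss(0, 0.5) for _ in range(num_hidden)] for _ in range(num_hidden)]
    b = [0.0] * num_hidden
    W_y = [[rng.gauss(0, 0.5) for _ in range(num_hidden)]]
    b_y = [0.1]

    s = [1.0] * state_dim       # initial plant state
    h = [0.0] * num_hidden      # zero RNN state, analogous to cell.zero_state()
    controls = []
    for _ in range(n_steps):
        h = rnn_step(s, h, W_x, W_h, b)  # single step; the state is reused next cycle
        u = readout(h, W_y, b_y)
        controls.append(u[0])
        s = plant_step(s, u)             # the new plant state is the next input
    return controls, s, h
```

In the TensorFlow version, the state returned by `cell(input_single, state_single)` plays the role of `h` here and is passed back in on the next control cycle.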
I have subclassed tf.keras.Model and I use tf.keras.layers.GRUCell in a for loop to compute sequences 'y_t' (n, timesteps, hidden_units) and final hidden states 'h_t' (n, hidden_units). For my loop to output 'y_t', I update a tf.Variable after each iteration of the loop. Calling the model with model(input) is not a problem, but when I fit the model with the for loop in the call method I get either a TypeError or a ValueError.
Please note that I cannot simply use tf.keras.layers.GRU because I am trying to implement this paper. Instead of just passing x_t to the next cell in the RNN, the paper performs some computation as a step in the for loop (they implement it in PyTorch) and passes the result of that computation to the RNN cell. They essentially end up doing this: h_t = f(special_x_t, h_t-1).
Please see the model below that causes the error:
class CustomGruRNN(tf.keras.Model):
    def __init__(self, batch_size, timesteps, hidden_units, features, **kwargs):
        # Inheritance
        super().__init__(**kwargs)
        # Args
        self.batch_size = batch_size
        self.timesteps = timesteps
        self.hidden_units = hidden_units
        # Stores y_t
        self.rnn_outputs = tf.Variable(tf.zeros(shape=(batch_size, timesteps, hidden_units)), trainable=False)
        # To be used in for loop in call
        self.gru_cell = tf.keras.layers.GRUCell(units=hidden_units)
        # Reshape to match input dimensions
        self.dense = tf.keras.layers.Dense(units=features)
    def call(self, inputs):
        """Inputs is rank-3 tensor of shape (n, timesteps, features) """
        # Initial state for gru cell
        h_t = tf.zeros(shape=(self.batch_size, self.hidden_units))
        for timestep in tf.range(self.timesteps):
            # Get the timestep of the inputs
            x_t = tf.gather(inputs, timestep, axis=1) # Same as x_t = inputs[:, timestep, :]
            # Compute outputs and hidden states
            y_t, h_t = self.gru_cell(x_t, h_t)
            # Update y_t at the t^th timestep
            self.rnn_outputs = self.rnn_outputs[:, timestep, :].assign(y_t)
        # Outputs need to have same last dimension as inputs
        outputs = self.dense(self.rnn_outputs)
        return outputs
An example that would throw the error:
# Arbitrary values for dataset
num_samples = 128
batch_size = 4
timesteps = 5
features = 10
# Arbitrary dataset
x = tf.random.uniform(shape=(num_samples, timesteps, features))
y = tf.random.uniform(shape=(num_samples, timesteps, features))
train_data = tf.data.Dataset.from_tensor_slices((x, y))
train_data = train_data.shuffle(batch_size).batch(batch_size, drop_remainder=True)
# Model with arbitrary hidden units
model = CustomGruRNN(batch_size, timesteps, hidden_units=5, features=features)
model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam())
When running eagerly:
model.fit(train_data, epochs=2, run_eagerly=True)
Epoch 1/2
WARNING:tensorflow:Gradients do not exist for variables
'stack_overflow_gru_rnn/gru_cell/bias:0'] when minimizing the loss.
ValueError: substring not found
When not running eagerly:
model.fit(train_data, epochs=2, run_eagerly=False)
Epoch 1/2
TypeError: in user code:
TypeError: Can not convert a NoneType into a Tensor or Operation.
While the TensorFlow guide answer suffices, I think my self-answered question involving custom cells for RNNs is a much better option. Please see this answer. Using a custom RNN cell removes the need for tf.transpose and tf.TensorArray, and thus lowers the complexity of the code while also improving its readability.
Original Self-Answer:
The DynamicRNN described near the bottom of TensorFlow's guide "Effective TensorFlow 2" solves my problem.
To expand briefly on the DynamicRNN's conceptual use: an RNN cell is defined (in my case a GRU), and then any number of custom steps can be defined within the tf.range loop. Per-step values should be collected in tf.TensorArray objects created outside the loop but inside the call method itself, and the sizes of these arrays can be determined from the shape of the input tensors. Notably, the DynamicRNN works with model.fit, whose default execution mode is graph mode rather than the slower eager execution mode.
Lastly, one might need a 'DynamicRNN' because, by default, the tf.keras.layers.GRU computation is loosely described by the following recurrent logic (assume that 'f' defines a GRU cell):
import numpy as np
import tensorflow as tf

# Numpy is used here for ease of indexing, but in general you should use
# tensors and transpose them accordingly (see the previously linked guide)
batch, total_timesteps, features, hidden_cell_units = 4, 5, 10, 8  # arbitrary example sizes
inputs = np.random.randn(batch, total_timesteps, features).astype(np.float32)
# The GRU cell function 'f'
f = tf.keras.layers.GRUCell(hidden_cell_units)
# List for tracking outputs -- just for simple demonstration... again please see the guide for more details
outputs = []
# Initialize the 'hidden state' (often referred to as h_naught and denoted h_0) of the RNN cell
state_at_t_minus_1 = tf.zeros(shape=(batch, hidden_cell_units))
# Iterate through the input until all timesteps in the sequence have been 'seen' by the GRU cell function 'f'
for timestep_t in range(total_timesteps):
    # This is of shape (batch, features)
    input_at_t = inputs[:, timestep_t, :]
    # output_at_t is (batch, hidden_cell_units); the new state is returned as a one-element list
    output_at_t, states_at_t = f(input_at_t, [state_at_t_minus_1])
    outputs.append(output_at_t)
    # When the loop restarts, this state will be used in the next GRU cell function call 'f'
    state_at_t_minus_1 = states_at_t[0]
One might wish to add other steps in the for loop of the recurrent logic (e.g., dense layers, other layers, etc.) to modify the inputs and states passed to the GRU cell function 'f'. This is one motivation for the DynamicRNN.
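A framework-free sketch of h_t = f(special_x_t, h_{t-1}): an extra computation (here a hypothetical `special_step` that rescales x_t using the previous hidden state) runs inside the loop before the cell, and per-step outputs are written into a preallocated list, loosely analogous to writing into a tf.TensorArray. The cell is a toy elementwise tanh RNN standing in for a GRU, and it assumes the input and hidden sizes are equal; none of these names come from TensorFlow.

```python
import math

def cell(x, h, w_x=0.5, w_h=0.8):
    """Toy elementwise tanh RNN cell (stand-in for the GRU cell 'f')."""
    return [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]

def special_step(x, h_prev):
    """Hypothetical extra computation performed before the cell at each step."""
    gate = 1.0 + sum(h_prev) / len(h_prev)
    return [gate * xi for xi in x]

def run(inputs, hidden_units):
    """inputs: list of timesteps, each a list of length hidden_units."""
    h = [0.0] * hidden_units
    outputs = [None] * len(inputs)           # preallocated, like a TensorArray of size T
    for t, x_t in enumerate(inputs):
        special_x_t = special_step(x_t, h)   # custom step inside the loop
        h = cell(special_x_t, h)             # h_t = f(special_x_t, h_{t-1})
        outputs[t] = h                       # write y_t for this step
    return outputs, h

outs, h_final = run([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], hidden_units=2)
```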
I'm new to PyTorch. I followed a tutorial on sentence generation with an RNN and I'm trying to modify it to generate sequences of positions, but I'm having trouble defining the correct model parameters, such as input_size, output_size, hidden_dim and batch_size.
I have 596 sequences of x, y positions, each looking like [[x1,y1],[x2,y2],...,[xn,yn]]. Each sequence represents the 2D path of a vehicle. I would like to train a model that, given a starting point (or a partial sequence), could generate one of these sequences.
- I have padded/truncated the sequences so that they all have length 50, meaning each sequence is an array of shape [50,2].
- I then divided this data into input_seq and target_seq:
input_seq: tensor of torch.Size([596, 49, 2]), containing all 596 sequences, each without its last position.
target_seq: tensor of torch.Size([596, 49, 2]), containing all 596 sequences, each without its first position.
The model class:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(Model, self).__init__()
        # Defining some parameters
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        # Defining the layers
        # RNN Layer
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_size)
    def forward(self, x):
        batch_size = x.size(0)
        # Initializing hidden state for first input using method defined below
        hidden = self.init_hidden(batch_size)
        # Passing in the input and hidden state into the model and obtaining outputs
        out, hidden = self.rnn(x, hidden)
        # Reshaping the outputs such that it can be fit into the fully connected layer
        out = out.contiguous().view(-1, self.hidden_dim)
        out = self.fc(out)
        return out, hidden
    def init_hidden(self, batch_size):
        # This method generates the first hidden state of zeros which we'll use in the forward pass
        # The tensor holding the hidden state is created on the same device as the model's parameters
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim, device=next(self.parameters()).device)
        return hidden
I instantiate the model with the following parameters:
- input_size of 2 (an [x,y] position)
- output_size of 2 (an [x,y] position)
- hidden_dim of 2 (an [x,y] position) (or should this be 50 as in the length of a full sequence?)
model = Model(input_size=2, output_size=2, hidden_dim=2, n_layers=1)
n_epochs = 100
# Define Loss, Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
# Training Run
for epoch in range(1, n_epochs + 1):
    optimizer.zero_grad() # Clears existing gradients from previous epoch
    output, hidden = model(input_seq)
    loss = criterion(output, target_seq.view(-1).long())
    loss.backward() # Does backpropagation and calculates gradients
    optimizer.step() # Updates the weights accordingly
    if epoch%10 == 0:
        print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
        print("Loss: {:.4f}".format(loss.item()))
When I run the training loop, it fails with this error:
ValueError Traceback (most recent call last)
<ipython-input-9-ad1575e0914b> in <module>
3 optimizer.zero_grad() # Clears existing gradients from previous epoch
4 output, hidden = model(input_seq)
----> 5 loss = criterion(output, target_seq.view(-1).long())
6 loss.backward() # Does backpropagation and calculates gradients
7 optimizer.step() # Updates the weights accordingly
ValueError: Expected input batch_size (29204) to match target batch_size (58408).
I tried modifying input_size, output_size, hidden_dim and batch_size and reshaping the tensors, but the more I try the more confused I get. Could someone point out what I am doing wrong?
Furthermore, since batch size is defined as x.size(0) in Model.forward(self,x), this means I only have a single batch of size 596 right? What would be the correct way to have multiple smaller batches?
The output has size [batch_size * seq_len, 2] = [29204, 2], while the flattened target_seq has size [batch_size * seq_len * 2] = [58408]. Although both contain the same total number of elements, they have a different number of dimensions, so their first dimensions are not identical.
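The mismatch can be checked with plain arithmetic on the shapes from the question (596 sequences of 49 steps with 2 coordinates each). This only multiplies sizes with the standard library; it does not call PyTorch.

```python
from math import prod

batch_size, seq_len, n_coords = 596, 49, 2

# Model output after out.view(-1, hidden_dim) and the fc layer: [batch * seq, 2]
output_shape = (batch_size * seq_len, n_coords)
# target_seq.view(-1) flattens everything: [batch * seq * 2]
target_shape = (batch_size * seq_len * n_coords,)

print(output_shape, target_shape)                 # (29204, 2) (58408,)
print(prod(output_shape) == prod(target_shape))   # True: same element count
print(output_shape[0] == target_shape[0])         # False: first dimensions differ
```

The loss compares the first dimensions (29204 vs 58408), which is exactly the number pair in the error message.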
Regardless of the dimension mismatch, nn.CrossEntropyLoss is a classification loss: it expects the output to contain a score for each class and the target to contain class indices. You don't have any classes; you are trying to predict coordinates, which are continuous values. For this you need a regression loss function, such as nn.MSELoss, which computes the squared error between the predicted and target coordinates.
criterion = nn.MSELoss()
# .flatten() does the same thing as .view(-1) but is more descriptive
loss = criterion(output.flatten(), target_seq.flatten())
The flattening can be avoided altogether, because the loss functions, as well as the linear layer, can operate on multidimensional inputs. This removes the risk of getting lost while flattening and restoring the dimensions, and the output is easier to inspect or use later outside of training. For the linear layer, only the last dimension of the input needs to match the in_features of nn.Linear, which is hidden_dim in your case.
def forward(self, x):
    batch_size = x.size(0)
    # Initializing hidden state for first input using method defined below
    hidden = self.init_hidden(batch_size)
    # Passing in the input and hidden state into the model and obtaining outputs
    # out size: [batch_size, seq_len, hidden_dim]
    out, hidden = self.rnn(x, hidden)
    # out size: [batch_size, seq_len, output_size]
    out = self.fc(out)
    return out, hidden
Now the output of the model has the same size as the target_seq and you can directly call the loss function without flattening:
loss = criterion(output, target_seq)
hidden_dim of 2 (an [x,y] position) (or should this be 50 as in the length of a full sequence?)
The hidden_dim is not an [x, y] pair and is completely unrelated to both input_size and output_size. It defines the number of hidden features of the RNN, which roughly corresponds to its capacity: bigger sizes potentially have more room to retain essential information, but they also require more computation. There is no perfect hidden size; it largely depends on the use case. You can experiment with different sizes, e.g. 100, 256, etc., and see whether that improves your results.
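To see how hidden_dim drives model size independently of input_size and output_size, the parameter count of a single-layer nn.RNN plus the nn.Linear head can be computed by hand: nn.RNN stores weight_ih_l0 [hidden, input], weight_hh_l0 [hidden, hidden] and two bias vectors of size hidden, and nn.Linear stores an [output, hidden] weight and an output-sized bias. This sketch only does that arithmetic and assumes the default bias=True.

```python
def rnn_param_count(input_size, hidden_dim, output_size):
    """Parameters of nn.RNN(input_size, hidden_dim, 1) + nn.Linear(hidden_dim, output_size)."""
    rnn = hidden_dim * input_size + hidden_dim * hidden_dim + 2 * hidden_dim
    fc = output_size * hidden_dim + output_size
    return rnn + fc

for h in (2, 100, 256):
    print(h, rnn_param_count(2, h, 2))
# 2 18
# 100 10602
# 256 67074
```

The hidden-to-hidden term grows quadratically with hidden_dim, which is where most of the extra computation comes from. In PyTorch the same number is given by `sum(p.numel() for p in model.parameters())`.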
Furthermore, since batch size is defined as x.size(0) in Model.forward(self,x), this means I only have a single batch of size 596 right? What would be the correct way to have multiple smaller batches?
Yes, you only have a single batch of size 596. If you want to use smaller batches, for example because the data no longer fits into memory with a more complex model, you could simply take slices of it, but it is better to use PyTorch's data utilities: torch.utils.data.TensorDataset to get a dataset in which each input sequence has a corresponding target, in combination with torch.utils.data.DataLoader to create the batches for you.
from torch.utils.data import DataLoader, TensorDataset
# Match each sequence of the input_seq to the corresponding target_seq.
# e.g. dataset[0] == (input_seq[0], target_seq[0])
dataset = TensorDataset(input_seq, target_seq)
# Randomly shuffle the data and load it in batches of 16
data_loader = DataLoader(dataset, batch_size=16, shuffle=True)
# Process one batch at a time
for inputs, target in data_loader:
    optimizer.zero_grad()
    output, hidden = model(inputs)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
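Roughly what DataLoader(dataset, batch_size=16, shuffle=True) does with its default drop_last=False, as a framework-free sketch: shuffle the indices once per epoch, then cut them into consecutive chunks. This illustrates the batching logic only; it is not DataLoader's actual implementation.

```python
import random

def make_batches(n_samples, batch_size, shuffle=True, seed=None):
    """Return a list of index batches covering every sample exactly once."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

batches = make_batches(596, 16, seed=0)
print(len(batches), len(batches[-1]))   # 38 4
```

With 596 sequences this gives 37 full batches of 16 plus a final batch of 4; passing drop_last=True to DataLoader would discard that last partial batch.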
I am building a multi-layer RNN with the same setup as in the question "Outputs and State of MultiRNNCell in Tensorflow" (using MultiRNNCell to wrap the cells and then dynamic_rnn to run them).
As described in that question, dynamic_rnn returns
outputs, state = tf.nn.dynamic_rnn(...)
I guess the outputs only come from the top layer (because the shape is batch_size x steps x state_size). However, state returns the final state of each layer (a tuple with num_layers elements, each containing the last state of that layer).
(1) Is there a simple way to access the outputs at all time steps for each layer (not just the last layer returned by dynamic_rnn) without running a one-step RNN recursively and reading the state at each step?
(2) Are the returned outputs those of the last (top) layer?
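Based on the documentation of tf.nn.rnn_cell.MultiRNNCell, you should be safe building the stack yourself: call tf.nn.dynamic_rnn once per layer and feed each layer's outputs into the next one, so the outputs of every layer are available: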
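import tensorflow as tf  # TensorFlow 1.x API
# X is your input tensor/placeholder of shape [batch, steps, features]
cell_1 = tf.nn.rnn_cell.GRUCell(7, name="gru1")
cell_2 = tf.nn.rnn_cell.GRUCell(7, name="gru2")
# Separate variable scopes so the two dynamic_rnn calls don't collide
outputs_1, states_1 = tf.nn.dynamic_rnn(cell_1, X, dtype=tf.float32, scope="layer1")
outputs_2, states_2 = tf.nn.dynamic_rnn(cell_2, outputs_1, dtype=tf.float32, scope="layer2")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Fetch both layers' outputs in one run; X_batch is a hypothetical NumPy array fed into X
    first_layer_outputs, second_layer_outputs = sess.run([outputs_1, outputs_2], feed_dict={X: X_batch})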
As for the outputs returned by tf.nn.dynamic_rnn, they are indeed from the top layer if the cell provided is tf.nn.rnn_cell.MultiRNNCell.
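The idea behind chaining the layers by hand can be shown without TensorFlow: each layer consumes the full output sequence of the layer below, so keeping every layer's sequence gives the per-time-step outputs of all layers, and the last one is what a stacked cell returns as outputs. The toy elementwise tanh cell and its weights are assumptions for illustration only; for such a cell (as for a GRU) the output equals the state, whereas an LSTM state would also include the cell memory.

```python
import math

def run_layer(sequence, w_x, w_h):
    """Run a toy elementwise tanh RNN over a sequence; return the output at every step."""
    h = [0.0] * len(sequence[0])
    outputs = []
    for x_t in sequence:
        h = [math.tanh(w_x * x + w_h * hp) for x, hp in zip(x_t, h)]
        outputs.append(h)
    return outputs

def run_stack(sequence, layer_weights):
    """Feed each layer's full output sequence into the next; keep all of them."""
    per_layer = []
    current = sequence
    for w_x, w_h in layer_weights:
        current = run_layer(current, w_x, w_h)
        per_layer.append(current)
    return per_layer

x = [[0.1, -0.2], [0.3, 0.0], [0.5, 0.4]]      # 3 time steps, 2 features
layers = run_stack(x, [(0.9, 0.5), (1.1, 0.3)])
top_outputs = layers[-1]                        # what the stacked RNN returns as outputs
final_states = [seq[-1] for seq in layers]      # last state of each layer
```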
My current LSTM network looks like this.
rnn_cell = tf.contrib.rnn.BasicRNNCell(num_units=CELL_SIZE)
init_s = rnn_cell.zero_state(batch_size=1, dtype=tf.float32) # very first hidden state
outputs, final_s = tf.nn.dynamic_rnn(
    rnn_cell,              # cell you have chosen
    tf_x,                  # input
    initial_state=init_s,  # the initial hidden state
    time_major=False,      # False: (batch, time step, input); True: (time step, batch, input)
)
# reshape 3D output to 2D for fully connected layer
outs2D = tf.reshape(outputs, [-1, CELL_SIZE])
net_outs2D = tf.layers.dense(outs2D, INPUT_SIZE)
# reshape back to 3D
outs = tf.reshape(net_outs2D, [-1, TIME_STEP, INPUT_SIZE])
Usually, I apply tf.layers.batch_normalization for batch normalization, but I am not sure whether it works in an LSTM network.
b1 = tf.layers.batch_normalization(outputs, momentum=0.4, training=True)
d1 = tf.layers.dropout(b1, rate=0.4, training=True)
# reshape 3D output to 2D for fully connected layer
outs2D = tf.reshape(d1, [-1, CELL_SIZE])
net_outs2D = tf.layers.dense(outs2D, INPUT_SIZE)
# reshape back to 3D
outs = tf.reshape(net_outs2D, [-1, TIME_STEP, INPUT_SIZE])
If you want to use batch norm for an RNN (LSTM or GRU), you can check out this implementation, or read the full description in the blog post.
However, layer normalization has advantages over batch norm for sequence data. Specifically, "the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent networks" (from Ba et al., "Layer Normalization").
Layer normalization normalizes the summed inputs across the hidden units within a layer, separately for each training case, so it does not depend on the mini-batch. You can check out the implementation of layer normalization for a GRU cell:
Based on this paper: "Layer Normalization" - Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
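As a minimal sketch of what layer normalization computes: the mean and variance are taken over the hidden units of a single example (at a single time step), so a row's result does not change when the rest of the batch changes, unlike batch norm, which averages over the batch. The gain and bias stand for the learnable scale and shift, and eps is a small constant; the default values here are arbitrary.

```python
import math

def layer_norm(a, gain=None, bias=None, eps=1e-5):
    """Normalize one example's summed inputs `a` across its hidden units."""
    n = len(a)
    mu = sum(a) / n
    var = sum((ai - mu) ** 2 for ai in a) / n
    gain = gain if gain is not None else [1.0] * n
    bias = bias if bias is not None else [0.0] * n
    return [g * (ai - mu) / math.sqrt(var + eps) + b for ai, g, b in zip(a, gain, bias)]

# Each row is normalized on its own, so adding or removing other rows
# (i.e. changing the batch) does not change a row's result.
batch = [[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, -10.0, 5.0]]
normed = [layer_norm(row) for row in batch]
```

With the default gain and bias, each normalized row has mean 0 and variance close to 1 (slightly below 1 because of eps).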
TensorFlow now comes with tf.contrib.rnn.LayerNormBasicLSTMCell, an LSTM unit with layer normalization and recurrent dropout.
Find the documentation here.
I'm trying to build an LSTM RNN that handles 3D data in TensorFlow. According to this paper, Grid LSTM RNNs can be n-dimensional. The idea for my network is to have a 3D volume [depth, x, y], and the network should be [depth, x, y, n_hidden], where n_hidden is the number of LSTM cell recursive calls. The idea is that each pixel gets its own "string" of LSTM recursive calls.
The output should be [depth, x, y, n_classes]. I'm doing a binary segmentation -- think foreground and background, so the number of classes is just 2.
# Network Parameters
n_depth = 5
n_input_x = 200 # volume size along x
n_input_y = 200
n_hidden = 128 # hidden layer num of features
n_classes = 2
# tf Graph input
x = tf.placeholder("float", [None, n_depth, n_input_x, n_input_y])
y = tf.placeholder("float", [None, n_depth, n_input_x, n_input_y, n_classes])
# Define weights
weights = {}
biases = {}
# Initialize weights
for i in xrange(n_depth * n_input_x * n_input_y):
    weights[i] = tf.Variable(tf.random_normal([n_hidden, n_classes]))
    biases[i] = tf.Variable(tf.random_normal([n_classes]))
def RNN(x, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_input_y, n_input_x)
    # Permuting batch_size and n_input_y
    x = tf.reshape(x, [-1, n_input_y, n_depth * n_input_x])
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_input_y*batch_size, n_input_x)
    x = tf.reshape(x, [-1, n_input_x * n_depth])
    # Split to get a list of 'n_input_y' tensors of shape (batch_size, n_hidden)
    # This input shape is required by `rnn` function
    x = tf.split(0, n_depth * n_input_x * n_input_y, x)
    # Define a lstm cell with tensorflow
    lstm_cell = grid_rnn_cell.GridRNNCell(n_hidden, input_dims=[n_depth, n_input_x, n_input_y])
    # lstm_cell = rnn_cell.MultiRNNCell([lstm_cell] * 12, state_is_tuple=True)
    # lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=0.8)
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    # pdb.set_trace()
    output = []
    for i in xrange(n_depth * n_input_x * n_input_y):
        #I'll need to do some sort of reshape here on outputs[i]
        output.append(tf.matmul(outputs[i], weights[i]) + biases[i])
    return output
pred = RNN(x, weights, biases)
pred = tf.transpose(tf.pack(pred),[1,0,2])
pred = tf.reshape(pred, [-1, n_depth, n_input_x, n_input_y, n_classes])
# pdb.set_trace()
temp_pred = tf.reshape(pred, [-1, n_classes])
temp_y = tf.reshape(y, [-1, n_classes])
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(temp_pred, temp_y))
Currently I'm getting the error: TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
It occurs at the RNN initialization: outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
x, of course, is of type float32.
I am unable to tell what type GridRNNCell returns; any help here? This could be the issue. Should I be passing more arguments to it? input_dims makes sense, but what should output_dims be?
Is this a bug in the contrib code?
GridRNNCell is located in contrib/grid_rnn/python/ops/grid_rnn_cell.py
I was unsure about some of the implementation decisions in the code, so I decided to roll my own. One thing to keep in mind is that this is an implementation of just the cell. It is up to you to build the actual machinery that handles the locations and interactions of the h and m vectors; it isn't as simple as passing in your data and expecting it to traverse the dimensions properly.
So for example, if you are working in two dimensions, start with the top left block, take the incoming x and y vectors, concat them together, then use your cell to compute the output (which includes outgoing vectors for both x and y); and it is up to you to store the output for later use in neighboring blocks. Pass those outputs individually to each corresponding dimension, and in each of those neighboring blocks, concat the incoming vectors (again, for each dimension) and compute the output for the neighboring blocks. To do this, you'll need two for-loops, one for each dimension.
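The two-loop traversal described above can be sketched without TensorFlow. The `cell` here is a hypothetical stand-in: it mixes the concatenated incoming vectors with the block's scalar input through tanh and returns one outgoing vector per dimension. A real Grid LSTM cell would also carry m (memory) vectors. The point is the bookkeeping: each block reads the outgoing x vector of its left neighbour and the outgoing y vector of the block above, with zeros along the grid edges, and stores its own outputs for the blocks to its right and below.

```python
import math

def cell(incoming, block_input, size):
    """Hypothetical cell: returns one outgoing vector per dimension (x, y)."""
    s = sum(incoming) / len(incoming) + block_input
    out_x = [math.tanh(s + 0.1 * k) for k in range(size)]
    out_y = [math.tanh(s - 0.1 * k) for k in range(size)]
    return out_x, out_y

def traverse(grid, size=3):
    rows, cols = len(grid), len(grid[0])
    zeros = [0.0] * size
    h_x = [[None] * cols for _ in range(rows)]   # outgoing along x, kept for the right neighbour
    h_y = [[None] * cols for _ in range(rows)]   # outgoing along y, kept for the block below
    for i in range(rows):                         # loop over the y dimension
        for j in range(cols):                     # loop over the x dimension
            from_left = h_x[i][j - 1] if j > 0 else zeros
            from_above = h_y[i - 1][j] if i > 0 else zeros
            incoming = from_left + from_above     # concat the incoming vectors
            h_x[i][j], h_y[i][j] = cell(incoming, grid[i][j], size)
    return h_x, h_y

hx, hy = traverse([[0.0, 1.0], [2.0, 3.0]])
```

Because the blocks are visited row by row from the top left, every neighbour a block depends on has already been computed when it is reached.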
Perhaps the version in contrib will work for this, but I have a couple of problems with it (I could be wrong here, but as far as I can tell):
1) The vectors are handled using concat and slice rather than with tuples. This will likely result in slower performance.
2) It looks like the input is projected at each step, which doesn't sit well with me. In the paper they only project into the network for incoming blocks along the edge of the grid and not throughout.
If you look at the code, it is actually very simple. Perhaps reading the paper and making adjustments to the code as needed, or rolling your own are your best bet. And remember that the cell is only good for performing the recurrence at each step, and not for managing the incoming and outgoing h and m vectors.
Which version of the Grid LSTM cell are you using?
If you are using https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/rnn_cell.py
I think you can try to initialize 'feature_size' and 'frequency_skip'.
Also, I think there may be another bug: feeding a dynamic shape into this version may cause a TypeError.
Yes, dynamic shape was the cause. There is a PR to fix this: https://github.com/tensorflow/tensorflow/pull/4631
@jstaker7: Thank you for trying it out. Re. problem 1, the above PR uses tuples for states and outputs; hopefully that addresses the performance issue. GridRNNCell was created some time ago, when all the LSTMCells in TensorFlow were using concat/slice instead of tuples.
Re. problem 2, GridRNNCell will not project the input if you pass None. A dimension can be both input and recurrent, and when there is no input (inputs = None), it uses the recurrent tensors for the computation. We can also use 2 input dimensions by instantiating GridRNNCell directly.
Of course, writing a generic class for all cases makes the code look a bit convoluted, and I think it needs better documentation.
Anyway, it would be great if you could share your improvements, or any ideas you might have to make it clearer or more useful. That is the nature of an open-source project, after all.