ValueError: ConvLSTMCell and dynamic_rnn

I'm trying to build a seq2seq model in tensorflow (1.4) using the tf.contrib.rnn.ConvLSTMCell API together with the tf.nn.dynamic_rnn API, but I got an error with the dimension of the inputs.
My code is:
# features is an image sequence with shape [600, 400, 10],
# so features is a tensor with shape [batch_size, 600, 400, 10]
features = tf.transpose(features, [0, 3, 1, 2])
features = tf.reshape(features, [params['batch_size'], 10, 600, 400])
encoder_cell = tf.contrib.rnn.ConvLSTMCell(conv_ndims=2,
                                           input_shape=[600, 400, 1],
                                           output_channels=5,
                                           kernel_shape=[7, 7],
                                           skip_connection=False)
_, encoder_state = tf.nn.dynamic_rnn(cell=encoder_cell,
                                     inputs=features,
                                     sequence_length=[10] * params['batch_size'],
                                     dtype=tf.float32)
I get the following error
ValueError: Conv Linear expects all args to be of same Dimension: [[2, 600, 400], [2, 600, 400, 5]]
Looking at the TF implementation, it seems that the input to dynamic_rnn is only 3-dimensional, in contrast to the hidden state, which is 4-dimensional. I tried passing the input as a nested tuple, but it didn't work.
The problem is similar to "TensorFlow dynamic_rnn regressor: ValueError dimension mismatch", but slightly different, as that question uses a plain LSTMCell (which worked for me).
Can anyone give me a minimal example of how to use these two APIs together?
Thanks!

As I understand from https://github.com/iwyoo/ConvLSTMCell-tensorflow/issues/2, tf.nn.dynamic_rnn currently doesn't support ConvLSTMCell.
Therefore, as described in https://github.com/iwyoo/ConvLSTMCell-tensorflow/issues/1, you have to create the RNN manually.
An example is provided in the documentation: https://github.com/iwyoo/ConvLSTMCell-tensorflow/blob/master/README.md
Below I have modified your code according to that example, with comments where necessary.
import tensorflow as tf

height = 400
width = 400
time_steps = 25
channel = 10
batch_size = 2
p_input = tf.placeholder(tf.float32, [None, height, width, time_steps, channel])
p_label = tf.placeholder(tf.float32, [None, height, width, 3])
# Split along the time axis (axis 3): a list of length time_steps whose
# elements have shape (?, 400, 400, 1, 10).
p_input_list = tf.split(p_input, time_steps, 3)
# Remove the singleton time axis, so each element has shape (?, 400, 400, 10).
p_input_list = [tf.squeeze(p_input_, [3]) for p_input_ in p_input_list]
# ConvLSTMCell definition
cell = tf.contrib.rnn.ConvLSTMCell(conv_ndims=2,
                                   input_shape=[height, width, channel],
                                   output_channels=5,
                                   kernel_shape=[7, 7],
                                   skip_connection=False)
state = cell.zero_state(batch_size, dtype=tf.float32)  # initial state is zero
# Create the RNN with an explicit loop, as you would for BasicLSTMCell.
with tf.variable_scope("ConvLSTM") as scope:
    for i, p_input_ in enumerate(p_input_list):
        if i > 0:
            scope.reuse_variables()
        # The cell takes a tensor of shape [batch_size, height, width, channel].
        t_output, state = cell(p_input_, state)
Note that you have to input an image whose height and width are the same. If your height and width don't match, you may have to pad the image.
Hope this helps.

In the meantime I figured out how to use the two APIs together. The trick is to pass a 5-D tensor as input to tf.nn.dynamic_rnn(), where the last dimension is the size of the "vector on the spatial grid" (which comes from the transformation of the input from 2D to 3D, inspired by the paper on which the implementation is based: https://arxiv.org/pdf/1506.04214.pdf). In my case the vector size is 1, but I still have to expand the dimension.
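To make the shape bookkeeping concrete, here is a minimal NumPy sketch of the reshaping described above. The small sizes are hypothetical stand-ins for the 600 x 400 images in the question, and NumPy is used only so the shapes can be checked without building a graph; in the TF 1.x graph the corresponding calls would presumably be tf.transpose and tf.expand_dims.

```python
import numpy as np

# Hypothetical small sizes so the sketch runs quickly; the question uses
# height=600, width=400 and 10 time steps.
batch_size, height, width, time_steps = 2, 6, 4, 10

# Image sequence stored as [batch, height, width, time], as in the question.
features = np.zeros((batch_size, height, width, time_steps), dtype=np.float32)

# Move the time axis next to the batch axis: [batch, time, height, width].
features = np.transpose(features, (0, 3, 1, 2))

# Append a channel axis of size 1: [batch, time, height, width, channels].
features = np.expand_dims(features, axis=-1)

print(features.shape)        # (2, 10, 6, 4, 1)
print(features[:, 0].shape)  # one time step: (2, 6, 4, 1)
```

Each time step then has shape [batch, height, width, 1], which lines up with the input_shape=[600, 400, 1] passed to ConvLSTMCell in the question.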
While fixing this error another issue emerged: In the paper mentioned above in section 3.1 they state the equations for the convLSTM. They use the Hadamard-product for weights connected to the cell outputs C. Printing the weights of my ConvLSTMCell in Tensorflow, it seems like they don't use the weights Wci, Wcf and Wco at all. So, can anybody tell me the exact implementation of the TF ConvLSTMCell?
By the way, is the output of the TensorFlow ConvLSTMCell C or H (in the notation of the paper)?

Related

How to determine weight dimension for input tensor of Rank-3?

I'm trying to design an Autoencoder for activity classification for 3-channel input (Tri-axial accelerometer data).
The input tensor is of shape [None,200,3] ([Batch size, window size, number of channels]) and in the first layer, I want to simply reduce the dimension of input layer to [None,150,3]. Here is the code for creating placeholders and the first layer:
import tensorflow as tf
def denseLayer(inputVal,weight,bias):
    return tf.nn.relu((tf.matmul(inputVal,weight)+bias))
x = tf.placeholder(dtype=tf.float32,shape=[None,200,3]) #Input tensor
wIn = tf.get_variable(name='wIn',initializer=tf.truncated_normal(stddev=0.1,dtype=tf.float32,shape=[200,150]))
bIn = tf.get_variable(name='bIn',initializer=tf.constant(value = 0.1,shape=[150,3],dtype=tf.float32))
firstLayer = denseLayer(x,weight=wIn,bias=bIn)
This code will, of course, result in an error (due to the difference in rank between x and wIn), and I am unable to determine what shape wIn needs in order to get the desired firstLayer shape of [None,150,3].
Here is how the final network should look (a simplified version with fewer layers):

I think this does what you want:
import tensorflow as tf

def denseLayer(inputVal, weight, bias):
    # Each input "channel" uses the corresponding set of weights
    value = tf.einsum('nic,ijc->njc', inputVal, weight) + bias
    return tf.nn.relu(value)

# Input tensor
x = tf.placeholder(dtype=tf.float32, shape=[None, 200, 3])
# Weights and biases have three "channels" each
wIn = tf.get_variable(name='wIn',
                      shape=[200, 150, 3],
                      initializer=tf.truncated_normal_initializer(stddev=0.1))
bIn = tf.get_variable(name='bIn',
                      shape=[150, 3],
                      initializer=tf.constant_initializer(value=0.1))
firstLayer = denseLayer(x, weight=wIn, bias=bIn)
print(firstLayer)
# Tensor("Relu:0", shape=(?, 150, 3), dtype=float32)
Here wIn can be seen as three sets of [200, 150] parameters that are applied to each input channel. I think tf.einsum is the easiest way to implement that in this case.
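If the einsum subscripts look opaque, the following NumPy sketch checks that 'nic,ijc->njc' gives the same result as a separate matrix multiplication per channel. The batch size of 4 and the random values are made up for illustration, and the ReLU is left out because it does not affect the comparison.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(4, 200, 3)    # [batch, window, channels]
w = rng.randn(200, 150, 3)  # one [200, 150] weight matrix per channel
b = np.full((150, 3), 0.1)  # one bias vector of length 150 per channel

# Contract over the window axis i, keeping the channel axis c separate.
out = np.einsum('nic,ijc->njc', x, w) + b

# Reference: an ordinary matmul for each channel, stacked on the last axis.
ref = np.stack([x[:, :, c] @ w[:, :, c] for c in range(3)], axis=-1) + b

print(out.shape)
print(np.allclose(out, ref))
```

Because c appears in both inputs and in the output, the channels never mix; each one is reduced from 200 to 150 values with its own weights.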

Tensorflow shape inference static RNN compiler error

I am working on OCR software optimized for phone camera images.
Currently, each 300 x 1000 x 3 (RGB) image is reformatted as a 900 x 1000 numpy array. I have plans for a more complex model architecture, but for now I just want to get a baseline working. I want to get started by training a static RNN on the data that I've generated.
Formally, I am feeding in n_t at each timestep t for T timesteps, where n_t is a 900-vector and T = 1000 (similar to reading the whole image left to right). Here is the Tensorflow code in which I create batches for training:
sequence_dataset = tf.data.Dataset.from_generator(example_generator,
                                                  (tf.int32, tf.int32))
sequence_dataset = sequence_dataset.batch(experiment_params['batch_size'])
iterator = sequence_dataset.make_initializable_iterator()
x_batch, y_batch = iterator.get_next()
The tf.nn.static_bidirectional_rnn documentation claims that the input must be a "length T list of inputs, each a tensor of shape [batch_size, input_size], or a nested tuple of such elements." So, I go through the following steps in order to get the data into the correct format.
# Dimensions go from [batch, n , t] -> [t, batch, n]
x_batch = tf.transpose(x_batch, [2, 0, 1])
# Unpack such that x_batch is a length T list with element dims [batch_size, n]
x_batch = tf.unstack(x_batch, experiment_params['example_t'], 0)
Without altering the batch any further, I make the following call:
output, _, _ = tf.nn.static_rnn(lstm_fw_cell, x_batch, dtype=tf.int32)
Note that I do not explicitly tell TensorFlow the dimensions of the matrices (this could be the problem). They all have the same dimensionality, yet I am getting the following error:
ValueError: Input size (dimension 0 of inputs) must be accessible via shape inference, but saw value None.
At which point in my stack should I be declaring the dimensions of my input? Because I am using a Dataset and hoping to get its batches directly to the RNN, I am not sure that the "placeholder -> feed_dict" route makes sense. If that in fact is the method that makes the most sense, let me know what that looks like (I definitely do not know). Otherwise, let me know if you have any other insights to the problem. Thanks!

The reason for the absence of static shape information is that TensorFlow doesn't understand enough about the example_generator function to determine the shapes of the arrays it yields, and so it assumes the shapes can be completely different from one element to the next. The best way to constrain this is to specify the optional output_shapes argument to tf.data.Dataset.from_generator(), which accepts a nested structure of shapes matching the structure of the yielded elements (and the output_types argument).
In this case you'd pass a tuple of two shapes, which can be partially specified. For example, if the x elements are 900 x 1000 arrays and the y elements are scalars:
sequence_dataset = tf.data.Dataset.from_generator(
    example_generator, (tf.int32, tf.int32),
    output_shapes=([900, 1000], []))

Tensorflow convolution

I'm trying to perform a convolution (conv2d) on images of variable dimensions. I have those images in the form of a 1-D array and I want to perform a convolution on them, but I'm having a lot of trouble with the shapes.
This is my code of the conv2d:
tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
where x is the input image.
The error is:
ValueError: Shape must be rank 4 but is rank 1 for 'Conv2D' (op: 'Conv2D') with input shapes: [1], [5,5,1,32].
I think I need to reshape x, but I don't know the right dimensions. When I try this code:
x = tf.reshape(self.x, shape=[-1, 5, 5, 1]) # example
I get this:
ValueError: Dimension size must be evenly divisible by 25 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [1], [4] and with input tensors computed as partial shapes: input[1] = [?,5,5,1].

You can't use conv2d with a tensor of rank 1. Here's the description from the doc:
Computes a 2-D convolution given 4-D input and filter tensors.
These four dimensions are [batch, height, width, channels] (as Engineero already wrote).
If you don't know the dimensions of the image in advance, TensorFlow lets you declare a dynamic shape:
x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
with tf.Session() as session:
    # data is your own 4-D array of images with 3 channels
    print(session.run(x, feed_dict={x: data}))
In this example, a 4-D tensor x is created, but only the number of channels is known statically (3); everything else is determined at runtime. So you can pass this x into conv2d even though its size is dynamic.
But there's another problem. You didn't describe your task, but if you're building a convolutional neural network, I'm afraid you'll need to know the size of the input to determine the size of the FC layer after all the pooling operations; this size must be static. If this is the case, I think the best solution is to scale your inputs to a common size before passing them into the convolutional network.
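To illustrate why the FC layer size depends on the input size, here is a small sketch of the arithmetic. It assumes stride-1 convolutions with SAME padding (which keep height and width unchanged) and 2x2 max pooling with stride 2 and SAME padding (which halves each side, rounding up); the helper name flattened_size and the layer counts are made up for the example.

```python
import math

def flattened_size(height, width, channels, num_pools):
    """Number of features entering the FC layer after `num_pools`
    stride-2 poolings with SAME padding (each one rounds up)."""
    for _ in range(num_pools):
        height = math.ceil(height / 2)
        width = math.ceil(width / 2)
    return height * width * channels

# Two different input sizes give two different FC input sizes.
print(flattened_size(28, 28, 32, 2))   # 7 * 7 * 32 = 1568
print(flattened_size(50, 178, 32, 2))  # 13 * 45 * 32 = 18720
```

The weight matrix of the FC layer has one row per flattened feature, so it cannot be shared between images of different sizes.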
UPD:
Since it wasn't clear, here's how you can reshape any image into a 4-D array.
import numpy as np
a = np.zeros([50, 178, 3])
shape = a.shape
print(shape)  # prints (50, 178, 3)
a = a.reshape([1] + list(shape))
print(a.shape)  # prints (1, 50, 178, 3)
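If the image really arrives as a flat 1-D array, as in the question, the same idea applies as long as the height, width, and number of channels are known from somewhere else. This is a minimal sketch assuming the pixels are stored in row-major order; the sizes are hypothetical.

```python
import numpy as np

# Hypothetical dimensions; they must come from your data source,
# since a flat array does not carry them.
height, width, channels = 50, 178, 3
flat = np.arange(height * width * channels, dtype=np.float32)  # 1-D "image"

# Add the batch axis and restore the spatial layout in one step.
image = flat.reshape(1, height, width, channels)
print(image.shape)  # (1, 50, 178, 3)

# Row-major order: the first pixel of row 1 sits width * channels
# elements into the flat array.
print(image[0, 1, 0, 0] == flat[width * channels])  # True
```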

Tensorflow Grid LSTM RNN TypeError

I'm trying to build an LSTM RNN that handles 3D data in TensorFlow. According to this paper, Grid LSTM RNNs can be n-dimensional. The idea for my network is to have a 3D volume [depth, x, y], and the network should be [depth, x, y, n_hidden], where n_hidden is the number of LSTM cell recursive calls. The idea is that each pixel gets its own "string" of LSTM recursive calls.
The output should be [depth, x, y, n_classes]. I'm doing a binary segmentation -- think foreground and background, so the number of classes is just 2.
# Network Parameters
n_depth = 5
n_input_x = 200 # MNIST data input (img shape: 28*28)
n_input_y = 200
n_hidden = 128 # hidden layer num of features
n_classes = 2
# tf Graph input
x = tf.placeholder("float", [None, n_depth, n_input_x, n_input_y])
y = tf.placeholder("float", [None, n_depth, n_input_x, n_input_y, n_classes])
# Define weights
weights = {}
biases = {}
# Initialize weights
for i in xrange(n_depth * n_input_x * n_input_y):
    weights[i] = tf.Variable(tf.random_normal([n_hidden, n_classes]))
    biases[i] = tf.Variable(tf.random_normal([n_classes]))

def RNN(x, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_input_y, n_input_x)
    # Permuting batch_size and n_input_y
    x = tf.reshape(x, [-1, n_input_y, n_depth * n_input_x])
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_input_y*batch_size, n_input_x)
    x = tf.reshape(x, [-1, n_input_x * n_depth])
    # Split to get a list of 'n_input_y' tensors of shape (batch_size, n_hidden)
    # This input shape is required by `rnn` function
    x = tf.split(0, n_depth * n_input_x * n_input_y, x)
    # Define a lstm cell with tensorflow
    lstm_cell = grid_rnn_cell.GridRNNCell(n_hidden, input_dims=[n_depth, n_input_x, n_input_y])
    # lstm_cell = rnn_cell.MultiRNNCell([lstm_cell] * 12, state_is_tuple=True)
    # lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=0.8)
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    # pdb.set_trace()
    output = []
    for i in xrange(n_depth * n_input_x * n_input_y):
        # I'll need to do some sort of reshape here on outputs[i]
        output.append(tf.matmul(outputs[i], weights[i]) + biases[i])
    return output

pred = RNN(x, weights, biases)
pred = tf.transpose(tf.pack(pred),[1,0,2])
pred = tf.reshape(pred, [-1, n_depth, n_input_x, n_input_y, n_classes])
# pdb.set_trace()
temp_pred = tf.reshape(pred, [-1, n_classes])
n_input_y = tf.reshape(y, [-1, n_classes])
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(temp_pred, n_input_y))
Currently I'm getting the error: TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
It occurs after the RNN initialization: outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
x is, of course, of type float32.
I am unable to tell what type GridRNNCell returns; any help here? This could be the issue. Should I be defining more arguments for it? input_dims makes sense, but what should output_dims be?
Is this a bug in the contrib code?
GridRNNCell is located in contrib/grid_rnn/python/ops/grid_rnn_cell.py

I was unsure on some of the implementation decisions of the code, so I decided to roll my own. One thing to keep in mind is that this is an implementation of just the cell. It is up to you to build the actual machinery that handles the locations and interactions of the h and m vectors and isn't as simple as passing in your data and expecting it to traverse the dimensions properly.
So for example, if you are working in two dimensions, start with the top left block, take the incoming x and y vectors, concat them together, then use your cell to compute the output (which includes outgoing vectors for both x and y); and it is up to you to store the output for later use in neighboring blocks. Pass those outputs individually to each corresponding dimension, and in each of those neighboring blocks, concat the incoming vectors (again, for each dimension) and compute the output for the neighboring blocks. To do this, you'll need two for-loops, one for each dimension.
Perhaps the version in contrib will work for this, but a couple problems I have with it (I could be wrong here, but as far as I can tell):
1) The vectors are handled using concat and slice rather than with tuples. This will likely result in slower performance.
2) It looks like the input is projected at each step, which doesn't sit well with me. In the paper they only project into the network for incoming blocks along the edge of the grid and not throughout.
If you look at the code, it is actually very simple. Perhaps reading the paper and adjusting the code as needed, or rolling your own, is your best bet. And remember that the cell is only good for performing the recurrence at each step, not for managing the incoming and outgoing h and m vectors.
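To make the two-loop traversal described above more concrete, here is a rough NumPy sketch. toy_cell is a hypothetical stand-in, not the Grid LSTM recurrence from the paper or from contrib: it only concatenates the incoming vectors and splits the result, and it ignores the distinction between h and m vectors. The point is just the bookkeeping of which output feeds which neighbor.

```python
import numpy as np

def toy_cell(in_x, in_y):
    """Hypothetical stand-in for a grid cell: concatenates the incoming
    vectors and returns one outgoing vector per dimension."""
    joined = np.tanh(np.concatenate([in_x, in_y]))
    half = len(in_x)
    return joined[:half], joined[half:]

rows, cols, size = 3, 4, 2
edge = np.ones(size)  # value fed in along the edges of the grid

# out_x[i][j] flows to block (i, j + 1); out_y[i][j] flows to block (i + 1, j).
out_x = [[None] * cols for _ in range(rows)]
out_y = [[None] * cols for _ in range(rows)]

# Start at the top-left block; each block needs its left and upper
# neighbors to have been computed already.
for i in range(rows):
    for j in range(cols):
        in_x = out_x[i][j - 1] if j > 0 else edge
        in_y = out_y[i - 1][j] if i > 0 else edge
        out_x[i][j], out_y[i][j] = toy_cell(in_x, in_y)

print(out_x[rows - 1][cols - 1].shape)  # (2,)
```

Storing the outputs in per-dimension grids is one possible design; anything that lets a block look up its already-computed neighbors would do.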

Which version of the Grid LSTM cells are you using?
If you are using https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/rnn_cell.py
I think you can try to initialize 'feature_size' and 'frequency_skip'.
Also, I think there may be another bug: feeding a dynamic shape into this version may cause a TypeError.

Yes, dynamic shape was the cause. There is a PR to fix this: https://github.com/tensorflow/tensorflow/pull/4631
@jstaker7: Thank you for trying it out. Re. problem 1, the above PR uses tuples for states and outputs; hopefully that addresses the performance issue. GridRNNCell was created a while ago, and at that time all the LSTMCells in TensorFlow were using concat/slice instead of tuples.
Re. problem 2, GridRNNCell will not project the input if you pass None. A dimension can be both input and recurrent, and when there is no input (inputs = None), it will use the recurrent tensors for computation. We can also use 2 input dimensions by instantiating GridRNNCell directly.
Of course, writing a generic class for all cases makes the code look a bit convoluted, and I think it needs better documentation.
Anyway, it will be great if you could share your improvements, or any idea you might have to make it clearer/more useful. It is the nature of an open-source project anyway.

Use batch_size in model_fn in skflow

I need to create a random variable inside my model_fn(), having shape [batch_size, 20].
I do not want to pass batch_size as an argument, because then I cannot use a different batch size for prediction.
Removing the parts which do not concern this question, my model_fn() is:
def model(inp, out):
    # batch_size is the value I do not want to hardcode
    eps = tf.random_normal([batch_size, 20], 0, 1, name="eps")
    # dummy example
    predictions = tf.add(inp, eps)
    return predictions, 1
If I replace [batch_size, 20] by inp.get_shape(), I get
ValueError: Cannot convert a partially known TensorShape to a Tensor: (?, 20)
when running myclf.setup_training().
If I try
def model(inp, out):
    batch_size = tf.placeholder("float", [])
    eps = tf.random_normal([batch_size.eval(), 20], 0, 1, name="eps")
    # dummy example
    predictions = tf.add(inp, eps)
    return predictions, 1
I get ValueError: Cannot evaluate tensor using eval(): No default session is registered. Use `with sess.as_default()` or pass an explicit session to `eval(session=sess)` (understandably, because I have not provided a feed_dict).
How can I access the value of batch_size inside model_fn(), while remaining able to change it during prediction?

I wasn't aware of the difference between Tensor.get_shape() and tf.shape(Tensor). The latter works:
eps = tf.random_normal(tf.shape(inp), 0, 1, name="eps")
As mentioned in the TensorFlow 0.8 FAQ:
How do I build a graph that works with variable batch sizes?
It is often useful to build a graph that works with variable batch sizes, for example so that the same code can be used for (mini-)batch training, and single-instance inference. The resulting graph can be saved as a protocol buffer and imported into another program.
When building a variable-size graph, the most important thing to remember is not to encode the batch size as a Python constant, but instead to use a symbolic Tensor to represent it. The following tips may be useful:
Use batch_size = tf.shape(input)[0] to extract the batch dimension from a Tensor called input, and store it in a Tensor called batch_size.
Use tf.reduce_mean() instead of tf.reduce_sum(...) / batch_size.
If you use placeholders for feeding input, you can specify a variable batch dimension by creating the placeholder with tf.placeholder(..., shape=[None, ...]). The None element of the shape corresponds to a variable-sized dimension.
