TensorFlow: What is the best way to arrange tensor dimensions? - python

I am using tensorflow to train an RNN model. I store my input tensors with the shape (Batch Size, Time Steps, 128) where the 128 is the length of the one hot encoding to represent ASCII characters. To input a time step into an RNN I use the following function to reshape it to (Batch Size, 128)...
def getTimeStep(x, t):
    return tf.reshape(x[:, t, :], (-1, 128))
I am wondering if this is the most efficient way to feed my RNN the time steps. I am not sure how memory is ordered in TensorFlow. Here is the rest of my code for a sequence-to-sequence encoder. Notice that I am saving the output after each time step, since I want to feed it into an attention model in my decoder. Could I be doing something more efficiently?
input_tensor = tf.placeholder(tf.float32, (BATCH_SIZE, TIME_STEPS, 128), 'input_tensor')
expected_output = tf.placeholder(tf.float32, (BATCH_SIZE, TIME_STEPS, 128), 'expected_output')
with tf.variable_scope('encoder') as encode_scope:
    encoder_rnn = rnn.MultiRNNCell([rnn.GRUCell(1024)] * 3)
    encoder_state = tf.zeros((BATCH_SIZE, encoder_rnn.state_size))
    encoder_outputs = [None] * TIME_STEPS
    for t in range(TIME_STEPS):
        encoder_outputs[t], encoder_state = encoder_rnn(getTimeStep(input_tensor, t), encoder_state)
    encoder_outputs = tf.concat(1, [tf.reshape(t, (BATCH_SIZE, 1, 1024)) for t in encoder_outputs])
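On the reshape in getTimeStep: indexing a single position on the time axis already drops that axis, so x[:, t, :] should come out with shape (Batch Size, 128) and the extra tf.reshape looks redundant. The sketch below only mimics that slice in plain Python on nested lists, with no TensorFlow involved. get_time_step and shape_of are hypothetical helpers written for this illustration, and the sizes are made up.

```python
def shape_of(x):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


def get_time_step(x, t):
    """Plain-Python analogue of x[:, t, :] for x laid out as [batch][time][feature]."""
    return [sequence[t] for sequence in x]


BATCH_SIZE, TIME_STEPS, FEATURES = 2, 5, 128

# Dummy input: every feature row stores b * 100 + t so the slice is easy to check.
x = [[[b * 100 + t] * FEATURES for t in range(TIME_STEPS)] for b in range(BATCH_SIZE)]

step = get_time_step(x, 3)
print(shape_of(x))     # (2, 5, 128)
print(shape_of(step))  # (2, 128)
```

The slice is already two-dimensional, so nothing is left for a reshape to (-1, 128) to change.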


PyTorch LSTM and cross entropy

I am working on sentiment analysis and want to classify the output into 4 classes. For the loss I am using cross-entropy.
The problem is that PyTorch's cross-entropy needs an input of shape (batch_size, output), which I am having trouble with.
I am using a batch size of 12 and a sequence length of 32.
import torch.nn as nn
class RNN(nn.Module):
    def __init__(self, hidden_dim = 256, input_size = 32 , num_layers = 1, num_classes=4, vocab_size = len(vocab_to_int)+1, embedding_dim=100):
        super().__init__()  # must run before submodules are assigned on an nn.Module
        self.input_size = input_size
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.num_classes = num_classes
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers)
        self.fc1 = nn.Linear(hidden_dim, 50)
        self.fc2 = nn.Linear(50, 4)
    def forward(self, x, hidden):
        x = self.embedding(x)
        x = x.view(32, 12, 100)  # (seq_len, batch, embedding_dim)
        x, hidden = self.lstm(x, hidden)
        x = x.contiguous().view(-1, 256)
        x = self.fc1(x) # output shape ([384, 50])
        x = self.fc2(x) # output shape [384, 4]
        return x, hidden
    def init_hidden(self, batch_size=12):
        weight = next(self.parameters()).data
        hidden = (weight.new(self.num_layers, batch_size, self.hidden_dim).zero_().cuda(), weight.new(self.num_layers, batch_size, self.hidden_dim).zero_().cuda())
        return hidden
According to the CrossEntropyLoss docs:
input has to be a Tensor of size (C) for unbatched input, (minibatch,C) [for batched input] [...]
The code you provided is only the RNN class and not the data processing and the actual call to CrossEntropyLoss, but the error you stated in the comments makes me think that you didn't reshape the labels tensor to have the same size as the output from the neural network. Therefore, you'd be calculating the loss of a tensor with size (384, 4) against another tensor which I infer is of size (12, 32). Your labels tensor should be of size (384) to match the first dimension of the neural network output.
Also, you don't have to reshape your tensors manually. You can reshape them after the forward() call with the torch.nn.utils.rnn.pack_padded_sequence() function. If you apply this function to both the output of the neural network and the labels, you will have a tensor of size (384, 4) that PyTorch can handle in the call to CrossEntropyLoss. See the note in the pack_padded_sequence() function docs for more details.
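If the labels are flattened by hand, their order has to match the row order of the (384, 4) output. A minimal sketch, assuming per-token labels of shape (12, 32) as inferred above and an LSTM output that is sequence-first, (32, 12, 256), as in forward(): viewing that tensor as (-1, 256) puts time step t of batch item b at row t * 12 + b, so the labels need the same ordering. flatten_labels_seq_first is a hypothetical helper in plain Python and the label values are dummy data.

```python
SEQ_LEN, BATCH = 32, 12


def flatten_labels_seq_first(labels):
    """Flatten [batch][seq_len] labels so that index t * batch + b holds labels[b][t].

    This is the row order obtained by viewing a contiguous
    (seq_len, batch, features) tensor as (-1, features).
    """
    batch, seq_len = len(labels), len(labels[0])
    return [labels[b][t] for t in range(seq_len) for b in range(batch)]


# Dummy per-token labels in 4 classes, laid out batch-first: (12, 32).
labels = [[(b + t) % 4 for t in range(SEQ_LEN)] for b in range(BATCH)]

flat = flatten_labels_seq_first(labels)
print(len(flat))  # 384
```

Flattening the (12, 32) labels directly, without reordering, would give the right length but pair most rows with the wrong label.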

How to Pre-process image for keras.VGG19?

I am attempting to train the Keras VGG-19 model on RGB images. When I attempt a forward pass, this error arises:
ValueError: Input 0 of layer block1_conv1 is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [224, 224, 3]
When reshaping image to (224, 224, 3, 1) to include batch dim, and then feeding forward as shown in code, this error occurs:
ValueError: Dimensions must be equal, but are 1 and 3 for '{{node BiasAdd}} = BiasAdd[T=DT_FLOAT, data_format="NHWC"](strided_slice, Const)' with input shapes: [64,224,224,1], [3]
for idx in tqdm(range(train_data.get_ds_size() // batch_size)):
    # train step
    batch = train_data.get_train_batch()
    for sample, label in zip(batch[0], batch[1]):
        sample = tf.reshape(sample, [*sample.shape, 1])
        label = tf.reshape(label, [*label.shape, 1])
        train_step(idx, sample, label)
vgg is initialized as:
vgg = tf.keras.applications.VGG19(
input_shape=[224, 224, 3],
training function:
def train_step(idx, sample, label):
    with tf.GradientTape() as tape:
        # preprocess for vgg-19
        sample = tf.image.resize(sample, (224, 224))
        sample = tf.keras.applications.vgg19.preprocess_input(sample * 255)
        predictions = vgg(sample, training=True)
        # mean squared error in prediction
        loss = tf.keras.losses.MSE(label, predictions)
    # apply gradients
    gradients = tape.gradient(loss, vgg.trainable_variables)
    optimizer.apply_gradients(zip(gradients, vgg.trainable_variables))
    # update metrics
    train_accuracy(vgg, predictions)
I am wondering how the input should be formatted such that the keras VGG-19 implementation will accept it?
You will have to unsqueeze one dimension at the front to turn your shape into [1, 224, 224, 3]:
for idx in tqdm(range(train_data.get_ds_size() // batch_size)):
    # train step
    batch = train_data.get_train_batch()
    for sample, label in zip(batch[0], batch[1]):
        sample = tf.reshape(sample, [1, *sample.shape]) # added the 1 here
        label = tf.reshape(label, [*label.shape, 1])
        train_step(idx, sample, label)
You are using the wrong dimension for the image batch. "When reshaping image to (224, 224, 3, 1) to include batch dim": this should be (x, 224, 224, 3), where x is the number of images in the batch.
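The shape bookkeeping can be sketched in plain Python on shape tuples alone, without TensorFlow. add_batch_dim and batch_shape are hypothetical helpers for this illustration. They show where the batch axis goes for a channels-last (NHWC) model and why a trailing 1 puts the wrong value on the channel axis.

```python
def add_batch_dim(shape):
    """Prepend a batch dimension of 1: (H, W, C) -> (1, H, W, C)."""
    return (1, *shape)


def batch_shape(shapes):
    """Shape of a batch made by stacking equally shaped images along a new first axis."""
    first = shapes[0]
    if any(s != first for s in shapes):
        raise ValueError("all images in a batch must share one shape")
    return (len(shapes), *first)


image_shape = (224, 224, 3)

print(add_batch_dim(image_shape))      # (1, 224, 224, 3)
print(batch_shape([image_shape] * 8))  # (8, 224, 224, 3)
print((*image_shape, 1))               # (224, 224, 3, 1): last (channel) axis is 1, not 3
```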

TensorFlow Keras MaxPool2D breaks LSTM with CTC loss?

I am trying to tie together a CNN layer with 2 LSTM layers and ctc_batch_cost for loss, but I'm encountering some problems. My model is supposed to work with grayscale images.
During my debugging I've figured out that if I use just a CNN layer that keeps the output size equal to the input size + LSTM and CTC, the model is able to train:
# === Without MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
# Go from Bx128x32x1 to Bx128x32 (B x TimeSteps x Features)
rnn_inp = Reshape((128, 32))(cnn)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, calling fit works!
But when I add a MaxPool2D layer that halves the dimensions, I get an error sequence_length(0) <= 64, similar to the one presented here.
# === With MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
maxp = MaxPool2D(name='maxp', pool_size=2, strides=2, padding='valid')(cnn) # -> 64x16x1
# Go from Bx64x16x1 to Bx64x16 (B x TimeSteps x Features)
rnn_inp = Reshape((64, 16))(maxp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, but calling fit crashes with:
# InvalidArgumentError: sequence_length(0) <= 64
# [[{{node ctc_loss_1/CTCLoss}}]]
After struggling with this problem for about 3 days, I posted the above question here on StackOverflow. About 2 hours after posting the question, I finally figured it out.
TL;DR Solution:
If you're using ctc_batch_cost:
Make sure the input_length argument holds the lengths (numbers of timesteps) of the sequences that enter your RNNs as input.
If you're using ctc_loss:
Make sure the logit_length argument holds the lengths (numbers of timesteps) of the sequences that enter your RNNs as input.
The solution lies in the documentation, which is relatively sparse and can be cryptic for a machine learning newbie like myself.
The TensorFlow documentation for ctc_batch_cost reads:
y_true, y_pred, input_length, label_length
input_length tensor (samples, 1) containing the sequence length for
each batch item in y_pred.
input_length corresponds to logit_length from ctc_loss function's TensorFlow documentation:
labels, logits, label_length, logit_length, logits_time_major=True, unique=None,
blank_index=None, name=None
logit_length tensor of shape [batch_size] Length of input sequence in logits.
That's where it clicked, at the word logit. So, the argument for input_length or logit_length is supposed to be a tensor/container (in my case, numpy array) of the lengths (i.e. number of timesteps) of the sequences entering the RNN (in my case LSTM) as input.
I was originally making the mistake of considering the required length to be the width of the grayscale images that act as input for the whole network (CNN + MaxPool2D + RNN), but because the MaxPool2D layer creates a tensor of different dimensions for the RNN's input, the ctc loss function crashes.
Now fit runs without crashing.
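As a rough sketch of the arithmetic, the number of timesteps reaching the RNN can be computed from the pooling parameters and then repeated once per sample. pooled_length and make_input_length are hypothetical helpers in plain Python, and the batch size of 4 is made up. The formula assumes 'valid' padding, as in the MaxPool2D layer above.

```python
def pooled_length(length, pool_size=2, stride=2):
    """Output length of a pooling layer with 'valid' padding along one axis."""
    return (length - pool_size) // stride + 1


def make_input_length(batch_size, time_steps):
    """Nested list of shape (samples, 1), the shape quoted above for input_length."""
    return [[time_steps] for _ in range(batch_size)]


image_width = 128                        # time axis of the network input
time_steps = pooled_length(image_width)  # time axis after MaxPool2D(pool_size=2, strides=2)
input_length = make_input_length(batch_size=4, time_steps=time_steps)

print(time_steps)    # 64
print(input_length)  # [[64], [64], [64], [64]]
```

Passing 128 here instead of 64 is the mismatch that the sequence_length(0) <= 64 error complains about.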

Simple LSTM Error: Error when checking input: expected lstm_20_input to have shape (None, 10, 3) but got array with shape (1, 64, 3)

I have a batching generator function that is not feeding the correct batch shape to an LSTM. When I test the function it appears to return the correct shape [n_samples, n_timesteps, n_features] but this throws an error when fitting the model.
I have checked the function by looping over the generator to check the batch shapes and they return the correct number of samples, time steps etc.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM, TimeDistributed, RepeatVector
def batch_generator(x_train_scaled, y_train_scaled, batch_size, sequence_length):
    """Generator function to develop sequential batches of data."""
    # Infinite loop.
    while True:
        # Allocate a new array for the batch of input-signals.
        x_shape = np.array((batch_size, sequence_length, num_x_signals))
        x_batch = np.zeros(shape=x_shape, dtype=np.float16)
        # Allocate a new array for the batch of output-signals.
        y_shape = np.array((batch_size, sequence_length, 1))
        y_batch = np.zeros(shape=y_shape, dtype=np.float16)
        for i in range(batch_size):
            # Copy the sequences of data starting at this index.
            x_batch[i] = x_train_scaled[i:i+sequence_length]
            y_batch[i] = y_train_scaled[i:i+sequence_length]
        yield (x_batch, y_batch)
# test function
batch_size = 10
sequence_length = 10
batch_gen = batch_generator(x_train_scaled, y_train_scaled,batch_size=batch_size,
x_batch, y_batch = next(batch_gen)
# test that returns correct shape (10, 10, 3) and (10, 10, 1)
def build_model(generator, n_outputs):
    # define encoder/decoder architecture, use Time Distributed layer
    model = Sequential()
    model.add(LSTM(10, activation='relu', input_shape=(x_batch.shape[1],
    model.add(LSTM(10, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(5, activation='relu')))
    model.compile(loss='mse', optimizer='adam')
    # fit network
        verbose = 1)
    return model

How does tf.layers.dense() interact with inputs of higher dim?

In tensorflow layers.dense(inputs, units, activation) implements a Multi-Layer Perceptron layer with arbitrary activation function.
Output = activation(matmul(input, weights) + bias)
Typically input has shape=[batch_size, input_size] and might look like this: (units = 128 and activation = tf.nn.relu are chosen arbitrarily)
inputx = tf.placeholder(float, shape=[batch_size, input_size])
dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)
I have not found any documentation on what would happen if I fed higher-dimensional input, e.g. because one might have time_steps, resulting in a tensor of shape=[time_step, batch_size, input_size]. What one would want here is that the layer is applied to each single input_vector for each timestep for each element of the batch. To put it a bit differently, the internal matmul of layers.dense() should simply use broadcasting in numpy style. Is the behaviour I expect here what actually happens? I.e. is:
inputx = tf.placeholder(float, shape=[time_step, batch_size, input_size])
dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)
applying the dense layer to each input of size input_size for each time_step for each element in batch_size? This should then result in a tensor (in dense_layer above) of shape=[time_step, batch_size, 128]
I'm asking because e.g. tf.matmul does not support broadcasting in the numpy style, so I'm not sure how TensorFlow handles these cases.
Edit: This post is related, but does not finally answer my question
You can verify your expectation by checking the shape of the dense kernel as follows.
>>> inputx = tf.placeholder(float, shape=[2,3,4])
>>> dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)
>>> g=tf.get_default_graph()
>>> g.get_collection('variables')
[<tf.Variable 'dense/kernel:0' shape=(4, 128) dtype=float32_ref>, <tf.Variable 'dense/bias:0' shape=(128,) dtype=float32_ref>]
For such input, the dense layer behaves like a conv layer with 1*1 filters.
You can think of inputx as an image with width=2, height=3 and channel=4, and of the dense layer as a conv layer with 128 filters of size 1*1.
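To make that concrete, here is a plain-Python sketch of the behaviour described: one (input_size, units) kernel applied to every vector along the last axis, with all leading axes left untouched. It is an illustration on nested lists, not TensorFlow's implementation. dense and shape_of are hypothetical helpers, and 5 units stand in for 128 to keep the example small.

```python
def shape_of(x):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


def dense(x, kernel, bias):
    """Apply relu(v @ kernel + bias) to every vector v along the last axis of x."""
    if not isinstance(x[0], list):  # x is a single feature vector of length input_size
        return [
            max(0.0, sum(v * kernel[i][j] for i, v in enumerate(x)) + bias[j])
            for j in range(len(bias))
        ]
    return [dense(sub, kernel, bias) for sub in x]


TIME_STEP, BATCH_SIZE, INPUT_SIZE, UNITS = 2, 3, 4, 5

x = [[[float(t + b + i) for i in range(INPUT_SIZE)] for b in range(BATCH_SIZE)]
     for t in range(TIME_STEP)]
kernel = [[0.1 * (i - j) for j in range(UNITS)] for i in range(INPUT_SIZE)]  # (input_size, units)
bias = [0.0] * UNITS

y = dense(x, kernel, bias)
print(shape_of(kernel))  # (4, 5)
print(shape_of(y))       # (2, 3, 5)
```

The kernel never sees the time or batch axes, which matches the (4, 128) kernel shape reported for the [2,3,4] input above.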
