How does tf.layers.dense() interact with inputs of higher dim? - python

In TensorFlow, layers.dense(inputs, units, activation) implements a single densely connected (Multi-Layer Perceptron style) layer with an arbitrary activation function.
Output = activation(matmul(input, weights) + bias)
Typically the input has shape=[batch_size, input_size] and might look like this (units=128 and activation=tf.nn.relu are chosen arbitrarily):
inputx = tf.placeholder(tf.float32, shape=[batch_size, input_size])
dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)
I have not found any documentation on what would happen if I fed higher-dimensional input, e.g. because one might have time steps, resulting in a tensor of shape=[time_step, batch_size, input_size]. What one would want here is that the layer is applied to each single input vector, for each time step and for each element of the batch. Put differently, the internal matmul of layers.dense() should simply broadcast in NumPy style. Is the behaviour I expect here what actually happens? I.e. is:
inputx = tf.placeholder(tf.float32, shape=[time_step, batch_size, input_size])
dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)
applying the dense layer to each input of size input_size, for each time_step and for each element in batch_size? This should then result in a tensor (dense_layer above) of shape=[time_step, batch_size, 128].
I'm asking because tf.matmul, for example, does not support NumPy-style broadcasting, so I'm not sure how TensorFlow handles these cases.
Edit: This post is related, but does not ultimately answer my question.

You can verify your expectation by checking the shape of the dense kernel as follows.
>>> inputx = tf.placeholder(tf.float32, shape=[2,3,4])
>>> dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)
>>> g=tf.get_default_graph()
>>> g.get_collection('variables')
[<tf.Variable 'dense/kernel:0' shape=(4, 128) dtype=float32_ref>, <tf.Variable 'dense/bias:0' shape=(128,) dtype=float32_ref>]
The behavior of the dense layer is the same as that of a conv layer.
You can think of inputx as an image with width=2, height=3 and 4 channels, and of the dense layer as a conv layer with 128 filters of size 1x1.
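For completeness, a minimal sketch (TF 1.x graph mode assumed, as in the question) that confirms this by checking the output shape directly rather than the kernel:
inputx = tf.placeholder(tf.float32, shape=[2, 3, 4])     # [time_step, batch_size, input_size]
dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)
print(dense_layer.shape)  # (2, 3, 128): the (4, 128) kernel is applied to the last axis only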

Related

TensorFlow Keras MaxPool2D breaks LSTM with CTC loss?

I am trying to tie together a CNN layer with 2 LSTM layers and ctc_batch_cost for loss, but I'm encountering some problems. My model is supposed to work with grayscale images.
During my debugging I've figured out that if I use just a CNN layer that keeps the output size equal to the input size, plus the LSTMs and CTC, the model is able to train:
# === Without MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
# Go from Bx128x32x1 to Bx128x32 (B x TimeSteps x Features)
rnn_inp = Reshape((128, 32))(cnn)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, calling fit works!
But when I add a MaxPool2D layer that halves the dimensions, I get an error sequence_length(0) <= 64, similar to the one presented here.
# === With MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
maxp = MaxPool2D(name='maxp', pool_size=2, strides=2, padding='valid')(cnn) # -> 64x16x1
# Go from Bx64x16x1 to Bx64x16 (B x TimeSteps x Features)
rnn_inp = Reshape((64, 16))(maxp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, but calling fit crashes with:
# InvalidArgumentError: sequence_length(0) <= 64
# [[{{node ctc_loss_1/CTCLoss}}]]
After struggling for about 3 days with this problem, I posted the above question here, on StackOverflow. About 2 hours after posting the question I finally figured it out.
TL;DR Solution:
If you're using ctc_batch_cost:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the input_length argument.
If you're using ctc_loss:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the logit_length argument.
Solution:
The solution lies in the documentation, which, being relatively sparse, can be cryptic for a machine learning newbie like myself.
The TensorFlow documentation for ctc_batch_cost reads:
tf.keras.backend.ctc_batch_cost(
    y_true, y_pred, input_length, label_length
)
...
input_length tensor (samples, 1) containing the sequence length for
each batch item in y_pred.
...
input_length corresponds to logit_length from ctc_loss function's TensorFlow documentation:
tf.nn.ctc_loss(
    labels, logits, label_length, logit_length, logits_time_major=True,
    unique=None, blank_index=None, name=None
)
...
logit_length tensor of shape [batch_size] Length of input sequence in
logits.
...
That's where it clicked, at the word logit. So, the argument for input_length or logit_length is supposed to be a tensor/container (in my case, numpy array) of the lengths (i.e. number of timesteps) of the sequences entering the RNN (in my case LSTM) as input.
I was originally making the mistake of taking the required length to be the width of the grayscale images that act as input for the whole network (CNN + MaxPool2D + RNN), but because the MaxPool2D layer produces a tensor with different dimensions for the RNN's input, the CTC loss function crashed.
Now fit runs without crashing.
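For reference, here is a sketch of how the fix can be wired up with a Lambda layer. The pattern follows the usual Keras CTC recipe, and the names labels, input_length and label_length are extra inputs I am introducing for illustration; they are not part of the original model:
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

# Extra inputs that feed the CTC loss.
labels = Input(name='labels', shape=(None,), dtype='float32')
input_length = Input(name='input_length', shape=(1,), dtype='int64')
label_length = Input(name='label_length', shape=(1,), dtype='int64')

def ctc_lambda(args):
    y_pred, y_true, in_len, lab_len = args
    return K.ctc_batch_cost(y_true, y_pred, in_len, lab_len)

ctc = Lambda(ctc_lambda, output_shape=(1,), name='ctc')(
    [rnn_outp, labels, input_length, label_length])
model = Model(inputs=[inp, labels, input_length, label_length], outputs=ctc)

# Crucially, every entry of the input_length feed must be 64 (the number of
# timesteps entering the LSTMs after MaxPool2D), not the image width 128.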

How to add skip connection between convolutional layers in Keras

I would like to add a skip connection between residual blocks in Keras. This is my current implementation, which does not work because the tensors have different shapes.
The function looks like this:
def build_res_blocks(net, x_in, num_res_blocks, res_block, num_filters, res_block_expansion, kernel_size, scaling):
    net_next_in = net
    for i in range(num_res_blocks):
        net = res_block(net_next_in, num_filters, res_block_expansion, kernel_size, scaling)
        # net tensor shape:  (None, None, 32)
        # x_in tensor shape: (None, None, 3)
        # Error here: net_next_in should have shape (None, None, 32) to be fed into the next layer
        net_next_in = Add()([net, x_in])
    return net
But I get
ValueError: Operands could not be broadcast together with shapes (None, None, 32) (None, None, 3)
How to add or merge these tensors into the correct shape (None, None, 32)? If this is not the correct approach, how could you achieve the intended result?
This is what the res_block looks like:
def res_block(x_in, num_filters, expansion, kernel_size, scaling):
    x = Conv2D(num_filters * expansion, kernel_size, padding='same')(x_in)
    x = Activation('relu')(x)
    x = Conv2D(num_filters, kernel_size, padding='same')(x)
    x = Add()([x_in, x])
    return x
You can't add tensors of different shapes. You could concatenate them with keras.layers.Concatenate, but this would leave you with a tensor of shape [None, None, 35].
Alternatively, have a look at the ResNet50 implementation in Keras. Their residual block features a 1x1xC convolution in the shortcut for those cases where the dimensions to be added differ.
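A minimal sketch of that idea applied to the function above, assuming standalone Keras imports and that the residual branch is num_filters=32 channels wide: project x_in once with a 1x1 convolution so the operands of Add match.
from keras.layers import Conv2D, Add

def build_res_blocks(net, x_in, num_res_blocks, res_block, num_filters,
                     res_block_expansion, kernel_size, scaling):
    # Projection shortcut: map the 3-channel input to num_filters channels
    # so it can be added to the residual branch (as in ResNet50).
    shortcut = Conv2D(num_filters, 1, padding='same')(x_in)
    net_next_in = net
    for i in range(num_res_blocks):
        net = res_block(net_next_in, num_filters, res_block_expansion,
                        kernel_size, scaling)
        net_next_in = Add()([net, shortcut])
    return net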

Tensorflow Resnet example: Is the bottleneck size wrong?

The following code is from the Tensorflow Resnet example at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/resnet.py:
# Create the bottleneck groups, each of which contains `num_blocks`
# bottleneck groups.
for group_i, group in enumerate(groups):
    for block_i in range(group.num_blocks):
        name = 'group_%d/block_%d' % (group_i, block_i)

        # 1x1 convolution responsible for reducing dimension
        with tf.variable_scope(name + '/conv_in'):
            conv = tf.layers.conv2d(
                net,
                filters=group.num_filters,
                kernel_size=1,
                padding='valid',
                activation=tf.nn.relu)
            conv = tf.layers.batch_normalization(conv, training=training)

        with tf.variable_scope(name + '/conv_bottleneck'):
            conv = tf.layers.conv2d(
                conv,
                filters=group.bottleneck_size,
                kernel_size=3,
                padding='same',
                activation=tf.nn.relu)
            conv = tf.layers.batch_normalization(conv, training=training)

        # 1x1 convolution responsible for restoring dimension
        with tf.variable_scope(name + '/conv_out'):
            input_dim = net.get_shape()[-1].value
            conv = tf.layers.conv2d(
                conv,
                filters=input_dim,
                kernel_size=1,
                padding='valid',
                activation=tf.nn.relu)
            conv = tf.layers.batch_normalization(conv, training=training)

        # shortcut connections that turn the network into its counterpart
        # residual function (identity shortcut)
        net = conv + net
This piece of code runs for each block, with a given output and bottleneck dimension. These blocks are defined as:
# Configurations for each bottleneck group.
BottleneckGroup = namedtuple('BottleneckGroup',
                             ['num_blocks', 'num_filters', 'bottleneck_size'])
groups = [
    BottleneckGroup(3, 128, 32), BottleneckGroup(3, 256, 64),
    BottleneckGroup(3, 512, 128), BottleneckGroup(3, 1024, 256)
]
In common practice, as far as I know, the so-called bottleneck layers of ResNets first reduce the input channel count with 1x1 kernels, apply the more expensive (3x3) convolutions in that reduced channel space, and then restore the original channel count with a final 1x1 convolutional layer, as described in the original ResNet paper.
But in the TensorFlow example, the first 1x1 layer uses the block output size (num_filters), not the bottleneck size, so no meaningful channel reduction takes place. Is the TensorFlow example really wrong here, or am I missing something?
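For comparison, this is roughly what the block would look like if it followed the ordering described in the paper (reduce to bottleneck_size first, restore the input channel count last). This is my own rewrite of the snippet above for illustration, not code from the TensorFlow repository, and it reuses the same variables (net, group, name, training):
# 1x1 convolution reducing the channel count to the bottleneck size
with tf.variable_scope(name + '/conv_in'):
    conv = tf.layers.conv2d(
        net,
        filters=group.bottleneck_size,   # reduce first ...
        kernel_size=1,
        padding='valid',
        activation=tf.nn.relu)
    conv = tf.layers.batch_normalization(conv, training=training)
# 3x3 convolution in the reduced channel space
with tf.variable_scope(name + '/conv_bottleneck'):
    conv = tf.layers.conv2d(
        conv,
        filters=group.bottleneck_size,
        kernel_size=3,
        padding='same',
        activation=tf.nn.relu)
    conv = tf.layers.batch_normalization(conv, training=training)
# 1x1 convolution restoring the original channel count for the identity shortcut
with tf.variable_scope(name + '/conv_out'):
    conv = tf.layers.conv2d(
        conv,
        filters=net.get_shape()[-1].value,   # ... restore at the end
        kernel_size=1,
        padding='valid',
        activation=tf.nn.relu)
    conv = tf.layers.batch_normalization(conv, training=training)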

LSTM Initial state from Dense layer

I am using an LSTM on time series data. I have features about the time series that are not time dependent. Imagine company stocks for the series and something like company location among the non-time-series features. This is not the actual use case, but it is the same idea. For this example, let's just predict the next value in the time series.
So a simple example would be:
feature_input = Input(shape=(None, data.training_features.shape[1]))
dense_1 = Dense(4, activation='relu')(feature_input)
dense_2 = Dense(8, activation='relu')(dense_1)
series_input = Input(shape=(None, data.training_series.shape[1]))
lstm = LSTM(8)(series_input, initial_state=dense_2)
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input,series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])
However, I am just not sure how to correctly specify the initial state as a list. I get
ValueError: An initial_state was passed that is not compatible with `cell.state_size`. Received `state_spec`=[<keras.engine.topology.InputSpec object at 0x11691d518>]; However `cell.state_size` is (8, 8)
which I can see is caused by the 3D batch dimension. I tried using Flatten, Permute, and Reshape layers, but I don't believe that is correct. What am I missing, and how can I connect these layers?
The first problem is that an LSTM(8) layer expects two initial states h_0 and c_0, each of dimension (None, 8). That's what it means by "cell.state_size is (8, 8)" in the error message.
If you only have one initial state dense_2, maybe you can switch to GRU (which requires only h_0). Or, you can transform your feature_input into two initial states.
The second problem is that h_0 and c_0 are of shape (batch_size, 8), but your dense_2 is of shape (batch_size, timesteps, 8). You need to deal with the time dimension before using dense_2 as initial states.
So maybe you can change your input shape to (data.training_features.shape[1],), or take the average over timesteps with GlobalAveragePooling1D.
A working example would be:
feature_input = Input(shape=(5,))
dense_1_h = Dense(4, activation='relu')(feature_input)
dense_2_h = Dense(8, activation='relu')(dense_1_h)
dense_1_c = Dense(4, activation='relu')(feature_input)
dense_2_c = Dense(8, activation='relu')(dense_1_c)
series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[dense_2_h, dense_2_c])
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input,series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])
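If the static features instead arrive once per timestep (shape (None, 5)), the pooling route mentioned above would look roughly like this; a sketch of my own, not part of the original answer:
from keras.layers import Input, Dense, LSTM, GlobalAveragePooling1D
from keras.models import Model

feature_input = Input(shape=(None, 5))             # static features repeated per timestep
pooled = GlobalAveragePooling1D()(feature_input)   # collapse the time axis -> (batch, 5)
h0 = Dense(8, activation='relu')(pooled)           # initial hidden state
c0 = Dense(8, activation='relu')(pooled)           # initial cell state
series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[h0, c0])
out = Dense(1, activation='sigmoid')(lstm)
model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mape'])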

Tensorflow, What is the best way to arrange tensors dimensions?

I am using TensorFlow to train an RNN model. I store my input tensors with the shape (Batch Size, Time Steps, 128), where 128 is the length of the one-hot encoding used to represent ASCII characters. To input a time step into an RNN I use the following function to reshape it to (Batch Size, 128):
def getTimeStep(x, t):
    return tf.reshape(x[:, t, :], (-1, 128))
I am wondering if this is the most efficient way to feed my RNN the time steps. I am not sure how memory is ordered in TensorFlow. Here is the rest of my code for a sequence-to-sequence encoder. Notice that I am saving the output after each time step, since I want to feed it into an attention model in my decoder. Could I be doing something more efficiently?
input_tensor = tf.placeholder(tf.float32, (BATCH_SIZE, TIME_STEPS, 128), 'input_tensor')
expected_output = tf.placeholder(tf.float32, (BATCH_SIZE, TIME_STEPS, 128), 'expected_output')

with tf.variable_scope('encoder') as encode_scope:
    encoder_rnn = rnn.MultiRNNCell([rnn.GRUCell(1024)] * 3)
    encoder_state = tf.zeros((BATCH_SIZE, encoder_rnn.state_size))
    encoder_outputs = [None] * TIME_STEPS
    for t in range(TIME_STEPS):
        encoder_outputs[t], encoder_state = encoder_rnn(getTimeStep(input_tensor, t), encoder_state)
        encode_scope.reuse_variables()
    encoder_outputs = tf.concat(1, [tf.reshape(t, (BATCH_SIZE, 1, 1024)) for t in encoder_outputs])
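If the per-step slicing does turn out to matter, one common alternative (a sketch, assuming the same placeholders and cell definitions as above, and the old tf.nn API of that TensorFlow version) is to let tf.nn.dynamic_rnn run the time loop internally and return all per-step outputs in a single (BATCH_SIZE, TIME_STEPS, 1024) tensor for the attention model:
with tf.variable_scope('encoder'):
    encoder_rnn = rnn.MultiRNNCell([rnn.GRUCell(1024)] * 3)
    # dynamic_rnn unrolls over axis 1 of input_tensor (time_major=False is the default),
    # so no manual getTimeStep slicing or reuse_variables() calls are needed.
    encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
        encoder_rnn, input_tensor, dtype=tf.float32)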
