I am trying to tie together a CNN layer with 2 LSTM layers and ctc_batch_cost for loss, but I'm encountering some problems. My model is supposed to work with grayscale images.
While debugging I've found that if I use just a CNN layer that keeps the output size equal to the input size, followed by the LSTMs and CTC, the model is able to train:
# === Without MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
# Go from Bx128x32x1 to Bx128x32 (B x TimeSteps x Features)
rnn_inp = Reshape((128, 32))(cnn)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, calling fit works!
But when I add a MaxPool2D layer that halves the dimensions, I get an error sequence_length(0) <= 64, similar to the one presented here.
# === With MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
maxp = MaxPool2D(name='maxp', pool_size=2, strides=2, padding='valid')(cnn) # -> 64x16x1
# Go from Bx64x16x1 to Bx64x16 (B x TimeSteps x Features)
rnn_inp = Reshape((64, 16))(maxp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, but calling fit crashes with:
# InvalidArgumentError: sequence_length(0) <= 64
# [[{{node ctc_loss_1/CTCLoss}}]]
After struggling with this problem for about 3 days, I posted the above question here on Stack Overflow. About 2 hours after posting the question I finally figured it out.
TL;DR Solution:
If you're using ctc_batch_cost:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the input_length argument.
If you're using ctc_loss:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the logit_length argument.
Solution:
The solution lies in the documentation, which is relatively sparse and can be cryptic for a machine learning newbie like myself.
The TensorFlow documentation for ctc_batch_cost reads:
tf.keras.backend.ctc_batch_cost(
y_true, y_pred, input_length, label_length
)
...
input_length tensor (samples, 1) containing the sequence length for
each batch item in y_pred.
...
input_length corresponds to logit_length in the TensorFlow documentation for ctc_loss:
tf.nn.ctc_loss(
labels, logits, label_length, logit_length, logits_time_major=True, unique=None,
blank_index=None, name=None
)
...
logit_length tensor of shape [batch_size] Length of input sequence in
logits.
...
That's where it clicked, at the word logit. So, the argument for input_length or logit_length is supposed to be a tensor/container (in my case, numpy array) of the lengths (i.e. number of timesteps) of the sequences entering the RNN (in my case LSTM) as input.
I was originally making the mistake of treating the required length as the width (128) of the grayscale images that are the input to the whole network (CNN + MaxPool2D + RNN), but the MaxPool2D layer halves that dimension, so the RNN actually sees only 64 timesteps and the CTC loss function crashes when told to expect 128.
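For illustration, here is a minimal sketch of how the loss can be wired up for the "with MaxPool2D" model above (the auxiliary inputs, max_label_len and the array sizes are assumptions for the sketch, not my exact training code). The key point is that every entry of the input_length array holds 64, the post-pooling timestep count, not the image width of 128:
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

max_label_len = 16  # assumption for the sketch

labels = Input(name='labels', shape=(max_label_len,), dtype='float32')
input_length = Input(name='input_length', shape=(1,), dtype='int64')
label_length = Input(name='label_length', shape=(1,), dtype='int64')

def ctc_lambda(args):
    y_true, y_pred, inp_len, lab_len = args
    return K.ctc_batch_cost(y_true, y_pred, inp_len, lab_len)

# inp and rnn_outp come from the "with MaxPool2D" snippet above.
loss_out = Lambda(ctc_lambda, output_shape=(1,), name='ctc')(
    [labels, rnn_outp, input_length, label_length])

train_model = Model(inputs=[inp, labels, input_length, label_length], outputs=loss_out)
train_model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adam')

# When building the training arrays, the timestep count is the RNN's 64, not 128:
# input_length_arr = np.full((num_samples, 1), 64)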
Now fit runs without crashing.
Related
I'm working on a machine learning project with convolutional neural networks using TF/Keras in Python, and my goal is to split an image up into patches, run a convolution on each one separately, and then put it back together.
What I can't figure out how to do is run a convolution for each slice of a 3D array.
For example, if I have a tensor of size (500,100,100), I want to do a separate convolution for each of the 500 slices of size (100 x 100). I'm implementing this within a custom Keras layer and want the convolutions to have trainable weights. I've tried a few different things:
Using tf.map_fn() to run a convolution for each slice of the array
This doesn't seem to attach weights to each layer separately.
Using the DepthwiseConv2D layer:
This works well for the first call of the layer, but fails when I call the layer a second time with more filters, because it wants to perform the depthwise convolution on each of the previously filtered layers.
This, of course, isn't what I want, because I want one convolution for each of the sets of filters from the previous layer.
Any ideas are appreciated, as I'm truly stuck here. Thank you!
If you have a tensor with shape (500,100,100) and want to feed some subsets of this tensor to separate Conv2D layers at the same time, you can do this by defining the Conv2D layers at the same level. First define Lambda layers to split the input, then feed their outputs to the Conv2D layers, and concatenate the results.
Let's take a tensor with shape (100,28,28,1) as an example, which we want to split into 2 subsets and apply a Conv2D layer to each subset separately:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Input, concatenate, Lambda
from tensorflow.keras.models import Model
# define a sample dataset
x = tf.random.uniform((100, 28, 28, 1))
y = tf.random.uniform((100, 1), dtype=tf.int32, minval=0, maxval=9)
ds = tf.data.Dataset.from_tensor_slices((x, y))
ds = ds.batch(16)
def create_nn_model():
    input = Input(shape=(28,28,1))
    # split the input along the height axis
    b1 = Lambda(lambda a: a[:,:14,:,:], name="first_slice")(input)
    b2 = Lambda(lambda a: a[:,14:,:,:], name="second_slice")(input)
    # separate Conv2D layer (separate weights) for each slice
    d1 = Conv2D(64, 2, padding='same', activation='relu', name="conv1_first_slice")(b1)
    d2 = Conv2D(64, 2, padding='same', activation='relu', name="conv2_second_slice")(b2)
    x = concatenate([d1,d2], axis=1)
    x = Flatten()(x)
    x = Dense(64, activation='relu')(x)
    out = Dense(10, activation='softmax')(x)
    model = Model(input, out)
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
model = create_nn_model()
tf.keras.utils.plot_model(model, show_shapes=True)
Here is the plotted model architecture:
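As a quick sanity check (a hedged illustration using the toy ds dataset defined above), the dataset can be fed straight to the model:
model.fit(ds, epochs=3)  # trains on the random toy data defined above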
I am building a cnn_rnn network for image classification. I am getting an error while running the following python code in my jupyter notebook.
# model
model1 = Sequential()
# first convolutional layer
model1.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(160, 120, 3)))
# second convolutional layer
model1.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
#Adding a pooling Layer
model1.add(MaxPooling2D(pool_size=(3, 3)))
#Adding dropouts
model1.add(Dropout(0.25))
# flatten and put a fully connected layer
model1.add(Flatten())
model1.add(Dense(32, activation='relu')) # fully connected
#Adding RNN N/W
model1.add(LSTM(32, return_sequences=True))
model1.add(TimeDistributed(Dense(5, activation='softmax')))
I also tried adding input_shape=(160, 120, 3) as a parameter to the LSTM function but to no avail. Please Help!
P.S: I also tried using GRU instead of LSTM but got the same error.
Update: Please note the model.summary() results
Your error is due to your use of Flatten and Dense BEFORE the LSTM layer.
LSTM layers require the input to be in the shape of (Batch size x Length x Feature depth) (or some variant), whereas your Flatten changes the Conv2D output from (B x H x W x F) to (B x H * W * F), if that makes sense. If you want to use this architecture, I'd recommend using a Reshape layer to flatten the dimensions you want, and a Conv1D of kernel size 1 (equivalent to a fully-connected layer) before the LSTM layer; a hedged sketch of this option follows the quick fix below.
Or, if you want to use this exact code, add this before your LSTM layer and it should work:
model1.add(Reshape(target_shape=(1, 32)))
After Flatten and Dense(32), each sample is a single 32-dimensional vector, so this Reshape hands the LSTM one timestep with 32 features.
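If you'd rather go the Reshape + Conv1D route, here is a hedged sketch (the dimensions below are computed for the default 'valid' padding of the layers in the question; model2 and the exact layer sizes are illustrative assumptions, not the asker's code):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Dropout, Reshape,
                                     Conv1D, LSTM, TimeDistributed, Dense)

model2 = Sequential()
model2.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(160, 120, 3)))
model2.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model2.add(MaxPooling2D(pool_size=(3, 3)))     # -> (52, 38, 64) with 'valid' padding
model2.add(Dropout(0.25))
model2.add(Reshape((52, 38 * 64)))             # (timesteps, features) instead of Flatten
model2.add(Conv1D(32, 1, activation='relu'))   # kernel size 1 acts like a per-timestep Dense
model2.add(LSTM(32, return_sequences=True))
model2.add(TimeDistributed(Dense(5, activation='softmax')))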
I want to implement a recurrent neural network with GRU using Keras in Python. I have a problem running the code; I've changed the variables again and again but it doesn't work. Do you have an idea how to solve it?
inputs = 42 #number of columns input
num_hidden =50 #number of neurons in the layer
outputs = 1 #number of columns output
num_epochs = 50
batch_size = 1000
learning_rate = 0.05
#train (125973, 42) 125973 Rows and 42 Features
#Labels (125973,1) is True Results
model = tf.contrib.keras.models.Sequential()
fv=tf.contrib.keras.layers.GRU
model.add(fv(units=42, activation='tanh', input_shape= (1000,42),return_sequences=True)) #i want to send Batches to train
#model.add(tf.keras.layers.Dropout(0.15)) # Dropout overfitting
#model.add(fv((1,42),activation='tanh', return_sequences=True))
#model.add(Dropout(0.2)) # Dropout overfitting
model.add(fv(42, activation='tanh'))
model.add(tf.keras.layers.Dropout(0.15)) # Dropout overfitting
model.add(tf.keras.layers.Dense(1000,activation='softsign'))
#model.add(tf.keras.layers.Activation("softsign"))
start = time.time()
# sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
# model.compile(loss="mse", optimizer=sgd)
model.compile(loss="mse", optimizer="Adam")
inp = np.array(train)
oup = np.array(labels)
X_tr = inp[:batch_size].reshape(-1, batch_size, inputs)
model.fit(X_tr,labels,epochs=20, batch_size=batch_size)
However I get the following error:
ValueError: Error when checking target: expected dense to have shape (1000,) but got array with shape (1,)
Here, you have specified the input sequence length to be 1000:
model.add(fv(units=42, activation='tanh', input_shape= (1000,42),return_sequences=True)) #i want to send Batches to train
However, the shape of your training data (X_tr) is 1-D.
Check your X_tr variable and make sure it has the same dimensions as your input layer.
If you read the error carefully you will realize there is a shape mismatch between the shape of the labels you provide, which is (None, 1), and the shape of the model's output, which is (None, 1000):
ValueError: Error when checking target: <--- This means the output shapes
expected dense to have shape (1000,) <--- output shape of model
but got array with shape (1,) <--- the shape of labels you give when training
Therefore you need to make them consistent. You just need to change the number of units in the last layer to 1 since there is one output per input sample:
model.add(tf.keras.layers.Dense(1, activation='softsign')) # 1 unit in the output
I am using an LSTM on time series data. I have features about the time series that are not time dependent. Imagine company stocks for the series and something like company location for the non-time-series features. This is not the actual use case, but it is the same idea. For this example, let's just predict the next value in the time series.
So a simple example would be:
feature_input = Input(shape=(None, data.training_features.shape[1]))
dense_1 = Dense(4, activation='relu')(feature_input)
dense_2 = Dense(8, activation='relu')(dense_1)
series_input = Input(shape=(None, data.training_series.shape[1]))
lstm = LSTM(8)(series_input, initial_state=dense_2)
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input,series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])
However, I am just not sure how to specify the initial state list correctly. I get:
ValueError: An initial_state was passed that is not compatible with `cell.state_size`. Received `state_spec`=[<keras.engine.topology.InputSpec object at 0x11691d518>]; However `cell.state_size` is (8, 8)
which I can see is caused by the 3d batch dimension. I tried using Flatten, Permutation, and Resize layers but I don't believe that is correct. What am I missing and how can I connect these layers?
The first problem is that an LSTM(8) layer expects two initial states h_0 and c_0, each of dimension (None, 8). That's what it means by "cell.state_size is (8, 8)" in the error message.
If you only have one initial state dense_2, maybe you can switch to GRU (which requires only h_0). Or, you can transform your feature_input into two initial states.
The second problem is that h_0 and c_0 are of shape (batch_size, 8), but your dense_2 is of shape (batch_size, timesteps, 8). You need to deal with the time dimension before using dense_2 as initial states.
So maybe you can change your input shape into (data.training_features.shape[1],) or take average over timesteps with GlobalAveragePooling1D.
A working example would be:
from keras.layers import Input, Dense, LSTM
from keras.models import Model

feature_input = Input(shape=(5,))
dense_1_h = Dense(4, activation='relu')(feature_input)
dense_2_h = Dense(8, activation='relu')(dense_1_h)
dense_1_c = Dense(4, activation='relu')(feature_input)
dense_2_c = Dense(8, activation='relu')(dense_1_c)
series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[dense_2_h, dense_2_c])
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input,series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])
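If the per-company features really do arrive as a 3-D (batch, timesteps, features) tensor, one possible variant is to average over time first (the GlobalAveragePooling1D idea mentioned above) and then derive both states. This is a sketch under that assumption; the feature width of 5 is illustrative:
from keras.layers import Input, Dense, LSTM, GlobalAveragePooling1D
from keras.models import Model

feature_input = Input(shape=(None, 5))            # time-varying view of the static features
pooled = GlobalAveragePooling1D()(feature_input)  # -> (batch, 5)
h0 = Dense(8, activation='relu')(pooled)          # initial hidden state
c0 = Dense(8, activation='relu')(pooled)          # initial cell state

series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[h0, c0])
out = Dense(1, activation='sigmoid')(lstm)

model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mape'])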
I'm very new to keras and also to python.
I have a time series dataset with different sequence lengths (for example 1st sequence is 484000x128, 2nd sequence is 563110x128, etc)
I've put the sequences in 3D array.
My question is how to define the input shape, because I'm confused. I was using DL4J, but its concept of defining the network configuration is different.
Here is my first trial code:
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding,LSTM,Dense,Dropout
## Loading dummy data
sequences = np.array([[[1,2,3],[1,2,3]], [[4,5,6],[4,5,6],[4,5,6]]])
y = np.array([[[0],[0]], [[1],[1],[1]]])
x_test=np.array([[2,3,2],[4,6,7],[1,2,1]])
y_test=np.array([0,1,1])
n_epochs=40
# model configuration
model = Sequential()
model.add(LSTM(100, input_shape=(3,1), activation='tanh', recurrent_activation='hard_sigmoid')) # 100 num of LSTM units
model.add(LSTM(100, activation='tanh', recurrent_activation='hard_sigmoid'))
model.add(Dense(1, activation='softmax'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
print(model.summary())
## training with batches of size 1 (each batch is a sequence)
for epoch in range(n_epochs):
    for seq, label in zip(sequences, y):
        model.train_on_batch(np.array([seq]), np.array([label]))  # train one batch (one sequence) at a time
scores = model.evaluate(x_test, y_test)  # evaluate batch at a time..
Here are the docs on input shapes for LSTMs:
Input shapes
3D tensor with shape (batch_size, timesteps, input_dim), (Optional) 2D
tensors with shape (batch_size, output_dim).
This implies that you're going to need a constant number of timesteps for each batch.
The canonical way of doing this is to pad your sequences using something like Keras's padding utility (pad_sequences);
then you can try:
# let say timestep you choose: is 700000 and dimension of the vectors are 128
timestep = 700000
dims = 128
model.add(LSTM(100, input_shape=(timestep, dims),
               activation='tanh', recurrent_activation='hard_sigmoid'))
I edited the answer to remove the batch_size argument. With this setup the batch size is unspecified; you can set it when fitting the model (in model.fit()).
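A small hedged sketch of that padding step (toy sequences of 3 and 5 timesteps rather than ~700000, with 128-dimensional feature vectors as in the question):
import numpy as np
from keras.preprocessing.sequence import pad_sequences

seq_a = np.ones((3, 128))   # 3 timesteps, 128 features
seq_b = np.ones((5, 128))   # 5 timesteps, 128 features
padded = pad_sequences([seq_a, seq_b], padding='post', dtype='float32')
print(padded.shape)         # (2, 5, 128) -> a constant-timestep batch ready for the LSTM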