Adding a BatchNormalization layer to an LSTM encoder-decoder model - python

I am interested in how to add a BatchNormalization layer to an LSTM encoder-decoder model. I have code for an LSTM encoder-decoder model that does time series forecasting.
# Imports assumed for this snippet
from keras.models import Model
from keras.layers import Input, LSTM, Dense
from keras.optimizers import Adam

num_features = X_train.shape[2]
# Define an input series and encode it with an LSTM.
encoder_inputs = Input(shape=(None, num_features))
encoder = LSTM(units_size, return_state=True, dropout=dropout)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the final states. These represent the "context"
# vector that we use as the basis for decoding.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
# This is where teacher forcing inputs are fed in.
decoder_inputs = Input(shape=(None, 1))
# We set up our decoder using `encoder_states` as initial state.
# We return full output sequences and return internal states as well.
# We don't use the return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(units_size, return_sequences=True, return_state=True, dropout=dropout)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
initial_state=encoder_states)
decoder_dense = Dense(1) # 1 continuous output at each timestep
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(Adam(lr = learning_rate), loss='mean_absolute_error')
I would like to add a BatchNormalization layer to the decoder part, but I do not know how I should use it. I would appreciate any help.
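One possible placement, as an untested sketch rather than a definitive recipe: apply BatchNormalization to the decoder LSTM's output sequence, just before the final Dense layer. With its default axis, BatchNormalization normalizes the last (feature) axis of the 3D decoder output. Note that some practitioners prefer LayerNormalization for recurrent models, since batch statistics over variable-length sequences can be noisy.

from keras.layers import BatchNormalization

decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
# Normalize the per-timestep features of the decoder output (default axis=-1)
decoder_outputs = BatchNormalization()(decoder_outputs)
decoder_outputs = decoder_dense(decoder_outputs)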

Related

Attention layer to keras seq2seq model

I have seen that Keras now comes with an Attention layer. However, I have some problems using it in my seq2seq model.
This is the working seq2seq model without attention:
latent_dim = 300
embedding_dim = 200
clear_session()
# Encoder
encoder_inputs = Input(shape=(max_text_len, ))
# Embedding layer
enc_emb = Embedding(x_voc, embedding_dim,
trainable=True)(encoder_inputs)
# Encoder LSTM 1
encoder_lstm1 = Bidirectional(LSTM(latent_dim, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_output1, forward_h1, forward_c1, backward_h1, backward_c1) = encoder_lstm1(enc_emb)
# Encoder LSTM 2
encoder_lstm2 = Bidirectional(LSTM(latent_dim, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_output2, forward_h2, forward_c2, backward_h2, backward_c2) = encoder_lstm2(encoder_output1)
# Encoder LSTM 3
encoder_lstm3 = Bidirectional(LSTM(latent_dim, return_state=True,
return_sequences=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_outputs, forward_h, forward_c, backward_h, backward_c) = encoder_lstm3(encoder_output2)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
# Set up the decoder, using encoder_states as the initial state
decoder_inputs = Input(shape=(None, ))
# Embedding layer
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)
# Decoder LSTM
decoder_lstm = LSTM(latent_dim*2, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.2)
(decoder_outputs, decoder_fwd_state, decoder_back_state) = \
decoder_lstm(dec_emb, initial_state=[state_h, state_c])
# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
I have modified the model to add Attention like this (this is after # Decoder LSTM and right before # Dense layer):
attn_out, attn_states = Attention()([encoder_outputs, decoder_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attn_out])
# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
This throws TypeError: Cannot iterate over a Tensor with unknown first dimension.
How do I apply an attention mechanism to my seq2seq model? If the Keras Attention layer does not work, or other approaches are easier to use, I am happy to use them as well.
This is how I run my model:
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=2)
history = model.fit(
[x_tr, y_tr[:, :-1]],
y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:],
epochs=50,
callbacks=[es],
batch_size=128,
verbose=1,
validation_data=([x_val, y_val[:, :-1]],
                 y_val.reshape(y_val.shape[0], y_val.shape[1], 1)[:, 1:]),
)
The shape of x_tr is (89674, 300) and of y_tr[:, :-1] is (89674, 14). Similarly, the shapes of x_val and y_val[:, :-1] are (9964, 300) and (9964, 14), respectively.
You are using the Attention layer from Keras; it returns only a single 3D tensor, not two tensors.
So your code should be:
attn_out = Attention()([encoder_outputs, decoder_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attn_out])
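If you also want the attention weights back (for example to visualize them), newer tf.keras versions of the Attention layer can return them through a call argument; a hedged sketch, since availability depends on your TensorFlow version:

attn_out, attn_scores = Attention()(
    [encoder_outputs, decoder_outputs], return_attention_scores=True)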

How to do inference on seq2seq RNN?

I'm trying to create a chatbot using an RNN in TensorFlow, using this introduction https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
The model in the example is character-based, but I want to build a word-level model. The tutorial has a tiny bit of info in the "Bonus FAQ" section on how to modify the model to make it word-level. I am using GloVe pretrained word embeddings.
My model looks like this:
emb_dimension = 100
# Set up embedding layer using pretrained weights
embedding_layer = Embedding(total_words, emb_dimension, input_length=max_input_len, weights=[embedding_matrix], name="Embedding")
# Set up input sequence
encoder_inputs = Input(shape=(None,))
x = embedding_layer(encoder_inputs)
encoder_lstm = LSTM(100, return_state=True)
x, state_h, state_c = encoder_lstm(x)
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
x = embedding_layer(decoder_inputs)
decoder_lstm = LSTM(100, return_sequences=True)
decoder_lstm(x, initial_state=encoder_states)
decoder_outputs = Dense(total_words, activation='softmax')(x)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
It seems to train fine, but I don't know how to use this model to process new text. The tutorial has an inference example, but this has not been modified for a word-level model, and I can't figure out how to do it. Particularly this bit in the example:
encoder_model = Model(encoder_inputs, encoder_states)
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs] + decoder_states)
I tried modifying this code to add an embedding layer x = embedding_layer(decoder_inputs) and then use x for the input to the decoder lstm, but I get an error: TypeError: Cannot iterate over a Tensor with unknown first dimension.
How do I set up an inference model?
Writing a decoder for inference is not that easy. First of all, you have to understand that your current decoder uses teacher forcing (meaning that the correct token from the previous position is fed in for the prediction at the next timestep). For inference you would either need the greedy algorithm or beam search. These steps are described in this tutorial: https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt
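As a rough illustration of the greedy approach for a word-level model like the one above (a sketch only: word2idx, idx2word and the <start>/<end> tokens are hypothetical, dense_layer stands for the trained softmax Dense layer, and the decoder LSTM is assumed to have been created with return_state=True):

import numpy as np
from keras.models import Model
from keras.layers import Input

# Encoder model: input word indices -> final LSTM states
encoder_model = Model(encoder_inputs, encoder_states)

# One-step decoder model: previous word + previous states -> word distribution + new states
state_h_in = Input(shape=(100,))
state_c_in = Input(shape=(100,))
dec_emb = embedding_layer(decoder_inputs)
dec_out, h, c = decoder_lstm(dec_emb, initial_state=[state_h_in, state_c_in])
decoder_model = Model([decoder_inputs, state_h_in, state_c_in],
                      [dense_layer(dec_out), h, c])

def greedy_decode(input_seq, max_len=20):
    states = encoder_model.predict(input_seq)
    target = np.array([[word2idx['<start>']]])   # hypothetical start-of-sequence token
    words = []
    for _ in range(max_len):
        probs, h, c = decoder_model.predict([target] + states)
        idx = int(np.argmax(probs[0, -1, :]))
        if idx2word[idx] == '<end>':             # hypothetical end-of-sequence token
            break
        words.append(idx2word[idx])
        target, states = np.array([[idx]]), [h, c]
    return ' '.join(words)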

Specifying a seq2seq autoencoder. What does RepeatVector do? And what is the effect of batch learning on predicting output?

I am building a basic seq2seq autoencoder, but I'm not sure if I'm doing it correctly.
model = Sequential()
# Encoder
model.add(LSTM(32, activation='relu', input_shape =(timesteps, n_features ), return_sequences=True))
model.add(LSTM(16, activation='relu', return_sequences=False))
model.add(RepeatVector(timesteps))
# Decoder
model.add(LSTM(16, activation='relu', return_sequences=True))
model.add(LSTM(32, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
The model is then fit using a batch size parameter
model.fit(data, data,
epochs=30,
batch_size = 32)
The model is compiled with the mse loss function and seems to learn.
To get the encoder output for the test data, I am using a K function:
get_encoder_output = K.function([model.layers[0].input],
[model.layers[1].output])
encoder_output = get_encoder_output([test_data])[0]
My first question is whether the model is specified correctly. In particular whether the RepeatVector layer is needed. I'm not sure what it is doing. What if I omit it and specify the preceding layer with return_sequences = True?
My second question is whether I need to tell get_encoder_output about the batch_size used in training?
Thanks in advance for any help on either question.
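On the RepeatVector question: it simply tiles the encoder's final vector along a new time axis so that the decoder LSTM receives a sequence again. A small illustration of the shape change (the sizes are arbitrary):

from keras.layers import Input, RepeatVector
from keras.models import Model

latent = Input(shape=(16,))             # encoder summary vector, shape (batch, 16)
seq = RepeatVector(10)(latent)          # tiled to (batch, 10, 16) for the decoder
print(Model(latent, seq).output_shape)  # (None, 10, 16)

If you omit it and instead set return_sequences=True on the preceding layer, the decoder sees the per-timestep encoder outputs rather than a repeated summary vector, which is a different (but also common) design.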
This might prove useful to you:
As a toy problem I created a seq2seq model for predicting the continuation of different sine waves.
This was the model:
def create_seq2seq():
    features_num = 5
    latent_dim = 40

    encoder_inputs = Input(shape=(None, features_num))
    encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoder_inputs)
    encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoded)
    encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoded)
    encoded = LSTM(latent_dim, return_state=True)(encoded)
    encoder = Model(inputs=encoder_inputs, outputs=encoded)

    encoder_outputs, state_h, state_c = encoder(encoder_inputs)
    encoder_states = [state_h, state_c]

    decoder_inputs = Input(shape=(1, features_num))
    decoder_lstm_1 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_2 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_3 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_4 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_dense = Dense(features_num)

    all_outputs = []
    inputs = decoder_inputs
    states_1 = encoder_states
    # Placeholder values:
    states_2 = states_1
    states_3 = states_1
    states_4 = states_1

    for _ in range(1):
        # Run the decoder on the first timestep
        outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
        outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1)
        outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2)
        outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3)
        # Store the current prediction (we will concatenate all predictions later)
        outputs = decoder_dense(outputs_4)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration
        # as well as update the states
        inputs = outputs
        states_1 = [state_h_1, state_c_1]
        states_2 = [state_h_2, state_c_2]
        states_3 = [state_h_3, state_c_3]
        states_4 = [state_h_4, state_c_4]

    for _ in range(149):
        # Run the decoder on each timestep
        outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
        outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1, initial_state=states_2)
        outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2, initial_state=states_3)
        outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3, initial_state=states_4)
        # Store the current prediction (we will concatenate all predictions later)
        outputs = decoder_dense(outputs_4)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration
        # as well as update the states
        inputs = outputs
        states_1 = [state_h_1, state_c_1]
        states_2 = [state_h_2, state_c_2]
        states_3 = [state_h_3, state_c_3]
        states_4 = [state_h_4, state_c_4]

    # Concatenate all predictions
    decoder_outputs = Lambda(lambda x: K.concatenate(x, axis=1))(all_outputs)

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    # model = load_model('pre_model.h5')

    print(model.summary())
    return model
The best way, in my opinion, to implement a seq2seq LSTM in Keras is to use two LSTM models and have the first one transfer its states to the second one.
Your last LSTM layer in the encoder will need return_state=True, return_sequences=False so it will pass on its h and c.
You will then need to set up an LSTM decoder that receives these as its initial_state.
For the decoder input you will most likely want a "start of sequence" token as the first timestep input, and afterwards use the decoder output of the nth timestep as the decoder input at the (n+1)th timestep.
After you have mastered this, have a look at Teacher Forcing.
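A minimal, untrained sketch of that state-transfer pattern (the shapes, the prediction horizon, and the all-zeros start-of-sequence input are illustrative assumptions):

import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense

n_features, latent_dim, n_steps = 5, 40, 10

# Encoder: the last LSTM returns its final h and c states
encoder_inputs = Input(shape=(None, n_features))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_model = Model(encoder_inputs, [state_h, state_c])

# One-step decoder: one timestep plus the previous states
decoder_inputs = Input(shape=(1, n_features))
state_in_h = Input(shape=(latent_dim,))
state_in_c = Input(shape=(latent_dim,))
dec_out, dec_h, dec_c = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(decoder_inputs,
                                                initial_state=[state_in_h, state_in_c])
dec_out = Dense(n_features)(dec_out)
decoder_model = Model([decoder_inputs, state_in_h, state_in_c], [dec_out, dec_h, dec_c])

# Inference loop: start with a "start of sequence" input, then feed each prediction back in
source = np.random.rand(1, 20, n_features)    # stand-in for a real input series
states = encoder_model.predict(source)
step_input = np.zeros((1, 1, n_features))     # stand-in start-of-sequence token
predictions = []
for _ in range(n_steps):
    out, h, c = decoder_model.predict([step_input] + states)
    predictions.append(out[0, 0])
    step_input, states = out, [h, c]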

Keras example word-level model with integer sequences gives `expected ndim=3, found ndim=4`

I'm trying to implement the Keras word-level example on their blog listed under the Bonus Section -> What if I want to use a word-level model with integer sequences?
I've marked up the layers with names to help me reconnect the layers from a loaded model to an inference model later. I think I've followed their example model:
# Define an input sequence and process it - where the shape is (timesteps, n_features)
encoder_inputs = Input(shape=(None, src_vocab), name='enc_inputs')
# Add an embedding layer to process the integer encoded words to give some 'sense' before the LSTM layer
encoder_embedding = Embedding(src_vocab, latent_dim, name='enc_embedding')(encoder_inputs)
# The return_state constructor argument configures an RNN layer to return a list where the first entry is the outputs
# and the next entries are the internal RNN states. This is used to recover the states of the encoder.
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_state=True, name='encoder_lstm')(encoder_embedding)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state of the RNN.
decoder_inputs = Input(shape=(None, target_vocab), name='dec_inputs')
decoder_embedding = Embedding(target_vocab, latent_dim, name='dec_embedding')(decoder_inputs)
# The return_sequences constructor argument configures an RNN to return its full sequence of outputs (instead of
# just the last output, which is the default behavior).
decoder_lstm = LSTM(latent_dim, return_sequences=True, name='dec_lstm')(decoder_embedding, initial_state=encoder_states)
decoder_outputs = Dense(target_vocab, activation='softmax', name='dec_outputs')(decoder_lstm)
# Put the model together
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
but I get
ValueError: Input 0 is incompatible with layer encoder_lstm: expected ndim=3, found ndim=4
on the line
encoder_outputs, state_h, state_c = LSTM(...
What am I missing? Or is the example on the blog assuming a step that I've skipped?
Update:
And I'm training with:
X = [source_data, target_data]
y = offset_data(target_data)
model.fit(X, y, ...)
Update 2:
So, I'm still not quite there. I have my decoder_lstm and decoder_outputs defined like above and have fixed the inputs. When I load my model from an h5 file and build my inference model, I try and connect to the training model with
decoder_inputs = model.input[1] # dec_inputs (Input(shape=(None,)))
# decoder_embedding = model.layers[3] # dec_embedding (Embedding(target_vocab, latent_dim))
target_vocab = model.output_shape[2]
decoder_state_input_h = Input(shape=(latent_dim,), name='input_3') # named to avoid conflict
decoder_state_input_c = Input(shape=(latent_dim,), name='input_4')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
# Use decoder_lstm from the training model
# decoder_lstm = LSTM(latent_dim, return_sequences=True)
decoder_lstm = model.layers[5] # dec_lstm
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
but I get an error
ValueError: Input 0 is incompatible with layer dec_lstm: expected ndim=3, found ndim=2
Trying to pass decoder_embedding rather than decoder_inputs fails too.
I'm trying to adapt the example of lstm_seq2seq_restore.py but it doesn't include the complexity of the embedding layer.
Update 3:
When I use decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedding, ...) to build the inference model I've confirmed that decoder_embedding is an object of type Embedding but I get:
ValueError: Layer dec_lstm was called with an input that isn't a symbolic tensor. Received type: <class 'keras.layers.embeddings.Embedding'>. Full input: [<keras.layers.embeddings.Embedding object at 0x1a1f22eac8>, <tf.Tensor 'input_3:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'input_4:0' shape=(?, 256) dtype=float32>]. All inputs to the layer should be tensors.
The full code for this model is on Bitbucket.
The problem is the input shape of the Input layer. An Embedding layer accepts a sequence of integers as input, corresponding to the word indices in a sentence. Since the number of words per sentence is not fixed here, you must set the input shape of the Input layer to (None,).
I think you are confusing this with the case where there is no Embedding layer in the model, in which case the input shape must be (timesteps, n_features) to be compatible with the LSTM layer.
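A minimal sketch of that fix, reusing the names from the question (src_vocab, target_vocab and latent_dim are assumed to be defined as in your code):

from keras.layers import Input, Embedding, LSTM

# Word indices go straight into the Input; the Embedding layer maps them to vectors
encoder_inputs = Input(shape=(None,), name='enc_inputs')
encoder_embedding = Embedding(src_vocab, latent_dim, name='enc_embedding')(encoder_inputs)
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_state=True,
                                         name='encoder_lstm')(encoder_embedding)

# Same change on the decoder side
decoder_inputs = Input(shape=(None,), name='dec_inputs')
decoder_embedding = Embedding(target_vocab, latent_dim, name='dec_embedding')(decoder_inputs)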
Update:
You need to pass the decoder_inputs to the Embedding layer first and then pass the resulting output tensor to the decoder_lstm layer like this:
decoder_inputs = model.input[1] # (Input(shape=(None,)))
# pass the inputs to the embedding layer
decoder_embedding = model.get_layer(name='dec_embedding')(decoder_inputs)
# ...
decoder_lstm = model.get_layer(name='dec_lstm') # dec_lstm
decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedding, ...)
Update 2:
At training time, when creating the decoder_lstm layer, you need to set return_state=True:
decoder_lstm, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True, name='dec_lstm')(decoder_embedding, initial_state=encoder_states)

Keras: Why is TimeDistributed not used instead of Dense in the official seq2seq example?

I'm now trying to build seq2seq model on top of Keras.
I referred to this official seq2seq example, but I wondered why it does not use a TimeDistributed layer instead of Dense.
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax') # <- Here!
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
A seq2seq model must deal with a many-to-many problem, so each timestep must be processed separately. Therefore I thought this model should have a TimeDistributed(Dense()) layer, but it actually uses Dense.
Can anyone explain the reason to me?
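As a quick sanity check (a small sketch, not from the original example): a Dense layer applied to a 3D tensor in Keras applies the same kernel to the last axis at every timestep, so in this case it behaves like TimeDistributed(Dense):

from keras.layers import Input, Dense, TimeDistributed
from keras.models import Model

x = Input(shape=(None, 64))                   # (batch, timesteps, features)
dense_out = Dense(10, activation='softmax')(x)
td_out = TimeDistributed(Dense(10, activation='softmax'))(x)
print(Model(x, dense_out).output_shape)       # (None, None, 10)
print(Model(x, td_out).output_shape)          # (None, None, 10)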
