Attention layer to keras seq2seq model - python

I have seen that Keras now comes with an Attention layer. However, I am having trouble using it in my seq2seq model.
This is the working seq2seq model without attention:
latent_dim = 300
embedding_dim = 200
clear_session()
# Encoder
encoder_inputs = Input(shape=(max_text_len, ))
# Embedding layer
enc_emb = Embedding(x_voc, embedding_dim,
trainable=True)(encoder_inputs)
# Encoder LSTM 1
encoder_lstm1 = Bidirectional(LSTM(latent_dim, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_output1, forward_h1, forward_c1, backward_h1, backward_c1) = encoder_lstm1(enc_emb)
# Encoder LSTM 2
encoder_lstm2 = Bidirectional(LSTM(latent_dim, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_output2, forward_h2, forward_c2, backward_h2, backward_c2) = encoder_lstm2(encoder_output1)
# Encoder LSTM 3
encoder_lstm3 = Bidirectional(LSTM(latent_dim, return_state=True,
return_sequences=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_outputs, forward_h, forward_c, backward_h, backward_c) = encoder_lstm3(encoder_output2)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
# Set up the decoder, using encoder_states as the initial state
decoder_inputs = Input(shape=(None, ))
# Embedding layer
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)
# Decoder LSTM
decoder_lstm = LSTM(latent_dim*2, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.2)
(decoder_outputs, decoder_fwd_state, decoder_back_state) = \
decoder_lstm(dec_emb, initial_state=[state_h, state_c])
# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
I modified the model to add attention like this (this goes after # Decoder LSTM and right before # Dense layer):
attn_out, attn_states = Attention()([encoder_outputs, decoder_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attn_out])
# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
This throws TypeError: Cannot iterate over a Tensor with unknown first dimension.
How do I apply an attention mechanism to my seq2seq model? If the Keras Attention layer does not work, or other models are easier to use, I am happy to use them instead.
This is how I run my model:
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=2)
history = model.fit(
[x_tr, y_tr[:, :-1]],
y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:],
epochs=50,
callbacks=[es],
batch_size=128,
verbose=1,
validation_data=([x_val, y_val[:, :-1]],
y_val.reshape(y_val.shape[0], y_val.shape[1], 1)[:, 1:]),
)
The shape of x_tr is (89674, 300) and the shape of y_tr[:, :-1] is (89674, 14). Similarly, the shapes of x_val and y_val[:, :-1] are (9964, 300) and (9964, 14) respectively.

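The Attention layer from Keras returns only a single 3D tensor by default, not two tensors, so unpack just one value. It also expects its inputs in [query, value] order and returns one vector per query time step, so pass decoder_outputs first; otherwise the result has the encoder's length and cannot be concatenated with decoder_outputs.
So your code must be:
attn_out = Attention()([decoder_outputs, encoder_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attn_out])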
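To see why the argument order matters, here is a minimal NumPy sketch of dot-product attention. The helper `dot_product_attention` and the small shapes are made up for illustration, and this is not the Keras implementation. It only shows that the output follows the time dimension of the query:

```python
import numpy as np

def dot_product_attention(query, value):
    """Illustrative dot-product attention; `value` also serves as the key."""
    # Similarity of every query step to every value step: (batch, Tq, Tv)
    scores = np.einsum("bqd,bvd->bqv", query, value)
    # Numerically stable softmax over the value (encoder) axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of value vectors: (batch, Tq, dim)
    return np.einsum("bqv,bvd->bqd", weights, value)

rng = np.random.default_rng(0)
encoder_out = rng.normal(size=(2, 16, 8))  # (batch, encoder steps, dim), made-up sizes
decoder_out = rng.normal(size=(2, 5, 8))   # (batch, decoder steps, dim)

# Decoder output as query: one context vector per decoder step
context = dot_product_attention(query=decoder_out, value=encoder_out)
concat = np.concatenate([decoder_out, context], axis=-1)
print(context.shape, concat.shape)
```

With the arguments swapped, the result would have 16 time steps instead of 5, and concatenating it with the decoder output along the last axis would fail.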

Related

How to do inference on seq2seq RNN?

I'm trying to create a chatbot using an RNN in TensorFlow, using this introduction https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
The model in the example is a character based sequence, but I want to do a word-level model. The tutorial has a tiny bit of info in the "Bonus FAQ" section on how to modify the model to make it word-level. I am using GloVe pretrained word embeddings.
My model looks like this:
emb_dimension = 100
# Set up embedding layer using pretrained weights
embedding_layer = Embedding(total_words, emb_dimension, input_length=max_input_len, weights=[embedding_matrix], name="Embedding")
# Set up input sequence
encoder_inputs = Input(shape=(None,))
x = embedding_layer(encoder_inputs)
encoder_lstm = LSTM(100, return_state=True)
x, state_h, state_c = encoder_lstm(x)
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
x = embedding_layer(decoder_inputs)
decoder_lstm = LSTM(100, return_sequences=True)
# Keep the decoder LSTM output so the Dense layer sees it instead of the raw embedding
x = decoder_lstm(x, initial_state=encoder_states)
decoder_outputs = Dense(total_words, activation='softmax')(x)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
It seems to train fine, but I don't know how to use this model to process new text. The tutorial has an inference example, but this has not been modified for a word-level model, and I can't figure out how to do it. Particularly this bit in the example:
encoder_model = Model(encoder_inputs, encoder_states)
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs] + decoder_states)
I tried modifying this code to add an embedding layer x = embedding_layer(decoder_inputs) and then use x for the input to the decoder lstm, but I get an error: TypeError: Cannot iterate over a Tensor with unknown first dimension.
How do I set up an inference model?
Writing a decoder for inference is not that easy. First of all, you have to understand that your current decoder uses teacher forcing (meaning that the correct token from the previous position is given as input for the prediction at the next timestep). For inference you would need either greedy decoding or beam search. Those steps are described in this tutorial: https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt
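As a rough illustration of greedy decoding, the loop below feeds each predicted token back in as the next input. Everything here is a hypothetical sketch: `step_fn` stands in for whatever runs one decoder step (for example a wrapper around a separate inference decoder model), and `toy_step` is a dummy used only to make the example runnable.

```python
def greedy_decode(step_fn, initial_state, start_id, end_id, max_len):
    """Generate token ids one at a time, always taking the most probable token."""
    token, state, output = start_id, initial_state, []
    for _ in range(max_len):
        # step_fn (assumed interface): (previous token, state) -> (probabilities, new state)
        probs, state = step_fn(token, state)
        token = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if token == end_id:
            break
        output.append(token)
    return output

# Toy stand-in for a decoder: always predicts (previous token + 1) modulo 5
def toy_step(token, state):
    probs = [0.0] * 5
    probs[(token + 1) % 5] = 1.0
    return probs, state + 1

decoded = greedy_decode(toy_step, initial_state=0, start_id=0, end_id=4, max_len=10)
print(decoded)
```

Beam search follows the same pattern but keeps several candidate sequences per step instead of only the single best token.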

Dimension mismatch in Keras sequence to sequence model with Attention

I am trying to build a Neural Machine Translation model with attention. I am following the tutorial on the Keras blog that shows how to build an NMT model using the sequence-to-sequence approach (without attention). I extended the model to incorporate attention in the following way:
latent_dim = 300
embedding_dim=100
batch_size = 128
# Encoder
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
#encoder lstm 1
encoder_lstm = tf.keras.layers.LSTM(latent_dim,return_sequences=True,return_state=True,dropout=0.4,recurrent_dropout=0.4)
encoder_output, state_h, state_c = encoder_lstm(encoder_inputs)
print(encoder_output.shape)
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = tf.keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True,dropout=0.4,recurrent_dropout=0.2)
decoder_output,decoder_fwd_state, decoder_back_state = decoder_lstm(decoder_inputs,initial_state=[state_h, state_c])
# Attention layer
attn_out = tf.keras.layers.Attention()([encoder_output, decoder_output])
# Concat attention input and decoder LSTM output
decoder_concat_input = tf.keras.layers.Concatenate(axis=-1, name='concat_layer')([decoder_output, attn_out])
#dense layer
decoder_dense = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_decoder_tokens, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
# Define the model
attn_model = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
attn_model.summary()
To train the model -
attn_model.compile(
optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"]
)
history = attn_model.fit(
[encoder_input_data, decoder_input_data],
decoder_target_data,
batch_size=batch_size,
epochs=5,
validation_split=0.2,
)
The data have the following shapes:
encoder_input_data.shape is (10000, 16, 71)
decoder_input_data.shape is (10000, 59, 92)
decoder_target_data.shape is (10000, 59, 92)
When I train this model, I get the following error:
InvalidArgumentError: Dimension 1 in both shapes must be equal, but are 59 and 16. Shapes are [?,59] and [?,16]. for 'model/concat_layer/concat' (op: 'ConcatV2') with input shapes: [?,59,300], [?,16,300], [] and with computed input tensors: input[2] = <2>.
I understand that it is complaining about the differing time dimensions of encoder_input_data and decoder_input_data, but the same setup works with the regular sequence-to-sequence model (without attention) discussed on the Keras blog. In this case, the error comes from the Concatenate layer.
Can anyone please suggest how to fix this?

Specifying a seq2seq autoencoder. What does RepeatVector do? And what is the effect of batch learning on predicting output?

I am building a basic seq2seq autoencoder, but I'm not sure if I'm doing it correctly.
model = Sequential()
# Encoder
model.add(LSTM(32, activation='relu', input_shape =(timesteps, n_features ), return_sequences=True))
model.add(LSTM(16, activation='relu', return_sequences=False))
model.add(RepeatVector(timesteps))
# Decoder
model.add(LSTM(16, activation='relu', return_sequences=True))
model.add(LSTM(32, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
The model is then fit using a batch size parameter
model.fit(data, data,
epochs=30,
batch_size = 32)
The model is compiled with the mse loss function and seems to learn.
To get the encoder output for the test data, I am using a K function:
get_encoder_output = K.function([model.layers[0].input],
[model.layers[1].output])
encoder_output = get_encoder_output([test_data])[0]
My first question is whether the model is specified correctly. In particular whether the RepeatVector layer is needed. I'm not sure what it is doing. What if I omit it and specify the preceding layer with return_sequences = True?
My second question is whether I need to tell get_encoder_output about the batch_size used in training?
Thanks in advance for any help on either question.
This might prove useful to you:
As a toy problem I created a seq2seq model for predicting the continuation of different sine waves.
This was the model:
from keras import backend as K
from keras.layers import Dense, Input, Lambda, LSTM
from keras.models import Model

def create_seq2seq():
    features_num = 5
    latent_dim = 40
    ##
    encoder_inputs = Input(shape=(None, features_num))
    encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoder_inputs)
    encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoded)
    encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoded)
    encoded = LSTM(latent_dim, return_state=True)(encoded)
    encoder = Model(inputs=encoder_inputs, outputs=encoded)
    ##
    encoder_outputs, state_h, state_c = encoder(encoder_inputs)
    encoder_states = [state_h, state_c]
    decoder_inputs = Input(shape=(1, features_num))
    decoder_lstm_1 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_2 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_3 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_4 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_dense = Dense(features_num)
    all_outputs = []
    inputs = decoder_inputs
    states_1 = encoder_states
    # Placeholder values:
    states_2 = states_1; states_3 = states_1; states_4 = states_1
    ###
    for _ in range(1):
        # Run the decoder on the first timestep
        outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
        outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1)
        outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2)
        outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3)
        # Store the current prediction (we will concatenate all predictions later)
        outputs = decoder_dense(outputs_4)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration
        # as well as update the states
        inputs = outputs
        states_1 = [state_h_1, state_c_1]
        states_2 = [state_h_2, state_c_2]
        states_3 = [state_h_3, state_c_3]
        states_4 = [state_h_4, state_c_4]
    for _ in range(149):
        # Run the decoder on each timestep
        outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
        outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1, initial_state=states_2)
        outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2, initial_state=states_3)
        outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3, initial_state=states_4)
        # Store the current prediction (we will concatenate all predictions later)
        outputs = decoder_dense(outputs_4)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration
        # as well as update the states
        inputs = outputs
        states_1 = [state_h_1, state_c_1]
        states_2 = [state_h_2, state_c_2]
        states_3 = [state_h_3, state_c_3]
        states_4 = [state_h_4, state_c_4]
    # Concatenate all predictions
    decoder_outputs = Lambda(lambda x: K.concatenate(x, axis=1))(all_outputs)
    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    #model = load_model('pre_model.h5')
    model.summary()
    return model
The best way, in my opinion, to implement a seq2seq LSTM in Keras is to use two LSTM models and have the first one transfer its states to the second one.
The last LSTM layer in the encoder needs return_state=True, return_sequences=False so that it passes on its h and c.
You then need an LSTM decoder that receives these as its initial_state.
For the decoder input you will most likely want a "start of sequence" token as the first time step's input, and afterwards use the decoder output of the nth time step as the decoder input at the (n+1)th time step.
After you have mastered this, have a look at teacher forcing.
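With teacher forcing, the decoder is fed the ground-truth sequence shifted by one step rather than its own predictions. A minimal sketch of how the decoder input and target could be built from one array of token ids follows; the ids for the start, end, and padding tokens are made up for illustration.

```python
import numpy as np

START, END, PAD = 1, 2, 0  # hypothetical token ids

# Two target sequences, already wrapped in start/end tokens and padded
targets = np.array([[START, 7, 8, 9, END],
                    [START, 5, 6, END, PAD]])

decoder_input = targets[:, :-1]   # what the decoder sees at each step
decoder_target = targets[:, 1:]   # what it should predict at that step

print(decoder_input[0], decoder_target[0])
```

At every position the target is simply the next token of the input, so the decoder learns to predict step n+1 from the correct tokens up to step n.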

Adding Batchnormalization layer to lstm encoder decoder model

I am interested in how to add a BatchNormalization layer to an LSTM encoder-decoder model. I have code for an LSTM encoder-decoder model that does time series forecasting.
num_features = X_train.shape[2]
# Define an input series and encode it with an LSTM.
encoder_inputs = Input(shape=(None, num_features))
encoder = LSTM(units_size, return_state=True, dropout=dropout)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the final states. These represent the "context"
# vector that we use as the basis for decoding.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
# This is where teacher forcing inputs are fed in.
decoder_inputs = Input(shape=(None, 1))
# We set up our decoder using `encoder_states` as initial state.
# We return full output sequences and return internal states as well.
# We don't use the return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(units_size, return_sequences=True, return_state=True, dropout=dropout)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
initial_state=encoder_states)
decoder_dense = Dense(1) # 1 continuous output at each timestep
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(Adam(lr = learning_rate), loss='mean_absolute_error')
I would like to add a BatchNormalization layer to the decoder part, but I do not know how I should use it. I would appreciate any help.

Keras embedding layer causing dimensionality problems

I am currently trying to add an embedding layer to my sequence-to-sequence autoencoder, built with the Keras functional API.
The model code looks like this:
#Encoder inputs
encoder_inputs = Input(shape=(None,))
#Embedding
embedding_layer = Embedding(input_dim=n_tokens, output_dim=2)
encoder_embedded = embedding_layer(encoder_inputs)
#Encoder LSTM
encoder_outputs, state_h, state_c = LSTM(n_hidden, return_state=True)(encoder_embedded)
lstm_states = [state_h, state_c]
#Decoder Inputs
decoder_inputs = Input(shape=(None,))
#Embedding
decoder_embedded = embedding_layer(decoder_inputs)
#Decoder LSTM
decoder_lstm = LSTM(n_hidden, return_sequences=True, return_state=True, )
decoder_outputs, _, _ = decoder_lstm(decoder_embedded, initial_state=lstm_states)
#Dense + Time
decoder_dense = TimeDistributed(Dense(n_tokens, activation='softmax'), input_shape=(None, None, 256))
#decoder_dense = Dense(n_tokens, activation='softmax', )
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
The model is trained like this:
model.fit([X, y], X, epochs=n_epoch, batch_size=n_batch)
with X and y having a shape (n_samples, n_seq_len)
Compiling the model works flawlessly, but when I try to train it, I always get:
ValueError: Error when checking target: expected time_distributed_1 to
have 3 dimensions, but got array with shape (n_samples, n_seq_len)
Does anybody have an idea?
Keras Version is 2.2.4
Tensorflow backend version 1.12.0
In such an autoencoder, since the last layer is a softmax classifier you need to one-hot encode the labels:
from keras.utils import to_categorical
one_hot_X = to_categorical(X)
model.fit([X, y], one_hot_X, ...)
As a side note, since the Dense layer is applied on the last axis, there is no need to wrap the Dense layer in TimeDistributed layer.
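To make the shape change concrete, here is a small NumPy sketch of what one-hot encoding does to integer targets. The vocabulary size and token ids are made up, and `np.eye` indexing is used here as a stand-in for `to_categorical`.

```python
import numpy as np

n_tokens = 4                      # hypothetical vocabulary size
X = np.array([[0, 2, 1],
              [3, 3, 0]])         # integer ids, shape (n_samples, n_seq_len)

# Row i of the identity matrix is the one-hot vector for token id i
one_hot_X = np.eye(n_tokens, dtype="float32")[X]

print(X.shape, one_hot_X.shape)   # 2D ids become a 3D target
```

The extra last axis is what the softmax output layer expects as a target when the loss is categorical_crossentropy.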
