I am currently trying to add an embedding layer to my sequence-to-sequence autoencoder, built with the Keras functional API.
The model code looks like this:
#Encoder inputs
encoder_inputs = Input(shape=(None,))
#Embedding
embedding_layer = Embedding(input_dim=n_tokens, output_dim=2)
encoder_embedded = embedding_layer(encoder_inputs)
#Encoder LSTM
encoder_outputs, state_h, state_c = LSTM(n_hidden, return_state=True)(encoder_embedded)
lstm_states = [state_h, state_c]
#Decoder Inputs
decoder_inputs = Input(shape=(None,))
#Embedding
decoder_embedded = embedding_layer(decoder_inputs)
#Decoder LSTM
decoder_lstm = LSTM(n_hidden, return_sequences=True, return_state=True, )
decoder_outputs, _, _ = decoder_lstm(decoder_embedded, initial_state=lstm_states)
#Dense + Time
decoder_dense = TimeDistributed(Dense(n_tokens, activation='softmax'), input_shape=(None, None, 256))
#decoder_dense = Dense(n_tokens, activation='softmax', )
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
The model is trained like this:
model.fit([X, y], X, epochs=n_epoch, batch_size=n_batch)
with X and y having a shape (n_samples, n_seq_len)
The model compiles without errors, but when I try to train it I always get:
ValueError: Error when checking target: expected time_distributed_1 to
have 3 dimensions, but got array with shape (n_samples, n_seq_len)
Does anybody have an idea?
Keras Version is 2.2.4
Tensorflow backend version 1.12.0
In such an autoencoder, since the last layer is a softmax classifier, you need to one-hot encode the labels:
from keras.utils import to_categorical
one_hot_X = to_categorical(X)
model.fit([X, y], one_hot_X, ...)
As a side note, since the Dense layer is applied on the last axis, there is no need to wrap it in a TimeDistributed layer.
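To see why the shapes then line up, here is a NumPy sketch with small hypothetical sizes. It uses an identity-matrix lookup as a stand-in for to_categorical and a plain matrix product as a stand-in for a bias-free Dense kernel; neither is the Keras implementation.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
n_samples, n_seq_len, n_tokens, n_hidden = 4, 6, 10, 8
rng = np.random.default_rng(0)

# Integer token ids, shaped like X in the question: (n_samples, n_seq_len)
X = rng.integers(0, n_tokens, size=(n_samples, n_seq_len))

# One-hot encode by indexing into an identity matrix; this adds a third axis
one_hot_X = np.eye(n_tokens, dtype="float32")[X]
print(X.shape, one_hot_X.shape)  # (4, 6) (4, 6, 10)

# A dense kernel applied to a 3D tensor acts on the last axis only,
# which gives the same result as applying it timestep by timestep
decoder_outputs = rng.normal(size=(n_samples, n_seq_len, n_hidden))
kernel = rng.normal(size=(n_hidden, n_tokens))
all_at_once = decoder_outputs @ kernel
per_step = np.stack(
    [decoder_outputs[:, t] @ kernel for t in range(n_seq_len)], axis=1
)
print(np.allclose(all_at_once, per_step))  # True
```

The one-hot target now has the same three dimensions as the softmax output, which is what the error message was asking for.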
I have seen that Keras now comes with an Attention layer. However, I have some problems using it in my seq2seq model.
This is the working seq2seq model without attention:
latent_dim = 300
embedding_dim = 200
clear_session()
# Encoder
encoder_inputs = Input(shape=(max_text_len, ))
# Embedding layer
enc_emb = Embedding(x_voc, embedding_dim,
trainable=True)(encoder_inputs)
# Encoder LSTM 1
encoder_lstm1 = Bidirectional(LSTM(latent_dim, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_output1, forward_h1, forward_c1, backward_h1, backward_c1) = encoder_lstm1(enc_emb)
# Encoder LSTM 2
encoder_lstm2 = Bidirectional(LSTM(latent_dim, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_output2, forward_h2, forward_c2, backward_h2, backward_c2) = encoder_lstm2(encoder_output1)
# Encoder LSTM 3
encoder_lstm3 = Bidirectional(LSTM(latent_dim, return_state=True,
return_sequences=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_outputs, forward_h, forward_c, backward_h, backward_c) = encoder_lstm3(encoder_output2)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
# Set up the decoder, using encoder_states as the initial state
decoder_inputs = Input(shape=(None, ))
# Embedding layer
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)
# Decoder LSTM
decoder_lstm = LSTM(latent_dim*2, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.2)
(decoder_outputs, decoder_fwd_state, decoder_back_state) = \
decoder_lstm(dec_emb, initial_state=[state_h, state_c])
# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
I have modified the model to add attention like this (this goes after # Decoder LSTM and right before # Dense layer):
attn_out, attn_states = Attention()([encoder_outputs, decoder_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attn_out])
# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
This throws TypeError: Cannot iterate over a Tensor with unknown first dimension.
How do I apply an attention mechanism to my seq2seq model? If the Keras Attention layer does not work, or if other models are easier to use, I am happy to use them instead.
This is how I run my model:
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=2)
history = model.fit(
[x_tr, y_tr[:, :-1]],
y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:],
epochs=50,
callbacks=[es],
batch_size=128,
verbose=1,
validation_data=([x_val, y_val[:, :-1]],
y_val.reshape(y_val.shape[0], y_val.shape[1], 1)[:, 1:]),
)
The shape of x_tr is (89674, 300) and the shape of y_tr[:, :-1] is (89674, 14). Similarly, the shapes of x_val and y_val[:, :-1] are (9964, 300) and (9964, 14) respectively.
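You are using the Attention layer from Keras, which returns a single 3D tensor, not two tensors. It also expects its inputs in the order [query, value], and its output has as many timesteps as the query, so the decoder outputs should come first if the result is to be concatenated with them.
So your code must be:
attn_out = Attention()([decoder_outputs, encoder_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attn_out])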
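For reference, tf.keras.layers.Attention is documented as taking [query, value] and returning a tensor of shape (batch, query_len, dim). The NumPy sketch below is only a stand-in for that layer (dot-product scores followed by a softmax), with small hypothetical sizes. It shows that the result follows the query length, so it can be concatenated with the decoder outputs only when they are the query.

```python
import numpy as np

def dot_product_attention(query, value):
    """Simplified dot-product attention; the key is assumed equal to the value."""
    scores = query @ value.transpose(0, 2, 1)        # (batch, Tq, Tv)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over Tv
    return weights @ value                           # (batch, Tq, dim)

# Hypothetical sizes, chosen only for illustration
rng = np.random.default_rng(0)
batch, enc_len, dec_len, dim = 2, 5, 3, 4
encoder_outputs = rng.normal(size=(batch, enc_len, dim))
decoder_outputs = rng.normal(size=(batch, dec_len, dim))

# Decoder outputs as query: the result has the decoder's length
attn_out = dot_product_attention(decoder_outputs, encoder_outputs)
print(attn_out.shape)  # (2, 3, 4)

# ...so concatenating along the feature axis works
concat = np.concatenate([decoder_outputs, attn_out], axis=-1)
print(concat.shape)  # (2, 3, 8)

# Encoder outputs as query: the result has the encoder's length instead
swapped = dot_product_attention(encoder_outputs, decoder_outputs)
print(swapped.shape)  # (2, 5, 4)
```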
I'm trying to create a chatbot using an RNN in TensorFlow, using this introduction https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
The model in the example is a character based sequence, but I want to do a word-level model. The tutorial has a tiny bit of info in the "Bonus FAQ" section on how to modify the model to make it word-level. I am using GloVe pretrained word embeddings.
My model looks like this:
emb_dimension = 100
# Set up embedding layer using pretrained weights
embedding_layer = Embedding(total_words, emb_dimension, input_length=max_input_len, weights=[embedding_matrix], name="Embedding")
# Set up input sequence
encoder_inputs = Input(shape=(None,))
x = embedding_layer(encoder_inputs)
encoder_lstm = LSTM(100, return_state=True)
x, state_h, state_c = encoder_lstm(x)
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
x = embedding_layer(decoder_inputs)
decoder_lstm = LSTM(100, return_sequences=True)
# Keep the LSTM output so the Dense layer sees it rather than the raw embeddings
x = decoder_lstm(x, initial_state=encoder_states)
decoder_outputs = Dense(total_words, activation='softmax')(x)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
It seems to train fine, but I don't know how to use this model to process new text. The tutorial has an inference example, but this has not been modified for a word-level model, and I can't figure out how to do it. Particularly this bit in the example:
encoder_model = Model(encoder_inputs, encoder_states)
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs] + decoder_states)
I tried modifying this code to add an embedding layer x = embedding_layer(decoder_inputs) and then use x for the input to the decoder lstm, but I get an error: TypeError: Cannot iterate over a Tensor with unknown first dimension.
How do I set up an inference model?
Writing a decoder for inference is not that easy. First of all, you have to understand that your current decoder uses teacher forcing (meaning that the correct token of the previous position is given for the prediction at the next timestep). For inference you would need either greedy decoding or beam search, feeding the model's own prediction back in at each step. Those steps are described in this tutorial: https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt
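As a rough illustration of the greedy option, the loop below feeds each predicted token back in as the next input until an end token or a length limit is reached. It is framework-agnostic: toy_decode_step, START_ID, END_ID and the toy transition table are hypothetical stand-ins for a Keras decoder model (which would take a token id plus the LSTM states) and are not part of the tutorial.

```python
import numpy as np

# Hypothetical token ids and vocabulary size, for illustration only
START_ID, END_ID, VOCAB = 1, 2, 6

def toy_decode_step(token_id, state):
    """Stand-in for one decoder prediction: returns (probabilities, new_state)."""
    next_token = {1: 3, 3: 4, 4: 5, 5: 2}.get(token_id, END_ID)
    probs = np.full(VOCAB, 0.01)
    probs[next_token] = 1.0 - 0.01 * (VOCAB - 1)
    return probs, state + 1

def greedy_decode(decode_step, initial_state, max_len=20):
    """Pick the most likely token at every step and feed it back in."""
    token_id, state, output = START_ID, initial_state, []
    for _ in range(max_len):
        probs, state = decode_step(token_id, state)
        token_id = int(np.argmax(probs))  # greedy choice
        if token_id == END_ID:
            break
        output.append(token_id)
    return output

print(greedy_decode(toy_decode_step, initial_state=0))  # [3, 4, 5]
```

In a real word-level setup, initial_state would come from the encoder model, and the returned ids would be mapped back to words with the tokenizer's index.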
I am trying to develop a BIO-tagging named entity recognition (multi-class) model. I have 9 classes and converted them to one-hot encoding. During training I got the following error:
ValueError: A target array with shape (2014, 120, 9) was passed for an output of shape (None, 9) while using as loss categorical_crossentropy. This loss expects targets to have the same shape as the output.
My code snippet:
from keras.utils import to_categorical
y = [to_categorical(i, num_classes=n_tags) for i in y] ### One hot encoding
input = Input(shape=(max_len,))
embed = Embedding(input_dim=n_words + 1, output_dim=50,
input_length=max_len, mask_zero=True)(input) # 50-dim embedding
lstm = Bidirectional(LSTM(units=130, return_sequences=True,
recurrent_dropout=0.2))(embed) # variational biLSTM
(lstm, forward_h, forward_c, backward_h, backward_c) = Bidirectional(LSTM(units=130, return_sequences=True, return_state=True, recurrent_dropout=0.2))(lstm) # variational biLSTM
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
context_vector, attention_weights = Attention(10)(lstm, state_h) ### Attention mechanism
output = Dense(9, activation="softmax")(context_vector)
model = Model(input, output)
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['categorical_accuracy'])
model.summary()
history = model.fit(X,np.array(y), batch_size=32, epochs=15,verbose=1)
#### Got error message during training
Don't use one-hot encoding with categorical_crossentropy. Instead, keep the y vector as it is (integer class indices) and use sparse_categorical_crossentropy. See if this works.
For details, refer to https://stackoverflow.com/a/50135466/14337775
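The two losses compute the same quantity and differ only in the label format they expect, which the NumPy sketch below illustrates with made-up probabilities. The helper functions are simplified stand-ins written for this example, not the Keras implementations.

```python
import numpy as np

def categorical_crossentropy(y_true_one_hot, y_pred):
    """Cross-entropy with one-hot labels, computed over the last axis."""
    return -np.sum(y_true_one_hot * np.log(y_pred), axis=-1)

def sparse_categorical_crossentropy(y_true_ids, y_pred):
    """Cross-entropy with integer labels: pick the probability of the true class."""
    picked = np.take_along_axis(y_pred, y_true_ids[..., None], axis=-1)
    return -np.log(picked[..., 0])

n_tags = 9
rng = np.random.default_rng(0)

# Made-up per-token predictions: (samples, timesteps, tags)
logits = rng.normal(size=(2, 5, n_tags))
y_pred = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# The same labels in both formats
y_ids = rng.integers(0, n_tags, size=(2, 5))   # integer class indices
y_one_hot = np.eye(n_tags)[y_ids]              # one-hot version

dense_loss = categorical_crossentropy(y_one_hot, y_pred)
sparse_loss = sparse_categorical_crossentropy(y_ids, y_pred)
print(dense_loss.shape, sparse_loss.shape)   # (2, 5) (2, 5)
print(np.allclose(dense_loss, sparse_loss))  # True
```

Either way, the targets need one label per prediction the model emits; switching the loss changes only how each label is encoded.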
I am trying to build a Neural Machine Translation model with attention. I am following the tutorial on the Keras blog that shows how to build an NMT model using a sequence-to-sequence approach (without attention). I extended the model to incorporate attention in the following way:
latent_dim = 300
embedding_dim=100
batch_size = 128
# Encoder
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
#encoder lstm 1
encoder_lstm = tf.keras.layers.LSTM(latent_dim,return_sequences=True,return_state=True,dropout=0.4,recurrent_dropout=0.4)
encoder_output, state_h, state_c = encoder_lstm(encoder_inputs)
print(encoder_output.shape)
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = tf.keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True,dropout=0.4,recurrent_dropout=0.2)
decoder_output,decoder_fwd_state, decoder_back_state = decoder_lstm(decoder_inputs,initial_state=[state_h, state_c])
# Attention layer
attn_out = tf.keras.layers.Attention()([encoder_output, decoder_output])
# Concat attention input and decoder LSTM output
decoder_concat_input = tf.keras.layers.Concatenate(axis=-1, name='concat_layer')([decoder_output, attn_out])
#dense layer
decoder_dense = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_decoder_tokens, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
# Define the model
attn_model = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
attn_model.summary()
To train the model -
attn_model.compile(
optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"]
)
history = attn_model.fit(
[encoder_input_data, decoder_input_data],
decoder_target_data,
batch_size=batch_size,
epochs=5,
validation_split=0.2,
)
The data has the following shapes:
encoder_input_data.shape is (10000, 16, 71)
decoder_input_data.shape is (10000, 59, 92)
decoder_target_data.shape is (10000, 59, 92)
When I train this model, I get the following error:
InvalidArgumentError: Dimension 1 in both shapes must be equal, but are 59 and 16. Shapes are [?,59] and [?,16]. for 'model/concat_layer/concat' (op: 'ConcatV2') with input shapes: [?,59,300], [?,16,300], [] and with computed input tensors: input[2] = <2>.
I understand that it is complaining about the dimensions of encoder_input_data and decoder_input_data, but this same setup works when we run the regular sequence-to-sequence model (without attention) as discussed in the Keras blog. In this case, the error is thrown by the Concatenate layer.
Can anyone please suggest how to fix this?
I'm trying to build a multi-output Keras model starting from a working single-output model. Keras, however, is complaining about tensor dimensions.
The single output Model:
This GRU model is training and predicting fine:
timesteps = 250
features = 2
input_tensor = Input(shape=(timesteps, features), name="input")
conv = Conv1D(filters=128, kernel_size=6,use_bias=True)(input_tensor)
b = BatchNormalization()(conv)
s_gru, states = GRU(256, return_sequences=True, return_state=True, name="gru_1")(b)
biases = keras.initializers.Constant(value=88.15)
out = Dense(1, activation='linear', name="output")(s_gru)
model = Model(inputs=input_tensor, outputs=out)
My numpy arrays are:
train_x # shape:(7110, 250, 2)
train_y # shape: (7110, 250, 1)
I fit the model with the following code and everything is fine:
model.fit(train_x, train_y,batch_size=128, epochs=10, verbose=1)
The Problem:
I want to use a slightly modified version of the network that also outputs the GRU states:
input_tensor = Input(shape=(timesteps, features), name="input")
conv = Conv1D(filters=128, kernel_size=6,use_bias=True)(input_tensor)
b = BatchNormalization()(conv)
s_gru, states = GRU(256, return_sequences=True, return_state=True, name="gru_1")(b)
biases = keras.initializers.Constant(value=88.15)
out = Dense(1, activation='linear', name="output")(s_gru)
model = Model(inputs=input_tensor, outputs=[out, states]) # multi output
#fit the model but with a list of numpy array as y
model.compile(optimizer=optimizer, loss='mae', loss_weights=[0.5, 0.5])
history = model.fit(train_x, [train_y,train_y], batch_size=128, epochs=10, callbacks=[])
This training fails and keras is complaining about the target dimensions:
ValueError: Error when checking target: expected gru_1 to have 2 dimensions, but got array with shape (7110, 250, 1)
I'm using Keras 2.3.0 and Tensorflow 2.0.
What am I missing here?
Each target array must have the same shape as the corresponding model output, so the second element of the targets list has to match the second output. In this case, states has shape (7110, 256), which can't be compared with train_y, whose shape is (7110, 250, 1) as noted in the first code block. Make sure each output is paired with a target of matching shape.
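A quick way to spot this kind of mismatch before calling fit is to compare each target array against the corresponding output shape. The helper below is a hypothetical check written for this example (it is not a Keras function), the output shapes are the ones stated in the question and the answer above, and the zero-filled placeholder target is only one possible way to satisfy the shape requirement.

```python
import numpy as np

def target_matches(output_shape, target):
    """True if the target's non-batch dimensions equal the output's."""
    return tuple(output_shape[1:]) == target.shape[1:]

# Small n_samples for illustration; the question uses 7110
n_samples, timesteps, units = 16, 250, 256
train_y = np.zeros((n_samples, timesteps, 1))

out_shape = (None, timesteps, 1)   # shape of `out` as assumed in the question
states_shape = (None, units)       # final GRU state: one vector per sample

print(target_matches(out_shape, train_y))      # True
print(target_matches(states_shape, train_y))   # False

# A placeholder target with the right shape for the states output
dummy_states = np.zeros((n_samples, units))
print(target_matches(states_shape, dummy_states))  # True
```

With such a placeholder, the targets list would become [train_y, dummy_states]. If the states are only needed at prediction time, setting the second loss weight to 0 is one way (an assumption, not something the answer above prescribes) to keep the placeholder from influencing training.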