I am trying to build a Neural Machine Translation model with attention. I am following the tutorial on Keras blog that shows how to build a NMT model using sequence-to-sequence approach (without attention). I extended the model to incorporate attention in the following way -
latent_dim = 300
embedding_dim=100
batch_size = 128
# Encoder
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
#encoder lstm 1
encoder_lstm = tf.keras.layers.LSTM(latent_dim,return_sequences=True,return_state=True,dropout=0.4,recurrent_dropout=0.4)
encoder_output, state_h, state_c = encoder_lstm(encoder_inputs)
print(encoder_output.shape)
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = tf.keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True,dropout=0.4,recurrent_dropout=0.2)
decoder_output,decoder_fwd_state, decoder_back_state = decoder_lstm(decoder_inputs,initial_state=[state_h, state_c])
# Attention layer
attn_out = tf.keras.layers.Attention()([encoder_output, decoder_output])
# Concat attention input and decoder LSTM output
decoder_concat_input = tf.keras.layers.Concatenate(axis=-1, name='concat_layer')([decoder_output, attn_out])
#dense layer
decoder_dense = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_decoder_tokens, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
# Define the model
attn_model = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
attn_model.summary()
To train the model -
attn_model.compile(
optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"]
)
history = attn_model.fit(
[encoder_input_data, decoder_input_data],
decoder_target_data,
batch_size=batch_size,
epochs=5,
validation_split=0.2,
)
Here I have below shape
encoder_input_data.shape is (10000, 16, 71)
decoder_input_data.shape is (10000, 59, 92)
decoder_target_data.shape is (10000, 59, 92)
When I train this model, I get below error:
InvalidArgumentError: Dimension 1 in both shapes must be equal, but are 59 and 16. Shapes are [?,59] and [?,16]. for 'model/concat_layer/concat' (op: 'ConcatV2') with input shapes: [?,59,300], [?,16,300], [] and with computed input tensors: input[2] = <2>.
I understand that it is complaining about the dimension of encoder_input_data and decoder_input_data but this same setup works when we run the regular sequence-to-sequence model (without attention) as discussed in keras blog.In this case, it is throwing error because of the Concatenation layer.
Can anyone please suggest how to fix this?
Related
I'm trying to train slot filling by LSTM in tensorflow, but received a shape error after first epoch.
Error mes: "Input to reshape is a tensor with 5632 values, but the requested shape has 12800"
I used sequences model and defined timestep = 128, which is the length of the input data sequence(after padding).
input data in model.fit : x_train.shape = (7244, 128) , y_train.shape = (7244, 128)
I have no idea why the model work on in the first epoch but fail in the latter , appreciate to any suggestion!
batch_size = 100
sequence_input = Input(shape=(128,), dtype='int32',batch_size=batch_size)
embedding_layer = Embedding(embeddings.shape[0],
embedding_dim,
weights = [embeddings],
input_length = 128,
name = 'embeddings',mask_zero=True)
# embedding layer use a dict according to glove 300d,
# and I set input_length =128 because I gauss this parameter also mentioned time step? not pretty sure.
embedded_sequences = embedding_layer(sequence_input)
x = Bidirectional(GRU(90,
stateful=True,
return_sequences=True,
name='lstm_layer',
go_backwards=True))(embedded_sequences)
x = Dense(50, activation="sigmoid")(x)
pred = Dense(9, activation="sigmoid")(x)
_model = Model(sequence_input, pred)
_model.compile(loss = 'SparseCategoricalCrossentropy',
optimizer='adam',
metrics = ['accuracy'])
_model.summary()
history = _model.fit(x_train, y_train,
epochs =10, batch_size =batch_size, shuffle = False,
validation_data=(sequences['train'], labels['train']))
model summary
I am trying to develop a bio-tagging name entity recognition (multi-class) model. I have 9 classes and converted it to one-hot encoding. During the training I got following error:
ValueError: A target array with shape (2014, 120, 9) was passed for an output of shape (None, 9) while using as loss categorical_crossentropy. This loss expects targets to have the same shape as the output.
My code snippet:
from keras.utils import to_categorical
y = [to_categorical(i, num_classes=n_tags) for i in y] ### One hot encoding
input = Input(shape=(max_len,))
embed = Embedding(input_dim=n_words + 1, output_dim=50,
input_length=max_len, mask_zero=True)(input) # 50-dim embedding
lstm = Bidirectional(LSTM(units=130, return_sequences=True,
recurrent_dropout=0.2))(embed) # variational biLSTM
(lstm, forward_h, forward_c, backward_h, backward_c) = Bidirectional(LSTM(units=130, return_sequences=True, return_state=True, recurrent_dropout=0.2))(lstm) # variational biLSTM
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
context_vector, attention_weights = Attention(10)(lstm, state_h) ### Attention mechanism
output = Dense(9, activation="softmax")(context_vector)
model = Model(input, output)
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['categorical_accuracy'])
model.summary()
history = model.fit(X,np.array(y), batch_size=32, epochs=15,verbose=1)
#### Got error message during training
Don't use one hot encoding and categorical_crossentropy. Instead keep the y vector as it is and use sparse_categorical_crossentropy. See if this works.
Refer https://stackoverflow.com/a/50135466/14337775
I'm trying to build a multi-output keras model starting from a working single output model. Keras however, is complaining about tensors dimensions.
The single output Model:
This GRU model is training and predicting fine:
timesteps = 250
features = 2
input_tensor = Input(shape=(timesteps, features), name="input")
conv = Conv1D(filters=128, kernel_size=6,use_bias=True)(input_tensor)
b = BatchNormalization()(conv)
s_gru, states = GRU(256, return_sequences=True, return_state=True, name="gru_1")(b)
biases = keras.initializers.Constant(value=88.15)
out = Dense(1, activation='linear', name="output")(s_gru)
model = Model(inputs=input_tensor, outputs=out)
My numpy arrays are:
train_x # shape:(7110, 250, 2)
train_y # shape: (7110, 250, 1)
If fit the model with the following code and everything is fine:
model.fit(train_x, train_y,batch_size=128, epochs=10, verbose=1)
The Problem:
I want to use a slightly modified version of the network that outputs also the GRU states:
input_tensor = Input(shape=(timesteps, features), name="input")
conv = Conv1D(filters=128, kernel_size=6,use_bias=True)(input_tensor)
b = BatchNormalization()(conv)
s_gru, states = GRU(256, return_sequences=True, return_state=True, name="gru_1")(b)
biases = keras.initializers.Constant(value=88.15)
out = Dense(1, activation='linear', name="output")(s_gru)
model = Model(inputs=input_tensor, outputs=[out, states]) # multi output
#fit the model but with a list of numpy array as y
model.compile(optimizer=optimizer, loss='mae', loss_weights=[0.5, 0.5])
history = model.fit(train_x, [train_y,train_y], batch_size=128, epochs=10, callbacks=[])
This training fails and keras is complaining about the target dimensions:
ValueError: Error when checking target: expected gru_1 to have 2 dimensions, but got array with shape (7110, 250, 1)
I'm using Keras 2.3.0 and Tensorflow 2.0.
What am I missing here?
The dimensions of the second output and the second element in the outputs list should be of similar shape. In this case, states would be of shape (7110, 256), which can't really be compared to the train_y shape (which will be of shape (7110, 250, 1) as noted in the first code block. Make sure the outputs can be compared with a similar shape.
I am currently trying to include an embedding layer to my sequence-to-sequence autoencoder, built with the keras functional API.
The model code looks like this:
#Encoder inputs
encoder_inputs = Input(shape=(None,))
#Embedding
embedding_layer = Embedding(input_dim=n_tokens, output_dim=2)
encoder_embedded = embedding_layer(encoder_inputs)
#Encoder LSTM
encoder_outputs, state_h, state_c = LSTM(n_hidden, return_state=True)(encoder_embedded)
lstm_states = [state_h, state_c]
#Decoder Inputs
decoder_inputs = Input(shape=(None,))
#Embedding
decoder_embedded = embedding_layer(decoder_inputs)
#Decoder LSTM
decoder_lstm = LSTM(n_hidden, return_sequences=True, return_state=True, )
decoder_outputs, _, _ = decoder_lstm(decoder_embedded, initial_state=lstm_states)
#Dense + Time
decoder_dense = TimeDistributed(Dense(n_tokens, activation='softmax'), input_shape=(None, None, 256))
#decoder_dense = Dense(n_tokens, activation='softmax', )
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
The model is trained like this:
model.fit([X, y], X, epochs=n_epoch, batch_size=n_batch)
with X and y having a shape (n_samples, n_seq_len)
The compiling of the model works flawless, while when trying to train, I will always get:
ValueError: Error when checking target: expected time_distributed_1 to
have 3 dimensions, but got array with shape (n_samples, n_seq_len)
Does anybody have an idea?
Keras Version is 2.2.4
Tensorflow backend version 1.12.0
In such an autoencoder, since the last layer is a softmax classifier you need to one-hot encode the labels:
from keras.utils import to_categorical
one_hot_X = to_categorical(X)
model.fit([X, y], one_hot_X, ...)
As a side note, since the Dense layer is applied on the last axis, there is no need to wrap the Dense layer in TimeDistributed layer.
I'm trying to implement the Keras word-level example on their blog listed under the Bonus Section -> What if I want to use a word-level model with integer sequences?
I've marked up the layers with names to help me reconnect the layers from a loaded model to a inference model later. I think I've followed their example model:
# Define an input sequence and process it - where the shape is (timesteps, n_features)
encoder_inputs = Input(shape=(None, src_vocab), name='enc_inputs')
# Add an embedding layer to process the integer encoded words to give some 'sense' before the LSTM layer
encoder_embedding = Embedding(src_vocab, latent_dim, name='enc_embedding')(encoder_inputs)
# The return_state constructor argument configures a RNN layer to return a list where the first entry is the outputs
# and the next entries are the internal RNN states. This is used to recover the states of the encoder.
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_state=True, name='encoder_lstm')(encoder_embedding)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state of the RNN.
decoder_inputs = Input(shape=(None, target_vocab), name='dec_inputs')
decoder_embedding = Embedding(target_vocab, latent_dim, name='dec_embedding')(decoder_inputs)
# The return_sequences constructor argument, configuring a RNN to return its full sequence of outputs (instead of
# just the last output, which the defaults behavior).
decoder_lstm = LSTM(latent_dim, return_sequences=True, name='dec_lstm')(decoder_embedding, initial_state=encoder_states)
decoder_outputs = Dense(target_vocab, activation='softmax', name='dec_outputs')(decoder_lstm)
# Put the model together
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
but I get
ValueError: Input 0 is incompatible with layer encoder_lstm: expected ndim=3, found ndim=4
on the line
encoder_outputs, state_h, state_c = LSTM(...
What am I missing? Or is the example on the blog assuming a step that I've skipped?
Update:
And I'm training with:
X = [source_data, target_data]
y = offset_data(target_data)
model.fit(X, y, ...)
Update 2:
So, I'm still not quite there. I have my decoder_lstm and decoder_outputs defined like above and have fixed the inputs. When I load my model from an h5 file and build my inference model, I try and connect to the training model with
decoder_inputs = model.input[1] # dec_inputs (Input(shape=(None,)))
# decoder_embedding = model.layers[3] # dec_embedding (Embedding(target_vocab, latent_dim))
target_vocab = model.output_shape[2]
decoder_state_input_h = Input(shape=(latent_dim,), name='input_3') # named to avoid conflict
decoder_state_input_c = Input(shape=(latent_dim,), name='input_4')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
# Use decoder_lstm from the training model
# decoder_lstm = LSTM(latent_dim, return_sequences=True)
decoder_lstm = model.layers[5] # dec_lstm
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
but I get an error
ValueError: Input 0 is incompatible with layer dec_lstm: expected ndim=3, found ndim=2
Trying to pass decoder_embedding rather than decoder_inputs fails too.
I'm trying to adapt the example of lstm_seq2seq_restore.py but it doesn't include the complexity of the embedding layer.
Update 3:
When I use decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedding, ...) to build the inference model I've confirmed that decoder_embedding is an object of type Embedding but I get:
ValueError: Layer dec_lstm was called with an input that isn't a symbolic tensor. Received type: <class 'keras.layers.embeddings.Embedding'>. Full input: [<keras.layers.embeddings.Embedding object at 0x1a1f22eac8>, <tf.Tensor 'input_3:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'input_4:0' shape=(?, 256) dtype=float32>]. All inputs to the layer should be tensors.
The full code for this model is on Bitbucket.
The problem is in the input shape of Input layer. An embedding layer accepts a sequence of integers as input which corresponds to words indices in a sentence. Since here the number of words in sentences is not fixed, therefore you must set the input shape of Input layer as (None,).
I think you are mistaking it with the case that we don't have an Embedding layer in our model and therefore the input shape of the model is (timesteps, n_features) to make it compatible with LSTM layer.
Update:
You need to pass the decoder_inputs to the Embedding layer first and then pass the resulting output tensor to the decoder_lstm layer like this:
decoder_inputs = model.input[1] # (Input(shape=(None,)))
# pass the inputs to the embedding layer
decoder_embedding = model.get_layer(name='dec_embedding')(decoder_inputs)
# ...
decoder_lstm = model.get_layer(name='dec_lstm') # dec_lstm
decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedding, ...)
Update 2:
In training time, when creating the decoder_lstm layer you need to set return_state=True:
decoder_lstm, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True, name='dec_lstm')(decoder_embedding, initial_state=encoder_states)