Shape error of Tensorflow GRU while doing slot filling - python

I'm trying to train a slot-filling model with a GRU in TensorFlow, but I get a shape error after the first epoch.
Error message: "Input to reshape is a tensor with 5632 values, but the requested shape has 12800"
I built a sequence model and set timesteps = 128, which is the length of the input sequences (after padding).
Input data in model.fit: x_train.shape = (7244, 128), y_train.shape = (7244, 128)
I have no idea why the model works in the first epoch but fails afterwards. I'd appreciate any suggestions!
from tensorflow.keras.layers import Input, Embedding, Bidirectional, GRU, Dense
from tensorflow.keras.models import Model

batch_size = 100
sequence_input = Input(shape=(128,), dtype='int32', batch_size=batch_size)
embedding_layer = Embedding(embeddings.shape[0],
                            embedding_dim,
                            weights=[embeddings],
                            input_length=128,
                            name='embeddings', mask_zero=True)
# The embedding layer uses a lookup table built from GloVe 300d,
# and I set input_length=128 because I guess this parameter also refers to the time steps? Not quite sure.
embedded_sequences = embedding_layer(sequence_input)
x = Bidirectional(GRU(90,
                      stateful=True,
                      return_sequences=True,
                      name='lstm_layer',
                      go_backwards=True))(embedded_sequences)
x = Dense(50, activation="sigmoid")(x)
pred = Dense(9, activation="sigmoid")(x)
_model = Model(sequence_input, pred)
_model.compile(loss='SparseCategoricalCrossentropy',
               optimizer='adam',
               metrics=['accuracy'])
_model.summary()
history = _model.fit(x_train, y_train,
                     epochs=10, batch_size=batch_size, shuffle=False,
                     validation_data=(sequences['train'], labels['train']))
[Image: model summary]
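The numbers in the error message are consistent with a stateful-RNN batch problem: 7244 is not a multiple of 100, so the last batch of each epoch holds only 44 samples, and 44 * 128 = 5632 values, whereas the graph built with a fixed batch_size=100 expects 100 * 128 = 12800. The plain NumPy sketch below checks that arithmetic and trims the arrays to a multiple of the batch size. It assumes the zero-filled placeholder arrays stand in for the real x_train/y_train; the validation arrays (sequences['train'], labels['train']) would need the same trimming. If carrying state across batches is not actually needed, dropping stateful=True (and the fixed batch_size on Input) avoids the constraint altogether.

```python
import numpy as np

def last_batch_size(n_samples: int, batch_size: int) -> int:
    """Number of samples in the final batch when n_samples are fed in order."""
    remainder = n_samples % batch_size
    return remainder if remainder else batch_size

def trim_to_batch_multiple(x, y, batch_size):
    """Drop trailing samples so every batch has exactly batch_size rows,
    as a stateful RNN built with a fixed batch_size requires."""
    n_keep = (len(x) // batch_size) * batch_size
    return x[:n_keep], y[:n_keep]

timesteps, batch_size, n_train = 128, 100, 7244

# Values in the last batch vs. values the fixed-batch graph expects
print(last_batch_size(n_train, batch_size) * timesteps)  # 5632
print(batch_size * timesteps)                            # 12800

# Placeholder arrays with the question's shapes (assumption: real data goes here)
x_train = np.zeros((n_train, timesteps), dtype="int32")
y_train = np.zeros((n_train, timesteps), dtype="int32")
x_fit, y_fit = trim_to_batch_multiple(x_train, y_train, batch_size)
print(x_fit.shape, y_fit.shape)  # (7200, 128) (7200, 128)
```

The trimmed x_fit/y_fit would then be passed to fit with the same batch_size.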

Related

ValueError in `categorical_crossentropy` loss function: shape issue

I am trying to develop a BIO-tagging named entity recognition (multi-class) model. I have 9 classes and converted the labels to one-hot encoding. During training I got the following error:
ValueError: A target array with shape (2014, 120, 9) was passed for an output of shape (None, 9) while using as loss categorical_crossentropy. This loss expects targets to have the same shape as the output.
My code snippet:
from keras.utils import to_categorical
y = [to_categorical(i, num_classes=n_tags) for i in y] ### One hot encoding
input = Input(shape=(max_len,))
embed = Embedding(input_dim=n_words + 1, output_dim=50,
                  input_length=max_len, mask_zero=True)(input)  # 50-dim embedding
lstm = Bidirectional(LSTM(units=130, return_sequences=True,
                          recurrent_dropout=0.2))(embed)  # variational biLSTM
(lstm, forward_h, forward_c, backward_h, backward_c) = Bidirectional(LSTM(units=130, return_sequences=True, return_state=True, recurrent_dropout=0.2))(lstm) # variational biLSTM
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
context_vector, attention_weights = Attention(10)(lstm, state_h) ### Attention mechanism
output = Dense(9, activation="softmax")(context_vector)
model = Model(input, output)
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['categorical_accuracy'])
model.summary()
history = model.fit(X,np.array(y), batch_size=32, epochs=15,verbose=1)
#### Got error message during training
Don't use one-hot encoding with categorical_crossentropy. Instead, keep the y vector as integer labels and use sparse_categorical_crossentropy. See if this works.
Refer to https://stackoverflow.com/a/50135466/14337775
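Whichever loss is used, the labels and the model output have to line up token by token. The error reports an output of shape (None, 9), one distribution per sentence, because the attention step pools the sequence into a single context vector, while the labels have one entry per token (2014, 120, 9). Switching to sparse labels changes the target shape to (2014, 120), but the model would still need a per-token output of shape (batch, 120, 9), for example a Dense(9, activation="softmax") applied to the return_sequences=True LSTM output. The NumPy sketch below uses small stand-in sizes and random labels (assumptions, not the question's data) to show the one-hot to integer conversion and a per-token sparse cross-entropy that only accepts matching shapes.

```python
import numpy as np

def onehot_to_sparse(y_onehot):
    """(N, T, C) one-hot labels -> (N, T) integer class ids."""
    return np.argmax(y_onehot, axis=-1)

def sparse_cce(y_true, probs, eps=1e-7):
    """Mean sparse categorical cross-entropy over all tokens.

    y_true: (N, T) integer labels; probs: (N, T, C) softmax outputs,
    i.e. one class distribution per token."""
    if probs.shape[:-1] != y_true.shape:
        raise ValueError(f"labels {y_true.shape} vs predictions {probs.shape[:-1]}")
    # Probability the model assigned to the correct class of each token
    picked = np.take_along_axis(probs, y_true[..., None], axis=-1)[..., 0]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

n_sent, max_len, n_tags = 4, 120, 9  # small stand-in for (2014, 120, 9)
rng = np.random.default_rng(0)
y_int = rng.integers(0, n_tags, size=(n_sent, max_len))
y_onehot = np.eye(n_tags)[y_int]  # what to_categorical produces: (4, 120, 9)

y_sparse = onehot_to_sparse(y_onehot)  # (4, 120)
per_token = np.full((n_sent, max_len, n_tags), 1.0 / n_tags)  # (batch, 120, 9) output
print(sparse_cce(y_sparse, per_token))  # uniform predictions give log(9)

per_sentence = np.full((n_sent, n_tags), 1.0 / n_tags)  # (None, 9)-style output
try:
    sparse_cce(y_sparse, per_sentence)
except ValueError as err:
    print("shape mismatch:", err)
```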

recurrent neural network ValueError: Found array with dim 3. Estimator expected <= 2

I am running LSTM, GRU and BiLSTM models using the following code:
# Create BiLSTM model
def create_model_bilstm(units):
    model = Sequential()
    model.add(Bidirectional(LSTM(units=units,
                                 return_sequences=True),
                            input_shape=(X_train.shape[1], X_train.shape[2])))
    #model.add(Bidirectional(LSTM(units = units)))
    model.add(Dense(1))
    #Compile model
    model.compile(loss='mse', optimizer='adam')
    return model

# Create LSTM or GRU model
def create_model(units, m):
    model = Sequential()
    model.add(m(units=units, return_sequences=True,
                input_shape=[X_train.shape[1], X_train.shape[2]]))
    model.add(Dropout(0.1))
    #model.add(m (units = units))
    #model.add(Dropout(0.2))
    model.add(Dense(units=1))
    #Compile model
    model.compile(loss='mse', optimizer='adam')
    return model

# BiLSTM
model_bilstm = create_model_bilstm(20)
# GRU and LSTM
model_gru = create_model(50, GRU)
model_lstm = create_model(20, LSTM)

# Fit BiLSTM, LSTM and GRU
def fit_model(model):
    early_stop = EarlyStopping(monitor='val_loss',
                               patience=100)
    history = model.fit(X_train, y_train, epochs=700,
                        validation_split=0.2, batch_size=32,
                        shuffle=False, callbacks=[early_stop])
    return history

history_bilstm = fit_model(model_bilstm)
history_lstm = fit_model(model_lstm)
history_gru = fit_model(model_gru)
This all runs smoothly and plots my loss graphs. But when it comes to prediction, I run the following code:
# Make prediction
def prediction(model):
    prediction = model.predict(X_test)
    prediction = scaler_y.inverse_transform(prediction)
    return prediction

prediction_bilstm = prediction(model_bilstm)
prediction_lstm = prediction(model_lstm)
prediction_gru = prediction(model_gru)
and I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-387-9d45f01ae2a2> in <module>
5 return prediction
6
----> 7 prediction_bilstm = prediction(model_bilstm)
8 prediction_lstm = prediction(model_lstm)
9 prediction_gru = prediction(model_gru)
<ipython-input-387-9d45f01ae2a2> in prediction(model)
2 def prediction(model):
3 prediction = model.predict(X_test)
----> 4 prediction = scaler_y.inverse_transform(prediction)
5 return prediction
...
ValueError: Found array with dim 3. Estimator expected <= 2.
Based on other posts I have read, I assume this has something to do with the shape of X_test, so I tried reshaping it to 2-D, but then I got another error, "expected bidirectional_3_input to have 3 dimensions, but got array with shape (62, 36)", again on line 7.
What am I doing wrong, and how can I fix it?
Data Explanation:
I am trying to predict discharge rates (target variable) using groundwater levels (34 features), precipitation and temperature as inputs, which gives a total of 36 features. My data has monthly resolution. I use 63 observations for testing (a 5-year prediction) and the rest for training.
What are you doing wrong? Let's assume your input data has shape X_train.shape = [d0,d1,d2], then after setting up your BiLSTM-model like
import tensorflow as tf
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

model = tf.keras.Sequential()
model.add(
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(
            units=10,
            return_sequences=True),
        input_shape=(d1, d2)
    )
)
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
we can check the input- and output-shapes your model expects by
>>> model.input.shape
TensorShape([None, d1, d2])
>>> model.output.shape
TensorShape([None, d1, 1])
So your model expects input of shape (n_batch, d1, d2), where n_batch is the batch size, and returns an output of shape (n_batch, d1, 1), i.e. a 3-D tensor.
Now if you pass a 3-D tensor to your model, model.predict will successfully return a 3-D tensor. However, sklearn.preprocessing.StandardScaler.inverse_transform only works on 2-D data, which is why it says
ValueError: Found array with dim 3. Estimator expected <= 2.
On the other hand, if you first reshape your data to 2-D, model.predict complains, because the model is set up to expect a 3-D tensor.
How can you fix it? For more specific help, you would need to tell us what you expect your model to do, especially which output shape you want your BiLSTM model to have. I assume you actually want your BiLSTM model to return a scalar for each sample, in which case an additional Flatten layer might do the trick:
import tensorflow as tf
from tensorflow.keras.layers import Bidirectional, LSTM, Dense, Flatten

model = tf.keras.Sequential()
model.add(
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(
            units=10,
            return_sequences=True),
        input_shape=(d1, d2)
    )
)
model.add(Flatten())  # <-- additional Flatten layer
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
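If one value per time step is actually what you want, another option besides Flatten is to reshape the prediction to 2-D just for the scaler and restore the 3-D shape afterwards. The sketch below assumes scaler_y is a StandardScaler fitted on a single target column; the training targets and the prediction array (including 12 time steps) are made up for illustration, and inverse_transform_nd is a hypothetical helper name.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def inverse_transform_nd(scaler, pred):
    """Undo a scaler fitted on 2-D targets (n_samples, n_targets) for an N-D
    prediction whose last axis holds the targets, e.g. (batch, timesteps, 1)."""
    n_targets = pred.shape[-1]
    flat = pred.reshape(-1, n_targets)  # 2-D, which is what sklearn scalers accept
    return scaler.inverse_transform(flat).reshape(pred.shape)

rng = np.random.default_rng(0)
# Hypothetical training targets: one column, e.g. discharge rate
scaler_y = StandardScaler().fit(rng.normal(loc=50.0, scale=10.0, size=(200, 1)))

# Hypothetical 3-D output shaped like model.predict(X_test) without Flatten:
# 62 test samples, 12 time steps (assumed), 1 target
pred_3d = rng.normal(size=(62, 12, 1))
restored = inverse_transform_nd(scaler_y, pred_3d)
print(restored.shape)  # (62, 12, 1)

# If only one value per sample is wanted, one option is the last time step
last_step = scaler_y.inverse_transform(pred_3d[:, -1, :])
print(last_step.shape)  # (62, 1)
```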

Dimension mismatch in Keras sequence to sequence model with Attention

I am trying to build a neural machine translation model with attention. I am following the Keras blog tutorial that shows how to build an NMT model using a sequence-to-sequence approach (without attention), and I extended the model to incorporate attention as follows:
latent_dim = 300
embedding_dim=100
batch_size = 128
# Encoder
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
#encoder lstm 1
encoder_lstm = tf.keras.layers.LSTM(latent_dim,return_sequences=True,return_state=True,dropout=0.4,recurrent_dropout=0.4)
encoder_output, state_h, state_c = encoder_lstm(encoder_inputs)
print(encoder_output.shape)
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = tf.keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True,dropout=0.4,recurrent_dropout=0.2)
decoder_output,decoder_fwd_state, decoder_back_state = decoder_lstm(decoder_inputs,initial_state=[state_h, state_c])
# Attention layer
attn_out = tf.keras.layers.Attention()([encoder_output, decoder_output])
# Concat attention input and decoder LSTM output
decoder_concat_input = tf.keras.layers.Concatenate(axis=-1, name='concat_layer')([decoder_output, attn_out])
#dense layer
decoder_dense = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_decoder_tokens, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
# Define the model
attn_model = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
attn_model.summary()
To train the model:
attn_model.compile(
    optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"]
)
history = attn_model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data,
    batch_size=batch_size,
    epochs=5,
    validation_split=0.2,
)
These are the shapes of my arrays:
encoder_input_data.shape is (10000, 16, 71)
decoder_input_data.shape is (10000, 59, 92)
decoder_target_data.shape is (10000, 59, 92)
When I train this model, I get below error:
InvalidArgumentError: Dimension 1 in both shapes must be equal, but are 59 and 16. Shapes are [?,59] and [?,16]. for 'model/concat_layer/concat' (op: 'ConcatV2') with input shapes: [?,59,300], [?,16,300], [] and with computed input tensors: input[2] = <2>.
I understand that it is complaining about the sequence lengths of encoder_input_data and decoder_input_data, but the same setup works with the regular sequence-to-sequence model (without attention) discussed on the Keras blog. Here, the error is thrown by the Concatenate layer.
Can anyone please suggest how to fix this?
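The shapes in the error point at the argument order of the attention call. tf.keras.layers.Attention takes [query, value] and returns one vector per query step, so with encoder_output as the query the result has 16 steps (the encoder length) and cannot be concatenated with the 59-step decoder_output along the last axis. Passing the decoder output as the query, Attention()([decoder_output, encoder_output]), should give (?, 59, 300). The NumPy sketch below mimics the layer's default unscaled dot-product scoring (no masking, random stand-in tensors) to show how the output length follows the query.

```python
import numpy as np

def dot_product_attention(query, value):
    """query: (batch, Tq, dim), value: (batch, Tv, dim) -> (batch, Tq, dim)."""
    scores = query @ value.transpose(0, 2, 1)             # (batch, Tq, Tv)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over value steps
    return weights @ value                                # (batch, Tq, dim)

batch, enc_len, dec_len, latent_dim = 2, 16, 59, 300
encoder_output = np.random.default_rng(0).normal(size=(batch, enc_len, latent_dim))
decoder_output = np.random.default_rng(1).normal(size=(batch, dec_len, latent_dim))

wrong = dot_product_attention(encoder_output, decoder_output)  # query = encoder
right = dot_product_attention(decoder_output, encoder_output)  # query = decoder
print(wrong.shape)  # (2, 16, 300): does not line up with the 59-step decoder output
print(right.shape)  # (2, 59, 300)

concat = np.concatenate([decoder_output, right], axis=-1)
print(concat.shape)  # (2, 59, 600)
```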

Keras multi-output model wrongly calculate target dimensions: ValueError: Error when checking target

I'm trying to build a multi-output Keras model, starting from a working single-output model. However, Keras complains about the tensor dimensions.
The single output Model:
This GRU model is training and predicting fine:
timesteps = 250
features = 2
input_tensor = Input(shape=(timesteps, features), name="input")
conv = Conv1D(filters=128, kernel_size=6,use_bias=True)(input_tensor)
b = BatchNormalization()(conv)
s_gru, states = GRU(256, return_sequences=True, return_state=True, name="gru_1")(b)
biases = keras.initializers.Constant(value=88.15)
out = Dense(1, activation='linear', name="output")(s_gru)
model = Model(inputs=input_tensor, outputs=out)
My numpy arrays are:
train_x # shape:(7110, 250, 2)
train_y # shape: (7110, 250, 1)
If I fit the model with the following code, everything is fine:
model.fit(train_x, train_y,batch_size=128, epochs=10, verbose=1)
The Problem:
I want to use a slightly modified version of the network that outputs also the GRU states:
input_tensor = Input(shape=(timesteps, features), name="input")
conv = Conv1D(filters=128, kernel_size=6,use_bias=True)(input_tensor)
b = BatchNormalization()(conv)
s_gru, states = GRU(256, return_sequences=True, return_state=True, name="gru_1")(b)
biases = keras.initializers.Constant(value=88.15)
out = Dense(1, activation='linear', name="output")(s_gru)
model = Model(inputs=input_tensor, outputs=[out, states]) # multi output
#fit the model but with a list of numpy array as y
model.compile(optimizer=optimizer, loss='mae', loss_weights=[0.5, 0.5])
history = model.fit(train_x, [train_y,train_y], batch_size=128, epochs=10, callbacks=[])
This training fails and keras is complaining about the target dimensions:
ValueError: Error when checking target: expected gru_1 to have 2 dimensions, but got array with shape (7110, 250, 1)
I'm using Keras 2.3.0 and Tensorflow 2.0.
What am I missing here?
The shape of each output and the shape of the corresponding target in the list passed to fit must be compatible. Here, the second output, states, has shape (batch_size, 256), i.e. (7110, 256) over the whole dataset, which can't really be compared to train_y, whose shape is (7110, 250, 1) as noted in the first code block. Make sure each output is paired with a target of compatible shape.

Transfer learning, wrong dense layer's shape

I am trying to apply transfer learning to my ANN for image classification.
I found an example of it and would like to customize the network.
Here are the main blocks of code:
model = VGG19(weights='imagenet',
              include_top=False,
              input_shape=(224, 224, 3))
batch_size = 16
for layer in model.layers[:5]:
    layer.trainable = False
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(16, activation="sigmoid")(x)
model_final = Model(inputs=model.input, outputs=predictions)
model_final.fit_generator(
    train_generator,
    samples_per_epoch=nb_train_samples,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples,
    callbacks=[checkpoint, early])
When I run the code above I get this error:
ValueError: Error when checking target: expected dense_3 to have shape (16,) but got array with shape (1,).
I suppose the problem is the order of the dimensions in the dense layer. I tried transposing it, but I get the same error.
Maybe this simple example can help:
import numpy as np
test = np.array([1,2,3])
print(test.shape) # (3,)
test = test[np.newaxis]
print(test.shape) # (1, 3)
Try applying [np.newaxis] to your train_generator output.
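Note that the error compares one target value per sample, shape (1,), with the 16 units of the last Dense layer. If the generator yields a single integer class id per image, adding an axis alone does not create the missing 16 columns. In that case the ids would need to be one-hot encoded (keras.utils.to_categorical does this, and flow_from_directory with class_mode='categorical' yields one-hot labels), or the model would need to be compiled with sparse_categorical_crossentropy. The NumPy sketch below, using a hypothetical batch of three labels and a hypothetical to_one_hot helper, compares the two shapes.

```python
import numpy as np

NUM_CLASSES = 16  # matches Dense(16, ...) at the end of the question's model

def to_one_hot(labels, num_classes=NUM_CLASSES):
    """Integer class ids shaped (batch,) or (batch, 1) -> (batch, num_classes)."""
    labels = np.asarray(labels, dtype=int).reshape(-1)
    return np.eye(num_classes, dtype="float32")[labels]

# Hypothetical batch: one integer label per image, as the error message suggests
batch_labels = np.array([[3], [0], [15]])
print(batch_labels[np.newaxis].shape)  # (1, 3, 1): extra axis, still one column per sample
print(to_one_hot(batch_labels).shape)  # (3, 16): one column per output unit
```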
