LSTM Text generation Input_shape - python

I am attempting to add another LSTM layer to my model, but I am only a beginner and not very experienced. I am using the (Better) - Donald Trump Tweets! dataset on Kaggle for LSTM text generation.
I am struggling to get it to run, as it returns this error:
ValueError: Input 0 of layer lstm_16 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 128]
My model is:
print('Building model...')
model2 = Sequential()
model2.add(LSTM(128, input_shape=(maxlen, len(chars)),return_sequences=True))
model2.add(Dropout(0.5))
model2.add(LSTM(128))
model2.add(Dropout(0.5))
model2.add(LSTM(128))  # raises the ValueError: the previous LSTM outputs 2D (None, 128)
model2.add(Dropout(0.2))
model2.add(Dense(len(chars), activation='softmax'))
# optimizer = RMSprop(lr=0.01)
optimizer = Adam()
model2.compile(loss='categorical_crossentropy', optimizer=optimizer)
print('model built')
The model works with only two LSTM layers, two Dropout layers, and one Dense layer. I think something is wrong with my input_shape setup, but I could be wrong. My model is based on a notebook for the dataset above (here).

To stack RNNs, you have to use return_sequences=True on every recurrent layer that feeds into another recurrent layer.
From the error, you can see that the layer expected a 3-dimensional tensor but received a 2-dimensional one. Here you can read that the return_sequences=True flag outputs a 3-dimensional tensor:
If True the full sequences of successive outputs for each timestep is returned (a 3D tensor of shape (batch_size, timesteps, output_features)).
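To make the shape rule concrete, here is a minimal pure-Python sketch (a simplified model of Keras shape inference, not Keras itself; lstm_output_shape is a hypothetical helper). None stands for the batch dimension, as in the error message, and maxlen=40 with 60 distinct characters are assumed values for illustration:

```python
def lstm_output_shape(input_shape, units, return_sequences=False):
    """Mimic the output shape of a Keras LSTM layer (sketch, not the real API).

    input_shape is (batch, timesteps, features); batch may be None.
    """
    if len(input_shape) != 3:
        # Same complaint Keras makes when an LSTM receives a 2D tensor.
        raise ValueError(
            f"expected ndim=3, found ndim={len(input_shape)}. "
            f"Full shape received: {list(input_shape)}"
        )
    batch, timesteps, _features = input_shape
    if return_sequences:
        # One output vector per timestep: a 3D tensor.
        return (batch, timesteps, units)
    # Only the output of the last timestep: a 2D tensor.
    return (batch, units)


# maxlen=40 and 60 distinct characters are assumed values for illustration.
first = lstm_output_shape((None, 40, 60), 128, return_sequences=True)
print(first)                            # (None, 40, 128)
second = lstm_output_shape(first, 128)  # return_sequences=False
print(second)                           # (None, 128)
try:
    lstm_output_shape(second, 128)      # a third LSTM fed the 2D output
except ValueError as err:
    print(err)  # expected ndim=3, found ndim=2. Full shape received: [None, 128]
```

The third call reproduces the situation in the question: the second LSTM returned only its last output, so the third LSTM has no timestep axis to iterate over.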
Assuming that there are no issues with your input layer and that the input data is passed correctly, I propose trying the following model:
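from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam
# maxlen and chars come from the data preparation in the original notebook (not shown)
print('Building model...')
model2 = Sequential()
model2.add(LSTM(128, input_shape=(maxlen, len(chars)), return_sequences=True))
model2.add(Dropout(0.5))
model2.add(LSTM(128, return_sequences=True))  # keep the 3D output for the next LSTM
model2.add(Dropout(0.5))
model2.add(LSTM(128))  # last LSTM: return only the final output, (None, 128)
model2.add(Dropout(0.2))
model2.add(Dense(len(chars), activation='softmax'))
# optimizer = RMSprop(learning_rate=0.01)
optimizer = Adam()
model2.compile(loss='categorical_crossentropy', optimizer=optimizer)
print('model built')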

Related

How to fix: ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2

I have a question concerning time series data. My training dataset has shape (3183, 1, 6).
My model:
model = Sequential()
model.add(LSTM(100, input_shape = (training_input_data.shape[1], training_input_data.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(100, input_shape = (training_input_data.shape[1], training_input_data.shape[2])))
model.add(Dense(1))
model.compile(optimizer = 'adam', loss='mse')
I get the following error at the second LSTM layer:
ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2
But there is no ndim parameter.
The problem is that the first LSTM layer returns a tensor of shape (batch_size, 100). If you want to stack a second LSTM layer on top of it, add the option return_sequences=True to the first LSTM layer, which will then return a tensor of shape (batch_size, training_input_data.shape[1], 100).
Note that passing input_shape = (..) to the second LSTM is not mandatory, as the input shape of that layer is automatically computed from the output shape of the first one.
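The same rule can be traced with a small pure-Python sketch (a simplified model of Keras shape inference, not Keras itself; check_lstm_stack is a hypothetical helper). Each LSTM is given as a (units, return_sequences) pair and Dropout is left out because it does not change the shape. With the (3183, 1, 6) dataset, input_shape is (1, 6). The sketch numbers layers from 1, so its message mirrors the lstm_2 in the question; real Keras layer names depend on how many layers have been created in the session.

```python
def check_lstm_stack(input_shape, layers):
    """Propagate shapes through stacked LSTM layers (simplified sketch).

    input_shape is (timesteps, features), as passed to the first LSTM;
    layers is a list of (units, return_sequences) pairs. Dropout layers are
    omitted because they do not change the shape.
    """
    shape = (None,) + tuple(input_shape)  # prepend the batch dimension
    for index, (units, return_sequences) in enumerate(layers, start=1):
        if len(shape) != 3:
            raise ValueError(
                f"Input 0 is incompatible with layer lstm_{index}: "
                f"expected ndim=3, found ndim={len(shape)}"
            )
        shape = (None, shape[1], units) if return_sequences else (None, units)
    return shape


# Question: neither LSTM returns sequences, so the second one fails.
try:
    check_lstm_stack((1, 6), [(100, False), (100, False)])
except ValueError as err:
    print(err)  # Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2

# Answer: return_sequences=True on the first LSTM only.
print(check_lstm_stack((1, 6), [(100, True), (100, False)]))  # (None, 100)
```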
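You need to set the parameter return_sequences=True to stack LSTM layers:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
model = Sequential()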
model.add(LSTM(
100,
input_shape = (training_input_data.shape[1], training_input_data.shape[2]),
return_sequences=True
))
model.add(Dropout(0.2))
model.add(LSTM(100, input_shape = (training_input_data.shape[1], training_input_data.shape[2])))
model.add(Dense(1))
model.compile(optimizer = 'adam', loss='mse')
See also How to stack multiple lstm in keras?

How to get one output for several timesteps in Keras LSTM?

I want to classify a time frame of data: for example, for every 5 inputs there is one output. But my code refuses to accept my output.
model = Sequential()
model.add(GRU(32, input_shape=(TimeStep.TIME_STEP + 1, 10), return_sequences=True, activation='relu'))
model.add(GRU(64, activation='relu', return_sequences=True))
model.add(Dense(2, activation='hard_sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=[categorical_accuracy])
history = model.fit(TimeStep.fodder, TimeStep.target, epochs=50)
The error:
ValueError: Error when checking target: expected dense_1 to have shape (5, 2) but got array with shape (31057, 2)
It does have 31057 data points, and each data point consists of 5 sequential time steps.
Setting return_sequences=True in a GRU layer makes it return its output at every time step, rather than only the output of the final time step.
If you set that flag to False in the second GRU, your model will return the shape that you expect.
Tip: use model.summary() to display the output shapes of your layers.
For a model with a categorical loss, you want the output layer activation to be softmax, not sigmoid.
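A small pure-Python sketch of the shapes involved (hypothetical helper names, not Keras code). It assumes 5 timesteps, inferred from the (5, 2) in the error, and 10 features from input_shape, and relies on the fact that a Keras Dense layer only transforms the last axis, so a 3D input produces a 3D output:

```python
def recurrent_output_shape(shape, units, return_sequences):
    """Output shape of a GRU/LSTM layer given a (batch, timesteps, features) input."""
    batch, timesteps, _features = shape
    return (batch, timesteps, units) if return_sequences else (batch, units)


def dense_output_shape(shape, units):
    """A Keras Dense layer only replaces the last axis, so 3D input stays 3D."""
    return shape[:-1] + (units,)


def model_output_shape(timesteps, features, second_return_sequences):
    """Shapes for GRU(32) -> GRU(64) -> Dense(2), as in the question."""
    shape = (None, timesteps, features)
    shape = recurrent_output_shape(shape, 32, return_sequences=True)
    shape = recurrent_output_shape(shape, 64, second_return_sequences)
    return dense_output_shape(shape, 2)


target_shape = (31057, 2)  # one 2-class label per sample, from the error message

with_sequences = model_output_shape(5, 10, second_return_sequences=True)
without_sequences = model_output_shape(5, 10, second_return_sequences=False)
print(with_sequences)     # (None, 5, 2): one prediction per timestep
print(without_sequences)  # (None, 2): one prediction per sample
# The per-sample shapes (everything after the batch axis) must match.
print(with_sequences[1:] == target_shape[1:])     # False
print(without_sequences[1:] == target_shape[1:])  # True
```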

Training CNN with transfer learning in Keras - image input doesn't work but vector input does

I'm trying to do transfer learning in Keras. I set up a ResNet50 network, set to not trainable, with some extra layers on top:
# Image input
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg')) # output is 2048
model.add(Dropout(0.05))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(512, activation='relu'))
model.add(Dense(7, activation='softmax'))
model.layers[0].trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
Then I create the input data x_batch using the ResNet50 preprocess_input function, along with the one-hot encoded labels y_batch, and fit the model like this:
model.fit(x_batch,
y_batch,
epochs=nb_epochs,
batch_size=64,
shuffle=True,
validation_split=0.2,
callbacks=[lrate])
Training accuracy gets close to 100% after ten or so epochs, but validation accuracy actually decreases from around 50% to 30% with validation loss steadily increasing.
However if I instead create a network with just the last layers:
# Vector input
model2 = Sequential()
model2.add(Dropout(0.05, input_shape=(2048,)))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()
and feed in the output of the ResNet50 prediction:
resnet = ResNet50(include_top=False, pooling='avg')
x_batch = resnet.predict(x_batch)
Then validation accuracy gets up to around 85%... What is going on? Why won't the image input method work?
Update:
This problem is really bizarre. If I change ResNet50 to VGG19 it seems to work ok.
After a lot of googling I found that the problem is to do with the Batch Normalisation layers in ResNet. There are no batch normalisation layers in VGGNet which is why it works for that topology.
There is a pull request to fix this in Keras here, which explains in more detail:
Assume we use one of the pre-trained CNNs of Keras and we want to fine-tune it. Unfortunately, we get no guarantees that the mean and variance of our new dataset inside the BN layers will be similar to the ones of the original dataset. As a result, if we fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference the top layers will receive data which are scaled using the mean/variance of the original dataset. This discrepancy can lead to reduced accuracy.
This means that the BN layers are adjusting to the training data, however when validation is performed, the original parameters of the BN layers are used. From what I can tell, the fix is to allow the frozen BN layers to use the updated mean and variance from training.
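To illustrate the discrepancy described above, here is a minimal pure-Python sketch of batch normalisation applied to one channel of hypothetical activations, once with the current batch's statistics (as during training) and once with stored moving statistics (as during inference). The numbers are made up and gamma=1, beta=0 are assumed; it only shows the arithmetic, not how Keras implements the layer internally:

```python
from statistics import mean, pvariance


def batch_norm(values, mu, var, eps=1e-3):
    """Normalise values with the given mean and variance (gamma=1, beta=0)."""
    return [(v - mu) / (var + eps) ** 0.5 for v in values]


# Hypothetical activations of one channel on the new dataset.
new_batch = [4.0, 5.0, 6.0, 5.0]

# Hypothetical moving statistics stored in the pre-trained BN layer,
# standing in for what was learned on the original dataset.
moving_mean, moving_var = 0.0, 1.0

# Training-style: normalise with the current batch's own statistics.
train_out = batch_norm(new_batch, mean(new_batch), pvariance(new_batch))
# Inference-style: normalise with the stored moving statistics.
infer_out = batch_norm(new_batch, moving_mean, moving_var)

print([round(v, 2) for v in train_out])  # [-1.41, 0.0, 1.41, 0.0]
print([round(v, 2) for v in infer_out])  # [4.0, 5.0, 6.0, 5.0]
```

The Dense layers on top are fitted to values like the first list but evaluated on values like the second, which is the mismatch the pull request describes.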
A workaround is to pre-compute the ResNet output. This also decreases training time considerably, because that part of the calculation is no longer repeated.
You can try:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dropout, Dense
IMG_SIZE = 224  # assumed input size; use your own image size
Res = keras.applications.resnet.ResNet50(include_top=False,
                                         weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3))
# Freeze all layers of the pre-trained ResNet50 base
for layer in Res.layers:
    layer.trainable = False
# Check the trainable status of the individual layers
for layer in Res.layers:
    print(layer, layer.trainable)
# Image input: ResNet50 base followed by a new classification head
model2 = Sequential()
model2.add(Res)
model2.add(Flatten())
model2.add(Dropout(0.05))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()

Keras - Wrong input shape in LSTM dense layer

I am trying to build an LSTM text classifier using Keras.
This is the model structure:
model_word2vec = Sequential()
model_word2vec.add(Embedding(input_dim=vocabulary_dimension,
output_dim=embedding_dim,
weights=[word2vec_weights],
input_length=longest_sentence,
mask_zero=True,
trainable=False))
model_word2vec.add(LSTM(units=embedding_dim, dropout=0.25, recurrent_dropout=0.25, return_sequences=True))
model_word2vec.add(Dense(3, activation='softmax'))
model_word2vec.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
results = model_word2vec.fit(X_tr_word2vec, y_tr_word2vec, validation_split=0.16, epochs=3, batch_size=128, verbose=0)
where y_tr_word2vec is a one-hot encoded variable with 3 classes.
When I run the code above, I get this error:
ValueError: Error when checking model target: expected dense_2 to have 3 dimensions, but got array with shape (15663, 3)
I suppose the issue could be the shape of y_tr_word2vec or the batch size dimension, but I'm not sure.
Update:
I have set return_sequences=False, changed y_tr_word2vec from one-hot to categorical labels, used 1 neuron in the Dense layer, and I am now using sparse_categorical_crossentropy instead of categorical_crossentropy.
Now I get this error: ValueError: invalid literal for int() with base 10: 'countess'.
So I now suspect that something goes wrong during fit() with the input X_tr_word2vec, which contains the sentences.
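The int() error suggests that X_tr_word2vec still contains raw words (such as 'countess') rather than the integer word indices an Embedding layer expects. Below is a minimal pure-Python sketch of that conversion; build_word_index and to_padded_indices are hypothetical helpers, the sentences are toy data, and post-padding is used for simplicity. Index 0 is kept free for padding because the model uses mask_zero=True, and the rows of word2vec_weights would have to follow the same index order.

```python
def build_word_index(sentences):
    """Map each distinct word to an integer, starting at 1 (0 = padding)."""
    word_index = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


def to_padded_indices(sentences, word_index, maxlen):
    """Convert sentences to fixed-length lists of indices, padding with 0.

    Unknown words are also mapped to 0 here, a simplification that makes
    them masked out like padding.
    """
    rows = []
    for sentence in sentences:
        ids = [word_index.get(word, 0) for word in sentence.split()][:maxlen]
        rows.append(ids + [0] * (maxlen - len(ids)))
    return rows


sentences = ["the countess smiled", "the duke left"]  # toy data
word_index = build_word_index(sentences)
print(to_padded_indices(sentences, word_index, maxlen=4))
# [[1, 2, 3, 0], [1, 4, 5, 0]]
```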
The problem is this code
model_word2vec.add(LSTM(units=embedding_dim, dropout=0.25, recurrent_dropout=0.25, return_sequences=True))
model_word2vec.add(Dense(3, activation='softmax'))
You have set return_sequences=True, which means the LSTM passes a 3D tensor (batch, timesteps, units) to the Dense layer. Dense then also outputs a 3D tensor, which does not match your 2D targets of shape (15663, 3). So remove return_sequences=True:
model_word2vec.add(LSTM(units=embedding_dim, dropout=0.25, recurrent_dropout=0.25))
model_word2vec.add(Dense(3, activation='softmax'))
Why did you set return_sequences=True?

ValueError: Input 0 is incompatible with layer lstm_13: expected ndim=3, found ndim=4

I am trying to do multi-class classification, and here are the details of my training input and output:
train_input.shape = (1, 95000, 360) (an input array of length 95000, each element being an array of length 360)
train_output.shape = (1, 95000, 22) (there are 22 classes)
model = Sequential()
model.add(LSTM(22, input_shape=(1, 95000,360)))
model.add(Dense(22, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(train_input, train_output, epochs=2, batch_size=500)
The error is:
ValueError: Input 0 is incompatible with layer lstm_13: expected ndim=3, found ndim=4
in line:
model.add(LSTM(22, input_shape=(1, 95000,360)))
Please help me out, I am not able to solve it through other answers.
I solved the problem by reshaping the input to (95000, 360, 1) and the output to (95000, 22), and by changing the input shape to (360, 1) in the code where the model is defined:
model = Sequential()
model.add(LSTM(22, input_shape=(360,1)))
model.add(Dense(22, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(ml2_train_input, ml2_train_output_enc, epochs=2, batch_size=500)
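What that reshape does can be sketched in pure Python on toy data (add_feature_axis and shape_of are hypothetical helpers): the leading dimension of size 1 is dropped, and a trailing axis of size 1 is added, so each of the 360 values becomes one timestep with a single feature. With NumPy, the same reshape of the real array would typically be written train_input.reshape(95000, 360, 1).

```python
def shape_of(nested):
    """Return the shape of a regular (rectangular) nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def add_feature_axis(samples):
    """Turn each sample of D values into D timesteps with 1 feature each."""
    return [[[value] for value in sample] for sample in samples]


# Toy stand-in for train_input: shape (1, 2, 3) instead of (1, 95000, 360).
train_input = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]
samples = train_input[0]              # drop the leading axis of size 1
reshaped = add_feature_axis(samples)  # add a trailing feature axis
print(shape_of(train_input))  # (1, 2, 3)
print(shape_of(samples))      # (2, 3)
print(shape_of(reshaped))     # (2, 3, 1), i.e. input_shape=(3, 1) for the LSTM
```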
input_shape is supposed to be (timesteps, n_features). Remove the first dimension.
input_shape = (95000,360)
Same for the output.
Well, I think the main problem here is the return_sequences parameter. It should be set to False for the last LSTM layer and True for all the preceding LSTM layers.
In artificial neural networks (ANNs), the input has shape (N, D), where N is the number of samples and D is the number of features.
In RNNs, GRUs and LSTMs, the input has shape (N, T, D), where N is the number of samples, T is the length of the time sequence and D is the number of features.
So, when adding the input layer (the sample dimension N is always left out), use:
Input(shape = (D,)) for ANNs, and
Input(shape = (T,D)) for RNNs, GRUs and LSTMs.
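These conventions can be summarised in a small pure-Python sketch (describe_input and batched_ndim are hypothetical helpers, not Keras functions): the shape passed to input_shape or Input(shape=...) is the data shape without N, and Keras prepends a batch axis to it. That is why input_shape=(1, 95000, 360) in the question produced a 4D tensor (found ndim=4) while an LSTM always expects ndim=3. The (1000, 2048) dataset shape is a made-up example.

```python
def describe_input(data_shape):
    """Suggest the input_shape for a dataset shape (simplified sketch).

    (N, D)    -> Dense/ANN input,     input_shape (D,)
    (N, T, D) -> RNN/GRU/LSTM input,  input_shape (T, D)
    """
    if len(data_shape) == 2:
        _n, d = data_shape
        return "dense", (d,)
    if len(data_shape) == 3:
        _n, t, d = data_shape
        return "recurrent", (t, d)
    raise ValueError(f"expected (N, D) or (N, T, D), got {len(data_shape)} dimensions")


def batched_ndim(input_shape):
    """Keras prepends a batch axis to input_shape, adding one dimension."""
    return len(input_shape) + 1


print(describe_input((3183, 1, 6)))     # ('recurrent', (1, 6))
print(describe_input((95000, 360, 1)))  # ('recurrent', (360, 1))
print(describe_input((1000, 2048)))     # ('dense', (2048,))
print(batched_ndim((1, 95000, 360)))    # 4, but an LSTM expects 3
print(batched_ndim((360, 1)))           # 3
```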
