keras - Error when checking target with embedding layer - python

I'm trying to run a Keras model as follows:
model = Sequential()
model.add(Dense(10, activation='relu',input_shape=(286,)))
model.add(Dense(1, activation='softmax',input_shape=(324827, 286)))
This code works, but if I try to add an embedding layer:
model = Sequential()
model.add(Embedding(286,64, input_shape=(286,)))
model.add(Dense(10, activation='relu',input_shape=(286,)))
model.add(Dense(1, activation='softmax',input_shape=(324827, 286)))
I'm getting the following error:
ValueError: Error when checking target: expected dense_2 to have 3 dimensions, but got array with shape (324827, 1)
My data have 286 features and 324827 rows.
I'm probably doing something wrong with the shape definitions; can you tell me what it is?

You don't need to provide input_shape to the Dense layers: only the first layer of the model needs it, and the shapes of the following layers are computed automatically:
from tensorflow.keras.layers import Embedding, Dense
from tensorflow.keras.models import Sequential
# 286 features and 324827 rows (324827, 286)
model = Sequential()
model.add(Embedding(286, 64, input_shape=(286,)))
model.add(Dense(10, activation='relu'))
# a single-unit softmax always outputs 1.0, so sigmoid is used for the one output unit
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='adam')
model.summary()
returns:
Model: "sequential_2"
Layer (type)                 Output Shape              Param #
embedding_2 (Embedding)      (None, 286, 64)           18304
dense_2 (Dense)              (None, 286, 10)           650
dense_3 (Dense)              (None, 286, 1)            11
Total params: 18,965
Trainable params: 18,965
Non-trainable params: 0
I hope this is what you're looking for.
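Note that the summary above still ends in (None, 286, 1) while the target has shape (324827, 1), so the "expected 3 dimensions" error would remain: the Embedding layer adds a third axis and Dense layers only act on the last one. Below is a minimal sketch of one way to collapse that axis. It assumes that the 286 columns hold integer indices in [0, 286) (an Embedding only makes sense for such categorical inputs) and that the target is binary; Flatten, the sigmoid output, and binary_crossentropy are illustrative choices, not part of the original answer.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_features = 286  # columns per row, as in the question
vocab_size = 286    # assumption: every value is an integer index in [0, 286)

model = keras.Sequential([
    keras.Input(shape=(num_features,), dtype="int32"),
    layers.Embedding(vocab_size, 64),       # (None, 286, 64)
    layers.Flatten(),                       # (None, 286 * 64) = (None, 18304)
    layers.Dense(10, activation="relu"),    # (None, 10)
    layers.Dense(1, activation="sigmoid"),  # (None, 1), matches a (rows, 1) target
])
model.compile(loss="binary_crossentropy", optimizer="adam")

# Dummy integer data with the question's column count (far fewer rows, for speed)
x = np.random.randint(0, vocab_size, size=(8, num_features), dtype="int32")
y = np.random.randint(0, 2, size=(8, 1)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
print(model.predict(x, verbose=0).shape)  # (8, 1)
```

GlobalAveragePooling1D would work in place of Flatten and keeps the Dense layer much smaller (64 inputs instead of 18304).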


Bidirectional LSTM output shape

I have a Bidirectional LSTM model. I don't understand why the second layer, model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2))), gives a 2-dimensional output (None, 20), while the first Bidirectional LSTM gives (None, 409, 20).
Can anyone help me, please?
Also, how can I add a self-attention layer to the model?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.layers import SpatialDropout1D
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
embedding_vector_length = 100
model2 = Sequential()
model2.add(Embedding(len(tokenizer.word_index) + 1, embedding_vector_length,
input_length=409) )
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2)))
#model2.add(Dense(256, activation='relu'))
model2.add(Dense(3, activation='softmax'))
And the output:
Layer (type)                  Output Shape              Param #
embedding_23 (Embedding)      (None, 409, 100)          1766600
bidirectional_12 (Bidirectio  (None, 409, 20)           8880
dropout_8 (Dropout)           (None, 409, 20)           0
bidirectional_13 (Bidirectio  (None, 20)                2480
dense_15 (Dense)              (None, 3)                 63
Total params: 1,778,023
Trainable params: 1,778,023
Non-trainable params: 0
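For the second Bidirectional LSTM, return_sequences is False by default, so the layer is many-to-one: it returns only the last hidden state of each direction (10 + 10 = 20 values), giving (None, 20). The first layer sets return_sequences=True, so it returns the hidden state at every one of the 409 time steps, giving (None, 409, 20). If you want the output of each time step from the second layer as well, simply use model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2))).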
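The shape difference can be checked directly by calling the two layers on a dummy tensor. This is a minimal sketch: the random input stands in for the Embedding output (batch of 2, 409 steps, 100 features), and recurrent_dropout is omitted because it does not affect shapes.

```python
import numpy as np
from tensorflow.keras import layers

# Dummy batch: 2 sequences of 409 time steps with 100 features each
# (the same per-sample shape the Embedding layer produces in the question)
x = np.random.rand(2, 409, 100).astype("float32")

first = layers.Bidirectional(layers.LSTM(10, return_sequences=True))
second = layers.Bidirectional(layers.LSTM(10))  # return_sequences=False (default)

h_all = first(x)        # hidden state at every time step, both directions concatenated
h_last = second(h_all)  # only the final hidden state of each direction

print(h_all.shape)   # (2, 409, 20)
print(h_last.shape)  # (2, 20)
```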
For adding an attention mechanism to an LSTM, you may refer to this and this link.
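As one possible approach (not necessarily the one in the linked posts), here is a minimal sketch of dot-product self-attention using the built-in keras.layers.Attention layer, with the same sequence passed as query and value. vocab_size is a hypothetical stand-in for len(tokenizer.word_index) + 1, and averaging over time with GlobalAveragePooling1D is an illustrative choice for reducing the sequence before the classifier.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 1000  # hypothetical; stands in for len(tokenizer.word_index) + 1
seq_len = 409
embedding_vector_length = 100

inputs = keras.Input(shape=(seq_len,), dtype="int32")
x = layers.Embedding(vocab_size, embedding_vector_length)(inputs)
# Keep every time step so attention has a sequence to work on: (None, 409, 20)
x = layers.Bidirectional(layers.LSTM(10, return_sequences=True))(x)
# Self-attention: query and value are the same sequence -> (None, 409, 20)
x = layers.Attention()([x, x])
# Collapse the time axis before the classifier: (None, 20)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(3, activation="softmax")(x)
model = keras.Model(inputs, outputs)

tokens = np.random.randint(1, vocab_size, size=(2, seq_len), dtype="int32")
probs = model.predict(tokens, verbose=0)
print(probs.shape)  # (2, 3)
```

keras.layers.MultiHeadAttention(num_heads=..., key_dim=...) can be used in the same place, called as mha(x, x), if multi-head attention is preferred.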

Understand the summary of a LSTM model

I have the following LSTM model. Can somebody help me understand the summary of the model?
a) How are the Param # values calculated?
b) Why do we have no value?
c) Why is the Param # of the dropout layer 0?
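from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.regularizers import l2
model = Sequential()
model.add(LSTM(64, return_sequences=True, recurrent_regularizer=l2(0.0015), input_shape=(timesteps, input_dim)))
model.add(Dropout(0.5))
model.add(LSTM(64, recurrent_regularizer=l2(0.0015), input_shape=(timesteps, input_dim)))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(n_classes, activation='softmax'))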
The following are the input, timesteps, and x_train:
input_dim = 6
The summary is:
Layer (type)                 Output Shape              Param #
lstm_1 (LSTM)                (None, 100, 64)           18176
dropout_1 (Dropout)          (None, 100, 64)           0
lstm_2 (LSTM)                (None, 64)                33024
dense_1 (Dense)              (None, 64)                4160
dense_2 (Dense)              (None, 64)                4160
dense_3 (Dense)              (None, 6)                 390
Total params: 59,910
Trainable params: 59,910
Non-trainable params: 0
Part of your question is answered here.
Simply put, the number of parameters depends only on the layer sizes (the number of units and the input dimension), not on how much data you have. An LSTM layer has many parameters because each of its four gates has its own input weights, recurrent weights, and biases.
Dropout layers don't have parameters because there are no weights in a dropout layer. All a dropout layer does is randomly set a fraction of its inputs to zero during training (Keras scales the remaining inputs by 1 / (1 - rate)); at inference time it passes its inputs through unchanged. In this case, you've chosen 50%. Beyond the rate, there is nothing to configure in a dropout layer.
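A short check of that behaviour, as a minimal sketch on a vector of ones: in training mode the surviving entries are scaled by 1 / (1 - 0.5) = 2, in inference mode the layer is the identity, and the layer reports zero parameters.

```python
import numpy as np
from tensorflow.keras import layers

x = np.ones((1, 8), dtype="float32")
dropout = layers.Dropout(0.5)

# Training mode: each entry is either dropped (0) or scaled by 1 / (1 - 0.5) = 2
train_out = dropout(x, training=True)
# Inference mode: inputs pass through unchanged
infer_out = dropout(x, training=False)

print(np.asarray(train_out))
print(np.asarray(infer_out))   # [[1. 1. 1. 1. 1. 1. 1. 1.]]
print(dropout.count_params())  # 0
```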
How are the parameters calculated?
The input dimension is 6 and the first LSTM layer has 64 hidden units.
At each time step, the first LSTM layer receives the previous hidden state (64 values, at t-1) together with the current input (6 values, at t), so its effective input dimension is 64 + 6 = 70.
Now the calculation:
number of params per gate = (effective input dimension) * (hidden units) + bias
= [64 (hidden state dimension) + 6 (input dimension)] * 64 (hidden units) + 64 (one bias per hidden unit)
= (64 + 6) * 64 + 64
= 4544 for one gate (a single feed-forward layer)
An LSTM has 4 such feed-forward layers (the input, forget, and output gates plus the cell candidate), so multiply by 4:
Total trainable params = 4 * 4544
= 18176
Dropout layer does not have any parameters.
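The arithmetic above can be checked in a few lines of Python. lstm_params and dense_params are helper names introduced here for illustration; timesteps = 100 and n_classes = 6 are read off the summary, and the l2 regularizer is left out because it adds no parameters.

```python
from tensorflow import keras
from tensorflow.keras import layers

def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per unit
    return input_dim * units + units

timesteps, input_dim, n_classes = 100, 6, 6  # values read off the summary

expected = (lstm_params(input_dim, 64) + lstm_params(64, 64)
            + 2 * dense_params(64, 64) + dense_params(64, n_classes))

model = keras.Sequential([
    keras.Input(shape=(timesteps, input_dim)),
    layers.LSTM(64, return_sequences=True),
    layers.Dropout(0.5),  # contributes 0 parameters
    layers.LSTM(64),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])

print(lstm_params(input_dim, 64))       # 18176
print(expected, model.count_params())  # 59910 59910
```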
I am not sure which value you are talking about.

LSTM - predicting on a sliding window data

My training data is an overlapping sliding window of users' daily data. Its shape is (1470, 3, 256, 18):
1470 batches of 3 days of data, where each day has 256 samples of 18 features each.
My target's shape is (1470,):
a label value for each batch.
I want to train an LSTM to map a [3-day batch] -> [one target].
Days that have fewer than 256 samples are padded with -10 up to 256 samples.
I've written the following code to build the model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense, Masking, Flatten
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint
from tensorflow.keras import metrics

def build_model(num_samples, num_features):
    opt = RMSprop(0.001)
    model = Sequential()
    model.add(Masking(mask_value=-10., input_shape=(num_samples, num_features)))
    model.add(LSTM(32, return_sequences=True, activation='tanh'))
    model.add(LSTM(16, return_sequences=False, activation='tanh'))
    model.add(Dense(16, activation='tanh'))
    model.add(Dense(8, activation='tanh'))
    model.compile(loss='mse', optimizer=opt, metrics=['mae', 'mse'])
    return model

model = build_model(256, 18)
Model: "sequential_7"
Layer (type)                 Output Shape              Param #
masking_7 (Masking)          (None, 256, 18)           0
lstm_14 (LSTM)               (None, 256, 32)           6528
dropout_7 (Dropout)          (None, 256, 32)           0
lstm_15 (LSTM)               (None, 16)                3136
dropout_8 (Dropout)          (None, 16)                0
dense_6 (Dense)              (None, 16)                272
dense_7 (Dense)              (None, 8)                 136
dense_8 (Dense)              (None, 1)                 9
Total params: 10,081
Trainable params: 10,081
Non-trainable params: 0
I can see that the shapes are incompatible, but I can't figure out how to change the code to fit my problem.
Any help would be appreciated.
Update: I've reshaped my data like so:
train_data.reshape(1470*3, 256, 18)
is that right?
I think you are looking for TimeDistributed(LSTM(...)) (source)
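from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, TimeDistributed, LSTM, Dense
day, num_samples, num_features = 3, 256, 18
model = Sequential()
model.add(Masking(mask_value=-10., input_shape=(day, num_samples, num_features)))
# TimeDistributed applies the wrapped LSTM to each day independently
model.add(TimeDistributed(LSTM(32, return_sequences=True, activation='tanh')))
# output here: (None, day, 16), one vector per day
model.add(TimeDistributed(LSTM(16, return_sequences=False, activation='tanh')))
model.add(Dense(16, activation='tanh'))
model.add(Dense(8, activation='tanh'))
model.compile(loss='mse', optimizer='adam', metrics=['mae', 'mse'])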
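With the code above, the output is still one 8-dimensional vector per day, (None, 3, 8), rather than one value per 3-day window. Below is a minimal sketch of one way to finish the idea, assuming a single regression target per window: a per-day encoder (with the -10 Masking inside it, so the mask stays within each day) is wrapped in TimeDistributed, and a second LSTM summarises the 3 day vectors. The layer sizes are illustrative. Keeping the input at (1470, 3, 256, 18) also keeps each window aligned with its single target, whereas reshaping to (1470*3, 256, 18) would give 4410 inputs for only 1470 targets.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

days, num_samples, num_features = 3, 256, 18

# Per-day encoder: masks the -10 padding and summarises one day into a 16-d vector
day_encoder = keras.Sequential([
    keras.Input(shape=(num_samples, num_features)),
    layers.Masking(mask_value=-10.0),
    layers.LSTM(32, return_sequences=True, activation="tanh"),
    layers.LSTM(16, activation="tanh"),
])

model = keras.Sequential([
    keras.Input(shape=(days, num_samples, num_features)),
    layers.TimeDistributed(day_encoder),  # (None, 3, 16): one vector per day
    layers.LSTM(16, activation="tanh"),   # (None, 16): summarises the 3 days
    layers.Dense(8, activation="tanh"),
    layers.Dense(1),                      # (None, 1): one target per 3-day window
])
model.compile(loss="mse", optimizer="adam", metrics=["mae"])

# Dummy windows where the last 56 samples of every day are padding
x = np.random.rand(4, days, num_samples, num_features).astype("float32")
x[:, :, 200:, :] = -10.0
pred = model.predict(x, verbose=0)
print(pred.shape)  # (4, 1)
```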

TimeDistributed Dense Layer after GRU (return_sequences=True) layers causing error with dimensions

I'm currently taking my first steps with Keras on top of TensorFlow to classify time-series data. I was able to get a pretty simple model running, but after some feedback it was recommended that I use multiple GRU layers in a row and wrap my Dense layers in TimeDistributed. Here is the model I tried:
model = Sequential()
model.add(GRU(100, input_shape=(n_timesteps, n_features), return_sequences=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(TimeDistributed(Dense(units=100, activation='relu')))
model.add(TimeDistributed(Dense(n_outputs, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I am receiving the following error message when trying to fit the model with the input having a shape of (2357, 128, 11) (2357 samples, 128 timesteps, 11 features):
ValueError: Error when checking target: expected time_distributed_2 to have 3 dimensions, but got array with shape (2357, 5)
This is the output for model.summary():
Layer (type)                 Output Shape              Param #
gru_1 (GRU)                  (None, 128, 100)          33600
gru_2 (GRU)                  (None, 128, 100)          60300
gru_3 (GRU)                  (None, 128, 100)          60300
gru_4 (GRU)                  (None, 128, 100)          60300
gru_5 (GRU)                  (None, 128, 100)          60300
gru_6 (GRU)                  (None, 128, 100)          60300
time_distributed_1 (TimeDist (None, 128, 100)          10100
time_distributed_2 (TimeDist (None, 128, 5)            505
Total params: 345,705
Trainable params: 345,705
Non-trainable params: 0
So what is the correct way to put multiple GRU layers in a row and add the TimeDistributed wrapper to the following Dense layers? I will be very grateful for any helpful input.
If you set return_sequences=False in the last GRU layer and use plain Dense layers instead of TimeDistributed(Dense(...)) (TimeDistributed needs an input that still has a time axis), the code will work with your (2357, 5) targets. Alternatively, if you really want one prediction per time step, keep the model as it is and give it targets of shape (2357, 128, 5).
You typically need return_sequences=True when the output of an RNN is fed into another RNN, so that the time dimension is preserved. When you set return_sequences=False, the layer outputs only the last hidden state (instead of the hidden state at every time step), and the time dimension disappears.
That is why, when you set return_sequences=False, the output rank drops by one, here from (None, 128, 100) to (None, 100).
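A minimal sketch of the fixed model for one label per sequence, assuming the (2357, 5) target is one-hot encoded: only the last GRU drops the time axis, and the Dense layers are used without TimeDistributed. go_backwards is left out for simplicity, and random arrays stand in for the real data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_timesteps, n_features, n_outputs = 128, 11, 5  # shapes from the question

model = keras.Sequential([keras.Input(shape=(n_timesteps, n_features))])
# Intermediate GRUs pass the full sequence on: (None, 128, 100)
for _ in range(5):
    model.add(layers.GRU(100, return_sequences=True, dropout=0.5))
# Last GRU returns only its final hidden state: (None, 100)
model.add(layers.GRU(100, dropout=0.5))
model.add(layers.Dense(100, activation="relu"))
model.add(layers.Dense(n_outputs, activation="softmax"))  # (None, 5)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Dummy batch with one-hot targets of shape (samples, 5), like y in the question
x = np.random.rand(16, n_timesteps, n_features).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, n_outputs, size=16), n_outputs)
model.fit(x, y, epochs=1, verbose=0)
print(model.predict(x, verbose=0).shape)  # (16, 5)
```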

How to define input shapes in a Sequential Keras model

Please help me define appropriate Dense input shapes in Keras models. Maybe I have to reshape my data first. I have a data set with the dimensions shown below:
Data shapes are X_train: (2858, 2037) y_train: (2858, 1) X_test: (715, 2037) y_test: (715, 1)
Number of features (input shape) is 2037
I want to define a Sequential Keras model like this:
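from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
batch_size = 128
num_classes = 2
epochs = 20
X_input_shape = X_train.shape[1]  # 2037 features
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(X_input_shape,)))
model.add(Dense(512, activation='relu'))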
Model summary:
Layer (type)                 Output Shape              Param #
dense_20 (Dense)             (None, 512)               1043456
dropout_12 (Dropout)         (None, 512)               0
dense_21 (Dense)             (None, 512)               262656
Total params: 1,306,112
Trainable params: 1,306,112
Non-trainable params: 0
And when I try to fit it...
history = model.fit(X_train, y_train,
validation_data=(X_test, y_test))
I got an error:
ValueError: Error when checking target: expected dense_21 to have shape (512,) but got array with shape (1,)
model.add(Dense(512, activation='relu'))
model.add(Dense(1, activation='relu'))
The output layer should have size 1, the same as y_train.shape[1].
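For completeness, here is a minimal sketch of the whole model, assuming y_train holds 0/1 labels (num_classes = 2 in the question). With a single output unit, a sigmoid activation with binary_crossentropy is the usual pairing, since relu is not bounded to [0, 1]. The Dropout rate is a hypothetical value (the summary shows a Dropout layer but not its rate), and small random arrays with the question's 2037 features stand in for the real data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 2037  # X_train.shape[1] in the question

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),                    # hypothetical rate
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # one unit, like y_train.shape[1] == 1
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Small random stand-ins with the question's column counts
X = np.random.rand(64, n_features).astype("float32")
y = np.random.randint(0, 2, size=(64, 1)).astype("float32")
history = model.fit(X, y, batch_size=32, epochs=1, validation_split=0.25, verbose=0)
probs = model.predict(X, verbose=0)
print(probs.shape)  # (64, 1)
```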
