If I have:
self.model.add(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True))
self.model.add(BatchNormalization())
self.model.add(Dropout(0.2))
then my seq_length specifies how many time steps of data I want to process at once. If it matters, my model is sequence-to-sequence (input and output the same length).
But if I have:
self.model.add(Bidirectional(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True)))
self.model.add(BatchNormalization())
self.model.add(Dropout(0.2))
then is that doubling the sequence size? Or at each time step, is it getting seq_length / 2 before and after that timestep?
Using a bidirectional LSTM layer has no effect on the sequence length.
I tested this with the following code:
from keras.models import Sequential
from keras.layers import Bidirectional,LSTM,BatchNormalization,Dropout,Input
model = Sequential()
lstm1_size = 50
seq_length = 128
feature_dim = 20
model.add(Bidirectional(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True)))
model.add(BatchNormalization())
model.add(Dropout(0.2))
batch_size = 32
model.build(input_shape=(batch_size,seq_length, feature_dim))
model.summary()
This resulted in the following output for the bidirectional model:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_1 (Bidirection (32, 128, 100) 28400
_________________________________________________________________
batch_normalization_1 (Batch (32, 128, 100) 400
_________________________________________________________________
dropout_1 (Dropout) (32, 128, 100) 0
=================================================================
Total params: 28,800
Trainable params: 28,600
Non-trainable params: 200
No bidirectional layer:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 128, 50) 14200
_________________________________________________________________
batch_normalization_1 (Batch (None, 128, 50) 200
_________________________________________________________________
dropout_1 (Dropout) (None, 128, 50) 0
=================================================================
Total params: 14,400
Trainable params: 14,300
Non-trainable params: 100
_________________________________________________________________
Comparing the two summaries: the time dimension stays 128 in both cases; only the feature dimension doubles (50 to 100), and with it the LSTM parameter count (2 x 14,200 = 28,400), because the forward and backward passes each have their own weights.
I have a Bidirectional LSTM model, and I don't understand why, after the second model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2))), the result has 2 dimensions (None, 20), whereas the first bidirectional LSTM gives (None, 409, 20).
Can anyone help me, please?
Also, how can I add a self-attention layer to the model?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.layers import SpatialDropout1D
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from keras_self_attention import SeqSelfAttention  # presumably the keras-self-attention package

embedding_vector_length = 100
model2 = Sequential()
model2.add(Embedding(len(tokenizer.word_index) + 1, embedding_vector_length,
                     input_length=409))
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Dropout(0.4))
model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2)))
model2.add(SeqSelfAttention())
#model.add(Dropout(dropout))
#model2.add(Dense(256, activation='relu'))
#model.add(Dropout(0.2))
model2.add(Dense(3, activation='softmax'))
model2.compile(loss='binary_crossentropy', optimizer='adam',
               metrics=['accuracy'])
print(model2.summary())
and the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_23 (Embedding) (None, 409, 100) 1766600
_________________________________________________________________
bidirectional_12 (Bidirectio (None, 409, 20) 8880
_________________________________________________________________
dropout_8 (Dropout) (None, 409, 20) 0
_________________________________________________________________
bidirectional_13 (Bidirectio (None, 20) 2480
_________________________________________________________________
dense_15 (Dense) (None, 3) 63
=================================================================
Total params: 1,778,023
Trainable params: 1,778,023
Non-trainable params: 0
_________________________________________________________________
None
For the second Bidirectional LSTM, return_sequences is set to False by default, so the output of this layer is many-to-one. If you want the output at each time step, simply use model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2))).
For the attention mechanism in LSTMs, you may refer to this and this.
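Since the question also asks about adding self-attention, below is a minimal sketch of one way to do it; it assumes the third-party keras-self-attention package (SeqSelfAttention), and vocab_size is a placeholder for len(tokenizer.word_index) + 1, so it is not part of the original code. SeqSelfAttention needs a 3D input, so the second Bidirectional LSTM keeps return_sequences=True and the time axis is pooled away before the final Dense layer.
import os
os.environ['TF_KERAS'] = '1'  # make keras-self-attention build on tf.keras rather than standalone keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM, Dropout,
                                     Dense, GlobalAveragePooling1D)
from keras_self_attention import SeqSelfAttention  # pip install keras-self-attention

vocab_size = 10000            # placeholder for len(tokenizer.word_index) + 1
embedding_vector_length = 100
max_len = 409

model2 = Sequential()
model2.add(Embedding(vocab_size, embedding_vector_length, input_length=max_len))
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Dropout(0.4))
# keep return_sequences=True so the attention layer receives (batch, 409, 20)
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(SeqSelfAttention(attention_activation='sigmoid'))
model2.add(GlobalAveragePooling1D())  # collapse the time axis: (batch, 409, 20) -> (batch, 20)
model2.add(Dense(3, activation='softmax'))
model2.compile(loss='categorical_crossentropy',  # matches the 3-class softmax output
               optimizer='adam', metrics=['accuracy'])
model2.summary()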
I'm trying to implement this model to generate MIDI music, but I'm getting an error:
The last dimension of the inputs to `Dense` should be defined. Found `None`.
Here's my code
model = Sequential()
model.add(Bidirectional(LSTM(512, return_sequences=True), input_shape=(network_input.shape[1], network_input.shape[2])))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(Flatten())
model.summary()
model.add(Dense(note_variants_count))
model.compile(loss='categorical_crossentropy', optimizer='adam')
And here's the summary before the dense layer
Model: "sequential_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_19 (Bidirectio (None, 100, 1024) 2105344
_________________________________________________________________
seq_self_attention_20 (SeqSe (None, None, 1024) 65601
_________________________________________________________________
dropout_38 (Dropout) (None, None, 1024) 0
_________________________________________________________________
lstm_40 (LSTM) (None, None, 512) 3147776
_________________________________________________________________
dropout_39 (Dropout) (None, None, 512) 0
_________________________________________________________________
flatten_14 (Flatten) (None, None) 0
=================================================================
Total params: 5,318,721
Trainable params: 5,318,721
Non-trainable params: 0
_________________________________________________________________
I think that the Flatten layer is causing the problem but I have no idea why it's returning a (None, None) shape.
You have to provide all dimensions for the Flatten layer except the batch dimension.
Using an RNN and return_sequences
For RNNs like LSTM, there is an option to either return the whole sequence of outputs or just the final result. In this case you want just the final result. Changing just this one line:
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=False)) # <== change to False
model.add(Dropout(0.3))
model.add(Flatten())
model.summary()
returns the following network shapes in the summary:
_________________________________________________________________
lstm_4 (LSTM) (None, 512) 3147776
_________________________________________________________________
dropout_3 (Dropout) (None, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 512) 0
=================================================================
Notice the reduction in rank of the output tensor from rank 3 to rank 2. This is because the layer now returns only its final output, not the whole sequence with all the hidden states.
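Putting that change back into the full model from the question, a minimal sketch (the input shape and note_variants_count are placeholders for the question's network_input.shape and note count, and SeqSelfAttention is assumed to come from the keras-self-attention package):
from keras.models import Sequential
from keras.layers import LSTM, Bidirectional, Dropout, Flatten, Dense
from keras_self_attention import SeqSelfAttention  # pip install keras-self-attention

time_steps, features = 100, 1   # placeholders for network_input.shape[1], network_input.shape[2]
note_variants_count = 128       # placeholder

model = Sequential()
model.add(Bidirectional(LSTM(512, return_sequences=True), input_shape=(time_steps, features)))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=False))  # final output only: (None, 512)
model.add(Dropout(0.3))
model.add(Flatten())                          # last dimension is defined, so Dense can build
model.add(Dense(note_variants_count))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()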
First, I have read this and this questions with similar names to mine and still do not have an answer.
I want to build a feedforward network for sequence prediction. (I realize that RNNs are more suitable for this task, but I have my reasons). The sequences are of length 128 and each element is a vector with 2 entries, so each batch should be of shape (batch_size, 128, 2) and the target is the next step in the sequence, so the target tensor should be of shape (batch_size, 1, 2).
The network architecture is something like this:
model = Sequential()
model.add(Dense(50, batch_input_shape=(None, 128, 2), kernel_initializer="he_normal" ,activation="relu"))
model.add(Dense(20, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(5, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(2))
But trying to train I get the following error:
ValueError: Error when checking target: expected dense_4 to have shape (128, 2) but got array with shape (1, 2)
I've tried variations like:
model.add(Dense(50, input_shape=(128, 2), kernel_initializer="he_normal" ,activation="relu"))
but get the same error.
If you take a look at the model.summary() output, you will see what the issue is:
Layer (type) Output Shape Param #
=================================================================
dense_13 (Dense) (None, 128, 50) 150
_________________________________________________________________
dense_14 (Dense) (None, 128, 20) 1020
_________________________________________________________________
dense_15 (Dense) (None, 128, 5) 105
_________________________________________________________________
dense_16 (Dense) (None, 128, 2) 12
=================================================================
Total params: 1,287
Trainable params: 1,287
Non-trainable params: 0
_________________________________________________________________
As you can see, the output of the model is (None, 128, 2) and not (None, 1, 2) (or (None, 2)) as you expected. You may or may not know that a Dense layer is applied to the last axis of its input array; as a result, as you see above, the time axis and its dimension are preserved until the end.
How to resolve this? You mentioned you don't want to use an RNN layer, so you have two options: either use a Flatten layer somewhere in the model, or use some Conv1D + Pooling1D layers, or even a GlobalPooling layer. For example (these are just for demonstration; you may do it differently):
using Flatten layer
from keras import models
from keras.layers import Dense, Flatten

model = models.Sequential()
model.add(Dense(50, batch_input_shape=(None, 128, 2), kernel_initializer="he_normal", activation="relu"))
model.add(Dense(20, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(5, kernel_initializer="he_normal", activation="relu"))
model.add(Flatten())
model.add(Dense(2))
model.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_17 (Dense) (None, 128, 50) 150
_________________________________________________________________
dense_18 (Dense) (None, 128, 20) 1020
_________________________________________________________________
dense_19 (Dense) (None, 128, 5) 105
_________________________________________________________________
flatten_1 (Flatten) (None, 640) 0
_________________________________________________________________
dense_20 (Dense) (None, 2) 1282
=================================================================
Total params: 2,557
Trainable params: 2,557
Non-trainable params: 0
_________________________________________________________________
using GlobalAveragePooling1D layer
from keras.layers import GlobalAveragePooling1D

model = models.Sequential()
model.add(Dense(50, batch_input_shape=(None, 128, 2), kernel_initializer="he_normal", activation="relu"))
model.add(Dense(20, kernel_initializer="he_normal", activation="relu"))
model.add(GlobalAveragePooling1D())
model.add(Dense(5, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(2))
model.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_21 (Dense) (None, 128, 50) 150
_________________________________________________________________
dense_22 (Dense) (None, 128, 20) 1020
_________________________________________________________________
global_average_pooling1d_2 ( (None, 20) 0
_________________________________________________________________
dense_23 (Dense) (None, 5) 105
_________________________________________________________________
dense_24 (Dense) (None, 2) 12
=================================================================
Total params: 1,287
Trainable params: 1,287
Non-trainable params: 0
_________________________________________________________________
Note that in both cases above you need to reshape the labels (i.e. targets) array to (n_samples, 2) (or you may want to use a Reshape layer at the end).
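For completeness, here is a rough sketch of the Conv1D + pooling option mentioned above (it is not part of the original examples, and the filter counts and kernel sizes are arbitrary); the same note about reshaping the targets to (n_samples, 2) applies:
from keras import models
from keras.layers import Conv1D, MaxPooling1D, GlobalAveragePooling1D, Dense

model = models.Sequential()
model.add(Conv1D(32, kernel_size=5, activation="relu", input_shape=(128, 2)))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(16, kernel_size=5, activation="relu"))
model.add(GlobalAveragePooling1D())  # collapses the remaining time axis to a (None, 16) tensor
model.add(Dense(5, activation="relu"))
model.add(Dense(2))
model.summary()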
Hi everyone, I need to implement a CNN + LSTM model and I'm getting this error:
ValueError: Error when checking target: expected dense_17 to have shape (None, 1) but got array with shape (8, 4800000)
This is my code, with:
X_train.shape = (8, 4800000, 1)
y_train_arousal.shape = (8, 4800000)
X_train consists of 8 audio tracks, each with 4800000 samples and 1 channel.
y_train_arousal: for each audio track, each sample corresponds to an arousal value.
So my model reads the samples sequentially and produces 1 value for each sample; I then need to evaluate how the output value varies with the sample.
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Dropout, LSTM, Flatten, Dense
from keras.optimizers import Adam

model_arousal = Sequential()
model_arousal.add(Conv1D(filters=40, kernel_size=40, strides=2, activation='tanh', input_shape=(4800000, 1)))
model_arousal.add(MaxPooling1D(pool_size=2))
model_arousal.add(Dropout(0.5))
model_arousal.add(Conv1D(filters=40, kernel_size=400, strides=2, activation='tanh'))
model_arousal.add(MaxPooling1D(pool_size=20))
model_arousal.add(Dropout(0.5))
model_arousal.add(LSTM(128, activation='tanh', return_sequences=True))
model_arousal.add(LSTM(128, activation='tanh', return_sequences=True))
model_arousal.add(Flatten())
model_arousal.add(Dense(2))
adam = Adam(0.002)
model_arousal.compile(loss='mse', optimizer=adam, metrics=['accuracy'])
model_arousal.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_47 (Conv1D) (None, 2399981, 40) 1640
_________________________________________________________________
max_pooling1d_42 (MaxPooling (None, 1199990, 40) 0
_________________________________________________________________
dropout_39 (Dropout) (None, 1199990, 40) 0
_________________________________________________________________
conv1d_48 (Conv1D) (None, 599796, 40) 640040
_________________________________________________________________
max_pooling1d_43 (MaxPooling (None, 29989, 40) 0
_________________________________________________________________
dropout_40 (Dropout) (None, 29989, 40) 0
_________________________________________________________________
lstm_19 (LSTM) (None, 29989, 128) 86528
_________________________________________________________________
lstm_20 (LSTM) (None, 29989, 128) 131584
_________________________________________________________________
flatten_19 (Flatten) (None, 3838592) 0
_________________________________________________________________
dense_17 (Dense) (None, 1) 3838593
=================================================================
Total params: 4,698,385
Trainable params: 4,698,385
Non-trainable params: 0
_________________________________________________________________
h = model_arousal.fit(X_train, y_train_arousal, batch_size=50, epochs=5, verbose=1)
UPDATE
I'm trying to explain myself as clearly as I can.
For simplicity, I consider just one audio track. Each sample of the track corresponds to an arousal value.
So my network must predict an arousal value for each sample, then compare it with the corresponding arousal value for the i-th sample.
Here I have a GoogleNet model for Keras. Is there any possible way to block individual layers of the network from changing? I want to block the first two layers of the pretrained model from changes.
By 'block the changing of the individual layers' I am assuming you don't want to train those layers, that is, you don't want to modify the loaded weights (possibly learnt in previous training).
If so, you can pass trainable=False to the layer, and its parameters won't be used in the training update rule.
Example:
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_dim=100),
    Dense(output_dim=10),
    Activation('sigmoid'),
])
model.summary()

model2 = Sequential([
    Dense(32, input_dim=100, trainable=False),
    Dense(output_dim=10),
    Activation('sigmoid'),
])
model2.summary()
You can see in the model summary for the second model that the frozen layer's parameters are counted as non-trainable.
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
dense_1 (Dense) (None, 32) 3232 dense_input_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 10) 330 dense_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation) (None, 10) 0 dense_2[0][0]
====================================================================================================
Total params: 3,562
Trainable params: 3,562
Non-trainable params: 0
____________________________________________________________________________________________________
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
dense_3 (Dense) (None, 32) 3232 dense_input_2[0][0]
____________________________________________________________________________________________________
dense_4 (Dense) (None, 10) 330 dense_3[0][0]
____________________________________________________________________________________________________
activation_2 (Activation) (None, 10) 0 dense_4[0][0]
====================================================================================================
Total params: 3,562
Trainable params: 330
Non-trainable params: 3,232
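Since the question is about an already-loaded pretrained GoogleNet, here is a minimal sketch of the same idea applied after loading; pretrained_model is a placeholder for however the model was loaded, not a name from the original answer. The first two layers are frozen via their trainable attribute, and the model is recompiled so the change takes effect.
# Freeze the first two layers of an already-built model.
for layer in pretrained_model.layers[:2]:
    layer.trainable = False

# Recompile so the change to `trainable` takes effect during training.
pretrained_model.compile(loss='categorical_crossentropy', optimizer='adam',
                         metrics=['accuracy'])
pretrained_model.summary()  # the frozen weights now show up under "Non-trainable params"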