I have a Bidirectional LSTM model. I don't understand why the second layer, model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2))), produces a 2-dimensional output (None, 20), while the first Bidirectional LSTM produces (None, 409, 20).
Can anyone help me, please?
Also, how can I add a self-attention layer to the model?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.layers import SpatialDropout1D
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
embedding_vector_length = 100
model2 = Sequential()
model2.add(Embedding(len(tokenizer.word_index) + 1, embedding_vector_length,
                     input_length=409))
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2)))
#model2.add(Dense(256, activation='relu'))
model2.add(Dense(3, activation='softmax'))
and the output:
Layer (type) Output Shape Param #
embedding_23 (Embedding) (None, 409, 100) 1766600
bidirectional_12 (Bidirectio (None, 409, 20) 8880
dropout_8 (Dropout) (None, 409, 20) 0
bidirectional_13 (Bidirectio (None, 20) 2480
dense_15 (Dense) (None, 3) 63
Total params: 1,778,023
Trainable params: 1,778,023
Non-trainable params: 0
For the second Bidirectional LSTM, return_sequences defaults to False, so the layer returns only its final output (many-to-one) and the time dimension is dropped. If you want the output at every time step, use model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2))).
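The two output shapes and the parameter counts in the summary above can be checked by hand. The sketch below is plain Python rather than Keras. The helper names bilstm_output_shape and bilstm_param_count are made up for illustration, and it assumes the default concatenating merge of Bidirectional and the standard LSTM layout (four gates, each with input weights, recurrent weights and a bias).

```python
def bilstm_output_shape(timesteps, units, return_sequences):
    """Output shape (batch dimension shown as None) of Bidirectional(LSTM(units))."""
    features = 2 * units  # forward and backward outputs are concatenated
    if return_sequences:
        return (None, timesteps, features)  # one vector per time step
    return (None, features)  # only the final output of each direction


def bilstm_param_count(input_dim, units):
    """Trainable parameters of Bidirectional(LSTM(units)) fed input_dim features."""
    one_direction = 4 * (units * (input_dim + units) + units)
    return 2 * one_direction


first = bilstm_output_shape(409, 10, return_sequences=True)
second = bilstm_output_shape(409, 10, return_sequences=False)
print(first, bilstm_param_count(100, 10))   # first layer reads the 100-dim embedding
print(second, bilstm_param_count(20, 10))   # second layer reads the 20-dim output of the first
```

Both shapes and both parameter counts agree with the bidirectional_12 and bidirectional_13 rows of the summary.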
For an attention mechanism on top of the LSTM, you may refer to this and this link.
I'm trying to implement this model to generate MIDI music, but I'm getting an error:
The last dimension of the inputs to `Dense` should be defined. Found `None`.
Here's my code:
model = Sequential()
model.add(Bidirectional(LSTM(512, return_sequences=True), input_shape=(network_input.shape[1], network_input.shape[2])))
model.add(LSTM(512, return_sequences=True))
model.compile(loss='categorical_crossentropy', optimizer='adam')
And here's the summary before the Dense layer:
Model: "sequential_17"
Layer (type) Output Shape Param #
bidirectional_19 (Bidirectio (None, 100, 1024) 2105344
seq_self_attention_20 (SeqSe (None, None, 1024) 65601
dropout_38 (Dropout) (None, None, 1024) 0
lstm_40 (LSTM) (None, None, 512) 3147776
dropout_39 (Dropout) (None, None, 512) 0
flatten_14 (Flatten) (None, None) 0
Total params: 5,318,721
Trainable params: 5,318,721
Non-trainable params: 0
I think that the Flatten layer is causing the problem but I have no idea why it's returning a (None, None) shape.
The Flatten layer needs every dimension except the batch dimension to be defined.
Using RNNs and 'return_sequences'
For RNNs like LSTM, there is an option to return either the whole sequence or just the final output. In this case you want just the final output. Change this one line:
model.add(LSTM(512, return_sequences=False)) # <== change to False
This gives the following shapes in the summary:
lstm_4 (LSTM) (None, 512) 3147776
dropout_3 (Dropout) (None, 512) 0
flatten_1 (Flatten) (None, 512) 0
Notice that the rank of the output tensor drops from 3 to 2. The layer now returns only its final output rather than the whole sequence of hidden states.
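To see why the original Flatten reported (None, None), here is a small plain-Python sketch of the shape rule. It only mimics the shapes shown in the summaries and is not the Keras implementation; flatten_output_shape is a made-up helper name.

```python
def flatten_output_shape(input_shape):
    """Collapse every non-batch dimension into one, as Flatten does."""
    batch, *rest = input_shape
    if any(dim is None for dim in rest):
        # One factor of the product is unknown, so the flattened size is unknown too.
        return (batch, None)
    size = 1
    for dim in rest:
        size *= dim
    return (batch, size)


unknown_steps = flatten_output_shape((None, None, 512))  # time dimension lost upstream
known_steps = flatten_output_shape((None, 100, 512))     # time dimension known
last_output_only = flatten_output_shape((None, 512))     # return_sequences=False
print(unknown_steps, known_steps, last_output_only)
```

A Dense layer sizes its weight matrix from the last dimension of its input, which is why an unknown flattened size triggers the error quoted in the question.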
Would much appreciate your help.
I am new to RNNs and I am trying to implement an RNN architecture to classify protein sequences. Essentially, they are one-hot encoded NumPy arrays.
My issue is that the data is very imbalanced:
Total: 34909
Positive: 282 (0.81% of total)
Therefore I am planning to weight the classes by passing class_weight=class_weight when the model is fitted.
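As a rough illustration of what such weights could look like for the counts above, the sketch below uses the common "balanced" heuristic, weight = total / (number of classes * class count). This heuristic is an assumption on my part; the post does not say how class_weight is computed.

```python
total = 34909
positive = 282
negative = total - positive

# "Balanced" heuristic: each class contributes the same total weight to the loss.
class_weight = {
    0: total / (2 * negative),  # majority (negative) class, roughly 0.5
    1: total / (2 * positive),  # minority (positive) class, roughly 62
}
print(class_weight)
```

The resulting dictionary can be passed as model.fit(..., class_weight=class_weight).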
I am also planning to use the F1 score on the validation set as the metric, instead of accuracy or loss, as I am not interested in the true negatives.
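For reference, a minimal plain-Python sketch of the F1 score shows that true negatives never enter the formula. The counts in the example call are hypothetical and chosen only for illustration.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; true negatives are not used."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical validation counts, not taken from the post.
score = f1_score(true_positives=30, false_positives=10, false_negatives=20)
print(round(score, 3))
```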
Moreover, I am planning to use transfer learning: I have datasets with more positive data and datasets with only a few points, so I plan to pretrain a general model and use its weights to train further on the specific problem.
I have come up with this architecture, but I am not sure whether adding 4 bidirectional LSTM layers is a wise choice:
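from keras import regularizers
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dense
from keras.initializers import Constant
from keras.optimizers import Adam
from keras.losses import BinaryCrossentropy

if output_bias is not None:
    output_bias = Constant(output_bias)
model = Sequential()
# First LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=0.1), input_shape=(timesteps, features)))
# Second LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
# Third LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
# Fourth LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=False)))
# First Dense layer (its code is not shown in the post; the summary below lists a 128-unit Dense layer)
# Adding the output layer
if output_bias is None:
    model.add(Dense(units=1, activation='sigmoid', kernel_regularizer=regularizers.l2(0.001)))
else:
    # The rest of this call is cut off in the post. Passing the Constant initializer
    # as bias_initializer (and reusing the same regularizer) is an assumption.
    model.add(Dense(units=1, activation='sigmoid', bias_initializer=output_bias,
                    kernel_regularizer=regularizers.l2(0.001)))
model.compile(optimizer=Adam(learning_rate=1e-3), loss=BinaryCrossentropy(), metrics=metrics)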
How do I know how many LSTM layers I should add? Is it just trial and error?
Is there anything else I should include in the layers?
Model: "sequential_4"
Layer (type) Output Shape Param #
bidirectional_13 (Bidirectio (None, 5, 100) 28400
dropout_16 (Dropout) (None, 5, 100) 0
bidirectional_14 (Bidirectio (None, 5, 100) 60400
dropout_17 (Dropout) (None, 5, 100) 0
bidirectional_15 (Bidirectio (None, 5, 100) 60400
dropout_18 (Dropout) (None, 5, 100) 0
bidirectional_16 (Bidirectio (None, 100) 60400
dropout_19 (Dropout) (None, 100) 0
dense_7 (Dense) (None, 128) 12928
dropout_20 (Dropout) (None, 128) 0
dense_8 (Dense) (None, 1) 129
Total params: 222,657
Trainable params: 222,657
Non-trainable params: 0
I have built this model by going through multiple tutorials such as
I would appreciate it if you could point me in the right direction.
If I have:
self.model.add(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True))
then my seq_length specifies how many slices of data I want to process at once. If it matters, my model is a sequence-to-sequence (same size).
But if I have:
self.model.add(Bidirectional(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True)))
then is that doubling the sequence size? Or at each time step, is it getting seq_length / 2 before and after that timestep?
Using a bidirectional LSTM layer has no effect on the sequence length.
I tested this with the following code:
from keras.models import Sequential
from keras.layers import Bidirectional,LSTM,BatchNormalization,Dropout,Input
model = Sequential()
lstm1_size = 50
seq_length = 128
feature_dim = 20
model.add(Bidirectional(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True)))
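model.add(BatchNormalization())
model.add(Dropout(0.2))  # rate assumed; the original value is not shown
batch_size = 32
model.build(input_shape=(batch_size, seq_length, feature_dim))
model.summary()
This resulted in the following output for the bidirectional model: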
Layer (type) Output Shape Param #
bidirectional_1 (Bidirection (32, 128, 100) 28400
batch_normalization_1 (Batch (32, 128, 100) 400
dropout_1 (Dropout) (32, 128, 100) 0
Total params: 28,800
Trainable params: 28,600
Non-trainable params: 200
No bidirectional layer:
Layer (type) Output Shape Param #
lstm_1 (LSTM) (None, 128, 50) 14200
batch_normalization_1 (Batch (None, 128, 50) 200
dropout_1 (Dropout) (None, 128, 50) 0
Total params: 14,400
Trainable params: 14,300
Non-trainable params: 100
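The same point can be shown without Keras. In the toy sketch below, a running sum stands in for the LSTM (an assumption made purely for illustration): one pass reads the sequence forwards, one reads it backwards, and the two outputs are paired up per time step. The number of time steps stays the same and only the feature size doubles.

```python
def run_direction(sequence, reverse=False):
    """Toy stand-in for an RNN: a running sum with one output per time step."""
    steps = reversed(sequence) if reverse else sequence
    outputs, state = [], 0
    for x in steps:
        state += x
        outputs.append(state)
    # Put the backward outputs back into the original time order.
    return outputs[::-1] if reverse else outputs


def bidirectional(sequence):
    forward = run_direction(sequence)
    backward = run_direction(sequence, reverse=True)
    # Concatenate per time step: the feature axis grows, the time axis does not.
    return list(zip(forward, backward))


sequence = [1, 2, 3, 4]
merged = bidirectional(sequence)
print(merged)  # [(1, 10), (3, 9), (6, 7), (10, 4)]
```

At step t the forward value has seen steps 0..t and the backward value has seen steps t..end, so every step has access to the whole sequence rather than to seq_length / 2 on each side.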
I want to train a model to predict a person's emotion from physical signals. I have a physical signal and I am using it as the input feature.
I want to use a CNN architecture to extract features from the data, and then feed these extracted features to a classical "Decision Tree Classifier". Below, you can see my CNN approach without the decision tree:
model = Sequential()
model.add(Conv1D(15,60,padding='valid', activation='relu',input_shape=(18000,1), strides = 1, kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1)))
model.add(Conv1D(30, 60, padding='valid', activation='relu',kernel_regularizer = regularizers.l1_l2(l1=0.1, l2=0.1), strides=1))
model.add(Dense(3, activation = 'softmax'))
I want to edit this code so that the output layer is a working decision tree instead of model.add(Dense(3, activation = 'softmax')). I have tried to save the output of the last convolutional layer like this:
output = model.layers[-6].output
And when I printed the output variable, the result was this:
THE OUTPUT: Tensor("conv1d_56/Relu:0", shape=(?, 8971, 30),
I guess the output variable holds the extracted features. Now, how can I feed this data to my decision tree classifier? Here is the decision tree from scikit-learn:
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier(criterion = 'entropy')
How should I feed the fit() method? Thanks in advance.
To extract a vector of features that you can pass on to another algorithm, you need a fully connected layer before your softmax layer. Something like this adds a 128-dimensional layer just before your softmax layer:
from keras.models import Sequential
from keras.layers import Conv1D, Flatten, Dense
from keras import regularizers
model = Sequential()
model.add(Conv1D(15,60,padding='valid', activation='relu',input_shape=(18000,1), strides = 1, kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1)))
model.add(Conv1D(30, 60, padding='valid', activation='relu',kernel_regularizer = regularizers.l1_l2(l1=0.1, l2=0.1), strides=1))
# The pooling, dropout and batch-normalization layers listed in the summary below are not shown here
model.add(Flatten())  # collapse (steps, channels) into one vector per sample
model.add(Dense(128, activation='relu'))
model.add(Dense(3, activation = 'softmax'))
If you then run model.summary() you can see the name of the layers:
Layer (type) Output Shape Param #
conv1d_9 (Conv1D) (None, 17941, 15) 915
max_pooling1d_9 (MaxPooling1 (None, 8970, 15) 0
dropout_10 (Dropout) (None, 8970, 15) 0
batch_normalization_9 (Batch (None, 8970, 15) 60
conv1d_10 (Conv1D) (None, 8911, 30) 27030
max_pooling1d_10 (MaxPooling (None, 2227, 30) 0
dropout_11 (Dropout) (None, 2227, 30) 0
batch_normalization_10 (Batc (None, 2227, 30) 120
flatten_6 (Flatten) (None, 66810) 0
dense_7 (Dense) (None, 128) 8551808
dropout_12 (Dropout) (None, 128) 0
dense_8 (Dense) (None, 3) 387
Total params: 8,580,320
Trainable params: 8,580,230
Non-trainable params: 90
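The shapes in this summary can be reproduced with a little arithmetic, which also shows where the 66810 inputs to dense_7 come from. This is a plain-Python sketch; the pool sizes of 2 and 4 are inferred from the summary (17941 to 8970, and 8911 to 2227) and are not stated in the post.

```python
def conv1d_valid_length(length, kernel_size, stride=1):
    """Output length of a Conv1D layer with padding='valid'."""
    return (length - kernel_size) // stride + 1


def pool_length(length, pool_size):
    """Output length of max pooling with stride equal to pool_size."""
    return length // pool_size


length = conv1d_valid_length(18000, 60)   # conv1d_9
length = pool_length(length, 2)           # max_pooling1d_9 (pool size inferred)
length = conv1d_valid_length(length, 60)  # conv1d_10
length = pool_length(length, 4)           # max_pooling1d_10 (pool size inferred)

flat = length * 30                 # Flatten: steps * channels
dense_params = flat * 128 + 128    # weights + biases of the 128-unit Dense layer
print(length, flat, dense_params)
```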
Once your network has been trained, you can create a new model whose output layer is 'dense_7'; it generates 128-dimensional feature vectors:
from keras.models import Model
feature_vectors_model = Model(model.input, model.get_layer('dense_7').output)
dtc_features = feature_vectors_model.predict(your_X_data)
dtc.fit(dtc_features, your_y_labels)  # your_y_labels: placeholder for the class labels, one per row of your_X_data
This is a simple example that reproduces my issue in a network I am trying to deploy.
I have an image input layer (which I need to maintain), then a Dense layer, Conv2D layer and a dense layer.
The idea is that the inputs are 10x10 images and the labels are 10x10 images. Inspired by my code and this example.
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Conv2D
size = 10       # images are 10x10, as stated above
n_images = 100  # assumed value; the original post does not show it
#Building model
a = Input(shape=(size,size,1))
hidden = Dense(size)(a)
hidden = Conv2D(kernel_size = (3,3), filters = size*size, activation='relu', padding='same')(hidden)
outputs = Dense(size, activation='sigmoid')(hidden)
model = Model(inputs=a, outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
#Create random data and accounting for 1 channel of data
data = np.random.randint(0,2,(n_images,size,size,1))
labels = np.random.randint(0,2,(n_images,size,size,1))
#Fit model
model.fit(data, labels, verbose=1, batch_size=10, epochs=20)
I get the following error: ValueError: Error when checking target: expected dense_92 to have shape (10, 10, 10) but got array with shape (10, 10, 1)
I don't get an error if I change:
outputs = Dense(size, activation='sigmoid')(hidden)
to:
outputs = Dense(1, activation='sigmoid')(hidden)
I have no idea how Dense(1) is even valid, or how it allows a 10x10 output signal, as model.summary() indicates:
Layer (type) Output Shape Param #
input_26 (InputLayer) (None, 10, 10, 1) 0
dense_93 (Dense) (None, 10, 10, 10) 20
conv2d_9 (Conv2D) (None, 10, 10, 100) 9100
dense_94 (Dense) (None, 10, 10, 1) 101
Total params: 9,221
Trainable params: 9,221
Non-trainable params: 0
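On why Dense(1) is valid here: a Dense layer acts only on the last axis of its input and leaves the other axes untouched, so the 10x10 grid passes straight through. The sketch below reproduces the shapes and parameter counts from the summary in plain Python; the helper names are made up for illustration.

```python
def dense_output_shape(input_shape, units):
    """Dense replaces only the last axis with `units`."""
    return input_shape[:-1] + (units,)


def dense_param_count(input_shape, units):
    """One weight per (last-axis feature, unit) pair plus one bias per unit."""
    return input_shape[-1] * units + units


conv_out = (None, 10, 10, 100)
shape_with_size = dense_output_shape(conv_out, 10)  # Dense(size): clashes with (10, 10, 1) labels
shape_with_one = dense_output_shape(conv_out, 1)    # Dense(1): matches (10, 10, 1) labels
print(shape_with_size, shape_with_one)
print(dense_param_count((None, 10, 10, 1), 10), dense_param_count(conv_out, 1))
```

With Dense(size) the output is (None, 10, 10, 10), which is exactly the shape named in the error message.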
Well, according to your comments:
what I am trying to do isn't standard. I have a set of images and for
each image I want to find a binary image of the same size, such that if the
value of a pixel is 1 it means the feature exists in the input image.
The insight whether a pixel has a feature should be taken both from
local information (extracted by convolution layers) and global
information extracted by Dense layers.
I guess you are looking to create a two-branch model, where one branch consists of convolution layers and the other is simply one or more dense layers stacked on top of each other. (I should mention that, in my opinion, a single convolutional network may achieve what you are looking for, because the combination of pooling and convolution layers, perhaps followed by some up-sampling layers at the end, preserves both local and global information to some extent.) To define such a model, you can use the Keras functional API like this:
from keras import models
from keras import layers
input_image = layers.Input(shape=(10, 10, 1))
# branch one: dense layers
b1 = layers.Flatten()(input_image)
b1 = layers.Dense(64, activation='relu')(b1)
b1_out = layers.Dense(32, activation='relu')(b1)
# branch two: conv + pooling layers
b2 = layers.Conv2D(32, (3,3), activation='relu')(input_image)
b2 = layers.MaxPooling2D((2,2))(b2)
b2 = layers.Conv2D(64, (3,3), activation='relu')(b2)
b2_out = layers.MaxPooling2D((2,2))(b2)
# merge two branches
flattened_b2 = layers.Flatten()(b2_out)
merged = layers.concatenate([b1_out, flattened_b2])
# add a final dense layer
output = layers.Dense(10*10, activation='sigmoid')(merged)
output = layers.Reshape((10,10))(output)
# create the model
model = models.Model(input_image, output)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
Model summary:
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) (None, 10, 10, 1) 0
conv2d_1 (Conv2D) (None, 8, 8, 32) 320 input_1[0][0]
max_pooling2d_1 (MaxPooling2D) (None, 4, 4, 32) 0 conv2d_1[0][0]
flatten_1 (Flatten) (None, 100) 0 input_1[0][0]
conv2d_2 (Conv2D) (None, 2, 2, 64) 18496 max_pooling2d_1[0][0]
dense_1 (Dense) (None, 64) 6464 flatten_1[0][0]
max_pooling2d_2 (MaxPooling2D) (None, 1, 1, 64) 0 conv2d_2[0][0]
dense_2 (Dense) (None, 32) 2080 dense_1[0][0]
flatten_2 (Flatten) (None, 64) 0 max_pooling2d_2[0][0]
concatenate_1 (Concatenate) (None, 96) 0 dense_2[0][0]
dense_3 (Dense) (None, 100) 9700 concatenate_1[0][0]
reshape_1 (Reshape) (None, 10, 10) 0 dense_3[0][0]
Total params: 37,060
Trainable params: 37,060
Non-trainable params: 0
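As a sanity check on the summary, the sizes along the convolution branch and the width of the merged vector can be worked out by hand. This is a plain-Python sketch that assumes the defaults used in the code above (padding 'valid' for Conv2D, pooling stride equal to the pool size); the helper names are made up.

```python
def conv2d_valid(side, kernel):
    """Spatial size after a Conv2D layer with padding='valid' and stride 1."""
    return side - kernel + 1


def pool2d(side, pool):
    """Spatial size after max pooling with stride equal to the pool size."""
    return side // pool


side = 10
side = conv2d_valid(side, 3)  # conv2d_1
side = pool2d(side, 2)        # max_pooling2d_1
side = conv2d_valid(side, 3)  # conv2d_2
side = pool2d(side, 2)        # max_pooling2d_2

conv_features = side * side * 64        # flatten_2
merged_width = 32 + conv_features       # concatenate_1: dense branch + conv branch
final_dense_params = merged_width * 100 + 100
print(side, conv_features, merged_width, final_dense_params)
```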
Note that this is only one way of achieving what you are looking for, and it may or may not work for your specific problem and data. You may modify this model (e.g. remove the pooling layers or add more dense layers) or use a completely different architecture with other kinds of layers (e.g. up-sampling, Conv2DTranspose) to reach a better accuracy. In the end, you must experiment to find the best solution.
For completeness, here is how to generate the data and fit the network:
data = np.random.randint(0,2,(n_images,size,size,1))
labels = np.random.randint(0,2,(n_images,size,size,1))
model.fit(data, labels, verbose=1, batch_size=32, epochs=20)