I want to train a model that maps features of shape (10151, 1285) to labels of shape (10151, 257), and I want to use Way2 (the functional API), since I want to use "feature_input" in the cost function. But it fails with this error:
ValueError: Input 0 of layer batch_normalization_6 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 257).
I am wondering why?
Way1:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation

model = Sequential()
model.add(Dense(257, input_dim=1285))
model.add(BatchNormalization())
model.add(Activation('sigmoid'))
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
model.fit(feature, label)
model.save("./model.hdf5")
Way2:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Activation

feature_input = Input(shape=(None, 1285))
dense = Dense(257)(feature_input)
norm = BatchNormalization()(dense)
out = Activation('sigmoid')(norm)
model = Model(feature_input, out)
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
model.fit(feature, label)
model.save("./model.hdf5")
If you define the input shape as (None, 1285), the model treats each sample as 3-dimensional data. The None you entered was presumably meant to describe the batch size, but Keras adds the batch dimension automatically, so the model ends up expecting a 3-dimensional input. Use an input shape of (1285,) instead; a corrected sketch follows the summary below.
<Summary of your model>
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, None, 1285)] 0
_________________________________________________________________
dense (Dense) (None, None, 257) 330502
_________________________________________________________________
batch_normalization (BatchNo (None, None, 257) 1028
_________________________________________________________________
activation (Activation) (None, None, 257) 0
=================================================================
Total params: 331,530
Trainable params: 331,016
Non-trainable params: 514
_________________________________________________________________
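For example, the corrected Way2 could look like this (feature and label being the (10151, 1285) and (10151, 257) arrays from the question):

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Activation

# (1285,) describes a single sample; Keras adds the batch dimension itself
feature_input = Input(shape=(1285,))
dense = Dense(257)(feature_input)
norm = BatchNormalization()(dense)
out = Activation('sigmoid')(norm)
model = Model(feature_input, out)
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
model.fit(feature, label)  # feature: (10151, 1285), label: (10151, 257)
model.save("./model.hdf5")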
I have a Bidirectional LSTM model, and I don't understand why, after the second model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2))), the result has 2 dimensions (None, 20), while the first Bidirectional LSTM gives (None, 409, 20).
Can anyone help me, please?
Also, how can I add a self-attention layer to the model?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.layers import SpatialDropout1D
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from keras_self_attention import SeqSelfAttention  # external package providing SeqSelfAttention

embedding_vector_length = 100

model2 = Sequential()
model2.add(Embedding(len(tokenizer.word_index) + 1, embedding_vector_length,
                     input_length=409))
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Dropout(0.4))
model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2)))
model2.add(SeqSelfAttention())
#model.add(Dropout(dropout))
#model2.add(Dense(256, activation='relu'))
#model.add(Dropout(0.2))
model2.add(Dense(3, activation='softmax'))
model2.compile(loss='binary_crossentropy', optimizer='adam',
               metrics=['accuracy'])
print(model2.summary())
and the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_23 (Embedding) (None, 409, 100) 1766600
_________________________________________________________________
bidirectional_12 (Bidirectio (None, 409, 20) 8880
_________________________________________________________________
dropout_8 (Dropout) (None, 409, 20) 0
_________________________________________________________________
bidirectional_13 (Bidirectio (None, 20) 2480
_________________________________________________________________
dense_15 (Dense) (None, 3) 63
=================================================================
Total params: 1,778,023
Trainable params: 1,778,023
Non-trainable params: 0
_________________________________________________________________
None
For the second Bidirectional LSTM, return_sequences is set to False by default, so the output of this layer is many-to-one. If you want the output of each time step, simply use model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2))).
For the attention mechanism in LSTMs, you may refer to this and this. A sketch combining both changes is shown below.
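For example, one way to combine both could look like the sketch below (it assumes the keras-self-attention package; the GlobalAveragePooling1D layer and the categorical_crossentropy loss are choices made here to match the 3-class softmax, not part of the original code):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional, Embedding, GlobalAveragePooling1D
from keras_self_attention import SeqSelfAttention

embedding_vector_length = 100

model2 = Sequential()
model2.add(Embedding(len(tokenizer.word_index) + 1, embedding_vector_length,
                     input_length=409))
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Dropout(0.4))
# return_sequences=True keeps the (None, 409, 20) shape so attention can weight every time step
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(SeqSelfAttention(attention_activation='sigmoid'))
# collapse the time dimension before the classifier
model2.add(GlobalAveragePooling1D())
model2.add(Dense(3, activation='softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])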
I'm trying to implement this model to generate MIDI music, but I'm getting an error:
The last dimension of the inputs to `Dense` should be defined. Found `None`.
Here's my code
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional, Flatten
from keras_self_attention import SeqSelfAttention  # external package providing SeqSelfAttention

model = Sequential()
model.add(Bidirectional(LSTM(512, return_sequences=True),
                        input_shape=(network_input.shape[1], network_input.shape[2])))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(Flatten())
model.summary()
model.add(Dense(note_variants_count))
model.compile(loss='categorical_crossentropy', optimizer='adam')
And here's the summary before the dense layer
Model: "sequential_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_19 (Bidirectio (None, 100, 1024) 2105344
_________________________________________________________________
seq_self_attention_20 (SeqSe (None, None, 1024) 65601
_________________________________________________________________
dropout_38 (Dropout) (None, None, 1024) 0
_________________________________________________________________
lstm_40 (LSTM) (None, None, 512) 3147776
_________________________________________________________________
dropout_39 (Dropout) (None, None, 512) 0
_________________________________________________________________
flatten_14 (Flatten) (None, None) 0
=================================================================
Total params: 5,318,721
Trainable params: 5,318,721
Non-trainable params: 0
_________________________________________________________________
I think that the Flatten layer is causing the problem, but I have no idea why it's returning a (None, None) shape.
You have to provide all dimensions to the Flatten layer except for the batch dimension; Flatten cannot work with a time dimension that is None.
Using RNN and 'return_sequences'
For RNNs like LSTM, there is an option to either return the whole sequence or just the final output. In this case you want just the final output. Changing just this one line
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=False)) # <== change to False
model.add(Dropout(0.3))
model.add(Flatten())
model.summary()
Returns the following network shape from summary
_________________________________________________________________
lstm_4 (LSTM) (None, 512) 3147776
_________________________________________________________________
dropout_3 (Dropout) (None, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 512) 0
=================================================================
Notice that the rank of the output tensor drops from 3 to 2. That is because this is just the final output, not the whole sequence with all of its hidden states.
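For example, the whole model could look like the sketch below (it assumes the keras-self-attention package, uses placeholder values for network_input.shape[1], network_input.shape[2] and note_variants_count, and adds a softmax to match the categorical_crossentropy loss):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from keras_self_attention import SeqSelfAttention

timesteps, features = 100, 1        # placeholders for network_input.shape[1], network_input.shape[2]
note_variants_count = 128           # placeholder for the number of note classes

model = Sequential()
model.add(Bidirectional(LSTM(512, return_sequences=True),
                        input_shape=(timesteps, features)))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dropout(0.3))
# return_sequences=False keeps only the last hidden state, shape (None, 512),
# so Flatten is no longer needed and Dense sees a fully defined last dimension
model.add(LSTM(512, return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(note_variants_count, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()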
I am playing around with an NLP problem (sentence classification) and decided to use HuggingFace's TFBertModel along with Conv1D, Flatten, and Dense layers. I am using the functional API and my model compiles. However, during model.fit(), I get a shape error at the output Dense layer.
Model definition:
import tensorflow as tf
from transformers import TFBertModel

# Build model with a max length of 50 words in a sentence
max_len = 50

def build_model():
    bert_encoder = TFBertModel.from_pretrained(model_name)
    input_word_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_word_ids")
    input_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_mask")
    input_type_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_type_ids")

    # Create a conv1d model. The model may not really be useful or make sense, but that's OK (for now).
    embedding = bert_encoder([input_word_ids, input_mask, input_type_ids])[0]
    conv_layer = tf.keras.layers.Conv1D(32, 3, activation='relu')(embedding)
    dense_layer = tf.keras.layers.Dense(24, activation='relu')(conv_layer)
    flatten_layer = tf.keras.layers.Flatten()(dense_layer)
    output_layer = tf.keras.layers.Dense(3, activation='softmax')(flatten_layer)

    model = tf.keras.Model(inputs=[input_word_ids, input_mask, input_type_ids], outputs=output_layer)
    model.compile(tf.keras.optimizers.Adam(lr=1e-5), loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
# View model architecture
model = build_model()
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_word_ids (InputLayer) [(None, 50)] 0
__________________________________________________________________________________________________
input_mask (InputLayer) [(None, 50)] 0
__________________________________________________________________________________________________
input_type_ids (InputLayer) [(None, 50)] 0
__________________________________________________________________________________________________
tf_bert_model (TFBertModel) ((None, 50, 768), (N 177853440 input_word_ids[0][0]
input_mask[0][0]
input_type_ids[0][0]
__________________________________________________________________________________________________
conv1d (Conv1D) (None, 48, 32) 73760 tf_bert_model[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 48, 24) 792 conv1d[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 1152) 0 dense[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 3) 3459 flatten[0][0]
==================================================================================================
Total params: 177,931,451
Trainable params: 177,931,451
Non-trainable params: 0
__________________________________________________________________________________________________
# Fit model on input data
model.fit(train_input, train['label'].values, epochs = 3, verbose = 1, batch_size = 16,
validation_split = 0.2)
And this is the error message:
ValueError: Input 0 of layer dense_1 is incompatible with the layer: expected axis -1 of input shape to have value 1152 but received
input with shape [16, 6168]
I am unable to understand how the input shape to layer dense_1 (the output dense layer) can be 6168. As per the model summary, it should always be 1152.
The shape of your input is likely not what you expect. The Dense layer received 6168 = 257 × 24 features, and since the Conv1D (kernel size 3) removes two time steps, that corresponds to sequences of length 259 rather than the max_len = 50 the model was built for. Your sentences were most likely tokenized or padded to a different length than 50. Check the shape of train_input.
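For example, assuming train_input is the list of the three encoded arrays passed to fit, a quick check would be:

# each array should be (num_samples, 50) to match the model's max_len
for name, arr in zip(["input_word_ids", "input_mask", "input_type_ids"], train_input):
    print(name, arr.shape)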
I am using keras with TF backend to build a simple Conv1d net. The data has the following shape:
train feature shape: (33960, 3053, 1)
train label shape: (33960, 686, 1)
I build my model with:
from keras.models import Model
from keras.layers import Input, Conv1D, MaxPool1D, Flatten, Dense

def create_conv_model():
    inp = Input(shape=(3053, 1))
    conv = Conv1D(filters=2, kernel_size=2)(inp)
    pool = MaxPool1D(pool_size=2)(conv)
    flat = Flatten()(pool)
    dense = Dense(686)(flat)
    model = Model(inp, dense)
    model.compile(loss='mse', optimizer='adam')
    return model
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3053, 1) 0
_________________________________________________________________
conv1d_1 (Conv1D) (None, 3052, 2) 6
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1526, 2) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 3052) 0
_________________________________________________________________
dense_1 (Dense) (None, 686) 2094358
=================================================================
Total params: 2,094,364
Trainable params: 2,094,364
Non-trainable params: 0
Upon running
model.fit(x=train_feature,
y=train_label_categorical,
epochs=100,
batch_size=64,
validation_split=0.2,
validation_data=(test_feature,test_label_categorical),
callbacks=[tensorboard,reduce_lr,early_stopping])
I get the following very common error:
ValueError: Error when checking input: expected input_1 to have 3 dimensions, but got array with shape (8491, 3053)
I've checked pretty much all the posts regarding this very common problem, but I've been unable to find a solution. What am I doing wrong? I don't understand what's going on. Where is the shape (8491, 3053) coming from?
Any help would be much appreciated; I have not been able to make this go away.
Change validation_data=(test_feature, test_label_categorical) in the model.fit call to
validation_data=(np.expand_dims(test_feature, -1), test_label_categorical)
The model expects a validation feature array of shape (8491, 3053, 1), but in the code above you are providing it with shape (8491, 3053).
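For example, assuming test_feature is a NumPy array of shape (8491, 3053), the same fix can be applied once before calling fit:

import numpy as np

# add the trailing channel dimension that Input(shape=(3053, 1)) expects
test_feature = np.expand_dims(test_feature, -1)   # (8491, 3053) -> (8491, 3053, 1)

train_feature already carries the trailing dimension ((33960, 3053, 1) in the question), so only the validation array needs it.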
I have managed to create a successful RNN that can predict the next letter in a sequence of letters. However, I can't work out why a solution to a problem I was having is working.
My training data is of dimensions (39000,7,7)
My Model is as follows:
from keras import optimizers
from keras.models import Sequential
from keras.layers import SimpleRNN, Flatten, Dense, Activation

model = Sequential()
model.add(SimpleRNN(7, input_shape=[7, 7], return_sequences=True))
model.add(Flatten())
model.add(Dense(7))
model.add(Activation('softmax'))
adam = optimizers.Adam(lr=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
model.summary()
return model
Layer (type) Output Shape Param #
=================================================================
simple_rnn_49 (SimpleRNN) (None, 7, 7) 105
_________________________________________________________________
flatten_14 (Flatten) (None, 49) 0
_________________________________________________________________
dense_49 (Dense) (None, 7) 350
_________________________________________________________________
activation_40 (Activation) (None, 7) 0
=================================================================
Total params: 455
Trainable params: 455
Non-trainable params: 0
_________________________________________________________________
This works perfectly. My question is, why do I need the flatten layer? When I don't include it I get this model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn_50 (SimpleRNN) (None, 7, 7) 105
_________________________________________________________________
dense_50 (Dense) (None, 7, 7) 56
_________________________________________________________________
activation_41 (Activation) (None, 7, 7) 0
=================================================================
Total params: 161
Trainable params: 161
Non-trainable params: 0
_________________________________________________________________
followed by this error
ValueError: Error when checking target: expected activation_41 to have 3 dimensions, but got array with shape (39000, 7)
My question is: when the model summary says the output of the dense layer should be (None, 7, 7) in the second example, and the error message says the activation layer is expecting exactly such a 3D input, why is the dense layer actually outputting a tensor of shape (39000, 7) according to the error message? I realise the Flatten() layer solves this by putting everything in 2D, but I'm confused as to why it doesn't work without it.
In your error statement you can see that the error is raised when checking the target dimensions. Your model output without the Flatten layer has shape (None, 7, 7), which is correctly shown in your model summary. The issue is that your labels have shape (None, 7), so Keras throws a ValueError because the labels have one dimension fewer than the output of your network: it expected a (None, 7, 7) target to match the activation layer's output, but received (None, 7) instead.
That is why adding model.add(Flatten()) before the dense layer works fine, as the target and output dimensions are then both (None, 7).
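An alternative is to drop the Flatten layer and let the SimpleRNN return only its last output, so the model output is (None, 7) and matches the (39000, 7) targets directly; a minimal sketch:

from keras import optimizers
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense, Activation

model = Sequential()
# return_sequences=False (the default) emits only the last time step: (None, 7)
model.add(SimpleRNN(7, input_shape=[7, 7], return_sequences=False))
model.add(Dense(7))
model.add(Activation('softmax'))

adam = optimizers.Adam(lr=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
model.summary()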