Keras model with MaxPooling1D and channels_first - python

I have a problem with my current attempt to build a sequential model for time series classification in Keras. I want to work with channels_first data, because it is more convenient from a preprocessing perspective (I only work with one channel, though). This works fine for the Convolution1D layers I'm using, as I can specify data_format='channels_first', but it doesn't work for MaxPooling1D, which doesn't seem to have this option.
The model I want to build is structured as follows:
model = Sequential()
model.add(Convolution1D(filters=16, kernel_size=35, activation='relu', input_shape=(1, window_length), data_format='channels_first'))
model.add(Convolution1D(filters=16, kernel_size=10, activation='relu', data_format='channels_first'))
[...] #several other layers here
With window_length = 5000 I get the following summary after all three layers are added:
Layer (type)                  Output Shape        Param #
conv1d_1 (Conv1D)             (None, 32, 4966)    1152
max_pooling1d_1 (MaxPooling1  (None, 4, 4966)     0
conv1d_2 (Conv1D)             (None, 16, 4957)    656
Total params: 1,808
Trainable params: 1,808
Non-trainable params: 0
Now, I wonder whether this is correct: I would expect the pooling layer to reduce the third dimension (i.e. the number of neurons in a feature map), not the second (i.e. the number of filters). As I see it, MaxPooling1D does not recognize the channels_first ordering, and while the Keras documentation lists a data_format keyword for MaxPooling2D, there is no such keyword for MaxPooling1D.
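The shape in the summary can be reproduced with a little arithmetic. The sketch below is plain Python, not Keras code: pool1d_output_shape is a hypothetical helper, the pool size of 8 is an assumption inferred from the summary (32 shrinks to 4), and it assumes 'valid' padding with a stride equal to the pool size.

```python
def pool1d_output_shape(shape, pool_size, data_format="channels_last"):
    """Shape (without the batch axis) after 1D max pooling.

    Hypothetical helper for illustration only. Assumes 'valid' padding
    and stride == pool_size.
    """
    if data_format == "channels_last":
        steps, channels = shape      # axis 0 is treated as the time axis
        return (steps // pool_size, channels)
    if data_format == "channels_first":
        channels, steps = shape      # axis 1 is treated as the time axis
        return (channels, steps // pool_size)
    raise ValueError(f"unknown data_format: {data_format!r}")


conv_out = (32, 4966)  # (filters, steps) coming out of the channels_first conv

# What the summary shows: the filter axis is pooled as if it were time.
as_channels_last = pool1d_output_shape(conv_out, pool_size=8)
# What was intended: the time axis is pooled.
as_channels_first = pool1d_output_shape(conv_out, 8, data_format="channels_first")

print(as_channels_last)   # (4, 4966)
print(as_channels_first)  # (32, 620)
```

Under these assumptions the (None, 4, 4966) row is exactly what a channels_last pooling layer would produce when handed channels_first data.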
I tested the whole setup with a channels_last data format, and it worked as I expected. But since the conversion from channels_first to channels_last takes quite some time for me, I'd really rather have this work with channels_first. And I have the feeling that I'm simply missing something.
If you need any more information, let me know.

Update: as mentioned by @HSK in the comments, the data_format argument is now supported in MaxPooling layers as a result of this PR.
Well, one alternative is to use the Permute layer (and remove channels_first from the second conv layer):
model = Sequential()
model.add(Convolution1D(filters=16, kernel_size=35, activation='relu', input_shape=(1, 100), data_format='channels_first'))
model.add(Permute((2, 1)))
model.add(MaxPooling1D(pool_size=5))
model.add(Convolution1D(filters=16, kernel_size=10, activation='relu'))
Model summary:
Layer (type)                  Output Shape        Param #
conv1d_7 (Conv1D)             (None, 16, 66)      576
permute_1 (Permute)           (None, 66, 16)      0
max_pooling1d_2 (MaxPooling1  (None, 13, 16)      0
conv1d_8 (Conv1D)             (None, 4, 16)       2096
Total params: 2,672
Trainable params: 2,672
Non-trainable params: 0
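To see why permuting before the pooling layer gives the intended result, here is a small plain-Python sketch on nested lists. It is not Keras code: max_pool_steps and transpose are hypothetical helpers, the sample values are made up, and the pooling assumes 'valid' padding with stride equal to the pool size.

```python
def transpose(x):
    """Swap the two axes of a 2D nested list (what Permute((2, 1)) does per sample)."""
    return [list(row) for row in zip(*x)]


def max_pool_steps(x, pool_size):
    """Max-pool a (steps, channels) nested list along the steps axis."""
    n_out = len(x) // pool_size
    n_channels = len(x[0])
    return [
        [max(x[i * pool_size + k][c] for k in range(pool_size))
         for c in range(n_channels)]
        for i in range(n_out)
    ]


# One channels_first sample: 2 channels, 6 time steps (made-up values).
sample_cf = [[1, 5, 2, 8, 3, 4],
             [9, 0, 7, 6, 2, 1]]

pooled_cl = max_pool_steps(transpose(sample_cf), pool_size=2)  # shape (3, 2)
pooled_cf = transpose(pooled_cl)                               # shape (2, 3)

print(pooled_cf)  # [[5, 8, 4], [9, 7, 2]]
```

Each channel keeps its own row and only the time axis shrinks, which is the behaviour the question was after.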


Access to last convolutional layer in transfer learning

I'm trying to get some heatmaps from a computer vision model that already works for classifying images, but I'm running into some difficulties.
This is the model summary:
Model: "model_4"
Layer (type)               Output Shape             Param #
input_9 (InputLayer)       [(None, 512, 512, 1)]    0
conv2d_4 (Conv2D)          (None, 512, 512, 3)      30
densenet121 (Functional)   (None, 1024)             7037504
dense_4 (Dense)            (None, 100)              102500
dropout_4 (Dropout)        (None, 100)              0
predictions (Dense)        (None, 2)                202
Total params: 7,140,236
Trainable params: 7,056,588
Non-trainable params: 83,648
As part of the standard process for creating a heatmap, I know I have to access the last convolutional layer in the model, which in this case is a layer inside densenet121, but I cannot find a way to access the layers belonging to densenet121.
Right now I've been using the conv2d_4 layer to run some tests, but I feel this is not the right way, because that layer comes before all the transfer-learning work from DenseNet.
Also, I looked for Functional layers in the official Keras documentation but couldn't find them, so I guess it's not a layer but rather the whole DenseNet model embedded there. Still, I cannot find a way to access it.
By the way, here is the model construction, because it may help to answer this:
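from tensorflow.keras.applications.densenet import DenseNet121
from tensorflow.keras.layers import Input, Conv2D, Dense, Dropout
from tensorflow.keras.models import Model

IMG_SIZE = 512  # matches the (None, 512, 512, 1) input shown in the summary
num_classes = 2
input_tensor = Input(shape=(IMG_SIZE, IMG_SIZE, 1))
# Map the single-channel input to the 3 channels DenseNet expects
x = Conv2D(3, (3, 3), padding='same')(input_tensor)
x = DenseNet121(include_top=False, classes=2, pooling="avg", weights="imagenet")(x)
x = Dense(100)(x)
x = Dropout(0.45)(x)
predictions = Dense(num_classes, activation='softmax', name="predictions")(x)
model = Model(inputs=input_tensor, outputs=predictions)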
I found you can use get_layer twice to access layers inside the functional DenseNet model embedded in the "main" model.
In this case, model.get_layer('densenet121').summary() lists all the layers inside the embedded model, and model.get_layer('densenet121').get_layer('xxxxx') returns one of them by name.

How to interpret output shape printed in the summary of tensorflow model?

I have a simple multi-layer perceptron for the MNIST classification problem.
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
When printing the summary I receive the following output:
Layer (type)          Output Shape    Param #
flatten_8 (Flatten)   (None, 784)     0
dense_16 (Dense)      (None, 128)     100480
dense_17 (Dense)      (None, 10)      1290
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
How do I interpret the output shape printed in the summary? Why is there a None term in the output shape tuple? Why is it not just (784) in the first layer?
The None value is the batch dimension: the number of samples passed through the model at once. It is left as None so that the model accepts any batch size. If it were a fixed number, say 50, the model would only accept batches of exactly 50 samples, which is usually not very useful (but does occasionally have applications). The remaining entries describe a single sample, so (None, 784) means "any number of samples, each a vector of 784 values".
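The Param # column can be checked by hand as well. A minimal sketch in plain Python, assuming the standard 28x28 MNIST images; dense_params is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * units + units


flatten_out = 28 * 28                    # (None, 784): one flattened image per sample
hidden = dense_params(flatten_out, 128)  # dense_16
output = dense_params(128, 10)           # dense_17

print(flatten_out, hidden, output, hidden + output)  # 784 100480 1290 101770
```

None of these counts depends on the batch size, which is why the batch axis can stay None.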

Keras Transfer-Learning setting layers.trainable to True has no effect

I want to fine-tune EfficientNet using tf.keras (TensorFlow 2.3), but I cannot change the training status of layers properly. My model looks like this:
data_augmentation_layers = tf.keras.Sequential([
    [...] # augmentation layers not shown
])
efficientnet = EfficientNetB3(weights="imagenet", include_top=False,
                              input_shape=(*img_size, 3))
# Setting to not trainable as described in the standard Keras FAQ
efficientnet.trainable = False
inputs = keras.layers.Input(shape=(*img_size, 3))
augmented = data_augmentation_layers(inputs)
base = efficientnet(augmented, training=False)
pooling = keras.layers.GlobalAveragePooling2D()(base)
outputs = keras.layers.Dense(5, activation="softmax")(pooling)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer=keras_opt, metrics=["categorical_accuracy"])
This is done so that the random weights of my custom top won't immediately destroy the pretrained weights.
Model: "functional_1"
Layer (type)                  Output Shape             Param #
input_2 (InputLayer)          [(None, 512, 512, 3)]    0
sequential (Sequential)       (None, 512, 512, 3)      0
efficientnetb3 (Functional)   (None, 16, 16, 1536)     10783535
global_average_pooling2d (Gl  (None, 1536)             0
dense (Dense)                 (None, 5)                7685
Total params: 10,791,220
Trainable params: 7,685
Non-trainable params: 10,783,535
Everything seems to work up to this point. I train my model for 2 epochs and then I want to start fine-tuning the EfficientNet base. Thus I call
for l in model.get_layer("efficientnetb3").layers:
    if not isinstance(l, keras.layers.BatchNormalization):
        l.trainable = True
model.compile(loss="categorical_crossentropy", optimizer=keras_opt, metrics=["categorical_accuracy"])
I recompiled and printed the summary again, only to see that the number of non-trainable weights remained the same. Fitting also does not bring better results than keeping the base frozen.
dense (Dense) (None, 5) 7685
Total params: 10,791,220
Trainable params: 7,685
Non-trainable params: 10,783,535
PS: I also tried efficientnet.trainable = True, but this also had no effect.
Could it have something to do with the fact that I'm using a sequential and a functional model at the same time?
For me the problem was using the Sequential API for part of the model. When I changed to sequential, my model.summary() displayed all the sublayers, and it was possible to set some of them as trainable and others not.
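One more thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: as far as I know, a Keras layer whose own trainable flag is False reports no trainable weights, whatever flags its sublayers carry. In the question the efficientnet container itself is never switched back on. The toy class below is plain Python, not the Keras API, and only mimics that gating.

```python
class ToyLayer:
    """Toy stand-in for a layer or nested model. NOT the Keras class."""

    def __init__(self, n_weights, sublayers=()):
        self.n_weights = n_weights
        self.sublayers = list(sublayers)
        self.trainable = True

    def freeze(self):
        """Mimic `layer.trainable = False`, which also flips every sublayer."""
        self.trainable = False
        for sub in self.sublayers:
            sub.freeze()

    def trainable_count(self):
        # A frozen container reports nothing, whatever its sublayers say.
        if not self.trainable:
            return 0
        return self.n_weights + sum(s.trainable_count() for s in self.sublayers)


base = ToyLayer(0, [ToyLayer(100), ToyLayer(200)])  # stands in for the pretrained base
head = ToyLayer(10)                                 # stands in for the Dense top
model = ToyLayer(0, [base, head])

base.freeze()
for sub in base.sublayers:
    sub.trainable = True                 # what the loop in the question does
frozen_count = model.trainable_count()

base.trainable = True                    # switch the container itself back on
unfrozen_count = model.trainable_count()

print(frozen_count, unfrozen_count)      # 10 310
```

If the same holds in your setup, setting the base model's trainable flag to True first and then re-freezing the BatchNormalization layers, followed by a recompile, should change the counts in the summary.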

Bidirectional LSTM output shape

I have a Bidirectional LSTM model, and I don't understand why the second model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2))) produces a 2-dimensional output (None, 20), while the first Bidirectional LSTM produces (None, 409, 20).
Can anyone help me, please?
Also, how can I add a self-attention layer to the model?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.layers import SpatialDropout1D
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
embedding_vector_length = 100
model2 = Sequential()
model2.add(Embedding(len(tokenizer.word_index) + 1, embedding_vector_length,
                     input_length=409))
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2)))
#model2.add(Dense(256, activation='relu'))
model2.add(Dense(3, activation='softmax'))
And the output:
Layer (type)                  Output Shape        Param #
embedding_23 (Embedding)      (None, 409, 100)    1766600
bidirectional_12 (Bidirectio  (None, 409, 20)     8880
dropout_8 (Dropout)           (None, 409, 20)     0
bidirectional_13 (Bidirectio  (None, 20)          2480
dense_15 (Dense)              (None, 3)           63
Total params: 1,778,023
Trainable params: 1,778,023
Non-trainable params: 0
For the second Bidirectional LSTM, return_sequences defaults to False, so the layer returns only the output of the last time step (many-to-one). That output has shape (None, 20): 10 units from each direction, concatenated. If you want the output of every time step, simply use model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2))).
For an attention mechanism on top of the LSTM, you may refer to this and this link.
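The two Bidirectional rows of the summary can be reproduced with the short plain-Python sketch below. The helper names are hypothetical, and it assumes the default Bidirectional merge mode, which concatenates the forward and backward outputs.

```python
def lstm_params(input_dim, units):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def bilstm_output_shape(timesteps, units, return_sequences):
    """Output shape of Bidirectional(LSTM(units)) with concatenated directions."""
    if return_sequences:
        return (None, timesteps, 2 * units)   # one vector per time step
    return (None, 2 * units)                  # last time step only


first_shape = bilstm_output_shape(409, 10, return_sequences=True)
second_shape = bilstm_output_shape(409, 10, return_sequences=False)
first_params = 2 * lstm_params(100, 10)   # input: 100-dim embeddings
second_params = 2 * lstm_params(20, 10)   # input: 20-dim output of the first layer

print(first_shape, first_params)    # (None, 409, 20) 8880
print(second_shape, second_params)  # (None, 20) 2480
```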

Why is my layer output not the same dimensions as shown in my model summary?

I have managed to create a successful RNN that can predict the next letter in a sequence of letters. However, I can't work out why a solution to a problem I was having is working.
My training data has dimensions (39000, 7, 7).
My model is as follows:
model = Sequential()
model.add(SimpleRNN(7, input_shape = [7,7], return_sequences = True))
adam = optimizers.Adam(lr = 0.001)
model.compile(loss='categorical_crossentropy',optimizer=adam, metrics=['accuracy'])
return model
Layer (type)                Output Shape    Param #
simple_rnn_49 (SimpleRNN)   (None, 7, 7)    105
flatten_14 (Flatten)        (None, 49)      0
dense_49 (Dense)            (None, 7)       350
activation_40 (Activation)  (None, 7)       0
Total params: 455
Trainable params: 455
Non-trainable params: 0
This works perfectly. My question is, why do I need the flatten layer? When I don't include it I get this model summary:
Layer (type)                Output Shape    Param #
simple_rnn_50 (SimpleRNN)   (None, 7, 7)    105
dense_50 (Dense)            (None, 7, 7)    56
activation_41 (Activation)  (None, 7, 7)    0
Total params: 161
Trainable params: 161
Non-trainable params: 0
followed by this error
ValueError: Error when checking target: expected activation_41 to have 3 dimensions, but got array with shape (39000, 7)
My question is: when the model summary says the output of the dense layer should be (None, 7, 7) in the second example, and the error message says the activation layer is expecting exactly such a 3D input, why is the dense layer actually outputting a tensor of shape (39000, 7) according to the error message? I realise the Flatten() layer solves this by putting everything in 2D, but I'm confused as to why it doesn't work without it.
The error statement says that the error occurs when checking the target dimensions. The shape (39000, 7) in the message is the shape of your labels, not of the dense layer's output. Your model output without the Flatten layer has shape (None, 7, 7), which is correctly shown in your model summary. The issue is that your labels have shape (None, 7), so Keras raises a ValueError when it checks the targets against the model output: your labels have one dimension fewer than the output of your network. Keras was expecting labels of shape (None, 7, 7) to match the dimensions of your activation layer, but received (None, 7) instead.
That is why using model.add(Flatten()) before adding the dense layer works fine: the target dimensions and the output dimensions are then both (None, 7).
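The shape bookkeeping behind this can be sketched in plain Python. The helpers are hypothetical stand-ins for what Dense and Flatten do to shapes; nothing here calls Keras.

```python
def dense_output_shape(input_shape, units):
    """Dense acts on the last axis only; the leading axes pass through."""
    return input_shape[:-1] + (units,)


def flatten_output_shape(input_shape):
    """Collapse everything after the batch axis into a single axis."""
    size = 1
    for dim in input_shape[1:]:
        size *= dim
    return (input_shape[0], size)


rnn_out = (None, 7, 7)     # SimpleRNN(7, return_sequences=True)
target_shape = (None, 7)   # one 7-way label vector per sample

without_flatten = dense_output_shape(rnn_out, 7)
with_flatten = dense_output_shape(flatten_output_shape(rnn_out), 7)

print(without_flatten, without_flatten == target_shape)  # (None, 7, 7) False
print(with_flatten, with_flatten == target_shape)        # (None, 7) True
```

Only the flattened variant produces an output whose shape matches the labels, which is the mismatch the ValueError reports.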
