I have a neural network in keras with two conv2d layers, an average pooling layer and a dense output layer.
I want to put the trained model on an FPGA later on and the architecture does not support MaxPooling or AveragePooling layers.
However, I read somewhere that you can basically use a conv2d for pooling by playing with the parameters, but I am unsure how to do it exactly.
I naively thought that a Pooling layer (max or average or whatever) like this:
model.add(tf.keras.layers.AveragePooling2D(pool_size=(1, 3)))
would do roughly the same job as this:
model.add(tf.keras.layers.Conv2D(1, (1, 3),strides=(1,3),use_bias=False,padding='same',name='Conv3'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Activation('relu'))
I thought that choosing 1 filter would be equivalent to telling the network to do one operation (e.g. averaging or taking the max, whichever seems best), and that the kernel dimension and stride should correspond to those of the Pooling layer.
However, the total parameter counts of the two models are vastly different, and I fail to understand why: my model with the AveragePooling layer has 15,644 parameters, while my model with the conv2d variant has only 2,604 parameters.
The model also performs a lot worse with the conv2d variant.
You could create a conv layer, set its weights so that it performs average pooling, and then mark the layer as not trainable.
Example code:
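import numpy as np
from tensorflow.keras.layers import AveragePooling2D, Conv2D
from tensorflow.keras.models import Sequential

# Kernel shape is (kernel_h, kernel_w, in_channels, out_channels); the channel
# counts should be computed from the shape of the previous layer's output.
conv_pool_weights = np.zeros((2, 2, 4, 4))
for i in range(conv_pool_weights.shape[2]):
    # Channel i averages only its own window; cross-channel weights stay 0.
    conv_pool_weights[:, :, i, i] = 1. / (conv_pool_weights.shape[0] * conv_pool_weights.shape[1])
conv_pool = Conv2D(4, kernel_size=(2, 2), strides=(2, 2), input_shape=(16, 16, 4), use_bias=False)
model_conv = Sequential([conv_pool])  # Sequential expects a list of layers
conv_pool.set_weights([conv_pool_weights])
conv_pool.trainable = False
model_pool = Sequential([AveragePooling2D(input_shape=(16, 16, 4))])
model_conv.summary()
model_pool.summary()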
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 8, 8, 4) 64
=================================================================
Total params: 64
Trainable params: 0
Non-trainable params: 64
_________________________________________________________________
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
average_pooling2d (AverageP (None, 8, 8, 4) 0
ooling2D)
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
Test:
random_input = np.random.random((4, 16, 16, 4))
pred_1 = model_pool.predict(random_input)
pred_2 = model_conv.predict(random_input)
print(np.mean(np.abs(pred_1 - pred_2)))
Output:
1.1503289e-08
As we can see, there is some difference, but it is negligible: a mean absolute difference on the order of 1e-8 is floating-point rounding error.
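The equivalence can also be checked without Keras. The following is a minimal NumPy sketch, not the code from the answer: the helper names `average_pool_2x2` and `strided_conv_2x2` are made up for illustration, and it assumes a 2x2 window, stride 2, no padding and no bias. It also shows why a single-filter Conv2D is not a drop-in replacement for pooling: a convolution sums over all input channels, so the kernel needs one output channel per input channel with zeros off the diagonal to keep the channels separate.

```python
import numpy as np

def average_pool_2x2(x):
    # x: (batch, height, width, channels); non-overlapping 2x2 windows.
    b, h, w, c = x.shape
    return x.reshape(b, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))

def strided_conv_2x2(x, kernel):
    # kernel: (2, 2, in_channels, out_channels); stride 2, no padding, no bias.
    b, h, w, c = x.shape
    patches = x.reshape(b, h // 2, 2, w // 2, 2, c)
    # Sum over window rows (i), window columns (j) and input channels (c).
    return np.einsum("bhiwjc,ijco->bhwo", patches, kernel)

channels = 4
kernel = np.zeros((2, 2, channels, channels))
for ch in range(channels):
    # Each output channel averages only the matching input channel.
    kernel[:, :, ch, ch] = 1.0 / 4.0

rng = np.random.default_rng(0)
x = rng.random((4, 16, 16, channels))
pooled = average_pool_2x2(x)
conved = strided_conv_2x2(x, kernel)
print(pooled.shape, np.max(np.abs(pooled - conved)))
```

Any remaining difference between the two results comes from the order of the floating-point additions.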
I am using a Convolutional Neural Network (Keras, Conv1D) for a multi-class text classification task. When I run the model below, I get the error shown further down. I have spent time trying to understand the error, but I don't know how to fix it. Can anyone help me, please?
The shapes of the training set and the evaluation set are as follows:
df_train shape: (7198,)
df_val shape: (1800,)
np.random.seed(42)
# You need to reshape your input data according to the Conv1D layer input format: (batch_size, steps, input_dim).
# set parameters of matrices and convolution
embedding_dim = 300
nb_filter = 64
filter_length = 5
hidden_dims = 32
stride_length = 1
from keras.layers import Embedding
embedding_layer = Embedding(len(tokenizer.word_index) + 1,
embedding_dim,
input_length=35,
name="Embedding")
inp = Input(shape=(35,), dtype='int32')
embeddings = embedding_layer(inp)
conv1 = Conv1D(filters=32, # Number of filters to use
kernel_size=filter_length, # n-gram range of each filter.
padding='same', #valid: don't go off edge; same: use padding before applying filter
activation='relu',
name="CONV1",
kernel_regularizer=regularizers.l2(l=0.0367))(embeddings)
conv2 = Conv1D(filters=32, # Number of filters to use
kernel_size=filter_length, # n-gram range of each filter.
padding='same', #valid: don't go off edge; same: use padding before applying filter
activation='relu',
name="CONV2",kernel_regularizer=regularizers.l2(l=0.02))(embeddings)
conv3 = Conv1D(filters=32, # Number of filters to use
kernel_size=filter_length, # n-gram range of each filter.
padding='same', #valid: don't go off edge; same: use padding before applying filter
activation='relu',
name="CONV2",kernel_regularizer=regularizers.l2(l=0.01))(embeddings)
max1 = MaxPool1D(10, strides=1,name="MaxPool1D1")(conv1)
max2 = MaxPool1D(10, strides=1,name="MaxPool1D2")(conv2)
max3 = MaxPool1D(10, strides=1,name="MaxPool1D2")(conv3)
conc = concatenate([max1, max2,max3])
flat = Flatten(name="FLATTEN")(max1)
....
The error is as follows:
ValueError: Input 0 is incompatible with layer CNN: expected shape=(None, 35), found shape=(None, 31)
The model :
Model: "CNN"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_19 (InputLayer) [(None, 35)] 0
_________________________________________________________________
Embedding (Embedding) (None, 35, 300) 4094700
_________________________________________________________________
CONV1 (Conv1D) (None, 35, 32) 48032
_________________________________________________________________
MaxPool1D1 (MaxPooling1D) (None, 26, 32) 0
_________________________________________________________________
FLATTEN (Flatten) (None, 832) 0
_________________________________________________________________
Dropout (Dropout) (None, 832) 0
_________________________________________________________________
Dense (Dense) (None, 3) 2499
=================================================================
Total params: 4,145,231
Trainable params: 4,145,231
Non-trainable params: 0
_________________________________________________________________
Epoch 1/100
That error occurs when the network's input layer shape does not match the dataset's shape. If you are receiving an error like this, you should try one of the following:
Set the network input shape to (None, 31) so that it matches the dataset's shape.
Check that the dataset's shape is (num_of_examples, 35). (Preferable)
If all of this information is correct and there is no problem with the dataset, it might be an error in the net itself, where the shapes of two adjacent layers don't match.
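As an illustration of the preferred option, here is a minimal NumPy sketch that forces every tokenized sequence to length 35 before it reaches the model. The helper name `pad_or_truncate` and the toy sequences are hypothetical, and the sketch assumes that padding with 0 at the end is acceptable for your tokenizer; it is not the asker's preprocessing code.

```python
import numpy as np

def pad_or_truncate(sequences, max_len=35, pad_value=0):
    # Build a (num_examples, max_len) int32 array: longer sequences are cut,
    # shorter ones are filled with pad_value at the end.
    out = np.full((len(sequences), max_len), pad_value, dtype="int32")
    for row, seq in enumerate(sequences):
        trimmed = list(seq)[:max_len]
        out[row, :len(trimmed)] = trimmed
    return out

# Toy token-id sequences of different lengths (made up for illustration).
sequences = [[5, 8, 2], list(range(1, 32)), list(range(1, 50))]
x = pad_or_truncate(sequences, max_len=35)
print(x.shape)  # the second dimension must equal the model's Input(shape=(35,))
```

Apply the same length to the training, validation and test data. A mismatch such as 31 versus 35 usually means one of the sets was padded with a different maximum length.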
Considering this LSTM based RNN:
# Instantiating the model
model = Sequential()
# Input layer
model.add(LSTM(30, activation="softsign", return_sequences=True, input_shape=(30, 1)))
# Hidden layers
model.add(LSTM(12, activation="softsign", return_sequences=True))
model.add(LSTM(12, activation="softsign", return_sequences=True))
# Final Hidden layer
model.add(LSTM(10, activation="softsign"))
# Output layer
model.add(Dense(10))
Is each output unit of the final hidden layer connected to each of the 12 output units of the preceding hidden layer? (10*12 = 120 connections)
Is each one of the 10 outputs of the Dense layer connected to each one of the outputs of the final hidden layer? (10*10 = 100 connections)
Would there be a difference in terms of connections between the Input layer and the 1st hidden layer if "return_sequences" was set to False (for both layers or for one)?
Thanks a lot for your help
Aymeric
Here is how I picture the RNN, please tell me if it's wrong:
Note about the picture:
X = one training example, i.e. a vector of 30 bitcoin (BTC) values (each value represents one day, 30 days total)
Output vector = 10 values that are supposed to be the next 10 values of bitcoin (the next 10 days)
Let's take a look at the model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 30, 30) 3840
_________________________________________________________________
lstm_1 (LSTM) (None, 30, 12) 2064
_________________________________________________________________
lstm_2 (LSTM) (None, 30, 12) 1200
_________________________________________________________________
lstm_3 (LSTM) (None, 10) 920
_________________________________________________________________
dense (Dense) (None, 10) 110
=================================================================
Total params: 8,134
Trainable params: 8,134
Non-trainable params: 0
_________________________________________________________________
Since the final LSTM layer does not set return_sequences=True, it uses the default return_sequences=False, which means only the last output from the final LSTM layer is used by the Dense layer.
Yes. But it is actually 110 parameters because you have a bias: (10 + 1) * 10.
There would not. The difference between return_sequences=True and return_sequences=False is that when it is set to False, only the final output is sent to the next layer. So for time series data with 30 events (1, 30, 30), only the output from the 30th event is passed along to the next layer. The computations are the same, so there is no difference in weights. Note that you may get shape mismatches if you set some of these to False as is, because a following LSTM layer expects a 3D sequence as input.
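The parameter counts in the summary above can be reproduced by hand. The sketch below assumes the standard Keras LSTM layer with a bias: four gates, each with an input kernel, a recurrent kernel and a bias vector. The helper names `lstm_param_count` and `dense_param_count` are made up for illustration.

```python
def lstm_param_count(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias.
    return 4 * units * (input_dim + units + 1)

def dense_param_count(input_dim, units):
    # One weight per input plus one bias, for every output unit.
    return (input_dim + 1) * units

layer_params = [
    lstm_param_count(1, 30),    # lstm:   1 feature per time step -> 30 units
    lstm_param_count(30, 12),   # lstm_1: 30 -> 12
    lstm_param_count(12, 12),   # lstm_2: 12 -> 12
    lstm_param_count(12, 10),   # lstm_3: 12 -> 10
    dense_param_count(10, 10),  # dense:  10 -> 10
]
total = sum(layer_params)
print(layer_params, total)
```

None of these formulas contains the sequence length or return_sequences, which is why changing return_sequences does not change the number of weights.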
I have a tiny sample CNN implemented in both Keras and PyTorch. When I print the summary of both networks, the total number of trainable parameters is the same, but the total number of parameters and the number of parameters for Batch Normalization don't match.
Here is the CNN implementation in Keras:
inputs = Input(shape=(64, 64, 1))  # Channels last (NHWC)
model = Conv2D(filters=32, kernel_size=(3, 3), padding='SAME', activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 1))(inputs)
model = BatchNormalization(momentum=0.15, axis=-1)(model)
model = Flatten()(model)
dense = Dense(100, activation = "relu")(model)
head_root = Dense(10, activation = 'softmax')(dense)
And the summary printed for above model is:
Model: "model_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) (None, 64, 64, 1) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 64, 64, 32) 320
_________________________________________________________________
batch_normalization_2 (Batch (None, 64, 64, 32) 128
_________________________________________________________________
flatten_3 (Flatten) (None, 131072) 0
_________________________________________________________________
dense_11 (Dense) (None, 100) 13107300
_________________________________________________________________
dense_12 (Dense) (None, 10) 1010
=================================================================
Total params: 13,108,758
Trainable params: 13,108,694
Non-trainable params: 64
_________________________________________________________________
Here's the implementation of the same model architecture in PyTorch:
import torch.nn as nn

# Image format: channels first (NCHW) in PyTorch
class CustomModel(nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), padding=1),
            nn.ReLU(True),
            nn.BatchNorm2d(num_features=32),
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(in_features=131072, out_features=100)
        self.fc2 = nn.Linear(in_features=100, out_features=10)

    def forward(self, x):
        output = self.layer1(x)
        output = self.flatten(output)
        output = self.fc1(output)
        output = self.fc2(output)
        return output
And following is the output of summary of the above model:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 32, 64, 64] 320
ReLU-2 [-1, 32, 64, 64] 0
BatchNorm2d-3 [-1, 32, 64, 64] 64
Flatten-4 [-1, 131072] 0
Linear-5 [-1, 100] 13,107,300
Linear-6 [-1, 10] 1,010
================================================================
Total params: 13,108,694
Trainable params: 13,108,694
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.02
Forward/backward pass size (MB): 4.00
Params size (MB): 50.01
Estimated Total Size (MB): 54.02
----------------------------------------------------------------
As you can see in the results above, Batch Normalization in Keras has more parameters than in PyTorch (2x, to be exact). So what is the difference between the two CNN architectures? If they are equivalent, what am I missing here?
Keras treats as parameters (weights) everything that will be "saved/loaded" with the layer.
While both implementations naturally keep the accumulated "mean" and "variance" of the batches, these values are not trainable with backpropagation.
Nevertheless, these values are updated every batch, and Keras treats them as non-trainable weights, while PyTorch keeps them as buffers (running_mean and running_var) rather than parameters, so they do not show up in the parameter count. The term "non-trainable" here means "not trainable by backpropagation"; it does not mean the values are frozen.
In total there are 4 groups of "weights" in a BatchNormalization layer. Considering the selected axis (default = -1, size = 32 for your layer):
scale, gamma (32) - trainable
offset, beta (32) - trainable
accumulated means (32) - non-trainable, but updated every batch
accumulated variances (32) - non-trainable, but updated every batch
The advantage of having it like this in Keras is that when you save the layer, you also save the mean and variance values the same way you save all other weights in the layer automatically. And when you load the layer, these weights are loaded together.
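The numbers in both summaries follow from this breakdown. Below is a small arithmetic sketch in plain Python; the dictionary layout and variable names are only for illustration, and the weight names are the ones Keras uses for a BatchNormalization layer.

```python
features = 32  # size of the normalized axis (the 32 conv filters)

# name -> (number of values, trainable by backpropagation?)
bn_weights = {
    "gamma": (features, True),
    "beta": (features, True),
    "moving_mean": (features, False),
    "moving_variance": (features, False),
}
bn_trainable = sum(n for n, trainable in bn_weights.values() if trainable)
bn_non_trainable = sum(n for n, trainable in bn_weights.values() if not trainable)

conv_params = 3 * 3 * 1 * 32 + 32  # 3x3 kernel, 1 input channel, 32 filters + biases
fc1_params = 131072 * 100 + 100    # flattened 64*64*32 inputs -> 100 units
fc2_params = 100 * 10 + 10

trainable_total = conv_params + bn_trainable + fc1_params + fc2_params
keras_total = trainable_total + bn_non_trainable
print(bn_trainable, bn_non_trainable, trainable_total, keras_total)
```

The trainable total matches the PyTorch summary, and adding the 64 moving statistics gives the Keras total, so the two architectures hold the same values and only count them differently.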
I am trying to implement a multi-step forecasting LSTM model in Keras. The shapes of the data are as follows:
X : (5831, 48, 1)
y : (5831, 1, 12)
The model that I am trying to use is:
power_in = Input(shape=(X.shape[1], X.shape[2]))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
dropout=0.412563, kernel_initializer=power_lstm_init, return_sequences=True)(power_in)
main_out = TimeDistributed(Dense(12, kernel_initializer=power_lstm_init))(power_lstm)
While trying to train the model like this:
hist = forecaster.fit([X], y, epochs=325, batch_size=16, validation_data=([X_valid], y_valid), verbose=1, shuffle=False)
I am getting the following error:
ValueError: Error when checking target: expected time_distributed_16 to have shape (48, 12) but got array with shape (1, 12)
How to fix this?
According to your comment:
[The] data i have is like t-48, t-47, t-46, ..... , t-1 as the past data and
t+1, t+2, ......, t+12 as the values that I want to forecast
you may not need to use a TimeDistributed layer at all:
First, just remove the return_sequences=True argument of the LSTM layer. The LSTM layer then encodes the input time series of the past into a vector of shape (50,). Now you can feed it directly to a Dense layer with 12 units:
# make sure the labels are in shape (num_samples, 12)
y = np.reshape(y, (-1, 12))
power_in = Input(shape=X.shape[1:])  # (48, 1); shape must be a flat tuple
power_lstm = LSTM(50, recurrent_dropout=0.4128,
dropout=0.412563,
kernel_initializer=power_lstm_init)(power_in)
main_out = Dense(12, kernel_initializer=power_lstm_init)(power_lstm)
Alternatively, if you would like to use a TimeDistributed layer, and considering that the output is a sequence itself, we can explicitly enforce this temporal dependency in our model by using another LSTM layer before the Dense layer (with the addition of a RepeatVector layer after the first LSTM layer to make its output a time series of length 12, i.e. the same length as the output time series):
# make sure the labels are in shape (num_samples, 12, 1)
y = np.reshape(y, (-1, 12, 1))
power_in = Input(shape=(48,1))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
dropout=0.412563,
kernel_initializer=power_lstm_init)(power_in)
rep = RepeatVector(12)(power_lstm)
out_lstm = LSTM(32, return_sequences=True)(rep)
main_out = TimeDistributed(Dense(1))(out_lstm)
model = Model(power_in, main_out)
model.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) (None, 48, 1) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 50) 10400
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 12, 50) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 12, 32) 10624
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, 1) 33
=================================================================
Total params: 21,057
Trainable params: 21,057
Non-trainable params: 0
_________________________________________________________________
Of course, in both models you may need to tune the hyper-parameters (e.g. number of LSTM layers, the dimension of LSTM layers, etc.) to be able to accurately compare them and achieve good results.
Side note: actually, in your scenario, you don't need to use TimeDistributed layer at all because (currently) Dense layer is applied on the last axis. Therefore, TimeDistributed(Dense(...)) and Dense(...) are equivalent.
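The two label layouts used in this answer can be checked with NumPy alone. The sketch below uses a zero-filled placeholder array with the shapes from the question (the real `y` would come from your data) and shows which target shape each model variant expects.

```python
import numpy as np

num_samples = 5831
y = np.zeros((num_samples, 1, 12))  # placeholder with the shape from the question

# Variant 1: LSTM (return_sequences=False) -> Dense(12) outputs (None, 12).
y_dense = np.reshape(y, (-1, 12))

# Variant 2: RepeatVector(12) -> LSTM -> TimeDistributed(Dense(1)) outputs (None, 12, 1).
y_seq = np.reshape(y, (-1, 12, 1))

print(y_dense.shape, y_seq.shape)
```

In both cases the targets must match the model's output shape exactly. The original error came from a model that produced (48, 12) per sample while the targets had shape (1, 12).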
I have a problem with my current attempt to build a sequential model for time series classification in Keras. I want to work with channels_first data, because it is more convenient from a preprocessing perspective (I only work with one channel, though). This works fine for the Convolution1D layers I'm using, as I can specify data_format='channels_first', but somehow this won't work for MaxPooling1D, which does not seem to have this option.
The model I want to build is structured as follows:
model = Sequential()
model.add(Convolution1D(filters=16, kernel_size=35, activation='relu', input_shape=(1, window_length), data_format='channels_first'))
model.add(MaxPooling1D(pool_size=5))
model.add(Convolution1D(filters=16, kernel_size=10, activation='relu', data_format='channels_first'))
[...] #several other layers here
With window_length = 5000 I get the following summary after all three layers are added:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_1 (Conv1D) (None, 32, 4966) 1152
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 4, 4966) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 16, 4957) 656
=================================================================
Total params: 1,808
Trainable params: 1,808
Non-trainable params: 0
Now, I wonder if this is correct, as I would expect the third dimension (i.e. the number of neurons in a feature map) and not the second (i.e. the number of filters) to be reduced by the pooling layer? As I see it, MaxPooling1D does not recognize the channels_first ordering and while the Keras documentation says there exists a keyword data_format for MaxPooling2D, there's no such keyword for MaxPooling1D.
I tested the whole setup with a channels_last data format, and it worked as I expected. But since the conversion from channels_first to channels_last takes quite some time for me, I'd really rather have this work with channels_first. And I have the feeling that I'm simply missing something.
If you need any more information, let me know.
Update: as mentioned by @HSK in the comments, the data_format argument is now supported in MaxPooling layers as a result of this PR.
Well, one alternative is to use the Permute layer (and remove the channels_first for the second conv layer):
model = Sequential()
model.add(Convolution1D(filters=16, kernel_size=35, activation='relu', input_shape=(1, 100), data_format='channels_first'))
model.add(Permute((2, 1)))
model.add(MaxPooling1D(pool_size=5))
model.add(Convolution1D(filters=16, kernel_size=10, activation='relu'))
model.summary()
Model summary:
Layer (type) Output Shape Param #
=================================================================
conv1d_7 (Conv1D) (None, 16, 66) 576
_________________________________________________________________
permute_1 (Permute) (None, 66, 16) 0
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 13, 16) 0
_________________________________________________________________
conv1d_8 (Conv1D) (None, 4, 16) 2096
=================================================================
Total params: 2,672
Trainable params: 2,672
Non-trainable params: 0
_________________________________________________________________
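To see what the Permute layer and the pooling layer do to the shapes, here is a minimal NumPy sketch. The helper name `max_pool_1d_channels_last` is made up for illustration, and it assumes non-overlapping windows with the trailing remainder dropped, which corresponds to the MaxPooling1D defaults (strides equal to pool_size, padding='valid').

```python
import numpy as np

def max_pool_1d_channels_last(x, pool_size):
    # x: (batch, steps, channels); pool over the steps axis only.
    batch, steps, channels = x.shape
    out_steps = steps // pool_size
    trimmed = x[:, : out_steps * pool_size, :]
    return trimmed.reshape(batch, out_steps, pool_size, channels).max(axis=2)

rng = np.random.default_rng(0)
conv_out = rng.random((2, 16, 66))            # channels_first: (batch, channels, steps)
permuted = np.transpose(conv_out, (0, 2, 1))  # like Permute((2, 1)): (batch, steps, channels)
pooled = max_pool_1d_channels_last(permuted, pool_size=5)
print(permuted.shape, pooled.shape)
```

After the swap, the pooling reduces the 66 time steps to 13 and leaves the 16 channels untouched, which matches the permute_1 and max_pooling1d_2 rows of the summary.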