how non-trainable params in a keras model are calculated - python

I have the following program, taken from the Internet:
from keras.models import Model
from keras.layers import (Input, ZeroPadding2D, Conv2D, BatchNormalization,
                          Activation, MaxPooling2D, Flatten, Dense)

def my_model(input_shape):
    # Define the input placeholder as a tensor with shape input_shape. Think of this as your input image!
    X_input = Input(input_shape)
    # Zero-Padding: pads the border of X_input with zeroes
    X = ZeroPadding2D((3, 3))(X_input)
    # CONV -> BN -> RELU block applied to X
    X = Conv2D(32, (7, 7), strides=(1, 1), name='conv0')(X)
    X = BatchNormalization(axis=3, name='bn0')(X)
    X = Activation('relu')(X)
    # MAXPOOL
    X = MaxPooling2D((2, 2), name='max_pool')(X)
    # FLATTEN X (i.e. convert it to a vector) + FULLYCONNECTED
    X = Flatten()(X)
    X = Dense(1, activation='sigmoid', name='fc')(X)
    # Create the Keras model instance; you'll use this instance to train/test the model.
    model = Model(inputs=X_input, outputs=X, name='myModel')
    return model

mymodel = my_model((64, 64, 3))
mymodel.summary()
The output of summary() is shown below:
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) (None, 64, 64, 3) 0
_________________________________________________________________
zero_padding2d_3 (ZeroPaddin (None, 70, 70, 3) 0
_________________________________________________________________
conv0 (Conv2D) (None, 64, 64, 32) 4736
_________________________________________________________________
bn0 (BatchNormalization) (None, 64, 64, 32) 128
_________________________________________________________________
activation_2 (Activation) (None, 64, 64, 32) 0
_________________________________________________________________
max_pool (MaxPooling2D) (None, 32, 32, 32) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 32768) 0
_________________________________________________________________
fc (Dense) (None, 1) 32769
=================================================================
Total params: 37,633
Trainable params: 37,569
Non-trainable params: 64
My question is: from which layer are these 64 non-trainable params taken?
Another question: how does the batch normalization layer have 128 parameters?
Please help me understand how the numbers above follow from the model defined above. Thanks for the time and help.

A BatchNormalization layer is composed of [gamma weights, beta weights, moving_mean (non-trainable), moving_variance (non-trainable)], and each of these four parameter sets has one value per element in the last axis (by default in Keras, but you can change the axis if you want).
In your code the last dimension before the BatchNormalization layer has size 32, so the layer has 32*4 = 128 parameters in total, and since 2 of the 4 parameter sets are non-trainable, there are 32*2 = 64 non-trainable parameters.
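As a sanity check, every number in the summary above can be reproduced with a little arithmetic. A minimal sketch (the helper names are mine; the layer sizes come from the summary):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def batchnorm_params(channels):
    # gamma, beta (trainable) + moving_mean, moving_variance (non-trainable),
    # one value per channel on the normalized axis
    trainable = 2 * channels
    non_trainable = 2 * channels
    return trainable + non_trainable, non_trainable

def dense_params(in_features, units):
    # weight matrix plus one bias per unit
    return (in_features + 1) * units

print(conv2d_params(7, 7, 3, 32))     # conv0: 4736
print(batchnorm_params(32))           # bn0: (128, 64) -> total and non-trainable
print(dense_params(32 * 32 * 32, 1))  # fc: 32769 (Flatten gives 32*32*32 = 32768)
```

Summing these gives exactly the 37,633 total and 64 non-trainable params reported by summary().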

Related

Keras model with 2 inputs

I am dealing with a binary classification problem and feed the network with two inputs (images):
from keras import optimizers
from keras.applications.vgg16 import VGG16
from keras.layers import (Input, Conv2D, concatenate,
                          GlobalAveragePooling2D, Dense)
from keras.models import Model

model_vgg16_conv = VGG16(weights='imagenet', include_top=False)
for layer in model_vgg16_conv.layers:
    layer.trainable = False
model_vgg16_conv.summary()

input1 = Input(shape=(60, 36, 3))
input2 = Input(shape=(60, 36, 3))
concate_input = concatenate([input1, input2])
input = Conv2D(3, (3, 3), padding='same', activation="relu")(concate_input)

# Use the generated model
output_vgg16_conv = model_vgg16_conv(input)

# Pass the model's output through several layers.
x = GlobalAveragePooling2D()(output_vgg16_conv)
x = Dense(512, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)

# Create your own model
my_model = Model(inputs=[input1, input2], outputs=predictions)

# In the summary, weights and layers from the VGG part will be hidden;
# they were frozen above, so they will not be updated during training
my_model.summary()
my_model.compile(loss='binary_crossentropy',  # loss function for classification problems
                 optimizer=optimizers.Adam(learning_rate=1e-4),  # Adam optimizer
                 metrics=['accuracy'])
Model: "model_5"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_17 (InputLayer) [(None, 60, 36, 3)] 0 []
input_18 (InputLayer) [(None, 60, 36, 3)] 0 []
concatenate_5 (Concatenate) (None, 60, 36, 6) 0 ['input_17[0][0]',
'input_18[0][0]']
conv2d_5 (Conv2D) (None, 60, 36, 3) 165 ['concatenate_5[0][0]']
vgg16 (Functional) (None, None, None, 14714688 ['conv2d_5[0][0]']
512)
global_average_pooling2d_5 (Gl (None, 512) 0 ['vgg16[0][0]']
obalAveragePooling2D)
dense_10 (Dense) (None, 512) 262656 ['global_average_pooling2d_5[0][0
]']
dense_11 (Dense) (None, 1) 513 ['dense_10[0][0]']
==================================================================================================
Total params: 14,978,022
Trainable params: 263,334
Non-trainable params: 14,714,688
My question is whether I am feeding the network correctly. The vgg16 layer displays the output shape (None, None, None, 512). Is that right?
It's OK: VGG16 was loaded with include_top=False, so its spatial dimensions stay None and are only resolved when it receives an input. The real problem is that this architecture introduces overfitting; the solution is to use class weights or oversampling.
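If the classes are imbalanced, one common option is passing class weights to fit(). A minimal sketch computing balanced weights from the labels (the balanced_class_weights helper is illustrative, not part of Keras):

```python
from collections import Counter

def balanced_class_weights(labels):
    # weight each class inversely to its frequency, normalized so that
    # a perfectly balanced dataset gets weight 1.0 for every class
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}

weights = balanced_class_weights([0, 0, 0, 1])
# then pass it along: my_model.fit(..., class_weight=weights)
print(weights)
```

Keras multiplies each sample's loss by its class weight, so the rare class contributes more to the gradient; oversampling achieves a similar effect by duplicating minority-class samples instead.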

How to add one point as a feature in an encoder-decoder time series model?

I have been performing a seq2seq time-series prediction using encoder-decoder LSTM architecture. The input data to the model has 2 features, which are essentially two arrays: one is a dependent variable (y-values) and the other, an independent variable (x-values). The shape of the array is:
input_shape: (57, 20, 2)
Where, for example, the x and y-values of one time series are of the shape (1, 20, 2), and their positions in the 3D array being:
x = input_shape[:][:, 0]
y = input_shape[:][:, 1]
I am now faced with the challenge of feeding a point (an x-y timestep, so to speak) as an additional feature. Is there any way to do so?
EDIT: I have added the model that I'm using based on the requests in the comments. It may be noted that the input size I mentioned is small here for reasons of simplicity. The actual input I am using is quite large.
from keras.models import Sequential
from keras.layers import (Masking, Bidirectional, LSTM, RepeatVector,
                          TimeDistributed, Dense)
from keras import optimizers

model = Sequential()
model.add(Masking(mask_value=0, input_shape=(input_shape.shape[1], 2)))
model.add(Bidirectional(LSTM(128, dropout=0, return_sequences=True, activation='tanh')))
model.add(Bidirectional(LSTM(128, dropout=0, return_sequences=False)))
model.add(RepeatVector(targets.shape[1]))
model.add(Bidirectional(LSTM(128, dropout=0, return_sequences=True, activation='tanh')))
model.add(Bidirectional(LSTM(128, dropout=0, return_sequences=True)))
model.add(TimeDistributed(Dense(64, activation='relu')))
model.add(TimeDistributed(Dense(1, activation='linear')))
model.build()
model.compile(optimizer=optimizers.Adam(0.00001), loss='MAE')
I would give your model two inputs, where the first input is your normal time series with shape (batch, 20, 2) and the second is your special time point with shape (batch, 2). Then define the following architecture, which repeats your special point 20 times to get (batch, 20, 2) and concatenates that with your normal input. (Note: I defined target_shape_1 to make sure it compiles on my end, but you can replace it with targets.shape[1].)
import tensorflow as tf
from tensorflow.keras.layers import (Input, RepeatVector, Masking,
                                     Bidirectional, LSTM, TimeDistributed, Dense)

input_shape_1 = 20
target_shape_1 = 3

normal_input = Input(shape=(20, 2), name='normal_inputs')       # your normal time series (None, 20, 2) (batch, time, feats)
key_time_point = Input(shape=(2,), name='key_time_point')       # your single special point (None, 2) (batch, feats)
key_time_repeater = RepeatVector(20, name='key_time_repeater')  # repeat your special point 20 times
key_time_repeater_out = key_time_repeater(key_time_point)       # turning your (None, 2) into (None, 20, 2)

initial_mask = Masking(mask_value=0, input_shape=(20, 4))
masked_out = initial_mask(
    # concat your normal input (None, 20, 2) and the repeated input (None, 20, 2)
    # into (None, 20, 4) and feed it to the network
    tf.concat([normal_input, key_time_repeater_out], len(normal_input.shape) - 1)
)

encoder_1 = Bidirectional(LSTM(128, dropout=0, return_sequences=True, activation='tanh'))
encoder_2 = Bidirectional(LSTM(128, dropout=0, return_sequences=False))
encoder_repeat = RepeatVector(target_shape_1)
encoder_out = encoder_repeat(encoder_2(encoder_1(masked_out)))

decoder_1 = Bidirectional(LSTM(128, dropout=0, return_sequences=True, activation='tanh'))
decoder_2 = Bidirectional(LSTM(128, dropout=0, return_sequences=True))
decoder_dense = TimeDistributed(Dense(64, activation='relu'))
decoder_out = decoder_dense(decoder_2(decoder_1(encoder_out)))
final_output = TimeDistributed(Dense(1, activation='linear'))(decoder_out)

model = tf.keras.models.Model(inputs=[normal_input, key_time_point], outputs=final_output)
model.compile(optimizer=tf.keras.optimizers.Adam(0.00001), loss='MAE')
A summary() of the model looks like this:
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
key_time_point (InputLayer) [(None, 2)] 0
__________________________________________________________________________________________________
normal_inputs (InputLayer) [(None, 20, 2)] 0
__________________________________________________________________________________________________
key_time_repeater (RepeatVector (None, 20, 2) 0 key_time_point[0][0]
__________________________________________________________________________________________________
tf_op_layer_concat_3 (TensorFlo [(None, 20, 4)] 0 normal_inputs[0][0]
key_time_repeater[0][0]
__________________________________________________________________________________________________
masking_4 (Masking) (None, 20, 4) 0 tf_op_layer_concat_3[0][0]
__________________________________________________________________________________________________
bidirectional_12 (Bidirectional (None, 20, 256) 136192 masking_4[0][0]
__________________________________________________________________________________________________
bidirectional_13 (Bidirectional (None, 256) 394240 bidirectional_12[0][0]
__________________________________________________________________________________________________
repeat_vector_11 (RepeatVector) (None, 3, 256) 0 bidirectional_13[0][0]
__________________________________________________________________________________________________
bidirectional_14 (Bidirectional (None, 3, 256) 394240 repeat_vector_11[0][0]
__________________________________________________________________________________________________
bidirectional_15 (Bidirectional (None, 3, 256) 394240 bidirectional_14[0][0]
__________________________________________________________________________________________________
time_distributed_7 (TimeDistrib (None, 3, 64) 16448 bidirectional_15[0][0]
__________________________________________________________________________________________________
time_distributed_8 (TimeDistrib (None, 3, 1) 65 time_distributed_7[0][0]
==================================================================================================
Total params: 1,335,425
Trainable params: 1,335,425
Non-trainable params: 0
__________________________________________________________________________________________________
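Fitting the two-input model then works by passing both arrays in a list, in the same order as the inputs= argument. A quick sketch with dummy data (the array names and the batch size are mine; shapes follow the model above, with target_shape_1 = 3):

```python
import numpy as np

batch = 8
normal = np.zeros((batch, 20, 2), dtype="float32")   # the regular time series input
special = np.zeros((batch, 2), dtype="float32")      # one special (x, y) point per sample
targets = np.zeros((batch, 3, 1), dtype="float32")   # (batch, target_shape_1, 1)

# model.fit([normal, special], targets, batch_size=4, epochs=1)
print(normal.shape, special.shape, targets.shape)
```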

TensorFlow input shape error at Dense output layer is contradictory to what model.summary() says

I am playing around with an NLP problem (sentence classification) and decided to use HuggingFace's TFBertModel along with Conv1D, Flatten, and Dense layers. I am using the functional API and my model compiles. However, during model.fit(), I get a shape error at the output Dense layer.
Model definition:
import tensorflow as tf
from transformers import TFBertModel

# Build model with a max length of 50 words in a sentence
max_len = 50

def build_model():
    bert_encoder = TFBertModel.from_pretrained(model_name)
    input_word_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_word_ids")
    input_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_mask")
    input_type_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_type_ids")
    # Create a conv1d model. The model may not really be useful or make sense, but that's OK (for now).
    embedding = bert_encoder([input_word_ids, input_mask, input_type_ids])[0]
    conv_layer = tf.keras.layers.Conv1D(32, 3, activation='relu')(embedding)
    dense_layer = tf.keras.layers.Dense(24, activation='relu')(conv_layer)
    flatten_layer = tf.keras.layers.Flatten()(dense_layer)
    output_layer = tf.keras.layers.Dense(3, activation='softmax')(flatten_layer)
    model = tf.keras.Model(inputs=[input_word_ids, input_mask, input_type_ids], outputs=output_layer)
    model.compile(tf.keras.optimizers.Adam(lr=1e-5), loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
# View model architecture
model = build_model()
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_word_ids (InputLayer) [(None, 50)] 0
__________________________________________________________________________________________________
input_mask (InputLayer) [(None, 50)] 0
__________________________________________________________________________________________________
input_type_ids (InputLayer) [(None, 50)] 0
__________________________________________________________________________________________________
tf_bert_model (TFBertModel) ((None, 50, 768), (N 177853440 input_word_ids[0][0]
input_mask[0][0]
input_type_ids[0][0]
__________________________________________________________________________________________________
conv1d (Conv1D) (None, 48, 32) 73760 tf_bert_model[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 48, 24) 792 conv1d[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 1152) 0 dense[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 3) 3459 flatten[0][0]
==================================================================================================
Total params: 177,931,451
Trainable params: 177,931,451
Non-trainable params: 0
__________________________________________________________________________________________________
# Fit the model on the input data
model.fit(train_input, train['label'].values, epochs=3, verbose=1, batch_size=16,
          validation_split=0.2)
And this is the error message:
ValueError: Input 0 of layer dense_1 is incompatible with the layer: expected axis -1 of input shape to have value 1152 but received
input with shape [16, 6168]
I am unable to understand how the input shape to layer dense_1 (the output Dense layer) can be 6168. As per the model summary, it should always be 1152.
The shape of your input is likely not what you expect: 6168 = 257 × 24, which is exactly what the Flatten layer produces for token sequences of length 259 instead of 50. Check the shape of train_input — your tokenizer is probably not padding/truncating the sequences to max_len.
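You can invert the Flatten size to see what sequence length the data actually has: for the Conv1D(32, 3) → Dense(24) → Flatten stack above, the flattened dimension equals (seq_len - kernel_size + 1) * dense_units. A small sketch (the helper name is mine):

```python
def inferred_seq_len(flatten_dim, dense_units=24, kernel_size=3):
    # Flatten dim = (seq_len - kernel_size + 1) * dense_units
    conv_len = flatten_dim // dense_units
    return conv_len + kernel_size - 1

print(inferred_seq_len(1152))  # 50  -> the length the model was built for
print(inferred_seq_len(6168))  # 259 -> the length train_input actually contains
```

So the model is fine; the fed sequences are just longer than the max_len=50 the graph was built with.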

How to feed and build a "Input->Dense->Conv2D->Dense" network in keras?

This is a simple example that reproduces my issue in a network I am trying to deploy.
I have an image input layer (which I need to maintain), then a Dense layer, a Conv2D layer, and a final Dense layer.
The idea is that the inputs are 10x10 images and the labels are 10x10 images. Inspired by my code and this example.
import numpy as np
from keras.models import Model
from keras.layers import Input, Conv2D, Dense

# Building the model
size = 10
a = Input(shape=(size, size, 1))
hidden = Dense(size)(a)
hidden = Conv2D(kernel_size=(3, 3), filters=size*size, activation='relu', padding='same')(hidden)
outputs = Dense(size, activation='sigmoid')(hidden)
model = Model(inputs=a, outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Create random data, accounting for 1 channel of data
n_images = 55
data = np.random.randint(0, 2, (n_images, size, size, 1))
labels = np.random.randint(0, 2, (n_images, size, size, 1))

# Fit the model
model.fit(data, labels, verbose=1, batch_size=10, epochs=20)
print(model.summary())
I get the following error: ValueError: Error when checking target: expected dense_92 to have shape (10, 10, 10) but got array with shape (10, 10, 1)
I don't get an error if I replace:
outputs = Dense(size, activation='sigmoid')(hidden)
with:
outputs = Dense(1, activation='sigmoid')(hidden)
I have no idea how Dense(1) is even valid, or how it can produce a 10x10 output signal, as model.summary() indicates:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_26 (InputLayer) (None, 10, 10, 1) 0
_________________________________________________________________
dense_93 (Dense) (None, 10, 10, 10) 20
_________________________________________________________________
conv2d_9 (Conv2D) (None, 10, 10, 100) 9100
_________________________________________________________________
dense_94 (Dense) (None, 10, 10, 1) 101
=================================================================
Total params: 9,221
Trainable params: 9,221
Non-trainable params: 0
_________________________________________________________________
None
Well, according to your comments:
what I am trying to do isn't standard. I have a set of images and for each image I want to find a binary image of the same size where a pixel value of 1 means the feature exists in the input image. Whether a pixel has the feature should be inferred both from local information (extracted by convolution layers) and from global information extracted by Dense layers.
I guess you are looking to create a two-branch model, where one branch consists of convolution layers and the other is simply one or more dense layers stacked on top of each other (although, I should mention that in my opinion a single convolutional network may achieve what you are looking for, because the combination of pooling and convolution layers, perhaps followed by some up-sampling layers at the end, preserves both local and global information). To define such a model, you can use the Keras functional API like this:
from keras import models
from keras import layers
input_image = layers.Input(shape=(10, 10, 1))
# branch one: dense layers
b1 = layers.Flatten()(input_image)
b1 = layers.Dense(64, activation='relu')(b1)
b1_out = layers.Dense(32, activation='relu')(b1)
# branch two: conv + pooling layers
b2 = layers.Conv2D(32, (3,3), activation='relu')(input_image)
b2 = layers.MaxPooling2D((2,2))(b2)
b2 = layers.Conv2D(64, (3,3), activation='relu')(b2)
b2_out = layers.MaxPooling2D((2,2))(b2)
# merge two branches
flattened_b2 = layers.Flatten()(b2_out)
merged = layers.concatenate([b1_out, flattened_b2])
# add a final dense layer
output = layers.Dense(10*10, activation='sigmoid')(merged)
output = layers.Reshape((10,10))(output)
# create the model
model = models.Model(input_image, output)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.summary()
Model summary:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 10, 10, 1) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 8, 8, 32) 320 input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 4, 4, 32) 0 conv2d_1[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 100) 0 input_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 2, 2, 64) 18496 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 64) 6464 flatten_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 1, 1, 64) 0 conv2d_2[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 32) 2080 dense_1[0][0]
__________________________________________________________________________________________________
flatten_2 (Flatten) (None, 64) 0 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 96) 0 dense_2[0][0]
flatten_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 100) 9700 concatenate_1[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape) (None, 10, 10) 0 dense_3[0][0]
==================================================================================================
Total params: 37,060
Trainable params: 37,060
Non-trainable params: 0
__________________________________________________________________________________________________
Note that this is one way of achieving what you are looking for, and it may or may not work for the specific problem and data you are working on. You may modify this model (e.g. remove the pooling layers or add more dense layers) or use a completely different architecture with other kinds of layers (e.g. up-sampling, Conv2DTranspose) to reach a better accuracy. In the end, you must experiment to find the right solution.
Edit:
For completeness, here is how to generate data and fit the network:
import numpy as np

n_images = 10
data = np.random.randint(0, 2, (n_images, size, size, 1))
labels = np.random.randint(0, 2, (n_images, size, size, 1))
model.fit(data, labels, verbose=1, batch_size=32, epochs=20)
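On the "how is Dense(1) even valid" part of the question: for inputs with more than two dimensions, Keras applies Dense along the last axis only, so the leading spatial axes pass through unchanged and the parameter count depends only on the last input dimension. A plain-Python sketch of those two rules (the helper names are mine, not Keras API):

```python
def dense_output_shape(input_shape, units):
    # Dense maps only the last axis: (..., d) -> (..., units)
    return input_shape[:-1] + (units,)

def dense_param_count(last_dim, units):
    # weight matrix on the last axis plus one bias per unit
    return (last_dim + 1) * units

print(dense_output_shape((None, 10, 10, 100), 1))  # (None, 10, 10, 1), as in the summary
print(dense_param_count(100, 1))                   # 101 params for dense_94
print(dense_param_count(1, 10))                    # 20 params for dense_93
```

That is why Dense(1) on a (10, 10, 100) tensor yields a (10, 10, 1) output that happens to match the label shape, while Dense(size) yields (10, 10, 10) and triggers the target-shape error.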

Keras Functional API: LSTM returns a 2-dimensional array

I'm stuck and I need the wisdom of Stack Overflow.
I have a two-input neural network implemented in Keras using the Functional API; the input shapes are:
X.shape, X_size.shape, y.shape
((123, 9), (123, 2), (123, 9, 10))
My problem is that I want the LSTMs' outputs to have a 3-D shape, in order to use my y tensor. I know I can reshape my y to a 2-D shape, but I want to use it as a 3-D array.
from keras.models import Model
from keras import layers
from keras import Input

# first input
list_input = Input(shape=(None,), dtype='int32', name='li')
embedded_list = layers.Embedding(100, 90)(list_input)
encoded_list = layers.LSTM(4, name="lstm1")(embedded_list)

# second input
size_input = Input(shape=(None,), dtype='int32', name='si')
embedded_size = layers.Embedding(100, 10)(size_input)
encoded_size = layers.LSTM(4, name="lstm2")(embedded_size)

# concatenate
concatenated = layers.concatenate([encoded_size, encoded_list], axis=-1)
answer = layers.Dense(90, activation='sigmoid', name='outpuy_layer')(concatenated)

model = Model([list_input, size_input], answer)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[f1])
Model summary:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
si (InputLayer) (None, None) 0
____________________________________________________________________________________________________
li (InputLayer) (None, None) 0
____________________________________________________________________________________________________
embedding_16 (Embedding) (None, None, 10) 1000 si[0][0]
____________________________________________________________________________________________________
embedding_15 (Embedding) (None, None, 90) 9000 li[0][0]
____________________________________________________________________________________________________
lstm2 (LSTM) (None, 4) 240 embedding_16[0][0]
____________________________________________________________________________________________________
lstm1 (LSTM) (None, 4) 1520 embedding_15[0][0]
____________________________________________________________________________________________________
concatenate_8 (Concatenate) (None, 8) 0 lstm2[0][0]
lstm1[0][0]
____________________________________________________________________________________________________
outpuy_layer (Dense) (None, 90) 810 concatenate_8[0][0]
====================================================================================================
Total params: 12,570
Trainable params: 12,570
Non-trainable params: 0
One more time, the question is:
How do I get an output shape from the LSTMs like (None, None, None/10)?
By default, Keras keeps only the output of the last timestep of an LSTM and discards the rest, which produces a 2-D array. To get a 3-D array (i.e. the output of every timestep), instantiate the layer with return_sequences set to True. In your case, for example:
encoded_list = layers.LSTM(4, name = "lstm1", return_sequences=True)(embedded_list)
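The shape rule can be summarized as follows (a plain-Python sketch of what Keras reports in summary(), not Keras itself; the helper name is mine):

```python
def lstm_output_shape(batch, timesteps, units, return_sequences):
    # return_sequences=False: only the last timestep's output survives
    return (batch, timesteps, units) if return_sequences else (batch, units)

print(lstm_output_shape(None, None, 4, False))  # (None, 4): the 2-D case in the summary above
print(lstm_output_shape(None, None, 4, True))   # (None, None, 4): one output per timestep
```

Note that downstream layers change too: with return_sequences=True on both LSTMs, the concatenate and Dense layers will operate per timestep, giving the 3-D output you want.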
