I am trying to implement a multi-layer RNN model in TensorFlow 2.0. Using either tf.keras.layers.StackedRNNCells or tf.keras.layers.RNN produces identical results. Can anyone help me understand the differences between tf.keras.layers.RNN and tf.keras.layers.StackedRNNCells?
# driving parameters
sz_batch = 128
sz_latent = 200
sz_sequence = 196
sz_feature = 2
n_units = 120
n_layers = 3
Multilayer RNN with tf.keras.layers.RNN:
inputs = tf.keras.layers.Input(batch_shape=(sz_batch, sz_sequence, sz_feature))
cells = [tf.keras.layers.GRUCell(n_units) for _ in range(n_layers)]
outputs = tf.keras.layers.RNN(cells, stateful=True, return_sequences=True, return_state=False)(inputs)
outputs = tf.keras.layers.Dense(1)(outputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.summary()
returns:
Model: "model_13"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_88 (InputLayer) [(128, 196, 2)] 0
_________________________________________________________________
rnn_61 (RNN) (128, 196, 120) 218880
_________________________________________________________________
dense_19 (Dense) (128, 196, 1) 121
=================================================================
Total params: 219,001
Trainable params: 219,001
Non-trainable params: 0
Multilayer RNN with tf.keras.layers.RNN and tf.keras.layers.StackedRNNCells:
inputs = tf.keras.layers.Input(batch_shape=(sz_batch, sz_sequence, sz_feature))
cells = [tf.keras.layers.GRUCell(n_units) for _ in range(n_layers)]
outputs = tf.keras.layers.RNN(tf.keras.layers.StackedRNNCells(cells),
                              stateful=True,
                              return_sequences=True,
                              return_state=False)(inputs)
outputs = tf.keras.layers.Dense(1)(outputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.summary()
returns:
Model: "model_14"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_89 (InputLayer) [(128, 196, 2)] 0
_________________________________________________________________
rnn_62 (RNN) (128, 196, 120) 218880
_________________________________________________________________
dense_20 (Dense) (128, 196, 1) 121
=================================================================
Total params: 219,001
Trainable params: 219,001
Non-trainable params: 0
tf.keras.layers.RNN wraps the cells in a tf.keras.layers.StackedRNNCells itself if you give it a list or a tuple of cells, which is why the two versions build exactly the same model.
This is done in https://github.com/tensorflow/tensorflow/blob/v2.1.0/tensorflow/python/keras/layers/recurrent.py#L390
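A quick way to confirm this (a minimal sketch, assuming TF 2.x) is to build an RNN layer from a list of cells and check what it stores internally:
import tensorflow as tf

# Passing a list of cells makes RNN wrap them in StackedRNNCells itself
cells = [tf.keras.layers.GRUCell(4) for _ in range(3)]
layer = tf.keras.layers.RNN(cells)
print(isinstance(layer.cell, tf.keras.layers.StackedRNNCells))  # True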
In the example from 3b1b's video about neural networks (the video), the model has 784 "neurons" in the input layer, followed by two 16-neuron dense layers and a 10-neuron dense layer. (Please refer to the screenshot of the video provided below.) This makes sense, because, for example, each neuron in the input layer has 16 weights (as in xw), so the number of weights is 784*16, followed by 16*16 and 16*10. There are also biases, one for each neuron in the dense layers.
Then I made the same model in Tensorflow, and the model.summary() shows the following:
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 784, 1)] 0
dense_8 (Dense) (None, 784, 16) 32
dense_9 (Dense) (None, 784, 16) 272
dense_10 (Dense) (None, 784, 10) 170
=================================================================
Total params: 474
Trainable params: 474
Non-trainable params: 0
_________________________________________________________________
Code used to produce the above:
#I'm using Keras through Julia so the code may look different?
input_shape = (784,1)
inputs = layers.Input(input_shape)
outputs = layers.Dense(16)(inputs)
outputs = layers.Dense(16)(outputs)
outputs = layers.Dense(10)(outputs)
model = keras.Model(inputs, outputs)
model.summary()
This does not reflect the input shape at all. So I made another model with input_shape=(1,1), and I get the same total params:
Model: "model_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) [(None, 1, 1)] 0
dense_72 (Dense) (None, 1, 16) 32
dense_73 (Dense) (None, 1, 16) 272
dense_74 (Dense) (None, 1, 10) 170
=================================================================
Total params: 474
Trainable params: 474
Non-trainable params: 0
_________________________________________________________________
I don't think it's a bug; I probably just don't understand what these numbers mean and how the params are calculated.
Any help would be much appreciated.
A Dense layer is applied to the last dimension of its input. In your case that dimension is 1, not 784, so each Dense layer only sees one input feature. What you actually want is:
import tensorflow as tf
input_shape = (784, )
inputs = tf.keras.layers.Input(input_shape)
outputs = tf.keras.layers.Dense(16)(inputs)
outputs = tf.keras.layers.Dense(16)(outputs)
outputs = tf.keras.layers.Dense(10)(outputs)
model = tf.keras.Model(inputs, outputs)
model.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 784)] 0
dense_3 (Dense) (None, 16) 12560
dense_4 (Dense) (None, 16) 272
dense_5 (Dense) (None, 10) 170
=================================================================
Total params: 13,002
Trainable params: 13,002
Non-trainable params: 0
_________________________________________________________________
From the TF docs:
Note: If the input to the layer has a rank greater than 2, then Dense
computes the dot product between the inputs and the kernel along the
last axis of the inputs and axis 0 of the kernel (using tf.tensordot).
For example, if input has dimensions (batch_size, d0, d1), then we
create a kernel with shape (d1, units), and the kernel operates along
axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there
are batch_size * d0 such sub-tensors). The output in this case will
have shape (batch_size, d0, units).
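As a small sanity check of the behaviour quoted above (a sketch, not from the original answer), a Dense layer applied to a rank-3 input of shape (batch, 784, 1) builds a kernel of shape (1, units), which is exactly where the 32 parameters of dense_8 come from:
import tensorflow as tf

layer = tf.keras.layers.Dense(16)
x = tf.zeros((2, 784, 1))      # rank-3 input, last axis = 1
y = layer(x)
print(y.shape)                 # (2, 784, 16)
print(layer.kernel.shape)      # (1, 16) -> 1*16 weights + 16 biases = 32 params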
I am playing around with an NLP problem (sentence classification) and decided to use HuggingFace's TFBertModel along with Conv1D, Flatten, and Dense layers. I am using the functional API and my model compiles. However, during model.fit(), I get a shape error at the output Dense layer.
Model definition:
# Build model with a max length of 50 words in a sentence
max_len = 50
def build_model():
    bert_encoder = TFBertModel.from_pretrained(model_name)
    input_word_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_word_ids")
    input_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_mask")
    input_type_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_type_ids")
    # Create a conv1d model. The model may not really be useful or make sense, but that's OK (for now).
    embedding = bert_encoder([input_word_ids, input_mask, input_type_ids])[0]
    conv_layer = tf.keras.layers.Conv1D(32, 3, activation='relu')(embedding)
    dense_layer = tf.keras.layers.Dense(24, activation='relu')(conv_layer)
    flatten_layer = tf.keras.layers.Flatten()(dense_layer)
    output_layer = tf.keras.layers.Dense(3, activation='softmax')(flatten_layer)
    model = tf.keras.Model(inputs=[input_word_ids, input_mask, input_type_ids], outputs=output_layer)
    model.compile(tf.keras.optimizers.Adam(lr=1e-5), loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
# View model architecture
model = build_model()
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_word_ids (InputLayer) [(None, 50)] 0
__________________________________________________________________________________________________
input_mask (InputLayer) [(None, 50)] 0
__________________________________________________________________________________________________
input_type_ids (InputLayer) [(None, 50)] 0
__________________________________________________________________________________________________
tf_bert_model (TFBertModel) ((None, 50, 768), (N 177853440 input_word_ids[0][0]
input_mask[0][0]
input_type_ids[0][0]
__________________________________________________________________________________________________
conv1d (Conv1D) (None, 48, 32) 73760 tf_bert_model[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 48, 24) 792 conv1d[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 1152) 0 dense[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 3) 3459 flatten[0][0]
==================================================================================================
Total params: 177,931,451
Trainable params: 177,931,451
Non-trainable params: 0
__________________________________________________________________________________________________
# Fit model on input data
model.fit(train_input, train['label'].values, epochs=3, verbose=1, batch_size=16,
          validation_split=0.2)
And this is the error message:
ValueError: Input 0 of layer dense_1 is incompatible with the layer: expected axis -1 of input shape to have value 1152 but received
input with shape [16, 6168]
I am unable to understand how the input shape to layer dense_1 (the output Dense layer) can be 6168. As per the model summary, it should always be 1152.
The shape of your input is likely not what you expect. 6168 = 257 × 24, which corresponds to a Conv1D output length of 257, i.e. an actual tokenized sequence length of 259 rather than the expected max_len = 50. Check the shape of train_input and make sure the sequences are padded/truncated to 50.
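A quick way to verify this (a hypothetical check, assuming train_input is the list of three arrays fed to the model):
import numpy as np

# Each array should have shape (num_examples, 50); a longer second axis
# (e.g. 259) would explain the 6168 that dense_1 receives after Flatten.
for name, arr in zip(["input_word_ids", "input_mask", "input_type_ids"], train_input):
    print(name, np.asarray(arr).shape)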
I am trying to train an autoencoder in TensorFlow using the Keras layer API. This API is quite nice and easy to use for setting up deep learning layers.
Just to review quickly: an autoencoder (in my mind) is a function $f(x) = z$ together with a pseudo-inverse $\hat{x} = f^{-1}(z)$ such that $f^{-1}(f(x)) \approx x$. In a neural network model, you set up a network with a bottleneck layer that tries to reconstruct its own input $x$ as $f^{-1}(f(x))$. When the training error is minimized, you have two components: $z = f(x)$ is the part up to and including the bottleneck layer, and $f^{-1}(z)$ is the part from the bottleneck layer to the end.
So I setup the encoder:
SZ = 6
model = tf.keras.Sequential()
model.add(layers.InputLayer(SZ))
model.add(layers.Dense(SZ))
model.add(layers.Dense(1))
model.add(layers.Dense(SZ))
model.summary()
model.compile('sgd','mse',metrics = ['accuracy'])
history= model.fit(returns.values,returns.values,epochs=100)
My difficulty here is that the weights and components ($f$ being input + Dense(SZ) + Dense(1), $f^{-1}$ being Dense(1) + Dense(SZ)) are trained together, but I do not know how to disentangle them. Is there some way to break off the two parts of the neural network and have them treated as their own separate models?
import tensorflow as tf
SZ=6
encoder_input = tf.keras.layers.Input(shape=(SZ,))
x = tf.keras.layers.Dense(SZ)(encoder_input)
x = tf.keras.layers.Dense(1)(x)
encoder_model = tf.keras.Model(inputs=encoder_input, outputs=x, name='encoder')
decoder_input = tf.keras.layers.Input(shape=(1,))
x2 = tf.keras.layers.Dense(SZ)(decoder_input)
decoder_model = tf.keras.Model(inputs=decoder_input, outputs=x2, name='decoder')
encoder_output = encoder_model(encoder_input)
decoder_output = decoder_model(encoder_output)
encoder_decoder_model = tf.keras.Model(inputs=encoder_input , outputs=decoder_output, name='encoder-decoder')
encoder_decoder_model.summary()
Here is the summary:
Model: "encoder-decoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) [(None, 6)] 0
_________________________________________________________________
encoder (Model) (None, 1) 49
_________________________________________________________________
decoder (Model) (None, 6) 12
=================================================================
Total params: 61
Trainable params: 61
Non-trainable params: 0
You can train the encoder-decoder model, and your separate encoder_model and decoder_model will be trained automatically, since they share the same layers. You can also retrieve them from your encoder_decoder_model as follows:
retrieved_encoder = encoder_decoder_model.get_layer('encoder')
retrieved_encoder.summary()
it prints:
Model: "encoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) [(None, 6)] 0
_________________________________________________________________
dense_11 (Dense) (None, 6) 42
_________________________________________________________________
dense_12 (Dense) (None, 1) 7
=================================================================
Total params: 49
Trainable params: 49
Non-trainable params: 0
and the decoder:
retrieved_decoder = encoder_decoder_model.get_layer('decoder')
retrieved_decoder.summary()
which prints:
Model: "decoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) [(None, 1)] 0
_________________________________________________________________
dense_13 (Dense) (None, 6) 12
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
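A minimal usage sketch (assuming returns is the DataFrame from the question): train the combined model once, then call the sub-models separately.
encoder_decoder_model.compile('sgd', 'mse')
encoder_decoder_model.fit(returns.values, returns.values, epochs=100)

z = encoder_model.predict(returns.values)   # latent codes z = f(x)
x_hat = decoder_model.predict(z)            # reconstructions f^-1(z)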
If I have:
self.model.add(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True))
self.model.add(BatchNormalization())
self.model.add(Dropout(0.2))
then my seq_length specifies how many slices of data I want to process at once. If it matters, my model is a sequence-to-sequence (same size).
But if I have:
self.model.add(Bidirectional(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True)))
self.model.add(BatchNormalization())
self.model.add(Dropout(0.2))
then is that doubling the sequence size? Or at each time step, is it getting seq_length / 2 before and after that timestep?
Using a bidirectional LSTM layer has no effect on the sequence length.
I tested this with the following code:
from keras.models import Sequential
from keras.layers import Bidirectional,LSTM,BatchNormalization,Dropout,Input
model = Sequential()
lstm1_size = 50
seq_length = 128
feature_dim = 20
model.add(Bidirectional(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True)))
model.add(BatchNormalization())
model.add(Dropout(0.2))
batch_size = 32
model.build(input_shape=(batch_size,seq_length, feature_dim))
model.summary()
This resulted in the following output for the bidirectional model:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_1 (Bidirection (32, 128, 100) 28400
_________________________________________________________________
batch_normalization_1 (Batch (32, 128, 100) 400
_________________________________________________________________
dropout_1 (Dropout) (32, 128, 100) 0
=================================================================
Total params: 28,800
Trainable params: 28,600
Non-trainable params: 200
No bidirectional layer:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 128, 50) 14200
_________________________________________________________________
batch_normalization_1 (Batch (None, 128, 50) 200
_________________________________________________________________
dropout_1 (Dropout) (None, 128, 50) 0
=================================================================
Total params: 14,400
Trainable params: 14,300
Non-trainable params: 100
_________________________________________________________________
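The parameter counts line up with the standard LSTM formula (a quick arithmetic check, not part of the original output):
# LSTM params = 4 * (input_dim + units + 1) * units
#   lstm_1:          4 * (20 + 50 + 1) * 50 = 14,200
#   bidirectional_1: 2 * 14,200             = 28,400 (forward + backward LSTM)
# The time dimension stays at seq_length = 128; only the feature dimension
# doubles (50 -> 100), because the two directions are concatenated by default.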
Right now I have the following model called model1:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) (None, 101, 101, 1) 0
__________________________________________________________________________________________________
up_sampling2d_2 (UpSampling2D) (None, 202, 202, 1) 0 input_3[0][0]
__________________________________________________________________________________________________
zero_padding2d_36 (ZeroPadding2 (None, 256, 256, 1) 0 up_sampling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 256, 256, 3) 6 zero_padding2d_36[0][0]
__________________________________________________________________________________________________
u-resnet34 (Model) (None, 256, 256, 1) 24453178 conv2d_3[0][0]
__________________________________________________________________________________________________
input_4 (InputLayer) (None, 1, 1, 1) 0
__________________________________________________________________________________________________
cropping2d_2 (Cropping2D) (None, 202, 202, 1) 0 u-resnet34[1][0]
__________________________________________________________________________________________________
lambda_3 (Lambda) (None, 1, 1, 1) 0 input_4[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 101, 101, 1) 0 cropping2d_2[0][0]
__________________________________________________________________________________________________
lambda_4 (Lambda) (None, 101, 101, 1) 0 lambda_3[0][0]
__________________________________________________________________________________________________
concatenate_10 (Concatenate) (None, 101, 101, 2) 0 max_pooling2d_2[0][0]
lambda_4[0][0]
__________________________________________________________________________________________________
conv2d_14 (Conv2D) (None, 101, 101, 1) 3 concatenate_10[0][0]
==================================================================================================
Total params: 24,453,187
Trainable params: 24,437,821
Non-trainable params: 15,366
_____________________________________
The u-resnet34 layer is another model that has many more layers inside it. I can print the summary of it and I can freeze any layer I want.
When I freeze layers of u-resnet34 and print the summary, I can see that the Trainable params decrease accordingly.
However, even though I'm freezing layers of a model inside model1, the Trainable params of model1 don't decrease.
How can I freeze layers of u-resnet34 and make it reflect on the Trainable params of model1?
Edit:
Below is my code:
# https://github.com/qubvel/segmentation_models
from segmentation_models import Unet
from keras.models import Model
from keras.layers import Input, Cropping2D, Conv2D
inputs = Input((256, 256, 3))
resnetmodel = Unet(backbone_name='resnet34', encoder_weights='imagenet', input_shape=(256, 256, 3), activation=None)
outputs = resnetmodel(inputs)
outputs = Cropping2D(cropping=((27, 27), (27, 27)) ) (outputs)
outputs = Conv2D(1, (1, 1), activation='sigmoid') (outputs)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
That outputs:
Total params: 24,453,180
Trainable params: 24,437,814
Non-trainable params: 15,366
Then:
for layer in resnetmodel.layers:
    layer.trainable = False
resnetmodel.summary()
Which outputs:
Total params: 24,453,178
Trainable params: 0
Non-trainable params: 24,453,178
Finally this:
model.summary()
Which outputs this:
Total params: 48,890,992
Trainable params: 24,437,814
Non-trainable params: 24,453,178
Let's look at ResNet50 as an example.
from keras.models import Model
from keras.layers import Input, Dense
from keras.applications.resnet50 import ResNet50
res = ResNet50()
res.summary()
#....
#Total params: 25,636,712
#Trainable params: 25,583,592
#Non-trainable params: 53,120
The ResNet model has a lot of parameters to train.
Let's put this as a layer of a model.
x = Input((224,224,3))
y = res(x)
y = Dense(10)(y)
model = Model(x, y)
model.summary()
#.....
#Total params: 25,646,722
#Trainable params: 25,593,602
#Non-trainable params: 53,120
Freeze the layers of resnet.
for layer in res.layers:
    layer.trainable = False
res.summary()
# ....
#Total params: 25,636,712
#Trainable params: 0
#Non-trainable params: 25,636,712
This is reflected in the model that uses ResNet as well.
model.summary()
#.....
#Total params: 25,646,722
#Trainable params: 10,010
#Non-trainable params: 25,636,712
So, freezing the layers of an inner model should be reflected in the outer model.
EDIT
If you compiled the outer model before freezing the inner model's layers, you need to compile it again for the change to take effect.
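Applied to the code from the question, a minimal sketch would be (assuming the variables defined there):
for layer in resnetmodel.layers:
    layer.trainable = False

# Recompile so the outer model picks up the new trainable/non-trainable split
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()   # Trainable params should now drop to just the final Conv2D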