What layers are affected by a dropout layer in TensorFlow? - python

Consider transfer learning in order to use a pretrained model in Keras/TensorFlow. For each old layer, trainable is set to False so that its weights are not updated during training, whereas the last layer(s) have been substituted with new layers that must be trained. In particular, two fully connected hidden layers with 512 and 1024 neurons and a ReLU activation function have been added. After these layers a Dropout layer is used with rate 0.2. This means that during each training epoch 20% of the neurons are randomly discarded.
What layers does this dropout layer affect? Does it affect the whole network, including the pretrained layers for which layer.trainable = False has been set, or does it affect only the newly added layers? Or does it affect only the previous layer (i.e., the one with 1024 neurons)?
In other words, which layer(s) do the neurons that are turned off during each epoch by the dropout belong to?
import os

from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.optimizers import RMSprop

local_weights_file = 'weights.h5'

pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights=None)
pre_trained_model.load_weights(local_weights_file)

# Freeze the pretrained layers so their weights are not updated during training
for layer in pre_trained_model.layers:
    layer.trainable = False

# pre_trained_model.summary()

last_layer = pre_trained_model.get_layer('mixed7')
last_output = last_layer.output

# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add two fully connected layers with 512 and 1,024 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)

model = Model(pre_trained_model.input, x)
model.compile(optimizer=RMSprop(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

The dropout layer will affect the output of the previous layer.
If we look at the specific part of your code:
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)
In your case, 20% of the output of the layer defined by x = layers.Dense(1024, activation='relu')(x) will be dropped at random, before being passed to the final Dense layer.
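As a quick sanity check (a minimal sketch, not taken from the original posts), you can call a Dropout layer directly on a tensor: with training=True roughly 20% of the values are zeroed and the survivors are scaled by 1/0.8, while with training=False the input passes through unchanged:

import tensorflow as tf

# Stand-in for the 1024-unit Dense output of the model above
x = tf.ones((1, 1024))
drop = tf.keras.layers.Dropout(0.2)

y_train = drop(x, training=True)    # dropout active, as during model.fit()
y_infer = drop(x, training=False)   # dropout inactive, as during inference

# Fraction of zeroed activations is roughly the dropout rate
print(float(tf.reduce_mean(tf.cast(y_train == 0.0, tf.float32))))  # ~0.2
# At inference time the layer is an identity
print(bool(tf.reduce_all(y_infer == x)))                           # True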

Only the previous layer's neurons are "turned off", but all layers are "affected" in terms of backprop.
Later layers: Dropout's output is the input to the next layer, so the next layer's outputs will change, and so will the next-next layer's, and so on.
Previous layers: as the "effective output" of the pre-Dropout layer is changed, so are the gradients flowing into it, and thus all gradients further upstream. In the extreme case of Dropout(rate=1), zero gradient will flow.
Also note that whole neurons are only dropped if the input to Dense is 2D, (batch_size, features); Dropout applies a random uniform mask over all dimensions, which is equivalent to dropping whole neurons only in that 2D case. To drop whole neurons in the 3D case, set Dropout(0.2, noise_shape=(batch_size, 1, features)). To drop the same neurons across all samples, use noise_shape=(1, 1, features) (or (1, features) in the 2D case).
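A small sketch of these noise_shape variants (the shapes below are purely illustrative, not from the original post):

import tensorflow as tf
from tensorflow.keras import layers

# Illustrative 3D input: batch of 4 samples, 10 timesteps, 8 features
x = tf.ones((4, 10, 8))

# Default: an independent mask over every element of the input
d_elementwise = layers.Dropout(0.2)
# Drop whole feature channels per sample, shared across timesteps
d_per_sample = layers.Dropout(0.2, noise_shape=(4, 1, 8))
# Drop the same feature channels for every sample and timestep
d_shared = layers.Dropout(0.2, noise_shape=(1, 1, 8))

for d in (d_elementwise, d_per_sample, d_shared):
    print(d(x, training=True).shape)  # the output shape is always (4, 10, 8)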

Dropout is not usually applied to every single layer of a neural network; it is most commonly used on the last few (fully connected) layers of the network.
The technique works by randomly reducing the number of interconnecting neurons within a neural network. At every training step, each neuron has a chance of being left out, or rather, dropped out of the collated contribution from connected neurons.
There's some debate as to whether the dropout should be placed before or after the activation function. As a rule of thumb, place the dropout after the activation function for all activation functions other than relu.
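For illustration (a sketch not taken from the answer above; the layer sizes are arbitrary), the two placements look like this in the Keras functional API:

from tensorflow.keras import layers, Model, Input

inputs = Input(shape=(64,))

# Dropout placed after the activation (the rule of thumb for non-ReLU activations)
x = layers.Dense(128, activation='tanh')(inputs)
x = layers.Dropout(0.2)(x)

# Dropout placed before the activation; with ReLU the two orders give the same
# result for a given mask, since ReLU(0) == 0 and ReLU(z / 0.8) == ReLU(z) / 0.8
x = layers.Dense(128)(x)            # linear pre-activation
x = layers.Dropout(0.2)(x)
x = layers.Activation('relu')(x)

outputs = layers.Dense(1, activation='sigmoid')(x)
model = Model(inputs, outputs)
model.summary()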
You can add dropout after every hidden layer, and generally it affects only the previous layer (in your case, x = layers.Dense(1024, activation='relu')(x)). In the original paper that proposed dropout layers, by Hinton (2012), dropout (with p=0.5) was used on each of the fully connected (dense) layers before the output; it was not used on the convolutional layers. This became the most commonly used configuration.
Here are some resources that might help you:
https://towardsdatascience.com/understanding-and-implementing-dropout-in-tensorflow-and-keras-a8a3a02c1bfa
https://towardsdatascience.com/dropout-on-convolutional-layers-is-weird-5c6ab14f19b2
https://towardsdatascience.com/machine-learning-part-20-dropout-keras-layers-explained-8c9f6dc4c9ab

Related

Can we have a dense layer between Conv layers in 1D CNN Architecture?

I have a question regarding one-dimensional convolutional neural networks (1D CNNs).
Can we have a dense layer between Conv layers in the architecture? just like what I have done in the following example:
Note: It is working correctly with CSV files for classification problems.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense, MaxPooling1D, Flatten, Dropout

model = Sequential()
# First convolutional layer
model.add(Conv1D(128, 5, input_shape=(20, 1), strides=2, padding='same'))
model.add(Dense(256, activation="relu"))
model.add(MaxPooling1D())
# Second convolutional layer
model.add(Conv1D(128, 3, strides=1, padding='same'))
model.add(Dense(64, activation="relu"))
model.add(MaxPooling1D())
# Passing to fully connected layers
model.add(Flatten())
model.add(Dense(32, activation='relu'))
# model.add(Dropout(0.02))
# Output layer
model.add(Dense(2, activation='sigmoid'))
# Model compilation
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
# Summary of the model
model.summary()
Thank you very much!
Yes, you can certainly do that. It is not usual at all, and not very advisable from a theoretical perspective, but it is possible.
Why is it not advisable? (theory) With convolutions one tries to capture spatial features (i.e. information). Values next to each other should have an influence, but values far away from this point (in time, in the case of time series data) should have less influence. That is the whole idea of CNNs. To a fully connected NN, the order in which the input is presented to it is not important. It looks at all inputs at the same time, since it is equally connected to all of them. So you lose spatial information. BTW, that is also the reason why it is plausible to do a global pooling before feeding the output of the CNN part of a model to the fully connected part (i.e. the dense layers).
Now if you do convolution, you care about spatial information. If you then apply a dense layer, you kind of say "I cared enough about the spatial info". If you then apply convolution again on the output vector of a dense layer, it becomes totally irrational.
feasibility
Nonetheless, such a network would be feasible. You would just need to make sure that the dense layer outputs a vector (or matrix) again, on which you can apply convolution.
However, your code lacks a proper adapter from the output of the convolution layer to the dense layer. You should apply some type of global pooling operation to create a vector that serves as an input to the dense layer (see the sketch below). That would also save you the Flatten() step. Again, it should work your way anyway; it is just about style, since you are currently sending mixed signals: Flatten() keeps all the spatial positions, but the dense layer then ignores the spatial information...
I don't get the point of applying MaxPooling1D after the Dense layer. One could simply reduce the number of outputs of the Dense layer. And you definitely don't need a Flatten after a Dense layer, as a Dense layer returns a vector by definition (and pooling won't add a dimension to it).
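A minimal sketch of the more conventional structure this answer hints at: the convolutional part first, a global pooling layer as the adapter, then the fully connected part (the layer sizes and the softmax output below are illustrative choices, not taken from the original post):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, GlobalAveragePooling1D, Dense

model = Sequential()
# Convolutional part: keeps the spatial (temporal) structure of the input
model.add(Conv1D(128, 5, input_shape=(20, 1), strides=2, padding='same', activation='relu'))
model.add(MaxPooling1D())
model.add(Conv1D(128, 3, strides=1, padding='same', activation='relu'))
# Adapter: collapse the temporal axis into a single feature vector (replaces Flatten())
model.add(GlobalAveragePooling1D())
# Fully connected part: operates on the pooled feature vector
model.add(Dense(32, activation='relu'))
model.add(Dense(2, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
model.summary()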

How many model parameters do we need to optimize for the following CNN model?

We use the following convolutional neural network to classify a set of 32×32 greyscale images (so the input size will be 32×32×1):
Layer 1: convolutional layer with the ReLU nonlinear activation function, 100 5×5 filters with stride 1.
Layer 2: 2×2 max-pooling layer
Layer 3: convolutional layer with the ReLU nonlinear activation function, 50 3×3 filters with stride 1.
Layer 4: 2×2 max-pooling layer
Layer 5: fully-connected layer
Layer 6: classification layer
How many model parameters do we need to optimize in the first layer and in the second layer (assume the bias term is used)?
You can use tensorflow to display the number of trainable parameters:
import tensorflow as tf
from tensorflow.keras import layers

def make_model():
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(100, (5, 5), strides=1, input_shape=[32, 32, 1]))
    model.add(layers.LeakyReLU())
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    return model

model = make_model()
model.summary()
This gives 2600 trainable params and 0 non-trainable params.
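This matches the hand calculation: each of the 100 filters in layer 1 has 5 × 5 × 1 = 25 weights plus 1 bias, giving 100 × 26 = 2600 parameters, while the max-pooling layer in layer 2 has no trainable parameters at all.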

Set starting weights individually for neural network

I have a simple feed-forward neural network consisting of 8 input neurons, followed by 2 hidden layers with 6 neurons each, and 1 output layer consisting of 1 output neuron.
The Keras code is:
model = Sequential()
model.add(Dense(6, input_dim = 8, activation='tanh'))
model.add(Dense(6, activation='tanh'))
model.add(Dense(1, activation='tanh'))
Question:
Since I know which of the 8 input parameters has the strongest impact on the single output, I could set its starting weights to a higher value relative to the other input parameters. If this were possible, it could reduce the training time significantly (if I am not wrong).
# read the initial weights and bias of the first hidden layer
layer_1 = model.layers[0]
# initial kernel of that layer, shape (8, 6)
w_1 = layer_1.get_weights()[0]
# set the weights for the nth input parameter to a modified value val
w_1[n, :] = val
# write back the modified kernel together with the unmodified bias;
# set_weights modifies the layer (and thus the model) in place,
# so no further assignment to model.layers is needed
layer_1.set_weights([w_1, layer_1.get_weights()[1]])
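A runnable version of the snippet above with a quick check (the values chosen for n and val here are purely illustrative):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(6, input_dim=8, activation='tanh'))
model.add(Dense(6, activation='tanh'))
model.add(Dense(1, activation='tanh'))

n, val = 3, 0.9                     # illustrative: boost the weights of the 4th input

layer_1 = model.layers[0]
w_1, b_1 = layer_1.get_weights()    # kernel shape (8, 6), bias shape (6,)
w_1[n, :] = val
layer_1.set_weights([w_1, b_1])     # modifies the layer in place

# the nth kernel row now holds the boosted starting weights
print(model.layers[0].get_weights()[0][n])   # -> [0.9 0.9 0.9 0.9 0.9 0.9]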

what exactly is Dense in LSTM model description?

I am new to deep learning and am working with Keras, so I want to know what Dense means when we have code like the one below:
I read https://keras.io/getting-started/sequential-model-guide/
and I also found some explanations like: Dense implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
That didn't help me so much!
model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])
Another name for a dense layer is fully-connected layer. It's the layer in which each neuron is connected to all of the neurons of the previous layer. It implements the operation output = X * W + b, where X is the input to the layer, and W and b are the weights and bias of the layer. W and b are actually the things you're trying to learn. If you want a more detailed explanation, please refer to this article.
A dense layer is a fully-connected layer, i.e. every neuron of layer N is connected to every neuron of layer N+1.
The code you wrote is not for an LSTM; it is a simple neural network of two fully connected layers, also known as dense layers. Here, Sequential means the output of one layer is passed directly to the next layer; this is not sequential learning in the LSTM sense.
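A small sketch (with illustrative shapes) confirming that Dense really is just a matrix multiplication plus a bias, followed by the activation:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense

x = tf.random.normal((5, 784))              # batch of 5 flattened inputs
dense = Dense(32, activation='relu')
y = dense(x)                                # builds the layer: kernel (784, 32), bias (32,)

W, b = dense.get_weights()
y_manual = np.maximum(x.numpy() @ W + b, 0.0)   # relu(x W + b) computed by hand

print(W.shape, b.shape)                         # (784, 32) (32,)
print(np.allclose(y.numpy(), y_manual, atol=1e-5))  # True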

How to switch Off/On an LSTM layer?

I am looking for a way to access the LSTM layers such that adding and removing a layer is event-driven, i.e. a layer can be added or removed when a function is triggered.
For Example (hypothetically):
Add an LSTM layer if a = 2 and remove an LSTM layer if a = 3.
Here, a = 2 and a = 3 are supposed to come from a Python function that returns a specific value, based on which the LSTM layer should be added or removed. I want to add a switch to the layer so that it can be switched on or off based on that Python function.
Is it possible?
Currently, I need to hard-code the layers I need. For example:
# Initialising the RNN
regressor = Sequential()

# Adding the first LSTM layer and some Dropout regularization
regressor.add(LSTM(units = 60, return_sequences = True,
                   input_shape = (X_train.shape[1], X_train.shape[2])))
# regressor.add(Dropout(0.1))

# Adding the 2nd LSTM layer and some Dropout regularization
regressor.add(LSTM(units = 60, return_sequences = True))
regressor.add(Dropout(0.1))
My goal is to both add and subtract these layers at runtime.
Any help is appreciated!!
I found the answer and posting in case anyone else is looking for the solution.
This can be done by using freeze Keras layer functionality. Basically, you need to pass the boolean trainable argument to the layer constructor to set it as non-trainable.
Eg:
frozen_layer = Dense(32, trainable=False)
Additionally, you can set the trainable property of a layer to True or False after instantiation; for the change to take effect, call compile() on your model after modifying the property. Eg:
x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)
frozen_model = Model(x, y)
# the weights of layer will not be updated during training for below model
frozen_model.compile(optimizer='rmsprop', loss='mse')
layer.trainable = True
trainable_model = Model(x, y)
# the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)
trainable_model.compile(optimizer='rmsprop', loss='mse')
frozen_model.fit(data, labels) # this does NOT update the weights of layer
trainable_model.fit(data, labels) # this updates the weights of layer
Hope this helps!!
