What is the difference between Activation layer and activation keyword argument - python

What is the difference between the activation kwarg and the Activation layer in TensorFlow?
Here's an example:
activation kwarg:
model.add(Dense(64,activation="relu"))
Activation layer:
model.add(Dense(64))
model.add(Activation("sigmoid"))
PS: I'm new to TensorFlow.

In Dense(64, activation="relu"), the relu activation function becomes part of the Dense layer and is called automatically whenever that Dense layer is called.
In Activation("relu"), the relu activation function is a layer in itself, decoupled from the Dense layer. This is necessary if you want a reference to the tensor after Dense but before the activation, for, say, branching purposes:
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Activation

input_tensor = Input((10,))
intermediate_tensor = Dense(64)(input_tensor)
branch_1_tensor = Activation('relu')(intermediate_tensor)
branch_2_tensor = Dense(64)(intermediate_tensor)
final_tensor = branch_1_tensor + branch_2_tensor
model = Model(inputs=input_tensor, outputs=final_tensor)
However, your model is a Sequential model, so your two samples are effectively equal: the relu activation function will be called automatically either way. To obtain a reference to the tensor before the Activation in this case, you can go through model.layers and take the output of the Dense layer from there.
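For example, a minimal sketch of that approach (the input shape here is an assumption, since the question omits it) that exposes the pre-activation tensor of the Sequential model:
from tensorflow.keras import Model, Sequential
from tensorflow.keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(64, input_shape=(10,)))   # input shape assumed for the sketch
model.add(Activation("sigmoid"))

# Output of the Dense layer, before the Activation layer.
pre_activation = Model(inputs=model.inputs, outputs=model.layers[0].output)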

Related

Replace activation function of MobileNetV2 in tensorflow

I am trying to replace all ReLU activation functions in MobileNetV2 with some custom activation functions (abs, swish, leaky ReLU, etc.). First I would like to start with abs.
I saw a few similar posts, but they were not really helpful for my problem -> see this one.
backbone = tf.keras.applications.mobilenet_v2.MobileNetV2(
    input_shape=IN_SHAPE, include_top=False, weights=None)
x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
x = tf.keras.layers.Dropout(dropout_rate)(x)
kernel_initializer = tf.random_normal_initializer(mean=0.0, stddev=0.02)
bias_initializer = tf.constant_initializer(value=0.0)
x = tf.keras.layers.Dense(
    NUM_CLASSES, activation='sigmoid', name='Logits',
    kernel_initializer=kernel_initializer,
    bias_initializer=bias_initializer)(x)
model = tf.keras.models.Model(inputs=backbone.input, outputs=x)

for layer in model.layers:
    layer.trainable = True

smooth = 0.1
loss = tf.keras.losses.BinaryCrossentropy(label_smoothing=smooth)

model = replace_relu_with_abs(model)  # function I would like to call to replace the activation functions

model.compile(
    optimizer=optimizer,
    loss=loss,
    metrics=['accuracy'])

model.save("mobilenetv2-abs")
model = tf.keras.models.load_model("mobilenetv2-abs")
print(model.summary())

# Here is the function I would like to implement
def replace_relu_with_abs(model):
    for layer in model.layers:
        pass  # do something
    return model
I tried to implement the solution from the link above, but it didn't work. After debugging it, I saw that MobileNetV2 uses a linear activation in its Conv2D layers, and the ReLU activation is a separate layer that comes after the BatchNormalization layer.
Does anyone have a tip or a solution for replacing the ReLU activations with custom ones (here they are actually ReLU activation layers)?
I am using TensorFlow version 2.6.
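A minimal sketch of one possible approach (untested against this exact model): tf.keras.models.clone_model accepts a clone_function, which can swap the standalone ReLU layers for a custom Activation layer while the weights are copied over:
import tensorflow as tf

def replace_relu_with_abs(model):
    def clone_fn(layer):
        # MobileNetV2's ReLUs are standalone layers after BatchNormalization,
        # so the layer itself is replaced rather than an `activation` kwarg.
        if isinstance(layer, tf.keras.layers.ReLU):
            return tf.keras.layers.Activation(tf.abs, name=layer.name + "_abs")
        return layer.__class__.from_config(layer.get_config())

    new_model = tf.keras.models.clone_model(model, clone_function=clone_fn)
    new_model.set_weights(model.get_weights())  # ReLU/Activation layers hold no weights
    return new_model
Note that saving and reloading such a model may require passing the custom activation via custom_objects.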

What layers are affected by dropout layer in Tensorflow?

Consider transfer learning in order to use a pretrained model in Keras/TensorFlow. For each old layer, the trainable parameter is set to False so that its weights are not updated during training, whereas the last layer(s) have been substituted with new layers that must be trained. In particular, two fully connected hidden layers with 512 and 1024 neurons and ReLU activation have been added. After these layers, a Dropout layer is used with rate 0.2, meaning that during each training epoch 20% of the neurons are randomly discarded.
Which layers does this dropout layer affect? Does it affect the whole network, including the pretrained layers for which layer.trainable = False has been set, or only the newly added layers? Or does it affect only the previous layer (i.e., the one with 1024 neurons)?
In other words, which layer(s) do the neurons that are turned off by dropout during each epoch belong to?
import os

from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.optimizers import RMSprop

local_weights_file = 'weights.h5'

pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights=None)
pre_trained_model.load_weights(local_weights_file)

for layer in pre_trained_model.layers:
    layer.trainable = False

# pre_trained_model.summary()

last_layer = pre_trained_model.get_layer('mixed7')
last_output = last_layer.output

# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add two fully connected layers with 512 and 1,024 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)

model = Model(pre_trained_model.input, x)
model.compile(optimizer=RMSprop(lr=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
The dropout layer will affect the output of the previous layer.
If we look at the specific part of your code:
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense (1, activation='sigmoid')(x)
In your case, 20% of the output of the layer defined by x = layers.Dense(1024, activation='relu')(x) will be dropped at random, before being passed to the final Dense layer.
Only the previous layer's neurons are "turned off", but all layers are "affected" in terms of backprop.
Later layers: Dropout's output is the input to the next layer, so the next layer's outputs will change, and so will the next-next layer's, and so on.
Previous layers: as the "effective output" of the pre-Dropout layer is changed, so are the gradients to it, and thus all subsequent gradients. In the extreme case of Dropout(rate=1), zero gradient will flow.
Also, note that whole neurons are only dropped if the input to Dropout is 2D (batch_size, features); Dropout applies a random uniform mask to all dimensions, which is equivalent to dropping whole neurons only in the 2D case. To drop whole neurons in the 3D case, set Dropout(.2, noise_shape=(batch_size, 1, features)). To drop the same neurons across all samples, use noise_shape=(1, 1, features) (or (1, features) for 2D).
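A short sketch of those noise_shape variants (the shapes below are illustrative, not taken from the question's model):
import tensorflow as tf

x = tf.random.uniform((8, 10, 64))  # (batch_size, timesteps, features)

# Default: an independent dropout mask per element.
drop_elementwise = tf.keras.layers.Dropout(0.2)

# Drop whole neurons: the same features are zeroed across all timesteps.
drop_neurons = tf.keras.layers.Dropout(0.2, noise_shape=(8, 1, 64))

# Drop the same neurons across every sample in the batch.
drop_shared = tf.keras.layers.Dropout(0.2, noise_shape=(1, 1, 64))

y = drop_neurons(x, training=True)  # training=True applies dropout outside of fit()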
The dropout technique is not applied to every single layer within a neural network; it is commonly used on the neurons in the last few layers of the network.
The technique works by randomly reducing the number of interconnecting neurons within a neural network. At every training step, each neuron has a chance of being left out, or rather, dropped out of the collated contribution from connected neurons.
There is some debate as to whether dropout should be placed before or after the activation function. As a rule of thumb, place the dropout after the activation function for all activation functions other than relu.
You can add dropout after every hidden layer, and generally it affects only the previous layer (in your case it will affect x = layers.Dense(1024, activation='relu')(x)). In the original paper that proposed dropout, Hinton (2012) used dropout (with p=0.5) on each of the fully connected (dense) layers before the output; it was not used on the convolutional layers. This became the most commonly used configuration.
Here are some resources that might help you:
https://towardsdatascience.com/understanding-and-implementing-dropout-in-tensorflow-and-keras-a8a3a02c1bfa
https://towardsdatascience.com/dropout-on-convolutional-layers-is-weird-5c6ab14f19b2
https://towardsdatascience.com/machine-learning-part-20-dropout-keras-layers-explained-8c9f6dc4c9ab

relu as a parameter in Dense() (or any other layer) vs ReLU as a layer in Keras

I was just wondering if there is any significant difference between the use and purpose of
Dense(activation='relu')
and
keras.layers.ReLU
How and where can the latter be used? My best guess is a Functional API use case, but I don't know how.
Creating a Layer instance and passing the activation as a parameter, i.e. activation='relu', is the same as creating the Layer instance and then adding a separate activation layer, e.g. a ReLU instance. ReLU() is a layer which applies the K.relu() function to its inputs:
class ReLU(Layer):
    ...

    def call(self, inputs):
        return K.relu(inputs,
                      alpha=self.negative_slope,
                      max_value=self.max_value,
                      threshold=self.threshold)
From the Keras documentation:
Usage of activations
Activations can either be used through an Activation layer, or through
the activation argument supported by all
forward layers:
from keras.layers import Activation, Dense
model.add(Dense(64))
model.add(Activation('tanh'))
This is equivalent to:
model.add(Dense(64, activation='tanh'))
You can also pass an element-wise TensorFlow/Theano/CNTK function as
an activation:
from keras import backend as K
model.add(Dense(64, activation=K.tanh))
Update:
Answering OP's additional question: How and where can the latter one be used?
You can use it when you use a layer that doesn't accept an activation parameter, e.g. tf.keras.layers.Add, tf.keras.layers.Subtract, etc., but you want a rectified output of such a layer as a result:
added = tf.keras.layers.Add()([x1, x2])
relu = tf.keras.layers.ReLU()(added)
The most obvious use case is when you need to put a ReLU without a Dense layer, for example when implementing ResNet: the design requires a ReLU activation after summing the residual connection, as shown here:
x = layers.add([x, shortcut])
x = layers.Activation('relu')(x)
return x
It is also useful when you want to put a BatchNormalization layer between the pre-activation output of a Dense layer and the ReLU activation. When using a GlobalAveragePooling classifier (as in the SqueezeNet architecture), you need to put a softmax activation after the GAP using Activation("softmax"), and there are no Dense layers in the network.
There are probably more cases; these are just samples.
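For instance, the two patterns above might look like this (a sketch with arbitrary shapes, not taken from a specific architecture):
from tensorflow.keras import Input, Model, layers

# Dense -> BatchNormalization -> ReLU, keeping the pre-activation separate.
inp = Input((32,))
h = layers.Dense(128)(inp)            # no activation kwarg here
h = layers.BatchNormalization()(h)
h = layers.ReLU()(h)

# GlobalAveragePooling classifier with no Dense layer (SqueezeNet-style head).
conv_features = Input((7, 7, 10))     # assume 10 per-class feature maps
pooled = layers.GlobalAveragePooling2D()(conv_features)
outputs = layers.Activation("softmax")(pooled)
model = Model(conv_features, outputs)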

what exactly is Dense in LSTM model description?

I am new to deep learning and working with Keras, so I want to know what Dense means when we have code like the one below.
I read https://keras.io/getting-started/sequential-model-guide/
and I also found explanations like: Dense implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True),
which didn't help me much!
model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])
Another name for a dense layer is fully-connected layer. It is the layer where each neuron is connected to all of the neurons in the next layer. It implements the operation output = X * W + b, where X is the input to the layer, and W and b are the weights and bias of the layer. W and b are the things you are actually trying to learn. If you want a more detailed explanation, please refer to this article.
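A quick numerical check of that formula (a minimal sketch; the sizes are arbitrary):
import numpy as np
import tensorflow as tf

x = np.random.rand(4, 784).astype("float32")  # a batch of 4 inputs
dense = tf.keras.layers.Dense(32)             # linear activation by default
y = dense(x)

W, b = dense.get_weights()                    # kernel and bias created by the layer
np.testing.assert_allclose(y.numpy(), x @ W + b, rtol=1e-4, atol=1e-4)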
A dense layer is a fully-connected layer, i.e. every neuron of layer N is connected to every neuron of layer N+1.
The code you wrote is not for an LSTM; it is a simple neural network with two fully connected layers, also known as dense layers. Here, Sequential means that the output of one layer is passed directly to the next layer, which is not sequence learning as in an LSTM.

Softmax activation in Keras

I am new to Keras.
I want to build a CNN with Keras.
Everything works fine until I want to add a softmax dense layer.
The error is TypeError: softmax() got an unexpected keyword argument 'axis'.
I searched Keras and tried to fix it:
from keras.layers import Softmax
output_layer = Dense(units=n_classes, activation=Softmax(axis=-1), kernel_initializer='uniform')
However, it still gives error plus the warning:
UserWarning: Do not pass a layer instance (such as Softmax) as the activation argument of another layer. Instead, advanced activation layers should be used just like any other layer in a model.
What is the correct method of applying softmax?
Thank you very much.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

cnn = Sequential()
kernelSize = (3, 3)
ip_activation = 'relu'
ip_conv_0 = Conv2D(filters=32, kernel_size=kernelSize, input_shape=im_shape, activation=ip_activation)
cnn.add(ip_conv_0)
ip_conv_0_1 = Conv2D(filters=64, kernel_size=kernelSize, activation=ip_activation)
cnn.add(ip_conv_0_1)
pool_0 = MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding="same")
cnn.add(pool_0)
drop_layer_0 = Dropout(0.2)
cnn.add(drop_layer_0)
flat_layer_0 = Flatten()
cnn.add(flat_layer_0)
h_dense_0 = Dense(units=128, activation=ip_activation, kernel_initializer='uniform')
cnn.add(h_dense_0)
h_dense_1 = Dense(units=64, activation=ip_activation, kernel_initializer='uniform')
cnn.add(h_dense_1)
op_activation = 'softmax'
output_layer = Dense(units=n_classes, activation=op_activation, kernel_initializer='uniform')
cnn.add(output_layer)  # <--------- error
TypeError: softmax() got an unexpected keyword argument 'axis'
Keras distinguishes between the Softmax layer and the softmax activation function; see here. As the error suggests, you are trying to pass a whole layer where a function is needed. The simplest solution would be to just pass a string:
output_layer = Dense(units=500, activation='softmax', kernel_initializer='uniform')
Another possibility would be to pass no activation function to the layer and apply an Activation layer afterwards.
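That alternative would look like this (a sketch reusing the question's cnn and n_classes):
from keras.layers import Dense, Activation

cnn.add(Dense(units=n_classes, kernel_initializer='uniform'))  # no activation here
cnn.add(Activation('softmax'))                                 # softmax applied as its own layer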
