I am trying to replace all ReLU activation functions in MobileNetV2 with custom activation functions (abs, swish, leaky ReLU, etc.). I'd like to start with abs.
I saw a few similar posts, but they were not really helpful for my problem -> see this one.
backbone = tf.keras.applications.mobilenet_v2.MobileNetV2(
input_shape=IN_SHAPE, include_top=False, weights=None)
x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
x = tf.keras.layers.Dropout(dropout_rate)(x)
kernel_initializer = tf.random_normal_initializer(mean=0.0, stddev=0.02)
bias_initializer = tf.constant_initializer(value=0.0)
x = tf.keras.layers.Dense(
NUM_CLASSES, activation='sigmoid', name='Logits',
kernel_initializer=kernel_initializer,
bias_initializer=bias_initializer)(x)
model = tf.keras.models.Model(inputs=backbone.input, outputs=x)
for layer in model.layers:
    layer.trainable = True
smooth = 0.1
loss = tf.keras.losses.BinaryCrossentropy(label_smoothing=smooth)
model = replace_relu_with_abs(model)  # the function I'd like to call to replace the activation functions
model.compile(
optimizer=optimizer,
loss=loss,
metrics=['accuracy'])
model.save("mobilenetv2-abs")
model = tf.keras.models.load_model("mobilenetv2-abs")
print(model.summary())
# Here is the function I'd like to implement
def replace_relu_with_abs(model):
    for layer in model.layers:
        pass  # do something
    return model
I tried to implement the solution from the link above, but it didn't work. After debugging, I saw that MobileNetV2's Conv2D layers use a linear activation, and the ReLU activation is a separate layer that comes after the BatchNormalization layer.
Does anyone have a tip or solution for replacing the ReLU activations with custom ones (here they are actually separate ReLU activation layers)?
I am using TensorFlow version 2.6.
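One possible sketch, based only on the structure described above (the ReLUs in MobileNetV2 are standalone tf.keras.layers.ReLU layers): clone the model with a custom clone_function and swap each ReLU layer for an Activation layer wrapping tf.abs. This is an assumption about how replace_relu_with_abs could be implemented, not the only way to do it.

import tensorflow as tf

def replace_relu_with_abs(model):
    # Swap every standalone ReLU layer for an Activation(tf.abs) layer;
    # every other layer is recreated unchanged from its config.
    def swap_relu(layer):
        if isinstance(layer, tf.keras.layers.ReLU):
            return tf.keras.layers.Activation(tf.abs, name=layer.name)
        return layer.__class__.from_config(layer.get_config())

    new_model = tf.keras.models.clone_model(model, clone_function=swap_relu)
    # ReLU and Activation layers carry no weights, so the weight lists still line up.
    new_model.set_weights(model.get_weights())
    return new_model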
Related
Could you help me with the code so that, along with the dense layers, the last convolutional layer of EfficientNet is trained as well?
features_url ="https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet21k_b3/feature_vector/2"
img_shape = (299,299,3)
features_layer = hub.KerasLayer(features_url,
input_shape=img_shape)
# below commented code keeps all the cnn layers frozen, thus it does not work for me at the moment
#features_layer.trainable = False
model = tf.keras.Sequential([
features_layer,
tf.keras.layers.Dense(256, activation = 'relu'),
tf.keras.layers.Dense(64, activation = 'relu'),
tf.keras.layers.Dense(4, activation = 'softmax')
])
In addition, how can I save the name of the last convolutional layer in a variable?
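Two hedged sketches, under different assumptions. hub.KerasLayer does accept a trainable argument, but it unfreezes the whole module and does not expose the internal conv layers as Keras layers. Picking out a single conv layer by name is easier if you switch to a tf.keras.applications EfficientNet (EfficientNetB3 below, a related but not identical model to the V2 Hub module).

import tensorflow as tf
import tensorflow_hub as hub

# Sketch A: make the whole Hub feature extractor trainable (all or nothing).
features_layer = hub.KerasLayer(features_url, input_shape=img_shape, trainable=True)

# Sketch B (assumes switching to tf.keras.applications is acceptable): the conv
# layers are ordinary Keras layers, so you can unfreeze selected ones and keep
# the name of the last conv layer in a variable.
backbone = tf.keras.applications.EfficientNetB3(
    include_top=False, input_shape=img_shape, weights='imagenet')
conv_layers = [l for l in backbone.layers if isinstance(l, tf.keras.layers.Conv2D)]
last_conv_name = conv_layers[-1].name  # the variable asked about above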
What is the difference between the activation kwarg and the Activation layer in TensorFlow?
Here's an example:
activation kwarg:
model.add(Dense(64,activation="relu"))
Activation layer:
model.add(Dense(64))
model.add(Activation("sigmoid"))
PS: I'm new to TensorFlow.
In Dense(64, activation="relu"), the relu activation function becomes part of the Dense layer and will be called automatically whenever this Dense layer is called.
In Activation("relu"), the relu activation function is a layer by itself and is decoupled from the Dense layer. This is necessary if you want a reference to the tensor after Dense but before the activation, for, say, branching purposes.
from tensorflow.keras.layers import Input, Dense, Activation
from tensorflow.keras.models import Model

input_tensor = Input((10,))
intermediate_tensor = Dense(64)(input_tensor)
branch_1_tensor = Activation('relu')(intermediate_tensor)
branch_2_tensor = Dense(64)(intermediate_tensor)
final_tensor = branch_1_tensor + branch_2_tensor
model = Model(inputs=input_tensor, outputs=final_tensor)
However, your model is a Sequential model, so your two samples are effectively equivalent: the relu activation function will be called automatically. To obtain a reference to the tensor before the Activation in this case, you can go through model.layers and take the output of the Dense layer from within.
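A minimal sketch of that last point (the layer sizes are illustrative): wrap the Dense layer's output in a new Model to get the pre-activation tensor out of a Sequential model.

from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.models import Sequential, Model

seq = Sequential([Dense(64, input_shape=(10,)), Activation('relu')])
# seq.layers[0] is the Dense layer; its output is the tensor before the Activation.
pre_activation = Model(inputs=seq.input, outputs=seq.layers[0].output)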
From all I know, a pretrained CNN should do much better than a CNN trained from scratch. I have a dataset of 855 images. I applied a CNN and got 94% accuracy. Then I applied pretrained models (VGG16, ResNet50, Inception_V3, MobileNet), also with fine-tuning, but the best I got was 60%, and two of them do very badly on classification. Can a plain CNN really do better than a pretrained model, or is my implementation wrong? I've resized my images to 100 by 100 and followed the Keras applications approach. So what is the issue?
Naive CNN approach:
def cnn_model():
    size = (100, 100, 1)
    num_cnn_layers = 2
    NUM_FILTERS = 32
    KERNEL = (3, 3)
    MAX_NEURONS = 120
    model = Sequential()
    for i in range(1, num_cnn_layers + 1):
        if i == 1:
            model.add(Conv2D(NUM_FILTERS*i, KERNEL, input_shape=size,
                             activation='relu', padding='same'))
        else:
            model.add(Conv2D(NUM_FILTERS*i, KERNEL, activation='relu',
                             padding='same'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(int(MAX_NEURONS), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(int(MAX_NEURONS/2), activation='relu'))
    model.add(Dropout(0.4))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model
VGG16 approach:
def vgg():
    vgg_model = keras.applications.vgg16.VGG16(weights='imagenet', include_top=False, input_shape=(100, 100, 3))
    model = Sequential()
    for layer in vgg_model.layers:
        model.add(layer)
    # Freeze the layers
    for layer in model.layers:
        layer.trainable = False
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(3, activation='softmax'))
    model.compile(optimizer=keras.optimizers.Adam(lr=1e-5),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
What you're referring to as a CNN in both cases is the same thing: a type of neural network model. It's just that the pretrained model has been trained on other data instead of the dataset you're working on and trying to classify.
What is usually used here is called transfer learning. Instead of freezing all the layers, try leaving the last few layers open so they can be retrained on your own data; that way the pretrained model can adjust its weights and biases to match your needs as well. It could be the case that the dataset you're trying to classify is foreign to the pretrained models.
Here's an example from my own work. There are additional pieces of code, but you can make it work with your own code; the logic remains the same.
#You extract the layer which you want to manipulate, usually the last few.
last_layer = pre_trained_model.get_layer(name_of_layer)
# Flatten that layer's output to 1 dimension
x = layers.Flatten()(last_layer.output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024,activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1,activation='sigmoid')(x)
#Here we combine your newly added layers and the pre-trained model.
model = Model( pre_trained_model.input, x)
model.compile(optimizer = RMSprop(lr=0.0001),
loss = 'binary_crossentropy',
metrics = ['accuracy'])
Adding to what @Ilknur Mustafa mentioned: since your dataset may be foreign to the images used for pre-training, you can try re-training the last few layers of the pre-trained model instead of adding whole new layers. The example code below doesn't add any trainable layer other than the output layer. This way, you benefit from retraining the last few layers starting from the existing weights rather than training from scratch, which may help if you don't have a large dataset to train on.
# load model without classifier layers
vgg = VGG16(include_top=False, input_shape=(100, 100, 3), weights='imagenet', pooling='avg')
# make only last 2 conv layers trainable
for layer in vgg.layers[:-4]:
    layer.trainable = False
# add output layer
out_layer = Dense(3, activation='softmax')(vgg.layers[-1].output)
model_pre_vgg = Model(vgg.input, out_layer)
# compile model
opt = SGD(lr=1e-5)
model_pre_vgg.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])
I have a neural network model that was created in convnet.js and that I have to define in Keras. Does anyone have an idea how I can do that?
neural = {
net : new convnetjs.Net(),
layer_defs : [
{type:'input', out_sx:4, out_sy:4, out_depth:1},
{type:'fc', num_neurons:25, activation:"regression"},
{type:'regression', num_neurons:5}
],
neuralDepth: 1
}
This is what I could do so far. I can't be sure if it's correct.
#---Build Model-----
from keras import models, layers
model = models.Sequential()
# Input - Layer
model.add(layers.Dense(4, activation = "relu", input_shape=(4,)))
# Hidden - Layers
model.add(layers.Dense(25, activation = "relu"))
model.add(layers.Dense(5, activation = "relu"))
# Output- Layer
model.add(layers.Dense(1, activation = "linear"))
model.summary()
# Compile Model
model.compile(loss= "mean_squared_error" , optimizer="adam", metrics=["mean_squared_error"])
From the Convnet.js doc : "your last layer must be a loss layer ('softmax' or 'svm' for classification, or 'regression' for regression)."
Also : "Create a regression layer which takes a list of targets (arbitrary numbers, not necessarily a single discrete class label as in softmax/svm) and backprops the L2 Loss."
It's unclear. I suspect the 'regression' layer is just another layer of Dense (fully connected) neurons. The word 'regression' probably refers to a linear activation, so no 'relu' this time?
Anyway, it would probably look something like this (without the Sequential mode):
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam

my_input = Input(shape=(4,))
x = Dense(25, activation='relu')(my_input)
x = Dense(5)(x)  # linear output, matching the 5-neuron regression layer
my_model = Model(inputs=my_input, outputs=x)
my_model.compile(optimizer=Adam(LEARNING_RATE), loss='mse', metrics=['mse'])
After reading a bit of the docs, convnet.js seems like a nice project. It would be even better with somebody with neural-network knowledge on board.
I am trying to optimize the hyperparameters of my NN using Keras and sklearn.
I am wrapping it with KerasClassifier (it's a classification problem).
I am trying to optimize the number of hidden layers.
I can't figure out how to do it with Keras (specifically, I am wondering how to set up the create_model function so that the number of hidden layers can be tuned).
Could anyone please help me?
My code (just the important part):
## Import `Sequential` from `keras.models`
from keras.models import Sequential
# Import `Dense` from `keras.layers`
from keras.layers import Dense
def create_model(optimizer='adam', activation='sigmoid'):
    # Initialize the constructor
    model = Sequential()
    # Add an input layer
    model.add(Dense(5, activation=activation, input_shape=(5,)))
    # Add one hidden layer
    model.add(Dense(8, activation=activation))
    # Add an output layer
    model.add(Dense(1, activation=activation))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer,
                  metrics=['accuracy'])
    return model
my_classifier = KerasClassifier(build_fn=create_model, verbose=0)

# Create hyperparameter space
epochs = [5, 10]
batches = [5, 10, 100]
optimizers = ['rmsprop', 'adam']
activation1 = ['relu', 'sigmoid']

# Create hyperparameter options
hyperparameters = dict(optimizer=optimizers, epochs=epochs,
                       batch_size=batches, activation=activation1)

# Create grid search
grid = RandomizedSearchCV(estimator=my_classifier,
                          param_distributions=hyperparameters)  # insert param_distributions

# Fit grid search
grid_result = grid.fit(X_train, y_train)

# View hyperparameters of best neural network
grid_result.best_params_
If you want to make the number of hidden layers a hyperparameter, you have to add it as a parameter to your KerasClassifier build_fn, like this:
def create_model(optimizer='adam', activation='sigmoid', hidden_layers=1):
    # Initialize the constructor
    model = Sequential()
    # Add an input layer
    model.add(Dense(5, activation=activation, input_shape=(5,)))
    for i in range(hidden_layers):
        # Add one hidden layer
        model.add(Dense(8, activation=activation))
    # Add an output layer
    model.add(Dense(1, activation=activation))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer,
                  metrics=['accuracy'])
    return model
Then you will be able to optimize the number of hidden layers by adding it to the dictionary that is passed to RandomizedSearchCV's param_distributions.
One more thing: you should probably separate the activation used for the output layer from the one used in the other layers, as sketched below.
Different classes of activation functions are suitable for hidden layers and for the output layer of a binary classifier.
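A small sketch of that idea (the extra hidden_layers values and the choice of relu for hidden layers are illustrative, not from the original answer): keep the hidden-layer activation tunable, fix the output activation to sigmoid for binary classification, and add hidden_layers to the search space.

from keras.models import Sequential
from keras.layers import Dense

def create_model(optimizer='adam', activation='relu', hidden_layers=1):
    model = Sequential()
    model.add(Dense(5, activation=activation, input_shape=(5,)))
    for _ in range(hidden_layers):
        model.add(Dense(8, activation=activation))
    # Output activation fixed to sigmoid, independent of the tuned hidden activation
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer,
                  metrics=['accuracy'])
    return model

# hidden_layers becomes just another entry in the search space
hyperparameters = dict(optimizer=optimizers, epochs=epochs, batch_size=batches,
                       activation=activation1, hidden_layers=[1, 2, 3])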