Set starting weights individually for neural network - python

I have a simple feed-forward neural network consisting of 8 input neurons, followed by 2 hidden layers with 6 neurons each, and an output layer with a single output neuron.
The Keras code is:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(6, input_dim=8, activation='tanh'))
model.add(Dense(6, activation='tanh'))
model.add(Dense(1, activation='tanh'))
Question:
Since I know which of the 8 input parameters have the strongest impact on the single output, I would like to set their starting weights to a higher value relative to the other input parameters. If this is possible, it could reduce the training time significantly (if I am not wrong).

# read the first (input-facing) layer of the model
layer_1 = model.layers[0]
# read the initial weights of that layer
w_1 = layer_1.get_weights()[0]
# set the weights for the nth input parameter to a modified value val
w_1[n, :] = val
# write back the modified weights together with the unmodified bias
layer_1.set_weights([w_1, layer_1.get_weights()[1]])
# writing layer_1 back into model.layers is not needed:
# set_weights() has already modified the layer in place
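For reference, a minimal end-to-end sketch of this approach (n = 0 and val = 0.5 are purely illustrative choices; the rest follows the model above):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(6, input_dim=8, activation='tanh'))
model.add(Dense(6, activation='tanh'))
model.add(Dense(1, activation='tanh'))

n, val = 0, 0.5                    # hypothetical: boost input parameter 0 to 0.5
layer_1 = model.layers[0]
w, b = layer_1.get_weights()       # w has shape (8, 6), b has shape (6,)
w[n, :] = val                      # overwrite all outgoing weights of input n
layer_1.set_weights([w, b])
print(model.layers[0].get_weights()[0][n])  # verify the new starting weights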


How to select number of hidden layers and number of memory cells in LSTM?
I want to make an LSTM model for classification.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(44000, 32))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
The number of memory cells can be set by passing the input_length parameter to your Embedding layer, as it is defined by the length of your input sequences. This is optional and can be inferred when training data is provided.
You can increase the number of hidden LSTM layers by simply adding more. However, you need to set return_sequences=True on the intermediate LSTM layers to maintain the temporal dimension, i.e.,
model = Sequential()
model.add(Embedding(44000, 32))  # 32-dim encoding is pretty small
model.add(LSTM(32, return_sequences=True))  # keep the temporal dimension
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
gives two LSTM layers.
There is a pretty comprehensive guide to using RNNs for text classification in the Tensorflow documentation.
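A minimal self-contained sketch of the stacked version (the vocabulary size of 44000 and the 32-dim embedding are taken from the snippet above; binary labels and padded integer sequences are assumed):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(44000, 32),             # token ids -> 32-dim vectors
    LSTM(32, return_sequences=True),  # intermediate layer keeps the time axis
    LSTM(32),                         # last LSTM returns only its final output
    Dense(1, activation='sigmoid'),   # binary classification head
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(padded_sequences, labels, ...) would follow, with integer-encoded,
# equal-length padded sequences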

Why is the tuned number of hidden layers (3) not the same as the number of units entries shown (4) when tuning an ANN model using KerasTuner?

I am currently using KerasTuner to tune my Artificial Neural Network (ANN) deep learning model for a binary classification project (tabular dataset). Below is my function to build the model:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

def build_model(hp):
    # Create a Sequential model
    model = tf.keras.Sequential()
    # Input layer: the model will take as input arrays of shape (None, 67)
    model.add(tf.keras.Input(shape=(X_train.shape[1],)))
    # Tune the number of hidden layers and the number of neurons per layer
    for i in range(hp.Int('num_layers', min_value=1, max_value=4)):
        hp_units = hp.Int(f'units_{i}', min_value=64, max_value=512, step=5)
        model.add(Dense(units=hp_units, activation='relu'))
    # Output layer
    model.add(Dense(units=1, activation='sigmoid'))
    # Compile the model
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss=keras.losses.BinaryCrossentropy(),
                  metrics=["accuracy"])
    return model
Code for creating the tuner:
import os
import keras_tuner as kt

# Hyperband algorithm from KerasTuner
hpb_tuner = kt.Hyperband(
    hypermodel=build_model,
    objective='val_accuracy',
    max_epochs=500,
    seed=42,
    executions_per_trial=3,
    directory=os.getcwd(),
    project_name="Medical Claim (ANN)",
)
hpb_tuner.search_space_summary()
The best result shows that I have to use 3 hidden layers. However, why are a total of 4 hidden-layer entries (units_0 to units_3) shown?
If I haven't misunderstood, the num_layers parameter indicates how many hidden layers I have to use in my ANN, and the parameters units_0 to units_3 indicate how many neurons I have to use in each hidden layer, where units_0 refers to the first hidden layer, units_1 to the second hidden layer, and so forth. The input layer of my ANN should equal the number of features in my dataset, which is 67 as shown in the build_model function above, so I believe units_0 does not refer to the number of neurons in the input layer.
Is there something wrong with my code? Hope any gurus here can solve my doubt and problem!
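A small sketch of how one might run the search and then inspect which hyperparameters the best trial actually uses (this assumes the tuner defined above and that X_train and y_train exist; the EarlyStopping callback and validation split are illustrative choices):

from tensorflow.keras.callbacks import EarlyStopping

# Run the search (Hyperband manages the number of epochs itself, up to max_epochs)
hpb_tuner.search(X_train, y_train,
                 validation_split=0.2,
                 callbacks=[EarlyStopping(monitor='val_loss', patience=10)])

# Inspect the best trial's hyperparameters
best_hp = hpb_tuner.get_best_hyperparameters(num_trials=1)[0]
n_layers = best_hp.get('num_layers')
print("hidden layers:", n_layers)
# Only the first n_layers units_i entries are actually used by build_model;
# any further units_i values in best_hp.values are unused search-space entries.
for i in range(n_layers):
    print(f"units_{i}:", best_hp.get(f'units_{i}'))
print("learning rate:", best_hp.get('learning_rate'))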

What layers are affected by a dropout layer in TensorFlow?

Consider transfer learning in order to use a pretrained model in keras/tensorflow. For each old layer, the trainable attribute is set to False so that its weights are not updated during training, whereas the last layer(s) have been substituted with new layers that must be trained. In particular, two fully connected hidden layers with 512 and 1024 neurons and ReLU activation have been added. After these layers a Dropout layer is used with rate 0.2. This means that during each epoch of training 20% of the neurons are randomly discarded.
Which layers does this dropout layer affect? Does it affect the whole network, including the pretrained layers for which layer.trainable = False has been set, or does it affect only the newly added layers? Or does it affect only the previous layer (i.e., the one with 1024 neurons)?
In other words, which layer(s) do the neurons that are turned off by the dropout during each epoch belong to?
import os
from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.optimizers import RMSprop

local_weights_file = 'weights.h5'
pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights=None)
pre_trained_model.load_weights(local_weights_file)

for layer in pre_trained_model.layers:
    layer.trainable = False
# pre_trained_model.summary()

last_layer = pre_trained_model.get_layer('mixed7')
last_output = last_layer.output
# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add two fully connected layers with 512 and 1,024 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)

model = Model(pre_trained_model.input, x)
model.compile(optimizer=RMSprop(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
The dropout layer will affect the output of the previous layer.
If we look at the specific part of your code:
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)
In your case, 20% of the output of the layer defined by x = layers.Dense(1024, activation='relu')(x) will be dropped at random, before being passed to the final Dense layer.
Only the previous layer's neurons are "turned off", but all layers are "affected" in terms of backprop.
Later layers: Dropout's output is input to the next layer, so next layer's outputs will change, and so will next-next's, etc.
Previous layers: as the "effective output" of the pre-Dropout layer is changed, so will gradients to it, and thus any subsequent gradients. In the extreme case of Dropout(rate=1), zero gradient will flow.
Also, note that whole neurons are only dropped if input to Dense is 2D (batch_size, features); Dropout applies a random uniform mask to all dimensions (equivalent to dropping whole neurons in 2D case). To drop whole neurons, set Dropout(.2, noise_shape=(batch_size, 1, features)) (3D case). To drop same neurons across all samples, use noise_shape=(1, 1, features) (or (1, features) for 2D).
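A tiny sketch of that noise_shape behaviour (the shapes here are made up for illustration; training=True forces the dropout mask to be applied outside of fit()):

import tensorflow as tf

x = tf.ones((2, 4, 8))  # (batch, timesteps, features)

# Default: an independent dropout mask for every element of the tensor
element_drop = tf.keras.layers.Dropout(0.2)
# Whole "neurons": the same feature channels are zeroed for every timestep
neuron_drop = tf.keras.layers.Dropout(0.2, noise_shape=(2, 1, 8))

print(element_drop(x, training=True)[0])  # zeros scattered anywhere
print(neuron_drop(x, training=True)[0])   # zeroed columns repeat down the time axis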
The dropout technique is not applied to every single layer of a neural network; it is commonly used on the neurons of the last few layers of the network.
The technique works by randomly reducing the number of interconnecting neurons within a neural network. At every training step, each neuron has a chance of being left out, or rather, dropped out of the collated contribution from connected neurons.
There is some debate as to whether the dropout should be placed before or after the activation function. As a rule of thumb, place the dropout after the activation function for all activation functions other than relu.
You can add dropout after every hidden layer, and generally it affects only the previous layer (in your case it will affect x = layers.Dense(1024, activation='relu')(x)). In the original paper that proposed dropout layers, by Hinton (2012), dropout (with p=0.5) was used on each of the fully connected (dense) layers before the output; it was not used on the convolutional layers. This became the most commonly used configuration.
I am adding some resource links that might help you:
https://towardsdatascience.com/understanding-and-implementing-dropout-in-tensorflow-and-keras-a8a3a02c1bfa
https://towardsdatascience.com/dropout-on-convolutional-layers-is-weird-5c6ab14f19b2
https://towardsdatascience.com/machine-learning-part-20-dropout-keras-layers-explained-8c9f6dc4c9ab
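To make the placement rule of thumb above concrete, here is a small sketch with the activation written as a separate layer so the ordering is visible (the layer sizes, the 67 input features, and the tanh activation are arbitrary choices for illustration):

from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense, Activation, Dropout

model = Sequential([
    Input(shape=(67,)),
    Dense(128),
    Activation('tanh'),     # activation as its own layer to make the ordering explicit
    Dropout(0.2),           # dropout placed after the activation (rule of thumb above)
    Dense(1, activation='sigmoid'),
])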

Can CNN do better than pretrained CNN?

From all I know, a pretrained CNN should do way better than a plain CNN. I have a dataset of 855 images. I applied a plain CNN and got 94% accuracy. Then I applied pretrained models (VGG16, ResNet50, Inception_V3, MobileNet), also with fine tuning, but the highest I got was 60%, and two of them are doing very badly on classification. Can a plain CNN really do better than a pretrained model, or is my implementation wrong? I've resized my images to 100 by 100 and followed the Keras applications approach. So what is the issue?
Naive CNN approach:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def cnn_model():
    size = (100, 100, 1)
    num_cnn_layers = 2
    NUM_FILTERS = 32
    KERNEL = (3, 3)
    MAX_NEURONS = 120
    model = Sequential()
    for i in range(1, num_cnn_layers + 1):
        if i == 1:
            model.add(Conv2D(NUM_FILTERS * i, KERNEL, input_shape=size,
                             activation='relu', padding='same'))
        else:
            model.add(Conv2D(NUM_FILTERS * i, KERNEL, activation='relu',
                             padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(int(MAX_NEURONS), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(int(MAX_NEURONS / 2), activation='relu'))
    model.add(Dropout(0.4))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model
VGG16 approach:
import keras
from keras.models import Sequential

def vgg():
    vgg_model = keras.applications.vgg16.VGG16(weights='imagenet',
                                               include_top=False,
                                               input_shape=(100, 100, 3))
    model = Sequential()
    for layer in vgg_model.layers:
        model.add(layer)
    # Freeze the layers
    for layer in model.layers:
        layer.trainable = False
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(3, activation='softmax'))
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
What you're referring to as CNN in both cases is the same thing, a type of neural network model. It's just that the pretrained model has been trained on some other data instead of the dataset you're working on and trying to classify.
What is usually used here is called Transfer Learning. Instead of freezing all the layers, try leaving the last few layers open so they can be retrained with your own data, so that the pretrained model can adjust its weights and biases to match your needs as well. It could be that the dataset you're trying to classify is foreign to the pretrained models.
Here's an example from my own work; there are additional pieces of code, but you can make it work with your own code, as the logic remains the same:
# You extract the layer which you want to manipulate, usually one of the last few.
last_layer = pre_trained_model.get_layer(name_of_layer)
last_output = last_layer.output
# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)
# Here we combine your newly added layers and the pre-trained model.
model = Model(pre_trained_model.input, x)
model.compile(optimizer=RMSprop(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
Adding to what #Ilknur Mustafa mentioned: as your dataset may be foreign to the images used for pre-training, you can try to re-train the last few layers of the pre-trained model instead of adding a whole new set of layers. The example code below doesn't add any trainable layer other than the output layer. This way, you benefit from retraining the last few layers on the existing weights rather than training from scratch, which may be helpful if you don't have a large dataset to train on.
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# load model without classifier layers
vgg = VGG16(include_top=False, input_shape=(100, 100, 3), weights='imagenet', pooling='avg')
# make only the last 2 conv layers trainable
for layer in vgg.layers[:-4]:
    layer.trainable = False
# add output layer
out_layer = Dense(3, activation='softmax')(vgg.layers[-1].output)
model_pre_vgg = Model(vgg.input, out_layer)
# compile model
opt = SGD(learning_rate=1e-5)
model_pre_vgg.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

How to switch Off/On an LSTM layer?

I am looking for a way to access an LSTM layer such that the addition and removal of a layer are event-driven, i.e. the layer can be added or removed when a function is triggered.
For Example (hypothetically):
Add an LSTM layer if a = 2 and remove an LSTM layer if a = 3.
Here a is supposed to come from a python function that returns a specific value, based on which the LSTM layer should be added or removed. I want to add a switch function to the layer so that it can be switched on or off based on that python function.
Is it possible?
Currently, I need to hard-code the layers needed. For example:
# Initialising the RNN
regressor = Sequential()
# Adding the first LSTM layer and some Dropout regularization
regressor.add(LSTM(units=60, return_sequences=True,
                   input_shape=(X_train.shape[1], X_train.shape[2])))
# regressor.add(Dropout(0.1))
# Adding the 2nd LSTM layer and some Dropout regularization
regressor.add(LSTM(units=60, return_sequences=True))
regressor.add(Dropout(0.1))
My goal is to both add and subtract these layers at runtime.
Any help is appreciated!!
I found the answer and am posting it in case anyone else is looking for the solution.
This can be done using the Keras layer-freezing functionality. Basically, you pass the boolean trainable argument to the layer constructor to set the layer as non-trainable.
Eg:
frozen_layer = Dense(32, trainable=False)
Additionally, you can set the trainable property of a layer to True or False after instantiation; for the change to take effect, you must call compile() on your model after modifying the property. Eg:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)

frozen_model = Model(x, y)
# the weights of `layer` will not be updated during training for the model below
frozen_model.compile(optimizer='rmsprop', loss='mse')

layer.trainable = True
trainable_model = Model(x, y)
# the weights of the layer will be updated during training
# (which will also affect the model above, since it uses the same layer instance)
trainable_model.compile(optimizer='rmsprop', loss='mse')

frozen_model.fit(data, labels)     # this does NOT update the weights of `layer`
trainable_model.fit(data, labels)  # this updates the weights of `layer`
Hope this helps!!
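A small sketch applying the same idea to the LSTM case from the question (a is a hypothetical trigger value and the toy data shapes are made up; note that freezing only stops the weight updates, the frozen layer still runs in the forward pass, so this does not literally remove the layer):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

# Toy data, purely for illustration: 100 samples, 10 timesteps, 4 features
X_train = np.random.rand(100, 10, 4)
y_train = np.random.rand(100, 1)

regressor = Sequential()
regressor.add(Input(shape=(10, 4)))
regressor.add(LSTM(60, return_sequences=True))
regressor.add(LSTM(60, name='lstm_switch'))   # the layer we want to "switch"
regressor.add(Dense(1))

a = 3  # hypothetical trigger value returned by some function
# "Switch off" (freeze) the second LSTM layer when a == 3, leave it trainable when a == 2
regressor.get_layer('lstm_switch').trainable = (a == 2)
# compile() must be called (again) for the change of `trainable` to take effect
regressor.compile(optimizer='adam', loss='mse')
regressor.fit(X_train, y_train, epochs=2, verbose=0)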
