I am looking for a way to make the addition and removal of an LSTM layer event-driven, so that a layer can be added or removed when a function is triggered.
For example (hypothetically):
Add an LSTM layer if a = 2 and remove an LSTM layer if a = 3.
Here, a is the value returned by a Python function, and that value determines whether an LSTM layer should be added or removed. I want to add a switch to the layer so that it can be turned on or off based on that Python function.
Is it possible?
Currently, I need to hard-code the layers I need. For example:
# Initialising the RNN
regressor = Sequential()
# Adding the first LSTM layer and some Dropout regularization
regressor.add(LSTM(units = 60, return_sequences = True,
                   input_shape = (X_train.shape[1], X_train.shape[2])))
#regressor.add(Dropout(0.1))
# Adding the 2nd LSTM layer and some Dropout regularization
regressor.add(LSTM(units = 60, return_sequences = True))
regressor.add(Dropout(0.1))
My goal is to both add and remove these layers at runtime.
Any help is appreciated!!
I found the answer and am posting it in case anyone else is looking for a solution.
This can be approximated with Keras's layer-freezing functionality. Basically, you pass the boolean trainable argument to the layer constructor to make the layer non-trainable. Note that a frozen layer still takes part in the forward pass; only its weights stop being updated.
For example:
frozen_layer = Dense(32, trainable=False)
You can also set the trainable property of a layer to True or False after instantiation. For the change to take effect, call compile() on your model after modifying the trainable property. For example:
x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)
frozen_model = Model(x, y)
# the weights of layer will not be updated during training for below model
frozen_model.compile(optimizer='rmsprop', loss='mse')
layer.trainable = True
trainable_model = Model(x, y)
# the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)
trainable_model.compile(optimizer='rmsprop', loss='mse')
frozen_model.fit(data, labels) # this does NOT update the weights of layer
trainable_model.fit(data, labels) # this updates the weights of layer
Hope this helps!!
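Since freezing leaves the layer in the network, another option is to rebuild the model whenever the trigger fires. The following is a minimal sketch, not part of the original answer: build_regressor and layers_for_trigger are hypothetical helpers, the a == 2 / a == 3 rule is taken from the question, and the Dense(1) output, optimizer, loss and input sizes are assumptions. A rebuilt model starts from freshly initialized weights.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_regressor(n_lstm_layers, timesteps, n_features, units=60, dropout=0.1):
    """Hypothetical helper: build a fresh model with the requested LSTM depth."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(timesteps, n_features)))
    for i in range(n_lstm_layers):
        # Every LSTM except the last must return the full sequence so that
        # the next LSTM receives 3D input.
        is_last = i == n_lstm_layers - 1
        model.add(layers.LSTM(units, return_sequences=not is_last))
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1))  # assumed single-value regression output
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model


def layers_for_trigger(a, current):
    """Hypothetical trigger rule from the question: a == 2 adds, a == 3 removes."""
    if a == 2:
        return current + 1
    if a == 3:
        return max(1, current - 1)  # keep at least one LSTM layer
    return current


n_layers = 2
n_layers = layers_for_trigger(2, n_layers)  # a == 2: one more LSTM layer
regressor = build_regressor(n_layers, timesteps=10, n_features=3)
```

If the existing weights must survive a rebuild, they would have to be copied over layer by layer for the layers both models share.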
Related
I need to see how I would initialize all layers of a Sequential model with weights from another Sequential model of the same size.
For example, how would I initialize the weights for every layer of the following Sequential model?
model = tf.keras.Sequential([Dense(2000, activation='relu', input_shape=(11,)),
                             Dense(1, activation='relu'),
                             Dropout(0.5),
                             Dense(400, activation='relu'),
                             Dropout(0.5),
                             Dense(150, activation='relu'),
                             BatchNormalization(),
                             Dense(y_max+1, activation='softmax')
                             ])
I am fairly new to CNN training and have managed to make the above code work through trial and error and extensive research.
The data are a list and an np.array() of dtype np.float64.
The idea is that I grab the weights from one model (same as above) and load them into another model (also same as above). I just need to see how I can initialize the weights and biases of all layers using the following:
weights = model.get_weights()[0]
biases = model.get_weights()[1]
return weights, biases
I have tried the set_weights() method, but the code below keeps producing the following TypeError:
if iteration == 1:
    for layer in model.layers:
        layer.set_weights(None, None)
TypeError: set_weights() takes 2 positional arguments but 3 were given
I'd be very appreciative of any help, thank you.
In the Sequential example above, each layer's parameters can be accessed and assigned new values as shown below:
#example of first layer
model.layers[0]
#weights of the first layer,
model.layers[0].weights #gives the weights of kernel and bias of dense in this case
#assign new_weights by
model.layers[0].kernel.assign(tf.Variable(new_kernel_weights))
model.layers[0].bias.assign(tf.Variable(new_bias_weights))
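The TypeError in the question comes from passing two arguments: set_weights() takes a single list of arrays, for example [kernel, bias]. Here is a minimal sketch of copying every layer's weights from one model to another, assuming both models share exactly the same architecture. make_model is a hypothetical stand-in that is much smaller than the model in the question.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def make_model():
    """Small stand-in architecture; both models must share it exactly."""
    return keras.Sequential([
        keras.Input(shape=(11,)),
        layers.Dense(8, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(3, activation="softmax"),
    ])


source = make_model()
target = make_model()

# Whole-model copy: get_weights() returns one flat list of NumPy arrays
# (kernel, bias, kernel, bias, ...) and set_weights() takes that same list.
target.set_weights(source.get_weights())

# Per-layer copy: set_weights() takes ONE list argument per layer.
# Layers without weights (Dropout) return and accept an empty list.
for src_layer, dst_layer in zip(source.layers, target.layers):
    dst_layer.set_weights(src_layer.get_weights())

weights_match = all(
    np.array_equal(a, b)
    for a, b in zip(source.get_weights(), target.get_weights())
)
print(weights_match)
```

Either of the two copies is enough on its own; both are shown only to contrast the whole-model and per-layer forms.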
I am using an LSTM for fake news detection and added an embedding layer to my model.
It is working fine without adding any input_shape in the LSTM function, but I thought the input_shape parameter was mandatory. Could someone help me with why there is no error even without defining input_shape? Is it because the embedding layer implicitly defines the input_shape?
Following is the code:
model=Sequential()
embedding_layer = Embedding(total_words, embedding_dim, weights=[embedding_matrix], input_length=max_length)
model.add(embedding_layer)
model.add(LSTM(64,))
model.add(Dense(1,activation='sigmoid'))
opt = SGD(learning_rate=0.01,decay=1e-6)
model.compile(loss = "binary_crossentropy", optimizer = opt,metrics=['accuracy'])
model.fit(data,train['label'], epochs=30, verbose=1)
You only need to provide an input_length to the Embedding layer. Furthermore, if you use a Sequential model, you do not need to provide an input layer. Omitting the input layer essentially means that your model's weights are only created when you pass real data, as you did in model.fit(...). If you wanted to see the weights of your model before providing real data, you would have to define an input layer before your Embedding layer like this:
embedding_input = tf.keras.layers.Input(shape=(max_length,))
And yes, as you mentioned, your model infers the input_shape implicitly when you provide the real data. Your LSTM layer does not need an input_shape as it is also derived based on the output of your Embedding layer. If the LSTM layer were the first layer of your model, it would be best to specify an input_shape for clarity. For example:
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(32, input_shape=(10, 5)))
model.add(tf.keras.layers.Dense(1))
where 10 represents the number of time steps and 5 the number of features. In your example, the input to the LSTM layer has the shape (max_length, embedding_dim). Here too, if you do not specify the input_shape, your model will infer the shape from your input data.
For more information check out the Keras documentation.
I have a simple feed forward neural network consisting of 8 input neurons, followed by 2 hidden layers, each with 6 hidden neurons and 1 output layer consisting of 1 output neuron.
The Keras code is:
model = Sequential()
model.add(Dense(6, input_dim = 8, activation='tanh'))
model.add(Dense(6, activation='tanh'))
model.add(Dense(1, activation='tanh'))
Question:
Since I know which of the 8 input parameters have the strongest impact on the single output, I could set their start weights to a higher value relative to the other input parameters. If this were possible, it could reduce the training time significantly (if I am not mistaken).
# reading the initial weights and bias of the input layer
layer_1 = (model.layers)[0]
# reading the initial weights of the input layer
w_1 = layer_1.get_weights()[0]
# setting weights for nth parameter of the input layer to a modified value val
w_1[n, :] = val
# setting the modified weights and unmodified bias of the input layer
layer_1.set_weights([w_1, layer_1.get_weights()[1]])
# writing layer_1 to model
(model.layers)[0] = layer_1
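The approach above can be checked on a small model. This is a minimal sketch, assuming n = 2 and val = 5.0 as hypothetical choices. It shows that set_weights() updates the layer inside the model in place, so the final assignment to model.layers[0] is not needed. Whether larger start weights actually shorten training is not something the sketch tests.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(6, activation="tanh"),
    layers.Dense(6, activation="tanh"),
    layers.Dense(1, activation="tanh"),
])

n, val = 2, 5.0  # hypothetical: input feature index and start value

first = model.layers[0]
kernel, bias = first.get_weights()  # kernel shape (8, 6), bias shape (6,)
kernel = kernel.copy()              # work on a writable copy
kernel[n, :] = val                  # row n holds the weights leaving input n
first.set_weights([kernel, bias])   # updates the layer inside the model

updated_kernel = model.layers[0].get_weights()[0]
print(updated_kernel.shape, updated_kernel[n])
```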
Consider transfer learning with a pretrained model in Keras/TensorFlow. For each old layer, the trainable attribute is set to False so that its weights are not updated during training, whereas the last layer(s) have been replaced with new layers that must be trained. Specifically, two fully connected hidden layers with 512 and 1024 neurons and ReLU activation have been added. After these layers, a Dropout layer is used with rate 0.2. This means that during each epoch of training 20% of the neurons are randomly discarded.
Which layers does this dropout layer affect? Does it affect the whole network, including the pretrained layers for which layer.trainable = False has been set, or only the newly added layers? Or does it affect only the previous layer (i.e., the one with 1024 neurons)?
In other words, which layer(s) do the neurons that are turned off during each epoch by the dropout belong to?
import os
from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.optimizers import RMSprop
local_weights_file = 'weights.h5'
pre_trained_model = InceptionV3(input_shape = (150, 150, 3),
                                include_top = False,
                                weights = None)
pre_trained_model.load_weights(local_weights_file)
for layer in pre_trained_model.layers:
    layer.trainable = False
# pre_trained_model.summary()
last_layer = pre_trained_model.get_layer('mixed7')
last_output = last_layer.output
# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add two fully connected layers with 512 and 1,024 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)
model = Model(pre_trained_model.input, x)
model.compile(optimizer = RMSprop(learning_rate=0.0001),
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])
The dropout layer will affect the output of the previous layer.
If we look at the specific part of your code:
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense (1, activation='sigmoid')(x)
In your case, 20% of the output of the layer defined by x = layers.Dense(1024, activation='relu')(x) will be dropped at random, before being passed to the final Dense layer.
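To make this concrete, here is a sketch of inverted dropout in plain Python for a single sample. It is an illustration rather than Keras's implementation: the dropout function, the seed and the all-ones stand-in for the 1024-unit layer's output are hypothetical. Each element of the previous layer's output is zeroed with probability rate, and the survivors are scaled by 1 / (1 - rate), which is the scaling the Keras Dropout layer also applies during training.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)
dense_output = [1.0] * 1024  # stand-in for the 1024-unit layer's output
dropped = dropout(dense_output, rate=0.2, rng=rng)

n_zeroed = sum(1 for v in dropped if v == 0.0)
print(n_zeroed / len(dropped))  # fraction dropped in this training step
```

A new random mask is drawn on every call, so a different subset of the 1024 outputs is zeroed at each training step.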
Only the previous layer's neurons are "turned off", but all layers are "affected" in terms of backprop.
Later layers: Dropout's output is input to the next layer, so next layer's outputs will change, and so will next-next's, etc.
Previous layers: as the "effective output" of the pre-Dropout layer is changed, so will gradients to it, and thus any subsequent gradients. In the extreme case of Dropout(rate=1), zero gradient will flow.
Also, note that whole neurons are only dropped if input to Dense is 2D (batch_size, features); Dropout applies a random uniform mask to all dimensions (equivalent to dropping whole neurons in 2D case). To drop whole neurons, set Dropout(.2, noise_shape=(batch_size, 1, features)) (3D case). To drop same neurons across all samples, use noise_shape=(1, 1, features) (or (1, features) for 2D).
Dropout is not applied to every single layer of a neural network; it is commonly used in the last few layers of the network.
The technique works by randomly reducing the number of interconnecting neurons within a neural network. At every training step, each neuron has a chance of being left out, or rather, dropped out of the collated contribution from connected neurons.
There is some debate as to whether dropout should be placed before or after the activation function. As a rule of thumb, place the dropout after the activation function for all activation functions other than relu.
You can add dropout after every hidden layer, and generally it affects only the previous layer (in your case, x = layers.Dense(1024, activation='relu')(x)). In the original paper that proposed dropout layers, by Hinton (2012), dropout (with p=0.5) was used on each of the fully connected (dense) layers before the output; it was not used on the convolutional layers. This became the most commonly used configuration.
Here are some resources that might help:
https://towardsdatascience.com/understanding-and-implementing-dropout-in-tensorflow-and-keras-a8a3a02c1bfa
https://towardsdatascience.com/dropout-on-convolutional-layers-is-weird-5c6ab14f19b2
https://towardsdatascience.com/machine-learning-part-20-dropout-keras-layers-explained-8c9f6dc4c9ab
I built and trained a network based on vgg16 network. In the original network I froze all the layers of vgg16 and trained only the last 4 layers which I added at the end of vgg16. Now I want to load and re-train this model by changing the trainable layers to use my own weights instead of ImageNet weights. Initially I tried to build the same model by changing the trainable layers of vgg16 and model weights with the following code.
# Load the VGG model
vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))
# Freeze n number of layers from the last
for layer in vgg_conv.layers[:-8]: layer.trainable = False
# Check the trainable status of the individual layers
for layer in vgg_conv.layers: print(layer, layer.trainable)
# Create and compile the model
model = createModel()
trained_model = keras.models.load_model(trained_dir)
model.set_weights(trained_model.get_weights())
model.compile(loss='categorical_crossentropy', optimizer=optimizers.RMSprop(lr=lr), metrics=['acc'])
But this gives me this error:
ValueError: Cannot feed value of shape (3, 3, 3, 64) for Tensor 'Placeholder_869:0', which has shape '(3, 3, 256, 512)'
When I check the weights of the original and new networks, I see that the shapes of some weights are different. I also tried to change the trainable layers of the original network, but for layer in trained_model.layers: print(layer, layer.trainable) shows only the last layers that I added. So how can I change the trainable layers of my own trained_model? Or is there another way to get the same result?
This might be a possible solution. I created a vgg16-based model with the above code. Then I changed the weights of the last layers by running this code: model.layers[1].set_weights(trained_model.layers[1].get_weights()). Since I added 4 layers to vgg16, I ran this code for each layer index from 1 to 4. I have not tried the model yet. If this is not a correct solution, I would be glad to read your answers.