I have a simple neural network, I need to get weights and biases from the model. I have tried a few approaches discussed before but I keep getting the out of bounds value error. Not sure how to fix this, or what I'm missing.
Network-
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation=tf.nn.relu),
keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.layers[0].get_weights()[1]
Error -
IndexError: list index out of range
This is what has been mentioned in a few questions,but I end up getting the out of bounds error for this.
I have another question, the index followed aftermodel.layers[], does it correspond to the layer?
For instance model.layers[1] gives the weights corresponding to the second layer, something like that?
I've been there, I have been looking at my old code to see if I could remember how did I solved that issue.
What I did was to print the length of the model.layer[index].get_weights()[X] to figure out where keras was saving the weights I needed.
In my old code, model.layers[0].get_weights()[1] would return the biases, while model.layers[0].get_weights()[0] would return the actual weights.
In any case, take into account that there are layers which weights aren't saved (as they don't have weights), so if asking for model.layers[0].get_weights()[0] doesn't work, try with model.layers[1].get_weights()[1], as I'm not sure about flatten layers, but I do know that dense layers should save their weights.
The first layer (index 0) in your model is a Flatten layer, which does not have any weights, that's why you get errors.
To get the Dense layer, which is the second layer, you have to use index 1:
model.layers[1].get_weights()[1]
Just model.get_weights(), you will get all the weights and bias of your model
To get the weights and bias on a Keras sequential and for every iteration, you can do it as in the next example:
# create model
model = Sequential()
model.add(Dense(numHiddenNeurons, activation="tanh", input_dim=4, kernel_initializer="uniform"))
model.add(Dense(1, activation="linear", kernel_initializer="uniform"))
# Compile model
model.compile(loss='mse', optimizer='adam', metrics=['accuracy', 'mse', 'mae', 'mape'])
weightsBiasDict = {}
weightAndBiasCallback = tf.keras.callbacks.LambdaCallback \
(on_epoch_end=lambda epoch, logs: weightsBiasDict.update({epoch:model.get_weights()}))
# Fit the model
history= model.fit(X1, Y1, epochs=numIterations, batch_size=batch_size, verbose=0, callbacks=weightAndBiasCallback)
weights and bias are accessible for every iteration on the dictionary weightsBiasDict
If you just need weights and bias values at the end of the training you can use
model.layer[index].get_weights()[0] for weights
and
model.layer[index].get_weights()[1] for biases
where index is the layer number on your network, starting at zero for the input layer.
Related
I need to see how I would initialize all layers of a Sequential model with data from a same-sized sequential model.
E.G. How would I initialize the weights for every layer of the following Sequential model?
model = tf.keras.Sequential([Dense(2000, activation='relu', input_shape=(11,)),
Dense(1, activation='relu'),
Dropout(0.5),
Dense(400, activation='relu'),
Dropout(0.5),
Dense(150, activation='relu'),
BatchNormalization(),
Dense(y_max+1, activation='softmax')
])
I am fairly new to CNN training and have managed to make the above code work through trial and error and extensive research.
Datatype is list and np.array() of dtype np.float64
The idea is that I grab the weights from one model (same as above) and return it to another model (also same as above). I just need to be able to visualize how I can initialize the weights and biases of all layers using the following:
weights = model.get_weights()[0]
biases = model.get_weights()[1]
return weights, biases
I have attempted the model.set_weights() method, but I keep getting the following error message, given the code before the TypeError:
if iteration == 1:
for layer in model.layers:
layer.set_weights(None, None)
TypeError: set_weights() takes 2 positional arguments but 3 were given
I'd be very appreciative of any help, thank you.
In the Sequential example above, each layer parameters can be accessed and assigned new weights as shown below,
#example of first layer
model.layers[0]
#weights of the first layer,
model.layers[0].weights #gives the weights of kernel and bias of dense in this case
#assign new_weights by
model.layers[0].kernel.assign(tf.Variable(new_kernel_weights))
model.layers[0].bias.assign(tf.Variable(new_bias_weights))
I've created a code to train my CNN which will determine if a image be one or another class ("pdr or nonPdr")
This is my keras model:
model = Sequential()
model.add(Conv2D(input_shape=(605,700,3), filters=64, kernel_size=(3,3), padding="same",activation="tanh"))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(32, activation='tanh', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
data, labels = ReadImages(TRAIN_DIR)
model.fit(np.array(data), np.array(labels), epochs=10, batch_size=16)
model.save('model.h5')
And this is my test_predict file:
model = load_model('model.h5')
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
test_image = cv2.imread("test_data/im0203.ppm")
test_image = test_image.reshape(1,605,700,3).astype('float')
test_image = np.array(test_image)
test_image2 = cv2.imread("test_data/im0001.ppm")
test_image2 = test_image2.reshape(1,605,700,3).astype('float')
#predict the result
print(model.predict(test_image))
print(model.predict(test_image2))
After run the code below I get the same value for my 2 images (that are different, one is pdr and the other is nonPdr)
[[0.033681]]
[[0.033681]]
How can I fix it and improve my CNN. I appreciate your help.
Update
I tried to remove one dense layer and change the last one in a (2, activation='sigmoid') but it not works too.. I really don't have any ideia what to do**
This could be because of multiple reasons. Most plausible ones are:
The input fed to network and the image fed during prediction may not have the same preprocessing steps.
The model is over-fitted and when it finds a little different prediction image, it tends to bias its decision based on its training.
Model is not good
To avoid such complications, possible future solutions could be:
1. Try feeding a normalized or scaled image only during training and testing.
Do not use too many Dense layers (Especially for image classification) because dense layers are good at learning spatial information and for an image to classify correctly, the model needs to look the image in 2d matrix rather than 1d spatial matrix (since dense layers flatten the 2d matrix to 1d matrix).
Even if you are trying to use a dense layer, I would recommend using a dense layer only at the last layer, it, for the final number_of_classes layers.
A better architecture would be
[ Conv followed by few transition blocks and bottleneck layers ] x 2
flatten
Dense(number_of_classes)
I'm trying to do transfer learning in Keras. I set up a ResNet50 network set to not trainable with some extra layers:
# Image input
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg')) # output is 2048
model.add(Dropout(0.05))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(512, activation='relu'))
model.add(Dense(7, activation='softmax'))
model.layers[0].trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
Then I create input data: x_batch using the ResNet50 preprocess_input function, along with the one hot encoded labels y_batch and do the fitting as so:
model.fit(x_batch,
y_batch,
epochs=nb_epochs,
batch_size=64,
shuffle=True,
validation_split=0.2,
callbacks=[lrate])
Training accuracy gets close to 100% after ten or so epochs, but validation accuracy actually decreases from around 50% to 30% with validation loss steadily increasing.
However if I instead create a network with just the last layers:
# Vector input
model2 = Sequential()
model2.add(Dropout(0.05, input_shape=(2048,)))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()
and feed in the output of the ResNet50 prediction:
resnet = ResNet50(include_top=False, pooling='avg')
x_batch = resnet.predict(x_batch)
Then validation accuracy gets up to around 85%... What is going on? Why won't the image input method work?
Update:
This problem is really bizarre. If I change ResNet50 to VGG19 it seems to work ok.
After a lot of googling I found that the problem is to do with the Batch Normalisation layers in ResNet. There are no batch normalisation layers in VGGNet which is why it works for that topology.
There is a pull request to fix this in Keras here, which explains in more detail:
Assume we use one of the pre-trained CNNs of Keras and we want to fine-tune it. Unfortunately, we get no guarantees that the mean and variance of our new dataset inside the BN layers will be similar to the ones of the original dataset. As a result, if we fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference the top layers will receive data which are scaled using the mean/variance of the original dataset. This discrepancy can lead to reduced accuracy.
This means that the BN layers are adjusting to the training data, however when validation is performed, the original parameters of the BN layers are used. From what I can tell, the fix is to allow the frozen BN layers to use the updated mean and variance from training.
A work around is to pre-compute the ResNet output. In fact, this decreases training time considerably, as we are not repeating that part of the calculation.
you can try :
Res = keras.applications.resnet.ResNet50(include_top=False,
weights='imagenet', input_shape=(IMG_SIZE , IMG_SIZE , 3 ) )
# Freeze the layers except the last 4 layers
for layer in vgg_conv.layers :
layer.trainable = False
# Check the trainable status of the individual layers
for layer in vgg_conv.layers:
print(layer, layer.trainable)
# Vector input
model2 = Sequential()
model2.add(Res)
model2.add(Flatten())
model2.add(Dropout(0.05 ))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics =(['accuracy'])
model2.summary()
I have a big dataset composed of 18260 input field with 4 outputs. I am using Keras and Tensorflow to build a neural network that can detect the possible output.
However I tried many solutions but the accuracy is not getting above 55% unless I use sigmoid activation function in all model layers except the first one as below:
def baseline_model(optimizer= 'adam' , init= 'random_uniform'):
# create model
model = Sequential()
model.add(Dense(40, input_dim=18260, activation="relu", kernel_initializer=init))
model.add(Dense(40, activation="sigmoid", kernel_initializer=init))
model.add(Dense(40, activation="sigmoid", kernel_initializer=init))
model.add(Dense(10, activation="sigmoid", kernel_initializer=init))
model.add(Dense(4, activation="sigmoid", kernel_initializer=init))
model.summary()
# Compile model
model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
return model
Is using sigmoid for activation correct in all layers? The accuracy is reaching 99.9% when using sigmoid as shown above. So I was wondering if there is something wrong in the model implementation.
The sigmoid might work. But I suggest using relu activation for hidden layers' activation. The problem is, your output layer's activation is sigmoid but it should be softmax(because you are using sparse_categorical_crossentropy loss).
model.add(Dense(4, activation="softmax", kernel_initializer=init))
Edit after discussion on comments
Your outputs are integers for class labels. Sigmoid logistic function outputs values in range (0,1). The output of the softmax is also in range (0,1), but the Softmax function adds another constraint on the outputs:- the sum of the outputs must be 1. Therefore the outputs of softmax can be interpreted as probability of the input being each class.
E.g
def sigmoid(x):
return 1.0/(1 + np.exp(-x))
def softmax(a):
return np.exp(a-max(a))/np.sum(np.exp(a-max(a)))
a = np.array([0.6, 10, -5, 4, 7])
print(sigmoid(a))
# [0.64565631, 0.9999546 , 0.00669285, 0.98201379, 0.99908895]
print(softmax(a))
# [7.86089760e-05, 9.50255231e-01, 2.90685280e-07, 2.35544722e-03,
4.73104222e-02]
print(sum(softmax(a))
# 1.0
You got to use one or the other activation, as activations are the source to bring non-linearity into the model. If the model doesn't have any activation, then it basically behaves like a single layer network. Read more about 'Why to use activations here'. You can check various activations here.
Although it seems like your model is overfitting when using sigmoid, so try techniques to overcome it like creating train/dev/test sets, reducing complexity of the model, dropouts, etc.
Neural networks require non-linearity at each layer to work. Without non-linear activation no matter how many layers you have, you could write the same thing with only one layer.
Linear functions are limited in complexity and if "g" and "f" are linear functions g(f(x)) could be written as z(x) where z is also a linear function. It is pointless to stack them without adding non-linearity.
And that's why we use non-linear activation functions. sigmoid(g(f(x))) cannot be written as a linear function.
I am using tensorflow with keras to perform regression on some historical data. Data type as follows:
id,timestamp,ratio
"santalucia","2018-07-04T16:55:59.020000",21.8
"santalucia","2018-07-04T16:50:58.043000",22.2
"santalucia","2018-07-04T16:45:56.912000",21.9
"santalucia","2018-07-04T16:40:56.572000",22.5
"santalucia","2018-07-04T16:35:56.133000",22.5
"santalucia","2018-07-04T16:30:55.767000",22.5
And I am reformulating it as a time series problem (25 time steps) so that I can predict (make a regression) for the next values of the series (variance should not be high). I am using also sklearn.preprocessing MinMaxScaler to scale the data to range (-1,1) or (0,1) depending if I use LSTM or Dense (respectively).
I am training with two different architectures:
Dense is as follows:
def get_model(self, layers, activation='relu'):
model = Sequential()
# Input arrays of shape (*, layers[1])
# Output = arrays of shape (*, layers[1] * 16)
model.add(Dense(units=int(64), input_shape=(layers[1],), activation=activation))
model.add(Dense(units=int(64), activation=activation))
# model.add(Dropout(0.2))
model.add(Dense(units=layers[3], activation='linear'))
# activation=activation))
# opt = optimizers.Adagrad(lr=self.learning_rate, epsilon=None, decay=self.decay_lr)
opt = optimizers.rmsprop(lr=0.001)
model.compile(optimizer=opt, loss=self.loss_fn, metrics=['mae'])
model.summary()
return model
Which more or less provides with good results (same architecture as in tensorflows' tutorial for predicting house prices).
However, LSTM is not giving good results, it usually ends up stuck around a value (for example, 40 (40.0123123, 40.123123,41.09090...) and I do not see why or how to improve it. Architecture is:
def get_model(self, layers, activation='tanh'):
model = Sequential()
# Shape = (Samples, Timesteps, Features)
model.add(LSTM(units=128, input_shape=(layers[1], layers[2]),
return_sequences=True, activation=activation))
model.add(LSTM(64, return_sequences=True, activation=activation))
model.add(LSTM(layers[2], return_sequences=False, activation=activation))
model.add(Dense(units=layers[3], activation='linear'))
# activation=activation))
opt = optimizers.Adagrad(lr=0.001, decay=self.decay_lr)
model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy'])
model.summary()
return model
I currently train with a batch size of 200 that increases by a rate of 1.5 every fit. Each fit is made of 50 epochs, and I use a keras earlystopping callback with at least 20 epoch.
I have tried adding more layers, more units, reducing layers, units, increasing and decreasing learning rate, etc, but every time it gets stuck around a value. Any reason for this?
Also, do you know any good practices that can be applied to this problem?
Cheers
Have you tried holding back a validation set seeing how well the model performance on the training set tracks with the validation set? This is often how I catch myself overfitting.
A simple function for doing this (adapted from here) can help you do that:
hist = model.fit_generator(...)
def gen_graph(history, title):
plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])
plt.title(title)
gen_graph(hist, "Accuracy, training vs. validation scores")
Also, do you have enough samples? If you're really, really sure that you have done as much as you can in terms of preprocessing, and in terms of hyperparameter tuning... generating some synthetic data or doing some data augmentation has occasionally helped me.