This is my predict function. is there anything wrong with this? Predictions are not stable, everytime I run on same data, I get different predictions.
def predict(model, device, inputs, batch_size=1024):
model = model.to(device)
dataset = torch.utils.data.TensorDataset(*inputs)
loader = torch.utils.data.DataLoader(
dataset,
batch_size=batch_size,
pin_memory=False
)
predictions = []
for i, batch in enumerate(loader):
with torch.no_grad():
pred = model(*(item.to(device) for item in batch))
pred = pred.detach().cpu().numpy()
predictions.append(pred)
return np.concatenate(predictions)
As Usman Ali suggested, you need to set your model to eval mode by calling
model.eval()
before your prediction function.
What eval mode does:
Sets the module in evaluation mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
When you finish your prediction and wish t continue training, don't forget to reset your model to training mode by calling
model.train()
There are several layers in models that may introduce randomness into the forward pass of the net. One such example is the dropout layers. A dropout layer "drops" p percent of its neurons at random to increase model's generalization.
Additionally, BatchNorm (and possibly other adaptive normalization layers) keeps track of the statistics of the data and therefore has a different "behavior" in train mode or in eval mode.
You have defined the function, but you haven't trained the model. The model randomizes predictions before it is trained, which is why yours are inconsistent. If you set up an optimizer with a loss function, and run over multiple epochs the predictions will stabilize. This link may help: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html. Look at sections 3 and 4
Related
I'm learning CNN and wondering why is my network stuck at 0% accuracy even after multiple epochs? I'm sharing the entire code as it's really simple.
I have a dataset with faces and respective ages. I'm using keras and tf to train a convolution neural network to determine age.
However, my accuracy is always reporting as 0%. I'm very new to neural networks and I'm hoping you could tell me what I am doing wrong?
path = "dataset"
pixels = []
age = []
for img in os.listdir(path):
ages = img.split("_")[0]
img = cv2.imread(str(path)+"/"+str(img))
img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
pixels.append(np.array(img))
age.append(np.array(ages))
age = np.array(age,dtype=np.int64)
pixels = np.array(pixels)
x_train,x_test,y_train,y_test = train_test_split(pixels,age,random_state=100)
input = Input(shape=(200,200,3))
conv1 = Conv2D(70,(3,3),activation="relu")(input)
conv2 = Conv2D(65,(3,3),activation="relu")(conv1)
batch1 = BatchNormalization()(conv2)
pool3 = MaxPool2D((2,2))(batch1)
conv3 = Conv2D(60,(3,3),activation="relu")(pool3)
batch2 = BatchNormalization()(conv3)
pool4 = MaxPool2D((2,2))(batch2)
flt = Flatten()(pool4)
#age
age_l = Dense(128,activation="relu")(flt)
age_l = Dense(64,activation="relu")(age_l)
age_l = Dense(32,activation="relu")(age_l)
age_l = Dense(1,activation="relu")(age_l)
model = Model(inputs=input,outputs=age_l)
model.compile(optimizer="adam",loss=["mse","sparse_categorical_crossentropy"],metrics=['mae','accuracy'])
save = model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=2)
Well you have to decide if you want to do a classification model or a regression model. As it stands now it looks like you are trying to do a regression model.
Lets start at the outset. Apparently you have a dataset of image files and within the files path is text that defines the age so it is something like
say 27_01.jpg I assume. So you split the path based on the _ to get the age associated with the image file. You then read in the image using cv2 and then convert it to rgb. Now cv2 reads in the image and return it as an array so you don't need to convert it to an np array just use
pixels.append(img)
now the variable ages is a string which you want to convert into an integer. So just
use the code
ages =int( img.split("_")[0])
this is now a scaler integer value, not an array so just use
age.append(ages)
you now have two lists, pixels and age. To use them in a model you need to convert them to np arrays so use
age=np.array(age)
pixels=np.array(pixels
Now the next thing you want to do is to create a train set and a test set using the train_test_split function. Lets assume you want 90% of the data set to be used for training and 10% for testing. so use
x_train,x_test,y_train,y_test = train_test_split(pixels,age,train_size=.9, shuffle=True, random_state=100)
Now lets look at your model. This is what decides if you are doing regression or
classification. You want to do regression. Your model is OK but needs some changes
You have 4 dense layers. I suspect that this will lead to a case where your model
is over-fitting so I recommend that prior to the last layer you add a dropout layer
Use the code
drop=Dropout(rate=.4, seed=123)(age_1)
age_l = Dense(1,activation="linear")(age_l)
Note the activation is set to linear. That way the output can take a range of values
that can be compared to the integer values of the age array.
Now when you compile your model you want your loss to be mse. So it is measuring the error between the models output and the ages. Sparse_categorical crossentropy is used when you are doing classification which is NOT what you are doing. As for the metrics accuracy is used for classification models so you only want to use mae So you compile code should be
model.compile(optimizer="adam",loss="mse",metrics=['mae'])
now model.fit looks ok but you should run for more epochs like say 20. Now when you run your model look at the training loss and the validation loss. As the training loss decreases, on AVERAGE the validation loss should trend to decrease. If it starts to trend upward your model is over-fitting. In that case you may want to add an additional dropout layer.
At some point your model will stop improving if you run a sufficient number of epochs. You can usually get an improvement in performance if you use an adjustable learning rate. Since you are new to this you may not have experience using callbacks. Callbacks are used within model.fit and there are many types. Documentation for callbacks can be found here. To implement an adjustable learning rate you can use the callback ReduceLROnPlateau. The documentation for that is here. What it does is to set it up to monitor the validation loss. If the validation loss fails to reduce for a "patience" number of epochs the callback will reduce the learning rate by the parameter "factor" where
new_learning_rate=current_learning rate * factor
where factor is a float between 0 and 1.0. May recommended code for this callback is
shown below
rlronp=tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",factor=0.5,
patience=2, verbose=1)
I also recommend you use the callback EarlyStopping. The documentation for that is here. Set it up to monitor validation loss. If the loss fails to reduce for 'patience number of consecutive epochs training will be halted. Set the parameter restore_best_weights=True. That way if the callback halts training it leaves your model set with the weights for the epoch that had the lowest validation loss. My recommended code for the callback is shown below
estop=tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
verbose=1, restore_best_weights=True)
To use the callback for model.fit include the code
save = model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=20,
callbacks=[rlronp,estop])
By the way I think I am familar with this dataset or a similar one. Do not expect
great root mean squared error as I have seen many models for this and none had a small error margin. Incidentally if you want to learn machine learning there is an excellent set of about 200 tutorials on this by a guy called Gabriel Atkin. He can see his tutorials called Data Everyday here. The specific tutorial dealing with this kind of age dataset is located here.
Are the values of the variables (such as the Batch Normalization moving_mean and moving_variance) saved when the Estimator is exported? (e.g. with a BestExporter)
This is currently how I export the model:
best_exporter = tf.estimator.BestExporter(
name=best_model_path,
serving_input_receiver_fn=serving_input_receiver_fn,
exports_to_keep=1)
exporter = [best_exporter]
train_spec = tf.estimator.TrainSpec(...)
eval_spec = tf.estimator.EvalSpec(...,
exporters=exporter)
tf.estimator.train_and_evaluate(ben_classifier, train_spec, eval_spec)
At training time, I am adding the update operations for the BatchNormalization to the training operation
optimizer = tf.train.RMSPropOptimizer(learning_rate=L_RATE)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_ops = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_ops)
Restoring with a tf.contrib.predictor.from_saved_model does not allow checking the values of the variables.
So, my question is, is there a way to check it? And if so, how is it possible to save these BN variables at exporting time?
My performance at inference time is much worse from the one at training and evaluation time. I discarded the overfitting option because it's a very simple network and plus, performing the predictions directly using the estimator model at the end of the training (or generally the last one) would result in better performance than the best model.
I'm trying to implement an adversarial loss in keras.
The model consists of two networks, one auto-encoder (the target model) and one discriminator. The two models share the encoder.
I created the adversarial loss of the auto-encoder by setting a keras variable
def get_adv_loss(d_loss):
def loss(y_true, y_pred):
return some_loss(y_true, y_pred) - d_loss
return loss
discriminator_loss = K.variable()
L = get_adv_loss(discriminator_loss)
autoencoder.compile(..., loss=L)
and during training I interleave train_on_batch of discriminator and autoencoder to update discriminator_loss
d_loss = disciminator.train_on_batch(x, y_domain)
discriminator_loss.assign(d_loss)
a_loss, ... = self.segmenter.train_on_batch(x, y_target)
However, I found out that the value of these variables is frozen when the model is compiled. I tried to recompile the model during training but that raise the error
Node 'IsVariableInitialized_13644': Unknown input node
'training_12/Adam/Variable'
which I guess it means i cant recompile during training? any suggestion on how i can inject the discriminator loss in the autoencoder?
Keras model supports multiple outputs. So just include your discirminator into your keras model and freeze the discrminator layers, if the discriminator should not be trained.
The next question would be how to combine autoencoder loss and discriminator loss. Luckily keras model.compile supports loss weights. If autoencoder is your first output and discriminator is your second you could do something like loss_weights=[1, -1]. So a better discriminator is worse for the autoencoder.
Edit: Here is an example, how to implement an Adversary Network:
# Build your architecture
auto_encoder_input = Input((5,))
auto_encoder_net = Dense(10)(auto_encoder_input)
auto_encoder_output = Dense(5)(auto_encoder_net)
discriminator_net = Dense(20)(auto_encoder_output)
discriminator_output = Dense(5)(discriminator_net)
# Define outputs of your model
train_autoencoder_model = Model(auto_encoder_input, [auto_encoder_output, discriminator_output])
train_discriminator_model = Model(auto_encoder_input, discriminator_output)
# Compile the models (compile the first model and then change the trainable attribute for the second)
for layer_index, layer in enumerate(train_autoencoder_model.layers):
layer.trainable = layer_index < 3
train_autoencoder_model.compile('Adam', loss=['mse', 'mse'], loss_weights=[1, -1])
for layer_index, layer in enumerate(train_discriminator_model.layers):
layer.trainable = layer_index >= 3
train_discriminator_model.compile('Adam', loss='mse')
# A simple example how a training can look like
for i in range(10):
auto_input = np.random.sample((10,5))
discrimi_output = np.random.sample((10,5))
train_discriminator_model.fit(auto_input, discrimi_output, steps_per_epoch=5, epochs=1)
train_autoencoder_model.fit(auto_input, [auto_input, discrimi_output], steps_per_epoch=1, epochs=1)
As you can see there is no much magic behind building an Adversary Model with keras.
Unless you decide to go deep in the keras source code, I don't think you can do this easily. Before writing your own adversarial module, you should check the existing works carefully. As far as I know, keras-adversarial is still used by many people. Of course, it only supports old keras versions, e.g. 2.0.8.
Several other things:
be careful when you freeze your model weights. If you first compile a model and then freeze some weights, these weights are still trainable, because when the train function is generated during compiling. So you should freeze weights first then compile.
keras-adversarial does this job in a more elegant way. Instead of making two models, shared weights but freeze some weights in different ways, it creates two train functions, one for each player.
I'm using the latest Keras with Tensorflow backend (Python 3.6)
I'm loading a model that had a training accuracy at around 86% when I last trained it.
The orginal optimizer that I used was :
r_optimizer = optimizer=Adam(lr=0.0001, decay = .02)
model.compile(optimizer= r_optimizer,
loss='categorical_crossentropy', metrics = ['accuracy'])
If I load the model and continue training without recompiling, my
accuracy would stay around 86% (even after 10 or so more epochs).
So I wanted to try changing the learning rate or optimizer.
If I recompile the model and try to change the learning rate or the
optimizer as follows:
new_optimizer = optimizer=Adam(lr=0.001, decay = .02)
or to this one:
sgd = optimizers.SGD(lr= .0001)
and then compile:
model.compile(optimizer= new_optimizer ,
loss='categorical_crossentropy', metrics = ['accuracy'])
model.fit ....
The accuracy would reset to around 15% - 20%, instead of starting around 86%,
and my loss would be much higher.
Even if I used a small learning rate, and recompiled, I would still start
off from a very low accuracy.
From browsing the internet it seems some optimizers like ADAM or RMSPROP have
a problem with resetting weights after recompiling (can't find the link at the moment)
So I did some digging and tried to reset my optimizer without recompiling as follows:
model = load_model(load_path)
sgd = optimizers.SGD(lr=1.0) # very high for testing
model.optimizer = sgd #change optimizer
#fit for training
history =model.fit_generator(
train_gen,
steps_per_epoch = r_steps_per_epoch,
epochs = r_epochs,
validation_data=valid_gen,
validation_steps= np.ceil(len(valid_gen.filenames)/r_batch_size),
callbacks = callbacks,
shuffle= True,
verbose = 1)
However, these changes don't seem to be reflected in my training.
Despite raising the lr significantly, I'm still floundering around 86% with the same loss. During each epoch, I'm seeing very little loss or accuracy movement. I would expect the loss to be a lot more volatile.
This leads me to believe that my change in optimizer and lr isn't being
realized by the model.
Any idea what I could be doing wrong?
I think your change does not assign new lr to optimizer, and I find a solution to reset lr values after loading model in Keras, hope it will help you.
This is a partial answer referring to what you wrote here:
From browsing the internet it seems some optimizers like ADAM or RMSPROP have a problem with resetting weights after recompiling (can't find the link at the moment)
Adaptive optimizers such as ADAM RMSPROP, ADAGRAD, ADADELTA, and any variation on these, rely on previous update steps to improve the direction and magnitude of any current adjustment to the weights of the model.
Because of this, the first few steps that they take tend to be relatively "bad" as they "calibrate themselves" with information from previous steps.
When used on a random initialization, this is not a problem, but when used on a pretrained model, these few first steps, can degrade the model so much, that almost all of the pretrained work gets lost.
Even worse, now the training doesn't start from a carefully chosen random initialization like a Xavier initialization, but from some sub-optimal starting point, which could potentially prevent the model from converging to the local optimum that it would have reached if it started from a good random initialization.
Unfortunately I'm not sure how you can avoid this... Perhaps pretrain with one optimizer --> save weights --> replace optimizer --> restore weights --> train for a few epochs and hope the new adaptive optimizer learns a "useful history" --> than restore the weights agin from the saved weights of the pretrained model and without recompiling start training again, now with a better optimizer "history".
Please let us know if this works.
I am trying to visualize the errors of each layer in CNN by tensor-board with Keras to see how they change in every layer timely. How do I get errors for each layer?
The loss is defined only in the output layer to measure how good your model fits the data. keras provides a function to keep track of relevant variables during the training process called History().
from keras.callbacks import History
history = History()
# define and compile your model
model.fit(..., callbacks=[history])
print(history.history)
The last command shows you all tracked values during the training process. You can access single variables via the get() method. To get the training loss, you can access it via
history.history.get('loss')