Are the values of the variables (such as the Batch Normalization moving_mean and moving_variance) saved when the Estimator is exported? (e.g. with a BestExporter)
This is currently how I export the model:
best_exporter = tf.estimator.BestExporter(
    name=best_model_path,
    serving_input_receiver_fn=serving_input_receiver_fn,
    exports_to_keep=1)
exporter = [best_exporter]
train_spec = tf.estimator.TrainSpec(...)
eval_spec = tf.estimator.EvalSpec(...,
                                  exporters=exporter)
tf.estimator.train_and_evaluate(ben_classifier, train_spec, eval_spec)
At training time, I add the Batch Normalization update operations as a dependency of the training operation:
optimizer = tf.train.RMSPropOptimizer(learning_rate=L_RATE)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_ops = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_ops)
Restoring the model with tf.contrib.predictor.from_saved_model does not let me check the values of the variables.
So my question is: is there a way to check this? And, if they are not being saved, how can I make these BN variables part of the export?
My performance at inference time is much worse than at training and evaluation time. I ruled out overfitting because it is a very simple network, and besides, running predictions directly with the estimator checkpoint at the end of training (or, in general, the last checkpoint) gives better performance than the exported best model.
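For reference, one way to peek at the variables stored in an exported SavedModel is sketched below (assuming a TF 1.x export like the one above; the export path and the tensor name are placeholders):
import tensorflow as tf

export_dir = "exports/best_model/1234567890"  # placeholder path to one export

# Option 1: read the variables file shipped inside the SavedModel directly.
reader = tf.train.load_checkpoint(export_dir + "/variables/variables")
for name in reader.get_variable_to_shape_map():
    if "moving_mean" in name or "moving_variance" in name:
        print(name, reader.get_tensor(name))

# Option 2: load the SavedModel into a session and evaluate a tensor by name.
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    # "batch_normalization/moving_mean:0" is a hypothetical tensor name
    print(sess.run(sess.graph.get_tensor_by_name("batch_normalization/moving_mean:0")))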
I have been browsing the documentation for the tensorflow.keras.save_model() API and came across the include_optimizer parameter. What is the advantage of not including the optimizer, and what problems could arise if the optimizer isn't saved with the model?
To give more context on my specific use case: I want to save a model and then use the generated .pb file with TensorFlow Serving. Is there any reason I would need to save the optimizer state? Would omitting it reduce the overall size of the resulting file? And if I don't save it, is it possible that the model will not work correctly in TF Serving?
Saving the optimizer state will require more space, as the optimizer has parameters that are adjusted during training. For some optimizers, this space can be significant, as several meta-parameters are saved for each tuned model parameter.
Saving the optimizer parameters allows you to restart training in exactly the state in which you saved the checkpoint, whereas without the optimizer state, resuming from the same model parameters can lead to different training outcomes because the optimizer starts over with freshly initialized parameters.
Thus, if you plan on continuing to train your model from the saved checkpoint, you'd probably want to save the optimizer's state as well. However, if you're instead saving the model state for future use only for inference, you don't need the optimizer state for anything. Based on your description of wanting to deploy the model on TF Serving, it sounds like you'll only be doing inference with the saved model, so are safe to exclude the optimizer.
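For example, a minimal sketch of exporting for inference only (the model variable and the output path are placeholders):
import tensorflow as tf

# model = ...  # an already-trained tf.keras model

# Export in the SavedModel format without the optimizer state; the resulting
# directory can be pointed at by TF Serving as-is.
tf.keras.models.save_model(model, "export/my_model/1", include_optimizer=False)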
This is my predict function. Is there anything wrong with it? Predictions are not stable; every time I run it on the same data, I get different predictions.
import numpy as np
import torch


def predict(model, device, inputs, batch_size=1024):
    model = model.to(device)
    dataset = torch.utils.data.TensorDataset(*inputs)
    loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        pin_memory=False
    )
    predictions = []
    for i, batch in enumerate(loader):
        with torch.no_grad():
            pred = model(*(item.to(device) for item in batch))
        pred = pred.detach().cpu().numpy()
        predictions.append(pred)
    return np.concatenate(predictions)
As Usman Ali suggested, you need to set your model to eval mode by calling
model.eval()
before calling your prediction function.
What eval mode does:
Sets the module in evaluation mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
When you finish your prediction and wish to continue training, don't forget to reset your model to training mode by calling
model.train()
Several layers in a model may introduce randomness into the forward pass of the net. One such example is dropout layers: a dropout layer "drops" a fraction p of its neurons at random during training to improve the model's generalization.
Additionally, BatchNorm (and possibly other adaptive normalization layers) keeps track of the statistics of the data and therefore has a different "behavior" in train mode or in eval mode.
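Putting it together, a minimal sketch around the predict() function from the question:
model.eval()                            # BN uses running stats, dropout is disabled
preds = predict(model, device, inputs)  # forward passes are now deterministic
model.train()                           # switch back only if you keep training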
You have defined the function, but you haven't trained the model. The model randomizes predictions before it is trained, which is why yours are inconsistent. If you set up an optimizer with a loss function and train over multiple epochs, the predictions will stabilize. This link may help: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html. Look at sections 3 and 4.
I'm working with an existing TensorFlow model.
For one part of the network, I want to use a different learning rate than in the rest of the network. Say all_variables is made up of variables_1 and variables_2; then I want to change the learning rate only for the variables in variables_2.
The existing code for setting up the optimizer and computing and applying gradients looks basically like this:
optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
grads_and_vars = optimizer.compute_gradients(loss, all_variables)
grads_updates = optimizer.apply_gradients(grads_and_vars, global_step)
I already tried creating a second optimizer following this scheme. However, for debugging I set both learning rates equal, and the regularization loss still decreased very differently from the single-optimizer case.
Isn't it possible to create a second optimizer, optimizer_new, and simply call apply_gradients on the respective grads_and_vars of variables_1 and variables_2? I.e., instead of having this line
grads_updates = optimizer.apply_gradients(grads_and_vars, global_step)
one could use
grads_updates = optimizer.apply_gradients(grads_and_vars['variables_1'], global_step)
grads_updates_new = optimizer_new.apply_gradients(grads_and_vars['variables_2'], global_step)
and finally, train_op = tf.group(grads_updates, grads_updates_new).
However, the regularization loss behavior is still present.
I found the cause through a comment on this post. In my case, it doesn't make sense to supply "global_step" twice to the global_step argument of apply_gradients, since the counter then advances twice per training step. As the learning_rate, and therefore the optimizer's behaviour, depends on global_step, the training process, and especially the regularization loss behaviour, differs. Thanks to y.selivonchyk for pointing this out.
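For reference, a sketch of how the split can be written so that global_step is incremented only once per training step (variables_1, variables_2, and the second learning rate are placeholders):
optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
optimizer_new = tf.train.MomentumOptimizer(learning_rate_new, 0.9)

grads_and_vars = optimizer.compute_gradients(loss, variables_1 + variables_2)
gv_1 = [(g, v) for g, v in grads_and_vars if v in set(variables_1)]
gv_2 = [(g, v) for g, v in grads_and_vars if v in set(variables_2)]

# Pass global_step to only ONE apply_gradients call; otherwise the counter
# (and every schedule that depends on it) advances twice per training step.
grads_updates = optimizer.apply_gradients(gv_1, global_step=global_step)
grads_updates_new = optimizer_new.apply_gradients(gv_2)
train_op = tf.group(grads_updates, grads_updates_new)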
I want to ask a question about how to monitor validation loss in the training process of estimators in TensorFlow. I have checked a similar question (validation during training of Estimator) asked before, but it did not help much.
If I use estimators to build a model, I pass an input function to the Estimator.train() method, but there is no way to pass additional validation_x and validation_y data into the training process. Therefore, once training starts, I can only see the training loss. The training loss is expected to decrease as training runs longer, but that information is not helpful for preventing overfitting. The more valuable information is the validation loss, which is usually U-shaped as a function of the number of epochs. To prevent overfitting, we want to find the number of epochs at which the validation loss is at its minimum.
So this is my problem: how can I get the validation loss for each epoch when training with estimators?
You need to create a validation input_fn and either alternate estimator.train() and estimator.evaluate(), or simply use tf.estimator.train_and_evaluate().
x_train, y_train = ...
x_val, y_val = ...
...
# For example, if the x and y arrays are numpy arrays < 2 GB, wrap them in
# input functions (input_fn must be a callable that builds the dataset):
train_input_fn = lambda: tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size)
val_input_fn = lambda: tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(batch_size)
...
estimator = ...
for epoch in range(n_epochs):
    estimator.train(input_fn=train_input_fn)
    estimator.evaluate(input_fn=val_input_fn)
estimator.evaluate() will compute the loss and any other metrics that are defined in your model_fn and will save the events in a new "eval" directory inside your job_dir.
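Alternatively, a minimal sketch of the tf.estimator.train_and_evaluate() route (the step count and throttling interval are placeholder values):
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=val_input_fn,
                                  steps=None,        # evaluate on the whole validation set
                                  throttle_secs=60)  # at most one evaluation per minute
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)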
I have been unable to figure out how to use transfer learning/last layer retraining with the new TF Estimator API.
The Estimator requires a model_fn which contains the architecture of the network, and training and eval ops, as defined in the documentation. An example of a model_fn using a CNN architecture is here.
If I want to retrain the last layer of, for example, the inception architecture, I'm not sure whether I will need to specify the whole model in this model_fn, then load the pre-trained weights, or whether there is a way to use the saved graph as is done in the 'traditional' approach (example here).
This has been brought up as an issue, but is still open and the answers are unclear to me.
It is possible to load the metagraph during model definition and use a SessionRunHook to load the weights from a ckpt file.
def model(features, labels, mode, params):
    # Create the graph here
    return tf.estimator.EstimatorSpec(mode,
                                      predictions,
                                      loss,
                                      train_op,
                                      training_hooks=[RestoreHook()])
The SessionRunHook can be:
class RestoreHook(tf.train.SessionRunHook):
    def after_create_session(self, session, coord=None):
        if session.run(tf.train.get_or_create_global_step()) == 0:
            # load weights here
This way, the weights are loaded at the first step and are then saved along with the rest of the model in subsequent checkpoints.
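For illustration, the "load weights here" part could look something like the sketch below, where the checkpoint path and variable scope are hypothetical; a tf.train.Saver restricted to the pre-trained variables restores them from the checkpoint:
class RestoreHook(tf.train.SessionRunHook):
    def __init__(self, checkpoint_path, scope="InceptionV3"):
        # Both arguments are placeholders for your own setup.
        self._checkpoint_path = checkpoint_path
        self._scope = scope

    def begin(self):
        # begin() runs before the graph is finalized, so the Saver can still be built here.
        variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self._scope)
        self._saver = tf.train.Saver(var_list=variables)

    def after_create_session(self, session, coord=None):
        # Restore only on a fresh run; later restarts reuse the Estimator's own checkpoints.
        if session.run(tf.train.get_or_create_global_step()) == 0:
            self._saver.restore(session, self._checkpoint_path)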