Loading trained Tensorflow model into estimator - python

Say that I have trained a Tensorflow Estimator:
estimator = tf.contrib.learn.Estimator(
    model_fn=model_fn,
    model_dir=MODEL_DIR,
    config=some_config)
And I fit it to some train data:
estimator.fit(input_fn=input_fn_train, steps=None)
The idea is that a model is fitted and saved to MODEL_DIR. This folder contains a checkpoint file plus several .meta and .index files.
This works perfectly. Now I want to make some predictions with the trained model:
estimator = tf.contrib.learn.Estimator(
    model_fn=model_fn,
    model_dir=MODEL_DIR,
    config=some_config)
predictions = estimator.predict(input_fn=input_fn_test)
My solution works, but it has one big disadvantage: you need to know model_fn, i.e. the model definition in Python. If I change the model, for example by adding a dense layer to my Python code, the definition no longer matches what was saved in MODEL_DIR, and restoring fails:
NotFoundError (see above for traceback): Key xxxx/dense/kernel not found in checkpoint
How do I cope with this? How can I load my model / estimator such that I can make predictions on some new data? How can I load model_fn or the estimator from MODEL_DIR?

Avoiding a bad restoration
Restoring a model's state from a checkpoint only works if the model and checkpoint are compatible. For example, suppose you trained a DNNClassifier Estimator containing two hidden layers, each having 10 nodes:
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='models/iris')

classifier.train(
    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=100),
    steps=200)
After training (and, therefore, after creating checkpoints in models/iris), imagine that you changed the number of neurons in each hidden layer from 10 to 20 and then attempted to retrain the model:
classifier2 = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[20, 20],  # Change the number of neurons in the model.
    n_classes=3,
    model_dir='models/iris')

classifier2.train(
    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=100),
    steps=200)
Since the state in the checkpoint is incompatible with the model described in classifier2, retraining fails with the following error:
...
InvalidArgumentError (see above for traceback): tensor_name =
dnn/hiddenlayer_1/bias/t_0/Adagrad; shape in shape_and_slice spec [10]
does not match the shape stored in checkpoint: [20]
To run experiments in which you train and compare slightly different versions of a model, save a copy of the code that created each model_dir, possibly by creating a separate git branch for each version. This separation will keep your checkpoints recoverable.
Copied from the TensorFlow checkpoints documentation:
https://www.tensorflow.org/get_started/checkpoints
Hope that helps.
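In practice, the way to cope with this is to keep the code that defines model_fn unchanged and recreate the Estimator pointing at the same MODEL_DIR; the checkpoint is then restored automatically when you call predict. A minimal sketch, reusing the names model_fn, MODEL_DIR, some_config and input_fn_test from the question:

import tensorflow as tf

# Recreate the estimator with the *same*, unchanged model_fn and model_dir
# that were used for training; the latest checkpoint in MODEL_DIR is
# restored automatically when the graph is built.
estimator = tf.contrib.learn.Estimator(
    model_fn=model_fn,
    model_dir=MODEL_DIR,
    config=some_config)

# predict() rebuilds the graph from model_fn and loads the checkpointed weights.
predictions = list(estimator.predict(input_fn=input_fn_test))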

Related

Loaded pytorch model gives different results than originally trained model

I trained a Pytorch model, saved the testing error, and saved the complete model using torch.save(model, 'model.pt')
I loaded the model to test it on another dataset and found the error to be higher, so I tested it on the exact same dataset as before, and found the results to be different:
The predicted values differ only slightly, which tells me the model is basically correct but somehow not identical.
One difference is that the model was originally trained on GPUs with nn.DataParallel, while after loading I am evaluating it on the CPU.
model = torch.load('model.pt')
model = model.module  # To remove the DataParallel wrapper
model.eval()
with torch.no_grad():
    x = test_loader.dataset.tensors[0].cuda()
    pred = model(x)
    mae_loss = torch.nn.L1Loss(reduction='mean')
    mae = mae_loss(pred, y)
What could be causing this difference in model evaluation? Thank you in advance
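For reference, a minimal sketch of loading a DataParallel-trained model for consistent CPU evaluation; this assumes model.pt was saved with torch.save(model, ...) as above and that the test TensorDataset holds (x, y) pairs:

import torch

# Load the whole pickled model onto the CPU; map_location avoids device
# mismatches when the model was saved from a GPU / DataParallel run.
model = torch.load('model.pt', map_location='cpu')
if isinstance(model, torch.nn.DataParallel):
    model = model.module  # unwrap the DataParallel container

model.eval()  # disable dropout and use running batch-norm statistics

with torch.no_grad():
    x, y = test_loader.dataset.tensors  # assumes a TensorDataset(x, y)
    pred = model(x)
    mae = torch.nn.L1Loss(reduction='mean')(pred, y)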

How to fix inconsistent predictions right after training and after loading the saved model?

I trained my Keras (version 2.3.1) Sequential models for a regression problem and achieved very good results. Right after training, I make predictions on the test set and then save the model as well as the weights in separate files.
To check for the speed of the models, I recently loaded them and made predictions on a single test input array but the results are way off, which should mean that the weights at the end of the training are different from the ones being loaded.
I tried making predictions using the loaded model as is and from the loaded weights too. The results for both of them are consistent. So at least, it saves the same weights in both files, however wrong they are.
From what I have read, this looks like a common issue with Keras. I came across this suggestion in several places: set the global variable initializer manually.
My problem is that this suggestion, along with a few others (like setting a fixed seed), are to be put in place before training. Training my models takes 4-5 days! How can I fix this without having to retrain the models?
Here is how I fit the models:
hist = model.fit(
    X_train, y_train,
    batch_size=batch_size,
    verbose=1,
    epochs=epochs,
    validation_split=0.2
)
Then I save the model as well as the weights:
model.save("path to .h5 file")
model.save_weights("path to .hdf5 file")
Eventually, I am loading the model and predicting from it like so:
from keras.models import load_model
model = load_model("path to the same .h5 file")
ypred = model.predict(input_arr)
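One way to narrow this down without retraining is to check whether the weights that come back from disk actually match the weights of the model that was still in memory right after training. A minimal diagnostic sketch, assuming model is the freshly trained in-memory model and the path is the .h5 file saved above:

import numpy as np
from keras.models import load_model

reloaded = load_model("path to the same .h5 file")

# Compare every weight tensor of the trained in-memory model with the reloaded one.
mismatches = [
    i for i, (w_trained, w_loaded)
    in enumerate(zip(model.get_weights(), reloaded.get_weights()))
    if not np.allclose(w_trained, w_loaded)
]
print("Mismatching weight tensors:", mismatches or "none")

If the weights match but predictions still differ, the problem is more likely in the input preprocessing at prediction time than in saving/loading.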

Tensorflow: Download and run pretrained VGG or ResNet model

Let's start at the beginning. So far I have created and trained small networks in Tensorflow myself. During the training I save my model and get the following files in my directory:
model.ckpt.meta
model.ckpt.index
model.ckpt.data-00000-of-00001
Later, I load the model saved in network_dir to do some classifications and extract the trainable variables of my model.
saver = tf.train.import_meta_graph(network_dir + ".meta")
variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="NETWORK")
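For completeness, restoring the checkpointed values into a session looks roughly like this (a sketch assuming the checkpoint files share the prefix network_dir, as above):

import tensorflow as tf

# Rebuild the graph from the .meta file, then restore the variable values
# from the .index/.data files into a session.
saver = tf.train.import_meta_graph(network_dir + ".meta")
with tf.Session() as sess:
    saver.restore(sess, network_dir)
    variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="NETWORK")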
Now I want to work with larger pretrained models such as VGG16 or ResNet, and I want to load these pretrained models the same way I load my own networks above.
On this site, I found many pretrained models:
https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models
I downloaded the VGG16 checkpoint and realized that it contains only the trained parameters.
I would like to know how or where I can get the saved model or graph structure of these pretrained networks. How do I use, for example, the VGG16 checkpoint without the model.ckpt.meta, model.ckpt.index and model.ckpt.data-00000-of-00001 files?
Next to the weights link, there is a link to the code that defines the model (for instance, for VGG16: Code). Create the model using that code and restore the variables from the checkpoint:
import tensorflow as tf
from tensorflow.contrib.slim.nets import vgg  # model definition (vgg.py from the linked slim code)

slim = tf.contrib.slim

image = ...  # Define your input somehow, e.g. with a placeholder

logits, _ = vgg.vgg_16(image)
predictions = tf.argmax(logits, 1)

variables_to_restore = slim.get_variables_to_restore()
saver = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
    saver.restore(sess, "/path/to/model.ckpt")
So the code in vgg.py creates all the variables for you, and the tf-slim helper slim.get_variables_to_restore() gives you the list to pass to the Saver. Then just follow the usual restore procedure. There was a similar question on this.

Quantize a Keras neural network model

Recently, I've started creating neural networks with Tensorflow + Keras and I would like to try the quantization feature available in Tensorflow. So far, experimenting with examples from TF tutorials worked just fine and I have this basic working example (from https://www.tensorflow.org/tutorials/keras/basic_classification):
import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# fashion mnist data labels (indexes related to their respective labelling in the data set)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# preprocess the train and test images
train_images = train_images / 255.0
test_images = test_images / 255.0

# settings variables
input_shape = (train_images.shape[1], train_images.shape[2])

# create the model layers
model = keras.Sequential([
    keras.layers.Flatten(input_shape=input_shape),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

# compile the model with added settings
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train the model
epochs = 3
model.fit(train_images, train_labels, epochs=epochs)

# evaluate the accuracy of model on test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)
Now I would like to employ quantization in the learning and classification process. The quantization documentation (https://www.tensorflow.org/performance/quantization) (the page has not been available since circa September 15, 2018) suggested using this piece of code:
loss = tf.losses.get_total_loss()
tf.contrib.quantize.create_training_graph(quant_delay=2000000)
optimizer = tf.train.GradientDescentOptimizer(0.00001)
optimizer.minimize(loss)
However, it does not contain any information about where this code should go or how it connects to the rest of the TF code (let alone a high-level model created with Keras). I have no idea how this quantization part relates to the previously created neural network model. Simply inserting it after the neural network code produces the following error:
Traceback (most recent call last):
  File "so.py", line 41, in <module>
    loss = tf.losses.get_total_loss()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/util.py", line 112, in get_total_loss
    return math_ops.add_n(losses, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 2119, in add_n
    raise ValueError("inputs must be a list of at least one Tensor with the "
ValueError: inputs must be a list of at least one Tensor with the same dtype and shape
Is it possible to quantize a Keras NN model in this way or am I missing something basic?
A possible solution that crossed my mind could be using low level TF API instead of Keras (needing to do quite a bit of work to construct the model), or maybe trying to extract some of the lower level methods from the Keras models.
As mentioned in other answers, TensorFlow Lite can help you with network quantization.
TensorFlow Lite provides several levels of support for quantization. TensorFlow Lite post-training quantization quantizes weights and activations post-training easily. Quantization-aware training allows for training of networks that can be quantized with minimal accuracy drop; this is only available for a subset of convolutional neural network architectures.
So first, you need to decide whether you need post-training quantization or quantization-aware training. For example, if you have already saved the model as *.h5 files, you would probably want to follow @Mitiku's instructions and do post-training quantization.
If you prefer to achieve higher performance by simulating the effect of quantization in training (using the method you quoted in the question), and your model is in the subset of CNN architecture supported by quantization-aware training, this example may help you in terms of interaction between Keras and TensorFlow. Basically, you only need to add this code between model definition and its fitting:
sess = tf.keras.backend.get_session()
tf.contrib.quantize.create_training_graph(sess.graph)
sess.run(tf.global_variables_initializer())
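For orientation, this is roughly where that snippet sits in the workflow from the question; a sketch, assuming the same Sequential model and a TF 1.x installation where tf.contrib is available:

import tensorflow as tf
from tensorflow import keras

# 1. Define the model as before (this builds the forward graph).
model = keras.Sequential([
    keras.layers.Flatten(input_shape=input_shape),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

# 2. Rewrite the underlying graph with fake-quantization ops before fitting.
sess = tf.keras.backend.get_session()
tf.contrib.quantize.create_training_graph(sess.graph)
sess.run(tf.global_variables_initializer())

# 3. Compile and train as usual.
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=epochs)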
Since your network looks quite simple, TensorFlow Lite may suffice; it can be used to quantize a Keras model.
The following code was written for TensorFlow 1.14. It might not work for earlier versions.
First, after training, save your model to an .h5 file:
model.fit(train_images, train_labels, epochs=epochs)
# evaluate the accuracy of model on test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)
model.save("model.h5")
To load the Keras model, use tf.lite.TFLiteConverter.from_keras_model_file:
# load the previously saved model
converter = tf.lite.TFLiteConverter.from_keras_model_file("model.h5")
tflite_model = converter.convert()
# Save the model to file
with open("tflite_model.tflite", "wb") as output_file:
output_file.write(tflite_model)
The saved model can be loaded from a Python script or from other platforms and languages. To use a saved tflite model, tensorflow.lite provides the Interpreter class. The following example from here shows how to load a tflite model from a local file in Python.
import numpy as np
import tensorflow as tf
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="tflite_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

Transfer learning/ retraining with TensorFlow Estimators

I have been unable to figure out how to use transfer learning/last layer retraining with the new TF Estimator API.
The Estimator requires a model_fn which contains the architecture of the network, and training and eval ops, as defined in the documentation. An example of a model_fn using a CNN architecture is here.
If I want to retrain the last layer of, for example, the inception architecture, I'm not sure whether I will need to specify the whole model in this model_fn, then load the pre-trained weights, or whether there is a way to use the saved graph as is done in the 'traditional' approach (example here).
This has been brought up as an issue, but is still open and the answers are unclear to me.
It is possible to load the metagraph during model definition and use SessionRunHook to load the weights from a ckpt file.
def model(features, labels, mode, params):
    # Create the graph here
    return tf.estimator.EstimatorSpec(mode,
                                      predictions,
                                      loss,
                                      train_op,
                                      training_hooks=[RestoreHook()])
The SessionRunHook can be:
class RestoreHook(tf.train.SessionRunHook):
    def after_create_session(self, session, coord=None):
        if session.run(tf.train.get_or_create_global_step()) == 0:
            pass  # load weights here
This way, the weights are loaded at the first step and are then saved in the model checkpoints during training.
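A slightly fuller sketch of what "load weights here" could look like, assuming the pretrained weights live in a checkpoint whose variable names match the ones in your graph (the constructor arguments below are illustrative, not part of the original answer):

import tensorflow as tf

class RestoreHook(tf.train.SessionRunHook):
    def __init__(self, checkpoint_path, scope=None):
        self._checkpoint_path = checkpoint_path  # path to the pretrained checkpoint
        self._scope = scope                      # optionally restrict to one variable scope

    def begin(self):
        # Build the Saver while the graph is still being constructed.
        variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self._scope)
        self._saver = tf.train.Saver(variables)

    def after_create_session(self, session, coord=None):
        # Restore only at step 0, so resumed runs keep using the Estimator's own checkpoints.
        if session.run(tf.train.get_or_create_global_step()) == 0:
            self._saver.restore(session, self._checkpoint_path)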
