I want to use hub for both training and serving, but I am getting a little confused about how to do it on the same graph. Namely, I have something like:
def build_graph(..., mode, ...):
    tags_and_args = ...  # one for training, one for serving
    if mode == 'training':
        hub.create_module_spec(module_fn, tags_and_args=tags_and_args)
        module_output = hub.Module(...)
        hub.register_module_for_export(module_fn, tags_and_args=tags_and_args)
        loss, output = ...
    else:
        module_output = hub.Module(XXX)
Should I reload the module from disk, so that XXX is the path where I saved it before? Or is it somehow kept as a graph object in memory?
I will call my code as
estimator.train(...)
exporter = hub.LatestModuleExporter(...)
exporter.export(...)
estimator.export_savedmodel(...)  # for serving
You can use a hub.Module in the model_fn of an Estimator without ever exporting it. At the start of Estimator.train(), the module's variables will be initialized from their pre-trained values (much like other variables are initialized randomly). After that, the module's variables behave much like the other variables of your model - they are part of the model's checkpoint, and restored from there for evaluation, resumed training, or export to a SavedModel for serving, like any other variable.
Exporting a hub.Module is only needed in case you want to create a new version of the module (with the weights updated from your training) available to yet another, separate Estimator.
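For concreteness, here is a minimal sketch of that pattern (hedged: the module URL, feature key, and classification head below are placeholders, not taken from the question):

import tensorflow as tf
import tensorflow_hub as hub

def model_fn(features, labels, mode):
    # trainable=True fine-tunes the module; its weights then live in this
    # Estimator's checkpoint like any other variable.
    embed = hub.Module("https://tfhub.dev/google/nnlm-en-dim128/1",
                       trainable=(mode == tf.estimator.ModeKeys.TRAIN))
    embeddings = embed(features["text"])
    logits = tf.layers.dense(embeddings, 2)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={"logits": logits})

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)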
Related
How can I save a PyTorch model without a need for the model class to be defined somewhere?
Disclaimer:
In Best way to save a trained model in PyTorch?, there is no solution (or at least no working one) for saving the model without access to the model class code.
If you plan to do inference with the PyTorch library available (i.e. PyTorch in Python, C++, or on other platforms it supports), then the best way to do this is via TorchScript.
I think the simplest thing is to use trace = torch.jit.trace(model, typical_input) and then torch.jit.save(trace, path). You can then load the traced model with torch.jit.load(path).
Here's a really simple example. We make two files:
train.py :
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        x = torch.relu(self.linear(x))
        return x

model = Model()

x = torch.FloatTensor([[0.2, 0.3, 0.2, 0.7], [0.4, 0.2, 0.8, 0.9]])

with torch.no_grad():
    print(model(x))

traced_cell = torch.jit.trace(model, (x))
torch.jit.save(traced_cell, "model.pth")
infer.py :
import torch

x = torch.FloatTensor([[0.2, 0.3, 0.2, 0.7], [0.4, 0.2, 0.8, 0.9]])

loaded_trace = torch.jit.load("model.pth")
with torch.no_grad():
    print(loaded_trace(x))
Running these sequentially gives results:
python train.py
tensor([[0.0000, 0.1845, 0.2910, 0.2497],
[0.0000, 0.5272, 0.3481, 0.1743]])
python infer.py
tensor([[0.0000, 0.1845, 0.2910, 0.2497],
[0.0000, 0.5272, 0.3481, 0.1743]])
The results are the same, so we are good. (Note that the actual numbers will differ between runs of train.py, due to the random initialisation of the nn.Linear layer.)
TorchScript provides for much more complex architectures and graph definitions (including if statements, while loops, and more) to be saved in a single file, without needing to redefine the graph at inference time. See the docs (linked above) for more advanced possibilities.
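If your model has data-dependent control flow, torch.jit.script is usually the better choice, since a trace only records the branch taken for the example input. A hedged sketch (the toy Gate module below is illustrative):

import torch

class Gate(torch.nn.Module):
    def __init__(self):
        super(Gate, self).__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        # Data-dependent branch: tracing would freeze one path,
        # scripting keeps both.
        if x.sum() > 0:
            return self.linear(x)
        else:
            return -self.linear(x)

scripted = torch.jit.script(Gate())
torch.jit.save(scripted, "gate.pth")
# torch.jit.load("gate.pth") then works without the Gate class being defined.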
I recommend converting your PyTorch model to ONNX and saving it that way. It is probably the best way to store a model without access to the class.
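For reference, the export is roughly as follows (hedged: model, the input shape, and the onnxruntime usage are illustrative, not from the answer above):

import torch

# Assumes `model` is your trained nn.Module and the dummy input matches
# the shape it expects.
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# At inference time no Python class is needed, e.g. with onnxruntime:
# import onnxruntime
# session = onnxruntime.InferenceSession("model.onnx")
# outputs = session.run(None, {"input": dummy_input.numpy()})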
Supplying an official answer by one of the core PyTorch devs (smth):
There are limitations to loading a pytorch model without code.
First limitation:
We only save the source code of the class definition. We do not save beyond that (like the package sources that the class is referring to).
For example:
import foo

class MyModel(...):
    def forward(input):
        foo.bar(input)
Here the package foo is not saved in the model checkpoint.
Second limitation:
There are limitations on robustly serializing python constructs. For example the default picklers cannot serialize lambdas. There are helper packages that can serialize more python constructs than the standard pickler, but they still have limitations. Dill is one such package.
Given these limitations, there is no robust way to have torch.load work without having the original source files.
There is no solution (or at least no working solution) for saving a model without access to the class.
You can save whatever you like.
You can save the model, torch.save(model, filepath). It saves the model object itself.
You can save just the model state dict.
torch.save(model.state_dict(), filepath)
Further, you can save anything you like, since torch.save is just a pickle based save.
state = {
    'hello_text': 'just the optimizer sd will be saved',
    'optimizer': optimizer.state_dict(),
}
torch.save(state, filepath)
You may check what I wrote on torch.save some time ago.
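For completeness, the loading side of each option looks roughly like this (hedged: Model, filepath, and optimizer stand in for your own objects):

import torch

# Whole-model save/load (needs the class definition importable at load time):
model = torch.load(filepath)
model.eval()

# State-dict save/load (needs the class to construct `model` first):
model = Model()
model.load_state_dict(torch.load(filepath))
model.eval()

# Arbitrary pickled dict, matching the `state` example above:
state = torch.load(filepath)
optimizer.load_state_dict(state['optimizer'])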
I've been trying to research model/weight saving for a while, but I still can't fully grasp it. I feel what I'd like to do should be simple enough, but I've not found a solution.
The final goal is to do transfer learning with a collection of pretrained networks. I write my models/layers as classes, so class method(s) for saving the weights and restoring would be ideal.
Example:
If I have a graph, features > A > B > labels, where A and B are sub-networks, I'd like to save and/or restore weights for these sections. Say I already have the weights for A trained, but the variable scope is now different; how would I restore the weights I've trained for A from a different training session? At the end of training this new graph I'd like one directory for my new A weights, one directory for my new B weights, and one directory for the full graph (I can handle the full graph bit).
It's very possible I keep overlooking the solution, but model saving is so poorly documented.
Hope I've explained the scenario well.
You can do this with tf.train.init_from_checkpoint
Define your model
def model_fn():
    with tf.variable_scope('One'):
        layer = any_tf_layer
    with tf.variable_scope('Two'):
        layer = any_tf_layer
List the variable names in the checkpoint file:
vars = [i[0] for i in tf.train.list_variables(ckpt_file)]
Then you can create an assignment map to load only the variables defined in your model. You can also assign new names to the restored variables:
map = {variable.op.name: variable for variable in tf.global_variables() if variable.op.name in vars}
This line is placed before the session is created, or outside the model function when using the Estimator API:
tf.train.init_from_checkpoint(ckpt_file, map)
https://www.tensorflow.org/api_docs/python/tf/train/init_from_checkpoint
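Putting the pieces together, a hedged sketch (TF 1.x; ckpt_file and the placeholder/dense layers are illustrative, not from the answer above):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 4])
with tf.variable_scope('One'):
    hidden = tf.layers.dense(inputs, 4)
with tf.variable_scope('Two'):
    outputs = tf.layers.dense(hidden, 4)

ckpt_vars = [name for name, _ in tf.train.list_variables(ckpt_file)]
assignment_map = {v.op.name: v
                  for v in tf.global_variables()
                  if v.op.name in ckpt_vars}
# Rewires the variable initializers; call it before creating the session.
tf.train.init_from_checkpoint(ckpt_file, assignment_map)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # now loads from the checkpoint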
You can also do it with tf.train.Saver. First you need to know the names of the variables:
vars_dict = {}
for var_current in tf.global_variables():
    print(var_current)
    print(var_current.op.name)  # this gets only the name

for var_ckpt in tf.train.list_variables(ckpt):
    print(var_ckpt[0])  # this gets only the name
When you know the exact names of all the variables you can assign whatever value you need, provided the variables have the same shape and dtype. So to build a dictionary:
vars_dict[var_ckpt[0]] = tf.get_variable(var_current.op.name, shape)  # remember to specify shape; you can always get it from var_current
saver = tf.train.Saver(vars_dict)
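Combined into one hedged sketch (ckpt is your checkpoint path; the optional renaming line is illustrative):

import tensorflow as tf

vars_dict = {}
graph_vars = {v.op.name: v for v in tf.global_variables()}
for name, shape in tf.train.list_variables(ckpt):
    if name in graph_vars:
        # Same name in checkpoint and current graph.
        vars_dict[name] = graph_vars[name]
    # Or map an old checkpoint name to a new variable, e.g.:
    # vars_dict[name] = graph_vars['new_scope/' + name.split('/', 1)[1]]

saver = tf.train.Saver(vars_dict)
with tf.Session() as sess:
    saver.restore(sess, ckpt)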
Take a look at my other answer to similar question
How to restore pretrained checkpoint for current model in Tensorflow?
I am working on a VAE project in TensorFlow where the encoder/decoder networks are build in functions. The idea is to be able to save, then load the trained model and do sampling, using the encoder function.
After restoring the model, I am having trouble getting the decoder function to run and give me back the restored, trained variables; I get an "Uninitialized value" error. I assume it is because the function is either creating a new graph, overwriting the existing one, or something similar. But I cannot figure out how to solve this. Here is some code:
class VAE(object):
    def __init__(self, restore=True):
        self.session = tf.Session()
        if restore:
            self.restore_model()
        self.build_decoder = tf.make_template('decoder', self._build_decoder)

    @staticmethod
    def _build_decoder(z, output_size=768, hidden_size=200,
                       hidden_activation=tf.nn.elu, output_activation=tf.nn.sigmoid):
        x = tf.layers.dense(z, hidden_size, activation=hidden_activation)
        x = tf.layers.dense(x, hidden_size, activation=hidden_activation)
        logits = tf.layers.dense(x, output_size, activation=output_activation)
        return distributions.Independent(distributions.Bernoulli(logits), 2)

    def sample_decoder(self, n_samples):
        prior = self.build_prior(self.latent_dim)
        samples = self.build_decoder(prior.sample(n_samples), self.input_size).mean()
        return self.session.run([samples])

    def restore_model(self):
        print("Restoring")
        self.saver = tf.train.import_meta_graph(os.path.join(self.save_dir, "turbolearn.meta"))
        self.saver.restore(self.session, tf.train.latest_checkpoint(self.save_dir))
        self._restored = True
I want to run samples = vae.sample_decoder(5).
In my training routine, I run:
if self.checkpoint:
    self.saver.save(self.session, os.path.join(self.save_dir, "myvae"), write_meta_graph=True)
UPDATE
Based on the suggested answer below, I changed the restore method
self.saver = tf.train.Saver()
self.saver.restore(self.session, tf.train.latest_checkpoint(self.save_dir))
But now I get a ValueError when it creates the Saver() object:
ValueError: No variables to save
tf.train.import_meta_graph restores the graph, meaning it rebuilds the network architecture that was stored in the file. The call to tf.train.Saver.restore, on the other hand, only restores the variable values from the file into the current graph of the session (this naturally fails if some values in the file belong to variables that do not exist in the currently active graph).
So if you already build the network layers in your code, you don't need to call tf.train.import_meta_graph; otherwise it might be causing you problems.
I'm not sure what the rest of your code looks like, but here are some suggestions. First build the graph, then create the session, and finally restore if applicable. Your __init__ might then look like this:
def __init__(self, restore=True):
    self.build_decoder = tf.make_template('decoder', self._build_decoder)
    self.session = tf.Session()
    if restore:
        self.restore_model()
However, if you are only restoring the encoder and building the decoder anew, you might build the decoder last. But then don't forget to initialize its variables before use.
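A hedged sketch of that last step, assuming the decoder variables live under the 'decoder' scope created by tf.make_template (note that the template only creates its variables on the first call to self.build_decoder):

# Initialize only the decoder's (new) variables after restoring the rest.
decoder_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='decoder')
self.session.run(tf.variables_initializer(decoder_vars))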
What is the best way to store a trainer and all necessary components?
1. Storing:
Store checkpoint of the trainer: Use its trainer.save_checkpoint(filename, external_state={}) function
Additionally store the model separately: Use the z.save(filename) method that every CNTK function has. You can also get z = trainer.model.
2. Reloading:
Restore the model: Use C.load_model(...). (Don't get confused by the deprecated persist namespace from CNTK 1.)
Get the inputs from the restored model.
Restore the trainer itself: Use trainer.restore_from_checkpoint, as e.g. shown here. The problem is that this function already needs a trainer object, which probably has to be initialized in the same way as the trainer used to create the checkpoint!?
How do I now restore the label inputs which go into the error function used by the trainer? In the following code I marked the variables which I think I have to restore once I have stored them.
z = C.layers.Dense(.... )
loss = error = C.squared_error(z, **l**)
**trainer** = C.Trainer(**z**, (loss, error), [mylearner], my_tensorboard_writer)
You can restore your trainer, but I actually prefer to just load my model m. The simple reason is that it is much easier to create a whole new trainer, because then you can change all its other parameters more easily.
Then you can get the input variable from the loaded model (if your network has only one input):
input_var = m.arguments[0]
then you need the output of your model:
output = m(input_var)
and define the loss function using your target output target_output:
C.squared_error(output, target_output)
Using your model and the loss function you can recreate your trainer from there, setting the learning rate etc. as you like.
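A hedged sketch of the whole sequence (the file name, shapes, and learner settings below are illustrative assumptions, not from the answer):

import cntk as C

m = C.load_model("model.cntk")
input_var = m.arguments[0]          # the network's input variable
output = m(input_var)

# Recreate the target input and the loss/error criteria.
target_output = C.input_variable(output.shape)
loss = C.squared_error(output, target_output)
error = C.squared_error(output, target_output)

# Build a fresh learner and trainer around the loaded model.
lr_schedule = C.learning_rate_schedule(0.01, C.UnitType.minibatch)
learner = C.sgd(m.parameters, lr_schedule)
trainer = C.Trainer(m, (loss, error), [learner])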
I create a lookup table from tf.contrib.lookup, using the training data (as input). Then, I pass every input through that lookup table, before passing it through my model.
This works for training, but when it comes to online prediction from this same model, it raises the error:
Table not initialized
I'm using SavedModel to save the model. I run the prediction from this saved model.
How can I initialize this table so that it stays initialized? Or is there a better way to save the model so that the table is always initialized?
I think you would be better off using tf.tables_initializer() as the legacy_init_op.
tf.saved_model.main_op.main_op() also adds local and global initialization ops in addition to table initialization.
When you load the saved model and it runs that op as the legacy_init_op, it will reset your variables, which is not what you want.
You can specify an "initialization" operation when you add a meta graph to your SavedModel bundle with tf.saved_model.builder.SavedModelBuilder.add_meta_graph, using the main_op or legacy_init_op kwarg. You can either use a single operation, or group together a number of operations with tf.group if you need more than one.
Note that in Cloud ML Engine, you'll have to use the legacy_init_op. However, in future runtime_versions you will be able to use main_op
(IIRC, starting with runtime_version == 1.2)
The saved_model module provides a built in tf.saved_model.main_op.main_op to wrap up common initialization actions in a single op (local variable initialization, and table initialization).
So in summary, code should look like this (adapted from this example):
exporter = tf.saved_model.builder.SavedModelBuilder(
    os.path.join(job_dir, 'export', name))

# signature_def gets constructed here

with tf.Session(graph=prediction_graph) as session:
    # Need to be initialized before saved variables are restored
    session.run([tf.local_variables_initializer(), tf.tables_initializer()])
    # Restore the value of the saved variables
    saver.restore(session, latest)
    exporter.add_meta_graph_and_variables(
        session,
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature_def
        },
        # Relevant change to the linked example is here!
        legacy_init_op=tf.saved_model.main_op.main_op()
    )
NOTE: If you are using the high level libraries (such as tf.estimator) this should be the default, and if you need to specify additional initialization actions you can specify them as part of the tf.train.Scaffold object that you pass to your tf.estimator.EstimatorSpec in your model_fn.
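A hedged sketch of that Scaffold route (the toy model is a placeholder; note that the default Scaffold already runs the table initializers, so this only shows where custom init ops would go):

import tensorflow as tf

def model_fn(features, labels, mode):
    logits = tf.layers.dense(features["x"], 2)  # placeholder model

    # Group any extra initialization actions into the Scaffold.
    scaffold = tf.train.Scaffold(
        local_init_op=tf.group(tf.local_variables_initializer(),
                               tf.tables_initializer()))

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode, predictions={"logits": logits}, scaffold=scaffold)

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(
        mode, loss=loss, train_op=train_op, scaffold=scaffold)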