I've been trying to research model/weight saving for a while, but I still can't fully grasp it. I feel what I'd like to do should be simple enough, but I've not found a solution.
The final goal is to do transfer learning with a collection of pretrained networks. I write my models/layers as classes, so class methods for saving and restoring the weights would be ideal.
Example:
If I have a graph, features > A > B > labels, where A and B are sub-networks, I'd like to save and/or restore weights for these sections separately. Say I already have the weights for A trained, but the variable scope is now different: how would I restore the weights I've trained for A from a different training session? At the end of training this new graph, I'd like one directory for my new A weights, one directory for my new B weights, and one directory for the full graph (I can handle the full-graph part).
It's very possible I keep overlooking the solution, but model saving is so poorly documented.
Hope I've explained the scenario well.
You can do this with tf.train.init_from_checkpoint.
Define your model:
def model_fn():
    with tf.variable_scope('One'):
        layer = any_tf_layer
    with tf.variable_scope('Two'):
        layer = any_tf_layer
Output the variable names stored in the checkpoint file:
vars = [i[0] for i in tf.train.list_variables(ckpt_file)]
Then you can create an assignment map to load only the variables defined in your model. You can also assign new names to the restored variables:
assignment_map = {variable.op.name: variable for variable in tf.global_variables() if variable.op.name in vars}
This call is placed before the session is created (or outside the model function when using the Estimator API):
tf.train.init_from_checkpoint(ckpt_file, assignment_map)
https://www.tensorflow.org/api_docs/python/tf/train/init_from_checkpoint
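Putting the pieces together for the original question, here is a minimal sketch; the checkpoint path, the old scope name ('OldScopeA'), and the layer sizes are all placeholders. It restores a sub-network trained under a different scope by mapping the old scope onto the new one:

import tensorflow as tf

ckpt_file = '/path/to/pretrained_A/model.ckpt'  # hypothetical checkpoint path

features = tf.placeholder(tf.float32, [None, 64], name='features')
with tf.variable_scope('A'):                    # new scope in the current graph
    hidden = tf.layers.dense(features, 128, activation=tf.nn.relu)
with tf.variable_scope('B'):
    logits = tf.layers.dense(hidden, 10)

# Inspect what the checkpoint actually contains.
print([name for name, shape in tf.train.list_variables(ckpt_file)])

# Map the old scope prefix in the checkpoint onto the new scope in this graph;
# a trailing slash means "everything under this scope".
tf.train.init_from_checkpoint(ckpt_file, {'OldScopeA/': 'A/'})

# The mapped variables pick up the checkpoint values when the usual
# initializer runs.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())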
You can also do it with tf.train.Saver.
First you need to know the names of the variables:
vars_dict = {}
for var_current in tf.global_variables():
    print(var_current)
    print(var_current.op.name)  # this gets only the name
for var_ckpt in tf.train.list_variables(ckpt):
    print(var_ckpt[0])  # this gets only the name
When you know the exact names of all the variables, you can assign whatever values you need, provided the variables have the same shape and dtype. So, to build the dictionary:
vars_dict[var_ckpt[0]] = tf.get_variable(var_current.op.name, shape)  # remember to specify the shape; you can always get it from var_current
saver = tf.train.Saver(vars_dict)
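As a usage sketch (ckpt is assumed to be the checkpoint prefix written by the pre-training run), restoring through that Saver is then a single call:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize everything first
    saver.restore(sess, ckpt)                    # then overwrite the mapped variables from the checkpoint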
Take a look at my other answer to a similar question:
How to restore pretrained checkpoint for current model in Tensorflow?
Related
I want to use hub at training and serving time, but I am getting a little confused about how to do it on the same graph. Namely, I have something like:
def build_graph(..., mode, ...):
    tags_and_args = ...  # one for training, one for serving
    if mode == 'training':
        hub.create_module_spec(module_fn, tags_and_args=tags_and_args)
        module_output = hub.Module(...)
        hub.register_module_for_export(module_fn, tags_and_args=tags_and_args)
        loss, output = ...
    else:
        module_output = hub.Module(XXX)
Should I reload the module from disk, so that XXX is the path where I saved it before? Or is it somehow kept as a graph object in memory?
I will call my code like this:
estimator.train(...)
exporter = hub.LatestModuleExporter(...)
exporter.export(...)
estimator.export_savedmodel(...)  # for serving
You can use a hub.Module in the model_fn of an Estimator without ever exporting it. At the start of Estimator.train(), the module's variables will be initialized from their pre-trained values (much like other variables are initialized randomly). After that, the module's variables behave much like the other variables of your model - they are part of the model's checkpoint, and restored from there for evaluation, resumed training, or export to a SavedModel for serving, like any other variable.
Exporting a hub.Module is only needed in case you want to create a new version of the module (with the weights updated from your training) available to yet another, separate Estimator.
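A minimal sketch of that pattern, assuming the TF1 hub.Module API; the module URL, the "text" feature key, and the two-class head are placeholders:

import tensorflow as tf
import tensorflow_hub as hub

def model_fn(features, labels, mode):
    # The module's variables start from their pre-trained values and are then
    # checkpointed together with the rest of the model.
    module = hub.Module("https://tfhub.dev/google/nnlm-en-dim128/1",
                        trainable=(mode == tf.estimator.ModeKeys.TRAIN))
    embeddings = module(features["text"])
    logits = tf.layers.dense(embeddings, 2)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={"logits": logits})

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)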
What is the best way to store a trainer and all necessary components?
1. Storing:
Store a checkpoint of the trainer: use its trainer.save_checkpoint(filename, external_state={}) function.
Additionally, store the model separately: use the z.save(filename) method that every CNTK operation has. You can also get it via z = trainer.model.
2. Reloading:
Restore the model: use C.load_model(...). (Don't get confused by the deprecated persist namespace from CNTK 1.)
Get the inputs from the restored model.
Restore the trainer itself: use trainer.restore_from_checkpoint, as e.g. shown here (see the sketch after this list). The problem is that this function already needs a trainer object, which presumably has to be set up in the same way as the trainer used to create the checkpoint!?
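As a rough sketch of that round trip (file names are placeholders, and trainer/z are assumed to be built as in the snippet further down):

# Storing, at the end of training:
trainer.save_checkpoint("my_trainer.ckpt")   # trainer state (model, learner, progress)
z.save("my_model.cntk")                      # just the model function

# Reloading, in a later session:
z = C.load_model("my_model.cntk")
# restore_from_checkpoint needs an already constructed trainer,
# built the same way as the one that wrote the checkpoint:
trainer.restore_from_checkpoint("my_trainer.ckpt")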
How do I now restore the label inputs that go into the error function used by the trainer? In the following code I marked (with asterisks) the variables which I think I have to restore after having stored them once:
z = C.layers.Dense(.... )
loss = error = C.squared_error(z, **l**)
**trainer** = C.Trainer(**z**, (loss, error), [mylearner], my_tensorboard_writer)
You can restore your trainer, but I actually prefer to just load my model m. The simple reason is that it is much easier to create a whole new trainer, because then you can change all the other parameters of the trainer more easily.
Then you can get the input variable from the loaded model (if your network has only one input):
input_var = m.arguments[0]
then you need the output of your model:
output = m(input_var)
and define the loss function using your target output target_output:
C.squared_error(output, target_output)
Using your model and the loss function, you can recreate your trainer from there, setting the learning rate etc. as you like.
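A hedged end-to-end sketch of that approach; the file name, the learner choice, and the learning rate are placeholders for whatever you actually use:

import cntk as C

m = C.load_model("my_model.cntk")            # hypothetical file name
input_var = m.arguments[0]                   # the single input of the network
output = m(input_var)

# Recreate the target placeholder and the criterion.
target_output = C.input_variable(output.shape, name="target")
loss = C.squared_error(output, target_output)
error = loss

# Build a fresh trainer with whatever learner settings you want now.
lr = C.learning_rate_schedule(0.001, C.UnitType.minibatch)
learner = C.sgd(m.parameters, lr)
trainer = C.Trainer(m, (loss, error), [learner])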
Hi, I have a model based on this: https://github.com/igormq/asr-study/tree/keras-2. It is able to just about save okay, but is unable to load (either the full model or json/weights) because the loss isn't defined properly.
inputs = Input(name='inputs', shape=(None, num_features))
...
o = TimeDistributed(Dense(num_hiddens))(inputs)
# Output layer
outputs = TimeDistributed(Dense(num_classes))(o)
# Define placeholders
labels = Input(name='labels', shape=(None,), dtype='int32', sparse=True)
inputs_length = Input(name='inputs_length', shape=(None,), dtype='int32')
# Define a decoder
dec = Lambda(ctc_utils.decode, output_shape=ctc_utils.decode_output_shape,
             arguments={'is_greedy': True}, name='decoder')
y_pred = dec([outputs, inputs_length])
loss = ctc_utils.ctc_loss(outputs, labels, inputs_length)
model = Model(inputs=[inputs, labels, inputs_length], outputs=y_pred)
model.add_loss(loss)
opt = Adam(lr=args.lr, clipnorm=args.clipnorm)
# Compile with dummy loss
model.compile(optimizer=opt, loss=None, metrics=[metrics.ler])
This will compile and run (note that it uses the add_loss function, which isn't very well documented). It can even be convinced to save with a bit of work: as this post hints (https://github.com/fchollet/keras/issues/5179), you can make it save by forcing the graph to be complete. I did this by making a dummy lambda loss function to bring in the inputs that weren't fully part of the graph, and this now appears to work.
def fake_ctc_loss(args):
    return tf.Variable(tf.zeros([1]), name="fakeloss")

# this captures all the dangling nodes so the model will now save
fake_dummy_loss = Lambda(fake_ctc_loss, output_shape=(1,), name='ctc')([y_pred, labels, inputs_length])
We can add this to the model like so:
model = Model(inputs=[inputs, labels, inputs_length], outputs=[y_pred, fake_dummy_loss])
Now, when trying to load, it says it cannot because it is missing a loss function (I guess this is because the loss is set to None despite add_loss being used).
Any help here is appreciated.
I faced a similar problem in a project of mine in which add_loss is used to manually add a custom loss function to my model. You can see my model here: Keras Loss Function with Additional Dynamic Parameter. As you found, loading the model with load_model fails, complaining about a missing loss function.
Anyway, my solution was to save and load the model's weights rather than the whole model. The Model class has a save_weights method, discussed at https://keras.io/models/about-keras-models/, and likewise a load_weights method. Using these methods, you should be able to save and load the model just fine. The downside is that you have to define the model up front and then load the weights. In my project that wasn't an issue and only involved a small refactor.
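A self-contained sketch of that workaround (the tiny build_model() factory is a stand-in for however your model is actually constructed; the important part is rebuilding the same architecture before calling load_weights):

from keras.models import Sequential
from keras.layers import Dense

def build_model():
    # Hypothetical factory: must produce the exact same architecture each time.
    m = Sequential()
    m.add(Dense(10, input_shape=(20,), activation='relu'))
    m.add(Dense(1))
    return m

model = build_model()
# ... train ...
model.save_weights("weights.h5")

# In a fresh process: rebuild the model, then load the weights.
model2 = build_model()
model2.load_weights("weights.h5")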
Hope that helps.
I create a lookup table from tf.contrib.lookup, using the training data (as input). Then, I pass every input through that lookup table, before passing it through my model.
This works for training, but when it comes to online prediction from this same model, it raises the error:
Table not initialized
I'm using SavedModel to save the model. I run the prediction from this saved model.
How can I initialize this table so that it stays initialized? Or is there a better way to save the model so that the table is always initialized?
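For context, a minimal sketch of the kind of table involved, assuming tf.contrib.lookup from TF 1.x; the vocabulary and lookup keys are placeholders:

import tensorflow as tf

# Build an index table from the training vocabulary.
table = tf.contrib.lookup.index_table_from_tensor(["cat", "dog", "bird"], num_oov_buckets=1)
ids = table.lookup(tf.constant(["dog", "fish"]))

with tf.Session() as sess:
    sess.run(tf.tables_initializer())  # the table must be initialized before use
    print(sess.run(ids))               # e.g. [1 3]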
I think you would be better off using tf.tables_initializer() as the legacy_init_op.
tf.saved_model.main_op.main_op() also adds local and global initialization ops in addition to table initialization.
When you load the saved model and it runs the legacy_init_op, it would reset your variables, which is not what you want.
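A hedged fragment showing just that choice (the builder, session, and signature_def_map are assumed to be set up as in the longer example below):

builder.add_meta_graph_and_variables(
    session,
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map=signature_def_map,
    legacy_init_op=tf.tables_initializer())  # initializes tables only, no variable re-init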
You can specify an "initialization" operation when you add a meta graph to your SavedModel bundle with tf.saved_model.builder.SavedModelBuilder.add_meta_graph, using the main_op or legacy_init_op kwarg. You can either use a single operation, or group together a number of operations with tf.group if you need more than one.
Note that on Cloud ML Engine you'll have to use the legacy_init_op. However, in future runtime_versions you will be able to use main_op (IIRC, starting with runtime_version == 1.2).
The saved_model module provides a built in tf.saved_model.main_op.main_op to wrap up common initialization actions in a single op (local variable initialization, and table initialization).
So in summary, code should look like this (adapted from this example):
exporter = tf.saved_model.builder.SavedModelBuilder(
    os.path.join(job_dir, 'export', name))

# signature_def gets constructed here

with tf.Session(graph=prediction_graph) as session:
    # Need to be initialized before saved variables are restored
    session.run([tf.local_variables_initializer(), tf.tables_initializer()])
    # Restore the values of the saved variables
    saver.restore(session, latest)
    exporter.add_meta_graph_and_variables(
        session,
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature_def
        },
        # Relevant change to the linked example is here!
        legacy_init_op=tf.saved_model.main_op.main_op()
    )
NOTE: If you are using the high-level libraries (such as tf.estimator), this should be the default; if you need additional initialization actions, you can specify them as part of the tf.train.Scaffold object that you pass to your tf.estimator.EstimatorSpec in your model_fn.
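A hedged sketch of that Scaffold route (the tiny dense head is a placeholder; the grouped init op is where any extra initialization actions would go):

import tensorflow as tf

def model_fn(features, labels, mode):
    logits = tf.layers.dense(features["x"], 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())

    # Bundle table initialization (and any other setup) with the usual local
    # init; the Estimator runs this Scaffold op as part of initialization.
    scaffold = tf.train.Scaffold(
        local_init_op=tf.group(tf.local_variables_initializer(),
                               tf.tables_initializer()))
    return tf.estimator.EstimatorSpec(
        mode, loss=loss, train_op=train_op, scaffold=scaffold)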
I want to pre-train a model on dataset_A with TensorFlow and save the model into checkpoint files with tf.train.Saver. Then I want to restore the model and fine-tune some variables in the graph (i.e. not all of the trainable_variables) with dataset_B.
In the pre-training phase (pretrain.py), the variable is defined as follows:
with tf.variable_scope("finetune"):
full_connect_W = tf.get_variable(name="full_connect_W", shape=[n_hidden, num_class], initializer=tf.random_normal_initializer())
In the fine-tuning phase (finetune.py), I have to get the variable for the optimizer's var_list. The code below raises a ValueError:
with tf.variable_scope("finetune") as scope:
scope.reuse_variables()
full_connect_w = tf.get_variable("full_connect_W:0")
ValueError: Variable finetune/full_connect_W:0 does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
Even if I remove the suffix ":0" from the variable name, the same error is raised again:
ValueError: Variable finetune/full_connect_W does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
I managed to solve this problem in an ugly way. I restore the checkpoint file and find that finetune/full_connect_W:0 is the first item in tf.trainable_variables. So I get the variable like this:
full_connect_W = tf.trainable_variables()[0]
So the first question is: is there a method like get_variable or get_tensor_by_xxx to get the variable I want to fine-tune?
Another problem is that when I try to train the model with a new optimizer defined in the fine-tuning phase (finetune.py), an error occurs. It seems the optimizer needs to be initialized.
full_connect_W = tf.trainable_variables()[0]
full_connect_b = tf.trainable_variables()[1]
finetune_varlist = [full_connect_W, full_connect_b]
cost = g.get_tensor_by_name("cost:0")
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost, var_list=finetune_varlist)
tensorflow.python.framework.errors.FailedPreconditionError: Attempting to use uninitialized value beta1_power_2
[[Node: beta1_power_2/read = Identity[T=DT_FLOAT, _class=["loc:@finetune/full_connect_W"], _device="/job:localhost/replica:0/task:0/gpu:0"]]]
Caused by op u'beta1_power_2/read', defined at:
File "finetune_lstm_videotitle_test.py", line 72, in
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost, var_list=finetune_varlist)
But if I initialize with tf.initialize_all_variables(), all the weights (variables) trained in the pre-training phase will be reset.
So the second question is: how can I fine-tune specific variables while keeping the other variables fixed?