How to restore pretrained model to initialize parameters - python

I have downloaded a network together with its pretrained model. I added several layers and parameters to the network, and I want to use the pretrained model to initialize the original parameters while randomly initializing the newly added parameters myself. I use this code:
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "output/saver-test")
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
But I get the error "Key global_step not found in checkpoint". This happens because I have some new parameters that don't exist in the pretrained model. How can I solve this problem? What's more, I want to use sess.run(tf.global_variables_initializer()) to initialize the newly added parameters, but won't that overwrite the parameters restored from the pretrained model?

It happens because your network does not exactly match the one stored in the checkpoint.
You can use a selective checkpoint loader, something like this:
reader = tf.train.NewCheckpointReader(os.path.join(checkpoint_dir, ckpt_name))
restore_dict = dict()
for v in tf.trainable_variables():
    tensor_name = v.name.split(':')[0]
    if reader.has_tensor(tensor_name):
        print('has tensor ', tensor_name)
        restore_dict[tensor_name] = v
restore_dict['my_new_var_scope/my_new_var'] = self.get_my_new_var_variable()
Where get_my_new_var_variable() is something like this:
def get_my_new_var_variable(self):
    with tf.variable_scope("my_new_var_scope", reuse=tf.AUTO_REUSE):
        my_new_var = tf.get_variable("my_new_var", dtype=tf.int32, initializer=tf.constant([23, 42]))
    return my_new_var
Loading the weights:
self.saver = tf.train.Saver(restore_dict)
self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
Edited:
Note that, in order to avoid overwriting the loaded variables, you can use this method:
def initialize_uninitialized(sess):
    global_vars = tf.global_variables()
    is_not_initialized = sess.run([tf.is_variable_initialized(var) for var in global_vars])
    not_initialized_vars = [v for (v, f) in zip(global_vars, is_not_initialized) if not f]
    if len(not_initialized_vars):
        sess.run(tf.variables_initializer(not_initialized_vars))
Or simply calling tf.global_variables_initializer() before loading the variables should work here.
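The name-matching step above can be tried without TensorFlow. The following is a minimal sketch in plain Python, assuming hypothetical variable names and a hypothetical set of checkpoint keys. `checkpoint_key` and `split_by_checkpoint` are illustrative helpers, not part of the TensorFlow API. It only shows how names such as `conv1/weights:0` map to checkpoint keys and how the variables split into a "restore" group and an "initialize" group.

```python
def checkpoint_key(variable_name):
    """Strip the output index, e.g. 'conv1/weights:0' -> 'conv1/weights'."""
    return variable_name.split(':')[0]


def split_by_checkpoint(variable_names, checkpoint_keys):
    """Partition graph variables into those found in the checkpoint and the rest."""
    to_restore, to_initialize = [], []
    for name in variable_names:
        if checkpoint_key(name) in checkpoint_keys:
            to_restore.append(name)
        else:
            to_initialize.append(name)
    return to_restore, to_initialize


# Hypothetical names: two pretrained variables plus two that were added later.
graph_variables = ['conv1/weights:0', 'conv1/biases:0',
                   'new_head/weights:0', 'global_step:0']
keys_in_checkpoint = {'conv1/weights', 'conv1/biases'}

to_restore, to_initialize = split_by_checkpoint(graph_variables, keys_in_checkpoint)
print('restore:', to_restore)
print('initialize:', to_initialize)
```

In real code the first group would be handed to tf.train.Saver and the second to tf.variables_initializer, so that restored values are never touched by an initializer.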

Related

Changing the shape of a tensorflow variable in a checkpoint

I have a pretrained model (checkpoint, TensorFlow v1) with different variables and weights. I don't know all the variables, but I know two whose shape I want to change: v1 has shape [4,768] and v2 has shape [4]. I want to grow them to [5,768] and [5] respectively and save the checkpoint again for fine-tuning. To fill the missing data I want to use the average of the existing values.
Here is my code:
# The vars I want to change
v1 = tf.get_variable("v1", shape=[4, 768], initializer=utils.classification_initializer())
v2 = tf.get_variable("v2", shape=[4], initializer=tf.zeros_initializer())
checkpoint = {}
saver = tf.train.Saver()
with tf.Session() as sess:
    # Restore checkpoint from source location (path).
    saver.restore(sess, source)
    # Get the vars values
    checkpoint[v1.name] = v1.eval()
    checkpoint[v2.name] = v2.eval()
    new_data = {}
    # Calc v1 average and reshape
    avg = numpy.average(checkpoint[v1.name], axis=0)
    new_data[v1.name] = numpy.vstack((checkpoint[v1.name], avg))
    # Calc v2 average and reshape
    avg = numpy.average(checkpoint[v2.name], axis=0)
    new_data[v2.name] = numpy.append(checkpoint[v2.name], avg)
    # Assign the new data and shape
    sess.run(tf.assign(v1, new_data[v1.name], validate_shape=False))
    sess.run(tf.assign(v2, new_data[v2.name], validate_shape=False))
    # Save the checkpoint to target location (path).
    saver.save(sess, target)
I was expecting to see a model of similar size (the source checkpoint is about 1GB), but I get a much smaller file (the target checkpoint is about 15KB). It seems that it saves only the variables that I've changed and not the entire checkpoint (other vars, weights, etc).
1 - Is this the right way to achieve my goal (reshaping and filling 2 vars in a checkpoint)?
2 - If so, how can I save the entire model (other vars, weights, etc) and not only the loaded vars?
Update
The model was originally trained (by someone else) on a TPU machine. Therefore loading the meta graph is not working on a GPU machine (my machine).
However, using the tf.estimator.tpu.TPUEstimator I can train and predict this model. Therefore the TPUEstimator has a way to load everything, change the vars and save the model.
The model: https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_base_reset.zip
Example vars to change: output_weights_agg is a [4, 768], output_bias_agg is [4].
Full code example:
https://colab.research.google.com/drive/1yoyZ-45So5pEIGmZp85ut38lW653KHXL?usp=sharing
Your graph contains only two variables (v1 and v2), so the saver restores and saves only those two from the checkpoint.
You can import the graph from the checkpoint first and then make your changes.
Sample code based on yours (tensorflow version==1.15.0):
checkpoint = {}
# import graph from checkpoint meta like "/tmp/model.ckpt.meta"
saver = tf.train.import_meta_graph("path of meta")
with tf.Session() as sess:
    # Restore checkpoint from source location (path) like "/tmp/model.ckpt".
    saver.restore(sess, source)
    # print(tf.global_variables())
    # you can get the name of variable from graph through tf.global_variables()
    v1 = [v for v in tf.global_variables() if v.name == "v1:0"][0]
    v2 = [v for v in tf.global_variables() if v.name == "v2:0"][0]
    # Get the vars values
    checkpoint[v1.name] = v1.eval()
    checkpoint[v2.name] = v2.eval()
    new_data = {}
    # Calc v1 average and reshape
    avg = numpy.average(checkpoint[v1.name], axis=0)
    new_data[v1.name] = numpy.vstack((checkpoint[v1.name], avg))
    # Calc v2 average and reshape
    avg = numpy.average(checkpoint[v2.name], axis=0)
    new_data[v2.name] = numpy.append(checkpoint[v2.name], avg)
    # Assign the new data and shape
    sess.run(tf.assign(v1, new_data[v1.name], validate_shape=False))
    sess.run(tf.assign(v2, new_data[v2.name], validate_shape=False))
    # Save the checkpoint to target location (path).
    saver.save(sess, target)
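The averaging step can be checked in isolation. Below is a plain-Python sketch of what numpy.vstack((values, avg)) and numpy.append(values, avg) compute in this case, using small hypothetical lists instead of the real [4, 768] and [4] arrays. `append_column_mean_row` and `append_mean` are illustrative names, not library functions.

```python
def append_mean(values):
    """Return a copy of a 1-D list with the mean of its entries appended."""
    return values + [sum(values) / len(values)]


def append_column_mean_row(rows):
    """Return a copy of a 2-D list with one extra row of per-column means."""
    n = len(rows)
    mean_row = [sum(column) / n for column in zip(*rows)]
    return rows + [mean_row]


# Hypothetical stand-ins: a [4, 3] weight matrix and a [4] bias vector.
weights = [[1.0, 2.0, 3.0],
           [3.0, 4.0, 5.0],
           [5.0, 6.0, 7.0],
           [7.0, 8.0, 9.0]]
bias = [0.0, 1.0, 2.0, 5.0]

new_weights = append_column_mean_row(weights)
new_bias = append_mean(bias)
print(len(new_weights), len(new_weights[0]))  # rows grow from 4 to 5
print(new_weights[-1], new_bias[-1])
```

The new row holds the mean of each column, which matches averaging over axis 0 and stacking the result underneath the original matrix.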
Update
A workaround: print all tensors in the checkpoint to get their names and shapes, then use tf.get_variable to build the matching variables in the graph.
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file
# List ALL tensors
print_tensors_in_checkpoint_file(file_name=checkpoint_path, tensor_name='', all_tensors=True)
Alternative method:
# List ALL tensors
vars_list = tf.train.list_variables(checkpoint_path)
print(vars_list)
PS: When the meta graph comes from another platform and cannot be imported, we may need to build the graph ourselves; the relations between nodes can be found in the graph_def:
tf.get_default_graph().as_graph_def().node
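tf.train.list_variables returns a list of (name, shape) pairs, so the listing can be searched like any Python list. The sketch below assumes a hypothetical listing: only the two "agg" shapes come from the question, the other entries and both helper names are made up for illustration.

```python
def shapes_by_name(vars_list):
    """Turn [(name, shape), ...] into a {name: shape} lookup table."""
    return {name: list(shape) for name, shape in vars_list}


def with_leading_dim(vars_list, size):
    """Names of variables whose first dimension equals `size`."""
    return [name for name, shape in vars_list
            if len(shape) > 0 and shape[0] == size]


# Hypothetical listing; only the two "agg" shapes come from the question.
vars_list = [
    ('output_weights_agg', [4, 768]),
    ('output_bias_agg', [4]),
    ('some_scope/kernel', [768, 768]),  # made-up entry
    ('global_step', []),                # made-up scalar entry
]

shapes = shapes_by_name(vars_list)
print(shapes['output_weights_agg'])
print(with_leading_dim(vars_list, 4))
```

The resulting name-to-shape table is the information needed to declare each variable with tf.get_variable before restoring.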

Restore a saved neural network in Tensorflow

Before marking my question as a duplicate, please understand that I have gone through a lot of questions, but none of the solutions there cleared my doubts or solved my problem. I have a trained neural network that I want to save and later evaluate against a test dataset.
I tried saving and restoring it, but I am not getting the expected results. Restoring doesn't seem to work; maybe I am using it wrongly, but the model just uses the values given by the global variable initializer.
This is the code I am using for saving the model.
sess.run(tf.initializers.global_variables())
#num_epochs = 7
for epoch in range(num_epochs):
    start_time = time.time()
    train_accuracy = 0
    train_loss = 0
    val_loss = 0
    val_accuracy = 0
    for bid in range(int(train_data_size/batch_size)):
        X_train_batch = X_train[bid*batch_size:(bid+1)*batch_size]
        y_train_batch = y_train[bid*batch_size:(bid+1)*batch_size]
        sess.run(optimizer, feed_dict = {x:X_train_batch, y:y_train_batch,prob:0.50})
        train_accuracy = train_accuracy + sess.run(model_accuracy, feed_dict={x : X_train_batch,y:y_train_batch,prob:0.50})
        train_loss = train_loss + sess.run(loss_value, feed_dict={x : X_train_batch,y:y_train_batch,prob:0.50})
    for bid in range(int(val_data_size/batch_size)):
        X_val_batch = X_val[bid*batch_size:(bid+1)*batch_size]
        y_val_batch = y_val[bid*batch_size:(bid+1)*batch_size]
        val_accuracy = val_accuracy + sess.run(model_accuracy,feed_dict = {x:X_val_batch, y:y_val_batch,prob:0.75})
        val_loss = val_loss + sess.run(loss_value, feed_dict = {x:X_val_batch, y:y_val_batch,prob:0.75})
    train_accuracy = train_accuracy/int(train_data_size/batch_size)
    val_accuracy = val_accuracy/int(val_data_size/batch_size)
    train_loss = train_loss/int(train_data_size/batch_size)
    val_loss = val_loss/int(val_data_size/batch_size)
    end_time = time.time()
    saver.save(sess,'./blood_model_x_v2',global_step = epoch)
After saving the model, the files are written in my working directory something like this.
blood_model_x_v2-2.data-0000-of-0001
blood_model_x_v2-2.index
blood_model_x_v2-2.meta
Similarly for v2-3 and so on up to v2-6, plus a 'checkpoint' file. I then tried restoring it using this code snippet (after initializing), but I am getting results different from the expected ones. What am I doing wrong?
saver = tf.train.import_meta_graph('blood_model_x_v2-5.meta')
saver.restore(test_session,tf.train.latest_checkpoint('./'))
According to tensorflow docs:
Restore
Restores previously saved variables.
This method runs the ops added by the constructor for restoring
variables. It requires a session in which the graph was launched. The
variables to restore do not have to have been initialized, as
restoring is itself a way to initialize variables.
Let's see an example:
We save the model similar to this:
import tensorflow as tf
# Prepare to feed input, i.e. feed_dict and placeholders
w1 = tf.placeholder("float", name="w1")
w2 = tf.placeholder("float", name="w2")
b1 = tf.Variable(2.0, name="bias")
feed_dict = {w1: 4, w2: 8}
# Define a test operation that we will restore
w3 = tf.add(w1, w2)
w4 = tf.multiply(w3, b1, name="op_to_restore")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Create a saver object which will save all the variables
saver = tf.train.Saver()
# Run the operation by feeding input
print (sess.run(w4, feed_dict))
# Prints 24.0, which is (w1 + w2) * b1 = (4 + 8) * 2
# Now, save the graph
saver.save(sess, './ckpnt/my_test_model', global_step=1000)
And then load the trained model with:
import tensorflow as tf
sess = tf.Session()
# First let's load meta graph and restore weights
saver = tf.train.import_meta_graph('./ckpnt/my_test_model-1000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./ckpnt'))
# Now, let's access and create placeholders variables and
# create feed-dict to feed new data
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict = {w1: 13.0, w2: 17.0}
# Now, access the op that you want to run.
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")
print (sess.run(op_to_restore, feed_dict))
# This will print 60.0, which is calculated as (13 + 17) * 2
# using the new values of w1 and w2 and the saved value of b1.
As you can see, we do not run the variable initializer in the restoring part. There is also a better way to save and restore a model, using Checkpoint, which allows you to check whether the model was restored correctly.
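In the question, the meta graph is read from the step-5 file while tf.train.latest_checkpoint('./') picks whichever checkpoint the 'checkpoint' file lists as most recent. The sketch below is one way to keep the two aligned, by deriving the checkpoint prefix from the .meta filename. `prefix_from_meta` and `step_from_prefix` are hypothetical helpers written for this illustration, and the filename is the one from the question.

```python
import os


def prefix_from_meta(meta_path):
    """'blood_model_x_v2-5.meta' -> 'blood_model_x_v2-5'."""
    root, ext = os.path.splitext(meta_path)
    if ext != '.meta':
        raise ValueError('expected a .meta file, got %r' % meta_path)
    return root


def step_from_prefix(prefix):
    """Global step encoded after the last '-', e.g. 5 for 'blood_model_x_v2-5'."""
    return int(prefix.rsplit('-', 1)[1])


meta_path = 'blood_model_x_v2-5.meta'
prefix = prefix_from_meta(meta_path)
print(prefix, step_from_prefix(prefix))
```

The derived prefix is the kind of path saver.restore expects, so the graph and the weights would then come from the same save.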

Tensorflow MNIST Sample: Code to Predict from SavedModel

I am using the sample to build a CNN as per this article: https://www.tensorflow.org/tutorials/layers
However, I am unable to find a sample to predict by feeding in a sample image. Any help here would be highly appreciated.
Below is what I have tried, but I am not able to find the output tensor name:
img = <load from file>
sess = tf.Session()
saver = tf.train.import_meta_graph('/tmp/mnist_convnet_model/model.ckpt-2000.meta')
saver.restore(sess, tf.train.latest_checkpoint('/tmp/mnist_convnet_model/'))
input_place_holder = sess.graph.get_tensor_by_name("enqueue_input/Placeholder:0")
out_put = <not sure what the tensor output name in the graph>
current_input = img
result = sess.run(out_put, feed_dict={input_place_holder: current_input})
print(result)
You can use the inspect_checkpoint tool in Tensorflow to find the tensors inside a checkpoint file.
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file
# file_name is the checkpoint prefix, not the .meta file
print_tensors_in_checkpoint_file(file_name="/tmp/mnist_convnet_model/model.ckpt-2000", tensor_name='', all_tensors=False)
There are nice instructions on how to save and restore in TensorFlow's programming guide. Here is a small example inspired by it. Just make sure that the ./tmp directory exists.
import tensorflow as tf
# Create some variables.
variable = tf.get_variable("variable_1", shape=[3], initializer=tf.zeros_initializer)
inc_v1 = variable.assign(variable + 1)
# Operation to initialize variables if we do not restore from checkpoint
init_op = tf.global_variables_initializer()
# Create the saver
saver = tf.train.Saver()
with tf.Session() as sess:
    # Setting to decide whether or not to restore
    DO_RESTORE = True
    # Where to save the data file
    save_path = "./tmp/model.ckpt"
    if DO_RESTORE:
        # If we want to restore, load the variables from the saved file
        saver.restore(sess, save_path)
    else:
        # If we don't want to restore, then initialize variables
        # using their specified initializers.
        sess.run(init_op)
    # Print the initial values of variable
    initial_var_value = sess.run(variable)
    print("Initial:", initial_var_value)
    # Do some work with the model.
    incremented = sess.run(inc_v1)
    print("Incremented:", incremented)
    # Save the variables to disk.
    save_path = saver.save(sess, save_path)
    print("Model saved in path: %s" % save_path)
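The DO_RESTORE flag above is set by hand. As a sketch, it could instead be derived from whether a checkpoint is already on disk. This assumes the checkpoint format that writes a `.index` file next to the data file. `checkpoint_exists` is a hypothetical helper, and the demonstration uses an empty stand-in file in a temporary directory rather than a real checkpoint.

```python
import os
import tempfile


def checkpoint_exists(save_path):
    """True if an index file for the checkpoint prefix `save_path` is on disk."""
    return os.path.exists(save_path + '.index')


# Demonstration in a throwaway directory with an empty stand-in index file.
with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = os.path.join(tmp_dir, 'model.ckpt')
    before = checkpoint_exists(save_path)
    open(save_path + '.index', 'w').close()
    after = checkpoint_exists(save_path)

print(before, after)
```

With such a check, the first run would fall through to the initializer branch and later runs would restore.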

TensorFlow: Restoring a model

I'm trying to save my model at the end of training and restore it every time training begins. I just followed what this link did.
saver = tf.train.Saver()
with tf.Session(graph=graph) as session:
    # Initialize the weights and biases
    tf.global_variables_initializer().run()
    new_saver = tf.train.import_meta_graph('model.meta')
    new_saver.restore(session, tf.train.latest_checkpoint('./'))
    W1 = session.run(W)
    print(W1)
    for curr_epoch in range(num_epochs):
        train_cost = train_ler = 0
        start = time.time()
        for batch in range(num_batches_per_epoch):
            ...Some training...
    W2 = session.run(W)
    print(W2)
    save_path = saver.save(session, "models/model")
But it gives error below:
---> new_saver.restore(session, tf.train.latest_checkpoint('./'))
SystemError: <built-in function TF_Run> returned a result with an error set
Can anyone help me please? Many thanks!
If you load with ./, you have to make sure that the console you use to start the Python program is actually in that directory (models/).
But in that case, your new data will be saved in a new directory. So load with ./models/ instead.
(Also, you don't need to initialize the variables; the restore does that for you.)
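One way to avoid this kind of mismatch is to derive every path from a single directory value. The sketch below is only an illustration: `checkpoint_paths` is a hypothetical helper, and the directory and model names are the ones used in the question.

```python
import os


def checkpoint_paths(directory, name):
    """Build the save prefix and meta-graph path from one directory."""
    prefix = os.path.join(directory, name)
    return {'directory': directory, 'prefix': prefix, 'meta': prefix + '.meta'}


paths = checkpoint_paths('models', 'model')  # names taken from the question
print(paths['prefix'], paths['meta'])
```

Here paths['prefix'] would go to saver.save, paths['meta'] to tf.train.import_meta_graph and paths['directory'] to tf.train.latest_checkpoint, so saving and loading always refer to the same place.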

Tensorflow: What is the output node name in Cifar-10 model?

I'm trying to understand TensorFlow and I'm looking at one of the official examples, the Cifar-10 model.
In cifar10.py, in inference(), you can see the following lines:
with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)
scope.name should be softmax_linear, and that should be the node's name. I saved the graph proto with the following lines (it differs from the tutorial):
with tf.Graph().as_default():
    global_step = tf.Variable(0, trainable=False)
    # Get images and labels
    images, labels = cifar10.distorted_inputs()
    # Build a Graph that computes the logits predictions from the
    # inference model.
    logits = cifar10.inference(images)
    # Calculate loss.
    loss = cifar10.loss(logits, labels)
    # Build a Graph that trains the model with one batch of examples and
    # updates the model parameters.
    train_op = cifar10.train(loss, global_step)
    # Create a saver.
    saver = tf.train.Saver(tf.global_variables())
    # Build the summary operation based on the TF collection of Summaries.
    summary_op = tf.summary.merge_all()
    # Build an initialization operation to run below.
    init = tf.global_variables_initializer()
    # Start running operations on the Graph.
    sess = tf.Session(config=tf.ConfigProto(
        log_device_placement=FLAGS.log_device_placement))
    sess.run(init)
    # save the graph
    tf.train.write_graph(sess.graph_def, FLAGS.train_dir, 'model.pbtxt')
    ....
But I can't see a node called softmax_linear in model.pbtxt. What am I doing wrong? I just want the name of the output node to export the graph.
The operator name won't be "softmax_linear". The tf.name_scope() prefixes names of operators with its name, separated by a /. Each operator has its own name. For example, if you write
with tf.name_scope("foo"):
    a = tf.constant(1, name="bar")
then the constant will have name "foo/bar".
Hope that helps!
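The prefixing rule can be modelled in a few lines of plain Python. This is only a sketch of the naming convention, not TensorFlow code: `scoped_name` and `tensor_name` are hypothetical helpers, and the sketch ignores the numeric suffixes TensorFlow adds when a name is already taken. Applied to the question, and assuming the add op is created inside the softmax_linear scope with name=scope.name, the rule suggests a node called softmax_linear/softmax_linear, which is worth confirming by searching model.pbtxt.

```python
def scoped_name(scopes, op_name):
    """Join enclosing scope names and the op's own name with '/'."""
    return '/'.join(list(scopes) + [op_name])


def tensor_name(op_name, output_index=0):
    """Name of an op's output tensor, e.g. 'foo/bar' -> 'foo/bar:0'."""
    return '%s:%d' % (op_name, output_index)


print(scoped_name(['foo'], 'bar'))
cifar_op = scoped_name(['softmax_linear'], 'softmax_linear')
print(cifar_op, tensor_name(cifar_op))
```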
