TensorFlow graph results appear random after restore - python

I trained a model to predict the next word in a sequence. I saved the model using tf.train.Saver(). However, when I go to restore the model and supply it the same seed value, the output changes each time I run the testing. For example, if I supply it with the words "happy birthday to", it will predict "you", but then, if I run it 10 seconds later, it will predict "rhyno". I have a feeling that this might be due to me randomly initializing the internal layers to random normal weights; however, wouldn't restoring the model restore the values after training and not reinitialize the layers? My restore code is below:
with tf.Session() as sess:
    saved_model = tf.train.import_meta_graph(
        'C:/Users/me/my_model.meta')  # load graph from training
    saved_model.restore(sess, tf.train.latest_checkpoint('./'))
    imported_graph = tf.get_default_graph()
    x = imported_graph.get_operation_by_name("ph_x").outputs[0]
    prediction = imported_graph.get_tensor_by_name('prediction:0')
    run_input = seed_values
    print(np.array2string(run_input, separator=" "))
    for _ in range(production_size):
        run_input_oh = hlp.word_to_one_hot(run_input, hp_dict, 0)
        pred = hlp.one_hot_to_word(sess.run(prediction, feed_dict={x: run_input_oh}), rev_dict)
        print(sess.run(prediction, feed_dict={x: run_input_oh}))

You called tf.get_default_graph() after restoring the saved weights. This will ignore the restored weights.
Solution:
First call tf.get_default_graph(), then restore the weights!
Try this!
with tf.Session() as sess:
    saved_model = tf.train.import_meta_graph(
        'C:/Users/me/my_model.meta')  # load graph from training
    imported_graph = tf.get_default_graph()
    saved_model.restore(sess, tf.train.latest_checkpoint('./'))
    ...

@midhun pk's answer is mistaken: calling tf.get_default_graph() does not modify the graph, and calling it before or after saved_model.restore makes no difference.
Your code seems fine (calling import_meta_graph adds the nodes of the saved graph to the current graph, and calling restore restores the states of the variables), and it's difficult to debug without more information about your model (e.g. what are run_input, seed_values, etc.?). Can you provide a minimal reproducible example?
You should be able to verify that your variables are correctly restored by printing the value of a variable at save and restore time. Before saving, you can do print(sess.run(variable)) (or use tf.Print). After restoring, you can check the weights of the restored variables as follows: supposing your variable's name is "XX", do:
var_value = imported_graph.get_tensor_by_name("XX:0")
print(sess.run(var_value))

I was able to find the issue, and it was not related to the process of restoring the saved weights.
When I first built the model for training, I created a dictionary from a text file by building a set. In testing, I built the dictionary from the same text file, assuming that the order of the elements would remain the same. Do not make this assumption: the order can change between runs, hence the seemingly random results.
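A minimal sketch of the fix, assuming a hypothetical helper name (build_vocab is not from the original code): sorting the set makes the word-to-index mapping identical on every run. Even better, save the mapping produced at training time (e.g. with pickle) and load it at test time instead of rebuilding it.
def build_vocab(text_path):
    # Read the training text and collect the unique words
    with open(text_path) as f:
        words = f.read().split()
    # sorted() fixes the iteration order; iterating a bare set() gives no such guarantee
    vocab = sorted(set(words))
    # Deterministic word -> index mapping, identical across runs
    return {word: i for i, word in enumerate(vocab)}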

Related

TensorFlow Eager Mode: How to restore a model from a checkpoint?

I've trained a CNN model in TensorFlow eager mode. Now I'm trying to restore the trained model from a checkpoint file but haven't got any success.
All the examples (as shown below) I've found are talking about restoring checkpoint to a Session. But what I need is to restore the model into eager mode, i.e. without creating a session.
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
Basically what I need is something like:
tfe.enable_eager_execution()
model = tfe.restore('model.ckpt')
model.predict(...)
and then I can use the model to make predictions.
Can someone please help?
Update
The example code can be found at: mnist eager mode demo
I've tried to follow the steps from @Jay Shah's answer and it almost worked, but the restored model doesn't have any variables in it.
tfe.save_network_checkpoint(model,'./test/my_model.ckpt')
Out[58]:
'./test/my_model.ckpt-1720'
model2 = MNISTModel()
tfe.restore_network_checkpoint(model2,'./test/my_model.ckpt-1720')
model2.variables
Out[72]:
[]
The original model has lots of variables in it:
model.variables
[<tf.Variable 'mnist_model_1/conv2d/kernel:0' shape=(5, 5, 1, 32) dtype=float32, numpy=
array([[[[ -8.25184360e-02, 6.77833706e-03, 6.97569922e-02,...
Eager Execution is still a new feature in TensorFlow and was not included in the latest version, so not all features are supported, but fortunately loading a model from a saved checkpoint is.
You'll need to use the tfe.Saver class (which is a thin wrapper over the tf.train.Saver class), and your code should look something like this:
saver = tfe.Saver([x, y])
saver.restore('/tmp/ckpt')
Where [x, y] represents the list of variables and/or models you wish to restore. This should exactly match the list of variables passed when the saver that created the checkpoint was originally constructed.
More details, including sample code, can be found here, and the API details of the saver can be found here.
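A minimal sketch of that symmetry, assuming model is an eager-mode model whose variables have already been created (for example by calling it once on data); the same variable list is handed to the saver on both the save and the restore side:
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

# ... build and train `model` so that model.variables is populated ...

saver = tfe.Saver(model.variables)   # same variable list used for saving and restoring
saver.save('/tmp/ckpt')              # write the checkpoint

# later, after rebuilding `model` and creating its variables again:
tfe.Saver(model.variables).restore('/tmp/ckpt')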
Ok, after spending a few hours running the code in line-by-line mode, I've figured out a way to restore a checkpoint to a new TensorFlow Eager Mode model.
Using the examples from TF Eager Mode MNIST
Steps:
1. After your model has been trained, find the latest checkpoint (or the checkpoint you want) index file in the checkpoint folder created during training, such as 'ckpt-25800.index'. Use only the filename 'ckpt-25800' when restoring in step 5.
2. Start a new Python terminal and enable TensorFlow Eager mode by running:
tfe.enable_eager_execution()
3. Create a new instance of the MNISTModel:
model_new = MNISTModel()
4. Initialise the variables for model_new by running a dummy train process once. (This step is important: without initialising the variables first, they can't be restored in the following step. However, I can't find another way to initialise variables in Eager mode other than what I did below.)
model_new(tfe.Variable(np.zeros((1,784),dtype=np.float32)), training=True)
5. Restore the variables to model_new using the checkpoint identified in step 1.
tfe.Saver(model_new.variables).restore('./tf_checkpoints/ckpt-25800')
If the restore process is successful, you should see something like:
INFO:tensorflow:Restoring parameters from ./tf_checkpoints/ckpt-25800
Now the checkpoint has been successfully restored to model_new and you can use it to make predictions on new data.
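A condensed sketch of the steps above, assuming the MNISTModel class and the checkpoint prefix from the eager MNIST example; the dummy forward pass is what actually creates the variables, so the saver has something to restore into:
import numpy as np
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

model_new = MNISTModel()                       # fresh model, no variables yet
model_new(tfe.Variable(np.zeros((1, 784), dtype=np.float32)),
          training=True)                       # dummy pass creates the variables
tfe.Saver(model_new.variables).restore('./tf_checkpoints/ckpt-25800')
# model_new now holds the trained weights and can be used for predictions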
I'd like to share the TFLearn library, a deep learning library featuring a higher-level API for TensorFlow. With its help you can easily save and restore a model.
Saving a model
model = tflearn.DNN(net) #Here 'net' is your designed network model.
#This is a sample example for training the model
model.fit(train_x, train_y, n_epoch=10, validation_set=(test_x, test_y), batch_size=10, show_metric=True)
model.save("model_name.ckpt")
Restoring a model
model = tflearn.DNN(net)
model.load("model_name.ckpt")
For more TFLearn examples you can check sites like:
My first CNN in TFLearn.
Github Link
First you save your model in a checkpoint by doing the following:
saver.save(sess, './my_model.ckpt')
In the above line you are saving your session in the "my_model.ckpt" checkpoint.
The following code restores the model:
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, './my_model.ckpt')
When you restore the session, you restore your model from the checkpoint.
To save in eager mode:
tf.contrib.eager.save_network_checkpoint(sess, './my_model.ckpt')
To restore in eager mode:
tf.contrib.eager.restore_network_checkpoint(sess, './my_model.ckpt')
Here sess is an object of class Network (not a tf.Session). Any object of class Network can be saved and restored. A quick explanation of Network objects:
class TwoLayerNetwork(tfe.Network):
    def __init__(self, name):
        super(TwoLayerNetwork, self).__init__(name=name)
        self.layer_one = self.track_layer(tf.layers.Dense(16, input_shape=(8,)))
        self.layer_two = self.track_layer(tf.layers.Dense(1, input_shape=(16,)))

    def call(self, inputs):
        return self.layer_two(self.layer_one(inputs))
After constructing an object and calling the Network, a list of variables created by tracked Layers is available via Network.variables:
sess = TwoLayerNetwork(name="net")  # sess is an object of Network
output = sess(tf.ones([1, 8]))
print([v.name for v in sess.variables])
This example prints the variable names, one kernel and one bias per tf.layers.Dense layer:
['net/dense/kernel:0',
 'net/dense/bias:0',
 'net/dense_1/kernel:0',
 'net/dense_1/bias:0']
These variables can be passed to a Saver (tf.train.Saver, or tf.contrib.eager.Saver when executing eagerly) to save or restore the Network:
tfe.save_network_checkpoint(sess, './my_model.ckpt')     # saving the model
tfe.restore_network_checkpoint(sess, './my_model.ckpt')  # restoring
Saving variables with tfe.Saver().save():
for epoch in range(epochs):
    train_and_optimize()
    all_variables = model.variables + optimizer.variables()
    # save the variables
    tfe.Saver(all_variables).save(checkpoint_prefix)
And then reload the saved variables with tfe.Saver().restore():
tfe.Saver(model.variables + optimizer.variables()).restore(checkpoint_prefix)
Then the model is loaded with the saved variables, and there is no need to create a new one as in @Stefan Falk's answer.

TensorFlow Inference Graph - Loading and Restoring Variables impact

This is closely related to a lot of questions, including one of my own here: TensorFlow Inference
Every sample in TensorFlow for inference appears to follow this form:
import tensorflow as tf
import CONSTANTS
import Vgg3CIFAR10
import numpy as np
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
rand = np.random.rand(1, 32, 32, 3).astype(np.float32)
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
logits = Vgg3CIFAR10.inference(images)
def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')
        new_saver.restore(sess, MODEL_PATH)
        print(sess.run(logits, feed_dict={images: rand}))
    print('done')

run_inference()
Issues:
Restoring the model & graph does just that... except I am creating a parallel graph here, where I am possibly adding new parts to the graph. (But TensorFlow graphs are append-only, so how does this add to the graph and run just that segment if it is appended? It would want to run the whole thing.)
What happens to the queue runners that existed in the loaded graph? All those ops are loaded; by printing sess.graph.get_operations() you can see that all of the old input ops are there.
Does logits = Vgg3CIFAR10.inference(images) not append new items to the graph? If it is because of naming, does the placeholder input replace the queue-runner input?
Possible answer for a few items: because I defined the logits op first, the rest of the graph got appended after that, and via some TensorFlow magic sauce the variables from the original graph got restored into the logits portion of the graph?
So I tested this out, and it doesn't even work properly...
It first creates a graph with logits, then appends the old graph to it. So when you run inference, you just get a bunch of garbage back...
[[ 0.09815982 0.09611271 0.10542709 0.10383813 0.0955615 0.10979554
0.12138291 0.09316944 0.08336139 0.09319157]]
[[ 0.10305423 0.092167 0.10572157 0.10368075 0.1043573 0.10057402
0.12435613 0.08916584 0.07929172 0.09763144]]
[[ 0.1068181 0.09361464 0.10377798 0.10060066 0.10110897 0.09462726
0.11688241 0.09941135 0.0869903 0.09616835]]
Here I am expecting node 8, followed by nodes 2 and 2, to be the ones surfaced... obviously it's just a bunch of nothing...
So after a ton of review this is what happens...
If you add anything to the graph before restoring a graph, the restored graph is appended to the already created graph.
Restoring variables looks, at restore time, for variables in your graph whose names match the variable names stored in the meta file. If you first create a graph with certain variable names and then restore a graph that uses the same variable names, the test I ran showed the appended (restored) graph receiving the restored values, not the initial graph.
So in summary: be careful about what you are doing with your graphs and be very aware of how things get appended/restored. If you were looking at this in hopes of getting inference working, then look at this S.O. question, where the answer involves creating a new graph and restoring variables into that new graph, which is in fact a subgraph of the original.
TensorFlow Inference
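A minimal sketch of that approach, under the assumption that Vgg3CIFAR10.inference builds the network with the same variable names used during training: the inference graph is rebuilt fresh and only the variable values are restored into it (no import_meta_graph, so nothing gets appended). The modules CONSTANTS and Vgg3CIFAR10 are carried over from the question's snippet.
import numpy as np
import tensorflow as tf
import Vgg3CIFAR10
import CONSTANTS

MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
rand = np.random.rand(1, 32, 32, 3).astype(np.float32)

with tf.Graph().as_default():                      # a clean graph, nothing to collide with
    images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
    logits = Vgg3CIFAR10.inference(images)         # rebuild only the subgraph we want to run
    saver = tf.train.Saver()                       # maps checkpoint variables onto this graph by name
    with tf.Session() as sess:
        saver.restore(sess, MODEL_PATH)            # no global_variables_initializer needed
        print(sess.run(logits, feed_dict={images: rand}))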

TensorFlow - import meta graph and use variables from it

I'm training a classification CNN using TensorFlow v0.12, and then want to create labels for new data using the trained model.
At the end of the training script, I added those lines of code:
saver = tf.train.Saver()
save_path = saver.save(sess,'/home/path/to/model/model.ckpt')
After the training completed, the files appearing in the folder are: 1. checkpoint ; 2. model.ckpt.data-00000-of-00001 ; 3. model.ckpt.index ; 4. model.ckpt.meta
Then I tried to restore the model using the .meta file. Following this tutorial, I added the following line into my classification code:
saver=tf.train.import_meta_graph(savepath+'model.ckpt.meta') #line1
and then:
saver.restore(sess, save_path=savepath+'model.ckpt') #line2
Before that change, I needed to build the graph again, and then write (instead of line1):
saver = tf.train.Saver()
But deleting the graph building and using line1 in order to restore it raised an error. The error was that I used a variable from the graph inside my code, and Python didn't recognize it:
predictions = sess.run(y_conv, feed_dict={x: patches, keep_prob: 1.0})
Python didn't recognize the y_conv parameter. Is there a way to restore the variables using the meta graph? If not, what is this restore helping with, if I can't use the variables from the original graph?
I know this question isn't so clear, but it was hard for me to express the problem in words. Sorry about it...
Thanks for answering, appreciate your help! Roi.
It is possible, don't worry. Assuming you don't want to touch the graph anymore, do something like this:
saver = tf.train.import_meta_graph('model/export/{}.meta'.format(model_name))
saver.restore(sess, 'model/export/{}'.format(model_name))
graph = tf.get_default_graph()
y_conv = graph.get_operation_by_name('y_conv').outputs[0]
predictions = sess.run(y_conv, feed_dict={x: patches,keep_prob: 1.0})
A preferred way, however, would be to add the ops to collections when you build the graph and then refer to them by collection. So when you define the graph, you would add the line:
tf.add_to_collection("y_conv", y_conv)
And then after you import the metagraph and restore it, you would call:
y_conv = tf.get_collection("y_conv")[0]
It is actually explained in the documentation - the exact page you linked - but perhaps you missed it.
Btw, no need for the .ckpt extension, it might create some confusion as that is the old way of saving models.
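A minimal end-to-end sketch of the collection approach, using a tiny toy graph (the dropout/matmul network below is illustrative, not the asker's CNN); the names x, keep_prob, and y_conv mirror the question's code. Collections are serialized into the meta graph, so import_meta_graph brings them back in the fresh program:
import tensorflow as tf

# --- training script: build the graph and tag the ops you will need later ---
x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
keep_prob = tf.placeholder(tf.float32, name="keep_prob")
w = tf.Variable(tf.random_normal([4, 2]))
y_conv = tf.nn.dropout(tf.matmul(x, w), keep_prob, name="y_conv")

tf.add_to_collection("x", x)
tf.add_to_collection("keep_prob", keep_prob)
tf.add_to_collection("y_conv", y_conv)

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training steps ...
    saver.save(sess, './model')          # writes model.meta, model.index, model.data-*

# --- inference script: a completely fresh graph/process ---
tf.reset_default_graph()
with tf.Session() as sess:
    restorer = tf.train.import_meta_graph('./model.meta')   # collections come back too
    restorer.restore(sess, './model')
    x = tf.get_collection("x")[0]
    keep_prob = tf.get_collection("keep_prob")[0]
    y_conv = tf.get_collection("y_conv")[0]
    patches = [[0.1, 0.2, 0.3, 0.4]]
    predictions = sess.run(y_conv, feed_dict={x: patches, keep_prob: 1.0})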
Just to add to Roberts's answer - after obtaining a saver from the meta graph, and using it to restore the variables in the current session, you can also use:
y_conv = graph.get_tensor_by_name('y_conv:0')
This'll work if you've created y_conv by explicitly adding the name="y_conv" argument (all TF ops accept one).

Issue with TensorFlow saving

I am training neural nets with TensorFlow, and the model trains correctly using a custom implementation of batch gradient descent. I have a logging function which records validation error, and it gets down to about 2.6%. I'm saving the model every 10 epochs using a tf.train.Saver.
However, when I load the variables into memory again using a tf.train.Saver with the same script, the model performs poorly, with about the performance it has when the weights are randomly initialized. I have inspected the convolutional filters in the checkpoint and they don't seem to be random, however.
I have not included all of my code, since it's around 400 lines long, but I've included what seem to be the important sections here and summarized the other functionality.
class ModelTrainer:
    def __init__(self, ...hyperparameters...):
        # Initialize datasets and hyperparameters
        for each gpu:
            # Create loss function and gradient assigned to this gpu using tf.device("/gpu:n")
        with tf.device("/cpu:0"):
            # Average and clip gradients from the gpu's
            # Create this batch gradient descent operation for each trainable variable
            variable.assign_sub(learning_rate * averaged_and_clipped_gradient).op

    def train(self, ...hyperparameters...):
        saver = tf.train.Saver(tf.all_variables(), max_to_keep=30)
        init = tf.initialize_all_variables()
        sess = tf.Session()
        if starting_point is not None:  # Used to evaluate existing models
            saver.restore(sess, starting_point)
        else:
            sess.run(init)
        for i in range(number_of_batches):
            # ... Get training batch ...
            gradients = sess.run(calculate_gradients, feeds=training_batch)
            # Average "gradients" variable across multiple batches
            # Must be done because of GPU memory limitations
            if i % meta_batch_size == 0:
                sess.run(apply_gradients_operators,
                         feeds=gradients_that_have_been_averaged_across_multiple_batches)
            # Log validation error
            if i % save_after_n_batches == 0:
                saver.save(sess, "some-filename", global_step=self.iter_num)
As expected, running these two functions creates a set of checkpoint files called "some-filename-40001" or whatever other iteration number the training is at when that file is saved. Unfortunately, when I load these checkpoints back in using the starting_point parameter, they perform on par with random initialization.
Initially I assumed it was something to do with the way I'm training the model, since I haven't found anyone else with this issue, but the validation error behaves as expected.
Edit: More odd results. After more experimentation, I have found that when I load the saved model using the code:
with tf.Session() as sess:
    saver = tf.train.import_meta_graph("saved-checkpoint-40.meta")
    saver.restore(sess, "saved-checkpoint-40")
    # ... Use model in some way ...
I get different, but still incorrect results.

Saving tensorflow model after training is finished

I have finished running a big model in TensorFlow (Python), but I did not save it inside the session. Now that the training is over, I want to save the variables. I am doing the following:
saver = tf.train.Saver()
with tf.Session(graph=graph) as sess:
    save_path = saver.save(sess, "86_model.ckpt")
    print("Model saved in file: %s" % save_path)
This returns: ValueError: No variables to save. According to their website, what is missing is initialize_all_variables(). The documentation says little about what exactly that does. The word "initialize" scares me; I do not want to reset all my trained values. Is there any way to save my model without re-running it?
It seems, from the TensorFlow documentation, that the "session" is the thing that holds the information of the trained model. So presumably somewhere you called sess.run() to train your model - what you want to do is call saver.save(sess, ...) using THAT session, not a new session you create alongside this saver object.
I believe it's because you are not initializing all of your variables in the saver. This should work:
with tf.Session() as sess:
    tf.initialize_all_variables().run()
    saver = tf.train.Saver(tf.all_variables())
    # ------- everything your session does -------------
    checkpoint_path = os.path.join(save_dir, 'model.ckpt')
    saver.save(sess, checkpoint_path, global_step=your_global_step)
How about using skflow? With skflow (now integrated into TensorFlow) you can specify the model_dir parameter on your constructor, and that will automatically save your model while training (it saves checkpoints, so if something goes wrong during training, you can restart from the last checkpoint).
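A minimal sketch of that idea, assuming the tf.contrib.learn estimator API that skflow was folded into; the feature column, layer sizes, and data names below are illustrative, not from the question:
import tensorflow as tf
from tensorflow.contrib import learn

# Checkpoints are written to model_dir automatically during fit()
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=784)]
classifier = learn.DNNClassifier(feature_columns=feature_columns,
                                 hidden_units=[256, 64],
                                 n_classes=10,
                                 model_dir='./my_model_dir')
classifier.fit(x=train_x, y=train_y, steps=1000)

# Re-creating an estimator with the same model_dir later picks up the latest
# checkpoint, so no explicit restore call is needed
classifier2 = learn.DNNClassifier(feature_columns=feature_columns,
                                  hidden_units=[256, 64],
                                  n_classes=10,
                                  model_dir='./my_model_dir')
predictions = list(classifier2.predict(test_x))  # predict may return an iterable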
