I've looked at many questions about saving a trained neural network, including Tensorflow: how to save/restore a model? and https://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/, but none of them save a model without explicitly specifying particular variables, which is what I'm doing. Here is my case:
# In session "sesh"
saver = tf.train.Saver()
saver.save(sesh, os.getcwd(), latest_filename='RNN_plasma.ckpt')
Now, I quit the session and want to restore the model I just saved. How can I do this? When trying:
import tensorflow as tf
with tf.Session() as session1:
    # First, load the meta graph and restore the weights
    saver = tf.train.import_meta_graph('RNN_plasma.ckpt')  # error-line
    saver.restore(session1, tf.train.latest_checkpoint('./'))
the tf.train.import_meta_graph() call raises:
raise IOError("Cannot parse file %s: %s." % (filename, str(e)))
IOError: Cannot parse file RNN_plasma.ckpt: 1:1 : Message type "tensorflow.MetaGraphDef" has no field named "model_checkpoint_path"..
Can anyone give any insight as to what is going on here, and how to solve it?
(My version of TensorFlow, 1.5.0, doesn't come with tf.python.saved_model.simple_save().)
Save:
saver = tf.train.Saver()
saver.save(sess,"/tmp/network")
Restore:
sess = tf.Session()
saver = tf.train.import_meta_graph('/tmp/network.meta')
saver.restore(sess,tf.train.latest_checkpoint('/tmp'))
graph = tf.get_default_graph()
You saved a plain checkpoint, but then you are trying to load it as a meta graph. That cannot work.
There is a write-up on the TensorFlow website explaining the differences:
https://www.tensorflow.org/mobile/prepare_models#what_is_up_with_all_the_different_saved_file_formats
There must be a file ending with .meta.
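Applied to the question's code, a minimal sketch (keeping the names sesh and RNN_plasma from the question; latest_filename only names the bookkeeping "checkpoint" file, so the model prefix belongs in the second argument of save):

# in the training session "sesh"
saver = tf.train.Saver()
saver.save(sesh, './RNN_plasma.ckpt')   # writes RNN_plasma.ckpt.meta, .index, .data-*

# later, in a fresh process
with tf.Session() as session1:
    saver = tf.train.import_meta_graph('./RNN_plasma.ckpt.meta')   # note the .meta file
    saver.restore(session1, tf.train.latest_checkpoint('./'))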
Related
I used OpenAI to train a deepq model. After doing
saver = tf.train.Saver()
saver.save(tf.get_default_session(), 'my_deepq')
I got the following files:
my_deepq.data-00000-of-00001
my_deepq.index
checkpoint
my_deepq.meta
I then need to load this model in two different systems (C++ and Python) to run inference.
For the python part, I tried:
import tensorflow as tf
tf.reset_default_graph()
imported_graph = tf.train.import_meta_graph('my_deepq.meta')
with tf.Session() as sess:
    imported_graph.restore(sess, './my_deepq')
The code ran, but I am not sure where the model was loaded or how to run inference. Could someone please advise?
For the C++ side, I will do something like:
tensorflow::Session *my_sess;
tensorflow::Status status = tensorflow::NewSession(options, &my_sess);
tensorflow::GraphDef graph_def;
status = ReadBinaryProto(tensorflow::Env::Default(), model_path, &graph_def);
status = my_sess->Create(graph_def);
status = my_sess->Run({{"My_Input", input_tensor}}, {"My_Output"}, {}, &output_tensor);
This approach requires the model to be in BinaryProto format, but I am not sure how to save my model in that format from Python. Could anyone please advise? Thank you!
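One route that would fit the C++ loader above is to freeze the restored variables into constants and write out a binary GraphDef; a rough sketch, assuming TF 1.x and that 'My_Output' really is the name of your output node:

import tensorflow as tf

with tf.Session() as sess:
    saver = tf.train.import_meta_graph('my_deepq.meta')
    saver.restore(sess, './my_deepq')
    # replace 'My_Output' with the actual output node name of your graph
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['My_Output'])
    tf.train.write_graph(frozen_graph_def, '.', 'my_deepq.pb', as_text=False)

For the Python inference side, once the graph has been restored you can look tensors up by name (e.g. sess.graph.get_tensor_by_name('My_Input:0')) and pass them to sess.run, as later answers in this thread show.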
I've trained a CNN model in TensorFlow eager mode. Now I'm trying to restore the trained model from a checkpoint file, but haven't had any success.
All the examples I've found (like the one below) restore a checkpoint into a Session. But what I need is to restore the model in eager mode, i.e. without creating a session.
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
Basically what I need is something like:
tfe.enable_eager_execution()
model = tfe.restore('model.ckpt')
model.predict(...)
and then I can use the model to make predictions.
Can someone please help?
Update
The example code can be found at: mnist eager mode demo
I've tried to follow the steps from Jay Shah's answer and it almost worked, but the restored model doesn't have any variables in it.
tfe.save_network_checkpoint(model,'./test/my_model.ckpt')
Out[58]:
'./test/my_model.ckpt-1720'
model2 = MNISTModel()
tfe.restore_network_checkpoint(model2,'./test/my_model.ckpt-1720')
model2.variables
Out[72]:
[]
The original model has lots of variables in it:
model.variables
[<tf.Variable 'mnist_model_1/conv2d/kernel:0' shape=(5, 5, 1, 32) dtype=float32, numpy=
array([[[[ -8.25184360e-02, 6.77833706e-03, 6.97569922e-02,...
Eager Execution is still a new feature in TensorFlow, so not all features are supported yet, but fortunately loading a model from a saved checkpoint is.
You'll need to use the tfe.Saver class (which is a thin wrapper over the tf.train.Saver class), and your code should look something like this:
saver = tfe.Saver([x, y])
saver.restore('/tmp/ckpt')
where [x, y] is the list of variables and/or models you wish to restore. It should precisely match the variables that were passed when the saver that created the checkpoint was constructed.
More details, including sample code, can be found here, and the API details of the saver can be found here.
OK, after spending a few hours stepping through the code line by line, I've figured out a way to restore a checkpoint into a new TensorFlow Eager Mode model.
Using the examples from TF Eager Mode MNIST
Steps:
After your model has been trained, find the index file of the latest checkpoint (or the checkpoint you want) in the checkpoint folder created during training, such as 'ckpt-25800.index'. Use only the prefix 'ckpt-25800' when restoring in step 5.
Start a new Python terminal and enable TensorFlow Eager mode by running:
tfe.enable_eager_execution()
Create a new instance of the MNISTModel:
model_new = MNISTModel()
Initialise the variables of model_new by running a dummy training step once. (This step is important: without initialising the variables first, they can't be restored in the following step. However, I couldn't find another way to initialise variables in Eager mode other than what I did below.)
model_new(tfe.Variable(np.zeros((1,784),dtype=np.float32)), training=True)
Restore the variables to model_new using the checkpoint identified in step 1.
tfe.Saver(model_new.variables).restore('./tf_checkpoints/ckpt-25800')
If the restore process is successful, you should see something like:
INFO:tensorflow:Restoring parameters from ./tf_checkpoints/ckpt-25800
Now the checkpoint has been successfully restored to model_new and you can use it to make predictions on new data.
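As a usage sketch, prediction with the restored model then looks something like this (new_images is assumed to be a float32 NumPy array of flattened MNIST images):

logits = model_new(tf.constant(new_images, dtype=tf.float32), training=False)
predicted_labels = tf.argmax(logits, axis=1)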
I'd like to share the TFLearn library, a deep learning library featuring a higher-level API for TensorFlow. With it you can easily save and restore a model.
Saving a model
model = tflearn.DNN(net)  # here 'net' is your designed network model

# a sample call for training the model
model.fit(train_x, train_y, n_epoch=10, validation_set=(test_x, test_y), batch_size=10, show_metric=True)
model.save("model_name.ckpt")
Restore a model
model = tflearn.DNN(net)
model.load("model_name.ckpt")
For more TFLearn examples you can check out sites like:
My first CNN in TFLearn.
Github Link
First you save your model in a checkpoint by doing the following:
saver.save(sess, './my_model.ckpt')
The line above saves your session to the "my_model.ckpt" checkpoint.
The following code restores the model:
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, './my_model.ckpt')
When you restore the session this way, your model is restored from the checkpoint.
To save in eager mode:
tf.contrib.eager.save_network_checkpoint(sess,'./my_model.ckpt')
To restore in eager mode:
tf.contrib.eager.restore_network_checkpoint(sess,'./my_model.ckpt')
Here sess is an object of class Network (despite the name, not a tf.Session). Any object of class Network can be saved and restored. A quick explanation of Network objects:
class TwoLayerNetwork(tfe.Network):
    def __init__(self, name):
        super(TwoLayerNetwork, self).__init__(name=name)
        self.layer_one = self.track_layer(tf.layers.Dense(16, input_shape=(8,)))
        self.layer_two = self.track_layer(tf.layers.Dense(1, input_shape=(16,)))

    def call(self, inputs):
        return self.layer_two(self.layer_one(inputs))
After constructing an object and calling the Network, a list of variables
created by tracked Layers is available via Network.variables:
sess = TwoLayerNetwork(name="net") # sess is object of Network
output = sess(tf.ones([1, 8]))
print([v.name for v in sess.variables])
This example prints variable names, one kernel and one bias per
`tf.layers.Dense` layer:
['net/dense/kernel:0',
'net/dense/bias:0',
'net/dense_1/kernel:0',
'net/dense_1/bias:0']
These variables can be passed to a `Saver` (`tf.train.Saver`, or
`tf.contrib.eager.Saver` when executing eagerly) to save or restore the
`Network`
tfe.save_network_checkpoint(sess,'./my_model.ckpt') # saving the model
tfe.restore_network_checkpoint(sess,'./my_model.ckpt') # restoring
Saving variables with tfe.Saver().save():
for epoch in range(epochs):
    train_and_optimize()
    all_variables = model.variables + optimizer.variables()
    # save the variables (checkpoint_prefix is the path prefix to save under)
    tfe.Saver(all_variables).save(checkpoint_prefix)
And then reload the saved variables with tfe.Saver().restore():
tfe.Saver(model.variables + optimizer.variables()).restore(checkpoint_prefix)
The model is then loaded with the saved variables; there is no need to create a new one as in Stefan Falk's answer.
I trained a model with the method from the TensorFlow tutorials (code is here). At the end I saved the model in a checkpoint directory. Now I want to restore from the checkpoint directory:
import tensorflow as tf
def main(_):
    saver = tf.train.Saver()
    with tf.Session() as sess:
        ckpt = tf.train.latest_checkpoint("/data/lstm_models")
        saver.restore(sess, ckpt)

if __name__ == "__main__":
    tf.app.run()
However, I got this error:
ValueError: No variables to save
It looks like you haven't defined your graph before restoring from the checkpoint, so when building the saver it complains that your graph is empty.
Could you try to build your graph again (e.g. redefining your variables) before trying to restore it?
From the restore method's docstring:
It requires a session in which the graph was launched.
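A minimal sketch of that idea, assuming the checkpoint was written together with a .meta file so the graph can be re-imported rather than redefined by hand:

import tensorflow as tf

def main(_):
    ckpt = tf.train.latest_checkpoint("/data/lstm_models")
    # import_meta_graph rebuilds the graph, so the Saver has variables to restore
    saver = tf.train.import_meta_graph(ckpt + ".meta")
    with tf.Session() as sess:
        saver.restore(sess, ckpt)

if __name__ == "__main__":
    tf.app.run()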
I'm training a classification CNN using TensorFlow v0.12, and then want to create labels for new data using the trained model.
At the end of the training script, I added those lines of code:
saver = tf.train.Saver()
save_path = saver.save(sess,'/home/path/to/model/model.ckpt')
After the training completed, the files appearing in the folder are:
1. checkpoint
2. model.ckpt.data-00000-of-00001
3. model.ckpt.index
4. model.ckpt.meta
Then I tried to restore the model using the .meta file. Following this tutorial, I added the following line into my classification code:
saver=tf.train.import_meta_graph(savepath+'model.ckpt.meta') #line1
and then:
saver.restore(sess, save_path=savepath+'model.ckpt') #line2
Before that change, I needed to build the graph again, and then write (instead of line1):
saver = tf.train.Saver()
But deleting the graph-building code and using line1 to restore it raised an error. The error was that I used a variable from the graph inside my code, and Python didn't recognize it:
predictions = sess.run(y_conv, feed_dict={x: patches,keep_prob: 1.0})
Python didn't recognize the y_conv variable. Is there a way to restore the variables using the meta graph? If not, how does this restore help if I can't use variables from the original graph?
I know this question isn't so clear, but it was hard for me to express the problem in words. Sorry about it...
Thanks for answering, appreciate your help! Roi.
It is possible, don't worry. Assuming you don't want to touch the graph anymore, do something like this:
saver = tf.train.import_meta_graph('model/export/{}.meta'.format(model_name))
saver.restore(sess, 'model/export/{}'.format(model_name))
graph = tf.get_default_graph()
y_conv = graph.get_operation_by_name('y_conv').outputs[0]
predictions = sess.run(y_conv, feed_dict={x: patches,keep_prob: 1.0})
A preferred way, however, would be to add the ops to collections when you build the graph and refer to them from there. So when you define the graph, you would add the line:
tf.add_to_collection("y_conv", y_conv)
And then after you import the metagraph and restore it, you would call:
y_conv = tf.get_collection("y_conv")[0]
It is actually explained in the documentation - the exact page you linked - but perhaps you missed it.
Btw, no need for the .ckpt extension, it might create some confusion as that is the old way of saving models.
Just to add to Roberts's answer - after obtaining a saver from the meta graph, and using it to restore the variables in the current session, you can also use:
y_conv = graph.get_tensor_by_name('y_conv:0')
This will work if you created y_conv with an explicit name="y_conv" argument (all TF ops accept a name).
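For that lookup to succeed, the op has to be given the name when the graph is built; a hypothetical sketch (logits stands in for whatever feeds your softmax):

y_conv = tf.nn.softmax(logits, name="y_conv")      # at graph-construction time
# ...
y_conv = graph.get_tensor_by_name("y_conv:0")      # after import_meta_graph + restore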
I have finished running a big model in TensorFlow in Python, but I did not save it inside the session. Now that the training is over, I want to save the variables. I am doing the following:
saver=tf.train.Saver()
with tf.Session(graph=graph) as sess:
    save_path = saver.save(sess, "86_model.ckpt")
    print("Model saved in file: %s" % save_path)
This returns: ValueError: No variables to save. According to their website, what is missing is initialize_all_variables(). The documentation says little about what exactly that does. The word "initialize" scares me; I do not want to reset all my trained values. Is there any way to save my model without re-running it?
It seems from the TensorFlow documentation that the "session" is the thing that holds the information of the trained model. So presumably somewhere you called sess.run() to train your model; what you want to do is call saver.save(sess, ...) with THAT session, not a new one you create alongside this saver object.
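A minimal sketch of that, assuming graph and train_op come from your existing training code:

with graph.as_default():
    saver = tf.train.Saver()              # built while the graph still holds the variables
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop: sess.run(train_op, feed_dict=...) ...
    save_path = saver.save(sess, "./86_model.ckpt")   # save from the SAME session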
I believe it's because you are not initializing all of your variables for the saver. This should work:
import os
import tensorflow as tf

with tf.Session() as sess:
    tf.initialize_all_variables().run()
    saver = tf.train.Saver(tf.all_variables())
    # ------- everything your session does -------------
    checkpoint_path = os.path.join(save_dir, 'model.ckpt')
    saver.save(sess, checkpoint_path, global_step=your_global_step)
How about using skflow? With skflow (now integrated into TensorFlow) you can specify the model_dir parameter on your constructor, and it will automatically save your model while training (it saves checkpoints, so if something goes wrong during training you can restart from the last checkpoint).
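A rough sketch of that approach using tf.contrib.learn (where skflow ended up); the feature column, layer sizes, and my_train_input_fn are made-up placeholders:

import tensorflow as tf

feature_columns = [tf.contrib.layers.real_valued_column("x", dimension=10)]
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[32, 16],
    n_classes=2,
    model_dir="/tmp/my_model")            # checkpoints are written here automatically
classifier.fit(input_fn=my_train_input_fn, steps=1000)   # resumes from the last checkpoint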