I am trying to save my model at different steps while training. Let's say I would like to save after 5 epochs.
At this moment I am using:
tf.saved_model.simple_save(
    sess, model_folder, inputs, outputs
)
which works like a charm. Nevertheless, I realize it saves the whole graph and the weights on each iteration, which has a high computational cost.
I would like to update only the weights of my model and keep the graph from the previous save (since it does not change during training).
I have read about tf.train.Saver, which seems to fit my intentions. But it seems to force me to specify all the variables I want to save, which is not as practical as the simple_save method. So I am wondering if there is any way of using simple_save in a checkpoint fashion.
I think you have misunderstood tf.train.Saver. You can do something as simple as:
saver = tf.train.Saver()
with tf.Session() as sess:
    for e in range(epochs):
        ...
        if e % 5 == 0:
            saver.save(sess, "/path/where/to/save/model")
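With no arguments, tf.train.Saver() saves every saveable variable in the graph, so there is no need to list the variables yourself. If rewriting the graph on each save is the concern, saver.save also accepts write_meta_graph=False, which skips the .meta file and writes only the variable values.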
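One detail worth checking is which epochs the condition `e % 5 == 0` actually hits. The sketch below is plain Python with no TensorFlow; `save_epochs` and its `after` flag are hypothetical names used only for illustration. It shows that the condition from the answer also fires at epoch 0, while `(e + 1) % 5 == 0` fires once five epochs have completed.

```python
def save_epochs(num_epochs, every=5, after=False):
    """Return the 0-based epoch indices at which a checkpoint would be written.

    after=False mirrors `e % every == 0` (this also fires at epoch 0);
    after=True uses `(e + 1) % every == 0`, i.e. after every `every` epochs.
    """
    saved = []
    for e in range(num_epochs):
        trigger = (e + 1) % every == 0 if after else e % every == 0
        if trigger:
            saved.append(e)
    return saved


print(save_epochs(12))              # condition used in the answer
print(save_epochs(12, after=True))  # hypothetical variant
```

Which variant is preferable depends on whether a checkpoint of the untrained model at epoch 0 is useful to you.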
I have trained a neural network with TensorFlow. After training I saved it and loaded it again in a new '.py' file to avoid retraining by accident. While testing it with some extra data, I found that it predicts different things for the same data. Should it not, in theory, compute the same thing for the same data?
Some information
feed forward net
4 hidden layers with 900 neurons each
5000 training epochs
reached accuracy of ~80%
data was normalized using normalize from sklearn.preprocessing
cost function: tensorflow.nn.softmax_cross_entropy_with_logits
optimizer: tf.train.AdamOptimizer
I am giving my network the data as a matrix, the same way I did for training (each row contains one data sample and has as many columns as there are input neurons).
Out of ten prediction cycles with the same data, my network produces different results in at least 2 cycles (max observed 4 so far).
How can this be? In theory, all that happens are calculations of the form W_i*x_i + b_i. As my x_i, W_i and b_i no longer change, how can the prediction vary? Could there be a mistake in the model reloading routine?
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('path to .meta')
    saver.restore(sess, tf.train.latest_checkpoint('path to checkpoints'))
    result = sess.run(tf.argmax(prediction.eval(feed_dict={x: input_data}), 1))
    print(result)
So this was a really stupid mistake on my part. Loading the model from a save now works fine. The problem was caused by the global variables initializer: if you leave it out, it works. The information I found earlier may prove useful to someone, so I will leave it here. The solution is now:
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'path to your saved file C:x/y/z/model/model.ckpt')
After this you can carry on as usual. I do not really know why the variables initializer prevents this from working. As I see it, it should initialize all variables with random values and then go to the saved file and use the values from there, but apparently something else happens...
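A toy model of why the order of the two steps matters may help here. It is plain Python, not TensorFlow: `initialize` and `restore` are hypothetical stand-ins that both simply overwrite the same dictionary of values, and the checkpoint numbers are made up. Whichever step runs last decides what the "network" ends up using. It does not model other effects, such as the restore targeting a different copy of the variables than the one the prediction uses.

```python
import random


def initialize(variables):
    """Stand-in for an initializer: overwrite every variable with a random value."""
    for name in variables:
        variables[name] = random.random()


def restore(variables, checkpoint):
    """Stand-in for a restore: overwrite variables with the saved values."""
    variables.update(checkpoint)


checkpoint = {"W": 0.25, "b": -1.0}  # made-up saved values

# Initialize first, then restore: the saved values win.
a = {"W": None, "b": None}
initialize(a)
restore(a, checkpoint)

# Restore first, then initialize: the saved values are lost.
b = {"W": None, "b": None}
restore(b, checkpoint)
initialize(b)

print(a == checkpoint, b == checkpoint)
```

In the second case every run of the script yields different values, which would match the symptom of different predictions for the same input.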
So I have been doing some testing and found out the following about this problem.
As I was trying to reuse my created model, I had to use tf.global_variables_initializer(). Doing so overwrote my imported graph, and all the values were random, which explains the different network outputs. This still left me with a problem to solve: how do I load my network? The workaround I am currently using is far from optimal, but it at least allows me to use my saved model. TensorFlow lets you give unique names to the operations and tensors you use. By doing so, I could access them through the graph:
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('path to .meta')
    saver.restore(sess, tf.train.latest_checkpoint('path to checkpoints'))
    graph = tf.get_default_graph()
    graph.get_tensor_by_name('name:0')
Using this method I could access all my saved values, but they were separated! That is, I had one weight and one bias tensor per operation used, which led to a bunch of new variables. If you do not know the names, use the following:
print(graph.get_all_collection_keys())
This prints the collection names (our variables are stored in collections)
print(graph.get_collection('name'))
This allows us to access the collection and see what the names/keys of our variables are.
This led to another problem. I could no longer use my model, as the global variables initializer had overwritten everything. Thus I had to redefine the whole model manually with the weights and biases that I had retrieved previously.
Unfortunately, this is the only thing I could come up with. If anyone has a better idea, please let me know.
The whole thing, including the mistake, looked like this:
imports...
placeholders for data...

def my_network(data):
    ## network definition with tf functions ##
    return output

def train_my_net():
    prediction = my_network(data)
    cost function
    optimizer
    with tf.Session() as sess:
        for i in how many epochs i want:
            training routine
        save

def use_my_net():
    prediction = my_network(data)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver = tf.train.import_meta_graph('path to .meta')
        saver.restore(sess, tf.train.latest_checkpoint('path to checkpoints'))
        print(prediction.eval(feed_dict={placeholder: data}))
        graph = tf.get_default_graph()
I'd like to train my model for many epochs using TensorFlow v1.0, and my idea is to save the model at every epoch. But I soon found that the current model replaces the previous one (I mean the previous one vanishes). So I want to know how to keep all of the models and restore them one by one. I think it's hard and haven't found a nice solution. Thanks for every suggestion!
tf.train.Saver().save() has an argument global_step.
From the documentation:
Savers can automatically number checkpoint filenames with a provided counter. This lets you keep multiple checkpoints at different steps while training a model.
So you should try something like:
saver = tf.train.Saver(...)
sess = tf.Session(...)
for epoch in range(num_epochs):
    ... train model...
    saver.save(sess, "MODEL_NAME", global_step=epoch)
Note that by default, TensorFlow keeps only the last 5 checkpoints. If you want to keep them all, you should initialize your Saver with something along the lines of:
saver = tf.train.Saver(max_to_keep=num_epochs)
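To see what the numbering and the checkpoint limit mean in practice, here is a small simulation in plain Python. It is only a sketch of the observable behaviour, not the Saver's actual implementation; `simulate_checkpoints` is a hypothetical helper, and it assumes checkpoints are named "<prefix>-<global_step>" and that the oldest one is dropped once the limit is exceeded.

```python
from collections import deque


def simulate_checkpoints(num_epochs, max_to_keep=5, prefix="MODEL_NAME"):
    """Return the checkpoint names that would remain after training.

    The deque drops its oldest entry once more than max_to_keep
    names have been appended, mimicking checkpoint rotation.
    """
    kept = deque(maxlen=max_to_keep)
    for epoch in range(num_epochs):
        kept.append("{}-{}".format(prefix, epoch))
    return list(kept)


print(simulate_checkpoints(8))                 # only the last 5 survive
print(simulate_checkpoints(8, max_to_keep=8))  # all 8 survive
```

With the default limit, the checkpoints for epochs 0 to 2 are gone after 8 epochs, which is the "vanishing" described in the question.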
I trained an RNN with a fixed batch size, but now I'd like to modify the graph I saved with tf.train.Saver to have batch size 1 for inference. How can I go about this?
session = tf.InteractiveSession()
saver = tf.train.import_meta_graph('model.ckpt.meta')
saver.restore(session, 'model.ckpt')
A way to achieve this is to reconstruct a different (albeit compatible) network at test time and limit the recovery to weights only.
During training,
net = make_my_net(batch_size)
...
saver.save(session, model_name)
During testing,
net = make_my_net(1)
...
saver.restore(session, model_name)
The latter will replace the values of the variables (including the network weights) with the ones that were saved earlier. According to the documentation, you don't have to initialize variables that you are about to overwrite, although I believe this has not always been the case.
Note that reconstructing a different network gives you the opportunity to build a cleaner test network, e.g. by removing layers such as dropout.
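The reason this works is that the weights do not depend on the batch size; only the shape of the input does. A minimal illustration in plain Python follows, using a single dense layer rather than an RNN. The `dense` helper and the parameter values are made up for the example.

```python
def dense(batch, weights, bias):
    """Apply y = x @ W + b to every row of `batch` (lists of floats)."""
    outputs = []
    for x in batch:
        row = []
        for j in range(len(bias)):
            row.append(sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j])
        outputs.append(row)
    return outputs


# Made-up "trained" parameters: 2 inputs -> 2 outputs.
W = [[1.0, 0.0], [0.5, 2.0]]
b = [0.0, 1.0]

train_batch = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]  # batch size 3
single = [[3.0, 4.0]]                               # batch size 1

# The same sample gives the same output regardless of the batch it sits in.
print(dense(train_batch, W, b)[1] == dense(single, W, b)[0])
```

The same W and b serve both calls, which is why a network rebuilt with batch size 1 can take over the saved values as long as the variable names and shapes match.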
I trained an FCN model in TensorFlow following the implementation in link and saved the complete model as a checkpoint. Now I want to use the saved (pre-trained) model for a different problem.
I tried to restore the model from checkpoint by specifying the weights in Saver as:
saver = tf.train.Saver({"weights" : [w1_1,w1_2,w2_1,w2_2,w3_1,w3_2,w3_3,w3_4, w4_1, w4_2, w4_3, w4_4,w5_1,w5_2,w5_3,w6,w7]})
I am getting weights as:
w1_1=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,scope='inference/conv1_1_w')
and so on....
I am not able to restore it successfully (up to specific layer).
Tensorflow version:0.12r
You can either call init = tf.initialize_variables([list_of_vars]) followed by sess.run(init), which reinitializes those variables for you, or recreate the graph with the same structure from the point where you want to freeze the weights, but with different names for the variables. Further, if you only want to train certain variables, you can pass just those variables to the optimizer: tf.train.AdamOptimizer(learning_rate).minimize(loss, var_list=[wi, wj, ....])
I have created a graph with an AdamOptimizer, which I have then saved with tf.train.Saver().save(session, "model_name")
After training it for a while I am able to import the whole graph and the variables in a different session and resume training with
saver = tf.train.import_meta_graph("model_name")
saver.restore(session, "model_name")
What I would like to do is, after importing the graph+variables and before resuming the optimization, to change the learning_rate of the AdamOptimizer. Is that possible?
EDIT: One way of doing this would be to define the learning rate as a placeholder and feed a different value every time. But let's assume the graph has already been saved without doing this for the sake of argument.
I think you can replace learning_rate with a placeholder, i.e.
learning_rate = tf.placeholder(tf.float32, shape=(), name="learning_rate")
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(your_loss_tensor, name="train_op")
When you have restored your graph, get the ops and tensors related to training, such as train_op and learning_rate, using
train_op = graph.get_operation_by_name("train_op")
learning_rate = graph.get_tensor_by_name("learning_rate:0")
and run the training step:
sess.run(train_op, feed_dict={learning_rate: whatever_you_want})
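The idea of passing the rate in at every step can be sketched without TensorFlow. The example below uses plain gradient descent on a toy loss, not Adam, and `train_step` is a hypothetical function; it only shows that a rate supplied per call can be changed between calls, much like a fed placeholder.

```python
def train_step(w, learning_rate):
    """One gradient-descent step on the toy loss (w - 3)^2."""
    grad = 2.0 * (w - 3.0)
    return w - learning_rate * grad


w = 0.0
for step in range(20):
    # The rate is an argument, so it can differ from call to call.
    lr = 0.1 if step < 10 else 0.01
    w = train_step(w, lr)

print(w)
```

Here the rate drops after 10 steps simply because a different value is passed in, with no change to `train_step` itself.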
UPDATE:
see this if you want to change some input of your saved graph