Fine-tune Inception network twice (TensorFlow) - python

I want to run the flowers_train.py script from here: https://github.com/tensorflow/models/tree/master/inception/inception
to fine-tune the Inception network on the flowers dataset. The difference is that I want to save a checkpoint and then run the flowers_train.py script again, but this time restoring the previously saved checkpoint. I noticed that using this restorer again:
restorer = tf.train.Saver(variables_to_restore)
gives me a high loss in the first steps. So do I need to use restorer = tf.train.Saver() instead?
I also noticed that the provided checkpoint file is 434.9 MB, but the checkpoint I am saving is 389.9 MB.

What is variables_to_restore? If it does not include the last layer, then yes, you will see a higher loss and a smaller file size.
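For illustration, here is a minimal sketch (TF 1.x) of the two situations the answer describes; the variable names and checkpoint path are placeholders, not the ones used in flowers_train.py:

import tensorflow as tf

# Toy stand-ins for the pre-trained body and the newly added last layer.
body_w = tf.get_variable("body/weights", shape=[4, 4])
logits_w = tf.get_variable("logits/weights", shape=[4, 5])

# First fine-tuning run from the ImageNet checkpoint: restore only the body and
# let the new last layer start from its initializer (hence the high initial loss).
partial_restorer = tf.train.Saver(var_list=[body_w])

# Second run, resuming from your own flowers checkpoint: every variable already
# has a trained value, so a plain Saver() covering all variables is appropriate.
full_restorer = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # full_restorer.restore(sess, "path/to/your/flowers_checkpoint")  # placeholder path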

Related

I have a PyTorch image classifier training, and I want to pause training and save the weights at the time of the pause. Can I do this?

I'm in the middle of training a classifier that's been running for a few days now, but my problem is that I didn't add code to save .pt checkpoints throughout training, so I'll only end up with a weights file when the program is done with all of its epochs. Is there a way to pause training (PAUSE BREAK) and save the model's weights right now?
Unfortunately, PyTorch does not have a native API for this at the moment.
For the current job, you could use an IDE like PyDev or PyCharm to attach a debugger to the running process, set a breakpoint somewhere in your code, and extract the weights and biases.
For future jobs, you could always create checkpoints inside the epoch loop and save the learned model there. This link will help.
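For instance, a minimal per-epoch checkpointing sketch for a future run; the model, optimizer, and file names are placeholders, not taken from the question:

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                                    # stand-in for your classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

num_epochs = 5
for epoch in range(num_epochs):
    # ... run the training loop for this epoch ...

    # Save everything needed to resume later: weights, optimizer state, epoch index.
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    }, "checkpoint_epoch_{}.pt".format(epoch))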

LSTM Does Not Do Well On Second Test Data

During training my LSTM performs well (I use training, validation, and test datasets). I use my test dataset once at the end, after training, and get really good values, so I save the meta file and checkpoint.
Then, during inference, I load my checkpoint and meta file and initialize the weights (using sess.run(tf.initialize_variables())), but when I use a second test dataset (different from the one I used during training), my LSTM performance drops from 96% to 20%.
My second test dataset was recorded in conditions similar to my training, validation, and first test datasets, but it was recorded on a different day.
All my data was recorded using the same webcam, with the same background in all images, so technically I should get similar performance on my first and second test sets.
I shuffled my dataset during training.
I am using TensorFlow 1.1.0.
What could be the issue here?
Well, I was reloading my checkpoint during inference, and somehow TensorFlow would complain if I did not call the initializer after starting my session, like this:
init = tf.global_variables_initializer()
lstm_sess.run(init)
Somehow that seems to randomly initialize my weights rather than reloading the last used weight values.
So what I did instead was freeze my graph as soon as training finishes; now during inference I reload the frozen graph, and I get the same performance I got with my test dataset during training. It's kind of weird. Maybe I am not saving/reloading my checkpoint correctly?
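For what it's worth, a likely culprit is the order of operations: running the global initializer after restoring overwrites the restored weights with fresh random values. A minimal runnable sketch of the safe ordering (the toy variable and checkpoint path are made up):

import tensorflow as tf

w = tf.get_variable("w", shape=[3])                 # randomly initialized
assign_trained = w.assign([1.0, 2.0, 3.0])          # pretend training produced these values
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(assign_trained)
    saver.save(sess, "./toy_ckpt")                  # checkpoint now holds [1. 2. 3.]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())     # random values first (if needed at all)...
    saver.restore(sess, "./toy_ckpt")               # ...then restore, so the saved values win
    print(sess.run(w))                              # [1. 2. 3.], not random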

Restore models in TensorFlow 1.0 from many steps

I'd like to train my model for many epochs using TensorFlow v1.0, and my idea is to save the model at every epoch. But I soon found that the current model would replace the last one (I mean the last one would vanish). So I want to know how to keep all of the models and restore them one by one. I think it's hard and I haven't got a nice solution. Thanks for every suggestion!
tf.train.Saver().save() has an argument global_step.
From the documentation:
Savers can automatically number checkpoint filenames with a provided counter. This lets you keep multiple checkpoints at different steps while training a model.
So you should try something like:
saver = tf.train.Saver(...)
sess = tf.Session(...)
for epoch in range(num_epochs):
    # ... train model ...
    saver.save(sess, "MODEL_NAME", global_step=epoch)
Note that by default, TensorFlow keeps only the last 5 checkpoints. If you want to keep them all, you should initialize your Saver with something along the lines of:
saver = tf.train.Saver(max_to_keep=num_epochs)
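And a follow-up sketch for the second half of the question, restoring the kept checkpoints one by one afterwards; the directory name and toy variable are placeholders, and this assumes the same model graph is built before creating the Saver:

import tensorflow as tf

w = tf.get_variable("w", shape=[3])     # stand-in for your model's variables
saver = tf.train.Saver()

ckpt_state = tf.train.get_checkpoint_state("./checkpoints")   # placeholder directory

with tf.Session() as sess:
    if ckpt_state:
        # all_model_checkpoint_paths lists every checkpoint the Saver kept on disk.
        for ckpt_path in ckpt_state.all_model_checkpoint_paths:
            saver.restore(sess, ckpt_path)
            # ... evaluate the model restored from this epoch ...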

How do I modify tensor shape when loading a model checkpoint with tf.train.Saver?

I trained an RNN with a fixed batch size, but now I'd like to modify the graph I saved with tf.train.Saver to use batch size 1 for inference. How can I go about this?
session = tf.InteractiveSession()
saver = tf.train.import_meta_graph('model.ckpt.meta')
saver.restore(session, 'model.ckpt')
A way to achieve this is to reconstruct a different (albeit compatible) network at test time and limit the recovery to weights only.
During training,
net = make_my_net(batch_size)
...
saver.save(session, model_name)
During testing,
net = make_my_net(1)
...
saver.restore(session, model_name)
The latter will replace the values of the variables (including network weights) with the ones that were saved earlier. According to the documentation, you don't have to initialize the variables that you are about to overwrite, although I believe it has not always been so.
Note that reconstructing a different network gives you the opportunity to build a cleaner test network, e.g. by removing layers such as dropout.
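Putting the two halves together, here is a self-contained sketch of the pattern; make_my_net is a toy stand-in for your real graph-building code, and the checkpoint path is made up:

import tensorflow as tf

def make_my_net(batch_size):
    # Build the graph with a fixed batch size; only the variables are reused across builds.
    x = tf.placeholder(tf.float32, [batch_size, 8], name="x")
    w = tf.get_variable("w", shape=[8, 2])
    return tf.matmul(x, w)

# Training: build with the training batch size and save.
with tf.Graph().as_default():
    net = make_my_net(batch_size=32)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... training steps ...
        saver.save(sess, "./rnn_ckpt")

# Inference: rebuild the compatible graph with batch size 1 and restore the weights only.
with tf.Graph().as_default():
    net = make_my_net(batch_size=1)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, "./rnn_ckpt")   # no initialization needed for restored variables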

TensorFlow: how to modify a pre-trained model saved as checkpoints

I trained an FCN model in TensorFlow following the implementation in link and saved the complete model as a checkpoint. Now I want to use the saved (pre-trained) model for a different problem.
I tried to restore the model from the checkpoint by specifying the weights in the Saver as:
saver = tf.train.Saver({"weights" : [w1_1,w1_2,w2_1,w2_2,w3_1,w3_2,w3_3,w3_4, w4_1, w4_2, w4_3, w4_4,w5_1,w5_2,w5_3,w6,w7]})
I am getting weights as:
w1_1=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,scope='inference/conv1_1_w')
and so on....
I am not able to restore it successfully (up to a specific layer).
TensorFlow version: 0.12
Either you can call init = tf.initialize_variables([list_of_vars]) followed by sess.run(init), which will reinitialize those variables for you, or you can recreate the graph with the same structure from the point where you want to freeze the weights, but with different names for the variables. Further, in case you only want to train certain variables, you can pass just those variables to the optimizer: tf.train.AdamOptimizer(learning_rate).minimize(loss, var_list=[wi, wj, ....])
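To make that concrete, here is a small sketch with made-up scopes and shapes (they do not match the FCN implementation): restore only the variables from the scope you want to keep, and hand the optimizer only the variables you want to train:

import tensorflow as tf

with tf.variable_scope("inference"):
    w_old = tf.get_variable("conv1_1_w", shape=[3, 3, 3, 64])   # weights to reuse
with tf.variable_scope("new_head"):
    w_new = tf.get_variable("fc_w", shape=[64, 10])              # weights to train

loss = tf.reduce_sum(tf.square(w_new))                           # placeholder loss

# Restore only the reused variables from the old checkpoint ...
restore_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="inference")
restorer = tf.train.Saver(restore_vars)

# ... and let the optimizer update only the new variables.
train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="new_head")
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=train_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # restorer.restore(sess, "path/to/fcn_checkpoint")            # placeholder path
    sess.run(train_op)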
