Modify learning rate in imported Tensorflow graph - python

I have created a graph with an AdamOptimizer, which I have then saved with tf.train.Saver().save(session, "model_name")
After training it for a while I am able to import the whole graph and the variables in a different session and resume training with
saver = tf.train.import_meta_graph("model_name.meta")
saver.restore(session, "model_name")
What I would like to do is, after importing the graph+variables and before resuming the optimization, to change the learning_rate of the AdamOptimizer. Is that possible?
EDIT: One way of doing this would be to define the learning rate as a placeholder and feed a different value every time. But let's assume the graph has already been saved without doing this for the sake of argument.

I think you can replace learning_rate with a placeholder, i.e.
learning_rate = tf.placeholder(tf.float32, shape=(), name="learning_rate")
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(your_loss_tensor, name="train_op")
When you have restored your graph, get all the ops and tensors related to training, such as train_op and learning_rate, using
train_op = graph.get_operation_by_name("train_op")
learning_rate = graph.get_tensor_by_name("learning_rate:0")
and run training with
sess.run(train_op, feed_dict={learning_rate: whatever_you_want})
UPDATE:
see this if you want to change some input of your saved graph
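If the graph really was saved without such a placeholder (the situation in the EDIT), one option is to remap the saved learning-rate tensor to a new placeholder while importing the meta graph, via the input_map argument of tf.train.import_meta_graph. This is only a sketch: the tensor name "Adam/learning_rate:0" and the op name "train_op" are assumptions, so inspect your graph to find the actual names in your model.
import tensorflow as tf
# New placeholder that will stand in for the saved learning-rate tensor.
new_lr = tf.placeholder(tf.float32, shape=(), name="new_learning_rate")
# input_map rewires the named tensor of the imported graph to new_lr.
# "Adam/learning_rate:0" is an assumed name; check your graph for the real one.
saver = tf.train.import_meta_graph("model_name.meta",
                                   input_map={"Adam/learning_rate:0": new_lr})
with tf.Session() as sess:
    saver.restore(sess, "model_name")
    train_op = tf.get_default_graph().get_operation_by_name("train_op")
    sess.run(train_op, feed_dict={new_lr: 1e-4})  # plus your usual data feeds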

Related

Additional optimizer affects regularization loss

I'm working with an existing tensorflow model.
For one part of the network, I want to set a different learning rate than in the rest of the network. Let's say all_variables is made up of variables_1 and variables_2; then I want to change the learning rate only for the variables in variables_2.
The existing code for setting up the optimizer, computing and applying gradients looks basically like this:
optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
grads_and_vars = optimizer.compute_gradients(loss, all_variables)
grads_updates = optimizer.apply_gradients(grads_and_vars, global_step)
I already tried to create a second optimizer following this scheme. However, for debugging I set both learning rates equal, and the decrease of the regularization loss was very dissimilar.
Isn't it possible to create a second optimizer, optimizer_new, and simply call apply_gradients on the respective grads_and_vars of variables_1 and variables_2? I.e., instead of having this line
grads_updates = optimizer.apply_gradients(grads_and_vars, global_step)
one could use
grads_updates = optimizer.apply_gradients(grads_and_vars['variables_1'], global_step)
grads_updates_new = optimizer_new.apply_gradients(grads_and_vars['variables_2'], global_step)
and finally, train_op = tf.group(grads_updates, grads_updates_new).
However, the regularization loss behavior is still present.
I came across the cause through a comment in this post. In my case, it doesn't make sense to supply "global_step" twice as the global_step argument of apply_gradients. As the learning_rate, and therefore the optimizer's arguments, depend on global_step, the training process (especially the regularization loss behaviour) differs. Thanks to y.selivonchyk for pointing this out.
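A minimal sketch of that fix, assuming the variable groups variables_1/variables_2 from the question and a second learning rate learning_rate_new (a hypothetical name); global_step is passed to only one apply_gradients call so the step counter, and anything derived from it, advances once per training step:
optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
optimizer_new = tf.train.MomentumOptimizer(learning_rate_new, 0.9)  # hypothetical second rate
grads_and_vars_1 = optimizer.compute_gradients(loss, variables_1)
grads_and_vars_2 = optimizer_new.compute_gradients(loss, variables_2)
# Only the first apply_gradients increments global_step.
grads_updates = optimizer.apply_gradients(grads_and_vars_1, global_step=global_step)
grads_updates_new = optimizer_new.apply_gradients(grads_and_vars_2)
train_op = tf.group(grads_updates, grads_updates_new)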

Tensorflow simple_save with checkpoints

I am trying to save my model at different steps while training. Let's say I would like to save after 5 epochs.
At this moment I am using:
tf.saved_model.simple_save(
    sess, model_folder, inputs, outputs
)
which works like a charm. Nevertheless, I realize it is saving the whole graph and weights on each call, which has a high computational cost.
I would like to update the weights of my model while keeping the graph from the previous save (since it is not changing during training).
I have read about tf.train.Saver, which seems to fit my intentions. But it seems to force me to specify all the variables I want to save, which is not as practical as the simple_save method. So I am wondering if there is any way of using simple_save in a checkpoint fashion.
I think you have a wrong understanding of tf.train.Saver. You can do something as simple as:
saver = tf.train.Saver()
with tf.Session() as sess:
    for e in range(epochs):
        ...
        if e % 5 == 0:
            saver.save(sess, "/path/where/to/save/model")
So no need to specify every single variable you want to save.
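For what it's worth, a minimal sketch of the same loop with two optional extras: passing global_step tags each checkpoint file with the epoch number, and max_to_keep bounds how many checkpoints stay on disk.
saver = tf.train.Saver(max_to_keep=3)  # keep only the 3 most recent checkpoints
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(epochs):
        # ... one epoch of training ...
        if e % 5 == 0:
            saver.save(sess, "/path/where/to/save/model", global_step=e)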

Trained neural network produces different predictions with same data (TensorFlow)

I have trained a neural network with TensorFlow. After training I saved it and loaded it again in a new '.py' file to avoid accidentally retraining it. As I was testing it with some extra data I found out that it predicts different things for the same data. Should it not theoretically compute the same thing for the same data?
Some information
feed forward net
4 hidden layers with 900 neurons each
5000 training epochs
reached accuracy of ~80%
data was normalized using normalize from sklearn.preprocessing
cost function: tensorflow.nn.softmax_cross_entropy_with_logits
optimizer: tf.train.AdamOptimizer
I am giving my network the data as a matrix, the same way I did for training (each row containing a data sample, with as many columns as there are input neurons).
Out of ten prediction cycles with the same data, my network produces different results in at least 2 cycles (max observed so far: 4).
How can this be? In theory, all that happens are data-processing calculations of the form W_i*x_i + b_i. As my x_i, W_i and b_i do not change anymore, how come the prediction varies? Might there be a mistake in the model-reloading routine?
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('path to .meta')
    saver.restore(sess, tf.train.latest_checkpoint('path to checkpoints'))
    result = sess.run(tf.argmax(prediction.eval(feed_dict={x: input_data}), 1))
    print(result)
So this is a really stupid mistake by me. Now it works fine when loading the model from a save. The problem was caused by the global variables initializer. If you leave it out, it will work fine. The previously found information may prove useful for someone, so I will leave it here. The solution is now:
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'path to your saved file C:x/y/z/model/model.ckpt')
After this you can go on as usual. I do not really know why the variables initializer prevents this from working. As I see it, it should be something like: initialize all variables so they exist (with random values), then go to the saved file and use the values from there, but apparently something else happens...
So I have been doing some testing and found out the following about this problem.
As I was trying to reuse my created model, I used tf.global_variables_initializer(). By doing so it overwrote my imported values and everything was random, which explains the different network outputs. This still left me with a problem to solve: how do I load my network? The workaround I am currently using is far from optimal, but it at least allows me to use my saved model. TensorFlow allows one to give unique names to the functions and tensors used. By doing so I could access them through the graph:
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('path to .meta')
    saver.restore(sess, tf.train.latest_checkpoint('path to checkpoints'))
    graph = tf.get_default_graph()
    graph.get_tensor_by_name('name:0')
Using this method I could access all my saved values, but they were separated! It means that I had one weight and one bias per operation used, which led to a bunch of new variables. If you do not know the names, use the following:
print(graph.get_all_collection_keys())
This prints the collection names (our variables are stored in collections)
print(graph.get_collection('name'))
This allows us to access the collection and see what the names/keys of our variables are.
This led to another problem: I could no longer use my model, as the global variables initializer had overwritten everything. Thus I had to redefine the whole model manually with the weights and biases I had retrieved previously.
Unfortunately, this is the only thing I could come up with. If anyone has a better idea, please let me know.
The whole thing with the mistake looked like this:
imports...
placeholders for data...
def my_network(data):
    ## network definition with tf functions ##
    return output
def train_my_net():
    prediction = my_network(data)
    # cost function
    # optimizer
    with tf.Session() as sess:
        for i in how many epochs i want:
            # training routine
        # save
def use_my_net():
    prediction = my_network(data)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver = tf.train.import_meta_graph('path to .meta')
        saver.restore(sess, tf.train.latest_checkpoint('path to checkpoints'))
        print(sess.run(prediction.eval(feed_dict={placeholder: data})))
        graph = tf.get_default_graph()
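For completeness, a corrected use_my_net following the fix above would look roughly like this (a sketch; the path string and the placeholder/data names are the schematic placeholders from the snippets above):
def use_my_net():
    prediction = my_network(data)
    saver = tf.train.Saver()  # a saver over the variables my_network just built
    with tf.Session() as sess:
        # No tf.global_variables_initializer() here: restore() assigns the
        # saved values to the freshly built variables.
        saver.restore(sess, 'path to your saved file C:x/y/z/model/model.ckpt')
        print(sess.run(prediction, feed_dict={placeholder: data}))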

How do I modify tensor shape when loading a model checkpoint with tf.train.Saver?

I trained an RNN with a fixed batch size, but now I'd like to modify the graph I saved with tf.train.Saver to have batch size 1 for inference. How can I go about this?
session = tf.InteractiveSession()
saver = tf.train.import_meta_graph('model.ckpt.meta')
saver.restore(session, 'model.ckpt')
A way to achieve this is to reconstruct a different (albeit compatible) network at test time and limit the recovery to weights only.
During training,
net = make_my_net(batch_size)
...
saver.save(session, model_name)
During testing,
net = make_my_net(1)
...
saver.restore(session, model_name)
The latter will replace the values of the variables (including the network weights) with the ones that were saved earlier. You don't have to initialize the variables that you are about to overwrite, according to the documentation, although I believe it has not always been so.
Note that reconstructing a different network gives you the opportunity to build a cleaner test network, e.g. by removing layers such as dropout.
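A slightly fuller sketch of this pattern, assuming make_my_net (the hypothetical model-building function above) creates the same variables regardless of the batch size passed in:
# Inference time: rebuild the graph with batch size 1.
tf.reset_default_graph()
net = make_my_net(1)                    # same variable names/shapes as in training
saver = tf.train.Saver()                # maps checkpoint values onto these variables
with tf.Session() as session:
    saver.restore(session, model_name)  # weights only; no initializer needed
    # ... run inference with batch size 1 ...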

Tensorflow: how it trains the model?

Working with TensorFlow, the first step is to build a dataflow graph and use a session to run it. During my practice, for example with the MNIST tutorial, it first defines the loss function and the optimizer, with the following code (the MLP model is defined before that):
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])) #define cross entropy error function
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean') #define loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate) #define optimizer
global_step = tf.Variable(0, name='global_step', trainable=False) #global step counter
train_op = optimizer.minimize(loss, global_step=global_step) #train operation in the graph
The training process:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
That's how TensorFlow does training in this case. But my question is: how does TensorFlow know which weights it needs to train and update? In the training code, we only pass the output y to cross_entropy, and for the optimizer or loss we didn't pass any information about the structure directly. In addition, we use a dictionary to feed batch data to train_step, but train_step doesn't use the data directly. How does TensorFlow know where to use this data as input?
As for my own guess, I thought it might be that all those variables and constants are stored as Tensors, and operations such as tf.matmul() should be a "subclass" of TensorFlow's operation class (I haven't checked the code yet). There might be some mechanism for TensorFlow to recognise the relations among tensors (tf.Variable(), tf.constant()) and operations (tf.mul(), tf.div()...). I guess it could check a tf.xxxx() call's superclass to find out whether it is a tensor or an operation. This assumption raises my second question: should I use TensorFlow's tf.xxx functions as much as possible to ensure TensorFlow can build a correct dataflow graph, even when they are sometimes more complicated than normal Python methods or when some functions are supported better in NumPy than in TensorFlow?
My last question is: is there any link between TensorFlow and C++? I heard someone say TensorFlow is faster than normal Python since it uses C or C++ as its backend. Is there any mechanism to transform TensorFlow Python code into C/C++?
I'd also be grateful if someone could share some debugging habits for coding with TensorFlow, since currently I just set up some terminals (Ubuntu) to test each part/function of my code.
You do pass information about your structure to Tensorflow when you define your loss with:
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
Notice that with Tensorflow you build a graph of operations, and every operation you use in your code is a node in the graph.
When you define your loss you are passing the operation stored in cross_entropy, which depends on y_ and y. y_ is a placeholder for your input whereas y is the result of y = tf.nn.softmax(tf.matmul(x, W) + b). See where I am going? The operation loss contains all the information it needs to build the model and process the input, because it depends on the operation cross_entropy, which depends on y_ and y, which depend on the input x and the model weights W.
So when you call
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
TensorFlow knows exactly which operations need to be computed when you run train_step, and it knows exactly where in the operation graph to put the data you pass through feed_dict.
As for how TensorFlow knows which variables should be trained, the answer is easy: it trains any tf.Variable() in the operation graph that is trainable. Notice how, when you define global_step, you set trainable=False because you don't want to compute gradients w.r.t. that variable.
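A small illustration of that last point (TF 1.x assumed, reusing the MNIST names from the question): minimize() differentiates the loss with respect to every variable in the trainable-variables collection by default, so global_step is left untouched.
import tensorflow as tf
W = tf.Variable(tf.zeros([784, 10]), name="W")                     # trainable
b = tf.Variable(tf.zeros([10]), name="b")                          # trainable
global_step = tf.Variable(0, name='global_step', trainable=False)  # not trainable
print(tf.trainable_variables())   # lists W and b; global_step is excluded
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
y = tf.nn.softmax(tf.matmul(x, W) + b)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
# Equivalent to the default behaviour; var_list just makes the choice explicit.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(
    cross_entropy, var_list=tf.trainable_variables(), global_step=global_step)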
