I am loading from a saved model and I would like to be able to reset a tensorflow optimizer such as an Adam Optimizer. Ideally something like:
sess.run([tf.initialize_variables(Adamopt)])
or
sess.run([Adamopt.reset])
I have looked for an answer but have not found a way to do it. Here is what I found, none of which addresses the issue:
https://github.com/tensorflow/tensorflow/issues/634
In TensorFlow is there any way to just initialize uninitialised variables?
Tensorflow: Using Adam optimizer
I basically just want a way to reset the "slot" variables in the Adam Optimizer.
Thanks
In TensorFlow 2.x you can reset an optimizer such as Adam like this:
for var in optimizer.variables():
    var.assign(tf.zeros_like(var))
This question also bothered me for quite a while. It is actually quite easy: define an operation that re-initializes the optimizer's current state, which you can obtain from its variables() method. Something like this:
optimizer = tf.train.AdamOptimizer(0.1, name='Optimizer')
reset_optimizer_op = tf.variables_initializer(optimizer.variables())
Whenever you need to reset the optimizer, run:
sess.run(reset_optimizer_op)
Official explanation of variables():
A list of variables which encode the current state of Optimizer.
Includes slot variables and additional global variables created by the optimizer in the current default graph.
For example, for AdamOptimizer you get the first and second moments (slot names 'm' and 'v') of every trainable variable, as well as beta1_power and beta2_power.
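To make that state concrete, here is a minimal plain-Python sketch of Adam's bookkeeping for a single scalar weight. It is not TensorFlow's implementation: the class name ScalarAdam and its reset() method are hypothetical, and the update follows the textbook Adam formulas, with the power accumulators assumed to start at beta1 and beta2. It illustrates why re-running the initializers is a safe way to reset: the moments go back to 0, while the power accumulators go back to their non-zero initial values.

```python
import math


class ScalarAdam:
    """Toy Adam optimizer for one scalar weight (illustration only)."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.reset()

    def reset(self):
        # Slot variables: first and second moment estimates.
        self.m = 0.0
        self.v = 0.0
        # Non-slot state: running powers of the decay rates (assumed start values).
        self.beta1_power = self.beta1
        self.beta2_power = self.beta2

    def step(self, weight, grad):
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grad * grad
        # Bias-corrected step size.
        lr_t = self.lr * math.sqrt(1.0 - self.beta2_power) / (1.0 - self.beta1_power)
        weight -= lr_t * self.m / (math.sqrt(self.v) + self.eps)
        self.beta1_power *= self.beta1
        self.beta2_power *= self.beta2
        return weight


opt = ScalarAdam()
w = 5.0
for _ in range(3):
    w = opt.step(w, grad=2.0 * w)  # gradient of w**2

state_after_training = (opt.m, opt.v, opt.beta1_power, opt.beta2_power)
opt.reset()
state_after_reset = (opt.m, opt.v, opt.beta1_power, opt.beta2_power)
```

After reset() the optimizer behaves as if it had never seen a gradient, while the weight w keeps its trained value, which is the behaviour asked for in the question.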
The simplest way I found was to give the optimizer its own variable scope and then run
optimizer_scope = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                                    "scope/prefix/for/optimizer")
sess.run(tf.variables_initializer(optimizer_scope))
(tf.variables_initializer replaces the deprecated tf.initialize_variables.) The idea comes from a related answer on freezing weights.
Building on @EdisonLeejt's answer for TensorFlow 2.x: more generally, you can first capture the initial state (which may not be zero, e.g. if it was loaded from a checkpoint file) and later assign it back:
# Get the initial states
init_states = [var.value() for var in optimizer.variables()]
# Do the optimization
...
# Reset the optimizer state to init_states
for val, var in zip(init_states, optimizer.variables()):
    var.assign(val)
I want to use the external optimizer interface in TensorFlow so that I can use Newton-type optimizers, as tf.train only has first-order gradient descent optimizers. At the same time, I want to build my network using tf.keras.layers, as that is much easier than using tf.Variables when building large, complex networks. I will show my issue with the following simple 1D linear regression example:
import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np
#generate data
no = 100
data_x = np.linspace(0,1,no)
data_y = 2 * data_x + 2 + np.random.uniform(-0.5,0.5,no)
data_y = data_y.reshape(no,1)
data_x = data_x.reshape(no,1)
# Make model using keras layers and train
x = tf.placeholder(dtype=tf.float32, shape=[None,1])
y = tf.placeholder(dtype=tf.float32, shape=[None,1])
output = tf.keras.layers.Dense(1, activation=None)(x)
loss = tf.losses.mean_squared_error(data_y, output)
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, method="L-BFGS-B")
sess = K.get_session()
sess.run(tf.global_variables_initializer())
tf_dict = {x : data_x, y : data_y}
optimizer.minimize(sess, feed_dict = tf_dict, fetches=[loss], loss_callback=lambda x: print("Loss:", x))
When running this, the loss does not change at all. With any other optimizer from tf.train, it works fine. Also, when using tf.layers.Dense() instead of tf.keras.layers.Dense(), it does work with the ScipyOptimizerInterface. So really the question is: what is the difference between tf.keras.layers.Dense() and tf.layers.Dense()? I saw that the Variables created by tf.layers.Dense() are of type tf.float32_ref, while the Variables created by tf.keras.layers.Dense() are of type tf.float32. As far as I know, _ref indicates that the tensor is mutable. So maybe that's the issue? But then again, any other optimizer from tf.train works fine with Keras layers.
Thanks
After a lot of digging I was able to find a possible explanation.
ScipyOptimizerInterface uses feed_dicts to simulate the updates of your variables during the optimization process. It only does an assign operation at the very end. In contrast, tf.train optimizers always do assign operations. The code of ScipyOptimizerInterface is not that complex so you can verify this easily.
Now the problem is that assigning variables with feed_dict works mostly by accident. Here is a link where I learnt about this. In other words, assigning variables via feed dict, which is what ScipyOptimizerInterface does, is a hacky way of doing updates.
Now this hack mostly works, except when it does not. tf.keras.layers.Dense uses ResourceVariables to model the weights of the model. This is an improved version of simple Variables that has cleaner read/write semantics. The problem is that under the new semantics the feed dict update happens after the loss calculation. The link above gives some explanations.
Now tf.layers is currently a thin wrapper around tf.keras.layers, so I am not sure why it would work. Maybe there is some compatibility check somewhere in the code.
The solutions to address this are fairly simple.
Either avoid using components that use ResourceVariables. This can be kind of difficult.
Or patch ScipyOptimizerInterface so that it always does assignments for variables. This is relatively easy, since all the required code is in one file.
There was some effort to make the interface work with eager mode (which uses ResourceVariables by default). Check out this link
I think the problem is with the line
output = tf.keras.layers.Dense(1, activation=None)(x)
In this form, output is not a layer but rather the output of a layer, which might prevent the wrapper from collecting the layer's weights and biases and feeding them to the optimizer. Try writing it in two lines, e.g.
output = tf.keras.layers.Dense(1, activation=None)
res = output(x)
If you want to keep the original form, you might have to collect all trainable variables manually and pass them to the optimizer via the var_list option:
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, var_list = [Trainables], method="L-BFGS-B")
Hope this helps.
Does a Keras custom loss function accept a global Python variable?
I am building my own Keras custom loss function, which only accepts y_true and y_pred as arguments. But the loss function is quite complex and depends on other variables. Currently, in my implementation, the loss function directly uses global variables from the same Python script. After training the model, if I want to use it for prediction, those global variables in the Python environment will have changed. My question is: do I need to compile the model again to guarantee that it picks up the latest version of those external global variables?
Rlist=....
def custom_loss(y_true, y_pred):
    z = 0.0
    # Rlist is the global variable
    for j in Rlist:
        z = z + K.log(K.sum(K.exp(K.gather(y_pred, j[0])))) \
              - K.log(K.sum(K.exp(K.gather(y_pred, j))))
    z = -z
    return z
# below build the model and compile it with loss=custom_loss
model=...
model.compile(loss=custom_loss,....
model.fit(x=train_x,y=train_y,...)
# Rlist=... update Rlist which is adaptive to test dataset
# Do I need to recompile in the code below, or is Rlist updated
# in custom_loss when it is called?
model.predict(x=test_x,y=test_y,...)
In my loss function (this is actually the loss for the Cox proportional hazards model), the loss is not additive across per-sample loss values.
Rlist is a global variable in the Python environment of my Keras code. My question is: after training the model, if I change Rlist for the test dataset, will Keras automatically pick up the new Rlist, or will it use the old version from when it compiled and built the computation graph?
Is there an explanation of what happens when TensorFlow builds its computation graph if I refer directly to a Python global variable in the loss function?
I know it's not good practice to use global variables. Better suggestions are also welcome.
What exactly do you mean by "python environment of my Keras code"? If you set the Rlist variable in your code to [1,2,3] while training and then change it to [3,2,1] in prediction/production mode, your custom loss will see the [3,2,1] value.
I'm not sure what you are trying to achieve, but I suppose this could work:
A) Create a real environment variable holding RList.
B) Create a JSON file with your RList (that way, you'll be able to use your RList data in production mode on a server or in the cloud).
C) Create a dict in your code like
RList={
    'train': [1,2,3],
    'test':[3,2,1],
    'production':[4,5,6]
}
I need to modify the weight values during execution, more specifically between the compute_gradients() and apply_gradients() calls. I was able to modify the gradients themselves, but I could not change the weights.
I'm using the tutorial for the Iris NN in TensorFlow:
https://github.com/tensorflow/models/blob/master/samples/core/get_started/custom_estimator.py , the only difference being that I replaced the minimize() call with separate compute_gradients() and apply_gradients() calls.
grads_and_vars = optimizer.compute_gradients(loss)
# some way to change the weights
train_op = optimizer.apply_gradients(grads_and_vars, global_step=tf.train.get_global_step())
Thanks in advance.
My best guess is that you are looking for tf.assign (from here) to assign values to your Variable tensors.
According to the docs:
Update 'ref' by assigning 'value' to it.
This operation outputs a Tensor that holds the new value of 'ref' after the value has been assigned. This makes it easier to chain operations that need to use the reset value.
I'm finally using my LSTM model to predict things. However, I've run into a new problem that I don't quite understand. If I try to predict something using
sess.run(pred, feed_dict={x: xs})
It works great for the first prediction, but any subsequent predictions throw the error:
ValueError: Variable weight_def/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?
Now, there are a TON of topics on this - and most of them are easily solved by doing what it asks - just create a variable scope around the offending line and make variable reuse true. Now, if I do that I get the following error:
ValueError: Variable rnn_def/RNN/BasicLSTMCell/Linear/Matrix does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
This is causing me quite the headache. I've read the TensorFlow Variable Sharing documentation over and over, and I can't for the life of me figure out what I am doing wrong. Here are the offending lines:
with tf.variable_scope("rnn_def"):
    outputs, states = rnn.rnn(self.__lstm_cell,
                              self.__x,
                              dtype=tf.float32)
self.__outputs = outputs
self.__states = states
I have this code nested in a larger class that just contains the remainder of the graph. To train it, I just call my "train" method over and over again. Which seems to work fine, the problem ends up being prediction.
So my question is two fold:
Why do I require some sort of variable sharing only after the first prediction but the first call doesn't fail? What do I need to fix this code so I can predict more than once without causing an error?
When is variable sharing useful, and why is Tensorflow creating new variables each time I run it? How can I prevent this (do I want to prevent it?)?
Thank you!
Add a print statement to that block of code. I suspect it is being called multiple times. Or maybe you are creating multiple instances of the class, in which case each instance should have its own scope name.
To answer your questions.
Why do I require some sort of variable sharing only after the first
prediction but the first call doesn't fail? What do I need to fix this
code so I can predict more than once without causing an error?
No you don't. That block of code creating the RNN is probably being accidentally called multiple times.
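The two error messages from the question can be reproduced with a toy name-keyed registry. This is a plain-Python illustration of the idea only, not TensorFlow's implementation; VariableStore and build_model are hypothetical names. Creating a name twice fails, asking to reuse a name that was never created fails, and reuse succeeds only for a name that already exists.

```python
class VariableStore:
    """Toy stand-in for a name-keyed variable registry (illustration only)."""

    def __init__(self):
        self._variables = {}

    def get_variable(self, name, reuse=False):
        exists = name in self._variables
        if reuse and not exists:
            raise ValueError("Variable %s does not exist" % name)
        if not reuse and exists:
            raise ValueError("Variable %s already exists" % name)
        if not exists:
            self._variables[name] = [0.0]  # placeholder for real weights
        return self._variables[name]


def build_model(store, reuse=False):
    """Construction code: should run once, not once per prediction."""
    return store.get_variable("rnn_def/weights", reuse=reuse)


store = VariableStore()
weights = build_model(store)  # first call creates the variable

errors = []
try:
    build_model(store)  # accidental second construction
except ValueError as exc:
    errors.append(str(exc))

try:
    # Reuse requested for a name that was never created.
    VariableStore().get_variable("rnn_def/other", reuse=True)
except ValueError as exc:
    errors.append(str(exc))

shared = build_model(store, reuse=True)  # deliberate sharing returns the same object
```

In this picture, the fix is to run the construction code once and then call only the run step for each prediction, rather than wrapping repeated construction in reuse=True.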
When is variable sharing useful, and why is Tensorflow creating new
variables each time I run it? How can I prevent this (do I want to
prevent it?)?
It is useful in the following case, where I have different input sources for part of my graph depending on whether it is training or predicting.
x_train, h_train = ops.sharpen_cell(x_train, h_train, 2, n_features, conv_size, n_convs, conv_activation, 'upsampler')
self.cost += tf.reduce_mean((x_train - y_train) ** 2)
level_scope.reuse_variables()
x_gen, h_gen = ops.sharpen_cell(x_gen, h_gen, 2, n_features, conv_size, n_convs, conv_activation, 'upsampler')
self.generator_outputs.append(tf.clip_by_value(x_gen, -1, 1))
In this example it reuses, for the generator, the variables that were trained with the trainer. It is also useful if you want to unroll an RNN in a loop, such as in this case:
y = ...      # initial value
state = ...  # initial state
rnn = ...    # some sort of RNN cell
with tf.variable_scope("rnn") as scope:
    for t in range(10):
        y, state = rnn(y, state)
        scope.reuse_variables()
In this case it will reuse the rnn weights between time steps which is the desired behavior for an RNN.
I want to implement an autoencoder (to be exact, a stacked convolutional autoencoder).
I'd like to pretrain each layer first and then fine-tune the whole network.
So I created a variable for the weights of each layer,
e.g. W_1 = tf.Variable(initial_value, name, trainable=True, etc.) for the first layer,
and I pretrained W_1 of the first layer.
Next I want to pretrain the weights of the second layer (W_2).
For this I need W_1 to compute the input of the second layer.
However, W_1 is trainable, so if I use W_1 directly, TensorFlow may train W_1 as well.
So I should create W_1_out, which keeps the value of W_1 but is not trainable.
To be honest, I tried to modify the code from this site:
https://github.com/cmgreen210/TensorFlowDeepAutoencoder/blob/master/code/ae/autoencoder.py
At line 102 it creates the variable with the following code:
self[name_w + "_fixed"] = tf.Variable(tf.identity(self[name_w]),
                                      name=name_w + "_fixed",
                                      trainable=False)
However, it raises an error because it uses an uninitialized value.
How can I copy a variable but make the copy non-trainable, so that I can pretrain the next layers?
Not sure if still relevant, but I'll try anyway.
Generally, what I do in a situation like that is the following:
Populate the (default) graph according to the model you are building, e.g. for the first training step just create the first convolutional layer W1 you mention. When you train the first layer you can store the saved model once training is finished, then reload it and add the ops required for the second layer W2. Or you can just build the whole graph for W1 from scratch again directly in the code and then add the ops for W2.
If you are using the restore mechanism provided by Tensorflow, you will have the advantage that the weights for W1 are already the pre-trained ones. If you don't use the restore mechanism, you will have to set the W1 weights manually, e.g. by doing something shown in the snippet further below.
Then, when you set up the training op, you can pass a list of variables as var_list to the optimizer, which explicitly tells the optimizer which parameters to update in order to minimize the loss. If this is set to None (the default), it just uses what it can find in tf.trainable_variables(), which in turn is a collection of all tf.Variables that are trainable. Maybe check this answer, too, which basically says the same thing.
When using the var_list argument, graph collections come in handy. E.g. you could create a separate graph collection for every layer you want to train. The collection would contain the trainable variables for each layer and then you could very easily just retrieve the required collection and pass it as the var_list argument (see example below and/or the remark in the above linked documentation).
How to override the value of a variable: name is the name of the variable to be overridden, value is an array of the appropriate size and type, and sess is the session:
variable = tf.get_default_graph().get_tensor_by_name(name)
sess.run(tf.assign(variable, value))
Note that the name needs an additional :0 in the end, so e.g. if the weights of your layer are called 'weights1' the name in the example should be 'weights1:0'.
To add a tensor to a custom collection: Use something along the following lines:
tf.add_to_collection('layer1_tensors', weights1)
tf.add_to_collection('layer1_tensors', some_other_trainable_variable)
Note that the first line creates the collection because it does not yet exist and the second line adds the given tensor to the existing collection.
How to use the custom collection: Now you can do something like this:
# loss = some tensorflow op computing the loss
var_list = tf.get_collection_ref('layer1_tensors')
optim = tf.train.AdamOptimizer().minimize(loss=loss, var_list=var_list)
You could also use tf.get_collection('layer1_tensors'), which would return a copy of the collection.
Of course, if you don't want to do any of this, you could just use trainable=False when creating the graph for all variables you don't want to be trainable, as you hinted at in your question. However, I don't like that option too much, because it requires you to pass booleans into the functions that populate your graph, which is very easily overlooked and thus error-prone. Also, even if you decide to do it like that, you would still have to restore the non-trainable variables manually.
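The effect of var_list can be illustrated without TensorFlow. The following is a plain-Python toy, not the TensorFlow API: the train function, the parameter names w1 and w2, and the one-sample model w2 * w1 * x are all assumptions made for illustration. Gradients are computed for every parameter, but only the names listed in var_list are updated, so the "pretrained" w1 stays fixed while w2 is fitted.

```python
def train(params, var_list, x, y, lr=0.1, steps=50):
    """Gradient descent on (w2 * w1 * x - y)**2, updating only names in var_list."""
    for _ in range(steps):
        error = params["w2"] * params["w1"] * x - y
        grads = {
            "w1": 2.0 * error * params["w2"] * x,
            "w2": 2.0 * error * params["w1"] * x,
        }
        # Only the selected parameters receive an update.
        for name in var_list:
            params[name] -= lr * grads[name]
    return params


params = {"w1": 1.0, "w2": 0.5}  # pretend w1 was pretrained in an earlier stage
params = train(params, var_list=["w2"], x=1.0, y=2.0)
```

For the later fine-tuning stage, the same function would simply be called with both names in var_list.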