I need to modify the weight values during execution, more specifically between the compute_gradients() and apply_gradients() calls. I was able to modify the gradients themselves, but I could not change the weights.
I'm using the Iris NN tutorial for TensorFlow:
https://github.com/tensorflow/models/blob/master/samples/core/get_started/custom_estimator.py , the only difference being that I replaced the minimize() call with compute_gradients() followed by apply_gradients():
grads_and_vars = optimizer.compute_gradients(loss)
# some way to change the weights
train_op = optimizer.apply_gradients(grads_and_vars, global_step=tf.train.get_global_step())
Thanks in advance.
My best guess is that you are looking for tf.assign (from here) to assign values to your Variable tensors.
According to the docs:
Update 'ref' by assigning 'value' to it.
This operation outputs a Tensor that holds the new value of 'ref' after the value has been assigned. This makes it easier to chain operations that need to use the reset value.
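For example, here is a minimal sketch of how tf.assign could slot in between the two calls, assuming a TF 1.x graph-mode setup where loss and a variable weights already exist (the clipping is just a stand-in for whatever change you want to make):

optimizer = tf.train.GradientDescentOptimizer(0.01)
grads_and_vars = optimizer.compute_gradients(loss)

# tf.assign returns an op whose output is the new value of `weights`;
# here we clip the weights as an example modification.
change_weights = tf.assign(weights, tf.clip_by_value(weights, -1.0, 1.0))

# Make sure the assignment runs before the gradients are applied.
with tf.control_dependencies([change_weights]):
    train_op = optimizer.apply_gradients(
        grads_and_vars, global_step=tf.train.get_global_step())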
Related
So if I run this code in PyTorch:
import torch

x = torch.ones(2, 2, requires_grad=True)
x.add_(1)
I will get the error:
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
I understand that PyTorch does not allow in-place operations on leaf variables, and I also know that there are ways to get around this restriction. What I don't understand is the philosophy behind this rule. Why is it wrong to change a leaf variable with an in-place operation?
As I understand it, any time you do a non-traditional operation on a tensor that was initialized with requires_grad=True, PyTorch throws an error to make sure it was intentional. For example, you normally would only update a weight tensor using optimizer.step().
For another example, I ran into this issue when trying to update the values in a backprop-able tensor during network initialization.
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=True)
self.weight_layer[true_ids == 1] = -1.2
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
The problem is that, because requires_grad=True, the network doesn't know that I'm still initializing the values. If this is what you are trying to do, wrapping the update in a torch.no_grad block is one solution:
with torch.no_grad():
    self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=True)
    self.weight_layer[true_ids == 1] = -1.2
Otherwise, you could create the Parameter with requires_grad=False and only turn gradients on after you finish initializing the tensor (note that nn.Parameter defaults to requires_grad=True):
self.weight_layer = nn.Parameter(data=torch.zeros(seq_length), requires_grad=False)
self.weight_layer[true_ids == 1] = -1.2
self.weight_layer.requires_grad = True
The simple answer to this is that once autograd has created the graph, and we have created tensors that are inputs to the graph, autograd will construct and track each operation on those tensors. Since such a tensor has requires_grad=True, an in-place weight update after loss.backward() is probably considered part of the already-created graph, which requires gradients. This leads to the RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
Autograd cannot tell why a tensor that is part of the computational graph is being used outside of it, or being initialized in place, and this is a problem. If we simply place the code under a
with torch.no_grad():
    # code here
block, we disable gradient tracking, which essentially signals to autograd that this operation is not part of our dynamic graph updates.
PS: I will expand this answer by writing a blog post about this on Medium.
TLDR: How to create a variable that holds the confusion matrix used for computing custom metrics, accumulating the values across all of the evaluation steps?
I have implemented custom metrics to use in the tf.estimator.train_and_evaluate pipeline, with a confusion matrix as the crux for them all. My goal is to make this confusion matrix persist over multiple evaluation steps in order to track the learning progress.
Using get_variable in the variable scope did not work, since it does not save the variable to the checkpoint (or so it seems).
This does not work:
@property
def confusion_matrix(self):
    with tf.variable_scope(
        f"{self.name}-{self.metric_type}", reuse=tf.AUTO_REUSE
    ):
        confusion_matrix = tf.get_variable(
            name="confusion-matrix",
            initializer=tf.zeros(
                [self.n_classes, self.n_classes],
                dtype=tf.float32,
                name=f"{self.name}/{self.metric_type}-confusion-matrix",
            ),
            aggregation=tf.VariableAggregation.SUM,
        )
    return confusion_matrix
Just saving the matrix as a class attribute works, but it obviously does not persist over multiple steps:
self.confusion_matrix = tf.zeros(
    [self.n_classes, self.n_classes],
    dtype=tf.float32,
    name=f"{self.name}/{self.metric_type}-confusion-matrix",
)
You can look at the full example here.
I expect to have this confusion matrix persist from start to finish during evaluation, but I do not need to have it in the final SavedModel. Could you please tell me how I can achieve this? Do I need to just save the matrix to an external file, or is there a better way?
You can define a custom metric:
def confusion_matrix(labels, predictions):
    matrix = ...  # confusion matrix calculation
    mean, update_op = tf.metrics.mean_tensor(matrix)
    # do something with mean if needed
    return {'confusion_matrix': (mean, update_op)}
and then add it to your estimator:
estimator = tf.estimator.add_metrics(estimator, confusion_matrix)
If you need a sum instead of the mean, you can take insight from the tf.metrics.mean_tensor implementation.
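For instance, here is a hedged sketch of a sum-accumulating variant, modeled loosely on how tf.metrics.mean_tensor keeps its running total (the variable setup is my assumption, not the library's exact code, and n_classes would need to be bound, e.g. with functools.partial, before passing this to add_metrics):

def confusion_matrix_sum(labels, predictions, n_classes):
    batch_cm = tf.confusion_matrix(
        labels, predictions, num_classes=n_classes, dtype=tf.float32)
    # A local (metric) variable, so it persists across evaluation batches
    # without ending up in the model's checkpointed variables.
    total_cm = tf.get_variable(
        'total_confusion_matrix',
        shape=[n_classes, n_classes],
        dtype=tf.float32,
        initializer=tf.zeros_initializer(),
        trainable=False,
        collections=[tf.GraphKeys.LOCAL_VARIABLES,
                     tf.GraphKeys.METRIC_VARIABLES])
    update_op = tf.assign_add(total_cm, batch_cm)
    return {'confusion_matrix': (total_cm, update_op)}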
I am new to PyTorch. I want to understand why we can't call the backward function on a variable containing a tensor of, say, size [2,2].
And if we do want to call it on a variable containing a tensor of size [2,2], we have to do that by first defining a gradient tensor and then calling the backward function on the variable with respect to the defined gradients.
From the tutorial on autograd:
If you want to compute the derivatives, you can call .backward() on a
Variable. If Variable is a scalar (i.e. it holds a one element data),
you don’t need to specify any arguments to backward(), however if it
has more elements, you need to specify a grad_output argument that is
a tensor of matching shape.
Basically, to start the chain rule you need a gradient AT THE OUTPUT to get it going. In the event the output is a scalar loss function (which it usually is; normally you are beginning the backward pass at the loss variable), it's an implied value of 1.0.
From the tutorial:
Let's backprop now. out.backward() is equivalent to doing
out.backward(torch.Tensor([1.0]))
But maybe you only want to update a subgraph (somewhere deep in the network), where the value of a Variable is a matrix of weights. Then you have to tell it where to begin. From one of their chief devs (somewhere in the links):
Yes, that's correct. We only support differentiation of scalar
functions, so if you want to start backward from a non-scalar value
you need to provide dout / dy
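To make this concrete, here is a small self-contained example of providing that gradient for a non-scalar output (shapes chosen to match the [2,2] case from the question):

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 3                     # y is [2,2], so backward() needs a seed gradient
y.backward(torch.ones(2, 2))  # dout/dy: here a tensor of ones
print(x.grad)                 # a [2,2] tensor of 3s, since dy/dx = 3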
The gradients argument:
https://discuss.pytorch.org/t/how-the-backward-works-for-torch-variable/907/8 (OK explanation)
Pytorch, what are the gradient arguments (good explanation)
http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html (tutorial)
I am loading from a saved model and I would like to be able to reset a tensorflow optimizer such as an Adam Optimizer. Ideally something like:
sess.run([tf.initialize_variables(Adamopt)])
or
sess.run([Adamopt.reset])
I have tried looking for an answer but have yet to find any way to do it. Here's what I've found, none of which addresses the issue:
https://github.com/tensorflow/tensorflow/issues/634
In TensorFlow is there any way to just initialize uninitialised variables?
Tensorflow: Using Adam optimizer
I basically just want a way to reset the "slot" variables in the Adam Optimizer.
Thanks
In TensorFlow 2.x, e.g. with the Adam optimizer, you can reset it like this:
for var in optimizer.variables():
    var.assign(tf.zeros_like(var))
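One caveat worth noting: with a tf.keras optimizer, the slot variables are created lazily, so they only exist after the optimizer has been applied at least once. A minimal self-contained example (TF 2.x assumed):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(0.1)
w = tf.Variable(1.0)
with tf.GradientTape() as tape:
    loss = w * w
optimizer.apply_gradients([(tape.gradient(loss, w), w)])

# Now the Adam slots (m, v) and the iteration counter exist and can be zeroed.
for var in optimizer.variables():
    var.assign(tf.zeros_like(var))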
This question also bothered me for quite a while. Actually it's quite easy: you just define an operation to reset the current state of an optimizer, which can be obtained from its variables() method, something like this:
optimizer = tf.train.AdamOptimizer(0.1, name='Optimizer')
reset_optimizer_op = tf.variables_initializer(optimizer.variables())
Whenever you need to reset the optimizer, run:
sess.run(reset_optimizer_op)
Official explanation of variables():
A list of variables which encode the current state of Optimizer.
Includes slot variables and additional global variables created by the optimizer in the current default graph.
E.g. for AdamOptimizer you will basically get the first and second moments (with slot names 'm' and 'v') of all trainable variables, as well as beta1_power and beta2_power.
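For illustration, here is a small runnable sketch that builds one trainable variable and prints what variables() returns (the exact names depend on your variable and optimizer names):

w = tf.Variable(tf.zeros([2, 2]), name='w')
optimizer = tf.train.AdamOptimizer(0.1, name='Optimizer')
train_op = optimizer.minimize(tf.reduce_sum(w * w))
for var in optimizer.variables():
    print(var.name)  # expect names containing 'm', 'v', 'beta1_power', 'beta2_power'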
The simplest way I found was to give the optimizer its own variable scope and then run
optimizer_scope = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                                    "scope/prefix/for/optimizer")
sess.run(tf.initialize_variables(optimizer_scope))
idea from freeze weights
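For concreteness, a sketch of what that scoping might look like (the scope name "optimizer" is an illustrative choice, and tf.variables_initializer is the newer alias for tf.initialize_variables; check the variable names in your graph if the collection comes back empty):

with tf.variable_scope("optimizer"):
    train_op = tf.train.AdamOptimizer(0.1).minimize(loss)

optimizer_scope = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                                    "optimizer")
sess.run(tf.variables_initializer(optimizer_scope))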
Building upon @EdisonLeejt's answer for TensorFlow 2.x, more generally you can first get the initial state (which may not be zero, e.g. if loaded from a checkpoint file) and then assign it, i.e.:
# Get the initial states
init_states = [var.value() for var in optimizer.variables()]

# Do the optimization
...

# Reset the optimizer state to init_states
for val, var in zip(init_states, optimizer.variables()):
    var.assign(val)
I want to implement an autoencoder (to be exact, a stacked convolutional autoencoder),
where I'd like to pretrain each layer first and then fine-tune.
So I created variables for the weights of each layer,
e.g. W_1 = tf.Variable(initial_value, name, trainable=True, etc.) for the first layer,
and I pretrained W_1 of the first layer.
Then I want to pretrain the weights of the second layer (W_2).
Here I should use W_1 for calculating the input of the second layer.
However, W_1 is trainable, so if I use W_1 directly then TensorFlow may train W_1 as well.
So I should create W_1_out that keeps the value of W_1 but is not trainable.
To be honest, I tried to modify the code from this site:
https://github.com/cmgreen210/TensorFlowDeepAutoencoder/blob/master/code/ae/autoencoder.py
At line 102 it creates a variable with the following code:
self[name_w + "_fixed"] = tf.Variable(tf.identity(self[name_w]),
name=name_w + "_fixed",
trainable=False)
However, it raises an error because it uses an uninitialized value.
What should I do to copy a variable's value but make it not trainable, so I can pretrain the next layers?
Not sure if still relevant, but I'll try anyway.
Generally, what I do in a situation like that is the following:
Populate the (default) graph according to the model you are building, e.g. for the first training step just create the first convolutional layer W1 you mention. When you train the first layer you can store the saved model once training is finished, then reload it and add the ops required for the second layer W2. Or you can just build the whole graph for W1 from scratch again directly in the code and then add the ops for W2.
If you are using the restore mechanism provided by Tensorflow, you will have the advantage that the weights for W1 are already the pre-trained ones. If you don't use the restore mechanism, you will have to set the W1 weights manually, e.g. by doing something shown in the snippet further below.
Then when you set up the training op, you can pass a list of variables as var_list to the optimizer, which explicitly tells the optimizer which parameters are updated in order to minimize the loss. If this is set to None (the default), it just uses what it can find in tf.trainable_variables(), which in turn is a collection of all tf.Variables that are trainable. Maybe check this answer, too, which basically says the same thing.
When using the var_list argument, graph collections come in handy. E.g. you could create a separate graph collection for every layer you want to train. The collection would contain the trainable variables for each layer and then you could very easily just retrieve the required collection and pass it as the var_list argument (see example below and/or the remark in the above linked documentation).
How to override the value of a variable: name is the name of the variable to be overridden, value is an array of the appropriate size and type, and sess is the session:
variable = tf.get_default_graph().get_tensor_by_name(name)
sess.run(tf.assign(variable, value))
Note that the name needs an additional :0 at the end, so e.g. if the weights of your layer are called 'weights1', the name in the example should be 'weights1:0'.
To add a tensor to a custom collection: Use something along the following lines:
tf.add_to_collection('layer1_tensors', weights1)
tf.add_to_collection('layer1_tensors', some_other_trainable_variable)
Note that the first line creates the collection because it does not yet exist and the second line adds the given tensor to the existing collection.
How to use the custom collection: Now you can do something like this:
# loss = some tensorflow op computing the loss
var_list = tf.get_collection_ref('layer1_tensors')
optim = tf.train.AdamOptimizer().minimize(loss=loss, var_list=var_list)
You could also use tf.get_collection('layer1_tensors'), which would return a copy of the collection.
Of course, if you don't wanna do any of this, you could just use trainable=False when creating the graph for all variables you don't want to be trainable, as you hinted at in your question. However, I don't like that option too much, because it requires you to pass booleans into the functions that populate your graph, which is very easily overlooked and thus error-prone. Also, even if you decide to do it like that, you would still have to restore the non-trainable variables manually.
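For reference, a minimal sketch of that last option, with the boolean threaded through a hypothetical helper (conv_weights and its arguments are illustrative, not from the linked code):

def conv_weights(name, shape, train_this_layer):
    # The boolean flag mentioned above: easy to forget to pass correctly.
    return tf.get_variable(name, shape=shape, trainable=train_this_layer)

weights1 = conv_weights("weights1", [5, 5, 1, 32], train_this_layer=False)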