No gradients provided for any variable - optimizer error - python

I'm computing as follows:
# Compute the cost
cost = tf.reduce_mean(tf.square(y - out))
minimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
Upon running minimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost), I receive this error:
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'parameters:0' shape=(15,) dtype=float32_ref>", "<tf.Variable 'weights:0' shape=(6,) dtype=float32_ref>"] and loss Tensor("Mean_1:0", dtype=float32).
Where does this path between the variables and the loss go wrong, and why?

Short version: the error occurs because your model function does not use any TensorFlow variables.
Meaning: the only tf.Variable you define is w, and it is never used in the model function. There are therefore no weights in the model that TensorFlow can optimize with respect to the loss. If you want TensorFlow to optimize the coefficients, use the variable w instead of the constant coefficients c in your model definition, and make sure they have the same size.
You are also using non-TensorFlow functions in your model definition, such as Python's append, instead of TensorFlow ops like tf.concat or tf.stack. Anything that is not a TensorFlow op breaks the gradient path through the graph, which adds to the problem.
There are a number of other problems in your code as well. For example, you define the global variable initializer and the session twice.
The underlying issue is probably that you have not yet internalized the basic structure of the low-level TensorFlow API, specifically the concepts of graph and session. You first define a graph containing the complete model, using only TensorFlow functions. Only afterwards do you start a session in which you initialize the weights and begin training.
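As a minimal sketch of that graph-then-session pattern (the linear model, shapes, and dummy data here are illustrative assumptions, not your actual code):
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

# --- Graph definition: only TF ops, with the trainable weights as a tf.Variable ---
x = tf.placeholder(tf.float32, shape=[None, 6])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(tf.zeros([6]), name='weights')      # used in the model, so it receives gradients
out = tf.reduce_sum(x * w, axis=1)                  # a simple linear model
cost = tf.reduce_mean(tf.square(y - out))
minimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
init = tf.global_variables_initializer()            # defined exactly once

# --- Session: initialize the weights, then train ---
x_batch = np.random.rand(32, 6).astype(np.float32)  # dummy data
y_batch = np.random.rand(32).astype(np.float32)
with tf.Session() as sess:
    sess.run(init)
    sess.run(minimizer, feed_dict={x: x_batch, y: y_batch})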

Related

Calling .backward() function for two different neural networks but getting retain_graph=True error

I have an Actor-Critic setup where the Actor and the Critic are each their own class, each with its own neural network and .forward() function. I then create an object of each of these classes in a larger Model class. My setup is as follows:
self.actor = Actor().to(device)
self.actor_opt = optim.Adam(self.actor.parameters(), lr=lr)
self.critic = Critic().to(device)
self.critic_opt = optim.Adam(self.critic.parameters(), lr=lr)
I then calculate two different loss functions and want to update each neural network separately. For the critic:
loss_critic = F.smooth_l1_loss(value, expected)
self.critic_opt.zero_grad()
loss_critic.backward()
self.critic_opt.step()
and for the actor:
loss_actor = -self.critic(state, action)
self.actor_opt.zero_grad()
loss_actor.backward()
self.actor_opt.step()
However, when doing this, I get the following error:
RuntimeError: Trying to backward through the graph a second time, but the saved
intermediate results have already been freed. Specify retain_graph=True when
calling backward the first time.
When reading up on this, I understood that retain_graph=True is only needed when calling backward twice on the same graph, and that in most cases it is not good to set it to True because I will run out of GPU memory. Moreover, when I comment out one of the .backward() calls, the error goes away, which leads me to believe that the code somehow treats both backward() calls as operating on the same neural network, even though I think I am doing it separately. What could be the reason for this? Is there a way to specify which neural network I am calling the backward function on?
Edit:
For reference, the optimize() function in the code at https://github.com/wudongming97/PyTorch-DDPG/blob/master/train.py uses backward() twice with no issue (I've cloned the repo and tested it). I'd like my code to operate similarly, backpropagating through the critic and the actor separately.
Yes, you shouldn't do it like that. What you should do instead is propagate through the relevant parts of the graph.
What the graph contains
Right now the graph contains both the actor and the critic. If the backward computations pass through the same part of the graph twice (say, twice through the actor), this error is raised.
And they will, because you join the actor and the critic in the loss value (this line: loss_actor = -self.critic(state, action)).
Using different optimizers changes nothing here, as this is a backward problem: optimizers simply apply the already-calculated gradients to the models.
Trying to fix it
This is how one would fix it in GANs, but it does not work in this case; see the Actual solution section below. Read on only if you are curious about the topic.
If part of a neural network (critic in this case) does not take part in the current optimization step, it should be treated as a constant (and vice versa).
To do that, you could disable gradient tracking using the torch.no_grad context manager and set the critic to eval mode, something along these lines:
self.critic.eval()
with torch.no_grad():
    loss_actor = -self.critic(state, action)
    ...
But here is a problem:
We are turning off gradient recording for action and breaking the graph!
Hence this is not a viable solution.
Actual solution
It is much simpler than you think; one can see the same pattern in PyTorch's repository as well:
Do not backpropagate immediately after each loss. Instead:
Calculate all losses (for both critic and actor)
Sum them together
Call zero_grad for both optimizers
Backpropagate once with this summed value
Call critic_optimizer.step() and actor_optimizer.step() at this point
Something along those lines:
# Clear the gradients accumulated in both networks
self.critic_opt.zero_grad()
self.actor_opt.zero_grad()
# Compute both losses first, without calling backward in between
loss_critic = F.smooth_l1_loss(value, expected)
loss_actor = -self.critic(state, action)
# A single backward pass through the summed loss walks the graph once
total_loss = loss_actor + loss_critic
total_loss.backward()
# Each optimizer only updates the parameters it was constructed with
self.critic_opt.step()
self.actor_opt.step()
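This works because the summed loss walks the whole graph exactly once, so no intermediate results are consumed twice and retain_graph=True is never needed. The two step() calls still update the networks independently, since each optimizer only touches the parameters it was constructed with.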

Visualizing custom loss in double-head model

Using an A2C agent from this article, how do I get the numerical values of value_loss, policy_loss and entropy_loss while the weights are being updated?
The model I'm using is double-headed: both heads share the same trunk. The policy head's output shape is [number of actions, batch size] and the value head's shape is [1, batch_size]. Compiling this model returns a size-incompatibility error when these loss functions are given as metrics:
self.model.compile(optimizer=self.optimizer,
                   metrics=[self._logits_loss, self._value_loss],
                   loss=[self._logits_loss, self._value_loss])
Both self._value_loss and self._policy_loss are executed as graph functions, meaning that all variables inside them are only pointers to graph nodes. I found some examples where Tensor objects are evaluated with eval() to get the values out of the nodes, but I don't understand them: to eval() a Tensor you need to give it a Session, and in TensorFlow 2.x Sessions are deprecated.
Another lead: when calling train_on_batch() from the Keras Model API to train the model, the method returns losses. I don't understand why, but the only losses it returns are from the policy head. The losses from that head are calculated as policy_loss - entropy_loss, and my goal is to get all three losses separately so I can visualize them in a graph.
Any help is welcome, I'm stuck.
I found the answer to my problem. In Keras, the built-in metrics functionality provides an interface for measuring the performance and losses of the model, whether custom or standard.
When compiling a model as follows:
self.model.compile(optimizer=ko.RMSprop(lr=lr),
                   metrics=dict(output_1=self._entropy_loss),
                   loss=dict(output_1=self._logits_loss, output_2=self._value_loss))
... self.model.train_on_batch([...]) then returns a list of [total_loss, logits_loss, value_loss, entropy_loss]. The value of policy_loss can be recovered as logits_loss + entropy_loss. Beware that this solution results in calling self._entropy_loss() twice.
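As a sketch of how to unpack those values (the batch inputs here are hypothetical placeholders; only the return ordering comes from the compile() call above):
losses = self.model.train_on_batch(x_batch, [logits_targets, value_targets])
total_loss, logits_loss, value_loss, entropy_loss = losses
policy_loss = logits_loss + entropy_loss  # recover the policy head's loss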

Train two consecutive models in tensorflow

I am trying to build a model in TensorFlow that uses two consecutive models. Unfortunately, I can't include them within one model. The first model is basically an encoder; the second returns the value I need.
out = Model_a(image_input)
value = Model_b(out)
loss = f(value)
I can train Model_b using the given loss, but I would then need the gradients of the first layer (of Model_b) with respect to the loss in order to proceed with the gradient calculation in Model_a. Furthermore, I would somehow need a function that calculates gradients based on those gradients, rather than on a loss function. Does anyone know whether TensorFlow already has such functionality, or has anyone tackled similar problems?
Cheers
I found a working solution, for anyone who has similar problems. Using TensorFlow 2.0 and Keras in eager mode (with GradientTape), one can construct any loss function as desired, even one spanning consecutive models. Importantly, the predict function will not work here; the models need to be called directly.
The gradients can then be calculated for each model with respect to that loss function, which seems to work so far. It is important that the models themselves are included in the loss computation, not a copy of their output, or at least that any copy is created within the tape context. An example is found below:
optimizer = tf.keras.optimizers.Adam(lr=0.1)
# persistent=True because tape.gradient() is called twice on the same tape
with tf.GradientTape(persistent=True) as tape:
    error = model2(model1(x)) - y                 # call the models directly, inside the tape
    loss_value = tf.reduce_mean(tf.square(error))
gradients1 = tape.gradient(loss_value, model1.variables)
gradients2 = tape.gradient(loss_value, model2.variables)
optimizer.apply_gradients(zip(gradients1, model1.variables))
optimizer.apply_gradients(zip(gradients2, model2.variables))
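One small follow-up, since the tape is persistent: the TensorFlow documentation recommends dropping the reference once the gradients have been computed, so the resources held by the tape are released:
del tape  # free the resources held by the persistent tape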
If anyone finds a more efficient or prettier solution, I would be happy if they shared it.
Cheers

In PyTorch, how do I make certain module `Parameters` static during training?

Context:
In PyTorch, any Parameter is a special kind of Tensor. A Parameter is automatically registered and returned by a module's parameters() method when it is assigned as a module attribute.
During training, I will pass m.parameters() to the Optimizer instance so they can be updated.
Question: For a built-in PyTorch module, how do I prevent certain parameters from being modified by the optimizer?
import torch.nn as nn

s = nn.Sequential(
    nn.Linear(2, 2),
    nn.Linear(2, 3),  # I want this one's .weight and .bias to be constant
    nn.Linear(3, 1)
)
Can I make it so they don't appear in s.parameters()?
Can I make the parameters read-only so any attempted changes are ignored?
Parameters can be made static by setting their attribute requires_grad=False.
In my example case:
params = list(s.parameters()) # .parameters() returns a generator
# Each linear layer has 2 parameters (.weight and .bias),
# Skipping first layer's parameters (indices 0, 1):
params[2].requires_grad = False
params[3].requires_grad = False
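Equivalently (a small sketch relying on the fact that nn.Sequential supports integer indexing), you can address the middle layer directly instead of counting parameter indices:
for p in s[1].parameters():  # s[1] is the middle nn.Linear from the question
    p.requires_grad = False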
When a mix of requires_grad=True and requires_grad=False tensors is used in a calculation, the result inherits requires_grad=True.
According to the PyTorch autograd mechanics documentation:
If there’s a single input to an operation that requires gradient, its output will also require gradient. Conversely, only if all inputs don’t require gradient, the output also won’t require it. Backward computation is never performed in the subgraphs, where all Tensors didn’t require gradients.
My concern was that if I disabled gradient tracking for the middle layer, the first layer would not receive backpropagated gradients. That understanding was faulty.
Edge case: if I disable gradients for all parameters in a module and try to train, the backward pass will raise an exception, because there is not a single tensor left that requires gradients.
This edge case is why I was getting errors. I was testing requires_grad=False on the parameters of a module with a single nn.Linear layer, which meant I had disabled tracking for all parameters, and the backward pass complained.
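A related pattern (my addition, not part of the original answer) is to hand the optimizer only the parameters that still require gradients, which sidesteps passing frozen parameters to the optimizer at all:
import torch.optim as optim

trainable = [p for p in s.parameters() if p.requires_grad]
opt = optim.SGD(trainable, lr=0.01)  # the frozen middle layer is simply absent here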

TensorFlow - How to minimize function of one variable?

I've been given a fully trained model by another researcher that has its inputs as placeholders. Regarding it as a function f(x), I would like to find the x that minimizes my distance metric (loss function) dist(x, f(x)). This could be something like the Euclidean distance between the two points.
I tried to use TensorFlow's built-in optimizers. The issue is that tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=[input_placeholder]) fails, complaining that input_placeholder isn't of a supported type. Thus, I cannot get gradients for my input.
How can I optimize a function in TensorFlow when the inputs have to be specified in this way? Unfortunately, these placeholders are not passed through a Variable first, and I have to treat the model as a black box.
Using the Keras functional API detailed in this question, I created a dense layer with no bias to sit right before the model I was given. Holding its input fixed as a constant all-ones vector, I optimized the joined model using only the Variable in the dense layer, so the optimal vector falls out as the output of that layer.
All TensorFlow Optimizer subclasses allow you to minimize while modifying only a particular set of Variables, and I got that set out of Keras fairly simply.
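A minimal sketch of that trick in TF 1.x style (frozen_model stands in for the given black-box model and is assumed to be callable on a tensor; input_dim and the exact loss are illustrative):
import tensorflow as tf

input_dim = 10                                      # dimensionality of the model's input (assumed)
dense = tf.keras.layers.Dense(input_dim, use_bias=False)
x = dense(tf.ones([1, 1]))                          # output equals the layer's kernel: the vector being optimized
f_x = frozen_model(x)                               # hypothetical call into the frozen model
loss = tf.reduce_sum(tf.square(x - f_x))            # dist(x, f(x)) as squared Euclidean distance
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=dense.trainable_weights)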
