Optimizer.apply_gradients creating variables in tf.function

Optimizer.apply_gradients creating variables in tf.function - python

I have created a neural style transfer with Eager Execution, but it does not work when I try to turn it into a tf.function.
The error message says:
ValueError: tf.function only supports singleton tf.Variables created on the first call. Make sure the tf.Variable is only created once or created outside tf.function. See https://www.tensorflow.org/guide/function#creating_tfvariables for more information.
However, no variable is being created inside the function. Here is a simplified version of the code, which is just a neural style transfer with one image (the goal is to make the generated image look exactly like the content image):
import tensorflow as tf
import numpy as np
from PIL import Image
#Get and process the images
image = np.array(Image.open("frame7766.jpg")).reshape(1, 720, 1280, 3)/255
content_image = tf.convert_to_tensor(image, dtype = tf.float32)
# variable is defined outside of tf.function
generated_image = tf.Variable(np.random.rand(1, 720, 1280, 3)/2 + content_image/2, dtype = tf.float32)
def clip_0_1(image): # keeps image values between 0 and 1
return tf.clip_by_value(image, clip_value_min=0, clip_value_max=1)
# tf.function
def train_step(generated_image, content_image): #turn generated image into tf variable
optimizer = tf.keras.optimizers.Adam(learning_rate = 0.01)
with tf.GradientTape() as tape:
cost = tf.reduce_sum(tf.square(generated_image - content_image))
grad = tape.gradient(cost, generated_image)
optimizer.apply_gradients([(grad, generated_image)]) # More information below
generated_image.assign(clip_0_1(generated_image))
return generated_image
generated_image = train_step(generated_image, content_image)
The error message points to the line
optimizer.apply_gradients([(grad, generated_image)])
I have tried to change the input of optimizer.apply_gradients to zip([grad], [generated_image]), and every combination of lists and tuples I can think of, but the error still remains. I have also looked through https://www.tensorflow.org/guide/function#creating_tfvariables and https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer, but neither of them shows examples where the variable is not explicitly defined.
The only conclusion that I can come to is that one of my commands (most likely optimizer.apply_gradients) creates a variable because of an issue in my earlier code. Is that correct?

The problem is that Adam creates additional variables to store the momentum terms for the model variables. By creating a new optimizer every training step, these variables are also re-created, resulting in the error message.
Note that it would also be a bad idea to do this without tf.function (which would not throw an error), precisely because the momentum terms would be re-initialized at every step, instead of being accumulated properly as they should be. This is why you should create the optimizer outside the training step, one time, at the beginning of training.

Related

Add additional scalar parameter to pytorch model gives runtimeerror

I'm trying to add a scalar parameter to my model (code too complex to attach), but it is effectively like:
class WholeModel:
def __init__(...):
self.new_parameter = Parameter(torch.scalar_tensor(0.1, requires_grad=True))
self.model = self.make_model()
def make_model(self):
d = distribution() # returns a Distribution which is a Module
d = transform_distribution(d, self.new_parameter)
d.register_parameter(name='new', param=self.new_parameter)
return d
However, I run into this error RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
If I change self.new_parameter = Parameter(torch.scalar_tensor(0.1)) to self.new_parameter = torch.scalar_tensor(0.1) and remove the register_parameter, then it compiles and runs (but then obviously its not then learning the parameter).
I've also tried using a tensor rather than a scalar_tensor but this also doesn't work. The error occurs with/without requires_grad.
Any ideas? It really is just a simple addition to a blackbox model.
Thanks

Tensorflow variable initialization inside loss function

I have an object detection model implemented in tensorflow.keras (version 1.15). I am trying to implement a modified (hybrid) loss function in my model. Basically I need a few variables defined in my loss function because I am processing the y_true and y_pred provided in my classification loss function (a focal loss to be exact). So, I naturally resided in implementing my ops inside the loss function.
I have defined a WrapperClass to initialize my variables:
class LossWrapper(object):
def __init__(self, num_centers):
...
self.total_loss = tf.Variable(0, dtype=tf.float32)
def loss_funcion(self, y_true, y_pred):
...
self.total_loss = self.total_loss + ...
I amd getting an error:
tensorflow.python.framework.errors_impl.FailedPreconditionError:
Attempting to use uninitialized value Variable
using self.total_loss_cosine = tf.zeros(1)[0] I am getting (a similar message):
tensorflow.python.framework.errors_impl.InvalidArgumentError:
Retval[0] does not have value
I came to the conclusion that no matter how I define my variable or where I define it (I have tried inside the __init__ function or in the main function body) I am getting an error stating about attempting to use some uninitialized Variable.
I am starting to think that I cannot initialize variables inside my loss function and probably I should implement them as a typical block outside it. Is this the case? Is the loss functions basically separated from the rest of the network so the typical initialization does not work as expected?
Some remarks:
The loss function seem to work flawless in eager execution mode where the initialization issue obviously does not exist.
In eager execution mode the type of y_true seem to be np.array and not tf.Tensor (or tf.EagerTensor at least). Does this mean that actually y_true and y_pred are propagated as numpy array in general, meaning, that this part is actually detached from the network? (I have tested this on eager execution only though)

Using tf.contrib.opt.ScipyOptimizerInterface with tf.keras.layers, loss not changing

I want to use the external optimizer interface within tensorflow, to use newton optimizers, as tf.train only has first order gradient descent optimizers. At the same time, i want to build my network using tf.keras.layers, as it is way easier than using tf.Variables when building large, complex networks. I will show my issue with the following, simple 1D linear regression example:
import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np
#generate data
no = 100
data_x = np.linspace(0,1,no)
data_y = 2 * data_x + 2 + np.random.uniform(-0.5,0.5,no)
data_y = data_y.reshape(no,1)
data_x = data_x.reshape(no,1)
# Make model using keras layers and train
x = tf.placeholder(dtype=tf.float32, shape=[None,1])
y = tf.placeholder(dtype=tf.float32, shape=[None,1])
output = tf.keras.layers.Dense(1, activation=None)(x)
loss = tf.losses.mean_squared_error(data_y, output)
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, method="L-BFGS-B")
sess = K.get_session()
sess.run(tf.global_variables_initializer())
tf_dict = {x : data_x, y : data_y}
optimizer.minimize(sess, feed_dict = tf_dict, fetches=[loss], loss_callback=lambda x: print("Loss:", x))
When running this, the loss just does not change at all. When using any other optimizer from tf.train, it works fine. Also, when using tf.layers.Dense() instead of tf.keras.layers.Dense() it does work using the ScipyOptimizerInterface. So really the question is what is the difference between tf.keras.layers.Dense() and tf.layers.Dense(). I saw that the Variables created by tf.layers.Dense() are of type tf.float32_ref while the Variables created by tf.keras.layers.Dense() are of type tf.float32. As far as I now, _ref indicates that this tensor is mutable. So maybe that's the issue? But then again, any other optimizer from tf.train works fine with keras layers.
Thanks

After a lot of digging I was able to find a possible explanation.
ScipyOptimizerInterface uses feed_dicts to simulate the updates of your variables during the optimization process. It only does an assign operation at the very end. In contrast, tf.train optimizers always do assign operations. The code of ScipyOptimizerInterface is not that complex so you can verify this easily.
Now the problem is that assigining variables with feed_dict is working mostly by accident. Here is a link where I learnt about this. In other words, assigning variables via feed dict, which is what ScipyOptimizerInterface does, is a hacky way of doing updates.
Now this hack mostly works, except when it does not. tf.keras.layers.Dense uses ResourceVariables to model the weights of the model. This is an improved version of simple Variables that has cleaner read/write semantics. The problem is that under the new semantics the feed dict update happens after the loss calculation. The link above gives some explanations.
Now tf.layers is currently a thin wrapper around tf.keras.layer so I am not sure why it would work. Maybe there is some compatibility check somewhere in the code.
The solutions to adress this are somewhat simple.
Either avoid using components that use ResourceVariables. This can be kind of difficult.
Patch ScipyOptimizerInterface to do assignments for variables always. This is relatively easy since all the required code is in one file.
There was some effort to make the interface work with eager (that by default uses the ResourceVariables). Check out this link

I think the problem is with the line
output = tf.keras.layers.Dense(1, activation=None)(x)
In this format output is not a layer but rather the output of a layer, which might be preventing the wrapper from collecting the weights and biases of the layer and feed them to the optimizer. Try to write it in two lines e.g.
output = tf.keras.layers.Dense(1, activation=None)
res = output(x)
If you want to keep the original format then you might have to manually collect all trainables and feed them to the optimizer via the var_list option
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, var_list = [Trainables], method="L-BFGS-B")
Hope this helps.

How to use a numpy function as the loss function in PyTorch and avoid getting errors during run time?

For my task, I do not need to compute gradients. I am simply replacing nn.L1Loss with a numpy function (corrcoef) in my loss evaluation but I get the following error:
RuntimeError: Can’t call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
I couldn’t figure out how exactly I should detach the graph (I tried torch.Tensor.detach(np.corrcoef(x, y)) but I still get the same error. I eventually wrapped everything using with torch.no_grad as follow:
with torch.no_grad():
predFeats = self.forward(x)
targetFeats = self.forward(target)
loss = torch.from_numpy(np.corrcoef(predFeats.cpu().numpy().astype(np.float32), targetFeats.cpu().numpy().astype(np.float32))[1][1])
But this time I get the following error:
TypeError: expected np.ndarray (got numpy.float64)
I wonder, what am I doing wrong?

TL;DR
with torch.no_grad():
predFeats = self(x)
targetFeats = self(target)
loss = torch.tensor(np.corrcoef(predFeats.cpu().numpy(),
targetFeats.cpu().numpy())[1][1]).float()
You would avoid the first RuntimeError by detaching the tensors (predFeats and targetFeats) from the computational graph.
i.e. Getting a copy of the tensor data without the gradients and the gradient function (grad_fn).
So, instead of
torch.Tensor.detach(np.corrcoef(x.numpy(), y.numpy())) # Detaches a newly created tensor!
# x and y still may have gradients. Hence the first error.
which does nothing, do
# Detaches x and y properly
torch.Tensor(np.corrcoef(x.detach().numpy(), y.detach().numpy()))
But let's not bother with all the detachments.
Like you rightfully fixed, it, let's disable the gradients.
torch.no_grad()
Now, compute the features.
predFeats = self(x) # No need for the explicit .forward() call
targetFeats = self(target)
I found it helpful to break your last line up.
loss = np.corrcoef(predFeats.numpy(), targetFeats.numpy()) # We don't need to detach
# Notice that we don't need to cast the arguments to fp32
# since the `corrcoef` casts them to fp64 anyway.
print(loss.shape, loss.dtype) # A 2-dimensional fp64 matrix
loss = loss[1][1]
print(type(loss)) # Output: numpy.float64
# Loss now just a simple fp64 number
And that is the problem!
Because, when we do
loss = torch.from_numpy(loss)
we're passing in a number (numpy.float64) while it expects a numpy tensor (np.ndarray).
If you're using PyTorch 0.4 or up, there's inbuilt support for scalars.
Simply replace the from_numpy() method with the universal tensor() creation method.
loss = torch.tensor(loss)
P.S. You might also want to look at setting rowvar=False in corrcoef since the rows in PyTorch tensors usually represent the observations.

Tensorflow creates a new set of already existing variables each session run?

I'm finally using my LSTM model to predict things. However, I've run into a new problem that I don't quite understand. If I try to predict something using
sess.run(pred, feed_dict={x: xs})
It works great for the first prediction, but any subsequent predictions throw the error:
ValueError: Variable weight_def/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?
Now, there are a TON of topics on this - and most of them are easily solved by doing what it asks - just create a variable scope around the offending line and make variable reuse true. Now, if I do that I get the following error:
ValueError: Variable rnn_def/RNN/BasicLSTMCell/Linear/Matrix does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
This is causing me quite the headache. I've read the Tensorflow Variable Sharing documentation over and over, and I can't for the life of me figure out what I am doing wrong. Here the offending lines
with tf.variable_scope("rnn_def"):
outputs, states = rnn.rnn(self.__lstm_cell,
self.__x,
dtype=tf.float32)
self.__outputs = outputs
self.__states = states
I have this code nested in a larger class that just contains the remainder of the graph. To train it, I just call my "train" method over and over again. Which seems to work fine, the problem ends up being prediction.
So my question is two fold:
Why do I require some sort of variable sharing only after the first prediction but the first call doesn't fail? What do I need to fix this code so I can predict more than once without causing an error?
When is variable sharing useful, and why is Tensorflow creating new variables each time I run it? How can I prevent this (do I want to prevent it?)?
Thank you!

Add a print statement to that block of code. I suspect it is being called multiple times. Or maybe you are creating multiple instances of the class in which each class should have its own scope name.
To answer your questions.
Why do I require some sort of variable sharing only after the first
prediction but the first call doesn't fail? What do I need to fix this
code so I can predict more than once without causing an error?
No you don't. That block of code creating the RNN is probably being accidentally called multiple times.
When is variable sharing useful, and why is Tensorflow creating new
variables each time I run it? How can I prevent this (do I want to
prevent it?)?
It is useful in the following case where I have different input sources for part of my graph depending on whether is is training or predicting.
x_train, h_train = ops.sharpen_cell(x_train, h_train, 2, n_features, conv_size, n_convs, conv_activation, 'upsampler')
self.cost += tf.reduce_mean((x_train - y_train) ** 2)
level_scope.reuse_variables()
x_gen, h_gen = ops.sharpen_cell(x_gen, h_gen, 2, n_features, conv_size, n_convs, conv_activation, 'upsampler')
self.generator_outputs.append(tf.clip_by_value(x_gen, -1, 1))
In this example is reuses the variables for the generator which were trained with the trainer. It is also useful if you want to unroll and RNN in a loop. Such as in this case...
y = #initial value
state = #initial state
rnn = #some sort of RNN cell
with tf.variable_scope("rnn") as scope:
for t in range(10):
y, state = rnn(y, state)
scope.reuse_variabled()
In this case it will reuse the rnn weights between time steps which is the desired behavior for an RNN.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.