I am trying to optimize a loss function (defined using evidence lower bound) with tf.train.AdamOptimizer.minimize() on Tensorflow version 1.15.2 with eager execution enabled. I tried the following:
learning_rate = 0.01
optim = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optim.minimize(loss)
and got the following : RuntimeError: "loss" passed to Optimizer.compute_gradients should be a function when eager execution is enabled.
This works fine if I disable eager execution but since I need to save a tensorflow variable as a numpy array so I need eager execution enabled. The documentation mentions that when eager execution is enabled, the loss must be a callable. So the loss function should be defined in a way that it takes no inputs but gives out loss. I am not exactly sure how do I achieve such a thing.
I tried train_op = optim.minimize(lambda: loss) but got ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables [] and loss <function <lambda> at 0x7f3c67a93b00>
Related
I have an object detection model implemented in tensorflow.keras (version 1.15). I am trying to implement a modified (hybrid) loss function in my model. Basically I need a few variables defined in my loss function because I am processing the y_true and y_pred provided in my classification loss function (a focal loss to be exact). So, I naturally resided in implementing my ops inside the loss function.
I have defined a WrapperClass to initialize my variables:
class LossWrapper(object):
def __init__(self, num_centers):
...
self.total_loss = tf.Variable(0, dtype=tf.float32)
def loss_funcion(self, y_true, y_pred):
...
self.total_loss = self.total_loss + ...
I amd getting an error:
tensorflow.python.framework.errors_impl.FailedPreconditionError:
Attempting to use uninitialized value Variable
using self.total_loss_cosine = tf.zeros(1)[0] I am getting (a similar message):
tensorflow.python.framework.errors_impl.InvalidArgumentError:
Retval[0] does not have value
I came to the conclusion that no matter how I define my variable or where I define it (I have tried inside the __init__ function or in the main function body) I am getting an error stating about attempting to use some uninitialized Variable.
I am starting to think that I cannot initialize variables inside my loss function and probably I should implement them as a typical block outside it. Is this the case? Is the loss functions basically separated from the rest of the network so the typical initialization does not work as expected?
Some remarks:
The loss function seem to work flawless in eager execution mode where the initialization issue obviously does not exist.
In eager execution mode the type of y_true seem to be np.array and not tf.Tensor (or tf.EagerTensor at least). Does this mean that actually y_true and y_pred are propagated as numpy array in general, meaning, that this part is actually detached from the network? (I have tested this on eager execution only though)
For my task, I do not need to compute gradients. I am simply replacing nn.L1Loss with a numpy function (corrcoef) in my loss evaluation but I get the following error:
RuntimeError: Can’t call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
I couldn’t figure out how exactly I should detach the graph (I tried torch.Tensor.detach(np.corrcoef(x, y)) but I still get the same error. I eventually wrapped everything using with torch.no_grad as follow:
with torch.no_grad():
predFeats = self.forward(x)
targetFeats = self.forward(target)
loss = torch.from_numpy(np.corrcoef(predFeats.cpu().numpy().astype(np.float32), targetFeats.cpu().numpy().astype(np.float32))[1][1])
But this time I get the following error:
TypeError: expected np.ndarray (got numpy.float64)
I wonder, what am I doing wrong?
TL;DR
with torch.no_grad():
predFeats = self(x)
targetFeats = self(target)
loss = torch.tensor(np.corrcoef(predFeats.cpu().numpy(),
targetFeats.cpu().numpy())[1][1]).float()
You would avoid the first RuntimeError by detaching the tensors (predFeats and targetFeats) from the computational graph.
i.e. Getting a copy of the tensor data without the gradients and the gradient function (grad_fn).
So, instead of
torch.Tensor.detach(np.corrcoef(x.numpy(), y.numpy())) # Detaches a newly created tensor!
# x and y still may have gradients. Hence the first error.
which does nothing, do
# Detaches x and y properly
torch.Tensor(np.corrcoef(x.detach().numpy(), y.detach().numpy()))
But let's not bother with all the detachments.
Like you rightfully fixed, it, let's disable the gradients.
torch.no_grad()
Now, compute the features.
predFeats = self(x) # No need for the explicit .forward() call
targetFeats = self(target)
I found it helpful to break your last line up.
loss = np.corrcoef(predFeats.numpy(), targetFeats.numpy()) # We don't need to detach
# Notice that we don't need to cast the arguments to fp32
# since the `corrcoef` casts them to fp64 anyway.
print(loss.shape, loss.dtype) # A 2-dimensional fp64 matrix
loss = loss[1][1]
print(type(loss)) # Output: numpy.float64
# Loss now just a simple fp64 number
And that is the problem!
Because, when we do
loss = torch.from_numpy(loss)
we're passing in a number (numpy.float64) while it expects a numpy tensor (np.ndarray).
If you're using PyTorch 0.4 or up, there's inbuilt support for scalars.
Simply replace the from_numpy() method with the universal tensor() creation method.
loss = torch.tensor(loss)
P.S. You might also want to look at setting rowvar=False in corrcoef since the rows in PyTorch tensors usually represent the observations.
I have some TensorFlow code in a custom loss function.
I'm using tf.Print(node, [debug1, debug2], "print my debugs: ")
It works fine but TF says tf.Print is depricated and will be removed once i update TensorFlow and that i should be using tf.**p**rint(), with small p.
I've tried using tf.print the same way i would tf.Print() but it's not working. Once i fit my model in Keras, i get an error. unlike tf.Print, tf.print seems to take in anything **kwargs, so what am i suppose to give it? and unlike tf.Print it do not seem to return something that i can inject into the computational graph.
It's really difficult to search because all the information online is about tf.Print().
Can someone explain how to use tf.print()?
Edit: Example code
def custom_loss(y_true, y_pred):
loss = K.mean(...)
print_no_op = tf.Print(loss, [loss, y_true, y_true.shape], "Debug output: ")
return print_no_op
model.compile(loss=custom_loss)
Both the documentation of tf.print and tf.Print mention that tf.print returns an operation with no output, so it cannot be evaluated to any value. The syntax of tf.print is meant to be more similar to Python's builtin print. In your case, you could use it as follows:
def custom_loss(y_true, y_pred):
loss = K.mean(...)
print_op = tf.print("Debug output:", loss, y_true, y_true.shape)
with tf.control_dependencies([print_op]):
return K.identity(loss)
Here K.identity creates a new tensor identical to loss but with a control dependency to print_op, so evaluating it will force executing the printing operation. Note that Keras also offers K.print_tensor, although it is less flexible than tf.print.
Just a little addition to jdehesa's excellent answer:
tf.tuple can be used to couple the print operation with another operation, which will then run with that operation whichever session executes the graph. Here's how that is done:
print_op = tf.print(something_you_want_to_print)
some_tensor_list = tf.tuple([some_tensor], control_inputs=[print_op])
# Use some_tensor_list[0] instead of any_tensor below.
I need to modify the weight values during the execution, more specifically between the compute_gradients() and apply_gradients() functions. I was able to modify the gradients themselves, but i could not change the weights.
I'm using the tutorial for the Iris NN in tensorflow:
https://github.com/tensorflow/models/blob/master/samples/core/get_started/custom_estimator.py , the only difference being that i changed the minimize() function for the compute_gradients() and the apply_gradients() function.
grads_and_vars = optimizer.compute_gradients(loss)
// some way to change the weights
train_op = optimizer.apply_gradients(grads_and_vars, global_step=tf.train.get_global_step())
Thanks in advance.
My best guess is that you are looking for tf.assign (from here) to assign values to your Variable tensors.
According to the docs:
Update 'ref' by assigning 'value' to it.
This operation outputs a Tensor that holds the new value of 'ref' after the value has been assigned. This makes it easier to chain operations that need to use the reset value.
I need to implement a custom objective function for Keras where i need an additional tensorflow placeholder for computation. In tensorflow, i have it as following,
pre_cost1 = tf.multiply((self.input_R - self.Decoder) , self.input_mask_R)
cost1 = tf.square(self.l2_norm(pre_cost1))
where input_mask_R is the tensorflow placeholder. input_R and Decoder are the placeholders corresponding to y_true and y_pred for Keras loss function respectively. I have the Keras loss function implemented as,
def custom_objective(y_true, y_pred):
pre_cost1 = tf.multiply((y_true - y_pred))
cost1 = tf.square(l2_norm(pre_cost1))
return cost1
I need to add the additional information for input mask in the loss function for keras. (It needs to be tensorflow placeholder since its a mask for the input which is different for each row of the input data).
Use the keras backend:
import keras.backend as K
Most functions for tensors are there, such as:
input_mask_R = K.placeholder(shape=(yourshape))
But maybe, since you want a predefined mask, what you need is:
input_mask_R = K.constant(arrayWithValues, shape=(yourshape))
And you can actually multiply and square also with K.multiply and K.square. That way, if you ever think of changing the backend, everything will be ok. (Also I'm not sure if Keras will handle direct calls to tensorflow functions.....)
See documentation: https://keras.io/backend/