I have an object detection model implemented in tensorflow.keras (version 1.15). I am trying to implement a modified (hybrid) loss function in my model. I need a few variables defined in my loss function because I process the y_true and y_pred passed to my classification loss (a focal loss, to be exact). So I naturally resorted to implementing my ops inside the loss function.
I have defined a WrapperClass to initialize my variables:
class LossWrapper(object):
    def __init__(self, num_centers):
        ...
        self.total_loss = tf.Variable(0, dtype=tf.float32)

    def loss_function(self, y_true, y_pred):
        ...
        self.total_loss = self.total_loss + ...

I am getting an error:
tensorflow.python.framework.errors_impl.FailedPreconditionError:
Attempting to use uninitialized value Variable
Using self.total_loss_cosine = tf.zeros(1)[0] instead, I get a similar message:
tensorflow.python.framework.errors_impl.InvalidArgumentError:
Retval[0] does not have value
I came to the conclusion that no matter how or where I define my variable (I have tried inside the __init__ function and in the main function body), I get an error about attempting to use an uninitialized Variable.
I am starting to think that I cannot initialize variables inside my loss function and should probably implement them as a typical block outside it. Is this the case? Are loss functions basically separated from the rest of the network, so that the typical initialization does not work as expected?
Some remarks:
The loss function seems to work flawlessly in eager execution mode, where the initialization issue obviously does not exist.
In eager execution mode the type of y_true seems to be np.array and not tf.Tensor (or at least tf.EagerTensor). Does this mean that y_true and y_pred are propagated as NumPy arrays in general, meaning that this part is actually detached from the network? (I have tested this in eager execution only, though.)
Related
I have created a neural style transfer with Eager Execution, but it does not work when I try to turn it into a tf.function.
The error message says:
ValueError: tf.function only supports singleton tf.Variables created on the first call. Make sure the tf.Variable is only created once or created outside tf.function. See https://www.tensorflow.org/guide/function#creating_tfvariables for more information.
However, no variable is being created inside the function. Here is a simplified version of the code, which is just a neural style transfer with one image (the goal is to make the generated image look exactly like the content image):
import tensorflow as tf
import numpy as np
from PIL import Image

# Get and process the images
image = np.array(Image.open("frame7766.jpg")).reshape(1, 720, 1280, 3)/255
content_image = tf.convert_to_tensor(image, dtype = tf.float32)

# variable is defined outside of tf.function
generated_image = tf.Variable(np.random.rand(1, 720, 1280, 3)/2 + content_image/2, dtype = tf.float32)

def clip_0_1(image): # keeps image values between 0 and 1
    return tf.clip_by_value(image, clip_value_min=0, clip_value_max=1)

@tf.function
def train_step(generated_image, content_image): #turn generated image into tf variable
    optimizer = tf.keras.optimizers.Adam(learning_rate = 0.01)
    with tf.GradientTape() as tape:
        cost = tf.reduce_sum(tf.square(generated_image - content_image))
    grad = tape.gradient(cost, generated_image)
    optimizer.apply_gradients([(grad, generated_image)]) # More information below
    generated_image.assign(clip_0_1(generated_image))
    return generated_image

generated_image = train_step(generated_image, content_image)
The error message points to the line
optimizer.apply_gradients([(grad, generated_image)])
I have tried to change the input of optimizer.apply_gradients to zip([grad], [generated_image]), and every combination of lists and tuples I can think of, but the error still remains. I have also looked through https://www.tensorflow.org/guide/function#creating_tfvariables and https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer, but neither of them shows examples where the variable is not explicitly defined.
The only conclusion that I can come to is that one of my commands (most likely optimizer.apply_gradients) creates a variable because of an issue in my earlier code. Is that correct?
The problem is that Adam creates additional variables to store the momentum terms for the model variables. By creating a new optimizer every training step, these variables are also re-created, resulting in the error message.
Note that it would also be a bad idea to do this without tf.function (which would not throw an error), precisely because the momentum terms would be re-initialized at every step, instead of being accumulated properly as they should be. This is why you should create the optimizer outside the training step, one time, at the beginning of training.
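To see why the optimizer has to outlive the training step, here is a framework-free sketch. ToyAdam is a hypothetical scalar re-implementation of the Adam update rule written for illustration; it is not TensorFlow's class, and the hyperparameter defaults are assumptions. The only point is that the moment estimates m and v and the step counter t live inside the optimizer object.

```python
import math


class ToyAdam:
    """Hypothetical scalar Adam, for illustration only (not tf.keras.optimizers.Adam)."""

    def __init__(self, learning_rate=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
        self.learning_rate = learning_rate
        self.beta_1 = beta_1
        self.beta_2 = beta_2
        self.epsilon = epsilon
        self.m = 0.0  # first-moment (momentum) estimate
        self.v = 0.0  # second-moment estimate
        self.t = 0    # number of updates applied so far

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta_1 * self.m + (1 - self.beta_1) * grad
        self.v = self.beta_2 * self.v + (1 - self.beta_2) * grad * grad
        m_hat = self.m / (1 - self.beta_1 ** self.t)  # bias correction
        v_hat = self.v / (1 - self.beta_2 ** self.t)
        return param - self.learning_rate * m_hat / (math.sqrt(v_hat) + self.epsilon)


def grad_fn(x):
    return 2.0 * (x - 3.0)  # gradient of (x - 3)^2


# Optimizer created once, outside the loop: state accumulates across steps.
shared = ToyAdam()
x_shared = 0.0
for _ in range(50):
    x_shared = shared.step(x_shared, grad_fn(x_shared))

# Optimizer re-created every step: each update starts from m = v = 0, t = 0.
x_fresh = 0.0
fresh_steps = []
for _ in range(50):
    fresh = ToyAdam()
    x_fresh = fresh.step(x_fresh, grad_fn(x_fresh))
    fresh_steps.append(fresh.t)

print(shared.t, shared.m)  # step counter and accumulated momentum
print(set(fresh_steps))    # every fresh optimizer only ever saw one step
```

In the TensorFlow code from the question, the equivalent change is to move optimizer = tf.keras.optimizers.Adam(learning_rate = 0.01) out of train_step so that it runs once, before the first call.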
I am trying to optimize a loss function (defined using evidence lower bound) with tf.train.AdamOptimizer.minimize() on Tensorflow version 1.15.2 with eager execution enabled. I tried the following:
learning_rate = 0.01
optim = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optim.minimize(loss)
and got the following error: RuntimeError: "loss" passed to Optimizer.compute_gradients should be a function when eager execution is enabled.
This works fine if I disable eager execution, but I need eager execution enabled because I have to save a TensorFlow variable as a NumPy array. The documentation mentions that when eager execution is enabled, the loss must be a callable, so the loss should be defined as a function that takes no inputs and returns the loss. I am not sure how to achieve that.
I tried train_op = optim.minimize(lambda: loss) but got ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables [] and loss <function <lambda> at 0x7f3c67a93b00>
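The difference between lambda: loss and a real loss function can be sketched without TensorFlow. The names below (params, compute_loss) are hypothetical, and plain Python does not reproduce gradient recording. The sketch only shows that a lambda wrapping an already computed value does no work when it is called, whereas a function that recomputes the loss reads the current parameters on every call. It is an assumption, consistent with the empty variable list [] in the error message, that this is why no gradients are found.

```python
# Hypothetical stand-ins: a parameter store and a loss computed from it.
params = {"w": 0.0}


def compute_loss():
    # Reads the current parameter value every time it is called.
    return (params["w"] - 3.0) ** 2


loss = compute_loss()          # a plain number, computed once
stale_callable = lambda: loss  # returns the stored number, reads no parameter
fresh_callable = compute_loss  # recomputes from the parameters on each call

params["w"] = 2.0  # pretend an optimizer step changed the parameter

print(stale_callable())  # still the value computed before the change
print(fresh_callable())  # reflects the updated parameter
```

With the optimizer from the question, this would presumably mean passing a function whose body builds the loss from the tf.Variable objects, rather than a tensor computed beforehand.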
I'm trying to implement an INN (invertible neural network) with the structure as described in this paper.
I was wondering if it is possible to create a block (as proposed in the paper) as a custom keras layer with two different call functions.
The basic structure would look as follows:
import tensorflow as tf
import tensorflow.keras.layers as layers

class INNBlock(tf.keras.Model):
    # inheriting from Model instead of keras.layers.Layer, because I want to manage the
    # underlying layers as well
    def __init__(self, size):
        super(INNBlock, self).__init__(name='innblock')
        # define layers
        self.denseL1 = layers.Dense(size, activation='relu')

    def call(self, inputs):
        # define the relationship between the layers for a forward call
        out = self.denseL1(inputs)
        return out

    def inverse_call(self, inputs):
        # define the inverse relationship between the layers
        out = -self.denseL1(inputs)  # use the same weights as the forward call
        return out

class INN(tf.keras.Model):
    def __init__(self, kernel_size, input_dim, min_clip, max_clip):
        super(INN, self).__init__()
        # size was undefined here; kernel_size is assumed to be the block size
        self.block_1 = INNBlock(kernel_size)
        self.block_2 = INNBlock(kernel_size)

    def call(self, inputs):
        x = self.block_1(inputs)
        y = self.block_2(x)  # y was undefined; assumed to be the forward output of block_2
        x = self.block_2.inverse_call(y)
        x = self.block_1.inverse_call(x)
        return (y, x)
Solutions I already thought of (but don't particularly like):
Creating new layers for the inverse call and giving them the same weights as the layers in the forward call.
Adding another dimension to inputs and having a variable in there that determines whether the inverse call or the forward call is executed (but I don't know if this would even be allowed by Keras).
I hope someone knows whether there is a way to implement this.
Thank you in advance :)
There is nothing wrong with your code. You can try it and it will run normally.
The call method is the standard method for when you simply do model_instance(input_tensor) or layer_instance(input_tensor).
But there is nothing wrong if you define another method and use that method inside the model's call method. What will happen is just:
If you use the_block(input_tensor), it will use the_block.call(input_tensor).
If you use the_block.inverse_call(input_tensor) somewhere outside a layer/model, it will fail to build a Keras model (nothing can be outside a layer)
If you use the_block.inverse_call(input_tensor) inside a layer/model (that's what you're doing), it is exactly the same as just writing the operations directly. You just wrapped it inside another function.
For Keras/Tensorflow, there will be nothing special about inverse_call. You can use it anywhere you could use any other keras/tensorflow function.
Will the gradients be updated twice?
Not exactly twice, but the operation will certainly be counted in. When the system calculates the gradient of the loss with respect to the weights, if the loss was built with inverse_call in the way, then inverse_call will participate in the gradient calculation.
But the update will be once per batch, as usual.
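A scalar sketch of both points, assuming a hypothetical ToyBlock with a single weight in place of the Dense layer: call and inverse_call read the same weight, the gradient of the loss sums the contributions of both paths, and the weight is still updated once. The loss and the learning rate are made up for illustration, and the gradient is written out by hand rather than computed by Keras.

```python
class ToyBlock:
    """Hypothetical scalar stand-in for INNBlock: one weight, two methods."""

    def __init__(self, weight):
        self.weight = weight

    def call(self, x):
        return self.weight * x

    def inverse_call(self, x):
        return -self.weight * x  # same weight as call, mirroring the question


def loss_fn(block, x):
    y = block.call(x)               # forward path: w * x
    x_back = block.inverse_call(y)  # second use of the weight: -w * (w * x)
    return y ** 2 + x_back ** 2


block = ToyBlock(weight=0.5)
x = 2.0
w = block.weight

# Hand-derived gradient of w^2 x^2 + w^4 x^2 with respect to w.
grad_forward = 2 * w * x ** 2       # contribution of the forward path
grad_inverse = 4 * w ** 3 * x ** 2  # contribution of the inverse_call path
grad_total = grad_forward + grad_inverse

# Central finite difference as an independent check of the total.
h = 1e-6
numeric = (loss_fn(ToyBlock(w + h), x) - loss_fn(ToyBlock(w - h), x)) / (2 * h)

block.weight = w - 0.01 * grad_total  # a single update that uses the summed gradient
print(grad_forward, grad_inverse, grad_total, block.weight)
```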
For my task, I do not need to compute gradients. I am simply replacing nn.L1Loss with a numpy function (corrcoef) in my loss evaluation but I get the following error:
RuntimeError: Can’t call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
I couldn’t figure out how exactly I should detach the graph (I tried torch.Tensor.detach(np.corrcoef(x, y))), but I still get the same error. I eventually wrapped everything using with torch.no_grad as follows:
with torch.no_grad():
    predFeats = self.forward(x)
    targetFeats = self.forward(target)
    loss = torch.from_numpy(np.corrcoef(predFeats.cpu().numpy().astype(np.float32), targetFeats.cpu().numpy().astype(np.float32))[1][1])
But this time I get the following error:
TypeError: expected np.ndarray (got numpy.float64)
I wonder, what am I doing wrong?
TL;DR
with torch.no_grad():
    predFeats = self(x)
    targetFeats = self(target)
    loss = torch.tensor(np.corrcoef(predFeats.cpu().numpy(),
                                    targetFeats.cpu().numpy())[1][1]).float()
You would avoid the first RuntimeError by detaching the tensors (predFeats and targetFeats) from the computational graph, i.e. by getting tensors that hold the same data but carry no gradient and no gradient function (grad_fn).
So, instead of
torch.Tensor.detach(np.corrcoef(x.numpy(), y.numpy())) # Detaches a newly created tensor!
# x and y still may have gradients. Hence the first error.
which does nothing, do
# Detaches x and y properly
torch.Tensor(np.corrcoef(x.detach().numpy(), y.detach().numpy()))
But let's not bother with all the detachments.
As you rightly did, let's disable gradient tracking instead.
with torch.no_grad():
    ...  # the feature computation below goes inside this block
Now, compute the features.
predFeats = self(x) # No need for the explicit .forward() call
targetFeats = self(target)
I found it helpful to break your last line up.
loss = np.corrcoef(predFeats.numpy(), targetFeats.numpy()) # We don't need to detach
# Notice that we don't need to cast the arguments to fp32
# since the `corrcoef` casts them to fp64 anyway.
print(loss.shape, loss.dtype) # A 2-dimensional fp64 matrix
loss = loss[1][1]
print(type(loss)) # Output: numpy.float64
# Loss now just a simple fp64 number
And that is the problem!
Because, when we do
loss = torch.from_numpy(loss)
we're passing in a scalar (numpy.float64), while it expects a NumPy array (np.ndarray).
If you're using PyTorch 0.4 or up, there's inbuilt support for scalars.
Simply replace the from_numpy() method with the universal tensor() creation method.
loss = torch.tensor(loss)
P.S. You might also want to look at setting rowvar=False in corrcoef since the rows in PyTorch tensors usually represent the observations.
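The shapes and types involved can be checked with NumPy alone. The sketch below uses two made-up 1-D feature vectors (pred and target are hypothetical names). It shows that corrcoef returns a 2x2 matrix for two vectors, that indexing it yields a numpy.float64 scalar rather than an np.ndarray, and that the diagonal entry [1][1] is the correlation of the second input with itself. If the quantity of interest is the correlation between prediction and target, the off-diagonal entry [0][1] may be what is wanted; that reading of the intent is an assumption.

```python
import numpy as np

# Made-up feature vectors standing in for predFeats and targetFeats.
pred = np.array([0.1, 0.4, 0.35, 0.8])
target = np.array([0.0, 0.5, 0.3, 1.0])

corr = np.corrcoef(pred, target)
print(corr.shape, corr.dtype)  # a 2x2 float64 matrix

diagonal = corr[1][1]      # target correlated with itself
off_diagonal = corr[0][1]  # pred correlated with target

print(type(diagonal))                    # a NumPy scalar type
print(isinstance(diagonal, np.ndarray))  # False: this is what from_numpy rejects
print(np.isclose(diagonal, 1.0))         # True up to rounding

# With observations in rows and features in columns, rowvar=False treats
# each column as one variable.
samples = np.stack([pred, target], axis=1)  # shape (4, 2)
corr_cols = np.corrcoef(samples, rowvar=False)
print(np.allclose(corr, corr_cols))
```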
Does a Keras custom loss function accept a global Python variable?
I am building my own Keras custom loss function, which only accepts y_true and y_pred as arguments. But the loss function is quite complex and depends on other variables. In my current implementation, the loss function directly uses global variables from the same Python script. After training the model, if I want to use it for prediction, those global variables will be changed. My question is: do I need to compile the model again to guarantee that it uses the latest version of those external global variables?
Rlist=....
def custom_loss(y_true, y_pred):
    z = 0.0
    # Rlist is the global variable
    for j in Rlist:
        z = z + K.log(K.sum(K.exp(K.gather(y_pred, j[0])))) \
              - K.log(K.sum(K.exp(K.gather(y_pred, j))))
    z = -z
    return z

# below build the model and compile it with loss=custom_loss
model=...
model.compile(loss=custom_loss,....
model.fit(x=train_x,y=train_y,...)
# Rlist=... update Rlist which is adaptive to test dataset
# Do I need to recompile in the code below, or is Rlist updated
# in custom_loss when it is called?
model.predict(x=test_x,y=test_y,...)
In my loss function (actually the loss function for the Cox proportional hazards model), the loss is not additive over the per-sample loss values.
Rlist is a global variable in the Python environment of my Keras code. My question is: after training the model, if I change Rlist for the test dataset, will Keras automatically pick up the new Rlist, or will it use the version of Rlist that existed when it compiled the model and built the computation graph?
Is there any explanation of what happens when I refer directly to a global Python variable in the loss function and TensorFlow builds its computation graph?
I know it's not good practice to use global variables. Better suggestions are also welcome.
What exactly do you mean by "python environment of my Keras code"? If you set the Rlist variable in your code to [1,2,3] while training and then change it to [3,2,1] in prediction/production mode, your custom loss will see [3,2,1].
I'm not sure what you are trying to achieve, but I suppose one of these could work:
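In plain Python, a function looks up a global name each time its body runs, which is the behaviour described here. A hedged caveat: it is an assumption that, when Keras builds a static graph, the Python body of the loss runs only once at build time, so the for loop over Rlist would be unrolled with the value seen at that moment. The sketch simulates both cases without Keras; build_graph and the toy loss are hypothetical.

```python
Rlist = [1, 2, 3]


def eager_style_loss(values):
    # The global name Rlist is looked up every time this body runs.
    return sum(values[i] for i in Rlist)


def build_graph():
    # Simulated graph construction: the loop runs once, now, and the
    # indices it saw are frozen into the returned function.
    frozen_ops = [lambda values, i=i: values[i] for i in Rlist]
    return lambda values: sum(op(values) for op in frozen_ops)


graph_style_loss = build_graph()  # "compiled" while Rlist == [1, 2, 3]

values = [10, 20, 30, 40, 50]
Rlist = [0, 4]  # rebind the global after "compilation"

print(eager_style_loss(values))  # uses the new Rlist
print(graph_style_loss(values))  # still uses the Rlist seen at build time
```

If the second behaviour applies to a given setup, changing Rlist after compile would require rebuilding the loss, for example by compiling again.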
A) Create a real ENV_Variable with RList
B) Create a JSON File with your RList (that way, you'll be able to use your RList data in production mode on server or cloud).
C) Create a Dict in your code like
RList = {
    'train': [1,2,3],
    'test': [3,2,1],
    'production': [4,5,6]
}