I have some TensorFlow code in a custom loss function.
I'm using tf.Print(node, [debug1, debug2], "print my debugs: ")
It works fine, but TF says tf.Print is deprecated and will be removed once I update TensorFlow, and that I should be using tf.**p**rint(), with a lowercase p.
I've tried using tf.print the same way I would use tf.Print(), but it's not working: once I fit my model in Keras, I get an error. Unlike tf.Print, tf.print seems to take in anything (**kwargs), so what am I supposed to give it? And unlike tf.Print, it does not seem to return something that I can inject into the computational graph.
It's really difficult to search because all the information online is about tf.Print().
Can someone explain how to use tf.print()?
Edit: Example code
def custom_loss(y_true, y_pred):
    loss = K.mean(...)
    print_no_op = tf.Print(loss, [loss, y_true, y_true.shape], "Debug output: ")
    return print_no_op
model.compile(loss=custom_loss)
Both the documentation of tf.print and tf.Print mention that tf.print returns an operation with no output, so it cannot be evaluated to any value. The syntax of tf.print is meant to be more similar to Python's builtin print. In your case, you could use it as follows:
def custom_loss(y_true, y_pred):
    loss = K.mean(...)
    print_op = tf.print("Debug output:", loss, y_true, y_true.shape)
    with tf.control_dependencies([print_op]):
        return K.identity(loss)
Here K.identity creates a new tensor identical to loss but with a control dependency to print_op, so evaluating it will force executing the printing operation. Note that Keras also offers K.print_tensor, although it is less flexible than tf.print.
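To see the control-dependency pattern in isolation, here is a minimal sketch that builds a tiny graph by hand instead of going through Keras. The constant values and the name loss_with_print are made up for the example, and it assumes a TensorFlow release where both tf.print and tf.compat.v1.Session are available (recent 1.x releases and 2.x, as far as I know).

```python
import tensorflow as tf

# Build the ops in an explicit graph so the example behaves the same
# whether or not eager execution is the default.
graph = tf.Graph()
with graph.as_default():
    y_pred = tf.constant([1.0, 2.0, 3.0])  # stand-in data, not from the question
    loss = tf.reduce_mean(y_pred)

    # In graph mode tf.print returns an operation with no output.
    print_op = tf.print("Debug output:", loss, y_pred.shape)

    # tf.identity is a *new* op created inside the context, so it picks up
    # the control dependency and forces print_op to run first.
    with tf.control_dependencies([print_op]):
        loss_with_print = tf.identity(loss)

with tf.compat.v1.Session(graph=graph) as sess:
    value = sess.run(loss_with_print)
```

Fetching loss_with_print triggers the print, while fetching loss directly would skip it, because only the identity op carries the dependency.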
Just a little addition to jdehesa's excellent answer:
tf.tuple can be used to couple the print operation with another operation, which will then run with that operation whichever session executes the graph. Here's how that is done:
print_op = tf.print(something_you_want_to_print)
some_tensor_list = tf.tuple([some_tensor], control_inputs=[print_op])
# Use some_tensor_list[0] instead of some_tensor below.
I am learning PyTorch for an image classification task, and I ran into code where someone used a PyTorch Variable() in their function for prediction:
def predict_image(image):
    image_tensor = test_transforms(image).float()
    image_tensor = image_tensor.unsqueeze_(0)
    input = Variable(image_tensor)
    input = input.to(device)
    output = model(input)
    index = output.data.cpu().numpy().argmax()
    return index
Why do they use Variable() here? (even though it works fine without it.)
You can safely omit it. Variables are a legacy component of PyTorch, now deprecated, that used to be required for autograd:
Variable (deprecated)
WARNING
The Variable API has been deprecated: Variables are no longer necessary to use autograd with tensors. Autograd automatically supports Tensors with requires_grad set to True. Below please find a quick guide on what has changed:
Variable(tensor) and Variable(tensor, requires_grad) still work as expected, but they return Tensors instead of Variables.
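A small sketch of what the quoted note means in practice, assuming a PyTorch version in which the Variable API is already deprecated. The tensors and names below are illustrative only and are not taken from the question's model.

```python
import torch
from torch.autograd import Variable

image_tensor = torch.ones(1, 3)  # stand-in for a preprocessed input

# Wrapping still works, but the result is an ordinary Tensor.
wrapped = Variable(image_tensor)
is_plain_tensor = isinstance(wrapped, torch.Tensor)

# Autograd works on plain tensors once requires_grad is set.
weights = torch.ones(3, requires_grad=True)
output = (image_tensor * weights).sum()
output.backward()
grad = weights.grad  # gradient of the sum with respect to weights
```

Since the wrapper returns the same kind of object it was given, dropping the Variable(...) call from predict_image should not change the prediction.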
I have an object detection model implemented in tensorflow.keras (version 1.15). I am trying to implement a modified (hybrid) loss function in my model. Basically, I need a few variables defined in my loss function because I am processing the y_true and y_pred provided to my classification loss function (a focal loss, to be exact). So I naturally ended up implementing my ops inside the loss function.
I have defined a WrapperClass to initialize my variables:
class LossWrapper(object):
    def __init__(self, num_centers):
        ...
        self.total_loss = tf.Variable(0, dtype=tf.float32)
    def loss_funcion(self, y_true, y_pred):
        ...
        self.total_loss = self.total_loss + ...
I am getting an error:
tensorflow.python.framework.errors_impl.FailedPreconditionError:
Attempting to use uninitialized value Variable
Using self.total_loss_cosine = tf.zeros(1)[0] instead, I get a similar message:
tensorflow.python.framework.errors_impl.InvalidArgumentError:
Retval[0] does not have value
I came to the conclusion that no matter how or where I define my variable (I have tried inside the __init__ function and in the main function body), I get an error about attempting to use an uninitialized Variable.
I am starting to think that I cannot initialize variables inside my loss function and should probably implement them as a typical block outside it. Is this the case? Are loss functions basically separated from the rest of the network, so that the typical initialization does not work as expected?
Some remarks:
The loss function seems to work flawlessly in eager execution mode, where the initialization issue obviously does not exist.
In eager execution mode the type of y_true seems to be np.array and not tf.Tensor (or at least tf.EagerTensor). Does this mean that y_true and y_pred are generally propagated as NumPy arrays, meaning that this part is actually detached from the network? (I have only tested this in eager execution, though.)
Following the tf.print documentation, I wrote:
print_op = tf.print("tensors:", cut_points[0,0,:], output_stream=sys.stderr)
with tf.control_dependencies([print_op]):
    return cut_points
But nothing is written to stderr at all (I do see other logs, and the session does evaluate this point).
tf.control_dependencies only affects new operations created within the context. In your snippet, you are not creating any new operation in the context, so it has no effect. The simplest solution is to use a tf.identity operation, which produces the same result but carries the control dependencies:
print_op = tf.print("tensors:", cut_points[0,0,:], output_stream=sys.stderr)
with tf.control_dependencies([print_op]):
    return tf.identity(cut_points)
I want to use the external optimizer interface within TensorFlow to use Newton-type optimizers, as tf.train only has first-order gradient descent optimizers. At the same time, I want to build my network using tf.keras.layers, as it is much easier than using tf.Variables when building large, complex networks. I will show my issue with the following simple 1D linear regression example:
import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np
#generate data
no = 100
data_x = np.linspace(0,1,no)
data_y = 2 * data_x + 2 + np.random.uniform(-0.5,0.5,no)
data_y = data_y.reshape(no,1)
data_x = data_x.reshape(no,1)
# Make model using keras layers and train
x = tf.placeholder(dtype=tf.float32, shape=[None,1])
y = tf.placeholder(dtype=tf.float32, shape=[None,1])
output = tf.keras.layers.Dense(1, activation=None)(x)
loss = tf.losses.mean_squared_error(data_y, output)
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, method="L-BFGS-B")
sess = K.get_session()
sess.run(tf.global_variables_initializer())
tf_dict = {x : data_x, y : data_y}
optimizer.minimize(sess, feed_dict = tf_dict, fetches=[loss], loss_callback=lambda x: print("Loss:", x))
When running this, the loss does not change at all. When using any other optimizer from tf.train, it works fine. Also, when using tf.layers.Dense() instead of tf.keras.layers.Dense(), it does work with the ScipyOptimizerInterface. So really the question is: what is the difference between tf.keras.layers.Dense() and tf.layers.Dense()? I saw that the Variables created by tf.layers.Dense() are of type tf.float32_ref, while the Variables created by tf.keras.layers.Dense() are of type tf.float32. As far as I know, _ref indicates that the tensor is mutable. So maybe that's the issue? But then again, any other optimizer from tf.train works fine with Keras layers.
Thanks
After a lot of digging I was able to find a possible explanation.
ScipyOptimizerInterface uses feed_dicts to simulate the updates of your variables during the optimization process. It only does an assign operation at the very end. In contrast, tf.train optimizers always do assign operations. The code of ScipyOptimizerInterface is not that complex so you can verify this easily.
Now the problem is that assigning variables with feed_dict works mostly by accident. Here is a link where I learnt about this. In other words, assigning variables via feed dict, which is what ScipyOptimizerInterface does, is a hacky way of doing updates.
Now this hack mostly works, except when it does not. tf.keras.layers.Dense uses ResourceVariables to model the weights of the model. This is an improved version of simple Variables that has cleaner read/write semantics. The problem is that under the new semantics the feed dict update happens after the loss calculation. The link above gives some explanations.
Now tf.layers is currently a thin wrapper around tf.keras.layers, so I am not sure why it would work. Maybe there is some compatibility check somewhere in the code.
The solutions to address this are fairly simple:
Either avoid using components that rely on ResourceVariables, which can be difficult in practice.
Or patch ScipyOptimizerInterface so that it always performs assignments for variables. This is relatively easy, since all the required code is in one file.
There was some effort to make the interface work with eager execution (which uses ResourceVariables by default). Check out this link.
I think the problem is with the line
output = tf.keras.layers.Dense(1, activation=None)(x)
In this format, output is not a layer but rather the output of a layer, which might be preventing the wrapper from collecting the weights and biases of the layer and feeding them to the optimizer. Try writing it in two lines, e.g.
output = tf.keras.layers.Dense(1, activation=None)
res = output(x)
If you want to keep the original format then you might have to manually collect all trainables and feed them to the optimizer via the var_list option
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, var_list = [Trainables], method="L-BFGS-B")
Hope this helps.
Problem
I'm running a deep neural network on MNIST, where the loss is defined as follows:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, label))
The program seems to run correctly until I get a NaN loss somewhere after the 10,000th minibatch. Sometimes the program runs correctly until it finishes. I think tf.nn.softmax_cross_entropy_with_logits is giving me this error.
This is strange, because the code just contains mul and add operations.
Possible Solution
Maybe I can use:
if cost == "nan":
    optimizer = an empty optimizer
else:
    ...
    optimizer = real optimizer
But I cannot find the type of NaN. How can I check whether a variable is NaN or not?
How else can I solve this problem?
I found a similar problem here: TensorFlow cross_entropy NaN problem
Thanks to the author user1111929
tf.nn.softmax_cross_entropy_with_logits => -tf.reduce_sum(y_*tf.log(y_conv))
is actually a horrible way of computing the cross-entropy. In some samples, certain classes could be excluded with certainty after a while, resulting in y_conv=0 for that sample. That's normally not a problem since you're not interested in those, but in the way cross_entropy is written there, it yields 0*log(0) for that particular sample/class. Hence the NaN.
Replacing it with
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv + 1e-10))
Or
cross_entropy = -tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0)))
solved the NaN problem.
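To make the 0*log(0) failure concrete, here is a plain-Python sketch of the arithmetic. It uses no TensorFlow, and the helper names and sample values are made up for illustration. Python's math.log raises an error on zero, so the sketch substitutes -inf by hand to mimic what a floating-point log op returns.

```python
import math

def naive_cross_entropy(labels, probs):
    """-sum(y * log(p)), with log(0) treated as -inf as a float op would."""
    total = 0.0
    for y, p in zip(labels, probs):
        log_p = math.log(p) if p > 0.0 else float("-inf")
        total += y * log_p  # 0.0 * -inf evaluates to nan
    return -total

def clipped_cross_entropy(labels, probs, eps=1e-10):
    """Same sum, but each probability is clipped to [eps, 1.0] first."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += y * math.log(min(max(p, eps), 1.0))
    return -total

labels = [1.0, 0.0, 0.0]  # one-hot target
probs = [0.9, 0.1, 0.0]   # the last class has been ruled out entirely

naive = naive_cross_entropy(labels, probs)
clipped = clipped_cross_entropy(labels, probs)
```

The clipped version stays finite because log(1e-10) is a large but ordinary negative number, so multiplying it by a zero label gives zero instead of NaN.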
The reason you are getting NaNs is most likely that somewhere in your cost function or softmax you are trying to take the log of zero, which evaluates to -inf and turns into NaN once it is multiplied by zero. But to answer your specific question about detecting NaN, Python has a built-in capability to test for NaN in the math module. For example:
import math
val = float('nan')
if math.isnan(val):
    print('Detected NaN')
    import pdb; pdb.set_trace() # Break into debugger to look around
Check your learning rate. The bigger your network, the more parameters there are to learn, which means you may also need to decrease the learning rate.
I don't have your code or data. But tf.nn.softmax_cross_entropy_with_logits should be stable with a valid probability distribution (more info here). I assume your data does not meet this requirement. An analogous problem was also discussed here. Which would lead you to either:
Implement your own softmax_cross_entropy_with_logits function, e.g. try (source):
epsilon = tf.constant(value=0.00001, shape=shape)
logits = logits + epsilon
softmax = tf.nn.softmax(logits)
cross_entropy = -tf.reduce_sum(labels * tf.log(softmax), reduction_indices=[1])
Update your data so that it does have a valid probability distribution
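For comparison, a common way to keep a hand-written softmax cross-entropy finite is to compute the log-softmax directly with the log-sum-exp trick, so that log is never applied to a probability that underflowed to zero. The sketch below is plain Python with made-up function names and inputs. It is a different technique from the epsilon shift shown above, and it is not a claim about how TensorFlow implements its own op.

```python
import math

def log_softmax(logits):
    """Log-softmax via log-sum-exp: subtract the max before exponentiating."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

def softmax_cross_entropy(labels, logits):
    """Cross-entropy from raw logits, without forming probabilities first."""
    return -sum(y * lp for y, lp in zip(labels, log_softmax(logits)))

# Extreme logits: a naive softmax would overflow on exp(1000.0) and
# underflow to exactly zero for the other two classes.
logits = [1000.0, 0.0, -1000.0]
labels = [0.0, 1.0, 0.0]

loss = softmax_cross_entropy(labels, logits)
```

Because the largest logit is subtracted first, every exponent is at most zero, so nothing overflows and the loss comes out large but finite.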