Typecasting error in TensorFlow - python

Suppose I am implementing a linear layer on some training data. The following code
import tensorflow as tf
import numpy as np
weights = tf.Variable(np.random.uniform(0.0, 1.0, 3))
bias = tf.Variable(0.0)
trainingData = np.array(np.arange(15).astype(float).reshape(3,5))
output = tf.expand_dims(weights, 0) @ trainingData + bias
produces a type error about mixing the float64 matmul result with the float32_ref bias.
This can be fixed by instead changing the last line to say
tf.cast(tf.expand_dims(weights, 0) @ trainingData, tf.float32) + bias
OK, so it doesn't like adding a float32_ref to a float64, but it's OK with adding a float32_ref to a float32. But I must be doing something wrong, because I'm doing something very simple, and it's throwing an error. (I'm new to TensorFlow.) I understand why it didn't like what I wrote, but what basic mistake am I making that's causing this problem?
I'm looking for an answer like "Oh, you should never initialize bias with a float like 0.0, because that will lead to typecasting errors more generally."

Oh, you should never use tf.Variable unless you have a very good reason. You should use tf.get_variable instead to avoid issues.
Oh, you should never use float64 as the data type unless you have a good reason. NumPy uses float64 as its default, so you should write something like
W = tf.get_variable("w", initializer=np.random.randn(3).astype(np.float32))
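Putting both points together, here is a minimal sketch of the original snippet with everything kept in float32 (the variable names "w" and "b" are just illustrative, and the @ matmul mirrors the question):
import tensorflow as tf
import numpy as np

weights = tf.get_variable("w", initializer=np.random.uniform(0.0, 1.0, 3).astype(np.float32))
bias = tf.get_variable("b", shape=[], dtype=tf.float32, initializer=tf.zeros_initializer())
trainingData = np.arange(15, dtype=np.float32).reshape(3, 5)
output = tf.expand_dims(weights, 0) @ trainingData + bias  # all float32, so no cast needed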

Related

Using tf.contrib.opt.ScipyOptimizerInterface with tf.keras.layers, loss not changing

I want to use the external optimizer interface within tensorflow, to use newton optimizers, as tf.train only has first order gradient descent optimizers. At the same time, I want to build my network using tf.keras.layers, as it is way easier than using tf.Variables when building large, complex networks. I will show my issue with the following simple 1D linear regression example:
import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np
#generate data
no = 100
data_x = np.linspace(0,1,no)
data_y = 2 * data_x + 2 + np.random.uniform(-0.5,0.5,no)
data_y = data_y.reshape(no,1)
data_x = data_x.reshape(no,1)
# Make model using keras layers and train
x = tf.placeholder(dtype=tf.float32, shape=[None,1])
y = tf.placeholder(dtype=tf.float32, shape=[None,1])
output = tf.keras.layers.Dense(1, activation=None)(x)
loss = tf.losses.mean_squared_error(data_y, output)
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, method="L-BFGS-B")
sess = K.get_session()
sess.run(tf.global_variables_initializer())
tf_dict = {x : data_x, y : data_y}
optimizer.minimize(sess, feed_dict = tf_dict, fetches=[loss], loss_callback=lambda x: print("Loss:", x))
When running this, the loss just does not change at all. When using any other optimizer from tf.train, it works fine. Also, when using tf.layers.Dense() instead of tf.keras.layers.Dense() it does work with the ScipyOptimizerInterface. So really the question is: what is the difference between tf.keras.layers.Dense() and tf.layers.Dense()? I saw that the Variables created by tf.layers.Dense() are of type tf.float32_ref while the Variables created by tf.keras.layers.Dense() are of type tf.float32. As far as I know, _ref indicates that this tensor is mutable. So maybe that's the issue? But then again, any other optimizer from tf.train works fine with keras layers.
Thanks
After a lot of digging I was able to find a possible explanation.
ScipyOptimizerInterface uses feed_dicts to simulate the updates of your variables during the optimization process. It only does an assign operation at the very end. In contrast, tf.train optimizers always do assign operations. The code of ScipyOptimizerInterface is not that complex so you can verify this easily.
Now the problem is that assigning variables with feed_dict works mostly by accident. Here is a link where I learnt about this. In other words, assigning variables via feed dict, which is what ScipyOptimizerInterface does, is a hacky way of doing updates.
Now this hack mostly works, except when it does not. tf.keras.layers.Dense uses ResourceVariables to model the weights of the model. This is an improved version of simple Variables that has cleaner read/write semantics. The problem is that under the new semantics the feed dict update happens after the loss calculation. The link above gives some explanations.
Now, tf.layers is currently a thin wrapper around tf.keras.layers, so I am not sure why it would work. Maybe there is some compatibility check somewhere in the code.
The solutions to address this are somewhat simple.
Either avoid using components that use ResourceVariables. This can be kind of difficult.
Patch ScipyOptimizerInterface to do assignments for variables always. This is relatively easy since all the required code is in one file.
There was some effort to make the interface work with eager execution (which by default uses ResourceVariables). Check out this link
I think the problem is with the line
output = tf.keras.layers.Dense(1, activation=None)(x)
In this format output is not a layer but rather the output of a layer, which might be preventing the wrapper from collecting the weights and biases of the layer and feeding them to the optimizer. Try writing it in two lines, e.g.
output = tf.keras.layers.Dense(1, activation=None)
res = output(x)
If you want to keep the original format then you might have to manually collect all trainables and feed them to the optimizer via the var_list option
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, var_list = [Trainables], method="L-BFGS-B")
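For instance, here is a sketch of that var_list idea, assuming the Dense layer's kernel and bias are the only trainable variables in the default graph:
optimizer = tf.contrib.opt.ScipyOptimizerInterface(
    loss,
    var_list=tf.trainable_variables(),  # assumes no other trainables live in this graph
    method="L-BFGS-B")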
Hope this helps.

How to use a numpy function as the loss function in PyTorch and avoid getting errors during run time?

For my task, I do not need to compute gradients. I am simply replacing nn.L1Loss with a numpy function (corrcoef) in my loss evaluation but I get the following error:
RuntimeError: Can’t call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
I couldn't figure out how exactly I should detach the graph (I tried torch.Tensor.detach(np.corrcoef(x, y)), but I still get the same error). I eventually wrapped everything in torch.no_grad() as follows:
with torch.no_grad():
    predFeats = self.forward(x)
    targetFeats = self.forward(target)
    loss = torch.from_numpy(np.corrcoef(predFeats.cpu().numpy().astype(np.float32), targetFeats.cpu().numpy().astype(np.float32))[1][1])
But this time I get the following error:
TypeError: expected np.ndarray (got numpy.float64)
I wonder, what am I doing wrong?
TL;DR
with torch.no_grad():
    predFeats = self(x)
    targetFeats = self(target)
    loss = torch.tensor(np.corrcoef(predFeats.cpu().numpy(),
                                    targetFeats.cpu().numpy())[1][1]).float()
You would avoid the first RuntimeError by detaching the tensors (predFeats and targetFeats) from the computational graph.
i.e. Getting a copy of the tensor data without the gradients and the gradient function (grad_fn).
So, instead of
torch.Tensor.detach(np.corrcoef(x.numpy(), y.numpy())) # Detaches a newly created tensor!
# x and y still may have gradients. Hence the first error.
which does nothing, do
# Detaches x and y properly
torch.Tensor(np.corrcoef(x.detach().numpy(), y.detach().numpy()))
But let's not bother with all the detachments.
Like you rightfully did, let's disable the gradients instead.
with torch.no_grad():
Now, compute the features.
predFeats = self(x) # No need for the explicit .forward() call
targetFeats = self(target)
I found it helpful to break your last line up.
loss = np.corrcoef(predFeats.numpy(), targetFeats.numpy()) # We don't need to detach
# Notice that we don't need to cast the arguments to fp32
# since the `corrcoef` casts them to fp64 anyway.
print(loss.shape, loss.dtype) # A 2-dimensional fp64 matrix
loss = loss[1][1]
print(type(loss)) # Output: numpy.float64
# Loss is now just a simple fp64 number
And that is the problem!
Because, when we do
loss = torch.from_numpy(loss)
we're passing in a number (numpy.float64) while it expects a numpy array (np.ndarray).
If you're using PyTorch 0.4 or up, there's inbuilt support for scalars.
Simply replace the from_numpy() method with the universal tensor() creation method.
loss = torch.tensor(loss)
P.S. You might also want to look at setting rowvar=False in corrcoef since the rows in PyTorch tensors usually represent the observations.
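For example, here is a sketch assuming predFeats and targetFeats come out with shape (N, 1), i.e. one observation per row and a single feature column:
with torch.no_grad():
    predFeats = self(x)
    targetFeats = self(target)
    # rowvar=False: rows are observations, columns are variables,
    # so corrcoef returns a 2x2 matrix and [0, 1] is the pred/target correlation.
    corr = np.corrcoef(predFeats.cpu().numpy(),
                       targetFeats.cpu().numpy(),
                       rowvar=False)
    loss = torch.tensor(corr[0, 1]).float()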

Issues printing an estimator.predict tensor

I am currently using tf 1.4, and I need help looking at the predictions of a tf.contrib.factorization.KMeansClustering estimator. My current code segment looks like:
km = KMeansClustering(num_clusters=8,initial_clusters=KMeansClustering.KMEANS_PLUS_PLUS_INIT,model_dir=MODEL,relative_tolerance=0.01)
result = km.train(input_fn=lambda: gen_input(body))
input_fn = tf.estimator.inputs.pandas_input_fn(x={'x':tst}, shuffle=False)
y = result.predict(input_fn)
Where body and tst are pandas dataframes. print(y) gives:
<generator object Estimator.predict at 0x11ebecba0>
And trying things that I've found suggested, like calling print(list(y)) or print(next(y)), or iterating through y like:
for i in y:
...
for i in y.items():
...
for i in enumerate(y):
...
etc., gives the error TypeError: data must be either a numpy array or pandas DataFrame if pandas is installed; got dict. I can't find any other way to print this. Thanks
There is too little code here to confirm what's wrong or missing, and at least a full stack trace would be expected. This answer may prove to be well off once you add more information.
Is it that you're expecting the call to pandas_input_fn to return something other than what it actually returns? It returns a function with signature () -> (dict of features, target); see the docs for details.
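For what it's worth, pandas_input_fn expects x to be a pandas DataFrame rather than a dict (which is what the "got dict" part of the error hints at), so a rough sketch, assuming tst carries the same feature column(s) the model was trained on, could be:
input_fn = tf.estimator.inputs.pandas_input_fn(x=tst, shuffle=False)
for prediction in km.predict(input_fn=input_fn):  # the generator is evaluated lazily as you iterate
    print(prediction)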
Also, you don't seem to be running a TensorFlow session. Until you do so, all tensors and computations (predictions in your case) are just part of a graph; they will have values only after running a TF session.
See these docs for more details.

How to solve nan loss?

Problem
I'm running a Deep Neural Network on the MNIST where the loss defined as follow:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, label))
The program seems to run correctly until I get a nan loss after 10000+ minibatches. Sometimes, the program runs correctly until it finishes. I think tf.nn.softmax_cross_entropy_with_logits is giving me this error.
This is strange, because the code just contains mul and add operations.
Possible Solution
Maybe I can use:
if cost == "nan":
    optimizer = an empty optimizer
else:
    ...
    optimizer = real optimizer
But I cannot find the type of nan. How can I check whether a variable is nan or not?
How else can I solve this problem?
I found a similar problem here: TensorFlow cross_entropy NaN problem
Thanks to the author user1111929
tf.nn.softmax_cross_entropy_with_logits => -tf.reduce_sum(y_*tf.log(y_conv))
is actually a horrible way of computing the cross-entropy. In some samples, certain classes could be excluded with certainty after a while, resulting in y_conv=0 for that sample. That's normally not a problem since you're not interested in those, but in the way cross_entropy is written there, it yields 0*log(0) for that particular sample/class. Hence the NaN.
Replacing it with
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv + 1e-10))
Or
cross_entropy = -tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0)))
solved the NaN problem.
The reason you are getting NaN's is most likely that somewhere in your cost function or softmax you are trying to take a log of zero, which is not a number. But to answer your specific question about detecting NaN, Python has a built-in capability to test for NaN in the math module. For example:
import math

val = float('nan')
if math.isnan(val):
    print('Detected NaN')
    import pdb; pdb.set_trace()  # Break into debugger to look around
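The same check works on the fetched loss value inside a TensorFlow training loop; here is a rough sketch, where train_op and feed stand in for the (not shown) training op and feed dict:
import numpy as np

_, cost_val = sess.run([train_op, cost], feed_dict=feed)  # hypothetical train step
if np.isnan(cost_val):
    print('Detected NaN in the loss')
    import pdb; pdb.set_trace()  # Break into debugger to look around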
Check your learning rate. The bigger your network, the more parameters there are to learn. That means you also need to decrease the learning rate.
I don't have your code or data. But tf.nn.softmax_cross_entropy_with_logits should be stable with a valid probability distribution (more info here). I assume your data does not meet this requirement. An analogous problem was also discussed here, which would lead you to either:
Implement your own softmax_cross_entropy_with_logits function, e.g. try (source):
# `shape` here should match the shape of `logits`
epsilon = tf.constant(value=0.00001, shape=shape)
logits = logits + epsilon
softmax = tf.nn.softmax(logits)
cross_entropy = -tf.reduce_sum(labels * tf.log(softmax), reduction_indices=[1])
Update your data so that it does have a valid probability distribution
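As a sketch of the second option, one common way to obtain a valid probability distribution is to normalize each non-negative label row so it sums to one (labels here is a hypothetical NumPy array of shape (num_examples, num_classes)):
import numpy as np

labels = np.array([[2.0, 1.0, 1.0],
                   [0.0, 3.0, 1.0]])  # hypothetical raw, non-negative label scores
labels = labels / labels.sum(axis=1, keepdims=True)  # each row now sums to 1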

How to use the SquaredError brick in Blocks (Theano, Python)?

I designed a really simple recurrent neural network in Blocks (and Theano). As a cost function I decided to use the square error function which is defined simply as (y-y')^2. I would like to compute the average cost across the minibatch.
The following code is an almost working example with the Blocks class/method SquaredError, which is, as far as I can tell, supposed to do exactly the desired operation.
Please ignore the inefficient float64; I use it in order to simplify eval execution. The problem persists when using float32.
import theano.tensor as tt
from blocks.bricks.cost import SquaredError
if __name__ == '__main__':
    a = tt.vector('a', dtype='float64')
    b = tt.vector('b', dtype='float64')
    cost = SquaredError().apply(a, b)
    print(cost.eval({a: [1.0, 2.0, 3.0, 4.0],
                     b: [0.5, 2.1, 3.4, 3.8]}))
# Expected: mean(0.5^2 + 0.1^2 + 0.4^2 + 0.2^2)
# Got: ValueError: Not enough dimensions on squarederror_cost_matrix_output_0 to reduce on axis 1
If I change the problematic line into the one below, everything works as expected.
cost = tt.sqr(tt.abs_(a - b)).mean()
What am I doing wrong? I am trying to learn Blocks more but this is beyond my understanding. Am I supposed to use another brick? Or somehow preprocess the tensors?
Looks as though we require 2D inputs for CostMatrix bricks, which is kind of dumb. I've filed an issue about it. You can get around this, if you like, by dimshuffling your inputs up to (N, 1) matrices, but the Cost bricks are mainly only useful if you're using the automatic tagging of inputs and outputs for VariableFilter operations, etc. Writing down the cost as you did in a Theano expression is fine too (although to nitpick you don't need the abs, the square of a negative number is always positive anyway).
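For what it's worth, here is a sketch of that dimshuffle workaround, building on the snippet from the question and assuming (as the error message suggests) that the brick reduces over axis 1 and then averages over the batch:
a2 = a.dimshuffle(0, 'x')  # (N,) -> (N, 1)
b2 = b.dimshuffle(0, 'x')
cost = SquaredError().apply(a2, b2)
print(cost.eval({a: [1.0, 2.0, 3.0, 4.0],
                 b: [0.5, 2.1, 3.4, 3.8]}))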
