I'm implementing a DNN with Theano. At the last layer of the DNN, I'm using softmax as the nonlinearity, via theano.tensor.nnet.softmax.
As a loss function I'm using cross-entropy from T.nnet.binary_crossentropy.
But I get a strange error:
"The following error happened while compiling the node', GpuDnnSoftmaxGrad{tensor_format='bc01' ..."
I'm a newbie with Theano and can't figure out what's wrong with this model. Your help is appreciated.
PS: My guess is that it is somehow related to the fact that softmax takes a 2D tensor and returns a 2D tensor.
PS2: I'm using bleeding-edge Theano (just cloned). My CUDA version is old (4.2), BUT I'm almost sure that's not the problem, since I'm working without error with other DNN tools built on Theano.
I'm using pylearn2 for acceleration, and that's not the problem either, since I've already used it successfully with the current Theano and CUDA in another DNN.
The error happens at this line: train = theano.function([idx], train_loss, givens=givens, updates=updates)
The full error message is:
cmodule.py", line 293, in dlimport
rval = __import__(module_name, {}, {}, [module_name])
RuntimeError: ('The following error happened while compiling the node', GpuDnnSoftmaxGrad{tensor_format='bc01', mode='channel', algo='accurate'}(GpuContiguous.0, GpuContiguous.0), '\n', 'could not create cuDNN handle: The handle was not initialized(Is your driver recent enought?).', "[GpuDnnSoftmaxGrad{tensor_format='bc01', mode='channel', algo='accurate'}(<CudaNdarrayType(float32, (False, False, True, True))>, <CudaNdarrayType(float32, (False, False, True, True))>)]")
The cross-entropy function I'm using is defined as:
error = T.mean(T.nnet.binary_crossentropy(input, target_y))
where input is the output of the softmax layer and target_y is the labels.
Solved: I had to use T.nnet.categorical_crossentropy, since my target variable is an integer vector.
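A minimal sketch of the working combination, assuming the softmax output has shape (batch, n_classes) and the targets form an integer vector (the variable names here are only illustrative):
import theano
import theano.tensor as T

activations = T.matrix('activations')   # pre-softmax layer output, shape (batch, n_classes)
target_y = T.ivector('target_y')        # one integer class label per example

p_y = T.nnet.softmax(activations)       # softmax keeps the 2D shape
error = T.mean(T.nnet.categorical_crossentropy(p_y, target_y))

loss_fn = theano.function([activations, target_y], error)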
I am using PyTorch to train a network. I was going through the autograd documentation, where it is mentioned that autograd maintains a counter for each tensor to track its "version". How can I get this counter for any tensor in the graph?
The reason I need it is that I have encountered this autograd error:
[torch.cuda.FloatTensor [x, y, z]], which is output 0 of torch::autograd::CopySlices, is at version 7; expected version 6 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
This is not new to me and I have handled it successfully before. This time, though, I cannot see why the tensor would be at version 7 instead of 6. To answer this, I want to know the version at any given point in the run.
Thanks.
It can be obtained through the tensor's _version attribute, i.e. tensor_name._version.
As an example of how to use it, the following minimal working example is provided:
import torch
a = torch.zeros(10, 5)
print(a._version) # prints 0
a[:, 1] = 1
print(a._version) # prints 1
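As a follow-up usage sketch (purely illustrative tensors), bracketing the suspect operation with _version prints shows exactly which line bumps the counter past the value autograd recorded when the tensor was saved:
import torch

x = torch.randn(4, 3, requires_grad=True)
y = x.exp()                  # exp() saves its output for the backward pass
print(y._version)            # prints 0, the version recorded in the graph
y[:, 0] = 0                  # in-place write (CopySlices) bumps the counter
print(y._version)            # prints 1
# y.sum().backward() would now fail with "is at version 1; expected version 0 instead"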
I am trying to run some code in PyTorch but I got stuck at this point:
On the first iteration, both backward passes, for the Discriminator and the Generator, run fine:
....
self.G_loss.backward(retain_graph=True)
self.D_loss.backward()
...
On the second iteration, when self.G_loss.backward(retain_graph=True) executes, I get this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [8192, 512]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
According to torch.autograd.set_detect_anomaly, the last of the following lines in the Discriminator network is responsible for this:
bottleneck = bottleneck[:-1]
self.embedding = x.view(x.size(0), -1)
self.logit = self.layers[-1](self.embedding)
The strange thing is that I have used that network architecture in other code where it worked properly. Any suggestions?
The full error:
site-packages\torch\autograd\__init__.py", line 127, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [8192, 512]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Solved by removing the loss += loss_val accumulation lines.
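For context, a minimal self-contained sketch of the pattern behind that fix (the tiny model here is made up): accumulating the live loss tensor keeps every iteration's graph referenced, while accumulating a detached Python number does not.
import torch

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

running_loss = 0.0
for _ in range(3):
    x = torch.randn(8, 3)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # running_loss += loss          # keeps each iteration's graph alive
    running_loss += loss.item()     # .item() detaches to a plain Python float
print(running_loss)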
I am trying to build a neural network in Python for solving PDEs, so I have had to write custom training steps. My training function looks like this:
...
tf.enable_eager_execution()

class PDENet:
    ...
    def train_step(self):
        input = self.input
        with tf.GradientTape() as tape, tf.Session() as sess:
            tape.watch(input)
            output = self.model(input)
            self.loss = self.pde_loss(output)  # (network does not use training data)
        grad = tape.gradient(self.loss, self.model.trainable_weights)
        self.optimizer.apply_gradients([(grad, self.model)])
    ...
Due to my hardware, I have no choice but to use tensorflow==1.12.0 and keras==2.2.4.
When I run this code, I get "RuntimeError: Attempting to capture an EagerTensor without building a function". I have seen other posts about this, but all of the answers say to update tensorflow/keras, which I can't do; to use tf.enable_eager_execution(), which I've already done; or to use tf.disable_v2_behavior(), which doesn't exist in older versions of tensorflow. Is there anything else I can do to solve this problem? The error makes me think tensorflow wants me to add @tf.function, but that decorator also doesn't exist in tensorflow 1.
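Not an authoritative answer, but for comparison, a sketch of the plain graph-mode way to write such a training step in TF 1.x, without tf.enable_eager_execution() or GradientTape (the dense layers and the squared-output loss are stand-ins for the poster's network and PDE residual):
import numpy as np
import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=[None, 1])
hidden = tf.layers.dense(inputs, 16, activation=tf.nn.tanh)   # stand-in network
output = tf.layers.dense(hidden, 1)
loss = tf.reduce_mean(tf.square(output))                       # stand-in for the PDE loss
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_batch = np.random.rand(32, 1).astype(np.float32)
    _, loss_val = sess.run([train_op, loss], feed_dict={inputs: x_batch})
    print(loss_val)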
I'm trying to build a graph in tensorflow, but it gives me an error saying a shape has the wrong rank. So I'm trying to locate the step at which something went wrong. Is there a way to find out the shapes of the elements' outputs while building the graph?
For example, my code is:
def inference_decoding_layer(start_token, end_token, embeddings, dec_cell, initial_state, output_layer,
                             max_summary_length, batch_size):
    '''Create the inference logits'''
    start_tokens = tf.tile(tf.constant([start_token], dtype=tf.int32), [batch_size], name='start_tokens')
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embeddings,  # shape (2000, 25, 768)
                                                                start_tokens,
                                                                end_token)
    inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                        inference_helper,
                                                        initial_state,
                                                        output_layer)
    inference_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(inference_decoder,
                                                               output_time_major=False,
                                                               impute_finished=True,
                                                               maximum_iterations=max_summary_length)
    return inference_decoder
The problem appears at dynamic_decode. Here is the error:
ValueError: Shape must be rank 3 but is rank 2 for 'decode/decoder/while/BasicDecoderStep/decoder/attention_wrapper/concat_6' (op: 'ConcatV2') with input shapes: [32,25,768], [32,256], [].
So I'm wondering: is there a way to find out, for example, the shape of the value we get from GreedyEmbeddingHelper and then from BasicDecoder, or of anything else in my code, so that I can locate where the problem lies?
P.S. Any other ways/suggestions for locating the problem in this case would be very much appreciated!
For the sake of easy debugging, eager mode was introduced. With eager mode, you can print the output shape after each line of code executes.
In TF 1.x, to enable it, you have to run the following code:
tf.enable_eager_execution()
In TF 2.0, eager mode is enabled by default. Also, the package you are working with (tf.contrib.seq2seq) has been moved to TensorFlow Addons in TF 2.0.
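A small illustrative sketch, using the same shapes as in the question: once eager mode is on, every intermediate tensor has a concrete shape that can be printed directly.
import tensorflow as tf
tf.enable_eager_execution()

batch_size = 32
embeddings = tf.zeros([2000, 25, 768])                          # same shape as in the question
start_tokens = tf.tile(tf.constant([1], dtype=tf.int32), [batch_size])

print(embeddings.shape)     # (2000, 25, 768)
print(start_tokens.shape)   # (32,)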
I am a novice programmer trying to follow this guide.
However, I ran across an issue. The guide says to define the loss function as:
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
This gives me the following error:
sparse_categorical_crossentropy() got an unexpected keyword argument 'from_logits'
which I take to mean that from_logits is not an argument accepted by the function. This is supported by the documentation, which says that tf.keras.losses.sparse_categorical_crossentropy() takes only two possible inputs.
Is there a way to specify that logits are being used, or is that even necessary?
I had the same problem while working through the tutorial. I changed the code from
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
to
def loss(labels, logits):
    return tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
and this resolved the issue without having to install tf-nightly.
The from_logits parameter was introduced in TensorFlow 1.13.
You can compare 1.12 and 1.13 with these urls:
https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/python/keras/losses.py
https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/python/keras/losses.py
1.13 had not been released at the time of writing. This is why the tutorial starts with the line
!pip install -q tf-nightly
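As a quick sanity check (just an illustrative snippet), you can confirm which version is actually installed before deciding whether the nightly build is needed:
import tensorflow as tf
print(tf.__version__)   # from_logits needs 1.13 or later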