I am a novice programmer trying to follow this guide.
However, I ran across an issue. The guide says to define the loss function as:
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
This gives me the following error:
sparse_categorical_crossentropy() got an unexpected keyword argument 'from_logits'
which I take to mean that from_logits is not an argument the function accepts. The documentation supports this: it shows that tf.keras.losses.sparse_categorical_crossentropy() takes only two inputs.
Is there a way to specify that logits are being used, or is that even necessary?
I had the same problem while working through the tutorial. I changed the code from
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
to
def loss(labels, logits):
    return tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
and this resolved the issue without having to install tf-nightly.
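For anyone wondering why the swap is safe: from_logits=True tells the loss that the inputs are unnormalized scores, and tf.nn.sparse_softmax_cross_entropy_with_logits likewise applies the softmax internally. Below is a minimal pure-Python sketch of that computation. The function name and the sample numbers are my own illustration, not part of TensorFlow.

```python
import math

def sparse_softmax_cross_entropy(labels, logits):
    """Per-example cross-entropy computed directly from unnormalized logits."""
    losses = []
    for label, row in zip(labels, logits):
        # Subtract the row maximum before exponentiating for numerical stability.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        # -log(softmax(row)[label]) equals log_sum_exp - row[label].
        losses.append(log_sum_exp - row[label])
    return losses

# Two examples with three classes each; labels are integer class indices.
losses = sparse_softmax_cross_entropy([0, 2], [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
```

With all-zero logits every class is equally likely, so the second loss comes out as log(3).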
The from_logits parameter was introduced in TensorFlow 1.13.
You can compare 1.12 and 1.13 with these URLs:
https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/python/keras/losses.py
https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/python/keras/losses.py
1.13 is not released at the time of writing. This is why the tutorial starts with the line
!pip install -q tf-nightly
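If you want your script to pick the right call automatically, a small version check is one option. This is only a sketch: the helper name is made up, the 1.13 cutoff is taken from the comparison above, and in real code you would pass in tf.__version__.

```python
def supports_from_logits(version):
    """Return True if a TensorFlow version string is 1.13 or newer."""
    # Only major and minor matter; this also copes with nightly-style
    # strings such as "1.13.0-dev20190116".
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 13)

old_ok = supports_from_logits("1.12.0")
nightly_ok = supports_from_logits("1.13.0-dev20190116")
```

When the check returns False, the tf.nn.sparse_softmax_cross_entropy_with_logits variant is the fallback.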
I am trying to build a neural network in Python for solving PDEs, and, as such, I have had to write custom training steps. My training function looks like this:
...
tf.enable_eager_execution()
class PDENet:
    ...
    def train_step():
        input = self.input
        with tf.GradientTape() as tape, tf.Session() as sess:
            tape.watch(input)
            output = self.model(input)
            self.loss = self.pde_loss(output) # (network does not use training data)
        grad = tape.gradient(self.loss, self.model.trainable_weights)
        self.optimizer.apply_gradients([(grad, self.model)])
...
Due to my hardware, I have no choice but to use tensorflow==1.12.0 and keras==2.2.4.
When I run this code, I get "RuntimeError: Attempting to capture an EagerTensor without building a function". I have seen other posts about this, but all of the answers say to do one of three things: update tensorflow/keras, which I can't; use "tf.enable_eager_execution()", which I've already done; or use "tf.disable_v2_behavior()", which doesn't exist in older versions of tensorflow. Is there anything else I can do to solve this problem? The error makes me think tensorflow wants me to add @tf.function, but that feature also doesn't seem to exist in tensorflow 1.
I am trying to train a CNN in tensorflow (keras) with different learning rates per layer. As this option is not included in tensorflow, I am trying to modify an already existing optimizer, as suggested in this github comment.
When I simply copy the source code of the SGD optimizer that I found in
"C:\Users\user\Anaconda3\envs\tf_gpu\Lib\site-packages\tensorflow_core\python\keras\optimizers.py"
(without any modifications) into my script and I try to run it I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-2-757ea125a090> in <module>
----> 1 opti = SGD(lr=0.001, momentum=0.9, nesterov=False)
<ipython-input-1-dad895e21911> in __init__(self, lr, momentum, decay, nesterov, **kwargs)
63
64 def __init__(self, lr=0.01, momentum=0., decay=0., nesterov=False, **kwargs):
---> 65 super(SGD, self).__init__(**kwargs)
66 with K.name_scope(self.__class__.__name__):
67 self.iterations = K.variable(0, dtype='int64', name='iterations')
TypeError: __init__() missing 1 required positional argument: 'name'
Do I need to make modifications directly in the source code and compile tensorflow from source or am I just doing something wrong?
After attempting to install tensorflow (with GPU support) and getting a lot of errors with CuDNN and CUDA, I ended up installing it using
conda install tensorflow-gpu
which worked perfectly, so I am a bit scared to uninstall the now running tensorflow version and compile it from source.
Also, if you know other ways to modify the learning rate for each layer individually using tensorflow, please let me know.
This could be from a bad implementation.
I solved this by simply adding a name argument, name='sgd':
opti = SGD(lr=0.001, momentum=0.9, nesterov=False, name='sgd')
Let me know if that works
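To show the mechanism behind the TypeError, here is a small stand-alone sketch. Both classes are hypothetical stand-ins, not the real TensorFlow classes: I am assuming the copied SGD ends up inheriting from a base class whose constructor requires name, while the copy only forwards **kwargs.

```python
class BaseOptimizer:
    """Hypothetical stand-in for a base class whose constructor requires a name."""

    def __init__(self, name, **kwargs):
        self.name = name


class SGD(BaseOptimizer):
    """Hypothetical copy of an optimizer that forwards only **kwargs to its base."""

    def __init__(self, lr=0.01, momentum=0.0, nesterov=False, **kwargs):
        # 'name' reaches the base class only if the caller put it in kwargs.
        super(SGD, self).__init__(**kwargs)
        self.lr = lr
        self.momentum = momentum
        self.nesterov = nesterov


try:
    SGD(lr=0.001, momentum=0.9, nesterov=False)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing name explicitly lets it travel through **kwargs to the base class.
opti = SGD(lr=0.001, momentum=0.9, nesterov=False, name='sgd')
```

If this is what is going on in your copy, nothing needs to be recompiled. The call just has to supply the argument the base class expects.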
I'm trying to build a graph in tensorflow, but it gives me an error saying that a shape has the wrong rank. So I'm trying to locate the step at which something went wrong. Is there a way to find out the shapes of each element's outputs while building a graph?
For example, my code is:
def inference_decoding_layer(start_token, end_token, embeddings, dec_cell, initial_state, output_layer,
                             max_summary_length, batch_size):
    '''Create the inference logits'''
    start_tokens = tf.tile(tf.constant([start_token], dtype=tf.int32), [batch_size], name='start_tokens')
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embeddings, #shape (2000,25,768)
                                                                start_tokens,
                                                                end_token)
    inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                        inference_helper,
                                                        initial_state,
                                                        output_layer)
    inference_logits, _ , _ = tf.contrib.seq2seq.dynamic_decode(inference_decoder,
                                                                output_time_major=False,
                                                                impute_finished=True,
                                                                maximum_iterations=max_summary_length)
    return inference_decoder
The problem appears at dynamic_decode. Here is the error:
ValueError: Shape must be rank 3 but is rank 2 for 'decode/decoder/while/BasicDecoderStep/decoder/attention_wrapper/concat_6' (op: 'ConcatV2') with input shapes: [32,25,768], [32,256], [].
So I'm wondering whether there is a way to find out, for example, the shape of the value we get from GreedyEmbeddingHelper, then from BasicDecoder, or from anything else in my code, so that I can locate where the problem lies.
P.S. If there are any other ways/suggestions of how to locate the problem in this case I would be very grateful!
For the sake of easy debugging, eager mode has been introduced. With eager mode, you can keep printing the output shape after each line of code is executed.
In TF 1.x, to enable it, you have to run the following code:
tf.enable_eager_execution()
In TF 2.0, eager mode is enabled by default. Also, the packages you are working with have been moved to TensorFlow Addons in TF 2.0.
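You can also reason about the shapes by hand. The sketch below is pure Python and only mimics the shape rules of an embedding lookup and a concat; both helper functions are hypothetical, not TensorFlow APIs. It assumes the "#shape (2000,25,768)" comment in the question is accurate and that the helper looks up one embedding per start token. The 2-D matrix [2000, 768] is just an illustrative alternative.

```python
def embedding_lookup_shape(embeddings_shape, ids_shape):
    """Shape of a lookup: the ids shape followed by every axis after the vocabulary axis."""
    return list(ids_shape) + list(embeddings_shape[1:])


def concat_shape(shapes, axis=-1):
    """Shape of a concatenation; every input must have the same rank."""
    rank = len(shapes[0])
    for shape in shapes:
        if len(shape) != rank:
            raise ValueError("Shape must be rank %d but is rank %d" % (rank, len(shape)))
    axis = axis % rank
    for shape in shapes[1:]:
        for i in range(rank):
            if i != axis and shape[i] != shapes[0][i]:
                raise ValueError("Dimension %d differs: %s vs %s" % (i, shapes[0], shape))
    out = list(shapes[0])
    out[axis] = sum(shape[axis] for shape in shapes)
    return out


batch_size = 32
# A rank-3 embeddings tensor gives a rank-3 decoder input per step.
step_input = embedding_lookup_shape([2000, 25, 768], [batch_size])
attention = [batch_size, 256]  # second input shape from the error message

try:
    concat_shape([step_input, attention])
    problem = None
except ValueError as exc:
    problem = str(exc)

# With a rank-2 embedding matrix the two inputs line up.
fixed_input = embedding_lookup_shape([2000, 768], [batch_size])
fixed = concat_shape([fixed_input, attention])
```

Under those assumptions, step_input matches the [32,25,768] in the error message, which points at the embeddings argument rather than at dynamic_decode itself.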
Here is a great question on how to find the first occurrence of NaN in a tensorflow graph:
Debugging nans in the backward pass
The answer is quite helpful; here is the code from it:
train_op = ...
check_op = tf.add_check_numerics_ops()
sess = tf.Session()
sess.run([train_op, check_op]) # Runs training and checks for NaNs
Apparently, running the training and the numerical check at the same time will result in an error report as soon as a NaN is encountered for the first time.
How do I integrate this into Keras?
In the documentation, I can't find anything that looks like this.
I checked the code, too.
The update step is executed here:
https://github.com/fchollet/keras/blob/master/keras/engine/training.py
There is a function called _make_train_function where an operation to compute the loss and apply updates is created. This is later called to train the network.
I could change the code like this (always assuming that we're running on a tf backend):
check_op = tf.add_check_numerics_ops()
self.train_function = K.function(inputs,
                                 [self.total_loss] + self.metrics_tensors + [check_op],
                                 updates=updates, name='train_function', **self._function_kwargs)
I'm currently trying to set this up properly and not sure whether the code above actually works.
Maybe there is an easier way?
I've been running into the exact same problem and found an alternative to the add_check_numerics_ops() function. Instead of going that route, I use the TensorFlow Debugger (tfdbg) to walk through my model, following the example in https://www.tensorflow.org/guide/debugger, to figure out exactly where my code produces NaNs. This snippet should replace the TensorFlow Session that Keras is using with a debugging session, allowing you to use tfdbg.
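from keras import backend as K
from tensorflow.python import debug as tf_debug
sess = K.get_session()  # the session Keras is currently using
sess = tf_debug.LocalCLIDebugWrapperSession(sess)  # wrap it with the tfdbg CLI
K.set_session(sess)  # make Keras use the wrapped session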
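Whichever tool you use, the underlying idea is the same: look at the intermediate values in execution order and stop at the first one that is not finite. Here is a rough pure-Python illustration of that idea. The op names and values are made up, and this does not call TensorFlow at all.

```python
import math

def first_non_finite(named_values):
    """Return the name of the first entry containing NaN or Inf, or None if all are finite."""
    for name, values in named_values:
        if any(math.isnan(v) or math.isinf(v) for v in values):
            return name
    return None

# Hypothetical per-op outputs, listed in execution order.
trace = [
    ("dense_1/BiasAdd", [0.5, -1.2]),
    ("loss/Log", [float("-inf"), 0.0]),
    ("loss/Mean", [float("nan")]),
]
culprit = first_non_finite(trace)
```

In this toy trace the NaN in loss/Mean is only a symptom. The first bad value is the -inf produced by the log, which is why finding the first occurrence matters.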
I am using some code from here: https://github.com/monikkinom/ner-lstm with tensorflow. I think the code was written for an older version of tensorflow; I am using version 1.0.0. I used tf_upgrade.py to upgrade model.py in that github repo, but I am still getting the error:
output, _, _ = contrib_rnn.bidirectional_rnn(fw_cell, bw_cell,
AttributeError: 'module' object has no attribute 'bidirectional_rnn'
This is after I changed the bidirectional_rnn call to use contrib_rnn, which is imported as:
from tensorflow.contrib.rnn.python.ops import core_rnn as contrib_rnn
The old call was
output, _, _ = tf.nn.bidirectional_rnn(fw_cell, bw_cell,
                                       tf.unpack(tf.transpose(self.input_data, perm=[1, 0, 2])),
                                       dtype=tf.float32, sequence_length=self.length)
which also doesn't work.
I had to change the LSTMCell, DropoutWrapper, etc. to rnn.LSTMCell and so on, but they seem to work fine. It is the bidirectional_rnn that I can't figure out how to change.
In TensorFlow 1.0, you have the choice of two bidirectional RNN functions:
tf.nn.bidirectional_dynamic_rnn()
tf.contrib.rnn.static_bidirectional_rnn()
Maybe you can try to reimplement a bidirectional RNN by simply wrapping into a single class two monodirectional RNNs with the parameter "go_backwards=True" set on one of them. Then you can also have control over the type of merge done with the outputs. Maybe taking a look at the implementation in https://github.com/fchollet/keras/blob/master/keras/layers/wrappers.py (see the class Bidirectional) could get you started.
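A bare-bones sketch of that wrapping idea, in plain Python with a toy cell so it runs without TensorFlow or Keras. The class name, the run_rnn helper and the running-sum cell are all illustrative assumptions; a real version would wrap actual RNN layers.

```python
def run_rnn(step, inputs, initial_state, go_backwards=False):
    """Run a one-directional RNN and return one output per time step, in input order."""
    sequence = list(reversed(inputs)) if go_backwards else list(inputs)
    state = initial_state
    outputs = []
    for x in sequence:
        state = step(x, state)
        outputs.append(state)
    # Re-align the backward outputs with the original time order.
    return list(reversed(outputs)) if go_backwards else outputs


def pair(forward_output, backward_output):
    """Default merge: keep both directions side by side."""
    return (forward_output, backward_output)


class Bidirectional:
    """Hypothetical wrapper that merges a forward pass and a backward pass."""

    def __init__(self, forward_step, backward_step, merge=pair):
        self.forward_step = forward_step
        self.backward_step = backward_step
        self.merge = merge

    def __call__(self, inputs, initial_state=0.0):
        fw = run_rnn(self.forward_step, inputs, initial_state)
        bw = run_rnn(self.backward_step, inputs, initial_state, go_backwards=True)
        return [self.merge(f, b) for f, b in zip(fw, bw)]


def running_sum(x, state):
    """Toy recurrent cell: the new state is the old state plus the input."""
    return state + x


layer = Bidirectional(running_sum, running_sum)
outputs = layer([1.0, 2.0, 3.0])
```

Swapping the merge function (sum, average, concatenation) is where you get control over how the two directions are combined.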