Keras + Tensorflow : Debug NaNs - python

Here is a great question on how to find the first occurence of Nan in a tensorflow graph:
Debugging nans in the backward pass
The answer is quite helpful, here is the code from it:
train_op = ...
check_op = tf.add_check_numerics_ops()
sess = tf.Session()
sess.run([train_op, check_op]) # Runs training and checks for NaNs
Apparently, running the training and the numerical check at the same time will result in an error report as soon as Nan is encountered for the first time.
How do I integrate this into Keras ?
In the documentation, I can't find anything that looks like this.
I checked the code, too.
The update step is executed here:
https://github.com/fchollet/keras/blob/master/keras/engine/training.py
There is a function called _make_train_function where an operation to compute the loss and apply updates is created. This is later called to train the network.
I could change the code like this (always assuming that we're running on a tf backend):
check_op = tf.add_check_numerics_ops()
self.train_function = K.function(inputs,
[self.total_loss] + self.metrics_tensors + [check_op],
updates=updates, name='train_function', **self._function_kwargs)
I'm currently trying to set this up properly and not sure whether the code above actually works.
Maybe there is an easier way ?

I've been running into the exact same problem, and found an alternative to the check_add_numerics_ops() function. Instead of going that route, I use the TensorFlow Debugger to walk through my model, following the example in https://www.tensorflow.org/guide/debugger to figure out exactly where my code produces nans. This snippet should work for replacing the TensorFlow Session that Keras is using with a debugging session, allowing you to use tfdbg.
from tensorflow.python import debug as tf_debug
sess = K.get_session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
K.set_session(sess)

Related

TensorFlow RuntimeError: "Attempting to capture an EagerTensor without building a function"

I am trying to build a neural network in Python for solving PDEs, and, as such, I have had to write custom training steps. My training function looks like this:
...
tf.enable_eager_execution()
class PDENet:
...
def train_step():
input = self.input
with tf.GradientTape() as tape, tf.Session() as sess:
tape.watch(input)
output = self.model(input)
self.loss = self.pde_loss(output) # (network does not use training data)
grad = tape.gradient(self.loss, self.model.trainable_weights)
self.optimizer.apply_gradients([(grad, self.model)])
...
Due to my hardware, I have no choice but to use tensorflow==1.12.0 and keras==2.2.4.
When I run this code, I get "RuntimeError: Attempting to capture an EagerTensor without building a function". I have seen other posts about this, but all of the answers say to update tensorflow/keras, which I can't, use "tf.enable_eager_execution()", which I've already done, and "tf.disable_v2_behavior()", which is nonexistent on older versions of tensorflow. Is there anything else I can do to solve this problem? The error makes me think tensorflow wants me to add #tf.function, but that feature also doesn't seem to exist in tensorflow 1.

In tensorflow-probability, how do I update a learnable prior used only in KL-divergence?

I'm working on a variational auto-encoder and I'd like the prior used in the KL-divergence regularization of the latent distribution to have its loc (mean) and scale (stddev) updated.
The below snippet is a contrived minimal example demonstrating what I'm trying to achieve. This starts to work but then just freezes after some random number of epochs (sometimes 1, sometimes 200, but usually around 7 or 8). There's no error message or anything.
loc = tf.Variable(tf.random.normal([ndim], stddev=0.1, dtype=tf.float32))
scale = tfp.util.TransformedVariable(
tf.random.normal([ndim], mean=1.0, stddev=0.1, dtype=tf.float32),
bijector=tfb.Chain([tfb.Shift(1e-5), tfb.Softplus(), tfb.Shift(0.5413)]))
prior = tfd.Independent(tfd.Normal(loc=loc, scale=scale), reinterpreted_batch_ndims=1)
_input = tfkl.Input(shape=(1,))
_loc = tfkl.Dense(ndim, name="loc_params")(_input)
_scale = tfkl.Dense(ndim, name="untransformed_scale_params")(_input)
_scale = tf.math.softplus(_scale + np.log(np.exp(1) - 1)) + 1e-5
_output = tfpl.DistributionLambda(
make_distribution_fn=lambda t: tfd.Independent(tfd.Normal(loc=t[0], scale=t[1])),
activity_regularizer=tfpl.KLDivergenceRegularizer(prior, use_exact_kl=True, weight=0.1)
)([_loc, _scale])
model = tf.keras.Model(_input, _output)
model.compile(optimizer='adam', loss=lambda y_true, model_out: -model_out.log_prob(y_true))
hist = model.fit(ds, epochs=N_EPOCHS, verbose=2)
I have a runnable gist here.
A more concrete example, and an architecture close to what I'm trying to update and simplify, is the tfp example for disentangled_vae. In its manual training loop, a new tfd.MultivariateNormalDiag is instantiated on every loop, though it is parameterized using persistent tf.Variables. I'm trying my best to avoid manual training loops, and I'm also trying to move to more Keras-like syntax, so I'd rather not do a direct port of this example.
Any advice is greatly appreciated. Thanks!
Edit: The activity_regularizer seems to work fine when attached to a latent (bottleneck) distribution. I have a more complete example in this Colab notebook. As this works in my architecture, I'm no longer in need of an answer.
However, I highly doubt having model fitting freeze is desirable behaviour, so this remains a problem.
As the machinery works in most circumstances, just not the contrived freezing example above, I no longer consider this a question that needs an answer.
I have reported the errorless freezing behaviour via the tensorflow-probability repository issues page. See here.

Structuring a Keras project to achieve reproducible results in GPU

I am writing a tensorflow.Keras wrapper to perform ML experiments.
I need my framework to be able to perform an experiment as specified in a configuration yaml file and run in parallel in a GPU.
Then I need a guarantee that if I ran the experiment again I would get if not the exact same results something reasonably close.
To try to ensure this, my training script contains these lines at the beginning, following the guidelines in the official documentation:
# Set up random seeds
random.seed(seed)
np.random.seed(seed)
tf.set_random_seed(seed)
This has proven to not be enough.
I ran the same configuration 4 times, and plotted the results:
As you can see, results vary a lot between runs.
How can I set up a training session in Keras to ensure I get reasonably similar results when training in a GPU? Is this even possible?
The full training script can be found here.
Some of my colleagues are using just pure TF, and their results seem far more consistent. What is more, they do not seem to be seeding any randomness except to ensure that the train and validation split is always the same.
Keras + Tensorflow.
Step 1, disable GPU.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
Step 2, seed those libraries which are included in your code, say "tensorflow, numpy, random".
import tensorflow as tf
import numpy as np
import random as rn
sd = 1 # Here sd means seed.
np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED']=str(sd)
from keras import backend as K
config = tf.ConfigProto(intra_op_parallelism_threads=1,inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
Make sure these two pieces of code are included at the start of your code, then the result will be reproducible.
Try adding seed parameters to weights/biases initializers. Just to add more specifics to Alexander Ejbekov's comment.
Tensorflow has two random seeds graph level and op level. If you're using more than one graph, you need to specify seed in every one. You can override graph level seed with op level, by setting seed parameter within function. And you can make two functions even from different graphs output same value if same seed is set.
Consider this example:
g1 = tf.Graph()
with g1.as_default():
tf.set_random_seed(1)
a = tf.get_variable('a', shape=(1,), initializer=tf.keras.initializers.glorot_normal())
b = tf.get_variable('b', shape=(1,), initializer=tf.keras.initializers.glorot_normal(seed=2))
with tf.Session(graph=g1) as sess:
sess.run(tf.global_variables_initializer())
print(sess.run(a))
print(sess.run(b))
g2 = tf.Graph()
with g2.as_default():
a1 = tf.get_variable('a1', shape=(1,), initializer=tf.keras.initializers.glorot_normal(seed=1))
with tf.Session(graph=g2) as sess:
sess.run(tf.global_variables_initializer())
print(sess.run(a1))
In this example, output of a is the same as a1, but b is different.

How to reset the AdamOptimizer from Tensorflow while training

We are currently working on a project in which we change a cGAN architecture on Tensorflow to see if we get better results than standard cGANs. Due to the fact that we implement a progressivly growing architecture we would like to reset the AdamOptimizer from Tensorflow after each phase transition. Nonetheless we still did not manage to do so. We tried multiple approaches but either we get the error message "Graph is finalized and cannot be modified" or the parameters do not get reset.
Would be very thankful if somebody could give a hint or a general approach.
You just have to define the optimizer, gather the Adam variables and their initializers. Then, during the training, you can re-initialize the variables by running the initializers.
The following minimal example should point you in the right direction
import tensorflow as tf
x = tf.placeholder(tf.float32, shape=(None, 1))
y_hat = tf.layers.Dense(10)(x)
y = 10
loss = tf.reduce_mean(tf.squared_difference(y_hat, y))
train = tf.train.AdamOptimizer().minimize(loss)
print(tf.all_variables())
adam_vars = [var for var in tf.all_variables() if "adam" in var.name.lower()]
print(adam_vars)
adam_reset = [var.initializer for var in adam_vars]
with tf.Session() as sess:
# do stuff with your model: train, evaluate, whatever
# when the reset condition is met, run:
sess.run(adam_reset)

Tensorflow Dataset .map() API

Couple of questions about this
For occasions when I'd like to do something like the following in Tensorflow (assume I'm creating training examples by loading WAV files):
import tensorflow as tf
def _some_audio_preprocessing_func(filename):
# ... some logic here which mostly uses Tensorflow ops ...
with tf.Session(graph=tf.Graph()) as sess:
wav_filename_placeholder = tf.placeholder(tf.string, [])
wav_loader = io_ops.read_file(wav_filename_placeholder)
wav_decoder = contrib_audio.decode_wav(wav_loader, desired_channels=1)
data = sess.run(
[wav_decoder],
feed_dict={wav_filename_placeholder: filename})
return data
dataset = tf.data.Dataset.list_files('*.wav')
dataset = dataset.map(_some_preprocessing_func)
If I have a parse_image() function that uses tensor ops - should
this be part of the main Graph? Following the example set in Google's own audio TF tutorial, it looks like they create a separate graph! Doesn't this ruin the point of using Tensorflow to make things faster?
Do I use tf.py_func() any time any single line isn't from the tensorflow library? Again, I wonder what the performance implications are and when I should use this...
Thanks!
When you use Dataset.map(map_func), TensorFlow defines a subgraph for all the ops created in the function map_func, and arranges to execute it efficiently in the same session as the rest of your graph. There is almost never any need to create a tf.Graph or tf.Session inside map_func: if your parsing function is made up of TensorFlow ops, these ops can be embedded directly in the graph that defines the input pipeline.
The modified version of the code using tf.data would look like this:
import tensorflow as tf
from tensorflow.contrib.framework.python.ops import audio_ops as contrib_audio
def _some_audio_preprocessing_func(filename):
wav_loader = tf.read_file(filename)
return contrib_audio.decode_wav(wav_loader, desired_channels=1)
dataset = tf.data.Dataset.list_files('*.wav')
dataset = dataset.map(_some_preprocessing_func)
If your map_func contains non-TensorFlow operations that you want to apply to each element, you should wrap them in a tf.py_func() (or Dataset.from_generator(), if the data generation process is defined in Python logic). The main performance implication is that any code running in a tf.py_func() is subject to the Global Interpreter Lock, so I would generally recommend trying to find a native TensorFlow implementation for anything that is performance critical.

Categories