Structuring a Keras project to achieve reproducible results on a GPU

I am writing a tensorflow.Keras wrapper to perform ML experiments.
I need my framework to be able to run an experiment as specified in a configuration YAML file and run in parallel on a GPU.
I then need a guarantee that, if I ran the experiment again, I would get results that are, if not identical, at least reasonably close.
To try to ensure this, my training script contains these lines at the beginning, following the guidelines in the official documentation:
# Set up random seeds
random.seed(seed)
np.random.seed(seed)
tf.set_random_seed(seed)
This has proven to not be enough.
I ran the same configuration 4 times and plotted the results: they vary a lot between runs.
How can I set up a training session in Keras to ensure I get reasonably similar results when training on a GPU? Is this even possible?
The full training script can be found here.
Some of my colleagues are using just pure TF, and their results seem far more consistent. What is more, they do not seem to be seeding any randomness except to ensure that the train and validation split is always the same.
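(For context, pinning just the split could look like the sketch below; this assumes scikit-learn's train_test_split and hypothetical x/y arrays, not their actual code.)

from sklearn.model_selection import train_test_split

# Only the split is made deterministic here; model initialization, dropout,
# and shuffling are left unseeded, as my colleagues do.
x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.2, random_state=42)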

Keras + Tensorflow.
Step 1: disable the GPU.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
Step 2: seed the libraries used in your code, e.g. tensorflow, numpy, and random.
import tensorflow as tf
import numpy as np
import random as rn

sd = 1  # seed value used everywhere

np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED'] = str(sd)

from keras import backend as K
# Force single-threaded execution; multi-threaded ops are a common source of non-determinism
config = tf.ConfigProto(intra_op_parallelism_threads=1,
                        inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
Make sure both of these snippets appear at the very start of your script; the results should then be reproducible.

Try adding seed parameters to your weight/bias initializers. Just to add more specifics to Alexander Ejbekov's comment:
TensorFlow has two kinds of random seeds: graph-level and op-level. If you are using more than one graph, you need to set the seed in each of them. You can override the graph-level seed with an op-level one by passing a seed parameter to the function, and you can even make ops from two different graphs output the same value if the same seed is set.
Consider this example:
import tensorflow as tf

g1 = tf.Graph()
with g1.as_default():
    tf.set_random_seed(1)  # graph-level seed
    # 'a' has no op-level seed, so it uses the graph-level seed
    a = tf.get_variable('a', shape=(1,), initializer=tf.keras.initializers.glorot_normal())
    # op-level seed overrides the graph-level seed for 'b'
    b = tf.get_variable('b', shape=(1,), initializer=tf.keras.initializers.glorot_normal(seed=2))

with tf.Session(graph=g1) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a))
    print(sess.run(b))

g2 = tf.Graph()
with g2.as_default():
    # different graph, no graph-level seed; op-level seed set to 1
    a1 = tf.get_variable('a1', shape=(1,), initializer=tf.keras.initializers.glorot_normal(seed=1))

with tf.Session(graph=g2) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a1))
In this example, output of a is the same as a1, but b is different.
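Translated to Keras layers, the same idea is to pass an explicit seed to each weight initializer; a minimal sketch using tf.keras (not the question's exact model):

from tensorflow.keras import layers, initializers

# Seeding the initializer pins this layer's initial weights across runs
dense = layers.Dense(10,
                     kernel_initializer=initializers.glorot_normal(seed=1),
                     bias_initializer='zeros')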

Related

How do you feed a tf.data.Dataset dynamically in eager execution mode where initializable_iterator isn't available?

What is the new approach (under eager execution) to feeding data through a dataset pipeline in a dynamic fashion, when we need to feed it sample by sample?
I have a tf.data.Dataset which performs some preprocessing steps and reads data from a generator, drawing from a large dataset during training.
Let's say that dataset is represented as:
ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])
ds = ds.map(tf.square).shuffle(2).batch(2)
iterator = tf.data.make_one_shot_iterator(ds)
After training I want to produce various visualizations which require that I feed one sample at a time through the network for inference. I've now got this dataset preprocessing pipeline that I need to feed my raw sample through to be sized and shaped appropriately for the network input.
This seems like a use case for the initializable iterator:
placeholder = tf.placeholder(tf.float32, shape=None)
ds = tf.data.Dataset.from_tensor_slices(placeholder)
ds = ds.map(tf.square).shuffle(2).batch(2)
iterator = tf.data.make_initializable_iterator(ds)
# now re-initialize for each sample
Keep in mind that the map operation in this example stands in for a long sequence of preprocessing operations that can't be duplicated for each new data sample being fed in.
This doesn't work with eager execution, though: you can't use a placeholder. The documentation examples all seem to assume a static input, such as in the first example here.
The only way I can think of doing this is with a queue and tf.data.Dataset.from_generator(...), which reads from a queue that I push to before predicting on the data. But this feels hacky and appears prone to deadlocks that I've yet to solve.
TF 1.14.0
I just realized that the answer to this question is trivial:
Just create a new dataset!
In graph (non-eager) mode, the code below would have degraded in performance, because each dataset operation would have been added to the graph and never released; that is exactly the problem the initializable iterator solves there.
In eager execution mode, however, TensorFlow operations like this are ephemeral: iterators are not added to a global graph, they simply get created and die when no longer referenced. Win one for TF 2.0!
The code below (copy/paste runnable) demonstrates:
import tensorflow as tf
import numpy as np
import time

tf.enable_eager_execution()

inp = np.ones(shape=5000, dtype=np.float32)

t = time.time()
while True:
    ds = tf.data.Dataset.from_tensors(inp).batch(1)
    val = next(iter(ds))
    assert np.all(np.squeeze(val, axis=0) == inp)
    print('Processing time {:.2f}'.format(time.time() - t))
    t = time.time()
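Applied to the original use case, the same pattern wraps each raw sample in a fresh, throwaway dataset that reuses the preprocessing map. A sketch, where preprocess and model are hypothetical stand-ins for the question's preprocessing chain and network:

def infer_single_sample(raw_sample, preprocess, model):
    # Build a tiny dataset around the single sample, reusing the same map-based preprocessing
    ds = tf.data.Dataset.from_tensors(raw_sample).map(preprocess).batch(1)
    network_input = next(iter(ds))  # eager iteration, no initializable_iterator needed
    return model(network_input)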
The motivation for the question came on the heels of this issue in 1.14 where creating multiple dataset operations in graph mode under Keras constitutes a memory leak.
https://github.com/tensorflow/tensorflow/issues/30448

How to reset the AdamOptimizer from Tensorflow while training

We are currently working on a project in which we modify a cGAN architecture on TensorFlow to see if we get better results than with standard cGANs. Because we implement a progressively growing architecture, we would like to reset the AdamOptimizer after each phase transition. We have not managed to do so: we tried multiple approaches, but we either get the error message "Graph is finalized and cannot be modified" or the parameters do not get reset.
I would be very thankful if somebody could give a hint or outline a general approach.
You just have to define the optimizer, gather the Adam variables and their initializers. Then, during the training, you can re-initialize the variables by running the initializers.
The following minimal example should point you in the right direction
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 1))
y_hat = tf.layers.Dense(10)(x)
y = 10
loss = tf.reduce_mean(tf.squared_difference(y_hat, y))
train = tf.train.AdamOptimizer().minimize(loss)

print(tf.all_variables())

# Gather the Adam slot variables and their initializers
adam_vars = [var for var in tf.all_variables() if "adam" in var.name.lower()]
print(adam_vars)
adam_reset = [var.initializer for var in adam_vars]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # do stuff with your model: train, evaluate, whatever
    # when the reset condition is met, run:
    sess.run(adam_reset)
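If you prefer a single op over a list of initializers, the same reset can be grouped with tf.variables_initializer; a small variation on the above, reusing the same adam_vars:

# Group the Adam slot variables into one re-initialization op
adam_reset_op = tf.variables_initializer(adam_vars, name='reset_adam')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... train until the phase transition, then:
    sess.run(adam_reset_op)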

Different spectrogram between audio_ops and tf.contrib.signal

I am trying to update the feature extraction pipeline of a speech command recognition model, replacing the function audio_ops.audio_spectrogram() with tf.contrib.signal.stft(). I assumed they were equivalent, but I am obtaining different spectrogram values for the same input audio. Could someone explain the relation between the two methods, or whether it is possible to obtain the same results using tf.contrib.signal.stft()?
My code:
1) audio_ops method:
from tensorflow.contrib.framework.python.ops import audio_ops
import tensorflow as tf
import numpy as np
from tensorflow.python.ops import io_ops

# WAV audio loader
wav_filename_placeholder_ = tf.placeholder(tf.string, [], name='wav_filename')
wav_loader = io_ops.read_file(wav_filename_placeholder_)
sample_rate = 16000
desired_samples = 16000  # 1 sec of audio
wav_decoder = audio_ops.decode_wav(wav_loader, desired_channels=1, desired_samples=desired_samples)

# Computing the spectrogram
spectrogram = audio_ops.audio_spectrogram(wav_decoder.audio,
                                          window_size=320,
                                          stride=160,
                                          magnitude_squared=False)

with tf.Session() as sess:
    feed_dict = {wav_filename_placeholder_: "/<folder_path>/audio_sample.wav"}
    # Get the input audio and the spectrogram
    audio_ops_wav_decoder_audio, audio_ops_spectrogram = sess.run([wav_decoder.audio, spectrogram], feed_dict)
2) tf.contrib.signal method:
# Input WAV audio (will be initialized with the same audio signal: wav_decoder.audio)
signals = tf.placeholder(tf.float32, [None, None])

# Compute the STFT and take the magnitude
stfts = tf.contrib.signal.stft(signals,
                               frame_length=320,
                               frame_step=160,
                               fft_length=512,
                               window_fn=None)
magnitude_spectrograms = tf.abs(stfts)

with tf.Session() as sess:
    feed_dict = {signals: audio_ops_wav_decoder_audio.reshape(1, 16000)}
    tf_original, tf_stfts, tf_spectrogram = sess.run([signals, stfts, magnitude_spectrograms], feed_dict)
Thank you in advance
I found these helpful comments on GitHub that discuss the differences:
https://github.com/tensorflow/tensorflow/issues/11339#issuecomment-345741527
https://github.com/tensorflow/tensorflow/issues/11339#issuecomment-443553788
You can think of audio_ops.audio_spectrogram and audio_ops.mfcc as "fused" ops (like the fused batch-norm or fused LSTM cells that TensorFlow has) for the ops in tf.contrib.signal. I think the original motivation for them was that a fused op makes it easier to provide mobile support. Long term, I think it would be nice if we removed them and provided automatic fusing via XLA, or unified the API to match the tf.contrib.signal API and provided fused keyword arguments to the tf.contrib.signal functions, like we do for tf.layers.batch_normalization.

audio_spectrogram is a C++ implementation of an STFT, while tf.signal.stft uses TensorFlow ops to compute the STFT (and thus has CPU, GPU and TPU support).

The main cause of the difference between them is that audio_spectrogram uses fft2d to compute FFTs, while tf.contrib.signal.stft uses Eigen (CPU), cuFFT (GPU), and XLA (TPU). There is another, very minor difference: the default periodic Hann window used by each is slightly different; tf.contrib.signal.stft follows numpy/scipy's definition.
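As a practical follow-up (a sketch, not verified against your audio): the second snippet passes window_fn=None, which disables windowing entirely, while audio_spectrogram applies a periodic Hann window by default. Re-enabling the default window and comparing the arrays should leave only the backend and window-definition differences described above:

# Same STFT, but keeping tf.contrib.signal.stft's default periodic Hann window
stfts_windowed = tf.contrib.signal.stft(signals,
                                        frame_length=320,
                                        frame_step=160,
                                        fft_length=512)
magnitude_windowed = tf.abs(stfts_windowed)

with tf.Session() as sess:
    tf_spec_windowed = sess.run(magnitude_windowed,
                                {signals: audio_ops_wav_decoder_audio.reshape(1, 16000)})

a = np.squeeze(audio_ops_spectrogram)
b = np.squeeze(tf_spec_windowed)
print('audio_ops shape:', a.shape, 'tf.contrib.signal shape:', b.shape)
if a.shape == b.shape:
    # Exact equality is not expected (different FFT backends, slightly different Hann windows)
    print('max abs difference:', np.max(np.abs(a - b)))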

Keras + Tensorflow : Debug NaNs

Here is a great question on how to find the first occurrence of NaN in a TensorFlow graph:
Debugging nans in the backward pass
The answer is quite helpful, here is the code from it:
train_op = ...
check_op = tf.add_check_numerics_ops()
sess = tf.Session()
sess.run([train_op, check_op]) # Runs training and checks for NaNs
Apparently, running the training and the numerical check at the same time will result in an error report as soon as a NaN is encountered for the first time.
How do I integrate this into Keras?
In the documentation, I can't find anything that looks like this.
I checked the code, too.
The update step is executed here:
https://github.com/fchollet/keras/blob/master/keras/engine/training.py
There is a function called _make_train_function where an operation to compute the loss and apply updates is created. This is later called to train the network.
I could change the code like this (always assuming that we're running on a tf backend):
check_op = tf.add_check_numerics_ops()
self.train_function = K.function(inputs,
                                 [self.total_loss] + self.metrics_tensors + [check_op],
                                 updates=updates, name='train_function',
                                 **self._function_kwargs)
I'm currently trying to set this up properly and am not sure whether the code above actually works.
Maybe there is an easier way?
I've been running into the exact same problem, and found an alternative to the add_check_numerics_ops() approach. Instead of going that route, I use the TensorFlow Debugger (tfdbg) to walk through my model, following the example in https://www.tensorflow.org/guide/debugger, to figure out exactly where my code produces NaNs. This snippet replaces the TensorFlow session that Keras is using with a debugging session, allowing you to use tfdbg.
from tensorflow.python import debug as tf_debug
sess = K.get_session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
K.set_session(sess)
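Following the linked debugger guide, you can also register its has_inf_or_nan tensor filter on the wrapped session, so the tfdbg CLI can run until the first NaN/Inf tensor appears (a sketch based on that guide):

from tensorflow.python import debug as tf_debug
from keras import backend as K

sess = K.get_session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
# Standard inf/nan filter from the TensorFlow debugger guide
sess.add_tensor_filter('has_inf_or_nan', tf_debug.has_inf_or_nan)
K.set_session(sess)

# Then call model.fit(...) as usual; in the tfdbg CLI,
# `run -f has_inf_or_nan` keeps running until a tensor containing NaN or Inf shows up.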

TensorFlow Inference Graph - Loading and Restoring Variables impact

This is closely related to a lot of questions, including one of my own here: TensorFlow Inference
Every sample in TensorFlow for inference appears to follow this form:
import tensorflow as tf
import CONSTANTS
import Vgg3CIFAR10
import numpy as np

MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
rand = np.random.rand(1, 32, 32, 3).astype(np.float32)
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
logits = Vgg3CIFAR10.inference(images)

def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')
        new_saver.restore(sess, MODEL_PATH)
        print(sess.run(logits, feed_dict={images: rand}))
    print('done')

run_inference()
Issues:
Restoring the model and graph does just that... except that I am creating a parallel graph here, possibly adding new parts to the graph. (TensorFlow graphs are append-only, so how does this add to the graph and run just that segment? If it is appended, it would want to run the whole thing.)
What happens to the queue runners that existed in the loaded graph? All of those ops are loaded; by printing sess.graph.get_operations() you can see that all of the old input ops are there.
Does logits = Vgg3CIFAR10.inference(images) not append new items to the graph? If it comes down to naming, does the placeholder input replace the queue-runner input?
Possible answer for a few items: because I defined the logits op first, the rest of the graph got appended after that, and via some TensorFlow magic sauce the variables from the original graph got restored into the logits portion of the graph?
So I tested this out, and it doesn't even work properly...
It first creates a graph with logits, then appends the old graph to it. So when you call inference, you just get a bunch of garbage back...
[[ 0.09815982 0.09611271 0.10542709 0.10383813 0.0955615 0.10979554
0.12138291 0.09316944 0.08336139 0.09319157]]
[[ 0.10305423 0.092167 0.10572157 0.10368075 0.1043573 0.10057402
0.12435613 0.08916584 0.07929172 0.09763144]]
[[ 0.1068181 0.09361464 0.10377798 0.10060066 0.10110897 0.09462726
0.11688241 0.09941135 0.0869903 0.09616835]]
Here I was expecting node 8, followed by nodes 2 and 2, to be the ones surfaced... obviously it's just a bunch of nothing.
So after a ton of review this is what happens...
If you add anything to the graph before restoring a graph, the restored graph is appended to the already created graph.
Restoring variables looks, at restore time, for variables in your graph whose names match the variable names stored in the meta files. If you created a graph with certain variable names and then restored a graph that also has those variable names, the test I ran showed the appended, stored graph receiving the restored variables, not the initial graph.
So, in summary: be careful about what you are doing with your graphs and be very aware of how things get appended and restored. If you were looking at this in hopes of figuring out inference, then look at this S.O. question, where the answer involves creating a new graph and restoring variables into that new graph, which is in fact a subgraph of the original.
TensorFlow Inference
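For completeness, a minimal sketch of that approach (build a fresh graph, then restore only the variables into it); this assumes Vgg3CIFAR10.inference creates variables with the same names as those in the checkpoint:

import tensorflow as tf
import numpy as np
import CONSTANTS
import Vgg3CIFAR10

MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'

graph = tf.Graph()
with graph.as_default():
    images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
    logits = Vgg3CIFAR10.inference(images)  # build only the inference ops
    saver = tf.train.Saver()                # matches checkpoint variables by name

with tf.Session(graph=graph) as sess:
    saver.restore(sess, MODEL_PATH)         # no import_meta_graph, so no duplicated graph
    rand = np.random.rand(1, 32, 32, 3).astype(np.float32)
    print(sess.run(logits, feed_dict={images: rand}))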
