TensorBoard recording stats separate from training - python

I'm trying to use TensorBoard to display some graphs of a neural network training run. (That is, graphs of test and validation accuracy during training, not just of the network structure.) There is some example code, as well as some questions on this site, all of which seem to follow the same pattern as the example code. That is, the pattern always revolves around something like
summary, _ = sess.run([merged, train_step], ...)
So basically, the operation of running a training step and the operation of recording statistics for graph display are being conflated.
This is fine as far as it goes, but I'm trying to retrofit this kind of graphing onto an existing program that inevitably does things in a slightly different way, so the example code won't work as is. What I really want to do is isolate some code that just records the statistics, separate from the existing code that does the training.
How do you record statistics for TensorBoard, within the main training loop, but separate from the code that does the training?

You can manually create a tf.Summary object that stores the scalar values and pass it to a tf.summary.FileWriter, as in the following example:
summary_writer = tf.summary.FileWriter("path_to_log_dir")
# ...
for i in range(max_training_steps):
    # compute the values of interest
    scalar_value_1 = ...
    # ...
    scalar_value_n = ...
    # manually create a tf.Summary object
    summary = tf.Summary(
        value=[tf.Summary.Value(tag="Metrics_1", simple_value=scalar_value_1),
               # ...
               tf.Summary.Value(tag="Metrics_n", simple_value=scalar_value_n)])
    summary_writer.add_summary(summary, i)
# ...
summary_writer.close()
Alternatively, you can define each tf.summary.scalar() operation over a tf.placeholder and feed the actual values at run time:
scalar_pl_1 = tf.placeholder(tf.float32)
tf.summary.scalar("Metrics_1", scalar_pl_1)
# ...
scalar_pl_n = tf.placeholder(tf.float32)
tf.summary.scalar("Metrics_n", scalar_pl_n)

# Merge all summaries
merged = tf.summary.merge_all()

summary_writer = tf.summary.FileWriter("path_to_log_dir")
with tf.Session() as sess:
    for i in range(max_training_steps):
        # compute scalar values of interest
        scalar_value_1 = ...
        scalar_value_n = ...
        feed_dict = {scalar_pl_1: scalar_value_1, scalar_pl_n: scalar_value_n}
        summary = sess.run(merged, feed_dict=feed_dict)
        summary_writer.add_summary(summary, i)
# ...
summary_writer.close()

Related

Tensorflow contrib.summary API - recording scalars every n-th step does not work properly

Recently I have started playing around with TensorBoard. First, I just wanted to do a simple visualization of the loss function over a few hundred steps. For that, I wanted to use the tf.contrib.summary API.
My code works except for a slight annoyance: let's say I want to perform 250 optimizer steps and record the loss at each of them, so I do something like this (some chunks of code are missing):
graph = tf.Graph()
sess = tf.Session(graph=graph)
with sess.graph.as_default():
    ...  # lines that define the computation graph as well as input dataset and predictions

    global_step = tf.train.create_global_step()
    rmse = tf.math.sqrt(tf.losses.mean_squared_error(labels=Y, predictions=Y_PRED))
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(rmse, global_step=global_step)

    # create summary writer, tensor for recording scalar and initialize everything
    summary_writer = tf.contrib.summary.create_file_writer(args.logdir, flush_millis=10 * 1000)
    summaries = {}
    with summary_writer.as_default(), tf.contrib.summary.always_record_summaries():
        summaries["train_rmse"] = tf.contrib.summary.scalar("train/RMSE", rmse)

    sess.run(tf.global_variables_initializer())
    with summary_writer.as_default():
        tf.contrib.summary.initialize(session=sess, graph=graph)

for i in range(250):
    train_X_batch, train_Y_batch = ...  # retrieve batch of data from dataset
    sess.run(optimizer, feed_dict={X: train_X_batch, Y: train_Y_batch})
    sess.run(summaries["train_rmse"], {X: train_X, Y: train_Y})
But when I do this and then visualize the results in TensorBoard, my train_rmse was recorded only 241 times instead of the 250 times I would expect, given that I've used tf.contrib.summary.always_record_summaries(). (See the image.)
This issue seems to be data dependent. When I try a similar thing on the MNIST dataset and record some scalars for the same number of steps, the number of recorded steps is something like 200.
I've tried to find the answer in the TensorFlow documentation, but without success. I've also checked things like not having enough data for the 250 steps; this should not be an issue.
One more thing: this happens even when I use the record_summaries_every_n_global_steps(n) call. For example, calling it with n = 5 records steps only up to the 215th step.
Could anyone help me with this please?

TensorFlow on multiple GPU

Recently, I have been trying to learn how to use TensorFlow on multiple GPUs by reading the official tutorial. However, there is something that I am confused about. The following code is part of the official tutorial; it calculates the loss on a single GPU.
def tower_loss(scope, images, labels):
    # Build inference Graph.
    logits = cifar10.inference(images)

    # Build the portion of the Graph calculating the losses. Note that we will
    # assemble the total_loss using a custom function below.
    _ = cifar10.loss(logits, labels)

    # Assemble all of the losses for the current tower only.
    losses = tf.get_collection('losses', scope)

    # Calculate the total loss for the current tower.
    total_loss = tf.add_n(losses, name='total_loss')

    # Attach a scalar summary to all individual losses and the total loss; do the
    # same for the averaged version of the losses.
    for l in losses + [total_loss]:
        # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
        # session. This helps the clarity of presentation on tensorboard.
        loss_name = re.sub('%s_[0-9]*/' % cifar10.TOWER_NAME, '', l.op.name)
        tf.summary.scalar(loss_name, l)

    return total_loss
The training process is as follows.
def train():
    with tf.device('/cpu:0'):
        # Create a variable to count the number of train() calls. This equals the
        # number of batches processed * FLAGS.num_gpus.
        global_step = tf.get_variable(
            'global_step', [],
            initializer=tf.constant_initializer(0), trainable=False)

        # Calculate the learning rate schedule.
        num_batches_per_epoch = (cifar10.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN /
                                 FLAGS.batch_size / FLAGS.num_gpus)
        decay_steps = int(num_batches_per_epoch * cifar10.NUM_EPOCHS_PER_DECAY)

        # Decay the learning rate exponentially based on the number of steps.
        lr = tf.train.exponential_decay(cifar10.INITIAL_LEARNING_RATE,
                                        global_step,
                                        decay_steps,
                                        cifar10.LEARNING_RATE_DECAY_FACTOR,
                                        staircase=True)

        # Create an optimizer that performs gradient descent.
        opt = tf.train.GradientDescentOptimizer(lr)

        # Get images and labels for CIFAR-10.
        images, labels = cifar10.distorted_inputs()
        batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
            [images, labels], capacity=2 * FLAGS.num_gpus)

        # Calculate the gradients for each model tower.
        tower_grads = []
        with tf.variable_scope(tf.get_variable_scope()):
            for i in xrange(FLAGS.num_gpus):
                with tf.device('/gpu:%d' % i):
                    with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
                        # Dequeues one batch for the GPU
                        image_batch, label_batch = batch_queue.dequeue()

                        # Calculate the loss for one tower of the CIFAR model. This function
                        # constructs the entire CIFAR model but shares the variables across
                        # all towers.
                        loss = tower_loss(scope, image_batch, label_batch)

                        # Reuse variables for the next tower.
                        tf.get_variable_scope().reuse_variables()

                        # Retain the summaries from the final tower.
                        summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
However, I am confused about the loop 'for i in xrange(FLAGS.num_gpus)'. It seems that I have to get a new image batch from batch_queue and calculate each gradient in turn, so I think this process is serialized rather than parallel. Is there anything wrong with my understanding? By the way, can I also use an iterator to feed images to my model rather than the dequeue?
Thank you everybody!
This is a common misconception about TensorFlow's coding model.
What you are showing here is the computation graph's construction, NOT the actual execution.
The block:
for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
            # Dequeues one batch for the GPU
            image_batch, label_batch = batch_queue.dequeue()
            # Calculate the loss for one tower of the CIFAR model. This function
            # constructs the entire CIFAR model but shares the variables across
            # all towers.
            loss = tower_loss(scope, image_batch, label_batch)
translates to:
For each GPU device (`for i in range..` & `with device...`):
- build operations needed to dequeue a batch
- build operations needed to run the batch through the network and compute the loss
Note how, via tf.get_variable_scope().reuse_variables(), you're telling the graph that the variables used by each GPU's copy of the graph must be shared among all of them (i.e., the graphs on the different devices "reuse" the same variables).
None of this actually runs the network even once (note that there is no sess.run()): you're just describing how the data must flow.
Then, when you start the actual training (I guess you left that piece of code out when copying it here), each GPU will pull its own batch and produce its per-tower loss. I guess these losses (or rather their gradients) are combined somewhere in the subsequent code, and the combined result is what the optimizer applies.
Up until the point where the tower results are combined, everything is independent of the other devices, so fetching a batch and computing the loss can happen in parallel. Then the gradient averaging and parameter update are done only once, the variables are updated, and the cycle repeats.
So, to answer your question: no, the per-batch loss computation is not serialized. But since this is synchronous distributed computation, you need to collect the losses from all GPUs before you are allowed to continue with the gradient computation and parameter update, so some part of the graph still cannot be independent.
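For reference, here is a minimal sketch (written from memory, not copied from the tutorial) of the kind of gradient-averaging step that cifar10_multi_gpu_train.py performs once per iteration, after every tower has appended its (gradient, variable) pairs to tower_grads:
import tensorflow as tf

def average_gradients(tower_grads):
    # tower_grads is a list with one entry per GPU; each entry is a list of
    # (gradient, variable) pairs for that tower.
    averaged = []
    # zip(*tower_grads) groups together the pairs that belong to the same
    # variable across all towers.
    for grad_and_vars in zip(*tower_grads):
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
        # The variable itself is shared across towers, so take it from the first one.
        averaged.append((grad, grad_and_vars[0][1]))
    return averaged

# Later, outside the per-GPU loop (still graph construction, still no sess.run):
# grads = average_gradients(tower_grads)
# train_op = opt.apply_gradients(grads, global_step=global_step)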

When using TFRecord, how can I run intermediate validation check? (a better way?)

Let's say I defined a network Net and the example code below runs well.
# ... input processing using TFRecord ...  # reading from TFRecord
x, y = tf.train.batch([image, label])  # encode batch
net = Net(x, y)  # connect to network
# ... initialize and session ...
for iteration:
    loss, _ = sess.run([net.loss, net.train_op])
The Net does not have a tf.placeholder, since the input is provided by tensors from the TFRecord provider. What if I would like to run the validation set as well, e.g., every 500 steps? How can I switch the input flow?
x, y = tf.train.batch([image, label], ...)  # training set
vx, vy = tf.train.batch([vimage, vlabel], ...)  # validation set
net = Net(x, y)
for iteration:
    loss, _ = sess.run([net.loss, net.train_op])
    if step % 500 == 0:
        # graph is already defined from input to loss.
        # how can I run net.loss with vx and vy??
The only thing I can imagine is modifying Net to have placeholders, and running something like this every time:
sess.run([...], feed_dict = {Net.x:sess.run(x), Net.y:sess.run(y)})
sess.run([...], feed_dict = {Net.x:sess.run(vx), Net.y:sess.run(vy)})
However, it seems to me that this loses the benefits of using TFRecord (e.g., full TF integration). In the middle of the computation flow, I have to stop the flow, fetch the data with one sess.run, and then continue (doesn't this hurt speed by forcing a round trip through the CPU in the middle?).
I am wondering:
- if there is a better way;
- if my solution is really as bad as I imagine.
Thanks in advance.
There is a better way (than placeholders). I ran into this issue with the CIFAR10 tutorial in TensorFlow, which I adjusted to check accuracy on the test set concurrently with training, every 500 batches or so. This is where sharing variables comes in handy.
x, y = tf.train.batch([image, label], ...)  # training set
vx, vy = tf.train.batch([vimage, vlabel], ...)  # validation set
with tf.variable_scope("model") as scope:
    net = Net(x, y)
    scope.reuse_variables()
    vnet = Net(vx, vy)
for iteration:
    loss, _ = sess.run([net.loss, net.train_op])
    if step % 500 == 0:
        loss, acc = sess.run([vnet.loss, vnet.accuracy])
By setting the scope to reuse variables on the second call to Net(), you will use the same tensors and values created in the first call, but with a different set of inputs. Just make sure that vimage and vlabel aren't reusing tensors from image and label (which could possibly be solved by creating their own variable scopes).
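For concreteness, here is a minimal sketch of what such a Net might look like so that the reuse actually takes effect; the layer names, sizes, loss, and optimizer are made up for illustration. The key requirement is that the weights are created through tf.get_variable (which tf.layers does internally), because only tf.get_variable participates in variable-scope reuse:
import tensorflow as tf

class Net(object):
    """Toy model whose weights are created via tf.get_variable (through tf.layers),
    so that scope.reuse_variables() makes a second instance share them."""
    def __init__(self, x, y):
        # Hypothetical two-layer classifier; assumes y holds integer class labels.
        hidden = tf.layers.dense(x, 128, activation=tf.nn.relu, name="fc1")
        logits = tf.layers.dense(hidden, 10, name="fc2")
        self.loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
        correct = tf.equal(tf.argmax(logits, 1), tf.cast(y, tf.int64))
        self.accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
        # Plain SGD creates no optimizer slot variables, so building it a second
        # time under a reusing scope does not try to create anything new.
        self.train_op = tf.train.GradientDescentOptimizer(0.01).minimize(self.loss)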

TensorFlow print input tensors?

I'm building a TF training program and attempting to diagnose some issues we are seeing with it. The root problem is that the gradients are always NaN. This is against the CIFAR10 data set (we wrote our own program from scratch to ensure we understand all of the mechanics properly).
It's too much code to post here, so it is here: https://github.com/drcrook1/CIFAR10
At this point we are fairly certain the issue is not the learning rate (we took that sucker down to 1e-25 and still got NaNs; we also simplified the network to a single MLP layer).
What we think is likely happening is that the values being read in by the input pipeline are wrong; therefore we want to print the values from the TFRecordReader pipeline to double-check that it is in fact reading and decoding the samples properly. As you know, you can only print a TF value if you know its name or have it captured as a variable, so that brings up the point: how does one print an input tensor from a mini batch?
Thanks for any tips!
It turns out you can return examples and labels as operations and then simply print them during graph execution.
def create_sess_ops():
    '''
    Creates and returns operations needed for running
    a tensorflow training session
    '''
    GRAPH = tf.Graph()
    with GRAPH.as_default():
        examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
                                              batch_size=CONSTANTS.BATCH_SIZE,
                                              img_shape=CONSTANTS.IMAGE_SHAPE,
                                              num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
        examples = tf.reshape(examples, [CONSTANTS.BATCH_SIZE, CONSTANTS.IMAGE_SHAPE[0],
                                         CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]])
        logits = Vgg3CIFAR10.inference(examples)
        loss = Vgg3CIFAR10.loss(logits, labels)
        OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
        #OPTIMIZER = tf.train.RMSPropOptimizer(CONSTANTS.LEARNING_RATE)
        gradients = OPTIMIZER.compute_gradients(loss)
        apply_gradient_op = OPTIMIZER.apply_gradients(gradients)
        gradients_summary(gradients)
        summaries_op = tf.summary.merge_all()
        return [apply_gradient_op, summaries_op, loss, logits, examples, labels], GRAPH
Notice that in the above code we use the input queue runners to grab examples and labels and feed them into the graph. We then return examples and labels as operations alongside all of our other operations, which can then be used during a session run:
def main():
    '''
    Run and Train CIFAR 10
    '''
    print('starting...')
    ops, GRAPH = create_sess_ops()
    total_duration = 0.0
    with tf.Session(graph=GRAPH) as SESSION:
        COORDINATOR = tf.train.Coordinator()
        THREADS = tf.train.start_queue_runners(SESSION, COORDINATOR)
        SESSION.run(tf.global_variables_initializer())
        SUMMARY_WRITER = tf.summary.FileWriter('Tensorboard/' + CONSTANTS.MODEL_NAME)
        GRAPH_SAVER = tf.train.Saver()
        for EPOCH in range(CONSTANTS.EPOCHS):
            duration = 0
            error = 0.0
            start_time = time.time()
            for batch in range(CONSTANTS.MINI_BATCHES):
                # ops is [apply_gradient_op, summaries_op, loss, logits, examples, labels],
                # so unpack all six results from the session run.
                _, summaries, cost_val, prediction, examples, labels = SESSION.run(ops)
                print(np.where(np.isnan(prediction)))
                print(prediction[0])
                print(labels[0])
                plt.imshow(examples[0])
                plt.show()
                error += cost_val
            duration += time.time() - start_time
            total_duration += duration
            SUMMARY_WRITER.add_summary(summaries, EPOCH)
            print('Epoch %d: loss = %.2f (%.3f sec)' % (EPOCH, error, duration))
            if EPOCH == CONSTANTS.EPOCHS - 1 or error < 0.005:
                print(
                    'Done training for %d epochs. (%.3f sec)' % (EPOCH, total_duration)
                )
                break
Notice that in the above code we fetch the examples and labels operations, so we can now print a variety of things: whether anything is NaN, the prediction array itself, the label, and we even use matplotlib to plot an example image from each mini batch.
This is exactly what I was looking to do; I needed it to verify my issue. The root cause turned out to be labels being read incorrectly, which produced infinite gradients because the labels did not match the examples.
Have you looked at the tf.Print operator?
https://www.tensorflow.org/api_docs/python/tf/Print
If you add this to your graph with an input from one of the nodes you suspect of causing the problem, you should be able to see the results in stderr.
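For example, a minimal sketch, assuming examples and labels are the tensors coming out of your input pipeline (the names are illustrative):
# Wrap the suspect tensor: the data flows through unchanged, but the first
# `summarize` values of each listed tensor are dumped to stderr on every run.
examples = tf.Print(examples, [examples, labels],
                    message='input batch (examples, labels): ',
                    summarize=20)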
You may also find the check_numerics operator useful for debugging your problem:
How to check NaN in gradients in Tensorflow when updating?
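As a rough sketch of how it might be used (the tensor names are illustrative), you can either guard a single tensor or instrument the whole graph:
# Passes `loss` through unchanged, but raises an InvalidArgumentError the
# moment it contains a NaN or Inf, telling you where bad values first appear.
loss = tf.check_numerics(loss, message='loss became NaN or Inf')

# Or add a check for every floating-point tensor in the graph and run it
# alongside the training op:
check_op = tf.add_check_numerics_ops()
# sess.run([train_op, check_op], ...)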
This looks like an ideal use-case for the official TensorFlow Debugger.
From the first example on the page:
from tensorflow.python import debug as tf_debug
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
From your description, it seems that you, too, need the tf_debug.has_inf_or_nan tensor filter to start your debugging.

TensorFlow: Does each session run initiate a different batch of data in a graph?

Say that I have the following graph:
images, labels = load_batch(...)
with slim.arg_scope(inception_resnet_v2_arg_scope()):
    logits, end_points = inception_resnet_v2(images, num_classes=dataset.num_classes, is_training=True)
predictions = tf.argmax(end_points['Predictions'], 1)
accuracy, accuracy_update = tf.contrib.metrics.streaming_accuracy(predictions, labels)
# ...
train_op = slim.learning.create_train_op(...)
and, in a supervisor's managed_session as sess within the graph context, I run the following every once in a while:
print sess.run(logits)
print sess.run(end_points['Predictions'])
print sess.run(predictions)
print sess.run(labels)
Do they actually pull in a different batch for each sess.run, given that the batch tensor has to flow from load_batch onwards before it ever reaches logits, predictions, or labels? Because when I run each of these sessions, I get very confusing results: even the predictions do not match tf.argmax(end_points['Predictions'], 1), and despite the model's high accuracy, I do not get any predictions that even remotely match the labels well enough to produce that kind of accuracy. Therefore I suspect that each of the results from sess.run comes from a different batch of data.
This brings me to my next question: is there a way to inspect the results of different parts of the graph as a batch from load_batch flows all the way to the train_op, which is where sess.run is actually invoked? In other words, is there a way to do what I want without calling another sess.run?
Also, if I were to check the results using sess.run in such a way, would it affect my training, in that some batches of data would be skipped and never reach the train_op?
I realized the problem with running separate sess.run calls is that the data loaded is different each time. Instead, when I did something like:
logits, probabilities, predictions, labels = sess.run([logits, probabilities, predictions, labels])
print 'logits: \n', logits
print 'Probabilities: \n', probabilities
print 'predictions: \n', predictions
print 'Labels:\n:', labels
All the quantities coincide, just as I had expected. I had also tried using tf.Print by writing something like:
logits = tf.Print(logits, [logits], message = 'logits: \n', summarize = 100)
immediately after defining logits, so that they get printed within the same session run as the train_op. However, the printing is rather messy, so I would prefer the first method of running everything in one session call to obtain the values and then printing them normally as numpy arrays.
