Plot Gradients of Individual Layers in Tensorboard

I have a GCMLE experiment and I want to plot the global norm of layer-wise gradients in tensorflow. I can plot the global norm of all gradients, but I'd like to specifically plot the gradients for only the embeddings. Here is my current code:
gradients, variables = zip(*train_op.compute_gradients(loss))
tf.summary.scalar("gradients", tf.global_norm(gradients))
I also know that I should be able to get all of the variables using tf.trainable_variables(), but I am not sure of the easiest way to separate out each layer. I'm guessing that I need to know each layer/variable name and create tensors representing the specific variables of interest? I think it would need to be something like:
list_of_embedding_variables = [somehow grab the relevant names from tf.trainable_variables()]
embedding_gradients = [g for g, v in zip(gradients, variables) if v in list_of_embedding_variables]
tf.summary.scalar("embedding_gradients", tf.global_norm(embedding_gradients))
Because I am running this as a GCMLE experiment, I can't use sess.run() to print all of the variable names. Is there any way to view the list of tf.trainable_variables() in the saved graph from a GCMLE experiment, or to display these variable names within TensorBoard?
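On the "how do I even see the variable names" part: variable names are available at graph-construction time, so they can be written to the job logs with tf.logging and read from the Cloud ML Engine console without any sess.run(). A minimal sketch (nothing here is GCMLE-specific):
import tensorflow as tf

# Log every trainable variable's name and shape at graph-construction time;
# these lines appear in the GCMLE job logs, no session needed.
for v in tf.trainable_variables():
    tf.logging.info('trainable variable: %s %s', v.name, v.shape)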
Option 1
One thought that I have had is that I should create collections of the variables of interest -- for example if my embedding sequence is:
embedding_sequence = tf.contrib.layers.embed_sequence(
    sequence, vocab_size=n_tokens, embed_dim=word_embedding_size)
tf.add_to_collection("embedding_collection", embedding_sequence)
tf.summary.scalar("embedding_gradients",
                  tf.global_norm(tf.get_collection("embedding_collection")))

Something like this:
grads_and_vars = train_op.compute_gradients(loss)
train_summary = []
for g, v in grads_and_vars:
    if g is not None:
        # print(format(v.name))
        grad_hist_summary = tf.summary.histogram("{}/grad_histogram".format(v.name), g)
        sparsity_summary = tf.summary.scalar("{}/grad/sparsity".format(v.name), tf.nn.zero_fraction(g))
        train_summary.append(grad_hist_summary)
        train_summary.append(sparsity_summary)
tf.summary.merge(train_summary)
Let me know if this works.

Related

Tensorboard Image Summaries

I use Matplotlib to create custom t-SNE embedding plots at each epoch during training. I would like the plots to be displayed on Tensorboard in a slider format, like this MNIST example:
But instead, each batch of plots is displayed as a separate summary per epoch, which is really hard to review later. See below:
It appears to be creating multiple image summaries with the same name, so it appends an _X suffix instead of overwriting or adding to the slider like I want. Similarly, when I use the family param, the images are grouped differently but the _X suffix is still appended to the summary name scope.
This is my code to create the custom plots, add them to tf.summary.image, and add the evaluated summary to the summary writer:
def _visualise_embedding(step, summary_writer, features, silhouettes, sample_size=1000):
    '''
    Visualise features embedding image by adding plot to summary writer to track on Tensorboard
    '''
    # Select random sample
    feats_to_sils = list(zip(features, silhouettes))
    shuffle(feats_to_sils)
    feats, sils = zip(*feats_to_sils)
    feats = feats[:sample_size]
    sils = sils[:sample_size]

    # Embed feats to 2 dim space
    embedded_feats = perform_tsne(2, feats)

    # Plot features embedding
    im_bytes = plot_embedding(embedded_feats, sils)

    # Convert PNG buffer to TF image
    image = tf.image.decode_png(im_bytes, channels=4)

    # Add the batch dimension
    image = tf.expand_dims(image, 0)

    summary_op = tf.summary.image("model_projections", image, max_outputs=1, family='family_name')

    # Summary has to be evaluated (converted into a string) before adding to the writer
    summary_writer.add_summary(summary_op.eval(), step)
I understand I might get the slider plots I want if I add the visualise method as an operation to the graph so as to avoid the name duplication issue. But I need to be able to loop through my evaluated tensor values to perform t-SNE to create the embeddings...
I've been stuck on this for a while, so any advice is appreciated!
This can be achieved by using tf.Summary.Image()
For example:
im_summary = tf.Summary.Image(encoded_image_string=im_bytes)
im_summary_value = [tf.Summary.Value(tag=self.confusion_matrix_tensor_name,
                                     image=im_summary)]
This is a summary.proto method, so it wasn't obvious to me at first, as the method definition is not accessible through Tensorflow. I only realised its functionality when I found a code snippet of it being used on GitHub.
Either way, it exposes image summaries as slides on Tensorboard like I wanted. 💪
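For reference, a minimal end-to-end sketch of that approach (assuming im_bytes holds the encoded PNG bytes from plot_embedding and step is the current global step); keeping the tag fixed across steps is what makes TensorBoard render the images as a slider:
import tensorflow as tf

def write_image_summary(summary_writer, tag, im_bytes, step):
    # Build the summary proto directly, bypassing tf.summary.image and its
    # automatic tag de-duplication (the _X suffix problem above).
    im_summary = tf.Summary.Image(encoded_image_string=im_bytes)
    summary = tf.Summary(value=[tf.Summary.Value(tag=tag, image=im_summary)])
    summary_writer.add_summary(summary, global_step=step)
    summary_writer.flush()

# e.g. once per epoch:
# write_image_summary(summary_writer, 'model_projections', im_bytes, epoch)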

How to window or reset streaming operations in tensorflow?

Tensorflow provides all sorts of nice streaming operations to aggregate statistics along batches, such as tf.metrics.mean.
However I find that accumulating all values since the beginning often does not make a lot of sense. For example, one could rather want to have statistics per epoch, or any other time window that makes sense in a given context.
Is there any way to restrict the history of such streaming statistics, for example by resetting streaming operations so that they start the accumulation over?
Work-arounds:
accumulate by hand across batches
use a "soft" sliding window using an EMA (see the sketch below)
One way to do it is to call the initializer of the relevant variables in the streaming op. For example,
import tensorflow as tf

x = tf.random_normal(())
mean_x, update_op = tf.metrics.mean(x, name='mean_x')

# get the initializers of the local variables (total and count)
my_metric_variables = [v for v in tf.local_variables() if v.name.startswith('mean_x/')]
# or maybe just
# my_metric_variables = tf.get_collection('metric_variables')
reset_ops = [v.initializer for v in my_metric_variables]

with tf.Session() as sess:
    tf.local_variables_initializer().run()
    for _ in range(100):
        for _ in range(100):
            sess.run(update_op)
        print(sess.run(mean_x))
        # if you comment the following out, the estimate of the mean converges to 0
        sess.run(reset_ops)
The metrics in tf.contrib.eager.metrics (which work both with and without eager execution) have a init_variable() op you can call if you want to reset their internal variables.

Initializing variables, variable scope and import_graph_def in tensorflow

I have a number of related questions about tensorflow behavior when attempting to do graph surgery using import_graph_def.
[Figure: 2 different graph surgeries]
In the image above, I represent with bold red arrows 2 different graph surgeries. On the left, there are 2 graphs, g1 and g2, and the surgery consists of replacing a node in graph g2 by a node - and everything below it - from graph g1. How to do that is explained in this post. The surgery on the right, which involves replacing nodes that belong to the same graph, I haven't been able to figure out how to perform, or even if it is at all possible. I ended up with this minimal example
import numpy as np
import tensorflow as tf

with tf.Graph().as_default() as g1:
    with tf.variable_scope('foo', reuse=tf.AUTO_REUSE):
        x = tf.placeholder(dtype=tf.float64, shape=[2], name='x')
        c = tf.get_variable('c', initializer=tf.cast(1.0, tf.float64))
        y = tf.identity(2 * x, 'y')
        z = tf.identity(3 * x * c, 'z')

        g1_def = g1.as_graph_def()
        z1, = tf.import_graph_def(g1_def, input_map={'foo/x:0': y},
                                  return_elements=["foo/z:0"], name='z1')
        init_op = tf.global_variables_initializer()
        print(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='foo'))

    with tf.Session(graph=g1) as sess:
        sess.run(init_op)
        print(sess.run(z, feed_dict={'foo/x:0': np.array([1.0, 2.0])}))
        print(sess.run(tf.report_uninitialized_variables()))
        # z1 = sess.run(z1, feed_dict={'foo/x:0': np.array([1.0, 2.0])})
This code runs as it is. The 3 prints yield respectively:
[<tf.Variable 'foo/c:0' shape=() dtype=float64_ref>]
[ 3. 6.]
[]
In particular, the last print informs that there are no uninitialized variables. However, uncommenting the last line yields the error
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value foo/z1/foo/c
Note that if I remove c from the definition of z above, this would also work. However, I would like to understand this error. To begin with, why is the variable reported as foo/z1/foo/c? Why does the scope foo appear twice? Why is nothing reported when I print the uninitialized variables? Why is only foo/c reported when I print the GLOBAL_VARIABLES collection under the scope foo?
PS: I guess that there is a simpler way to ask the question which is, what is the tensorflow analogue of
theano.clone(some_tensor, replace={input_var : replace_var})
To begin with, why is the variable reported as foo/z1/foo/c?
Why does the scope foo appear twice?
After you've called tf.import_graph_def(...), the graph got duplicated. The first graph is defined in the foo scope. The second subgraph has been imported under the scope foo/z1 (because name='z1', plus foo is preserved from the scope above). So the graph g1 now contains the following tensors:
foo/x
foo/y
foo/c
...
foo/z1/foo/x
foo/z1/foo/y
foo/z1/foo/c
...
The first foo/c is initialized, but the second foo/z1/foo/c is not (see below).
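A quick way to see those names for yourself (a one-liner to run right after the tf.import_graph_def call in the question's snippet):
# Print every operation in g1; both the original foo/* nodes and the
# imported foo/z1/foo/* copies should be listed.
for op in g1.get_operations():
    print(op.name)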
Why is nothing reported when I print the uninitialized variables? Why is only foo/c reported when I print the GLOBAL_VARIABLES collection under the scope foo?
Since report_uninitialized_variables() scans LOCAL_VARIABLES and GLOBAL_VARIABLES by default, this is basically the same question.
And it probably is a bug: the GLOBAL_VARIABLES collection isn't updated after the tf.import_graph_def call. I say probably because GLOBAL_VARIABLES was designed as a mere convenience collection. Tensorflow tries to keep it up to date, but probably doesn't guarantee it always has all variables. The fact that tf.add_to_collection exists publicly supports this idea -- one can add any value to any collection if they want to. Bottom line: this behavior may or may not change in future versions, but as of 1.5 the client is responsible for updating the global variables collection after a graph import.
In particular, the last print informs that there are no uninitialized variables. However, uncommenting the last line yields the error
To fix this error, you simply need to run the initializer for the z1 subgraph. Like this:
# note that it's defined before `g1.as_graph_def()` to be a part of the graph def
init_op = tf.global_variables_initializer()
g1_def = g1.as_graph_def()
z1, = tf.import_graph_def(g1_def, input_map={'foo/x:0': y},
                          return_elements=["foo/z:0"], name='z1')

# find the init op
z1_init_op = tf.get_default_graph().get_operation_by_name('foo/z1/foo/init')
...
sess.run(z1_init_op)
And voila! You have the duplicated graphs, just like you wanted.
I faced a similar issue but simply running the init operation didn't work.
I fixed it by manually running all "Assign" ops of the global variables of the imported graph.
In my scenario I want to run an encoding op 'z' with input 'patch:0' using two different input tensors.
with tf.Session(graph=tf.get_default_graph()).as_default() as sess:
    g = tf.Graph()
    saved_model = predictor.from_saved_model(args.export_dir, graph=g)

    variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
    fetch_ops = ['z:0', 'init']
    fetch_ops.extend([v.name.strip(":0") + "/Assign" for v in variables])

    image_graph = tf.graph_util.import_graph_def(
        g.as_graph_def(),
        input_map={'patch:0': image},
        return_elements=fetch_ops,
        name='image')

    warped_graph = tf.graph_util.import_graph_def(
        g.as_graph_def(),
        input_map={'patch:0': warped_image},
        return_elements=fetch_ops,
        name='warp')

    loss = tf.reduce_sum(tf.math.squared_difference(image_graph[0], warped_graph[0]))

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0001)
    compute_gradients = optimizer.compute_gradients(
        loss,
        var_list=[dest_control_point_locations])
    apply_gradients = optimizer.apply_gradients(compute_gradients, global_step=step)

    sess.run(image_graph[1:])
    sess.run(warped_graph[1:])
    sess.run(tf.global_variables_initializer())
    gradients = sess.run(compute_gradients)
When I extracted the operation and ran it by feeding my tensors with feed_dict, the gradient computation didn't work; that's why I used tf.graph_util.import_graph_def(...).
Hope this might help anyone facing the same issue.

Using op inputs when defining custom gradients in TensorFlow

I'm trying to define a gradient method for my custom TF operation. Most of the solutions I have found online seem to be based on a gist by harpone. I'm reluctant to use that approach as it uses py_func, which won't run on a GPU. I found another solution here that uses tf.identity() and looks more elegant, and I think it will run on a GPU. However, I have some problems accessing the inputs of the op in my custom gradient function. Here's my code:
import tensorflow as tf

@tf.RegisterGradient('MyCustomGradient')
def _custom_gradient(op, gradients):
    x = op.inputs[0]
    return(x)

def my_op(w):
    return tf.pow(w, 3)

var_foo = tf.Variable(5, dtype=tf.float32)
bar = my_op(var_foo)

g = tf.get_default_graph()
with g.gradient_override_map({'Identity': 'MyCustomGradient'}):
    bar = tf.identity(bar)
    g = tf.gradients(bar, var_foo)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(g))
I was expecting _custom_gradient() to return the input to the op (5 in this example), but instead it seems to return the op output multiplied by the gradient. My custom my_op will have non-differentiable operations like tf.sign, and I'd like to define my custom gradient based on the inputs. What am I doing wrong?
There is no problem with your code:
Let's first do the forward pass:
var_foo = 5 -> bar = 125 -> tf.identity(bar) = 125
Now let's backpropagate:
The gradient of tf.identity(bar) with respect to its argument bar equals (by your definition) bar, that is, 125. The gradient of bar with respect to var_foo equals 3 times the square of var_foo, which is 75. Multiply them, and you get 9375, which is indeed the output of your code.
op.inputs[0] contains the forward-pass value of the op's input. In this case, the input to the identity op is bar, whose value is 125 (not var_foo's value of 5).
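To base the gradient on the input while still respecting the chain rule, the function should combine op.inputs[0] with the incoming gradient and return the product. A sketch for the tf.sign case mentioned in the question, using a straight-through style estimator (the clip-to-[-1, 1] rule is just one illustrative choice):
import tensorflow as tf

@tf.RegisterGradient('StraightThroughSign')
def _straight_through_sign_grad(op, grad):
    x = op.inputs[0]
    # Pass the upstream gradient through only where the input lies in [-1, 1].
    return grad * tf.cast(tf.abs(x) <= 1.0, grad.dtype)

w = tf.Variable(0.5, dtype=tf.float32)
g = tf.get_default_graph()
with g.gradient_override_map({'Sign': 'StraightThroughSign'}):
    y = tf.sign(w)
dy_dw = tf.gradients(y, w)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(dy_dw))  # [1.0] because |0.5| <= 1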

How does one read TensorBoard histograms for a 1D example in TensorFlow?

I made the simplest 1D example for TensorBoard (tracking the minimization of a quadratic) but I get plots that don't make sense to me and I can't figure out why. Is it my own implementation or is TensorBoard buggy?
Here are the plots:
HISTOGRAM:
Usually I think of histograms as bar graphs that encode probability distributions (or frequency counts). I assume that the y-axis shows the values and the x-axis the count? Since my number of steps is 120, that seemed a reasonable guess.
and the scalar plot:
Why is there a strange line going through my plots?
The code that produced it (you should be able to copy paste it and run it):
## run cmd to collect model: python playground.py --logdir=/tmp/playground_tmp
## show board on browser run cmd: tensorboard --logdir=/tmp/playground_tmp
## browser: http://localhost:6006/
import tensorflow as tf

# x variable
x = tf.Variable(10.0, name='x')
# b placeholder (simulates the "data" part of the training)
b = tf.placeholder(tf.float32)
# make model (1/2)(x-b)^2
xx_b = 0.5 * tf.pow(x - b, 2)
y = xx_b

learning_rate = 1.0
# get optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate)
# gradient variable list = [ (gradient, variable) ]
gv = opt.compute_gradients(y, [x])
# transformed gradient variable list = [ (T(gradient), variable) ]
decay = 0.9  # decay the gradient for the sake of the example
# apply transformed gradients
tgv = [(decay * g, v) for (g, v) in gv]  # list [(grad, var)]
apply_transform_op = opt.apply_gradients(tgv)

# track value of x
x_scalar_summary = tf.scalar_summary("x", x)
x_histogram_summary = tf.histogram_summary('x_his', x)

with tf.Session() as sess:
    merged = tf.merge_all_summaries()
    tensorboard_data_dump = '/tmp/playground_tmp'
    writer = tf.train.SummaryWriter(tensorboard_data_dump, sess.graph)

    sess.run(tf.initialize_all_variables())
    epochs = 120
    for i in range(epochs):
        b_val = 1.0  # fake data (in SGD it would be different on every epoch)
        # applies the gradients
        [summary_str_apply_transform, _] = sess.run([merged, apply_transform_op], feed_dict={b: b_val})
        writer.add_summary(summary_str_apply_transform, i)
I also met the same problem, where multiple lines appeared in the same chart in TensorBoard (even when I tried your code, TensorBoard showed the duplicate-graph warning below and presented only one curve, which is better than in my case):
WARNING:tensorflow:Found more than one graph event per run. Overwriting the graph with the newest event.
Nevertheless, the solution is the same as @Olivier Moindrot mentioned: delete the old logs. Sometimes TensorBoard caches results, so you may also want to restart the TensorBoard service.
The way to make sure we present the newest summary, as the MNIST example shows, is to log to a clean folder:
if tf.gfile.Exists(FLAGS.summaries_dir):
    tf.gfile.DeleteRecursively(FLAGS.summaries_dir)
tf.gfile.MakeDirs(FLAGS.summaries_dir)
Link to full source, with TF version r0.10: https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py
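A related habit that avoids stale or mixed event files without deleting anything: write each run to its own timestamped subdirectory, so TensorBoard lists the runs side by side instead of overwriting graphs. A sketch using the same r0.10-era API as the snippet above (the directory layout is arbitrary):
import time
import tensorflow as tf

run_dir = '/tmp/playground_tmp/run_%d' % int(time.time())
with tf.Session() as sess:
    writer = tf.train.SummaryWriter(run_dir, sess.graph)
    # ... add summaries per step as in the snippet above ...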
