I use Matplotlib to create custom t-SNE embedding plots at each epoch during training. I would like the plots to be displayed on TensorBoard in a slider format, like this MNIST example:
But instead each batch of plots is displayed as separate summaries per epoch, which is really hard to review later. See below:
It appears to be creating multiple image summaries with the same name, so TensorFlow appends an _X suffix to each one instead of overwriting it or adding it to a slider like I want. Similarly, when I use the family param, the images are grouped differently but _X is still appended to the summary name scope.
This is my code to create the custom plots, wrap them in tf.summary.image, and add the evaluated summary to the summary writer.
def _visualise_embedding(step, summary_writer, features, silhouettes, sample_size=1000):
    '''
    Visualise features embedding image by adding plot to summary writer to track on Tensorboard
    '''
    # Select random sample
    feats_to_sils = list(zip(features, silhouettes))
    shuffle(feats_to_sils)
    feats, sils = zip(*feats_to_sils)
    feats = feats[:sample_size]
    sils = sils[:sample_size]
    # Embed feats to 2 dim space
    embedded_feats = perform_tsne(2, feats)
    # Plot features embedding
    im_bytes = plot_embedding(embedded_feats, sils)
    # Convert PNG buffer to TF image
    image = tf.image.decode_png(im_bytes, channels=4)
    # Add the batch dimension
    image = tf.expand_dims(image, 0)
    summary_op = tf.summary.image("model_projections", image, max_outputs=1, family='family_name')
    # Summary has to be evaluated (converted into a string) before adding to the writer
    summary_writer.add_summary(summary_op.eval(), step)
I understand I might get the slider plots I want if I add the visualise method as an operation to the graph so as to avoid the name duplication issue. But I need to be able to loop through my evaluated tensor values to perform t-SNE to create the embeddings...
I've been stuck on this for a while, so any advice is appreciated!
This can be achieved by using tf.Summary.Image()
For example:
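# Wrap the PNG bytes directly in an image summary proto (no graph op is created)
im_summary = tf.Summary.Image(encoded_image_string=im_bytes)
# Reusing the same tag at every step keeps the images under one summary
im_summary_value = [tf.Summary.Value(tag=self.confusion_matrix_tensor_name,
                                     image=im_summary)]
summary_writer.add_summary(tf.Summary(value=im_summary_value), step)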
This is defined in summary.proto, so it wasn't obvious to me at first, as the definition is not accessible through TensorFlow. I only realised what it does when I found a code snippet using it on GitHub.
Either way, it exposes image summaries as slides on Tensorboard like I wanted. 💪
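For completeness, the im_bytes passed to tf.Summary.Image has to be an encoded PNG string. The question does not show plot_embedding, so the following is only a minimal sketch of how such a helper might render a Matplotlib figure to PNG bytes. The function name plot_embedding_png, its arguments, and the plot styling are assumptions for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt


def plot_embedding_png(points, labels):
    """Hypothetical helper: scatter-plot 2-D points and return PNG bytes."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.scatter(xs, ys, c=labels, cmap="viridis", s=10)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # free the figure so repeated calls per epoch do not leak memory
    return buf.getvalue()


im_bytes = plot_embedding_png([(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)], [0, 1, 1])
```

Closing the figure after saving matters when this runs once per epoch, since Matplotlib otherwise keeps every figure alive.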
Hi everyone, I'm facing an issue after processing my images and labels. To create a single dataset I use the zip function. After processing, there are 18k images and 18k labels, which is correct, but when I call zip(images, labels) the number of items becomes 563.
Here is some code to help you understand:
# Map the load_and_preprocess_image function over the dataset of image paths
images = image_paths.map(load_and_preprocess_image)
# Map the extract_label function over the dataset of image paths
labels = image_paths.map(extract_label)
# Zip the labels and images together to create a dataset of (image, label) pairs
#HERE SOMETHING STRANGE HAPPENS
data = tf.data.Dataset.zip((images,labels))
# Shuffle and batch the data
data = data.shuffle(buffer_size=1000).batch(32)
# Split the data into train and test sets
data = data.shuffle(buffer_size=len(data))
# Convert the dataset into a collection of data
num_train = int(0.8 * len(data))
train_data = image_paths.take(num_train)
val_data = image_paths.skip(num_train)
I cannot see where the error is. Can you help me please? Thanks
I'd like to have a dataset of 18k (image, label) pairs.
tf's zip
tf.data.Dataset.zip is not like Python's zip: its inputs must be tf.data.Dataset objects. Check that the images and labels returned by your map calls are the tf.data.Dataset objects you expect.
check tf.ds
Make sure your image and label datasets have the expected element spec and number of elements:
print("ele: ", images_dataset.element_spec)
print("num: ", images_dataset.cardinality().numpy())
print("ele: ", labels_dataset.element_spec)
print("num: ", labels_dataset.cardinality().numpy())
workaround
In your case, combine the image and label processing in one map function that returns both, which avoids tf.data.Dataset.zip altogether:
# load_and_preprocess_image_and_label
def load_and_preprocess_image_and_label(image_path):
    """ load image and label then some operations """
    return image, label
# Map the combined function over the dataset of image paths
train_list = tf.data.Dataset.list_files(str(PATH / 'train/*.jpg'))
data = train_list.map(load_and_preprocess_image_and_label,
                      num_parallel_calls=tf.data.AUTOTUNE)
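As a side note on the number 563: it matches the number of batches rather than the number of elements. The question's code calls batch(32) before len(data), and len() on a batched dataset counts batches. A quick arithmetic check, assuming the dataset holds exactly 18,000 pairs (the question only says "18k"):

```python
import math

num_examples = 18_000   # assumed exact size; the question says "18k"
batch_size = 32         # from data.batch(32) in the question

# batch() without drop_remainder keeps the final partial batch, hence ceil
num_batches = math.ceil(num_examples / batch_size)
print(num_batches)  # 563

# An 80/20 split is easier to reason about in elements, before batching
num_train = num_examples * 8 // 10
num_val = num_examples - num_train
print(num_train, num_val)  # 14400 3600
```

If that is what is happening here, doing the take/skip split on the unbatched dataset and batching afterwards would keep all counts in elements.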
I've created a dataloader for my object detection task.
However, I cannot place the image/path name to a tensor. Instead I have it indexed, where in the last portion of the dataloader class, I have this:
target = {}
target['boxes'] = boxes
target['labels'] = labels
target['image_id'] = torch.tensor([index])
target['area'] = area
target['iscrowd'] = iscrowd
target['image_name'] = torch.tensor(index)
return image, target
where atm image_id and image_name are the same thing.
When I print out the image_name from the dataloader, I of course get this:
for image, target in valid_data_loader:
    print(target[0]['image_name'])
Output:
tensor(0)
tensor(1)
tensor(2)
tensor(3)
tensor(4)
tensor(5)
tensor(6)
tensor(7)
I'm aware that strings can't be saved into torch tensors, so is there any way I can refer back to the original image name rather than the index of the tensor? Or would I just have to use the number that comes out and refer back to the dataset class (not dataloader)?
I ultimately want to save the image name and attributes such as bounding box info to a separate pandas dataframe.
Ok, so this is a bit ad hoc and not exactly what I was thinking, but here is one method I have used to retrieve the paths/image names. I get the index from the dataloader by extracting it from the tensor with .item(), then use that index to look up the corresponding row in the original dataframe:
for image, target in valid_data_loader:
    tensor_id = target[0]['image_name'].item()
    print(valid_df.iloc[tensor_id]['image_id'])
I don't know if this is efficient, but it got me what I wanted...
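An alternative that avoids the lookup, sketched under assumptions: the target dict does not have to contain only tensors, so the file name can be stored as a plain Python string, provided the collate function groups samples instead of stacking them. The sketch below uses plain Python stand-ins; ToyDetectionDataset, the file names, and detection_collate are hypothetical. With PyTorch, a function like this would be passed to DataLoader through its collate_fn argument.

```python
class ToyDetectionDataset:
    """Hypothetical stand-in for the question's dataset class."""

    def __init__(self, image_names):
        self.image_names = image_names

    def __len__(self):
        return len(self.image_names)

    def __getitem__(self, index):
        image = [[0.0]]  # placeholder for the real image tensor
        target = {
            "image_id": index,
            "image_name": self.image_names[index],  # plain str, not a tensor
        }
        return image, target


def detection_collate(batch):
    """Group samples into (images, targets) tuples without stacking anything."""
    return tuple(zip(*batch))


dataset = ToyDetectionDataset(["img_000.jpg", "img_001.jpg", "img_002.jpg"])
batch = [dataset[i] for i in range(2)]  # what a loader would gather for one batch
images, targets = detection_collate(batch)
print(targets[0]["image_name"])  # img_000.jpg
```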
I have a GCMLE experiment and I want to plot the global norm of layer-wise gradients in TensorFlow. I can plot the global norm of all gradients, but I'd like to plot the gradients for only the embeddings. Here is my current code:
gradients, variables = zip(*train_op.compute_gradients(loss))
tf.summary.scalar("gradients", tf.global_norm(gradients))
I also know that I should be able to get all of the variables using tf.trainable_variables() but I am not sure what is the easiest way to separate out each layer? I'm guessing that I need to know each layer/variable name and create tensors representing the specific variables of interest? I think it would need to be something like:
list_of_embedding_variables = [somehow grab the relevant names from tf.trainable_variables]
embedding_gradients = [g for g, v in zip(gradients, variables) if v in list_of_embedding_variables]
tf.summary.scalar("embedding_gradients", tf.global_norm(embedding_gradients))
Because I am running this as a GCMLE experiment, I don't have access to sess.run()/print all of the variable names. Is there any way to view the list of tf.trainable_variables() in the saved graph from a GCMLE experiment? Or to display these variable names within tensorboard?
Option 1
One thought that I have had is that I should create collections of the variables of interest -- for example if my embedding sequence is:
embedding_sequence = tf.contrib.layers.embed_sequence(sequence,
    vocab_size=n_tokens, embed_dim=word_embedding_size)
tf.add_to_collection("embedding_collection", embedding_sequence)
tf.summary.scalar("embedding_gradients", tf.global_norm(tf.get_collection("embedding_collection")))
Something like this:
grads_and_vars = train_op.compute_gradients(loss)
train_summary = []  # collects the per-variable summary ops
for g, v in grads_and_vars:
    if g is not None:
        #print(format(v.name))
        grad_hist_summary = tf.summary.histogram("{}/grad_histogram".format(v.name), g)
        sparsity_summary = tf.summary.scalar("{}/grad/sparsity".format(v.name), tf.nn.zero_fraction(g))
        train_summary.append(grad_hist_summary)
        train_summary.append(sparsity_summary)
tf.summary.merge(train_summary)
Let me know if this works.
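To get a single scalar for just the embedding layer, as the question asks, the (gradient, variable) pairs can be filtered by variable name before taking the global norm. The sketch below illustrates the idea with NumPy arrays standing in for gradient tensors. The variable names and values are made up, and it assumes embedding variables can be recognised by the substring "embed" in their names. In the TF graph, the same filter would be applied to grads_and_vars using v.name, with tf.global_norm in place of the NumPy helper.

```python
import numpy as np

# Made-up (gradient, variable name) pairs standing in for grads_and_vars
grads_and_names = [
    (np.array([3.0, 0.0]), "EmbedSequence/embeddings:0"),
    (np.array([0.0, 4.0]), "EmbedSequence/embeddings_1:0"),
    (np.array([12.0]), "dense/kernel:0"),
    (None, "unused/bias:0"),  # variables with no gradient come back as None
]


def global_norm(grads):
    """sqrt of the sum of squared L2 norms, the quantity tf.global_norm computes."""
    return float(np.sqrt(sum(np.sum(np.square(g)) for g in grads)))


# Keep only gradients whose variable name looks like an embedding
embedding_grads = [
    g for g, name in grads_and_names
    if g is not None and "embed" in name.lower()
]
print(global_norm(embedding_grads))  # 5.0
```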
I have a number of training examples in my dataset and would like to rotate each one so that I get double the number. I am using datasets and tried it like this:
def addrotation(images, labels):
    images_rotated_left = tf.contrib.image.rotate(images, pi/2.0)
    labels_rotated_left = tf.stack([labels[1], labels[2], labels[0]])
    return tf.stack([images,images_rotated_left]), tf.stack([labels, labels_rotated_left])
But when I now use dataset = dataset.map(addrotation), I get examples with double the data.
Is it possible to return the rotated tensors in a way so that they count as separate examples or "lines"?
Never mind, I found a solution:
I create a new dataset with all the rotated examples and then zip the two datasets together, as explained here:
https://stackoverflow.com/a/47344405/984336
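One way to get the rotated copies as separate elements is sketched below. It makes several assumptions: it targets TensorFlow 2.x with eager execution, it uses tf.image.rot90 in place of tf.contrib.image.rotate (tf.contrib is not available in 2.x), and it joins the two datasets with Dataset.concatenate rather than zip, since concatenate appends elements instead of pairing them. The toy images and labels are made up.

```python
import tensorflow as tf

# Toy data: 4 single-channel 2x2 "images" with 3-element labels (made up)
images = tf.reshape(tf.range(16, dtype=tf.float32), (4, 2, 2, 1))
labels = tf.constant([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
dataset = tf.data.Dataset.from_tensor_slices((images, labels))


def rotate_left(image, label):
    # rot90 rotates counter-clockwise by 90 degrees
    rotated = tf.image.rot90(image, k=1)
    # Same label permutation as in the question's addrotation
    rotated_label = tf.stack([label[1], label[2], label[0]])
    return rotated, rotated_label


# Originals followed by rotated copies, each as its own element
augmented = dataset.concatenate(dataset.map(rotate_left))
num_elements = int(augmented.cardinality().numpy())
print(num_elements)  # 8
```

Because the rotated copies all come after the originals, shuffling the combined dataset before batching is probably worthwhile.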
I made the simplest 1D example for TensorBoard (tracking the minimization of a quadratic) but I get plots that don't make sense to me and I can't figure out why. Is it my own implementation or is TensorBoard buggy?
Here are the plots:
HISTOGRAM:
Usually I think of histograms as bar graphs that encode probability distributions (or frequency counts). I assume that the y-axis shows the values and the x-axis the count? Since my number of steps is 120, that seemed a reasonable guess.
and Scalar plot:
Why is there a strange line going through my plots?
The code that produced it (you should be able to copy paste it and run it):
## run cmd to collect model: python playground.py --logdir=/tmp/playground_tmp
## show board on browser run cmd: tensorboard --logdir=/tmp/playground_tmp
## browser: http://localhost:6006/
import tensorflow as tf
# x variable
x = tf.Variable(10.0,name='x')
# b placeholder (simulates the "data" part of the training)
b = tf.placeholder(tf.float32)
# make model (1/2)(x-b)^2
xx_b = 0.5*tf.pow(x-b,2)
y=xx_b
learning_rate = 1.0
# get optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate)
# gradient variable list = [ (gradient,variable) ]
gv = opt.compute_gradients(y,[x])
# transformed gradient variable list = [ (T(gradient),variable) ]
decay = 0.9 # decay the gradient for the sake of the example
# apply transformed gradients
tgv = [ (decay*g, v) for (g,v) in gv] #list [(grad,var)]
apply_transform_op = opt.apply_gradients(tgv)
# track value of x
x_scalar_summary = tf.scalar_summary("x", x)
x_histogram_sumarry = tf.histogram_summary('x_his', x)
with tf.Session() as sess:
    merged = tf.merge_all_summaries()
    tensorboard_data_dump = '/tmp/playground_tmp'
    writer = tf.train.SummaryWriter(tensorboard_data_dump, sess.graph)
    sess.run(tf.initialize_all_variables())
    epochs = 120
    for i in range(epochs):
        b_val = 1.0 #fake data (in SGD it would be different on every epoch)
        # applies the gradients
        [summary_str_apply_transform,_] = sess.run([merged,apply_transform_op], feed_dict={b: b_val})
        writer.add_summary(summary_str_apply_transform, i)
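For reference, the values this script should log can be worked out by hand. The gradient of (1/2)(x-b)^2 with respect to x is (x-b), so with learning rate 1.0 and the 0.9 decay each step computes x ← x - 0.9(x - b), which shrinks the distance to b by a factor of 10. A plain-Python sketch of that recurrence (no TensorFlow involved, same constants as the script):

```python
x, b = 10.0, 1.0
learning_rate, decay = 1.0, 0.9

trajectory = [x]
for _ in range(5):
    grad = x - b                       # d/dx of 0.5 * (x - b)**2
    x = x - learning_rate * decay * grad
    trajectory.append(x)

print([round(v, 5) for v in trajectory])
# [10.0, 1.9, 1.09, 1.009, 1.0009, 1.00009]
```

So a single run should show one curve that falls from about 10 to 1 within a handful of steps and then stays flat.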
I ran into the same problem, where multiple lines appeared in the Instance tab in TensorBoard. (I also tried your code: TensorBoard shows the duplicate-graph warning below and presents only one curve, which is better than what I got.)
WARNING:tensorflow:Found more than one graph event per run. Overwriting the graph with the newest event.
Nevertheless, the solution is the same as the one Olivier Moindrot mentioned: delete the old logs. TensorBoard may also cache some results, so you may need to restart it.
The way to make sure only the newest summary is shown, as the MNIST example does, is to log to a fresh folder:
if tf.gfile.Exists(FLAGS.summaries_dir):
    tf.gfile.DeleteRecursively(FLAGS.summaries_dir)
tf.gfile.MakeDirs(FLAGS.summaries_dir)
Link to full source, with TF version r0.10: https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py
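The same cleanup can be done with the Python standard library if tf.gfile is not convenient. This is a minimal sketch; reset_log_dir and the temporary paths are illustrative names, not part of TensorFlow or the linked example.

```python
import os
import shutil
import tempfile


def reset_log_dir(log_dir):
    """Delete log_dir if it exists, then recreate it empty."""
    if os.path.exists(log_dir):
        shutil.rmtree(log_dir)
    os.makedirs(log_dir)
    return log_dir


# Demo in a throwaway location so nothing real is deleted
base = tempfile.mkdtemp()
log_dir = os.path.join(base, "playground_tmp")
os.makedirs(log_dir)
open(os.path.join(log_dir, "events.out.stale"), "w").close()  # fake old event file

reset_log_dir(log_dir)
print(os.listdir(log_dir))  # []
```

Calling this once before creating the summary writer means each run starts from an empty directory, so old event files cannot be mixed into the new plots.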