Tensorflow: What is the output node name in Cifar-10 model? - python

I'm trying to understand TensorFlow and I'm looking at one of the official examples, the Cifar-10 model.
In cifar10.py, in inference(), you can see the following lines:
with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
scope.name should be softmax_linear, and that should be the node's name. I saved the graph proto with the following lines (it differs from the tutorial):
with tf.Graph().as_default():
    global_step = tf.Variable(0, trainable=False)
    # Get images and labels
    images, labels = cifar10.distorted_inputs()
    # Build a Graph that computes the logits predictions from the
    # inference model.
    logits = cifar10.inference(images)
    # Calculate loss.
    loss = cifar10.loss(logits, labels)
    # Build a Graph that trains the model with one batch of examples and
    # updates the model parameters.
    train_op = cifar10.train(loss, global_step)
    # Create a saver.
    saver = tf.train.Saver(tf.global_variables())
    # Build the summary operation based on the TF collection of Summaries.
    summary_op = tf.summary.merge_all()
    # Build an initialization operation to run below.
    init = tf.global_variables_initializer()
    # Start running operations on the Graph.
    sess = tf.Session(config=tf.ConfigProto(
        log_device_placement=FLAGS.log_device_placement))
    # save the graph
    tf.train.write_graph(sess.graph_def, FLAGS.train_dir, 'model.pbtxt')
But I can't see a node called softmax_linear in model.pbtxt. What am I doing wrong? I just want the name of the output node to export the graph.

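The operator name won't be "softmax_linear". tf.variable_scope() also opens a name scope, and a name scope (tf.name_scope()) prefixes the names of the operators created inside it with the scope's name, separated by a /. Each operator has its own name. For example, if you write
with tf.name_scope("foo"):
    a = tf.constant(1, name="bar")
then the constant will have the name "foo/bar". By the same rule, the tf.add(..., name=scope.name) created inside tf.variable_scope('softmax_linear') should appear in model.pbtxt as "softmax_linear/softmax_linear".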
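If you are unsure what a scoped name ended up as, you can search the text graph for it. The following is a minimal plain-Python sketch (no TensorFlow needed), assuming model.pbtxt was written in text format by tf.train.write_graph with its usual two-space indentation for node fields; node_names and find_nodes are hypothetical helper names. It lists every node whose last path component matches a given name, e.g. softmax_linear.

```python
import re
from typing import List

# Matches the `name:` field of each top-level `node { ... }` entry.
# Assumption: text_format layout as written by tf.train.write_graph(..., as_text=True),
# where node fields are indented by exactly two spaces (nested fields are indented more).
_NODE_NAME = re.compile(r'^  name: "([^"]+)"$', re.MULTILINE)


def node_names(pbtxt: str) -> List[str]:
    """Return every node name found in a text-format GraphDef."""
    return _NODE_NAME.findall(pbtxt)


def find_nodes(pbtxt: str, last_component: str) -> List[str]:
    """Return node names whose final '/'-separated part equals last_component."""
    return [name for name in node_names(pbtxt)
            if name.split('/')[-1] == last_component]


# Hypothetical usage with the file from the question:
#   with open(os.path.join(FLAGS.train_dir, 'model.pbtxt')) as f:
#       print(find_nodes(f.read(), 'softmax_linear'))
```

Matching on the last component rather than the full name is deliberate: it finds the node regardless of how many scopes were stacked on top of it.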
Hope that helps!


NotImplementedError: Pre-trained Graph Output -> New Layers

I am working on feeding some outputs of a pre-trained graph into some additional layers in Tensorflow. Here is a walkthrough of some of my code:
First, I define a new tf.Graph(), and load in the pre-trained model.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile('./mobilenetssd/frozen_inference_graph.pb', 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
Fetching input/output tensors of the loaded graph, defining placeholders, adding some ops.
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
output_matrix = detection_graph.get_tensor_by_name('concat:0')
labels = tf.placeholder(tf.float32, [None, 1])
# Adding operations
outmat_sq = tf.squeeze(output_matrix)
logits_max = tf.squeeze(tf.math.reduce_max(outmat_sq, reduction_indices=[0]))
logits_mean = tf.squeeze(tf.math.reduce_mean(outmat_sq, reduction_indices=[0]))
logodds = tf.concat([logits_max, logits_mean], 0)
logodds = tf.expand_dims(logodds, 0)
logodds.set_shape([None, 1204])
Defining the new layers, setting up optimizer to train new layers.
hidden = tf.contrib.layers.fully_connected(inputs=logodds, num_outputs=500, activation_fn=tf.nn.tanh)
out = tf.contrib.layers.fully_connected(inputs=hidden, num_outputs=1, activation_fn=tf.nn.sigmoid)
# Define Loss, Training, and Accuracy
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out, labels=labels))
training_step = tf.train.AdamOptimizer(1e-6).minimize(loss, var_list=[hidden, out])
correct_prediction = tf.equal(tf.round(out), labels)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
After running this code, I get the following error:
NotImplementedError: ('Trying to update a Tensor ', <tf.Tensor 'fully_connected/Tanh:0' shape=(?, 500) dtype=float32>)
This seems to be a problem with "linking" the two parts of the model together. Do I need to pass the output of the first graph into a tf.Variable and then pass that into the subsequent layers? Also, I am using TF 1.10.
Any insight on this would be appreciated!

Display loss in a Tensorflow DQN without leaving tf.Session()

I have a DQN all set up and working, but I can't figure out how to display the loss without leaving the Tensorflow session.
I first thought it involved creating a new function or class, but I'm not sure where to put it in the code, and what specifically to put into the function or class.
observations = tf.placeholder(tf.float32, shape=[None, num_stops], name='observations')
actions = tf.placeholder(tf.int32,shape=[None], name='actions')
rewards = tf.placeholder(tf.float32,shape=[None], name='rewards')
# Model
Y = tf.layers.dense(observations, 200, activation=tf.nn.relu)
Ylogits = tf.layers.dense(Y, num_stops)
# sample an action from predicted probabilities
sample_op = tf.random.categorical(logits=Ylogits, num_samples=1)
# loss
cross_entropies = tf.losses.softmax_cross_entropy(onehot_labels=tf.one_hot(actions,num_stops), logits=Ylogits)
loss = tf.reduce_sum(rewards * cross_entropies)
# training operation
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001, decay=.99)
train_op = optimizer.minimize(loss)
I then run the network, which works without error.
with tf.Session() as sess:
    '''etc. The network is run'''
    sess.run(train_op, feed_dict={observations: observations_list,
                                  actions: actions_list,
                                  rewards: rewards_list})
I want to display the loss from train_op to the user.
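Try this: fetch loss together with train_op in the same sess.run call, and store the result under a new name (here loss_val). If you assign it back to loss, the Python name no longer refers to the loss tensor, and the next sess.run([loss, train_op], ...) call will fail.
loss_val, _ = sess.run([loss, train_op], feed_dict={observations: observations_list,
                                                    actions: actions_list,
                                                    rewards: rewards_list})
print('loss:', loss_val)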

Correctly loading a model to resume training (meta graph, ckpts)

I'm having trouble loading a model to resume training.
I'm using a simple two-layered-NN (Fully connected) on a cifar data set for practice.
NN Setup:
import tensorflow as tf
import numpy as np
# input -> hidden -> output
def inference(data_samples, image_pixels, hidden_units, classes, reg_constant):
    with tf.variable_scope('Layer1'):
        # Define the variables
        weights = tf.get_variable(
            name='weights',
            shape=[image_pixels, hidden_units],
            initializer=tf.truncated_normal_initializer(
                stddev=1.0 / np.sqrt(float(image_pixels))),
            regularizer=tf.contrib.layers.l2_regularizer(reg_constant))
        biases = tf.Variable(tf.zeros([hidden_units]), name='biases')
        # Define the layer's output
        hidden = tf.nn.relu(tf.matmul(data_samples, weights) + biases)
    with tf.variable_scope('Layer2'):
        # Define variables
        weights = tf.get_variable('weights', [hidden_units, classes],
            initializer=tf.truncated_normal_initializer(
                stddev=1.0 / np.sqrt(float(hidden_units))),
            regularizer=tf.contrib.layers.l2_regularizer(reg_constant))
        biases = tf.Variable(tf.zeros([classes]), name='biases')
        # Define the layer's output
        logits = tf.matmul(hidden, weights) + biases
        # Define summary operation for the 'logits' variable
        tf.summary.histogram('logits', logits)
    return logits
def loss(logits, labels):
    '''Calculates the loss from logits and labels.
    Args:
      logits: Logits tensor, float - [batch size, number of classes].
      labels: Labels tensor, int64 - [batch size].
    Returns:
      loss: Loss tensor of type float.
    '''
    with tf.name_scope('Loss'):
        # Operation to determine the cross entropy between logits and labels
        cross_entropy = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=logits, labels=labels, name='cross_entropy'))
        # Operation for the loss function
        loss = cross_entropy + tf.add_n(tf.get_collection(
            tf.GraphKeys.REGULARIZATION_LOSSES))
        # Add a scalar summary for the loss
        tf.summary.scalar('loss', loss)
    return loss
def training(loss, learning_rate):
    # Create a variable to track the global step
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss, global_step=global_step)
    #train_step = tf.train.AdamOptimizer(learning_rate, beta1, beta2, epsilon).minimize(
    #    loss, global_step=global_step)
    return train_step
def evaluation(logits, labels):
    with tf.name_scope('Accuracy'):
        # Operation comparing prediction with true label
        correct_prediction = tf.equal(tf.argmax(logits, 1), labels)
        # Operation calculating the accuracy of the predictions
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        # Summary operation for the accuracy
        tf.summary.scalar('train_accuracy', accuracy)
    return accuracy
Saved model like this:
if (i + 1) % 500 == 0:
    saver.save(sess, MODEL_DIR, global_step=i)
    print('Saved checkpoint')
Saved model files
Within this directory:
C:\Users\Moondra\Desktop\CIFAR - PROJECT\parameters_no_changes
I have the following files as well as model.ckpt-499.index etc:
My attempt at loading the model
import numpy as np
import tensorflow as tf
import time
from datetime import datetime
import os
import data_helpers
import full_connected_layers
import itertools
learning_rate = .0001
max_steps = 3000
batch_size = 400
checkpoint = r'C:\Users\Moondra\Desktop\CIFAR - PROJECT\parameters_no_changes\model.ckpt-999'
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(r'C:\Users\Moondra\Desktop\CIFAR - PROJECT' +
    saver.restore(sess, checkpoint)
    data_sets = data_helpers.load_data()
    images = tf.get_default_graph().get_tensor_by_name('images:0') #image placeholder
    labels = tf.get_default_graph().get_tensor_by_name('image-labels:0') #placeholder
    loss = tf.get_default_graph().get_tensor_by_name('Loss/add:0')
    #global_step = tf.get_default_graph().get_tensor_by_name('global_step/initial_value_1:0')
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    accuracy = tf.get_default_graph().get_tensor_by_name('Accuracy/Mean:0')
with tf.Session() as sess:
    zipped_data = zip(data_sets['images_train'], data_sets['labels_train'])
    batches = data_helpers.gen_batch(list(zipped_data), batch_size,
    for i in range(max_steps):
        # Get next input data batch
        batch = next(batches)
        images_batch, labels_batch = zip(*batch)
        feed_dict = {
            images: images_batch,
            labels: labels_batch
        }
        if i % 100 == 0:
            train_accuracy = sess.run(accuracy, feed_dict=feed_dict)
            print('Step {:d}, training accuracy {:g}'.format(i, train_accuracy))
        ts, loss_ = sess.run([train_step, loss], feed_dict=feed_dict)
Errors and confusion
1) Should I be using this command latest_checkpoint to restore:
I see some tutorials that just point to the folder holding the
.data, .index files.
2) Which brings me to the second question: What should I be using as the second parameter of saver.restore.
Currently I'm just pointing to the folder/dir that holds those files
3) I'm not purposely initializing any variables, as I was told that would overwrite the stored weight and bias values. This seems to lead to this error:
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value Layer1/weights
[[Node: Layer1/weights/read = Identity[T=DT_FLOAT, _class=["loc:@Layer1/weights"], _device="/job:localhost/replica:0/task:0/cpu:0"](Layer1/weights)]]
4) However, If I do initialize all variables via this code:
My model seems to start training from scratch (instead of resuming training).
Does that mean I'm supposed to load all weights and biases explicitly via
get_tensor? If so, how do I deal with models that have 20-plus layers?
5) When I run this command
for i in tf.get_default_graph().get_operations():
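I see many global_step tensors/operations:
<bound method Operation.values of <tf.Operation 'global_step/initial_value' type=Const>>
<bound method Operation.values of <tf.Operation 'global_step' type=VariableV2>>
<bound method Operation.values of <tf.Operation 'global_step/Assign' type=Assign>>
<bound method Operation.values of <tf.Operation 'global_step/read' type=Identity>>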
I was trying to load this variable into my current graph, but
didn't know which one I'm supposed to get with
get_tensor_by_name. Most of them resulted in a "does not exist" error.
6) Same with the loss: which loss am I supposed to load into my graph with get_tensor?
These are the options:
<bound method Operation.values of <tf.Operation 'Loss/Const' type=Const>>
<bound method Operation.values of <tf.Operation 'Loss/Mean' type=Mean>>
<bound method Operation.values of <tf.Operation 'Loss/AddN' type=AddN>>
<bound method Operation.values of <tf.Operation 'Loss/add' type=Add>>
<bound method Operation.values of <tf.Operation 'Loss/loss/tags' type=Const>>
<bound method Operation.values of <tf.Operation 'Loss/loss' type=ScalarSummary>>
7) Lastly, I see a lot of gradient operations when I look at all
the nodes of the graph, but I don't see any nodes related to train_step (the
Python variable I created that points to the GradientDescentOptimizer). Does that mean I don't need to load it into this graph via get_tensor?
Thank you.
I usually did this sequence of operations:
This translates to this kind of code:
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('./'))
This avoids the uninitialized-value error, and the restore overwrites the variables with the values from the checkpoint.
1/ In the folder where you save your checkpoint, there should be a file named 'checkpoint' which contains the name of your latest checkpoint.
I normally read this file to find the latest checkpoint.
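Reading that file by hand looks roughly like the minimal plain-Python sketch below. It assumes the text layout TF 1.x normally writes (shown in the comment); in practice tf.train.latest_checkpoint(checkpoint_dir) performs this lookup for you, and latest_checkpoint_prefix is a hypothetical name used only for illustration.

```python
import os
import re
from typing import Optional

# Assumed layout of <checkpoint_dir>/checkpoint as written by TF 1.x, e.g.:
#   model_checkpoint_path: "model.ckpt-999"
#   all_model_checkpoint_paths: "model.ckpt-499"
#   all_model_checkpoint_paths: "model.ckpt-999"
# Only the model_checkpoint_path line names the latest checkpoint. Escape
# sequences (such as doubled backslashes in Windows paths) are NOT unescaped here.
_LATEST = re.compile(r'^model_checkpoint_path:\s*"([^"]*)"', re.MULTILINE)


def latest_checkpoint_prefix(checkpoint_dir: str) -> Optional[str]:
    """Return the latest checkpoint prefix recorded in <checkpoint_dir>/checkpoint."""
    state_file = os.path.join(checkpoint_dir, 'checkpoint')
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        match = _LATEST.search(f.read())
    if match is None:
        return None
    path = match.group(1)
    # A relative path in the file is interpreted relative to the checkpoint directory.
    return path if os.path.isabs(path) else os.path.join(checkpoint_dir, path)
```

The returned value is a prefix such as .../model.ckpt-999, not a single file, which is the form saver.restore(sess, ...) takes as its second argument.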
2/ I use checkpoint_directory/global_step.
With this, tf will create 4 files in the checkpoint_directory:
3/ 4/ I'm pretty sure you don't need to pre-initialize the graph before loading; at least, I don't.
One difference: instead of import_meta_graph, I rebuild the whole graph every time I load, but I'm sure it's not an issue to load before you initialize.
5/ Be careful not to mistake operations for tensors and you are good to go. A tensor name should be op_name:0, which means this tensor is output[0] of the operation op_name.
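A small plain-Python illustration of that naming rule, using names from the question. split_tensor_name and tensor_name are hypothetical helpers, not TensorFlow APIs; the point is that get_tensor_by_name expects the <op_name>:<output_index> form, so a bare operation name such as 'global_step' is rejected while 'global_step:0' refers to its first output.

```python
from typing import Tuple


def split_tensor_name(name: str) -> Tuple[str, int]:
    """Split 'op_name:index' into ('op_name', index); a bare op name maps to output 0."""
    op_name, sep, index = name.rpartition(':')
    if not sep:
        # No ':' at all, so the whole string is an operation name.
        return name, 0
    return op_name, int(index)


def tensor_name(op_name: str, index: int = 0) -> str:
    """Build the string to pass to get_tensor_by_name for output `index` of `op_name`."""
    return '{}:{}'.format(op_name, index)
```

For example, the operation 'Loss/add' from the list in the question has the tensor 'Loss/add:0', and 'Accuracy/Mean' has 'Accuracy/Mean:0'.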
6/ 7/ Well, let me just tell you how I resume from my checkpoint. This is probably not the correct way, but it really saves me from the burden of get_tensor_by_name. Seriously, get_tensor_by_name can be a real pita sometimes.
Normally my loading process goes: rebuild the graph, load the checkpoint, create some new tensors if needed, and initialize the variables that are not in the checkpoint.
saver = tf.train.Saver()
saver.restore(session, checkpoint_dir/global_step)
Take checkpoint_dir/global_step from the checkpoint file if you want the latest checkpoint, or use a different global_step to load a specific checkpoint.
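The "checkpoint_dir/global_step" path above is, concretely, the save path with a step suffix. A minimal sketch, assuming the basename model.ckpt seen in the question's file names; checkpoint_prefix is a hypothetical helper. It relies on the fact that Saver.save appends "-<global_step>" to the save path when global_step is given.

```python
import os


def checkpoint_prefix(checkpoint_dir: str, global_step: int,
                      basename: str = 'model.ckpt') -> str:
    """Prefix written by saver.save(sess, <dir>/<basename>, global_step=step).

    The same prefix (not the directory, and not the .meta/.index/.data file)
    is what saver.restore(sess, ...) takes as its second argument.
    """
    # With global_step given, Saver.save appends '-<global_step>' to the save path.
    return '{}-{}'.format(os.path.join(checkpoint_dir, basename), global_step)
```

So restoring step 999 from the question's directory would be saver.restore(sess, checkpoint_prefix(that_dir, 999)), while tf.train.latest_checkpoint(that_dir) gives the most recent one without knowing the step.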

TensorFlow restore throwing "No Variable to save" error

I am working through some code to understand how to save and restore checkpoints in TensorFlow. To do so, I implemented a simple neural network that works with MNIST digits and saved the .ckpt file like so:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
learning_rate = 0.001
n_input = 784  # MNIST data input (img shape = 28*28)
n_classes = 10  # MNIST total classes 0-9
# Import MNIST data
mnist = input_data.read_data_sets('.', one_hot=True)
# Features and labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
# Weights and biases
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
# logits = xW + b
logits = tf.add(tf.matmul(features, weights), bias)
# Define loss and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
import math
save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100
saver = tf.train.Saver()
# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})
        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))
    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')
This part works well, and I get the .ckpt file saved in the correct directory. The problem comes in when I try to restore the model in an attempt to work on it again. I use the following code to restore the model:
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'train_model.ckpt.meta')
    print('model restored')
and end up with the error: ValueError: No variables to save
I'm not sure what the mistake is here. Any help is appreciated. Thanks in advance.
A Graph is different from the Session. A graph is the set of operations joining tensors, each of which is a symbolic representation of a set of values. A Session assigns specific values to the Variable tensors and allows you to run operations in that graph.
The .ckpt data files save variable values (i.e. the values held in the weights and biases) but not the graph itself; by default, saver.save writes the graph structure separately, to the .meta file.
The solution is simple: re-run the graph construction (everything before the Session), then start your session and load the values from the .ckpt file.
Alternatively, you can check out this guide for exporting and importing MetaGraphs.
You should tell the Saver which Variables to restore; by default, the Saver collects all the Variables in the default graph. In your restore script the default graph is empty, so the Saver finds nothing and raises ValueError: No variables to save.
In your case, you should add the graph-construction code before saver = tf.train.Saver().
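One more detail in the restore snippet: saver.restore(sess, 'train_model.ckpt.meta') passes the MetaGraph file, whereas Saver.restore expects the checkpoint prefix that was given to saver.save (here './train_model.ckpt'); the .meta file is what tf.train.import_meta_graph reads. A minimal plain-Python sketch of turning a file name into that prefix, assuming the usual .meta / .index / .data-NNNNN-of-NNNNN suffixes; restore_prefix is a hypothetical helper.

```python
import re

# Assumed suffixes of the individual files that make up a TF 1.x checkpoint.
_CKPT_SUFFIX = re.compile(r'\.(meta|index|data-\d{5}-of-\d{5})$')


def restore_prefix(path: str) -> str:
    """Turn e.g. 'train_model.ckpt.meta' into 'train_model.ckpt' for saver.restore."""
    return _CKPT_SUFFIX.sub('', path)
```

A path that is already a prefix, such as 'train_model.ckpt', is returned unchanged.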

duplicate a tensorflow graph

What is the best way of duplicating a TensorFlow graph and keeping it up to date?
Ideally I want to put the duplicated graph on another device (e.g. from GPU to CPU) and then update the copy from time to time.
Short answer: You probably want checkpoint files.
Long answer:
Let's be clear about the setup here. I'll assume that you have two devices, A and B, and you are training on A and running inference on B.
Periodically, you'd like to update the parameters on the device running inference with new parameters found during training on the other.
The tutorial linked above is a good place to start. It shows you how tf.train.Saver objects work, and you shouldn't need anything more complicated here.
Here is an example:
import tensorflow as tf
def build_net(graph, device):
    with graph.as_default():
        with graph.device(device):
            # Input placeholders
            inputs = tf.placeholder(tf.float32, [None, 784])
            labels = tf.placeholder(tf.float32, [None, 10])
            # Initialization
            w0 = tf.get_variable('w0', shape=[784, 256], initializer=tf.contrib.layers.xavier_initializer())
            w1 = tf.get_variable('w1', shape=[256, 256], initializer=tf.contrib.layers.xavier_initializer())
            w2 = tf.get_variable('w2', shape=[256, 10], initializer=tf.contrib.layers.xavier_initializer())
            b0 = tf.Variable(tf.zeros([256]))
            b1 = tf.Variable(tf.zeros([256]))
            b2 = tf.Variable(tf.zeros([10]))
            # Inference network
            h1 = tf.nn.relu(tf.matmul(inputs, w0) + b0)
            h2 = tf.nn.relu(tf.matmul(h1, w1) + b1)
            output = tf.nn.softmax(tf.matmul(h2, w2) + b2)
            # Training network
            cross_entropy = tf.reduce_mean(-tf.reduce_sum(labels * tf.log(output), reduction_indices=[1]))
            optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
            # Your checkpoint function
            saver = tf.train.Saver()
            return tf.global_variables_initializer(), inputs, labels, output, optimizer, saver
The code for the training program:
def programA_main():
    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
    # Build training network on device A
    graphA = tf.Graph()
    init, inputs, labels, _, training_net, saver = build_net(graphA, '/cpu:0')
    with tf.Session(graph=graphA) as sess:
        # Variables must be initialized before the first training step
        sess.run(init)
        for step in range(1, 10000):
            batch = mnist.train.next_batch(50)
            sess.run(training_net, feed_dict={inputs: batch[0], labels: batch[1]})
            if step % 100 == 0:
                saver.save(sess, '/tmp/graph.checkpoint')
                print('saved checkpoint')
...and code for an inference program:
def programB_main():
    import time
    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
    # Build inference network on device B
    graphB = tf.Graph()
    init, inputs, _, inference_net, _, saver = build_net(graphB, '/cpu:0')
    with tf.Session(graph=graphB) as sess:
        batch = mnist.test.next_batch(50)
        # restore() sets every saved variable, so no separate init is needed here
        saver.restore(sess, '/tmp/graph.checkpoint')
        print('loaded checkpoint')
        out = sess.run(inference_net, feed_dict={inputs: batch[0]})
        print(out[0])
        time.sleep(2)
        saver.restore(sess, '/tmp/graph.checkpoint')
        print('loaded checkpoint')
        out = sess.run(inference_net, feed_dict={inputs: batch[0]})
        # Same row as before, so any difference comes from the reloaded parameters
        print(out[0])
If you fire up the training program and then the inference program, you'll see the inference program produces two different outputs (from the same input batch). This is a result of it picking up the parameters that the training program has checkpointed.
Now, this program obviously isn't your end point. We don't do any real synchronization, and you'll have to decide what "periodic" means with respect to checkpointing. But this should give you an idea of how to sync parameters from one network to another.
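One simple way to decide when the inference side should reload is to restore only when a newer checkpoint prefix appears. The sketch below assumes the training program saves with a step suffix (e.g. saver.save(sess, '/tmp/graph.checkpoint', global_step=step)), because the example above overwrites a single prefix, which this approach cannot tell apart. CheckpointWatcher is a hypothetical helper; the `latest` callable would typically wrap tf.train.latest_checkpoint.

```python
from typing import Callable, Optional


class CheckpointWatcher:
    """Remembers which checkpoint prefix was loaded last and reports newer ones.

    `latest` is any zero-argument callable returning the newest checkpoint
    prefix or None, e.g. lambda: tf.train.latest_checkpoint('/tmp').
    """

    def __init__(self, latest: Callable[[], Optional[str]]):
        self._latest = latest
        self._loaded: Optional[str] = None

    def poll(self) -> Optional[str]:
        """Return a prefix to restore, or None if nothing new has been saved."""
        candidate = self._latest()
        if candidate is None or candidate == self._loaded:
            return None
        self._loaded = candidate
        return candidate


# Hypothetical use inside programB_main's session:
#   watcher = CheckpointWatcher(lambda: tf.train.latest_checkpoint('/tmp'))
#   path = watcher.poll()
#   if path is not None:
#       saver.restore(sess, path)
```

Injecting the lookup as a callable keeps the reload policy separate from TensorFlow, so "periodic" can be whatever polling interval suits the inference loop.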
One final warning: this does not mean that the two networks are necessarily deterministic. There are known non-deterministic elements in TensorFlow, so be wary if you need exactly the same answer. But this is the hard truth about running on multiple devices.
Good luck!
I'll try to go with a pretty simplified answer, to see if the general approach is what OP is describing:
I'd implement it via the tf.train.Saver object.
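Suppose you have your weights in the variables W1, W2, and b1:
mysaver = tf.train.Saver({'w1': W1, 'w2': W2, 'b1': b1})
In the train loop you can add, every n iterations:
mysaver.save(session_var, 'model1', global_step=step)
And then in the loading instance, create a Saver over the same variables (with the same keys) and, when needed, run:
mysaver.restore(other_session_object, tf.train.latest_checkpoint('.'))
Because global_step is passed to save, each checkpoint is written under the prefix model1-<step>, and tf.train.latest_checkpoint('.') looks up the newest one from the checkpoint file in the current directory.
Hope this is similar to the solution you are asking for.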
Simply do the round trip tf.Graph > tf.GraphDef > tf.Graph:
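import tensorflow as tf
def copy_graph(graph: tf.Graph) -> tf.Graph:
    # Serialize the source graph (including inferred output shapes)...
    graph_def = graph.as_graph_def(add_shapes=True)
    # ...and import it into a fresh, independent graph.
    with tf.Graph().as_default() as copied_graph:
        tf.import_graph_def(graph_def, name='')
    return copied_graph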
