TensorFlow Inference Graph - Loading and Restoring Variables - python

This is closely related to a number of other questions, including one of my own here: TensorFlow Inference.
Every TensorFlow sample for inference appears to follow this form:
import tensorflow as tf
import CONSTANTS
import Vgg3CIFAR10
import numpy as np

MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
rand = np.random.rand(1, 32, 32, 3).astype(np.float32)

images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
logits = Vgg3CIFAR10.inference(images)

def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')
        new_saver.restore(sess, MODEL_PATH)
        print(sess.run(logits, feed_dict={images: rand}))
        print('done')

run_inference()
Issues:
Restoring the model and graph does just that... except that I am creating a parallel graph here, where I am possibly adding new parts to the graph. (TensorFlow graphs are append-only, so how does this add to the graph and then run just that segment? If it is appended, it would want to run the whole thing.)
What happens to the queue runners that existed in the loaded graph? All of those ops are loaded. By printing out sess.graph.get_operations() you can see that all of the old input ops are there (see the snippet after this list).
Does logits = Vgg3CIFAR10.inference(images) not append new items to the graph? If that is because of naming, does the placeholder input then replace the queue-runner input?
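For reference, a quick way to see every op in the restored graph (a minimal sketch; run it inside the session, after restoring):

# Print every operation in the session's graph, old input pipeline included.
for op in sess.graph.get_operations():
    print(op.name, op.type)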
Possible answer for a few of these items: because I defined the logits op first, the rest of the graph got appended after it, and via some TensorFlow magic sauce the variables from the original graph got restored into the logits portion of the graph?
So I tested this out, and it doesn't even work properly...
It first creates a graph with logits, then appends the old graph to it. So when you call inference, you just get a bunch of garbage back:
[[ 0.09815982 0.09611271 0.10542709 0.10383813 0.0955615 0.10979554
0.12138291 0.09316944 0.08336139 0.09319157]]
[[ 0.10305423 0.092167 0.10572157 0.10368075 0.1043573 0.10057402
0.12435613 0.08916584 0.07929172 0.09763144]]
[[ 0.1068181 0.09361464 0.10377798 0.10060066 0.10110897 0.09462726
0.11688241 0.09941135 0.0869903 0.09616835]]
Here I am expecting node 8, followed by nodes 2 and 2, to be the ones surfaced... obviously it's just a bunch of nothing...

So after a ton of review, this is what happens...
If you add anything to the graph before restoring a graph, the restored graph is appended to the graph you already created.
Restoring variables looks, at restore time, for variables in your current graph whose names match the variable names stored in the meta files. If you create a graph with certain variable names and then restore a graph that has the same variable names, the test I ran showed the appended (stored) graph receiving the restored variables, not the initial graph.
So, in summary: be careful about what you are doing with your graphs, and be very aware of how things get appended and restored. If you were looking at this in hopes of finding out how to do inference, then look at this S.O. question, where the answer involves creating a new graph and restoring variables into that new graph, which is in fact a subgraph of the original:
TensorFlow Inference
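For illustration, here is a minimal sketch of that pattern: restore into a fresh graph and fetch the ops you need by name, instead of defining logits first. The tensor names ('images:0', 'logits:0') and the model path are assumptions for the sake of the example; substitute whatever your saved graph actually uses.

import numpy as np
import tensorflow as tf

MODEL_PATH = 'models/mymodel.model'  # hypothetical path

with tf.Graph().as_default() as graph:  # fresh graph: nothing appended beforehand
    with tf.Session(graph=graph) as sess:
        # Rebuild the saved graph structure, then restore the variable values.
        saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')
        saver.restore(sess, MODEL_PATH)

        # Fetch existing ops by name; do not build a second, parallel network.
        images = graph.get_tensor_by_name('images:0')  # assumed placeholder name
        logits = graph.get_tensor_by_name('logits:0')  # assumed output name

        rand = np.random.rand(1, 32, 32, 3).astype(np.float32)
        print(sess.run(logits, feed_dict={images: rand}))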

Related

Structuring a Keras project to achieve reproducible results on a GPU

I am writing a tensorflow.Keras wrapper to perform ML experiments.
I need my framework to be able to perform an experiment as specified in a configuration YAML file and run in parallel on a GPU.
Then I need a guarantee that if I ran the experiment again, I would get, if not exactly the same results, then something reasonably close.
To try to ensure this, my training script contains these lines at the beginning, following the guidelines in the official documentation:
import random
import numpy as np
import tensorflow as tf

# Set up random seeds
random.seed(seed)
np.random.seed(seed)
tf.set_random_seed(seed)
This has proven not to be enough.
I ran the same configuration 4 times and plotted the results:
As you can see, results vary a lot between runs.
How can I set up a training session in Keras to ensure I get reasonably similar results when training on a GPU? Is this even possible?
The full training script can be found here.
Some of my colleagues are using just pure TF, and their results seem far more consistent. What is more, they do not seem to be seeding any randomness, except to ensure that the train/validation split is always the same.
Keras + TensorFlow.
Step 1: disable the GPU.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
Step 2: seed the libraries that are included in your code, namely tensorflow, numpy and random.
import tensorflow as tf
import numpy as np
import random as rn

sd = 1  # Here sd means seed.
np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED'] = str(sd)

from keras import backend as K

# Single-threaded ops avoid nondeterminism from thread scheduling.
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
Make sure both of these pieces of code are included at the start of your code; then the result will be reproducible.
Try adding seed parameters to the weight/bias initializers. Just to add more specifics to Alexander Ejbekov's comment:
TensorFlow has two random seeds, graph-level and op-level. If you're using more than one graph, you need to specify a seed in each one. You can override the graph-level seed with an op-level one by setting the seed parameter within a function, and you can make two functions, even from different graphs, output the same value if the same seed is set.
Consider this example:
g1 = tf.Graph()
with g1.as_default():
    tf.set_random_seed(1)
    a = tf.get_variable('a', shape=(1,), initializer=tf.keras.initializers.glorot_normal())
    b = tf.get_variable('b', shape=(1,), initializer=tf.keras.initializers.glorot_normal(seed=2))

with tf.Session(graph=g1) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a))
    print(sess.run(b))

g2 = tf.Graph()
with g2.as_default():
    a1 = tf.get_variable('a1', shape=(1,), initializer=tf.keras.initializers.glorot_normal(seed=1))

with tf.Session(graph=g2) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a1))
In this example, the output of a is the same as that of a1, but b is different.
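A quick way to convince yourself of this (a minimal sketch): initialize the same seeded variable in two separate sessions and compare. With an op-level seed set, the random stream resets per session, so both sessions draw the same value.

with tf.Session(graph=g2) as sess:
    sess.run(tf.global_variables_initializer())
    first = sess.run(a1)

with tf.Session(graph=g2) as sess:
    sess.run(tf.global_variables_initializer())
    second = sess.run(a1)

assert (first == second).all()  # same seed, same draw across sessions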

Tensorflow graph results appear random after restore

I trained a model to predict the next word in a sequence and saved it using tf.train.Saver(). However, when I restore the model and supply it with the same seed value, the output changes each time I run the test. For example, if I supply it with the words "happy birthday to", it will predict "you", but if I run it again 10 seconds later, it will predict "rhyno". I have a feeling that this might be because I randomly initialize the internal layers with random normal weights; however, wouldn't restoring the model restore the values after training rather than reinitialize the layers? My restore code is below:
with tf.Session() as sess:
    saved_model = tf.train.import_meta_graph(
        'C:/Users/me/my_model.meta')  # load graph from training
    saved_model.restore(sess, tf.train.latest_checkpoint('./'))
    imported_graph = tf.get_default_graph()
    x = imported_graph.get_operation_by_name("ph_x").outputs[0]
    prediction = imported_graph.get_tensor_by_name('prediction:0')
    run_input = seed_values
    print(np.array2string(run_input, separator=" "))
    for _ in range(production_size):
        run_input_oh = hlp.word_to_one_hot(run_input, hp_dict, 0)
        pred = hlp.one_hot_to_word(sess.run(prediction, feed_dict={x: run_input_oh}), rev_dict)
        print(sess.run(prediction, feed_dict={x: run_input_oh}))
You called tf.get_default_graph() after restoring the saved weights. This will ignore the restored weights.
Solution:
First get the default graph, then restore the weights!
Try this!
with tf.Session() as sess:
    saved_model = tf.train.import_meta_graph(
        'C:/Users/me/my_model.meta')  # load graph from training
    imported_graph = tf.get_default_graph()
    saved_model.restore(sess, tf.train.latest_checkpoint('./'))
    ...
#midhun pk's answer is mistaken: calling tf.get_default_graph() does not modify the graph, and calling it before or after saved_model.restore makes no difference.
Your code seems fine (calling import_meta_graph adds the nodes of the saved graph to the current graph, and calling restore restores the state of the variables), and it's difficult to debug without more information about your model (e.g. what are run_input, seed_values, etc.?). Can you provide a minimal reproducible example?
You should be able to verify whether your variables are correctly restored by printing the value of a variable at save time and at restore time. Before saving, you can do print(sess.run(variable)) (or use tf.Print). After restoring, you can check the weights of the restored variables as follows. Supposing your variable's name is "XX", do:
var_value = imported_graph.get_tensor_by_name("XX:0")
print(sess.run(var_value))
I was able to find the issue, and it did not involve the process of restoring the saved weights.
When I first built the model for training, I created a dictionary from a text file by building a set. In testing, I built the dictionary from the same text file, assuming that the order of elements would remain the same. Do not make this assumption: a set's iteration order can change between runs, hence the seemingly random results.
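A minimal sketch of the fix (the file name and whitespace tokenization are illustrative): sort the set before numbering it, so the word-to-index mapping is identical on every run.

# Build a deterministic word -> index dictionary from a text file.
with open('vocab.txt') as f:
    words = set(f.read().split())

# sorted() fixes the iteration order; raw set order may differ between runs.
word_to_idx = {word: i for i, word in enumerate(sorted(words))}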

Variables, Constants and Graphs in Tensorflow

I am new to TensorFlow, and after going through some basics from different sources I am completely confused about graphs and their execution. Here is a six-line piece of code:
x = tf.constant([35, 40, 45], name='x')
y = tf.Variable(x + 5, name='y')
model = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(model)
    print(session.run(y))
1. Lines 1 and 2 create a constant and a variable; is a graph created at this point?
2. Is the graph created when I run 'model' through the session, i.e. at variable initialization? And at what point is the graph executed?
3. When the graph is executed, why do we need to run the variable, i.e. session.run(y), to print its value?
Edited:
Here is a line-by-line graph representation; is it correct? I know 2(a) is wrong, which is why I created graph 2(b). So is this what happens to the graph when I run these statements?
So TensorFlow runs in two phases:
Creation phase (or building phase): here you define your variables, constants and placeholders, and their relations (i.e. the mathematical operations on them).
Execution phase: up to this point, all your variables and the computations applied to them (like matmul, addition, etc.) have merely been defined, not computed. They are computed in this phase.
So, to answer your questions:
Q1: At this point, yes, the schema of the graph has been created (the graph has been built), but it has not been executed.
Q2: The graph is executed (that is, the actual initialization is done) when you call the run function on the initializer.
Q3: You need to call run on the initializer first because, as mentioned before, until then the graph schema has merely been defined; the actual allocations and computations have not been done. When the session is started and the run function is called, the graph is executed, and during that process your variables are initialized. Before that, they are not accessible, as they still haven't been initialized, even though they have been defined.
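As a small illustration of Q3, a minimal sketch: evaluating a variable before running its initializer raises a FailedPreconditionError, because the variable exists in the graph but no value has been allocated for it yet.

import tensorflow as tf

y = tf.Variable(5, name='y')

with tf.Session() as session:
    try:
        session.run(y)  # defined in the graph, but not yet initialized
    except tf.errors.FailedPreconditionError:
        print('y is not initialized yet')
    session.run(tf.global_variables_initializer())
    print(session.run(y))  # now prints 5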
The TensorFlow getting started guide here offers a great explanation of the same.
Hope this helps!

TensorFlow - import meta graph and use variables from it

I'm training a classification CNN using TensorFlow v0.12, and then want to create labels for new data using the trained model.
At the end of the training script, I added these lines of code:
saver = tf.train.Saver()
save_path = saver.save(sess,'/home/path/to/model/model.ckpt')
After training completed, the following files appeared in the folder: 1. checkpoint; 2. model.ckpt.data-00000-of-00001; 3. model.ckpt.index; 4. model.ckpt.meta
Then I tried to restore the model using the .meta file. Following this tutorial, I added the following line to my classification code:
saver = tf.train.import_meta_graph(savepath + 'model.ckpt.meta')  # line1
and then:
saver.restore(sess, save_path=savepath + 'model.ckpt')  # line2
Before that change, I needed to build the graph again, and then write (instead of line1):
saver = tf.train.Saver()
But deleting the graph building and using line1 to restore it raised an error. The error was that I used a variable from the graph inside my code, and Python didn't recognize it:
predictions = sess.run(y_conv, feed_dict={x: patches, keep_prob: 1.0})
Python didn't recognize the y_conv name. Is there a way to restore the variables using the meta graph? If not, what good is this restore, if I can't use variables from the original graph?
I know this question isn't very clear, but it was hard for me to express the problem in words. Sorry about it...
Thanks for answering, appreciate your help! Roi.
It is possible, don't worry. Assuming you don't want to touch the graph anymore, do something like this:
saver = tf.train.import_meta_graph('model/export/{}.meta'.format(model_name))
saver.restore(sess, 'model/export/{}'.format(model_name))
graph = tf.get_default_graph()
y_conv = graph.get_operation_by_name('y_conv').outputs[0]
# x and keep_prob must likewise be fetched from the restored graph
# (e.g. via graph.get_tensor_by_name) before they can be fed.
predictions = sess.run(y_conv, feed_dict={x: patches, keep_prob: 1.0})
A preferred way, however, would be adding the ops to collections when you build the graph, and then referring to those collections. So when you define the graph, you would add the line:
tf.add_to_collection("y_conv", y_conv)
And then, after you import the meta graph and restore it, you would call:
y_conv = tf.get_collection("y_conv")[0]
It is actually explained in the documentation, on the exact page you linked, but perhaps you missed it.
Btw, there's no need for the .ckpt extension; it might create some confusion, as that is the old way of saving models.
Just to add to Robert's answer: after obtaining a saver from the meta graph, and using it to restore the variables in the current session, you can also use:
y_conv = graph.get_tensor_by_name('y_conv:0')
This will work if you created y_conv with an explicit name="y_conv" argument (all TF ops accept this).
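Putting both halves together, here is a minimal end-to-end sketch of the collection approach. The tiny softmax model, the tensor names and the 'model/export/my_model' path are all illustrative; only the save/restore mechanics are the point.

import os
import numpy as np
import tensorflow as tf

# --- Training side: name the output op and register it in a collection ---
x = tf.placeholder(tf.float32, shape=(None, 4), name='x')
W = tf.get_variable('W', shape=(4, 2))
y_conv = tf.nn.softmax(tf.matmul(x, W), name='y_conv')
tf.add_to_collection('y_conv', y_conv)

saver = tf.train.Saver()
os.makedirs('model/export', exist_ok=True)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, 'model/export/my_model')  # no .ckpt extension needed

# --- Inference side: fresh graph, import meta graph, restore, fetch ---
tf.reset_default_graph()
saver = tf.train.import_meta_graph('model/export/my_model.meta')
with tf.Session() as sess:
    saver.restore(sess, 'model/export/my_model')
    y_conv = tf.get_collection('y_conv')[0]
    x = tf.get_default_graph().get_tensor_by_name('x:0')
    print(sess.run(y_conv, feed_dict={x: np.zeros((1, 4), np.float32)}))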

Tensorflow: How to give variables scope

I have to pretrain a network before training it. I do this using code in separate files, each with its own session, but the variables from the first session are still being carried over and causing problems (as I'm running both files from one 'main' file).
I could get around this problem by simply running my pretrain file, which saves the trained layers, and then running my training file, which loads the saved layers back in. But it would be nice to be able to do these two things in one step. How can I 'break the link' and avoid unwanted variables having global scope?
The 'main' file looks something like this:
from util import pretrain_nn
from NN import Network

shape = [...]
layer_save_file = ''
data = get_data()

# Trains and saves layers
pretrain_nn(shape, data, layer_save_file)

# If I were to print all variables (using tf.all_variables),
# variables used only in pretrain_nn would show up
# (the printing would be done inside `Network`).
NN = Network(shape, pretrain=True, layer_save_file=layer_save_file)
NN.train(data)

# Doesn't work because apparently some variables haven't been initialized.
NN.save()
The variables' lifetime is implicitly tied to a TensorFlow graph, and by default both of your computations are added to the same (global) graph. You can scope them appropriately by wrapping each subcomputation in its own with tf.Graph().as_default(): block:
with tf.Graph().as_default():
    # Trains and saves layers
    pretrain_nn(shape, data, layer_save_file)

with tf.Graph().as_default():
    NN = Network(shape, pretrain=True, layer_save_file=layer_save_file)
    NN.train(data)
    NN.save()
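To see the isolation at work, here is a small self-contained sketch (variable names are illustrative) showing that tf.global_variables() only reports variables from the current default graph:

import tensorflow as tf

with tf.Graph().as_default():
    v = tf.get_variable('pretrain_var', shape=(), initializer=tf.zeros_initializer())
    print([x.name for x in tf.global_variables()])  # ['pretrain_var:0']

with tf.Graph().as_default():
    w = tf.get_variable('train_var', shape=(), initializer=tf.zeros_initializer())
    # Only this graph's variable shows up; 'pretrain_var' does not leak in.
    print([x.name for x in tf.global_variables()])  # ['train_var:0']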
