I'm trying to create an incremental classifier that will get trained on data containing n classes for some set number of epochs, then n+m classes for a set number of epochs, then n+m+k, etc, where each successive set of classes contains the previous set as a subset.
In order to do this without having to train the model, save it, manually edit the graph, re-train, repeat, I'm simply defining all the weights I will need to classify the entire set of classes, but keeping the weights corresponding to unseen classes frozen at 0 until the classifier is introduced to those classes.
My strategy for this is to define a placeholder that is fed in an array of Boolean values defining whether or not some given set of weights are trainable.
Relevant code below:
output_train = tf.placeholder(tf.int32, shape = (num_incremental_grps), name = "output_train")
.
.
.
weights = []
biases = []
for i in range(num_incremental_grps):
    W = tf.Variable(tf.zeros([batch_size, classes_per_grp]),
                    trainable=tf.cond(tf.equal(output_train[i], tf.constant(1)),lambda: tf.constant(True), lambda: tf.constant(False)))
    weights.append(W)
    b = tf.Variable(tf.zeros([classes_per_grp]), trainable=tf.cond(tf.equal(output_train[i],
                    tf.constant(1)), lambda:tf.constant(True), lambda: tf.constant(False)))
    biases.append(b)
out_weights = tf.stack(weights, axis=1).reshape((batch_size, -1))
out_biases = tf.stack(biases, axis=1).reshape((batch_size, -1))
outputs = tf.identity(tf.matmul(inputs, out_weights) + out_biases, name='values')
.
.
.
# Will change this to an array that progressively updates as classes are added.
output_trainable = np.ones(num_incremental_grps, dtype=bool)
.
.
.
with tf.Session() as sess:
    init.run()
    for epoch in range(epochs):
        for iteration in range(iterations):
            X_batch, y_batch = batch.getBatch()
            fd={X: X_batch, y: y_batch, training: True, output_train: output_trainable}
            _, loss_val = sess.run([training_op, loss], feed_dict=fd)
This returns the error message
Using a 'tf.Tensor' as a Python `bool` is not allowed. Use `if t is not None:` instead of
`if t:` to test if a tensor is defined,and use TensorFlow ops such as tf.cond to execute
subgraphs conditioned on the value of a tensor.
I've tried tinkering around with this, like making the initial placeholder datatype tf.bool instead of tf.int32. I've also tried just feeding in a slice of the tensor into the 'trainable' argument in the weights/biases like this
W = tf.Variable(tf.zeros([batch_size, classes_per_grp]), trainable=output_variable[i])
but I get the same error message. I'm not sure how to proceed from here, aside from trying a completely different approach to updating the number of predictable classes. Any help would be much appreciated.
The error occurs because the trainable argument must be a plain Python bool, whereas tf.cond returns a tensor. In addition, tf.cond takes a decision based on a single boolean, much like an if statement, while what you want here is to make a choice per element of your tensor.
You could use tf.where for a per-element choice, but you would then run into another problem: trainable is not a property that you can set at runtime; it is part of the definition of a variable. If a variable will be trained at some point, perhaps not at the beginning but definitely later, then it must be trainable.
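Why a tensor cannot be passed as `trainable` can be illustrated with a small pure-Python sketch. `SymbolicTensor` and `make_variable` are hypothetical stand-ins, not TensorFlow classes; the assumption is that the variable constructor tests `trainable` with an ordinary Python `if` while the graph is being built, which is what the error text suggests.

```python
class SymbolicTensor:
    """Hypothetical stand-in for a graph tensor whose value is unknown until run time."""

    def __bool__(self):
        # A symbolic value cannot be collapsed to True/False at build time.
        raise TypeError("Using a symbolic tensor as a Python bool is not allowed.")


def make_variable(trainable=True):
    """Hypothetical constructor that decides trainability once, at definition time."""
    collections = ["GLOBAL_VARIABLES"]
    if trainable:  # plain Python truth test, evaluated immediately
        collections.append("TRAINABLE_VARIABLES")
    return collections


print(make_variable(trainable=True))
print(make_variable(trainable=False))

try:
    make_variable(trainable=SymbolicTensor())
except TypeError as err:
    print("construction failed:", err)
```

The decision is made once, when the variable is defined, so nothing fed at run time can change it.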
I would suggest taking a much simpler route: define output_train to be an array of tf.float32
output_train = tf.placeholder(tf.float32, shape=(num_incremental_grps), name="output_train")
then later simply multiply your weights and biases by the corresponding entry of this vector.
W = tf.Variable(...)
W = W * output_train[i]
...
Provide values of 1 to output_train where you want training to happen, 0 otherwise.
Be careful to also mask your loss to ignore output from unwanted channels, because even though they now always output 0, that may still affect your loss. For example,
logits = ...
logits = tf.matrix_transpose(tf.boolean_mask(
    tf.matrix_transpose(logits),
    tf.equal(output_train, 1.0)))
loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels)
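The reason channels stuck at 0 still matter can be checked by hand. The sketch below is plain Python rather than TensorFlow, and the helper names (`softmax_cross_entropy`, `mask_logits`) and the numbers are made up for illustration. A logit of 0 still contributes exp(0) = 1 to the softmax denominator, so the loss changes when the unused channels are dropped.

```python
import math


def softmax_cross_entropy(logits, label_index):
    """Cross-entropy of a softmax over `logits` against a one-hot label."""
    peak = max(logits)
    shifted = [z - peak for z in logits]  # subtract the max for numerical stability
    log_denominator = math.log(sum(math.exp(z) for z in shifted))
    return -(shifted[label_index] - log_denominator)


def mask_logits(logits, mask):
    """Keep only the logits whose mask entry is 1 (the classes seen so far)."""
    return [z for z, keep in zip(logits, mask) if keep == 1]


# Two trained classes followed by two frozen classes whose logits are stuck at 0.
logits = [2.0, 1.0, 0.0, 0.0]
mask = [1, 1, 0, 0]

unmasked = softmax_cross_entropy(logits, label_index=0)
masked = softmax_cross_entropy(mask_logits(logits, mask), label_index=0)

print("loss with frozen channels included:", round(unmasked, 4))
print("loss with frozen channels removed: ", round(masked, 4))
```

The label index stays valid here only because the true class is among the kept channels; in a real pipeline the labels would need the same mask as the logits.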
Problem:
I am very new to TensorFlow. My specific question is: what arguments should I pass to the sess.run(fetches, feed_dict) function? For instance, how can I find out what the values of the arguments should be?
Steps:
Here is my understanding of the steps after looking at other posts.
Save the trained TensorFlow model; it should consist of 4 files. Below are my outputs:
checkpoint
Inception_resnet_v2.ckpt.data-00000-of-00001
Inception_resnet_v2.ckpt.index
Inception_resnet_v2.ckpt.meta
Resize the input image to whatever format required by the neural network.
Start tensorflow session.
Retrieve the graph and associated parameters, tensors...
Predict the input image.
Code:
Training code:
https://github.com/taki0112/SENet-Tensorflow/blob/master/SE_Inception_resnet_v2.py
[Solved] Test code:
import tensorflow as tf
import numpy as np
import cv2
labels = ["airplane","automobile","bird","cat","deer","dog","frog","horse","ship","truck"]
# Load graph and parameters, etc.
sess=tf.Session()
saver = tf.train.import_meta_graph('./model/Inception_resnet_v2.ckpt.meta')
saver.restore(sess, tf.train.latest_checkpoint("./model/"))
graph = tf.get_default_graph()
# Get tensor names
x = graph.get_tensor_by_name("Placeholder:0")
training_flag = graph.get_tensor_by_name("Placeholder_2:0")
op_to_restore = graph.get_tensor_by_name("final_fully_connected/dense/BiasAdd:0")
# Preprocess image input
src = cv2.imread("./input/car3.jpg")
dst = cv2.resize(src, (32, 32), interpolation=cv2.INTER_CUBIC)
b,g,r = cv2.split(dst)
b = (b - np.mean(b)) / np.std(b) * .1
g = (g - np.mean(g)) / np.std(g) * .1
r = (r - np.mean(r)) / np.std(r) * .1
src = cv2.merge((b,g,r))
picture = dst.reshape(1, 32, 32, 3)
feed_dict ={x: picture, training_flag:False}
result_index = sess.run(op_to_restore,feed_dict)
print(result_index)
print (labels[np.argmax(result_index)])
The arguments depend on what you're doing: the first argument (fetches) is the tensor or op you want to evaluate, and feed_dict supplies values for the placeholders it depends on. Whenever you work with TensorFlow, you define a graph which is fed examples (training data) and some hyperparameters such as the learning rate, global step, etc. It's standard practice to feed all the training data and hyperparameters using placeholders. When you build a network using placeholders and save it, the network is saved, but the values of the placeholders are not.
Let's see a toy example:
import tensorflow as tf
#Prepare to feed input, i.e. feed_dict and placeholders
w1 = tf.placeholder("float", name="w1")
w2 = tf.placeholder("float", name="w2")
b1= tf.Variable(2.0,name="bias")
feed_dict ={w1:4,w2:8}
#Define a test operation that we will restore
w3 = tf.add(w1,w2)
w4 = tf.multiply(w3,b1,name="op_to_restore")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
#Create a saver object which will save all the variables
saver = tf.train.Saver()
#Run the operation by feeding input
print(sess.run(w4, feed_dict))
#Prints 24.0, which is (w1+w2)*b1
#Now, save the graph
saver.save(sess, 'my_test_model',global_step=1000)
Now, when we want to restore it, we not only have to restore the graph and weights, but also prepare a new feed_dict that will feed the new training data to the network. We can get references to the saved operations and placeholder variables via the graph.get_tensor_by_name() method. If you want to train the same model on further data, you would have to reuse those weights; if you just want predictions from the model you trained, you can run op_to_restore with a feed_dict containing the new data. Following the example above, something like this:
import tensorflow as tf
sess=tf.Session()
#First let's load meta graph and restore weights
saver = tf.train.import_meta_graph('my_test_model-1000.meta')
saver.restore(sess,tf.train.latest_checkpoint('./'))
# Now, let's access and create placeholders variables and
# create feed-dict to feed new data
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict ={w1:13.0,w2:17.0}
#Now, access the op that you want to run.
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")
print(sess.run(op_to_restore, feed_dict))
#This will print 60.0, which is calculated
#using new values of w1 and w2 and saved value of b1.
So, this is how it works. In your case, since you're trying to load the Inception model, your op_to_restore depends on what you're trying to restore; if you could tell us what you're trying to do, it would be possible to suggest something more specific. The other parameter, feed_dict, is just the numpy array of pixels of the image you're trying to classify or predict on.
I took the code from the following article. This will help you as well. http://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/
Update: For your particular case, you may like to try the following code to predict the classes in the new images.
import tensorflow as tf
slim = tf.contrib.slim
from inception_resnet_v2 import *
#Well, since you're using resnet_v2, this may be equivalent to you.
checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'
sample_images = ['dog.jpg', 'panda.jpg']
#Load the model
sess = tf.Session()
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(input_tensor, is_training=False)
#With this, you could consider the op_variable with the following
predict_values, logit_values = sess.run([end_points['Predictions'], logits], feed_dict={input_tensor: im})
#Here im is the normalized numpy array of the image pixels.
Furthermore, the following resources may help you even more:
Using pre-trained inception_resnet_v2 with Tensorflow
https://github.com/tensorflow/tensorflow/issues/7172
The context is that I'm trying to incrementally grow a rnn autoencoder, by first training a single cell encoder/decoder then extending. I'd like to load the parameters of the preceding cells.
This code here is a minimal code where I'm investigating how to do this, and it fails with:
TypeError: Cannot interpret feed_dict key as Tensor: The name 'save_1/Const:0' refers to a Tensor which does not exist. The operation, 'save_1/Const', does not exist in the graph.
I've searched and found nothing, this thread and this thread are not the same problem.
MVCE
import tensorflow as tf
import numpy as np
with tf.Session(graph=tf.Graph()) as sess:
    cell1 = tf.nn.rnn_cell.LSTMCell(1,name='lstm_cell1')
    cell = tf.nn.rnn_cell.MultiRNNCell([cell1])
    inputs = tf.random_normal((5,10,1))
    rnn1 = tf.nn.dynamic_rnn(cell,inputs,dtype=tf.float32)
    vars0 = tf.trainable_variables()
    saver = tf.train.Saver(vars0,max_to_keep=1)
    sess.run(tf.initialize_all_variables())
    saver.save(sess,'./save0')
    vars0_val = sess.run(vars0)
# creating a new graph/session because it is not given that it'll be in the same session.
with tf.Session(graph=tf.Graph()) as sess:
    cell1 = tf.nn.rnn_cell.LSTMCell(1,name='lstm_cell1')
    #one extra cell
    cell2 = tf.nn.rnn_cell.LSTMCell(1,name='lstm_cell2')
    cell = tf.nn.rnn_cell.MultiRNNCell([cell1,cell2])
    inputs = tf.random_normal((5,10,1))
    rnn1 = tf.nn.dynamic_rnn(cell,inputs,dtype=tf.float32)
    sess.run(tf.initialize_all_variables())
    # new saver with first cell variables
    saver = tf.train.Saver(vars0,max_to_keep=1)
    # fails
    saver.restore(sess,'./save0')
    # Should be the same
    vars0_val1 = sess.run(vars0)
    assert all(np.array_equal(a, b) for a, b in zip(vars0_val1, vars0_val))
The mistake comes from the line,
saver = tf.train.Saver(vars0,max_to_keep=1)
in the second session. vars0 refers to actual tensor objects that existed in the previous graph (not the current one). Saver's var_list requires actual variables or tensors from the current graph (not strings, which I assumed would be good enough).
To make it work the second Saver object should be initialized with the corresponding tensors in the current graph.
Something like,
vars0_names = [v.name for v in vars0]
load_vars = [sess.graph.get_tensor_by_name(n) for n in vars0_names]
saver = tf.train.Saver(load_vars,max_to_keep=1)
I have seen variations of this question asked, but I haven't quite found a satisfactory answer yet. Basically, I would like to do the equivalent of Keras's model.to_json(), model.get_weights(), model.from_json(), model.set_weights() in TensorFlow. I think I am getting close, but I am at a point where I am stuck. I'd prefer if I could get the weights and graph in the same string, but I understand if that isn't possible.
Currently, what I have is:
g = optimizer.minimize(loss_op,
global_step=tf.train.get_global_step())
de = g.graph.as_graph_def()
json_string = json_format.MessageToJson(de)
gd = tf.GraphDef()
gd = json_format.Parse(json_string, gd)
That seems to create the graph fine, but obviously the meta graph is not included for variable, weights, etc. There is also the meta graph, but the only thing I see is export_meta_graph, which doesn't seem to serialize in the same manner. I saw that MetaGraph has a proto function, but I don't know how to serialize those variables.
So in short, how would you take a tensorflow model (model as in weights, graph, etc), serialize it to a string (preferably json), then deserialize it and continue training or serve predictions.
Here are things that get me close to there and I have tried, but mostly has limitations in needing to write to disk, which I can't do in this case:
Gist on GitHub
This is the closest one I found, but the link to serializing a metagraph doesn't exist.
Note that the solution from @Maxim will create new operations in the graph each time it runs.
If you run the function very frequently this will cause your code to get slower and slower.
Two solutions to work around this problem:
Create the assign operations at the same time as the rest of the graph and reuse them:
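assign_placeholders = []
assign_ops = []
for var in tf.trainable_variables():
    # One placeholder and one assign op per variable, built once with the graph.
    assign_placeholder = tf.placeholder(var.dtype, shape=var.shape)
    assign_ops.append(var.assign(assign_placeholder))
    assign_placeholders.append(assign_placeholder)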
Use the load function on the variables, I prefer this one as it removes the need for the code above:
self.params = tf.trainable_variables()
def get_weights(self):
    values = tf.get_default_session().run(self.params)
    return values
def set_weights(self, weights):
    for i, value in enumerate(weights):
        value = np.asarray(value)
        self.params[i].load(value, self.sess)
(I can't comment so I put this as an answer instead)
If you want the equivalent of keras Model.get_weights() and Model.set_weights(), these methods aren't strongly tied to keras internals and can be easily extracted.
Original code
Here's what they look like in the Keras source code:
def get_weights(self):
    weights = []
    for layer in self.layers:
        weights += layer.weights
    return K.batch_get_value(weights)  # this is just `get_session().run(weights)`
def set_weights(self, weights):
    tuples = []
    for layer in self.layers:
        num_param = len(layer.weights)
        layer_weights = weights[:num_param]
        for sw, w in zip(layer.weights, layer_weights):
            tuples.append((sw, w))
        weights = weights[num_param:]
    K.batch_set_value(tuples)  # another wrapper over `get_session().run(...)`
Keras's weights are a list of numpy arrays (not JSON). As you can see, it uses the fact that the model architecture is known (self.layers), which allows it to reconstruct the correct mapping from variables to values. Some seemingly non-trivial work is done in K.batch_set_value, but in fact it simply prepares assign ops and runs them in the session.
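The slice-and-advance bookkeeping in set_weights can be seen in isolation with a small pure-Python sketch. The layer names, parameter counts, and the helper `pair_weights_with_layers` are made up for illustration and are not part of Keras.

```python
def pair_weights_with_layers(layer_param_counts, flat_weights):
    """Split a flat weight list into per-layer chunks, in layer order."""
    pairs = []
    remaining = list(flat_weights)
    for layer_name, num_param in layer_param_counts:
        # Take this layer's share from the front, then drop it from the list.
        pairs.append((layer_name, remaining[:num_param]))
        remaining = remaining[num_param:]
    return pairs


# Hypothetical model: two dense layers (kernel + bias each) around a
# parameter-free layer.
layers = [("dense_1", 2), ("dropout", 0), ("dense_2", 2)]
flat = ["W1", "b1", "W2", "b2"]

for name, chunk in pair_weights_with_layers(layers, flat):
    print(name, chunk)
```

The mapping relies only on the order of the layers, which is why the flat list is meaningless without the architecture that produced it.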
Getting and setting weights in pure tensorflow
def tensorflow_get_weights():
    vars = tf.trainable_variables()
    values = tf.get_default_session().run(vars)
    # list(...) so the result can be iterated more than once on Python 3
    return list(zip([var.name for var in vars], values))
def tensorflow_set_weights(weights):
    assign_ops = []
    feed_dict = {}
    for var_name, value in weights:
        var = tf.get_default_session().graph.get_tensor_by_name(var_name)
        value = np.asarray(value)
        assign_placeholder = tf.placeholder(var.dtype, shape=value.shape)
        assign_op = tf.assign(var, assign_placeholder)
        assign_ops.append(assign_op)
        feed_dict[assign_placeholder] = value
    tf.get_default_session().run(assign_ops, feed_dict=feed_dict)
Here I assume that you want to serialize / deserialize the whole model (i.e., all trainable variables) and in the default session. If this is not the case, functions above are easily customizable.
Testing
x = tf.placeholder(shape=[None, 5], dtype=tf.float32, name='x')
W = tf.Variable(np.zeros([5, 5]), dtype=tf.float32, name='W')
b = tf.Variable(np.zeros([5]), dtype=tf.float32, name='b')
y = tf.add(tf.matmul(x, W), b)
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # Save the weights
    w = tensorflow_get_weights()
    print(W.eval(), b.eval())
    # Update the model
    session.run([tf.assign(W, np.ones([5, 5])), tf.assign(b, np.ones([5]) * 2)])
    print(W.eval(), b.eval())
    # Restore the weights
    tensorflow_set_weights(w)
    print(W.eval(), b.eval())
If you run this test, you should see that the model started at zeros, then got updated, and was then restored back to zeros.
You can use freeze_graph
This script is included in TensorFlow. It takes a GraphDef proto, a SaverDef proto, and a set of variable values stored in a checkpoint file,
and outputs a GraphDef in which all of the variable ops are converted into const ops containing the values of the variables.
To restore a frozen model you have to reinitialize graphs and remap inputs from the frozen model, see this example
Thanks to Maxim for getting me to the solution. I wanted to post an answer with both the graph and weights being converted to json for people that stumble across this problem. To just serialize the graph and not the weights, I created a gist that encapsulates what Maxim wrote here: Tensorflow graph with non json serialized weights
Now to serialize/deserialize both the graph and weights, I created a separate gist here: Tensorflow graph with json serialized weights and graph.
To run through the explanation, I first slightly tweaked the weight functions by not returning the variables in get weights, and in set weights, grabbing the current variables there. This is an important caveat, especially if the graph is slightly different than the current trainable variables:
import tensorflow as tf
import numpy as np
from google.protobuf import json_format
import json
def tensorflow_get_weights():
    vs = tf.trainable_variables()
    values = tf.get_default_session().run(vs)
    return values
def tensorflow_set_weights(weights):
    assign_ops = []
    feed_dict = {}
    vs = tf.trainable_variables()
    zipped_values = zip(vs, weights)
    for var, value in zipped_values:
        value = np.asarray(value)
        assign_placeholder = tf.placeholder(var.dtype, shape=value.shape)
        assign_op = var.assign(assign_placeholder)
        assign_ops.append(assign_op)
        feed_dict[assign_placeholder] = value
    tf.get_default_session().run(assign_ops, feed_dict=feed_dict)
Next, I created two utility functions that would convert weights to and from json:
def convert_weights_to_json(weights):
    weights = [w.tolist() for w in weights]
    weights_list = json.dumps(weights)
    return weights_list
def convert_json_to_weights(json_weights):
    loaded_weights = json.loads(json_weights)
    loaded_weights = [np.asarray(x) for x in loaded_weights]
    return loaded_weights
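One detail the two helpers above do not preserve is the dtype: a float32 array that goes through tolist() and np.asarray comes back as float64. Whether that matters depends on how the values are fed back in. A minimal sketch of a variant that records the dtype next to the values follows; `weights_to_json` and `json_to_weights` are hypothetical names, and the sample arrays are made up.

```python
import json

import numpy as np


def weights_to_json(weights):
    """Serialize a list of arrays, recording each dtype next to its values."""
    payload = [{"dtype": str(w.dtype), "values": w.tolist()} for w in weights]
    return json.dumps(payload)


def json_to_weights(json_weights):
    """Rebuild the arrays with the dtype they were saved with."""
    payload = json.loads(json_weights)
    return [np.asarray(item["values"], dtype=item["dtype"]) for item in payload]


original = [np.zeros((2, 3), dtype=np.float32), np.arange(3, dtype=np.float32)]
restored = json_to_weights(weights_to_json(original))

for before, after in zip(original, restored):
    print(before.dtype, after.dtype, before.shape == after.shape)
```

The shapes survive because tolist() produces nested lists; only the dtype needs to be carried explicitly.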
Then I had a method that initially ran to kick off training. This method would initialize variables, run the optimization, get the weights and graph, and convert them into json. It looks like:
def run_initial_with_json_weights(opti, feed_dict):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(0, 250):
            sess.run(opti, feed_dict=feed_dict)
        first_weights = tensorflow_get_weights()
        g = tf.get_default_graph().as_graph_def()
        json_string = json_format.MessageToJson(g)
        return json_string, convert_weights_to_json(first_weights)
Now that we have the serialized weights and graph, if we want to continue training and or make predictions, we can do the following. This method deserializes the graphdef and weights, runs the optimization, then makes predictions.
def run_serialized(json_graph, json_weights, feed_dict):
    gd = tf.GraphDef()
    gd = json_format.Parse(json_graph, gd)
    weights = convert_json_to_weights(json_weights)
    with tf.Session() as sess:
        tf.import_graph_def(gd)
        sess.run(tf.global_variables_initializer())
        nu_out = tf.get_default_graph().get_tensor_by_name('outer/Sigmoid:0')
        mini = tf.get_default_graph().get_tensor_by_name('mini:0')
        tensorflow_set_weights(weights)
        for i in range(0, 50):
            sess.run(mini, feed_dict=feed_dict)
        predicted = sess.run(nu_out, feed_dict=feed_dict)
        return predicted
A full xor example is in the gist above.
As far as I understand it, I should be able to add a print operator to my graph by doing something like this:
a = nn_ops.softmax(s)
a = tf.Print(a, [tf.shape(a)], message="This is shape a: ")
and when the graph is executed this should print the shape of a. However, this statement produces no output for me (I am running the seq2seq tensorflow tutorial and this softmax belongs to the attention function, so it's definitely executed).
I do get output if instead I do something like this:
ph = tf.placeholder(tf.float32, [3,4,5,6])
ts = tf.shape(ph)
tp = tf.Print(ts, [ts], message="PRINT=")
sess = tf.Session()
sess.run(tp)
However, in my real example, sess.run() is called in seq2seq_model.py, and if I try to do sess.run(a) in the attention function, tensorflow complains:
You must feed a value for placeholder tensor 'encoder0' with dtype int32
but I don't have access to the input feed at this point in the code. How can I fix this?
In case you just want to know the tensor shape, often it can be inferred without running the graph. Then you do not need tf.Print.
For example in the second code fragment you can just use:
ph = tf.placeholder(tf.float32, [3,4,5,6])
print(ph.get_shape())
If you want to see the shape which depends on the input size (using tf.shape) or you want to see a value which also depends on the input, it is not possible to do without providing the input data.
For example if you train the model where x and y are your samples and labels respectively, you cannot compute cost without providing them.
If you have the following code:
predictions = ...
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_output, labels=y_train))
cost = tf.Print(cost, [tf.shape(cost)], message='cost:')
Trying to evaluate it without providing placeholder values won't work:
sess.run(cost)
# error, no placeholder provided
However this will work as expected:
sess.run(cost, {x: x_train, y: y_train})
Regarding your first code fragment: for tf.Print to print a message, the node needs to be executed. I suspect that in your case the print node is not used during further computations.
For example the following code won't produce output:
import tensorflow as tf
a = tf.Variable([1., 2., 3])
b = tf.Variable([1., 2., 3])
c = tf.add(a, b)
# we create a print node, but it is never used
a = tf.Print(a, [tf.shape(a)], message='a.shape: ')
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print(sess.run(c))
However if you reverse lines such that the print node is used during computations, you will see the output:
a = tf.Print(a, [tf.shape(a)], message='a.shape: ')
# now c depends on the tf.Print node
c = tf.add(a, b)
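The same dependency rule can be reproduced without TensorFlow. The sketch below is a toy lazy graph in plain Python; `Node`, `constant`, and `print_node` are hypothetical helpers written for this illustration, with `print_node` playing the role of tf.Print. Evaluating a node only visits the nodes it depends on, so a print node that nothing consumes never fires.

```python
class Node:
    """Hypothetical lazy graph node: runs `fn` on its inputs only when evaluated."""

    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = inputs

    def run(self, executed):
        # Evaluate dependencies first, then this node, recording the visit order.
        values = [node.run(executed) for node in self.inputs]
        executed.append(self.name)
        return self.fn(*values)


def constant(name, value):
    return Node(name, lambda: value)


def print_node(name, source):
    """Identity node with a side effect, standing in for tf.Print."""
    def passthrough(value):
        print(name, "saw", value)
        return value
    return Node(name, passthrough, (source,))


a = constant("a", 1.0)
b = constant("b", 2.0)

# Case 1: the print node is created after the add, so the add does not depend on it.
c_without = Node("add", lambda x, y: x + y, (a, b))
unused_print = print_node("print_a", a)
executed_without = []
c_without.run(executed_without)

# Case 2: the print node is created first and the add consumes its output.
printed_a = print_node("print_a", a)
c_with = Node("add", lambda x, y: x + y, (printed_a, b))
executed_with = []
c_with.run(executed_with)

print(executed_without)
print(executed_with)
```

Only the second case lists the print node among the executed nodes, which matches the reordering shown above.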