tensorflow how to change dataset - python

I have a Dataset API doohickey which is part of my tensorflow graph. How do I swap it out when I want to use different data?
import tensorflow as tf
dataset = tf.data.Dataset.range(3)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
variable = tf.Variable(3, dtype=tf.int64)
model = variable * next_element
# pretend like this is me training my model, or something
with tf.Session() as sess:
    sess.run(variable.initializer)
    try:
        while True:
            print(sess.run(model))  # (0,3,6)
    except tf.errors.OutOfRangeError:
        pass
dataset = tf.data.Dataset.range(2)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
### HOW TO DO THIS THING?
with tf.Session() as sess:
    sess.run(variable.initializer)  # This would be a saver restore operation, normally...
    try:
        while True:
            print(sess.run(model))  # (0,3)... hopefully
    except tf.errors.OutOfRangeError:
        pass

I do not believe this is possible. You are asking to change the computation graph itself, which is not allowed in TensorFlow. Rather than explain that myself, I find the accepted answer to the question "Is it possible to modify an existing TensorFlow computation graph?" particularly clear on that point.
Now, that said, I think there is a fairly simple/clean way to accomplish what you seek. Essentially, you want to reset the graph and rebuild the Dataset part. Of course you want to reuse the model part of the code. Thus just put that model in a class or function to allow reuse. A simple example built on your code:
# the part of the graph you want to reuse
def get_model(next_element):
    variable = tf.Variable(3, dtype=tf.int64)
    return variable * next_element
# the first graph you want to build
tf.reset_default_graph()
# the part of the graph you don't want to reuse
dataset = tf.data.Dataset.range(3)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
# reusable part
model = get_model(next_element)
# pretend like this is me training my model, or something
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    try:
        while True:
            print(sess.run(model))  # (0,3,6)
    except tf.errors.OutOfRangeError:
        pass
# now the second graph
tf.reset_default_graph()
# the part of the graph you don't want to reuse
dataset = tf.data.Dataset.range(2)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
# reusable part
model = get_model(next_element)
### HOW TO DO THIS THING?
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    try:
        while True:
            print(sess.run(model))  # (0,3)... hopefully
    except tf.errors.OutOfRangeError:
        pass
Final note: you will also see occasional references to tf.contrib.graph_editor. Its docs specifically say that you cannot accomplish exactly what you want with the graph editor (see the passage "Here is an example of what you cannot do"), though you can get pretty close. Even so, it is not good practice; they had good reason to make the graph append-only, and I think the method I suggest above is the cleaner way to accomplish what you seek.

One way I would suggest, though it will make things slower, is to use a placeholder as the source of the tf.data.Dataset. You would have the following:
train_data_ph = tf.placeholder(dtype=tf.float32, shape=[None, None, 1])  # just an example
# Then build the tf.data.Dataset on top of the placeholder
train_data = tf.data.Dataset.from_tensor_slices(train_data_ph).shuffle(10000).batch(batch_size)
Now, when running the graph within a session, you have to feed in the data through the placeholder (with an initializable iterator, that happens when you run its initializer). So you can feed whatever you like.
Hope this helps!!

Related

Inference with a model trained with tf.Dataset

I have trained a model using the tf.data.Dataset API, so my training code looks something like this:
with graph.as_default():
    dataset = tf.data.TFRecordDataset(tfrecord_path)
    dataset = dataset.map(scale_features, num_parallel_calls=n_workers)
    dataset = dataset.shuffle(10000)
    dataset = dataset.padded_batch(batch_size, padded_shapes={...})
    handle = tf.placeholder(tf.string, shape=[])
    iterator = tf.data.Iterator.from_string_handle(handle,
                                                   dataset.output_types,
                                                   dataset.output_shapes)
    batch = iterator.get_next()
    ...
    # Model code
    ...
    train_iterator = dataset.make_initializable_iterator()
with tf.Session(graph=graph) as sess:
    train_handle = sess.run(train_iterator.string_handle())
    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        sess.run(train_iterator.initializer)
        while True:
            try:
                sess.run(optimizer, feed_dict={handle: train_handle})
            except tf.errors.OutOfRangeError:
                break
Now, after the model is trained, I want to run inference on examples that are not in the datasets, and I am not sure how to go about it.
Just to be clear, I know how to use another dataset; for example, I just pass a handle to my test set when testing.
The question is: given the scaling scheme and the fact that the network expects a handle, how do I make a prediction for a new example that is not written to a TFRecord?
If I modified the batch directly, I would be responsible for the scaling beforehand, which is something I would like to avoid if possible.
So how should I run inference on single examples with a model trained the tf.data.Dataset way?
(This is not for production purposes; it is for evaluating what happens if I change specific features.)
Actually, there is a tensor named "IteratorGetNext:0" in the graph when you use the Dataset API, so you can set the input directly in the following way:
# get the input tensor from the graph
input_tensor = graph.get_tensor_by_name("IteratorGetNext:0")
# define the target tensor you want to evaluate for your prediction
predictions = ...
# finally, call the session to run it
sess.run(predictions, feed_dict={input_tensor: np.asanyarray(images), ...})

How to replace the input of a saved graph, e.g. a placeholder by a Dataset iterator?

I have a saved Tensorflow graph that consumes input through a placeholder with a feed_dict param.
sess.run(my_tensor, feed_dict={input_image: image})
Because feeding data with a Dataset Iterator is more efficient, I want to load the saved graph, replace the input_image placeholder with an Iterator and run. How can I do that? Is there a better way to do it? An answer with code example would be highly appreciated.
You can achieve that by serializing your graph and reimporting it with tf.import_graph_def, which has an input_map argument used to plug in inputs at the desired places.
To do that you need at least to know the name of the inputs you replace and of the outputs you wish to execute (resp. x and y in my examples).
import tensorflow as tf
# restore graph (built from scratch here for the example)
x = tf.placeholder(tf.int64, shape=(), name='x')
y = tf.square(x, name='y')
# just for display -- you don't need to create a Session for serialization
with tf.Session() as sess:
    print("with placeholder:")
    for i in range(10):
        print(sess.run(y, {x: i}))
# serialize the graph
graph_def = tf.get_default_graph().as_graph_def()
tf.reset_default_graph()
# build new pipeline
batch = tf.data.Dataset.range(10).make_one_shot_iterator().get_next()
# plug in new pipeline
[y] = tf.import_graph_def(graph_def, input_map={'x:0': batch}, return_elements=['y:0'])
# enjoy Dataset inputs!
with tf.Session() as sess:
    print('with Dataset:')
    try:
        while True:
            print(sess.run(y))
    except tf.errors.OutOfRangeError:
        pass
Note that the placeholder node is still there as I did not bother here to parse graph_def to remove it -- you could remove it as an improvement, although I think it is also OK to leave it here.
Depending on how you restore your graph, the input replacement may be already built-in in the loader, which makes things simpler (no need to go back to a GraphDef). For example, if you load your graph from a .meta file, you can use tf.train.import_meta_graph which accepts the same input_map argument.
import tensorflow as tf
# build new pipeline
batch = tf.data.Dataset.range(10).make_one_shot_iterator().get_next()
# load your net and plug in new pipeline
# you need to know the name of the tensor where to plug in your input
restorer = tf.train.import_meta_graph(graph_filepath, input_map={'x:0': batch})
y = tf.get_default_graph().get_tensor_by_name('y:0')
# enjoy Dataset inputs!
with tf.Session() as sess:
    # not needed here, but in practice you would also need to restore weights
    # restorer.restore(sess, weights_filepath)
    print('with Dataset:')
    try:
        while True:
            print(sess.run(y))
    except tf.errors.OutOfRangeError:
        pass

Loading Tensorflow model in different session

I'm a bit new to all this so could you please help me? I tried finding the answer to this question but found nothing.
I'm trying to load Tensorflow model in python in a separate function so I can use the model in a loop without having to load it in every iteration of the for loop.
This is my code now:
def load_network():
    prediction = neural_network_model(x)
    return (prediction)
def use_neural_network(data, prediction):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver = tf.train.import_meta_graph(model_name+'.meta')
        saver.restore(sess,model_name)
        pred = sess.run(prediction, feed_dict={x: data})
        pred = np.asarray(pred)
    return pred
if __name__ == '__main__':
    result=[]
    Load= start_network()
    for i in data:
        result.append(use_neural_network(i,Load))
And I would like to get something like this:
def load_network():
    prediction = neural_network_model(x)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver = tf.train.import_meta_graph(model_name+'.meta')
        saver.restore(sess,model_name)
    return (prediction)
def use_neural_network(data, prediction):
    with tf.Session() as sess:
        pred = sess.run(prediction, feed_dict={x: data})
        pred = np.asarray(pred)
    return pred
if __name__ == '__main__':
    result=[]
    Load= start_network()
    for i in data:
        result.append(use_neural_network(i,Load))
Generally what you're trying to achieve is easily doable and you're on the right track. In the main block you have start_network() instead of load_network() as in your first line. I'd also recommend against using Load as a variable name but that should not be a problem. Also the TensorFlow Session (sess in your code) should either be a global variable, or you should initialize it either in the main block or in the load_network() function and then pass it on to the use_neural_network() function. The way it's currently written the two sess variables in the two functions are local and therefore refer to different sessions.
If you want to avoid calling neural_network_model(x), that is, building the model at the beginning, you might want to freeze the model and load it that way, with the architecture embedded as well. It is easiest to follow a guide on freezing models for that.
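The "create it once and pass it on" structure described above can be sketched without TensorFlow. This is only an illustration of the control flow: the Predictor class is a hypothetical stand-in for the restored graph plus its open session, and the doubling in predict() is a placeholder for the real sess.run call.

```python
class Predictor:
    """Hypothetical stand-in for a restored model together with its open session."""
    loads = 0  # counts how many times the expensive load step ran

    def __init__(self, model_name):
        Predictor.loads += 1          # stands in for import_meta_graph + restore
        self.model_name = model_name

    def predict(self, data):
        # stands in for sess.run(prediction, feed_dict={x: data})
        return [2 * value for value in data]


def load_network(model_name):
    # load once, outside the loop
    return Predictor(model_name)


def use_neural_network(data, predictor):
    # reuse the already-loaded object instead of opening a new session
    return predictor.predict(data)


predictor = load_network("my_model")
results = [use_neural_network(d, predictor) for d in ([1, 2], [3], [4, 5])]
print(results, Predictor.loads)
```

In the real code the object handed around would be the tf.Session (or something that owns it), so the restored weights stay alive for every call in the loop.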

How to reuse a data batch from iterator.get_next()

I'm implementing an algorithm involving alternating optimization. That is, at each iteration, the algorithm fetches a data batch and uses it to optimize two losses sequentially. My current implementation with tf.data.Dataset and tf.data.Iterator is something like this (which is incorrect, as detailed below):
data_batch = iterator.get_next()
train_op_1 = get_train_op(data_batch)
train_op_2 = get_train_op(data_batch)
for _ in range(num_steps):
    sess.run(train_op_1)
    sess.run(train_op_2)
Note that the above is incorrect because each call of sess.run will advance the iterator to get next data batch. So train_op_1 and train_op_2 are indeed using different data batches.
I cannot do something like sess.run([train_op_1, train_op_2]) either, because the two optimization steps need to be sequential (i.e., the 2nd optimization step depends on the latest variable value by the 1st optimization step.)
I'm wondering is there any way to somehow "freeze" the iterator, so that it won't advance in a sess.run call?
I was doing something similar, so this is part of my code with the unnecessary parts stripped out. It does a bit more, as it has train and validation iterators, but you should get the idea of using the is_keep_previous flag. Basically, when passed as True it will force reuse of the previous value of the iterator; when False it will fetch a new value.
iterator_t = ds_t.make_initializable_iterator()
iterator_v = ds_v.make_initializable_iterator()
iterator_handle = tf.placeholder(tf.string, shape=[], name="iterator_handle")
iterator = tf.data.Iterator.from_string_handle(iterator_handle,
                                               iterator_t.output_types,
                                               iterator_t.output_shapes)
def get_next_item():
    # sometimes items need casting
    next_elem = iterator.get_next(name="next_element")
    x, y = tf.cast(next_elem[0], tf.float32), next_elem[1]
    return x, y
def old_data():
    # just forward the existing batch
    # (as posted, inputs/target are not defined yet when tf.cond calls this;
    # the stripped code presumably defined them earlier)
    return inputs, target
is_keep_previous = tf.placeholder_with_default(tf.constant(False), shape=[], name="keep_previous_flag")
inputs, target = tf.cond(is_keep_previous, old_data, get_next_item)
with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    handle_t = sess.run(iterator_t.string_handle())
    handle_v = sess.run(iterator_v.string_handle())
    # Run data iterator initialisation
    sess.run(iterator_t.initializer)
    sess.run(iterator_v.initializer)
    while True:
        try:
            inputs_, target_ = sess.run([inputs, target], feed_dict={iterator_handle: handle_t, is_keep_previous: False})
            print(inputs_, target_)
            inputs_, target_ = sess.run([inputs, target], feed_dict={iterator_handle: handle_t, is_keep_previous: True})
            print(inputs_, target_)
            inputs_, target_ = sess.run([inputs, target], feed_dict={iterator_handle: handle_v})
            print(inputs_, target_)
        except tf.errors.OutOfRangeError:
            # now we know we ran out of elements in the validation iterator
            break
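The intended behaviour of the keep-previous flag can be sketched in plain Python, independent of TensorFlow. This is only an illustration of the semantics, not the answer's implementation: ReplayIterator and its get() method are hypothetical names, and the small lists stand in for data batches.

```python
class ReplayIterator:
    """Wraps any Python iterable and can hand back the previous item again."""

    _EMPTY = object()  # sentinel: nothing has been fetched yet

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._last = self._EMPTY

    def get(self, keep_previous=False):
        # Reuse the cached item only if asked to and one exists.
        if keep_previous and self._last is not self._EMPTY:
            return self._last
        self._last = next(self._it)  # raises StopIteration when exhausted
        return self._last


batches = ReplayIterator([[0, 1], [2, 3], [4, 5]])
first = batches.get()                    # advances to the first batch
again = batches.get(keep_previous=True)  # same batch, iterator not advanced
second = batches.get()                   # advances to the next batch
print(first, again, second)
```

A graph-mode version needs the same ingredient, namely somewhere to store the last fetched batch between sess.run calls.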
Use control dependencies when building the graph for train_op_2 so it can see the updated values of the variables.
Or use eager execution.
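With eager execution the batch is an ordinary Python value, so the idea reduces to fetching it once and passing the same object to both steps. A framework-free sketch of that control flow, where step_1, step_2 and the state dict are hypothetical stand-ins for the two optimization steps and the model variables:

```python
def step_1(state, batch):
    # first update: uses the batch only
    state["w"] += sum(batch)


def step_2(state, batch):
    # second update: sees the value of w that step_1 just wrote
    state["v"] = state["w"] * len(batch)


state = {"w": 0, "v": 0}
seen = []
for batch in [[1, 2], [3, 4]]:  # each loop iteration advances the iterator exactly once
    step_1(state, batch)        # both steps receive the same batch,
    step_2(state, batch)        # and run strictly one after the other
    seen.append((tuple(batch), state["w"], state["v"]))
print(seen)
```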

Predicting single images with tensorflow dataset api

I am trying to create a prediction script using the tensorflow dataset api. Previously I did this using the low-level API and feed_dict:
# import graph
saver = tf.train.import_meta_graph('...')
graph = tf.get_default_graph()
# Select variables to feed
x = graph.get_tensor_by_name("X:0")
predictions = graph.get_tensor_by_name("pred:0")
with tf.Session() as sess:
    p = sess.run(predictions, feed_dict={x: x_feed})
Now I am using the dataset API in the fashion below:
iterator = tf.data.Iterator.from_structure(training_dataset.output_types,
                                           training_dataset.output_shapes)
next_element = iterator.get_next()
training_init_op = iterator.make_initializer(training_dataset)
validation_init_op = iterator.make_initializer(validation_dataset)
for _ in range(20):
    # Initialize an iterator over the training dataset.
    sess.run(training_init_op)
    for _ in range(100):
        sess.run(next_element)
    # Initialize an iterator over the validation dataset.
    sess.run(validation_init_op)
    for _ in range(50):
        sess.run(next_element)
I am saving a .meta and a .data file. How do I use these to create a prediction script? I am unable to extract operations from the graph and feed in the desired values, and there are no placeholders defined. One way would be to use the same script with test data, but there must be a better way?
Thanks
