Using tf.train.string_input_producer and tf.image.decode_jpeg I manage to read from disk and decode a single image.
This is the code:
# -------- Graph
filename_queue = tf.train.string_input_producer(
[img_path, img_path])
image_reader = tf.WholeFileReader()
key, image_file = image_reader.read(filename_queue)
image = tf.image.decode_jpeg(image_file, channels=3)
# Run my network
logits = network.get_logits(image)
# -------- Session
sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
logits_output = sess.run(logits)
The thing is that when I look at the shape of logits_output, I get only 1 value even though the queue is 2 images long.
How can I read and decode the entire queue?
tf.WholeFileReader(), together with tf.train.string_input_producer(), works as an iterator, and thus does not offer an easy way to evaluate the size of the complete dataset it is handling.
To obtain batches of N samples out of it, you could instead use image_reader.read_up_to(filename_queue, N).
Note: you can achieve the same using the newer pipeline:
def _parse_function(filename):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_image(image_string)
    return image_decoded
# A vector of filenames.
filenames = tf.constant([img_path, img_path])
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.map(_parse_function).batch(N)
iterator = dataset.make_one_shot_iterator()
next_image_batch = iterator.get_next()
logits = network.get_logits(next_image_batch)
# ...
My code hangs on the following statement:
print('Beginning Eval...')
feed_dict_train = {img_data: images_batch.eval(session=session),
img_labels: labels_batch.eval(session=session)}
I have tried to eval with both a single example and as a batch. The code to understand how images_batch and labels_batch are created is as follows:
Creating a single example:
def read_single_example(filename):
    image_size = 256
    filenames = tf.train.string_input_producer([filename], num_epochs=None)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filenames)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.float32),
            'image': tf.FixedLenFeature([image_size*image_size*3], tf.float32)
        })
    label = features['label']
    #label = tf.get_default_session().run(label)
    image_data = features['image']
    #image_data = tf.get_default_session().run(image_data)
    image = tf.reshape(image_data, (256, 256, 3))
    return label, image
Reading a single example:
label_train, image_train = read_single_example(path)
Batching the examples inside the training loop:
print('Getting Batch')
images_batch, labels_batch = tf.train.shuffle_batch(
[image_train, label_train], batch_size=batch_size,
The records read from disk were created with the code found here, but in essence they are just a bunch of 256x256x3 images with a 1x41 tensor for labels. That's what I think I did anyway; I'm hoping the records aren't where I made the error, or I have about 100GB of dead data.
This is how the Tf session is started:
session = tf.Session()
init = tf.global_variables_initializer()
To my understanding, that was all I needed to do to make everything work as intended. A gist of the full code for the network can be found here. Hope I didn't do anything stupid, and thanks for the help!
I have a model that trains my network using an Iterator, following the new Dataset API pipeline model that's now recommended by Google.
I read tfrecord files, feed data to the network, and train nicely; all is going well. I save my model at the end of training so I can run inference on it later. A simplified version of the code is as follows:
""" Training and saving """
training_dataset =
training_dataset =
training_dataset = training_dataset.batch(BATCH_SIZE)
with tf.name_scope("iterators"):
    training_iterator = Iterator.from_structure(training_dataset.output_types, training_dataset.output_shapes)
    next_training_element = training_iterator.get_next()
    training_init_op = training_iterator.make_initializer(training_dataset)
def train(num_epochs):
    # compute for the number of epochs
    for e in range(1, num_epochs+1):
        session.run(training_init_op) #initializing iterator here
        while True:
            try:
                images, labels = session.run(next_training_element)
                # "optimizer" is an assumed name for the training op
                session.run(optimizer, feed_dict={x: images, y_true: labels})
            except tf.errors.OutOfRangeError:
                saver_name = './saved_models/ucf-model'
                saver.save(session, saver_name)
                print("Finished Training Epoch {}".format(e))
                break
""" Restoring """
# restoring the saved model and its variables
session = tf.Session()
saver = tf.train.import_meta_graph(r'saved_models\ucf-model.meta')
saver.restore(session, tf.train.latest_checkpoint('.\saved_models'))
graph = tf.get_default_graph()
# restoring relevant tensors/ops
accuracy = graph.get_tensor_by_name("accuracy/Mean:0") #the tensor that when evaluated returns the mean accuracy of the batch
testing_iterator = graph.get_operation_by_name("iterators/Iterator") #my iterator used in testing.
next_testing_element = graph.get_operation_by_name("iterators/IteratorGetNext") #the GetNext operator for my iterator
# loading my testing set tfrecords
testing_dataset =
testing_dataset =, num_threads=4, output_buffer_size=BATCH_SIZE*20)
testing_dataset = testing_dataset.batch(BATCH_SIZE)
testing_init_op = testing_iterator.make_initializer(testing_dataset) #to initialize the dataset
with tf.Session() as session:
    while True:
        try:
            images, labels = session.run(next_testing_element)
            accuracy = session.run(accuracy, feed_dict={x: test_images, y_true: test_labels}) #error here, x, y_true not defined
        except tf.errors.OutOfRangeError:
            break
My problem is mainly when I restore the model. How to feed testing data to the network?
When I restore my Iterator using testing_iterator = graph.get_operation_by_name("iterators/Iterator"), next_testing_element = graph.get_operation_by_name("iterators/IteratorGetNext"), I get the following error:
GetNext() failed because the iterator has not been initialized. Ensure that you have run the initializer operation for this iterator before getting the next element.
So I tried to initialize my dataset using testing_init_op = testing_iterator.make_initializer(testing_dataset), and got this error: AttributeError: 'Operation' object has no attribute 'make_initializer'
Another issue: since an iterator is being used, there's no need for placeholders in the training model, as the iterator feeds data directly to the graph. But then how do I restore my feed_dict keys in the third-to-last line, when I feed data to the "accuracy" op?
EDIT: if someone could suggest a way to add placeholders between the Iterator and the network input, then I could try running the graph by evaluating the "accuracy" tensor while feeding data to the placeholders and ignoring the iterator altogether.
When restoring a saved meta graph, you can restore the initialization operation by name and then use it again to initialize the input pipeline for inference.
That is, when creating the graph, you can do
dataset_init_op = iterator.make_initializer(dataset, name='dataset_init')
And then restore this operation by doing:
dataset_init_op = graph.get_operation_by_name('dataset_init')
Here is a self-contained code snippet that compares the results of a randomly initialized model before and after restoring.
Saving an Iterator
import numpy as np
import tensorflow as tf

data = np.random.random([4, 4])
X = tf.placeholder(dtype=tf.float32, shape=[4, 4], name='X')
dataset = tf.data.Dataset.from_tensor_slices(X)
iterator = tf.data.Iterator.from_structure(dataset.output_types, dataset.output_shapes)
dataset_next_op = iterator.get_next()
# name the operation
dataset_init_op = iterator.make_initializer(dataset, name='dataset_init')
w = np.random.random([1, 4])
W = tf.Variable(w, name='W', dtype=tf.float32)
output = tf.multiply(W, dataset_next_op, name='output')
sess = tf.Session()
saver = tf.train.Saver()
sess.run(tf.global_variables_initializer())
sess.run(dataset_init_op, feed_dict={X:data})
while True:
    try:
        print(sess.run(output))
    except tf.errors.OutOfRangeError:
        saver.save(sess, 'tmp/', global_step=1002)
        break
And then you can restore the same model for inference as follows:
Restoring saved iterator
import os

data = np.random.random([4, 4])
sess = tf.Session()
saver = tf.train.import_meta_graph('tmp/-1002.meta')
ckpt = tf.train.get_checkpoint_state(os.path.dirname('tmp/checkpoint'))
saver.restore(sess, ckpt.model_checkpoint_path)
graph = tf.get_default_graph()
# Restore the init operation
dataset_init_op = graph.get_operation_by_name('dataset_init')
X = graph.get_tensor_by_name('X:0')
output = graph.get_tensor_by_name('output:0')
sess.run(dataset_init_op, feed_dict={X:data})
while True:
    try:
        print(sess.run(output))
    except tf.errors.OutOfRangeError:
        break
I would suggest using tf.contrib.data.make_saveable_from_iterator, which has been designed precisely for this purpose. It is much less verbose and does not require you to change existing code, in particular how you define your iterator.
Here is a working example in which we save everything after step 5 has completed. Note how I don't even bother knowing what seed is used.
import tensorflow as tf
iterator = (
batch = iterator.get_next(name='batch')
saveable_obj = tf.contrib.data.make_saveable_from_iterator(iterator)
tf.add_to_collection(tf.GraphKeys.SAVEABLE_OBJECTS, saveable_obj)
saver = tf.train.Saver()
with tf.Session() as sess:
    for step in range(10):
        print('{}: {}'.format(step, sess.run(batch)))
        if step == 5:
            saver.save(sess, './foo', global_step=step)
# 0: 1
# 1: 6
# 2: 7
# 3: 3
# 4: 8
# 5: 10
# 6: 12
# 7: 14
# 8: 5
# 9: 17
Then later, if we resume from step 6, we get the same output.
import tensorflow as tf
saver = tf.train.import_meta_graph('./foo-5.meta')
with tf.Session() as sess:
    saver.restore(sess, './foo-5')
    for step in range(6, 10):
        print('{}: {}'.format(step, sess.run('batch:0')))
# 6: 12
# 7: 14
# 8: 5
# 9: 17
I couldn't solve the problem related to initializing the iterator. However, since I pre-process my dataset using the map method and apply transformations defined by Python operations wrapped with py_func, which cannot be serialized for storing/restoring, I'll have to initialize my dataset when I want to restore it anyway.
So, the problem that remains is how to feed data to my graph when I restore it. I placed a tf.identity node between the iterator output and my network input. Upon restoring, I feed my data to the identity node. A better solution that I discovered later is using placeholder_with_default(), as described in this answer.
I would suggest having a look at CheckpointInputPipelineHook, which implements saving iterator state for further training with tf.Estimator.
I have a folder of .tfrecords files that I would like to read into my network. However, I am having a lot of trouble reading more than one tfrecords file at a time.
All my tfrecords files are stored in the path_to_folders list. All my feature names are correct as well.
My code looks like this:
with tf.Session() as sess:
    for folder in path_to_folders:
        no_grasps = int(len(os.listdir(folder)) / 5)
        feature_name_images = os.path.basename(folder) + '_images'
        feature_name_csv = os.path.basename(folder) + '_csv'
        data_path = os.path.join(path_to_data, os.path.basename(folder) + '.tfrecords')
        feature = {feature_name_images: tf.FixedLenFeature([], tf.string),
                   feature_name_csv: tf.FixedLenFeature([], tf.string)}
        filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
        reader = tf.TFRecordReader()
        _, serialized_example = reader.read(filename_queue)
        features = tf.parse_single_example(serialized_example, features=feature)
        image_out = tf.decode_raw(features[feature_name_images], tf.float32)
        csv_out = tf.decode_raw(features[feature_name_csv], tf.float32)
        image_out_reshaped = tf.reshape(image_out, [no_grasps, 200, 200, 3])
        csv_out_reshaped = tf.reshape(csv_out, [no_grasps, 6])
        # Create a coordinator and run all QueueRunner objects
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        try:
            # image_dataset, csv_dataset = sess.run([image_out_reshaped, csv_out_reshaped])
            image_dataset = sess.run(image_out_reshaped)
            print(image_dataset.shape, type(image_dataset))
        except tf.errors.OutOfRangeError:
            print('epoch limit reached')
After the first iteration of reading is done (the first tfrecords file is read successfully), the rest of them tell me that my epoch limit is reached, with the warning:
OutOfRangeError (see above for traceback): FIFOQueue '_2_input_producer_1' is closed and has insufficient elements (requested 1, current size 0)
[[Node: ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](TFRecordReaderV2_1, input_producer_1)]]
I really don't understand why this is the case and was wondering if anyone could help me out.
Try moving coord.request_stop() out of your main try block and adding a 'finally' block to your try, like so:
try:
    image_dataset = sess.run(image_out_reshaped)
    print(image_dataset.shape, type(image_dataset))
except tf.errors.OutOfRangeError:
    print('epoch limit reached')
finally:
    coord.request_stop()
You might also want to move some of the code outside the for loop. You can set up a lot of the graph structure elsewhere; for example, putting tf.train.Coordinator() and tf.train.start_queue_runners ahead of the for loop might be more manageable.
I'm relatively new to the world of TensorFlow, and pretty perplexed by how you'd actually read CSV data into usable example/label tensors in TensorFlow. The example from the TensorFlow tutorial on reading CSV data is pretty fragmented and only gets you part of the way to being able to train on CSV data.
Here's my code that I've pieced together, based off that CSV tutorial:
from __future__ import print_function
import tensorflow as tf
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
filename = "csv_test_data.csv"
# setup text reader
file_length = file_len(filename)
filename_queue = tf.train.string_input_producer([filename])
reader = tf.TextLineReader(skip_header_lines=1)
_, csv_row = reader.read(filename_queue)
# setup CSV decoding
record_defaults = [[0],[0],[0],[0],[0]]
col1,col2,col3,col4,col5 = tf.decode_csv(csv_row, record_defaults=record_defaults)
# turn features back into a tensor
features = tf.stack([col1,col2,col3,col4])
print("loading, " + str(file_length) + " line(s)\n")
with tf.Session() as sess:
    # start populating filename queue
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(file_length):
        # retrieve a single instance
        example, label = sess.run([features, col5])
        print(example, label)
    print("\ndone loading")
And here is a brief example from the CSV file I'm loading. It's pretty basic data: 4 feature columns and 1 label column:
All the code above does is print each example from the CSV file, one by one, which, while nice, is pretty darn useless for training.
What I'm struggling with here is how you'd actually turn those individual examples, loaded one-by-one, into a training dataset. For example, here's a notebook I was working on in the Udacity Deep Learning course. I basically want to take the CSV data I'm loading, and plop it into something like train_dataset and train_labels:
def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
    # Map 2 to [0.0, 1.0, 0.0 ...], 3 to [0.0, 0.0, 1.0 ...]
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
I've tried using tf.train.shuffle_batch, like this, but it just inexplicably hangs:
for i in range(file_length):
    # retrieve a single instance
    example, label = sess.run([features, colRelevant])
    example_batch, label_batch = tf.train.shuffle_batch([example, label], batch_size=file_length, capacity=file_length, min_after_dequeue=10000)
    print(example, label)
So to sum up, here are my questions:
What am I missing about this process?
It feels like there is some key intuition that I'm missing about how to properly build an input pipeline.
Is there a way to avoid having to know the length of the CSV file?
It feels pretty inelegant to have to know the number of lines you want to process (the for i in range(file_length) line of code above)
As soon as Yaroslav pointed out that I was likely mixing up imperative and graph-construction parts here, it started to become clearer. I was able to pull together the following code, which I think is closer to what would typically be done when training a model from CSV (excluding any model training code):
from __future__ import print_function
import numpy as np
import tensorflow as tf
import math as math
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('dataset')
args = parser.parse_args()
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
def read_from_csv(filename_queue):
    reader = tf.TextLineReader(skip_header_lines=1)
    _, csv_row = reader.read(filename_queue)
    record_defaults = [[0],[0],[0],[0],[0]]
    colHour,colQuarter,colAction,colUser,colLabel = tf.decode_csv(csv_row, record_defaults=record_defaults)
    features = tf.stack([colHour,colQuarter,colAction,colUser])
    label = tf.stack([colLabel])
    return features, label
def input_pipeline(batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer([args.dataset], num_epochs=num_epochs, shuffle=True)
    example, label = read_from_csv(filename_queue)
    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size
    example_batch, label_batch = tf.train.shuffle_batch(
        [example, label], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return example_batch, label_batch
file_length = file_len(args.dataset) - 1
examples, labels = input_pipeline(file_length, 1)
with tf.Session() as sess:
    # num_epochs is tracked in a local variable, which must be initialized
    sess.run(tf.local_variables_initializer())
    # start populating filename queue
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    try:
        while not coord.should_stop():
            example_batch, label_batch = sess.run([examples, labels])
    except tf.errors.OutOfRangeError:
        print('Done training, epoch reached')
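For a file small enough to fit in memory, the same split into features and labels can be sketched without any queue machinery and without knowing the file length in advance. This is only an illustration using Python's standard csv module; the SAMPLE data and the load_csv helper are made up for the example and assume 4 integer feature columns followed by 1 label column, with a header row.

```python
import csv
import io

# Hypothetical sample data: a header, 4 integer feature columns, 1 label column.
SAMPLE = "a,b,c,d,label\n1,2,3,4,0\n5,6,7,8,1\n9,10,11,12,0\n"

def load_csv(fileobj):
    """Read every row until the file ends; no line count is needed up front."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header line
    features, labels = [], []
    for row in reader:
        *xs, y = (int(v) for v in row)  # last column is the label
        features.append(xs)
        labels.append(y)
    return features, labels

features, labels = load_csv(io.StringIO(SAMPLE))
print(len(features), features[0], labels)
```

The loop simply stops when the reader is exhausted, which plays the same role as catching OutOfRangeError in the queue-based version.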
I think you are mixing up imperative and graph-construction parts here. The operation tf.train.shuffle_batch creates a new queue node, and a single node can be used to process the entire dataset. So I think you are hanging because you created a bunch of shuffle_batch queues in your for loop and didn't start queue runners for them.
Normal input pipeline usage looks like this:
Add nodes like shuffle_batch to input pipeline
(optional, to prevent unintentional graph modification) finalize graph
--- end of graph construction, beginning of imperative programming --
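The split between construction and execution, and the reason an unfed queue hangs, can be sketched with a plain Python analogy. This is not TensorFlow code: queue.Queue stands in for the batching queue, and the hypothetical start_filler helper stands in for a queue runner thread.

```python
import queue
import threading

def start_filler(q, items):
    """Start a background thread that feeds items into q (the 'queue runner')."""
    def fill():
        for item in items:
            q.put(item)
    t = threading.Thread(target=fill, daemon=True)
    t.start()
    return t

# Construction phase: create the queue once, outside any loop.
batch_queue = queue.Queue(maxsize=4)

# Dequeueing before a filler thread exists would block forever;
# a timeout turns the hang into an exception instead.
try:
    batch_queue.get(timeout=0.1)
    hung = False
except queue.Empty:
    hung = True

# Imperative phase: start the filler, then dequeue repeatedly from the same queue.
filler = start_filler(batch_queue, range(6))
results = [batch_queue.get(timeout=1.0) for _ in range(6)]
filler.join()
print(hung, results)
```

Creating a fresh queue on every loop iteration without starting a filler for it reproduces the first case each time, which is the hang described above.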
To be more scalable (to avoid the Python GIL), you could generate all of your data using a TensorFlow pipeline. However, if performance is not critical, you can hook up a numpy array to an input pipeline by using slice_input_producer. Here's an example with some Print nodes to see what's going on (messages in Print go to stderr when the node is run):
import numpy as np
import tensorflow as tf

num_examples = 5
num_features = 2
data = np.reshape(np.arange(num_examples*num_features), (num_examples, num_features))
print(data)
(data_node,) = tf.train.slice_input_producer([tf.constant(data)], num_epochs=1, shuffle=False)
data_node_debug = tf.Print(data_node, [data_node], "Dequeueing from data_node ")
data_batch = tf.train.batch([data_node_debug], batch_size=2)
data_batch_debug = tf.Print(data_batch, [data_batch], "Dequeueing from data_batch ")
sess = tf.InteractiveSession()
sess.run(tf.local_variables_initializer())  # num_epochs is tracked in a local variable
tf.train.start_queue_runners()
try:
    while True:
        print(sess.run(data_batch_debug))
except tf.errors.OutOfRangeError as e:
    print("No more inputs.")
You should see something like this
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
[[0 1]
[2 3]]
[[4 5]
[6 7]]
No more inputs.
The "8, 9" numbers didn't fill up a full batch, so they didn't get produced. Also, tf.Print messages are written to stderr, so they show up separately in the Terminal for me.
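The dropped remainder can be reproduced with a few lines of plain Python. This is only a sketch of the batching behaviour, not of how tf.train.batch is implemented; the batches helper is a made-up name. (tf.train.batch also accepts allow_smaller_final_batch=True if you want the leftover rows.)

```python
def batches(rows, batch_size):
    """Yield only full batches; a trailing partial batch is dropped."""
    for start in range(0, len(rows) - batch_size + 1, batch_size):
        yield rows[start:start + batch_size]

rows = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
full = list(batches(rows, 2))
print(full)
```

With five rows and a batch size of 2, only two batches come out and the last row is never emitted.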
PS: a minimal example of connecting batch to a manually initialized queue is in GitHub issue 2193.
Also, for debugging purposes you might want to set timeout on your session so that your IPython notebook doesn't hang on empty queue dequeues. I use this helper function for my sessions
def create_session():
    config = tf.ConfigProto(log_device_placement=True)
    config.gpu_options.per_process_gpu_memory_fraction=0.3 # don't hog all vRAM
    config.operation_timeout_in_ms=60000 # terminate on long hangs
    # create interactive session to register a default session
    sess = tf.InteractiveSession("", config=config)
    return sess
Scalability Notes:
tf.constant inlines a copy of your data into the Graph. There's a fundamental limit of 2GB on the size of the Graph definition, so that's an upper limit on the size of the data.
You could get around that limit by using v=tf.Variable and saving the data into it by running v.assign_op with a tf.placeholder on the right-hand side and feeding the numpy array to the placeholder (feed_dict).
That still creates two copies of the data, so to save memory you could make your own version of slice_input_producer which operates on numpy arrays and uploads rows one at a time using feed_dict.
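As a rough sketch of that last idea, a Python generator can hand out rows one at a time for a fixed number of epochs; each yielded row would then be passed to the graph through feed_dict. The slice_producer name and its parameters are hypothetical, and plain lists stand in for the numpy array.

```python
import random

def slice_producer(rows, num_epochs=1, shuffle=False, seed=None):
    """Yield rows one at a time for num_epochs passes over the data."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    for _ in range(num_epochs):
        if shuffle:
            rng.shuffle(order)  # new row order for each epoch
        for i in order:
            yield rows[i]

data = [[0, 1], [2, 3], [4, 5]]
fed = list(slice_producer(data, num_epochs=2))
print(fed)
```

Because only one row is in flight at a time, the data never has to be copied into the graph as a whole.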
Or you could try this: the code loads the Iris dataset into TensorFlow using pandas and numpy, and a simple one-neuron output is printed in the session. Hope it helps for a basic understanding. [I haven't added one-hot encoding of the labels.]
import tensorflow as tf
import numpy
import pandas as pd
df=pd.read_csv('/home/nagarjun/Desktop/Iris.csv',usecols = [0,1,2,3,4],skiprows = [0],header=None)
d = df.values
l = pd.read_csv('/home/nagarjun/Desktop/Iris.csv',usecols = [5] ,header=None)
labels = l.values
data = numpy.float32(d)
labels = numpy.array(l,'str')
#print data, labels
x = tf.placeholder(tf.float32,shape=(150,5))
x = data
w = tf.random_normal([100,150],mean=0.0, stddev=1.0, dtype=tf.float32)
y = tf.nn.softmax(tf.matmul(w,x))
with tf.Session() as sess:
    print(sess.run(y))
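The one-hot encoding of the string labels that this answer leaves out could look roughly like the following. It is a plain-Python sketch; the one_hot helper and the sample label values are assumptions made for illustration, not part of the code above.

```python
def one_hot(labels):
    """Return (classes, rows): sorted class names and one one-hot row per label."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0.0] * len(classes)
        row[index[label]] = 1.0  # mark the position of this label's class
        rows.append(row)
    return classes, rows

# Hypothetical label column, as it might come out of the CSV.
labels = ["Iris-setosa", "Iris-virginica", "Iris-setosa", "Iris-versicolor"]
classes, encoded = one_hot(labels)
print(classes)
print(encoded)
```

The resulting rows can be converted with numpy.float32 in the same way as the feature data.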
You can use the latest API:
dataset =
iterator = dataset.make_initializable_iterator()
columns = iterator.get_next()
with tf.Session() as sess:
    sess.run([iterator.initializer])
If anyone came here searching for a simple way to read very large, sharded CSV files with the tf.estimator API, please see my code below.
CSV_COLUMNS = ['ID','text','class']
LABEL_COLUMN = 'class'
DEFAULTS = [['x'],['no'],[0]] #Default values
def read_dataset(filename, mode, batch_size = 512):
def _input_fn(v_test=False):
# def decode_csv(value_column):
# columns = tf.decode_csv(value_column, record_defaults = DEFAULTS)
# features = dict(zip(CSV_COLUMNS, columns))
# label = features.pop(LABEL_COLUMN)
# return add_engineered(features), label
# Create list of files that match pattern
file_list = tf.gfile.Glob(filename)
# Create dataset from file list
#dataset =
dataset =,
if mode == tf.estimator.ModeKeys.TRAIN:
    num_epochs = None # indefinitely
    dataset = dataset.shuffle(buffer_size = 10 * batch_size)
else:
    num_epochs = 1 # end-of-input after this
batch_features, batch_labels = dataset.make_one_shot_iterator().get_next()
#Begins - Uncomment for testing only -----------------------------------------------------<
if v_test == True:
with tf.Session() as sess:
#End - Uncomment for testing only -----------------------------------------------------<
return add_engineered(batch_features), batch_labels
return _input_fn
Example usage in TF.estimator:
train_spec = tf.estimator.TrainSpec(input_fn = read_dataset(
filename = train_file,
mode = tf.estimator.ModeKeys.TRAIN,
batch_size = 128),
max_steps = num_train_steps)
2.0 Compatible Solution: This answer may already have been provided by others in the above thread, but I will provide additional links which will help the community.
dataset =
batch_size=5, # Artificially small to make examples easier to show.
For more information, please refer to this TensorFlow tutorial.