How to read file names from a string tensor in TensorFlow (Python)

I'm new to TensorFlow. I have a tensor (of type string) in which I have stored the paths of all the images I want to use for training a model.
Question: how do I read the tensor into a queue and then batch it?
My approach, which gives me an error:
img_names = dataset['f0']
file_length = len(img_names)
type(img_names)
tf_img_names = tf.stack(img_names)
filename_queue = tf.train.string_input_producer(tf_img_names, num_epochs=num_epochs, shuffle=False)
wd=getcwd()
print('In input pipeline')
tf_img_queue = tf.FIFOQueue(file_length,dtypes=[tf.string])
col_Image = tf_img_queue.dequeue(filename_queue)
### Read Image
img_file = tf.read_file(wd+'/'+col_Image)
image = tf.image.decode_png(img_file, channels=num_channels)
image = tf.cast(image, tf.float32) / 255.
image = tf.image.resize_images(image,[image_width, image_height])
min_after_dequeue = 100
capacity = min_after_dequeue + 3 * batch_size
image_batch, label_batch = tf.train.batch([image, onehot], batch_size=batch_size, capacity=capacity, allow_smaller_final_batch = True, min_after_dequeue=min_after_dequeue)
Error: TypeError: expected string or buffer
I don't know whether my approach is right.

You don't have to create another queue. You can define a reader that dequeues elements for you. Try the following and comment on how it goes.
reader = tf.IdentityReader()
key, value = reader.read(filename_queue)
dir = tf.constant(wd)
path = tf.string_join([dir,tf.constant("/"),value])
img_file = tf.read_file(path)
To check that you're feeding the correct paths, run:
print(sess.run(path))
Looking for your feedback.
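To make the data flow concrete, here is a plain-Python model of what this queue-based pipeline does (no TensorFlow involved, so it is an analogy rather than the TF API). A producer yields each file name once per epoch in order, as string_input_producer does with shuffle=False; each name is prefixed with the working directory the way the tf.string_join call does; and the results are grouped into batches, keeping a short final batch as allow_smaller_final_batch=True would. The helper names and the /data directory are hypothetical.

```python
from itertools import islice


def filename_producer(names, num_epochs):
    """Yield every name once per epoch, in order (hypothetical stand-in for
    tf.train.string_input_producer with shuffle=False)."""
    for _ in range(num_epochs):
        for name in names:
            yield name


def batched(items, batch_size, allow_smaller_final_batch=True):
    """Group items into lists of batch_size; optionally keep a short last batch."""
    it = iter(items)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        if len(chunk) < batch_size and not allow_smaller_final_batch:
            return
        yield chunk


wd = "/data"  # hypothetical working directory
names = ["a.png", "b.png", "c.png"]
# Prefix each name with the directory, like tf.string_join([dir, "/", value]).
paths = (wd + "/" + name for name in filename_producer(names, num_epochs=2))
batches = list(batched(paths, batch_size=4))
print(batches)
```

With 3 names and 2 epochs there are 6 paths, so a batch size of 4 gives one full batch and one smaller final batch.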

Related

Create a TensorFlow dataset from a local image directory

I have a very large local database of images, organized so that each folder contains the images of one class.
I would like to use the TensorFlow Dataset API to obtain batches of data without loading all the images into memory.
I have tried something like this:
def _parse_function(filename, label):
    image_string = tf.read_file(filename, "file_reader")
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.cast(image_decoded, tf.float32)
    return image, label
image_list, label_list, label_map_dict = read_data()
dataset = tf.data.Dataset.from_tensor_slices((tf.constant(image_list), tf.constant(label_list)))
dataset = dataset.shuffle(len(image_list))
dataset = dataset.repeat(epochs).batch(batch_size)
dataset = dataset.map(_parse_function)
iterator = dataset.make_one_shot_iterator()
image_list is a list of image paths (including the file names), and label_list is a list holding the class of each image, in the same order.
But _parse_function does not work; the error I receive is:
ValueError: Shape must be rank 0 but is rank 1 for 'file_reader' (op: 'ReadFile') with input shapes: [?].
I have googled the error, but nothing works for me.
If I do not use the map function, I just receive the paths of the images (which are stored in image_list), so I think I need the map function to read the images, but I am not able to make it work.
Thank you in advance.
EDIT:
def read_data():
    image_list = []
    label_list = []
    label_map_dict = {}
    count_label = 0
    for class_name in os.listdir(base_path):
        class_path = os.path.join(base_path, class_name)
        label_map_dict[class_name] = count_label
        for image_name in os.listdir(class_path):
            image_path = os.path.join(class_path, image_name)
            label_list.append(count_label)
            image_list.append(image_path)
        count_label += 1
    return image_list, label_list, label_map_dict
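For reference, a self-contained variant of read_data that takes the root directory as a parameter instead of relying on a global base_path, and returns the three values the pipeline code unpacks. Sorting the folder and file names is an assumption added here: os.listdir returns entries in arbitrary order, so without sorting the class-to-label mapping could differ between machines. The small directory tree built in the usage part is hypothetical.

```python
import os
import tempfile


def read_data(base_path):
    """Walk base_path/<class_name>/<image> and return parallel lists.

    image_list[i] is the path of an image and label_list[i] is the integer
    label of the folder it came from; label_map_dict maps folder name -> label.
    """
    image_list = []
    label_list = []
    label_map_dict = {}
    # Sort so that label numbering does not depend on os.listdir order.
    class_names = sorted(
        d for d in os.listdir(base_path)
        if os.path.isdir(os.path.join(base_path, d))
    )
    for label, class_name in enumerate(class_names):
        class_path = os.path.join(base_path, class_name)
        label_map_dict[class_name] = label
        for image_name in sorted(os.listdir(class_path)):
            image_list.append(os.path.join(class_path, image_name))
            label_list.append(label)
    return image_list, label_list, label_map_dict


with tempfile.TemporaryDirectory() as root:
    # Build a tiny hypothetical tree: root/cats/c1.jpg, root/dogs/d1.jpg, d2.jpg
    for cls, files in {"dogs": ["d1.jpg", "d2.jpg"], "cats": ["c1.jpg"]}.items():
        os.makedirs(os.path.join(root, cls))
        for name in files:
            open(os.path.join(root, cls, name), "wb").close()
    image_list, label_list, label_map_dict = read_data(root)
    file_names = [os.path.basename(p) for p in image_list]
```

Only the paths are collected here; the images themselves are read later, one at a time, inside the dataset's map function.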
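The error is in this line: dataset = dataset.repeat(epochs).batch(batch_size). Batching before the map adds the batch size as a dimension to the input, so _parse_function receives a rank-1 vector of file names, while tf.read_file expects a single (rank-0) string.
You need to batch your dataset after the map function, like this: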
dataset = tf.data.Dataset.from_tensor_slices((tf.constant(image_list), tf.constant(label_list)))
dataset = dataset.shuffle(len(image_list))
dataset = dataset.repeat(epochs)
dataset = dataset.map(_parse_function).batch(batch_size)
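The ordering problem can be reproduced without TensorFlow. In this plain-Python sketch, read_file is a hypothetical stand-in that, like tf.read_file, accepts only a single path; applying it after batching hands it a whole list (a rank-1 value) instead.

```python
def read_file(path):
    """Hypothetical stand-in for tf.read_file: accepts one scalar path only."""
    if not isinstance(path, str):
        raise ValueError("expected a single (rank 0) path, got %r" % (path,))
    return b"<contents of " + path.encode() + b">"


def batch(items, batch_size):
    """Group a list into consecutive chunks of batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


paths = ["a.jpg", "b.jpg", "c.jpg"]

# map then batch: read_file is applied to one path at a time.
good = batch([read_file(p) for p in paths], 2)

# batch then map: read_file receives a whole batch (a list) and fails.
error_message = None
try:
    bad = [read_file(chunk) for chunk in batch(paths, 2)]
except ValueError as exc:
    error_message = str(exc)
```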

TensorFlow read and decode BATCH of images

Using tf.train.string_input_producer and tf.image.decode_jpeg, I manage to read a single image from disk and decode it.
This is the code:
# -------- Graph
filename_queue = tf.train.string_input_producer(
[img_path, img_path])
image_reader = tf.WholeFileReader()
key, image_file = image_reader.read(filename_queue)
image = tf.image.decode_jpeg(image_file, channels=3)
# Run my network
logits = network.get_logits(image)
# -------- Session
sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
logits_output = sess.run(logits)
The problem is that when I look at the shape of logits_output, I get only one value even though the queue holds two images.
How can I read and decode the entire queue?
tf.WholeFileReader(), together with tf.train.string_input_producer(), works as an iterator: each read() call returns a single (key, value) pair, and there is no easy way to evaluate the size of the complete dataset it is handling.
To obtain batches of N samples out of it, you could instead use image_reader.read_up_to(filename_queue, N).
Note: you can achieve the same using the newer tf.data pipeline:
def _parse_function(filename):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_image(image_string)
    return image_decoded
# A vector of filenames.
filenames = tf.constant([img_path, img_path])
dataset = tf.data.Dataset.from_tensor_slices((filenames))
dataset = dataset.map(_parse_function).batch(N)
iterator = dataset.make_one_shot_iterator()
next_image_batch = iterator.get_next()
logits = network.get_logits(next_image_batch)
# ...
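The difference between read() and read_up_to() can be modelled in plain Python (no TensorFlow): each read() consumes one record, which is why a single sess.run(logits) saw only one image, while read_up_to(n) consumes up to n records in one call. ToyReader and its "contents" strings are hypothetical; the real reader returns string tensors holding file contents, and its read_up_to may also return fewer than n records.

```python
from collections import deque


class ToyReader:
    """Hypothetical stand-in for a TF1 reader attached to a filename queue."""

    def __init__(self, filenames):
        self.queue = deque(filenames)

    def read(self):
        # One call consumes exactly one record, like reader.read(queue).
        name = self.queue.popleft()
        return name, "<contents of %s>" % name

    def read_up_to(self, n):
        # One call consumes up to n records; this toy returns fewer
        # only when its queue runs out.
        keys, values = [], []
        while self.queue and len(keys) < n:
            key, value = self.read()
            keys.append(key)
            values.append(value)
        return keys, values


single = ToyReader(["img.jpg", "img.jpg"]).read()
batch_keys, batch_values = ToyReader(["img.jpg", "img.jpg"]).read_up_to(2)
```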

TensorFlow : Eval Never ends despite starting queue runners

My code hangs on the following statement:
print('Beginning Eval...')
feed_dict_train = {img_data: images_batch.eval(session=session),
img_labels: labels_batch.eval(session=session)}
I have tried to evaluate both a single example and a batch. Here is how images_batch and labels_batch are created:
Creating a single example:
def read_single_example(filename):
    # Mini-init
    image_size = 256
    filenames = tf.train.string_input_producer([filename], num_epochs = None)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filenames)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.float32),
            'image': tf.FixedLenFeature([image_size*image_size*3], tf.float32)
        })
    label = features['label']
    #label = tf.get_default_session().run(label)
    image_data = features['image']
    #image_data = tf.get_default_session().run(image_data)
    image = tf.reshape(image_data, (256, 256, 3))
    return label, image
Reading a single example:
label_train, image_train = read_single_example(path)
Batching the examples inside the training loop:
print('Getting Batch')
images_batch, labels_batch = tf.train.shuffle_batch(
[image_train, label_train], batch_size=batch_size,
capacity=2000,
min_after_dequeue=1000)
The records read from disk were created with the code found here, but in essence they are just a bunch of 256x256x3 images with a 1x41 tensor for the labels. That's what I think I did, anyway; I hope the records aren't where I made the error, or I have something like 100 GB of dead data.
This is how the Tf session is started:
session = tf.Session()
init = tf.global_variables_initializer()
session.run(init)
tf.train.start_queue_runners(sess=session)
To my understanding, that was all I needed to do to make everything work as intended. A gist of the full code for the network can be found here. I hope I didn't do anything stupid, and thanks for the help!
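A dequeue blocks until some producer thread has put an element into the queue. In TF1, tf.train.start_queue_runners starts the queue runners that exist in the graph at the moment it is called, so if the shuffle_batch ops are created afterwards (for example inside the training loop), no running thread feeds them. The plain-Python analogue below uses queue.Queue with a timeout so it terminates instead of hanging; dequeue_with_timeout is a hypothetical helper, not a TensorFlow function.

```python
import queue
import threading


def dequeue_with_timeout(q, timeout):
    """Return the next element, or None if nothing arrives within timeout."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None


# No producer thread: without the timeout, this get() would wait forever.
starved = queue.Queue()
starved_result = dequeue_with_timeout(starved, timeout=0.1)

# With a producer thread started, the same dequeue succeeds.
fed = queue.Queue()
producer = threading.Thread(target=lambda: fed.put("example"))
producer.start()
fed_result = dequeue_with_timeout(fed, timeout=5)
producer.join()
```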

Tensorflow mixes up images and labels when making batch

So I've been stuck on this problem for weeks. I want to make an image batch from a list of image filenames. I insert the filename list into a queue and use a reader to get the file. The reader then returns the filename and the read image file.
My problem is that when I make a batch from the decoded JPEG images and the labels from the reader, tf.train.shuffle_batch() mixes up the images and the file names, so the labels end up in the wrong order for the image files. Is there something I am doing wrong with the queue or shuffle_batch, and how can I fix it so that each batch comes out with the right labels for the right files?
Much thanks!
import tensorflow as tf
from tensorflow.python.framework import ops
def preprocess_image_tensor(image_tf):
    image = tf.image.convert_image_dtype(image_tf, dtype=tf.float32)
    image = tf.image.resize_image_with_crop_or_pad(image, 300, 300)
    image = tf.image.per_image_standardization(image)
    return image
# original image names and labels
image_paths = ["image_0.jpg", "image_1.jpg", "image_2.jpg", "image_3.jpg", "image_4.jpg", "image_5.jpg", "image_6.jpg", "image_7.jpg", "image_8.jpg"]
labels = [0, 1, 2, 3, 4, 5, 6, 7, 8]
# converting arrays to tensors
image_paths_tf = ops.convert_to_tensor(image_paths, dtype=tf.string, name="image_paths_tf")
labels_tf = ops.convert_to_tensor(labels, dtype=tf.int32, name="labels_tf")
# getting tensor slices
image_path_tf, label_tf = tf.train.slice_input_producer([image_paths_tf, labels_tf], shuffle=False)
# getting image tensors from jpeg and performing preprocessing
image_buffer_tf = tf.read_file(image_path_tf, name="image_buffer")
image_tf = tf.image.decode_jpeg(image_buffer_tf, channels=3, name="image")
image_tf = preprocess_image_tensor(image_tf)
# creating a batch of images and labels
batch_size = 5
num_threads = 4
images_batch_tf, labels_batch_tf = tf.train.batch([image_tf, label_tf], batch_size=batch_size, num_threads=num_threads)
# running testing session to check order of images and labels
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    print(image_path_tf.eval())
    print(label_tf.eval())
    coord.request_stop()
    coord.join(threads)
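Wait... isn't your TF usage a little weird?
You are running the graph twice, and each eval() call performs its own dequeue, so the path and the label you print can come from different examples:
print(image_path_tf.eval())
print(label_tf.eval())
And since you are only asking for image_path_tf and label_tf, your eval() calls never use the reading, decoding and batching ops built after this line (those only run in the background queue-runner threads):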
image_path_tf, label_tf = tf.train.slice_input_producer([image_paths_tf, labels_tf], shuffle=False)
Maybe try this?
images, labels = sess.run([images_batch_tf, labels_batch_tf])
print(images)
print(labels)
From your code, I'm unsure how your labels are encoded in or extracted from the JPEG images. I used to encode everything in the same file, but have since found a much more elegant solution. Assuming you can get a list of file names, image_paths, and a NumPy array of labels, labels, you can bind them together, operate on individual examples with tf.train.slice_input_producer, and then batch them with tf.train.batch.
import tensorflow as tf
from tensorflow.python.framework import ops
shuffle = True
batch_size = 128
num_threads = 8
def get_data():
    """
    Return image_paths, labels such that labels[i] corresponds to image_paths[i].
    image_paths: list of strings
    labels: list/np array of labels
    """
    raise NotImplementedError()
def preprocess_image_tensor(image_tf):
    """Preprocess a single image."""
    image = tf.image.convert_image_dtype(image_tf, dtype=tf.float32)
    image = tf.image.resize_image_with_crop_or_pad(image, 300, 300)
    image = tf.image.per_image_standardization(image)
    return image
image_paths, labels = get_data()
image_paths_tf = ops.convert_to_tensor(image_paths, dtype=tf.string, name='image_paths')
labels_tf = ops.convert_to_tensor(labels, dtype=tf.int32, name='labels')
image_path_tf, label_tf = tf.train.slice_input_producer([image_paths_tf, labels_tf], shuffle=shuffle)
# preprocess single image paths
image_buffer_tf = tf.read_file(image_path_tf, name='image_buffer')
image_tf = tf.image.decode_jpeg(image_buffer_tf, channels=3, name='image')
image_tf = preprocess_image_tensor(image_tf)
# batch the results
image_batch_tf, labels_batch_tf = tf.train.batch([image_tf, label_tf], batch_size=batch_size, num_threads=num_threads)
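This works because slice_input_producer dequeues a path and its label as one unit, whereas two separate eval() calls perform two dequeues. A plain-Python model of both behaviours follows; slice_pairs and batch_pairs are hypothetical helpers that only imitate the queue semantics, not TensorFlow functions.

```python
import random
from itertools import islice


def slice_pairs(paths, labels, shuffle=True, seed=0):
    """Return an iterator of (path, label) pairs; shuffling moves each pair as one unit."""
    pairs = list(zip(paths, labels))
    if shuffle:
        random.Random(seed).shuffle(pairs)
    return iter(pairs)


def batch_pairs(pair_iter, batch_size):
    """Group consecutive pairs into (paths_batch, labels_batch)."""
    while True:
        chunk = list(islice(pair_iter, batch_size))
        if not chunk:
            return
        paths_batch, labels_batch = zip(*chunk)
        yield list(paths_batch), list(labels_batch)


image_paths = ["image_%d.jpg" % i for i in range(9)]
labels = list(range(9))

# One "dequeue" per example: path and label stay together even when shuffled.
batches = list(batch_pairs(slice_pairs(image_paths, labels), batch_size=5))

# Two separate "dequeues" (like two eval() calls): the path comes from one
# example and the label from the next one.
it = slice_pairs(image_paths, labels, shuffle=False)
path_from_first_call = next(it)[0]
label_from_second_call = next(it)[1]
```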

Open and convert to tensor an image from CSV file in TensorFlow

I've got the function below, mostly taken from this question. That person is trying to read five columns that are all ints. I'm trying to read two columns: an image file path and an int. So I need to open the file and convert it into a tensor.
So my question is: how do I read the file and convert it into the necessary tensor?
I've tried quite a few different things like reading the file and converting that string to a tensor using .
def read_from_csv(filename_queue):
    reader = tf.TextLineReader()
    _, csv_row = reader.read(filename_queue)
    record_defaults = [[""],[0]]
    image_path, label = tf.decode_csv(csv_row, field_delim=" ", record_defaults=record_defaults)
    # here's where I need to read the file somehow...
    image = tf.read_file(image_path) # I probably need this somewhere
    print(image) # Tensor("DecodeJpeg:0", shape=(?, ?, 3), dtype=uint8)
    image = tf.image.decode_jpeg(image, channels=3)
    return image, label
I also tried using numpy (can't recall exactly what) but that didn't work either.
Ok, here's what I was missing:
def read_from_csv(filename_queue):
    reader = tf.TextLineReader()
    key, value = reader.read(filename_queue)
    record_defaults = [[""],[0]]
    image_path, label = tf.decode_csv(value, field_delim=" ", record_defaults=record_defaults)
    # following line contains the important change...
    image = tf.image.decode_jpeg(tf.read_file(image_path), channels=3)
    return image, label
def input_pipeline(batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(["./28_dense.csv"], num_epochs=num_epochs, shuffle=True)
    image, label = read_from_csv(filename_queue)
    image = tf.reshape(image, [28,28,3])
    min_after_dequeue = 5
    capacity = min_after_dequeue + 3 * batch_size
    image_batch, label_batch = tf.train.batch( [image, label], batch_size=batch_size, capacity=capacity)
    return image_batch, label_batch
file_length = 1
examples, labels = input_pipeline(file_length, 1)
The step I was missing was simply reading the file with tf.read_file(image_path). I had assumed decode_jpeg would do that, but it expects the encoded file contents rather than a path.
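Reading and decoding are two separate steps: tf.read_file turns a path into the file's encoded bytes, and tf.image.decode_jpeg turns those bytes into pixels. Below is a plain-Python sketch of the first half (parse a space-delimited row, then read the raw bytes). parse_row and read_example are hypothetical helpers, and csv.reader stands in for tf.decode_csv only loosely; the fake JPEG file is made up for the example.

```python
import csv
import os
import tempfile


def parse_row(line, record_defaults=("", 0)):
    """Split one space-delimited row into (image_path, label).

    Empty fields fall back to record_defaults, loosely mirroring tf.decode_csv.
    """
    path_field, label_field = next(csv.reader([line], delimiter=" "))
    image_path = path_field or record_defaults[0]
    label = int(label_field) if label_field else record_defaults[1]
    return image_path, label


def read_example(line):
    """Parse the row, then read the raw file bytes (the tf.read_file step)."""
    image_path, label = parse_row(line)
    with open(image_path, "rb") as f:
        encoded = f.read()  # still encoded bytes; decoding is a separate step
    return encoded, label


with tempfile.TemporaryDirectory() as tmp:
    fake_path = os.path.join(tmp, "img.jpg")
    with open(fake_path, "wb") as f:
        f.write(b"\xff\xd8 not a real jpeg")
    # Quote the path so a directory name containing spaces stays one field.
    encoded, label = read_example('"%s" 3' % fake_path)
```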
Also, a tip: to inspect the values of TF objects (I hesitate to say variables), create a session like the following:
with tf.Session() as sess:
    # num_epochs in string_input_producer creates a local variable that must be initialized
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    real_value = sess.run([value]) # see value above
    print(real_value)
