My goal is to have a preprocessing layer so the model can handle any image size, because the dataset I use has two different image shapes. The simple solution is to resize each image when I load it. However, I believe this won't work once the model is deployed, since I can't do a manual resize like that there. So I must use preprocessing layers.
The docs I used
What I've tried:
Making the preprocessing layers part of the model, but it does not work.
I am thinking of using TensorSliceDataset.map(resize_and_rescale) (a sketch of this idea follows the load_images function below).
The problem is that I need to convert the list [tensor image 1, tensor image 2] to a TensorSliceDataset, but I can't convert it.
What I've tried:
tf.data.Dataset.from_tensor_slices((X_train, y_train))
It throws error
InvalidArgumentError: {{function_node __wrapped__Pack_N_9773_device_/job:localhost/replica:0/task:0/device:GPU:0}} Shapes of all inputs must match: values[0].shape = [258,320,3] != values[23].shape = [322,480,3]
[[{{node Pack}}]] [Op:Pack] name: component_0
The load images function:
def load_images(df):
    paths = df['path'].values
    X = []
    for path in paths:
        raw = tf.io.read_file(path)
        img = tf.image.decode_png(raw, channels=3)
        X.append(img)
    y = df['kind'].cat.codes
    return X, y
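For what it's worth, here is a minimal sketch of the .map(resize_and_rescale) idea I have in mind (224x224 is an assumed target size); building the dataset from file paths instead of decoded tensors would sidestep the Pack error:

def resize_and_rescale(image, label):
    image = tf.image.resize(image, (224, 224))  # assumed target size
    image = image / 255.0
    return image, label

def decode(path, label):
    # Decode inside the map, so differently-shaped tensors are never stacked
    return tf.image.decode_png(tf.io.read_file(path), channels=3), label

ds = tf.data.Dataset.from_tensor_slices((df['path'].values, df['kind'].cat.codes.values))
ds = ds.map(decode).map(resize_and_rescale).batch(32)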
As far as I understand, you wish to train on both image sizes simultaneously. The simplest way is probably to create a separate dataset for each image size and concatenate them after batching, as follows:
dataset_1 = tf.data.Dataset.from_tensor_slices((X_train_1, y_train_1))
dataset_1 = dataset_1.batch(batch_size_1)
dataset_2 = tf.data.Dataset.from_tensor_slices((X_train_2, y_train_2))
dataset_2 = dataset_2.batch(batch_size_2)
dataset = dataset_1.concatenate(dataset_2)
dataset = dataset.shuffle(shuffle_buffer_size)
In this case each batch consists of images of the same size. If you use .repeat(), do not forget to put it after the concatenation.
You need to use ragged tensors to handle different image sizes:
dataset = tf.data.Dataset.from_tensor_slices((tf.ragged.constant(img_list), label_list))
dataset = dataset.apply(tf.data.experimental.dense_to_ragged_batch(batch_size=3))
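A quick usage check (a sketch; the printed shapes show that height and width stay ragged within the batch):

for images, labels in dataset.take(1):
    print(images.shape)  # (3, None, None, 3): ragged height and width
    print(labels.shape)  # (3,)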
So I have to use TensorFlow 1.9 for system-specific reasons.
I want to train a CNN with a custom dataset consisting of images.
The folder structure looks very much like this:
./
+ circles
- circle-0.jpg
- circle-1.jpg
- ...
+ hexagons
- hexagon-0.jpg
- hexagon-1.jpg
- ...
+ ...
So the example I have to work with uses MNIST and has these two particular lines of code:
mnist_dataset = tf.keras.datasets.mnist.load_data('mnist_data')
(x_train, y_train), (x_test, y_test) = mnist_dataset
In my work, I also have to use this (x_train, y_train), (x_test, y_test) data format, which seems to be quite common. As far as I have been able to find out, the format of those datasets is (image_data, label), with shapes like ((60000, 28, 28), (60000,)), at least with the MNIST dataset. The image_data here is supposedly of dtype uint8 (according to this post). I was also able to find out that a tf.data.Dataset() object looks like the tuple I need here: (image_data, label).
So far so good. But a few questions arise from this information which I wasn't able to figure out yet, and where I would kindly request your help:
(60000, 28, 28) means 60k 28 x 28 image value arrays, right?
If 1. is right, how do I get my images (like in the directory structure I described above) into this format? Is there a function which yields an array that I can use like that?
I know I need some kind of generator function which gets all the images with their labels, because in TensorFlow 1.9 tf.keras.utils.image_dataset_from_directory() does not seem to exist yet.
What do the labels actually look like? For example, with my directory structure, would I have something like this:
(A)

| File | Label |
| --- | --- |
| circle-0.jpg | circle |
| circle-233.jpg | circle |
| hexagon-1.jpg | hexagon |
| triangle-12.jpg | triangle |
or (B)

| File | Label |
| --- | --- |
| circle-0.jpg | circle-0 |
| circle-233.jpg | circle-233 |
| hexagon-1.jpg | hexagon-1 |
| triangle-12.jpg | triangle-12 |
where the respective image is already converted to the "(60000, 28, 28)" format? It seems as if I need to create all of these functions myself, since there does not seem to be a function which takes a directory structure like mine to a dataset that TensorFlow 1.9 can use, or is there? I know of tf.keras.preprocessing.image.ImageDataGenerator and image_dataset_from_directory as well as flow_from_directory(), but none of them seem to produce my desired dataset value tuple format.
I would really appreciate any help!
You have to build a custom data generator for that. If you have two arrays, train_paths containing the paths to images and train_labels containing the labels for those images, then this function (datagen) will yield the images as arrays, each paired with its label as a tuple (image_array, label).
I have also added a way to integer-encode your labels, with a dictionary encode_label.
For example, train_paths and train_labels should look like this:
train_paths = np.array(['path/to/image1.jpg','path/to/image2.jpg','path/to/image3.jpg'])
train_labels = np.array(['circle','square','hexagon'])
where the image at 'path/to/image1.jpg' has a label of 'circle' and the image at 'path/to/image2.jpg' has a label of 'square'.
This generator function returns the data in batches, and you can add your own augmentation techniques as well (inside the augment function):
import numpy as np
import tensorflow as tf

# Hyperparameters
HEIGHT = 224    # Image height
WIDTH = 224     # Image width
CHANNELS = 3    # Image channels

# This dictionary integer-encodes your labels
encode_label = {'hexagon': 0, 'circle': 1, 'square': 2}

def augment(image):
    # All your augmentation techniques go here
    return image

def encode_labels(labels):
    encoded = []
    for label in labels:
        encoded.append(encode_label[label])
    return encoded

def open_images(paths):
    '''
    Given a list of paths to images, this function loads
    the images from the paths, augments them, and returns them as a batch
    '''
    images = []
    for path in paths:
        image = tf.keras.preprocessing.image.load_img(path, target_size=(HEIGHT, WIDTH))
        image = np.array(image)
        image = augment(image)
        images.append(image)
    return np.array(images)

# This is the data generator
def datagen(paths, labels, batch_size=32):
    for x in range(0, len(paths), batch_size):
        # Load a batch of images
        batch_paths = paths[x:x+batch_size]
        batch_images = open_images(batch_paths)
        # Load a batch of labels
        batch_labels = labels[x:x+batch_size]
        batch_labels = encode_labels(batch_labels)
        batch_labels = np.array(batch_labels, dtype='float').reshape(-1)
        yield batch_images, batch_labels
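A hypothetical usage sketch with a compiled Keras model (model, train_paths and train_labels are assumed to exist; TF 1.x uses fit_generator, and since datagen makes a single pass over the data it has to be wrapped in an endless loop for multi-epoch training):

def looping_datagen(paths, labels, batch_size=32):
    # Restart the one-pass generator forever so fit_generator can run many epochs
    while True:
        for batch in datagen(paths, labels, batch_size):
            yield batch

model.fit_generator(looping_datagen(train_paths, train_labels),
                    steps_per_epoch=len(train_paths) // 32,
                    epochs=10)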
If you cannot get tf.keras.preprocessing.image.load_img working in your TensorFlow version, try an alternative way to load and resize the image. One alternative is to load the image with matplotlib and then resize it with skimage. The open_images function would then be:
import numpy as np
import matplotlib.image
from skimage.transform import resize

def open_images(paths):
    '''
    Given a list of paths to images, this function loads
    the images from the paths, augments them, and returns them as a batch
    '''
    images = []
    for path in paths:
        image = matplotlib.image.imread(path)
        image = np.array(image)
        image = resize(image, (HEIGHT, WIDTH, CHANNELS))
        image = augment(image)
        images.append(image)
    return np.array(images)
I wish to take the FFT of the input dataset loaded using ImageDataGenerator. Taking the FFT will double the number of channels as I stack the real and complex parts of the complex output of the FFT together along the channels dimension. The preprocessing_function attribute of the ImageDataGenerator class should output a Numpy tensor with the same shape as the input, so I could not use that.
I tried applying tf.math.fft2d directly on the ImageDataGenerator.flow_from_directory() output, but it is consuming too much RAM - causing the program to crash on Google colab. Another way I tried was to add a custom layer computing the FFT as the first layer of my neural network, but this adds to the training time. So I wish to do it as a pre-processing step.
Could anyone kindly suggest an efficient way to apply a function to the output of ImageDataGenerator?
You can do a custom ImageDataGenerator, but I have no reason to think it would be any faster than computing the FFT in the first layer. It seems like a costly operation either way, since tf.signal.fft2d takes complex64 or complex128 dtypes. So it needs casting, and then casting back, because neural network weights are tf.float32 and other image processing functions don't take complex dtypes.
import tensorflow as tf

labels = ['Cats', 'Dogs', 'Others']

def read_image(file_name):
    image = tf.io.read_file(file_name)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize_with_pad(image, target_height=224, target_width=224)
    image = tf.cast(image, tf.complex64)
    # fft2d operates on the innermost two dimensions, so move channels first
    # to transform over height and width, then move them back
    image = tf.transpose(image, [2, 0, 1])
    image = tf.signal.fft2d(image)
    image = tf.transpose(image, [1, 2, 0])
    # The label is the parent directory name
    label = tf.strings.split(file_name, '\\')[-2]
    label = tf.where(tf.equal(label, labels))
    return image, label

ds = tf.data.Dataset.list_files(r'path\to\my\pictures\*\*.jpg')
ds = ds.map(read_image)
next(iter(ds))
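If you also want the doubled-channel representation described in the question, a minimal sketch (my addition, not part of the original answer) would stack the real and imaginary parts along the channel axis right after the fft2d call:

# Inside read_image, after the FFT:
real = tf.math.real(image)                # (224, 224, 3) float32
imag = tf.math.imag(image)                # (224, 224, 3) float32
image = tf.concat([real, imag], axis=-1)  # (224, 224, 6) float32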
I am using tf.keras to build my network, and I am doing all of the augmentation at the tensor level, since my data is in TFRecord files. I need to do shearing and ZCA for augmentation but couldn't find a proper implementation in TensorFlow. I also can't use the ImageDataGenerator, which supports both operations I need, because as I said my data doesn't fit in memory and is in TFRecord format. So all my augmentation has to be tensor-wise.
#fchollet here suggested a way to use ImageDataGenerator with a large dataset.
My first question is:
if I use #fchollet's way, which is basically using X samples of the large dataset to run the ImageDataGenerator and then train_on_batch to train the network, how can I feed my validation data to the network?
My second question is: is there any tensor-wise implementation of the shear and ZCA operations? Some people, like here, suggested using tf.contrib.image.transform, but I couldn't understand how. If someone has an idea of how to do it, I would appreciate that.
Update:
This is my attempt to construct the transformation matrix through skimage:
from skimage import io
from skimage import transform as trans
import tensorflow as tf

def augment(image):
    afine_tf = trans.AffineTransform(shear=0.2)
    transform = tf.contrib.image.matrices_to_flat_transforms(tf.linalg.inv(afine_tf.params))
    transform = tf.cast(transform, tf.float32)
    image = tf.contrib.image.transform(image, transform)  # image here is a tensor
    return image
dataset_train = tf.data.TFRecordDataset(training_files, num_parallel_reads=calls)
dataset_train = dataset_train.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=1000+ 4 * batch_size))
dataset_train = dataset_train.map(decode_train, num_parallel_calls= calls)
dataset_train = dataset_train.map(augment,num_parallel_calls=calls )
dataset_train = dataset_train.batch(batch_size)
dataset_train = dataset_train.prefetch(tf.contrib.data.AUTOTUNE)
I will answer the second question.
Today one of my old questions received a comment from a user, but the comments had been deleted by the time I was adding more details on how to use tf.contrib.image.transform. I guess it was you, right?
So, I have edited my question and added an example, check it here.
TL;DR:
def transformImg(imgIn, forward_transform):
    t = tf.contrib.image.matrices_to_flat_transforms(tf.linalg.inv(forward_transform))
    # Please notice that forward_transform must be a float matrix,
    # e.g. [[2.0,0,0],[0,1.0,0],[0,0,1]] will work
    # but [[2,0,0],[0,1,0],[0,0,1]] will not
    imgOut = tf.contrib.image.transform(imgIn, t, interpolation="BILINEAR", name=None)
    return imgOut

def shear_transform_example(filename, shear_lambda):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    img = transformImg(image_decoded, [[1.0, shear_lambda, 0], [0, 1.0, 0], [0, 0, 1.0]])
    # Notice that this is a shear transformation parallel to the x axis.
    # If you want a y axis version, use this:
    # img = transformImg(image_decoded, [[1.0,0,0],[shear_lambda,1.0,0],[0,0,1.0]])
    return img

img = shear_transform_example("white_square.jpg", 0.1)
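To plug this into the tf.data pipeline from the question, one option (a sketch; it assumes decode_train yields bare image tensors, and 0.2 is an arbitrary shear factor) is to do the shear inside the mapped augment function:

def augment(image):
    # x-axis shear with a fixed shear factor of 0.2
    return transformImg(image, [[1.0, 0.2, 0], [0, 1.0, 0], [0, 0, 1.0]])

dataset_train = dataset_train.map(augment, num_parallel_calls=calls)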
I am working on the Python/TensorFlow/MNIST tutorial.
For a few weeks now, using the original code from the TensorFlow web site, I have been getting the warning that the image dataset will soon be deprecated and that I should use the following one:
https://github.com/tensorflow/models/blob/master/official/mnist/dataset.py
I load it in my code using:
from tensorflow.models.official.mnist import dataset
trainfile = dataset.train(data_dir)
Which returns:
tf.data.Dataset.zip((images, labels))
The issue is that I cannot find a way to separate them, for example in the following way:
trainfile = dataset.train(data_dir)
train_data= trainfile.images
train_label= trainfile.label
But this clearly does not work, because the attributes images and label do not exist. trainfile is a tf.data.Dataset.
Knowing that the tf.data.Dataset is made of int32 and float32, I tried:
train_data = trainfile.map(lambda x, y: x.dtype == tf.float32)
But it returns an empty dataset.
I insist (but will be open-minded) on doing it this way (two complete batches of images and labels), because this is how the tutorial works:
https://www.tensorflow.org/tutorials/estimators/cnn
I saw a lot of solutions for getting elements out of a dataset, but nothing to go back from the zip operation that is done in the following code:
tf.data.Dataset.zip((images, labels))
Thank you in advance for your help.
I hope this helps:
inputs = tf.placeholder(tf.float32, shape=(None, 784), name='inputs')
outputs = tf.placeholder(tf.float32, shape=(None,), name='outputs')

# Prepare a tensorflow dataset
ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
ds = ds.shuffle(buffer_size=10, reshuffle_each_iteration=True).batch(batch_size=batch_size, drop_remainder=True).repeat()

iterator = ds.make_one_shot_iterator()
next_element = iterator.get_next()
inputs = next_element[0]
outputs = next_element[1]
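If you specifically want the images and labels back as two separate datasets, a minimal sketch (my addition) projects the zipped dataset onto each component with map:

train_data = trainfile.map(lambda image, label: image)    # dataset of images only
train_label = trainfile.map(lambda image, label: label)   # dataset of labels only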
TensorFlow's get_single_element() is finally around, and it can be used to unzip features and labels from a dataset.
This avoids the need of generating and using an iterator using .map(), iter() or one_shot_iterator() (which could be costly for big datasets).
get_single_element() returns a tensor (or a tuple or dict of tensors) encapsulating all the members of the dataset. We need to pass all the members of the dataset batched into a single element.
This can be used to get features as a tensor-array, or features and labels as a tuple or dictionary (of tensor-arrays) depending upon how the original dataset was created.
Check this answer on SO for an example that unpacks features and labels into a tuple of tensor-arrays.
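A minimal sketch of the idea (it assumes the full dataset fits in memory, since everything is batched into a single element; images and labels here are placeholder arrays):

ds = tf.data.Dataset.from_tensor_slices((images, labels))
ds = ds.batch(len(images))  # one batch holding the entire dataset
x_all, y_all = tf.data.experimental.get_single_element(ds)  # tuple of tensor-arrays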
Instead of separating into two datasets, one for images and another for labels, it's best to make a single iterator which returns both the image and the label.
The reason why this is preferred is that it's a lot easier to ensure that you match each example with its label even after a complicated series of shuffles, reorderings, filterings, etc, as you might have in a nontrivial input pipeline.
You can visualize the images and find their associated labels:
import matplotlib.pyplot as plt

ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
ds = ds.shuffle(buffer_size=10).batch(batch_size=batch_size)
iterator = ds.make_one_shot_iterator()
next_element = iterator.get_next()

def display(image, label):
    # display image
    ...
    plt.imshow(image)
    ...

with tf.Session() as sess:
    try:
        while True:
            image, label = sess.run(next_element)
            # image = numpy array (batch, image_size)
            # label = numpy array (batch, label)
            display(image[0], label[0])  # display first image in batch
    except tf.errors.OutOfRangeError:
        pass
I'd like to be able to read in batches of images. However, these batches must be constructed by a Python script (they cannot be placed into a file ahead of time for various reasons). What's the most efficient way, in TensorFlow, to do the following:
(1) Provided: A Python list of B variable-length strings that point to images, all of which have the same size. B is the batch size.
(2) For each string, load the image it corresponds to, and apply a random crop of 5% (the crop is random but the size of the crop is fixed)
(3) Concatenate the images together into a tensor of size B x H x W x 3
If this is not possible, does anyone have any benchmarks / data on the efficiency loss of loading and preprocessing the images in Python and then placing them into a queue? I assume the net will run considerably faster if the image loading / preprocessing is done inside TensorFlow.
This is how I understand your problem:
you have some images
you have a function sample_batch() which returns a batch of filenames of size B
you want to read the images corresponding to these filenames and preprocess them
finally you output a batch of these examples
input = tf.placeholder(tf.string, name='Input')
queue = tf.FIFOQueue(capacity, tf.string, [()], name='Queue')
enqueue_op = queue.enqueue_many(input)
reader = tf.WholeFileReader()
filename, content = reader.read(queue)
image = tf.image.decode_jpeg(content, channels=3)
# Preprocessing
image = tf.random_crop(image, [H, W, 3])
image = tf.to_float(image)
batch_image = tf.train.batch([image], batch_size=B, name='Batch')
output = inference(batch_image)
Then in the session, you have to run the enqueue operation with the filenames from your sample_batch() function:
with tf.Session() as sess:
    tf.train.start_queue_runners()
    for i in range(NUM_STEPS):
        batch_filenames = sample_batch()
        sess.run(enqueue_op, feed_dict={input: batch_filenames})
        sess.run(output)
If you have the images as a byte array you can use something similar to this in your graph:
jpegs = tf.placeholder(tf.string, shape=(None))
images = tf.map_fn(lambda jpeg : your_processing_fn(jpeg), jpegs,
dtype=tf.float32)
logits = your_inference_model(images,labels)
where your_processing_fn is a function which receives a JPEG tensor of bytes, decodes, resizes and crops it, and returns an image of H x W x 3.
You need the latest version of tensorflow as map_fn is not in 0.8 and below.
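For illustration, a hedged sketch of what your_processing_fn might look like (H, W and the 5% crop margin are assumptions, and the old-style 1.x ops match the era of this answer):

def your_processing_fn(jpeg_bytes):
    image = tf.image.decode_jpeg(jpeg_bytes, channels=3)
    # Resize slightly larger, then take a random fixed-size crop (~5%)
    image = tf.image.resize_images(image, [int(H * 1.05), int(W * 1.05)])
    image = tf.random_crop(image, [H, W, 3])
    return tf.to_float(image)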