I'm trying to do data augmentation with the Keras ImageDataGenerator.
I'm pulling the images from a DataFrame containing the paths to the images in one column and the label in another. For now, I'm only trying to flip the images horizontally. But when I plot the images, they look like the brightness was pushed to the max. I wonder what's going on here... Any thoughts?
datagen = ImageDataGenerator(horizontal_flip=True)

# Configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow_from_dataframe(dataframe=data,
                                                    x_col="File name",
                                                    y_col="Driving direction",
                                                    directory="Self-Driving-Car/Training Data/",
                                                    target_size=(480, 640),
                                                    class_mode="other"):
    # Show 9 images
    for i in range(0, 9):
        plt.subplot(330 + 1 + i)
        plt.imshow(X_batch[i])
    plt.show()
    break
OK, it was just a plotting problem; this solved it:
plt.imshow(X_batch[i]/255)
More generally, be aware that if you do not have 0-255 ranged integer images, ImageDataGenerator can rescale the data to 0-255 for some of the augmentation options but not others, brightness_range being one example that rescales to 0-255.
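For example, a quick sanity check on the batch before plotting (a sketch, assuming X_batch comes straight out of the generator loop above):

import matplotlib.pyplot as plt

# Inspect what the generator actually returned
print(X_batch.dtype, X_batch.min(), X_batch.max())

# plt.imshow expects uint8 in 0-255 or floats in [0, 1], so a float batch
# in the 0-255 range has to be rescaled (or cast to uint8) before display.
plt.imshow(X_batch[0] / 255)
plt.show()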
My goal is to have preprocessing layers so the model can handle any image size, because the dataset I use has 2 different image shapes. The simple solution is to resize the images when I load them. However, I believe this won't work once the model is deployed, since I can't do a manual resize like that. So I must use preprocessing layers.
The docs I used
What I've tried:
Putting the preprocessing layers as part of the model; it does not work.
I am thinking of using TensorSliceDataset.map(resize_and_rescale).
The problem is that I need to convert [tensor image 1, tensor image 2] to a TensorSliceDataset. However, I can't convert it.
What I've tried:
tf.data.Dataset.from_tensor_slices((X_train, y_train))
It throws an error:
InvalidArgumentError: {{function_node __wrapped__Pack_N_9773_device_/job:localhost/replica:0/task:0/device:GPU:0}} Shapes of all inputs must match: values[0].shape = [258,320,3] != values[23].shape = [322,480,3]
[[{{node Pack}}]] [Op:Pack] name: component_0
The load_images function:
def load_images(df):
    paths = df['path'].values
    X = []
    for path in paths:
        raw = tf.io.read_file(path)
        img = tf.image.decode_png(raw, channels=3)
        X.append(img)
    y = df['kind'].cat.codes
    return X, y
As far as I understand, you wish to train on both image sizes simultaneously. The simplest way is probably to create a separate dataset for each image size and concatenate them after batching, as follows:
dataset_1 = tf.data.Dataset.from_tensor_slices((X_train_1, y_train_1))
dataset_1 = dataset_1.batch(batch_size_1)
dataset_2 = tf.data.Dataset.from_tensor_slices((X_train_2, y_train_2))
dataset_2 = dataset_2.batch(batch_size_2)
dataset = dataset_1.concatenate(dataset_2)
dataset = dataset.shuffle(shuffle_buffer_size)
In this case each batch consists of images of the same size. If you use .repeat(), do not forget to put it after the concatenation.
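If the training loop needs several passes over the data, the same ordering applies (a sketch; the repeat itself is not part of the snippet above):

dataset = dataset_1.concatenate(dataset_2)
dataset = dataset.shuffle(shuffle_buffer_size)
dataset = dataset.repeat()  # repeat the combined dataset, not dataset_1 or dataset_2 individually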
You need to use ragged tensors to handle different image sizes:
dataset = tf.data.Dataset.from_tensor_slices((tf.ragged.constant(img_list), label_list))
dataset = dataset.apply(tf.data.experimental.dense_to_ragged_batch(batch_size=3))
Example
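Since the question also mentions mapping a resize_and_rescale function over the dataset, here is a sketch of how the ragged dataset above could be resized to a single fixed shape (the 224x224 target and the /255 rescale are assumptions, not part of the original answer):

import tensorflow as tf

def resize_and_rescale(image, label):
    # Resize every image to one fixed shape and scale pixel values to [0, 1]
    image = tf.image.resize(image, (224, 224)) / 255.0
    return image, label

dataset = tf.data.Dataset.from_tensor_slices((tf.ragged.constant(img_list), label_list))
# Convert each ragged element back to a dense tensor, then resize it
dataset = dataset.map(lambda img, lbl: resize_and_rescale(img.to_tensor(), lbl))
dataset = dataset.batch(32)  # fixed-size images can be batched normally again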
So I have to use Tensorflow 1.9 for system-specific reasons.
I want to train a CNN with a custom dataset consisting of images.
The folder structure looks very much like this:
./
+ circles
- circle-0.jpg
- circle-1.jpg
- ...
+ hexagons
- hexagon-0.jpg
- hexagon-1.jpg
- ...
+ ...
So the example I have to work with uses MNIST and has these two particular lines of code:
mnist_dataset = tf.keras.datasets.mnist.load_data('mnist_data')
(x_train, y_train), (x_test, y_test) = mnist_dataset
In my work, I also have to use this data format (x_train, y_train), (x_test, y_test), which seems to be quite common. As far as I have been able to find out so far, the format of those datasets is (image_data, label), with shapes like ((60000, 28, 28), (60000,)), at least for the MNIST dataset. The image_data here is supposedly of dtype uint8 (according to this post). I was also able to find out that a tf.data.Dataset() object looks like the tuple I need here: (image_data, label).
So far so good. But this information raises a few questions which I haven't been able to figure out yet, and where I would kindly request your help:
1. (60000, 28, 28) means 60k 28 x 28 image value arrays, right?
2. If 1. is right, how do I get my images (organized in the directory structure described above) into this format? Is there a function which yields an array that I can use like that?
3. I know I need some kind of generator function which should get all the images with their labels, because in Tensorflow 1.9 tf.keras.utils.image_dataset_from_directory() does not seem to exist yet.
4. What do the labels actually look like? For example, with my directory structure, would I have something like this:
(A)

File             Label
circle-0.jpg     circle
circle-233.jpg   circle
hexagon-1.jpg    hexagon
triangle-12.jpg  triangle
or (B)

File             Label
circle-0.jpg     circle-0
circle-233.jpg   circle-233
hexagon-1.jpg    hexagon-1
triangle-12.jpg  triangle-12
where the respective images are already converted into the "(60000, 28, 28)" format? It seems as if I need to create all these functions myself, since there does not seem to be a function which takes a directory structure like mine and turns it into a dataset that Tensorflow 1.9 can use, or is there? I know of tf.keras.preprocessing.image.ImageDataGenerator and image_dataset_from_directory as well as flow_from_directory(); however, none of them seem to give me my desired dataset value tuple format.
I would really appreciate any help!
You have to build a custom data generator for that. If you have two arrays, train_paths containing the paths to the images and train_labels containing the labels for those images, then this function (datagen) will yield the images as arrays together with their respective labels as tuples (image_array, label).
I have also added a way to integer-encode your labels, using a dictionary encode_label.
For example, train_paths and train_labels should look like this:
train_paths = np.array(['path/to/image1.jpg','path/to/image2.jpg','path/to/image3.jpg'])
train_labels = np.array(['circle','square','hexagon'])
where the image at path 'path/to/image1.jpg' has the label 'circle', and the image at path 'path/to/image2.jpg' has the label 'square'.
This generator function returns the data in batches, and you can write your own augmentation techniques as well (inside the augment function):
import numpy as np
import tensorflow as tf

# Hyperparameters
HEIGHT = 224    # Image height
WIDTH = 224     # Image width
CHANNELS = 3    # Image channels

# This dictionary will encode your labels
encode_label = {'hexagon': 0, 'circle': 1, 'square': 2}

def augment(image):
    # All your augmentation techniques go here
    return image

def encode_labels(labels):
    encoded = []
    for label in labels:
        encoded.append(encode_label[label])
    return encoded

def open_images(paths):
    '''
    Given a list of paths to images, this function loads
    the images from the paths, augments them, and returns them as a batch.
    '''
    images = []
    for path in paths:
        image = tf.keras.preprocessing.image.load_img(path, target_size=(HEIGHT, WIDTH, CHANNELS))
        image = np.array(image)
        image = augment(image)
        images.append(image)
    return np.array(images)

# This is the data generator
def datagen(paths, labels, batch_size=32):
    for x in range(0, len(paths), batch_size):
        # Load a batch of images
        batch_paths = paths[x:x+batch_size]
        batch_images = open_images(batch_paths)
        # Load a batch of labels
        batch_labels = labels[x:x+batch_size]
        batch_labels = encode_labels(batch_labels)
        batch_labels = np.array(batch_labels, dtype='float').reshape(-1)
        yield batch_images, batch_labels
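One possible way to consume this generator (a sketch; model and num_epochs are hypothetical names, not defined above):

# Hypothetical training loop: iterate the generator once per epoch
for epoch in range(num_epochs):
    for batch_images, batch_labels in datagen(train_paths, train_labels, batch_size=32):
        model.train_on_batch(batch_images, batch_labels)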
If you cannot get tf.keras.preprocessing.image.load_img working in your tensorflow version, try an alternative way to load and resize the image. One option is to load the image with matplotlib and then resize it with skimage (note that skimage's resize returns a float image scaled to [0, 1] by default). The open_images function would then be:
import matplotlib.image
from skimage.transform import resize

def open_images(paths):
    '''
    Given a list of paths to images, this function loads
    the images from the paths, augments them, and returns them as a batch.
    '''
    images = []
    for path in paths:
        image = matplotlib.image.imread(path)
        image = np.array(image)
        image = resize(image, (HEIGHT, WIDTH, CHANNELS))
        image = augment(image)
        images.append(image)
    return np.array(images)
I have already trained my Keras model and saved it as .h5. My model uses 6 classes, and it is able to classify all of them from images, outputting the name of the class it predicts. However, I want to generate an accuracy score when testing the model with an image provided by the user. I have searched everywhere but there is still no answer to this problem.
model = load_model('prototype-tl2-80-20.h5')

classes = { 1: 'Kacip Fatimah',
            2: 'Mempisang',
            3: 'Misai Adam',
            4: 'Pandan Serapat',
            5: 'Tapak Sulaiman',
            6: 'Tongkat Ali' }

image = Image.open(file_path)
image = image.resize((224, 224))
image = numpy.expand_dims(image, axis=0)
image = numpy.array(image)

pred = model.predict_classes([image])[0]
sign = classes[pred + 1]
print(sign)
To predict an image using a trained model, you have to be careful to make sure the image is processed exactly as the training images were processed. The image should be the same size (height, width) as the training images and have the same number of color bands, for example 'rgb' or 'grayscale'. Make sure the color bands are in the same order as used in training. Next you must apply the same preprocessing to the image. For example, if your training images were scaled to be between 0 and 1, then you need to rescale your test image with image = image / 255. After that, do:
pred = model.predict(image)
index = np.argmax(pred)
class_name = classes[index + 1]  # the classes dict above is 1-indexed
print(index, class_name)
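Putting that together with the code from the question, a sketch could look like this (the 224x224 size and the /255 rescaling are assumptions; use whatever preprocessing was actually applied during training):

import numpy as np
from PIL import Image

# Preprocess the user's image exactly like the training images
image = Image.open(file_path).convert('RGB')   # same color bands as training
image = image.resize((224, 224))               # same size as training
image = np.array(image) / 255.0                # assumed: training images scaled to [0, 1]
image = np.expand_dims(image, axis=0)          # add the batch dimension

pred = model.predict(image)
index = np.argmax(pred)
print(index, classes[index + 1])               # the classes dict in the question is 1-indexed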
I am trying to fit a CNN model (AlexNet architecture) on a dataset of 4900 images (480*640*3), and I would like to do data augmentation. Since the images and their labels live in different paths, I have written a custom generator: a class that collects all the paths, stores the image paths and their labels in two lists, then loads batches of 32 images and labels and feeds them to an ImageDataGenerator.
This is the method of the custom generator that is called by the model during fitting, and it is where I apply the ImageDataGenerator:
def __getitem__(self, index):
    batch_x = self.img_filenames[index * self.batch_size : (index + 1) * self.batch_size]
    batch_y = self.labels[index * self.batch_size : (index + 1) * self.batch_size]

    gen = ImageDataGenerator(rescale=1./255,
                             rotation_range=90,
                             brightness_range=(0.1, 0.9),
                             horizontal_flip=True)

    X = [plt.imread(filename) for filename in batch_x]
    X, Y = next(gen.flow(x=np.array(X), y=np.array(batch_y), batch_size=self.batch_size))
    return X, Y
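For context, this is roughly how that method sits inside the generator class (a sketch; the class name, __init__ and __len__ are reconstructed as assumptions, only __getitem__ above is the actual code):

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.utils import Sequence
from tensorflow.keras.preprocessing.image import ImageDataGenerator

class CustomGenerator(Sequence):
    def __init__(self, img_filenames, labels, batch_size=32):
        self.img_filenames = img_filenames   # list of image paths
        self.labels = labels                 # labels in the same order
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch
        return int(np.ceil(len(self.img_filenames) / self.batch_size))

    # __getitem__ as defined above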
I have some questions:
What is ImageDataGenerator supposed to return? If I pass 32 (batch_size) different images, does it return 32 modified images, one for each input, or 32 images for each input? And if I only pass 1 image with a batch size of 32, does it return 32 modified images from that one? I'm almost sure it is one for each input, but I want to confirm.
Secondly, if I want to have 40k images: if I set the index back to 0 whenever it exceeds samples // batch_size, and change the __len__ method to return twice (or however many times) as many batches, then, since the images are generated randomly, I should get 4900 new images, or as many as I want, shouldn't I?
The main problem is that when the accuracy reaches 0.5 it stops increasing. I have tried with 3 epochs and it is the same: it increases for 3 or 4 batches and then stops, which is why I have these doubts.
Thank you.
Let me try to answer
1. If you pass a batch size of 32 to ImageDataGenerator with only horizontal_flip=True, it flips all 32 images horizontally and passes these 32 + 32 (original + flipped) for training.
If you set both horizontal_flip and vertical_flip, then 32 + 32 + 32 images will be passed for training.
For brightness_range it produces one image for each brightness scale for each original image. That means if your brightness scale is 0.1-0.5, then 32 * 5 images are produced.
I am not sure about the second question. A better choice is to do more data augmentation on both the training and test data.
For the third question, you could try EfficientNet with focal loss.
I printed X.shape and it seems to be the 32 images but modified, so it doesn't multiply the images. And the method to augment the data that I described works fine too.
Sorry, this is a long one!
I am 80% sure that the problem is that I don't fully understand how tensorflow uses the tf.train.batch function to queue data.
I am trying to adapt one of the tensorflow tutorials to classify a large number of images.
Tutorial can be found here: https://www.tensorflow.org/tutorials/deep_cnn
I have built some modules which can encode my raw data in the same format that cifar10 uses. I am using this to construct training and evaluation data which the program is able to evaluate to a high degree of accuracy. Accuracy varies depending on the quality of the imagesets I put in. To keep things simple I have trained it using 32x32 monochrome tiles of either yellow or blue (category 0 and 1 respectively). Conveniently the network is able to identify whether it is being given a yellow or blue tile with 100% accuracy.
I have also been able to adapt cifar10_eval.py to output predictions rather than an accuracy percentage. This allows me to feed in un-classified data and output predictions as a list. To do this I have exchanged the statement:
top_k_op = tf.nn.in_top_k(logits, labels, 1)
for:
output_2 = tf.argmax(logits, 1)
I have added a variable and a boolean to the eval_once function call to allow it to access the definition for "output_2" and to let me switch between this and "top_k_op" depending on whether I am in evaluation mode or if I am predicting new data.
So far so good. This method works for small amounts of input data but fails as soon as I want to output more than 128 classifications. Not coincidentally 128 is the batch size.
In theory the first item (3073 bytes) in the binary should correspond to the first item in the list which is churned out when I am predicting new data. This happens for inputs of up to 128 images but the data gets jumbled up when I try to categorise more images. Actually, some of the predictions are lost completely!
There are a couple of reasons why this happens. The tutorial isn't designed to care about the order in which data is read or processed, just that individual images correspond with their labels. Originally the data loss was randomised(!), but I have managed to remove the random element by removing multi-threading (threads = 1 rather than 16) and by stopping it from shuffling filenames.
filename_queue = tf.train.string_input_producer(filenames, shuffle=False)
string_input_producer has a hidden/optional argument which shuffles the file names. For model evaluation I have set this to false as above.
However.... I am still stuck with jumbled data loss when evaluating data larger than a single batch.
Does anyone know why this happens and have any ideas about how it could be fixed?
In theory I could redesign the code to rebuild the graph and evaluate it for 128 images at a time. However, I want to classify millions of images and feel that I'd be asking for trouble trying to open a new graph instance per batch.
PS, I've done my homework:
I have verified that my initial data to binary conversion works by running a program which can read cifar10-style files and interpret it as a big tile of images. I have run this code on both the original cifar10 binaries and my own binaries and am able to reconstruct both perfectly.
When I encode uncategorised data I add a category label of zero to make sure the tutorial can read the file. However, I make sure that this label is chucked away at the file reading stage and thus is not used when generating a list of predictions.
I have verified the output predictions by printing the list directly onto the screen as a python output and also by using it to assemble a PNG image which can be compared with the original inputs. This verification works perfectly for small batch sizes and starts to fall apart in larger batch sizes.
I've also made some modifications to the tutorial not discussed in this post. These are simple modifications such as changing the number of categories to 2 rather than 10. I am confident that this is not the issue.
PPS, here is a copy of some functions from the modified script. I haven't pasted everything because this question is already huge:
from cifar10_eval:
def eval_once(saver, summary_writer, top_k_op, output_2, summary_op, mapping=False):
  """Run Eval once.

  Args:
    saver: Saver.
    summary_writer: Summary writer.
    top_k_op: Top K op.
    summary_op: Summary op.
  """
  with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
      # Restores from checkpoint
      saver.restore(sess, ckpt.model_checkpoint_path)
      # Assuming model_checkpoint_path looks something like:
      #   /my-favorite-path/cifar10_train/model.ckpt-0,
      # extract global_step from it.
      global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
    else:
      print('No checkpoint file found')
      return

    # Start the queue runners.
    coord = tf.train.Coordinator()
    try:
      threads = []
      for qr in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
        threads.extend(qr.create_threads(sess, coord=coord, daemon=True,
                                         start=True))

      num_iter = int(math.ceil(FLAGS.num_examples / FLAGS.batch_size))
      true_count = 0  # Counts the number of correct predictions.
      total_sample_count = num_iter * FLAGS.batch_size
      step = 0
      output = []
      if mapping:  # if in mapping mode generate a map; if in default mode (variable set to False by default) then tally predictions instead.
        while step < num_iter and not coord.should_stop():
          step += 1
          hold = sess.run(output_2)
          print(hold)
          for i in range(len(hold)):
            output.append(hold[i])
        return output
from cifar10_input:
def inputs(mapping, data_dir, batch_size):
  """Construct input for CIFAR evaluation using the Reader ops.

  Args:
    mapping: bool, indicating if one should use the raw or pre-classified eval data set.
    data_dir: Path to the CIFAR-10 data directory.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  filelist = os.listdir(data_dir)
  filenames = []
  if mapping:
    # from Raw_Image_Processor import file_name
    for f in filelist:
      if f.startswith("raw_batch"):
        filenames.append(os.path.join(data_dir, f))
    num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
  else:
    for f in filelist:
      if f.startswith("eval_batch"):
        filenames.append(os.path.join(data_dir, f))
    num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL

  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames, shuffle=False)

  # Read examples from files in the filename queue.
  read_input = read_cifar10(filename_queue)
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  height = IMAGE_SIZE
  width = IMAGE_SIZE

  # Image processing for evaluation.
  # Crop the central [height, width] of the image.
  resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
                                                         height, width)

  # Subtract off the mean and divide by the variance of the pixels.
  float_image = tf.image.per_image_standardization(resized_image)

  # Set the shapes of tensors.
  float_image.set_shape([height, width, 3])
  read_input.label.set_shape([1])

  # Ensure that the random shuffling has good mixing properties.
  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(num_examples_per_epoch *
                           min_fraction_of_examples_in_queue)

  # Generate a batch of images and labels by building up a queue of examples.
  return _generate_image_and_label_batch(float_image, read_input.label,
                                         min_queue_examples, batch_size,
                                         shuffle=False)
from cifar10_input:
def _generate_image_and_label_batch(image, label, min_queue_examples,
                                    batch_size, shuffle):
  """Construct a queued batch of images and labels.

  Args:
    image: 3-D Tensor of [height, width, 3] of type.float32.
    label: 1-D Tensor of type.int32
    min_queue_examples: int32, minimum number of samples to retain
      in the queue that provides the batches of examples.
    batch_size: Number of images per batch.
    shuffle: boolean indicating whether to use a shuffling queue.

  Returns:
    images: Images. 4D tensor of [batch_size, height, width, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  # Create a queue that shuffles the examples, and then
  # read 'batch_size' images + labels from the example queue.
  num_preprocess_threads = 16
  if shuffle:
    images, label_batch = tf.train.shuffle_batch(
        [image, label],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size,
        min_after_dequeue=min_queue_examples)
  else:
    images, label_batch = tf.train.batch(
        [image, label],
        batch_size=batch_size,
        num_threads=1,
        capacity=1,
        enqueue_many=False)

  # Display the training images in the visualizer.
  tf.summary.image('images', images)

  return images, tf.reshape(label_batch, [batch_size])
Edit:
Partial solution is given in the comments below. Information loss is dependent on batch size so it turns out that increasing the batch size (in mapping mode only) is an effective fix.
However, I'm still unsure why it loses and/or scrambles information when the batch size is exceeded. Presumably the batches are taken in some non-sequential order. I don't need this to take the project forward, but if someone could explain how or why it happens it would be greatly appreciated.
Edit 2:
It's back! I've set the batch size to be equivalent to one binary file (in my case roughly 10,000 images). Data is not lost or jumbled within this batch but when I try to process multiple files (about 30) it mixes up the batches a little rather than outputting them on a FIFO basis.
A picture is probably the easiest way for you to see what is going on:
classification map
This is a reconstructed image of a rock face from which the classifier has been trained to recognize three categories. As you can see, the reconstruction is mostly smooth. However, there are two clean breaks near the top of the image where a batch (or three) has been output in non-chronological order. These should have appeared at the bottom of the image rather than near the top.