I am trying to train a Unet model to do per pixel regression predictions on images. To do this, I separate my large image (1000x1000) to 200x200 pixel squares. Then use that to train an FCN model with a linear final layer. The loss function is MSE loss. In the prediction stage, I extract the same boxes but stitch it together and obtain a final output image. When I do that, the problem I am getting is that there is discontinuities between the boundary of boxes. (I can clearly see the boxes)
I've tried to deal with this by feeding 250x250 boxes to my FCN and calculating the loss for the 200x200 centre region. I do the same process for the prediction state. Extract 250x250 patches crop the 200x200 centre region and stitch the image back together. Please see some code below:
Loss Function:
criterion = nn.MSELoss()
optimizer = optim.Adam(self.model.parameters(), lr=LR)
for inputs, labels in train_loader:
inputs, labels = inputs.to(device), labels.to(device)
optimizer.zero_grad()
output = model(inputs)
output = output.squeeze()
_, dimx, dimy = output.shape
loss = criterion(output[:,25:dimx-25, 25:dimy-25], labels[:,25:dimx-25, 25:dimy-25])
loss.backward()
optimizer.step()
My code for predictions is as follows:
pred = np.zeros((height, width))
for i in range(25, height, 200):
for j in range(25, width, 200):
patch = img[:, i-25:i+225, j-25:j+225]
patch = torch.from_numpy(patch)
patch = patch.unsqueeze(dim=0).to(device)
out = model(patch)
out = out[0,0,25:225, 25:225]
pred[i:i+200, j:j+200] = out.cpu().numpy()
I'm not sure if my problem makes complete sense. I can provide more clarification if necessary but I have been stuck on this for a while now.
It makes sense to have discontinuity near the boundary because there is no requirement for the network to have smooth predictions across boxes during the training.
I assume you have limited GPU memory, so you take only 200x200 pixels as input at a time; Thus, I would suggest the following two possible workarounds.
First, You could use torchvision.transform.RandomCrop to generate 200x200 cropped regions as inputs of the training. At the testing phase, you directly input the whole image to do the prediction. The intuition is that the model can see the full resolution of images, which is the same as testing data, while consuming fewer GPU memory during the training. In this case, you would also expect that the model needs more time to learn all training data patterns because it only sees partial data at a time.
Second, You could simply downsample training data, say 0.5x, and keep the output size, i.e. 1x. For example, in your case, after downsampling the input image to 200x200, the model takes it to predict 1000x1000 pixel level labels (you could use bilinear upsampling or deconv layers). This workaround method has been used in some segmentation implementations (AdaptSeg, DISE).
After some troubleshooting I realized that I had this problem because I was performing batch normalization between each convolutional layer. Removing that step solved the discontinuity problem.
Related
I would like to train a CNN only on a cropped part of the CNN-output. This mean that the input image has a resolution of w x h. The output image processed by my model is also w x h. The loss should then be computed by comparing the crop of the output image at the center (w/2 x h/2) and the label.
Is PyTorch taking care of the cropping or will my model not learn properly because of the cropping?
I am aware that training a CNN with a variable input size is possible; however, I am not sure if the weights will be adjusted correctly since my loss operates on a different resolution than my model's output.
Thanks!
I have a dataset consisting of small RGB images. Each image is then split into a specific number of patches, each then being resized and blurred (Gaussian). The input of my model (see Thermal Image Enhancement using CNN (10.1109/IROS.2016.7759059), shallow 3-layers network for increasing the resolution and handling blurring issues in thermal images) is resized + blurred patch while the expected result is just the resized patch.
The network is rather simple:
class TEN_Network(nn.Module):
def __init__(self) -> None:
super(TEN_Network, self).__init__()
self.model = nn.Sequential(OrderedDict([
('conv1', nn.Conv2d(1, 64, 7, stride=1, padding=3)),
('relu1', nn.ReLU(True)),
('conv2', nn.Conv2d(64, 32, 5, stride=1, padding=2)),
('relu2', nn.ReLU(True)),
('conv3', nn.Conv2d(32, 32, 3, stride=1, padding=1)),
('relu3', nn.ReLU(True)),
('conv4', nn.Conv2d(32, 1, 3, stride=1, padding=1))
]))
def forward(self, x):
x = self.model(x)
return x
I picked this CNN for numerous reasons simplicity being one of them. Since I am quite new to neural networks and PyTorch I thought it will provide a nice playground.
My question is this - given that each sample is split into patches should I zero the gradients (and respectively step() the optimizer) for each patch or should I calculate the average loss (sum of all losses for all patches in a sample divided by the number of patches) or should I run these two steps at the beginning of training my sample and after the patches have been processed (resulting in the above mentioned average loss per sample)?
Currently I have the following (pseudo-code, optimizer is Adam, loss is MSE):
# ...
for epoch_id in range(0, epochs_total):
# Load dataset with dataloader
dataloader = DataLoader(dataset=custom_dataset, ...)
# Train
for sample_expected, sample_input in next(iter(dataloader)):
loss_sample_avg = 0.0
patches_count = len(sample_expected)
# optimizer.zero_grad() <---- HERE(1)
for patch_expected, patch_input in zip(sample_expected, sample_input):
# optimizer.zero_grad() <---- HERE(2)
# ...
patch_predicted = model(patch_input)
loss = citerion(patch_predicted, patch_input)
loss_sample_avg += loss.item()
loss.backward()
# optimizer.step() <---- HERE(2)?
# optimizer.step() <---- HERE(1)
# Validate
...
This is probably me not getting something very basic. I know that
Optimizer step() should always be followed by a zero_grad()
A backward() call is possible only after another forward() pass has been processed
Since the paper does not provide any code (tried reaching out to the authors but as expected nothing came out) I am trying to figure out how to implement it myself.
You should pass all patches as a batch. performing a gradient step on a sole patch consitutes pure stochastic gradient descent, which is generally not preferred because it yeilds a very noisy estimate of the desired gradient. Furthermore looping over each patch from the image is computationally inefficient.
At a minimum, batch all patches from one image together. Ideally though, you would pass multiple patches from multiple images as a batch. So patch_input and patch_expected should each be [batch_size x color_channels x patch_height x path_width] in size. batch_size should itself be the number of sampled patches per image times the number of images in a batch.
Likely, this will require very little modification of your code other than adding a few lines to collate the inputs and targets in their existing for into a single tensor each. I'd provide specifics but it isn't evident what form your targets and inputs take currently.
I am trying to train a semantic-segmentation network (E-Net) in particular for high-quality human segmentation. For that, I have collected the "Supervisely Person" data-set and extracted the annotation masks using the provided API. This data-set holds high quality masks, thus I think it will provide better results in comparison to e.g. COCO data-set.
Supervisely - Example below : original image - ground truth.
First I want to give some details of the model. The network itself (Enet_arch) returns logits from the last convolution layer and probabilities which are produced through tf.nn.sigmoid(logits,name='logits_to_softmax').
I am using sigmoid cross-entropy on the ground truth and the returned logits, momentum and exponential decay on the learning rate. The model instance and the training pipeline is as follows.
self.global_step = tf.Variable(0, name='global_step', trainable=False)
self.momentum = tf.Variable(0.9, trainable=False)
# introducing weight decay
#with slim.arg_scope(ENet_arg_scope(weight_decay=2e-4)):
self.logits, self.probabilities = Enet_arch(inputs=self.input_data, num_classes=self.num_classes, batch_size=self.batch_size) # returns logits (2d), probabilities (2d)
#self.gt is int32 with values 0 or 1 (coming from read_tfrecords.Read_TFRecords annotation images + placeholder defined to int)
self.gt = self.input_masks
# self.probabilities is output of sigmoid, pixel-wise between probablities [0, 1].
# self.predictions is filtered probabilities > 0.5 = 1 else 0
self.predictions = tf.to_int32(self.probabilities > 0.5)
# capture segmentation accuracy
self.accuracy, self.accuracy_update = tf.metrics.accuracy(labels=self.gt, predictions=self.predictions)
# losses and updates
# calculate cross entropy loss on logits
loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=self.gt, logits=self.logits)
# add the loss to total loss and average (?)
self.total_loss = tf.losses.get_total_loss()
# decay_steps = depend on the number of epochs
self.learning_rate = tf.train.exponential_decay(self.starter_learning_rate, global_step=self.global_step, decay_steps=123893, decay_rate=0.96, staircase=True)
#Now we can define the optimizer
#optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, epsilon=1e-8)
optimizer = tf.train.MomentumOptimizer(self.learning_rate, self.momentum)
#Create the train_op.
self.train_op = optimizer.minimize(loss, global_step=self.global_step)
I first tried to over-fit the model on a single image to identify the depth of details that this network can capture. To increase the output quality I resized all the images to 1080p before feeding them to the network. On this trial I trained the network for 10K iterations and the total error reached ~30% (captured from tf.losses.get_total_loss() ).
The results while training on a single image are pretty good as you can see below.
Supervisely - Example below : (1) Loss (2) input (before resizing) | ground truth (before resizing) | 1080p out
Later, I tried to train on the whole data-set but the training loss produce lot of oscillations. That means that in some images the network perform well and in some other not. As a results after 743360 iterations (which is 160 epochs, since the training set holds 4646 images) I stopped training since obviously there is something wrong with the hyper-parameters selection that I made.
Supervisely - Example below : (1) Loss (2) learning rate (3) input (before resizing) | ground truth (before resizing) | 1080p out
On the other hand on some instances of the training set images the network produce fair (not very good though) results like below.
Supervisely - Example below : input (before resizing) | ground truth (before resizing) | 1080p out
Why do I have those differences on these training instances? Are there any obvious changes that I should do on the model or on the hyper-parameters? Is it possible that this model is just not suitable for this use-case (e.g. low network capacity) ?
Thanks in advance.
It turns out that the problem here is indeed E-net architecture. I changed the architecture with DeepLabV3 and saw a big difference in loss behaviour and performance.. even in small resolution!
I have a bunch of images that look like this of someone playing a videogame (a simple game I created in Tkinter):
The idea of the game is that the user controls the box at the bottom of the screen in order to dodge the falling balls (they can only dodge left and right).
My goal is to have the neural network output the position of the player on the bottom of the screen. If they're totally on the left, the neural network should output a 0, if they're in the middle, a .5, and all the way right, a 1, and all the values in-between.
My images are 300x400 pixels. I stored my data very simply. I recorded each of the images and position of the player as a tuple for each frame in a 50-frame game. Thus my result was a list in the form [(image, player position), ...] with 50 elements. I then pickled that list.
So in my code I try to create an extremely basic feed-forward network that takes in the image and outputs a value between 0 and 1 representing where the box on the bottom of the image is. But my neural network is only outputting 1s.
What should I change in order to get it to train and output values close to what I want?
Of course, here is my code:
# machine learning code mostly from https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pickle
def pil_image_to_np_array(image):
'''Takes an image and converts it to a numpy array'''
# from https://stackoverflow.com/a/45208895
# all my images are black and white, so I only need one channel
return np.array(image)[:, :, 0:1]
def data_to_training_set(data):
# split the list in the form [(frame 1 image, frame 1 player position), ...] into [[all images], [all player positions]]
inputs, outputs = [list(val) for val in zip(*data)]
for index, image in enumerate(inputs):
# convert the PIL images into numpy arrays so Keras can process them
inputs[index] = pil_image_to_np_array(image)
return (inputs, outputs)
if __name__ == "__main__":
# fix random seed for reproducibility
np.random.seed(7)
# load data
# data will be in the form [(frame 1 image, frame 1 player position), (frame 2 image, frame 2 player position), ...]
with open("position_data1.pkl", "rb") as pickled_data:
data = pickle.load(pickled_data)
X, Y = data_to_training_set(data)
# get the width of the images
width = X[0].shape[1] # == 400
# convert the player position (a value between 0 and the width of the image) to values between 0 and 1
for index, output in enumerate(Y):
Y[index] = output / width
# flatten the image inputs so they can be passed to a neural network
for index, inpt in enumerate(X):
X[index] = np.ndarray.flatten(inpt)
# keras expects an array (not a list) of image-arrays for input to the neural network
X = np.array(X)
Y = np.array(Y)
# create model
model = Sequential()
# my images are 300 x 400 pixels, so each input will be a flattened array of 120000 gray-scale pixel values
# keep it super simple by not having any deep learning
model.add(Dense(1, input_dim=120000, activation='sigmoid'))
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam')
# Fit the model
model.fit(X, Y, epochs=15, batch_size=10)
# see what the model is doing
predictions = model.predict(X, batch_size=10)
print(predictions) # this prints all 1s! # TODO fix
EDIT: print(Y) gives me:
so it's definitely not all zeroes.
Of course, a deeper model might give you a better accuracy, but considering the fact that your images are simple, a pretty simple (shallow) model with only one hidden layer should give a medium to high accuracy. So here are the modifications you need to make this happen:
Make sure X and Y are of type float32 (currently, X is of type uint8):
X = np.array(X, dtype=np.float32)
Y = np.array(Y, dtype=np.float32)
When training a neural network it would be much better to normalize the training data. Normalization helps the optimization process to go smoothly and speed up the convergence to a solution. It further prevent large values to cause large gradient updates which would be desruptive. Usually, the values of each feature in the input data should fall in a small range, where two common ranges are [-1,1] and [0,1]. Therefore, to make sure that all values fall in the range [-1,1], we subtract from each feature its mean and divide it by its standard deviation:
X_mean = X.mean(axis=0)
X -= X_mean
X_std = X.std(axis=0)
X /= X_std + 1e-8 # add a very small constant to prevent division by zero
Note that we are normalizing each feature (i.e. each pixel in this case) here not each image. When you want to predict on new data, i.e. in inference or testing mode, you need to subtract X_mean from test data and divide it by X_std (you should NEVER EVER subtract from test data its own mean or divide it by its own standard deviation; rather, use the mean and std of training data):
X_test -= X_mean
X_test /= X_std + 1e-8
If you apply the changes in points one and two, you might notice that the network no longer predicts only ones or only zeros. Rather, it shows some faint signs of learning and predicts a mix of zeros and ones. This is not bad but it is far from good and we have high expectations! The predictions should be much better than a mix of only zeros and ones. There, you should take into account the (forgotten!) learning rate. Since the network has relatively large number of parameters considering a relatively simple problem (and there are a few samples of training data), you should choose a smaller learning rate to smooth the gradient updates and the learning process:
from keras import optimizers
model.compile(loss='mean_squared_error', optimizer=optimizers.Adam(lr=0.0001))
You would notice the difference: the loss value reaches to around 0.01 after 10 epochs. And the network no longer predicts a mix of zeros and ones; rather the predictions are much more accurate and close to what they should be (i.e. Y).
Don't forget! We have high (logical!) expectations. So, how can we do better without adding any new layers to the network (obviously, we assume that adding more layers might help!!)?
4.1. Gather more training data.
4.2. Add weight regularization. Common ones are L1 and L2 regularization (I highly recommend the Jupyter notebooks of the the book Deep Learning with Python written by François Chollet the creator of Keras. Specifically, here is the one which discusses regularization.)
You should always evaluate your model in a proper and unbiased way. Evaluating it on the training data (that you have used to train it) does not tell you anything about how well your model would perform on unseen (i.e. new or real world) data points (e.g. consider a model which stores or memorize all the training data. It would perform perfectly on the training data, but it would be a useless model and perform poorly on new data). So we should have test and train datasets: we train model on the training data and evaluate the model on the test (i.e. new) data. However, during the process of coming up with a good model you are performing lots of experiments: for example, you first change the type and number of layers, train the model and then evaluate it on test data to make sure it is good. Then you change another thing say the learning rate, train it again and then evaluate it again on test data... To make it short, these cycles of tuning and evaluations somehow causes an over-fitting on the test data. Therefore, we would need a third dataset called validation data (read more: What is the difference between test set and validation set?):
# first shuffle the data to make sure it isn't in any particular order
indices = np.arange(X.shape[0])
np.random.shuffle(indices)
X = X[indices]
Y = Y[indices]
# you have 200 images
# we select 100 images for training,
# 50 images for validation and 50 images for test data
X_train = X[:100]
X_val = X[100:150]
X_test = X[150:]
Y_train = Y[:100]
Y_val = Y[100:150]
Y_test = Y[150:]
# train and tune the model
# you can attempt train and tune the model multiple times,
# each time with different architecture, hyper-parameters, etc.
model.fit(X_train, Y_train, epochs=15, batch_size=10, validation_data=(X_val, Y_val))
# only and only after completing the tuning of your model
# you should evaluate it on the test data for just one time
model.evaluate(X_test, Y_test)
# after you are satisfied with the model performance
# and want to deploy your model for production use (i.e. real world)
# you can train your model once more on the whole data available
# with the best configurations you have found out in your tunings
model.fit(X, Y, epochs=15, batch_size=10)
(Actually, when we have few training data available it would be wasteful to separate validation and test data from whole available data. In this case, and if the model is not computationally expensive, instead of separating a validation set which is called cross-validation, one can do K-fold cross-validation or iterated K-fold cross-validation in case of having very few data samples.)
It is around 4 AM at the time of writing this answer and I am feeling sleepy, but I would like to mention one more thing which is not directly related to your question: by using the Numpy library and its functionalities and methods you can write more concise and efficient code and also save yourself a lot time. So make sure you practice using it more as it is heavily used in machine learning community and libraries. To demonstrate this, here is the same code you have written but with more use of Numpy (Note that I have not applied all the changes I mentioned above in this code):
# machine learning code mostly from https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pickle
def pil_image_to_np_array(image):
'''Takes an image and converts it to a numpy array'''
# from https://stackoverflow.com/a/45208895
# all my images are black and white, so I only need one channel
return np.array(image)[:, :, 0]
def data_to_training_set(data):
# split the list in the form [(frame 1 image, frame 1 player position), ...] into [[all images], [all player positions]]
inputs, outputs = zip(*data)
inputs = [pil_image_to_np_array(image) for image in inputs]
inputs = np.array(inputs, dtype=np.float32)
outputs = np.array(outputs, dtype=np.float32)
return (inputs, outputs)
if __name__ == "__main__":
# fix random seed for reproducibility
np.random.seed(7)
# load data
# data will be in the form [(frame 1 image, frame 1 player position), (frame 2 image, frame 2 player position), ...]
with open("position_data1.pkl", "rb") as pickled_data:
data = pickle.load(pickled_data)
X, Y = data_to_training_set(data)
# get the width of the images
width = X.shape[2] # == 400
# convert the player position (a value between 0 and the width of the image) to values between 0 and 1
Y /= width
# flatten the image inputs so they can be passed to a neural network
X = np.reshape(X, (X.shape[0], -1))
# create model
model = Sequential()
# my images are 300 x 400 pixels, so each input will be a flattened array of 120000 gray-scale pixel values
# keep it super simple by not having any deep learning
model.add(Dense(1, input_dim=120000, activation='sigmoid'))
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam')
# Fit the model
model.fit(X, Y, epochs=15, batch_size=10)
# see what the model is doing
predictions = model.predict(X, batch_size=10)
print(predictions) # this prints all 1s! # TODO fix
Sorry, this is a long one!
I am 80% sure that the problem is that I don't fully understand how tensorflow uses the tf.train.batch function to queue data.
I am trying to adapt one of the tensorflow tutorials to classify a large number of images.
Tutorial can be found here: https://www.tensorflow.org/tutorials/deep_cnn
I have built some modules which can encode my raw data in the same format that cifar10 uses. I am using this to construct training and evaluation data which the program is able to evaluate to a high degree of accuracy. Accuracy varies depending on the quality of the imagesets I put in. To keep things simple I have trained it using 32x32 monochrome tiles of either yellow or blue (category 0 and 1 respectively). Conveniently the network is able to identify whether it is being given a yellow or blue tile with 100% accuracy.
I have also been able to adapt cifar10_eval.py to output predictions rather than an accuracy percentage. This allows me to feed in un-classified data and output predictions as a list. To do this I have exchanged the statement:
top_k_op = tf.nn.in_top_k(logits, labels, 1)
for:
output_2 = tf.argmax(logits, 1)
I have added a variable and a boolean to the eval_once function call to allow it to access the definition for "output_2" and to let me switch between this and "top_k_op" depending on whether I am in evaluation mode or if I am predicting new data.
So far so good. This method works for small amounts of input data but fails as soon as I want to output more than 128 classifications. Not coincidentally 128 is the batch size.
In theory the first item (3073 bytes) in the binary should correspond to the first item in the list which is churned out when I am predicting new data. This happens for inputs of up to 128 images but the data gets jumbled up when I try to categorise more images. Actually, some of the predictions are lost completely!
There are a couple of reasons that this happens. The tutorial isn't designed to care about the order in which data is read or processed, just that individual images correspond with their labels. Originally the data loss was randomised(!) but I have managed to remove the random element by removing multi-threading (threads = 1 rather than 16) and stopped it from shuffling filenames.
filename_queue = tf.train.string_input_producer(filenames, shuffle=False)
string_input_producer has a hidden/optional argument which shuffles the file names. For model evaluation I have set this to false as above.
However.... I am still stuck with jumbled data loss when evaluating data larger than a single batch.
Does anyone know why this happens and have any ideas about how it could be fixed?
In theory I could redesign the code to rebuild the graph and evaluate it for 128 images at a time. However, I want to classify millions of images and feel that I'd be asking for trouble trying to open a new graph instance per batch.
PS, I've done my homework:
I have verified that my initial data to binary conversion works by running a program which can read cifar10-style files and interpret it as a big tile of images. I have run this code on both the original cifar10 binaries and my own binaries and am able to reconstruct both perfectly.
When I encode uncategorised data I add a category label of zero to make sure the tutorial can read the file. However, I make sure that this label is chucked away at the file reading stage and thus is not used when generating a list of predictions.
I have verified the output predictions by printing the list directly onto the screen as a python output and also by using it to assemble a PNG image which can be compared with the original inputs. This verification works perfectly for small batch sizes and starts to fall apart in larger batch sizes.
I've also made some modifications to the tutorial not discussed in this post. These are simple modifications such as changing the number of categories to 2 rather than 10. Am confident that this is not the issue.
PPS, here is a copy of some functions from the modified script. I haven't pasted everything because this question is already huge:
from cifar10_eval:
def eval_once(saver, summary_writer, top_k_op, output_2, summary_op, mapping=False):
"""Run Eval once.
Args:
saver: Saver.
summary_writer: Summary writer.
top_k_op: Top K op.
summary_op: Summary op.
"""
with tf.Session() as sess:
ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
if ckpt and ckpt.model_checkpoint_path:
# Restores from checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
# Assuming model_checkpoint_path looks something like:
# /my-favorite-path/cifar10_train/model.ckpt-0,
# extract global_step from it.
global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
else:
print('No checkpoint file found')
return
# Start the queue runners.
coord = tf.train.Coordinator()
try:
threads = []
for qr in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
threads.extend(qr.create_threads(sess, coord=coord, daemon=True,
start=True))
num_iter = int(math.ceil(FLAGS.num_examples / FLAGS.batch_size))
true_count = 0 # Counts the number of correct predictions.
total_sample_count = num_iter * FLAGS.batch_size
step = 0
output=[]
if mapping: # if in mapping mode generate a map, if in default mode (variable set to False by default) then tally predictions instead.
while step < num_iter and not coord.should_stop():
step += 1
hold = sess.run(output_2)
print(hold)
for i in range (len(hold)):
output.append(hold[i])
return(output)
from cifar10_input:
def inputs(mapping, data_dir, batch_size):
"""Construct input for CIFAR evaluation using the Reader ops.
Args:
mapping: bool, indicating if one should use the raw or pre-classified eval data set.
data_dir: Path to the CIFAR-10 data directory.
batch_size: Number of images per batch.
Returns:
images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
"""
filelist = os.listdir(data_dir)
filenames = []
if mapping:
# from Raw_Image_Processor import file_name
for f in filelist:
if f.startswith("raw_batch"):
filenames.append(os.path.join(data_dir, f))
num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
else:
for f in filelist:
if f.startswith("eval_batch"):
filenames.append(os.path.join(data_dir, f))
num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL
for f in filenames:
if not tf.gfile.Exists(f):
raise ValueError('Failed to find file: ' + f)
# Create a queue that produces the filenames to read.
filename_queue = tf.train.string_input_producer(filenames, shuffle=False)
# Read examples from files in the filename queue.
read_input = read_cifar10(filename_queue)
reshaped_image = tf.cast(read_input.uint8image, tf.float32)
height = IMAGE_SIZE
width = IMAGE_SIZE
# Image processing for evaluation.
# Crop the central [height, width] of the image.
resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
height, width)
# Subtract off the mean and divide by the variance of the pixels.
float_image = tf.image.per_image_standardization(resized_image)
# Set the shapes of tensors.
float_image.set_shape([height, width, 3])
read_input.label.set_shape([1])
# Ensure that the random shuffling has good mixing properties.
min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(num_examples_per_epoch *
min_fraction_of_examples_in_queue)
# Generate a batch of images and labels by building up a queue of examples.
return _generate_image_and_label_batch(float_image, read_input.label,
min_queue_examples, batch_size,
shuffle=False)
from cifar10_input:
def _generate_image_and_label_batch(image, label, min_queue_examples,
batch_size, shuffle):
"""Construct a queued batch of images and labels.
Args:
image: 3-D Tensor of [height, width, 3] of type.float32.
label: 1-D Tensor of type.int32
min_queue_examples: int32, minimum number of samples to retain
in the queue that provides of batches of examples.
batch_size: Number of images per batch.
shuffle: boolean indicating whether to use a shuffling queue.
Returns:
images: Images. 4D tensor of [batch_size, height, width, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
"""
# Create a queue that shuffles the examples, and then
# read 'batch_size' images + labels from the example queue.
num_preprocess_threads = 16
if shuffle:
images, label_batch = tf.train.shuffle_batch(
[image, label],
batch_size=batch_size,
num_threads=num_preprocess_threads,
capacity=min_queue_examples + 3 * batch_size,
min_after_dequeue=min_queue_examples)
else:
images, label_batch = tf.train.batch(
[image, label],
batch_size=batch_size,
num_threads=1,
capacity=1,
enqueue_many = False)
# Display the training images in the visualizer.
tf.summary.image('images', images)
return images, tf.reshape(label_batch, [batch_size])
Edit:
Partial solution is given in the comments below. Information loss is dependent on batch size so it turns out that increasing the batch size (in mapping mode only) is an effective fix.
However, I'm still unsure why it loses and/or scrambles information when the batch size is exceeded. Presumably the batches are taken is some non-sequential order. I don't need it to take the project forwards but if someone could explain how or why this happens it would be greatly appreciated.
Edit 2:
It's back! I've set the batch size to be equivalent to one binary file (in my case roughly 10,000 images). Data is not lost or jumbled within this batch but when I try to process multiple files (about 30) it mixes up the batches a little rather than outputting them on a FIFO basis.
A picture is probably the easiest way for you to see what is going on:
classification map
This is a reconstructed image of a rock face from which the classifier has been trained to recognize three categories. As you can see the reconstruction is mostly smooth. However, there are two clean breaks near the top of the image where a batch (or 3) has been outputted in non-chronological order. These should have appeared at the bottom of the image rather than near the top.