How to label training data for CNN? - python

I'm trying to design an neural network that predicts a photo image based on 5 distinct film stock. I have 5000 total images. 4000 training and 1000 testing. I've stored images into two sub-folders for training and testing data.
training_dir = r'C:\...\Training Set'
test_dir = r'C:\...\Test Set'
I'm able to collect the training images using skimage io.ImageCollection.
folders = []
for image_path in os.scandir(training_dir):
img = io.ImageCollection(os.path.join(training_dir, image_path, '*.jpg'))
folders.append(img)
I then collect the training images based on class and apply a loop to save image data into a list.
ektachrome = folders[0]
HP5 = folders[1]
LomoP = folders[2]
Trix = folders[3]
velvia = folders[4]
images = []
for i in range(0, 800):
ekta = ektachrome[i]
images.append(ekta)
hp5 = HP5[i]
images.append(hp5)
lomo = LomoP[i]
images.append(lomo)
trix = Trix[i]
images.append(trix)
Velvia = velvia[i]
images.append(Velvia)
When I put the list of training images into an array np.asarray(images).shape, I get a shape of (4000,). I'm having trouble labeling the data. Here are my labels.
label = {'Ektachrome':1, 'HP5':2, 'Lomochrome Purple':3, 'Tri-X':4, 'Velvia 50':5}
How do I label my images?

As per my understanding, your Images should be Categorized based on the Class Names, as shown below (since you've mentioned Folders[0], Folders[1], etc.):
In that case, you can use the below code for Labelling the Images.
Ektachrome_dir = train_dir / 'Ektachrome'
HP5_dir = train_dir / 'HP5'
Lomochrome_Purple_dir = train_dir / 'Lomochrome Purple'
Tri_X_dir = train_dir / 'Tri-X'
Velvia_50_dir = train_dir / 'Velvia 50'
# Get the list of all the images
Ektachrome_Images = Ektachrome_dir.glob('*.jpeg')
HP5_Images = HP5_dir.glob('*.jpeg')
Lomochrome_Purple_Images = Lomochrome_Purple_dir.glob('*.jpeg')
Tri_X_Images = Tri_X_dir.glob('*.jpeg')
Velvia_50_Images = Velvia_50_dir.glob('*.jpeg')
# An empty list. We will insert the data into this list in (img_path, label) format
train_data = []
for img in Ektachrome_Images:
train_data.append((img,1))
for img in HP5_Images:
train_data.append((img, 2))
for img in Lomochrome_Purple_Images:
train_data.append((img, 3))
for img in Tri_X_Images:
train_data.append((img, 4))
for img in Velvia_50_Images:
train_data.append((img, 5))

Related

Pytorch DataLoader doesn't return batched data

My dataset is composed of image patches obtained from the original image (face patches and random outside of face patches). Patches are stored in a folder with a name of an original image from which patches originate. I created my own DataSet and DataLoader but when I iterate over the dataset data is not returned in batches. A batch of size 1 should include an array of tuples of patches and a label, so with the increased batch size, we should get an array of arrays of tuples with labels. But DataLoader returns only one array of tuples no matter the batch size.
My dataset:
import os
import cv2 as cv
import PIL.Image as Image
import torchvision.transforms as Transforms
from torch.utils.data import dataset
class PatchDataset(dataset.Dataset):
def __init__(self, img_folder, n_patches):
self.img_folder = img_folder
self.n_patches = n_patches
self.img_names = sorted(os.listdir(img_folder))
self.transform = Transforms.Compose([
Transforms.Resize((50, 50)),
Transforms.ToTensor()
])
def __len__(self):
return len(self.img_names)
def __getitem__(self, idx):
img_name = self.img_names[idx]
patch_dir = os.path.join(self.img_folder, img_name)
patches = []
for i in range(self.n_patches):
face_patch = cv.imread(os.path.join(patch_dir, f'{str(i)}_face.png'))
face_patch = cv.cvtColor(face_patch, cv.COLOR_BGR2RGB)
face_patch = Image.fromarray(face_patch)
face_patch = self.transform(face_patch)
patch = cv.imread(os.path.join(patch_dir, f'{str(i)}_patch.png'))
patch = cv.cvtColor(patch, cv.COLOR_BGR2RGB)
patch = Image.fromarray(patch)
patch = self.transform(patch)
patches.append((face_patch, patch))
return patches, int(img_name.split('-')[0])
Then I use it as such:
X = PatchDataset(PATCHES_DIR, 9)
train_dl = dataloader.DataLoader(
X,
batch_size=10,
drop_last=True
)
for batch_X, batch_Y in train_dl:
print(len(batch_X))
print(len(batch_Y))
In this provided case the batch size is 10, so printing of the batch_Y returns the correct number (10). But the printing of the batch_X returns 9 which is number of patch pairs - returns only one sample from dataset instead of batch of 10 samples where each of them is length of 9.
You should return a one dimension higher tensor instead of a list of tensors in __get_item__ function call. You can use torch.stack(patches).
def __getitem__(self, idx):
img_name = self.img_names[idx]
patch_dir = os.path.join(self.img_folder, img_name)
patches = []
for i in range(self.n_patches):
face_patch = cv.imread(os.path.join(patch_dir, f'{str(i)}_face.png'))
face_patch = cv.cvtColor(face_patch, cv.COLOR_BGR2RGB)
face_patch = Image.fromarray(face_patch)
face_patch = self.transform(face_patch)
patch = cv.imread(os.path.join(patch_dir, f'{str(i)}_patch.png'))
patch = cv.cvtColor(patch, cv.COLOR_BGR2RGB)
patch = Image.fromarray(patch)
patch = self.transform(patch)
patches.append((face_patch, patch))
return torch.stack(patches), int(img_name.split('-')[0])

Why converting npy files (containing video frames) to tfrecords consumes too much disk space?

I am working on a violence detection service. I am trying to develop software based on the code in this repo. My dataset consists of videos resided in two directories "Violence" and "Non-Violence".
I used this code to generate npy files out of RGB channels and optical flow features. The output of this part would be 2 folders containing npy array with 244x244x5 shape. (np.float32 dtype). so it's like I have video frames in RGB in the first 3 channels (npy[...,:3]) and optical flow features in the next two channels (npy[..., 3:]).
Now I am trying to convert them to tfrecords and use tf.data.tfrecorddataset to speed up the training process. Since my model input has to be a cube tensor, my training elements has to be 64 frames of each video. It means the data point shape has to be 64x244x244x5.
So I used this code to convert the npy files to tfrecords.
from pathlib import Path
from os.path import join
import tensorflow as tf
import numpy as np
import cv2
from tqdm import tqdm
def normalize(data):
mean = np.mean(data)
std = np.std(data)
return (data - mean) / std
def random_flip(video, prob):
s = np.random.rand()
if s < prob:
video = np.flip(m=video, axis=2)
return video
def color_jitter(video):
# range of s-component: 0-1
# range of v component: 0-255
s_jitter = np.random.uniform(-0.2, 0.2)
v_jitter = np.random.uniform(-30, 30)
for i in range(len(video)):
hsv = cv2.cvtColor(video[i], cv2.COLOR_RGB2HSV)
s = hsv[..., 1] + s_jitter
v = hsv[..., 2] + v_jitter
s[s < 0] = 0
s[s > 1] = 1
v[v < 0] = 0
v[v > 255] = 255
hsv[..., 1] = s
hsv[..., 2] = v
video[i] = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
return video
def uniform_sample(video: str, target_frames: int = 64) -> np.ndarray:
"""
gets video and outputs n_frames number of frames in video.
Args:
video:
target_frames:
Returns:
"""
len_frames = int(len(data))
interval = int(np.ceil(len_frames / target_frames))
# init empty list for sampled video and
sampled_video = []
for i in range(0, len_frames, interval):
sampled_video.append(video[i])
# calculate number of padded frames and fix it
num_pad = target_frames - len(sampled_video)
if num_pad > 0:
padding = [video[i] for i in range(-num_pad, 0)]
sampled_video += padding
return np.array(sampled_video, dtype=np.float32)
def _int64_feature(value):
return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _bytes_feature(value):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
if __name__ == '__main__':
path = Path('transformed/')
npy_files = list(path.rglob('*.npy'))[:100]
aug = True
# one_hots = to_categorical(range(2), dtype=np.int8)
path_to_save = 'data_tfrecords'
tfrecord_path = join(path_to_save, 'all_data.tfrecord')
with tf.io.TFRecordWriter(tfrecord_path) as writer:
for file in tqdm(npy_files, desc='files converted'):
# load npy files
npy = np.load(file.as_posix(), mmap_mode='r')
data = np.float32(npy)
del npy
# Uniform sampling
data = uniform_sample(data, target_frames=64)
# Add augmentation
if aug:
data[..., :3] = color_jitter(data[..., :3])
data = random_flip(data, prob=0.5)
# Normalization
data[..., :3] = normalize(data[..., :3])
data[..., 3:] = normalize(data[..., 3:])
# Label one hot encoding
label = 1 if file.parent.stem.startswith('F') else 0
# label = one_hots[label]
feature = {'image': _bytes_feature(tf.compat.as_bytes(data.tobytes())),
'label': _int64_feature(int(label))}
example = tf.train.Example(features=tf.train.Features(feature=feature))
writer.write(example.SerializeToString())
The code works fine, but the real problem is that it consumes too much disk drive. my whole dataset consisting of 2000 videos takes 12 GB, when I converted them to npy files, it became around 80 GB, and now using tfrecords It became over 120 GB or so. How can I convert them in an efficient way to reduce the space required to store them?
The answer might be too late. But I see you are still saving the video frame in your tfrecords file.
Try removing the "image" feature from your features list. And saving per frame as their Height, Width, Channels, and so forth.
feature = {'label': _int64_feature(int(label))}
Which is why the file is taking more space.

How to load a large image dataset efficiently?

I am trying to work on an image colorizer using autoencoders. The 'input' is a grayscale image and the 'labels' are their corresponding color images. I am trying to get it to work on google colab on a subset of the dataset. The problem is that the session crashes when I try to convert the list of images to a numpy array.
Here's what I tried:
X = []
y = []
errors = 0
count = 0
image_path = 'drive/My Drive/datasets/imagenette'
for file in tqdm(os.listdir(image_path)):
try:
# Load, transform image
color_image = io.imread(os.path.join(image_path,file))
color_image = transform.resize(color_image,(224,224))
lab_image = color.rgb2lab(color_image)
L = lab_image[:,:,0]/100.
ab = lab_image[:,:,1:]/128.
# Append to list
gray_scale_image = np.stack((L,L,L), axis=2)
X.append(gray_scale_image)
y.append(ab)
count += 1
if count == 5000:
break
except:
errors += 1
print(f'Errors encountered: {errors}')
X = np.array(X)
y = np.array(y)
print(X.shape)
print(y.shape)
Is there a way to load only a few batches, feed them and while the model is being trained on them, load another set of batches?

Accessing filename from file queue in Tensor Flow

I have a directory of images, and a separate file matching image filenames to labels. So the directory of images has files like 'train/001.jpg' and the labeling file looks like:
train/001.jpg 1
train/002.jpg 2
...
I can easily load images from the image directory in Tensor Flow by creating a filequeue from the filenames:
filequeue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
img = reader.read(filequeue)
But I'm at a loss for how to couple these files with the labels from the labeling file. It seems I need access to the filenames inside the queue at each step. Is there a way to get them? Furthermore, once I have the filename, I need to be able to look up the label keyed by the filename. It seems like a standard Python dictionary wouldn't work because these computations need to happen at each step in the graph.
Given that your data is not too large for you to supply the list of filenames as a python array, I'd suggest just doing the preprocessing in Python. Create two lists (same order) of the filenames and the labels, and insert those into either a randomshufflequeue or a queue, and dequeue from that. If you want the "loops infinitely" behavior of the string_input_producer, you could re-run the 'enqueue' at the start of every epoch.
A very toy example:
import tensorflow as tf
f = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
l = ["l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8"]
fv = tf.constant(f)
lv = tf.constant(l)
rsq = tf.RandomShuffleQueue(10, 0, [tf.string, tf.string], shapes=[[],[]])
do_enqueues = rsq.enqueue_many([fv, lv])
gotf, gotl = rsq.dequeue()
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess=sess)
sess.run(do_enqueues)
for i in xrange(2):
one_f, one_l = sess.run([gotf, gotl])
print "F: ", one_f, "L: ", one_l
The key is that you're effectively enqueueing pairs of filenames/labels when you do the enqueue, and those pairs are returned by the dequeue.
Here's what I was able to do.
I first shuffled the filenames and matched the labels to them in Python:
np.random.shuffle(filenames)
labels = [label_dict[f] for f in filenames]
Then created a string_input_producer for the filenames with shuffle off, and a FIFO for labels:
lv = tf.constant(labels)
label_fifo = tf.FIFOQueue(len(filenames),tf.int32, shapes=[[]])
file_fifo = tf.train.string_input_producer(filenames, shuffle=False, capacity=len(filenames))
label_enqueue = label_fifo.enqueue_many([lv])
Then to read the image I could use a WholeFileReader and to get the label I could dequeue the fifo:
reader = tf.WholeFileReader()
image = tf.image.decode_jpeg(value, channels=3)
image.set_shape([128,128,3])
result.uint8image = image
result.label = label_fifo.dequeue()
And generate the batches as follows:
min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(num_examples_per_epoch *
min_fraction_of_examples_in_queue)
num_preprocess_threads = 16
images, label_batch = tf.train.shuffle_batch(
[result.uint8image, result.label],
batch_size=FLAGS.batch_size,
num_threads=num_preprocess_threads,
capacity=min_queue_examples + 3 * FLAGS.batch_size,
min_after_dequeue=min_queue_examples)
There is tf.py_func() you could utilize to implement a mapping from file path to label.
files = gfile.Glob(data_pattern)
filename_queue = tf.train.string_input_producer(
files, num_epochs=num_epochs, shuffle=True) # list of files to read
def extract_label(s):
# path to label logic for cat&dog dataset
return 0 if os.path.basename(str(s)).startswith('cat') else 1
def read(filename_queue):
key, value = reader.read(filename_queue)
image = tf.image.decode_jpeg(value, channels=3)
image = tf.cast(image, tf.float32)
image = tf.image.resize_image_with_crop_or_pad(image, width, height)
label = tf.cast(tf.py_func(extract_label, [key], tf.int64), tf.int32)
label = tf.reshape(label, [])
training_data = [read(filename_queue) for _ in range(num_readers)]
...
tf.train.shuffle_batch_join(training_data, ...)
I used this:
filename = filename.strip().decode('ascii')
Another suggestion is to save your data in TFRecord format. In this case you would be able to save all images and all labels in the same file. For a big number of files it gives a lot of advantages:
can store data and labels at the same place
data is allocated at one place (no need to remember various directories)
if there are many files (images), opening/closing a file is time consuming. Seeking the location of the file from ssd/hdd also takes time

SVM with openCV & Python

I'm tryint to build an application that classifies different objects. I have a training folder with a bunch of images i want to use as training for my SVM.
Up untill now I have followed this (GREAT) answer:
using OpenCV and SVM with images
here is a sample of my code:
def getTrainingData():
address = "..//data//training"
labels = []
trainingData = []
for items in os.listdir(address):
## extracts labels
name = address + "//" + items
for it in os.listdir(name):
path = name + "//" + it
print path
img = cv.imread(path, cv.CV_LOAD_IMAGE_GRAYSCALE)
d = np.array(img, dtype = np.float32)
q = d.flatten()
trainingData.append(q)
labels.append(items)
######DEBUG######
#cv.namedWindow(path,cv.WINDOW_NORMAL)
#cv.imshow(path,img)
return trainingData, labels
svm_params = dict( kernel_type = cv.SVM_LINEAR,
svm_type = cv.SVM_C_SVC,
C=2.67, gamma=3 )
training, labels = getTrainingData()
train = np.asarray(training)
svm = cv.SVM()
svm.train(train, labels, params=svm_params)
svm.save('svm_data.dat')
But when i try to run i recieve the following error:
svm.train(train, labels, params=svm_params)
TypeError: trainData data type = 17 is not supported
What am i doing wrong?
Thanks A lot!
You should resize your input images. like this:
img = cv2.resize(img, (64,64))
Size is up to you.

Categories