I have a directory of images, and a separate file matching image filenames to labels. So the directory of images has files like 'train/001.jpg' and the labeling file looks like:
train/001.jpg 1
train/002.jpg 2
...
I can easily load images from the image directory in TensorFlow by creating a filename queue from the filenames:
filequeue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
img = reader.read(filequeue)
But I'm at a loss for how to couple these files with the labels from the labeling file. It seems I need access to the filenames inside the queue at each step. Is there a way to get them? Furthermore, once I have the filename, I need to be able to look up the label keyed by the filename. It seems like a standard Python dictionary wouldn't work because these computations need to happen at each step in the graph.
Provided your dataset is small enough that you can supply the list of filenames as a Python list, I'd suggest just doing the preprocessing in Python. Create two lists, in the same order, of the filenames and the labels, insert them into either a RandomShuffleQueue or a regular queue, and dequeue from that. If you want the "loops infinitely" behavior of string_input_producer, you could re-run the enqueue op at the start of every epoch.
A very toy example:
import tensorflow as tf
f = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
l = ["l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8"]
fv = tf.constant(f)
lv = tf.constant(l)
# Queue of (filename, label) pairs: capacity 10, min_after_dequeue 0
rsq = tf.RandomShuffleQueue(10, 0, [tf.string, tf.string], shapes=[[],[]])
do_enqueues = rsq.enqueue_many([fv, lv])
gotf, gotl = rsq.dequeue()
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    tf.train.start_queue_runners(sess=sess)
    sess.run(do_enqueues)
    for i in range(2):
        one_f, one_l = sess.run([gotf, gotl])
        print("F: ", one_f, "L: ", one_l)
The key is that you're effectively enqueueing pairs of filenames/labels when you do the enqueue, and those pairs are returned by the dequeue.
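Building the two aligned lists from the labeling file could look like the following minimal sketch. It assumes each line holds a path and an integer label separated by whitespace, as in the question; the helper names `load_pairs` and `shuffle_together` are illustrative and not part of TensorFlow.

```python
import random


def load_pairs(lines):
    """Parse 'path label' lines into two lists kept in the same order."""
    filenames, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, label = line.rsplit(None, 1)  # split on the last run of whitespace
        filenames.append(name)
        labels.append(int(label))
    return filenames, labels


def shuffle_together(filenames, labels, seed=None):
    """Shuffle both lists with the same permutation so pairs stay matched."""
    pairs = list(zip(filenames, labels))
    random.Random(seed).shuffle(pairs)
    return [p[0] for p in pairs], [p[1] for p in pairs]


names, labels = load_pairs(["train/001.jpg 1", "train/002.jpg 2", ""])
shuffled_names, shuffled_labels = shuffle_together(names, labels, seed=0)
```

The resulting lists can then be passed to `tf.constant` and enqueued together, as in the toy example above.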
Here's what I was able to do.
I first shuffled the filenames and matched the labels to them in Python:
np.random.shuffle(filenames)
labels = [label_dict[f] for f in filenames]
Then created a string_input_producer for the filenames with shuffle off, and a FIFO for labels:
lv = tf.constant(labels)
label_fifo = tf.FIFOQueue(len(filenames),tf.int32, shapes=[[]])
file_fifo = tf.train.string_input_producer(filenames, shuffle=False, capacity=len(filenames))
label_enqueue = label_fifo.enqueue_many([lv])
Then to read the image I could use a WholeFileReader and to get the label I could dequeue the fifo:
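reader = tf.WholeFileReader()
key, value = reader.read(file_fifo)
image = tf.image.decode_jpeg(value, channels=3)
image.set_shape([128,128,3])
# `result` is assumed to be a plain record object, as in the CIFAR-10 tutorial
result.uint8image = image
result.label = label_fifo.dequeue()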
And generate the batches as follows:
min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(num_examples_per_epoch *
                         min_fraction_of_examples_in_queue)
num_preprocess_threads = 16
images, label_batch = tf.train.shuffle_batch(
    [result.uint8image, result.label],
    batch_size=FLAGS.batch_size,
    num_threads=num_preprocess_threads,
    capacity=min_queue_examples + 3 * FLAGS.batch_size,
    min_after_dequeue=min_queue_examples)
You could use tf.py_func() to implement a mapping from file path to label.
files = gfile.Glob(data_pattern)
filename_queue = tf.train.string_input_producer(
    files, num_epochs=num_epochs, shuffle=True)  # list of files to read
reader = tf.WholeFileReader()
def extract_label(s):
    # path to label logic for cat&dog dataset
    return 0 if os.path.basename(str(s)).startswith('cat') else 1
def read(filename_queue):
    key, value = reader.read(filename_queue)  # key is the file path
    image = tf.image.decode_jpeg(value, channels=3)
    image = tf.cast(image, tf.float32)
    # arguments are (image, target_height, target_width)
    image = tf.image.resize_image_with_crop_or_pad(image, height, width)
    label = tf.cast(tf.py_func(extract_label, [key], tf.int64), tf.int32)
    label = tf.reshape(label, [])
    return image, label
training_data = [read(filename_queue) for _ in range(num_readers)]
...
tf.train.shuffle_batch_join(training_data, ...)
I used this:
filename = filename.strip().decode('ascii')
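The decode step matters in Python 3 if the path reaches the Python function as a bytes object (an assumption here about how the key is passed): calling `str()` on bytes yields its repr, such as `b'cat.1.jpg'`, so a prefix test can silently fail. A minimal sketch of a label function that handles both cases, using the cat/dog naming rule from the answer above:

```python
import os


def extract_label(key):
    """Return 0 for files named cat*, 1 otherwise; accepts bytes or str paths."""
    if isinstance(key, bytes):
        key = key.decode("utf-8")  # turn b'cat.1.jpg' into 'cat.1.jpg'
    name = os.path.basename(key.strip())
    return 0 if name.startswith("cat") else 1


# str() on bytes keeps the b'...' wrapper, which breaks the prefix check.
wrapped = str(b"cat.1.jpg")
```

This function has no TensorFlow dependency, so it can be checked on a few sample paths before it is wrapped in tf.py_func.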
Another suggestion is to save your data in TFRecord format. In this case you would be able to save all images and all labels in the same file. For a large number of files this has several advantages:
data and labels are stored in the same place
data is located in one place (no need to remember various directories)
if there are many files (images), opening and closing each file is time consuming, and seeking to each file's location on the SSD/HDD also takes time
I am writing a multi-class image classification and distance estimation network. The dataset I am using consists of images and their corresponding annotation file (containing class id, bbox, distance) as shown below.
1,74,127,92,139,47
I have written a custom dataloader for this task, as shown below.
class CreateDataLoader(Dataset):
    #constructor
    def __init__(self, root_dir, transforms=None):
        self.root = root_dir
        self.csvs = list(sorted(os.listdir(os.path.join(root_dir, "csv")))) #Get list of all csv files
        self.images = list(sorted(os.listdir(os.path.join(root_dir, "images"))))#Get list of all images
        self.transforms = transforms
    def __getitem__(self, index):
        # acquire image, label, its bounding box coordinates and the distance to object
        imagePath = os.path.join(self.root, "images", self.images[index])
        filename, ext = os.path.splitext(os.path.basename(imagePath))
        csvFilename = filename.replace('camera', 'CSV')
        csvFile = os.path.join(self.root, "csv", (csvFilename + ".csv"))
        image = Image.open(imagePath).convert("RGB")
        bboxes = []
        objectLabels = []
        distances = []
        with open(csvFile, 'r') as read_obj:
            csv_reader = csv.reader(read_obj)
            for row in csv_reader:
                objectLabel = row[0]
                Xmin = np.array(row[1])
                Ymin = np.array(row[2])
                Xmax = np.array(row[3])
                Ymax = np.array(row[4])
                distance = np.array(row[5])
                bbox = np.array([Xmin, Ymin, Xmax, Ymax], dtype=int)
                bboxes.append(bbox)
                objectLabels.append(int(objectLabel))
                distances.append(distance)
        distances = np.array(distances, dtype=float)
        objectLabels = np.array(objectLabels, dtype=float)
        #make everything to torch tensor, important question is it required for bounding boxes??
        bboxes = torch.as_tensor(bboxes, dtype=torch.float32)
        objectLabels = torch.as_tensor(objectLabels, dtype=torch.float32)
        distances = torch.as_tensor(distances, dtype=torch.float32)
        if self.transforms is not None:
            image = self.transforms(image)
        return image, objectLabels, bboxes, distances
    def __len__(self):
        # return the size of the dataset
        return len(self.images)
Since the lengths of objectLabels, bboxes and distances vary from image to image (depending on the objects present), I had to write a custom collate function, shown below.
def collate_fn_Custom(batch):
    images = list()
    boxes = list()
    objectLabels = list()
    distances = list()
    for b in batch:
        images.append(b[0])
        objectLabels.append(b[1])
        boxes.append(b[2])
        distances.append(b[3])
    images = torch.stack(images, dim=0)
    return images, objectLabels, boxes, distances
It works fine for a batch size of 1, but I am failing to understand how I can extend it to larger batch sizes. Of course, I can pass any batch size as an argument, but how can the model tell the different images and their corresponding annotations apart when the batch holds more than one image?
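One way to see how such a collate function keeps images and annotations matched is to run the same grouping logic on plain Python stand-ins. The sketch below is not the PyTorch code itself: strings stand in for image tensors, lists for annotation tensors, and the second sample's values are made up for illustration. It shows that position i in every returned list belongs to the same image.

```python
def collate_variable_length(batch):
    """Group a batch of (image, labels, boxes, distances) samples field by field."""
    images, labels, boxes, distances = [], [], [], []
    for image, obj_labels, obj_boxes, obj_distances in batch:
        images.append(image)
        labels.append(obj_labels)
        boxes.append(obj_boxes)
        distances.append(obj_distances)
    return images, labels, boxes, distances


# Two fake samples: the first image has one object, the second has two.
sample_a = ("image_a", [1], [[74, 127, 92, 139]], [47.0])
sample_b = ("image_b", [2, 3], [[10, 20, 30, 40], [50, 60, 70, 80]], [12.5, 30.0])
images, labels, boxes, distances = collate_variable_length([sample_a, sample_b])

# Index i selects one image together with all of its annotations.
for i, image in enumerate(images):
    print(image, labels[i], boxes[i], distances[i])
```

In the real collate function the images are stacked into one tensor, so index i along the batch dimension plays the same role as index i of the annotation lists.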
The source for collate function is here.
I am working on a violence detection service. I am trying to develop software based on the code in this repo. My dataset consists of videos residing in two directories, "Violence" and "Non-Violence".
I used this code to generate npy files from the RGB channels and optical flow features. The output is two folders of npy arrays (np.float32 dtype) whose frames have shape 244x244x5: the RGB video frames are in the first three channels (npy[..., :3]) and the optical flow features are in the last two (npy[..., 3:]).
Now I am trying to convert them to tfrecords and use tf.data.TFRecordDataset to speed up the training process. Since my model input has to be a cube tensor, each training element has to be 64 frames of a video. That means the data point shape has to be 64x244x244x5.
So I used this code to convert the npy files to tfrecords.
from pathlib import Path
from os.path import join
import tensorflow as tf
import numpy as np
import cv2
from tqdm import tqdm
def normalize(data):
    mean = np.mean(data)
    std = np.std(data)
    return (data - mean) / std
def random_flip(video, prob):
    s = np.random.rand()
    if s < prob:
        video = np.flip(m=video, axis=2)
    return video
def color_jitter(video):
    # range of s-component: 0-1
    # range of v component: 0-255
    s_jitter = np.random.uniform(-0.2, 0.2)
    v_jitter = np.random.uniform(-30, 30)
    for i in range(len(video)):
        hsv = cv2.cvtColor(video[i], cv2.COLOR_RGB2HSV)
        s = hsv[..., 1] + s_jitter
        v = hsv[..., 2] + v_jitter
        s[s < 0] = 0
        s[s > 1] = 1
        v[v < 0] = 0
        v[v > 255] = 255
        hsv[..., 1] = s
        hsv[..., 2] = v
        video[i] = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
    return video
def uniform_sample(video: np.ndarray, target_frames: int = 64) -> np.ndarray:
    """
    Samples frames at a uniform interval and returns target_frames frames.
    Args:
        video: array of frames, indexed by frame along the first axis.
        target_frames: number of frames to return.
    Returns:
        float32 array holding target_frames frames.
    """
    # use the argument, not the global `data`
    len_frames = int(len(video))
    interval = int(np.ceil(len_frames / target_frames))
    # init empty list for sampled video
    sampled_video = []
    for i in range(0, len_frames, interval):
        sampled_video.append(video[i])
    # calculate number of padded frames and fill with the last frames of the video
    num_pad = target_frames - len(sampled_video)
    if num_pad > 0:
        padding = [video[i] for i in range(-num_pad, 0)]
        sampled_video += padding
    return np.array(sampled_video, dtype=np.float32)
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
if __name__ == '__main__':
    path = Path('transformed/')
    npy_files = list(path.rglob('*.npy'))[:100]
    aug = True
    # one_hots = to_categorical(range(2), dtype=np.int8)
    path_to_save = 'data_tfrecords'
    tfrecord_path = join(path_to_save, 'all_data.tfrecord')
    with tf.io.TFRecordWriter(tfrecord_path) as writer:
        for file in tqdm(npy_files, desc='files converted'):
            # load npy files
            npy = np.load(file.as_posix(), mmap_mode='r')
            data = np.float32(npy)
            del npy
            # Uniform sampling
            data = uniform_sample(data, target_frames=64)
            # Add augmentation
            if aug:
                data[..., :3] = color_jitter(data[..., :3])
                data = random_flip(data, prob=0.5)
            # Normalization
            data[..., :3] = normalize(data[..., :3])
            data[..., 3:] = normalize(data[..., 3:])
            # Label one hot encoding
            label = 1 if file.parent.stem.startswith('F') else 0
            # label = one_hots[label]
            feature = {'image': _bytes_feature(tf.compat.as_bytes(data.tobytes())),
                       'label': _int64_feature(int(label))}
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())
The code works fine, but the real problem is that it consumes too much disk space. My whole dataset of 2000 videos takes 12 GB; when I converted the videos to npy files, it grew to around 80 GB, and now with tfrecords it has grown to over 120 GB or so. How can I convert them in an efficient way to reduce the space required to store them?
This answer might be too late, but I see you are still saving the video frames themselves in your tfrecords file, which is why the file is taking more space.
Try removing the "image" feature from your feature dict, and saving per-frame information such as height, width, channels, and so forth instead.
feature = {'label': _int64_feature(int(label))}
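The size on disk can be estimated from the raw payload alone: each example stores 64x244x244x5 values as uncompressed float32 bytes. The arithmetic below is a back-of-the-envelope sketch that ignores the small per-record overhead; the function name is illustrative.

```python
def example_bytes(frames=64, height=244, width=244, channels=5, bytes_per_value=4):
    """Raw size in bytes of one training example stored via tobytes()."""
    return frames * height * width * channels * bytes_per_value


per_example = example_bytes()          # float32: 4 bytes per value
total = per_example * 2000             # 2000 videos, as in the question
per_example_half = example_bytes(bytes_per_value=2)  # what float16 would take

print(per_example / 1e6, "MB per example")
print(total / 1e9, "GB for 2000 videos")
```

Under these assumptions the payload alone is far larger than the original 12 GB of compressed video, so any saving has to come from storing fewer bytes per value, compressing the records, or deferring sampling and augmentation to read time.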
I'm trying to design a neural network that predicts which of 5 distinct film stocks a photo was shot on. I have 5000 images in total: 4000 for training and 1000 for testing. I've stored the images in two sub-folders for training and testing data.
training_dir = r'C:\...\Training Set'
test_dir = r'C:\...\Test Set'
I'm able to collect the training images using skimage io.ImageCollection.
folders = []
for image_path in os.scandir(training_dir):
    img = io.ImageCollection(os.path.join(training_dir, image_path, '*.jpg'))
    folders.append(img)
I then collect the training images based on class and apply a loop to save image data into a list.
ektachrome = folders[0]
HP5 = folders[1]
LomoP = folders[2]
Trix = folders[3]
velvia = folders[4]
images = []
for i in range(0, 800):
    ekta = ektachrome[i]
    images.append(ekta)
    hp5 = HP5[i]
    images.append(hp5)
    lomo = LomoP[i]
    images.append(lomo)
    trix = Trix[i]
    images.append(trix)
    Velvia = velvia[i]
    images.append(Velvia)
When I convert the list of training images to an array, np.asarray(images).shape is (4000,). I'm having trouble labeling the data. Here are my labels.
label = {'Ektachrome':1, 'HP5':2, 'Lomochrome Purple':3, 'Tri-X':4, 'Velvia 50':5}
How do I label my images?
As I understand it, your images are organized into one folder per class name (since you mention folders[0], folders[1], etc.).
In that case, you can use the code below to label the images.
from pathlib import Path
train_dir = Path(training_dir)  # training_dir as defined in the question
Ektachrome_dir = train_dir / 'Ektachrome'
HP5_dir = train_dir / 'HP5'
Lomochrome_Purple_dir = train_dir / 'Lomochrome Purple'
Tri_X_dir = train_dir / 'Tri-X'
Velvia_50_dir = train_dir / 'Velvia 50'
# Get the list of all the images (the question's files are *.jpg)
Ektachrome_Images = Ektachrome_dir.glob('*.jpg')
HP5_Images = HP5_dir.glob('*.jpg')
Lomochrome_Purple_Images = Lomochrome_Purple_dir.glob('*.jpg')
Tri_X_Images = Tri_X_dir.glob('*.jpg')
Velvia_50_Images = Velvia_50_dir.glob('*.jpg')
# An empty list. We will insert the data into this list in (img_path, label) format
train_data = []
for img in Ektachrome_Images:
    train_data.append((img, 1))
for img in HP5_Images:
    train_data.append((img, 2))
for img in Lomochrome_Purple_Images:
    train_data.append((img, 3))
for img in Tri_X_Images:
    train_data.append((img, 4))
for img in Velvia_50_Images:
    train_data.append((img, 5))
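The same idea can be written as one loop driven by the label dictionary from the question. This is a minimal sketch that assumes each key of the dictionary is also the name of a class folder; `label_images` is an illustrative helper name, and the demo runs on a throwaway directory of empty placeholder files rather than real photos.

```python
import tempfile
from pathlib import Path

label = {'Ektachrome': 1, 'HP5': 2, 'Lomochrome Purple': 3, 'Tri-X': 4, 'Velvia 50': 5}


def label_images(train_dir, label_map, pattern="*.jpg"):
    """Return (img_path, label) pairs for every image in each class folder."""
    train_dir = Path(train_dir)
    data = []
    for class_name, class_label in label_map.items():
        for img_path in sorted((train_dir / class_name).glob(pattern)):
            data.append((img_path, class_label))
    return data


# Demo on a temporary directory tree with two placeholder files per class.
with tempfile.TemporaryDirectory() as tmp:
    for class_name in label:
        class_dir = Path(tmp) / class_name
        class_dir.mkdir()
        for i in range(2):
            (class_dir / f"{i:03d}.jpg").touch()
    train_data = label_images(tmp, label)
    labels_only = [lbl for _, lbl in train_data]
```

Keeping the paths and labels in one list of pairs means any later shuffle moves each image together with its label.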
I have downloaded Caltech101. Its structure is:
#Caltech101 dir
    #class1 dir
        #images of class1 jpgs
    #class2 dir
        #images of class2 jpgs
    ...
    #class100 dir
        #images of class100 jpgs
My problem is that I can't keep two np arrays x and y of shape (9144, 240, 180, 3) and (9144,) in memory. So my solution is to overallocate an h5py dataset, load the data in 2 chunks and write them to the file one after the other. Precisely:
from __future__ import print_function
import os
import glob
from scipy.misc import imread, imresize
from sklearn.utils import shuffle
import numpy as np
import h5py
from time import time
def load_chunk(images_dset, labels_dset, chunk_of_classes, counter, type_key, prev_chunk_length):
    # getting images and processing
    xtmp = []
    ytmp = []
    for label in chunk_of_classes:
        img_list = sorted(glob.glob(os.path.join(dir_name, label, "*.jpg")))
        for img in img_list:
            img = imread(img, mode='RGB')
            img = imresize(img, (240, 180))
            xtmp.append(img)
            ytmp.append(label)
        print(label, 'done')
    x = np.concatenate([arr[np.newaxis] for arr in xtmp])
    y = np.array(ytmp, dtype=type_key)
    print('x: ', type(x), np.shape(x), 'y: ', type(y), np.shape(y))
    # writing to dataset
    a = time()
    images_dset[prev_chunk_length:prev_chunk_length+x.shape[0], :, :, :] = x
    print(labels_dset.shape)
    print(y.shape, y.shape[0])
    print(type(y), y.dtype)
    print(prev_chunk_length)
    labels_dset[prev_chunk_length:prev_chunk_length+y.shape[0]] = y
    b = time()
    print('Chunk', counter, 'written in', b-a, 'seconds')
    return prev_chunk_length+x.shape[0]
def write_to_file(remove_DS_Store):
    if os.path.isfile('caltech101.h5'):
        print('File exists already')
        return
    else:
        # the name of each dir is the name of a class
        classes = os.listdir(dir_name)
        if remove_DS_Store:
            classes.pop(0)  # removes .DS_Store - may not be used on other terminals
        # need the dtype of y in order to initialize h5 dataset
        s = ''
        key_type_y = s.join(['S', str(len(max(classes, key=len)))])
        classes = np.array(classes, dtype=key_type_y)
        # number of chunks in which the dataset must be divided
        nb_chunks = 2
        nb_chunks_loaded = 0
        prev_chunk_length = 0
        # open file and allocating a dataset
        f = h5py.File('caltech101.h5', 'a')
        imgs = f.create_dataset('images', shape=(9144, 240, 180, 3), dtype='uint8')
        labels = f.create_dataset('labels', shape=(9144,), dtype=key_type_y)
        for class_sublist in np.array_split(classes, nb_chunks):
            # loading chunk by chunk in a function to avoid memory overhead
            prev_chunk_length = load_chunk(imgs, labels, class_sublist, nb_chunks_loaded, key_type_y, prev_chunk_length)
            nb_chunks_loaded += 1
        f.close()
        print('Images and labels saved to \'caltech101.h5\'')
        return
dir_name = '../Datasets/Caltech101'
write_to_file(remove_DS_Store=True)
This works quite well, and also reading is actually fast enough. The problem is that I need to shuffle the dataset.
Observations:
Shuffling the dataset objects: obviously veeeery slow because they're on disk.
Creating an array of shuffled indices and use advanced numpy indexing. This means slower reading from file.
Shuffling before writing to file would be nice, problem: I have only about half of the dataset in memory each time. I would get an improper shuffling.
Can you think of a way to shuffle before writing? I'm open also to solutions which rethink the writing process, as long as it doesn't use a lot of memory.
You could shuffle the file paths before reading the image data.
Instead of shuffling the image data in memory, create a list of all file paths that belong to the dataset. Then shuffle the list of file paths. Now you can create your HDF5 database as before.
You could for example use glob to create the list of files for shuffling:
import glob
import random
files = glob.glob('../Datasets/Caltech101/*/*.jpg')
random.shuffle(files)  # shuffles the list in place and returns None
You could then retrieve the class label and image name from the path:
import os
for file_path in files:
    label = os.path.basename(os.path.dirname(file_path))
    image_id = os.path.splitext(os.path.basename(file_path))[0]
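Putting the two steps together, a shuffled list of (path, label) pairs can be split into chunks so that only part of the dataset is loaded at a time while the overall order is still a proper shuffle. The sketch below assumes the one-folder-per-class layout from the question; the helper names are illustrative, and the demo uses empty placeholder files in a temporary directory instead of the real dataset.

```python
import glob
import os
import random
import tempfile


def shuffled_samples(root, seed=None):
    """Return (file_path, label) pairs in shuffled order; label is the folder name."""
    paths = sorted(glob.glob(os.path.join(root, "*", "*.jpg")))
    random.Random(seed).shuffle(paths)  # in place; do not assign the return value
    return [(p, os.path.basename(os.path.dirname(p))) for p in paths]


def split_into_chunks(items, nb_chunks):
    """Split a list into at most nb_chunks consecutive pieces of similar size."""
    size = max(1, -(-len(items) // nb_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


# Demo: two class folders with three placeholder images each.
with tempfile.TemporaryDirectory() as tmp:
    for class_name in ("class1", "class2"):
        os.mkdir(os.path.join(tmp, class_name))
        for i in range(3):
            open(os.path.join(tmp, class_name, "img%d.jpg" % i), "w").close()
    samples = shuffled_samples(tmp, seed=0)
    chunks = split_into_chunks(samples, 2)
```

Each chunk can then be read, resized and written to the HDF5 datasets in turn, so the data on disk ends up shuffled across classes without ever holding all images in memory.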
I'm trying to build an application that classifies different objects. I have a training folder with a bunch of images I want to use as training data for my SVM.
Up until now I have followed this (GREAT) answer:
using OpenCV and SVM with images
Here is a sample of my code:
def getTrainingData():
    address = "..//data//training"
    labels = []
    trainingData = []
    for items in os.listdir(address):
        ## extracts labels
        name = address + "//" + items
        for it in os.listdir(name):
            path = name + "//" + it
            print path
            img = cv.imread(path, cv.CV_LOAD_IMAGE_GRAYSCALE)
            d = np.array(img, dtype = np.float32)
            q = d.flatten()
            trainingData.append(q)
            labels.append(items)
            ######DEBUG######
            #cv.namedWindow(path,cv.WINDOW_NORMAL)
            #cv.imshow(path,img)
    return trainingData, labels
svm_params = dict( kernel_type = cv.SVM_LINEAR,
                   svm_type = cv.SVM_C_SVC,
                   C=2.67, gamma=3 )
training, labels = getTrainingData()
train = np.asarray(training)
svm = cv.SVM()
svm.train(train, labels, params=svm_params)
svm.save('svm_data.dat')
But when I try to run it, I receive the following error:
svm.train(train, labels, params=svm_params)
TypeError: trainData data type = 17 is not supported
What am I doing wrong?
Thanks a lot!
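You should resize your input images so that they all have the same size. np.asarray can only build a regular 2-D float32 matrix when every flattened image has the same length; with images of different sizes it produces an array of Python objects, which is what "data type = 17" in the error refers to (17 is NumPy's type number for object arrays). For example:
img = cv2.resize(img, (64,64))
The size is up to you.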