From the mnist dataset example I know that the dataset look something like this (60000,28,28) and the labels are (60000,). When, I print the first three examples of Mnist dataset
and I print the first three labels of those which are:
The images and labels are bounded.
I want to know how can I bound a folder with (1200 images) with size 64 and 64 with an excel with a column named "damage", with 5 different classes so I can train a neural network.
Like image of a car door and damage is class 3.
Here's a rough sketch of how you can approach this problem.
Loading each image
The first step is how you pre-process each image. You can use Python Imaging Library for this.
Example:
from PIL import Image
def load_image(path):
image = Image.open(path)
# Images can be in one of several different modes.
# Convert to single consistent mode.
image = image.convert("RGB")
image = image.resize((64, 64))
return image
Optional step: cropping
Cropping the images to focus on the feature you want the network to pay attention to can improve performance, but requires some work for each training example and for each inference.
Loading all images
I would load the images like this:
import glob
import pandas as pd
image_search_path = "image_directory/*.png"
def load_all_images():
images = []
for path in glob.glob(image_search_path):
image = load_image(path)
images.append({
'path': path,
'img': image,
})
return pd.DataFrame(images)
Loading the labels
I would use Pandas to load the labels. Suppose you have an excel file with the columns path and label, named labels.xlsx.
labels = pd.read_excel("labels.xlsx")
You then have the problem that the images that are loaded are probably not in the same order as your file full of labels. You can fix this by merging the two datasets.
images = load_all_images()
images_and_labels = images.merge(labels, on="path", validate="1:1")
# check that no rows were dropped or added, say by a missing label
assert len(images.index) == len(images_and_labels.index)
assert len(labels.index) == len(images_and_labels.index)
Converting images to numpy
Next, you need to convert both the images and labels into a numpy dataframe.
Example for images:
import numpy as np
images_processed = []
for image in images_and_labels['img'].tolist():
image = np.array(image)
# Does the image have expected shape?
assert image.shape == (64, 64, 3)
images_process.append(image)
images_numpy = np.array(images_processed)
# Check that this has the expected shape. You'll need
# to replace 1200 with the number of training examples.
assert images_numpy.shape == (1200, 64, 64, 3)
Converting labels to numpy
Assuming you're setting up a classifier, like MNIST, you'll first want to decide on an ordering of categories, and map each element of that list of categories to its position within that ordering.
The ordering of categories is arbitrary, but you'll want to be consistent about it.
Example:
categories = {
'damage high': 0,
'damage low': 1,
'damage none': 2,
}
categories_num = labels_and_images['label'].map(categories)
# Are there any labels that didn't get mapped to something?
assert categories_num.isna().sum() == 0
# Convert labels to numpy
labels_np = categories_num.values
# Check shape. You'll need to replace 1200 with the number of training examples
assert labels_np.shape == (1200,)
You should now have the variables images_np and labels_np set up as numpy arrays in the same style as the MNIST example.
I use Matplotlib to create custom t-SNE embedding plots at each epoch during trainging. I would like the plots to be displayed on Tensorboard in a slider format, like this MNST example:
But instead each batch of plots is displayed as separate summaries per epoch, which is really hard to review later. See below:
It appears to be creating multiple image summaries with the same name, so appending _X suffix instead of overwriting or adding to slider like I want. Similarly, when I use the family param, the images are grouped differently but still append _X to the summary name scope.
This is my code to create custom plots and add to tf.summary.image using custom plots and add evaluated summary to summary writer.
def _visualise_embedding(step, summary_writer, features, silhouettes, sample_size=1000):
'''
Visualise features embedding image by adding plot to summary writer to track on Tensorboard
'''
# Select random sample
feats_to_sils = list(zip(features, silhouettes))
shuffle(feats_to_sils)
feats, sils = zip(*feats_to_sils)
feats = feats[:sample_size]
sils = sils[:sample_size]
# Embed feats to 2 dim space
embedded_feats = perform_tsne(2, feats)
# Plot features embedding
im_bytes = plot_embedding(embedded_feats, sils)
# Convert PNG buffer to TF image
image = tf.image.decode_png(im_bytes, channels=4)
# Add the batch dimension
image = tf.expand_dims(image, 0)
summary_op = tf.summary.image("model_projections", image, max_outputs=1, family='family_name')
# Summary has to be evaluated (converted into a string) before adding to the writer
summary_writer.add_summary(summary_op.eval(), step)
I understand I might get the slider plots I want if I add the visualise method as an operation to the graph so as to avoid the name duplication issue. But I need to be able to loop through my evaluated tensor values to perform t-SNE to create the embeddings...
I've been stuck on this for a while so any advise is appreciated!
This can be achieved by using tf.Summary.Image()
For example:
im_summary = tf.Summary.Image(encoded_image_string=im_bytes)
im_summary_value = [tf.Summary.Value(tag=self.confusion_matrix_tensor_name,
image=im_summary)]
This is a summary.proto method so it was obvious to me at first as the method definition is not accessible through Tensorflow. I only realised its functionality when I found a code snippet of it being used on github.
Either way, it exposes image summaries as slides on Tensorboard like I wanted. 💪
I try to create a Dataset for Tensorflow from a CSV file that I created with pandas.
The csv file looks like this:
feature1 feature2 filepath label
0.25 0.35 test1.jpg A
0.33 0.15 test2.jpg B
I read the dataframe like this
mydf = pd.read_csv("TraingDatafinal.csv",header=0)
Now I have defined a function which should return a dataframe. This is all according to the quickstart guide
def train_input_fn(features, labels, batch_size):
"""An input function for training"""
# Convert the inputs to a Dataset.
dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
# Shuffle, repeat, and batch the examples.
dataset = dataset.shuffle(1000).repeat().batch(batch_size)
dataset = dataset.map(mappingfunction)
# Return the dataset
return dataset
I call this function like this;
mydataset = train_input_fn(mydf.drop(["label"],axis=1),mydf["label"],200)
This works, if I remove the mapping but I get a questionmark when I print the shape. Why? The dimensions seem to be clearly defined.
This is where the real struggle begins. I want to create a mapping function, that replaces the filepath with an array of the image.
I tried to achieve that by writing this mappingfunction
def mappingfunction(feature,label):
print(feature['Filename'])
image = tf.read_file(feature['Filename'])
image = tf.image.decode_image(image)
return image,label
This will only return the image and the label. I don't know how I would realize it to return all the features but the filepath.
But even this simplified verison won't work. I get an "expected binary or unicode string" error. Can you help me?
The mapping function should return all features and the label. For example:
def mappingfunction(feature,label):
print(feature['Filename'])
image = tf.read_file(feature['Filename'])
image = tf.image.decode_image(image)
features['image'] = image
return features, label
This will add an image key to the features dictionary.
I am experimenting with using OpenCV via the Python 2.7 interface to implement a machine learning-based OCR application to parse text out of an image file. I am using this tutorial (I've reposted the code below for convenience). I am completely new to machine learning, and relatively new to OpenCV.
OCR of Hand-written Digits:
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('digits.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Now we split the image to 5000 cells, each 20x20 size
cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]
# Make it into a Numpy array. It size will be (50,100,20,20)
x = np.array(cells)
# Now we prepare train_data and test_data.
train = x[:,:50].reshape(-1,400).astype(np.float32) # Size = (2500,400)
test = x[:,50:100].reshape(-1,400).astype(np.float32) # Size = (2500,400)
# Create labels for train and test data
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = train_labels.copy()
# Initiate kNN, train the data, then test it with test data for k=1
knn = cv2.KNearest()
knn.train(train,train_labels)
ret,result,neighbours,dist = knn.find_nearest(test,k=5)
# Now we check the accuracy of classification
# For that, compare the result with test_labels and check which are wrong
matches = result==test_labels
correct = np.count_nonzero(matches)
accuracy = correct*100.0/result.size
print accuracy
# save the data
np.savez('knn_data.npz',train=train, train_labels=train_labels)
# Now load the data
with np.load('knn_data.npz') as data:
print data.files
train = data['train']
train_labels = data['train_labels']
OCR of English Alphabets:
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the data, converters convert the letter to a number
data= np.loadtxt('letter-recognition.data', dtype= 'float32', delimiter = ',',
converters= {0: lambda ch: ord(ch)-ord('A')})
# split the data to two, 10000 each for train and test
train, test = np.vsplit(data,2)
# split trainData and testData to features and responses
responses, trainData = np.hsplit(train,[1])
labels, testData = np.hsplit(test,[1])
# Initiate the kNN, classify, measure accuracy.
knn = cv2.KNearest()
knn.train(trainData, responses)
ret, result, neighbours, dist = knn.find_nearest(testData, k=5)
correct = np.count_nonzero(result == labels)
accuracy = correct*100.0/10000
print accuracy
The 2nd code snippet (for the English alphabet) takes input from a .data file in the following format:
T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8
I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10
D,4,11,6,8,6,10,6,2,6,10,3,7,3,7,3,9
N,7,11,6,6,3,5,9,4,6,4,4,10,6,10,2,8
G,2,1,3,1,1,8,6,6,6,6,5,9,1,7,5,10
S,4,11,5,8,3,8,8,6,9,5,6,6,0,8,9,7
B,4,2,5,4,4,8,7,6,6,7,6,6,2,8,7,10
...there's about 20,000 lines of that. The data describes contours of characters.
I have a basic grasp on how this works, but I am confused as to how I can use this to actually perform OCR on an image. How can I use this code to write a function that takes a cv2 image as a parameter and returns a string representing the recognized text?
In general, machine-learning works like this: First you must train your program in understanding the domain of your problem. Then you start asking questions.
So if you are creating an OCR the first step is teaching your program what an A letter looks like, and the B and so on.
You use OpenCV to clear the image from noise and identify groups of pixels that could be letters and isolate them.
Then you feed those letters to your OCR program. On training mode, you will feed the image and explain what letter the image represents. On asking mode, you will feed the image and ask which letter it is. The better the training the more accurate is your answer will be (the program could get the letter wrong, there is always a chance of that).
I am building a standard image classification model with Tensorflow. For this I have input images, each assigned with a label (number in {0,1}). The Data can hence be stored in a list using the following format:
/path/to/image_0 label_0
/path/to/image_1 label_1
/path/to/image_2 label_2
...
I want to use TensorFlow's queuing system to read my data and feed it to my model. Ignoring the labels, one can easily achieve this by using string_input_producer and wholeFileReader. Here the code:
def read_my_file_format(filename_queue):
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
example = tf.image.decode_png(value)
return example
#removing label, obtaining list containing /path/to/image_x
image_list = [line[:-2] for line in image_label_list]
input_queue = tf.train.string_input_producer(image_list)
input_images = read_my_file_format(input_queue)
However, the labels are lost in that process as the image data is purposely shuffled as part of the input pipeline. What is the easiest way of pushing the labels together with the image data through the input queues?
Using slice_input_producer provides a solution which is much cleaner. Slice Input Producer allows us to create an Input Queue containing arbitrarily many separable values. This snippet of the question would look like this:
def read_labeled_image_list(image_list_file):
"""Reads a .txt file containing pathes and labeles
Args:
image_list_file: a .txt file with one /path/to/image per line
label: optionally, if set label will be pasted after each line
Returns:
List with all filenames in file image_list_file
"""
f = open(image_list_file, 'r')
filenames = []
labels = []
for line in f:
filename, label = line[:-1].split(' ')
filenames.append(filename)
labels.append(int(label))
return filenames, labels
def read_images_from_disk(input_queue):
"""Consumes a single filename and label as a ' '-delimited string.
Args:
filename_and_label_tensor: A scalar string tensor.
Returns:
Two tensors: the decoded image, and the string label.
"""
label = input_queue[1]
file_contents = tf.read_file(input_queue[0])
example = tf.image.decode_png(file_contents, channels=3)
return example, label
# Reads pfathes of images together with their labels
image_list, label_list = read_labeled_image_list(filename)
images = ops.convert_to_tensor(image_list, dtype=dtypes.string)
labels = ops.convert_to_tensor(label_list, dtype=dtypes.int32)
# Makes an input queue
input_queue = tf.train.slice_input_producer([images, labels],
num_epochs=num_epochs,
shuffle=True)
image, label = read_images_from_disk(input_queue)
# Optional Preprocessing or Data Augmentation
# tf.image implements most of the standard image augmentation
image = preprocess_image(image)
label = preprocess_label(label)
# Optional Image and Label Batching
image_batch, label_batch = tf.train.batch([image, label],
batch_size=batch_size)
See also the generic_input_producer from the TensorVision examples for full input-pipeline.
There are three main steps to solving this problem:
Populate the tf.train.string_input_producer() with a list of strings containing the original, space-delimited string containing the filename and the label.
Use tf.read_file(filename) rather than tf.WholeFileReader() to read your image files. tf.read_file() is a stateless op that consumes a single filename and produces a single string containing the contents of the file. It has the advantage that it's a pure function, so it's easy to associate data with the input and the output. For example, your read_my_file_format function would become:
def read_my_file_format(filename_and_label_tensor):
"""Consumes a single filename and label as a ' '-delimited string.
Args:
filename_and_label_tensor: A scalar string tensor.
Returns:
Two tensors: the decoded image, and the string label.
"""
filename, label = tf.decode_csv(filename_and_label_tensor, [[""], [""]], " ")
file_contents = tf.read_file(filename)
example = tf.image.decode_png(file_contents)
return example, label
Invoke the new version of read_my_file_format by passing a single dequeued element from the input_queue:
image, label = read_my_file_format(input_queue.dequeue())
You can then use the image and label tensors in the remainder of your model.
In addition to the answers provided there are few other things you can do:
Encode your label into the filename. If you have N different categories you can rename your files to something like: 0_file001, 5_file002, N_file003. Afterwards when you read the data from a reader key, value = reader.read(filename_queue) your key/value are:
The output of Read will be a filename (key) and the contents of that file (value)
Then parse your filename, extract the label and convert it to int. This will require a little bit of preprocessing of the data.
Use TFRecords which will allow you to store the data and labels at the same file.