Python - Image recognition classifier - python

I want to evaluate if an event is happening in my screen, every time it happens a particular box/image shows up in a screen region with very similar structure.
I have collected a bunch of 84x94 .png RGB images from that screen region and I'd like to build a classifier to tell me if the event is happening or not.
Therefore my idea was to create a pd.DataFrame (df) containing 2 columns, df['np_array'] contains every picture as a np.array and df['is_category'] contains boolean values telling if that image is indicating that the event is happening or not.
The structure looks like this (with != size):
I have resized the images to 10x10 for training and converted to greyscale
df = pd.DataFrame(
{'np_array': [np.random.random((10, 10,2)) for x in range(0,10)],
'is_category': [bool(random.getrandbits(1)) for x in range(0,10)]
})
My problem is that I can't fit a scikit learn classifier by doing clf.fit(df['np_array'],df['is_category'])
I've never tried image recognition before, thanks upfront for any help!

If its a 10x10 grayscale image, you can flatten it:
import numpy as np
from sklearn import ensemble
# generate random 2d arrays
image_data = np.random.rand(10,10, 100)
# generate random labels
labels = np.random.randint(0,2, 100)
X = image_data.reshape(100, -1)
# then use any scikit-learn classification model
clf = ensemble.RandomForestClassifier()
clf.fit(X, y)
By the way, for images the best performing algorithms are convolutional neural networks.

Related

Bound label to Image

From the mnist dataset example I know that the dataset look something like this (60000,28,28) and the labels are (60000,). When, I print the first three examples of Mnist dataset
and I print the first three labels of those which are:
The images and labels are bounded.
I want to know how can I bound a folder with (1200 images) with size 64 and 64 with an excel with a column named "damage", with 5 different classes so I can train a neural network.
Like image of a car door and damage is class 3.
Here's a rough sketch of how you can approach this problem.
Loading each image
The first step is how you pre-process each image. You can use Python Imaging Library for this.
Example:
from PIL import Image
def load_image(path):
image = Image.open(path)
# Images can be in one of several different modes.
# Convert to single consistent mode.
image = image.convert("RGB")
image = image.resize((64, 64))
return image
Optional step: cropping
Cropping the images to focus on the feature you want the network to pay attention to can improve performance, but requires some work for each training example and for each inference.
Loading all images
I would load the images like this:
import glob
import pandas as pd
image_search_path = "image_directory/*.png"
def load_all_images():
images = []
for path in glob.glob(image_search_path):
image = load_image(path)
images.append({
'path': path,
'img': image,
})
return pd.DataFrame(images)
Loading the labels
I would use Pandas to load the labels. Suppose you have an excel file with the columns path and label, named labels.xlsx.
labels = pd.read_excel("labels.xlsx")
You then have the problem that the images that are loaded are probably not in the same order as your file full of labels. You can fix this by merging the two datasets.
images = load_all_images()
images_and_labels = images.merge(labels, on="path", validate="1:1")
# check that no rows were dropped or added, say by a missing label
assert len(images.index) == len(images_and_labels.index)
assert len(labels.index) == len(images_and_labels.index)
Converting images to numpy
Next, you need to convert both the images and labels into a numpy dataframe.
Example for images:
import numpy as np
images_processed = []
for image in images_and_labels['img'].tolist():
image = np.array(image)
# Does the image have expected shape?
assert image.shape == (64, 64, 3)
images_process.append(image)
images_numpy = np.array(images_processed)
# Check that this has the expected shape. You'll need
# to replace 1200 with the number of training examples.
assert images_numpy.shape == (1200, 64, 64, 3)
Converting labels to numpy
Assuming you're setting up a classifier, like MNIST, you'll first want to decide on an ordering of categories, and map each element of that list of categories to its position within that ordering.
The ordering of categories is arbitrary, but you'll want to be consistent about it.
Example:
categories = {
'damage high': 0,
'damage low': 1,
'damage none': 2,
}
categories_num = labels_and_images['label'].map(categories)
# Are there any labels that didn't get mapped to something?
assert categories_num.isna().sum() == 0
# Convert labels to numpy
labels_np = categories_num.values
# Check shape. You'll need to replace 1200 with the number of training examples
assert labels_np.shape == (1200,)
You should now have the variables images_np and labels_np set up as numpy arrays in the same style as the MNIST example.

How to apply PCA to an image

I have a graycale noisy image. I want to apply PCA for noise reduction and see the output after the application.
Here's what I tried to do:
[in]:
from sklearn.datasets import load_sample_image
from sklearn.feature_extraction import image
from sklearn.decomposition import PCA
# Create patches of size 25 by 25 and create a matrix from all patches
patches = image.extract_patches_2d(grayscale_image, (25, 25), random_state = 42)
print(patches.shape)
# reshape patches because I got an error when applying fit_transform(ValueError: FoundValueError: Found array with dim 3. Estimator expected <= 2.)
patches_reshaped = patches.reshape(2,-1)
#apply PCA
pca = PCA()
projected = pca.fit_transform(patches_reshaped.data)
denoised_image = pca.inverse_transform(projected)
imshow(denoised_image)
[out]:
(source: imggmi.com)
I get an array as a result. How to see the de-noised image?
In order to see your de-noised image, you need to convert your data which is represented in some low-dimension using the principal components back to the original space. To do that you can use the inverse_transform() function. As you can see from the documentation here, this function will accept the projected data and return an array like the original image. So you can do something like,
denoised_image = pca.inverse_transform(projected)
# then view denoised_image
Edit:
Here are some of the issues to look in to:
You have 53824 patches from your original image with sizes (25,25). In order to reshape you data and pass it to PCA, as you can see from the documentation here, you need to pass an array of size (n_samples, n_features). Your number of samples is 53824. So patches reshaped should be:
patches_reshaped = patches.reshape(patches.shape[0],-1)
# this should return a (53824, 625) shaped data
Now you use this reshaped data and transform it using PCA and inverse transform to get the data in your original domain. After doing that, your denoised_image is a set of reconstructed patches. You will need to combine these patches to get an image using the function image.reconstruct_from_patches_2d here is the documentation. So you can do something like,
denoised_image = image.reconstruct_from_patches_2d(denoised_image.reshape(-1,25,25), grayscale_image.shape)
Now you can view the denoised_image, which should look like grayscale_image.

Best way in Pytorch to upsample a Tensor and transform it to rgb?

For a nice output in Tensorboard I want to show a batch of input images, corresponding target masks and output masks in a grid.
Input images have different size then the masks. Furthermore the images are obviously RGB.
From a batch of e.g. 32 or 64 I only want to show the first 4 images.
After some fiddling around I came up with the following example code. Good thing: It works.
But I am really not sure if I missed something in Pytorch. It just looks much longer then I expected. Especially the upsampling and transformation to RGB seems wild. But the other transformations I found would not work for a whole batch.
import torch
from torch.autograd import Variable
import torch.nn.functional as FN
import torchvision.utils as vutils
from tensorboardX import SummaryWriter
import time
batch = 32
i_size = 192
o_size = 112
nr_imgs = 4
# Tensorboard init
writer = SummaryWriter('runs/' + time.strftime('%Y%m%d_%H%M%S'))
input_image=Variable(torch.rand(batch,3,i_size,i_size))
target_mask=Variable(torch.rand(batch,o_size,o_size))
output_mask=Variable(torch.rand(batch,o_size,o_size))
# upsample target_mask, add dim to have gray2rgb
tm = FN.upsample(target_mask[:nr_imgs,None], size=[i_size, i_size], mode='bilinear')
tm = torch.cat( (tm,tm,tm), dim=1) # grayscale plane to rgb
# upsample target_mask, add dim to have gray2rgb
om = FN.upsample(output_mask[:nr_imgs,None], size=[i_size, i_size], mode='bilinear')
om = torch.cat( (om,om,om), dim=1) # grayscale plane to rgb
# add up all images and make grid
imgs = torch.cat( ( input_image[:nr_imgs].data, tm.data, om.data ) )
x = vutils.make_grid(imgs, nrow=nr_imgs, normalize=True, scale_each=True)
# Tensorboard img output
writer.add_image('Image', x, 0)
EDIT: Found this on Pytorchs Issues list. Its about Batch support for Transform. Seems there are no plans to add batch transforms in the future. So my current code might be the best solution for the time being, anyway?
Maybe you can just convert your Tensors to the numpy array (.data.cpu().numpy() ) and use opencv to do upsampling? OpenCV implementation should be quite fast.

Python/OpenCV - Machine Learning-based OCR (Image to Text)

I am experimenting with using OpenCV via the Python 2.7 interface to implement a machine learning-based OCR application to parse text out of an image file. I am using this tutorial (I've reposted the code below for convenience). I am completely new to machine learning, and relatively new to OpenCV.
OCR of Hand-written Digits:
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('digits.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Now we split the image to 5000 cells, each 20x20 size
cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]
# Make it into a Numpy array. It size will be (50,100,20,20)
x = np.array(cells)
# Now we prepare train_data and test_data.
train = x[:,:50].reshape(-1,400).astype(np.float32) # Size = (2500,400)
test = x[:,50:100].reshape(-1,400).astype(np.float32) # Size = (2500,400)
# Create labels for train and test data
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = train_labels.copy()
# Initiate kNN, train the data, then test it with test data for k=1
knn = cv2.KNearest()
knn.train(train,train_labels)
ret,result,neighbours,dist = knn.find_nearest(test,k=5)
# Now we check the accuracy of classification
# For that, compare the result with test_labels and check which are wrong
matches = result==test_labels
correct = np.count_nonzero(matches)
accuracy = correct*100.0/result.size
print accuracy
# save the data
np.savez('knn_data.npz',train=train, train_labels=train_labels)
# Now load the data
with np.load('knn_data.npz') as data:
print data.files
train = data['train']
train_labels = data['train_labels']
OCR of English Alphabets:
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the data, converters convert the letter to a number
data= np.loadtxt('letter-recognition.data', dtype= 'float32', delimiter = ',',
converters= {0: lambda ch: ord(ch)-ord('A')})
# split the data to two, 10000 each for train and test
train, test = np.vsplit(data,2)
# split trainData and testData to features and responses
responses, trainData = np.hsplit(train,[1])
labels, testData = np.hsplit(test,[1])
# Initiate the kNN, classify, measure accuracy.
knn = cv2.KNearest()
knn.train(trainData, responses)
ret, result, neighbours, dist = knn.find_nearest(testData, k=5)
correct = np.count_nonzero(result == labels)
accuracy = correct*100.0/10000
print accuracy
The 2nd code snippet (for the English alphabet) takes input from a .data file in the following format:
T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8
I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10
D,4,11,6,8,6,10,6,2,6,10,3,7,3,7,3,9
N,7,11,6,6,3,5,9,4,6,4,4,10,6,10,2,8
G,2,1,3,1,1,8,6,6,6,6,5,9,1,7,5,10
S,4,11,5,8,3,8,8,6,9,5,6,6,0,8,9,7
B,4,2,5,4,4,8,7,6,6,7,6,6,2,8,7,10
...there's about 20,000 lines of that. The data describes contours of characters.
I have a basic grasp on how this works, but I am confused as to how I can use this to actually perform OCR on an image. How can I use this code to write a function that takes a cv2 image as a parameter and returns a string representing the recognized text?
In general, machine-learning works like this: First you must train your program in understanding the domain of your problem. Then you start asking questions.
So if you are creating an OCR the first step is teaching your program what an A letter looks like, and the B and so on.
You use OpenCV to clear the image from noise and identify groups of pixels that could be letters and isolate them.
Then you feed those letters to your OCR program. On training mode, you will feed the image and explain what letter the image represents. On asking mode, you will feed the image and ask which letter it is. The better the training the more accurate is your answer will be (the program could get the letter wrong, there is always a chance of that).

How to pass Pillow image data to scikit-learn?

I am trying to train an image classifier in scikit-learn. I have a bunch of input images and I am using Pillow to process them. My question is about what shape to give the Pillow data to scikit-learn.
This is my code now:
training = glob.glob('./img/training/*/*.bmp')
data = []
classes = []
for imagefile in training:
edges = Image.open(imagefile).filter(ImageFilter.FIND_EDGES).convert("L")
in_data = np.asarray(edges, dtype=np.uint8)
data.append(in_data[0])
if 'class1' in imagefile:
classes.append('class1')
else:
classes.append('class2')
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(data, classes)
This runs without errors, but I have put the code together fairly crudely and I am not sure it is correct.
In particular, I'm not sure whether I should be using in_data[0]. I just did this because using in_data gives me an error: ValueError: Found array with dim 3. Estimator expected <= 2.
Unless you want the first row of the image matrix ( in_data[0] returns you the first row ) of each image, you probably want to use flattening.
Flattening will take each row of the image matrix and put the rows behind eachother in a 1 dimensional vector.
So it becomes data.append(in_data.flatten())
You could resize your image to a smaller format first, to reduce the number of columns of your data matrix.

Categories