Hi everyone, I'm facing an issue after processing images and labels. To create a single dataset I use the zip function. After processing, both images and labels contain 18k items, which is correct, but when I call zip(images, labels), the item count becomes 563.
Here is some code to help you understand:
```python
import tensorflow as tf

# Map the load_and_preprocess_image function over the dataset of image paths
images = image_paths.map(load_and_preprocess_image)
# Map the extract_label function over the dataset of image paths
labels = image_paths.map(extract_label)
# Zip the images and labels together to create a dataset of (image, label) pairs
# HERE SOMETHING STRANGE HAPPENS
data = tf.data.Dataset.zip((images, labels))
# Shuffle and batch the data
data = data.shuffle(buffer_size=1000).batch(32)
# Shuffle again with a buffer covering the whole dataset
data = data.shuffle(buffer_size=len(data))
# Split the data into train and test sets
num_train = int(0.8 * len(data))
train_data = data.take(num_train)
val_data = data.skip(num_train)
```
I cannot see where the error is. Can you help me, please? Thanks.
I'd like to have a dataset of 18k (image, label) pairs.
tf's zip
tf.data.Dataset.zip is not like Python's zip: its inputs are tf.data.Dataset objects. Check that the images and labels returned by your map calls are valid tf.data.Dataset objects.
check tf.ds
Make sure your images and labels are valid tf.data.Dataset objects and contain the expected number of elements:
```python
print("ele: ", images.element_spec)
print("num: ", images.cardinality().numpy())
print("ele: ", labels.element_spec)
print("num: ", labels.cardinality().numpy())
```
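A minimal sketch of what the cardinality check can reveal, assuming the 18k is exactly 18,000: len() and cardinality() on a batched dataset count batches, not elements, and ceil(18000 / 32) = 563, which matches the number in the question. The paths and the two lambdas below are hypothetical stand-ins for image_paths, load_and_preprocess_image and extract_label.

```python
import tensorflow as tf

# Hypothetical stand-in for the question's 18,000 image paths.
image_paths = tf.data.Dataset.from_tensor_slices([f"img_{i}.jpg" for i in range(18000)])

# Stand-ins for load_and_preprocess_image and extract_label; map keeps the element count.
images = image_paths.map(lambda p: tf.strings.length(p))
labels = image_paths.map(lambda p: tf.strings.substr(p, 0, 3))

data = tf.data.Dataset.zip((images, labels))
print("after zip:", data.cardinality().numpy())       # 18000 elements

batched = data.batch(32)
print("after batch:", batched.cardinality().numpy())  # 563 batches of up to 32 elements
```

If the zipped dataset reports 18000 before .batch(32), the zip itself is fine and 563 is simply the number of batches.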
workaround
In your case, you can combine the image and label processing in one map function that returns both, which avoids tf.data.Dataset.zip altogether:
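```python
import tensorflow as tf

# Load the image and its label in one function, so no zip is needed.
# load_and_preprocess_image and extract_label are the functions from the question.
def load_and_preprocess_image_and_label(image_path):
    """Load the image and its label, then apply any further operations."""
    image = load_and_preprocess_image(image_path)
    label = extract_label(image_path)
    return image, label

# Map the combined function over the dataset of image paths
# (PATH is your data directory as a pathlib.Path)
train_list = tf.data.Dataset.list_files(str(PATH / 'train/*.jpg'))
data = train_list.map(load_and_preprocess_image_and_label,
                      num_parallel_calls=tf.data.AUTOTUNE)
```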
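A sketch of the combined approach extended with a train/validation split, assuming the split should be over individual (image, label) pairs rather than over batches (the question calls take/skip after .batch(32), so it splits whole batches). The file names, the placeholder zero image and the "class1" labelling rule are hypothetical; replace them with your own loading code.

```python
import tensorflow as tf

# Hypothetical file names; in the question these come from image_paths.
paths = [f"class{i % 2}/img_{i}.jpg" for i in range(1000)]
ds = tf.data.Dataset.from_tensor_slices(paths)

def load_and_preprocess_image_and_label(path):
    # Placeholder image: real code would use tf.io.read_file + tf.io.decode_jpeg here.
    image = tf.zeros((8, 8, 3), tf.float32)
    # Hypothetical labelling rule: 1 if the path starts with "class1/", else 0.
    label = tf.cast(tf.strings.regex_full_match(path, "class1/.*"), tf.int32)
    return image, label

data = ds.map(load_and_preprocess_image_and_label, num_parallel_calls=tf.data.AUTOTUNE)

# Shuffle once with a fixed seed and no reshuffling, so take and skip
# see the same order and the two splits do not overlap.
n = int(data.cardinality().numpy())
data = data.shuffle(buffer_size=n, seed=42, reshuffle_each_iteration=False)

# Split on elements first, then batch each split.
num_train = int(0.8 * n)
train_data = data.take(num_train).batch(32)
val_data = data.skip(num_train).batch(32)
```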
I want to detect whether an event is happening on my screen. Every time it happens, a particular box/image with a very similar structure shows up in a specific screen region.
I have collected a bunch of 84x94 .png RGB images from that screen region and I'd like to build a classifier to tell me if the event is happening or not.
My idea was therefore to create a pd.DataFrame (df) with 2 columns: df['np_array'] contains each picture as an np.array, and df['is_category'] contains boolean values indicating whether that image shows the event happening.
The structure looks like this (with a different size):
I have resized the images to 10x10 for training and converted them to greyscale.
```python
import random
import numpy as np
import pandas as pd
df = pd.DataFrame(
    {'np_array': [np.random.random((10, 10, 2)) for x in range(0, 10)],
     'is_category': [bool(random.getrandbits(1)) for x in range(0, 10)]
     })
```
My problem is that I can't fit a scikit-learn classifier with clf.fit(df['np_array'], df['is_category']).
I've never tried image recognition before; thanks in advance for any help!
If it's a 10x10 grayscale image, you can flatten it:
```python
import numpy as np
from sklearn import ensemble
# generate 100 random 10x10 "images" (sample axis first)
image_data = np.random.rand(100, 10, 10)
# generate random labels
labels = np.random.randint(0, 2, 100)
# flatten each 10x10 image into one row of 100 features -> shape (100, 100)
X = image_data.reshape(100, -1)
# then use any scikit-learn classification model
clf = ensemble.RandomForestClassifier()
clf.fit(X, labels)
```
By the way, for images, convolutional neural networks are usually the best-performing algorithms.
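The answer starts from a single 3D NumPy array, whereas the question stores one array per DataFrame cell. A minimal sketch of bridging the two, assuming every array in df['np_array'] has the same shape: np.stack turns the object column into one (n_samples, 10, 10, 2) array, and reshape(len(df), -1) flattens each image into a feature row. The random data mirrors the question's example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Same structure as the question's DataFrame: one (10, 10, 2) array per row.
df = pd.DataFrame({
    "np_array": [rng.random((10, 10, 2)) for _ in range(10)],
    "is_category": rng.integers(0, 2, 10).astype(bool),
})

# df['np_array'] is a column of Python objects, which scikit-learn cannot use directly.
# Stack it into one (n_samples, 10, 10, 2) array, then flatten each image to a row.
X = np.stack(df["np_array"].to_list()).reshape(len(df), -1)
y = df["is_category"].to_numpy()

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)
print(X.shape)  # (10, 200)
```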
I am experimenting with using OpenCV via the Python 2.7 interface to implement a machine learning-based OCR application to parse text out of an image file. I am using this tutorial (I've reposted the code below for convenience). I am completely new to machine learning, and relatively new to OpenCV.
OCR of Hand-written Digits:
```python
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('digits.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Now we split the image into 5000 cells, each 20x20 in size
cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]
# Make it into a Numpy array. Its size will be (50,100,20,20)
x = np.array(cells)
# Now we prepare train_data and test_data.
train = x[:,:50].reshape(-1,400).astype(np.float32) # Size = (2500,400)
test = x[:,50:100].reshape(-1,400).astype(np.float32) # Size = (2500,400)
# Create labels for train and test data
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = train_labels.copy()
# Initiate kNN, train the data, then test it with test data for k=5
# (OpenCV 2.4 API; OpenCV 3+ uses cv2.ml.KNearest_create(),
#  knn.train(train, cv2.ml.ROW_SAMPLE, train_labels) and knn.findNearest(test, k=5))
knn = cv2.KNearest()
knn.train(train,train_labels)
ret,result,neighbours,dist = knn.find_nearest(test,k=5)
# Now we check the accuracy of classification
# For that, compare the result with test_labels and check which are wrong
matches = result==test_labels
correct = np.count_nonzero(matches)
accuracy = correct*100.0/result.size
print(accuracy)
# save the data
np.savez('knn_data.npz',train=train, train_labels=train_labels)
# Now load the data
with np.load('knn_data.npz') as data:
    print(data.files)
    train = data['train']
    train_labels = data['train_labels']
```
OCR of English Alphabets:
```python
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the data; the converter turns the letter in column 0 into a number
data = np.loadtxt('letter-recognition.data', dtype='float32', delimiter=',',
                  converters={0: lambda ch: ord(ch) - ord('A')})
# Split the data in two, 10000 each for train and test
train, test = np.vsplit(data,2)
# Split trainData and testData into features and responses
responses, trainData = np.hsplit(train,[1])
labels, testData = np.hsplit(test,[1])
# Initiate the kNN, classify, measure accuracy.
knn = cv2.KNearest()
knn.train(trainData, responses)
ret, result, neighbours, dist = knn.find_nearest(testData, k=5)
correct = np.count_nonzero(result == labels)
accuracy = correct*100.0/10000
print(accuracy)
```
The second code snippet (for the English alphabet) takes input from a .data file in the following format:
```
T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8
I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10
D,4,11,6,8,6,10,6,2,6,10,3,7,3,7,3,9
N,7,11,6,6,3,5,9,4,6,4,4,10,6,10,2,8
G,2,1,3,1,1,8,6,6,6,6,5,9,1,7,5,10
S,4,11,5,8,3,8,8,6,9,5,6,6,0,8,9,7
B,4,2,5,4,4,8,7,6,6,7,6,6,2,8,7,10
```
...there are about 20,000 lines like that. The data describes contours of characters.
I have a basic grasp of how this works, but I am confused about how to use this to actually perform OCR on an image. How can I use this code to write a function that takes a cv2 image as a parameter and returns a string representing the recognized text?
In general, machine learning works like this: first you train your program to understand the domain of your problem. Then you start asking it questions.
So if you are creating an OCR system, the first step is teaching your program what the letter A looks like, then B, and so on.
You use OpenCV to remove noise from the image, identify groups of pixels that could be letters, and isolate them.
Then you feed those letters to your OCR program. In training mode, you feed it an image and tell it which letter the image represents. In asking mode, you feed it an image and ask which letter it is. The better the training, the more accurate the answers will be (the program can still get a letter wrong; there is always a chance of that).
I am using the MICCAI BRATS 2015 database containing 3D MRI images of the dimensions 155x240x240.
I wanted to perform intensity standardization on these images, and am trying to use the IntensityRangeStandardization class from medpy.filter.
The code is simple:
Load 20 flair images from the database into an array:
```python
from glob import glob
import SimpleITK as sitk
from medpy.filter import IntensityRangeStandardization
pth = 'C:/BRats2015/HGG' # path to the directory
flair = glob(pth + '/*/*Flair*/*.mha') # contains paths to all flair images
flair = flair[:20] # choose 20 images
# load the 20 images in sitk format
im = []
for i in flair:
    im.append(sitk.ReadImage(i))
# convert them into numpy arrays
for i in range(len(im)):
    im[i] = sitk.GetArrayFromImage(im[i])
# initialize the filter
normalizer = IntensityRangeStandardization()
# train and transform the images; the second returned value contains the new images, hence [1]
im_n = normalizer.train_transform(im)[1]
```
I get the following error message:
```
File "intensity_range_standardization.py", line 268, in train
    self.__stdrange = self.__compute_stdrange(images)
File "intensity_range_standardization.py", line 451, in __compute_stdrange
    raise SingleIntensityAccumulationError('Image no.{} shows an unusual single-intensity accumulation that leads to a situation where two percentile values are equal. This situation is usually caused, when the background has not been removed from the image. Another possibility would be to reduce the number of landmark percentiles landmarkp or to change their distribution.'.format(idx))
SingleIntensityAccumulationError: Image no.0 shows an unusual single-intensity accumulation that leads to a situation where two percentile values are equal. This situation is usually caused, when the background has not been removed from the image. Another possibility would be to reduce the number of landmark percentiles landmarkp or to change their distribution.
```
Okay, I figured out how to call train_transform when given images and their respective masks. Here's the code from the medpy GitHub repo.
Reshaping the images should be easy, but here is the link to the code in case of any confusion: Reshape the new images
The full code that worked for me:
```python
from medpy.filter import IntensityRangeStandardization
images = [img1, img2, img3]
# each image is a numpy array of shape (150,150)
masks = [i > 0 for i in images]
norm0 = IntensityRangeStandardization()
trained_model, transformed_images = norm0.train_transform([i[m] for i, m in zip(images, masks)])
norm_images = []
for ti, i, m in zip(transformed_images, images, masks):
    i[m] = ti
    norm_images.append(i)
```
To train and transform one after the other:
```python
norm_images = []
trained_model = norm0.train([i[m] for i, m in zip(images, masks)])
transformed_images = [trained_model.transform(i[m], surpress_mapping_check = False) for i, m in zip(images, masks)]
for ti, i, m in zip(transformed_images, images, masks):
    i[m] = ti
    norm_images.append(i)
```
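Why the masks help, sketched with plain NumPy on a hypothetical volume: when most voxels are background at exactly 0, many landmark percentiles computed over the whole volume collapse to the same value, which is the situation the SingleIntensityAccumulationError describes. Restricting the computation to the foreground (the same i > 0 mask as above) gives strictly increasing landmarks. The volume and the percentile list are illustrative assumptions, not medpy's internals or defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an MRI volume: about 70% background voxels at exactly 0,
# about 30% "tissue" voxels with intensities between 100 and 1000.
volume = np.zeros((20, 50, 50), dtype=np.float64)
tissue = rng.random(volume.shape) < 0.3
volume[tissue] = rng.uniform(100, 1000, size=tissue.sum())

percentiles = [1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99]

# Whole volume: every low percentile falls on the background value 0,
# so several landmarks are equal.
whole = np.percentile(volume, percentiles)

# Foreground only (same kind of mask as `i > 0` in the answer):
# the landmarks are distinct and increasing.
mask = volume > 0
fg = np.percentile(volume[mask], percentiles)

print(whole)
print(fg)
```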
I am trying to train an image classifier in scikit-learn. I have a bunch of input images and I am using Pillow to process them. My question is about what shape to give the Pillow data to scikit-learn.
This is my code now:
```python
import glob
import numpy as np
from PIL import Image, ImageFilter
from sklearn import svm
training = glob.glob('./img/training/*/*.bmp')
data = []
classes = []
for imagefile in training:
    edges = Image.open(imagefile).filter(ImageFilter.FIND_EDGES).convert("L")
    in_data = np.asarray(edges, dtype=np.uint8)
    data.append(in_data[0])
    if 'class1' in imagefile:
        classes.append('class1')
    else:
        classes.append('class2')
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(data, classes)
```
This runs without errors, but I have put the code together fairly crudely and I am not sure it is correct.
In particular, I'm not sure whether I should be using in_data[0]. I just did this because using in_data gives me an error: ValueError: Found array with dim 3. Estimator expected <= 2.
Unless you want only the first row of each image matrix (in_data[0] returns the first row), you probably want to flatten the image.
Flattening takes the rows of the image matrix and places them one after another in a single one-dimensional vector.
So the line becomes data.append(in_data.flatten())
You could resize your image to a smaller format first, to reduce the number of columns of your data matrix.
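A sketch of the flattening approach with resizing, assuming the images can differ in size: resizing to a fixed size (32x32 here, an arbitrary choice) gives every image the same number of features, which SVC.fit needs. The helper name, random images and labels are hypothetical stand-ins for the .bmp files.

```python
import numpy as np
from PIL import Image, ImageFilter
from sklearn import svm


def image_to_features(img, size=(32, 32)):
    """Edge-filter, greyscale, resize and flatten one Pillow image."""
    edges = img.filter(ImageFilter.FIND_EDGES).convert("L")
    # A fixed size gives every image the same number of features (32 * 32 = 1024).
    edges = edges.resize(size)
    return np.asarray(edges, dtype=np.uint8).flatten()


# Hypothetical stand-ins for the .bmp files: random RGB images of different sizes.
rng = np.random.default_rng(0)
images = [Image.fromarray(rng.integers(0, 256, (h, w, 3), dtype=np.uint8))
          for h, w in [(40, 50), (64, 64), (30, 80), (100, 90)]]
classes = ["class1", "class2", "class1", "class2"]

data = np.array([image_to_features(img) for img in images])  # one row per image
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(data, classes)
print(data.shape)  # (4, 1024)
```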