My own OCR program in Python

I am still a beginner, but I want to write a character recognition program. The program isn't ready yet, and I have edited it a lot, so the comments may not match the code exactly. I will use 8-connectivity for the connected component labeling.
from PIL import Image
import numpy as np

im = Image.open("D:\\Python26\\PYTHON-PROGRAMME\\bild_schrift.jpg")
w, h = im.size
w = int(w)
h = int(h)

# 2D array for area
area = []
for x in range(w):
    area.append([])
    for y in range(h):
        area[x].append(2)  # number 0 is white, number 1 is black

# 2D array for letter
letter = []
for x in range(50):
    letter.append([])
    for y in range(50):
        letter[x].append(0)

# 2D array for label
label = []
for x in range(50):
    label.append([])
    for y in range(50):
        label[x].append(0)

# image to number conversion
pix = im.load()
threshold = 200
for x in range(w):
    for y in range(h):
        aaa = pix[x, y]
        bbb = aaa[0] + aaa[1] + aaa[2]  # total value
        if bbb <= threshold:
            area[x][y] = 1
        if bbb > threshold:
            area[x][y] = 0

np.set_printoptions(threshold='nan', linewidth=10)

# matrix transposition
ccc = np.array(area)
area = ccc.T  # better solution?

# find all black pixels and set temporary label numbers
i = 1
for x in range(40):  # width (later)
    for y in range(40):  # height (later)
        if area[x][y] == 1:
            letter[x][y] = 1
            label[x][y] = i
            i += 1

# connected component labeling
for x in range(40):  # width (later)
    for y in range(40):  # height (later)
        if area[x][y] == 1:
            label[x][y] = i
            # if pixel has a neighbour:
            if area[x][y+1] == 1:
                # pixel and neighbour get the lowest label
                pass  # tomorrow's work
            if area[x+1][y] == 1:
                # pixel and neighbour get the lowest label
                pass  # tomorrow's work
            # should I also compare pixel and left neighbour?

# find width of the letter
# find height of the letter
# find the middle of the letter
# middle = [width/2][height/2]  # ?
# divide letter into 30 parts --> 5 x 6 array
# model letter
# letters A-Z, a-z, 0-9 (maybe more)
# compare each of the 30 parts of the letter with all model letters
# make a weighting

# print(letter)
im.save("D:\\Python26\\PYTHON-PROGRAMME\\bild2.jpg")
print('done')

OCR is not an easy task indeed. That's why text CAPTCHAs still work :)
To talk only about the letter extraction and not the pattern recognition, the technique you are using to separate the letters is called Connected Component Labeling. Since you are asking for a more efficient way to do this, try to implement the two-pass algorithm that's described in this article. Another description can be found in the article Blob extraction.
EDIT: Here's the implementation for the algorithm that I have suggested:
import sys
from PIL import Image, ImageDraw

class Region():
    def __init__(self, x, y):
        self._pixels = [(x, y)]
        self._min_x = x
        self._max_x = x
        self._min_y = y
        self._max_y = y

    def add(self, x, y):
        self._pixels.append((x, y))
        self._min_x = min(self._min_x, x)
        self._max_x = max(self._max_x, x)
        self._min_y = min(self._min_y, y)
        self._max_y = max(self._max_y, y)

    def box(self):
        return [(self._min_x, self._min_y), (self._max_x, self._max_y)]

def find_regions(im):
    width, height = im.size
    regions = {}
    pixel_region = [[0 for y in range(height)] for x in range(width)]
    equivalences = {}
    n_regions = 0
    # first pass. find regions.
    for x in xrange(width):
        for y in xrange(height):
            # look for a black pixel
            if im.getpixel((x, y)) == (0, 0, 0, 255):  # BLACK
                # get the region number from north or west
                # or create new region
                region_n = pixel_region[x-1][y] if x > 0 else 0
                region_w = pixel_region[x][y-1] if y > 0 else 0
                max_region = max(region_n, region_w)
                if max_region > 0:
                    # a neighbour already has a region
                    # new region is the smallest > 0
                    new_region = min(filter(lambda i: i > 0, (region_n, region_w)))
                    # update equivalences
                    if max_region > new_region:
                        if max_region in equivalences:
                            equivalences[max_region].add(new_region)
                        else:
                            equivalences[max_region] = set((new_region, ))
                else:
                    n_regions += 1
                    new_region = n_regions
                pixel_region[x][y] = new_region
    # Scan image again, assigning all equivalent regions the same region value.
    for x in xrange(width):
        for y in xrange(height):
            r = pixel_region[x][y]
            if r > 0:
                while r in equivalences:
                    r = min(equivalences[r])
                if not r in regions:
                    regions[r] = Region(x, y)
                else:
                    regions[r].add(x, y)
    return list(regions.itervalues())

def main():
    im = Image.open(r"c:\users\personal\py\ocr\test.png")
    regions = find_regions(im)
    draw = ImageDraw.Draw(im)
    for r in regions:
        draw.rectangle(r.box(), outline=(255, 0, 0))
    del draw
    #im.show()
    output = file("output.png", "wb")
    im.save(output)
    output.close()

if __name__ == "__main__":
    main()
It's not 100% perfect, but since you are doing this only for learning purposes, it may be a good starting point. With the bounding box of each character you can now use a neural network as others have suggested here.
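As a possible next step (purely illustrative, using the Region class above), you could crop each bounding box out of the image and shrink it to a small fixed grid, which is then easy to hand to whatever classifier you choose; the 5 x 6 grid mirrors the "30 parts" idea from the question.
for r in regions:
    (min_x, min_y), (max_x, max_y) = r.box()
    glyph = im.crop((min_x, min_y, max_x + 1, max_y + 1))   # cut out one character
    small = glyph.resize((5, 6))                            # 5 x 6 grid, 30 cells
    features = list(small.getdata())                        # 30 values to compare against model letters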

OCR is very, very hard. Even with computer-generated characters, it's quite challenging if you don't know the font and font size in advance. Even if you're matching characters exactly, I would not call it a "beginning" programming project; it's quite subtle.
If you want to recognize scanned, or handwritten characters, that's even harder - you'll need to use advanced math, algorithms, and machine learning. There are quite a few books and thousands of articles written about this topic, so you don't need to reinvent the wheel.
I admire your effort, but I don't think you've gotten far enough to hit any of the actual difficulties yet. So far you're just randomly exploring pixels and copying them from one array to another. You haven't actually done any comparison yet, and I'm not sure of the purpose of your "random walk".
Why random? Writing correct randomized algorithms is quite difficult. I would recommend starting with a deterministic algorithm first.
Why are you copying from one array to another? Why not just compare directly?
When you get to the comparison, you'll have to deal with the fact that the image is not exactly the same as the "prototype", and it's not clear how you'll handle that.
Based on the code you've written so far, though, I have an idea for you: try writing a program that finds its way through a "maze" in an image. The input would be the image, plus the start pixel and the goal pixel. The output is a path through the maze from the start to the goal. This is a much easier problem than OCR - solving mazes is something that computers are great for - but it's still fun and challenging.
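If you go the maze route, a plain breadth-first search over the pixel grid is enough. Here is a rough sketch under assumed conventions (grid is a 2D list where 1 means walkable, start and goal are (x, y) tuples; all names are illustrative):
from collections import deque

def solve_maze(grid, start, goal):
    # Breadth-first search: explore the maze level by level and record
    # each cell's parent so the path can be rebuilt once the goal is reached.
    width, height = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            path = []
            node = goal
            while node is not None:      # walk back through the parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height \
                    and grid[nx][ny] == 1 and (nx, ny) not in parents:
                parents[(nx, ny)] = (x, y)
                queue.append((nx, ny))
    return None                          # no path from start to goal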

Most OCR algorithms these days are based on neural network algorithms. Hopfield networks are a good place to start. Based on the Hopfield model available here in C, I built a very basic image recognition algorithm in Python similar to what you describe. I've posted the full source here. It's a toy project and not suitable for real OCR, but it can get you started in the right direction.
The Hopfield model is used as an autoassociative memory to store and recall a set of bitmap images. Images are stored by calculating a corresponding weight matrix. Thereafter, starting from an arbitrary configuration, the memory will settle on exactly the stored image that is nearest to the starting configuration in terms of Hamming distance. Thus, given an incomplete or corrupted version of a stored image, the network is able to recall the corresponding original image.
A Java applet to toy with an example can be found here; the network is trained with example inputs for the digits 0-9. Draw in the box on the right, click test and see the results from the network.
Don't let the mathematical notation intimidate you; the algorithms are straightforward once you get to the source code.
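If it helps to see the mechanics without the math, here is a tiny toy sketch (my own illustrative version, not the linked C code) of a Hopfield autoassociative memory: patterns are flattened +1/-1 bitmaps, the weight matrix comes from the Hebbian outer-product rule, and recall repeatedly updates the state until it settles.
import numpy as np

def train(patterns):
    # Hebbian learning: sum of outer products, no self-connections
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / patterns.shape[0]

def recall(W, state, max_steps=50):
    # Synchronous updates until the state stops changing (an attractor)
    for _ in range(max_steps):
        new_state = np.where(W.dot(state) >= 0, 1, -1)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state

# Toy usage: store two 25-"pixel" patterns, then recall from a corrupted copy
patterns = np.array([[1] * 13 + [-1] * 12, [-1] * 12 + [1] * 13])
W = train(patterns)
noisy = patterns[0].copy()
noisy[:3] *= -1                                          # flip a few "pixels"
print(np.array_equal(recall(W, noisy), patterns[0]))     # should print True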

OCR is very, very difficult! Which approach to use will depend on what you are trying to accomplish (handwriting recognition, computer-generated text reading, etc.).
However, to get you started, read up on Neural Networks and OCR. Here are a few jump-right-in articles on the subject:
http://www.codeproject.com/KB/cs/neural_network_ocr.aspx
http://www.codeproject.com/KB/dotnet/simple_ocr.aspx
Use your favorite search engine to find information.
Have fun!

Related

Finding lines manually in an image

I have an image (saved as a variable called canny_image) and it looks like this after preprocessing.
I am basically trying to find the distance between the first two vertical lines. I tried using the hough_line function from skimage, but it's unable to find the first line, so I thought it might be easier to solve this manually.
I am basically trying to solve this by going through each row in the image until I get to the first pixel with a value of 255 (the lines have a value of 255, while everything else is zero), and then storing the location of that pixel in an array. I take the mode of the values in the array as the x location of the first line. I'll do the same for the 2nd line, using the first x-value as a starting point.
def find_lines(canny_image):
    threshold = 255
    for y in range(canny_image.shape[0]):
        for x in range(canny_image.shape[1]):
            if canny_image[x, y] == threshold:
                return x
This is the code I wrote to get the x-location of the first line; however, I'm not getting the desired output. Any help on how to solve this will be much appreciated. Thanks!
Perhaps try something like this:
import numpy as np

# Returns an array of line x positions in an image
def find_line_x_positions(image, lines_to_detect: int, buffer_zone: int):
    threshold = 255
    (height, width) = image.shape
    x_position_sums = np.zeros((lines_to_detect, 2), np.double)  # For each line, store x_pos sum [i, 0] and point count [i, 1]
    for y in range(height):
        buffer = 0
        line_index = 0
        for x in range(width):
            if buffer > 0:
                buffer -= 1
            if (image[y, x] >= threshold) and (buffer == 0):
                buffer = buffer_zone
                x_position_sums[line_index, 0] += x
                x_position_sums[line_index, 1] += 1
                line_index += 1
                if line_index == lines_to_detect:
                    break
    # Divide the x position sums by the point counts to get the average x position for each line
    results = x_position_sums[np.all(x_position_sums, axis=1)]
    results = np.divide(results[:, 0], results[:, 1])
    return results
You can also try OpenCV's HoughLines() function, which is simpler to use than the scikit-image implementation. When I tested the OpenCV implementation, it seemed to have a hard time finding vertical lines (within 10 degrees of vertical), but you can work around this by rotating your image by X degrees and looking for lines within that range of rotation.
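For what it's worth, here is a rough sketch of that rotation trick (the file name and the accumulator threshold of 100 are placeholders you would tune for your image):
import cv2
import numpy as np

edges = cv2.imread("canny_image.png", cv2.IMREAD_GRAYSCALE)   # assumed file name
rotated = cv2.rotate(edges, cv2.ROTATE_90_CLOCKWISE)          # vertical lines become horizontal
lines = cv2.HoughLines(rotated, 1, np.pi / 180, 100)          # rho res 1 px, theta res 1 degree
if lines is not None:
    for rho, theta in lines[:, 0]:
        # in the rotated frame, the originally vertical lines show up near theta = 90 degrees
        print(rho, np.degrees(theta))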

Handwritten cross inside a square image recognition

This is a long one. I have been trying (for way too long) to differentiate between crossed-out squares, completely blacked-out squares, blank squares, crossed-out/blacked-out circles and blank circles gathered from a scanned image like the one you can see. My current approach (link to the repo here; the focus is on the function detect_answers in evaluator.py, sorry for the occasional Italian comments/names) is:
Find the outer black borders;
Align the image to compensate for scanning misalignment;
Retrieve ID barcode;
Divide the whole image in small square such that each one contains either a circle or a square;
Classify each square based on the category mentioned above (crossed out square in green, completely blacked out square in blue, blank square can be left unprocessed, crossed out/blacked out circle in red and blank circle).
def detect_answers(bgr_image: np.array, bgr_img_for_debug: np.array,
                   x_cut_positions: List[int], y_cut_positions: Tuple[int],
                   is_60_question_sim, debug: str):
    question_multiplier: int = 15 if is_60_question_sim else 20
    letter: Tuple[str, ...] = ("L", "", "A", "B", "C", "D", "E")
    user_answer_dict: Dict[int, str] = {i: "" for i in range(1, 61 - 20 * int(not is_60_question_sim))}
    gr_image: np.array = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # load SVM model
    load_path = os.getcwd()
    clf = load(os.path.join(load_path, "reduced.joblib"))

    for y_index in range(len(y_cut_positions) - 1):
        for x_index in range(len(x_cut_positions) - 1):
            # if you are on a column with only numbers, skip it
            if not (x_index - 1) % 7:
                continue
            x_top_left = int(not x_index % 7) * 7 + x_cut_positions[x_index]
            x_bottom_right = int(not x_index % 7) * 2 + x_cut_positions[x_index + 1]
            y_top_left: int = y_cut_positions[y_index]
            y_bottom_right: int = y_cut_positions[y_index + 1]
            crop_for_prediction: np.array = gr_image[y_top_left:y_bottom_right, x_top_left:x_bottom_right]
            crop_for_prediction: np.array = cv2.resize(crop_for_prediction, (18, 18))
            # category = ("QB", "QS", "QA", "CB", "CA")
            #               0     1     2     3     4
            crop_for_prediction: np.array = np.append(crop_for_prediction,
                                                      [x_index % 7, int(np.mean(crop_for_prediction))])
            predicted_category_index: int = clf.predict([crop_for_prediction])[0]
    return user_answer_dict
I am expecting to have a list with the position of each relevant coloured square and its category. Currently this is done via a model trained on previously hand-labeled data (the file can be found in the GitHub repo). I have tried many different approaches (mean of each square, counting black pixels, ...), but no method seems reliable enough to handle all the variation that handwriting provides. Furthermore, the algorithm needs to be quite fast, since each run has to evaluate 500-800 tests.
Sample input
Expected output
Correct classification (ignore the red numbers)
Wrong classification
In particular, question 1D should be surrounded by a red contour (since it's a blacked-out square), and almost everything in the last column was classified incorrectly.
I'm at your disposal for any questions, clarifications or discussion. Cheers.

Implementing from scratch cv2.warpPerspective()

I was making some experimentations with the OpenCV function cv2.warpPerspective when I decided to code it from scratch to better understand its pipeline. Though I followed (hopefully) every theoretical step, it seems I am still missing something and I am struggling a lot to understand what. Could you please help me?
SRC image (left) and True DST Image (right)
Output of the cv2.warpPerspective overlapped on the True DST
# Invert the homography SRC->DST to DST->SRC
hinv = np.linalg.inv(h)
src = gray1
dst = np.zeros(gray2.shape)
h, w = src.shape

# Remap back and check the domain
for ox in range(h):
    for oy in range(w):
        # Backproject from DST to SRC
        xw, yw, w = hinv.dot(np.array([ox, oy, 1]).T)
        # cv2.INTER_NEAREST
        x, y = int(xw/w), int(yw/w)
        # Check if it falls in the src domain
        c1 = x >= 0 and y < h
        c2 = y >= 0 and y < w
        if c1 and c2:
            dst[x, y] = src[ox, oy]

cv2.imshow(dst + gray2//2)
Output of my code
PS: The output images are the overlapping of Estimated DST and the True DST to better highlight differences.
Your issue amounts to a typo. You mixed up the naming of your coordinates. The homography assumes (x,y,1) order, which would correspond to (j,i,1).
Just use (x, y, 1) in the calculation, and (xw, yw, w) in the result of that (then x,y = xw/w, yw/w). The w factor mirrors the math when formulated properly.
Avoid indexing into .shape. The indices don't "speak". Just do (height, width) = src.shape[:2] and use those.
I'd recommend fixing the naming scheme, or defining it up top in a comment. I'd recommend sticking with x,y instead of i,j,u,v, and then extending those with prefixes/suffixes for the space they're in ("src/dst/in/out"). Perhaps something like ox,oy for iterating, xw,yw,w for the homography result, which turns into x,y via division, and ix,iy (integerized) for sampling in the input? Then you can use dst[oy, ox] = src[iy, ix].
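Putting that together, a minimal sketch of the corrected loop might look like this (assuming h, gray1 and gray2 from the question; the names follow the suggestions above, untested against your data):
import numpy as np

# Backward mapping DST -> SRC with the suggested naming:
# (ox, oy) iterate the output image, (ix, iy) sample the input image.
hinv = np.linalg.inv(h)                      # h is the SRC->DST homography
src, dst = gray1, np.zeros_like(gray2)
(height, width) = src.shape[:2]

for oy in range(dst.shape[0]):               # output rows (y)
    for ox in range(dst.shape[1]):           # output cols (x)
        xw, yw, w = hinv.dot(np.array([ox, oy, 1.0]))   # homogeneous (x, y, 1)
        ix, iy = int(xw / w), int(yw / w)    # nearest-neighbour rounding
        if 0 <= ix < width and 0 <= iy < height:        # inside the src domain?
            dst[oy, ox] = src[iy, ix]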

Fast Connected Component Labeling in Python

I am trying to identify connected regions of pixels in an image stack. Since it is a stack, the input is quite large (on the order of 10 million pixels, although only about 1 million are bright), and looping through it pixel by pixel is extremely slow. Nonetheless, my first attempt works as follows:
1) Check if current pixel is bright, if so go to 2. If not go to 4.
2) Obtain slice of image containing all 26 neighbors. Set all neighbors that have not yet been scanned to be dark. Set all neighbors that fall outside of the image (e.g. indices of -1) to be dark. Set current pixel to dark.
3) If all pixels in slice are dark, assign current pixel a new label. If exactly one pixel in slice is bright, set current pixel label to be equal to this bright pixel's label. If more than one pixel is bright, set current pixel to lowest of bright pixel labels, and record the equivalence of these labels.
4) Advance one row. Once all rows in the column are scanned, advance one column. Once every pixel in the slice is scanned, advance one slice in the stack. Once the entire image is scanned, advance to 5.
5) Loop through the list of equivalencies, reassigning labels in the output.
I don't think the processing in steps 2 and 3 is much slower than in other, similar algorithms, but I could be wrong. I think the major bottleneck is simply looping through each pixel. I know the bottleneck is not step 5 because I have never been patient enough to reach that step. Even just looping over the image indices with a print statement is quite slow. I know that MATLAB can do this analysis quickly on the same data, so I suspect there must be a smarter way to do this with a very different approach. I have read something vague about performing dilation and using intersections, but I can't work out what mechanism was being alluded to.
Is there something fast and clever out there, or is this basically the only way to do it? How is the MATLAB algorithm so much faster?
for i in xrange(0,nPix):
    for j in xrange(0,nPix):
        for k in xrange(0,nFrames):
            if img[i,j,k] != 0: # If the current pixel is bright
                # Construct an array of already scanned
                # neighbors. Pixels are scanned first in z, then y,
                # then x.
                # Construct indices allowing for periodic boundary
                ir = np.array(range(i-1,i+2)).reshape(-1,1,1)%nPix
                jr = np.array(range(j-1,j+2)).reshape(1,-1,1)%nPix
                kr = np.array(range(k-1,k+2)).reshape(1,1,-1)%nFrames
                pastNeighbors = img[ir,jr,kr] # Includes i-1:i+1
                # Set all indices outside image and "future neighbors" to 0
                pastNeighbors[2,:,:] = 0
                pastNeighbors[1,2,:] = 0
                pastNeighbors[1,1,2] = 0
                pastNeighbors[1,1,1] = 0 # Pixel itself; not a neighbor
                # Eliminate periodic boundary
                if i-1 < 0: pastNeighbors[0,:,:] = 0
                if i+1 == nPix: pastNeighbors[2,:,:] = 0
                if j-1 < 0: pastNeighbors[:,0,:] = 0
                if j+1 == nPix: pastNeighbors[:,2,:] = 0
                if k-1 < 0: pastNeighbors[:,:,0] = 0
                if k+1 == nFrames: pastNeighbors[:,:,2] = 0
                if np.count_nonzero(pastNeighbors) == 0: # If all dark neighbors
                    label = label + 1 # Assign new label to pixel
                    out[i,j,k] = label
                elif np.count_nonzero(pastNeighbors) == 1: # One bright neighbor
                    relX, relY, relZ = np.nonzero(pastNeighbors)
                    relX = relX - 1
                    relY = relY - 1
                    relZ = relZ - 1
                    out[i,j,k] = out[i + relX, j + relY, k + relZ]
                else: # Multiple bright neighbors
                    relX, relY, relZ = np.nonzero(pastNeighbors)
                    relX = relX - 1
                    relY = relY - 1
                    relZ = relZ - 1
                    equiv = out[i + relX, j + relY, k + relZ]
                    out[i, j, k] = np.min(equiv) # Assign one of the
                                                 # multiple labels
                    for i in xrange(0,equiv.size):
                        equivList = np.append(equivList, [[np.min(equiv),equiv[i]]], axis = 0)

Correct method and Python package that can find width of an image's feature

The input is a spectrum with colorful (sorry) vertical lines on a black background. Given the approximate x coordinate of that band (as marked by X), I want to find the width of that band.
I am unfamiliar with image processing. Please direct me to the correct method of image processing and a Python image processing package that can do the same.
I am thinking of PIL; OpenCV gives me the impression of being overkill for this particular application.
What if I want to make this an expert system that can classify them in the future?
I'll give a complete minimal working example (as suggested by sega_sai). I don't have access to your original image, but you'll see it doesn't really matter! The peak distributions found by the code below are:
Mean values at: 26.2840960523 80.8255092125
import Image
from scipy import *
from scipy.optimize import leastsq

# Load the picture with PIL, process if needed
pic = asarray(Image.open("band2.png"))

# Average the pixel values along vertical axis
pic_avg = pic.mean(axis=2)
projection = pic_avg.sum(axis=0)

# Set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()

# Fit function, two gaussians, adjust as needed
def fitfunc(p,x):
    return p[0]*exp(-(x-p[1])**2/(2.0*p[2]**2)) + \
           p[3]*exp(-(x-p[4])**2/(2.0*p[5]**2))
errfunc = lambda p, x, y: fitfunc(p,x)-y

# Use scipy to fit, p0 is initial guess
p0 = array([0,20,1,0,75,10])
X = arange(len(projection))  # needs to be an array, not xrange, for the fit
p1, success = leastsq(errfunc, p0, args=(X,projection))
Y = fitfunc(p1,X)

# Output the result
print "Mean values at: ", p1[1], p1[4]

# Plot the result
from pylab import *
subplot(211)
imshow(pic)
subplot(223)
plot(projection)
subplot(224)
plot(X,Y,'r',lw=5)
show()
Below is a simple thresholding method to find the lines and their widths; it should work quite reliably for any number of lines. The yellow and black image below was processed using this script, and the red/black plot illustrates the found lines, using parameters of threshold = 0.3 and min_line_width = 5.
The script averages the rows of an image, and then determines the start and end position of each line based on a threshold (which you can set between 0 and 1) and a minimum line width (in pixels). By using thresholding and a minimum line width you can easily filter your input images to get the lines out of them. The first function, find_lines, returns all the lines in an image as a list of tuples containing the start, end, center, and width of each line. The second function, find_closest_band_width, is called with the specified x_position and returns the width of the line closest to that position (assuming you want the distance to the centre of each line). As the lines are saturated (255 cut-off per channel), their cross-sections are not far from a uniform distribution, so I don't believe trying to fit any kind of distribution would really help; it just adds unnecessary complication.
import Image, ImageStat

def find_lines(image_file, threshold, min_line_width):
    im = Image.open(image_file)
    width, height = im.size
    hist = []
    lines = []
    start = end = 0
    for x in xrange(width):
        column = im.crop((x, 0, x + 1, height))
        stat = ImageStat.Stat(column)
        ## normalises by 2 * 255 as in your example the colour is yellow
        ## if your images start using white lines change this to 3 * 255
        hist.append(sum(stat.sum) / (height * 2 * 255))
    for index, value in enumerate(hist):
        if value > threshold and end >= start:
            start = index
        if value < threshold and end < start:
            if index - start < min_line_width:
                start = 0
            else:
                end = index
                center = start + (end - start) / 2.0
                width = end - start
                lines.append((start, end, center, width))
    return lines

def find_closest_band_width(x_position, lines):
    distances = [((value[2] - x_position) ** 2) for value in lines]
    index = distances.index(min(distances))
    return lines[index][3]

## set your threshold, and min_line_width for finding lines
lines = find_lines("8IxWA_sample.png", 0.7, 4)

## sets x_position to 59th pixel
print 'width of nearest line:', find_closest_band_width(59, lines)
I don't think you need anything fancy for your particular task. I would just use PIL + scipy; that should be enough. You essentially need to take your image, make a 1D projection of it, and then fit a Gaussian or something like that to it. The information about the approximate location of the band should be used as a first guess for the fitter.
