I was experimenting with the OpenCV function cv2.warpPerspective and decided to code it from scratch to better understand its pipeline. Though I followed (hopefully) every theoretical step, it seems I am still missing something, and I am struggling to understand what. Could you please help me?
SRC image (left) and True DST Image (right)
Output of the cv2.warpPerspective overlapped on the True DST
# Invert the homography SRC->DST to DST->SRC
hinv = np.linalg.inv(h)
src = gray1
dst = np.zeros(gray2.shape)
h, w = src.shape
# Remap back and check the domain
for ox in range(h):
    for oy in range(w):
        # Backproject from DST to SRC
        xw, yw, w = hinv.dot(np.array([ox, oy, 1]).T)
        # cv2.INTER_NEAREST
        x, y = int(xw/w), int(yw/w)
        # Check if it falls in the src domain
        c1 = x >= 0 and x < h
        c2 = y >= 0 and y < w
        if c1 and c2:
            dst[x, y] = src[ox, oy]
cv2.imshow("result", dst + gray2//2)
Output of my code
PS: The output images are the Estimated DST overlaid on the True DST to better highlight differences.
Your issue amounts to a typo. You mixed up the naming of your coordinates. The homography assumes (x,y,1) order, which corresponds to (j,i,1) in array-index terms.
Just use (x, y, 1) in the calculation and (xw, yw, w) for its result (then x,y = xw/w, yw/w). The w factor mirrors the math when it is formulated properly.
Avoid indexing into .shape; bare indices don't "speak". Just do (height, width) = src.shape[:2] and use those names.
I'd recommend fixing the naming scheme, or defining it up top in a comment, as shown below. Stick with x,y instead of i,j,u,v, and extend those with prefixes/suffixes for the space they're in ("src/dst/in/out"): perhaps ox,oy for iterating over the output, xw,yw,w for the homography result, which turns into x,y via division, and ix,iy (integerized) for sampling the input. Then you can write dst[oy, ox] = src[iy, ix].
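Following that naming scheme, a corrected version of the loop might look like this. This is a sketch, not authoritative code: it assumes h is the SRC->DST homography and that src and dst are same-sized grayscale images, as in the question.
import numpy as np

hinv = np.linalg.inv(h)                    # h maps SRC -> DST, so invert it
(height, width) = src.shape[:2]
dst = np.zeros_like(src)

for oy in range(height):                   # output row    = y
    for ox in range(width):                # output column = x
        # Backproject (x, y, 1) from DST to SRC
        xw, yw, w = hinv @ np.array([ox, oy, 1])
        ix, iy = int(xw / w), int(yw / w)  # cv2.INTER_NEAREST
        if 0 <= ix < width and 0 <= iy < height:
            dst[oy, ox] = src[iy, ix]      # arrays are indexed [row, col] = [y, x]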
Related
I am working with a dataset whose images have shape (105,105,3). Since the raw image gives a range of 0 to 105 on the x axis and 105 to 0 on the y axis, I am converting the image into (x,y) datapoints with the code below, where example is the raw image:
example = x_train[21]
x = []
y = []
i1 = np.argwhere(example < 1)
H = i1.shape[1]
i1[:1] = [H-1]-i1[:1]
for i in i1:
    x.append(i[0])
    y.append(i[1])
plt.scatter(x,y)
xy = np.column_stack((x,y))
This code changes the original image to this scatter plot:
Obviously I am doing something wrong, and there is some random fricken dot in the bottom left corner that is driving me crazy. Is there a better, less stupid way to do this? Ideally a nice mapping with the proper orientation and size would be great.
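Two observations on the code above: np.argwhere on the 3D array returns (row, column, channel) triples, so every point is reported once per channel; and H = i1.shape[1] is 3 (the number of index columns), not the image height, so i1[:1] = [H-1]-i1[:1] rewrites only the first index row, which is likely where the stray dot comes from. For reference, a vectorized sketch of the mapping described (assuming the dark pixels are the points of interest and x_train as in the question) might be:
import numpy as np
import matplotlib.pyplot as plt

example = x_train[21]                  # (105, 105, 3) image
mask = (example < 1).any(axis=2)       # pixels dark in any channel
rows, cols = np.nonzero(mask)
x = cols                               # column index maps to x
y = example.shape[0] - 1 - rows        # flip rows so y runs bottom-to-top
plt.scatter(x, y)
xy = np.column_stack((x, y))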
I have three ndarrays: Y.shape (307200,), U.shape (153599,), and V.shape (153599,). What is an efficient way to convert them to BGR with OpenCV in Python? The arrays are in YUV_420_888 format.
The target my_image is 640*640. My code is:
Y = np.frombuffer(Y, dtype=np.uint8)
U = np.frombuffer(U, dtype=np.uint8)
V = np.frombuffer(V, dtype=np.uint8)
Y= np.reshape(Y, (480,640))
U= np.reshape(U, (480,320))
V= np.reshape(V, (480,320))
YUV = np.append(Y,U)
YUV = np.append(YUV,V)
img = np.reshape(YUV,(960,640))
img = np.asarray(img, dtype = np.uint8)
img = cv2.cvtColor(img, cv2.COLOR_YUV2BGR_NV21)
Updated Answer
The information here tells me that an Android NV21 image is stored with all the Y (luminance) values contiguous and sampled at full resolution, followed by the V and U samples interleaved and stored at 1/4 the resolution (1/2 the height by 1/2 the width). I have created a dummy NV21 frame below and converted it into OpenCV BGR format, which confirms the layout and the way OpenCV interprets it too. All the code below works in order from top to bottom, so just remove the images and squidge all the lines up together to make a Python script:
#!/usr/bin/env python3
import cv2
import numpy as np
# Define width and height of image
w,h = 640,480
# Create black-white gradient from top to bottom in Y channel
f = lambda i, j: int((i*256)/h)
Y = np.fromfunction(np.vectorize(f), (h,w)).astype(np.uint8)
# DEBUG
cv2.imwrite('Y.jpg',Y)
That gives Y:
# Dimensions of subsampled U and V
UVwidth, UVheight = w//2, h//2
# U is a black-white gradient from left to right
f = lambda i, j: int((j*256)/UVwidth)
U = np.fromfunction(np.vectorize(f), (UVheight,UVwidth)).astype(np.uint8)
# DEBUG
cv2.imwrite('U.jpg',U)
That gives U:
# V is a white-black gradient from left to right
V = U[:,::-1]
# DEBUG
cv2.imwrite('V.jpg',V)
That gives V:
# Interleave U and V: V first for NV21, U first for NV12
U = np.ravel(U)
V = np.ravel(V)
UV = np.empty((U.size+V.size), dtype=np.uint8)
UV[0::2] = V
UV[1::2] = U
# Lay out Y plane, followed by UV
YUV = np.append(Y,UV).reshape((3*h)//2,w)
BGR = cv2.cvtColor(YUV.astype(np.uint8), cv2.COLOR_YUV2BGR_NV21)
cv2.imwrite('result.jpg',BGR)
Which gives this. Hopefully you can see how that is the correct RGB representation of the individual Y, U, and V components.
So, in summary, I believe a 2x2 NV21 image is stored with interleaved VU, like this:
Y Y Y Y V U V U
and a 2x2 NV12 image is stored with interleaved UV, like this:
Y Y Y Y U V U V
and a YUV420 image (Raspberry Pi) is stored fully planar:
Y Y Y Y U U V V
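For the planar arrays in the question, then, the assembly might look like the following sketch. Here y_bytes, u_bytes and v_bytes are placeholder names for the raw buffers, and U and V are assumed to be truly planar and a quarter of Y's size each (see the note below about the sizes in the question):
import cv2
import numpy as np

w, h = 640, 480
Y = np.frombuffer(y_bytes, dtype=np.uint8).reshape(h, w)        # full-resolution luma
U = np.frombuffer(u_bytes, dtype=np.uint8).reshape(h//2, w//2)  # quarter-size chroma
V = np.frombuffer(v_bytes, dtype=np.uint8).reshape(h//2, w//2)

# Interleave V and U (V first for NV21), append to Y, reshape to 1.5*h rows
VU = np.empty(U.size + V.size, dtype=np.uint8)
VU[0::2] = V.ravel()
VU[1::2] = U.ravel()
YUV = np.concatenate((Y.ravel(), VU)).reshape(3 * h // 2, w)
BGR = cv2.cvtColor(YUV, cv2.COLOR_YUV2BGR_NV21)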
Original Answer
I don't have your data to test with, and your question is missing some details, but I see no one has answered you after 5 hours, so I'll try to get you started... no one said answers have to be complete.
Firstly, I guess from your Y.shape (307200,) that your image is 640x480 pixels, correct?
Secondly, your U.shape (153599,) and V.shape (153599,) look incorrect: since the chroma is subsampled 2:1 in each direction, each of U and V should be exactly a quarter the size of Y (76,800 values), so the two together make up half of Y.
Once you have that sorted out, I think you need to take your Y array and append the U array, then the V array, so you have one single contiguous array. You then need to pass that to cvtColor() with the code cv2.COLOR_YUV2BGR_NV21.
You may need to reshape your array before appending, something like im = Y.reshape(480,640).
I know that when you use the C++ interface to OpenCV, you must set the height of the image to 1.5x the actual height (whilst leaving the width unchanged), so you may need to try that too.
I can never remember all the constants OpenCV provides for image-opening modes (like IMREAD_ANYDEPTH, IMREAD_GRAYSCALE) and for cvtColor(), so here's a handy way of finding them. I start ipython and, if I am looking for the Android NV21 constants, I do:
import cv2
[i for i in dir(cv2) if 'NV21' in i]
Out[29]:
['COLOR_YUV2BGRA_NV21',
'COLOR_YUV2BGR_NV21',
'COLOR_YUV2GRAY_NV21',
'COLOR_YUV2RGBA_NV21',
'COLOR_YUV2RGB_NV21']
So the constant you need is probably COLOR_YUV2BGR_NV21
The same technique works for parameters to imread():
items=[i for i in dir(cv2) if i.startswith('IMREAD')]
In [22]: items
['IMREAD_ANYCOLOR',
'IMREAD_ANYDEPTH',
'IMREAD_COLOR',
'IMREAD_GRAYSCALE',
'IMREAD_IGNORE_ORIENTATION',
'IMREAD_LOAD_GDAL',
'IMREAD_REDUCED_COLOR_2',
'IMREAD_REDUCED_COLOR_4',
'IMREAD_REDUCED_COLOR_8',
'IMREAD_REDUCED_GRAYSCALE_2',
'IMREAD_REDUCED_GRAYSCALE_4',
'IMREAD_REDUCED_GRAYSCALE_8',
'IMREAD_UNCHANGED']
The input is a spectrum with colorful (sorry) vertical lines on a black background. Given the approximate x coordinate of one band (marked by X), I want to find the width of that band.
I am unfamiliar with image processing. Please direct me to the correct method of image processing and a Python image processing package that can do the same.
I am thinking PIL; OpenCV gave me the impression of being overkill for this particular application.
What if I want to make this an expert system that can classify them in the future?
I'll give a complete minimal working example (as suggested by sega_sai). I don't have access to your original image, but you'll see it doesn't really matter! The peak distributions found by the code below are:
Mean values at: 26.2840960523 80.8255092125
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from scipy.optimize import leastsq

# Load the picture with PIL, process if needed
pic = np.asarray(Image.open("band2.png"))

# Average over the colour channels, then project onto the x axis
pic_avg = pic.mean(axis=2)
projection = pic_avg.sum(axis=0)

# Set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()

# Fit function: two Gaussians, adjust as needed
def fitfunc(p, x):
    return p[0]*np.exp(-(x-p[1])**2/(2.0*p[2]**2)) + \
           p[3]*np.exp(-(x-p[4])**2/(2.0*p[5]**2))

errfunc = lambda p, x, y: fitfunc(p, x) - y

# Use scipy to fit, p0 is the initial guess
p0 = np.array([0, 20, 1, 0, 75, 10], dtype=float)
X = np.arange(len(projection))
p1, success = leastsq(errfunc, p0, args=(X, projection))
Y = fitfunc(p1, X)

# Output the result
print("Mean values at:", p1[1], p1[4])

# Plot the result
plt.subplot(211)
plt.imshow(pic)
plt.subplot(223)
plt.plot(projection)
plt.subplot(224)
plt.plot(X, Y, 'r', lw=5)
plt.show()
Below is a simple thresholding method to find the lines and their widths; it should work quite reliably for any number of lines. The yellow-and-black image below was processed using this script; the red/black plot illustrates the found lines, using the parameters threshold = 0.3, min_line_width = 5.
The script averages each column of the image, then determines the start and end position of each line from a threshold (which you can set between 0 and 1) and a minimum line width (in pixels). By using thresholding and a minimum line width you can easily filter your input images to get the lines out of them. The first function, find_lines, returns all the lines in an image as a list of tuples containing the start, end, center, and width of each line. The second function, find_closest_band_width, is called with the specified x_position and returns the width of the line whose center is closest to that position. As the lines are saturated (255 cut-off per channel), their cross-sections are not far from a uniform distribution, so I don't believe fitting any kind of distribution would help much; it just complicates things unnecessarily.
from PIL import Image, ImageStat

def find_lines(image_file, threshold, min_line_width):
    im = Image.open(image_file)
    width, height = im.size
    hist = []
    lines = []
    start = end = 0
    for x in range(width):
        column = im.crop((x, 0, x + 1, height))
        stat = ImageStat.Stat(column)
        ## normalises by 2 * 255 as in your example the colour is yellow
        ## if your images start using white lines change this to 3 * 255
        hist.append(sum(stat.sum) / (height * 2 * 255))
    for index, value in enumerate(hist):
        if value > threshold and end >= start:
            start = index
        if value < threshold and end < start:
            if index - start < min_line_width:
                start = 0
            else:
                end = index
                center = start + (end - start) / 2.0
                line_width = end - start
                lines.append((start, end, center, line_width))
    return lines

def find_closest_band_width(x_position, lines):
    distances = [(value[2] - x_position) ** 2 for value in lines]
    index = distances.index(min(distances))
    return lines[index][3]

## set your threshold and min_line_width for finding lines
lines = find_lines("8IxWA_sample.png", 0.7, 4)

## x_position set to the 59th pixel
print('width of nearest line:', find_closest_band_width(59, lines))
I don't think you need anything fancy for your particular task.
I would just use PIL + scipy. That should be enough,
because you essentially need to take your image, make a 1D projection of it,
and then fit a Gaussian or something like that to it. The information about the approximate location of the band should be used as the first guess for the fitter.
So I have an array (it's large: 2048x2048), and I would like to do some element-wise operations depending on where the elements are. I'm very confused about how to do this (I was told not to use for loops, and when I tried that my IDE froze and it was going really slow).
Onto the question:
h = aperatureimage
h[:,:] = 0
indices = np.where(aperatureimage>1)
for True in h:
    h[index] = np.exp(1j*k*z)*np.exp(1j*k*(x**2+y**2)/(2*z))/(1j*wave*z)
So I have an index, which is (I'm assuming here) essentially a 'cropped' version of my larger aperatureimage array. *Note: aperatureimage is a grayscale image converted to an array; it has a shape or text on it, and I would like to find all the 'white' regions of the aperature and perform my operation on them.
How can I access the individual x/y values of index so that I can perform my exponential operation? When I try index[:,None], the program spits out 'ValueError: broadcast dimensions too large'. I also get 'array is not broadcastable to correct shape'. Any help would be appreciated!
One more clarification: x and y are the only values I would like to change (essentially the points in my array where there is white; z, k, and whatever else are defined previously).
EDIT:
I'm not sure the code I posted above is correct; it returns two empty arrays. But when I do this, though:
index = (aperatureimage==1)
print(len(index))
Actually, nothing I've done so far works correctly. I have a 2048x2048 image with a 128x128 white square in the middle of it. I would like to convert this image to an array, look through all the values and determine the index values (x,y) where the array is not black (I only have white/black, bilevel image didn't work for me). I would then like to take all the values (x,y) where the array is not 0, and multiply them by the h[index] value listed above.
I can post more information if necessary. If you can't tell, I'm stuck.
EDIT2: Here's some code that might help. I think I have the problem above solved (I can now access members of the array and perform operations on them). But for some reason the Fx values in my for loop never increase; it loops over Fy forever....
import sys, os
from scipy.signal import *
import numpy as np
from PIL import Image, ImageDraw, ImageFont, ImageOps, ImageEnhance, ImageColor

def createImage(aperature, type):
    imsize = aperature*8 # Add 0 padding to make it nice
    middle = imsize//2 # The middle (physical 0) of our image will be the imagesize/2
    im = Image.new("L", (imsize,imsize)) # Make a grayscale image with imsize*imsize pixels
    draw = ImageDraw.Draw(im) # Create a new draw method
    box = ((middle-aperature//2, middle-aperature//2), (middle+aperature//2, middle+aperature//2)) # Bounding box for aperature
    if type == 'Rectangle':
        draw.rectangle(box, fill = 'white') # Draw rectangle in the box and color it white
    del draw
    return im, middle

def Diffraction(aperaturediameter = 1, type = 'Rectangle', z = 2000000, wave = .001):
    # Constants
    deltaF = 1/8 # Image will be 8mm wide
    z = 1/3.
    wave = 0.001
    k = 2*np.pi/wave
    # Now let's get to work
    aperature = aperaturediameter * 128 # Aperaturediameter (in mm) to some pixels
    im, middle = createImage(aperature, type) # Create an image depending on type of aperature
    aperaturearray = np.array(im) # Turn image into numpy array
    # Fourier Transform of Aperature
    Ta = np.fft.fftshift(np.fft.fft2(aperaturearray))/(len(aperaturearray))
    # Transforming and calculating of Transfer Function Method
    H = aperaturearray.copy() # Copy image so H (transfer function) has the same dimensions as aperaturearray
    H[:,:] = 0 # Set H to 0
    U = aperaturearray.copy()
    U[:,:] = 0
    index = np.nonzero(aperaturearray) # Find nonzero elements of aperaturearray
    H[index[0],index[1]] = np.exp(1j*k*z)*np.exp(-1j*k*wave*z*((index[0]-middle)**2+(index[1]-middle)**2)) # Free space transfer for ap array
    Utfm = abs(np.fft.fftshift(np.fft.ifft2(Ta*H))) # Compute intensity at distance z
    # Fourier Integral Method
    apindex = np.nonzero(aperaturearray)
    U[index[0],index[1]] = aperaturearray[index[0],index[1]] * np.exp(1j*k*((index[0]-middle)**2+(index[1]-middle)**2)/(2*z))
    Ufim = abs(np.fft.fftshift(np.fft.fft2(U))/len(U))
    # Save image
    fim = Image.fromarray(np.uint8(Ufim))
    fim.save("PATH\Fim.jpg")
    ftfm = Image.fromarray(np.uint8(Utfm))
    ftfm.save("PATH\FTFM.jpg")
    print("that may have worked...")
    return

if __name__ == '__main__':
    Diffraction()
You'll need numpy, scipy, and PIL to work with this code.
When I run this, it goes through the code, but there is no data in the output images (everything is black). Now I have a real problem here, as I don't entirely understand the math I'm doing (this is for HW), and I don't have a firm grasp on Python.
U[index[0],index[1]] = aperaturearray[index[0],index[1]] * np.exp(1j*k*((index[0]-middle)**2+(index[1]-middle)**2)/(2*z))
Should that line work for performing elementwise calculations on my array?
Could you perhaps post a minimal, yet complete, example? One that we can copy/paste and run ourselves?
In the meantime, in the first two lines of your current example:
h = aperatureimage
h[:,:] = 0
you set both 'aperatureimage' and 'h' to 0. That's probably not what you intended. You might want to consider:
h = aperatureimage.copy()
This generates a copy of aperatureimage, while your code simply points h at the same array as aperatureimage, so changing one changes the other.
Be aware that copying very large arrays might cost you more memory than you would prefer.
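A quick demonstration of the aliasing:
import numpy as np

a = np.ones((2, 2))
h = a            # h is just another name for the same array
h[:, :] = 0
print(a)         # prints all zeros: a changed too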
What I think you are trying to do is this:
import numpy as np
N = 2048
M = 64
a = np.zeros((N, N))
a[N//2-M:N//2+M, N//2-M:N//2+M] = 1
x,y = np.meshgrid(np.linspace(0, 1, N), np.linspace(0, 1, N))
b = a.copy()
indices = np.where(a>0)
b[indices] = np.exp(x[indices]**2+y[indices]**2)
Or something similar. This, in any case, sets some values in 'b' based on the x/y coordinates where 'a' is bigger than 0. Try visualizing it with imshow. Good luck!
Concerning the edit
You should normalize your output so it fits into an 8-bit integer. Currently, one of your arrays has a maximum value much larger than 255 and the other has a maximum much smaller. Try this instead:
fim = Image.fromarray(np.uint8(255*Ufim/np.amax(Ufim)))
fim.save("PATH\Fim.jpg")
ftfm = Image.fromarray(np.uint8(255*Utfm/np.amax(Utfm)))
ftfm.save("PATH\FTFM.jpg")
Also consider np.zeros_like() instead of copying and clearing H and U.
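For example (a sketch; note the complex dtype, because assigning complex exponentials into a uint8 copy of the image, as H = aperaturearray.copy() does, silently discards the imaginary part):
H = np.zeros_like(aperaturearray, dtype=complex)
U = np.zeros_like(aperaturearray, dtype=complex)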
Finally, I personally very much like working with ipython when developing something like this. If you put the code from your Diffraction function at the top level of your script (in place of the 'if __name__ ...' block), you can access the variables directly from ipython. A quick command like np.amax(Utfm) would show you that there are indeed values != 0. imshow() is always nice for looking at matrices.
I am still a beginner, but I want to write a character-recognition program. This program isn't ready yet, and I have edited it a lot, so the comments may not match exactly. I will use 8-connectivity for the connected component labeling.
from PIL import Image
import numpy as np
im = Image.open("D:\\Python26\\PYTHON-PROGRAMME\\bild_schrift.jpg")
w,h = im.size
w = int(w)
h = int(h)
# 2D array for area
area = []
for x in range(w):
    area.append([])
    for y in range(h):
        area[x].append(2) # number 0 is white, number 1 is black

# 2D array for letter
letter = []
for x in range(50):
    letter.append([])
    for y in range(50):
        letter[x].append(0)

# 2D array for label
label = []
for x in range(50):
    label.append([])
    for y in range(50):
        label[x].append(0)

# image to number conversion
pix = im.load()
threshold = 200
for x in range(w):
    for y in range(h):
        aaa = pix[x, y]
        bbb = aaa[0] + aaa[1] + aaa[2] # total value
        if bbb <= threshold:
            area[x][y] = 1
        else:
            area[x][y] = 0

np.set_printoptions(threshold=np.inf, linewidth=10)

# matrix transposition
ccc = np.array(area)
area = ccc.T # better solution?

# find all black pixels and set temporary label numbers
i = 1
for x in range(40): # width (later)
    for y in range(40): # height (later)
        if area[x][y] == 1:
            letter[x][y] = 1
            label[x][y] = i
            i += 1

# connected components labeling
for x in range(40): # width (later)
    for y in range(40): # height (later)
        if area[x][y] == 1:
            label[x][y] = i
            # if pixel has a neighbour:
            if area[x][y+1] == 1:
                # pixel and neighbour get the lowest label
                pass # tomorrow's work
            if area[x+1][y] == 1:
                # pixel and neighbour get the lowest label
                pass # tomorrow's work
            # should I also compare pixel and left neighbour?
#find width of the letter
#find height of the letter
#find the middle of the letter
#middle = [width/2][height/2] #?
#divide letter into 30 parts --> 5 x 6 array
#model letter
#letter A-Z, a-z, 0-9 (maybe more)
#compare each of the 30 parts of the letter with all model letters
#make a weighting
#print(letter)
im.save("D:\\Python26\\PYTHON-PROGRAMME\\bild2.jpg")
print('done')
OCR is not an easy task indeed. That's why text CAPTCHAs still work :)
To talk only about the letter extraction and not the pattern recognition: the technique you are using to separate the letters is called Connected Component Labeling. Since you are asking for a more efficient way to do this, try implementing the two-pass algorithm described in this article. Another description can be found in the article Blob extraction.
EDIT: Here's an implementation of the algorithm I suggested:
from PIL import Image, ImageDraw

class Region():
    def __init__(self, x, y):
        self._pixels = [(x, y)]
        self._min_x = x
        self._max_x = x
        self._min_y = y
        self._max_y = y

    def add(self, x, y):
        self._pixels.append((x, y))
        self._min_x = min(self._min_x, x)
        self._max_x = max(self._max_x, x)
        self._min_y = min(self._min_y, y)
        self._max_y = max(self._max_y, y)

    def box(self):
        return [(self._min_x, self._min_y), (self._max_x, self._max_y)]

def find_regions(im):
    width, height = im.size
    regions = {}
    pixel_region = [[0 for y in range(height)] for x in range(width)]
    equivalences = {}
    n_regions = 0
    # first pass: find regions
    for x in range(width):
        for y in range(height):
            # look for a black pixel
            if im.getpixel((x, y)) == (0, 0, 0, 255): # BLACK
                # get the region number from the west or north neighbour,
                # or create a new region
                region_w = pixel_region[x-1][y] if x > 0 else 0
                region_n = pixel_region[x][y-1] if y > 0 else 0
                max_region = max(region_n, region_w)
                if max_region > 0:
                    # a neighbour already has a region;
                    # the new region is the smallest one > 0
                    new_region = min(filter(lambda i: i > 0, (region_n, region_w)))
                    # update equivalences
                    if max_region > new_region:
                        if max_region in equivalences:
                            equivalences[max_region].add(new_region)
                        else:
                            equivalences[max_region] = set((new_region, ))
                else:
                    n_regions += 1
                    new_region = n_regions
                pixel_region[x][y] = new_region
    # second pass: assign all equivalent regions the same region value
    for x in range(width):
        for y in range(height):
            r = pixel_region[x][y]
            if r > 0:
                while r in equivalences:
                    r = min(equivalences[r])
                if r not in regions:
                    regions[r] = Region(x, y)
                else:
                    regions[r].add(x, y)
    return list(regions.values())

def main():
    im = Image.open(r"c:\users\personal\py\ocr\test.png")
    regions = find_regions(im)
    draw = ImageDraw.Draw(im)
    for r in regions:
        draw.rectangle(r.box(), outline=(255, 0, 0))
    del draw
    #im.show()
    im.save("output.png")

if __name__ == "__main__":
    main()
It's not 100% perfect, but since you are doing this only for learning purposes, it may be a good starting point. With the bounding box of each character you can now use a neural network as others have suggested here.
OCR is very, very hard. Even with computer-generated characters, it's quite challenging if you don't know the font and font size in advance. Even if you're matching characters exactly, I would not call it a "beginning" programming project; it's quite subtle.
If you want to recognize scanned, or handwritten characters, that's even harder - you'll need to use advanced math, algorithms, and machine learning. There are quite a few books and thousands of articles written about this topic, so you don't need to reinvent the wheel.
I admire your effort, but I don't think you've gotten far enough to hit any of the actual difficulties yet. So far you're just randomly exploring pixels and copying them from one array to another. You haven't actually done any comparison yet, and I'm not sure what the purpose of your "random walk" is.
Why random? Writing correct randomized algorithms is quite difficult. I would recommend starting with a deterministic algorithm first.
Why are you copying from one array to another? Why not just compare directly?
When you get the comparison, you'll have to deal with the fact that the image is not exactly the same as the "prototype", and it's not clear how you'll deal with that.
Based on the code you've written so far, though, I have an idea for you: try writing a program that finds its way through a "maze" in an image. The input would be the image, plus the start pixel and the goal pixel. The output is a path through the maze from the start to the goal. This is a much easier problem than OCR - solving mazes is something that computers are great for - but it's still fun and challenging.
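If you try it, here is a minimal breadth-first-search sketch of such a maze solver, just to illustrate the idea (it assumes light pixels are walkable and dark pixels are walls):
from collections import deque
from PIL import Image

def solve_maze(image_path, start, goal):
    # Breadth-first search over light pixels; start and goal are (x, y) tuples
    im = Image.open(image_path).convert("L")
    w, h = im.size
    px = im.load()
    prev = {start: None}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            # walk the predecessor chain back to the start
            path, node = [], goal
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nx, ny in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
            if 0 <= nx < w and 0 <= ny < h and (nx, ny) not in prev and px[nx, ny] > 128:
                prev[(nx, ny)] = (x, y)
                queue.append((nx, ny))
    return None  # no path found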
Most OCR algorithms these days are based on neural network algorithms. Hopfield networks are a good place to start. Based on the Hopfield Model available here in C, I built a very basic image recognition algorithm in Python similar to what you describe. I've posted the full source here. It's a toy project and not suitable for real OCR, but can get you started in the right direction.
The Hopfield model is used as an autoassociative memory to store and recall a set of bitmap images. Images are stored by calculating a corresponding weight matrix. Thereafter, starting from an arbitrary configuration, the memory will settle on exactly the stored image that is nearest to the starting configuration in terms of Hamming distance. Thus, given an incomplete or corrupted version of a stored image, the network is able to recall the corresponding original image.
A Java applet to toy with an example can be found here; the network is trained with example inputs for the digits 0-9. Draw in the box on the right, click test and see the results from the network.
Don't let the mathematical notation intimidate you, the algorithms are straightforward once you get to source code.
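To make that concrete, here is a minimal sketch of the store/recall cycle described above (my own illustration, not code from the linked project; it uses synchronous updates for brevity):
import numpy as np

def train(patterns):
    # patterns: (n_patterns, n_units) array of -1/+1 bitmap vectors
    n_units = patterns.shape[1]
    W = np.zeros((n_units, n_units))
    for p in patterns:
        W += np.outer(p, p)    # Hebbian rule: co-active units reinforce each other
    np.fill_diagonal(W, 0)     # no self-connections
    return W / len(patterns)

def recall(W, state, steps=20):
    # Repeated updates settle the state on a stored pattern
    s = state.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s
Feeding recall a corrupted bitmap returns the stored image nearest to it in Hamming distance, which is exactly the behaviour the applet demonstrates.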
OCR is very, very difficult! Which approach to try will depend on what you are trying to accomplish (handwriting recognition, computer-generated text reading, etc.).
However, to get you started, read up on Neural Networks and OCR. Here are a few jump-right-in articles on the subject:
http://www.codeproject.com/KB/cs/neural_network_ocr.aspx
http://www.codeproject.com/KB/dotnet/simple_ocr.aspx
Use your favorite search engine to find information.
Have fun!