I have an image (saved as a variable called canny_image) and it looks like this after preprocessing.
I am basically trying to find the distance between the first two vertical lines. I tried using the hough_line function from skimage, but it's unable to find the first line, so I thought it might be easier to solve this manually.
My plan is to go through each row in the image until I get to the first pixel with a value of 255 (the lines have a value of 255, while everything else is zero) and store the location of that pixel in an array. I then take the mode of the values in the array as the x location of the first line. I'll do the same for the second line, using the first x value as a starting point.
def find_lines(canny_image):
    threshold = 255
    for y in range(canny_image.shape[0]):
        for x in range(canny_image.shape[1]):
            if canny_image[x, y] == threshold:
                return x
This is the code I wrote to get the x location of the first line; however, I'm not getting the desired output. Any help on how to solve this will be much appreciated. Thanks!
Perhaps try something like this:
import numpy as np

# Returns an array of line x positions in an image
def find_line_x_positions(image, lines_to_detect: int, buffer_zone: int):
    threshold = 255
    (height, width) = image.shape
    # For each line, store the x position sum [i, 0] and the point count [i, 1]
    x_position_sums = np.zeros((lines_to_detect, 2), np.double)
    for y in range(height):
        buffer = 0
        line_index = 0
        for x in range(width):
            if buffer > 0:
                buffer -= 1
            if (image[y, x] >= threshold) and (buffer == 0):
                buffer = buffer_zone
                x_position_sums[line_index, 0] += x
                x_position_sums[line_index, 1] += 1
                line_index += 1
                if line_index == lines_to_detect:
                    break
    # Divide the x position sums by the point counts to get
    # the average x position of each line
    results = x_position_sums[np.all(x_position_sums, axis=1)]
    results = np.divide(results[:, 0], results[:, 1])
    return results
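For example, to measure the spacing between the first two lines (the buffer_zone value here is just a guess you would tune to your line thickness):

positions = find_line_x_positions(canny_image, lines_to_detect=2, buffer_zone=10)
print(positions[1] - positions[0])  # distance between the first two lines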
You can also try OpenCV's HoughLines() function, which is simpler to use than the scikit-image version. When I tested the OpenCV implementation, it seemed to have a hard time finding vertical lines (within 10 degrees of vertical), but you can solve this by rotating your image X degrees and looking for lines within that range of rotation.
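A rough sketch of that idea with OpenCV; the accumulator threshold of 100 and the 10-degree window are assumptions you would tune for your image:

import numpy as np
import cv2

# rho = x*cos(theta) + y*sin(theta); theta near 0 (or pi) means a near-vertical line
lines = cv2.HoughLines(canny_image, 1, np.pi / 180, 100)
vertical_xs = []
if lines is not None:
    for rho, theta in lines[:, 0]:
        if theta < np.radians(10) or theta > np.radians(170):
            vertical_xs.append(abs(rho))  # for a vertical line, |rho| is its x position
print(sorted(vertical_xs))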
I want to mark the pixels,
Mark=[2, 455, 6, 556, 12, 654, 22, 23, 4,86,.....]
in such a way that the first 2 pixels are left unmarked, the next 455 pixels are marked with a colour, the next 6 pixels are left unmarked again, the next 556 pixels are marked with the same colour, and so on.
The size of the image is 500x500x3. How do I calculate these steps?
import numpy as np

Img = np.zeros((500, 500, 3), dtype=np.uint8)
Your algorithm is actually in your question. By 500x500x3 I guess you mean your image is 500 (width) by 500 (height) with 3 colour channels?
It could be implemented as follows, without any optimizations:
color = (128, 50, 30)
x, y = 0, 0
for (skip, count) in [Mark[n:n + 2] for n in range(0, len(Mark), 2)]:
    x += skip
    y += x // 500  # keep track of the lines: when x >= 500,
                   # it means we are on a new line
    x %= 500       # keep x in bounds
    # colorize `count` pixels in the image
    for i in range(count):
        Img[y, x] = color  # row y, column x in the numpy array
        x += 1
        y += x // 500
        x %= 500   # keep x in bounds
The zip([a for i, a in enumerate(Mark) if i % 2 == 0], [a for i, a in enumerate(Mark) if i % 2 != 0]) in the first version was just a way to group the (skip, count) pairs. It could definitely be improved though, I'm no Python expert.
EDIT: modified the zip() to use [Mark[n:n+2] for n in range(0, len(Mark), 2)] as suggested by Peter, which is much simpler and easier to understand.
The easiest way is probably to convert the image to a Numpy array:
import numpy as np
na = np.array(Img)
Then use Numpy ravel() to give you a flattened (1-D) view of the array:
flat = np.ravel(na)
You can now see the shape of your flat view:
print(flat.shape)
You can then do your colouring by iterating over the array of offsets from your question. The good news is that, because ravel() gives you a view into your original data, all the changes you make to the view will be reflected in your original data.
So, to get back to a PIL Image, all you need is:
RecolouredImg = Image.fromarray(na)
Try it out by just colouring the first ten pixels before worrying about your long list.
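A minimal sketch of that approach, assuming Mark holds alternating (skip, count) pairs; reshape(-1, 3) is used here instead of ravel() so each element of the view is one RGB pixel, and like ravel() it is a view into the original data:

import numpy as np

Img = np.zeros((500, 500, 3), dtype=np.uint8)
Mark = [2, 455, 6, 556]            # shortened example list
color = (128, 50, 30)              # any colour you like

flat = Img.reshape(-1, 3)          # 1-D view: one row per pixel
pos = 0
for skip, count in zip(Mark[0::2], Mark[1::2]):
    pos += skip                    # leave `skip` pixels unmarked
    flat[pos:pos + count] = color  # colour the next `count` pixels
    pos += count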
If you like working with Python lists (I don't), you can achieve a similar effect by using PIL getdata() to get a flattened list of pixels and then process the list against your requirements and putdata() to put them all back. The effect will be the same, just choose a method that fits how your brain works.
I am very new to MATLAB. I want to write code for local histogram equalization. I have already written code for global histogram equalization, and I know that local equalization means doing the equalization separately for each part of the image, but my question is: how should I choose these parts? For example, should I do the equalization for each block of 100 pixels separately from the other pixels? In other words, how can I split the image into parts and then equalize each part?
The most naive way to do what you ask is to split up your image into non-overlapping blocks, run your global histogram equalization code on each block, and save the result to the output. Suppose you defined the rows and columns of these non-overlapping blocks as the variables rows and cols. In your case, let's say a block is 100 x 100, so rows = 100; cols = 100;. You would simply loop over each non-overlapping block, do your histogram equalization, then write the result to the same locations in the output.
Something like this below, assuming your image is stored in im:
rows = 100;
cols = 100;
out = zeros(size(im)); % Declare output variable
for ii = 1 : rows : size(im, 1)
    for jj = 1 : cols : size(im, 2)
        % Get the block
        row_begin = ii;
        row_end = min(size(im, 1), ii + rows - 1);
        col_begin = jj;
        col_end = min(size(im, 2), jj + cols - 1);
        blk = im(row_begin : row_end, col_begin : col_end, :);

        % Perform histogram equalization with the block stored in blk
        % ...
        % Assume the output of this is stored in O
        out(row_begin : row_end, col_begin : col_end, :) = O;
    end
end
Note the intricacy of the variable blk that stores the non-overlapping block. We let the beginning row and column simply be the loop counters ii and jj, but for the ending row and column we must make sure they are bounded by the dimensions of the image; that's why the min call is there. Otherwise, the ending row and column is simply the beginning row and column plus the block size minus one. Also note that I've used : to index into the third dimension in case you have a colour image; grayscale should not affect this code. You finally need to use the same indexing when storing the output in the output image. Note that I've assumed this is stored in the variable O, which is the output of your customized histogram equalization function.
The output out will contain your locally histogram equalized image. Take note that you could theoretically do this in one line using blockproc in the image processing toolbox if you have it. This processes distinct blocks in your image and applies some function to it. Assuming your histogram equalization function is called hsteq, you would simply do this:
rows = 100; cols = 100;
out = blockproc(im, [rows, cols], @(s) hsteq(s.data));
The first input is the image you want to process, the second input defines the block size, and the last input is the function you want to apply to each block. Note that blockproc passes a customized structure to your function, so what is important is that you pull out the data field of that structure. This should produce the same output as the code above with loops.
We can use tile-based local (adaptive) histogram equalization to implement AHE (as suggested in the other answer), but in that case we need to implement a bilinear-interpolation-like technique to prevent sudden changes of contrast at the edges of the window. For example, observe the equalized output below from a Python implementation of the same (here a 50x50 window is used for the tile):
import numpy as np

def AHE(im, tile_x=8, tile_y=8):
    h, w = im.shape
    out = np.zeros(im.shape)  # Declare output variable
    for i in range(0, h, tile_x):
        for j in range(0, w, tile_y):
            # Get the block and equalize it independently of its neighbours
            blk = im[i: min(i + tile_x, h), j: min(j + tile_y, w)]
            probs = get_distr(blk)
            out[i: min(i + tile_x, h), j: min(j + tile_y, w)] = CHE(blk, probs)
    return out

def CHE(im, probs):
    # Classical histogram equalization: map intensities through the scaled CDF
    T = np.array(list(map(int, 255 * np.cumsum(probs))))
    return T[im]

def get_distr(im):
    # Normalized histogram (empirical distribution) of the block
    hist, _ = np.histogram(im.flatten(), 256, [0, 256])
    return hist / hist.sum()
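A usage sketch, assuming a grayscale uint8 image (skimage's sample camera image is used here purely for illustration):

from skimage import data

im = data.camera()     # any 2-D uint8 image
out = AHE(im, 50, 50)  # 50x50 tiles, as in the output shown above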
We could instead implement the AHE algorithm from this thesis; the implementation of that algorithm yields better results (without the boundary artifacts):
I can compute the SLIC boundaries using skimage as follows:
from skimage.segmentation import slic

def compute_superpixels(frame, num_pixels=100, std=5, iter_max=10,
                        connectivity=False, compactness=10.0):
    return slic(frame, n_segments=num_pixels, sigma=std, max_iter=iter_max,
                enforce_connectivity=connectivity, compactness=compactness)
Now, what I would like to do is get the indices of the pixels that form the boundary of each label. My idea was to get all pixels belonging to a given segment and then check, in each of the four directions, which of them have a neighbour with a different label:
import numpy as np

def boundary_pixels(segments, index):
    # Coordinates of all pixels carrying the given label
    x, y = np.where(segments == index)
    # Neighbour coordinates in the four directions, with masks
    # to keep them inside the image bounds
    right = x + 1
    right_mask = right < segments.shape[0]
    down = y + 1
    down_mask = down < segments.shape[1]
    left = x - 1
    left_mask = left >= 0
    up = y - 1
    up_mask = up >= 0
    # Labels of the in-bounds neighbours
    right_n = segments[right[right_mask], y[right_mask]]
    down_n = segments[x[down_mask], down[down_mask]]
    left_n = segments[left[left_mask], y[left_mask]]
    up_n = segments[x[up_mask], up[up_mask]]
    neighbors_1 = np.union1d(right_n, down_n)
    neighbors_2 = np.union1d(left_n, up_n)
    neighbors = np.union1d(neighbors_1, neighbors_2)
    # Not neighbours to ourselves
    neighbors = neighbors[neighbors != index]
    return neighbors
However, with this all I managed to do was get the neighbouring labels in the 4 directions of a given label. Can someone suggest a way to actually get all the pixels on the border of the label?
I found an answer to my own question. The mark_boundaries function in the skimage.segmentation package does exactly what I needed.
Usage:
processed = mark_boundaries(frame, segments==some_segment)
Here frame is the current image frame and segments is the label array. some_segment is the integer label whose boundaries we are interested in.
You can make use of the find_contours function available in the skimage.measure module to find the coordinates of the pixels along the boundary. An example is available in the find_contours documentation. Next, you can check for changes in both directions as needed.
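A minimal sketch of that idea, reusing the names from the question (segments is the label array, some_segment the label of interest):

import numpy as np
from skimage.measure import find_contours

# Contour the binary mask of the chosen superpixel at the 0.5 level
mask = (segments == some_segment).astype(float)
contours = find_contours(mask, 0.5)  # list of (row, col) coordinate arrays
boundary = np.vstack(contours)       # all boundary points of the label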
The input is a spectrum with colorful (sorry) vertical lines on a black background. Given the approximate x coordinate of that band (as marked by X), I want to find the width of that band.
I am unfamiliar with image processing. Please direct me to the correct method of image processing and a Python image processing package that can do the same.
I am thinking of PIL; OpenCV gives me the impression of being overkill for this particular application.
What if I want to make this an expert system that can classify them in the future?
I'll give a complete minimal working example (as suggested by sega_sai). I don't have access to your original image, but you'll see it doesn't really matter! The peak distributions found by the code below are:
Mean values at: 26.2840960523 80.8255092125
import numpy as np
from PIL import Image
from scipy.optimize import leastsq

# Load the picture with PIL, process if needed
pic = np.asarray(Image.open("band2.png"))

# Average over the colour channels, then project onto the x axis
pic_avg = pic.mean(axis=2)
projection = pic_avg.sum(axis=0)

# Set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()

# Fit function, two Gaussians, adjust as needed
def fitfunc(p, x):
    return p[0] * np.exp(-(x - p[1])**2 / (2.0 * p[2]**2)) + \
           p[3] * np.exp(-(x - p[4])**2 / (2.0 * p[5]**2))

errfunc = lambda p, x, y: fitfunc(p, x) - y

# Use scipy to fit, p0 is the initial guess
p0 = np.array([0, 20, 1, 0, 75, 10])
X = np.arange(len(projection))
p1, success = leastsq(errfunc, p0, args=(X, projection))
Y = fitfunc(p1, X)

# Output the result
print("Mean values at:", p1[1], p1[4])

# Plot the result
from pylab import subplot, imshow, plot, show
subplot(211)
imshow(pic)
subplot(223)
plot(projection)
subplot(224)
plot(X, Y, 'r', lw=5)
show()
Below is a simple thresholding method to find the lines and their widths; it should work quite reliably for any number of lines. The yellow and black image below was processed using this script; the red/black plot illustrates the found lines using parameters threshold = 0.3 and min_line_width = 5.
The script averages the rows of an image, and then determines the basic start and end positions of each line based on a threshold (which you can set between 0 and 1) and a minimum line width (in pixels). By using thresholding and a minimum line width you can easily filter your input images to get the lines out of them. The first function, find_lines, returns all the lines in an image as a list of tuples containing the start, end, center, and width of each line. The second function, find_closest_band_width, is called with the specified x_position and returns the width of the line closest to this position (assuming you want the distance to the centre of each line). As the lines are saturated (255 cut-off per channel), their cross-sections are not far from a uniform distribution, so I don't believe trying to fit any kind of distribution will really help much; it just adds unnecessary complexity.
from PIL import Image, ImageStat

def find_lines(image_file, threshold, min_line_width):
    im = Image.open(image_file)
    width, height = im.size
    hist = []
    lines = []
    start = end = 0
    for x in range(width):
        column = im.crop((x, 0, x + 1, height))
        stat = ImageStat.Stat(column)
        ## normalises by 2 * 255 as in your example the colour is yellow
        ## if your images start using white lines change this to 3 * 255
        hist.append(sum(stat.sum) / (height * 2 * 255))
    for index, value in enumerate(hist):
        if value > threshold and end >= start:
            start = index
        if value < threshold and end < start:
            if index - start < min_line_width:
                start = 0
            else:
                end = index
                center = start + (end - start) / 2.0
                width = end - start
                lines.append((start, end, center, width))
    return lines

def find_closest_band_width(x_position, lines):
    distances = [((value[2] - x_position) ** 2) for value in lines]
    index = distances.index(min(distances))
    return lines[index][3]

## set your threshold and min_line_width for finding lines
lines = find_lines("8IxWA_sample.png", 0.7, 4)

## sets x_position to the 59th pixel
print('width of nearest line:', find_closest_band_width(59, lines))
I don't think you need anything fancy for your particular task.
I would just use PIL + scipy. That should be enough,
because you essentially need to take your image, make a 1-D projection of it,
and then fit a Gaussian or something like that to it. The information about the approximate location of the band should be used as a first guess for the fitter.
I am still a beginner, but I want to write a character-recognition program. This program isn't ready yet, and I have edited it a lot, so the comments may not match exactly. I will use 8-connectivity for the connected component labeling.
from PIL import Image
import numpy as np

im = Image.open("D:\\Python26\\PYTHON-PROGRAMME\\bild_schrift.jpg")
w, h = im.size
w = int(w)
h = int(h)

# 2D array for the image area
area = []
for x in range(w):
    area.append([])
    for y in range(h):
        area[x].append(2)  # number 0 is white, number 1 is black

# 2D array for the letter
letter = []
for x in range(50):
    letter.append([])
    for y in range(50):
        letter[x].append(0)

# 2D array for the labels
label = []
for x in range(50):
    label.append([])
    for y in range(50):
        label[x].append(0)

# image-to-number conversion
pix = im.load()
threshold = 200
for x in range(w):
    for y in range(h):
        aaa = pix[x, y]
        bbb = aaa[0] + aaa[1] + aaa[2]  # total value
        if bbb <= threshold:
            area[x][y] = 1
        if bbb > threshold:
            area[x][y] = 0

np.set_printoptions(threshold=np.inf, linewidth=10)

# matrix transposition
ccc = np.array(area)
area = ccc.T  # better solution?

# find all black pixels and set temporary label numbers
i = 1
for x in range(40):  # width (later)
    for y in range(40):  # height (later)
        if area[x][y] == 1:
            letter[x][y] = 1
            label[x][y] = i
            i += 1

# connected component labeling
for x in range(40):  # width (later)
    for y in range(40):  # height (later)
        if area[x][y] == 1:
            label[x][y] = i
            # if the pixel has a neighbour:
            if area[x][y + 1] == 1:
                # pixel and neighbour get the lowest label
                pass  # tomorrow's work
            if area[x + 1][y] == 1:
                # pixel and neighbour get the lowest label
                pass  # tomorrow's work
            # should I also compare the pixel with its left neighbour?

# find the width of the letter
# find the height of the letter
# find the middle of the letter
# middle = [width/2][height/2]  # ?
# divide the letter into 30 parts --> 5 x 6 array
# model letters: A-Z, a-z, 0-9 (maybe more)
# compare each of the 30 parts of the letter with all model letters
# apply a weighting
# print(letter)

im.save("D:\\Python26\\PYTHON-PROGRAMME\\bild2.jpg")
print('done')
OCR is not an easy task indeed. That's why text CAPTCHAs still work :)
To talk only about the letter extraction and not the pattern recognition, the technique you are using to separate the letters is called Connected Component Labeling. Since you are asking for a more efficient way to do this, try to implement the two-pass algorithm that's described in this article. Another description can be found in the article Blob extraction.
EDIT: Here's the implementation for the algorithm that I have suggested:
from PIL import Image, ImageDraw

class Region:
    def __init__(self, x, y):
        self._pixels = [(x, y)]
        self._min_x = x
        self._max_x = x
        self._min_y = y
        self._max_y = y

    def add(self, x, y):
        self._pixels.append((x, y))
        self._min_x = min(self._min_x, x)
        self._max_x = max(self._max_x, x)
        self._min_y = min(self._min_y, y)
        self._max_y = max(self._max_y, y)

    def box(self):
        return [(self._min_x, self._min_y), (self._max_x, self._max_y)]

def find_regions(im):
    width, height = im.size
    regions = {}
    pixel_region = [[0 for y in range(height)] for x in range(width)]
    equivalences = {}
    n_regions = 0
    # first pass: find regions
    for x in range(width):
        for y in range(height):
            # look for a black pixel
            if im.getpixel((x, y)) == (0, 0, 0, 255):  # BLACK
                # get the region number from north or west
                # or create a new region
                region_n = pixel_region[x][y - 1] if y > 0 else 0
                region_w = pixel_region[x - 1][y] if x > 0 else 0
                max_region = max(region_n, region_w)
                if max_region > 0:
                    # a neighbour already has a region;
                    # the new region is the smallest > 0
                    new_region = min(filter(lambda i: i > 0, (region_n, region_w)))
                    # update equivalences
                    if max_region > new_region:
                        if max_region in equivalences:
                            equivalences[max_region].add(new_region)
                        else:
                            equivalences[max_region] = set((new_region, ))
                else:
                    n_regions += 1
                    new_region = n_regions
                pixel_region[x][y] = new_region
    # second pass: assign all equivalent regions the same region value
    for x in range(width):
        for y in range(height):
            r = pixel_region[x][y]
            if r > 0:
                while r in equivalences:
                    r = min(equivalences[r])
                if r not in regions:
                    regions[r] = Region(x, y)
                else:
                    regions[r].add(x, y)
    return list(regions.values())

def main():
    im = Image.open(r"c:\users\personal\py\ocr\test.png")
    regions = find_regions(im)
    draw = ImageDraw.Draw(im)
    for r in regions:
        draw.rectangle(r.box(), outline=(255, 0, 0))
    del draw
    # im.show()
    im.save("output.png")

if __name__ == "__main__":
    main()
It's not 100% perfect, but since you are doing this only for learning purposes, it may be a good starting point. With the bounding box of each character you can now use a neural network as others have suggested here.
OCR is very, very hard. Even with computer-generated characters, it's quite challenging if you don't know the font and font size in advance. Even if you're matching characters exactly, I would not call it a "beginning" programming project; it's quite subtle.
If you want to recognize scanned, or handwritten characters, that's even harder - you'll need to use advanced math, algorithms, and machine learning. There are quite a few books and thousands of articles written about this topic, so you don't need to reinvent the wheel.
I admire your effort, but I don't think you've gotten far enough to hit any of the actual difficulties yet. So far you're just randomly exploring pixels and copying them from one array to another. You haven't actually done any comparison yet, and I'm not sure what the purpose of your "random walk" is.
Why random? Writing correct randomized algorithms is quite difficult. I would recommend starting with a deterministic algorithm first.
Why are you copying from one array to another? Why not just compare directly?
When you get the comparison, you'll have to deal with the fact that the image is not exactly the same as the "prototype", and it's not clear how you'll deal with that.
Based on the code you've written so far, though, I have an idea for you: try writing a program that finds its way through a "maze" in an image. The input would be the image, plus the start pixel and the goal pixel. The output is a path through the maze from the start to the goal. This is a much easier problem than OCR - solving mazes is something that computers are great for - but it's still fun and challenging.
Most OCR algorithms these days are based on neural network algorithms. Hopfield networks are a good place to start. Based on the Hopfield Model available here in C, I built a very basic image recognition algorithm in Python similar to what you describe. I've posted the full source here. It's a toy project and not suitable for real OCR, but it can get you started in the right direction.
The Hopfield model is used as an autoassociative memory to store and recall a set of bitmap images. Images are stored by calculating a corresponding weight matrix. Thereafter, starting from an arbitrary configuration, the memory will settle on exactly the stored image that is nearest to the starting configuration in terms of Hamming distance. Thus, given an incomplete or corrupted version of a stored image, the network is able to recall the corresponding original image.
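A minimal sketch of the mechanism (this is not the linked C implementation, just an illustration): store bipolar (+1/-1) patterns in a Hebbian weight matrix, then recall by repeated asynchronous updates until the state settles on the nearest memory.

import numpy as np

def train(patterns):
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)   # Hebbian learning rule
    np.fill_diagonal(W, 0)    # no self-connections
    return W / len(patterns)

def recall(W, state, sweeps=5):
    state = state.copy()
    for _ in range(sweeps):
        for i in range(len(state)):  # asynchronous unit-by-unit updates
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

patterns = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
                     [1, -1, 1, -1, 1, -1, 1, -1]])
W = train(patterns)
corrupted = np.array([-1, 1, 1, 1, -1, -1, -1, -1])  # first pattern, one bit flipped
print(recall(W, corrupted))  # settles back on the first stored pattern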
A Java applet to toy with an example can be found here; the network is trained with example inputs for the digits 0-9. Draw in the box on the right, click test and see the results from the network.
Don't let the mathematical notation intimidate you, the algorithms are straightforward once you get to source code.
OCR is very, very difficult! The approach to use will depend on what you are trying to accomplish (handwriting recognition, computer-generated text reading, etc.).
However, to get you started, read up on Neural Networks and OCR. Here are a few jump-right-in articles on the subject:
http://www.codeproject.com/KB/cs/neural_network_ocr.aspx
http://www.codeproject.com/KB/dotnet/simple_ocr.aspx
Use your favorite search engine to find information.
Have fun!