How do I extract numbers from an image, row by row? - Python

After preprocessing an image of a Sudoku board (from the web) with OpenCV, I managed to get the following picture:
Looping through the contours and extracting each value using pytesseract with psm 10 (single character) resulted in junk values.
I would therefore like to slice the image into rows and try to extract the values using psm 6 (a single uniform block of text), hoping it might work.
The approach I took is simply numpy-slicing a row and trying to extract the values. It doesn't work, though: after the first iteration it gives me SystemError: tile cannot extend outside image, although I'm sure the slicing occurs inside the image:
y = 1
for x in range(1, 9):
    cropped_row = mask[y*33-33:y*33-1][x*33-33:x*33-1]
    text = tess.image_to_string(np.array(cropped_row), config='--psm 6')
    y += 1
    print(text)
I would like some guidance on the correct approach to OCRing the rows of this image.
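For what it's worth, the error most likely comes from the chained slice mask[a:b][c:d]: the second bracket indexes the rows of the already-cropped array again (not the columns), so it quickly runs outside the image. A 2D crop takes both ranges in a single subscript. A minimal sketch of the row loop under that fix, assuming the same 33-pixel cells:

for y in range(1, 10):  # nine rows of cells
    # rows and columns in one subscript: [row_start:row_end, col_start:col_end]
    cropped_row = mask[(y - 1) * 33:y * 33 - 1, :]
    text = tess.image_to_string(cropped_row, config='--psm 6')
    print(text)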

In the end I took a slightly different approach, as explained by natancy in this answer.
I focused on the grid lines and removed all values, so that findContours() would locate all grid cells.
Then I looped through all contours and checked whether each one was a cell (size-wise) or some other contour.
If it was a cell, a mask made only the current cell (and its value) visible via bitwise_and(original_image, mask).
That way I could get a blank image with only a single number, and I ran that image through Tesseract.
Some text cleaning later, I got my desired output.
Extraction of the numbers:
list_of_clues = []
for contour in contours:
    extracted_value = ''
    # create a black mask
    mask = np.zeros(processed.shape, dtype=np.uint8)
    # check if the contour is a cell
    area = cv2.contourArea(contour)
    if 700 <= area <= 1000:  # contour is a cell
        cv2.drawContours(mask, [contour], -1, WHITE, -1)  # fill the cell contour white in the mask
        isolated_cell = cv2.bitwise_and(processed, mask)
        isolated_cell[mask == 0] = 255  # turn everything outside the cell white (for tess)
        # extract text from isolated_cell
        text = tess.image_to_string(isolated_cell, config='--psm 10')
        # keep only digits:
        for ch in text:
            if ch.isdigit():
                extracted_value = ch
        # calculate cell coordinates only if extracted_value exists
        if extracted_value:
            # relevant for my project: grid coordinates of the extracted value
            x_pos, y_pos, wid, hei = cv2.boundingRect(contour)  # contour's bounding box
            x_coord = int(x_pos // (grid_size_pixels / 9))  # x (column) coordinate
            y_coord = int(y_pos // (grid_size_pixels / 9))  # y (row) coordinate
            list_of_clues.append(((x_coord, y_coord), int(extracted_value)))
    else:  # contour isn't a cell
        continue

I have tried this:
custom_oem_psm_config = r'--oem 3 --psm 6 -c tessedit_char_whitelist="0123456789"'  # -c preserve_interword_spaces=0
text = pytesseract.image_to_string(otsu, config=custom_oem_psm_config)
print(text)
Output:
2 91
4 67 13
2 976
4 9
9816 2754
3 1
653 7
24 85 1
46 2
If you want to get the exact positions of the numbers, try numpy slicing and sort them from left to right and top to bottom, then pass each number to tesseract.
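A minimal sketch of that suggestion (the sort key and the ~33-pixel cell size are my assumptions, reusing otsu and contours from above):

# sort cell bounding boxes top-to-bottom, then left-to-right, before OCR
boxes = [cv2.boundingRect(c) for c in contours]
boxes.sort(key=lambda b: (b[1] // 33, b[0]))  # row band first, then x position
for (x, y, w, h) in boxes:
    cell = otsu[y:y + h, x:x + w]
    digit = pytesseract.image_to_string(cell, config='--psm 10')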

Related

How to segment text/handwritten lines using horizontal profile projection?

I have managed to get the horizontal projection profile of a handwritten image (the Python code for which is below). I wish to segment the individual lines and save them. I know this can be done using other methods, but I wish to implement it with the horizontal projection profile that I have obtained. The point/pixel of interest is where the projection profile first becomes greater than zero, up until it reaches zero again.
Horizontal projection profile of the handwritten image
The peaks in the image depict where text is detected; I now wish to segment and save those sections/individual lines of text from the original image.
def getHorizontalProjectionProfile(image):
    # Convert black spots to ones
    image[image == 0] = 1
    # Convert white spots to zeros
    image[image == 255] = 0
    horizontal_projection = np.sum(image, axis=1)
    return (horizontal_projection, image)

# Calling the horizontal projection function
horizontal_projection = getHorizontalProjectionProfile(binary.copy())
m = np.max(horizontal_projection[0])
w = 500
result = np.zeros((horizontal_projection[0].shape[0], 500))
for row in range(image.shape[0]):
    cv2.line(result, (0, row), (int(horizontal_projection[0][row] * w / m), row), (255, 255, 255), 1)
cv2.imshow('Result', result)
cv2.waitKey()
The result variable displays the image of the horizontal projection profile. Also, binary.copy() holds the binary image of the input handwritten image.
Is this what you are looking for?
import cv2
import numpy as np

def getHorizontalProjectionProfile(image):
    # convert black spots to ones and everything else to zeros
    binary = np.where(image == 0, 1, 0)
    # add up each row
    horizontal_projection = np.sum(binary, axis=1)
    return horizontal_projection

# read the image and threshold it
img = cv2.imread("img.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 127, 255, 0)

# call the horizontal projection function
horizontal_projection = getHorizontalProjectionProfile(thresh)

start_of_line = 0
count = 0
for row in range(len(horizontal_projection)):
    if horizontal_projection[row] == 0:
        if start_of_line > 0:
            count += 1
            print(f"Line {count} found from row {start_of_line} to {row}")
            start_of_line = 0
    else:
        if start_of_line == 0:
            start_of_line = row
The output is something like:
Line 1 found from row 15 to 43
Line 2 found from row 109 to 143
Line 3 found from row 156 to 190
Line 4 found from row 203 to 237
...
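To actually save the segmented lines rather than just print their row ranges, the same loop can crop the original image at each (start_of_line, row) pair. A sketch of my own, reusing img and horizontal_projection from above:

start_of_line = 0
count = 0
for row in range(len(horizontal_projection)):
    if horizontal_projection[row] == 0:
        if start_of_line > 0:
            count += 1
            # crop the full width of the original image for this text line
            cv2.imwrite(f"line_{count}.png", img[start_of_line:row, :])
            start_of_line = 0
    elif start_of_line == 0:
        start_of_line = row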

Remove the selected elements from the image in OpenCV

I have this image with tables, and I want to remove the tabular structure from the image so that Tesseract can work more effectively. I used the following code to create a boundary around the table (and the individual cells) so that it can be deleted.
img = cv2.imread('bfir.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150, apertureSize=3)
img1 = np.ones(img.shape, dtype=np.uint8) * 255
ret, thresh = cv2.threshold(gray, 127, 255, 1)
(_, contours, h) = cv2.findContours(thresh, 1, 2)
for cnt in contours:
    approx = cv2.approxPolyDP(cnt, 0.01 * cv2.arcLength(cnt, True), True)
    if len(approx) == 4:
        cv2.drawContours(img1, [cnt], 0, (0, 255, 0), 2)
This draws green lines around the table, like in this image.
Next, I tried the cv2.subtract method to subtract the table from the image, somewhat like this:
final_img = cv2.subtract(img1, img)
But this didn't work as I expected; it gives me a grayscale image with the table still in it. Link
I just want the original image in B&W with the table removed. I am using OpenCV for the first time, so I don't know what I am doing wrong. Sorry for the long post, but if anybody can help with how to go about this, or just point me in the right direction for removing the table, it would be very much appreciated.
EDIT:
As suggested by RobAu, it could also work to simply draw the contours in white in the first place, but I don't know how to do that without losing the rest of the data in the preprocessing stage.
You could try to simply overwrite the cells that represent the borders. This can be done by creating a mask image, and then using it as a reference for where to overwrite pixels in the original.
This can be done with:
mask_image = np.zeros(img.shape[0:2], np.uint8)
cv2.drawContours(mask_image, contours, -1, color=255, thickness=2)
border_points = np.array(np.where(mask_image == 255)).transpose()
background = [0, 0, 0]  # change this to the colour you want
for point in border_points:
    img[point[0], point[1]] = background
Update:
You could use the 3-channel image you already created for the mask, but that slightly complicates the algorithm. The mask image proposed above is better fitted for the task, but I will try to adapt it to your code:
# Create your mask image as usual...
border_points = np.array(np.where(img1[:, :, 1] == 255)).transpose()  # only look at the green channel
background = [0, 0, 0]  # change this to the colour you want
for point in border_points:
    img[point[0], point[1]] = background
Update, to do as #RobAu suggested (quicker than my previous methods):
line_thickness = 3  # change this value until it looks best
cv2.drawContours(img, contours, -1, color=(0, 0, 0), thickness=line_thickness)
Please note I didn't test this code, so it might need some further fiddling.
As a reference to the comments on this question, here is an example of code that locates rectangles and creates a new image for each one; this was an attempt at creating individual images from a picture of shredded paper. Some of the values will need to be changed for it to locate rectangles of the right size.
There is also some code for tracking the sizes of the images. The code is made up 50% of what I wrote myself and 50% of Stack Overflow help.
import cv2
import numpy as np

fileName = ['9', '8', '7', '6', '5', '4', '3', '2', '1', '0']
img = cv2.imread('#YOUR IMAGE#')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.bilateralFilter(gray, 11, 17, 17)
kernel = np.ones((5, 5), np.uint8)
erosion = cv2.erode(gray, kernel, iterations=2)
kernel = np.ones((4, 4), np.uint8)
dilation = cv2.dilate(erosion, kernel, iterations=2)
edged = cv2.Canny(dilation, 30, 200)
_, contours, hierarchy = cv2.findContours(edged, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
rects = [cv2.boundingRect(cnt) for cnt in contours]
rects = sorted(rects, key=lambda x: x[1], reverse=True)
i = -1
j = 1
y_old = 5000
x_old = 5000
for rect in rects:
    x, y, w, h = rect
    area = w * h
    print('width: %d and height: %d' % (w, h))
    if w > 50 and h > 500:
        print('abs:')
        print(abs(x_old - x))
        if abs(x_old - x) > 0:
            print('writing')
            x_old = x
            x, y, w, h = rect
            out = img[y + 10:y + h - 10, x + 10:x + w - 10]
            cv2.imwrite('assets/newImage' + fileName[i] + '.jpg', out)
            j += 1
    if (y_old - y) > 1000:
        i += 1
        y_old = y
Even though the given input image links are not working (so I obviously can't know whether the following is what you asked for), I learnt something from your question when I was working on removing table structure lines from an image, and I would like to share it for future readers.
I followed the steps provided in the OpenCV documentation to remove the lines.
But that only removed the horizontal lines. When I tried to remove the vertical lines, the resulting image only had the vertical lines; the text in the table was gone.
Then I came across your question and saw final_img = cv2.subtract(img1, img) in it. I tried that and it worked great.
Here are the steps that I followed:
# (This snippet lives inside a main(argv) function in the original OpenCV sample.)
# Load the image
src = cv.imread(argv[0], cv.IMREAD_COLOR)
# Check if image is loaded fine
if src is None:
    print('Error opening image: ' + argv[0])
    return -1
# Show source image
cv.imshow("src", src)
# [load_image]

# [gray]
# Transform source image to gray if it is not already
if len(src.shape) != 2:
    gray = cv.cvtColor(src, cv.COLOR_BGR2GRAY)
else:
    gray = src
# Show gray image
# show_wait_destroy("gray", gray)
# [gray]

# [bin]
# Apply adaptiveThreshold at the bitwise_not of gray, notice the ~ symbol
gray = cv.bitwise_not(gray)
bw = cv.adaptiveThreshold(gray, 255, cv.ADAPTIVE_THRESH_MEAN_C,
                          cv.THRESH_BINARY, 15, -2)
# Show binary image
# show_wait_destroy("binary", bw)
# [bin]

# [init]
# Create the images that will be used to extract the horizontal and vertical lines
horizontal = np.copy(bw)
vertical = np.copy(bw)
# [init]

# [vert]
# Specify size on the vertical axis (integer division: the structuring element size must be an int)
rows = vertical.shape[0]
verticalsize = rows // 10
# Create structure element for extracting vertical lines through morphology operations
verticalStructure = cv.getStructuringElement(cv.MORPH_RECT, (1, verticalsize))
# Apply morphology operations
vertical = cv.erode(vertical, verticalStructure)
vertical = cv.dilate(vertical, verticalStructure)
# [vert]

# [horiz]
# Specify size on the horizontal axis
cols = horizontal.shape[1]
horizontal_size = cols // 30
# Create structure element for extracting horizontal lines through morphology operations
horizontalStructure = cv.getStructuringElement(cv.MORPH_RECT, (horizontal_size, 1))
# Apply morphology operations
horizontal = cv.erode(horizontal, horizontalStructure)
horizontal = cv.dilate(horizontal, horizontalStructure)
# [horiz]

lines_removed = cv.subtract(gray, vertical + horizontal)
show_wait_destroy("lines_removed", ~lines_removed)
Input:
Output:
A few things that I changed from the source:
verticalsize = rows // 10: here, I do not understand the significance of the number 10. In the documentation, 30 was used. I got a better result with 10. I guess the smaller the divisor, the larger the structuring element, and since we are targeting straight lines here, reducing the number works.
In the documentation, vertical lines are processed after horizontal lines. I reversed the order.
I swapped the parameters to cv2.subtract(): I used cv2.subtract(img, img1).
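The reason the order of the cv2.subtract() arguments matters: subtraction saturates at zero instead of wrapping around, so the operation is not symmetric. A toy check (my own illustration, not from the original answer):

import numpy as np
import cv2

a = np.array([[200]], dtype=np.uint8)
b = np.array([[50]], dtype=np.uint8)
print(cv2.subtract(a, b))  # [[150]]
print(cv2.subtract(b, a))  # [[0]] -- clipped at zero, not -150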

Applying Horizontal Sobel Mask rotates the image by 180 degrees

In one of my personal projects, I tried to apply the following horizontal edge mask on a grayscale image. By applying the horizontal edge mask, I am trying to detect horizontal edges in the image.
[ 1  2  1
  0  0  0
 -1 -2 -1]
When I convolve my image matrix with the mask given above, the output image is rotated by 180 degrees. I am not sure whether this is expected behavior or whether I am doing something wrong.
Here is the code snippet of the convolution:
def convolution(self):
    result = np.zeros((self.mat_width, self.mat_height))
    print(self.mat_width)
    print(self.mat_height)
    for i in range(0, self.mat_width - self.window_width):
        for j in range(0, self.mat_height - self.window_height):
            # window over both mat and mask
            row_index = j + self.window_height
            col_index = i + self.window_width
            mat_masked = self.mat[j:row_index, i:col_index]
            # pixel position
            index_i = i + int(self.window_width / 2)
            index_j = j + int(self.window_height / 2)
            prod = np.sum(mat_masked * self.mask)
            if prod >= 255:
                result[index_i, index_j] = 255
            else:
                result[index_i, index_j] = 0
    return result
The original grayscale input image is here -
Here is the output that is getting generated.
The indices when writing to the output are reversed: you are flipping the horizontal and vertical coordinates, which transposes your image, and the output you see is that transposed result.
In addition, you aren't declaring the output size of your image correctly. The first dimension spans the rows (the height) while the second dimension spans the columns (the width). The first change you must make is to swap the dimensions of the output image:
result = np.zeros((self.mat_height, self.mat_width))
Secondly, the variable index_i is traversing horizontally while the variable index_j is traversing vertically. You just have to flip the order so that you are writing the results correctly:
if prod >= 255:
    result[index_j, index_i] = 255
else:
    result[index_j, index_i] = 0
If for some reason you don't want to change the order, then leave your code the way it is, including how you declared the output dimensions of your image and simply return the result transposed:
return result.T
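As a cross-check of the fixed indexing, OpenCV's filter2D applies the same mask without manual loops. Note that filter2D computes correlation rather than convolution, which for this kernel only flips the sign of the response. A sketch under those assumptions (the file name is hypothetical):

import cv2
import numpy as np

gray = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input file
kernel = np.array([[ 1,  2,  1],
                   [ 0,  0,  0],
                   [-1, -2, -1]], dtype=np.float32)
response = cv2.filter2D(gray.astype(np.float32), -1, kernel)
# the same thresholding the question's convolution() applies
edges = np.where(response >= 255, 255, 0).astype(np.uint8)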

OpenCV: img[x,y] always returning 0

I get centroid of objects in an image like this:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, contours, _ = cv2.findContours(gray.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_TC89_L1)
centres = []
for i in range(len(contours)):
    moments = cv2.moments(contours[i])
    centres.append((int(moments['m10'] / moments['m00']), int(moments['m01'] / moments['m00'])))
I am looping over the centres and trying to get the colour of each centre pixel. For some reason every returned value is (0, 0, 0):
for c in centres:
    print img[c]
I also get this error
IndexError: index 484 is out of bounds for axis 0 with size 480
An image in OpenCV is stored as a 3D NumPy matrix.
To get the intensity of the pixel at coordinates x, y (remember that y indexes the rows), you have to do this (in a grayscale image):
intensity = img[y,x]
Reading your error line, I think this is your only mistake.
To get the colours (in BGR) you have to write something like:
blue = img[y,x,0]
green = img[y,x,1]
red = img[y,x,2]
You can check whether this is your situation by using
print c
and seeing what the centre coordinates are. If you obtain something like
c(x,y) = 484, 300
in a 640 x 480 image, you can be sure you have to use img[y,x]: coordinates give x first, but matrices want rows first.
value = img[row,column]
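Putting both points together, a short sketch (my own, not from the answer) that swaps the index order and also guards against centroids that fall outside the image:

h, w = img.shape[:2]
for (cx, cy) in centres:  # centres store (x, y)
    if 0 <= cx < w and 0 <= cy < h:
        b, g, r = img[cy, cx]  # rows first: [y, x]
        print((cx, cy), (b, g, r))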

Python: create an image with continuous input

I have code where an image gets converted to B/W.
Now I want to build a new image with reference to the original image.
The output of the original image is the X/Y coordinates plus "1" or "0" for black and white.
The new image will receive this information, but not chronologically.
Therefore it must check, and give a negative output, if it has already received information about a specific coordinate, so that double entries can be avoided.
I haven't found many similar examples of this; only some examples going roughly in the right direction.
Does anyone have an idea how to realize that?
UPDATE:
I built code that turns a pixel of a white image black if the corresponding pixel in the original (reference) image is black, and otherwise leaves it white.
Furthermore, each used coordinate is entered into a list and checked against it.
However, this part is not working properly.
Although the coordinate [10, 10] has been used in the loop before, the code displays Coordinate not in the system.
Any help would be appreciated!
import cv2
import numpy

white = cv2.imread('white.jpg')  # loading the white image
white = cv2.resize(white, (640, 480))  # adjusting it to the size of the original image

y = 0  # for testing purposes the white image gets blackened manually
x = 0
j = 0
while j < 50:
    content = numpy.zeros((200, 2))  # creating a list with 200 entries, each entry containing 2 values
    content = ([x, y])  # adding two values to the list
    if condition[y, x] == 1:  # condition = 1 means the pixel at this coordinate in the reference picture is black
        white[y, x] = 0  # "0" creates a black pixel at the specified coordinate on the white image
    x += 5
    y += 5
    j += 1

x = 10  # taking a value which has already been used
y = 10
try:
    b = content.index([x, y])  # check if the coordinate is in the list
except ValueError:
    print("Coordinate not in the system")
else:
    print("Coordinate already in the system")

i = 0
while i < 100:
    cv2.imshow('Bild', white)  # displays the image
    if cv2.waitKey(1) == ord('q'):
        break
It took me a while, but I was able to solve it without any complex lists or arrays.
Might not be the most elegant way, but at least it is working!
I created a second white picture (the reference) which gets checked to see whether a coordinate has already been used or not.
If the coordinate has not been used, a black pixel is created there.
The next time this coordinate is checked, a black pixel is found, so we know it has been used.
In the end the white image will contain 49 black pixels (because position [10, 10] has already been used and will not be painted).
import cv2
import numpy

white = cv2.imread('C:\white.jpg')  # loading the white image
reference = cv2.imread('C:\white.jpg')  # loading the white reference image
white = cv2.resize(white, (640, 480))  # adjusting it to the size of the original image
reference = cv2.resize(reference, (640, 480))  # adjusting it to the size of the original image

y = 0  # for testing purposes the white image gets blackened manually
x = 0
j = 0
reference[10, 10] = 0
while j < 50:
    if (reference[y, x] == [255, 255, 255]).all():  # pixel still white: coordinate not used yet
        reference[y, x] = 0  # mark the coordinate as used on the reference image
        white[y, x] = 0  # "0" creates a black pixel at the specified coordinate on the white image
        print("Coordinate not in system")
    else:
        print("Coordinate already in system")
    x += 5
    y += 5
    j += 1

i = 0
while i < 100:
    cv2.imshow('image copy', white)  # displays the image
    if cv2.waitKey(1) == ord('q'):
        break
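For readers who do want the list-based bookkeeping from the original question, a Python set of (x, y) tuples avoids both the overwritten-list bug and the extra reference image. This sketch is my own, not part of the answer:

import cv2

white = cv2.imread('white.jpg')
white = cv2.resize(white, (640, 480))

visited = set()
x = y = 0
for _ in range(50):
    if (x, y) not in visited:
        visited.add((x, y))
        white[y, x] = 0  # rows first: [y, x]
    else:
        print("Coordinate already in the system")
    x += 5
    y += 5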
