I'm new to image processing. I want to perform character segmentation for OCR, and I have already done the necessary pre-processing. When I segment characters by finding contours it works well, except for the characters 3 and 8.
The pre-processed image looks like this:
The output after finding contours for 3 and 8 is:
Code used:
import cv2

# img is the input image described above
imgGray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
ret, imgThresh = cv2.threshold(imgGray, 127, 255, 0)
# find external contours (OpenCV 3.x findContours returns three values)
image, contours, _ = cv2.findContours(imgThresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
But it gives good results for the other characters:
How to solve this issue?
Since your image does not involve a complex background, I used Otsu's method to pick the right threshold automatically:
threshold, thresh_img = cv2.threshold(imgGray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
Now finding the contours will be easier. That is because cv2.findContours finds contours around objects or characters that are white: the inverse threshold (THRESH_BINARY_INV) turns the dark characters white.
Have a look at those characters that caused problems now:
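For completeness, here is a minimal end-to-end sketch of this approach; the filename and the box-drawing step are my own assumptions, not from the original answer:

import cv2

# hypothetical input file
img = cv2.imread('characters.png')
imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Otsu picks the threshold; THRESH_BINARY_INV makes the dark characters white
threshold, thresh_img = cv2.threshold(imgGray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# OpenCV 4.x findContours returns (contours, hierarchy)
contours, _ = cv2.findContours(thresh_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

# draw a bounding box around each detected character
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)

cv2.imshow('segmented', img)
cv2.waitKey(0)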
A related question:
I'm writing a program that takes an image containing a 4-by-4 grid of letters somewhere in it.
E.g.:
I want to read these letters into my program and for that I'm using pytesseract for the OCR.
Before feeding the image to pytesseract I do some preprocessing with OpenCV to increase the odds of pytesseract working correctly.
This is the code I use for this:
import cv2
img = cv2.imread('my_image.png')
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Otsu's method picks the threshold value automatically
img_pre_processed = cv2.threshold(img_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
And these are some sample outputs of img_pre_processed:
Since the letters in the grid are spaced apart, pytesseract has a difficult time reading them when I give it the entire image as input. So it would be helpful if I knew the coordinates of every letter; then I could edit the image in such a way that pytesseract can always recognise them.
I started to try and solve this problem on my own and the solution I'm coming up with might work but it's getting rather complicated. So I'm wondering if there is a better way to do it.
At the moment I'm using the cv2.findContours() function to get all the contours of the objects in the image. For every contour I calculate the center coordinates and the area of its bounding box, and I sort these by area to get the largest contours. Here it starts to get more and more complicated: I can't just take the biggest 16 contours, because there might be unwanted objects in the picture with a bigger area than the 16 letters I want. Also, some letters like O, P, Q, ... have two contours, and their inner contour might even be bigger than another letter's outer contour (the letter I, for example).
E.g. this is an image with the 18 biggest contours marked with green boxes:
So to continue with my way of attacking the problem, I would have to write an algorithm that finds the contours that are most likely part of the grid, while ignoring the unwanted contours and the inner contours of letters that have two contours.
While this is possible, I'm wondering if there is a better way of doing this.
Somebody told me that if you filter the image so that everything gets blurrier, all the letters become blobs, and it might then be possible to do pattern detection on a 4x4 grid of blobs. But I don't know how to do that, or whether it's possible.
So if somebody knows a better way to tackle this problem, or knows how to execute the plan of attack I mentioned earlier, that would be most helpful.
Thanks in advance!
You can simply filter the bounding rectangles by width and height. As this is a rule-based approach, it may need more example images to fine-tune the filter rules.
import cv2

# get bounding rectangles of contours
img = cv2.imread('img.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
bbox = [cv2.boundingRect(c) for c in contours]

# filter rectangles by width and height
for x, y, w, h in bbox:
    if (4 < w < 200) and (30 < h < 200):
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
I am using OpenCV's threshold functions on an image to get a binary image. With Otsu's thresholding I get an image that has white patches due to the different lighting conditions in parts of the image,
while adaptive thresholding fixes the lighting conditions but fails to accurately represent the pencil-filled bubbles that Otsu actually can represent.
How can I get both the filled bubbles represented and the lighting fixed, without the patches?
Here's the original image
Here is my code
# binary image conversion (papergray is the grayscale input image)
thresh2 = cv2.adaptiveThreshold(papergray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 21, 13)
thresh = cv2.threshold(papergray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
cv2.imshow("Binary", thresh)  # Otsu's
cv2.imshow("Adaptive", thresh2)
An alternative approach is to apply a morphological closing, which removes all the drawing and yields an estimate of the illumination level. Dividing the image by this illumination estimate gives you an image of the sheet corrected for illumination:
In this image we can easily apply a global threshold:
I used the following code:
import diplib as dip

img = dip.ImageRead('tlCw6.jpg')(1)                            # read image, take one channel
corrected = img / dip.Closing(img, dip.SE(40, 'rectangular'))  # divide by the illumination estimate
out = dip.Threshold(corrected, method='triangle')[0]           # global threshold (triangle method)
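If you'd rather stay within OpenCV, a rough equivalent of the same idea (closing to estimate the illumination, dividing it out, then thresholding globally) might look like this; the filename and the 40x40 kernel size are assumptions carried over from the DIPlib snippet:

import cv2

img = cv2.imread('tlCw6.jpg')[:, :, 1]  # green channel

# closing with a kernel larger than the strokes removes the drawing,
# leaving an estimate of the illumination level
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 40))
background = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

# divide out the illumination and rescale to the full 0-255 range
corrected = cv2.divide(img, background, scale=255)

# a single global threshold now works across the whole sheet
_, out = cv2.threshold(corrected, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_TRIANGLE)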
This can be done with cv2.ADAPTIVE_THRESH_MEAN_C:
import cv2
img = cv2.imread("omr.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# a fairly large blockSize (31) keeps adaptive thresholding from acting like an edge detector
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 31, 10)
cv2.imshow("Mean Adaptive Thresholding", thresh)
cv2.waitKey(0)
The output is:
Problems with your approach:
The methods you have tried out:
Otsu threshold: the threshold is decided based on all the pixel values in the entire image (a global technique). If you look at the bottom-left of your image, there is a gray shade which can have an adverse effect on the chosen threshold value.
Adaptive threshold: here is a recent answer on why it isn't helpful. In short, it acts like an edge detector at smaller kernel sizes.
What you can try:
OpenCV's ximgproc module has specialized binarization methods. One such method is the popular Niblack threshold technique.
This is a local threshold technique that depends on statistical measures. It divides the image into blocks (sub-images) of a size predefined by the user, and for each block sets the threshold to the mean minus k times the standard deviation of its pixel values, where k is chosen by the user.
Code:
import cv2

img = cv2.imread('omr_sheet.jpg')
blur = cv2.GaussianBlur(img, (3, 3), 0)
gray = cv2.cvtColor(blur, cv2.COLOR_BGR2GRAY)
# blockSize=41, k=-0.1; BINARIZATION_NICK is a variant of the Niblack technique
niblack = cv2.ximgproc.niBlackThreshold(gray, 255, cv2.THRESH_BINARY, 41, -0.1, binarizationMethod=cv2.ximgproc.BINARIZATION_NICK)
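Note that cv2.ximgproc ships with the opencv-contrib-python package, not the base OpenCV install.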
Result:
Links:
To know more about cv2.ximgproc.niBlackThreshold, see its documentation.
The documentation also covers other binarization techniques that you may want to explore, and links to the research papers that explain each technique in detail.
Edit:
Adaptive threshold actually works if you know what you are working with, since you can decide the kernel size beforehand.
See Prashant's answer.
I am using pytesseract to read numbers from the screen in real time.
The image is mostly numbers, dots and two letters (M and R), as below.
In real time the numbers keep changing, but the letters M and R stay in the same place.
The background will always be green, with black letters.
As you can see the numbers on the image are very clear, but pytesseract does not read them reliably. Sometimes it reads a 7 as a 1.
I would like to find algorithms that help improve the OCR result.
Currently I am using Pillow to convert the image to grayscale, and I have tried resizing the image bigger and smaller, but the result doesn't improve much. I also applied a filter to the image as below, but the result is still not 100% correct.
import cv2
import pytesseract as tess

img = cv2.imread('screenshot.png')
scale_factor = 2  # example value, not from the original post; tune for your input
img = cv2.resize(img, None, fx=scale_factor, fy=scale_factor, interpolation=cv2.INTER_CUBIC)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
text = tess.image_to_string(img)
Please suggest any algorithms that could improve this OCR result.
You can easily detect the text by applying simple thresholding.
Threshold
Result
3845.86 M51.31 M 309.12 3860.43 R191.90 R23.44
Thresholding will show the features of the image.
Code:
import cv2
import pytesseract
img = cv2.imread("UEWHj.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
txt = pytesseract.image_to_string(thr)
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)
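If the digits are still occasionally misread, one optional tweak (my suggestion, not part of the original answer) is to restrict Tesseract to the characters you expect and to a single-block page segmentation mode:

txt = pytesseract.image_to_string(thr, config='--psm 6 -c tessedit_char_whitelist=0123456789.MR')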
I want to detect how many cards are present in this image using Python. I was trying a white-pixel count, but I am not getting the correct result.
My code is given below:
import cv2
import numpy as np
img = cv2.imread('imaagi.jpg', cv2.IMREAD_GRAYSCALE)
n_white_pix = np.sum(img == 255)
print('Number of white pixels:', n_white_pix)
I am a beginner, so I am unable to work out the right approach.
This solution is with respect to the image you have provided and the implementation is in OpenCV.
Code:
import cv2
import numpy as np

im = cv2.imread('C:/Users/Jackson/Desktop/cards.jpg', 1)

#--- convert the image to HSV color space ---
hsv = cv2.cvtColor(im, cv2.COLOR_BGR2HSV)
cv2.imshow('H', hsv[:,:,0])
cv2.imshow('S', hsv[:,:,1])

#--- find Otsu threshold on the hue and saturation channels ---
ret, thresh_H = cv2.threshold(hsv[:,:,0], 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
ret, thresh_S = cv2.threshold(hsv[:,:,1], 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

#--- add the results of the above two ---
cv2.imshow('thresh', thresh_H + thresh_S)

#--- some morphology operations to clear unwanted spots ---
kernel = np.ones((5, 5), np.uint8)
dilation = cv2.dilate(thresh_H + thresh_S, kernel, iterations=1)
cv2.imshow('dilation', dilation)

#--- find contours on the result above (OpenCV 3.x returns three values) ---
(_, contours, hierarchy) = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

#--- since a few small contours were found, retain only those above a certain area ---
im2 = im.copy()
count = 0
for c in contours:
    if cv2.contourArea(c) > 500:
        count += 1
        cv2.drawContours(im2, [c], -1, (0, 255, 0), 2)

cv2.imshow('cards_output', im2)
print('There are {} cards'.format(count))
cv2.waitKey(0)
Result:
On the terminal I got: There are 6 cards
Depending on how exactly your "white pixel approach" works (please share more details on it if possible), you could try simple image binarization, which is a well-established way of separating different objects/entities in an image. Granted, it works only on grayscale images, but that is something you can easily fix (e.g. with OpenCV's cvtColor).
It might not provide optimal results right away, especially if the lighting conditions vary across images or if (as seen above) the cards contain a wide variety of colors.
To circumvent this, you could also look into different color spaces, e.g. HSV.
If that still does not work, I would recommend the image segmentation routines in OpenCV or similar libraries. The caveat is that they usually bring additional complexity to your project, which might not be necessary if a simple approach such as binarization already works.
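As a minimal sketch of that binarization baseline (the filename is an assumption, and this is my illustration rather than the answerer's code):

import cv2

# load directly as grayscale and let Otsu pick the threshold
img = cv2.imread('cards.jpg', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print('Foreground pixels:', int((binary == 255).sum()))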
I am trying to do character detection: draw a box around each character, then crop it and feed it to a neural network for recognition. Everything was working while I was using sets of characters on a single-color background image, where segmentation was easy.
However, with real photos I have different lighting conditions and really struggle to find the contours.
After applying some adaptive thresholding I managed to get the following results, but from there I can't figure out how to properly proceed and detect each character. I can detect half of the characters easily, but not all of them, probably because they are surrounded by lots of small irrelevant contours.
I have a feeling there is one step left but I can't figure out which one.
findContours is capable of finding only about half of the characters.
For now, in short, I'm doing:
im_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
im_gray = cv2.GaussianBlur(im_gray, (5, 5), 0)
_, th1 = cv2.threshold(im_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cim, ctrs, hier = cv2.findContours(th1.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
and
th2 = cv2.adaptiveThreshold(im_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2)
Images below: the original image and some variations of the intermediate results.
Original picture:
After some thresholding:
After some thresholding:
Inverse thresholding:
So the question is: what step or steps come after this to segment the characters?
You can perform a difference of Gaussians. The idea is to blur the image with two kernels of different sizes and subtract the respective results:
Code:
import cv2

im = cv2.imread(img, 0)  # img holds the path to the input image; 0 loads it as grayscale

#--- it is better to take bigger kernel sizes to remove smaller edges ---
kernel1 = 15
kernel2 = 31
blur1 = cv2.GaussianBlur(im, (kernel1, kernel1), 0)
blur2 = cv2.GaussianBlur(im, (kernel2, kernel2), 0)

# note: uint8 subtraction wraps around, which is what accentuates the edges here
cv2.imshow('Difference of Gaussians', blur2 - blur1)
cv2.waitKey(0)
Result:
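From here, one possible continuation (my sketch, not part of the original answer) is to threshold the DoG image and find the character contours on the result:

import cv2

dog = blur2 - blur1  # from the snippet above

# Otsu separates the accentuated character edges from the background
_, mask = cv2.threshold(dog, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# close small gaps so each character becomes one connected blob
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

# keep only reasonably sized contours (the area cutoff is an arbitrary example)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 50]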