How to improve python/tesseract Image to Text accuracy? - python

How can I grab an image from a region and properly use tesseract to translate it to text? This is what I have currently:
img = ImageGrab.grab(bbox=(1341, 182, 1778, 213))
tesstr = pytesseract.image_to_string(np.array(img), lang='eng')
print(tesstr)
The issue is that it translates the text incredibly wrong, because the region it's reading has red text on a blue background. How can I improve its accuracy? Example of what it's trying to turn from image to text:

Start with the Tesseract documentation page Improving the quality of the output and try each of the suggested methods. If you still can't achieve the desired result, look at these in particular:
Thresholding Operations using inRange
Changing Colorspaces
Image segmentation
To get the desired result, you need to extract a binary mask of the text. Neither simple thresholding nor adaptive thresholding works well for this input image.
To get the binary mask:
Up-sample the input image and convert it to the HSV color-space.
Set lower and upper color boundaries and apply cv2.inRange.
Result:
The OCR output with pytesseract version 0.3.7 will be:
Day 20204, 16:03:12: Your ‘Metal Triangle Foundation’
was destroved!
Code:
import cv2
import numpy as np
import pytesseract
# Load the image
img = cv2.imread("b.png")
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to HSV color-space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Get the binary mask
msk = cv2.inRange(hsv, np.array([0, 0, 123]), np.array([179, 255, 255]))
# OCR
txt = pytesseract.image_to_string(msk)
print(txt)
# Display
cv2.imshow("msk", msk)
cv2.waitKey(0)

There is an option in the Tesseract API that lets you specify the DPI at which the image is examined for text. The higher the DPI, the higher the precision, until diminishing returns set in, and more processing power is required. The DPI setting should not exceed the original image's DPI.

Related

Remove large blobs of noise while leaving text intact with opencv

I'm trying to process an image with opencv before feeding it to tesseract for text recognition. I'm working with tags which have a lot of noise and a holographic pattern that varies greatly depending on lighting conditions.
For this reason I tried different approaches.
First I convert the image to grayscale, then I apply a median blur to soften the noise, then apply an adaptive threshold for masking.
I end up with this result
However the image still has a lot of noise around the text which really messes with its recognition. I'm new to image processing and I'm a bit lost as to how to proceed.
Here's the code for the aforementioned approach:
def approach_3(img, blurIterations=13):
    img = rotate_img(img)
    img = resize(img)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.uint8)
    # median blur to soften the noise (kernel size must be odd)
    median = cv2.medianBlur(gray, blurIterations)
    # adaptive threshold for masking
    th2 = cv2.adaptiveThreshold(median, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 31, 12)
    return th2

Python Tesseract OCR isn't properly detecting both numbers

This is the image I'm using. This is the output that was returned with print(pytesseract.image_to_string(img)):
#& [COE] daKizz
6 Grasslands
Q Badlands
#g Swamp
VALUE: # 7,615,; HIRE)
I've tried the config parameter (psm 0 through 13, and "digits") but it doesn't make much difference. I'm considering training tesseract to recognize the type of numbers in an image like that, but I'm hoping there are other options.
I've also tried cleaning up the picture, but I get the same output.
How can I fix this?
I guess you are missing the image processing part.
From the documentation page "Improving the quality of the output", you may be interested in the Image Processing section.
One way of reaching the desired numbers is inRange thresholding.
To apply inRange, we first need to convert the image to the HSV color-space, since we want to select pixels that fall within a color range.
The result will be:
Now if you read it with "digits" option:
53.8.
53.8.
453.3.
7.615.725.834
One possible question is: Why inRange thresholding?
Well, if you try the other methods (simple or adaptive thresholding), you will see that inRange is more accurate for the given input image.
Another possible question is: Can we reuse the same cv2.inRange values for other images?
You may not get the desired result; for other images you need to find other range values.
Code:
import cv2
import pytesseract
import numpy as np
# Load the image and convert to HSV
bgr_image = cv2.imread("vNYcf.jpg")
hsv_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
# Mask the colored digits
mask = cv2.inRange(hsv_image, np.array([0, 180, 218]), np.array([60, 255, 255]))
# Thicken the strokes, then invert to black-on-white for OCR
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dilate = cv2.dilate(mask, kernel, iterations=1)
thresh = 255 - cv2.bitwise_and(dilate, mask)
# OCR
text = pytesseract.image_to_string(thresh, config="--psm 6 digits")
print(text)

Pytesseract Image to String issue

Does anyone know how I can get these results better?
Total Kills: 15,230,550
Kill Details: (recorded after 2019/10,/Z3]
993,151 331,129
1,330,450 33,265,533
5,031,168
This is what it returns; however, it is meant to match the image posted below. I am new to Python, so are there any parameters I can add to make it read the image better?
img = cv2.imread("kills.jpeg")
text = pytesseract.image_to_string(img)
print(text)
This is my code to read the image. Is there anything I can add to make it read better? The black boxes cover images that were interfering with the reading; I added them to check whether the images behind them were causing the issue, but I still get the same result.
The missing piece is the page segmentation mode (psm). You need to adjust it when you can't get the desired result.
If we look at your image, the only artifacts are the black columns. Other than that, the image is almost binary already, which is suitable for Tesseract to recognize the characters and digits.
Let's try reading the image with psm set to 6:
6 Assume a single uniform block of text.
print(pytesseract.image_to_string(img, config="--psm 6"))
The result will be:
Total Kills: 75,230,550
Kill Details: (recorded after 2019/10/23)
993,161 331,129
1,380,450 33,265,533
5,031,168
Update
The second way to solve the problem is to get a binary mask and apply OCR to the masked features.
Binary-mask
Features of the binary-mask
As we can see, the result is slightly different from the input image. Now when we apply OCR, the result will be:
Total Kills: 75,230,550
Kill Details: (recorded after 2019/10/23)
993,161 331,129
1,380,450 33,265,533
5,031,168
Code:
import cv2
import numpy as np
import pytesseract
# Load the image
img = cv2.imread("LuKz3.jpg")
# Convert to hsv
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Get the binary mask
msk = cv2.inRange(hsv, np.array([0, 0, 0]), np.array([179, 255, 154]))
# Extract
krn = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dlt = cv2.dilate(msk, krn, iterations=5)
res = 255 - cv2.bitwise_and(dlt, msk)
# OCR
txt = pytesseract.image_to_string(res, config="--psm 6")
print(txt)
# Display
cv2.imshow("res", res)
cv2.waitKey(0)

Thresholding picture in opencv

I need help thresholding a picture. I need to identify different types of gummies, but I cannot get past the thresholding part of my project.
I have various pictures but this is one of them:
I have done this using a mean_c threshold, but I need better results to find the contours afterwards.
This is the original picture:
You may get better results by converting the image from RGB to the HSV color-space and thresholding by hue (the color tone) and saturation (how much color there is compared to gray). Using saturation you might get most of your gummies, except the transparent ones, which are quite hard to segment.
On the other hand, you may try edge detection, since your paper is flat and the gummies really stand out. Here's the edge detection result I've got:
Here's the code:
#!/usr/bin/env python
import cv2

img = cv2.imread('Downloads/gummies.jpg')
# downscale twice to reduce noise and speed things up
img = cv2.pyrDown(cv2.pyrDown(img))
# Laplacian highlights the gummy edges against the flat paper
laplacian = cv2.Laplacian(img, cv2.CV_8U)
cv2.normalize(laplacian, img, 0, 600, cv2.NORM_MINMAX)
cv2.imshow('frame', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Converting image to binary doesn't show the white lines of a soccer field

I am inspired by the following blogpost, however I am struggling with step 2/3.
I want to create a binary image from a gray image based on threshold values, ultimately displaying all the white lines on the image. My desired output looks as follows:
First, I want to isolate the soccer field by using colour-thresholding and morphology.
def isolate_field(img):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    # find green pitch
    light_green = np.array([40, 40, 40])
    dark_green = np.array([70, 255, 255])
    mask = cv2.inRange(hsv, light_green, dark_green)
    # removing small noises
    kernel = np.ones((5, 5), np.uint8)
    opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # apply mask over original frame
    return cv2.bitwise_and(img, img, mask=opening)
This gives the following output:
I am happy with the results so far, but because of the large shadow I am struggling with the image-processing when I grayscale the picture. As a result, the binary thresholding is based on the sunny part in the upper-left corner instead of the white lines around the soccer field.
Following the methodology on the tutorials I get the following output for the simple thresholding:
and adaptive thresholding:
and finally, Otsu's thresholding:
How can I make sure that the white lines become more visible? I was thinking about cropping the frame so I only see the field and then use a mask based on the color white. That didn't work out unfortunately.
Help is much appreciated,
You can modify the inRange call to also exclude saturated colors (meaning the greens). I don't have your original image, so I used your intermediate result:
The result of inRange is the binary image you want, and I expect you can achieve better results with the original image. I used this script on the image, which makes it easy to search for good HSV values.
