Remove large blobs of noise while leaving text intact with opencv - python

I'm trying to process an image with OpenCV to feed it to Tesseract for text recognition. I'm working with tags that have a lot of noise and a holographic pattern which varies greatly depending on lighting conditions.
For this reason I have tried different approaches.
First I convert the image to grayscale, then I apply a median blur to soften the noise, and then apply an adaptive threshold for masking.
I end up with this result:
However, the image still has a lot of noise around the text, which really interferes with recognition. I'm new to image processing and I'm a bit lost as to how to proceed.
Here's the code for the aforementioned approach:
def approach_3(img, blurIterations=13):
    img = rotate_img(img)
    img = resize(img)
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.uint8)
    # Soften the noise; blurIterations is the median-blur aperture size (must be odd)
    median = cv2.medianBlur(gray, blurIterations)
    # Adaptive threshold for masking
    th2 = cv2.adaptiveThreshold(median, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 31, 12)
    return th2
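One common way to attack the remaining blobs (a sketch of connected-component filtering, not something from the original post) is to keep only components whose area is in the range of a character and erase everything else. A minimal sketch, assuming th2 has white text on a black background (invert it first if it doesn't), with min_area and max_area as placeholder values to tune:
import cv2
import numpy as np
# Filter connected components by area: keep character-sized blobs only.
# min_area and max_area are assumptions to tune for your tag images.
num, labels, stats, _ = cv2.connectedComponentsWithStats(th2, connectivity=8)
min_area, max_area = 50, 2000
cleaned = np.zeros_like(th2)
for i in range(1, num):  # label 0 is the background
    if min_area <= stats[i, cv2.CC_STAT_AREA] <= max_area:
        cleaned[labels == i] = 255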

Related

How to improve python/tesseract Image to Text accuracy?

How can I grab an image from a region and properly use Tesseract to translate it to text? This is what I have currently:
img = ImageGrab.grab(bbox=(1341, 182, 1778, 213))
tesstr = pytesseract.image_to_string(np.array(img), lang='eng')
print(tesstr)
The issue is that it translates it incredibly wrong, because the region it's getting the text from has red text on a blue background. How can I improve its accuracy? Example of what it's trying to turn from image to text:
You should read Tesseract's documentation on Improving the quality of the output and try each of the suggested methods. If you still can't achieve the desired result, look at the other approaches:
Thresholding Operations using inRange
Changing Colorspaces
Image segmentation
To get the desired result, you need a binary mask of the image. Neither simple thresholding nor adaptive thresholding will work for this input image.
To get the binary mask:
Up-sample the input image and convert it to the HSV color-space.
Set the lower and upper color boundaries.
Result:
The OCR output with pytesseract version 0.3.7 will be:
Day 20204, 16:03:12: Your ‘Metal Triangle Foundation’
was destroved!
Code:
import cv2
import numpy as np
import pytesseract
# Load the image
img = cv2.imread("b.png")
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to HSV color-space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Get the binary mask: keep pixels of any hue and saturation whose value is at least 123
msk = cv2.inRange(hsv, np.array([0, 0, 123]), np.array([179, 255, 255]))
# OCR
txt = pytesseract.image_to_string(msk)
print(txt)
# Display
cv2.imshow("msk", msk)
cv2.waitKey(0)
There is an option in the Tesseract API that lets you specify the DPI at which the image is examined to detect text. The higher the DPI, the higher the precision, until diminishing returns set in; more processing power is required, and the specified DPI should not exceed the original image's DPI.
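For example, a minimal sketch of passing a DPI hint through pytesseract's config string (the value 300 is just an illustrative assumption to tune):
import pytesseract
# Tell Tesseract to treat the image as 300 DPI instead of guessing
txt = pytesseract.image_to_string(msk, config='--dpi 300')
print(txt)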

How to segment curved roads/lanes/paths/tracks/objects etc?

Is there any way of segmenting/detecting tracks like those in the images below using image-processing techniques?
Figure 1: Tracks of wheels on sand
Figure 2: One example of a track to be detected
In my opinion, the answer is no. First I removed the illumination effect and applied Canny to get the features of the image; only a partial track is visible. Next, I performed color segmentation to get a binary mask and used it to remove the background; again, only a partial track is visible.
Removing the illumination effect: To make the track more visible, we need to reduce the lighting variation in the image. We smooth the image using cv2.GaussianBlur, then divide the original by the blurred version using cv2.divide to even out the pixel values.
# Load the image
img = cv2.imread("v1uU4.jpg")
# Convert to grayscale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Remove the lighting effect: divide by a heavily blurred copy
blr = cv2.GaussianBlur(gry, (125, 125), 0)
div = cv2.divide(gry, blr, scale=192)
Displaying features of the image: Then we load the lighting-corrected image, convert it to grayscale, and apply Gaussian smoothing before extracting the Canny features:
# Load the lighting-corrected image
img = cv2.imread("non-lightning.png")
# Convert to grayscale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Smooth before edge detection
blr = cv2.GaussianBlur(gry, (5, 5), 0)
# Find Canny features
cny = cv2.Canny(blr, 50, 200)
Here you can see that only the left part of the track is partially visible. Of course, different parameters will give different results; if you loosen them, more unwanted features appear as well.
Second method
Color segmentation: We load the lighting-corrected image, convert it to the HSV color-space, and find a binary mask using cv2.inRange. Then we use the binary mask to make the track more visible.
# Convert to HSV color-space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Perform color-segmentation to get the binary mask
lwr = np.array([0, 0, 0])
upr = np.array([179, 255, 194])
msk = cv2.inRange(hsv, lwr, upr)
Extracting the track using the binary mask:
# Extract the track using the binary mask
krn = cv2.getStructuringElement(cv2.MORPH_RECT, (50, 30))
dlt = cv2.dilate(msk, krn, iterations=5)
res = 255 - cv2.bitwise_and(dlt, msk)
From my point of view, it is impossible to remove the background completely and display only the track.

Reading numbers using PyTesseract

I am trying to read numbers from images and cannot find a way to get it to work consistently (not all images have numbers). These are the images:
(here is the link to the album in case the images are not working)
This is the command I'm using to run Tesseract on the images: pytesseract.image_to_string(image, timeout=2, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789'). I have tried multiple configurations, but this seems to work best.
As far as preprocessing goes, this works the best:
gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
gray = cv2.bilateralFilter(gray, 11, 17, 17)
# thresh is a threshold value chosen elsewhere in my code
im_bw = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)[1]
This works for all images except the third one. To deal with the lines in the third image, I tried extracting the edges with cv2.Canny and a fairly large threshold, which works; but even though it recovers more than 95% of each number's edges when I draw them back, Tesseract does not read them correctly.
I have also tried resizing the image, using cv2.morphologyEx, blurring it, etc. I cannot find a way to get it to work for every case.
Thank you.
cv2.resize has consistently worked for me with INTER_CUBIC interpolation.
Adding this last step to pre-processing would most likely solve your problem.
im_bw_scaled = cv2.resize(im_bw, (0, 0), fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
You could play around with the scale factor; I have used 4 above.
EDIT:
The following code worked very well with your images, even with special characters. Please try it out on the rest of your dataset. Scaling, Otsu thresholding, and erosion were the best combination.
import cv2
import numpy
import pytesseract

pytesseract.pytesseract.tesseract_cmd = "<path to tesseract.exe>"
# Page segmentation mode (PSM) changed to 6, since each page is a single uniform block of text
custom_config = r'--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
# Load the image as grayscale
img = cv2.imread("5.png", cv2.IMREAD_GRAYSCALE)
# Change all pixels to black if they aren't white already (all characters were white)
img[img != 255] = 0
# Scale it 10x
scaled = cv2.resize(img, (0, 0), fx=10, fy=10, interpolation=cv2.INTER_CUBIC)
# Retained your bilateral filter
filtered = cv2.bilateralFilter(scaled, 11, 17, 17)
# Otsu thresholding
thresh = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Erode the image to bulk it up for Tesseract
kernel = numpy.ones((5, 5), numpy.uint8)
eroded = cv2.erode(thresh, kernel, iterations=2)
pre_processed = eroded
# Feed the pre-processed image to Tesseract and print the output
ocr_text = pytesseract.image_to_string(pre_processed, config=custom_config)
if len(ocr_text) != 0:
    print(ocr_text)
else:
    print("No string detected")

cv2 digit-image postprocessing

I'm trying to implement a digit classifier by myself, and I've run into some trouble. I'm training the NN on the MNIST handwritten-digit dataset (MNIST sample digit). But when I try to predict a digit from an image that I found and processed with cv2 (cv2 processed digit), my own image has a much fatter stroke and sharp boundaries, as you can see.
That is my digit image before the processing (Before) and after (After). But I want the image to be like this after the processing.
I use the following code to process each digit:
def main():
    image = cv2.imread('digit.jpg', cv2.IMREAD_GRAYSCALE)
    image = image.reshape((32, 32, 1))
    image = postprocess(image)

def postprocess(gray):
    kernel_size = 15
    blur_gray = cv2.GaussianBlur(gray, (kernel_size, kernel_size), 0)
    thresh = cv2.adaptiveThreshold(blur_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 3)
    return thresh
I use 11 as the adaptive-threshold block size to discard most artifacts, but the digit's stroke is still too bold/fat and its boundaries are too sharp.
The question is: how can I process the image to make it look like a training sample image (thicker and with blurred boundaries)?
I found a solution to my problem using kernel filtering. Today I stumbled upon an article about kernel image processing that listed some kernels for "edge detection". I tried them all, but none of them was good enough. Then I made my own kernel by accident, and it works wonderfully for me!
So, here is the code:
def main():
    image = cv2.imread('digit.jpg', cv2.IMREAD_GRAYSCALE)
    image = postprocess(image)
    image = image.reshape((32, 32, 1))

def postprocess(gray):
    gray_big = cv2.resize(gray, (256, 256))
    # A hand-made kernel that softens the stroke
    kernel = np.array([[0, -2, 0], [-2, 10, -2], [0, -2, 0]])
    filtered = cv2.filter2D(gray_big, -1, kernel)
    # Invert and normalize to [0, 1]
    filtered = 255 - filtered
    filtered = filtered / 255
    filtered = cv2.resize(filtered, (32, 32))
    return filtered
Resizing the image to a larger size keeps it clear of artifacts after the kernel processing. I tried the same thing at the original image size, but the result wasn't as clean as in my final version.
Result:
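As an aside (not the author's method), another common way to mimic MNIST's soft, thick strokes is a morphological dilation followed by a light Gaussian blur. A minimal sketch, assuming a dark digit on a light background in digit.jpg:
import cv2
import numpy as np

digit = cv2.imread('digit.jpg', cv2.IMREAD_GRAYSCALE)
# Threshold to white-on-black (as in MNIST), thicken the stroke, then soften the edges
_, bw = cv2.threshold(digit, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
thick = cv2.dilate(bw, np.ones((3, 3), np.uint8), iterations=1)
soft = cv2.GaussianBlur(thick, (3, 3), 0)
# Down-sample to the network's input size and scale to [0, 1]
small = cv2.resize(soft, (32, 32), interpolation=cv2.INTER_AREA) / 255.0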

Converting an image to binary doesn't show the white lines of a soccer field

I was inspired by the following blog post; however, I am struggling with steps 2 and 3.
I want to create a binary image from a gray image based on threshold values and ultimately display all the white lines in the image. My desired output looks as follows:
First, I want to isolate the soccer field using color thresholding and morphology.
def isolate_field(img):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    # Find the green pitch
    light_green = np.array([40, 40, 40])
    dark_green = np.array([70, 255, 255])
    mask = cv2.inRange(hsv, light_green, dark_green)
    # Remove small noise
    kernel = np.ones((5, 5), np.uint8)
    opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # Apply the mask over the original image
    return cv2.bitwise_and(img, img, mask=opening)
This gives the following output:
I am happy with the results so far, but because of the large shadow I am struggling with the image processing once I grayscale the picture. As a result, the binary threshold is driven by the sunny part in the upper-left corner instead of the white lines around the soccer field.
Following the methodology in the tutorials, I get the following output for simple thresholding:
and adaptive thresholding:
and finally, Otsu's thresholding:
How can I make the white lines more visible? I was thinking about cropping the frame so I only see the field and then using a mask based on the color white; that didn't work out, unfortunately.
Help is much appreciated.
You can modify the inRange bounds to also exclude saturated colors (meaning the greens). I don't have your original image, so I used your intermediate result:
The result of inRange is the binary image you want; I expect you can achieve even better results with the original image. I used this script on the image, which makes it easy to search for good HSV values.
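A minimal sketch of that idea (the bounds below are assumptions to tune with such a script): white lines have low saturation and high value, so cap the saturation channel in the upper bound to exclude the greens:
import cv2
import numpy as np

img = cv2.imread("field.png")  # hypothetical filename
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Any hue, low saturation (excludes the saturated greens), high value -> white-ish pixels
lwr = np.array([0, 0, 150])
upr = np.array([179, 60, 255])
msk = cv2.inRange(hsv, lwr, upr)
cv2.imshow("white lines", msk)
cv2.waitKey(0)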
