Preprocess bad scans (partially blurred, shadowed and slightly skewed) for OCR - python

I am trying to read documents from various sources with Python, using OpenCV and Tesseract. To optimize Tesseract's performance I do some preprocessing, but sadly the documents vary a lot in quality. My current issue is documents that are only partially blurry or shadowed due to bad scans.
I have no influence on the document quality, and manual feature detection is not an option, because the code eventually has to run over hundreds of thousands of documents, and the quality can vary considerably even within a single document.
To get rid of the shadow, I found a technique that dilates and blurs the image and then divides the original by that blurred, dilated version.
import cv2
import numpy as np

# img: the scan loaded beforehand as grayscale, e.g. cv2.imread(path, cv2.IMREAD_GRAYSCALE)
h, w = img.shape
kernel = np.ones((7, 7), np.uint8)
dilation = cv2.dilate(img, kernel, iterations=1)
blurred_dilation = cv2.GaussianBlur(dilation, (13, 13), 0)
resized = cv2.resize(blurred_dilation, (w, h))
# divide the original by the smoothed background estimate and rescale to 0-255
corrected = np.clip(img / resized * 255, 0, 255).astype(np.uint8)
That works very well.
But the blur is still there, and visually the text became even harder to read. I would like to do a binarisation next, but then nothing useful would be left of the blurred parts.
I found an example of a deconvolution that works for motion blur, but I can only apply it to the whole image, which degrades the parts of the text that are already sharp, and it requires knowing the direction of the motion blur.
So I hope to get some help on how to optimize this kind of image so that Tesseract can read it properly.
I know there should be further optimizations besides sharpening the blurred text, such as deskewing and removing the fragments of other pages, but I am not sure about the proper order in which to perform these additional steps.
I can hardly find sources or tutorials on plain document optimization for OCR pipelines; the procedures I do find usually apply globally to the whole image or are aimed at non-OCR applications.

Reminds me of this article I read a few years ago: https://medium.com/illuin/cleaning-up-dirty-scanned-documents-with-deep-learning-2e8e6de6cfa6
Contrary to the title, it contains a variety of classic computer vision algorithms for your inspiration.
To remove shadows, I've personally had better results with median filtering as described there (removing a median-filtered background estimate) than with what you show here.
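A rough illustration of that median-filter idea (the kernel size and file names here are just placeholders you would tune):

import cv2
# Estimate the background with a large median filter and divide it out.
# The kernel size (51 here) must be odd and large enough to wipe out the text strokes.
gray = cv2.imread('scan.png', cv2.IMREAD_GRAYSCALE)   # placeholder path
background = cv2.medianBlur(gray, 51)
shadow_free = cv2.divide(gray, background, scale=255)
cv2.imwrite('scan_shadow_free.png', shadow_free)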
To deskew, I've experimented with Hough transform and got good results.
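A minimal sketch of that kind of Hough-based deskewing (the Canny and Hough parameters are guesses that need tuning per document set, not the exact code I used):

import cv2
import numpy as np

def estimate_skew_angle(gray):
    # Detect roughly horizontal text lines and take the median of their angles.
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                            minLineLength=gray.shape[1] // 4, maxLineGap=20)
    if lines is None:
        return 0.0
    angles = []
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) < 15:                 # keep only near-horizontal lines
            angles.append(angle)
    return float(np.median(angles)) if angles else 0.0

def deskew(gray):
    angle = estimate_skew_angle(gray)
    h, w = gray.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray, m, (w, h), flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_REPLICATE)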
Intuitively, if you know the font type and size in advance, that should help as well.

import cv2
import numpy as np
import skimage.filters as filters
# read the image as grayscale
img = cv2.imread("input/ocr.png", 0)
# estimate the background with a large Gaussian blur
background = cv2.GaussianBlur(img, (91, 91), 0)
# divide gray by the background estimate to remove shading
division = cv2.divide(img, background, scale=255)
# sharpen using unsharp masking
# (multichannel=False is for a grayscale image; recent skimage versions use channel_axis instead)
sharp = filters.unsharp_mask(division, radius=11, amount=11, multichannel=False, preserve_range=False)
sharp = (255 * sharp).clip(0, 255).astype(np.uint8)
# threshold with Otsu's method
thresh = cv2.threshold(sharp, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# save results
cv2.imwrite('receipt_division_sharp.png', sharp)
cv2.imwrite('receipt_division_thresh.png', thresh)
Result and result with threshold (see the attached images).
Method: unsharp_mask filter, Otsu's method (1979).
Ref: OpenCV: Contour detection of shadowed image before OCR (Stack Overflow, 2020)
If I were you, I'd try a GAN. Even though the raw data is blurred and shadowed, you need clean data for Tesseract, so you would need to generate clean characters from the blurred raw data.

Related

morphological transformation opencv noob question

Hope you all are having a good day. I'm learning Python and OpenCV on a Raspberry Pi, and I'm hoping someone can explain what the code below does. I've read https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html, but it doesn't explain what iterations means, how to choose the best value, or what the use is for object detection.
Thank you.
for frame in cam.capture_continuous(raw, format="bgr", use_video_port=True):
    frame = frame.array
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, colorLower, colorUpper)
    mask = cv2.blur(mask, (3, 3))
    mask = cv2.dilate(mask, None, iterations=5)
    mask = cv2.erode(mask, None, iterations=1)
    mask = cv2.dilate(mask, None, iterations=3)
    me, thresh = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    cnts = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]
    center = None
It is quite an obvious concept, either way, as stated in the documentation:
iterations: number of times dilation is applied.
So iterations=5 means 5 consecutive times we're applying dilation.
Regarding the applications of this procedure, I recommend reading the Eroding and Dilating tutorial from OpenCV, which explains quite clearly how these methods work. The absolute basics are that we erode and dilate to remove noise from the image and to close "holes" that would otherwise break up a solid contour.
The purpose of these transformations is to get a very solid image of the objects we're looking for, so that contours can be detected afterwards.
The number of iterations and the transformations you apply vary greatly depending on the video, camera features, resolution, amount of noise, etc. You should experiment and find the optimum for your specific situation, or go for a generic erode + dilate and hope for the best.
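To make the iterations parameter concrete, a tiny self-contained sketch (the blob is synthetic; with kernel=None OpenCV uses a 3x3 rectangular kernel):

import cv2
import numpy as np

mask = np.zeros((100, 100), np.uint8)
cv2.circle(mask, (50, 50), 20, 255, -1)        # a filled blob to experiment on

once = cv2.dilate(mask, None, iterations=1)    # grows the blob by about 1 px per side
five = cv2.dilate(mask, None, iterations=5)    # same as applying dilation 5 times in a row
print(cv2.countNonZero(mask), cv2.countNonZero(once), cv2.countNonZero(five))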

How to get the size of an object using OpenCV python?

I have a question regarding OpenCV in Python. I have an example image here, and I want to know how to get the size of the object in it using OpenCV in Python.
Here's the sample image
Here's the output I want
I just used Paint to mock up the output I want.
A simple approach is to obtain a binary image and then find the bounding box on that image. Here's the result with the width and height of the box (in pixels) drawn onto the image. To determine real-world measurements you would need calibration information that scales pixels into concrete units (such as centimeters); without it, it is difficult to convert the pixel dimensions into a real-life size.
Code
import cv2
# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread("1.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Find bounding box
x,y,w,h = cv2.boundingRect(thresh)
cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
cv2.putText(image, "w={},h={}".format(w,h), (x,y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (36,255,12), 2)
cv2.imshow("thresh", thresh)
cv2.imshow("image", image)
cv2.waitKey()
Here's a quick recommendation: why not use an object detector like YOLO? There are lots of pretrained weights you can download online, and banana is fortunately among the COCO classes.
You can take a look at this github repo: https://github.com/divikshrivastava/drfoodie
Edit:
Take a look here for a sample
https://drive.google.com/file/d/1uJE0hy9mv75Ya3dqBIyw-kn1h87WLxy4/view?usp=drivesdk
In your case, using just an RGB picture, you actually can't.
Nevertheless, a simple and practical way would be the following. You need a reference object whose real-world size is known in a measurable unit such as millimeters, placed beforehand at the same distance from the camera as the object of interest. Having detected both objects within the image (the reference object and the object of interest), you can calculate the "Pixel Per Metric" ratio and use it to compute the actual size. For further detail, check these links: tutorial, and a similar example on GitHub.
Another way would be by using depth cameras, or just simply retrieve the distance from the object of interest using alternative techniques as this answer may suggest.
(edit) On the other hand, since your question doesn't clarify whether you mean real-world units (i.e. centimeters) or just a measurement in pixels, forgive me if I misled you.
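To make the pixel-per-metric idea concrete, a toy calculation (all numbers here are made up):

# Suppose the reference object is known to be 25.0 mm wide in the real world
# and its bounding box in the image is ref_w pixels wide.
ref_real_width_mm = 25.0          # assumed, known in advance
ref_w = 120                       # pixels, measured e.g. with cv2.boundingRect
pixels_per_mm = ref_w / ref_real_width_mm

# Any other object at the same distance from the camera can then be converted:
obj_w = 300                       # pixels
obj_real_width_mm = obj_w / pixels_per_mm
print(round(obj_real_width_mm, 1), "mm")    # 62.5 mm with these toy numbers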

Opencv - Extracting data from in-game images

I need some help with an OpenCV project I'm working on. I'm taking images from a computer game (in this case, Fortnite), and I would like to extract different elements from them, e.g. the timer value, quantities of materials, health and shield, etc.
Currently I perform a series of image preprocessing functions until I get a binary image, followed by locating the contours in the image and then sending those contours to a machine learning algorithm (K-Nearest-Neighbours).
I am able to succeed in a lot of cases, but for some images I don't manage to find all of the contours, and therefore I don't extract all the data.
An important thing to note is that I use the same preprocessing pipeline for all images, because I'm looking for as robust a solution as I can manage.
I would like to know what I can do to improve the performance of my program:
Is KNN a good model for this sort of task, or are there other models that might give me better results?
Is there any way to recognise characters without locating contours?
How can I make my preprocessing pipeline as robust as possible, given the fact that there is a lot of variance in the background across all images?
My goal is to process the images as fast as possible, starting out with at least 2 images per second.
Thanks in advance for any help or advice you can give me!
Here is an example image before preprocessing
Here is the image after preprocessing, in this example I cannot find the contour for the 4 on the right side.
Quite simply, enlarging the image might help, since it increases the dark border of the number.
I threw together some code that does that. The result could be improved, but my point here is to show that the 4 can now be detected as a contour. To increase efficiency I only selected contours within a certain size range.
Also, since it is part of the HUD, the location on screen is usually always the same. If so, you can get a big performance increase by processing only the area that contains the values (described here), as I have done manually.
Finally, since the numbers have a consistent shape, you could try matchShapes as an alternative to kNN to recognize the numbers. I don't know how they compare in performance though, so you'll have to try that out yourself.
Result:
Code:
import numpy as np
import cv2
# load image
img = cv2.imread("fn2.JPG")
# enlarge image
img = cv2.resize(img, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
# convert to grayscale (imread returns BGR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# create mask using threshold
ret, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
# find contours in mask (OpenCV 3.x returns three values; in OpenCV 4.x drop `im`)
im, contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# draw contours within a size range on the image
for cnt in contours:
    if 200 < cv2.contourArea(cnt) < 3000:
        cv2.drawContours(img, [cnt], 0, (255, 0, 0), 2)
# show images
cv2.imshow("Mask", mask)
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
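Regarding the matchShapes suggestion above, a rough sketch of how it could be used for digit recognition (the template contours are assumed to be extracted beforehand from clean reference digit images):

import cv2

def classify_digit(contour, templates):
    # templates: dict mapping a digit label to one reference contour (assumed prepared elsewhere)
    best_label, best_score = None, float('inf')
    for label, tmpl in templates.items():
        score = cv2.matchShapes(contour, tmpl, cv2.CONTOURS_MATCH_I1, 0.0)
        if score < best_score:
            best_label, best_score = label, score
    return best_label, best_score    # lower score means a better shape match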

Deblurring an image in order to perform edge detection

I have this image:
I am trying to bring the background into focus in order to perform edge detection on the image. What methods are available to me (in either the spatial or the frequency domain)?
What I tried is the following:
kernel = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])  # 3x3 sharpening kernel
im = cv2.filter2D(equ, -1, kernel)  # equ: the grayscale input image used throughout
This outputs this image:
I also played around with the centre value but with no positive result.
I also tried this:
from scipy.signal import convolve2d
from skimage import restoration
import matplotlib.pyplot as plt
psf = np.ones((5, 5)) / 25
equ = convolve2d(equ, psf, 'same')
deconvolved = restoration.wiener(equ, psf, 1, clip=False)
plt.imshow(deconvolved, cmap='gray')
With no appreciable changes to the image.
Any help on the matter is greatly appreciated!
EDIT:
Here is the code that I took from here:
psf = np.ones((5, 5)) / 25
equ = convolve2d(equ, psf, 'same')
deconvolved, _ = restoration.unsupervised_wiener(equ, psf)
plt.imshow(deconvolved, cmap='gray')
and here is the output:
Deblurring images is (unfortunately) quite difficult. The reason is that blurring removes noise, so several different (noisy) images will yield the same image once you blur them. This means there is no simple way for the computer to "choose" which of those noisy images is the right one when you deblur. Because of this, deblurring will often yield noisy images.
Now then, you might ask how photographers do this in reality. Well, they do not actually deblur images, they sharpen them (which is slightly different). When you sharpen an image, you increase the contrast near borders to emphasise them (this is why you sometimes see a halo around borders on images that have been too heavily sharpened).
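For reference, a minimal unsharp-mask sharpening sketch in OpenCV (the sigma and the weights are just starting points, and the file name is a placeholder):

import cv2

img = cv2.imread('blurry.png', cv2.IMREAD_GRAYSCALE)   # placeholder path
blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=3)
# unsharp mask: original + 0.5 * (original - blurred)
sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)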
In your case, you want to deblur it (and there is no convolution kernel that will allow you to do this). To do it well, you need to know what process blurred the image in the first place (that is, if you don't want to spend thousands of dollars on special software or don't have a master's in mathematics or astronomy).
If you still want to do this, I'd recommend searching for deconvolution, and if you don't know the blurring process, blind deconvolution. There are some (crude) functions for it in skimage, which might be of help (http://scikit-image.org/docs/stable/auto_examples/filters/plot_restoration.html#sphx-glr-auto-examples-filters-plot-restoration-py).
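For instance, a hedged sketch using skimage's Richardson-Lucy deconvolution (the PSF here is a pure guess; in practice you rarely know it, which is exactly the problem):

import numpy as np
from skimage import img_as_float, restoration

image = img_as_float(equ)            # 'equ' is the grayscale image from the question, scaled to [0, 1]
psf = np.ones((5, 5)) / 25           # guessed point spread function
deconvolved = restoration.richardson_lucy(image, psf)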
Finally, the last link from Jax Briggs seems helpful, but I would not cross my fingers for magical results.

Irregular shape detection and measurement in python opencv

I'm attempting to do some image analysis using OpenCV in python, but I think the images themselves are going to be quite tricky, and I've never done anything like this before so I want to sound out my logic and maybe get some ideas/practical code to achieve what I want to do, before I invest a lot of time going down the wrong path.
This thread comes pretty close to what I want to achieve, and in my opinion, uses an image that should be even harder to analyse than mine. I'd be interested in the size of those coloured blobs though, rather than their distance from the top left. I've also been following this code, though I'm not especially interested in a reference object (the dimensions in pixels alone would be enough for now and can be converted afterwards).
Here's the input image:
What you're looking at are ice crystals, and I want to find their average size. The boundaries of each are reasonably well defined, so conceptually this is my approach, and I would like to hear any suggestions or comments if this is the wrong way to go:
The RGB image is imported and converted to 8-bit grayscale (32-bit would be better based on my testing in ImageJ, but I haven't figured out how to do that in OpenCV yet).
The edges are optionally Gaussian blurred to remove noise
A Canny edge detector picks up the lines
Morphological transforms (erosion + dilation) are done to attempt to close the boundaries a bit further.
At this point it seems like I have a choice to make. I could either binarise the image and measure blobs above a threshold (i.e. maximum-value pixels if the blobs are white), or continue with the edge detection by closing and filling contours more fully. Contours seem complicated, though, looking at that tutorial, and although I can get the code to run on my images, it doesn't detect the crystals properly (unsurprisingly). I'm also not sure whether I should apply the morphological transforms before binarising.
Assuming I can get all that to work, I'm thinking a reasonable measure would be the longest axis of the minimum enclosing box or ellipse.
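For what it's worth, a minimal sketch of the pipeline described in the steps above (the file name and every parameter value are placeholders that would need tuning):

import cv2

img = cv2.imread('crystals.png')                        # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)                # optional noise reduction
edges = cv2.Canny(blur, 50, 150)                        # edge detection
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel, iterations=2)   # close small gaps
cnts = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]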
I haven't quite ironed out all the thresholds yet, and consequently some of the crystals are missed, but since they're being averaged, this isn't presenting a massive problem at the moment.
The script stores the processed images as it goes along, so I'd also like the final output image similar to the 'labelled blobs' image in the linked SO thread, but with each blob annotated with its dimensions maybe.
Here's what an (incomplete) idealised output would look like, each crystal is identified, annotated and measured (pretty sure I can tackle the measurement when I get that far).
Abridged the images and previous code attempts as they are making the thread overly long and are no longer that relevant.
Edit III:
As per the comments, the watershed algorithm looks to be very close to achieving what I'm after. The problem here though is that it's very difficult to assign the marker regions that the algorithm requires (http://docs.opencv.org/3.2.0/d3/db4/tutorial_py_watershed.html).
I don't think this is something that can be solved with thresholds through the binarization process, as the apparent colour of the grains varies by much more than the toy example in that thread.
Edit IV
Here are a couple of the other test images I've played with. It fares much better than I expected with the smaller crystals, and there's obviously a lot of finessing that could be done with the thresholds that I haven't tried yet.
Here's 1, top left to bottom right correspond to the images output in Alex's steps below.
And here's a second one with bigger crystals.
You'll notice these tend to be more homogeneous in colour, but with edges that are harder to discern. Something I found a little surprising is that the edge flood-filling is a little overzealous with some of the images; I would have thought this would be particularly the case for the image with the very tiny crystals, but actually it appears to have more of an effect on the larger ones. There is probably a lot of room to improve the quality of the input images from our actual microscopy, but the more 'slack' the programming can take up from the system, the easier our lives will be!
As I mentioned in the comments, watershed looks to be an ok approach for this problem. But as you replied, defining the foreground and the background for the markers is the hard part! My idea was to use the morphological gradient to get good edges along the ice crystals and work from there; the morphological gradient seems to work great.
import numpy as np
import cv2
img = cv2.imread('image.png')
blur = cv2.GaussianBlur(img, (7, 7), 2)
h, w = img.shape[:2]
# Morphological gradient
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
gradient = cv2.morphologyEx(blur, cv2.MORPH_GRADIENT, kernel)
cv2.imshow('Morphological gradient', gradient)
cv2.waitKey()
From here, I binarized the gradient using some thresholding. There's probably a cleaner way to do this...but this happens to work better than the dozen other ideas I tried.
# Binarize gradient
lowerb = np.array([0, 0, 0])
upperb = np.array([15, 15, 15])
binary = cv2.inRange(gradient, lowerb, upperb)
cv2.imshow('Binarized gradient', binary)
cv2.waitKey()
Now we have a couple issues with this. It needs some cleaning up as it's messy, and further, the ice crystals that are on the edge of the image are showing up---but we don't know where those crystals actually end so we should actually ignore those. To remove those from the mask, I looped through the pixels on the edge and used floodFill() to remove them from the binary image. Don't get confused here on the orders of rows and columns; the if statements are specifying rows and columns of the image matrix, while the input to floodFill() expects points (i.e. x, y form, which is opposite from row, col).
# Flood fill from the edges to remove edge crystals
for row in range(h):
    if binary[row, 0] == 255:
        cv2.floodFill(binary, None, (0, row), 0)
    if binary[row, w-1] == 255:
        cv2.floodFill(binary, None, (w-1, row), 0)
for col in range(w):
    if binary[0, col] == 255:
        cv2.floodFill(binary, None, (col, 0), 0)
    if binary[h-1, col] == 255:
        cv2.floodFill(binary, None, (col, h-1), 0)
cv2.imshow('Filled binary gradient', binary)
cv2.waitKey()
Great! Now just to clean this up with some opening and closing...
# Cleaning up mask
foreground = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
foreground = cv2.morphologyEx(foreground, cv2.MORPH_CLOSE, kernel)
cv2.imshow('Cleanup up crystal foreground mask', foreground)
cv2.waitKey()
So this image was labeled as "foreground" because it contains the sure foreground of the objects we want to segment. Now we need to create a sure background for the objects. I did this in the naïve way, which is just to grow the foreground a bunch, so that the objects are almost certainly contained within that grown region (everything outside it can then be treated as sure background). However, you could probably use the original mask or even the gradient in a different way to get a better definition. Still, this works OK, but it is not very robust.
# Creating background and unknown mask for labeling
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (17, 17))
background = cv2.dilate(foreground, kernel, iterations=3)
unknown = cv2.subtract(background, foreground)
cv2.imshow('Background', background)
cv2.waitKey()
So all the black there is "sure background" for the watershed. Also I created the unknown matrix, which is the area between foreground and background, so that we can pre-label the markers that get passed to watershed as "hey, these pixels are definitely in the foreground, these others are definitely background, and I'm not sure about these ones between." Now all that's left to do is run the watershed! First, you label the foreground image with connected components, identify the unknown and background portions, and pass them in:
# Watershed
markers = cv2.connectedComponents(foreground)[1]
markers += 1 # Add one to all labels so that background is 1, not 0
markers[unknown==255] = 0 # mark the region of unknown with zero
markers = cv2.watershed(img, markers)
You'll notice that I ran watershed() on img. You might experiment with running it on a blurred version of the image (maybe median blurring; I tried this and got slightly smoother boundaries for the crystals) or on other preprocessed versions of the image that define the boundaries better.
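For instance, substituting a median-blurred copy into the watershed call above could look like this (the kernel size 5 is just a guess):

# rebuild the markers as above, then run watershed on a median-blurred copy
blurred_img = cv2.medianBlur(img, 5)
markers = cv2.connectedComponents(foreground)[1]
markers += 1
markers[unknown == 255] = 0
markers = cv2.watershed(blurred_img, markers)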
It takes a little work to visualize the markers, as they're all small integer labels. So what I did was assign each label a hue between 0 and 179, set it inside an HSV image, and then convert to BGR to display the markers:
# Assign the markers a hue between 0 and 179
hue_markers = np.uint8(179*np.float32(markers)/np.max(markers))
blank_channel = 255*np.ones((h, w), dtype=np.uint8)
marker_img = cv2.merge([hue_markers, blank_channel, blank_channel])
marker_img = cv2.cvtColor(marker_img, cv2.COLOR_HSV2BGR)
cv2.imshow('Colored markers', marker_img)
cv2.waitKey()
And finally, overlay the markers onto the original image to check how they look.
# Label the original image with the watershed markers
labeled_img = img.copy()
labeled_img[markers>1] = marker_img[markers>1] # 1 is background color
labeled_img = cv2.addWeighted(img, 0.5, labeled_img, 0.5, 0)
cv2.imshow('watershed_result.png', labeled_img)
cv2.waitKey()
Well, that's the pipeline in its entirety. You should be able to copy/paste each section in order and get the same results. The weakest parts of this pipeline are binarizing the gradient and defining the sure background for watershed. The distance transform might be useful for binarizing the gradient somehow, but I haven't gotten there yet. Either way, this was a cool problem; I would be interested to see any changes you make to this pipeline or how it fares on other ice-crystal images.
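Since the original goal was to measure each crystal, one hedged way to go from the watershed labels to sizes (using the longest side of the minimum-area rectangle as the "longest axis" measure; the label handling below may need tweaking):

# Hedged sketch: turn each watershed label into a contour and measure it.
sizes = []
for label in range(2, markers.max() + 1):          # label 1 is background, -1 marks boundaries
    crystal_mask = np.uint8(markers == label) * 255
    cnts = cv2.findContours(crystal_mask, cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not cnts:
        continue
    cnt = max(cnts, key=cv2.contourArea)
    (cx, cy), (rw, rh), angle = cv2.minAreaRect(cnt)
    sizes.append(max(rw, rh))                       # longest axis in pixels
print('mean crystal size (px):', np.mean(sizes) if sizes else 0)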
