Removing line artifacts from an image - python

I'm creating an OCR application. It extracts handwritten characters from a boxed section of a scanned or photographed printed form and reads them using a CNN.
It successfully extracts characters using contours, but there are cases where stray lines are also picked up as contours. These lines seem to be either plain noise or leftover pixels from when the boxed section was cropped (the boxed section itself is cropped using contours).
Basically, it works when the form is scanned with a good scanner and saved in PNG format; otherwise it doesn't work as well. I need it to handle JPEG files and low-quality cameras and scanners too.
So this is more a question of which techniques I could use in principle.
I'd like to either remove the lines or make the code ignore them.
I've tried:
"padding" the cropped boxed section by a negative number n. So it instead removes n pixels from each side. This can't be used too much though, as it also eats up the pixels of the character.
use morphological operation "close". Modifying the kernel size does almost nothing significant, though.
implementing a boxed section area:character area ratio. If the retrieved contour area ratio to the boxed section area is not in the range, it's ignored.
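Roughly, those three attempts look like this (a simplified sketch, not my exact code; n, the kernel size and the ratio bounds are placeholder values, and box_img is assumed to be a grayscale crop of the boxed section):

    import cv2
    import numpy as np

    def filter_contours(box_img, n=3, kernel_size=3, ratio_range=(0.01, 0.5)):
        # "Negative padding": trim n pixels from each side of the cropped box.
        trimmed = box_img[n:-n, n:-n]

        # Binarize (inverted so ink is white) and close small gaps/noise.
        _, binary = cv2.threshold(trimmed, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

        # Keep only contours whose area, relative to the box area,
        # is plausible for a character.
        contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        box_area = trimmed.shape[0] * trimmed.shape[1]
        return [c for c in contours
                if ratio_range[0] <= cv2.contourArea(c) / box_area <= ratio_range[1]]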
Here's what it looks like:
The grey parts outline the detected contours. The numbers indicate the index of each contour, in the order they were detected. Notice that strips of the lines are detected too; I want to get rid of these.
Besides the lines interfering with the model and making it spout nonsense as it tries to interpret them, in some cases they also seem to cause this error:
ValueError: cannot reshape array of size 339 into shape (1,28,28,1)
Maybe I'll start by investigating this in the meantime.
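(For what it's worth, the error seems to just mean the cropped character array isn't 28x28 before reshaping; resizing the crop first should avoid it. A minimal sketch, where char_img stands for the grayscale crop of one character:)

    import cv2
    import numpy as np

    # char_img: grayscale crop of one character, any size.
    # The CNN expects a (1, 28, 28, 1) tensor, so resize before reshaping.
    resized = cv2.resize(char_img, (28, 28), interpolation=cv2.INTER_AREA)
    model_input = resized.reshape(1, 28, 28, 1).astype(np.float32) / 255.0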

Related

Skewing text - How to take advantage of existing edges

I have the following JPG image. I want to find the edges where the white page meets the black background, so that I can rotate the contents a few degrees clockwise. My aim is to straighten the text for use with Tesseract OCR. I don't see the need to rotate the individual text blocks, as I have seen done in similar examples.
In the Canny Edge Detection docs, the third argument (200 in edges = cv.Canny(img,100,200)) is maxVal, above which pixels are said to be 'sure to be edges'. Is there any way to determine these (max/min) values ahead of a trial-and-error approach?
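One common rule of thumb (not from the docs, just a heuristic) is to derive both thresholds from the image median; a sketch, with 'page.jpg' as a placeholder filename:

    import cv2
    import numpy as np

    img = cv2.imread('page.jpg', cv2.IMREAD_GRAYSCALE)

    # "Auto Canny": place the two thresholds a fixed fraction around the
    # median pixel intensity instead of guessing them per image.
    sigma = 0.33
    median = np.median(img)
    low = int(max(0, (1.0 - sigma) * median))
    high = int(min(255, (1.0 + sigma) * median))
    edges = cv2.Canny(img, low, high)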
I have used code examples which utilize the Python cv2 module, but the edge detection in them is set up for simpler applications.
Is there any approach I can use to take the text out of the equation? For example, only detecting edge lines greater than a specified length?
Any suggestions would be appreciated.
Below is an example of edge detection on the above image (same min/max values). The outer edge of the page is clearly defined. The image is high-contrast black and white with even lighting, so I can't see a need for an adaptive threshold; a simple global one is working. It's just a question of what ratio to use.
I don't have the answer to this yet, but to add: I now have the contours of the above doc.
I used the find-contours tutorial with some customization of the file loading. Note: removing the words gives a thinner/cleaner outline.
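On the "only long edges" idea, one option I'm considering is a probabilistic Hough transform on the Canny output with a large minLineLength, so short text strokes drop out (a sketch; the filename and the length/gap values are guesses to tune):

    import cv2
    import numpy as np

    img = cv2.imread('page.jpg', cv2.IMREAD_GRAYSCALE)   # placeholder filename
    edges = cv2.Canny(img, 100, 200)

    # minLineLength ignores short strokes from the text; only page-scale
    # lines (the paper edges) should survive.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=100,
                            minLineLength=edges.shape[1] // 2, maxLineGap=20)
    angles = []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            angles.append(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
    skew = float(np.median(angles)) if angles else 0.0   # rotation estimate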
Consider Otsu. Its chief virtue is that it adapts the threshold to the overall illumination and contrast of the image rather than using a fixed value. In your case, the blank margins might be the saving grace.
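With OpenCV that is essentially a one-liner (a sketch; 'page.jpg' is a placeholder filename):

    import cv2

    gray = cv2.imread('page.jpg', cv2.IMREAD_GRAYSCALE)
    # Otsu picks the threshold from the histogram; the 0 passed here is ignored.
    thresh_val, binary = cv2.threshold(gray, 0, 255,
                                       cv2.THRESH_BINARY + cv2.THRESH_OTSU)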
Consider working on a series of 2x-reduced-resolution images, where each new pixel is the min() (or even the max()!) of the original four pixels. These reduced images might help you focus on the features that matter for your use case.
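A quick sketch of that reduction, assuming a 2-D grayscale NumPy array:

    import numpy as np

    def reduce_2x(img, use_min=True):
        # Crop to even dimensions, then take the min (or max) of each 2x2 block.
        h, w = img.shape
        blocks = img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
        return blocks.min(axis=(1, 3)) if use_min else blocks.max(axis=(1, 3))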
The usual way to deskew scanned text is to binarize and then keep changing theta until the "sum of pixels across each raster row" drops to zero, or near zero, between lines. In particular, with few descenders and decent inter-line spacing, we will see "lots" of pixels on each line of text and "near zero" between text lines when theta matches the original printing orientation. That lets us recover (1) pixels per line and (2) inter-line spacing, assuming we've found a near-optimal theta.
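A minimal sketch of that search (the angle range, step and the variance-of-row-sums score are just one reasonable choice):

    import cv2
    import numpy as np

    def estimate_skew(binary, angles=np.arange(-5, 5.1, 0.5)):
        # binary: 0/255 image with ink as white. For each candidate theta,
        # rotate and measure how "peaky" the row sums are; the best theta
        # gives dense text rows separated by near-empty gaps.
        h, w = binary.shape
        best_angle, best_score = 0.0, -1.0
        for theta in angles:
            m = cv2.getRotationMatrix2D((w / 2, h / 2), theta, 1.0)
            rotated = cv2.warpAffine(binary, m, (w, h), flags=cv2.INTER_NEAREST)
            score = rotated.sum(axis=1).var()
            if score > best_score:
                best_angle, best_score = theta, score
        return best_angle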
In your particular case, focusing on the "..." leader dots seems a promising approach to finding the globally optimal deskew angle. Discarding large rectangles of pixels in the left and right regions of the image could actually reduce noise and enhance the accuracy of such an approach.

Character extraction from image of combed field

I am currently working on handwritten character recognition from a form image. Everything works pretty well so far, but I was hoping I could get some insight on extracting characters from an image of a boxed or "combed" field.
For example, after a specific field has been cropped and binarized (with Otsu's method), I'm left with something like this:
Binary Field Image
For character recognition, I have a CNN model trained on the EMNIST dataset. In order to predict the characters, I have to extract them one by one. What would be the best way to extract the characters from the boxes?
Currently, I am using a fairly trivial method: I look for groupings of non-white runs of horizontal and vertical pixels that span a certain fraction of the image width or height. For example, I find horizontal rows that consist of at least 90% non-white pixels and group the ones with consecutive y coordinates into a rectangle, which represents a horizontal line of the box (there should be two such lines/rectangles, for the top and bottom). For vertical lines I do a similar thing, except I end up with {2 * charLength} lines. I use these values to crop out each character. However, it is not perfect.
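Roughly, the row scan looks like this (a simplified sketch of the idea, not my exact code; the 0.9 fill threshold is the kind of value I mean, and binary is the binarized field with ink/box lines as white):

    import numpy as np

    def find_line_rows(binary, min_fill=0.9):
        # A row is a "line row" if at least min_fill of its pixels are ink.
        h, w = binary.shape
        fill = (binary > 0).sum(axis=1) / float(w)
        line_rows = np.where(fill >= min_fill)[0]
        # Group consecutive row indices into bands (top/bottom of the box).
        bands, start = [], None
        for i, r in enumerate(line_rows):
            if start is None:
                start = r
            if i == len(line_rows) - 1 or line_rows[i + 1] != r + 1:
                bands.append((start, r))
                start = None
        return bands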
Here are some issues with this:
The field is not always perfectly straight (the rotation is slightly off). I am already applying SURF and homography to the original image, which does a very good job, but it is not perfect.
If a user writes a "1" that takes up the entire height of the box, it will most likely be falsely detected as a vertical line of the box.
The coordinates don't always match up between the original image and the input image, so part of the field is sometimes cropped out. To fix this, I am currently extracting a surrounding part of the field (as seen in the image), but this can also cause problems, because the form can have other vertical and horizontal lines very close to some fields, which makes my current trivial method fail.
Is there a better way to do this? One thing is that I have to keep performance in mind. I was thinking of doing SURF matching again for just the field image, but doing it for the entire form page takes very long, so I am not sure I want to do it again for each field that I am reading.
I was hoping someone would have suggestions. I am using OpenCV for image processing, but a solution in words is fine. Thank you.
I know this is a bit of a late response, but I ended up using OpenCV's contour feature to extract the character portions.
When OpenCV finds the contours of an image, it sets up a hierarchy of contours. The first level ended up being the very outer box, so I was able to just grab the contours of the next level down to extract the characters.
It didn't work 100% in the beginning, but after some additional image processing I was able to extract the characters properly for at least 99% of cases.
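In code it boils down to something like this (a sketch; binary stands for the binarized field image, and it assumes the outer box is a top-level contour whose direct children are the characters):

    import cv2

    contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE,
                                           cv2.CHAIN_APPROX_SIMPLE)
    # hierarchy[0][i] = [next, prev, first_child, parent]
    # Find the top-level contours (no parent), then take their direct
    # children as the individual character regions.
    outer = [i for i, h in enumerate(hierarchy[0]) if h[3] == -1]
    chars = [contours[i] for i, h in enumerate(hierarchy[0]) if h[3] in outer]
    boxes = sorted((cv2.boundingRect(c) for c in chars), key=lambda b: b[0])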

Clipping image/remove background programmatically in Python

How to go from the image on the left to the image on the right programmatically using Python (and maybe some tools, like OpenCV)?
I made this one by hand using an online clipping tool. I am a complete noob at image processing (especially in practice). I was thinking of applying some edge or contour detection to create a mask, which I would then apply to the original image to paint everything else (except the region of interest) black. But I failed miserably.
The goal is to preprocess a dataset of very similar images in order to train a CNN binary classifier. I tried training it by just cropping the images close to the region of interest, but the noise is so high that the CNN learned absolutely nothing.
Can someone help me do this preprocessing?
I used OpenCV's implementation of the watershed algorithm to solve your problem. You can find out how to use it by reading this great tutorial, so I will not explain it in much detail.
I selected four points (markers). One is located in the region that you want to extract, one is outside it, and the other two are within the lower and upper parts of the interior that do not interest you. I then created an empty integer array (the so-called marker image) filled with zeros, and assigned a unique value to the pixel at each marker position.
The image below shows the marker positions and marker values, drawn on the original image:
I could also have selected more markers within the same area (for example, several markers inside the area you want to extract), but in that case they should all have the same value (in this case 255).
Then I used watershed. The first input is the image that you provided and the second is the marker image (zero everywhere except at the marker positions). The algorithm stores the result in the marker image; the region that interests you is marked with the value of its marker (in this case 255):
I set all pixels that did not have the value 255 to zero, dilated the obtained image three times with a 3x3 kernel, and then used the dilated image as a mask for the original image (I set all pixels outside the mask to zero). This is the result I got:
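A condensed sketch of those steps (the marker coordinates and 'input.png' are only illustrative; they have to be picked per image):

    import cv2
    import numpy as np

    img = cv2.imread('input.png')              # the original colour image
    markers = np.zeros(img.shape[:2], np.int32)

    # Illustrative marker positions: one inside the region of interest (255),
    # one in the background (1), two in the unwanted interior parts (64, 128).
    markers[400, 300] = 255
    markers[50, 50] = 1
    markers[200, 300] = 64
    markers[600, 300] = 128

    cv2.watershed(img, markers)                # result is written back into markers

    # Keep only the region labelled 255, dilate it, and use it as a mask.
    mask = np.uint8(markers == 255) * 255
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=3)
    result = cv2.bitwise_and(img, img, mask=mask)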
You will probably need some kind of method that finds the markers automatically. The difficulty of this task depends heavily on the set of input images; in some cases the method can be really straightforward and simple (as in the tutorial linked above), but sometimes it can be a tough nut to crack. I can't recommend anything specific because I don't know what your images look like in general (you only provided one). :)

Segmentation of lines, words and characters from a document's image

I am working on a project where I have to read a document from an image. In the initial stage I will read machine-printed documents and then eventually move to images of handwritten documents. However, I am doing this for learning purposes, so I don't intend to use APIs like Tesseract.
I intend to do in steps:
Preprocessing (blurring, thresholding, erosion & dilation)
Character Segmentation
OCR (or ICR in later stages)
So I am working on character segmentation right now. I recently did it using horizontal and vertical histograms, but I was not able to get very good results for some fonts, such as the one in the image shown.
Is there any other method or algorithm to do the same?
Any help will be appreciated!
Edit 1:
The result I got after detecting blobs using cv2.SimpleBlobDetector.
The result I got after using cv2.findContours.
A first option is deskewing, i.e. measuring the skew angle. You can achieve this, for instance, by Gaussian filtering or erosion in the horizontal direction, so that the characters widen and come into contact. Then binarize and thin, or find the lower edges of the blobs (or directly the directions of the blobs). You will get slightly oblique line segments which give you the skew direction.
Once you know the skew direction, you can counter-rotate to perform deskewing. The vertical histogram will then reliably separate the lines, and you can use a horizontal histogram within each of them.
A second option, IMO much better, is to binarize the characters and perform blob detection. Proximity analysis of the bounding boxes will then allow you to determine chains of characters: they tell you the lines, and where the spacing is larger, delimit the words.
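A rough sketch of that second option (binary stands for the thresholded image with ink as white; the grouping thresholds are arbitrary and would need tuning):

    import cv2
    import numpy as np

    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]          # (x, y, w, h)

    # Group character boxes into text lines: boxes whose vertical centres
    # are close together belong to the same line.
    boxes.sort(key=lambda b: b[1] + b[3] / 2)
    lines = []
    for b in boxes:
        cy = b[1] + b[3] / 2
        if lines and abs(cy - (lines[-1][-1][1] + lines[-1][-1][3] / 2)) < b[3]:
            lines[-1].append(b)
        else:
            lines.append([b])

    # Within each line, a horizontal gap wider than the median box width
    # is treated as a word boundary.
    for line in lines:
        line.sort(key=lambda b: b[0])
        med_w = np.median([b[2] for b in line])
        words, word = [], [line[0]]
        for prev, cur in zip(line, line[1:]):
            if cur[0] - (prev[0] + prev[2]) > med_w:
                words.append(word)
                word = [cur]
            else:
                word.append(cur)
        words.append(word)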

OpenCV: How to find individual positions of connected letters in an image?

I have an image of text that I would like to segment so I can get the individual positions of the letters. However, some of the letters are touching each other, so I'm having trouble with segmentation:
I tried using some contour methods (which seem to work for license plates, etc.), but they always end up detecting the entire block of text. I also tried applying some erosion to increase the gaps between the letters, but sometimes parts of the letters erode while the connections remain. I also tried blurring the image and then thresholding, but as before, the connections remain. Is there a way to get the positions of these letters?
