Extract Data from an Image with Python/OpenCV/Tesseract? - python

I'm trying to extract some contents from a cropped image. I tried pytesseract and opencv template matching but the results are very poor. OpenCV template matching sometimes fails due to poor quality of the icons and tesseract gives me a line of text with false characters.
I'm trying to grab the values like this:
0:26 83 1 1
Any thoughts or techniques?

A technique you could use would be to blur your image. From what it looks like, the image is already fairly low-res and blurry, so you wouldn't need to blur it very hard. Whenever I need a blur function in OpenCV, I normally choose Gaussian blur, since it weights each pixel together with its surrounding pixels. Once the image is blurred, threshold (or adaptive-threshold) it. At that point, the image should be mostly hard lines with little bits of short lines mixed in between. Afterwards, dilate the thresholded image just enough for the regions with many hard edges to connect. Once the dilation has been performed, find the contours of that image and sort them by their height within the image. Since I assume the position of those numbers won't change, you only have to sort your contours by their vertical position. Finally, create bounding boxes around the sorted contours and read the text from there.
However, if you want to do this the quick and dirty way, you can always just manually create your own ROI's around each area you want to read and do it that way.
First Method
Gaussian blur the image
Threshold the image
Dilate the image
Find Contours
Sort Contours based on height
Create bounding boxes around relevant contours
Second Method
Manually create ROI's around the area you want to read text from

Related

Different ways to detect LED Screen from the image

[image]
I want to extract the LED screen from the image above. Some approaches that I have tried include:
I first converted the image to HSV and made a trackbar GUI through which I noted at what value of HSV our mask filters out our ROI.
using canny edge detection, contours extraction, and selecting the contour with 4 vertices and area greater than 100
Both of the solutions do work.
The problem with the first approach is that it only works on a pinkish screen. The second approach is more generic, but both approaches need a lot of fine-tuning to get the required result.
My question is that if there's any other more generalised approach that I can try.

How to create a boundary mask around an object?

I have some processed images that have noise (background pixels) around the boundaries. Is there a way to detect only the boundary of the object itself and create a mask to remove the background pixels around the boundaries?
I'm a beginner with OpenCV, so any code samples would help.
Example:
Original Image
Processed Image
Expected Output
I have tried the findContours method, but it creates a mask that includes the noisy pixels as well.
I have also tried the erode method, but it does not give the same results for different image sizes, so that is not the solution I'm looking for.

Edge Detection from image using python libraries and Contours Draw

Hello everyone,
I am trying very hard to extract edges from a specific image. I have tried many ways, including:
grayscale conversion, blurring (Laplacian, Gaussian, averaging, etc.), gradients (Sobel, Prewitt, Canny)
morphological transformations
thresholding with different combinations
HSV conversion, masking, and then thresholding
contour methods with area thresholding
Beyond all of this, I have tried different combinations of the above, but none gave an excellent result. The main problem is still too many edges/lines. The image is an orthomosaic 2D photo of a marble wall. I will upload the image. Does anyone have any ideas?
P.S. The final result should be an image that has only the "skeleton" or shape of the marbles.
Wall.tif

Detect rectanglular signature fields in document scans using OpenCV

I am trying to extract big rectangular boxes from document images with signatures in them. Since I don't have training data (for deep learning), I want to cut the rectangular boxes (3 in all images) from these images using OpenCV.
Here is what I tried:
import numpy as np
import cv2

img = cv2.imread('S-0330-444-20012800.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Inverse binary threshold: dark ink becomes white foreground.
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
contours, h = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
    if len(approx) == 4:  # keep quadrilateral contours
        cv2.drawContours(img, [cnt], 0, (26, 60, 232), -1)
cv2.imshow('img', img)
cv2.waitKey(0)
sample image
With the above code, I get a lot of squares (around 152 small, point-like squares) and of course not the 3 boxes.
Replies appreciated. [sample image is attached]
I would suggest you read up on template matching. There is also a good OpenCV tutorial on this.
For your use case, the idea would be to generate a stereotyped image of a rectangular box with the same shape (width/height ratio) as the boxes found on your documents. Depending on whether your input images show the document always at the same scale or not, you would need to either resize the inputs to keep their magnification constant, or you would need to operate with a template bank (e.g. an array of box templates at various scales).
Briefly, you would then cross-correlate the template box(es) with the input image and (in case of well-matched scaling) would find ideally relatively sharp peaks indicating the centers of your document boxes.
In the code above, use image pyramids (to merge unwanted contour noise) in combination with cv2.findContours. After that, filtering the list of contours by contour area (cv2.contourArea) will leave only the bigger squares.
There is also an alternate solution. Looking at the images, we can see that the signature text is usually bigger than the printed text in that ROI, so we can filter out contours smaller than the signature contours and extract only the signature.
It's always good to remove noise before using cv2.findContours, e.g. with dilation, erosion, or blurring.

Remove border of license plates with OpenCV (python)

I cropped license plates, but they have some borders. I want to remove the borders to segment the characters. I tried to use a Hough transform, but it's not a promising approach. Here are some samples of license plates:
Is there any simple way to do that?
I have a naïve solution for one image. You have to tune some parameters to generalize it for the other images.
I chose the third image due to its clarity.
1. Threshold
In such cases, the first step is to reach an optimal threshold, where all the letters/numbers of interest are converted to the same pixel value. As a result, I got the following:
2. Finding Contour and Bounding Region
Now I found the external contour present in the image to retain the letters/numbers. After finding it, I found the bounding rectangle for the corresponding contour:
3. Cropping
Next I used the parameters returned from bounding the contour and used them to crop the image:
VOILA! There you have your region of interest!
Note:
This approach would work if all the images are taken in a similar manner and for the same color space. The second image provided has a different color. Hence you will have to alter the threshold parameters to segment your ROI properly.
You can also perform some morphological operations on the threshold image to obtain a better ROI.
