Python Tesseract figuring out text orientation/transformation - python

# Tesseract Windows installer: https://github.com/UB-Mannheim/tesseract/wiki
import pytesseract
import cv2

# Point pytesseract at the installed Tesseract binary (Windows path)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

image = cv2.imread("img.png")
# OpenCV loads images as BGR; convert to RGB before handing it to Tesseract
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
text = pytesseract.image_to_string(image)
print(text)
The output is not even close: I get "WX017," instead of "MX011A".
However, if I manually rearrange the characters it works. I could transform the input image and define an ROI, but the orientation could be anything; it could be upside down as well.
I want to recognize curved text around a circle

This is pretty difficult, as tesseract expects text to be minimally distorted.
One (admittedly far-fetched) possibility would be to try and detect the circle, then map it onto a rectangle.
To do this, you can reduce each letter to a non-connected blob using a blur filter and discarding grey values below a threshold; iterate until you get more or less circular blobs, then get their centers. Take several of those three by three at random, and for each triplet calculate the center of the circle encompassing all three. The average of those centers should be more or less the center of the lettering circle.
Having the center, and the approximate radius, it is relatively easy to map the circular crown of appropriate height to a rectangle (e.g. using polar-to-cartesian transform).
You then apply tesseract to the transformed rectangle.
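As a rough illustration of the polar-to-cartesian step, here is a minimal sketch using OpenCV's cv2.warpPolar; the file name, the center, the radius and the output size are assumptions you would derive from the blob-center estimate described above.
import cv2
import pytesseract

img = cv2.imread("img.png")
center = (260, 260)   # estimated center of the lettering circle (assumption)
max_radius = 240      # radius reaching just past the text ring (assumption)

# Unwrap the circular crown into a rectangle (polar-to-cartesian mapping):
# output width samples the radius, output height samples the angle.
unwrapped = cv2.warpPolar(img, (400, 1200), center, max_radius,
                          cv2.WARP_POLAR_LINEAR)
# After warpPolar the text runs along the rows; rotate so it reads left to right.
unwrapped = cv2.rotate(unwrapped, cv2.ROTATE_90_COUNTERCLOCKWISE)

print(pytesseract.image_to_string(unwrapped))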
It should also be possible to use autocorrelation to average and sharpen multiple repetitions of the same text along said rectangle (i.e. "MX011A MXOI1A MX017A" --> "MX011A").

Related

Extract Data from an Image with Python/OpenCV/Tesseract?

I'm trying to extract some contents from a cropped image. I tried pytesseract and OpenCV template matching, but the results are very poor. OpenCV template matching sometimes fails due to the poor quality of the icons, and Tesseract gives me a line of text with false characters.
I'm trying to grab the values like this:
0:26 83 1 1
Any thoughts or techniques?
A technique you could use would be to blur your image. From what it looks like, the image is already fairly low-res and blurry, so you won't need to blur it very hard. Whenever I need a blur in OpenCV I normally choose the Gaussian blur, since it blends each pixel with its surrounding pixels nicely. Once the image is blurred, I would threshold, or adaptive-threshold, the image. At that point the image should be mostly hard lines with little bits of short lines mixed in between. Afterwards, dilate the thresholded image just enough that the areas with lots of hard edges connect. Once the dilate has been performed, find the contours of that image and sort them based on their height within the image. Since I assume the position of those numbers won't change, you only have to sort your contours by their vertical position. Once you have sorted your contours, just create bounding boxes over them and read the text from there.
However, if you want to do this the quick and dirty way, you can always just manually create your own ROIs around each area you want to read and do it that way.
First Method
Gaussian blur the image
Threshold the image
Dilate the image
Find Contours
Sort Contours based on height
Create bounding boxes around relevant contours (see the sketch after these steps)
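A minimal sketch of this first pipeline might look like the following; the file name, threshold parameters and kernel size are placeholders that would need tuning against the actual screenshot.
import cv2
import pytesseract

img = cv2.imread("cropped.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 1. Gaussian blur to smooth noise in the low-resolution capture
blur = cv2.GaussianBlur(gray, (3, 3), 0)

# 2. Adaptive threshold to get hard edges
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)

# 3. Dilate so neighbouring character fragments connect into one blob
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dilated = cv2.dilate(thresh, kernel, iterations=1)

# 4. Find contours and 5. sort them by their vertical position in the image
contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=lambda c: cv2.boundingRect(c)[1])

# 6. Read text from a bounding box around each contour
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    roi = gray[y:y + h, x:x + w]
    print(pytesseract.image_to_string(roi, config="--psm 7").strip())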
Second Method
Manually create ROI's around the area you want to read text from

Extracting data from graphs in a scanned document

EDIT: This is a deeper explanation of a question I asked earlier, which is still not solved for me.
I'm currently trying to write some code that can extract data from some uncommon graphs in a book. I scanned the pages of the book, and by using OpenCV I would like to detect some features of the graphs in order to convert them into usable data. In the left graph I'm looking for the height of the "triangles", and in the right graph the distance from the center to the points where the dotted lines intersect with the gray area. In both cases I would like to convert these values into numeric data for further usage.
For the left graph, I thought of detecting all the individual colors and computing the area of each sector by counting the amount of pixels in that color. When I have the area of these sectors, I can easily calculate their heights using basic math. The following code snippet shows how far I've already gotten with identifying different colors. However, I can't manage to make this work accurately. It always seems to detect some colors of other sectors as well, or not detect all pixels of one sector. I think it has something to do with the boundaries I'm using, but I can't quite figure out how to make them work. Does someone know how I can determine these values?
import numpy as np
import cv2

img = cv2.imread('images/test2.jpg')

# BGR colour boundaries for one sector (OpenCV loads images as BGR)
lower = np.array([0, 0, 100])
upper = np.array([50, 56, 150])

# Mask of all pixels whose colour falls inside the boundaries
mask = cv2.inRange(img, lower, upper)
output = cv2.bitwise_and(img, img, mask=mask)

cv2.imshow('img', img)
cv2.imshow('mask', mask)
cv2.imshow('output', output)
cv2.waitKey(0)
cv2.destroyAllWindows()
For the right graph, I still have no idea how to extract data from it. I thought of identifying the center by detecting all the dotted lines, and then, by detecting the intersections of these dotted lines with the gray area, I could measure the distance between the center and these intersections. However, I couldn't yet figure out how to do this properly, since it sounds quite complex. The following code snippet shows how far I've gotten with the line detection. Also in this case the detection is far from accurate. Does someone have an idea how to tackle this problem?
import numpy as np
import cv2

# Reading the image
img = cv2.imread('test2.jpg')

# Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply edge detection
edges = cv2.Canny(gray, 50, 150, apertureSize=3)

# Line detection
lines = cv2.HoughLinesP(edges, 1, np.pi/180, 100, minLineLength=50, maxLineGap=20)
for line in lines:
    x1, y1, x2, y2 = line[0]
    cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)

cv2.imwrite('linesDetected.jpg', img)
For the left image, using your approach, try looking at the RGB histogram: the sector colours should show up as significant peaks if you want to use the relative area of the segments.
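A small sketch of how you might inspect those peaks; 'images/test2.jpg' is the poster's file name, the rest is generic matplotlib plotting.
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('images/test2.jpg')
for i, col in enumerate(('b', 'g', 'r')):
    # Histogram of one colour channel; dominant sector colours appear as peaks
    hist = cv2.calcHist([img], [i], None, [256], [0, 256])
    plt.plot(hist, color=col)
plt.title('Per-channel histogram: sector colours appear as peaks')
plt.show()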
Another alternative could be to use Hough Circle Transform, which should work on circle segments. See also here.
For the right image ... let me think ...
You could create an "empty" diagram with no data inside. You know the locations of the circle segments ("cake pieces"). Then you could identify the area where the data is (the dark one), either by using a grey threshold, an RGB threshold, Find Contours, or a Watershed / Distance Transform.
In the end, the idea is to make a boolean overlay between the segments from the cleared diagram and the data area that was found. Then you can identify what share of each circle segment is covered, or, knowing the center, find the farthest covered point from the center.
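A minimal sketch of the overlay idea, assuming you already have one boolean mask per circle segment (segment_mask below is hypothetical and would come from the empty reference diagram); the grey threshold value is a placeholder.
import cv2

img = cv2.imread('test2.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Dark (data) pixels: simple grey threshold, value to be tuned
_, data_mask = cv2.threshold(gray, 120, 255, cv2.THRESH_BINARY_INV)

def coverage(segment_mask, data_mask):
    """Fraction of one circle segment covered by the dark data area."""
    overlap = cv2.bitwise_and(segment_mask, data_mask)
    return cv2.countNonZero(overlap) / max(cv2.countNonZero(segment_mask), 1)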

Detect rectanglular signature fields in document scans using OpenCV

I am trying to extract big rectangular boxes from document images with signatures in them. Since I don't have training data (for deep learning), I want to cut the rectangular boxes (3 in all images) from these images using OpenCV.
Here is what I tried:
import numpy as np
import cv2

img = cv2.imread('S-0330-444-20012800.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, 1)
contours, h = cv2.findContours(thresh, 1, 2)
for cnt in contours:
    approx = cv2.approxPolyDP(cnt, 0.02*cv2.arcLength(cnt, True), True)
    if len(approx) == 4:
        cv2.drawContours(img, [cnt], 0, (26, 60, 232), -1)
cv2.imshow('img', img)
cv2.waitKey(0)
sample image
With the above code, I get a lot of contours (around 152 small square-like points) and of course not the 3 boxes.
Replies appreciated. [sample image is attached]
I would suggest you read up on template matching. There is also a good OpenCV tutorial on this.
For your use case, the idea would be to generate a stereotyped image of a rectangular box with the same shape (width/height ratio) as the boxes found on your documents. Depending on whether your input images always show the document at the same scaling or not, you would either need to resize the inputs to keep their magnification constant, or you would need to operate with a template bank (e.g. an array of box templates at various scalings).
Briefly, you would then cross-correlate the template box(es) with the input image and (in case of well-matched scaling) would ideally find relatively sharp peaks indicating the centers of your document boxes.
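A minimal sketch of that idea with cv2.matchTemplate; 'box_template.png' is an assumed synthetic image of one empty signature box at roughly the document's scale, and the 0.6 threshold is a placeholder to tune.
import cv2
import numpy as np

doc = cv2.imread('S-0330-444-20012800.jpg', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('box_template.png', cv2.IMREAD_GRAYSCALE)
th, tw = template.shape

# Normalised cross-correlation; peaks mark likely box locations
res = cv2.matchTemplate(doc, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(res >= 0.6)
for x, y in zip(xs, ys):
    cv2.rectangle(doc, (x, y), (x + tw, y + th), 0, 2)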
In the code above, use image pyramids (to merge unwanted contour noise) together with cv2.findContours. After that, filtering the list of contours based on contour area (cv2.contourArea) will leave only the bigger squares.
There is also an alternate solution. Looking at the images, we can see that the signature text is usually bigger than the printed text in that ROI, so we can filter out contours smaller than the signature contours and extract only the signature.
It's always good to remove noise before using cv2.findContours, e.g. by dilating, eroding, or blurring.
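A rough sketch of the pyramid + contour-area filter; the pyrDown step, the blur, and the MIN_BOX_AREA value are assumptions to tune against the scanned document (OpenCV 4 return signature for findContours is assumed).
import cv2

img = cv2.imread('S-0330-444-20012800.jpg')
small = cv2.pyrDown(img)                         # pyramid step merges small contour noise
gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)         # denoise before thresholding
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
MIN_BOX_AREA = 5000                              # placeholder: keep only the big boxes
boxes = [c for c in contours if cv2.contourArea(c) > MIN_BOX_AREA]
for c in boxes:
    x, y, w, h = cv2.boundingRect(c)
    cv2.rectangle(small, (x, y), (x + w, y + h), (26, 60, 232), 2)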

Rotate an image without black area

When we use an image processing library to rotate an image, the rotated image will always contain some black areas. For example, I use the following Python code to rotate an image:
# Note: the scipy.misc image helpers are deprecated and removed in recent SciPy
from scipy import misc
img = misc.imread('test.jpg')
img = misc.imrotate(img, 15)
misc.imsave('rotated.jpg', img)   # imsave needs the image as its second argument
The image is as follows:
My question is: how can I rotate an image without producing black area. I believe there exists some interpolation method to compensate for the missing area, which makes the image more natural.
It will be appreciated if anyone can provide a python code to achieve my task.
If you want to 'clone' or 'heal' the missing areas based on some part of the background, that's a complex problem, usually done with user intervention (in tools like Photoshop or GIMP).
Alternatives would be to fill the background with a calculated average colour - or just leave the original image. Neither will look 'natural' though.
The only approach that will work for all images will be to crop the rotated image to the largest rectangle within the rotated area. That will achieve your objective of having no black areas and looking natural, but at the cost of reducing the image size.
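As a sketch of that cropping approach, the largest axis-aligned rectangle fully inside a w x h image rotated by a given angle has a known closed form; the helper below is an illustration of that formula (not taken from the question), and you would still rotate the image with your library of choice before cropping the centered wr x hr region.
import math

def rotated_rect_with_max_area(w, h, angle):
    """Width/height of the largest axis-aligned rectangle with no black border
    inside a w x h image rotated by 'angle' (radians)."""
    if w <= 0 or h <= 0:
        return 0, 0
    width_is_longer = w >= h
    side_long, side_short = (w, h) if width_is_longer else (h, w)
    sin_a, cos_a = abs(math.sin(angle)), abs(math.cos(angle))
    if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < 1e-10:
        # Half-constrained case: the crop touches two opposite image corners
        x = 0.5 * side_short
        wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
    else:
        # Fully constrained case: the crop touches all four rotated sides
        cos_2a = cos_a * cos_a - sin_a * sin_a
        wr = (w * cos_a - h * sin_a) / cos_2a
        hr = (h * cos_a - w * sin_a) / cos_2a
    return wr, hr

# Example: for a 640x480 image rotated by 15 degrees
print(rotated_rect_with_max_area(640, 480, math.radians(15)))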
Isn't there a simple paint/flood-fill function in your "some image library"? Simply do that at all 4 corner pixels and then make them white or so.

Open CV Python- Image distortion radial and perspective-

I have a distorted picture where, without distortion, the points A, B, C and D form a square of 1 cm * 1 cm.
I tried to use a homography to correct it, but it distorts the lines AD and BC, as you can see in the figure.
Do you have an idea how could I correct that?
Thanks a lot!
Marie- coder beginner
PS: for info, the image is taken in a tube with an endoscope camera that has a large field of view, allowing it to take a picture of the tube almost all the way around the camera. I will use the 1*1 cm square to estimate root growth from several pictures taken over time.
here is my code:
import cv2
import numpy as np
import matplotlib.pyplot as plt

if __name__ == '__main__':
    # Read source image.
    im_src = cv2.imread('points2.jpg', cv2.IMREAD_COLOR)
    # Four points of the miniR image
    pts_src = np.array([[742, 223], [806, 255], [818, 507], [753, 517]], dtype=float)

    # Read destination image.
    im_dst = cv2.imread('rectangle.jpg', cv2.IMREAD_COLOR)
    # Four points of the square
    pts_dst = np.array([[200, 200], [1000, 200], [1000, 1000], [200, 1000]], dtype=float)

    # Calculate Homography
    h, status = cv2.findHomography(pts_src, pts_dst)

    # Warp source image to destination based on homography
    im_out = cv2.warpPerspective(im_src, h, (im_dst.shape[1], im_dst.shape[0]))
    cv2.imwrite('corrected2.jpg', im_out)

    # Display images
    cv2.imshow("Source Image", im_src)
    cv2.imshow("Destination Image", im_dst)
    cv2.imshow("Warped Source Image", im_out)
    cv2.waitKey(0)
A homography is a projective transformation. As such it can only map straight lines to straight lines. The straight sides of your input curvilinear quadrangle are correctly rectified, but there is no way that you can straighten the curved sides using a projective transform.
In the photo you posted it may be reasonable to assume that the overall geometry is approximately a cylinder, and the "vertical" lines are parallel to the axis of the cylinder. So they are approximately straight, and a projective transformation (the camera projection) will map them to straight lines. The "horizontal" lines are the images of circles, or ellipses if the cylinder is squashed. A projective transformation will map ellipses (in particular, circles) into ellipses. So you could proceed by fitting ellipses. See this other answer for hints.
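As a very rough sketch of the ellipse-fitting hint, assuming you can isolate the curved "horizontal" lines as contours (the Canny thresholds and the selection criterion are placeholders):
import cv2

img = cv2.imread('points2.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
for c in contours:
    if len(c) >= 5:  # cv2.fitEllipse needs at least 5 points
        (cx, cy), (major, minor), angle = cv2.fitEllipse(c)
        # keep only ellipses whose size/position matches the expected grid line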
I found a solution using GDAL. We can use two chessboard images: one photographed with the device that creates the distortion, and one left unchanged, i.e. without distortion. With the help of QGIS you create a file associating each distorted point with its undistorted counterpart. For that you add a Ground Control Point at each intersection, using a defined grid interval (e.g. 100 px), and export the resulting GCPs as pointsfile.points.
After that, you can use a batch file that a collaborator created here. It uses GDAL to geo-correct the images.
You just need to put the images that you would like to transform (jpg format) into the root directory of the repo and run bash warp.sh. This will output the re-transformed images into the out/ directory.