I have attached an image with 300 DPI. I am using the code below to extract text, but I am getting no text. Anyone know the issue?
from PIL import Image
import pytesseract

finalImg = Image.open('withdpi.jpg')
text = pytesseract.image_to_string(finalImg)
image to extract text from
Let's observe what your code is doing.
We need to see what part of the text is localized and detected.
To understand the code's behavior, we will use the image_to_data function.
image_to_data will show what part of the image is detected.
from PIL import Image, ImageDraw
import pytesseract

# Open the image and convert it to gray-scale
finalImg = Image.open('hP5Pt.jpg').convert('L')

# OCR detection
d = pytesseract.image_to_data(finalImg, output_type=pytesseract.Output.DICT)

# Initialize ImageDraw on a copy of the image for displaying the detected rectangles
displayImg = finalImg.copy()
finalImgDraw = ImageDraw.Draw(displayImg)

# Get ROI part from the detection
n_boxes = len(d['level'])

# For each detected part
for i in range(n_boxes):
    # Get the localized region
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    # The rectangle needs the top-left and bottom-right corners
    shape = [(x, y), (x + w, y + h)]
    # Draw the region
    finalImgDraw.rectangle(shape, outline="red")

# Display
displayImg.show()

# OCR "psm 6: Assume a single uniform block of text."
txt = pytesseract.image_to_string(finalImg, config="--psm 6")

# Result
print(txt)
Result:
i
I
So the displayed image shows that almost nothing is detected, and the printed output is not the desired result.
There might be various reasons for this.
Here are some facts about the input image:
It is a binary image.
It contains a big rectangle artifact.
The text is a little bit dilated.
We can't know whether the image requires pre-processing without testing.
We are sure that the big black rectangle is an artifact, and we need to remove it. One solution is to select only part of the image.
To select part of the image, we need to use crop and some trial and error to find the ROI.
If we split the image into two halves by height, we don't want the half that contains the artifact.
At first glance, we want (0 -> height/2). If you play with the values, you can see that the exact text location is between height/6 and height/4.
Result will be:
$1,582
Code:
from PIL import Image, ImageDraw
import pytesseract

# Open the image and convert it to gray-scale
finalImg = Image.open('hP5Pt.jpg').convert('L')

# Get width and height of the image
w, h = finalImg.size

# Crop the part containing the desired text
finalImg = finalImg.crop((0, int(h/6), w, int(h/4)))

# OCR detection
d = pytesseract.image_to_data(finalImg, output_type=pytesseract.Output.DICT)

# Initialize ImageDraw on a copy of the image for displaying the detected rectangles
displayImg = finalImg.copy()
finalImgDraw = ImageDraw.Draw(displayImg)

# Get ROI part from the detection
n_boxes = len(d['level'])

# For each detected part
for i in range(n_boxes):
    # Get the localized region
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    # The rectangle needs the top-left and bottom-right corners
    shape = [(x, y), (x + w, y + h)]
    # Draw the region
    finalImgDraw.rectangle(shape, outline="red")

# Display
displayImg.show()

# OCR "psm 6: Assume a single uniform block of text."
txt = pytesseract.image_to_string(finalImg, config="--psm 6")

# Result
print(txt)
If you can't get the same result as mine, you should check your Tesseract version, using:
print(pytesseract.get_tesseract_version())
For me the result is 4.1.1
I have a problem with MediaPipe coordinates. What I want to do is crop the box of the detected face.
https://google.github.io/mediapipe/solutions/face_detection.html
EXAMPLE OF PROCEDURE
And I use the code below:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt

mp_face_detection = mp.solutions.face_detection
# Setup the face detection function.
face_detection = mp_face_detection.FaceDetection(model_selection=0, min_detection_confidence=0.5)
# Initialize the mediapipe drawing class.
mp_drawing = mp.solutions.drawing_utils
# Read an image from the specified path.
sample_img = cv2.imread('12345.jpg')
# Specify a size of the figure.
plt.figure(figsize = [10, 10])
# Display the sample image, also convert BGR to RGB for display.
plt.title("Sample Image");plt.axis('off');plt.imshow(sample_img[:,:,::-1]);plt.show()

face_detection_results = face_detection.process(sample_img[:,:,::-1])

# Check if the face(s) in the image are found.
if face_detection_results.detections:
    # Iterate over the found faces.
    for face_no, face in enumerate(face_detection_results.detections):
        # Display the face number upon which we are iterating.
        print(f'FACE NUMBER: {face_no+1}')
        print('---------------------------------')
        # Display the face confidence.
        print(f'FACE CONFIDENCE: {round(face.score[0], 2)}')
        # Get the face bounding box and face key points coordinates.
        face_data = face.location_data
        # Display the face bounding box coordinates.
        print(f'\nFACE BOUNDING BOX:\n{face_data.relative_bounding_box}')
        # Iterate two times as we only want to display the first two key points of each detected face.
        for i in range(2):
            # Display the found normalized key points.
            print(f'{mp_face_detection.FaceKeyPoint(i).name}:')
            print(f'{face_data.relative_keypoints[mp_face_detection.FaceKeyPoint(i).value]}')
So the results are in this form:
FACE NUMBER: 1
FACE CONFIDENCE: 0.89
FACE BOUNDING BOX:
xmin: 0.2784463167190552
ymin: 0.3503175973892212
width: 0.1538110375404358
height: 0.23071599006652832
RIGHT_EYE:
x: 0.3447018265724182
y: 0.4222590923309326
LEFT_EYE:
x: 0.39114508032798767
y: 0.3888365626335144
And I want to CROP the image at the coordinates of the BOX.
Like
face = Image.fromarray(image).crop(face_rect)
or any other crop procedure.
My problem is that I can't get the coordinates of the detected item from MediaPipe.
Any ideas?
Got the solution guys
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Image dimensions (the detections are normalized, so we need these to scale back)
h, w, c = sample_img.shape
print('width: ', w)
print('height: ', h)

# Relative bounding box of the first detected face (from the detection code above)
data = face_detection_results.detections[0].location_data.relative_bounding_box

# Convert the normalized coordinates to pixel coordinates
xleft = int(data.xmin * w)
xtop = int(data.ymin * h)
xright = int(data.width * w + xleft)
xbottom = int(data.height * h + xtop)

detected_faces = [(xleft, xtop, xright, xbottom)]
for n, face_rect in enumerate(detected_faces):
    face = Image.fromarray(sample_img[:, :, ::-1]).crop(face_rect)
    face_np = np.asarray(face)
    plt.imshow(face_np)
    plt.show()
Assume the objective is to crop a single face detected by MediaPipe. Note the [0], which indicates that we are only interested in a single face:
results = mp_face.process(image_input)
detection=results.detections[0]
By default MediaPipe returns detection data in normalized form, and we have to convert it to the original size by multiplying the x values by the width and the y values by the height of the input image.
We can employ the _normalized_to_pixel_coordinates helper available with MediaPipe:
relative_bounding_box = location.relative_bounding_box
rect_start_point = _normalized_to_pixel_coordinates(
    relative_bounding_box.xmin, relative_bounding_box.ymin, image_cols,
    image_rows)
rect_end_point = _normalized_to_pixel_coordinates(
    relative_bounding_box.xmin + relative_bounding_box.width,
    relative_bounding_box.ymin + relative_bounding_box.height, image_cols,
    image_rows)
This essentially produces:
xleft, ytop = rect_start_point
xright, ybot = rect_end_point
In other words, ytop, ybot, xleft, and xright represent face_top, face_bottom, face_left, and face_right, respectively.
Since the image is simply a 3D NumPy array, we can crop it as below:
crop_img = image_input[ytop: ybot, xleft: xright]
The complete code is as below:
import cv2
import mediapipe as mp
from mediapipe.python.solutions.drawing_utils import _normalized_to_pixel_coordinates

# load face detection model
mp_face = mp.solutions.face_detection.FaceDetection(
    model_selection=1,  # model selection
    min_detection_confidence=0.5  # confidence threshold
)

# read the image in color (reading with the 0 flag would return a single-channel image)
dframe = cv2.imread('xx.png')
image_rows, image_cols, _ = dframe.shape
image_input = cv2.cvtColor(dframe, cv2.COLOR_BGR2RGB)

results = mp_face.process(image_input)
detection = results.detections[0]
location = detection.location_data

relative_bounding_box = location.relative_bounding_box
rect_start_point = _normalized_to_pixel_coordinates(
    relative_bounding_box.xmin, relative_bounding_box.ymin, image_cols,
    image_rows)
rect_end_point = _normalized_to_pixel_coordinates(
    relative_bounding_box.xmin + relative_bounding_box.width,
    relative_bounding_box.ymin + relative_bounding_box.height, image_cols,
    image_rows)

## Let's draw a bounding box
color = (255, 0, 0)
thickness = 2
cv2.rectangle(image_input, rect_start_point, rect_end_point, color, thickness)

xleft, ytop = rect_start_point
xright, ybot = rect_end_point

crop_img = image_input[ytop: ybot, xleft: xright]
cv2.imwrite('crop_image0.jpg', crop_img)
I wrote this code with Python and OpenCV.
I have 2 images (the first is an image from a football match, 36.jpg):
and (the second is pitch.png, an image of the lines of the football field in red, in PNG format, i.e. without a white background):
With this code, I selected 4 coordinate points in both images (the 4 corners of the right penalty area),
and then, with cv2.warpPerspective, we can show the first image from a top view,
as below:
My question is: what changes do I need to make in my code so that the red lines from the second image appear on the first image in the same way as the images below (drawn in the Paint app)?
This is my code:
import cv2
import numpy as np

if __name__ == '__main__':
    # Read source image.
    im_src = cv2.imread('c:/36.jpg')
    # Four corners of the penalty area in the first image
    pts_src = np.array([[314, 108], [693, 108], [903, 493], [311, 490]])

    # Read destination image.
    im_dst = cv2.imread('c:/pitch.png')
    # Four corners of the right penalty area in the pitch image.
    pts_dst = np.array([[480, 76], [569, 76], [569, 292], [480, 292]])

    # Calculate Homography
    h, status = cv2.findHomography(pts_src, pts_dst)

    # Warp source image to destination based on homography
    im_out = cv2.warpPerspective(im_src, h, (im_dst.shape[1], im_dst.shape[0]))

    # Display images
    cv2.imshow("Source Image", im_src)
    cv2.imshow("Destination Image", im_dst)
    cv2.imshow("Warped Source Image", im_out)
    cv2.waitKey(0)
Swap your source and destination images and points. Then, warp the source image:
im_out = cv2.warpPerspective(im_src, h, (im_dst.shape[1],im_dst.shape[0]), borderValue=[255,255,255])
and add this code
mask = im_out[:,:,0] < 100
im_out_overlapped = im_dst.copy()
im_out_overlapped[mask] = [0,0,255]
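Putting those pieces together, a minimal sketch of the whole overlay might look like this; the file names and corner points are the ones from the question (just swapped), and the threshold of 100 is the one used above:
import cv2
import numpy as np

# The pitch-lines image is now the source, the match frame the destination
im_src = cv2.imread('c:/pitch.png')
im_dst = cv2.imread('c:/36.jpg')

# Four corners of the right penalty area, swapped accordingly
pts_src = np.array([[480, 76], [569, 76], [569, 292], [480, 292]])
pts_dst = np.array([[314, 108], [693, 108], [903, 493], [311, 490]])

# Homography mapping pitch coordinates to match-frame coordinates
h, status = cv2.findHomography(pts_src, pts_dst)

# Warp the pitch lines onto the match frame, filling the border with white
im_out = cv2.warpPerspective(im_src, h, (im_dst.shape[1], im_dst.shape[0]),
                             borderValue=[255, 255, 255])

# Pixels belonging to the warped red lines have a low blue-channel value
mask = im_out[:, :, 0] < 100

# Paint those pixels red on a copy of the match frame
im_out_overlapped = im_dst.copy()
im_out_overlapped[mask] = [0, 0, 255]

cv2.imshow("Overlay", im_out_overlapped)
cv2.waitKey(0)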
I have been following a tutorial about computer vision and doing a little project to read the time from a game. The game time is formatted h:m. So far I have the h and m figured out using findContours, but I'm having trouble isolating the colon, as the character shape is not continuous. Because of this, when I try matchTemplate, the code freaks out and starts to use the dot to match all the other digits.
Are there ways to group the contours by X?
Here is simplified code to get the reference digits; the code to get digits from the screen is basically the same.
import cv2
import imutils
from imutils import contours

# ref is the reference digits image, loaded and thresholded earlier
refCnts = cv2.findContours(ref.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
refCnts = imutils.grab_contours(refCnts)
refCnts = contours.sort_contours(refCnts, method="left-to-right")[0]
digits = {}

# loop over the OCR-A reference contours
for (i, c) in enumerate(refCnts):
    # compute the bounding box for the digit, extract it, and resize
    # it to a fixed size
    (x, y, w, h) = cv2.boundingRect(c)
    roi = ref[y:y + h, x:x + w]
    roi = cv2.resize(roi, (10, 13))
    digits[i] = roi
I'm new to Python and OpenCV. Apologies in advance if this is a dumb question.
Here is the reference image I'm using:
Here is the input image I'm trying to read:
Do you have to use findContours? Because there are better-suited methods for such problems. For instance, you can use template matching as shown below:
These are input, template (cut out from your reference image), and output images:
import cv2
import numpy as np
# Read the input image & convert to grayscale
input_rgb = cv2.imread('input.png')
input_gray = cv2.cvtColor(input_rgb, cv2.COLOR_BGR2GRAY)
# Read the template (Using 0 to read image in grayscale mode)
template = cv2.imread('template.png', 0)
# Perform template matching - more on this here: https://docs.opencv.org/4.0.1/df/dfb/group__imgproc__object.html#ga3a7850640f1fe1f58fe91a2d7583695d
res = cv2.matchTemplate(input_gray,template,cv2.TM_CCOEFF_NORMED)
# Store the coordinates of matched area
# found the threshold value of .56 using trial & error using the input image - might be different in your game
lc = np.where( res >= 0.56)
# Draw a rectangle around the matched region
# I used the width and height of the template image but in practice you need to use a better method to accomplish this
w, h = template.shape[::-1]
for pt in zip(*lc[::-1]):
    cv2.rectangle(input_rgb, pt, (pt[0] + w, pt[1] + h), (0, 255, 255), 1)

# display output (waitKey is needed for the window to actually render)
cv2.imshow('Detected', input_rgb)
# cv2.imwrite('output.png', input_rgb)
cv2.waitKey(0)
cv2.destroyAllWindows()
You may also look into text detection & recognition using OpenCV.
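As a rough, hypothetical sketch of that kind of route, here is pytesseract (the wrapper used in the first answer above) applied to a crop around the clock; the crop coordinates and the --psm 7 (single text line) setting are placeholders:
import cv2
import pytesseract

# Hypothetical crop around the in-game clock; adjust to your screenshot
frame = cv2.imread('input.png')
clock = frame[10:40, 850:950]

# Light preprocessing: grayscale and Otsu binarization
gray = cv2.cvtColor(clock, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# --psm 7 treats the image as a single text line; whitelist digits and ':'
config = "--psm 7 -c tessedit_char_whitelist=0123456789:"
print(pytesseract.image_to_string(thresh, config=config))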
I am developing a project for my university assignment which has an AR part that I tried to build with Unity and Vuforia. I want to use a simple T shape (or any shape that is easy for the user to draw on a body part such as a hand) as the image target, because I'm developing an app similar to inkHunter. In that app the image target is a smiley: when the customer draws a smiley on their body and points the camera at it, the app finds it and shows the selected tattoo design on top of it. I tried it with the Vuforia SDK, but Vuforia gives a rating to the image target, so I can't use what I want as the image target. I think using OpenCV is the right way to do it, but it's hard to learn and I have little time. I think this is not a big thing to implement, so please try to help me with this problem. In inkHunter, even if I draw the target on a sheet of paper, it still shows the tattoo on it. I need the same, which means I need to detect the drawn target. It would be great if you could help me in this situation. Thanks.
The target can be like this:
I was able to do template matching on still pictures, and I applied the same to real time, meaning I looped through the frames. But it does not seem to be matching the template against the frames, and I realized that found (the bookkeeping variable) is always None.
import cv2
import numpy as np
import imutils


def main():
    template = cv2.imread("C:\\Users\\Manthika\\Desktop\\opencvtest\\template.jpg")
    template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    template = cv2.Canny(template, 50, 200)
    (tH, tW) = template.shape[:2]
    cv2.imshow("Template", template)

    windowName = "Something"
    cv2.namedWindow(windowName)
    cap = cv2.VideoCapture(0)

    if cap.isOpened():
        ret, frame = cap.read()
    else:
        ret = False

    # loop over the frames to find the template
    while ret:
        # load the image, convert it to grayscale, and initialize the
        # bookkeeping variable to keep track of the matched region
        ret, frame = cap.read()
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        found = None

        # loop over the scales of the image
        for scale in np.linspace(0.2, 1.0, 20)[::-1]:
            # resize the image according to the scale, and keep track
            # of the ratio of the resizing
            resized = imutils.resize(gray, width=int(gray.shape[1] * scale))
            r = gray.shape[1] / float(resized.shape[1])

            # if the resized image is smaller than the template, then break
            # from the loop
            if resized.shape[0] < tH or resized.shape[1] < tW:
                break

            # detect edges in the resized, grayscale image and apply template
            # matching to find the template in the image
            edged = cv2.Canny(resized, 50, 200)
            result = cv2.matchTemplate(edged, template, cv2.TM_CCOEFF)
            (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result)

            # if we have found a new maximum correlation value, then update
            # the bookkeeping variable
            if found is None or maxVal > found[0]:
                found = (maxVal, maxLoc, r)
                print(found)

        # unpack the bookkeeping variable and compute the (x, y) coordinates
        # of the bounding box based on the resized ratio
        print(found)
        if found is None:
            # just show only the frames if the template is not detected
            cv2.imshow(windowName, frame)
        else:
            (_, maxLoc, r) = found
            (startX, startY) = (int(maxLoc[0] * r), int(maxLoc[1] * r))
            (endX, endY) = (int((maxLoc[0] + tW) * r), int((maxLoc[1] + tH) * r))

            # draw a bounding box around the detected result and display the image
            cv2.rectangle(frame, (startX, startY), (endX, endY), (0, 0, 255), 2)
            cv2.imshow(windowName, frame)

        if cv2.waitKey(1) == 27:
            break

    cv2.destroyAllWindows()
    cap.release()


if __name__ == "__main__":
    main()
Please help me solve this problem.
I can give you hints on the OpenCV part, but not on the Unity and Vuforia side; I hope it helps anyway.
So, the way I see the pipeline for the project:
Detect location, size, and aspect ratio
Use homography for transformation of the image that should be put over the original
Overlay: put one image on top of the other
I will assume that the target is a dark "T" on a white piece of paper, that it may appear in different locations on the paper, and that the paper itself may move.
1. Detect location, size, and aspect ratio
Firstly, you need to detect the piece of paper. As you know its color and aspect ratio, you may use RGB/HSV thresholding for segmentation. You could also try Deep/Machine Learning (a strategy similar to R-CNN, HOG-SVM, etc.), but that will take time. Then, you can use the findContours() function from OpenCV to get the largest object. From the contour you can get the location, size, and aspect ratio of the paper.
After that, you do the same thing within the piece of paper, this time looking for the "T". Here you can use the template matching method (just scan the region of interest with a predefined mask at different sizes), or simply repeat the steps above.
A useful resource may be this credit card characters recognition example. It helped me a lot one day:)
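A rough sketch of step 1, assuming a dark "T" on a white sheet of paper (OpenCV 4 API; the HSV bounds, thresholds, and file name are placeholders):
import cv2
import numpy as np

frame = cv2.imread('frame.jpg')
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Bright, low-saturation pixels ~ the white paper
paper_mask = cv2.inRange(hsv, (0, 0, 180), (180, 60, 255))

# The largest contour is assumed to be the sheet of paper
cnts, _ = cv2.findContours(paper_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
paper = max(cnts, key=cv2.contourArea)
x, y, w, h = cv2.boundingRect(paper)
print('paper location:', (x, y), 'size:', (w, h), 'aspect ratio:', w / float(h))

# Repeat inside the paper ROI, this time looking for the dark "T"
roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
_, t_mask = cv2.threshold(roi, 100, 255, cv2.THRESH_BINARY_INV)
t_cnts, _ = cv2.findContours(t_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if t_cnts:
    t = max(t_cnts, key=cv2.contourArea)
    tx, ty, tw, th = cv2.boundingRect(t)
    print('"T" location within the paper:', (tx, ty), 'size:', (tw, th))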
2. Use homography for transformation of the image that should be put over the original
After extracting the aspect ratio, you will know the approximate size and shape that should appear on top of the "T". This will let you use homography for the transformation of the image you want to put over the "T". Here is a good point to start; you can also google for other sources, there should be plenty of them, and as far as I know, OpenCV has functions for that.
After the transformation, I would recommend using interpolation, because there might be some missing pixels afterwards.
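A rough sketch of step 2 under the same assumptions; tattoo.png, frame.jpg, and the corner points in dst_pts are placeholders for the design to overlay, the camera image, and the detected "T" region:
import cv2
import numpy as np

frame = cv2.imread('frame.jpg')
tattoo = cv2.imread('tattoo.png')
th, tw = tattoo.shape[:2]

# Corners of the tattoo image and of the target region, in the same order
src_pts = np.float32([[0, 0], [tw, 0], [tw, th], [0, th]])
dst_pts = np.float32([[120, 80], [260, 90], [250, 230], [115, 220]])

# Homography from the design to the target region (for exactly four points,
# cv2.getPerspectiveTransform would work as well)
H, _ = cv2.findHomography(src_pts, dst_pts)

# Warp with linear interpolation so there are no missing pixels
warped = cv2.warpPerspective(tattoo, H, (frame.shape[1], frame.shape[0]),
                             flags=cv2.INTER_LINEAR)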
3. Overlay: put one image on top of the other
The last step is just to go through all pixels of the input image and put the transformed image over the target pixels.
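Continuing the sketch above, the overlay itself can be as simple as copying every non-black pixel of the warped design onto the frame:
# warped and frame come from the step 2 sketch above
mask = warped.any(axis=2)
output = frame.copy()
output[mask] = warped[mask]
cv2.imwrite('result.jpg', output)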
Hope this helps, good luck!:)
I have a picture like this:
It has text stamps randomly distributed throughout the image file. Some aspects to keep in mind about the image are:
The text in the stamp is always the same.
There is no transparency.
The text font is black, so there's a significant difference in contrast with the original text.
So my questions are:
How do I find these text stamps? I'm guessing maybe template matching with some tolerance could help?
Even if I find the exact location of the text, how do I get rid of it? I could try to figure out the random background and do something like the following:
Get the bounding box of the text stamp contour.
Then take all pixels outside of the contour.
Remove the contour and fill it with random pixels from the previous step; adding some blur should do the trick, as sketched below.
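A rough sketch of that idea, assuming the stamp's bounding box has already been found (e.g. by template matching); the box coordinates below are placeholders:
import cv2
import numpy as np

img = cv2.imread('stamp.jpg')

# Placeholder bounding box of a detected stamp (x, y, width, height)
x, y, w, h = 100, 200, 150, 40

# Sample background pixels from a strip just above the stamp
background = img[max(y - h, 0):y, x:x + w]

# Fill the stamp region with randomly chosen background pixels
rows = np.random.randint(0, background.shape[0], size=h)
cols = np.random.randint(0, background.shape[1], size=w)
img[y:y + h, x:x + w] = background[np.ix_(rows, cols)]

# Blur the patched region slightly to hide the seams
img[y:y + h, x:x + w] = cv2.GaussianBlur(img[y:y + h, x:x + w], (5, 5), 0)

cv2.imwrite('patched.jpg', img)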
The following code removes the stamp from your image:
import cv2
import numpy as np

inp_img = cv2.imread('stamp.jpg', cv2.IMREAD_GRAYSCALE)
th, inp_img_thresh = cv2.threshold(255 - inp_img, 220, 255, cv2.THRESH_BINARY)
dilate = cv2.dilate(inp_img_thresh, np.ones((5, 5), np.uint8))
canny = cv2.Canny(dilate, 0, 255)

# OpenCV 4 returns (contours, hierarchy); in OpenCV 3 unpack three values instead
contours, _ = cv2.findContours(canny, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

test_img = inp_img.copy()
for c in contours:
    (x, y, w, h) = cv2.boundingRect(c)
    # fill the detected stamp region with a light background value
    test_img[y+3:y-2+h, x+3:x+w] = 240

cv2.imwrite("stamp_removed.jpg", test_img)
cv2.imshow("input image", inp_img)
cv2.imshow("threshold", inp_img_thresh)
cv2.imshow("output image", test_img)
cv2.waitKey(0)
cv2.destroyAllWindows()