I'm fairly new to OpenCV and Tesseract. I'm currently building a project that uses computer vision to detect door labels. Hopefully it will be beneficial for visually impaired people.
The idea of the program is to preprocess the input image by converting it into a binary image, then use Canny edge detection to find the outline of the door label, then dilate the Canny result. After that, the image is fed to Tesseract, and the detected text should be shown with boxes.
The expected result is green rectangles around the text, with the text itself printed out.
The issue is that the rectangles are missing and the text detection fails.
I have tried going through these:
Recognize Text in images using Canny Edge detection in Opencv
OpenCv pytesseract for OCR
Image preprocessing with OpenCV before doing character recognition (tesseract)
The questions and solutions there are either too simple or not relevant enough. Some are not in Python either.
Attached below is my attempt at the code:
import pytesseract as pytess
import cv2 as cv
import numpy as np
from PIL import Image
from pytesseract import Output
img = cv.imread(r"C:\Users\User\Desktop\dataset\p\Image_31.jpg", 0)
# edges store the canny version of img
edges = cv.Canny(img, 100, 200)
# ker as in kernel
# (3, 3) is the kernel size while uint8 is the datatype
ker = np.ones((3, 3), np.uint8)
# dil as in dilation
# edges as the src, ker is the kernel we set above, number of dilation
dil = cv.dilate(edges, ker, iterations=1)
# setup pytesseract parameters
configs = r'--oem 3 --psm 6'
# feed image to tesseract
result = pytess.image_to_data(dil, output_type=Output.DICT, config=configs, lang='eng')
print(result.keys())
boxes = len(result['text'])
# make a copy of the dilated image to draw on
new_item = dil.copy()
for sequence_number in range(boxes):
    if int(result['conf'][sequence_number]) > 30:  # removed constraints
        (x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
                        result['width'][sequence_number], result['height'][sequence_number])
        new_item = cv.rectangle(new_item, (x, y), (x + w, y + h), (0, 255, 0), 2)
# detect sentence with tesseract
# pending as rectangle not achieved
cv.imshow("original", img)
cv.imshow("canny", edges)
cv.imshow("dilation", dil)
cv.imshow("capturedText", new_item)
#ignore below this line, it is only for testing
#testobj = Image.fromarray(dil)
#testtext = pytess.image_to_string(testobj, lang='eng')
#print(testtext)
cv.waitKey(0)
cv.destroyAllWindows()
Resultant image:
The testing part of the code returns the result shown below:
a)
Meets
which obviously does not satisfy the objective.
EDIT
After posting the question, I realized I may have approached it wrong from the beginning. I should use OpenCV to detect the contour of the door label and isolate the part containing text before sending whatever is inside that rectangle to OCR.
EDIT2
Now that I have identified the issue, thanks to the Stack Overflow members, I am attempting to add an image rectification/image warping technique to retrieve a straight front view and get better accuracy from the system. I will update soon.
EDIT3
After some bug fixing, relaxing the confidence constraint, and letting the function draw on the original image, I have achieved the results below. The updated code is attached as well.
import cv2 as cv
import numpy as np
import pytesseract as pytess
from pytesseract import Output
# input of img source
img = cv.imread(r"C:\Users\User\Desktop\dataset\p\Image_31.jpg")
# necessary image color conversion
img2 = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# edges store the canny version of img
edges = cv.Canny(img2, 100, 200)
# ker as in kernel
# (3, 3) is the kernel size while uint8 is the datatype
ker = np.ones((3, 3), np.uint8)
# dil as in dilation
# edges as the src, ker is the kernel we set above, number of dilation
dil = cv.dilate(edges, ker, iterations=1)
# setup pytesseract parameters
configs = r'--oem 3 --psm 6'
# feed image to tesseract
result = pytess.image_to_data(dil, output_type=Output.DICT, config=configs, lang='eng')
# number of entries (boxes) returned by tesseract
boxes = len(result['text'])
# make a copy of the dilated image to crop from
new_item = dil.copy()
for sequence_number in range(boxes):
    if int(result['conf'][sequence_number]) > 0:  # removed constraints
        (x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
                        result['width'][sequence_number], result['height'][sequence_number])
        # draw rectangle boxes on the original img
        cv.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)
        # Crop the image
        crp = new_item[y:y + h, x:x + w]
        # OCR
        txt = pytess.image_to_string(crp, config=configs)
        # returns recognised text
        print(txt)
        cv.imshow("capturedText", crp)
        cv.waitKey(0)
# cv.imshow("original", img)
# cv.imshow("canny", edges)
# cv.imshow("dilation", dil)
cv.imshow("results", img)
cv.waitKey(0)
cv.destroyAllWindows()
You have found all the detected text in the image:
for sequence_number in range(boxes):
    if int(result['conf'][sequence_number]) > 30:
        (x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
                        result['width'][sequence_number], result['height'][sequence_number])
        new_item = cv.rectangle(new_item, (x, y), (x + w, y + h), (0, 255, 0), 2)
But you also require the confidence of each detection to be above 30.
If we remove the constraint
If we OCR each new item
Result will be:
Now if you read:
txt = pytesseract.image_to_string(new_item, config="--psm 6")
print(txt)
The OCR output will be:
Meeting Room §
This is the output with pytesseract version 0.3.7, the current version at the time of writing.
Code:
# Load the libraries
import cv2
import pytesseract
# Load the image
img = cv2.imread("fsUSw.png")
# Convert it to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# OCR detection
d = pytesseract.image_to_data(gry, config="--psm 6", output_type=pytesseract.Output.DICT)
# Get ROI part from the detection
n_boxes = len(d['level'])
# For each detected part
for i in range(1, 2):
    # Get the localized region
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    # Draw rectangle to the detected region
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)
    # Crop the image
    crp = gry[y:y + h, x:x + w]
    # OCR
    txt = pytesseract.image_to_string(crp, config="--psm 6")
    print(txt)
    # Display the cropped image
    cv2.imshow("crp", crp)
    cv2.waitKey(0)
# Display
cv2.imshow("img", img)
cv2.waitKey(0)
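The confidence filtering discussed above can be sketched as a small helper. This is only an illustration: `confident_boxes` and the `sample` dictionary are hypothetical names, and the sample values are hand-written to mimic the shape of the dictionary returned by `image_to_data(..., output_type=Output.DICT)`. Depending on the pytesseract version, `conf` values may arrive as ints, floats, or strings, so the sketch converts them with `float()` rather than `int()`.

```python
def confident_boxes(data, min_conf=30.0):
    """Return (text, (x, y, w, h)) for entries whose confidence exceeds min_conf."""
    found = []
    for i, text in enumerate(data["text"]):
        # conf may be an int, a float, or a string such as "96.5";
        # float() accepts all of these, int() would fail on "96.5"
        conf = float(data["conf"][i])
        # structural rows (page, block, line) have conf == -1 and empty text
        if conf > min_conf and text.strip():
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            found.append((text, box))
    return found


# Hand-written sample (an assumption, not real OCR output)
sample = {
    "text": ["", "Meeting", "Room", "a)"],
    "conf": ["-1", "96.5", 91, "12"],
    "left": [0, 10, 100, 170],
    "top": [0, 20, 20, 20],
    "width": [200, 80, 60, 20],
    "height": [60, 30, 30, 30],
}

for text, (x, y, w, h) in confident_boxes(sample):
    print(text, x, y, w, h)
```

Each returned box can then be passed to `cv2.rectangle` on a colour (3-channel) copy of the image, so that the green colour is actually visible.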
I think what you are looking for here is image rectification (warping the image to make it look as if it were taken from another point of view), and there seem to be tools for this in Python. However, the problem gets more complicated because in your case you also need to detect how you want to rectify it. I am not sure how you should go about that.
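To make the idea of rectification more concrete, here is a minimal sketch of the underlying math, assuming the four corners of the door label are already known. The corner coordinates below are made up, and `homography_from_points` and `apply_homography` are hypothetical helper names. In practice OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` do this work; finding the corners automatically is the hard part and is not covered here.

```python
import numpy as np


def homography_from_points(src, dst):
    """Solve for the 3x3 matrix H that maps each src corner onto its dst corner."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    # the ninth entry of H is fixed to 1
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, point):
    """Map a single (x, y) point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w


# Hypothetical label corners in the photo: top-left, top-right, bottom-right, bottom-left
corners = [(120, 80), (420, 110), (400, 300), (100, 260)]
# Where those corners should land in a straight front view of 300 x 150 pixels
target = [(0, 0), (300, 0), (300, 150), (0, 150)]

H = homography_from_points(corners, target)
print(apply_homography(H, corners[2]))
```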
I want to ask something. In OpenCV there is cv2.rectangle to draw a rectangle around an object of interest. After I get the object of interest, which is represented by a rectangle, I want to get the region of interest box. To do that I used crop = frame[y:y+h, x:x+w].
Now I want to get the values of x, y, w, h of that cropped box. How do I do that?
Here is my code:
import os
import cv2

# load the object detector once, outside the loop
path = os.path.join(os.getcwd(), 'haarcascade_frontalface_default.xml')
face_cascade = cv2.CascadeClassifier(path)
# open the camera once instead of on every iteration
capture = cv2.VideoCapture(0)
while bol:
    ret, frame = capture.read()
    # convert image to grayscale
    imgGray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detectMultiScale returns one (x, y, w, h) box per detected face
    for (x, y, w, h) in face_cascade.detectMultiScale(imgGray):
        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 255, 255), 3)
        _croppedImage = frame[y:y+h, x:x+w]
        ROI = _croppedImage
How do I get the values of x, y, w, h of _croppedImage?
The values of x and y for _croppedImage have no significance of their own, because _croppedImage is simply an image cropped out of another image. Inside the crop the origin is (0, 0); the x and y you used for slicing are its offset within the original frame.
To get w and h for _croppedImage, you can use the .shape attribute. The w and h of an image are simply its width and height, and .shape lists the height first.
h, w = _croppedImage.shape[:2]
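A small sketch of the point above, using a blank NumPy array as a stand-in for the camera frame and a made-up detection box: the crop's size comes from `.shape`, and a coordinate measured inside the crop maps back to the frame by adding the offset used for slicing.

```python
import numpy as np

# Stand-in for a 640x480 BGR camera frame (assumption for illustration)
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# Hypothetical detection box as returned by a cascade detector
x, y, w, h = 200, 100, 120, 150
cropped = frame[y:y + h, x:x + w]

# .shape is (rows, columns, channels), i.e. height first, then width
crop_h, crop_w = cropped.shape[:2]

# A point inside the crop maps back to the frame by adding the offset (x, y)
px, py = 10, 20
frame_point = (x + px, y + py)

print(crop_w, crop_h, frame_point)
```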
I'm writing a script to detect faces and blur out everything but the faces.
I find the faces using Haar cascades, then create a mask with circles where the faces are. Then I add the two images together. This works fine when the dividing line is absolute, but I can't work out how to make the blur taper off without a hard edge. Blurring the mask just creates an ugly line where the tapering should be.
import numpy as np
import cv2
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
original_image = cv2.imread('person.jpg')
gray_original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2GRAY)
# mask
# detect faces and create mask
mask = np.full(
    (original_image.shape[0], original_image.shape[1], 1), 0, dtype=np.uint8)
faces = face_cascade.detectMultiScale(gray_original_image, 1.3, 5)
face_areas = []
for (x, y, w, h) in faces:
    face_areas.append(([x, y, w, h], original_image[y:y+h, x:x+w]))
    center = (x + w // 2, y + h // 2)
    radius = max(h, w) // 2
    cv2.circle(mask, center, radius, (255), -1)
# blur original image
kernel = np.ones((5, 5), np.float32) / 25
blurred_image = cv2.filter2D(original_image, -1, kernel)
# blur mask to get tapered edge
mask = cv2.filter2D(mask, -1, kernel)
# composite blurred and unblurred faces
mask_inverted = cv2.bitwise_not(mask)
background = cv2.bitwise_and(
    blurred_image, blurred_image, mask=mask_inverted)
foreground = cv2.bitwise_and(original_image, original_image, mask=mask)
composite = cv2.add(background, foreground)
cv2.imshow('composite', composite)
cv2.waitKey(0)
cv2.destroyAllWindows()
Original
Output from script
Desired result (no obvious line between blurred/non blurred)
You did most of the job. Instead of cv2.mean, which by the way did not work when I tried it, simply do:
composite = (foreground + background)
and you would get:
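For a gradual transition, one common alternative (not necessarily what the answer above intends) is per-pixel alpha blending. `cv2.bitwise_and` treats any non-zero mask value as fully on, so a blurred mask still produces a hard edge; using the mask as a weight between 0 and 1 avoids that. The sketch below uses only NumPy; `blend_with_soft_mask` is a hypothetical helper, and the mask is assumed to have been softened beforehand, for example with a large `cv2.GaussianBlur` kernel.

```python
import numpy as np


def blend_with_soft_mask(original, blurred, mask):
    """Weight each pixel: mask 255 keeps the original, mask 0 keeps the blurred image."""
    alpha = mask.astype(np.float32) / 255.0
    if alpha.ndim == 2:
        # add a channel axis so the weights broadcast over the colour channels
        alpha = alpha[:, :, np.newaxis]
    out = original.astype(np.float32) * alpha + blurred.astype(np.float32) * (1.0 - alpha)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Tiny synthetic example: one row of three pixels
original = np.full((1, 3, 3), 200, dtype=np.uint8)
blurred = np.full((1, 3, 3), 100, dtype=np.uint8)
mask = np.array([[255, 128, 0]], dtype=np.uint8)

composite = blend_with_soft_mask(original, blurred, mask)
print(composite[0, :, 0])
```

The middle pixel ends up roughly halfway between the two images, which is the tapering effect the question asks for.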
I wrote a program that reads all .jpg files from a directory, performs face detection, crops the faces and saves them.
The problem is that when I run an official Python program I am able to detect all faces, but it saves only a few faces from every image.
What am I doing wrong?
import cv2
import sys
import glob
cascPath = "haarcascade_frontalface_default.xml"
# Create the haar cascade
faceCascade = cv2.CascadeClassifier(cascPath)
files=glob.glob("*.jpg")
for file in files:
    # Read the image
    image = cv2.imread(file)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Detect faces in the image
    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(30, 30),
        flags = cv2.cv.CV_HAAR_SCALE_IMAGE
    )
    print "Found {0} faces!".format(len(faces))
    # Crop Padding
    left = 10
    right = 10
    top = 10
    bottom = 10
    # Draw a rectangle around the faces
    for (x, y, w, h) in faces:
        print x, y, w, h
    # Debugging boxes
    # cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 2)
    image = image[y-top:y+h+bottom, x-left:x+w+right]
    print "cropped_{1}{0}".format(str(file),str(x))
    cv2.imwrite("cropped_{1}_{0}".format(str(file),str(x)), image)
As Gall said in the comments, the problem comes from your indentation. Your last three lines are not executed for each face, because their indentation does not make them part of the loop over the faces. You want something like this:
# Draw a rectangle around the faces
for (x, y, w, h) in faces:
    image = image[y-top:y+h+bottom, x-left:x+w+right]
    cv2.imwrite("cropped_{1}_{0}".format(str(file),str(x)), image)
Note that with this code there is a possibility of filename collisions (two faces with the same x in an image). You may want to use a unique string to avoid that problem. A simple counter would do the trick.
The problem with your code is that once the first face has been cropped out of the image, the following crops are no longer taken from the original image but from the already cropped first face. Here is code that I think everyone will understand:
import numpy as np
import cv2
import sys
import glob
cascPath = "haarcascade_frontalface_default.xml"
faceCascade = cv2.CascadeClassifier(cascPath)
img = cv2.imread('3.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = faceCascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(30, 30),
    #flags = cv2.cv.CV_HAAR_SCALE_IMAGE
    flags=0
)
print "Found {0} faces!".format(len(faces))
left = 10
right = 10
top = 10
bottom = 10
i=0
count=0
for (x, y, w, h) in faces:
    print x, y, w, h, i
    i=i+1
    img = img[y-top:y+h+bottom, x-left:x+w+right]
    cv2.imwrite('foo{}.png'.format(count), img)
    count += 1
    # reload the original so the next crop is not taken from this crop
    img=cv2.imread('3.jpg')
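The two fixes discussed above (crop every face from the untouched original, and use a counter for unique file names) can be sketched in Python 3 as follows. `crop_with_padding` is a hypothetical helper, and the image and face boxes are synthetic stand-ins rather than real detector output. The sketch also clamps the padded box to the image bounds, since a negative start index such as `y - top` wraps around in NumPy and typically yields an empty or wrong crop.

```python
import numpy as np


def crop_with_padding(image, box, pad=10):
    """Crop box = (x, y, w, h) plus padding, clamped to the image bounds."""
    x, y, w, h = box
    img_h, img_w = image.shape[:2]
    top, bottom = max(y - pad, 0), min(y + h + pad, img_h)
    left, right = max(x - pad, 0), min(x + w + pad, img_w)
    # copy so that later edits to the crop do not touch the source image
    return image[top:bottom, left:right].copy()


# Synthetic stand-ins for cv2.imread(...) and detectMultiScale(...)
image = np.zeros((100, 200, 3), dtype=np.uint8)
faces = [(5, 5, 40, 40), (150, 50, 45, 45)]

# Every crop is taken from the same source image; the counter keeps names unique
crops = {f"cropped_{i}.png": crop_with_padding(image, box)
         for i, box in enumerate(faces)}

for name, crop in crops.items():
    print(name, crop.shape)
```

Each entry could then be written out with `cv2.imwrite(name, crop)`.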
I would like to know if there is a way to blur the faces that have been automatically identified by the Haar cascade face classifier.
Using the code below, I'm able to detect the faces, crop the image around each face, or draw a rectangle on it.
image = cv2.imread(imagepath)
# Specify the trained cascade classifier
face_cascade_name = "./haarcascade_frontalface_alt.xml"
# Create a cascade classifier
face_cascade = cv2.CascadeClassifier()
# Load the specified classifier
face_cascade.load(face_cascade_name)
#Preprocess the image
grayimg = cv2.cvtColor(image, cv2.cv.CV_BGR2GRAY)
grayimg = cv2.equalizeHist(grayimg)
#Run the classifiers
faces = face_cascade.detectMultiScale(grayimg, 1.1, 2, 0|cv2.cv.CV_HAAR_SCALE_IMAGE, (30, 30))
print "Faces detected"
if len(faces) != 0:         # If there are faces in the images
    for f in faces:         # For each face in the image
        # Get the origin co-ordinates and the length and width till where the face extends
        x, y, w, h = [ v for v in f ]
        # Draw rectangles around all the faces
        cv2.rectangle(image, (x,y), (x+w,y+h), (255,255,255))
        sub_face = image[y:y+h, x:x+w]
        for i in xrange(1,31,2):
            cv2.blur(sub_face, (i,i))
        face_file_name = "./face_" + str(y) + ".jpg"
        cv2.imwrite(face_file_name, sub_face)
But I would like to blur the faces of the people so they can't be recognized.
Do you have an idea of how to do that?
Thanks for your help
Arnaud
I finally succeeded in doing what I wanted.
To do that, apply a Gaussian blur, as Hammer suggested.
The code is:
image = cv2.imread(imagepath)
result_image = image.copy()
# Specify the trained cascade classifier
face_cascade_name = "./haarcascade_frontalface_alt.xml"
# Create a cascade classifier
face_cascade = cv2.CascadeClassifier()
# Load the specified classifier
face_cascade.load(face_cascade_name)
#Preprocess the image
grayimg = cv2.cvtColor(image, cv2.cv.CV_BGR2GRAY)
grayimg = cv2.equalizeHist(grayimg)
#Run the classifiers
faces = face_cascade.detectMultiScale(grayimg, 1.1, 2, 0|cv2.cv.CV_HAAR_SCALE_IMAGE, (30, 30))
print "Faces detected"
if len(faces) != 0:         # If there are faces in the images
    for f in faces:         # For each face in the image
        # Get the origin co-ordinates and the length and width till where the face extends
        x, y, w, h = [ v for v in f ]
        # get the rectangle img around all the faces
        cv2.rectangle(image, (x,y), (x+w,y+h), (255,255,0), 5)
        sub_face = image[y:y+h, x:x+w]
        # apply a gaussian blur on this new rectangle image
        sub_face = cv2.GaussianBlur(sub_face,(23, 23), 30)
        # merge this blurry rectangle to our final image
        result_image[y:y+sub_face.shape[0], x:x+sub_face.shape[1]] = sub_face
        face_file_name = "./face_" + str(y) + ".jpg"
        cv2.imwrite(face_file_name, sub_face)
# cv2.imshow("Detected face", result_image)
cv2.imwrite("./result.png", result_image)
Arnaud
The whole end of your code can be replaced by the following (NumPy indexes rows first, so the y range comes before the x range):
img[startY:endY, startX:endX] = cv2.blur(img[startY:endY, startX:endX], (23, 23))
instead of:
# Get the origin co-ordinates and the length and width till where the face extends
x, y, w, h = [ v for v in f ]
# get the rectangle img around all the faces
cv2.rectangle(image, (x,y), (x+w,y+h), (255,255,0), 5)
sub_face = image[y:y+h, x:x+w]
# apply a gaussian blur on this new rectangle image
sub_face = cv2.GaussianBlur(sub_face,(23, 23), 30)
# merge this blurry rectangle to our final image
result_image[y:y+sub_face.shape[0], x:x+sub_face.shape[1]] = sub_face
Especially because you don't ask for a circular mask, it's (to me) much easier to read.
PS: Sorry for not commenting, I don't have enough reputation to do it. Even if the post is 5 years old, I guess this may be worth it, as I found it while looking into this particular question.
Note: Neural networks (like ResNet) are now more accurate than Haar cascades at detecting faces, and they are also integrated in OpenCV. They might work better than the solutions mentioned in this question.
However, the code to blur / pixelate a face is still applicable.
You can also pixelate the face region by drawing squares that contain the average of the RGB values of each zone of the face.
A function performing this could look like this:
# relies on numpy (as np) and cv2 being imported, as in the usage code that follows
from itertools import product

def pixelate_image(image: np.ndarray, nb_blocks=5, in_place=False) -> np.ndarray:
    """Return a pixelated version of a picture (needs to be fed with a face to pixelate)"""
    # To pixelate, we will split into a given number of blocks
    # For each block, we will compute the average of RGB values of the block
    # And then we can just replace with a rectangle of this color
    if not in_place:
        image = np.copy(image)
    h, w = image.shape[:2]
    # divide the input image into NxN blocks
    blocks = tuple(
        np.linspace(0, d, nb_blocks + 1, dtype="int") for d in (w, h)
    )
    for i, j in product(*[range(1, len(s)) for s in blocks]):
        # compute the starting and ending (x, y)-coordinates
        # for the current block (cast to plain ints, which cv2.rectangle expects)
        start = int(blocks[0][i - 1]), int(blocks[1][j - 1])
        end = int(blocks[0][i]), int(blocks[1][j])
        # extract the ROI using NumPy array slicing, compute the
        # mean of the ROI, and then draw a rectangle with the
        # mean colour values over the ROI in the image
        roi = image[start[1]:end[1], start[0]:end[0]]
        bgr = [int(x) for x in cv2.mean(roi)[:3]]
        cv2.rectangle(image, start, end, bgr, -1)
    return image
You then just need to use it in a function like this (updated to Python 3 with pathlib and type hints):
from pathlib import Path
from typing import Union
import cv2
import numpy as np
PathLike = Union[Path, str]
face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_alt.xml")
def pixelate_faces_haar(img_path: PathLike, dest: Path):
    """Pixelate faces of people with OpenCV and save to a destination file"""
    img = cv2.imread(str(img_path))
    # To use cascade, we need to use Grayscale images
    # We can then detect faces
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 4)
    for (x, y, width, height) in faces:
        roi = img[y:y+height, x:x+width]
        pixelate_image(roi, 15, in_place=True)
    dest.parent.mkdir(parents=True, exist_ok=True)
    cv2.imwrite(str(dest), img)
    print(f"Saved pixelated version of {img_path} to {dest}")