Hi, I am working on facial recognition.
To improve recognition performance I want to use facial alignment.
When I use the HOG face detector, described e.g. by Adrian Rosebrock, I get an aligned image out:
import cv2
import dlib
from imutils.face_utils import rect_to_bb, FaceAligner

detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor('/home/base/Documents/facial_landmarks/shape_predictor_5_face_landmarks.dat')
fa = FaceAligner(shape_predictor, desiredFaceWidth=112, desiredLeftEye=(0.3, 0.3))

img = cv2.imread(pathtoimage)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
rects = detector(gray, 2)
for rect in rects:
    (x, y, w, h) = rect_to_bb(rect)
    faceAligned = fa.align(img, gray, rect)
However, I have to work on embedded hardware and the HOG face detection is not fast enough. The best-performing alternative is the cv2 LBP cascade.
With cv2 I also get the bounding box of the detected face, but using it directly does not work:
faces_detected = face_cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=4)
In other examples using the HOG detector, the coordinates are extracted from the HOG rect with
(x, y, w, h) = rect_to_bb(rect)
and then used with
aligned_face = fa.align(img, gray, dlib.rectangle(left=x, top=y, right=w, bottom=h))
The idea was to substitute the cv2 values for x, y, w, h. Unfortunately, that does not work: the two lines above produce a completely wrong alignment. (In the first code example, the rect_to_bb function is imported but not used.)
I checked the values and they are somehow off:
224x224: the image size
156 70 219 219: the cv2 values (slightly different, of course)
165 101 193 193: the values converted with rect_to_bb
[(165, 101) (358, 294)]: the raw dlib rect
I checked the rect_to_bb function, but this seems straightforward:
def rect_to_bb(rect):
    # take a bounding box predicted by dlib and convert it
    # to the format (x, y, w, h) as we would normally do
    # with OpenCV
    x = rect.left()
    y = rect.top()
    w = rect.right() - x
    h = rect.bottom() - y
    # return a tuple of (x, y, w, h)
    return (x, y, w, h)
While typing I got the answer... classic.
The alignment function needs the bounding box marked slightly differently, as can be seen in the rect_to_bb() function above.
There, rect.right() (w in cv2) and rect.bottom() (h in cv2) have x and y subtracted from them. So in the alignment function you have to add those values back; otherwise the region fed to the alignment function is much too small and out of shape. With that conversion, the values from the cv2 detection can be used as well:
aligned_face = fa.align(img, gray, dlib.rectangle(left=x, top=y, right=w + x, bottom=h + y))
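Putting the two pieces together, a minimal sketch of the cv2-cascade path; the cascade file, model path and image name below are placeholders, not the original ones:
import cv2
import dlib
from imutils.face_utils import FaceAligner

# placeholder paths: substitute your own model, cascade and image files
shape_predictor = dlib.shape_predictor('shape_predictor_5_face_landmarks.dat')
fa = FaceAligner(shape_predictor, desiredFaceWidth=112, desiredLeftEye=(0.3, 0.3))
face_cascade = cv2.CascadeClassifier('lbpcascade_frontalface.xml')

img = cv2.imread('face.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in face_cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=4):
    # cv2 gives (x, y, width, height); dlib.rectangle wants absolute corners
    rect = dlib.rectangle(left=x, top=y, right=x + w, bottom=y + h)
    faceAligned = fa.align(img, gray, rect)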
Stay healthy
Related
I'm using the MTCNN network (https://towardsdatascience.com/face-detection-using-mtcnn-a-guide-for-face-extraction-with-a-focus-on-speed-c6d59f82d49) to detect faces and heads. For this I'm using the classic face-detection code: I get the coordinates of the top-left corner of the bounding box of the face (x, y) plus the height and width of the box (h, w), then I expand the box to get the head in my crop:
import cv2
import mtcnn

detector = mtcnn.MTCNN()
img = cv2.imread('images/' + path_res)
faces = detector.detect_faces(img)  # result
for result in faces:
    x, y, w, h = result['box']
    x1, y1 = x + w, y + h
    # expand the box by fixed pixel margins, clamped at the top-left
    if x - 100 >= 0:
        a = x - 100
    else:
        a = 0
    if y - 150 >= 0:
        b = y - 150
    else:
        b = 0
    if x1 + 100 >= w:
        c = x1 + 100
    else:
        c = w
    if y1 + 60 >= h:
        d = y1 + 60
    else:
        d = h
    crop = img[b:d, a:c]  # <--- final crop of the head
The problem is that this solution works for some images, but for many others my crop includes the shoulders and the neck of the target person. I think it's because the pixel density differs between images (i.e. +150 pixels in one image does not cover the same physical distance as in another). What can I do to extract the head properly?
Many thanks
You can use relative instead of absolute sizes for the margins around the detected faces. Because the margins are then fractions of the detected box, they scale with the face and are independent of the image resolution. For example, 50% on top, bottom, left and right:
import cv2
import mtcnn

detector = mtcnn.MTCNN()
img = cv2.imread('images/' + path_res)
faces = []
for result in detector.detect_faces(img):
    x, y, w, h = result['box']
    # half the box size as margin on each side, clamped to the image
    b = max(0, y - (h // 2))
    d = min(img.shape[0], (y + h) + (h // 2))
    a = max(0, x - (w // 2))
    c = min(img.shape[1], (x + w) + (w // 2))
    face = img[b:d, a:c, :]
    faces.append(face)
I'm attempting to create a program in which my code analyses a video from my security camera and locates the cars that have been spotted. I've figured out how to find the cars and draw red rectangles around them, but I'd like to add a condition that only draws the boxes if more than 5 cars are detected. However, I am unable to do so because the detections come back as an array. How would I go about fixing this code?
import cv2

classifier_file = 'cars.xml'
car_tracker = cv2.CascadeClassifier(classifier_file)
video = cv2.VideoCapture("footage.mp4")

while True:
    (read_successful, frame) = video.read()
    if read_successful:
        grayscaled_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    else:
        break
    cars = car_tracker.detectMultiScale(grayscaled_frame)
    if cars > 4:
        for (x, y, w, h) in cars:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.imshow('Car Detector', frame)
    cv2.waitKey(0)
By the way, I am using an Anaconda environment along with Python 3.8.8.
You have an array of detections.
All you need is to call len() on the array to get the number of elements. This is a built-in function of Python.
cars = car_tracker.detectMultiScale(grayscaled_frame)
if len(cars) >= 5:
    for (x, y, w, h) in cars:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
No need for any other libraries.
Here is one example using a different technique. In this case you can use cvlib, which has an object_detection module for detecting multiple classes. It uses the YOLOv4 weights, so you will be able to detect any car on the road, or in this case from a webcam; just make sure the cars are ahead of the camera rather than seen side-on.
import cv2
import matplotlib.pyplot as plt
# pip install cvlib
import cvlib as cv
from cvlib.object_detection import draw_bbox

im = cv2.imread('cars3.jpg')
#cv2.imshow("cars", im)
#cv2.waitKey(0)

bbox, label, conf = cv.detect_common_objects(im)
# The YOLOv4 weights will be downloaded. Check: C:\Users\USER_NAME\.cvlib\object_detection\yolo
# If the download was not successful, delete the yolo folder and try again; it will be downloaded again.
# After downloading the weights, keep running the script. If it says 0 cars, close Anaconda Spyder, reopen it and try again.

# get the bounding boxes
output_image_with_bbox = draw_bbox(im, bbox, label, conf)
number_cars = label.count('car')
if number_cars > 5:
    print('Number of cars in the image is: ' + str(number_cars))
    plt.imshow(output_image_with_bbox)
    plt.show()
else:
    print('Number of cars in the image is: ' + str(number_cars))
    plt.imshow(im)
    plt.show()
Greetings.
import cv2
import dlib
import numpy as np
from imutils.face_utils import rect_to_bb

hog_detector = dlib.get_frontal_face_detector()
haarEyeCascade = cv2.CascadeClassifier('haarcascade_eye.xml')  # placeholder path to the Haar eye cascade xml

def face_detect(img):
    hog_rects = hog_detector(img, 0)
    hog_faces = np.zeros((0, 4), dtype=int)
    for (i, rect) in enumerate(hog_rects):
        (x, y, w, h) = rect_to_bb(rect)
        face = np.asarray((x, y, w, h), dtype=int)
        hog_faces = np.append(hog_faces, [face], axis=0)
    return hog_faces

def detect(img, cascade, minimumFeatureSize=(20, 20)):
    if cascade.empty():
        raise Exception("There was a problem loading your Haar Cascade xml file.")
    rects = cascade.detectMultiScale(img, scaleFactor=1.3, minNeighbors=1, minSize=minimumFeatureSize)
    if len(rects) == 0:
        return []
    rects[:, 2:] += rects[:, :2]  # convert last coords from (width, height) to (maxX, maxY)
    return rects

def eye_detect(faces, gray, minEyeSize):
    # eyes = np.zeros((0, 4), dtype=int)
    for (x, y, w, h) in faces:
        roi_gray = gray[y:h, x:w]
        detected_eyes = detect(roi_gray, haarEyeCascade, minEyeSize)
        eyeFix = detected_eyes + [x, y, x, y]
        # eyes = np.append(eyes, eyeFix, axis=0)
    return eyeFix
I use the face_detect function above with the dlib face detector and loop through the detected faces to find eyes with the eye_detect function, using the OpenCV Haar eye cascade. The input is a VideoCapture frame from OpenCV. The output of each function is a numpy array containing min x, min y, max x, max y of the detected feature.
If there is only one face the code works fine. But as soon as a second face comes, it throws
UnboundLocalError: local variable 'eyeFix' referenced before assignment
I want it to only detect eyes on the existing face and not on new faces. What can I do to improve this code?
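The error occurs because eyeFix is only assigned inside the for loop, so any call in which the loop body never binds it (an empty faces array, or no eyes found) reaches the return with an unbound name. A minimal sketch of one possible fix, reusing the detect() function and haarEyeCascade from above, is to accumulate into an array initialized before the loop:
def eye_detect(faces, gray, minEyeSize):
    # accumulate the eyes of every face; starting from an empty array
    # keeps the return value bound even when nothing is detected
    eyes = np.zeros((0, 4), dtype=int)
    for (x, y, w, h) in faces:
        roi_gray = gray[y:h, x:w]
        detected_eyes = detect(roi_gray, haarEyeCascade, minEyeSize)
        if len(detected_eyes) > 0:
            # shift the ROI-relative coordinates back into frame coordinates
            eyes = np.append(eyes, detected_eyes + [x, y, x, y], axis=0)
    return eyes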
I am pasting multiple images at random positions onto a custom white background image, but I cannot figure out how to stop the images from overlapping each other. What I want is that every time an image is pasted, it must not land on the same position as a previous image. So far, I wasn't able to find much on this.
Any help pointing me in the right direction would be appreciated.
import os
import random
from PIL import Image

back_image = Image.new('RGB', (1440, 900), (255, 255, 255))
n = 0
for image_name in os.listdir(path):
    image_to_paste = Image.open(os.path.join(path, image_name))
    i_width, i_height = image_to_paste.size
    b_width, b_height = back_image.size
    # random position at which the pasted image still fits on the background
    position = (random.randint(0, max(0, b_width - i_width)),
                random.randint(0, max(0, b_height - i_height)))
    back_image.paste(image_to_paste, position)
    n += 1
    if n == 5:
        back_image.save(f'path to save output images\_{n}.jpg')
        back_image = Image.new('RGB', (1440, 900), (255, 255, 255))
        n = 0
Here is example code that pastes multiple images onto a white background while preventing overlap. It generates random x and y coordinates for each image and checks whether the new image overlaps any previously placed image before pasting it; if it does overlap, new coordinates are generated until it does not.
import random
from PIL import Image

def random_coordinates(width, height, existing_coordinates, image):
    """Generate random x, y coordinates within the given dimensions,
    making sure they do not overlap with any existing coordinates."""
    img_width, img_height = image.size
    x, y = random.randint(0, width - img_width), random.randint(0, height - img_height)
    # re-draw until the candidate box overlaps none of the placed boxes
    while any(x <= x2 + w and x + img_width >= x2 and y <= y2 + h and y + img_height >= y2
              for x2, y2, w, h in existing_coordinates):
        x, y = random.randint(0, width - img_width), random.randint(0, height - img_height)
    return (x, y, img_width, img_height)

def place_images(images, background_path):
    """Paste the given images onto a white background without overlap."""
    background = Image.new('RGB', (400, 400), (255, 255, 255))
    coordinates = []
    for image in images:
        x, y, w, h = random_coordinates(background.width, background.height, coordinates, image)
        coordinates.append((x, y, w, h))
        background.paste(image, (x, y))
    background.save(background_path)
    return background
You would use this function by calling place_images(images, background_path), where images is a list of PIL images and background_path is a string giving the path where the composed image is saved.
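A hypothetical usage, where 'images' is a placeholder input folder and 'output.jpg' a placeholder save path:
import os
from PIL import Image

# load every file in the hypothetical 'images' folder as a PIL image
images = [Image.open(os.path.join('images', f)) for f in os.listdir('images')]
place_images(images, 'output.jpg')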
I'm working on sign language recognition (hand gestures) using OpenCV. I want to select a particular region of interest (ROI) from the entire frame and give it to the model as input. Right now I'm able to do this with the entire frame but not with an ROI. This is the code I've tried so far:
_, image_frame = cam_capture.read()
r = cv2.rectangle(image_frame, upper_left, bottom_right, (100, 50, 200), 5)
rect_img = image_frame[upper_left[1]:bottom_right[1], upper_left[0]:bottom_right[0]]

sketcher_rect = rect_img
sketcher_rect = sketch_transform(sketcher_rect)
sketcher_rect = cv2.resize(sketcher_rect, (28, 28), interpolation=cv2.INTER_AREA)

pred_probab, pred_class = keras_predict(model, sketcher_rect)
print(pred_class, pred_probab)

cv2.imshow('image_frame', sketcher_rect)
Any help?
If I understand you correctly, you simply want to cut a region from the image, right?
This should work:
def crop_image(image, x, y, width, height):
    """
    image: a cv2 frame
    x, y, width, height: the region to cut out
    """
    return image[y:y + height, x:x + width]
Remember that cv2 images are simply numpy arrays; cutting a region of the image is the same as slicing the array with the corresponding indexes.
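For example, applied to the snippet above (assuming upper_left and bottom_right are (x, y) tuples, as in the cv2.rectangle call):
# derive x, y, width, height from the two corner points
x, y = upper_left
width = bottom_right[0] - upper_left[0]
height = bottom_right[1] - upper_left[1]
rect_img = crop_image(image_frame, x, y, width, height)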