I'm trying to get a list with landmark coordinates with MediaPipe's Face Mesh. For example: Landmark[6]: (0.36116672, 0.93204623, 0.0019629495)
I can't find a way to do that and would appreciate any help.
MediaPipe has a more complex interface than most publicly available models,
but what you're looking for is still easy to achieve.
import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_face_mesh = mp.solutions.face_mesh
file_list = ['test.png']
# For static images:
drawing_spec = mp_drawing.DrawingSpec(thickness=1, circle_radius=1)
with mp_face_mesh.FaceMesh(
        static_image_mode=True,
        min_detection_confidence=0.5) as face_mesh:
    for idx, file in enumerate(file_list):
        image = cv2.imread(file)
        # Convert the BGR image to RGB before processing.
        results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        # Print and draw face mesh landmarks on the image.
        if not results.multi_face_landmarks:
            continue
        annotated_image = image.copy()
        for face_landmarks in results.multi_face_landmarks:
            print('face_landmarks:', face_landmarks)
            # Newer MediaPipe releases replaced FACE_CONNECTIONS with
            # FACEMESH_TESSELATION and FACEMESH_CONTOURS.
            mp_drawing.draw_landmarks(
                image=annotated_image,
                landmark_list=face_landmarks,
                connections=mp_face_mesh.FACE_CONNECTIONS,
                landmark_drawing_spec=drawing_spec,
                connection_drawing_spec=drawing_spec)
In this example, which is taken from here, you can see that they iterate through results.multi_face_landmarks:
for face_landmarks in results.multi_face_landmarks:
Each element holds the information for one face detected in the image, so the length of results.multi_face_landmarks is the number of detected faces.
When you list the attributes of, say, the first face, you'll see 'landmark' as the last attribute.
dir(results.multi_face_landmarks[0])
>> ..., 'landmark']
We need the landmark attribute to get to the coordinates, which are one step further down.
The landmark attribute has length 468, which is the number of [x,y,z] keypoints the model predicts.
If we take first keypoint:
results.multi_face_landmarks[0].landmark[0]
it will give us normalized [x,y,z] values:
x: 0.25341567397117615
y: 0.71121746301651
z: -0.03244325891137123
Finally, x, y and z are attributes of each keypoint. We can check that by calling dir() on a keypoint.
Now you can easily access the normalized coordinates:
results.multi_face_landmarks[0].landmark[0].x -> X coordinate
results.multi_face_landmarks[0].landmark[0].y -> Y coordinate
results.multi_face_landmarks[0].landmark[0].z -> Z coordinate
To convert the normalized values to pixel coordinates, multiply the x coordinate by the image width and the y coordinate by the image height.
Sample code:
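for face in results.multi_face_landmarks:
    for landmark in face.landmark:
        x = landmark.x
        y = landmark.y
        shape = image.shape
        relative_x = int(x * shape[1])
        relative_y = int(y * shape[0])
        cv2.circle(image, (relative_x, relative_y), radius=1, color=(225, 0, 100), thickness=1)
# cv2_imshow is the Google Colab helper (from google.colab.patches import cv2_imshow);
# outside Colab, use cv2.imshow('image', image) followed by cv2.waitKey(0).
cv2_imshow(image)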
Which would give us:
Click to see result image
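To get the list asked for in the question, the same loop can collect the coordinates instead of drawing them. The sketch below is a minimal example, not part of MediaPipe: the helper name landmarks_to_pixels is made up for illustration, and it only assumes that each landmark exposes x, y and z attributes, as the objects in face.landmark do. Stand-in objects are used so the snippet runs without MediaPipe installed.

```python
from types import SimpleNamespace


def landmarks_to_pixels(landmarks, image_width, image_height):
    """Return (index, x_px, y_px, z) tuples for normalized landmarks."""
    points = []
    for index, landmark in enumerate(landmarks):
        x_px = int(landmark.x * image_width)
        y_px = int(landmark.y * image_height)
        points.append((index, x_px, y_px, landmark.z))
    return points


# Stand-in landmarks; with MediaPipe you would pass
# results.multi_face_landmarks[0].landmark instead.
fake_landmarks = [
    SimpleNamespace(x=0.25, y=0.5, z=-0.03),
    SimpleNamespace(x=0.75, y=0.1, z=0.01),
]
pixel_points = landmarks_to_pixels(fake_landmarks, image_width=640, image_height=480)
for index, x_px, y_px, z in pixel_points:
    print(f"Landmark[{index}]: ({x_px}, {y_px}, {z})")
```

The z value is left as returned by the model, since the answer above only describes how to rescale x and y.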
Here is a full explanation -
Face Mesh MediaPipe
import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_face_mesh = mp.solutions.face_mesh
# For static images:
file_list = ['test.png']
drawing_spec = mp_drawing.DrawingSpec(thickness=1, circle_radius=1)
with mp_face_mesh.FaceMesh(
        static_image_mode=True,
        max_num_faces=1,
        min_detection_confidence=0.5) as face_mesh:
    for idx, file in enumerate(file_list):
        image = cv2.imread(file)
        # Convert the BGR image to RGB before processing.
        results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        # Print and draw face mesh landmarks on the image.
        if not results.multi_face_landmarks:
            continue
        annotated_image = image.copy()
        for face_landmarks in results.multi_face_landmarks:
            print('face_landmarks:', face_landmarks)
Let's work with this particular image.
Once the image is loaded, we first instantiate the MediaPipe solution
face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=2,
min_detection_confidence=0.5)
and detect all faces via process, which expects an RGB image:
results = face_mesh.process(image_input)
To access all the landmarks of this particular face, we can iterate through its landmark list:
ls_single_face = results.multi_face_landmarks[0].landmark
for idx in ls_single_face:
    print(idx.x, idx.y, idx.z)
This outputs the x, y, and z coordinates:
0.6062703132629395 0.34374159574508667 -0.02611529268324375
0.6024502515792847 0.3223230540752411 -0.05503281578421593
0.6047719717025757 0.32883960008621216 -0.029224306344985962
0.5947933793067932 0.29429933428764343 -0.04156317934393883
0.6020699143409729 0.31391528248786926 -0.058685336261987686
0.6023058295249939 0.3025013208389282 -0.054952703416347504
The full code is as below
import cv2
import mediapipe as mp
dframe = cv2.imread("detect_face/person.png")
# OpenCV loads images as BGR; MediaPipe expects RGB.
image_input = cv2.cvtColor(dframe, cv2.COLOR_BGR2RGB)
face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=2,
                                            min_detection_confidence=0.5)
image_rows, image_cols, _ = dframe.shape
# image_input is already RGB, so it must not be converted a second time.
results = face_mesh.process(image_input)
ls_single_face = results.multi_face_landmarks[0].landmark
for idx in ls_single_face:
    print(idx.x, idx.y, idx.z)
Using a similar strategy, we can plot a marker for each face landmark by iterating over the coordinates.
from mediapipe.python.solutions.drawing_utils import _normalized_to_pixel_coordinates
ls_single_face = results.multi_face_landmarks[0].landmark
for idx in ls_single_face:
    cord = _normalized_to_pixel_coordinates(idx.x, idx.y, image_cols, image_rows)
    # The helper returns None for points that fall outside the image.
    if cord is not None:
        cv2.putText(image_input, '.', cord, cv2.FONT_HERSHEY_SIMPLEX, 0.3, (0, 0, 255), 2)
Which will output
The original image was retrieved from this link.
MediaPipe also has a built-in approach to detect key face regions, as discussed here.
MediaPipe's landmark values are normalized by the width and height of the image. After getting a landmark, simply multiply its x by the width of your image and its y by the height of your image.
You may check this link for a complete tutorial on MediaPipe. It is still being written but should be completed very soon.
To print the coordinates of the landmarks, you have to check that they
exist; after that you can access the x, y and z coordinates. The code for landmark 0 is:
# inside the capture loop
if results.multi_face_landmarks:
    # Index the first detected face, then its first landmark.
    coord = results.multi_face_landmarks[0].landmark[0]
    print(''.join(['(', str(coord.x), ',', str(coord.y), ',', str(coord.z), ')']))
Related
I took an iPhone video of my computer monitor with a chessboard on it. During the video I did not change any of the camera settings, just simply moved my phone around.
From the video, I saved two screenshots where the grid was fully in view:
I calibrated both images using the code below and I got two very different sets of distortion coefficients... why are they different if it's the exact same camera?
import cv2
import numpy as np
import os
import glob
# Define the dimensions of checkerboard
CHECKERBOARD = (15,22)
# stop the iteration when specified
# accuracy, epsilon, is reached or
# specified number of iterations are completed.
criteria = (cv2.TERM_CRITERIA_EPS +
cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# Vector for 3D points
threedpoints = []
# Vector for 2D points
twodpoints = []
# 3D points real world coordinates
objectp3d = np.zeros((1, CHECKERBOARD[0]
* CHECKERBOARD[1],
3), np.float32)
objectp3d[0, :, :2] = np.mgrid[0:CHECKERBOARD[0],
0:CHECKERBOARD[1]].T.reshape(-1, 2)
prev_img_shape = None
# Extracting path of individual image stored
# in a given directory. Since no path is
# specified, it will take current directory
# jpg files alone
images = glob.glob('*.jpg')
print(images)
for filename in images:
    image = cv2.imread(filename)
    grayColor = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Find the chess board corners
    # If desired number of corners are
    # found in the image then ret = true
    ret, corners = cv2.findChessboardCorners(
        grayColor, CHECKERBOARD,
        cv2.CALIB_CB_ADAPTIVE_THRESH
        + cv2.CALIB_CB_FAST_CHECK +
        cv2.CALIB_CB_NORMALIZE_IMAGE)
    print("return: " + ret.__str__())
    # If desired number of corners can be detected then,
    # refine the pixel coordinates and display
    # them on the images of checker board
    if ret == True:
        threedpoints.append(objectp3d)
        # Refining pixel coordinates
        # for given 2d points.
        corners2 = cv2.cornerSubPix(
            grayColor, corners, (11, 11), (-1, -1), criteria)
        twodpoints.append(corners2)
        # Draw and display the corners
        image = cv2.drawChessboardCorners(image,
                                          CHECKERBOARD,
                                          corners2, ret)
    cv2.imshow('img', image)
    cv2.waitKey(0)
cv2.destroyAllWindows()
h, w = image.shape[:2]
# Perform camera calibration by
# passing the value of above found out 3D points (threedpoints)
# and its corresponding pixel coordinates of the
# detected corners (twodpoints)
ret, matrix, distortion, r_vecs, t_vecs = cv2.calibrateCamera(
threedpoints, twodpoints, grayColor.shape[::-1], None, None)
# Displaying required output
print(" Camera matrix:")
print(matrix)
print("\n Distortion coefficient:")
print(distortion)
Distortion Coefficients for images 1 and 2 respectively:
Distortion coefficient:
[[ 1.15092474e-01 2.51065895e+00 2.16077891e-03 4.76654910e-03
-3.40419245e+01]]
Distortion coefficient:
[[ 2.50995880e-01 -6.98047707e+00 1.14468356e-03 -1.10525114e-02
1.43212364e+02]]
I'm a novice at OpenCV and I'm currently following this tutorial on image alignment. I have the following image and template for testing:
scanned image(test_image.jpg):
template image(template.jpg):
and the following python code:
from __future__ import print_function
import cv2
import numpy as np
MAX_FEATURES = 500
GOOD_MATCH_PERCENT = 0.15
def alignImages(im1, im2):
    # Convert images to grayscale
    im1Gray = cv2.cvtColor(im1, cv2.COLOR_BGR2GRAY)
    im2Gray = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)
    # Detect ORB features and compute descriptors.
    orb = cv2.ORB_create(MAX_FEATURES)
    keypoints1, descriptors1 = orb.detectAndCompute(im1Gray, None)
    keypoints2, descriptors2 = orb.detectAndCompute(im2Gray, None)
    # Match features.
    matcher = cv2.DescriptorMatcher_create(
        cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_HAMMING)
    matches = list(matcher.match(descriptors1, descriptors2, None))
    # Sort matches by score
    matches.sort(key=lambda x: x.distance, reverse=False)
    # Remove not so good matches
    numGoodMatches = int(len(matches) * GOOD_MATCH_PERCENT)
    matches = matches[:numGoodMatches]
    # Draw top matches
    imMatches = cv2.drawMatches(im1, keypoints1, im2, keypoints2, matches, None)
    cv2.imwrite("matches.jpg", imMatches)
    # Extract location of good matches
    points1 = np.zeros((len(matches), 2), dtype=np.float32)
    points2 = np.zeros((len(matches), 2), dtype=np.float32)
    for i, match in enumerate(matches):
        points1[i, :] = keypoints1[match.queryIdx].pt
        points2[i, :] = keypoints2[match.trainIdx].pt
    # Find homography
    h, mask = cv2.findHomography(points1, points2, cv2.RANSAC)
    # Use homography
    height, width, channels = im2.shape
    im1Reg = cv2.warpPerspective(im1, h, (width, height))
    return im1Reg, h
if __name__ == '__main__':
    # Read reference image
    refFilename = "template.jpg"
    print("Reading reference image : ", refFilename)
    imReference = cv2.imread(refFilename, cv2.IMREAD_COLOR)
    # Read image to be aligned
    imFilename = "test_image.jpg"
    print("Reading image to align : ", imFilename)
    im = cv2.imread(imFilename, cv2.IMREAD_COLOR)
    print("Aligning images ...")
    # Registered image will be stored in imReg.
    # The estimated homography will be stored in h.
    imReg, h = alignImages(im, imReference)
    # Write aligned image to disk.
    outFilename = "aligned.jpg"
    print("Saving aligned image : ", outFilename)
    cv2.imwrite(outFilename, imReg)
    # Print estimated homography
    print("Estimated homography : \n", h)
I got the following results after running the script:
matches.jpg:
UPDATE:
I was able to get the aligned image when I increased the number of ORB features to 2000
aligned.jpg
But the homography is still not rotating the image. How can I rotate the image to the same position as the template?
There are two ways of finding a homography (forward and backward), but if you have already found the homography, it can be applied without OpenCV as follows:
import numpy as np
from scipy.interpolate import griddata
# creating the homogeneous coordinates
src_h, src_w, _ = src_image.shape
values = np.reshape(src_image, (-1, 3), order='F')
yy, xx = np.meshgrid(np.arange(src_h), np.arange(src_w))
input_flat = np.concatenate((xx.reshape((1, -1)), yy.reshape((1, -1)), np.ones_like(xx.reshape((1, -1)))), axis=0)
# applying the homography and converting back from homogeneous coordinates
points = np.matmul(homography, input_flat)
points_homogeneous = points[0:2, :] / points[2, :]
# interpolating the result to nicely fit the grid coordinates
dst_image_shape = [400, 400] # could be any number here
yy, xx = np.meshgrid(np.arange(dst_image_shape[1]), np.arange(dst_image_shape[0]))
src_image_warp = griddata(np.transpose(points_homogeneous), values, (yy, xx), method='linear')
# fill pixels with no source data, clip to the valid range and convert to 8-bit
src_image_warp[np.isnan(src_image_warp)] = 0
src_image_warp[src_image_warp > 255] = 255
src_image_warp = np.uint8(src_image_warp)
Note that this is done for a 1-channel image; for an RGB image this has to be done for each channel separately. In addition, this could be made to run faster by interpolating only the relevant coordinates, since the interpolation is the most time-consuming operation.
With opencv this can be done by:
import cv2
image_dst = cv2.warpPerspective(image_src, homography, size) # size is a tuple (width, height) of the destination image
Read more on homographies and the opencv implementation here.
Finding the homography
The homography can be found without OpenCV, but that requires knowledge of linear algebra and the explanation is a bit lengthy; if needed I will post it as an edit. For any practical case, however, the homography can be found using OpenCV as follows:
homography, status = cv2.findHomography(pts_src, pts_dst)
where pts_src are coordinates in the original image and pts_dst are their matching locations in the destination image. Since you already found the point pairs, this will yield the homography (OpenCV optimizes the homography for minimal distortion in the backward operation, which is the correct way to perform homography computations).
You have a homography h calculated from findHomography and you can use warpPerspective to transform the template to have the same perspective as the photo.
Now you just need to invert the homography, and apply it to the photo instead of the template.
Either use np.linalg.inv for that, or pass the WARP_INVERSE_MAP flag to warpPerspective instead.
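As a small illustration of the inversion step, the sketch below uses only NumPy. The matrix H is a made-up example (a 90-degree rotation plus a shift), not the homography from the question, and apply_homography is a hypothetical helper. It shows that mapping a point with H and then with np.linalg.inv(H) returns the original point.

```python
import numpy as np


def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = point
    mapped = H @ np.array([x, y, 1.0])
    return mapped[:2] / mapped[2]


# Hypothetical homography: rotate by 90 degrees, then shift x by 100.
H = np.array([[0.0, -1.0, 100.0],
              [1.0,  0.0,   0.0],
              [0.0,  0.0,   1.0]])
H_inv = np.linalg.inv(H)

original = (10.0, 20.0)
forward = apply_homography(H, original)       # forward mapping
restored = apply_homography(H_inv, forward)   # the inverse mapping undoes it
print(forward, restored)
```

With OpenCV, passing H_inv to cv2.warpPerspective, or passing H together with the cv2.WARP_INVERSE_MAP flag, applies the same inverse mapping to a whole image.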
I have a problem with MediaPipe coordinates. What I want to do is crop the box of the detected face.
https://google.github.io/mediapipe/solutions/face_detection.html
EXAMPLE OF PROCEDURE
And i use this code below:
mp_face_detection = mp.solutions.face_detection
# Setup the face detection function.
face_detection = mp_face_detection.FaceDetection(model_selection=0, min_detection_confidence=0.5)
# Initialize the mediapipe drawing class.
mp_drawing = mp.solutions.drawing_utils
# Read an image from the specified path.
sample_img = cv2.imread('12345.jpg')
# Specify a size of the figure.
plt.figure(figsize = [10, 10])
# Display the sample image, also convert BGR to RGB for display.
plt.title("Sample Image");plt.axis('off');plt.imshow(sample_img[:,:,::-1]);plt.show()
face_detection_results = face_detection.process(sample_img[:,:,::-1])
# Check if the face(s) in the image are found.
if face_detection_results.detections:
    # Iterate over the found faces.
    for face_no, face in enumerate(face_detection_results.detections):
        # Display the face number upon which we are iterating upon.
        print(f'FACE NUMBER: {face_no+1}')
        print('---------------------------------')
        # Display the face confidence.
        print(f'FACE CONFIDENCE: {round(face.score[0], 2)}')
        # Get the face bounding box and face key points coordinates.
        face_data = face.location_data
        # Display the face bounding box coordinates.
        print(f'\nFACE BOUNDING BOX:\n{face_data.relative_bounding_box}')
        # Iterate two times as we only want to display first two key points of each detected face.
        for i in range(2):
            # Display the found normalized key points.
            print(f'{mp_face_detection.FaceKeyPoint(i).name}:')
            print(f'{face_data.relative_keypoints[mp_face_detection.FaceKeyPoint(i).value]}')
So the results are in this form:
FACE NUMBER: 1
FACE CONFIDENCE: 0.89
FACE BOUNDING BOX:
xmin: 0.2784463167190552
ymin: 0.3503175973892212
width: 0.1538110375404358
height: 0.23071599006652832
RIGHT_EYE:
x: 0.3447018265724182
y: 0.4222590923309326
LEFT_EYE:
x: 0.39114508032798767
y: 0.3888365626335144
And I want to CROP the image to the coordinates of the BOX.
Like
face = Image.fromarray(image).crop(face_rect)
or any other crop procedure.
My problem is that I can't get the coordinates of the detected item from MediaPipe.
Any ideas?
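Got the solution, guys:
import numpy as np
from PIL import Image
# Normalized bounding box of the first detected face (from the code in the question).
data = face_detection_results.detections[0].location_data.relative_bounding_box
h, w, c = sample_img.shape
print('width: ', w)
print('height: ', h)
# Scale the normalized box to pixel coordinates.
xleft = int(data.xmin * w)
xtop = int(data.ymin * h)
xright = int(data.width * w + xleft)
xbottom = int(data.height * h + xtop)
detected_faces = [(xleft, xtop, xright, xbottom)]
# image_c is assumed to be the RGB version of sample_img.
image_c = cv2.cvtColor(sample_img, cv2.COLOR_BGR2RGB)
for n, face_rect in enumerate(detected_faces):
    face = Image.fromarray(image_c).crop(face_rect)
    face_np = np.asarray(face)
    plt.imshow(face_np)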
Assume the objective is to crop a single face detected by MediaPipe. Note the [0], which indicates that we are only interested in a single face:
results = mp_face.process(image_input)
detection = results.detections[0]
By default MediaPipe returns detection data in normalized form, so we have to convert it to the original size by multiplying x values by the width and y values by the height of the input image.
We can employ the _normalized_to_pixel_coordinates helper that ships with MediaPipe:
location = detection.location_data
relative_bounding_box = location.relative_bounding_box
rect_start_point = _normalized_to_pixel_coordinates(
    relative_bounding_box.xmin, relative_bounding_box.ymin, image_cols,
    image_rows)
rect_end_point = _normalized_to_pixel_coordinates(
    relative_bounding_box.xmin + relative_bounding_box.width,
    relative_bounding_box.ymin + relative_bounding_box.height, image_cols,
    image_rows)
This essentially produces
xleft, ytop = rect_start_point
xright, ybot = rect_end_point
In other words, ytop, ybot, xleft, and xright represent face_top, face_bottom, face_left, and face_right, respectively.
Since the image is simply a 3D NumPy array, we can crop it as below
crop_img = image_input[ytop: ybot, xleft: xright]
The complete code is as below
import cv2
import mediapipe as mp
from mediapipe.python.solutions.drawing_utils import _normalized_to_pixel_coordinates
# load face detection model
mp_face = mp.solutions.face_detection.FaceDetection(
model_selection=1, # model selection
min_detection_confidence=0.5 # confidence threshold
)
dframe = cv2.imread('xx.png')  # read as 3-channel BGR; a grayscale read (flag 0) would break the shape unpacking below
image_rows, image_cols, _ = dframe.shape
image_input = cv2.cvtColor(dframe, cv2.COLOR_BGR2RGB)
results = mp_face.process(image_input)
detection=results.detections[0]
location = detection.location_data
relative_bounding_box = location.relative_bounding_box
rect_start_point = _normalized_to_pixel_coordinates(
relative_bounding_box.xmin, relative_bounding_box.ymin, image_cols,
image_rows)
rect_end_point = _normalized_to_pixel_coordinates(
relative_bounding_box.xmin + relative_bounding_box.width,
relative_bounding_box.ymin + relative_bounding_box.height, image_cols,
image_rows)
## Lets draw a bounding box
color = (255, 0, 0)
thickness = 2
cv2.rectangle(image_input, rect_start_point, rect_end_point, color, thickness)
xleft,ytop=rect_start_point
xright,ybot=rect_end_point
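crop_img = image_input[ytop: ybot, xleft: xright]
# cv2.imwrite expects BGR, so convert the RGB crop back before saving.
cv2.imwrite('crop_image0.jpg', cv2.cvtColor(crop_img, cv2.COLOR_RGB2BGR))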
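_normalized_to_pixel_coordinates returns None when a point lies outside the [0, 1] range, which can happen when a face touches the image border. As a hedged alternative, the sketch below does the same conversion by hand and clamps the box to the image. The function name relative_bbox_to_pixels and the 1000x800 image size are illustrative assumptions; the box values are the ones printed in the question.

```python
def relative_bbox_to_pixels(xmin, ymin, width, height, image_cols, image_rows):
    """Convert a normalized bounding box to clamped pixel corners.

    Returns (xleft, ytop, xright, ybot), suitable for
    image[ytop:ybot, xleft:xright].
    """
    xleft = max(0, int(xmin * image_cols))
    ytop = max(0, int(ymin * image_rows))
    xright = min(image_cols, int((xmin + width) * image_cols))
    ybot = min(image_rows, int((ymin + height) * image_rows))
    return xleft, ytop, xright, ybot


# Box values taken from the question's output; the image size is made up.
box = relative_bbox_to_pixels(
    xmin=0.2784463167190552, ymin=0.3503175973892212,
    width=0.1538110375404358, height=0.23071599006652832,
    image_cols=1000, image_rows=800)
print(box)

# A box that spills over the right and bottom edges is clamped to the image.
clamped = relative_bbox_to_pixels(0.9, 0.9, 0.2, 0.2, image_cols=1000, image_rows=800)
print(clamped)
```

The returned tuple can be unpacked into xleft, ytop, xright, ybot and used for the NumPy slice shown above.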
In the image below, we see a defined world plane with coordinates (X,Y,0), i.e. Z=0. As we can see, the camera is facing the defined world plane.
The world reference point (0,0,0) is located at the top left of the grid. The distance between every two yellow points is 40 cm.
I've calibrated my camera using the checkerboard and then used the built-in function cv2.solvePnP in order to estimate the rotation and translation vector of the camera with respect to my defined world coordinates. The results are as follows:
tvec_cam= [[-5.47884374]
[-3.08581371]
[24.15112048]]
rvec_cam= [[-0.02823308]
[ 0.08623225]
[ 0.01563199]]
According to the results, the (tx,ty,tz) seems to be right as the camera is located in the negative quarter of X,Y world-coordinates
However, I'm confused about how to interpret the rotation vector.
Does the resulting rotation vector say that the camera axes are almost aligned with the world axes (meaning almost no rotation)?
If yes, how could this be true? According to OpenCV's camera coordinates, the Z-axis of the camera points towards the scene (that is, towards the world plane), the X-axis points towards the image right (opposite to the world X-axis) and the Y-axis points towards the image bottom (also opposite to the world Y-axis).
Moreover, what is the unit of tvec?
Note: I've illustrated the orientation of the defined world-coordinate axes according to the result of the translation vector (both tx and ty are negative).
the code i used for computing the rotation and translation vectors is shown below:
import cv2 as cv
import numpy as np
WPoints = np.zeros((9*3,3), np.float64)
WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
imPoints=np.array([[20,143],[90,143],[161,143],[231,144],[303,144],
[374,144],[446,145],[516,146],[587,147],[18,214],[88,214]
,[159,215],[230,215],[302,216],[374,216],[446,216]
,[517,217],[588,217],[16,285],[87,285],[158,286],[229,287],
[301,288]
,[374,289],[446,289],[518,289],[589,289]],dtype=np.float64)
# load the camera matrix [[4.38073915e+03 0.00000000e+00 1.00593352e+03]
# [0.00000000e+00 4.37829226e+03 6.97020491e+02]
# [0.00000000e+00 0.00000000e+00 1.00000000e+00]]
with np.load('parameters_cam1.npz') as X:
    mtx, dist, _, _ = [X[i] for i in ('mtx','dist','rvecs','tvecs')]
ret,rvecs, tvecs = cv.solvePnP(WPoints, imPoints, mtx, dist)
np.savez("extrincic_camera1.npz",rvecs=rvecs,tvecs=tvecs)
print(tvecs)
print(rvecs)
cv.destroyAllWindows()
The code for estimating the intrinsics is shown below
import numpy as np
import cv2
import glob
import argparse
import pathlib
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--path", required=True, help="path to images folder")
ap.add_argument("-e", "--file_extension", required=False, default=".jpg",
help="extension of images")
args = vars(ap.parse_args())
path = args["path"] + "*" + args["file_extension"]
# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# prepare object points, like (0,0,0), (0.03,0,0), (0.06,0,0) ....,
#(0.18,0.12,0)
objp = np.zeros((5*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:5].T.reshape(-1,2)*0.03
#print(objp)
# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.
#images = glob.glob('left/*.jpg') #read a series of images
images = glob.glob(path)
path = 'foundContours'
#pathlib.Path(path).mkdir(parents=True, exist_ok=True)
found = 0
for fname in images:
    img = cv2.imread(fname) # Capture frame-by-frame
    #print(images[im_i])
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Find the chess board corners
    ret, corners = cv2.findChessboardCorners(gray, (7,5), None)
    # print(corners)
    # If found, add object points, image points (after refining them)
    if ret == True:
        print('true')
        objpoints.append(objp) # Certainly, every loop objp is the same, in 3D.
        #print('obj_point',objpoints)
        corners2 = cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)
        # print(corners2)
        imgpoints.append(corners2)
        print(imgpoints)
        print('first_point',imgpoints[0])
        #print(imgpoints.shape())
        # Draw and display the corners
        img = cv2.drawChessboardCorners(img, (7,5), corners2, ret)
        found += 1
        cv2.imshow('img', img)
        cv2.waitKey(1000)
        # if you want to save images with detected corners
        # uncomment following 2 lines and lines 5, 18 and 19
        image_name = path + '/calibresult' + str(found) + '.jpg'
        cv2.imwrite(image_name, img)
print("Number of images used for calibration: ", found)
# When everything done, release the capture
# cap.release()
cv2.destroyAllWindows()
#calibration
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints,
gray.shape[::-1],None,None)
#save parameters needed in undistortion
np.savez("parameters_cam1.npz",mtx=mtx,dist=dist,rvecs=rvecs,tvecs=tvecs)
np.savez("points_cam1.npz",objpoints=objpoints,imgpoints=imgpoints)
print ("Camera Matrix = |fx 0 cx|")
print (" | 0 fy cy|")
print (" | 0 0 1|")
print (mtx)
print('distortion coefficients=\n', dist)
print('rotation vector for each image=', *rvecs, sep = "\n")
print('translation vector for each image=', *tvecs, sep= "\n")
Hope you could help me understanding this
Thanks in Advance
First, rvec is in axis-angle representation (https://en.wikipedia.org/wiki/Axis%E2%80%93angle_representation).
You can obtain the rotation matrix using cv2.Rodrigues(). For your data, I get almost the identity:
[[ 0.99616253 -0.01682635 0.08588995]
[ 0.01439347 0.99947963 0.02886672]
[-0.08633098 -0.02751969 0.99588635]]
Now, according to the directions of x and y in your picture, the z-axis points downwards (apply carefully the right-hand rule). This explains why the z-axis of the camera is almost aligned with the z-axis of your world reference frame.
Edit: Digging a little bit further, from the code you posted:
WPoints = np.zeros((9*3,3), np.float64)
WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
The values for X and Y are all positive and increase to the right and downwards respectively, so you are indeed using the usual convention; only the arrows you drew in the picture are wrong.
Edit Concerning the interpretation of the translation vector: in the OpenCV convention, the points in the local camera reference frame are obtained from the points in the world reference frame like this:
|x_cam|          |x_world|
|y_cam| = Rmat * |y_world| + tvec
|z_cam|          |z_world|
With this convention, tvec is the position of the world origin in the camera reference frame. What's more easily interpretable is the position of the camera origin in the world reference frame, which can be obtained by setting the camera-frame point to zero and solving for the world point:
cam_center = -(R_inv * tvec)
Where R_inv is the inverse of the rotation matrix (equal to its transpose) and * is a matrix-vector product. Here the rotation matrix is almost the identity, so a quick approximation would be -tvec, which is (5.4, 3.1, -24.1).
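The two relations above can be checked numerically. The sketch below is a minimal example using only NumPy: it rebuilds the rotation matrix from rvec_cam with the Rodrigues formula (standing in for cv2.Rodrigues) and then evaluates the camera centre as the inverse rotation applied to -tvec. The helper name rodrigues is an assumption for illustration; the input vectors are the ones from the question.

```python
import numpy as np


def rodrigues(rvec):
    """Convert an axis-angle vector to a 3x3 rotation matrix (Rodrigues formula)."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    kx, ky, kz = rvec / theta
    # Skew-symmetric cross-product matrix of the unit rotation axis.
    K = np.array([[0.0, -kz, ky],
                  [kz, 0.0, -kx],
                  [-ky, kx, 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


rvec_cam = np.array([-0.02823308, 0.08623225, 0.01563199])
tvec_cam = np.array([-5.47884374, -3.08581371, 24.15112048])

R = rodrigues(rvec_cam)
# x_cam = R @ x_world + tvec, so the camera origin (x_cam = 0) sits at:
cam_center = -R.T @ tvec_cam
print(R)
print(cam_center)
```

With these inputs the exact centre differs somewhat from the quick -tvec approximation, because even a small rotation is multiplied by the large z component of tvec.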
I have a set of arbitrary images. Half the images are pictures, half are masks defining ROIS.
In the current version of my program I use the ROI to crop the image (i.e I extract the rectangle in the image matching the bounding box of the ROI mask). The problem is, the ROI mask isn't perfect and it's better to over predict than under predict in my case.
So I want to copy more than the ROI rectangle, but if I do this, I may be trying to crop out of the image.
i.e:
x, y, w, h = cv2.boundingRect(mask_contour)
img = img[int(y-h*0.05):int(y + h * 1.05), int(x-w*0.05):int(x + w * 1.05)]
can fail because it tries to access out of bounds pixels. I could just clamp the values, but I wanted to know if there is a better approach
You can add a border using OpenCV
import cv2 as cv
import random
src = cv.imread('/home/stephen/lenna.png')
borderType = cv.BORDER_REPLICATE
boarderSize = .5
top = int(boarderSize * src.shape[0]) # shape[0] = rows
bottom = top
left = int(boarderSize * src.shape[1]) # shape[1] = cols
right = left
value = [random.randint(0,255), random.randint(0,255), random.randint(0,255)]
dst = cv.copyMakeBorder(src, top, bottom, left, right, borderType, None, value)
cv.imshow('img', dst)
c = cv.waitKey(0)
Maybe you could clamp the coordinates beforehand. Please see the code below:
img_h, img_w = img.shape[:2]
ymin, ymax = max(0, int(y - h*0.05)), min(img_h, int(y + h*1.05))
xmin, xmax = max(0, int(x - w*0.05)), min(img_w, int(x + w*1.05))
img = img[ymin:ymax, xmin:xmax]
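The clamping idea can be wrapped in a small helper. This is a minimal sketch rather than a definitive implementation: the function name expand_and_clamp and the margin parameter are illustrative, and the image size is passed in explicitly (with OpenCV it would come from img.shape[:2]).

```python
def expand_and_clamp(x, y, w, h, img_w, img_h, margin=0.05):
    """Grow a bounding box by `margin` on each side and keep it inside the image."""
    xmin = max(0, int(x - w * margin))
    ymin = max(0, int(y - h * margin))
    xmax = min(img_w, int(x + w * (1 + margin)))
    ymax = min(img_h, int(y + h * (1 + margin)))
    return xmin, ymin, xmax, ymax


# A box well inside a 640x480 image grows on every side.
inner = expand_and_clamp(100, 100, 200, 100, img_w=640, img_h=480)
# A box spanning the full width at the top edge is clamped to the image.
edge = expand_and_clamp(0, 0, 640, 100, img_w=640, img_h=480)
print(inner, edge)
```

The result can be used directly as img[ymin:ymax, xmin:xmax]. Unlike the border approach, this never invents pixels, so the crop may be smaller than requested near the image edges.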