I am attempting to calibrate my single webcam using OpenCV's cv2 module in Python, with the cv2.findChessboardCorners and cv2.calibrateCamera functions. However, the root mean square (RMS) reprojection error returned by calibrateCamera seems very high: it is always around 50, no matter how many frames with found boards are used. I read that good values range from 0 to 1. I am using a 5x8 black-and-white checker pattern on a piece of paper taped to a wooden board. Can anyone help me get this lower?
The weird thing is that I used images rendered from Blender, a 3D modeling package, in which there is no lens distortion and the board's coordinates are known for sure, and I was able to get an RMS of 0.22, which is good. With similar code, though, I cannot replicate those results with my webcam. Perhaps I am missing something. Thanks so much to everyone who takes a look at this. Here is the complete code:
import sys
import os
import numpy as np
import cv2
import time
'''
This module finds the intrinsic parameters of the camera. These parameters include
the focal length, pixel aspect ratio, image center, and lens distortion (see wiki
entry for "camera resectioning" for more detail). It is important to note that the
parameters found by this class are independent of location and rotation of the camera.
Thus, they only need to be calculated once, assuming the lens and focus of the camera
are unaltered. The location and rotation matrix are defined by the extrinsic parameters.
'''
class Find_Intrinsics:
    '''Finds the intrinsic parameters of the camera.'''
    def __init__(self):
        #Import user input from Blender in the form of argv's
        self.rows = int(sys.argv[1])
        self.cols = int(sys.argv[2])
        self.board_width_pxls = float(sys.argv[3])
        self.pxls_per_sq_unit = float(sys.argv[4])
        self.printer_scale = float(sys.argv[5])

    def find_calib_grid_points(self, cols, rows, board_width_pxls, pxls_per_sq_unit, printer_scale):
        '''Defines the distance of the board squares from each other and scales them.

        The scaling corrects for the scaling of the printer. Most printers
        cannot print all the way to the edge of the page and thus scale images
        to fit the page. If the user does not desire to maintain real-world
        scaling, then an arbitrary distance is set. The 3rd value appended to
        calib_points signifies the depth of the points and is always zero
        because they are planar.
        '''
        #distance between adjacent corners
        point_dist = (((board_width_pxls)/(pxls_per_sq_unit))*printer_scale)/(cols+2)
        calib_points = []
        for i in range(0, cols):
            for j in range(0, rows):
                pointX = 0 + (point_dist*j)
                pointY = 0 - (point_dist*i)
                calib_points.append((pointX, pointY, 0))
        np_calib_points = np.array(calib_points, np.float32)
        return np_calib_points

    def main(self):
        print '---------------------------Finding Intrinsics----------------------------------'
        np_calib_points = self.find_calib_grid_points(self.cols, self.rows, self.board_width_pxls,
                                                      self.pxls_per_sq_unit, self.printer_scale)
        pattern_size = (self.cols, self.rows)
        obj_points = []
        img_points = []

        camera = cv2.VideoCapture(0)
        found_count = 0
        while True:
            found_cam, img = camera.read()
            h, w = img.shape[:2]
            print h, w
            gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

            found, corners = cv2.findChessboardCorners(img, pattern_size)
            if found:
                term = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_COUNT, 30, 0.1)
                cv2.cornerSubPix(gray_img, corners, (5, 5), (-1, -1), term)
                cv2.drawChessboardCorners(img, pattern_size, corners, found)
                found_count += 1
                img_points.append(corners.reshape(-1, 2))
                obj_points.append(np_calib_points)

            cv2.putText(img, 'Boards found: '+str(found_count), (30, 30),
                        cv2.FONT_HERSHEY_DUPLEX, 0.8, (0, 0, 255, 1))
            cv2.putText(img, 'Press any key when finished', (30, h-30),
                        cv2.FONT_HERSHEY_DUPLEX, 0.8, (0, 0, 255, 1))
            cv2.imshow("webcam", img)

            if cv2.waitKey(1000) != -1:
                cv2.destroyAllWindows()
                del(camera)
                np_obj_points = np.array(obj_points)
                print "Calibrating. Please be patient"
                start = time.clock()

                #OpenCV function to solve for camera matrix
                try:
                    print obj_points[0:10]
                    rms, camera_matrix, dist_coefs, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, (w, h))
                    print "RMS:", rms
                    print "camera matrix:\n", camera_matrix
                    print "distortion coefficients: ", dist_coefs

                    #Save the camera matrix and the distortion coefficients to the
                    #hard drive to use when finding the extrinsics
                    #directory = os.path.dirname(os.path.realpath('Find_Intrinsics.py'))
                    np.save('C:\\Users\\Owner\\Desktop\\3D_Scanning\\Blender_3d_Scanning\\camera_matrix.npy', camera_matrix)
                    np.save('C:\\Users\\Owner\\Desktop\\3D_Scanning\\Blender_3d_Scanning\\dist_coefs.npy', dist_coefs)

                    elapsed = (time.clock() - start)
                    print("Elapsed time: ", elapsed, " seconds")

                    img_undistort = cv2.undistort(img, camera_matrix, dist_coefs)
                    cv2.namedWindow('Undistorted Image', cv2.WINDOW_NORMAL) #WINDOW_NORMAL used bc WINDOW_AUTOSIZE does not let you resize
                    cv2.resizeWindow('Undistorted Image', w, h)
                    cv2.imshow('Undistorted Image', img_undistort)
                    cv2.waitKey(0)
                    cv2.destroyAllWindows()
                    break
                except:
                    print "\nSorry, an error has occurred. Make sure more than zero boards are found."
                    break

if __name__ == '__main__' and len(sys.argv) == 6:
    Intrinsics = Find_Intrinsics()
    Intrinsics_main = Intrinsics.main()
else:
    print "Incorrect number of args found. Make sure that the python27 filepath is entered correctly."
print '--------------------------------------------------------------------------------'
Wow, I do not know what it is about having aha moments (or doh! moments if you watch The Simpsons) right after posting on Stack Overflow, haha. This one was a bonehead mistake: I messed up the construction of the obj_points parameter for the calibrateCamera function. OpenCV takes the first point to be at the top left and goes through each row, left to right, until it comes to the last point (bottom right). As such, my find_calib_grid_points function was wrong. Here is the correct code, in case someone else gets tripped up on this:
for j in range(0, rows):
    for i in range(0, cols):
        pointX = 0 + (point_dist*i)
        pointY = 0 - (point_dist*j)
        calib_points.append((pointX, pointY, 0))
np_calib_points = np.array(calib_points, np.float32)
Thanks to everyone who looked at this and sent me psychic messages so I could figure this one out right haha. They worked! My RMS was 0.33!
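For reference, the same row-major ordering can be produced with np.mgrid, which is the construction the OpenCV tutorials use. A minimal sketch using the rows, cols, and point_dist names from above (note that here Y grows positive downwards instead of the negated Y in my loop; either convention works as long as it is consistent):

import numpy as np

# Row 0 left to right first, then row 1, etc., matching the corner
# ordering returned by cv2.findChessboardCorners.
np_calib_points = np.zeros((rows * cols, 3), np.float32)
np_calib_points[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * point_dist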
Related
I'm attempting to triangulate a 3D position using the OpenCV Python library by calibrating two cameras and using the stereoCalibrate function. I want the cameras to be attachable to the ends of a bar and to measure positions around 5 m away from the cameras. The majority of the code is similar to the code used by Temuge Batpurev (source: https://temugeb.github.io/opencv/python/2021/02/02/stereo-camera-calibration-and-triangulation.html), and when I run his calibration images through my code, the RMSE value ("ret" in his code, retStereo in my code) returned is the 2.4 Temuge expected.
When I use my own images, which are time-synced through GPS on GoPro 10s, I average around 50. Originally, I thought it was because I was calibrating too large an area, but close up I still get around 50.
Current Calibration Method:
Starting close to the cameras, without the chessboard going out of frame for either camera, I slowly wave the board around, ensuring it runs along the edges of the frames for both cameras as well as the center, before moving backwards and repeating this at various depths.
Things I have tried:
Increasing the size of chessboard squares (48mm, 61mm, 109mm)
Increasing the number of rows/columns on the chessboard
Changing lighting conditions
More calibration images; I use a script to extract frames from a video, so I normally use 20+ calibration frames
Using a smaller area to triangulate
Seeing if there was a change when having one camera at an angle as opposed to having the cameras parallel
Checking that the findChessboardCorners function actually finds the corners of the chessboard
Ensuring the chessboard is in many different positions (bottom corner, top corner, center of the frames for each camera)
Moving towards and away from the camera in videos.
Changed the criteria and criteria_stereo variables to see if that changed anything
Images of my most recent video, 109mm squares:
My code:
############### FIND CHESSBOARD CORNERS - OBJECT POINTS AND IMAGE POINTS #############################
import glob

import cv2 as cv
import numpy as np

chessboardSize = (8,5) # Other chessboard sizes used - (5,3) OR (9,6)

# Paths to the captured frames (should be in sync) (stereoLeft and stereoRight)
CALIBRATION_IMAGES_PATH_LEFT = 'images_vid\\stereoLeft\\*.png'
CALIBRATION_IMAGES_PATH_RIGHT = 'images_vid\\stereoRight\\*.png'

# termination criteria
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)

objp = np.zeros((chessboardSize[0] * chessboardSize[1], 3), np.float32) # creates 8*5 = 40 rows of (0., 0., 0.)
objp[:,:2] = np.mgrid[0:chessboardSize[0],0:chessboardSize[1]].T.reshape(-1,2) # fills each row with (column no., row no., 0.), column no. 0-7 and row no. 0-4
size_of_chessboard_squares_mm = 109
objp = objp * size_of_chessboard_squares_mm
# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpointsL = [] # 2d points in image plane.
imgpointsR = [] # 2d points in image plane.
imagesLeft = sorted(glob.glob(CALIBRATION_IMAGES_PATH_LEFT))
imagesRight = sorted(glob.glob(CALIBRATION_IMAGES_PATH_RIGHT))
for imgLeft, imgRight in zip(imagesLeft, imagesRight):
    imgL = cv.imread(imgLeft)
    imgR = cv.imread(imgRight)
    grayL = cv.cvtColor(imgL, cv.COLOR_BGR2GRAY)
    grayR = cv.cvtColor(imgR, cv.COLOR_BGR2GRAY)

    # Get the corners of the chess board
    retL, cornersL = cv.findChessboardCorners(grayL, chessboardSize, None)
    retR, cornersR = cv.findChessboardCorners(grayR, chessboardSize, None)

    # Add object points and image points if chess board corners are found
    # in both views
    if retL and retR:
        objpoints.append(objp)
        cornersL = cv.cornerSubPix(grayL, cornersL, (11,11), (-1,-1), criteria)
        imgpointsL.append(cornersL)
        cornersR = cv.cornerSubPix(grayR, cornersR, (11,11), (-1,-1), criteria)
        imgpointsR.append(cornersR)

        # Draw corners for user feedback
        cv.drawChessboardCorners(imgL, chessboardSize, cornersL, retL)
        cv.imshow('img left', imgL)
        cv.drawChessboardCorners(imgR, chessboardSize, cornersR, retR)
        cv.imshow('img right', imgR)
        cv.waitKey()

cv.destroyAllWindows()
############# CALIBRATION #######################################################
frameSize = grayL.shape[::-1] # (width, height); frameSize was not defined in the snippet as originally posted

retL, cameraMatrixL, distL, rvecsL, tvecsL = cv.calibrateCamera(objpoints, imgpointsL, frameSize, None, None)
heightL, widthL, channelsL = imgL.shape
newCameraMatrixL, roi_L = cv.getOptimalNewCameraMatrix(cameraMatrixL, distL, (widthL, heightL), 1, (widthL, heightL))

retR, cameraMatrixR, distR, rvecsR, tvecsR = cv.calibrateCamera(objpoints, imgpointsR, frameSize, None, None)
heightR, widthR, channelsR = imgR.shape
newCameraMatrixR, roi_R = cv.getOptimalNewCameraMatrix(cameraMatrixR, distR, (widthR, heightR), 1, (widthR, heightR))
######### Stereo Vision Calibration #############################################
## stereoCalibrate output: retStereo is the RMSE, newCameraMatrixL and newCameraMatrixR are the intrinsic matrices for both
## cameras, distL and distR are the distortion coefficients for both cameras, rot is the rotation matrix,
## trans is the translation vector, and essentialMatrix and fundamentalMatrix are self-descriptive
# R and T are taken from stereoCalibrate to use in triangulation
header = ['Rotation','Translation', 'ProjectionLeft', 'ProjectionRight'] # for the csv file

flags = 0
flags = cv.CALIB_FIX_INTRINSIC
criteria_stereo = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 100, 0.0001)

# Note: criteria and flags must be passed as keyword arguments; passed positionally
# they would land in the R and T slots of stereoCalibrate's signature instead
retStereo, newCameraMatrixL, distL, newCameraMatrixR, distR, rot, trans, essentialMatrix, fundamentalMatrix = cv.stereoCalibrate(objpoints, imgpointsL, imgpointsR, cameraMatrixL, distL, cameraMatrixR, distR, grayL.shape[::-1], criteria=criteria_stereo, flags=flags)

print(retStereo)
print('stereo vision done')
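For context, here is a minimal sketch of how I plan to use rot and trans for triangulation afterwards (my own construction, not from Temuge's tutorial: the projection matrices are built as K_L[I|0] and K_R[R|t], and the pixel coordinates below are placeholders):

# Projection matrices from the stereo calibration results
P1 = cameraMatrixL @ np.hstack([np.eye(3), np.zeros((3, 1))]) # left camera at the origin
P2 = cameraMatrixR @ np.hstack([rot, trans])                  # right camera posed by R, T

# Matching pixel coordinates from both views, shape (2, N); placeholder values
uvsL = np.array([[640.0], [360.0]])
uvsR = np.array([[600.0], [362.0]])

pts4d = cv.triangulatePoints(P1, P2, uvsL, uvsR) # homogeneous, shape (4, N)
pts3d = (pts4d[:3] / pts4d[3]).T                 # shape (N, 3), in mm here
print(pts3d)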
Are there any immediate red flags in my code or my calibration method? Or recommendations for improving the code? Thanks for taking the time to help me :)
In the image below, we see a defined world plane coordinate system (X, Y, 0) where Z = 0. The camera, as we can see, is heading towards the defined world plane.
The world reference point is located at the top left of the grid (0, 0, 0). The distance between every two yellow points is 40 cm.
I've calibrated my camera using the checkerboard and then used the built-in function cv2.solvePnP in order to estimate the rotation and translation vectors of the camera with respect to my defined world coordinates. The results are as follows:
tvec_cam= [[-5.47884374]
[-3.08581371]
[24.15112048]]
rvec_cam= [[-0.02823308]
[ 0.08623225]
[ 0.01563199]]
According to the results, (tx, ty, tz) seems to be right, as the camera is located in the negative quarter of the X,Y world coordinates.
However, I'm getting confused by the interpretation of the rotation vector.
Does the resulting rotation vector say that the camera coordinates are almost aligned with the world coordinate axes (meaning almost no rotation)?
If yes, how could this be true? According to OpenCV's camera coordinates, the Z-axis of the camera points towards the scene (which means towards the world plane), the X-axis points towards the image right (which means opposite to the world X-axis), and the Y-axis of the camera points towards the image bottom (which also means opposite to the world Y-axis).
Moreover, what is the unit of tvec?
Note: I've illustrated the orientation of the defined world-coordinate axes according to the result of the translation vector (both tx and ty are negative).
The code I used for computing the rotation and translation vectors is shown below:
import cv2 as cv
import numpy as np
WPoints = np.zeros((9*3,3), np.float64)
WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
imPoints = np.array([[ 20,143],[ 90,143],[161,143],[231,144],[303,144],[374,144],[446,145],[516,146],[587,147],
                     [ 18,214],[ 88,214],[159,215],[230,215],[302,216],[374,216],[446,216],[517,217],[588,217],
                     [ 16,285],[ 87,285],[158,286],[229,287],[301,288],[374,289],[446,289],[518,289],[589,289]],
                    dtype=np.float64)
# load the camera (intrinsic) matrix, which is
# [[4.38073915e+03 0.00000000e+00 1.00593352e+03]
#  [0.00000000e+00 4.37829226e+03 6.97020491e+02]
#  [0.00000000e+00 0.00000000e+00 1.00000000e+00]]
with np.load('parameters_cam1.npz') as X:
    mtx, dist, _, _ = [X[i] for i in ('mtx','dist','rvecs','tvecs')]
ret,rvecs, tvecs = cv.solvePnP(WPoints, imPoints, mtx, dist)
np.savez("extrincic_camera1.npz",rvecs=rvecs,tvecs=tvecs)
print(tvecs)
print(rvecs)
cv.destroyAllWindows()
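One way to sanity-check the recovered pose (not part of my original script) is to reproject the world points with cv.projectPoints and compare them against the measured image points; a minimal sketch reusing the names above:

# Reproject the 3D world points with the recovered pose and compare
# against the measured pixel coordinates.
projected, _ = cv.projectPoints(WPoints, rvecs, tvecs, mtx, dist)
err = np.linalg.norm(projected.reshape(-1, 2) - imPoints, axis=1)
print('mean reprojection error (px):', err.mean())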
The code for estimating the intrinsics is shown below:
import numpy as np
import cv2
import glob
import argparse
import pathlib
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--path", required=True, help="path to images folder")
ap.add_argument("-e", "--file_extension", required=False, default=".jpg",
help="extension of images")
args = vars(ap.parse_args())
path = args["path"] + "*" + args["file_extension"]
# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# prepare object points, like (0,0,0), (0.03,0,0), (0.06,0,0) ....,
#(0.18,0.12,0)
objp = np.zeros((5*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:5].T.reshape(-1,2)*0.03
#print(objp)
# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.
#images = glob.glob('left/*.jpg') #read a series of images
images = glob.glob(path)
path = 'foundContours'
#pathlib.Path(path).mkdir(parents=True, exist_ok=True)
found = 0
for fname in images:
    img = cv2.imread(fname)  # read each calibration image
    #print(images[im_i])
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Find the chess board corners
    ret, corners = cv2.findChessboardCorners(gray, (7,5), None)
    # print(corners)

    # If found, add object points, image points (after refining them)
    if ret == True:
        print('true')
        objpoints.append(objp) # Certainly, every loop objp is the same, in 3D.
        #print('obj_point',objpoints)
        corners2 = cv2.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria)
        # print(corners2)
        imgpoints.append(corners2)
        print(imgpoints)
        print('first_point', imgpoints[0])
        #print(imgpoints.shape())

        # Draw and display the corners
        img = cv2.drawChessboardCorners(img, (7,5), corners2, ret)
        found += 1
        cv2.imshow('img', img)
        cv2.waitKey(1000)

        # if you want to save images with detected corners
        # uncomment following 2 lines and lines 5, 18 and 19
        image_name = path + '/calibresult' + str(found) + '.jpg'
        cv2.imwrite(image_name, img)
print("Number of images used for calibration: ", found)
# When everything done, release the capture
# cap.release()
cv2.destroyAllWindows()
#calibration
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints,
gray.shape[::-1],None,None)
#save parameters needed in undistortion
np.savez("parameters_cam1.npz",mtx=mtx,dist=dist,rvecs=rvecs,tvecs=tvecs)
np.savez("points_cam1.npz",objpoints=objpoints,imgpoints=imgpoints)
print ("Camera Matrix = |fx 0 cx|")
print (" | 0 fy cy|")
print (" | 0 0 1|")
print (mtx)
print('distortion coefficients=\n', dist)
print('rotation vector for each image=', *rvecs, sep = "\n")
print('translation vector for each image=', *tvecs, sep= "\n")
Hope you can help me understand this.
Thanks in advance.
First, rvec (not tvec) is in axis-angle representation (https://en.wikipedia.org/wiki/Axis%E2%80%93angle_representation).
You can obtain the rotation matrix using cv2.Rodrigues(). For your data, I get almost the identity:
[[ 0.99616253 -0.01682635 0.08588995]
[ 0.01439347 0.99947963 0.02886672]
[-0.08633098 -0.02751969 0.99588635]]
Now, according to the directions of x and y in your picture, the z-axis points downwards (apply the right-hand rule carefully). This explains why the z-axis of the camera is almost aligned with the z-axis of your world reference frame.
Edit: Digging a little bit further, from the code you posted:
WPoints = np.zeros((9*3,3), np.float64)
WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
the values for X and Y are all positive and increase to the right and downwards respectively, so you are indeed using the usual convention; what's wrong is only the arrows you drew in the picture.
Edit: Concerning the interpretation of the translation vector: in the OpenCV convention, the points in the local camera reference frame are obtained from the points in the world reference frame like this:

|x_cam|          |x_world|
|y_cam| = Rmat * |y_world| + tvec
|z_cam|          |z_world|
With this convention, tvec is the position of the world origin in the camera reference frame, expressed in the same units as your object points (metres here, since WPoints are scaled by 0.4). What's more easily interpretable is the position of the camera origin in the world reference frame, which can be obtained as:
cam_center = -(R_inv * tvec)
where R_inv is the inverse of the rotation matrix (its transpose, since it is a rotation matrix). Here the rotation matrix is almost the identity, so a quick approximation would be -tvec, which is (5.5, 3.1, -24.2).
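To make this concrete, here is a minimal sketch computing the camera center from the rvec/tvec printed in the question:

import cv2
import numpy as np

rvec = np.array([-0.02823308, 0.08623225, 0.01563199])
tvec = np.array([-5.47884374, -3.08581371, 24.15112048])

Rmat, _ = cv2.Rodrigues(rvec)  # axis-angle -> 3x3 rotation matrix
cam_center = -Rmat.T @ tvec    # R.T equals R_inv for a rotation matrix
print(cam_center)              # roughly [ 7.6, 3.7, -23.5 ], in the same units as WPoints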
This is my first post here on Stack Overflow. Please excuse my English and my programming knowledge if they are somehow lacking.
Well, I am trying to do camera calibration with OpenCV 2.4.9 on Windows 8.1 (using Ubuntu instead does not resolve the problem).
Problem: I am using the code below to calibrate my camera, but it seems that if the number of my sample images (with the chessboard pattern) is more than 2, then the roi of newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h)) comes out as [0,0,0,0]. How is the number of samples connected to that result? (Earlier, prior to making some changes in this code, the maximum number of samples was 12.)
By "maximum number of samples" I mean the number of images acquired from my camera with the chessboard pattern, beyond which the roi no longer gives a good result.
The corner detection works very well. You can find my sample images here.
# -*- coding: utf-8 -*-
"""
Created on Fri May 16 15:23:00 2014
@author: kakarot
"""
import numpy as np
import cv2
#import os
#import time
from matplotlib import pyplot as plt

LeftorRight = 'L'
numer = 12
chx = 6
chy = 9
chd = 25

# termination criteria (note: numer is reused here as the iteration count)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, numer, 0.001)

# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(8,5,0)
objp = np.zeros((chy*chx,3), np.float32)
objp[:,:2] = np.mgrid[0:chy,0:chx].T.reshape(-1,2)

# Arrays to store object points and image points from all the images.
objpoints = [] # 3d points in real world space (scaled by chd = 25 mm)
imgpoints = [] # 2d points in image plane

enum = 1
while enum <= numer:
    img = cv2.imread('1280x720p/BestAsPerMatlab/calib_'+str(LeftorRight)+str(enum)+'.jpg')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Find the chess board corners
    ret, corners = cv2.findChessboardCorners(gray, (chy,chx), None)
    #cv2.imshow('Calibration',img)

    # If found, add object points, image points (after refining them)
    if ret == True and enum <= numer:
        objpoints.append(objp*chd)
        cv2.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria)
        imgpoints.append(corners)

        # Draw and display the corners
        cv2.drawChessboardCorners(img, (chy,chx), corners, ret)
        cv2.imshow('Calibration', img)
        cv2.imwrite('1280x720p/Chessboard/calibrated_L{0}.jpg'.format(enum), img)
        print enum
        #time.sleep(2)

    if enum == numer:
        ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

        img = cv2.imread('1280x720p/BestAsPerMatlab/calib_'+str(LeftorRight)+'7.jpg')
        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        h, w = img.shape[:2] #a (1 to see the whole picture)
        newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h))
        if np.size(roi) == 4 and np.mean(roi) != 0:
            # undistort
            mapx, mapy = cv2.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w,h), 5)
            dst = cv2.remap(img, mapx, mapy, cv2.INTER_LINEAR)

            # crop the image
            x, y, w, h = roi
            dst = dst[y:y+h, x:x+w]
            dst = cv2.cvtColor(dst, cv2.COLOR_RGB2BGR)
            plt.imshow(dst)
            #cv2.imwrite('result.jpg',dst)
            #np.savetxt('mtxL.txt',mtx)
            #np.savetxt('distL.txt',dist)
        else:
            np.disp('Something Went Wrong')
    enum += 1
    '''
    k = cv2.waitKey(1) & 0xFF
    if k == 27:
        break
    '''
cv2.destroyAllWindows()
EDIT: I am using two cheap USB cameras. I figured out that the sample set of one of the cameras is OK, and I can use more than 19 samples without a problem. But when using the calibration samples of the other camera, the maximum number of sample images is 2 (and if I make another set of samples, the number will vary). In conclusion, it seems that there is something going on with the calibration matrices that are produced. It is weird, though.
Finally, I am using fisheye cameras, believing that cutting away enough pixels around the edge of each capture would simulate a normal camera... maybe this is what is causing me the trouble!
You should change dist to
dist = np.array([-0.13615181, 0.53005398, 0, 0, 0]) # keep only the radial terms k1, k2; zero the tangential terms p1, p2 and the higher-order radial term k3
and then call
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h))
It worked for me.
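A plausible explanation (my assumption, not something verified): with a fisheye lens, the fitted tangential (p1, p2) and higher-order radial (k3) terms can be so large that the polynomial distortion model diverges near the image borders, which drives the computed valid-pixel roi to zero. A sketch of the fix applied generically to a calibration result (mtx, dist, w, h as in the question's code):

# Keep only the two leading radial terms k1, k2 and zero out p1, p2, k3.
dist_trunc = dist.copy().reshape(-1)
dist_trunc[2:] = 0

newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist_trunc, (w,h), 1, (w,h))
print('roi:', roi) # should no longer be (0, 0, 0, 0)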
I am trying to simulate an image standing out of a marker. This is my code so far, which does what is pictured. Essentially, I just want to rotate the image so that it appears to stand orthogonal to the checkerboard.
As you can see, I use the code to find the transformation matrix between a normalized square image and the corresponding checkerboard corners. I then use warpPerspective to get the image you see. I know that I can use the rotation vectors from solvePnP to obtain a rotation matrix through Rodrigues(), but I don't know what the next step is.
def transformTheSurface(inputFrame):
    ret, frameLeft = capleft.read()
    capGray = cv2.cvtColor(frameLeft, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(capGray, (5,4), None, cv2.CALIB_CB_NORMALIZE_IMAGE + cv2.CALIB_CB_ADAPTIVE_THRESH) #,None,cv2.CALIB_CB_FAST_CHECK)
    if found:
        npGameFrame = pygame.surfarray.array3d(inputFrame)
        inputFrameGray = cv2.cvtColor(npGameFrame, cv2.COLOR_BGR2GRAY)
        cv2.drawChessboardCorners(frameLeft, (5,4), corners, found)

        q = corners[[0, 4, 15, 19]]
        ret, rvecs, tvecs = cv2.solvePnP(objp, corners, mtx, dist)
        ptMatrix = cv2.getPerspectiveTransform(muffinCoords, q)
        npGameFrame = cv2.flip(npGameFrame, 0)
        ptMatrixWithXRot = ptMatrix * rodRotMat[0]
        #inputFrameConv = cv2.cvtColor(npGameFrame,cv2.COLOR_BGRA2GRAY)
        transMuffin = cv2.warpPerspective(npGameFrame, ptMatrix, (640, 480)) #, muffinImg, cv2.INTER_NEAREST, cv2.BORDER_CONSTANT, 0)
EDIT:
I have added some more code, in hopes of creating my own 3x3 transformation matrix. I used the following reference. Here is my code for that:
#initialization happens earlier in code
muffinCoords = np.zeros((4,2), np.float32)
muffinCoords[0] = (0,0)
muffinCoords[1] = (200,0)
muffinCoords[2] = (0,200)
muffinCoords[3] = (200,200)
A1 = np.zeros((4,3), np.float32)
A1[0] = (1,0,322)
A1[1] = (0,1,203)
A1[2] = (0,0,0)
A1[3] = (0,0,1)
R = np.zeros((4,4), np.float32)
R[3,3] = 1.0
T = np.zeros((4,4), np.float32)
T[0] = (1,0,0,0)
T[1] = (0,1,0,0)
T[2] = (0,0,1,0)
T[3] = (0,0,0,1)
#end initialization
#load calib data derived using cv2.calibrateCamera, my Fx and Fy are about 800
loadedCalibFileMTX = np.load('calibDataMTX.npy')
mtx = np.zeros((3,4), np.float32)
mtx[:3,:3] = loadedCalibFileMTX
#this is new to my code, creating what I interpret as Rx*Ry*Rz
ret, rvecCalc, tvecs = cv2.solvePnP(objp, corners, loadedCalibFileMTX, dist)
rodRotMat = cv2.Rodrigues(rvecCalc)
R[:3,:3] = rodRotMat[0]
#then I create T
T[0,3] = tvecs[0]
T[1,3] = tvecs[1]
T[2,3] = tvecs[2]
# CREATING CUSTOM TRANSFORM MATRIX
# A1 -> 2d to 3d projection matrix
# R -> rotation matrix as calculated by solvePnP, or Rx * Ry * Rz
# T -> converted translation matrix, reference from site, vectors pulled from tvecs of solvePnP
# mtx -> 3d to 2d matrix
# customTransformMat = mtx * (T * (R * A1)) {this is the intended calculation of the following}
first = np.dot(R, A1)
second = np.dot(T, first)
finalCalc = np.dot(mtx, second)
finalNorm = finalCalc/(finalCalc[2,2]) # to make sure that the [2,2] element is 1
transMuffin = cv2.warpPerspective(npGameFrame, finalNorm, (640, 480), None, cv2.INTER_NEAREST, cv2.BORDER_CONSTANT, 0)
#transMuffin is returned as undefined here, any help?
# using the cv2.getPerspectiveTransform method to find what you can find pictured at the top
ptMatrix = cv2.getPerspectiveTransform( muffinCoords, q)
I finally figured out the right methodology. You can find the code here: https://github.com/mikezucc/augmented-reality-fighter-pygame
NOTICE:
Almost ALL of the game code is written by Leif Theiden and is under the license specified in the .py files. The code relevant to the computer vision is in states.py. I used the game just to show that it can be done, for others looking to get started in simple computer vision.
My code opens a thread every time a new surface (simply a PyGame frame) is called to be displayed on the main window. I start a thread at that point and execute a simple computer vision function that does the following:
1. Searches a camera stream frame for the 5x4 chessboard (cv2.findChessboardCorners)
2. The found corners are then drawn onto the image
3. Using cv2.solvePnP, the approximate pose (rotation and translation vectors) is derived
4. The 3D points that describe a square are then projected from the 3D space determined by step 3 into 2D space. This is used to convert a predetermined 3D structure into something you can use to graph on a 2D image.
5. However, this step instead finds the transformation to get from a set of 2D square points (the dimensions of the game frame) to the newly found projected 2D points (of the 3D frame). Now you can see that what we are trying to do is simply a two-step transformation.
6. I then perform a basic tutorial-style addition of the captured stream frame and the transformed game frame to get a final image (a condensed sketch of these steps follows below)
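A condensed sketch of steps 1-6 (the variable names here are illustrative, not the actual names in states.py):

import cv2
import numpy as np

def overlay_game_frame(cam_frame, game_frame, mtx, dist, objp, square_3d, game_quad):
    # objp: chessboard corner model points; square_3d: 4 object-space points of
    # the target square; game_quad: the 4 corners of the game frame (4x2 float32).
    gray = cv2.cvtColor(cam_frame, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (5, 4), None)   # step 1
    if not found:
        return cam_frame
    cv2.drawChessboardCorners(cam_frame, (5, 4), corners, found)     # step 2
    ret, rvec, tvec = cv2.solvePnP(objp, corners, mtx, dist)         # step 3
    proj2d, _ = cv2.projectPoints(square_3d, rvec, tvec, mtx, dist)  # step 4
    dst = proj2d.reshape(-1, 2).astype(np.float32)
    ptMatrix = cv2.getPerspectiveTransform(game_quad, dst)           # step 5
    h, w = cam_frame.shape[:2]
    warped = cv2.warpPerspective(game_frame, ptMatrix, (w, h))
    return cv2.addWeighted(cam_frame, 1.0, warped, 1.0, 0)           # step 6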
Variables:
+from3dTransMatrix -> the points of the projected 3D structure as 2D points; these are the red dots you see
+q -> the reference plane that we determine the pose from
+ptMatrix -> the final transformation, to transform the game frame to fit in the projected frame
Check out the screens in the topmost folder ;]
enjoy!
I'm still hacking together a book scanning script, and for now all I need is to be able to automagically detect a page turn. The book fills up 90% of the screen (I'm using a cruddy webcam for the motion detection), so when I turn a page, the direction of motion is basically that of the turn.
I have modified a motion-tracking script, but derivatives are getting me nowhere:
#!/usr/bin/env python
import cv, numpy

class Target:
    def __init__(self):
        self.capture = cv.CaptureFromCAM(0)
        cv.NamedWindow("Target", 1)

    def run(self):
        # Capture first frame to get size
        frame = cv.QueryFrame(self.capture)
        frame_size = cv.GetSize(frame)
        grey_image = cv.CreateImage(cv.GetSize(frame), cv.IPL_DEPTH_8U, 1)
        moving_average = cv.CreateImage(cv.GetSize(frame), cv.IPL_DEPTH_32F, 3)
        difference = None
        movement = []

        while True:
            # Capture frame from webcam
            color_image = cv.QueryFrame(self.capture)

            # Smooth to get rid of false positives
            cv.Smooth(color_image, color_image, cv.CV_GAUSSIAN, 3, 0)

            if not difference:
                # Initialize
                difference = cv.CloneImage(color_image)
                temp = cv.CloneImage(color_image)
                cv.ConvertScale(color_image, moving_average, 1.0, 0.0)
            else:
                cv.RunningAvg(color_image, moving_average, 0.020, None)

            # Convert the scale of the moving average.
            cv.ConvertScale(moving_average, temp, 1.0, 0.0)

            # Subtract the current frame from the moving average.
            cv.AbsDiff(color_image, temp, difference)

            # Convert the image to grayscale.
            cv.CvtColor(difference, grey_image, cv.CV_RGB2GRAY)

            # Convert the image to black and white.
            cv.Threshold(grey_image, grey_image, 70, 255, cv.CV_THRESH_BINARY)

            # Dilate and erode to get object blobs
            cv.Dilate(grey_image, grey_image, None, 18)
            cv.Erode(grey_image, grey_image, None, 10)

            # Calculate movements
            storage = cv.CreateMemStorage(0)
            contour = cv.FindContours(grey_image, storage, cv.CV_RETR_CCOMP, cv.CV_CHAIN_APPROX_SIMPLE)
            points = []
            while contour:
                # Draw rectangles
                bound_rect = cv.BoundingRect(list(contour))
                contour = contour.h_next()
                pt1 = (bound_rect[0], bound_rect[1])
                pt2 = (bound_rect[0] + bound_rect[2], bound_rect[1] + bound_rect[3])
                points.append(pt1)
                points.append(pt2)
                cv.Rectangle(color_image, pt1, pt2, cv.CV_RGB(255,0,0), 1)

            num_points = len(points)
            if num_points:
                x = 0
                for point in points:
                    x += point[0]
                x /= num_points
                movement.append(x)

            if len(movement) > 0 and numpy.average(numpy.diff(movement[-30:-1])) > 0:
                print 'Left'
            else:
                print 'Right'

            # Display frame to user
            cv.ShowImage("Target", color_image)

            # Listen for ESC or ENTER key
            c = cv.WaitKey(7) % 0x100
            if c == 27 or c == 10:
                break

if __name__ == "__main__":
    t = Target()
    t.run()
It detects the average motion of the average center of all of the boxes, which is extremely inefficient. How would I go about detecting such motion quickly and accurately (i.e., within a threshold)?
I'm using Python, and I plan to stick with it, as my whole framework is based on Python.
Any help is appreciated, so thank you all in advance. Cheers.
I haven't used OpenCV in Python before, just a bit in C++ with openFrameworks.
For this I presume OpticalFlow's velx, vely properties would work.
For more on how optical flow works, check out this paper.
HTH
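For illustration, here is a minimal sketch of that idea with the modern cv2 API, using dense optical flow (cv2.calcOpticalFlowFarneback); the sign of the mean horizontal flow gives the page-turn direction, and the 2.0 px/frame threshold is a guess to tune:

import cv2
import numpy as np

cap = cv2.VideoCapture(0)
_, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # flow[..., 0] is the per-pixel horizontal velocity (the velx analogue)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mean_vx = float(np.mean(flow[..., 0]))
    if mean_vx > 2.0:
        print('Right')
    elif mean_vx < -2.0:
        print('Left')

    prev_gray = gray
    if cv2.waitKey(7) & 0xFF == 27:  # ESC
        break

cap.release()
cv2.destroyAllWindows()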
Why don't you use cv.GoodFeaturesToTrack? It may solve the script runtime ... and shorten the code ...
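Again just a sketch, with the modern cv2 names (cv2.goodFeaturesToTrack plus cv2.calcOpticalFlowPyrLK): track a sparse set of corners between consecutive frames and use the mean horizontal displacement as the page-turn signal.

import cv2
import numpy as np

def horizontal_motion(prev_gray, gray):
    # Pick strong corners in the previous frame...
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)
    if pts is None:
        return 0.0
    # ...and track them into the current frame with pyramidal Lucas-Kanade.
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good = status.ravel() == 1
    if not good.any():
        return 0.0
    dx = new_pts[good, 0, 0] - pts[good, 0, 0]
    return float(np.mean(dx))  # > 0: motion to the right, < 0: to the left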