Camera Calibration with OpenCV - cv2.getOptimalNewCameraMatrix results in zero roi - python

This is my first post here on Stack Overflow, so please excuse my English and any gaps in my programming knowledge.
I am trying to do camera calibration with OpenCV 2.4.9 on Windows 8.1 (switching to Ubuntu does not resolve the problem).
Problem: I am using the code below to calibrate my camera, but if the number of sample images (with the chessboard pattern) is greater than 2, the roi returned by newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h)) comes out as [0,0,0,0]. How is the number of samples connected to that result? (Earlier, before I made some changes to this code, the maximum number of samples was 12.)
By "maximum number of samples" I mean the number of chessboard images acquired from my camera beyond which the roi no longer gives a good result.
The corner detection works very well. You can find my sample images here.
# -*- coding: utf-8 -*-
"""
Created on Fri May 16 15:23:00 2014
@author: kakarot
"""

import numpy as np
import cv2
#import os
#import time
from matplotlib import pyplot as plt

LeftorRight = 'L'
numer = 12  # number of sample images
chx = 6     # chessboard inner corners along one side
chy = 9     # chessboard inner corners along the other side
chd = 25    # chessboard square size in mm

# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, numer, 0.001)

# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ..., (6,5,0)
objp = np.zeros((chy*chx, 3), np.float32)
objp[:, :2] = np.mgrid[0:chy, 0:chx].T.reshape(-1, 2)

# Arrays to store object points and image points from all the images.
objpoints = []  # 3d points in real world space (x 25 mm)
imgpoints = []  # 2d points in image plane

enum = 1
while enum <= numer:
    img = cv2.imread('1280x720p/BestAsPerMatlab/calib_' + str(LeftorRight) + str(enum) + '.jpg')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Find the chessboard corners
    ret, corners = cv2.findChessboardCorners(gray, (chy, chx), None)
    #cv2.imshow('Calibration', img)

    # If found, add object points and image points (after refining them)
    if ret == True and enum <= numer:
        objpoints.append(objp * chd)
        cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        imgpoints.append(corners)

        # Draw and display the corners
        cv2.drawChessboardCorners(img, (chy, chx), corners, ret)
        cv2.imshow('Calibration', img)
        cv2.imwrite('1280x720p/Chessboard/calibrated_L{0}.jpg'.format(enum), img)
        print enum
        #time.sleep(2)

    if enum == numer:
        ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
        img = cv2.imread('1280x720p/BestAsPerMatlab/calib_' + str(LeftorRight) + '7.jpg')
        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        h, w = img.shape[:2]
        # alpha = 1 to see the whole picture
        newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
        if np.size(roi) == 4 and np.mean(roi) != 0:
            # undistort
            mapx, mapy = cv2.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w, h), 5)
            dst = cv2.remap(img, mapx, mapy, cv2.INTER_LINEAR)

            # crop the image
            x, y, w, h = roi
            dst = dst[y:y+h, x:x+w]
            dst = cv2.cvtColor(dst, cv2.COLOR_RGB2BGR)
            plt.imshow(dst)
            #cv2.imwrite('result.jpg', dst)
            #np.savetxt('mtxL.txt', mtx)
            #np.savetxt('distL.txt', dist)
        else:
            np.disp('Something Went Wrong')
    enum += 1
    '''
    k = cv2.waitKey(1) & 0xFF
    if k == 27:
        break
    '''
cv2.destroyAllWindows()
EDIT: I am using two cheap USB cameras. I figured out that the sample set from one of the cameras is fine, and I can use more than 19 samples without a problem. But with the calibration samples from the other camera, the maximum number of sample images is 2 (and if I capture another set of samples, that number varies). So something seems to be going wrong with the calibration matrices that are produced, which is strange.
Finally, I am using fisheye cameras, believing that by cropping enough pixels around the edge of each capture I would approximate a normal camera... maybe this is what is causing the trouble!
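If the fisheye optics are indeed the culprit: the standard pinhole distortion model often fails to converge on fisheye lenses, and OpenCV 3+ ships a dedicated cv2.fisheye module (not available in the 2.4.9 used here). A minimal sketch of what that calibration would look like, assuming objpoints, imgpoints and gray from the script above; the reshaping reflects the stricter array shapes the fisheye API expects:

import numpy as np
import cv2

# cv2.fisheye.calibrate wants one (1, N, 3) array of object points and
# one (1, N, 2) array of image points per view
objpoints_fe = [op.reshape(1, -1, 3).astype(np.float64) for op in objpoints]
imgpoints_fe = [ip.reshape(1, -1, 2).astype(np.float64) for ip in imgpoints]

K = np.zeros((3, 3))
D = np.zeros((4, 1))  # the fisheye model uses 4 distortion coefficients
flags = cv2.fisheye.CALIB_RECOMPUTE_EXTRINSIC + cv2.fisheye.CALIB_FIX_SKEW
rms, K, D, rvecs, tvecs = cv2.fisheye.calibrate(
    objpoints_fe, imgpoints_fe, gray.shape[::-1], K, D, flags=flags)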

You should change dist to
dist = np.array([-0.13615181, 0.53005398, 0, 0, 0]) # keep k1 and k2, zero the tangential (p1, p2) and k3 terms
and then make the call to
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
It worked for me.
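In other words, keep only the first two radial coefficients (k1, k2) and zero the tangential (p1, p2) and higher-order radial (k3) terms before asking for the optimal new camera matrix. A minimal sketch of that idea, assuming mtx, dist, w and h come from the calibration above:

import numpy as np
import cv2

dist_trunc = dist.ravel().copy()
dist_trunc[2:] = 0  # keep k1 and k2; zero p1, p2, k3 (and k4..k6 if present)

newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist_trunc.reshape(1, -1), (w, h), 1, (w, h))
print(roi)  # should no longer be (0, 0, 0, 0) if extreme distortion estimates were the cause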

Related

How do I get a decent accuracy with stereoCalibrate() function in OpenCV

I'm attempting to triangulate a 3D position with the OpenCV Python library by calibrating two cameras and using the stereoCalibrate function. I want the cameras to be attached to the ends of a bar and to measure positions around 5 m away from the cameras. The majority of the code is similar to the code used by Temuge Batpurev (source: https://temugeb.github.io/opencv/python/2021/02/02/stereo-camera-calibration-and-triangulation.html), and when I run his calibration images through my code, the RMSE value ("ret" in his code, retStereo in my code) comes back as the 2.4 that Temuge says to expect.
When I use my own images, which are time-synced through GPS on GoPro 10 cameras, I average around 50. Originally I thought it was because I was calibrating too large an area, but close up I still get around 50.
Current Calibration Method:
Starting close to the cameras, without the chessboard going out of frame for either camera, slowly waving the board around. Ensuring it runs along the edges of the frames for both cameras as well as the center before moving backwards and repeating this at various depths.
Things I have tried:
Increasing the size of the chessboard squares (48 mm, 61 mm, 109 mm)
Increasing the number of rows/columns on the chessboard
Changing lighting conditions
Using more calibration images; I use a script to extract frames from a video, so I normally use 20+ calibration frames
Using a smaller area to triangulate
Checking whether having one camera at an angle, as opposed to having the cameras parallel, changed anything
Checking that the findChessboardCorners function actually finds the corners of the chessboard
Ensuring the chessboard appears in many different positions (bottom corner, top corner, center of the frames for each camera)
Moving towards and away from the camera in the videos
Changing the criteria and criteria_stereo variables to see if that changed anything
Images of my most recent video, 109mm squares:
My code:
############### FIND CHESSBOARD CORNERS - OBJECT POINTS AND IMAGE POINTS #############################
import glob
import numpy as np
import cv2 as cv

chessboardSize = (8,5) # Other chessboard sizes used - (5,3) OR (9,6)

# Paths to the captured frames (should be in sync) (stereoLeft and stereoRight)
CALIBRATION_IMAGES_PATH_LEFT = 'images_vid\\stereoLeft\\*.png'
CALIBRATION_IMAGES_PATH_RIGHT = 'images_vid\\stereoRight\\*.png'

# termination criteria
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)

objp = np.zeros((chessboardSize[0] * chessboardSize[1], 3), np.float32) # creates an 8*5 list of (0.,0.,0.)
objp[:,:2] = np.mgrid[0:chessboardSize[0],0:chessboardSize[1]].T.reshape(-1,2) # fills the list with (column no., row no., 0.) where max column no. = 7 and max row no. = 4

size_of_chessboard_squares_mm = 109
objp = objp * size_of_chessboard_squares_mm

# Arrays to store object points and image points from all the images.
objpoints = []  # 3d point in real world space
imgpointsL = [] # 2d points in image plane.
imgpointsR = [] # 2d points in image plane.

imagesLeft = sorted(glob.glob(CALIBRATION_IMAGES_PATH_LEFT))
imagesRight = sorted(glob.glob(CALIBRATION_IMAGES_PATH_RIGHT))

for imgLeft, imgRight in zip(imagesLeft, imagesRight):
    imgL = cv.imread(imgLeft)
    imgR = cv.imread(imgRight)
    grayL = cv.cvtColor(imgL, cv.COLOR_BGR2GRAY)
    grayR = cv.cvtColor(imgR, cv.COLOR_BGR2GRAY)

    # Get the corners of the chess board
    retL, cornersL = cv.findChessboardCorners(grayL, chessboardSize, None)
    retR, cornersR = cv.findChessboardCorners(grayR, chessboardSize, None)

    # Add object points and image points only if corners are found in both views
    if retL and retR:
        objpoints.append(objp)

        cornersL = cv.cornerSubPix(grayL, cornersL, (11,11), (-1,-1), criteria)
        imgpointsL.append(cornersL)

        cornersR = cv.cornerSubPix(grayR, cornersR, (11,11), (-1,-1), criteria)
        imgpointsR.append(cornersR)

        # Draw corners for user feedback
        cv.drawChessboardCorners(imgL, chessboardSize, cornersL, retL)
        cv.imshow('img left', imgL)
        cv.drawChessboardCorners(imgR, chessboardSize, cornersR, retR)
        cv.imshow('img right', imgR)
        cv.waitKey()

cv.destroyAllWindows()

############# CALIBRATION #######################################################
frameSize = grayL.shape[::-1]  # (width, height); assumed to be defined somewhere in the original script

retL, cameraMatrixL, distL, rvecsL, tvecsL = cv.calibrateCamera(objpoints, imgpointsL, frameSize, None, None)
heightL, widthL, channelsL = imgL.shape
newCameraMatrixL, roi_L = cv.getOptimalNewCameraMatrix(cameraMatrixL, distL, (widthL, heightL), 1, (widthL, heightL))

retR, cameraMatrixR, distR, rvecsR, tvecsR = cv.calibrateCamera(objpoints, imgpointsR, frameSize, None, None)
heightR, widthR, channelsR = imgR.shape
newCameraMatrixR, roi_R = cv.getOptimalNewCameraMatrix(cameraMatrixR, distR, (widthR, heightR), 1, (widthR, heightR))

######### Stereo Vision Calibration #############################################
## stereoCalibrate output: retStereo is the RMSE, newCameraMatrixL and newCameraMatrixR are the intrinsic matrices for both
## cameras, distL and distR are the distortion coefficients for both cameras, rot is the rotation matrix,
## trans is the translation vector, and essentialMatrix and fundamentalMatrix are self-descriptive
# R and T are taken from stereoCalibrate to use in triangulation
header = ['Rotation','Translation', 'ProjectionLeft', 'ProjectionRight'] # for the csv file

flags = cv.CALIB_FIX_INTRINSIC
criteria_stereo = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 100, 0.0001)

# flags and criteria are passed by keyword; passed positionally after the image size
# they would bind to the optional R/T output arguments instead
retStereo, newCameraMatrixL, distL, newCameraMatrixR, distR, rot, trans, essentialMatrix, fundamentalMatrix = cv.stereoCalibrate(
    objpoints, imgpointsL, imgpointsR, cameraMatrixL, distL, cameraMatrixR, distR,
    grayL.shape[::-1], criteria=criteria_stereo, flags=flags)

print(retStereo)
print('stereo vision done')
Are there any immediate red flags in my code or my calibration method? Or recommendations for improving the code? Thanks for taking the time to help me :)
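One diagnostic that often helps with a high retStereo is to compute the per-view reprojection error of each mono calibration and discard the image pairs that stand out. This follows the error check from the official OpenCV calibration tutorial, adapted as a sketch to the variable names above:

import numpy as np
import cv2 as cv

def per_view_errors(objpoints, imgpoints, rvecs, tvecs, cameraMatrix, dist):
    """RMS reprojection error for each calibration image."""
    errors = []
    for objp_i, imgp_i, rvec, tvec in zip(objpoints, imgpoints, rvecs, tvecs):
        proj, _ = cv.projectPoints(objp_i, rvec, tvec, cameraMatrix, dist)
        errors.append(cv.norm(imgp_i, proj, cv.NORM_L2) / np.sqrt(len(proj)))
    return errors

for i, e in enumerate(per_view_errors(objpoints, imgpointsL, rvecsL, tvecsL, cameraMatrixL, distL)):
    print(i, round(e, 3))  # views with errors far above the median are candidates to drop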

Interpreting the rotation matrix with respect to a defined World coordinates

In the image below, we see a defined world plane coordinate system (X,Y,0), where Z=0. The camera, as we can see, is heading towards the defined world plane.
The world reference point is located at the top left of the grid (0,0,0). The distance between every two yellow points is 40 cm.
I've calibrated my camera using the checkerboard and then used the built-in function cv2.solvePnP in order to estimate the rotation and translation vector of the camera with respect to my defined world coordinates. The results are as follows:
tvec_cam= [[-5.47884374]
[-3.08581371]
[24.15112048]]
rvec_cam= [[-0.02823308]
[ 0.08623225]
[ 0.01563199]]
According to the results, (tx,ty,tz) seems to be right, as the camera is located in the negative quarter of the X,Y world coordinates.
However, I'm getting confused when interpreting the rotation vector!
Does the resulting rotation vector say that the camera coordinates are almost aligned with the world coordinate axes (meaning almost no rotation)?
If yes, how could this be true? According to OpenCV's camera coordinates, the Z-axis of the camera points towards the scene (which means towards the world plane), the X-axis points towards the image right (which means opposite to the X world axis), and the Y-axis of the camera points towards the image bottom (which also means opposite to the Y world axis).
Moreover, what is the unit of tvec?
Note: I've illustrated the orientation of the defined world coordinate axes according to the result of the translation vector (both tx and ty are negative).
The code I used for computing the rotation and translation vectors is shown below:
import cv2 as cv
import numpy as np

WPoints = np.zeros((9*3,3), np.float64)
WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4

imPoints = np.array([[20,143],[90,143],[161,143],[231,144],[303,144],
                     [374,144],[446,145],[516,146],[587,147],
                     [18,214],[88,214],[159,215],[230,215],[302,216],
                     [374,216],[446,216],[517,217],[588,217],
                     [16,285],[87,285],[158,286],[229,287],[301,288],
                     [374,289],[446,289],[518,289],[589,289]], dtype=np.float64)

#load the camera matrix [[4.38073915e+03 0.00000000e+00 1.00593352e+03]
#                        [0.00000000e+00 4.37829226e+03 6.97020491e+02]
#                        [0.00000000e+00 0.00000000e+00 1.00000000e+00]]
with np.load('parameters_cam1.npz') as X:
    mtx, dist, _, _ = [X[i] for i in ('mtx','dist','rvecs','tvecs')]

ret, rvecs, tvecs = cv.solvePnP(WPoints, imPoints, mtx, dist)

np.savez("extrinsic_camera1.npz", rvecs=rvecs, tvecs=tvecs)
print(tvecs)
print(rvecs)
cv.destroyAllWindows()
The code for estimating the intrinsics is shown below:
import numpy as np
import cv2
import glob
import argparse
import pathlib

ap = argparse.ArgumentParser()
ap.add_argument("-p", "--path", required=True, help="path to images folder")
ap.add_argument("-e", "--file_extension", required=False, default=".jpg",
                help="extension of images")
args = vars(ap.parse_args())
path = args["path"] + "*" + args["file_extension"]

# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# prepare object points, like (0,0,0), (0.03,0,0), (0.06,0,0) ..., (0.18,0.12,0)
objp = np.zeros((5*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:5].T.reshape(-1,2)*0.03
#print(objp)

# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.

#images = glob.glob('left/*.jpg') # read a series of images
images = glob.glob(path)
path = 'foundContours'
#pathlib.Path(path).mkdir(parents=True, exist_ok=True)

found = 0
for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Find the chess board corners
    ret, corners = cv2.findChessboardCorners(gray, (7,5), None)

    # If found, add object points, image points (after refining them)
    if ret == True:
        print('true')
        objpoints.append(objp) # every loop objp is the same, in 3D
        corners2 = cv2.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria)
        imgpoints.append(corners2)
        #print(imgpoints)
        #print('first_point', imgpoints[0])

        # Draw and display the corners
        img = cv2.drawChessboardCorners(img, (7,5), corners2, ret)
        found += 1
        cv2.imshow('img', img)
        cv2.waitKey(1000)

        # save images with detected corners
        image_name = path + '/calibresult' + str(found) + '.jpg'
        cv2.imwrite(image_name, img)

print("Number of images used for calibration: ", found)
cv2.destroyAllWindows()

# calibration
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints,
                                                   gray.shape[::-1], None, None)

# save parameters needed in undistortion
np.savez("parameters_cam1.npz", mtx=mtx, dist=dist, rvecs=rvecs, tvecs=tvecs)
np.savez("points_cam1.npz", objpoints=objpoints, imgpoints=imgpoints)

print("Camera Matrix = |fx  0 cx|")
print("                | 0 fy cy|")
print("                | 0  0  1|")
print(mtx)
print('distortion coefficients=\n', dist)
print('rotation vector for each image=', *rvecs, sep="\n")
print('translation vector for each image=', *tvecs, sep="\n")
Hope you can help me understand this.
Thanks in advance.
First, rvec is in axis-angle representation (https://en.wikipedia.org/wiki/Axis%E2%80%93angle_representation).
You can obtain the rotation matrix using cv2.Rodrigues(). For your data, I get almost the identity:
[[ 0.99616253 -0.01682635 0.08588995]
[ 0.01439347 0.99947963 0.02886672]
[-0.08633098 -0.02751969 0.99588635]]
Now, according to the directions of x and y in your picture, the z-axis points downwards (carefully apply the right-hand rule). This explains why the z-axis of the camera is almost aligned with the z-axis of your world reference frame.
Edit: Digging a little further, from the code you posted:
WPoints = np.zeros((9*3,3), np.float64)
WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
The values for X and Y are all positive and increment to the right and downwards respectively, so you are indeed using the usual convention; what's wrong is only the arrows you drew in the picture.
Edit: Concerning the interpretation of the translation vector: in the OpenCV convention, the points in the local camera reference frame are obtained from the points in the world reference frame like this:
|x_cam|          |x_world|
|y_cam| = Rmat * |y_world| + tvec
|z_cam|          |z_world|
With this convention, tvec is the position of the world origin in the camera reference frame. What's more easily interpretable is the position of the camera origin in the world reference frame, which can be obtained as:
cam_center = -(R_inv * tvec)
where R_inv is the inverse of the rotation matrix (for a rotation matrix, simply its transpose). Here the rotation matrix is almost the identity, so a quick approximation is -tvec, which is (5.4, 3.1, -24.1).
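As a quick check, the same computation in code (a minimal sketch, assuming rvecs and tvecs are the solvePnP outputs from the question):

import numpy as np
import cv2 as cv

R, _ = cv.Rodrigues(rvecs)   # 3x3 rotation matrix from the axis-angle vector
R_inv = R.T                  # the inverse of a rotation matrix is its transpose
cam_center = -R_inv @ tvecs  # camera origin expressed in world coordinates
print(cam_center.ravel())    # roughly (5.4, 3.1, -24.1) for the data above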

Detecting moving object in moving camera(monitoring one area mounted on a drone)

def run(self):
    while True:
        _ret, frame = self.cam.read()
        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        vis = frame.copy()
        if len(self.tracks) > 0:
            img0, img1 = self.prev_gray, frame_gray
            p0 = np.float32([tr[-1] for tr in self.tracks]).reshape(-1, 1, 2)
            p1, _st, _err = cv2.calcOpticalFlowPyrLK(img0, img1, p0, None, **lk_params)
            p0r, _st, _err = cv2.calcOpticalFlowPyrLK(img1, img0, p1, None, **lk_params)
            d = abs(p0-p0r).reshape(-1, 2).max(-1)
            good = d < 1
            new_tracks = []
            # NOTE: this stores the magnitude of each point's position, not of its
            # flow vector; the displacement would be p1[i] - p0[i]
            for i in range(len(p1)):
                A.append(math.sqrt((p1[i][0][0])**2 + (p1[i][0][1])**2))
            counts, bins, bars = plt.hist(A)
            for tr, (x, y), good_flag in zip(self.tracks, p1.reshape(-1, 2), good):
                if not good_flag:
                    continue
                tr.append((x, y))
                if len(tr) > self.track_len:
                    del tr[0]
                new_tracks.append(tr)
                cv2.circle(vis, (x, y), 2, (0, 255, 0), -1)
            self.tracks = new_tracks
            cv2.polylines(vis, [np.int32(tr) for tr in self.tracks], False, (0, 255, 0))
            draw_str(vis, (20, 20), 'track count: %d' % len(self.tracks))
        if self.frame_idx % self.detect_interval == 0:
            mask = np.zeros_like(frame_gray)
            mask[:] = 255
            for x, y in [np.int32(tr[-1]) for tr in self.tracks]:
                cv2.circle(mask, (x, y), 5, 0, -1)
            p = cv2.goodFeaturesToTrack(frame_gray, mask=mask, **feature_params)
            if p is not None:
                for x, y in np.float32(p).reshape(-1, 2):
                    self.tracks.append([(x, y)])
        self.frame_idx += 1
        self.prev_gray = frame_gray
        cv2.imshow('lk_track', vis)
        ch = cv2.waitKey(1)
        if ch == 27:
            break
I am using lk_track.py from the OpenCV samples to try to detect a moving object. I am trying to find the camera motion using the histogram of magnitudes of the optical flow vectors, and then calculate the average for the similar values, which should be directly proportional to the camera motion. I have calculated the magnitudes of the vectors and saved them in a list A. Can someone suggest how to find the largest group of similar values in it and calculate the average for only those values?
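For the "average of the most similar values" step itself, a minimal sketch (assuming A is the list of magnitudes built in the loop above) is to histogram the values, take the fullest bin as the dominant motion, and average only the values inside it:

import numpy as np

A_arr = np.asarray(A, dtype=np.float64)
counts, edges = np.histogram(A_arr, bins=20)
k = np.argmax(counts)                              # fullest bin = largest group of similar values
in_bin = (A_arr >= edges[k]) & (A_arr < edges[k+1])
camera_motion_estimate = A_arr[in_bin].mean()      # average over only those values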
I created a toy problem to model the approach of binarizing the images by optical flow. This is a massively simplified view of the problem, but it gives the general idea well. I'll split the problem up into a few chunks and give functions for them. If you're working directly with video there will of course be a lot of additional code needed, and I hardcoded a lot of values that you'll need to turn into parameters.
The first function just generates the image sequence: the camera translates through a scene that contains an object. The object appears stationary in the sequence, but that means the object is actually moving in the opposite direction of the camera.
import numpy as np
import cv2

def gen_seq():
    """Generate motion sequence with an object"""
    scene = cv2.GaussianBlur(np.uint8(255*np.random.rand(400, 500)), (21, 21), 3)
    h, w = 400, 400
    step = 4
    obj_mask = np.zeros((h, w), bool)  # np.bool is deprecated; plain bool behaves the same
    obj_h, obj_w = 50, 50
    obj_x, obj_y = 175, 175
    obj_mask[obj_y:obj_y+obj_h, obj_x:obj_x+obj_w] = True
    obj_data = np.uint8(255*np.random.rand(obj_h, obj_w)).ravel()
    imgs = []
    for i in range(0, 1+w//step, step):
        img = scene[:, i:i+w].copy()
        img[obj_mask] = obj_data
        imgs.append(img)
    return imgs

# generate image sequence
imgs = gen_seq()

# display images
for img in imgs:
    cv2.imshow('Image', img)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
cv2.destroyWindow('Image')
So here's the basic image sequence visualized. I just used a random scene, translated through, and added a random object in the center.
Great! Now we need to calculate the flow between each frame. I used dense flow here, but sparse flow would be more robust for actual images.
def find_flows(imgs):
    """Finds the dense optical flows"""
    optflow_params = [0.5, 3, 15, 3, 5, 1.2, 0]
    prev = imgs[0]
    flows = []
    for img in imgs[1:]:
        flow = cv2.calcOpticalFlowFarneback(prev, img, None, *optflow_params)
        flows.append(flow)
        prev = img
    return flows

# find optical flows between images
flows = find_flows(imgs)

# display flows
h, w = imgs[0].shape[:2]
hsv = np.zeros((h, w, 3), dtype=np.uint8)
hsv[..., 1] = 255

for flow in flows:
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv[..., 0] = ang*180/np.pi/2
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    cv2.imshow('Flow', rgb)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
cv2.destroyWindow('Flow')
Here I colorized the flow based on its angle and magnitude. The angle determines the color and the magnitude determines the intensity/brightness of the color. This is the same view the OpenCV tutorial on dense optical flow uses.
Then we need to binarize this flow so that we get two distinct sets of pixels based on how they're moving. In the sparse case this works out the same, except you get two distinct sets of features.
def label_flows(flows):
    """Binarizes the flows by direction and magnitude"""
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    flags = cv2.KMEANS_RANDOM_CENTERS
    h, w = flows[0].shape[:2]
    labeled_flows = []
    for flow in flows:
        flow = flow.reshape(h*w, -1)
        comp, labels, centers = cv2.kmeans(flow, 2, None, criteria, 10, flags)
        n = np.sum(labels == 1)
        camera_motion_label = np.argmax([labels.size-n, n])
        labeled = np.uint8(255*(labels.reshape(h, w) == camera_motion_label))
        labeled_flows.append(labeled)
    return labeled_flows

# binarize the flows
labeled_flows = label_flows(flows)

# display binarized flows
for labeled_flow in labeled_flows:
    cv2.imshow('Labeled Flow', labeled_flow)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
cv2.destroyWindow('Labeled Flow')
The annoying thing here is that the labels are assigned randomly, i.e. they will differ between frames; if you visualized the binary images directly, they would flip between black and white at random. Since I'm only using binary labels, 0 and 1, what I did was consider the label assigned to more pixels to be the "camera motion label", set that label to white in the resulting images, and set the other label to black. That way the camera motion label is the same in each frame. This may need to be much more sophisticated for working on a video feed.
But here we have it, a binarized flow where the color is just showing the two distinct sets of flow vectors.
Now if we wanted to find the target in this flow, we could invert the image and find the connected components of the binary image. The inversion will make the camera motion the background label (0). Then each of the black blobs will be white and will be labeled, and we could find the blob relating to the largest component which, in this case, will be the target. That will give a mask around the target, and we can draw the contours of that mask on the original images to see the target being detected. I'll also cut the borders of the image off before finding the connected components so edge effects from dense flow are ignored.
def find_target_in_labeled_flow(labeled_flow):
    labeled_flow = cv2.bitwise_not(labeled_flow)
    bw = 10
    h, w = labeled_flow.shape[:2]
    border_cut = labeled_flow[bw:h-bw, bw:w-bw]
    conncomp, stats = cv2.connectedComponentsWithStats(border_cut, connectivity=8)[1:3]
    target_label = np.argmax(stats[1:, cv2.CC_STAT_AREA]) + 1
    img = np.zeros_like(labeled_flow)
    img[bw:h-bw, bw:w-bw] = 255*(conncomp == target_label)
    return img

for labeled_flow, img in zip(labeled_flows, imgs[:-1]):
    target_mask = find_target_in_labeled_flow(labeled_flow)
    display_img = cv2.merge([img, img, img])
    contours = cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[1]
    display_img = cv2.drawContours(display_img, contours, -1, (0, 255, 0), 2)
    cv2.imshow('Detected Target', display_img)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
And of course this could use some cleaning up, and you won't be doing exactly this for sparse flow; you could just define a region of interest around the tracked points.
Now, there is still a lot of work to do. You have a binarized flow... you can probably safely assume that the label which occurs most frequently is the camera motion (like I did). However, you'll have to make sure that the other label is the object you're interested in tracking. You'll have to keep track of it between flows so that if it stops moving, you'll know where it is while the camera is moving. When you do the k-means step, you'll want to make sure that the centers from k-means are "far enough" apart so that you know whether the object is moving or not (a small sketch of that check follows the list below).
The basic steps for that would be, from the starting frame of the video:
If the two centers are "close", then you can assume your object is either not in the scene or not moving in the scene.
Once the centers are split enough apart, you'll have found the object to track. Keep track of the location of the object.
During tracking of the object, verify that the location is near a prediction. You can use the optical flow velocity vectors from the previous frame to predict the location of each pixel/feature in the new frame, so make sure your predictions agree with your tracking result.
If the object stops moving, the centers from k-means should be close. Keep track of the optical flow vectors around the object location and follow them to have a prediction of where the object is again once it resumes moving, and again verify the detected location with this prediction.
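A minimal sketch of the "centers far enough apart" test (assuming centers is the 2x2 array returned by cv2.kmeans inside label_flows above; the threshold is a made-up parameter you would tune):

import numpy as np

def object_is_moving(centers, min_separation=1.0):
    """True if the two k-means flow centers are separated enough to
    believe a second, distinct motion (the object) is present."""
    return np.linalg.norm(centers[0] - centers[1]) > min_separation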
I've never used these methods before, so I'm not sure how robust they are. The typical approach for HOOF, or "Histogram of Oriented Optical Flow", is much more advanced than this (see the seminal paper here). Instead of just binarizing, the idea is to use the histograms from each frame as a probability distribution, and the way this probability distribution changes over time can be analyzed with tools from time series analysis, which I assume gives a more robust framework for this approach.
To use @alkasm's answer without hitting the following error:
(-215:Assertion failed) npoints > 0 in function 'drawContours'
simply replace:
contours = cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[1]
with:
contours, _ = cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
I can't comment below the answer because my account is new and has low reputation.
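If the same script has to run under both OpenCV 3.x (which returns image, contours, hierarchy) and 4.x (which returns contours, hierarchy), a version-agnostic variant is to index from the end, since the contours element is second-to-last in both conventions:

contours = cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]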

Using inpaint function in OpenCV via Python to interpolate broken river-data in a watershed

Background
A raster file collected via LIDAR records the topography of a watershed. To properly model the watershed, the river must appear continuous, without any breaks or interruptions. The roads in the raster file appear like dams that interrupt the river, as seen in the picture below.
Specific Area Under Consideration in the Watershed
Objective
These river breaks are the main problem and I am trying but failing to remove them.
Approach
Via Python, I used various tools and prebuilt functions in the OpenCV library. The primary function in this approach is cv2.inpaint. This function takes in an image file and a mask file, and interpolates the original image wherever the mask pixels are nonzero.
The main step here is determining the mask file, which I did by detecting the corners at the breaks in the river. The mask file guides the inpaint function to fill in pixels according to the patterns in the surrounding pixels.
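For reference, the inpaint call itself is small; a minimal sketch on synthetic data (the array names here are made up), matching the radius and method used in the code below:

import numpy as np
import cv2

img = np.uint8(255*np.random.rand(100, 100))            # stand-in for the raster section
mask = np.zeros((100, 100), np.uint8)
mask[40:60, 45:55] = 1                                  # nonzero where data should be filled in
filled = cv2.inpaint(img, mask, 12, cv2.INPAINT_TELEA)  # inpaint radius 12, Telea method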
Problem
My issue is that the inpainting happens from all directions, whereas I need it to extrapolate pixel data only from the river itself. The image below shows the flawed result: inpaint works, but it considers data from outside the river too.
Inpainted Result
Here is my code if you are so kind as to help:
import scipy.io as sio
import numpy as np
from matplotlib import pyplot as plt
import cv2

matfile = sio.loadmat('box.mat') ## box.mat original raster file linked below
ztopo = matfile['box']

# Crop smaller section for analysis
ztopo2 = ztopo[200:500, 1400:1700]

## Step 1) Isolate river
river = ztopo2.copy()
river[ztopo2 < 217.5] = 0
# This will become problematic for the entire watershed w/o proper slicing

## Step 2) Detect corners
dst = cv2.cornerHarris(river, 3, 7, 0.04)
# cornerHarris arguments adjust qualities of corner markers

# Dilate image (unnecessary)
#dst = cv2.dilate(dst,None)

# Threshold for an optimal value; it may vary depending on the image.
# This adjusts what defines a corner
river2 = river.copy()
river2[dst > 0.01*dst.max()] = [150]

## Step 3) Remove river and keep corners
# Zero out every pixel that is not a detected corner
rows, columns = river2.shape
for i in np.arange(rows):
    for j in np.arange(columns):
        if river2[i, j] != 150:
            river2[i, j] = 0

# Save corners as a new image for import during the next step.
# Import must be via cv2, as thresholding and contour detection only work on BGR files;
# the sio import above loads a 1-dimensional image rather than BGR.
cv2.imwrite('corners.png', river2)

## Step 4) Create mask image by defining and filling a contour around the previously detected corners
# Step 4 code retrieved from http://dsp.stackexchange.com/questions/2564/opencv-c-connect-nearby-contours-based-on-distance-between-them
# Article: OpenCV/C++ connect nearby contours based on distance between them

# Decide whether two contours are close enough to be connected
def find_if_close(cnt1, cnt2):
    row1, row2 = cnt1.shape[0], cnt2.shape[0]
    for i in xrange(row1):
        for j in xrange(row2):
            dist = np.linalg.norm(cnt1[i]-cnt2[j])
            if abs(dist) < 50:
                return True
            elif i == row1-1 and j == row2-1:
                return False

# Import image of corners created in step 3 so thresholding can function properly
img = cv2.imread('corners.png')

# Thresholding and finding contours only work on a grayscale image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
contours, hier = cv2.findContours(thresh, cv2.RETR_EXTERNAL, 2)

LENGTH = len(contours)
status = np.zeros((LENGTH, 1))

for i, cnt1 in enumerate(contours):
    x = i
    if i != LENGTH-1:
        for j, cnt2 in enumerate(contours[i+1:]):
            x = x+1
            dist = find_if_close(cnt1, cnt2)
            if dist == True:
                val = min(status[i], status[x])
                status[x] = status[i] = val
            else:
                if status[x] == status[i]:
                    status[x] = i+1

unified = []
maximum = int(status.max())+1
for i in xrange(maximum):
    pos = np.where(status == i)[0]
    if pos.size != 0:
        cont = np.vstack([contours[i] for i in pos])
        hull = cv2.convexHull(cont)  # convex hull around each group of nearby contours
        unified.append(hull)

cv2.drawContours(img, unified, -1, (0,255,0), 1)  # last argument specifies contour width
cv2.drawContours(thresh, unified, -1, 255, -1)
# thresh is the filled contour, while img is the contour itself
# The contour surrounds the corners
#cv2.imshow('pic', thresh) # produces a black and white image

## Step 5) Merge via inpaint
river = np.uint8(river)
ztopo2 = np.uint8(ztopo2)
thresh[thresh > 0] = 1
merged = cv2.inpaint(river, thresh, 12, cv2.INPAINT_TELEA)

plt.imshow(merged)
plt.colorbar()
plt.show()

Camera Calibration with OpenCV - How to adjust chessboard square size?

I am working on a camera calibration program using the OpenCV/Python example (from: OpenCV Tutorials) as a guidebook.
Question: How do I tailor this example code to account for the size of a square on a particular chessboard pattern? My understanding of the camera calibration process is that this information must somehow be used; otherwise the values given by cv2.calibrateCamera() will be incorrect.
Here is the portion of my code that reads in image files and runs through the calibration process to produce the camera matrix and other values.
import cv2
import numpy as np
import glob

"""
Corner Finding
"""
# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# Prepare object points, like (0,0,0), (1,0,0), ..., (4,4,0)
objp = np.zeros((5*5, 3), np.float32)
objp[:, :2] = np.mgrid[0:5, 0:5].T.reshape(-1, 2)

# Arrays to store object points and image points from all images
objpoints = []
imgpoints = []

counting = 0

# Import Images
images = glob.glob('dir/sub dir/Images/*')

for fname in images:
    img = cv2.imread(fname)                      # Read images
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # Convert to grayscale

    # Find the chess board corners
    ret, corners = cv2.findChessboardCorners(gray, (5,5), None)

    # If found, add object points, image points (after refining them)
    if ret == True:
        objpoints.append(objp)
        cv2.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria)
        imgpoints.append(corners)

        # Draw and display corners
        cv2.drawChessboardCorners(img, (5,5), corners, ret)
        counting += 1
        print str(counting) + ' Viable Image(s)'
        cv2.imshow('img', img)
        cv2.waitKey(500)

cv2.destroyAllWindows()

# Calibrate Camera
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
Here, if your square size is, say, 30 mm, multiply that value into objp[:,:2], like this:
objp[:,:2] = np.mgrid[0:9,0:6].T.reshape(-1,2)*30 # 30 mm square size
objp[:,:2] is the set of chessboard corner points, given as (0,0), (0,1), (0,2) ... (8,5); (0,0) is the upper-left-most square corner and (8,5) is the lower-right-most one. In this form these points have no unit, but if we multiply them by the square size (for example 30 mm), they become (0,0), (0,30), ..., (240,150), which are real-world units. Your translation vector will be in mm units in this case.
From here: https://docs.opencv.org/4.5.1/dc/dbb/tutorial_py_calibration.html
What about the 3D points from real world space? Those images are taken from a static camera and chess boards are placed at different locations and orientations. So we need to know (X,Y,Z) values. But for simplicity, we can say chess board was kept stationary at XY plane, (so Z=0 always) and camera was moved accordingly. This consideration helps us to find only X,Y values. Now for X,Y values, we can simply pass the points as (0,0), (1,0), (2,0), ... which denotes the location of points. In this case, the results we get will be in the scale of size of chess board square. But if we know the square size, (say 30 mm), we can pass the values as (0,0), (30,0), (60,0), ... . Thus, we get the results in mm. (In this case, we don't know square size since we didn't take those images, so we pass in terms of square size).
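A minimal sketch of the same idea, assuming a 5x5 corner grid like the question's and an illustrative 30 mm square:

import numpy as np

square_size_mm = 30.0
objp = np.zeros((5*5, 3), np.float32)
objp[:, :2] = np.mgrid[0:5, 0:5].T.reshape(-1, 2) * square_size_mm
# Corner (i, j) now sits at (30*i mm, 30*j mm, 0), so the tvecs returned by
# cv2.calibrateCamera (and later solvePnP translations) come out in millimetres.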
