I know this question has been asked a few times, but the answers don't solve my problem.
I want to calibrate a pair of cameras to use as stereo input.
But when I run the code I get this error message:
OpenCV(3.4.1) Error: Assertion failed (nimages > 0 && nimages == (int)imagePoints1.total() && (!imgPtMat2 || nimages == (int)imagePoints2.total())) in collectCalibrationData, file /tmp/opencv-20180529-49540-yj8rbk/opencv-3.4.1/modules/calib3d/src/calibration.cpp, line 3133
Traceback (most recent call last):
File "/Users/MyName/Pycharm/Project/calibration.py", line 342, in <module>
cv2.error: OpenCV(3.4.1) /tmp/opencv-20180529-49540-yj8rbk/opencv-3.4.1/modules/calib3d/src/calibration.cpp:3133: error: (-215) nimages > 0 && nimages == (int)imagePoints1.total() && (!imgPtMat2 || nimages == (int)imagePoints2.total()) in function collectCalibrationData
My code is:
def distortion_matrix(path, objpoints, imgpoints):
    for item in os.listdir(path):
        if item.endswith(".jpg"):
            cap = cv2.VideoCapture(path+item, cv2.CAP_IMAGES)
            ret, img = cap.read() # Capture frame-by-frame
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            keypoints = blobDetector.detect(gray) # Detect blobs.
            im_with_keypoints = cv2.drawKeypoints(img, keypoints, np.array([]), (0, 255, 0),
            im_with_keypoints_gray = cv2.cvtColor(im_with_keypoints, cv2.COLOR_BGR2GRAY)
            ret, corners = cv2.findCirclesGrid(im_with_keypoints, (4, 11), None,
            if ret == True:
                corners2 = cv2.cornerSubPix(im_with_keypoints_gray, corners, (11, 11), (-1, -1),
_, leftCameraMatrix, leftDistortionCoefficients, _, _ , objpoints0, imgpoints0 = distortion_matrix("./calibration/left/", objpoints0, imgpoints0)
_, rightCameraMatrix, rightDistortionCoefficients, _, _, objpoints1, imgpoints1 = distortion_matrix("./calibration/right/", objpoints1, imgpoints1)
(_, _, _, _, _, rotationMatrix, translationVector, _, _) = cv2.stereoCalibrate( objp, imgpoints0, imgpoints1,
leftCameraMatrix, leftDistortionCoefficients,
rightCameraMatrix, rightDistortionCoefficients,
imageSize, None, None, None, None,
Most of the time when this error is thrown, the message seems to refer to arrays (imgpoints and objpoints) that are empty or unevenly filled.
But in the end both have length 20 (I scan 20 images, so this seems right), and every entry holds 44 points (the circle grid I use has 44 points, so this also seems right).
**Edit:**
My objp, imgpoints and objpoints are defined like this:
objp = np.zeros((np.prod(pattern_size), 3), np.float32)
objp[0] = (0, 0, 0)
objp[1] = (0, 2, 0)
objp[2] = (0, 4, 0)
objp[3] = (0, 6, 0)
objpoints0 = []
objpoints1 = []
imgpoints0 = []
imgpoints1 = []
**Edit 2:**
If NUM_IMAGES stands for the number of images, I think I've got it now. But it only works when I add the new axis after I call distortion_matrix().
Then the code is able to complete. I need to test the results, but at least this problem seems to be solved.
Thank you very much
You said you are doing stereo calibration. Is there any case where some of the points on your grid are not visible from the other camera? This error may appear when one of your views is unable to detect all points on the calibration pattern. Three points to consider are:
1- Make sure your object points are 3D.
2- Make sure your left points, right points and object points have the same size (number of views).
3- Make sure your left points, right points and object points have the same number of points at each index of the list.
Edit: Your object points objp must contain a list/vector of 3D points for every view. Currently its shape is something like (44, 3); it must be (NUM_IMAGES, 44, 3). You can achieve this with objp = np.repeat(objp[np.newaxis, :, :], NUM_IMAGES, axis=0)
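The three checks and the np.repeat fix can be sketched with NumPy alone. This is only an illustration under assumed data: the helper names replicate_object_points and check_stereo_inputs are hypothetical, and the zero-filled arrays stand in for real detections from 20 views of a 44-point grid.

```python
import numpy as np

def replicate_object_points(objp, num_images):
    """Stack one (N, 3) pattern description into (num_images, N, 3)."""
    return np.repeat(objp[np.newaxis, :, :], num_images, axis=0).astype(np.float32)

def check_stereo_inputs(object_points, left_points, right_points):
    """Return a list of problems; an empty list means the shapes agree."""
    problems = []
    # Check 2: same number of views in all three containers.
    if not (len(object_points) == len(left_points) == len(right_points)):
        problems.append("different number of views")
        return problems
    for i, (obj, left, right) in enumerate(zip(object_points, left_points, right_points)):
        obj = np.asarray(obj)
        # Check 1: object points must be 3D.
        is_3d = obj.ndim >= 1 and obj.shape[-1] == 3
        if not is_3d:
            problems.append(f"view {i}: object points are not 3D")
        # Check 3: same number of points per view.
        n_obj = obj.reshape(-1, 3).shape[0] if is_3d else -1
        n_left = np.asarray(left).reshape(-1, 2).shape[0]
        n_right = np.asarray(right).reshape(-1, 2).shape[0]
        if not (n_obj == n_left == n_right):
            problems.append(f"view {i}: point counts differ ({n_obj}, {n_left}, {n_right})")
    return problems

# Hypothetical data: 20 views, detections shaped like OpenCV's (N, 1, 2) output.
NUM_IMAGES = 20
objp = np.zeros((44, 3), np.float32)
object_points = replicate_object_points(objp, NUM_IMAGES)
left = [np.zeros((44, 1, 2), np.float32) for _ in range(NUM_IMAGES)]
right = [np.zeros((44, 1, 2), np.float32) for _ in range(NUM_IMAGES)]

print(object_points.shape)
print(check_stereo_inputs(object_points, left, right))
# Passing the un-replicated (44, 3) array is reported as a view-count mismatch.
print(check_stereo_inputs(objp, left, right))
```

Running a check like this before cv2.stereoCalibrate makes it easier to see which of the three conditions is violated, since the assertion message itself does not say.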
I am getting the following error when trying to use the cv2.solvePnP() function while trying to do camera calibration. Here is my code below.
My python version is 3.10.9 and my OpenCV version is 4.7.0
import cv2
import numpy as np
import glob
from natsort import natsorted
import os
Xc, Yc, Zc = 0, 0, 0
camera_axis_len = 0.1
number_of_squares_X = 8 # Number of chessboard squares along the x-axis
number_of_squares_Y = 5 # Number of chessboard squares along the y-axis
nX = number_of_squares_X - 1 # Number of interior corners along x-axis
nY = number_of_squares_Y - 1 # Number of interior corners along y-axis
square_size = 0.032 # Length of the side of a square in meters
image_points_left = []
image_points_right = []
object_points_left = []
object_points_right = []
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
object_points_3D = np.zeros((nX * nY, 3), np.float32)
object_points_3D[:,:2] = np.mgrid[0:nY, 0:nX].T.reshape(-1, 2)
object_points_3D = object_points_3D * square_size
images = natsorted(glob.glob('calib_pictures_left/*.JPG'))
count = 0
corners_2 =[]
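for image_file in images:
    image = cv2.imread(image_file)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    success, corners = cv2.findChessboardCorners(gray, (nY, nX), cv2.CALIB_CB_ADAPTIVE_THRESH + cv2.CALIB_CB_FAST_CHECK + cv2.CALIB_CB_NORMALIZE_IMAGE)
    if success == True:
        corners_2 = cv2.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria)
        # collect the points of this view (assumed: the append lines are not shown,
        # but the array shapes reported with the error imply them)
        object_points_left.append(object_points_3D)
        image_points_left.append(corners_2)
        cv2.drawChessboardCorners(image, (nY, nX), corners_2, success)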
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(object_points_left, image_points_left, (1920,1080),None,None, flags=None)
object_points_left = np.array(object_points_left,dtype=np.float32)
image_points_left = np.array(image_points_left, dtype=np.float32)
retval, rvec, tvec = cv2.solvePnP(object_points_left,image_points_left , mtx,dist,flags=None)
R, _ = cv2.Rodrigues(rvec)
This is the error I am getting:
retval, rvec, tvec = cv2.solvePnP(object_points_left,image_points_left , mtx,dist,flags=None)
cv2.error: OpenCV(4.7.0) /io/opencv/modules/calib3d/src/solvepnp.cpp:838: error: (-215:Assertion failed) ( (npoints >= 4) || (npoints == 3 && flags == SOLVEPNP_ITERATIVE && useExtrinsicGuess) || (npoints >= 3 && flags == SOLVEPNP_SQPNP) ) && npoints == std::max(ipoints.checkVector(2, CV_32F), ipoints.checkVector(2, CV_64F)) in function 'solvePnPGeneric'
I attempted to fix the error by following solutions that convert the image points and object points to NumPy arrays of dtype float32, but the error persists. I've also ensured there's the same number of entries (26) in the object points and the image points.
I am not sure how to rectify this.
This is the shape of the array for the image points:
(26, 28, 1, 2)
This is the shape of the array for the object points:
(26, 28, 3)
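One likely reading of the assertion is that cv2.solvePnP estimates the pose for a single view, so it expects N object points and N image points rather than a stack of 26 views. A minimal sketch of splitting the stacked arrays into per-view pairs follows; the helper name split_into_views is hypothetical and the zero-filled arrays only mimic the shapes reported above.

```python
import numpy as np

def split_into_views(object_points, image_points):
    """Return one ((N, 3), (N, 2)) float32 pair per calibration view."""
    object_points = np.asarray(object_points, dtype=np.float32)
    image_points = np.asarray(image_points, dtype=np.float32)
    if len(object_points) != len(image_points):
        raise ValueError("object and image points cover a different number of views")
    return [(obj.reshape(-1, 3), img.reshape(-1, 2))
            for obj, img in zip(object_points, image_points)]

# Hypothetical stand-ins with the shapes from the question.
object_points_left = np.zeros((26, 28, 3), np.float32)
image_points_left = np.zeros((26, 28, 1, 2), np.float32)

views = split_into_views(object_points_left, image_points_left)
print(len(views), views[0][0].shape, views[0][1].shape)

# Each pair could then be passed on its own, for example:
# for obj, img in views:
#     retval, rvec, tvec = cv2.solvePnP(obj, img, mtx, dist)
```

Note that cv2.calibrateCamera already returns one rvec and tvec per view, so a separate solvePnP call may only be needed for images that were not part of the calibration set.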
I am trying to draw a shape and then check whether a point is inside the shape or not. I thought using cv2.polylines() to draw it and cv2.pointPolygonTest() to test would work, but I am getting an error that is not very informative.
Traceback (most recent call last):
File "C:\Users\XXX\Desktop\Heatmap\cvtest.py", line 32, in <module>
dist = cv2.pointPolygonTest(cv2.polylines(img,[pts],True,(0, 255 , 0), 2), (52,288), False)
cv2.error: OpenCV(4.1.0) C:\projects\opencv-python\opencv\modules\imgproc\src\geometry.cpp:103: error: (-215:Assertion failed) total >= 0 && (depth == CV_32S || depth == CV_32F) in function 'cv::pointPolygonTest'
I am guessing the shape created with cv2.polylines() is not a contour. What would be the correct way to do it then? My current code:
import cv2
import numpy as np
img = cv2.imread('image.png')
pts = np.array([[18,306],[50,268],[79,294],[165,328],[253,294],[281,268],[313,306],[281,334],[270,341],[251,351],[230,360],[200,368],[165,371],[130,368],[100,360],[79,351],[50,334],[35,323]], np.int32)
pts = pts.reshape((-1,1,2))
dist = cv2.pointPolygonTest(cv2.polylines(img,[pts],True,(0, 255 , 0), 2), (52,288), False)
cv2.imshow('test', img)
polylines is not the right input; it is only used to draw a shape (docs).
pointPolygonTest instead needs the contour itself as an input (docs).
dist = cv2.pointPolygonTest(pts, (52,288), False) will return 1.0, meaning the point is inside the contour.
Note that you can perform a pointPolygonTest without an image. But if you want to draw the results, you can use this code as a starter:
import cv2
import numpy as np
#create background
img = np.zeros((400,400),dtype=np.uint8)
# define shape
pts = np.array([[18,306],[50,268],[79,294],[165,328],[253,294],[281,268],[313,306],[281,334],[270,341],[251,351],[230,360],[200,368],[165,371],[130,368],[100,360],[79,351],[50,334],[35,323]], np.int32)
pts = pts.reshape((-1,1,2))
# draw shape
cv2.polylines(img,[pts],True,(255), 2)
# draw point of interest
# perform pointPolygonTest
dist = cv2.pointPolygonTest(pts, (52,288), False)
# show image
cv2.imshow('test', img)
I am trying to apply a mask I have made to an image using OpenCV (3.3.1) in Python (3.6.5) to extract all the skin. I am looping over a photo, checking windows and classifying them using two premade sklearn GMMs. If the window is skin I change that area of the mask to True (255); otherwise I leave it as 0.
I have initialized the numpy array to hold the mask before the loop to be the same dimensions as the image, but openCV keeps saying that the image and mask do not have the same dimensions (output and error message are below). I have seen other somewhat similar problems on the site but none with solutions that have worked for me.
Here is my code:
# convert the image to hsv
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
delta = 6
# create an empty np array to make the mask
#mask = np.zeros((img.shape[0], img.shape[1], 1))
mask = np.zeros(img.shape[:2])
# loop through image and classify each window
for i in range(0,hsv.shape[0],delta):
    for j in range(0,hsv.shape[1],delta):
        # get a copy of the window
        arr = np.copy(hsv[i:i+delta,j:j+delta,0])
        # create a normalized hue histogram for the window
        if arr.sum() > 0:
            arr = np.histogram(np.ravel(arr/arr.sum()), bins=100, range=(0,1))
        else:
            arr = np.histogram(np.ravel(arr), bins=100, range=(0,1))
        # take the histogram and reshape it
        arr = arr[0].reshape(1,-1)
        # get the probabilities that the window is skin or not skin
        skin = skin_gmm.predict_proba(arr)
        not_skin = background_gmm.predict_proba(arr)
        if skin > not_skin:
            # because the window is more likely skin than not skin
            # we fill that window of the mask with ones
            # (assumed: the assignment itself is not shown; 255 follows the description above)
            mask[i:i+delta,j:j+delta] = 255
# apply the mask to the original image to extract the skin
masked_img = cv2.bitwise_and(img, img, mask = mask)
The output is:
(2816, 2112)
(2816, 2112, 3)
OpenCV Error: Assertion failed ((mtype == 0 || mtype == 1) &&
_mask.sameSize(*psrc1)) in cv::binary_op, file C:\ci\opencv_1512688052760
\work\modules\core\src\arithm.cpp, line 241
Traceback (most recent call last):
File "skindetector_hist.py", line 183, in <module>
File "skindetector_hist.py", line 173, in main
skin = classifier_mask(img, skin_gmm, background_gmm)
File "skindetector_hist.py", line 63, in classifier_mask
masked_img = cv2.bitwise_and(img, img, mask = mask)
cv2.error: C:\ci\opencv_1512688052760\work\modules\core\src
\arithm.cpp:241: error: (-215) (mtype == 0 || mtype == 1) &&
_mask.sameSize(*psrc1) in function cv::binary_op
As you can see in the output, the image and mask have the same width and height. I have also tried making the mask have depth one (line 5) but that didn't help. Thank you for any help!
The assertion checks not only the size of the mask but also its type. The error:
OpenCV Error: Assertion failed ((mtype == 0 || mtype == 1) &&
means that either the type of the mask is not allowed or its size (which in your case matches) differs from the image. In the documentation we see:
mask – optional operation mask, 8-bit single channel array, that
specifies elements of the output array to be changed.
This is consistent with the error, which asks for type 0 (CV_8U) or 1 (CV_8S).
Also, even though it is not stated, img should not be float either, since that will not give the desired result (although it will probably still run).
The solution is probably just to change:
mask = np.zeros(img.shape[:2])
to:
mask = np.zeros(img.shape[:2], dtype=np.uint8)
A small test shows what type you get by default: np.zeros without an explicit dtype gives you dtype('float64'), which means doubles and not 8-bit values.
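The dtype difference can be reproduced with NumPy alone. The sketch below also shows one way to convert an already filled float mask instead of changing its creation; the helper as_cv_mask and the tiny 6x8 shape are hypothetical choices for illustration.

```python
import numpy as np

def as_cv_mask(mask):
    """Convert any numeric mask to an 8-bit single-channel array of 0 and 255."""
    return np.where(np.asarray(mask) > 0, 255, 0).astype(np.uint8)

shape = (6, 8)  # hypothetical image height and width

# Default dtype: this is what triggers the assertion when used as a mask.
float_mask = np.zeros(shape)
float_mask[2:4, 3:5] = 255
print(float_mask.dtype)

# Converted mask: 8-bit, same height and width as before.
fixed = as_cv_mask(float_mask)
print(fixed.dtype, fixed.shape)
```

Creating the mask with dtype=np.uint8 from the start, as suggested above, avoids the extra conversion step.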
I am trying to calibrate a fisheye lens following these instructions
where you can find the full code I'm using for the calibration part.
I arrive at this point where:
N_OK = len(objpoints)
K = np.zeros((3, 3))
D = np.zeros((4, 1))
rvecs = [np.zeros((1, 1, 3), dtype=np.float64) for i in range(N_OK)]
tvecs = [np.zeros((1, 1, 3), dtype=np.float64) for i in range(N_OK)]
rms, _, _, _, _ = \
print("Found " + str(N_OK) + " valid images for calibration")
print("DIM=" + str(_img_shape[::-1]))
print("K=np.array(" + str(K.tolist()) + ")")
print("D=np.array(" + str(D.tolist()) + ")")
I get this error:
Traceback (most recent call last)
<ipython-input-10-deaca9981fe4> in <module>()
13 tvecs,
14 calibration_flags,
---> 15 (cv2.TERM_CRITERIA_EPS+cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
16 )
17 print("Found " + str(N_OK) + " valid images for calibration")
error: C:\ci\opencv_1512688052760\work\modules\calib3d\src\fisheye.cpp:1414:
error: (-3) CALIB_CHECK_COND - Ill-conditioned matrix for input array 0 in
function cv::internal::CalibrateExtrinsics
I don't understand what's going on, and I could find very little information about this on the internet. Has anyone experienced something similar and knows how to solve it?
These are the images of the checkerboard I'm using:
I think it is because your variable calibration_flags has CALIB_CHECK_COND set.
Try disabling this flag. Without it I was able to undistort your images (see links below).
I am not sure what this check is for (the documentation is not very explicit). This flag rejects some images¹ from my GoPro Hero 3 even when the chessboard is visible and detected. In my case one image among 20 does not pass this test. This image has the chessboard close to the left border.
¹ in OpenCV versions >= 3.4.1 the error message tells you which image is not passing the test
I did not find the code in Python, so I manually checked the images with the chessboard near the edge and deleted them one by one until the error was gone.
As @Ahmadiah mentioned, the "ill-conditioned" error can happen when the checkerboard falls near the edge of the image. One way to handle this is to remove the images that cause calibration to fail one by one and try again. Here is an example where we do that:
def calibrate_fisheye(all_image_points, all_true_points, image_size):
    """ Calibrate a fisheye camera from matching points.
    :param all_image_points: Sequence[Array(N, 2)[float32]] of (x, y) image coordinates of the points. (see cv2.findChessboardCorners)
    :param all_true_points: Sequence[Array(N, 3)[float32]] of (x,y,z) points. (If from a grid, just put (x,y) on a regular grid and z=0)
        Note that each of these sets of points can be in its own reference frame,
    :param image_size: The (size_y, size_x) of the image.
    :return: (rms, mtx, dist, rvecs, tvecs) where
        rms: float - The root-mean-squared error
        mtx: array[3x3] A 3x3 camera intrinsics matrix
        dst: array[4x1] A (4x1) array of distortion coefficients
        rvecs: Sequence[array[N,3,1]] of estimated rotation vectors for each set of true points
        tvecs: Sequence[array[N,3,1]] of estimated translation vectors for each set of true points
    """
    assert len(all_true_points) == len(all_image_points)
    all_true_points = list(all_true_points)  # Because we'll modify it in place
    all_image_points = list(all_image_points)
    while True:
        assert len(all_true_points) > 0, "There are no valid images from which to calibrate."
        try:
            rms, mtx, dist, rvecs, tvecs = cv2.fisheye.calibrate(
                objectPoints=[p[None, :, :] for p in all_true_points],
                imagePoints=[p[:, None, :] for p in all_image_points],
                image_size=image_size,
                K=np.zeros((3, 3)),
                D=np.zeros((4, 1)),
                flags=cv2.fisheye.CALIB_RECOMPUTE_EXTRINSIC + cv2.fisheye.CALIB_CHECK_COND + cv2.fisheye.CALIB_FIX_SKEW,
                criteria=(cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-6),
            )
            print('Found a calibration based on {} well-conditioned images.'.format(len(all_true_points)))
            return rms, mtx, dist, rvecs, tvecs
        except cv2.error as err:
            try:
                # Parse index of invalid image from error message (str(err) works on Python 3)
                idx = int(str(err).split('array ')[1].split()[0])
                # assumed: drop the offending view from both lists, then retry
                del all_true_points[idx]
                del all_image_points[idx]
                print("Removed ill-conditioned image {} from the data. Trying again...".format(idx))
            except IndexError:
                raise err
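Extracting the image index from the error text is the fragile part of this approach. A small sketch with a regular expression is shown below; it assumes the message contains the phrase "input array N" as in the traceback quoted in the question, and the function name parse_rejected_image_index is hypothetical.

```python
import re

def parse_rejected_image_index(message):
    """Return the index named in an 'input array N' error message, or None."""
    match = re.search(r"input\s+array\s+(\d+)", message)
    return int(match.group(1)) if match else None

# Message text taken from the traceback in the question.
sample = ("error: (-3) CALIB_CHECK_COND - Ill-conditioned matrix for input array 0 in "
          "function cv::internal::CalibrateExtrinsics")

print(parse_rejected_image_index(sample))
print(parse_rejected_image_index("matrix for input array 12 in function"))
print(parse_rejected_image_index("some unrelated error"))
```

Returning None for unrelated messages lets the caller re-raise the original exception instead of removing the wrong image.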
My OpenCV version is 4.0+, and I solved it with this method:
rms, _, _, _, _ = \
def run(self):
    while True:
        _ret, frame = self.cam.read()
        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        vis = frame.copy()
        if len(self.tracks) > 0:
            img0, img1 = self.prev_gray, frame_gray
            p0 = np.float32([tr[-1] for tr in self.tracks]).reshape(-1, 1, 2)
            p1, _st, _err = cv2.calcOpticalFlowPyrLK(img0, img1, p0, None, **lk_params)
            p0r, _st, _err = cv2.calcOpticalFlowPyrLK(img1, img0, p1, None, **lk_params)
            d = abs(p0-p0r).reshape(-1, 2).max(-1)
            good = d < 1
            new_tracks = []
            for i in range(len(p1)):
                A.append(math.sqrt((p1[i][0][0])**2 + (p1[i][0][1])**2))
            counts,bins,bars = plt.hist(A)
            for tr, (x, y), good_flag in zip(self.tracks, p1.reshape(-1, 2), good):
                if not good_flag:
                    continue
                tr.append((x, y))
                if len(tr) > self.track_len:
                    del tr[0]
                new_tracks.append(tr)
                cv2.circle(vis, (x, y), 2, (0, 255, 0), -1)
            self.tracks = new_tracks
            cv2.polylines(vis, [np.int32(tr) for tr in self.tracks], False, (0, 255, 0))
            draw_str(vis, (20, 20), 'track count: %d' % len(self.tracks))
        if self.frame_idx % self.detect_interval == 0:
            mask = np.zeros_like(frame_gray)
            mask[:] = 255
            for x, y in [np.int32(tr[-1]) for tr in self.tracks]:
                cv2.circle(mask, (x, y), 5, 0, -1)
            p = cv2.goodFeaturesToTrack(frame_gray, mask = mask, **feature_params)
            if p is not None:
                for x, y in np.float32(p).reshape(-1, 2):
                    self.tracks.append([(x, y)])
        self.frame_idx += 1
        self.prev_gray = frame_gray
        cv2.imshow('lk_track', vis)
        ch = cv2.waitKey(1)
        if ch == 27:
            break
I am using lk_track.py from the OpenCV samples to try to detect a moving object. I am trying to find the camera motion using the histogram of the magnitudes of the optical flow vectors, and then calculate the average of the largest group of similar values, which should be directly proportional to the camera motion. I have calculated the magnitudes of the vectors and saved them in a list A. Can someone suggest how to find the largest group of similar values in it and calculate the average of only those values?
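One way to read "the largest group of similar values" is the most populated histogram bin. The sketch below averages only the values that fall into that bin. It assumes A holds displacement magnitudes (the length of p1 - p0 per feature), the sample numbers are made up, and dominant_bin_mean is a hypothetical helper name.

```python
import numpy as np

def dominant_bin_mean(values, bins=10):
    """Average of the values that fall into the most populated histogram bin."""
    values = np.asarray(values, dtype=np.float64)
    counts, edges = np.histogram(values, bins=bins)
    k = int(np.argmax(counts))
    lo, hi = edges[k], edges[k + 1]
    # np.histogram treats the last bin as closed on the right, all others as half-open.
    if k == len(counts) - 1:
        in_bin = (values >= lo) & (values <= hi)
    else:
        in_bin = (values >= lo) & (values < hi)
    return float(values[in_bin].mean())

# Made-up magnitudes: most features move about 4 px, two are outliers.
A = [4.0, 4.1, 4.2, 4.0, 4.1, 12.5, 0.3]
print(dominant_bin_mean(A))
```

The result depends on the bin count, so the number of bins would need tuning to the spread of real flow magnitudes.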
I created a toy problem to model the approach of binarizing the images by optical flow. This is a massively simplified view of the problem, but gives the general idea well. I'll split the problem up into a few chunks and give functions for them. If you're working directly with video, there will be a lot of additional code needed of course, and I just hardcoded a lot of values that you'll need to turn into parameters.
The first function is just for generating the image sequence. The images are moving through a scene with an object moving inside the sequence. The image sequence is just simply translating through the scene, and the object appears stationary in the sequence, but that means that the object is actually moving in the opposite direction of the camera of course.
import numpy as np
import cv2
def gen_seq():
    """Generate motion sequence with an object"""
    scene = cv2.GaussianBlur(np.uint8(255*np.random.rand(400, 500)), (21, 21), 3)
    h, w = 400, 400
    step = 4
    obj_mask = np.zeros((h, w), bool)  # np.bool was removed from recent NumPy releases
    obj_h, obj_w = 50, 50
    obj_x, obj_y = 175, 175
    obj_mask[obj_y:obj_y+obj_h, obj_x:obj_x+obj_w] = True
    obj_data = np.uint8(255*np.random.rand(obj_h, obj_w)).ravel()
    imgs = []
    for i in range(0, 1+w//step, step):
        img = scene[:, i:i+w].copy()
        img[obj_mask] = obj_data
        imgs.append(img)
    return imgs
# generate image sequence
imgs = gen_seq()
# display images
for img in imgs:
    cv2.imshow('Image', img)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
So here's the basic image sequence visualized. I just used a random scene, translated through, and added a random object in the center.
Great! Now we need to calculate the flow between each frame. I used dense flow here, but sparse flow would be more robust for actual images.
def find_flows(imgs):
    """Finds the dense optical flows"""
    optflow_params = [0.5, 3, 15, 3, 5, 1.2, 0]
    prev = imgs[0]
    flows = []
    for img in imgs[1:]:
        flow = cv2.calcOpticalFlowFarneback(prev, img, None, *optflow_params)
        flows.append(flow)
        prev = img
    return flows
# find optical flows between images
flows = find_flows(imgs)
# display flows
h, w = imgs[0].shape[:2]
hsv = np.zeros((h, w, 3), dtype=np.uint8)
hsv[..., 1] = 255
for flow in flows:
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv[..., 0] = ang*180/np.pi/2
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    cv2.imshow('Flow', rgb)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
Here I colorized the flow based on its angle and magnitude. The angle determines the color and the magnitude determines the intensity/brightness of the color. This is the same view the OpenCV tutorial on dense optical flow uses.
Then, we need to binarize this flow so that we get two distinct sets of pixels based on how they're moving. In the sparse case, this works out the same except you will get two distinct sets of features.
def label_flows(flows):
    """Binarizes the flows by direction and magnitude"""
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    flags = cv2.KMEANS_RANDOM_CENTERS  # assumed: the definition of flags is not shown
    h, w = flows[0].shape[:2]
    labeled_flows = []
    for flow in flows:
        flow = flow.reshape(h*w, -1)
        comp, labels, centers = cv2.kmeans(flow, 2, None, criteria, 10, flags)
        n = np.sum(labels == 1)
        camera_motion_label = np.argmax([labels.size-n, n])
        labeled = np.uint8(255*(labels.reshape(h, w) == camera_motion_label))
        labeled_flows.append(labeled)
    return labeled_flows
# binarize the flows
labeled_flows = label_flows(flows)
# display binarized flows
for labeled_flow in labeled_flows:
    cv2.imshow('Labeled Flow', labeled_flow)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
cv2.destroyWindow('Labeled Flow')
The annoying thing here is that the labels are assigned randomly, i.e. the labels can differ from frame to frame. If you visualized the binary image directly, it would flip between black and white randomly. I'm only using binary labels, 0 and 1, so I treated the label assigned to more pixels as the "camera motion label" and set that label to white in the resulting images and the other label to black. That way the camera motion label is always the same in each frame. This may need to be much more sophisticated for working on a video feed.
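The majority-label trick can be checked in isolation with NumPy. In this sketch the two label arrays are made up and describe the same segmentation with swapped label ids; majority_mask is a hypothetical helper that mirrors the idea described above.

```python
import numpy as np

def majority_mask(labels, shape):
    """White (255) where a pixel carries the most frequent label, black elsewhere."""
    labels = np.asarray(labels).ravel()
    majority = int(np.argmax(np.bincount(labels)))
    return np.uint8(255 * (labels.reshape(shape) == majority))

# Same segmentation, but k-means happened to swap the label ids between frames.
labels_frame_a = np.array([0, 0, 0, 1, 0, 0])
labels_frame_b = 1 - labels_frame_a

mask_a = majority_mask(labels_frame_a, (2, 3))
mask_b = majority_mask(labels_frame_b, (2, 3))
print(np.array_equal(mask_a, mask_b))
```

Both masks come out identical, which is the property that stops the visualization from flipping between frames.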
But here we have it, a binarized flow where the color is just showing the two distinct sets of flow vectors.
Now if we wanted to find the target in this flow, we could invert the image and find the connected components of the binary image. The inversion will make the camera motion the background label (0). Then each of the black blobs will be white and will be labeled, and we could find the blob relating to the largest component which, in this case, will be the target. That will give a mask around the target, and we can draw the contours of that mask on the original images to see the target being detected. I'll also cut the borders of the image off before finding the connected components so edge effects from dense flow are ignored.
def find_target_in_labeled_flow(labeled_flow):
    labeled_flow = cv2.bitwise_not(labeled_flow)
    bw = 10
    h, w = labeled_flow.shape[:2]
    border_cut = labeled_flow[bw:h-bw, bw:w-bw]
    conncomp, stats = cv2.connectedComponentsWithStats(border_cut, connectivity=8)[1:3]
    target_label = np.argmax(stats[1:, cv2.CC_STAT_AREA]) + 1
    img = np.zeros_like(labeled_flow)
    img[bw:h-bw, bw:w-bw] = 255*(conncomp == target_label)
    return img
for labeled_flow, img in zip(labeled_flows, imgs[:-1]):
    target_mask = find_target_in_labeled_flow(labeled_flow)
    display_img = cv2.merge([img, img, img])
    contours = cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[1]
    display_img = cv2.drawContours(display_img, contours, -1, (0, 255, 0), 2)
    cv2.imshow('Detected Target', display_img)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
And of course this could get some cleaning up, and you won't be doing exactly this for sparse flow. You could just define a region of interest around the tracked points.
Now, there is still a lot of work to do. You have a binarized flow...you can probably assume that the label which occurs most frequently is the camera motion (like I did) safely. However, you'll have to make sure that the other label is the object you're interested in tracking. You'll have to keep track of it between flows so that if it stops moving, you'll know where it is as the camera is moving. When you do the k-means step, you'll want to make sure that the centers from k-means are "far enough" apart so that you know the object is moving or not.
The basic steps for that would be, from the starting frame of the video:
1- If the two centers are "close", then you can assume your object is either not in the scene or not moving in the scene.
2- Once the centers are split far enough apart, you'll have found the object to track. Keep track of the location of the object.
3- During tracking of the object, verify that the location is near a prediction. You can use the optical flow velocity vectors from the previous frame to predict the location of each pixel/feature in the new frame, so make sure your predictions agree with your tracking result.
4- If the object stops moving, the centers from k-means should be close. Keep track of the optical flow vectors around the object location and follow them to have a prediction of where the object is once it resumes moving, and again verify the detected location with this prediction.
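Two of the checks in these steps can be sketched with NumPy: whether the k-means centers are far enough apart, and whether a detected location agrees with a flow-based prediction. The function names and the thresholds min_distance and tolerance are hypothetical and would need tuning on real footage.

```python
import numpy as np

def centers_are_separated(centers, min_distance=1.0):
    """True if the two k-means flow centers differ by at least min_distance pixels."""
    centers = np.asarray(centers, dtype=np.float64)
    return bool(np.linalg.norm(centers[0] - centers[1]) >= min_distance)

def prediction_agrees(previous_location, flow_vector, detected_location, tolerance=5.0):
    """True if the detection lies within tolerance of previous_location + flow_vector."""
    predicted = np.asarray(previous_location, float) + np.asarray(flow_vector, float)
    error = np.linalg.norm(predicted - np.asarray(detected_location, float))
    return bool(error <= tolerance)

# Camera flow of about (-4, 0) px per frame, object flow of about (0, 0).
print(centers_are_separated([[-4.0, 0.0], [0.0, 0.1]]))
# Both centers nearly identical: object absent or not moving relative to the scene.
print(centers_are_separated([[-4.0, 0.0], [-3.9, 0.1]]))
# Object predicted at (104, 200), detected at (105, 201).
print(prediction_agrees((100, 200), (4, 0), (105, 201)))
```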
I've never used these methods before, so I'm not sure how robust they are. The typical approach for HOOF, or "histogram of oriented optical flow", is much more advanced than this (see the seminal paper here). Instead of just binarizing, the idea is to use the histogram from each frame as a probability distribution. The way this distribution changes over time can then be analyzed with the tools of time series analysis, which I assume gives a more robust framework for this approach.
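To make the "histogram as a probability distribution" idea concrete, here is a simplified magnitude-weighted orientation histogram. It is only loosely in the spirit of HOOF and does not reproduce the exact binning scheme of the paper; the function name and the choice of 8 bins are assumptions.

```python
import numpy as np

def orientation_histogram(flow, bins=8):
    """Magnitude-weighted histogram of flow directions, normalized to sum to 1."""
    flow = np.asarray(flow, dtype=np.float64).reshape(-1, 2)
    mag = np.hypot(flow[:, 0], flow[:, 1])
    ang = np.arctan2(flow[:, 1], flow[:, 0])  # angles in [-pi, pi]
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    total = hist.sum()
    # An all-zero flow field has no direction; return the empty histogram unchanged.
    return hist / total if total > 0 else hist

# Made-up dense flow field of shape (h, w, 2) where every pixel moves by (2, 1).
flow = np.tile(np.array([2.0, 1.0]), (4, 5, 1))
hist = orientation_histogram(flow)
print(hist)
```

Computing one such histogram per frame gives a sequence of distributions whose change over time could then be analyzed.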
To avoid the following error with @alkasm's answer:
(-215:Assertion failed) npoints > 0 in function 'drawContours'
simply replace:
contours = cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[1]
with:
contours, _ = cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
I can't post this as a comment below the answer because my account is new and has low reputation.
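The underlying difference is that cv2.findContours returns three values in OpenCV 3.x (image, contours, hierarchy) and two values in OpenCV 4.x (contours, hierarchy), so indexing with [1] picks the hierarchy on 4.x. A version-independent sketch is shown below; grab_contours is a hypothetical helper and the tuples are placeholders, not real OpenCV output.

```python
def grab_contours(find_contours_result):
    """Pick the contour list out of a cv2.findContours return value."""
    if len(find_contours_result) == 2:   # (contours, hierarchy)
        return find_contours_result[0]
    if len(find_contours_result) == 3:   # (image, contours, hierarchy)
        return find_contours_result[1]
    raise ValueError("unexpected findContours result")

# Placeholder return values standing in for the two API variants.
result_v4 = (["contour_a", "contour_b"], "hierarchy")
result_v3 = ("image", ["contour_a", "contour_b"], "hierarchy")

print(grab_contours(result_v4))
print(grab_contours(result_v3))
```

With such a helper the call would read contours = grab_contours(cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)) on either version.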