I am trying to scan an object with a laser to extract 3D point clouds. There are 2 cameras and 1 laser in my setup. What I do is pass the nonzero points of the laser masks to OpenCV's triangulatePoints function as the projPoints arguments. Since triangulatePoints requires the same number of points from both views and there are 2 masks, if one mask has more nonzero points than the other, I simply truncate the longer one to the other's size, like this:
    l1 = len(pts1)
    l2 = len(pts2)
    n = min(l1, l2)          # keep the same number of points in both sets
    newPts1 = pts1[0:n]
    newPts2 = pts2[0:n]
Is there a good way to match the nonzero points between the left and right frames?
First, if your images normally look like that, your sensors are deeply saturated, and consequently your 3D ranges are either worthless or much less accurate than they could be.
Second, you should aim for matching one point per rectified scanline on each image of the pair, rather than a set of points. The whole idea of using a laser stripe is to get a well focused beam of light on as small a spot or band as possible, so you can probe the surface in detail.
For best accuracy, the peak-finding should be done independently on each scanline of the original (distorted and not rectified) images, so it is not affected by the interpolation used by the undistortion and stereo rectification procedures. Rather, you would use the geometrical undistortion and stereo rectification transforms to map the peaks detected in original images into the rectified ones.
There are several classical peak-finding algorithms for laser-stripe triangulation; you may find this other answer of mine useful.
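As a rough illustration of the idea (not the specific algorithm from that answer), here is a minimal per-scanline subpixel peak detector based on an intensity-weighted centroid around the brightest pixel; img, the half-window width and the threshold are assumptions:

    import numpy as np

    # img is the ORIGINAL (unrectified) grayscale image containing the laser stripe (assumption).
    def stripe_peak(row, halfwidth=3, min_val=40):
        c = int(np.argmax(row))                      # brightest pixel on this scanline
        if row[c] < min_val:
            return None                              # no stripe visible on this scanline
        lo, hi = max(0, c - halfwidth), min(len(row), c + halfwidth + 1)
        w = row[lo:hi].astype(np.float64)
        return (np.arange(lo, hi) * w).sum() / w.sum()   # intensity-weighted centroid (subpixel)

    peaks = [stripe_peak(img[y]) for y in range(img.shape[0])]   # one peak (or None) per scanline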
Last, if your setup is expected to be as in the picture, with the laser stripe illuminating two orthogonal planes in addition to the object of interest, then you do not need to use stereo at all: you can solve for the 3D plane spanned by the laser stripe projector and triangulate by intersecting that plane with each ray back-projecting the peaks of the image of the laser stripe on the object. This is similar to one of the methods J. Y. Bouguet used in his old Ph.D. thesis on desktop photography (here is a summary by S. Seitz). One implementation using a laser striper is detailed in this patent. This method is surprisingly accurate: with it we achieved approximately 0.2 mm accuracy in a cubic foot of volume using a dinky 640x480 CCD video camera back in 1999. The patent has expired, so you are free to enjoy it.
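For completeness, a minimal sketch of that plane-ray triangulation; the intrinsic matrix K, the laser-plane parameters (n, d) and the peak coordinates are assumptions, with the plane expressed as n . X = d in camera coordinates:

    import numpy as np

    def backproject(u, v, K):
        # Direction of the ray through pixel (u, v), with the camera center as origin.
        return np.linalg.inv(K) @ np.array([u, v, 1.0])

    def intersect_laser_plane(ray, n, d):
        # Solve n . (t * ray) = d for t, then return the 3D point on the laser plane.
        t = d / (n @ ray)
        return t * ray

    K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])  # example intrinsics (assumption)
    n, d = np.array([0.02, -0.71, 0.70]), 250.0                                 # example laser plane in mm (assumption)
    X = intersect_laser_plane(backproject(u_peak, v_row, K), n, d)              # u_peak, v_row: detected stripe peak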
TLDR: I am looking to fit a triangle to a boomerang-shaped object in order to detect its "head", potentially using Python's OpenCV.
I have a collection of boomerang-shaped objects (see the image below), whose size and internal angles vary. Additionally, the "boomerangs" (unlike a real boomerang) can sometimes be asymmetric, with one leg longer than the other, and can have defects and holes along their legs.
I can accurately extract the contours of these shapes, and am now trying to detect the direction the boomerang is facing (defined as the direction of the "pointy" edge, the one marked by brown dots in the image below).
My plan so far was to use opencv's convex defects method to detect the internal angle, and from there detect the direction. However, my "boomerangs" are not perfect - they sometimes have holes and defects along their legs that confuse the convex defect algorithm.
My question is: is there a way to find the best-fit triangle (much like the best fit ellipse) that would fit the boomerang?
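There is no direct "best-fit triangle" in OpenCV, but as a hedged starting point, cv2.minEnclosingTriangle gives the minimum-area triangle enclosing the contour, and the deepest convexity defect can serve as a heuristic for picking the "head" vertex; contour is assumed to come from cv2.findContours:

    import cv2
    import numpy as np

    # Minimum-area enclosing triangle (not a true least-squares fit, but often close for these shapes).
    area, tri = cv2.minEnclosingTriangle(contour)
    tri = tri.reshape(3, 2)                       # three (x, y) vertices

    # Heuristic (an assumption): the deepest convexity defect marks the inner elbow,
    # and the triangle vertex closest to it is the "head" vertex.
    hull = cv2.convexHull(contour, returnPoints=False)
    defects = cv2.convexityDefects(contour, hull)
    elbow = contour[defects[:, 0, 2][np.argmax(defects[:, 0, 3])]][0]
    head = tri[np.argmin(np.linalg.norm(tri - elbow, axis=1))]

The defect heuristic can still be confused by the holes along the legs, so smoothing the contour first (e.g. with cv2.approxPolyDP) may help.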
I am looking for a robust way to extract the contour of the front panel of a washing machine, or just to get the 4 corner points of the front panel.
I've tried color masking but it didn't give stable results.
Here are some examples:
Three potential options:
Get a bunch of images of the machines, manually determine a label saying where the door is, and then train a convolutional neural network to regress those parameters per image.
Treat each image as a separate optimization problem, where the goal is to estimate the parameters of the rectangle most likely to correspond to the front panel. So our model is theta = (p_1, p_2, p_3, p_4), the four 2D corner locations of the panel in the image. We need an energy function E to minimize w.r.t. theta (e.g., using gradient descent with momentum, or RANSAC). There are a number of terms you could use; here are some ideas:
a. At least some of the corners should be "corner-like": run a simple corner detector, and define an energy E_corner which penalizes distance to the closest corner.
b. At least some of the edges (between p_1 and p_2 or p_3, for example) should be "edge-like": compute the gradient magnitude of the image M = || \nabla I || and enforce that the values of M are larger along the panel edges, using an energy E_edge. E.g., for (x, y) along an edge, let E_edge(x,y) = 1/(1 + M(x,y)) (robust losses tend to work better here, though). A small sketch of this edge term is given below.
c. Use the fact that each door is actually a projected 3D rectangle: e.g., see this question. An interesting idea is to start with a rectangle (representing the panel) and, instead of regressing the p_i's, regress the parameters of an affine transform or even a perspective projection transform (though this requires the algorithm to estimate depth) that maps the starting rectangle to one in the image. You can then regularize the parameters of the estimated transform to prevent unlikely transforms from being output.
d. Use knowledge of what must be inside the rectangle. For instance, given the four corners, you can determine the ellipse defining the round door to the machine. The appearance statistics within that ellipse should be somewhat unique, as well as the edges/image gradient at the door boundary; hence you can define an energy term encouraging the model to choose corners such that the interior has a dark elliptical object on a white background.
Overall, this approach is similar to snakes, or active contour models, which I think are worth looking into for your problem. However, energy-minimizing snakes tend not to consider the inside of the region they enclose; hence, some variant of the Mumford-Shah functional could be a useful addition (though note that smoothness of the "door region" is not entirely desirable in your case).
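As a small sketch of the edge term from (b), under the assumption that the candidate rectangle is given as a list of four corner points in pixel coordinates and gray is the grayscale image:

    import cv2
    import numpy as np

    def edge_energy(gray, corners, samples=50):
        # Gradient magnitude M = ||grad I||.
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        M = cv2.magnitude(gx, gy)
        E = 0.0
        for i in range(4):
            p, q = np.float32(corners[i]), np.float32(corners[(i + 1) % 4])
            for t in np.linspace(0.0, 1.0, samples):            # sample points along edge p -> q
                x, y = (1 - t) * p + t * q
                E += 1.0 / (1.0 + M[int(round(y)), int(round(x))])   # E_edge = 1 / (1 + M)
        return E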
If all your machines are very similar or nearly the same (as the ones you've posted are), it might actually be best to estimate a homography between the images. (See also here or here). Since the front of the machine is nearly planar, the fronts of different images must be related by a homography. Then knowing where the front panel is in one image will tell you where it is in all of them. For instance, check out the OpenCV tutorial for homographies, where they show how to undo the perspective transform of a planar surface allowing you to do a perspective warp of one image to another (here, one projected machine panel to another template one).
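A minimal sketch of that last idea, assuming template_img is an image where the four panel corners template_corners (a 4x2 float32 array) have been annotated by hand, and new_img is the image in which to locate the panel:

    import cv2
    import numpy as np

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(template_img, None)
    kp2, des2 = sift.detectAndCompute(new_img, None)

    # Ratio-test matching between template and new image.
    good = [m for m, n in cv2.BFMatcher().knnMatch(des1, des2, k=2)
            if m.distance < 0.75 * n.distance]

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Map the hand-annotated panel corners from the template into the new image.
    new_corners = cv2.perspectiveTransform(template_corners.reshape(-1, 1, 2), H)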
I am using OpenCV to process an image: I use HoughCircles to detect the circles in the image under test, and I also calculate the Euclidean distance between their centers.
Since this distance is in pixels, I need the absolute distance in mm or inches. Can anyone let me know how this can be done?
Thanks in advance.
The image formation process implies taking a 2D projection of the real, 3D world, through a lens. In this process, a lot of information is lost (e.g. the third dimension), and the transformation is dependent on lens properties (e.g. focal distance).
The transformation between a distance in pixels and a physical distance depends on the depth (the distance between the camera and the object) and on the lens. The more complex but more general way is to estimate the depth (there are specialized algorithms which can do this under certain conditions, but they require multiple cameras/perspectives) or to use a depth camera which can measure the depth. Once the depth is known, and after taking into account the effects of the lens projection, an estimate can be made.
You do not give much information about your setup, but the transformation can be measured experimentally. You simply take a picture of an object of known dimensions and determine the physical size of one pixel (e.g. if the object is 10x10 cm and spans 100x100 px in the picture, then 1 px corresponds to 1 mm). This is strongly dependent on the distance between the object and the camera.
An approach a bit more automated is to use a certain pattern (e.g. checkerboard) of known dimensions. It can be automatically detected in the image and the same transformation can be performed.
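A minimal sketch of that experiment, assuming a roughly flat scene at a fixed distance, a checkerboard with 20 mm squares lying in the same plane as the circles, and circle centers (cx1, cy1), (cx2, cy2) from HoughCircles:

    import cv2
    import numpy as np

    square_mm, pattern = 20.0, (9, 6)                       # board geometry (assumptions)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = corners.reshape(-1, 2)
        # Mean spacing in pixels between adjacent corners of the first detected row.
        px_per_square = np.mean(np.linalg.norm(np.diff(corners[:pattern[0]], axis=0), axis=1))
        mm_per_px = square_mm / px_per_square
        dist_mm = mm_per_px * np.hypot(cx1 - cx2, cy1 - cy2)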
Firstly, I wanted to know the metric unit of the 3D points we get from OpenCV's reprojectImageTo3D() function.
Secondly, I have calibrated each camera individually with a chessboard, using "mm" as the metric unit, and then used the OpenCV functions to calibrate the stereo system, rectify the stereo pair, and compute the disparity map.
Basically, I want the distance to the center of a bounding box.
So I compute the disparity map and reproject it to 3D with the reprojectImageTo3D() function, and then I take, from those 3D points, the one corresponding to the center of the bbox (x, y).
But which image should I use to get the center of the bbox: the rectified or the original?
Lastly, is it better to use the same camera model for a stereo system?
Thank you
During the calibration process (calibrateCamera) you have to provide the grid of object points of your calibration target. The unit you use there then defines the unit for the rest of the process.
When calling reprojectImageTo3D, you probably used the matrix Q output by stereoRectify, which takes in the individual calibrations (cameraMatrix1, cameraMatrix2). That's where the unit came from.
So in your case you get mm I guess.
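For reference, a minimal sketch of how the unit enters the calibration; the board geometry and the variable names (img_corners, gray) are assumptions:

    import cv2
    import numpy as np

    square_size_mm, pattern = 25.0, (9, 6)                  # 9x6 inner corners, 25 mm squares (assumptions)
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_size_mm

    # Because objp is expressed in mm, the calibration (and everything derived from
    # it, including Q and the reprojected 3D points) is also in mm.
    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera([objp], [img_corners], gray.shape[::-1], None, None)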
reprojectImageTo3D has to use the rectified image, since the disparity is computed on the rectified images (it wouldn't be properly aligned otherwise). Also, the disparity is computed relative to the first image given (the left one in the documentation). So you should use the left rectified image if you computed the disparity like this: cv::StereoMatcher::compute(left, right)
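A minimal sketch of that pipeline (variable names are assumptions): left_rect/right_rect are the rectified grayscale images, Q comes from cv2.stereoRectify, and (cx, cy) is the bbox center detected on the left rectified image:

    import cv2
    import numpy as np

    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disp = matcher.compute(left_rect, right_rect).astype(np.float32) / 16.0   # SGBM returns fixed-point disparities (x16)

    points3d = cv2.reprojectImageTo3D(disp, Q)        # same unit as the calibration target (mm here)
    X, Y, Z = points3d[cy, cx]                        # 3D point at the bbox center, Z is the depth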
I never had two different cameras, but it makes sense to use identical ones. I think that if the two images differ a lot in color, edges, or anything else, that could hurt the disparity quality.
What is actually very important (unless you are only working with still pictures), is to use cameras that can be synchronized by hardware (e.g. GENLOCK signal: https://en.wikipedia.org/wiki/Genlock). If you have a bit of delay between left and right and a moving subject, the disparity can be wrong. This is also true for the calibration.
Hope this helps!
There are many tutorials on how to calculate the distance between a camera and an object. Is it possible to calculate the approximate distance between a detected person and the camera using OpenCV?
Yes, it is possible. As mentioned by #hkchengrex, consider your face an object. There are plenty of methods; of those described following that link, I'd recommend SIFT feature matching.
Here are roughly the required steps:
1. Take a picture of the person and measure the distance manually.
2. Crop this picture to only contain the person.
3. Extract the image features (e.g. as SIFT descriptors).
4. Take a second picture with the same person but at an unknown distance.
5. Detect the person via SIFT matching (see the link above).
6. Compute a transformation between those two sets of SIFT features.
7. Apply the transformation to the distance measured in step 1.
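A rough sketch of steps 4 to 7 (all names are assumptions): ref_crop is the cropped reference picture from step 2, taken at the known distance ref_distance_m, and new_image is the new picture:

    import cv2
    import numpy as np

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(ref_crop, None)
    kp2, des2 = sift.detectAndCompute(new_image, None)

    # Ratio-test matching.
    good = [m for m, n in cv2.BFMatcher().knnMatch(des1, des2, k=2)
            if m.distance < 0.75 * n.distance]

    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])

    # Similarity transform between the matched points; its scale tells how much
    # smaller/larger the person appears in the new image.
    M, inliers = cv2.estimateAffinePartial2D(src, dst)
    scale = np.hypot(M[0, 0], M[0, 1])

    new_distance_m = ref_distance_m / scale           # intercept theorem: apparent size ~ 1 / distance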
It's best to start with the link provided and with further SIFT tutorials for OpenCV. The approach described is a very simple one and will only work if the person in the picture being examined is very similar to the person from picture one. For more advanced approaches I'd refer you to scientific papers; search for "person detection".
In reply to the comments
TL;DR: a person with the same real-world height/width, but displayed smaller/larger in the image, can have their distance estimated from that change in size.
The depicted approach works under the hood as follows. The person (i.e. the cropped image) captured in step 2 can be found in any future image as long as he/she appears very similar. In the new image, matching gives you the rectangular region where the person is located. As the dimensions of this rectangle are now smaller/larger, you can use that change to compute the transformation (which is basically the intercept theorem) and thereby the new distance.
What does this mean for a general approach measuring ANY person?
In case the person has the same width/height as the person from step 2, this process works flawlessly. In case they are of similar but not identical height/width, there will be calculation errors, but the results MAY still suffice for your use case (you can define a generic human, e.g. 1.8 m of height and XX of width). Nevertheless, SIFT might be a bit too specific here. Sorry, I'd just refer you to Google to see what works best.
If your camera is fixed and the recorded scene doesn't change too much, I'd just define a ground plane and manually annotate every pixel projected onto this plane with a depth value. Then you only have to detect the arbitrary person, see where their feet touch the ground plane, and look up that pixel's predefined depth value.
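A minimal sketch of that fixed-camera idea, using a homography from four manually annotated ground-plane correspondences instead of a per-pixel depth map; all coordinates, including the camera's floor position, are assumptions:

    import cv2
    import numpy as np

    img_pts   = np.float32([[210, 700], [1050, 690], [900, 420], [330, 430]])   # pixels on the floor (annotated)
    floor_pts = np.float32([[0, 0], [3.0, 0], [3.0, 5.0], [0, 5.0]])            # same points in meters on the floor
    H, _ = cv2.findHomography(img_pts, floor_pts)

    def foot_to_floor(u, v):
        # Map the pixel where the feet touch the ground to metric floor coordinates.
        p = H @ np.array([u, v, 1.0])
        return p[:2] / p[2]

    x, y = foot_to_floor(640, 650)
    distance_m = np.hypot(x - 1.5, y - (-2.0))        # camera assumed to stand at (1.5, -2.0) on the floor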
If the use case has higher demands you'd have to measure depth in a more complex fashion. This can be done using a stereo-camera rig, a depth sensor or an image sequence via structure from motion.
So there is no "one can do all" method in OpenCV. It always depends on the use case, the environment, and a combination of fairly elaborate methods.