Conversion from pixel to general Metric(mm, in)

I am using OpenCV to process an image, with HoughCircles to detect the circles in the image under test, and I calculate the distance between their centers as the Euclidean distance.
Since this distance is in pixels, I need to convert it to an absolute distance in mm or inches. Can anyone let me know how this can be done?
Thanks in advance.

The image formation process implies taking a 2D projection of the real, 3D world, through a lens. In this process, a lot of information is lost (e.g. the third dimension), and the transformation is dependent on lens properties (e.g. focal distance).
The transformation between the distance in pixels and the physical distance depends on the depth (distance between the camera and the object) and the lens. The complex, but more general way, is to estimate the depth (there are specialized algorithms which can do this under certain conditions, but require multiple cameras/perspectives) or use a depth camera which can measure the depth. Once the depth is known, after taking into account the effects of the lens projection, an estimation can be made.
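For the simple case of a segment lying in a plane roughly parallel to the sensor, the pinhole camera model relates pixel length and physical length directly through the depth and the focal length. The sketch below assumes that lens distortion has already been removed and that the focal length is known in pixels (for example from a camera calibration). The function and parameter names are illustrative and not part of any library.

```python
def pixels_to_mm(length_px, depth_mm, focal_length_px):
    """Approximate physical length of a segment seen by a pinhole camera.

    Assumes the segment is parallel to the image plane at distance
    depth_mm, and that focal_length_px is the focal length in pixels.
    """
    if focal_length_px <= 0:
        raise ValueError("focal length must be positive")
    return length_px * depth_mm / focal_length_px


# A 200 px segment seen at 500 mm with a 1000 px focal length.
print(pixels_to_mm(200, 500, 1000))
```

The same pixel length doubles in physical size when the depth doubles, which is why the depth has to be known or held fixed.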
You do not give much information about your setup, but the transformation can be measured experimentally. You simply take a picture of an object of known dimensions and determine the physical size of one pixel (e.g. if the object is 10x10 cm and it covers 100x100 px in the picture, then 1 px is 1 mm). This scale depends strongly on the distance from the camera to the object, so it is only valid for objects at the same distance as the reference.
An approach a bit more automated is to use a certain pattern (e.g. checkerboard) of known dimensions. It can be automatically detected in the image and the same transformation can be performed.
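The experimental calibration described above can be sketched in a few lines. The helper names are hypothetical, and the resulting scale only holds for objects at the same distance from the camera as the reference object.

```python
import math


def mm_per_pixel(known_length_mm, measured_length_px):
    """Scale factor obtained from a reference object of known size."""
    if measured_length_px <= 0:
        raise ValueError("measured length must be positive")
    return known_length_mm / measured_length_px


def pixel_distance_to_mm(p1, p2, scale_mm_per_px):
    """Euclidean distance between two pixel coordinates, converted to mm."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1]) * scale_mm_per_px


# Reference object: 100 mm wide, covering 100 px in the image.
scale = mm_per_pixel(100.0, 100.0)

# Two detected circle centers (in pixels), e.g. from HoughCircles.
print(pixel_distance_to_mm((10, 20), (13, 24), scale))
```

To get inches instead, divide the result in mm by 25.4.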

Related

How to match laser points in laser based stereo camera?

I am trying to scan an object with a laser to extract 3D point clouds. There are 2 cameras and 1 laser in my setup. I pass the nonzero points of each mask to OpenCV's triangulatePoints function as the projPoints arguments. Since triangulatePoints requires the same number of points from both views and there are 2 masks, if one mask has more nonzero points than the other, I simply truncate it to the other's size like this:
l1 = len(pts1)
l2 = len(pts2)
newPts1 = pts1[0:l2]
Is there a good way for matching left and right frame nonzero points?

First, if your images normally look like that, your sensors are deeply saturated, and consequently your 3D ranges are either worthless or much less accurate than they could be.
Second, you should aim for matching one point per rectified scanline on each image of the pair, rather than a set of points. The whole idea of using a laser stripe is to get a well focused beam of light on as small a spot or band as possible, so you can probe the surface in detail.
For best accuracy, the peak-finding should be done independently on each scanline of the original (distorted and not rectified) images, so it is not affected by the interpolation used by the undistortion and stereo rectification procedures. Rather, you would use the geometrical undistortion and stereo rectification transforms to map the peaks detected in original images into the rectified ones.
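A minimal sketch of per-scanline peak finding and row-wise matching is given below. It is a simplification, not one of the classical algorithms in full: it takes the brightest pixel on each row and refines it with an intensity-weighted centroid, and it assumes the two images are already row-aligned, so the remapping of peaks through the undistortion and rectification transforms is not shown. Images are plain lists of rows, and all names and thresholds are illustrative.

```python
def stripe_peaks(image, min_intensity=50, half_window=2):
    """Return at most one (row, subpixel_column) peak per scanline."""
    peaks = []
    for r, row in enumerate(image):
        # Column of the brightest pixel on this scanline.
        c_max = max(range(len(row)), key=row.__getitem__)
        if row[c_max] < min_intensity:
            continue  # no laser stripe visible on this row
        lo = max(0, c_max - half_window)
        hi = min(len(row), c_max + half_window + 1)
        window = row[lo:hi]
        # Intensity-weighted centroid around the maximum.
        col = sum(c * w for c, w in zip(range(lo, hi), window)) / sum(window)
        peaks.append((r, col))
    return peaks


def match_by_row(left_peaks, right_peaks):
    """Pair peaks that lie on the same scanline in both images."""
    right = dict(right_peaks)
    return [((r, c), (r, right[r])) for r, c in left_peaks if r in right]


left = [[0, 0, 10, 100, 10, 0], [0, 0, 0, 0, 0, 0], [0, 100, 100, 0, 0, 0]]
right = [[10, 100, 10, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]
print(match_by_row(stripe_peaks(left), stripe_peaks(right)))
```

Rows without a peak in either image are simply dropped, so both point lists passed on to triangulation have the same length and correspond pairwise.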
There are several classical algorithms for peak-finding with laser stripe-based triangulation methods; you may find this other answer of mine useful.
Last, if your setup is expected to be as in the picture, with the laser stripe illuminating two orthogonal planes in addition to the object of interest, then you do not need to use stereo at all: you can solve for the 3D plane spanned by the laser stripe projector and triangulate by intersecting that plane with each ray back-projecting the peaks of the image of the laser stripe on the object. This is similar to one of the methods J. Y. Bouguet used in his old Ph.D. thesis on desktop photography (here is a summary by S. Seitz). One implementation using a laser striper is detailed in this patent. This method is surprisingly accurate: with it we achieved approximately 0.2mm accuracy in a cubic foot of volume using a dinky 640x480 CCD video camera back in 1999. Patent has expired, so you are free to enjoy it.
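The single-camera triangulation step can be sketched as a ray-plane intersection. The example assumes the laser plane has already been estimated in camera coordinates as n·X = d, that the camera is an undistorted pinhole with intrinsics fx, fy, cx, cy, and that the camera center is the origin. The function names are illustrative.

```python
def backproject(u, v, fx, fy, cx, cy):
    """Direction of the ray through pixel (u, v), in camera coordinates."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def intersect_ray_plane(ray_dir, plane_normal, plane_offset):
    """Intersect the ray t * ray_dir (t > 0) with the plane n . X = d."""
    denom = sum(n * d for n, d in zip(plane_normal, ray_dir))
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the laser plane")
    t = plane_offset / denom
    return tuple(t * d for d in ray_dir)


# Laser plane z = 2 (same length unit as the plane calibration).
ray = backproject(420, 240, fx=100, fy=100, cx=320, cy=240)
print(intersect_ray_plane(ray, (0.0, 0.0, 1.0), 2.0))
```

Each detected stripe peak gives one ray and therefore one 3D point, in whatever length unit the plane offset d was expressed in.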

How to extract contour of the front panel the washing machine?

I am looking for a robust way to extract the contour of the front panel of a washing machine, or just to get the 4 corner points of the front panel.
I've tried color masking but didn't get stable results.
Here are some examples:

Three potential options:
Get a bunch of images of the machines, manually determine a label saying where the door is, and then train a convolutional neural network to regress those parameters per image.
Treat each image as a separate optimization problem, where the goal is to estimate the parameters of the rectangle most likely to correspond to the front panel. So our model is theta = (p_1, p_2, p_3, p_4), the four 2D corner locations of the panel in the image. We need an energy function E to minimize with respect to theta (e.g., using gradient descent with momentum, or RANSAC). There are a number of terms you can use; here are some ideas:
a. At least some of the corners should be "corner-like": run a simple corner detector, and define an energy E_corner which penalizes distance to the closest corner.
b. At least some of the edges (between p_1 and p_2 or p_3, for example) should be "edge-like": compute the gradient magnitude of the image M = || \nabla I || and enforce that along the panel edge the values of M should be larger, using an energy E_edge. E.g., for x,y along an edge, let E_edge(x,y)=1/(1+M(x,y)) (Robust losses tend to be better here though).
c. Use the fact that each door is actually a projected 3D rectangle: e.g., see this question. An interesting idea is to start with a rectangle (representing the panel) and instead of regressing the p_i's, instead regress the parameters of an affine transform or even perspective projection transform (though this requires the algorithm estimate depth), that maps the starting rectangle to one in the image. You can then regularize the parameters of the estimated transform to prevent unlikely transforms from being output.
d. Use knowledge of what must be inside the rectangle. For instance, given the four corners, you can determine the ellipse defining the round door to the machine. The appearance statistics within that ellipse should be somewhat unique, as well as the edges/image gradient at the door boundary; hence you can define an energy term encouraging the model to choose corners such that the interior has a dark elliptical object on a white background.
Overall, this approach is similar to snakes, or active contour models, which might be worth looking into. However, energy-minimizing snakes tend not to consider the inside of the region they enclose; hence, some variant of the Mumford-Shah functional could be a useful addition (though note that smoothness of the "door region" is not entirely desirable in your case).
If all your machines are very similar or nearly the same (as the ones you've posted are), it might actually be best to estimate a homography between the images. (See also here or here). Since the front of the machine is nearly planar, the fronts of different images must be related by a homography. Then knowing where the front panel is in one image will tell you where it is in all of them. For instance, check out the OpenCV tutorial for homographies, where they show how to undo the perspective transform of a planar surface allowing you to do a perspective warp of one image to another (here, one projected machine panel to another template one).
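Once a homography between a template image and a new image is available, transferring the known panel corners is a small computation. The sketch below assumes H is a 3x3 matrix given as nested lists (in practice it could come from a function such as OpenCV's findHomography applied to matched feature points); the helper name and the example values are illustrative.

```python
def apply_homography(H, points):
    """Map 2D points through a 3x3 homography given as nested lists."""
    mapped = []
    for x, y in points:
        xh = H[0][0] * x + H[0][1] * y + H[0][2]
        yh = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        mapped.append((xh / w, yh / w))  # back from homogeneous coordinates
    return mapped


# Panel corners annotated once in the template image (pixels).
template_corners = [(0, 0), (10, 0), (10, 20), (0, 20)]

# Toy homography: a pure translation by (5, -2).
H = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
print(apply_homography(H, template_corners))
```

Because the division by w is kept, the same function works for a full perspective homography and not only for the affine toy example.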

opencv: reprojectImageTo3d what is the metric unit of the (X,Y,Z) point?

Firstly, I want to know the metric unit of the 3D points returned by the OpenCV reprojectImageTo3D() function.
Secondly, I have calibrated each camera individually with a chessboard, using "mm" as the metric unit, and then used the OpenCV functions to calibrate the stereo system, rectify the stereo pair and compute the disparity map.
Basically, I want the distance to the center of a bounding box.
So I compute the disparity map, reproject it to 3D with the reprojectImageTo3D() function, and then take from those 3D points the one that corresponds to the center of the bbox (x, y).
But which image should I use to get the center of the bbox: the rectified one or the original?
Finally, is it better to use the same camera model for both cameras of a stereo system?
Thank you

During the calibration process (calibrateCamera) you have to give the points grid of your calibration target. The unit that you give there will then define the unit for the rest of the process.
When calling reprojectImageTo3D, you probably used the matrix Q output by stereoRectify, which takes in the individual calibrations (cameraMatrix1, cameraMatrix2). That's where the unit came from.
So in your case you get mm I guess.
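The way the unit propagates can be seen from the standard relation for a rectified stereo pair, Z = f * B / d, where the focal length f and the disparity d are in pixels and the baseline B carries the unit of the calibration target. The sketch below is only an illustration of that relation, not what reprojectImageTo3D does internally; the function name and numbers are made up.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline):
    """Depth of a point in a rectified stereo pair.

    The result has the same unit as the baseline (mm here, if the
    chessboard square size was given in mm during calibration).
    Note: OpenCV's block matchers typically return fixed-point
    disparities scaled by 16, which must be divided out first.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline / disparity_px


# 800 px focal length, 120 mm baseline, 64 px disparity.
print(depth_from_disparity(64.0, 800.0, 120.0))
```

Pixels cancel out in the ratio f / d, so the only physical unit left is the one of the baseline.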
reprojectImageTo3D has to use the rectified image, since the disparity is calculated using the rectified image (It wouldn't be properly aligned otherwise). Also, when calculating the disparity, it is calculated relative to the first image given (left one in the doc). So you should use the left rectified image if you computed the disparity like this: cv::StereoMatcher::compute(left, right)
I never had two different cameras, but it makes sense to use the same ones. I think that if you have very different color images, edges or any image difference, that could potentially influence the disparity quality.
What is actually very important (unless you are only working with still pictures), is to use cameras that can be synchronized by hardware (e.g. GENLOCK signal: https://en.wikipedia.org/wiki/Genlock). If you have a bit of delay between left and right and a moving subject, the disparity can be wrong. This is also true for the calibration.
Hope this helps!

How to get a list the visible vertices and segments of a mesh

I work on pose estimation of 3D objects. I am using a CAD model of the object to generate all the possible hypotheses of its pose.
I am using pyopengl to render the view of the object from a specific POV. Can anyone explain how to get a list of all the visible edges?
I use face culling to eliminate the occluded faces, but I don't know how to pass the visible edges (indices and segments) to other Python functions.
If there are any other approaches (not using OpenGL), I would really appreciate it.
So I want to get the edges drawn in the rendered image:
I don't really want the image to be displayed.
In summary, I have a CAD model, and I want a function that can return the visible segments out of a specific POV.
Thanks

Face culling
This works only for single convex strict winding rule mesh without holes!
The idea is that sign of dot product of 2 vectors will tell you if the vectors are opposite or not. So if we have a normal pointing out and view direction their dot should be negative for faces turned towards camera/viewer.
As you do not want to render anything but only select the visible planar faces/edges, you can do this entirely on the CPU side. What you need is to have your mesh in the form of planar faces (it does not matter whether they are triangles, quads or anything else), so let us assume triangles (for more points you just add them to _face, but for the computation still use only v0,v1,v2). Each face should have its vertexes and a normal.
struct _face
{
double v0[3],v1[3],v2[3],n[3];
};
List<_face> mesh;
Now the vertexes v0,v1,v2 you already have. All of them should be ordered in strict winding rule. That means if you look at any face from outside the points should form only CW (clockwise) loop (or only CCW (counter-clockwise) loop). To compute normal you simply exploit cross product which returns vector perpendicular to both operands:
n = cross(v1-v0,v2-v1) // cross product
n = n / |n| // optional normalize to unit vector
If you need the vector math see
Understanding 4x4 homogenous transform matrices
The bottom of that answer shows how to compute this. You will also need the rest of the answer for the camera direction, so read it in full.
Now if your mesh follows a strict winding rule, then all the computed normals point out of the mesh (or inwards, depending on your coordinate system, CW/CCW, and the order of operands in the cross product). Let us assume they all point out (if not, just negate the normal).
If you do not have a strict winding rule, compute the average point of your mesh (sum all vertexes and divide by their count); this will be the center c of your object. Now just compute
dot(n,(v0+v1+v2)/3 - c)
and if the result is not positive, negate n. This will repair your normals (you can also reverse the order of v0,v1,v2 to repair the mesh).
Now the camera and the mesh usually each have their own 4x4 transform matrix. One transforms from mesh LCS (local coordinate system) to GCS ("world" global coordinate system) and the other from GCS to camera LCS (screen). We do not need projections for this as we do not render. So what we need to do for each face is:
1. Convert n to GCS.
2. Compute dot(n,camera_view_direction), where camera_view_direction is a GCS vector pointing in the view direction. You can take it directly from the direct camera matrix. It is usually the Z axis vector (in the OpenGL perspective view it is -Z). Beware that the camera matrix used for rendering is usually the inverse matrix; if that is the case, either compute the inverse first or transpose it, since we do not need the offset anyway.
3. Decide whether the face is visible from the sign of #2.
Again all the math is explained in the link above...
If you do not have a mesh matrix (the mesh does not change position or orientation), you can assume its matrix is the identity, which means GCS = mesh LCS, so no transformation is needed.
In some cases there is no camera and only a mesh matrix (I suspect this is your case). Then the procedure is similar: you just ignore the camera transforms and use (0,0,-1) or (0,0,+1) as the view direction.
Also see this:
Understanding lighting in OpenGL
It should shine some light on the normals topic.
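The CPU-side procedure above can be sketched in plain Python. The example assumes a single convex triangle mesh given as a vertex list plus index triples, with the mesh already in world coordinates (identity mesh matrix), and it uses the centroid test to orient the normals. The function names are illustrative, and like the method itself this is not valid for non-convex meshes.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def visible_edges(vertices, faces, view_dir):
    """Edges (as sorted index pairs) of faces turned towards the viewer."""
    count = len(vertices)
    center = tuple(sum(v[i] for v in vertices) / count for i in range(3))
    edges = set()
    for i0, i1, i2 in faces:
        v0, v1, v2 = vertices[i0], vertices[i1], vertices[i2]
        n = cross(sub(v1, v0), sub(v2, v1))
        face_center = tuple((a + b + c) / 3 for a, b, c in zip(v0, v1, v2))
        # Repair the normal so that it points away from the mesh center.
        if dot(n, sub(face_center, center)) < 0:
            n = tuple(-x for x in n)
        # Negative dot product: the face is turned towards the viewer.
        if dot(n, view_dir) < 0:
            for a, b in ((i0, i1), (i1, i2), (i2, i0)):
                edges.add((min(a, b), max(a, b)))
    return sorted(edges)


# Tetrahedron viewed along -Z (camera above it, looking down).
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tris = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(visible_edges(verts, tris, (0, 0, -1)))
```

Faces seen exactly edge-on (dot product equal to zero) are treated as not visible here; whether to include them is a design choice.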

Determining the pattern orientation of a spatiotemporal image

How can I obtain the average direction of the pattern shown in the figure below? It is the direction of the red arrow relative to the yellow (horizontal) line. Any ideas for an approach? I couldn't figure out how to approach this. This is a spatio-temporal image created from a video. Thank you.
Here is my original image:

The simplest approach would be to compute the gradient vector (x derivative and y derivative) and find its direction at each pixel (atan2 of the y derivative and the x derivative). What you want is the average orientation, not the average direction (vectors pointing in opposite directions would cancel out). So apply modulus pi to the angles, then average across the image.
The best way to compute image gradients is through the Gaussian gradients.
The structure tensor is the more robust way of accomplishing this. In short, it computes local averages of the gradient vector to reduce the effect of noise. It does this by computing the outer product of the gradient vector with itself, which produces a symmetric matrix. The individual components of this matrix can then be locally averaged (i.e. apply a smoothing filter). This is similar to computing the angle of the vectors, doubling the angles to make vectors in opposite directions equal, then averaging them.
Note that you can apply either of these solutions in 3D (you can think of the video data as 2D + time = 3D). That way, you compute both the speed and the direction of motion within the 2D frame, rather than just the speed along the direction in which you extracted the 2D image you show in the question. The image gradient and the concept of the structure tensor easily extend to 3D. This 3D approach is similar to the approach by Lucas-Kanade for optical flow. If you follow that link to Wikipedia, you'll see it uses the structure tensor in 2D, and adds gradients along the time dimension.
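A minimal 2D sketch of the structure tensor idea is shown below. It simplifies the description above in two ways that should be treated as assumptions: it uses central differences instead of Gaussian gradients, and it averages the tensor over the whole image instead of locally. The image is a plain list of rows and the function name is illustrative.

```python
import math


def dominant_gradient_orientation(image):
    """Orientation (radians) of the dominant gradient, from the structure tensor."""
    jxx = jyy = jxy = 0.0
    rows, cols = len(image), len(image[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            # Accumulate the components of the outer product g * g^T.
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    # Double-angle formula: opposite gradient directions reinforce each other.
    return 0.5 * math.atan2(2.0 * jxy, jxx - jyy)


# Intensity ramp along the diagonal: gradient at 45 degrees.
ramp = [[x + y for x in range(5)] for y in range(5)]
print(math.degrees(dominant_gradient_orientation(ramp)))
```

The stripes of the pattern run perpendicular to the gradient, so add pi/2 to the returned angle to get the pattern direction. Angles are measured in image coordinates, where the y axis points down.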

It might be useful to try a Fourier transform.
In your case you should get two vertical lines in the middle of the transformed image, corresponding to the information encountered when traveling vertically in the image.
On the other hand, there shouldn't be a horizontal line, since there is little information (little change) when traveling horizontally in the image.
For example you can use this online site to play with fourier transforms:
https://www.ejectamenta.com/Fourifier-fullscreen/
It might sound like the problem remains the same but in fact it is much easier now.
The 2D pattern is converted into dominant lines which are quite easy to find in the transformed image.
For example you can search for the strongest pixels in the image and simply determine if they are more likely to be horizontal line or a vertical line or determine the angle of the dominant line. Then rotate by 90 degrees.
For example see this image of wood grain and the resulting transformed image:
And don't worry about the two lines. The image is symmetric so just ignore 3/4 of the image and look in 1 quarter.

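I recommend giving the Hough transform a go; it is available in OpenCV. The Hough transform maps each line in the image to a point in a (distance, angle) parameter space, so the dominant angle of the pattern can be read off from the strongest responses, which might be useful in your case.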
