Combining point clouds of facial poses into a 3d mesh - python

Using an iPhone camera with a TrueDepth sensor I am able to capture accurate depth data in images of my face. I am capturing depth from the front, left and right sides (about 30 degrees rotation) and with the head tilted up a bit to capture under the chin (so 4 depth images in total). We are only capturing depth here, so no color information. We are cropping out unimportant data by using an ellipse frame.
We are also using ARKit to give us the transform of the face anchor, which is the same as the transform of the face, ref: https://developer.apple.com/documentation/arkit/arfaceanchor. It isn't possible to capture a depth image and the face transform at the same time, since they come from different capture sessions. So we have to take the depth image, and then quickly switch sessions while the user holds their face still to get the face anchor transforms. The world alignment is set to .camera, so the face anchor transform should be relative to the camera, not the world origin.
We end up with 4 point clouds that look like this (left to right: chin up, left 30, front on, right 30):
We also end up with 4 transforms. We are trying to stitch the point clouds back together to make a smooth mesh of the face using open3d in python.
The process so far is as follows:
read point clouds and transforms
apply inverse transforms to point clouds to return to original position w.r.t camera
I was expecting these point clouds to roughly be at the same position, but this is happening instead:
As you can see the faces are still offset from one another:
Am I using the transforms wrong?
The python code and example point clouds and transforms are here: https://github.com/JoshPJackson/FaceMesh but the important bit is below:
dir = './temp4/'
frontPcd = readPointCloud('Front.csv', dir)
leftPcd = readPointCloud('Left.csv', dir)
rightPcd = readPointCloud('Right.csv', dir)
chinPcd = readPointCloud('Chin.csv', dir)
frontTransform = readTransform('front_transform.csv', dir)
leftTransform = readTransform('left_transform.csv', dir)
rightTransform = readTransform('right_transform.csv', dir)
chinTransform = readTransform('chin_transform.csv', dir)
rightPcd.transform(np.linalg.inv(rightTransform))
leftPcd.transform(np.linalg.inv(leftTransform))
frontPcd.transform(np.linalg.inv(frontTransform))
chinPcd.transform(np.linalg.inv(chinTransform))
I am expecting all of the point clouds to merge together so that I can remove duplicate vertices and then make a mesh.
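As a sanity check on the convention, here is a minimal NumPy sketch on synthetic data (not the data from the repository). It assumes each transform is a 4x4 matrix that maps face-local coordinates to camera coordinates and is applied to column vectors; `make_transform` and all numeric values are hypothetical. Under that assumption, applying the inverse brings every cloud into the shared face-local frame, so the clouds should coincide. The last lines show what happens if the matrix was written out transposed, which is one thing worth checking because ARKit's simd matrices are stored column by column.

```python
import numpy as np


def apply_transform(points, transform):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ transform.T)[:, :3]


def make_transform(yaw_degrees, translation):
    """Hypothetical helper: rotation about the y axis plus a translation."""
    angle = np.radians(yaw_degrees)
    transform = np.eye(4)
    transform[:3, :3] = [
        [np.cos(angle), 0.0, np.sin(angle)],
        [0.0, 1.0, 0.0],
        [-np.sin(angle), 0.0, np.cos(angle)],
    ]
    transform[:3, 3] = translation
    return transform


rng = np.random.default_rng(0)
face_points = rng.normal(scale=0.05, size=(50, 3))  # synthetic face-local points

# Assumed poses: face-local -> camera coordinates for two captures.
front_transform = make_transform(0.0, [0.0, 0.0, 0.3])
left_transform = make_transform(30.0, [0.02, 0.0, 0.3])

# What the depth camera would see in each capture.
front_cloud = apply_transform(face_points, front_transform)
left_cloud = apply_transform(face_points, left_transform)

# Inverse transforms bring both clouds back into the face-local frame.
front_aligned = apply_transform(front_cloud, np.linalg.inv(front_transform))
left_aligned = apply_transform(left_cloud, np.linalg.inv(left_transform))
aligned_gap = np.abs(front_aligned - left_aligned).max()

# Same data, but with the matrix read in transposed: the clouds no longer agree.
left_misread = apply_transform(left_cloud, np.linalg.inv(left_transform.T))
misread_gap = np.abs(front_aligned - left_misread).max()

print("gap with correct layout:", aligned_gap)
print("gap with transposed layout:", misread_gap)
```

If the real clouds stay offset even with the right matrix layout, the remaining suspects are the face moving between the two capture sessions, or the depth data and the anchor using different units or axis conventions.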

One good method is to find a mathematical reference for your faces (this works only with surfaces).
There are several steps:
Take one face and create a 2D function f(x, y) that maps the face surface. The noise has to point in the z direction.
Fit this function to the other faces using lmfit.minimize or curve_fit.
Use the parameters returned by the fit to determine the offset!
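The steps above can be sketched in a simplified form. This is an illustration under strong assumptions, not the fit the answer describes in full: it swaps lmfit.minimize and curve_fit for a linear least-squares fit with np.linalg.lstsq, uses a quadratic surface as a stand-in for f(x, y), and recovers only a translation (no rotation). All function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_quadratic_surface(points):
    """Least-squares fit of z = a + b*x + c*y + d*x^2 + e*x*y + f*y^2."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    design = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    coefficients, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coefficients


def surface_vertex(coefficients):
    """Stationary point (x0, y0, z0) of the fitted quadratic surface."""
    a, b, c, d, e, f = coefficients
    hessian = np.array([[2 * d, e], [e, 2 * f]])
    x0, y0 = np.linalg.solve(hessian, -np.array([b, c]))
    z0 = a + b * x0 + c * y0 + d * x0**2 + e * x0 * y0 + f * y0**2
    return np.array([x0, y0, z0])


def estimate_offset(reference, other):
    """Translation that moves the reference surface onto the other one."""
    return surface_vertex(fit_quadratic_surface(other)) - surface_vertex(
        fit_quadratic_surface(reference)
    )


# Synthetic bowl-shaped "face" and a translated copy of it.
rng = np.random.default_rng(1)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
z = 0.5 * xy[:, 0] ** 2 + 0.1 * xy[:, 0] * xy[:, 1] + 0.3 * xy[:, 1] ** 2
reference_cloud = np.column_stack([xy, z])
true_offset = np.array([0.02, -0.01, 0.05])
shifted_cloud = reference_cloud + true_offset

estimated_offset = estimate_offset(reference_cloud, shifted_cloud)
print(estimated_offset)
```

A real face is not a paraboloid, so a richer surface model and a nonlinear fit that also includes rotation parameters would probably be needed in practice.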

Related

How to match laser points in laser based stereo camera?

I am trying to scan an object with a laser to extract 3D point clouds. There are 2 cameras and 1 laser in my setup. What I do is giving nonzero points in masks to OpenCV's triangulatePoints function as projPoints arg. Since both numbers of points must be the same for triangulatePoints function and there are 2 masks, if one mask has more nonzero points than the other, I basically downsize it to other's size by doing this:
l1 = len(pts1)
l2 = len(pts2)
newPts1 = pts1[0:l2]
Is there a good way for matching left and right frame nonzero points?
First, if your images normally look like that, your sensors are deeply saturated, and consequently your 3D ranges are either worthless or much less accurate than they could be.
Second, you should aim for matching one point per rectified scanline on each image of the pair, rather than a set of points. The whole idea of using a laser stripe is to get a well focused beam of light on as small a spot or band as possible, so you can probe the surface in detail.
For best accuracy, the peak-finding should be done independently on each scanline of the original (distorted and not rectified) images, so it is not affected by the interpolation used by the undistortion and stereo rectification procedures. Rather, you would use the geometrical undistortion and stereo rectification transforms to map the peaks detected in original images into the rectified ones.
There are several classical algorithms for peak-finding with laser stripe-based triangulation methods, you may find this other answer of mine useful.
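As a rough illustration of one-peak-per-scanline matching, the sketch below finds a sub-pixel stripe position on each row with an intensity-weighted centroid around the brightest pixel, then pairs left and right peaks that lie on the same row. It is only one simple peak finder, not necessarily one of the classical algorithms referred to above, and it assumes the rows of the two arrays already correspond to the same rectified scanlines. The function names, the threshold and the window size are hypothetical.

```python
import numpy as np


def stripe_peaks(image, min_intensity=30.0, half_window=3):
    """Sub-pixel stripe column for each row, or NaN where no stripe is seen."""
    image = np.asarray(image, dtype=np.float64)
    columns = np.arange(image.shape[1])
    peaks = np.full(image.shape[0], np.nan)
    for row_index, row in enumerate(image):
        brightest = int(np.argmax(row))
        if row[brightest] < min_intensity:
            continue  # no laser visible on this scanline
        start = max(brightest - half_window, 0)
        stop = min(brightest + half_window + 1, row.size)
        weights = row[start:stop]
        # Intensity-weighted centroid around the brightest pixel.
        peaks[row_index] = np.sum(columns[start:stop] * weights) / np.sum(weights)
    return peaks


def match_by_scanline(left_peaks, right_peaks):
    """Keep rows where both images have a peak; return two 2xN (x, y) arrays."""
    valid = ~np.isnan(left_peaks) & ~np.isnan(right_peaks)
    rows = np.nonzero(valid)[0].astype(np.float64)
    left_points = np.vstack([left_peaks[valid], rows])
    right_points = np.vstack([right_peaks[valid], rows])
    return left_points, right_points


# Synthetic 3-row images: one Gaussian stripe per row, missing on left row 1.
columns = np.arange(40)


def synthetic_row(center):
    return 200.0 * np.exp(-0.5 * (columns - center) ** 2)


left_image = np.vstack([synthetic_row(12.3), np.zeros(40), synthetic_row(20.0)])
right_image = np.vstack([synthetic_row(8.1), synthetic_row(9.0), synthetic_row(15.5)])

left_points, right_points = match_by_scanline(
    stripe_peaks(left_image), stripe_peaks(right_image)
)
print(left_points)
print(right_points)
```

Because rows without a peak in both images are dropped, the two arrays always have the same length, which avoids truncating one point list to the size of the other.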
Last, if your setup is expected to be as in the picture, with the laser stripe illuminating two orthogonal planes in addition to the object of interest, then you do not need to use stereo at all: you can solve for the 3D plane spanned by the laser stripe projector and triangulate by intersecting that plane with each ray back-projecting the peaks of the image of the laser stripe on the object. This is similar to one of the methods J. Y. Bouguet used in his old Ph.D. thesis on desktop photography (here is a summary by S. Seitz). One implementation using a laser striper is detailed in this patent. This method is surprisingly accurate: with it we achieved approximately 0.2mm accuracy in a cubic foot of volume using a dinky 640x480 CCD video camera back in 1999. Patent has expired, so you are free to enjoy it.
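The plane-ray intersection at the core of this single-camera approach can be sketched as follows. It assumes the laser plane has already been estimated in camera coordinates as normal · X + offset = 0, that lens distortion has been removed from the pixel coordinates, and that the camera center is the origin. The camera matrix and plane values are made up for illustration.

```python
import numpy as np


def backproject_ray(pixel, camera_matrix):
    """Direction of the viewing ray through a pixel, in camera coordinates."""
    u, v = pixel
    return np.linalg.solve(camera_matrix, np.array([u, v, 1.0]))


def intersect_ray_with_plane(ray_direction, plane_normal, plane_offset):
    """Intersect the ray X = t * ray_direction with normal . X + offset = 0."""
    denominator = plane_normal @ ray_direction
    if abs(denominator) < 1e-12:
        raise ValueError("ray is parallel to the laser plane")
    return (-plane_offset / denominator) * ray_direction


# Hypothetical intrinsics and a laser plane tilted 45 degrees about the y axis,
# passing through the point (0, 0, 2).
camera_matrix = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
plane_normal = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
plane_offset = -plane_normal @ np.array([0.0, 0.0, 2.0])

center_point = intersect_ray_with_plane(
    backproject_ray((320.0, 240.0), camera_matrix), plane_normal, plane_offset
)
side_point = intersect_ray_with_plane(
    backproject_ray((420.0, 240.0), camera_matrix), plane_normal, plane_offset
)
print(center_point, side_point)
```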

Measuring a real object with two static calibrated cameras

The goal. To estimate the 3D location (x, y, z) of the centre, the width (larger diameter of the glass) and the height of the glass, similar to this drawing. The inputs are two images, one from each camera (here and here).
The setup. The images come from two fixed and calibrated (known intrinsic and extrinsic parameters) cameras.
My attempt.
I have segmented the image using FCN or DeepLab. Results here and here.
Then I have got a binary mask of the class of interest (glass) and extracted the most left, up, right and bottom parts of that mask. Results here and here.
I have obtained four 3D points through triangulation of the "corresponding points" (uppermost of image 1 with the uppermost of image 2, rightmost of image 1 with the rightmost of image 2, etc.).
I compute the dimensions as: width = | left - right |, and height = | up - bottom |.
Issues. The points are not actual correspondences, therefore the reprojection is inaccurate and the measurement is inaccurate as well (resulting in up to 3cm error). Note that if I manually choose the corresponding pixels on both images and then triangulate, I get approximately 0.1cm error.
Can you guide me on how to solve this problem more accurately?
Thank you!
PS: I am using python and OpenCV.
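For reference, the triangulation and width computation described in the attempt can be sketched with plain NumPy using linear (DLT) triangulation. This only illustrates the measurement step with exact synthetic correspondences; it does not solve the correspondence problem. The projection matrices and the 3D points are hypothetical.

```python
import numpy as np


def triangulate_point(projection_1, projection_2, pixel_1, pixel_2):
    """Linear (DLT) triangulation of one point from two 3x4 projection matrices."""
    rows = np.array([
        pixel_1[0] * projection_1[2] - projection_1[0],
        pixel_1[1] * projection_1[2] - projection_1[1],
        pixel_2[0] * projection_2[2] - projection_2[0],
        pixel_2[1] * projection_2[2] - projection_2[1],
    ])
    _, _, vt = np.linalg.svd(rows)
    homogeneous = vt[-1]  # null-space direction of the 4x4 system
    return homogeneous[:3] / homogeneous[3]


def project(projection, point):
    """Project a 3D point to pixel coordinates."""
    homogeneous = projection @ np.append(point, 1.0)
    return homogeneous[:2] / homogeneous[2]


# Hypothetical calibrated pair: same intrinsics, second camera shifted 10 cm in x.
intrinsics = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
projection_1 = intrinsics @ np.hstack([np.eye(3), np.zeros((3, 1))])
projection_2 = intrinsics @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

# Made-up left and right rim points of a glass, 8 cm apart, 1 m from camera 1.
left_edge = np.array([-0.04, 0.0, 1.0])
right_edge = np.array([0.04, 0.0, 1.0])

left_3d = triangulate_point(
    projection_1, projection_2,
    project(projection_1, left_edge), project(projection_2, left_edge),
)
right_3d = triangulate_point(
    projection_1, projection_2,
    project(projection_1, right_edge), project(projection_2, right_edge),
)
width = np.linalg.norm(left_3d - right_3d)
print(width)
```

With true correspondences the width is recovered almost exactly, which is consistent with the observation that manually picked pixel pairs give a much smaller error than the mask extremes.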

Align scanned documents based on a reference point, using openCV

I'm currently trying to write a program that can automatically extract data from some graphs in multiple scanned documents. Mainly by using opencv I would like to detect some features of the graphs in order to convert them into usable data. In the left graph I'm looking for the height of the circle sectors and in the right graph the distance from the center to the points where the dotted lines intersect with the gray area. In both cases I would like to convert these values into numeric data for further usage.
What follows is a step by step plan of how I think my algorithm will work:
Align the image based on the big dotted lines. This way I can ensure that the graphs in all the scanned images will have the exact same positions. After all, it is possible that some images will be slightly tilted or moved in comparison with other images, due to the manual scanning process. Basically I want the coordinate of a pixel in one image to correspond to the exact same pixel in another image.
We now know that the coordinates of the graph centers and the angles for the circle sectors are identical for all images. For each circle sector, filter the darker pixels from the lighter ones. This is done using the openCV inRange function.
Search for the best fitting segment over the darker pixels in the left graph and search for the best fitting triangle in the right graph. This is done by global optimization.
Return the radius of the optimal segment and return the edge lengths of the optimal triangle. Now we have values that we can use as data.
I have more or less figured out how to do every step, except the first one. I have no clue how to go about aligning my images. Does anyone have an idea or a strategy for achieving this alignment?
Step 1: Canny. It gives you clean long edges. If this is the only part you don't understand, here is the answer. You can adjust the parameters to get the best result. The first setting is ideal for both the lines and the pie circles, but if you only want to find the pies, change the parameters accordingly to get my second image.
The red denotes the dotted line. Sample taken directly from OpenCV.
Step 2: local area enhancement/segmentation to find both circles (using the image 1 parameters, with the HoughCircles param2 set to 110).
Step 3: Segment the pie out (all the way to the edge of the image) and find the median line.
Step 4: OCR on the test image pies and find the distance of non-background color along the median line.
Step 5: Generate the list and write it to CSV or something similar.
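For the alignment itself, one option is to use the two circle centers from Step 2 as reference points. The sketch below is an assumption about how that could be done rather than part of the steps above: it computes the rotation, uniform scale and translation that map two points found in a scan onto the same two points in a template. The coordinates are made up. The resulting 2x3 matrix has the layout that cv2.warpAffine expects, so it could be passed there to resample the scan.

```python
import numpy as np


def similarity_from_two_points(src_a, src_b, dst_a, dst_b):
    """2x3 matrix (rotation, uniform scale, shift) mapping src_a, src_b to dst_a, dst_b."""
    src_a, src_b = complex(*src_a), complex(*src_b)
    dst_a, dst_b = complex(*dst_a), complex(*dst_b)
    # Treat 2D points as complex numbers: dst = scale_rotation * src + shift.
    scale_rotation = (dst_b - dst_a) / (src_b - src_a)
    shift = dst_a - scale_rotation * src_a
    return np.array([
        [scale_rotation.real, -scale_rotation.imag, shift.real],
        [scale_rotation.imag, scale_rotation.real, shift.imag],
    ])


def apply_affine(matrix, point):
    """Map one (x, y) point through a 2x3 affine matrix."""
    return matrix @ np.array([point[0], point[1], 1.0])


# Hypothetical graph centers in the template, and the same centers in a scan
# that is tilted by 2 degrees and shifted by (15, -8) pixels.
template_left, template_right = (300.0, 500.0), (900.0, 500.0)
angle = np.radians(2.0)
tilt = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])


def simulate_scan(point):
    return tilt @ np.array(point) + np.array([15.0, -8.0])


scan_left, scan_right = simulate_scan(template_left), simulate_scan(template_right)
scan_to_template = similarity_from_two_points(
    scan_left, scan_right, template_left, template_right
)

# Any other scanned pixel should now land on its template position.
template_probe = (600.0, 200.0)
recovered_probe = apply_affine(scan_to_template, simulate_scan(template_probe))
print(recovered_probe)
```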

opencv: reprojectImageTo3d what is the metric unit of the (X,Y,Z) point?

Firstly, I want to know the metric unit of the 3D point returned by the OpenCV reprojectImageTo3D() function.
Secondly, I have calibrated each camera individually with a chessboard using "mm" as the metric unit, and then used the OpenCV functions to calibrate the stereo system, rectify the stereo pair and compute the disparity map.
Basically I want the distance to the center of a bounding box.
So I compute the disparity map, reproject it to 3D with the reprojectImageTo3D() function, and then take from those 3D points the one which corresponds to the center of the bbox (x, y).
But which image should I use to get the center of the bbox: the rectified or the original?
Finally, is it better to use the same camera model for a stereo system?
Thank you
During the calibration process (calibrateCamera) you have to give the points grid of your calibration target. The unit that you give there will then define the unit for the rest of the process.
When calling reprojectImageTo3D, you probably used the matrix Q output by stereoRectify, which takes in the individual calibrations (cameraMatrix1, cameraMatrix2). That's where the unit came from.
So in your case you get mm I guess.
reprojectImageTo3D has to use the rectified image, since the disparity is calculated using the rectified image (It wouldn't be properly aligned otherwise). Also, when calculating the disparity, it is calculated relative to the first image given (left one in the doc). So you should use the left rectified image if you computed the disparity like this: cv::StereoMatcher::compute(left, right)
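To make the unit question concrete, here is a small NumPy sketch of the per-pixel computation that reprojectImageTo3D performs: it multiplies Q by (x, y, disparity, 1) and divides by the last component. The Q matrix below is built by hand in the form stereoRectify produces for a horizontal pair with equal principal points, which is an assumption, and the focal length, baseline and disparity are made-up values. The pixel (x, y) is taken in the left rectified image.

```python
import numpy as np


def reproject_pixel(q_matrix, x, y, disparity):
    """3D point for one pixel of the left rectified image and its disparity."""
    X, Y, Z, W = q_matrix @ np.array([x, y, disparity, 1.0])
    return np.array([X, Y, Z]) / W


# Hypothetical rectified setup: focal length in pixels, baseline in millimetres.
focal_px = 700.0
baseline_mm = 60.0
cx, cy = 320.0, 240.0
tx = -baseline_mm  # x translation between the cameras, in calibration units

q_matrix = np.array([
    [1.0, 0.0, 0.0, -cx],
    [0.0, 1.0, 0.0, -cy],
    [0.0, 0.0, 0.0, focal_px],
    [0.0, 0.0, -1.0 / tx, 0.0],
])

# Bounding-box center in the left rectified image, with its disparity in pixels.
point_mm = reproject_pixel(q_matrix, 320.0, 240.0, 35.0)
print(point_mm)
```

The only length in Q is the baseline, which comes from the calibration target's units, so the resulting Z here is in millimetres: focal_px * baseline_mm / disparity.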
I never had two different cameras, but it makes sense to use the same ones. I think that if you have very different color images, edges or any image difference, that could potentially influence the disparity quality.
What is actually very important (unless you are only working with still pictures), is to use cameras that can be synchronized by hardware (e.g. GENLOCK signal: https://en.wikipedia.org/wiki/Genlock). If you have a bit of delay between left and right and a moving subject, the disparity can be wrong. This is also true for the calibration.
Hope this helps!

Finding Corner points of Scrabble Board in an image

I am trying to extract the tiles ( Letters ) placed on a Scrabble Board. The goal is to identify / read all possible words present on the board.
An example image -
Ideally, I would like to find the four corners of the scrabble Board, and apply perspective transform, for further processing.
After Perspective transform -
The algorithm that I am using is as follows -
Apply Adaptive thresholding to the gray scale image of the Scrabble Board.
Dilate / Close the image, find the largest contour in the given image, then find the convex hull, and completely fill the area enclosed by the convex hull.
Find the boundary points ( contour ) of the resultant image, then apply Contour approximation to get the corner points, then apply perspective transform
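The last step of this pipeline, going from four detected corners to a perspective transform, can be sketched without OpenCV as below. The corner ordering rule (sums and differences of coordinates) and the corner values are assumptions for illustration; in OpenCV the equivalent calls would be cv2.getPerspectiveTransform followed by cv2.warpPerspective.

```python
import numpy as np


def order_corners(points):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left."""
    points = np.asarray(points, dtype=np.float64)
    sums = points.sum(axis=1)
    diffs = points[:, 1] - points[:, 0]  # y - x
    return np.array([
        points[np.argmin(sums)],   # top-left: smallest x + y
        points[np.argmin(diffs)],  # top-right: large x, small y
        points[np.argmax(sums)],   # bottom-right: largest x + y
        points[np.argmax(diffs)],  # bottom-left: small x, large y
    ])


def homography_from_corners(source, target):
    """3x3 homography mapping four source points onto four target points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(source, target):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    solution = np.linalg.solve(np.array(rows), np.array(rhs))
    return np.append(solution, 1.0).reshape(3, 3)


def warp_point(homography, point):
    """Map one (x, y) point through the homography."""
    x, y, w = homography @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])


# Hypothetical corner points, in the arbitrary order a contour approximation
# might return them.
detected = [(560.0, 470.0), (120.0, 80.0), (90.0, 430.0), (520.0, 110.0)]
ordered = order_corners(detected)

# Target: a 450 x 450 pixel top-down view of the board.
board_square = np.array([[0.0, 0.0], [449.0, 0.0], [449.0, 449.0], [0.0, 449.0]])
homography = homography_from_corners(ordered, board_square)
mapped_corners = np.array([warp_point(homography, corner) for corner in ordered])
print(mapped_corners)
```

The sum/difference ordering only holds while the board is not rotated by much more than about 45 degrees in the image.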
Corner Points found -
This approach works with images like these. But, as you can see, many square boards have a base, which is curved at the top and the bottom. Sometimes, the base is a big circular board. And with these images my approach fails. Example images and outputs -
Board with Circular base:
Points found using above approach:
I can post more such problematic images, but this image should give you an idea about the problem that I am dealing with. My question is -
How do I find the rectangular board when a circular board is also present in the image?
Some points I would like to state -
I tried using hough lines to detect the lines in the image, find the largest vertical line(s), and then find their intersections to detect the corner points. Unfortunately, because of the tiles, all lines seem to be distorted / disconnected, and hence my attempts have failed.
I have also tried to apply contour approximation to all the contours found in the image ( I was assuming that the large rectangle, too, would be a contour ), but that approach failed as well.
I have implemented the solution in openCV-python. Since the approach is what matters here, and the question was becoming a tad too long, I didn't post the relevant code.
I am willing to share more such problematic images as well, if it is required.
Thank you!
EDIT1
@Silencer's answer has been mighty helpful to me for identifying letters in the image, but I want to accurately find the placement of the words in the image. Hence, I feel identifying the rows and columns is necessary, and I can do that only when a perspective transform is applied to the board.
I wrote an answer on MSER text detection:
Trying to Plot OpenCV's MSER regions using matplotlib
The code generates the following results on your images.
You can give it a try.
I think @silencer has already given quite a promising solution.
But regarding the perspective transform: as you mentioned, you have already tried Hough lines to find the largest rectangle, and it fails because of the tiles present.
If you have a large image data set (perhaps more than 1000 images), you can also try a deep-learning-based approach, where you train a model with images as input and the corresponding rectangle boundary point coordinates as outputs.
