The goal. To estimate the 3D location (x, y, z) of the centre, the width (the larger diameter of the glass), and the height of the glass, similar to this drawing. The inputs are two images, each coming from a different camera (here and here).
The setup. The images come from two fixed and calibrated (known intrinsic and extrinsic parameters) cameras.
My attempt.
I have segmented the image using FCN or DeepLab. Results here and here.
Then I obtained a binary mask of the class of interest (glass) and extracted the leftmost, topmost, rightmost, and bottommost points of that mask. Results here and here.
I obtained four 3D points through triangulation of the "corresponding points" (topmost of image 1 with the topmost of image 2, rightmost of image 1 with the rightmost of image 2, etc.), as sketched below.
I compute the dimensions as: width = |left - right| and height = |up - bottom|.
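A minimal sketch of this triangulation step (not my exact code), assuming binary masks mask1/mask2 from the segmentation and 3x4 projection matrices P1/P2 from the calibration; all names are placeholders:

import cv2
import numpy as np

def extreme_points(mask):
    # Leftmost, rightmost, topmost, and bottommost pixel (x, y) of a binary mask.
    ys, xs = np.nonzero(mask)
    return ((xs.min(), ys[xs.argmin()]), (xs.max(), ys[xs.argmax()]),
            (xs[ys.argmin()], ys.min()), (xs[ys.argmax()], ys.max()))

def triangulate(px1, px2, P1, P2):
    # cv2.triangulatePoints takes 2xN pixel arrays and returns 4xN homogeneous points.
    pts4d = cv2.triangulatePoints(P1, P2,
                                  np.float32(px1).reshape(2, 1),
                                  np.float32(px2).reshape(2, 1))
    return (pts4d[:3] / pts4d[3]).ravel()

left1, right1, top1, bottom1 = extreme_points(mask1)   # mask1, mask2: binary glass masks
left2, right2, top2, bottom2 = extreme_points(mask2)   # P1, P2: 3x4 projection matrices

left_3d = triangulate(left1, left2, P1, P2)
right_3d = triangulate(right1, right2, P1, P2)
top_3d = triangulate(top1, top2, P1, P2)
bottom_3d = triangulate(bottom1, bottom2, P1, P2)

width = np.linalg.norm(left_3d - right_3d)    # width = |left - right|
height = np.linalg.norm(top_3d - bottom_3d)   # height = |up - bottom|
centre = (left_3d + right_3d + top_3d + bottom_3d) / 4.0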
Issues. The points are not actual correspondences, so the reprojection is inaccurate and the measurement is inaccurate as well (resulting in up to 3 cm of error). Note that if I manually pick the corresponding pixels in both images and then triangulate, I get approximately 0.1 cm of error.
Can you guide me on how to solve this problem more accurately?
Thank you!
PS: I am using python and OpenCV.
Using an iPhone camera with a TrueDepth sensor, I am able to capture accurate depth data in images of my face. I am capturing depth from the front, left, and right sides (about 30 degrees of rotation), and with the head tilted up a bit to capture under the chin (so 4 depth images in total). We are only capturing depth here, so no color information. We are cropping out unimportant data by using an ellipse frame.
We are also using ARKit to give us the transform of the face anchor, which is the same as the transform of the face, ref: https://developer.apple.com/documentation/arkit/arfaceanchor. It isn't possible to capture a depth image and the face transform at the same time, since they come from different capture sessions. So we have to take the depth image and then quickly switch sessions while the user holds their face still to get the face anchor transform. The world alignment is set to .camera, so the face anchor transform should be relative to the camera, not the world origin.
We end up with 4 point clouds that look like this (left to right): chin up, left 30, front on, right 30.
We also end up with 4 transforms. We are trying to stitch the point clouds back together to make a smooth mesh of the face using open3d in Python.
The process so far is as follows:
read point clouds and transforms
apply inverse transforms to point clouds to return to original position w.r.t camera
I was expecting these point clouds to roughly be at the same position, but this is happening instead:
As you can see the faces are still offset from one another:
Am I using the transforms wrong?
The python code and example point clouds and transforms are here: https://github.com/JoshPJackson/FaceMesh but the important bit is below:
import numpy as np

# readPointCloud / readTransform are helpers from the linked repository;
# they load the CSVs into open3d point clouds and 4x4 NumPy matrices.
dir = './temp4/'

frontPcd = readPointCloud('Front.csv', dir)
leftPcd = readPointCloud('Left.csv', dir)
rightPcd = readPointCloud('Right.csv', dir)
chinPcd = readPointCloud('Chin.csv', dir)

frontTransform = readTransform('front_transform.csv', dir)
leftTransform = readTransform('left_transform.csv', dir)
rightTransform = readTransform('right_transform.csv', dir)
chinTransform = readTransform('chin_transform.csv', dir)

# Apply the inverse of each ARKit face-anchor transform to bring every
# cloud back to its original position w.r.t. the camera.
rightPcd.transform(np.linalg.inv(rightTransform))
leftPcd.transform(np.linalg.inv(leftTransform))
frontPcd.transform(np.linalg.inv(frontTransform))
chinPcd.transform(np.linalg.inv(chinTransform))
I am expecting all of the point clouds to merge together so I can remove duplicate vertices and then make a mesh.
One good method is to find a mathematical reference for your faces (this works only with surfaces).
The steps to do it:
Take one face and create a 2D map of the face with a function f(x, y). The noise has to point in the z direction.
Fit your new function to the other faces using lmfit.minimize or curve_fit.
Use the parameters returned by the fit to recover the offset (sketched below)!
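A rough sketch of this idea, assuming two of the point clouds are available as Nx3 NumPy arrays (face_a, face_b) and using a simple quadratic surface as the reference; the names and the choice of surface are illustrative:

import numpy as np
from scipy.optimize import curve_fit

def surface(xy, a, b, c, d, e, f):
    # Simple quadratic reference surface z = f(x, y) fitted to one face.
    x, y = xy
    return a*x**2 + b*y**2 + c*x*y + d*x + e*y + f

# Fit the reference surface to the first face (N x 3 array of points).
ref_params, _ = curve_fit(surface, (face_a[:, 0], face_a[:, 1]), face_a[:, 2])

def shifted(xy, dx, dy, dz):
    # The same surface, translated; the fit recovers the offset (dx, dy, dz).
    x, y = xy
    return surface((x - dx, y - dy), *ref_params) + dz

# Fit the translated surface to the second face to estimate its offset.
offset, _ = curve_fit(shifted, (face_b[:, 0], face_b[:, 1]), face_b[:, 2], p0=(0.0, 0.0, 0.0))
print("estimated offset (dx, dy, dz):", offset)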
I have automated the task of measuring plant area over time to extrapolate growth rate using an image time-series and the following two methods: (1) Python + ArcGIS, and (2) Python + OpenCV.
In the first method, ArcGIS allows me to create a vector grid on the image. Each cell of the grid contains a single plant, so I number each cell starting from top-left to bottom-right. After creating a binary image in which plant pixels == 1 and everything else == 0, I apply Zonal Statistics to find my plant area. In this way the plant numbers stay consistent because I use the same grid over all the images in the time series, but it requires manual intervention.
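For illustration, the zonal-statistics step can be approximated in plain NumPy/OpenCV (this is not the ArcGIS workflow itself); the file name and grid layout below are assumptions:

import cv2
import numpy as np

# Binary image: plant pixels == 1, everything else == 0.
mask = (cv2.imread("plants_binary.png", cv2.IMREAD_GRAYSCALE) > 0).astype(np.uint8)

rows, cols = 4, 6   # grid layout, numbered top-left to bottom-right (assumption)
h, w = mask.shape
areas = {}
for r in range(rows):
    for c in range(cols):
        cell = mask[r * h // rows:(r + 1) * h // rows, c * w // cols:(c + 1) * w // cols]
        areas[r * cols + c + 1] = int(cell.sum())   # plant area (in pixels) per numbered cell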
In the second method, I use OpenCV to find plants via contours. The numbering of each contour is done automatically based on its centroid coordinates and bounding box dimensions. Currently I have them sorted 'top-to-bottom', but it obviously isn't as perfect a sort as the manually-made grid. In addition, plant #1 may not stay plant #1 in the second or third image because each plant grows and moves over the course of the experiment, and new plants emerge and change the total number of contours (images are taken every hour for up to several weeks). Therefore, I cannot compare plant #1 in the first image and plant #1 in subsequent images because they may not even be the same plant.
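A minimal sketch of the contour-based numbering in method (2); the file name and area threshold are assumptions:

import cv2

mask = cv2.imread("plants_binary.png", cv2.IMREAD_GRAYSCALE)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

plants = []
for c in contours:
    if cv2.contourArea(c) < 50:          # drop tiny noise blobs (threshold is a guess)
        continue
    m = cv2.moments(c)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    x, y, w, h = cv2.boundingRect(c)
    plants.append({"centroid": (cx, cy), "bbox": (x, y, w, h), "contour": c})

# Rough top-to-bottom, then left-to-right ordering; this is the part that breaks
# when plants move or new plants emerge between images.
plants.sort(key=lambda p: (round(p["centroid"][1], -2), p["centroid"][0]))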
How can I consistently number the same plant through the entire time-series using the second method? I considered associating centroids in subsequent images to (x,y) coordinates in the previous image that were the most similar (once the data is in tabular form), but this would fail to provide an updated numbered contour image.
The solution to this problem lay in automatic circle detection via the OpenCV Hough transform function (cv2.HoughCircles()): finding the resulting Hough circle centroids and then overlaying them on the original RGB image to create a reference key. As I did not have an image without any plants in it at all, I adapted the method so that it found the correct number of origins, but the result would be better in an image with no plants.
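A minimal sketch of this reference-key step; all Hough parameters and file names are assumptions to be tuned per image set:

import cv2
import numpy as np

rgb = cv2.imread("tray_rgb.png")
gray = cv2.medianBlur(cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY), 5)

# Detect the circular plant origins; param2 controls how strict the detection is.
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                           param1=100, param2=40, minRadius=30, maxRadius=80)

key = rgb.copy()
if circles is not None:
    for oid, (x, y, r) in enumerate(np.round(circles[0]).astype(int).tolist()):
        cv2.circle(key, (x, y), r, (0, 255, 0), 2)                 # detected origin
        cv2.putText(key, str(oid), (x, y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
cv2.imwrite("reference_key.png", key)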
I converted the resulting CSV files for the Hough circles reference image (columns: OID, X, Y) and the plant contours (columns: CID, X, Y, Area, etc.) to GeoPandas GeoDataFrames and used SciPy's cKDTree to combine them with a nearest-neighbour algorithm.
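A minimal sketch of the nearest-neighbour join, using plain pandas instead of GeoPandas for brevity; the CSV file names follow the column descriptions above and are otherwise assumptions:

import pandas as pd
from scipy.spatial import cKDTree

anchors = pd.read_csv("hough_circle_origins.csv")   # columns: OID, X, Y
plants = pd.read_csv("plant_contours.csv")          # columns: CID, X, Y, Area, ...

# Build a KD-tree on the fixed reference origins and query it with every detected
# plant centroid; the nearest origin's OID becomes the stable plant number.
tree = cKDTree(anchors[["X", "Y"]].to_numpy())
dist, idx = tree.query(plants[["X", "Y"]].to_numpy(), k=1)
plants["plant_id"] = anchors["OID"].to_numpy()[idx]
plants["dist_to_origin"] = dist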
Special thanks to JHuw's answer in https://gis.stackexchange.com/questions/222315/geopandas-find-nearest-point-in-other-dataframe as Shapely's nearest_points function did not work for me.
I'm currently working on my first assignment in image processing (using OpenCV in Python). My assignment is to calculate a precise score (to tenths of a point) of one or several shooting holes in an image uploaded by a user. One of the requirements is to transform the uploaded shooting target image to a "birds-eye view" for further processing. For that, I have decided that I need to find the center coordinates of the numbers (7 & 8) to use them as my four quadrilateral points.
Unfortunately, there are several limitations that need to be taken into account.
Limitations:
resolution of the processed shooting target image can vary
the image can be taken in different lighting conditions
the image processed by this part of my algorithm will always be taken under an angle (extreme angles will be automatically rejected)
the image can be slightly rotated (+/- 10 degrees)
the shooting target can be just a part of the image
the image can be only of the center black part of the target, meaning the user doesn't have to take a photo of the whole shooting target (but there always has to be the center black part on it)
this algorithm has a maximum runtime of 2000 ms
What I have tried so far:
Template matching
here I quickly realized that it was unusable since the numbers can be slightly rotated and at a different scale
Feature matching
I have tried all of the different feature matching types (SIFT, SURF, ORB...)
unfortunately, the numbers do not have a very distinctive set of features, so matching produced quite a lot of false positives; I could possibly filter them by adding shape matching, etc.
the biggest blocker was runtime: feature matching for only a single number took around 5000 ms (even after optimizations) (on a MacBook Pro 2017)
Optical character recognition
I mostly tried using pytesseract library
even after thresholding the image to an inverted binary (so the text of the numbers 7 and 8 is black and the background white), it failed to recognize them
I also tried several ways of preprocessing the image and played a lot with the tesseract config parameter, but it didn't seem to help whatsoever
Contour detection
I easily detected all of the wanted numbers (7 & 8) as single contours, but failed to filter out all of the false positives (since the image can come in different resolutions, and there are two types of targets with different sizes of the numbers, I couldn't simply threshold the contours by width, height, or area)
After detecting the numbers as contours, I wanted to extract them as ROIs and then run OCR on them (but since there were so many false positives, this would take a lot of time)
I also tried filtering them by using the cv2.matchShapes function on both contours and a cropped template / ROI, but it seemed really unreliable (a sketch of this attempt is shown below)
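A minimal sketch of that contour + matchShapes attempt (not a working solution); the file names and the 0.2 score threshold are assumptions:

import cv2

img = cv2.imread("target_photo.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# Reference contour taken from a cropped template of the digit.
template = cv2.imread("digit_7_template.png", cv2.IMREAD_GRAYSCALE)
_, tmpl_bin = cv2.threshold(template, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
tmpl_contour = max(cv2.findContours(tmpl_bin, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_SIMPLE)[0], key=cv2.contourArea)

# Keep only contours whose shape is close to the template; in practice this
# still lets many false positives through, as described above.
candidates = [c for c in contours
              if cv2.matchShapes(c, tmpl_contour, cv2.CONTOURS_MATCH_I1, 0.0) < 0.2]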
Example processed images:
(six example images; high-resolution versions linked in the original post)
As of right now, I'm lost on how to proceed. I have tried everything I could think of. I would be immensely happy if any of you image recognition experts could give me any kind of advice, or even better, a usable code example to help me solve my problem.
Thank you all in advance.
Find the black disk by adaptive binarization and contour extraction (possibly blur to erase the inner features);
Fit an ellipse to the outline, as accurately as possible (a sketch of these first two steps is given after this list);
Find at least one edge of the square (Hough lines);
Classify the edge as one of NWSE (according to angle);
Use the ellipse and the line information to reconstruct the perspective transformation (it is a homography);
Apply the inverse homography to straighten the image and obtain the exact target center and axis;
Again by adaptive binarization, find the bullet holes (center/radius);
Rate the holes by their distance to the center, relative to the black disk radius.
If the marking scheme is variable, detect the circles (Hough circles, using the known center, or detect peaks in an oblique profile starting from the center).
If necessary, you could OCR the digits, but it seems that the score is implicitly starting at one in the outer ring.
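A minimal sketch of the first two steps (adaptive binarization of the black disk, largest contour, ellipse fit); the blur size and threshold parameters are illustrative assumptions:

import cv2

img = cv2.imread("target.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(img, (15, 15), 0)            # erase the inner features of the disk

# Adaptive threshold is robust to uneven lighting; invert so the black disk is white.
binary = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, blockSize=51, C=5)

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
disk = max(contours, key=cv2.contourArea)            # the black disk outline
(cx, cy), (major, minor), angle = cv2.fitEllipse(disk)
print("ellipse centre:", (cx, cy), "axes:", (major, minor), "angle:", angle)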
I'm currently trying to write a program that can automatically extract data from some graphs in multiple scanned documents. Using mainly OpenCV, I would like to detect some features of the graphs in order to convert them into usable data. In the left graph I'm looking for the height of the circle sectors, and in the right graph for the distance from the center to the points where the dotted lines intersect with the gray area. In both cases I would like to convert these values into numeric data for further usage.
What follows is a step by step plan of how I think my algorithm will work:
Align the image based on the big dotted lines. This way I can ensure that the graphs in all the scanned images will have the exact same positions. After all, it is possible that some images will be slightly tilted or moved in comparison with other images, due to the manual scanning process. Basically I want the coordinate of a pixel in one image to correspond to the exact same pixel in another image.
We now know that the coordinates of the graph centers and the angles of the circle sectors are identical for all images. For each circle sector, separate the darker pixels from the lighter ones. This is done using the OpenCV inRange function (a small sketch is given after these steps).
Search for the best fitting segment over the darker pixels in the left graph and search for the best fitting triangle in the right graph. This is done by global optimization.
Return the radius of the optimal segment and return the edge lengths of the optimal triangle. Now we have values that we can use as data.
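A small sketch of the darker/lighter split mentioned in the second step; the intensity bounds are an assumption to be tuned per scan:

import cv2

gray = cv2.imread("scanned_graph.png", cv2.IMREAD_GRAYSCALE)
# Within a sector, keep only the darker pixels (-> 255); everything else becomes 0.
dark = cv2.inRange(gray, 0, 120)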
I have more or less figured out how to do every step except the first one. I have no clue how I would go about aligning my images. Does anyone have an idea or a strategy for how to achieve this alignment?
Step 1: Canny edge detection gives you clean, long edges. If the alignment is the only part you don't understand, here is the answer. You can adjust the parameters to get the best result. The first parameter set is ideal for both the lines and the pie circle; if you only want to find the pie, change the parameters accordingly to get my second image.
The red denotes the dotted line (sample taken directly from OpenCV).
Step 2: Local area enhancement/segmentation to find both circles (using the parameters from image 1, with HoughCircles param2 set to 110). A sketch of steps 1-2 is given after this list.
Step 3: Segment the pie out (all the way to the edge of the image) and find the median line.
Step 4: Run OCR on the test image pies and find the distance of non-background color along the median line.
Step 5: Generate a list and export it to CSV or similar.
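A minimal sketch of steps 1-2; the Canny thresholds and most Hough parameters are assumptions, except param2 = 110, which is the value mentioned above:

import cv2
import numpy as np

gray = cv2.imread("scanned_graph.png", cv2.IMREAD_GRAYSCALE)

# Step 1: Canny picks out the long edges (dotted lines and circle outlines) for alignment.
edges = cv2.Canny(gray, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=100, maxLineGap=20)

# Step 2: Hough circle detection; a high param2 (accumulator threshold) keeps only
# the strongest, well-supported circles.
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=200,
                           param1=150, param2=110, minRadius=50, maxRadius=400)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int).tolist():
        print("circle centre:", (x, y), "radius:", r)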
Firstly, I wanted to know the metric unit of the 3D points we get from the OpenCV reprojectImageTo3D() function.
Secondly, I have calibrated each camera individually with a chessboard using "mm" as the metric unit, and then used the OpenCV functions to calibrate the stereo system, rectify the stereo pair, and compute the disparity map.
Basically, I want the distance to the center of a bounding box.
So I compute the disparity map, reproject it to 3D with the reprojectImageTo3D() function, and then take, from those 3D points, the one which corresponds to the center of the bbox (x, y).
But which image should I use to get the center of the bbox: the rectified or the original one?
Finally, is it better to use the same camera model for both cameras of a stereo system?
Thank you
During the calibration process (calibrateCamera) you have to give the points grid of your calibration target. The unit that you give there will then define the unit for the rest of the process.
When calling reprojectImageTo3D, you probably used the matrix Q output by stereoRectify, which takes in the individual calibrations (cameraMatrix1, cameraMatrix2). That's where the unit came from.
So in your case you get mm I guess.
reprojectImageTo3D has to use the rectified image, since the disparity is calculated using the rectified image (it wouldn't be properly aligned otherwise). Also, the disparity is calculated relative to the first image given (the left one in the docs). So you should use the left rectified image if you computed the disparity like this: cv::StereoMatcher::compute(left, right)
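A minimal sketch of that pipeline in Python, assuming left_rect/right_rect are the rectified images, Q comes from cv2.stereoRectify on your calibration, and bbox_center is the bbox centre in left rectified image coordinates (all names are placeholders):

import cv2
import numpy as np

# Disparity is computed relative to the first (left) rectified image; SGBM returns
# fixed-point values scaled by 16.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left_rect, right_rect).astype(np.float32) / 16.0

# Q carries the calibration unit (mm here) into the reprojected 3D coordinates.
points_3d = cv2.reprojectImageTo3D(disparity, Q)

u, v = bbox_center                      # (x, y) in the LEFT RECTIFIED image
X, Y, Z = points_3d[v, u]
print("distance to bbox centre (mm):", float(np.linalg.norm((X, Y, Z))))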
I have never used two different cameras, but it makes sense to use the same ones. I think that if you have very different colors, edges, or any other image differences, that could potentially affect the disparity quality.
What is actually very important (unless you are only working with still pictures), is to use cameras that can be synchronized by hardware (e.g. GENLOCK signal: https://en.wikipedia.org/wiki/Genlock). If you have a bit of delay between left and right and a moving subject, the disparity can be wrong. This is also true for the calibration.
Hope this helps!