I have been capturing a photo of my face every day for the last couple of months, resulting in a sequence of images taken from the same spot, but with slight variations in orientation of my face. I have tried several ways to stabilize this sequence using Python and OpenCV, with varying rates of success. My question is: "Is the process I have now the best way to tackle this, or are there better techniques / order to execute things in?"
My process so far looks like this:
Collect images, keep the original image, a downscaled version and a downscaled grayscale version
Using dlib.get_frontal_face_detector() on the grayscale image, get a rectangle face containing my face.
Using the dlib shape-predictor 68_face_landmarks.dat, obtain the coordinates of the 68 face landmarks, and extract the position of eyes, nose, chin and mouth (specifically landmarks 8, 30, 36, 45, 48 and 54)
Using a 3D representation of my face (i.e. a numpy array containing 3D coordinates of an approximation of these landmarks on my real actual face in an arbitrary reference frame) and cv2.solvePnP, calculate a perspective transform matrix M1 to align the face with my 3D representation
Using the transformed face landmarks (i.e. cv2.projectPoints(face_points_3D, rvec, tvec, ...) with _, rvec, tvec = cv2.solvePnP(...)), calculate the 2D rotation and translation required to align the eyes vertically, center them horizontally and place them on a fixed distance from each other, and obtain the transformation matrix M2.
Using M = np.matmul(M2, M1) and cv2.warpPerspective, warp the image.
Using this method, I get okay-ish results, but it seems the 68 landmark prediction is far from perfect, resulting in twitchy stabilization and sometimes very skewed images (in that I can't remember having such a large forehead...). For example, the landmark prediction of one of the corners of the eye not always aligns with the actual eye, resulting in a perspective transform with the actual eye being skewed 20px down.
In an attempt to fix this, I have tried using SIFT to find features in two different photos (aligned using above method) and obtain another perspective transform. I then force the features to be somewhere around my detected face landmarks as to not align the background (using a mask in cv2.SIFT_create().detectAndCompute(...)), but this sometimes results in features only (or predominantly) being found around only one of the eyes, or not around the mouth, resulting again in extremely skewed images.
What would be a good way to get a smooth sequence of images, stabilized around my face? For reference, this video (not mine, which is stabilized around the eyes).
Related
I'm currently working on my first assignment in image processing (using OpenCV in Python). My assignment is to calculate a precise score (to tenths of a point) of one to several shooting holes in an image uploaded by a user. One of the requirements is to transform the uploaded shooting target image to be from "birds-eye view" for further processing. For that I have decided that I need to find center coordinates of numbers (7 & 8) to select them as my 4 quadrilateral.
Unfortunately, there are several limitations that need to be taken into account.
Limitations:
resolution of the processed shooting target image can vary
the image can be taken in different lighting conditions
the image processed by this part of my algorithm will always be taken under an angle (extreme angles will be automatically rejected)
the image can be slightly rotated (+/- 10 degrees)
the shooting target can be just a part of the image
the image can be only of the center black part of the target, meaning the user doesn't have to take a photo of the whole shooting target (but there always has to be the center black part on it)
this algorithm can take a maximum of 2000ms runtime
What I have tried so far:
Template matching
here I quickly realized that it was unusable since the numbers could be slightly rotated and a different scale
Feature matching
I have tried all of the different feature matching types (SIFT, SURF, ORB...)
unfortunately, the numbers do not have that specific set of features so they matched a quite lot of false positives, but I could possibly filter them by adding shape matching, etc..
the biggest blocker was runtime, the runtime of only a single number feature matching took around 5000ms (even after optimizations) (on MacBook PRO 2017)
Optical character recognition
I mostly tried using pytesseract library
even after thresholding the image to inverted binary (so the text of numbers 7 and 8 is black and the background white) it failed to recognize them
I also tried several ways of preprocessing the image and I played a lot with the tesseract config parameter but it didn't seem to help whatsoever
Contour detection
I have easily detected all of the wanted numbers (7 & 8) as single contours but failed to filter out all of the false positives (since the image can be in different resolutions and also there are two types of targets with different sizes of the numbers I couldn't simply threshold the contour by its width, height or area)
After I would detect the numbers as contours I wanted to extract them as some ROI and then I would use OCR on them (but since there were so many false positives this would take a lot of time)
I also tried filtering them by using cv2.matchShapes function on both contours and cropped template / ROI but it seemed really unreliable
Example processed images:
high resolution version here
high resolution version here
high resolution version here
high resolution version here
high resolution version here
high resolution version here
As of right now, I'm lost on how to progress about this. I have tried everything I could think of. I would be immensely happy if any of you image recognition experts gave me any kind of advice or even better a usable code example to help me solve my problem.
Thank you all in advance.
Find the black disk by adaptive binarization and contour (possibly blur to erase the inner features);
Fit an ellipse to the outline, as accurate as possible;
Find at least one edge of the square (Hough lines);
Classify the edge as one of NWSE (according to angle);
Use the ellipse and the line information to reconstruct the perspective transformation (it is an homography);
Apply the inverse homography to straighten the image and obtain the exact target center and axis;
Again by adaptive binarization, find the bullet holes (center/radius);
Rate the holes after their distance to the center, relative to the back disk radius.
If the marking scheme is variable, detect the circles (Hough circles, using the known center, or detect peaks in an oblique profile starting from the center).
If necessary, you could OCR the digits, but it seems that the score is implicitly starting at one in the outer ring.
firstly, I wanted to know the metric unit of the 3d point we got from the opencv reprojectImageTo3D() function.
secondly, I have calibrated each camera individually with a chessboard with "mm" as metric unit and then use the opencv functions to calibrate the stereo system, rectify the stereo pair and then compute the disparity map.
Basically i want the distance of a center of a bounding box.
so i compute the disparity map and reproject it to 3D with the reprojectImageTo3D() function and then i take from those 3D points, the one which correspond to the center of the bbox (x, y).
But which image should i use to get the center of bbox? the rectified or the original?
Secondly, is it better to use the same camera model for a stereo system?
Thank you
During the calibration process (calibrateCamera) you have to give the points grid of your calibration target. The unit that you give there will then define the unit for the rest of the process.
When calling reprojectImageTo3D, you probably used the matrix Q output by stereoRectify, which takes in the individual calibrations (cameraMatrix1, cameraMatrix2). That's where the unit came from.
So in your case you get mm I guess.
reprojectImageTo3D has to use the rectified image, since the disparity is calculated using the rectified image (It wouldn't be properly aligned otherwise). Also, when calculating the disparity, it is calculated relative to the first image given (left one in the doc). So you should use the left rectified image if you computed the disparity like this: cv::StereoMatcher::compute(left, right)
I never had two different cameras, but it makes sense to use the same ones. I think that if you have very different color images, edges or any image difference, that could potentially influence the disparity quality.
What is actually very important (unless you are only working with still pictures), is to use cameras that can be synchronized by hardware (e.g. GENLOCK signal: https://en.wikipedia.org/wiki/Genlock). If you have a bit of delay between left and right and a moving subject, the disparity can be wrong. This is also true for the calibration.
Hope this helps!
I am trying to get a 3D point cloud from images of my stereo camera.
I calibrated my stereo camera (240x320 each) with OpenCV in Python and got a reprojection error of 0.25. I used a chessboard pattern with 9x10 rows and columns and took 10 images in different angles at different lighting conditions and moved the camera while doing it. (I'm not sure, maybe that's the problem) Then I imported them to matlab and matlab gave me the chessboard corners. (I chose matlab because it was much more precise than opencv) Then I loaded the corners into my python calibration script.
The intrinsic calibration is flawless:
But I'm not sure whether I can use the results of the extrinsic calibration, because the image is still circular and has those black spaces; the rectangular centers of the images are good:
The parameters and matrices are saved and loaded in another script. That script runs a sobel operator through a left and a right image and finds the matching edge pixels in both images:
Then I'm handing those matching pixels over to cv.triangulatePoints(projMatr1, projMatr2, projPoints1, projPoints2) which should give me 3D coordinates of the points. I plot them with plotly but the result is not recognizable and is nowhere near the edges of the shown cup :
Does anyone have an idea, what it could be or what I'm doing wrong?
I can post my code here, if needed.
Ok I tried exchanging the row and column elements of the matched points from
[row, col] => [col, row] and now the triangulation is working:
There are still false matchings but thats ok, I expected them.
But changing the rows and columns doesnt work with the images:
but thats a different story and I'm satisfied with the current result.
I am trying to extract the tiles ( Letters ) placed on a Scrabble Board. The goal is to identify / read all possible words present on the board.
An example image -
Ideally, I would like to find the four corners of the scrabble Board, and apply perspective transform, for further processing.
After Perspective transform -
The algorithm that I am using is as follows -
Apply Adaptive thresholding to the gray scale image of the Scrabble Board.
Dilate / Close the image, find the largest contour in the given image, then find the convex hull, and completely fill the area enclosed by the convex hull.
Find the boundary points ( contour ) of the resultant image, then apply Contour approximation to get the corner points, then apply perspective transform
Corner Points found -
This approach works with images like these. But, as you can see, many square boards have a base, which is curved at the top and the bottom. Sometimes, the base is a big circular board. And with these images my approach fails. Example images and outputs -
Board with Circular base:
Points found using above approach:
I can post more such problematic images, but this image should give you an idea about the problem that I am dealing with. My question is -
How do I find the rectangular board when a circular board is also present in the image?
Some points I would like to state -
I tried using hough lines to detect the lines in the image, find the largest vertical line(s), and then find their intersections to detect the corner points. Unfortunately, because of the tiles, all lines seem to be distorted / disconnected, and hence my attempts have failed.
I have also tried to apply contour approximation to all the contours found in the image ( I was assuming that the large rectangle, too, would be a contour ), but that approach failed as well.
I have implemented the solution in openCV-python. Since the approach is what matters here, and the question was becoming a tad too long, I didn't post the relevant code.
I am willing to share more such problematic images as well, if it is required.
Thank you!
EDIT1
#Silencer's answer has been mighty helpful to me for identifying letters in the image, but I want to accurately find the placement of the words in the image. Hence, I feel identifying the rows and columns is necessary, and I can do that only when a perspective transform is applied to the board.
I wrote an answer on MSER text detection:
Trying to Plot OpenCV's MSER regions using matplotlib
The code generate the following results on your images.
You can have a try.
I think #silencer has already given quite promising solution.
But to perform perspective transform as you have mentioned that you have already tried with hough lines to find the largest rectangle but it fails because for tiles present.
Given you have large image data set may be more than 1000 images, you can also give a shot to Deep learning based approach where you can train a model with images as input and corresponding rectangle boundary points coordinate as outputs.
I am in the process of putting together an OpenCV script to analyze immunohistochemically stained heart tissue. Our staining procedure renders cell types expressing certain proteins in their plasma membranes with pigments visible under a light microscope, which we use to photograph the images.
So far, I've succeded in segmenting the images to different layers based on color range using a modified version of the frequently cited color segmentation script available through the OpenCV community(http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html).
A screen shot of the original image:
B-Cell layer displayed:
At this point, I would like to calculate the ratio of area of B-Cells to unstained tissue. This operation prompted an extraction of the background cell layer as such based on color range:
Obviously, these results leave much to be desired.
Does anyone have ideas of how to approach this problem? Again, I would like to segment the background tissue (transparent) layer, which is unfortunately fairly sponge-like in texture. My goal is to create a mask representive of the area of unstained tissue. It seems a blur technique is necessary to fill the gaps in the tissue, but the loss in accuracy this approach entails is obvious.
In the sample image, the channels look highly correlated. If you apply decorrelation-stretching to the image you should be able to see more detail. Here in my blog post I've implemented decorrelation-stretching in C++ (unfortualtely not Python).
Using the sample code in the blog I did the following to segment the cell region:
dstretch the CIE Lab image with following targetMean and tergetSigma.
float mu[3] = {128.0f, 128.0f, 128.0f};
float sd[3] = {128.0f, 5.0f, 5.0f};
Mat mean = Mat(3, 1, CV_32F, mu);
Mat sigma = Mat(3, 1, CV_32F, sd);
Convert the dstretched CIE Lab image back to BGR.
Erode this BGR image with a 3x3 rectangular structuring element once.
Apply kmeans clustering to this eroded image with k = 2.
I don't know how good this segmentation is. I think it is possible to get a better segmentation by trying different values for the above parameters (mean, sigma, structuring element size and number of times the image is eroded).
(Following images are not to the original scale)
Original:
dstretched CIE Lab converted back to BGR:
Eroded:
kmeans with k = 2: