I am trying to reconstruct a 3d shape from multiple 2d images.
I have calculated a fundamental matrix, but now I don't know what to do with it.
I am finding multiple conflicting answers on stack overflow and academic papers.
For example, one source says you need to compute the rotation and translation matrices from the fundamental matrix.
Another says you need to find the camera matrices.
Another says you need to find the homographies.
Another says you need to find the epipolar lines.
Which is it?? (And how do I do it? I have read the H&Z book but I do not understand it. It says I can 'easily' use the 'direct formula' in result 9.14, but result 9.14 is neither easy nor direct to understand.)
Stack overflow wants code so here's what I have so far:
# let's create some sample data
Wpts = np.array([[1, 1, 1, 1],   # a cube in world points (homogeneous coordinates)
                 [1, 2, 1, 1],
                 [2, 1, 1, 1],
                 [2, 2, 1, 1],
                 [1, 1, 2, 1],
                 [1, 2, 2, 1],
                 [2, 1, 2, 1],
                 [2, 2, 2, 1]])

Cpts = np.array([[0, 4, 0, 1],   # camera positions, slightly up
                 [4, 0, 0, 1],
                 [-4, 0, 0, 1],
                 [0, -4, 0, 1]])

Cangles = np.array([[0, -1, 0],  # camera orientations, slightly looking down
                    [-1, 0, 0],
                    [1, 0, 0],
                    [0, 1, 0]])

views = []
transforms = []
clen = len(Cpts)
for i in range(clen):
    cangle = Cangles[i]
    cpt = Cpts[i]
    transform = cameraTransformMatrix(cangle, cpt)  # helper functions defined elsewhere (not shown)
    transforms.append(transform)
    newpts = np.dot(Wpts, transform.T)
    view = cameraView(newpts)
    views.append(view)

H = cv2.findFundamentalMat(views[0], views[1])[0]
## now what??? How do I recover the cube shape?
Edit: I do not know the camera parameters
Fundamental Matrix
First, listen to the fundamental matrix song ;).
The Fundamental Matrix only expresses the mathematical relationship between your point correspondences in two images (x' in image 2, x in image 1): "for all pairs of corresponding points x <-> x', the relation x'^T F x = 0 holds" (Wikipedia). This also means that if you have outliers or incorrect point correspondences, they directly affect the quality of your fundamental matrix.
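If you want a quick sanity check of an estimated F, you can evaluate that residual directly. The sketch below is my own helper (pts1/pts2 are assumed to be Nx2 arrays of matched points), not part of the reconstruction itself:

import numpy as np

def epipolar_residuals(F, pts1, pts2):
    # convert to homogeneous coordinates and evaluate x'_i^T F x_i for every match
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    return np.einsum('ij,jk,ik->i', x2, F, x1)

# large absolute residuals indicate outliers that degrade the estimated F
# print(np.abs(epipolar_residuals(F, pts1, pts2)))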
Additionally, a similar structure exists for the relationship of point correspondences between 3 images, which is called the Trifocal Tensor.
A 3d reconstruction using exclusively the properties of the Fundamental Matrix is not possible because "The epipolar geometry is the intrinsic projective geometry between two views. It is independent of scene structure, and only depends on the cameras' internal parameters and relative pose." (HZ, p. 239)
Camera matrix
Referring to your question of how to reconstruct the shape from multiple images: you need to know the camera matrices of your images (K', K). The camera matrix is a 3x3 matrix composed of the camera focal lengths or principal distance (fx, fy) as well as the optical center or principal point (cx, cy):

K = | fx  0  cx |
    |  0 fy  cy |
    |  0  0   1 |
You can derive your camera matrix using camera calibration.
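For completeness, here is a rough sketch of the usual chessboard calibration with OpenCV; the file pattern "calib_*.png", the 9x6 grid and the 25 mm square size are placeholders you would replace with your own setup:

import glob
import numpy as np
import cv2

pattern_size = (9, 6)                       # inner corners of the chessboard (placeholder)
square_size = 25.0                          # square size in mm (placeholder)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

objpoints, imgpoints = [], []
for fname in glob.glob("calib_*.png"):
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

# K is the 3x3 camera matrix, dist the lens distortion coefficients
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
print(K)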
Essential matrix
When you know your camera matrices you can extend your Fundamental Matrix to an Essential Matrix E = K'^T F K.
You could say, somewhat sloppily, that your Fundamental Matrix is now "calibrated".
The Essential Matrix can be used to get the rotation (rotation matrix R) and translation (vector t) of your second image relative to your first image, but only up to scale: t will be a unit vector. For this purpose you can use the OpenCV functions decomposeEssentialMat or recoverPose (which uses the cheirality check), or read further detailed explanations in HZ.
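A minimal sketch of that step, assuming for simplicity that both images come from the same camera (so K' = K), that pts1/pts2 are the Nx2 matched points used to estimate F, and with a function name of my own:

import cv2
import numpy as np

def relative_pose(F, K, pts1, pts2):
    E = K.T @ F @ K                          # E = K'^T F K (here K' = K)
    # recoverPose performs the cheirality check; t comes back as a unit vector
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K)
    return R, t

# alternatively, cv2.decomposeEssentialMat(E) returns R1, R2, t, i.e. the four
# possible (R, t) combinations from which the physically valid one must be picked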
Projection matrix
Knowing your translation and rotation you can build the projection matrices for your images. The projection matrix is defined as P = K [R | t]. Finally, you can use triangulation (triangulatePoints) to derive the 3d coordinates of your image points. I recommend using a subsequent bundle adjustment to obtain a proper configuration. There is also an sfm module in OpenCV.
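A sketch of these last two steps, reusing the placeholder names from the sketches above (K, R, t, pts1, pts2):

import cv2
import numpy as np

def triangulate(K, R, t, pts1, pts2):
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # P  = K [I | 0]
    P2 = K @ np.hstack([R, t.reshape(3, 1)])            # P' = K [R | t]
    X_h = cv2.triangulatePoints(P1, P2, pts1.T.astype(float), pts2.T.astype(float))
    return (X_h[:3] / X_h[3]).T                          # de-homogenize -> Nx3, up to scale

# X = triangulate(K, R, t, pts1, pts2)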
Since knowledge of homographies or epipolar lines is not strictly necessary for the 3d reconstruction, I did not explain these concepts.
With your fundamental matrix alone, you can determine the camera matrices P and P' in a canonical form as stated in (HZ, pp. 254-256): P = [I | 0] and P' = [[e']_x F | e'], where e' is the epipole in the second image. From these camera matrices you can, in theory, triangulate a projective reconstruction that differs from the real scene by an unknown projective transformation.
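As a sketch of that canonical form (result 9.14 from the question), with a helper name of my own:

import numpy as np

def canonical_cameras(F):
    # e' is the epipole in the second image: e'^T F = 0, i.e. the left
    # null-vector of the (rank-2) matrix F, obtained from its SVD
    U, S, Vt = np.linalg.svd(F)
    e2 = U[:, -1]
    e2_cross = np.array([[0, -e2[2], e2[1]],
                         [e2[2], 0, -e2[0]],
                         [-e2[1], e2[0], 0]])
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])        # P  = [I | 0]
    P2 = np.hstack([e2_cross @ F, e2.reshape(3, 1)])     # P' = [[e']_x F | e']
    return P1, P2

Triangulating your correspondences with these two matrices gives the projective reconstruction mentioned above.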
It has to be noted that linear triangulation methods aren't suitable for projective reconstruction, as stated in (HZ, Discussion, p. 313): "...neither of these two linear methods is quite suitable for projective reconstruction, since they are not projective-invariant." Therefore, the recommended triangulation technique mentioned there should be used to obtain valuable results (which is more work to implement).
From this projective reconstruction you could use self-calibration approaches; these can work in some scenarios, but they will not reach the accuracy and robustness you get with a calibrated camera and the essential matrix to compute the motion parameters.
Related
I want to apply an affine transformation (using scipy ndimage.affine_transform).
In particular I want to apply a shear in both dimensions so my transform matrix looks something like this:
transform = np.array([[1, degree_v, 0],
                      [degree_h, 1, 0],
                      [0, 0, 1]])
the resulting image returned by
ndimage.affine_transform(img, transform)
is cropped (which makes sense) and as I understand I have to give the correct output_shape and offset to keep the image uncropped.
How do I calculate these values?
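One way to do it (a sketch; the helper name and the example shear values are my own): ndimage.affine_transform maps output coordinates to input coordinates via input = matrix @ output + offset, so you can push the input corners through the inverse matrix, take their bounding box as output_shape, and choose offset so that the minimum corner lands at (0, 0).

import numpy as np
from scipy import ndimage

def affine_uncropped(img, matrix):
    h, w = img.shape[:2]
    corners = np.array([[0, 0], [0, w - 1], [h - 1, 0], [h - 1, w - 1]], float)
    inv = np.linalg.inv(matrix[:2, :2])
    mapped = corners @ inv.T                     # forward-mapped corner positions
    lo, hi = mapped.min(axis=0), mapped.max(axis=0)
    output_shape = tuple(int(n) for n in np.ceil(hi - lo) + 1)
    offset = matrix[:2, :2] @ lo                 # shift so the minimum lands at (0, 0)
    return ndimage.affine_transform(img, matrix[:2, :2],
                                    offset=offset, output_shape=output_shape)

degree_v, degree_h = 0.2, 0.1                    # example shear values
transform = np.array([[1, degree_v, 0],
                      [degree_h, 1, 0],
                      [0, 0, 1]])
img = np.zeros((50, 80))
img[10:40, 20:60] = 1.0
print(affine_uncropped(img, transform).shape)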
I have two shapes or coordinate systems, and I want to be able to transform points from one system onto the other.
I have found that if the shapes are quadrilaterals and I have 4 pairs of corresponding points, then I can calculate a transformation matrix and use that matrix to map any point in Shape B onto its corresponding coordinates in Shape A.
Here is the working python code to make this calculation:
import numpy as np
import cv2
shape_a_points = np.array([
    [0.6, 0],
    [1, 0.75],
    [0.8, 1],
    [0.5, 0.6]
], dtype="float32")

shape_b_points = np.array([
    [0, 0],
    [1, 0],
    [1, 1],
    [0, 1],
], dtype="float32")

test_points = [0.5, 0.5]

matrix = cv2.getPerspectiveTransform(shape_b_points, shape_a_points)
print(matrix)

result = cv2.perspectiveTransform(np.array([[test_points]], dtype="float32"), matrix)
print(result)
If you run this code you'll see that the test point of (0.5, 0.5) on Shape B (right in the middle), comes out as (0.73, 0.67) on Shape A, which visually looks correct.
However what can I do if the shape is more complex. Such as 4+N vertices, and 4+N pairs of corresponding points? Or even more complex, what if there are curves in the shapes?
Thanks @christoph-rackwitz for pointing me in the right direction.
I have found very good results for transformations using the OpenCV ThinPlateSplineShapeTransformer.
Here is my example script below. Note that I have 7 pairs of points. The "matches" is just a list of 7 matches (telling the script that point #1 from Shape A corresponds to point #1 from Shape B, and so on).
import numpy as np
import cv2

number_of_points = 7

shape_a_points = np.array([
    [0.6, 0],
    [1, 0.75],
    [0.8, 1],
    [0.5, 0.6],
    [0.75, 0],
    [1, 0],
    [1, 0.25]
], dtype="float32").reshape((-1, number_of_points, 2))

shape_b_points = np.array([
    [0, 0],
    [1, 0],
    [1, 1],
    [0, 1],
    [0.25, 0],
    [0.5, 0],
    [0.75, 0]
], dtype="float32").reshape((-1, number_of_points, 2))

test_points = [0.5, 0.5]

matches = [cv2.DMatch(i, i, 0) for i in range(number_of_points)]

tps = cv2.createThinPlateSplineShapeTransformer()
tps.estimateTransformation(shape_b_points, shape_a_points, matches)

M = tps.applyTransformation(np.array([[test_points]], dtype="float32"))
print(M[1])
I do not know why you need to reshape the arrays; "you just do" or it will not work.
I have also put it into a simple class if anyone wants to use it:
import cv2
import numpy as np


class transform:
    def __init__(self, points_a, points_b):
        assert len(points_a) == len(points_b), "Number of points in set A and set B should be same count"
        matches = [cv2.DMatch(i, i, 0) for i in range(len(points_a))]
        self.tps = cv2.createThinPlateSplineShapeTransformer()
        self.tps.estimateTransformation(
            np.array(points_b, dtype="float32").reshape((-1, len(points_a), 2)),
            np.array(points_a, dtype="float32").reshape((-1, len(points_a), 2)),
            matches)

    def transformPoint(self, point):
        result = self.tps.applyTransformation(np.array([[point]], dtype="float32"))
        return result[1][0][0]
If the two shapes are related by a perspective transformation, then any four points will lead to the same transformation, at least as long as no three of them are collinear. In theory you might pick any four such points and the rest should just work.
In practice, numeric considerations might come into play. If you pick four points very close to one another, then small errors in their positions would lead to much larger errors further away from these points. You could probably do some sophisticated analysis involving error intervals, but as a rule of thumb I'd aim for large distances between any two points, both on the input and on the output side of the transformation.
An answer of mine on Math Stack Exchange explains a bit of the kind of computation that goes into the definition of a perspective transformation given four pairs of points. It might be useful for understanding where that number 4 comes from.
If you have more than 4 pairs of points, and defining the transformation using any four of them does not correctly translate the rest, then you are likely in one of two other use cases.
Either you are indeed looking for a perspective transformation but have poor input data. You might have positions from feature detection, and they might be imprecise. Some features might even be matched up incorrectly. So in this case you would be looking for the best transformation to describe your data with small errors. Your question doesn't sound like this is your use case, so I'll not go into detail.
Or you have a transformation that is not a perspective transformation. In particular, anything that turns a straight line into a bent curve or vice versa is not a perspective transformation any more. You might be looking for some other class of transformation, or for something like a piecewise projective transformation. Without knowing more about your use case, it's hard to suggest a good class of transformations for this.
I tried 5 different implementations of the Sobel operator in Python, one of which I implemented myself, and the results are radically different.
My question is similar to this one, but there are still differences I don't understand with the other implementations.
Is there any agreed on definition of the Sobel operator, and is it always synonymous to "image gradient"?
Even the definition of the Sobel kernel is different from source to source, according to Wikipedia it is [[1, 0, -1],[2, 0, -2],[1, 0, -1]], but according to other sources it is [[-1, 0, 1],[-2, 0, 2],[-1, 0, 1]].
Here is my code where I tried the different techniques:
import numpy as np
import cv2 as cv
from scipy import ndimage
from PIL import Image, ImageFilter

img = np.random.randint(0, 255, [10, 10]).astype(np.uint8)

def sobel_x(img):
    return ndimage.convolve(img, np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]))

my_sobel = sobel_x(img)
_, numpy_sobel = np.gradient(img)
opencv_sobel = cv.Sobel(img, cv.CV_8UC1, 1, 0)
ndimage_sobel = ndimage.sobel(img, axis=0, mode="constant")
pil_sobel = np.array(Image.fromarray(img).filter(ImageFilter.Kernel((3, 3), (-1, 0, 1, -2, 0, 2, -1, 0, 1), 1, 0)))

print(my_sobel)
print(numpy_sobel)
print(opencv_sobel)
print(ndimage_sobel)
print(pil_sobel)
The Sobel operator estimates the derivative.
The correct definition of the Sobel operator to estimate the horizontal derivative is:
| 1 0 -1 |
| 2 0 -2 | / 8
| 1 0 -1 |
The division by 8 is important to get the right magnitude. People often leave it out because they don't care about the actual derivative, they care about comparing the gradient in different places of the same image. Multiplying everything by 8 makes no difference there, and so leaving out the /8 keeps things simple.
You will see the kernel defined with the inverse signs in some places. These are cases where the kernel is applied by correlation instead of convolution (the two differ by a mirroring of the kernel), as is the case in OpenCV. There are also cases where people copy things without understanding them, resulting in a gradient with the wrong sign.
But then again, the Sobel operator is mostly applied to obtain the gradient magnitude (the square root of the sum of the squares of the horizontal and vertical derivatives). In this case, reversing the signs doesn't matter any more.
Note that np.gradient(img) is comparable to convolving with [1,0,-1]/2. This is another way to estimate the derivative. Sobel adds a regularization (==smoothing) in the perpendicular direction.
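A quick check of that comparison on a 1D example (interior samples only, since np.gradient uses one-sided differences at the borders):

import numpy as np

a = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
grad = np.gradient(a)                                    # central differences
conv = np.convolve(a, np.array([1, 0, -1]) / 2, mode='same')
print(grad[1:-1])                                        # [1.5 2.5 3.5]
print(conv[1:-1])                                        # [1.5 2.5 3.5]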
You will get a better understanding of each implementation if you use a more meaningful test image. Try for example a black image with a white square in the middle. You will be able to compare the strength of the estimated gradients, their direction (I assume some libraries use a different definition of x and y axes), and you will be able to see the effect of the regularization.
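For instance, a minimal version of such a test image, applied to two of the implementations from the question (values are left unnormalized; scaling, axis conventions and border handling are where the libraries differ):

import numpy as np
import cv2 as cv
from scipy import ndimage

img = np.zeros((9, 9), np.float64)
img[3:6, 3:6] = 255.0                        # white square on a black background

print(cv.Sobel(img, cv.CV_64F, 1, 0))        # derivative along the column (x) axis
print(ndimage.sobel(img, axis=1))            # derivative along the same axis in scipy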
according to Wikipedia it's [[1, 0, -1],[2, 0, -2],[1, 0, -1]] but according to other sources it's [[-1, 0, 1],[-2, 0, 2],[-1, 0, 1]]
Both are used for detecting vertical edges. Difference here is how these kernels mark "left" and "right" edges.
For simplicity's sake, let's consider a 1D example, and let the array be
[0, 0, 255, 255, 255]
then, if we calculate with edge padding (replicating the border value):
kernel [2, 0, -2] gives [0, -510, -510, 0, 0]
kernel [-2, 0, 2] gives [0, 510, 510, 0, 0]
As you can see, the abrupt increase in value was marked with negative values by the first kernel and with positive values by the second. Note that this is relevant only if you need to discriminate left vs right edges; if you just want to find vertical edges, you can use either of the two kernels above and then take the absolute value.
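The example above can be reproduced with scipy, where mode='nearest' replicates the border value (the padding assumed in the numbers shown):

import numpy as np
from scipy import ndimage

a = np.array([0, 0, 255, 255, 255])
print(ndimage.correlate1d(a, [2, 0, -2], mode='nearest'))    # [0, -510, -510, 0, 0]
print(ndimage.correlate1d(a, [-2, 0, 2], mode='nearest'))    # [0,  510,  510, 0, 0]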
What is the function of numpy.linalg.norm method?
In this K-means clustering sample the numpy.linalg.norm function is used to get the distance between the new centroids and the old centroids in the centroid-movement step, but I cannot understand its meaning by itself.
Could somebody give me a few ideas in relation to this Kmeans clustering context?
What is the norm of a vector?
numpy.linalg.norm is used to calculate the norm of a vector or a matrix.
This is the help document taken from numpy.linalg.norm:
numpy.linalg.norm(x, ord=None, axis=None, keepdims=False)[source]
This is the code snippet taken from K-Means Clustering in Python:
# Euclidean Distance Calculator
def dist(a, b, ax=1):
    return np.linalg.norm(a - b, axis=ax)
It takes ord=None as default; with an axis given, this computes the 2-norm (Euclidean norm, the square root of the sum of squares) along that axis. So np.linalg.norm(a - b, axis=1) is exactly the Euclidean distance between each row of a and the corresponding row of b.
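A small illustration with made-up points (my own example data):

import numpy as np

a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[3.0, 4.0], [1.0, 2.0]])
print(np.linalg.norm(a - b, axis=1))             # [5. 1.]
print(np.sqrt(((a - b) ** 2).sum(axis=1)))       # same result, spelled out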
I am not a mathematician but here is my layman's explanation of “norm”:
A vector describes the location of a point in space relative to the origin; take for example the point [3, 2] in 2D space.
The norm is the distance from that point to the origin. In the 2D case it's easy to picture the vector as the hypotenuse of a right triangle whose legs lie along the axes, so the norm is simply the length of the hypotenuse.
In higher dimensions it's no longer a shape we can describe in everyday language, but the distance from the origin to the point is still called the norm; the same picture carries over to 3D space.
I don't know why the norm is used in K-means clustering. You stated that it was part of determining the distance between the old and new centroid in each step. I'm not sure why one would use the norm for this, since you can get the distance between two points in any dimensionality* using an extension of the form used in 2D algebra:

d = sqrt((x2 - x1)^2 + (y2 - y1)^2)

You just add a term for each additional dimension; for example, here is a 3D version:

d = sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)
*where the dimensions are positive integers
The numpy.linalg.norm function can also compute a norm along each row or column of a matrix (with the default ord this is the Euclidean length; with ord=1 it is the sum of absolute values). Suppose:
>>> import numpy as np
>>> from numpy import linalg as LA
>>> c = np.array([[ 1, 2, 3],
...               [-1, 1, 4]])
>>> LA.norm(c, axis=0)
array([ 1.41421356,  2.23606798,  5.        ])
>>> LA.norm(c, axis=1)
array([ 3.74165739,  4.24264069])
>>> LA.norm(c, ord=1, axis=1)
array([ 6.,  6.])
I want to perform principal component analysis for dimension reduction and data integration.
I have 3 features (variables) and 5 samples, as below. I want to integrate them into a 1-dimensional (1 feature) output by transforming them (computing the 1st PC). I want to use the transformed data for further statistical analysis, because I believe it displays the 'main' characteristics of the 3 input features.
I first wrote a test code with Python using scikit-learn, shown below. It is the simple case where the values of the 3 features are all equivalent. In other words, I applied PCA to three identical feature vectors, each equal to [0, 1, 2, 1, 0].
Code
import numpy as np
from sklearn.decomposition import PCA
pca = PCA(n_components=1)
samples = np.array([[0,0,0],[1,1,1],[2,2,2],[1,1,1],[0,0,0]])
pc1 = pca.fit_transform(samples)
print (pc1)
Output
[[-1.38564065]
[ 0.34641016]
[ 2.07846097]
[ 0.34641016]
[-1.38564065]]
Is taking the 1st PC after dimension reduction a proper approach for data integration?
1-2. For example, suppose the features are [power rank, speed rank], and power is roughly negatively correlated with speed (a 2-feature case). I want to find the sample which has both 'high power' and 'high speed'. It is easy to decide that [power 1, speed 1] is better than [power 2, speed 2], but difficult for a case like [power 4, speed 2] vs [power 3, speed 3].
So I want to apply PCA to the 2-dimensional 'power and speed' dataset, take the 1st PC, and then use the rank along that 1st PC. Is this kind of approach still proper?
In this case, I think the output should also be [0, 1, 2, 1, 0], the same as the input. But the output was [-1.38564065, 0.34641016, 2.07846097, 0.34641016, -1.38564065]. Is there a problem with the code, or is that the right answer?
Yes. It is also called data projection (to the lower dimension).
The resulting output is centered and then projected onto the (unit-length) first principal component, which is why it is not identical to the input. Since your three features are identical, the first PC direction is (1, 1, 1)/sqrt(3), so each score is the centered feature value scaled by sqrt(3), e.g. (0 - 0.8) * sqrt(3) ≈ -1.386. The result is correct.
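You can verify this with the data from the question; note that the sign of a principal component is arbitrary:

import numpy as np
from sklearn.decomposition import PCA

samples = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [1, 1, 1], [0, 0, 0]])
pc1 = PCA(n_components=1).fit_transform(samples).ravel()
expected = (samples[:, 0] - samples[:, 0].mean()) * np.sqrt(3)   # centered, scaled by sqrt(3)
print(np.allclose(np.abs(pc1), np.abs(expected)))                # True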
With only 5 samples I don't think it is wise to run any statistical methods. And if you believe that your features are the same, just check that the correlation between dimensions is close to 1, and then you can simply disregard the other dimensions.
There is no need to use PCA for such a small dataset. Also, for PCA your array should be scaled.
In any case, you have only 3 dimensions: you can plot the points and take a look with your own eyes, or you can calculate distances (run some kind of nearest-neighbour algorithm).