I tried 5 different implementations of the Sobel operator in Python, one of which I implemented myself, and the results are radically different.
My question is similar to this one, but there are still differences between the other implementations that I don't understand.
Is there any agreed-upon definition of the Sobel operator, and is it always synonymous with "image gradient"?
Even the definition of the Sobel kernel differs from source to source: according to Wikipedia it is [[1, 0, -1],[2, 0, -2],[1, 0, -1]], but according to other sources it is [[-1, 0, 1],[-2, 0, 2],[-1, 0, 1]].
Here is my code where I tried the different techniques:
from scipy import ndimage
import numpy as np
import cv2 as cv
from PIL import Image, ImageFilter
img = np.random.randint(0, 255, [10, 10]).astype(np.uint8)
def sobel_x(img):
    return ndimage.convolve(img, np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]))
my_sobel = sobel_x(img)
_, numpy_sobel = np.gradient(img)
opencv_sobel = cv.Sobel(img, cv.CV_8UC1, 1, 0)
ndimage_sobel = ndimage.sobel(img, axis=0, mode="constant")
pil_sobel = np.array(Image.fromarray(img).filter(ImageFilter.Kernel((3, 3), (-1, 0, 1, -2, 0, 2, -1, 0, 1), 1, 0)))
print(my_sobel)
print(numpy_sobel)
print(opencv_sobel)
print(ndimage_sobel)
print(pil_sobel)
The Sobel operator estimates the derivative.
The correct definition of the Sobel operator to estimate the horizontal derivative is:
| 1 0 -1 |
| 2 0 -2 | / 8
| 1 0 -1 |
The division by 8 is important to get the right magnitude. People often leave it out because they don't care about the actual derivative, they care about comparing the gradient in different places of the same image. Multiplying everything by 8 makes no difference there, and so leaving out the /8 keeps things simple.
You will see the kernel defined with the inverse signs in some places. These are cases where the kernel is applied by correlation instead of convolution (the two differ by a mirroring of the kernel), as is the case in OpenCV. These can also be cases where people copy stuff without understanding it, resulting in a gradient with the wrong sign.
But then again, the Sobel operator is mostly applied to obtain the gradient magnitude (the square root of the sum of the squares of the horizontal and vertical derivatives). In this case, reversing the signs doesn't matter any more.
Note that np.gradient(img) is comparable to convolving with [1,0,-1]/2. This is another way to estimate the derivative. Sobel adds a regularization (==smoothing) in the perpendicular direction.
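To see that equivalence concretely (interior pixels only, since the border handling differs), here is a small sketch comparing np.gradient with a 1-D convolution:
import numpy as np
from scipy import ndimage
row = np.array([0., 0., 1., 4., 9., 16., 25.])
# np.gradient uses central differences (f[i+1] - f[i-1]) / 2 in the interior.
grad = np.gradient(row)
# Convolving with [1, 0, -1]/2 (convolution mirrors the kernel) computes the
# same central difference; only the border handling differs.
conv = ndimage.convolve1d(row, np.array([1., 0., -1.]) / 2, mode="nearest")
print(grad[1:-1])  # [0.5 2.  4.  6.  8. ]
print(conv[1:-1])  # identical in the interior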
You will get a better understanding of each implementation if you use a more meaningful test image. Try for example a black image with a white square in the middle. You will be able to compare the strength of the estimated gradients, their direction (I assume some libraries use a different definition of x and y axes), and you will be able to see the effect of the regularization.
according to Wikipedia it's [[1, 0, -1],[2, 0, -2],[1, 0, -1]], but according to other sources it's [[-1, 0, 1],[-2, 0, 2],[-1, 0, 1]]
Both are used for detecting vertical edges. The difference is in how these kernels mark "left" and "right" edges.
For simplicity's sake, let's consider a 1D example, and let the array be
[0, 0, 255, 255, 255]
Then, if we calculate the filter response with padding at the borders:
kernel [2, 0, -2] gives [0, -510, -510, 0, 0]
kernel [-2, 0, 2] gives [0, 510, 510, 0, 0]
As you can see, the abrupt increase in value is marked with negative values by the first kernel and with positive values by the second. Note that this is relevant only if you need to discriminate left vs. right edges; if you just want to find vertical edges, you can use either of the two kernels above and then take the absolute value.
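These numbers can be reproduced with scipy.ndimage, assuming the kernels are applied by correlation with edge-replicating padding (mode="nearest"); a small sketch:
import numpy as np
from scipy import ndimage
a = np.array([0, 0, 255, 255, 255], dtype=np.int32)
# Correlation with edge-replicating padding reproduces the numbers above.
print(ndimage.correlate1d(a, [2, 0, -2], mode="nearest"))   # [0 -510 -510 0 0]
print(ndimage.correlate1d(a, [-2, 0, 2], mode="nearest"))   # [0  510  510 0 0]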
I have two shapes or coordinate systems, and I want to be able to transform points from one system onto the other.
I have found that if the shapes are quadrilateral and I have 4 pairs of corresponding points, then I can calculate a transformation matrix and use that matrix to map any point in Shape B onto its corresponding coordinates in Shape A.
Here is the working python code to make this calculation:
import numpy as np
import cv2
shape_a_points = np.array([
[0.6, 0],
[1, 0.75],
[0.8, 1],
[0.5, 0.6]
], dtype="float32")
shape_b_points = np.array([
[0, 0],
[1, 0],
[1, 1],
[0, 1],
], dtype="float32")
test_points = [0.5, 0.5]
matrix = cv2.getPerspectiveTransform(shape_b_points, shape_a_points)
print(matrix)
result = cv2.perspectiveTransform(np.array([[test_points]], dtype="float32"), matrix)
print(result)
If you run this code you'll see that the test point of (0.5, 0.5) on Shape B (right in the middle), comes out as (0.73, 0.67) on Shape A, which visually looks correct.
However, what can I do if the shape is more complex, such as 4+N vertices and 4+N pairs of corresponding points? Or, even more complex, what if there are curves in the shapes?
Thanks @christoph-rackwitz for pointing me in the right direction.
I have found very good results for transformations using the OpenCV ThinPlateSplineShapeTransformer.
Here is my example script below. Note that I have 7 pairs of points. The "matches" is just a list of 7 matches (telling the script that point #1 from Shape A corresponds to point #1 from Shape B, and so on).
import numpy as np
import cv2
number_of_points = 7
shape_a_points = np.array([
[0.6, 0],
[1, 0.75],
[0.8, 1],
[0.5, 0.6],
[0.75, 0],
[1, 0],
[1, 0.25]
], dtype="float32").reshape((-1, number_of_points, 2))
shape_b_points = np.array([
[0, 0],
[1, 0],
[1, 1],
[0, 1],
[0.25, 0],
[0.5, 0],
[0.75, 0]
], dtype="float32").reshape((-1, number_of_points, 2))
test_points = [0.5, 0.5]
matches = [cv2.DMatch(i, i, 0) for i in range(number_of_points)]
tps = cv2.createThinPlateSplineShapeTransformer()
tps.estimateTransformation(shape_b_points, shape_a_points, matches)
M = tps.applyTransformation(np.array([[test_points]], dtype="float32"))
print(M[1])
I do not know why you need to reshape the arrays; "you just do" or it will not work.
I have also put it into a simple class if anyone wants to use it:
import cv2
import numpy as np
class transform:
    def __init__(self, points_a, points_b):
        assert len(points_a) == len(points_b), "Number of points in set A and set B should be same count"
        matches = [cv2.DMatch(i, i, 0) for i in range(len(points_a))]
        self.tps = cv2.createThinPlateSplineShapeTransformer()
        self.tps.estimateTransformation(
            np.array(points_b, dtype="float32").reshape((-1, len(points_a), 2)),
            np.array(points_a, dtype="float32").reshape((-1, len(points_a), 2)),
            matches)
    def transformPoint(self, point):
        result = self.tps.applyTransformation(np.array([[point]], dtype="float32"))
        return result[1][0][0]
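A hypothetical usage of this class, reusing the quadrilateral points from the earlier example:
shape_a = [[0.6, 0], [1, 0.75], [0.8, 1], [0.5, 0.6]]
shape_b = [[0, 0], [1, 0], [1, 1], [0, 1]]
t = transform(shape_a, shape_b)
print(t.transformPoint([0.5, 0.5]))  # point from Shape B mapped into Shape A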
If the two shapes are related by a perspective transformation, then any four points will lead to the same transformation, at least as long as no three of them are collinear. In theory you might pick any four such points and the rest should just work.
In practice, numeric considerations might come into play. If you pick four points very close to one another, then small errors in their positions would lead to much larger errors further away from these points. You could probably do some sophisticated analysis involving error intervals, but as a rule of thumb I'd try to aim for large distances between any two points, both on the input and on the output side of the transformation.
An answer of mine on Math Stack Exchange explains a bit of the kind of computation that goes into the definition of a perspective transformation given four pairs of points. It might be useful for understanding where that number 4 is coming from.
If you have more than 4 pairs of points, and defining the transformation using any four of them does not correctly translate the rest, then you are likely in one of two other use cases.
Either you are indeed looking for a perspective transformation, but have poor input data. You might have positions from feature detection, and they might be imprecise. Some features might even be matched up incorrectly. So in this case you would be looking for the best transformation to describe your data with small errors. Your question doesn't sound like this is your use case, so I'll not go into detail.
Or you have a transformation that is not a perspective transformation. In particular, anything that turns a straight line into a bent curve or vice versa is not a perspective transformation any more. You might be looking for some other class of transformation, or for something like a piecewise projective transformation. Without knowing more about your use case, it's very hard to suggest a good class of transformations for this.
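For the first of these cases, it may help to know that cv2.findHomography accepts more than four correspondences and fits the perspective transform in a least-squares (or RANSAC) sense; a minimal sketch with made-up, slightly noisy points:
import numpy as np
import cv2
# Six correspondences (illustrative values only, with a little noise).
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0], [0, 0.5]],
               dtype="float32")
dst = np.array([[0.6, 0], [1, 0.75], [0.8, 1], [0.5, 0.6],
                [0.8, 0.35], [0.56, 0.28]], dtype="float32")
# method=0 fits all points in a least-squares sense; pass cv2.RANSAC instead
# to be robust against mismatched correspondences.
H, mask = cv2.findHomography(src, dst, 0)
result = cv2.perspectiveTransform(np.array([[[0.5, 0.5]]], dtype="float32"), H)
print(result)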
I am trying to reconstruct a 3d shape from multiple 2d images.
I have calculated a fundamental matrix, but now I don't know what to do with it.
I am finding multiple conflicting answers on stack overflow and academic papers.
For example, Here says you need to compute the rotation and translation matrices from the fundamental matrix.
Here says you need to find the camera matrices.
Here says you need to find the homographies.
Here says you need to find the epipolar lines.
Which is it?? (And how do I do it? I have read the H&Z book but I do not understand it. It says I can 'easily' use the 'direct formula' in result 9.14, but result 9.14 is neither easy nor direct to understand.)
Stack overflow wants code so here's what I have so far:
import numpy as np
import cv2

# let's create some sample data
Wpts = np.array([[1, 1, 1, 1], # A Cube in world points
[1, 2, 1, 1],
[2, 1, 1, 1],
[2, 2, 1, 1],
[1, 1, 2, 1],
[1, 2, 2, 1],
[2, 1, 2, 1],
[2, 2, 2, 1]])
Cpts = np.array([[0, 4, 0, 1], #slightly up
[4, 0, 0, 1],
[-4, 0, 0, 1],
[0, -4, 0, 1]])
Cangles = np.array([[0, -1, 0], #slightly looking down
[-1, 0, 0],
[1, 0, 0],
[0,1,0]])
views = []
transforms = []
clen = len(Cpts)
for i in range(clen):
cangle = Cangles[i]
cpt = Cpts[i]
transform = cameraTransformMatrix(cangle, cpt)
transforms.append(transform)
newpts = np.dot(Wpts, transform.T)
view = cameraView(newpts)
views.append(view)
H = cv2.findFundamentalMat(views[0], views[1])[0]
## now what??? How do I recover the cube shape?
Edit: I do not know the camera parameters
Fundamental Matrix
At first, listen to the fundamental matrix song ;).
The Fundamental Matrix only shows the mathematical relationship between your point correspondences in 2 images (x' - image 2, x - image 1). "That means, for all pairs of corresponding points x and x', the relation x'^T F x = 0 holds" (Wikipedia). This also means that if you have outliers or incorrect point correspondences, it directly affects the quality of your fundamental matrix.
Additionally, a similar structure exists for the relationship of point correspondences between 3 images, which is called the Trifocal Tensor.
A 3d reconstruction using exclusively the properties of the Fundamental Matrix is not possible because "The epipolar geometry is the intrinsic projective geometry between two views. It is independent of scene structure, and only depends on the cameras' internal parameters and relative pose." (HZ, p. 239).
Camera matrix
Referring to your question of how to reconstruct the shape from multiple images: you need to know the camera matrices of your images (K', K). The camera matrix is a 3x3 matrix composed of the camera focal lengths or principal distance (fx, fy) as well as the optical center or principal point (cx, cy).
You can derive your camera matrix using camera calibration.
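For reference, the layout of that 3x3 matrix looks like this (the numbers below are placeholder values, not from your setup):
import numpy as np
fx = fy = 800.0        # focal lengths in pixels (assumed values)
cx, cy = 320.0, 240.0  # principal point (assumed values)
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])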
Essential matrix
When you know your camera matrices you can extend your Fundamental Matrix to an Essential Matrix E (E = K'^T F K).
You could say, somewhat sloppily, that your Fundamental Matrix is now "calibrated".
The Essential Matrix can be used to get the rotation (rotation matrix R) and translation (vector t) of your second image with respect to your first image, but only up to an unknown scale; t will be a unit vector. For this purpose you can use the OpenCV functions decomposeEssentialMat or recoverPose (which uses the cheirality check), or read the further detailed explanations in HZ.
Projection matrix
Knowing your translation and rotation, you can build your projection matrices for your images. The projection matrix is defined as P = K [R | t]. Finally, you can use triangulation (triangulatePoints) to derive the 3d coordinates of your image points. I recommend using a subsequent bundle adjustment to receive a proper configuration. There is also an sfm module in OpenCV.
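Putting these steps together, here is a self-contained sketch on synthetic data; all names, intrinsics and poses below are illustrative assumptions, not taken from your code:
import numpy as np
import cv2
# Assumed intrinsics (placeholder values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
# Synthetic 3D points in front of the cameras.
rng = np.random.default_rng(0)
pts3d = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 6.0], size=(20, 3))
def project(pts, R, t):
    # Project 3D points with the projection matrix P = K [R | t].
    P = K @ np.hstack([R, t])
    ph = (P @ np.hstack([pts, np.ones((len(pts), 1))]).T).T
    return (ph[:, :2] / ph[:, 2:3]).astype(np.float32)
# Camera 1 at the origin, camera 2 shifted along x and slightly rotated.
R2, _ = cv2.Rodrigues(np.array([[0.0], [0.1], [0.0]]))
pts1 = project(pts3d, np.eye(3), np.zeros((3, 1)))
pts2 = project(pts3d, R2, np.array([[-0.5], [0.0], [0.0]]))
# Essential matrix from the calibrated correspondences
# (equivalently E = K'^T F K if you already have the fundamental matrix F).
E, _ = cv2.findEssentialMat(pts1, pts2, K)
# Decompose E; recoverPose applies the cheirality check and returns the
# relative pose of camera 2 w.r.t. camera 1 (t only up to scale).
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
# Build the projection matrices and triangulate (image points as 2xN arrays).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
pts3d_est = (pts4d[:3] / pts4d[3]).T   # reconstruction up to a global scale
print(pts3d_est[:5])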
Since homography or epipolar line knowledge is not essentially necessary for the 3d reconstruction I did not explain these concepts.
With your fundamental matrix, you can determine the camera matrices P and P' in a canonical form as stated in (HZ, pp. 254-256). From these camera matrices you can theoretically triangulate a projective reconstruction that differs from the real scene by an unknown projective transformation.
It has to be noted that the linear triangulation methods aren't quite suitable for projective reconstruction, as stated in (HZ, Discussion, p. 313): "...neither of these two linear methods is quite suitable for projective reconstruction, since they are not projective-invariant." Therefore, the recommended triangulation technique mentioned there should be used to obtain valuable results (which is more work to implement).
From this projective reconstruction you could use self-calibration approaches that can work in some scenarios but will not yield the accuracy and robustness that you can obtain with a calibrated camera and the utilization of the essential matrix to compute the motion parameters.
I have an array which looks like this
boxes = [268,885,426,865,406,707,248,727]
It's a collection of (x,y) points. If I plot this using this function:
import cv2
import numpy as np
import scipy.misc

def draw_boxes_on_image_mod(image, boxes):
    image_copy = image.copy()
    image_copy = np.array(image_copy)
    cv2.line(image_copy, (boxes[0], boxes[1]), (boxes[2], boxes[3]), (0, 255, 255), 2)
    cv2.line(image_copy, (boxes[4], boxes[5]), (boxes[6], boxes[7]), (0, 255, 255), 2)
    cv2.line(image_copy, (boxes[0], boxes[1]), (boxes[6], boxes[7]), (0, 255, 255), 2)
    cv2.line(image_copy, (boxes[4], boxes[5]), (boxes[2], boxes[3]), (0, 255, 255), 2)
    scipy.misc.imsave('/home/ryan/TEST.png', image_copy)
    return image_copy
I get an image with a rectangle drawn on the part of the image I'm interested in, But what I want is to extract that portion and convert it into an image.
I was thinking of using NumPy indexing to achieve this but
image = image[268:426]
I am finding it difficult to understand how to index the (x,y) values together.
Any suggestions would be really helpful. Thanks in advance.
When you call A[1:3], you are only asking for rows 1 and 2 (the slice starts at 1 and stops before 3), so you must take the columns into account as well to get the exact subsection you need.
You can do this in NumPy by specifying both the row range and the column range: the subsection you want starts at some row and ends at row + m, and starts at some column and ends at column + n.
For example take
A = np.array([[0, 0, 0, 0],
[0, 1, 1, 0],
[0, 1, 1, 0],
[0, 0, 0, 0]])
We want just the values in the middle set to 1, so we select them with
Asub = A[1:3,1:3]
To get
[[1 1]
[1 1]]
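Applied to the boxes array from the question, assuming you want the axis-aligned bounding box of the four corners and that image is your image array (a dummy array stands in for it here):
import numpy as np
image = np.zeros((1000, 500, 3), dtype=np.uint8)  # stand-in for the real image
# boxes from the question: flat [x0, y0, x1, y1, x2, y2, x3, y3]
boxes = [268, 885, 426, 865, 406, 707, 248, 727]
xs, ys = boxes[0::2], boxes[1::2]
# NumPy indexes images as image[row, col] == image[y, x], so rows come
# from the y values and columns from the x values.
crop = image[min(ys):max(ys), min(xs):max(xs)]
print(crop.shape)  # (178, 178, 3)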
I've stated this question in graph theory terms, but that conceptualization isn't necessary.
What I'm trying to do, using Python, is produce a matrix of zeros and ones, where every row has the same number of ones and every column has the same number of ones. The number for rows will not be the same as the number for columns when the number of rows (sending nodes) does not equal the number of columns (receiving nodes) -- which is something I'm allowing.
It makes sense to me to do this in numpy, but there may be other packages (like networkx?) that would help.
Here's the function I'm looking to write with the desired inputs and outputs:
n_pre = 4 # number of nodes available to send a connection
n_post = 4 # number of nodes available to receive a connection
p = 0.5 # proportion of all possible connections that exist
mat = generate_mat(n_pre, n_post, p)
print(mat)
The output would be, for example:
[[0, 1, 0, 1],
[1, 0, 1, 0],
[1, 1, 0, 0],
[0, 0, 1, 1]]
Notice, every column and every row has two ones in it. Aside from this constraint, the positions of the ones should be random (and vary from call to call of this function).
In graph theory terms, this means every node has an in-degree of 2 and an out-degree of 2 (50% of all possible connections, as specified with p = 0.5).
For a square matrix, what you describe is the adjacency matrix of a random k-regular directed graph, and there are known algorithms to generate such graphs. igraph implements one:
# I think this is how you call it - it's an instance method for some reason.
igraph.Graph().K_Regular(n, k, directed=True)
networkx has a function for random k-regular undirected graphs:
networkx.random_regular_graph(k, n)
For a non-square matrix, what you describe is isomorphic to a random biregular graph. I have found no convenient existing implementation for random biregular graphs, but the term should be a good starting point for searching for known algorithms.
First, do the pre-work so that we have the size of the square matrix and the population pop of each row and column available. Now, initialize a matrix where each row has pop consecutive ones starting on the diagonal and wrapping around. For n = 6 and pop = 3, you'd have
[[1, 1, 1, 0, 0, 0]
[0, 1, 1, 1, 0, 0]
[0, 0, 1, 1, 1, 0]
[0, 0, 0, 1, 1, 1]
[1, 0, 0, 0, 1, 1]
[1, 1, 0, 0, 0, 1]]
Now, apply your friendly neighborhood random shuffle operation to the columns, then the rows (or in the other order). There's your matrix. A shuffle of rows-only or columns-only does not change the population on either axis.
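A minimal NumPy sketch of that recipe for the square case (the function name and RNG handling are my own choices):
import numpy as np

def generate_square_mat(n, pop, rng=None):
    """n x n matrix of 0/1 with exactly `pop` ones in every row and column."""
    rng = np.random.default_rng() if rng is None else rng
    # Start with `pop` consecutive ones per row, beginning on the diagonal
    # and wrapping around (the pattern shown above for n = 6, pop = 3).
    mat = np.zeros((n, n), dtype=int)
    for i in range(n):
        mat[i, (i + np.arange(pop)) % n] = 1
    # Shuffling whole rows, then whole columns, preserves both sums.
    rng.shuffle(mat, axis=0)
    rng.shuffle(mat, axis=1)
    return mat

m = generate_square_mat(6, 3)
print(m)
print(m.sum(axis=0), m.sum(axis=1))  # all 3s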
I found out about vtkInterface, a python vtk wrapper that facilitates vtk plotting.
I tried to run their first example, under "Initialize from Numpy Arrays" on this page: vtkInterface.PolyData, by simply running the code as is, and it results in a gray render window with nothing in it.
Some of the other examples do work, but this is exactly what I need at the moment, and I was wondering whether anybody has tried it and knows what might be wrong.
Example Code:
import numpy as np
import vtkInterface
# mesh points
vertices = np.array([[0, 0, 0],
[1, 0, 0],
[1, 1, 0],
[0, 1, 0]])
# mesh faces
faces = np.hstack([[4, 0, 1, 2, 3], # square
[3, 0, 1, 4], # triangle
[3, 1, 2, 4]]) # triangle
surf = vtkInterface.PolyData(vertices, faces)
# plot each face with a different color
surf.Plot(scalars=np.arange(3))
The example is wrong: it lacks a fifth point. For example, this will work:
vertices = np.array([[0, 0, 0],
[1, 0, 0],
[1, 1, 0],
[0, 1, 0],
[0.5, 0.5, -1]])
Explanation: In VTK, faces are encoded in the following way:
face_j = [ n, i_0, i_1, ..., i_{n-1} ]
Here, n is the number of points per face, and i_k are the indices of the points in the vertex-array. The face is formed by connecting the points vertices[i_k] with k in range(0,n). A list of faces is created by simply concatenating the single face specifications:
np.hstack([face_0, face_1, ..., face_j, ...])
The advantage of this encoding is that the number of points used per face, n, can vary. So a mesh can consist of lines, triangles, quads, etc.
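To illustrate the encoding, here is a small sketch that decodes the faces array from the example back into per-face index lists:
import numpy as np
faces = np.hstack([[4, 0, 1, 2, 3],   # quad
                   [3, 0, 1, 4],      # triangle
                   [3, 1, 2, 4]])     # triangle
decoded, i = [], 0
while i < len(faces):
    n = faces[i]                          # number of points in this face
    decoded.append(faces[i + 1:i + 1 + n].tolist())
    i += 1 + n
print(decoded)  # [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4]]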
In the example, the vertex with id 4 is used in the second and third face. So vertices is required to consist of at least five entries. Surprisingly, the sample doesn't crash, even though VTK would almost certainly be expected to crash when faces access non-existing points.