I am trying to get a grasp of how to project 2D coordinates into 3D space through my camera matrix, but for the life of me I can't understand it.
So I am hoping that someone here can point me to a guide or something that can help me. Here is what I have:
I have read and tried all of these articles to try and understand the material:
Find 3D coordinate with respect to the camera using 2D image coordinates
https://en.wikipedia.org/wiki/Camera_matrix
https://se.mathworks.com/help/vision/ug/camera-calibration.html#bu0nh2_
https://staff.fnwi.uva.nl/r.vandenboomgaard/IPCV20162017/LectureNotes/CV/PinholeCamera/PinholeCamera.html
https://towardsdatascience.com/camera-calibration-fda5beb373c3
So I have a camera that is pointing "straight down" towards a table, and it is centered on the table. I am guessing that from this I can create my translation matrix and rotation matrix (I am unsure what angle "down" corresponds to relative to 0 degrees: 90 or 180?)
T = [0.0, 0.0, 0.0]
R = [[cos(angle), -sin(angle), 0.0],
[sin(angle), cos(angle), 0.0],
[0.0, 0.0, 1.0]]
These are my extrinsic matrices.
My 2D photo is 1280x720 px and my camera's focal length is 1.88 mm, from which I can create a camera matrix:
fx = 1280 / 1.88
fy = 720 / 1.88
u0 = 1280 / 2
v0 = 720 / 2
K = [[680.85, 0.0, 640.0, 0.0],
     [0.0, 382.98, 360.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
I know that the distance between my camera and the table is 650 mm.
As far as I understand, I am supposed to use linear algebra / matrix multiplication to take my 2D coordinate (300, 200) and put it into 3D space, but I can't figure out how to actually do it.
It seems like a lot of the material I can find is about projecting a 3D coordinate into 2D space, not the other way around.
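To make it concrete, this is the kind of computation I think I need, though I am not sure the fx/fy values are even right (just a sketch using the numbers above):
# Sketch: back-project the pixel (300, 200) onto the table plane, assuming
# fx and fy are focal lengths in *pixels* (which is where I am unsure) and
# the camera looks straight down so the table sits at Z = 650 mm in camera coordinates.
fx, fy = 680.85, 382.98    # the values computed above -- possibly the wrong units?
u0, v0 = 640.0, 360.0
Z = 650.0                  # camera-to-table distance in mm

u, v = 300, 200
X = (u - u0) / fx * Z
Y = (v - v0) / fy * Z
print(X, Y, Z)             # candidate 3D point in the camera frame, in mm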
From this question
How do I reverse-project 2D points into 3D?
I found this formula:
mat = [
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1],
]
x = mat[0][0] * p.x + mat[0][1] * p.y + mat[0][2] * p.z + mat[0][3] * 1
y = mat[1][0] * p.x + mat[1][1] * p.y + mat[1][2] * p.z + mat[1][3] * 1
w = mat[3][0] * p.x + mat[3][1] * p.y + mat[3][2] * p.z + mat[3][3] * 1
But again I am not sure if this is the way to go since it gives me some weird results.
I am really hoping someone can help me out. Please request if any more information is needed.
Edit:
I noticed that there are two different formulas for the intrinsic camera matrix:
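As far as I can tell, the two layouts are roughly these (written as Python lists like above, with fx, fy, cx, cy symbolic):
K1 = [[fx, 0.0, cx],
      [0.0, fy, cy],
      [0.0, 0.0, 1.0]]
K2 = [[fx, 0.0, 0.0],
      [0.0, fy, 0.0],
      [cx, cy, 1.0]]
where the second one appears to be meant for use with row vectors, i.e. [u, v, w] = [x, y, z] * K2.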
One uses u0, v0 and the other uses cx, cy, placed in different positions, but both express the center of the image.
Which one is correct to use and with what units, mm or pixels?
I looked into vector-matrix multiplication and I think I understand that part.
The second matrix formula, with cx, cy in the third row, seems like it would never take the Z distance into account because of the way the multiplication works. Again, I am not entirely sure how this works, but that does not make sense to me.
Let's consider the code given in the following link:
https://towardsdatascience.com/head-pose-estimation-using-python-d165d3541600
I'm trying to analyze and understand everything that is behind it. One of the things that I cannot understand is this part:
# Get angles
angles, mtxR, mtxQ, Qx, Qy, Qz = cv2.RQDecomp3x3(rmat)
# Get the y rotation degree
x = angles[0] * 360
y = angles[1] * 360
I don't understand why we multiply by 360 here. The explanation is that cv2.RQDecomp3x3(rmat) returns values in radians, so to obtain the angles we need to multiply by 360. But does that make any sense? Since 1 radian is 180/np.pi ≈ 57 degrees, why would we multiply by 360 and not by 57?
But also let's consider the following code:
rot_mat = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])
angles, _, _, _, _, _ = cv2.RQDecomp3x3(rot_mat)
angles
(90.0, 0.0, 0.0)
Now, if the output is in radians, why is it 90? And then, according to the code in the link, the angle should be 90 * 360 = 32400, which makes no sense at all.
So my questions are:
Why do we multiply by 360 and not by 57 if it's in radians?
Do we really need to multiply it? Aren't those numbers angles already?
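To check this myself, I also tried building a rotation matrix for a known angle and decomposing it (rot_x below is just a helper I wrote for this test, not from the linked article):
import numpy as np
import cv2

def rot_x(deg):
    # Rotation matrix for `deg` degrees about the x axis.
    t = np.radians(deg)
    return np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t), np.cos(t)]])

angles, _, _, _, _, _ = cv2.RQDecomp3x3(rot_x(30))
print(angles)  # I get (30.0, 0.0, 0.0), which to me looks like it is already in degrees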
I have two questions. I have X and Y coordinates, which I have listed below, and I have plotted the coordinates as shown in the picture below.
x = [0, 1, 1, 0, 0, 1]
y = [1, 1, 2, 2, 3, 3]
Now, I have decided to rotate the geometry clockwise. Therefore, I rotated all points by 45 degrees (+ve) using the formula below.
x_dash = x[i] * math.cos(theta) + y[i] * math.sin(theta)
y_dash = -x[i] * math.sin(theta) + y[i] * math.cos(theta)
After using the above formula, I got the results below, which show the new coordinate points after the 45-degree clockwise rotation; after plotting them, I got the plot below.
x_dash = [0.8509035245341184, 1.3762255133518482, 2.2271290378859665, 1.7018070490682369, 2.552710573602355, 3.078032562420085]
y_dash = [0.5253219888177297, -0.3255815357163887, 0.19974045310134103, 1.0506439776354595, 1.575965966453189, 0.7250624419190707]
My questions:
(1) If I take the two coordinates (X and Y) of one point and find its angle using
theta = np.degrees(np.arctan2(y, x)),
I do not get 45 degrees.
For example:
np.degrees(np.arctan2(0.5253219888177297, 0.8509035245341184))
Result: 31.68992191129556
However, when I find the angle of the 1st point before rotation, np.degrees(np.arctan2(1, 0)), I get 90.0.
I would like to know why there is a difference between the angle of the same point before and after the rotation.
(2) If I have a rotated geometry like in the 2nd picture and I do not know the angle of rotation, what should I do to get that geometry back without rotation (like in the first picture)?
Kindly help me with these questions.
By default, math.sin() and math.cos() assume that their arguments are in radians. So the code treats the angle of rotation as 45 radians, not 45 degrees.
You can define theta as:
theta = numpy.radians(45)
Hope this clarifies everything.
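For example, here is a minimal sketch with the values from your question (the list comprehensions are mine, just to keep it short):
import math
import numpy as np

# Points and rotation from the question, with theta now expressed in radians.
x = [0, 1, 1, 0, 0, 1]
y = [1, 1, 2, 2, 3, 3]
theta = np.radians(45)

x_dash = [x[i] * math.cos(theta) + y[i] * math.sin(theta) for i in range(len(x))]
y_dash = [-x[i] * math.sin(theta) + y[i] * math.cos(theta) for i in range(len(x))]

# The first point (0, 1) sits at 90 degrees before the rotation; after a
# 45-degree clockwise rotation it should sit at 45 degrees:
print(np.degrees(np.arctan2(y_dash[0], x_dash[0])))  # ~45.0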
I am using DeepSORT to gather the x and y coordinates of an object. I am using a homography to get a zoomed-in bird's-eye view of a specific portion of the video. I know the real-world distances of the area I zoom into, and I want to know the real-world position and speed of the object as it moves through the identified area.
Here is my current code for the homography transformation:
# points for tracking window
pt_A = [x_0, y_0]
pt_B = [x_1, y_1]
pt_C = [x_2, y_2]
pt_D = [x_3, y_3]
# euclidean distances between each point
width_AD = np.sqrt(((pt_A[0] - pt_D[0]) ** 2) + ((pt_A[1] - pt_D[1]) ** 2))
width_BC = np.sqrt(((pt_B[0] - pt_C[0]) ** 2) + ((pt_B[1] - pt_C[1]) ** 2))
max_width = max(int(width_AD), int(width_BC))
height_AB = np.sqrt(((pt_A[0] - pt_B[0]) ** 2) + ((pt_A[1] - pt_B[1]) ** 2))
height_CD = np.sqrt(((pt_C[0] - pt_D[0]) ** 2) + ((pt_C[1] - pt_D[1]) ** 2))
max_height = max(int(height_AB), int(height_CD))
input_pts = np.float32([pt_A, pt_B, pt_C, pt_D])
output_pts = np.float32([[0, 0], [0, max_height - 1], [max_width - 1, max_height - 1], [max_width - 1, 0]])
# Compute the perspective transform h_transform
h_transform = cv2.getPerspectiveTransform(input_pts, output_pts)
This h_transform warps the video in the orientation I would like to use when using warpPerspective. I want to know how I can now apply the transformation to the tracked points of objects moving through the area and measure the position and speed of the objects in m/s using the known length and width of the area I zoom into.
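For context, this is how I currently apply it to each frame (frame here is just a placeholder for a video frame from my pipeline):
warped = cv2.warpPerspective(frame, h_transform, (max_width, max_height))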
This is one thing I tried in order to convert the points of each object into the new plane; does this look correct?
# Finally, any 2D point in rectangle A can be found in rectangle B using this operation:
point_in_A = np.array([x, y, 1.0])     # homogeneous pixel coordinate in the original frame
temp = h_transform @ point_in_A        # 3x3 homography times 3-vector -> (x', y', scale)
xy_in_B = temp[:2] / temp[2]           # divide by the scale to get pixel coordinates in B
Once this is correct, how do I convert these pixel coordinates into real-world coordinates? I believe the above transformation would place the coordinates onto a plane whose corners are the corners of the selected area, with the origin in the bottom-left corner.
I have confused myself trying to wrap my head around this so I apologize if this is not a viable question or is very confusing!
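For the real-world part, this is roughly what I have in mind (just a sketch; area_width_m, area_height_m and fps are placeholder names I made up, while h_transform, max_width and max_height come from the code above):
import numpy as np
import cv2

# Placeholder assumptions (not in the code above): the zoomed-in area is
# area_width_m x area_height_m metres in the real world, and the video runs at fps frames/s.
area_width_m, area_height_m, fps = 10.0, 5.0, 30.0

def to_world(pt_px):
    """Warp an image point into the bird's-eye view, then scale pixels to metres."""
    p = np.array([[pt_px]], dtype=np.float32)               # shape (1, 1, 2) for OpenCV
    u, v = cv2.perspectiveTransform(p, h_transform)[0, 0]   # point in the warped view
    return np.array([u / (max_width - 1) * area_width_m,
                     v / (max_height - 1) * area_height_m])

def speed_mps(prev_px, curr_px):
    """Speed as the distance covered between two consecutive frames times the frame rate."""
    return np.linalg.norm(to_world(curr_px) - to_world(prev_px)) * fps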
The idea behind this is to create a detection area for a security camera. Currently, I know how to find and use the modelview matrix data, as shown below in the function matrixTransformation. The matrix should then be recalculated for each increment of the security camera's rotation in the initialization function.
I would like to know how you would find the coordinates of the edges of each security camera (a cylinder shape) using the matrix. I am using Pygame 1.9.2, Python 3.5 and PyOpenGL 3.1.0.
Picture of coordinates on the security camera which need to be calculated
def matrixTransformation(x, y, z):
    matrix = (GLfloat * 16)()
    glGetFloatv(GL_MODELVIEW_MATRIX, matrix)
    xp = matrix[0] * x + matrix[4] * y + matrix[8] * z + matrix[12]
    yp = matrix[1] * x + matrix[5] * y + matrix[9] * z + matrix[13]
    zp = matrix[2] * x + matrix[6] * y + matrix[10] * z + matrix[14]
    wp = matrix[3] * x + matrix[7] * y + matrix[11] * z + matrix[15]
    xp /= wp
    yp /= wp
    zp /= wp
    return xp, yp, zp
def init():
    securityCameraRotation = 380
    glEnable(GL_DEPTH_TEST)
    multipleRotations = 0
    result = []
    glPushMatrix()
    glTranslatef(-4, 1.5, 5.5)
    glRotate(315, 1, 1, 1)
    while True:
        if securityCameraRotation >= 380:
            clockwise = True
            multipleRotations += 1
        elif securityCameraRotation <= 310:
            clockwise = False
        glRotate(securityCameraRotation, 0, 1, 0)
        # append the transformed coordinates to result
        if clockwise == True:
            securityCameraRotation -= 0.2
        elif clockwise == False:
            securityCameraRotation += 0.2
        if multipleRotations > 1:
            # End the loop when one complete rotation between 310 and 380 has occurred
            break
    glPopMatrix()
    return result
def securityCamera(radius, height, num_slices, frontCircleColour, backCircleColour, tubeColour):
    r = radius
    h = height
    n = float(num_slices)
    circle_pts = []
    for i in range(int(n) + 1):
        angle = 2 * math.pi * (i / n)
        x = r * math.cos(angle)
        y = r * math.sin(angle)
        pt = (x, y)
        circle_pts.append(pt)
    glBegin(GL_TRIANGLE_FAN)  # drawing the back circle
    glColor(backCircleColour)
    glVertex(0, 0, h/2.0)
    for (x, y) in circle_pts:
        z = h/2.0
        glVertex(x, y, z)
    glEnd()
    glBegin(GL_TRIANGLE_FAN)  # drawing the front circle
    glColor(frontCircleColour)
    glVertex(0, 0, h/2.0)
    for (x, y) in circle_pts:
        z = -h/2.0
        glVertex(x, y, z)
    glEnd()
    glBegin(GL_TRIANGLE_STRIP)  # draw the tube
    glColor(tubeColour)
    for (x, y) in circle_pts:
        z = h/2.0
        glVertex(x, y, z)
        glVertex(x, y, -z)
    glEnd()
In OpenGL, there are a bunch of transformations that occur. First, we treat the object as if it is in model space, where the object is centered at the origin and we draw the mesh (in this case, the cylinder). Then we apply a model matrix transform (where we translate/rotate/scale our cylinder) and a view matrix transform (where we shift our scene relative to the imaginary camera). Finally, we apply the projection matrix that adds the "3d perspective" to our scene by creating a matrix with gluPerspective or some more modern means. All of these matrix multiplications basically put the coordinates of your 3d models in the right place on our 2d screens (sort of; more detailed info here).
In terms of the model space, the yellow points you highlighted in your picture are actually just (0, 0, -h/2.0) and (0, 0, h/2.0). This is fine if you are just drawing your yellow points with glBegin(GL_POINTS) in your securityCamera function. However, you are probably more interested in calculating where these yellow points are located in world space (that is, after multiplication by the modelview matrix).
One simple way to get these world space coordinates is to multiply the yellow points' model space coordinates by the modelview matrix. Use your matrixTransformation function on (0, 0, -h/2.0) and (0, 0, h/2.0) and that should work!
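For example (a quick sketch; h is whatever height you pass to securityCamera, and the calls assume the same modelview matrix is still current):
h = 1.0  # the same height used to draw the cylinder
front_point_world = matrixTransformation(0, 0, -h / 2.0)
back_point_world = matrixTransformation(0, 0, h / 2.0)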
Alternatively, as I hinted at in the comments, matrices like your modelview matrix actually contain useful information that results from the accumulated multiplication of translation, rotation, and scaling matrices. I pointed to this picture:
Each of these column axes actually corresponds to a row of your numpy array (which is interesting since numpy is row-major while OpenGL is column-major). You can get the axes describing how your model is oriented in world space with the following snippet:
mv_matrix = glGetFloatv(GL_MODELVIEW_MATRIX)
left, up, forward = [v / np.linalg.norm(v) for v in mv_matrix[:3, :3]]
position = mv_matrix[3, :3]  # the translation row: this one should not be normalized
Note that I drop the last component of each row and normalize the three direction axes; the position row is the raw translation, so it is left unnormalized. If you take the forward array you get from there, you get the direction in world space in which that particular camera is pointing, while the position array gives you the world space position of the center (the model space equivalent of (0, 0, 0)) of the security camera. Multiply the normalized forward array by h/2.0 and add that to position and you should get the world space position of the front of your security camera. This is not too useful for rendering to the screen, but it can be used for "behind the scenes" math involving other objects and the security camera.
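As a concrete sketch of that last step (h here is whatever height you drew the cylinder with; since the question's front circle sits at model space z = -h/2.0, flip the sign if the point ends up on the wrong end):
h = 1.0  # the height passed to securityCamera when drawing this cylinder
front_centre_world = position + forward * (h / 2.0)  # use -h/2.0 if this lands on the back face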
As a side note, I realized I made a sign error in this line for the cylinder drawing code:
glBegin(GL_TRIANGLE_FAN)  # drawing the front circle
glColor(frontCircleColour)
glVertex(0, 0, -h/2.0)  # this was + in the original code!!!
Let me know if this helps you make sense of my earlier comments!
I'm trying to get some code that will perform a perspective transformation (in this case a 3d rotation) on an image.
import os.path
import numpy as np
import cv
def rotation(angle, axis):
    return np.eye(3) + np.sin(angle) * skew(axis) \
        + (1 - np.cos(angle)) * skew(axis).dot(skew(axis))

def skew(vec):
    return np.array([[0, -vec[2], vec[1]],
                     [vec[2], 0, -vec[0]],
                     [-vec[1], vec[0], 0]])

def rotate_image(imgname_in, angle, axis, imgname_out=None):
    if imgname_out is None:
        base, ext = os.path.splitext(imgname_in)
        imgname_out = base + '-out' + ext
    img_in = cv.LoadImage(imgname_in)
    img_size = cv.GetSize(img_in)
    img_out = cv.CreateImage(img_size, img_in.depth, img_in.nChannels)
    transform = rotation(angle, axis)
    cv.WarpPerspective(img_in, img_out, cv.fromarray(transform))
    cv.SaveImage(imgname_out, img_out)
When I rotate about the z-axis, everything works as expected, but rotating around the x or y axis seems completely off. I need to rotate by angles as small as pi/200 before I start getting results that seem at all reasonable. Any idea what could be wrong?
First, build the rotation matrix, of the form
    [cos(theta)  -sin(theta)  0]
R = [sin(theta)   cos(theta)  0]
    [0            0           1]
Applying this coordinate transform gives you a rotation around the origin.
If, instead, you want to rotate around the image center, you have to first shift the image center
to the origin, then apply the rotation, and then shift everything back. You can do so using a
translation matrix:
    [1  0  -image_width/2 ]
T = [0  1  -image_height/2]
    [0  0   1             ]
The transformation matrix for translation, rotation, and inverse translation then becomes:
H = inv(T) * R * T
I'll have to think a bit about how to relate the skew matrix to the 3D transformation. I expect the easiest route is to set up a 4D transformation matrix, and then to project that back to 2D homogeneous coordinates. But for now, the general form of the skew matrix:
    [x_scale  0        0]
S = [0        y_scale  0]
    [x_skew   y_skew   1]
The x_skew and y_skew values are typically tiny (1e-3 or less).
Here's the code:
from skimage import data, transform
import numpy as np
import matplotlib.pyplot as plt
img = data.camera()
theta = np.deg2rad(10)
tx = 0
ty = 0
S, C = np.sin(theta), np.cos(theta)
# Rotation matrix, angle theta, translation tx, ty
H = np.array([[C, -S, tx],
[S, C, ty],
[0, 0, 1]])
# Translation matrix to shift the image center to the origin
r, c = img.shape
T = np.array([[1, 0, -c / 2.],
[0, 1, -r / 2.],
[0, 0, 1]])
# Skew, for perspective
S = np.array([[1, 0, 0],
[0, 1.3, 0],
[0, 1e-3, 1]])
img_rot = transform.homography(img, H)
img_rot_center_skew = transform.homography(img, S.dot(np.linalg.inv(T).dot(H).dot(T)))
f, (ax0, ax1, ax2) = plt.subplots(1, 3)
ax0.imshow(img, cmap=plt.cm.gray, interpolation='nearest')
ax1.imshow(img_rot, cmap=plt.cm.gray, interpolation='nearest')
ax2.imshow(img_rot_center_skew, cmap=plt.cm.gray, interpolation='nearest')
plt.show()
And the output:
I don't quite follow the way you build your rotation matrix. It seems rather complicated to me. Usually it would be built by constructing a zero matrix, putting 1 on the unused axes, and the usual cos, -sin, sin, cos pattern into the two used dimensions, then multiplying all of these together.
Where did you get that np.eye(3) + np.sin(angle) * skew(axis) + (1 - np.cos(angle)) * skew(axis).dot(skew(axis)) construct from?
Try building the projection matrix from basic building blocks. Constructing a rotation matrix is fairly easy, and "rotationmatrix dot skewmatrix" should work.
You might need to pay attention to the rotation center though. Your image probably is placed at a virtual position of 1 on the z axis, so by rotating on x or y, it moves around a bit.
So you'd need to use a translation so z becomes 0, then rotate, then translate back. (Translation matrices in affine coordinates are pretty simple, too; see Wikipedia: https://en.wikipedia.org/wiki/Transformation_matrix )
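Roughly, something like this is what I mean (just a sketch with made-up image dimensions; rot_y produces the same matrix as the question's rotation(angle, [0, 1, 0])):
import numpy as np

def rot_y(angle):
    # 3x3 rotation about the y axis (same result as rotation(angle, [0, 1, 0]) above).
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0, s],
                     [0, 1, 0],
                     [-s, 0, c]])

def centred_homography(R, width, height):
    # Shift the image centre to the origin, apply the rotation, then shift back.
    T = np.array([[1, 0, -width / 2.0],
                  [0, 1, -height / 2.0],
                  [0, 0, 1.0]])
    return np.linalg.inv(T).dot(R).dot(T)

H = centred_homography(rot_y(np.pi / 20), 640, 480)  # this H would then go to WarpPerspective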