I am using a stereo system, and I am trying to get the world coordinates of some points by triangulation.
My cameras are at an angle, so the Z axis (the depth direction) is not normal to my surface. That is why, when I observe a flat surface, I do not get a constant depth but a "linear" variation, correct? I want the depth measured along the baseline direction instead. How can I re-project?
Here is a piece of my code with my projection matrices and the triangulation call:
# C1 and C2 are the camera matrices (left and right)
# R_0 and T_0 are the transformation between the two cameras
# Coord1 and Coord2 are the corresponding point coordinates in the left and right images
P1 = np.dot(C1, np.hstack((np.identity(3), np.zeros((3, 1)))))
P2 = np.dot(C2, np.hstack((R_0, T_0)))
for i in range(Coord1.shape[0]):
    z = cv2.triangulatePoints(P1, P2, Coord1[i,], Coord2[i,])
-------- EDIT LATER -----------
Thanks scribbleink; I tried to apply your proposal, but I think I have a mistake because it does not work well, as you can see below: the point cloud seems to be warped and curved towards the edges of the image.
U, S, Vt = linalg.svd(F)
V = Vt.T
# Right epipole
U[:,2]/U[2,2]
# The expected X direction, with C1 the camera matrix and C1[0,0] the focal length
vecteurX = np.array([(U[:,2]/U[2,2])[0], (U[:,2]/U[2,2])[1], C1[0,0]])
vecteurX_unit = vecteurX/np.sqrt(vecteurX[0]**2 + vecteurX[1]**2 + vecteurX[2]**2)
# The expected Y axis:
height = 2048
vecteurY = np.array([0, height - 1, 0])
vecteurY_unit = vecteurY/np.sqrt(vecteurY[0]**2 + vecteurY[1]**2 + vecteurY[2]**2)
# The expected Z direction:
vecteurZ = np.cross(vecteurX, vecteurY)
vecteurZ_unit = vecteurZ/np.sqrt(vecteurZ[0]**2 + vecteurZ[1]**2 + vecteurZ[2]**2)
# Angle between the expected Z direction and the optical axis (the current Z direction)
Z_optical = np.array([0, 0, 1])
cos_theta = np.arccos(np.dot(vecteurZ_unit, Z_optical)/(np.sqrt(vecteurZ_unit[0]**2 + vecteurZ_unit[1]**2 + vecteurZ_unit[2]**2)*np.sqrt(Z_optical[0]**2 + Z_optical[1]**2 + Z_optical[2]**2)))
sin_theta = (np.cross(vecteurZ_unit, Z_optical))[1]
# Definition of the Rodrigues vector and use of cv2.Rodrigues to get the rotation matrix
v1 = Z_optical
v2 = vecteurZ_unit
v_rodrigues = v1*cos_theta + (np.cross(v2, v1))*sin_theta + v2*(np.cross(v2, v1))*(1. - cos_theta)
R = cv2.Rodrigues(v_rodrigues)[0]
Your expected Z direction is arbitrary with respect to the reconstruction method. In general, you have a rotation matrix that rotates the left camera from your desired direction. You can easily build that matrix, R. Then all you need to do is multiply your reconstructed points by the transpose of R.
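For example, a minimal numpy sketch (assuming pts_3d is an (N, 3) array of your triangulated points in the left camera frame and R is the rotation just described):

import numpy as np

# Minimal sketch, not the poster's exact code: R rotates the desired frame
# into the left camera frame, so R.T expresses points in the desired frame.
def align_points(pts_3d, R):
    return (R.T @ pts_3d.T).T   # (N, 3) points in the desired frame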
To add to fireant's response, here is one candidate solution, assuming that the expected X-direction coincides with the line joining the centers of projection of the two cameras.
Compute the focal lengths f_1 and f_2 (via pinhole model calibration).
Solve for the location of camera 2's epipole in camera 1's frame. For this, you can use either the Fundamental matrix (F) or the Essential matrix (E) of the stereo camera pair. Specifically, the left and right epipoles lie in the nullspace of F, so you can use Singular Value Decomposition. For a solid theoretical reference, see Hartley and Zisserman, Second edition, Table 9.1 "Summary of fundamental matrix properties" on Page 246 (freely available PDF of the chapter).
The center of projection of camera 1, i.e. (0, 0, 0) and the location of the right epipole, i.e. (e_x, e_y, f_1) together define a ray that aligns with the line joining the camera centers. This can be used as the expected X-direction. Call this vector v_x.
Assume that the expected Y axis faces downward in the image plane, i.e., from (0, 0, f_1) to (0, height-1, f_1), where f_1 is the focal length. Call this vector v_y.
The expected Z direction is now the cross-product of vectors v_x and v_y.
Using the expected Z direction along with the optical axis (Z-axis) of camera 1, you can then compute a rotation matrix from the two 3D vectors using, say, the method listed in this other Stack Overflow post. A rough sketch of these steps follows below.
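A rough numpy/scipy sketch of the steps above (assuming F, C1 and an (N, 3) array of triangulated points pts_3d are available; the choice of null vector, the ignored principal point offset and the sign conventions may need checking for your setup):

import numpy as np
from scipy import linalg

U, S, Vt = linalg.svd(F)
e = Vt.T[:, 2]                           # null vector of F, i.e. an epipole (up to scale)
e = e / e[2]                             # homogeneous pixel coordinates (e_x, e_y, 1)

v_x = np.array([e[0], e[1], C1[0, 0]])   # ray toward the other camera center
v_x = v_x / np.linalg.norm(v_x)
v_y = np.array([0.0, 1.0, 0.0])          # image "down" direction
v_z = np.cross(v_x, v_y)
v_z = v_z / np.linalg.norm(v_z)
v_y = np.cross(v_z, v_x)                 # re-orthogonalize the Y axis

R_align = np.vstack((v_x, v_y, v_z))     # rows are the new axes
pts_aligned = (R_align @ pts_3d.T).T     # points in the baseline-aligned frame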
Practical note:
Expecting the planar object to exactly align with the stereo baseline is unlikely without considerable effort, in my practical experience. Some amount of plane-fitting and additional rotation would be required.
One-time effort:
It depends on whether you need to do this once, e.g. for a one-time calibration. In that case, simply make this estimation process real-time and rotate your stereo camera pair until the depth-map variance is minimized. Then lock your camera positions and pray someone doesn't bump into them later.
Repeatability:
If you need to keep aligning your estimated depth maps to truly arbitrary Z-axes that change for every new frame captured, then you should consider investing time in the plane-estimation method and making it more robust.
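If you do go down the plane-fitting route, here is a minimal sketch (my own assumption of how the data is laid out: pts_3d is an (N, 3) array of triangulated points lying mostly on the flat surface):

import numpy as np

centroid = pts_3d.mean(axis=0)
_, _, Vt = np.linalg.svd(pts_3d - centroid)
normal = Vt[2]                        # direction of least variance = plane normal
if normal[2] < 0:                     # keep the normal pointing roughly toward +Z
    normal = -normal

# Rodrigues-style rotation taking the plane normal onto the Z axis
z = np.array([0.0, 0.0, 1.0])
v = np.cross(normal, z)
s, c = np.linalg.norm(v), float(np.dot(normal, z))
if s < 1e-8:
    R_plane = np.eye(3)
else:
    vx = np.array([[0, -v[2], v[1]],
                   [v[2], 0, -v[0]],
                   [-v[1], v[0], 0]])
    R_plane = np.eye(3) + vx + vx @ vx * ((1 - c) / s**2)

pts_flat = (R_plane @ (pts_3d - centroid).T).T   # Z is now ~constant on the plane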
I have written an algorithm that gets the coordinates of an n-th degree curve in an image, given its parameters (constant coefficients).
Equation of curve is as follows:
y = a0 + a1*x + a2*x^2 + ... + a(n-1)*x^(n-1) + an*x^n
(a0, a1, a2,... are given)
The issue is that the coordinates of this curve are not stored in the order they appear along the curve. They are stored from the minimum x coordinate to the maximum x coordinate and, for each x, from the minimum y coordinate to the maximum y coordinate. I want these coordinates ordered from the start of the curve to its end.
The images below show the way the coordinates are currently stored vs. the way they are expected to be stored.
The images below show the pixelated curve, where each box denotes a pixel.
Expected order of coordinates:
Current order of coordinates:
Can anyone suggest an algorithm to solve this problem? Code in Python would be appreciated.
Sort with the lexicographical ordering "left to right, then ties bottom to top".
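A minimal sketch of that, assuming coords is a list of (x, y) pixel pairs and that image rows increase downward (so "bottom to top" means decreasing y):

coords = [(3, 5), (1, 2), (1, 7), (2, 4)]              # example points
ordered = sorted(coords, key=lambda p: (p[0], -p[1]))  # left to right, ties bottom to top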
Goal
I need to retrieve the position and attitude angles of a camera (using OpenCV / Python).
Definitions
Attitude angles are defined by:
Yaw is the general orientation of the camera when it lies on a horizontal plane: toward north = 0°, toward east = 90°, south = 180°, west = 270°, etc.
Pitch is the "nose" orientation of the camera: 0° = looking horizontally at a point on the horizon, -90° = looking down vertically, +90° = looking up vertically, 45° = looking up at an angle of 45° from the horizon, etc.
Roll is whether the camera is tilted left or right when in your hands (so it is always looking at a point on the horizon while this angle varies): +45° = tilted 45° in a clockwise rotation when you grab the camera, thus +90° (and -90°) would be the angle needed for a portrait picture, for example, etc.
World reference frame:
My world reference frame is oriented so:
Toward east = +X
Toward north = +Y
Up toward the sky = +Z
My world objects points are given in that reference frame.
Camera reference frame:
According to the doc, the camera reference frame is oriented like that:
What to achieve
Now, from cv2.solvePnP() applied over a bunch of image points and their corresponding world coordinates, I have computed both rvec and tvec.
But, according to the doc: http://docs.opencv.org/trunk/d9/d0c/group__calib3d.html#ga549c2075fac14829ff4a58bc931c033d , they are:
rvec ; Output rotation vector (see Rodrigues()) that, together with tvec, brings points from the model coordinate system to the camera coordinate system.
tvec ; Output translation vector.
So these vectors are given for transforming points into the camera reference frame.
I need to do the exact inverse operation, thus retrieving camera position and attitude relative to world coordinates.
Camera position:
So I have computed the rotation matrix from rvec with Rodrigues():
rmat = cv2.Rodrigues(rvec)[0]
And if I'm right here, the camera position expressed in the world coordinate system is given by:
camera_position = -np.matrix(rmat).T * np.matrix(tvec)
(src: Camera position in world coordinate from cv::solvePnP )
This seems to work fairly well.
Camera attitude (yaw, pitch and roll):
But how do I retrieve the corresponding attitude angles (yaw, pitch and roll as described above) from the point of view of the camera (as if it were in your hands, basically)?
I have tried implementing this: http://planning.cs.uiuc.edu/node102.html#eqn:yprmat in a function:
import math
import numpy as np

def rotation_matrix_to_attitude_angles(R):
    cos_beta = math.sqrt(R[2,1] * R[2,1] + R[2,2] * R[2,2])
    singular = cos_beta < 1e-6
    if not singular:
        alpha = math.atan2(R[1,0], R[0,0])     # yaw   [z]
        beta  = math.atan2(-R[2,0], cos_beta)  # pitch [y]
        gamma = math.atan2(R[2,1], R[2,2])     # roll  [x]
    else:
        alpha = math.atan2(R[1,0], R[0,0])     # yaw   [z]
        beta  = math.atan2(-R[2,0], cos_beta)  # pitch [y]
        gamma = 0                              # roll  [x]
    return np.array([alpha, beta, gamma])
But results are not consistent with what I want. For example, I have a roll angle of ~ -90°, but the camera is horizontal so it should be around 0.
The pitch angle is around 0, so it seems correctly determined, but I don't really understand why it is around 0: the Z axis of the camera reference frame is horizontal, so it has already been tilted 90° from the vertical axis of the world reference frame. I would have expected a value of -90° or +270° here. Anyway.
And yaw seems good, mostly.
Question
Did I miss something with the roll angle?
The order of Euler rotations (pitch, yaw, roll) is important.
According to the x-convention, the 3-1-3 extrinsic Euler angles φ, θ and ψ (around the z-axis first, then the x-axis, and then the z-axis again) can be obtained as follows:
sx = math.sqrt(R[2,0]**2 + R[2,1]**2)
tolerance = 1e-6
if sx > tolerance:  # no singularity
    alpha = math.atan2(R[2,0], R[2,1])
    beta  = math.atan2(sx, R[2,2])
    gamma = -math.atan2(R[0,2], R[1,2])
else:
    alpha = 0
    beta  = math.atan2(sx, R[2,2])
    gamma = 0
But this is not a unique solution. For example, for the ZYX order:
sy = math.sqrt(R[0,0]**2 + R[1,0]**2)
tolerance = 1e-6
if sy > tolerance:  # no singularity
    alpha = math.atan2(R[1,0], R[0,0])
    beta  = math.atan2(-R[2,0], sy)
    gamma = math.atan2(R[2,1], R[2,2])
else:
    alpha = 0
    beta  = math.atan2(-R[2,0], sy)
    gamma = math.atan2(-R[1,2], R[1,1])
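As a quick sanity check of the ZYX variant, here is a round-trip on a rotation built from known angles (just a verification sketch):

import math
import numpy as np

def Rx(t):
    return np.array([[1, 0, 0],
                     [0, math.cos(t), -math.sin(t)],
                     [0, math.sin(t),  math.cos(t)]])

def Ry(t):
    return np.array([[ math.cos(t), 0, math.sin(t)],
                     [0, 1, 0],
                     [-math.sin(t), 0, math.cos(t)]])

def Rz(t):
    return np.array([[math.cos(t), -math.sin(t), 0],
                     [math.sin(t),  math.cos(t), 0],
                     [0, 0, 1]])

a, b, g = 0.3, -0.2, 0.1
R = Rz(a) @ Ry(b) @ Rx(g)             # ZYX composition

sy = math.sqrt(R[0, 0]**2 + R[1, 0]**2)
print(math.atan2(R[1, 0], R[0, 0]))   # ~0.3  (alpha)
print(math.atan2(-R[2, 0], sy))       # ~-0.2 (beta)
print(math.atan2(R[2, 1], R[2, 2]))   # ~0.1  (gamma)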
I think your transformation is missing a rotation. If I interpret your question correctly, you are asking what the inverse of (rotation by R followed by translation by T) is.
$\{\hat{R}|\vec{T}\}\,\vec{r} = \hat{R}\,\vec{r} + \vec{T}$
The inverse should return the identity
$\{\hat{R}|\vec{T}\}^{-1}\,\{\hat{R}|\vec{T}\} = \{\hat{1}|\vec{0}\}$
Working this through yields
$\{\hat{R}|\vec{T}\}^{-1} = \{\hat{R}^{-1}\,|\,-\hat{R}^{-1}\vec{T}\}$
As far as I can tell, you were using the $-\hat{R}^{-1}\vec{T}$ part of the answer (undoing the translation) but leaving out the inverse rotation $\hat{R}^{-1}$.
Rotation + translation:
$\{\hat{R}|\vec{T}\}\,\vec{r} = \hat{R}\,\vec{r} + \vec{T}$
Inverse of (rotation + translation):
$\{\hat{R}|\vec{T}\}^{-1}\,\vec{r} = \hat{R}^{-1}\,\vec{r} - \hat{R}^{-1}\vec{T}$
In non-LaTeX form: R^-1 * r - R^-1 * T is the inverse of R * r + T.
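Applied to the solvePnP output from the question, that inverse could look like this in numpy (a sketch, with rvec and tvec as returned by cv2.solvePnP):

import cv2
import numpy as np

# World -> camera is x_cam = R @ x_world + t, so camera -> world is
# x_world = R.T @ (x_cam - t).
R, _ = cv2.Rodrigues(rvec)
camera_position = -R.T @ tvec.reshape(3, 1)   # camera center in world coordinates
R_cam_to_world = R.T                          # use this rotation for the attitude angles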
In this link: http://planning.cs.uiuc.edu/node102.html#eqn:yprmat they assume a different coordinate system for the object than the one of your camera.
They define:
Roll - rotation around x (in your case it is around z)
Pitch - rotation around y (in your case it is around x)
Yaw - rotation around z (in your case it is around y)
To get the right conversion you need to recalculate the full rotation matrix given the three angles as:
So for the reverse conversion you get:
cos_beta = math.sqrt(R[0,2] * R[0,2] + R[2,2] * R[2,2])
alpha = math.atan2(R[0,2], R[2,2]) # yaw [z]
beta = math.atan2(-R[1, 2], cos_beta) # pitch [y]
gamma = math.atan2(R[1, 0], R[1,1])
Background
For an algorithm I'm working on, I currently use a 3D sphere as a binary mask: an NxNxN array in which the voxels inside a sphere of radius N//2 are set to True. Further processing does a computation for each voxel set to True.
This proved computationally intensive for my specific task as N grew large (the cost is O(N^3)), so I now want to reduce my binary mask to a subsample of lines radiating from the array center within the radius.
Objective
I want a 3D binary mask of the lines in gray in the image.
To have a bit of control over the number of voxels, I would have one parameter (say l) regulating the number of lines sampled in each 2D circle, and maybe a second one (k?) for the number of z-rotations.
What I tried
I am using numpy and scipy, and I thought I could use the scipy.ndimage.interpolation.rotate method to rotate a single line around on a plane, then use that complete 2D mask to rotate around the z-axis.
This proved difficult, as the interpolation uses some deep magic involving splines that discards my True values on rotation.
I am thinking that I could mathematically compute which voxels should be set to True by following some line equations, but I'm at a loss to find them.
Any idea how to get there?
Update: solution!
Thanks to jkalden, who helped me think this through and gave code samples, I have this:
rmax is the radius of the sphere; n_theta and n_phi are the numbers of polar and azimuthal lines to use.
out_mask = np.zeros((rmax*2,) * 3, dtype=bool)

# for each phi = one circle among the azimuthal circles
for phi in np.linspace(0, np.deg2rad(360), n_phi, endpoint=False):
    # for all lines in the polar circle of this azimuthal circle
    for theta in np.linspace(0, np.deg2rad(360), n_theta, endpoint=False):
        # for all distances (0 - rmax) along these lines
        for r in range(rmax):
            coords = spherical_to_cartesian([r, theta, phi]) + rmax
            out_mask[tuple(coords)] = True
With the spherical_to_cartesian from this code sample.
Which gives me this (with rmax = 50 and n_theta = n_phi = 8) :
(Center area tuned out of my function by choice)
I propose changing the coordinate system to spherical coordinates. You would then choose your 2D circle by an azimuthal angle, and a line is then defined by additionally choosing a polar angle. The variable along the line is just the radius, and you can use numpy.linspace to discretize it. Doing so might also save time during the calculation.
You can switch your coordinate system at any time by using the bijective relation implemented e.g. here or here.
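For completeness, a hypothetical spherical_to_cartesian helper in the spirit of the linked sample (physics convention, with theta the polar angle from +z and phi the azimuthal angle, rounded to integer voxel indices so it can index the mask):

import numpy as np

def spherical_to_cartesian(rtp):
    # rtp = (r, theta, phi); rounded so the result can index a boolean mask
    r, theta, phi = rtp
    x = r * np.sin(theta) * np.cos(phi)
    y = r * np.sin(theta) * np.sin(phi)
    z = r * np.cos(theta)
    return np.round([x, y, z]).astype(int)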
I have a historical time sequence of seafloor images scanned from film that need registration.
from pylab import *
import cv2
import urllib
urllib.urlretrieve('http://geoport.whoi.edu/images/frame014.png','frame014.png');
urllib.urlretrieve('http://geoport.whoi.edu/images/frame015.png','frame015.png');
gray1=cv2.imread('frame014.png',0)
gray2=cv2.imread('frame015.png',0)
figure(figsize=(14,6))
subplot(121);imshow(gray1,cmap=cm.gray);
subplot(122);imshow(gray2,cmap=cm.gray);
I want to use the black region on the left of each image to do the registration, since that region was inside the camera and should be fixed in time. So I just need to compute the affine transformation between the black regions.
I determined these regions by thresholding and finding the largest contour:
def find_biggest_contour(gray,threshold=40):
# threshold a grayscale image
ret,thresh = cv2.threshold(gray,threshold,255,1)
# find the contours
contours,h = cv2.findContours(thresh,mode=cv2.RETR_LIST,method=cv2.CHAIN_APPROX_NONE)
# measure the perimeter
perim = [cv2.arcLength(cnt,True) for cnt in contours]
# find contour with largest perimeter
i=perim.index(max(perim))
return contours[i]
c1=find_biggest_contour(gray1)
c2=find_biggest_contour(gray2)
x1=c1[:,0,0];y1=c1[:,0,1]
x2=c2[:,0,0];y2=c2[:,0,1]
figure(figsize=(8,8))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1,y1,'b-')
imshow(gray2,cmap=cm.gray, alpha=0.5);plot(x2,y2,'g-')
axis([0,1500,1000,0]);
The blue is the longest contour from the 1st frame, the green is the longest contour from the 2nd frame.
What is the best way to determine the rotation and offset between the blue and green contours?
I only want to use the right side of the contours in some region surrounding the step, something like the region between the arrows.
Of course, if there is a better way to register these images, I'd love to hear it. I already tried a standard feature matching approach on the raw images, and it didn't work well enough.
Following Shambool's suggested approach, here's what I've come up with. I used a Ramer-Douglas-Peucker algorithm to simplify the contour in the region of interest and identified the two turning points. I was going to use the two turning points to get my three unknowns (xoffset, yoffset and angle of rotation), but the 2nd turning point is a bit too far toward the right because RDP simplified away the smoother curve in this region. So instead I used the angle of the line segment leading up to the 1st turning point. Differencing this angle between image1 and image2 gives me the rotation angle. I'm still not completely happy with this solution. It worked well enough for these two images, but I'm not sure it will work well on the entire image sequence. We'll see.
It would really be better to fit the contour to the known shape of the black border.
# select region of interest from largest contour
ind1=where((x1>190.) & (y1>200.) & (y1<900.))[0]
ind2=where((x2>190.) & (y2>200.) & (y2<900.))[0]
figure(figsize=(10,10))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1[ind1],y1[ind1],'b-')
imshow(gray2,cmap=cm.gray, alpha=0.5);plot(x2[ind2],y2[ind2],'g-')
axis([0,1500,1000,0])
import math

def angle(x1, y1):
    # Returns the angle of each segment along an (x, y) track
    return array([math.atan2(y, x) for (y, x) in zip(diff(y1), diff(x1))])

def simplify(x, y, tolerance=40, min_angle=60.*pi/180.):
    """
    Use the Ramer-Douglas-Peucker algorithm to simplify the path
    http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
    Python implementation: https://github.com/sebleier/RDP/
    """
    from RDP import rdp
    points = vstack((x, y)).T
    simplified = array(rdp(points.tolist(), tolerance))
    sx, sy = simplified.T
    theta = abs(diff(angle(sx, sy)))
    # Select the indices of the points with the greatest theta:
    # large theta is associated with the greatest change in direction.
    idx = where(theta > min_angle)[0] + 1
    return sx, sy, idx
sx1,sy1,i1 = simplify(x1[ind1],y1[ind1])
sx2,sy2,i2 = simplify(x2[ind2],y2[ind2])
fig = plt.figure(figsize=(10,6))
ax =fig.add_subplot(111)
ax.plot(x1, y1, 'b-', x2, y2, 'g-',label='original path')
ax.plot(sx1, sy1, 'ko-', sx2, sy2, 'ko-',lw=2, label='simplified path')
ax.plot(sx1[i1], sy1[i1], 'ro', sx2[i2], sy2[i2], 'ro',
markersize = 10, label='turning points')
ax.invert_yaxis()
plt.legend(loc='best')
# determine x,y offset between 1st turning points, and
# angle from difference in slopes of line segments approaching 1st turning point
xoff = sx2[i2[0]] - sx1[i1[0]]
yoff = sy2[i2[0]] - sy1[i1[0]]
iseg1 = [i1[0]-1, i1[0]]
iseg2 = [i2[0]-1, i2[0]]
ang1 = angle(sx1[iseg1], sy1[iseg1])
ang2 = angle(sx2[iseg2], sy2[iseg2])
ang = -(ang2[0] - ang1[0])
print xoff, yoff, ang*180./pi
-28 14 0.514484523
# 2x3 affine matrix M
M=array([cos(ang),sin(ang),xoff,-sin(ang),cos(ang),yoff]).reshape(2,3)
print M
[[ 9.99959685e-01 8.97932821e-03 -2.80000000e+01]
[ -8.97932821e-03 9.99959685e-01 1.40000000e+01]]
# warp 2nd image into coordinate frame of 1st
Minv = cv2.invertAffineTransform(M)
gray2b = cv2.warpAffine(gray2,Minv,shape(gray2.T))
figure(figsize=(10,10))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1[ind1],y1[ind1],'b-')
imshow(gray2b,cmap=cm.gray, alpha=0.5);
axis([0,1500,1000,0]);
title('image1 and transformed image2 overlain with 50% transparency');
Good question.
One approach is to represent the contours as 2D point clouds and then do registration.
There is simpler and clearer Matlab code that can give you the affine transform.
And there is more complex C++ code (using the VXL lib) with Python and Matlab wrappers included.
Or you can use a modified ICP (iterative closest point) algorithm that is robust to noise and can handle an affine transform.
Also, your contours do not seem to be very accurate, which could be a problem.
Another approach is to use some kind of registration that uses pixel values.
There is Matlab code for this (I think it uses some kind of minimizer plus a cross-correlation metric).
There may also be some kind of optical flow registration (or another kind) as used in medical imaging.
You can also use point features such as SIFT (or SURF).
You can try it quickly in FIJI (ImageJ); see also this link.
Open the 2 images
Plugins -> Feature Extraction -> SIFT (or other)
Set the expected transformation to affine
Look at the estimated transformation model ([3,3] homography matrix) in the ImageJ log.
If it works well, you can implement it in Python using OpenCV (a rough sketch is below) or maybe using Jython with ImageJ.
And it would be better if you posted the original images and described all conditions (it seems that the image changes between frames).
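A rough OpenCV/Python sketch of the SIFT route (assuming a recent OpenCV build: SIFT_create needs OpenCV >= 4.4 or the contrib package, and estimateAffinePartial2D needs OpenCV >= 3.2), using the gray1/gray2 images from the question:

import cv2
import numpy as np

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(gray1, None)
kp2, des2 = sift.detectAndCompute(gray2, None)

# Match descriptors and keep good matches with Lowe's ratio test
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# Robustly estimate a 4-DOF affine (rotation, scale, translation) with RANSAC
A, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)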
You can represent these contours with their respective ellipses. These ellipses are centered on the centroid of the contour and they are oriented towards the main density axis. You can compare the centroids and the orientation angle.
1) Fill the contours => drawContours with thickness=CV_FILLED
2) Find moments => cvMoments()
3) And use them.
Centroid: { x, y } = {M10/M00, M01/M00 }
Orientation (theta):
EDIT: I customized the sample code from legacy (enteringblobdetection.cpp) for your case.
/* Image moments */
double M00,X,Y,XX,YY,XY;
CvMoments m;
CvRect r = ((CvContour*)cnt)->rect;
CvMat mat;
cvMoments( cvGetSubRect(pImgFG,&mat,r), &m, 0 );
M00 = cvGetSpatialMoment( &m, 0, 0 );
X = cvGetSpatialMoment( &m, 1, 0 )/M00;
Y = cvGetSpatialMoment( &m, 0, 1 )/M00;
XX = (cvGetSpatialMoment( &m, 2, 0 )/M00) - X*X;
YY = (cvGetSpatialMoment( &m, 0, 2 )/M00) - Y*Y;
XY = (cvGetSpatialMoment( &m, 1, 1 )/M00) - X*Y;
/* Contour description */
CvPoint myCentroid(r.x+(float)X,r.y+(float)Y);
double myTheta = atan( 2*XY/(XX-YY) );
Also, check this with OpenCV 2.0 examples.
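Since the question uses Python, here is a rough cv2.moments equivalent of the snippet above; note it uses the standard 0.5 * atan2(2*mu11, mu20 - mu02) orientation formula:

import math
import cv2

def contour_centroid_and_angle(cnt):
    m = cv2.moments(cnt)
    cx, cy = m['m10'] / m['m00'], m['m01'] / m['m00']                  # centroid
    theta = 0.5 * math.atan2(2.0 * m['mu11'], m['mu20'] - m['mu02'])   # orientation
    return (cx, cy), theta

# c1 and c2 are the largest contours found earlier in the question
(cen1, th1) = contour_centroid_and_angle(c1)
(cen2, th2) = contour_centroid_and_angle(c2)
# translation ~ difference of centroids, rotation ~ th2 - th1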
If you don't want to find the homography between the two images and want to find the affine transformation instead, you have three unknowns: the rotation angle (R) and the displacements in x and y (X, Y). Therefore a minimum of two points (with two known values each) is needed to find the unknowns. Two points should be matched between the two images, or two lines, each of which has two known values: the intercept and the slope. If you go with the point matching approach, the further the points are from each other, the more robust the found transform is to noise (this is very simple if you remember error propagation rules).
In the two-point matching method (a code sketch follows after these steps):
find two points (A and B) in the first image I1 and their corresponding points (A', B') in the second image I2
find the midpoint between A and B, call it C, and the midpoint between A' and B', call it C'
the difference between C and C' (C - C') gives the translation between the images (X and Y)
using the dot product of C-A and C'-A' you can find the rotation angle (R)
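A sketch of those steps (the helper name and the signed-angle computation via atan2 are my own additions):

import numpy as np

def rigid_from_two_points(A, B, Ap, Bp):
    # A, B: matched points in image 1; Ap, Bp: the same points in image 2
    A, B, Ap, Bp = (np.asarray(p, dtype=float) for p in (A, B, Ap, Bp))
    C, Cp = (A + B) / 2.0, (Ap + Bp) / 2.0        # midpoints
    tx, ty = Cp - C                               # translation from image 1 to image 2
    v, vp = A - C, Ap - Cp
    cross = v[0] * vp[1] - v[1] * vp[0]           # z-component of the 2D cross product
    angle = np.arctan2(cross, np.dot(v, vp))      # signed rotation angle
    return tx, ty, angle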
To detect robust points, I would find the points along the side of the contour you have found with the highest absolute value of the second derivative (Hessian) and then try to match them. Since you mentioned this is video footage, you can easily make the assumption that the transformation between any two frames is small, in order to reject outliers.
I am doing a project to find the speed of a vehicle from images. We are taking these images from within the vehicle. We mark some object in the 1st image as a reference; using the properties of the same object in the next image, we must calculate the speed of the moving vehicle. Can anyone help me here? I am using Python and OpenCV. I have succeeded up to finding the marked pixel in the 2nd image using an optical flow method. Can anyone help me with the rest?
Knowing the acquisition frequency, you must now find the distance between the successive positions of the marker.
To find this distance, I suggest you estimate the pose of the marker for each image. Loosely speaking, the "pose" is the transformation matrix expressing the coordinates of an object relative to a camera. Once you have those successive coordinates, you can compute the distance, and then the speed.
Pose estimation is the process of computing the position and orientation of a known 3D object relative to a 2D camera. The resulting pose is the transformation matrix describing the object's reference frame in the camera's reference frame.
OpenCV implements a pose estimation algorithm: POSIT. The doc says:
Given some 3D points (in object coordinate system) of the object, at least four non-coplanar points, their corresponding 2D projections in the image, and the focal length of the camera, the algorithm is able to estimate the object's pose.
This means:
You must know the focal length of your camera
You must know the geometry of your marker
You must be able to match four known points of your marker in the 2D image
You may have to compute the focal length of the camera using the calibration routines provided by OpenCV. I think you already have the other two required pieces of data.
Edit:
// Algorithm example
MarkerCoords = {Four coordinates of known 3D points}
I1 = take 1st image
F1 = focal(I1)
MarkerPixels1 = {Matching pixels in I1}
Pose1 = posit(MarkerCoords, MarkerPixels1, F1)
I2 = take 2nd image
F2 = focal(I2)
MarkerPixels2 = {Matching pixels in I2 by optical flow}
Pose2 = posit(MarkerCoords, MarkerPixels2, F2)
o1 = origin_of_camera * Pose1 // Origin of camera is
o2 = origin_of_camera * Pose2 // typically [0,0,0]
dist = euclidean_distance(o1, o2)
speed = dist/frequency
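With modern OpenCV, cv2.solvePnP can play the role of POSIT in the sketch above. Here is a hedged Python version, where marker_3d, pix1, pix2, K, dist and dt are assumptions (the known 3D marker points, their tracked pixel positions in the two frames, the camera matrix, the distortion coefficients, and the time between frames):

import cv2
import numpy as np

def camera_center(object_pts, image_pts, K, dist):
    # Pose of the marker in the camera frame, then the camera origin in the marker frame
    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)
    R, _ = cv2.Rodrigues(rvec)
    return (-R.T @ tvec).ravel()

o1 = camera_center(marker_3d, pix1, K, dist)   # camera position at frame 1
o2 = camera_center(marker_3d, pix2, K, dist)   # camera position at frame 2
speed = np.linalg.norm(o2 - o1) / dt           # dt = 1/F, the frame interval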
Edit 2: (Answers to comments)
"What is the acquisition frequency?"
Computing the speed of your vehicle is equivalent to computing the speed of the marker. (In the first case the reference frame is the marker, attached to the earth; in the second case the reference frame is the camera, attached to the vehicle.) This is expressed by the following equation:
speed = D/(t2-t1)
With:
D the distance [o1 o2]
o1 the position of the marker at time t1
o2 the position of the marker at time t2
You can retrieve the elapsed time either by extracting t1 and t2 from the metadata of your photos, or from the acquisition frequency of your imaging device: t2-t1 = T = 1/F.
"Won't it be better to mark simple things like posters? And if doing so can't we consider it as a 2d object?"
This is not possible with the POSIT algorithm (or with any other pose estimation algorithm, as far as I know): it requires four non-coplanar points. This means you cannot choose a 2D object embedded in a 3D space; you have to choose an object with some depth.
On the other hand, you can use a really simple shape, as long as it is a volume. (A cube, for example.)