Camera pose from solvePnP

Camera pose from solvePnP - python

Goal
I need to retrieve the position and attitude angles of a camera (using OpenCV / Python).
Definitions
Attitude angles are defined by:
Yaw being the general orientation of the camera when it lays on an horizontal plane: toward north=0, toward east = 90°, south=180°, west=270°, etc.
Pitch being the "nose" orientation of the camera: 0° = looking horizontally at a point on the horizon, -90° = looking down vertically, +90° = looking up vertically, 45° = looking up at an angle of 45° from the horizon, etc.
Roll being if the camera is tilted left or right when in your hands (so it is always looking at a point on the horizon when this angle is varying): +45° = tilted 45° in a clockwise rotation when you grab the camera, thus +90° (and -90°) would be the angle needed for a portrait picture for example, etc.
World reference frame:
My world reference frame is oriented so:
Toward east = +X
Toward north = +Y
Up toward the sky = +Z
My world objects points are given in that reference frame.
Camera reference frame:
According to the doc, the camera reference frame is oriented like that:
What to achieve
Now, from cv2.solvepnp() over a bunch of images points and their corresponding world coordinates, I have computed both rvec and tvec.
But, according to the doc: http://docs.opencv.org/trunk/d9/d0c/group__calib3d.html#ga549c2075fac14829ff4a58bc931c033d , they are:
rvec ; Output rotation vector (see Rodrigues()) that, together with tvec, brings points from the model coordinate system to the camera coordinate system.
tvec ; Output translation vector.
these vectors are given to go to the camera reference frame.
I need to do the exact inverse operation, thus retrieving camera position and attitude relative to world coordinates.
Camera position:
So I have computed the rotation matrix from rvec with Rodrigues():
rmat = cv2.Rodrigues(rvec)[0]
And if I'm right here, the camera position expressed in the world coordinates system is given by:
camera_position = -np.matrix(rmat).T * np.matrix(tvec)
(src: Camera position in world coordinate from cv::solvePnP )
This looks fairly well.
Camera attitude (yaw, pitch and roll):
But how to retrieve corresponding attitude angles (yaw, pitch and roll as describe above) from the point of view of the camera (as if it was in your hands basically)?
I have tried implementing this: http://planning.cs.uiuc.edu/node102.html#eqn:yprmat in a function:
def rotation_matrix_to_attitude_angles(R):
import math
import numpy as np
cos_beta = math.sqrt(R[2,1] * R[2,1] + R[2,2] * R[2,2])
validity = cos_beta < 1e-6
if not validity:
alpha = math.atan2(R[1,0], R[0,0]) # yaw [z]
beta = math.atan2(-R[2,0], cos_beta) # pitch [y]
gamma = math.atan2(R[2,1], R[2,2]) # roll [x]
else:
alpha = math.atan2(R[1,0], R[0,0]) # yaw [z]
beta = math.atan2(-R[2,0], cos_beta) # pitch [y]
gamma = 0 # roll [x]
return np.array([alpha, beta, gamma])
But results are not consistent with what I want. For example, I have a roll angle of ~ -90°, but the camera is horizontal so it should be around 0.
Pitch angle is around 0 so it seems correctly determined but I don't really understand why it's around 0 as the Z-axis of the camera reference frame is horizontal, so it's has already been tilted from 90° from the vertical axis of the world reference frame. I would have expected a value of -90° or +270° here. Anyway.
And yaw seems good. Mainly.
Question
Did I miss something with the roll angle?

The order of Euler rotations (pitch, yaw, roll) is important.
According to x-convention, the 3-1-3 extrinsic Euler angles φ, θ and ψ (around z-axis first , then x-axis and then again z-axis) can be obtained as follows:
sx = math.sqrt((R[2,0])**2 + (R[2,1])**2)
tolerance = 1e-6;
if (sx > tolerance): # no singularity
alpha = math.atan2(R[2,0], R[2,1])
beta = math.atan2(sx, R[2,2])
gamma= -math.atan2(R[0,2], R[1,2])
else:
alpha = 0
beta = math.atan2(sx, R[2,2])
gamma= 0
But this not an unique solution. For example ZYX,
sy = math.sqrt((R[0,0])**2 + (R[1,0])**2)
tolerance = 1e-6;
if (sy > tolerance): # no singularity
alpha = math.atan2(R[1,0], R[0,0])
beta = math.atan2(-R[2,0], sy)
gamma= math.atan2(R[2,1] , R[2,2])
else:
alpha = 0
beta = math.atan2(-R[2,0], sy)
gamma= math.atan2(-R[1,2], R[1,1])

I think your transformation is missing a rotation. If I interpret your question correctly, you are asking what the inverse of (rotation by R followed by translation T)
${\hat{R}|\vec{T}}.\vec{r}=\hat{R}.\vec{r}+\vec{T}$
The inverse should return the identity
${\hat{R}|\vec{T}}^{-1}.{\hat{R}|\vec{T}}={\hat{1}|0}$
Working this through yields
${\hat{R}|\vec{T}}^{-1}={\hat{R}^-1|-\hat{R}^-1\cdot \vec{T}}$
As far as I could tell you were using the $-\hat{R}^-1\cdot \vec{T}$ (undoing th translation) part of the answer but leaving out the inverse rotation $\hat{R}^-1$
Rotation+Translation:
${\hat{R}|\vec{T}}\vec{r}=\hat{R}\cdot\vec{r}+\vec{T}$
Inverse of (Rotation+Translation):
${\hat{R}|\vec{T}}^{-1}\vec{r}=\hat{R}^{-1}\cdot\vec{r}-\hat{R}^{-1}\cdot \vec{T}$
Non-latex mode (R^-1*r-R^-1*T) is the inverse of (R.r+T)

In this link: http://planning.cs.uiuc.edu/node102.html#eqn:yprmat they assume a different coordinate system for the object then the one of your camera.
They define:
Roll - Rotation around x (in your case its around z)
pitch - Rotation around y (in your case its around x)
yaw - Rotation around z (in your case its around y)
To get the right conversion you need to recalculate the full rotation matrix given the three angles as:
So for the reverse conversion you get:
cos_beta = math.sqrt(R[0,2] * R[0,2] + R[2,2] * R[2,2])
alpha = math.atan2(R[0,2], R[2,2]) # yaw [z]
beta = math.atan2(-R[1, 2], cos_beta) # pitch [y]
gamma = math.atan2(R[1, 0], R[1,1])

Related

How to calculate the light reflection angle on tridimensional space

I have the follow situation:
One point located on Earth Surface with 3D coordinates (X, Y, Z) and one camera inside the airplane that taken picture from surface. For the camera, I have too the 3D coordinates (X, Y, Z) for the exactly moment that the image was taken.
To this scenario I need calculate the light reflection angle between the point on Earth surface and the camera inside the airplane.
I would like suggestions or ideias to calculate this angle. I know that a possible solution will use the analytical geometry.
I have calculated the sun incidence angle to the point on surface using PVLIB library, but I can't found on pvlib a function to determine the light reflection angle.
Thx for help me!!

I suppose that you used the sun elevation and azimuth angle to calculate the sun incidence vector by some formula such as (suppose azimuth as [N=0 / E=90 / S=180 / W=270]):
Vx_s = sin(sun_azim) * cos(sun_elev)
Vy_s = cos(sun_azim) * cos(sun_elev)
Vz_s = sin(sun_elev)
Considering a light reflection on a flat surface (horizontal with normal vector to zenith), the vector of reflected light (forward light, not considering scattering/dispersion rays, e.g. mirror surface) will be
Vx_r = sin(sun_azim + 180) * cos(sun_elev)
Vy_r = cos(sun_azim + 180) * cos(sun_elev)
Vz_r = sin(sun_elev)
The vector of the plane camera is:
Vx_p = X_plane - X_surface
Vy_p = Y_plane - Y_surface
Vz_p = Z_plane - Z_surface
Then, the angle between the reflected ray and the airplane camera is (take into account that the plane-site vector is not an unit vector in this example):
alpha = arccos( (Vx_p*Vx_r + Vy_p*Vy_r + Vz_p*Vz_r) / sqrt(Vx_p**2 + Vy_p**2 + Vz_p**2) )

Theory behind Wolfenstein-style 3D rendering

I'm currently working on a project about 3D rendering, and I'm trying to make simplistic program that can display a simple 3D room (static shading, no player movement, only rotation) with pygame
So far I've worked through the theory:
Start with a list of coordinates for the X and Z of each "Node"
Nodes are kept in an order which forms a closed loop, so that a pair of nodes will form either side of a wall
The height of the wall is determined when it is rendered, being relative to distance from the camera
Walls are rendered using painter's algorithm, so closer objects are drawn on top of further ones
For shading "fake contrast", which brightens/darkens walls based on the gradient between it's two nodes
While it seems simple enough, the process behind translating the 3D coordinates into 2D points on the screen is proving the difficult for me to understand.
Googling this topic has so far only yeilded these equations:
screenX = (worldX/worldZ)
screenY = (worldY/worldZ)
Which seem flawed to me, as you would get a divide by zero error if any Z coordinate is 0.
So if anyone could help explain this, I'd be really greatful.

Well the
screenX = (worldX/worldZ)
screenY = (worldY/worldZ)
is not the whole stuff that is just the perspective division by z and it is not meant for DOOM or Wolfenstein techniques.
Well in Doom there is only single angle of viewing (you can turn left/right but cannot look up/down only duck or jump which is not the same). So we need to know our player position and direction px,py,pz,pangle. The z is needed only if you want to implement also z axis movement/looking...
If you are looking in a straight line (Red) all the object that cross that line in the 3D are projected to single x coordinate in the player screen...
So if we are looking at some direction (red) any object/point crossing/touching this red line will be place at the center of screen (in x axis). What is left from it will be rendered on the left and similarly whats on right will be rendered on the right too...
With perspective we need to define how large viewing angle we got...
This limits our view so any point touches the green line will be projected on the edge of view (in x axis). From this we can compute screen x coordinate sx of any point (x,y,z) directly:
// angle of point relative to player direction
sx = point_ang - pangle;
if (sx<-M_PI) sx+=2.0*M_PI;
if (sx>+M_PI) sx-=2.0*M_PI;
// scale to pixels
sx = screen_size_x/2 + sx*screen_size_x/FOVx
where screen_size_x is resolution of our view area and point ang is angle of point x,y,z relative to origin px,py,pz. You can compute it like this:
point_ang = atan2(y-py,x-px)
but if you truly do a DOOM ray-casting then you already got this angle.
Now we need to compute the screen y coordinate sy which is dependent on the distance from player and wall size. We can exploit triangle similarity.
so:
sy = screen_size_y/2 (+/-) wall_height*focal_length/distance
Where focal length is the distance at which wall with 100% height will cover exactly the whole screen in y axis. As you can see we dividing by distance which might be zero. Such state must be avoided so you need to make sure your rays will be evaluated at the next cell if standing directly on cell boundary. Also we need to select the focal length so square wall will be projected as square.
Here a piece of code from mine Doom engine (putted all together):
double divide(double x,double y)
{
if ((y>=-1e-30)&&(y<=+1e-30)) return 0.0;
return x/y;
}
bool Doom3D::cell2screen(int &sx,int &sy,double x,double y,double z)
{
double a,l;
// x,y relative to player
x-=plrx;
y-=plry;
// convert z from [cell] to units
z*=_Doom3D_cell_size;
// angle -> sx
a=atan2(y,x)-plra;
if (a<-pi) a+=pi2;
if (a>+pi) a-=pi2;
sx=double(sxs2)*(1.0+(2.0*a/view_ang));
// perpendicular distance -> sy
l=sqrt((x*x)+(y*y))*cos(a);
sy=sys2+divide((double(plrz+_Doom3D_cell_size)-z-z)*wall,l);
// in front of player?
return (fabs(a)<=0.5*pi);
}
where:
_Doom3D_cell_size=100; // [units] cell cube size
view_ang=60.0*deg; // FOVx
focus=0.25; // [cells] view focal length (uncorrected)
wall=double(sxs)*(1.25+(0.288*a)+(2.04*a*a))*focus/double(_Doom3D_cell_size); // [px] projected wall size ratio size = height*wall/distance
sxs,sys = screen resolution
sxs2,sys2 = screen half resolution
pi=M_PI, pi2=2.0*M_PI
Do not forget to use perpendicular distances (multiplied by cos(a) as I did) otherwise serious fish-eye effect will occur. For more info see:
Ray Casting with different height size

OpenCV - Tilted camera and triangulation landmark for stereo vision

I am using a stereo system and so I am trying to get world coordinates of some points by triangulation.
My cameras present an angle, the Z axis direction (direction of the depth) is not normal to my surface. That is why when I observe flat surface, I get no constant depth but a "linear" variation, correct? And I want the depth from the baseline direction... How I can re-project?
A piece of my code with my projective arrays and triangulate function :
#C1 and C2 are the cameras matrix (left and rig)
#R_0 and T_0 are the transformation between cameras
#Coord1 and Coord2 are the correspondant coordinates of left and right respectively
P1 = np.dot(C1,np.hstack((np.identity(3),np.zeros((3,1)))))
P2 =np.dot(C2,np.hstack(((R_0),T_0)))
for i in range(Coord1.shape[0])
z = cv2.triangulatePoints(P1, P2, Coord1[i,],Coord2[i,])
-------- EDIT LATER -----------
Thanks scribbleink, so i tried to apply your proposal. But i think i have a mistake because it doesnt work well as you can see below. And the point clouds seems to be warped and curved towards the edges of the image.
U, S, Vt = linalg.svd(F)
V = Vt.T
#Right epipol
U[:,2]/U[2,2]
# The expected X-direction with C1 camera matri and C1[0,0] the focal length
vecteurX = np.array([(U[:,2]/U[2,2])[0],(U[:,2]/U[2,2])[1],C1[0,0]])
vecteurX_unit = vecteurX/np.sqrt(vecteurX[0]**2 + vecteurX[1]**2 + vecteurX[2]**2)
# The expected Y axis :
height = 2048
vecteurY = np.array([0, height -1, 0])
vecteurY_unit = vecteurY/np.sqrt(vecteurY[0]**2 + vecteurY[1]**2 + vecteurY[2]**2)
# The expected Z direction :
vecteurZ = np.cross(vecteurX,vecteurY)
vecteurZ_unit = vecteurZ/np.sqrt(vecteurZ[0]**2 + vecteurZ[1]**2 + vecteurZ[2]**2)
#Normal of the Z optical (the current Z direction)
Zopitcal = np.array([0,0,1])
cos_theta = np.arccos(np.dot(vecteurZ_unit, Zopitcal)/np.sqrt(vecteurZ_unit[0]**2 + vecteurZ_unit[1]**2 + vecteurZ_unit[2]**2)*np.sqrt(Zopitcal[0]**2 + Zopitcal[1]**2 + Zopitcal[2]**2))
sin_theta = (np.cross(vecteurZ_unit, Zopitcal))[1]
#Definition of the Rodrigues vector and use of cv2.Rodrigues to get rotation matrix
v1 = Zopitcal
v2 = vecteurZ_unit
v_rodrigues = v1*cos_theta + (np.cross(v2,v1))*sin_theta + v2*(np.cross(v2,v1))*(1. - cos_theta)
R = cv2.Rodrigues(v_rodrigues)[0]

Your expected z direction is arbitrary to the reconstruction method. In general, you have a rotation matrix that rotates the left camera from your desired direction. You can easily build that matrix, R. Then all you need to do is to multiply your reconstructed points by the transpose of R.

To add to fireant's response, here is one candidate solution, assuming that the expected X-direction coincides with the line joining the centers of projection of the two cameras.
Compute the focal lengths f_1 and f_2 (via pinhole model calibration).
Solve for the location of camera 2's epipole in camera 1's frame. For this, you can use either the Fundamental matrix (F) or the Essential matrix (E) of the stereo camera pair. Specifically, the left and right epipoles lie in the nullspace of F, so you can use Singular Value Decomposition. For a solid theoretical reference, see Hartley and Zisserman, Second edition, Table 9.1 "Summary of fundamental matrix properties" on Page 246 (freely available PDF of the chapter).
The center of projection of camera 1, i.e. (0, 0, 0) and the location of the right epipole, i.e. (e_x, e_y, f_1) together define a ray that aligns with the line joining the camera centers. This can be used as the expected X-direction. Call this vector v_x.
Assuming that the expected Y axis faces downward in the image plane, i.e, from (0, 0, f_1) to (0, height-1, f_1), where f is the focal length. Call this vector as v_y.
The expected Z direction is now the cross-product of vectors v_x and v_y.
Using the expected Z direction along with the optical axis (Z-axis) of camera 1, you can then compute a rotation matrix from two 3D vectors using, say the method listed in this other stackoverflow post.
Practical note:
Expecting the planar object to exactly align with the stereo baseline is unlikely without considerable effort, in my practical experience. Some amount of plane-fitting and additional rotation would be required.
One-time effort:
It depends on whether you need to do this once, e.g. for one-time calibration, in which case simply make this estimation process real-time, then rotate your stereo camera pair until the depth map variance is minimized. Then lock your camera positions and pray someone doesn't bump into it later.
Repeatability:
If you need to keep aligning your estimated depth maps to truly arbitrary Z-axes that change for every new frame captured, then you should consider investing time in the plane-estimation method and making it more robust.

Move point A to point B in a arc motion in 3D

I'm struggling to work out how I move point A to B in an arc motion in 3D. The duration of the movement doesn't matter so much. I have found a load of wikipedia pages on it but am having no luck understanding them as its been a long time since I was in college. Any code examples would be really useful for me to understand. Thank you, I would really appreciate your help. Here is an image that sort of shows what I am looking to achieve, although the image only represents the points in 2d, I am looking for a 3d solution.

Assuming your problem statement is:
Given points a and b, trace the circular path along the plane which lies tangent to the up vector:
And that you have the appropriate vector algebra libraries:
def interp(a, b, up, t):
""" 0 <= t <= 1"""
# find center and radius vector
center = (a + b) / 2
radius = a - center
# split path into upwards and downwards section
omega = math.acos(radius.dot(up)) # angle between center-a and center-top
t_top = omega / math.pi # time taken to reach the top
# redefine 0 as A, 1 as the top, and B as whatever remains linear
t = t / t_top
#slerp, with t intentionally > 1
sin = math.sin
return (
center +
sin((1 - t) * omega) / sin(omega) * radius +
sin(t * omega) / sin(omega) * up
)

it doesnt matter if its 2d or 3d .
you take the position of each dot and find the center beetwean them .
the distance beetwean the center and each dot is the radius .
after that give the object a moving direction and tell it to be always in a distance of radius from center . which a moving vector you can give it any direction you want .

Affine transformation between contours in OpenCV

I have a historical time sequence of seafloor images scanned from film that need registration.
from pylab import *
import cv2
import urllib
urllib.urlretrieve('http://geoport.whoi.edu/images/frame014.png','frame014.png');
urllib.urlretrieve('http://geoport.whoi.edu/images/frame015.png','frame015.png');
gray1=cv2.imread('frame014.png',0)
gray2=cv2.imread('frame015.png',0)
figure(figsize=(14,6))
subplot(121);imshow(gray1,cmap=cm.gray);
subplot(122);imshow(gray2,cmap=cm.gray);
I want to use the black region on the left of each image to do the registration, since that region was inside the camera and should be fixed in time. So I just need to compute the affine transformation between the black regions.
I determined these regions by thresholding and finding the largest contour:
def find_biggest_contour(gray,threshold=40):
# threshold a grayscale image
ret,thresh = cv2.threshold(gray,threshold,255,1)
# find the contours
contours,h = cv2.findContours(thresh,mode=cv2.RETR_LIST,method=cv2.CHAIN_APPROX_NONE)
# measure the perimeter
perim = [cv2.arcLength(cnt,True) for cnt in contours]
# find contour with largest perimeter
i=perim.index(max(perim))
return contours[i]
c1=find_biggest_contour(gray1)
c2=find_biggest_contour(gray2)
x1=c1[:,0,0];y1=c1[:,0,1]
x2=c2[:,0,0];y2=c2[:,0,1]
figure(figsize=(8,8))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1,y1,'b-')
imshow(gray2,cmap=cm.gray, alpha=0.5);plot(x2,y2,'g-')
axis([0,1500,1000,0]);
The blue is the longest contour from the 1st frame, the green is the longest contour from the 2nd frame.
What is the best way to determine the rotation and offset between the blue and green contours?
I only want to use the right side of the contours in some region surrounding the step, something like the region between the arrows.
Of course, if there is a better way to register these images, I'd love to hear it. I already tried a standard feature matching approach on the raw images, and it didn't work well enough.

Following Shambool's suggested approach, here's what I've come up with. I used a Ramer-Douglas-Peucker algorithm to simplify the contour in the region of interest and identified the two turning points. I was going to use the two turning points to get my three unknowns (xoffset, yoffset and angle of rotation), but the 2nd turning point is a bit too far toward the right because RDP simplified away the smoother curve in this region. So instead I used the angle of the line segment leading up to the 1st turning point. Differencing this angle between image1 and image2 gives me the rotation angle. I'm still not completely happy with this solution. It worked well enough for these two images, but I'm not sure it will work well on the entire image sequence. We'll see.
It would really be better to fit the contour to the known shape of the black border.
# select region of interest from largest contour
ind1=where((x1>190.) & (y1>200.) & (y1<900.))[0]
ind2=where((x2>190.) & (y2>200.) & (y2<900.))[0]
figure(figsize=(10,10))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1[ind1],y1[ind1],'b-')
imshow(gray2,cmap=cm.gray, alpha=0.5);plot(x2[ind2],y2[ind2],'g-')
axis([0,1500,1000,0])
def angle(x1,y1):
# Returns angle of each segment along an (x,y) track
return array([math.atan2(y,x) for (y,x) in zip(diff(y1),diff(x1))])
def simplify(x,y, tolerance=40, min_angle = 60.*pi/180.):
"""
Use the Ramer-Douglas-Peucker algorithm to simplify the path
http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
Python implementation: https://github.com/sebleier/RDP/
"""
from RDP import rdp
points=vstack((x,y)).T
simplified = array(rdp(points.tolist(), tolerance))
sx, sy = simplified.T
theta=abs(diff(angle(sx,sy)))
# Select the index of the points with the greatest theta
# Large theta is associated with greatest change in direction.
idx = where(theta>min_angle)[0]+1
return sx,sy,idx
sx1,sy1,i1 = simplify(x1[ind1],y1[ind1])
sx2,sy2,i2 = simplify(x2[ind2],y2[ind2])
fig = plt.figure(figsize=(10,6))
ax =fig.add_subplot(111)
ax.plot(x1, y1, 'b-', x2, y2, 'g-',label='original path')
ax.plot(sx1, sy1, 'ko-', sx2, sy2, 'ko-',lw=2, label='simplified path')
ax.plot(sx1[i1], sy1[i1], 'ro', sx2[i2], sy2[i2], 'ro',
markersize = 10, label='turning points')
ax.invert_yaxis()
plt.legend(loc='best')
# determine x,y offset between 1st turning points, and
# angle from difference in slopes of line segments approaching 1st turning point
xoff = sx2[i2[0]] - sx1[i1[0]]
yoff = sy2[i2[0]] - sy1[i1[0]]
iseg1 = [i1[0]-1, i1[0]]
iseg2 = [i2[0]-1, i2[0]]
ang1 = angle(sx1[iseg1], sy1[iseg1])
ang2 = angle(sx2[iseg2], sy2[iseg2])
ang = -(ang2[0] - ang1[0])
print xoff, yoff, ang*180.*pi
-28 14 5.07775871644
# 2x3 affine matrix M
M=array([cos(ang),sin(ang),xoff,-sin(ang),cos(ang),yoff]).reshape(2,3)
print M
[[ 9.99959685e-01 8.97932821e-03 -2.80000000e+01]
[ -8.97932821e-03 9.99959685e-01 1.40000000e+01]]
# warp 2nd image into coordinate frame of 1st
Minv = cv2.invertAffineTransform(M)
gray2b = cv2.warpAffine(gray2,Minv,shape(gray2.T))
figure(figsize=(10,10))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1[ind1],y1[ind1],'b-')
imshow(gray2b,cmap=cm.gray, alpha=0.5);
axis([0,1500,1000,0]);
title('image1 and transformed image2 overlain with 50% transparency');

Good question.
One approach is to represent contours as 2d point clouds and then do registration.
More simple and clear code in Matlab that can give you affine transform.
And more complex C++ code(using VXL lib) with python and matlab wrapper included.
Or you can use some modificated ICP(iterative closest point) algorithm that is robust to noise and can handle affine transform.
Also your contours seems to be not very accurate so it can be a problem.
Another approach is to use some kind of registration that use pixel values.
Matlab code (I think it's using some kind of minimizer+ crosscorrelation metric)
Also maybe there is some kind of optical flow registration(or some other kind) that is used in medical imaging.
Also you can use point features as SIFT(SURF).
You can try it quick in FIJI(ImageJ)
also this link.
Open 2 images
Plugins->feature extraction-> sift (or other)
Set expected transformation to affine
Look at estimated transformation model [3,3] homography matrix in ImageJ log.
If it works good then you can implement it in python using OpenCV or maybe using Jython with ImageJ.
And it will be better if you post original images and describe all conditions (it seems that image is changing between frames)

You can represent these contours with their respective ellipses. These ellipses are centered on the centroid of the contour and they are oriented towards the main density axis. You can compare the centroids and the orientation angle.
1) Fill the contours => drawContours with thickness=CV_FILLED
2) Find moments => cvMoments()
3) And use them.
Centroid: { x, y } = {M10/M00, M01/M00 }
Orientation (theta):
EDIT: I customized the sample code from legacy (enteringblobdetection.cpp) for your case.
/* Image moments */
double M00,X,Y,XX,YY,XY;
CvMoments m;
CvRect r = ((CvContour*)cnt)->rect;
CvMat mat;
cvMoments( cvGetSubRect(pImgFG,&mat,r), &m, 0 );
M00 = cvGetSpatialMoment( &m, 0, 0 );
X = cvGetSpatialMoment( &m, 1, 0 )/M00;
Y = cvGetSpatialMoment( &m, 0, 1 )/M00;
XX = (cvGetSpatialMoment( &m, 2, 0 )/M00) - X*X;
YY = (cvGetSpatialMoment( &m, 0, 2 )/M00) - Y*Y;
XY = (cvGetSpatialMoment( &m, 1, 1 )/M00) - X*Y;
/* Contour description */
CvPoint myCentroid(r.x+(float)X,r.y+(float)Y);
double myTheta = atan( 2*XY/(XX-YY) );
Also, check this with OpenCV 2.0 examples.

If you don't want to find the homography between the two images and want to find the affine transformation you have three unknowns, rotation angle (R), and the displacement in x and y (X,Y). Therefore minimum of two points (with two known values for each) are needed to find the unknowns. Two points should be matched between the two images or two lines, each has two known values, the intercept and slope. If you go with the point matching approach, the further the points are from each other the more robust is the found transform to noise (this is very simple if you remember error propagation rules).
In the two point matching method:
find two points (A and B) in the first image I1 and their corresponding points (A',B') in the second image I2
find the middle point between A and B: C, and the middle point between A' and B': C'
the difference C and C' (C-C') gives the translation between the images (X and Y)
using the dot product of C-A and C'-A' you can find the rotation angle (R)
To detect robust points, I would find the the points along the side of counter you have found with highest absolute value of the second derivative (Hessian) and then try to match them. Since you mentioned this is a video footage you can easily make the assumption the transformation between each two frames is small to reject the outliers.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.