From the Tutorial: https://programtalk.com/vs2/?source=python/8176/opencv-python-blueprints/chapter4/scene3D.py
I don't understand why they first undistort the images
# undistort the images
self.img1 = cv2.undistort(self.img1, self.K, self.d)
self.img2 = cv2.undistort(self.img2, self.K, self.d)
and: Compute the Essential Matrix
def _find_fundamental_matrix(self):
    self.F, self.Fmask = cv2.findFundamentalMat(self.match_pts1,
                                                self.match_pts2,
                                                cv2.FM_RANSAC, 0.1, 0.99)

def _find_essential_matrix(self):
    self.E = self.K.T.dot(self.F).dot(self.K)
and also Normalize the coordinates:
first_inliers = []
second_inliers = []
for i in range(len(self.Fmask)):
    if self.Fmask[i]:
        # normalize and homogenize the image coordinates
        first_inliers.append(self.K_inv.dot([self.match_pts1[i][0],
                                             self.match_pts1[i][1], 1.0]))
        second_inliers.append(self.K_inv.dot([self.match_pts2[i][0],
                                              self.match_pts2[i][1], 1.0]))
Shouldn't it be one or the other? Or do I have some misunderstanding here?
Can someone please help me with that?
The first step, undistort, does a number of things to reverse the typical warping caused by small camera lenses. See the Wikipedia article on distortion (optics) for more background.
The last step, homogenizing the coordinates, is a completely different thing. The Wikipedia article on homogeneous coordinates explains it, but the basic idea is that you add an extra fake axis that lets you do all affine and projective transformations with chained simple matrix multiplications and then just project back to 3D at the end. Normalizing is just a step you do to make that math easier: basically, you want your extra coordinate to start off as 1.0 (multiply by the inverse of the projective norm).
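As a toy illustration (not code from the tutorial), this is what the extra coordinate buys you: a translation, which is not a linear map in 2D, becomes a single matrix multiplication, and you get back to ordinary coordinates by dividing by the extra component.
import numpy as np
# Hypothetical example: an image point in homogeneous form, a translation
# expressed as one 3x3 matrix, and the projection back to 2D at the end.
p = np.array([320.0, 240.0, 1.0])        # (x, y, 1)
T = np.array([[1.0, 0.0, 10.0],          # translate by (10, 5)
              [0.0, 1.0,  5.0],
              [0.0, 0.0,  1.0]])
q = T.dot(p)                             # chained transforms just multiply in more matrices
q_2d = q[:2] / q[2]                      # divide by the extra coordinate -> (330.0, 245.0)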
The requirement for normalization is explained on page 107 of Multiple View Geometry (Hartley and Zisserman). The normalization is required in addition to the undistortion.
If you are using raw pixel values in homogeneous coordinates, the Z coordinate, which is 1, will be small compared to the X and Y coordinates, e.g. (X=320, Y=220, Z=1).
But if the homogenized coordinates are the image pixel positions normalized to a standard range, i.e. -1.0 to 1.0, then we are talking about coordinate values that are all roughly in the same range, e.g. (0.75, -0.89, 1.0).
If the image coordinates have dramatically different ranges (as in the unnormalized case), then the DLT matrix produced will have a bad condition number, and consequently small variations in the input image pixel positions could produce wide variations in the result.
Please see page 107 for a very good explanation.
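As a minimal sketch of the kind of isotropic normalization Hartley and Zisserman describe (my own illustration, not the tutorial's code): translate the points so their centroid is at the origin and scale them so the average distance from the origin is sqrt(2), keeping the similarity transform so it can be undone afterwards.
import numpy as np

def hartley_normalize(pts):
    # pts: (N, 2) array of pixel coordinates.
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2.0) / mean_dist
    # Similarity transform that centres the points and scales them.
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack((pts, np.ones(len(pts))))   # homogenize
    return pts_h.dot(T.T), T
A fundamental matrix estimated from points normalized this way (T1 for image 1, T2 for image 2) is then de-normalized afterwards as F = T2.T * F_hat * T1.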
Related
I have a 2D Gaussian whose center has been destroyed by pixel saturation. I need the center to be filled in because a poorly filled-in center will confuse a neural network I'm trying to train. See below:
The scattered NaN values I can handle fairly easily, but the large cluster in the Gaussian's center I cannot.
I've tried various methods to correct this, but none seem to work in the sense that the gaussian is filled in correctly.
Here are some other similar answers that I've tried:
Python Image Processing - How to remove certain contour and blend the value with surrounding pixels?
https://docs.astropy.org/en/stable/convolution/index.html
These work well for the small discrete nans floating around the image, but don't adequately address the center cluster.
This is what I get with convolution infilling:
I've taken slices of the centers as well.
I do actually have a reference image that does not have NaNs. However, the scaling of the pixel values is not constant, so I've made a function that takes into account the different scaling of each pixel.
import numpy as np

def mult_mean_surround(s_arr, c_arr, coord):
    # Offsets of the 8 neighbouring pixels
    directions = np.array([[1, 0], [-1, 0], [0, 1], [0, -1],
                           [1, 1], [1, -1], [-1, -1], [-1, 1]])
    s = np.array([])
    for i in directions:
        try:
            if not np.isnan(s_arr[coord[0] + i[0], coord[1] + i[1]]):
                # Ratio of the saturated image to the reference image at this neighbour
                s = np.append(s, s_arr[coord[0] + i[0], coord[1] + i[1]] /
                                 c_arr[coord[0] + i[0], coord[1] + i[1]])
        except IndexError:
            pass
    if len(s) != 0:
        # Scale the reference pixel by the mean neighbour ratio
        s_arr[coord[0], coord[1]] = c_arr[coord[0], coord[1]] * np.mean(s)
It copies the corresponding pixel value of the reference image and scales it by the appropriate amount.
Ideally, it would look something like this:
The center is brighter than the rim and it looks more like a Gaussian. However, this method is also substantially slower than the rest, so I'm not sure how to get around either of my issues. I've tried boosting the speed with CuPy with no luck, as shown here: Boosting algorithm with cupy
If anyone has any helpful ideas, that would be great.
I am assuming that you are filling the 'hole' with only one gaussian.
First make a mask of all the NaNs, i.e. NaN = 1, not NaN = 0.
You can do a neighbor-count check to remove all mask pixels with no neighbors, then use a clustering algorithm (like DBSCAN) to find the largest cluster of pixels.
Calculate the centroid, width (max x - min x), and height (max y - min y) of the resulting cluster.
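As a rough sketch of those steps (my assumptions: scikit-learn is available and img is the saturated array):
import numpy as np
from sklearn.cluster import DBSCAN

mask = np.isnan(img)                            # NaN = True, not NaN = False
coords = np.column_stack(np.nonzero(mask))      # (row, col) of every NaN pixel
# Cluster the NaN pixels and keep the biggest cluster (label -1 is noise,
# which also takes care of the isolated, neighbourless pixels).
labels = DBSCAN(eps=1.5, min_samples=2).fit(coords).labels_
largest = np.argmax(np.bincount(labels[labels >= 0]))
cluster = coords[labels == largest]
cy, cx = cluster.mean(axis=0)                   # centroid (row, col)
h = cluster[:, 0].max() - cluster[:, 0].min()   # height (max y - min y)
w = cluster[:, 1].max() - cluster[:, 1].min()   # width  (max x - min x)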
You can then use the following code:
import math

def gaussian_fit(query_x, query_y,
                 centroid_x, centroid_y,
                 filter_w, filter_h,
                 sigma_at_edge=1.0):
    x_coord = (query_x - centroid_x) * 2 / (filter_w * sigma_at_edge)
    y_coord = (query_y - centroid_y) * 2 / (filter_h * sigma_at_edge)
    return math.exp(-1.0 * (x_coord ** 2 + y_coord ** 2))
You may need to rescale the result by some constant.
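Hypothetical usage, tying it to the cluster statistics sketched above (the peak value is an assumption, e.g. estimated from the unsaturated rim or from the reference image):
peak = 30000.0   # assumed peak brightness
for row, col in cluster:
    # Fill each NaN pixel of the central cluster with the analytic Gaussian value.
    img[row, col] = peak * gaussian_fit(col, row, cx, cy, w, h)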
I am working in Python, and I am trying to compute a weight matrix for a graph of pixels, where the weight of each edge depends on the pixels' "feature" similarity (F(i) - F(j)) and their location similarity (X(i) - X(j)). "Features" include intensity, color, and texture.
Right now I have it implemented and it is working, but not for color images. At first I tried to simply take the RGB values and average each pixel to convert the entire image to greyscale. But that didn't work as I had hoped, and I have read through a paper that suggests a different method.
They say to use this: F(i) = [v, v*s*sin(h), v*s*cos(h)](i)
where h, s, and v are the HSV color values.
I am just confused by the notation. What is this supposed to mean? What does it mean to have three different terms separated by commas inside square brackets? I'm also confused about what the (i) at the end is supposed to mean. Shouldn't the value of F(i) for any given pixel be a single number, so that F(i) - F(j) can be carried out?
I'm not asking for someone to do this for me I just need some clarification.
Features can be vectors and you can calculate distance between vectors.
import numpy

f1 = numpy.array([1, 2, 3])
f2 = numpy.array([0, 2, 3])
distance = numpy.linalg.norm(f1 - f2)
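To make the paper's notation concrete, here is a hedged sketch of building that three-component feature for every pixel with OpenCV and NumPy (the file name and the 0..1 rescaling are my assumptions; note that OpenCV stores 8-bit hue as degrees divided by 2):
import cv2
import numpy as np

img = cv2.imread('input.png')                                 # hypothetical input
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float64)
h = hsv[..., 0] * (np.pi / 90.0)   # 0..179 (degrees / 2) -> radians
s = hsv[..., 1] / 255.0
v = hsv[..., 2] / 255.0
# F(i) = [v, v*s*sin(h), v*s*cos(h)] evaluated at every pixel i
F = np.dstack((v, v * s * np.sin(h), v * s * np.cos(h)))      # shape (rows, cols, 3)
# ||F(i) - F(j)|| for two pixels i and j
dist = np.linalg.norm(F[10, 10] - F[20, 20])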
I am perplexed by the API to scipy.ndimage.interpolation.affine_transform. And judging by this issue I'm not the only one. I'm actually wanting to do more interesting things with affine_transform than just rotating an image, but a rotation would do for starters. (And yes I'm well aware of scipy.ndimage.interpolation.rotate, but figuring out how to drive affine_transform is what interests me here).
When I want to do this sort of thing in systems like OpenGL, I think in terms of computing the transform which applies a 2x2 rotation matrix R about a centre c, and therefore of points p being transformed as (p-c)R+c = pR+c-cR, which gives a c-cR term to be used as the translation component of the transform. However, according to the issue above, scipy's affine_transform does "offset first", so we actually need to compute an offset s such that (p-c)R+c = (p+s)R, which with a bit of rearrangement gives s = (c-cR)R' where R' is the inverse of R.
If I plug this into an ipython notebook (pylab mode; code below maybe needs some additional imports):
img=scipy.misc.lena()
#imshow(img,cmap=cm.gray);show()
centre=0.5*array(img.shape)
a=15.0*pi/180.0
rot=array([[cos(a),sin(a)],[-sin(a),cos(a)]])
offset=(centre-centre.dot(rot)).dot(linalg.inv(rot))
rotimg = scipy.ndimage.interpolation.affine_transform(
    img, rot, order=2, offset=offset, cval=0.0, output=float32
)
imshow(rotimg,cmap=cm.gray);show()
I get
which unfortunately isn't rotated about the centre.
So what's the trick I'm missing here?
Once treddy's answer got me a working baseline, I managed to get a better working model of affine_transform. It's not actually as odd as the issue linked in the original question hints.
Basically, each point (coordinate) p in the output image is transformed to pT+s where T and s are the matrix and offset passed to the function.
So if we want point c_out in the output to be mapped to and sampled from c_in in the input image, with rotation R and (possibly anisotropic) scaling S, we need pT+s = (p-c_out)RS+c_in, which can be rearranged to yield s = c_in - c_out T (with T = RS).
For some reason I then need to pass transform.T to affine_transform but I'm not going to worry about that too much; probably something to do with row-coordinates with transforms on the right (assumed above) vs column-coordinates with transforms on the left.
So here's a simple test rotating a centred image:
src = scipy.misc.lena()
c_in = 0.5 * array(src.shape)
c_out = array((256.0, 256.0))
for i in xrange(0, 7):
    a = i * 15.0 * pi / 180.0
    transform = array([[cos(a), -sin(a)], [sin(a), cos(a)]])
    offset = c_in - c_out.dot(transform)
    dst = scipy.ndimage.interpolation.affine_transform(
        src, transform.T, order=2, offset=offset,
        output_shape=(512, 512), cval=0.0, output=float32
    )
    subplot(1, 7, i + 1); axis('off'); imshow(dst, cmap=cm.gray)
show()
Here it is, modified for different image sizes:
src = scipy.misc.lena()[::2, ::2]
c_in = 0.5 * array(src.shape)
c_out = array((256.0, 256.0))
for i in xrange(0, 7):
    a = i * 15.0 * pi / 180.0
    transform = array([[cos(a), -sin(a)], [sin(a), cos(a)]])
    offset = c_in - c_out.dot(transform)
    dst = scipy.ndimage.interpolation.affine_transform(
        src, transform.T, order=2, offset=offset,
        output_shape=(512, 512), cval=0.0, output=float32
    )
    subplot(1, 7, i + 1); axis('off'); imshow(dst, cmap=cm.gray)
show()
And here's a version with anisotropic scaling to compensate for the anisotropic resolution of the source image.
src = scipy.misc.lena()[::2, ::4]
c_in = 0.5 * array(src.shape)
c_out = array((256.0, 256.0))
for i in xrange(0, 7):
    a = i * 15.0 * pi / 180.0
    transform = array([[cos(a), -sin(a)], [sin(a), cos(a)]]).dot(diag([0.5, 0.25]))
    offset = c_in - c_out.dot(transform)
    dst = scipy.ndimage.interpolation.affine_transform(
        src, transform.T, order=2, offset=offset,
        output_shape=(512, 512), cval=0.0, output=float32
    )
    subplot(1, 7, i + 1); axis('off'); imshow(dst, cmap=cm.gray)
show()
Based on the insight from #timday that matrix and offset are defined in the output coordinate system, I would offer the following reading of the issue, which fits with standard notation in linear algebra and also makes it possible to understand the scaling of images. I use T.inv = T^-1 as pseudo-Python notation for the inverse of a matrix and * for the dot product.
For each point o in the output image, affine_transform finds the corresponding point i in the input image as i = T.inv*o + s, where matrix = T.inv is the inverse of the 2x2 transformation matrix one would use to define the forward affine transformation, and offset = s is the translation defined in the output coordinates. For a pure rotation T = R = [[cos, -sin], [sin, cos]], and in this special case matrix = T.inv = T.T, which is the reason why #timday still had to apply the transposition (alternatively one could just use the negative angle).
The value for the offset s is found exactly the way described by #timday: if c_in is supposed to be positioned, after the affine transformation, at c_out (e.g. the input centre should be placed at the output centre), then c_in = T.inv*c_out + s, or s = c_in - T.inv*c_out (note the conventional mathematical order of the matrix product used here, matrix*vector, which is why #timday, who used the reverse order, didn't need a transposition at this point in his code).
If one wants a scaling S first and then a rotation R, it holds that T = R*S and therefore T.inv = S.inv*R.inv (note the reversed order). For example, if one wants to make the image twice as wide in the column direction ('x'), then S = diag((1, 2)) and hence S.inv = diag((1, 0.5)).
src = scipy.misc.lena()
c_in = 0.5 * array(src.shape)
dest_shape = (512, 1028)
c_out = 0.5 * array(dest_shape)
for i in xrange(0, 7):
    a = i * 15.0 * pi / 180.0
    rot = array([[cos(a), -sin(a)], [sin(a), cos(a)]])
    invRot = rot.T
    invScale = diag((1.0, 0.5))
    invTransform = dot(invScale, invRot)
    offset = c_in - dot(invTransform, c_out)
    dest = scipy.ndimage.interpolation.affine_transform(
        src, invTransform, order=2, offset=offset,
        output_shape=dest_shape, cval=0.0, output=float32
    )
    subplot(1, 7, i + 1); axis('off'); imshow(dest, cmap=cm.gray)
show()
If the image is to be first rotated, then stretched, the order of the dot product needs to be reversed:
invTransform = dot(invRot, invScale)
Just doing some quick & dirty testing I noticed that taking the negative value of your offset seems to rotate about the centre.
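Against the original snippet, that quick fix would look something like this (a sketch that just carries over the variables from the question's code):
offset = -(centre - centre.dot(rot)).dot(linalg.inv(rot))   # negate the offset
rotimg = scipy.ndimage.interpolation.affine_transform(
    img, rot, order=2, offset=offset, cval=0.0, output=float32
)
imshow(rotimg, cmap=cm.gray); show()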
I have a historical time sequence of seafloor images scanned from film that need registration.
from pylab import *
import cv2
import urllib
urllib.urlretrieve('http://geoport.whoi.edu/images/frame014.png','frame014.png');
urllib.urlretrieve('http://geoport.whoi.edu/images/frame015.png','frame015.png');
gray1=cv2.imread('frame014.png',0)
gray2=cv2.imread('frame015.png',0)
figure(figsize=(14,6))
subplot(121);imshow(gray1,cmap=cm.gray);
subplot(122);imshow(gray2,cmap=cm.gray);
I want to use the black region on the left of each image to do the registration, since that region was inside the camera and should be fixed in time. So I just need to compute the affine transformation between the black regions.
I determined these regions by thresholding and finding the largest contour:
def find_biggest_contour(gray, threshold=40):
    # threshold a grayscale image
    ret, thresh = cv2.threshold(gray, threshold, 255, 1)
    # find the contours
    contours, h = cv2.findContours(thresh, mode=cv2.RETR_LIST, method=cv2.CHAIN_APPROX_NONE)
    # measure the perimeter of each contour
    perim = [cv2.arcLength(cnt, True) for cnt in contours]
    # return the contour with the largest perimeter
    i = perim.index(max(perim))
    return contours[i]
c1=find_biggest_contour(gray1)
c2=find_biggest_contour(gray2)
x1=c1[:,0,0];y1=c1[:,0,1]
x2=c2[:,0,0];y2=c2[:,0,1]
figure(figsize=(8,8))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1,y1,'b-')
imshow(gray2,cmap=cm.gray, alpha=0.5);plot(x2,y2,'g-')
axis([0,1500,1000,0]);
The blue is the longest contour from the 1st frame, the green is the longest contour from the 2nd frame.
What is the best way to determine the rotation and offset between the blue and green contours?
I only want to use the right side of the contours in some region surrounding the step, something like the region between the arrows.
Of course, if there is a better way to register these images, I'd love to hear it. I already tried a standard feature matching approach on the raw images, and it didn't work well enough.
Following Shambool's suggested approach, here's what I've come up with. I used a Ramer-Douglas-Peucker algorithm to simplify the contour in the region of interest and identified the two turning points. I was going to use the two turning points to get my three unknowns (xoffset, yoffset and angle of rotation), but the 2nd turning point is a bit too far toward the right because RDP simplified away the smoother curve in this region. So instead I used the angle of the line segment leading up to the 1st turning point. Differencing this angle between image1 and image2 gives me the rotation angle. I'm still not completely happy with this solution. It worked well enough for these two images, but I'm not sure it will work well on the entire image sequence. We'll see.
It would really be better to fit the contour to the known shape of the black border.
# select region of interest from largest contour
ind1=where((x1>190.) & (y1>200.) & (y1<900.))[0]
ind2=where((x2>190.) & (y2>200.) & (y2<900.))[0]
figure(figsize=(10,10))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1[ind1],y1[ind1],'b-')
imshow(gray2,cmap=cm.gray, alpha=0.5);plot(x2[ind2],y2[ind2],'g-')
axis([0,1500,1000,0])
def angle(x1, y1):
    # Returns the angle of each segment along an (x, y) track
    return array([math.atan2(y, x) for (y, x) in zip(diff(y1), diff(x1))])

def simplify(x, y, tolerance=40, min_angle=60. * pi / 180.):
    """
    Use the Ramer-Douglas-Peucker algorithm to simplify the path
    http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
    Python implementation: https://github.com/sebleier/RDP/
    """
    from RDP import rdp
    points = vstack((x, y)).T
    simplified = array(rdp(points.tolist(), tolerance))
    sx, sy = simplified.T
    theta = abs(diff(angle(sx, sy)))
    # Select the indices of the points with the greatest theta;
    # large theta is associated with the greatest change in direction.
    idx = where(theta > min_angle)[0] + 1
    return sx, sy, idx
sx1,sy1,i1 = simplify(x1[ind1],y1[ind1])
sx2,sy2,i2 = simplify(x2[ind2],y2[ind2])
fig = plt.figure(figsize=(10,6))
ax =fig.add_subplot(111)
ax.plot(x1, y1, 'b-', x2, y2, 'g-',label='original path')
ax.plot(sx1, sy1, 'ko-', sx2, sy2, 'ko-',lw=2, label='simplified path')
ax.plot(sx1[i1], sy1[i1], 'ro', sx2[i2], sy2[i2], 'ro',
markersize = 10, label='turning points')
ax.invert_yaxis()
plt.legend(loc='best')
# determine x,y offset between 1st turning points, and
# angle from difference in slopes of line segments approaching 1st turning point
xoff = sx2[i2[0]] - sx1[i1[0]]
yoff = sy2[i2[0]] - sy1[i1[0]]
iseg1 = [i1[0]-1, i1[0]]
iseg2 = [i2[0]-1, i2[0]]
ang1 = angle(sx1[iseg1], sy1[iseg1])
ang2 = angle(sx2[iseg2], sy2[iseg2])
ang = -(ang2[0] - ang1[0])
print xoff, yoff, ang*180.*pi
-28 14 5.07775871644
# 2x3 affine matrix M
M=array([cos(ang),sin(ang),xoff,-sin(ang),cos(ang),yoff]).reshape(2,3)
print M
[[ 9.99959685e-01 8.97932821e-03 -2.80000000e+01]
[ -8.97932821e-03 9.99959685e-01 1.40000000e+01]]
# warp 2nd image into coordinate frame of 1st
Minv = cv2.invertAffineTransform(M)
gray2b = cv2.warpAffine(gray2,Minv,shape(gray2.T))
figure(figsize=(10,10))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1[ind1],y1[ind1],'b-')
imshow(gray2b,cmap=cm.gray, alpha=0.5);
axis([0,1500,1000,0]);
title('image1 and transformed image2 overlain with 50% transparency');
Good question.
One approach is to represent contours as 2d point clouds and then do registration.
Simpler and clearer code in Matlab can give you the affine transform.
There is also more complex C++ code (using the VXL library) with Python and Matlab wrappers included.
Or you can use some modified ICP (iterative closest point) algorithm that is robust to noise and can handle an affine transform.
Also, your contours don't seem to be very accurate, which could be a problem.
Another approach is to use some kind of registration that works on the pixel values.
There is Matlab code for that (I think it uses some kind of minimizer plus a cross-correlation metric).
Maybe there is also some kind of optical-flow registration (or another kind) of the sort used in medical imaging.
You can also use point features such as SIFT (or SURF).
You can try it quickly in Fiji (ImageJ);
also this link.
Open 2 images
Plugins->feature extraction-> sift (or other)
Set expected transformation to affine
Look at the estimated transformation model (a 3x3 homography matrix) in the ImageJ log.
If it works well, then you can implement it in Python using OpenCV, or maybe using Jython with ImageJ.
It would also be better if you posted the original images and described all the conditions (it seems that the image changes between frames).
You can represent these contours with their respective ellipses. These ellipses are centered on the centroid of the contour and they are oriented towards the main density axis. You can compare the centroids and the orientation angle.
1) Fill the contours => drawContours with thickness=CV_FILLED
2) Find moments => cvMoments()
3) And use them.
Centroid: { x, y } = {M10/M00, M01/M00 }
Orientation (theta): theta = 0.5 * atan( 2*mu11 / (mu20 - mu02) ), using the central moments mu.
EDIT: I customized the sample code from legacy (enteringblobdetection.cpp) for your case.
/* Image moments */
double M00,X,Y,XX,YY,XY;
CvMoments m;
CvRect r = ((CvContour*)cnt)->rect;
CvMat mat;
cvMoments( cvGetSubRect(pImgFG,&mat,r), &m, 0 );
M00 = cvGetSpatialMoment( &m, 0, 0 );
X = cvGetSpatialMoment( &m, 1, 0 )/M00;
Y = cvGetSpatialMoment( &m, 0, 1 )/M00;
XX = (cvGetSpatialMoment( &m, 2, 0 )/M00) - X*X;
YY = (cvGetSpatialMoment( &m, 0, 2 )/M00) - Y*Y;
XY = (cvGetSpatialMoment( &m, 1, 1 )/M00) - X*Y;
/* Contour description */
CvPoint myCentroid(r.x+(float)X,r.y+(float)Y);
double myTheta = atan( 2*XY/(XX-YY) );
Also, check this with OpenCV 2.0 examples.
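For reference, a rough Python equivalent of that moment computation (my own sketch, not the answerer's code), using cv2.moments on a filled contour mask with the same centroid and orientation formulas:
import math
import cv2
import numpy as np

# Fill the largest contour found earlier (c1) into a binary mask, then take its moments.
mask = np.zeros(gray1.shape, dtype=np.uint8)
cv2.drawContours(mask, [c1], -1, 255, thickness=-1)   # thickness=-1 is CV_FILLED
m = cv2.moments(mask, binaryImage=True)
cx, cy = m['m10'] / m['m00'], m['m01'] / m['m00']                  # centroid
theta = 0.5 * math.atan2(2.0 * m['mu11'], m['mu20'] - m['mu02'])   # orientation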
If you don't want to find the homography between the two images and just want the affine transformation, you have three unknowns: the rotation angle (R) and the displacements in x and y (X, Y). Therefore a minimum of two points (with two known values each) is needed to find the unknowns. You should match either two points between the two images, or two lines, each of which has two known values (intercept and slope). If you go with the point matching approach, the further the points are from each other, the more robust the found transform is to noise (this is very simple if you remember the error propagation rules).
In the two point matching method:
find two points (A and B) in the first image I1 and their corresponding points (A',B') in the second image I2
find the middle point between A and B: C, and the middle point between A' and B': C'
the difference between C and C' (C-C') gives the translation between the images (X and Y)
using the dot product of C-A and C'-A' you can find the rotation angle (R)
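A small sketch of those four steps, assuming two matched pairs are already in hand (the coordinates below are made up):
import numpy as np

A,  B  = np.array([400.0, 300.0]), np.array([900.0, 700.0])   # points in image 1
A2, B2 = np.array([372.0, 314.0]), np.array([869.0, 716.0])   # their matches in image 2
C, C2 = 0.5 * (A + B), 0.5 * (A2 + B2)                        # midpoints
tx, ty = C - C2                                               # translation (X, Y)
v1, v2 = C - A, C2 - A2
cos_r = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
R = np.degrees(np.arccos(np.clip(cos_r, -1.0, 1.0)))          # unsigned rotation angle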
To detect robust points, I would find the points along the side of the contour you have found with the highest absolute value of the second derivative (Hessian) and then try to match them. Since you mentioned this is video footage, you can easily make the assumption that the transformation between any two frames is small and use it to reject outliers.
So, I'm teaching myself Python with this tutorial and I'm stuck on exercise number 13, which says:
Write a function to uniformly shrink or enlarge an image. Your function should take an image along with a scaling factor. To shrink the image, the scaling factor should be between 0 and 1; to enlarge the image, the scaling factor should be greater than 1.
This is not meant as a question about PIL, but rather to ask which algorithm to use so I can code it myself.
I've found some similar questions like this, but I don't know how to translate them into Python.
Any help would be appreciated.
I've come to this:
import image

win = image.ImageWin()
img = image.Image("cy.png")
factor = 2
W = img.getWidth()
H = img.getHeight()
newW = int(W * factor)
newH = int(H * factor)
newImage = image.EmptyImage(newW, newH)
for col in range(newW):
    for row in range(newH):
        p = img.getPixel(col, row)
        newImage.setPixel(col * factor, row * factor, p)
newImage.draw(win)
win.exitonclick()
I should do this in a function, but that doesn't matter right now. The arguments of the function would be (image, factor). You can try it in the tutorial's ActiveCode. It makes a stretched image with empty columns.
Your code as shown is simple and effective for what's known as a Nearest Neighbor resize, except for one little bug:
p = img.getPixel(col/factor,row/factor)
newImage.setPixel(col,row,p)
Edit: since you're sending a floating point coordinate into getPixel you're not limited to Nearest Neighbor - you can implement any interpolation algorithm you want inside. The simplest thing to do is simply truncate the coordinates to int which will cause pixels to be replicated when factor is greater than 1, or skipped when factor is less than 1.
Mark has the correct approach. To get a smoother result, you replace:
p = img.getPixel(col/factor,row/factor)
with a function that takes floating point coordinates and returns a pixel interpolated from several neighboring points in the source image. For linear interpolation it takes the four nearest neighbors; for higher-order interpolation it takes a larger number of surrounding pixels.
For example, if col/factor = 3.75 and row/factor = 1.9, a linear interpolation would take the source pixels at (3,1), (3,2), (4,1), and (4,2) and give a result between those 4 rgb values, weighted most heavily to the pixel at (4,2).
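A minimal sketch of that idea for a single-channel (grayscale) pixel, assuming a getpix(col, row) accessor in the spirit of the tutorial's getPixel (border clamping is omitted):
def bilinear(getpix, x, y):
    # x, y are fractional source coordinates, e.g. col / factor, row / factor
    x0, y0 = int(x), int(y)
    dx, dy = x - x0, y - y0
    p00 = getpix(x0, y0)
    p10 = getpix(x0 + 1, y0)
    p01 = getpix(x0, y0 + 1)
    p11 = getpix(x0 + 1, y0 + 1)
    # Weight the four nearest neighbors by how close (x, y) is to each of them
    return (p00 * (1 - dx) * (1 - dy) + p10 * dx * (1 - dy) +
            p01 * (1 - dx) * dy + p11 * dx * dy)
For an RGB image you would interpolate each channel separately.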
You can do that using the Python Imaging Library.
Image.resize() should do what you want.
See http://effbot.org/imagingbook/image.htm
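A hedged example of that (the file name is made up; Image.resize takes the new size as a (width, height) tuple):
from PIL import Image

img = Image.open("cy.png")
factor = 2
w, h = img.size
resized = img.resize((int(w * factor), int(h * factor)))
resized.save("cy_scaled.png")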
EDIT
Since you want to program this yourself without using a module, I have added an extra solution.
You will have to use the following algorithm.
load your image
extract its size
calculate the desired size (height * factor, width * factor)
create a new EmptyImage with the desired size
Use a nested loop to go through the pixels (row by column) of your image.
Then (for shrinking) you remove some pixels every once in a while, or (for enlarging) you duplicate some pixels in your image.
If you want to get fancy, you could smooth the added or removed pixels by averaging their RGB values with those of their neighbours.
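Putting those steps together, here is a minimal nearest-neighbour sketch against the tutorial's image module (the scale function name is mine, and it uses the col/factor, row/factor indexing suggested in the answer above):
import image

def scale(img, factor):
    # Each output pixel is copied from the source pixel at
    # (col / factor, row / factor), truncated to integer indices.
    newW = int(img.getWidth() * factor)
    newH = int(img.getHeight() * factor)
    newImage = image.EmptyImage(newW, newH)
    for col in range(newW):
        for row in range(newH):
            p = img.getPixel(int(col / factor), int(row / factor))
            newImage.setPixel(col, row, p)
    return newImage

win = image.ImageWin()
scale(image.Image("cy.png"), 0.5).draw(win)
win.exitonclick()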