I'm trying to register two images that are a rotated and translated version of one another using opencv. Generally speaking, the procedure is (pseudo code):
a. IF1 = FFT2(I1); IF2 = FFT2(I2)
b. R_translation = (IF1).*(IF2_conjugate)
c. R_translation = R_translation./abs(R_translation)
d. r_translation = IFFT2(R_translation)
where the maximum of r_translation corresponds to the translation. Moving on to calculate the rotation, the abs value removes the translation part,
e. IF1_abs = abs(IF1); IF2_abs = abs(IF2)
Converting to Linear-Polar coordinates,
f. IF1_abs_pol = LINPOL(IF1_abs); IF2_abs_pol = LINPOL(IF2_abs)
f. IFF1 = FFT2(IF1_abs_pol); IFF2 = FFT2(IF2_abs_pol)
f. R_rot = (IFF1).*(IFF2_conjugate)
c. R_rot = R_rot./abs(R_rot)
d. r_rot = IFFT2(R_rot)
where the maximum of r_rotationn corresponds to the rotation. While for translation alone, the cv2.phaseCorrelate function returns expected results, for rotation, it returns odd results. So I had tried the following.
I took two numpy.array-s 5x5, which are a rotated version of one another like so:
a = numpy.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
a = a.astype('float')/a.astype('float').max()
b = numpy.array([[5, 5, 5, 5, 5], [4, 4, 4, 4, 4], [3, 3, 3, 3, 3], [2, 2, 2, 2, 2], [1, 1, 1, 1, 1]])
b = b.astype('float') / b.astype('float').max()
First I calculated the phase correlation myself:
center_x = numpy.floor(a.shape[0] / 2.0)#the x center of rotation (= x center of image)
center_y = numpy.floor(a.shape[1] / 2.0)#the y center of rotation (= y center of image)
Mvalue = a.shape[1] / numpy.sqrt(
((a.shape[0] / 2.0) ** 2.0) + ((a.shape[1] / 2.0) ** 2.0)) # rotation radius
Calculating the FFT, taking the absolute value (losing the translation difference data if existed), and switching to Linear-Polar coordinates and normalizing:
a_polar = cv2.linearPolar(numpy.abs(numpy.fft.fft2(a)), (center_x, center_y), Mvalue, cv2.WARP_FILL_OUTLIERS)
b_polar = cv2.linearPolar(numpy.abs(numpy.fft.fft2(b)), (center_x, center_y), Mvalue, cv2.WARP_FILL_OUTLIERS)
a_polar = a_polar/a_polar.max()
b_polar = b_polar / b_polar.max()
Another FFT step, multiplying point wise, and IFFT back:
aff = numpy.fft.fft2(a_polar)
bff = numpy.fft.fft2(b_polar)
R = aff * numpy.ma.conjugate(bff)
R = R / numpy.absolute(R)
r = numpy.fft.ifft2(R).real
r = r/r.max()
yields,
Phase correlation for rotation, b with respect to a
According to cv2.linearPolar() the rows, span the angle (in this case with step size of 360/5 = 72degrees) and the columns span the radius (from 0 to the maximum radius given in Mvalue. The maximum is evident at the last row (corresponding to approximately -90degree shift). So far so good..
The second method is using cv2.phaseCorrelate() directly,
r_direct = cv2.phaseCorrelate(a_polar, b_polar)
which yields,
Phase correlation for rotation, b with respect to a direct method
The first tuple, is the X,Y correlation coefficient (in pixels?) and the third number is the fit grade. When it is close to unity, the correlation coefficient represents better the data (the blob around the maximum is more distinct).
Other than the fact that the result is not distinct enough (why?), the result is confusing...
Generally, The first FFT process in this 5x5 example was not necessary. If rotation is the only interference, one can immediately switch to Linear-Polar coordinates and use cv2.phaseCorrelate. In that case, the result is also confusing.
Any help would be appreciated :)
Thanks!
David
Related
I have a batch of 20 flattened tensors representing 256X256 images.
>>> imgs.shape
(20, 65536)
Each image was split into 32x32 patches (a total of 64 patches per image). I have calculated a score for each patch and got a vector with the shape of (20,64)
I would like to multiply each pixel with the corresponding patch score.
imgs * score yields an error and score.repeat(1,1,64) didn't repeat the scores in a way that preserves the score of each pixel.
How can this be achieved?
EDIT:
A simple example can be using
import torch
img_size = 4
patch_size = 2
img = torch.rand((2,img_size,img_size)) # (2,4,4)
score = torch.tensor([[1,2,3,4],[5,6,7,8]]) # (2,4)
And trying to achieve
score = [[1,1,3,3],[2,2,4,4],[5,5,6,6][7,7,8,8]]
I would suggest reshaping your scores array to preserve information about how it relates to the original image, then using repeat_interleave() twice.
Example:
import torch
img_size = 4
patch_size = 2
patches_per_axis = int(img_size / patch_size)
num_images = 2
img = torch.rand((2,img_size,img_size)) # (2,4,4)
score = torch.tensor([[1,2,3,4],[5,6,7,8]]) # (2,4)
def expand_scores(scores):
# Unflatten scores
scores = scores.reshape((num_images, patches_per_axis, patches_per_axis))
# Repeat scores to match dimensions of image, in vertical direction
scores = scores.repeat_interleave(repeats=patch_size, axis=1)
# Repeat scores to match dimensions of image, in horizontal direction
scores = scores.repeat_interleave(repeats=patch_size, axis=2)
# Optional: use reshape() to re-flatten scores. If you do that here, you'll need to do it to the image tensor too.
return scores
(I added two constants at the top to your example, num_images, and patches_per_axis. In your original example, these would be set to 20 and 8, respectively.)
When you call expand_scores(), you'll get the following output:
tensor([[[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]],
[[5, 5, 6, 6],
[5, 5, 6, 6],
[7, 7, 8, 8],
[7, 7, 8, 8]]])
You can multiply that by the pixel values:
expand_scores(score) * img
I'm in the process of learning some ML concepts using OpenCV, and I have a piece of python code that I was given to translate into c++. I have a very basic knowledge of python, and I've run into some syntax that I can't seem to find the meaning for.
I have a variable being passed into a method (whole method not shown) that is coming from the result of cv2.imread(), so an image. In c++, it's of type Mat:
def preprocess_image(img, side = 96):
min_side = min(img.shape[0], img.shape[1])
img = img[:min_side, :min_side * 2]
I have a couple questions:
What does the syntax ":min_side" do?
What is that line doing in terms of the image?
I am assuming the input of the image is a Matrix. In Python the image is generally read as numpy matrix
1.What does the syntax ":min_side" do?
It "Slice" the List/Array or basically in this case, a Matrix.
2.What is that line doing in terms of the image?
It "crops" the 2D Array(Basically a Matrix/Image)
A simple example of slicing:
x = np.array([[0, 1, 2],[3, 4, 5], [6, 7, 8]])
print(x)
out:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Performing Slicing on this Matrix(Image):
x[:2, :3]
output after Slicing:
array([[0, 1, 2],
[3, 4, 5]])
A good source to read more about it would be straight from the source: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html
The line:
img = img[:min_side, :min_side * 2]
is cropping the image so that the resulting image is min_side in height and min_side * 2 in width. The colon preceding a variable name is python's slicing syntax. Observe:
arr = [1, 2, 3, 4, 5, 6]
length = 4
print(arr[:length])
Output:
[1, 2, 3, 4]
:min_side is a shorthand for 0:min_side i.e it produces a slice the object from the start to min_side. For example:
f = [2, 4, 5, 6, 8, 9]
f[:3] # returns [2,4,5]
img = img[:min_side, :min_side *2] produces a crop of the image (which is a numpy array) from 0 to min_side along the height and from 0 to min_side * 2 along the width. Therefore the resulting image would be one of width min_side * 2 and height min_side .
I have the following python function:
def npnearest(u: np.ndarray, X: np.ndarray, Y: np.ndarray, distance: 'callbale'=npdistance):
'''
Finds x1 so that x1 is in X and u and x1 have a minimal distance (according to the
provided distance function) compared to all other data points in X. Returns the label of x1
Args:
u (np.ndarray): The vector (ndim=1) we want to classify
X (np.ndarray): A matrix (ndim=2) with training data points (vectors)
Y (np.ndarray): A vector containing the label of each data point in X
distance (callable): A function that receives two inputs and defines the distance function used
Returns:
int: The label of the data point which is closest to `u`
'''
xbest = None
ybest = None
dbest = float('inf')
for x, y in zip(X, Y):
d = distance(u, x)
if d < dbest:
ybest = y
xbest = x
dbest = d
return ybest
Where, npdistance simply gives distance between two points i.e.
def npdistance(x1, x2):
return(np.sum((x1-x2)**2))
I want to optimize npnearest by performing nearest neighbor search directly in numpy. This means that the function cannot use for/while loops.
Thanks
Since you don't need to use that exact function, you can simply change the sum to work over a particular axis. This will return a new list with the calculations and you can call argmin to get the index of the minimum value. Use that and lookup your label:
import numpy as np
def npdistance_idx(x1, x2):
return np.argmin(np.sum((x1-x2)**2, axis=1))
Y = ["label 0", "label 1", "label 2", "label 3"]
u = np.array([[1, 5.5]])
X = np.array([[1,2], [1, 5], [0, 0], [7, 7]])
idx = npdistance_idx(X, u)
print(Y[idx]) # label 1
Numpy supports vectorized operations (broadcasting)
This means you can pass in arrays and operations will be applied to entire arrays in an optimized way (SIMD - single instruction, multiple data)
You can then get the address of the array minimum using .argmin()
Hope this helps
In [9]: numbers = np.arange(10); numbers
Out[9]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [10]: numbers -= 5; numbers
Out[10]: array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4])
In [11]: numbers = np.power(numbers, 2); numbers
Out[11]: array([25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
In [12]: numbers.argmin()
Out[12]: 5
I have a 2D grid with radioactive beta-decay rates. Each vale corresponds to a rate on a specific pair of temperature and density (both on logarithmic scale). What I would like to do, is when I have a temperature and density data pair (after getting their logarithms), to find the matching values in the table. I tried using the scipy interpolate interpn function, but I got a little confused, I would be grateful for the help.
What I have so far:
pointsx = np.array([7+0.2*i for i in range(0,16)]) #temperature range
pointsy = np.array([i for i in range(0,11) ]) #rho_el range
data = numpy.loadtxt(filename) #getting data from file
logT = np.log10(T) #wanted temperature logarithmic
logrho = np.log10(rho) #wanted rho logarithmic
The interpn function has the following arguments: points, values, xi, method='linear', bounds_error=True, fill_value=nan. I figure that the points will be the pointsx and pointsy I have, the data is quite obvious, and xi will be the (T,rho) I'm looking for. But I'm not sure, what dimensions they should have? The points is the same size, as the data? So I have to make an array of the corresponding pairs of T and rho, which will be the points part, and then have a (T, rho) pair as xi?
When you aren't certain about how a function works, it's always a good idea to open up a REPL and test it yourself. In this case, the function works exactly as expected, given your understanding of the documentation.
>>> points = [[1, 2, 3, 4], [1, 2, 3, 4]] # Input values for each grid dimension
>>> values = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]] # The grid itself
>>> xi = (1, 1.5)
>>> scipy.interpolate.interpn(points, values, xi)
array([ 1.5])
>>> xi = [[1, 1.5], [2, 1.5], [2, 2.5], [3, 2.5], [3, 3.5], [4, 3.5]]
>>> scipy.interpolate.interpn(points, values, xi)
array([ 1.5, 2.5, 3.5, 4.5, 5.5, 6.5])
The only thing you missed was that points is supposed to be a tuple. But as you can see from the above, it works even if points ins't a tuple.
I have a point set which I have stored its coordinates in three different arrays (xa, ya, za). Now, I want to calculate the euclidean distance between each point of this point set (xa[0], ya[0], za[0] and so on) with all the points of an another point set (xb, yb, zb) and every time store the minimum distance in a new array.
Let's say that xa.shape = (11,), ya.shape = (11,), za.shape= (11,). Respectively, xb.shape = (13,), yb.shape = (13,), zb.shape = (13,). What I want to do is to take each time one xa[],ya[],za[], and calculate its distance with all the elements of xb, yb, zb, and at the end store the minimum value into an xfinal.shape = (11,) array.
Do you think that this would be possible with numpy?
A different solution would be to use the spatial module from scipy, the KDTree in particular.
This class learn from a set of data and can be interrogated given a new dataset:
from scipy.spatial import KDTree
# create some fake data
x = arange(20)
y = rand(20)
z = x**2
# put them togheter, should have a form [n_points, n_dimension]
data = np.vstack([x, y, z]).T
# create the KDTree
kd = KDTree(data)
now if you have a point you can ask the distance and the index of the closet point (or the N closest points) simply by doing:
kd.query([1, 2, 3])
# (1.8650720813822905, 2)
# your may differs
or, given an array of positions:
#bogus position
x2 = rand(20)*20
y2 = rand(20)*20
z2 = rand(20)*20
# join them togheter as the input
data2 = np.vstack([x2, y2, z2]).T
#query them
kd.query(data2)
#(array([ 14.96118553, 9.15924813, 16.08269197, 21.50037074,
# 18.14665096, 13.81840533, 17.464429 , 13.29368755,
# 20.22427196, 9.95286671, 5.326888 , 17.00112683,
# 3.66931946, 20.370496 , 13.4808055 , 11.92078034,
# 5.58668204, 20.20004206, 5.41354322, 4.25145521]),
#array([4, 3, 2, 4, 2, 2, 4, 2, 3, 3, 2, 3, 4, 4, 3, 3, 3, 4, 4, 4]))
You can calculate the difference from each xa to each xb with np.subtract.outer(xa, xb). The distance to the nearest xb is given by
np.min(np.abs(np.subtract.outer(xa, xb)), axis=1)
To extend this to 3D,
distances = np.sqrt(np.subtract.outer(xa, xb)**2 + \
np.subtract.outer(ya, yb)**2 + np.subtract.outer(za, zb)**2)
distance_to_nearest = np.min(distances, axis=1)
If you actually want to know which of the b points is the nearest, you use argmin in place of min.
index_of_nearest = np.argmin(distances, axis=1)
There is more than one way of doing this. Most importantly, there's a trade-off between memory-usage and speed. Here's the wasteful method:
s = (1, -1)
d = min((xa.reshape(s)-xb.reshape(s).T)**2
+ (ya.reshape(s)-yb.reshape(s).T)**2
+ (za.reshape(s)-zb.reshape(s).T)**2), axis=0)
The other method would be to iterate over the point set in b to avoid the expansion to the full blown matrix.