How to use cv2.warpPerspective with only one source point [x, y] - Python

My problem: given just one point (x, y) in the source image, and the transformation matrix already calculated from the two images, I want to compute the corresponding point (x, y) in the second image.
If I have a pixel point [510, 364] from my source image and the transformation matrix that I already calculated:
Matrix Transform:
[[ 7.36664511e-01  3.38845039e+01  2.17700574e+03]
 [-1.16261372e+00  6.30840432e+01  8.09587058e+03]
 [ 4.28933532e-05  8.15551141e-03  1.00000000e+00]]
I can get my new point: [3730, 7635].
How can I do this?
h, status = cv2.findHomography(arraypoints_fire, arraypoints_vertical)
warped_image = cv2.warpPerspective(fire_image_open, h, (vertical_image_open.shape[1], vertical_image_open.shape[0]))
cv2.namedWindow('Warped Source Image', cv2.WINDOW_NORMAL)
cv2.imshow("Warped Source Image", warped_image)
cv2.namedWindow('Overlay', cv2.WINDOW_NORMAL)
overlay_image = cv2.addWeighted(vertical_image_open, 0.3, warped_image, 0.8, 0)
cv2.imshow('Overlay', overlay_image)

I've met the same problem and found an answer here, but in C++. According to the docs, OpenCV's warpPerspective uses this formula:

dst(x, y) = src( (M11*x + M12*y + M13) / (M31*x + M32*y + M33),
                 (M21*x + M22*y + M23) / (M31*x + M32*y + M33) )

where
src - input image.
dst - output image that has the size dsize and the same type as src.
M - 3×3 transformation matrix (inverted).
You can use it directly on the point:
# M - transform matrix, created with cv2.getPerspectiveTransform or cv2.findHomography
def warp_point(x: int, y: int) -> tuple[int, int]:
    d = M[2, 0] * x + M[2, 1] * y + M[2, 2]
    return (
        int((M[0, 0] * x + M[0, 1] * y + M[0, 2]) / d),  # x
        int((M[1, 0] * x + M[1, 1] * y + M[1, 2]) / d),  # y
    )
Update: I've found one more answer, but in Python :D
Also, it seems I forgot brackets in the first part; fixed now.
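For reference, the same mapping can also be done without the manual arithmetic. A minimal sketch, assuming M is the 3×3 homography (e.g. the h returned by cv2.findHomography in the question): cv2.perspectiveTransform applies the matrix to points, including the perspective divide, and expects a float32 array of shape (N, 1, 2).

import numpy as np
import cv2

pt = np.array([[[510, 364]]], dtype=np.float32)   # one point, shape (1, 1, 2)
mapped = cv2.perspectiveTransform(pt, M)           # applies M, including the divide by the third coordinate
x, y = mapped[0, 0]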

Related

perspective transform image segment and transform back

I am currently trying to cut a region of interest out of an image, do some calculations based on the information inside the snippet, and then either transform the snippet back into the original position or transform some coordinates from the calculation done on the snippet back into the original image.
Here are some code snippets:
x, y, w, h = cv2.boundingRect(localized_mask)
p1 = [x, y + h]
p4 = [x, y]
p3 = [x + w, y]
p2 = [x + w, y + h]
w1 = int(np.linalg.norm(np.array(p2) - np.array(p3)))
w2 = int(np.linalg.norm(np.array(p4) - np.array(p1)))
h1 = int(np.linalg.norm(np.array(p1) - np.array(p2)))
h2 = int(np.linalg.norm(np.array(p3) - np.array(p4)))
maxWidth = max(w1, w2)
maxHeight = max(h1, h2)
neighbor_points = [p1, p2, p3, p4]
output_poins = np.float32(
    [
        [0, 0],
        [0, maxHeight],
        [maxWidth, maxHeight],
        [maxWidth, 0],
    ]
)
matrix = cv2.getPerspectiveTransform(np.float32(neighbor_points), output_poins)
result = cv2.warpPerspective(
    mask, matrix, (maxWidth, maxHeight), cv2.INTER_LINEAR
)
Here are some images to illustrate this problem:
Original with marked RoI:
Transformed snippet with markings:
I tried to transform the snippet back into the original position with the following code snippets:
test2 = cv2.warpPerspective(
    result, matrix, (maxHeight, maxWidth), cv2.WARP_INVERSE_MAP
)
test3 = cv2.warpPerspective(
    result, matrix, (img.shape[1], img.shape[0]), cv2.WARP_INVERSE_MAP
)
Both resulted in a black image with either the shape of the snippet or a black image with the shape of the original image.
But I am honestly more interested in the white markings inside the snippet, so I tried to transform these by hand with the following code snippet:
inverse_matrix = cv2.invert(matrix)[1]
inverse_left = []
for point in output_dict["left"]["knots"]:
    trans_point = [point.x, point.y] + [1]
    trans_point = np.float32(trans_point)
    x, y, z = np.dot(inverse_matrix, trans_point)
    new_x = np.uint8(x/z)
    new_y = np.uint8(y/z)
    inverse_left.append([new_x, new_y])
But I didn't account for the position of the RoI inside the image and the resulting coordinates (white dots in the upper left half) didn't end up where I wanted them.
Does anybody have an idea what I am doing wrong or know a better solution to this problem?
Thanks.
Finally found a solution, and it was as simple as I thought it would be...
I first inverted the transformation matrix I used to get my image snippet, then looped over and transformed every single coordinate that came out of my calculation on the snippet.
The code looks something like this:
inv_matrix = cv2.invert(matrix)
for point in points:
    x, y = (cv2.transform(np.array([[[point.x, point.y]]]), inv_matrix[1]).squeeze())[:2]
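One caveat worth noting: cv2.transform performs a plain matrix multiplication and does not divide by the third homogeneous coordinate, so for a general homography (bottom row not [0, 0, 1]) cv2.perspectiveTransform is the safer call. A sketch under that assumption, reusing matrix and the points from above:

import numpy as np
import cv2

inv_matrix = cv2.invert(matrix)[1]
pts = np.array([[(p.x, p.y) for p in points]], dtype=np.float32)   # shape (1, N, 2)
original_coords = cv2.perspectiveTransform(pts, inv_matrix)[0]      # coordinates back in the full image, shape (N, 2)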

Padding scipy affine_transform output to show non-overlapping regions of transformed images

I have source (src) image(s) I wish to align to a destination (dst) image using an Affine Transformation whilst retaining the full extent of both images during alignment (even the non-overlapping areas).
I am already able to calculate the Affine Transformation rotation and offset matrix, which I feed to scipy.ndimage.interpolate.affine_transform to recover the dst-aligned src image.
The problem is that, when the images are not fully overlapping, the resultant image is cropped to only the common footprint of the two images. What I need is the full extent of both images, placed on the same pixel coordinate system. This question is almost a duplicate of this one - and the excellent answer and repository there provide this functionality for OpenCV transformations. I unfortunately need this for scipy's implementation.
Much too late, after repeatedly hitting a brick wall trying to translate the above question's answer to scipy, I came across this issue and subsequently followed it to this question. The latter question gave some insight into the wonderful world of scipy's affine transformations, but I have as yet been unable to crack my particular needs.
The transformations from src to dst can have translations and rotation. I can get translations only working (an example is shown below) and I can get rotations only working (largely hacking around the below and taking inspiration from the use of the reshape argument in scipy.ndimage.interpolation.rotate). However, I am getting thoroughly lost combining the two. I have tried to calculate what should be the correct offset (see this question's answers again), but I can't get it working in all scenarios.
A translation-only working example of the padded affine transformation, which largely follows this repo, as explained in this answer:
from scipy.ndimage import rotate, affine_transform
import numpy as np
import matplotlib.pyplot as plt
nblob = 50
shape = (200, 100)
buffered_shape = (300, 200) # buffer for rotation and translation
def affine_test(angle=0, translate=(0, 0)):
    np.random.seed(42)
    # Maximum translation allowed is half the difference between shape and buffered_shape
    # Generate a buffered_shape-sized base image with random blobs
    base = np.zeros(buffered_shape, dtype=np.float32)
    random_locs = np.random.choice(np.arange(2, buffered_shape[0] - 2), nblob * 2, replace=False)
    i = random_locs[:nblob]
    j = random_locs[nblob:]
    for k, (_i, _j) in enumerate(zip(i, j)):
        # Use different values, just to make it easier to distinguish blobs
        base[_i - 2 : _i + 2, _j - 2 : _j + 2] = k + 10
    # Impose a rotation and translation on source
    src = rotate(base, angle, reshape=False, order=1, mode="constant")
    bsc = (np.array(buffered_shape) / 2).astype(int)
    sc = (np.array(shape) / 2).astype(int)
    src = src[
        bsc[0] - sc[0] + translate[0] : bsc[0] + sc[0] + translate[0],
        bsc[1] - sc[1] + translate[1] : bsc[1] + sc[1] + translate[1],
    ]
    # Cut-out destination from the centre of the base image
    dst = base[bsc[0] - sc[0] : bsc[0] + sc[0], bsc[1] - sc[1] : bsc[1] + sc[1]]
    src_y, src_x = src.shape

    def get_matrix_offset(centre, angle, scale):
        """Follows OpenCV.getRotationMatrix2D"""
        angle = angle * np.pi / 180
        alpha = scale * np.cos(angle)
        beta = scale * np.sin(angle)
        return (
            np.array([[alpha, beta], [-beta, alpha]]),
            np.array(
                [
                    (1 - alpha) * centre[0] - beta * centre[1],
                    beta * centre[0] + (1 - alpha) * centre[1],
                ]
            ),
        )

    # Obtain the rotation matrix and offset that describes the transformation
    # between src and dst
    matrix, offset = get_matrix_offset(np.array([src_y / 2, src_x / 2]), angle, 1)
    offset = offset - translate
    # Determine the outer bounds of the new image
    lin_pts = np.array([[0, src_x, src_x, 0], [0, 0, src_y, src_y]])
    transf_lin_pts = np.dot(matrix.T, lin_pts) - offset[::-1].reshape(2, 1)
    # Find min and max bounds of the transformed image
    min_x = np.floor(np.min(transf_lin_pts[0])).astype(int)
    min_y = np.floor(np.min(transf_lin_pts[1])).astype(int)
    max_x = np.ceil(np.max(transf_lin_pts[0])).astype(int)
    max_y = np.ceil(np.max(transf_lin_pts[1])).astype(int)
    # Add translation to the transformation matrix to shift to positive values
    anchor_x, anchor_y = 0, 0
    if min_x < 0:
        anchor_x = -min_x
    if min_y < 0:
        anchor_y = -min_y
    shifted_offset = offset - np.dot(matrix, [anchor_y, anchor_x])
    # Create padded destination image
    dst_h, dst_w = dst.shape[:2]
    pad_widths = [anchor_y, max(max_y, dst_h) - dst_h, anchor_x, max(max_x, dst_w) - dst_w]
    dst_padded = np.pad(
        dst,
        ((pad_widths[0], pad_widths[1]), (pad_widths[2], pad_widths[3])),
        "constant",
        constant_values=-1,
    )
    dst_pad_h, dst_pad_w = dst_padded.shape
    # Create the aligned and padded source image
    source_aligned = affine_transform(
        src,
        matrix.T,
        offset=shifted_offset,
        output_shape=(dst_pad_h, dst_pad_w),
        order=3,
        mode="constant",
        cval=-1,
    )
    # Plot the images
    fig, axes = plt.subplots(1, 4, figsize=(10, 5), sharex=True, sharey=True)
    axes[0].imshow(src, cmap="viridis", vmin=-1, vmax=nblob)
    axes[0].set_title("Source")
    axes[1].imshow(dst, cmap="viridis", vmin=-1, vmax=nblob)
    axes[1].set_title("Dest")
    axes[2].imshow(source_aligned, cmap="viridis", vmin=-1, vmax=nblob)
    axes[2].set_title("Source aligned to Dest padded")
    axes[3].imshow(dst_padded, cmap="viridis", vmin=-1, vmax=nblob)
    axes[3].set_title("Dest padded")
    plt.show()
e.g.:
affine_test(0, (-20, 40))
gives:
with a zoom-in showing the alignment in the padded images:
I require the full extent of the src and dst images aligned on the same pixel coordinates, with both rotations and translations.
Any help is greatly appreciated!
Complexity analysis
The problem is to determine three parameters: the rotation angle and the x and y displacements.
Let's suppose that you have a grid for the angle and the x and y displacements, each of size O(n), and that your images are of size O(n x n). Rotation, translation, and comparison of the images all take O(n^2); since you have O(n^3) candidate transforms to try, you end up with complexity O(n^5), and probably that's why you are asking the question.
However, the displacement part can be computed slightly more efficiently by computing the maximum correlation using Fourier transforms. The Fourier transforms can be performed with complexity O(n log n) for each axis, and we have to perform them over the two spatial dimensions, so the complete correlation matrix can be computed in O(n^2 log^2 n); we then find the maximum with complexity O(n^2), so the overall time complexity of determining the best alignment is O(n^2 log^2 n). However, you still want to search for the best angle; since we have O(n) candidate angles, the overall complexity of this search will be O(n^3 log^2 n). Remember that we are using Python and may have some significant overhead, so this complexity only gives us an idea of how difficult the problem will be. I have handled problems like this before, so I start confident.
Preparing an example
I will start by downloading an image, applying a rotation, and centering the image by padding with zeros.
def centralized(a, width, height):
    '''
    Image centralized to the given width and height
    by padding with zeros (black)
    '''
    assert width >= a.shape[0] and height >= a.shape[1]
    ap = np.zeros((width, height) + a.shape[2:], a.dtype)
    ccx = (width - a.shape[0])//2
    ccy = (height - a.shape[1])//2
    ap[ccx:ccx+a.shape[0], ccy:ccy+a.shape[1], ...] = a
    return ap

def image_pair(im, width, height, displacement=(0,0), angle=0):
    '''
    This builds a pair of images as numpy arrays
    from the input image.
    Both images will be padded with zeros (black),
    roughly centralized,
    and will have the specified shape.
    Make sure that the width and height chosen are enough
    to fit the rotated image.
    '''
    a = np.array(im)
    a1 = centralized(a, width, height)
    a2 = centralized(ndimage.rotate(a, angle), width, height)
    a2 = np.roll(a2, displacement, axis=(0,1))
    return a1, a2

def random_transform():
    angle = np.random.rand() * 360
    displacement = np.random.randint(-100, 100, 2)
    return displacement, angle
a1, a2 = image_pair(im, 512, 512, *random_transform())
plt.subplot(121)
plt.imshow(a1)
plt.subplot(122)
plt.imshow(a2)
The displacement search
The first thing is to compute the correlation of the two images:
def compute_correlation(a1, a2):
    A1 = np.fft.rfftn(a1, axes=(0,1))
    A2 = np.fft.rfftn(a2, axes=(0,1))
    C = np.fft.irfftn(np.sum(A1 * np.conj(A2), axis=2))
    return C
Then let's create an example without rotation and confirm that, with the index of the maximum correlation, we can find the displacement that fits one image to the other.
displacement, _ = random_transform()
a1, a2 = image_pair(im, 521, 512, displacement, angle=0)
C = compute_correlation(a1, a2)
np.unravel_index(np.argmax(C), C.shape), displacement
a3 = np.roll(a2, np.unravel_index(np.argmax(C), C.shape), axis=(0,1))
assert np.all(a3 == a1)
With rotation or interpolation this result may not be exact but it gives the displacement that will give us the closest possible alignment.
Let's put this in a function for future use
def get_aligned(a1, a2, angle):
    a1_rotated = ndimage.rotate(a1, angle, reshape=False)
    C = compute_correlation(a2, a1_rotated)
    found_displacement = np.unravel_index(np.argmax(C), C.shape)
    a1_aligned = np.roll(a1_rotated, found_displacement, axis=(0,1))
    return a1_aligned
Searching for the angle
Now we can proceed in two steps: first we compute the correlation for each angle, then with the angle that gives the maximum correlation we find the alignment.
displacement, angle = random_transform()
a1, a2 = image_pair(im, 521, 512, displacement, angle)
C_max = []
C_argmax = []
angle_guesses = np.arange(0, 360, 5)
for angle_guess in angle_guesses:
    a1_rotated = ndimage.rotate(a1, angle_guess, reshape=False)
    C = compute_correlation(a1_rotated, a2)
    i = np.argmax(C)
    v = C.reshape(-1)[i]
    C_max.append(v)
    C_argmax.append(i)
Let's see what the correlation looks like:
plt.plot(angle_guesses, C_max);
Looking at this curve we have a clear winner, even though a sunflower has some rotational symmetry.
Let's apply the transformation to the original image and see how it looks:
a1_aligned = get_aligned(a1, a2, angle_guesses[np.argmax(C_max)])
plt.subplot(121)
plt.imshow(a2)
plt.subplot(122)
plt.imshow(a1_aligned)
Great, I wouldn't have done better than this manually.
I am using a sunflower image for aesthetic reasons, but the procedure is the same for any type of image. I use RGB to show that the image may have one additional dimension, i.e. it uses a feature vector instead of a scalar feature; you can reshape your data to (width, height, 1) if your feature is a scalar.
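For example, if your feature is a single grayscale value, a minimal sketch (the array name gray is just for illustration) would be:

gray3 = gray.reshape(gray.shape + (1,))    # add a trailing feature axis of length 1
C = compute_correlation(gray3, gray3)       # now compatible with the RGB-style code above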
Working code below in case anyone else needs this for scipy's affine transformations:
def affine_test(angle=0, translate=(0, 0), shape=(200, 100), buffered_shape=(300, 200), nblob=50):
    # Maximum translation allowed is half the difference between shape and buffered_shape
    np.random.seed(42)
    # Generate a buffered_shape-sized base image
    base = np.zeros(buffered_shape, dtype=np.float32)
    random_locs = np.random.choice(np.arange(2, buffered_shape[0] - 2), nblob * 2, replace=False)
    i = random_locs[:nblob]
    j = random_locs[nblob:]
    for k, (_i, _j) in enumerate(zip(i, j)):
        base[_i - 2 : _i + 2, _j - 2 : _j + 2] = k + 10
    # Impose a rotation and translation on source
    src = rotate(base, angle, reshape=False, order=1, mode="constant")
    bsc = (np.array(buffered_shape) / 2).astype(int)
    sc = (np.array(shape) / 2).astype(int)
    src = src[
        bsc[0] - sc[0] + translate[0] : bsc[0] + sc[0] + translate[0],
        bsc[1] - sc[1] + translate[1] : bsc[1] + sc[1] + translate[1],
    ]
    # Cut-out destination from the centre of the base image
    dst = base[bsc[0] - sc[0] : bsc[0] + sc[0], bsc[1] - sc[1] : bsc[1] + sc[1]]
    src_y, src_x = src.shape

    def get_matrix_offset(centre, angle, scale):
        """Follows OpenCV.getRotationMatrix2D"""
        angle_rad = angle * np.pi / 180
        alpha = np.round(scale * np.cos(angle_rad), 8)
        beta = np.round(scale * np.sin(angle_rad), 8)
        return (
            np.array([[alpha, beta], [-beta, alpha]]),
            np.array(
                [
                    (1 - alpha) * centre[0] - beta * centre[1],
                    beta * centre[0] + (1 - alpha) * centre[1],
                ]
            ),
        )

    matrix, offset = get_matrix_offset(
        np.array([((src_y - 1) / 2) - translate[0], ((src_x - 1) / 2) - translate[1]]), angle, 1
    )
    offset += np.array(translate)
    M = np.column_stack((matrix, offset))
    M = np.vstack((M, [0, 0, 1]))
    iM = np.linalg.inv(M)
    imatrix = iM[:2, :2]
    ioffset = iM[:2, 2]
    # Determine the outer bounds of the new image
    lin_pts = np.array([[0, src_y-1, src_y-1, 0], [0, 0, src_x-1, src_x-1]])
    transf_lin_pts = np.dot(matrix, lin_pts) + offset.reshape(2, 1)  # - np.array(translate).reshape(2, 1) # both?
    # Find min and max bounds of the transformed image
    min_x = np.floor(np.min(transf_lin_pts[1])).astype(int)
    min_y = np.floor(np.min(transf_lin_pts[0])).astype(int)
    max_x = np.ceil(np.max(transf_lin_pts[1])).astype(int)
    max_y = np.ceil(np.max(transf_lin_pts[0])).astype(int)
    # Add translation to the transformation matrix to shift to positive values
    anchor_x, anchor_y = 0, 0
    if min_x < 0:
        anchor_x = -min_x
    if min_y < 0:
        anchor_y = -min_y
    dot_anchor = np.dot(imatrix, [anchor_y, anchor_x])
    shifted_offset = ioffset - dot_anchor
    # Create padded destination image
    dst_y, dst_x = dst.shape[:2]
    pad_widths = [anchor_y, max(max_y, dst_y) - dst_y, anchor_x, max(max_x, dst_x) - dst_x]
    dst_padded = np.pad(
        dst,
        ((pad_widths[0], pad_widths[1]), (pad_widths[2], pad_widths[3])),
        "constant",
        constant_values=-10,
    )
    dst_pad_y, dst_pad_x = dst_padded.shape
    # Create the aligned and padded source image
    source_aligned = affine_transform(
        src,
        imatrix,
        offset=shifted_offset,
        output_shape=(dst_pad_y, dst_pad_x),
        order=3,
        mode="constant",
        cval=-10,
    )
E.g. running:
affine_test(angle=-25, translate=(10, -40))
will show:
and zoomed in:
Apologies that the code is not nicely written as it stands.
Note that, running this in the wild, I noticed it cannot handle any change in scale between the images, though I am not certain this isn't something to do with how I calculate the transformation - so it is a caveat worth noting, and checking, if you are aligning images with different scales.
If you have two images that are similar (or the same) and you want to align them, you can do it using the two functions rotate and shift:
from scipy.ndimage import rotate, shift
First you need to find the difference in angle between the two images, angle_to_rotate; having that, you apply a rotation to src:
angle_to_rotate = 25
rotated_src = rotate(src, angle_to_rotate , reshape=True, order=1, mode="constant")
With reshape=True you avoid losing information from your original src matrix; it pads the result so the image can be translated around the (0, 0) indexes. You can calculate this translation (it is (x*cos(angle), y*sin(angle)), where x and y are the dimensions of the image), but it probably won't matter.
Now you will need to translate the image to match the destination; to do that you can use the shift function:
rot_translated_src = shift(rotated_src , [distance_x, distance_y])
In this case there is no reshape (because otherwise you wouldn't have any real translation) so if the image was not previously padded some information will be lost.
But you can do some padding with
np.pad(src, number, mode='constant')
To calculate distance_x and distance_y you will need to find a point that serves as a reference between rotated_src and the destination, then just calculate the distance along the x and y axes.
Summary
1. Make some padding in src and dst.
2. Find the angular distance between them.
3. Rotate src with scipy.ndimage.rotate using reshape=True.
4. Find the horizontal and vertical distances distance_x, distance_y between the rotated image and dst.
5. Translate your rotated_src with scipy.ndimage.shift.
Code
from scipy.ndimage import rotate, shift
import matplotlib.pyplot as plt
import numpy as np
First we make the destination image:
# make and plot dest
dst = np.ones([40,20])
dst = np.pad(dst,10)
dst[17,[14,24]]=4
dst[27,14:25]=4
dst[26,[14,25]]=4
rotated_dst = rotate(dst, 20, order=1)
plt.imshow(dst) # plot it
plt.imshow(rotated_dst)
plt.show()
We make the Source image:
# make_src image and plot it
src = np.zeros([40,20])
src = np.pad(src,10)
src[0:20,0:20]=1
src[7,[4,14]]=4
src[17,4:15]=4
src[16,[4,15]]=4
plt.imshow(src)
plt.show()
Then we align the src to the destination:
rotated_src = rotate(src, 20, order=1) # find the angle 20, reshape true is by default
plt.imshow(rotated_src)
plt.show()
distance_y = 8 # find this distances from rotated_src and dst
distance_x = 12 # use any visual reference or even the corners
translated_src = shift(rotated_src, [distance_y,distance_x])
plt.imshow(translated_src)
plt.show()
P.S.: If you have problems finding the angle and the distances programmatically, please leave a comment providing a bit more insight into what can be used as a reference (that could be, for example, the frame of the image or some image features/data).

Implementing a bilateral filter

I am trying to implement a bilateral filter from the paper Fast Bilateral Filtering for the Display of High-Dynamic-Range Images. The equations from the paper that define the bilateral filter (numbers 6 and 7, referenced in the code below) are:

J(s) = (1 / k(s)) * sum over p in Ω of f(p - s) * g(I_p - I_s) * I_p      (Eq. 6)
k(s) = sum over p in Ω of f(p - s) * g(I_p - I_s)                         (Eq. 7)
According to what I understood,
f is a Gaussian filter
g is a Gaussian filter
p is a pixel in a given image window
s is the current pixel
Ip is the intensity at the current pixel
With this, I wrote the code to implement these equations, given as :
import cv2
import numpy as np
img = cv2.imread("fish.png")
# image of width 239 and height 200
bl_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
i = cv2.magnitude(
    cv2.Sobel(bl_img, cv2.CV_64F, 1, 0, ksize=3),
    cv2.Sobel(bl_img, cv2.CV_64F, 0, 1, ksize=3)
)
f = cv2.getGaussianKernel(5, 0.1, cv2.CV_64F)
g = cv2.getGaussianKernel(5, 0.1, cv2.CV_64F)
rows, cols, _ = img.shape
filtered = np.zeros(img.shape, dtype=img.dtype)
for r in range(rows):
    for c in range(cols):
        ks = []
        for index in [-2, -1, 1, 2]:
            if index + c > 0 and index + c < cols-1:
                p = img[r][index + c]
                s = img[r][c]
                i_p = i[index+c]
                i_s = i[c]
                ks.append(
                    (f * (p-s)) * (g * (i_p * i_s))  # EQUATION 7
                )
        ks = np.sum(np.array(ks))
        js = []
        for index in [-2, -1, 1, 2]:
            if index + c > 0 and index + c < cols-1:
                p = img[r][index + c]
                s = img[r][c]
                i_p = i[index+c]
                i_s = i[c]
                js.append((f * (p-s)) * (g * (i_p * i_s)) * i_p)  # EQUATION 6
        js = np.sum(np.asarray(js))
        js = js / ks
        filtered[r][c] = js
cv2.imwrite("f.png", filtered)
But as I run this code I get an error saying:
Traceback (most recent call last):
File "bft.py", line 33, in <module>
(f * (p-s)) * (g * (i_p * i_s))
ValueError: operands could not be broadcast together with shapes (5,3) (5,239)
Did I incorrectly implement the equations? What am I missing?
There are various issues with your code. Foremost, the equation is being interpreted in the wrong way. f(p-s) means evaluating the function f at p-s, and f is a Gaussian; likewise with g. That section of the code should look like this:
weight = gaussian(p - s, sigma_f) * gaussian(i_p - i_s, sigma_g)
ks.append(weight)
js.append(weight * i_p)
Note that the two loops can be merged, this way you avoid some duplicated computation. gaussian(x, sigma) would be a function that computes the Gaussian weight at x. You need to define two sigmas, sigma_f and sigma_g, the spatial and the tonal sigma respectively.
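A minimal sketch of such a helper (the name gaussian and the sigma values here are assumptions for illustration, not part of the original code):

import math

def gaussian(x, sigma):
    # Unnormalized Gaussian weight; the normalization constant cancels in js / ks
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

sigma_f = 2.0    # spatial sigma, in pixels
sigma_g = 20.0   # tonal (range) sigma, in intensity units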
The second issue is in the definition of p and s. These are the coordinates of the pixel, not the value of the image at the pixel. i_p and i_s are the value of the image at those locations. p-s is basically the spatial distance between the pixel at (r,c) and the given neighbor.
The third issue is the loop over the neighborhood. The neighborhood is all pixels where gaussian(p - s, sigma_f) is not negligible. So how large the neighborhood is depends on the chosen sigma_f. You should take it at least to be ceil(2*sigma_f). Say sigma_f is 2, then you want the neighborhood to go from -4 to 4 (9 pixels). But this neighborhood is two dimensional, not one-dimensional as in your code. So you need two loops:
for ii in range(-ceil(2*sigma_f), ceil(2*sigma_f)+1):
    if ii + c > 0 and ii + c < cols-1:
        for jj in range(-ceil(2*sigma_f), ceil(2*sigma_f)+1):
            if jj + r > 0 and jj + r < rows-1:
                # compute weight here
Note that now, p-s is computed with math.sqrt(ii**2 + jj**2). But also note that the Gaussian uses x**2, so you could skip the computation of the square root by passing x**2 into your gaussian function.
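Putting these corrections together, the filtering loop might look something like the sketch below (assuming the gaussian helper and sigmas above, and filtering the grayscale image bl_img from the question directly; variable names are illustrative):

import numpy as np
from math import ceil, sqrt

radius = int(ceil(2 * sigma_f))
rows, cols = bl_img.shape
filtered = np.zeros_like(bl_img, dtype=np.float64)
for r in range(rows):
    for c in range(cols):
        i_s = float(bl_img[r, c])
        ks = 0.0
        js = 0.0
        for jj in range(-radius, radius + 1):
            for ii in range(-radius, radius + 1):
                rr, cc = r + jj, c + ii
                if 0 <= rr < rows and 0 <= cc < cols:
                    i_p = float(bl_img[rr, cc])
                    # spatial weight f(p - s) times tonal weight g(I_p - I_s)
                    weight = gaussian(sqrt(ii * ii + jj * jj), sigma_f) * gaussian(i_p - i_s, sigma_g)
                    ks += weight
                    js += weight * i_p
        filtered[r, c] = js / ks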

Finding the position of an object in an image

My goal is to find the location of specific images on other images, using python. Take this example:
I want to find the location of the walnut in the image. The image of the walnut is known, so I think there is no need for any kind of advanced pattern matching or machine learning to tell if something is a walnut or not.
How would I go about finding the walnut in the image? Would a strategy along these lines work:
- read images with PIL
- transform them into Numpy arrays
- use Scipy's image filters (what filters?)
Thanks!
I would go with pure PIL.
1. Read in the image and the walnut.
2. Take any pixel of the walnut.
3. Find all pixels of the image with the same color.
4. Check whether the surrounding pixels coincide with the surrounding pixels of the walnut (and break as soon as you find one mismatch, to minimize time).
Now, if the picture uses lossy compression (like JFIF), the walnut in the image won't be exactly the same as the walnut pattern. In this case, you could define some threshold for the comparison.
EDIT: I used the following code (by converting white to alpha, the colors of the original walnut slightly changed):
#! /usr/bin/python2.7
from PIL import Image, ImageDraw

im = Image.open ('zGjE6.png')
isize = im.size
walnut = Image.open ('walnut.png')
wsize = walnut.size
x0, y0 = wsize [0] // 2, wsize [1] // 2
pixel = walnut.getpixel ( (x0, y0) ) [:-1]

def diff (a, b):
    return sum ( (a - b) ** 2 for a, b in zip (a, b) )

best = (100000, 0, 0)
for x in range (isize [0] ):
    for y in range (isize [1] ):
        ipixel = im.getpixel ( (x, y) )
        d = diff (ipixel, pixel)
        if d < best [0]: best = (d, x, y)

draw = ImageDraw.Draw (im)
x, y = best [1:]
draw.rectangle ( (x - x0, y - y0, x + x0, y + y0), outline = 'red')
im.save ('out.png')
Basically, this takes a single pixel of the walnut and looks for the best match. It is a first step with a not-too-bad output:
What you still want to do is:
1. Increase the sample space (not using only one pixel, but maybe 10 or 20).
2. Check not only the best match, but for instance the best 10 matches.
EDIT 2: Some improvements
#! /usr/bin/python2.7
import random
import sys
from PIL import Image, ImageDraw

im, pattern, samples = sys.argv [1:]
samples = int (samples)
im = Image.open (im)
walnut = Image.open (pattern)
pixels = []
while len (pixels) < samples:
    x = random.randint (0, walnut.size [0] - 1)
    y = random.randint (0, walnut.size [1] - 1)
    pixel = walnut.getpixel ( (x, y) )
    if pixel [-1] > 200:
        pixels.append ( ( (x, y), pixel [:-1] ) )

def diff (a, b):
    return sum ( (a - b) ** 2 for a, b in zip (a, b) )

best = []
for x in range (im.size [0] ):
    for y in range (im.size [1] ):
        d = 0
        for coor, pixel in pixels:
            try:
                ipixel = im.getpixel ( (x + coor [0], y + coor [1] ) )
                d += diff (ipixel, pixel)
            except IndexError:
                d += 256 ** 2 * 3
        best.append ( (d, x, y) )

best.sort (key = lambda x: x [0] )
best = best [:3]
draw = ImageDraw.Draw (im)
for best in best:
    x, y = best [1:]
    draw.rectangle ( (x, y, x + walnut.size [0], y + walnut.size [1] ), outline = 'red')
im.save ('out.png')
Running this with scriptname.py image.png walnut.png 5 yields for instance:
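If OpenCV happens to be available, essentially the same search can also be done with cv2.matchTemplate; a brief sketch (file names are placeholders), offered as an alternative rather than a replacement for the PIL approach above:

import cv2

img = cv2.imread('image.png')
walnut = cv2.imread('walnut.png')
res = cv2.matchTemplate(img, walnut, cv2.TM_SQDIFF_NORMED)
_, _, min_loc, _ = cv2.minMaxLoc(res)       # for SQDIFF the best match is the minimum
x, y = min_loc                               # top-left corner of the best match
h, w = walnut.shape[:2]
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite('out.png', img)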

3d rotation on image

I'm trying to get some code that will perform a perspective transformation (in this case a 3d rotation) on an image.
import os.path
import numpy as np
import cv
def rotation(angle, axis):
    return np.eye(3) + np.sin(angle) * skew(axis) \
        + (1 - np.cos(angle)) * skew(axis).dot(skew(axis))

def skew(vec):
    return np.array([[0, -vec[2], vec[1]],
                     [vec[2], 0, -vec[0]],
                     [-vec[1], vec[0], 0]])

def rotate_image(imgname_in, angle, axis, imgname_out=None):
    if imgname_out is None:
        base, ext = os.path.splitext(imgname_in)
        imgname_out = base + '-out' + ext
    img_in = cv.LoadImage(imgname_in)
    img_size = cv.GetSize(img_in)
    img_out = cv.CreateImage(img_size, img_in.depth, img_in.nChannels)
    transform = rotation(angle, axis)
    cv.WarpPerspective(img_in, img_out, cv.fromarray(transform))
    cv.SaveImage(imgname_out, img_out)
When I rotate about the z-axis, everything works as expected, but rotating around the x or y axis seems completely off. I need to rotate by angles as small as pi/200 before I start getting results that seem at all reasonable. Any idea what could be wrong?
First, build the rotation matrix, of the form
    [cos(theta)  -sin(theta)  0]
R = [sin(theta)   cos(theta)  0]
    [0            0           1]
Applying this coordinate transform gives you a rotation around the origin.
If, instead, you want to rotate around the image center, you have to first shift the image center
to the origin, then apply the rotation, and then shift everything back. You can do so using a
translation matrix:
    [1  0  -image_width/2 ]
T = [0  1  -image_height/2]
    [0  0   1             ]
The transformation matrix for translation, rotation, and inverse translation then becomes:
H = inv(T) * R * T
I'll have to think a bit about how to relate the skew matrix to the 3D transformation. I expect the easiest route is to set up a 4D transformation matrix, and then to project that back to 2D homogeneous coordinates. But for now, the general form of the skew matrix:
    [x_scale  0        0]
S = [0        y_scale  0]
    [x_skew   y_skew   1]
The x_skew and y_skew values are typically tiny (1e-3 or less).
Here's the code:
from skimage import data, transform
import numpy as np
import matplotlib.pyplot as plt
img = data.camera()
theta = np.deg2rad(10)
tx = 0
ty = 0
S, C = np.sin(theta), np.cos(theta)
# Rotation matrix, angle theta, translation tx, ty
H = np.array([[C, -S, tx],
              [S, C, ty],
              [0, 0, 1]])
# Translation matrix to shift the image center to the origin
r, c = img.shape
T = np.array([[1, 0, -c / 2.],
              [0, 1, -r / 2.],
              [0, 0, 1]])
# Skew, for perspective
S = np.array([[1, 0, 0],
              [0, 1.3, 0],
              [0, 1e-3, 1]])
img_rot = transform.homography(img, H)
img_rot_center_skew = transform.homography(img, S.dot(np.linalg.inv(T).dot(H).dot(T)))
f, (ax0, ax1, ax2) = plt.subplots(1, 3)
ax0.imshow(img, cmap=plt.cm.gray, interpolation='nearest')
ax1.imshow(img_rot, cmap=plt.cm.gray, interpolation='nearest')
ax2.imshow(img_rot_center_skew, cmap=plt.cm.gray, interpolation='nearest')
plt.show()
And the output:
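A side note for anyone running this today: transform.homography has since been removed from scikit-image. A roughly equivalent call (a sketch only; the mapping direction is worth double-checking against your version) uses ProjectiveTransform together with warp:

from skimage import transform

tform = transform.ProjectiveTransform(matrix=S.dot(np.linalg.inv(T).dot(H).dot(T)))
img_rot_center_skew = transform.warp(img, tform.inverse)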
I do not get the way you build your rotation matrix. It seems rather complicated to me. Usually it would be built by constructing a zero matrix, putting 1 on the unneeded axes, and putting the usual cos/sin entries into the two used dimensions, then multiplying all of these together.
Where did you get that np.eye(3) + np.sin(angle) * skew(axis) + (1 - np.cos(angle)) * skew(axis).dot(skew(axis)) construct from?
Try building the projection matrix from basic building blocks. Constructing a rotation matrix is fairly easy, and "rotationmatrix dot skewmatrix" should work.
You might need to pay attention to the rotation center though. Your image probably is placed at a virtual position of 1 on the z axis, so by rotating on x or y, it moves around a bit.
So you'd need to use a translation so z becomes 0, then rotate, then translate back. (Translation matrices in affine coordinates are pretty simple, too. See Wikipedia: https://en.wikipedia.org/wiki/Transformation_matrix )
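For the 4D route mentioned in the first answer, here is a sketch of one common recipe (the function rotation_homography, its parameters, and the default choice of f are all assumptions for illustration, not code from either answer): lift image coordinates into 3D, rotate, push the image plane out to distance f, and project back down to a 3×3 homography that cv2.warpPerspective can take.

import numpy as np

def rotation_homography(w, h, theta_x=0.0, theta_y=0.0, theta_z=0.0, f=None):
    """Build a 3x3 homography showing the w-by-h image rotated in 3D about its centre."""
    if f is None:
        f = np.hypot(w, h)                    # a plausible default "focal length"
    # 2D homogeneous -> 3D: move the image centre to the origin, put the plane at z = 0
    A1 = np.array([[1, 0, -w / 2],
                   [0, 1, -h / 2],
                   [0, 0, 0],
                   [0, 0, 1]], dtype=float)
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    Rx = np.array([[1, 0, 0, 0], [0, cx, -sx, 0], [0, sx, cx, 0], [0, 0, 0, 1]])
    Ry = np.array([[cy, 0, sy, 0], [0, 1, 0, 0], [-sy, 0, cy, 0], [0, 0, 0, 1]])
    Rz = np.array([[cz, -sz, 0, 0], [sz, cz, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
    # Push the rotated plane out to z = f, then project back to 2D around the image centre
    T = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, f], [0, 0, 0, 1]], dtype=float)
    A2 = np.array([[f, 0, w / 2, 0],
                   [0, f, h / 2, 0],
                   [0, 0, 1, 0]], dtype=float)
    return A2 @ T @ Rx @ Ry @ Rz @ A1

# e.g. H = rotation_homography(w, h, theta_x=np.pi / 12)
#      out = cv2.warpPerspective(img, H, (w, h))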
