I'm trying to implement the loss function of the classic Image Colorization paper by Levin et al. (2004) in TensorFlow/Keras:
This is the weights equation (correlation between intensities):
y is every neighboring pixel of x in a 3x3 window and w is the weight for each of these pixels.
The weights require computing the mean and variance for the neighborhood of every pixel.
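For reference, this is how I read the cost function and the correlation-based weights in Levin et al. Here r is the pixel called x above, s ranges over its 3x3 neighbors N(r) (the y above), Y is the intensity, U is the channel being solved for, and μ_r and σ_r² are the mean and variance of Y in the window around r. The ∝ indicates that the paper normalizes the weights to sum to 1:

```latex
J(U) = \sum_{r} \Big( U(r) - \sum_{s \in N(r)} w_{rs}\, U(s) \Big)^2,
\qquad
w_{rs} \propto 1 + \frac{1}{\sigma_r^2}\,\big(Y(r) - \mu_r\big)\big(Y(s) - \mu_r\big)
```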
I couldn't find a function that would let me write this loss function symbolically, so I'm thinking of writing it as a loop that calculates w for each window.
How can I write this loss function in TensorFlow, either symbolically or with loops?
Thanks so much.
EDIT: Here's the code I've come up with for calculating the weights in NumPy:
import cv2
import numpy as np
im = cv2.resize(cv2.imread('./Image.jpg', 0), (256, 256)) / np.float32(255.0)
M = 3
N = 3
# Split the image into 3x3 windows
windows = [im[x:x + M, y:y + N] for x in range(0, im.shape[0], M) for y in range(0, im.shape[1], N)]
# Calculate the correlation for each window
weights = [1 + np.corrcoef(tile) for tile in windows]
I think this code computes the value in your formula:
import tensorflow as tf  # TF 1.x graph API; in TF 2.x use tf.compat.v1 with eager execution disabled
from itertools import product
SIGMA = 1.0
dtype = tf.float32
# Input images batch, shape (batch, height, width)
img = tf.placeholder(dtype, [None, None, None])
img_shape = tf.shape(img)
img_height = img_shape[1]
img_width = img_shape[2]
# Compute 3 x 3 block means
mean_filter = tf.ones((3, 3), dtype) / 9
img_mean = tf.nn.conv2d(img[:, :, :, tf.newaxis],
                        mean_filter[:, :, tf.newaxis, tf.newaxis],
                        [1, 1, 1, 1], 'VALID')[:, :, :, 0]
# Remove 1px border
img_clip = img[:, 1:-1, 1:-1]
# Difference between pixel intensity and its block mean
x_diff = img_clip - img_mean
# Compute neighboring pixel loss contributions
contributions = []
for i, j in product((-1, 0, 1), repeat=2):
    if i == j == 0:
        continue
    # Take "shifted" image (axis 1 is the height, axis 2 is the width)
    displaced_img = img[:, 1 + i:img_height - 1 + i, 1 + j:img_width - 1 + j]
    # Compute difference with mean of corresponding pixel block
    y_diff = displaced_img - img_mean
    # Weights formula
    weight = 1 + x_diff * y_diff / (SIGMA ** 2)
    # Contribution of this displaced image to the loss of each pixel
    contribution = weight * displaced_img
    contributions.append(contribution)
contributions = tf.add_n(contributions)
# Compute loss value
loss = tf.reduce_sum(tf.squared_difference(img_clip, contributions))
The loss for the pixels along the image border is not computed, since in principle it is not well defined by the formula, although you could make a few changes to take them into account if you want (change the convolution to 'SAME', pad where necessary, etc.).
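To sanity-check the graph above, or to include the border pixels, a plain NumPy reference can be handy. This is a minimal sketch, assuming NumPy >= 1.20 for `sliding_window_view`; `colorization_loss` and its `u`/`pad` parameters are hypothetical names. Like the TensorFlow code, it uses the constant `SIGMA` instead of the local variance, leaves the weights unnormalized, and by default penalizes the same image it takes the weights from. `pad=True` replicates the edge pixels (`np.pad` with `mode="edge"`), which is one possible way to give border pixels a full 3x3 window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def colorization_loss(y, u=None, sigma=1.0, pad=False):
    """NumPy reference for the loss computed by the TensorFlow graph above.

    y: 2-D intensity image used for the weights; u: image being penalized
    (defaults to y, as in the TF code); sigma: constant, like SIGMA above.
    """
    y = np.asarray(y, dtype=np.float64)
    u = y if u is None else np.asarray(u, dtype=np.float64)
    if pad:
        # Replicate edge pixels so border pixels also get a full 3x3 window
        y = np.pad(y, 1, mode="edge")
        u = np.pad(u, 1, mode="edge")
    y_win = sliding_window_view(y, (3, 3))  # (H-2, W-2, 3, 3): window around each pixel
    u_win = sliding_window_view(u, (3, 3))
    mu = y_win.mean(axis=(-2, -1), keepdims=True)  # block mean
    x_diff = y_win[..., 1:2, 1:2] - mu             # centre pixel minus block mean
    y_diff = y_win - mu                            # each neighbour minus block mean
    weight = 1.0 + x_diff * y_diff / sigma ** 2
    weight[..., 1, 1] = 0.0                        # the centre is not its own neighbour
    contributions = (weight * u_win).sum(axis=(-2, -1))
    return np.sum((u_win[..., 1, 1] - contributions) ** 2)
```

For example, on a constant 5x5 image of ones every weight is 1, so each of the 9 interior pixels contributes (1 - 8)² = 49 and the loss is 441.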
Is this a mean squared error over 3 x 3 windows?
It sounds like a GLCM (gray-level co-occurrence matrix) for texture analysis. Do you want to apply this loss function to every 3x3 window in the image?
I think it is better to first build the function that does this calculation with random weights in NumPy, and then try to build it in TF for the optimization.
I would like to know how to calculate the mean and the std of a given dataset of RGB images.
For example, for ImageNet we have imagenet_stats: ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]).
I tried:
rgb_values = [np.mean(Image.open(img).getdata(), axis=0)/255 for img in imgs_path]
np.mean(rgb_values, axis=0)
np.std(rgb_values, axis=0)
I am not sure the values I get are correct.
What would be a better implementation?
Two solutions:
The first solution iterates over the images. It is MUCH slower than the second solution, and as written it uses the same amount of memory, because it first loads all the images and stores them in a list. So it is strictly worse than the second solution, unless you change how your images are loaded: load and process them one by one from disk.
The second solution needs to hold all images in memory at the same time. It is MUCH faster, because it is fully vectorized.
First solution (iterating over the images):
For each channel: R, G, B, here is how to calculate the means and stds of all the pixels in all the images:
This assumes that each image has the same number of pixels.
If this is not the case, use the second solution (below).
import numpy as np
from PIL import Image
# imgs_path is your list of image file paths
images_rgb = [np.array(Image.open(img).convert('RGB').getdata()) / 255. for img in imgs_path]
# Each image_rgb is of shape (n, 3),
# where n is the number of pixels in each image,
# and 3 are the channels: R, G, B.
means = []
for image_rgb in images_rgb:
    means.append(np.mean(image_rgb, axis=0))
mu_rgb = np.mean(means, axis=0)  # mu_rgb.shape == (3,)
variances = []
for image_rgb in images_rgb:
    var = np.mean((image_rgb - mu_rgb) ** 2, axis=0)
    variances.append(var)
std_rgb = np.sqrt(np.mean(variances, axis=0))  # std_rgb.shape == (3,)
Here is why the mean and std are the same whether they are calculated like this or using all pixels at once:
Let's say each image has n pixels (with values vals_i), and there are m images.
Then there are n*m pixels in total.
The real_mean of all pixels in all vals_i is:
total_sum = sum(vals_1) + sum(vals_2) + ... + sum(vals_m)
real_mean = total_sum / (n*m)
Adding up the means of each image individually (each image has n pixels):
sum_of_means = sum(vals_1) / n + sum(vals_2) / n + ... + sum(vals_m) / n
             = (sum(vals_1) + sum(vals_2) + ... + sum(vals_m)) / n
             = total_sum / n
Now, what is the relationship between real_mean and sum_of_means? As you can see,
real_mean = sum_of_means / m
i.e., real_mean is the average of the m per-image means.
Analogously, using the formula for standard deviation, the real_std of all pixels in all vals_i is:
sum_of_square_diffs = sum((vals_1 - real_mean) ** 2)
                    + sum((vals_2 - real_mean) ** 2)
                    + ...
                    + sum((vals_m - real_mean) ** 2)
real_std = sqrt( sum_of_square_diffs / (n*m) )
If you group the terms per image, sum((vals_i - real_mean) ** 2) / n is the average squared deviation of image i from real_mean, so real_std is the square root of the average of these m per-image values.
Real mean and std:
import numpy as np
rng = np.random.default_rng(0)
vals = rng.integers(1, 100, size=100)  # data
mu = np.mean(vals)
print(mu)
print(np.std(vals))
50.93 # real mean
28.048976808432776 # real standard deviation
Comparing it to the image-by-image approach:
n_images = 10
means = []
for subset in np.split(vals, n_images):
    means.append(np.mean(subset))
new_mu = np.mean(means)
variances = []
for subset in np.split(vals, n_images):
    var = np.mean((subset - mu) ** 2)
    variances.append(var)
new_std = np.sqrt(np.mean(variances))
print(new_mu)
print(new_std)
50.92999999999999 # calculated mean
28.048976808432784 # calculated standard deviation
Second solution (fully vectorized):
Using all the pixels of all images at once.
import numpy as np
from PIL import Image
rgb_values = np.concatenate(
    [np.array(Image.open(img).convert('RGB').getdata()) for img in imgs_path]
) / 255.
# rgb_values.shape == (n, 3),
# where n is the total number of pixels in all images,
# and 3 are the 3 channels: R, G, B.
# Each value is in the interval [0, 1]
mu_rgb = np.mean(rgb_values, axis=0)  # mu_rgb.shape == (3,)
std_rgb = np.std(rgb_values, axis=0)  # std_rgb.shape == (3,)
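If the images don't fit in memory, the one-by-one loading mentioned at the top of this answer can be done in a single pass. This is a sketch (`channel_stats_streaming` is a hypothetical name): it keeps a running count, mean and sum of squared deviations per channel and merges each image into them with the standard pairwise update of Chan et al., so it also works when the images have different sizes and should match `np.mean`/`np.std` over all pixels.

```python
import numpy as np


def channel_stats_streaming(images):
    """Per-channel mean and (population) std over all pixels of all images.

    images: iterable of arrays of shape (H, W, 3) with values in [0, 1];
    the images may have different sizes and are visited only once.
    """
    count = 0
    mean = np.zeros(3)
    m2 = np.zeros(3)  # running sum of squared deviations from the mean
    for img in images:
        px = np.asarray(img, dtype=np.float64).reshape(-1, 3)
        n_b = px.shape[0]
        mean_b = px.mean(axis=0)
        m2_b = ((px - mean_b) ** 2).sum(axis=0)
        # Merge this image's statistics into the running ones (Chan et al.)
        total = count + n_b
        delta = mean_b - mean
        mean = mean + delta * n_b / total
        m2 = m2 + m2_b + delta ** 2 * count * n_b / total
        count = total
    return mean, np.sqrt(m2 / count)
```

With PIL, the input could be a generator such as `(np.asarray(Image.open(p).convert('RGB')) / 255. for p in imgs_path)`, so that only one image is in memory at a time.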
I want to implement a custom random erasing function.
This function would take an input image and a percentage to mask, but would then mask between 1 and 4 random rectangles whose total area adds up to the mask percentage.
For example, say my image is 100×100 pixels and my mask percentage is 15%, so I randomly choose to create 3 rectangles with random shapes such that their combined area sums up to 100*100*0.15 pixels.
So far I have managed to write the code that decides on the width, height and number of rectangles, but I am struggling with the part that makes sure they don't mask the same spot.
img_c, img_h, img_w = img.shape[-3], img.shape[-2], img.shape[-1]
area = img_h * img_w
for _ in range(10):
    block_num = torch.randint(1,4,(1,)).item()
    block_sizes = torch.rand((block_num))
    block_sizes = torch.round((block_sizes / block_sizes.sum()) * (area * mask_percent))
    h = torch.round((torch.rand(block_num)+0.5) * block_sizes.sqrt())
    w = torch.round(block_sizes / h)
    xs = []
    ys = []
    if not (any(h < img_h) and any(w < img_w)):
        term = True
        while term:
            xs = [torch.randint(0, img_h - h_ + 1, size=(1, )).item() for h_ in h]
            ys = [torch.randint(0, img_w - w_ + 1, size=(1, )).item() for w_ in w]
            for iter,x in enumerate(xs):
                if (x+h[iter]-xs)<0
                    # Here I get all confused: there should be a loop that goes over each point and checks that the location + axial size
                    # doesn't go over another point. It's confusing, because should I also handle the reverse case? Maybe someone knows of a ready-made solution?
    return i, j, h, w, v
# Return original image
return 0, 0, img_h, img_w, img
The while loop exits once the random location generator produces locations that satisfy the conditions.
My latest attempt seems to work, but it always exits the loop unsolved! Is it just an unlikely set of parameters?
import torch
img = torch.rand(1,160,1024)
img_c, img_h, img_w = img.shape[-3], img.shape[-2], img.shape[-1]
area = img_h * img_w
for _ in range(100):
    block_num = torch.randint(1,3,(1,)).item()
    block_sizes = torch.rand((block_num))
    block_sizes = torch.round((block_sizes / block_sizes.sum()) * (area * 0.15))
    h = torch.round((torch.rand(block_num)+0.5) * block_sizes.sqrt()).to(dtype=torch.long)
    w = torch.round(block_sizes / h).to(dtype=torch.long)
    if (h > img_h).any() or (w > img_w).any():
        continue
    overlap1 = torch.zeros(img.shape)
    xs = [torch.randint(0, img_h - h_.item() + 1, size=(1, )).item() for h_ in h]
    ys = [torch.randint(0, img_w - w_.item() + 1, size=(1, )).item() for w_ in w]
    for iter,(x,y) in enumerate(zip(xs,ys)):
        overlap1[0,x:x+h[iter],y:y+w[iter]] += 1
    if (overlap1>1).any():
        continue
When you check that rectangle2 doesn't overlap with rectangle1, just check their intersection area. If the intersection area is greater than 0, reject rectangle2 and try the next random rectangle. Repeat the process with each new rectangle.
To check the intersection, use sklearn's Jaccard score (avoid reinventing the wheel). To use the Jaccard score for the comparison, the two arrays (masks) must be the same size, so generate masks of the original image size (mask1 and mask2) from rectangle1 and rectangle2 respectively, and then calculate the Jaccard score of mask1 and mask2. A score greater than 0 means the rectangles overlap.
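import numpy as np
from sklearn.metrics import jaccard_score
mask1 = np.array([[0, 1, 1],
                  [1, 1, 0]])
mask2 = np.array([[1, 1, 1],
                  [1, 0, 0]])
# Flatten the masks: with 2-D input, jaccard_score treats each row as a
# multilabel sample, and the default average='binary' raises an error
print(jaccard_score(mask1.ravel(), mask2.ravel()))  # 0.6 -> the masks overlap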
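As an alternative to comparing masks pairwise, the rejection idea above can also be sketched with a single boolean occupancy mask: each new rectangle is redrawn until it doesn't touch any pixel already covered. This is a NumPy sketch rather than torch code; `sample_disjoint_rects` and `max_tries` are hypothetical names, and the rectangle sizes are assumed to come from code like yours. Placing the rectangles one at a time and retrying only the one that collides tends to succeed much more often than redrawing all positions at once.

```python
import numpy as np


def sample_disjoint_rects(img_h, img_w, sizes, rng, max_tries=1000):
    """Place rectangles of the given (h, w) sizes at random, non-overlapping spots.

    Returns a list of (top, left, h, w), or None if some rectangle can't be placed.
    """
    occupied = np.zeros((img_h, img_w), dtype=bool)
    rects = []
    for h, w in sizes:
        if h > img_h or w > img_w:
            return None
        for _ in range(max_tries):
            top = rng.integers(0, img_h - h + 1)   # upper bound is exclusive
            left = rng.integers(0, img_w - w + 1)
            # Reject the candidate if it intersects anything already placed
            if not occupied[top:top + h, left:left + w].any():
                occupied[top:top + h, left:left + w] = True
                rects.append((top, left, h, w))
                break
        else:
            return None  # no free spot found for this rectangle
    return rects
```

The returned rectangles can then be used to set the erased regions of the image tensor.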
I am trying to implement a bilateral filter from the paper Fast Bilateral Filtering for the Display of High-Dynamic-Range Images. The equations (from the paper) that implement the bilateral filter are given as:
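Written out, as I understand them (the code below labels these EQUATION 6 and EQUATION 7; Ω is the window around the pixel s, p ranges over it, and I_p is the intensity at pixel p):

```latex
J_s = \frac{1}{k(s)} \sum_{p \in \Omega} f(p - s)\, g(I_p - I_s)\, I_p
\qquad (6)

k(s) = \sum_{p \in \Omega} f(p - s)\, g(I_p - I_s)
\qquad (7)
```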
According to what I understood,
f is a Gaussian filter
g is a Gaussian filter
p is a pixel in a given image window
s is the current pixel
Ip is the intensity at the current pixel
With this, I wrote the following code to implement these equations:
import cv2
import numpy as np
img = cv2.imread("fish.png")
# image of width 239 and height 200
bl_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
i = cv2.magnitude(
    cv2.Sobel(bl_img, cv2.CV_64F, 1, 0, ksize=3),
    cv2.Sobel(bl_img, cv2.CV_64F, 0, 1, ksize=3)
)
f = cv2.getGaussianKernel(5, 0.1, cv2.CV_64F)
g = cv2.getGaussianKernel(5, 0.1, cv2.CV_64F)
rows, cols, _ = img.shape
filtered = np.zeros(img.shape, dtype=img.dtype)
for r in range(rows):
    for c in range(cols):
        ks = []
        for index in [-2,-1,1,2]:
            if index + c > 0 and index + c < cols-1:
                p = img[r][index + c]
                s = img[r][c]
                i_p = i[index+c]
                i_s = i[c]
                ks.append(
                    (f * (p-s)) * (g * (i_p * i_s)) # EQUATION 7
                )
        ks = np.sum(np.array(ks))
        js = []
        for index in [-2, -1, 1, 2]:
            if index + c > 0 and index + c < cols -1:
                p = img[r][index + c]
                s = img[r][c]
                i_p = i[index+c]
                i_s = i[c]
                js.append((f * (p-s)) * (g * (i_p * i_s)) * i_p) # EQUATION 6
        js = np.sum(np.asarray(js))
        js = js / ks
        filtered[r][c] = js
cv2.imwrite("f.png", filtered)
But when I run this code, I get this error:
Traceback (most recent call last):
File "bft.py", line 33, in <module>
(f * (p-s)) * (g * (i_p * i_s))
ValueError: operands could not be broadcast together with shapes (5,3) (5,239)
Did I incorrectly implement the equations? What am I missing?
There are various issues with your code. Foremost, the equation is misinterpreted: f(p-s) means evaluating the function f at p-s, where f is the Gaussian. Likewise with g. That section of the code would look like this:
weight = gaussian(p - s, sigma_f) * gaussian(i_p - i_s, sigma_g)
js.append(weight * i_p)
Note that the two loops can be merged; this way you avoid some duplicated computation. gaussian(x, sigma) would be a function that computes the Gaussian weight at x. You need to define two sigmas, sigma_f and sigma_g, the spatial and the tonal sigma respectively.
The second issue is in the definition of p and s. These are the coordinates of the pixel, not the value of the image at the pixel. i_p and i_s are the value of the image at those locations. p-s is basically the spatial distance between the pixel at (r,c) and the given neighbor.
The third issue is the loop over the neighborhood. The neighborhood is all pixels where gaussian(p - s, sigma_f) is not negligible, so how large the neighborhood is depends on the chosen sigma_f. You should take its radius to be at least ceil(2*sigma_f). Say sigma_f is 2; then you want the neighborhood to go from -4 to 4 (9 pixels). But this neighborhood is two-dimensional, not one-dimensional as in your code, so you need two loops:
from math import ceil
radius = ceil(2*sigma_f)
for ii in range(-radius, radius+1):
    if 0 <= ii + c < cols:
        for jj in range(-radius, radius+1):
            if 0 <= jj + r < rows:
                # compute weight here
Note that now the distance |p-s| is computed as math.sqrt(ii**2 + jj**2). But also note that the Gaussian uses x**2, so you could skip computing the square root by passing x**2 into your gaussian function.
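Putting the three fixes together, a straightforward (unoptimized) sketch for a grayscale float image could look like this. Instead of looping over pixels, it loops over the 2-D neighborhood offsets and processes all pixels for each offset at once; neighbors outside the image get weight 0. `bilateral_filter` and its parameter names are hypothetical, `gaussian` takes the squared distance as suggested above, and the Gaussians are left unnormalized because the normalization cancels when dividing by the sum of the weights.

```python
import numpy as np


def gaussian(x2, sigma):
    """Unnormalized Gaussian evaluated at a *squared* distance x2."""
    return np.exp(-x2 / (2.0 * sigma ** 2))


def bilateral_filter(img, sigma_f, sigma_g):
    """Bilateral filter of a 2-D grayscale image.

    sigma_f: spatial sigma (pixels); sigma_g: tonal sigma (intensity units).
    """
    img = np.asarray(img, dtype=np.float64)
    rows, cols = img.shape
    radius = int(np.ceil(2 * sigma_f))
    # Zero padding plus a validity mask: neighbors outside the image get weight 0
    padded = np.pad(img, radius)
    valid = np.pad(np.ones_like(img), radius)
    num = np.zeros_like(img)  # sum over p of weight * I_p
    k = np.zeros_like(img)    # sum over p of weight (normalization)
    for jj in range(-radius, radius + 1):      # row offset
        for ii in range(-radius, radius + 1):  # column offset
            window = (slice(radius + jj, radius + jj + rows),
                      slice(radius + ii, radius + ii + cols))
            i_p = padded[window]  # neighbor intensity I_p for every pixel s
            weight = (gaussian(ii ** 2 + jj ** 2, sigma_f)   # f(p - s)
                      * gaussian((i_p - img) ** 2, sigma_g)  # g(I_p - I_s)
                      * valid[window])
            num += weight * i_p
            k += weight
    return num / k  # k > 0: the offset (0, 0) always has weight 1
```

The result is a float image; for `cv2.imwrite` it would need to be clipped and converted back to `uint8`. The sigmas are up to you; a larger sigma_g smooths across stronger edges.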
I've recently been attempting to implement the Lucas-Kanade algorithm for image alignment, as detailed in this paper here: https://www.ri.cmu.edu/pub_files/pub3/baker_simon_2004_1/baker_simon_2004_1.pdf
I've managed to implement the algorithm detailed on page 4 of the paper I linked, but the loss doesn't seem to converge. I've been looking over my code and my math, and can't seem to figure out where I might be going wrong.
What I've tried so far: implementing the entire algorithm, re-doing my math for calculating the Jacobian of the warp, and generally checking my code.
My code is below, as well as a more readable version on Pastebin: https://pastebin.com/j28mUV65
import cv2
import numpy as np
import matplotlib.pyplot as plt
def calculate_steepest_descent(grad_x_warped, grad_y_warped, h):
    rows, columns = grad_x_warped.shape
    steepest_descent = np.zeros((rows, columns, 8))
    warp_jacobian = np.zeros((2, 8)) # 2 x 8 because it's a homography, would be 2 x 6 if it was affine
    current_gradient = np.zeros((1, 2))
    # Convert homography matrix into parameter array for better readability with the math functions later
    p = h.flatten()
    for y in range(rows):
        for x in range(columns):
            # Calculate Jacobian of the warp at each pixel, which contains the partial derivatives of the
            # warp parameters with respect to x and y coordinates, evaluated at the current value
            # of parameters
            common_denominator = (p[6]*x + p[7]*y + 1)
            warp_jacobian[0, 0] = (x) / common_denominator
            warp_jacobian[0, 1] = (y) / common_denominator
            warp_jacobian[0, 2] = (1) / common_denominator
            warp_jacobian[0, 3] = 0
            warp_jacobian[0, 4] = 0
            warp_jacobian[0, 5] = 0
            warp_jacobian[0, 6] = (-(p[0]*(x**2) + p[1]*x*y + p[2]*x)) / (common_denominator ** 2)
            warp_jacobian[0, 7] = (-(p[1]*(y**2) + p[0]*x*y + p[2]*y)) / (common_denominator ** 2)
            warp_jacobian[1, 0] = 0
            warp_jacobian[1, 1] = 0
            warp_jacobian[1, 2] = 0
            warp_jacobian[1, 3] = (x) / common_denominator
            warp_jacobian[1, 4] = (y) / common_denominator
            warp_jacobian[1, 5] = (1) / common_denominator
            warp_jacobian[1, 6] = (-(p[3]*(x**2) + p[4]*x*y + p[5]*x)) / (common_denominator ** 2)
            warp_jacobian[1, 7] = (-(p[4]*(y**2) + p[3]*x*y + p[5]*y)) / (common_denominator ** 2)
            # Get the x and y gradient intensity values corresponding to the current pixel location
            current_gradient[0, 0] = grad_x_warped[y, x]
            current_gradient[0, 1] = grad_y_warped[y, x]
            # Calculate full Jacobian (aka steepest descent image) at current pixel value
            steepest_descent[y, x, :] = np.dot(current_gradient, warp_jacobian)
    return steepest_descent
def calculate_hessian(steepest_descent):
    rows, columns, channels = steepest_descent.shape
    hessian = np.zeros((channels, channels))
    for y in range(rows):
        for x in range(columns):
            steepest_descent_single = steepest_descent[y, x, :][np.newaxis, :]
            steepest_descent_single_transpose = np.transpose(steepest_descent_single)
            hessian_current = np.dot(steepest_descent_single_transpose, steepest_descent_single)
            hessian += hessian_current
    return hessian
def calculate_sd_param_updates(steepest_descent, img_error):
    rows, columns, channels = steepest_descent.shape
    sd_param_updates = np.zeros((8, 1))
    for y in range(rows):
        for x in range(columns):
            steepest_descent_single = steepest_descent[y, x, :][np.newaxis, :]
            steepest_descent_single_transpose = np.transpose(steepest_descent_single)
            img_error_single = img_error[y, x]
            sd_param_updates += np.dot(steepest_descent_single_transpose, img_error_single)
    return sd_param_updates
def calculate_final_param_updates(sd_param_updates, hessian):
    hessian_inverse = np.linalg.inv(hessian)
    final_param_updates = np.dot(hessian_inverse, sd_param_updates)
    return final_param_updates
if __name__ == "__main__":
    # Load image
    reference = cv2.imread('test.png')
    reference = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
    # Generate template as small block from within reference image using homography
    # 'h' is the ground truth homography for warping reference image onto template image
    template_size = (100, 100)
    h = np.float32([[1, 0, -100],[0, 1, -100],[0, 0, 1]])
    h_ground_truth = h.copy()
    template = cv2.warpPerspective(reference, h, template_size)
    # Convert template corner points to reference image coordinate plane
    template_corners = np.array([[0, 0],[0, 100],[100, 100],[100, 0]])
    h_inverse = np.linalg.inv(h)
    reference_corners = cv2.perspectiveTransform(np.array([template_corners], dtype='float32'), h_inverse)
    # Small perturbation to ground truth homography
    h_mod = np.random.uniform(low=-1.0, high=1.0, size=(h.shape))
    h_mod = np.array([[1, 1, 1],[1, 1, 1],[1, 1, 1]])
    h_mod[0, 0] = h_mod[0, 0] * 0
    h_mod[0, 1] = -h_mod[0, 1] * 0
    h_mod[0, 2] = h_mod[0, 2] * 10
    h_mod[1, 0] = h_mod[1, 0] * 0
    h_mod[1, 1] = h_mod[1, 1] * 0
    h_mod[1, 2] = h_mod[1, 2] * 10
    h_mod[2, 0] = h_mod[2, 0] * 0
    h_mod[2, 1] = h_mod[2, 1] * 0
    h_mod[2, 2] = h_mod[2, 1] * 0
    h = h + h_mod
    # Warp reference image to template image based on initial perturbed homography
    reference_transformed = cv2.warpPerspective(reference, h, template_size)
    # ##############################
    # Lucas-Kanade algorithm below
    # This is supposed to calculate the homography that undoes the small perturbation
    # and returns a homography as close as possible to the ground truth homography
    # ##############################
    # Precompute image gradients
    grad_x = cv2.Sobel(reference,cv2.CV_64F,1,0,ksize=1)
    grad_y = cv2.Sobel(reference,cv2.CV_64F,0,1,ksize=1)
    # Loop algorithm for given # of steps
    for i in range(1000):
        # Step 1
        # Warp reference image onto coordinate frame of template
        reference_transformed = cv2.warpPerspective(reference, h, template_size)
        # Step 2
        # Compute error image
        img_error = template - reference_transformed
        # fig_overlay = plt.figure()
        # ax1 = fig_overlay.add_subplot(1,3,1)
        # plt.imshow(img_warped)
        # ax2 = fig_overlay.add_subplot(1,3,2)
        # plt.imshow(template)
        # ax3 = fig_overlay.add_subplot(1,3,3)
        # plt.imshow(img_error)
        # plt.show()
        # Step 3
        # Warp the gradients
        grad_x_warped = cv2.warpPerspective(grad_x, h, template_size)
        grad_y_warped = cv2.warpPerspective(grad_y, h, template_size)
        # Step 4 & 5
        # Use Jacobian of warp to calculate steepest descent images
        steepest_descent = calculate_steepest_descent(grad_x_warped, grad_y_warped, h)
        # fig_overlay = plt.figure()
        # ax1 = fig_overlay.add_subplot(1,8,1)
        # plt.imshow(steepest_descent[:, :, 0])
        # ax2 = fig_overlay.add_subplot(1,8,2)
        # plt.imshow(steepest_descent[:, :, 1])
        # ax3 = fig_overlay.add_subplot(1,8,3)
        # plt.imshow(steepest_descent[:, :, 2])
        # ax4 = fig_overlay.add_subplot(1,8,4)
        # plt.imshow(steepest_descent[:, :, 3])
        # ax5 = fig_overlay.add_subplot(1,8,5)
        # plt.imshow(steepest_descent[:, :, 4])
        # ax6 = fig_overlay.add_subplot(1,8,6)
        # plt.imshow(steepest_descent[:, :, 5])
        # ax7 = fig_overlay.add_subplot(1,8,7)
        # plt.imshow(steepest_descent[:, :, 6])
        # ax8 = fig_overlay.add_subplot(1,8,8)
        # plt.imshow(steepest_descent[:, :, 7])
        # plt.show()
        # Step 6
        # Compute Hessian matrix
        hessian = calculate_hessian(steepest_descent)
        # Step 7
        # Compute steepest descent parameter updates by
        # dot producting error image with steepest descent images
        sd_param_updates = calculate_sd_param_updates(steepest_descent, img_error)
        # Step 8
        # Compute final parameter updates
        final_param_updates = calculate_final_param_updates(sd_param_updates, hessian)
        # Step 9
        # Update the parameters
        h = h.reshape(-1,1)
        h[:-1] += final_param_updates
        h = h.reshape(3,3)
        # Step 10
        # Calculate norm of parameter updates
        final_param_update_norm = np.linalg.norm(final_param_updates)
        print("Final Param Norm: {}".format(final_param_update_norm))
        reference_transformed = cv2.warpPerspective(reference, h, template_size)
        cv2.imwrite('warps/warp_{}.png'.format(i), reference_transformed)
    # Warp source image to destination based on homography
    reference_transformed = cv2.warpPerspective(reference, h, template_size)
    cv2.imwrite('final_warp.png', reference_transformed)
It should just need a reference image to test with.
The expected result is that the algorithm converges to a homography that matches the ground truth homography I calculate in the code, but the loss just seems to explode instead and I end up with a totally incorrect homography.
This should be a comment, since I am not certain it is the full cause of your problem, but it might be part of it.
To solve a system of linear equations, don't compute the inverse
hessian_inverse = np.linalg.inv(hessian)
and then multiply by it
final_param_updates = np.dot(hessian_inverse, sd_param_updates)
This is both wasteful and can cause more numerical instability than solving systems of linear equations normally have.
Instead, use np.linalg.solve.
Computing the inverse repeats some of the operations that solve needs, once for each column of the identity matrix, and none of those extra operations are needed.
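A sketch of Steps 6 to 8 that uses `np.linalg.solve` instead of the explicit inverse. As an additional assumption, it also replaces the per-pixel loops of `calculate_hessian` and `calculate_sd_param_updates` with equivalent matrix products over the flattened steepest-descent images; `param_update` is a hypothetical name.

```python
import numpy as np


def param_update(steepest_descent, img_error):
    """Steps 6-8: solve H @ dp = sum(SD^T * error) without forming H^-1."""
    n_params = steepest_descent.shape[-1]
    sd = steepest_descent.reshape(-1, n_params)                   # (N, 8): one row per pixel
    err = np.asarray(img_error, dtype=np.float64).reshape(-1, 1)  # (N, 1)
    hessian = sd.T @ sd               # Step 6: sum of per-pixel outer products
    sd_param_updates = sd.T @ err     # Step 7: steepest-descent parameter updates
    return np.linalg.solve(hessian, sd_param_updates)  # Step 8, no explicit inverse
```

The loops in the original are not wrong, only slow; the change this answer is about is the last line.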
I have been stuck here for some time now. I cannot understand what I am doing wrong in calculating the displacement vectors along the x-axis and y-axis using the Lucas-Kanade method.
I implemented it as given in the Wikipedia link above. Here is what I have done:
import cv2
import numpy as np
img_a = cv2.imread("./images/1.png",0)
img_b = cv2.imread("./images/2.png",0)
# Calculate gradient along x and y axis
ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize = 3, scale = 1.0/3.0)
iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize = 3, scale = 1.0/3.0)
# Calculate temporal difference between the 2 images
it = img_b - img_a
ix = ix.flatten()
iy = iy.flatten()
it = -it.flatten()
A = np.vstack((ix, iy)).T
atai = np.linalg.inv(np.dot(A.T,A))
atb = np.dot(A.T, it)
v = np.dot(np.dot(np.linalg.inv(np.dot(A.T,A)),A.T),it)
This code runs without an error, but it prints an array of 2 values! I had expected the v matrix to be the same size as the image. Why does this happen? What am I doing incorrectly?
PS: I know there are methods directly available with OpenCV but I want to write this simple algorithm (as also given in the Wikipedia link shared above) myself.
To properly compute the Lucas–Kanade optical flow estimate you need to solve the system of two equations for every pixel, using information from its neighborhood, not for the image as a whole.
This is the recipe (notation refers to that used on the Wikipedia page):
Compute the image gradient (A) for the first image (ix, iy in the OP) using any method (Sobel is OK, I prefer Gaussian derivatives; note that it is important to apply the right scaling in Sobel: 1/8).
ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize = 3, scale = 1.0/8.0)
iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize = 3, scale = 1.0/8.0)
Compute the structure tensor (ATWA): Axx = ix * ix, Axy = ix * iy, Ayy = iy * iy. Each of these three images must be smoothed with a Gaussian filter (this is the windowing). For example,
Axx = cv2.GaussianBlur(ix * ix, (0,0), 5)
Axy = cv2.GaussianBlur(ix * iy, (0,0), 5)
Ayy = cv2.GaussianBlur(iy * iy, (0,0), 5)
These three images together form the structure tensor, which is a 2x2 symmetric matrix at each pixel. For a pixel at (i,j), the matrix is:
| Axx(i,j) Axy(i,j) |
| Axy(i,j) Ayy(i,j) |
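Compute the temporal derivative (it in the OP) by subtracting the two images, converting them to floating point first so that the uint8 subtraction doesn't wrap around. The right-hand side b is its negative (on the Wikipedia page b contains -I_t; the OP's it = -it does the same).
it = img_b.astype(np.float64) - img_a.astype(np.float64)
b = -it
Compute ATWb: Abx = ix * b, Aby = iy * b, and smooth these two images with the same Gaussian filter as above.
Abx = cv2.GaussianBlur(ix * b, (0,0), 5)
Aby = cv2.GaussianBlur(iy * b, (0,0), 5)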
Compute the inverse of ATWA (a symmetric positive-definite matrix) and multiply by ATWb. Note that this inverse is of the 2x2 matrix at each pixel, not of the images as a whole. You can write this out as a set of simple arithmetic operations on the images Axx, Axy, Ayy, Abx and Aby.
The inverse of the matrix ATWA is given by:
| Ayy -Axy |
| -Axy Axx | / ( Axx*Ayy - Axy*Axy )
so you can write the solution as
norm = Axx*Ayy - Axy*Axy
vx = ( Ayy * Abx - Axy * Aby ) / norm
vy = ( Axx * Aby - Axy * Abx ) / norm
If the image is natural, it will have at least a tiny bit of noise, and norm will not have zeros. But for artificial images norm could have zeros, meaning you can't divide by it. Simply adding a small value to it will avoid division by zero errors: norm += 1e-6.
The size of the Gaussian filter is chosen as a compromise between precision and allowed motion speed: a larger filter will yield less precise results, but will work with larger shifts between images.
Typically, vx and vy are only evaluated where both eigenvalues of the matrix ATWA are sufficiently large (if at least one is small, the result is inaccurate or possibly wrong).
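The recipe above, collected into one function. This is a minimal sketch that uses only NumPy, so it runs without OpenCV: `np.gradient` (central differences) stands in for the scaled Sobel, a hand-written separable Gaussian blur with reflected borders stands in for `cv2.GaussianBlur`, and the right-hand side is b = -I_t as in the Wikipedia notation. `lucas_kanade_dense` and `gaussian_blur` are hypothetical names.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders (stand-in for cv2.GaussianBlur)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # 'valid' convolution trims the padding again: rows first, then columns
    tmp = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, tmp, kernel, mode="valid")


def lucas_kanade_dense(img_a, img_b, sigma=5.0, eps=1e-6):
    """Per-pixel Lucas-Kanade flow (vx, vy) from img_a to img_b."""
    a = np.asarray(img_a, dtype=np.float64)  # float avoids uint8 wrap-around
    b_img = np.asarray(img_b, dtype=np.float64)
    iy, ix = np.gradient(a)                  # spatial derivatives (rows = y, cols = x)
    b = -(b_img - a)                         # right-hand side b = -I_t
    # Structure tensor A^T W A: one symmetric 2x2 matrix per pixel
    axx = gaussian_blur(ix * ix, sigma)
    axy = gaussian_blur(ix * iy, sigma)
    ayy = gaussian_blur(iy * iy, sigma)
    # A^T W b: one 2-vector per pixel
    abx = gaussian_blur(ix * b, sigma)
    aby = gaussian_blur(iy * b, sigma)
    # Closed-form inverse of each 2x2 matrix, with eps against division by zero
    norm = axx * ayy - axy * axy + eps
    vx = (ayy * abx - axy * aby) / norm
    vy = (axx * aby - axy * abx) / norm
    return vx, vy


if __name__ == "__main__":
    yy, xx = np.mgrid[0:64, 0:64].astype(np.float64)
    img_a = np.sin(0.3 * xx) + np.cos(0.2 * yy)
    img_b = np.sin(0.3 * (xx - 0.5)) + np.cos(0.2 * (yy - 0.25))  # shifted by (0.5, 0.25)
    vx, vy = lucas_kanade_dense(img_a, img_b)
    print(np.median(vx[20:-20, 20:-20]), np.median(vy[20:-20, 20:-20]))
```

On this synthetic pair the medians away from the borders should come out close to the true shift of 0.5 and 0.25 pixels.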
Using DIPlib (disclosure: I'm an author) this is all very easy because it supports images with a matrix at each pixel. You would do this as follows:
import diplib as dip
img_a = dip.ImageRead("./images/1.png")
img_b = dip.ImageRead("./images/2.png")
A = dip.Gradient(img_a, [1.0])
b = img_a - img_b  # b = -I_t, as in the Wikipedia notation
ATA = dip.Gauss(A * dip.Transpose(A), [5.0])
ATb = dip.Gauss(A * b, [5.0])
v = dip.Inverse(ATA) * ATb