I am trying to implement a bilateral filter from the paper Fast Bilateral Filtering for the Display of High-Dynamic-Range Images. The equations (6) and (7) from the paper that define the bilateral filter are:

J(s) = (1 / k(s)) · Σ_{p ∈ Ω} f(p − s) g(I_p − I_s) I_p        (6)

k(s) = Σ_{p ∈ Ω} f(p − s) g(I_p − I_s)        (7)
According to what I understood,
f is a Gaussian filter
g is a Gaussian filter
p is a pixel in a given image window
s is the current pixel
Ip is the intensity at the current pixel
With this, I wrote the code to implement these equations:
import cv2
import numpy as np

img = cv2.imread("fish.png")  # image of width 239 and height 200
bl_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

i = cv2.magnitude(
    cv2.Sobel(bl_img, cv2.CV_64F, 1, 0, ksize=3),
    cv2.Sobel(bl_img, cv2.CV_64F, 0, 1, ksize=3)
)

f = cv2.getGaussianKernel(5, 0.1, cv2.CV_64F)
g = cv2.getGaussianKernel(5, 0.1, cv2.CV_64F)

rows, cols, _ = img.shape
filtered = np.zeros(img.shape, dtype=img.dtype)

for r in range(rows):
    for c in range(cols):
        ks = []
        for index in [-2, -1, 1, 2]:
            if index + c > 0 and index + c < cols - 1:
                p = img[r][index + c]
                s = img[r][c]
                i_p = i[index + c]
                i_s = i[c]
                ks.append(
                    (f * (p - s)) * (g * (i_p * i_s))  # EQUATION 7
                )
        ks = np.sum(np.array(ks))

        js = []
        for index in [-2, -1, 1, 2]:
            if index + c > 0 and index + c < cols - 1:
                p = img[r][index + c]
                s = img[r][c]
                i_p = i[index + c]
                i_s = i[c]
                js.append((f * (p - s)) * (g * (i_p * i_s)) * i_p)  # EQUATION 6
        js = np.sum(np.asarray(js))
        js = js / ks
        filtered[r][c] = js

cv2.imwrite("f.png", filtered)
But when I run this code I get this error:
Traceback (most recent call last):
File "bft.py", line 33, in <module>
(f * (p-s)) * (g * (i_p * i_s))
ValueError: operands could not be broadcast together with shapes (5,3) (5,239)
Did I incorrectly implement the equations? What am I missing?
There are various issues with your code. Foremost, you have interpreted the equation in the wrong way: f(p-s) means evaluating the function f at p-s, and f is the Gaussian; likewise with g. That section of the code would look like this:
weight = gaussian(p - s, sigma_f) * gaussian(i_p - i_s, sigma_g)
ks.append(weight)
js.append(weight * i_p)
Note that the two loops can be merged; this way you avoid some duplicated computation. gaussian(x, sigma) would be a function that computes the Gaussian weight at x. You need to define two sigmas, sigma_f and sigma_g, the spatial and the tonal sigma respectively.
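A minimal sketch of such a helper, assuming the unnormalized Gaussian (the normalization constant cancels when js is divided by ks, so it can be omitted):

import numpy as np

def gaussian(x, sigma):
    # Unnormalized Gaussian weight at x; the constant factor cancels
    # in the ratio js/ks, so it is left out.
    return np.exp(-(x * x) / (2.0 * sigma * sigma))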
The second issue is in the definition of p and s. These are the coordinates of the pixel, not the value of the image at the pixel. i_p and i_s are the value of the image at those locations. p-s is basically the spatial distance between the pixel at (r,c) and the given neighbor.
The third issue is the loop over the neighborhood. The neighborhood is all pixels where gaussian(p - s, sigma_f) is not negligible. So how large the neighborhood is depends on the chosen sigma_f. You should take it at least to be ceil(2*sigma_f). Say sigma_f is 2, then you want the neighborhood to go from -4 to 4 (9 pixels). But this neighborhood is two dimensional, not one-dimensional as in your code. So you need two loops:
for ii in range(-ceil(2*sigma_f), ceil(2*sigma_f)+1):
    if ii + c > 0 and ii + c < cols-1:
        for jj in range(-ceil(2*sigma_f), ceil(2*sigma_f)+1):
            if jj + r > 0 and jj + r < rows-1:
                # compute weight here
Note that now, p-s is computed with math.sqrt(ii**2 + jj**2). But also note that the Gaussian uses x**2, so you could skip the computation of the square root by passing x**2 into your gaussian function.
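Putting all of this together, a minimal (slow but straightforward) sketch of the corrected filter could look like this; the sigma values here are assumptions to tune, not taken from the paper:

import math
import cv2
import numpy as np

def gaussian_sq(x_sq, sigma):
    # Gaussian weight given the *squared* argument, skipping the square root.
    return math.exp(-x_sq / (2.0 * sigma * sigma))

img = cv2.imread("fish.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)
rows, cols = img.shape
sigma_f = 2.0   # spatial sigma (assumed)
sigma_g = 25.0  # tonal sigma (assumed)
radius = math.ceil(2 * sigma_f)

filtered = np.zeros_like(img)
for r in range(rows):
    for c in range(cols):
        i_s = img[r, c]
        ks = 0.0
        js = 0.0
        for jj in range(-radius, radius + 1):
            if not 0 <= r + jj < rows:
                continue
            for ii in range(-radius, radius + 1):
                if not 0 <= c + ii < cols:
                    continue
                i_p = img[r + jj, c + ii]
                # f(p-s): spatial weight; g(I_p - I_s): tonal weight (Eqs. 6, 7)
                weight = (gaussian_sq(ii * ii + jj * jj, sigma_f)
                          * gaussian_sq((i_p - i_s) ** 2, sigma_g))
                ks += weight
                js += weight * i_p
        filtered[r, c] = js / ks

cv2.imwrite("f.png", filtered.astype(np.uint8))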
I want to combine two normal texture maps. My understanding is that one of the simplest methods is to sum the red/green channels from each normal map and then divide by the length.
I'm also referencing a concept from here and trying to convert it to Python: https://blog.selfshadow.com/publications/blending-in-detail/ (the simpler UDN blending method)
float3 r = normalize(float3(n1.xy + n2.xy, n1.z));
Using the concept of dividing the rgb vector by its length as my "normalizing" method.
For any vector V = (x, y, z), |V| = sqrt(x*x + y*y + z*z) gives the length of the vector. When we normalize a vector, we actually calculate V/|V| = (x/|V|, y/|V|, z/|V|).
img1 = cv2.imread(str(base_image)).astype(np.float32)
img2 = cv2.imread(str(top_image)).astype(np.float32)

# img = img1 + img2  # Could just do this to combine r,g channels?

(b1, g1, r1) = cv2.split(img1)
(b2, g2, r2) = cv2.split(img2)

r = r1 + r2
g = g1 + g2
b = b1

r_norm = []
g_norm = []
for (_r, _g, _b) in zip(r.ravel(), g.ravel(), b.ravel()):
    _l = length(_r, _g, _b)
    r_norm.append((_r / _l) * 255)
    g_norm.append((_g / _l) * 255)

r = np.reshape(r_norm, (-1, 2048))
g = np.reshape(g_norm, (-1, 2048))
img = np.dstack((b, g, r))
cv2.imwrite(str(output_path), img)
where length is defined as:
import math

def length(r, g, b):
    return math.sqrt(r ** 2 + g ** 2 + b ** 2)
But it's not working... I get a very gray image.
As an aside, this process is slow, so if anyone has ideas to speed up the loop (or remove it entirely) that would be awesome :). I've been pulling my hair out on this one...
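For the speed issue, the per-pixel loop can be removed entirely with NumPy broadcasting. A minimal sketch reproducing the loop's behaviour above (including leaving the blue channel unscaled; base_image, top_image, and output_path as in the question):

import cv2
import numpy as np

img1 = cv2.imread(str(base_image)).astype(np.float32)
img2 = cv2.imread(str(top_image)).astype(np.float32)

b = img1[..., 0]                 # blue taken from the base map only
g = img1[..., 1] + img2[..., 1]  # summed green channels
r = img1[..., 2] + img2[..., 2]  # summed red channels

length = np.sqrt(r ** 2 + g ** 2 + b ** 2)  # per-pixel vector length
img = np.dstack((b, (g / length) * 255, (r / length) * 255))
cv2.imwrite(str(output_path), img)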
I want to implement a custom random erasing function.
This function would take an input image and a percentage to mask, but would then mask between 1 and 4 random rectangles whose total area adds up to the mask percentage.
For example, say my image is 100*100 pixels and my mask percent is 15%, so I randomly choose to create 3 rectangles with random shapes such that their combined area sums up to 100*100*0.15 pixels.
So far I managed to write the code that decides upon the width, height, and number of rectangles, but I struggle with the part that makes sure they don't mask the same spot.
img_c, img_h, img_w = img.shape[-3], img.shape[-2], img.shape[-1]
area = img_h * img_w

for _ in range(10):
    block_num = torch.randint(1, 4, (1,)).item()
    block_sizes = torch.rand((block_num))
    block_sizes = torch.round((block_sizes / block_sizes.sum()) * (area * mask_percent))
    h = torch.round((torch.rand(block_num) + 0.5) * block_sizes.sqrt())
    w = torch.round(block_sizes / h)
    xs = []
    ys = []
    if not (any(h < img_h) and any(w < img_w)):
        continue
    term = True
    while term:
        xs = [torch.randint(0, img_h - h_ + 1, size=(1,)).item() for h_ in h]
        ys = [torch.randint(0, img_w - w_ + 1, size=(1,)).item() for w_ in w]
        for iter, x in enumerate(xs):
            if (x + h[iter] - xs) < 0:
                # here I get all confused. There should be a loop that goes over each
                # point and checks that the location + axial size doesn't go over
                # another point. It's confusing because should I also take care of
                # vice versa? Maybe someone knows of a ready-made solution?
    return i, j, h, w, v

# Return original image
return 0, 0, img_h, img_w, img
The while loop is released once the random location generator generates locations that satisfy the conditions.
Edit:
My latest attempt seems to work, but it always exits the loop unsolved! Is it just not a very likely set of parameters?
img = torch.rand(1, 160, 1024)
img_c, img_h, img_w = img.shape[-3], img.shape[-2], img.shape[-1]
area = img_h * img_w

for _ in range(100):
    block_num = torch.randint(1, 3, (1,)).item()
    block_sizes = torch.rand((block_num))
    block_sizes = torch.round((block_sizes / block_sizes.sum()) * (area * 0.15))
    h = torch.round((torch.rand(block_num) + 0.5) * block_sizes.sqrt()).to(dtype=torch.long)
    w = torch.round(block_sizes / h).to(dtype=torch.long)
    if (h > img_h).any() or (w > img_w).any():
        continue
    overlap1 = torch.zeros(img.shape)
    xs = [torch.randint(0, img_h - h_.item() + 1, size=(1,)).item() for h_ in h]
    ys = [torch.randint(0, img_w - w_.item() + 1, size=(1,)).item() for w_ in w]
    for iter, (x, y) in enumerate(zip(xs, ys)):
        overlap1[0, x:x + h[iter], y:y + w[iter]] += 1
    if (overlap1 > 1).any():
        continue
When you are checking that rectangle2 doesn't overlap with rectangle1, just check their intersection area. If the intersection area is greater than 0, reject rectangle2 and check the next random rectangle. Repeat the process with a new rectangle.
For checking the intersection, use sklearn's Jaccard score (avoid reinventing the wheel). To be able to use Jaccard for comparison, the two arrays (images) should be of the same size. So generate image-sized masks (mask1 and mask2) from your rectangle1 and rectangle2 respectively, and then calculate the Jaccard score of mask1 and mask2.
import numpy as np
from sklearn.metrics import jaccard_score

mask1 = np.array([[0, 1, 1],
                  [1, 1, 0]])
mask2 = np.array([[1, 1, 1],
                  [1, 0, 0]])

# Flatten to 1-D so the default binary Jaccard score applies
jaccard_score(mask1.ravel(), mask2.ravel())
0.6
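A minimal sketch of how this could plug into the rejection loop; rect_to_mask and the (x, y, h, w) convention are illustrative assumptions, not from the question:

import numpy as np
from sklearn.metrics import jaccard_score

def rect_to_mask(rect, img_h, img_w):
    # Render an (x, y, h, w) rectangle as a binary mask of the full image size.
    x, y, h, w = rect
    mask = np.zeros((img_h, img_w), dtype=int)
    mask[x:x + h, y:y + w] = 1
    return mask

def overlaps(rect1, rect2, img_h, img_w):
    # The rectangles overlap exactly when the Jaccard score of their masks is > 0.
    m1 = rect_to_mask(rect1, img_h, img_w).ravel()
    m2 = rect_to_mask(rect2, img_h, img_w).ravel()
    return jaccard_score(m1, m2) > 0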
I have:
print(self.L.T.shape)
print(self.M.T.shape)
(8, 3)
(8, 9082318)
self.N = np.linalg.lstsq(self.L.T, self.M.T, rcond=None)[0].T
which works fine and returns
(9082318, 3)
But I want to perform a kind of sort on M and compute the solution only on the best 8 - n values of M, or ignore values of M below and/or above a certain value.
Any pointer on how to do that would be extremely appreciated.
Thank you.
I tried to copy this solution exactly but it returns an error.
Here is the original working function; basically it's just one line.
M is a stack of 8 grayscale images, reshaped.
L is a stack of 8 light direction vectors.
M contains shadows, but not always at the same location in the image, so I need to remove those pixels from the computation while L retains its dimensions.
def _solve_l2(self):
    """
    Lambertian photometric stereo based on least-squares
    Woodham 1980
    :return: None
    Compute surface normal : numpy array of surface normals (p \times 3)
    """
    self.N = np.linalg.lstsq(self.L.T, self.M.T, rcond=None)[0].T
    print(self.N.shape)
    self.N = normalize(self.N, axis=1)  # normalize to account for diffuse reflectance
Here is the code borrowed from the link, trying to resolve this:
# L and M as previously used
Ma = self.M.copy()
thresh = 300
Ma[self.M <= thresh] = 0
Ma[self.M > thresh] = 1
Ma = Ma.T
self.M = self.M.T
self.L = self.L.T

print(self.L.shape)
print(self.M.shape)
print(Ma.shape)

A = self.L
B = self.M
M = Ma  # http://alexhwilliams.info/itsneuronalblog/2018/02/26/censored-lstsq/

# else solve via tensor representation
rhs = np.dot(A.T, M * B).T[:, :, None]  # n x r x 1 tensor
T = np.matmul(A.T[None, :, :], M.T[:, :, None] * A[None, :, :])  # n x r x r tensor
self.N = np.squeeze(np.linalg.solve(T, rhs)).T  # transpose to get r x n
return
numpy.linalg.LinAlgError: Singular matrix
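For reference, a likely cause (an observation, not from the linked post): since L is 8 x 3, T holds a 3 x 3 matrix per pixel built from only that pixel's unmasked observations, so any pixel with fewer than 3 valid observations in Ma yields a rank-deficient, singular T. A hedged sketch of one workaround, swapping solve for a batched pseudo-inverse:

# T and rhs as computed above; pinv tolerates the rank-deficient pixels and
# returns a minimum-norm solution for them instead of raising LinAlgError.
T_pinv = np.linalg.pinv(T)                      # n x r x r stacked pseudo-inverses
self.N = np.squeeze(np.matmul(T_pinv, rhs)).T   # transpose to get r x n, as before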
I have been stuck here for some time now. I cannot understand what I am doing wrong in calculating the displacement vectors along the x-axis and y-axis using the Lucas–Kanade method.
I implemented it as given in the above Wikipedia link. Here is what I have done:
import cv2
import numpy as np
img_a = cv2.imread("./images/1.png",0)
img_b = cv2.imread("./images/2.png",0)
# Calculate gradient along x and y axis
ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize = 3, scale = 1.0/3.0)
iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize = 3, scale = 1.0/3.0)
# Calculate temporal difference between the 2 images
it = img_b - img_a
ix = ix.flatten()
iy = iy.flatten()
it = -it.flatten()
A = np.vstack((ix, iy)).T
atai = np.linalg.inv(np.dot(A.T,A))
atb = np.dot(A.T, it)
v = np.dot(np.dot(np.linalg.inv(np.dot(A.T,A)),A.T),it)
print(v)
This code runs without an error but it prints an array of 2 values! I had expected the v matrix to be of the same size as that of the image. Why does this happen? What am I doing incorrectly?
PS: I know there are methods directly available with OpenCV but I want to write this simple algorithm (as also given in the Wikipedia link shared above) myself.
To properly compute the Lucas–Kanade optical flow estimate you need to solve the system of two equations for every pixel, using information from its neighborhood, not for the image as a whole.
This is the recipe (notation refers to that used on the Wikipedia page):
Compute the image gradient (A) for the first image (ix, iy in the OP) using any method (Sobel is OK, I prefer Gaussian derivatives; note that it is important to apply the right scaling in Sobel: 1/8).
ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize = 3, scale = 1.0/8.0)
iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize = 3, scale = 1.0/8.0)
Compute the structure tensor (AᵀWA): Axx = ix * ix, Axy = ix * iy, Ayy = iy * iy. Each of these three images must be smoothed with a Gaussian filter (this is the windowing). For example,
Axx = cv2.GaussianBlur(ix * ix, (0,0), 5)
Axy = cv2.GaussianBlur(ix * iy, (0,0), 5)
Ayy = cv2.GaussianBlur(iy * iy, (0,0), 5)
These three images together form the structure tensor, which is a 2x2 symmetric matrix at each pixel. For a pixel at (i,j), the matrix is:
| Axx(i,j) Axy(i,j) |
| Axy(i,j) Ayy(i,j) |
Compute the temporal gradient (b) by subtracting the two images (it in the OP).
it = img_b - img_a
Compute AᵀWb: Abx = ix * it, Aby = iy * it, and smooth these two images with the same Gaussian filter as above.
Abx = cv2.GaussianBlur(ix * it, (0,0), 5)
Aby = cv2.GaussianBlur(iy * it, (0,0), 5)
Compute the inverse of AᵀWA (a symmetric positive-definite matrix) and multiply by AᵀWb. Note that this inverse is of the 2x2 matrix at each pixel, not of the images as a whole. You can write this out as a set of simple arithmetic operations on the images Axx, Axy, Ayy, Abx and Aby.
The inverse of the matrix AᵀWA is given by:
| Ayy -Axy |
| -Axy Axx | / ( Axx*Ayy - Axy*Axy )
so you can write the solution as
norm = Axx*Ayy - Axy*Axy
vx = ( Ayy * Abx - Axy * Aby ) / norm
vy = ( Axx * Aby - Axy * Abx ) / norm
If the image is natural, it will have at least a tiny bit of noise, and norm will not have zeros. But for artificial images norm could have zeros, meaning you can't divide by it. Simply adding a small value to it will avoid division by zero errors: norm += 1e-6.
The size of the Gaussian filter is chosen as a compromise between precision and allowed motion speed: a larger filter will yield less precise results, but will work with larger shifts between images.
Typically, vx and vy are only evaluated where the two eigenvalues of the matrix AᵀWA are sufficiently large (if at least one is small, the result is inaccurate or possibly wrong).
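Assembling the recipe in plain OpenCV/NumPy, a minimal sketch (the sigma of 5 and the filenames follow the snippets above; the cast to float avoids uint8 wrap-around in the temporal difference; the eigenvalue check is omitted for brevity):

import cv2
import numpy as np

img_a = cv2.imread("./images/1.png", 0).astype(np.float64)
img_b = cv2.imread("./images/2.png", 0).astype(np.float64)

# Image gradient of the first image, with the 1/8 Sobel scaling
ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize=3, scale=1.0/8.0)
iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize=3, scale=1.0/8.0)
it = img_b - img_a

# Structure tensor and right-hand side, windowed with the same Gaussian
Axx = cv2.GaussianBlur(ix * ix, (0, 0), 5)
Axy = cv2.GaussianBlur(ix * iy, (0, 0), 5)
Ayy = cv2.GaussianBlur(iy * iy, (0, 0), 5)
Abx = cv2.GaussianBlur(ix * it, (0, 0), 5)
Aby = cv2.GaussianBlur(iy * it, (0, 0), 5)

# Per-pixel 2x2 inverse, written out as image arithmetic
norm = Axx * Ayy - Axy * Axy + 1e-6  # determinant, guarded against zeros
vx = (Ayy * Abx - Axy * Aby) / norm
vy = (Axx * Aby - Axy * Abx) / norm  # vx, vy have the same shape as the image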
Using DIPlib (disclosure: I'm an author) this is all very easy because it supports images with a matrix at each pixel. You would do this as follows:
import diplib as dip
img_a = dip.ImageRead("./images/1.png")
img_b = dip.ImageRead("./images/2.png")
A = dip.Gradient(img_a, [1.0])
b = img_b - img_a
ATA = dip.Gauss(A * dip.Transpose(A), [5.0])
ATb = dip.Gauss(A * b, [5.0])
v = dip.Inverse(ATA) * ATb
I am creating a circular mask in Python as follows:

import numpy as np

def make_mask(image, radius, center=(0, 0)):
    r, c, d = image.shape
    y, x = np.ogrid[-center[0]:r - center[0], -center[1]:c - center[1]]
    mask = x * x + y * y <= radius * radius
    array = np.zeros((r, c))
    array[mask] = 1
    return array
This returns a mask of shape (r, c). What I would like to do is have a weighted mask where the weight is 1 at the center of the image (given by the center parameter) and decreases linearly towards the edge of the image. So this should be an additional weight, calculated between 0 and 1 (0 not included). I was thinking this should be something like:
distance = (center[0] - x)**2 + (center[1] - y)**2
# weigh it inversely to distance from center
mask = (x*x + y*y) * 1.0/distance
However, this will result in divide by 0 and the mask would not be between 0 and 1 either.
First, if you want the weight to be linear, you need to take the square root of what you have for distance (i.e., what you're calling "distance" isn't the distance from the center but the square of that, so you should rename it to something like R_squared). So:
R_squared = (center[0] - x)**2 + (center[1] - y)**2  # what you have for "distance"
r = np.sqrt(R_squared)
Then, since r is 0 at the center, where you want the weight to be 1, and you want the weight to fall linearly to 0 at some distance L from the center, your equation is:
weight = 1 - r/L
Here this will be 1 where r==0 and 0 where r==L.
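Combining this with the original function, a minimal sketch, assuming the falloff length L is taken to be the radius parameter (np.clip keeps the weight at 0 beyond that distance):

import numpy as np

def make_weighted_mask(image, radius, center=(0, 0)):
    r, c = image.shape[:2]
    y, x = np.ogrid[-center[0]:r - center[0], -center[1]:c - center[1]]
    dist = np.sqrt(x * x + y * y)  # true distance, not its square
    # 1 at the center, falling linearly to 0 at dist == radius, 0 beyond
    return np.clip(1.0 - dist / radius, 0.0, 1.0)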