Sobel filter implementation in scipy - python

I tried to implement the Sobel_X filter in scipy with the convolve2d function.
I compared the results with those from this function:
from scipy.signal import convolve2d
from scipy import misc
from skimage.exposure import rescale_intensity
import cv2
import numpy as np
#https://www.pyimagesearch.com/2016/07/25/convolutions-with-opencv-and-python/
def convolve(image, kernel):
    # grab the spatial dimensions of the image, along with
    # the spatial dimensions of the kernel
    (iH, iW) = image.shape[:2]
    (kH, kW) = kernel.shape[:2]
    # print("Kh,Kw", kernel.shape[:2])
    # allocate memory for the output image, taking care to
    # "pad" the borders of the input image so the spatial
    # size (i.e., width and height) are not reduced
    pad = (kW - 1) // 2
    # print("pad", pad)
    image = cv2.copyMakeBorder(image, pad, pad, pad, pad,
                               cv2.BORDER_REPLICATE)
    # self.imshow(image, "padded image")
    output = np.zeros((iH, iW), dtype="float32")
    # loop over the input image, "sliding" the kernel across
    # each (x, y)-coordinate from left-to-right and top to
    # bottom
    for y in np.arange(pad, iH + pad):
        for x in np.arange(pad, iW + pad):
            # extract the ROI of the image by extracting the
            # *center* region of the current (x, y)-coordinates
            # dimensions
            roi = image[y - pad:y + pad + 1, x - pad:x + pad + 1]
            # perform the actual convolution by taking the
            # element-wise multiplication between the ROI and
            # the kernel, then summing the matrix
            k = (roi * kernel).sum()
            # store the convolved value in the output (x,y)-
            # coordinate of the output image
            output[y - pad, x - pad] = k
    # self.imshow(output, "output image")
    # rescale the output image to be in the range [0, 255]
    output = rescale_intensity(output, in_range=(0, 255))
    output = (output * 255).astype("uint8")
    # return the output image
    return output
Here are the Sobel_X kernel and the code used to compare:
sobelX = np.array((
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1]), dtype="int")
testim=misc.face(gray=True)
convolved_func=convolve(testim, sobelX)
convolved_np=convolve2d(testim, sobelX, boundary='symm', mode='same')
cv2.imshow("Face", np.hstack((convolved_func,np.array(convolved_np, dtype="uint8"))))
cv2.waitKey(0)
cv2.destroyAllWindows()
As you can see, the results are entirely different.
I can't figure out how to implement these filters so that they give the same results.
Should I somehow change the filter function, or is there something special in numpy I need to use to implement it?
I tried to adapt the function for scipy as in this and that example, but the results are the same or worse (I get a black image).

You will get slightly different results.
Apply thresholding to remove all values that are less than 0:
convolved_np[convolved_np<0]=0
That will give you something similar, but still not the same; some artifacts appear.
I think these functions differ, which is why I get slightly different results. Maybe there are some mistakes, so if you can add anything to this answer, I would appreciate it.
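For what it's worth, part of the remaining gap is probably that scipy.signal.convolve2d performs true convolution (it flips the kernel), while the convolve() loop above computes cross-correlation, so the Sobel responses come out with opposite signs; the different output rescaling and border handling account for the rest. A minimal sketch of a more like-for-like comparison, under that assumption:
# Sketch only: flip the kernel so convolve2d matches the correlation done in convolve(),
# then clip to [0, 255] for display before comparing.
convolved_scipy = convolve2d(testim, sobelX[::-1, ::-1], boundary='symm', mode='same')
convolved_scipy = np.clip(convolved_scipy, 0, 255).astype("uint8")
cv2.imshow("Comparison", np.hstack((convolved_func, convolved_scipy)))
cv2.waitKey(0)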

Related

How to rotate and translate an image with opencv without losing off-screen data

I'm trying to use opencv to perform subsequent image transformations. I have an image that I want to both rotate and translate while keeping the overall image size constant. I've been using the warpAffine function with rotation and translation matrices to perform the transformations, but the problem is that after performing one transformation some image data is lost and not carried over to the next transformation.
Original
Rotated
Translated and rotated. Note how the corners of the original image have been clipped off.
Desired output
What I would like is to have an image that's very similar to the translated image here, but without the corners clipped off. I understand that this occurs because after performing the first Affine transform the corner data is removed. However, I'm not really sure how I can preserve this data while still maintaining the image's original size with respect to its center. I am very new to computer vision and do not have a strong background in linear algebra or matrix math. Most of the code I've managed to make work has been learned from online tutorials, so any and all accessible help fixing this would be greatly appreciated!
Here is the code used to generate the above images:
import numpy as np
import cv2
def rotate_image(image, angle):
    w, h = (image.shape[1], image.shape[0])
    cx, cy = (w//2, h//2)
    M = cv2.getRotationMatrix2D((cx, cy), -1*angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h))
    return rotated

def translate_image(image, d_x, d_y):
    M = np.float32([
        [1, 0, d_x],
        [0, 1, d_y]
    ])
    return cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
path = "dog.jpg"
image = cv2.imread(path)
angle = 30.0
d_x = 200
d_y = 300
rotated = rotate_image(image, angle)
translated = translate_image(rotated, d_x, d_y)
Chaining the rotation and translation transformations is what you are looking for.
Instead of applying the rotation and translation one after the other, we may apply cv2.warpAffine with the equivalent chained transformation matrix.
Using cv2.warpAffine only once prevents the corner cutting caused by the intermediate image (the rotated-only image with clipped corners).
Chaining two transformation matrices is done by multiplying the two matrices.
In the OpenCV convention points are column vectors, so the transformation applied first sits on the right, and the later transformation multiplies it from the left.
The last row of a 2D affine transformation matrix is always [0, 0, 1].
The OpenCV convention omits this last row, so M is a 2x3 matrix.
Chaining OpenCV transformations M0 and M1 applies the following stages:
Append the omitted last row [0, 0, 1] to M0 and to M1:
T0 = np.vstack((M0, np.array([0, 0, 1])))
T1 = np.vstack((M1, np.array([0, 0, 1])))
Chain transformations by matrix multiplication:
T = T1 @ T0
Remove the last row (which equals [0, 0, 1]) to match the OpenCV 2x3 convention:
M = T[0:2, :]
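As a quick sanity check (using a hypothetical point, not from the original answer), the chained matrix applied to a point in homogeneous coordinates matches applying T0 and then T1:
p = np.array([10.0, 20.0, 1.0])            # hypothetical point (x, y, 1)
assert np.allclose(T @ p, T1 @ (T0 @ p))   # chained matrix == sequential application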
The higher level solution applies the following stages:
Compute rotation transformation matrix.
Compute translation transformation matrix.
Chain the rotation and translation transformations.
Apply affine transformation with the chained transformations matrix.
Code sample:
import numpy as np
import cv2
def get_rotation_mat(image, angle):
    w, h = (image.shape[1], image.shape[0])
    cx, cy = (w//2, h//2)
    M = cv2.getRotationMatrix2D((cx, cy), -1*angle, 1.0)
    #rotated = cv2.warpAffine(image, M, (w,h))
    return M

def get_translation_mat(d_x, d_y):
    M = np.float64([
        [1, 0, d_x],
        [0, 1, d_y]
    ])
    #return cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
    return M

def chain_affine_transformation_mats(M0, M1):
    """
    Chaining affine transformations given by M0 and M1 matrices.
    M0 - 2x3 matrix applying the first affine transformation (e.g. rotation).
    M1 - 2x3 matrix applying the second affine transformation (e.g. translation).
    The method returns M - a 2x3 matrix that chains the two transformations M0 and M1 (e.g. rotation then translation in a single matrix).
    """
    T0 = np.vstack((M0, np.array([0, 0, 1])))  # Add row [0, 0, 1] to the bottom of M0 ([0, 0, 1] is the last row of the identity matrix); T0 is a 3x3 matrix.
    T1 = np.vstack((M1, np.array([0, 0, 1])))  # Add row [0, 0, 1] to the bottom of M1.
    T = T1 @ T0  # Chain transformations T0 and T1 using matrix multiplication.
    M = T[0:2, :]  # Remove the last row from T (the last row of an affine transformation is always [0, 0, 1] and the OpenCV convention omits it).
    return M
path = "dog.jpg"
image = cv2.imread(path)
angle = 30.0
d_x = 200
d_y = 300
#rotated = rotate_image(image, angle)
#translated = translate_image(rotated, d_x, d_y)
rotationM = get_rotation_mat(image, angle) # Compute rotation transformation matrix
translationM = get_translation_mat(d_x, d_y) # Compute translation transformation matrix
M = chain_affine_transformation_mats(rotationM, translationM) # Chain rotation and translation transformations (translation after rotation)
transformed_image = cv2.warpAffine(image, M, (image.shape[1], image.shape[0])) # Apply affine transformation with the chained (unified) matrix M.
cv2.imwrite("transformed_dog.jpg", transformed_image) # Store output for testing
Output:

Upsampling images in frequency domain using Pytorch

I'm trying to upsample an RGB image in the frequency domain, using Pytorch. I'm using this article for reference on grayscale images. Since Pytorch processes the channels individually, I figure the colorspace is irrelevant here.
The basic steps outlined by this article are:
Perform FFT on the image.
Pad the FFT with zeros.
Perform inverse FFT.
I wrote the following code for the same:
import torch
import torch.nn.functional as F
import cv2
import numpy as np
img = src = cv2.imread('orig.png')
torch_img = torch.from_numpy(img).to(torch.float32).permute(2, 0, 1) / 255.
fft = torch.fft.fft2(torch_img, norm="forward")
fr = fft.real
fi = fft.imag
fr = F.pad(fr, (fft.shape[-1]//2, fft.shape[-1]//2, fft.shape[-2]//2, fft.shape[-2]//2), mode='constant', value=0)
fi = F.pad(fi, (fft.shape[-1]//2, fft.shape[-1]//2, fft.shape[-2]//2, fft.shape[-2]//2), mode='constant', value=0)
fft_hires = torch.complex(fr, fi)
inv = torch.fft.ifft2(fft_hires, norm="forward").real
print(inv.max(), inv.min())
img = (inv.permute(1, 2, 0).detach()).clamp(0, 1)
img = (255 * img).numpy().astype(np.uint8)
cv2.imwrite('hires.png', img)
The original image:
The upscaled image:
Another interesting thing to note is the maximum and minimum values of the image pixels after performing IFFT: they are 2.2729 and -1.8376 respectively. Ideally, they should be 1.0 and 0.0.
Can someone please explain what's wrong here?
The usual convention for the DFT is to treat the first sample as the 0 Hz component. But you need the 0 Hz component in the center in order for the padding to make sense. Most FFT tools provide a shift function to circularly shift your result so that the 0 Hz component is in the center. In PyTorch you need to apply torch.fft.fftshift after the FFT and torch.fft.ifftshift right before taking the inverse FFT to put the 0 Hz component back in the upper-left corner.
import torch
import torch.nn.functional as F
import cv2
import numpy as np
img = src = cv2.imread('orig.png')
torch_img = torch.from_numpy(img).to(torch.float32).permute(2, 0, 1) / 255.
# note the fftshift
fft = torch.fft.fftshift(torch.fft.fft2(torch_img, norm="forward"))
fr = fft.real
fi = fft.imag
fr = F.pad(fr, (fft.shape[-1]//2, fft.shape[-1]//2, fft.shape[-2]//2, fft.shape[-2]//2), mode='constant', value=0)
fi = F.pad(fi, (fft.shape[-1]//2, fft.shape[-1]//2, fft.shape[-2]//2, fft.shape[-2]//2), mode='constant', value=0)
# note the ifftshift
fft_hires = torch.fft.ifftshift(torch.complex(fr, fi))
inv = torch.fft.ifft2(fft_hires, norm="forward").real
print(inv.max(), inv.min())
img = (inv.permute(1, 2, 0).detach()).clamp(0, 1)
img = (255 * img).numpy().astype(np.uint8)
cv2.imwrite('hires.png', img)
which produces the following hires.png

Linear-Blurring an Image

I'm trying to blur an image by mapping each pixel to the average of the N pixels to the right of it (in the same row). My iterative solution produces good output, but my linear-algebra solution is producing bad output.
From testing, I believe my kernel matrix is correct, and I know the last N rows don't get blurred, but that's fine for now. I'd appreciate any hints or solutions.
iterative-solution output (good), linear-algebra output (bad)
original image; and here is the failing linear-algebra code:
def blur(orig_img):
    # turn image-mat into a vector
    flattened_img = orig_img.flatten()
    L = flattened_img.shape[0]
    N = 3
    # kernel
    kernel = np.zeros((L, L))
    for r, row in enumerate(kernel[0:-N]):
        row[r:r+N] = [round(1/N, 3)]*N
    print(kernel)
    # blur the img
    print('starting blurring')
    blurred_img = np.matmul(kernel, flattened_img)
    blurred_img = blurred_img.reshape(orig_img.shape)
    return blurred_img
The equation I'm modelling is this:
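(The original equation image is not included here; based on the kernel construction above, it is presumably the row-wise running average blurred[i] = (1/N) * (img[i] + img[i+1] + ... + img[i+N-1]) applied to the flattened image, which is what each row of the kernel matrix encodes.)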
One option might be to just use a kernel and a convolution?
For example, if we load a grayscale image like so:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from scipy import ndimage
# load a hackish grayscale image
image = np.asarray(Image.open('cup.jpg')).mean(axis=2)
plt.imshow(image)
plt.title('Gray scale image')
plt.show()
Now one can use a kernel and convolution. For example, to create a filter that acts on just one row and computes the value of the center pixel as the difference between the pixels to its right and left, one can do the following:
# Create a kernel that takes the difference between neighbors horizontal pixes
k = np.array([[-1,0,1]])
plt.subplot(121)
plt.title('Kernel')
plt.imshow(k)
plt.subplot(122)
plt.title('Output')
plt.imshow(ndimage.convolve(image, k, mode='constant', cval=0.0))
plt.show()
Therefore, one can blur an image by mapping each pixel to the average of the N pixels to the right of it by creating the appropriate kernel:
# Create a kernel that takes the average of N pixels to the right
n=10
k = np.zeros(n*2);k[n:]=1/n
k = k[np.newaxis,...]
plt.subplot(121)
plt.title('Kernel')
plt.imshow(k)
plt.subplot(122)
plt.title('Output')
plt.imshow(ndimage.convolve(image, k, mode='constant', cval=0.0))
plt.show()
The issue was incorrect usage of cv2.imshow() when displaying the output image. It expects floating-point pixel values to be in [0, 1], which is handled in the code below (near the bottom):
def blur(orig_img):
    flattened_img = orig_img.flatten()
    L = flattened_img.shape[0]
    N = int(round(0.1 * orig_img.shape[0], 0))
    # mask (A)
    mask = np.zeros((L, L))
    for r, row in enumerate(mask[0:-N]):
        row[r:r+N] = [round(1/N, 2)]*N
    # blurred img = A * flattened_img
    print('starting blurring')
    blurred_img = np.matmul(mask, flattened_img)
    blurred_img = blurred_img.reshape(orig_img.shape)
    cv2.imwrite('blurred_img.png', blurred_img)
    # normalize img to [0,1]
    blurred_img = (blurred_img - blurred_img.min()) / (blurred_img.max() - blurred_img.min())
    return blurred_img
Amended output
Thank you to @CrisLuengo for identifying the issue.

Using gabor kernel to extract vertical lines results in black image

I understand the concept of the Gabor kernel and how it can be used to identify directional edges, so I want to use it to identify barcode lines in images.
However, when I filter an image with a Gabor kernel I always get a blank/black result. Can you provide feedback on what I need to do to get Gabor to identify the vertical lines in an image, i.e., produce a result that has white where the vertical edges are?
Input image:
Result:
import cv2
import numpy as np
def deginrad(degree):
    radiant = 2*np.pi/360 * degree
    return radiant

def main():
    src = cv2.imread('./images/barcode1.jpg', cv2.IMREAD_GRAYSCALE)
    # Introduce consistency in width
    const_width = 300
    aspect = float(src.shape[0]) / float(src.shape[1])
    src = cv2.resize(src, (const_width, int(const_width * aspect)))
    src = cv2.GaussianBlur(src, (7,7), 0)
    # Apply gabor kernel to identify vertical edges
    g_kernel = cv2.getGaborKernel((9,9), 8, deginrad(0), 5, 0.5, 0, ktype=cv2.CV_32F)
    gabor = cv2.filter2D(src, cv2.CV_8UC3, g_kernel)
    # Visualize the Gabor kernel
    h, w = g_kernel.shape[:2]
    g_kernel = cv2.resize(g_kernel, (20*w, 20*h), interpolation=cv2.INTER_CUBIC)
    cv2.imshow('src', src)
    cv2.imshow('gabor', gabor)  # gabor is just black
    cv2.imshow('gabor kernel', g_kernel)
    cv2.waitKey(0)

if __name__ == "__main__":
    main()
You need to play with the parameters to view it properly. The parameters are as follows:
cv2.getGaborKernel(ksize, sigma, theta, lambda, gamma, psi, ktype)
with
g_kernel = cv2.getGaborKernel((31,31), 4, deginrad(0), 5, 0.5, 0, ktype=cv2.CV_32F)
the result is -
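For completeness, a sketch of how the suggested kernel slots into the code from the question; the min-max normalization for display is an assumption added here, not part of the original answer:
# Sketch: larger kernel and smaller sigma as suggested above, plus an optional
# normalization so the response is visible regardless of its absolute scale.
g_kernel = cv2.getGaborKernel((31, 31), 4, deginrad(0), 5, 0.5, 0, ktype=cv2.CV_32F)
gabor = cv2.filter2D(src, cv2.CV_32F, g_kernel)
gabor_disp = cv2.normalize(gabor, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imshow('gabor', gabor_disp)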

Non-maximum suppression in corner detection

I am writing a Harris Corner Detection algorithm in Python, and am up to performing non-max suppression in order to detect the corner points.
I have found the corner response function R, which appears to be accurate when I print it out, but I do not know where to go from here. I roughly understand the concept of non-max suppression, i.e. taking the pixel with the highest intensity within a window, setting that as the corner point and the rest to 0, though I am not sure how to go about implementing this.
After calculating it, would I then use the map it creates to set those pixels within the original image to a particular color (to indicate which are corners)?
My code so far is as follows:
import matplotlib.pyplot as plt
import numpy as np
import cv2
# Load image
img = cv2.imread('mountains.jpg')
# Make a copy of the image
img_copy = np.copy(img)
# Convert image from BGR to RGB
img_copy = cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB)
# Convert to grayscale for filtering
gray = cv2.cvtColor(img_copy, cv2.COLOR_RGB2GRAY)
# Copy grayscale and convert to float32 type
gray_1 = np.copy(gray)
gray_1 = np.float32(gray_1)
img_1 = np.copy(img)
# Compute derivatives in both x and y directions
sobelx = cv2.Sobel(gray_1, cv2.CV_64F, 1, 0, ksize=5)
sobely = cv2.Sobel(gray_1, cv2.CV_64F, 0, 1, ksize=5)
# Determine M = [A C ; C B] performing element-wise multiplication
A = np.square(sobelx)
B = np.square(sobely)
C = np.multiply(sobelx, sobely)
# Apply gaussian filter to matrix components
gauss = np.array([[1, 2, 1],
                  [2, 4, 2],
                  [1, 2, 1]])/16
A_fil = cv2.filter2D(A, cv2.CV_64F, gauss)
B_fil = cv2.filter2D(B, cv2.CV_64F, gauss)
C_fil = cv2.filter2D(C, cv2.CV_64F, gauss)
# Calculate determinant
det = A_fil * B_fil - (C_fil ** 2)
# Calculate trace (alpha = 0.04 to 0.06)
alpha = 0.04
trace = alpha * (A_fil + B_fil) ** 2
# Using determinant and trace, calculate corner response function
R = det - trace
# Display corner response function
f, ax1 = plt.subplots(1, 1, figsize=(20,10))
ax1.set_title('Corner response function')
ax1.imshow(R, cmap="gray")
(Note: Stack overflow images were not working properly)
Output:
Using OpenCV's Harris Corner Detection:
It's been a while so you likely already have your answer but since no one has responded yet, I'll leave this for posterity.
Here is a simple algorithm for non-max suppression:
partition your image into tiles (e.g. 64 x 48 pixels)
choose a minimum separation space (e.g. >= 10 pixels between features)
choose a max number of features per tile (such as 5)
Sort your features in each window tile by R (highest to lowest) and then only accept (at most) the 5 largest features that are at least 10 pixels away from already-accepted features. You won't be able to guarantee a minimum 10 px distance across window tiles, but this is a reasonable start.
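A minimal sketch of that tile-based scheme, assuming R is the corner response array computed in the question (the tile size, separation distance, and per-tile limit just reuse the example numbers above):
import numpy as np

def nms_tiled(R, tile=(48, 64), min_dist=10, max_per_tile=5, threshold=0.0):
    corners = []
    H, W = R.shape
    for ty in range(0, H, tile[0]):
        for tx in range(0, W, tile[1]):
            block = R[ty:ty + tile[0], tx:tx + tile[1]]
            # candidate pixels above threshold, strongest response first
            ys, xs = np.nonzero(block > threshold)
            order = np.argsort(block[ys, xs])[::-1]
            kept = []
            for i in order:
                y, x = ys[i] + ty, xs[i] + tx
                # enforce the minimum separation against already accepted points
                if all((y - ky)**2 + (x - kx)**2 >= min_dist**2 for ky, kx in kept):
                    kept.append((y, x))
                    if len(kept) == max_per_tile:
                        break
            corners.extend(kept)
    return corners
The accepted (y, x) positions can then be drawn onto the original image, for example with cv2.circle, to mark the detected corners.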
