Inconsistency when comparing scipy, torch and fourier periodic convolution - python

I'm implementing a 2d periodic convolution on a synthetic image in three different ways: using scipy, using torch and using the Fourier transform (also under torch framework).
However, I've got different results. Performing the operation by hand I can see that scipy's convolution yields the correct results. torch's spatial version, on the other hand, yields the expected result inverted. Finally, the Fourier version returns something unexpected.
The code is the following:
import torch
import numpy as np
import scipy.signal as sig
import torch.nn.functional as F
import matplotlib.pyplot as plt
def numpy_periodic_conv(f, k):
    H, W = f.shape
    periodic_f = np.hstack([f, f])
    periodic_f = np.vstack([periodic_f, periodic_f])
    conv = sig.convolve2d(periodic_f, k, mode='same')
    conv = conv[H // 2:-H // 2, W // 2:-W // 2]
    return periodic_f, conv
def torch_periodic_conv(f, k):
    H, W = f.shape[-2:]
    periodic_f = f.repeat(1, 1, 2, 2)
    conv = F.conv2d(periodic_f, k, padding=1)
    conv = conv[:, :, H // 2:-H // 2, W // 2:-W // 2]
    return periodic_f.squeeze().numpy(), conv.squeeze().numpy()
def torch_fourier_conv(f, k):
    # Note: torch.rfft / torch.irfft with this signature only exist in PyTorch < 1.8
    pad_x = f.shape[-2] - k.shape[-2]
    pad_y = f.shape[-1] - k.shape[-1]
    expanded_kernel = F.pad(k, [0, pad_x, 0, pad_y])
    fft_x = torch.rfft(f, 2, onesided=False)
    fft_kernel = torch.rfft(expanded_kernel, 2, onesided=False)
    real = fft_x[:, :, :, :, 0] * fft_kernel[:, :, :, :, 0] - \
           fft_x[:, :, :, :, 1] * fft_kernel[:, :, :, :, 1]
    im = fft_x[:, :, :, :, 0] * fft_kernel[:, :, :, :, 1] + \
         fft_x[:, :, :, :, 1] * fft_kernel[:, :, :, :, 0]
    fft_conv = torch.stack([real, im], -1)  # (a+bj)*(c+dj) = (ac-bd)+(ad+bc)j
    ifft_conv = torch.irfft(fft_conv, 2, onesided=False)
    return expanded_kernel.squeeze().numpy(), ifft_conv.squeeze().numpy()
if __name__ == '__main__':
    f = np.concatenate([np.ones((10, 5)), np.zeros((10, 5))], 1)
    k = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])
    f_tensor = torch.from_numpy(f).unsqueeze(0).unsqueeze(0).float()
    k_tensor = torch.from_numpy(k).unsqueeze(0).unsqueeze(0).float()
    np_periodic_f, np_periodic_conv = numpy_periodic_conv(f, k)
    tc_periodic_f, tc_periodic_conv = torch_periodic_conv(f_tensor, k_tensor)
    tc_fourier_k, tc_fourier_conv = torch_fourier_conv(f_tensor, k_tensor)
    print('Spatial numpy conv shape= ', np_periodic_conv.shape)
    print('Spatial torch conv shape= ', tc_periodic_conv.shape)
    print('Fourier torch conv shape= ', tc_fourier_conv.shape)
    r_np = dict(name='numpy', im=np_periodic_f, k=k, conv=np_periodic_conv)
    r_torch = dict(name='torch', im=tc_periodic_f, k=k, conv=tc_periodic_conv)
    r_fourier = dict(name='fourier', im=f, k=tc_fourier_k, conv=tc_fourier_conv)
    titles = ['{} im', '{} kernel', '{} conv']
    results = [r_np, r_torch, r_fourier]
    fig, axs = plt.subplots(3, 3)
    for i, r_dict in enumerate(results):
        axs[i, 0].imshow(r_dict['im'], cmap='gray')
        axs[i, 0].set_title(titles[0].format(r_dict['name']))
        axs[i, 1].imshow(r_dict['k'], cmap='gray')
        axs[i, 1].set_title(titles[1].format(r_dict['name']))
        axs[i, 2].imshow(r_dict['conv'], cmap='gray')
        axs[i, 2].set_title(titles[2].format(r_dict['name']))
The results I'm obtaining:
Note: The images for both the numpy and torch versions show the periodic image, which is required to perform the periodic convolution. The kernel for the Fourier version shows the original kernel zero-padded to the image size, which is required to compute the element-wise multiplication in the frequency domain.
-Edit1: There was an error in the multiplication in the Fourier version: I was computing (ac-bd)+(ad-bc)j instead of (ac-bd)+(ad+bc)j. But now I get the convolution shifted by one column.
-Edit2: torch's spatial convolution result is inverted because the operation is actually a cross-correlation. This was confirmed in PyTorch's official forum here. Furthermore, after fixing the kernel padding as described in Cris Luengo's answer, the frequency method yielded the same results as the correlation. This is rather strange to me because, as far as I know, the frequency property holds for convolution, not correlation.
New-results after fixing the kernel:

The FFT result is wrong because the padding is wrong. When padding, you need to put the origin (center of the kernel) at the top-left corner of the image. See this other answer for details.
The difference between the other two is the difference between a convolution and a correlation. It looks like the “numpy” result is a convolution and the “torch” one a correlation.
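A minimal NumPy sketch of both points, assuming an odd-sized kernel. The helper names `spatial_periodic_conv` and `fft_periodic_conv` are illustrative and do not come from the post. The zero-padded kernel is rolled so that its center lands on index (0, 0) before the FFT, and the correlation is obtained by flipping the kernel along both axes:

```python
import numpy as np

def spatial_periodic_conv(f, k):
    """Periodic (circular) convolution computed directly in the spatial domain."""
    kh, kw = k.shape
    out = np.zeros(f.shape, dtype=float)
    for a in range(kh):
        for b in range(kw):
            # np.roll(f, s)[i] == f[i - s], so this term is
            # k[a, b] * f[i - (a - kh // 2), j - (b - kw // 2)]
            out += k[a, b] * np.roll(f, (a - kh // 2, b - kw // 2), axis=(0, 1))
    return out

def fft_periodic_conv(f, k):
    """Same operation via the FFT, with the kernel origin moved to the top-left corner."""
    kh, kw = k.shape
    padded = np.zeros(f.shape, dtype=float)
    padded[:kh, :kw] = k
    # Shift so that the kernel center k[kh // 2, kw // 2] sits at index (0, 0)
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(padded)))

f = np.concatenate([np.ones((10, 5)), np.zeros((10, 5))], axis=1)
k = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)

conv = spatial_periodic_conv(f, k)
conv_fft = fft_periodic_conv(f, k)
# Correlation is convolution with the kernel flipped along both axes
corr = spatial_periodic_conv(f, k[::-1, ::-1])

print(np.allclose(conv, conv_fft))
print(np.allclose(corr, -conv))
```

For this particular kernel, flipping both axes gives -k, so the correlation is simply the negated convolution, which is consistent with the "inverted" torch result described in the question.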


fastest way to calculate edges (derivatives) of a big torch tensor

Given a tensor with shape (b,c,h,w), I want to extract the edges of the spatial data, that is, calculate the x- and y-direction derivatives over the (h,w) dimensions and then the magnitude I=sqrt(|x_amplitude|^2+|y_amplitude|^2)
My current implementation is as follows:
import numpy as np
import torch
from scipy import ndimage
row_mat = np.asarray([[0, 0, 0], [1, 0, -1], [0, 0, 0]])
col_mat = row_mat.T
row_mat = row_mat[None, None, :, :]  # expand dims to convolve with a (batch, channel, height, width) tensor
col_mat = col_mat[None, None, :, :]  # expand dims to convolve with a (batch, channel, height, width) tensor
def derivative(batch: torch.Tensor) -> torch.Tensor:
    """
    uses convolution to perform x and y derivatives
    :param batch: input tensor batch
    :return: image derivative magnitudes
    """
    x_amplitude = ndimage.convolve(batch, row_mat)
    y_amplitude = ndimage.convolve(batch, col_mat)
    magnitude = np.sqrt(np.abs(x_amplitude) ** 2 + np.abs(y_amplitude) ** 2)
    return torch.tensor(magnitude)
I was wondering if there's a faster way, as this approach actually convolves using the definition of a derivative, so there might be downsides to that.
PS. To test this you can use the tensor torch.randn(1000,128,28,28), as these are the dimensions I'm dealing with.
For this specific operation you might be able to speed things up a bit by doing it "manually":
import torch.nn.functional as nnf
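def derivative(batch: torch.Tensor) -> torch.Tensor:
    # pad batch by one pixel on every side of the last two dimensions
    x = nnf.pad(batch, (1, 1, 1, 1), mode='reflect')
    # central differences; the 1:-1 slices keep the output at the input size (h, w)
    dx2 = (x[..., 1:-1, :-2] - x[..., 1:-1, 2:])**2
    dy2 = (x[..., :-2, 1:-1] - x[..., 2:, 1:-1])**2
    mag = torch.sqrt(dx2 + dy2)
    return mag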
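As a hedged check of the slicing approach, the sketch below compares it against a depthwise `F.conv2d` using the same [1, 0, -1] kernels on the same reflect-padded input. The function names `derivative_sliced` and `derivative_conv` are made up for this example. Note that `ndimage.convolve` defaults to `mode='reflect'`, which repeats the edge sample, whereas `F.pad(..., mode='reflect')` does not, so border values may differ from the question's version:

```python
import torch
import torch.nn.functional as F

def derivative_sliced(batch: torch.Tensor) -> torch.Tensor:
    # Central differences via slicing on a reflect-padded copy
    x = F.pad(batch, (1, 1, 1, 1), mode='reflect')
    dx = x[..., 1:-1, :-2] - x[..., 1:-1, 2:]
    dy = x[..., :-2, 1:-1] - x[..., 2:, 1:-1]
    return torch.sqrt(dx ** 2 + dy ** 2)

def derivative_conv(batch: torch.Tensor) -> torch.Tensor:
    # Same differences via a depthwise cross-correlation (one 3x3 filter per channel)
    c = batch.shape[1]
    x = F.pad(batch, (1, 1, 1, 1), mode='reflect')
    kx = torch.tensor([[0., 0., 0.], [1., 0., -1.], [0., 0., 0.]], dtype=batch.dtype)
    ky = kx.t()
    wx = kx.repeat(c, 1, 1, 1)  # shape (c, 1, 3, 3)
    wy = ky.repeat(c, 1, 1, 1)
    dx = F.conv2d(x, wx, groups=c)
    dy = F.conv2d(x, wy, groups=c)
    return torch.sqrt(dx ** 2 + dy ** 2)

torch.manual_seed(0)
batch = torch.randn(4, 8, 28, 28)
sliced = derivative_sliced(batch)
conved = derivative_conv(batch)
print(sliced.shape, torch.allclose(sliced, conved, atol=1e-5))
```

The sliced version avoids the multiplications by zero in the 3x3 kernel; whether that is actually faster on the (1000,128,28,28) tensor would need to be timed on the target hardware.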

How to compute loss given formula?

In this paper a method of depth estimation is proposed. It includes a disparity smoothness loss
where N is the number of pixels, d is the disparity map, and I is the image. I wonder if the following implementation in torch is correct, since I am getting poor results and want to rule out an incorrect implementation.
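For reference, the implementation below appears to compute a loss of the following form. This is read off the code and the symbol definitions above, not copied from the paper, so treat the exact notation as an assumption:

```latex
\mathcal{L}_{ds} = \frac{1}{N} \sum_{i,j}
    \left| \partial_x d_{ij} \right| e^{-\left\| \partial_x I_{ij} \right\|}
  + \left| \partial_y d_{ij} \right| e^{-\left\| \partial_y I_{ij} \right\|}
```

Here \partial_x and \partial_y denote finite differences along the width and height, and the norm is taken over the image channels.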
import torch
from torch import nn
import torch.nn.functional as F
import torch.linalg as LA
class DisparitySmoothnessLoss(nn.Module):
    @staticmethod  # no self parameter, so these must be static to be callable as self.x_grad(...)
    def x_grad(image):
        image = F.pad(image, (0, 1, 0, 0), mode='replicate')
        grad_x = image[:, :, :, :-1] - image[:, :, :, 1:]
        return grad_x
    @staticmethod
    def y_grad(image):
        image = F.pad(image, (0, 0, 0, 1), mode='replicate')
        grad_y = image[:, :, :-1, :] - image[:, :, 1:, :]
        return grad_y
    def forward(self, disparity_map, image):
        disparity_grad_x = self.x_grad(disparity_map)
        disparity_grad_y = self.y_grad(disparity_map)
        image_grad_x = self.x_grad(image)
        image_grad_y = self.y_grad(image)
        return torch.mean(torch.abs(disparity_grad_x) * torch.exp(-LA.vector_norm(image_grad_x, dim=1, keepdim=True)) + \
                          torch.abs(disparity_grad_y) * torch.exp(-LA.vector_norm(image_grad_y, dim=1, keepdim=True)))
I would expect a list of gradients to go into the disparity gradients, as in that implementation. On Papers with Code you can find many implementations of that article; you can also port the official TensorFlow one to PyTorch if you do not trust the PyTorch ones.
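One way to rule out gross implementation errors is a sanity check on synthetic inputs. The sketch below is a functional restatement of the question's loss under a hypothetical name, `smoothness_loss`, not the paper's reference code. It checks two properties one would expect: a flat disparity map gives zero loss, and a disparity edge is penalized less when the image has an edge at the same location:

```python
import torch
import torch.nn.functional as F

def smoothness_loss(disparity: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    def grad_x(t):
        t = F.pad(t, (0, 1, 0, 0), mode='replicate')  # pad one column on the right
        return t[..., :-1] - t[..., 1:]

    def grad_y(t):
        t = F.pad(t, (0, 0, 0, 1), mode='replicate')  # pad one row at the bottom
        return t[..., :-1, :] - t[..., 1:, :]

    # Edge-aware weights: small where the image gradient (over channels) is large
    wx = torch.exp(-torch.linalg.vector_norm(grad_x(image), dim=1, keepdim=True))
    wy = torch.exp(-torch.linalg.vector_norm(grad_y(image), dim=1, keepdim=True))
    return torch.mean(grad_x(disparity).abs() * wx + grad_y(disparity).abs() * wy)

# Synthetic inputs: a vertical step edge at column 4
flat_disp = torch.ones(1, 1, 8, 8)
step_disp = torch.zeros(1, 1, 8, 8)
step_disp[..., 4:] = 1.0
flat_img = torch.zeros(1, 3, 8, 8)
step_img = torch.zeros(1, 3, 8, 8)
step_img[..., 4:] = 1.0

loss_flat = smoothness_loss(flat_disp, step_img)
loss_unaligned = smoothness_loss(step_disp, flat_img)
loss_aligned = smoothness_loss(step_disp, step_img)
print(loss_flat.item(), loss_unaligned.item(), loss_aligned.item())
```

If these properties hold, the poor results are more likely to come from elsewhere, for example how the loss is weighted or combined across scales.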

Fancy indexing for numpy array

I want to sample len(valid_frame_id_ls) frames from data using fancy indexing on a numpy array, but I get an error message when I run code1. I don't understand why the shape of data[n, :, valid_frame_id_ls, :, :] is not equal to the shape of new_data[n, :, :len(valid_frame_id_ls), :, :]. Can anyone help me solve this bug?
I modified my code as shown in the code2 block. I didn't receive any error message when I ran code2, but I don't know why code2 is correct.
# code1
import numpy as np
data = np.random.random((2, 3, 50, 25, 1))
N, C, T, V, M = data.shape
new_data = np.zeros((N, C, T, V, M))
valid_frame_id_ls = [2, 3, 4, 5, 6]
for n in range(N):
    new_data[n, :, :len(valid_frame_id_ls), :, :] = data[n, :, valid_frame_id_ls, :, :]
# code1 error message:
new_data[n, :, :len(valid_frame_id_ls), :, :] = data[n, :, valid_frame_id_ls, :, :]
ValueError: could not broadcast input array from shape (5,3,25,1) into shape (3,5,25,1)
# code2
data = np.random.random((2, 3, 50, 25, 1))
N, C, T, V, M = data.shape
new_data = np.zeros((N, C, T, V, M))
valid_frame_id_ls = [2, 3, 4, 5, 11]
for n in range(N):
    new_data[n][:, :len(valid_frame_id_ls), :, :] = data[n][:, valid_frame_id_ls, :, :]
As described in this section of the docs, putting a slice in the middle of 'advanced' indexing results in an unexpected rearrangement of dimensions. Your size 5 dimension has been placed first, and the other dimensions after.
This has come up occasionally on SO as well. With a scalar n this really shouldn't be happening, but apparently the issue occurs deep in the indexing, and isn't easily corrected.
data[n][ :, valid_frame_id_ls, :, :]
breaks up the indexing, so the first ':' is no longer in the middle.
Another fix is to replace the slice with an equivalent array. Now both sides will have the same dimensions.
new_data[n, :, np.arange(len(valid_frame_id_ls)), :, :] = data[n, :, valid_frame_id_ls, :, :]
Though in this case I don't think you need to iterate on N at all:
new_data[:,:,:len(valid_frame_id_ls),:,:] = data[:,:, valid_frame_id_ls, :,:]
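The shapes involved can be inspected directly. The following sketch uses the question's array sizes; the variable names `mixed`, `split`, `batched` and `fixed` are only for illustration:

```python
import numpy as np

data = np.random.random((2, 3, 50, 25, 1))
idx = [2, 3, 4, 5, 6]

# Scalar + slice + list: the two advanced indices are separated by a slice,
# so the broadcast (length-5) dimension is moved to the front
mixed = data[0, :, idx, :, :]
# Indexing in two steps: the list is the only advanced index and stays in place
split = data[0][:, idx, :, :]
# No scalar index at all: again a single advanced index, which stays in place
batched = data[:, :, idx, :, :]
print(mixed.shape, split.shape, batched.shape)

# Moving the leading axis back recovers the expected layout
fixed = np.moveaxis(mixed, 0, 1)
print(np.array_equal(fixed, split))

# Loop-free assignment
new_data = np.zeros_like(data)
new_data[:, :, :len(idx), :, :] = data[:, :, idx, :, :]
```

`np.moveaxis` is one more way to undo the rearrangement if the combined index expression has to be kept.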

Numpy Slicing - Calculate Matrix PseudoInverses without Iteration from 3x3 array

I have N, 2x4 arrays stored in a (2x4xN) array J. I am trying to calculate the pseudoinverse for each of the N, 2x4 arrays, and save the pseudoinverses to a (N x 4 x 2) array J_pinv.
What I'm currently doing:
J_pinvs = np.zeros((N, 4, 2))
for i in range(N):
    J_pinvs[i, :, :] = np.transpose(J[:, :, i]) @ np.linalg.inv(J[:, :, i] @ J[:, :, i].transpose())
This works but I would like to speed up the compute time as this will be running in a layer of a neural network so I would like to make it as fast as possible.
What I've tried:
J_pinvs2 = np.zeros((N, 4, 2))
J_pinvs2[:, :, :] = np.transpose(J[:, :, :]) @ np.linalg.inv(J[:, :, :] @ J[:, :, :].transpose())
Generates the error:
<ipython-input-87-d8ee1ba2ae5e> in <module>
1 J_pinvs2 = np.zeros((4, 2, 3))
----> 2 J_pinvs2[:, :, :] = np.transpose(J[:, :, :]) @ np.linalg.inv(J[:, :, :] @ J[:, :, :].transpose())
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 4 is different from 3)
Is there a way to do this with slicing so that I don't need to use an iterator? I'm having trouble finding anything online. Any help/suggestions would be appreciated!
I think you need to specify how to transpose a 3-D array:
np.linalg.inv(a @ a.transpose(0,2,1))
will work, as opposed to
# sample data
a = np.arange(24).reshape(-1,2,4)
# a.shape is (3, 2, 4)
# a.transpose().shape is (4, 2, 3)
a @ a.transpose()
which will not work, because transpose() without arguments reverses all three axes.
Finally, the whole expression should be:
a.transpose(0,2,1) @ np.linalg.inv(a @ a.transpose(0,2,1))
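A short sketch tying this back to the question's (2x4xN) layout, with random data standing in for the real J. It moves the N axis to the front, applies the batched expression, and compares the result with the original loop and with `np.linalg.pinv`, which also accepts stacks of matrices. The agreement with `pinv` assumes every 2x4 matrix has full row rank:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
J = rng.standard_normal((2, 4, N))      # stand-in for the question's J

a = np.moveaxis(J, -1, 0)               # (N, 2, 4): a[i] is J[:, :, i]
a_t = a.transpose(0, 2, 1)              # (N, 4, 2): per-matrix transpose
J_pinvs = a_t @ np.linalg.inv(a @ a_t)  # batched A^T (A A^T)^-1

# Reference 1: the original loop
J_loop = np.zeros((N, 4, 2))
for i in range(N):
    Ji = J[:, :, i]
    J_loop[i] = Ji.T @ np.linalg.inv(Ji @ Ji.T)

# Reference 2: NumPy's built-in pseudoinverse applied to the whole stack
J_ref = np.linalg.pinv(a)

print(J_pinvs.shape, np.allclose(J_pinvs, J_loop), np.allclose(J_pinvs, J_ref))
```

`np.linalg.pinv` is based on the SVD, so it also handles rank-deficient matrices, where the explicit inverse would fail.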

Prove Convolution is Equivariant with respect to translation

I was reading the following statement about how convolution is equivariant with respect to translation from the Deep Learning Book.
Let g be a function mapping one image function to another image
function, such that I'=g(I) is the image function with I'(x, y)
=I(x−1, y). This shifts every pixel of I one unit to the right. If we apply this transformation to I, then apply convolution, the result will
be the same as if we applied convolution to I', then applied the
transformation g to the output.
For the last line I bolded, they are applying convolution to I', but shouldn't this be I? I' is the translated image. Otherwise it would effectively be saying:
f(g(I)) = g( f(g(I)) )
where f is convolution & g is translation.
I am trying to reproduce this myself in Python, using a 3D kernel whose depth equals the depth of the image, as would be the case in a convolution layer for a color image (here, a house).
Here is my code for applying a translation & then convolution to an image.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import scipy
import scipy.ndimage
I = scipy.ndimage.imread('pics/house.jpg')
def convolution(A, B):
    return np.sum( np.multiply(A, B) )
k = np.array([[[0,1,-1],[1,-1,0],[0,0,0]], [[-1,0,-1],[1,-1,0],[1,0,0]], [[1,-1,0],[1,0,1],[-1,0,1]]]) #kernel
## Translation
translated = 100
new_I = np.zeros( (I.shape[0]-translated, I.shape[1], I.shape[2]) )
for i in range(translated, I.shape[0]):
    for j in range(I.shape[1]):
        for l in range(I.shape[2]):
            new_I[i-translated,j,l] = I[i,j,l]
## Convolution
conv = np.zeros( (int((new_I.shape[0]-3)/2), int((new_I.shape[1]-3)/2) ) )
for i in range( conv.shape[0] ):
    for j in range(conv.shape[1]):
        conv[i, j] = convolution(new_I[2*i:2*i+3, 2*j:2*j+3, :], k)
scipy.misc.imsave('pics/convoled_image_2nd.png', conv)
I get the following output:
Now, I switch the convolution and Translation steps:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import scipy
import scipy.ndimage
I = scipy.ndimage.imread('pics/house.jpg')
def convolution(A, B):
    return np.sum( np.multiply(A, B) )
k = np.array([[[0,1,-1],[1,-1,0],[0,0,0]], [[-1,0,-1],[1,-1,0],[1,0,0]], [[1,-1,0],[1,0,1],[-1,0,1]]]) #kernel
## Convolution
conv = np.zeros( (int((I.shape[0]-3)/2), int((I.shape[1]-3)/2) ) )
for i in range( conv.shape[0] ):
    for j in range(conv.shape[1]):
        conv[i, j] = convolution(I[2*i:2*i+3, 2*j:2*j+3, :], k)
## Translation
translated = 100
new_I = np.zeros( (conv.shape[0]-translated, conv.shape[1]) )
for i in range(translated, conv.shape[0]):
    for j in range(conv.shape[1]):
        new_I[i-translated,j] = conv[i,j]
scipy.misc.imsave('pics/conv_trans_image.png', new_I)
And now I get the following output:
Shouldn't they be the same according to the book? What am I doing wrong?
Just as the book says, the linearity properties of convolution and translation guarantee that their order is interchangeable, except for boundary effects.
For instance:
import numpy as np
from scipy import misc, ndimage, signal
def translate(img, dx):
    img_t = np.zeros_like(img)
    if dx == 0: img_t[:, :] = img[:, :]
    elif dx > 0: img_t[:, dx:] = img[:, :-dx]
    else: img_t[:, :dx] = img[:, -dx:]
    return img_t
def convolution(img, k):
    return np.sum([signal.convolve2d(img[:, :, c], k[:, :, c])
                   for c in range(img.shape[2])], axis=0)
img = ndimage.imread('house.jpg')
k = np.array([
    [[ 0,  1, -1], [1, -1, 0], [ 0, 0, 0]],
    [[-1,  0, -1], [1, -1, 0], [ 1, 0, 0]],
    [[ 1, -1,  0], [1,  0, 1], [-1, 0, 1]]])
ct = translate(convolution(img, k), 100)
tc = convolution(translate(img, 100), k)
misc.imsave('conv_then_trans.png', ct)
misc.imsave('trans_then_conv.png', tc)
print(np.all(ct[2:-2, 2:-2] == tc[2:-2, 2:-2]))  # compare away from the borders
The problem is that you're overtranslating in the second example. After you shrink the image 2x, try translating by 50 instead.
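The effect of the stride on the translation amount can be checked on a small random array. This is a NumPy-only sketch; the helpers `strided_conv` and `crop_rows` are hypothetical stand-ins for the question's loops, and the output size formula here is the usual (n - k) // stride + 1 rather than the question's int((n - 3) / 2):

```python
import numpy as np

def strided_conv(img, k, stride=2):
    """Sliding-window sum of products with the given stride, as in the question's loops."""
    kh, kw = k.shape
    rows = (img.shape[0] - kh) // stride + 1
    cols = (img.shape[1] - kw) // stride + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = img[stride * i:stride * i + kh, stride * j:stride * j + kw]
            out[i, j] = np.sum(patch * k)
    return out

def crop_rows(a, n):
    """Translate by dropping the first n rows, as the question's translation does."""
    return a[n:]

rng = np.random.default_rng(0)
img = rng.random((40, 30))
k = rng.random((3, 3))

# Translate the input by 10 rows, then convolve with stride 2
trans_then_conv = strided_conv(crop_rows(img, 10), k)
# Convolve with stride 2, then translate the half-resolution output by 10 / 2 = 5 rows
conv_then_trans = crop_rows(strided_conv(img, k), 5)
# Translating the output by the full 10 rows overshoots
conv_then_overtrans = crop_rows(strided_conv(img, k), 10)

print(trans_then_conv.shape, conv_then_trans.shape, conv_then_overtrans.shape)
print(np.allclose(trans_then_conv, conv_then_trans))
```

With a stride of 2, a shift of 2s pixels in the input corresponds to a shift of s pixels in the output, so the two orders only agree when the second translation is halved.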
