Given a tensor with shape (b,c,h,w), I want to extract edges of the spatial data, that is, calculate x, y direction derivatives of the (h,w) and calculate the magnitude I=sqrt(|x_amplitude|^2+|y_amplitude|^2)
My current implementation is as follows:
import numpy as np
import torch
from scipy import ndimage
row_mat = np.asarray([[0, 0, 0], [1, 0, -1], [0, 0, 0]])
col_mat = row_mat.T
row_mat = row_mat[None, None, :, :]  # expand dims to convolve with tensor (batch, channel, height, width)
col_mat = col_mat[None, None, :, :]  # expand dims to convolve with tensor (batch, channel, height, width)
def derivative(batch: torch.Tensor) -> torch.Tensor:
    """
    uses convolution to perform x and y derivatives
    :param batch: input tensor batch
    :return: image derivative magnitudes
    """
    x_amplitude = ndimage.convolve(batch, row_mat)
    y_amplitude = ndimage.convolve(batch, col_mat)
    magnitude = np.sqrt(np.abs(x_amplitude) ** 2 + np.abs(y_amplitude) ** 2)
    return torch.tensor(magnitude)
I was wondering if there's a faster way, as this approach actually convolves using the definition of a derivative, so there might be downsides to that.
PS. To test this you can use the tensor torch.randn(1000,128,28,28), as these are the dimensions I'm dealing with.
For this specific operation you might be able to speed things up a bit by doing it "manually":
import torch
import torch.nn.functional as nnf
def derivative(batch: torch.Tensor) -> torch.Tensor:
    # pad the two spatial dims by one pixel on every side
    x = nnf.pad(batch, (1, 1, 1, 1), mode='reflect')
    # central differences; 1:-1 keeps the un-padded extent of the other axis
    dx2 = (x[..., 1:-1, :-2] - x[..., 1:-1, 2:])**2
    dy2 = (x[..., :-2, 1:-1] - x[..., 2:, 1:-1])**2
    mag = torch.sqrt(dx2 + dy2)
    return mag
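As a sanity check on the slicing idea, here is a minimal NumPy-only sketch (an assumption: np.pad with mode="reflect" stands in for the torch padding, and derivative_np is a hypothetical name, not part of the answer). With a one-pixel pad, the window 1:-1 recovers the original extent of the axis that is not being differenced, so the output keeps the input shape. Away from the border the result should equal twice the central-difference gradient magnitude, because the [1, 0, -1] kernel does not divide by 2.

```python
import numpy as np

def derivative_np(batch):
    # Pad only the two spatial axes by one pixel, mirroring without repeating the edge.
    x = np.pad(batch, ((0, 0), (0, 0), (1, 1), (1, 1)), mode="reflect")
    dx = x[..., 1:-1, :-2] - x[..., 1:-1, 2:]   # difference along width
    dy = x[..., :-2, 1:-1] - x[..., 2:, 1:-1]   # difference along height
    return np.sqrt(dx ** 2 + dy ** 2)

rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 3, 28, 28))
mag = derivative_np(batch)

# Reference: np.gradient uses (f[i+1] - f[i-1]) / 2 in the interior.
gy, gx = np.gradient(batch, axis=(-2, -1))
reference = 2.0 * np.hypot(gx, gy)
interior_ok = np.allclose(mag[..., 1:-1, 1:-1], reference[..., 1:-1, 1:-1])
print(mag.shape, interior_ok)
```

Only the interior is compared because the two approaches treat the border differently (reflect padding versus one-sided differences).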
I have a 3D array (1883,100,68) as (batch,step,features).
The 68 features are totally different features such as energy and mfcc.
I wish to normalize the features respective to their own type.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.reshape(X_train.shape[0], -1)).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(X_test.shape[0], -1)).reshape(X_test.shape)
print(X_train.shape)
print(max(X_train[0][0]))
print(min(X_train[0][0]))
Apparently, turning it into a 2D array won't work because each feature is normalized with respect to all the 6800 features. This caused multiple features from all the 100 steps to become zeros.
Here is what I am looking for. For example, feature[0] is energy. For a batch, there are 100 energy values due to the 100 steps. I wish to have these 100 energy values normalized among themselves.
So the normalization should be carried out among [1,1,0],[1,2,0],[1,3,0]...[1,100,0]. Same for all other features.
How should I approach it?
Update:
The following code was produced with help from sai.
import numpy as np
def feature_normalization(x):
    batches_unrolled = np.expand_dims(np.reshape(x, (-1, x.shape[2])), axis=0)
    x_normalized = (x - np.mean(batches_unrolled, axis=1, keepdims=True)) / np.std(batches_unrolled, axis=1, keepdims=True)
    np.testing.assert_allclose(x_normalized[0, :, 0], (x[0, :, 0] - np.mean(x[:, :, 0])) / np.std(x[:, :, 0]))
    return x_normalized
def testset_normalization(X_train, X_test):
    # statistics are fitted on the training set only, then applied to the test set
    batches_unrolled = np.expand_dims(np.reshape(X_train, (-1, X_train.shape[2])), axis=0)
    fitted_mean = np.mean(batches_unrolled, axis=1, keepdims=True)
    fitted_std = np.std(batches_unrolled, axis=1, keepdims=True)
    X_test_normalized = (X_test - fitted_mean) / fitted_std
    return X_test_normalized
To normalize features independently across all the samples in a batch:
1. Unroll the batch samples to get a [10 (time steps) * batch_size] x [40 features] matrix.
2. Get the mean and standard deviation of every feature.
3. Perform an element-wise normalization on the actual batched samples.
import numpy as np
x = np.random.random((20, 10, 40))
batches_unrolled = np.expand_dims(np.reshape(x, (-1, 40)), axis=0)
x_normalized = (x - np.mean(batches_unrolled, axis=1, keepdims=True)) / np.std(batches_unrolled, axis=1, keepdims=True)
np.testing.assert_allclose(x_normalized[0, :, 0], (x[0, :, 0] - np.mean(x[:, :, 0])) / np.std(x[:, :, 0]))
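The unrolling step can also be written by reducing over the batch and step axes directly. The following is a minimal sketch under assumptions: the function names fit_feature_stats and apply_feature_stats and the eps guard against constant features are hypothetical additions, not part of the answer. It fits the statistics on the training array and reuses them for the test array.

```python
import numpy as np

def fit_feature_stats(x_train, eps=1e-8):
    # Reduce over batch and step axes; one mean/std per feature, shape (1, 1, features).
    mean = x_train.mean(axis=(0, 1), keepdims=True)
    std = x_train.std(axis=(0, 1), keepdims=True)
    return mean, std + eps  # eps (assumed) avoids division by zero for constant features

def apply_feature_stats(x, mean, std):
    # Broadcasting applies each feature's statistics to every sample and step.
    return (x - mean) / std

rng = np.random.default_rng(0)
x_train = rng.random((20, 10, 40))
x_test = rng.random((5, 10, 40))

mean, std = fit_feature_stats(x_train)
x_train_n = apply_feature_stats(x_train, mean, std)
x_test_n = apply_feature_stats(x_test, mean, std)

# Same per-feature mean as the explicit unrolling to a 2D matrix.
unrolled_mean = np.reshape(x_train, (-1, x_train.shape[2])).mean(axis=0)
```

After this, each feature of x_train_n should have mean close to 0 and standard deviation close to 1 across all samples and steps, while x_test_n is scaled with the training statistics only.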
I'm implementing a 2d periodic convolution on a synthetic image in three different ways: using scipy, using torch and using the Fourier transform (also under torch framework).
However, I've got different results. Performing the operation by hand I can see that scipy's convolution yields the correct results. torch's spatial version, on the other hand, yields the expected result inverted. Finally, the Fourier version returns something unexpected.
The code is the following:
import torch
import numpy as np
import scipy.signal as sig
import torch.nn.functional as F
import matplotlib.pyplot as plt
def numpy_periodic_conv(f, k):
    H, W = f.shape
    periodic_f = np.hstack([f, f])
    periodic_f = np.vstack([periodic_f, periodic_f])
    conv = sig.convolve2d(periodic_f, k, mode='same')
    conv = conv[H // 2:-H // 2, W // 2:-W // 2]
    return periodic_f, conv
def torch_periodic_conv(f, k):
    H, W = f.shape[-2:]
    periodic_f = f.repeat(1, 1, 2, 2)
    conv = F.conv2d(periodic_f, k, padding=1)
    conv = conv[:, :, H // 2:-H // 2, W // 2:-W // 2]
    return periodic_f.squeeze().numpy(), conv.squeeze().numpy()
def torch_fourier_conv(f, k):
    pad_x = f.shape[-2] - k.shape[-2]
    pad_y = f.shape[-1] - k.shape[-1]
    expanded_kernel = F.pad(k, [0, pad_x, 0, pad_y])
    fft_x = torch.rfft(f, 2, onesided=False)
    fft_kernel = torch.rfft(expanded_kernel, 2, onesided=False)
    real = fft_x[:, :, :, :, 0] * fft_kernel[:, :, :, :, 0] - \
           fft_x[:, :, :, :, 1] * fft_kernel[:, :, :, :, 1]
    im = fft_x[:, :, :, :, 0] * fft_kernel[:, :, :, :, 1] + \
         fft_x[:, :, :, :, 1] * fft_kernel[:, :, :, :, 0]
    fft_conv = torch.stack([real, im], -1)  # (a+bj)*(c+dj) = (ac-bd)+(ad+bc)j
    ifft_conv = torch.irfft(fft_conv, 2, onesided=False)
    return expanded_kernel.squeeze().numpy(), ifft_conv.squeeze().numpy()
if __name__ == '__main__':
    f = np.concatenate([np.ones((10, 5)), np.zeros((10, 5))], 1)
    k = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])
    f_tensor = torch.from_numpy(f).unsqueeze(0).unsqueeze(0).float()
    k_tensor = torch.from_numpy(k).unsqueeze(0).unsqueeze(0).float()
    np_periodic_f, np_periodic_conv = numpy_periodic_conv(f, k)
    tc_periodic_f, tc_periodic_conv = torch_periodic_conv(f_tensor, k_tensor)
    tc_fourier_k, tc_fourier_conv = torch_fourier_conv(f_tensor, k_tensor)
    print('Spatial numpy conv shape= ', np_periodic_conv.shape)
    print('Spatial torch conv shape= ', tc_periodic_conv.shape)
    print('Fourier torch conv shape= ', tc_fourier_conv.shape)
    r_np = dict(name='numpy', im=np_periodic_f, k=k, conv=np_periodic_conv)
    r_torch = dict(name='torch', im=tc_periodic_f, k=k, conv=tc_periodic_conv)
    r_fourier = dict(name='fourier', im=f, k=tc_fourier_k, conv=tc_fourier_conv)
    titles = ['{} im', '{} kernel', '{} conv']
    results = [r_np, r_torch, r_fourier]
    fig, axs = plt.subplots(3, 3)
    for i, r_dict in enumerate(results):
        axs[i, 0].imshow(r_dict['im'], cmap='gray')
        axs[i, 0].set_title(titles[0].format(r_dict['name']))
        axs[i, 1].imshow(r_dict['k'], cmap='gray')
        axs[i, 1].set_title(titles[1].format(r_dict['name']))
        axs[i, 2].imshow(r_dict['conv'], cmap='gray')
        axs[i, 2].set_title(titles[2].format(r_dict['name']))
    plt.show()
The results I'm obtaining:
Note: The image for both the numpy and torch versions shows the periodic image, which is required to perform the periodic convolution. The kernel for the Fourier version shows the original kernel zero-padded to the image size, which is required to compute the element-wise multiplication in the frequency domain.
-Edit1: There was an error in the multiplication in the Fourier version: I was computing (ac-bd)+(ad-bc)j instead of (ac-bd)+(ad+bc)j. But now I get the convolution shifted by one column.
-Edit2: torch's spatial convolution result is inverted because the operation is actually a cross-correlation. This was confirmed in pytorch's official forum here. Furthermore, after fixing the kernel padding as in Cris Luengo's answer, the frequency method yielded the same results as the correlation. This is rather strange to me because, as far as I know, the frequency-domain property holds for convolution, not correlation.
New-results after fixing the kernel:
The FFT result is wrong because the padding is wrong. When padding, you need to put the origin (center of the kernel) at the top-left corner of the image. See this other answer for details.
The difference between the other two is the difference between a convolution and a correlation. It looks like the “numpy” result is a convolution, the “torch” one a correlation.
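Both points can be illustrated with a small NumPy sketch. It is written under assumptions: np.fft replaces the torch FFT calls from the question, and the helper names circular_conv_fft and circular_conv_direct are hypothetical. The kernel is zero-padded to the image size and then rolled so that its center lands on index (0, 0); without that roll the output is shifted by the kernel's half-width. The last lines show that, for this particular antisymmetric kernel, correlation is simply the negated convolution.

```python
import numpy as np

def circular_conv_fft(f, k):
    H, W = f.shape
    kh, kw = k.shape
    padded = np.zeros((H, W))
    padded[:kh, :kw] = k
    # Move the kernel center to the top-left corner (the FFT's origin).
    padded = np.roll(padded, shift=(-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(padded)))

def circular_conv_direct(f, k):
    # Periodic convolution by definition: out[i, j] = sum_ab k[a, b] * f[i - a', j - b'],
    # where (a', b') is the tap's offset from the kernel center.
    kh, kw = k.shape
    out = np.zeros(f.shape)
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * np.roll(f, shift=(a - kh // 2, b - kw // 2), axis=(0, 1))
    return out

f = np.concatenate([np.ones((10, 5)), np.zeros((10, 5))], 1)
k = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)

conv_fft = circular_conv_fft(f, k)
conv_direct = circular_conv_direct(f, k)

# Correlation equals convolution with the kernel flipped in both axes.
# For this kernel the flip is -k, so correlation is the sign-inverted convolution.
corr_direct = circular_conv_direct(f, k[::-1, ::-1])
print(np.allclose(conv_fft, conv_direct), np.allclose(corr_direct, -conv_direct))
```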
Or: Scattering different phases of multi-channel images in tensorflow...
My question is as follows:
I have "images", all of the same dimensions, which in some sense correspond to different phases of a target image. And I'd like to rebuild that full-blown image with tf functionality.
This turns out to be much less simple than I originally expected and I'd be very grateful for any help!
A more detailed exposition follows:
In numpy, one easily interleaves images via simple assignment -
import numpy as np
im = np.random.random((1, 8, 8, 2))
phased_im_01 = im[:, ::2, 1::2, :]
phased_im_00 = im[:, ::2, ::2, :]
phased_im_10 = im[:, 1::2, ::2, :]
phased_im_11 = im[:, 1::2, 1::2, :]
rebuild_im = np.zeros((1, 8, 8, 2))
rebuild_im[:, ::2, ::2, :] = phased_im_00
rebuild_im[:, ::2, 1::2, :] = phased_im_01
rebuild_im[:, 1::2, ::2, :] = phased_im_10
rebuild_im[:, 1::2, 1::2, :] = phased_im_11
print(np.all(rebuild_im == im))
But as is well known, assignment is a no-go in tf, and one usually uses things like tf.concat coupled with tf.reshape (for very simple cases) or tf.scatter_nd (for more complicated cases). I was unsuccessful in implementing the equivalent of the above numpy functionality using any of the many things I tried (like permuting the tensor to have the width dimension first, trying to scatter_nd, and permuting back, a method I've successfully used before for other problems), or any solution on SO (like stacking and reshaping oneself to death).
Just to be clear, my actual use-case has an unknown batch-size, thousands of channels, and 4 phases in each image dimension. But I just need a working solution for the simple toy example above; generalization is on me ;-)
Thanks to any helpers out there, (and sorry I can only describe my efforts and not show them. They're just a mess of unsuccessful mistakes degrading into horrible trial-and-error code snippets until giving up and coming here for some help anyway, so no major loss).
Clarifications can be added on demand.
To reproduce the numpy example in TensorFlow, please try depth_to_space:
import tensorflow as tf
im = tf.random_normal((1, 8, 8, 2))
phased_im_01 = im[:, ::2, 1::2, :]
phased_im_00 = im[:, ::2, ::2, :]
phased_im_10 = im[:, 1::2, ::2, :]
phased_im_11 = im[:, 1::2, 1::2, :]
phases = tf.concat(
(phased_im_00, phased_im_01, phased_im_10, phased_im_11), axis=3)
rebuild_im = tf.nn.depth_to_space(phases, block_size=2, data_format='NHWC')
dif = tf.reduce_sum(rebuild_im - im) # 0.0
As kindly suggested by ShlomiF, the more general example is:
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()
num_of_channels = 20
h = w = 256
num_of_phases = 4
im = np.random.random((1, h, w, num_of_channels))
phase_ims = []
for i in range(num_of_phases):
    for j in range(num_of_phases):
        phase_ims.append(im[:, i::num_of_phases, j::num_of_phases, :])
all_phases = tf.concat(phase_ims, axis=3)
rebuild_im = tf.depth_to_space(all_phases, block_size=num_of_phases, data_format='NHWC')
diff = tf.reduce_sum(rebuild_im - im)
print(np.asarray(diff)) # --> 0.0
As far as I know, the idea of depth_to_space, or periodic shuffling, came from this paper. You may find more details and visualization there.
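For intuition about what depth_to_space does with NHWC data, here is a minimal NumPy sketch (depth_to_space_np is a hypothetical helper, and the reshape/transpose formulation is my reading of the operation, not code from TensorFlow). The channel axis is split into (block row, block column, output channel), and the two block axes are interleaved with height and width.

```python
import numpy as np

def depth_to_space_np(x, block):
    n, h, w, d = x.shape
    c = d // (block * block)
    # Channel index is assumed to be laid out as (i * block + j) * c + channel.
    x = x.reshape(n, h, w, block, block, c)
    # Reorder to (n, h, i, w, j, c) so that i interleaves with h and j with w.
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(n, h * block, w * block, c)

rng = np.random.default_rng(0)
num_of_phases = 4
im = rng.random((1, 16, 16, 3))

# Same phase ordering as in the example above: i (rows) outer, j (columns) inner.
phase_ims = [im[:, i::num_of_phases, j::num_of_phases, :]
             for i in range(num_of_phases)
             for j in range(num_of_phases)]
all_phases = np.concatenate(phase_ims, axis=3)
rebuild_im = depth_to_space_np(all_phases, num_of_phases)
print(np.array_equal(rebuild_im, im))
```

The ordering of the phases in the concatenation matters: swapping the two loops would transpose each block.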
I was reading the following statement about how convolution is equivariant with respect to translation from the Deep Learning Book.
Let g be a function mapping one image function to another image
function, such that I' = g(I) is the image function with
I'(x, y) = I(x − 1, y). This shifts every pixel of I one unit to the right.
If we apply this transformation to I, then apply convolution, the result will
be the same as if we applied convolution to I', then applied the
transformation g to the output.
For the last line I bolded, they are applying convolution to I', but shouldn't this be I? I' is the translated image. Otherwise it would effectively be saying:
f(g(I)) = g( f(g(I)) )
where f is convolution & g is translation.
I am trying to execute the same myself in python using 3D kernel equal to the depth of the image as would be the case in the convolution layer for a colored image, a house.
Here is my code for applying a translation & then convolution to an image.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import scipy
import scipy.ndimage
I = scipy.ndimage.imread('pics/house.jpg')
def convolution(A, B):
    return np.sum( np.multiply(A, B) )
k = np.array([[[0,1,-1],[1,-1,0],[0,0,0]], [[-1,0,-1],[1,-1,0],[1,0,0]], [[1,-1,0],[1,0,1],[-1,0,1]]]) #kernel
## Translation
translated = 100
new_I = np.zeros( (I.shape[0]-translated, I.shape[1], I.shape[2]) )
for i in range(translated, I.shape[0]):
    for j in range(I.shape[1]):
        for l in range(I.shape[2]):
            new_I[i-translated,j,l] = I[i,j,l]
## Convolution
conv = np.zeros( (int((new_I.shape[0]-3)/2), int((new_I.shape[1]-3)/2) ) )
for i in range( conv.shape[0] ):
    for j in range(conv.shape[1]):
        conv[i, j] = convolution(new_I[2*i:2*i+3, 2*j:2*j+3, :], k)
scipy.misc.imsave('pics/convoled_image_2nd.png', conv)
I get the following output:
Now, I switch the convolution and Translation steps:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import scipy
import scipy.ndimage
I = scipy.ndimage.imread('pics/house.jpg')
def convolution(A, B):
    return np.sum( np.multiply(A, B) )
k = np.array([[[0,1,-1],[1,-1,0],[0,0,0]], [[-1,0,-1],[1,-1,0],[1,0,0]], [[1,-1,0],[1,0,1],[-1,0,1]]]) #kernel
## Convolution
conv = np.zeros( (int((I.shape[0]-3)/2), int((I.shape[1]-3)/2) ) )
for i in range( conv.shape[0] ):
    for j in range(conv.shape[1]):
        conv[i, j] = convolution(I[2*i:2*i+3, 2*j:2*j+3, :], k)
## Translation
translated = 100
new_I = np.zeros( (conv.shape[0]-translated, conv.shape[1]) )
for i in range(translated, conv.shape[0]):
    for j in range(conv.shape[1]):
        new_I[i-translated,j] = conv[i,j]
scipy.misc.imsave('pics/conv_trans_image.png', new_I)
And now I get the following output:
Shouldn't they be the same according to the book? What am I doing wrong?
Just as the book says, the linearity properties of convolution and translation guarantee that their order is interchangeable, except for boundary effects.
For instance:
import numpy as np
from scipy import misc, ndimage, signal
def translate(img, dx):
    img_t = np.zeros_like(img)
    if dx == 0: img_t[:, :] = img[:, :]
    elif dx > 0: img_t[:, dx:] = img[:, :-dx]
    else: img_t[:, :dx] = img[:, -dx:]
    return img_t
def convolution(img, k):
    return np.sum([signal.convolve2d(img[:, :, c], k[:, :, c])
                   for c in range(img.shape[2])], axis=0)
img = ndimage.imread('house.jpg')
k = np.array([
[[ 0, 1, -1], [1, -1, 0], [ 0, 0, 0]],
[[-1, 0, -1], [1, -1, 0], [ 1, 0, 0]],
[[ 1, -1, 0], [1, 0, 1], [-1, 0, 1]]])
ct = translate(convolution(img, k), 100)
tc = convolution(translate(img, 100), k)
misc.imsave('conv_then_trans.png', ct)
misc.imsave('trans_then_conv.png', tc)
if np.all(ct[2:-2, 2:-2] == tc[2:-2, 2:-2]):
    print('Equal!')
Prints:
Equal!
The problem is that you're overtranslating in the second example. After you shrink the image 2x, try translating by 50 instead.
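The stride is what changes the translation amount. Here is a minimal single-channel NumPy sketch of that point (strided_valid_corr is a hypothetical helper, a random array stands in for the house image, and the shift is 10 rows instead of 100 to keep it small). Cropping 2t rows before a stride-2 sliding window gives the same result as cropping t rows after it.

```python
import numpy as np

def strided_valid_corr(img, k, stride=2):
    # Slide the kernel over the image without padding, stepping by `stride`.
    kh, kw = k.shape
    out_h = (img.shape[0] - kh) // stride + 1
    out_w = (img.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = img[stride * i:stride * i + kh, stride * j:stride * j + kw]
            out[i, j] = np.sum(window * k)
    return out

rng = np.random.default_rng(0)
img = rng.random((40, 30))
k = rng.random((3, 3))
shift = 10  # must be a multiple of the stride for the two paths to line up

shift_then_conv = strided_valid_corr(img[shift:], k)
conv_then_shift = strided_valid_corr(img, k)[shift // 2:]
print(np.allclose(shift_then_conv, conv_then_shift))
```

If the shift is not a multiple of the stride, the two sampling grids no longer coincide and the outputs differ, which is one reason strided convolution is only equivariant to shifts by multiples of the stride.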