Related
I am trying to perform a 2d convolution in python using numpy
I have a 2d array as follows with kernel H_r for the rows and H_c for the columns
data = np.zeros((nr, nc), dtype=np.float32)
#fill array with some data here then convolve
for r in range(nr):
data[r,:] = np.convolve(data[r,:], H_r, 'same')
for c in range(nc):
data[:,c] = np.convolve(data[:,c], H_c, 'same')
data = data.astype(np.uint8);
It does not produce the output that I was expecting, does this code look OK, I think the problem is with the casting from float32 to 8bit. Whats the best way to do this
Thanks
Maybe it is not the most optimized solution, but this is an implementation I used before with numpy library for Python:
def convolution2d(image, kernel, bias):
m, n = kernel.shape
if (m == n):
y, x = image.shape
y = y - m + 1
x = x - m + 1
new_image = np.zeros((y,x))
for i in range(y):
for j in range(x):
new_image[i][j] = np.sum(image[i:i+m, j:j+m]*kernel) + bias
return new_image
I hope this code helps other guys with the same doubt.
Regards.
Edit [Jan 2019]
#Tashus comment bellow is correct, and #dudemeister's answer is thus probably more on the mark. The function he suggested is also more efficient, by avoiding a direct 2D convolution and the number of operations that would entail.
Possible Problem
I believe you are doing two 1d convolutions, the first per columns and the second per rows, and replacing the results from the first with the results of the second.
Notice that numpy.convolve with the 'same' argument returns an array of equal shape to the largest one provided, so when you make the first convolution you already populated the entire data array.
One good way to visualize your arrays during these steps is to use Hinton diagrams, so you can check which elements already have a value.
Possible Solution
You can try to add the results of the two convolutions (use data[:,c] += .. instead of data[:,c] = on the second for loop), if your convolution matrix is the result of using the one dimensional H_r and H_c matrices like so:
Another way to do that would be to use scipy.signal.convolve2d with a 2d convolution array, which is probably what you wanted to do in the first place.
Since you already have your kernel separated you should simply use the sepfir2d function from scipy:
from scipy.signal import sepfir2d
convolved = sepfir2d(data, H_r, H_c)
On the other hand, the code you have there looks all right ...
I checked out many implementations and found none for my purpose, which should be really simple. So here is a dead-simple implementation with for loop
def convolution2d(image, kernel, stride, padding):
image = np.pad(image, [(padding, padding), (padding, padding)], mode='constant', constant_values=0)
kernel_height, kernel_width = kernel.shape
padded_height, padded_width = image.shape
output_height = (padded_height - kernel_height) // stride + 1
output_width = (padded_width - kernel_width) // stride + 1
new_image = np.zeros((output_height, output_width)).astype(np.float32)
for y in range(0, output_height):
for x in range(0, output_width):
new_image[y][x] = np.sum(image[y * stride:y * stride + kernel_height, x * stride:x * stride + kernel_width] * kernel).astype(np.float32)
return new_image
It might not be the most optimized solution either, but it is approximately ten times faster than the one proposed by #omotto and it only uses basic numpy function (as reshape, expand_dims, tile...) and no 'for' loops:
def gen_idx_conv1d(in_size, ker_size):
"""
Generates a list of indices. This indices correspond to the indices
of a 1D input tensor on which we would like to apply a 1D convolution.
For instance, with a 1D input array of size 5 and a kernel of size 3, the
1D convolution product will successively looks at elements of indices [0,1,2],
[1,2,3] and [2,3,4] in the input array. In this case, the function idx_conv1d(5,3)
outputs the following array: array([0,1,2,1,2,3,2,3,4]).
args:
in_size: (type: int) size of the input 1d array.
ker_size: (type: int) kernel size.
return:
idx_list: (type: np.array) list of the successive indices of the 1D input array
access to the 1D convolution algorithm.
example:
>>> gen_idx_conv1d(in_size=5, ker_size=3)
array([0, 1, 2, 1, 2, 3, 2, 3, 4])
"""
f = lambda dim1, dim2, axis: np.reshape(np.tile(np.expand_dims(np.arange(dim1),axis),dim2),-1)
out_size = in_size-ker_size+1
return f(ker_size, out_size, 0)+f(out_size, ker_size, 1)
def repeat_idx_2d(idx_list, nbof_rep, axis):
"""
Repeats an array of indices (idx_list) a number of time (nbof_rep) "along" an axis
(axis). This function helps to browse through a 2d array of size
(len(idx_list),nbof_rep).
args:
idx_list: (type: np.array or list) a 1D array of indices.
nbof_rep: (type: int) number of repetition.
axis: (type: int) axis "along" which the repetition will be applied.
return
idx_list: (type: np.array) a 1D array of indices of size len(idx_list)*nbof_rep.
example:
>>> a = np.array([0, 1, 2])
>>> repeat_idx_2d(a, 3, 0) # repeats array 'a' 3 times along 'axis' 0
array([0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> repeat_idx_2d(a, 3, 1) # repeats array 'a' 3 times along 'axis' 1
array([0, 1, 2, 0, 1, 2, 0, 1, 2])
>>> b = np.reshape(np.arange(3*4), (3,4))
>>> b[repeat_idx_2d(np.arange(3), 4, 0), repeat_idx_2d(np.arange(4), 3, 1)]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
"""
assert axis in [0,1], "Axis should be equal to 0 or 1."
tile_axis = (nbof_rep,1) if axis else (1,nbof_rep)
return np.reshape(np.tile(np.expand_dims(idx_list, 1),tile_axis),-1)
def conv2d(im, ker):
"""
Performs a 'valid' 2D convolution on an image. The input image may be
a 2D or a 3D array.
The output image first two dimensions will be reduced depending on the
convolution size.
The kernel may be a 2D or 3D array. If 2D, it will be applied on every
channel of the input image. If 3D, its last dimension must match the
image one.
args:
im: (type: np.array) image (2D or 3D).
ker: (type: np.array) convolution kernel (2D or 3D).
returns:
im: (type: np.array) convolved image.
example:
>>> im = np.reshape(np.arange(10*10*3),(10,10,3))/(10*10*3) # 3D image
>>> ker = np.array([[0,1,0],[-1,0,1],[0,-1,0]]) # 2D kernel
>>> conv2d(im, ker) # 3D array of shape (8,8,3)
"""
if len(im.shape)==2: # if the image is a 2D array, it is reshaped by expanding the last dimension
im = np.expand_dims(im,-1)
im_x, im_y, im_w = im.shape
if len(ker.shape)==2: # if the kernel is a 2D array, it is reshaped so it will be applied to all of the image channels
ker = np.tile(np.expand_dims(ker,-1),[1,1,im_w]) # the same kernel will be applied to all of the channels
assert ker.shape[-1]==im.shape[-1], "Kernel and image last dimension must match."
ker_x = ker.shape[0]
ker_y = ker.shape[1]
# shape of the output image
out_x = im_x - ker_x + 1
out_y = im_y - ker_y + 1
# reshapes the image to (out_x, ker_x, out_y, ker_y, im_w)
idx_list_x = gen_idx_conv1d(im_x, ker_x) # computes the indices of a 1D conv (cf. idx_conv1d doc)
idx_list_y = gen_idx_conv1d(im_y, ker_y)
idx_reshaped_x = repeat_idx_2d(idx_list_x, len(idx_list_y), 0) # repeats the previous indices to be used in 2D (cf. repeat_idx_2d doc)
idx_reshaped_y = repeat_idx_2d(idx_list_y, len(idx_list_x), 1)
im_reshaped = np.reshape(im[idx_reshaped_x, idx_reshaped_y, :], [out_x, ker_x, out_y, ker_y, im_w]) # reshapes
# reshapes the 2D kernel
ker = np.reshape(ker,[1, ker_x, 1, ker_y, im_w])
# applies the kernel to the image and reduces the dimension back to the one of original input image
return np.squeeze(np.sum(im_reshaped*ker, axis=(1,3)))
I tried to add a lot of comments to explain the method but the global idea is to reshape the 3D input image to a 5D one of shape (output_image_height, kernel_height, output_image_width, kernel_width, output_image_channel) and then to apply the kernel directly using the basic array multiplication. Of course, this methods is then using more memory (during the execution the size of the image is thus multiply by kernel_height*kernel_width) but it is faster.
To do this reshape step, I 'over-used' the indexing methods of numpy arrays, especially, the possibility of giving a numpy array as indices into a numpy array.
This methods could also be used to re-code the 2D convolution product in Pytorch or Tensorflow using the base math functions but I have no doubt in saying that it will be slower than the existing nn.conv2d operator...
I really enjoyed coding this method by only using the numpy basic tools.
One of the most obvious is to hard code the kernel.
img = img.convert('L')
a = np.array(img)
out = np.zeros([a.shape[0]-2, a.shape[1]-2], dtype='float')
out += a[:-2, :-2]
out += a[1:-1, :-2]
out += a[2:, :-2]
out += a[:-2, 1:-1]
out += a[1:-1,1:-1]
out += a[2:, 1:-1]
out += a[:-2, 2:]
out += a[1:-1, 2:]
out += a[2:, 2:]
out /= 9.0
out = out.astype('uint8')
img = Image.fromarray(out)
This example does a box blur 3x3 completely unrolled. You can multiply the values where you have a different value and divide them by a different amount. But, if you honestly want the quickest and dirtiest method this is it. I think it beats Guillaume Mougeot's method by a factor of like 5. His method beating the others by a factor of 10.
It may lose a few steps if you're doing something like a gaussian blur. and need to multiply some stuff.
Try to first round and then cast to uint8:
data = data.round().astype(np.uint8);
I wrote this convolve_stride which uses numpy.lib.stride_tricks.as_strided. Moreover it supports both strides and dilation. It is also compatible to tensor with order > 2.
import numpy as np
from numpy.lib.stride_tricks import as_strided
from im2col import im2col
def conv_view(X, F_s, dr, std):
X_s = np.array(X.shape)
F_s = np.array(F_s)
dr = np.array(dr)
Fd_s = (F_s - 1) * dr + 1
if np.any(Fd_s > X_s):
raise ValueError('(Dilated) filter size must be smaller than X')
std = np.array(std)
X_ss = np.array(X.strides)
Xn_s = (X_s - Fd_s) // std + 1
Xv_s = np.append(Xn_s, F_s)
Xv_ss = np.tile(X_ss, 2) * np.append(std, dr)
return as_strided(X, Xv_s, Xv_ss, writeable=False)
def convolve_stride(X, F, dr=None, std=None):
if dr is None:
dr = np.ones(X.ndim, dtype=int)
if std is None:
std = np.ones(X.ndim, dtype=int)
if not (X.ndim == F.ndim == len(dr) == len(std)):
raise ValueError('X.ndim, F.ndim, len(dr), len(std) must be the same')
Xv = conv_view(X, F.shape, dr, std)
return np.tensordot(Xv, F, axes=X.ndim)
%timeit -n 100 -r 10 convolve_stride(A, F)
#31.2 ms ± 1.31 ms per loop (mean ± std. dev. of 10 runs, 100 loops each)
Super simple and fast convolution using only basic numpy:
import numpy as np
def conv2d(image, kernel):
# apply kernel to image, return image of the same shape
# assume both image and kernel are 2D arrays
# kernel = np.flipud(np.fliplr(kernel)) # optionally flip the kernel
k = kernel.shape[0]
width = k//2
# place the image inside a frame to compensate for the kernel overlap
a = framed(image, width)
b = np.zeros(image.shape) # fill the output array with zeros; do not use np.empty()
# shift the image around each pixel, multiply by the corresponding kernel value and accumulate the results
for p, dp, r, dr in [(i, i + image.shape[0], j, j + image.shape[1]) for i in range(k) for j in range(k)]:
b += a[p:dp, r:dr] * kernel[p, r]
# or just write two nested for loops if you prefer
# np.clip(b, 0, 255, out=b) # optionally clip values exceeding the limits
return b
def framed(image, width):
a = np.zeros((image.shape[0]+2*width, image.shape[1]+2*width))
a[width:-width, width:-width] = image
# alternatively fill the frame with ones or copy border pixels
return a
Run it:
Image.fromarray(conv2d(image, kernel).astype('uint8'))
Instead of sliding the kernel along the image and computing the transformation pixel by pixel, create a series of shifted versions of the image corresponding to each element in the kernel and apply the corresponding kernel value to each of the shifted image versions.
This is probably the fastest you can get using just basic numpy; the speed is already comparable to C implementation of scipy convolve2d and better than fftconvolve. The idea is similar to #Tatarize. This example works only for one color component; for RGB just repeat for each (or modify the algorithm accordingly).
Typically, Convolution 2D is a misnomer. Ideally, under the hood,
whats being done is a correlation of 2 matrices.
pad == same
returns the output as the same as input dimension
It can also take asymmetric images. In order to perform correlation(convolution in deep learning lingo) on a batch of 2d matrices, one can iterate over all the channels, calculate the correlation for each of the channel slices with the respective filter slice.
For example: If image is (28,28,3) and filter size is (5,5,3) then take each of the 3 slices from the image channel and perform the cross correlation using the custom function above and stack the resulting matrix in the respective dimension of the output.
def get_cross_corr_2d(W, X, pad = 'valid'):
if(pad == 'same'):
pr = int((W.shape[0] - 1)/2)
pc = int((W.shape[1] - 1)/2)
conv_2d = np.zeros((X.shape[0], X.shape[1]))
X_pad = np.zeros((X.shape[0] + 2*pr, X.shape[1] + 2*pc))
X_pad[pr:pr+X.shape[0], pc:pc+X.shape[1]] = X
for r in range(conv_2d.shape[0]):
for c in range(conv_2d.shape[1]):
conv_2d[r,c] = np.sum(np.inner(W, X_pad[r:r+W.shape[0], c:c+W.shape[1]]))
return conv_2d
else:
pr = W.shape[0] - 1
pc = W.shape[1] - 1
conv_2d = np.zeros((X.shape[0] - W.shape[0] + 2*pr + 1,
X.shape[1] - W.shape[1] + 2*pc + 1))
X_pad = np.zeros((X.shape[0] + 2*pr, X.shape[1] + 2*pc))
X_pad[pr:pr+X.shape[0], pc:pc+X.shape[1]] = X
for r in range(conv_2d.shape[0]):
for c in range(conv_2d.shape[1]):
conv_2d[r,c] = np.sum(np.multiply(W, X_pad[r:r+W.shape[0], c:c+W.shape[1]]))
return conv_2d
This code incorrect:
for r in range(nr):
data[r,:] = np.convolve(data[r,:], H_r, 'same')
for c in range(nc):
data[:,c] = np.convolve(data[:,c], H_c, 'same')
See Nussbaumer transformation from multidimentional convolution to one dimentional.
I'm working with multi-temporal, multispectral satellite imagery. The data are stored as geotiffs in the shape ((n_bands x n_timesteps), height, width). I need to reshape the array for ML model training where each pixel in the image is a "sample". The training array would therefore be shape (n_samples x n_timesteps x n_bands).
Assume the following array and associated variables. Also assume the number of bands in the dataset is 12 and the number if time steps is 4. Therefore, the first dimension of the array is 12*4 = 48.
x = np.random.rand(48, 512, 512)
width = x.shape[1]
height = x.shape[2]
n_samples = width * height
n_bands = 12
n_timesteps = x.shape[0] / n_bands
What is the best way to reshape the array x to x_reshape such that x_reshape.shape returns:
(n_samples, n_timesteps, n_bands)
Making sure to preserve the correct ordering of the data such that a slice of x_reshape[0] is a single "sample" of the dataset of shape (n_timesteps x n_features)?
You can use numpy.reshape() and after that numpy.moveaxis() to achieve what you want:
import numpy as np
x = np.random.rand(48, 512, 512)
width = x.shape[1]
height = x.shape[2]
n_samples = width * height
n_bands = 12
n_timesteps = int(x.shape[0] / n_bands)
x = np.reshape(x, (n_timesteps,n_bands,n_samples))
x = np.moveaxis(x, -1, 0)
I want to apply various filters like GLCM or Gabor filter bank as a custom layer in Tensorflow, but I could not find enough custom layer samples. How can I apply these type of filters as a layer?
The process of generating GLCM is defined in the scikit-image library as follows:
from skimage.feature import greycomatrix, greycoprops
from skimage import data
#load image
img = data.brick()
#result glcm
glcm = greycomatrix(img, distances=[5], angles=[0], levels=256, symmetric=True, normed=True)
The use of Gabor filter bank is as follows:
import matplotlib.pyplot as plt
import numpy as np
from scipy import ndimage as ndi
from skimage import data
from skimage.util import img_as_float
from skimage.filters import gabor_kernel
shrink = (slice(0, None, 3), slice(0, None, 3))
brick = img_as_float(data.brick())[shrink]
grass = img_as_float(data.grass())[shrink]
gravel = img_as_float(data.gravel())[shrink]
image_names = ('brick', 'grass', 'gravel')
images = (brick, grass, gravel)
def power(image, kernel):
# Normalize images for better comparison.
image = (image - image.mean()) / image.std()
return np.sqrt(ndi.convolve(image, np.real(kernel), mode='wrap')**2 +
ndi.convolve(image, np.imag(kernel), mode='wrap')**2)
# Plot a selection of the filter bank kernels and their responses.
results = []
kernel_params = []
for theta in (0, 1):
theta = theta / 4. * np.pi
for sigmax in (1, 3):
for sigmay in (1, 3):
for frequency in (0.1, 0.4):
kernel = gabor_kernel(frequency, theta=theta,sigma_x=sigmax, sigma_y=sigmay)
params = 'theta=%d,f=%.2f\nsx=%.2f sy=%.2f' % (theta * 180 / np.pi, frequency,sigmax, sigmay)
kernel_params.append(params)
# Save kernel and the power image for each image
results.append((kernel, [power(img, kernel) for img in images]))
fig, axes = plt.subplots(nrows=6, ncols=4, figsize=(5, 6))
plt.gray()
fig.suptitle('Image responses for Gabor filter kernels', fontsize=12)
axes[0][0].axis('off')
# Plot original images
for label, img, ax in zip(image_names, images, axes[0][1:]):
ax.imshow(img)
ax.set_title(label, fontsize=9)
ax.axis('off')
for label, (kernel, powers), ax_row in zip(kernel_params, results, axes[1:]):
# Plot Gabor kernel
ax = ax_row[0]
ax.imshow(np.real(kernel))
ax.set_ylabel(label, fontsize=7)
ax.set_xticks([])
ax.set_yticks([])
# Plot Gabor responses with the contrast normalized for each filter
vmin = np.min(powers)
vmax = np.max(powers)
for patch, ax in zip(powers, ax_row[1:]):
ax.imshow(patch, vmin=vmin, vmax=vmax)
ax.axis('off')
plt.show()
How do I define these and similar filters in tensorflow.
I tried above code but it didnt gave the same results like : https://scikit-image.org/docs/dev/auto_examples/features_detection/plot_gabor.html
I got this:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.keras.backend as K
from tensorflow.keras import Input, layers
from tensorflow.keras.models import Model
from scipy import ndimage as ndi
from skimage import data
from skimage.util import img_as_float
from skimage.filters import gabor_kernel
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
def gfb_filter(shape,size=3, tlist=[1,2,3], slist=[2,5],flist=[0.01,0.25],dtype=None):
print(shape)
fsize=np.ones([size,size])
kernels = []
for theta in tlist:
theta = theta / 4. * np.pi
for sigma in slist:
for frequency in flist:
kernel = np.real(gabor_kernel(frequency, theta=theta,sigma_x=sigma, sigma_y=sigma))
kernels.append(kernel)
gfblist = []
for k, kernel in enumerate(kernels):
ck=ndi.convolve(fsize, kernel, mode='wrap')
gfblist.append(ck)
gfblist=np.asarray(gfblist).reshape(size,size,1,len(gfblist))
print(gfblist.shape)
return K.variable(gfblist, dtype='float32')
dimg=img_as_float(data.brick())
input_mat = dimg.reshape((1, 512, 512, 1))
def build_model():
input_tensor = Input(shape=(512,512,1))
x = layers.Conv2D(filters=12,
kernel_size = 3,
kernel_initializer=gfb_filter,
strides=1,
padding='valid') (input_tensor)
model = Model(inputs=input_tensor, outputs=x)
return model
model = build_model()
out = model.predict(input_mat)
print(out)
o1=out.reshape(12,510,510)
plt.subplot(2,2,1)
plt.imshow(dimg)
plt.subplot(2,2,2)
plt.imshow(o1[0,:,:])
plt.subplot(2,2,3)
plt.imshow(o1[6,:,:])
plt.subplot(2,2,4)
plt.imshow(o1[10,:,:])
You can read the documentation about writing a custom layer, and about Making new Layers and Models via subclassing
Here is a simple implementation of the Gabor filter bank based on your code:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from skimage.filters import gabor_kernel
class GaborFilterBank(layers.Layer):
def __init__(self):
super().__init__()
def build(self, input_shape):
# assumption: shape is NHWC
self.n_channel = input_shape[-1]
self.kernels = []
for theta in range(4):
theta = theta / 4.0 * np.pi
for sigma in (1, 3):
for frequency in (0.05, 0.25):
kernel = np.real(
gabor_kernel(
frequency, theta=theta, sigma_x=sigma, sigma_y=sigma
)
).astype(np.float32)
# tf.nn.conv2d does crosscorrelation, not convolution, so flipping
# the kernel is needed
kernel = np.flip(kernel)
# we stack the kernel on itself to match the number of channel of
# the input
kernel = np.stack((kernel,)*self.n_channel, axis=-1)
# print(kernel.shape)
# adding the number of out channel, here 1.
kernel = kernel[:, :, : , np.newaxis]
# because the kernel shapes are different, we can't do the conv op
# in one go, so we stack the kernels in a list
self.kernels.append(tf.Variable(kernel, trainable=False))
def call(self, x):
out_list = []
for kernel in self.kernels:
out_list.append(tf.nn.conv2d(x, kernel, strides=1, padding="SAME"))
# output is [batch_size, H, W, 16] where 16 is the number of filters
# 16 = n_theta * n_sigma * n_freq = 4 * 2 * 2
return tf.concat(out_list,axis=-1)
There is some differences though:
tensorflow does not have a "wrap" mode for convolution. I used "SAME" which is akin to "constant" with a padding value of 0 inscipy. Its possible to provide your own padding, so it is definitely possible to mimic the "wrap" mode, I let that as an exercise to the reader.
tf.nn.conv2d expect a 4D input, so I add a batch dimension and a channel dimension to the img as an input.
the filters for tf.nn.conv2d must follow the shape [filter_height, filter_width, in_channels, out_channels]. In that case, I use the number of channel of the input as in_channels. out_channels could be equal to the number of filters in the filter bank, but because their shape is not constant, it is easier to concatenate them afterwards, so I set it to 1. It means that the output of the layer is [N,H,W,C] where C is the number of filters in the bank (in your example, 16).
tf.nn.conv2d is not a real convolution, but a cross-correlation (see the doc), so flipping the filters before hand is needed to get an actual convolution.
I'm adding a quick example on how to use it:
# defining the model
inp = tf.keras.Input(shape=(512,512,1))
conv = tf.keras.layers.Conv2D(4, (3,3), padding="SAME")(inp)
g = GaborFilterBank()(conv)
model = tf.keras.Model(inputs=inp, outputs=g)
# calling the model with an example Image
img = img_as_float(data.brick())
img_nhwc = img[np.newaxis, :, :, np.newaxis]
out = model(img_nhwc)
# out shape is [1,512,512,16]
I implemented FFT-based convolution in Pytorch and compared the result with spatial convolution via conv2d() function. The convolution filter used is an average filter. The conv2d() function produced smoothened output due to average filtering as expected but the fft-based convolution returned a more blurry output.
I have attached the code and outputs here -
spatial convolution -
from PIL import Image, ImageOps
import torch
from matplotlib import pyplot as plt
from torchvision.transforms import ToTensor
import torch.nn.functional as F
import numpy as np
im = Image.open("/kaggle/input/tiger.jpg")
im = im.resize((256,256))
gray_im = im.convert('L')
gray_im = ToTensor()(gray_im)
gray_im = gray_im.squeeze()
fil = torch.tensor([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
conv_gray_im = gray_im.unsqueeze(0).unsqueeze(0)
conv_fil = fil.unsqueeze(0).unsqueeze(0)
conv_op = F.conv2d(conv_gray_im,conv_fil)
conv_op = conv_op.squeeze()
plt.figure()
plt.imshow(conv_op, cmap='gray')
FFT-based convolution -
def fftshift(image):
sh = image.shape
x = np.arange(0, sh[2], 1)
y = np.arange(0, sh[3], 1)
xm, ym = np.meshgrid(x,y)
shifter = (-1)**(xm + ym)
shifter = torch.from_numpy(shifter)
return image*shifter
shift_im = fftshift(conv_gray_im)
padded_fil = F.pad(conv_fil, (0, gray_im.shape[0]-fil.shape[0], 0, gray_im.shape[1]-fil.shape[1]))
shift_fil = fftshift(padded_fil)
fft_shift_im = torch.rfft(shift_im, 2, onesided=False)
fft_shift_fil = torch.rfft(shift_fil, 2, onesided=False)
shift_prod = fft_shift_im*fft_shift_fil
shift_fft_conv = fftshift(torch.irfft(shift_prod, 2, onesided=False))
fft_op = shift_fft_conv.squeeze()
plt.figure('shifted fft')
plt.imshow(fft_op, cmap='gray')
original image -
spatial convolution output -
fft-based convolution output -
Could someone kindly explain the issue?
The main problem with your code is that Torch doesn't do complex numbers, the output of its FFT is a 3D array, with the 3rd dimension having two values, one for the real component and one for the imaginary. Consequently, the multiplication does not do a complex multiplication.
There currently is no complex multiplication defined in Torch (see this issue), we'll have to define our own.
A minor issue, but also important if you want to compare the two convolution operations, is the following:
The FFT takes the origin of its input in the first element (top-left pixel for an image). To avoid a shifted output, you need to generate a padded kernel where the origin of the kernel is the top-left pixel. This is quite tricky, actually...
Your current code:
fil = torch.tensor([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
conv_fil = fil.unsqueeze(0).unsqueeze(0)
padded_fil = F.pad(conv_fil, (0, gray_im.shape[0]-fil.shape[0], 0, gray_im.shape[1]-fil.shape[1]))
generates a padded kernel where the origin is in pixel (1,1), rather than (0,0). It needs to be shifted by one pixel in each direction. NumPy has a function roll that is useful for this, I don't know the Torch equivalent (I'm not at all familiar with Torch). This should work:
fil = torch.tensor([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
padded_fil = fil.unsqueeze(0).unsqueeze(0).numpy()
padded_fil = np.pad(padded_fil, ((0, gray_im.shape[0]-fil.shape[0]), (0, gray_im.shape[1]-fil.shape[1])))
padded_fil = np.roll(padded_fil, -1, axis=(0, 1))
padded_fil = torch.from_numpy(padded_fil)
Finally, your fftshift function, applied to the spatial-domain image, causes the frequency-domain image (the result of the FFT applied to the image) to be shifted such that the origin is in the middle of the image, rather than the top-left. This shift is useful when looking at the output of the FFT, but is pointless when computing the convolution.
Putting these things together, the convolution is now:
def complex_multiplication(t1, t2):
real1, imag1 = t1[:,:,0], t1[:,:,1]
real2, imag2 = t2[:,:,0], t2[:,:,1]
return torch.stack([real1 * real2 - imag1 * imag2, real1 * imag2 + imag1 * real2], dim = -1)
fft_im = torch.rfft(gray_im, 2, onesided=False)
fft_fil = torch.rfft(padded_fil, 2, onesided=False)
fft_conv = torch.irfft(complex_multiplication(fft_im, fft_fil), 2, onesided=False)
Note that you can do one-sided FFTs to save a bit of computation time:
fft_im = torch.rfft(gray_im, 2, onesided=True)
fft_fil = torch.rfft(padded_fil, 2, onesided=True)
fft_conv = torch.irfft(complex_multiplication(fft_im, fft_fil), 2, onesided=True, signal_sizes=gray_im.shape)
Here the frequency domain is about half the size as in the full FFT, but it is only redundant parts that are left out. The result of the convolution is unchanged.
I am trying to perform a 2d convolution in python using numpy
I have a 2d array as follows with kernel H_r for the rows and H_c for the columns
data = np.zeros((nr, nc), dtype=np.float32)
#fill array with some data here then convolve
for r in range(nr):
data[r,:] = np.convolve(data[r,:], H_r, 'same')
for c in range(nc):
data[:,c] = np.convolve(data[:,c], H_c, 'same')
data = data.astype(np.uint8);
It does not produce the output that I was expecting, does this code look OK, I think the problem is with the casting from float32 to 8bit. Whats the best way to do this
Thanks
Maybe it is not the most optimized solution, but this is an implementation I used before with numpy library for Python:
def convolution2d(image, kernel, bias):
m, n = kernel.shape
if (m == n):
y, x = image.shape
y = y - m + 1
x = x - m + 1
new_image = np.zeros((y,x))
for i in range(y):
for j in range(x):
new_image[i][j] = np.sum(image[i:i+m, j:j+m]*kernel) + bias
return new_image
I hope this code helps other guys with the same doubt.
Regards.
Edit [Jan 2019]
#Tashus comment bellow is correct, and #dudemeister's answer is thus probably more on the mark. The function he suggested is also more efficient, by avoiding a direct 2D convolution and the number of operations that would entail.
Possible Problem
I believe you are doing two 1d convolutions, the first per columns and the second per rows, and replacing the results from the first with the results of the second.
Notice that numpy.convolve with the 'same' argument returns an array of equal shape to the largest one provided, so when you make the first convolution you already populated the entire data array.
One good way to visualize your arrays during these steps is to use Hinton diagrams, so you can check which elements already have a value.
Possible Solution
You can try to add the results of the two convolutions (use data[:,c] += .. instead of data[:,c] = on the second for loop), if your convolution matrix is the result of using the one dimensional H_r and H_c matrices like so:
Another way to do that would be to use scipy.signal.convolve2d with a 2d convolution array, which is probably what you wanted to do in the first place.
Since you already have your kernel separated you should simply use the sepfir2d function from scipy:
from scipy.signal import sepfir2d
convolved = sepfir2d(data, H_r, H_c)
On the other hand, the code you have there looks all right ...
I checked out many implementations and found none for my purpose, which should be really simple. So here is a dead-simple implementation with for loop
def convolution2d(image, kernel, stride, padding):
image = np.pad(image, [(padding, padding), (padding, padding)], mode='constant', constant_values=0)
kernel_height, kernel_width = kernel.shape
padded_height, padded_width = image.shape
output_height = (padded_height - kernel_height) // stride + 1
output_width = (padded_width - kernel_width) // stride + 1
new_image = np.zeros((output_height, output_width)).astype(np.float32)
for y in range(0, output_height):
for x in range(0, output_width):
new_image[y][x] = np.sum(image[y * stride:y * stride + kernel_height, x * stride:x * stride + kernel_width] * kernel).astype(np.float32)
return new_image
It might not be the most optimized solution either, but it is approximately ten times faster than the one proposed by #omotto and it only uses basic numpy function (as reshape, expand_dims, tile...) and no 'for' loops:
def gen_idx_conv1d(in_size, ker_size):
"""
Generates a list of indices. This indices correspond to the indices
of a 1D input tensor on which we would like to apply a 1D convolution.
For instance, with a 1D input array of size 5 and a kernel of size 3, the
1D convolution product will successively looks at elements of indices [0,1,2],
[1,2,3] and [2,3,4] in the input array. In this case, the function idx_conv1d(5,3)
outputs the following array: array([0,1,2,1,2,3,2,3,4]).
args:
in_size: (type: int) size of the input 1d array.
ker_size: (type: int) kernel size.
return:
idx_list: (type: np.array) list of the successive indices of the 1D input array
access to the 1D convolution algorithm.
example:
>>> gen_idx_conv1d(in_size=5, ker_size=3)
array([0, 1, 2, 1, 2, 3, 2, 3, 4])
"""
f = lambda dim1, dim2, axis: np.reshape(np.tile(np.expand_dims(np.arange(dim1),axis),dim2),-1)
out_size = in_size-ker_size+1
return f(ker_size, out_size, 0)+f(out_size, ker_size, 1)
def repeat_idx_2d(idx_list, nbof_rep, axis):
"""
Repeats an array of indices (idx_list) a number of time (nbof_rep) "along" an axis
(axis). This function helps to browse through a 2d array of size
(len(idx_list),nbof_rep).
args:
idx_list: (type: np.array or list) a 1D array of indices.
nbof_rep: (type: int) number of repetition.
axis: (type: int) axis "along" which the repetition will be applied.
return
idx_list: (type: np.array) a 1D array of indices of size len(idx_list)*nbof_rep.
example:
>>> a = np.array([0, 1, 2])
>>> repeat_idx_2d(a, 3, 0) # repeats array 'a' 3 times along 'axis' 0
array([0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> repeat_idx_2d(a, 3, 1) # repeats array 'a' 3 times along 'axis' 1
array([0, 1, 2, 0, 1, 2, 0, 1, 2])
>>> b = np.reshape(np.arange(3*4), (3,4))
>>> b[repeat_idx_2d(np.arange(3), 4, 0), repeat_idx_2d(np.arange(4), 3, 1)]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
"""
assert axis in [0,1], "Axis should be equal to 0 or 1."
tile_axis = (nbof_rep,1) if axis else (1,nbof_rep)
return np.reshape(np.tile(np.expand_dims(idx_list, 1),tile_axis),-1)
def conv2d(im, ker):
"""
Performs a 'valid' 2D convolution on an image. The input image may be
a 2D or a 3D array.
The output image first two dimensions will be reduced depending on the
convolution size.
The kernel may be a 2D or 3D array. If 2D, it will be applied on every
channel of the input image. If 3D, its last dimension must match the
image one.
args:
im: (type: np.array) image (2D or 3D).
ker: (type: np.array) convolution kernel (2D or 3D).
returns:
im: (type: np.array) convolved image.
example:
>>> im = np.reshape(np.arange(10*10*3),(10,10,3))/(10*10*3) # 3D image
>>> ker = np.array([[0,1,0],[-1,0,1],[0,-1,0]]) # 2D kernel
>>> conv2d(im, ker) # 3D array of shape (8,8,3)
"""
if len(im.shape)==2: # if the image is a 2D array, it is reshaped by expanding the last dimension
im = np.expand_dims(im,-1)
im_x, im_y, im_w = im.shape
if len(ker.shape)==2: # if the kernel is a 2D array, it is reshaped so it will be applied to all of the image channels
ker = np.tile(np.expand_dims(ker,-1),[1,1,im_w]) # the same kernel will be applied to all of the channels
assert ker.shape[-1]==im.shape[-1], "Kernel and image last dimension must match."
ker_x = ker.shape[0]
ker_y = ker.shape[1]
# shape of the output image
out_x = im_x - ker_x + 1
out_y = im_y - ker_y + 1
# reshapes the image to (out_x, ker_x, out_y, ker_y, im_w)
idx_list_x = gen_idx_conv1d(im_x, ker_x) # computes the indices of a 1D conv (cf. idx_conv1d doc)
idx_list_y = gen_idx_conv1d(im_y, ker_y)
idx_reshaped_x = repeat_idx_2d(idx_list_x, len(idx_list_y), 0) # repeats the previous indices to be used in 2D (cf. repeat_idx_2d doc)
idx_reshaped_y = repeat_idx_2d(idx_list_y, len(idx_list_x), 1)
im_reshaped = np.reshape(im[idx_reshaped_x, idx_reshaped_y, :], [out_x, ker_x, out_y, ker_y, im_w]) # reshapes
# reshapes the 2D kernel
ker = np.reshape(ker,[1, ker_x, 1, ker_y, im_w])
# applies the kernel to the image and reduces the dimension back to the one of original input image
return np.squeeze(np.sum(im_reshaped*ker, axis=(1,3)))
I tried to add a lot of comments to explain the method but the global idea is to reshape the 3D input image to a 5D one of shape (output_image_height, kernel_height, output_image_width, kernel_width, output_image_channel) and then to apply the kernel directly using the basic array multiplication. Of course, this methods is then using more memory (during the execution the size of the image is thus multiply by kernel_height*kernel_width) but it is faster.
To do this reshape step, I 'over-used' the indexing methods of numpy arrays, especially, the possibility of giving a numpy array as indices into a numpy array.
This methods could also be used to re-code the 2D convolution product in Pytorch or Tensorflow using the base math functions but I have no doubt in saying that it will be slower than the existing nn.conv2d operator...
I really enjoyed coding this method by only using the numpy basic tools.
One of the most obvious is to hard code the kernel.
img = img.convert('L')
a = np.array(img)
out = np.zeros([a.shape[0]-2, a.shape[1]-2], dtype='float')
out += a[:-2, :-2]
out += a[1:-1, :-2]
out += a[2:, :-2]
out += a[:-2, 1:-1]
out += a[1:-1,1:-1]
out += a[2:, 1:-1]
out += a[:-2, 2:]
out += a[1:-1, 2:]
out += a[2:, 2:]
out /= 9.0
out = out.astype('uint8')
img = Image.fromarray(out)
This example does a box blur 3x3 completely unrolled. You can multiply the values where you have a different value and divide them by a different amount. But, if you honestly want the quickest and dirtiest method this is it. I think it beats Guillaume Mougeot's method by a factor of like 5. His method beating the others by a factor of 10.
It may lose a few steps if you're doing something like a gaussian blur. and need to multiply some stuff.
Try to first round and then cast to uint8:
data = data.round().astype(np.uint8);
I wrote this convolve_stride which uses numpy.lib.stride_tricks.as_strided. Moreover it supports both strides and dilation. It is also compatible to tensor with order > 2.
import numpy as np
from numpy.lib.stride_tricks import as_strided
from im2col import im2col
def conv_view(X, F_s, dr, std):
X_s = np.array(X.shape)
F_s = np.array(F_s)
dr = np.array(dr)
Fd_s = (F_s - 1) * dr + 1
if np.any(Fd_s > X_s):
raise ValueError('(Dilated) filter size must be smaller than X')
std = np.array(std)
X_ss = np.array(X.strides)
Xn_s = (X_s - Fd_s) // std + 1
Xv_s = np.append(Xn_s, F_s)
Xv_ss = np.tile(X_ss, 2) * np.append(std, dr)
return as_strided(X, Xv_s, Xv_ss, writeable=False)
def convolve_stride(X, F, dr=None, std=None):
if dr is None:
dr = np.ones(X.ndim, dtype=int)
if std is None:
std = np.ones(X.ndim, dtype=int)
if not (X.ndim == F.ndim == len(dr) == len(std)):
raise ValueError('X.ndim, F.ndim, len(dr), len(std) must be the same')
Xv = conv_view(X, F.shape, dr, std)
return np.tensordot(Xv, F, axes=X.ndim)
%timeit -n 100 -r 10 convolve_stride(A, F)
#31.2 ms ± 1.31 ms per loop (mean ± std. dev. of 10 runs, 100 loops each)
Super simple and fast convolution using only basic numpy:
import numpy as np
def conv2d(image, kernel):
# apply kernel to image, return image of the same shape
# assume both image and kernel are 2D arrays
# kernel = np.flipud(np.fliplr(kernel)) # optionally flip the kernel
k = kernel.shape[0]
width = k//2
# place the image inside a frame to compensate for the kernel overlap
a = framed(image, width)
b = np.zeros(image.shape) # fill the output array with zeros; do not use np.empty()
# shift the image around each pixel, multiply by the corresponding kernel value and accumulate the results
for p, dp, r, dr in [(i, i + image.shape[0], j, j + image.shape[1]) for i in range(k) for j in range(k)]:
b += a[p:dp, r:dr] * kernel[p, r]
# or just write two nested for loops if you prefer
# np.clip(b, 0, 255, out=b) # optionally clip values exceeding the limits
return b
def framed(image, width):
a = np.zeros((image.shape[0]+2*width, image.shape[1]+2*width))
a[width:-width, width:-width] = image
# alternatively fill the frame with ones or copy border pixels
return a
Run it:
Image.fromarray(conv2d(image, kernel).astype('uint8'))
Instead of sliding the kernel along the image and computing the transformation pixel by pixel, create a series of shifted versions of the image corresponding to each element in the kernel and apply the corresponding kernel value to each of the shifted image versions.
This is probably the fastest you can get using just basic numpy; the speed is already comparable to C implementation of scipy convolve2d and better than fftconvolve. The idea is similar to #Tatarize. This example works only for one color component; for RGB just repeat for each (or modify the algorithm accordingly).
Typically, Convolution 2D is a misnomer. Ideally, under the hood,
whats being done is a correlation of 2 matrices.
pad == same
returns the output as the same as input dimension
It can also take asymmetric images. In order to perform correlation(convolution in deep learning lingo) on a batch of 2d matrices, one can iterate over all the channels, calculate the correlation for each of the channel slices with the respective filter slice.
For example: If image is (28,28,3) and filter size is (5,5,3) then take each of the 3 slices from the image channel and perform the cross correlation using the custom function above and stack the resulting matrix in the respective dimension of the output.
def get_cross_corr_2d(W, X, pad = 'valid'):
if(pad == 'same'):
pr = int((W.shape[0] - 1)/2)
pc = int((W.shape[1] - 1)/2)
conv_2d = np.zeros((X.shape[0], X.shape[1]))
X_pad = np.zeros((X.shape[0] + 2*pr, X.shape[1] + 2*pc))
X_pad[pr:pr+X.shape[0], pc:pc+X.shape[1]] = X
for r in range(conv_2d.shape[0]):
for c in range(conv_2d.shape[1]):
conv_2d[r,c] = np.sum(np.inner(W, X_pad[r:r+W.shape[0], c:c+W.shape[1]]))
return conv_2d
else:
pr = W.shape[0] - 1
pc = W.shape[1] - 1
conv_2d = np.zeros((X.shape[0] - W.shape[0] + 2*pr + 1,
X.shape[1] - W.shape[1] + 2*pc + 1))
X_pad = np.zeros((X.shape[0] + 2*pr, X.shape[1] + 2*pc))
X_pad[pr:pr+X.shape[0], pc:pc+X.shape[1]] = X
for r in range(conv_2d.shape[0]):
for c in range(conv_2d.shape[1]):
conv_2d[r,c] = np.sum(np.multiply(W, X_pad[r:r+W.shape[0], c:c+W.shape[1]]))
return conv_2d
This code incorrect:
for r in range(nr):
data[r,:] = np.convolve(data[r,:], H_r, 'same')
for c in range(nc):
data[:,c] = np.convolve(data[:,c], H_c, 'same')
See Nussbaumer transformation from multidimentional convolution to one dimentional.