Convolutional layer in Python using Numpy - python

I am trying to implement a convolutional layer in Python using Numpy.
The input is a 4-dimensional array of shape [N, H, W, C], where:
N: Batch size
H: Height of image
W: Width of image
C: Number of channels
The convolutional filter is also a 4-dimensional array of shape [F, F, Cin, Cout], where
F: Height and width of a square filter
Cin: Number of input channels (Cin = C)
Cout: Number of output channels
Assuming a stride of one along all axes, and no padding, the output should be a 4-dimensional array of shape [N, H - F + 1, W - F + 1, Cout].
My code is as follows:
import numpy as np
def conv2d(image, filter):
    # Height and width of output image
    Hout = image.shape[1] - filter.shape[0] + 1
    Wout = image.shape[2] - filter.shape[1] + 1
    output = np.zeros([image.shape[0], Hout, Wout, filter.shape[3]])
    for n in range(output.shape[0]):
        for i in range(output.shape[1]):
            for j in range(output.shape[2]):
                for cout in range(output.shape[3]):
                    output[n,i,j,cout] = np.multiply(image[n, i:i+filter.shape[0], j:j+filter.shape[1], :], filter[:,:,:,cout]).sum()
    return output
This works perfectly, but uses four for loops and is extremely slow. Is there a better way of implementing a convolutional layer that takes 4-dimensional input and filter, and returns a 4-dimensional output, using Numpy?

This is a straightforward implementation of this kind of keras-like (?) convolution. It might be hard for beginners to follow because it relies heavily on broadcasting and stride tricks.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def conv2d(a, b):
    # View of a with shape (N, Hout, Wout, F, F, C): one F x F x C window per output pixel
    a = as_strided(a,(len(a),a.shape[1]-len(b)+1,a.shape[2]-b.shape[1]+1,len(b),b.shape[1],a.shape[3]),a.strides[:3]+a.strides[1:])
    return np.einsum('abcijk,ijkd', a, b[::-1,::-1])
BTW: if you are convolving with a very large kernel, use a Fourier-based algorithm instead.
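To illustrate the Fourier-based route, here is a minimal sketch for a single 2-D image and a single 2-D kernel, using only `np.fft`. It computes the non-flipped ("valid" cross-correlation) variant; the function names are my own, and extending it to the batched [N, H, W, C] case is left out for brevity.

```python
import numpy as np

def correlate2d_valid_fft(image, kernel):
    """'Valid' cross-correlation of a 2-D image with a 2-D kernel via FFT."""
    H, W = image.shape
    F1, F2 = kernel.shape
    # Size of the full linear convolution, so the circular FFT product does not wrap around
    full_shape = (H + F1 - 1, W + F2 - 1)
    # Convolving with the flipped kernel is the same as cross-correlating with the kernel
    spectrum = np.fft.rfft2(image, s=full_shape) * np.fft.rfft2(kernel[::-1, ::-1], s=full_shape)
    full = np.fft.irfft2(spectrum, s=full_shape)
    # Keep only the positions where the kernel lies fully inside the image
    return full[F1 - 1:H, F2 - 1:W]

def correlate2d_valid_direct(image, kernel):
    """Loop-based reference used only to check the FFT version."""
    F1, F2 = kernel.shape
    out = np.zeros((image.shape[0] - F1 + 1, image.shape[1] - F2 + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + F1, j:j + F2] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
kernel = rng.standard_normal((15, 15))
print(np.allclose(correlate2d_valid_fft(image, kernel),
                  correlate2d_valid_direct(image, kernel)))
```

Whether this beats the strided approach depends on the kernel size; for small kernels such as 3 x 3 the direct methods are usually the better choice.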
EDIT: The [::-1,::-1] should be removed in the case that convolution does not involve flipping the kernel first (like what's in tensorflow).
EDIT: np.tensordot(a, b, axes=3) performs much better than np.einsum("abcijk,ijkd", a, b), and is highly recommended.
So, the function becomes:
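import numpy as np
from numpy.lib.stride_tricks import as_strided
def conv2d(a, b):
    Hout = a.shape[1] - b.shape[0] + 1
    Wout = a.shape[2] - b.shape[1] + 1
    # View of a with shape (N, Hout, Wout, F, F, C): one F x F x C window per output pixel
    a = as_strided(a, (a.shape[0], Hout, Wout, b.shape[0], b.shape[1], a.shape[3]), a.strides[:3] + a.strides[1:])
    # Contract the last three axes of the windows with the first three axes of the filter
    return np.tensordot(a, b, axes=3)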
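A quick way to convince yourself that the strided version matches the four-loop version from the question is to run both on small random inputs. The sketch below does that; the names `conv2d_loops` and `conv2d_strided` are mine, and `writeable=False` is an extra safety measure I added because writing through an overlapping strided view can corrupt data.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def conv2d_loops(image, kernel):
    """Reference implementation: the four nested loops from the question."""
    F1, F2 = kernel.shape[:2]
    Hout = image.shape[1] - F1 + 1
    Wout = image.shape[2] - F2 + 1
    out = np.zeros((image.shape[0], Hout, Wout, kernel.shape[3]))
    for n in range(out.shape[0]):
        for i in range(Hout):
            for j in range(Wout):
                for co in range(out.shape[3]):
                    out[n, i, j, co] = np.sum(image[n, i:i + F1, j:j + F2, :] * kernel[:, :, :, co])
    return out

def conv2d_strided(a, b):
    """Same computation with a sliding-window view and a single tensordot."""
    Hout = a.shape[1] - b.shape[0] + 1
    Wout = a.shape[2] - b.shape[1] + 1
    shape = (a.shape[0], Hout, Wout, b.shape[0], b.shape[1], a.shape[3])
    # windows[n, i, j, p, q, c] == a[n, i + p, j + q, c]
    windows = as_strided(a, shape, a.strides[:3] + a.strides[1:], writeable=False)
    return np.tensordot(windows, b, axes=3)

rng = np.random.default_rng(0)
image = rng.standard_normal((2, 6, 7, 3))   # [N, H, W, C]
kernel = rng.standard_normal((3, 3, 3, 4))  # [F, F, Cin, Cout]
result = conv2d_strided(image, kernel)
print(result.shape)
print(np.allclose(result, conv2d_loops(image, kernel)))
```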

Related

Image Convolution with callback function in python

I want to loop over the pixels of a binary image in Python and set the value of each pixel depending on its surrounding neighborhood of pixels. This is similar to convolution, but I want to create a method that sets the value of the center pixel using a custom function, rather than a normal convolution that sets the center pixel to the arithmetic mean of the neighborhood.
In essence I would like to create a function that does the following:
def convolve(img, conv_function = lambda subImg: np.mean(subImg)):
    newImage = emptyImage
    for nxn_window in img:
        newImage[center_pixel] = conv_function(nxn_window)
    return newImage
At the moment I have a solution but it is very slow:
#B is the structuring array or convolution window/kernel
def convolve(func):
    def wrapper(img, B):
        #get dimensions of img
        length, width = len(img), len(img[0])
        #half width and length of dimensions
        hw = (int)((len(B) - 1) / 2)
        hh = (int)((len(B[0]) - 1) / 2)
        #convert to npArray for fast operations
        B = np.array(B)
        #initialize empty return image
        retVal = np.zeros([length, width])
        #start loop over the values where the convolution window has a neighborhood
        for row in range(hh, length - hh):
            for pixel in range(hw, width - hw):
                #window as subarray of pixels
                window = [arr[pixel-hh:pixel+hh+1]
                          for arr in img[row-hw:row+hw+1]]
                retVal[row][pixel] = func(window, B)
        return retVal
    return wrapper
With this function as a decorator, I then do:
# dilation
@convolve
def __add__(img, B):
    return np.mean(np.logical_and(img, B)) > 0
# erosion
@convolve
def __sub__(img, B):
    return np.mean(np.logical_and(img, B)) == 1
Is there a library that provides this type of function or is there a better way I can loop over the image?
Here's an idea: assign each pixel an array holding its neighborhood, and then simply apply your custom function to the extended image. It will be fast BUT will consume more memory, in proportion to the number of kernel elements: if your B.shape is (3, 3) then you'll need 9 times more memory. Try this:
import numpy as np
def convolve2(func):
    def conv(image, kernel):
        """ Apply given filter on an image """
        k = kernel.shape[0] # which is assumed equal to kernel.shape[1]
        width = k//2 # note that width == 1 for k == 3 but also width == 1 for k == 2
        a = framed(image, width) # create a frame around an image to compensate for kernel overlap when shifting
        b = np.empty(image.shape + kernel.shape) # add two more dimensions for each pixel's neighbourhood
        di, dj = image.shape[:2] # will be used as delta for slicing
        # add the neighbourhood ('kernel size') to each pixel in preparation for the final step
        # in other words: slide the image along the kernel rather than sliding the kernel along the image
        for i in range(k):
            for j in range(k):
                b[..., i, j] = a[i:i+di, j:j+dj]
        # apply the desired function
        return func(b, kernel)
    return conv
def framed(image, width):
    a = np.zeros(np.array(image.shape) + [2 * width, 2 * width]) # only add the frame to the first two dimensions
    a[width:-width, width:-width] = image # place the image centered inside the frame
    return a
I've used a greyscale image 512x512 pixels and a filter 3x3 for testing:
embossing_kernel = np.array([
    [-2, -1, 0],
    [-1, 1, 1],
    [0, 1, 2]
])
@convolve2
def filter2(img, B):
    return np.sum(img * B, axis=(2,3))
@convolve2
def __add2__(img, B):
    return np.mean(np.logical_and(img, B), axis=(2,3)) > 0
# image_gray is a 2D grayscale image (not color/RGB)
b = filter2(image_gray, embossing_kernel)
To compare with your convolve I've used:
@convolve
def filter(img, B):
    return np.sum(img * B)
@convolve
def __add__(img, B):
    return np.mean(np.logical_and(img, B)) > 0
b = filter(image_gray, embossing_kernel)
The time for convolve was 4.3 s, for convolve2 0.05 s on my machine.
In my case the custom function needs to specify the axes over which to operate, i.e., the additional dimensions holding the neighborhood data. Perhaps the axes could be avoided too but I haven't tried.
Note: this works for 2D images (grayscale) (as you asked about binary images) but can be easily extended to 3D (color) images. In your case you could probably get rid of the frame (or fill it with zeros or ones e.g., in case of repeated application of the function).
If memory is an issue, you might want to adapt a fast implementation of convolve I've posted here: https://stackoverflow.com/a/74288118/20188124.
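As a variation on the same idea, the explicit copy loop can be replaced by `numpy.lib.stride_tricks.sliding_window_view`, which returns the per-pixel neighborhoods as a view instead of a filled array. This is only a sketch: it assumes NumPy 1.20 or newer, an odd square kernel, and zero padding; `apply_windowed`, `dilate` and `erode` are names I made up. Note that a function like `np.logical_and` will still allocate a temporary of shape (H, W, k, k) when it is applied.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def apply_windowed(image, kernel, func):
    """Call func(windows, kernel), where windows[r, c] is the k x k neighborhood of pixel (r, c)."""
    pad = kernel.shape[0] // 2
    padded = np.pad(image, pad, mode="constant")          # zero frame around the image
    windows = sliding_window_view(padded, kernel.shape)   # shape (H, W, k, k), no data copied
    return func(windows, kernel)

def dilate(windows, B):
    # True where at least one kernel position overlaps a set pixel
    return np.logical_and(windows, B).any(axis=(2, 3))

def erode(windows, B):
    # True where every set kernel position overlaps a set pixel
    return np.logical_and(windows, B).sum(axis=(2, 3)) == B.sum()

img = np.zeros((5, 5), dtype=bool)
img[2, 2] = True
B = np.ones((3, 3), dtype=bool)

dilated = apply_windowed(img, B, dilate)
eroded = apply_windowed(dilated, B, erode)
print(dilated.astype(int))
print(np.array_equal(eroded, img))
```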

multiplying a vector (1 x N) by a tensor (N x M x M)

I am looking for a matrix operation in numpy or preferably in pytorch that allows one to multiply a vector (1 x N) by a tensor (N x M x M) and get (1 x M x M). This is easily accomplished using a for loop, but the for loop does not allow back propagation during training. I tried using matmul in numpy and pytorch (and several others such as dot and bmm), but could not get any to work. Here is an example (where M=2, but is 256 in my use case) of what I am trying to do:
a = np.array([1,2,3])
b = np.array([[[1,2],[3,4]],[[5,6],[7,8]],[[9,10],[11,12]]])
I would like to perform the operation: 1*[[1,2],[3,4]] + 2*[[5,6],[7,8]] + 3*[[9,10],[11,12]], which can be achieved with a for loop like this:
matrix_sum = np.zeros_like(b[0])
for i in range(3):
    matrix_sum += a[i]*b[i]
Any advice or solution would be greatly appreciated.
You can use simple einsum:
# this gives you a 2-D array (M, M)
np.einsum('i,ijk->jk',a,b)
output:
[[38 44]
 [50 56]]
or another solution:
# matmul gives (M, 1, M); swapping the first two axes gives a 3-D array (1, M, M)
(a[None,:] @ b.swapaxes(0,1)).swapaxes(0,1)
output:
[[[38 44]
  [50 56]]]
Numpy and pytorch were built upon matrix multiplications!
Torch example:
import torch
A = torch.rand(1, N)
B = torch.rand(N, M, M)
C = A @ B.transpose(0, 1)  # shape (M, 1, M)
C.transpose_(0, 1)  # in-place swap to (1, M, M)
C.shape
torch.Size([1, M, M])
And similarly for numpy:
A = np.random.randn(1, N)
B = np.random.randn(N, M, M)
C = A @ B.transpose(1, 0, 2)
C = C.transpose(1, 0, 2)
C.shape
(1, M, M)
Edit: for the einsum lovers:
Pytorch and numpy handle einsum in pretty much the same way. With A of shape (1, N) as above, the leading axis needs its own index:
torch.einsum('bi,ijk->bjk', A, B)
np.einsum('bi,ijk->bjk', A, B)
Pytorch einsum documentation: https://pytorch.org/docs/master/generated/torch.einsum.html
Numpy einsum documentation: https://numpy.org/doc/stable/reference/generated/numpy.einsum.html
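To tie the variants together, here is a small NumPy check on the example from the question. It compares the for loop with `np.einsum`, `np.tensordot` (an option not mentioned above, added by me) and a matmul-based form; the variable names are my own.

```python
import numpy as np

a = np.array([1, 2, 3])                                                # shape (N,)
b = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]])  # shape (N, M, M)

# Reference: the explicit loop from the question
loop_sum = np.zeros_like(b[0])
for i in range(len(a)):
    loop_sum += a[i] * b[i]

via_einsum = np.einsum('i,ijk->jk', a, b)                       # shape (M, M)
via_tensordot = np.tensordot(a, b, axes=1)                      # shape (M, M)
via_matmul = (a[None, :] @ b.swapaxes(0, 1)).swapaxes(0, 1)     # shape (1, M, M)

print(via_einsum)
print(via_matmul.shape)
```

All three vectorized forms reduce over the shared N axis in a single call, which is what makes them usable in place of the loop.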

Translating tensorflows conv2d to numpy/scipy operations?

This is the documentation for tf.nn.conv2d: Given an input tensor of shape [batch, in_height, in_width, in_channels] and a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels], this op performs the following:
1) Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
2) Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
3) For each patch, right-multiplies the filter matrix and the image patch vector.
In other words, it takes in a tensor of n images and does convolution with out_channel filters.
I am trying to translate to code that uses only numpy operations and the code is the following:
import numpy as np
def my_conv2d(x, kernel):
    nf = kernel.shape[-1]  # number of filters
    rf = kernel.shape[0]  # filter size
    w = kernel
    s = 1  # stride
    h_range = int((x.shape[1] - rf) / s) + 1  # (H - F + 2P) / S + 1, with P = 0
    w_range = int((x.shape[2] - rf) / s) + 1  # (W - F + 2P) / S + 1, with P = 0
    np_o = np.zeros((x.shape[0], h_range, w_range, nf))  # one output per input image
    for i in range(x.shape[0]):
        for z in range(nf):
            for _h in range(h_range):
                for _w in range(w_range):
                    np_o[i, _h, _w, z] = np.sum(
                        x[i, _h * s:_h * s + rf, _w * s:_w * s + rf, :] * w[:, :, :, z])
    return np_o
The problem is that the code is extremely slow. Are there any numpy or scipy functions that can replicate what TensorFlow's conv2d is doing with similar efficiency? I have looked at https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve2d.html and it does convolution ONCE, meaning I have to pass a 2d tensor alongside a 2d kernel (it does not do multiple filters).
None of the previous stackoverflow questions helped much with this.
Thanks
Edit: did some testing and my code is about 44000% slower than doing tf.nn.conv2d!
Things are slow for you because you are using loops. Implementing it with vector operations will be much faster, but still not as efficient as high-level APIs such as tf.nn.conv2d or tf.nn.convolution. This post should help you with a vectorized implementation of the same in numpy: https://wiseodd.github.io/techblog/2016/07/16/convnet-conv-layer/
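The three steps quoted from the tf.nn.conv2d documentation (flatten the filter, extract patches, multiply) can be written in NumPy roughly as follows. This is a minimal sketch under my own assumptions: no padding ("VALID"), the same stride in both directions, and NHWC layout; `conv2d_im2col` is a hypothetical name, not a NumPy or TensorFlow function, and it is not taken from the linked post.

```python
import numpy as np

def conv2d_im2col(x, kernel, stride=1):
    """x: [batch, H, W, Cin], kernel: [fh, fw, Cin, Cout] -> [batch, Hout, Wout, Cout]."""
    n, h, w, c = x.shape
    fh, fw, cin, cout = kernel.shape
    assert c == cin, "input channels must match the filter's in_channels"
    h_out = (h - fh) // stride + 1
    w_out = (w - fw) // stride + 1

    # Step 2: extract patches. Looping over the fh * fw filter offsets is cheap
    # compared with looping over every output pixel.
    patches = np.empty((n, h_out, w_out, fh, fw, c), dtype=x.dtype)
    for i in range(fh):
        for j in range(fw):
            patches[:, :, :, i, j, :] = x[:, i:i + stride * h_out:stride,
                                          j:j + stride * w_out:stride, :]
    patches = patches.reshape(n, h_out, w_out, fh * fw * c)

    # Step 1: flatten the filter to [fh * fw * Cin, Cout]
    w2d = kernel.reshape(fh * fw * cin, cout)

    # Step 3: right-multiply every patch vector by the filter matrix
    return patches @ w2d

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 9, 3))
kernel = rng.standard_normal((3, 3, 3, 5))
out = conv2d_im2col(x, kernel)
print(out.shape)
```

The patch tensor holds fh * fw copies of the input, so this trades memory for speed in the same way as the strategy described in the linked post.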

Compute the pairwise distance between each pair of the two collections of inputs in TensorFlow

I have two collections. One consists of m1 points in k dimensions and the other of m2 points in k dimensions. I need to calculate the pairwise distance between each pair of points across the two collections.
Basically, given two matrices A (m1 x k) and B (m2 x k), I need to get a matrix C (m1 x m2).
I can easily do this in scipy by using distance.cdist and selecting one of many distance metrics, and I can also do this in TF in a loop, but I can't figure out how to do this with matrix manipulations, even for Euclidean distance.
After a few hours I finally found how to do this in Tensorflow. My solution works only for Euclidean distance and is pretty verbose. I also do not have a mathematical proof (just a lot of handwaving, which I hope to make more rigorous):
import tensorflow as tf
import numpy as np
from scipy.spatial.distance import cdist
M1, M2, K = 3, 4, 2
# Scipy calculation
a = np.random.rand(M1, K).astype(np.float32)
b = np.random.rand(M2, K).astype(np.float32)
print(cdist(a, b, 'euclidean'), '\n')
# TF calculation
A = tf.Variable(a)
B = tf.Variable(b)
# Squared norms of the rows of A, repeated across M2 columns: shape (M1, M2)
p1 = tf.matmul(
    tf.expand_dims(tf.reduce_sum(tf.square(A), 1), 1),
    tf.ones(shape=(1, M2))
)
# Squared norms of the rows of B, repeated across M1 rows: shape (M1, M2)
p2 = tf.transpose(tf.matmul(
    tf.reshape(tf.reduce_sum(tf.square(B), 1), shape=[-1, 1]),
    tf.ones(shape=(M1, 1)),
    transpose_b=True
))
res = tf.sqrt(tf.add(p1, p2) - 2 * tf.matmul(A, B, transpose_b=True))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(res))
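The reasoning behind the code above is the identity ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, applied to every pair of rows at once. The NumPy sketch below shows the same computation with broadcasting and checks it against the direct definition; `pairwise_euclidean` is my own name, and clipping at zero before the square root is a precaution I added, since rounding can make tiny squared distances slightly negative.

```python
import numpy as np

def pairwise_euclidean(a, b):
    """a: (m1, k), b: (m2, k) -> (m1, m2) matrix of Euclidean distances."""
    a_sq = np.sum(a * a, axis=1)[:, None]   # (m1, 1): squared norm of each row of a
    b_sq = np.sum(b * b, axis=1)[None, :]   # (1, m2): squared norm of each row of b
    sq_dist = a_sq + b_sq - 2.0 * (a @ b.T)  # broadcasts to (m1, m2)
    return np.sqrt(np.maximum(sq_dist, 0.0))

rng = np.random.default_rng(0)
a = rng.random((3, 2))
b = rng.random((4, 2))

# Direct definition, for comparison: subtract every pair of points explicitly
direct = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
dists = pairwise_euclidean(a, b)
print(dists.shape)
print(np.allclose(dists, direct))
```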
This will do it for tensors of arbitrary dimensionality (i.e. containing (..., N, d) vectors). Note that it does not compute distances between two collections (i.e. it is not like scipy.spatial.distance.cdist); instead it works within a single batch of vectors (i.e. like scipy.spatial.distance.pdist):
import tensorflow as tf
import string
def pdist(arr):
    """Pairwise Euclidean distances between vectors contained at the back of tensors.
    Uses expansion: (x - y)^T (x - y) = x^Tx - 2x^Ty + y^Ty
    :param arr: (..., N, d) tensor
    :returns: (..., N, N) tensor of pairwise distances between vectors in the second-to-last dim.
    :rtype: tf.Tensor
    """
    shape = tuple(arr.get_shape().as_list())
    rank_ = len(shape)
    N, d = shape[-2:]
    # Build a prefix from the array without the indices we'll use later.
    pref = string.ascii_lowercase[:rank_ - 2]
    # Outer product of points (..., N, N)
    xxT = tf.einsum('{0}ni,{0}mi->{0}nm'.format(pref), arr, arr)
    # Inner product of points. (..., N)
    xTx = tf.einsum('{0}ni,{0}ni->{0}n'.format(pref), arr, arr)
    # (..., N, N) inner products tiled.
    xTx_tile = tf.tile(xTx[..., None], (1,) * (rank_ - 1) + (N,))
    # Build the permuter. (sigh, no tf.swapaxes yet)
    permute = list(range(rank_))
    permute[-2], permute[-1] = permute[-1], permute[-2]
    # dists = (x^Tx - 2x^Ty + y^Ty)^(1/2). Note the axis swapping is necessary to 'pair' x^Tx and y^Ty
    return tf.sqrt(xTx_tile - 2 * xxT + tf.transpose(xTx_tile, permute))

Fast inner product of more than two matrices in python

I'm currently writing code where I need to compute, as fast as possible, a kind of inner product between three 2-D arrays.
Let's call them a,b,c. They all have the same size (N x M).
I want to compute the following 3-d array, op, of size (N x N x N), such that op[i, j, k] is the sum over m of a[i, m] b[j, m] c[k, m], i.e. $op_{ijk} = \sum_{m} a_{im} b_{jm} c_{km}$.
This is basically an extended version of np.inner to 3 inputs rather than 2.
In practice, the dimensions I will run into are something like N = 100 and M = 300 000. The matrices are not going to be sparse at all, so op contains about 1 million nonzero values.
So far, I've attempted two methods.
The first one uses broadcasting:
import numpy as np
N = 100
M = 300000
a = np.random.randn(N, M)
b = np.random.randn(N, M)
c = np.random.randn(N, M)
def method1(a, b, c):
    a_i = a[:, None, None, :]
    b_j = b[None, :, None, :]
    c_k = c[None, None, :, :]
    return np.sum(a_i * b_j * c_k, axis=3)
The problem with this is that it first computes a_i * b_j * c_k which is an N x N x N x M array, so in my case it is simply too much to handle.
I've tried another method using np.einsum, and it is much faster than the previous method:
def method2(a, b, c):
    return np.einsum('im,jm,km', a, b, c)
My problem is that it is still too slow. For N = 100 and M = 30 000, it already takes 95 seconds to run on my computer, so taking M to its actual value of 300 000 is impossible.
My question is: do you know any pythonic way to solve my problem (maybe a magic numpy function?), or do I have to resort to things like cython or numba to actually make this computation feasible?
Thanks in advance for any help!
Very interesting one and related to this other problem.
Approach #1: For decent size arrays
Based on the winning approach there at the above mentioned Q&A, here's one solution -
np.tensordot(a[:,None]*b,c,axes=(2,1))
Explanation :
1) a[:,None]*b : Get a 3D array of shape (N, N, M). So, for the use case, it would be (100, 100, 30000), which might be a bit too much for regular systems, but might just work out given some extra system memory juice.
2) np.tensordot(..): Next up, we would sum-reduce that last axis from previous step with tensor-dot against the third array c to have a (100, 100, 100) shaped output array.
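If the (N, N, M) intermediate from approach #1 does not fit in memory, one option is to process the M axis in chunks and accumulate the partial results. This is my own variation rather than something from the linked Q&A; `triple_inner_chunked` and the `chunk` parameter are hypothetical names, and the best chunk size depends on the machine.

```python
import numpy as np

def triple_inner_chunked(a, b, c, chunk=4096):
    """op[i, j, k] = sum_m a[i, m] * b[j, m] * c[k, m], accumulated chunk by chunk over m."""
    out = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
    for start in range(0, a.shape[1], chunk):
        stop = start + chunk
        # (N, N, chunk) intermediate instead of (N, N, M)
        ab = a[:, None, start:stop] * b[None, :, start:stop]
        # Sum-reduce the chunk axis against the matching slice of c
        out += np.tensordot(ab, c[:, start:stop], axes=(2, 1))
    return out

rng = np.random.default_rng(0)
N, M = 5, 1000
a = rng.standard_normal((N, M))
b = rng.standard_normal((N, M))
c = rng.standard_normal((N, M))

op = triple_inner_chunked(a, b, c, chunk=128)
print(op.shape)
print(np.allclose(op, np.einsum('im,jm,km->ijk', a, b, c)))
```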
Approach #2: For very large arrays and with b identical to c
out = np.zeros((N, N, N))
for i in range(N):
    for j in range(N):
        for k in range(j+1):
            out[i,j,k] = np.einsum('i,i,i->',a[i],b[j],b[k])
r,c = np.triu_indices(N,1)
out[np.arange(N)[:,None], r,c] = out[np.arange(N)[:,None], c,r]
