This is the documentation for tf.nn.conv2d: Given an input tensor of shape [batch, in_height, in_width, in_channels] and a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels], this op performs the following:
1. Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
2. Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
3. For each patch, right-multiplies the filter matrix and the image patch vector.
In other words, it takes in a tensor of n images and does convolution with out_channel filters.
I am trying to translate this into code that uses only NumPy operations. The code is the following:
import numpy as np
def my_conv2d(x, kernel):
    nf = kernel.shape[-1]  # number of filters
    rf = kernel.shape[0]  # filter size
    w = kernel
    s = 1  # stride
    h_range = int((x.shape[1] - rf) / s) + 1  # (H - F + 2P) / S + 1, with P = 0
    w_range = int((x.shape[2] - rf) / s) + 1  # (W - F + 2P) / S + 1, with P = 0
    np_o = np.zeros((x.shape[0], h_range, w_range, nf))
    for i in range(x.shape[0]):
        for z in range(nf):
            for _h in range(h_range):
                for _w in range(w_range):
                    # element-wise product of one patch with one filter, then sum
                    np_o[i, _h, _w, z] = np.sum(x[i, _h * s:_h * s + rf, _w * s:_w * s + rf, :] * w[:, :, :, z])
    return np_o
The problem is that this code is extremely slow. Are there any NumPy or SciPy functions that replicate what TensorFlow's conv2d does with similar efficiency? I have looked at https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve2d.html, but it performs a single convolution, meaning I have to pass one 2-D array alongside one 2-D kernel (it does not handle multiple filters).
None of the previous Stack Overflow questions helped much with this.
Thanks
Edit: did some testing and my code is about 44000% slower than doing tf.nn.conv2d!
Your code is slow because it uses Python loops. A vectorized implementation will be much faster, though still not as efficient as high-level APIs such as tf.nn.conv2d or tf.nn.convolution. This post should help with a vectorized NumPy implementation: https://wiseodd.github.io/techblog/2016/07/16/convnet-conv-layer/
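As a rough illustration of the three steps in the quoted documentation (flatten the filter, extract patches, multiply), here is a minimal NumPy sketch. It assumes stride 1, no padding, NHWC input and NumPy 1.20 or newer (for sliding_window_view). The function name conv2d_im2col is made up for this example and is not part of any library.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20


def conv2d_im2col(x, kernel):
    """Stride-1, unpadded cross-correlation. x: (N, H, W, C), kernel: (fh, fw, C, Cout)."""
    fh, fw, cin, cout = kernel.shape
    # Window axes are appended at the end: (N, Hout, Wout, C, fh, fw)
    windows = sliding_window_view(x, (fh, fw), axis=(1, 2))
    # Reorder to (N, Hout, Wout, fh, fw, C) so each patch matches the kernel layout
    patches = windows.transpose(0, 1, 2, 4, 5, 3)
    n, hout, wout = patches.shape[:3]
    # Step 2 of the docs: one row per image patch
    patch_matrix = patches.reshape(n * hout * wout, fh * fw * cin)
    # Step 1 of the docs: filter flattened to a 2-D matrix
    filter_matrix = kernel.reshape(fh * fw * cin, cout)
    # Step 3 of the docs: one matrix product covers every patch and every filter
    return (patch_matrix @ filter_matrix).reshape(n, hout, wout, cout)
```

The reshape copies the patches, so memory use grows by roughly a factor of fh * fw. That is the usual trade-off of this approach.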
I am looking for a matrix operation in numpy or preferably in pytorch that allows one to multiply a vector (1 x N) by a tensor (N x M x M) and get (1 x M x M). This is easily accomplished using a for loop, but the for loop does not allow back propagation during training. I tried using matmul in numpy and pytorch (and several others such as dot and bmm), but could not get any to work. Here is an example (where M=2, but is 256 in my use case) of what I am trying to do:
a = np.array([1,2,3])
b = np.array([[[1,2],[3,4]],[[5,6],[7,8]],[[9,10],[11,12]]])
I would like to perform the operation: 1*[[1,2],[3,4]] + 2*[[5,6],[7,8]] + 3*[[9,10],[11,12]], which can be achieved with a for loop like this:
matrix_sum = np.zeros((2, 2))
for i in range(3):
    matrix_sum += a[i] * b[i]
Any advice or solution would be greatly appreciated.
You can use simple einsum:
#this gives you 2-D array (M,M)
np.einsum('i,ijk->jk',a,b)
output:
[[38 44]
[50 56]]
or another solution:
#this gives you 3-D array (1,M,M)
(a[None,:] @ b.swapaxes(0,1)).swapaxes(0,1)
output:
[[[38 44]
  [50 56]]]
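Another option that may read more directly is np.tensordot, which contracts a chosen pair of axes. The sketch below reuses the a and b from the question. PyTorch has an analogous torch.tensordot, which is not shown here.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]])

# Contract axis 0 of a with axis 0 of b; the result has shape (M, M).
weighted_sum = np.tensordot(a, b, axes=([0], [0]))

# Add a leading axis if the (1, M, M) shape is required.
weighted_sum_3d = weighted_sum[None, ...]

print(weighted_sum)           # same values as the einsum result above
print(weighted_sum_3d.shape)  # (1, 2, 2)
```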
NumPy and PyTorch are built around matrix multiplication!
Torch example:
A = torch.rand(1, N)
B = torch.rand(N, M, M)
C = A @ B.transpose(0, 1)
C.transpose_(0, 1)
C.shape
torch.Size([1, M, M])
And similarly for numpy:
A = np.random.randn(1, N)
B = np.random.randn(N, M, M)
C = A @ B.transpose(1, 0, 2)
C = C.transpose(1, 0, 2)
C.shape
(1, M, M)
Edit: for the einsum lovers:
PyTorch and NumPy handle einsum in much the same way. Since A has shape (1, N) here, it gets its own leading index n:
torch.einsum('ni,ijk->njk', A, B)
np.einsum('ni,ijk->njk', A, B)
Pytorch einsum documentation: https://pytorch.org/docs/master/generated/torch.einsum.html
Numpy einsum documentation: https://numpy.org/doc/stable/reference/generated/numpy.einsum.html
I have this function that rotates the MNIST images. The function returns a PyTorch tensor. I am more familiar with TensorFlow and I want to convert the PyTorch tensor to a NumPy ndarray that I can use. Is there a function that will allow me to do that? I tried to modify the function a little by adding .numpy() after tensor(img.rotate(rotation)).view(784) and saving the result in an empty ndarray, but that didn't work. Parameter d is the MNIST data saved in a .pt file (a PyTorch tensor file, I think). Thanks! (I would love to know if there is a TensorFlow function that can rotate the data.)
t = 1
min_rot = 1.0 * t / 20 * (180 - 0) + \
0
max_rot = 1.0 * (t + 1) / 20 * \
(180 - 0) + 0
rot = random.random() * (max_rot - min_rot) + min_rot
rotate_dataset(x_tr, rot)
def rotate_dataset(d, rotation):
    result = torch.FloatTensor(d.size(0), 784)
    tensor = transforms.ToTensor()
    for i in range(d.size(0)):
        img = Image.fromarray(d[i].numpy(), mode='L')
        result[i] = tensor(img.rotate(rotation)).view(784)
    return result
How about not converting to tensor in the first place:
result[i] = np.array(img.rotate(rotation)).flatten()
I am trying to implement a convolutional layer in Python using Numpy.
The input is a 4-dimensional array of shape [N, H, W, C], where:
N: Batch size
H: Height of image
W: Width of image
C: Number of channels
The convolutional filter is also a 4-dimensional array of shape [F, F, Cin, Cout], where
F: Height and width of a square filter
Cin: Number of input channels (Cin = C)
Cout: Number of output channels
Assuming a stride of one along all axes, and no padding, the output should be a 4-dimensional array of shape [N, H - F + 1, W - F + 1, Cout].
My code is as follows:
import numpy as np
def conv2d(image, filter):
    # Height and width of output image
    Hout = image.shape[1] - filter.shape[0] + 1
    Wout = image.shape[2] - filter.shape[1] + 1
    output = np.zeros([image.shape[0], Hout, Wout, filter.shape[3]])
    for n in range(output.shape[0]):
        for i in range(output.shape[1]):
            for j in range(output.shape[2]):
                for cout in range(output.shape[3]):
                    output[n,i,j,cout] = np.multiply(image[n, i:i+filter.shape[0], j:j+filter.shape[1], :], filter[:,:,:,cout]).sum()
    return output
This works perfectly, but uses four for loops and is extremely slow. Is there a better way of implementing a convolutional layer that takes 4-dimensional input and filter, and returns a 4-dimensional output, using Numpy?
This is a straightforward implementation of this kind of Keras-like (?) convolution. It might be hard for beginners to understand because it relies heavily on broadcasting and stride tricks.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def conv2d(a, b):
    # View a as overlapping patches: (N, Hout, Wout, fh, fw, C)
    a = as_strided(a,(len(a),a.shape[1]-len(b)+1,a.shape[2]-b.shape[1]+1,len(b),b.shape[1],a.shape[3]),a.strides[:3]+a.strides[1:])
    return np.einsum('abcijk,ijkd', a, b[::-1,::-1])
BTW: if you are convolving with a very large kernel, use a Fourier-based (FFT) algorithm instead.
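To make the Fourier-based remark concrete, here is a minimal single-channel sketch using only np.fft. It computes a true convolution (kernel flipped) and keeps only the "valid" region. The function name fft_conv2d_valid is invented for this example, and extending it to batches and multiple channels is left out.

```python
import numpy as np


def fft_conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (kernel flipped) of a single channel, computed via FFT."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    # Size of the full linear convolution; zero-padding to it avoids wrap-around
    fh, fw = ih + kh - 1, iw + kw - 1
    spectrum = np.fft.rfft2(image, s=(fh, fw)) * np.fft.rfft2(kernel, s=(fh, fw))
    full = np.fft.irfft2(spectrum, s=(fh, fw))
    # Keep only positions where the kernel lies fully inside the image
    return full[kh - 1:ih, kw - 1:iw]
```

For small kernels such as 3x3 the direct approach is usually faster. The FFT route tends to pay off only when the kernel is large relative to the image.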
EDIT: The [::-1,::-1] should be removed in the case that convolution does not involve flipping the kernel first (like what's in tensorflow).
EDIT: np.tensordot(a, b, axes=3) performs much better than np.einsum("abcijk,ijkd", a, b), and is highly recommended.
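So, the function becomes:
import numpy as np
from numpy.lib.stride_tricks import as_strided
def conv2d(a, b):
    Hout = a.shape[1] - b.shape[0] + 1
    Wout = a.shape[2] - b.shape[1] + 1
    # View a as overlapping patches of shape (N, Hout, Wout, F, F, C) without copying
    a = as_strided(a, (a.shape[0], Hout, Wout, b.shape[0], b.shape[1], a.shape[3]), a.strides[:3] + a.strides[1:])
    # Contract the last three axes of a with the first three axes of b
    return np.tensordot(a, b, axes=3)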
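Because as_strided does no bounds checking, it is worth comparing the result against the loop version from the question before trusting it. The sketch below is one way to do that. The names conv2d_loops and conv2d_strided and the test shapes are chosen for this example only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def conv2d_loops(image, kernel):
    """Reference implementation: explicit loops, stride 1, no padding, no kernel flip."""
    fh, fw, _, cout = kernel.shape
    hout = image.shape[1] - fh + 1
    wout = image.shape[2] - fw + 1
    output = np.zeros((image.shape[0], hout, wout, cout))
    for n in range(image.shape[0]):
        for i in range(hout):
            for j in range(wout):
                patch = image[n, i:i + fh, j:j + fw, :]
                # Sum over the patch axes for all output channels at once
                output[n, i, j, :] = np.tensordot(patch, kernel, axes=3)
    return output


def conv2d_strided(a, b):
    """Same computation using a strided patch view and a single tensordot."""
    hout = a.shape[1] - b.shape[0] + 1
    wout = a.shape[2] - b.shape[1] + 1
    patches = as_strided(
        a,
        (a.shape[0], hout, wout, b.shape[0], b.shape[1], a.shape[3]),
        a.strides[:3] + a.strides[1:],
    )
    return np.tensordot(patches, b, axes=3)


rng = np.random.default_rng(0)
image = rng.standard_normal((2, 7, 6, 3))
kernel = rng.standard_normal((3, 3, 3, 5))
max_abs_diff = np.abs(conv2d_loops(image, kernel) - conv2d_strided(image, kernel)).max()
print(max_abs_diff)  # should be at floating-point rounding level
```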
from numpy import exp, array, random, dot, matrix, asarray
class NeuralNetwork():
    def __init__(self):
        random.seed(1)
        self.synaptic_weights = 2 * random.random((3, 1)) - 1 # init weight from -1 to 1
    def __sigmoid(self, x):
        return 1 / (1 + exp(-x))
    def __sigmoid_derivative(self, x):
        return x * (1 - x)
    def train(self, train_input, train_output, iter):
        for i in range(iter):
            output = self.think(train_input)
            error = train_output - output
            adjustment = dot(train_input.T, error * self.__sigmoid_derivative(output))
            self.synaptic_weights += adjustment
    def think(self, inputs):
        return self.__sigmoid(dot(inputs, self.synaptic_weights))
neural_network = NeuralNetwork()
train = matrix([[0, 0, 1, 0],[1, 1, 1, 1],[1, 0, 1, 1],[0, 1, 1, 0]])
train_input = asarray(train[:, 0:3])
train_output = asarray(train[:,3])
neural_network.train(train_input, train_output, 10000)
This code is a basic neural network. It works well when I convert the training set using asarray, but it does not work with the matrix itself. It seems matrix cannot calculate the sigmoid derivative, and the terminal shows ValueError: shapes (4,1) and (4,1) not aligned: 1 (dim 1) != 4 (dim 0)
Why does matrix not work in this code?
The error is in the
x * (1 - x)
expression. x has shape (4,1). With arrays, * is element-by-element multiplication, so x*(1-x) works fine and returns another (4,1) result.
But if x is a (4,1) matrix, then * is the matrix product, the same as np.dot for arrays. That would require (4,1) * (1,4) => (4,4), or (1,4) * (4,1) => (1,1). You are already using dot for the matrix product, so this derivative is clearly meant to be the element-wise one.
If you see machine learning code that uses np.matrix it is probably based on older examples, and retains matrix for backward compatibility. It is better to use array, and use the dot product as needed.
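The difference in how * behaves can be reproduced in a few lines. This is a small sketch with made-up values. It also shows np.multiply as one way to force element-wise behaviour if a matrix cannot be avoided.

```python
import numpy as np

x_arr = np.array([[0.1], [0.2], [0.3], [0.4]])  # ndarray of shape (4, 1)
elementwise = x_arr * (1 - x_arr)               # element-wise product, shape (4, 1)

x_mat = np.matrix(x_arr)                        # same data as an np.matrix
try:
    x_mat * (1 - x_mat)                         # matrix product of (4,1) and (4,1)
    matrix_error = None
except ValueError as exc:
    matrix_error = str(exc)                     # the "not aligned" error from the question

# np.multiply is always element-wise, even for np.matrix inputs
fixed = np.multiply(x_mat, 1 - x_mat)
```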
I'm trying to implement the softmax function for a neural network written in Numpy. Let h be the softmax value of a given signal i.
I've struggled to implement the softmax activation function's partial derivative.
I'm currently stuck on an issue where all the partial derivatives approach 0 as training progresses. I've cross-referenced my math with this excellent answer, but my math does not seem to work out.
import numpy as np
def softmax_function( signal, derivative=False ):
    # Calculate activation signal
    e_x = np.exp( signal )
    signal = e_x / np.sum( e_x, axis = 1, keepdims = True )
    if derivative:
        # Return the partial derivation of the activation function
        return np.multiply( signal, 1 - signal ) + sum(
            # handle the off-diagonal values
            - signal * np.roll( signal, i, axis = 1 )
            for i in range(1, signal.shape[1] )
        )
    else:
        # Return the activation signal
        return signal
#end activation function
The signal parameter contains the input signal sent into the activation function and has the shape (n_samples, n_features).
# sample signal (3 samples, 3 features)
signal = [[0.3394572666491664, 0.3089068053925853, 0.3516359279582483], [0.33932706934615525, 0.3094755563319447, 0.3511973743219001], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256]]
The following code snippet is a fully working activation function and is only included as a reference and proof (mostly for myself) that the conceptual idea actually works.
from scipy.special import expit
import numpy as np
def sigmoid_function( signal, derivative=False ):
    # Prevent overflow.
    signal = np.clip( signal, -500, 500 )
    # Calculate activation signal
    signal = expit( signal )
    if derivative:
        # Return the partial derivation of the activation function
        return np.multiply(signal, 1 - signal)
    else:
        # Return the activation signal
        return signal
#end activation function
Edit
The problem persists even with simple single-layer networks. The softmax (and its derivative) is applied at the final layer.
This is an answer on how to calculate the derivative of the softmax function in a more vectorized NumPy fashion. However, the fact that the partial derivatives approach zero might not be a math issue; it could be a problem with the learning rate, or the known vanishing-gradient issue in deep neural networks. Activations like ReLU help prevent the latter.
First, I've used the following signal (just duplicating your last entry) to make it 4 samples x 3 features, so it is easier to see what is going on with the dimensions.
>>> signal = np.array([[0.3394572666491664, 0.3089068053925853, 0.3516359279582483], [0.33932706934615525, 0.3094755563319447, 0.3511973743219001], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256]])
>>> signal.shape
(4, 3)
Next, you want to compute the Jacobian matrix of your softmax function. According to the cited page, its off-diagonal entries are -hi * hj (the majority of the matrix for n_features > 2), so let's start there. In NumPy, you can compute that matrix efficiently using broadcasting:
>>> J = - signal[..., None] * signal[:, None, :]
>>> J.shape
(4, 3, 3)
The first signal[..., None] (equivalent to signal[:, :, None]) reshapes the signal to (4, 3, 1) while the second signal[:, None, :] reshapes the signal to (4, 1, 3). Then, the * just multiplies both matrices element-wise. Numpy's internal broadcasting repeats both matrices to form the n_features x n_features matrix for every sample.
Then, we need to fix the diagonal elements:
>>> iy, ix = np.diag_indices_from(J[0])
>>> J[:, iy, ix] = signal * (1. - signal)
The above lines extract the diagonal indices of an n_features x n_features matrix. This is equivalent to iy = np.arange(n_features); ix = np.arange(n_features). The assignment then replaces the diagonal entries with your definition hi * (1 - hi).
Last, according to the linked source, you need to sum across rows for each of the samples. That can be done as:
>>> J = J.sum(axis=1)
>>> J.shape
(4, 3)
Find below a summarized version:
if derivative:
    J = - signal[..., None] * signal[:, None, :] # off-diagonal Jacobian
    iy, ix = np.diag_indices_from(J[0])
    J[:, iy, ix] = signal * (1. - signal) # diagonal
    return J.sum(axis=1) # sum across-rows for each sample
Comparison of the derivatives:
>>> signal = [[0.3394572666491664, 0.3089068053925853, 0.3516359279582483], [0.33932706934615525, 0.3094755563319447, 0.3511973743219001], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256]]
>>> e_x = np.exp( signal )
>>> signal = e_x / np.sum( e_x, axis = 1, keepdims = True )
Yours:
>>> np.multiply( signal, 1 - signal ) + sum(
...     # handle the off-diagonal values
...     - signal * np.roll( signal, i, axis = 1 )
...     for i in range(1, signal.shape[1] )
... )
array([[ 2.77555756e-17, -2.77555756e-17, 0.00000000e+00],
[ -2.77555756e-17, -2.77555756e-17, -2.77555756e-17],
[ 2.77555756e-17, 0.00000000e+00, 2.77555756e-17],
[ 2.77555756e-17, 0.00000000e+00, 2.77555756e-17]])
Mine:
>>> J = - signal[..., None] * signal[:, None, :]
>>> iy, ix = np.diag_indices_from(J[0])
>>> J[:, iy, ix] = signal * (1. - signal)
>>> J.sum(axis=1)
array([[ 4.16333634e-17, -1.38777878e-17, 0.00000000e+00],
[ -2.77555756e-17, -2.77555756e-17, -2.77555756e-17],
[ 2.77555756e-17, 1.38777878e-17, 2.77555756e-17],
[ 2.77555756e-17, 1.38777878e-17, 2.77555756e-17]])
@Imanol Luengo's solution is wrong the moment he takes sums across rows.
@Harveyslash also makes a good point: he noticed extremely low "gradients" with that solution, so the NN won't learn, or will learn in the wrong direction.
We have 4 samples x 3 inputs
The thing is that softmax is not a scalar function that takes just 1 input, but 3 in this case. Remember that all outputs for one sample must add up to one, so you can't compute the value of one without knowing the others. This implies that the gradient must be a square matrix, because we also need to take the cross partial derivatives into account.
TLDR: The output of the gradient of softmax(sample) in this case must be a 3x3 matrix.
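As a small check of this claim, the sketch below builds the 3x3 Jacobian for a single sample with made-up inputs. It also shows why summing the Jacobian over one axis collapses to values near zero: the outputs always sum to one, so every column of the Jacobian sums to zero. The helper softmax is defined here for illustration only.

```python
import numpy as np


def softmax(z):
    """Softmax along the last axis; subtracting the max only improves numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


z = np.array([0.3, -1.2, 2.0])  # one sample with 3 inputs (made-up values)
h = softmax(z)

# jacobian[i, j] = d h_i / d z_j = h_i * (delta_ij - h_j)
jacobian = np.diag(h) - np.outer(h, h)

# Each column sums to (numerically) zero because the outputs sum to one
column_sums = jacobian.sum(axis=0)
```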
This is correct:
J = - signal[..., None] * signal[:, None, :] # off-diagonal Jacobian
iy, ix = np.diag_indices_from(J[0])
J[:, iy, ix] = signal * (1. - signal) # diagonal
Up until this point Imanol uses fast vector operations to compute the Jacobian of the softmax function at 4 points, resulting in a 3x3 matrix stacked 4 times: 4 x 3 x 3.
Now I think what the OP really wants is dJdZ, the first step in ANN backprop:
dJdZ(4x3) = dJdy(4x3) * gradSoftmax[layer signal(4x3)](?,?)
The problem is that we usually (with sigmoid, ReLU,... any scalar activation function) can compute the gradient as a stacked vector and then multiply element-wise with dJdy, but here we got a stacked matrix. How can we marry the two concepts?
The vector can be seen as the non zero elements of a diagonal matrix -> All this time we have been getting away with easy element-wise multiplication just because our activation function was scalar! For our softmax it's not that simple, and therefore we have to use matrix multiplication
dJdZ(4x3) = dJdy(4-1x3) * anygradient[layer signal(4,3)](4-3x3)
Now we multiply each 1x3 dJdy vector by the 3x3 gradient, for each of the 4 samples. The usual operators will not do this directly, so we need to state explicitly which dimensions are multiplied and summed (use np.einsum). Note that the two gradient axes need different index letters: a repeated letter such as 'mrr' would select only the diagonal. End result:
#For reference 'mnr,mrs->ms' = 4x1x3,4x3x3->4x3
dJdZ = np.einsum('mnr,mrs->ms', dJdy[:,None,:], gradient)
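Putting the pieces together, the following sketch computes the stacked Jacobians and contracts them with an upstream gradient. The random inputs and the helper names softmax and softmax_jacobian are assumptions made for this example. dJdy stands in for whatever gradient arrives from the loss.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row max only improves numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def softmax_jacobian(h):
    """Stacked Jacobians of shape (m, r, r) with J[m, i, j] = d h_i / d z_j."""
    J = -h[:, :, None] * h[:, None, :]    # off-diagonal entries: -h_i * h_j
    iy, ix = np.diag_indices(h.shape[1])
    J[:, iy, ix] = h * (1.0 - h)          # diagonal entries: h_i * (1 - h_i)
    return J


rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 3))           # 4 samples x 3 inputs (made-up values)
dJdy = rng.standard_normal((4, 3))        # hypothetical upstream gradient

h = softmax(Z)
gradient = softmax_jacobian(h)            # shape (4, 3, 3)

# dJdZ[m, s] = sum_r dJdy[m, r] * gradient[m, r, s], one vector-matrix product per sample
dJdZ = np.einsum('mr,mrs->ms', dJdy, gradient)
```

The same result can be written without forming the Jacobian at all, as h * (dJdy - (dJdy * h).sum(axis=1, keepdims=True)), which avoids the (m, r, r) intermediate when r is large.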