PyTorch offers a functional 2D convolution operation, torch.nn.functional.conv2d, which takes an input and a weight parameter (https://pytorch.org/docs/stable/nn.html#conv2d).
I want to perform a self-convolution, meaning that, given a tensor X, I want to calculate:
conv2d(X, X)
However, the filter is expected to have a shape of the form [out_ch, in_channel/groups, kernel_size, kernel_size].
Let's say X has the shape [32, 3, 16, 16]. I have tried to permute X to match the expected form of the weight parameter, like so:
W = X.permute(1, 0, 2, 3) # 3, 32, 16, 16
conv2d(X, W)
But that does not work, since the weight's second dimension is 32 while in_channels/groups == 3 / 1 == 3.
The complete operation would essentially look like this (to see the dimensions requirements):
X + conv2d(X, X)
Is it possible to do this with PyTorch?
Other packages such as SciPy offer a convolve2d operation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve2d.html), which appears to do what I'm trying to do above, but it only supports 2D arrays, with no support for batches and channels.
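For illustration, here is a rough sketch of the per-sample, per-channel version using SciPy (assuming that is the operation intended); it has to loop over the batch and channel dimensions by hand, since convolve2d only accepts 2D arrays:

import numpy as np
from scipy.signal import convolve2d

X = np.random.randn(32, 3, 16, 16)
out = np.empty_like(X)
for n in range(X.shape[0]):        # loop over the batch dimension
    for c in range(X.shape[1]):    # loop over the channel dimension
        # convolve each 16x16 map with itself; mode='same' keeps the 16x16 size
        out[n, c] = convolve2d(X[n, c], X[n, c], mode='same')

result = X + out                   # the X + conv2d(X, X) expression from above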
I have a 500x2000 matrix, where each row represents an individual and each column is a measurement of some particular quality about that individual. I'm using a batch size of 64, so the input for each cycle of the network is actually a 64x2000 matrix. I'm trying to build a CNN in PyTorch to classify individuals given a set of these measurements. However, I've stumbled on the parameters for the convolutional layer.
Below is my current definition for a simple convolutional neural network.
import torch.nn as nn
import torch.nn.functional as F

class CNNnet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(2000, 200, (1, 2), stride=10)
        self.pool = nn.MaxPool1d(kernel_size=(1, 2), stride=2)
        self.fc1 = nn.Linear(64, 30)
        self.fc2 = nn.Linear(30, 7)

    def forward(self, x):
        x = x.view(64, 2000, 1)
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
Attempting to train this model produces the following error:
"RuntimeError: Expected 4-dimensional input for 4-dimensional weight
200 2000 1 2, but got 3-dimensional input of size [64, 2000, 1]
instead".
I'm confused about why it's expecting a 4D 200x2000x1x2 weight (shouldn't the number of output channels be irrelevant to the input? And why is there a 2 at the end?).
My question is: what is the proper syntax or approach for writing a CNN (specifically the convolutional layer) when dealing with 1D data? Any help is greatly appreciated.
In the 1-dimensional case the kernel size is just a single number, so if you want a kernel of size '1x2' you only need to specify the '2'.
In the 2-dimensional case, 2 would mean a '2x2' kernel.
You gave a tuple of 2 values, so the layer built a two-dimensional 1x2 kernel; that is why the weight is 4-dimensional and the layer expects 4-dimensional input.
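A minimal sketch of one way to set this up, assuming the 2000 measurements are treated as a single-channel 1D signal (the class name and layer sizes here are illustrative, not your exact architecture):

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN1d(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 input channel, 200 output channels, kernel of width 2
        self.conv1 = nn.Conv1d(1, 200, kernel_size=2, stride=10)
        self.pool = nn.MaxPool1d(kernel_size=2, stride=2)
        # after conv (length 200) and pool (length 100): 200 channels * 100 = 20000 features
        self.fc1 = nn.Linear(200 * 100, 30)
        self.fc2 = nn.Linear(30, 7)

    def forward(self, x):
        x = x.view(x.shape[0], 1, 2000)   # (batch, channels=1, length=2000)
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = x.view(x.shape[0], -1)        # flatten before the linear layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

out = CNN1d()(torch.randn(64, 2000))
print(out.shape)  # torch.Size([64, 7])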
I'm new to Deep Learning. I'm studying from Udacity.
I came across one of the codes for building up a neural network, where two tensors are added, specifically the 'bias' tensor and the output of the matrix multiplication.
It went something like this:
def activation(x):
    return 1 / (1 + torch.exp(-x))

inputs = images.view(images.shape[0], -1)
w1 = torch.randn(784, 256)
b1 = torch.randn(256)
h = activation(torch.mm(inputs, w1) + b1)
After flattening the MNIST images, inputs came out with shape [64, 784].
I don't understand how the bias tensor (b1) of shape [256] can be added to the product of 'inputs' and 'w1', which has shape [64, 256].
In simple terms, whenever we use "broadcasting" in a Python library (NumPy or PyTorch), the library makes our arrays (the weights and the bias here) dimensionally compatible.
In other words, if you are operating on a result of shape [64, 256] and your bias is only [256], broadcasting fills in the missing dimension so that the operation can be carried out successfully. Hope this is helpful.
64 is your batch size, meaning the bias tensor is added to each of the 64 examples in your batch. Basically, it is as if you took 64 tensors of size 256 and added the bias to each of them. PyTorch naturally broadcasts the size-256 tensor to a 64x256 shape that can be added to the 64x256 output of the previous layer.
This is something called PyTorch broadcasting.
It is very similar to NumPy broadcasting, if you have used that library.
Here is an example of adding a scalar to a 2D tensor m.
m = torch.rand(3,3)
print(m)
s=1
print(m+s)
# tensor([[0.2616, 0.4726, 0.1077],
# [0.0097, 0.1070, 0.7539],
# [0.9406, 0.1967, 0.1249]])
# tensor([[1.2616, 1.4726, 1.1077],
# [1.0097, 1.1070, 1.7539],
# [1.9406, 1.1967, 1.1249]])
Here is another example, adding a 1D tensor to a 2D tensor.
v = torch.rand(3)
print(v)
print(m+v)
# tensor([0.2346, 0.9966, 0.0266])
# tensor([[0.4962, 1.4691, 0.1343],
# [0.2442, 1.1035, 0.7805],
# [1.1752, 1.1932, 0.1514]])
I rewrote your example:
def activation(x):
    return 1 / (1 + torch.exp(-x))

images = torch.randn(3, 28, 28)
inputs = images.view(images.shape[0], -1)
print("INPUTS:", inputs.shape)
W1 = torch.randn(784, 256)
print("W1:", W1.shape)
B1 = torch.randn(256)
print("B1:", B1.shape)
h = activation(torch.mm(inputs, W1) + B1)
Output:
INPUTS: torch.Size([3, 784])
W1: torch.Size([784, 256])
B1: torch.Size([256])
To explain:
INPUTS of size [3, 784] multiplied (torch.mm) with W1 of size [784, 256] creates a tensor of size [3, 256].
Then the addition:
[3, 256] + B1 of size [256] works because B1 takes the shape [3, 256] through broadcasting.
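The general rule: broadcasting compares shapes starting from the trailing dimension and working backwards; a missing leading dimension (or a dimension of size 1) is stretched to match. A quick sketch:

import torch

t = torch.randn(3, 256)
b = torch.randn(256)
print((t + b).shape)  # torch.Size([3, 256]): b is treated as [1, 256] and stretched along dim 0

# A mismatched trailing dimension fails instead:
# torch.randn(3, 256) + torch.randn(64)  # RuntimeError: 256 and 64 don't match at the trailing dimension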
I am having trouble figuring out what the dimensions of each CNN layer are.
Let's say my input is a vector which I then project onto a 4x4x256 tensor using a fully-connected layer, like so...
zP = slim.fully_connected(
    z,
    4*4*256,
    normalizer_fn=slim.batch_norm,
    activation_fn=tf.nn.relu,
    scope='g_project',
    weights_initializer=initializer
)
# Layer is reshaped to a 4x4x256 mapping.
zCon = tf.reshape(zP, [-1, 4, 4, 256])
Here z is my original vector. I then take this 4x4x256 tensor and feed it into a CNN...
gen1 = slim.convolution2d_transpose(
    zCon,
    num_outputs=64,
    kernel_size=[5,5],
    stride=[2,2],
    padding="SAME",
    normalizer_fn=slim.batch_norm,
    activation_fn=tf.nn.relu,
    scope='g_conv1',
    weights_initializer=initializer
)
As you can see, I used a 2D transposed convolution and specified 64 output channels, a stride of 2 and a filter size of 5. This means I know one of the dimensions will be 64, but I do not know what the other two dimensions will be, nor how to calculate them.
I tried using the standard convolution output-size formula, Out = (W - F + 2P)/S + 1, but it is not working out for me...
How can I calculate the remaining dimensions?
The formula you have written is for the convolution operation. Since you need to calculate shapes for the transposed convolution, which are the inverse of convolution, the formula can be derived from the above equation by rearranging the terms:
W = (Out-1)*S + F - 2P
W is your actual output and Out is your actual input to the transpose convolution.
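As a worked example with the numbers above (a sketch, relying on TensorFlow's "SAME" padding behaviour for transposed convolutions, where the spatial size is simply multiplied by the stride):

# input feature map: 4 x 4 x 256, stride S = 2, filter F = 5, num_outputs = 64
# padding="SAME"  =>  output spatial size = 4 * 2 = 8, so gen1 has shape [batch, 8, 8, 64]
# cross-check with W = (Out-1)*S + F - 2P:  8 = (4-1)*2 + 5 - 2P  =>  P = 1.5
# (the fractional P reflects that "SAME" padding is asymmetric here: 1 pixel on one side, 2 on the other)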
I am new to mxnet. In the official doc, a convolution layer can be created as
conv = nd.Convolution(data=data, weight=W, bias=b, kernel=(3,3), num_filter=10)
But the weight parameter is required to be a 4-D tensor
W = [weight_num, stride, kernel_height, kernel_width]
So why do we still need to set a kernel parameter in the Convolution function?
The kernel parameter sets up the kernel size, which can be either:
(width,) - for 1D convolution
(height, width) - for 2D convolution
(depth, height, width) - for 3D convolution
It only defines shapes.
The weight and bias parameters contain the actual parameters that are going to be trained; the actual values live there.
While you could probably figure out the kernel shape from the provided weight, it is more defensive to ask for the kernel shape explicitly instead of trying to infer it from the parameters passed to weight.
Here is an example of 2D convolution:
# shape is batch_size x channels x height x width
x = mx.nd.random.uniform(shape=(100, 1, 9, 9))
# kernel is just 3 x 3,
# weight is num_filter x channels x kernel_height x kernel_width
# bias is num_filter
mx.nd.Convolution(data=x,
                  kernel=(3, 3),
                  num_filter=5,
                  weight=mx.nd.random.uniform(shape=(5, 1, 3, 3)),
                  bias=mx.nd.random.uniform(shape=(5,)))
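As a quick sanity check of the shapes involved (a sketch, assuming the default stride of 1 and no padding):

import mxnet as mx

x = mx.nd.random.uniform(shape=(100, 1, 9, 9))
out = mx.nd.Convolution(data=x,
                        kernel=(3, 3),
                        num_filter=5,
                        weight=mx.nd.random.uniform(shape=(5, 1, 3, 3)),
                        bias=mx.nd.random.uniform(shape=(5,)))
print(out.shape)  # (100, 5, 7, 7): with a 3x3 kernel, 9 - 3 + 1 = 7 along each spatial axis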
The documentation explaining various shapes of parameters in case of 1D, 2D or 3D convolutions is quite good: https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html#mxnet.ndarray.Convolution
I'm trying to use a Lambda layer in Keras to return the Euclidean distance between two vectors. The code is:
def distance(x):
    a = x[0]
    b = x[1]
    dist = np.linalg.norm(a - b)
    return dist

dist = Lambda(distance, output_shape=(1, 1), name='dist')([x, y])
The inputs of this layer are two tensors of shape (100, 1, 8192), where 100 is the batch size. In theory the output is a single value. I want to use dist as the output of this model, like:
model = Model(inputs=[probe_input_car, probe_input_sign, gallary_input_car, gallary_input_sign], outputs=dist, name='fcn')
When I run this model, I get the following error:
ValueError: Input dimension mis-match. (input[0].shape[2] = 1, input[1].shape[2] = 8192)
Apply node that caused the error: Elemwise{Composite{EQ(i0, RoundHalfToEven(i1))}}(/dist_target, Elemwise{Composite{sqrt(sqr(i0))}}.0)
Toposort index: 92
Inputs types: [TensorType(float32, 3D), TensorType(float32, 3D)]
Inputs shapes: [(100, 1, 1), (100, 1, 8192)]
Inputs strides: [(4, 4, 4), (32768, 32768, 4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Sum{acc_dtype=int64}(Elemwise{Composite{EQ(i0, RoundHalfToEven(i1))}}.0)]]
I think this is caused by the output_shape of the Lambda layer. How should I set the output_shape of the layer? Because I use Theano as the backend, it cannot infer the output_shape itself.
And if it is not caused by output_shape, where is the error?
It seems you're simply getting the wrong parts of the vector.
The message says it's trying to compute something with two tensors shaped as:
(100,1,1)
(100,1,8192)
Based on your input list, where you have [car, signal, car2, signal2], I believe you probably want some operation between either car x car or signal x signal.
So, your lambda layer should probably start as either:
a = x[0]
b = x[2]
or:
a = x[1]
b = x[3]
Hint: if you can find an equivalent function in the Keras backend to calculate what you want, that's probably better. I wonder how you haven't got a "disconnected" error message for using a NumPy function.
The error occurs because you used np.linalg.norm() on a Theano tensor. It doesn't throw an error, but the output is definitely not what you expect.
To avoid this error, use Keras backend functions instead. For example,
dist = K.sqrt(K.sum(K.square(a - b), axis=-1, keepdims=True))
What happened inside np.linalg.norm(x):
x = np.asarray(x) wraps a - b into a length-1 array (of dtype object) whose only element is a Theano tensor of shape (100, 1, 8192).
sqnorm = np.dot(x, x): recall the definition of dot product. When you dot a length-1 array with itself, you're actually computing (a - b) * (a - b), or an element-wise square of a - b. That's why there's sqr(i0) in the second line of your error.
np.sqrt(sqnorm) is returned. So you can see sqrt(sqr(i0)) appear in your error.
Therefore, the output of np.linalg.norm(a - b) is a tensor of shape (100, 1, 8192), not (100, 1, 1).
Also, if you look closer into the code, Elemwise{Composite{EQ(i0, RoundHalfToEven(i1))}} is just accuracy.
def binary_accuracy(y_true, y_pred):
    return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)
So the error message is trying to tell you that there's a mismatch between the shapes of y_true and y_pred. While y_true is of shape (100, 1, 1), y_pred has a shape (100, 1, 8192) because np.linalg.norm() gives wrong results for Theano tensors.
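Putting this together, a corrected layer could look something like the sketch below; the Input shapes and names are placeholders standing in for the real model inputs, and output_shape keeps the (1, 1) convention from the question:

from keras.layers import Input, Lambda
from keras.models import Model
from keras import backend as K

def distance(tensors):
    a, b = tensors
    # Euclidean distance along the last axis; keepdims=True keeps an explicit size-1 axis
    return K.sqrt(K.sum(K.square(a - b), axis=-1, keepdims=True))

# placeholder inputs; the shapes and names here are assumptions, not the original model's
x = Input(shape=(1, 8192))
y = Input(shape=(1, 8192))

dist = Lambda(distance, output_shape=(1, 1), name='dist')([x, y])
model = Model(inputs=[x, y], outputs=dist, name='fcn')
model.summary()  # the 'dist' output has shape (None, 1, 1)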