I'm trying to perform a convolution (conv2d) on images of variable dimensions. I have those images in form of an 1-D array and I want to perform a convolution on them, but I have a lot of troubles with the shapes.
This is my code of the conv2d:
tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
where x is the input image.
The error is:
ValueError: Shape must be rank 4 but is rank 1 for 'Conv2D' (op: 'Conv2D') with input shapes: [1], [5,5,1,32].
I think I might reshape x, but I don't know the right dimensions. When I try this code:
x = tf.reshape(self.x, shape=[-1, 5, 5, 1]) # example
I get this:
ValueError: Dimension size must be evenly divisible by 25 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [1], [4] and with input tensors computed as partial shapes: input[1] = [?,5,5,1].
You can't use conv2d with a tensor of rank 1. Here's the description from the doc:
Computes a 2-D convolution given 4-D input and filter tensors.
These four dimensions are [batch, height, width, channels] (as Engineero already wrote).
If you don't know the dimensions of the image in advance, tensorflow allows to provide a dynamic shape:
x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
with tf.Session() as session:
print session.run(x, feed_dict={x: data})
In this example, a 4-D tensor x is created, but only the number of channels is known statically (3), everything else is determined on runtime. So you can pass this x into conv2d, even if the size is dynamic.
But there's another problem. You didn't say your task, but if you're building a convolutional neural network, I'm afraid, you'll need to know the size of the input to determine the size of FC layer after all pooling operations - this size must be static. If this is the case, I think the best solution is actually to scale your inputs to a common size before passing it into a convolutional network.
UPD:
Since it wasn't clear, here's how you can reshape any image into 4-D array.
a = np.zeros([50, 178, 3])
shape = a.shape
print shape # prints (50, 178, 3)
a = a.reshape([1] + list(shape))
print a.shape # prints (1, 50, 178, 3)
Related
I'm trying to get my head around 1D convolution - specifically, how the padding comes into it.
Suppose I have an input sequence of shape (batch,128,1) and run it through the following Keras layer:
tf.keras.layers.Conv1D(32, 5, strides=2, padding="same")
I get an output of shape (batch,64,32), but I don't understand why the sequence length has reduced from 128 to 64... I thought the padding="same" parameter kept the output length the same as the input? I suppose that's only true if strides=1; so in this case I'm confused about what padding="same" actually means.
According to the TensorFlow documents in your case we have:
filters (Number of filters - output dimension) = 32
kernelSize (The filter size) = 5
strides (The unit to move in input data by the convolution layer in each dimensions after applying each convolution) = 2
So applying input in shape (batch, 128, 1) will be to apply 32 kernels (in shape 5) and jump two unit after each convolution - so we have 128 / 2 = 64 value corresponding to each filter and at the end output would be in shape (batch, 64, 32).
padding="same" is just determining the the convolution on borders. For more details you can check here.
I am looking at an implementation of RCNN for text classification using PyTorch. Full Code. There are two points where the dimensions of tensors are permuted using the permute function. The first is after the LSTM layer and before tanh. The second is after a linear layer and before a max pooling layer.
Could you please explain why the permutation is necessary or useful?
Relevant Code
def forward(self, x):
# x.shape = (seq_len, batch_size)
embedded_sent = self.embeddings(x)
# embedded_sent.shape = (seq_len, batch_size, embed_size)
lstm_out, (h_n,c_n) = self.lstm(embedded_sent)
# lstm_out.shape = (seq_len, batch_size, 2 * hidden_size)
input_features = torch.cat([lstm_out,embedded_sent], 2).permute(1,0,2)
# final_features.shape = (batch_size, seq_len, embed_size + 2*hidden_size)
linear_output = self.tanh(
self.W(input_features)
)
# linear_output.shape = (batch_size, seq_len, hidden_size_linear)
linear_output = linear_output.permute(0,2,1) # Reshaping fot max_pool
max_out_features = F.max_pool1d(linear_output, linear_output.shape[2]).squeeze(2)
# max_out_features.shape = (batch_size, hidden_size_linear)
max_out_features = self.dropout(max_out_features)
final_out = self.fc(max_out_features)
return self.softmax(final_out)
Similar Code in other Repositories
Similar implementations of RCNN use permute or transpose. Here are examples:
https://github.com/prakashpandey9/Text-Classification-Pytorch/blob/master/models/RCNN.py
https://github.com/jungwhank/rcnn-text-classification-pytorch/blob/master/model.py
What permute function does is rearranges the original tensor according to the desired ordering, note permute is different from reshape function, because when apply permute, the elements in tensor follow the index you provide where in reshape it's not.
Example code:
import torch
var = torch.randn(2, 4)
pe_var = var.permute(1, 0)
re_var = torch.reshape(var, (4, 2))
print("Original size:\n{}\nOriginal var:\n{}\n".format(var.size(), var) +
"Permute size:\n{}\nPermute var:\n{}\n".format(pe_var.size(), pe_var) +
"Reshape size:\n{}\nReshape var:\n{}\n".format(re_var.size(), re_var))
Outputs:
Original size:
torch.Size([2, 4])
Original var:
tensor([[ 0.8250, -0.1984, 0.5567, -0.7123],
[-1.0503, 0.0470, -1.9473, 0.9925]])
Permute size:
torch.Size([4, 2])
Permute var:
tensor([[ 0.8250, -1.0503],
[-0.1984, 0.0470],
[ 0.5567, -1.9473],
[-0.7123, 0.9925]])
Reshape size:
torch.Size([4, 2])
Reshape var:
tensor([[ 0.8250, -0.1984],
[ 0.5567, -0.7123],
[-1.0503, 0.0470],
[-1.9473, 0.9925]])
With the role of permute in mind we could see what first permute does is reordering the concatenate tensor for it to fit the inputs format of self.W, i.e with batch as first dimension; and the second permute does similar thing because we want to max pool the linear_output along the sequence and F.max_pool1d will pool along the last dimension.
I am adding this answer to provide additional PyTorch-specific details.
It is necessary to use permute between nn.LSTM and nn.Linear because the output shape of LSTM does not correspond to the expected input shape of Linear.
nn.LSTM outputs output, (h_n, c_n). Tensor output has shape seq_len, batch, num_directions * hidden_size nn.LSTM. nn.Linear expects an input tensor with shape N,∗,H, where N is batch size and H is number of input features. nn.Linear.
It is necessary to use permute between nn.Linear and nn.MaxPool1d because the output of nn.Linear is N, L, C, where N is batch size, C is the number of features, and and L is sequence length. nn.MaxPool1d expects an input tensor of shape N, C, L. nn.MaxPool1d
I reviewed seven implementations of RCNN for text classification with PyTorch on GitHub and gitee and found that permute and transpose are the normal ways to convert the output of one layer to the input of a subsequent layer.
This question already has answers here:
How does pytorch broadcasting work?
(2 answers)
Closed 3 years ago.
I'm new to Deep Learning. I'm studying from Udacity.
I came across one of the codes to build up a neural network, where 2 tensors are being added, specifically the 'bias' tensor with the output of the tensor-multiplication product.
It was kind of...
def activation(x):
return (1/(1+torch.exp(-x)))
inputs = images.view(images.shape[0], -1)
w1 = torch.randn(784, 256)
b1 = torch.randn(256)
h = activation(torch.mm(inputs,w1) + b1)
After flattening the MNIST, it came out as [64,784] (inputs).
I'm not getting how the bias tensor (b1) of dimension [256], could be added to the multiplication product of 'inputs' and 'w1' which comes out to be the dimensions of [256, 64].
In simple terms, whenever we use "broadcasting" from a Python library (Numpy or PyTorch), what we are doing is treating our arrays (weight, bias) dimensionally compatible.
In other words, if you are operating with W of shape [256,64], and your bias is only [256]. Then, broadcasting will complete that lacking dimension.
As you can see in the image above, the dimension left is being filled so that our operations can be done successfully. Hope this is helpful
64 is your batch size, meaning that the bias tensor will be added to each of the 64 examples inside of your batch. Basically it's like if you took 64 tensor of size 256 and added the bias to each of them. Pytorch will naturally broadcast the 256 tensor to a 64*256 size that can be added to the 64*256 output of your precedent layer.
This is something called PyTorch broadcasting.
It is very similar to NumPy broadcasting if you used the library.
Here is the example adding a scalar to a 2D tensor m.
m = torch.rand(3,3)
print(m)
s=1
print(m+s)
# tensor([[0.2616, 0.4726, 0.1077],
# [0.0097, 0.1070, 0.7539],
# [0.9406, 0.1967, 0.1249]])
# tensor([[1.2616, 1.4726, 1.1077],
# [1.0097, 1.1070, 1.7539],
# [1.9406, 1.1967, 1.1249]])
Here is the another example adding 1D tensor and 2D tensor.
v = torch.rand(3)
print(v)
print(m+v)
# tensor([0.2346, 0.9966, 0.0266])
# tensor([[0.4962, 1.4691, 0.1343],
# [0.2442, 1.1035, 0.7805],
# [1.1752, 1.1932, 0.1514]])
I rewrote your example:
def activation(x):
return (1/(1+torch.exp(-x)))
images = torch.randn(3,28,28)
inputs = images.view(images.shape[0], -1)
print("INPUTS:", inputs.shape)
W1 = torch.randn(784, 256)
print("W1:", w1.shape)
B1 = torch.randn(256)
print("B1:", b1.shape)
h = activation(torch.mm(inputs,W1) + B1)
Out
INPUTS: torch.Size([3, 784])
W1: torch.Size([784, 256])
B1: torch.Size([256])
To explain:
INPUTS: of size [3, 784] # W1: of size [784, 256] will create tensor of size [3, 256]
Then the addition:
After mm: [3, 256] + B1: [256] is done because B1 will take the shape of [3, 256] based on broadcasting.
I'm building an autoencoder based on RNN. After FC layer, I have to reshape my output to [batch_size, sequence_length, embedding_dimension]. However, my sequence length(timestep) for my decoder is uncertain. What I wish is something work as follow.
outputs = tf.reshape(outputs, [batch_size, None, word_dimension])
Or, is there any other way for me to get the sequence length from the input data which has a shape [batch_size, sequence_length, embedding_dimension].
You can use -1 for the dimension in your reshape operation that you want to be calculated automatically.
For example, here:
x = tf.zeros((100 * 10 *12,))
reshaped = tf.reshape(x, [100, -1, 12])
reshaped will have shape (100, 10, 12)
Or, is there any other way for me to get the sequence length from the input data which has a shape [batch_size, sequence_length, embedding_dimension].
You can use the tf.shape operation to find the shape of a tensor at runtime so if you want sequence_length in a tensor with shape [batch_size, sequence_length, embedding_dimension], you need just call tf.shape(x)[1].
For my example above, calling:
tf.shape(reshaped)[1]
would give an int32 tensor with shape () and value 10
I am trying to implement a 1D convolution on a time series classification problem using keras. I am having some trouble interpreting the output size of the 1D convolutional layer.
I have my data composed of the time series of different features over a time interval of 128 units and I apply a 1D convolutional layer:
x = Input((n_timesteps, n_features))
cnn1_1 = Conv1D(filters = 100, kernel_size= 10, activation='relu')(x)
which after compilation I obtain the following shapes of the outputs:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_26 (InputLayer) (None, 128, 9) 0
_________________________________________________________________
conv1d_28 (Conv1D) (None, 119, 100) 9100
I was assuming that with 1D convolution, the data is only convoluted across the time axis (axis 1) and the size of my output would be:
119, 100*9. But I guess that the network is performing some king of operation across the feature dimension (axis 2) and I don't know which operation is performing.
I am saying this because what I interpret as 1d convolution is that the features shapes must be preserved because I am only convolving the time domain: If I have 9 features, then for each filter I have 9 convolutional kernels, each of these applied to a different features and convoluted across the time axis. This should return 9 convoluted features for each filter resulting in an output shape of 119, 9*100.
However the output shape is 119, 100.
Clearly something else is happening and I can't understand it or get it.
where am I failing my reasoning? How is the 1d convolution performed?
I add one more comment which is my comment on one of the answers provided:
I understand the reduction from 128 to 119, but what I don't understand is why the feature dimension changes. For example, if I use
Conv1D(filters = 1, kernel_size= 10, activation='relu')
, then the output dimension is going to be (None, 119, 1), giving rise to only one feature after the convolution. What is going on in this dimension, which operation is performed to go from from 9 --> 1?
Conv1D needs 3D tensor for its input with shape (batch_size,time_step,feature). Based on your code, the filter size is 100 which means filter converted from 9 dimensions to 100 dimensions. How does this happen? Dot Product.
In above, X_i is the concatenation of k words (k = kernel_size), l is number of filters (l=filters), d is the dimension of input word vector, and p_i is output vector for each window of k words.
What happens in your code?
[n_features * 9] dot [n_features * 9] => [1] => repeat l-times => [1 * 100]
do above for all sequences => [128 * 100]
Another thing that happens here is you did not specify the padding type. According to the docs, by default Conv1d use valid padding which caused your dimension to reduce from 128 to 119. If you need the dimension to be the same as the input you can choose the same option:
Conv1D(filters = 100, kernel_size= 10, activation='relu', padding='same')
It Sums over the last axis, which is the feature axis, you can easily check this by doing the following:
input_shape = (1, 128, 9)
# initialize kernel with ones, and use linear activations
y = tf.keras.layers.Conv1D(1,3, activation="linear", input_shape=input_shape[2:],kernel_initializer="ones")(x)
y :
if you sum x along the feature axis you will get:
x
Now you can easily see that the sum of the first 3 values of sum of x is the first value of convolution, I used a kernel size of 3 to make this verification easier