Should a 1D CNN need padding to retain input length? - python

Shouldn't a 1D CNN with stride = 1 and 1 filter have output length equal to input length without the need for padding?
I thought this was the case, but created a Keras model with these specifications that says the output shape is (17902,1) when the input shape is (17910,1). I'm wondering why the dimension has been reduced, since the stride is 1 and it's a 1D convolution.
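import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# X_train[0].shape == (17910, 1)
model = keras.Sequential([
    layers.Conv1D(filters=1, kernel_size=9, strides=1, activation=tf.nn.relu, input_shape=X_train[0].shape)
])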
I expect that the output shape of this model should be (17910,1), but clearly I'm missing a source of reduction in dimension in this conv. layer.

The length of your output vector depends on the length of the input and on your kernel size. Without padding, a kernel of size 9 fits at 17910 - 9 + 1 = 17902 positions along your input, so you get 17902 outputs and an output shape of (17902, 1).
For better understanding:
Without padding: [figure]
With padding: [figure]
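Here is a minimal sketch of the arithmetic. It uses NumPy's np.convolve as a stand-in for the Conv1D layer, which is an assumption for illustration only: np.convolve flips the kernel and Keras does not, but the output lengths are identical. With no padding the output length is n - k + 1. Padding the input by (k - 1) / 2 = 4 on each side keeps the length at n.

```python
import numpy as np

def conv1d_output_length(n, k, stride=1, padding=0):
    """Output length of a 1D convolution: floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1

n, k = 17910, 9
x = np.random.rand(n)   # stand-in for one input sequence
w = np.random.rand(k)   # stand-in for the single 9-tap filter

valid = np.convolve(x, w, mode="valid")  # no padding
same = np.convolve(x, w, mode="same")    # zero-padded so the length is preserved

print(conv1d_output_length(n, k))             # 17902
print(conv1d_output_length(n, k, padding=4))  # 17910
print(valid.shape, same.shape)                # (17902,) (17910,)
```

In Keras, the corresponding switch is the padding argument of layers.Conv1D. The default "valid" gives (17902, 1), and padding="same" should give (17910, 1) at stride 1.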
Whether you should use padding or not is more a question of accuracy. As Ian Goodfellow, Yoshua Bengio and Aaron Courville found in their Deep Learning book, the optimal amount of zero padding (at least for 2D images) usually lies somewhere between "valid" (no padding) and "same".
So my suggestion would be to train two CNNs that share the same architecture except for the padding, and keep the one with the better accuracy.
(Source: https://www.slideshare.net/xavigiro/recurrent-neural-networks-2-d2l3-deep-learning-for-speech-and-language-upc-2017)

Related

How to interpret this CNN architecture

How does this CNN architecture work from an input layer to the first convolution layer? hx98 are input matrix dimensions, is n the number of channels or the number of inputs?
It doesn't seem like n is the number of channels because 25 is the number of feature maps and their dimensions do not indicate they are two channels.
However, if n is the number of inputs and the matrices are single-channel, I haven't found a single CNN architecture anywhere that takes multiple input matrices and convolves them together. Most examples convolve them separately and then concatenate the results.
In my example, n is 2, one is matrix with BER values and another with connection line-rate values.
What mistake am I making? How does this CNN work?
In a CNN, the image pixels (height and width) are multiplied by the kernel weights of the convolution layer and summed to create a feature map. The kernel passes through all the channels of the image (3 channels for RGB, 1 channel for grayscale), moving according to the strides defined in the convolution layer. After the convolution, the size of the image is reduced. To get the same output dimension as the input dimension, you need to add padding, which consists of adding the right number of rows and columns on each side of the matrix. For details, please refer to this documentation.
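The minimal NumPy sketch below assumes that n is the channel dimension. The two matrices are stacked as 2 channels, each of the 25 filters spans both channels, and each filter produces exactly one feature map. The height h = 10 and the 3x3 kernel size are hypothetical values chosen only for the example.

```python
import numpy as np

def conv2d_multichannel(x, k):
    """Valid 2D cross-correlation (what CNN layers compute) of a multi-channel input.

    x: input of shape (C, H, W); k: one filter of shape (C, kh, kw).
    The filter spans all C channels, so the result is ONE feature map.
    """
    C, H, W = x.shape
    _, kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the window in every channel by the filter, then sum everything.
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
h, w = 10, 98                     # hypothetical h; 98 from the question
ber = rng.random((h, w))          # matrix of BER values
rate = rng.random((h, w))         # matrix of line-rate values
x = np.stack([ber, rate])         # n = 2 channels -> shape (2, 10, 98)

filters = rng.standard_normal((25, 2, 3, 3))  # 25 filters, each 2 x 3 x 3 (3x3 is assumed)
maps = np.stack([conv2d_multichannel(x, f) for f in filters])
print(maps.shape)                 # (25, 8, 96)
```

Because the sum runs over channels too, one multi-channel filter equals the sum of per-channel convolutions. This is how the two matrices get "convolved together" without a separate concatenation step.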

Stuck understanding ResNet's Identity block and Convolutional blocks

I'm learning Residual Networks (ResNet50) from Andrew Ng's Coursera lectures. I understand that one of the main reasons why ResNets work is that they can learn the identity function, and that's why adding more and more layers to the network does not hurt its performance.
Now, as described in the lectures, two types of blocks are used in ResNets: 1) the identity block and 2) the convolutional block.
The identity block is used when there is no change in input and output dimensions. The convolutional block is almost the same as the identity block, but there is a convolutional layer in the shortcut path that changes the dimension so that the input and output dimensions match.
Here is the identity block: [figure]
and here is the convolutional block: [figure]
Now, in the implementation of the convolutional block (2nd image), the first component (i.e. Conv2D --> BatchNorm --> ReLU) is implemented with a 1x1 convolution and stride > 1:
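from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D
from tensorflow.keras.initializers import glorot_uniform

# First component of main path (X, F1, s, conv_name_base and bn_name_base come from the enclosing block function)
X = Conv2D(F1, (1, 1), strides=(s, s), name=conv_name_base + '2a', padding='valid', kernel_initializer=glorot_uniform(seed=0))(X)
X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
X = Activation('relu')(X)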
I don't understand the reason behind keeping stride > 1 with window size 1. Isn't it just data loss? We are just considering alternate pixels in this case.
What could be the reason for such a hyperparameter choice? Any intuitive explanation will help! Thanks.
I don't understand the reason behind keeping stride > 1 with window size 1. Isn't it just data loss?
Please refer to the section on Deeper Bottleneck Architectures in the ResNet paper, as well as Figure 5.
https://arxiv.org/pdf/1512.03385.pdf
1 x 1 convolutions are typically used to increase or decrease the dimensionality along the filter dimension. So, in the bottleneck architecture the first 1 x 1 layer reduces the dimensions so that the 3 x 3 layer needs to handle smaller input/output dimensions. Then the final 1 x 1 layer increases the filter dimensions again.
It's done to save on computation/training time.
From the paper,
"Because of concerns on the training time that we can afford, we modify the building block as a bottleneck design".
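For a rough sense of the savings, here is a back-of-the-envelope weight count (biases ignored) for a 256-channel input. It compares a plain 3x3 convolution that keeps 256 channels with the 256 -> 64 -> 64 -> 256 bottleneck from Figure 5 of the paper. This is a sketch of the arithmetic only, not code from the paper.

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a k x k convolution from c_in to c_out channels (no bias)."""
    return c_in * c_out * k * k

plain = conv_weights(256, 256, 3)

bottleneck = (conv_weights(256, 64, 1)    # 1x1 reduces 256 -> 64 channels
              + conv_weights(64, 64, 3)   # 3x3 works on the reduced width
              + conv_weights(64, 256, 1)) # 1x1 restores 64 -> 256 channels

print(plain, bottleneck)  # 589824 69632
```

Since the number of multiply-adds per output pixel is proportional to these counts, the bottleneck is roughly 8x cheaper than the plain 3x3 layer at this width.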
I believe you might have answered your own question. The convolutional block is used whenever you need to change the dimension in order for the output and input dimensions to match. That being said, how do you change the dimension of a certain volume using convolutions? Well, you change the stride.
For any given convolution operation, assuming a square input, the dimension of the output volume can be obtained through the formula floor((n + 2p - f)/s) + 1, where n is the input dimension, p is your zero-padding, f the filter dimension and s is the stride. By increasing the stride you're effectively reducing the dimension of your shortcut's output volume, and thus, it can be used in such a way as to make sure that the dimensions of your shortcut and lower paths will match in order for the final sum to be performed.
Why is it >1 then? Well, if you didn't need a stride larger than one, you wouldn't be needing a dimension alteration in the first place and therefore would be using the identity block instead.
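The sketch below illustrates both points, assuming a hypothetical 56x56x256 input and stride s = 2. The formula shows that a 1x1 stride-2 shortcut and the main path both end up at 28x28, so the final sum is possible. The NumPy version of a 1x1 stride-2 convolution confirms the question's observation that it reads only every other pixel before mixing channels.

```python
import numpy as np

def conv_output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1 for a square input."""
    return (n + 2 * p - f) // s + 1

n, s = 56, 2
shortcut = conv_output_size(n, f=1, s=s)     # 1x1 conv, stride 2, 'valid'
main = conv_output_size(n, f=1, s=s)         # first 1x1 conv with stride s
main = conv_output_size(main, f=3, p=1)      # 3x3 conv with 'same' padding
main = conv_output_size(main, f=1)           # last 1x1 conv
print(shortcut, main)                        # 28 28

def conv1x1(x, weights, stride=1):
    """1x1 convolution on x of shape (H, W, C_in) with weights of shape (C_in, C_out)."""
    return x[::stride, ::stride, :] @ weights  # subsample pixels, then mix channels

rng = np.random.default_rng(0)
x = rng.random((n, n, 256))
weights = rng.random((256, 512))
y = conv1x1(x, weights, stride=s)
print(y.shape)                               # (28, 28, 512)
```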

converting an array of size (n,n,m) to (None,n,n,m)

I am trying to reshape an array of size (14,14,3) to (None, 14,14,3). I have seen that the output of each layer in a convolutional neural network has a shape in the format (None, n, n, m).
Consider that the name of my array is arr
I tried arr[None,:,:] but it converts it to a dimension of (1,14,14,3).
How should I do it?
https://www.tensorflow.org/api_docs/python/tf/TensorShape
A TensorShape represents a possibly-partial shape specification for a Tensor. It may be one of the following:
Partially-known shape: has a known number of dimensions, and an unknown size for one or more dimension. e.g. TensorShape([None, 256])
That is not possible in numpy. All dimensions of a ndarray are known.
arr[None,:,:] notation adds a new size 1 dimension, (1,14,14,3). Under broadcasting rules, such a dimension may be changed to match a dimension of another array. In that sense we often treat the None as a flexible dimension.
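A minimal NumPy illustration of that point: indexing with None adds a concrete size-1 axis, and broadcasting is what lets that axis behave "flexibly" against a larger batch. The batch of 5 images is a hypothetical example.

```python
import numpy as np

arr = np.zeros((14, 14, 3))      # one 14x14 image with 3 channels
one = arr[None, :, :, :]         # same as np.expand_dims(arr, 0)
print(one.shape)                 # (1, 14, 14, 3)

batch = np.ones((5, 14, 14, 3))  # a hypothetical batch of 5 images
print((batch + one).shape)       # (5, 14, 14, 3): the size-1 axis broadcasts
```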
I haven't worked with tensorflow though I see a lot of questions with both tags. tensorflow should have mechanisms for transferring values to and from tensors. It knows about numpy, but numpy does not 'know' anything about tensorflow.
A ndarray is an object with known values, and its shape is used to access those values in a multidimensional way. In contrast a tensor does not have values:
https://www.tensorflow.org/api_docs/python/tf/Tensor
It does not hold the values of that operation's output, but instead provides a means of computing those values
Looks like you can create a TensorProto from an array (and return an array from one as well):
https://www.tensorflow.org/api_docs/python/tf/make_tensor_proto
and to make a Tensor from an array:
https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor
The shape (None, 14, 14, 3) represents (batch_size, imgH, imgW, imgChannel); imgH and imgW can be used interchangeably depending on the network and the problem.
But the batch size is given as "None" in the neural network because we don't want to restrict our batch size to a specific value, since the batch size depends on a lot of factors, like the memory available for our model to run.
So let's say you have 4 images of size 14x14x3. You can append each image to an array, say L1, and now L1 will have the shape 4x14x14x3, i.e. you made a batch of 4 images, and now you can feed this to your neural network.
NOTE: here None will be replaced by 4, and if every batch you feed has 4 images, it will be 4 for the whole training process. Similarly, when you feed your network only one image, it assumes a batch size of 1 and sets None equal to 1, giving you the shape (1, 14, 14, 3).
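A short NumPy sketch of building those batches, using 4 hypothetical random images in place of real data. np.stack adds the batch axis for several images, and np.expand_dims does it for a single image.

```python
import numpy as np

rng = np.random.default_rng(0)
images = [rng.random((14, 14, 3)) for _ in range(4)]  # 4 hypothetical images

L1 = np.stack(images)          # stack along a new first (batch) axis
print(L1.shape)                # (4, 14, 14, 3)

single = np.expand_dims(images[0], axis=0)  # a "batch" of one image
print(single.shape)            # (1, 14, 14, 3)
```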

Kernel size change in convolutional neural networks

I have been working on creating a convolutional neural network from scratch, and am a little confused on how to treat kernel size for hidden convolutional layers. For example, say I have an MNIST image as input (28 x 28) and put it through the following layers.
Convolutional layer with kernel_size = (5,5) with 32 output channels
new dimension of throughput = (32, 28, 28)
Max Pooling layer with pool_size (2,2) and step (2,2)
new dimension of throughput = (32, 14, 14)
If I now want to create a second convolutional layer with kernel size = (5x5) and 64 output channels, how do I proceed? Does this mean that I only need two new filters (2 x 32 existing channels) or does the kernel size change to be (32 x 5 x 5) since there are already 32 input channels?
Since the initial input was a 2D image, I do not know how to conduct convolution for the hidden layer since the input is now 3 dimensional (32 x 14 x 14).
You need 64 kernels, each of size (32, 5, 5).
The depth (number of channels) of each kernel, 32 in this case (or 3 for an RGB image, 1 for grayscale, etc.), should always match the input depth. Each channel slice of a kernel has its own weights.
For example, say you have a 3x3 kernel slice like this: [-1 0 1; -2 0 2; -1 0 1], and you want to convolve it with an input of depth (channels) N. The kernel then needs N such 3x3 slices, one per channel. A hand-crafted filter could simply repeat the same slice N times, but in a trained CNN each slice is learned separately. The math that follows is just like the 1-channel case: at each position, you multiply the kernel values with the input values under the kernel window in all N channels, sum everything, and get the value of just 1 entry, or pixel. So what you get as output in the end is a matrix with 1 channel :) How much depth do you want the output for the next layer to have? That's the number of kernels you should apply. Hence in your case the weights have size (64 x 32 x 5 x 5), which is actually 64 kernels with 32 channels of 5x5 weights each.
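A vectorized NumPy sketch of that second layer, assuming no padding (with "same" padding the spatial size would stay 14x14) and random weights in place of learned ones. It requires NumPy 1.20+ for sliding_window_view. Each of the 64 kernels multiplies and sums over all 32 channels and the 5x5 window, producing one output channel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
x = rng.random((32, 14, 14))                   # pooled output: (channels, H, W)
weights = rng.standard_normal((64, 32, 5, 5))  # 64 kernels, each (32, 5, 5)

# All 5x5 windows of every channel: shape (32, 10, 10, 5, 5)
windows = sliding_window_view(x, (5, 5), axis=(1, 2))

# For each kernel k and position (h, w): multiply, then sum over channels c and window (i, j)
out = np.einsum("chwij,kcij->khw", windows, weights)
print(out.shape)                               # (64, 10, 10)
```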
You essentially answered your own question. YOU are building the network solver. It seems like your convolutional layer output is [channels out] = [channels in] * [number of kernels]; I had to infer this from the wording of your question. In general, this is how it works: you specify the kernel size of the layer and how many kernels to use. Since you have one input channel, you are essentially saying that there are 32 kernels in your first convolution layer. That is 32 unique 5x5 kernels. Each of these kernels will be applied to the one input channel. More generally, each of the layer kernels (32 in your example) is applied to each of the input channels. And that is the key. If you build code to implement the convolution layer according to these generalities, then your subsequent convolution layers are done. In the next layer you specify two kernels per channel. In your example there would be 32 input channels, the hidden layer has 2 kernels per channel, and the output would be 64 channels.
You could then downsample by applying a pooling layer, then flatten the 64 channels [turn a matrix into a vector by stacking the columns or rows], and pass it as a column vector into a fully connected network. That is the basic scheme of convolutional networks.
The work comes when you try to code up backpropagation through the convolutional layers. But the OP didn’t ask about that. I’ll just say this: you will come to a place where you have the stored input matrix (one channel), you have a gradient from a lower layer in the form of a matrix that is the size of the layer kernel, and you need to backpropagate it up to the next convolutional layer.
The simple approach is to rotate your stored channel matrix by 180 degrees and then convolve it with the gradient. The explanation for this is long and tedious, too much to write here, and not a lot on the internet explains it well.
A more sophisticated approach is to apply “correlation” between the input gradient and the stored channel matrix. Note I specifically said “correlation” as opposed to “convolution”, and that is key. If you think they are “almost” the same thing, then I recommend you take some time and learn about the differences.
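A small NumPy check of the correlation approach for the kernel gradient. It rests on three assumptions: a single channel, a valid cross-correlation in the forward pass (what most CNN libraries compute), and an incoming gradient dY with the shape of the layer output. Correlating the stored input X with dY gives a kernel-sized gradient dW, which is compared here against finite differences.

```python
import numpy as np

def corr2d_valid(x, k):
    """Valid 2D cross-correlation: out[i, j] = sum(x[i:i+kh, j:j+kw] * k)."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 6))   # stored input channel
W = rng.standard_normal((3, 3))   # layer kernel; forward pass is Y = corr2d_valid(X, W), shape (4, 4)
dY = rng.standard_normal((4, 4))  # gradient arriving from the next layer

# With loss L = sum(Y * dY), dL/dY = dY, and the kernel gradient is a correlation of X with dY.
dW = corr2d_valid(X, dY)          # shape (3, 3), same as the kernel

# Finite-difference check of dL/dW
eps = 1e-6
dW_num = np.zeros_like(W)
for a in range(3):
    for b in range(3):
        Wp, Wm = W.copy(), W.copy()
        Wp[a, b] += eps
        Wm[a, b] -= eps
        loss_p = np.sum(corr2d_valid(X, Wp) * dY)
        loss_m = np.sum(corr2d_valid(X, Wm) * dY)
        dW_num[a, b] = (loss_p - loss_m) / (2 * eps)

print(np.allclose(dW, dW_num, atol=1e-5))  # True
```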
If you would like to have a look at my CNN solver here's a link to the project. It's C++ and no documentation, sorry :) It's all in a header file called layer.h, find the class FilterLayer2D. I think the code is pretty readable (what programmer doesn't think his code is readable :) )
https://github.com/sraber/simplenet.git
I also wrote a paper on basic fully connected networks. I wrote it so that I would not forget what I learned in my self-study. Maybe you'll get something out of it. It's at this link:
http://www.raberfamily.com/scottblog/scottblog.htm

When building a CNN, I am getting complaints from Keras that do not make sense to me.

My input shape is supposed to be 100x100. It represents a sentence. Each word is a vector of 100 dimensions and there are 100 words at maximum in a sentence.
I feed eight sentences to the CNN. I am not sure whether this means my input shape should be 100x100x8 instead.
Then the following lines
Convolution2D(10, 3, 3, border_mode='same',
input_shape=(100, 100))
complains:
Input 0 is incompatible with layer convolution2d_1: expected ndim=4, found ndim=3
This does not make sense to me as my input dimension is 2. I can get through it by changing input_shape to (100,100,8). But the "expected ndim=4" bit just does not make sense to me.
I also cannot see why a convolution layer of 3x3 with 10 filters does not accept an input of 100x100.
Even when I get through the complaint about "expected ndim=4", I run into a problem in my activation layer. There it complains:
Cannot apply softmax to a tensor that is not 2D or 3D. Here, ndim=4
Can anyone explain what is going on here and how to fix it? Many thanks.
I had the same problem and I solved it by adding a channel dimension to the input_shape argument.
I suggest following solution:
Convolution2D(10, 3, 3, border_mode='same', input_shape=(100, 100, 1))
The missing dimension for 2D convolutional layers is the "channel" dimension.
For image data, the channel dimension is 1 for grayscale images and 3 for color images.
In your case, to make sure that Keras won't complain, you could use 2D convolution with 1 channel, or 1D convolution with 100 channels.
Ref: http://keras.io/layers/convolutional/#convolution2d
Keras's 2D convolutional layer takes in 4-dimensional arrays, so you need to structure your input to fit that. The dimensions are (batch_size, x_dim, y_dim, channels). This makes a lot of sense for images, which is where CNNs are mostly used, but for your case it gets a bit trickier.
The batch_size dimension is not part of input_shape, so you stack the 8 sentences along the first dimension to get (8, 100, 100). The channels can be kept at 1, but you need to write it in a way that Keras will accept, so expanding the data to (8, 100, 100, 1) gives you the input array you need.
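A minimal NumPy sketch of the two layouts discussed above, with zeros standing in for the real word vectors. A 2D convolution needs an explicit channel axis. For a 1D convolution, the (8, 100, 100) array already reads as (batch, steps = words, channels = vector dimensions).

```python
import numpy as np

sentences = np.zeros((8, 100, 100))        # 8 sentences x 100 words x 100-dim word vectors

conv2d_input = sentences[..., np.newaxis]  # add a channel axis
print(conv2d_input.shape)                  # (8, 100, 100, 1); per-sample shape (100, 100, 1)

# 1D convolution: each sample is (steps, channels) = (words, vector dims)
print(sentences.shape[1:])                 # (100, 100)
```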
