Shape of 1D convolution output on 2D data using Keras - Python

I am trying to implement a 1D convolution on a time series classification problem using keras. I am having some trouble interpreting the output size of the 1D convolutional layer.
My data is composed of the time series of 9 different features over a time interval of 128 units, and I apply a 1D convolutional layer:
from keras.layers import Input, Conv1D

x = Input((n_timesteps, n_features))  # n_timesteps = 128, n_features = 9
cnn1_1 = Conv1D(filters = 100, kernel_size= 10, activation='relu')(x)
After compiling, I obtain the following output shapes:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_26 (InputLayer) (None, 128, 9) 0
_________________________________________________________________
conv1d_28 (Conv1D) (None, 119, 100) 9100
I was assuming that with 1D convolution, the data is only convoluted across the time axis (axis 1) and the size of my output would be:
(119, 100*9). But I guess the network is performing some kind of operation across the feature dimension (axis 2), and I don't know which operation it is performing.
I am saying this because what I understand by 1D convolution is that the feature shapes must be preserved, since I am only convolving along the time domain: if I have 9 features, then for each filter I have 9 convolutional kernels, each applied to a different feature and convolved across the time axis. This should return 9 convolved features for each filter, resulting in an output shape of (119, 9*100).
However, the output shape is (119, 100).
Clearly something else is happening that I can't understand.
Where is my reasoning failing? How is the 1D convolution actually performed?
Let me add one more remark, which is my comment on one of the answers provided:
I understand the reduction from 128 to 119, but what I don't understand is why the feature dimension changes. For example, if I use
Conv1D(filters = 1, kernel_size= 10, activation='relu')
then the output dimension is going to be (None, 119, 1), giving rise to only one feature after the convolution. What is going on in this dimension? Which operation is performed to go from 9 --> 1?

Conv1D needs a 3D tensor for its input, with shape (batch_size, time_steps, features). Based on your code, the number of filters is 100, which means each window is converted from 9 feature dimensions to 100 output dimensions. How does this happen? Dot product.
For each window the layer computes p_i = f(W · X_i + b), where X_i is the concatenation of k timesteps (k = kernel_size), l is the number of filters (l = filters), d is the dimension of the input feature vector, W is an l x (k·d) weight matrix, and p_i is the output vector (of length l) for each window of k timesteps.
What happens in your code?
[10 * 9 window] dot [10 * 9 kernel] => [1] => repeat for l = 100 filters => [1 * 100]
do the above for every window along the sequence => [128 * 100]
Another thing that happens here is that you did not specify the padding type. According to the docs, by default Conv1D uses 'valid' padding, which causes your time dimension to shrink from 128 to 119 (128 - 10 + 1 = 119). If you need the output length to be the same as the input, you can choose the 'same' option:
Conv1D(filters = 100, kernel_size= 10, activation='relu', padding='same')
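
To make the arithmetic concrete, here is a quick check (a minimal sketch; the input values are just random). The parameter count 9100 = (10 * 9 + 1) * 100 confirms that each of the 100 filters owns a full 10 x 9 kernel plus a bias, so the 9 feature channels are weighted and summed into a single value per filter rather than kept separate:
import tensorflow as tf
x = tf.random.normal((1, 128, 9))  # (batch, time_steps, features)
conv = tf.keras.layers.Conv1D(filters=100, kernel_size=10, activation='relu')
y = conv(x)
print(y.shape)              # (1, 119, 100)
print(conv.kernel.shape)    # (10, 9, 100): one 10x9 kernel per filter
print(conv.count_params())  # 9100 = (10*9 + 1) * 100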

It sums over the last axis, which is the feature axis. You can easily check this by doing the following (made runnable here; the input is just random values):
import tensorflow as tf
input_shape = (1, 128, 9)
x = tf.random.normal(input_shape)
# initialize the kernel with ones, and use a linear activation
y = tf.keras.layers.Conv1D(1, 3, activation="linear", kernel_initializer="ones")(x)
If you sum x along the feature axis, s = tf.reduce_sum(x, axis=-1), you will get a tensor of shape (1, 128).
Now you can easily see that the sum of the first 3 values of s (s[0, 0] + s[0, 1] + s[0, 2]) is the first value of the convolution (y[0, 0, 0]); I used a kernel size of 3 to make this verification easier.

Understanding output shape for 1D convolution

I'm trying to get my head around 1D convolution - specifically, how the padding comes into it.
Suppose I have an input sequence of shape (batch,128,1) and run it through the following Keras layer:
tf.keras.layers.Conv1D(32, 5, strides=2, padding="same")
I get an output of shape (batch,64,32), but I don't understand why the sequence length has reduced from 128 to 64... I thought the padding="same" parameter kept the output length the same as the input? I suppose that's only true if strides=1; so in this case I'm confused about what padding="same" actually means.
According to the TensorFlow documentation, in your case we have:
filters (number of filters, i.e. the output channel dimension) = 32
kernelSize (the filter size) = 5
strides (how many units the window moves along the input after each convolution step) = 2
So applying an input of shape (batch, 128, 1) means applying 32 kernels (each of size 5) and jumping two units after each step, which gives 128 / 2 = 64 values per filter; the final output has shape (batch, 64, 32).
padding="same" only determines how the convolution behaves at the borders; with strides > 1 the output length becomes ceil(input_length / strides) rather than staying equal to the input length. For more details, see the TensorFlow docs.

Keras CNN for non-image matrix

I have recently started learning about Deep Learning and Reinforcement Learning, and I am trying to figure out how to code a Convolutional Neural Network using Keras for a matrix of 0s and 1s with 10 rows and 3 columns.
The input matrix would look like this for example
[
[1, 0, 0],
[0, 1, 0],
[0, 0, 0],
...
]
The output should be another matrix of 0s and 1s, different from the aforementioned input matrix and with a different number of rows and columns.
The location of 0s and 1s in the output matrix is dependent on the location of the 0s and 1s in the input matrix.
There is also a second output, an array where the values are dependent on the location of the 1 in the input matrix.
I have searched the internet for code examples but couldn't find anything useful.
Edit:
The input to the neural network is a 2D array with 10 rows and each row has 3 columns.
The output (for now at least) is a 2D array with 12 rows and each row has 10 columns (the same as the number of rows in the input 2D array).
This is what I came up with so far and I have no idea if it's correct or not.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

nbre_users = 10 # number of rows in the input 2D matrix
nbre_subchannels = 12 # number of rows in the output 2D matrix
model = Sequential()
model.add(Dense(50, input_shape=(nbre_users, 3), kernel_initializer="he_normal" ,activation="relu"))
model.add(Dense(20, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(5, kernel_initializer="he_normal", activation="relu"))
model.add(Flatten())
model.add(Dense(nbre_subchannels))
model.add(Dense(nbre_users, activation = 'softmax'))
model.compile(optimizer=Adam(learning_rate=1e-4), loss='mean_squared_error')
Here is the model summary (image not reproduced here).
After clarifications, here is my answer.
The problem you are trying to solve seems to be a neural network that transforms a 2D grayscale image of size (10,3,1) to a 2D grayscale image of size (12,10,1).
A 2D grayscale image is nothing but a 2D matrix with an extra axis set to 1.
a = np.array([[0,1,0],
[1,0,1],
[0,1,0]])
a.shape
#OUTPUT = (3,3)
a.reshape((3,3,1)) #reshape to 3,3,1
#OUTPUT -
#array([[[0],
# [1],
# [0]],
#
# [[1],
# [0],
# [1]],
#
# [[0],
# [1],
# [0]]])
So a 2D matrix of (10,3) can be called a 3D image with a single channel (10,3,1). This will allow you to properly apply convolutions to your input.
If this part is clear, then in the forward computation of the network, since you want to ensure that spatial positions of the 1s and 0s are captured, you want to use convolution layers. Using Dense layers here is not the right step.
However, a series of convolution operations helps to downsample an image. Since you need a 2D matrix (grayscale image) as output, you want to upsample as well. Such a network is called a deconv network.
The first series of layers convolves over the input, 'flattening' it into a vector of channels. The next set of layers uses 2D transposed convolutions (Conv2DTranspose) to turn the channels back into a 2D matrix (a grayscale image).
Here is a sample code that shows you how you can take a (10,3,1) image to a (12,10,1) image using a deconv net.
from tensorflow.keras import layers, Model
inp = layers.Input((10,3,1)) ##
x = layers.Conv2D(2, (2,2))(inp) ## Convolution part
x = layers.Conv2D(4, (2,2))(x) ##
x = layers.Conv2DTranspose(4, (3,4))(x) ##
x = layers.Conv2DTranspose(2, (2,4))(x) ## Deconvolution part
out = layers.Conv2DTranspose(1, (2,4))(x) ##
model = Model(inp, out)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_33 (InputLayer) [(None, 10, 3, 1)] 0
_________________________________________________________________
conv2d_49 (Conv2D) (None, 9, 2, 2) 10
_________________________________________________________________
conv2d_50 (Conv2D) (None, 8, 1, 4) 36
_________________________________________________________________
conv2d_transpose_46 (Conv2DT (None, 10, 4, 4) 196
_________________________________________________________________
conv2d_transpose_47 (Conv2DT (None, 11, 7, 2) 66
_________________________________________________________________
conv2d_transpose_48 (Conv2DT (None, 12, 10, 1) 17
=================================================================
Total params: 325
Trainable params: 325
Non-trainable params: 0
_________________________________________________________________
Obviously, feel free to add activations, dropouts, pooling layers, etc. The above code just shows how you can use downsampling and upsampling to get from a given single-channel image to another single-channel image.
On a side note, I would really advise spending some time understanding how CNNs work. Deconv nets are complex, and tackling a problem that involves them before properly understanding how 2D CNNs work may leave foundational gaps, especially if you are just starting to learn this domain.
You can use 1D convolutional layers if you only want to convolve along a single spatial dimension, which, from what I understood, is what you want.
e.g.
import tensorflow as tf
# assuming a single batch of a 3x10 matrix: (batch, steps, channels)
input_shape = (1, 3, 10)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv1D(32, 3, activation='relu')(x)

Python, Keras - ValueError: Cannot feed value of shape (10, 70, 1025) for Tensor u'dense_2_target:0', which has shape '(?, ?)'

I am trying to train an RNN in batches.
The input size is (10, 70, 3075), where 10 is the batch size, 70 is the time dimension, and 3075 is the frequency dimension.
There are three outputs, each of size (10, 70, 1025): basically 10 spectrograms of size (70, 1025).
I would like to train this RNN by regression; its structure is:
from tensorflow.keras.layers import Input, Bidirectional, LSTM, Dropout, Dense

input_img = Input(shape=(70, 3075))
x = Bidirectional(LSTM(n_hid,return_sequences=True, dropout=0.5, recurrent_dropout=0.2))(input_img)
x = Dropout(0.2)(x)
x = Bidirectional(LSTM(n_hid, dropout=0.5, recurrent_dropout=0.2))(x)
x = Dropout(0.2)(x)
o0 = ( Dense(1025, activation='sigmoid'))(x)
o1 = ( Dense(1025, activation='sigmoid'))(x)
o2 = ( Dense(1025, activation='sigmoid'))(x)
The problem is that the output Dense layers cannot handle three dimensions; they want something like (None, 1025), which I don't know how to provide unless I concatenate along the time dimension.
The following error occurs:
ValueError: Cannot feed value of shape (10, 70, 1025) for Tensor u'dense_2_target:0', which has shape '(?, ?)'
Would be the batch_shape option useful in the input layer? I have actually tried it, but I've got the same error.
In this instance the second RNN is collapsing the sequence to a single vector because by default return_sequences=False. To make the model return sequences and run the Dense layer over each timestep separately just add return_sequences=True to the second RNN as well:
x = Bidirectional(LSTM(n_hid, return_sequences=True, dropout=0.5, recurrent_dropout=0.2))(x)
The Dense layers automatically apply to the last dimension so no need to reshape afterwards.
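Putting that fix together, a minimal sketch of the corrected model (n_hid is assumed here, e.g. 64, and the compile settings are placeholders):
from tensorflow.keras.layers import Input, Bidirectional, LSTM, Dropout, Dense
from tensorflow.keras.models import Model

n_hid = 64  # assumed hidden size

input_img = Input(shape=(70, 3075))
x = Bidirectional(LSTM(n_hid, return_sequences=True, dropout=0.5, recurrent_dropout=0.2))(input_img)
x = Dropout(0.2)(x)
x = Bidirectional(LSTM(n_hid, return_sequences=True, dropout=0.5, recurrent_dropout=0.2))(x)
x = Dropout(0.2)(x)
o0 = Dense(1025, activation='sigmoid')(x)  # each output: (None, 70, 1025)
o1 = Dense(1025, activation='sigmoid')(x)
o2 = Dense(1025, activation='sigmoid')(x)
model = Model(input_img, [o0, o1, o2])
model.compile(optimizer='adam', loss='mse')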
Alternatively, to get the right output shape while keeping return_sequences=False on the second LSTM, you can use the Reshape layer:
o0 = Dense(70 * 1025, activation='sigmoid')(x)
o0 = Reshape((70, 1025))(o0)
This will output (batch_dim, 70, 1025). You can do exactly the same for the other two outputs.

Keras SimpleRNN confusion

...coming from TensorFlow, where pretty much any shape and everything is defined explicitly, I am confused about Keras' API for recurrent models. Getting an Elman network to work in TF was pretty easy, but Keras resists to accept the correct shapes...
For example:
import keras as k

x = k.layers.Input(shape=(2,))
y = k.layers.Dense(10)(x)
m = k.models.Model(x, y)
...works perfectly and according to model.summary() I get an input layer with shape (None, 2), followed by a dense layer with output shape (None, 10). Makes sense since Keras automatically adds the first dimension for batch processing.
However, the following code:
x = k.layers.Input(shape=(2,))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
raises an exception ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2.
It works only if I add another dimension:
x = k.layers.Input(shape=(2,1))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
...but now, of course, my input would not be (None, 2) anymore.
model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 2, 1) 0
_________________________________________________________________
simple_rnn_1 (SimpleRNN) (None, 10) 120
=================================================================
How can I have an input of type batch_size x 2 when I just want to feed vectors with 2 values to the network?
Furthermore, how would I chain RNN cells?
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...raises the same exception with incompatible dim sizes.
This sample here works:
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10, return_sequences=True)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...but then layer h does not output (None, 10) anymore, but (None, 2, 10) since it returns the whole sequence instead of just the "regular" RNN cell output.
Why is this needed at all?
Moreover: where are the states? Do they just default to 1 recurrent state?
The documentation touches on the expected shapes of recurrent components in Keras, let's look at your case:
Any RNN layer in Keras expects a 3D shape (batch_size, timesteps, features). This means you have timeseries data.
The RNN layer then iterates over the second, time dimension of the input using a recurrent cell, the actual recurrent computation.
If you specify return_sequences then you collect the output for every timestep getting another 3D tensor (batch_size, timesteps, units) otherwise you only get the last output which is (batch_size, units).
Now returning to your questions:
You mention vectors, but shape=(2,) is a single vector, so this doesn't work. shape=(2, 1) works because now you have 2 vectors of size 1 (these shapes exclude batch_size). So to feed vectors of size 2, you need shape=(how_many_vectors, 2), where the first dimension is the number of vectors you want your RNN to process: the timesteps, in this case.
To chain RNN layers you need to feed 3D data, because that is what RNNs expect. When you specify return_sequences, the RNN layer returns the output at every timestep, so it can be chained to another RNN layer.
States are the collection of vectors that an RNN cell uses between timesteps: LSTM uses 2, while GRU (like SimpleRNN) has 1 hidden state, which is also the output. They default to zeros, but can be specified when calling the layer using initial_state=[...] as a list of tensors (see the sketch below).
There is already a post about the difference between RNN layers and RNN cells in Keras which might help clarify the situation further.
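For example, a minimal sketch of passing an explicit initial state (the shapes here are illustrative):
from tensorflow import keras as k

x = k.layers.Input(shape=(5, 2))   # 5 timesteps, vectors of size 2
s = k.layers.Input(shape=(10,))    # initial hidden state, one value per unit
h = k.layers.SimpleRNN(10)(x, initial_state=[s])
m = k.models.Model([x, s], h)
m.summary()                        # final output shape: (None, 10)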

Keras dimensionality in convolutional layer mismatch

I'm trying to play around with Keras to build my first neural network. I have zero experience and can't seem to figure out why my dimensionality isn't right. I can't figure out from their docs what this error is complaining about, or even which layer is causing it.
My model takes in a 32-byte array of numbers and is supposed to give a boolean value on the other side. I want a 1D convolution over the input byte array.
arr1 is the 32-byte array, arr2 is an array of booleans.
import numpy as np
import keras as k

inputData = np.array(arr1)
inputData = np.expand_dims(inputData, axis = 2)
labelData = np.array(arr2)
print inputData.shape
print labelData.shape
model = k.models.Sequential()
model.add(k.layers.convolutional.Convolution1D(32,2, input_shape = (32, 1)))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.core.Dense(32))
model.add(k.layers.Activation('sigmoid'))
model.compile(loss = 'binary_crossentropy',
optimizer = 'rmsprop',
metrics=['accuracy'])
model.fit(
inputData,labelData
)
The printed shapes are:
(1000, 32, 1) and (1000,)
The error I receive is:
Traceback (most recent call last):
  File "cnn/__init__.py", line 50, in <module>
    inputData, labelData
  File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/models.py", line 863, in fit
    initial_epoch=initial_epoch)
  File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/engine/training.py", line 1358, in fit
    batch_size=batch_size)
  File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/engine/training.py", line 1238, in _standardize_user_data
    exception_prefix='target')
  File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/engine/training.py", line 128, in _standardize_input_data
    str(array.shape))
ValueError: Error when checking target: expected activation_5 to have 3 dimensions, but got array with shape (1000, 1)
Well, it seems to me that you need to google a bit more about convolutional networks :-)
You are applying at each step 32 filters of length 2 over your sequence. So if we follow the dimensions of the tensors after each layer:
Dimensions: (None, 32, 1)
model.add(k.layers.convolutional.Convolution1D(32,2, input_shape = (32, 1)))
model.add(k.layers.Activation('relu'))
Dimensions: (None, 31, 32)
(your filter of length 2 goes over the whole sequence, so the sequence is now of length 31)
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
Dimensions: (None, 30, 32)
(you lose one value again because of your filters of length 2, but you still have 32 of them)
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
Dimensions: (None, 29, 32)
(same...)
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
Dimensions: (None, 28, 32)
Now you want to use a Dense layer on top of that... the thing is that the Dense layer will work as follows on your 3D input:
model.add(k.layers.core.Dense(32))
model.add(k.layers.Activation('sigmoid'))
Dimensions: (None, 28, 32)
This is your output. First thing that I find weird is that you want 32 outputs out of your dense layer... You should have put 1 instead of 32. But even this will not fix your problem. See what happens if we change the last layer :
model.add(k.layers.core.Dense(1))
model.add(k.layers.Activation('sigmoid'))
Dimensions: (None, 28, 1)
This happens because you apply a dense layer to a '2D' tensor (per sample). When you apply a Dense(1) layer to an input of shape (28, 32), it produces a weight matrix of shape (32, 1) and applies it to each of the 28 vectors, so you end up with 28 outputs of size 1.
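You can verify this behaviour directly (a small sketch with random values):
import tensorflow as tf
x = tf.random.normal((1, 28, 32))
y = tf.keras.layers.Dense(1)(x)
print(y.shape)  # (1, 28, 1): the same (32, 1) weight matrix is applied to each of the 28 vectors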
What I propose to fix this is to change the last 2 layers like this :
model = k.models.Sequential()
model.add(k.layers.convolutional.Convolution1D(32,2, input_shape = (32, 1)))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
# Only use one filter so that the output will be a sequence of 28 values, not a matrix.
model.add(k.layers.convolutional.Convolution1D(1,2))
model.add(k.layers.Activation('relu'))
# Change the shape from (None, 28, 1) to (None, 28)
model.add(k.layers.core.Flatten())
# Only one neuron as output to get the binary target.
model.add(k.layers.core.Dense(1))
model.add(k.layers.Activation('sigmoid'))
Now the last layers will take your tensor from
(None, 29, 32) -> (None, 28, 1) -> (None, 28) -> (None, 1)
I hope this helps you.
P.S. If you were wondering what None is, it's the batch dimension: you don't feed all 1000 samples at once, you feed them batch by batch, and since the value depends on what is chosen, by convention we put None.
EDIT :
Explaining a bit more why the sequence length loses one value at each step.
Say you have a sequence of 4 values [x1 x2 x3 x4], and you want to use your filter of length 2, [f1 f2], to convolve over the sequence. The first value will be given by y1 = [f1 f2] * [x1 x2], the second by y2 = [f1 f2] * [x2 x3], and the third by y3 = [f1 f2] * [x3 x4]. Then you reach the end of your sequence and cannot go further. As a result you have a sequence [y1 y2 y3].
This is due to the filter length and the effects at the borders of your sequence. There are multiple options; some pad the sequence with 0's in order to get an output of exactly the same length... You can choose that option with the 'padding' parameter. You can read more about this, and the possible values of the padding argument, in the Keras Conv1D documentation; I encourage you to read it, as it gives information about input and output shapes.
From the doc :
padding: One of "valid" or "same" (case-insensitive). "valid" means "no padding". "same" results in padding the input such that the output has the same length as the original input.
the default is 'valid', so you don't pad in your example.
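To see the 'valid' border effect numerically (a minimal sketch; note that what deep learning libraries call convolution is really cross-correlation, so there is no kernel flip here):
import numpy as np
x = np.array([1., 2., 3., 4.])  # the sequence [x1 x2 x3 x4]
f = np.array([10., 1.])         # the filter [f1 f2]
y = np.array([x[i:i+2] @ f for i in range(len(x) - 1)])
print(y)  # [12. 23. 34.]: length 3, one shorter than the input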
I also recommend upgrading your Keras to the latest version; Convolution1D is now called Conv1D, so you might otherwise find the docs and tutorials confusing.
