I have recently started learning about Deep Learning and Reinforcement Learning, and I am trying to figure out how to code a Convolutional Neural Network using Keras for a matrix of 0s and 1s with 10 rows and 3 columns.
The input matrix would look like this, for example:
[
[1, 0, 0],
[0, 1, 0],
[0, 0, 0],
...
]
The output should be another matrix of 0s and 1s, different from the aforementioned input matrix and with a different number of rows and columns.
The location of 0s and 1s in the output matrix is dependent on the location of the 0s and 1s in the input matrix.
There is also a second output, an array where the values are dependent on the location of the 1 in the input matrix.
I have searched the internet for code examples but couldn't find anything useful.
Edit:
The input to the neural network is a 2D array with 10 rows and each row has 3 columns.
The output (for now at least) is a 2D array with 12 rows and each row has 10 columns (the same as the number of rows in the input 2D array).
This is what I came up with so far and I have no idea if it's correct or not.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

nbre_users = 10        # number of rows in the input 2D matrix
nbre_subchannels = 12  # number of rows in the output 2D matrix

model = Sequential()
model.add(Dense(50, input_shape=(nbre_users, 3), kernel_initializer="he_normal", activation="relu"))
model.add(Dense(20, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(5, kernel_initializer="he_normal", activation="relu"))
model.add(Flatten())
model.add(Dense(nbre_subchannels))
model.add(Dense(nbre_users, activation='softmax'))
model.compile(optimizer=Adam(learning_rate=1e-4), loss='mean_squared_error')
Here is the model summary (screenshot omitted).
After clarifications, here is my answer.
The problem you are trying to solve seems to be a neural network that transforms a 2D grayscale image of size (10,3,1) to a 2D grayscale image of size (12,10,1).
A 2D grayscale image is nothing but a 2D matrix with an extra axis set to 1.
import numpy as np

a = np.array([[0,1,0],
[1,0,1],
[0,1,0]])
a.shape
#OUTPUT = (3,3)
a.reshape((3,3,1)) #reshape to 3,3,1
#OUTPUT -
#array([[[0],
# [1],
# [0]],
#
# [[1],
# [0],
# [1]],
#
# [[0],
# [1],
# [0]]])
So a 2D matrix of shape (10,3) can be treated as a single-channel image of shape (10,3,1). This will allow you to properly apply convolutions to your input.
If this part is clear, then in the forward computation of the network you want to use convolution layers, since you want the spatial positions of the 1s and 0s to be captured. Using Dense layers here is not the right choice.
However, a series of convolution operations downsamples an image. Since you need a 2D matrix (grayscale image) as output, you want to upsample as well. Such a network is called a deconv network.
The first series of layers convolves over the input, 'flattening' it into a vector of channels. The next set of layers uses 2D transposed convolution (Conv2DTranspose) operations to expand the channels back into a 2D matrix (grayscale image).
Here is sample code that shows how you can go from a (10,3,1) image to a (12,10,1) image using a deconv net.
from tensorflow.keras import layers, Model
inp = layers.Input((10,3,1)) ##
x = layers.Conv2D(2, (2,2))(inp) ## Convolution part
x = layers.Conv2D(4, (2,2))(x) ##
x = layers.Conv2DTranspose(4, (3,4))(x) ##
x = layers.Conv2DTranspose(2, (2,4))(x) ## Deconvolution part
out = layers.Conv2DTranspose(1, (2,4))(x) ##
model = Model(inp, out)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_33 (InputLayer) [(None, 10, 3, 1)] 0
_________________________________________________________________
conv2d_49 (Conv2D) (None, 9, 2, 2) 10
_________________________________________________________________
conv2d_50 (Conv2D) (None, 8, 1, 4) 36
_________________________________________________________________
conv2d_transpose_46 (Conv2DT (None, 10, 4, 4) 196
_________________________________________________________________
conv2d_transpose_47 (Conv2DT (None, 11, 7, 2) 66
_________________________________________________________________
conv2d_transpose_48 (Conv2DT (None, 12, 10, 1) 17
=================================================================
Total params: 325
Trainable params: 325
Non-trainable params: 0
_________________________________________________________________
Obviously, feel free to add activations, dropout, pooling layers, and so on. The above code just shows how you can use downsampling and upsampling to get from a given single-channel image to another single-channel image.
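For instance, here is a minimal sketch of the same deconv stack with ReLU activations, a dropout layer, and a sigmoid on the output (these particular choices are just an example; a sigmoid suits a target matrix of 0s and 1s):

from tensorflow.keras import layers, Model

inp = layers.Input((10, 3, 1))
x = layers.Conv2D(2, (2, 2), activation="relu")(inp)               # downsampling half
x = layers.Conv2D(4, (2, 2), activation="relu")(x)
x = layers.Dropout(0.2)(x)                                         # regularization
x = layers.Conv2DTranspose(4, (3, 4), activation="relu")(x)        # upsampling half
x = layers.Conv2DTranspose(2, (2, 4), activation="relu")(x)
out = layers.Conv2DTranspose(1, (2, 4), activation="sigmoid")(x)   # (12, 10, 1), values in [0, 1]

model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")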
On a side note - I would really advise that you spend some time understanding how CNNs work. Deconv nets are complex, and tackling a problem that involves them before properly understanding how 2D CNNs work may cause some foundational problems, especially if you are just starting to learn this domain.
You can use 1D convolutional layers if you want to convolve along a single spatial dimension, which, from what I understood, is what you want.
e.g.
import tensorflow as tf

# assuming a 3x10 matrix with a single batch dimension
input_shape = (1, 3, 10)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv1D(32, 3, activation='relu', input_shape=input_shape[1:])(x)
Related
I'm trying to get my head around 1D convolution - specifically, how the padding comes into it.
Suppose I have an input sequence of shape (batch,128,1) and run it through the following Keras layer:
tf.keras.layers.Conv1D(32, 5, strides=2, padding="same")
I get an output of shape (batch,64,32), but I don't understand why the sequence length has reduced from 128 to 64... I thought the padding="same" parameter kept the output length the same as the input? I suppose that's only true if strides=1; so in this case I'm confused about what padding="same" actually means.
According to the TensorFlow documentation, in your case we have:
filters (number of filters, i.e. the number of output channels) = 32
kernel_size (the filter size) = 5
strides (the step size by which the convolution window moves along the input) = 2
So applying an input of shape (batch, 128, 1) means applying 32 kernels (of size 5) and jumping two units after each convolution - so we get 128 / 2 = 64 values per filter, and the output ends up with shape (batch, 64, 32).
padding="same" only determines how the convolution behaves at the borders (the input is zero-padded so that every input position is covered); with strides > 1 the output length is ceil(input_length / strides), not the input length. For more details you can check here.
I'm trying to convert a trained sign language classification solution in Python to C language headers so that I can deploy it on an M4 Cortex CPU board.
In Python, I'm able to build the model and train it, and I can see it predicting with 90% accuracy.
But I see an issue with the number of weights used/generated in the convolution layers.
**Conv_1d configuration**
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D

print(x_train.shape)  # x_train: the training images
model = Sequential()
model.add(Conv1D(32, kernel_size=5, padding='same',
                 input_shape=x_train.shape[1:], name='conv1d_1'))
print(model.layers[0].kernel.numpy().shape)
**output:**
(1742, 45, 45)
**(5, 45, 32)**
According to the above configuration:
input dimension = 45x45x1 pixels (grayscale image)
input channels = 1
output dimension = 45x45x32
output channels = 32
kernel size = 5
As per the concept (w.r.t. https://cs231n.github.io/convolutional-networks/):
number of weights = (input_channels) x (kernel_size) x (kernel_size) x (output_channels)=1x5x5x32=800
But keras model produces weights array of size = [5][45][32]=7200
I'm not sure if my interpretation of the weight array in the Keras model is correct; I would be glad if someone could help me with this.
Some bullets that should clarify your doubts.
Your formula for the number of weights can't be right because you're using a Conv1D, so the kernel has only one spatial dimension.
Defining the input shape as x_train.shape[1:] = (45, 45) means a sequence of 45 steps with 45 input channels each (again because it's a Conv1D).
That said, the number of weights is:
# of weights = input_channels x kernel_size x output_filters = 45 x 5 x 32 = 7200 (without biases)
Considering that you have images, you're probably looking for Conv2D. In this case, the input shape should be (45, 45, 1), the kernel has two dimensions, and the number of parameters is exactly 800 (without biases):
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=5, padding='same',
                                 input_shape=(45, 45, 1), use_bias=False))
model.summary()
# Layer (type)                 Output Shape              Param #
# conv (Conv2D)                (None, 45, 45, 32)        800
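To see where each number comes from, you can also inspect the kernel shapes directly (a quick check mirroring the print in the question):

import tensorflow as tf

conv1d = tf.keras.layers.Conv1D(32, kernel_size=5, padding='same', use_bias=False)
conv2d = tf.keras.layers.Conv2D(32, kernel_size=5, padding='same', use_bias=False)
conv1d.build((None, 45, 45))     # (batch, steps, channels)
conv2d.build((None, 45, 45, 1))  # (batch, height, width, channels)
print(conv1d.kernel.shape)  # (5, 45, 32)   -> 5 * 45 * 32 = 7200 weights
print(conv2d.kernel.shape)  # (5, 5, 1, 32) -> 5 * 5 * 1 * 32 = 800 weights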
I have 30 sets of 3D points, which are the keypoints describing 30 objects; each set contains 10 points, denoted as X with shape [30, 10, 3]. I also have the corresponding 3D points of the 30 objects after a certain transformation, denoted as Y with shape [30, 10, 3].
Now I want to train a machine learning model from these 30 objects, using X and Y as data and annotation, and predict the keypoint coordinates of a new object after transformation.
Anybody has ideas how to do it with python?
30 sets seems too low for the task, as the input and output each have 30 dimensions (10 points x 3 coordinates). To learn a mapping between such high-dimensional data, you need thousands of examples (more sets).
I would suggest simulating the transformations to generate a few thousand samples, then using a small neural network to predict Y from X. Since the inputs don't have spatial or temporal structure and instead represent discrete points, I don't think convolutional or recurrent models will be useful.
So, start with a small MLP with a mean squared error loss. However, if the output points are always integers, you can also model it as a classification problem, provided the range is not too big.
I have added a small neural network model in keras to predict the transformation.
import numpy as np
import keras
import tensorflow
from keras.layers import Input, Dense, Reshape
from keras.models import Model
X = np.random.randint(-100, 100, (3000, 10, 3)) # 10 3d points
Y = 2*(X + 5)/7 # this is our simple transformation operation
print(X.shape)
print(Y.shape)
in_m = Input(shape=(30,)) # input layer
f1_fc = Dense(100, activation = 'relu')(in_m) # first fc layer
f2_fc = Dense(30, activation = 'linear')(f1_fc) # second fc layer
simple_model = Model(in_m, f2_fc)
simple_model.summary()
simple_model.compile(loss='mse', metrics=['mae'], optimizer='adam')
X_flat = np.reshape(X, (3000, 30))
Y_flat = np.reshape(Y, (3000, 30))
hist = simple_model.fit(X_flat, Y_flat, epochs = 100, validation_split = 0.2, batch_size = 20)
Output:
(3000, 10, 3)
(3000, 10, 3)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) (None, 30) 0
_________________________________________________________________
dense_15 (Dense) (None, 100) 3100
_________________________________________________________________
dense_16 (Dense) (None, 30) 3030
=================================================================
Total params: 6,130
Trainable params: 6,130
Non-trainable params: 0
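To use the trained model on a new object, flatten its keypoints the same way and reshape the prediction back to 10 points (a usage sketch; new_X here is a hypothetical new set of keypoints):

# a hypothetical new object's keypoints, shape (10, 3)
new_X = np.random.randint(-100, 100, (10, 3))
pred_flat = simple_model.predict(new_X.reshape(1, 30))  # model expects flat vectors of length 30
pred = pred_flat.reshape(10, 3)                         # back to 10 transformed 3D points
print(pred)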
I am trying to implement a 1D convolution on a time series classification problem using keras. I am having some trouble interpreting the output size of the 1D convolutional layer.
I have my data composed of the time series of different features over a time interval of 128 units and I apply a 1D convolutional layer:
from keras.layers import Input, Conv1D

x = Input((n_timesteps, n_features))  # n_timesteps = 128, n_features = 9
cnn1_1 = Conv1D(filters=100, kernel_size=10, activation='relu')(x)
which after compilation I obtain the following shapes of the outputs:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_26 (InputLayer) (None, 128, 9) 0
_________________________________________________________________
conv1d_28 (Conv1D) (None, 119, 100) 9100
I was assuming that with 1D convolution the data is only convolved across the time axis (axis 1), so the size of my output would be:
119, 100*9. But I guess the network is performing some kind of operation across the feature dimension (axis 2), and I don't know which operation that is.
I am saying this because my interpretation of 1D convolution is that the feature dimension should be preserved, since I am only convolving over the time domain: if I have 9 features, then for each filter I have 9 convolutional kernels, each applied to a different feature and convolved across the time axis. This should return 9 convolved features per filter, resulting in an output shape of 119, 9*100.
However the output shape is 119, 100.
Clearly something else is happening and I can't figure out what.
Where is my reasoning failing? How is the 1D convolution actually performed?
I'll add one more remark, which is my comment on one of the answers provided:
I understand the reduction from 128 to 119, but what I don't understand is why the feature dimension changes. For example, if I use
Conv1D(filters = 1, kernel_size= 10, activation='relu')
, then the output dimension is going to be (None, 119, 1), giving rise to only one feature after the convolution. What is going on in this dimension? Which operation is performed to go from 9 --> 1?
Conv1D needs a 3D tensor as its input, with shape (batch_size, time_steps, features). Based on your code, the number of filters is 100, which means the feature dimension is converted from 9 to 100. How does this happen? A dot product.
The computation for each window is p_i = f(W · X_i + b), where X_i (a vector of length k·d) is the concatenation of k words (k = kernel_size), W is the weight matrix of shape l x (k·d), l is the number of filters (l = filters), d is the dimension of the input word vector, b is the bias, and p_i (a vector of length l) is the output vector for each window of k words.
What happens in your code?
a window of [kernel_size x n_features] = [10 x 9] values dot a [10 x 9] weight kernel => [1] => repeat l times (once per filter) => [1 x 100]
do the above for every window position along the sequence => [128 x 100] (before accounting for padding; see below)
Another thing that happens here is that you did not specify the padding type. According to the docs, Conv1D uses valid padding by default, which is what caused your time dimension to shrink from 128 to 119 (128 - 10 + 1 = 119). If you need the time dimension to stay the same as the input, you can choose the same option:
Conv1D(filters = 100, kernel_size= 10, activation='relu', padding='same')
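As a quick check, you can compare the output shapes with and without same padding for your exact configuration (a minimal sketch):

import tensorflow as tf

x = tf.random.normal((1, 128, 9))  # (batch, n_timesteps, n_features)
valid = tf.keras.layers.Conv1D(100, 10)(x)                 # default padding='valid'
same = tf.keras.layers.Conv1D(100, 10, padding='same')(x)
print(valid.shape)  # (1, 119, 100): 128 - 10 + 1 = 119
print(same.shape)   # (1, 128, 100): time dimension preserved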
It sums over the last axis, which is the feature axis. You can easily check this by doing the following:
import tensorflow as tf

x = tf.random.normal((1, 128, 9))
# initialize the kernel with ones, and use a linear activation
y = tf.keras.layers.Conv1D(1, 3, activation="linear", kernel_initializer="ones")(x)
If you print y and also print x summed along the feature axis, you can see that the sum of the first 3 values of the summed x equals the first value of the convolution output. (I used a kernel size of 3 to make this verification easier.)
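Here is a minimal sketch of that verification, reusing the x and y from the snippet above:

import tensorflow as tf

x_summed = tf.reduce_sum(x, axis=-1)                       # sum over the feature axis -> shape (1, 128)
manual = x_summed[0, 0] + x_summed[0, 1] + x_summed[0, 2]  # first window of 3 timesteps
print(float(manual), float(y[0, 0, 0]))                    # these should match up to floating point error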
...coming from TensorFlow, where pretty much every shape is defined explicitly, I am confused about Keras' API for recurrent models. Getting an Elman network to work in TF was pretty easy, but Keras resists accepting the correct shapes...
For example:
x = k.layers.Input(shape=(2,))
y = k.layers.Dense(10)(x)
m = k.models.Model(x, y)
...works perfectly and according to model.summary() I get an input layer with shape (None, 2), followed by a dense layer with output shape (None, 10). Makes sense since Keras automatically adds the first dimension for batch processing.
However, the following code:
x = k.layers.Input(shape=(2,))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
raises an exception ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2.
It works only if I add another dimension:
x = k.layers.Input(shape=(2,1))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
...but now, of course, my input would not be (None, 2) anymore.
model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 2, 1) 0
_________________________________________________________________
simple_rnn_1 (SimpleRNN) (None, 10) 120
=================================================================
How can I have an input of type batch_size x 2 when I just want to feed vectors with 2 values to the network?
Furthermore, how would I chain RNN cells?
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...raises the same exception with incompatible dim sizes.
This sample here works:
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10, return_sequences=True)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...but then layer h does not output (None, 10) anymore, but (None, 2, 10) since it returns the whole sequence instead of just the "regular" RNN cell output.
Why is this needed at all?
Moreover: where are the states? Do they just default to 1 recurrent state?
The documentation touches on the expected shapes of recurrent components in Keras; let's look at your case:
Any RNN layer in Keras expects a 3D shape (batch_size, timesteps, features). This means you have timeseries data.
The RNN layer then iterates over the second (time) dimension of the input using a recurrent cell, which performs the actual recurrent computation.
If you specify return_sequences then you collect the output for every timestep getting another 3D tensor (batch_size, timesteps, units) otherwise you only get the last output which is (batch_size, units).
Now returning to your questions:
You mention vectors, but shape=(2,) is a single vector, not a sequence, so this doesn't work. shape=(2, 1) works because now you have 2 vectors of size 1 (these shapes exclude the batch_size). So to feed vectors of size 2 you need shape=(how_many_vectors, 2), where the first dimension is the number of vectors you want your RNN to process - the timesteps in this case.
To chain RNN layers you need to feed 3D data, because that is what RNNs expect. When you specify return_sequences, the RNN layer returns the output at every timestep, so it can be chained to another RNN layer (see the sketch below).
States are the collection of vectors that an RNN cell uses; an LSTM uses 2, a GRU has 1 hidden state, which is also its output. They default to zeros but can be specified when calling the layer using initial_state=[...] as a list of tensors.
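Putting those points together, here is a minimal sketch of feeding vectors of size 2 as a sequence and chaining two SimpleRNN layers (the sequence length of 5 and the layer sizes are just example values):

import keras as k

# 5 timesteps (assumed), each a vector of 2 features; the batch dimension is excluded
x = k.layers.Input(shape=(5, 2))
# return_sequences=True keeps the per-timestep outputs, shape (None, 5, 10),
# so a second recurrent layer can be chained on top of it
h = k.layers.SimpleRNN(10, return_sequences=True)(x)
y = k.layers.SimpleRNN(10)(h)  # only the last output: shape (None, 10)
m = k.models.Model(x, y)
m.summary()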
There is already a post about the difference between RNN layers and RNN cells in Keras which might help clarify the situation further.