keras conv1d layer weight count is not produced as expected - python

I'm trying to convert a trained sign-language classification model from Python to C header files so that I can deploy it on a Cortex-M4 board.
In Python, I can build and train the model, and it predicts with 90% accuracy.
But I see an issue with the number of weights generated in the convolution layers.
**Conv_1d configuration**
print(x_train.shape)
model = Sequential()
model.add(Conv1D(32,kernel_size=5, padding='same',
input_shape=x_train.shape[1:], name='conv1d_1'))
print(model.layers[0].kernel.numpy().shape)
**output:**
(1742, 45, 45)
**(5, 45, 32)**
According to above configuration
input dimension = 45x45x1 pixels of image(gray scale)
input channels = 1
output dimension = 45x45x32
output channels = 32
kernel size = 5
As per the concept(w.r.t https://cs231n.github.io/convolutional-networks/)
number of weights = (input_channels) x (kernel_size) x (kernel_size) x (output_channels)=1x5x5x32=800
But keras model produces weights array of size = [5][45][32]=7200
I'm not sure if my interpretation of the weight array in the Keras model is correct. I would be glad if someone could help me with this.

Some bullets that should clarify your doubts.
Your formula for the number of weights can't be right because you're using a Conv1D, so the kernel has only one spatial dimension.
Defining the input shape as x_train.shape[1:] = (45, 45) means a sequence of 45 steps with 45 input channels each (again, because it's a Conv1D).
That said, the number of weights is:
# of weights = input_channels x kernel_size x output_filters = 45x5x32 = 7200 (without biases)
Considering that you have images, you're probably looking for Conv2D. In this case, the input shape should be (45,45,1), the kernel has two dimensions, and the number of parameters is exactly 1x5x5x32 = 800 (without biases).
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32,kernel_size=5, padding='same',
input_shape=(45, 45, 1), use_bias=False))
model.summary()
# Layer (type) Output Shape Param #
# conv (Conv2D) (None, 45, 45, 32) 800
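Both counts above can be checked with a few lines of plain Python. This is only a sketch: the helper name conv_weight_count is made up for illustration and is not a Keras function, and it assumes the channels-last layout that Keras uses by default.

```python
def conv_weight_count(kernel_shape, in_channels, filters, use_bias=False):
    """Count trainable parameters of a convolution layer (hypothetical helper)."""
    kernel_elems = 1
    for k in kernel_shape:
        kernel_elems *= k
    # One kernel of size kernel_shape per (input channel, output filter) pair.
    count = kernel_elems * in_channels * filters
    if use_bias:
        count += filters  # one bias value per output filter
    return count


# Conv1D with kernel_size=5 on input (45, 45): 45 steps, 45 channels.
conv1d_weights = conv_weight_count((5,), in_channels=45, filters=32)
# Conv2D with kernel_size=5 on input (45, 45, 1): one grayscale channel.
conv2d_weights = conv_weight_count((5, 5), in_channels=1, filters=32)

print(conv1d_weights)  # 7200
print(conv2d_weights)  # 800
```

For the C export, note that the Conv1D kernel printed in the question has shape (5, 45, 32), i.e. (kernel_size, input_channels, filters), which is where the 7200 values come from.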

Related

Which axis does Keras SimpleRNN / LSTM use as the temporal axis by default?

When using a SimpleRNN or LSTM for classical sentiment analysis algorithms (applied here to sentences of length <= 250 words/tokens):
model = Sequential()
model.add(Embedding(5000, 32, input_length=250)) # Output shape: (None, 250, 32)
model.add(SimpleRNN(100)) # Output shape: (None, 100)
model.add(Dense(1, activation='sigmoid')) # Output shape: (None, 1)
where is it specified which axis of the input of the RNN is used as the "temporal" axis?
To be more precise, after the Embedding layer, a given input sentence, e.g. "the cat sat on the mat", is encoded into a matrix x of shape (250, 32), where 250 is the max length (in words) of the input text, and 32 the dimension of the embedding. Then, where in Keras is it specified if this will be used:
h[t] = activation( W_h * x[:, t] + U_h * h[t-1] + b_h )
or this:
h[t] = activation( W_h * x[t, :] + U_h * h[t-1] + b_h )
(In both cases, y[t] = activation( W_y * h[t] + b_y ))
TL;DR: if an input for a RNN Keras layer is of size, say, (250, 32), which axis does it use as the temporal axis by default? Where is this detailed in the Keras or Tensorflow documentation?
PS: how to explain the number of parameters (given by model.summary()) which is 13300? W_h has 100x32 coefs, U_h has 100x100 coefs, b_h has 100x1 coefs, i.e. we already have 13300! There is no coefs left for W_y and b_y! How to explain this?
Temporal axis: it's always dim 1, unless time_major=True, in which case it's dim 0. The Embedding layer outputs a 3D tensor shaped (batch_size, timesteps, embedding_dim). This can be seen in the source, where step_input_shape is the shape of the input fed to the RNN cell at each step of the recurrent loop. For your case, timesteps=250, and the SimpleRNN cell "sees" a tensor shaped (batch_size, 32) at each step, which corresponds to your second formula, the one using x[t, :].
# of params: you can see how the figure's derived by inspecting each layer's .build() code: Embedding, SimpleRNN, Dense, or likewise calling .weights on each layer. For your case, w/ l = model.layers[1]:
l.weights[0].shape == (32, 100) --> 3200 params (kernel)
l.weights[1].shape == (100, 100) --> 10000 params (recurrent_kernel)
l.weights[2].shape == (100,) --> 100 params (bias) (sum: 13,300)
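The same arithmetic can be written as a small function. This is a sketch in plain Python, with a hypothetical helper name (simple_rnn_param_count); it only restates the three weight shapes listed above.

```python
def simple_rnn_param_count(input_dim, units, use_bias=True):
    """Parameter count of a SimpleRNN layer from its weight shapes (hypothetical helper)."""
    kernel = input_dim * units        # (input_dim, units): input-to-hidden weights
    recurrent_kernel = units * units  # (units, units): hidden-to-hidden weights
    bias = units if use_bias else 0   # (units,)
    return kernel + recurrent_kernel + bias


embedding_dim, units = 32, 100
total = simple_rnn_param_count(embedding_dim, units)
print(total)  # 13300
```

There is no separate output projection in the count, which is consistent with the layer returning the hidden state itself.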
Computation logic: there is no W_y or b_y; the "y" is the hidden state h, as is the case for all recurrent layers. What you cite is likely taken from generic RNN formulae, so the "in both cases..." statement is false. To see what's actually happening, inspect the .call() code.
P.S. I recommend defining the full batch_shape of the model for debugging, as it eliminates the ambiguous None shapes.
SimpleRNN formula vs. code: as requested; note the h in source code is misleading, and is typically z in formulae ("pre-activation").
return_sequences=True -> outputs for all timesteps are returned: (batch_size, timesteps, channels)
return_sequences=False -> only the last timestep's output is returned: (batch_size, channels).
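To make the time axis and the two return_sequences modes concrete, here is a NumPy sketch of a SimpleRNN-style forward pass. It is not Keras's implementation: the function name simple_rnn_forward is hypothetical, the weights are random stand-ins, and it assumes a tanh activation, a zero initial state, and batch-major input of shape (batch, timesteps, features).

```python
import numpy as np


def simple_rnn_forward(x, kernel, recurrent_kernel, bias, return_sequences=False):
    """Illustrative SimpleRNN-style loop; x has shape (batch, timesteps, features)."""
    batch, timesteps, _ = x.shape
    units = kernel.shape[1]
    h = np.zeros((batch, units))        # initial state: zeros
    outputs = []
    for t in range(timesteps):          # iterate over axis 1, the time axis
        x_t = x[:, t, :]                # one step: (batch, features)
        h = np.tanh(x_t @ kernel + h @ recurrent_kernel + bias)
        outputs.append(h)               # the output at each step is the hidden state
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return h                              # (batch, units)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 250, 32))                       # 4 sentences, 250 tokens, 32-dim embeddings
kernel = 0.1 * rng.normal(size=(32, 100))               # stand-in weights
recurrent_kernel = 0.1 * rng.normal(size=(100, 100))
bias = np.zeros(100)

last = simple_rnn_forward(x, kernel, recurrent_kernel, bias)
seq = simple_rnn_forward(x, kernel, recurrent_kernel, bias, return_sequences=True)
print(last.shape)  # (4, 100)
print(seq.shape)   # (4, 250, 100)
```

The last slice of the full sequence, seq[:, -1, :], is the same array that the return_sequences=False call returns.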

Mismatch in expected Keras shapes after pooling

I'm building a few simple models in Keras to improve my knowledge of deep learning, and encountering some issues I don't quite understand how to debug.
I want to use a 1D CNN to perform regression on some time-series data. My input feature tensor is of shape N x T x D, where N is the number of data points, T is the sequence length, and D is the number of dimensions. My target tensor is of shape N x T x 1 (1 because I am trying to output a scalar value).
I've set up my model architecture like this:
feature_tensor.shape
# (75584, 40, 38)
target_tensor.shape
# (75584, 40, 1)
inputs = Input(shape=(SEQUENCE_LENGTH,DIMENSIONS))
conv1 = Conv1D(filters=64, kernel_size=3, activation='relu')
x = conv1(inputs)
x = MaxPooling1D(pool_size=2)(x)
x = Flatten()(x)
x = Dense(100, activation='relu')(x)
predictions = Dense(1, activation="linear")(x)
model = Model(inputs, predictions)
opt = Adam(lr=1e-5, decay=1e-4 / 200)
model.compile(loss="mean_absolute_error", optimizer=opt)
When I attempt to train my model, however, I get the following output:
r = model.fit(cleaned_tensor, target_tensor, epochs=100, batch_size=2058)
ValueError: Error when checking target: expected dense_164 to have 2
dimensions, but got array with shape (75584, 40, 1).
The first two numbers are familiar: 75584 is the # of samples, 40 is the sequence length.
When I debug my model summary object, I see that the expected output from the Flatten layer should be 1216:
However, my colleague and I stared at the code for a long time and could not understand why the shape of (75584, 40, 1) was being arrived at via the architecture when it reached the dense layer.
Could someone point me in the direction of what I am doing wrong?
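The shape (75584, 40, 1) in the error is the shape of your target array, not something the architecture produces: after Flatten and Dense(1) the model outputs a 2D tensor of shape (N, 1). Try reshaping your target variable to N x T, and it looks like your final dense layer should have 40 units rather than 1 (I think).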
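The shape bookkeeping can be followed step by step in plain Python. This is a sketch under the assumption that both Conv1D and MaxPooling1D use their default padding='valid' (and that pooling uses stride equal to pool_size); the helper names are made up for illustration.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1D convolution with padding='valid' (hypothetical helper)."""
    return (length - kernel_size) // stride + 1


def maxpool1d_output_length(length, pool_size, stride=None):
    """Output length of 1D max pooling with padding='valid' (hypothetical helper)."""
    stride = pool_size if stride is None else stride
    return (length - pool_size) // stride + 1


sequence_length, filters = 40, 64
after_conv = conv1d_output_length(sequence_length, kernel_size=3)  # 38 steps
after_pool = maxpool1d_output_length(after_conv, pool_size=2)      # 19 steps
flattened = after_pool * filters                                   # 19 * 64 features

model_output_shape = (None, 1)   # Flatten + Dense(1): one value per sample
target_shape = (75584, 40, 1)    # one value per timestep

print(after_conv, after_pool, flattened)           # 38 19 1216
print(len(model_output_shape), len(target_shape))  # 2 3
```

The 1216 matches the Flatten output mentioned in the question, and the 2-versus-3 dimension count is exactly what the ValueError reports. With a final Dense(40) and a target reshaped to (75584, 40), both sides would be 2D with matching sizes.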

keras model predict without fit, what does it mean?

I see the following example code on tensorflow 2.0 API
model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# the model will take as input an integer matrix of size (batch,
# input_length).
# the largest integer (i.e. word index) in the input should be no larger
# than 999 (vocabulary size).
# now model.output_shape == (None, 10, 64), where None is the batch
# dimension.
input_array = np.random.randint(1000, size=(32, 10))
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
assert output_array.shape == (32, 10, 64)
I have used the Keras API for a few days, and my usual workflow is compile, fit, and then predict.
What does the above example mean without the fit step?
Without fit(), predict() simply runs the model with its initial (untrained) parameters. This example just illustrates the output shape of the Embedding layer.
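What predict() does here can be imitated in NumPy: an embedding is a table lookup, and an untrained table just holds its initial random values. This is a sketch, not the Keras code path; the uniform range used for the stand-in weights is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the Embedding weights right after initialization (never trained).
embedding_matrix = rng.uniform(-0.05, 0.05, size=(1000, 64))

# Same kind of input as in the example: 32 sequences of 10 word indices in [0, 1000).
input_array = rng.integers(0, 1000, size=(32, 10))

# "Prediction" is a forward pass: each index selects one row of the table.
output_array = embedding_matrix[input_array]
print(output_array.shape)  # (32, 10, 64)
```

The output values are meaningless until the weights are trained, but the shape is already fully determined, which is all the documentation example asserts.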

Keras - Embedding Layer and GRU Layer Shape Error

# input_shape = (137861, 21, 1)
# output_sequence_length = 21
# english_vocab_size = 199
# french_vocab_size = 344
def embed_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
'''
Build and train a RNN model using word embedding on x and y
:param input_shape: Tuple of input shape
:param output_sequence_length: Length of output sequence
:param english_vocab_size: Number of unique English words in the dataset
:param french_vocab_size: Number of unique French words in the dataset
:return: Keras model built, but not trained
'''
learning_rate = 1e-3
model = Sequential()
model.add(Embedding(english_vocab_size, 128, input_length=output_sequence_length, input_shape=input_shape[1:]))
model.add(GRU(units=128, return_sequences=True))
model.add(TimeDistributed(Dense(french_vocab_size)))
model.add(Activation('softmax'))
model.summary()
model.compile(loss=sparse_categorical_crossentropy,
optimizer=Adam(learning_rate),
metrics=['accuracy'])
return model
When invoking this method to train a model, it gets the error:
ValueError: Input 0 is incompatible with layer gru_1: expected ndim=3, found ndim=4
How to fix the shape error between Embedding Layer and GRU Layer?
The problem is that the Embedding layer takes a 2D array of shape (batch_size, sequence_length) as input and appends the embedding dimension to it. However, your input array has shape (137861, 21, 1), which makes it a 3D array, so the Embedding output becomes 4D. Simply remove the last axis using NumPy's squeeze() function:
data = np.squeeze(data, axis=-1)
As an aside, there is no need to use the TimeDistributed layer here, since the Dense layer is applied on the last axis by default.
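The effect of the extra axis, and of applying Dense on the last axis, can be reproduced with NumPy shapes alone. This is a sketch with zero-filled stand-in arrays (and a batch of 8 instead of 137861), not the actual Keras layers.

```python
import numpy as np

english_vocab_size, embedding_dim = 199, 128
gru_units, french_vocab_size = 128, 344

embedding_matrix = np.zeros((english_vocab_size, embedding_dim))  # stand-in for Embedding weights
data = np.zeros((8, 21, 1), dtype=np.int64)                       # small stand-in for (137861, 21, 1)

# Looking up a 3D index array adds a 4th axis: this is the ndim=4 the GRU rejects.
shape_before = embedding_matrix[data].shape
print(shape_before)  # (8, 21, 1, 128)

# After dropping the trailing axis the lookup yields the 3D tensor an RNN expects.
data = np.squeeze(data, axis=-1)
shape_after = embedding_matrix[data].shape
print(shape_after)   # (8, 21, 128)

# Dense acts on the last axis, so it already maps every timestep independently.
gru_output = np.zeros((8, 21, gru_units))                # stand-in for GRU(return_sequences=True)
dense_kernel = np.zeros((gru_units, french_vocab_size))  # stand-in for Dense weights
logits = gru_output @ dense_kernel
print(logits.shape)  # (8, 21, 344)
```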

Keras SimpleRNN confusion

...coming from TensorFlow, where pretty much any shape and everything is defined explicitly, I am confused about Keras' API for recurrent models. Getting an Elman network to work in TF was pretty easy, but Keras refuses to accept the correct shapes...
For example:
x = k.layers.Input(shape=(2,))
y = k.layers.Dense(10)(x)
m = k.models.Model(x, y)
...works perfectly and according to model.summary() I get an input layer with shape (None, 2), followed by a dense layer with output shape (None, 10). Makes sense since Keras automatically adds the first dimension for batch processing.
However, the following code:
x = k.layers.Input(shape=(2,))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
raises an exception ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2.
It works only if I add another dimension:
x = k.layers.Input(shape=(2,1))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
...but now, of course, my input would not be (None, 2) anymore.
model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 2, 1) 0
_________________________________________________________________
simple_rnn_1 (SimpleRNN) (None, 10) 120
=================================================================
How can I have an input of type batch_size x 2 when I just want to feed vectors with 2 values to the network?
Furthermore, how would I chain RNN cells?
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...raises the same exception with incompatible dim sizes.
This sample here works:
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10, return_sequences=True)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...but then layer h does not output (None, 10) anymore, but (None, 2, 10) since it returns the whole sequence instead of just the "regular" RNN cell output.
Why is this needed at all?
Moreover: where are the states? Do they just default to 1 recurrent state?
The documentation touches on the expected shapes of recurrent components in Keras. Let's look at your case:
Any RNN layer in Keras expects a 3D shape (batch_size, timesteps, features). This means you have timeseries data.
The RNN layer then iterates over the second, time dimension of the input using a recurrent cell, the actual recurrent computation.
If you specify return_sequences=True, you collect the output for every timestep, getting another 3D tensor (batch_size, timesteps, units); otherwise you only get the last output, which is (batch_size, units).
Now returning to your questions:
You mention vectors, but shape=(2,) is a single vector, not a sequence of vectors, so this doesn't work. shape=(2,1) works because now you have 2 vectors of size 1; these shapes exclude batch_size. So to feed vectors of size 2 you need shape=(how_many_vectors, 2), where the first dimension is the number of vectors you want your RNN to process, i.e. the timesteps in this case.
To chain RNN layers you need to feed 3D data because that is what RNNs expect. When you specify return_sequences=True, the RNN layer returns its output at every timestep, so that can be chained to another RNN layer.
States are a collection of vectors that an RNN cell uses: LSTM uses 2, while GRU has 1 hidden state, which is also the output. They default to 0s but can be specified when calling the layer using initial_state=[...] as a list of tensors.
There is already a post about the difference between RNN layers and RNN cells in Keras which might help clarify the situation further.
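The shape rules described in this answer can be captured in a tiny helper. This is a plain-Python sketch of the bookkeeping only; rnn_output_shape is a hypothetical name, not a Keras function, and None stands for the batch dimension as in model.summary().

```python
def rnn_output_shape(input_shape, units, return_sequences=False):
    """Output shape of an RNN layer given a (batch, timesteps, features) input."""
    if len(input_shape) != 3:
        raise ValueError(f"expected ndim=3, found ndim={len(input_shape)}")
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)  # one output per timestep
    return (batch, units)                 # only the last output


# A single 2-value vector per sample: one timestep with two features.
single_step = rnn_output_shape((None, 1, 2), units=10)

# Chaining works when the first layer returns the whole sequence.
h = rnn_output_shape((None, 2, 1), units=10, return_sequences=True)
y = rnn_output_shape(h, units=10)

# Without return_sequences the first layer's output is 2D and the second layer rejects it.
try:
    rnn_output_shape(rnn_output_shape((None, 2, 1), units=10), units=10)
    chained_without_sequences = "ok"
except ValueError as err:
    chained_without_sequences = str(err)

print(single_step)                # (None, 10)
print(h, y)                       # (None, 2, 10) (None, 10)
print(chained_without_sequences)  # expected ndim=3, found ndim=2
```

In Keras terms, the first case corresponds to declaring Input(shape=(1, 2)) and reshaping data of shape (N, 2) to (N, 1, 2) before feeding it.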
