Keras SimpleRNN confusion - python

...coming from TensorFlow, where pretty much any shape and everything is defined explicitly, I am confused about Keras' API for recurrent models. Getting an Elman network to work in TF was pretty easy, but Keras resists to accept the correct shapes...
For example:
x = k.layers.Input(shape=(2,))
y = k.layers.Dense(10)(x)
m = k.models.Model(x, y)
...works perfectly and according to model.summary() I get an input layer with shape (None, 2), followed by a dense layer with output shape (None, 10). Makes sense since Keras automatically adds the first dimension for batch processing.
However, the following code:
x = k.layers.Input(shape=(2,))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
raises an exception ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2.
It works only if I add another dimension:
x = k.layers.Input(shape=(2,1))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
...but now, of course, my input would not be (None, 2) anymore.
model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 2, 1) 0
_________________________________________________________________
simple_rnn_1 (SimpleRNN) (None, 10) 120
=================================================================
How can I have an input of type batch_size x 2 when I just want to feed vectors with 2 values to the network?
Furthermore, how would I chain RNN cells?
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...raises the same exception with incompatible dim sizes.
This sample here works:
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10, return_sequences=True)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...but then layer h does not output (None, 10) anymore, but (None, 2, 10) since it returns the whole sequence instead of just the "regular" RNN cell output.
Why is this needed at all?
Moreover: where are the states? Do they just default to 1 recurrent state?

The documentation touches on the expected shapes of recurrent components in Keras, let's look at your case:
Any RNN layer in Keras expects a 3D shape (batch_size, timesteps, features). This means you have timeseries data.
The RNN layer then iterates over the second, time dimension of the input using a recurrent cell, the actual recurrent computation.
If you specify return_sequences then you collect the output for every timestep getting another 3D tensor (batch_size, timesteps, units) otherwise you only get the last output which is (batch_size, units).
Now returning to your questions:
You mention vectors but shape=(2,) is a vector so this doesn't work. shape=(2,1) works because now you have 2 vectors of size 1, these shapes exclude batch_size. So to feed vectors of size to you need shape=(how_many_vectors, 2) where the first dimension is the number of vectors you want your RNN to process, the timesteps in this case.
To chain RNN layers you need to feed 3D data because that what RNNs expect. When you specify return_sequences the RNN layer returns output at every timestep so that can be chained to another RNN layer.
States are collection of vectors that a RNN cell uses, LSTM uses 2, GRU has 1 hidden state which is also the output. They default to 0s but can be specified when calling the layer using initial_states=[...] as a list of tensors.
There is already a post about the difference between RNN layers and RNN cells in Keras which might help clarify the situation further.

Related

Which axis does Keras SimpleRNN / LSTM use as the temporal axis by default?

When using a SimpleRNN or LSTM for classical sentiment analysis algorithms (applied here to sentences of length <= 250 words/tokens):
model = Sequential()
model.add(Embedding(5000, 32, input_length=250)) # Output shape: (None, 250, 32)
model.add(SimpleRNN(100)) # Output shape: (None, 100)
model.add(Dense(1, activation='sigmoid')) # Output shape: (None, 1)
where is it specified which axis of the input of the RNN is used as the "temporal" axis?
To be more precise, after the Embedding layer, a given input sentence, e.g. "the cat sat on the mat", is encoded into a matrix x of shape (250, 32), where 250 is the max length (in words) of the input text, and 32 the dimension of the embedding. Then, where in Keras is it specified if this will be used:
h[t] = activation( W_h * x[:, t] + U_h * h[t-1] + b_h )
or this:
h[t] = activation( W_h * x[t, :] + U_h * h[t-1] + b_h )
(In both cases, y[t] = activation( W_y * h[t] + b_y ))
TL;DR: if an input for a RNN Keras layer is of size, say, (250, 32), which axis does it use as the temporal axis by default? Where is this detailed in the Keras or Tensorflow documentation?
PS: how to explain the number of parameters (given by model.summary()) which is 13300? W_h has 100x32 coefs, U_h has 100x100 coefs, b_h has 100x1 coefs, i.e. we already have 13300! There is no coefs left for W_y and b_y! How to explain this?
Temporal axis: it's always dim 1, unless time_major=True, then it's dim 2; the Embedding layer outputs a 3D tensor. This can be seen here where step_input_shape is the shape of input fed to the RNN cell at each step in the recurrent loop. For your case, timesteps=250, and the SimpleRNN cell "sees" a tensor shaped (batch_size, 32) at each step.
# of params: you can see how the figure's derived by inspecting each layer's .build() code: Embedding, SimpleRNN, Dense, or likewise calling .weights on each layer. For your case, w/ l = model.layers[1]:
l.weights[0].shape == (32, 100) --> 3200 params (kernel)
l.weights[1].shape == (100, 100) --> 10000 params (recurrent_kernel)
l.weights[2].shape == (100,) --> 100 params (bias) (sum: 13,300)
Computation logic: there is no W_y or b_y; the "y" is the hidden state, h, actually for all recurrent layers - what you cite are likely from generic RNN formulae. # "in both cases..." - this is false; to see what's actually happening, inspect the .call() code.
P.S. I recommend defining the full batch_shape of the model for debugging, as it eliminates the ambiguous None shapes
SimpleRNN formula vs. code: as requested; note the h in source code is misleading, and is typically z in formulae ("pre-activation").
return_sequences=True -> outputs for all timesteps are returned: (batch_size, timesteps, channels)
return_sequences=False -> only last timestep's output is returned: (batch_size, 1, channels). See here

Making inputs to keras RNN written in Functional API

I'm having some problems making masking work with a keras RNN written in Functional API. The idea is to mask a tensor, zero-padded, with shape (batch_size, timesteps, 100) and feed it into a SimpleRNN. Right now I have the following:
input = keras.layers.Input(shape=(None, 100))
mask_layer = keras.layers.Masking(mask_value=0.)
mask = mask_layer(input)
rnn = keras.layers.SimpleRNN(20)
x = rnn(input, mask=mask)
However, this does not work, because it raises the following InvalidArgumentError:
InvalidArgumentError: Dimension 1 in both shapes must be equal, but are 20 and 2000. Shapes are [?,20] and [?,2000]. for 'Select' (op: 'Select') with input shapes: [?,2000], [?,20], [?,20].
By changing my Input's shape into (None, 1) - a sequential input where each element is a single integer, instead of n-dimensional embeddings - I've gotten this code to work. I've also gotten the same idea to work with the Sequential API, but I cannot do this, as my final model will have multiple inputs and outputs. I also do not want to force my Input's shape to be (None, 1), as I want to swap out different embedding models (Word2Vec, etc) during preprocessing, which means my Inputs will be embedding vectors from the start.
Can anyone help me with using masks with RNNs when using keras's functional API?
According to Masking and Padding with Keras, you won't need to manually set mask on the RNN layer, in the following code the RNN layer will automatically receive the mask.
import keras
input_layer = keras.layers.Input(shape=(None, 100))
masked_layer = keras.layers.Masking(mask_value=0.)(input_layer)
rnn_layer = keras.layers.SimpleRNN(20)(masked_layer)

Creating tensor of dynamic shape from python lists to feed tensorflow RNN

I'm creating an end-to-end speech recognition architecture, in which my data is a list of segmented spectrograms. My data has shape (batch_size, timesteps, 8, 65, 1) in which batch_size is fixed but timesteps is varying. I can't figure out, how to put this data into a tensor with the appropriate shape to feed my model. Here is a piece of code that shows my problem:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Dropout, Flatten, TimeDistributed
from tensorflow.keras.layers import SimpleRNN, LSTM
from tensorflow.keras import Input, layers
from tensorflow.keras import backend as K
segment_width = 8
segment_height = 65
segment_channels = 1
batch_size = 4
segment_lengths = [28, 33, 67, 43]
label_lengths = [16, 18, 42, 32]
TARGET_LABELS = np.arange(35)
# Generating data
X = [np.random.uniform(0,1, size=(segment_lengths[k], segment_width, segment_height, segment_channels))
for k in range(batch_size)]
y = [np.random.choice(TARGET_LABELS, size=label_lengths[k]) for k in range(batch_size)]
# Model definition
input_segments_data = tf.keras.Input(name='input_segments_data', shape=(None, segment_width, segment_height, segment_channels),
dtype='float32')
input_segment_lengths = tf.keras.Input(name='input_segment_lengths', shape=[1], dtype='int64')
input_label_lengths = tf.keras.Input(name='input_label_lengths', shape=[1], dtype='int64')
# More complex architecture comes here
outputs = Flatten()(input_segments_data)
model = tf.keras.Model(inputs=[input_segments_data, input_segment_lengths, input_label_lengths], outputs = outputs)
def dummy_loss(y_true, y_pred):
return y_pred
model.compile(optimizer="Adam", loss=dummy_loss)
model.summary()
output:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_segments_data (InputLayer [(None, None, 8, 65, 0
__________________________________________________________________________________________________
input_segment_lengths (InputLay [(None, 1)] 0
__________________________________________________________________________________________________
input_label_lengths (InputLayer [(None, 1)] 0
__________________________________________________________________________________________________
flatten (Flatten) (None, None) 0 input_segments_data[0][0]
==================================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
__________________________________________________________________________________________________
Now when I try to predict from my random data:
model.predict([X, segment_lengths, segment_lengths])
I get this error:
ValueError: Error when checking input: expected input_segments_data to have 5 dimensions, but got array with shape (4, 1)
How can I convert X (which is a list of arrays) to a tensor of shape (None, None, 8, 65, 1) and feed it to my model? I don't want to use zero padding!
Keras model takes numpy array (tensor) as input. You cannot have a tensor with variable timesteps. Instead, what you can do is to pad all the data into same shape, using e.g. pad_sequence And then, you can add a Masking layer to your model to ignore the padded values.
This is a common issue with Tensorflow and other deep learning frameworks that operate on tensors. Unfortunately, there is no current easy way to this exactly as you asked, besides padding your sequences and then masking.
To do this, you simply have to store your input data in a numpy array with fixed dimensions and feed that to the model. You have to add dummy values to represent the missing timesteps in your sequences (a common value is 0).
Then, you have to add a Masking layer to your model, that will tell Keras to ignore the timesteps that have the dummy features.
From the documentation:
keras.layers.Masking(mask_value=0.0)
If all features for a given sample timestep are equal to mask_value, then the sample timestep will be masked (skipped) in all downstream layers (as long as they support masking).
I've adapted and simplified part of your code to give you an idea of how this works. You can adapt this to your variable-sized labels, as well:
# Generating data (using a dummy zero-array to store padded sequences)
X = np.zeros((batch_size, max(segment_lengths), segment_width, segment_height, segment_channels))
X_true = [np.ones((segment_lengths[k], segment_width, segment_height, segment_channels))
for k in range(batch_size)]
# Populate dummy array
for i, x in enumerate(X_true):
X[i, -segment_lengths[i]:, ...] = x
# Model definition
input_segments_data = tf.keras.Input(name='input_segments_data', shape=(max(segment_lengths), segment_width, segment_height, segment_channels))
masked_segments_data = tf.keras.layers.Masking()(input_segments_data)
# More complex architecture comes here
outputs = tf.keras.layers.Flatten()(input_segments_data)
model = tf.keras.Model(inputs=input_segments_data, outputs = outputs)
def dummy_loss(y_true, y_pred):
return y_pred
model.compile(optimizer="Adam", loss=dummy_loss)
model.summary()
A drawback of this approach is that if you actually have a "real" feature that is exactly like a dummy feature (e.g., all zeros), the model will mask it. Choose your masking value appropriately to avoid this.
An alternative approach would be to do something similar as what you did, but using batches of size 1. This, however, is likely to cause instability in your training procedure and I would avoid it if possible.
As a final note, Tensorflow 2 added support for RaggedTensors, which are tensors with one or more variable dimensions. Currently there is no support for RNNs, but it will probably be added eventually.
Hope this helps.

Python, Keras - ValueError: Cannot feed value of shape (10, 70, 1025) for Tensor u'dense_2_target:0', which has shape '(?, ?)'

I am trying to train a RNN by batches.
The input input size
(10, 70, 3075),
where 10 is the batch size, 70 the time dimension, 3075 are the frequency dimension.
There are three outputs whose size is
(10, 70, 1025)
each, basically 10 spectrograms with size (70,1025).
I would like to train this RNN by regression, whose structure is
input_img = Input(shape=(70,3075 ) )
x = Bidirectional(LSTM(n_hid,return_sequences=True, dropout=0.5, recurrent_dropout=0.2))(input_img)
x = Dropout(0.2)(x)
x = Bidirectional(LSTM(n_hid, dropout=0.5, recurrent_dropout=0.2))(x)
x = Dropout(0.2)(x)
o0 = ( Dense(1025, activation='sigmoid'))(x)
o1 = ( Dense(1025, activation='sigmoid'))(x)
o2 = ( Dense(1025, activation='sigmoid'))(x)
The problem is that output dense layers cannot take into account three dimensions, they want something like (None, 1025), which I don't know how to provide, unless I concatenate along the time dimension.
The following error occurs:
ValueError: Cannot feed value of shape (10, 70, 1025) for Tensor u'dense_2_target:0', which has shape '(?, ?)'
Would be the batch_shape option useful in the input layer? I have actually tried it, but I've got the same error.
In this instance the second RNN is collapsing the sequence to a single vector because by default return_sequences=False. To make the model return sequences and run the Dense layer over each timestep separately just add return_sequences=True to the second RNN as well:
x = Bidirectional(LSTM(n_hid, return_sequences=True, dropout=0.5, recurrent_dropout=0.2))(x)
The Dense layers automatically apply to the last dimension so no need to reshape afterwards.
To get the right output shape, you can use the Reshape layer:
o0 = Dense(70 * 1025, activation='sigmoid')(x)
o0 = Reshape((70, 1025)))(o0)
This will output (batch_dim, 70, 1025). You can do exactly the same for the other two outputs.

Shape of 1D convolution output on a 2D data using keras

I am trying to implement a 1D convolution on a time series classification problem using keras. I am having some trouble interpreting the output size of the 1D convolutional layer.
I have my data composed of the time series of different features over a time interval of 128 units and I apply a 1D convolutional layer:
x = Input((n_timesteps, n_features))
cnn1_1 = Conv1D(filters = 100, kernel_size= 10, activation='relu')(x)
which after compilation I obtain the following shapes of the outputs:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_26 (InputLayer) (None, 128, 9) 0
_________________________________________________________________
conv1d_28 (Conv1D) (None, 119, 100) 9100
I was assuming that with 1D convolution, the data is only convoluted across the time axis (axis 1) and the size of my output would be:
119, 100*9. But I guess that the network is performing some king of operation across the feature dimension (axis 2) and I don't know which operation is performing.
I am saying this because what I interpret as 1d convolution is that the features shapes must be preserved because I am only convolving the time domain: If I have 9 features, then for each filter I have 9 convolutional kernels, each of these applied to a different features and convoluted across the time axis. This should return 9 convoluted features for each filter resulting in an output shape of 119, 9*100.
However the output shape is 119, 100.
Clearly something else is happening and I can't understand it or get it.
where am I failing my reasoning? How is the 1d convolution performed?
I add one more comment which is my comment on one of the answers provided:
I understand the reduction from 128 to 119, but what I don't understand is why the feature dimension changes. For example, if I use
Conv1D(filters = 1, kernel_size= 10, activation='relu')
, then the output dimension is going to be (None, 119, 1), giving rise to only one feature after the convolution. What is going on in this dimension, which operation is performed to go from from 9 --> 1?
Conv1D needs 3D tensor for its input with shape (batch_size,time_step,feature). Based on your code, the filter size is 100 which means filter converted from 9 dimensions to 100 dimensions. How does this happen? Dot Product.
In above, X_i is the concatenation of k words (k = kernel_size), l is number of filters (l=filters), d is the dimension of input word vector, and p_i is output vector for each window of k words.
What happens in your code?
[n_features * 9] dot [n_features * 9] => [1] => repeat l-times => [1 * 100]
do above for all sequences => [128 * 100]
Another thing that happens here is you did not specify the padding type. According to the docs, by default Conv1d use valid padding which caused your dimension to reduce from 128 to 119. If you need the dimension to be the same as the input you can choose the same option:
Conv1D(filters = 100, kernel_size= 10, activation='relu', padding='same')
It Sums over the last axis, which is the feature axis, you can easily check this by doing the following:
input_shape = (1, 128, 9)
# initialize kernel with ones, and use linear activations
y = tf.keras.layers.Conv1D(1,3, activation="linear", input_shape=input_shape[2:],kernel_initializer="ones")(x)
y :
if you sum x along the feature axis you will get:
x
Now you can easily see that the sum of the first 3 values of sum of x is the first value of convolution, I used a kernel size of 3 to make this verification easier

Categories