LSTM Initial state from Dense layer

LSTM Initial state from Dense layer - python

I am using a lstm on time series data. I have features about the time series that are not time dependent. Imagine company stocks for the series and stuff like company location in the non-time series features. This is not the usecase, but it is the same idea. For this example, let's just predict the next value in the time series.
So a simple example would be:
feature_input = Input(shape=(None, data.training_features.shape[1]))
dense_1 = Dense(4, activation='relu')(feature_input)
dense_2 = Dense(8, activation='relu')(dense_1)
series_input = Input(shape=(None, data.training_series.shape[1]))
lstm = LSTM(8)(series_input, initial_state=dense_2)
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input,series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])
however, I am just not sure on how to specify the initial state on the list correctly. I get
ValueError: An initial_state was passed that is not compatible with `cell.state_size`. Received `state_spec`=[<keras.engine.topology.InputSpec object at 0x11691d518>]; However `cell.state_size` is (8, 8)
which I can see is caused by the 3d batch dimension. I tried using Flatten, Permutation, and Resize layers but I don't believe that is correct. What am I missing and how can I connect these layers?

The first problem is that an LSTM(8) layer expects two initial states h_0 and c_0, each of dimension (None, 8). That's what it means by "cell.state_size is (8, 8)" in the error message.
If you only have one initial state dense_2, maybe you can switch to GRU (which requires only h_0). Or, you can transform your feature_input into two initial states.
The second problem is that h_0 and c_0 are of shape (batch_size, 8), but your dense_2 is of shape (batch_size, timesteps, 8). You need to deal with the time dimension before using dense_2 as initial states.
So maybe you can change your input shape into (data.training_features.shape[1],) or take average over timesteps with GlobalAveragePooling1D.
A working example would be:
feature_input = Input(shape=(5,))
dense_1_h = Dense(4, activation='relu')(feature_input)
dense_2_h = Dense(8, activation='relu')(dense_1_h)
dense_1_c = Dense(4, activation='relu')(feature_input)
dense_2_c = Dense(8, activation='relu')(dense_1_c)
series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[dense_2_h, dense_2_c])
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input,series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])

Related

TensorFlow Keras MaxPool2D breaks LSTM with CTC loss?

I am trying to tie together a CNN layer with 2 LSTM layers and ctc_batch_cost for loss, but I'm encountering some problems. My model is supposed to work with grayscale images.
During my debugging I've figured out that if I use just a CNN layer that keeps the output size equal to the input size + LSTM and CTC, the model is able to train:
# === Without MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
# Go from Bx128x32x1 to Bx128x32 (B x TimeSteps x Features)
rnn_inp = Reshape((128, 32))(maxp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, calling fit works!
But when I add a MaxPool2D layer that halves the dimensions, I get an error sequence_length(0) <= 64, similar to the one presented here.
# === With MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
maxp = MaxPool2D(name='maxp', pool_size=2, strides=2, padding='valid')(cnn) # -> 64x16x1
# Go from Bx64x16x1 to Bx64x16 (B x TimeSteps x Features)
rnn_inp = Reshape((64, 16))(maxp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, but calling fit crashes with:
# InvalidArgumentError: sequence_length(0) <= 64
# [[{{node ctc_loss_1/CTCLoss}}]]

After struggling for about 3 days with this problem, I posted the above question here, on StackOverflow. About 2 hours after posting the questions I finally figured it out.
TL;DR Solution:
If you're using ctc_batch_cost:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the input_length argument.
If you're using ctc_loss:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the logit_length argument.
Solution:
The solution lies in the documentation, which, relatively sparse, can be cryptic for a machine learning newbie like myself.
The TensorFlow documentation for ctc_batch_cost reads:
tf.keras.backend.ctc_batch_cost(
y_true, y_pred, input_length, label_length
)
...
input_length tensor (samples, 1) containing the sequence length for
each batch item in y_pred.
...
input_length corresponds to logit_length from ctc_loss function's TensorFlow documentation:
tf.nn.ctc_loss(
labels, logits, label_length, logit_length, logits_time_major=True, unique=None,
blank_index=None, name=None
)
...
logit_length tensor of shape [batch_size] Length of input sequence in
logits.
...
That's where it clicked, at the word logit. So, the argument for input_length or logit_length is supposed to be a tensor/container (in my case, numpy array) of the lengths (i.e. number of timesteps) of the sequences entering the RNN (in my case LSTM) as input.
I was originally making the mistake of considering the required length to be the width of the grayscale images that act as input for the whole network (CNN + MaxPool2D + RNN), but because the MaxPool2D layer creates a tensor of different dimensions for the RNN's input, the ctc loss function crashes.
Now fit runs without crashing.

ValueError: Error when checking target: expected dense_13 to have shape (None, 6) but got array with shape (6, 1)

I am training a classification network with training data which has X.shape = (1119, 7) and Y.shape = (1119, 6). Below is my simple Keras network with and output dim of 6 (size of labels). The error which is returned is below the code
hidden_size = 128
model = Sequential()
model.add(Embedding(7, hidden_size))
#model.add(LSTM(128, input_shape=(1,7)))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(Dense(output_dim=6, activation='softmax'))
# Compile model
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=["categorical_accuracy"])
ValueError: Error when checking target: expected dense_13 to have shape (None, 6) but got array with shape (6, 1)
I would perfer not to do this in tensorflow because I am just prototyping yet it is my first run at Keras and am confused about why it cannot take this data. I attempted to reshape the data in a number of ways in which nothing worked. Any advice as to why this isn't work would be greatly appreciated.

You should probably remove the parameter return_sequences=True from your last LSTM layer. When using return_sequences=True, the output of the LSTM layer has shape (seq_len, hidden_size). Passing this on to a Dense layer gives you an output shape of (seq_len, 6), which is incompatible with your labels. If you instead omit return_sequences=True, then your LSTM layer returns shape (hidden_size,) (it only returns the last element of the sequence) and subsequently your final Dense layer will have output shape (6,) like your labels.

RNN with GRU in Keras

I want to implement Recurrent Neural network with GRU using Keras in python. I have problem in running code and I change variables more and more but it doesn't work. Do you have an idea for solve it?
inputs = 42 #number of columns input
num_hidden =50 #number of neurons in the layer
outputs = 1 #number of columns output
num_epochs = 50
batch_size = 1000
learning_rate = 0.05
#train (125973, 42) 125973 Rows and 42 Features
#Labels (125973,1) is True Results
model = tf.contrib.keras.models.Sequential()
fv=tf.contrib.keras.layers.GRU
model.add(fv(units=42, activation='tanh', input_shape= (1000,42),return_sequences=True)) #i want to send Batches to train
#model.add(tf.keras.layers.Dropout(0.15)) # Dropout overfitting
#model.add(fv((1,42),activation='tanh', return_sequences=True))
#model.add(Dropout(0.2)) # Dropout overfitting
model.add(fv(42, activation='tanh'))
model.add(tf.keras.layers.Dropout(0.15)) # Dropout overfitting
model.add(tf.keras.layers.Dense(1000,activation='softsign'))
#model.add(tf.keras.layers.Activation("softsign"))
start = time.time()
# sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
# model.compile(loss="mse", optimizer=sgd)
model.compile(loss="mse", optimizer="Adam")
inp = np.array(train)
oup = np.array(labels)
X_tr = inp[:batch_size].reshape(-1, batch_size, inputs)
model.fit(X_tr,labels,epochs=20, batch_size=batch_size)
However I get the following error:
ValueError: Error when checking target: expected dense to have shape (1000,) but got array with shape (1,)

Here, you have mentioned input vector shape to be 1000.
model.add(fv(units=42, activation='tanh', input_shape= (1000,42),return_sequences=True)) #i want to send Batches to train
However, shape of your training data (X_tr) is 1-D
Check your X_tr variable and have same dimension for input layer.

If you read the error carefully you would realize there is a shape mismatch between the shapes of labels you provide, which is (None, 1), and the shape of output of model, which is (None, 1):
ValueError: Error when checking target: <--- This means the output shapes
expected dense to have shape (1000,) <--- output shape of model
but got array with shape (1,) <--- the shape of labels you give when training
Therefore you need to make them consistent. You just need to change the number of units in the last layer to 1 since there is one output per input sample:
model.add(tf.keras.layers.Dense(1, activation='softsign')) # 1 unit in the output

understanding shapes for Keras model

I am trying to wrap my head around the shape needed for my specific task. I am attempting to train a qlearner on some time series data which is contained in a dataframe. My dataframe has the following columns: open, close, high, low and I am trying to get a sliding window of say 50x timesteps. Here is example code for each window:
window = df.iloc[0:50]
df_norm = (window - window.mean()) / (window.max() - window.min())
x = df_norm.values
x = np.expand_dims(x, axis=0)
print x.shape
#(1,50, 4)
Now that I know my shape is (1,50,4) for each item in X I'm at a loss for what shape I feed my model. Lets say I have the following:
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(50,4)))
model.add(LSTM(32, return_sequences=True))
model.add(Dense(num_actions))
Gives the following error
ValueError: could not broadcast input array from shape (50,4) into shape (1,50)
And here is another attempt:
model = Sequential()
model.add(Dense(hidden_size, input_shape=(50,4), activation='relu'))
model.add(Dense(hidden_size, activation='relu'))
model.add(Dense(num_actions))
model.compile(sgd(lr=.2), "mse")
which gives the following error:
ValueError: could not broadcast input array from shape (50,4) into shape (1,50))
Here is the shape the model is expecting and the state from my env:
print "Inputs: {}".format(model.input_shape)
print "actual: {}".format(env.state.shape)
#Inputs: (None, 50, 4)
#actual: (1, 50, 4)
Can someone explain where I am going wrong with the shapes here?

The recurrent layer takes inputs of shape (batch_size, timesteps, input_features). Since the shape of x is (1, 50, 4), the data should be interpreted as a single batch of 50 timesteps, each containing 4 features. When initializing the first layer of a model, you pass an input_shape: a tuple specifying the shape of the input, excluding the batch_size dimension. In the case of LSTM layers, you can pass None as the timesteps dimension. Hence, this is how the first layer of the network should be initialized:
model.add(LSTM(32, return_sequences=True, input_shape=(None, 4)))
The second LSTM layer is followed by a dense layer. So you don't need to return sequences for this layer. Hence, this is how you should initialize the second LSTM layer:
model.add(LSTM(32))
Every batch of 50 time steps in x is supposed to be mapped to a single action vector in y. Therefore, since the shape of x is (1, 50, 4), the shape of y must be (1, num_actions). Make sure y doesn't have the timesteps dimension.
Therefore, under the assumption that x and y have the right shapes, the following code should work:
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(None, 4)))
model.add(LSTM(32))
model.add(Dense(num_actions))
model.compile(sgd(lr=.2), "mse")
# x.shape == (1, 50, 4)
# y.shape == (1, num_actions)
history = model.fit(x, y)

LSTM Keras target size error

#x_train.shape = 7x5x5 numpy array
#y_train.shape = 3x5x5 numpy array
#x_test.shape = (7,) numpy array
#y_test.shape = (3,) numpy array I have binary output as 0 or 1.
timeteps = 5
data_dim = 5
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=`(timesteps,data_dim)))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, batch_size=1)
score = model.evaluate(X_test,y_test,batch_size=1)
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (3, 5, 5)
I am trying to model LSTM using random data and this error occurs. I have tried many things but I could not succeed.
Thanks in advance.

There are a few problems/misunderstands here?
You can see that your y is actually 3 dimensional. However, the last lstm layer, you have return sequences as false, meaning that the LSTM is returning a single 32 long vector and sending that into the dense layer.
Furthermore, the use of multiple LSTMS here seems to lack purpose, though it does not necessarily harm anything.
In order to fit your presumed data, you would want the last lstm to have return_sequences as True, and have the number of neurons in that lstm not 32, but rather 5, as in the final dimension of your y data.
You could also not have it at all (since you already have two lstms before that, and instead make the second lstm only have 5 neurons and have the final lstm layer be removed entirely. You would then use a time distributed wrapper on the last dense layer
model.add(TimeDistrubuted(Dense(1,activation='sigmoid')))
which says to apply the same dense layer to every timestep of the data, which is required by the shape of your y data.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.