How to correctly shape my CNN-LSTM input layer - python

I have a data set with the shape (3340, 6). I want to use a CNN-LSTM to read a sequence of 30 rows and predict the next row's 6 elements. From what I have read, this is considered a multi-parallel time series. I have mainly been following this Machine Learning Mastery tutorial and am having trouble implementing the CNN-LSTM architecture for a multi-parallel time series.
I have used this function to split the data into 30-day time step frames:
from numpy import array

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
Here is a sample of the data frames produced by the function above.
# 30 Time Step Input Frame X[0], X.shape = (3310, 30, 6)
[4.951e-02, 8.585e-02, 5.941e-02, 8.584e-02, 8.584e-02, 5.000e+00],
[8.584e-02, 9.307e-02, 7.723e-02, 8.080e-02, 8.080e-02, 4.900e+01],
[8.080e-02, 8.181e-02, 7.426e-02, 7.474e-02, 7.474e-02, 2.000e+01],
[7.474e-02, 7.921e-02, 6.634e-02, 7.921e-02, 7.921e-02, 4.200e+01],
...
# 1 Time Step Output Array y[0], y.shape = (3310, 6)
[6.550e-02, 7.690e-02, 6.243e-02, 7.000e-02, 7.000e-02, 9.150e+02]
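For comparison, here is a sketch of the same windowing done with NumPy's sliding_window_view (NumPy 1.20+), swapped in for the Python loop. The helper names split_sequences_loop and split_sequences_vectorized and the random stand-in data are illustrative, not from the tutorial; the block checks that both versions give the (3310, 30, 6) and (3310, 6) shapes shown above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_sequences_loop(sequences, n_steps):
    # Same logic as split_sequences: n_steps rows as input, the next row as target.
    X, y = [], []
    for i in range(len(sequences) - n_steps):
        X.append(sequences[i:i + n_steps, :])
        y.append(sequences[i + n_steps, :])
    return np.array(X), np.array(y)

def split_sequences_vectorized(sequences, n_steps):
    # Windows over axis 0 have shape (len - n_steps + 1, n_features, n_steps);
    # the window axis is appended last, so move it next to the sample axis.
    windows = sliding_window_view(sequences, window_shape=n_steps, axis=0)
    # Drop the last window: there is no "next row" to use as its target.
    X = np.ascontiguousarray(windows[:-1].transpose(0, 2, 1))
    y = sequences[n_steps:]
    return X, y

data = np.random.random((3340, 6))  # random stand-in for the real (3340, 6) data set
X_loop, y_loop = split_sequences_loop(data, 30)
X_vec, y_vec = split_sequences_vectorized(data, 30)
print(X_vec.shape, y_vec.shape)  # (3310, 30, 6) (3310, 6)
```

The vectorized version avoids building a Python list of 3310 arrays; both produce identical values.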
Here is the model I am using:
model = Sequential()
model.add(TimeDistributed(Conv1D(64, 1, activation='relu'), input_shape=(None, 30, 6)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(Dense(6))
model.compile(optimizer='adam', loss='mse')
When I run model.fit, I receive the following error:
ValueError: Error when checking input: expected time_distributed_59_input to have
4 dimensions, but got array with shape (3310, 30, 6)
I am at a loss as to how to shape my input layer properly so that this model can learn. I have built several Conv2D nets in the past, but this is my first time series model, so I apologize if there's an obvious answer here that I am missing.

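Remove TimeDistributed from Conv1D and MaxPooling1D; both accept 3D inputs of shape (samples, timesteps, features) directly, so the input shape becomes input_shape=(30, 6)
Remove Flatten(), as it destroys the timesteps-channels relationship the LSTM relies on
Make the output match y: either use return_sequences=False on the LSTM, which gives one 6-element prediction per sample and matches y's shape (3310, 6), or keep LSTM(return_sequences=True), which returns 3D output, and add TimeDistributed to the last Dense layer to get one prediction per remaining timestep (the targets would then also need to be 3D)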
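Applying the points above, a minimal sketch of the corrected model, assuming TensorFlow 2.x (tf.keras) and return_sequences=False so the output matches y's shape (3310, 6). The random arrays are stand-ins for the real X and y:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n_steps, n_features = 30, 6

model = tf.keras.Sequential([
    layers.Input(shape=(n_steps, n_features)),            # 3D input: (batch, 30, 6)
    layers.Conv1D(64, kernel_size=1, activation="relu"),   # (batch, 30, 64)
    layers.MaxPooling1D(pool_size=2),                      # (batch, 15, 64)
    layers.LSTM(50, activation="relu"),                    # last step only: (batch, 50)
    layers.Dense(n_features),                              # (batch, 6)
])
model.compile(optimizer="adam", loss="mse")

# Random stand-ins with the same per-sample shapes as X and y.
X = np.random.random((100, n_steps, n_features)).astype("float32")
y = np.random.random((100, n_features)).astype("float32")
model.fit(X, y, epochs=1, verbose=0)

pred = model.predict(X[:5], verbose=0)  # one 6-element row per input window
```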

Related

Keras Many-to-one predicting the entire sequence

I have a keras model that is trained on a sequence of data with a single label. I'm assuming a categorically encoded feature which passes through an embedding layer before a GRU layer.
samples, timesteps, features = 2000, 10, 1
inputs_1 = np.random.randint(1, 50, [samples, timesteps, features]).astype(np.float32)
labels = np.random.randint(0, 2, [samples, 1])
# Input
input_ = Input(shape=(None,))
# Embeddings
emb = Embedding(input_dim=int(50),
output_dim=20,
input_length=(None,),
mask_zero=False,
name="cat_feat_0" + "_emb")(input_)
gru = GRU(32,
activation="tanh",
dropout=0,
recurrent_dropout=0,
go_backwards=False,
return_sequences=False,
name="gru_cat")(emb)
y = Dense(10, activation = "tanh")(gru)
y = Dropout(0.4)(y)
y = Dense(1, activation = "sigmoid")(y)
model = Model(inputs=input_, outputs=y)
model.compile(loss=BCE_Last_Event,
optimizer=Adam(beta_1=0.9, beta_2=0.999),
metrics=["accuracy"])
model.predict(inputs_1).shape
When I predict my data, the output shape is (2000,1) given that it predicts a single label for the sequence. Would it be possible to output the scores for every event in the sequence such that the model returns predictions of shape (2000, 10, 1)?
I know I can return the sequence in the GRU layer which will be propagated. However, I still only have a single label so the loss function would be erroneous. My current thinking is either:
Create a new model which returns the sequences using the same weights as the trained model
Wrap the model in a TimeDistributed layer such that it predicts every event in the sequence.
I am concerned that the second solution will be error-prone as it will only take as input a single event throughout the entire length of the sequence, rather than the entire sequence for its prediction. Is this thinking correct?
What are the best solutions?
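A minimal sketch of the first option, assuming tf.keras: create the layers once, train the many-to-one model, then build a second, inference-only model whose GRU has return_sequences=True and receives the trained GRU's weights through get_weights()/set_weights(). Plain binary_crossentropy stands in for the custom BCE_Last_Event loss, and the data is random. Because the GRU reads the sequence left to right, the score at step t depends only on events up to t, and the score at the last step equals the many-to-one prediction.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

samples, timesteps = 2000, 10
x = np.random.randint(1, 50, size=(samples, timesteps)).astype("int32")
labels = np.random.randint(0, 2, size=(samples, 1)).astype("float32")

# Layers created once so both models below share the same weights.
emb = layers.Embedding(input_dim=50, output_dim=20)
dense_hidden = layers.Dense(10, activation="tanh")
dense_out = layers.Dense(1, activation="sigmoid")

# Many-to-one model, trained with one label per sequence.
inp = layers.Input(shape=(None,), dtype="int32")
gru_last = layers.GRU(32, return_sequences=False)
h = layers.Dropout(0.4)(dense_hidden(gru_last(emb(inp))))
model = tf.keras.Model(inp, dense_out(h))
model.compile(loss="binary_crossentropy", optimizer="adam")
model.fit(x, labels, epochs=1, batch_size=64, verbose=0)

# Inference-only model: a sequence-returning GRU loaded with the trained
# GRU's weights, followed by the same shared Dense layers (Dense acts on
# the last axis, so it is applied at every timestep).
inp_seq = layers.Input(shape=(None,), dtype="int32")
gru_seq = layers.GRU(32, return_sequences=True)
h_seq = gru_seq(emb(inp_seq))                 # (batch, timesteps, 32)
seq_model = tf.keras.Model(inp_seq, dense_out(dense_hidden(h_seq)))
gru_seq.set_weights(gru_last.get_weights())

per_step = seq_model.predict(x, verbose=0)    # (2000, 10, 1): a score per event
last_only = model.predict(x, verbose=0)       # (2000, 1): a score per sequence
```

The single-label loss is only needed for training; the second model is never trained, so the loss concern does not apply to it.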

keras lstm incorrect input_shape

I am trying to use an LSTM model to predict the weather (mainly to learn about LSTMs and Python).
I have a dataset of 500,000 rows, each of which represents a date, and there are 8 columns which are my features.
Below is my model.
model = Sequential()
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.fit(
X,
y,
batch_size=512,
epochs=100,
validation_split=0.05)
As I understand the input_shape parameter, the first value is the number of time steps, so here I am saying that the last 30 observations should be used to predict the next value. The 8, as I understand it, is the number of features: air pressure, temperature, etc.
So I convert my X matrix into a 3D array with the line below, so X is now a (500000, 8, 1) array.
X = np.reshape(X, (X.shape[0], X.shape[1], 1))
When I run the model though I get the error below.
ValueError: Error when checking input: expected lstm_3_input to have shape (30, 8) but got array with shape (8, 1)
What am I doing wrong?
Your issue is with data preparation.
Find details on data preparation for LSTMs here.
LSTMs map a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple samples. Consider a given univariate sequence:
[10, 20, 30, 40, 50, 60, 70, 80, 90]
We can divide the sequence into multiple input/output patterns called samples, where n_steps = 3 time steps are used as input and one time step is used as the label for the one-step prediction that is being learned.
X               y
10, 20, 30      40
20, 30, 40      50
30, 40, 50      60
# ...
So what you want to do is implemented in the split_sequence() function below:
from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
Getting back to our initial example the following happens:
# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])
# [10 20 30] 40
# [20 30 40] 50
# [30 40 50] 60
# [40 50 60] 70
# [50 60 70] 80
# [60 70 80] 90
Takeaway: your shapes should now be what your LSTM model expects, and you can adjust the data shape to your needs. The same approach works with multiple input features (columns).
I think your input shape is off. The network does not understand that you want it to take slices of 30 points to predict the 31st. You need to slice your dataset into windows of length 30 (so each point appears in up to 30 windows) and train on those, which gives a shape of (499970, 30, 8), assuming the last point goes only into y. Also, do not add a dummy dimension at the end: a trailing channel axis like that is what convolutional layers need (e.g. for RGB channels), not an LSTM.
I think you might need just a simple explanation of how layers work. In particular, note that all Keras layers behave something like this:
NAME(output_dim, input_shape = (...,input_dim))
For example, suppose I have 15000 vectors of length 3 and I would like to map them to vectors of length 5. Then something like this would do that:
import numpy as np, tensorflow as tf
X = np.random.random((15000,3))
Y = np.random.random((15000,5))
M = tf.keras.models.Sequential()
M.add(tf.keras.layers.Dense(5,input_shape=(3,)))
M.compile('sgd','mse')
M.fit(X,Y) # Take note that I provided complete working code here. Good practice.
# I even include the imports and random data to check that it works.
Likewise, if my input looks something like (1000,10,5) and I run it through an LSTM like LSTM(7), then I should know (automatically) that I will get something like (...,7) as my output: those 5-long vectors will get changed to 7-long vectors. The rule to remember: the last dimension is always the vector you are changing, and the first parameter of the layer is always the dimension to change it to.
Now the second thing to learn about LSTMs: they use a time axis (which is not the last axis, because, as we just went over, that is always the "changing dimension" axis), which is removed if return_sequences=False and kept if return_sequences=True. Some examples:
LSTM(7) # (10000,100,5) -> (10000,7)
# Here the LSTM will loop through the 100, 5 long vectors (like a time series with memory),
# producing 7 long vectors. Only the last 7 long vector is kept.
LSTM(7,return_sequences=True) # (10000,100,5) -> (10000,100,7)
# Same thing as the layer above, except we keep all the intermediate steps.
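These shape rules can be checked by calling the layers eagerly on a small random batch, here 4 samples instead of 10000 to keep it fast, assuming TensorFlow 2.x. The Dense line also shows that Dense only acts on the last axis of a 3D input:

```python
import numpy as np
from tensorflow.keras import layers

x = np.random.random((4, 100, 5)).astype("float32")   # (samples, time, features)

last_only = layers.LSTM(7)(x)                          # time axis dropped: (4, 7)
all_steps = layers.LSTM(7, return_sequences=True)(x)   # time axis kept: (4, 100, 7)
dense_3d = layers.Dense(7)(x)                          # last axis changed: (4, 100, 7)
```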
You provide a layer that looks like this:
LSTM(50,input_shape=(30,8),return_sequences=True) # (10000,30,8) -> (10000,30,50)
Notice the 30 is the TIME dimension used in your LSTM model. The 8 and the 50 are the INPUT_DIM and OUTPUT_DIM, and have nothing to do with the time axis. Another common misunderstanding: the LSTM expects you to provide each SAMPLE with its own COMPLETE PAST and TIME AXIS. That is, an LSTM does not use previous sample points for the next sample point; each sample is independent and comes with its own complete past data.
So let's take a look at your model. Step one. What is your model doing and what kind of data is it expecting?
from tensorflow.keras.layers import LSTM, Dropout, Activation, Dense
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd','mse')
print(model.input_shape)
model.summary() # Lets see what your model is doing.
So, now I clearly see your model does:
(10000,30,8) -> (10000,30,50) -> (10000,30,100) -> (10000,50) -> (10000,1)
Did you expect that? Did you see that those would be the dimensions of the intermediate steps? Now that I know what input and output your model is expecting, I can easily verify that your model trains and works on that kind of data.
from tensorflow.keras.layers import LSTM, Dropout, Activation, Dense
from tensorflow.keras.models import Sequential
import numpy as np
X = np.random.random((10000,30,8))
Y = np.random.random((10000,1))
model = Sequential()
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd','mse')
model.fit(X,Y)
Did you notice that your model was expecting inputs like (...,30,8)? Did you know your model was expecting output data that looked like (...,1)? Knowing what your model wants also means you can now change your model to fit the data you're interested in. If you want your model to run over your 8 parameters like a time axis, then your input dimensions need to reflect that: change the 30 to an 8 and the 8 to a 1. If you do this, notice also that your first layer is expanding each 1-long vector (a single number) into a 50-long vector. Does that sound like what you wanted the model to do? Maybe your LSTM should be an LSTM(2) or LSTM(5) instead of 50, etc. You could spend the next 1000 hours trying to find the right parameters for the data you are using.
Maybe you don't want to go over your FEATURE space as a TIME space. Instead, try slicing your data into overlapping windows of length 10, where each sample has its own history, with dimensions of, say, (10000,10,8). Then an LSTM(50) would take your 8-long feature vectors and change them into 50-long vectors while going over the TIME axis of 10. Maybe you just want to keep the last one, with return_sequences=False.
Let me copy a function I used for preparing my data for LSTM:
import numpy as np
from itertools import islice

def slice_data_for_lstm(data, lookback):
    return np.array(list(zip(*[islice(np.array(data), i, None, 1) for i in range(lookback)])))

X_sliced = slice_data_for_lstm(X, 30)
lookback should be 30 in your case. For each starting row, this stacks 30 consecutive rows, so with your reshaped (500000, 8, 1) X the result has shape (499971, 30, 8, 1). An LSTM expects 3D input, so pass the original 2D (500000, 8) array instead to get (499971, 30, 8).
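To pair these windows with targets, a sketch assuming a 2D (rows, 8) array and, hypothetically, that column 0 is the value to predict; 1000 rows stand in for the 500000. Each row is stored up to 30 times in the windowed array, so at full size it needs roughly 30 times the memory of the raw data:

```python
import numpy as np
from itertools import islice

def slice_data_for_lstm(data, lookback):
    # Same function as above: zip `lookback` row iterators, each offset by one row.
    return np.array(list(zip(*[islice(np.array(data), i, None, 1) for i in range(lookback)])))

lookback = 30
data = np.random.random((1000, 8))  # random stand-in for the (500000, 8) weather data
target = data[:, 0]                 # hypothetical: column 0 is the value to predict

windows = slice_data_for_lstm(data, lookback)  # (1000 - 29, 30, 8)
X = windows[:-1]                               # last window has no "next" value
y = target[lookback:]                          # y[i] is the row right after window i
print(X.shape, y.shape)  # (970, 30, 8) (970,)
```

X now matches the model's input_shape=(30, 8), and y matches its Dense(1) output once reshaped to (970, 1) if needed.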

Bi-directional LSTM for entity recognition

Following a paper, I'm using word embeddings as a feature vector for entity recognition.
I've attempted to architect the network using Keras but have run into a dimensionality problem I cannot seem to resolve.
Take the following example sentence:
["I went to the shop"]
The sentence has 5 words, and after computing the feature matrix, I am left with a matrix of dimension: (1, 120, 1000) == (#examples, sequence_length, embedding).
Note that sequences shorter than sequence_length are padded with 0. In this example, the actual sequence length would be 5.
My network architecture is as follows:
enc = encode()
claims_input = Input(shape=(120, 1000), dtype='float32', name='claims')
x = Masking(mask_value=0., input_shape=(120, 1000))(claims_input)
x = Bidirectional(LSTM(units=512, return_sequences=True, recurrent_dropout=0.2, dropout=0.2))(x)
x = Bidirectional(LSTM(units=512, return_sequences=True, recurrent_dropout=0.2, dropout=0.2))(x)
out = TimeDistributed(Dense(8, activation="softmax"))(x)
model = Model(inputs=claims_input, output=out)
model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=["accuracy"])
model.fit(enc, y)
The architecture is straightforward: I mask specific time steps, run two bidirectional LSTMs, and finish with a softmax output. My y variable in this case is a (9,8) one-hot-encoded matrix corresponding to the gold label of each word.
When trying to fit() this model, I am running into a dimensionality problem relating to the TimeDistributed() layer and I'm unsure how to resolve, or even begin to debug this.
Error: ValueError: Error when checking target: expected time_distributed_1 to have 3 dimensions, but got array with shape (9, 8)
Any help would be appreciated.
You are doing entity recognition. So each element in your input sequence will be assigned an entity (probably some of them as null). If your model takes an input sample of shape (120, n_features), then the output must also be a sequence of length of 120, i.e. one entity for each element. Therefore, the labels, i.e. y, you provide to the model must have a shape of (n_samples, 120, n_entities) (or (n_samples, 120, 1) if you are using sparse labeling).
Side note: There is no difference between TimeDistributed(Dense(...)) and Dense(...), as the Dense layer is applied on the last axis.
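A minimal sketch of the label shape described above, assuming tf.keras, integer (sparse) labels padded to length 120 alongside the inputs, and small sizes (4 samples, 50-dimensional embeddings, 16 LSTM units) so it runs quickly. The data and sequence lengths are random stand-ins:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n_samples, seq_len, emb_dim, n_entities = 4, 120, 50, 8
rng = np.random.default_rng(0)

# Zero-padded inputs and one integer label per position (padding label 0).
X = np.zeros((n_samples, seq_len, emb_dim), dtype="float32")
y = np.zeros((n_samples, seq_len), dtype="int32")
for i, n in enumerate([5, 12, 7, 30]):          # hypothetical sentence lengths
    X[i, :n] = rng.random((n, emb_dim))
    y[i, :n] = rng.integers(0, n_entities, size=n)

inp = layers.Input(shape=(seq_len, emb_dim))
x = layers.Masking(mask_value=0.0)(inp)         # padded steps are skipped
x = layers.Bidirectional(layers.LSTM(16, return_sequences=True))(x)
out = layers.TimeDistributed(layers.Dense(n_entities, activation="softmax"))(x)
model = tf.keras.Model(inputs=inp, outputs=out)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=1, verbose=0)             # y: (n_samples, 120)

pred = model.predict(X, verbose=0)               # (n_samples, 120, 8)
```

With one-hot labels of shape (n_samples, 120, 8) instead, the loss would be categorical_crossentropy rather than sparse_categorical_crossentropy.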

Mismatch in expected Keras shapes after pooling

I'm building a few simple models in Keras to improve my knowledge of deep learning, and encountering some issues I don't quite understand how to debug.
I want to use a 1D CNN to perform regression on some time-series data. My input feature tensor has shape N x T x D, where N is the number of data points, T is the sequence length (number of time steps), and D is the number of dimensions. My target tensor has shape N x T x 1 (1 because I am trying to output a scalar value at each time step).
I've set up my model architecture like this:
feature_tensor.shape
# (75584, 40, 38)
target_tensor.shape
# (75584, 40, 1)
inputs = Input(shape=(SEQUENCE_LENGTH,DIMENSIONS))
conv1 = Conv1D(filters=64, kernel_size=3, activation='relu')
x = conv1(inputs)
x = MaxPooling1D(pool_size=2)(x)
x = Flatten()(x)
x = Dense(100, activation='relu')(x)
predictions = Dense(1, activation="linear")(x)
model = Model(inputs, predictions)
opt = Adam(lr=1e-5, decay=1e-4 / 200)
model.compile(loss="mean_absolute_error", optimizer=opt)
When I attempt to train my model, however, I get the following output:
r = model.fit(cleaned_tensor, target_tensor, epochs=100, batch_size=2058)
ValueError: Error when checking target: expected dense_164 to have 2
dimensions, but got array with shape (75584, 40, 1).
The first two numbers are familiar: 75584 is the # of samples, 40 is the sequence length.
When I debug my model summary object, I see that the expected output from the Flatten layer should be 1216:
However, my colleague and I stared at the code for a long time and could not understand why the shape of (75584, 40, 1) was being arrived at via the architecture when it reached the dense layer.
Could someone point me in the direction of what I am doing wrong?
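Try reshaping your target variable to N x T, i.e. (75584, 40). Because Flatten collapses the time axis, the model's output is 2D, so it looks like your final Dense layer should have 40 units rather than 1 (one output per time step).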
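A sketch of that suggestion with small random stand-ins (256 samples instead of 75584), assuming tf.keras; Adam(learning_rate=...) replaces the older lr/decay arguments. The comments trace the shapes, including the 1216-unit Flatten output seen in the model summary (38 steps after the kernel-3 convolution, 19 after pooling, times 64 filters):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

SEQUENCE_LENGTH, DIMENSIONS = 40, 38
N = 256  # small stand-in for the 75584 samples

features = np.random.random((N, SEQUENCE_LENGTH, DIMENSIONS)).astype("float32")
targets = np.random.random((N, SEQUENCE_LENGTH, 1)).astype("float32")
targets_2d = targets.reshape(N, SEQUENCE_LENGTH)  # (N, T, 1) -> (N, T)

inputs = layers.Input(shape=(SEQUENCE_LENGTH, DIMENSIONS))
x = layers.Conv1D(filters=64, kernel_size=3, activation="relu")(inputs)  # (batch, 38, 64)
x = layers.MaxPooling1D(pool_size=2)(x)                                    # (batch, 19, 64)
flat = layers.Flatten()(x)                                                 # (batch, 1216)
x = layers.Dense(100, activation="relu")(flat)
predictions = layers.Dense(SEQUENCE_LENGTH, activation="linear")(x)       # (batch, 40)

model = tf.keras.Model(inputs, predictions)
model.compile(loss="mean_absolute_error",
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5))
model.fit(features, targets_2d, epochs=1, batch_size=64, verbose=0)
```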

Python Keras Multiple Input Layers - How to Concatenate/Merge?

In Python, I am trying to build a neural network model using Sequential in Keras to perform binary classification. Note that X is a NumPy array of time series data of shape 59x1000x3 (samples x timesteps x features) and D is a NumPy array of shape 59x100 (samples x auxiliary features). I want to pass the time series through an LSTM layer, and then augment a later layer with the accompanying features (i.e. concatenate two layers).
My code to fit the model is below:
def fit_model(X, y, D, neurons, batch_size, nb_epoch):
    model = Sequential()
    model.add(LSTM(units = neurons, input_shape = (X.shape[1], X.shape[2])))
    model.add(Dropout(0.1))
    model.add(Dense(10))
    input1 = Sequential()
    d = K.variable(D)
    d_input = Input(tensor=d)
    input1.add(InputLayer(input_tensor=d_input))
    input1.add(Dropout(0.1))
    input1.add(Dense(10))
    final_model = Sequential()
    merged = Concatenate([model, input1])
    final_model.add(merged)
    final_model.add(Dense(1, activation='sigmoid'))
    final_model.compile(loss = 'binary_crossentropy', optimizer = 'adam')
    final_model.fit(X, y, batch_size = batch_size, epochs = nb_epoch)
    return final_model
I get the following error:
ValueError: A Concatenate layer should be called on a list of at least 2 inputs
I tried using various permutations of merge/concatenate/the functional api/not the functional api, but I keep landing with some sort of error. I've seen answers using Merge from keras.engine.topology. However, it seems to now be deprecated. Any suggestions to fix the error when using Sequential or how to convert the code to the functional API would be appreciated. Thanks.
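You are incorrectly passing two Sequential models (model and input1) as the constructor argument of the Concatenate layer:
merged = Concatenate([model, input1])
The first constructor argument of Concatenate is axis, so the list is not treated as inputs, and when the layer is added to final_model it is called on a single tensor, hence the error. Concatenate is a layer: instantiate it, then call it on a list of at least two tensors. That requires the functional API, with a separate Input for each data source, where x1 and x2 are the outputs of the LSTM branch and the auxiliary-feature branch:
merged = Concatenate()([x1, x2])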
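A minimal functional-API sketch of fit_model, assuming TensorFlow 2.x (tf.keras). The helper name build_model, the 8 LSTM units and the random stand-ins for X, D and y are hypothetical; the layer sizes otherwise follow the question:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_model(timesteps, n_features, n_aux, neurons):
    # Branch 1: time series -> LSTM -> Dropout -> Dense(10)
    ts_input = layers.Input(shape=(timesteps, n_features), name="time_series")
    x = layers.LSTM(neurons)(ts_input)
    x = layers.Dropout(0.1)(x)
    x = layers.Dense(10)(x)

    # Branch 2: auxiliary features -> Dropout -> Dense(10)
    aux_input = layers.Input(shape=(n_aux,), name="aux_features")
    d = layers.Dropout(0.1)(aux_input)
    d = layers.Dense(10)(d)

    # Instantiate Concatenate, then call it on the list of branch outputs.
    merged = layers.Concatenate()([x, d])  # (batch, 20)
    out = layers.Dense(1, activation="sigmoid")(merged)

    model = tf.keras.Model(inputs=[ts_input, aux_input], outputs=out)
    model.compile(loss="binary_crossentropy", optimizer="adam")
    return model

X = np.random.random((59, 1000, 3)).astype("float32")
D = np.random.random((59, 100)).astype("float32")
y = np.random.randint(0, 2, size=(59, 1)).astype("float32")

model = build_model(X.shape[1], X.shape[2], D.shape[1], neurons=8)
model.fit([X, D], y, batch_size=16, epochs=1, verbose=0)  # inputs passed as a list
probs = model.predict([X, D], verbose=0)                   # (59, 1)
```

D is fed through fit() like X, so there is no need for K.variable or Input(tensor=...).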
